Fast Modelling of Pinna Spectral Notches from HRTFs using Linear Prediction Residual Cepstrum
Chaitanya Ahuja and Rajesh M. Hegde
chahuja@iitk.ac.in, rhegde@iitk.ac.in
Indian Institute of Technology, Kanpur
1. Introduction
• A Head-Related Transfer Function (HRTF) is assumed to describe a linear system for a given ear.
• The HRTF captures the reflection, resonance and diffraction effects of the pinna (outer ear), head and torso.
• Accurate individualized HRTFs are necessary for reproducing accurate spatial audio [1].
• Measuring HRTFs is time-consuming and therefore impractical in industry.
• Reconstructing the HRTF from ear geometry is therefore a promising starting point.
• Spectral notches in HRTFs can be linked to the distance between the walls of the pinna and the entrance of the ear canal [4].
• A new algorithm, Linear Prediction Residual Cepstrum (LPRC), is proposed for extracting spectral notches more accurately.
2. HRTF as an All-Pole Model
• Let H(r, θ, φ, f) be the HRTF describing a given ear:
$$H(r,\theta,\phi,f) = \frac{\psi(r,\theta,\phi,f)}{\psi_0(f)} \qquad (1)$$
where ψ(r, θ, φ, f) is the sound pressure at the right/left eardrum and $\psi_0(f)$ is the free-field sound pressure. (r, θ, φ) are the spherical coordinates of the sound source and f is frequency.
• The HRTF is approximated by an all-pole model because
– the input sequence is not available;
– it yields a system of equations that can be solved efficiently.
• In the time domain, the all-pole model predicts each sample from the previous ones:
$$\hat{x}[n] = \sum_{i=1}^{k} a_i\, x[n-i]$$
where k is the order of the Linear Prediction (LP) analysis.
– An order of k = 12 is used in all the experiments that follow.
– The choice of order has no significant effect on the results as long as it is large enough (k > 8).
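The predictor above corresponds to the all-pole frequency response $G / (1 - \sum_{i=1}^{k} a_i e^{-j2\pi f i/f_s})$. The sketch below evaluates it with NumPy; the gain G, the sampling rate f_s and the helper name `all_pole_response` are illustrative assumptions, not taken from the poster.

```python
import numpy as np

def all_pole_response(a, gain, freqs_hz, fs):
    """Frequency response G / (1 - sum_i a_i e^{-j 2 pi f i / fs}) of the all-pole model.

    a        : LP coefficients a_1 .. a_k (k = 12 in the experiments)
    gain     : model gain G (assumed known here)
    freqs_hz : frequencies at which to evaluate the response, in Hz
    fs       : sampling rate in Hz
    """
    a = np.asarray(a, dtype=float)
    lags = np.arange(1, len(a) + 1)                      # i = 1 .. k
    omega = 2 * np.pi * np.asarray(freqs_hz, dtype=float) / fs
    # Denominator A(e^{j omega}) = 1 - sum_i a_i e^{-j omega i}, one value per frequency.
    denom = 1 - np.exp(-1j * np.outer(omega, lags)) @ a
    return gain / denom

# Single pole (k = 1, a_1 = 0.5): |H| is 2 at DC and 1/1.5 at the Nyquist frequency.
H = all_pole_response([0.5], 1.0, [0.0, 22050.0], fs=44100.0)
```

With k = 12 coefficients obtained from LP analysis, the same function gives the all-pole approximation of the HRTF magnitude.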
3. Estimating HRTF using LP Residual
• Assume that the n-th sample of the minimum-phase, causal signal h[n] (the IFFT of the HRTF) is unknown.
• It is modelled as a linear combination of the k previous samples of the signal, k being the LP order.
• The coefficients $a_i$ are chosen to minimize the expected value of the squared prediction error e[n]:
$$e[n] = h[n] - \sum_{i=1}^{k} a_i\, h[n-i] \qquad (2)$$
• LP residual analysis assumes a source-filter model and estimates three components:
– the all-pole model;
– the residual, representing the excitation of the sound source;
– the gain, corresponding to the energy of the signal.
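The minimization of Eq. (2) can be sketched with the autocorrelation method, which reduces it to a symmetric Toeplitz system of normal equations. This is the textbook formulation, not necessarily the exact solver behind the poster; the helper name `lp_analysis` and the gain definition (square root of the minimum prediction-error energy) are assumptions.

```python
import numpy as np

def lp_analysis(h, k=12):
    """Autocorrelation-method LP analysis of h[n].

    Returns (a, e, gain): coefficients a_1..a_k minimizing the squared error
    of Eq. (2), the residual e[n], and the gain sqrt(minimum error energy).
    """
    h = np.asarray(h, dtype=float)
    n = len(h)
    # Autocorrelation r[0..k] of the (implicitly zero-padded) signal.
    r = np.array([np.dot(h[: n - lag], h[lag:]) for lag in range(k + 1)])
    # Normal equations R a = [r_1 .. r_k] with the Toeplitz matrix R[i, j] = r[|i - j|].
    R = np.array([[r[abs(i - j)] for j in range(k)] for i in range(k)])
    a = np.linalg.solve(R, r[1:])
    # Residual e[n] = h[n] - sum_i a_i h[n - i], taking h[n] = 0 for n < 0.
    e = h.copy()
    for i in range(1, k + 1):
        e[i:] -= a[i - 1] * h[: n - i]
    # Minimum prediction-error energy is r[0] - sum_i a_i r[i].
    gain = np.sqrt(r[0] - a @ r[1:])
    return a, e, gain

# A first-order decaying response h[n] = 0.9^n is predicted almost perfectly with k = 1.
h = 0.9 ** np.arange(200)
a, e, gain = lp_analysis(h, k=1)
```

For the HRIRs in the experiments one would call `lp_analysis(h, k=12)`; the residual `e` is the signal that the later LPRC steps window and transform.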
4. Linear Prediction Residual Cepstrum
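• The windowed signal is transformed into its cepstrum, defined as
$$c_x[n_q] = \operatorname{Re}\left(\operatorname{IDCT}\left(\log_{10}\left|\mathcal{F}\{x[n]\}\right|\right)\right) \qquad (3)$$
where $\mathcal{F}$ is the discrete Fourier transform, IDCT is the inverse discrete cosine transform, Re denotes the real part of the sequence and $n_q$ is the quefrency index.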
• Because the FFT turns convolution into multiplication and the logarithm turns multiplication into addition, the cepstrum converts convolved components into additive ones.
• A half-rectangular lifter then eliminates the convolutional components due to multiple reflections.
• The DCT needs fewer coefficients than the FFT to approximate the spectrum well.
– More information can therefore be stored in fewer data points.
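Equation (3) and the lifter can be sketched with NumPy and SciPy's `scipy.fft` module. The poster does not specify the DCT type or scaling, so we assume SciPy's default type-II transform with `norm="ortho"`, for which `dct` exactly inverts `idct`; `cepstrum`, `lifter` and `n_keep` are hypothetical names. The example also illustrates the convolution-to-addition property.

```python
import numpy as np
from scipy.fft import dct, idct

def cepstrum(x):
    """Eq. (3): c_x[n_q] = Re(IDCT(log10 |DFT(x)|)).

    DCT type and normalization are assumptions (type-II, norm="ortho").
    """
    mag = np.maximum(np.abs(np.fft.fft(x)), 1e-12)  # clip exact spectral zeros before log10
    return np.real(idct(np.log10(mag), norm="ortho"))

def lifter(c, n_keep):
    """Half-rectangular lifter: keep the first n_keep quefrency bins, zero the rest."""
    out = np.array(c, dtype=float, copy=True)
    out[n_keep:] = 0.0
    return out

# Convolution becomes addition: for a circular convolution z of x and y,
# |DFT(z)| = |DFT(x)| |DFT(y)|, so after log10 and the linear IDCT, c_z = c_x + c_y.
rng = np.random.default_rng(0)
x, y = rng.standard_normal(64), rng.standard_normal(64)
z = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))
c_x, c_y, c_z = cepstrum(x), cepstrum(y), cepstrum(z)

# Liftering and then applying the forward DCT gives a smoothed log10 |spectrum|.
smooth_log_mag = dct(lifter(c_x, 16), norm="ortho")
```

Keeping only a few low-quefrency coefficients is what lets the smoothed spectrum be stored in fewer data points.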
7. Model for Pinna Contour Extraction
• The reflection model described by Batteau [2] and modified by Satarzadeh [3] is used to overlay the contour of spectral notches on a picture of the pinna of a given individual.
• Let a pinna be subjected to a sound wave x[t].
• The total signal y[t] received at the ear canal is
$$y[t] = \underbrace{x[t]}_{\text{direct signal}} + \underbrace{a\,x[t - t_d(\theta)]}_{\text{reflected signal}} \qquad (4)$$
where a is the reflection coefficient and $t_d(\theta)$ is the time delay.
• For destructive superposition of the incident and reflected waves we have
$$2\pi f_n(\theta)\, t_d(\theta) = (2n+1)\pi, \qquad n = 0, 1, 2, \ldots$$
• For n = 0 and $t_d(\theta) = \frac{2d(\theta)}{c}$ we have
$$f_0(\theta) = \frac{1}{2t_d(\theta)} = \frac{c}{4d(\theta)}$$
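• Assuming the reflection coefficient to be negative (Satarzadeh's argument), the reflected wave is inverted, so cancellation occurs when $2\pi f\, t_d(\theta) = 2n\pi$; the first non-zero notch (n = 1) is then at
$$f_0(\theta) = \frac{c}{2d(\theta)} \qquad (5)$$
where c is the speed of sound in air, d(θ) is the distance from the entrance of the ear canal to the reflecting wall of the pinna (so that the path difference between the reflected and direct waves is 2d(θ)), $f_0(\theta)$ is the frequency of the first spectral notch and θ is the angle of elevation.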
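The notch relations above can be checked numerically. The sketch assumes c = 343 m/s and an illustrative wall distance of 2 cm; it evaluates the magnitude of the two-path model of Eq. (4) and locates its first notch for a positive and a negative reflection coefficient. The function names are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C

def first_notch_hz(d_m, negative_reflection=True, c=SPEED_OF_SOUND):
    """First notch frequency for a reflecting wall at distance d from the ear canal.

    Round-trip delay t_d = 2 d / c. Positive reflection coefficient: f0 = c / (4 d);
    negative reflection coefficient (Satarzadeh): f0 = c / (2 d), i.e. Eq. (5).
    """
    return c / (2 * d_m) if negative_reflection else c / (4 * d_m)

def distance_from_notch_m(f0_hz, c=SPEED_OF_SOUND):
    """Invert Eq. (5): d = c / (2 f0)."""
    return c / (2 * f0_hz)

def two_path_magnitude(freqs_hz, a, t_d):
    """|Y/X| for y[t] = x[t] + a x[t - t_d] (Eq. 4), i.e. |1 + a e^{-j 2 pi f t_d}|."""
    return np.abs(1 + a * np.exp(-2j * np.pi * np.asarray(freqs_hz) * t_d))

d = 0.02                                    # 2 cm, an illustrative distance
t_d = 2 * d / SPEED_OF_SOUND
f = np.linspace(1000.0, 12000.0, 11001)     # 1 Hz grid that skips the DC minimum
f_neg = f[np.argmin(two_path_magnitude(f, -0.8, t_d))]   # close to c / (2 d)
f_pos = f[np.argmin(two_path_magnitude(f, +0.8, t_d))]   # close to c / (4 d)
```

For d = 2 cm, Eq. (5) gives 343 / 0.04 = 8575 Hz, twice the 4287.5 Hz predicted for a positive reflection coefficient.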
5. LPRC Algorithm
Figure 1: Flowchart of LPRC algorithm
6. Spectral Notch Extraction using LPRC
[Figure 2 plots: panels (a) through (e) show the signal, LP residual, windowed LP residual, cepstrum and windowed cepstrum against time or quefrency (ms); panels (f) through (k) show the corresponding spectrum magnitudes against frequency (kHz).]
Figure 2: Application of the LPRC algorithm to extract spectral notches for θ = 0 and φ = 0 of subject 119 (courtesy: CIPIC database). Figure (a): original signal; Figure (b): LP residual of the original signal; Figure (c): half-Hann window applied to the previous signal; Figure (d): cepstrum of the windowed signal; Figure (e): rectangular window applied to the previous signal. Figures (f), (g), (h), (i) and (k) are the Fourier transforms of Figures (a), (b), (c), (d) and (e) respectively. Local minima in Figure (k) correspond to the frequencies of spectral notches.
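The steps in the caption can be strung together into a rough end-to-end sketch. Several details are assumptions not stated on the poster: the half-Hann window is taken to be the decaying half (1 at the first sample), the lifter length `n_keep` is arbitrary, the DCT is SciPy's orthonormal type-II pair, and the liftered cepstrum is mapped back to a log-magnitude spectrum with the forward DCT (the inverse of the IDCT in Eq. (3)) where the caption speaks of a Fourier transform. All function names are hypothetical.

```python
import numpy as np
from scipy.fft import dct, idct

def lp_residual(h, k=12):
    """Autocorrelation-method LP residual e[n] = h[n] - sum_i a_i h[n - i] (Eq. 2)."""
    h = np.asarray(h, dtype=float)
    n = len(h)
    r = np.array([np.dot(h[: n - lag], h[lag:]) for lag in range(k + 1)])
    R = np.array([[r[abs(i - j)] for j in range(k)] for i in range(k)])
    a = np.linalg.solve(R, r[1:])
    e = h.copy()
    for i in range(1, k + 1):
        e[i:] -= a[i - 1] * h[: n - i]
    return e

def half_hann(n):
    """Decaying half of a Hann window: 1 at index 0, 0 at index n - 1 (assumed shape)."""
    return 0.5 * (1.0 + np.cos(np.pi * np.arange(n) / (n - 1)))

def lprc_notches(h, fs, k=12, n_keep=20):
    """Sketch of the LPRC steps of Figure 2; returns candidate notch frequencies in Hz.

    n_keep (lifter length) is a hypothetical parameter; the poster does not state it.
    """
    e = lp_residual(h, k) * half_hann(len(h))            # Fig. 2(b), (c)
    mag = np.maximum(np.abs(np.fft.fft(e)), 1e-12)       # guard against log10(0)
    c = np.real(idct(np.log10(mag), norm="ortho"))       # Fig. 2(d), Eq. (3)
    c[n_keep:] = 0.0                                     # Fig. 2(e): rectangular lifter
    log_mag = dct(c, norm="ortho")                       # smoothed log10 |spectrum|
    half = log_mag[: len(h) // 2 + 1]                    # DFT bins 0 .. fs/2
    # Interior local minima: lower than both neighbours.
    idx = np.where((half[1:-1] < half[:-2]) & (half[1:-1] < half[2:]))[0] + 1
    return idx * fs / len(h)                             # bin index -> Hz

# Usage on a synthetic decaying response (a stand-in for a 200-sample CIPIC HRIR).
rng = np.random.default_rng(1)
h = rng.standard_normal(200) * np.exp(-np.arange(200) / 20.0)
notches_hz = lprc_notches(h, fs=44100.0)
```

Notch candidates are the interior local minima of the smoothed half-spectrum, converted from DFT bin index to Hz with fs / N.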
9. Performance Evaluation
8. Results of Pinna Contour Extraction
Notches are overlaid on the picture as points with polar coordinates (d(θ), π + θ), taking the entrance of the ear canal as the origin.
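The overlay rule above can be sketched as a polar-to-Cartesian conversion, using Eq. (5) to turn each notch frequency into a distance. The speed of sound (343 m/s), the function name and the omission of pixel scaling are assumptions.

```python
import numpy as np

def notch_to_pinna_point(f0_hz, theta_rad, c=343.0):
    """Map a notch at elevation theta to a point in the pinna image plane.

    Distance from Eq. (5), d = c / (2 f0); polar angle pi + theta; origin at the
    ear-canal entrance. Returns (x, y) in metres; scaling to pixels is omitted.
    Works element-wise on NumPy arrays of notch frequencies and angles.
    """
    d = c / (2.0 * np.asarray(f0_hz, dtype=float))
    angle = np.pi + np.asarray(theta_rad, dtype=float)
    return d * np.cos(angle), d * np.sin(angle)

# A notch at 8575 Hz for theta = 0 lies 2 cm from the ear canal, on the negative x-axis.
x, y = notch_to_pinna_point(8575.0, 0.0)
```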
[Figure 3 panels: Subject 162 (a1, a2), Subject 119 (b1, b2), Subject 58 (c1, c2), Subject 44 (d1, d2)]
Figure 3: Pinna images with notch contours overlaid. (a1) through (d1) are generated using the LPRGD algorithm [4]; (a2) through (d2) are generated using the LPRC algorithm.
9(b). Analysis of Variance (ANOVA)
• The frequencies of the extracted notches are used to synthesise an all-pole filter of fixed bandwidth.
• This filter is excited by an impulse train to generate an HRIR.
• The synthesized HRTF is compared with the original spectrum using an Analysis of Variance (ANOVA) F-test:
– Significance level α = 5%
– Degrees of freedom: numerator $n_f = 1$, denominator $d_f = 1000$
– This gives the critical value $F_c = 3.85$
– F-statistics are computed for all such HRTF comparisons and plotted as a frequency chart
– The null hypothesis is rejected when $F > F_c$
[Figure: F-statistic frequency charts for (a) female subjects, (b) male subjects and (c) all subjects]
• Bar 1 and Bar 2 represent the analysis for LPRGD and LPRC respectively.
• The null hypothesis is rejected noticeably more often for female subjects when the LPRGD algorithm is used.
• This indicates that LPRC gives more accurate notches than LPRGD.
10. Conclusions
• Linear Prediction Residual Cepstrum (LPRC) is proposed as a more accurate algorithm for extracting spectral notches.
• Fewer coefficients are required to store the information about spectral notches.
• Compared with LPRGD, the mean and variance of the AED in notch distances are significantly smaller for notches extracted using LPRC, which indicates better accuracy of the proposed algorithm.
• The mean DBR is significantly larger for notches extracted using LPRC, which indicates sharper valleys in the spectrum.
• ANOVA of the reconstructed HRTF against the original HRTF indicates that HRTFs constructed from notches extracted using LPRC are statistically closer to the original.
• The same algorithm can be modified to extract spectral peaks, which result from resonance effects.
• Accurate spectral notch extraction techniques are an essential component for verifying spectral notches (predicted from the geometry of the ear) and for on-line modelling of the pinna to synthesize personalized spatial audio.
11. References
[1] Toni Liitola. Headphone sound externalization. PhD thesis, Helsinki University of Technology, 2006.
[2] D. W. Batteau. The role of the pinna in human localization. Proceedings of the Royal Society of London. Series B, Biological Sciences, 168(1011):158–180, August 1967.
[3] Patrick Satarzadeh. A study of physical and circuit models of the human pinnae. PhD thesis, Citeseer, 2006.
[4] Vikas C. Raykar, Ramani Duraiswami, and B. Yegnanarayana. Extracting the frequencies of the pinna spectral notches in measured head related impulse responses. The Journal of the Acoustical Society of America, 118(1):364–374, 2005.
[5] V. R. Algazi, R. O. Duda, D. M. Thompson, and C. Avendano. The CIPIC HRTF database. In 2001 IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics, pages 99–102, 2001.
9(a). Statistical Analysis
• The publicly available CIPIC database [5] is used to test the algorithm.
• Contours on the pinna were marked manually at discrete angles.
• The frequencies of the spectral notches were calculated from these contours using Equation (5).
• These frequencies are used as the reference for calculating deviation errors.
• The Average Error Deviation (AED) in notch distances and the mean and variance of the DBR were calculated separately for female and male subjects.
• The Depth-Bandwidth Ratio, $\mathrm{DBR} = \frac{\text{Depth}}{\text{3 dB bandwidth}}$, and the notch distances were also calculated.
Algorithm  Subjects  AED mean (cm)  AED variance (cm²)  DBR mean (dB kHz⁻¹)  DBR variance (dB² kHz⁻²)
LPRGD      Female    0.1496         0.1474              2.7600               8.1947
LPRGD      Male      0.1481         0.1375              2.8900               8.5188
LPRC       Female    0.0511         0.0848              8.9097               1746.6
LPRC       Male      0.0349         0.0701              9.9529               1507.0
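The two error measures in the table can be sketched as follows. The poster gives only DBR = Depth / 3 dB bandwidth, so the depth reference (mean of the two neighbouring maxima), the 3 dB band (contiguous bins lying more than 3 dB below that reference, on a uniform frequency grid) and the reading of AED as a mean absolute difference are all our assumptions; the function names are hypothetical.

```python
import numpy as np

def aed(estimated_cm, reference_cm):
    """Average Error Deviation, assumed here to be the mean absolute difference."""
    est = np.asarray(estimated_cm, dtype=float)
    ref = np.asarray(reference_cm, dtype=float)
    return float(np.mean(np.abs(est - ref)))

def dbr(freqs_khz, mag_db, i_notch):
    """Depth-Bandwidth Ratio (dB per kHz) of the notch at index i_notch (assumed definition)."""
    mag = np.asarray(mag_db, dtype=float)
    left = i_notch
    while left > 0 and mag[left - 1] >= mag[left]:
        left -= 1                                # climb to the left shoulder (local maximum)
    right = i_notch
    while right < len(mag) - 1 and mag[right + 1] >= mag[right]:
        right += 1                               # climb to the right shoulder
    ref = 0.5 * (mag[left] + mag[right])         # reference level for the notch depth
    depth = ref - mag[i_notch]
    level = ref - 3.0                            # 3 dB below the reference
    lo = i_notch
    while lo > left and mag[lo - 1] < level:
        lo -= 1
    hi = i_notch
    while hi < right and mag[hi + 1] < level:
        hi += 1
    df = freqs_khz[1] - freqs_khz[0]             # uniform grid spacing in kHz
    bandwidth = (hi - lo + 1) * df               # width of the bins below the 3 dB level
    return depth / bandwidth

freqs = np.arange(11.0)                          # 0 .. 10 kHz in 1 kHz steps
mag = np.array([0, 0, 0, -10, -20, -10, 0, 0, 0, 0, 0], dtype=float)
ratio = dbr(freqs, mag, 4)                       # depth 20 dB over 3 bins -> 20/3
error = aed([2.0, 2.5], [2.1, 2.3])              # (0.1 + 0.2) / 2 = 0.15
```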