Fast Modelling of Pinna Spectral Notches from HRTFs

using Linear Prediction Residual Cepstrum

Chaitanya Ahuja and Rajesh M. Hegde

chahuja@iitk.ac.in, rhegde@iitk.ac.in

Indian Institute of Technology, Kanpur

1. Introduction

• Head Related Transfer Functions (HRTFs) are modelled as a linear system for a given ear.

• An HRTF accounts for reflection, resonance and diffraction effects due to the pinna (outer ear), head and torso.

• Accurate individualized HRTFs are necessary for reconstructing accurate spatial audio [1].

• Measuring HRTFs is time-consuming and hence impractical in industry.

• Hence, reconstructing the HRTF from ear geometry would be a good start.

• Spectral notches in HRTFs can be linked to the distance of walls of the

pinna from the entry of the ear canal [4].

• A new algorithm, Linear Prediction Residual Cepstrum (LPRC), is proposed; it provides a more accurate way of extracting spectral notches.

2. HRTF as an All-Pole Model

• Let $H(r, \theta, \phi, f)$ be the HRTF describing a given ear:

$$H(r, \theta, \phi, f) = \frac{\psi(r, \theta, \phi, f)}{\psi_{\mathrm{ff}}(f)} \qquad (1)$$

where $\psi(r, \theta, \phi, f)$ is the sound pressure at the right/left ear drum and $\psi_{\mathrm{ff}}(f)$ is the free-field sound pressure. $(r, \theta, \phi)$ are the spherical coordinates of the sound source and $f$ is frequency.

• The HRTF is approximated using an all-pole model because

– we do not have access to the input sequence

– it gives a system of equations which can be efficiently solved

• The all-pole model in the time domain is $\hat{x}[n] = \sum_{i=1}^{k} a_i\, x[n-i]$, where $a_i$ are the prediction coefficients and $k$ is the order of approximation (of the Linear Prediction (LP) residual).

– The order is chosen as 12 for all the experiments demonstrated henceforth.

– The choice of the order does not have a significant effect on the results as long as it is large enough (> 8).

3. Estimating HRTF using LP Residual

• Assume the $n$-th point of the minimum-phase, causal signal $h[n]$ (IFFT of the HRTF) is unknown.

• It is modelled as a linear combination of k previous points in the signal,

k being the order of the LP Residual

• The expectation of the squared prediction error $e[n]$ is minimized over the prediction coefficients $a_i$, where

$$e[n] = h[n] - \sum_{i=1}^{k} a_i\, h[n-i] \qquad (2)$$

• LP residual analysis assumes a source-filter model and estimates 3 components

– all-pole model

– residual, representing excitation of source of sound

– gain, corresponding to the energy of the signal
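The three components listed above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the autocorrelation method for solving the normal equations (the poster does not say which solver was used), and the function name `lp_residual` is hypothetical.

```python
import numpy as np
from scipy.linalg import solve_toeplitz


def lp_residual(h, order=12):
    """Estimate LP coefficients, residual and gain of a 1-D signal.

    Assumption: autocorrelation method (Toeplitz normal equations).
    """
    h = np.asarray(h, dtype=float)
    n = len(h)
    # Autocorrelation of the signal for lags 0..order.
    r = np.array([np.dot(h[: n - lag], h[lag:]) for lag in range(order + 1)])
    # Solve R a = r[1:], where R is the symmetric Toeplitz matrix of r[0:order].
    a = solve_toeplitz(r[:order], r[1 : order + 1])
    # Prediction h_hat[n] = sum_i a_i * h[n - i], as in Equation 2.
    h_hat = np.zeros(n)
    for i in range(1, order + 1):
        h_hat[i:] += a[i - 1] * h[: n - i]
    residual = h - h_hat
    # Gain from the minimum prediction-error energy (clipped at zero).
    gain = np.sqrt(max(r[0] - np.dot(a, r[1 : order + 1]), 0.0))
    return a, residual, gain
```

Calling `lp_residual(hrir, order=12)` on a head-related impulse response would return the all-pole coefficients, the residual that LPRC goes on to window, and the gain.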

4. Linear Prediction Residual Cepstrum

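• The windowed signal is transformed using the cepstrum, which is defined as

$$c[n] = \mathrm{Re}\left(\mathrm{IDCT}\left(\log\left(\left|\mathcal{F}\{x[n]\}\right|\right)\right)\right) \qquad (3)$$

where $\mathcal{F}$ is the discrete Fourier transform, IDCT is the inverse discrete cosine transform and Re is the real part of the sequence.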

• The cepstrum, by virtue of the FFT followed by the log function, turns convolution into addition.

• A half-rectangular lifter eliminates convolutional components of multiple reflections.

• The DCT requires fewer coefficients than the FFT to approximate the spectrum well.

– More information can be stored in fewer data points.
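A minimal sketch of Equation 3 and the half-rectangular lifter follows. It makes several assumptions not stated on the poster: only the non-redundant half of the spectrum of a real signal is kept (`rfft`), SciPy's default (unnormalized) type-II DCT pair is used, and a small offset guards the logarithm. The function names and the `n_keep` parameter are hypothetical.

```python
import numpy as np
from scipy.fft import dct, idct


def cosine_cepstrum(x, n_fft=None):
    """Cepstrum as Re(IDCT(log|DFT(x)|)), following Equation 3."""
    x = np.asarray(x, dtype=float)
    n_fft = n_fft or len(x)
    # Magnitude spectrum; rfft keeps bins 0..n_fft/2 of a real signal.
    mag = np.abs(np.fft.rfft(x, n_fft))
    log_mag = np.log(mag + 1e-12)  # offset avoids log(0)
    return np.real(idct(log_mag))


def liftered_log_spectrum(cep, n_keep):
    """Keep the first n_keep cepstral points and map back to log-magnitude."""
    lifter = np.zeros(len(cep))
    lifter[:n_keep] = 1.0  # half-rectangular lifter
    return dct(cep * lifter)
```

With `n_keep` equal to the cepstrum length the log-magnitude spectrum is recovered exactly; smaller values give a smoothed spectrum whose local minima are candidate notch frequencies.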

7. Model for Pinna Contour Extraction

• Reﬂection Model, as described by Batteau [2] and modiﬁed by Sa-

tarzadeh [3], has been applied to overlay contour of spectral notches

on the picture of pinna of a given individual.

• Let a pinna be subjected to a sound wave $x[t]$.

• The total signal $y[t]$ received at the ear canal is

$$y[t] = \underbrace{x[t]}_{\text{direct signal}} + \underbrace{a\, x[t - t_d(\theta)]}_{\text{reflected signal}} \qquad (4)$$

where $a$ is the reflection coefficient and $t_d(\theta)$ is the time delay.

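• For destructive superposition of the incident and reflected waves, with time delay $t_d(\theta)$, we have

$$2\pi f_n(\theta)\, t_d(\theta) = (2n + 1)\pi \quad \forall\, n = 0, 1, 2, \ldots$$

• For $n = 0$ and $t_d(\theta) = \frac{2d(\theta)}{c}$ we have $f_0(\theta) = \frac{c}{4d(\theta)}$.

• Assuming the reflection coefficient to be negative (Satarzadeh’s argument), which adds a phase shift of $\pi$ on reflection, we get

$$f_0(\theta) = \frac{c}{2d(\theta)} \qquad (5)$$

where $c$ is the speed of sound in air, $d(\theta)$ is the distance from the ear-canal entrance to the reflecting pinna wall (so the reflected wave travels an extra path of $2d(\theta)$), $f_0(\theta)$ is the frequency of the first spectral notch and $\theta$ is the angle of elevation.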
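As a worked instance of Equation 5, the sketch below converts between reflector distance and first-notch frequency. The speed of sound (343 m/s, written in cm/s) and the function names are assumptions chosen for illustration.

```python
SPEED_OF_SOUND_CM_PER_S = 34300.0  # assumed value for air at room temperature


def first_notch_frequency_hz(d_cm, c=SPEED_OF_SOUND_CM_PER_S):
    """Equation 5: f0 = c / (2 d), with d in cm and c in cm/s."""
    return c / (2.0 * d_cm)


def distance_from_notch_cm(f0_hz, c=SPEED_OF_SOUND_CM_PER_S):
    """Equation 5 inverted: d = c / (2 f0)."""
    return c / (2.0 * f0_hz)


# A reflecting wall 2 cm from the ear-canal entrance:
print(first_notch_frequency_hz(2.0))  # 8575.0 (Hz)
```

Under these assumptions, pinna walls a few centimetres from the ear canal give first notches of several kilohertz.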

5. LPRC Algorithm

Figure 1: Flowchart of LPRC algorithm

6. Spectral Notch Extraction using LPRC

[Figure 2 panels: (a)-(c) plotted against Time (ms); (d)-(e) plotted against Quefrency (ms); (f)-(k) corresponding spectrum magnitudes plotted against Frequency (kHz). Plot labels: Signal, LP residual Window, Windowed LP residual, Cosine Cepstrum Window, Windowed Cepstrum.]

Figure 2: Application of the LPRC algorithm for extracting spectral notches for θ = 0 and φ = 0 of subject 119 (courtesy: CIPIC Database). (a): original signal; (b): LP residual of the original signal; (c): half-Hann window of the previous signal; (d): cepstrum of the windowed signal; (e): rectangular window of the previous signal. (f), (g), (h), (i) and (k) are the Fourier transforms of (a), (b), (c), (d) and (e) respectively. Local minima in (k) correspond to the frequencies of spectral notches.

9. Performance Evaluation

8. Results of Pinna Contour extraction

Notches are overlaid on picture using points corresponding to (d(θ), π + θ)

with respect to the ear canal as the origin.
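The overlay step can be sketched as a polar-to-pixel conversion. This is an illustration under stated assumptions: elevation is given in degrees, the image y-axis points downward, and the ear-canal pixel position `origin_px` and the scale `px_per_cm` are hypothetical inputs that would come from the photograph.

```python
import math


def notch_to_pixel(d_cm, theta_deg, origin_px, px_per_cm):
    """Map the polar point (d(theta), pi + theta) to image pixel coordinates."""
    angle = math.pi + math.radians(theta_deg)
    # Cartesian offset from the ear canal, in centimetres.
    dx_cm = d_cm * math.cos(angle)
    dy_cm = d_cm * math.sin(angle)
    x0, y0 = origin_px
    # Image rows grow downward, so the y offset is subtracted.
    return x0 + px_per_cm * dx_cm, y0 - px_per_cm * dy_cm
```

Evaluating this for each elevation, with d(θ) obtained from the extracted notch frequency via Equation 5, yields the contour points drawn on the pinna image.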

[Figure 3 panels: Subject 162 (a1, a2), Subject 119 (b1, b2), Subject 58 (c1, c2), Subject 44 (d1, d2).]

Figure 3: Illustration of pinna images with contours overlaid on them. (a1) through (d1) are generated using the LPRGD algorithm [4]. (a2) through (d2) are generated using the LPRC algorithm.

9(b). Analysis of Variance (ANOVA)

• Frequencies of the extracted notches are used to synthesise an all-pole filter of fixed bandwidth.

• This filter is excited by an impulse train to generate the head-related impulse response (HRIR).

• The synthesized HRTF is compared to the original spectrum using an Analysis of Variance (ANOVA) F-test

– Significance level = 5%

– Degrees of freedom of numerator, $n = 1$ and denominator, $d = 1000$

– This implies $F_c = 3.85$

– F-statistic values are calculated for all such comparisons of HRTFs and plotted as a frequency chart

– The null hypothesis is rejected when $F > F_c$
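The critical value quoted above can be checked numerically. The sketch below assumes a one-way ANOVA between two groups (the original and synthesized spectra), which gives one numerator degree of freedom; the helper names are hypothetical and the poster does not specify the software used.

```python
import numpy as np
from scipy import stats


def f_critical(alpha=0.05, dfn=1, dfd=1000):
    """Critical value F_c of the F-distribution at significance level alpha."""
    return stats.f.ppf(1.0 - alpha, dfn, dfd)


def rejects_null(original, synthesized, alpha=0.05):
    """One-way ANOVA between two spectra; returns (F, F > F_c)."""
    f_stat, _ = stats.f_oneway(original, synthesized)
    # Two groups: denominator degrees of freedom = total samples - 2.
    dfd = len(original) + len(synthesized) - 2
    return f_stat, bool(f_stat > f_critical(alpha, 1, dfd))


print(round(f_critical(), 2))  # 3.85
```

A comparison with F below this threshold does not reject the null hypothesis, which is read here as the synthesized HRTF being statistically close to the original.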

[Figure: bar charts of F-test outcomes for (a) Female Subjects, (b) Male Subjects and (c) All Subjects; in each chart, bars 1 and 2 are split into counts with F > Fc and F < Fc.]

• Bar 1 and Bar 2 represent the analysis on LPRGD and LPRC respectively.

• Clearly, the null hypothesis is rejected more prominently in female subjects when using the LPRGD algorithm.

• Hence LPRC gives more accurate notches than LPRGD.

10. Conclusions

• Linear Prediction Residual Cepstrum (LPRC) is proposed as a more accurate algorithm for extraction of spectral notches.

• Fewer coefficients are required for storing the information about spectral notches.

• Compared to LPRGD, the mean and variance of the AED of notch distances are significantly smaller for notches extracted using LPRC, which indicates better accuracy of the proposed algorithm.

• The mean of DBR is significantly larger for spectral notches extracted using LPRC, which indicates sharpness of the valleys in the spectrum.

• Analysis of Variance of the reconstructed HRTF against the original HRTF indicates greater statistical closeness for the HRTF constructed from notches extracted using LPRC.

• The same algorithm can be modified to extract peaks, which are a result of resonance effects.

• Accurate spectral notch extraction techniques are an essential component for verification of spectral notches (from the geometry of the ear) and on-line modelling of the pinna for synthesizing personalized spatial audio.

11. References

[1] Toni Liitola. Headphone sound externalization. PhD thesis, Helsinki University of Technology, 2006.

[2] D. W. Batteau. The role of the pinna in human localization. Proceedings of the Royal Society of London. Series B, Biological Sciences, 168, No.1011:158–180, August 1967.

[3] Patrick Satarzadeh. A study of physical and circuit models of the human pinnae. PhD thesis, Citeseer, 2006.

[4] Vikas C. Raykar, Ramani Duraiswami, and B. Yegnanarayana. Extracting the frequencies of the pinna spectral notches in measured head related impulse responses. The Journal of the Acoustical Society of America, 118(1):364–374, 2005.

[5] V.R. Algazi, R.O. Duda, D.M. Thompson, and C. Avendano. The CIPIC HRTF database. In Applications of Signal Processing to Audio and Acoustics, 2001 IEEE Workshop on the, pages 99–102, 2001.

9(a). Statistical Analysis

• The publicly available CIPIC Database [5] was used for testing the algorithm.

• Contours on the pinna were marked manually at discrete angles.

• Using Equation 5, the frequencies of the spectral notches were calculated from these contours.

• These were used as the reference for calculating deviation errors.

• The Average Error Deviation (AED) in notch distances and the mean and variance of the Depth-Bandwidth Ratio (DBR) were calculated separately for female and male subjects.

• $\mathrm{DBR} = \frac{\text{Depth}}{\text{3 dB Bandwidth}}$ and notch distances were also calculated.

LPRGD

          AED in Notch Distance          DBR
          Mean (cm)   Variance (cm)      Mean (dB kHz^-1)   Variance (dB kHz^-1)
Female    0.1496      0.1474             2.7600             8.1947
Male      0.1481      0.1375             2.8900             8.5188

LPRC

          AED in Notch Distance          DBR
          Mean (cm)   Variance (cm)      Mean (dB kHz^-1)   Variance (dB kHz^-1)
Female    0.0511      0.0848             8.9097             1746.6
Male      0.0349      0.0701             9.9529             1507.0