Extraction of Pinna Spectral Notches in the Median Plane of a Virtual Spherical Microphone Array

Ankit Sohni, Chaitanya Ahuja, and Rajesh M Hegde

Indian Institute of Technology Kanpur, India

Introduction

• The Head Related Impulse Response (HRIR) captures the effects of the interaction of sound with human anatomy.

• Head diffraction causes an interaural time difference (ITD) and an interaural level difference (ILD) between the sound waves arriving at the two ears; these are the primary binaural cues for localization in the horizontal plane.

• The effect of the head is invariant in the median plane, as both binaural cues (ITD and ILD) are nearly zero there.

• Pinna geometry causes multiple reflections of the sound wave, and the delay between the direct wave and the wave reflected by the pinna wall results in periodic spectral notches.

• Head Related Transfer Functions (HRTFs) corresponding to the measured HRIRs are simulated by the Fourier Bessel Series (FBS) over the median plane, and spectral notches are extracted from the reconstructed HRTFs.

• These spectral notches vary smoothly with elevation angle and depend strongly on pinna dimensions.

Plane wave Decomposition

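• The HRTF recorded by a spherical array of microphones due to a source located at the entrance of the ear canal can be decomposed into spherical harmonics as

$$H(k; r, \theta, \phi) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} H_{nm}(k; r)\, Y_n^m(\theta, \phi) \tag{1}$$

$$Y_n^m(\theta, \phi) = \sqrt{\frac{2n+1}{4\pi}\, \frac{(n-|m|)!}{(n+|m|)!}}\; P_n^{|m|}(\cos\theta)\, e^{jm\phi} \tag{2}$$

$$0 \le \theta \le \pi, \quad 0 \le \phi < 2\pi$$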

• Under the far-field assumption (r > 1 m), the HRTF is independent of the range r and can be represented as

$$H(f; \theta, \phi) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} H_{nm}(f)\, Y_n^m(\theta, \phi) \tag{3}$$

where $H_{nm}(f)$ is the Spherical Fourier Transform (SFT) of the HRTF.

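• Alternatively, the far-field HRTF can be decomposed into its corresponding associated Legendre polynomials and complex exponentials as

$$H(f; \theta, \phi) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} \tilde{H}_{nm}(f)\, P_n^{|m|}(\cos\theta)\, e^{jm\phi} \tag{4}$$

where $\tilde{H}_{nm}(f)$ is the SFT coefficient scaled by the normalization factor of Equation 2.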

HRTF Modeling over Median Plane

• In terms of convergence and computational complexity, complex exponentials are a better choice than associated Legendre polynomials for representing the HRTF over the median plane.

• Using the head-centered interaural polar coordinate system, the three-dimensional HRTF in Equation 4 can be represented over the median plane ($\theta = \pi/2$) as

$$H(f, \phi) = \sum_{m=-\infty}^{\infty} C_m(f)\, e^{jm\phi} \tag{5}$$

• The spectral component $C_m(f)$ can be modeled by the family of Bessel functions of the first kind as

$$C_m(f) = \sum_{k=1}^{\infty} C_{mk}\, J_{|m|}\!\left(\beta_k^{|m|}\, \frac{f}{f_{\max}}\right) \tag{6}$$

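• Combining Equations 5 and 6, the median plane HRTF can be decomposed into a Fourier Bessel Series as

$$H(f, \phi) = \sum_{m=-\infty}^{\infty} \sum_{k=1}^{\infty} C_{mk}\, J_{|m|}\!\left(\beta_k^{|m|}\, \frac{f}{f_{\max}}\right) e^{jm\phi} \tag{7}$$

where $C_{mk}$ are the Fourier Bessel coefficients, $\beta_k^{|m|}$ is the $k$-th root of $J_{|m|}$, and the coefficients are calculated as

$$C_{mk} = \frac{1}{\pi f_{\max}^2 \left[J_{|m+1|}(\beta_k^{|m|})\right]^2} \int_{-\pi}^{\pi} \int_{0}^{f_{\max}} f\, H(f, \phi)\, J_{|m|}\!\left(\beta_k^{|m|}\, \frac{f}{f_{\max}}\right) e^{-jm\phi}\, df\, d\phi \tag{8}$$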
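
The radial part of this expansion can be illustrated for a single fixed order |m|. The following is a minimal sketch, not the authors' implementation: the helper names (`bessel_j`, `bessel_zeros`, `fb_coefficients`, `fb_reconstruct`), the quadrature sizes and the trapezoidal integration are assumptions made here, and the angular Fourier sum over φ is left out. The Bessel function is evaluated from its integral representation so that only NumPy is required; SciPy's `scipy.special.jv` and `scipy.special.jn_zeros` could be used instead.

```python
import numpy as np


def bessel_j(order, x, n_quad=400):
    """J_order(x) for integer order, from (1/pi) * int_0^pi cos(order*t - x*sin t) dt.

    n_quad should comfortably exceed the largest argument in x.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    t = (np.arange(n_quad) + 0.5) * np.pi / n_quad  # midpoint nodes on [0, pi]
    return np.cos(order * t[None, :] - x[:, None] * np.sin(t)[None, :]).mean(axis=1)


def bessel_zeros(order, count, step=0.1):
    """First `count` positive zeros of J_order: bracket on a grid, then bisect."""
    zeros, lo = [], step
    f_lo = bessel_j(order, lo)[0]
    while len(zeros) < count:
        hi = lo + step
        f_hi = bessel_j(order, hi)[0]
        if f_lo * f_hi < 0:
            a, b, fa = lo, hi, f_lo
            for _ in range(60):
                mid = 0.5 * (a + b)
                fm = bessel_j(order, mid)[0]
                if fa * fm <= 0:
                    b = mid
                else:
                    a, fa = mid, fm
            zeros.append(0.5 * (a + b))
        lo, f_lo = hi, f_hi
    return np.array(zeros)


def trapezoid(y, x):
    """Plain trapezoidal rule (written out to avoid NumPy version differences)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def fb_coefficients(g, f, f_max, order, n_roots):
    """Radial Fourier-Bessel coefficients of g(f) on [0, f_max].

    c_k = 2 / (f_max^2 J_{order+1}(beta_k)^2) * int_0^f_max f g(f) J_order(beta_k f / f_max) df
    """
    beta = bessel_zeros(order, n_roots)
    norm = 0.5 * f_max ** 2 * bessel_j(order + 1, beta) ** 2
    coeffs = np.array([
        trapezoid(f * g * bessel_j(order, b * f / f_max), f) / nrm
        for b, nrm in zip(beta, norm)
    ])
    return beta, coeffs


def fb_reconstruct(coeffs, beta, f, f_max, order):
    """Sum of c_k * J_order(beta_k f / f_max) over the retained roots."""
    return sum(c * bessel_j(order, b * f / f_max) for c, b in zip(coeffs, beta))
```

As a sanity check under these assumptions, expanding a test function that is itself the third basis function of order 1 should return a coefficient close to 1 for k = 3 and close to 0 elsewhere, and the truncated series should reproduce the test function.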

Choice of Truncation number

[Figure: amplitude of the Fourier Bessel coefficients]

• The modal parameters $C_{mk}$ are band-limited and carry negligible energy beyond the truncation limits, i.e. for $|m| > M$ and $k > K' + K$.

• The $C_{mk}$ corresponding to the first $K'$ roots of the Bessel function preserve a faint initial pulse that does not contribute any structural feature of the HRIR.

• The $C_{mk}$ corresponding to the next $K$ roots preserve most of the variation due to the pinna alone and are highly significant for the pinna spectral notches.

• For the CIPIC database, convergence is found to be achieved for $M = 10$, $K' = 30$ and $K = 40$.

Pinna Reflection Model

• According to the two-ray reflection model, the resultant signal $y(t)$ due to interference between the direct wave $x(t)$ and the wave reflected by the pinna wall, $x(t - t(\phi))$, is given by

$$y(t) = x(t) + \rho(\phi)\, x(t - t(\phi)) \tag{9}$$

or, in the frequency domain,

$$Y(e^{j\omega}) = \left(1 + \rho(\phi)\, e^{-j\omega t(\phi)}\right) X(e^{j\omega}) \tag{10}$$

• The elevation-dependent temporal delay $t(\phi)$ places the point of reflection in the pinna image at a distance given by

$$d(\phi) = \frac{c\, t(\phi)}{2} \tag{11}$$

where $c$ is the speed of sound.

• It also results in periodic spectral notches whose frequencies (assuming $\rho(\phi) > 0$) are given by

$$f_n(\phi) = \frac{2n+1}{2\, t(\phi)} = \frac{c\,(2n+1)}{4\, d(\phi)}, \quad \forall n = 0, 1, 2, \cdots \tag{12}$$

• The first spectral notch frequency occurs at $f_0(\phi) = \dfrac{c}{4\, d(\phi)}$.

• Assuming Satarzadeh's hypothesis of a negative reflection coefficient ($\rho(\phi) < 0$), the first spectral notch frequency is doubled:

$$f_0(\phi) = \frac{c}{2\, d(\phi)} \tag{13}$$
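
The notch conditions follow from Equation 10: $|1 + \rho e^{-j\omega t}|^2 = 1 + \rho^2 + 2\rho\cos(\omega t)$ is smallest at $\omega t = (2n+1)\pi$ when $\rho > 0$, and at $\omega t = 2\pi n$ with $n \ge 1$ when $\rho < 0$. The short sketch below evaluates both cases. The speed of sound (343 m/s), the 2 cm reflection distance used in the example, and the function names are illustrative assumptions, not values taken from the poster.

```python
SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def reflection_delay(d, c=SPEED_OF_SOUND):
    """Round-trip delay t = 2 d / c in seconds (Equation 11 rearranged)."""
    return 2.0 * d / c


def notch_frequencies(d, n_notches=3, negative_reflection=False, c=SPEED_OF_SOUND):
    """Notch frequencies in Hz for a reflection point at distance d (metres)."""
    t = reflection_delay(d, c)
    if negative_reflection:
        # rho < 0: minima at w t = 2 pi n, n = 1, 2, ... (first notch is c / (2 d))
        return [n / t for n in range(1, n_notches + 1)]
    # rho > 0: minima at w t = (2 n + 1) pi, n = 0, 1, ... (first notch is c / (4 d))
    return [(2 * n + 1) / (2.0 * t) for n in range(n_notches)]


def distance_from_first_notch(f0, negative_reflection=True, c=SPEED_OF_SOUND):
    """Invert the first-notch relation to recover d (metres) from f0 (Hz)."""
    return c / (2.0 * f0) if negative_reflection else c / (4.0 * f0)


# Example with a hypothetical reflection distance of 2 cm
print(notch_frequencies(0.02, 2))                            # positive reflection
print(notch_frequencies(0.02, 1, negative_reflection=True))  # negative reflection
```

With these assumed values the first notch lies at 4287.5 Hz for a positive reflection coefficient and at 8575 Hz for a negative one, which is the doubling stated in Equation 13.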

Reconstructed HRIR

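• The Fourier Bessel coefficients in Equation 7 are calculated from the discrete spatial and spectral HRTF measured over the hemispherical median plane as

$$C_{mk} = \frac{1}{\pi f_{\max}^2 \left[J_{|m+1|}(\beta_k^{|m|})\right]^2} \sum_{i} \sum_{\phi_j = -\pi/4}^{5\pi/4} f_i\, H(f_i, \phi_j)\, J_{|m|}\!\left(\beta_k^{|m|}\, \frac{f_i}{f_{\max}}\right) e^{-jm\phi_j} \tag{14}$$

$$|m| \le M, \quad K' < k < K' + K$$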

• The measured HRIR is composed of head diffraction, pinna and torso reflections, and, as an artifact, knee reflection.

• At lower elevation angles, this knee reflection appears within a 1 ms time window along with the pinna reflections.

• The HRIR reconstructed through the Fourier Bessel Series preserves only the pinna reflections that appear within a 0.5 ms window.

Extraction of Pinna Spectral Notches

[Figure: (a) HRIR reconstructed through FBS, (b) LP residual, (c) windowed LP residual, (d) autocorrelation, (e) windowed autocorrelation, each plotted against time (ms); (f) to (j) corresponding spectrum magnitude (dB) against frequency (kHz); (k) group delay against frequency (kHz).]

• The HRIR reconstructed through the Fourier Bessel Series highlights only the effects of pinna resonances and notches.

• The linear prediction (LP) residual of the reconstructed HRIR removes the pinna resonances while retaining the pinna spectral nulls.

• Windowing the LP residual of the reconstructed HRIR smooths the spectrum while preserving the pinna spectral notches.

• The autocorrelation of the windowed LP residual preserves most of the details of the spectral envelope, such as notch depth and bandwidth.

• Owing to the high frequency resolution of the group delay function, pinna spectral notches are extracted from the group delay of the windowed autocorrelation function.

• A threshold of −1 is chosen empirically to avoid spurious nulls caused by windowing.
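
The chain of steps above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the half-Hann window shape, the 1 ms window length, the LP order of 12, the 1024-point FFT and the use of samples as the unit of group delay are all choices made here. Only the order of the steps, the 5 kHz to 16 kHz search band and the threshold of −1 come from the poster. The function names are hypothetical.

```python
import numpy as np


def lpc(x, order):
    """Autocorrelation-method LP coefficients via Levinson-Durbin (a[0] = 1)."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= (1.0 - k * k)
    return a


def half_hann(length, n_total):
    """Decaying half-Hann window of `length` samples, zero-padded to n_total."""
    w = np.zeros(n_total)
    m = min(length, n_total)
    w[:m] = 0.5 * (1.0 + np.cos(np.pi * np.arange(m) / length))
    return w


def group_delay(x, nfft):
    """Group delay in samples: Re{ FFT(n x[n]) / FFT(x[n]) }."""
    n = np.arange(len(x))
    X = np.fft.rfft(x, nfft)
    Y = np.fft.rfft(n * x, nfft)
    return np.real(Y * np.conj(X)) / np.maximum(np.abs(X) ** 2, 1e-12)


def extract_notches(hrir, fs, lp_order=12, win_ms=1.0, nfft=1024,
                    threshold=-1.0, band=(5e3, 16e3)):
    """Return notch frequencies (Hz) found as group-delay minima below threshold."""
    a = lpc(hrir, lp_order)
    residual = np.convolve(hrir, a)[:len(hrir)]           # LP residual
    win_len = int(round(win_ms * 1e-3 * fs))
    residual_w = residual * half_hann(win_len, len(residual))
    acf = np.correlate(residual_w, residual_w, mode="full")[len(residual_w) - 1:]
    acf_w = acf * half_hann(win_len, len(acf))            # windowed autocorrelation
    gd = group_delay(acf_w, nfft)
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    notches = []
    for i in range(1, len(gd) - 1):
        is_min = gd[i] < gd[i - 1] and gd[i] <= gd[i + 1]
        if is_min and gd[i] < threshold and band[0] <= freqs[i] <= band[1]:
            notches.append(float(freqs[i]))
    return notches
```

For a synthetic two-ray response sampled at 44.1 kHz, with a unit impulse followed 8 samples later by a reflection of amplitude 0.8, `extract_notches(x, 44100, lp_order=4)` returns notches near 8.27 kHz and 13.78 kHz, matching Equation 12 inside the search band. The LP order is kept below the delay in this toy case so that the predictor does not absorb the reflection itself.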

Experiments on Pinna Spectral Notches

• The publicly available CIPIC database is used; it provides data for several subjects together with their pinna images and the corresponding anthropometric parameters.

• HRIRs are measured using the head-centered interaural polar coordinate system, with elevation uniformly sampled from −45° to 230.625° in the median plane.

• Based on prior research, pinna spectral notch frequencies are assumed to lie in the range from 5 kHz to 16 kHz and are extracted using robust signal processing techniques.

• The pinna image of a particular subject is uniformly scaled to match pinna parameters such as $d_5$ (pinna height) and $d_6$ (pinna width).

• The distance d(φ) between the pinna reflection point and the entrance of the ear canal is calculated from Equation 13 for the frontal median plane, φ ∈ [−45°, …]

• Each notch point is mapped to the polar coordinates (d(φ), π + φ) in the right pinna and (d(φ), −φ) in the left pinna, measured from the entrance of the ear canal.
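
The mapping from an extracted notch to a point on the pinna image can be sketched as below. This is an illustrative sketch only: the speed of sound (343 m/s), the conversion from polar to Cartesian coordinates, the axis orientation and the function name `notch_to_pinna_point` are assumptions made here, and the subsequent scaling from metres to image pixels is not shown.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value


def notch_to_pinna_point(f_notch_hz, elevation_deg, ear="right", c=SPEED_OF_SOUND):
    """Map a first-notch frequency at a given elevation to (x, y) in metres.

    The origin is the entrance of the ear canal. The distance uses the
    negative-reflection relation d = c / (2 f) (Equation 13 inverted), and the
    polar angle is pi + phi for the right pinna and -phi for the left pinna.
    """
    d = c / (2.0 * f_notch_hz)
    phi = math.radians(elevation_deg)
    angle = math.pi + phi if ear == "right" else -phi
    return d * math.cos(angle), d * math.sin(angle)


# Example: a hypothetical notch at 8575 Hz for a source at 0 degrees elevation
print(notch_to_pinna_point(8575.0, 0.0, ear="right"))
```

In this example the reflection point lies 2 cm from the ear canal entrance along the negative x axis of the assumed image frame.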

Pinna Spectral Notches overlaid on HRTF

[Figure: HRTF plotted over elevation (degrees) and frequency (kHz) at azimuth 0.00°, with the extracted pinna spectral notches overlaid, shown in two panels each for Subject 124 (left pinna), Subject 163 (right pinna), Subject 119 (right pinna) and Subject 017 (right pinna).]

Pinna Notches marked on ear contour

[Figure: extracted pinna notches marked on the ear contours of Subject 124, Subject 163, Subject 119 and Subject 17, panels (a) to (d).]

Conclusion

• A fast method is proposed for extracting accurate pinna spectral notches that follow the actual pinna wall structure.

• The main novelty of the proposed work is the efficient reconstruction of the HRIR over the median plane of a virtual spherical array simulated using the Fourier Bessel series, especially at lower elevation angles.

• HRIRs corresponding to lower elevation angles suffer from knee reflections, which make only a slight contribution compared with the other anatomical reflections in the measured HRIR.

• The proposed method can suppress the knee reflections owing to its capability of preserving the strong variations due to the pinna alone under finite truncation.

• The extracted pinna spectral notches are also accurate and smooth when compared with the conventional spherical-array-based approach.

• The proposed method extracts the pinna spectral notches robustly even if the HRIR is measured over the complete hemisphere.

References

[1] V. Ralph Algazi, Richard O. Duda, Dennis M. Thompson, and Carlos Avendano, “The CIPIC HRTF database,” in 2001 IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics. IEEE, 2001, pp. 99–102.

[2] Vikas C. Raykar, Ramani Duraiswami, and B. Yegnanarayana, “Extracting the frequencies of the pinna spectral notches in measured head related impulse responses,” The Journal of the Acoustical Society of America, vol. 118, no. 1, pp. 364–374, 2005.

[3] V. Ralph Algazi, Richard O. Duda, and Patrick Satarzadeh, “Physical and filter pinna models based on anthropometry,” in Audio Engineering Society Convention 122. Audio Engineering Society, 2007.

[4] Dwight W. Batteau, “The role of the pinna in human localization,” Proceedings of the Royal Society of London. Series B. Biological Sciences, vol. 168, no. 1011, pp. 158–180, 1967.