Extraction of Pinna Spectral Notches in the Median Plane
of a Virtual Spherical Microphone Array
Ankit Sohni, Chaitanya Ahuja, and Rajesh M Hegde
Indian Institute of Technology Kanpur, India
Introduction
The Head Related Impulse Response (HRIR) captures the effects of
the interaction of sound with human anatomy.
Head diffraction causes an interaural time difference (ITD) and an
interaural level difference (ILD) between the sound waves arriving
at the two ears; these are the primary binaural cues for horizontal
plane localization.
The effect of the head is invariant in the median plane, as both
binaural cues (ITD and ILD) are nearly equal to zero.
Pinna geometry causes multiple reflections of the sound wave, and the
delay between the direct wave and the wave reflected by the pinna wall
results in periodic spectral notches.
The Head Related Transfer Functions (HRTFs) corresponding to the
measured HRIRs are simulated by a Fourier Bessel Series (FBS) over the
median plane, and spectral notches are extracted from the
reconstructed HRTFs.
These spectral notches smoothly vary with elevation angles, and
are highly dependent on pinna dimensions.
Plane Wave Decomposition
The HRTF recorded by a spherical array of microphones due to a source
located at the entrance of the ear canal can be decomposed into
spherical harmonics as
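$$H(k; r, \theta, \phi) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} H_n^m(k; r)\, Y_n^m(\theta, \phi) \tag{1}$$
$$Y_n^m(\theta, \phi) = \sqrt{\frac{2n+1}{4\pi}\,\frac{(n-|m|)!}{(n+|m|)!}}\; P_n^{|m|}(\cos\theta)\, e^{jm\phi} \tag{2}$$
$$0 \le \theta \le \pi, \quad 0 \le \phi < 2\pi$$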
Under the far field assumption (r > 1 m), the HRTF is independent
of the range r and can be represented as
$$H(f; \theta, \phi) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} H_n^m(f)\, Y_n^m(\theta, \phi) \tag{3}$$
where $H_n^m(f)$ is the Spherical Fourier Transform (SFT).
Alternatively, the far field HRTF can be decomposed into its
corresponding Legendre polynomial and complex exponential as
$$H(f; \theta, \phi) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} \alpha_n^m\, H_n^m(f)\, P_n^{|m|}(\cos\theta)\, e^{jm\phi} \tag{4}$$
HRTF Modeling over Median Plane
In terms of convergence and computational complexity, complex
exponentials are a better choice than associated Legendre
polynomials for representing the HRTF over the median plane.
Using the head-centered interaural polar coordinate system, the
3-dimensional HRTF in Equation 4 can be represented over the median
plane ($\theta = \frac{\pi}{2}$) as
$$H(f, \phi) = \sum_{m=-\infty}^{\infty} C_m(f)\, e^{jm\phi} \tag{5}$$
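As a minimal sketch of Equation 5 at a single frequency, the coefficients $C_m$ can be estimated from angular samples by a discrete Fourier sum. The sketch assumes N uniformly spaced samples over the full circle, which is a simplification (the measurements described later cover only part of the circle), and the function names and test values are hypothetical.

```python
import cmath
import math


def synthesize(coeffs, phi):
    """Evaluate sum_m C_m * exp(j*m*phi) for a dict {m: C_m}."""
    return sum(c * cmath.exp(1j * m * phi) for m, c in coeffs.items())


def fourier_coefficients(samples, max_order):
    """Estimate C_m for m = -max_order..max_order.

    Assumes samples[i] = H(phi_i) with phi_i = 2*pi*i/N (full, uniform circle).
    """
    n = len(samples)
    coeffs = {}
    for m in range(-max_order, max_order + 1):
        acc = 0j
        for i, h in enumerate(samples):
            phi = 2.0 * math.pi * i / n
            acc += h * cmath.exp(-1j * m * phi)
        coeffs[m] = acc / n
    return coeffs


# Synthetic coefficients at one frequency (illustrative values only).
true_coeffs = {-1: 0.25j, 0: 1.0, 2: 0.5}
N = 16
samples = [synthesize(true_coeffs, 2.0 * math.pi * i / N) for i in range(N)]
est = fourier_coefficients(samples, max_order=3)
```

With 16 samples and orders up to 3 there is no aliasing, so the estimated coefficients match the synthetic ones to rounding error.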
The spectral component $C_m(f)$ can be modeled by the family of
Bessel functions of the first kind as
$$C_m(f) = \sum_{k=1}^{\infty} C_{mk}\, J_{|m|}\!\left(\beta_k^{|m|} \frac{f}{f_{max}}\right) \tag{6}$$
Combining Equations 5 and 6, median plane HRTF can be decom-
posed into Fourier Bessel Series as
$$H(f, \phi) = \sum_{m=-\infty}^{\infty} \sum_{k=1}^{\infty} C_{mk}\, J_{|m|}\!\left(\beta_k^{|m|} \frac{f}{f_{max}}\right) e^{jm\phi} \tag{7}$$
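where $C_{mk}$ are the Fourier Bessel coefficients, calculated as
$$C_{mk} = \frac{1}{\pi \left[J_{|m+1|}(\beta_k^{|m|})\right]^2} \int_{0}^{f_{max}} \int_{-\pi}^{\pi} f\, H(f, \phi)\, J_{|m|}\!\left(\beta_k^{|m|} \frac{f}{f_{max}}\right) e^{-jm\phi}\, d\phi\, df \tag{8}$$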
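The squared Bessel term in the denominator of Equation 8 comes from the orthogonality of the Bessel basis. The following is a minimal numerical check of that property, assuming that $\beta_k^{|m|}$ denotes the k-th positive root of $J_{|m|}$ and working with the normalized variable $x = f / f_{max}$; the helper names are hypothetical, and the Bessel function is evaluated from Bessel's integral so that only the standard library is needed.

```python
import math


def bessel_j(order, x, steps=200):
    """J_order(x) for integer order >= 0 via Bessel's integral,
    (1/pi) * int_0^pi cos(order*t - x*sin(t)) dt, using the trapezoidal rule."""
    h = math.pi / steps
    total = 0.5 * (1.0 + math.cos(order * math.pi))  # end points t = 0 and t = pi
    for i in range(1, steps):
        t = i * h
        total += math.cos(order * t - x * math.sin(t))
    return total * h / math.pi


def bessel_roots(order, count, step=0.1):
    """First `count` positive roots of J_order (sign-change scan, then bisection)."""
    roots = []
    a = step  # start past x = 0, which is itself a root when order > 0
    fa = bessel_j(order, a)
    while len(roots) < count:
        b = a + step
        fb = bessel_j(order, b)
        if fa * fb < 0.0:
            lo, hi, flo = a, b, fa
            for _ in range(50):
                mid = 0.5 * (lo + hi)
                fmid = bessel_j(order, mid)
                if flo * fmid <= 0.0:
                    hi = mid
                else:
                    lo, flo = mid, fmid
            roots.append(0.5 * (lo + hi))
        a, fa = b, fb
    return roots


def weighted_inner(order, beta_a, beta_b, intervals=200):
    """int_0^1 x * J(beta_a*x) * J(beta_b*x) dx by Simpson's rule, x = f/f_max."""
    h = 1.0 / intervals
    total = 0.0
    for i in range(intervals + 1):
        x = i * h
        weight = 1 if i in (0, intervals) else (4 if i % 2 else 2)
        total += weight * x * bessel_j(order, beta_a * x) * bessel_j(order, beta_b * x)
    return total * h / 3.0


m = 1
b1, b2 = bessel_roots(m, 2)
same = weighted_inner(m, b1, b1)           # norm of one basis function
cross = weighted_inner(m, b1, b2)          # inner product of two different ones
expected = 0.5 * bessel_j(m + 1, b1) ** 2  # known closed form of the norm
```

Here `cross` is numerically zero and `same` agrees with `expected`, which is why each coefficient can be isolated by projecting onto a single basis function and dividing by a squared Bessel value.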
Choice of Truncation Number
[Figure: Amplitude of the Fourier Bessel coefficients as a function of m and k.]
The modal parameters $C_{mk}$ are band limited and carry negligible energy
beyond the truncation limits, that is, for $|m| > M$ and $k > K + K'$.
The $C_{mk}$ corresponding to the first $K'$ roots of the Bessel function
preserve a faint initial pulse, which does not contribute any structural
feature of the HRIR.
The $C_{mk}$ corresponding to the next $K$ roots preserve most of the
variation due to the pinna alone, and are highly significant for pinna
spectral notches.
For the CIPIC database, convergence is found to be achieved for $M = 10$,
$K' = 30$ and $K = 40$.
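The truncation described above amounts to keeping a rectangular block of coefficient indices. A minimal sketch follows, using the values reported for the CIPIC database (order limit 10, first 30 roots skipped, next 40 roots kept). The function and parameter names are hypothetical, and treating the kept range as inclusive of its last root is an assumption.

```python
def retained_indices(max_order=10, skipped_roots=30, kept_roots=40):
    """List the (m, k) index pairs kept after truncation.

    Assumption: roots are numbered from k = 1, the first `skipped_roots`
    are dropped, and the following `kept_roots` are kept inclusively.
    """
    first_k = skipped_roots + 1
    last_k = skipped_roots + kept_roots
    return [
        (m, k)
        for m in range(-max_order, max_order + 1)
        for k in range(first_k, last_k + 1)
    ]


indices = retained_indices()
n_coefficients = len(indices)  # 21 orders times 40 roots
```

Under these assumptions each frequency-angle surface is described by 840 coefficients.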
Pinna Reflection Model
According to the two-ray reflection model, the resultant signal $y(t)$ due to
interference between the direct wave $x(t)$ and the wave reflected by the
pinna wall, $x(t - t(\phi))$, is given by
$$y(t) = x(t) + \rho(\phi)\, x(t - t(\phi)) \tag{9}$$
or
$$Y(e^{j\omega}) = \left(1 + \rho(\phi)\, e^{-j\omega t(\phi)}\right) X(e^{j\omega}) \tag{10}$$
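The comb-like behaviour implied by Equation 10 can be seen by evaluating the magnitude of the factor multiplying $X(e^{j\omega})$ on a frequency grid. This is a minimal sketch; the delay of 100 microseconds and the reflection coefficient of 0.8 are illustrative values, not taken from the source.

```python
import cmath
import math


def two_ray_magnitude_db(freq_hz, delay_s, rho):
    """Magnitude in dB of 1 + rho * exp(-j*omega*delay) at one frequency."""
    omega = 2.0 * math.pi * freq_hz
    response = 1.0 + rho * cmath.exp(-1j * omega * delay_s)
    return 20.0 * math.log10(abs(response))


delay = 100e-6  # seconds (illustrative)
rho = 0.8       # positive reflection coefficient (illustrative)

freqs = [100.0 * i for i in range(201)]  # 0 Hz to 20 kHz in 100 Hz steps
mags = [two_ray_magnitude_db(f, delay, rho) for f in freqs]

# Local minima of the magnitude response are the spectral notches.
notches = [
    freqs[i]
    for i in range(1, len(freqs) - 1)
    if mags[i] < mags[i - 1] and mags[i] < mags[i + 1]
]
```

For these values the notches fall at 5 kHz and 15 kHz, the odd multiples of one over twice the delay, and each notch is 20·log10(1 − 0.8) ≈ −14 dB deep.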
The elevation-dependent temporal delay $t(\phi)$ places the point of
reflection in the pinna image at a distance given by
$$d(\phi) = \frac{c\, t(\phi)}{2} \tag{11}$$
where $c$ is the speed of sound.
It also results in the periodic spectral notches whose frequencies (assuming
ρ(φ) > 0) are given by
$$f_n(\phi) = \frac{2n + 1}{2\, t(\phi)} = \frac{c\,(2n + 1)}{4\, d(\phi)}, \quad n = 0, 1, 2, \cdots \tag{12}$$
The first spectral notch frequency occurs at $f_0(\phi) = \frac{c}{4\, d(\phi)}$.
Assuming Satarzadeh's hypothesis of a negative reflection coefficient
($\rho(\phi) < 0$), the first spectral notch frequency is doubled:
$$f_0(\phi) = \frac{c}{2\, d(\phi)} \tag{13}$$
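A short numerical sketch of Equations 11 to 13 follows. The speed of sound (343 m/s) and the reflection distance (17.5 mm) are assumed values for illustration. The higher-order notches for a negative reflection coefficient are not stated in the source; they are derived here from the same two-ray model, where the nulls fall at non-zero integer multiples of one over the delay.

```python
SPEED_OF_SOUND = 343.0  # m/s; assumed value, not given in the source


def notch_frequencies(distance_m, count, rho_positive=True, c=SPEED_OF_SOUND):
    """Notch frequencies in Hz for a reflection point at `distance_m`."""
    delay = 2.0 * distance_m / c  # Equation 11 solved for the delay
    if rho_positive:
        # Equation 12: odd multiples of 1 / (2 * delay)
        return [(2 * n + 1) / (2.0 * delay) for n in range(count)]
    # Negative reflection coefficient: nulls at non-zero multiples of 1 / delay,
    # so the first notch is c / (2 * d) as in Equation 13.
    return [(n + 1) / delay for n in range(count)]


d = 0.0175  # metres (illustrative)
positive = notch_frequencies(d, 3)
negative = notch_frequencies(d, 3, rho_positive=False)
```

For this distance the first notch is about 4.9 kHz with a positive coefficient and about 9.8 kHz with a negative one, so the sign convention matters when a notch frequency is converted back to a distance.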
Reconstructed HRIR
The Fourier Bessel Coefficients in Equation 7 are calculated from discrete spa-
tial and spectral HRTF measured over the hemispherical median plane as
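$$C_{mk} = \frac{1}{\pi \left[J_{|m+1|}(\beta_k^{|m|})\right]^2} \sum_{f_i = 0}^{f_{max}} \; \sum_{\phi_i = -\frac{\pi}{4}}^{\frac{5\pi}{4}} f_i\, H(f_i, \phi_i)\, J_{|m|}\!\left(\beta_k^{|m|} \frac{f_i}{f_{max}}\right) e^{-jm\phi_i} \tag{14}$$
$$|m| \le M, \quad K' < k < K' + K$$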
The measured HRIR is composed of head diffraction, pinna and torso
reflections and, as an artifact, knee reflection.
At lower elevation angles, this knee reflection appears within a 1 ms time
window along with the pinna reflections.
The HRIR reconstructed through the Fourier Bessel Series preserves only the
pinna reflections, which appear within a 0.5 ms window.
Extraction of Pinna Spectral Notches
[Figure: Stages of notch extraction. Plotted against time (ms): (a) reconstructed HRIR through FBS, (b) LP residual, (c) windowed LP residual, (d) autocorrelation, (e) windowed autocorrelation. Plotted against frequency (kHz): (f) to (j) corresponding spectrum magnitude (dB), (k) group delay.]
The HRIR reconstructed through the Fourier Bessel Series highlights only the
effects of pinna resonances and notches.
The linear prediction (LP) residual of the reconstructed HRIR removes the
pinna resonances while retaining the pinna spectral nulls.
Windowing the LP residual of the reconstructed HRIR smooths the spectrum
while preserving the pinna spectral notches.
The autocorrelation of the windowed LP residual preserves most details of the
spectral envelope, such as notch depth and bandwidth.
Owing to the high frequency resolution of the group delay function, pinna
spectral notches are extracted from the group delay of the windowed
autocorrelation function.
A threshold of −1 is chosen empirically to avoid spurious nulls caused by
windowing.
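The final step, picking notches from a group delay function with a threshold of −1, can be illustrated on a synthetic signal. This sketch does not reproduce the LP residual, windowing or autocorrelation stages; it applies the group delay directly to a two-ray impulse response with an assumed 8-sample delay, a reflection coefficient of 0.5 and a 44.1 kHz sampling rate. All names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) discrete Fourier transform (adequate for short signals)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def group_delay(x):
    """Group delay in samples per bin: Re{ DFT(n * x[n]) / DFT(x[n]) }."""
    spectrum = dft(x)
    ramped = dft([t * v for t, v in enumerate(x)])
    return [(b / a).real for a, b in zip(spectrum, ramped)]


def pick_notches(tau, threshold=-1.0):
    """Bins up to Nyquist where tau is a local minimum below the threshold."""
    half = len(tau) // 2
    return [
        k
        for k in range(1, half)
        if tau[k] < threshold and tau[k] < tau[k - 1] and tau[k] < tau[k + 1]
    ]


FS = 44100.0  # Hz, assumed sampling rate
N = 256       # DFT length
DELAY = 8     # reflection delay in samples (illustrative)
RHO = 0.5     # reflection coefficient (illustrative)

x = [0.0] * N
x[0] = 1.0        # direct wave
x[DELAY] = RHO    # reflected wave

tau = group_delay(x)
notch_bins = pick_notches(tau)
notch_freqs = [k * FS / N for k in notch_bins]
```

The group delay dips sharply and negatively at each spectral notch (here to −8 samples) and stays positive elsewhere, which is what makes a fixed negative threshold a workable detector.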
Experiments on Pinna Spectral Notches
The publicly available CIPIC database is used, which provides data for several
subjects together with their pinna images and corresponding anthropometric
parameters.
HRIRs are measured in a head-centered interaural polar coordinate system,
with elevation uniformly sampled from −45° to 230.625° in the median plane.
Based on prior research, pinna spectral notch frequencies are assumed to lie
in the range from 5 kHz to 16 kHz, and are extracted using robust signal
processing techniques.
The pinna image of a particular subject is uniformly scaled to match pinna
parameters such as $d_5$ (pinna height) and $d_6$ (pinna width).
The distance $d(\phi)$ between the pinna reflection point and the entrance of
the ear canal is calculated from Equation 13 for the frontal median plane,
$\phi \in [-45°, 90°]$.
Each notch point is mapped to $(d(\phi), \pi + \phi)$ in the right pinna and
$(d(\phi), \phi)$ in the left pinna with respect to the entrance of the ear canal.
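The mapping from a notch frequency to a point on the pinna image can be sketched as a polar-to-Cartesian conversion centred on the ear canal entrance. The speed of sound, the axis orientation and the function name are assumptions, and the result is in metres; converting to image pixels would need the image scale, which is not given here.

```python
import math


def notch_to_point(f0_hz, elevation_deg, right_pinna, c=343.0):
    """Return (x, y) in metres relative to the ear canal entrance.

    The distance follows Equation 13 (negative reflection coefficient).
    The polar angle is the elevation for the left pinna and the elevation
    plus pi for the right pinna.
    """
    d = c / (2.0 * f0_hz)
    angle = math.radians(elevation_deg)
    if right_pinna:
        angle += math.pi
    return d * math.cos(angle), d * math.sin(angle)


# Illustrative first-notch frequency at zero elevation.
left = notch_to_point(8575.0, 0.0, right_pinna=False)
right = notch_to_point(8575.0, 0.0, right_pinna=True)
```

With these numbers the reflection point lies 20 mm from the ear canal entrance, mirrored between the left and right pinna.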
Pinna Spectral Notches Overlaid on HRTF
[Figure: Pinna spectral notches overlaid on the HRTF, plotted as elevation (degrees) against frequency (kHz) at azimuth 0.00°. Panels (a1), (a2): Subject 124, left pinna. Panels (b1), (b2): Subject 163, right pinna. Panels (c1), (c2): Subject 119, right pinna. Panels (d1), (d2): Subject 017, right pinna.]
Pinna Notches Marked on Ear Contour
[Figure: Pinna notches marked on the ear contour. Panels (a1), (a2): Subject 124. Panels (b1), (b2): Subject 163. Panels (c1), (c2): Subject 119. Panels (d1), (d2): Subject 17.]
Conclusion
A fast method is proposed to extract accurate pinna spectral notches that
follow the actual pinna wall structure.
The main novelty of the proposed work is the efficient reconstruction of the
HRIR over the median plane of a virtual spherical array simulated using the
Fourier Bessel series, especially at lower elevation angles.
HRIRs corresponding to lower elevation angles suffer from knee reflections,
which contribute only slightly to the measured HRIR compared with other
anatomical reflections.
The proposed method can suppress the knee reflections because it preserves
the strong variations due to the pinna alone under finite truncation.
The extracted pinna spectral notches are also very accurate and smooth
compared with the conventional spherical array based approach.
The proposed method extracts the pinna spectral notches robustly even if the
HRIR is measured over the complete hemisphere.
References
[1] V. Ralph Algazi, Richard O. Duda, Dennis M. Thompson, and Carlos Avendano,
“The CIPIC HRTF database,” in Applications of Signal Processing to Audio and
Acoustics, 2001 IEEE Workshop on the. IEEE, 2001, pp. 99–102.
[2] Vikas C. Raykar, Ramani Duraiswami, and B. Yegnanarayana, “Extracting
the frequencies of the pinna spectral notches in measured head related impulse
responses,” The Journal of the Acoustical Society of America, vol. 118, no. 1,
pp. 364–374, 2005.
[3] V. Ralph Algazi, Richard O. Duda, and Patrick Satarzadeh, “Physical and filter
pinna models based on anthropometry,” in Audio Engineering Society Convention
122. Audio Engineering Society, 2007.
[4] Dwight W. Batteau, “The role of the pinna in human localization,” Proceedings
of the Royal Society of London. Series B. Biological Sciences, vol. 168, no.
1011, pp. 158–180, 1967.