Upload
others
View
7
Download
0
Embed Size (px)
Citation preview
Detecting proximity from personal audio recordings - Ellis, Satoh, Chen 2014-09-14 - /121
1. Detecting Proximity 2. Audio Similarity: Cross Correlation 3. Audio Similarity: Fingerprints 4. Evaluation & Conclusions
Detecting proximity from personal audio recordings
Dan Ellis, Hiroyuki Satoh, Zhuo Chen LabROSA, Columbia Univ., NY USA
ICSI, Berkeley, CA, USA Morikawa lab, University of Tokyo, Tokyo, Japan
[email protected] http://labrosa.ee.columbia.edu/
COLUMBIA UNIVERSITYIN THE CITY OF NEW YORK
Laboratory for the Recognition andOrganization of Speech and Audio
Detecting proximity from personal audio recordings - Ellis, Satoh, Chen 2014-09-14 - /12
1. Detecting Proximity
• Easy for smartphones to “listen” to ambient audio • what can they do with the
information? !
!
2
• Ubiquitous Smartphones • opportunities from having
everyone’s phones connected via the cloud?
Detecting proximity from personal audio recordings - Ellis, Satoh, Chen 2014-09-14 - /12
Detecting Proximity• Application:
Who did I speak with?
• Approaches: • High-resolution indoor GPS
- walls? • Local wireless (NFC, Bluetooth)
- what is the right range?
• Ambient audio similarity • all phones have microphones • “radius” depends on noisiness
- matches practical conversation radius
3
09:00
09:30
10:00
10:30
11:00
11:30
12:00
12:30
13:00
13:30
14:00
14:30
15:00
15:30
16:00
16:30
17:00
17:30
18:00
preschool
cafepreschoolcafelecture
officeofficeoutdoorgroup
lab
cafemeeting2
office
office
outdoorlabcafe
Ron
Manuel
Arroyo?
Lesser
2004-09-13
L2cafeoffice
outdoorlecture
outdoormeeting2
outdoorofficecafeoffice
outdoorofficepostlecoffice
DSP03
compmtg
Mike
Sambarta?
2004-09-14
Detecting proximity from personal audio recordings - Ellis, Satoh, Chen 2014-09-14 - /12
Data: Poster Sessions• Simultaneous recordings by multiple subjects
in a real “poster session” • two attempts: SANE 2013, NEMISIG 2014
• Live subjects wore Red Hats • warning others • for tracking in video
• Final data set • six subjects • 30 mins with
at least 5 of 6
4
Detecting proximity from personal audio recordings - Ellis, Satoh, Chen 2014-09-14 - /12
2. Audio Similarity: Cross Correlation• Are two audio signals
“proximal”? • recorded at slightly
different places • .. different orientations, etc
• Expect differences in detail, but shared core
!!
• Cross-correlation reveals common part
5
’ MA(ej!) = HA(e
j!)C(ej!) +NA(ej!)
’ MB(ej!) = HB(e
j!)C(ej!) +NB(ej!)
SMAMB =MAM⇤B
=HAH⇤B |C|2
+HACN⇤B +H⇤
BC⇤NA +NAN
⇤B
Detecting proximity from personal audio recordings - Ellis, Satoh, Chen 2014-09-14 - /12
Short-Time Cross Correlation• Calculate cross-correlation between
corresponding short windows • e.g. 2 s windows every 1 s
• Find peak in time domain correlation • lag at peak = best local time alignment • value at peak (normalized by energies)
= degree of similarity between signals
• Plot best lag as vs. window time • threshold peak value to ignore chance correlation
6
Detecting proximity from personal audio recordings - Ellis, Satoh, Chen 2014-09-14 - /12
skewview!
• Compiled MATLAB application tocalculate & plot short-time cross correlation of long-duration signals • raw cross-correlation
plotted in grayscale • export peak lag times & values
http://labrosa.ee.columbia.edu/projects/skewview/
7
t_ref / sect_
targ
− t_
ref /
sec
Ref: 100_1198 Targ: 20140125 134129
500 1000 1500 2000−80.8
−80.6
−80.4
−80.2
−80
−79.8
−0.06
−0.04
−0.02
0
0.02
0.04
0.06
Detecting proximity from personal audio recordings - Ellis, Satoh, Chen 2014-09-14 - /12
3. Audio Similarity: Fingerprints• Landmark-based audio fingerprinting:
• Represent audio as “constellation” of energy peaks
• Index nearby pairs of peaks for rapid search
• Match as multiple peaks in same relative positions
• Robust to… • channels - peak level not used • noise - only a few peaks need to match
• Fast search in large archives (Shazam)
8
Query audio
0
500
1000
1500
2000
2500
3000
3500
4000
Match: 05−Full Circle at 0.032 sec
0.5 1 1.5 2 2.5 3 3.5 4 4.5 50
500
1000
1500
2000
2500
3000
3500
4000
time / secfre
q / H
z
Avery Wang, 2003
Detecting proximity from personal audio recordings - Ellis, Satoh, Chen 2014-09-14 - /12
audfprint• Open source audio fingerprinting tool
• Matlab: http://labrosa.ee.columbia.edu/matlab/audfprint/ Python: https://github.com/dpwe/audfprint
• Rapid retrieval of short noisy queries within large databases • 10 sec over-the-air
queries within 100k+ reference items in ~1 s
9
Detecting proximity from personal audio recordings - Ellis, Satoh, Chen 2014-09-14 - /12
4. Results• Mutual proximity between all six channels:
Cross-correlation Fingerprints
!
!
!
!
!
• Various “proximal episodes” between targets visible (dark)
• Good agreement between two methods
10
bmcrdl
dehpzc
crbm
dldehpzc
dlbm
crdehpzc
debm
crdl
hpzc
hpbm
crdl
dezc
zc
time / min1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
bmcrdl
dehp
counts0
5
bmcrdl
dehpzc
crbm
dldehpzc
dlbm
crdehpzc
debm
crdl
hpzc
hpbm
crdl
dezc
zc
time / min1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
bmcrdl
dehp
correlation0
0.3
Detecting proximity from personal audio recordings - Ellis, Satoh, Chen 2014-09-14 - /12
Evaluation• Ground truth?
• did not hand-mark video…
• Cross-correlation is quite reliable … • use it as reference for fingerprinting !
• DET curve for thresholded proximity • fprint vs. xcorr
• Execution time [for 6 x 30 min tracks]: • Cross-corr : ~(0.006 x tdur) x N2 [427 s] • Fingerprint: ~(0.030 x tdur) x N [317 s, linear]
11
0.1 0.5 2 5 10 20 40 0.1 0.2 0.5 1 2 5 10 20
40
False Alarm probability (in %)
Miss
pro
babil
ity (i
n %
)
FP DET curve for XC threshold 0.15
Detecting proximity from personal audio recordings - Ellis, Satoh, Chen 2014-09-14 - /12
Conclusions• Similarity between ambient audio
e.g. from smartphone micscan be used to track personal proximity
• Similarity can be measured by: • cross-correlation (accurate but expensive) • landmark fingerprinting (fast, but adequate?)
• Experiments showed both approachesgave very similar results • fingerprinting suitable for scaling to very large
datasets, e.g. across many users
12