Statistical automatic identification of microchiroptera from echolocation calls
Lessons learned from human automatic speech recognition
Mark D. Skowronski and John G. Harris
Computational Neuro-Engineering Lab
Electrical and Computer Engineering
University of Florida
Gainesville, FL, USA
November 19, 2004
Overview
• Motivations for bat acoustic research
• Review bat call classification methods
• Contrast with 1970s human ASR
• Experiments
• Conclusions
Bat research motivations
• Bats are among:
  – the most diverse,
  – the most endangered,
  – and the least studied mammals.
• Close relationship with insects
  – agricultural impact
  – disease vectors
• Acoustic research is non-invasive and targets a significant behavioral domain (echolocation)
• Simplified biological acoustic communication system (compared to human speech)
Echolocation calls
• Features (holistic)
  – Frequency extrema
  – Duration
  – Shape
  – # harmonics
  – Call interval

[Figure: Mexican free-tailed calls, concatenated]
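The holistic features listed above can be estimated directly from a framed spectral analysis of a call. A minimal NumPy sketch (illustrative only — the original work used Matlab, and all function names, frame sizes, and thresholds here are assumptions), extracting frequency extrema and duration from a per-frame FFT peak track:

```python
import numpy as np

def call_features(x, fs, frame_len=128, hop=64, thresh_db=-30.0):
    """Estimate holistic call features: frequency extrema and duration.

    Frames the signal, keeps frames within thresh_db of the loudest
    frame, and tracks the FFT peak frequency across active frames.
    """
    n_frames = (len(x) - frame_len) // hop + 1
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    peak_db = 20 * np.log10(spec.max(axis=1) + 1e-12)
    active = peak_db > peak_db.max() + thresh_db       # frames containing the call
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    track = freqs[spec.argmax(axis=1)][active]         # peak-frequency contour
    duration = active.sum() * hop / fs
    return track.min(), track.max(), duration

# Synthetic downward FM sweep, 60 kHz -> 25 kHz over 5 ms at 200 kS/s
fs = 200_000
t = np.arange(0, 0.005, 1.0 / fs)
f0, f1 = 60_000.0, 25_000.0
x = np.sin(2 * np.pi * (f0 * t + (f1 - f0) / (2 * t[-1]) * t ** 2))
fmin, fmax, dur = call_features(x, fs)
```

Shape, harmonic count, and call interval would need additional logic; the sketch shows only the frequency-extrema and duration portion of the feature set.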
Current classification methods
• Expert spectrogram readers
  – Manual or automatic feature extraction
  – Comparison with exemplar spectrograms
• Automatic classification
  – Decision trees
  – Discriminant function analysis
Parallels the knowledge-based approach to human ASR from the 1970s (acoustic phonetics, expert systems, cognitive approach).
Acoustic phonetics
• Bottom-up paradigm
  – Frames, boundaries, groups, phonemes, words
• Manual or automatic feature extraction
  – Determined by experts to be important for speech
• Classification
  – Decision tree, discriminant functions, neural network, Gaussian mixture model, Viterbi path

Example phoneme sequence (ARPAbet): DH AH F UH T B AO L G EY EM IH Z OW V ER ("the football game is over")
Acoustic phonetics limitations
• Variability of conversational speech
  – Complex rules, difficult to implement
• Feature estimates are brittle
  – Variable noise robustness
• Hard decisions: errors accumulate

Human ASR shifted to the information-theoretic (machine learning) paradigm, which better accounts for the variability of speech and noise.
Information theoretic ASR
• Data-driven models from computer science
  – Non-parametric: dynamic time warping (DTW)
  – Parametric: hidden Markov model (HMM)
• Frame-based
  – Expert information in feature extraction
  – Models account for feature and temporal variability
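DTW, the non-parametric option above, aligns two variable-length feature sequences by a minimum-cost warping path. A minimal NumPy sketch (illustrative, not the authors' implementation):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences.

    a, b: arrays of shape (Ta, d) and (Tb, d) of per-frame features.
    Returns the cumulative Euclidean cost of the best warping path.
    """
    Ta, Tb = len(a), len(b)
    cost = np.full((Ta + 1, Tb + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, Ta + 1):
        for j in range(1, Tb + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible predecessors
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[Ta, Tb]

# Toy usage: a time-stretched copy warps onto the original at zero cost
a = np.array([[0.0], [1.0], [2.0], [3.0]])
b = np.repeat(a, 2, axis=0)  # every frame doubled in duration
d_stretch = dtw_distance(a, b)
```

Classification then labels an unknown call with the class of its nearest template under this distance, which is why DTW needs no training beyond storing templates.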
Data collection
• UF Bat House, home to 60,000 bats
  – Mexican free-tailed bat (vast majority)
  – Evening bat
  – Southeastern myotis
• Continuous recording
  – 90 minutes around sunset
  – ~20,000 calls
• Equipment:
  – B&K microphone (4939), 100 kHz bandwidth
  – B&K preamp (2670)
  – Custom amplifier/anti-aliasing filter
  – NI 6036E 200 kS/s A/D card
  – Laptop running Matlab
Experiment design
• Hand labels
  – 436 calls (2% of the data)
  – Four classes, a priori proportions: 34, 40, 20, 6%
  – All experiments on hand-labeled data only
  – No hand-labeled calls excluded from experiments

[Figure: the four call classes, labeled 1–4]
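Leave-one-out evaluation, reported in the results below, trains on all labeled calls but one and tests on the held-out call, repeating for every call. A sketch in NumPy (the nearest-centroid classifier here is a hypothetical stand-in, not one of the classifiers used in the study):

```python
import numpy as np

def loo_accuracy(X, y, classify):
    """Leave-one-out accuracy: hold out each call in turn, train the
    classifier on the rest, and test on the held-out call."""
    hits = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        hits += classify(X[mask], y[mask], X[i]) == y[i]
    return hits / len(X)

def nearest_centroid(X_train, y_train, x):
    """Hypothetical stand-in classifier: label of the closest class centroid."""
    labels = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in labels])
    return labels[np.argmin(np.linalg.norm(centroids - x, axis=1))]

# Toy usage on a separable two-class set
X = np.array([[0.0], [0.1], [5.0], [5.1]])
y = np.array([0, 0, 1, 1])
acc = loo_accuracy(X, y, nearest_centroid)
```

With only 436 labeled calls, leave-one-out uses the data efficiently at the cost of one training run per call.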
Experiments
• Baseline
  – Features
    • Zero crossing
    • MUSIC super-resolution frequency estimator
  – Classifier
    • Discriminant function analysis, quadratic boundaries
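The zero-crossing feature can be illustrated with a short sketch (NumPy; illustrative only — the MUSIC estimator would replace this function in the higher-accuracy baseline):

```python
import numpy as np

def zero_crossing_freq(x, fs):
    """Estimate dominant frequency from the zero-crossing rate.

    A (locally) sinusoidal signal crosses zero twice per cycle,
    so f is approximately crossings / (2 * duration).
    """
    signs = np.signbit(x)
    crossings = np.count_nonzero(signs[1:] != signs[:-1])
    return crossings * fs / (2.0 * (len(x) - 1))

# Toy usage: 40 kHz tone sampled at 200 kS/s for 5 ms
fs = 200_000
t = np.arange(0, 0.005, 1.0 / fs)
f_est = zero_crossing_freq(np.sin(2 * np.pi * 40_000 * t), fs)
```

Zero crossings are cheap but noise-sensitive; MUSIC trades speed for robustness, consistent with the conclusions later in the deck.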
• DTW and HMM
  – Features
    • Frequency (MUSIC), log energy, first derivatives (HMM only)
  – HMM
    • 5 states/model
    • 4 Gaussian mixtures/state
    • Diagonal covariances
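HMM classification scores each call under one trained model per class and picks the most likely. A minimal forward-algorithm sketch for a diagonal-covariance Gaussian HMM (NumPy; a single Gaussian per state for brevity, whereas the study used 4-component mixtures, and all names here are illustrative):

```python
import numpy as np

def logsumexp(a, axis=None):
    """Numerically stable log(sum(exp(a)))."""
    m = np.max(a, axis=axis, keepdims=True)
    s = m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))
    return np.squeeze(s, axis=axis) if axis is not None else float(s)

def hmm_loglik(obs, log_pi, log_A, means, variances):
    """Forward-algorithm log-likelihood of obs under a Gaussian HMM.

    obs: (T, d) feature frames; log_pi: (S,) initial log-probs;
    log_A: (S, S) transition log-probs; means, variances: (S, d)
    diagonal-covariance Gaussian parameters per state.
    """
    def log_emit(x):
        # log N(x; mean_s, diag(var_s)) for every state s at once
        return -0.5 * np.sum(np.log(2 * np.pi * variances)
                             + (x - means) ** 2 / variances, axis=1)

    alpha = log_pi + log_emit(obs[0])
    for x in obs[1:]:
        # alpha_j(t) = emit_j(t) + logsum_i [alpha_i(t-1) + log_A[i, j]]
        alpha = log_emit(x) + logsumexp(alpha[:, None] + log_A, axis=0)
    return logsumexp(alpha)

# Sanity check: a one-state HMM reduces to a sum of Gaussian log-densities
obs = np.zeros((3, 1))
ll = hmm_loglik(obs, log_pi=np.array([0.0]), log_A=np.array([[0.0]]),
                means=np.zeros((1, 1)), variances=np.ones((1, 1)))
```

A call is then labeled by the argmax of `hmm_loglik` over the per-species models; training the parameters (Baum–Welch) is the slow step noted in the conclusions.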
Results
• Baseline, zero crossing
  – Leave-one-out: 72.5% correct
  – Repeated trials: 72.5 ± 4% (mean ± std)
• Baseline, MUSIC
  – Leave-one-out: 79.1%
  – Repeated trials: 77.5 ± 4%
• DTW, MUSIC
  – Leave-one-out: 74.5%
  – Repeated trials: 74.1 ± 4%
• HMM, MUSIC
  – Test on train: 85.3%
Confusion matrices
(rows: true class, columns: predicted class; last column: per-class accuracy)

Baseline, zero crossing
        1    2    3    4
  1   107   38    1    2   72.3%
  2    21  134   16    4   76.6%
  3     2   29   57    0   64.8%
  4     4    3    0   18   72.0%
  Overall: 72.5%

Baseline, MUSIC
        1    2    3    4
  1   110   36    1    1   74.3%
  2    12  149   12    2   85.1%
  3     4   18   66    0   75.0%
  4     3    2    0   20   80.0%
  Overall: 79.1%

DTW, MUSIC
        1    2    3    4
  1   115   29    0    4   77.7%
  2    32  131   11    1   74.9%
  3     5   20   63    0   71.6%
  4     5    4    0   16   64.0%
  Overall: 74.5%

HMM, MUSIC
        1    2    3    4
  1   118   25    0    5   79.7%
  2    10  154    5    6   88.0%
  3     1   12   75    0   85.2%
  4     0    0    0   25  100.0%
  Overall: 85.3%
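The per-class and overall accuracies shown with each matrix follow directly from the counts; for example, for the HMM/MUSIC matrix (NumPy):

```python
import numpy as np

# HMM/MUSIC confusion matrix from the slide
# (rows: true class, columns: predicted class)
conf = np.array([
    [118,  25,  0,  5],
    [ 10, 154,  5,  6],
    [  1,  12, 75,  0],
    [  0,   0,  0, 25],
])

per_class = np.diag(conf) / conf.sum(axis=1)  # right-hand column of the table
overall = np.trace(conf) / conf.sum()         # 372 of the 436 calls correct
```

The row sums (148, 175, 88, 25) also recover the a priori class proportions of 34, 40, 20, and 6% given in the experiment design.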
Conclusions
• Human ASR algorithms are applicable to bat echolocation calls
• Experiments
  – Weakness: accuracy of the class labels
  – HMM most accurate, but undertrained
  – MUSIC frequency estimate robust but slow
• Machine learning
  – DTW: fast training, slow classification
  – HMM: slow training, fast classification
Further information
• http://www.cnel.ufl.edu/~markskow
• [email protected]
• DTW reference:
  – L. Rabiner and B. Juang, Fundamentals of Speech Recognition, Prentice Hall, Englewood Cliffs, NJ, 1993.
• HMM reference:
  – L. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," in Readings in Speech Recognition, A. Waibel and K.-F. Lee, Eds., pp. 267–296, Kaufmann, San Mateo, CA, 1990.