


Page 1: Mark D. Skowronski Computational Neuro-Engineering Lab Electrical and Computer Engineering

Statistical automatic identification of microchiroptera from echolocation calls

Lessons learned from human automatic speech recognition

Mark D. Skowronski
Computational Neuro-Engineering Lab
Electrical and Computer Engineering
University of Florida
Gainesville, FL, USA
December 1, 2004

Page 2

Overview
• Motivations for bat acoustic research
• Review bat call classification methods
• Contrast with 1970s human ASR
  – Machine learning vs. expert knowledge
• Experiments
• Conclusions and future work

Page 3

Bat research motivations
• Bats are among:
  – the most diverse (25% of all mammal species),
  – the most endangered,
  – and the least studied mammals.
• Close relationship with insects
  – agricultural impact
  – disease vectors
• Acoustical research
  – non-invasive (compared to netting)
  – significant domain (echolocation)

Page 4

More motivations
• Calls simple compared to human speech
• Same goals as human ASR
  – Detection
  – Feature extraction
  – Classification
  – Noise-robust performance
• Easier to design/develop models
• Domain between toy problems and ASR

Page 5

Bat echolocation
• Ultrasonic, brief chirps (~active sonar)
• Determine range, velocity of nearby objects (clutter, prey, other bats)
• Tailored for task, environment

Tadarida brasiliensis (Mexican free-tailed bat)

Listen to 10x time-expanded search calls (embedded audio)

Page 6

Echolocation calls
• Two characteristics
  – Frequency modulated (range information)
  – Constant frequency (velocity information)
• Features (holistic)
  – Freq. extrema
  – Duration
  – Shape
  – # harmonics
  – Call interval

Mexican free-tailed calls, concatenated

Page 7

Current classification methods
• Expert sonogram readers
  – Manual or automatic feature extraction
    • Griffin 1958, Fenton and Bell 1981
  – Comparison with exemplar sonograms
  – Decision trees
• Automatic classification
  – Discriminant function analysis
    • By far the most popular method in literature
    • Available in statistical software packages (SAS, SPSS)
  – Others
    • Artificial neural networks, Parsons 2001
    • Spectrogram correlation, Pettersson Elektronik AB

Parallels the 1970s acoustic-phonetic approach to human ASR.

Page 8

Acoustic phonetics
• Bottom up paradigm
  – Frames, boundaries, groups, phonemes, words
  – Mimics techniques of expert spectrogram readers
• Manual or automatic feature extraction
  – Formants, voicing, duration, intensity, transitions
• Classification
  – Decision tree, discriminant functions, neural network, Gaussian mixture model, Viterbi path

DH AH F UH T B AO L G EY EM IH Z OW V ER

Page 9

Acoustic phonetics limitations
• Variability of conversational speech
  – Complex rules, difficult to train
• Boundaries difficult to define
  – Coarticulation, reduction
• Feature estimates brittle
  – Variable noise robustness
• Hard decisions, errors accumulate

Human ASR shifted to the machine learning paradigm by the 1980s: it is better able to account for the variability of speech and for noise.

Page 10

Machine learning ASR
• Data-driven models
  – Non-parametric: dynamic time warp (DTW)
  – Parametric: hidden Markov model (HMM)
• Frame-based
  – Identical features from every frame
  – Expert information in feature extraction
  – Models account for feature, temporal variabilities

Machine learning dominates state-of-the-art ASR.
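As an illustration of the non-parametric, frame-based approach, the sketch below computes a basic dynamic time warp cost between two feature sequences and classifies a query by its nearest labeled template. It is a minimal textbook formulation, not the implementation used in the experiments: the function names, the Euclidean local cost, the unconstrained step pattern, and the toy template labels are all assumptions.

```python
import numpy as np

def _as_frames(x):
    """Return x as a (n_frames, n_features) float array."""
    x = np.asarray(x, dtype=float)
    return x[:, None] if x.ndim == 1 else x

def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature frames.

    Assumption: Euclidean local cost and the basic step pattern
    (match, insertion, deletion) with no slope constraints.
    """
    a, b = _as_frames(a), _as_frames(b)
    n, m = len(a), len(b)
    # Local cost between every frame of a and every frame of b.
    local = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = local[i - 1, j - 1] + best_prev
    return acc[n, m]

def classify_dtw(query, templates):
    """Return the label of the template with the lowest DTW cost.

    templates: list of (label, sequence) pairs; labels are hypothetical.
    """
    return min(templates, key=lambda t: dtw_distance(query, t[1]))[0]
```

Because every stored template is compared against each query, this style of classifier needs essentially no training but becomes slow at classification time as the template set grows.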

Page 11

Data collection
• UF Bat House, home to 60,000 bats
  – Mexican free-tailed bat (vast majority)
  – Evening bat
  – Southeastern myotis
• Continuous recording
  – 90 minutes around sunset
  – ~20,000 calls
• Equipment:
  – B&K mic (4939), 100 kHz
  – B&K preamp (2670)
  – Custom amp/AA filter
  – NI 6036E 200 kS/s A/D card
  – Laptop, Matlab
  – Portable

Page 12

Experiment design
• Hand labels as ground truth
  – Narrowband spectrogram
  – 436 calls (2% of data) in 3 hours (80x real time)
  – Four classes, a priori: 34, 40, 20, 6%
  – All experiments on hand-labeled data only
  – No hand-labeled calls excluded from experiments

(Figure: panels labeled 1, 2, 3, 4)

Page 13

Methods
• Baseline, from the literature
  – Features
    • Duration
    • Zero crossing: Fmin, Fmax, Fmax_energy
    • MUSIC super resolution frequency estimator
  – Classifier
    • Discriminant function analysis, quadratic boundaries
• DTW and HMM
  – Features
    • Frequency (MUSIC), log energy, Δs (HMM only)
  – HMM
    • 5 states/model
    • 4 Gaussian mixtures/state, diagonal covariances
• Tests
  – Leave one out
  – Repeated trials: 25% test data, 1000 trials
  – Test on train data (HMM only)
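The zero-crossing baseline features can be illustrated with a short sketch. It estimates one frequency per cycle from the spacing of upward zero crossings and reports duration, Fmin, and Fmax for a call that has already been segmented. The function names and the linear interpolation of crossing instants are assumptions, the input is assumed to contain at least two upward crossings, and Fmax_energy is not computed here.

```python
import numpy as np

def zero_crossing_freqs(x, fs):
    """Per-cycle frequency estimates (Hz) from upward zero crossings.

    Assumption: x is a clean, already-segmented call sampled at fs Hz.
    """
    x = np.asarray(x, dtype=float)
    # Sample indices k where the signal rises through zero between k and k+1.
    k = np.flatnonzero((x[:-1] < 0) & (x[1:] >= 0))
    # Linearly interpolate the crossing instant (in samples).
    t = k + x[k] / (x[k] - x[k + 1])
    periods = np.diff(t) / fs          # seconds per cycle
    return 1.0 / periods

def zc_features(x, fs):
    """Duration, Fmin and Fmax of one call (hypothetical feature names)."""
    f = zero_crossing_freqs(x, fs)
    return {
        "duration_s": len(x) / fs,
        "fmin_hz": float(f.min()),
        "fmax_hz": float(f.max()),
    }
```

On a noisy recording, spurious crossings would corrupt the per-cycle estimates, which is one reason a zero-crossing estimator is fast but less robust than a model-based estimator.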

Page 14

Results
• Baseline, zero crossing
  – Leave one out: 72.5% correct
  – Repeated trials: 72.5 ± 4% (mean ± std)
• Baseline, MUSIC
  – Leave one out: 79.1%
  – Repeated trials: 77.5 ± 4%
• DTW
  – Leave one out: 74.5%
  – Repeated trials: 74.1 ± 4%
• HMM
  – Test on train: 85.3%
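The two held-out protocols behind these numbers, leave one out and repeated trials with 25% test data, can be sketched as follows. The nearest-class-mean classifier is only a stand-in so that the example runs on its own; it is not one of the classifiers evaluated here, and all function names are assumptions.

```python
import numpy as np

def nearest_mean_fit(X, y):
    """Stand-in classifier: store one mean vector per class."""
    labels = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in labels])
    return labels, means

def nearest_mean_predict(model, X):
    labels, means = model
    d = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
    return labels[d.argmin(axis=1)]

def leave_one_out(X, y):
    """Train on all samples but one, test on the held-out sample, repeat."""
    hits = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        model = nearest_mean_fit(X[keep], y[keep])
        hits += int(nearest_mean_predict(model, X[i:i + 1])[0] == y[i])
    return hits / len(y)

def repeated_trials(X, y, test_frac=0.25, n_trials=1000, seed=0):
    """Random train/test splits; returns (mean, std) of test accuracy."""
    rng = np.random.default_rng(seed)
    n_test = int(round(test_frac * len(y)))
    acc = np.empty(n_trials)
    for t in range(n_trials):
        perm = rng.permutation(len(y))
        test, train = perm[:n_test], perm[n_test:]
        model = nearest_mean_fit(X[train], y[train])
        acc[t] = np.mean(nearest_mean_predict(model, X[test]) == y[test])
    return acc.mean(), acc.std()
```

Test-on-train accuracy, by contrast, scores a model on the same data it was fitted to, so it is generally an optimistic figure and is not directly comparable with the held-out results.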

Page 15

Confusion matrices
(rows: hand-labeled class; columns: assigned class; last column: percent of row correct)

Baseline, zero crossing (72.5% overall)
         1     2     3     4
  1    107    38     1     2    72.3%
  2     21   134    16     4    76.6%
  3      2    29    57     0    64.8%
  4      4     3     0    18    72.0%

Baseline, MUSIC (79.1% overall)
         1     2     3     4
  1    110    36     1     1    74.3%
  2     12   149    12     2    85.1%
  3      4    18    66     0    75.0%
  4      3     2     0    20    80.0%

DTW (74.5% overall)
         1     2     3     4
  1    115    29     0     4    77.7%
  2     32   131    11     1    74.9%
  3      5    20    63     0    71.6%
  4      5     4     0    16    64.0%

HMM (85.3% overall)
         1     2     3     4
  1    118    25     0     5    79.7%
  2     10   154     5     6    88.0%
  3      1    12    75     0    85.2%
  4      0     0     0    25    100%
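The percentages reported with each matrix follow directly from the counts. The sketch below recomputes them for the HMM matrix: per-class accuracy is the diagonal count divided by the row sum, overall accuracy is the trace divided by the total, and the row sums divided by the total recover the a priori class proportions. The function names are assumptions; the counts are copied from the HMM matrix.

```python
import numpy as np

# Counts copied from the HMM (test on train) confusion matrix.
hmm = np.array([
    [118,  25,  0,  5],
    [ 10, 154,  5,  6],
    [  1,  12, 75,  0],
    [  0,   0,  0, 25],
])

def per_class_accuracy(cm):
    """Fraction of each row (hand-labeled class) on the diagonal."""
    cm = np.asarray(cm, dtype=float)
    return np.diag(cm) / cm.sum(axis=1)

def overall_accuracy(cm):
    """Fraction of all calls on the diagonal."""
    cm = np.asarray(cm, dtype=float)
    return np.trace(cm) / cm.sum()

def class_priors(cm):
    """Fraction of all calls belonging to each hand-labeled class."""
    cm = np.asarray(cm, dtype=float)
    return cm.sum(axis=1) / cm.sum()

print(np.round(100 * per_class_accuracy(hmm), 1))  # per-class percent correct
print(round(100 * overall_accuracy(hmm), 1))       # overall percent correct
print(np.round(100 * class_priors(hmm)))           # class proportions in percent
```

The same functions apply unchanged to the other three matrices.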

Page 16

Comments
• Experiments
  – Weakness: accuracy of class labels
  – No labeled calls excluded, realistic
  – HMM most accurate, but undertrained
  – MUSIC frequency estimate robust, but 1000x slower than zero-crossing analysis (ZCA) (20x real time)
• Machine learning
  – Expert information still necessary
    • Feature extraction (dimensionality reduction)
    • Model parameters
  – DTW: fast training, slow classification
  – HMM: slow training, fast classification (real time)

Page 17

Future work
• Ultimate goal
  – Real-time portable system for species ID
  – Commercial product possibilities
• Feature extraction
  – Robust to:
    • Broadband noise
    • Echoes
    • Unknown distance between bat and microphone
  – Chirp model, echo model
  – Faster frequency estimates
  – Match assumptions of classifiers

Page 18

More future work
• Detection
  – Replace energy-based method with principled statistical methods using frame-based features
• Classification
  – Accurate class labels for training
    • Netting
    • Record from known bat roosts (preferred)
  – Pseudo-sinusoidal input
    • Oscillator network
    • Echo state network
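For context on the Detection item, an energy-based detector might look like the sketch below: it computes log energy per frame and marks runs of frames that exceed the median level by a fixed margin. The slides do not describe the detector actually used, so the frame length, hop, threshold rule, and function names here are all assumptions.

```python
import numpy as np

def frame_log_energy(x, frame_len, hop):
    """Log energy (dB) of overlapping frames of x."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])
    return 10.0 * np.log10(np.sum(frames ** 2, axis=1) + 1e-12)

def detect_calls(x, fs, frame_len=256, hop=128, threshold_db=10.0):
    """Return (start_s, stop_s) for each run of high-energy frames.

    Assumption: the median frame energy approximates the noise floor,
    which only holds when calls occupy a minority of the recording.
    """
    energy = frame_log_energy(x, frame_len, hop)
    active = energy > np.median(energy) + threshold_db
    # Pad with False so runs touching either end still produce edges.
    padded = np.concatenate(([False], active, [False]))
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)   # first active frame of each run
    stops = np.flatnonzero(edges == -1)   # first inactive frame after each run
    return [(s * hop / fs, ((e - 1) * hop + frame_len) / fs)
            for s, e in zip(starts, stops)]
```

A fixed energy threshold makes a hard decision per frame and depends on the noise level, which is the kind of brittleness a statistical, frame-based detector is meant to reduce.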

Page 19

Information
• [email protected]
• http://www.cnel.ufl.edu/~markskow