Automatic detection of microchiroptera echolocation calls from field recordings using machine learning algorithms Mark D. Skowronski and John G. Harris Computational Neuro-Engineering Lab Electrical and Computer Engineering University of Florida, Gainesville, FL, USA May 19, 2005


Page 1: Mark D. Skowronski and John G. Harris Computational Neuro-Engineering Lab

Automatic detection of microchiroptera echolocation calls from field recordings using machine learning algorithms

Mark D. Skowronski and John G. Harris
Computational Neuro-Engineering Lab
Electrical and Computer Engineering
University of Florida, Gainesville, FL, USA
May 19, 2005

Page 2: Overview

• Motivations for acoustic bat detection
• Machine learning paradigm
• Detection experiments
• Conclusions

Page 3: Bat detection motivations

• Bats are among the most diverse yet least studied mammals (~25% of all mammal species are bats).
• Bats affect agriculture and carry diseases (directly or through parasites).
• The acoustic domain is significant for echolocating bats, and acoustic monitoring is non-invasive.
• Recorded data can be voluminous, so automated algorithms for objective and repeatable detection and classification are desired.

Page 4: Conventional methods

• Conventional bat detection/classification parallels the acoustic-phonetic paradigm of automatic speech recognition (ASR) from the 1970s.
• Characteristics of acoustic phonetics:
  – Originally mimicked human expert methods
  – First, boundaries between regions were determined
  – Second, features for each region were extracted
  – Third, features were compared using decision trees or discriminant function analysis (DFA)
• Limitations:
  – Boundaries are ill-defined and sensitive to noise
  – Many feature extraction algorithms exist, with varying degrees of noise robustness

Page 5: Machine learning

• Acoustic phonetics gave way to machine learning for ASR in the 1980s.
• Advantages:
  – Decisions based on more information
  – Mature statistical foundation for algorithms
  – Frame-based features, from expert knowledge
  – Improved noise robustness
• For bats: increased detection range

Page 6: Detection experiments

• Database of bat calls
  – 7 different recording sites, 8 species
  – 1265 hand-labeled calls (from spectrogram readings)
• Detection experiment design
  – Discrete events: 20-ms bins
  – Discrete outcomes: yes or no (does a bin contain any part of a bat call?)
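The bin-level ground truth described above can be sketched as follows. This is only an illustration of the 20-ms binning idea, not the authors' code: the function name `label_bins`, the millisecond units, and the example call intervals are all assumptions made here.

```python
import math

def label_bins(call_intervals_ms, duration_ms, bin_ms=20):
    """Return one boolean per bin: True if the bin contains any part of a call.

    call_intervals_ms: iterable of (start, end) hand-label times in ms (hypothetical format).
    """
    n_bins = math.ceil(duration_ms / bin_ms)
    labels = [False] * n_bins
    for start, end in call_intervals_ms:
        first = max(int(start // bin_ms), 0)
        # A call that ends exactly on a bin edge does not spill into the next bin.
        last = min(math.ceil(end / bin_ms) - 1, n_bins - 1)
        for b in range(first, last + 1):
            labels[b] = True
    return labels

# Hypothetical labels: one call inside bin 0, one call spanning bins 1 and 2.
calls_ms = [(5, 12), (35, 45)]
labels = label_bins(calls_ms, duration_ms=100)
print(labels)  # [True, True, True, False, False]
```

Each detector can then be scored bin by bin against these labels, which turns detection into a sequence of yes/no decisions.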

Page 7: Detectors

• Baseline
  – Threshold for frame energy
• Gaussian mixture model (GMM)
  – Model of probability distribution of call features
  – Threshold for model output probability
• Hidden Markov model (HMM)
  – Similar to GMM, but includes temporal constraints through piecewise-stationary states
  – Threshold for model output probability along Viterbi path
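A minimal sketch of the first two detectors, assuming NumPy and a diagonal-covariance GMM whose parameters have already been trained elsewhere. The function names, argument layout, and the diagonal-covariance choice are assumptions for illustration; the slides do not specify them. The HMM detector is not shown.

```python
import numpy as np

def energy_detector(frame_power_db, threshold_db):
    """Baseline: flag frames whose power (dB above the noise floor) exceeds a threshold."""
    return np.asarray(frame_power_db, dtype=float) > threshold_db

def gmm_log_likelihood(x, weights, means, variances):
    """Per-frame log p(x) under a diagonal-covariance GMM.

    x: (n_frames, n_dims); weights: (n_mix,); means and variances: (n_mix, n_dims).
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))[:, None, :]   # (n_frames, 1, n_dims)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)   # (n_mix,)
    log_exp = -0.5 * np.sum((x - means) ** 2 / variances, axis=2)     # (n_frames, n_mix)
    log_comp = np.log(weights) + log_norm + log_exp
    # Log-sum-exp over mixture components for numerical stability.
    peak = log_comp.max(axis=1, keepdims=True)
    return (peak + np.log(np.exp(log_comp - peak).sum(axis=1, keepdims=True)))[:, 0]

def gmm_detector(x, weights, means, variances, log_threshold):
    """Flag frames whose GMM log-likelihood exceeds a threshold."""
    return gmm_log_likelihood(x, weights, means, variances) > log_threshold
```

Sweeping `threshold_db` or `log_threshold` trades missed calls against false alarms, which is how a single detector yields a whole operating curve.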

Page 8: Feature extraction

• Baseline
  – Normalization: session noise floor at 0 dB
  – Feature: frame power
• Machine learning
  – Blackman window, zero-padded FFT
  – Normalization: log amplitude mean subtraction
    • From ASR: analogous to cepstral mean subtraction
    • Removes transfer function of recording environment
    • Mean across time for each FFT bin
  – Features:
    • Maximum FFT amplitude, dB
    • Frequency at maximum amplitude, Hz
    • First and second temporal derivatives (slope, concavity)
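The machine learning front end listed above could look roughly like this in NumPy. The frame length, hop size, FFT size, the small floor added before the logarithm, and the use of `np.gradient` for the derivatives are all assumptions chosen for the sketch; the slides give none of these values.

```python
import numpy as np

def extract_features(signal, fs, frame_len=256, hop=128, n_fft=1024):
    """Return an (n_frames, 6) array: power, frequency, and their first and second derivatives."""
    signal = np.asarray(signal, dtype=float)
    window = np.blackman(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    # n_fft > frame_len zero-pads each windowed frame before the FFT.
    spectrum = np.abs(np.fft.rfft(frames * window, n=n_fft, axis=1))
    log_amp = 20 * np.log10(spectrum + 1e-12)          # small floor avoids log(0)
    # Log amplitude mean subtraction: remove the mean across time for each FFT bin.
    log_amp -= log_amp.mean(axis=0, keepdims=True)
    peak_bin = log_amp.argmax(axis=1)
    power_db = log_amp[np.arange(n_frames), peak_bin]  # maximum FFT amplitude, dB
    freq_hz = peak_bin * fs / n_fft                    # frequency at maximum amplitude, Hz
    base = np.column_stack([power_db, freq_hz])
    delta = np.gradient(base, axis=0)                  # slope, in units per frame
    delta2 = np.gradient(delta, axis=0)                # concavity, in units per frame squared
    return np.hstack([base, delta, delta2])

# Synthetic check: silence followed by a 40-kHz tone sampled at 250 kHz.
fs = 250_000
t = np.arange(5000) / fs
sig = np.sin(2 * np.pi * 40_000 * t)
sig[:2500] = 0.0
feats = extract_features(sig, fs)
print(feats.shape)
```

Because the mean is taken over the whole recording, a perfectly stationary signal would be normalized away entirely, so the subtraction is best suited to recordings where calls occupy only part of the time.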

Page 9: Feature extraction examples

Page 10: Feature extraction examples

Page 11: Feature extraction examples

Six features: power (P), frequency (F), ΔP, ΔF, ΔΔP, ΔΔF

Page 12: Detection example

Page 13: Experiment results

Page 14: Experiment results

Page 15: Conclusions

• Machine learning algorithms improve detection when specificity is high (> 0.6).
• The HMM is slightly superior to the GMM and uses more temporal information, but is slower to train and test.
• Hand labels were determined using spectrograms and are biased towards high-power calls.
• Machine learning models are applicable to other species.
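Since the first conclusion is stated in terms of specificity, here is a small sketch of how sensitivity and specificity can be computed from per-bin decisions. The function names and the example arrays are hypothetical, and the code assumes the ground truth contains at least one call bin and one non-call bin.

```python
import numpy as np

def sensitivity_specificity(truth, detected):
    """Return (sensitivity, specificity) for boolean per-bin truth and detector output."""
    truth = np.asarray(truth, dtype=bool)
    detected = np.asarray(detected, dtype=bool)
    tp = np.sum(truth & detected)     # call bins correctly flagged
    fn = np.sum(truth & ~detected)    # call bins missed
    tn = np.sum(~truth & ~detected)   # non-call bins correctly rejected
    fp = np.sum(~truth & detected)    # false alarms
    return tp / (tp + fn), tn / (tn + fp)

def operating_points(truth, scores, thresholds):
    """Sweep a threshold over detector scores and collect one (sens, spec) pair per threshold."""
    scores = np.asarray(scores, dtype=float)
    return [sensitivity_specificity(truth, scores > t) for t in thresholds]

# Hypothetical five-bin example: two call bins, three non-call bins.
truth = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.1, 0.2, 0.8]
sens, spec = operating_points(truth, scores, thresholds=[0.5])[0]
print(sens, spec)
```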

Page 16: Bioacoustic applications

• To apply machine learning to other species:
  – Determine ground truth training data through expert hand labels
  – Extract relevant frame-based features, considering domain-specific noise sources (echoes, propeller noise, other biological sources)
  – Train models of features from hand-labeled data
  – Consider training “silence” models for discriminant detection/classification
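The idea of pairing a call model with a “silence” model can be sketched as a log-likelihood ratio test. To keep the example short, each class is modeled by a single diagonal Gaussian rather than a GMM or HMM; that simplification, the function names, and the synthetic two-feature data are assumptions made here, not the authors' setup.

```python
import numpy as np

def fit_diag_gaussian(frames):
    """Fit a diagonal Gaussian (mean, variance) to hand-labeled feature frames."""
    frames = np.asarray(frames, dtype=float)
    return frames.mean(axis=0), frames.var(axis=0) + 1e-6   # floor keeps variances positive

def log_likelihood(x, mean, var):
    """Per-frame log-likelihood under a diagonal Gaussian."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=1)

def llr_detector(x, call_model, silence_model, threshold=0.0):
    """Flag frames where the call model explains the data better than the silence model."""
    llr = log_likelihood(x, *call_model) - log_likelihood(x, *silence_model)
    return llr > threshold

# Synthetic [power dB, frequency Hz] frames standing in for hand-labeled training data.
rng = np.random.default_rng(0)
call_frames = rng.normal(loc=[30.0, 40_000.0], scale=[3.0, 2_000.0], size=(200, 2))
silence_frames = rng.normal(loc=[0.0, 20_000.0], scale=[3.0, 8_000.0], size=(200, 2))
call_model = fit_diag_gaussian(call_frames)
silence_model = fit_diag_gaussian(silence_frames)
decisions = llr_detector([[29.0, 41_000.0], [1.0, 9_000.0]], call_model, silence_model)
print(decisions)
```

Comparing two models removes the need to pick an absolute likelihood threshold for the call model alone, since the silence model absorbs changes in background level.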

Page 17: Further information

• http://www.cnel.ufl.edu/~markskow
• [email protected]

Acknowledgements

Bat data kindly provided by Brock Fenton, U. of Western Ontario, Canada