Wild Dolphin Project
11-751 Speech Final Project
by
Jiazhi Ou (jzou@cs.cmu.edu)
Tal Blum (tblum@cs.cmu.edu)
Outline
• Wild Dolphin Project, Dolphin Speech
• Data, Labeling, Labeling problems
• Previous work
• Model training
• Experiments & Results
• Conclusions
The Wild Dolphin Project (WDP)
The Wild Dolphin Project (WDP), founded by Dr. Denise Herzing in 1985, is engaged in an ambitious, long-term scientific study of a specific pod of Atlantic spotted dolphins that live 40 miles off the coast of the Bahamas, in the Atlantic Ocean. For about 100 days each year, Phase I research has involved the photographing, videotaping, and audio taping of a group of resident dolphins, aiming to learn about their lives.
http://www.wilddolphinproject.org/index.cfm
Dolphin’s Speech
• Range of frequencies is wider than in human speech
• Two mechanisms for producing sound simultaneously
• Directionality of some of the frequencies
• Carried in water
• Can travel large distances
Dolphin speech is very different from human speech
Dolphin’s Speech (2)
Is used for:
• Identification
• Communicating
  • Fighting
  • Defending
  • Courting
  • Warning
  • Calling
• Hunting
Dolphin’s Speech (3)
3 main types:
• Whistles
  • Signature
  • Non-signature
• Clicks
• Spike trains
What do we know?
• Not much
• We know that each dolphin has a unique whistle, called its signature whistle.
• A baby dolphin's signature whistle is similar to those of the dolphins that are in close contact with it.
Data
• 164 files containing sounds of one dolphin whose name is known
• Average file length is 7 sec
• Total data length is less than 20 minutes, of which about half is silence
• The data does not contain all of the relevant frequencies
Labeling
• Dolphin Names
• Dolphin ID project
• Pause, Noise, Dolphin Signature Whistles, Dolphin Non-Signature Whistles
Labeling Problems
• How do we distinguish between the two kinds of whistles (signature and non-signature)?
• How do we distinguish between whistles and non-whistles?
  • They co-occur
• How do we determine the duration of a label?
  • Should labels that are close together be merged into one label?
  • This has an effect on the model
• Some signals are weak, probably due to a change in the dolphin's direction
Mapping from Labels to Models
Label                                                      Model
dd                                                         Signature Whistles
dp, md                                                     Non-Signature Whistles
click, electnoise, electricnoise, h#, H#, MachineSpike, s  GARBAGE
pau                                                        PAUSE (Water)
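The mapping above can be written as a simple lookup table. This is a minimal sketch, not the script used in the project: the names LABEL_TO_MODEL and model_for are hypothetical, and the labels are assumed to be case sensitive because both h# and H# are listed.

```python
# Label-to-model mapping transcribed from the table above.
# LABEL_TO_MODEL and model_for are illustrative names, not project code.
LABEL_TO_MODEL = {
    "dd": "Signature Whistles",
    "dp": "Non-Signature Whistles",
    "md": "Non-Signature Whistles",
    "click": "GARBAGE",
    "electnoise": "GARBAGE",
    "electricnoise": "GARBAGE",
    "h#": "GARBAGE",
    "H#": "GARBAGE",
    "MachineSpike": "GARBAGE",
    "s": "GARBAGE",
    "pau": "PAUSE (Water)",
}


def model_for(label):
    """Return the model name for a hand-assigned label.

    Raises KeyError for labels that are not in the table, so that
    unexpected labels are noticed instead of being silently dropped.
    """
    return LABEL_TO_MODEL[label]
```

Raising an error on unknown labels is a design choice for the sketch; a labeling pass with typos would otherwise go unnoticed.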
Label Statistics
                              PAUSE  SIGWHISTLE  GARBAGE  DOLPHIN
# occurrences                   756         633       13       24
Accumulated time (in secs)      466         320      7.1     11.3
Average time per occurrence     0.6         0.5     0.55     0.47
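The average time per occurrence is the accumulated time divided by the number of occurrences. The short check below recomputes it from the first two rows of the table; the variable and function names are illustrative only.

```python
# (occurrences, accumulated seconds) per model, copied from the table above.
STATS = {
    "PAUSE": (756, 466.0),
    "SIGWHISTLE": (633, 320.0),
    "GARBAGE": (13, 7.1),
    "DOLPHIN": (24, 11.3),
}


def average_duration(occurrences, total_seconds):
    """Average length of one labeled segment, in seconds."""
    return total_seconds / occurrences


averages = {
    name: round(average_duration(count, seconds), 2)
    for name, (count, seconds) in STATS.items()
}

for name, value in averages.items():
    print(f"{name}: {value} s per occurrence")
```

The recomputed values agree with the table's last row up to rounding.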
Previous Work
• Dolphin-ID Project by Tanja, Alan and Yue
• Task: to identify dolphins by their signature whistles
• 51 files labeled by Alan
• 13 HMMs: 10 (one for each dolphin) + DOLPHIN, PAUSE, and GARBAGE
• Used Janus for training and testing
• Tried different kinds of features
Our Work
• Model generalized signature whistles
• Label more files
• Create HMMs for signature whistles, non-signature whistles, garbage, and pause
• Train and test the HMMs using Janus
• Evaluate the test results with our own method
• Compare different model selections
Signal Processing
• Tanja's scripts
• Downsampling
• High-pass filter
• FFT
• LDA
HMM Topologies
[Figure: HMM state topologies (states b, m, e) for Signature Whistles, Non-Signature Whistles, Garbage, and Pause (Water)]
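As a rough illustration of what such topologies look like, the sketch below builds transition tables for a left-to-right model. It rests on assumptions that are not in the slides: that b, m, e stand for begin, middle, and end states visited in order, that each state has a self-loop, and that 0.5 is a reasonable initial probability. It is not the Janus configuration used in the project, and the slide does not state which model uses which topology.

```python
def left_to_right(n_states, self_loop=0.5):
    """Build a left-to-right transition table.

    Row i holds the probabilities of leaving state i. Column i is the
    self-loop, column i + 1 is the move to the next state, and the last
    column is the exit from the model. Initial values are placeholders
    that training would re-estimate.
    """
    table = []
    for i in range(n_states):
        row = [0.0] * (n_states + 1)
        row[i] = self_loop
        row[i + 1] = 1.0 - self_loop
        table.append(row)
    return table


# Three states in sequence, as in the assumed b -> m -> e topology.
three_state = left_to_right(3)

# A single looping state, for comparison.
one_state = left_to_right(1)
```

Each row sums to 1, so every state either repeats or advances; no state can be skipped in this sketch.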
Model Selection
• Scheme 1: Signature Whistles, Non-Signature Whistles, GARBAGE, PAUSE
• Scheme 2: Signature Whistles, GARBAGE, PAUSE
• Scheme 3: 10 HMMs (one for each dolphin), GARBAGE, PAUSE
Evaluation
• We cannot use WER here since there are no words, just segments.
• The method we used was to compute a confusion matrix over hidden states.
• Janus treats silence differently and doesn’t show silence classification, which complicates the evaluation.
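A confusion matrix of this kind can be computed by comparing, frame by frame, the labeled model with the model chosen by the recognizer. The sketch below is one possible way to do it, not the project's evaluation script: it assumes both sequences are already aligned per frame, that rows are reference labels and columns are hypothesized labels, and that each row is normalized to percentages. The toy sequences are invented for illustration.

```python
from collections import Counter


def confusion_matrix(reference, hypothesis, labels):
    """Row-normalized confusion percentages.

    reference and hypothesis are equal-length sequences of per-frame
    labels. The result maps each reference label to a dict of
    hypothesized label -> percentage of that reference label's frames.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("sequences must have the same number of frames")
    counts = Counter(zip(reference, hypothesis))
    matrix = {}
    for ref in labels:
        row_total = sum(counts[(ref, hyp)] for hyp in labels)
        matrix[ref] = {
            hyp: (100.0 * counts[(ref, hyp)] / row_total) if row_total else 0.0
            for hyp in labels
        }
    return matrix


# Invented toy data: eight frames, three models.
labels = ["SIG", "GARBAGE", "PAUSE"]
reference = ["SIG", "SIG", "SIG", "SIG", "PAUSE", "PAUSE", "GARBAGE", "GARBAGE"]
hypothesis = ["SIG", "SIG", "SIG", "PAUSE", "SIG", "PAUSE", "SIG", "GARBAGE"]
matrix = confusion_matrix(reference, hypothesis, labels)
```

With this normalization every non-empty row sums to 100%, which makes rows comparable even when one class, such as pause, has many more frames than the others.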
Experiments
• Data
  • 162 labeled files were used
  • Half of the data for training, half for testing
  • Swap the training set and test set
  • 162 test results all together
• Features
  • The same as those in the Dolphin-ID project
• Model Selection
  • 3 different schemes
Results – Scheme 1
         Sig  Non-Sig  Garbage  Pause
Sig      58%       6%      18%    34%
Non-Sig  33%       8%      37%    22%
Garbage  77%       0%       5%    18%
Pause    31%       6%      27%    34%
Results – Scheme 2
         Sig  Garbage  Pause
Sig      79%       9%    21%
Garbage  52%      21%    27%
Pause    48%      14%    38%
Results – Scheme 3
         Sig  Garbage  Pause
Sig      91%     0.6%     8%
Garbage  80%      10%    10%
Pause    69%       1%    30%
Analysis of Results
• You can only get as good as your labels
• Scheme 3 is the best at aligning signature whistles (speaker dependent)
• Scheme 1 is the worst: there is not enough data to model non-signature whistles and garbage
• Scheme 2 is in the middle (speaker independent)
• Pause is the most difficult to model: it contains many different things, and we modeled it with only 1 state
Conclusion
• Analyzing dolphin sounds is quite different from analyzing human speech. The methods used have to be adjusted to the characteristics of the dolphin sounds.
• There is a lot of work to be done in the signal processing stage
• Partly supervised training
• It might be better just to construct a model for the labels we are sure of and let the model learn what the signature whistles are, or which units discriminate between different labels.
We also tried …
• One-state model for non-signature whistles, garbage, and pause
  • Segmentation fault in training
• “Loop back” model for signature whistles
  • The loop back transition made no difference
Acknowledgement
Tanja Schultz
Yue Pan
Alan W Black
Szu-Chen Stan Jou
Hua Yu
Thank You!
Jiazhi Ou
Tal Blum
{jzou, tblum}@cs.cmu.edu