The Zero-Cost task goal...
● ...not to be a zero/low resource task
  ○ Lots of work done with data preparation.
  ○ There is "lots" of data outside.
● To be more realistic
  ○ Limitation in budget, not in data.
  ○ Very noisy data.
● To set up a knowledge base
  ○ How to put things together.
  ○ Not how to compete in data download.
● To be in touch with the community
  ○ IARPA BABEL, MGB Challenge, JHU workshops, Zero resource Interspeech challenge
The Zero-Cost task in detail...
● Language: Vietnamese
● Data provided (~12 hours)
● 2 sub-tasks
  ○ Full-size ASR -> Word Error Rate
  ○ A tokenizer -> Normalized Mutual Information
● Participants can find and share more data (must be free).
● Participants score using an on-line Leader Board - www.Zero-Cost.org
● To help participants we:
  ○ Prepared a baseline system (Triphone-GMM system trained using Kaldi)
  ○ Provided a BUT BABEL baseline system (to frame ZC results)
  ○ Provided local scoring
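The ASR sub-task is scored by Word Error Rate. As a rough illustration, WER is commonly defined as (substitutions + deletions + insertions) / number of reference words, computed from a word-level edit distance. The sketch below assumes plain whitespace tokenization and no text normalization; the function name `word_error_rate` is hypothetical, and the scoring actually used by the task may differ in these details.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared verbatim
    (no case folding or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of four reference words.
    print(word_error_rate("xin chào các bạn", "xin chào bạn"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.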
The task in deeper detail...
● Data
  ○ ELSA - read sentences on a cellphone (various public places)
  ○ Forvo.com - single word pronunciation (office)
  ○ Rhinospike.com - read sentences or paragraphs (office)
  ○ YouTube.com - news, presentations, talks (single speaker, reverberant)
● Other data provided by participants
  ○ (NNI) - URLs of Vietnamese web pages - (BUT) downloaded and cleaned - 18MB
  ○ (NNI) - Wiki text - cleaned - 750MB
  ○ (NNI) - Wordlist - 80k
  ○ (BUT) - Subtitles - 93MB
  ○ (BUT) - Some telenovelas (video + subtitles) - not used
Participants
● 12 interested
● 6 signed up
● 3 finished
  ○ ASR sub-task
    ■ NNI - fusion of 3 systems based on DNNs and data augmentation. Participant-provided data used.
    ■ ININ - fusion of several systems based on SGMMs, RNN and data augmentation.
    ■ BUT - single system based on BLSTMs.
  ○ Subword sub-task
    ■ BUT - single system based on automatically derived units using an infinite mixture of HMMs.
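The subword (tokenizer) sub-task is scored by Normalized Mutual Information. A minimal sketch of one common NMI variant follows. It rests on two assumptions not stated in the slides: the reference and the system output are aligned as equal-length label sequences (for example one label per frame), and the mutual information is normalized by the arithmetic mean of the two entropies. The function name is hypothetical, and the task's own scoring may use a different alignment or normalization.

```python
import math
from collections import Counter


def normalized_mutual_information(ref_labels, hyp_labels):
    """NMI = 2 * I(ref; hyp) / (H(ref) + H(hyp)), with natural logarithms.

    Assumption: both inputs are equal-length sequences of hashable labels.
    """
    if len(ref_labels) != len(hyp_labels) or len(ref_labels) == 0:
        raise ValueError("label sequences must be non-empty and equally long")

    n = len(ref_labels)
    joint = Counter(zip(ref_labels, hyp_labels))
    ref_counts = Counter(ref_labels)
    hyp_counts = Counter(hyp_labels)

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    # Mutual information from the empirical joint distribution.
    mutual_info = sum(
        (c / n) * math.log(c * n / (ref_counts[r] * hyp_counts[h]))
        for (r, h), c in joint.items()
    )
    h_ref, h_hyp = entropy(ref_counts), entropy(hyp_counts)
    if h_ref + h_hyp == 0:
        # Both sequences use a single label, so the partitions coincide.
        return 1.0
    return 2 * mutual_info / (h_ref + h_hyp)


if __name__ == "__main__":
    reference = ["a", "a", "b", "b"]
    discovered = [7, 7, 3, 3]  # same partition, different unit names
    print(normalized_mutual_information(reference, discovered))
```

NMI depends only on how the labels partition the data, not on the label names, which is why it suits automatically derived units that have no fixed correspondence to phonemes.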
Conclusion
● Thanks to all!
  ○ Participants shared data.
  ○ Coped with low amount of data - data augmentation.
  ○ Used state of the art techniques.
● The future?
  ○ Leader board stays open. Anyone can join or continue.
  ○ We would like to continue!
    ■ The new language?
    ■ Tweak the task a bit?
    ■ More active participants...
Details...
● Zero-cost.org
● News, task description, scoring, leader board
● Small dev subset provided to participants, to save them the time of uploading too often.
● Participants upload dev + test but see only dev results. When evaluations are finished, they also see test results.
● Support of late submissions.
● Web interface, token-based backend processes (working independently of the web interface).