Outline
• Artificial Intelligence
• AI Applications
• AI Technologies for Language Education
• Corpus-Based Evaluation on Chinese Text Normalization
• Applications for Language Education
• Concluding remarks
Artificial Intelligence
AI everywhere!
https://medium.com/infonation-monthly/5-companies-making-the-world-a-better-place-with-ai-right-now-7a7b109f0120
Artificial Intelligence (AI)
AI and Machine Learning
Sources: canvas.northwestern.edu; https://towardsdatascience.com/role-of-data-science-in-artificial-intelligence-950efedd2579
AI Method: Machine Learning
Observable data → Modeling (learning) → Information
• Speech recognition: speech → text (e.g., 안녕하세요, "Hello")
• Machine translation: one language → another language (e.g., "I am a boy" → 나는 소년이다)
• Image classification: image → category (e.g., 고양이, "cat")
• How? Deep neural networks
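The slide leaves "Modeling (learning)" abstract and answers "How?" with deep neural networks. As a much simpler stand-in (a nearest-centroid classifier, not a neural network), the sketch below learns one average feature vector per category from labelled observations and maps a new observation to the closest one. The two features and all values are hypothetical:

```python
import math

def train_centroids(examples):
    """Learn one mean feature vector per label from (features, label) pairs."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))

# Hypothetical 2-D image features (e.g., ear pointiness, snout length)
training = [
    ([0.9, 0.2], "cat"), ([0.8, 0.3], "cat"),
    ([0.3, 0.9], "dog"), ([0.2, 0.8], "dog"),
]
model = train_centroids(training)
print(predict(model, [0.85, 0.25]))  # cat
```

A deep network replaces the hand-picked features and the distance rule with representations learned from much larger amounts of data, but the data → model → information flow is the same.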
AI Applications
• Search (Ⓒ2017 NAVER Corp.)
• Object detection and image classification
• Recommendation
• Translation and speech synthesis
Smart Speaker With Screen Display
• Lenovo Smart Display, All-new Echo Show, Echo Spot, Portal from Facebook, JBL Link View
• Sources: https://thedroidguy.com/2018/11/5-best-smart-speaker-with-screen-display-in-2019-1092784; https://www.mk.co.kr/news/business/view/2019/05/312029/
Smart Devices & Multiexperience
AI Technologies for Language Education
Speech Recognition
Observable data (speech) → Modeling (learning) → Information (text), e.g., 안녕하세요 ("Hello")
Speech Recognition
Task | Vocabulary | Word Error Rate (%)
Digits | 11 | 0.5
WSJ read speech | 5K | 3
WSJ read speech | 20K | 3
Broadcast news | 64,000+ | 5
Conversational telephone | 64,000+ | 10
Source: http://web.stanford.edu/class/cs224s/lec/
Speech Recognition
Machines are about 5 times worse than humans, and the gap increases with noisy speech.
Task | Vocabulary | ASR WER (%) | Human WER (%)
Continuous digits | 11 | 0.5 | 0.009
WSJ 1995 clean | 5K | 3 | 0.9
WSJ 1995 with noise | 5K | 9 | 1.1
SWBD 2004 | 65K | 10? | 3-4?
Source: http://web.stanford.edu/class/cs224s/lec/
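Both tables report word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) between the recognizer output and a reference transcript, divided by the number of reference words. A minimal sketch of the usual dynamic-programming computation; the example sentences are made up:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via a word-level Levenshtein table."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words
print(f"{100 * word_error_rate('the cat sat on the mat', 'the cat sit on mat'):.1f}%")  # 33.3%
```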
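Speech Synthesis
Architecture
Text → Language understanding module (NLU models) → Prosody prediction module (prosody models) → Unit selection module (speech DB) → Voice
Examples of ambiguous input that the language understanding module must resolve:
• 10대 → 열대 / 십때 / 십대 (native vs. Sino-Korean numeral readings)
• 3M → 삼미터 ("three meters") / 삼메가 ("three mega") / 쓰리엠 ("3M")
• 01.01 → 이뤌리릴 (일월 일일, "January 1") / 한시일분 ("1:01") / …; pronunciation variants: 이뤌 리릴, 이뤌 이릴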
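The architecture names a unit selection module fed by prosody models and a speech DB, but not how units are chosen. One common textbook formulation, assumed here, scores each candidate unit with a target cost (mismatch with the predicted prosody) plus a join cost (mismatch with the preceding unit) and finds the cheapest sequence with a Viterbi search. The `Unit` fields, cost weights, and candidate values are all hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Unit:
    unit_id: str
    pitch: float     # mean F0 in Hz (assumed feature)
    duration: float  # in ms (assumed feature)

def target_cost(target, unit):
    """Mismatch between predicted prosody (pitch, duration) and a unit; weights are arbitrary."""
    pitch, duration = target
    return abs(pitch - unit.pitch) / 10 + abs(duration - unit.duration) / 20

def join_cost(left, right):
    """Penalty for concatenating two units; here only the pitch jump at the joint."""
    return abs(left.pitch - right.pitch) / 10

def select_units(targets, candidates):
    """Viterbi search: candidates[i] lists DB units for target i; returns the cheapest path."""
    # best[i][k] = (lowest total cost ending in candidates[i][k], index of best predecessor)
    best = [[(target_cost(targets[0], u), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            tc = target_cost(targets[i], u)
            row.append(min((best[i - 1][k][0] + join_cost(prev, u) + tc, k)
                           for k, prev in enumerate(candidates[i - 1])))
        best.append(row)
    # Trace back from the cheapest final candidate
    k = min(range(len(best[-1])), key=lambda j: best[-1][j][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][k])
        k = best[i][k][1]
    return path[::-1]

targets = [(120, 80), (130, 90)]  # predicted (pitch, duration) per position
candidates = [
    [Unit("a1", 118, 82), Unit("a2", 150, 80)],
    [Unit("b1", 131, 88), Unit("b2", 100, 90)],
]
print([u.unit_id for u in select_units(targets, candidates)])  # ['a1', 'b1']
```

Real systems use many more features (spectral context, phonetic identity, position in phrase) and tuned weights, but the search structure stays the same.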
Speech Synthesis - NLU
Text normalization
Examples
Je loue meublé de 38 m² en excellent état au 2ème étage (refait à neuf en 2012) situé Rue Monsieur-le-Prince à côté du jardin du Luxembourg.
Rares au XIXe siècle, leur présence est coutumière dès le milieu de notre siècle.
A module that handles numbers, symbols, foreign words, and various other problems within a sentence
Methods: rule-based or statistical approaches
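On the rule-based side, a minimal sketch covering three non-standard word types from the French examples: a unit ("38 m²"), an Arabic-numeral ordinal ("2ème"), and a Roman-numeral century ordinal ("XIXe"). The function names are hypothetical, and number reading covers only integers below 100, so a year such as "2012" is deliberately left untouched for a fuller number grammar:

```python
import re

# Number words for 0-16; 17-19 are built as "dix-sept", etc.
UNITS = ["zéro", "un", "deux", "trois", "quatre", "cinq", "six", "sept", "huit",
         "neuf", "dix", "onze", "douze", "treize", "quatorze", "quinze", "seize"]
TENS = {2: "vingt", 3: "trente", 4: "quarante", 5: "cinquante", 6: "soixante"}
ROMAN = {"I": 1, "V": 5, "X": 10}

def cardinal(n: int) -> str:
    """French cardinal for 0 <= n < 100 (traditional hyphenation)."""
    if n <= 16:
        return UNITS[n]
    if n < 20:
        return "dix-" + UNITS[n - 10]
    if n < 70:
        tens, unit = divmod(n, 10)
        if unit == 0:
            return TENS[tens]
        return TENS[tens] + (" et un" if unit == 1 else "-" + UNITS[unit])
    if n < 80:  # soixante-dix, soixante et onze, soixante-douze, ...
        return "soixante et onze" if n == 71 else "soixante-" + cardinal(n - 60)
    if n == 80:
        return "quatre-vingts"
    return "quatre-vingt-" + cardinal(n - 80)  # 81-99

def ordinal(n: int) -> str:
    """French ordinal: 2 -> deuxième, 19 -> dix-neuvième (1 -> premier)."""
    if n == 1:
        return "premier"
    word = cardinal(n)
    if word.endswith("cinq"):
        return word + "uième"
    if word.endswith("neuf"):
        return word[:-1] + "vième"
    if word.endswith("e") or word.endswith("vingts"):
        word = word[:-1]  # quatre -> quatr-, quatre-vingts -> quatre-vingt-
    return word + "ième"

def roman_to_int(s: str) -> int:
    """Subtractive Roman numerals built from I, V, X (enough for centuries)."""
    total = 0
    for ch, nxt in zip(s, s[1:] + " "):
        value = ROMAN[ch]
        total += -value if ROMAN.get(nxt, 0) > value else value
    return total

def normalize(text: str) -> str:
    # Unit: "38 m²" -> "trente-huit mètres carrés"
    text = re.sub(r"\b(\d{1,2}) ?m²",
                  lambda m: cardinal(int(m[1]))
                  + (" mètre carré" if int(m[1]) < 2 else " mètres carrés"), text)
    # Arabic ordinal: "2ème" or "2e" -> "deuxième"
    text = re.sub(r"\b(\d{1,2})(?:ème|e)\b", lambda m: ordinal(int(m[1])), text)
    # Roman ordinal: "XIXe" -> "dix-neuvième" (L and C omitted so "Le", "Ce" are safe)
    return re.sub(r"\b([IVX]+)e\b", lambda m: ordinal(roman_to_int(m[1])), text)

print(normalize("Je loue meublé de 38 m² en excellent état au 2ème étage (refait à neuf en 2012)"))
# Je loue meublé de trente-huit mètres carrés en excellent état au deuxième étage (refait à neuf en 2012)
print(normalize("Rares au XIXe siècle"))
# Rares au dix-neuvième siècle
```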
Speech Synthesis - NLU
Grapheme-to-phoneme (pronunciation) conversion
Homographs
Les touristes affluent pour visiter le musée.
L'Isère est un affluent du Rhône.
Liaison
Il a été très étonné de voir ça !
Hier, on s'est bien amusés.
Methods: rule-based or statistical approaches
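The two problems on this slide can be sketched with hand-written rules. "affluent" is a verb pronounced [a.fly] in the first sentence and a noun pronounced [a.fly.ɑ̃] in the second; a liaison inserts a linking consonant, as in "très‿étonné" [z] or "bien‿amusés" [n]. The mini-lexicon, the one-word POS heuristic, and the letter-to-liaison table below are simplifying assumptions, not a real G2P system:

```python
# Hypothetical mini-lexicon: homograph -> {part of speech: pronunciation (IPA)}
HOMOGRAPHS = {"affluent": {"VERB": "a.fly", "NOUN": "a.fly.ɑ̃"}}
DETERMINERS = {"un", "le", "l'", "les", "des", "cet"}

# Sound a word-final letter takes in liaison (simplified)
LIAISON_SOUND = {"s": "z", "x": "z", "z": "z", "t": "t", "d": "t", "n": "n"}
VOWELS = set("aeiouyàâéèêëîïôûù")

def pronounce_homograph(prev_word: str, word: str) -> str:
    """Crude POS guess from the previous word: determiner -> noun, otherwise verb."""
    pos = "NOUN" if prev_word.lower() in DETERMINERS else "VERB"
    return HOMOGRAPHS[word.lower()][pos]

def liaison(word: str, next_word: str) -> str:
    """Linking consonant between two words, or '' if none.
    Spelling only: real liaison is obligatory, optional, or forbidden depending
    on syntax, and h-initial words need a lexicon (h muet vs. h aspiré)."""
    last, first = word[-1].lower(), next_word[0].lower()
    return LIAISON_SOUND[last] if last in LIAISON_SOUND and first in VOWELS else ""

print(pronounce_homograph("touristes", "affluent"))  # a.fly
print(pronounce_homograph("un", "affluent"))         # a.fly.ɑ̃
words = "Il a été très étonné de voir ça".split()
print([(w, n, liaison(w, n)) for w, n in zip(words, words[1:]) if liaison(w, n)])
# [('très', 'étonné', 'z')]
```

A statistical G2P model would replace both hand-written tables with mappings learned from a pronunciation lexicon and POS-tagged text.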
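Speech Synthesis - NLU
Prosodic boundary and accent prediction
Example
Mon mari veut ramener Romain.
• Methods: rule-based or statistical approaches
• Building training data tagged with prosodic boundary information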
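The statistical route depends on the boundary-tagged training data mentioned above. A minimal count-based sketch, assuming each training sentence is annotated with POS tags and a break/no-break label after each word: it estimates P(break | POS of this word, POS of next word) and inserts "|" where that probability exceeds a threshold. The tag set and the three training sentences are hypothetical:

```python
from collections import Counter

def train(tagged_sentences):
    """Estimate P(break after word | POS of word, POS of next word) by counting.
    Each sentence is a list of (word, pos, break_after) triples."""
    breaks, totals = Counter(), Counter()
    for sent in tagged_sentences:
        for (_, pos, brk), (_, next_pos, _) in zip(sent, sent[1:]):
            totals[pos, next_pos] += 1
            breaks[pos, next_pos] += int(brk)
    return {pair: breaks[pair] / totals[pair] for pair in totals}

def predict(model, tagged_words, threshold=0.5):
    """Insert '|' after a word when the learned break probability exceeds threshold."""
    out = []
    for (word, pos), (_, next_pos) in zip(tagged_words, tagged_words[1:]):
        out.append(word)
        if model.get((pos, next_pos), 0.0) > threshold:  # unseen pairs: no break
            out.append("|")
    out.append(tagged_words[-1][0])
    return " ".join(out)

# Hypothetical annotated data (True = prosodic break annotated after the word)
training = [
    [("Mon", "DET", False), ("mari", "NOUN", True), ("veut", "VERB", False), ("partir", "VERB", False)],
    [("Le", "DET", False), ("chat", "NOUN", True), ("dort", "VERB", False)],
    [("Paul", "PROPN", False), ("veut", "VERB", False), ("manger", "VERB", False)],
]
model = train(training)
sentence = [("Mon", "DET"), ("mari", "NOUN"), ("veut", "VERB"), ("ramener", "VERB"), ("Romain", "PROPN")]
print(predict(model, sentence))  # Mon mari | veut ramener Romain
```

The output reflects only this toy data; a practical model would use richer context (syntax, phrase length, punctuation) and far more annotated sentences.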
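Speech Synthesis - Speech DB
Speech DB (voice font) building process:
• Optimal recording script design
• Voice talent selection
• TTS DB recording
• Pronunciation transcription
• Phoneme transcription
• Prosody transcription
• Linguistic information tagging
• Synthesis unit feature extraction
• Pre-selection
• Voice font packaging
Figure labels: language and domain knowledge; linguistic knowledge; voice font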
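The first step, optimal recording script design, is often approached as a coverage problem: choose as few sentences as possible that together contain as many distinct units (e.g., diphones) as possible. The slide does not say which method is used; a greedy set-cover heuristic is shown here as one common option, with letter bigrams as a hypothetical stand-in for diphones:

```python
def greedy_script(sentences, units_of, max_sentences):
    """Greedy set cover: repeatedly pick the sentence adding the most not-yet-covered
    units, until the budget is used up or no sentence adds anything new."""
    covered, script = set(), []
    remaining = list(sentences)
    while remaining and len(script) < max_sentences:
        best = max(remaining, key=lambda s: len(units_of(s) - covered))
        gain = units_of(best) - covered
        if not gain:
            break
        script.append(best)
        covered |= gain
        remaining.remove(best)
    return script, covered

def letter_bigrams(sentence):
    """Stand-in for diphones: adjacent letter pairs within each word (hypothetical)."""
    return {w[i:i + 2] for w in sentence.lower().split() for i in range(len(w) - 1)}

pool = ["the cat", "a cat sat", "the hat", "that"]
script, covered = greedy_script(pool, letter_bigrams, max_sentences=10)
print(script)  # ['the cat', 'a cat sat', 'the hat']
```

Here "that" is never selected because its units are already covered; in practice the unit function would come from a G2P front end, and selection is often weighted by unit frequency in the target domain.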
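Speech Synthesis
Sample text (a Korean weather report, given here in English translation):
"This is the current weather in Seogwipo. The temperature is 7.3 degrees, so it is not cold, but you can see raindrops on the camera, can't you? Light rain is falling on Jeju Island and the South Jeolla coast, and tonight it will spread beyond them to the Chungcheong region and the southern regions. It will not be very cold on today's commute either. That was the weather."
Voices compared: YTN weather caster, Google Translator, nVoice ("Listen to article")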
Dialogue system
Source: http://folk.uio.no/plison/research
Speaker Recognition
Speaker Verification (Speaker Detection)
• Is this speech sample from a particular speaker? ("Is that Jane?")
Speaker Identification
• Which of these speakers does this sample come from? ("Who is that?")
• Related tasks: gender ID ("Is this a woman or a man?"), language ID
Speaker Diarization
• Segmenting a dialogue or multiparty conversation ("Who spoke when?")
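Verification and identification differ mainly in the decision rule. Assuming each recording has already been mapped to a fixed-length speaker embedding (in real systems, e.g., i-vectors or x-vectors from a front end not shown here), verification compares one claimed speaker against a threshold, while identification picks the most similar enrolled speaker. The 3-dimensional embeddings and the 0.8 threshold are hypothetical:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def verify(enrolled, test, threshold=0.8):
    """Speaker verification ("Is that Jane?"): accept if similarity reaches the threshold."""
    return cosine(enrolled, test) >= threshold

def identify(speakers, test):
    """Speaker identification ("Who is that?"): the most similar enrolled speaker."""
    return max(speakers, key=lambda name: cosine(speakers[name], test))

# Hypothetical embeddings; real ones have hundreds of dimensions
speakers = {"jane": [0.9, 0.1, 0.3], "tom": [0.1, 0.8, 0.5]}
test = [0.85, 0.15, 0.35]
print(verify(speakers["jane"], test), identify(speakers, test))  # True jane
```

The threshold trades false acceptances against false rejections and is normally tuned on held-out data.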
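Multilingual Pronunciation Assessment
• Speech-recognition-based pronunciation assessment for learners of Korean
• Pronunciation assessment system for learners of Korean (NAVER + Seoul National University)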
Corpus-Based Evaluation on Chinese Text Normalization
INTRODUCTION
TAXONOMY OF NON-STANDARD WORDS
TEXT NORMALIZATION MODULES
CORPUS AND TESTSET
EVALUATION RESULTS
DISCUSSION AND FUTURE WORK
Kim, S. (2017). Corpus-based evaluation of Chinese text normalization. In 2017 20th Conference of the Oriental Chapter of the
International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA) (pp. 1-4). IEEE.
CORPUS-BASED EVALUATION OF CHINESE TEXT NORMALIZATION
INTRODUCTION
Speech Synthesis: Text-to-Speech
• Text normalization: A crucial component of text analysis in TTS systems, whose errors cause a major degradation in the perceived quality of a TTS system. It converts non-standard words (NSWs), such as number expressions, abbreviations, and acronyms, into the corresponding standard words, which involves detecting and classifying NSWs, disambiguating them, and converting them into standard words.
• TN approaches: WFSTs, language modeling, and machine learning approaches (ME, RNN), along with DNN-based TTS (WaveNet, Deep Voice, Tacotron)
• Evaluation methods: WER, TER, F1-measure
• Aim of the paper: To present a method of developing a corpus consisting of various categories of NSWs, together with a representative test set, in Standard Mandarin and Taiwanese Mandarin
TAXONOMY OF NON-STANDARD WORDS
• Proposed taxonomy: Based on a systematic investigation of a large-scale corpus consisting of sentences from email, chat, and news; similar to the one presented in [2]
Examples of Basic NSWs (BNSWs)
Examples of Ambiguous NSWs (ANSWs)
TEXT NORMALIZATION MODULES
• Tools: Thrax [9][10] and OpenFst [11]; the approach is similar to the one presented in [8]
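The modules themselves are WFST grammars compiled with Thrax and OpenFst. The Python sketch below does not use WFSTs; it is a plain regex-and-function stand-in showing why category matters in Chinese: the same digits "2017" are read digit by digit as a year (Date), as a quantity before a measure word (Num+suffix), and with 百分之 for a Percentage. The rules and function names are hypothetical and far narrower than the taxonomy:

```python
import re

DIGITS = "零一二三四五六七八九"
POSITIONS = ["", "十", "百", "千"]

def read_digit_string(s: str) -> str:
    """Digit-by-digit reading, as for years ("2017" -> 二零一七) or long codes."""
    return "".join(DIGITS[int(c)] for c in s)

def read_integer(n: int) -> str:
    """Cardinal reading for 0 <= n < 10000 ("2017" -> 两千零一十七)."""
    if n == 0:
        return "零"
    s = str(n)
    out, pending_zero = [], False
    for i, ch in enumerate(s):
        d, pos = int(ch), len(s) - 1 - i
        if d == 0:
            pending_zero = True  # a run of inner zeros is read as a single 零
            continue
        if pending_zero:
            out.append("零")
            pending_zero = False
        digit = "两" if d == 2 and pos == 3 else DIGITS[d]  # 两千 rather than 二千
        out.append(digit + POSITIONS[pos])
    word = "".join(out)
    return word[1:] if word.startswith("一十") else word  # 十五, not 一十五

def normalize(text: str) -> str:
    # Date: four digits followed by 年 (year) -> digit by digit
    text = re.sub(r"(?<!\d)(\d{4})年", lambda m: read_digit_string(m[1]) + "年", text)
    # Percentage: "15%" -> 百分之十五
    text = re.sub(r"(?<!\d)(\d{1,4})%", lambda m: "百分之" + read_integer(int(m[1])), text)
    # Remaining digit runs: cardinal if up to 4 digits, otherwise digit by digit
    return re.sub(r"\d+", lambda m: read_integer(int(m[0])) if len(m[0]) <= 4
                  else read_digit_string(m[0]), text)

print(normalize("2017年销量增长了15%，共卖出2017台"))
# 二零一七年销量增长了百分之十五，共卖出两千零一十七台
```

A WFST implementation expresses the same rules as composable transducers, which makes it easier to combine classification and verbalization and to weight competing readings.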
CORPUS AND TEST SET: STANDARD MANDARIN
Corpus composition
Description | News | Blog | Email | Forum | SMS | Chat | Total
Corpus size | 100 MB | 3.45 MB | 7.7 MB | 3.11 MB | 1.48 MB | 7.84 MB | 123.58 MB
Number of sentences | 440,000 | 42,629 | 67,572 | 75,519 | 33,403 | 216,116 | 875,239
Sentences with NSWs | 150,000 | 5,062 | 7,493 | 9,021 | 1,885 | 17,970 | 191,431
Percentage | 30% | 20% | 10% | 10% | 15% | 15% | 100%
Distribution of NSW categories of 1,000 test cases
NSW type | Date | Time | Email | URL | Phone/Fax | Percentage
Proportion | 12.8% | 0.8% | 0.1% | 0.1% | 0.3% | 4.0%
NSW type | Num+suffix_each | money_name | digit_suffix | en_digit | en_seq | nt_words
Proportion | 0.2% | 0.1% | 1.2% | 5.8% | 2.0% | 1.2%
NSW type | NumOrder | Ratio | NumberInterval | YearInterval | Num+suffix | num_units
Proportion | 5.4% | 0.1% | 1.2% | 0.1% | 34.8% | 0.5%
NSW type | en_words | Digit | NumberReal | symbol | ext_rules | en_seq_default
Proportion | 0.5% | 0.3% | 13.1% | 5.2% | 0.6% | 9.7%
Figure: Distribution of NSW categories (Standard Mandarin)
CORPUS AND TEST SET: TAIWANESE MANDARIN
Corpus composition
Description | News | Blog&Forum&News | Email | SMS | Chatting | Total
Corpus size | 25.8 MB | 100 MB | 8.55 MB | 40 MB | 10 MB | 184.35 MB
Number of sentences | 723,385 | 1,230,000 | 74,300 | 881,510 | 302,545 | 3,211,740
Sentences with NSWs | 44,478 | 525,326 | 21,681 | 85,386 | 54,653 | 731,524
Percentage | 30% | 20% | 10% | 20% | 20% | 100%
Distribution of NSW categories of 1,000 test cases
NSW type | Date | Time | Email | URL | Phone/Fax | Percentage
Proportion | 4.1% | 1.7% | 0.3% | 0.1% | 0.6% | 1.2%
NSW type | Num+suffix_each | money_name | digit_suffix | en_digit | en_seq | nt_words
Proportion | 0.1% | 0.1% | 1.0% | 7.4% | 2.4% | 1.0%
NSW type | Fraction | NumOrder | Ratio | NumberInterval | Num+suffix | num_units
Proportion | 0.1% | 2.2% | 0.1% | 1.2% | 23.6% | 1.0%
NSW type | Digit | NumberReal | symbol | ext_rules | en_seq_default | Number_Tra
Proportion | 0.6% | 18.5% | 6.9% | 0.2% | 21.4% | 4.2%
Figure: Distribution of NSW categories (Taiwanese Mandarin)
EVALUATION RESULTS
Standard Mandarin
• Test set: 1,000 sentences containing 1,782 NSWs, amounting to 57,387 characters
• Manual checking conducted by two language experts
• Results: 34 NSW errors in 33 sentences
  NSW token accuracy: P_NSW = 1 - 34/1,782 ≈ 98.09%
  Sentence accuracy: P_Sent = 967/1,000 = 96.7%
Taiwanese Mandarin
• Test set: 1,000 sentences containing 1,402 NSWs, amounting to 29,158 characters
• Manual checking conducted by two language experts
• Results: 33 NSW errors in 31 sentences
  NSW token accuracy: P_NSW = 1 - 33/1,402 ≈ 97.64%
  Sentence accuracy: P_Sent = 969/1,000 = 96.9%
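The two measures are P_NSW = 1 - (NSW errors / NSWs in the test set) and P_Sent = (sentences without an NSW error) / (sentences in the test set). Recomputing them from the reported counts, as a quick arithmetic check (the function names are illustrative):

```python
def nsw_token_accuracy(nsw_errors: int, nsw_total: int) -> float:
    """P_NSW = 1 - erroneous NSW tokens / all NSW tokens."""
    return 1 - nsw_errors / nsw_total

def sentence_accuracy(error_sentences: int, total_sentences: int) -> float:
    """P_Sent = sentences with no NSW error / all sentences."""
    return (total_sentences - error_sentences) / total_sentences

# Reported counts: (NSW errors, NSWs, sentences with errors, sentences)
results = {
    "Standard Mandarin": (34, 1782, 33, 1000),
    "Taiwanese Mandarin": (33, 1402, 31, 1000),
}
for variety, (errs, nsws, err_sents, sents) in results.items():
    print(f"{variety}: P_NSW = {100 * nsw_token_accuracy(errs, nsws):.3f}%, "
          f"P_Sent = {100 * sentence_accuracy(err_sents, sents):.1f}%")
```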
DISCUSSION AND FUTURE WORK
• Summary: This paper presents a method of developing a corpus consisting of various categories of non-standard words (NSWs), together with a representative test set, for evaluating the text normalization module proposed for Standard Mandarin and Taiwanese Mandarin.
• To note: The two languages, known to be the same except for their character sets, differ in their NSW categories. Alphabetic characters and their compounds appear more often in Taiwanese Mandarin (33.8%) than in Standard Mandarin (20.5%), whereas numbers and their compounds are more frequent in Standard Mandarin (81.4%) than in Taiwanese Mandarin (63.9%). Symbols appear in similar proportions in the two languages.
Applications for Language Education
Rosetta Stone (로제타스톤)
Siwon School RealTraining (시원스쿨 리얼트레이닝)
SpeakingMax (스피킹맥스)
Feature comparison
Feature | RosettaStone | RealTraining | SpeakingMax
Environment | web / app | web / app | web / app
Materials | multimedia (photos / speech) | multimedia (interviews of native speakers) | multimedia (interviews of native speakers)
Speaking method | shadowing | shadowing | Repeat: shadowing; Speech: speaking
Learning unit | word – phrase – sentence | sentence | sentence
Assessment / feedback | word: pass/fail; sentence: sound wave | visual feedback including individual scores | visual feedback (a variational sound wave)
Objective | fluency | fluency | fluency
NAVER cake
• English pronunciation assessment system
• Applied to the NAVER English Dictionary
NAVER Language Dictionary
Concluding remarks
Future: AI tutors
blog.frontiersin.org
Language Skills
Discussion
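• AI technology and foreign-language education
• "Virtual Tutor"
• The role of foreign-language education researchers in an AI environment
• Content development
• Content design
• Content recommendation (curation)
• Content evaluation
• Database construction
• Data design and development that take both functions and content into account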
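Related Research Outputs
• 김선희 (2013). 일본어 음성합성을 위한 음성셋 정의 [Definition of a speech set for Japanese speech synthesis]. 한국음향학회 2013 추계학술대회 논문요약집 [Acoustical Society of Korea 2013 Fall Conference, Abstracts], p. 17.
• 김종진, 김상진, 김선희, 김형준, 와타나베 리카, 홍진표 (2013). NAVER 다국어 음성합성 시스템 소개 [Introduction to the NAVER multilingual speech synthesis system]. 한국음향학회 2013 추계학술대회 논문요약집 [Acoustical Society of Korea 2013 Fall Conference, Abstracts], p. 18.
• 김상진, 김종진, 김선희, 김형준 (2013). 영자신문 낭독 TTS용 음성코퍼스의 발성목록 설계 [Recording script design of a speech corpus for English-newspaper reading TTS]. 한국음향학회 2013 추계학술대회 논문요약집 [Acoustical Society of Korea 2013 Fall Conference, Abstracts], p. 17.
• 김선희 (2014). 영어 TTS DB 운율 연구: 낭독체와 대화체 비교 [A prosody study of English TTS DBs: comparing read and conversational styles]. 한국음성학회 2014 가을학술대회 논문집 [Korean Society of Speech Sciences 2014 Fall Conference Proceedings], pp. 93-94.
• 홍진표, 김선희 (2014). 한국어 TTS 개발을 위한 통합 운율경계 모델링 [Integrated prosodic boundary modeling for Korean TTS development]. 한국음성학회 2014 가을학술대회 논문집 [Korean Society of Speech Sciences 2014 Fall Conference Proceedings], pp. 187-188.
• Minki Lee, Jaemin Kim, Sunhee Kim (2015). Classification of prosodic boundaries based on acoustic cues for Korean TTS. ICSS 2015.
• 이민기, 김선희, 김재민 (2015). 음소 특성을 이용한 한국어 자동 음소전사 [Automatic phonemic transcription of Korean using phoneme characteristics]. 음성 및 신호처리 학술대회 2015 [Speech and Signal Processing Conference 2015].
• 이민기, 김재민, 홍진표, 김선희 (2016). 개인화 음성합성 개발을 위한 소용량 발성목록 추출 [Extraction of a small recording script for developing personalized speech synthesis]. 한국음성학회 2016 봄학술대회 [Korean Society of Speech Sciences 2016 Spring Conference].
• Sunhee Kim (2016). How to select a good voice for TTS. 9th ISCA Speech Synthesis Workshop.
• Sunhee Kim (2017). Corpus-based evaluation of Chinese text normalization. In 2017 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA) (pp. 1-4). IEEE. (Best paper)
• 김선희 (2018). 코퍼스 기반 프랑스어 텍스트 정규화 평가 [Corpus-based evaluation of French text normalization]. 말소리와 음성과학 [Phonetics and Speech Sciences], 10(4).
• 김선희 (2018). 지식기반 프랑스어 발음열 생성 시스템 [A knowledge-based French pronunciation sequence generation system]. 말소리와 음성과학 [Phonetics and Speech Sciences], 10(1), 49-55.
• 김선희 (2018). 프랑스어 자동발음평가를 위한 음운자질 연구 [A study of phonological features for automatic French pronunciation assessment]. 프랑스어문교육 [French Language and Literature Education], 60, 147-168.
• 김선희 (2018). 프랑스어 schwa의 음향학적 특성 [Acoustic characteristics of the French schwa]. 언어학연구 [Linguistic Studies], 49, 83-101.
• 김선희, 정현훈 (2018). 외국어 학습용 어플리케이션의 음성인식 기술 활용 현황 [The current use of speech recognition technology in foreign-language learning applications]. 한국디지털콘텐츠학회 논문지 [Journal of Digital Contents Society], 19(4), 621-630.
• Ryu, Hyuksu, et al. (2016). Automatic pronunciation assessment of Korean spoken by L2 learners using best feature set selection. 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA). IEEE.
• Hyejin Hong, Sunhee Kim, & Minhwa Chung (2014). A corpus-based analysis of English segments produced by Korean learners. Journal of Phonetics, 46, 52-67.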
Thank you