Scanalu2002 23.5 Jan Alexandersson
Experiences from large NLP Projects
Jan Alexandersson
German Research Center for Artificial Intelligence GmbH
Stuhlsatzenhausweg 3, Geb. 43.1
66123 Saarbrücken
Tel.: (0681) 302-5347
Email: [email protected]
www.dfki.de/~janal
Overview
• Introduction
• What was VerbMobil
• What is SmartKom
• Scaling
• Experiences from VerbMobil
• Conclusion
What was VerbMobil?
http://verbmobil.dfki.de
VerbMobil - What was it?
• Speech-to-speech translation system
• Robust processing of spontaneous dialogs
• Speaker independent (adaptive)
• Languages: English, German, Japanese
• Domains: Appointment scheduling, travel planning and hotel reservation, remote PC maintenance
• Summary of the dialogue automatically generated by the system
• The system mediates between two humans; it does not play an active role
• There is no control of the ongoing dialog by the system
The Verbmobil Partners
[Figure: map of the Verbmobil partner sites with their contact persons and tasks]
• Partners shown on the map:
– Prof. Mahr, TU Berlin
– Dr. Klein, Dr. Wolf, DLR, PT
– Prof. Hoffmann, TU Dresden
– Prof. Paulus, TU Braunschweig
– Prof. Görz, Prof. Niemann, Univ. Erlangen
– Prof. v. Hahn, Univ. Hamburg
– Prof. Tillmann, LMU München
– Dr. Ruske, TU München
– Dr. Block, Siemens, München
– R. Reng, Temic, Ulm
– Dipl.-Ing. Mangold, DaimlerChrysler, Ulm
– Prof. Gibbon, Univ. Bielefeld
– Prof. Blauert, Univ. Bochum
– Prof. Rohrer, Univ. Stuttgart
– Prof. Hinrichs, Univ. Tübingen
– Prof. Waibel, Univ. Karlsruhe
– A. Klüter, DFKI, Kaiserslautern
– Dr. Eisele, Philips, Aachen
– Prof. Ney, RWTH Aachen
– Prof. Hess, Univ. Bonn
– Dr. Reuse, BMBF Referat 524
– Prof. Pinkal, Univ. d. Saarlandes
– Prof. Uszkoreit, Prof. Wahlster, DFKI, Saarbrücken
– Prof. Kurematsu, ATR International, Kyoto, Japan
– Prof. Waibel, CMU, Pittsburgh
– Prof. Sag, CSLI, Stanford, USA
• Tasks shown on the map (implementation languages in parentheses):
– Speaker adaptation
– Multilingual word lists
– Signal-level evaluation
– DaimlerChrysler recognizer, voice control (C, C++, Fortran)
– Data collection, integrated processing (C, C++, LISP, Prolog)
– Wizard-of-Oz experiments, data collection
– Transfer (Prolog)
– Multilingual recognizers (C, C++)
– Context evaluation (LISP, Prolog, Java)
– Syntax, robust semantics, dialog (LISP, Prolog)
– Data collection, recognition, syntax (C, C++, Prolog)
– Data collection
– Aachen recognizer, statistical transfer (C++, C)
– Chunk parser (Prolog)
– Repair, prosody for German and English (C)
– Acoustic synthesis (C, C++)
– System integration (C++, Tcl-Tk)
– Multilingual prosody control (C++, C)
Facts About the Project
• 23 participating institutions (in Verbmobil II), from Germany and the USA
• Over 900 full-time employees and students involved over the whole duration
• Funded by the German Ministry for Education and Science and the participating companies:
Funding source                                Mio. DM    Mio. €
BMBF funding Phase I, 1.01.93 – 31.12.96      62.7       31.6
BMBF funding Phase II, 1.01.97 – 30.9.2000    53.3       27
Industrial investment I+II                    32.6       16.5
Related industrial R&D activities             ca. 20     ca. 10
Total                                         168.6      85.1
Project Organization
[Figure: organization chart of the project]
• German Federal Ministry for Research and Education
• DLR: G. Klein
• Steering Committee
• Verbmobil Advisory Board
• Verbmobil Consortium
• Scientific Management
– Scientific Head: W. Wahlster
– Deputy Scientific Head: A. Waibel
• Head of Project Management Group: R. Karger
• Head of System Integration Group: A. Klüter
• Module Coordinator: N. Reithinger
• Group of Module Managers: Manager Module 1 ... Manager Module n
Challenges for Language Engineering
[Figure: four dimensions, each ordered by increasing complexity; Verbmobil is placed at the most complex level]
• Input Conditions: Close-Speaking Microphone/Headset, Push-to-talk → Telephone, Pause-based Segmentation → Open Microphone, GSM Quality
• Naturalness: Isolated Words → Read Continuous Speech → Spontaneous Speech
• Adaptability: Speaker Dependent → Speaker Independent → Speaker Adaptive
• Dialog Capabilities: Monolog Dictation → Information-seeking Dialog → Multiparty Negotiation
Classification of Machine Translation Methods
[Figure: translation pyramid from Source Language to Target Language]
• Analysis side: Morphologic Analysis → Word Structure → Syntactic Analysis → Syntactic Structure → Semantic Analysis → Semantic Structure
• Transfer levels, from lowest to highest: Direct Translation, Syntactic Transfer, Semantic Transfer, Interlingua
• Generation side: Semantic Structure → Semantic Generation → Syntactic Structure → Syntactic Generation → Word Structure → Morphologic Generation
The VerbMobil Case
[Figure: the same translation pyramid as on the previous slide, extended with the Speech Signal on both the source and the target side, Prosodic Analysis, and Prosodic Annotation]
The Graphical User Interface
Focuses of Speech Recognition in Verbmobil
• Focuses: Robustness, Multilinguality, Large Vocabulary
• Partners: DaimlerChrysler, RWTH Aachen, University of Karlsruhe
General Speech Recognition Task
• Audio Signal → Recognizers (German, English, Japanese) → Word Hypotheses Graph
• The Word Hypotheses Graph is the interface between acoustic and linguistic processing
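A word hypotheses graph can be pictured as a directed acyclic graph whose edges carry competing word hypotheses with scores. The following is a minimal sketch of that idea, not Verbmobil's actual data format: the names `Edge` and `best_path`, the cost values, and the convention that node ids grow with time are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    start: int    # start node (a point in time)
    end: int      # end node, assumed to be larger than start
    word: str     # hypothesized word
    cost: float   # assumed acoustic cost, lower is better


def best_path(edges, source, sink):
    """Return (total_cost, words) of the cheapest path from source to sink.

    Because every edge runs from a smaller to a larger node id, sorting
    the edges by start node gives a valid processing order.
    """
    best = defaultdict(lambda: (float("inf"), []))
    best[source] = (0.0, [])
    for e in sorted(edges, key=lambda e: e.start):
        cost, words = best[e.start]
        if cost + e.cost < best[e.end][0]:
            best[e.end] = (cost + e.cost, words + [e.word])
    return best[sink]


# Hypothetical graph with two pairs of competing hypotheses.
WHG = [
    Edge(0, 1, "I", 1.0),
    Edge(1, 2, "need", 1.0), Edge(1, 2, "knead", 2.5),
    Edge(2, 3, "a", 0.5),
    Edge(3, 4, "car", 1.0), Edge(3, 4, "card", 1.8),
]
print(best_path(WHG, 0, 4))
```

Passing the whole graph rather than only the best path lets later linguistic modules recover from a locally wrong acoustic decision.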
What Linguistic Analysis Really Needs
• Syntactic Boundaries
– He saw | the man | with the telescope
– Prosody cannot help
• Dialog Act Boundaries
– No, I have no time at all on Thursday. | But how about on Friday?
– Dialog acts are pragmatic units that chunk the input into units which can be processed alone.
• Prosodic Syntactic Boundaries
– Of course | not | on Saturday
– Syntactic boundaries that correlate with the acoustic-phonetic reality; they help during analysis within one chunk/dialog act. Important in spontaneous speech with elliptical utterances.
Prosody in Verbmobil
[Figure: data flow of the Multilingual Prosody Module]
• Input: Speech Signal and Word Hypotheses Graph
• Multilingual Prosody Module, prosodic features: F0, duration, energy, ...
• Information provided: Boundary Information, Sentence Mood, Accented Words, Prosodic Feature Vector
• Consumers and their use of the information:
– Parsing: Search Space Restriction
– Dialog Understanding: Dialog Act Segmentation and Recognition
– Translation: Constraints for Transfer
– Generation: Lexical Choice
– Speech Synthesis: Speaker Adaptation
Facts about Repairs in the Verbmobil Corpus
• 21% of all turns in the Verbmobil corpus (79,562 turns) contain at least one self-correction
• The syntactic category is preserved in most cases (for example, out of a sample of 266 verb replacements, 224 are again mapped to verbs)
• Repairs take place in a restricted context (in 98% of cases the reparandum consists of fewer than 5 words)
• Repair sequences follow certain regularities
The Understanding of Spontaneous Speech Repairs
• Input: "I need a car next Tuesday oops Monday"
• Phases: Original Utterance (with the Reparandum), Editing Phase (with the Editing Term "oops"), Repair Phase (with the Reparans)
• Processing steps: Recognition of Substitutions, then Transformation of the Word Hypotheses Graph
• Result: "I need a car next Monday"
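The substitution step can be illustrated on a flat word list. This is only a sketch under strong simplifying assumptions: the slide describes a transformation of the word hypotheses graph, whereas the hypothetical `apply_repair` below works on a single word string, uses a made-up list of editing terms, and assumes that reparandum and reparans have the same length.

```python
EDITING_TERMS = {"oops", "uh", "äh"}   # hypothetical list of editing terms


def apply_repair(words, reparans_len=1):
    """Replace the reparandum before an editing term with the reparans after it.

    Simplifying assumption: reparandum and reparans both consist of
    reparans_len words.
    """
    for i, word in enumerate(words):
        if word in EDITING_TERMS:
            reparans = words[i + 1 : i + 1 + reparans_len]
            kept = words[: max(i - reparans_len, 0)]
            rest = words[i + 1 + reparans_len :]
            return kept + reparans + rest
    return list(words)   # no editing term found, nothing to repair


utterance = "I need a car next Tuesday oops Monday".split()
print(" ".join(apply_repair(utterance)))
```

The sketch drops both the reparandum and the editing term, which matches the repaired utterance shown on the slide.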
Architecture of Repair Processing
“On Thursday I cannot no I can meet äh after one”
Multiple Approaches
• Mono-cultural approaches are dangerous
– humans vs. viruses: diversity
– Microsoft vs. ILOVEYOU and copycats: alternative software solutions
• Some sources of errors in a speech translation system
– external
• spontaneous speech: not well formed, hesitations, repairs
• bad acoustic conditions
• human dialog behavior
– internal
• knowledge gaps in modules
• software errors
• probabilistic processing
• Use multiple engines with varying approaches at various stages of processing
Multiple Approaches in Verbmobil
• Exclusive alternatives: three different 16 kHz German speech recognizers with various capabilities
• Competing approaches:
– three parsers: HPSG, Chunk, Statistical
– five translation tracks: case-based, dialog-act based, statistical, substring-based, linguistic (deep) semantic translation
• Needed: selection and combination of results from competing tracks
– parsers: combination of partial analyses in the semantic processing modules
– translation: pre-selection module
Multiple Translation Tracks - Approaches and Advantages
• Case-based:
– Approach: uses examples from the aligned bilingual Verbmobil corpus
– Advantage: good translation if input matches example in corpus
• Dialog-act based:
– Approach: extract core intention (dialog act) and content
– Advantage: robust with respect to recognition errors
• Statistical:
– Approach: use statistical language and translation models
– Advantage: guaranteed translation with high approximate correctness
• Substring-based:
– Approach: combines statistical word alignment with precomputation of translation "chunks" and contextual clustering
– Advantage: guaranteed translation with high approximate correctness
• Linguistic (deep) semantic translation:
– Approach: "classic" approach using semantic transfer
– Advantage: high quality translation in case of success
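With several tracks translating the same input, one result has to be selected. The sketch below shows one simple way to do that, assuming each track reports a confidence value together with its translation. The class `TrackResult`, the function `select_translation`, and the per-track weights are hypothetical and are not taken from the Verbmobil selection module.

```python
from dataclasses import dataclass


@dataclass
class TrackResult:
    track: str          # e.g. "statistical" or "case-based"
    translation: str
    confidence: float   # assumed to lie in [0, 1]


# Hypothetical per-track weights; a real system would have to tune or learn them.
TRACK_WEIGHTS = {
    "deep": 1.0,
    "case-based": 0.9,
    "statistical": 0.85,
    "substring": 0.85,
    "dialog-act": 0.8,
}


def select_translation(results):
    """Pick the result with the highest weighted confidence, or None if empty."""
    if not results:
        return None
    return max(results, key=lambda r: TRACK_WEIGHTS.get(r.track, 0.5) * r.confidence)


candidates = [
    TrackResult("deep", "A", 0.4),
    TrackResult("statistical", "B", 0.7),
    TrackResult("dialog-act", "C", 0.6),
]
print(select_translation(candidates).translation)
```

The weights are needed because confidence values from different engines are usually not directly comparable.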
Example Based Translation
• Task: Providing a translation based on translation templates and partial linguistic analysis
• Input: WHGs or best hypothesis
• Method: Definite Clause Grammar (DCG), graph matching algorithms
• Result: Translation and a confidence value
• Benefit: Improving Verbmobil's translation capabilities through an additional translation path
• Responsible: DFKI, Kaiserslautern
Dialog-Act Based Translation
• Task: Robustly provide a translation of core intentions and contents of the domain
• Input: Prosodically annotated best hypothesis (flat WHG)
• Method: Statistical dialog-act classifier and Finite State Transducers
• Result: Translation and a confidence value, additionally content descriptions for the dialog module
• Benefit: Robust translation and content extraction even when the recognition is erroneous
• Responsible: DFKI, Saarbrücken
Statistical Translation
• Task: Provide approximately correct translations
• Input: Prosodically annotated best hypothesis (flat WHG)
• Method: Use statistical language and translation models
• Result: Translation and a confidence value
• Benefit: Approximately correct translation for spontaneous speech
• Responsible: RWTH Aachen
Deep Translation
• Task: Provide high quality translations
• Input: Prosodically annotated WHG and contextual information
• Method: Use syntactic and semantic approaches to analysis, transfer, and generation
• Result: Translation containing content information, suited for high quality speech synthesis
• Benefit: Delivers the highest quality, but is sensitive to recognition errors and spontaneous speech phenomena
• Responsible: Siemens AG, DFKI Saarbrücken, Universität Tübingen, Universität des Saarlandes, Universität Stuttgart, TU Berlin, CSLI Stanford
Modules Involved
• Integrated processing comprises
– search through the WHG
– statistical parser
– chunk parser
• Semantic Construction: provides VITs from statistical and chunk parser output
• Deep Analysis: HPSG parser
• Dialog Semantics: combination of parsing results, and semantic resolution
• Transfer: VIT to VIT transfer
• Generation: TAG generation from VITs
• Dialog+Context: provides contextual information
The Multi-Parser Approach
• Verbmobil uses three different syntactic parsers: an HPSG parser, a chunk parser, and a probabilistic LR parser.
• Each parser offers a different level of parsing accuracy, depth of syntactic analysis, and robustness of the analysis process.
– Chunk parser: most robust but least accurate analysis
– HPSG parser: most accurate but least robust analysis
– Probabilistic parser: level of accuracy and robustness between HPSG and chunk parser
HPSG Processing
• Task: Thorough syntactic analysis
• Input: Word chains from integrated processing
• Method: Apply HPSG analysis
• Result: Source language VITs
• Benefit: Delivers the highest quality, but is sensitive to recognition errors and spontaneous speech phenomena
• Responsible: DFKI Saarbrücken, CSLI Stanford
The Result is a Syntactic Tree
“Alright, and that should get us there about nine in the evening.”
... but analysis is not always spanning
“The train arise at seven thirty. We could take a cab it to the hotel problem train station.”
Semantic Construction
• Task: Convert and extend syntax trees to VITs
• Input: Syntax tree from statistical and chunk parsers
• Method: Compositional construction using semantic lexicon
• Result: VITs
• Benefit: Providing results of shallow parser to the deep analysis track
• Responsible: Universität Stuttgart (IMS)
Schematic Processing
• Input: Syntactic tree
• Lexicon access and interpretation of the grammatical roles
• Intermediate representation: Application Tree
• Compositional semantic construction
• Intermediate representation: VIT
• Non-compositional semantic construction using transfer rule engine
• Intermediate representation: Resulting VIT
Dialog Semantics
• Task: Combining results from various parsers, reinterpreting and correcting VITs, and resolving non-local ambiguities
• Input: VITs from different parsers
• Method: VIT models and rule based approaches
• Result: VIT ready for transfer
• Benefit: Enhances robustness of deep analysis and provides vital information for transfer
• Responsible: Universität des Saarlandes, Saarbrücken
Combining Analyses from Various Parsers
• Parsers deliver VITs for segments of a turn
• These may be spanning analyses or just partial fragments
• Combination is necessary, both of analyses from a single parser and of analyses from various parsers
• Combination criteria
– HPSG is better than the statistical parser, which is better than the chunk parser
– Integrated results are better than fragments
– Longer results are better than short ones
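The three criteria can be read as a preference ordering over candidate analyses. The sketch below encodes them as a sort key. The slide does not say how the criteria are weighed against each other, so applying them lexicographically in the listed order is an assumption, as are the names `Analysis`, `preference_key`, and `rank`.

```python
from dataclasses import dataclass

PARSER_RANK = {"hpsg": 0, "statistical": 1, "chunk": 2}   # lower is better


@dataclass
class Analysis:
    parser: str      # "hpsg", "statistical", or "chunk"
    spanning: bool   # True for an integrated analysis, False for a fragment
    start: int       # index of the first covered word
    end: int         # index one past the last covered word


def preference_key(a):
    """Better parser first, then integrated results, then longer results."""
    return (PARSER_RANK[a.parser], not a.spanning, -(a.end - a.start))


def rank(analyses):
    """Return the analyses ordered from most to least preferred."""
    return sorted(analyses, key=preference_key)


candidates = [
    Analysis("chunk", True, 0, 8),
    Analysis("hpsg", False, 0, 3),
    Analysis("hpsg", False, 3, 8),
    Analysis("statistical", True, 0, 8),
]
for a in rank(candidates):
    print(a.parser, a.spanning, a.start, a.end)
```

A full combination step would additionally pick a set of non-overlapping analyses that covers the turn; the sketch only orders the candidates.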
Semantic Based Transfer
• Task: Transfer VITs from the source to the target language
• Input: VITs
• Method: Rule based transfer
• Result: VITs for generation
• Benefit: Translate VITs inside the deep translation path
• Responsible: Universität Stuttgart (IMS)
Context Evaluation
• Task: Resolving ambiguities in the dialog context during semantic transfer
• Input: Requests from transfer
• Method: Using world knowledge and rules
• Result: Disambiguated transfer requests
• Benefit: Higher quality of transfer results
• Responsible: Technical University (TU) Berlin
Dialog Processing
• Task: Provide dialog context for all tracks and compute main information for dialog summaries
• Input: Data from many modules
• Method: Frame-like topic structuring and rules
• Result: context information and dialog summaries and minutes
• Benefit: Verbmobil knows what happens throughout the dialog and can present it
• Responsible: DFKI, Saarbrücken
Dialog Information in Semantic Transfer
[Figure: data flow into Semantic Transfer]
• Probabilistic Analysis of Dialog Acts (HMM): provides the Dialog Act
• Recognition of Dialog Plans (Plan Operators): provides the Dialog Phase
• Syntactic Analysis → Robust Dialog Semantics → VIT → Semantic Transfer
The Intentional Structure
[Figure: tree for an example dialogue between speakers A and B, organized in five levels]
• Dialogue Level: VM_Dialogue
• Phase Level: PH_Greet, PH_Nego
• Game Level: G_Greet, G_Nego, G_Nego
• Move Level: M_Greet, M_Greet, M_Tr_Init, M_Init, M_Resp
• DA Level: Greet, Feedback, Pol_Form, Introduce, Request, Suggest, Reject
• Speaker: A, A, B, B
Collaboration for a New Functionality: Summaries
• Provide the users with a summary of the topics that were agreed on
• Two benefits
– have a piece of information to use in calendars etc.
– control the translation
• Approach: exploit already existing modules for
– content extraction
– dialog interpretation
– planning the summary
– generation
– transfer
Summaries
• Dialog module keeps track of the dialog: dialog model, context extraction, translations: dialog history
• Three types of documents:
– Minutes: relevant exchanges
– Summary: dialog results
– Scripts: complete dialog script
Multilingual Summaries
• Multilinguality: Integration of transfer module
[Figure: summary generation pipeline with the components Context, Syndialog, Dialog, VM-PROTO, GENGER, Transfer (GE), VM-PROTO, and GENENG; the document structure is passed on as VITs; the outputs are a German Summary (HTML) and an English Summary (HTML)]
Result Summary
Generation
• Task: Robustly generate the output of the semantic transfer in German, English, or Japanese
• Input: VITs from transfer
• Method: Constraint system for micro-planning, TAG grammar (reusing HPSG grammars) for syntactic realization
• Result: Strings, enriched with concept-to-speech (CTS) information to support synthesis
• Benefit: Output from the semantic transfer track
• Responsible: DFKI, Saarbrücken
Multiple Translation Tracks – Approx. correct translation
[Figure: bar chart of the values in the table below; vertical axis from 0 to 120]
Translation track        WA > 50%   WA > 75%   WA > 80%
case based               37         44         46
statistical              69         79         81
DA based                 40         45         46
Sem. based               40         47         49
Substring                65         75         79
Selection (Automatic)    57         66         68
Selection (Learning)     78         83         85
Selection (Manual)       88         95         97
Verbmobil – The Book
There are over 600 refereed papers on the various aspects of and achievements in Verbmobil.
Wolfgang Wahlster (ed.): "Verbmobil: Foundations of Speech-to-Speech Translation". Springer-Verlag Berlin Heidelberg New York. 679 pages. ISBN 3-540-67783-6
What is SmartKom?
http://smartkom.dfki.de
Reference Architecture for Multimodal Systems
[Figure: reference architecture diagram. Source: Dagstuhl Seminar "Fusion and Coordination in Multimodal Interaction", 2 Nov. 2001, edited by M. Maybury]
Components named in the diagram:
• User(s); Information, Applications, People
• Media Input Processing and Media Output Rendering: Language, Graphics, Gesture, Sound; Biometrics, User ID; Animated Presentation Agent
• Mode Analysis, Mode Coordination, Mode Design
• Multimodal Fusion, Multimodal Reference Resolution, Reference Resolution
• Interaction Management, Discourse Management, Intention Recognition, User Modeling, Context Management, Expectation Management, Action Planning
• Presentation Design: Select Content, Design, Allocate, Coordinate, Layout
• Application Interface: Initiate, Request, Respond, Integrate, Terminate
• Representation and Inference, States and Histories: User Model, Discourse Model, Domain Model, Task Model, Media Models, Application Models, Context Model
Situated Delegation-oriented Dialog Paradigm: Collaborative Problem Solving
[Figure: the user interacts with a Personalized Interaction Agent, which uses IT Services (Service 1, Service 2, Service 3)]
• User: specifies goal, delegates task
• User and agent: cooperate on problems
• Agent: asks questions, presents results
The Main Modules on the Control GUI
More About the System
• Modules realized as independent processes
• Not all must be there (critical path: speech or graphic input to speech or graphic output)
• (Mostly) independent from display size
• Pool Communication Architecture (PCA) based on PVM for Linux and NT
– Modules know only about their I/O pools
– Literature: Andreas Klüter, Alassane Ndiaye, Heinz Kirchmann: Verbmobil From a Software Engineering Point of View: System Design and Software Integration. In Wolfgang Wahlster (ed.): Verbmobil: Foundations of Speech-to-Speech Translation. Springer, 2000.
• Data exchanged using M3L documents
• All modules and pools are visualized here ...
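The idea that modules know only their input and output pools can be illustrated with a small publish/subscribe sketch. It is a toy in-process stand-in and does not use PVM or the real PCA interface; the classes `PoolBus` and `Module`, the pool names, and the use of plain strings instead of M3L documents are all assumptions.

```python
from collections import defaultdict


class PoolBus:
    """Toy in-process stand-in for a pool-based communication layer."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # pool name -> callbacks

    def subscribe(self, pool, callback):
        self._subscribers[pool].append(callback)

    def publish(self, pool, document):
        # Deliver the document to every module that reads from this pool.
        for callback in self._subscribers[pool]:
            callback(document)


class Module:
    """A module knows only the pool it reads from and the pool it writes to."""

    def __init__(self, bus, in_pool, out_pool, transform):
        self._bus = bus
        self._out_pool = out_pool
        self._transform = transform
        bus.subscribe(in_pool, self._on_input)

    def _on_input(self, document):
        self._bus.publish(self._out_pool, self._transform(document))


bus = PoolBus()
received = []
Module(bus, "speech.in", "words", str.lower)    # hypothetical recognizer stage
Module(bus, "words", "analysis", str.split)     # hypothetical analysis stage
bus.subscribe("analysis", received.append)      # hypothetical consumer
bus.publish("speech.in", "Hello World")
print(received)
```

Because no module refers to another module directly, a stage can be removed or replaced without touching the rest, which fits the remark that not all modules must be present.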
The Real Story
The “Glue” - M3L: XML based Multimodal Markup Language
[Figure: M3L and its three sources]
• Frame Languages: Object-oriented Modeling Primitives
• NL/MM-Semantics: More formal Semantics, Subsumption, Inferences
• W3C Standards: XML Schema/DTDs
• M3L: Domain Knowledge, NL/MM Representation
• Pools (Pool, Pool, Pool, ...) are shown together with XML schemas
Validation of Dialogue Systems
• Project ValDia (DFKI – DaimlerChrysler Ulm)
• Tool for validation of Dialogue Models/Managers (DM)
[Figure: dialogue system with the components ASR, Analysis, DM with Dialogue model, Database, Generator, and Synthesis; validation labeled Automatic and Manual]
Validation of DM
• Even slight changes can make test suites for DM invalid (but not for parser, recognizer, …)
• Put persons in front of the complete system
+ We will eventually find errors
- It is time consuming
- For some scenarios it is impossible to exhaustively validate a DM
- Which module failed to perform its task?
- Combination of errors?
- The whole system has to be put together
Validation of DM
• ValDia approach: Replace test person and I/O modules with ValDia
[Figure: the dialogue system (ASR, Analysis, DM with Dialogue model, Database, Generator, Synthesis) with ValDia in place of the test person and the I/O modules]
Experiences
• ValDia detects errors
• Logical:
– Combination of greet and request leads to a goal conflict in the DM; the DM hangs!
• Technical:
– After about 500 dialogues the DM crashed due to erroneous memory handling
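The general idea of replacing the test person with a program can be sketched as follows. This is not ValDia: `toy_dm` is a made-up dialogue manager with a planted goal conflict for the combination of greet and request, the act inventory is hypothetical, and a missing answer (`None`) stands in for a hang.

```python
import itertools

ACTS = ["greet", "request", "confirm", "bye"]   # hypothetical user dialogue acts


def toy_dm(acts):
    """Toy dialogue manager with a planted goal conflict: a turn that
    combines greet and request gets no answer (None)."""
    if "greet" in acts and "request" in acts:
        return None
    return "ok"


def validate(dm, max_acts_per_turn=2):
    """Feed every combination of up to max_acts_per_turn acts to the DM
    and collect the inputs for which it fails to respond or raises."""
    failures = []
    for n in range(1, max_acts_per_turn + 1):
        for combo in itertools.combinations(ACTS, n):
            try:
                if dm(combo) is None:
                    failures.append(combo)
            except Exception:
                failures.append(combo)
    return failures


print(validate(toy_dm))
```

Unlike a human test session, such a run can be repeated after every change to the DM and reports exactly which input triggered the failure.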
What is Scalability?
What is Scale (-able)?
• WordNet (1.6):
– Noun scaling has 3 senses
• (grading) the act of arranging in a graduated series
• act of measuring, arranging or adjusting according to a scale
• ascent by or as if by a ladder
– Verb scale has 8 senses
• measure by or as if by a scale; "This bike scales only 25 pounds"
• pattern, make, ... or estimate according to some rate or standard
• take by attacking with scaling ladders
• (surmount) -- reach the highest point of
• climb up by means of a ladder
• scale, descale -- remove the scales from; "scale fish"
• measure with or as if with scales; "scale the gold"
• size or measure according to a scale
Scaling what/how?
• Bigger
• Better
• Faster
• More robust
• Precision
• Coverage
• Multilinguality
• Cheaper
• Depth
Coverage
• Linguistic constructions
• Domain, Task, Application
• Sub-Languages, Type of Lang.
• Multilingual, Culture
• Interaction style
• Further dimensions shown: SIZE, Robustness, Depth, Speed
Who are we scaling for?
• EU
• NSF
• BMBF
• Industry
• ...
• Basic research: Research Prototypes
• Applied research / Product development: "Real" Systems
Experiences VerbMobil
• "Many" people have said:
– With 15-20 persons in one spot I would make a VerbMobil of my own. But muuuuuch better/cheaper/...
• This is not true!
– Software engineering
– Ex: Speech recognition
• -93:
– Single word recognition
– Push-to-talk
• -00:
– Open microphone
– Spontaneous Speech
The VerbMobil Corpus
• 3,200 dialogs (G: 1,454, E: 726, J: 1,020)
• 1,658 speakers (G: 1,022, E: 202, J: 434)
• 79,562 turns (G: 41,512, E: 16,104, J: 21,946)
• 1,520,000 running words (G: 670,000, E: 270,000, J: 580,000)
• 181.6 hours were recorded (G: 96.1, E: 37.9, J: 47.7)
• The dialogs were recorded using
– a close microphone,
– a room microphone and
– a telephone
The VerbMobil Corpus
• transcribed and distributed on
– 56 CDs (21.5 GB)
• Analyzing the corpus:
– 206,000 instances of articulatory background noise,
– 85,000 instances of breathing and
– 35,000 hesitations
• voiced: 19,000,
• nasal: 2,500,
• vocalic-nasalized: 13,500
• The Verbmobil data are distributed to research or commercial users via the Bavarian Archive of Speech Signals (BAS)
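Experiences from WOZ
GER142: danach könnten wir gemeinsam Abendessen gehen [afterwards we could go for dinner together]
SIM143: Bitte wiederholen Sie Ihre Äußerung. Es ist ein Fehler in der semantischen Verarbeitung aufgetreten [Please repeat your utterance. An error occurred in semantic processing.]
GER144: oh, danach könnten wir gemeinsam abendessen [oh, afterwards we could have dinner together]
SIM145: Bitte wiederholen Sie Ihre Äußerung mit anderen Wörtern. Die semantische Verarbeitung war nicht erfolgreich [Please repeat your utterance using different words. Semantic processing was not successful.]
GER146: äh, okay [uh, okay]
ENG147: maybe a bit louder ?
GER148: yes , I invite you for the dinner.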
Development HPSG
• Starting point: HPSG for written G/E
• Goal: 10,000 lexical entries for spont. spoken G/E
• Schemata: 20-40
[Figure: chart with a vertical axis from 0 to 12,000 and the time points -93, -96 (V1.0), -98, -00]
Development HPSG
• What factors contributed to progress?
– Getting to know the challenge
• Spontaneous/Spoken vs. Written Language
– Finding a Suitable Formalism
– Tools
– Interface
• Verbmobil Interface Term (VIT)
– Compilation Techniques
– Test Suites
– Corpora
Well Defined Interfaces
• Speech Recognition – Linguistic Modules:
– Word Hypothesis Graph (WHG)
• Between (deep) Linguistic Modules
– VerbMobil Interface Term (VIT)
• Linguistic Modules – Synthesizer
– Annotated String (Concept-to-Speech)
Support from the System Group (3): The FTP Server
• Development at different sites
• Communication via Email and FTP Server:
– UPLOAD
• Software for integration
– EXCHANGE
• Exchanging software between developers
– ALPHA Service
• New integrated complete system
Important Contributions
• Multiple approaches
• Management
• Meetings
– Project meetings, Workshops, ...
• Corpus collection - Massive amounts of data for
– Testing, Linguistic Phenomena, Annotation
• System Group
– Test bed, Integration Cycles, ...
• Time
• The Internet
• ...
Conclusion
• We still need:
– a lot of manpower:
• Researchers
• Software engineers
• Management
– a lot of data to:
• annotate
• learn from
• All this costs a lot of $/€
• The Holy Grail of NLP (too?): self-learning systems
Thank you very much for your attention!