Computational Linguistics Emily M. Bender Linguistics 200 March 9, 2007

Overview

• Introduction

• Computational linguistics in everyday life

• Web demos

• How linguistics fits in

• Who’s hiring

• So you want to be a computational linguist...

What is computational linguistics?

• Processing of human language by computers

• ... for linguistic research

• ... for practical applications

• Also known as natural language processing, speech processing

• Point of contact of Linguistics, Computer Science (AI), and Electrical Engineering (signal processing)

How does a spell checker work?

• Compares input text to a dictionary (+ morphological analyzer) to detect non-words

• Runs error types in reverse (insertion, deletion, transposition, substitution) to come up with candidate corrections

• Compares candidate corrections to dictionary to find viable alternatives

• Ranks candidate corrections according to probability (frequency of that word in context)

• What about irregular morphology?

• What about spelling mistakes which result in other actual words (e.g., three/there, stationery/stationary)?
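The four steps above can be sketched in a few lines. This is a minimal illustration, not a description of any particular product. It assumes a tiny hypothetical word-count table (`WORD_COUNTS`) in place of a real dictionary and morphological analyzer. It also ranks candidates by overall word frequency only, ignoring context, so it cannot catch the real-word errors mentioned in the last bullet.

```python
from collections import Counter

# Hypothetical toy counts standing in for a dictionary plus frequency data;
# a real checker would estimate these from a large corpus.
WORD_COUNTS = Counter({
    "the": 500, "there": 120, "three": 40, "their": 90,
    "spelling": 5, "checker": 3, "word": 60, "world": 70,
})

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word: str) -> set[str]:
    """Undo one error: every string one deletion, transposition,
    substitution, or insertion away from the typed word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    substitutes = [left + c + right[1:]
                   for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + substitutes + inserts)


def is_nonword(word: str) -> bool:
    """Step 1: a word missing from the dictionary is flagged as an error."""
    return word not in WORD_COUNTS


def suggest(word: str) -> list[str]:
    """Steps 2-4: generate candidates, keep real words, rank by frequency."""
    if not is_nonword(word):
        return [word]
    candidates = {w for w in edits1(word) if w in WORD_COUNTS}
    return sorted(candidates, key=lambda w: WORD_COUNTS[w], reverse=True)


print(suggest("wrod"))  # ['word']
print(suggest("thre"))  # ['the', 'there', 'three']
```

For "thre", undoing a possible insertion gives "the", and undoing possible deletions gives "there" and "three". Frequency alone then picks "the", even where context would favor "there" or "three".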

Noisy channel model

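|              | Spell check               | ASR             | MT        |
|--------------|---------------------------|-----------------|-----------|
| Input I      | Word seq                  | Word seq        | English   |
| Output O     | Word seq (with mistakes)  | Acoustic signal | French    |
| Noise        | Mistakes                  | Noise           | Babelfish |
| Target       | Eng. words                | Eng. words      | English   |
| Est. input Î | Corrected words           | Text            | English   |

$\hat{i} = \arg\max_{i} p(i \mid o) = \arg\max_{i} p(o \mid i)\, p(i)$

(By Bayes' rule, p(i | o) = p(o | i) p(i) / p(o); since p(o) is the same for every candidate i, it can be dropped from the argmax.)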
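To make the decision rule concrete, here is a small sketch for a spelling error. The observed typo "acress" and all probabilities below are made-up illustrative values, not estimates from real data. `PRIOR` plays the role of p(i) (the language model), and `CHANNEL` plays the role of p(o | i) (the error model).

```python
# Hypothetical numbers for one observed typo o = "acress"; illustrative only.
# Prior p(i): how likely each candidate intended word is (language model).
PRIOR = {"actress": 0.00010, "across": 0.00030, "acres": 0.00020}
# Channel model p(o | i): how likely "acress" is as a garbling of each i.
CHANNEL = {"actress": 0.00010, "across": 0.000005, "acres": 0.00002}


def decode(prior, channel):
    """Noisy-channel decoding: argmax over i of p(o | i) * p(i)."""
    return max(prior, key=lambda i: channel[i] * prior[i])


def posterior(prior, channel):
    """p(i | o) by Bayes' rule, normalizing over this candidate set only."""
    joint = {i: channel[i] * prior[i] for i in prior}
    p_o = sum(joint.values())  # p(o), restricted to these candidates
    return {i: p / p_o for i, p in joint.items()}


print(decode(PRIOR, CHANNEL))      # actress
print(max(PRIOR, key=PRIOR.get))   # across (prior alone)
```

With these made-up numbers, the prior alone would choose "across", but the channel model tips the decision to "actress". Normalizing by p(o) in `posterior` rescales the scores without changing which candidate wins, which is why the decoder can skip it.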

Linguistic Knowledge and Machine Learning

• Machine learning

• Design a probabilistic model (the ‘p’s on the previous slide)

• Estimate the probabilities for the model from some data

• Use the model to predict labels/values/etc. for new data

• Supervised: learner is given training data with target labels included

• Semi-supervised: learner is given a little bit of labeled data and a lot of unlabeled data

• Unsupervised: learner is only given unlabeled data (but lots!)
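A minimal supervised example of the design/estimate/predict cycle: estimate p(tag | word) from labeled pairs by relative frequency, then label new words with the most probable tag. The training pairs, tag names, and the `"NOUN"` fallback for unseen words are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical labeled training data: (word, part-of-speech tag) pairs.
TRAINING = [
    ("the", "DET"), ("dog", "NOUN"), ("runs", "VERB"),
    ("the", "DET"), ("runs", "NOUN"), ("runs", "VERB"),
    ("a", "DET"), ("dog", "NOUN"), ("barks", "VERB"),
]


def train(pairs):
    """Estimate p(tag | word) by relative frequency (maximum likelihood)."""
    counts = defaultdict(Counter)
    for word, tag in pairs:
        counts[word][tag] += 1
    model = {}
    for word, tag_counts in counts.items():
        total = sum(tag_counts.values())
        model[word] = {tag: n / total for tag, n in tag_counts.items()}
    return model


def predict(model, word, default="NOUN"):
    """Label a new token with its most probable tag; unseen words get a fallback."""
    if word not in model:
        return default
    dist = model[word]
    return max(dist, key=dist.get)


model = train(TRAINING)
print(model["runs"])           # {'VERB': 0.666..., 'NOUN': 0.333...}
print(predict(model, "runs"))  # VERB
print(predict(model, "cat"))   # NOUN (unseen, fallback)
```

Because every training example carries its target label, this is supervised learning. A semi-supervised or unsupervised learner would have to infer some or all of these tags itself.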

Linguistic Knowledge and Machine Learning

• What part of a spell-checker needs to be hand engineered?

• What part is learned by the machine?

• What data is used in the machine learning?

• Is the learning supervised, unsupervised, or semi-supervised?

Linguistic Knowledge and Machine Learning: Some history

• 1950s-1980s: Rule-based approaches

• 1990s: Machine learning/statistical revolution

• Now:

• Ceiling effects for machine learning;

• recognition that best solutions will combine both linguistic knowledge and machine learning;

• search for best hybridizations

Computational linguistics in everyday life

• What NLP technology do you use?

• Spell checkers

• Grammar checkers

• Menu-based phone systems (with and without ASR)

• Voice-activated cell phones

• Search engines

• In-car navigation systems

• Google Calendar

• Context-sensitive ads

Web demos

• English Resource Grammar (DELPH-IN): Broad-coverage precision grammar http://lingo.stanford.edu:8000/erg

• Grammar Matrix (UW/DELPH-IN): Multilingual grammar engineering http://www.delph-in.net/matrix/customize/matrix.cgi

• Text-to-speech system (Oddcast) http://www.oddcast.com/home/demos/tts/frameset.php?frame1=talk

• Machine translation (Babelfish) http://babelfish.altavista.com

Web demos continued

• KnowItAll (UW Turing Center): Autonomous, scalable information extraction from the web http://knowitall-1.cs.washington.edu/dbinterface/knowitall2/default.asp

• Cross-lingual image search (UW Turing Center): http://knowitall-3.cs.washington.edu/panimages/

• Jabberwacky, a chatbot: http://www.jabberwacky.com

All levels of linguistic analysis have a role to play

• Phonetics, phonology:

• Text-to-speech, speech recognition

• Morphology:

• Spell checkers, plus support for all tasks at “higher” levels

• Syntax:

• Natural language understanding, generation

• Semantics:

• NLU, generation, reasoning, inference

• Pragmatics:

• Dialogue management, anaphora resolution, menu-based systems

Computational linguistics: Common subtasks

• Language identification

• Sentence tokenization

• Word-level tokenization

• Lemmatization

• Morphological analysis

• Syntactic parsing

• Word-sense disambiguation

• Named-entity recognition

• Sentence- and word-level alignment of parallel bitexts

• Reference resolution

• Dialogue management

• Generation (strategic, tactical)

• Disambiguation
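Two of these subtasks, sentence tokenization and word-level tokenization, can be roughly sketched with regular expressions. These rules are deliberately naive assumptions for illustration: a sentence ends at ".", "!" or "?" followed by whitespace and a capital letter, and a word is a run of word characters (with internal apostrophes) or a single punctuation mark.

```python
import re

# Naive rule: split after . ! or ? when followed by whitespace and a capital.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")
# Naive rule: word characters with optional internal apostrophes,
# or any single non-space punctuation character.
WORD = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def split_sentences(text):
    """Split text into sentences using the punctuation-plus-capital rule."""
    return [s for s in SENTENCE_BOUNDARY.split(text.strip()) if s]


def tokenize(sentence):
    """Split one sentence into word and punctuation tokens."""
    return WORD.findall(sentence)


print(split_sentences("I saw Dr. Smith. He waved!"))
# ['I saw Dr.', 'Smith.', 'He waved!']
print(tokenize("Don't panic, it's fine."))
# ["Don't", 'panic', ',', "it's", 'fine', '.']
```

The first output shows why this is a real subtask and not a solved one. The abbreviation "Dr." wrongly triggers a sentence break, so practical tokenizers need abbreviation lists or a learned model.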

Lots of local and national employers need computational linguists!

• Microsoft

• Google

• Amazon.com

• AOL/Tegic

• Adapx

• InXight Software

• PARC

• VoiceBox

• Cataphora

• LCC

• SYSTRAN

• Nuance

http://depts.washington.edu/uwcl/twiki/bin/view.cgi/Main/JobList

There are many ways to be a computational linguist

• Most flexible option: Training in both Computer Science and Linguistics

• To prepare for UW’s Professional Master’s in Computational Linguistics

• Ling 200

• CSE 142, 143, 373

• Stat 391

• To learn more:

• Ling/CSE 472 (prereq Ling 461 or CSE 326 + Ling 200)

• http://compling.washington.edu

• Linguistics colloquia, especially MS/UW Symposium

Thank you!