Slides: ssli.ee.washington.edu/WS07/notes/ling-intro-slides.pdf

Big questions
Survey of areas of linguistics
Summary
The lab

Linguistics in a nutshell
by hook or by crook

Jeremy G. Kahn

Signal, Speech & Language Interpretation Laboratory
Department of Linguistics
University of Washington

22 June 2008 / Workshop 2007

Kahn Linguistics brushup

Outline

1 Linguistics: big questions
    Linguistics charter
    What linguists look at
    Linguistics’ role

2 Survey of areas of linguistics
    Phonetics & Phonology
    Morphology and syntax
    Semantics

3 The lab

Business information

Linguistics introductions

By necessity, incomplete

Apologies:
    my personal speaking style
    guessing about level of preparation

Caveat: I’m a computational linguist

Caveat: I have an engineering bias

Goal: informality. Questions are good

Thanks to Don Baumer (Linguistics) for letting me crib slides & examples

What is linguistics?

Scientific study of human language

How is language organized?

How is it used?

General questions about Language (capital L)

What do all languages have in common?

How can we describe how Language (or languages) works?

How can we describe how a language works?

Language & communication

All communications have:

mode or medium : speech, gesture, olfaction, etc

semanticity : meaning carried

pragmatic function : intention carried

some also have:

interchangeability (send *and* receive)

cultural transmission : learned from other users

arbitrariness : non-iconicity

discreteness : "compositionality"

displacement : discuss things that aren’t here

productivity : new ways to organize it

Where do computer languages differ from human languages?

What makes language interesting?

Language is creative, but constrained

“Seattle is rainy.” – well-formed
* “rainy Seattle is.” – ill-formed
“I like caffeinated drinks without bubbles.”
* “Bubbles without drinks caffeinated like I”

Not just word order:

“pronk” could be an English word (in fact, it is)

“przak” could not be (how do you know?)

Constraint and creativity

Linguists like to say language is “rule-governed”.

Statistically-minded engineers might quibble...

Engineering way of looking at it (thanks Shannon):

sender wants to have symbol for every idea

recipient won’t have those symbols

compositionality and productivity allow novelty and communication

Language as a part of the human OS

Language:

not literacy.

major advantage over chimpanzees (e.g. displacement)

we’ve got specialist wetware

Competent language use

No school required

No explicit instruction required

Most humans competent in one language before age 3

What do we mean when we say “competent”?

Competence and Performance

Big idea in modern linguistics

Competence : what a native user of a language knows.
    ability to produce & comprehend language
    system or knowledge (“grammar”) that supports that
    largely subconscious
    learned (first-language) without effort

Performance : what language users do
    often fully competent
    not always: speech errors, typos, “brain-o’s”

What’s so neat about competence?

Many modern linguists care about competence more than performance.

Their view (Chomsky):

your competence is a window on the underlying structure of your grammar

your performance includes a bunch of messy wetware

These (self-proclaimed “theoretical”) linguists are very, very interested in trying to figure out what the OS is from the behavior of the code.

Grammaticality and meaningfulness

“Meaningful” and “grammatical” not synonymous:

Grammatical, but meaningless : ‘Colorless green ideas sleep furiously.’ – Noam Chomsky

Ungrammatical, but meaningful : ‘Around the survivors, a perimeter create.’ – Yoda, Episode 2

What’s all this about grammar, then?

Descriptive grammar : an attempt to describe the acceptability judgments (or patterns of use/competence) of a speaker.

Prescriptive grammar : explicit instructions on how one should write (or speak); the language police.

Linguistics is not about prescriptive grammar.

We don’t tell you how you should.

We try to describe how you do.

Dogma: All human languages, stigmatized or not, are equally expressive.

Linguistics and semi-supervised learning

Humans do it : We get very little explicit labeling of our language data, yet we learn without instruction:
    what words and parts of words mean
    how to pronounce words we read
    how to understand sophisticated sentence constructions (“respectively”)
    and more...

It’s not all hard-coded (“universal grammar”): patterns often language-specific

The corpora are out there :
    the web
    email (Enron emails!)
    newsgroups

also speech corpora:
    radio
    television
    podcasts

All mostly unlabeled but enormous

Natural language problems: perfect for semi-supervised work.

Overview of the different parts of language

Overview of the different parts of language (different parts of "grammar")

Phonetics - how sounds are made and perceived

Phonology - function and patterning of sounds

Morphology - structure of words

Syntax - analysis of sentence structure (word order)

Semantics - meaning (words to meaning)

Other areas of linguistic study

Other areas of linguistic study:

Historical linguistics - language evolution and creation

Pragmatics - what else is intended and performed

Typology - language classification and differences

Psycholinguistics - neurobiological basis for language

Language acquisition

Sociolinguistics - language’s influence on and indication of social status and behavior

Writing systems - ... a mess

and more...

We’ll not cover those here

Phonetics

Phonetics: the study of linguistic speech sounds
    articulatory
    auditory (perceptual)
    acoustic

Problems phonetics works with:
    no "spaces" between words: but we perceive them
    sounds are in a continuous (acoustic) space, but we chunk them into the (discrete) space of the language’s segments

Tools phoneticians use:
    spectrogram readers
    human listening
    transcription system (usually the International Phonetic Alphabet, IPA)

Why use IPA?

Spelling is not pronunciation

Probably obvious to non-native English speakers

Some languages have cleaner spelling-sound relationships (Spanish, Korean), but:

Even a “clean” alphabetic language (e.g. Spanish) doesn’t have a 1:1 relationship between characters and phonetic segments:
    “corazon” and “quesadilla” have the same initial sound

English is alphabetic, but with even noisier mappings
    “this” vs. “thought”
    English voicing of interdental (tongue-between-teeth) fricative: not represented in orthography ever.

This is why we use IPA.

More on phonetics

Lots more available on phonetics:

articulatory names (parts of the speech system)

classification system

learning the IPA

“supra-segmentals”: articulations across multiple segments (e.g., pitch shapes)

... and still not even touching the perceptual or acoustic domain

Phonology

Phonology:

Study of inventory of sounds in a language

How sounds pattern together or contrast

Minimal pair (research tool):

‘had’ vs. ‘hat’ : /t/ and /d/ are contrastive in English

‘steel’ vs. ‘stale’ vs. ‘stool’ : /i/, /e/, /u/ are contrastive

Contrastive sounds are phonemes: minimal units of sound that distinguish one word from another
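
The minimal-pair test can be sketched in a few lines of Python. This is only an illustration: the transcriptions in `LEXICON` are simplified broad transcriptions assumed for the example, and the function names are made up here.

```python
from itertools import combinations

# Hypothetical toy lexicon: word -> tuple of phoneme symbols (simplified
# broad transcriptions chosen for illustration, not taken from the slides).
LEXICON = {
    "had": ("h", "æ", "d"),
    "hat": ("h", "æ", "t"),
    "steel": ("s", "t", "i", "l"),
    "stale": ("s", "t", "e", "l"),
    "stool": ("s", "t", "u", "l"),
}


def minimal_pairs(lexicon):
    """Return (word_a, word_b, sound_a, sound_b) for every pair of words whose
    transcriptions have equal length and differ in exactly one segment."""
    pairs = []
    for (w1, p1), (w2, p2) in combinations(sorted(lexicon.items()), 2):
        if len(p1) != len(p2):
            continue
        diffs = [(a, b) for a, b in zip(p1, p2) if a != b]
        if len(diffs) == 1:
            pairs.append((w1, w2, diffs[0][0], diffs[0][1]))
    return pairs


def contrastive_sounds(lexicon):
    """Collect the unordered sound pairs that some minimal pair shows to contrast."""
    return {frozenset((a, b)) for _, _, a, b in minimal_pairs(lexicon)}
```

Calling `minimal_pairs(LEXICON)` on this toy data finds ‘had’/‘hat’ plus the three pairings among ‘steel’, ‘stale’ and ‘stool’. A real inventory study would need careful transcriptions and far more words.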

Phonology (2)

Complementary distribution: two sounds appear in consistently different environments (never the same).

[pʰ] ‘pit’

[p] ‘spit’

[pʰ], [p] not phonemically different: allophones of /p/

Glossing over much more in phonology...
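
A rough sketch of how complementary distribution could be checked over a list of observations. The data and names below are invented for illustration: only the preceding context is recorded, `#` marks a word boundary, and `ph` stands for the aspirated stop.

```python
from collections import defaultdict

# Hypothetical observations: (phone, preceding context, word). "#" marks a
# word boundary; "ph" stands for the aspirated stop. Toy data for illustration.
OBSERVATIONS = [
    ("ph", "#", "pit"),
    ("ph", "#", "pat"),
    ("p", "s", "spit"),
    ("p", "s", "spat"),
    ("t", "#", "tip"),
    ("d", "#", "dip"),
]


def environments(observations):
    """Map each phone to the set of preceding contexts it was observed in."""
    envs = defaultdict(set)
    for phone, left_context, _word in observations:
        envs[phone].add(left_context)
    return envs


def in_complementary_distribution(phone_a, phone_b, observations):
    """True if both phones were observed and never share an environment."""
    envs = environments(observations)
    if not envs.get(phone_a) or not envs.get(phone_b):
        return False  # no evidence either way
    return envs[phone_a].isdisjoint(envs[phone_b])
```

On this data the aspirated and plain stops never share an environment, while `t` and `d` do. Non-overlapping environments are only part of the story: phonologists usually also expect allophones to be phonetically similar.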

An aside for the deaf

Sign languages (e.g. American Sign Language) have phonology as well.

Handshapes and gestures are essentially phonemic

Different sign languages have different choices about how to cluster handshapes: different phonemes

I am not an expert, but I know it’s an open research area.

Morphology

Morphology is:

    the study of words
    the rules (patterns) of word formation

Word : a minimal free form. Can appear
    in isolation
    in multiple positions

“The hunter pursued the bears.”
    is “-er” a word? No. (constrained after “hunt”)
    is “the hunter” a word? No. (not minimal)
    wait: what is “-er” then?

Morpheme : the smallest part of a word carrying meaning

Some morphemes can’t stand alone (affixes): (prefix, suffix, infix, circumfix)

Syntax

Lexicon : a dictionary (form and category)

Lexical category : (also “content word”). “Open class”, e.g.
    Noun (rabbit, bicycle)
    Verb (die, love, walk)
    Adjective (red, tall, frivolous)
    Adverb (often, very)

Grammatical category : (also “function word”). “Closed class”, e.g.
    Preposition (with, on, of, for)
    Conjunction (and, or, because)
    Determiner (our, the, this, many)
    Auxiliary (will, can, may)

Some words are ambiguous (especially open-class). Consider “comb”.

How to tell what category it is? Some examples:

meaning : acting as a person/place/thing? probably NOUN

inflection : if you can add ‘-ed’ or ‘-ing’ to it? probably VERB

distribution : if it appears after a degree word (e.g. “very”): probably ADJ

(Computational linguistics: “part-of-speech tagging”)

Morphology ties to syntax.

Back to morphology

Nope, not done:

Not just words in the lexicon: also morphemes:

closed-class (function) morphemes :
    prepositions & articles (function words)
    inflectional morphemes: don’t change class

open-class morphemes :
    usually stand-alone (nouns, verbs, etc.)
    also ‘-ly’, ‘-er’, ‘anti-’: derivational morphemes (may change class of stem)

Word formation in English

Inflectional morphemes (no class change)
    -s      third person singular present
    -ed     past tense
    -ing    progressive
    -en     past participle
    -s      plural
    -’s     possessive
    -er     comparative
    -est    superlative

Derivational affixes (class change)
    input                     result
    happy [adj] + -ness       happiness [n]
    beauty [n] + -ful         beautiful [adj]
    beautiful [adj] + -ly     beautifully [adv]
    stable [adj] + -ize       stabilize [v]
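
The class-changing behavior of derivational suffixes can be modeled as a small lookup table. The sketch below tracks only the word class; it ignores spelling changes (happy, happi-) and treats every inflectional suffix as class-preserving. The function and table names are invented for this example.

```python
# Suffix -> (class it attaches to, class it produces), following the
# derivational examples on the slide. Spelling changes are ignored here.
DERIVATIONAL = {
    "-ness": ("adj", "n"),
    "-ful": ("n", "adj"),
    "-ly": ("adj", "adv"),
    "-ize": ("adj", "v"),
}

# Inflectional suffixes keep the class of the stem (a subset of the slide's list).
INFLECTIONAL = {"-s", "-ed", "-ing", "-est"}


def derive_class(stem_class, suffixes):
    """Return the word class after attaching suffixes left to right.

    Raises ValueError if a derivational suffix is attached to the wrong class.
    """
    current = stem_class
    for suffix in suffixes:
        if suffix in INFLECTIONAL:
            continue  # inflection: no class change
        if suffix not in DERIVATIONAL:
            raise ValueError(f"unknown suffix {suffix}")
        needs, gives = DERIVATIONAL[suffix]
        if current != needs:
            raise ValueError(f"{suffix} attaches to {needs}, not {current}")
        current = gives
    return current
```

For instance, `derive_class("n", ["-ful", "-ly"])` walks beauty [n] to beautiful [adj] to beautifully [adv]. Attaching `-ly` directly to a noun raises an error in this toy model, although real English is less tidy.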

Subtleties in morphology

Perverse cases, even in English:

recursive-ish morphology:
    input                         result
    beauty [n] + -ful + -ness     beautifulness [n]

English has roughly one (rather rude, emphatic) infix:
    input                         result
    -****ing- + Massachusetts     "Massa-****ing-chusetts"

Comp ling task : stemming, morphological analysis (very important in other languages, e.g. Czech)

Back to syntax

Review: some words are ambiguous (“comb”): what to do?

meaning

inflection

distribution

Distribution could be a lot:

Constituent : grammatical unit; part of larger unit
    sentence = noun phrase (NP) + verb phrase (VP)
    noun phrase (NP) = determiner + noun
    noun is a (minimal) constituent

Note recursion is possible.

Phrases and ambiguity

How does phrase structure help with ambiguity?

[S [NP [Det the] [N men]] [VP [V comb] [NP [Det their] [N hair]]]]

[S [NP [Det the] [N men]] [VP [V share] [NP [Det a] [N comb]]]]

Note that structure resolves lexical ambiguity: whether “comb” is noun or verb
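
To see phrase structure doing the disambiguation, here is a minimal chart-parser (CKY) sketch. The three rules and the toy lexicon are assumptions chosen to cover just the two example sentences. “comb” is listed as both N and V, and only one of the two tags leads to a complete sentence.

```python
# Toy grammar with binary rules only: (left child, right child) -> parent.
BINARY_RULES = {
    ("NP", "VP"): "S",
    ("Det", "N"): "NP",
    ("V", "NP"): "VP",
}

# Toy lexicon (an assumption for this sketch); "comb" is lexically ambiguous.
LEXICON = {
    "the": {"Det"}, "their": {"Det"}, "a": {"Det"},
    "men": {"N"}, "hair": {"N"},
    "share": {"V"},
    "comb": {"N", "V"},
}


def parse(words):
    """CKY: chart[(i, j)] maps a category to the trees spanning words[i:j]."""
    n = len(words)
    chart = {}
    for i, word in enumerate(words):
        chart[(i, i + 1)] = {tag: [(tag, word)] for tag in LEXICON[word]}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = {}
            for k in range(i + 1, j):
                for left_cat, left_trees in chart[(i, k)].items():
                    for right_cat, right_trees in chart[(k, j)].items():
                        parent = BINARY_RULES.get((left_cat, right_cat))
                        if parent is None:
                            continue
                        for lt in left_trees:
                            for rt in right_trees:
                                cell.setdefault(parent, []).append((parent, lt, rt))
            chart[(i, j)] = cell
    return chart[(0, n)].get("S", [])


def tags(tree):
    """Read the (word, tag) pairs off the leaves of a tree."""
    if len(tree) == 2:  # leaf: (tag, word)
        return [(tree[1], tree[0])]
    return tags(tree[1]) + tags(tree[2])
```

Each sentence gets exactly one parse under this grammar. Reading the tags off that parse gives V for “comb” in “the men comb their hair” and N in “the men share a comb”.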

Syntax and structural ambiguity

Another kind of ambiguity:

The woman shot the man with the gun.

Who has the gun? (she shot him with it):

[S [NP The woman] [VP [V shot] [NP [Det the] [N man]] [PP with the gun]]]

Who has the gun? (he had it):

[S [NP The woman] [VP [V shot] [NP [Det the] [N man] [PP with the gun]]]]

No ambiguity about the meaning of any word

two different kinds of attachment for “with the gun”

PP attachment? messy. POS? fairly easy.
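
The two attachments can be enumerated mechanically. The sketch below is a small chart parser over an assumed toy grammar. It uses binary rules (NP → NP PP, VP → VP PP) for convenience, so its trees are slightly deeper than the flat ones drawn on the slides; the grammar, lexicon and function name are all illustrative.

```python
from collections import defaultdict

# Toy grammar with binary rules only. The two PP rules are where the
# ambiguity comes from: a PP may attach to a VP or to an NP.
RULES = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # "the man with the gun"
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # "shot ... with the gun"
    ("PP", "P", "NP"),
]

# One tag per word: no lexical ambiguity in this sentence.
LEXICON = {"the": "Det", "woman": "N", "man": "N", "gun": "N",
           "shot": "V", "with": "P"}


def parses(words):
    """Return every S analysis of words as a bracketed string (CKY chart)."""
    n = len(words)
    chart = defaultdict(lambda: defaultdict(list))
    for i, w in enumerate(words):
        tag = LEXICON[w]
        chart[(i, i + 1)][tag].append(f"[{tag} {w}]")
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, left, right in RULES:
                    for lt in chart[(i, k)][left]:
                        for rt in chart[(k, j)][right]:
                            chart[(i, j)][parent].append(f"[{parent} {lt} {rt}]")
    return chart[(0, n)]["S"]
```

`parses("the woman shot the man with the gun".split())` yields two bracketings, one per attachment, even though every word has a single tag. Choosing between them takes information the grammar does not have, which is one reason PP attachment is messier than POS tagging.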

Semantics

Two major areas within the study of language meaning:

Lexical semantics : meaning of individual morphemes

Compositional semantics : (or “phrasal semantics”): how meaning gets built up from pieces

Lexical semantics

synonymy : “means (almost) the same thing”: (angry, mad), (vomit, puke)

homonymy : “same form, unrelated meanings”: (pass [abstain], pass [succeed])

antonymy : “opposite meaning”

hyponymy (hypernymy) : A is a hyponym of B (A is a special case of B; B is a hypernym of A; B is a generalization of A)
    poodle ; dog ; animal
    sprint ; run ; move
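
Hyponymy is transitive, which a small lookup makes concrete. The one-parent-per-word table below is a simplification assumed for this sketch (real taxonomies allow several hypernyms per word), and the names are illustrative.

```python
# Toy taxonomy built from the two chains above: each word points to its
# immediate hypernym. One parent per word is a simplifying assumption.
HYPERNYM = {
    "poodle": "dog",
    "dog": "animal",
    "sprint": "run",
    "run": "move",
}


def hypernyms(word):
    """Return the chain of increasingly general terms above word."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def is_hyponym(a, b):
    """True if a is a special case of b (directly or transitively)."""
    return b in hypernyms(a)
```

Here `is_hyponym("poodle", "animal")` holds through “dog”, while `is_hyponym("dog", "poodle")` does not: the relation runs in one direction only.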

Compositional semantics

sense (intension) : the meaning of a word/phrase as a function (e.g., “rabbit” is a function from items to boolean value)

reference (extension) : which thing(s) in the world the function (word, phrase) picks out (the set of rabbits)

Example:

“Jeremy”

“today’s linguistics tutor”

Same reference (extension), different sense
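
The function view of sense translates almost directly into code. In this sketch a sense is a Python function from an item to a boolean, and the extension is the set of items it accepts. The little `WORLD` and its inhabitants are invented for illustration.

```python
# A tiny hypothetical "world" of individuals, invented for illustration.
WORLD = [
    {"name": "Jeremy", "kind": "human", "tutor_today": True},
    {"name": "Flopsy", "kind": "rabbit", "tutor_today": False},
    {"name": "Mopsy", "kind": "rabbit", "tutor_today": False},
]


# Senses (intensions): functions from an item to a boolean.
def rabbit(item):
    return item["kind"] == "rabbit"


def jeremy(item):
    return item["name"] == "Jeremy"


def todays_linguistics_tutor(item):
    return item["tutor_today"]


def extension(sense, world):
    """Reference (extension): the set of things the sense picks out."""
    return {item["name"] for item in world if sense(item)}
```

In `WORLD`, `jeremy` and `todays_linguistics_tutor` have the same extension but are different functions. Evaluate them in a world where someone else is tutoring and the extensions come apart, which is the sense in which the senses differ.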

Dealing with sentences. Sentences are boolean functions on the universe.

“I like cheese”

“I live in Seattle”

Same reference (TRUE), different sense (different function).

Summarizing

Lots of areas of linguistic research.

Most of these are becoming approachable computationally

None are very easy

But:

these represent what linguists think is going on in natural language

not necessarily what is needed: these classes may not relate to task at hand in computation

Emotion detection task, revisited

What can we add to the emotion detection task?

Class together words (let’s use POS)

sequence of classes might be interesting
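
One simple way to expose the sequence of classes to a classifier is to turn the tag sequence into n-grams. The helper below is a sketch with an invented name; the example tags are Penn Treebank style, which is an assumption about the tagger's tagset.

```python
def tag_ngrams(tags, n=2):
    """Return the n-grams over a POS tag sequence, each joined with '_'."""
    return ["_".join(tags[i:i + n]) for i in range(len(tags) - n + 1)]


# Penn Treebank style tags for "the men comb their hair" (assumed tagset).
example = ["DT", "NNS", "VBP", "PRP$", "NN"]
bigrams = tag_ngrams(example)
```

Here `bigrams` holds four tag pairs, starting with `DT_NNS`. Whether bigrams, trigrams or something else helps the emotion detection task is an empirical question.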

The lab

1 Read the datafiles; extract text, write out datafile.tok

2 Invoke the Ratnaparkhi tagger on the tokenized text: datafile.maxHpos

3 Read the .maxHpos file and pull out just the tags (clean up the punctuation so it doesn’t break BoosTexter). Create datafile.pos, which must end with space-comma

4 Paste together datafile.pos with datafile.orig

5 Rerun the emotion detection, but this time with the extra sequence information
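
Step 3 might look roughly like the sketch below. The lab is meant as Perl practice; Python is used here only to show the shape of the transformation. It assumes the tagger writes whitespace-separated `word_TAG` tokens, one sentence per line. The `PUNCT` placeholder and the function name are inventions of this sketch, and exactly which characters would upset BoosTexter is also an assumption.

```python
import re


def tags_only(tagged_line):
    """Turn 'the_DT men_NNS ._.' into 'DT NNS PUNCT ,'.

    Assumes whitespace-separated word_TAG tokens; the tag is whatever
    follows the last underscore in each token.
    """
    tags = []
    for token in tagged_line.split():
        _word, _sep, tag = token.rpartition("_")
        # Tags made only of punctuation (',', '.', ':', ...) are replaced by
        # a placeholder; PUNCT is a name invented for this sketch.
        if not re.search(r"[A-Za-z]", tag):
            tag = "PUNCT"
        # Drop stray non-alphanumeric characters such as the '$' in 'PRP$'.
        # Note this merges PRP$ with PRP, a simplification of the sketch.
        tags.append(re.sub(r"[^A-Za-z0-9]", "", tag))
    # The slide requires the .pos output to end with space-comma.
    return " ".join(tags) + " ,"
```

Applied to each line of the `.maxHpos` file, this produces the lines for `datafile.pos`. Check the tagger's actual output and BoosTexter's input rules before relying on any of these choices.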

The lab’s goals

Practice Perl

Practice practical scripting (Perl is great, but not always the answer)

Get comfortable with a new tool (the Ratnaparkhi tagger; very easy)
