A Corpus Search Methodology for Focus Realization Jonathan Howell and Mats Rooth Linguistics and CIS Cornell University

Goals

Study phonetic realization of focus in cases where formal-semantic theories make clear predictions.

Natural data from podcasts, radio, etc.

Find data using a speech search engine based on speech recognition (Everyzing)

Automate the entire workflow

Today: preliminary data from a pilot study

he stayed longer than I did

$$\text{-er}\ [[\text{he stayed } x \text{ long}]_2\ \text{than}\ [\text{I}_F \text{ stayed } x \text{ long}] \sim 2]$$

[y stayed x long]: antecedent clause

[speaker stayed x long]: scope of focus

… I should have liked that song a lot more than I did.

$$[\text{more}\ \lambda x[[\text{should}\ \lambda w[\text{I like that song } x \text{ well in } w]]\ \text{than}\ [\text{I like that song } x \text{ well in } w_0]]]$$

I understand even less than I did before

$$\text{even less}\ [[\text{I PRS understand } x \text{ much}]_2\ \text{than}\ [\text{I understood } x \text{ much } \text{before}_F] \sim 2]$$

Focus in comparative clauses

• Coherent syntactic-semantic theory about where focus should go

• Possibilities are constrained, because the main clause is usually the antecedent for focus interpretation in the comparative clause

• On a theoretical basis, we often think we know the correct grammatical analysis of sentences people use
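The anaphoric condition behind these logical forms can be illustrated with a toy model. In this sketch, a clause is a hypothetical tuple of slots. The F-marked slots range over one shared domain of alternatives. The antecedent (the main clause, index 2) must be one of the scope's focus alternatives and must also differ from the scope's ordinary value. The slot encoding and function names are illustrative assumptions, not the authors' implementation.

```python
from itertools import product

def focus_alternatives(clause, focused, domain):
    """Focus semantic value of `clause` in a toy model: every clause obtained
    by replacing the F-marked slots (indices in `focused`) with members of
    `domain`."""
    slots = sorted(focused)
    alternatives = set()
    for values in product(sorted(domain), repeat=len(slots)):
        alt = list(clause)
        for slot, value in zip(slots, values):
            alt[slot] = value
        alternatives.add(tuple(alt))
    return alternatives

def licensed(scope, focused, antecedent, domain):
    """Anaphoric condition for [scope] ~ antecedent (toy version): the
    antecedent must be a focus alternative of the scope and differ from it."""
    return antecedent != scope and antecedent in focus_alternatives(scope, focused, domain)

# "he stayed longer than I did": the main clause antecedes the than-clause.
domain = {"speaker", "he", "she"}
main_clause = ("he", "stayed", "x long")
than_clause = ("speaker", "stayed", "x long")

print(licensed(than_clause, {0}, main_clause, domain))    # focus on "I": prints True
print(licensed(than_clause, set(), main_clause, domain))  # no focus: prints False
```

When the subject slot is F-marked, "he stayed x long" licenses "I_F stayed x long". Without any F-marking, the only alternative is the clause itself, so the same antecedent is not licensed.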

Result

Hundreds of tokens of a minimal pair that varies the position of focus

Speech files for short intervals and for 10-second intervals spanning “than I did”
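One way to produce the 10-second excerpts is sketched below using Python's standard `wave` module. The sketch makes three assumptions: the audio has already been converted to PCM WAV, the start and end times of the “than I did” match are known from the search results, and the excerpt should be centred on the match. The function names and the centring policy are hypothetical.

```python
import wave

def excerpt_window(match_start, match_end, file_duration, width=10.0):
    """Return (start, end) in seconds for a window of `width` seconds
    centred on the match, shifted so that it stays inside the file."""
    centre = (match_start + match_end) / 2
    start = max(0.0, centre - width / 2)
    end = min(file_duration, start + width)
    # If the window ran past the end of the file, slide it back.
    start = max(0.0, end - width)
    return start, end

def cut_wav(src, dst, start, end):
    """Copy the frames between `start` and `end` seconds of a PCM WAV file
    (path or binary file object) to `dst`."""
    with wave.open(src, "rb") as reader:
        rate = reader.getframerate()
        params = reader.getparams()
        reader.setpos(int(start * rate))
        frames = reader.readframes(int((end - start) * rate))
    with wave.open(dst, "wb") as writer:
        writer.setparams(params)  # header frame count is patched to match the frames written
        writer.writeframes(frames)
```

For a match from 30.0 to 31.0 s in a 600 s file, `excerpt_window` returns (25.5, 35.5). Near the start or end of a file, the window is shifted rather than shortened, unless the file itself is shorter than the window.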

The Everyzing HTML contains time offsets for the beginnings of words. A program converts these into a Praat representation.

The alignments are not good enough to use without correction.
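The conversion from word onsets to a Praat representation can be sketched as follows. The slides do not describe the layout of the Everyzing HTML, so this sketch assumes the onsets have already been extracted as (word, seconds) pairs. Because only word beginnings are given, each word is assumed to end where the next one begins. The output is a single-tier TextGrid in Praat's long text format, which can then be corrected by hand. The function name is hypothetical.

```python
def onsets_to_textgrid(words, duration, tier="words"):
    """Build a Praat TextGrid (long text format) with one interval tier.

    `words` is a list of (word, onset_seconds) pairs sorted by onset. Each
    word is assumed to last until the next onset; the last word runs to
    `duration`. A leading gap becomes an empty interval, because Praat
    intervals must tile the whole tier.
    """
    intervals = []
    if words and words[0][1] > 0:
        intervals.append((0.0, words[0][1], ""))
    for i, (word, onset) in enumerate(words):
        end = words[i + 1][1] if i + 1 < len(words) else duration
        intervals.append((onset, end, word))
    if not intervals:
        intervals.append((0.0, duration, ""))

    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {duration}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{tier}"',
        "        xmin = 0",
        f"        xmax = {duration}",
        f"        intervals: size = {len(intervals)}",
    ]
    for n, (xmin, xmax, text) in enumerate(intervals, start=1):
        text = text.replace('"', '""')  # Praat escapes quotes by doubling them
        lines += [
            f"        intervals [{n}]:",
            f"            xmin = {xmin}",
            f"            xmax = {xmax}",
            f'            text = "{text}"',
        ]
    return "\n".join(lines) + "\n"
```

For example, `onsets_to_textgrid([("than", 1.2), ("I", 1.4), ("did", 1.55)], 2.0)` yields four intervals: an empty one from 0 to 1.2 s, followed by one interval for each word.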

Classification

Listen to each sound snippet to determine whether it contains an actual token of “than I did”.

True in 56% of cases in a sample of 179 tokens.
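A Wilson score interval for a binomial proportion gives a rough sense of how precise the 56% figure is (56% of 179 is about 100 true tokens). This is an illustrative addition and not an analysis reported on the slides.

```python
import math

def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a binomial proportion p_hat observed in n
    trials; z = 1.96 gives an approximate 95% interval."""
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

low, high = wilson_interval(0.56, 179)  # proportion and sample size from the slide
```

With p = 0.56 and n = 179, the interval runs from roughly 0.49 to 0.63.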

Classify correct tokens into three grammatical-semantic classes

s: Comparing the than- and main clauses, reference varies in the position of “I”. This licenses focus on the subject “I”.

he looked younger than I did.

21/40 tokens

d: Comparing the than- and main clauses, reference is constant in the position of “I” and in every position following did, but varies in the possible-world or temporal index of did.

Depending on the details of how modality and time are represented, this could license focus on “did”.

5/40 tokens

f: Comparing the than- and main clauses, reference is constant in the position of “I” but varies in some position following did, often a temporal phrase.

I actually look younger now than I did 5 years ago

13/40 tokens
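The three-way classification can be stated as a decision procedure over a hypothetical annotation of each token. The annotation records three things: the referent in the position of “I”, the world/time index of the verb, and the referents in positions following it. The order of the checks matters. Class f tokens such as “than I did 5 years ago” usually also differ in temporal index, so the following positions are checked before the index. The `Clause` fields and labels are illustrative assumptions. The reported counts (21, 5 and 13) sum to 39 of the 40 tokens, and the slides do not say how the remaining token was classified.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Clause:
    """Hypothetical annotation of one clause for comparing than- and main clauses."""
    subject: str           # referent in the position of "I"
    index: str             # world/time index of the verb ("did" in the than-clause)
    following: tuple = ()  # referents in positions aligned with material after "did"

def classify(main: Clause, than_clause: Clause) -> str:
    """Return "s", "f" or "d" according to where reference varies."""
    if main.subject != than_clause.subject:
        return "s"  # reference varies in the position of "I"
    if main.following != than_clause.following:
        # Checked before the index: these tokens usually differ in time as well.
        return "f"
    if main.index != than_clause.index:
        return "d"  # only the world/time index of "did" varies
    return "unclassified"  # nothing varies: none of the three classes applies

# "he looked younger than I did"
print(classify(Clause("he", "now"), Clause("speaker", "now")))
# "I actually look younger now than I did 5 years ago"
print(classify(Clause("speaker", "now", ("now",)),
               Clause("speaker", "past", ("5 years ago",))))
# "I should have liked that song a lot more than I did"
print(classify(Clause("speaker", "w"), Clause("speaker", "w0")))
```

The three calls print s, f and d respectively.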

Mark the vowel intervals in “I” and “did” by hand.

Pitch in the vowel region and the duration of the vowel region both contribute positively to the area under the pitch curve, i.e. the definite integral of pitch over the vowel interval, $\int_{t_0}^{t_1} F_0(t)\,dt$.

Number of glottal pulses in the vowel region.
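Both acoustic measures can be sketched in plain Python. The sketch assumes a pitch track sampled at frame times (for example, one exported from Praat) with NaN marking unvoiced frames. It also assumes a list of glottal pulse times, such as those stored in a Praat PointProcess. The function names are hypothetical. The area is approximated with the trapezoidal rule over the frames inside the hand-marked vowel interval, which makes explicit why both higher pitch and longer duration increase it.

```python
import math

def pitch_area(times, f0, start, end):
    """Approximate the area under the pitch curve (Hz * s) between `start`
    and `end` with the trapezoidal rule over the frames in that interval.
    Unvoiced frames are NaN; a segment with a NaN endpoint is skipped."""
    points = [(t, f) for t, f in zip(times, f0) if start <= t <= end]
    area = 0.0
    for (t_a, f_a), (t_b, f_b) in zip(points, points[1:]):
        if math.isnan(f_a) or math.isnan(f_b):
            continue
        area += (f_a + f_b) / 2 * (t_b - t_a)
    return area

def pulse_count(pulse_times, start, end):
    """Number of glottal pulses whose time falls in [start, end)."""
    return sum(start <= t < end for t in pulse_times)
```

A flat 200 Hz contour over 0.1 s gives an area of 20 Hz·s. Doubling either the pitch or the duration doubles the area.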

NLP vs. Acoustic Phonetics

Classification based on the acoustic signal

NLP classifier based on the correct sentence (or speech recognition output), using parsing and machine learning on text features
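For comparison with the NLP route, here is a deliberately crude text-only baseline. It makes three simplifying assumptions:

- The transcript is a single sentence containing “than I did”.
- The first word of the sentence is the main-clause subject.
- Any material after “did” signals class f.

A real classifier would use parsing and learned weights over richer features. All names and rules here are hypothetical.

```python
import re

def text_features(sentence):
    """Crude surface features of a transcript containing 'than I did'."""
    tokens = re.findall(r"[a-z0-9']+", sentence.lower())
    for k in range(len(tokens) - 2):
        if tokens[k:k + 3] == ["than", "i", "did"]:
            break
    else:
        return None  # no "than I did" found
    return {
        "main_subject": tokens[0],          # first word as a stand-in for the subject
        "words_after_did": tokens[k + 3:],  # material following "did"
    }

def baseline_class(sentence):
    """Hand-written rules standing in for a classifier trained on these features."""
    feats = text_features(sentence)
    if feats is None:
        return None
    if feats["main_subject"] != "i":
        return "s"
    if feats["words_after_did"]:
        return "f"
    return "d"
```

On the example sentences from the slides, the baseline returns:

- s for “he looked younger than I did”
- f for “I actually look younger now than I did 5 years ago” and “I understand even less than I did before”
- d for “I should have liked that song a lot more than I did”

The first-word heuristic fails on a sentence such as “Honestly, I look younger than I did”, which it labels s.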

Multiple focus

Issues: the marking of multiple foci with different scopes, and the prominence of focus relative to accents that do not mark focus.

You made a very small amount more than I did. Now I make $\text{much}_F$ more than $\text{you}_F$ do.