A Corpus Search Methodology for Focus Realization
Jonathan Howell and Mats Rooth
Linguistics and CIS
Cornell University
Goals
Study phonetic realization of focus in cases where formal-semantic theories make clear predictions.
Natural data from podcasts, radio, etc.
Find data using a speech search engine based on speech recognition (Everyzing)
Automate the entire workflow
Today: preliminary data from pilot
he stayed longer than I did
-er [[ he stayed x-long ]2
than [ I_F stayed x-long ]~2 ]
[ y stayed x-long ] antecedent clause
[ speaker stayed x-long ] scope of focus
… I should have liked that song a lot more than I did.
[ more
x [[ should w [ I like that song x-well in w ]]
than [ I like that song x-well in w0 ]]]
I understand even less than I did before
even less [[ I prs understand x-much ]2
than [ I understood x-much before_F ]~2 ]
Focus in comparative clauses
• Coherent syntactic-semantic theory about where focus should go
• Possibilities are constrained, because the main clause is usually the antecedent for focus interpretation in the comparative clause
• On a theoretical basis, we often think we know the correct grammatical analysis of sentences people use
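The antecedent condition above can be illustrated with a toy check in the style of alternative semantics. This is a minimal sketch, not the authors' implementation: it assumes logical forms are flat token lists, that a suffix `_F` marks focus (as in I_F), and the helper names `strip_focus`, `is_contrasting_alternative`, and `licensed_foci` are hypothetical. A focus placement in the than-clause counts as licensed if the main-clause antecedent matches the clause everywhere outside the focus and differs from it inside the focus.

```python
FOCUS = "_F"  # focus marker suffix, as in I_F (notation assumed for this sketch)


def strip_focus(token: str) -> str:
    """Remove the focus marker, e.g. 'I_F' -> 'I'."""
    return token[: -len(FOCUS)] if token.endswith(FOCUS) else token


def is_contrasting_alternative(antecedent: list[str], scope: list[str]) -> bool:
    """True if antecedent is a focus alternative of scope that differs from it.

    Unfocused positions must match exactly; focused positions may hold any
    value, but at least one must differ from the scope's own value (contrast).
    """
    if len(antecedent) != len(scope):
        return False
    contrast = False
    for a, s in zip(antecedent, scope):
        if s.endswith(FOCUS):
            contrast = contrast or a != strip_focus(s)
        elif a != s:
            return False
    return contrast


def licensed_foci(antecedent: list[str], scope: list[str]) -> list[str]:
    """Single-word focus placements in scope that the antecedent licenses."""
    licensed = []
    for i, word in enumerate(scope):
        marked = [w + FOCUS if j == i else w for j, w in enumerate(scope)]
        if is_contrasting_alternative(antecedent, marked):
            licensed.append(word)
    return licensed


# he stayed longer than I did: only focus on "I" is licensed.
print(licensed_foci(["he", "stayed", "x-long"], ["I", "stayed", "x-long"]))  # ['I']

# Song example: only the world index of the than-clause is licensed.
print(licensed_foci(
    ["I", "like", "that", "song", "x-well", "in", "w"],
    ["I", "like", "that", "song", "x-well", "in", "w0"],
))  # ['w0']
```

The contrast requirement is what rules out focus on "I" when the main-clause subject is also "I": the antecedent would then be identical to the than-clause in the focused position.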
Result
Hundreds of instances of a minimal pair that varies the position of focus
Speech files for short intervals and for 10-second intervals spanning “than I did”
The Everyzing HTML contains time offsets for the beginnings of words. A program converts these into a Praat representation.
Alignments are not good enough to use without correction.
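The conversion from word onsets to a Praat representation can be sketched as follows. The Everyzing HTML markup is not shown here, so this sketch assumes the offsets have already been extracted as (word, onset in seconds) pairs relative to the start of the snippet; the function name `words_to_textgrid` is hypothetical. Because only word beginnings are available, each word is assumed to end where the next one begins. The output is a single IntervalTier in Praat's long text TextGrid format.

```python
def words_to_textgrid(words: list[tuple[str, float]], xmax: float,
                      tier_name: str = "words") -> str:
    """Build a one-tier Praat TextGrid (long text format) from word onsets.

    words: (label, onset_seconds) pairs, strictly increasing in onset,
    relative to the start of the snippet. xmax: snippet duration in seconds.
    Each word is assumed to last until the next onset (last word: until xmax).
    """
    onsets = [t for _, t in words]
    if any(b <= a for a, b in zip(onsets, onsets[1:])):
        raise ValueError("onsets must be strictly increasing")
    if onsets and (onsets[0] < 0 or onsets[-1] >= xmax):
        raise ValueError("onsets must lie in [0, xmax)")

    # Praat intervals must tile [0, xmax] without gaps: pad the start if needed.
    intervals = []
    if not onsets or onsets[0] > 0:
        intervals.append((0.0, onsets[0] if onsets else xmax, ""))
    for i, (label, onset) in enumerate(words):
        end = onsets[i + 1] if i + 1 < len(onsets) else xmax
        intervals.append((onset, end, label))

    def quote(text: str) -> str:
        # Inside a Praat string, a double quote is written as two double quotes.
        return '"' + text.replace('"', '""') + '"'

    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {xmax}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f"        name = {quote(tier_name)}",
        "        xmin = 0",
        f"        xmax = {xmax}",
        f"        intervals: size = {len(intervals)}",
    ]
    for k, (start, end, label) in enumerate(intervals, start=1):
        lines += [
            f"        intervals [{k}]:",
            f"            xmin = {start}",
            f"            xmax = {end}",
            f"            text = {quote(label)}",
        ]
    return "\n".join(lines) + "\n"


# Hypothetical onsets (seconds) for a snippet spanning "than I did".
print(words_to_textgrid([("than", 0.42), ("I", 0.61), ("did", 0.70)], xmax=1.0))
```

Saved as a UTF-8 text file, the result should open in Praat next to the sound snippet, where the boundaries can be corrected by hand.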
Classification
Listen to each sound snippet to determine whether it contains an actual token of “than I did”.
This was true in 56% of cases in a sample of 179 tokens.
Classify correct tokens into three grammatical-semantic classes
s: Comparing than- and main clauses, reference varies in the position of “I”. This licenses focus on the subject “I”.
[ he looked younger than I did. ]
21/40 tokens
d: Comparing than- and main clauses, reference is constant in the position of “I”, but varies in the possible-world or temporal index of “did”, and not in any following position.
Depending on details of the representation of modality and time, this could license a focus on “did”.
5/40 tokens
f: Comparing than- and main clauses, reference in the position of “I” is constant, but varies in some position following “did”, often a temporal phrase.
I actually look younger now than I did 5 years ago
13/40 tokens
Mark the vowel intervals in “I” and “did” by hand.
Pitch in the vowel region and duration of the vowel region both contribute positively to the area under the pitch curve (the definite integral of pitch over the vowel).
Number of glottal pulses in the vowel region.
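Both measures can be computed from a sampled pitch track and a list of glottal pulse times. Writing the area as A = ∫ F0(t) dt over the vowel interval [t0, t1] makes the point above explicit: A grows with both pitch and duration. Because F0 in Hz is the number of glottal periods per second, A also approximates the pulse count, so the two measures should track each other closely. This minimal sketch assumes the pitch track is voiced at every sample inside the vowel (for instance, values read off a Praat Pitch object) and uses the trapezoid rule between samples; the function names and the synthetic 200 Hz data are hypothetical.

```python
def pitch_area(times: list[float], f0: list[float], start: float, end: float) -> float:
    """Area under the pitch curve (Hz x s) inside [start, end], trapezoid rule.

    Assumes times are sorted and every sample in the interval is voiced.
    Only samples inside [start, end] are used, so the integral runs from the
    first to the last sample that falls inside the interval.
    """
    pts = [(t, f) for t, f in zip(times, f0) if start <= t <= end]
    return sum((t2 - t1) * (f1 + f2) / 2
               for (t1, f1), (t2, f2) in zip(pts, pts[1:]))


def pulse_count(pulse_times: list[float], start: float, end: float) -> int:
    """Number of glottal pulses (e.g. from a Praat PointProcess) inside [start, end]."""
    return sum(1 for t in pulse_times if start <= t <= end)


# Synthetic 100 ms vowel at a steady 200 Hz, sampled every 10 ms.
times = [k / 100 for k in range(10, 21)]      # 0.10 ... 0.20 s
f0 = [200.0] * len(times)
pulses = [k / 200 for k in range(20, 40)]     # one pulse every 5 ms
print(round(pitch_area(times, f0, 0.1, 0.2), 6))  # 20.0
print(pulse_count(pulses, 0.1, 0.2))              # 20
```

Doubling the vowel duration at the same pitch doubles the area, and raising the pitch at the same duration raises it proportionally, which is the sense in which both factors contribute positively.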
NLP vs. Acoustic Phonetics
Classification based on signal
NLP classifier based on the correct sentence (or the speech recognition output), using parsing and machine learning on text features
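Before any parsing or machine learning, the text side can be prototyped with surface heuristics. This is a minimal sketch, not the classifier described above: it assumes one transcribed sentence per token, and the function `classify_than_i_did` is hypothetical. Words after “did” are taken to signal class f; otherwise the first word stands in for the main-clause subject, giving s if it is not “I” and d if it is. It returns the intended classes for the example sentences in these slides, but it will misfire whenever the subject is not sentence-initial or when both the subject and a following phrase vary.

```python
import re
from typing import Optional


def classify_than_i_did(sentence: str) -> Optional[str]:
    """Toy heuristic for the s/d/f classes; None if 'than I did' is absent.

    f: words follow "did" (often a temporal phrase such as "5 years ago")
    s: nothing follows "did" and the first word, a crude stand-in for the
       main-clause subject, is not "I"
    d: nothing follows "did" and the first word is "I"
    """
    tokens = re.findall(r"[a-z0-9']+", sentence.lower())
    for i in range(len(tokens) - 2):
        if tokens[i:i + 3] == ["than", "i", "did"]:
            if tokens[i + 3:]:
                return "f"
            return "d" if tokens[0] == "i" else "s"
    return None


for example in [
    "He looked younger than I did.",
    "I actually look younger now than I did 5 years ago",
    "I should have liked that song a lot more than I did.",
]:
    print(classify_than_i_did(example), "|", example)
# s | He looked younger than I did.
# f | I actually look younger now than I did 5 years ago
# d | I should have liked that song a lot more than I did.
```

A learned classifier over parser features would replace the first-word and trailing-words rules, but a baseline like this gives something to measure it against.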
Multiple focus
Issues: marking of multiple foci with different scopes, and prominence of focus relative to accents that do not mark focus.
You made a very small amount more than I did. Now I make much_F more than you_F do.