AUTOMATIC DETECTION OF REGISTER CHANGES
FOR THE ANALYSIS OF DISCOURSE STRUCTURE
Laboratoire Parole et Langage, CNRS et Université de Provence Aix-en-Provence, France
Céline De Looze
Local vs. global pitch characteristics
→ Bolinger (1951)
Local: changes in the phonological representation of intonation
Global: variations in register key (level) and span (range)
[Figure: pitch contours with panels labelled Narrow span, Expanded span, Higher key and Lower key]
→ Trager (1957)
→ Functional aspect of local and global pitch variations
→ Register variations in intonation systems
ToBI (Pierrehumbert, 1980): binary phonological distinction (H and L tones)
INTSINT (Hirst & Di Cristo, 1998): 8 possible tonal values, where H and L tones are interpreted with respect to the previous tone or with respect to the speaker's register
Both systems make the crucial assumption that the speaker's key and range remain constant.
Overview
ADoReVA
Predicting topic changes through automatic detection of register variations
Topic changes as reflected by register variations
ADoReVA
Automatic Detection of Register Variations Algorithm
A clustering algorithm: represents, through a binary tree structure, the way units are grouped together according to their differences in register key and range
Correlation with functional annotation
A Praat Plugin
ADoReVA: Calculate Register differences…
Calculates the difference between two consecutive units for the key parameter
diffkey = sqrt( (log2(median_unit) - log2(median_prevUnit))^2 )
Calculates the difference between two consecutive units for the range parameter
diffrange = sqrt( (log2(max_unit/min_unit) - log2(max_prevUnit/min_prevUnit))^2 )
Recursively reduces the Euclidean distance between two consecutive units in a space defined by the key and span (range) parameters
distance = sqrt( (diffkey)^2 + (diffrange)^2 )
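The three quantities above can be sketched in a few lines of Python. This is only an illustration, not the Praat plugin itself: it assumes each unit is available as a list of f0 values in Hz, and the function names key_and_range and register_distance are hypothetical.

```python
import math
from statistics import median


def key_and_range(f0_values):
    """Return (key, range) of one unit on a log2 (octave) scale."""
    key = math.log2(median(f0_values))
    span = math.log2(max(f0_values) / min(f0_values))
    return key, span


def register_distance(prev_f0, unit_f0):
    """Return (diffkey, diffrange, distance) between two consecutive units."""
    prev_key, prev_range = key_and_range(prev_f0)
    key, span = key_and_range(unit_f0)
    # sqrt(x^2) is simply the absolute value of x.
    diff_key = abs(key - prev_key)
    diff_range = abs(span - prev_range)
    # Euclidean distance in the (key, range) plane.
    return diff_key, diff_range, math.hypot(diff_key, diff_range)
```

Working in log2 means that a difference of 1 corresponds to one octave, so units from speakers with different voices stay comparable.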
The detection of register key and range is done after the removal of micro-prosodic effects, using the formulae:
- floor = q25*0.75
- ceiling = q75*1.75
Which quantiles from q05 to q95 are best correlated with manual annotations of pitch extrema? (De Looze & Hirst, 2007)
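As a rough illustration of the floor and ceiling formulae, the sketch below derives both values from the quartiles of a list of f0 values. It assumes the values come from a first-pass pitch extraction in Hz; the function name floor_and_ceiling and the choice of the "inclusive" quantile method are assumptions, not part of the original tool.

```python
from statistics import quantiles


def floor_and_ceiling(f0_values):
    """Return (floor, ceiling) from the first and third quartiles of f0."""
    # quantiles(..., n=4) returns the three cut points q25, q50, q75.
    q25, _, q75 = quantiles(f0_values, n=4, method="inclusive")
    return q25 * 0.75, q75 * 1.75
```

For example, floor_and_ceiling([100, 120, 140, 160, 180]) takes q25 = 120 and q75 = 160, which gives a floor of 90 Hz and a ceiling of 280 Hz.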
ADoReVA: To Clustering tree…
The clustering algorithm groups units according to their difference in key and range. The smaller the difference between two units, the sooner these units are branched together.
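One way to picture this grouping is the sketch below, which repeatedly merges the two neighbouring units that are closest in the key/range plane. It is a simplified reading of the description, not the plugin's code: units are assumed to be given as (key, range) pairs in temporal order, a merged node is assumed to sit at the midpoint of its two children, and the name cluster_adjacent is hypothetical.

```python
import math


def cluster_adjacent(units):
    """Build a binary tree (nested tuples of unit indices) from a
    non-empty list of (key, range) points given in temporal order."""
    # Each node is (subtree, key, range); leaves carry their unit index.
    nodes = [(i, k, r) for i, (k, r) in enumerate(units)]
    while len(nodes) > 1:
        # Distance between every pair of neighbouring nodes.
        dists = [math.hypot(b[1] - a[1], b[2] - a[2])
                 for a, b in zip(nodes, nodes[1:])]
        i = dists.index(min(dists))
        a, b = nodes[i], nodes[i + 1]
        # ASSUMPTION: the merged node is placed at the midpoint of its children.
        merged = ((a[0], b[0]), (a[1] + b[1]) / 2, (a[2] + b[2]) / 2)
        nodes[i:i + 2] = [merged]
    return nodes[0][0]
```

With units [(7.0, 1.0), (7.1, 1.0), (8.0, 2.0), (8.1, 2.1)], the first two and the last two units are branched together first, and the result is the tree ((0, 1), (2, 3)).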
The output generated by the algorithm is a binary tree structure in the form of a layered icicle diagram
Hierarchical structure
Relational Organisation
ADoReVA: Calculate Node Distances…
Calculates the node distances between the leaves (units) of the tree and pairs them, in a table, with the manually annotated functions.
To Stat Analyses…
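The slides do not define the node distance precisely. The sketch below assumes it is the number of branches on the path between two consecutive leaves, with the tree written as nested tuples of integer unit indices; the function names are hypothetical.

```python
def leaf_paths(tree, path=()):
    """Map each leaf to its root-to-leaf path (0 = left branch, 1 = right)."""
    if not isinstance(tree, tuple):
        return {tree: path}
    paths = {}
    for side, child in enumerate(tree):
        paths.update(leaf_paths(child, path + (side,)))
    return paths


def node_distance(path_a, path_b):
    """Number of branches between two leaves (ASSUMED definition)."""
    common = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)


def consecutive_node_distances(tree):
    """Node distance for each pair of consecutive units, in temporal order."""
    paths = leaf_paths(tree)
    leaves = sorted(paths)
    return [node_distance(paths[a], paths[b])
            for a, b in zip(leaves, leaves[1:])]
```

For the tree ((0, 1), (2, 3)) this gives [2, 4, 2]: the largest distance falls between units 1 and 2, which only meet at the root.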
Topic changes as reflected by register changes
Are large differences in register between two consecutive units correlated with topic changes?
Are large node distances between two leaves correlated with topic changes?
Topic changes
Register variations throw light on the informational organisation of the discourse structure:
→ The information weight carried by the discourse element
→ The hierarchical dimension and relational organisation of linguistic units
Literature reports:
Lehiste, 1970; Brazil, 1980; Menn & Boyce, 1982; Kutik et al., 1983; Hirschberg & Pierrehumbert, 1986; Thorsen, 1986; Nakajima & Allen, 1992; Sluijter & Terken, 1993; Arons, 1994; Nicolas & Hirst, 1995; Fon, 2002; Kong, 2004; Chiu-yu et al., 2005; Mayer et al., 2006; den Ouden et al., 2009
High and expanded register signals
→ Introduction of a new topic or topic change
→ Discourse element carrying new information
→ Elements at the beginning of the utterance
→ …
Literature reports:
Low and compressed register signals
→ Final parts of the utterance
→ Topic continuity
→ Sub-topics, parenthetical comments
→ …
Assumption:
Detection of topic changes through detection of large node distances
Informing about declination / final lowering: what temporal span?
Corpora
PFC Corpus: 30 minutes of read speech from 10 native French speakers (Delais-Roussarie & Durand, 2003)
PAC Corpus: 30 minutes of read speech from 8 native English speakers (www.pac-project.com)
CID Corpus: 40 minutes of dialogue from 8 native French speakers (Bertrand et al., 2007)
Aix-Marsec Corpus: 30 minutes of dialogue from 9 native English speakers (Auran et al., 2004)
Functional Annotation
A simplified version of Grosz & Sidner (1986), as used in Fon (2002) and Kong (2004)
DSP2, DSP1, DSP0 between prosodic words
→ DSP0: no discourse boundary / related units
→ DSP1: hierarchically superior relation between units, which still share related purposes (cause-effect / clarifying relationship)
→ DSP2: no related discourse purposes or topics
Preliminary Results
Higher and expanded register
Large differences in key and range, or large Euclidean distances
Large node distances in the binary tree structure
Correlated with topic changes / DSP2 annotation
Range is not always involved in signaling topic changes.
Aix-Marsec Corpus (dialogue speech): both key and range
Key: F(2, 3446) = 146.3, p < 2.2e-16
Range: F(2, 3446) = 23.98, p = 4.549e-11
French corpora (read and dialogue speech): range less than key
Key: F(2, 2398) = 142, p < 2.2e-16
Range: F(2, 2398) = 6.233, p = 0.0019
PAC Corpus (read speech): not range
Key: F(2, 3003) = 67.26, p < 2.2e-16
Range: F(2, 3003) = 0.1469, p = 0.8634
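The F values above have 2 numerator degrees of freedom, which is what a one-way analysis of variance over three groups would give (here presumably DSP0, DSP1 and DSP2; the slides do not name the test, so this is an assumption). A minimal sketch of how such an F statistic is computed, with a hypothetical function name:

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the values around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within
```

Applied to the key (or range) values of the units grouped by annotation label, a large F means the group means differ far more than the spread within each group would suggest.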
Speaking styles?
Lively speech marked with variations in range
Range is not correlated with DSP1 annotation
A cause-effect / clarifying relationship between two consecutive units may be signaled by modifying key only
Key appears to be a stable parameter, while range may be optional for indicating topic changes
Variations in range may be seen as marking a speaker's involvement while telling his/her story
Key and range parameters convey different functions and have to be studied separately
Prediction
Predicting topic changes through automatic detection of register variations
Confusion matrices:
→ 6 features: key and range; differences in key and range; node distances for key and range
→ 2 classes: DSP0 vs. DSP1/DSP2
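For a two-class confusion matrix, recall, precision and F-measure for the class of interest follow from three counts. The sketch below uses the standard definitions; the function name and the count arguments (true positives, false positives, false negatives) are illustrative and not taken from the slides.

```python
def recall_precision_f(tp, fp, fn):
    """Return (recall, precision, F-measure) for one class.

    tp: units correctly predicted as the class
    fp: units wrongly predicted as the class
    fn: units of the class that were missed
    """
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    total = precision + recall
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / total if total else 0.0
    return recall, precision, f_measure
```

Note that when recall and precision are equal, the F-measure takes that same value.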
Prediction with the features key, difference in key and node distance for key
→ gives better results than prediction with range, difference in range and node distance for range.
Prediction with both features → key and difference in key, or key and node distance for key
Scores for cat1:
Features        Recall      Precision   F-Measure
Key             0.31210986  0.31210986  0.31210986
DiffKey         0.38451934  0.38451934  0.38451934
Key & DiffKey   0.40074906  0.40074906  0.40074906
NodDK           0.4082397   0.28410077  0.33504096
Key & NodDK     0.52184767  0.27866668  0.3633203
→ key and node distance for key slightly improve the detection of topic changes
Higher prediction scores for dialogue speech than for read speech
→ between 20 and 30% predicted for read speech
→ about 40% predicted for dialogue speech
Discussion
Objective detection of register variations vs. subjective annotation of topic changes
Detection of other functions than topic changes as reflected by register variations
Detection of topic changes through automatic detection of
- Tempo variations (pause & speaking rate)
- Intensity variations
Usefulness of the algorithm?
Better understanding of the hierarchical and organisational structure of discourse
How do units fit together?
Conclusion
ADoReVA
An algorithm to understand the structure of speech as reflected by register variations
An algorithm to be integrated into intonation systems to improve the phonological representation of intonation (INTSINT: detection of Top/Mid/Bottom taking register variations into account)
Testing different units
Subjective annotation vs. objective detection
A graphical representation to serve pre-analysis