ATILA – 2013
An algorithm for generating child–adult interaction data
Yevgen Matusevych, Afra Alishahi
Contents
1. Input to CLA models.
2. Natural vs. generated input.
3. Hybrid approach.
4. Improving the algorithm.
Overview
• Computational models of child language acquisition (CLA) often use utterance–scene pairs as input, for example in modeling cross-situational word learning:
Utterance (linguistic input): Take the ball!
Scene (visual input): {ball, car, rattle, book}
• Existing collections of child-directed speech (e.g., CHILDES) provide the linguistic input, but not the visual input.
Input to CLA models
Two possibilities:
1. Use a small manually annotated dataset.
- But only relatively small amounts of data are available.
2. Generate visual input automatically.
- But what about its statistical properties?
Input to a cognitively plausible model must have the same statistical properties as the naturalistic data. So we need to compare the two sources.
Manually annotated sample
• 3 short fragments (~10 min. each) of video recordings of 13-month-old children playing with toys together with adults.
• Annotated: adults' and children's gaze directions, utterances and actions.
• Scene at step 3: [adult, child, book, car, open, point, play]

#  Who?   Looks where?  Does what?  Says what?
1. Adult  child         point book  FROG. CROAK-CROAK
2. Child  car           play car    [babbling]
3. Adult  book          open book   CROAK-CROAK
Automatically prepared data
Fazly, Alishahi et al. (2010): use semantic symbols that correspond to the words in the utterance. Referential uncertainty is simulated by merging the representations of two consecutive scenes and pairing them with only one of the utterances.

Utt1: But it is very boring.      Scene1: [but, it, is, very, boring, are, we, going, to, play, now]
Utt2: Are we going to play now?
Utt3: Did you get fed up … ?
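This merging step can be sketched as follows (a minimal illustration; the function name `make_scene` and the use of raw words as semantic symbols are simplifying assumptions):

```python
# Simulating referential uncertainty (after Fazly, Alishahi et al., 2010):
# the scene paired with an utterance contains the semantic symbols of its
# own words merged with those of the *next* utterance, so the learner sees
# more scene elements than the utterance actually refers to.

def make_scene(utterance, next_utterance):
    """Build a scene by merging the symbols of two consecutive utterances."""
    return set(utterance.split()) | set(next_utterance.split())

utt1 = "but it is very boring"
utt2 = "are we going to play now"
scene1 = make_scene(utt1, utt2)  # paired with utt1 only
```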
Statistical measures
• Measuring statistical properties (Si – current scene, Si+1 – the next scene, Ui – current utterance):
1. Scene stability, or the overlap between every pair of consecutive scenes:
   overlap(Si, Si+1) = |Si ∩ Si+1| / |Si ∪ Si+1|
2. Noise, or the normalized number of words that refer to something not present in the scene:
   noise(Ui) = |Ui − (Ui ∩ Si)| / |Ui|
3. Referential certainty, or the normalized number of the scene elements that are referred to in the utterance:
   certainty(Si) = |Ui ∩ Si| / |Si|
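The three measures are simple enough to state in code; a minimal sketch, assuming scenes and utterances are represented as Python sets of symbols (the example sets are hypothetical):

```python
# The three statistical measures over scene/utterance sets.

def overlap(s_i, s_next):
    """Scene stability: |Si ∩ Si+1| / |Si ∪ Si+1|."""
    return len(s_i & s_next) / len(s_i | s_next)

def noise(u_i, s_i):
    """Fraction of utterance words referring to nothing in the scene."""
    return len(u_i - s_i) / len(u_i)

def certainty(u_i, s_i):
    """Fraction of scene elements referred to in the utterance."""
    return len(u_i & s_i) / len(s_i)

# Toy example:
s1 = {"adult", "child", "book", "car"}
s2 = {"adult", "child", "book"}
u1 = {"book", "frog"}
```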
Statistical measures
[Bar chart (y-axis 0–0.7) comparing scene stability, noise and referential certainty for the manual vs. the automatic data.]
The hybrid approach
A framework that uses a small data sample as input and generates a meaningful stream of adult–child interaction.

Context: puzzle, duck, bin, ball, frog

Turn  Agent  Action        Utterance
1.    Adult  play puzzle   —
2.    Child  play duck     babbling
3.    Adult  point puzzle  Duck fits here.
4.    Child  touch bin     babbling
5.    Adult  play puzzle   Yes?
The hybrid approach
[Bar chart (y-axis 0–0.7) comparing scene stability, noise and referential certainty for the manual, automatic and generated data.]
The hybrid approach
• The hybrid approach – generating the data based on a small manually annotated sample – provides better data. So how does it work?
• Based on co-occurrence frequencies. If two items co-occur often, they must be related, e.g.:
— Adults react to children's babbling and actions.
— Utterances often accompany actions.
— Objects are associated with certain actions.
A manipulate book FROG. CROAK-CROAK
C close book [babbling]
A open book CROAK-CROAK
Improved algorithm
• A manually designed system of dependencies, using information from the n previous feature values.
1. Processing.
2. Generation.
#  Who?   Looks where?  Does what?  To what?  Says what?
1. Adult  child         point       book      FROG. CROAK-CROAK
2. Child  car           play        car       [babbling]
3. Adult  book          open        book      CROAK-CROAK
Improved algorithm: processing

Each turn from the annotated sample is converted into a set of feature–value pairs:

ADULT  gazeA: child  actionA: point  argument1A: book  argument2A: ⌀  utteranceA: FROG. CROAK-CROAK
CHILD  gazeC: car    actionC: play   argument1C: car   argument2C: ⌀  utteranceC: babbling
ADULT  gazeA: book   actionA: open   …

#  Who?   Looks where?  Does what?  To what?  Says what?
1. Adult  child         point       book      FROG. CROAK-CROAK
2. Child  car           play        car       [babbling]
3. Adult  book          open        book      CROAK-CROAK

For each new value, the algorithm counts how often it follows the feature values of the previous turns, e.g.:
Count(gazeA(n+1) = book | gazeA(n) = child)
Count(gazeA(n+1) = book | actionA(n) = point)
…
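The counting step can be sketched like this (the `turns` data and dictionary layout are hypothetical, and only the `gazeA` feature of the next turn is tracked here; the real algorithm counts transitions for every feature):

```python
from collections import Counter, defaultdict

# Counting how often a feature value at turn n+1 follows each
# feature value at turn n. Toy sequence of feature dicts:
turns = [
    {"gazeA": "child", "actionA": "point"},
    {"gazeA": "book",  "actionA": "open"},
    {"gazeA": "book",  "actionA": "play"},
]

trans = defaultdict(Counter)
for prev, nxt in zip(turns, turns[1:]):
    for feat, prev_val in prev.items():
        # Count(gazeA(n+1) = x | feat(n) = prev_val)
        trans[(feat, prev_val)][nxt["gazeA"]] += 1
```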
Improved algorithm: processing

Adult  child  point book  FROG. CROAK-CROAK
Child  car    play car    [babbling]
Adult  book   open book   CROAK-CROAK
Improved algorithm: processing

Co-occurrence counts, GAZEA (rows) × ACTIONC (columns):

        play  point  open
book      1     4     0
child     7     2     0
car       0     5     3
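A minimal sketch of collecting such counts and turning them into conditional probabilities (the toy `observations` below are hypothetical, not the actual counts from the table):

```python
from collections import Counter, defaultdict

# Observed (adult gaze target, child action) pairs; hypothetical toy data.
observations = [("book", "point"), ("child", "play"), ("car", "point"),
                ("book", "point"), ("child", "play")]

counts = defaultdict(Counter)
for gaze_a, action_c in observations:
    counts[gaze_a][action_c] += 1

# Normalize raw co-occurrence counts into conditional probabilities
# P(ACTIONC = a | GAZEA = g).
probs = {g: {a: c / sum(cnt.values()) for a, c in cnt.items()}
         for g, cnt in counts.items()}
```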
Improved algorithm: generating

Features: {gazeA, actionA, object1A, object2A, utteranceA, gazeC, actionC, object1C, object2C, utteranceC}

How to model the dependencies between features?

A. Assume the features are independent?
   P = ∏_{F ∈ features} P(F_i = value_i | F_n = value_n)
B. Markov chain with memory m = 10?
   P(F_n = value_n | F_{n−1} = v_{n−1}, F_{n−2} = v_{n−2}, …, F_{n−10} = v_{n−10})
C. Assume that each feature depends on some features, but not on the others?
Improved algorithm: generating

A distribution of values:
book: 0.025
car: 0.005
child: 0.01
…
So we can sample a value using the probabilities as weights.
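Weighted sampling of this kind is available directly in the standard library; a minimal sketch (the distribution mirrors the example above):

```python
import random

# Sample the next feature value from a conditional distribution,
# using the probabilities as weights.
distribution = {"book": 0.025, "car": 0.005, "child": 0.01}

values = list(distribution)
weights = list(distribution.values())
sampled = random.choices(values, weights=weights, k=1)[0]
# random.choices normalizes the weights itself, so the
# distribution does not need to sum to 1.
```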
Conclusions & future work
• Data generated using the hybrid approach have statistical properties closer to those of the naturalistic data.
• The algorithm can be improved by automatically collecting implicit statistical information and transforming it into transitional probabilities.
• We need to find an optimal way to represent the relations between the features:
- which distribution to use?
- should the dependencies be weighted?
- should sparse features like UTTERANCE be replaced with their categories? (This means more manual work.)
Questions?