Considerations for Evaluating Models of Language Understanding and Reasoning Gabriel Recchia University of Cambridge


Considerations for Evaluating Models of Language Understanding and Reasoning

Gabriel Recchia, University of Cambridge


• Background: The bAbI dataset
• Introducing the GABITS dataset (http://nowin2d.com/gabits/)



Some history

(slide from Bordes, Weston, Chopra, Mikolov, Joulin & Bottou, 2015)


Diagram: Generating Process, Training Set, Test Set


Facebook’s bAbI dataset

(slide from Bordes, Weston, Chopra, Mikolov, Joulin & Bottou, 2015)


The bAbI dataset

(slide from Bordes, Weston, Chopra, Mikolov, Joulin & Bottou, 2015)


Introducing GABITS: The Grounded and bAbI-Inspired Task Set


Each training instance consists of
– A narrative
– A group of questions and associated answers
– An image illustrating the state of the world at every point when something changes state
– A symbolic representation of the state of the world at every point when something changes state (optional)
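The four components listed above could be held in a small container type. The following is only a minimal sketch: the class names, field names, and image file names are hypothetical and are not taken from the GABITS distribution.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Question:
    task: str           # task label as shown on the slides, e.g. "T3.a"
    text: str           # the question itself
    answer: List[str]   # one or more answers, e.g. ["Eve", "Frank"]


@dataclass
class TrainingInstance:
    narrative: List[str]                  # one sentence per step
    questions: List[Question]             # questions with their answers
    # One image per state change (file names here are hypothetical).
    image_paths: List[str] = field(default_factory=list)
    # Optional symbolic world state, one entry per state change.
    symbolic_states: Optional[List[str]] = None


instance = TrainingInstance(
    narrative=[
        "The lamp is in the kitchen.",
        "Carol is in the kitchen.",
        "Carol got the lamp.",
    ],
    questions=[Question("T3.a", "What is Carol holding?", ["lamp"])],
    image_paths=["state_0.png", "state_1.png"],
)
```

Leaving `symbolic_states` as `None` by default mirrors the fact that the symbolic representation is marked optional.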


Narrative

1 The lamp is in the kitchen.
2 The ball is in the dining room.
3 Eve is in the hall.
4 Carol is in the kitchen.
5 Frank is in the hall.
6 Carol got the lamp.
7 Eve went to the kitchen.
8 Eve travelled to the billiard room.
9 Frank travelled to the kitchen.
10 Eve went to the kitchen.
11 Carol travelled to the billiard room.
12 Carol discarded the lamp.
13 Carol grabbed the lamp.


Questions

14 (T1.a) Who is in the kitchen? Eve,Frank
15 (T2.a) Where is Eve? kitchen
16 (T3.a) What is Carol holding? lamp
17 (T12.3) How many objects is Carol holding? one
18 (T3.b) Who is holding the lamp? Carol
19 (T3.b) Who is holding the ball? no one
20 (T4.a) What has Carol held? lamp
21 (T4.b) Who has held the lamp? Carol
22 (T6) Who moved the lamp to the billiard room? Carol
23 (T7.a) Where has Eve been? billiard room,hall,kitchen
24 (T7.a) Where has Frank been? hall,kitchen
25 (T7.c) Where has the lamp been? billiard room,kitchen
26 (T7.b) Where has Eve not been? dining room
29 (T8.a) Who has been in the hall? Eve,Frank
30 (T8.c) What has been in the kitchen? lamp
32 (T8.c) What has been in the billiard room? lamp
33 (T8.b) Who has not been in the billiard room? Frank


Questions (cont.)

27 (T12.8) How many people have been in the kitchen? three
28 (T13.8) Have fewer than four people been in the kitchen? yes
31 (T13.8) Have fewer than three objects been in the kitchen? yes
37 (T11) Who has been in the hall or the dining room (but not both)? Eve,Frank
38 (T9.a) Who has been in the dining room or the kitchen (or both)? Carol,Eve,Frank
42 (T13.9) Have more than five people been in the billiard room or the hall or both? no
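To make concrete what answering these questions requires, here is a minimal sketch of a hand-written world-state tracker for the example narrative. It is not the GABITS generator and not a proposed model: the regular expressions cover only the sentence patterns in this one example, and the function and variable names are illustrative assumptions.

```python
import re
from collections import defaultdict

NARRATIVE = [
    "The lamp is in the kitchen.", "The ball is in the dining room.",
    "Eve is in the hall.", "Carol is in the kitchen.", "Frank is in the hall.",
    "Carol got the lamp.", "Eve went to the kitchen.",
    "Eve travelled to the billiard room.", "Frank travelled to the kitchen.",
    "Eve went to the kitchen.", "Carol travelled to the billiard room.",
    "Carol discarded the lamp.", "Carol grabbed the lamp.",
]


def simulate(narrative):
    location = {}               # entity -> current room
    holder = {}                 # item -> agent holding it, or None
    visited = defaultdict(set)  # entity -> every room it has been in
    agents = set()

    def place(entity, room):
        location[entity] = room
        visited[entity].add(room)

    for sentence in narrative:
        s = sentence.rstrip(".")
        if m := re.fullmatch(r"The (.+) is in the (.+)", s):
            holder[m[1]] = None           # items start out unowned
            place(m[1], m[2])
        elif m := re.fullmatch(r"(\w+) is in the (.+)", s):
            agents.add(m[1])
            place(m[1], m[2])
        elif m := re.fullmatch(r"(\w+) (?:went|travelled|moved) to the (.+)", s):
            place(m[1], m[2])
            for item, who in holder.items():
                if who == m[1]:           # carried items move with the agent
                    place(item, m[2])
        elif m := re.fullmatch(r"(\w+) (?:got|grabbed) the (.+)", s):
            holder[m[2]] = m[1]
        elif m := re.fullmatch(r"(\w+) discarded the (.+)", s):
            holder[m[2]] = None
    return location, holder, visited, agents


location, holder, visited, agents = simulate(NARRATIVE)


def who_is_in(room):
    return sorted(a for a in agents if location[a] == room)


def who_has_been_in(room):
    return sorted(a for a in agents if room in visited[a])


print(who_is_in("kitchen"))              # question 14 (T1.a)
print(holder["lamp"])                    # question 18 (T3.b)
print(sorted(visited["lamp"]))           # question 25 (T7.c)
print(len(who_has_been_in("kitchen")))   # question 27 (T12.8)
```

Note that question 25 can only be answered by tracking that the lamp travels with Carol, even though no sentence states the lamp's new location directly.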


Visual representation of world


1 The lamp is in the kitchen.
2 The ball is in the dining room.
3 Eve is in the hall.
4 Carol is in the kitchen.
5 Frank is in the hall.


6 Carol got the lamp.


7 Eve went to the kitchen.


Symbolic representation of world

time: 65 (Carol took the lamp)

agent0.name Eve
agent0.x 149
agent0.y 324
(agent0.room hall)

agent1.name Carol
agent1.x 284
agent1.y 414
(agent1.room kitchen)

agent2.name Frank
agent2.x 170
agent2.y 414
(agent2.room hall)

item0.name lamp
item0.x 278
item0.y 408
(item0.room kitchen)
(item0.owner Carol)

item1.name ball
item1.x 118
item1.y 52
(item1.room dining room)
(item1.owner null)
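A symbolic state like the one shown on this slide could be read into a dictionary with a few lines of code. This is a sketch under assumptions: it takes the file format to be exactly the `entity.attribute value` lines visible on the slide, treats the parentheses as optional decoration, and skips the `time:` line. The actual GABITS files may differ, and `parse_state` is a hypothetical helper.

```python
SAMPLE_STATE = """\
time: 65 (Carol took the lamp)
agent0.name Eve
agent0.x 149
agent0.y 324
(agent0.room hall)
item1.name ball
item1.x 118
item1.y 52
(item1.room dining room)
(item1.owner null)
"""


def parse_state(text):
    """Parse 'entity.attribute value' lines into {entity: {attribute: value}}."""
    entities = {}
    for line in text.splitlines():
        line = line.strip().strip("()")      # assumption: parentheses carry no meaning
        if not line or "." not in line.split()[0]:
            continue                         # skip blanks and lines such as "time: 65"
        key, _, value = line.partition(" ")
        entity, _, attr = key.partition(".")
        if value == "null":
            parsed = None                    # e.g. an item nobody owns
        elif value.isdigit():
            parsed = int(value)              # pixel coordinates
        else:
            parsed = value                   # names and rooms, possibly multi-word
        entities.setdefault(entity, {})[attr] = parsed
    return entities


state = parse_state(SAMPLE_STATE)
print(state["item1"])
```

Splitting on the first space only keeps multi-word room names such as "dining room" intact.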


Advantages

self-contained: all or nearly all of the information necessary to perform well at the task is present within the training data

– It should be obviously possible for a human to solve the task even if they do not speak the language in which the task is rendered


incremental and compositional: questions build on each other

Who is in the hall?
Who has been in the hall?
Who has not been in the hall?
Who has been in the hall and the lounge?
How many people have been in the hall?
How many people have been in the hall and the lounge?
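One way to see how these questions build on each other is to express each as a composition of a single primitive with set operations. The sketch below is illustrative only: the function names are hypothetical, and the visit history is taken from the example narrative earlier in the talk, which has no lounge, so the kitchen and dining room stand in for it.

```python
# Rooms each person has been in, from the example narrative.
VISITED = {
    "Carol": {"kitchen", "billiard room"},
    "Eve": {"hall", "kitchen", "billiard room"},
    "Frank": {"hall", "kitchen"},
}


def has_been(room):
    """Primitive: who has been in the room?"""
    return {person for person, rooms in VISITED.items() if room in rooms}


def has_not_been(room):
    """Negation of the primitive."""
    return set(VISITED) - has_been(room)


def been_in_both(a, b):
    """Conjunction: who has been in a and b?"""
    return has_been(a) & has_been(b)


def been_in_either(a, b):
    """Inclusive disjunction: a or b (or both)."""
    return has_been(a) | has_been(b)


def been_in_exactly_one(a, b):
    """Exclusive disjunction: a or b (but not both)."""
    return has_been(a) ^ has_been(b)


def how_many(people):
    """Counting composes with any of the queries above."""
    return len(people)


print(sorted(has_been("hall")))
print(sorted(has_not_been("hall")))
print(how_many(been_in_both("hall", "kitchen")))
print(sorted(been_in_exactly_one("hall", "dining room")))
```

Each later question reuses the answer set of an earlier one, which is the sense in which the task set is incremental.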


wide-coverage: the tasks in the dataset correspond to diverse abilities

For even wider coverage - even more tasks!

In recent years, there has been an increasing number of papers with (mostly) self-contained tasks involving two- or three-dimensional spatial representations.
• Contact me for our list so far!
• Or to let me know about more tasks to add to the collection!