Finding your way through the woods with GrETEL
Liesbeth Augustinus, Vincent Vandeghinste, Ineke Schuurman, Frank Van Eynde
TABU-dag - June 14, 2013

Page 1: Finding your way through the woods with GrETEL

Finding your way through the woods with GrETEL

Liesbeth Augustinus, Vincent Vandeghinste

Ineke Schuurman, Frank Van Eynde

TABU-dag - June 14, 2013

Page 2: Finding your way through the woods with GrETEL

GrETEL

• Greedy Extraction of Trees for Empirical Linguistics

• Query engine for treebanks

• Nederbooms project: exploitation of Dutch treebanks for research in linguistics

Page 3: Finding your way through the woods with GrETEL

GrETEL

• Greedy Extraction of Trees for Empirical Linguistics

• Query engine for treebanks

• Nederbooms project: exploitation of Dutch treebanks for research in linguistics

• Goals
  o User-friendly tools
  o Access to large data files
  o Fast and accurate

Page 4: Finding your way through the woods with GrETEL

GrETEL

• Greedy Extraction of Trees for Empirical Linguistics

• Query engine for treebanks

• Treebank = syntactically annotated corpus, e.g. Penn Treebank (English), TüBa (German), LASSY, CGN (Dutch)

Page 5: Finding your way through the woods with GrETEL

TREEBANKS

CGN core corpus                          LASSY Small
Spoken Dutch                             Written Dutch
Stylistic & regional differences:        Stylistic differences:
conversations vs read texts, NL vs VL    Wikipedia vs legal texts
± 1M tokens                              ± 1M tokens
130k sentences                           65k sentences
Manually corrected                       Manually corrected

Page 6: Finding your way through the woods with GrETEL

GrETEL

• Greedy Extraction of Trees for Empirical Linguistics

• Query engine for treebanks

• Treebank = syntactically annotated corpus, e.g. Penn Treebank (English), TüBa (German), LASSY, CGN (Dutch)

• Parser, e.g. Alpino (Van Noord 2006)

Page 7: Finding your way through the woods with GrETEL

ALPINO PARSER

Dit is een zin. (“This is a sentence.”) >> ALPINO parser >>

Page 8: Finding your way through the woods with GrETEL

ALPINO PARSER

Dit is een zin. (“This is a sentence.”) >> ALPINO parser >> XML trees

Query language: XPath

Page 9: Finding your way through the woods with GrETEL

XPATH

//node[@cat="smain" and node[@rel="su" and @pt="vnw" and @lemma="dit"] and node[@rel="hd" and @pt="ww" and @lemma="zijn"] and node[@rel="predc" and @cat="np" and node[@rel="det" and @pt="lid" and @lemma="een"] and node[@rel="hd" and @pt="n" and @lemma="zin"]]]
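
To make the query above concrete, here is a minimal Python sketch of what it asks for. The XML fragment is a hand-written, simplified stand-in for the Alpino tree (real output carries many more attributes), and the helper names `children` and `matches_query` are illustrative, not part of Alpino or GrETEL. Python's standard `xml.etree.ElementTree` supports only a limited subset of XPath, so the conditions are checked in plain Python rather than by passing the expression to an XPath engine.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified stand-in for the Alpino tree of "Dit is een zin."
SENTENCE_XML = """
<alpino_ds>
  <node cat="top" rel="top">
    <node cat="smain" rel="--">
      <node rel="su" pt="vnw" lemma="dit" word="Dit"/>
      <node rel="hd" pt="ww" lemma="zijn" word="is"/>
      <node rel="predc" cat="np">
        <node rel="det" pt="lid" lemma="een" word="een"/>
        <node rel="hd" pt="n" lemma="zin" word="zin"/>
      </node>
    </node>
    <node rel="--" pt="let" lemma="." word="."/>
  </node>
</alpino_ds>
"""

def children(node, **attrs):
    """Direct <node> children whose attributes equal all of `attrs`."""
    return [c for c in node.findall("node")
            if all(c.get(k) == v for k, v in attrs.items())]

def matches_query(node):
    """Plain-Python equivalent of the XPath predicate shown above."""
    if node.get("cat") != "smain":
        return False
    has_subject = bool(children(node, rel="su", pt="vnw", lemma="dit"))
    has_head = bool(children(node, rel="hd", pt="ww", lemma="zijn"))
    # The nested conditions must hold for at least one predc/np child.
    has_predc = any(
        children(np, rel="det", pt="lid", lemma="een")
        and children(np, rel="hd", pt="n", lemma="zin")
        for np in children(node, rel="predc", cat="np")
    )
    return has_subject and has_head and has_predc

root = ET.fromstring(SENTENCE_XML)
# Iterating over every <node> element plays the role of the leading "//node".
hits = [n for n in root.iter("node") if matches_query(n)]
print(len(hits), [c.get("word") for c in hits[0].iter("node") if c.get("word")])
```

Each bracketed `node[...]` in the XPath corresponds to one "some child satisfies these conditions" test, which is why the query grows quickly even for a four-word sentence.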

Page 12: Finding your way through the woods with GrETEL

XPATH

Page 13: Finding your way through the woods with GrETEL

GrETEL

• Greedy Extraction of Trees for Empirical Linguistics

• Query treebanks by example

Page 14: Finding your way through the woods with GrETEL

GrETEL

• Greedy Extraction of Trees for Empirical Linguistics

• Query treebanks by example

• First version
  => only for LASSY treebank

• New release
  => GrETEL for CGN treebank
  => update based on user reviews

Page 15: Finding your way through the woods with GrETEL

The user

• Example sentence

• Indicate relevant items of the sentence

• (Adapt XPath)

• Select treebank

• Inspect results

GrETEL

• Parser (Alpino)

• Automatically generate XPath expression

• Present results
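
The "automatically generate XPath expression" step can be sketched as a recursive walk over the part of the parse tree that the user marked as relevant. The snippet below is a minimal illustration, not GrETEL's actual generator: the pruned subtree, the attribute list `KEEP`, and the function name `to_predicate` are all assumptions made for this example, using LASSY-style attribute names.

```python
import xml.etree.ElementTree as ET

# Pruned LASSY-style subtree that is assumed to remain after the user has
# marked the verb and the preposition as relevant in the example sentence.
PRUNED_XML = """
<node cat="smain">
  <node rel="hd" pos="verb" root="kijk"/>
  <node rel="ld" cat="pp">
    <node rel="hd" pos="prep" root="naar"/>
  </node>
</node>
"""

# Attributes copied into the query, in this order (an assumption of the sketch).
KEEP = ("rel", "cat", "pos", "root")

def to_predicate(node):
    """Turn a node and its children into a nested XPath node test."""
    conditions = [f'@{name}="{node.get(name)}"'
                  for name in KEEP if node.get(name) is not None]
    conditions += [to_predicate(child) for child in node.findall("node")]
    if not conditions:
        return "node"
    return "node[" + " and ".join(conditions) + "]"

query = "//" + to_predicate(ET.fromstring(PRUNED_XML))
print(query)
```

Because every child becomes its own `node[...]` test joined with `and`, the generated query constrains dominance relations only; it says nothing about word order or adjacency.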

Page 16: Finding your way through the woods with GrETEL

OUTLINE

• GrETEL in a nutshell

• GrETEL demo
  o Case study
  o Search options

• Conclusions and future work

Page 17: Finding your way through the woods with GrETEL

CASE STUDY

• Verbs with fixed preposition
  o E.g. Hij keek met een bang hartje naar de heks.
    ‘He was looking at the witch with a heavy heart.’
  o VERB + (…+) PREP

LASSY:

• XPath query

//node[@cat="smain" and node[@rel="hd" and @pos="verb" and @root="kijk"] and node[@rel="ld" and @cat="pp" and node[@rel="hd" and @pos="prep" and @root="naar"]]]

Page 18: Finding your way through the woods with GrETEL

CASE STUDY

• Verbs with fixed preposition
  o E.g. Hij keek naar de heks.
    ‘He was looking at the witch.’

• Discontinuous constructions!
  o E.g. Hij keek met een bang hartje naar de heks.
    ‘He was looking at the witch with a heavy heart.’
  o VERB + (…+) PREP
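
A tree-based query handles the discontinuous case because it tests dominance, not adjacency. The following sketch illustrates this with a small generic matcher; the two trees are hand-written simplifications with LASSY-style attributes (not actual Alpino output), and `matches` is a hypothetical helper, not a GrETEL function.

```python
import xml.etree.ElementTree as ET

# Query pattern: main clause with head verb "kijk" and an ld-PP headed by "naar".
PATTERN = ET.fromstring("""
<node cat="smain">
  <node rel="hd" pos="verb" root="kijk"/>
  <node rel="ld" cat="pp">
    <node rel="hd" pos="prep" root="naar"/>
  </node>
</node>
""")

def matches(node, pattern):
    """True if node has every attribute of pattern and, for each child of
    pattern, some direct child of node matches it (order is ignored)."""
    if any(node.get(k) != v for k, v in pattern.attrib.items()):
        return False
    return all(
        any(matches(child, wanted) for child in node.findall("node"))
        for wanted in pattern.findall("node")
    )

# "Hij keek naar de heks." (simplified)
ADJACENT = ET.fromstring("""
<node cat="smain">
  <node rel="su" pos="pron" root="hij"/>
  <node rel="hd" pos="verb" root="kijk"/>
  <node rel="ld" cat="pp">
    <node rel="hd" pos="prep" root="naar"/>
    <node rel="obj1" cat="np"/>
  </node>
</node>
""")

# "Hij keek met een bang hartje naar de heks." (simplified)
DISCONTINUOUS = ET.fromstring("""
<node cat="smain">
  <node rel="su" pos="pron" root="hij"/>
  <node rel="hd" pos="verb" root="kijk"/>
  <node rel="mod" cat="pp">
    <node rel="hd" pos="prep" root="met"/>
    <node rel="obj1" cat="np"/>
  </node>
  <node rel="ld" cat="pp">
    <node rel="hd" pos="prep" root="naar"/>
    <node rel="obj1" cat="np"/>
  </node>
</node>
""")

for name, tree in [("adjacent", ADJACENT), ("discontinuous", DISCONTINUOUS)]:
    print(name, matches(tree, PATTERN))
```

Both trees satisfy the pattern: the intervening modifier PP is simply an extra sibling that the query does not mention.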

Page 19: Finding your way through the woods with GrETEL

GrETEL ONLINE

Page 20: Finding your way through the woods with GrETEL

INPUT

Page 21: Finding your way through the woods with GrETEL

ANNOTATION MATRIX

Page 22: Finding your way through the woods with GrETEL

ANNOTATION GUIDELINES

Page 23: Finding your way through the woods with GrETEL

XPATH GENERATOR

Page 24: Finding your way through the woods with GrETEL

Other treebank, other format …

Hij keek met een bang hartje naar de heks

• CGN

//node[@cat="smain" and node[@rel="hd" and @pt="ww" and @lemma="kijken"] and node[@rel="ld" and @cat="pp" and node[@rel="hd" and @pt="vz" and @lemma="naar"]]]

• LASSY

//node[@cat="smain" and node[@rel="hd" and @pos="verb" and @root="kijk"] and node[@rel="ld" and @cat="pp" and node[@rel="hd" and @pos="prep" and @root="naar"]]]
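
The two queries above differ only in attribute names and values (`@pos`/`@root` for LASSY, `@pt`/`@lemma` for CGN). As a rough illustration of that correspondence, the sketch below rewrites the LASSY query into the CGN one. The lookup tables cover only the two tags and two words that appear on this slide; they are assumptions made for the example, not a complete tagset mapping and not how GrETEL itself handles the two formats.

```python
import re

# Illustrative lookup tables, limited to the values shown on the slide.
POS_TO_PT = {"verb": "ww", "prep": "vz"}
ROOT_TO_LEMMA = {"kijk": "kijken", "naar": "naar"}

def lassy_to_cgn(xpath):
    """Rewrite @pos/@root tests into @pt/@lemma tests.
    Raises KeyError for any tag or word missing from the tables above."""
    xpath = re.sub(r'@pos="(\w+)"',
                   lambda m: f'@pt="{POS_TO_PT[m.group(1)]}"', xpath)
    xpath = re.sub(r'@root="(\w+)"',
                   lambda m: f'@lemma="{ROOT_TO_LEMMA[m.group(1)]}"', xpath)
    return xpath

LASSY_QUERY = ('//node[@cat="smain" and node[@rel="hd" and @pos="verb" and @root="kijk"] '
               'and node[@rel="ld" and @cat="pp" and node[@rel="hd" and @pos="prep" '
               'and @root="naar"]]]')

print(lassy_to_cgn(LASSY_QUERY))
```

The structural part of the query (`@cat`, `@rel`, nesting) is untouched, which reflects that only the token-level annotation differs between the two formats here.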

Page 26: Finding your way through the woods with GrETEL

TREEBANK SELECTION

Page 27: Finding your way through the woods with GrETEL

RESULTS

Verb plus fixed preposition
  o E.g. Hij keek naar de heks.
    ‘He was looking at the witch.’
  o VERB + (…+) PREP

4004 matches in 3881 sentences
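
The number of matches can exceed the number of sentences because one sentence may contain several clauses that each satisfy the query. A minimal sketch of that counting, using a tiny hand-made corpus (the sentence IDs, the trees, and the helpers `clause` and `is_hit` are all made up for illustration and use LASSY-style attributes):

```python
import xml.etree.ElementTree as ET

def is_hit(node):
    """smain with head verb 'kijk' and an ld-PP headed by 'naar'."""
    if node.get("cat") != "smain":
        return False
    kids = node.findall("node")
    has_verb = any(k.get("rel") == "hd" and k.get("pos") == "verb"
                   and k.get("root") == "kijk" for k in kids)
    has_pp = any(
        k.get("rel") == "ld" and k.get("cat") == "pp"
        and any(g.get("rel") == "hd" and g.get("pos") == "prep"
                and g.get("root") == "naar" for g in k.findall("node"))
        for k in kids
    )
    return has_verb and has_pp

def clause(verb, prep):
    """XML for one simplified main clause: head verb plus an ld-PP."""
    return (f'<node cat="smain"><node rel="hd" pos="verb" root="{verb}"/>'
            f'<node rel="ld" cat="pp"><node rel="hd" pos="prep" root="{prep}"/>'
            f'</node></node>')

one = clause("kijk", "naar")
other = clause("ga", "naar")
CORPUS = {
    "s1": f'<node cat="top">{one}</node>',                             # one matching clause
    "s2": f'<node cat="top"><node cat="conj">{one}{one}</node></node>',  # two matching clauses
    "s3": f'<node cat="top">{other}</node>',                           # different verb, no match
}

hits_per_sentence = {
    sid: sum(is_hit(n) for n in ET.fromstring(xml).iter("node"))
    for sid, xml in CORPUS.items()
}
total_matches = sum(hits_per_sentence.values())
matching_sentences = sum(1 for n in hits_per_sentence.values() if n > 0)
print(f"{total_matches} matches in {matching_sentences} sentences")
```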

Page 28: Finding your way through the woods with GrETEL

RESULTS: table

Page 29: Finding your way through the woods with GrETEL

RESULTS: data

Page 30: Finding your way through the woods with GrETEL

RESULTS: trees

Page 31: Finding your way through the woods with GrETEL

OUTLINE

• GrETEL in a nutshell

• GrETEL demo
  o Case study
  o Search options

• Conclusions and future work

Page 32: Finding your way through the woods with GrETEL

SEARCH OPTIONS

Below annotation matrix

Page 33: Finding your way through the woods with GrETEL

SEARCH OPTIONS

Green versus red word order in Dutch

  o green: past participle – auxiliary
    De NAVO stelt dat ze er alles aan gedaan heeft

  o red: auxiliary – past participle
    De NAVO stelt dat ze er alles aan heeft gedaan

“NATO claims that it has done everything in its power” (deredactie.be)
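
Since both orders have the same dependency structure, telling them apart requires looking at token positions. The sketch below assumes that each word node carries a `begin` attribute with its token position, as in Alpino XML, and that the auxiliary is the clause head with the participle inside a `vc`/`ppart` node; the two trees are heavily simplified and the function name `word_order` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Simplified subordinate clauses; `begin` is the 0-based token position in
# "De NAVO stelt dat ze er alles aan gedaan heeft" / "... heeft gedaan".
GREEN = ET.fromstring(
    '<node cat="ssub">'
    '<node rel="hd" word="heeft" begin="9"/>'
    '<node rel="vc" cat="ppart"><node rel="hd" word="gedaan" begin="8"/></node>'
    '</node>'
)
RED = ET.fromstring(
    '<node cat="ssub">'
    '<node rel="hd" word="heeft" begin="8"/>'
    '<node rel="vc" cat="ppart"><node rel="hd" word="gedaan" begin="9"/></node>'
    '</node>'
)

def word_order(clause):
    """Return 'green' if the participle precedes the auxiliary, else 'red'."""
    kids = clause.findall("node")
    aux = next(k for k in kids if k.get("rel") == "hd")
    vc = next(k for k in kids
              if k.get("rel") == "vc" and k.get("cat") == "ppart")
    participle = next(k for k in vc.findall("node") if k.get("rel") == "hd")
    if int(participle.get("begin")) < int(aux.get("begin")):
        return "green"
    return "red"

print(word_order(GREEN), word_order(RED))
```

A purely structural query would return both variants together; a position comparison of this kind is one way to separate them afterwards.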

Page 34: Finding your way through the woods with GrETEL

SEARCH OPTIONS

Page 35: Finding your way through the woods with GrETEL

SEARCH OPTIONS

Page 36: Finding your way through the woods with GrETEL

SEARCH OPTIONS

Page 37: Finding your way through the woods with GrETEL

SEARCH OPTIONS

Page 38: Finding your way through the woods with GrETEL

SEARCH OPTIONS

Page 39: Finding your way through the woods with GrETEL

OUTLINE

• GrETEL in a nutshell

• GrETEL demo
  o Case study
  o Search options

• Conclusions and future work

Page 40: Finding your way through the woods with GrETEL

CONCLUSIONS

• GrETEL: search engine for Dutch treebanks

• Input = natural language example

• Output = sample of similar sentences

• Syntactic concordancer

• Available online (via Mozilla Firefox)

• No installation required

Page 41: Finding your way through the woods with GrETEL

FUTURE WORK

• GrETEL 2.0
  o Include SoNaR corpus (ca. 500M tokens)
  o More generic

• AfriBooms
  o GrETEL for Afrikaans
  o Include other treebank formats

Page 42: Finding your way through the woods with GrETEL

CASE STUDY

• Collective noun constructions
  o E.g. Een aantal bomen zijn omgevallen.
    ‘A number of trees fell down.’
  o DET + NOUN + PLURAL NOUN

• Discontinuous constructions!
  o E.g. Een groot aantal oude bomen zijn omgevallen.
    ‘A large number of old trees fell down.’

Page 43: Finding your way through the woods with GrETEL

Thanks for your attention!

Try it yourself at http://nederbooms.ccl.kuleuven.be/eng/gretel

Page 44: Finding your way through the woods with GrETEL

Waaraan vs Waar … aan

Waar denk je aan ?

//node[@cat="top" and node[@rel="--" and @cat="whq" and node[@rel="whd" and @pos="adv"] and node[@rel="body" and @cat="sv1" and node[@rel="pc" and @cat="pp" and node[@rel="hd" and @pos="prep"]]]] and node[@rel="--" and @pos="punct"]] (4 results)

• Waar bemoei je je mee?

• Wanneer gaat een koortsstuip over in epilepsie ?

Page 45: Finding your way through the woods with GrETEL

Waaraan denk je ?

//node[@cat="top" and node[@rel="--" and @cat="whq" and node[@rel="whd" and @pos="pp"]] and node[@rel="--" and @pos="punct"]] (38 results)

• Waarom werken we ?

• Waartoe verbind ik mij als ouder door dit formulier in te vullen ?

• Vanwaar die gulle hand van een Turkse overheid die in de schulden zwemt ?

Page 46: Finding your way through the woods with GrETEL

Hij klom de boom in

//node[@cat="top" and node[@rel="--" and @cat="smain" and node[@rel="hd" and @pos="verb"] and node[@rel="ld" and @cat="np" and node[@rel="det" and @pos="det"] and node[@rel="hd" and @pos="noun"]] and node[@rel="svp" and @pos="part"]] and node[@rel="--" and @pos="punct"]] (37 results)

• Door haar winst komt Clijsters de top-20 binnen .

• In feite ging minder dan de helft van Dorsets de rivier over .

• Nederland gaat de bezettingstijd in .