The Role of CNL and AMR in Scalable Abstractive Summarization for Multilingual Media Monitoring
Normunds Grūzītis and Guntis Bārzdiņš
University of Latvia, IMCS; National information agency LETA
5th Workshop on Controlled Natural Language, 25–26 July 2016, Aberdeen, Scotland


Page 1: The Role of CNL and AMR in Scalable Abstractive Summarization for Multilingual Media Monitoring

The Role of CNL and AMR in Scalable Abstractive Summarization for Multilingual Media Monitoring

Normunds Grūzītis and Guntis Bārzdiņš

University of Latvia, IMCS; National information agency LETA

5th Workshop on Controlled Natural Language, 25–26 July 2016, Aberdeen, Scotland

Page 2

Large-scale media monitoring

BBC Monitoring journalists translate from 30 languages into English and follow 400 social media accounts every day.

A monitoring journalist typically monitors 4 TV channels and several online sources simultaneously. This is about the maximum that any person can cope with mentally and physically. The required human effort thus scales linearly with the number of monitored sources.

Monitoring journalists constantly need to be on the lookout for more sources and follow important stories—but as it is, they are tied down with mundane, routine monitoring tasks.

Monitoring 250 video channels results in a daily buffer of 2.5 TB, a weekly buffer of 19 TB, and an annual buffer of 1 PB.

Page 3

SUMMA – Scalable Understanding of Multilingual MediA

Identify people, places, and events of interest. Discover trends, emerging events, and crucial new stories.

H2020 grant No. 688139

Page 4

Timeline

Page 5

Storyline

Event-based multi-document summarization: storyline highlights across a set of related stories

unrestricted

a sort of CNL? (templates)

Page 6

• Extractive summarization selects representative sentences from the input documents

• Abstractive summarization builds a semantic representation from which a summary is generated

• What semantic representation?

Sentence A: I saw Joe’s dog, which was running in the garden.
Sentence B: The dog was chasing a cat.
Summary: Joe’s dog was chasing a cat in the garden.

Liu F., Flanigan J., Thomson S., Sadeh N., Smith N.A. Toward Abstractive Summarization Using Semantic Representations. NAACL 2015

Abstractive summarization
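The graph-merging idea behind the Joe's-dog example can be sketched in a few lines. This is a minimal illustration, not Liu et al.'s actual system: each sentence is hand-encoded as concept triples (an assumed representation), and merging node sets by shared concept is what licenses the cross-sentence summary.

```python
# Sketch: abstractive summarization via semantic-graph merging.
# Nodes are identified by concept name, so the shared concept 'dog'
# collapses into one node when the two sentence graphs are merged.

def merge_graphs(triples_a, triples_b):
    """Union two sets of (source, relation, target) concept triples."""
    return set(triples_a) | set(triples_b)

# Sentence A: "I saw Joe's dog, which was running in the garden."
a = {("see-01", "ARG0", "i"),
     ("see-01", "ARG1", "dog"),
     ("dog", "poss", "joe"),
     ("run-02", "ARG0", "dog"),
     ("run-02", "location", "garden")}

# Sentence B: "The dog was chasing a cat."
b = {("chase-01", "ARG0", "dog"),
     ("chase-01", "ARG1", "cat")}

merged = merge_graphs(a, b)
# 'dog' now connects see-01, run-02, poss and chase-01 edges,
# supporting "Joe's dog was chasing a cat in the garden."
dog_edges = sorted((s, r) for (s, r, t) in merged if t == "dog")
print(dog_edges)
```

A real system would additionally score subgraphs and prune before generating, but the node-sharing step shown here is what makes the summary sentence possible at all.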

Page 7

AMR – Abstract Meaning Representation
• A semantic representation aimed at large-scale human annotation

• A practical, replicable amount of abstraction

• Captures many aspects of meaning in a single simple data structure

• Aims to abstract away from (English) syntax

• Rooted, labeled graphs

• Makes heavy use of PropBank framesets

• An actual sembank of nearly 50K sentences

• Sentences paired with their whole-sentence, logical meanings

Page 8

AMR – Abstract Meaning Representation
• A form of AMR has been around for a long time (Langkilde and Knight, 1998)

• It has changed a lot since then: PropBank, DBpedia, etc.

• Banarescu et al. (2013) – the fundamentals of the current AMR annotation scheme

• Uses the PENMAN notation (Bateman, 1990)

• A way of representing a directed labeled graph in a simple tree-like form

• Easy to read and write (for a human), and to traverse (for a program)

• From semantic role labelling (SRL) to whole-sentence representation

Page 9

AMR – Abstract Meaning Representation
• Nodes are variables labelled by concepts

• Entities, events, states, properties

• d / dog: d is an instance of dog

• Edges are semantic relations

• E.g. “The dog is eating bones.”

(e / eat-01
   :ARG0 (d / dog)
   :ARG1 (b / bone))

eat.01: consume (VN class: eat-39.1, FN frame: Ingestion)
   ARG0-PAG: consumer, eater (VN role: agent)
   ARG1-PPT: meal (VN role: patient)

[Graph view: (e / eat-01) –ARG0→ (d / dog), –ARG1→ (b / bone)]
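PENMAN strings like the eat-01 example above are indeed easy for a program to traverse. The following is a minimal stdlib-only sketch (not the official AMR tooling) that decodes such a string into an instance map and a list of role triples:

```python
import re

def parse_penman(s):
    """Parse a simple PENMAN string like '(e / eat-01 :ARG0 (d / dog))'
    into (root_var, instances, triples). Handles nesting and constants;
    an illustrative sketch, not a full AMR reader."""
    # Tokenize into parens, slashes, :roles, quoted strings, and atoms.
    tokens = re.findall(r'\(|\)|/|:[A-Za-z0-9-]+|"[^"]*"|[^\s()/]+', s)
    pos = 0
    instances, triples = {}, []

    def parse_node():
        nonlocal pos
        pos += 1                              # consume "("
        var = tokens[pos]; pos += 1
        pos += 1                              # consume "/"
        instances[var] = tokens[pos]; pos += 1
        while tokens[pos] != ")":
            role = tokens[pos].lstrip(":"); pos += 1
            if tokens[pos] == "(":            # nested node
                target = parse_node()
            else:                             # variable or constant
                target = tokens[pos]; pos += 1
            triples.append((var, role, target))
        pos += 1                              # consume ")"
        return var

    root = parse_node()
    return root, instances, triples

root, instances, triples = parse_penman(
    "(e / eat-01 :ARG0 (d / dog) :ARG1 (b / bone))")
print(root, instances, triples)
```

The recursive descent mirrors the tree-like surface form of PENMAN: each parenthesis opens a node, and reentrant variables simply appear as bare atoms.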

Page 10

AMR – Abstract Meaning Representation

“Bob ate four cakes that he bought.”

(x2 / eat-01
   :ARG0 (x1 / person
      :name (n / name
         :op1 "Bob")
      :wiki "Bob_X")
   :ARG1 (x4 / cake
      :quant 4
      :ARG1-of (x7 / buy-01
         :ARG0 x1)))

[Graph view of the AMR above: eat-01 –ARG0→ person (name "Bob", wiki "Bob_X"), –ARG1→ cake (quant 4); cake is ARG1-of buy-01, whose ARG0 is the same person x1.]
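The :ARG1-of edge in the cake example is PENMAN's inverse-role device: it lets the graph be written as a single rooted tree even though the edge "really" points the other way. A small sketch (using hand-written triples for the example above; not part of any official toolkit) shows how inverse roles are normalized back to canonical direction:

```python
def normalize(triples):
    """Rewrite inverse roles: (x, 'R-of', y) becomes (y, 'R', x),
    recovering the canonical edge direction of the AMR graph."""
    out = []
    for src, role, tgt in triples:
        if role.endswith("-of"):
            out.append((tgt, role[:-3], src))
        else:
            out.append((src, role, tgt))
    return out

# Triples for "Bob ate four cakes that he bought.", hand-written from
# the AMR above; note x1 is reused, encoding the co-reference he = Bob.
triples = [("x2", "ARG0", "x1"),
           ("x2", "ARG1", "x4"),
           ("x4", "quant", "4"),
           ("x4", "ARG1-of", "x7"),
           ("x7", "ARG0", "x1")]

norm = normalize(triples)
print(norm)
```

After normalization, buy-01 (x7) has cake (x4) as its ARG1, exactly as the relative clause "that he bought" asserts.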

Page 11

AMR – Abstract Meaning Representation

Schneider N., Flanigan J., O’Gorman T. AMR Tutorial at NAACL 2015. https://github.com/nschneid/amr-tutorial/

• AMR is still biased towards English or other source languages

• Not an Interlingua, but Close: Comparison of English AMRs to Chinese and Czech (Xue N., Bojar O., Hajič J., Palmer M., Uresova Z., Zhang X. LREC 2014)

• Meanwhile, AMR is agnostic about how to derive meanings from strings, and vice versa

Page 12

Natural Language Understanding
• While it has recently been shown that the CNL approach can be scaled up...

• Embedded CNLs allowing for CNL-based domain-specific information extraction

• CNL as an efficient and user-friendly interface for Big Data end-point querying

• CNL for bootstrapping robust NL interfaces

• High-level CNL for legal sources

• ...use cases like media monitoring are not limited to a particular domain: the input sources vary from newswire texts to TV and radio transcripts to user-generated content in social networks

• In the era of Big Data, there is a dominant view that Deep Learning is the only way to achieve robust and scalable NLU

• NLU cannot be approached by CNLs, or by grammars in general (?)

Page 13

SemEval 2016 Task 8 on AMR parsing

1. Riga (University of Latvia / LETA): 0.6196
2. CAMR (Brandeis University / Boulder Learning Inc. / Rensselaer Polytechnic Institute): 0.6195
3. ICL-HD (Ruprecht-Karls-Universität Heidelberg): 0.6005
4. UCL+Sheffield (University College London / University of Sheffield): 0.5983
5. M2L (Kyoto University): 0.5952
6. CMU (Carnegie Mellon University / University of Washington): 0.5636
7. CU-NLP (OK Robot Go Ltd. / University of Colorado): 0.5566
8. UofR (University of Rochester): 0.4985
9. MeaningFactory (University of Groningen): 0.4702*
10. CLIP@UMD (University of Maryland): 0.4370
11. DynamicPower (National Institute for Japanese Language and Linguistics): 0.3706*

* Did not use AMR training data

Page 14

NLG from AMR
• The potential of grammar-based and CNL approaches becomes obvious in the opposite direction

• e.g. in the generation of story highlights from summarized (pruned) AMR graphs

• Text generation from AMR is still recognized as a future task
• An unexplored niche for grammars and CNLs
• GF, for instance, as an excellent framework for implementing multilingual AMR verbalizers
• Issue: AMR to AST mapping

Page 15

Pourdamghani N., Gao Y., Hermjakob U., Knight K. Aligning English Strings with Abstract Meaning Representation Graphs. EMNLP 2014

Butler A. Deterministic natural language generation from meaning representations for machine translation. NAACL 2016 Workshop on Semantics-Driven Machine Translation

Pourdamghani N., Knight K., Hermjakob U. Generating English from Abstract Meaning Representations. INLG 2016 (to appear)

Flanigan J., Dyer C., Smith N.A., Carbonell J. Generation from Abstract Meaning Representation using Tree Transducers. NAACL 2016

Page 16

Page 17

NLG from AMR
• Butler A. 2016. Deterministic natural language generation from meaning representations for machine translation. NAACL Workshop on Semantics-Driven Machine Translation

• Converts PENMAN-style representations to Penn-style trees

• Uses the Tregex and Tsurgeon utilities, which are part of the Stanford NLP library

• Covers a wide range of constructions

• A simple example: “Girls see a boy.”

Page 18

AMR to GF conversion: first experiment

“Girls see a boy.”
(x2 (see-01 (:ARG0 (x1 girl)) (:ARG1 (x4 boy))))

mkCl : NP ⟶ VP ⟶ Cl
mkVP : V2 ⟶ NP ⟶ VP
mkNP : Quant ⟶ Num ⟶ CN ⟶ NP
mkCN : N ⟶ CN

(mkCl
   (mkNP a_Quant singularNum (mkCN girl_N))
   (mkVP see_V2 (mkNP a_Quant singularNum (mkCN boy_N))))

adjoin (Cl (VP @)) with PB-frame
move ARG0 under Cl
move ARG1 under VP
adjoin (NP a_Quant singularNum (CN @)) with ARG0/1

excise var
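The transformation steps above can be mimicked in a toy script. This is a hypothetical illustration of the AMR-to-AST mapping idea, not the authors' actual converter: the LEXICON and the amr_to_gf function are assumptions, and a real system would consult PropBank frames and a full AMR-to-GF lexicon.

```python
# Toy AMR-to-GF mapping: excise the variables, look up GF lexical
# functions for the frame and its core arguments, adjoin the NP
# wrapper (a_Quant, singularNum) as on the slide, and emit the AST.

LEXICON = {"girl": "girl_N", "boy": "boy_N", "see-01": "see_V2"}  # hypothetical

def amr_to_gf(frame, arg0, arg1):
    """Build a (mkCl NP (mkVP V2 NP)) AST string from one PropBank
    frame and its ARG0/ARG1 concepts."""
    def np(concept):
        # adjoin (NP a_Quant singularNum (CN @)) around the noun
        return f"(mkNP a_Quant singularNum (mkCN {LEXICON[concept]}))"
    # move ARG0 under Cl, ARG1 under VP
    return f"(mkCl {np(arg0)} (mkVP {LEXICON[frame]} {np(arg1)}))"

ast = amr_to_gf("see-01", "girl", "boy")
print(ast)
```

Even this toy version makes the open issue visible: the choice of a_Quant and singularNum is hard-coded here, whereas a real converter must derive number, articles, and tense from information that the AMR largely abstracts away.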

Page 19

AMR to GF conversion: first experiment

“The boy sees the two pretty girls.”
(x3 (see-01 (:ARG0 (x2 boy)) (:ARG1 (x7 (girl (:quant 2) (:mod (x6 pretty)))))))

mkCN : A ⟶ N ⟶ CN
mkNum : Digits ⟶ Num
mkDigits : Str ⟶ Digits

(mkCl
   (mkNP a_Quant singularNum (mkCN boy_N))
   (mkVP see_V2 (mkNP a_Quant (mkNum (mkDigits "2")) (mkCN pretty_A girl_N))))

move mod under CN
replace Num with quant
adjoin (Num (Digits @)) with quant

Page 20

Story headlines: Templates? Application grammar? CNL?

Multilingual Headlines Generator (a GF toy example by Jose P. Moreno)
http://grammaticalframework.org/demos/multilingual_headlines.html

Page 21

Conclusion
• There is a potential for cooperating with the DL folks in both NLU and NLG

• Especially in NLG which is recognized among the next problems to “solve” by DL

• Especially in domain specific use cases that can be approached by CNL

• AMR-to-text issues to be addressed: number, time, co-references, articles, concepts and WSD (for multilingual NLG), named entities, reification; the management of transformation rules