
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Short Papers), pages 407–413, Melbourne, Australia, July 15–20, 2018. ©2018 Association for Computational Linguistics


Neural Open Information Extraction

Lei Cui, Furu Wei, and Ming Zhou

Microsoft Research Asia
{lecu,fuwei,mingzhou}@microsoft.com

Abstract

Conventional Open Information Extraction (Open IE) systems are usually built on hand-crafted patterns from other NLP tools such as syntactic parsing, yet they face problems of error propagation. In this paper, we propose a neural Open IE approach with an encoder-decoder framework. Distinct from existing methods, the neural Open IE approach learns highly confident arguments and relation tuples bootstrapped from a state-of-the-art Open IE system. An empirical study on a large benchmark dataset shows that the neural Open IE system significantly outperforms several baselines, while maintaining comparable computational efficiency.

1 Introduction

Open Information Extraction (Open IE) involves generating a structured representation of information in text, usually in the form of triples or n-ary propositions. An Open IE system not only extracts arguments but also relation phrases from the given text, and it does not rely on a pre-defined ontology schema. For instance, given the sentence "deep learning is a subfield of machine learning", the triple (deep learning; is a subfield of; machine learning) can be extracted, where the relation phrase "is a subfield of" indicates the semantic relationship between the two arguments. Open IE plays a key role in natural language understanding and fosters many downstream NLP applications such as knowledge base construction, question answering, text comprehension, and others.

The Open IE paradigm was first introduced by TEXTRUNNER (Banko et al., 2007), followed by several popular systems such as REVERB (Fader et al., 2011), OLLIE (Mausam et al., 2012), ClausIE (Del Corro and Gemulla, 2013), Stanford OPENIE (Angeli et al., 2015), PropS (Stanovsky et al., 2016) and, most recently, OPENIE4 [1] (Mausam, 2016) and OPENIE5 [2]. Although these systems have been widely used in a variety of applications, most of them are built on hand-crafted patterns from syntactic parsing, so errors propagate and compound at each stage (Banko et al., 2007; Gashteovski et al., 2017; Schneider et al., 2017). It is therefore essential to address these cascading errors in order to avoid extracting incorrect tuples.

To this end, we propose a neural Open IE approach with an encoder-decoder framework. The encoder-decoder framework is a text generation technique that has been successfully applied to many tasks, such as machine translation (Cho et al., 2014; Bahdanau et al., 2014; Sutskever et al., 2014; Wu et al., 2016; Gehring et al., 2017; Vaswani et al., 2017), image captioning (Vinyals et al., 2014), abstractive summarization (Rush et al., 2015; Nallapati et al., 2016; See et al., 2017) and, recently, keyphrase extraction (Meng et al., 2017). Generally, the encoder encodes the input sequence into an internal representation called the 'context vector', which the decoder uses to generate the output sequence. The lengths of the input and output sequences can differ, as there is no one-to-one correspondence between them. In this work, Open IE is cast as a sequence-to-sequence generation problem, where the input sequence is the sentence and the output sequence is the tuples with special placeholders. For instance, given the input sequence "deep learning is a subfield of machine learning", the output sequence will be "〈arg1〉 deep learning 〈/arg1〉 〈rel〉 is a subfield of 〈/rel〉 〈arg2〉 machine learning 〈/arg2〉".

[1] https://github.com/allenai/openie-standalone
[2] https://github.com/dair-iitd/OpenIE-standalone


[Figure 1: The encoder-decoder model architecture for the neural Open IE system — word embeddings feed a 3-layer LSTM encoder; attention and a copying mechanism connect it to the LSTM decoder, which emits the output tokens.]

We obtain the input and output sequence pairs from highly confident tuples bootstrapped from a state-of-the-art Open IE system. Experimental results on a large benchmark dataset show that the neural Open IE approach is significantly better than the other systems in precision and recall, while also reducing the dependency on other NLP tools.
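As a concrete illustration of this serialization, the following sketch shows one way a bootstrapped (sentence, tuple) pair could be converted into source and target token sequences with placeholder tags. The helper name and the whitespace tokenization are our own simplifying assumptions, not details of the original system.

def serialize_pair(sentence, arg1, rel, arg2):
    """Turn a sentence and a binary tuple into (source, target) token lists.

    Hypothetical helper: the target wraps each tuple element in placeholder
    tags, mirroring the output format described above; tokenization is plain
    whitespace splitting for illustration only.
    """
    source = sentence.split()
    target = (["<arg1>"] + arg1.split() + ["</arg1>"]
              + ["<rel>"] + rel.split() + ["</rel>"]
              + ["<arg2>"] + arg2.split() + ["</arg2>"])
    return source, target

src, tgt = serialize_pair("deep learning is a subfield of machine learning",
                          "deep learning", "is a subfield of", "machine learning")
# tgt: ['<arg1>', 'deep', 'learning', '</arg1>', '<rel>', 'is', 'a', 'subfield',
#       'of', '</rel>', '<arg2>', 'machine', 'learning', '</arg2>']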

The contributions of this paper are three-fold. First, the encoder-decoder framework learns the sequence-to-sequence task directly, bypassing hand-crafted patterns and alleviating error propagation. Second, a large number of high-quality training examples can be bootstrapped from state-of-the-art Open IE systems, and we release this data for future research. Third, we conduct comprehensive experiments on a large benchmark dataset to compare different Open IE systems and show the neural approach's promising potential.

2 Methodology

2.1 Problem Definition

Let $(X, Y)$ be a sentence and tuples pair, where $X = (x_1, x_2, \ldots, x_m)$ is the word sequence and $Y = (y_1, y_2, \ldots, y_n)$ is the tuple sequence extracted from $X$. The conditional probability $P(Y|X)$ can be decomposed as:

$$P(Y|X) = P(Y|x_1, x_2, \ldots, x_m) = \prod_{i=1}^{n} p(y_i \mid y_1, y_2, \ldots, y_{i-1}; x_1, x_2, \ldots, x_m) \quad (1)$$

In this work, we only consider binary extractions from sentences, leaving n-ary extractions and nested extractions for future research. In addition, we ensure that both the argument and relation phrases are sub-spans of the input sequence. Therefore, the output vocabulary equals the input vocabulary plus the placeholder symbols.

2.2 Encoder-Decoder Model Architecture

The encoder-decoder framework maps a variable-length input sequence to a compressed representation vector that the decoder uses to generate the output sequence. In this work, both the encoder and decoder are implemented using Recurrent Neural Networks (RNNs), and the model architecture is shown in Figure 1.

The encoder uses a 3-layer stacked Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997) network to convert the input sequence $X = (x_1, x_2, \ldots, x_m)$ into a set of hidden representations $h = (h_1, h_2, \ldots, h_m)$, where each hidden state is obtained iteratively as follows:

$$h_t = \mathrm{LSTM}(x_t, h_{t-1}) \quad (2)$$

The decoder also uses a 3-layer LSTM network to accept the encoder's output and generate a variable-length sequence $Y$ as follows:

$$s_t = \mathrm{LSTM}(y_{t-1}, s_{t-1}, c)$$
$$p(y_t) = \mathrm{softmax}(y_{t-1}, s_t, c) \quad (3)$$

where $s_t$ is the hidden state of the decoder LSTM at time $t$ and $c$ is the context vector introduced below. We use the softmax layer to calculate the output probability of $y_t$ and select the word with the largest probability.
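A minimal PyTorch sketch of Equations (2) and (3), using the layer count and dimensions reported in Section 3.2; the class and variable names are our own, the encoder here is unidirectional for brevity, and attention and copying are omitted (they are sketched separately below).

import torch
import torch.nn as nn

class Seq2SeqLSTM(nn.Module):
    """Illustrative 3-layer LSTM encoder-decoder, roughly following Eq. (2)-(3)."""
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=256, layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, num_layers=layers, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden_dim, num_layers=layers, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # Eq. (2): run the encoder LSTM over the source tokens.
        enc_out, enc_state = self.encoder(self.embed(src_ids))
        # Eq. (3): initialize the decoder with the encoder state and feed the
        # previous gold tokens (teacher forcing); softmax over the returned
        # logits gives p(y_t).
        dec_out, _ = self.decoder(self.embed(tgt_ids), enc_state)
        return self.out(dec_out)

model = Seq2SeqLSTM(vocab_size=50000)
logits = model(torch.randint(0, 50000, (2, 12)), torch.randint(0, 50000, (2, 15)))
print(logits.shape)  # torch.Size([2, 15, 50000])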

An attention mechanism is vital for the encoder-decoder framework, especially for our neural Open IE system: both the arguments and relations are sub-spans that correspond to the input sequence. We leverage the attention method proposed by Bahdanau et al. (2014) to calculate the context vector $c$ as follows:

$$c_i = \sum_{j=1}^{n} \alpha_{ij} h_j, \qquad \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{n} \exp(e_{ik})}, \qquad e_{ij} = a(s_{i-1}, h_j) \quad (4)$$

where $a$ is an alignment model that scores how well the inputs around position $j$ and the output at position $i$ match, measured from the encoder hidden state $h_j$ and the decoder hidden state $s_{i-1}$. The encoder and decoder are jointly optimized to maximize the log probability of the output sequence conditioned on the input sequence.
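The following PyTorch sketch computes the context vector of Equation (4) with an additive alignment model, one common choice for $a$; all module and variable names here are ours, not part of the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Illustrative additive attention: e_ij = v^T tanh(W_s s_{i-1} + W_h h_j)."""
    def __init__(self, dim=256):
        super().__init__()
        self.w_s = nn.Linear(dim, dim, bias=False)
        self.w_h = nn.Linear(dim, dim, bias=False)
        self.v = nn.Linear(dim, 1, bias=False)

    def forward(self, s_prev, enc_states):
        # e_ij: score every encoder state h_j against the decoder state s_{i-1}.
        scores = self.v(torch.tanh(self.w_s(s_prev).unsqueeze(1) + self.w_h(enc_states)))
        # alpha_ij: normalize the scores over the source positions j.
        alpha = F.softmax(scores, dim=1)
        # c_i: attention-weighted sum of the encoder hidden states.
        context = (alpha * enc_states).sum(dim=1)
        return context, alpha.squeeze(-1)

attn = AdditiveAttention()
ctx, weights = attn(torch.randn(2, 256), torch.randn(2, 10, 256))
print(ctx.shape, weights.shape)  # torch.Size([2, 256]) torch.Size([2, 10])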

2.3 Copying Mechanism

Since most encoder-decoder methods maintain a fixed vocabulary of frequent words and convert a large number of long-tail words into a special symbol "〈unk〉", the copying mechanism (Gu et al., 2016; Gulcehre et al., 2016; See et al., 2017; Meng et al., 2017) is designed to copy words from the input sequence to the output sequence, thus enlarging the vocabulary and reducing the proportion of generated unknown words. For the neural Open IE task, the copying mechanism is even more important because the output vocabulary comes directly from the input vocabulary except for the placeholder symbols. We simplify the copying method in (See et al., 2017): the probability of generating the word $y_t$ comes from two parts as follows:

$$p(y_t) = \begin{cases} p(y_t \mid y_1, y_2, \ldots, y_{t-1}; X) & \text{if } y_t \in V \\ \sum_{i: x_i = y_t} a_{ti} & \text{otherwise} \end{cases} \quad (5)$$

where $V$ is the target vocabulary. We combine the sequence-to-sequence generation and attention-based copying together to derive the final output.
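A small sketch of Equation (5), assuming the attention weights $a_t$ over the source positions are available from the decoder step; the function name and the toy ids below are ours.

import torch

def copy_probability(gen_probs, attn_weights, src_ids, target_id, in_vocab):
    """Simplified Eq. (5) for one candidate output word y_t.

    gen_probs:    (vocab_size,) generation distribution p(y_t | y_<t; X)
    attn_weights: (src_len,) attention weights a_t over the source positions
    src_ids:      (src_len,) source token ids in an extended id space
    """
    if in_vocab:
        # y_t in V: use the decoder's generation probability directly.
        return gen_probs[target_id]
    # Otherwise: sum the attention mass on the source positions equal to y_t.
    return attn_weights[src_ids == target_id].sum()

# Toy example: an out-of-vocabulary word appears at source positions 1 and 3.
attn = torch.tensor([0.1, 0.4, 0.2, 0.3])
src = torch.tensor([7, 90001, 8, 90001])
print(copy_probability(torch.rand(50000), attn, src, target_id=90001, in_vocab=False))  # tensor(0.7000)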

3 Experiments

3.1 Data

For the training data, we used the Wikipedia dump 20180101 [3] and extracted all the sentences that are 40 words or less. OPENIE4 is used to analyze the sentences and extract all the tuples with binary relations. To further obtain high-quality tuples, we only kept the tuples whose confidence score is at least 0.9. In total, 36,247,584 〈sentence, tuple〉 pairs are extracted. The training data is released for public use at https://1drv.ms/u/s!ApPZx_TWwibImHl49ZBwxOU0ktHv.

[3] https://dumps.wikimedia.org/enwiki/20180101/
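A sketch of this bootstrapping filter, assuming the OPENIE4 output has already been parsed into (sentence, arg1, rel, arg2, confidence) records; the record layout and the function name are our own assumptions, not part of the released pipeline.

def bootstrap_pairs(records, max_len=40, min_conf=0.9):
    """Yield (sentence, tuple) training pairs from Open IE extraction records,
    keeping only sentences of at most `max_len` words and extractions whose
    confidence is at least `min_conf`."""
    for sentence, arg1, rel, arg2, conf in records:
        if len(sentence.split()) > max_len or conf < min_conf:
            continue
        yield sentence, (arg1, rel, arg2)

# Example: one record passes the filters, one is dropped for low confidence.
records = [
    ("deep learning is a subfield of machine learning",
     "deep learning", "is a subfield of", "machine learning", 0.95),
    ("it may be a subfield", "it", "may be", "a subfield", 0.42),
]
print(len(list(bootstrap_pairs(records))))  # 1

Each yielded pair can then be serialized into the tagged target format illustrated in Section 1.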

For the test data, we used a large benchmark dataset (Stanovsky and Dagan, 2016) that contains 3,200 sentences with 10,359 extractions [4]. We compared with several state-of-the-art baselines including OLLIE, ClausIE, Stanford OPENIE, PropS and OPENIE4. The evaluation metrics are precision and recall.

3.2 Parameter Settings

We implemented the neural Open IE model using OpenNMT (Klein et al., 2017), which is an open-source encoder-decoder framework. We used 4 M60 GPUs for parallel training, which takes 3 days. The encoder is a 3-layer bidirectional LSTM and the decoder is another 3-layer LSTM. Our model has 256-dimensional hidden states and 256-dimensional word embeddings. A vocabulary of 50k words is used for both the source and target sides. We optimized the model with SGD and set the initial learning rate to 1. We trained the model for 40 epochs and started learning rate decay from the 11th epoch with a decay rate of 0.7. The dropout rate is set to 0.3. We split the data into 20 partitions and used data sampling in OpenNMT to train the model. This shortens each epoch, allowing more frequent learning rate updates and validation perplexity computation.
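To make the schedule concrete, the snippet below reproduces the stated learning-rate behaviour (initial rate 1.0, multiplied by 0.7 per epoch from the 11th epoch onward). This is only our reading of the description above, not the actual OpenNMT configuration.

def learning_rate(epoch, initial=1.0, decay=0.7, start_decay_epoch=11):
    """SGD learning rate for a 1-indexed epoch under the schedule in Section 3.2
    (assumed interpretation: the first decay is applied at `start_decay_epoch`)."""
    if epoch < start_decay_epoch:
        return initial
    return initial * decay ** (epoch - start_decay_epoch + 1)

print([round(learning_rate(e), 4) for e in (1, 10, 11, 12, 15)])
# [1.0, 1.0, 0.7, 0.49, 0.1681]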

3.3 Results

We used the script from (Stanovsky and Dagan, 2016) [5] to evaluate the precision and recall of the baseline systems as well as the neural Open IE system. The precision-recall curve is shown in Figure 2. We observe that the neural Open IE system performs best among all tested systems. Furthermore, we also calculated the Area under the Precision-Recall Curve (AUC) for each system. The neural Open IE system with top-5 outputs achieves the best AUC score of 0.473, which is significantly better than the other systems.
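For reference, AUC values of the kind reported here can be computed from a system's precision-recall points with scikit-learn's trapezoidal auc helper; the points below are placeholders, not the actual benchmark outputs.

from sklearn.metrics import auc

# Placeholder precision-recall points sorted by recall; the real points come
# from the benchmark's evaluation script, not from this snippet.
recall = [0.05, 0.15, 0.30, 0.45, 0.60]
precision = [0.90, 0.85, 0.75, 0.60, 0.40]
print(round(auc(recall, precision), 3))  # 0.384 for this toy curve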

[4] https://github.com/gabrielStanovsky/oie-benchmark

[5] The absolute scores differ from those in the original paper because the authors changed the matching function in their GitHub repository; this does not change the relative performance.


[Figure 2: The Precision-Recall (P-R) curve and Area under P-R curve (AUC) of Open IE systems. Systems compared: Neural OpenIE (Top-5), Neural OpenIE (Top-1), OpenIE 4, OpenIE 4 (Binary), ClausIE, Ollie, PropS, Stanford.]

Although the neural Open IE system is learned from the bootstrapped outputs of OPENIE4's extractions, only 11.4% of the extractions from neural Open IE agree with OPENIE4's extractions, while the AUC score is even better than OPENIE4's result. We believe this is because the neural approach learns arguments and relations across a large number of highly confident training instances. This also indicates that the generalization capability of the neural approach is better than that of previous methods. We observed many cases in which the neural Open IE system is able to correctly identify the boundary of arguments while OPENIE4 cannot, for instance:

Input: Instead, much of numerical analysis is concerned with obtaining approximate solutions while maintaining reasonable bounds on errors.
Gold: much of numerical analysis ||| concerned ||| with obtaining approximate solutions while maintaining reasonable bounds on errors
OpenIE4: much of numerical analysis ||| is concerned with ||| obtaining approximate solutions
Neural Open IE: much of numerical analysis ||| is concerned ||| with obtaining approximate solutions while maintaining reasonable bounds on errors

This case illustrates that the neural approach reduces the limitations imposed by hand-crafted patterns from other NLP tools. It therefore reduces the error propagation effect and performs better than the other systems, especially on long sentences.

We also investigated the computational cost of different systems. For the baseline systems, we obtained the Open IE extractions using a Xeon 2.4 GHz CPU. For the neural Open IE system, we evaluated performance on an M60 GPU. The running time was measured by extracting Open IE tuples from the test dataset, which contains a total of 3,200 sentences. The results are shown in Table 1. Among the aforementioned conventional systems, Ollie is the most efficient approach, taking around 160s to finish the extraction. Using the GPU, the neural approach takes 172s to extract the tuples from the test data, which is comparable with the conventional approaches. As the neural approach does not depend on other NLP tools, the computational cost can be further optimized in future research.

System           Device   Time
Stanford         CPU      234s
Ollie            CPU      160s
ClausIE          CPU      960s
PropS            CPU      432s
OpenIE4          CPU      181s
Neural Open IE   GPU      172s

Table 1: Running time of different systems

4 Related Work

The development of Open IE systems has witnessed rapid growth during the past decade (Mausam, 2016). The Open IE task was introduced by TEXTRUNNER (Banko et al., 2007) as the first generation. It casts the argument and relation extraction task as a sequential labeling problem. The system is highly scalable and extracts facts from large-scale web content. REVERB (Fader et al., 2011) improved over TEXTRUNNER with syntactic and lexical constraints on binary relations expressed by verbs, which more than doubles the area under the precision-recall curve. Following these efforts, the second generation, known as R2A2 (Etzioni et al., 2011), was developed based on REVERB and an argument identifier, ARGLEARNER, to better extract the arguments for the relation phrases. The first and second generation Open IE systems extract only relations that are mediated by verbs and ignore context. To alleviate these limitations, the third generation, OLLIE (Mausam et al., 2012), was developed; it achieves better performance by extracting relations mediated by nouns, adjectives, and more. In addition, contextual information is also leveraged to improve the precision of extractions. All three generations only consider binary extractions from the text, but binary extractions are not always sufficient to represent the semantics of a sentence. Therefore, SRLIE (Christensen et al., 2010) was developed to include an attribute context with a tuple when it is available. OPENIE4 was built on SRLIE together with a rule-based extraction system, RELNOUN (Pal and Mausam, 2016), for extracting noun-mediated relations. Recently, OPENIE5 improved extractions from numerical sentences (Saha et al., 2017) and broke conjunctions in arguments to generate multiple extractions. During this period, several other Open IE systems also emerged and were successfully applied in different scenarios, such as ClausIE (Del Corro and Gemulla, 2013), Stanford OPENIE (Angeli et al., 2015), PropS (Stanovsky et al., 2016), and more.

The encoder-decoder framework was introduced by Cho et al. (2014) and Sutskever et al. (2014), where a multi-layered LSTM/GRU is used to map the input sequence to a vector of fixed dimensionality, and another deep LSTM/GRU decodes the target sequence from that vector. Bahdanau et al. (2014) and Luong et al. (2015) further improved the encoder-decoder framework by integrating an attention mechanism so that the model can automatically find the parts of a source sentence that are relevant to predicting a target word. To improve the parallelization of model training, the convolutional sequence-to-sequence (ConvS2S) framework (Gehring et al., 2016, 2017) was proposed, which fully parallelizes training because the number of non-linearities is fixed and independent of the input length. Recently, the transformer framework (Vaswani et al., 2017) further improved over the vanilla S2S model and ConvS2S in both accuracy and training time.

In this paper, we use the LSTM-based S2S approach to obtain binary extractions for the Open IE task. To the best of our knowledge, this is the first time that the Open IE task is addressed with an end-to-end neural approach, bypassing hand-crafted patterns and alleviating error propagation.

5 Conclusion and Future Work

We proposed a neural Open IE approach using an encoder-decoder framework. The neural Open IE model is trained with highly confident binary extractions bootstrapped from a state-of-the-art Open IE system, so it can generate high-quality tuples without any hand-crafted patterns from other NLP tools. Experiments show that our approach achieves very promising results on a large benchmark dataset.

For future research, we will further investigate how to generate more complex tuples, such as n-ary extractions and nested extractions, with the neural approach. Moreover, other frameworks such as convolutional sequence-to-sequence and transformer models could be applied to achieve better performance.

Acknowledgments

We are grateful to the anonymous reviewers for their insightful comments and suggestions.

References

Gabor Angeli, Melvin Jose Johnson Premkumar, and Christopher D. Manning. 2015. Leveraging linguistic structure for open domain information extraction. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, Beijing, China, pages 344–354. http://www.aclweb.org/anthology/P15-1034.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. CoRR abs/1409.0473. http://arxiv.org/abs/1409.0473.

Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead, and Oren Etzioni. 2007. Open information extraction from the web. In Proceedings of the 20th International Joint Conference on Artificial Intelligence. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, IJCAI'07, pages 2670–2676. http://dl.acm.org/citation.cfm?id=1625275.1625705.


Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Doha, Qatar, pages 1724–1734. http://www.aclweb.org/anthology/D14-1179.

Janara Christensen, Mausam, Stephen Soderland, and Oren Etzioni. 2010. Semantic role labeling for open information extraction. In Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading. Association for Computational Linguistics, Los Angeles, California, pages 52–60. http://www.aclweb.org/anthology/W10-0907.

Luciano Del Corro and Rainer Gemulla. 2013. ClausIE: Clause-based open information extraction. In Proceedings of the 22nd International Conference on World Wide Web. ACM, New York, NY, USA, WWW '13, pages 355–366. https://doi.org/10.1145/2488388.2488420.

Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland, and Mausam. 2011. Open information extraction: The second generation. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume One. AAAI Press, IJCAI'11, pages 3–10. https://doi.org/10.5591/978-1-57735-516-8/IJCAI11-012.

Anthony Fader, Stephen Soderland, and Oren Etzioni. 2011. Identifying relations for open information extraction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '11). Edinburgh, Scotland, UK.

Kiril Gashteovski, Rainer Gemulla, and Luciano Del Corro. 2017. MinIE: Minimizing facts in open information extraction. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Copenhagen, Denmark, pages 2630–2640. https://www.aclweb.org/anthology/D17-1278.

Jonas Gehring, Michael Auli, David Grangier, and Yann N. Dauphin. 2016. A convolutional encoder model for neural machine translation. ArXiv e-prints.

Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin. 2017. Convolutional sequence to sequence learning. ArXiv e-prints.

Jiatao Gu, Zhengdong Lu, Hang Li, and Victor O.K. Li. 2016. Incorporating copying mechanism in sequence-to-sequence learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Berlin, Germany, pages 1631–1640. http://www.aclweb.org/anthology/P16-1154.

Caglar Gulcehre, Sungjin Ahn, Ramesh Nallapati, Bowen Zhou, and Yoshua Bengio. 2016. Pointing the unknown words. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Berlin, Germany, pages 140–149. http://www.aclweb.org/anthology/P16-1014.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9(8):1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735.

Guillaume Klein, Yoon Kim, Yuntian Deng, Jean Senellart, and Alexander M. Rush. 2017. OpenNMT: Open-source toolkit for neural machine translation. CoRR abs/1701.02810. http://arxiv.org/abs/1701.02810.

Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Lisbon, Portugal, pages 1412–1421. http://aclweb.org/anthology/D15-1166.

Mausam. 2016. Open information extraction systems and downstream applications. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence. AAAI Press, IJCAI'16, pages 4074–4077. http://dl.acm.org/citation.cfm?id=3061053.3061220.

Mausam, Michael Schmitz, Robert Bart, Stephen Soderland, and Oren Etzioni. 2012. Open language learning for information extraction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL).

Rui Meng, Sanqiang Zhao, Shuguang Han, Daqing He, Peter Brusilovsky, and Yu Chi. 2017. Deep keyphrase generation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Vancouver, Canada, pages 582–592. http://aclweb.org/anthology/P17-1054.

Ramesh Nallapati, Bowen Zhou, Cícero Nogueira dos Santos, Çağlar Gülçehre, and Bing Xiang. 2016. Abstractive text summarization using sequence-to-sequence RNNs and beyond. In CoNLL.

Harinder Pal and Mausam. 2016. Demonyms and compound relational nouns in nominal open IE. In Proceedings of the 5th Workshop on Automated Knowledge Base Construction. Association for Computational Linguistics, San Diego, CA, pages 35–39. http://www.aclweb.org/anthology/W16-1307.


Alexander M. Rush, Sumit Chopra, and Jason Weston. 2015. A neural attention model for abstractive sentence summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Lisbon, Portugal, pages 379–389. http://aclweb.org/anthology/D15-1044.

Swarnadeep Saha, Harinder Pal, and Mausam. 2017. Bootstrapping for numerical open IE. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, Vancouver, Canada, pages 317–323. http://aclweb.org/anthology/P17-2050.

Rudolf Schneider, Tom Oberhauser, Tobias Klatt, Felix A. Gers, and Alexander Löser. 2017. Analysing errors of open information extraction systems. In Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems. Association for Computational Linguistics, Copenhagen, Denmark, pages 11–18. http://www.aclweb.org/anthology/W17-5402.

Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get to the point: Summarization with pointer-generator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Vancouver, Canada, pages 1073–1083. http://aclweb.org/anthology/P17-1099.

Gabriel Stanovsky and Ido Dagan. 2016. Creating a large benchmark for open information extraction. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Austin, Texas, pages 2300–2305. https://aclweb.org/anthology/D16-1252.

Gabriel Stanovsky, Jessica Ficler, Ido Dagan, and Yoav Goldberg. 2016. Getting more out of syntax with PropS. CoRR abs/1603.01648. http://arxiv.org/abs/1603.01648.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. CoRR abs/1409.3215. http://arxiv.org/abs/1409.3215.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, Curran Associates, Inc., pages 5998–6008. http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf.

Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. 2014. Show and tell: A neural image caption generator. CoRR abs/1411.4555. http://arxiv.org/abs/1411.4555.

Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Lukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. CoRR abs/1609.08144. http://arxiv.org/abs/1609.08144.