Upload
others
View
7
Download
0
Embed Size (px)
Citation preview
Universidade de São Paulo
2015-09
Semi-supervised never-ending learning in
rhetorical relation identification International Conference on Recent Advances in Natural Language Processing, 2015, Hissar.http://www.producao.usp.br/handle/BDPI/49271
Downloaded from: Biblioteca Digital da Produção Intelectual - BDPI, Universidade de São Paulo
Biblioteca Digital da Produção Intelectual - BDPI
Departamento de Ciências de Computação - ICMC/SCC Comunicações em Eventos - ICMC/SCC
INTERNATIONAL CONFERENCE
RECENT ADVANCES IN
NATURAL LANGUAGE PROCESSING
P R O C E E D I N G S
Edited by
Galia Angelova, Kalina Bontcheva, Ruslan Mitkov
Hissar, Bulgaria
7–9 September, 2015
INTERNATIONAL CONFERENCE
RECENT ADVANCES IN
NATURAL LANGUAGE PROCESSING’2015
PROCEEDINGS
Hissar, Bulgaria
7–9 September 2015
ISSN 1313-8502
Designed and Printed by INCOMA Ltd.
Shoumen, BULGARIA
ii
Table of Contents
POS Tagging for Arabic Tweets
Fahad Albogamy and Allan Ramsay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Lexicon-based Sentiment Analysis for Persian Text
Fatemeh Amiri, Simon Scerri and Mohammadhassan Khodashahi . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Automatic Construction of a TMF Terminological Database using a Transducer Cascade
Chihebeddine Ammar, Kais Haddar and Laurent Romary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
A Statistical Model for Measuring Structural Similarity between Webpages
Zhenisbek Assylbekov, Assulan Nurkas and Inês Russinho Mouga . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Predicting the quality of questions on Stackoverflow
Antoaneta Baltadzhieva and Grzegorz Chrupała . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
How Topic Biases Your Results? A Case Study of Sentiment Analysis and Irony Detection in Italian
Francesco Barbieri, Francesco Ronzano and Horacio Saggion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Arabic Named Entity Recognition Process using Transducer Cascade and Arabic Wikipedia
Fatma BEN MESMIA, Kais HADDAR, Denis Maurel and Nathalie Friburger . . . . . . . . . . . . . . . . 48
DanProof: Pedagogical Spell and Grammar Checking for Danish
Eckhard Bick . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .55
Maximal Repeats Enhance Substring-based Authorship Attribution
Romain Brixtel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Improving Event Detection with Active Learning
Kai Cao, Xiang Li, Miao Fan and Ralph Grishman . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Improving Event Detection with Dependency Regularization
Kai Cao, Xiang Li and Ralph Grishman . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
Authorship Verification, Average Similarity Analysis
Daniel Castro Castro, Yaritza Adame Arcia, María Pelaez Brioso and Rafael Muñoz Guillena . . 84
Coreference Resolution to Support IE from Indian Classical Music Forums
Joe Cheri and Pushpak Bhattacharyya. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .91
Readability Assessment of Translated Texts
Alina Maria Ciobanu, Liviu P. Dinu and Flaviu Pepelea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
Processing and Normalizing Hashtags
Thierry Declerck and Piroska Lendvai . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Tune Your Brown Clustering, Please
Leon Derczynski, Sean Chester and Kenneth Bøgh . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
Temporal Relation Classification using a Model of Tense and Aspect
Leon Derczynski and Robert Gaizauskas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
Efficient Named Entity Annotation through Pre-empting
Leon Derczynski and Kalina Bontcheva . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
xi
A Joint Model of Product Properties, Aspects and Ratings for Online Reviews
Ying Ding and Jing Jiang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Towards Opinion Summarization from Online Forums
Ying Ding and Jing Jiang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
Cross-lingual Synonymy Overlap
Anca Dinu, Liviu P. Dinu and Ana Sabina Uban . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
Barbecued Opakapaka: Using Semantic Preferences for Ontology Population
Ismail El Maarouf, Georgiana Marsic and Constantin Orasan. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .153
Towards a Lexicon-grammar based Framework for NLP: an Opinion Mining Application
Annibale Elia, Serena Pelosi, Alessandro Maisto and Raffaele Guarasci . . . . . . . . . . . . . . . . . . . . . 160
Using the Textual Content of the LMF-Normalized Dictionaries for Identifying and Linking the Syntactic
Behaviors to the Meanings
Imen ELLEUCH, Bilel GARGOURI and Abdelmajid BEN HAMADOU . . . . . . . . . . . . . . . . . . . 168
Weakly Supervised Definition Extraction
Luis Espinosa Anke, Horacio Saggion and Francesco Ronzano . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
Jointly Embedding Relations and Mentions for Knowledge Population
Miao Fan, Kai Cao, Yifan He and Ralph Grishman . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
Distributional Semantics for Resolving Bridging Mentions
Tim Feuerbach, Martin Riedl and Chris Biemann . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
Text Classification into Abstract Classes Based on Discourse Structure
Boris Galitsky, Dmitry Ilvovsky and Sergey O. Kuznetsov . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
Enriching Word Sense Embeddings with Translational Context
Mehdi Ghanimifard and Richard Johansson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
Automatic Acquisition of Artifact Nouns in French
Xiaoqin Hu and Pierre-André Buvet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
Feature-Rich Part-Of-Speech Tagging Using Deep Syntactic and Semantic Analysis
Luchezar Jackov . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
The Application of Constraint Rules to Data-driven Parsing
Sardar Jaf and Allan Ramsay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
Part-of-Speech Tagging for Code-Mixed English-Hindi Twitter and Facebook Chat Messages
Anupam Jamatia, Björn Gambäck and Amitava Das . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
Sentiment Analysis in Twitter for Macedonian
Dame Jovanoski, Veno Pachovski and Preslav Nakov . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
About Emotion Identification in Visual Sentiment Analysis
Olga Kanishcheva and Galia Angelova . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
Fine-Grained Sentiment Analysis for Movie Reviews in Bulgarian
Borislav Kapukaranov and Preslav Nakov . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
xii
Structural Alignment for Comparison Detection
Wiltrud Kessler and Jonas Kuhn. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .275
Recognition of Polish Temporal Expressions
Jan Kocon and Michał Marcinczuk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
Domain Adaptation with Filtering for Named Entity Extraction of Japanese Anime-Related Words
Kanako Komiya, Daichi EDAMURA, Ryuta TAMURA, Minoru SASAKI, Hiroyuki SHINNOUand Yoshiyuki KOTANI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
Feature Extraction for Native Language Identification Using Language Modeling
Vincent Kríž, Martin Holub and Pavel Pecina . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
Learning Agglutinative Morphology of Indian Languages with Linguistically Motivated Adaptor Gram-
mars
Arun Kumar, Lluís Padró and Antoni Oliver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
Statistical Sandhi Splitter and its Effect on NLP Applications
Prathyusha Kuncham, Kovida Nelakuditi and Radhika Mamidi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
Automatically Identifying Periodic Social Events from Twitter
Florian Kunneman and Antal Van den Bosch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
Collecting and Evaluating Lexical Polarity with A Game With a Purpose
Mathieu Lafourcade, Alain Joubert and Nathalie Le Brun . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
Medical imaging report indexing: enrichment of index through an algorithm of spreading over a lexico-
semantic network
Mathieu Lafourcade and Lionel Ramadier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
Taxonomy Beats Corpus in Similarity Identification, but Does It Matter?
Minh Le and Antske Fokkens . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346
Semantic Parsing via ℓ_0-norm-based Alignment
Zhihua Liao, Qixian Zeng and Qiyun Wang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
A Supervised Semantic Parsing with Lexicon Extension and Syntactic Constraint
Zhihua Liao, Qixian Zeng and Qiyun Wang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362
Predicting the Level of Text Standardness in User-generated Content
Nikola Ljubešic, Darja Fišer, Tomaž Erjavec, Jaka Cibej, Dafne Marko, Senja Pollak and Iza Škr-janec . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
Predicting Inflectional Paradigms and Lemmata of Unknown Words for Semi-automatic Expansion of
Morphological Lexicons
Nikola Ljubešic, Miquel Esplà-Gomis, Filip Klubicka and Nives Mikelic Preradovic . . . . . . . . . 379
Predicting Correlations Between Lexical Alignments and Semantic Inferences
Simone Magnolini and Bernardo Magnini . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388
Learning the Impact of Machine Translation Evaluation Metrics for Semantic Textual Similarity
Simone Magnolini, Ngoc Phuoc An Vo and Octavian Popescu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
Norwegian Native Language Identification
Shervin Malmasi, Mark Dras and Irina Temnikova . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
xiii
Automatic construction of complex features in Conditional Random Fields for Named Entities Recogni-
tion
Michał Marcinczuk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
Pattern Construction for Extracting Domain Terminology
Yusney Marrero García, Paloma Moreda Pozo and Rafael Muñoz-Guillena . . . . . . . . . . . . . . . . . . 420
A Procedural Definition of Multi-word Lexical Units
Marek Maziarz, Stan Szpakowicz and Maciej Piasecki . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
Semi-Supervised Never-Ending Learning in Rhetorical Relation Identification
Erick Galani Maziero, Graeme Hirst and Thiago Alexandre Salgueiro Pardo . . . . . . . . . . . . . . . . . 436
Exposing Paid Opinion Manipulation Trolls
Todor Mihaylov, Ivan Koychev, Georgi Georgiev and Preslav Nakov . . . . . . . . . . . . . . . . . . . . . . . . 443
Extractive Summarization by Aggregating Multiple Similarities
Olof Mogren, Mikael Kågebäck and Devdatt Dubhashi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451
Statistical Machine Translation Improvement based on Phrase Selection
Cyrine Nasri, Chiraz Latiri and Kamel Smaili . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
A Simple and Efficient Method to Generate Word Sense Representations
Luis Nieto Piña and Richard Johansson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465
TBXTools: A Free, Fast and Flexible Tool for Automatic Terminology Extraction
Antoni Oliver and Mercè Vàzquez. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .473
Enlarging the Croatian WordNet with WN-Toolkit and Cro-Deriv
Antoni Oliver, Krešimir Šojat and Matea Srebacic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480
A Comparative Study of Different Sentiment Lexica for Sentiment Analysis of Tweets
Canberk Ozdemir and Sabine Bergler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 488
Classification of Attributes in a Natural Language Query into Different SQL Clauses
Ashish Palakurthi, Ruthu S M, Arjun Akula and Radhika Mamidi . . . . . . . . . . . . . . . . . . . . . . . . . . 497
Classifying Idiomatic and Literal Expressions Using Vector Space Representations
Jing Peng, Anna Feldman and Hamza Jazmati . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507
Extraction of the Multi-word Lexical Units in the Perspective of the Wordnet Expansion
Maciej Piasecki, Michał Wendelberger and Marek Maziarz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512
A New Approach to Automated Text Readability Classification based on Concept Indexing with Integrated
Part-of-Speech n-gram Features
Abigail Razon and John Barnden . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521
Classification of Lexical Collocation Errors in the Writings of Learners of Spanish
Sara Rodríguez-Fernández, Roberto Carlini and Leo Wanner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 529
Measuring Semantic Similarity for Bengali Tweets Using WordNet
Dwijen Rudrapal, Amitava Das and Baby Bhattacharya . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .537
Ordering adverbs by their scaling effect on adjective intensity
Josef Ruppenhofer, Jasper Brandes, Petra Steiner and Michael Wiegand . . . . . . . . . . . . . . . . . . . . . 545
xiv
bRol: The Parser of Syntactic and Semantic Dependencies for Basque
Haritz Salaberri, Olatz Arregi and Beñat Zapirain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 555
Error Analysis and Improving Speech Recognition for Latvian Language
Askars Salimbajevs and Jevgenijs Strigins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 563
Towards the Unsupervised Acquisition of Implicit Semantic Roles
Niko Schenk, Christian Chiarcos and Maria Sukhareva . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 570
Evaluating the Impact of Using a Domain-specific Bilingual Lexicon on the Performance of a Hybrid
Machine Translation Approach
Nasredine Semmar, Othman Zennaki and Meriama Laib . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 579
Hierarchical Topic Structuring: From Dense Segmentation to Topically Focused Fragments via Burst
Analysis
Anca-Roxana Simon, Pascale Sébillot and Guillaume Gravier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 588
Improving Word Sense Disambiguation with Linguistic Knowledge from a Sense Annotated Treebank
Kiril Simov, Alexander Popov and Petya Osenova . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 596
Learning Relationship between Authors’ Activity and Sentiments: A case study of online medical forums
Marina Sokolova and Victoria Bobicev . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 604
Translating from Original to Simplified Sentences using Moses: When does it Actually Work?
Sanja Štajner and Horacio Saggion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 611
Automatic Text Simplification for Spanish: Comparative Evaluation of Various Simplification Strategies
Sanja Štajner, Iacer Calixto and Horacio Saggion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 618
Towards Multilingual Event Extraction Evaluation: A Case Study for the Czech Language
Josef Steinberger and Hristo Tanev . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 627
A VSM-based Statistical Model for the Semantic Relation Interpretation of Noun-Modifier Pairs
Nitesh Surtani and Soma Paul . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 636
An LDA-based Topic Selection Approach to Language Model Adaptation for Handwritten Text Recogni-
tion
Jafar Tanha, Jesse de Does and Katrien Depuydt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 646
Training Automatic Transliteration Models on DBPedia Data
Velislava Todorova and Kiril Simov . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 654
Arabic-English Semantic Word Class Alignment to Improve Statistical Machine Translation
Ines Turki Khemakhem, Salma Jamoussi and Abdelmajid Ben Hamadou . . . . . . . . . . . . . . . . . . . . 663
Detection and Fine-Grained Classification of Cyberbullying Events
Cynthia Van Hee, Els Lefever, Ben Verhoeven, Julie Mennes, Bart Desmet, Guy De Pauw, WalterDaelemans and Veronique Hoste . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 672
A New Approach for Idiom Identification Using Meanings and the Web
Rakesh Verma and Vasanthi Vuppuluri . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 681
Learning the Impact and Behavior of Syntactic Structure: A Case Study in Semantic Textual Similarity
Ngoc Phuoc An Vo and Octavian Popescu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 688
xv
Six Good Predictors of Autistic Text Comprehension
Victoria Yaneva and Richard Evans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 697
User Name Disambiguation in Community Question Answering
Baoguo Yang and Suresh Manandhar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 707
Discovering Causal Relations in Textual Instructions
Kristina Yordanova . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 714
A Large Wordnet-based Sentiment Lexicon for Polish
Monika Zasko-Zielinska, Maciej Piasecki and Stan Szpakowicz . . . . . . . . . . . . . . . . . . . . . . . . . . . . 721
Named Entity Recognition of Persons’ Names in Arabic Tweets
Omnia Zayed and Samhaa El-Beltagy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 731
One Tree is not Enough: Cross-lingual Accumulative Structure Transfer for Semantic Indeterminacy
Patrick Ziering and Lonneke van der Plas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 739
Lost in Discussion? Tracking Opinion Groups in Complex Political Discussions by the Example of the
FOMC Meeting Transcriptions
Cäcilia Zirn, Robert Meusel and Heiner Stuckenschmidt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 747
xvi
Proceedings of Recent Advances in Natural Language Processing, pages 436–442,Hissar, Bulgaria, Sep 7–9 2015.
Semi-Supervised Never-Ending Learning in Rhetorical RelationIdentification
Erick G. Maziero1,2, Graeme Hirst1
1Department of Computer Science
University of Toronto
Toronto, ON, M5S 3G4, Canada
erick,[email protected]
Thiago A. S. Pardo2
2Department of Computer Science
University of Sao Paulo
Sao Carlos, SP, 13566-590, Brazil
Abstract
Some languages do not have enough la-
beled data to obtain good discourse pars-
ing, specially in the relation identification
step, and the additional use of unlabeled
data is a plausible solution. A workflow
is presented that uses a semi-supervised
learning approach. Instead of only a pre-
defined additional set of unlabeled data,
texts obtained from the web are continu-
ously added. This obtains near human per-
fomance (0.79) in intra sentential rhetori-
cal relation identification. An experiment
for English also shows improvement using
a similar workflow.
1 Introduction
A text is composed of coherent propositions
(phrases and sentences, for example) ordered and
connected according to the intentions of the au-
thor of the text. This composition may be recog-
nized and structured according to many theories
and this type of information is valuable to many
natural language processing applications. A pro-
cess to recognize, automatically, the coherent or
discursive (or also rhetorical) structure of a text is
named discourse parsing (DP).
The most prominent theory in Computational
Linguistics to structure the discourse of a text is
the Rhetorical Structure Theory (RST) proposed
by Mann and Thompson (1987). In this theory,
the text is segmented into elementary discourse
units (EDUs), which each contain a proposition
(basic idea) of the text. The theory proposes a
set of rhetorical relations that may hold between
Figure 1: An example of sentence-level struc-
ture according to RST. From Soricut and Marcu
(2003).
the EDUs, explicating the intentions of the author.
For example, consider the sentence in Figure 1.
It is segmented into three EDUs, numbered from
1 to 3. EDUs 2 and 3 are related by the relation
Enablement, forming a new span of text, which is
related to 1 by the relation Attribution. In each re-
lation, EDUs can be Nucleus (more essential) or
Satellite to the writer’s purpose.
Many approaches have been used in DP, the
majority of them using machine learning algo-
rithms, such as probabilistic models (Soricut and
Marcu, 2003), SVMs (Reitter, 2003; duVerle and
Prendinger, 2009; Hernault et al., 2010; Feng and
Hirst, 2012) and dynamic conditional random field
(Joty et al., 2012). To obtain acceptable results,
these approaches need plenty of labeled data. But
even more than other levels of linguistic informa-
tion, such as morphology or syntax, the annotation
of discourse is an expensive task. Given this fact,
what can we do when there is not enough data to
perform effective learning of DP, as in languages
with little annotated data?
This paper describes a methodology to over-
come the problem of insufficient labeled data in
the task of identifying rhetorical relations between
436
Figure 2: Lexicalized syntactic tree used by
SPADE. The circles indicate the node used as the
most indicative information to identify the rhetor-
ical relation and structure.
EDUs, which is the most important step during
DP. The language used in our work is Portuguese
and two well-known systems of DP for English
were adapted to this language. Portuguese is a lan-
guage with insufficient annotated data to obtain a
good discourse parser, but has all the tools to adapt
some English discourse parsers. A framework of
semi-supervised never-ending learning (SSNEL)
(see Section 2.2 below) was created and evaluated
with the adapted models. The results show that
this approach improved the results to achieve near-
human perfomance, even with the use of automatic
tools (syntax parser and discourse segmenter).
2 Related Work
2.1 Supervised Discourse Parsing
Soricut and Marcu (2003) use two probabilistic
models to perform a sentence-level analysis, one
for segmentation and other to identify the rela-
tions and build the rhetorical structure. The parser
is named SPADE (Sentence-level Parsing of Dis-
coursE) and the authors base their model on lex-
ical and syntactic information, extracting features
from a lexicalized syntactic tree. They assume that
the features extracted at the jointing point of two
discursive segments are the most indicative infor-
mation to identify the rhetorical structure of the
sentence. For example, in Figure 2, the circled
nodes correspond to the most indicative cues to
identify the structure and relation between each
two adjacent segments.
The authors report a F-measure of 0.49 in a
set of 18 RST relations. The human performance
in this same task is 0.77 (measured by inter-
annotation agreement). The authors, then, use the
probabilistic model with manual segmentation and
syntactic trees to see the impact of this information
in the parsing and the model achieves 0.75.
Hernault et al. (2010) use support vector ma-
chine (SVM) classifiers to perform DP. This dis-
course parser is named HILDA (HIgh-Level Dis-
course Analyser). This work used a set of 41
rhetorical relations and achieves a F-measure of
0.48 in the step of relation identification, both
intra-sentential and inter-sentential.
Feng and Hirst (2012) improve HILDA by
incorporating new proposed features and some
adapted from Lin et al. (2009). Another impor-
tant decision was the specification of features for
intra-sentential and inter-sentential relationships
and the use of contextual features in the building
of the rhetorical tree. Considering the approach
to intra-sentential relation identification, with 18
RST relations this work achieves a macro aver-
age F-measure of 0.49 and weighted average F-
measure of 0.77 in relation identification.
Joty et al. (2012) use a joint modelling ap-
proach to identify the structure and the relations at
the sentence-level using DCRFs (dynamic condi-
tional random fields) and a non-greedy bottom-up
method in the construction of the rhetorical struc-
ture. The features used in this work were similar to
those used by HILDA. They achieve a F-measure
of 0.77, using manual segmentation, and 0.65 us-
ing automatic segmentation.
Some languages, such as Portuguese, do not
have enough data to train a good DP and there is
no work treating this limitation in this language.
The first attempt to perform DP in Portuguese was
made by Pardo and Nunes (2006), who used an
approach based on lexical patterns extracted from
an RST-annotated corpus of academic texts to cre-
ate DiZer (Discourse analyZer). More than 740
lexical patterns were manually extracted from the
corpus. A lexical pattern is composed of the dis-
cursive markers, its position in the EDU, and cor-
responding nuclearity. The use of lexical patterns
is a unique approach for Portuguese, and achieves
a F-measure of 0.625 in relation detection when
evaluated in academic texts; in news texts, DiZer
achieves an F-measure of 0.405.
437
2.2 Semi-supervised Discourse Parsing
All the above cited approaches to DP use anno-
tated data to extract discursive knowledge and are
limited to the availability of this resource, which
is expensive to obtain. Specially, it is important to
note that, to obtain good performance in the task
more data is necessary. Semi-supervised learning
(SSL) is employed in scenarios in which there is
some labeled data and large availability of unla-
beled data, and manual annotation is an expensive
task (Zhu, 2008).
Related to the use of SSL in DP, Marcu and
Echihabi (2002) used naive Bayes to train binary
classifiers to distinguish between some types of
relations, as Elaboration vs. Cause-Explanation-
Evidence. For example, for this binary classifier,
applying SSL, the accuracy increased from ap-
proximately 0.6 to 0.95 after the use of millions
of new instances. Chiarcos (2012) used SSL to
develop a probabilistic model mapping the occur-
rence of discourse markers and verbs to rhetori-
cal relations. For Italian, Soria and Ferrari (1998)
conducted work in the same direction. Sporleder
and Lascarides (2005) performed similar work to
Marcu and Echihabi, with similar results for a dif-
ferent set of relations and a more sophisticated
classifier. Building on this, there is an interest-
ing idea, known as never-ending learning (NEL)
by Carlson et al. (2010), in which they apply SSL
with infinite unlabeled data. The needed data is
widely and freely available on the web. Their ar-
chitecture runs 24 hours per day, forever, obtaining
new information and performing a learning task.
With the aim of surpassing the limitation of la-
beled RST in Portuguese to develop a good DP,
we employ SSNEL in the task by adapting the
work of Soricut and Marcu (2003) and Hernault
et al. (2010). This choice for SSLNEL was made
considering the large and free availability of news
texts on the web.
3 RST Corpora
RST-DT (RST Discourse TreeBank) (Carlson
et al., 2001) is the most widely used corpus an-
notated with RST in English. Table 1 compares
it with available Portuguese corpora labeled ac-
cording to RST (these corpora will be referred to
as RST-DT-PT hereafter). The corpora CSTNews
(Cardoso et al., 2011), Summ-it (Collovini et al.,
2007) and two-thirds of Rhetalho (Pardo and Seno,
Corpus Language Documents Words
RST-DT-PT PT 340 120,847
CSTNews 140 47,240
Rhetalho 50 2,903
Summ-it 50 16,704
CorpusTCC 100 53,000
RST-DT EN 385 176,383
Table 1: Size of the RST-DT-PT and its compo-
nents, and of the RST-DT.
2005) are composed of news texts, and the cor-
pus CorpusTCC (Pardo and Nunes, 2004) and the
reminder of Rhetalho are composed of scientific
texts. The RST-DT contains more documents (45)
and many more words (55,536) than RST-DT-PT.
This work focuses on the identification of
rhetorical relations at the sentence level, and as
is common since the work of Soricut and Marcu
(2003), fine-grained relations were grouped: 29
sentence-level rhetorical relations were found and
grouped into 16 groups. The imbalance of the re-
lations is a natural characteristic in discourse and,
to avoid overfitting of a learning model on the less-
frequent relations, no balancing was made. The re-
lation Summary, for example, occurs only 2 times,
and Elaboration occurs 1491 times, making very
difficult the identification of the Summary relation.
4 Adapted Models
Syntactic information is crucial in SPADE (Sori-
cut and Marcu, 2003) and for Portuguese the
parser most similar to that used by Soricut and
Marcu is the LX-parser (Stanford parser trained to
Portuguese (Silva et al., 2010)). After the pars-
ing of the text by the syntactic parser, the same
lexicalization procedure (Magerman, 1995) was
applied and adapted according to the tagset used
by LX-parser. In this adaptation, only pairs of
adjacent segments at sentence-level were consid-
ered, and nuclearity was not considered, in or-
der to avoid sparseness in the data. Training the
adapted model (here called SPADE-PT) using the
RST-DT-PT achieved F-measure of 0.30. The pre-
cision was 0.69, but the recall was only 0.19.
The same features used by HILDA (Hernault
et al., 2010) were extracted from the pairs of ad-
jacent segments at sentence-level and many ma-
chine learning algorithms were tested, besides the
SVM, which was used in the original work. The
algorithm which obtained the best F-measure was
438
J48, an implementation of decision trees (Quinlan,
1993). The RST-DT-PT corpora was used and the
adaptation (here called of HILDA-PT) achieved
an F-measure of 0.548, which is much better than
that of SPADE-PT. A possible explanation is that
the feature set in SPADE is composed only of
syntactic tags and words. The resulting proba-
bilistic model is sparse and many equal instances
may indicate different relations (classes). How-
ever HILDA adds more features over which the
classifier can work better, even when some values
are absent.
Given the results of the adapted models,
HILDA-PT was chosen to be incorporated into the
SSNEL, explicated below.
5 Semi-supervised Never-ending
Learning Workflow
Here, an adaptation of Carlson et al. (2010) self-
training algorithm was used. Two different ap-
proaches to relation identification are used, that
is to say, a lexical pattern set LPS (the relation
identification module of DiZer), and a multi-class
classifier C generated according to some machine
learning algorithm. All the new instances obtained
from the lexical module are used together with the
more confident classifications of C to retrain this
last. For each classification, J48 returns a con-
fidence value used to choose the most confident
classifications.
Also, there is interest in observing the be-
haviour of the classifier in each iteration of
the semi-supervision, searching for the best
F-measure it may achieve. In this way, a workflow
of never-ending learning (NEL) was proposed
and is presented in Figure 3. Workflow 1 is
presented as an alternative visualization to the
illustration in Figure 3. Continuously, a crawler
gets pages from online news on the web and
performs cleaning to obtain the main text (Text).
In a first iteration, a Segmenter (Maziero et al.,
2007) is applied to obtain the EDUs in each
sentence and, for each pair of adjacent EDUs
(PairEDUs), the C1 classifier (C1 initially trained
with the LabeledData1 from the RST-DT-PT) and
the lexical pattern set LPS are used to identify
the relations between the segments. To retrain
C1, all the new instances from the lexical pattern
set LabeledDataLPS (as LPS does not provide
a confidence value, all the labelled instances are
Data: LabeledData1 and Text
train a classifier C1 using LabeledData1
while exist some Text doget one Text from NewsTexts
apply Segmenter on Text to obtain PairEDUs
Index← 1
forall the PairEDUs doapply LPS to obtain LabeledDataLPS
apply CIndex to obtain LabeledDataC
forall the LabeledDataC as newInstanceC do
if confidence of newInstanceC ≥ 0.7 thenLabeledDataCCon f ident←
newInstanceend
endLabeledDataIndex+1←
LabeledDataLPS+
LabeledDataCCon f ident
train a new classifier CIndex+1
using LabeledDataIndex+1
apply Monitor and obtain
FmCIndex+1
plot FmCIndex+1 in the graph G if
FmCindex+1 < FmCIndex thendiscard CIndex+1
CIndex+1←CIndex
end
end
end
Workflow 1: Workflow of the SSNEL using two
models to identify rhetorical relations between
each PairEDUs.
used in the semi-supervision) and the classifi-
cations LabeledDataC with confidence greater
than 0.7 by C1 are joined with LabeledData1
to obtain LabeledData2 (LabeledData2 =
LabeledDataLPS + LabeledDataC). After the
retraining, a Monitor verifies the new F-measure
of C2 (FmC2, obtained using 10-fold cross val-
idation) and, if it decreased compared with the
F-measure of C1 (FmC1), C2 is discarded and, for
the next iteration, C1 will continue to be used. If
FmC1 did not decrease, C2 will be used in the
next iteration. Monitor also plots a graph G to
present the behaviour of FmC during SSNEL.
This process continues iteratively.
It is important to note that, given the small size
of the training data, we opted to use 10-fold cross-
validation during the training and testing of the
439
Figure 3: SSNEL workflow.
classifiers, instead of separating the data into three
sets (training, development, and test). The total
number of instances was 6163 and some relations,
such as Restatement with 28 instances, would have
few relations when split into three sets.
During the semi-supervision of SPADE-PT, the
model of relation identification was incrementally
obtained at each iteration, since the addition of
a new instance only modifies the probabilities of
the instances already present in the model. If
the instance is new, it is added to the model and
the probabilities are adjusted. However, in the
semi-supervision of the HILDA-PT, the algorithm
J48 does not allow incremental learning. There
are some implementations of incremental decision
trees, but the resulting models are not as accurate
as J48 because they work with an incomplete set of
training instances. As we want the best F-measure
for relation identification, the algorithm J48 was
employed, even though it is not an incremental
learning.
Another important decision is to monitor the
concept-drift (CD) (Klinkenberg, 2004) during the
SSNEL, given that a concept may change over
time. In this work, CD refers to different sources
and topics to which the classifier is applied. To
treat CD, the algorithm may detect the evolution
of the concept and be able to modify the model to
accommodate the concept, avoiding the decrease
MethodF-measure
InstancesInitial Final
DiZer 0.22 - -
Elaboration Relation 0.26 - -
SPADE-PT 0.30 0.34 1,592
HILDA-PT 0.55 0.79 21,740
Table 2: Comparison of results considering the
two adapted models (SPADE-PT and HILDA-
PT) with two baselines (Elaboration Relation and
DiZer).
in the performance of the model being generated.
One technique to monitor the CD is statistical
process control (SPC) (Gama et al., 2004). This
technique constantly analyses the error during the
learning: if the F-measure drops, it may indicate
some changes in the concept and the model needs
to be modified. In the SSNEL workflow, this is
treated by the Monitor, which discards new in-
stances used to retrain the model if its F-measure
decreases, ensuring that the learned model always
acquires correct new learning.
6 Experiments
Considering Workflow 1, the two adapted models
were instantiated as C, and many iterations were
executed. After 1,640 iterations and the addition
of 1,592 new training instances, the F-measure of
SPADE-PT increased only 0.05. HILDA-PT, af-
ter 180 iterations and with the addition of 21,740
new instances, increased 0.24, achieving 0.79 us-
ing automatic segmentation. Table 2 presents a
summary of the results. As explained in Section
4, the features used by SPADE-PT lead to a sparse
model (when there is not enough initial data), and
this is the reason that, during 1,640 iterations, only
1,592 new instances were acquired, compared to
the number of iterations and new instances during
the experiment with HILDA-PT.
To evaluate the parsers, two baselines were con-
sidered. One of them (Elaboration Relation) is the
labeling of all the instances with the most frequent
relation in the corpus (Elaboration); the second is
the use of LPS (DiZer) applied to all PairEDUs
in RST-DT-PT. SPADE-PT, even after many iter-
ations in SSNEL, performed lower than the two
baselines. HILDA-PT, since even before the use
of SSNEL, performed better than the baselines.
The class composed of relations Interpretation,
Evaluation and Conclusion had 40 labeled exam-
440
ples, initially. After the iterations, its F-measure
increased from 0.054 to 0.916. Except for Com-
parison and Summary, all the other relations in-
creased their F-measures. This reinforces the re-
sults obtained by Marcu and Echihabi (2002),
which increased (the result of a binary classifier
to distinguish between two relations) from 0.6 to
0.95 after the use of millions of new instances. The
relation Summary, however, with only 2 labeled
instances, continued its zero F-measure.
SSNEL of HILDA-PT was executed for 23
days. Documents used had on average 28 sen-
tences and 749 words. The choice of only 10 doc-
uments per batch is to have a fine-grained control
over the new instances, given that if a new classi-
fier decreases the F-measure, it is discarded. Out
of 70 generated classifiers were discarded.
As the use of 10-fold cross-validation in the
SSNEL may lead to some overfitting on the data
which was already classified in the workflow, two
other SSNEL experiments were performed, for
English and Portuguese, with separated training
and test sets. These experiments had less time to
run, and, in order to determine whether the im-
provements during the SSNEL were statistically
significant, paired T-tests were employed to com-
pare initial classifier and the best classifier ob-
tained during iterations in the workflow. The test
shown improvements (at the level p < .1), even
though they are low for both experiments. Prob-
ably, with many more iterations the results would
be better. Table 3 shows the improvements in the
accuracy during the SSNEL, the number of itera-
tions, and the number of new instances incorpo-
rated in the training data. Although a direct com-
parison between the experiments is not fair, due to
different corpora, the improvements show that this
workflow is promising to increase the accuracy of
classifiers with unlabeled data.
The experiment with SSNEL for English was
realized in order to see the results that could be ob-
tained when large annotated corpora are available.
In the SSNEL for English, only decision-tree clas-
sifiers were used to classify new instances. For
Portuguese, a symbolic model (lexical patterns)
was also used together with the classifiers.
The improved results presented in Table 2 and
3 are very different due to differing evaluation
strategies. Using separated test data, we tried to
avoid possible overfitting on training data, but the
size of test data may not lead to a fair evaluation
ExperimentAccuracy
Instances IterationsInitial Final
Portuguese 0.531 0.556 1,247 200
English 0.635 0.645 565 25
Table 3: Results of SSNEL applied to Portuguese
and English languages using training and test sets.
of some relations with very few examples.
We do not compare our results to those of
Soricut and Marcu (2003) or Joty et al. (2012),
since HILDA-PT used different corpora (RST-DT-
PT instead of RST-DT), and some reported re-
sults are for the complete DP. However, our re-
sults show the potential of the SSNEL workflow
when not enough labeled data is available for su-
pervised learning, since the same approach for re-
lation identification of Hernault et al. (2010) was
used in HILDA-PT and 0.531 was initially ob-
tained. These results constitute the state of art
for rhetorical relation identification for Portuguese
and it is believed that with more time (iterations in
SSNEL), the results may increase.
7 Conclusion
Even though the results obtained in the SSNEL
were satisfactory, new features will be added to
the HILDA-PT, for example, types of discourse
signals, beyond the discourse markers (Taboada
and Das, 2013), and the use of semantic informa-
tion, as synonymity. Also, given that the number
of features will increase, feature selection may be
applied to select the most informative features in
each iteration of the SSNEL.
Since this work treats only rhetorical relations,
without nuclearity, a classifier of nuclearity was
trained (with the same features of HILDA-PT) and
obtained a F-score of 0.86. As done by Feng and
Hirst (2012), a better set of features will be se-
lected to identify relations between inter-sentential
spans. A procedure similar to tree building used
by Feng and Hirst (2012) will be employed in the
future DP.
Acknowledgments
This work was financially supported by
grant♯2014/11632-0, Sao Paulo Research
Foundation (FAPESP), the Natural Sciences
and Engineering Research Council of Canada and
by the University of Toronto.
441
References
Paula C.F. Cardoso, Erick G. Maziero, Maria L.C. Jorge,Eloize M.R. Seno, Ariani Di Felippo, Lucia H.M. Rino,Maria G.V. Nunes, and Thiago A.S. Pardo. 2011. CST-News: A discourse-annotated corpus for single and multi-document summarization of news texts in Brazilian Por-tuguese. In Proceedings of the 3rd RST Brazilian Meeting,pages 85–105. Cuiaba/Brazil.
Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Set-tles, Estevam R. Hruschka, and Tom M. Mitchell. 2010.Toward an architecture for never-ending language learn-ing. In Proceedings of Association for the Advancementof Artificial Intelligence, volume 5, pages 1306–1313.
Lynn Carlson, Daniel Marcu, and Mary E. Okurowski. 2001.Building a discourse-tagged corpus in the framework ofRhetorical Structure Theory. In Proceedings of Sec-ond SIGdial Workshop on Discourse and Dialogue, vol-ume 16, pages 1–10.
Christian Chiarcos. 2012. Towards the unsupervised acquisi-tion of discourse relations. In Proceedings of 50th AnnualMeeting of the Association for Computational Linguistics,pages 213–217.
Sandra Collovini, Thiago I. Carbonel, Jorge C.B. Coelho, Ju-liana T. Fuchs, and Renata Vieira. 2007. Summ-it: umcorpus anotado com informacoes discursivas visando asumarizacao automatica. Congresso Nacional da SBC,pages 1605–1614.
David A. duVerle and Helmut Prendinger. 2009. A noveldiscourse parser based on support vector machine classi-fication. In Proceedings of Joint Conference of the 47thAnnual Meeting of the ACL and the 4th InternationalJoint Conference on Natural Language Processing of theAFNLP, volume 2, pages 665–673.
Vanessa W. Feng and Graeme Hirst. 2012. Text-level dis-course parsing with rich linguistic features. In Proceed-ings of 50th Annual Meeting of the Association for Com-putational Linguistics, volume 1, pages 60–68.
Joao Gama, Pedro Medas, Gladys Castillo, and Pedro Ro-drigues. 2004. Learning with drift detection. In Proceed-ings of 17th Brazilian symp. on Artif. Intell. SBIA, pages286–295.
Hugo Hernault, Helmut Prendinger, David A. duVerle, andMitsuru Ishizuka. 2010. HILDA: A discourse parser us-ing support vector machine classification. Dialogue andDiscourse, 1(3):1–33.
Shafiq Joty, Giuseppe Carenini, and Raymon T. Ng. 2012.A novel discriminative framework for sentence-level dis-course analysis. In Proceedings of the 2012 Joint Con-ference on Empirical Methods in Natural Language Pro-cessing and Computational Natural Language Learning,EMNLP-CoNLL ’12, pages 904–915. Association forComputational Linguistics, Stroudsburg, PA, USA.
Ralf Klinkenberg. 2004. Learning drifting concepts: Ex-ample selection vs. example weighting. Intelligent DataAnalysis, 8(3):281–300.
Ziheng Lin, Min-Yen Kan, and Hwee Tou Ng. 2009. Recog-nizing implicit discourse relations in the Penn DiscourseTreebank. In Proceedings of 2009 Conference on Empir-ical Methods in Natural Language Processing, volume 1,pages 343–351.
David M. Magerman. 1995. Statistical decision-tree mod-els for parsing. In Proceedings of Association for Com-putational Llinguistics 1995, pages 276–283. Cambridge,Massachusetts.
William C. Mann and Sandra A. Thompson. 1987. Rhetor-ical Structure Theory: Toward a functional theory of textorganization. Text, 8(3):243–281.
Daniel Marcu and Abdessamad Echihabi. 2002. An unsu-pervised approach to recognizing discourse relations. InProceedings of 40th Annual Meeting of the Associationfor Computational Linguistics, pages 368–375.
Erick G. Maziero, Thiago A.S. Pardo, and Maria G.V. Nunes.2007. Identificacao automatica de segmentos discursivos:o uso do parser Palavras. Technical Report 305, Universityof Sao Paulo.
Thiago A.S. Pardo and Maria G.V. Nunes. 2004. Relacoesretoricas e seus marcadores superficiais: Analise de umcorpus de textos cientıficos em Portugues do Brasil. Tech-nical Report 231, University of Sao Paulo.
Thiago A.S. Pardo and Maria G.V. Nunes. 2006. Reviewand evaluation of DiZer: An automatic discourse analyzerfor Brazilian Portuguese. In Proceedings of 7th Workshopon Computational Processing of Written and Spoken Por-tuguese - PROPOR (Lecture Notes in Computer Science3960), pages 180–189.
Thiago A.S. Pardo and Eloize R.M. Seno. 2005. Rhetalho:um corpus de referencia anotado retoricamente. In Pro-ceedings of V Encontro de Corpora.
J. Ross Quinlan. 1993. C4.5: Programs for Machine Learn-ing. Morgan Kaufmann Publishers Inc., San Francisco,CA, USA.
David Reitter. 2003. Simple signals for complex rhetorics:On rhetorical analysis with rich-feature support vectormodels.
Joao Silva, Antonio Branco, Sergio Castro, and Reis Reis.2010. Out-of-the-box robust parsing of portuguese. InProceedings of 9th International Conference on the Com-putational Processing of Portuguese, PROPOR’10, pages75–85. Springer-Verlag, Berlin, Heidelberg.
Claudia Soria and Giacomo Ferrari. 1998. Lexical markingof discourse relations - some experimental findings. InProceedings of ACL-98 Workshop on Discourse Relationsand Discourse Markers, pages 36–42.
Radu Soricut and Daniel Marcu. 2003. Sentence level dis-course parsing using syntactic and lexical information.In Proceedings of the 2003 Conference of the NorthAmerican Chapter of the Association for ComputationalLinguistics on Human Language Technology, volume 1,pages 149–156.
Caroline Sporleder and Alex Lascarides. 2005. Exploitinglinguistic cues to classify rhetorical relations. In Proceed-ings of Recent Advances in Natural Language Processing(RANLP-05), pages 157–166. Bulgaria.
Maite Taboada and Debopam Das. 2013. Annotation uponannotation: Adding signalling information to a corpus ofdiscourse relations. Dialogue and Discourse, 4(2):249–281.
Xiaojin Zhu. 2008. Semi-supervised learning literature sur-vey. Technical Report 1530, University of Wisconsin-Madison.
442