
FP7-ICT-2011.4.2 Contract no.: 288342

www.xlike.org XLike Deliverable D2.3.1


Deliverable D2.3.1

Informal Language Analysis Report and Prototype

Editor: Ariadna Quattoni, UPC

Author(s): Ariadna Quattoni (UPC), Xavier Carreras (UPC), Lluís Padró (UPC)

Deliverable Nature: Prototype (P)

Dissemination Level: (Confidentiality)1

Public (PU)

Contractual Delivery Date: M12

Actual Delivery Date: M12

Suggested Readers: All partners of the XLike project consortium and end-users

Version: 2.0

Keywords: Linguistic analysis, natural language processing, domain adaptation, informal languages, Google Web TreeBank

1 Please indicate the dissemination level using one of the following codes: • PU = Public • PP = Restricted to other programme participants (including the Commission Services) • RE = Restricted to a group specified by the consortium (including the Commission Services) • CO = Confidential, only for members of the consortium (including the Commission Services) • Restreint UE = Classified with the classification level "Restreint UE" according to Commission Decision 2001/844 and amendments • Confidentiel UE = Classified with the mention of the classification level "Confidentiel UE" according to Commission Decision 2001/844 and amendments • Secret UE = Classified with the mention of the classification level "Secret UE" according to Commission Decision 2001/844 and amendments


Disclaimer

This document contains material, which is the copyright of certain XLike consortium parties, and may not be reproduced or copied without permission.

All XLike consortium parties have agreed to full publication of this document.

The commercial use of any information contained in this document may require a license from the proprietor of that information.

Neither the XLike consortium as a whole, nor a certain party of the XLike consortium warrant that the information contained in this document is capable of use, or that use of the information is free from risk, and accept no liability for loss or damage suffered by any person using this information.

Full Project Title: Cross-lingual Knowledge Extraction

Short Project Title: XLike

Number and Title of Work package:

WP2 – Multilingual Linguistic Processing

Document Title: D2.3.1 – Informal Language Analysis Report and Prototype

Editor (Name, Affiliation) Ariadna Quattoni, UPC

Work package Leader (Name, affiliation)

Xavier Carreras, UPC

Estimation of PM spent on the deliverable:

23 PM

Copyright notice

2012-2014 Participants in project XLike


Executive Summary

Informal, unedited text is predominant in a wide range of web domains such as blogs, tweets and discussion forums. Processing this text with off-the-shelf NLP tools poses a great challenge, because most NLP tools were designed to work on formal text coming from highly edited domains such as newswire data. The differences between formal and informal text are multiple: lexical choice, choice of grammatical constructs, discourse structure, etc.

In the context of machine learning this problem is known as domain adaptation: there is a gap between the distribution of the data used for training models and the distribution of the test data. The first part of this deliverable provides a literature review of the domain adaptation approaches that have been developed for the task of parsing informal language.

In the second part, we present a set of experiments on performing PoS tagging and dependency parsing of informal documents. From these experiments we conclude that the lexical divergence between formal English and the language used in informal domains can have a great impact on dependency parsing performance.

We also observed a significant drop in PoS tagging accuracy for some informal domains, and that PoS tagging accuracy has a significant impact on dependency parsing accuracy. Therefore, one starting point for improving parsing accuracy in informal domains is to improve PoS tagging accuracy.


Table of Contents

Executive Summary
Table of Contents
List of Tables
Abbreviations
1 Introduction: NLP Processing of Informal Text
1.1 Differences in Lexical Choice
1.2 Poor Spelling and Grammar
1.3 Difference in Prevalent Syntactic Constructs
1.4 Difference in Discourse Structure
2 A Review of the State of the Art
2.1 Text Normalization Approach
2.2 Word Clustering Approach
2.3 Self-Training Approach
2.4 Sample Re-Weighting
3 Testing standard NLP tools on non-canonical text
3.1 Dataset: The Google Web Treebank
3.2 Experiments on Processing Non-canonical English
3.3 Discussion
4 Conclusions
References


List of Tables

Table 1: Accuracy results for PoS tagging
Table 2: Dependency parsing accuracies on several datasets using predicted PoS tags
Table 3: Dependency parsing accuracies on several datasets using correct PoS tags
Table 4: Statistics of unknown words for several datasets in English (en) and Spanish (es)


Abbreviations

NLP Natural Language Processing

SANCL First Workshop on Syntactic Analysis of Non-Canonical Language

XLike Cross-lingual Knowledge Extraction

WSJ Wall Street Journal portion of the Penn Treebank


1 Introduction: NLP Processing of Informal Text

It is well known that the performance of state-of-the-art natural language processing tools degrades significantly when they are run on informal language. By informal language we mean the language used in a large proportion of web domains such as blogs, consumer reviews and forum discussions. The main characteristic of these domains, as opposed to the standard newswire domains, is that non-professional writers generate the content and that there is almost no editorial supervision.

To give a concrete number, a recently organized challenge on constituency and dependency parsing of informal text showed that although the best parsers reach accuracies in the mid-90% range on newswire data, their performance on informal text is in the mid-80% range. The results of the challenge also showed that even for simpler tasks such as part-of-speech tagging, the performance on informal text drops significantly (by almost 10%) ([7]).

There are multiple reasons why informal text poses a challenge to standard natural language tools. The most significant one is the lack of supervised training data covering informal domains. Because of the high cost of building supervised training corpora for new domains, most NLP resources are trained on the annotated resources that are already available, and for most languages these cover only formal language.

In particular, for the English language most part-of-speech taggers and syntactic parsers were trained using the Wall Street Journal (WSJ) portion of the Penn Treebank ([22]). As a consequence, there is a mismatch between the data used for training the NLP tools and the data encountered in informal domains. More specifically, we can observe the following important difficulties:

1.1 Differences in Lexical Choice

The lexical choices that we are likely to find in user-generated content can vary greatly from those found in newswire data. In general, slang and technical jargon are much more predominant in informal discussions than in more formal, edited domains such as newswire.

1.2 Poor Spelling and Grammar

Because of the informal and unedited nature of user-generated content, spelling mistakes and ungrammatical constructions are much more common in these domains than in more formal domains, which are typically subject to editorial revision.

1.3 Difference in Prevalent Syntactic Constructs

The difference between formal and informal texts goes well beyond lexical choice; we can also observe a significant shift in the distribution of syntactic constructs. For example, because of the conversational nature of user-generated content, as opposed to newswire data, we are much more likely to find a wide range of questions and imperatives that are rather uncommon in formal discourse.


1.4 Difference in Discourse Structure

The most significant difference in discourse structure is the prevalence in informal texts of sentence fragments (i.e. incomplete sentences), which are almost nonexistent in formal text.

As we can see, most of the challenges of parsing informal text come from a mismatch between the training distribution (of both lexical items and syntactic constructs), i.e. samples of formal language, and the test distribution, i.e. samples of informal language. In the context of machine learning, the problem of learning when there is a divergence between the training and test distributions is usually referred to as domain adaptation. The training distribution is usually referred to as the source domain and the test distribution as the target domain.

In the formal description of the domain adaptation problem, we assume that we are given training data for a source domain and that we want to use it to train models for a target domain for which we have no supervised training data. If no information about the target domain is provided there is not much that can be done; thus one usually assumes that some distributional information about the target domain is given, typically in the form of unlabeled samples, which are often easy to obtain in large quantities.

Several domain adaptation techniques have been proposed to solve the problem of parsing informal text. When viewed as a domain adaptation problem the source domain is formal language (usually newswire data) and the target domain is informal language (usually user-generated content such as blogs, emails, product reviews, discussion forums and tweets). In the next section we review the literature on developing NLP resources for informal languages.


2 A Review of the State of the Art

Recently the issue of parsing informal text has attracted considerable interest in the research community, as evidenced by the increasing number of publications on the subject and the organization of a challenge on parsing non-canonical English text [7].

Multiple approaches have been proposed to solve the problem, and most NLP tools for informal text implement a combination of ideas. The main approaches are based on: 1) Text Normalization, 2) Word Clustering, 3) Self-Training, and 4) Sample Re-Weighting. In the remainder of this section we give a brief overview of each of these ideas.

2.1 Text Normalization Approach

The idea behind the text normalization approach ([2], [3], [5], [16]) is to first correct or modify the informal text so that it is as close as possible to standard English or, put differently, as close as possible to the data encountered in the training corpus.

The degree to which the text is normalized ranges from simple token normalization to more complex corrections of the sentential syntactic structure. To a greater or lesser extent, all the proposed models for parsing informal text perform some form of text normalization as a pre-processing step.
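As an illustration, the sketch below shows what simple token-level normalization could look like. The replacement lexicon and the repetition-collapsing rule are purely illustrative assumptions and are not part of any specific system reviewed here.

import re

# Purely illustrative normalization lexicon; a real system would use a much
# larger resource induced from data.
NORMALIZATION_LEXICON = {
    "u": "you",
    "r": "are",
    "dont": "don't",
    "2morrow": "tomorrow",
}

def normalize_token(token):
    """Map an informal token to a more canonical form."""
    lowered = token.lower()
    if lowered in NORMALIZATION_LEXICON:
        return NORMALIZATION_LEXICON[lowered]
    # Collapse character repetitions such as "soooo" -> "soo".
    return re.sub(r"(.)\1{2,}", r"\1\1", token)

def normalize_sentence(tokens):
    return [normalize_token(t) for t in tokens]

print(normalize_sentence(["soooo", "u", "dont", "care", "?"]))
# ['soo', 'you', "don't", 'care', '?']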

2.2 Word Clustering Approach

The word-clustering approach attempts to address the divergence in lexical choice between the source (i.e. formal text) and target (i.e. informal text) domains. The main idea is to use unlabeled data from both the source and target domains to induce a word representation. The induced representation maps each unique token to either a discrete cluster or a real-valued vector. Then, in the supervised training stage, the induced word representations are used as features. The underlying assumption is that a model trained on the source domain using these features will generalize more easily to the target domain, since the features were induced using data from both domains. The key observation is that the learned clusters can reduce lexical data sparseness and help bridge the gap between the source and target domains, because some clusters will group together words from both domains.

Multiple algorithms have been proposed to induce word representations in the domain adaptation setting. To mention a few:

1) The Brown clustering algorithm was used by [16] and [18]. The Brown algorithm clusters words based on the n-grams in which they occur.

2) Hayashi et al. ([12]) used a modification of the original Brown clustering algorithm that takes into account the syntactic context (i.e. dependencies) in which a word occurs. That is, they obtain clusters by exploiting dependency n-gram information.

3) Another approach is to learn a real-valued vector embedding of words ([4], [6], [14], [19]); that is, each word is mapped to a point in a high-dimensional space. The algorithm that induces the embedding encourages words that appear in similar contexts to lie close to each other in this space.
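To make the feature-based use of induced representations concrete, here is a minimal sketch, under the assumption that a Brown-style clustering has already been run on unlabeled source and target text; the bit-string cluster IDs shown are hypothetical.

# Hypothetical Brown-style cluster assignments; in practice these would be
# induced from large amounts of unlabeled source- and target-domain text.
WORD_TO_CLUSTER = {
    "excellent": "01101100",
    "awesome":   "01101110",  # informal near-synonym, same coarse cluster
    "movie":     "10110010",
    "film":      "10110011",
}

def cluster_features(tokens, i):
    """Features for token i: word identity plus full and coarse cluster IDs."""
    word = tokens[i].lower()
    cluster = WORD_TO_CLUSTER.get(word, "UNK")
    return {
        "word=" + word: 1.0,
        "cluster=" + cluster: 1.0,
        # A prefix of the bit-string acts as a coarser cluster (Brown hierarchy).
        "cluster_prefix4=" + cluster[:4]: 1.0,
    }

print(cluster_features(["an", "awesome", "film"], 1))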


2.3 Self-Training Approach

Almost all proposed approaches to handle informal text utilize a technique called self-training ([9], [12], [16]). The standard self-training approach consists of a two-stage process:

1) A parser is trained on the labeled data from the source domain and run on a large number of unlabeled sentences from the target domain. The result of this process is a set of automatically-labeled sentences from the target domain.

2) The parser is retrained using as training data both the labeled data from the source domain and the labeled sentences from the target domain obtained from the previous stage.
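A schematic sketch of this two-stage process is given below; train_parser and parse are placeholders for whatever supervised parser is being adapted, not functions of the XLike prototype.

def self_train(source_labeled, target_unlabeled, train_parser, parse):
    """Two-stage self-training sketch.

    source_labeled:   list of (sentence, gold_tree) pairs from the source domain
    target_unlabeled: list of sentences from the target domain
    train_parser / parse: placeholders for an actual parser implementation
    """
    # Stage 1: train on the source domain and automatically label target data.
    source_model = train_parser(source_labeled)
    auto_labeled = [(sent, parse(source_model, sent)) for sent in target_unlabeled]

    # Stage 2: retrain on gold source data plus the auto-labeled target data.
    return train_parser(source_labeled + auto_labeled)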

One variant of the self-training approach ([9]) is to run several parsers and only use the labeled target sentences where the outputs of these parsers agree.

Another variant of the self-training approach ([8]) handles settings where there are multiple target domains rather than a single one. To do this, the first stage of self-training is modified so that the output consists of multiple training sets, i.e. one set of labeled sentences for each target domain. Then, in the second stage, a different parser is trained for each target domain. At testing time one might not know the domain of a sentence; to solve this problem the authors of [8] propose training a domain classifier that predicts the domain of a sentence.

2.4 Sample Re-Weighting

Another approach to address the divergence between source and target distributions is sample re-weighting ([15]). The idea is to assign weights to the source-domain training samples so that the empirical feature expectations obtained from the weighted samples resemble those of the target distribution. In order to do this it is necessary to have an estimate of both the source and target distributions; these estimates can be approximated using unlabeled data from both domains. Multiple approximation algorithms have been proposed, such as an approach based on classifiers trained to discriminate between source and target sentences, or an approach based on latent semantic indexing.
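The following sketch illustrates the classifier-based variant just mentioned, assuming scikit-learn is available; it is only meant to show the shape of the computation, not the specific estimator used in [15].

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

def importance_weights(source_sentences, target_sentences):
    """Weight source sentences by how target-like a domain classifier finds them.

    Inputs are plain strings; scikit-learn is assumed as an external dependency.
    The weight is proportional to the estimated ratio p(target|x) / p(source|x).
    """
    texts = source_sentences + target_sentences
    domains = [0] * len(source_sentences) + [1] * len(target_sentences)

    vectorizer = CountVectorizer()
    features = vectorizer.fit_transform(texts)
    classifier = LogisticRegression(max_iter=1000).fit(features, domains)

    p_target = classifier.predict_proba(vectorizer.transform(source_sentences))[:, 1]
    return p_target / (1.0 - p_target + 1e-9)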


3 Testing standard NLP tools on non-canonical text

In this section we present a set of experiments on PoS tagging and syntactic dependency parsing of informal documents, and compare the performance to that obtained on standard text. We draw some conclusions about the differences between standard and informal text when it comes to applying NLP tools.

3.1 Dataset: The Google Web Treebank

The Google Web Treebank ([23], [24]) is the first annotated corpus of non-canonical English text. It provides large amounts of unlabeled data for a wide range of domains, as well as small amounts of labeled data for evaluation. The goal of this dataset is twofold:

1) To be used as an evaluation benchmark for measuring the performance of NLP tools on non-standard text. For this it provides small datasets for each domain (around 2,000-4,000 sentences per domain) annotated with syntactic parse trees in the style of OntoNotes 4.0.

2) To provide large amounts of unlabeled data on informal domains. This data can be used to adapt NLP tools that were trained on formal text.

The corpus covers five domains: Yahoo! Answers, Emails, Newsgroups, Local Business Reviews and Web-Blogs.

3.2 Experiments on Processing Non-canonical English

We have conducted experiments on two of the informal domains included in the Google Web Treebank: Emails and Web-Blogs. The goal of the experiments presented in this section is to test the performance of different NLP tools (trained on formal text) on non-canonical English text.

In the first experiment we compared the performance of the FreeLing ([20]) part-of-speech tagger on the standard benchmark of formal English text (the Wall Street Journal (WSJ) test set) and on the Web-Blogs and Emails annotated portions of the Google Web Treebank. We report accuracy for two tag-sets: the Penn Treebank set consisting of 42 part-of-speech tags, and the reduced universal tag-set consisting of 12 tags [25]. The results of this experiment are shown in Table 1. When compared to the performance on standard text, we observe no significant drop on the Web-Blogs dataset, but an almost 4% drop in performance on the Emails dataset.
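To clarify how accuracy under the two tag-sets relates, the sketch below maps fine-grained Penn Treebank tags to the universal tag-set of [25] and computes token-level accuracy; only a small illustrative subset of the mapping is shown, and the function is not the evaluation code actually used.

# A small, purely illustrative subset of the Penn-Treebank-to-universal mapping
# of [25]; the full mapping covers every fine-grained tag.
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBZ": "VERB",
    "JJ": "ADJ", "RB": "ADV", "DT": "DET", "IN": "ADP",
}

def tagging_accuracy(gold_tags, predicted_tags, universal=False):
    """Token-level accuracy, optionally after mapping to the universal tag-set."""
    if universal:
        gold_tags = [PTB_TO_UNIVERSAL.get(t, "X") for t in gold_tags]
        predicted_tags = [PTB_TO_UNIVERSAL.get(t, "X") for t in predicted_tags]
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)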

In the second experiment we tested the performance of several dependency parsers (trained on the standard partition of the WSJ) on the Web-Blogs and Emails datasets. In particular, we consider 5 dependency parsing models:

1) A parser that uses only the part of speech tags predicted by the FreeLing tagger as sentence features (Only PoS).

2) A parser that uses the part of speech tags predicted by the FreeLing tagger and lexical features for the top 100 most frequent words in the training portion.

3) A parser that uses part of speech tags predicted by the FreeLing tagger and lexical features for the top 1,000 most frequent words.

4) A parser that uses part of speech tags predicted by the FreeLing tagger and lexical features for the top 10,000 most frequent words.


5) A parser that uses part of speech tags predicted by the FreeLing tagger and lexical features for all words appearing in the training set.
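As an illustration of models 2 to 5 above, the sketch below shows one way such a vocabulary cut-off could be implemented: only the k most frequent training words receive a word-identity feature, while the remaining words back off to PoS features alone. The function names are hypothetical and do not correspond to the actual prototype code.

from collections import Counter

def build_lexicon(training_sentences, k):
    """Keep the k most frequent word forms observed in training."""
    counts = Counter(w.lower() for sent in training_sentences for w in sent)
    return {w for w, _ in counts.most_common(k)}

def lexicalized_features(word, pos_tag, lexicon):
    """PoS feature for every token; word-identity feature only for frequent words."""
    feats = {"pos=" + pos_tag: 1.0}
    if word.lower() in lexicon:
        feats["word=" + word.lower()] = 1.0
    return feats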

All experiments performed for this report were done using the XLike prototypes for linguistic analysis (see deliverable D2.2.1 for an explanation of how the methods for PoS tagging and dependency parsing work [1]). In particular, the PoS tagging models and dependency parsing model 5 correspond to the models in the XLike pipeline for processing standard English text; the only difference is that here we adapt the output of these processes so that the predicted PoS tags and syntactic labels match those of the Google Web Treebank. Parsing models 1-4 were developed for this experiment using the tools of the prototype to learn new models.

Table 2 shows the results of these experiments. The metrics are UAS and LAS (unlabeled and labeled attachment score), which measure the accuracy of unlabeled and labeled syntactic dependencies, respectively. We observe that for all datasets the best parser is the one that uses PoS tags and the top 10,000 words: 86.10% UAS for WSJ, 82.88% for Web-Blogs and 76.91% for Emails. However, the differences with respect to the parser using all words are tiny. When we compare the performance of the best parser on standard English (WSJ) with non-standard English (Web-Blogs and Emails), we observe a performance drop of 3.2% for Web-Blogs and 9.2% for Emails.
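For reference, the attachment scores can be computed as in the following sketch, where each token is represented by its gold or predicted (head index, dependency label) pair; this is a generic definition of UAS/LAS, not the evaluation script actually used for the tables.

def attachment_scores(gold_corpus, predicted_corpus):
    """UAS and LAS over a corpus.

    Each sentence is a list of (head_index, dependency_label) pairs, one per
    token, aligned between the gold and predicted analyses.
    """
    total = uas_correct = las_correct = 0
    for gold_sent, pred_sent in zip(gold_corpus, predicted_corpus):
        for (g_head, g_label), (p_head, p_label) in zip(gold_sent, pred_sent):
            total += 1
            if g_head == p_head:
                uas_correct += 1
                if g_label == p_label:
                    las_correct += 1
    return 100.0 * uas_correct / total, 100.0 * las_correct / total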

Another important observation is that parser performance on non-standard text degrades more for lexicalized models (i.e. models that use word features) than for non-lexicalized models. For example, for the Only-PoS model we observe a drop of around 2% for Web-Blogs and around 7% for Emails.

We also observe that in standard English we gain 4.65% in performance by including lexical features, while in non-standard English we gain 3.8% for Web-Blogs and only 2.3% for Emails. This result seems to support the hypothesis that differences in lexical choice among domains can have a negative effect on the performance of NLP parsers.

In the third set of experiments we investigate how much of the drop in parsing performance can be explained by the drop in PoS tagging accuracy. With this goal in mind, we repeated the same experiments as in Table 2 but using the true part-of-speech tags (provided as corpus annotation) instead of the PoS tags predicted by FreeLing. Table 3 shows the results of these experiments.

We observe that the difference in dependency parsing performance between standard and non-standard English is reduced. For the Web-Blogs data we observe a drop of 2.2% instead of the 3.2% drop observed when using predicted PoS tags. For the Emails data we observe a drop of 7.3% instead of the 9.2% observed when using predicted PoS tags. This seems to suggest that improving the accuracy of the PoS tagger on non-standard text can be an important step towards reducing the difference between parser performance on formal and informal domains.

Table 1: Accuracy results for PoS Tagging

Penn Treebank tags Universal tags

WSJ (standard) 94.64% 95.97%

Web-Blogs 94.48% 95.50%

Emails 90.93% 92.45%


Table 2: Dependency parsing accuracies on several datasets using predicted PoS tags.

WSJ (standard) Web-Blogs Emails

UAS LAS UAS LAS UAS LAS

Only PoS 81.45 76.89 79.07 73.61 74.65 68.05

PoS + 100 Words 84.36 80.96 81.69 77.86 75.90 70.49

PoS + 1,000 Words 85.77 82.74 82.46 79.01 76.68 71.48

PoS + 10,000 Words 86.10 83.05 82.88 79.56 76.91 71.74

PoS + All Words 86.02 83.05 82.67 79.34 76.82 71.92

Table 3: Dependency parsing accuracies on several datasets using correct PoS tags.

WSJ (standard) Web-Blogs Emails

UAS LAS UAS LAS UAS LAS

Only PoS 86.26 83.58 85.09 81.39 80.92 77.12

PoS + 100 Words 89.06 87.40 87.24 85.17 82.20 79.41

PoS + 1,000 Words 90.25 88.92 88.12 86.48 82.82 80.29

PoS + 10,000 Words 90.46 89.19 88.19 86.63 83.17 80.80

PoS + All Words 90.51 89.24 88.28 86.67 82.57 80.59

3.3 Discussion

One of our hypotheses was that differences in lexical choice could explain a significant part of the drop in performance of dependency parsers when run on informal text. If this hypothesis were true, we would expect to see a much larger difference in lexical choice in the Emails domain than in the Web-Blogs domain, since the drop in performance on Emails is much more significant than on Web-Blogs.

To approximate the divergence in lexical choice, we computed the percentage of test words that are unknown to the FreeLing tagger. Since the FreeLing tagger was trained on standard English text, we believe this is a good approximation of the lexical divergence between formal and informal domains. Table 4 shows the computed statistics.
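A minimal sketch of this computation is shown below; it assumes access to the set of word forms known to the tagger, which is an assumption about the interface, since the way the lexicon is queried from FreeLing is not specified here.

def unknown_word_rate(tokens, known_words):
    """Percentage of tokens whose word form is not in the tagger's lexicon.

    known_words: set of word forms known to the tagger (how this set is
    extracted from FreeLing is not shown here).
    """
    unknown = sum(1 for t in tokens if t.lower() not in known_words)
    return 100.0 * unknown / len(tokens)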

As we predicted, we observe that the percentage of unknown words is significantly higher for the Emails dataset (i.e. 2.97%) than for the Web-Blogs dataset (i.e. 1.31%).

Given that lexical divergence seems to be a good predictor of parser performance, we also computed this metric for other informal English and Spanish datasets. More specifically, for English we computed the percentage of unknown words over a large dataset of Twitter messages; the 10.93% divergence suggests that parsing this data will be a great challenge.

For Spanish, we computed the percentage of words unknown to the off-the-shelf Spanish FreeLing tagger. We considered a corpus of standard Spanish ([21]) and a corpus of Spanish tweets. Here the lexical divergence for non-standard Spanish is even more severe: 48.97%.


Table 4: Statistics of unknown words for several datasets in English (en) and Spanish (es).

sentences tokens unknown tokens (%) num/punct tokens (%)

WSJ (en) 1,336 30,178 0.77% 20.85%

Web-Blogs (en) 1,016 22,971 1.31% 16.31%

Emails (en) 2,450 27,739 2.97% 18.47%

Twitter (en) 295,057 3,832,108 10.93% 24.69%

ANCORA (es) 1,875 51,436 0.92% 16.66%

Twitter (es) 177,989 2,913,606 48.97% 23.83%


4 Conclusions

Informal, unedited text is predominant in a wide range of web domains such as blogs, tweets and discussion forums. Processing this text with off-the-shelf NLP tools poses a great challenge, because most of these tools were designed to work on formal text coming from highly edited domains such as newswire data. The differences between formal and informal text are multiple: lexical choice, choice of grammatical constructs, discourse structure, etc.

In the context of machine learning this problem is known as domain adaptation: there is a gap between the distribution of the data used for training models and the distribution of the test data. The first part of this deliverable provided a literature review of the domain adaptation approaches that have been developed for the task of parsing informal language.

In the second part, we presented a set of experiments on performing PoS tagging and dependency parsing of informal documents. From these experiments we conclude that the lexical divergence between formal English and the language used in informal domains can have a great impact on dependency parsing performance.

We also observed a significant drop in PoS tagging accuracy for some informal domains, and that PoS tagging accuracy has a significant impact on dependency parsing accuracy. Therefore, one starting point for improving parsing accuracy in informal domains is to improve PoS tagging accuracy.

In the short-term future we plan to investigate:

1) Techniques for improving PoS tagging accuracy on informal domains. In particular, we want to investigate text normalization and word embedding approaches.

2) Techniques to directly improve dependency parsing accuracy. In particular, we want to investigate word embedding approaches, sample re-weighting and self-training.


References

[1] XLike deliverable “D2.2.1 – Early Deep Linguistic Processing Prototype”

[2] Dahlmeier D. “A Beam-Search Decoder for Grammatical Error Correction”, in EMNLP-CoNLL 2012.

[3] Islam A., Inkpen D. “Correcting Different Types of Errors in Text”, In CAI 2011.

[4] Blitzer J., McDonald R., Pereira F. “Domain Adaptation with Structural Correspondence Learning”, in EMNLP 2006.

[5] Gimpel K., Schneider N., O'Connor B., Das D., Mills D., Eisenstein J., Heilman M., Yogatama D., Flanigan J., Smith N., "Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments", in ACL 2011.

[6] Owoputi O., O'Connor B., Dyer C., Gimpel K., Schneider N., McDonald R., Pereira F., "Part-of-Speech Tagging for Twitter: Word Clusters and Other Advances", Technical Report, 2012.

[7] Petrov S., McDonald R., “Overview of the 2012 Shared Task on Parsing the Web”, in SANCL 2012.

[8] Le Roux J., Foster J., Wagner J., Samad Zadeh Kaljahi R., Bryl A., "DCU-Paris13 Systems for the SANCL 2012 Shared Task", in SANCL 2012.

[9] Bohnet B., Farkas R., Cetinoglu O., “SANCL 2012 Shared Task: IMS System Description”, in SANCL 2012.

[10] McClosky D., Che W., Recasens M., Wang M., Socher R., Manning C.D., "Stanford's System for Parsing the English Web", in SANCL 2012.

[11] Dunlop A., Roark B., Wagner J., “Merging self-trained grammars for automatic domain adaptation”, in SANCL 2012.

[12] Hayashi K., Kondo S., Duh K., Matsumoto Y., “The NAIST Dependency Parser for SANCL 2012 Shared Task”, in SANCL 2012.

[13] Tang B., Jiang M., Xu H., "Vanderbilt's System for SANCL 2012 Shared Task", in SANCL 2012.

[14] Xiaoye W., Smith D., "Semi-Supervised Deterministic Shift-Reduce Parsing with Word Embeddings", in SANCL 2012.

[15] Sogaard A., Plank B., “Parsing the web as covariate shift”, in SANCL 2012.

[16] Seddah D., Sagot B., Candito M., "The Alpage Architecture at the SANCL 2012 Shared Task: Robust Pre-Processing and Lexical Bridging for User-Generated Content Parsing", in SANCL 2012.

[17] Zhang M., Che W., Liu Y., Li Z., Liu T., "HIT Dependency Parsing: Bootstrap Aggregating Heterogeneous Parsers", in SANCL 2012.

[18] Koo T., Carreras X., Collins M., “Simple Semi-Supervised Dependency Parsing”, in ACL 2008.

[19] Ando R., Zhang T., "A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data", in Journal of Machine Learning Research, 2005.

[20] Padró L., Stanilovsky E., "FreeLing 3.0: Towards Wider Multilinguality", in LREC 2012.

[21] Recasens, M., Martí M.A., “AnCora-CO: Coreferentially annotated corpora for Spanish and Catalan.” Language Resources and Evaluation 2012, Springer Science.

[22] Marcus M.P., Marcinkiewicz M.A., Santorini B., "Building a Large Annotated Corpus of English: The Penn Treebank", in Computational Linguistics, 1993.

[23] Bies A., Mott J., Warner C., Kulick S., “English Web Treebank”, in Linguistic Data Consortium, LDC2012T13, Philadelphia 2012.


[24] Petrov S., McDonald R., "First SANCL Shared Task on Parsing English Web Text", https://sites.google.com/site/sancl2012/home/shared-task.

[25] Petrov S., Das D. and McDonald R., "A Universal Part-of-Speech Tagset", In Proceedings of LREC 2012.