EVALITA 2014 EVALUATION OF NLP AND SPEECH TOOLS FOR ITALIAN
SENTIPOLC SENTIment POLarity Classification
di.unito.it/sentipolc14
Valerio Basile, University of Groningen
Andrea Bolioli, CELI, Torino
Malvina Nissim, University of Groningen, University of Bologna
Viviana Patti, University of Torino, Dip. di Informatica
Paolo Rosso, Universitat Politècnica de València
EVALITA 2014 Workshop December 11 2014, Pisa
Task description
A new shared task in the Evalita evaluation campaign
• sentiment analysis at the message level on Italian tweets
• three independent sub-tasks:
  o Task 1 - Subjectivity Classification: a system must decide whether a given message is subjective or objective
  o Task 2 - Polarity Classification: a system must decide whether a given message is of positive, negative, neutral or mixed sentiment
  o Task 3 (Pilot) - Irony Detection: a system must decide whether a given message is ironic or not
(cf. SemEval 2013 Task 2; SemEval 2014 Task 9)
Development and Test Data
Collection
• 6,448 tweets (training set: 4,513; test set: 1,935), derived from two existing corpora:
  o SENTI-TUT (Bosco, Patti, Bolioli, 2013)
  o TWITA (Basile and Nissim, 2013)
• two main components:
  o political: extraction based on specific keywords and hashtags marking political topics (#grillo, Monti)
  o generic: random tweets on any topic
Development and Test Data
Data format
• Each tweet is presented as a sequence of comma-separated fields:
  id, subj, pos, neg, iro, topic, text
• Manual annotation: subj (subjectivity) / pos (positive polarity) / neg (negative polarity) / iro (ironic)
• Apart from the id, which is a string of numeric characters, and the text, which holds the tweet itself, all the other fields can take the value either “0” or “1”.
• For the four manually annotated classes:
  o 0 means that the feature is absent
  o 1 means that the feature is present
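The field layout above can be read with a short Python sketch; the helper name and the example record are hypothetical, only the field order and 0/1 encoding come from the slides:

```python
import csv
from io import StringIO

# Field order as given in the slides
FIELDS = ["id", "subj", "pos", "neg", "iro", "topic", "text"]

def parse_record(line):
    """Parse one comma-separated tweet record into a dict.

    The binary annotation fields are converted to int;
    id and text stay as strings.
    """
    # csv handles quoting, in case the text field contains commas
    values = next(csv.reader(StringIO(line)))
    record = dict(zip(FIELDS, values))
    for field in ("subj", "pos", "neg", "iro", "topic"):
        record[field] = int(record[field])
    return record

# Hypothetical record for illustration
rec = parse_record('123456789,1,0,1,1,1,"Botta di ottimismo a #lInfedele"')
```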
Development and Test Data
Data format
• Each tweet is presented as a sequence of comma-separated fields:
id, subj, pos, neg, iro, topic, text
Constraints in the annotation scheme:
• An objective tweet will not have any polarity nor irony
• A subjective tweet can exhibit positive and negative polarity at the same time (mixed!)
• A subjective tweet can exhibit no specific polarity and be just neutral, but with a clear subjective flavour
• An ironic tweet is always subjective and it must have one defined polarity
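These constraints can be encoded as a small validity check on the four annotation bits; `is_valid` is a hypothetical helper for illustration, not part of the official tooling:

```python
def is_valid(subj, pos, neg, iro):
    """Check a (subj, pos, neg, iro) tuple against the SENTIPOLC
    annotation constraints. All arguments are 0 or 1."""
    if subj == 0:
        # An objective tweet has no polarity and no irony
        return pos == 0 and neg == 0 and iro == 0
    if iro == 1:
        # An ironic tweet must have exactly one defined polarity
        return pos + neg == 1
    # A subjective, non-ironic tweet may be positive, negative,
    # mixed (both) or neutral (neither)
    return True

assert is_valid(1, 1, 1, 0)      # subjective, mixed sentiment
assert is_valid(1, 0, 0, 0)      # subjective but neutral
assert not is_valid(0, 1, 0, 0)  # objective tweets carry no polarity
assert not is_valid(1, 0, 0, 1)  # ironic needs one defined polarity
```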
Examples
l’articolo di Roberto Ciccarelli dal manifesto di oggi http://fb.me/1BQVy5WAk
(“Roberto Ciccarelli’s article from today’s il manifesto”)
• Objective tweet: …0, 0, 0, 0…  (id, subj, pos, neg, iro, topic, text)
  o subj = 0  o pos = 0  o neg = 0  o iro = 0
Examples
Dati negativi da Confindustria che spera nel nuovo governo Monti. Castiglione: “Avanti con le riforme” http://t.co/kIKnbFY7
(“Negative data from Confindustria, which hopes in the new Monti government. Castiglione: ‘Forward with the reforms’”)
• Subjective, mixed: …1, 1, 1, 0…  (id, subj, pos, neg, iro, topic, text)
  o subj = 1  o pos = 1  o neg = 1  o iro = 0
Examples
Botta di ottimismo a #lInfedele: Governo Monti, o la va o la spacca.
(“A burst of optimism on #lInfedele: Monti government, it’s make or break.”)
• Subjective, negative, ironic: …1, 0, 1, 1…  (id, subj, pos, neg, iro, topic, text)
  o subj = 1  o pos = 0  o neg = 1  o iro = 1
• Underlying assumptions on irony (subj, pos, neg, iro):
  o 1111: not allowed! (ironic with mixed polarity)
  o 1001: not allowed! (ironic with no defined polarity)
  o 0XX1: not allowed! (objective but ironic)
An ironic tweet is always subjective and it must have one defined polarity
Development and Test Data
Data format
• Each tweet is presented as a sequence of comma-separated fields:
  id, subj, pos, neg, iro, topic, text
• id: Twitter status id (necessary to retrieve the text)
• topic: 0 means “generic” and 1 means “political”
• text: this column is filled with the actual tweet’s text
  o Due to Twitter’s privacy policy, tweets cannot be distributed directly
  o Participants were provided with a web interface (RESTful Web API technology) through which they could download the tweet’s text on the fly (when still available) for all the ids provided
Twitter’s peculiar issue in the evaluation phase: same training/test data for all teams
Evaluation
• Evaluation set: tweets classified by all participating teams
  o a consequence of current Twitter policies (some tweets were no longer retrievable)!
  o no big differences with respect to the full test set
• Metrics: precision, recall and F-measure for each field/class
  o polarity classification: adapted in order to take into account the peculiarities of the annotation scheme (e.g. possible to have mixed sentiment)
  o details on the evaluation metrics applied to the participant results are in the organizers’ report
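As a rough illustration of the per-class metrics, here is the textbook precision/recall/F-measure computation; the official SENTIPOLC scorer adapts this to the mixed-sentiment scheme, so this sketch is only indicative:

```python
def precision_recall_f1(gold, pred, cls):
    """Plain per-class precision, recall and F1 over two parallel
    label lists. The SENTIPOLC scorer uses an adapted variant; this
    is the standard version for illustration only."""
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

# Toy example: two subjective tweets, one missed by the system
p, r, f = precision_recall_f1([1, 1, 0, 0], [1, 0, 0, 0], cls=1)
```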
Participants
• A total of 11 teams from 4 different countries participated in at least one of the three tasks
• SENTIPOLC was the Evalita task with the most participants, with a total of 35 submitted runs: great interest of the NLP community in sentiment analysis on Italian social media
  o Most of the submissions were constrained (training only on task data)
• Only academia (no industry)
Results – Task 1 subjectivity
• The highest F-score was achieved by uniba2930, at 0.7140 (constrained run)
  o All participating systems show an improvement over the baseline
• Baseline: majority-class baseline (for all tasks)
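A majority-class baseline of the kind used above can be sketched in a few lines; the function name is illustrative, not the organizers’ code:

```python
from collections import Counter

def majority_baseline(train_labels, test_size):
    """Predict the most frequent training label for every test item,
    i.e. the majority-class baseline used for all three tasks."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return [majority] * test_size

preds = majority_baseline([1, 1, 0, 1], test_size=3)
# preds == [1, 1, 1]
```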
Results – Task 2 polarity
• Again, the highest F-score was achieved by uniba2930, at 0.6771 (constrained)
  o the most popular subtask
  o all participating systems show an improvement over the baseline
Results – pilot Task 3 irony
• The highest F-score was achieved by UNITOR at 0.5959 (unconstrained run) and 0.5759 (constrained run)
  o some systems score very close to the baseline: high complexity of the task
Comparison, issues
• Comparison lines (details in the organizers’ report):
  o exploitation of further annotated Twitter data for training
  o classification framework (approaches, algorithms, features)
  o exploitation of available resources (e.g. sentiment lexicons, NLP tools, etc.)
  o interdependency of tasks in case of systems participating in several subtasks
• Issues
  o Irony and polarity reversal
  o Mixed sentiment is hard to recognise
What’s next • uniba2930: best system on tasks 1 & 2
Pierpaolo Basile and Nicole Novielli UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features
• UNITOR: best system on pilot task 3 Giuseppe Castellucci, Danilo Croce, Diego De Cao, Roberto Basili A Multiple Kernel Approach for Twitter Sentiment Analysis in Italian
Discussion!
17.45: Poster session
Proceedings on-line!
http://clic.humnet.unipi.it/proceedings/Proceedings-EVALITA-2014.pdf
Discussion
• Feedback from 2014 Sentipolc teams?
• Next edition? Ideas? Proposals?
  o data
    - Twitter data
    - Facebook data (conversational threads, friends network)?
    - format
  o tasks
    - aspect-based sentiment analysis (target)? emotions?
  o systems
    - Sentipolc systems available as services via API/download?
  o evaluation metrics