EVALITA 2014 EVALUATION OF NLP AND SPEECH TOOLS FOR ITALIAN
SENTIPOLC SENTIment POLarity Classification
di.unito.it/sentipolc14
Valerio Basile, University of Groningen
Andrea Bolioli, CELI, Torino
Malvina Nissim, University of Groningen, University of Bologna
Viviana Patti, University of Torino, Dip. di Informatica
Paolo Rosso, Universitat Politècnica de València
EVALITA 2014 Workshop December 11 2014, Pisa
Task description
A new shared task in the Evalita evaluation campaign
• sentiment analysis at the message level on Italian tweets
• three independent sub-tasks:
  o Task 1 - Subjectivity Classification: a system must decide whether a given message is subjective or objective
  o Task 2 - Polarity Classification: a system must decide whether a given message is of positive, negative, neutral or mixed sentiment
  o Task 3 (Pilot) - Irony Detection: a system must decide whether a given message is ironic or not
(cf. SemEval 2013 Task 2; SemEval 2014 Task 9)
Development and Test Data
Collection
• 6,448 tweets (training set: 4,513; test set: 1,935), derived from two existing corpora:
  o SENTI-TUT (Bosco, Patti, Bolioli, 2013)
  o TWITA (Basile and Nissim, 2013)
• two main components:
  o political: extraction based on specific keywords and hashtags marking political topics (#grillo, Monti)
  o generic: random tweets on any topic
Development and Test Data
Data format
• Each tweet is presented as a sequence of comma-separated fields:
  id, subj, pos, neg, iro, topic, text
• Manual annotation: subj (subjectivity) / pos (positive polarity) / neg (negative polarity) / iro (ironic)
• Apart from the id, which is a string of numeric characters, and the text, which holds the tweet itself, all the other fields can take the value either “0” or “1”.
• For the four manually annotated classes:
  o 0 means that the feature is absent
  o 1 means that the feature is present
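The field layout above can be read with a short Python sketch; the helper name and the example record are hypothetical, only the field order and 0/1 encoding come from the slides:

```python
import csv
from io import StringIO

# Field order as given in the slides
FIELDS = ["id", "subj", "pos", "neg", "iro", "topic", "text"]

def parse_record(line):
    """Parse one comma-separated tweet record into a dict.

    The binary annotation fields are converted to int;
    id and text stay as strings.
    """
    # csv handles quoting, in case the text field contains commas
    values = next(csv.reader(StringIO(line)))
    record = dict(zip(FIELDS, values))
    for field in ("subj", "pos", "neg", "iro", "topic"):
        record[field] = int(record[field])
    return record

# Hypothetical record for illustration
rec = parse_record('123456789,1,0,1,1,1,"Botta di ottimismo a #lInfedele"')
```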
Development and Test Data
Data format
• Each tweet is presented as a sequence of comma-separated fields:
id, subj, pos, neg, iro, topic, text
Constraints in the annotation scheme:
• An objective tweet will not have any polarity nor irony
• A subjective tweet can exhibit positive and negative polarity at the same time (mixed!)
• A subjective tweet can exhibit no specific polarity and be just neutral, but with a clear subjective flavour
• An ironic tweet is always subjective and it must have one defined polarity
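These constraints can be encoded as a small validity check on the four annotation bits; `is_valid` is a hypothetical helper for illustration, not part of the official tooling:

```python
def is_valid(subj, pos, neg, iro):
    """Check a (subj, pos, neg, iro) tuple against the SENTIPOLC
    annotation constraints. All arguments are 0 or 1."""
    if subj == 0:
        # An objective tweet has no polarity and no irony
        return pos == 0 and neg == 0 and iro == 0
    if iro == 1:
        # An ironic tweet must have exactly one defined polarity
        return pos + neg == 1
    # A subjective, non-ironic tweet may be positive, negative,
    # mixed (both) or neutral (neither)
    return True

assert is_valid(1, 1, 1, 0)      # subjective, mixed sentiment
assert is_valid(1, 0, 0, 0)      # subjective but neutral
assert not is_valid(0, 1, 0, 0)  # objective tweets carry no polarity
assert not is_valid(1, 0, 0, 1)  # ironic needs one defined polarity
```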
Examples
l’articolo di Roberto Ciccarelli dal manifesto di oggi http://fb.me/1BQVy5WAk
(“Roberto Ciccarelli’s article from today’s il manifesto”)
• Objective tweet: …0, 0, 0, 0…  (id, subj, pos, neg, iro, topic, text)
  o subj = 0  o pos = 0  o neg = 0  o iro = 0
Examples
Dati negativi da Confindustria che spera nel nuovo governo Monti. Castiglione: “Avanti con le riforme” http://t.co/kIKnbFY7
(“Negative data from Confindustria, which hopes in the new Monti government. Castiglione: ‘Forward with the reforms’”)
• Subjective, mixed: …1, 1, 1, 0…  (id, subj, pos, neg, iro, topic, text)
  o subj = 1  o pos = 1  o neg = 1  o iro = 0
Examples
Botta di ottimismo a #lInfedele: Governo Monti, o la va o la spacca.
(“A burst of optimism on #lInfedele: Monti government, it’s make or break.”)
• Subjective, negative, ironic: …1, 0, 1, 1…  (id, subj, pos, neg, iro, topic, text)
  o subj = 1  o pos = 0  o neg = 1  o iro = 1
• Underlying assumptions on irony (subj, pos, neg, iro):
  o 1111: not allowed! (ironic with mixed polarity)
  o 1001: not allowed! (ironic with no defined polarity)
  o 0XX1: not allowed! (objective but ironic)
An ironic tweet is always subjective and it must have one defined polarity
Development and Test Data
Data format
• Each tweet is presented as a sequence of comma-separated fields:
  id, subj, pos, neg, iro, topic, text
• id: Twitter status id (necessary to retrieve the text)
• topic: 0 means “generic” and 1 means “political”
• text: this column is filled with the actual tweet’s text
  o Due to Twitter’s privacy policy, tweets cannot be distributed directly
  o Participants were provided with a web interface (RESTful Web API technology) through which they could download the tweet’s text on the fly (when still available) for all the ids provided
Twitter’s peculiar issue in the evaluation phase: same training/test data for all teams
Evaluation
• Evaluation set: tweets classified by all participating teams
  o a consequence of current Twitter policies (some tweets were no longer retrievable)!
  o no big differences with respect to the full test set
• Metrics: precision, recall and F-measure for each field/class
  o polarity classification: adapted in order to take into account the peculiarities of the annotation scheme (e.g. possible to have mixed sentiment)
  o details on the evaluation metrics applied to the participant results are in the organizers’ report
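As a rough illustration of the per-class metrics, here is the textbook precision/recall/F-measure computation; the official SENTIPOLC scorer adapts this to the mixed-sentiment scheme, so this sketch is only indicative:

```python
def precision_recall_f1(gold, pred, cls):
    """Plain per-class precision, recall and F1 over two parallel
    label lists. The SENTIPOLC scorer uses an adapted variant; this
    is the standard version for illustration only."""
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

# Toy example: two subjective tweets, one missed by the system
p, r, f = precision_recall_f1([1, 1, 0, 0], [1, 0, 0, 0], cls=1)
```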
Participants
• A total of 11 teams from 4 different countries participated in at least one of the three tasks
• SENTIPOLC was the Evalita task with the most participants, with a total of 35 submitted runs: great interest of the NLP community in sentiment analysis on Italian social media
  o Most of the submissions were constrained (training only on task data)
• Only academia (no industry)
Results – Task 1 subjectivity
• The highest F-score was achieved by uniba2930, at 0.7140 (constrained run)
  o All participating systems show an improvement over the baseline
• Baseline: majority-class baseline (for all tasks)
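A majority-class baseline of the kind used above can be sketched in a few lines; the function name is illustrative, not the organizers’ code:

```python
from collections import Counter

def majority_baseline(train_labels, test_size):
    """Predict the most frequent training label for every test item,
    i.e. the majority-class baseline used for all three tasks."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return [majority] * test_size

preds = majority_baseline([1, 1, 0, 1], test_size=3)
# preds == [1, 1, 1]
```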
Results – Task 2 polarity
• Again, the highest F-score was achieved by uniba2930, at 0.6771 (constrained)
  o the most popular subtask
  o all participating systems show an improvement over the baseline
Results – pilot Task 3 irony
• The highest F-score was achieved by UNITOR at 0.5959 (unconstrained run) and 0.5759 (constrained run)
  o some systems score very close to the baseline: high complexity of the task
Comparison, issues
• Comparison lines (details in the organizers’ report):
  o exploitation of further annotated Twitter data for training
  o classification framework (approaches, algorithms, features)
  o exploitation of available resources (e.g. sentiment lexicons, NLP tools, etc.)
  o interdependency of tasks in case of systems participating in several subtasks
• Issues
  o Irony and polarity reversal
  o Mixed sentiment is hard to recognise
What’s next • uniba2930: best system on tasks 1 & 2
Pierpaolo Basile and Nicole Novielli UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features
• UNITOR: best system on pilot task 3 Giuseppe Castellucci, Danilo Croce, Diego De Cao, Roberto Basili A Multiple Kernel Approach for Twitter Sentiment Analysis in Italian
Discussion!
17.45: Poster session
Proceedings on-line!
http://clic.humnet.unipi.it/proceedings/Proceedings-EVALITA-2014.pdf
Discussion
• Feedback from 2014 Sentipolc teams?
• Next edition? Ideas? Proposals?
  o data
    - Twitter data
    - Facebook data (conversational threads, friends network)?
    - format
  o tasks
    - aspect-based sentiment analysis (target)? emotions?
  o systems
    - Sentipolc systems available as services via API/download?
  o evaluation metrics