The JDPA Sentiment Corpus for the Automotive Domain
Miriam Eckert, Lyndsie Clark, Nicolas Nicolov
J.D. Power and Associates
Jason S. Kessler
Indiana University
Overview
• 335 blog posts containing opinions about cars
– 223K tokens of blog data
• Goal of annotation project:
– Examples of how words interact to evaluate entities
– Annotations encode these interactions
• Entities are invoked physical objects and their properties
– Not just cars, car parts
– People, locations, organizations, times
Excerpt from the corpus
“last night was nice. sean bought me caribou and we went to my house to watch the baseball game …”
“… yesturday i helped me mom with brians house and then we went and looked at a kia spectra. it looked nice, but when we got up to it, i wasn't impressed ...”
Outline
• Motivating example
• Overview of annotation types
– Some statistics
• Potential uses of corpus
• Comparison to other resources
John recently purchased a Honda Civic. It had a great engine, a mildly disappointing stereo, and was very grippy. He also considered a BMW which, while highly priced, had a better stereo.
[Figure: the example sentences annotated in successive layers]
• Mention types: PERSON, CAR, CAR-PART, CAR-FEATURE
• Coreference: REFERS-TO
• Sentiment expression targets: TARGET
• Entity relations: PART-OF, FEATURE-OF
• Comparison: DIMENSION, MORE, LESS
• Entity-level sentiment: positive, mixed
Entity annotations
John recently purchased a Civic. It had a great engine and was priced well.
• John: PERSON
• Civic, It: CAR (linked by REFERS-TO)
• engine: CAR-PART
• priced: CAR-FEATURE
• >20 semantic types from
– ACE Entity Mention Detection Task
– Generic automotive types
Entity-relation annotations
• Relations between entities
• Entity-level sentiment annotations
• Sentiment flow between entities through relations
– My car has a great engine.
– Honda, known for its high standards, made my car.
[Figure: Civic (CAR); engine (CAR-PART) PART-OF Civic; priced (CAR-FEATURE) FEATURE-OF Civic; Entity-level sentiment: Positive]
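One way to picture how sentiment could flow through PART-OF and FEATURE-OF relations is the minimal sketch below. The class names, the function name, and the aggregation rule (gather polarities from an entity and from everything linked to it as a part or feature, then report positive, negative, mixed, or neutral) are all illustrative assumptions; the slides do not specify how entity-level sentiment was derived. The relation graph is assumed to be acyclic.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    semantic_type: str                               # e.g. "CAR", "CAR-PART"
    mentions: list = field(default_factory=list)     # surface strings that REFERS-TO this entity
    polarities: list = field(default_factory=list)   # polarities of expressions targeting it


@dataclass
class Relation:
    kind: str    # "PART-OF" or "FEATURE-OF"
    child: str   # key of the part or feature
    parent: str  # key of the entity it belongs to


def entity_level_sentiment(name, entities, relations):
    """Combine an entity's own polarities with those of its parts and features.

    Assumed rule, for illustration only: a single distinct label is returned
    as is, disagreeing labels give "mixed", and no labels give "neutral".
    """
    collected = list(entities[name].polarities)
    for rel in relations:
        if rel.parent == name:
            collected.append(entity_level_sentiment(rel.child, entities, relations))
    labels = {p for p in collected if p != "neutral"}
    if not labels:
        return "neutral"
    if len(labels) == 1:
        return labels.pop()
    return "mixed"


# "John recently purchased a Civic. It had a great engine and was priced well."
entities = {
    "civic": Entity("CAR", mentions=["Civic", "It"]),
    "engine": Entity("CAR-PART", mentions=["engine"], polarities=["positive"]),
    "price": Entity("CAR-FEATURE", mentions=["priced"], polarities=["positive"]),
}
relations = [
    Relation("PART-OF", "engine", "civic"),
    Relation("FEATURE-OF", "price", "civic"),
]
print(entity_level_sentiment("civic", entities, relations))  # positive
```

Under this assumed rule, adding a negatively evaluated part (for example a disappointing stereo) would turn the car's label into "mixed".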
Entity annotation type: statistics
• Inter-annotator agreement
– Among mentions: 83%
– Refers-to: 68%
• 61K mentions in corpus and 43K entities
• 103 documents annotated by around 3 annotators
A1: …Kia Rio…
A2: …Kia Rio…
MATCH
A1: …Kia Rio…
A2: …Kia Rio…
NOT A MATCH
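A rough sketch of how mention-level agreement between two annotators might be scored. It assumes that two mentions match only when their spans are identical, and it uses a symmetric overlap score (twice the number of shared spans divided by the total number of spans); both the matching criterion and the formula are assumptions, since the slides report the percentages without defining the metric. The function name and the character offsets are hypothetical.

```python
def span_agreement(spans_a1, spans_a2):
    """Symmetric agreement between two annotators over exact-match spans.

    Each span is a (start, end) offset pair. Returns a value in [0, 1].
    """
    s1, s2 = set(spans_a1), set(spans_a2)
    if not s1 and not s2:
        return 1.0  # nothing annotated by either annotator: trivially in agreement
    return 2 * len(s1 & s2) / (len(s1) + len(s2))


# Hypothetical offsets: the annotators agree on two mentions and disagree on
# the boundaries of a third (e.g. "Kia Rio" versus a longer span around it).
a1 = [(0, 7), (20, 27), (40, 44)]
a2 = [(0, 7), (16, 27), (40, 44)]
print(round(span_agreement(a1, a2), 2))  # 0.67
```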
Sentiment expressions
• Evaluations
• Target mentions
• Prior polarity:
– Semantic orientation given target
– positive, negative, neutral, mixed
• Examples:
– great engine (Prior polarity: positive)
– highly priced (Prior polarity: negative)
– … a highly spec’ed (Prior polarity: positive)
Sentiment expressions
• Occurrences in corpus: 10K
• 13% are multi-word
– like no other, get up and go
• 49% are headed by adjectives
• 22% nouns (damage, good amount)
• 20% verbs (likes, upset)
• 5% adverbs (highly)
Sentiment expressions
• 75% of sentiment expression occurrences have non-evaluative uses in corpus
• “light”
– …the car seemed too light to be safe…
– …vehicles in the light truck category…
• 77% of sentiment expression occurrences are positive
• Inter-annotator agreement:
– 75% spans, 66% targets, 95% prior polarity
Modifiers -> contextual polarity
• NEGATORS
– not a good car
– not a very good car
• INTENSIFIERS
– UPWARD: a very good car
– DOWNWARD: a kind of good car
• NEUTRALIZERS
– if the car is good
– I hope the car is good
• COMMITTERS
– UPWARD: I am sure the car is good
– DOWNWARD: I suspect the car is good
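The four modifier types can be read as operations that turn a prior polarity into a contextual one. The sketch below is one possible reading, not the corpus's scheme: it assumes negators flip polarity, intensifiers raise or lower strength, neutralizers remove the evaluation, and committers raise or lower the speaker's certainty. The `Evaluation` class, the integer scales, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluation:
    polarity: str        # "positive", "negative", or "neutral"
    strength: int = 0    # relative intensity; 0 means unmodified
    certainty: int = 0   # relative speaker commitment; 0 means unmodified


FLIP = {"positive": "negative", "negative": "positive", "neutral": "neutral"}
STEP = {"UPWARD": 1, "DOWNWARD": -1}


def apply_modifier(ev, kind, direction=None):
    """Apply one modifier to an evaluation (assumed semantics, see lead-in)."""
    if kind == "NEGATOR":
        return Evaluation(FLIP[ev.polarity], ev.strength, ev.certainty)
    if kind == "INTENSIFIER":
        return Evaluation(ev.polarity, ev.strength + STEP[direction], ev.certainty)
    if kind == "NEUTRALIZER":
        return Evaluation("neutral", ev.strength, ev.certainty)
    if kind == "COMMITTER":
        return Evaluation(ev.polarity, ev.strength, ev.certainty + STEP[direction])
    raise ValueError(f"unknown modifier type: {kind}")


def contextual_polarity(prior, modifiers):
    """Apply modifiers innermost first, starting from the prior polarity."""
    ev = Evaluation(prior)
    for kind, direction in modifiers:
        ev = apply_modifier(ev, kind, direction)
    return ev


# "not a very good car": "very" intensifies "good", then "not" negates the result.
print(contextual_polarity("positive", [("INTENSIFIER", "UPWARD"), ("NEGATOR", None)]))
# "I suspect the car is good": a downward committer lowers certainty only.
print(contextual_polarity("positive", [("COMMITTER", "DOWNWARD")]))
```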
Other annotations
• Speech events (not sourced from author)
– John thinks the car is good.
• Comparisons:
– Car X has a better engine than car Y.
– Handles a variety of cases
Possible tasks
• Detecting mentions, sentiment expressions, and modifiers
• Identifying targets of sentiment expressions, modifiers
• Coreference resolution
• Finding part-of, feature-of, etc. relations
• Identifying errors/inconsistencies in data
Possible tasks
• Exploring how elements interact:
– Some idiot thinks this is a good car.
• Evaluating unsupervised sentiment systems or those trained on other domains
• How do relations between entities transfer sentiment?
– The car’s paint job is flawless but the safety record is poor.
• Solution to one task may be useful in solving another.
But wait, there’s more!
• 180 digital camera blog posts were annotated
• Total of 223,001 + 108,593 = 331,594 tokens
Outline
• Motivating example
– Elements combine to render entity-level sentiment
• Overview of annotation types
– Some statistics
• Potential uses of corpus
• Comparison to other resources
Other resources
• MPQA Version 2.0
– Wiebe, Wilson and Cardie (2005)
– Largely professionally written news articles
– Subjective expression
• “beliefs, emotions, sentiments, speculations, etc.”
– Attitude, contextual sentiment on subjective expressions
– Target, source annotations
– 226K tokens (JDPA: 332K)
Other resources
• Data sets provided by Bing Liu (2004, 2008)
– Customer-written consumer electronics product reviews
– Contextual sentiment toward mention of product
– Comparison annotations
– 130K tokens (JDPA: 332K)
Thank you!
• Obtaining the corpus:
– Research and educational purposes
– [email protected]
– June 2010
– Annotation guidelines: http://www.cs.indiana.edu/~jaskessl
• Thanks to: Prof. Michael Gasser, Prof. James Martin, Prof. Martha Palmer, Prof. Michael Mozer, William Headden
Top 20 annotations by type
Inter-annotator agreement