
Sentiment Analysis of

Scientific Citations

Awais Athar

Natural Language and Information Processing Group, Computer Lab

Supervised by: Dr. Simone Teufel

Sentiment Analysis

Sentiment Analysis focuses on identifying positive and negative opinions, emotions, or expressions in a given text.

Subjectivity Analysis

Example: Movie Reviews

Can we do it automatically?

Simple Sentiment Analysis

This movie is absolutely HILARIOUS!!! I hated the Spice Girls before my friend made me watch this movie, and now I LOVE them! This movie is one of the funniest movies I've ever seen in my life, and I watch comedies all the time. This is definitely my new favorite movie.

Sentiment = sign(Number of positive words - Number of negative words) = sign(4 - 1) = sign(3) = +ve
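A minimal sketch of this counting scheme; the tiny word lists below are hypothetical stand-ins for a real polarity lexicon.

# Hypothetical mini-lexicons, not an actual sentiment lexicon.
POSITIVE = {"hilarious", "love", "funniest", "favorite"}
NEGATIVE = {"hated"}

def simple_sentiment(text):
    # Strip basic punctuation so "HILARIOUS!!!" matches "hilarious".
    tokens = [w.strip("!.,") for w in text.lower().split()]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "+ve" if score > 0 else "-ve" if score < 0 else "neutral"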

Does it always work?

I hate the Spice Girls. I hate how their music is so … I hate how they promote … And I hate how they're all over … Why I saw this movie is a really, really, really long story, but I did, and one would think I'd despise every minute of it. But... Okay, I'm really ashamed of it, but I enjoyed it. I mean, I admit it's a really awful movie, a wannabe … filled with excuses for them to act wacky as hell… the ninth floor of hell … a cheap ass cameo in a cheap ass movie. The plot is such a mess that it's terrible. But I loved it.


http://www.imdb.com/reviews/111/11181.html

I work on scientific text…

• Scientific papers cite other papers
• A citation is any mention of another document
• Used in citation indexes for search

What do researchers think about this paper?

[Figure: four example papers with counts (35, 237), (43, 151), (6, 75), (18, 163)]

Is citation count a good measure?

A citation sub-graph

[Graph over the papers J93-2003, P02-1039, W03-1002, N04-1021, N09-1025]

Colour the edges

[The same graph with its citation edges coloured]

After a top-sort

J93-2003 → P02-1039 → W03-1002 → N04-1021 → N09-1025

Why not reuse existing classifiers?

• Sentiment is often hidden

• Often neutral

While SCL has been successfully applied to POS tagging and Sentiment Analysis (Blitzer et al., 2006), its effectiveness for parsing was rather unexplored.

There are five different IBM translation models (Brown et al., 1993).

Scientific Text

• Negative polarity is often expressed in contrastive terms

• Variation in lexicon

This method was shown to outperform the class-based model proposed in (Brown et al., 1992) …

Similarity-based smoothing (Dagan, Lee, and Pereira 1999) provides an intuitively appealing approach to language modeling.

Scientific Text

• Technical terms play a major role

• Scope of influence of citations varies widely

Current state of the art machine translation systems (Och, 2003) use phrasal (n-gram) features . . .

As reported in Table 3, small increases in METEOR (Banerjee and Lavie, 2005), BLEU (Papineni et al., 2002) and NIST scores (Doddington, 2002) suggest that . . .

Applications

• Determining the quality of a paper for ranking in citation indexes by including negative citations in the weighting scheme

• Identifying the contributions of a piece of research work in the domain

• Identifying shortcomings and detecting problems in a particular approach

• Recognising unaddressed issues and possible gaps in current research approaches

• Identifying the personal bias of an author by observing their criticism trends

Task 1

Given a formal citation, predict its sentiment

Corpus for Citation Sentiment Analysis

• Manually annotated 8736 citations
• From 310 research papers
• ACL Anthology (Bird et al., 2008)

Distribution of Sentiment across Citations

Objective: 7541   Negative: 293   Positive: 902

Features

• Word Level: N-grams, Parts of Speech, Science Lexicon

• Contextual Polarity (Wilson et al. 2009): Subjectivity Clues, Negation Phrases, Valence Shifters

• Sentence Structure: Dependency Structures, Sentence Splitting, Negation

Word Level Features

• N-grams: “The results were good”
  – Unigrams: The, results, were, good
  – Bigrams: The results, results were, were good
  – Trigrams: The results were, results were good

• Parts of Speech: “This lead to good results”
  – DT VBP TO JJ NNS
  – This/DT lead/VBP to/TO good/JJ results/NNS
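A minimal sketch of these word-level features, assuming NLTK for tokenisation and POS tagging (the talk does not name a tagger):

# Word-level features: n-grams in pure Python, POS tags via NLTK
# (requires the punkt and averaged_perceptron_tagger data packages).
from nltk import pos_tag, word_tokenize

def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = word_tokenize("The results were good")
ngram_features = ngrams(tokens, 1) + ngrams(tokens, 2) + ngrams(tokens, 3)

# POS features in word/tag form, e.g. This/DT lead/VBP to/TO good/JJ results/NNS
pos_features = ["%s/%s" % (w, t)
                for w, t in pos_tag(word_tokenize("This lead to good results"))]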

Word Level Features

• Science-Specific Sentiment Lexicon
  – 83 manually extracted polar phrases
  – From 736 citations
  – Negative: complicated, daunting, deficiencies, degrade, difficult, inability, lack, poor, restrict, unexplored, worse
  – Positive: acceptance, accurately, adequately, aided, appealing, best-performing, better …

Contextual Polarity Features

• Adjectives
• Adverbs
• Subjectivity Clues
  – Strong / Weak
  – Positive / Negative
• Cardinal Numbers
• Modal Auxiliary Verbs (can, may, could, might, …)
• Negation Phrases (no, not, never, …)
• Polarity Shifters (so-called effort)

Sentence Structure Features

<CIT> showed that the results for French-English were competitive

[Dependency parse of the sentence above, with arcs: nsubj, ccomp, complm, det, prep, pobj, cop]

The relationship between results and competitive will be missed by trigrams, but the dependency representation captures it in the nsubj(competitive, results) feature.

Dependency Relations

Output from Stanford parser
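A sketch of extracting such triplet features, using spaCy as a stand-in for the Stanford parser used here (the two label inventories differ slightly):

# Dependency-triplet features, encoded as relation_head_dependent,
# e.g. nsubj_competitive_results for the example sentence above.
import spacy

nlp = spacy.load("en_core_web_sm")

def dep_features(sentence):
    return ["%s_%s_%s" % (t.dep_, t.head.text.lower(), t.text.lower())
            for t in nlp(sentence) if t.dep_ != "ROOT"]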

Sentence Structure Features

Sentence Trimming

Removing irrelevant polar phrases around a citation might improve results; a sketch of one possible trimming algorithm follows.
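The slide names the algorithm without showing it; one plausible reading, assuming a fixed token window around the citation marker:

# Hypothetical reading of the trimming step: keep only tokens within a
# fixed window of the <CIT> marker, dropping more distant polar phrases.
def trim_sentence(tokens, window=5):
    if "<CIT>" not in tokens:
        return tokens
    i = tokens.index("<CIT>")
    return tokens[max(0, i - window): i + window + 1]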

Sentence Structure Features

Negation

“Turney’s method did not work_neg well_neg although they reported 80% accuracy in <CIT>.”

All words inside a k-word window of any negation term are suffixed with the token _neg to distinguish them from their non-polar versions, as in the sketch below.
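A minimal sketch of this window-based negation marking (the negation list here is a hypothetical subset):

# Suffix every word within k positions of a negation term with "_neg".
NEGATION = {"no", "not", "never", "n't"}  # hypothetical subset

def mark_negation(tokens, k=3):
    out = list(tokens)
    for i, tok in enumerate(tokens):
        if tok.lower() in NEGATION:
            for j in range(max(0, i - k), min(len(tokens), i + k + 1)):
                if j != i and not out[j].endswith("_neg"):
                    out[j] += "_neg"
    return out

# mark_negation(["did", "not", "work", "well"])
# -> ["did_neg", "not", "work_neg", "well_neg"]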

Classifier

• Support Vector Machine
• 10-fold cross-validation

[Figure: maximum-margin hyperplane w·x + b = 0 with margins w·x + b = 1 and w·x + b = −1 separating the two classes of points]

Kernel Trick

φ(x) = (x₁², √2·x₁x₂, x₂²)
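This map satisfies ⟨φ(x), φ(y)⟩ = (x·y)², the degree-2 polynomial kernel, so the SVM never computes φ explicitly. A minimal training sketch with scikit-learn (an assumption; the talk does not name its toolkit), where load_corpus is a hypothetical loader for the annotated citations:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

texts, labels = load_corpus()  # hypothetical loader for the 8736 citations
clf = make_pipeline(CountVectorizer(ngram_range=(1, 3)),   # word 1-3-grams
                    SVC(kernel="poly", degree=2))          # quadratic kernel
scores = cross_val_score(clf, texts, labels, cv=10, scoring="f1_macro")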

Evaluation

Class sizes: Objective 7541, Negative 293, Positive 902 (total 8736)

F_micro = (7541/8736)·F_o + (293/8736)·F_n + (902/8736)·F_p ≈ 0.87·F_o + 0.03·F_n + 0.10·F_p

F_macro = (F_o + F_n + F_p) / 3
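In other words, the micro score here is a support-weighted average of the per-class F scores and the macro score an unweighted mean, as in this small sketch:

def f_micro(f, counts):
    # Support-weighted average of per-class F scores.
    total = sum(counts.values())
    return sum(f[c] * counts[c] / total for c in f)

def f_macro(f):
    # Unweighted mean over the three classes.
    return sum(f.values()) / len(f)

counts = {"o": 7541, "n": 293, "p": 902}  # objective, negative, positive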

Results

Challenges in Citation Sentiment Analysis

• Negative sentiment is ‘politically dangerous’ (Ziman, 1968)

• Personal biases are hedged (Hyland, 1995)

• Criticism is ‘sweetened’ (MacRoberts and MacRoberts, 1984; Hornsey et al., 2008)

“While SCL has been successfully applied to POS tagging and Sentiment Analysis (Blitzer et al., 2006), its effectiveness for parsing was rather unexplored.”

Problem: Context is Ignored

Problem: Informal Citations Are Ignored

Current work assumes that the sentiment present in the citation sentence represents the true sentiment.

Task 2

Given a sentence, predict whether or not it contains an informal citation

Corpus Construction

• Starting point: Athar's 2011 citation sentence corpus
• Select top 20 papers; treat all incoming citations to these
• 1,741 citations (from >850 papers)
• 4-class scheme:
  – objective/neutral
  – positive
  – negative
  – excluded

View of the Annotation Tool

Demo

Distribution of Classes

Features: Formal Citation

Features: Author’s Name

Features: Acronyms

Features: Work Nouns (Teufel, 2010)

Features: Pronoun

Features: Connectors

Features: Section Markers

Features: Citation Lists

Features: Lexical Hooks

Features: n-Grams

• Using n-grams as baseline
• SVM
• 10-fold cross-validation
• F-score

Results

Task 3 (redefinition of Task 1?)

Given a citation, predict its sentiment
(taking informal citations into account)

Impact on Sentiment Detection

• n-grams of length 1 to 3
• Dependency triplets (Athar, 2011): det_results_The, nsubj_good_results, cop_good_were

Annotation Unit is the Citation

• Problem: there may be more than one sentiment per citation
• Annotation unit = citation, so a projection is needed:
  – For the gold standard: assume the last sentiment is what is really meant
  – For automatic treatment: merge the citation context into one single sentence (see the sketch below)
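A minimal sketch of the two projections (hypothetical helper names):

def project_gold(sentiments):
    # Gold standard: take the last sentiment expressed for the citation.
    return sentiments[-1]

def merge_context(context_sentences):
    # Automatic treatment: concatenate the context into one "sentence".
    return " ".join(context_sentences)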

Results: Context Helps!

• SVM
• 10-fold cross-validation
• F-score

Back to the original question

[Figure repeated from earlier: four example papers with their counts]

Is citation count a good measure?

Referenced Papers and Citation Count

• Traditional measure: citation count

• Misses informal citations: 1 formal, 27 informal

• Most papers are cited out of ‘politeness, policy or piety’ (Ziman, 1968)

• Out of 2,300 citations, 80% were cited only to point towards further information (Spiegel-Rosing, 1977)

• Out of 623 references, only 9% were of essential importance to the citing paper (Hanney et al., 2005)

Task 4

Given a referenced paper, predict whether or not it is significant

Features

New Features

Results

Class-based Comparison

Conclusion

• New large citation sentiment corpus: more than 200,000 sentences

• Citation contexts carry subjective references; ignoring them would result in the loss of a lot of sentiment, especially criticism

• Citation sentiment detection covers all forms of citations, including indirect mentions and acronyms

• New task of detecting ‘in passing’ references

References

• A. Athar, “Detecting Sentiment in Scientific Citations”, PhD Thesis, Computer Lab, University of Cambridge. 2013 (expected)

• A. Athar and S. Teufel, “Detection of implicit citations for sentiment detection”, in Proc. of Workshop on Detecting Structure in Scholarly Discourse 2012, Jeju, Republic of Korea. 2012.

• A. Athar and S. Teufel, “Context-Enhanced Citation Sentiment Detection”, in Proc. of NAACL/HLT 2012, Montréal, Canada. 2012.

• A. Athar, “Sentiment Analysis of Citations using Sentence Structure-Based Features”, in Proc. of ACL 2011, Portland, Oregon, US. 2011.

Thank you!