DATA SOCIETY © 2015 TM “If you can’t explain it simply, you don’t understand it well enough.” - Albert Einstein

Guest Talk for Data Society's "INTRO TO DATA SCIENCE BOOT CAMP"



Contrarian Question

con·trar·i·an /kənˈtre(ə)rēən, kän-/

noun (plural: contrarians)
1. a person who opposes or rejects popular opinion, especially in stock exchange dealing.

adjective
1. opposing or rejecting popular opinion; going against current practice. "the comment came more from a contrarian disposition than moral conviction"


Contrarian Question

Data has NO intrinsic value!!


Facts vs. Opinion(s)

Two main types of textual information: facts and opinions.

Search engines are optimized for facts; sentiment analysis is a growing (and not completely solved) attempt to optimize the discovery of opinions. Opinion mining, or sentiment analysis, is an attempt to recognize the opinion or sentiment that a person holds toward an object.


Information Asymmetry

Let's have a look at this*:
• 91 percent of people report having gone into a store because of an online experience.
• 89 percent of consumers conduct research using search engines.
• 78 percent of consumers say that posts made by companies on social media influence their purchases.
• 72 percent of consumers trust online reviews as much as personal recommendations.
• 62 percent of consumers end up making a purchase in a store after researching it online.

*Blogs, Comments (e.g. YouTube, Facebook), Reviews, Forums, Microblogging (e.g. Twitter)

http://www.investopedia.com/terms/a/asymmetricinformation.asp


Information Asymmetry

[Figure: a seller and a buyer, with labels "information" and "$" between them.]


Where do we find sentiment?

• Movies / Books: Are the reviews of this movie or book positive or negative?
• Product Sales: What do people think of the new iPhone?
• Public Sentiment: How do consumers feel about the economy? How is consumer sentiment affecting sales by sector?
• Politics: How are voters polarized, if at all, around a candidate or policy?
• Prediction: Stock Prices, Election Outcomes, Market Trends, Product Sales


Scherer's Typology of Emotions

Scherer's typology of emotions is briefly explained as follows:
• Emotion: This is a brief, organically synchronized evaluation of a major event (e.g. being fired or promoted at work), for example, being angry, sad, joyful, ashamed, proud, or elated.
• Mood: This is a diffuse, non-caused, low-intensity, long-duration change in subjective feeling, for example, being cheerful, gloomy, irritable, listless, depressed, or buoyant.
• Interpersonal stance: This is an affective stance towards another person in a specific interaction, for example, being friendly, flirtatious, distant, cold, warm, supportive, or contemptuous.
• Attitudes: These are enduring, affectively colored beliefs or dispositions towards objects or persons, for example, liking, loving, hating, valuing, or desiring.
• Personality traits: These are stable personality dispositions and typical behavior tendencies, for example, being nervous, anxious, reckless, morose, hostile, or jealous.


Measurement

Goal: the goal of all measurement is to arrange items on a continuum (observed or unobserved).


Sentiment Analysis: Ordered Sophistication Lexicon Based (Supervised)

1. Dictionary-Based Sentiment Analysis
   • i.e. Is an attitude toward an object positive or negative?
   • e.g. Jeffrey Breen's method, qdap

2. Supervised Learning for Sentiment Analysis
   • i.e. Given data we have seen in the past, can we predict class assignment for our polarity measure (positive/neutral/negative)?
   • e.g. Naive Bayes, MaxEnt, SVM

3. Unsupervised Sentiment Analysis
   • i.e. No dictionaries. No labeled data. No training algorithms. Instead, scale words (often bigrams) and users on a single dimension.
   • e.g. latent variable models - IRT
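To make the first, dictionary-based approach concrete, here is a minimal Python sketch of simple word-count scoring: count the positive matches, subtract the negative matches. The tiny word lists and the names `POSITIVE`, `NEGATIVE`, `dictionary_score`, and `label` are hypothetical and chosen only for illustration; a real analysis would load a full opinion lexicon, and this is not the R code from the talk.

```python
import re

# Hypothetical mini-lexicons for illustration only; a real analysis would
# load a full opinion lexicon with thousands of entries.
POSITIVE = {"strong", "robust", "grew", "improved", "gain"}
NEGATIVE = {"slower", "weak", "declined", "fell", "loss"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def dictionary_score(text):
    """Return (number of positive matches) minus (number of negative matches)."""
    tokens = tokenize(text)
    positive_hits = sum(token in POSITIVE for token in tokens)
    negative_hits = sum(token in NEGATIVE for token in tokens)
    return positive_hits - negative_hits


def label(score):
    """Map a raw score onto the positive/neutral/negative polarity classes."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sentence in (
        "Auto sales continued to be strong and increasingly robust.",
        "Chicago indicated slower growth in auto sales.",
    ):
        score = dictionary_score(sentence)
        print(score, label(score), sentence)
```

A score like this ignores negation ("not strong") and intensity ("very strong"), which is one reason the more elaborate approaches further down the list exist.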


Beige Book

• The Beige Book (http://www.federalreserve.gov/monetarypolicy/beigebook), more formally called the Summary of Commentary on Current Economic Conditions, is a report published by the United States Federal Reserve Board (FRB) eight times a year.
• The Beige Book has been in publication since 1985 and is now published online.
• Each of the Federal Reserve Bank districts (n=12) contributes to the report (e.g. the October 2013 Beige Book, below).
• The content is rather anecdotal. The report draws on interviews with key business contacts, economists, market experts, and others to get their opinions about the economy.
• The data used in this book can be found on GitHub (https://github.com/SocialMediaMininginR/beigebook), as well as the Python code for all the scraping and parsing.


Beige Book

“Consumer spending grew modestly in most Districts. Auto sales continued to be strong, particularly in the New York District where they were said to be increasingly robust. In contrast, Chicago, Kansas City, and Dallas indicated slower growth in auto sales in September.”


qdap

Determine polarity (there are a few ways to do this, none of which is perfect). qdap uses word clusters.

> library(qdap)
> # bb (Beige Book text with a location column) and POLKEY (the polarity frame) are assumed to exist already
> pol.bb <- polarity(bb$text, grouping.var = bb$location,
+     polarity.frame = POLKEY, constrain = TRUE,
+     negators = qdapDictionaries::negation.words,
+     amplifiers = qdapDictionaries::amplification.words,
+     deamplifiers = qdapDictionaries::deamplification.words,
+     question.weight = 0, amplifier.weight = .3,
+     n.before = 4, n.after = 2, rm.incomplete = FALSE, digits = 3)

x_i = polarized word
x_i^T = context cluster around x_i: x_{i-4} x_{i-3} x_{i-2} x_{i-1} x_i x_{i+1} x_{i+2} (n.before = 4, n.after = 2)
Words in the cluster are tagged as neutral (x_i^0), negator (x_i^N), amplifier (x_i^a), or de-amplifier (x_i^d).

Each polarized word (x_i) is then weighted (w) based on the weights from the polarity frame.
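The word-cluster idea can be illustrated with a small Python sketch. It is a deliberately simplified stand-in, not qdap's actual formula: for each polarized word it looks at the 4 words before and 2 words after, flips the sign when an odd number of negators is present, nudges the strength up or down for amplifiers and de-amplifiers, and scales the sum by the square root of the sentence length. The word lists, the weights, the scaling, and the function name `context_polarity` are all assumptions made for illustration.

```python
import math
import re

# Hypothetical word lists and weights, for illustration only.
POLARITY = {"strong": 1.0, "robust": 1.0, "slower": -1.0, "weak": -1.0}
NEGATORS = {"not", "no", "never"}
AMPLIFIERS = {"very", "increasingly", "particularly"}
DEAMPLIFIERS = {"slightly", "barely", "somewhat"}


def context_polarity(sentence, n_before=4, n_after=2, amplifier_weight=0.3):
    """Score one sentence using a context cluster around each polarized word."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    if not tokens:
        return 0.0

    total = 0.0
    for i, token in enumerate(tokens):
        if token not in POLARITY:
            continue
        # Context cluster: up to n_before words before and n_after words after.
        cluster = tokens[max(0, i - n_before):i] + tokens[i + 1:i + 1 + n_after]
        negators = sum(word in NEGATORS for word in cluster)
        amplifiers = sum(word in AMPLIFIERS for word in cluster)
        deamplifiers = sum(word in DEAMPLIFIERS for word in cluster)
        # Amplifiers raise the strength, de-amplifiers lower it (never below 0).
        strength = max(0.0, 1.0 + amplifier_weight * (amplifiers - deamplifiers))
        # An odd number of negators flips the sign of the polarized word.
        sign = -1.0 if negators % 2 else 1.0
        total += sign * strength * POLARITY[token]

    # Scale by sentence length so long sentences do not dominate.
    return total / math.sqrt(len(tokens))


if __name__ == "__main__":
    for sentence in (
        "sales were strong",
        "sales were not strong",
        "sales were very strong",
        "sales were slightly weak",
    ):
        print(round(context_polarity(sentence), 3), sentence)
```

The default window sizes and amplifier weight mirror the `n.before = 4`, `n.after = 2`, and `amplifier.weight = .3` arguments in the call above; everything else is a simplification.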


Summary

Beige Book (1996 - 2013) with recession bars (pink)


thank you

Blog: http://socialmediaminingr.com Twitter: @rheimann

http://datatactics.blogspot.com/2015/02/modern-approaches-to-sentiment-analysis.html