Análise de sentimento durante a Copa usando Big Data (Sentiment Analysis during the World Cup Using Big Data)

Alan Braz – IBM Research – Brazil, @alanbraz

© 2014 IBM Corporation


DESCRIPTION

The social sentiment analysis technology developed by IBM Brazil analyzes what is being posted on social networks about any topic, company, or person, without the need for a hashtag. All public posts in Portuguese are captured by a high-tech IBM system with artificial intelligence, which is trained to learn to interpret whether the sentiment of each post is positive, neutral, or negative. This technology can analyze posts on many subjects and of many kinds, including slang, sarcasm, and colloquial language. This solution, nicknamed FAMA, was used during Brazil's matches in the 2013 FIFA Confederations Cup and evolved to run across all 64 matches of the 2014 FIFA World Cup. In this presentation I describe the motivation, technical details, and results of this undertaking, which brought together soccer, social networks, and technology! Read more at http://alanbraz.wordpress.com/2014/08/07/tdc2014/


Page 2

Alan Braz

IBM Research – Brazil, Research Software Engineer

- 2002–2005: UNICAMP – BSc in Computer Science
- Aug 2005–Nov 2005: IBM GBS – Java developer intern
- 2005–2007: IBM GBS – Java developer (WWER)
- 2007–2010: IBM GBS – Technical leader (eAC)
- 2009–2012: IBM GBS – Agile coach and instructor (GenO)
- 2009–today: Metrocamp – SE, RUP, Agile grad teacher
- 2010–2012: IBM GBS – Software Architect (Blue Community)
- 2009–2013: UNICAMP – MSc Agile Software Engineering
- Feb 2013–today: IBM Research Brazil as RSE

www.alanbraz.com.br

@alanbraz

Page 3

Innovation and Comfort

Trial-and-Error:
– start-ups

(Diagram labels: RADICAL INNOVATION, INNOVATION)

Science-Based:
– scientific method (empirical)
– logical deduction (mathematics)

Page 4

science-based innovation

Page 5

The World is our Lab: 12 Labs Worldwide in 10 Countries

Labs shown on the map: China, Watson, Almaden, Austin, Japan, Israel, Switzerland, India, Ireland, Australia, Africa

IBM Research worldwide has 1,600+ PhDs across a diversity of disciplines: Behavioral Science, Chemistry, Electrical Engineering, Computer Science, Materials Science, Mathematical Science, Physics, Services Science


Page 9

IBM Research - Brazil

Research areas:
- Smarter Natural Resources: natural resources modeling, analytics, and logistics.
- Systems of Engagement and Insights.
- Social Data Analytics: analytics and modeling of social and human data and applications.
- Smarter Devices: micro/nano-technologies aimed at addressing smarter planet challenges.

Locations: Rio de Janeiro and São Paulo.

A team of world-class researchers in close connection to the other 12 IBM Research labs and to the world's best scientific, academic, and development communities.

Page 10

System U: Modeling People from Social Media

Social behaviors, e.g., when tweeting

Five Factor Model: Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism

Ford's 12 “Universal Needs”: Structure, Challenge, Excitement, Liberty, Harmony, Closeness, Practicality, Self-expression, Curiosity, Ideals, Love, Stability

Five Values: Self-transcendence, Conservation, Self-enhancement, Hedonism, Openness-to-Change

Page 11

Project: Social Media Behavior Simulation

Goal: to create a tool for companies to explore the impact and result of social media actions through simulation.

Applications:
- exploration of effort size and impact of marketing campaigns;
- determination of counter-information measures in viral media outbreaks.

Maira Gatti, Ana Appel, Claudio Pinhanez, Rogério de Paula, Cicero dos Santos, Alexander Rademaker, Paulo Cavalin, Samuel Barbosa, Daniel Gribel

Simulation of Obama/Romney Twitter campaigns in the last month before the election (sample: Sept 22 to Oct 29, 2012):
- Obama's network: 23,856,961 followers; sample of 5.6M tweets, 24,526 active users, 3,594 followers
- Romney's network: 1,675,792 followers; sample of 5.1M tweets, 28,145 active users, 5,498 followers

Page 12

© 2013 IBM Corporation

Page 13

Video: Ei!

https://www.youtube.com/watch?v=b7IvNyLvizQ

Page 14

Ei! 194 Million Brazilians Helping their National Team’s Coach

An app made specifically for one person: Luiz Felipe Scolari, coach of the Brazilian national soccer team.

Ei! is an app that identifies, filters and analyzes all the Twitter comments that Brazilians have made during the games.

With the touch of a button, Scolari will know the country's consensus on:

At half time: which players the audience likes and dislikes, what changes should be made, which tactics should be explored, which player should be brought on…

After the game: his country's view of the team, the players, and his own performance as coach.

Page 15

Challenges
• Real-time issues
• Up to 5 million tweets per match
• Up to 20 thousand tweets per minute

• Texting vs. writing: casual language

• nao disse , Balotelli ia meter gol hoje , um golaço ainda , madero aquele negoo

• hora de colocar o Leandro né Felipão ? u.u

• vou ser repetitivo de novo , mas : na minha epoca de jovem torcedor da seleção brasileira , brasil nao tomava gol de p### de chile não viu

• jah to vendo o Brasil faze nois passa vergonha na copa ! ! ! pq meu g-zuis ...

• acho q o ronaldinho tem que ser totula

• Com todo o respeito , Luis Fabiano , popcorn men hahahahaha beijo para quem entendeu , pior piada ever ! Haha
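Casual spellings like the ones above (nao, mt, hj, pq, stretched letters, kkkk or hahaha laughter) are usually normalized before classification. The sketch below shows one simple way to do that in Python. The slang table is a hypothetical toy built only from these sample tweets, not FAMA's actual lexicon, and squeezing stretched letters down to two is a heuristic choice (it turns "burrro" into "burro" but "PAAAARTIU" into "paartiu").

```python
import re

# Hypothetical toy lexicon built only from the sample tweets on this slide;
# it is not FAMA's actual resource.
SLANG = {
    "nao": "não",
    "q": "que",
    "pq": "porque",
    "mt": "muito",
    "hj": "hoje",
    "ta": "está",
    "jah": "já",
    "faze": "fazer",
    "epoca": "época",
}

TOKEN = re.compile(r"\w+|[^\w\s]+")                     # words, or runs of punctuation
ELONGATION = re.compile(r"(\w)\1{2,}")                  # same letter 3+ times in a row
LAUGH = re.compile(r"^(?:(?:ha|he|rs){2,}h?|k{3,})$")   # haha, hehe, rsrs, kkkk


def normalize(tweet: str) -> list[str]:
    """Lowercase, tokenize, mark laughter, squeeze elongations, expand slang."""
    tokens = []
    for tok in TOKEN.findall(tweet.lower()):
        if LAUGH.match(tok):
            tokens.append("<laugh>")
            continue
        tok = ELONGATION.sub(r"\1\1", tok)  # keep two letters: "burrro" -> "burro"
        tokens.append(SLANG.get(tok, tok))
    return tokens


print(normalize("neymar ta jogando mt hj!!!"))
# ['neymar', 'está', 'jogando', 'muito', 'hoje', '!!!']
```

Punctuation runs such as "!!!" are kept as tokens on purpose, since repeated exclamation marks can themselves carry sentiment.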

Page 16

Social Sentiment Analysis is Difficult

(CHEvATM) Diego costa merece errar por ter escolhido outra seleçao pra jogar

(BRAvITA) Itália perdendo o segundo jogador lesionado com TRINTA minutos de jogo. Prandelli deve tá jogando o Football Manager 2013.

(BRAvITA) PAAAAAAAARTIU ASSISTIR JOGO DO Brazil!

(BRAvITA) Vacilo, Jô ia entrar e fazer mais um

(BRAvMEX) o que aconteceu com a seleção ? Pqp

(BRAvURU) no momento dançando show das poderosas de sutiã e short jeans

(RMAvATM) BALE AMOR FAÇA AQUELE LINDO GOL QUE PROMETEU PRA MIM ONTEM A NOITE

(BRAvMEX) Brazil vai ganhando do México, vingando-se das Olimpíadas, num jogo que vale tanto quanto troco em bala.

(SAOvCOR) o ganso so quer fazer jogada genial

(SAOvCOR) Com essa Fabulosa em campo o Sao Paulo sempre vai fazer gol contra o Corinthians, entenda tecnico retranqueiro do c#######

(SAOvCOR) Mano meu pai ganho 500 conto no jogo do bixo kkkk

Page 17

Ei! Social Sentiment Solution

Page 18

InfoSphere Streams: A Platform for Real-Time Analytics on Big Data

Key Big Data Challenge – Velocity
• Volume: terabytes per second, petabytes per day
• Variety: all kinds of data, all kinds of analytics
• Velocity: insights in microseconds

• Millions of events per second
• Microsecond latency
• Traditional / non-traditional data sources
• Real-time delivery
• Powerful analytics

Example applications: algorithmic trading, telco churn prediction, smart grid, cyber security, government / law enforcement, ICU monitoring, environment monitoring.

Page 19

http://www.ibm.com/developerworks/analytics/


Page 21

Streams Runtime Illustrated

(Figure: a Streams application graph with operators labeled Meters, Company Filter, Usage Model, Usage Contract, Temp Action, Text Extract, Degree History, Compare History, Store History, Season Adjust, and Daily Adjust, deployed across five x86 hosts.)

• Optimizing scheduler assigns PEs (processing elements) to hosts, and continually manages resource allocation
• Commodity hardware: laptop, blades, or high-performance clusters
• Dynamically add hosts and jobs
• New jobs work with existing jobs
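InfoSphere Streams applications are written in SPL and run as processing elements on hosts, so the following is not the Streams API. It is a plain-Python sketch, using generators, of the same dataflow shape: a filter operator feeding a tumbling one-minute window that counts posts, the kind of aggregate behind figures such as "up to 20 thousand tweets per minute". The `Post` type and the operator functions are hypothetical names chosen for illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class Post(NamedTuple):
    ts: float   # seconds since some reference time, e.g. kick-off
    lang: str   # language code, e.g. "pt"
    text: str


def language_filter(stream: Iterable[Post], lang: str = "pt") -> Iterator[Post]:
    """Filter operator: pass through only posts in the given language."""
    for post in stream:
        if post.lang == lang:
            yield post


def per_minute_counts(stream: Iterable[Post]) -> Iterator[tuple[int, int]]:
    """Tumbling 60-second window: emit (minute, count) each time a window closes.

    Assumes posts arrive in timestamp order; minutes with no posts are skipped.
    """
    current, count = None, 0
    for post in stream:
        minute = int(post.ts // 60)
        if current is not None and minute != current:
            yield current, count
            count = 0
        current = minute
        count += 1
    if current is not None:
        yield current, count


posts = [Post(0, "pt", "partiu"), Post(30, "en", "go"), Post(59, "pt", "gol"),
         Post(61, "pt", "golaço"), Post(185, "pt", "fim")]
counts = list(per_minute_counts(language_filter(posts)))
print(counts)                      # [(0, 2), (1, 1), (3, 1)]
print(max(c for _, c in counts))   # peak posts per minute: 2
```

Because each operator only consumes an iterator and yields results, the stages can be chained freely, which loosely mirrors how Streams connects operators; distribution across hosts and the scheduler are out of scope here.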

Page 22

Ei! is Built on FAMA: Real-Time Social Media Polarity Analysis Tool for Portuguese Language

FAMA is a social sentiment analysis tool for the Portuguese language developed by IBM Research - Brazil.

FAMA processes text related to topics of interest which appear in social media: Twitter, Facebook, ReclameFacil, etc.; or in private text repositories such as customer complaints or call center logs.

FAMA can determine polarity related to the topics of interest: positive, negative, or neutral.

FAMA can find the most commonly used terms and their co-occurrences with the topics of interest.

“FAMA”: goddess of gossip and rumor (Fama in Roman mythology, Pheme in Greek)
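A minimal Python sketch of the term co-occurrence idea, assuming that co-occurrence means "appears in the same post" and using a tiny, hypothetical Portuguese stopword list. FAMA's actual definition and any weighting it applies are not described in the slides.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny Portuguese stopword list.
STOPWORDS = {"o", "a", "os", "as", "de", "da", "do", "e", "que", "é", "em", "no", "na"}


def cooccurring_terms(posts, topic, top_n=5):
    """Most common terms appearing in the same post as `topic`.

    Each term is counted at most once per post (post-level co-occurrence).
    """
    topic = topic.lower()
    counts = Counter()
    for text in posts:
        terms = set(re.findall(r"\w+", text.lower()))
        if topic in terms:
            counts.update(terms - {topic} - STOPWORDS)
    return counts.most_common(top_n)


posts = ["Neymar joga muito", "neymar golaço", "Felipão burro", "golaço do Neymar"]
print(cooccurring_terms(posts, "neymar", top_n=1))  # [('golaço', 2)]
```

Counting each term once per post keeps a single repetitive tweet from dominating the ranking; counting raw frequencies instead is an equally simple variant.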

Page 23

FAMA: Real-Time Social Media Polarity Analysis in Portuguese

(FAMA architecture diagram. Components: JSONs, Stream Computing (InfoSphere Streams), Text Analytics, Text Classifier, learned database, classified database, and dashboard user interface.)

Page 24

Construction of the Learned Database from Manual Analysis of Tweet Samples

The data for the learned database is created by manual inspection of tweets:

• about 2,000 tweets from 4 friendly matches
• 15 different coders with different degrees of interest in, and knowledge of, soccer
• a tool is used to display, collect, and process the data.
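The slide does not say how disagreements among the 15 coders were resolved. Purely as an assumption for illustration, the Python sketch below aggregates each tweet's votes by simple majority and drops tweets without a clear majority; `majority_label` and `build_learned_set` are hypothetical names.

```python
from collections import Counter

LABELS = {"positive", "neutral", "negative"}


def majority_label(votes, min_share=0.5):
    """Return the label given by more than `min_share` of the coders,
    or None when there is no clear majority (e.g. a tie)."""
    unknown = set(votes) - LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {unknown}")
    if not votes:
        return None
    label, n = Counter(votes).most_common(1)[0]
    return label if n / len(votes) > min_share else None


def build_learned_set(annotations):
    """annotations maps tweet -> list of votes from different coders.
    Keeps only tweets that have a clear majority label."""
    result = []
    for tweet, votes in annotations.items():
        label = majority_label(votes)
        if label is not None:
            result.append((tweet, label))
    return result


print(majority_label(["positive", "positive", "neutral"]))  # positive
```

Discarding no-majority tweets trades training-set size for label quality; an alternative is to send them back for another round of coding.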

Page 25

FAMA Analysis of a Tweet: Example of Text Classification

vou ser repetitivo de novo , mas : na minha epoca de jovem torcedor da seleção brasileira , brasil nao tomava gol de p### de chile não viu

Split into two segments:
• vou ser repetitivo de novo , mas : na minha epoca de jovem torcedor da seleção brasileira
• brasil nao tomava gol de p### de chile não viu

Feature: bad word

Verbs: vou, ser, tomava
Nouns: epoca, brasil, gol, chile, seleção
Adjectives: repetitivo, jovem, brasileira, palavrão

Lemmas: vou → ir (to go); ser → ser (to be); tomava → tomar (to suffer); p### → palavrão (bad word)
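Part of this analysis can be approximated with lookup tables. In the Python sketch below, `LEMMAS` and `BAD_WORDS` are hypothetical toy resources that contain only the slide's own examples; a real pipeline would use a Portuguese part-of-speech tagger and lemmatizer, which this sketch does not attempt.

```python
# Hypothetical toy resources containing only the slide's own examples.
LEMMAS = {"vou": "ir", "ser": "ser", "tomava": "tomar", "p###": "palavrão"}
BAD_WORDS = {"palavrão"}


def extract_features(tweet: str) -> set[str]:
    """Lemma features plus a binary bad-word feature for one tweet."""
    tokens = tweet.lower().split()  # the sample tweets are already space-tokenized
    lemmas = [LEMMAS.get(t, t) for t in tokens]
    # Keep alphabetic lemmas only, so "," ":" and "16h" are skipped.
    features = {f"lemma={lem}" for lem in lemmas if lem.isalpha()}
    if BAD_WORDS.intersection(lemmas):
        features.add("bad_word")
    return features


tweet = ("vou ser repetitivo de novo , mas : na minha epoca de jovem torcedor "
         "da seleção brasileira , brasil nao tomava gol de p### de chile não viu")
feats = extract_features(tweet)
print(sorted(f for f in feats if f in {"lemma=ir", "lemma=tomar", "bad_word"}))
# ['bad_word', 'lemma=ir', 'lemma=tomar']
```

Mapping inflected forms to lemmas lets "tomava", "tomou", and "toma" share one feature, which matters when only about 2,000 annotated tweets are available.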

Page 26

FAMA (2013): Social Sentiment Analysis with a Naïve Bayes Classifier

Learning a classifier: a supervised learning algorithm (Naive Bayes classifier) is trained on a manually annotated corpus and learns a function H. Example annotations:
• hj vai dar Brazil!, positive
• Felipão é mt burrro, negative
• O jogo começa as 16h, neutral

Sentiment analysis: the learned function H labels a new post, e.g. "neymar ta jogando mt hj!!!", as positive, neutral, or negative.
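A minimal sketch of the learning step above, assuming a multinomial Naive Bayes with add-one (Laplace) smoothing and a simple word tokenizer; FAMA's actual features and smoothing are not specified in the slides. It is trained on the three annotated examples from the slide.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples):
        self.docs_per_label = Counter()
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocab = set()
        for text, label in examples:
            self.docs_per_label[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def log_scores(self, text):
        """log P(label) + sum of log P(word | label) over known words."""
        total_docs = sum(self.docs_per_label.values())
        v = len(self.vocab)
        scores = {}
        for label, n_docs in self.docs_per_label.items():
            n_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[label][word] + 1) / (n_words + v))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# The three annotated examples from the slide.
train = [("hj vai dar Brazil!", "positive"),
         ("Felipão é mt burrro", "negative"),
         ("O jogo começa as 16h", "neutral")]
nb = NaiveBayes().fit(train)
print(nb.predict("hj vai dar"))  # positive
```

With only these three examples, the slide's test tweet "neymar ta jogando mt hj!!!" gets identical positive and negative scores (it contains one known word from each class), a small illustration of why a much larger annotated corpus was needed.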

Page 27

Game - Timeline

Page 28

Confederations Cup Final: Brazil 3x0 Spain

Page 29

Players and Main Topics

Page 30

Players and Main Topics

Inspired by Social Media Streams (formerly TwitterVis): http://arena1.watson.ibm.com:8080/cav/


Page 33

www.craquedasredes.com.br

The social sentiment analysis technology developed by IBM Brazil analyzes what is being posted on social networks about any topic, company, or person, without the need for a hashtag.

All public posts in Portuguese are captured by a high-tech IBM system with artificial intelligence, which is trained to learn to interpret whether the sentiment of each post is positive, neutral, or negative.

This technology can analyze posts on many subjects and of many kinds, including slang, sarcasm, and colloquial language.

Page 34

Video: Copa

https://www.youtube.com/watch?v=748YIZn-p4U

Page 35

Limitations of Naive Bayes Approach: Extra Labeling Needed

Brazil x Uruguay – Semi-final (Naive Bayes sentiment timeline)

Penalty kick for Uruguay:
- David Luiz committed it
- Júlio César saved it

Naive Bayes results:
- David Luiz committed the penalty: too much neutral
- Júlio César saved the penalty: too much neutral, too much negative

Page 36

Deep Learning Applied to Social Sentiment Analysis

Learning a deep learning classifier: a deep learning algorithm (multi-layer neural network) is trained on a large-scale non-annotated corpus and a manually annotated corpus, and learns a function N. Example annotations:
• hj vai dar Brazil!, positive
• Felipão é mt burrro, negative
• O jogo começa as 16h, neutral

Sentiment analysis: the learned function N labels a new post, e.g. "neymar ta jogando mt hj!!!", as positive, neutral, or negative.
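The slides describe Deep FAMA only as a "Deep CNN" / multi-layer neural network. As a hedged illustration, not the actual model, the Python sketch below shows the forward pass of a minimal sentence classifier with one convolution layer: word embeddings, a filter sliding over k-word windows, max-over-time pooling, a linear layer, and softmax over the three labels. Weights are random (untrained), and the class name, sizes, and seed are hypothetical. One common way to exploit a large non-annotated corpus, assumed here rather than taken from the slides, is to pretrain the word embeddings on it before training on the annotated corpus.

```python
import math
import random

LABELS = ["positive", "neutral", "negative"]


class TinyTextCNN:
    """Forward pass of a one-layer convolutional sentence classifier:
    word embeddings -> convolution over k-word windows -> max-over-time
    pooling -> linear layer -> softmax. All weights are random (untrained)."""

    def __init__(self, vocab, dim=8, window=3, filters=16, seed=0):
        rng = random.Random(seed)
        self.window = window
        words = ["<pad>", "<unk>"] + sorted(vocab)
        self.index = {w: i for i, w in enumerate(words)}
        # In practice these vectors could be pretrained on unlabeled text.
        self.emb = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in words]
        fan_in = window * dim
        self.conv_w = [[rng.gauss(0, fan_in ** -0.5) for _ in range(fan_in)]
                       for _ in range(filters)]
        self.conv_b = [0.0] * filters
        self.out_w = [[rng.gauss(0, filters ** -0.5) for _ in range(filters)]
                      for _ in LABELS]
        self.out_b = [0.0] * len(LABELS)

    def forward(self, tokens):
        if not tokens:
            raise ValueError("empty input")
        pad = [self.index["<pad>"]] * (self.window // 2)
        ids = pad + [self.index.get(t, self.index["<unk>"]) for t in tokens] + pad
        x = [self.emb[i] for i in ids]
        # Flatten each run of `window` consecutive embeddings into one vector.
        windows = [sum(x[i:i + self.window], []) for i in range(len(x) - self.window + 1)]
        # The same filter slides over every position; keep its maximum response.
        pooled = [max(math.tanh(sum(a * b for a, b in zip(w, win)) + bias)
                      for win in windows)
                  for w, bias in zip(self.conv_w, self.conv_b)]
        logits = [sum(a * b for a, b in zip(w, pooled)) + bias
                  for w, bias in zip(self.out_w, self.out_b)]
        top = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return {label: e / total for label, e in zip(LABELS, exps)}


model = TinyTextCNN({"neymar", "ta", "jogando", "mt", "hj"})
probs = model.forward("neymar ta jogando mt hj".split())
print({k: round(p, 3) for k, p in probs.items()})
```

Training would adjust `emb`, `conv_w`, and `out_w` by backpropagating a cross-entropy loss over the annotated tweets; that step is omitted to keep the example short. Max-over-time pooling is what lets one fixed-size classifier handle tweets of any length.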

Page 37

Brazil x Uruguay – Improvements with Deep Learning

Deep CNN vs. Naive Bayes

Penalty kick for Uruguay:
- David Luiz commits it
- Júlio César saves it

Page 38

Brazil x Uruguay – Improvements with Deep Learning on Players Scores

Deep CNN (Deep FAMA) vs. Naive Bayes (FAMA)

David Luiz commits the penalty; Júlio César saves the penalty

Page 39

Deep FAMA Covering All 64 Games of World Cup 2014

• all 64 games of WC'14
• 53M posts processed
• 34M posts about the games
• peak of 72K posts/minute
• 5.8M different users

• delivered by a team composed of Research, GBS, GTS, SWG, and Software Lab BR

• uses the full IBM portfolio: InfoSphere Streams, WebSphere, DB2, Cognos BI, all running on SoftLayer

Page 40

Brazil 1x7 Germany: Social Anatomy of the Largest Event in Social Network History

• globally 35.6M tweets (world record)
• 6.8M posts in Portuguese (19% of the world total)
• peak of 72K posts/minute (after the 5th goal)
• 1.4M tweets after the game

(Timeline chart annotations: 5th goal, peak of 72K/minute; David Luiz interview, positive effects.)

David Luiz saves the image of Brazil after the game: without the 271K positive comments about his interview, the share of positive post-game posts about Brazil would drop from 44% to 25%.

First half: 1.7M posts (32% / 13% / 55%). Entire game: 4.4M posts (33% / 13% / 54%).

Page 44

Results Used by TV Globo, ESPN, and TV Band

Globo 2nd screen app: 1M downloads, 1.1M page views

ESPN Brazil: 28K page views

Page 45

Ei! Social Sentiment Solution

Page 46

http://bigdatauniversity.com/bdu-wp/bdu-course/big-data-fundamentals/

Page 47

https://www.coursera.org/course/mmds

Page 48

IBM Research – Brazil: http://www.research.ibm.com/brazil/

Alan Braz - @alanbraz

facebook.com/ibmbluemix

twitter.com/ibmbluemix

Articles and tutorials in Portuguese: www.ibm.com/developerworks/br/

www.bluemix.net