The social sentiment analysis technology developed by IBM Brazil analyzes what is being posted on social networks about any topic, company, or person, without requiring a hashtag. All public posts in Portuguese are captured by a high-tech IBM system with artificial intelligence, which is trained to learn whether the sentiment of each post is positive, neutral, or negative. The technology can analyze posts on a wide range of subjects and styles, including slang, sarcasm, and colloquial language. This solution, nicknamed FAMA, was used during Brazil's matches in the 2013 Confederations Cup and evolved to cover all 64 matches of the 2014 FIFA World Cup. In this presentation I describe the motivation, technical details, and results of this effort, which brought together soccer, social networks, and technology! Read more at http://alanbraz.wordpress.com/2014/08/07/tdc2014/
© 2014 IBM Corporation
IBM Research – Brazil
Sentiment Analysis During the World Cup Using Big Data
Alan Braz – IBM Research @alanbraz
Alan Braz
IBM Research – Brazil
Research Software Engineer
2002:2005 UNICAMP – BSc in Computer Science
2005aug:2005nov IBM GBS – Java developer intern
2005:2007 IBM GBS – Java developer (WWER)
2007:2010 IBM GBS – Technical leader (eAC)
2009:2012 IBM GBS – Agile coach and instructor (GenO)
2009:today Metrocamp – SE, RUP, Agile grad teacher
2010:2012 IBM GBS – Software Architect (Blue Community)
2009:2013 UNICAMP – MSc Agile Software Engineering
2013feb:today IBM Research Brazil as RSE
www.alanbraz.com.br
@alanbraz
Innovation and Comfort
Trial-and-Error:
– start-ups
Science-Based:
– scientific method (empirical)
– logic deduction (mathematics)
[Diagram labels: INNOVATION, RADICAL INNOVATION]
science-based innovation
The World is our Lab: 12 Labs Worldwide in 10 Countries
Labs shown on the map: China, Watson, Almaden, Austin, Japan, Israel, Switzerland, India, Ireland, Australia, Africa
IBM Research world-wide has 1600+ PhDs with diversity of disciplines: Behavioral Science, Chemistry, Electrical Engineering, Computer Science, Materials Science, Mathematical Science, Physics, Services Science
IBM Research - Brazil
Smarter Natural Resources: natural resources modeling, analytics, and logistics.
Systems of Engagement and Insights: systems of engagement and insights.
Social Data Analytics: analytics and modeling of social and human data and applications.
Smarter Devices: micro/nano-technologies aimed at addressing smarter planet challenges.
Locations: Rio de Janeiro, São Paulo
A team of world-class researchers in close connection to the other 12 IBM Research labs and to the world’s best scientific, academic, and development communities.
System U: Modeling People from Social Media
Social behaviors, e.g., when tweeting
Five Factor Model
• Openness • Conscientious • Extroverted • Agreeable • Neuroticism
Ford’s 12 “Universal Needs”
• Structure • Challenge • Excitement • Liberty • Harmony • Closeness
• Practicality • Self-expression • Curiosity • Ideals • Love • Stability
Five Values
• Self-transcendence • Conservation • Self-enhancement • Hedonism • Openness-to-Change
Project: Social Media Behavior Simulation
Goal: to create a tool for companies to explore the impact and result of social media actions through simulation.
Applications: exploration of effort size and impact of marketing campaigns; determination of counter-information measures in viral media outbreaks.
Maira Gatti, Ana Appel, Claudio Pinhanez, Rogério de Paula, Cicero dos Santos, Alexander Rademaker, Paulo Cavalin, Samuel Barbosa, Daniel Gribel
Simulation of Obama/Romney Twitter campaigns in the last month before election
Obama’s Network: 23,856,961 followers
Romney’s Network: 1,675,792 followers
Sample - Sept 22 to Oct 29, 2012
Obama’s Network: 5.6M tweets, 24,526 active users, 3,594 followers
Romney’s Network: 5.1M tweets, 28,145 active users, 5,498 followers
Ei! 194 Million Brazilians Helping their National Team’s Coach
An app made specifically for one person: Luiz Felipe Scolari, coach of the Brazilian national soccer team.
Ei! is an app that identifies, filters, and analyzes all the Twitter comments that Brazilians make during the games.
With the touch of a button, Scolari will know the country's consensus:
At half time: which players the audience likes and dislikes, what changes should be made, which tactics should be explored, which player needs to be brought on…
After the game: his country’s perspective on the team, the players, and his performance as a coach.
Challenges
• Real-time issues
• Up to 5 million tweets per match
• Up to 20 thousand tweets per minute
• Texting vs. writing: casual language
• nao disse , Balotelli ia meter gol hoje , um golaço ainda , madero aquele negoo
• hora de colocar o Leandro né Felipão ? u.u
• vou ser repetitivo de novo , mas : na minha epoca de jovem torcedor da seleção brasileira , brasil nao tomava gol de p### de chile não viu
• jah to vendo o Brasil faze nois passa vergonha na copa ! ! ! pq meu g-zuis ...
• acho q o ronaldinho tem que ser totula
• Com todo o respeito , Luis Fabiano , popcorn men hahahahaha beijo para quem entendeu , pior piada ever ! Haha
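To make the per-minute volume figure concrete, here is a minimal sketch of how a tweet rate per minute could be measured from post timestamps. It is only an illustration: the function names and the sample timestamps are hypothetical, and the slides do not describe how the production system counts its rates.

```python
from collections import Counter
from datetime import datetime


def tweets_per_minute(timestamps):
    """Count how many tweets fall into each one-minute bucket."""
    counts = Counter()
    for ts in timestamps:
        # Truncate to the start of the minute so all tweets in it share a key.
        counts[ts.replace(second=0, microsecond=0)] += 1
    return counts


def peak_minute(timestamps):
    """Return (minute, count) for the busiest one-minute bucket."""
    counts = tweets_per_minute(timestamps)
    return max(counts.items(), key=lambda item: item[1])


# Hypothetical timestamps, invented for illustration only.
sample = [
    datetime(2014, 7, 8, 17, 29, 5),
    datetime(2014, 7, 8, 17, 29, 40),
    datetime(2014, 7, 8, 17, 30, 1),
    datetime(2014, 7, 8, 17, 30, 2),
    datetime(2014, 7, 8, 17, 30, 59),
]
minute, count = peak_minute(sample)
print(minute, count)
```

At the volumes quoted on the slide, a batch count like this would have to be replaced by an incremental, streaming computation.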
Social Sentiment Analysis is Difficult
(CHEvATM) Diego costa merece errar por ter escolhido outra seleçao pra jogar
(BRAvITA) Itália perdendo o segundo jogador lesionado com TRINTA minutos de jogo. Prandelli deve tá jogando o Football Manager 2013.
(BRAvITA) PAAAAAAAARTIU ASSISTIR JOGO DO Brazil!
(BRAvITA) Vacilo, Jô ia entrar e fazer mais um
(BRAvMEX) o que aconteceu com a seleção ? Pqp
(BRAvURU) no momento dançando show das poderosas de sutiã e short jeans
(RMAvATM) BALE AMOR FAÇA AQUELE LINDO GOL QUE PROMETEU PRA MIM ONTEM A NOITE
(BRAvMEX) Brazil vai ganhando do México, vingando-se das Olimpíadas, num jogo que vale tanto quanto troco em bala.
(SAOvCOR) o ganso so quer fazer jogada genial
(SAOvCOR) Com essa Fabulosa em campo o Sao Paulo sempre vai fazer gol contra o Corinthians, entenda tecnico retranqueiro do c#######
(SAOvCOR) Mano meu pai ganho 500 conto no jogo do bixo kkkk
Ei! Social Sentiment Solution
InfoSphere Streams: A Platform for Real Time Analytics on BIG Data
Key Big Data Challenge – Velocity
Volume: Terabytes per second, Petabytes per day
Variety: All kinds of data, All kinds of analytics
Velocity: Insights in microseconds
Millions of events per second
Microsecond Latency
Traditional / Non-traditional data sources
Real time delivery
Powerful Analytics
Example uses: Algorithmic Trading, Telco Churn Prediction, Smart Grid, Cyber Security, Government / Law enforcement, ICU Monitoring, Environment Monitoring
http://www.ibm.com/developerworks/analytics/
Streams Runtime Illustrated
Optimizing scheduler assigns PEs to hosts, and continually manages resource allocation
Commodity hardware – laptop, blades or high performance clusters
Dynamically add hosts and jobs
New jobs work with existing jobs
[Diagram: operators (Meters, Company Filter, Usage Model, Usage Contract, Temp Action, Text Extract, Degree History, Compare History, Store History, Season Adjust, Daily Adjust) distributed across five x86 hosts]
Ei! is Built on FAMA: Real-Time Social Media Polarity Analysis Tool for Portuguese Language
FAMA is a social sentiment analysis tool for the Portuguese language developed by IBM Research - Brazil
FAMA processes text related to topics of interest which appear in social media: Twitter, Facebook, ReclameFacil, etc.; or in private text repositories such as customer complaints or call center logs.
FAMA can determine polarity related to the topics of interest: positive, negative, or neutral.
FAMA can find the most commonly used terms and their co-occurrences with the topics of interest.
“FAMA”: Fama, the goddess of gossip and rumor in Roman mythology (Pheme in Greek mythology)
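As a rough illustration of the term co-occurrence idea, the sketch below counts which terms appear in the same post as a topic of interest. The function name, the whitespace tokenization, the stopword set, and the sample posts are all assumptions made for this example; the slides do not say how FAMA computes co-occurrences.

```python
from collections import Counter


def cooccurring_terms(posts, topic, stopwords=frozenset()):
    """Count terms that appear in the same post as `topic`.

    Each term is counted at most once per post, so the result is the
    number of posts in which the term and the topic occur together.
    """
    topic = topic.lower()
    counts = Counter()
    for post in posts:
        tokens = post.lower().split()
        if topic in tokens:
            counts.update(
                {t for t in tokens if t != topic and t not in stopwords}
            )
    return counts


# Hypothetical posts, invented for illustration only.
posts = [
    "neymar fez um golaço",
    "que golaço do neymar",
    "o jogo começa as 16h",
]
top = cooccurring_terms(
    posts, "neymar", stopwords={"um", "do", "que", "o", "as"}
)
print(top.most_common(2))
```

A real pipeline would need a tokenizer that copes with the casual spelling shown on the earlier slides; plain whitespace splitting is used here only to keep the sketch short.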
FAMA: Real-Time Social Media Polarity Analysis in Portuguese
[Architecture diagram labels: JSONs, Stream Computing (InfoSphere Streams), Text Analytics, Text Classifier, learned database, classified database, dashboard user interface, FAMA]
Construction of the Learned Database from Manual Analysis of Tweet Samples
The data for the learned database is created by manual inspection of tweets:
about 2000 tweets from 4 friendly matches
15 different coders with different degrees of interest and knowledge of soccer
a tool is used to display, collect, and process the data.
FAMA Analysis of a Tweet: Example of Text Classification
vou ser repetitivo de novo , mas : na minha epoca de jovem torcedor da seleção brasileira , brasil nao tomava gol de p### de chile não viu
vou ser repetitivo de novo , mas : na minha epoca de jovem torcedor da seleção brasileira
brasil nao tomava gol de p### de chile não viu
feature: bad word
verbs: vou, ser, tomava
nouns: epoca, brasil, gol, chile, seleção
adjectives: repetitivo, jovem, brasileira, palavrão
vou: ir (to go)
ser: ser (to be)
tomava: tomar (suffer)
p###: palavrão (bad word)
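The normalization steps on this slide can be sketched in a few lines. This is a toy illustration, not FAMA's implementation: the lemma table contains only the three verbs listed on the slide, and detecting a masked bad word by looking for `#` after the first character is an assumption made here so that hashtags are not flagged.

```python
# Lemma table limited to the verbs shown on the slide.
LEMMAS = {"vou": "ir", "ser": "ser", "tomava": "tomar"}


def is_masked_bad_word(token):
    """Assumed heuristic: a '#' after the first character marks a masked
    bad word such as 'p###', while a leading '#' is treated as a hashtag."""
    return "#" in token[1:]


def extract_features(tweet):
    """Lowercase, split on whitespace, lemmatize known verbs, and replace
    masked bad words with the placeholder 'palavrão', as on the slide."""
    tokens = tweet.lower().split()
    normalized = []
    for tok in tokens:
        if is_masked_bad_word(tok):
            normalized.append("palavrão")
        else:
            normalized.append(LEMMAS.get(tok, tok))
    return {
        "tokens": normalized,
        "bad_word": any(is_masked_bad_word(t) for t in tokens),
    }


features = extract_features("brasil nao tomava gol de p### de chile não viu")
print(features)
```

Part-of-speech tagging (the verbs, nouns, and adjectives on the slide) is left out because it needs a trained tagger for Portuguese.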
FAMA (2013): Social Sentiment Analysis with a Naïve Bayes Classifier
Sentiment Analysis: Learning a Classifier
manually annotated corpus:
hj vai dar Brazil!, positive
Felipão é mt burrro, negative
O jogo começa as 16h, neutral
Supervised Learning Algorithm: Naive Bayes Classifier
Output: function H
function H applied to "neymar ta jogando mt hj!!!" returns one of: positive, neutral, negative
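The learning step on this slide can be sketched as a small multinomial Naive Bayes classifier with Laplace smoothing. This is a minimal, standard-library illustration trained on the three annotated examples from the slide; the class name, the regex tokenizer, and the smoothing constant are assumptions, and the actual FAMA features and training setup are not described in the deck.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep runs of word characters (drops punctuation)."""
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # documents per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore tokens never seen in training
                count = self.word_counts[label][tok]
                score += math.log(
                    (count + self.alpha)
                    / (total_words + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# The three annotated examples shown on the slide.
texts = ["hj vai dar Brazil!", "Felipão é mt burrro", "O jogo começa as 16h"]
labels = ["positive", "negative", "neutral"]
H = NaiveBayes().fit(texts, labels)
print(H.predict("vai dar brazil"))
```

Three training examples are far too few to classify a new tweet such as the one on the slide reliably; the deck reports about 2000 manually labeled tweets for the learned database.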
Game - Timeline
Confederations Cup Final: Brazil 3x0 Spain
Players and Main Topics
Players and Main Topics
Inspired by Social Media Streams (former TwitterVis) http://arena1.watson.ibm.com:8080/cav/
www.craquedasredes.com.br
The social sentiment analysis technology developed by IBM Brazil analyzes what is being posted on social networks about any topic, company, or person, without requiring a hashtag.
All public posts in Portuguese are captured by a high-tech IBM system with artificial intelligence, which is trained to learn whether the sentiment of each post is positive, neutral, or negative.
The technology can analyze posts on a wide range of subjects and styles, including slang, sarcasm, and colloquial language.
Limitations of Naive Bayes Approach - Extra Labeling Needed
Brazil x Uruguay – Semi-final
Naive Bayes
Penalty kick for Uruguay
- David Luiz committed it
- Júlio César saved it
David Luiz committed:
- too much neutral
Júlio César saved:
- too much neutral
- too much negative
Deep Learning Applied to Social Sentiment Analysis
Sentiment Analysis: Learning a Deep Learning Classifier
large scale non-annotated corpus
manually annotated corpus:
hj vai dar Brazil!, positive
Felipão é mt burrro, negative
O jogo começa as 16h, neutral
Deep Learning Algorithm: Multi-Layer Neural Network
Output: function N
function N applied to "neymar ta jogando mt hj!!!" returns one of: positive, neutral, negative
Brazil x Uruguay – Improvements with Deep Learning
[Charts: Deep CNN vs. Naive Bayes]
Penalty kick for Uruguay
- David Luiz commits it
- Júlio César saves it
Brazil x Uruguay – Improvements with Deep Learning on Players Scores
[Charts: Deep CNN (Deep FAMA) vs. Naive Bayes (FAMA)]
Events marked: David Luiz commits penalty; Júlio César saves penalty
Deep FAMA Covering All 64 Games of World Cup 2014
• all 64 WC’14 games
• 53M posts processed
• 34M posts about the games
• peak of 72K/minute
• 5.8M different users
• delivered by a team composed of Research, GBS, GTS, SWG, and Software Lab BR
• uses the full IBM portfolio:
• InfoSphere Streams
• WebSphere
• DB2
• Cognos BI
• all running on SoftLayer
Brazil 1x7 Germany: Social Anatomy of the Largest Event in Social Network History
globally 35.6M tweets (world record)
6.8M posts in Portuguese (19% of world)
peak of 72K/minute (after 5th goal)
1.4M tweets after the game
[Timeline chart annotations: 5th goal (peak of 72K/minute); David Luiz interview (positive effects)]
David Luiz saves the image of Brazil after the game: without the 271K positive comments about the David Luiz interview, Brazil's post-game positive posts would decrease from 44% to 25%
First half 1.7M: 32% 13% 55%
Entire game 4.4M: 33% 13% 54%
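As a quick sanity check of the figures on this slide, the arithmetic below recomputes the Portuguese share of global tweets and converts the per-minute peak into a per-second rate. The variable names are chosen for this example; the numbers are the ones quoted on the slide.

```python
# Figures quoted on the slide.
global_tweets = 35.6e6      # tweets worldwide during the match
portuguese_posts = 6.8e6    # posts in Portuguese
peak_per_minute = 72_000    # peak rate after the 5th goal

# Share of the global volume that was in Portuguese.
share = portuguese_posts / global_tweets

# The same peak expressed per second.
peak_per_second = peak_per_minute / 60

print(f"Portuguese share: {share:.1%}")
print(f"Peak rate: {peak_per_second:.0f} posts/second")
```

The share rounds to the 19% stated on the slide, and the peak corresponds to roughly 1,200 posts per second.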
Results Used by TV Globo, ESPN, and TV Band
Globo 2nd screen app: 1M downloads, 1.1M page views
ESPN Brazil: 28K page views
Ei! Social Sentiment Solution
http://bigdatauniversity.com/bdu-wp/bdu-course/big-data-fundamentals/
https://www.coursera.org/course/mmds
IBM Research – Brazil: http://www.research.ibm.com/brazil/
Alan Braz - [email protected] - @alanbraz
facebook.com/ibmbluemix
twitter.com/ibmbluemix
Articles and tutorials in Portuguese: www.ibm.com/developerworks/br/
www.bluemix.net