Social Media and Predictive Analytics. 15 June 2011, Josef Šlerka, Prague. Conference: Social media ve finančních službách (Social Media in Financial Services)
Predictive analytics
Predictive analytics encompasses a variety of statistical techniques from modeling, data mining and game theory that analyze current and historical facts to make predictions about future events. (WIKIPEDIA)
Predictive analytics
In business, predictive models exploit patterns found in historical and transactional data to identify risks and opportunities. Models capture relationships among many factors to allow assessment of risk or potential associated with a particular set of conditions, guiding decision making for candidate transactions. (WIKIPEDIA)
Search as a signal
Hyunyoung Choi, Hal Varian:
Predicting the Present with Google Trends
How is that possible?
Life is searching ... (too)
and before we decide, we search ... (too)
Google Insights
a service that Google provides for free
it can also be used for predictive analysis
Nikolaos Askitas, Klaus F. Zimmermann:
Google Econometrics and Unemployment Forecasting
... of the song “Right Round” in terms of search volume closely tracks its rank on the Billboard Hot 100 chart.
Thus motivated, we now investigate whether search activity is a systematic leading indicator of consumer activity by forecasting (i) opening weekend box-office revenue for 119 feature films released in the United States between October 2008 and September 2009; (ii) first-month sales of video games across all gaming platforms (e.g., Xbox, PlayStation, etc.) for 106 games released between September 2008 and September 2009; and (iii) the weekly rank of 307 songs that appeared on the Billboard Hot 100 list between March and September 2009. Search data for movies and video games come from Yahoo!’s Web search query logs for the US market. Predictions in these domains are based on linear models with Gaussian error of the form
$$\log(\mathrm{revenue}) = \beta_0 + \beta_1 \log(\mathrm{search}) + \epsilon,$$
where, in order to account for the highly skewed distributions of popularity, both revenue and search volume are log-transformed. For songs, search data were collected from Yahoo!’s dedicated music site, music.yahoo.com. We predict the weekly Billboard rank using search rank from the current and previous weeks:
$$\mathrm{billboard}_{t+1} = \beta_0 + \beta_1 \mathrm{search}_t + \beta_2 \mathrm{search}_{t-1} + \epsilon.$$
Fig. 2 A–C shows that search-based predictions are strongly correlated with realized outcomes for movies (0.85) and video games (0.76) and moderately correlated for music (0.56), where in each case revenue or rank is predicted on the day immediately preceding the event of interest. Moreover, Fig. 2 D–F shows that the predictive power of search persists as far out as several weeks in advance; for example, four weeks prior to a movie’s release ...
Fig. 1. Search volume for the movie Transformers 2 (A) and the video game Tom Clancy’s H.A.W.X. (B) prior to and after their release, and search and Billboard rank for the song “Right Round” by Flo Rida (C).
[Figure: panels A and B plot search volume against time to release (days, from -30 to 30); panel C plots Billboard rank and search rank by week, March to August 2009.]
Fig. 2. Search-based predictions for box-office movie revenue (A), first-month video game sales (B), and the Billboard rank of songs (C), where predictions are made immediately prior to the event of interest; correlation between predicted and actual outcomes when predictions are based on query data t weeks prior to the event (D–F).
[Figure: panels A–C plot actual against predicted values for Movies (revenue in dollars), Video Games (revenue in dollars, sequel vs. non-sequel), and Music (Billboard rank); panels D–F plot model fit against time to release (weeks, from -6 to 0) for the same three domains.]
Goel et al. PNAS | October 12, 2010 | vol. 107 | no. 41 | 17487
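The log-log model quoted above can be tried out in a few lines. What follows is only a minimal sketch, not the procedure used by Goel et al.: the data are synthetic, the "true" coefficients 2.0 and 0.9 are arbitrary, and the function names fit_log_log and predict_revenue are invented for this illustration.

```python
import numpy as np


def fit_log_log(search, revenue):
    """Ordinary least-squares fit of log(revenue) = b0 + b1 * log(search)."""
    x = np.log(np.asarray(search, dtype=float))
    y = np.log(np.asarray(revenue, dtype=float))
    # np.polyfit returns coefficients from the highest degree down: slope, intercept.
    b1, b0 = np.polyfit(x, y, 1)
    return b0, b1


def predict_revenue(search, b0, b1):
    """Map search volume back to the revenue scale using the fitted model."""
    return np.exp(b0 + b1 * np.log(np.asarray(search, dtype=float)))


# Synthetic data (an assumption, not real query logs or box-office figures).
rng = np.random.default_rng(0)
search = np.exp(rng.uniform(np.log(1e3), np.log(1e6), size=100))
revenue = np.exp(2.0 + 0.9 * np.log(search) + rng.normal(0.0, 0.3, size=100))

b0, b1 = fit_log_log(search, revenue)
predicted = predict_revenue(search, b0, b1)

# Correlation between predicted and actual values on the log scale.
r = np.corrcoef(np.log(predicted), np.log(revenue))[0, 1]
print(f"b0={b0:.2f} b1={b1:.2f} r={r:.2f}")
```

Fitting on the log scale keeps a handful of blockbuster titles from dominating the fit, which is the reason the quoted passage gives for the transformation.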
Does it work here too?
there are no rigorous studies
there is no reason why it should not work
Social media as a signal
Life is NOT just searching ... Fans, followers, pages
“What’s on your mind?” (Facebook)
“What’s happening?” (Twitter)
Predicting the stock market
Predicting the stock market
To put it in simple words, when the emotions on twitter fly high, that is when people express a lot of hope, fear, and worry, the Dow goes down the next day. When people have less hope, fear, and worry, the Dow goes up. It therefore seems that just checking on twitter for emotional outbursts of any kind gives a predictor of how the stock market will be doing the next day.
Zhang, Fuehres, Peter A. Gloor: Predicting Stock Market Indicators Through Twitter “I hope it is not as bad as I fear”
Predicting stock prices
stocks tracked: Starbucks, Coca-Cola, and Nike
signals used: Facebook fans, Twitter followers, YouTube views
Predicting elections
US Senate elections
the signal was the number of followers on Twitter
the correlation between winning and the follower count was 71%
for Facebook fans it was as high as 80%
Does it work here too?
It seems so :-)
Research on data from www.ataxosocialinsider.cz
Ataxo Social Insider
a tool for analyzing data from social networks, discussion forums, blogs, and news sites
Ataxo Social Insider
And what about prediction?
Case study:
Facebook mention counts and film attendance
mentions of Inception on Czech Facebook in 2010 and audience response
Harry Potter on Czech Facebook in 2010 and audience response
FB mentions as a signal
The correlation shows an ability to predict the dynamics of film revenues, because people mostly do what they say....
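One way to check whether mentions lead attendance is to correlate the two series at several time lags. The sketch below is only an illustration under stated assumptions: the weekly numbers are invented (they are not Ataxo Social Insider data), and the helper lagged_correlation is a hypothetical name, not part of any tool mentioned here.

```python
import numpy as np


def lagged_correlation(signal, outcome, lag):
    """Pearson correlation between signal[t] and outcome[t + lag]."""
    signal = np.asarray(signal, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    if len(signal) != len(outcome):
        raise ValueError("series must have the same length")
    if lag < 0 or lag > len(signal) - 2:
        raise ValueError("lag must leave at least two overlapping points")
    if lag > 0:
        # Drop the last `lag` signal values and the first `lag` outcome values.
        signal, outcome = signal[:-lag], outcome[lag:]
    return np.corrcoef(signal, outcome)[0, 1]


# Invented weekly series (assumption): Facebook mentions and cinema admissions.
mentions = [120, 340, 900, 650, 300, 150, 80, 40]
admissions = [0, 15000, 42000, 110000, 80000, 36000, 18000, 9000]

for lag in range(3):
    print(lag, round(lagged_correlation(mentions, admissions, lag), 2))
```

If the correlation peaks at a positive lag, mentions move ahead of attendance by that many weeks; a peak at lag 0 would mean the signal only tracks the present rather than predicting it.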
The future?
Let's connect the data and watch...
Client profiling
linking users' status updates with their financial behavior
predicting solvency
the reliability of their network
reality check
Finding products
tailoring products to the customer
discovering patterns in behavior
Will it work?
It already does! In the USA, the company RapLeaf.
Here, there is no demand yet.
The data is there.
Thank you for your attention
www.ataxointeractive.com
twitter.com/josefslerka