Big Data Camp LA 2014: The Future's so bright (You can barely make any predictions about it), by Timothy Shea of DataSift
Powering The Social Economy
How do we Make Good Forecasts?
The Architecture vs The Practice (aka: Form vs Function)
Platforms for Big Data storage, processing & analytics.
VS
Actual applications of Data-at-Scale
Themes for This Morning
How DataSift Manages, Processes & Delivers
Data Visualization via Tableau
Causal Inference & Statistical Modeling
Movies & Coffee
Who am I?
Tim Shea
@SheaNineSeven
Data Scientist & Sales Engineer at DataSift
Focus on Alliances & Channels:
Tableau, Alteryx, Microstrategy, Informatica, SAP
Data Science as a Practice:
Disambiguation, Classification, Causality
What is DataSift?
Social Data Platform
Full “Firehose” Access
2 Billion Posts per Day
½ Trillion Posts Historical Archive
Really Intense Architecture Diagram
We Make it Simple for You
Focus on Filtering
Big Data < Relevant Data
Enrichments:
- Demographics
- Links
- Emotion & Intent
- Learned Classification
Demo
DataSift: Beyond “Social Listening”
Ex. “Does Social have anything to do with my Business?”
Line Charts and Graphs
Vs
Operationalized Decision Making
“The Enterprise”
DataSift Enterprise customers are building:
1. Demand Forecasting
2. Critical Event Processing
3. Market Segmentation/Statistical Classification
4. Establishing Correlative Relationships(**)
Causality
Necessary…Connection?
Does Event A cause Event B?
Fighting Crime…Fights Crime(?)
Does The Past have anything at all to do with The Future?
Defending Your Hypotheses
How can I create & defend my Hypotheses?
How do I communicate my findings to Laypeople (non-Data Scientists) like your Boss?
Risk Management in Hollywood
Movies
Through the Lens of:
DataSift - What we do as a Social Data Platform
Tableau - How to Make Sense of a Mountain of Data
Good Data & Good Tools
Risk Management is Hard
Q: What is a “Sure Bet”?
Q: Should I spend $100MM making this movie?
Q: How can I make this process less risky?
Enter DataSift & Tableau
Example
Return Every:
Tweet
Facebook Post
Instagram Photo
Bitly Click
For What? Every single Movie released in 2013
Compare it With
Tableau
What Data do we Have?
1. Intuition
2. Social => Box Office Correlation?
3. Prove It
4. Defend the Model
The Model
Y = a + bX
Y = Box Office (the predicted)
X = Social Volume (the predictor)
b = Coefficient
a = Some offset
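As a rough sketch of how a and b in Y = a + bX could be estimated, the block below fits a line by ordinary least squares using only the Python standard library. The function name `fit_line` and the sample numbers are hypothetical and chosen for illustration; they are not data or code from the talk.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of Y = a + b*X. Returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of X, and of X-Y cross deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X has no variation, so the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx             # the coefficient
    a = mean_y - b * mean_x   # the offset
    return a, b


# Hypothetical numbers for illustration only (not data from the talk).
social_volume = [1000, 2000, 3000, 4000, 5000]
box_office = [260_000, 470_000, 730_000, 950_000, 1_210_000]

a, b = fit_line(social_volume, box_office)
print(f"Box Office = {a:.0f} + {b:.2f} * Social Volume")
```

Here b reads as "dollars of Box Office per unit of Social Volume", which is the form the later slides quote.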
Defend the Model v1
P-value: If the Null Hypothesis were true, there would be an X% chance of seeing a relationship at least this strong.
Null Hypothesis: The linear coefficient is equal to zero.
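One way to make the p-value concrete without a statistics package is a permutation test, which is a swapped-in technique here and not necessarily what Tableau or the speaker used. Shuffling Y breaks any real link to X, so the shuffled data behave as if the Null Hypothesis were true. In a one-predictor model, a zero coefficient is the same thing as a zero correlation, so the sketch compares correlations. The function names, the shuffle count, and the sample numbers are all assumptions for illustration.

```python
import random


def correlation(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(xs, ys, shuffles=2000, seed=0):
    """Share of shuffles whose |correlation| is at least the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(correlation(xs, ys))
    shuffled = list(ys)
    at_least_as_strong = 0
    for _ in range(shuffles):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if abs(correlation(xs, shuffled)) >= observed - 1e-12:
            at_least_as_strong += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_strong + 1) / (shuffles + 1)


# Hypothetical numbers for illustration only (not data from the talk).
clicks = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000]
sales = [250_000, 490_000, 700_000, 1_000_000,
         1_150_000, 1_500_000, 1_640_000, 1_950_000]
print(permutation_p_value(clicks, sales))
```

A small result means a relationship this strong rarely shows up when the data are shuffled, which is the evidence against the Null Hypothesis.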
Defend the Model v2
P-value (again): Loosely, we can be about (100 – X)% confident that the correlation we're seeing is not just chance.
R-Squared: Our model explains about Y% of the variability in Box Office (the scatter of points around the regression line), for a line fit by least squares.
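R-Squared can be computed directly from its definition: one minus the ratio of the leftover squared error around the line to the total squared variation around the mean. The sketch below assumes a and b have already been fitted; the function name `r_squared` and the sample numbers are hypothetical.

```python
def r_squared(xs, ys, a, b):
    """Share of the variation in Y explained by the line Y = a + b*X."""
    mean_y = sum(ys) / len(ys)
    # Squared error left over after using the line to predict Y.
    ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Squared error when predicting every Y with the plain average.
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total


# Hypothetical numbers for illustration only (not data from the talk).
social_volume = [1, 2, 3, 4, 5]
box_office = [2, 4, 5, 4, 5]
# a = 2.2 and b = 0.6 are the least-squares values for these five points.
print(r_squared(social_volume, box_office, a=2.2, b=0.6))
```

A value near 1 means the points sit close to the line; a value near 0 means the line does little better than the average.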
Defend the Model v3
Every Bitly click predicts about $240 in Box Office Sales
I’m extremely confident (99%) that this is not due to chance.
With ~96% confidence we can rely on this model in the future.
The Model (cont)
Y “is predicted by” a + bX
Box Office = 0 + $240 * (# bitly clicks)
Box Office = 0 + $130 * (# tweets)
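Applying the model is plain arithmetic. The sketch below plugs a social count into Box Office = a + b * count using the two per-unit figures quoted on the slide; the function name and the count of 50,000 clicks are hypothetical.

```python
# Per-unit dollar coefficients quoted in the slides.
DOLLARS_PER_BITLY_CLICK = 240
DOLLARS_PER_TWEET = 130


def predict_box_office(social_count, dollars_per_unit, offset=0):
    """Predicted Box Office = offset + dollars_per_unit * social_count."""
    return offset + dollars_per_unit * social_count


# Hypothetical count for illustration: 50,000 Bitly clicks.
print(predict_box_office(50_000, DOLLARS_PER_BITLY_CLICK))  # 12000000
```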
Benchmarking
If my Bitly #’s drop below $240
If my Twitter #’s drop below $130
If my Instagram #’s drop below $2809
If my Facebook #’s drop below $3871
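One possible reading of these benchmarks, offered as an assumption and not as the speaker's stated method, is to compare a film's observed Box Office dollars per social unit against the per-channel figures above and flag any channel that falls short. The function name and the sample film numbers are hypothetical.

```python
# Per-unit benchmarks (dollars of Box Office per social unit) from the slides.
BENCHMARK_DOLLARS_PER_UNIT = {
    "bitly": 240,
    "twitter": 130,
    "instagram": 2809,
    "facebook": 3871,
}


def channels_below_benchmark(box_office, social_counts):
    """Return the channels whose dollars-per-unit is under the benchmark."""
    below = []
    for channel, count in social_counts.items():
        if count <= 0:
            continue  # no activity, so no ratio to compare
        if box_office / count < BENCHMARK_DOLLARS_PER_UNIT[channel]:
            below.append(channel)
    return sorted(below)


# Hypothetical film for illustration only (not data from the talk).
print(channels_below_benchmark(
    1_000_000, {"bitly": 5000, "twitter": 5000, "instagram": 300}))
```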
Other Considerations
Residuals
Other Regression (Logarithmic, Exponential, Polynomial)
“Overfitting”
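Two of these considerations can be checked with very little code. The sketch below computes residuals (actual minus predicted) and then compares the error on the films used to fit the line against the error on films held back from the fit; a much larger held-out error is one common warning sign of overfitting. All function names and numbers are hypothetical and for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of Y = a + b*X. Returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def residuals(xs, ys, a, b):
    """Actual Y minus predicted Y for each observation."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def rmse(values):
    """Root mean squared error of a list of residuals."""
    return (sum(v * v for v in values) / len(values)) ** 0.5


# Hypothetical numbers for illustration only (not data from the talk).
train_x = [1, 2, 3, 4, 5]
train_y = [2, 4, 5, 4, 5]
test_x = [6, 7]
test_y = [6, 5]

a, b = fit_line(train_x, train_y)
train_error = rmse(residuals(train_x, train_y, a, b))
test_error = rmse(residuals(test_x, test_y, a, b))
print(f"train RMSE = {train_error:.3f}, held-out RMSE = {test_error:.3f}")
```

Plotting the residuals against X is also worth doing: a visible curve in them suggests that one of the other regression shapes may fit better than a straight line.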
Additional Dimensions
DataSift Social Data:
Gender
Income
Geography
“Influence”
Industry vs Consumers
Thanks for Listening!