Big Data Camp LA: The Future's So Bright (Tim Shea)

Big Data Camp LA 2014: The Future's so bright (You can barely make any predictions about it), by Timothy Shea of DataSift

Powering The Social Economy

How do we Make Good Forecasts?

The Architecture vs The Practice (aka: Form vs Function)

Platforms for Big Data storage, processing & analytics.

VS

Actual applications of Data-at-Scale

Themes for This Morning

How DataSift Manages, Processes & Delivers

Data Visualization via Tableau

Causal Inference & Statistical Modeling

Movies & Coffee

Who am I?

Tim Shea

@SheaNineSeven

Data Scientist & Sales Engineer at DataSift

Focus on Alliances & Channels:

Tableau, Alteryx, Microstrategy, Informatica, SAP

Data Science as a Practice:

Disambiguation, Classification, Causality

What is DataSift?

Social Data Platform

Full “Firehose” Access

2 Billion Posts per Day

½ Trillion Posts Historical Archive

Really Intense Architecture Diagram

We Make it Simple for You

Focus on Filtering

Big Data < Relevant Data

Enrichments:

- Demographics
- Links
- Emotion & Intent
- Learned Classification

Demo

DataSift: Beyond “Social Listening”

Ex. “Does Social have anything to do with my Business?”

Line Charts and Graphs

Vs

Operationalized Decision Making

“The Enterprise”

DataSift Enterprise customers are building:

1. Demand Forecasting

2. Critical Event Processing

3. Market Segmentation/Statistical Classification

4. Establishing Correlative Relationships(**)

Causality

Necessary…Connection?

Does Event A cause Event B?

Fighting Crime…Fights Crime(?)

Does The Past have anything at all to do with The Future?

Defending Your Hypotheses

How can I create & defend my Hypotheses?

How do I communicate my findings to Laypeople (non-Data Scientists) like my Boss?

Risk Management in Hollywood

Movies

Through the Lens of:

DataSift - What we do as a Social Data Platform

Tableau - How to Make Sense of a Mountain of Data

Good Data & Good Tools

Risk Management is Hard

Q: What is a “Sure Bet”?

Q: Should I spend $100MM making this movie?

Q: How can I make this process less risky?

Enter DataSift & Tableau

Example

Return Every:

Tweet

Facebook Post

Instagram Photo

Bitly Click

For What? Every single Movie released in 2013

Compare it With

Tableau

What Data do we Have?

1. Intuition

2. Social => Box Correlation?

3. Prove It

4. Defend the Model

The Model

Y = a + bX

Y = Box Office (the predicted)

X = Social Volume (the predictor)

b = Coefficient (the slope)

a = Some offset (the intercept)
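As a rough illustration of how a and b in Y = a + bX can be estimated, here is a minimal least-squares sketch in plain Python. The function name fit_line and the sample numbers are made up for illustration; they are not DataSift data or the tooling used in the talk.

```python
def fit_line(xs, ys):
    """Ordinary least squares for Y = a + bX; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of X, and how X and Y move together around their means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # coefficient: box office dollars per unit of social volume
    a = mean_y - b * mean_x  # offset: predicted box office at zero social volume
    return a, b


# Hypothetical movies: social volume (X) and box office in dollars (Y)
social_volume = [1000, 2000, 3000, 4000, 5000]
box_office = [250_000, 480_000, 730_000, 950_000, 1_210_000]

a, b = fit_line(social_volume, box_office)
print(f"Box Office = {a:.0f} + {b:.0f} * (social volume)")
```

With real data the points will not sit this close to a line, which is why the model then has to be defended.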

Defend the Model v1

P-value: If the Null Hypothesis were true, there would be only an X% chance of seeing a relationship at least this strong.

Null Hypothesis: The linear coefficient is equal to zero.

Defend the Model v2

P-value (again): We can be (100 – X)% confident that the correlation we're seeing can be explained by our model.

R-Squared: Our model explains about Y% of the variability in Box Office. The rest is the scatter of points around the regression line, which is fit by Least Squares.
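To make the two numbers concrete, the sketch below computes R-Squared directly and estimates a p-value with a permutation test: shuffle Box Office across movies many times and count how often the shuffled slope is at least as large as the real one. This is a swapped-in technique chosen because it needs only the standard library; it is not necessarily how the p-value on the slide was computed. All names and data are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for Y = a + bX; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def r_squared(xs, ys):
    """Share of the variability in Y that the fitted line explains."""
    a, b = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Estimate how often a slope this large appears when X and Y are unrelated."""
    rng = random.Random(seed)
    observed = abs(fit_line(xs, ys)[1])
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break any real link between X and Y
        if abs(fit_line(xs, shuffled)[1]) >= observed:
            hits += 1
    # +1 correction so the estimate is never exactly zero
    return (hits + 1) / (trials + 1)


# Hypothetical data: social volume and box office for eight movies
xs = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000]
ys = [260_000, 470_000, 750_000, 940_000, 1_230_000, 1_410_000, 1_700_000, 1_900_000]

r2 = r_squared(xs, ys)
p = permutation_p_value(xs, ys)
print(f"R-squared = {r2:.3f}, permutation p-value = {p:.4f}")
```

A small p-value says the slope is unlikely under the Null Hypothesis; a high R-Squared says the line tracks the data closely. They answer different questions, so both are worth reporting.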

Defend the Model v3

Every Bitly click predicts about $240 in Box Office Sales

I’m extremely confident (99%) that this is not due to chance.

With ~96% confidence we can rely on this model in the future.

The Model (cont)

Y “is predicted by” a + bX

Box Office = 0 + $240 * (# Bitly clicks)

Box Office = 0 + $130 * (# tweets)
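Applied to a new movie, the fitted model is just a multiplication. The sketch below uses the two coefficients from the slide; the function name and the click and tweet counts are hypothetical.

```python
# Coefficients from the slide: predicted box office dollars per interaction
DOLLARS_PER_BITLY_CLICK = 240
DOLLARS_PER_TWEET = 130


def predict_box_office(count, dollars_per_unit, offset=0):
    """Y = a + bX with a = offset, b = dollars_per_unit, X = count."""
    return offset + dollars_per_unit * count


# Hypothetical movie with 50,000 Bitly clicks and 100,000 tweets
from_clicks = predict_box_office(50_000, DOLLARS_PER_BITLY_CLICK)
from_tweets = predict_box_office(100_000, DOLLARS_PER_TWEET)
print(from_clicks, from_tweets)
```

The two sources give separate single-predictor estimates; they are not meant to be added together.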

Benchmarking

If my Bitly #’s drop below $240

If my Twitter #’s drop below $130

If my Instagram #’s drop below $2809

If my Facebook #’s drop below $3871
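One reading of these benchmarks is as an expected box office yield per interaction on each channel. Under that assumption, a minimal sketch of a check that flags channels falling below their benchmark could look like this. The dollar figures come from the slides; the function name, the movie's box office, and the counts are hypothetical.

```python
# Benchmarks from the slides: box office dollars per interaction
BENCHMARKS = {
    "bitly": 240,
    "twitter": 130,
    "instagram": 2809,
    "facebook": 3871,
}


def channels_below_benchmark(box_office, counts):
    """Return channels whose dollars-per-interaction is under the benchmark."""
    flagged = []
    for channel, benchmark in BENCHMARKS.items():
        count = counts.get(channel, 0)
        if count == 0:
            continue  # no interactions recorded, nothing to compare
        if box_office / count < benchmark:
            flagged.append(channel)
    return flagged


# Hypothetical movie: $10MM box office and these interaction counts
counts = {"bitly": 50_000, "twitter": 50_000, "instagram": 5_000, "facebook": 2_000}
flagged = channels_below_benchmark(10_000_000, counts)
print(flagged)
```

Here Bitly ($200 per click) and Instagram ($2,000 per photo) fall under their benchmarks, while Twitter and Facebook do not.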

Other Considerations

Residuals

Other Regression (Logarithmic, Exponential, Polynomial)

“Overfitting”
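Residuals are the gaps between actual and predicted values, and a pattern in them is one hint that a different regression form might fit better. The sketch below fits a straight line to deliberately curved, made-up data so the pattern is easy to see; the names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for Y = a + bX; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def residuals(xs, ys):
    """Actual minus predicted for each point under the fitted line."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Made-up curved data (Y = X squared), fit with a straight line anyway
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]

res = residuals(xs, ys)
print(res)
```

The residuals are positive at both ends and negative in the middle. That U shape means the line systematically misses the curve, which is the cue to try a polynomial or other form, while watching for overfitting as the model gets more flexible.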

Additional Dimensions

DataSift Social Data:

Gender

Income

Geography

“Influence”

Industry vs Consumers

Getting Started

tim.shea@datasift.com

@sheanineseven

http://bit.ly/DataSiftBigDataCamp

Thanks for Listening!
