1
Reporting, Analytics, and Tableau at Spil Games: 0-100 KPH in One Year
Presented by: Rob Winters
30 October 2012
2
• 200 million UVs per month (Google Analytics)
• 2-3 billion pageviews per month, >1 billion game plays
• Local portals in 19 languages with traffic from 219 countries a month
• Developer, Publisher, and Platform
• Target audience: Girls, Boys, Adult females
Who is Spil Games?
Titles
Portals
3
• Slowing growth in core markets plus increased competition led to slower EBITDA growth
• Focus on personalization and user-centricity (changing market expectations)
• Change in revenue streams and products (Advertising to End User Monetization)
Why Reporting and Analytics Became Important
These guys showed that data mattered
4
August 2011: Starting from Ground Zero
Data “Piggy Bank”
• 5 GB of data
• Unindexed/Unusable
• Direct copies of some production data
Reporting
• Two Dashboards
• Manually generated weekly
“Analytics”
5
Analytics
Reporting
Data Platform
Today we have a different landscape
Data Warehouse
• >700 GB compressed / >2.5 TB uncompressed
• Largest tables load >50M records/day
MapReduce
• >150M events/day
• >500M events/day by Q1 2013
Tableau Server
• Daily, Weekly, and Monthly Push Reports
• >192 views, 5.5 GB of extracts and data sources
• 100+ sessions per day
Tableau Desktop plus R
• Five desktop users of Tableau
• Driving company forecast process and planning
• Analysis forms backbone of new strategies
6
Two Analytics Specialists
One Reporting Specialist
Two DBAs
One Hadoop Developer
One Freelance Python Developer
The Team that Built It
7
Why we chose Tableau
• Speed of Development
• Flexibility in dealing with messy data
• Combined Analytics and Reporting
• From the best city in the world
8
Reporting
9
• Reports are scheduled and pushed daily, weekly, and monthly from the Tableau Server
• Tools used:
  • Tableau Server + tabcmd: pull reports
  • sqlRun: ODBC command line tool used to build message bodies
  • BLAT: command line email client
Push Reporting
SQL Processes
• Grab key values from DWH
• Log into the Tableau Server Postgres DB and confirm that the extract update ran
Build message body
• Modify DB outputs to match expected format (ex. “.” -> “,”)
• Echo text into body text file
Pull reports from Tableau Server
• start /wait "Logoff" "Tabcmd" logout
• start /wait "Login" "Tabcmd" login -s https://reporting.spilgames.com -u USERNAME -p PASSWORD
• start /wait "Pull Report" "Tabcmd" get views/DailyEUMRevenueReport/DailyEUMRevenueReportPush.pdf -f "X:\Reporting data\Today's Reports\Daily EUM Report for %AWSDT%.pdf"
Send Emails and Archive
• Email list managed via ActiveDirectory, emails pushed via SMTP
• Every push report archived to share drive to have permanent record
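The build and pull steps above could be sketched in Python roughly as follows. This is only an illustration, not the scripts Spil used: the KPI names, the helper names, and the injectable `run` argument are assumptions, while the tabcmd subcommands (`logout`, `login -s -u -p`, `get ... -f`) are the ones shown on the slide.

```python
import subprocess


def to_decimal_comma(value, places=2):
    """Format a number with a decimal comma, e.g. 1234.5 -> '1234,50'."""
    return f"{value:.{places}f}".replace(".", ",")


def build_message_body(kpis):
    """Build one 'name: value' line per KPI (KPI names are hypothetical)."""
    return "\n".join(f"{name}: {to_decimal_comma(v)}" for name, v in kpis.items())


def tabcmd_commands(server, user, password, view_path, out_file):
    """Return the logout, login, and get invocations as argument lists."""
    return [
        ["tabcmd", "logout"],
        ["tabcmd", "login", "-s", server, "-u", user, "-p", password],
        ["tabcmd", "get", view_path, "-f", out_file],
    ]


def pull_report(commands, run=subprocess.run):
    """Run each command in order; 'run' is injectable so it can be stubbed."""
    for cmd in commands:
        run(cmd, check=True)
```

Passing argument lists instead of one shell string avoids quoting problems with paths that contain spaces, such as the report folder on the slide.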
10
Sample Push Report
11
• 25 power users (10% of local office)
• 5 sites:
  • Reporting
  • Development
  • Three sites for business partners
• 87 workbooks, 192 views
• >100 sessions per day
• One dashboard accounts for 50% of all views
Web reporting platform
Reporting Platform
12
Change to the site / Issue
• Change: Added dashboards to various pages to show report update status, primary KPIs on the landing page, etc.
  Issue: Slow load speed of reports caused issues; different permissions between sites led to partners having issues
• Change: Custom HTML on the page to link to documentation, report request forms, and email the reporting team
  Issue: Due to Tableau’s “update” process, HTML would have to be manually replaced with each version change
• Change: Custom CSS to match branding
  Issue: Same issue as custom HTML
What we have done that DIDN’T work and other issues
13
• Tableau butchers custom SQL. When possible, use views, tables, or projections
• Huge amounts of usage data are available on the server back-end; use it to your advantage
• Tabcmd can handle custom variables easily, opening the potential for users to request highly personalized reports (or batch produce reports with a loop)
• Balance flexibility with data size, and use extracts for reports which require significant dimensionality
• If using the server for multiple functions (ex. reporting AND analysis-sharing), make separate sites to avoid confusion on data quality
• Use parameters and actions to make your report dynamic
• You can lead a horse to water, but you can’t make it drink
• Make it easy to search with tags
• Provide easy access to documentation and contact forms
• Resist the urge to make duplications of data for users wanting slightly modified reports
Recommendations for Reporting
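The tabcmd recommendation (batch producing personalized reports with a loop) might look like the sketch below. It assumes that the view accepts a filter passed as a URL query parameter on the `tabcmd get` path; the filter field `Market`, the output folder, and the function name are hypothetical and not taken from the slides.

```python
from urllib.parse import quote, urlencode


def personalized_get_commands(view_path, filter_field, values, out_dir):
    """Build one 'tabcmd get' argument list per filter value."""
    commands = []
    for value in values:
        # Percent-encode the filter value so spaces survive in the URL.
        query = urlencode({filter_field: value}, quote_via=quote)
        out_file = f"{out_dir}/{filter_field}_{value}.pdf"
        commands.append(["tabcmd", "get", f"{view_path}?{query}", "-f", out_file])
    return commands
```

Each returned list can then be handed to `subprocess.run` after a `tabcmd login`, giving one PDF per market from a single workbook instead of one duplicated workbook per request.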
14
Analytics
15
Analytics at Spil: many tools make effective work
R plus Tableau combine to form a well-trained athlete
Explore data in SQL and Tableau
Build Models in R and evaluate
Test via A/B Testing
Implement reporting with Tableau
Both form a critical part of Spil’s analytics
But for simple problems, Tableau is sufficient
16
When we use Tableau
• Multidimensional trending analysis (including comparing trends)
• Distribution analysis
• Visualization of small multiples
• Exploratory analysis
When we use R
• Modeling/forecasts (ARIMA, regression, etc.)
• Seasonal decomposition
• Tree-based analysis
• Statistical analysis (correlations, t-tests, ANOVA)
• Data mining
We have found each tool is optimal for different purposes
17
• Structuring your data BEFORE Tableau forces you to consider dimensions/attributes. SQL and Hive are your friends.
• We are TOO good at seeing patterns, so “trust but verify” what you learn from Tableau with more robust tools like SAS or R.
• Remember Occam’s Razor: Use the simplest possible visualization that can accurately convey the information, but no simpler.
Analysis Advice
18
Case: Content Recommendation A/B Test
19
Most content on the home page was geared to the under-12 audience, yet analysis showed older users were more valuable
Can we ensure that the content interests for the most valuable users are met?
20
Work flow:
1. A variety of base and calculated variables were created in SQL and loaded into a reference table
2. Data was loaded into Tableau and explored to find “natural”/visual relationships and break points
3. New variables were created or added based on visual exploration
4. Revised data set was loaded into R for modeling
• Step 1: Stepwise logistic models predicting probability of game play based on variables from step 3
• Step 2: Build behavioral clustering models and compare to demographic segmentation
• Step 3: Model other industry standard approaches (ex. slope one, cosine similarity) in R and measure reduction in AIC
5. Users were assigned to appropriate clusters and distributions of various variables explored in R
6. Models were tuned and made ready for production
Tableau and R were used simultaneously to accelerate analysis and modeling process
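One of the industry standard approaches named in the work flow, cosine similarity, can be illustrated with a small Python sketch. The slides say the modeling was done in R, so this is a stand-in rather than the original code; it treats each game as the set of users who played it, and the game and user identifiers are made up.

```python
import math


def cosine_similarity(players_a, players_b):
    """Cosine similarity of two games represented as sets of player ids.

    For binary play/no-play vectors this reduces to
    |A & B| / sqrt(|A| * |B|).
    """
    if not players_a or not players_b:
        return 0.0
    shared = len(players_a & players_b)
    return shared / math.sqrt(len(players_a) * len(players_b))


def most_similar(game, players_by_game):
    """Rank all other games by similarity to 'game', highest first."""
    scores = [
        (other, cosine_similarity(players_by_game[game], players))
        for other, players in players_by_game.items()
        if other != game
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Games whose player bases overlap heavily score close to 1, which makes them natural candidates to recommend to each other's players.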
21
Segmentation: K-means clustering on 30+ factors, dividing the user base into 30 different behavioral segments, plus demographic boosting
Ultimately, an ensemble of models was built to recommend content
Content selection: Ensemble model based on drivers predictive of play
• Cosine similarity of player bases and probability of play
• Weighted slope one modeling of relative play rates
• General user feedback from user ratings and relative time on page
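The weighted slope one component can be sketched as follows. This is the textbook weighted slope one scheme written in Python for illustration, not Spil's production model; the per-user scores stand in for whatever relative play rates were actually used, and all names are assumptions.

```python
from collections import defaultdict


def fit_slope_one(scores_by_user):
    """Accumulate pairwise score differences and co-occurrence counts."""
    diffs = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for scores in scores_by_user.values():
        for j, s_j in scores.items():
            for i, s_i in scores.items():
                if i != j:
                    diffs[j][i] += s_j - s_i
                    counts[j][i] += 1
    return diffs, counts


def predict(user_scores, target, diffs, counts):
    """Weighted slope one prediction of 'target' for one user.

    Each known item i contributes (average deviation + user's score for i),
    weighted by how many users scored both i and the target.
    """
    numerator = 0.0
    denominator = 0
    for i, s_i in user_scores.items():
        c = counts.get(target, {}).get(i, 0)
        if i == target or c == 0:
            continue
        numerator += diffs[target][i] + s_i * c
        denominator += c
    return numerator / denominator if denominator else None
```

The weighting means item pairs observed together by many users dominate the prediction, which dampens noise from rarely co-played titles.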
22
Case: Monthly Forecasting Process
23
Bottom-up ARIMA forecast is generated for each core market/business channel/traffic source split (approximately 500 forecasts)
1. Traffic (visits) is forecast using R’s auto-ARIMA functionality
• Multiple ARIMA models plus time series linear regressions are built and compared based on AIC/AICc, with the best-fit model selected
• Forecasts are then rolled up to market/channel level (approx. 120 forecasts)
2. Primary interactions (casual and social gameplays) are forecast on a per-visit basis based on historical patterns and known seasonality matrices
3. Secondary interactions (navigational pageviews) are forecast based on primary interaction forecasts, historical data, and other regressors
4. Advertising impressions are loaded into the model on a market/channel/page type basis to generate total impressions by type and location
5. eCPMs are forecast on a market/channel/page type basis
6. Forecast data is aggregated and loaded into the data warehouse for tracking
Step One: Initial forecast is built using R
Completely parallel: Using a quad-core machine, total forecasting time is under one hour per month
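The model selection and roll-up in step 1 can be illustrated with a short Python sketch. The slides describe this being done with R's auto-ARIMA functionality; the code below only shows the AICc comparison and the aggregation logic, and the candidate labels, log-likelihoods, and split keys are hypothetical.

```python
def aic(log_likelihood, k):
    """Akaike information criterion for a model with k parameters."""
    return 2 * k - 2 * log_likelihood


def aicc(log_likelihood, k, n):
    """Small-sample corrected AIC for n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    return aic(log_likelihood, k) + (2 * k * (k + 1)) / (n - k - 1)


def select_best(candidates, n):
    """Pick the (label, log_likelihood, k) candidate with the lowest AICc."""
    return min(candidates, key=lambda c: aicc(c[1], c[2], n))


def roll_up(forecasts):
    """Sum (market, channel, source) forecasts to (market, channel) level."""
    totals = {}
    for (market, channel, _source), values in forecasts.items():
        key = (market, channel)
        if key in totals:
            totals[key] = [a + b for a, b in zip(totals[key], values)]
        else:
            totals[key] = list(values)
    return totals
```

Because every split is selected and forecast independently, the roughly 500 fits have no dependencies on one another, which is what makes the per-core parallel run described above possible.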
24
Step two: Exploratory variance analysis using Tableau
Why Tableau:
• Faster than R for rapid exploration
• Flexible adjustment of plot structure while exploring data with leadership
• Clear visualization without planning
25
Step Three: Modify forecast with business input and load adjusted forecast into the data warehouse
Excel template (Channel: Family in every column)
Market | Jan 12 | Feb 12 | Mar 12 | Apr 12 | May 12 | Jun 12 | Jul 12 | Aug 12 | Sep 12 | Oct 12 | Nov 12 | Dec 12 | Grand Total
Austria | 1.0 | 1.1 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | .9 | .8 | 1.0 | 1.0 | 1.0 | 11.8
Belgium | 2.5 | 2.7 | 2.4 | 2.5 | 2.3 | 2.5 | 2.3 | 2.4 | 2.2 | 2.3 | 2.5 | 2.7 | 29.2
France | 17.1 | 18.0 | 16.8 | 17.2 | 16.6 | 16.9 | 16.1 | 15.5 | 14.1 | 15.9 | 16.4 | 18.5 | 199.2
Germany | 12.9 | 12.8 | 12.8 | 12.4 | 12.1 | 12.8 | 12.7 | 11.3 | 10.5 | 11.2 | 11.6 | 12.6 | 145.6
Italy | 18.7 | 20.2 | 19.0 | 19.3 | 19.1 | 21.0 | 17.8 | 16.6 | 17.1 | 16.7 | 16.9 | 17.6 | 220.0
Netherlands | 5.6 | 5.4 | 5.2 | 5.0 | 4.9 | 5.2 | 4.7 | 4.2 | 4.0 | 4.6 | 4.6 | 4.9 | 58.3
Poland | 34.1 | 34.8 | 33.2 | 30.9 | 28.7 | 31.4 | 28.8 | 29.5 | 25.2 | 26.7 | 29.2 | 33.8 | 366.2
Portugal | 2.0 | 2.0 | 2.3 | 2.3 | 2.2 | 2.6 | 2.6 | 2.5 | 2.1 | 1.9 | 1.9 | 2.3 | 26.9
Russia | 5.8 | 5.7 | 6.2 | 5.4 | 5.1 | 4.2 | 3.5 | 3.7 | 3.4 | 4.1 | 4.5 | 4.7 | 56.2
Spain | 5.2 | 5.4 | 5.4 | 5.6 | 5.3 | 5.8 | 5.1 | 5.1 | 4.9 | 4.6 | 4.4 | 5.3 | 62.1
Sweden | 3.1 | 3.0 | 2.9 | 2.8 | 2.6 | 2.6 | 2.1 | 2.3 | 2.2 | 2.6 | 2.7 | 3.0 | 31.9
Switzerland | .9 | .9 | .9 | .9 | .9 | .9 | .8 | .8 | .7 | .9 | .9 | .9 | 10.5
Ukraine | 1.5 | 1.6 | 1.7 | 1.5 | 1.3 | 1.1 | .9 | .9 | .8 | .9 | .9 | 1.0 | 14.1
United Kingdom | 2.5 | 2.7 | 2.6 | 2.7 | 2.6 | 2.6 | 2.7 | 2.4 | 2.0 | 2.4 | 2.4 | 2.8 | 30.6
United States | 5.1 | 5.3 | 5.4 | 5.0 | 5.1 | 5.4 | 5.4 | 4.8 | 4.3 | 4.4 | 4.8 | 5.2 | 60.3
Canada | 1.1 | 1.1 | 1.2 | 1.1 | 1.0 | 1.0 | 1.0 | 1.0 | .9 | 1.0 | 1.0 | 1.1 | 12.5
Turkey | 29.9 | 25.0 | 25.8 | 23.5 | 24.0 | 24.7 | 23.6 | 23.8 | 22.0 | 20.7 | 20.9 | 21.1 | 285.1
India | .5 | .5 | .7 | .8 | 1.0 | .9 | .6 | .6 | .6 | .6 | .6 | .6 | 8.0
Indonesia | .7 | .5 | .6 | .7 | .8 | .9 | 1.0 | 1.0 | .7 | .6 | .7 | .8 | 9.0
Argentina | 9.7 | 10.1 | 9.9 | 9.4 | 10.0 | 10.6 | 11.5 | 10.8 | 9.5 | 9.7 | 8.2 | 9.5 | 119.0
Brazil | 27.2 | 23.7 | 22.8 | 21.7 | 22.8 | 24.0 | 24.7 | 22.7 | 20.5 | 20.6 | 19.2 | 22.5 | 272.6
Mexico | 12.0 | 12.0 | 13.2 | 13.4 | 13.8 | 14.4 | 14.4 | 13.3 | 10.2 | 10.0 | 9.6 | 11.3 | 147.6
LATAM | 17.6 | 16.6 | 16.8 | 16.4 | 16.4 | 18.2 | 18.6 | 18.0 | 15.3 | 15.3 | 14.2 | 16.5 | 199.8
ROW | 15.7 | 13.8 | 14.5 | 13.5 | 13.7 | 14.8 | 14.4 | 13.2 | 11.2 | 10.9 | 11.2 | 12.6 | 159.6
Grand Total | 232.6 | 225.1 | 223.3 | 214.9 | 213.5 | 225.8 | 216.4 | 207.3 | 185.2 | 189.4 | 190.3 | 212.4 | 2536.1
Second block of the template: every market shows .0, and the Grand Total row shows .0 for each month and for the year.
Third block of the template: repeats the values of the first block unchanged.
1. Forecast data is loaded into Excel template (right) to load in adjustments
2. Channel/market leaders provide feedback on initiatives and expected impact, along with non-initiative adjustments (if needed)
3. Revised forecast data is committed and uploaded; primary outputs (visits, pageviews, gameplays, advertising revenues) are recalculated
4. Final forecast is shared with Management Team
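The adjust-and-recalculate steps above can be sketched in Python. The slides do not say how adjustments are combined with the base forecast, so the additive rule here is an assumption, as are the function names; the advertising revenue helper uses the standard eCPM definition (revenue per thousand impressions). The sample figures for Austria and Switzerland are the first three months from the template.

```python
def apply_adjustments(base, adjustments):
    """Add per-month adjustments to each market's base forecast.

    Assumption: adjustments are additive; markets without an entry
    are left unchanged.
    """
    adjusted = {}
    for market, months in base.items():
        deltas = adjustments.get(market, [0.0] * len(months))
        adjusted[market] = [round(b + d, 1) for b, d in zip(months, deltas)]
    return adjusted


def monthly_totals(forecast):
    """Recompute the Grand Total row, one value per month."""
    return [round(sum(column), 1) for column in zip(*forecast.values())]


def ad_revenue(impressions, ecpm):
    """Advertising revenue from impressions and eCPM (revenue per 1,000)."""
    return impressions * ecpm / 1000.0


base = {"Austria": [1.0, 1.1, 1.0], "Switzerland": [0.9, 0.9, 0.9]}
no_change = apply_adjustments(base, {})
```

With an empty adjustment set the adjusted forecast equals the base forecast, which matches the all-zero template shown on the slide.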
26
Step Four: Activity is monitored within Tableau