Data Science at Pebble: Analyzing Data to Make Smarter Watches
June 2, 2015
Today’s speakers
Scott Ward
Solutions Architect
Amazon Web Services
Kiyoto Tamura
Head of Marketing
Treasure Data
Susan Holcomb
Head of Analytics
Pebble
Data at Pebble
What is Pebble?
• Customizable smart watch with crowd-pleasing history
• $10.3MM on Kickstarter with first product
• In March, $20MM on Kickstarter with new product
Pebble Data Team: Then vs. Now
One year ago…
No data team
No analytics infrastructure
Barely any data
Barely any insights
Today…
5-person team (& growing!)
Scalable analytics infrastructure via Treasure Data
~60MM records per day
New product influenced by data insights
Data Science Workflow
Define the problem
Acquire the data
Fit the model
the work the hype
Pebble’s First Problem
How should we measure product success?
Engagement Definition
• How can we tell someone likes the watch?
– Button presses?
– Apps downloaded / launched?
– Minimized SW bugs?
– A crazy formula combining these?
• Simplest: They are wearing the watch
– Use accelerometer
Accessing Data
60 MM records per day
Scheduled jobs in TD to post-process & aggregate data
Ad hoc queries in TD to explore data (Presto, Hive)
Dashboards
Standardized output
Process: ~30 queries to get one result
Accelerometer noise threshold
• Accelerometer picks up gestures, net motion (so we can enable cool features)
• Sensitive enough to pick up vibrations of passing train
• Goal: Determine threshold for noise so we can assess when watch is really in use
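A minimal sketch of the thresholding idea, assuming hypothetical per-minute motion readings. The data format, the function name `minutes_in_use`, and the threshold values are illustrative assumptions, not Pebble's actual pipeline.

```python
from typing import Iterable


def minutes_in_use(motion_per_minute: Iterable[float], noise_threshold: float) -> int:
    """Count the minutes whose motion reading exceeds the noise threshold."""
    return sum(1 for m in motion_per_minute if m > noise_threshold)


# Hypothetical readings: small values stand in for ambient vibration
# (e.g. a passing train), large values for real wrist motion.
readings = [0.0, 2.0, 3.0, 1.0, 250.0, 400.0, 2.0, 310.0]

low = minutes_in_use(readings, noise_threshold=0.0)    # counts every nonzero reading
high = minutes_in_use(readings, noise_threshold=50.0)  # counts only the large readings
print(low, high)
```

With a threshold that is too low, ambient vibration is counted as wear time, which is why the threshold has to be determined from the data.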
Accelerometer noise threshold
First result
???
Raising the threshold
peaks shift left
spike remains
backlight data matches original threshold!!
Further validated by survey of users
Why this worked
• Rapid, repeated ad hoc querying lets you get an intuitive picture of the data
– What is the range?
– Where are the errors?
– Where are the inflection points?
• Few analytics infrastructure tools optimize for this
– Too focused on standardized reporting
– Want to sell you black box that spits out “insights”
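The kind of repeated exploration described above can be sketched in Python as a threshold sweep over a small sample. The values, the candidate thresholds, and the helper name `sweep_thresholds` are hypothetical; in the talk the equivalent steps were run as ad hoc queries in Treasure Data.

```python
def sweep_thresholds(values, thresholds):
    """For each candidate threshold, return the fraction of values above it."""
    n = len(values)
    return {t: sum(v > t for v in values) / n for t in thresholds}


# Hypothetical sample with a cluster of small values and a cluster of large ones.
values = [1, 2, 2, 3, 200, 220, 250, 300]

# What is the range?
value_range = (min(values), max(values))

# Where are the inflection points? Look for thresholds where the fraction jumps.
fractions = sweep_thresholds(values, [0, 5, 50, 100, 500])
for threshold, fraction in fractions.items():
    print(threshold, fraction)
```

A flat stretch in the sweep (here between 5 and 100) suggests a gap between two populations, which is a natural place to look for a noise threshold.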
Problems 2-n
• Building scalable reporting system
• Delivering insights that shaped interface for new product
• Discovering signals on user attrition
• Designing models to segment use cases
• Analyzing dozens of product elements to improve product experience
thanks <3
Product Overview
Kiyoto Tamura
Director of Developer Relations
Event Data is Everywhere…
Smartphones, Websites, Home Automation, Wearable Devices, Connected Vehicles
{
  "timestamp": "2015-05-22T13:50:00-0600",
  "event": "tap",
  "object": "button_32",
  "user": {
    "name": "Luca",
    "email": "luca@treasuredata.com",
    "twitter": "luckymethod"
  }
}
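As an illustration of working with such an event, the following Python sketch parses the record above using only the standard library. The helper name `parse_event` is an assumption made for this example and is not part of any Treasure Data API.

```python
import json
from datetime import datetime

# The sample event from the slide, written with plain ASCII quotes.
raw = (
    '{"timestamp": "2015-05-22T13:50:00-0600", "event": "tap", '
    '"object": "button_32", "user": {"name": "Luca", '
    '"email": "luca@treasuredata.com", "twitter": "luckymethod"}}'
)


def parse_event(text):
    """Decode one JSON event and convert its timestamp to an aware datetime."""
    record = json.loads(text)
    record["timestamp"] = datetime.strptime(
        record["timestamp"], "%Y-%m-%dT%H:%M:%S%z"
    )
    return record


event = parse_event(raw)
print(event["event"], event["object"], event["user"]["name"])
print(event["timestamp"].isoformat())
```

Nested fields such as `user.name` come through as ordinary dictionaries, so no schema has to be declared before reading them.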
Connecting the (big) data dots is hard
credit: Matt Turck @ FirstMark Capital
We provide a simple solution
Ingest Analyze Distribute
and more…
• Streaming or Batch ingestion (or both) with Treasure Agent and Embulk
• Don’t worry about changing the way you send data; Treasure Data handles it all
• 99.99% uptime: our team takes care of running the show so you don’t have to
• Query all your data using SQL, no schema required
• Control Treasure Data through our Console, our Command Line Interface or Luigi-TD for complex automated data pipelines
• Choose Hive or Presto
• Run machine learning at scale with Hivemall
• Expansive collection of export plugins: send data to Google Docs, Tableau, Excel, PostgreSQL…
• Connect your favorite BI tool
• Fine-grained user access control to your data
Why is Treasure Data better?
Ingest Analyze Distribute
Commerce, Technology, Gaming, Media & Ad Tech
Our growing customer base
Energy Company
IoT
Treasure Data on AWS
EC2
• API Servers (c3.2xlarge)
• Hadoop workers (c3.8xlarge)
• Generic workers (c3.4xlarge)
S3
• Powers our schema-free, columnar store
• 50 billion events/day
• No capacity planning needed!
RDS
• Both MySQL & PostgreSQL
• Reduced ops cost
• No dedicated devops for 2.5 years
Amazon Relational Database Service (RDS)
Amazon RDS is a fully managed relational DB service that is:
– Simple to deploy
– Easy to scale
– Reliable
– Cost-effective
Ease of deployment and patching
Push-button scalability
Choice of DB Engines
Automated backups
User snapshots and cloning
Monitoring and automatic host replacement
Amazon RDS for Aurora (Preview)
Amazon RDS - Multi-Availability Zone Configuration
• Configure your RDS environment for high availability and DR
• Primary database running in one Availability Zone with Standby in another
• DNS record for the database endpoint switches to the Standby when the primary RDS instance or its Availability Zone becomes unhealthy
RDS Multi-Availability Zone Architecture
[Diagram: Availability Zone #1 and Availability Zone #2, each with Web Tier and App Tier Auto Scaling groups, plus an RDPGW]
Amazon RDS - Read Replicas
Region #1 Region #2
Questions?
Treasure Data
Kiyoto Tamura
@kiyototamura
treasuredata.com
Pebble
Susan Holcomb
getpebble.com
AWS
Scott Ward
aws.amazon.com
Contact us to learn more