58
Practical Data Science @ Etsy Dr. Jason Davis

Practical Data Science @ Etsy

Embed Size (px)

Citation preview

Page 1: Practical Data Science @ Etsy

Practical Data Science @ Etsy

Dr. Jason Davis

Page 2: Practical Data Science @ Etsy

About me

Ph.D. Machine learning & data mining

Entrepreneur. Founder of ad startup Adtuitive

Engineering Director @ Etsy. Search & Data

Page 3: Practical Data Science @ Etsy

TopicsEtsy's data science infrastructure

The Precision / Recall tradeoff

Building products on the web using data science

Data science statistics

Analyzing products with data

Page 4: Practical Data Science @ Etsy

Etsy's data science infrastructure

Page 5: Practical Data Science @ Etsy

Mapreduce and ElasticityScalability

Operability

Page 6: Practical Data Science @ Etsy

Elastic MapReduce (EMR) @ Etsy

Page 7: Practical Data Science @ Etsy

EMR Scalability0 to 1,000 machines to 0. Instantly.

Simplified job scheduling

Easy to "backfill" and run expensive jobs.

Page 8: Practical Data Science @ Etsy

EMR Operability

Barnum does not persist clusters

1. Start job: create new cluster2. Sync data from S33. Run mapreduce analysis4. Shutdown cluster and sync results to S3

No operability support required

Page 9: Practical Data Science @ Etsy

Precision vs. Recall

Page 10: Practical Data Science @ Etsy

Precision vs. Recall

Recall = |Target AND Algorithm | / | Target |Precision = |Target AND Algorithm | / | Algorithm |

Page 11: Practical Data Science @ Etsy

Academic Diversion:Distance Metric Learning via LogDet Divergence

Goal: learn distance measure to respect constraints

Page 12: Practical Data Science @ Etsy

Academic diversions (cont.)

Page 13: Practical Data Science @ Etsy

Recall vs. Precision

Page 14: Practical Data Science @ Etsy

Adtuitive: High precision retail matching

Precision correlates with database sizeScale the db & simple measures work well

Page 15: Practical Data Science @ Etsy

Etsy.com search: relevancy vs recency

Page 16: Practical Data Science @ Etsy

Search on Etsy: relevancy vs recency

Buyer: "I want to see bananas"Seller "I want to see my bananas"

Buyers: prefer high precisionSellers: prefer high recall

Page 17: Practical Data Science @ Etsy

Building products with data

Page 18: Practical Data Science @ Etsy

SELECT * FROM listings WHERE title LIKE '%silver%' ORDER BY creation_tsz;

Page 19: Practical Data Science @ Etsy
Page 20: Practical Data Science @ Etsy

Recall domination

I was looking for a pink bonnet with blue polka dots. I searched for "polka dot bonnet" but all I got were vintage polka records. Then I wanted a classic rock album and searched for "rock and roll" and got nothing but circular stones.

http://www.etsy.com/blog/news/2007/search-what-you-wanted-and-what-you-got/

Page 21: Practical Data Science @ Etsy

The switch to search relevancy

Page 22: Practical Data Science @ Etsy

The switch to relevancy

Better buyer experience● Higher precision via better search algorithms

Better advertising for sellers● Opportunity to pay for recall via Search ads

Better search insight for sellers● Visibility into search placement via Shop

Stats

Page 23: Practical Data Science @ Etsy

Step 1: Understand your data

Page 24: Practical Data Science @ Etsy

Dig through raw data

Garbage in, garbage outVisit logs, Raw query logs

80% of data science is data cleaning!

Page 25: Practical Data Science @ Etsy

Descriptive statistics

Goal: reject hypotheses as quickly as possible

search depthquery conversion rates

Page 26: Practical Data Science @ Etsy

Step 2: Invest in tools

Page 27: Practical Data Science @ Etsy

Good tools provide visibility into the intersection of algorithms & data

Lunr

Page 28: Practical Data Science @ Etsy

Side by side testing

Page 29: Practical Data Science @ Etsy

A/B Analyzer

Page 30: Practical Data Science @ Etsy

Shop Stats

Page 31: Practical Data Science @ Etsy

Step 3: Understand Tradeoffs

(most tradeoffs don't look like this)

Page 32: Practical Data Science @ Etsy

Queries a, b, & c are better. Queries x & y are not.

Page 33: Practical Data Science @ Etsy

Thinking short-term vs. long-term

How will the change affect behavior?

Affect expectations?

User incentives?

Improved data?

Page 34: Practical Data Science @ Etsy

Step 4: Appreciate Product Complexity

Page 35: Practical Data Science @ Etsy

Algorithmic Complexity

vs

Page 36: Practical Data Science @ Etsy

Improvement / Complexity Tradeoff

Summer 2011 Search Relevance ● 4 months● 30 experiments● 11 wins

Page 37: Practical Data Science @ Etsy

UIComplexity

Page 38: Practical Data Science @ Etsy

Communications & Support complexity

Page 39: Practical Data Science @ Etsy

Data science statistics (at Etsy)

Page 40: Practical Data Science @ Etsy

Popular web metrics● Conversion rate: # of purchases / # of visits● Registration rate: # of registrations / # of visits● Funnel completion rates

Scenario:

● OMG, I just deployed unicorns to the site...● The registration rate increased from 2.53% to 2.58%● Where's the champagne?

Page 41: Practical Data Science @ Etsy

The binomial distribution

Completion rates are binomial● Visitor completes task (success)● Visitor does not complete task (failure)

One parameter: p = probability of success

Other fun facts● The binomial distribution is not the normal distribution● 0 <= p <= 1

Page 42: Practical Data Science @ Etsy

Confidence IntervalsHypothesis: my coin is biased and p = 0.6

Supporting data: 10 flips, 6 heads, p_observed = 0.6

What happens if I rerun the experiment?

What happens if I rerun the experiment 100 times?

● Answer: 95 out of 100 times, 0.262 < p_observed < 0.878

Page 43: Practical Data Science @ Etsy

The binomial confidence interval

Things to know● Small p values -> exponentially more data● p values near 0.5 -> less data● Not symmetric

Example● success = 6, trials = 100● interval: [0.022, 0.126]

Page 44: Practical Data Science @ Etsy

Example: Unicorn Search

Experiment setup: 50/50 unicorns / non-unicorns

Hypothesis: unicorns increase conversion rate by 2%

Assumptions● 500k visits with search per day● Baseline conversion is 2.0% ● Hypothesis conversion is 2.04%

Page 45: Practical Data Science @ Etsy

Example: Unicorn SearchAfter 7 days

● 3.5M search visits, 1.75M per bucket● Non-unicorns: 35,000 conversions

○ [0.0198, 0.0202]● Unicorns: 35,700 conversions

○ [0.0202, 0.0206]

Conclusion: We need an entire week to measure this improvement (!)

Page 46: Practical Data Science @ Etsy

Analyzing products with data

Page 47: Practical Data Science @ Etsy

Web analytics 101

Page view

Visit

Visitor

Bounce

Referer

Page 48: Practical Data Science @ Etsy

Funnels

Page 49: Practical Data Science @ Etsy

Example: search funnel

Search => View listing => Add to cart

Page 50: Practical Data Science @ Etsy

The product lifecycle

Page 51: Practical Data Science @ Etsy

Controlled Experiments

Page 52: Practical Data Science @ Etsy

Controlled Experiments

Page 53: Practical Data Science @ Etsy

Experimental Steps

1. Think of something awesome; implement it2. Release as controlled experiment to small

percentage of users3. Wait a week4. Analyze on-site behavior comparing control

vs test group5. Apply statistics and determine if the change

is a win or a loss

Page 54: Practical Data Science @ Etsy

Experimentation @ Etsy

Continuous deployment

Simple experimental configuration

Automated analysis

Page 55: Practical Data Science @ Etsy

Continuous deployment

All code is written from "trunk"

Changes are pushed out incrementally

40 times per day

Page 56: Practical Data Science @ Etsy

Simple experimental configuration

$server_config['search_infinite'] = array( 'description' => "Search Infinite Scroll", 'enabled' => 'rampup', 'rampup' => array( 'percent' => 50.0, 'url_override' => "infinite", ));

Simple configuration => more experimentation

Page 57: Practical Data Science @ Etsy

The A/B Analyzer

Percentage compared to control

Indication of statistical significance

Page 58: Practical Data Science @ Etsy

Experimental success

Category I: Fail! But you learned somethingCategory II: Micro-scopic success. Product- level metrics increased, larger Etsy metrics did notCategory III: Macro-scopic success