Customer Behaviour Modelling
Insights from customer data
Redbubble
Anuj Luthra
Product Development
● Millions of users
● Lots of ideas
● Lots of unquantified & unvalidated assumptions
● What are the biggest problems?
● What should we pursue first? (= the best opportunity)
● We want to build the right thing
Existing Techniques
● User interviews and surveys
○ Interpreting wants and needs is tricky
■ Not dependable
○ Expensive & time-consuming
● Analytics tools (Google Analytics, Flurry) provide high-level views
○ Difficult to gauge the effect of each variable on its own: with lots of factors at play, how much did a single thing affect the outcome?
What is lacking
● Ability to get insights from real user actions/visits
● A quick and cheap way to support or reject assumptions
● Confidence: probabilities, accounting for external factors and so on :-)
What we do
● Statistical modelling of customer data to draw inferences
● Quantifying the relative impact of user behaviours and visit attributes
“Let’s put some science in data analysis”
Strongest Hypotheses
● Give a starting point
● Define the goal for measuring success
● Keep you focussed and honest
● Hunches are powerful: use domain knowledge
Identify hypotheses
○ HypothesisA: “Users jumping along & looking at multiple search result pages are having a bad experience”
○ HypothesisB: “Users navigating to a listing from search results are having a good experience”
○ HypothesisC: “Users typing keywords into the search box multiple times are not having a good experience”
Measurable User Journeys
● Identify particular user journeys in a visit
○ HypothesisA: SPPPSPSP
○ HypothesisB: SLL
● Journeys don’t need to be mutually exclusive (and they are not!)
● Lots of log parsing, MapReduce
● The process usually varies for each business
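The log-parsing step can be sketched as a map (key every log record by its visit) followed by a reduce (sort each visit's pages by time and join them into a string such as SPPPSPSP), with journey flags computed on the result. This is a minimal Python sketch under assumptions the slides do not state: records are `(visit_id, timestamp, page_type)` tuples; the letters mean S = a new search, P = paging to another results page and L = a listing page (inferred from the two example journeys); and the thresholds, `SAMPLE_LOG`, `build_sequences` and `journey_flags` are all hypothetical. The flag names reuse the variables from the R example.

```python
from collections import defaultdict

# Hypothetical sample log: (visit_id, timestamp, page_type).
# Assumed page types: S = new search, P = paging through results, L = listing.
SAMPLE_LOG = [
    ("v1", 1, "S"), ("v1", 2, "P"), ("v1", 3, "P"), ("v1", 4, "P"),
    ("v1", 5, "S"), ("v1", 6, "P"), ("v1", 7, "S"), ("v1", 8, "P"),
    ("v2", 1, "S"), ("v2", 2, "L"), ("v2", 3, "L"),
    ("v3", 2, "L"), ("v3", 1, "S"),  # deliberately out of order
]


def build_sequences(records):
    """Map each record to its visit, then reduce to a time-ordered page string."""
    pages_by_visit = defaultdict(list)
    for visit_id, timestamp, page_type in records:
        pages_by_visit[visit_id].append((timestamp, page_type))
    return {
        visit_id: "".join(page for _, page in sorted(pages))
        for visit_id, pages in pages_by_visit.items()
    }


def journey_flags(sequence):
    """Return 0/1 journey flags for one visit; several may be 1 at once."""
    return {
        # HypothesisA: lots of paging around search results (threshold assumed)
        "pagingAroundSearchResults": int(sequence.count("P") >= 3),
        # HypothesisB: a search results page followed directly by a listing
        "clickThroughToListings": int("SL" in sequence),
        # HypothesisC: many separate searches in one visit (threshold assumed)
        "usingSearchBoxTooMuch": int(sequence.count("S") >= 3),
    }


if __name__ == "__main__":
    for visit_id, sequence in build_sequences(SAMPLE_LOG).items():
        print(visit_id, sequence, journey_flags(sequence))
```

Visit v1 (SPPPSPSP) gets both the paging flag and the search-box flag, which is exactly the overlap the slide points out.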
Data Preparation
● Start with a small sample size
● Focus more on quality
● Look out for anomalies & outliers
● Remove correlated variables: they only add noise
Data Visualization
● Visualize your data
○ A simple histogram will tell you a lot
○ Scatter plots are good for identifying outliers
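When a plotting library is not at hand, a text histogram gives the same first look, and Tukey's 1.5 × IQR fences give a numeric complement to spotting outliers on a scatter plot. Both are sketches, not something the slides prescribe: `text_histogram`, `iqr_outliers` and the pages-per-visit sample are hypothetical, and the quartiles come from `statistics.quantiles` with its default (exclusive) method.

```python
from collections import Counter
from statistics import quantiles


def text_histogram(values):
    """One row per distinct value, with one '#' per occurrence."""
    counts = Counter(values)
    return [f"{value:>3} | {'#' * counts[value]}" for value in sorted(counts)]


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    pages_per_visit = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]  # hypothetical sample
    print("\n".join(text_histogram(pages_per_visit)))
    print("outliers:", iqr_outliers(pages_per_visit))  # outliers: [40]
```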
Regression analysis
● Statistical process for estimating the relationships among variables
● Choice of method largely depends on the form of the data and the variable types
● Linear regression is your go-to method for initial pokes
● Poisson or logit models are also very useful tools for most e-commerce datasets
Example (Using R)
# Binomial GLM (logistic regression) of addToCart on the three journey variables
model <- glm(formula = addToCart ~ clickThroughToListings + pagingAroundSearchResults + usingSearchBoxTooMuch,
             family = "binomial", data = summary.df)
summary(model)  # prints the coefficient table below

Independent variable        Estimate   Std. Error   z value   Pr(>|z|)   Significance
clickThroughToListings       0.34065      0.12654     2.692    0.00710   **
pagingAroundSearchResults   -0.28925      0.08688    -3.329    0.00087   ***
usingSearchBoxTooMuch        0.12038      0.12608     0.955    0.33967
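The `glm(..., family = "binomial")` call fits a logistic regression by maximum likelihood. As a rough Python analogue, the sketch below maximises the same log-likelihood by plain batch gradient ascent; R itself uses iteratively reweighted least squares and also reports standard errors, which this sketch does not. `fit_logit`, its learning rate and step count, and the toy data are hypothetical. The toy data use a single 0/1 predictor because then the maximum-likelihood answer is known in closed form: the intercept is the log-odds of the outcome when the flag is 0 (here log(1/3)), and the coefficient is the difference in log-odds between the two groups (here log 3 - log(1/3) = log 9).

```python
import math


def fit_logit(X, y, lr=0.1, steps=10_000):
    """Logistic regression by batch gradient ascent on the mean log-likelihood.

    X: rows of predictor values (e.g. 0/1 journey flags), without an intercept
    column; y: 0/1 outcomes. Returns [intercept, coef_1, ..., coef_k].
    Assumes the data are not perfectly separable; otherwise the
    maximum-likelihood estimates do not exist and the coefficients keep growing.
    """
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for row, target in zip(X, y):
            xs = [1.0, *row]  # prepend the intercept term
            eta = sum(b * x for b, x in zip(beta, xs))
            p = 1.0 / (1.0 + math.exp(-eta))  # predicted probability
            for j, x in enumerate(xs):
                grad[j] += (target - p) * x  # d(log-likelihood) / d(beta_j)
        beta = [b + lr * g / len(y) for b, g in zip(beta, grad)]
    return beta


if __name__ == "__main__":
    # Hypothetical visits: one 0/1 journey flag and whether the visit added to cart.
    X = [[0], [0], [0], [0], [1], [1], [1], [1]]
    y = [1, 0, 0, 0, 1, 1, 1, 0]
    intercept, coef = fit_logit(X, y)
    print(round(intercept, 3), round(coef, 3))  # -1.099 2.197
```

With per-visit journey flags as the columns of `X` and addToCart as `y`, this estimates the same kind of coefficients as the Estimate column; it is not expected to reproduce the table without the original data.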
How to interpret the signal
● Direction: the sign of the Estimate. A positive estimate (clickThroughToListings) raises the odds of addToCart; a negative one (pagingAroundSearchResults) lowers them
● Significance: Pr(>|z|) and its stars (*** p < 0.001, ** p < 0.01). clickThroughToListings and pagingAroundSearchResults are significant; usingSearchBoxTooMuch (p ≈ 0.34) is not, so this data does not support HypothesisC
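Both readings can be checked by recomputing the table's columns from Estimate and Std. Error alone: z is their ratio, Pr(>|z|) is the two-sided normal tail probability of z, and exp(Estimate) is the odds ratio, a scale that is often easier to explain than log-odds. The numbers are copied from the example table; `interpret` and `significance_code` are hypothetical helpers, and the codes mirror the legend R prints under `summary()`.

```python
import math
from statistics import NormalDist

# (Estimate, Std. Error) for each variable, copied from the glm output.
COEFFICIENTS = {
    "clickThroughToListings": (0.34065, 0.12654),
    "pagingAroundSearchResults": (-0.28925, 0.08688),
    "usingSearchBoxTooMuch": (0.12038, 0.12608),
}


def significance_code(p):
    """Stars as in R's 'Signif. codes' legend."""
    for cutoff, code in ((0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, ".")):
        if p < cutoff:
            return code
    return ""


def interpret(estimate, std_error):
    """Wald z test plus odds ratio for one logistic-regression coefficient."""
    z = estimate / std_error
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided, i.e. Pr(>|z|)
    return {
        "direction": "raises odds" if estimate > 0 else "lowers odds",
        "z": round(z, 3),
        "p": p,
        "stars": significance_code(p),
        "odds_ratio": math.exp(estimate),
    }


if __name__ == "__main__":
    for name, (estimate, std_error) in COEFFICIENTS.items():
        r = interpret(estimate, std_error)
        print(f"{name:<26} {r['direction']:<12} z={r['z']:>6} "
              f"p={r['p']:.5f} {r['stars']:<3} OR={r['odds_ratio']:.3f}")
```

If the journey variables are 0/1 flags (an assumption; the slides do not say), clicking through to listings multiplies the odds of addToCart by about 1.41, and paging around search results multiplies them by about 0.75.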
Concrete Direction
● Now we know which user segments present a real opportunity to make improvements
● How big is the customer segment? That is the problem size
● Knowing the problem size helps with prioritizing
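Problem size can be measured as the share of visits that fall into each journey segment. A minimal sketch, assuming per-visit 0/1 journey flags are already available; `segment_sizes` and the four sample visits are hypothetical. Because journeys overlap, the shares need not sum to 1.

```python
from collections import Counter


def segment_sizes(visits):
    """Share of visits in each journey segment, largest first.

    visits: one dict of 0/1 journey flags per visit.
    """
    counts = Counter()
    for flags in visits:
        counts.update(flags)  # adds each flag's 0/1 value to its segment total
    return [(name, count / len(visits)) for name, count in counts.most_common()]


if __name__ == "__main__":
    visits = [  # hypothetical flags for four visits
        {"pagingAroundSearchResults": 1, "clickThroughToListings": 0},
        {"pagingAroundSearchResults": 1, "clickThroughToListings": 1},
        {"pagingAroundSearchResults": 0, "clickThroughToListings": 0},
        {"pagingAroundSearchResults": 1, "clickThroughToListings": 0},
    ]
    print(segment_sizes(visits))
```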
Summary
Methodology:
1. Gather the strongest hypotheses
2. Construct measurable user journeys
3. Choose & apply statistical methods
4. Support or reject hypotheses
5. Repeat and refine
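Step 4 can be made mechanical once each hypothesis is tied to a regression variable and an expected direction. The mapping is an assumption (a "bad experience" is taken to mean lower odds of addToCart, a "good" one higher odds), the 0.05 threshold is a conventional choice rather than one from the slides, and `verdict` is a hypothetical helper; the estimates and p-values are the ones in the example table.

```python
# Assumed link between each hypothesis, its journey variable and the
# expected sign of its addToCart coefficient (-1 = bad, +1 = good experience).
HYPOTHESES = {
    "HypothesisA": ("pagingAroundSearchResults", -1),
    "HypothesisB": ("clickThroughToListings", +1),
    "HypothesisC": ("usingSearchBoxTooMuch", -1),
}

# (Estimate, Pr(>|z|)) copied from the example glm output.
RESULTS = {
    "clickThroughToListings": (0.34065, 0.00710),
    "pagingAroundSearchResults": (-0.28925, 0.00087),
    "usingSearchBoxTooMuch": (0.12038, 0.33967),
}


def verdict(expected_sign, estimate, p_value, alpha=0.05):
    """Support or reject one hypothesis from its coefficient."""
    if p_value >= alpha:
        return "not supported"  # no detectable effect: refine and repeat
    return "supported" if estimate * expected_sign > 0 else "contradicted"


if __name__ == "__main__":
    for hypothesis, (variable, sign) in HYPOTHESES.items():
        estimate, p_value = RESULTS[variable]
        print(hypothesis, variable, verdict(sign, estimate, p_value))
```

With the example numbers, HypothesisA and HypothesisB come out supported and HypothesisC not supported.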
Toolkit
● BigQuery: query parsing, MapReduce
● R: data visualization, cleaning, augmentation, statistical methods
● Ruby: scripting
● Coffee: ’Cos
Found it interesting?
Come and talk to us