Markov chains can take predictive theory to a new level, with large-scale applications for digital marketing. From social media network modeling to user pathing, site scoring and recommended pages, Markov chains can quantify, rank, and return likely outcomes on the web. In other words, they can demystify demographics. Here's how.
Using Markov Chains to Predict User Behavior
Rivka Fogel
Markov Chains: Probability without History
COPYRIGHT 2013 CATALYST. ALL RIGHTS RESERVED. JANUARY 23, 2014 | PAGE 2
Andrey Markov
What Are Probability Spaces?
[Diagram: Focal Object / Function Co-Domain, with Function/Possibility 1 and Function/Possibility 2]
• Sequences of random events defined on a probability space are known as stochastic processes
Type 1: Time Series
[Diagram: First Event, Function/Possibility 1, and Function/Possibility 2 along a Time axis; the events are also called “states”]
Application: Personalization
Identifying user-specific authorities
• To return more accurate SERPs (E) for that user
[Diagram: User and nodes A, B, C, D, and E]
Type 2: Spatial Field
Shared Event
• Variable interactions are often statistically correlated
Addition of The Markov Property
The Next State Depends Only on the Current State:
E because of B or D, not because of A
[Diagram: nodes A, B, C, and D]
• The probability of B causing E, as opposed to D causing E, is calculated with Bayes' theorem
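As a rough illustration of the Bayes' theorem step on this slide, the sketch below estimates how likely it is that a user who reached E came from B rather than D. The function name and all of the numbers are hypothetical; the deck does not supply data.

```python
def posterior_previous_state(priors, likelihoods):
    """Return P(previous state | E was reached) for each candidate state.

    priors[s]      -- P(user was in state s); hypothetical input
    likelihoods[s] -- P(E | s), the probability of moving from s to E
    """
    # Unnormalized posterior for each candidate: P(s) * P(E | s)
    joint = {s: priors[s] * likelihoods[s] for s in priors}
    # P(E), summed over the candidate previous states
    evidence = sum(joint.values())
    return {s: joint[s] / evidence for s in joint}


# Made-up numbers for the two candidate previous states B and D
priors = {"B": 0.6, "D": 0.4}
likelihoods = {"B": 0.5, "D": 0.25}
posterior = posterior_previous_state(priors, likelihoods)
print(posterior)
```

With these inputs B comes out as the more probable cause of E, because it is both visited more often and more likely to lead to E.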
Application: (not provided)
• The Markov Property enables the marketer to model paths without knowing every state.
• Where only some keyphrase data is known, the model can also infer a missing keyphrase from other users’ paths where the keyphrase is known.
[Diagram: Keyphrase? leading to Homepage, with Bounce, Model Landing Page, Homepage Video View, Inventory, and Gallery Page Video View]
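One simple way to approximate the keyphrase inference described on this slide is to compare a "(not provided)" visit with other users' paths whose keyphrase is known. The sketch below matches only on the first transition of each path. That matching rule, the function name, and the sample keyphrases are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def infer_keyphrase(known_visits, path):
    """Estimate keyphrase probabilities for a visit with no keyphrase.

    known_visits -- list of (keyphrase, path) pairs from other users
    path         -- list of pages for the visit with the unknown keyphrase
    Only the first transition (landing page -> next page) is compared.
    """
    counts = defaultdict(Counter)
    for keyphrase, known_path in known_visits:
        if len(known_path) >= 2:
            counts[tuple(known_path[:2])][keyphrase] += 1
    seen = counts.get(tuple(path[:2]))
    if not seen:
        return {}  # no comparable known path
    total = sum(seen.values())
    return {keyphrase: n / total for keyphrase, n in seen.items()}


# Hypothetical known paths
known = [
    ("suv models", ["Homepage", "Model Landing Page", "Inventory"]),
    ("suv models", ["Homepage", "Model Landing Page"]),
    ("car videos", ["Homepage", "Homepage Video View"]),
    ("dealer near me", ["Homepage", "Model Landing Page", "Inventory"]),
]
guess = infer_keyphrase(
    known, ["Homepage", "Model Landing Page", "Gallery Page Video View"]
)
print(guess)
```

A fuller model would weigh every transition in the path rather than only the first one.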
Application: Multichannel Attribution
Monitoring and prediction can be based on the probability of a user’s path given other users’ paths
• Identify A (or predict D) via multiple probability states within a Markov chain.
[Diagram: nodes A, B, C, and D with states 1, 2, 4, and 5; labels: Probability of B, Probability of C, Known Path 1, Known Path 2]
Application: Audience Segmentation
[Diagram: nodes A, B, C, and D with states 1, 2, 4, and 5; labels: Probability of B, Probability of C, Known Path 1, Known Path 2, Landing Page, Referral Paths, On-Site Paths]
Relational Markov Properties
Relational Markov Models allow states to be of different types.
• Relational Markov Models group multiple types of objects into relations and calculate the probability that a relation appears in a state.
• They build on dynamic Bayesian networks
E because of B or D’s type, not because of A or C’s type
[Diagram: State A, State B, State C, and State D, grouped into Type 1 and Type 2]
Application: Audience Segmentation 2
[Diagram: nodes B and C with states 1 and 2; labels: Paid, Organic, Known]
Application: User Experience
Types: Page Visit, Video View, Bounce
[Diagram: Homepage, Bounce, Model Landing Page, Homepage Video View, Inventory, Gallery Page Video View]
Application: Social Network Modeling
• This function will answer: if the user ended up converting or visiting the landing page, which type or types of social interaction came into play?
[Diagram: Site Landing Page, Rich Media Play, Rich Media Host Page, User Share, Influencer, News Feed, Brand Social Profile]
Application: HTTP Service Request Prediction
• Prefetch Page A given the probability that the user will want to see it.
• The keyphrase cluster is predicted by the function with co-domain B and is then used to predict the incidence of B where the first state isn’t known.
[Diagram: Keyphrase 1 and Keyphrase 2 in a Keyphrase Cluster; Known Paths through states 1, 2, and 3; label: Probability of 3, A]
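The prefetching decision on this slide can be sketched as a threshold on transition probabilities out of the current page. The function name, the threshold, and the probabilities below are hypothetical, and a real system would estimate the transition table from logged paths.

```python
def pages_to_prefetch(transitions, current_page, threshold=0.3):
    """Return likely next pages, most probable first.

    transitions -- dict: page -> {next_page: probability}
    threshold   -- minimum probability that justifies a prefetch
    """
    row = transitions.get(current_page, {})
    ranked = sorted(row.items(), key=lambda item: item[1], reverse=True)
    return [page for page, prob in ranked if prob >= threshold]


# Made-up transition probabilities out of the homepage
transitions = {
    "Homepage": {"Page A": 0.55, "Page B": 0.30, "Bounce": 0.15},
}
prefetch = pages_to_prefetch(transitions, "Homepage", threshold=0.5)
print(prefetch)
```

The threshold trades bandwidth against latency: a lower value prefetches more pages, including some the user never opens.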
Application: Agent Suggestion
• Auto-suggests searches (Search C) and links (URL E) that the user is likely to want to access, based on user history and other users’ history
[Diagram: First words of Query, Keyphrase Cluster or Authority, Search A, Search B, Search C, and URL A through URL E]
Application: Search Engine Scoring
Identifying Authority 2:
• The function identifies hubs of authority that are probable next steps in many systems (each with individual focus objects).
[Diagram: Keyphrase Cluster, Page A, Page B, Page C, Link 1, Link 2, Authority 1, Authority 2]
Appendix: Formal Definitions
Where, Probability Spaces:
• The measurable space $(S, \Sigma)$, and an object $X$ on that measurable space
• The probability space $(\Omega, F, P)$ is defined by the function $P$, the assignment of probabilities to events, where $\Omega$ is the set of possible outcomes and $F$ is the set of events, each event containing 0 or more outcomes: $P(X) = \sum_{i=1}^{k} P(t_i)$ for all $X$ on $\Omega$
• The finite-dimensional distribution: $(X_{t_1}, \dots, X_{t_k}) : \Omega \to S^k$
• That arrow, or the pushforward measure, or the random distribution of events, or the matrix of transition probabilities $P$: $P_{t_1 \dots t_k}(\cdot) = P\big((X_{t_1}, \dots, X_{t_k})^{-1}(\cdot)\big)$ on $S^k$
– Where Bayes' theorem allows for: $P(H \mid E) = P(H)\,P(E \mid H) / P(E)$, with $P(H)$ based on the old evidence, $E$ the new evidence, and $P(E)$ taken over the entire set
Then, Markov Property:
• $P(X_{l+1} = s \mid X_l = s_l, X_{l-1} = s_{l-1}, \dots, X_0 = s_0) = P(X_{l+1} = s \mid X_l = s_l)$
– The random distribution of events is defined because the system is finite.
• So, in the matrix of transition probabilities, defined as $P^{(l,\,l+1)}_{ij} = P(X_{l+1} = j \mid X_l = i)$, $P^{(l,\,l+1)}$ is independent of $l$.
• That is, $\hat{s}(t) = \hat{s}(t-1)A$
– $S$ is the state space, $A$ is the matrix of transition probabilities, and $\lambda$ is the initial probability distribution of the states in $S$. $\hat{s}(t)$ is the probability vector for states at time $t$.
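The update rule on this slide, in which the probability vector at time t is the previous vector multiplied by the transition matrix A, can be sketched in a few lines. The three states and the matrix entries below are invented for illustration; each row of A sums to 1.

```python
def step(distribution, A):
    """One update of the state probability vector: s(t) = s(t-1) A.

    distribution -- row vector of state probabilities at time t-1
    A            -- square transition matrix, A[i][j] = P(next=j | current=i)
    """
    n = len(A)
    return [sum(distribution[i] * A[i][j] for i in range(n)) for j in range(n)]


# Hypothetical states and transition probabilities
states = ["Homepage", "Landing Page", "Bounce"]
A = [
    [0.0, 0.7, 0.3],  # from Homepage
    [0.2, 0.5, 0.3],  # from Landing Page
    [0.0, 0.0, 1.0],  # Bounce is absorbing
]
s0 = [1.0, 0.0, 0.0]  # every user starts on the homepage
s1 = step(s0, A)
s2 = step(s1, A)
print(dict(zip(states, s2)))
```

Because the same A is applied at every step, the sketch relies on the transition probabilities being independent of the step index, as the slide states.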
Markov Restatement 1: When a User’s History is Available
• $A(s, s') = C(s, s') / \sum_{s''} C(s, s'')$ and $\lambda(s) = C(s) / \sum_{s'} C(s')$
– $C(s, s')$ counts the instances where $s'$ follows $s$
– This can be applied to HTTP prediction and agent suggestion
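The count-based estimates on this slide can be computed directly from logged paths. In the sketch below, the pair counts follow the slide's definition of instances where one state follows another. Treating the single-state count as the number of visits to that state is an assumption, since the slide does not define it, and the sample paths are made up.

```python
from collections import Counter, defaultdict


def estimate_chain(paths):
    """Estimate transition probabilities and the initial distribution by counting."""
    pair_counts = defaultdict(Counter)  # C(s, s'): times s' follows s
    state_counts = Counter()            # C(s): visits to s (assumed meaning)
    for path in paths:
        state_counts.update(path)
        for s, s_next in zip(path, path[1:]):
            pair_counts[s][s_next] += 1
    # A(s, s') = C(s, s') / sum over s'' of C(s, s'')
    A = {
        s: {s_next: c / sum(row.values()) for s_next, c in row.items()}
        for s, row in pair_counts.items()
    }
    # lambda(s) = C(s) / sum over s' of C(s')
    total = sum(state_counts.values())
    lam = {s: c / total for s, c in state_counts.items()}
    return A, lam


# Hypothetical logged paths
paths = [
    ["Homepage", "Inventory", "Gallery"],
    ["Homepage", "Inventory"],
    ["Homepage", "Bounce"],
]
A, lam = estimate_chain(paths)
print(A["Homepage"], lam["Homepage"])
```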
Markov Restatement 2: When the Evidence Comes from a User Pool
• The Markov function becomes a generative chain link system that can store counts and probabilities
• $\hat{s}(t) = a_0\,\hat{i}(t-1)A + a_1\,\hat{i}(t-2)A^2 + a_2\,\hat{i}(t-3)A^3 + \dots$, and the predicted state is the one attaining $\max\big(a_0\,\hat{i}(t-1)A + a_1\,\hat{i}(t-2)A^2 + a_2\,\hat{i}(t-3)A^3 + \dots\big)$
– $\hat{s}(t)$ is normalized to select a list of probable states.
– Where probabilities are used:
This can be applied to authority hubs as well, where collected user path traversal patterns are represented in a traversal connectivity matrix.
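The weighted sum on this slide combines evidence from several past steps, with the state seen k steps back pushed through k applications of A. The sketch below assumes each history term is a one-hot indicator vector of the state visited at that step. The deck does not define that term, so this reading, the weights, and the matrix are all illustrative.

```python
def mat_vec(row, M):
    """Multiply a row vector by a square matrix."""
    n = len(M)
    return [sum(row[i] * M[i][j] for i in range(n)) for j in range(n)]


def predict_next(history, A, weights):
    """Score next states from a history of state indices (most recent last).

    weights -- a_0, a_1, ... applied to the states 1, 2, ... steps back
    Returns (normalized scores, index of the most probable next state).
    """
    n = len(A)
    scores = [0.0] * n
    for k, a_k in enumerate(weights):
        if k >= len(history):
            break
        # One-hot indicator of the state visited k+1 steps ago (assumed form)
        vec = [0.0] * n
        vec[history[-1 - k]] = 1.0
        # Apply A a total of k+1 times, which equals multiplying by A^(k+1)
        for _ in range(k + 1):
            vec = mat_vec(vec, A)
        scores = [s + a_k * v for s, v in zip(scores, vec)]
    total = sum(scores)
    if total > 0:
        scores = [s / total for s in scores]
    best = max(range(n), key=lambda j: scores[j])
    return scores, best


# Hypothetical chain: 0 = Homepage, 1 = Landing Page, 2 = Bounce
A = [
    [0.0, 0.7, 0.3],
    [0.2, 0.5, 0.3],
    [0.0, 0.0, 1.0],
]
scores, best = predict_next(history=[0, 1], A=A, weights=[0.7, 0.3])
print(scores, best)
```

Sorting the normalized scores instead of taking only the maximum yields the list of probable states that the slide mentions.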
Markov Restatement 3: When Groupings of States Are Estimated
• These are Relational Markov Models
• These groupings are also seen as abstractions. $A(Q)$ forms a lattice of abstractions.
– $\{\mathcal{D}, R, Q, A, \pi\}$, where each $D \in \mathcal{D}$ is a tree, that is, a hierarchy of values. $R$ is a set of relations. Each relation is defined by nodes on leaves of $D$. $Q$ is the set of states. $A$ is the transition probability matrix. $\pi$ is the initial probability distribution, that is, the probability of each state being the initial state in the chain. States are defined as abstractions on $Q$.
– The rank of an abstraction $a = R(d_1, \dots, d_k)$ in the lattice is defined as $1 + \sum_{i=1}^{k} \mathrm{depth}(d_i)$. Depth is a node’s depth on the tree, so the abstraction’s rank increases with the depth of its nodes. The rank of $Q$ (the most general) is 0.
• States that have nodes on common leaves will more frequently appear in abstractions together.
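The rank definition on this slide, one plus the summed tree depths of an abstraction's arguments, can be computed as below. The domain tree and its node names are hypothetical, and the sketch assumes that the root of a tree has depth 0, which the slide does not state.

```python
def depth(node, parent):
    """Depth of a node in its domain tree; the root is assumed to have depth 0."""
    d = 0
    while parent.get(node) is not None:
        node = parent[node]
        d += 1
    return d


def abstraction_rank(arguments, parent):
    """Rank of an abstraction R(d_1, ..., d_k): 1 + sum of argument depths."""
    return 1 + sum(depth(d, parent) for d in arguments)


# Hypothetical domain tree of pages, stored as child -> parent
parent = {
    "AllPages": None,
    "VideoPages": "AllPages",
    "GalleryVideo": "VideoPages",
    "HomepageVideo": "VideoPages",
}
rank_general = abstraction_rank(["AllPages"], parent)
rank_specific = abstraction_rank(["GalleryVideo"], parent)
print(rank_general, rank_specific)
```

More specific abstractions get higher ranks. The most general abstraction, the full state set, has rank 0 by definition rather than by this formula.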
Further Reading
• Anderson, Corin R., Domingos, Pedro, and Weld, Daniel S. “Relational Markov Models and their Application to Adaptive Web Navigation.” Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2002): 143-152. Electronic. http://homes.cs.washington.edu/~pedrod/papers/kdd02a.pdf
• Downey, Allen. “Bayesian statistics made (as) simple (as possible).” PyCon US. 7 March 2012. http://pyvideo.org/video/608/bayesian-statistics-made-as-simple-as-possible
• Flesch, Ildikó, and Lucas, Peter. “Markov Equivalence in Bayesian Networks.” Electronic. http://www.cs.ru.nl/P.Lucas/markoveq.pdf
• Sarukkai, Ramesh R. “Link prediction and path analysis using Markov chains.” Computer Networks 33 (June 2000): 377-386. Electronic. http://www.sciencedirect.com/science/article/pii/S138912860000044X
Questions?