An Adaptive Proportional Value-per-Click Agent for Bidding in Ad Auctions
Trading Agent Design and Analysis (TADA) Workshop 2011
Kyriakos C. Chatzidimitriou (AUTH/CERTH)
Lampros C. Stavrogiannis (Univ. of Southampton)
Andreas L. Symeonidis (AUTH/CERTH)
Pericles A. Mitkas (AUTH/CERTH)
Introduction
• Basic idea: based on the working paper by Dr. Yevgeniy Vorobeychik on the QuakTAC 2009 entry
• Since this initial work, we have:
– Conducted more game-theoretic experiments
– Improved conversion estimation
– Improved user distribution estimation
– Included an adaptive component
• Ended up with (more or less) the same "Ultimate Answer to the Ultimate Question of Life, the Universe, and Everything" for the TAC Ad Auctions game: 0.3
TADA@IJCAI 2011 Mertacor 2
Basic Strategy: VPC
bid_q = α · v_q
v_q = Pr{conversion | click}_q · E[revenue | conversion]_q
Pr{conversion | click}_q = focusedPercentage_q · Pr{conversion | focused}(Î_d)

A) E[revenue | conversion]_q
B) focusedPercentage_q
C) Î_d
D) α
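A minimal sketch of the proportional VPC bid, assuming the conversion probability, focused percentage, and expected revenue are already estimated (they are inputs here, not the actual agent code):

```python
def vpc_bid(alpha, pr_conv_given_focused, focused_percentage, exp_revenue):
    """bid_q = alpha * v_q, with
    v_q = Pr{conversion|click}_q * E[revenue|conversion]_q and
    Pr{conversion|click}_q = focusedPercentage_q * Pr{conversion|focused}."""
    pr_conv_given_click = focused_percentage * pr_conv_given_focused
    v_q = pr_conv_given_click * exp_revenue
    return alpha * v_q
```

The four estimated quantities correspond to components A)-D) above; the rest of the slides describe how each one is obtained.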
A) Expected Revenue
• Solely depends on Manufacturer’s Specialty (MS)
E[revenue | conversion]_q =
– USP, if MS is not matched in q
– USP · (1 + B_MS), if MS is matched in q
– USP · (3 + B_MS) / 3, if MS is not defined in q
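The three cases above can be sketched as follows; the USP value of 10.0 and the case labels are our assumptions for illustration, and B_MS is passed in as `bonus`:

```python
USP = 10.0  # unit sales price (assumed value for the sketch)

def expected_revenue(ms_status, bonus):
    """E[revenue|conversion]_q by Manufacturer's Specialty (MS) status.
    ms_status: 'matched', 'not_matched', or 'undefined' (MS absent from q)."""
    if ms_status == 'matched':
        return USP * (1.0 + bonus)
    if ms_status == 'not_matched':
        return USP
    # MS undefined: average over the three manufacturers, exactly one of
    # which matches: (2*USP + USP*(1 + B_MS)) / 3 = USP*(3 + B_MS)/3
    return USP * (3.0 + bonus) / 3.0
```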
B) Focused Percentage
• Monte Carlo Simulations
• First method (Vorobeychik)
– focusedPercentage_q = conversions_q / [clicks_q · Pr(conversion_q)]
– Average over query class (F0, F1, F2)
• Second method: use server source files
– MC user states (NS, IS, F0, F1, F2, T) per product (×9)
– focusedPercentage_q = F_i,q / (F_i,q + IS_q)
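Both estimators reduce to simple ratios; a sketch with assumed inputs (observed conversions/clicks for the first method, simulated user-state counts for the second):

```python
def focused_percentage_v1(conversions, clicks, pr_conversion):
    """Vorobeychik's method: conversions_q / (clicks_q * Pr(conversion_q))."""
    return conversions / (clicks * pr_conversion)

def focused_percentage_v2(f_level_users, is_users):
    """Monte Carlo user-state method: F_i,q / (F_i,q + IS_q),
    with counts of focused (F_i) and informational-search (IS) users."""
    return f_level_users / (f_level_users + is_users)
```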
C) Id Estimation
• kNN
– Inspired by the periodic conversions behavior
– Time-series matching using Euclidean distance as the similarity criterion
– k = 5, t = 5, N = 600
• Heuristic baseline
– Deliberately underestimates, so that the agent bids higher: c_d = (c_{d-1} + c_{d-2} + c_{d-3}) / 4
• Aggregate
– c_d = (kNN + baseline) / 2
– c_{d+1} = ((kNN + baseline) / 2) / 2
Î_d = g(c_{d-3} + c_{d-2} + c_{d-1} + ĉ_d + ĉ_{d+1} − C_cap)
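A sketch of the conversion estimators above; the stored-series database, the window iteration, and the averaging of each neighbour's successor value are our assumptions, while k = 5, t = 5, and the divide-by-4 underestimate follow the slide:

```python
import math

def baseline_estimate(history):
    """Heuristic baseline: sum of the last three days divided by 4 (not 3),
    a deliberate underestimate so the agent bids higher."""
    return sum(history[-3:]) / 4.0

def knn_estimate(history, series_db, k=5, t=5):
    """Match the last t days against all length-t windows of stored series
    (Euclidean distance) and average the values that followed the k nearest."""
    query = history[-t:]
    scored = []
    for series in series_db:
        for i in range(len(series) - t):
            window = series[i:i + t]
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(query, window)))
            scored.append((d, series[i + t]))  # (distance, successor value)
    scored.sort(key=lambda x: x[0])
    nearest = scored[:k]
    return sum(v for _, v in nearest) / len(nearest)

def aggregate_estimate(history, series_db):
    """c_d = (kNN + baseline)/2; c_{d+1} is that value halved again."""
    c_d = (knn_estimate(history, series_db) + baseline_estimate(history)) / 2.0
    return c_d, c_d / 2.0
```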
[Diagram: cyclic bidding behavior]
No conversions → high conversion prob. → high VPC → high bid → high ranking → conversions → low conversion prob. → low VPC → low bid → no ad display → no conversions → …
• 5-day-long pulses
• Pulse height and width relate to factors such as the user distribution at the time and the competition
• Large peaks in daily profits come from "catching the wave"
Rest of the strategy
• Budget unconstrained
• Hard-coded ad selection strategy
– F0 => generic
– F1 => if the one defined preference is matched => targeted, else generic
– F2 => if the user preference is matched => targeted
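The hard-coded rules above can be sketched with the match check reduced to a boolean input (for F2 this means both manufacturer and component match; for F1, the single defined preference matches):

```python
def select_ad(focus_level, preference_matched):
    """Hard-coded ad choice: F0 is always generic;
    F1/F2 show a targeted ad only when the preference is matched."""
    if focus_level == 'F0':
        return 'generic'
    return 'targeted' if preference_matched else 'generic'
```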
Simulation-based Game Theoretical Analysis
• One-shot Bayesian game
• Myopic linear strategies b = α · vpc -> find the optimal shading factor α
• Iterative best response to find a symmetric Bayes-Nash equilibrium
• Apply the most profitable single deviation against a homogeneous set of opponents until self-play is a best response -> BNE
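The iterative best-response loop can be sketched as follows; `profit_fn(deviant, others)`, the candidate grid, and the iteration cap are hypothetical stand-ins for the simulation-based evaluation used in the analysis:

```python
def iterative_best_response(profit_fn, alphas, start, max_iters=50):
    """From a homogeneous profile at `current`, find the most profitable
    single deviation; stop when self-play is a best response (BNE candidate).
    profit_fn(deviant_alpha, others_alpha) -> estimated profit of the deviant."""
    current = start
    for _ in range(max_iters):
        best = max(alphas, key=lambda a: profit_fn(a, current))
        if best == current:
            return current  # self-play is a best response
        current = best
    return current
```

With a toy profit function peaked near 0.3, the loop settles on 0.3, mirroring the search path reported on the α slide.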
D) alpha
• Vorobeychik
– "α = 0.2, 0.3 more robust to aggressive opponents"
– The previously best values, α = 0.1, 0.2 (2009), were not profitable on the 2010 platform
• We have re-run the algorithm under the 2010 specs
– α = 0.3 is the optimal value (search path: 1 -> 0.4 -> 0.3)
Simulation-based Game Theoretical Analysis
• Instead of a single α: (α_F0, α_F1, α_F2) × (α_CLOW, α_CMED, α_CHIGH)
• Start from the optimal α = 0.3 and explore all possible deviations for each α, first over query levels, then over capacity levels
• 0.3 appears optimal in all cases
• Points in between do not yield different results (0.3 is still the best)
Adaptive component
• Problem statement: we want to capture the case where, given the current environment (competition conditions), an α different from 0.3 yields a competitive advantage
• The GT analysis provides "a good starting point"
• Model it as an associative k-armed bandit problem with optimistic initial values and an ε-greedy action-selection strategy
State, Action, Reward
• State
– Quantized VPC (×11)
– Capacity (×3)
– Query type (×3)
– Manufacturer Specialty bonus (×2)
– Component Specialty bonus (×2)
• Actions: α ∈ {0.28, 0.29, 0.30, 0.31, 0.32}
• Reward: daily profits
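A sketch of the associative bandit described above; the state encoding, optimistic initial value, step size, and ε are illustrative assumptions, while the action set matches the slide:

```python
import random

class AssociativeBandit:
    """Per-state k-armed bandit with optimistic initial values and
    epsilon-greedy action selection, as in the adaptive-alpha component."""
    ACTIONS = (0.28, 0.29, 0.30, 0.31, 0.32)

    def __init__(self, epsilon=0.1, optimistic_q=100.0, step=0.1):
        self.epsilon = epsilon
        self.step = step
        self.optimistic_q = optimistic_q  # high initial value encourages exploration
        self.q = {}  # (state, action) -> estimated value

    def value(self, state, action):
        return self.q.get((state, action), self.optimistic_q)

    def select(self, state):
        """Epsilon-greedy: explore with prob. epsilon, else pick the best arm."""
        if random.random() < self.epsilon:
            return random.choice(self.ACTIONS)
        return max(self.ACTIONS, key=lambda a: self.value(state, a))

    def update(self, state, action, reward):
        """Constant-step incremental update toward the observed daily profit."""
        old = self.value(state, action)
        self.q[(state, action)] = old + self.step * (reward - old)
```

Because untried arms keep their optimistic value, every α is sampled in each state before the estimates settle.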
Experiment (1/2)
• Self-play
– 210 games
– All capacities to 450 (MEDIUM)
• The standard agent is unbeatable in this setting by construction
Agent Name        Score
Mertacor-Std-1    53.042
Mertacor-Std-2    52.763
Mertacor-kNN-1    52.673
Mertacor-kNN-2    52.703
Mertacor-RL-1     52.270
Mertacor-RL-2     52.233
Mertacor-Full-1   51.673
Mertacor-Full-2   51.899
Experiment (2/2)
• Mix things up: include more agents with different strategies
– 250 games
– All capacities set to 450 (MEDIUM)
• Better estimation led to better performance
• Adaptiveness is suited to even more complicated environments (capacity- and strategy-wise)
Agent Name        Score
Mertacor-kNN      53.223
Mertacor-Std      52.245
Schlemazl (2010)  51.975
Mertacor-Full     51.796
Mertacor-RL       51.790
Epflagent (2010)  49.232
Tau (2010)        45.987
Crocodile (2010)  45.858
Also tested/under development
• Daily Campaign Budget Threshold algorithms
– Estimation
– Simulation
• Particle filtering for user-state estimation
– As in TacTex
Conclusions & Future Work
• α = 0.3 is a very robust conclusion and hard to beat
• Better estimates for B) the user state and C) Î_d could further improve performance
• On-line learning is still in a very crude form
– Not yet satisfied, but it seems a reasonable direction
• Competition-wise: fitted Q-learning from data logs