An Adaptive Proportional Value-per-Click Agent for Bidding in Ad Auctions
Trading Agent Design and Analysis (TADA) Workshop 2011
Kyriakos C. Chatzidimitriou (AUTH/CERTH)
Lampros C. Stavrogiannis (Univ. of Southampton)
Andreas L. Symeonidis (AUTH/CERTH)
Pericles A. Mitkas (AUTH/CERTH)
Introduction
• Basic idea: based on the working paper by Dr. Yevgeniy Vorobeychik on the QuakTAC 2009 entry
• Since this initial work, we have:
– Conducted more game-theoretic experiments
– Improved conversion estimation
– Improved user distribution estimation
– Included an adaptive component
• Ended up with (more or less) the same "Ultimate Answer to the Ultimate Question of Life, the Universe, and Everything" for the TAC Ad Auctions game: 0.3
TADA@IJCAI 2011 Mertacor 2
Basic Strategy: VPC
bid_q = α · v_q
v_q = Pr{conversion | click}_q · E[revenue | conversion]_q
Pr{conversion | click}_q = focusedPercentage_q · Pr{conversion | focused}(Î_d)

A) E[revenue | conversion]_q
B) focusedPercentage_q
C) Î_d
D) α
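A minimal sketch of the proportional VPC bid, assuming the conversion probability, focused percentage, and expected revenue are already estimated (they are inputs here, not the actual agent code):

```python
def vpc_bid(alpha, pr_conv_given_focused, focused_percentage, exp_revenue):
    """bid_q = alpha * v_q, with
    v_q = Pr{conversion|click}_q * E[revenue|conversion]_q and
    Pr{conversion|click}_q = focusedPercentage_q * Pr{conversion|focused}."""
    pr_conv_given_click = focused_percentage * pr_conv_given_focused
    v_q = pr_conv_given_click * exp_revenue
    return alpha * v_q
```

The four estimated quantities correspond to components A)-D) above; the rest of the slides describe how each one is obtained.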
A) Expected Revenue
• Solely depends on Manufacturer’s Specialty (MS)
E[revenue | conversion]_q =
– USP, if MS is not matched in q
– USP · (1 + B_MS), if MS is matched in q
– USP · (3 + B_MS) / 3, if MS is not defined in q
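The three cases above can be sketched as follows; the USP value of 10.0 and the case labels are our assumptions for illustration, and B_MS is passed in as `bonus`:

```python
USP = 10.0  # unit sales price (assumed value for the sketch)

def expected_revenue(ms_status, bonus):
    """E[revenue|conversion]_q by Manufacturer's Specialty (MS) status.
    ms_status: 'matched', 'not_matched', or 'undefined' (MS absent from q)."""
    if ms_status == 'matched':
        return USP * (1.0 + bonus)
    if ms_status == 'not_matched':
        return USP
    # MS undefined: average over the three manufacturers, exactly one of
    # which matches: (2*USP + USP*(1 + B_MS)) / 3 = USP*(3 + B_MS)/3
    return USP * (3.0 + bonus) / 3.0
```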
B) Focused Percentage
• Monte Carlo Simulations
• First method (Vorobeychik)
– focusedPercentage_q = conversions_q / [clicks_q · Pr(conversion_q)]
– Average over query class (F0, F1, F2)
• Second method: use server source files
– MC user states (NS, IS, F0, F1, F2, T) per product (×9)
– focusedPercentage_q = F_i,q / (F_i,q + IS_q)
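Both estimators reduce to simple ratios; a sketch with assumed inputs (observed conversions/clicks for the first method, simulated user-state counts for the second):

```python
def focused_percentage_v1(conversions, clicks, pr_conversion):
    """Vorobeychik's method: conversions_q / (clicks_q * Pr(conversion_q))."""
    return conversions / (clicks * pr_conversion)

def focused_percentage_v2(f_level_users, is_users):
    """Monte Carlo user-state method: F_i,q / (F_i,q + IS_q),
    with counts of focused (F_i) and informational-search (IS) users."""
    return f_level_users / (f_level_users + is_users)
```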
C) Id Estimation
• kNN
– Inspired by the periodic conversions behavior
– Time-series matching using Euclidean distance as the similarity criterion
– k = 5, t = 5, N = 600
• Heuristic baseline
– Deliberately underestimates, so that the agent bids higher: c_d = (c_{d-1} + c_{d-2} + c_{d-3}) / 4
• Aggregate
– c_d = (kNN + baseline) / 2
– c_{d+1} = ((kNN + baseline) / 2) / 2
Î_d = g(c_{d-3} + c_{d-2} + c_{d-1} + ĉ_d + ĉ_{d+1} − C_cap)
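A sketch of the conversion estimators above; the stored-series database, the window iteration, and the averaging of each neighbour's successor value are our assumptions, while k = 5, t = 5, and the divide-by-4 underestimate follow the slide:

```python
import math

def baseline_estimate(history):
    """Heuristic baseline: sum of the last three days divided by 4 (not 3),
    a deliberate underestimate so the agent bids higher."""
    return sum(history[-3:]) / 4.0

def knn_estimate(history, series_db, k=5, t=5):
    """Match the last t days against all length-t windows of stored series
    (Euclidean distance) and average the values that followed the k nearest."""
    query = history[-t:]
    scored = []
    for series in series_db:
        for i in range(len(series) - t):
            window = series[i:i + t]
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(query, window)))
            scored.append((d, series[i + t]))  # (distance, successor value)
    scored.sort(key=lambda x: x[0])
    nearest = scored[:k]
    return sum(v for _, v in nearest) / len(nearest)

def aggregate_estimate(history, series_db):
    """c_d = (kNN + baseline)/2; c_{d+1} is that value halved again."""
    c_d = (knn_estimate(history, series_db) + baseline_estimate(history)) / 2.0
    return c_d, c_d / 2.0
```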
[Diagram: cyclic bidding behavior]
No conversions → high conversion prob. → high VPC → high bid → high ranking → conversions → low conversion prob. → low VPC → low bid → no ad display → no conversions → …
• 5-day-long pulses
• Pulse height and width relate to factors such as the user distribution at the time and the competition
• Large peaks in daily profits come from "catching the wave"
Rest of the strategy
• Budget unconstrained
• Hard-coded ad selection strategy
– F0 => generic
– F1 => if the one defined preference is matched => targeted, else generic
– F2 => if the user preference is matched => targeted
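The hard-coded rules above can be sketched with the match check reduced to a boolean input (for F2 this means both manufacturer and component match; for F1, the single defined preference matches):

```python
def select_ad(focus_level, preference_matched):
    """Hard-coded ad choice: F0 is always generic;
    F1/F2 show a targeted ad only when the preference is matched."""
    if focus_level == 'F0':
        return 'generic'
    return 'targeted' if preference_matched else 'generic'
```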
Simulation-based Game Theoretical Analysis
• One-shot Bayesian game
• Myopic linear strategies b = α · vpc -> find the optimal shading factor α
• Iterative best response to find a symmetric Bayes-Nash equilibrium
• Apply the most profitable single deviation against a homogeneous set of opponents until self-play is a best response -> BNE
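The iterative best-response loop can be sketched as follows; `profit_fn(deviant, others)`, the candidate grid, and the iteration cap are hypothetical stand-ins for the simulation-based evaluation used in the analysis:

```python
def iterative_best_response(profit_fn, alphas, start, max_iters=50):
    """From a homogeneous profile at `current`, find the most profitable
    single deviation; stop when self-play is a best response (BNE candidate).
    profit_fn(deviant_alpha, others_alpha) -> estimated profit of the deviant."""
    current = start
    for _ in range(max_iters):
        best = max(alphas, key=lambda a: profit_fn(a, current))
        if best == current:
            return current  # self-play is a best response
        current = best
    return current
```

With a toy profit function peaked near 0.3, the loop settles on 0.3, mirroring the search path reported on the α slide.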
D) alpha
• Vorobeychik
– "α = 0.2, 0.3 more robust to aggressive opponents"
– The previously best values, α = 0.1, 0.2 (2009), were not profitable on the 2010 platform
• We have re-run the algorithm under the 2010 specs
– α = 0.3 is the optimal value (search path: 1 -> 0.4 -> 0.3)
Simulation-based Game Theoretical Analysis
• Instead of a single α: (α_F0, α_F1, α_F2) × (α_CLOW, α_CMED, α_CHIGH)
• Start from the optimal α = 0.3 and explore all possible deviations for each α, first over query levels, then over capacity levels
• 0.3 appears optimal in all cases
• Points in between do not yield different results (0.3 is still the best)
Adaptive component
• Problem statement: we want to capture the case where, given the current environment (competition conditions), an α different from 0.3 yields a competitive advantage
• The GT analysis provides "a good starting point"
• Model it as an associative k-armed bandit problem with optimistic initial values and an ε-greedy action-selection strategy
State, Action, Reward
• State
– Quantized VPC (×11)
– Capacity (×3)
– Query type (×3)
– Manufacturer Specialty bonus (×2)
– Component Specialty bonus (×2)
• Actions: α ∈ {0.28, 0.29, 0.30, 0.31, 0.32}
• Reward: daily profits
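A sketch of the associative bandit described above; the state encoding, optimistic initial value, step size, and ε are illustrative assumptions, while the action set matches the slide:

```python
import random

class AssociativeBandit:
    """Per-state k-armed bandit with optimistic initial values and
    epsilon-greedy action selection, as in the adaptive-alpha component."""
    ACTIONS = (0.28, 0.29, 0.30, 0.31, 0.32)

    def __init__(self, epsilon=0.1, optimistic_q=100.0, step=0.1):
        self.epsilon = epsilon
        self.step = step
        self.optimistic_q = optimistic_q  # high initial value encourages exploration
        self.q = {}  # (state, action) -> estimated value

    def value(self, state, action):
        return self.q.get((state, action), self.optimistic_q)

    def select(self, state):
        """Epsilon-greedy: explore with prob. epsilon, else pick the best arm."""
        if random.random() < self.epsilon:
            return random.choice(self.ACTIONS)
        return max(self.ACTIONS, key=lambda a: self.value(state, a))

    def update(self, state, action, reward):
        """Constant-step incremental update toward the observed daily profit."""
        old = self.value(state, action)
        self.q[(state, action)] = old + self.step * (reward - old)
```

Because untried arms keep their optimistic value, every α is sampled in each state before the estimates settle.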
Experiment (1/2)
• Self-play
– 210 games
– All capacities to 450 (MEDIUM)
• The standard agent is unbeatable in this setting by construction
Agent Name        Score
Mertacor-Std-1    53.042
Mertacor-Std-2    52.763
Mertacor-kNN-1    52.673
Mertacor-kNN-2    52.703
Mertacor-RL-1     52.270
Mertacor-RL-2     52.233
Mertacor-Full-1   51.673
Mertacor-Full-2   51.899
Experiment (2/2)
• Mix things up: include more agents with different strategies
– 250 games
– All capacities set to 450 (MEDIUM)
• Better estimation led to better performance
• Adaptiveness is suited to even more complicated environments (capacity- and strategy-wise)
Agent Name        Score
Mertacor-kNN      53.223
Mertacor-Std      52.245
Schlemazl (2010)  51.975
Mertacor-Full     51.796
Mertacor-RL       51.790
Epflagent (2010)  49.232
Tau (2010)        45.987
Crocodile (2010)  45.858
Also tested/under development
• Daily Campaign Budget Threshold algorithms
– Estimation
– Simulation
• Particle filtering for user-state estimation
– As in TacTex
Conclusions & Future Work
• α = 0.3 is a very robust conclusion and hard to beat
• Better estimates for B) the user state and C) Î_d could further improve performance
• On-line learning is still in a very crude form
– Not yet satisfied, but it seems a reasonable direction
• Competition-wise: fitted Q-learning from data logs