Online Convex Optimization in the Bandit Setting:

Gradient Descent Without a Gradient

Abraham D. Flaxman, CMU Math

Adam Tauman Kalai, TTI-Chicago

H. Brendan McMahan, CMU CS

May 25, 2006


Outline

Online Analysis
- Example: Multi-Armed Bandit
- General Setting: Online Convex Optimization

Results
- Bandit Gradient Descent: Algorithm
- Bandit Gradient Descent: Analysis


Example: Multi-Armed Bandit Problem

Pittsburgh plans a casino with 1000 slot machines.

One (of many) problems this raises:
- Which of the 1000 machines should you play?

A traditional approach:
- Assume that each machine i pays out independently with probability p_i
- Develop a strategy accordingly (one classic such strategy is sketched below)

It would be nice not to assume that the machines are so well behaved.
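The slides don't name a concrete strategy, but a textbook choice under this independent-payout assumption is the UCB1 index rule: play the machine whose empirical mean plus a confidence bonus is largest. A minimal sketch (the function name and payout model are illustrative, not from the talk):

```python
import math
import random

def ucb1(payout_prob, n_rounds):
    """Simulate UCB1 on machines where machine i pays 1 with probability payout_prob[i]."""
    k = len(payout_prob)
    counts = [0] * k      # how often each machine has been played
    means = [0.0] * k     # empirical payout rate of each machine
    total = 0.0
    for t in range(1, n_rounds + 1):
        if t <= k:
            i = t - 1     # play each machine once to initialize
        else:
            # the machine with the largest optimism-adjusted estimate wins
            i = max(range(k),
                    key=lambda j: means[j] + math.sqrt(2 * math.log(t) / counts[j]))
        reward = 1.0 if random.random() < payout_prob[i] else 0.0
        counts[i] += 1
        means[i] += (reward - means[i]) / counts[i]   # running-mean update
        total += reward
    return total

# Three of the casino's machines, payout rates unknown to the player:
print(ucb1([0.40, 0.50, 0.55], 10_000))
```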

Example: Adversarial Multi-Armed Bandit

The CS-Theory approach to removing the assumption:
- An adversary controls when the machines pay out
- The adversary knows everything about your algorithm, except the results of random coin tosses
- How do you know if you are doing well?

Competitive Analysis and Regret

To see how well you've done:
- Compare with the best you could have done if you knew the future
- competitive ratio := z_offline / z_online
- regret := z_offline − z_online

What is z_offline?
- It would be nice if it were the best sequence of machines to play
- To have results, make it the best single machine to play (both quantities are computed in the sketch below)
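Both quantities are straightforward to compute once the payouts are known in hindsight; a minimal sketch (names illustrative):

```python
def hindsight_scores(payouts, played):
    """payouts[t][i]: payout of machine i at time t.
    played[t]: index of the machine the online player chose at time t.
    Returns (regret, competitive ratio) against the best single machine."""
    n_machines = len(payouts[0])
    z_online = sum(payouts[t][played[t]] for t in range(len(payouts)))
    # z_offline: the best single machine, judged with full knowledge of the future
    z_offline = max(sum(row[i] for row in payouts) for i in range(n_machines))
    return z_offline - z_online, z_offline / z_online

# Two machines over three rounds; the player chooses badly in the first two rounds:
payouts = [[1, 0], [0, 1], [1, 0]]
print(hindsight_scores(payouts, played=[1, 0, 0]))  # (1, 2.0)
```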

Learning from Expert Advice

- Many other problems also fit into this framework
- For example, learning from expert advice
- But for computational reasons, a more general setting can be convenient

General Setting: Online Convex Optimization

- Bounded convex set S ⊆ R^d, constant C > 0
- At time t, simultaneously,
  - the adversary chooses a convex function c^t : S → [−C, C]
  - we choose a point x^t ∈ S
- We pay the adversary c^t(x^t)

Goal: choose the sequence of x^t to make $\sum_t c^t(x^t)$ small.

How we tell if we've done well: the regret is small, where

$$\mathrm{regret} := \sum_t c^t(x^t) - \min_{x \in S} \sum_t c^t(x).$$

(The protocol loop is sketched below.)
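A minimal sketch of this protocol as code (the function names and callback interface are illustrative, not from the talk); note that the player and the adversary each commit to their round-t choice without seeing the other's:

```python
def play_oco(choose_point, choose_cost, n):
    """Run n rounds of the online convex optimization protocol.

    choose_point(t): returns our point x^t in S.
    choose_cost(t):  returns the adversary's convex cost c^t : S -> [-C, C].
    Both are called each round without seeing the other's round-t choice.
    """
    total = 0.0
    history = []                 # kept so regret can be computed in hindsight
    for t in range(n):
        x_t = choose_point(t)    # we pick x^t
        c_t = choose_cost(t)     # adversary picks c^t
        total += c_t(x_t)        # we pay c^t(x^t)
        history.append((x_t, c_t))
    return total, history

# Regret compares `total` against the best fixed point in hindsight:
# min over x in S of sum(c_t(x) for (_, c_t) in history).
```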

An upper bound on regret

Theorem. There is a randomized algorithm which produces a sequence of x^t, so that if S ⊆ R^d and each c^t takes values in [−C, C], then

$$\mathbb{E}[\mathrm{regret}] = \mathbb{E}\left[\sum_{t=1}^{n} c^t(x^t) - \min_{x \in S} \sum_{t=1}^{n} c^t(x)\right] \le 6Cd\,n^{5/6}.$$

Since the bound is sublinear in n, the average regret per round is at most $6Cd\,n^{-1/6}$, which vanishes as n grows.


The Randomized Algorithm

The algorithm is a form of gradient descent.

[Figure: iterates x^1, x^2, x^3, x^4 of the d-dimensional online algorithm, moving within the convex set S]

Main trick: a random variable that approximates the gradient and is formed from a single evaluation of the function (sketched below).
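The trick works because, for u drawn uniformly from the unit sphere, $(d/\delta)\,c(x + \delta u)\,u$ is an unbiased estimate of the gradient of a smoothed version of c, yet costs only one function evaluation. A minimal sketch of the resulting update (interface illustrative; the projection is simplified here, whereas the paper projects onto a slightly shrunk copy of S so the sampled point stays feasible):

```python
import numpy as np

def bandit_gradient_descent(cost_oracle, x0, n, eta, delta, project):
    """Gradient descent driven by a one-point gradient estimate.

    cost_oracle(t, y): returns the single value c^t(y) we are allowed to observe.
    project(x): Euclidean projection keeping x (and x + delta*u) feasible."""
    d = len(x0)
    x = np.asarray(x0, dtype=float)
    for t in range(n):
        u = np.random.randn(d)
        u /= np.linalg.norm(u)          # uniform direction on the unit sphere
        y = x + delta * u               # the point we actually play
        c = cost_oracle(t, y)           # the only feedback the bandit setting allows
        g_hat = (d / delta) * c * u     # unbiased for the gradient of a smoothed c^t
        x = project(x - eta * g_hat)    # standard projected gradient step
    return x
```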

Analysis of algorithm

The flavor of the analysis is this (here P is the projection onto S, g^t is the gradient estimate, and G bounds ‖g^t‖):

$$\|x^{t+1} - x^\star\|^2 = \|P(x^t - \eta\, g^t) - x^\star\|^2 = \cdots$$

$$c^t(x^t) - c^t(x^\star) \le \frac{\|x^t - x^\star\|^2 - \|x^{t+1} - x^\star\|^2}{2\eta} + \frac{\eta G^2}{2}.$$

Summing over t, the first terms telescope:

$$\mathrm{regret} = \sum_{t=1}^{n} \big( c^t(x^t) - c^t(x^\star) \big) \le \frac{\|x^1 - x^\star\|^2}{2\eta} + n\,\frac{\eta G^2}{2}.$$

Take $\eta = 1/\sqrt{n}$.

Full details:
- A. Flaxman, A. Kalai, H. B. McMahan, "Online convex optimization in the bandit setting: gradient descent without a gradient," Symposium on Discrete Algorithms (SODA), 2005.
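Plugging in this choice of η shows why it balances the two terms (a step the slides leave implicit); with exact gradients this is the classical $O(\sqrt{n})$ rate, and it is the extra variance of the one-point gradient estimate that pushes the final bandit bound up to $n^{5/6}$:

$$\frac{\|x^1 - x^\star\|^2}{2\eta} + n\,\frac{\eta G^2}{2}
\;=\; \frac{\sqrt{n}}{2}\left(\|x^1 - x^\star\|^2 + G^2\right)
\;=\; O(\sqrt{n}).$$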

Conclusion

- Online convex optimization in the bandit setting
- Analysis in an adversarial setting
- There is an algorithm with regret ≤ $6Cd\,n^{5/6}$
- Streaming algorithms?
  - An ε-approximate solution in $\varepsilon^{-6}$ passes over the data (see the calculation below)