Online Allocation Algorithms with Applications in
Computational Advertising
by
Morteza Zadimoghaddam
Submitted to the Department of Electrical Engineering and Computer Science
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy in Computer Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
February 2014
© Massachusetts Institute of Technology 2014. All rights reserved.
Author: Department of Electrical Engineering and Computer Science
January 24, 2014

Certified by: Erik D. Demaine
Professor, Thesis Supervisor

Accepted by: Leslie A. Kolodziejski
Chairman, Department Committee on Graduate Students
To Faezeh
Online Allocation Algorithms with Applications in
Computational Advertising
by
Morteza Zadimoghaddam
Submitted to the Department of Electrical Engineering and Computer Science on January 24, 2014, in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science
Abstract
Over the last few decades, a wide variety of allocation markets emerged from the Internet and introduced interesting algorithmic challenges, e.g., ad auctions, online dating markets, matching skilled workers to jobs, etc. I focus on the use of allocation algorithms in computational advertising as it is the quintessential application of my research. I will also touch on the classic secretary problem with submodular utility functions, and show how it is related to the advertiser's optimization problem in computational advertising applications. In all these practical situations, we should focus on solving the allocation problems in an online setting, since the input is revealed during the course of the algorithm and, at the same time, we must make irrevocable decisions. We can formalize these types of computational advertising problems as follows. We are given a set of online items, arriving one by one, and a set of advertisers, where each advertiser specifies how much she wants to pay for each of the online items. The goal is to allocate online items to advertisers to maximize some objective function, such as the total revenue or the total quality of the allocation. There are two main classes of extensively studied problems in this context: budgeted allocation (a.k.a. the adwords problem) and display ad problems. In the first class, each advertiser is constrained by an overall budget limit, the maximum total amount she can pay; in the second, by a positive integer capacity, the maximum number of online items we can assign to her.
Thesis Supervisor: Erik D. Demaine
Title: Professor
Acknowledgments
As I am approaching the end of my doctorate studies at MIT, I vividly remember the
people who had a great influence on my life. I am indebted to them for their help
and support throughout this journey. First, I would like to thank my mother and
father (Fatemeh and Abbas); it is because of their never-ending support that I have
had the chance to progress in life. Their dedication to my education provided the
foundation for my studies.
I started studying extracurricular Mathematics and Computer Science books in
high school with the hope of succeeding in the Iranian Olympiad in Informatics. For
this, I am grateful to Professor Mohammad Ghodsi; for many years he has played
a critical role in organizing Computer Science competitions and training camps all
around Iran. Without a doubt, without his support and guidance I would not have
had access to an amazing educational atmosphere.
Probably the main reason I got admitted to MIT was the excellent guidance of
Professor MohammadTaghi Hajiaghayi on how to conduct research in Theoretical
Computer Science. From my undergraduate years up to now, Prof. Hajiaghayi has
been a great colleague to work with. I would also like to thank Dr. Vahab Mirrokni
who has been a great mentor for me in my graduate studies and during my internship
at Google research. I am grateful for his support, advice, and friendship.
I would like to express my deepest gratitude to my advisor, Professor Erik Demaine,
for providing me with an excellent environment for conducting research. Professor
Demaine has always amazed me by his intuitive way of thinking and his great teaching
skills. He has been a great support for me and a nice person to talk to.
I would like to thank my committee members, Professors Piotr Indyk and Costis
Daskalakis, who were more than generous with their expertise and precious time.
I would also like to thank all my friends during my studies. In particular, I want
to thank Mohammad Norouzi, Mohammad Rashidian, Hamid Mahini, Amin Karbasi,
Hossein Fariborzi, Dan Alistarh, and Martin Demaine.
Finally, and most importantly, I want to thank my wife, Faezeh, who has been
very supportive during all my study years. Without your encouragement, I could
not have come this far. You inspire me by the way you look at life. Thank you for making
all these years so wonderful.
Contents
1 Introduction
1.1 Budgeted Allocation Problem
1.2 Display Ad Problem
1.3 Submodular Secretary Problem
2 Simultaneous Algorithms for Budgeted Allocation Problem
2.1 Notations
2.2 Main Ideas
2.2.1 Lower Bounding the Increase in the Potential Function
2.2.2 Description of the Factor-Revealing Mathematical Program
2.3 The Competitive Ratio of Weighted-Balance
2.4 The Competitive Ratio of Balance
2.5 Hardness Results
2.6 Related Work
3 Bicriteria Online Matching: Maximizing Weight and Cardinality
3.1 Results and Techniques
3.2 Hardness Instances
3.2.1 Better Upper Bounds via Factor-Revealing Linear Programs
3.2.2 Hardness Results for Large Values of Weight Approximation Factor
3.3 Algorithm for Large Capacities
3.4 Algorithm for Small Capacities
3.4.1 Lower Bounding the Weight Approximation Ratio
3.4.2 Factor-Revealing Linear Program for CardinalityAlg
4 Submodular Secretary Problem and its Extensions
4.1 Our Results and Techniques
4.2 The Submodular Secretary Problem
4.2.1 Algorithms
4.2.2 Analysis
4.3 The Submodular Matroid Secretary Problem
4.4 Knapsack Constraints
4.5 The Subadditive Secretary Problem
4.5.1 Hardness Result
4.5.2 Algorithm
4.6 Further Results
4.6.1 The Secretary Problem with the “Maximum” Function
List of Figures
3-1 New curves for upper and lower bounds.
Chapter 1
Introduction
Over the last few decades, a wide variety of allocation markets have emerged. Some of
the most important examples of these markets are a) online dating markets, which reach
around 75% of single people in the United States; b) programs matching skilled workers
with jobs, such as the National Resident Matching Program (NRMP, or "the Match"),
which matches medical school students with residency programs; and c) the multi-billion-dollar
ad auction markets, on which I elaborate later as the quintessential application of my
research. The emergence of these markets introduced many interesting optimization
and algorithmic challenges. We can formalize these problems with a set of resources
present in advance that should be consumed by a set of online nodes arriving one
at a time during the algorithm. The goal is to design an algorithm that allocates
these resources to online nodes when they arrive, without exceeding the resource limits.
We note that the allocation decisions are irrevocable. There are several objective
functions considered in these problems including a) the revenue achieved by these
allocations, and b) the quality of the allocation which is the total satisfaction of
online nodes based on the resources they have received. We have studied designing
online allocation algorithms that incorporate these multiple objectives simultaneously.
Another important aspect of these problems that contributes to their online nature
is the arrival order of online nodes. The performance of an online algorithm heavily
depends on which nodes are going to arrive and their arrival order. In the worst
case approach, the algorithm’s performance is measured assuming the arrival order
is determined by an adversary, and therefore the algorithm’s worst performance is
computed as its indicative performance. However, in stochastic settings, the arrival
order is assumed to be a random permutation of the online items, and the expected
value of the algorithm's performance is considered its indicative performance. How to
combine these two settings and come up with a more realistic analysis framework
is the other main topic of my research.
I would like to highlight the use of allocation algorithms in computational advertising,
which is the quintessential application of my research, e.g., the multi-billion-dollar
ad auction markets of the Google and Bing search engines. We can formalize these types
of computational advertising problems as follows. We are given a set of online items
that could represent keywords searched in a search engine during a day, pageviews
of a news website, or users of any website that shows ads. These online items are
arriving one by one, and the algorithm should serve these items by assigning them
some advertisements. On the other hand, we have a set of advertisers where each
advertiser specifies how much she wants to pay for each of the online items when they
arrive. There are multiple important objectives in this context including maximizing
the total revenue, the number of clicks on ads, or the number of served online items.
In these advertising applications, advertisers are in fact the resources, and there are
different types of constraints on these resources depending on the application. There
are two main classes of extensively studied problems in this context: budgeted allo-
cation (a.k.a. the adwords problem) and display ad problems. Each advertiser is
constrained by an overall budget limit, the maximum total amount she can pay in
the first class, and by some positive integer capacity, the maximum number of online
items we can assign to her in the second class.
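To make the two constraint classes concrete, the following is a minimal sketch (ours, not an algorithm from this thesis) of the shared online loop; a naive highest-feasible-bid rule stands in for the more careful allocation rules studied later:

```python
# Sketch of the two constraint classes described above. In the budgeted
# (adwords) case, advertiser a can pay at most budget[a] in total; in the
# display-ad case, advertiser a can receive at most capacity[a] items.
# The highest-feasible-bid rule is used purely for illustration.

def greedy_assign(items, bids, budget=None, capacity=None):
    """items: item ids arriving online, one by one.
    bids: dict mapping (item, advertiser) -> bid value.
    Exactly one of budget / capacity is given (a dict per advertiser)."""
    advertisers = budget if budget is not None else capacity
    spent = {a: 0.0 for a in advertisers}
    count = {a: 0 for a in advertisers}
    revenue, assignment = 0.0, {}
    for item in items:                      # online arrival
        best, best_bid = None, 0.0
        for (i, a), bid in bids.items():
            if i != item or bid <= best_bid:
                continue
            if budget is not None and spent[a] + bid > budget[a]:
                continue                    # adwords: budget would be exceeded
            if capacity is not None and count[a] >= capacity[a]:
                continue                    # display ads: capacity is full
            best, best_bid = a, bid
        if best is not None:                # irrevocable decision
            assignment[item] = best
            spent[best] += best_bid
            count[best] += 1
            revenue += best_bid
    return revenue, assignment
```

For instance, with budget {'a': 2} and a bid of 2 from a on each of two items, the second item is left unassigned once a's budget is exhausted; with capacity {'a': 1} instead, a receives only the first item.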
The main issues of combining different objectives and considering different arrival
orders, discussed above, are part of the optimization problem the publisher is facing.
The publisher (Google, Bing, etc.) is the search engine that decides which ads to show
for each arriving online node; it has to maintain some revenue while keeping up the
quality (relevance) of these ads in order to retain online users in the long term. At the end
of this thesis, we look at these settings from the advertiser's perspective: an advertiser
with a limited budget who tries to win (capture) the most relevant online nodes to publicize
her own business. We will see later how this is related to the classic secretary problem,
and how we should extend the classic secretary problem to capture the complexities of
the advertiser's problem in online ad auctions.
1.1 Budgeted Allocation Problem
The performance of online algorithms for the adwords (Budgeted Allocation) prob-
lem heavily depends on the order in which the online items appear. Mehta et
al. [65] provided a (1 − 1/e)-competitive algorithm which works for every input order
(adversarial or worst case order). They presented a novel adaptive scoring function
on the advertisers (based on their remaining budgets) and allocated each online item
to an advertiser for whom the product of her bid for the online item and her score
is maximized. In another independent direction, researchers assumed that the online
items appear according to a random permutation. Devanur and Hayes [23] proposed
a primal-dual (1 − ε)-competitive algorithm for this case, which can be adapted to other
similar stochastic settings as well. The adversarial case is too pessimistic to model
reality. On the other hand, the random arrival (and other stochastic) setting is useful
only if the incoming traffic patterns of online items (e.g., page-views) can be predicted
with a reasonably good precision. In other words, such algorithms may rely heavily on
a precise forecast of the online traffic patterns, and may not react quickly to sudden
changes in these patterns. In fact, the slow reaction to such spikes in traffic patterns
imposes a serious limitation on the real-world use of stochastic algorithms in practical
applications. This is a common issue in applying stochastic optimization techniques
to the online resource allocation problems, e.g., see [79]. In an effort to tackle this ad
allocation problem, we [67] study algorithms that achieve good performance ratios
in both adversarial and stochastic frameworks simultaneously, and are robust against
different traffic patterns.
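As a rough sketch of the adaptive scoring rule just described (the trade-off function ψ(x) = 1 − e^(x−1) is the one associated with the analysis of [65]; the function names and the cap by the remaining budget are our simplifications):

```python
import math

def msvv_choose(bids_for_item, spent, budget):
    """Pick the advertiser maximizing bid * psi(fraction of budget spent),
    where psi(x) = 1 - e^(x - 1) discounts advertisers whose budgets are
    nearly exhausted. Returns the chosen advertiser, or None if no
    advertiser has both a positive bid and remaining budget."""
    psi = lambda x: 1.0 - math.exp(x - 1.0)
    best, best_score = None, 0.0
    for a, bid in bids_for_item.items():
        remaining = budget[a] - spent[a]
        if bid <= 0 or remaining <= 0:
            continue
        score = min(bid, remaining) * psi(spent[a] / budget[a])
        if score > best_score:
            best, best_score = a, score
    return best
```

An advertiser with an untouched budget is preferred over an equal bidder that has already spent 90% of its budget, which is exactly the remaining-budget sensitivity described above.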
We present algorithms that achieve the best competitive ratios of both settings
simultaneously, i.e., 1 − ε in the random arrival model and 1 − 1/e in the adversarial
case for unweighted graphs. However, for weighted graphs we prove that this is not
possible; we show that when the competitive ratio of an online algorithm tends to 1 in
the random arrival model, its competitive ratio tends to 0 in the adversarial model.
Formally, we prove that an algorithm with competitive ratio 1 − ε in the random
order arrival setting cannot have a competitive ratio more than 4√ε in the adversarial
setting. We also prove that no online algorithm that achieves an approximation factor
of 1 − 1/e for the adversarial inputs can achieve an average approximation factor
better than 97.6% for random arrival inputs. In light of this hardness result, we
design algorithms with improved approximation ratios in the random arrival model
while preserving the competitive ratio of 1 − 1/e in the worst case. To this end,
we show the algorithm proposed by [65] achieves a competitive ratio of 0.76 for the
random arrival model, while having a 1 − 1/e ≈ 0.63 competitive ratio in the worst
case.
Main Techniques: To achieve the hardness result, we exploit the fact that the
algorithm has no means of distinguishing between adversarial and stochastic inputs.
It is then sufficient to construct instances in which the behavior of an optimum online
algorithm in the stochastic setting differs drastically from an algorithm that works well
in the adversarial setting. For the positive results, we propose a general three stage
process which is useful in analyzing the performance of any greedy algorithm: (i) We
first define an appropriate potential function as the sum of the indefinite integrals of
the advertisers’ scores, and interpret the online algorithm as a greedy approach acting
to improve the potential function by optimizing the corresponding scoring functions.
(ii) We then track the changes in the potential function by formulating a mathematical
program. (iii) Finally, by discretizing along the two dimensions of time and budget and
applying the mean value theorem of calculus, we translate the mathematical program
into a constant-size LP and solve it numerically. We prove that the solution of this
LP lower bounds the competitive ratio of the algorithm.
1.2 Display Ad Problem
In contrast to the budgeted allocation problem, in the display advertising problem,
advertisers are not constrained by budget caps. Instead, each advertiser has some
integer capacity which is the maximum number of online items we can assign to her.
This problem has been modeled as maximizing the weight of an online matching
instance [35, 34, 25, 15, 24]. While weight is indeed important, this model ignores
the fact that cardinality of the matching is also crucial in the display ad application.
This fact illustrates that in many real applications of online allocation, one needs to
optimize multiple objective functions, though most of the previous work in this area
deals with only a single objective function. On the other hand, there is a large body of
work exploring offline multi-objective optimization in the approximation algorithms
literature. In this part of the thesis, we focus on simultaneously maximizing, online, two
objectives which have been studied extensively in matching problems: cardinality and
weight.
In online display advertising, advertisers typically purchase bundles of millions
of display ad impressions from web publishers. Display ad serving systems that
assign ads to pages on behalf of publishers must satisfy the contracts with advertisers,
respecting targeting criteria and delivery goals. Modulo this, publishers try to allocate
ads intelligently to maximize overall quality (measured, for example, by clicks), and
therefore a desirable property of an ad serving system is to maximize this quality
while satisfying the contracts to deliver to each advertiser its purchased bundle of
impressions. This motivates our model of the display ad problem as simultaneously
maximizing weight and cardinality.
We study this multi-objective online allocation problem by providing bicrite-
ria competitive algorithms and showing the tightness of our algorithms by proving
hardness results. We [60] present a bicriteria (p(1 − 1/e^{1/p}), (1 − p)(1 − 1/e^{1/(1−p)}))-approximation
algorithm for every p in the range (0, 1), where the two approximation
factors are for the weight and cardinality objectives. We provide hardness results
that show the exact tightness of our algorithm at three main points: the starting
point (0, 1− 1/e), the middle point (0.43, 0.43), and the final point (1− 1/e, 0). Our
hardness results also show that the gap between our algorithm's approximation factors
and the Pareto-optimal curve is at most 9% at other points.
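A quick numerical check of these three points from the stated factors (our script; we read the second exponent as 1/(1 − p), which is the reading under which the middle point comes out to 0.43):

```python
import math

def bicriteria(p):
    """Approximation factors (weight, cardinality) quoted in the text:
    (p(1 - 1/e^(1/p)), (1 - p)(1 - 1/e^(1/(1-p)))) for p in (0, 1)."""
    weight = p * (1.0 - math.exp(-1.0 / p))
    cardinality = (1.0 - p) * (1.0 - math.exp(-1.0 / (1.0 - p)))
    return weight, cardinality

# As p -> 0 the pair approaches (0, 1 - 1/e); at p = 1/2 both factors
# equal 0.5 * (1 - e^(-2)) ~ 0.432; as p -> 1 it approaches (1 - 1/e, 0).
```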
To achieve efficient algorithms that focus on both objectives, we use two primal
dual subroutines that are responsible for weight and cardinality of the allocation
separately. We allow the subroutines to share and exhaust the total capacities of
advertisers separately, and provide an analysis to handle the collisions. To get hard-
ness results, we construct multi-phase instances with exponentially growing weights.
We then formulate linear programs that upper bound the performance of any on-
line algorithm on the synthesized instances. Approximating the linear programs with
constant-size factor-revealing LPs yields the appropriate hardness results.
1.3 Submodular Secretary Problem
Another interesting and well-studied online allocation problem is the classic secretary
problem in which a company wants to hire a secretary with a sequence of applicants
arriving one by one. The goal is to hire the maximum value applicant, and the main
assumption that yields non-zero competitive algorithms in this setting is the random
order arrival of applicants. One of the most important generalizations of the secretary
problem is the multiple choice version in which the company wants to hire several
skilled workers (applicants) with different constraints varying from a simple cap on
the number of secretaries to much more complex cases, such as matroid constraints.
We consider the problem of hiring up to a fixed number of applicants to maximize
the performance of the secretarial group [11]. The overlapping skills of applicants can
be modeled by defining the performance function as a submodular function on the set
of applicants. The class of submodular functions spans a variety of value functions
in practical settings. In the following, we show how this generalized version of the classic
secretary problem relates to ad auction applications. Consider an advertiser that
has a limited budget to spend throughout the day. She wants to show her ads to a
subset of online users (e.g. associated with searched keywords in a search engine) to
maximize her own value while respecting her own budget. The budget constraint of
the advertiser could be modelled with a knapsack constraint when the price of winning
each online node is estimated accurately. The value function over subsets of online
nodes is submodular in many applications, as it satisfies the diminishing marginal
value property.
We present constant-competitive algorithms for this submodular secretary problem
and provide constant-factor hardness results as well. We then generalize our result to achieve
constant-competitive algorithms when we have multiple knapsack constraints. In
the case of multiple matroid constraints, we present poly-logarithmic competitive
algorithms. Finally, we show that Θ(√n) is the best achievable competitive ratio
for the subadditive secretary problem, which models almost all practical situations
including this class of submodular secretary problems.
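For background, the classic single-secretary rule underlying all of these generalizations can be sketched as follows (the standard observe-then-commit 1/e rule; this is not the submodular algorithm of the thesis):

```python
import math

def classic_secretary(values):
    """Classic 1/e rule: observe the first n/e applicants without hiring,
    then hire the first later applicant who beats everyone seen so far.
    values arrive in (assumed) random order; returns the hired index."""
    n = len(values)
    k = int(n / math.e)                 # length of the observation phase
    threshold = max(values[:k], default=float("-inf"))
    for i in range(k, n):
        if values[i] > threshold:
            return i
    return n - 1                        # forced to take the last applicant
```

Under a uniformly random arrival order, this rule hires the single best applicant with probability about 1/e, which is the baseline guarantee that the multiple-choice and submodular variants generalize.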
Chapter 2
Simultaneous Algorithms for
Budgeted Allocation Problem
Online bipartite matching is a fundamental optimization problem with many appli-
cations in online resource allocation, especially the online allocation of ads on the
Internet. In this problem, we are given a bipartite graph G = (X, Y,E) with a set of
fixed nodes (or bins) Y , a set of online nodes (or balls) X, and a set E of edges between
them. Any fixed node (or bin) yj ∈ Y is associated with a total weighted capacity
(or budget) cj. Online nodes (or balls) xi ∈ X arrive online along with their incident
edges (xi, yj) ∈ E(G) and their weights wi,j. Upon the arrival of a node xi ∈ X, the
algorithm can assign xi to at most one bin yj ∈ Y where (xi, yj) ∈ E(G) and the
total weight of nodes assigned to yj does not exceed cj. The goal is to maximize the
total weight of the allocation. This problem is known as the AdWords problem, and
it has been studied under the assumption that maxi,j wi,j / minj cj → 0 [65, 18, 23].
Under the most basic online model, known as the adversarial model, the online
algorithm does not know anything about the xi’s or E(G) beforehand. In this model,
the seminal result of Karp, Vazirani and Vazirani [56] gives an optimal online (1 − 1/e)-competitive
algorithm to maximize the size of the matching for unweighted graphs,
where wi,j = 1 for each (xi, yj) ∈ E(G). For weighted graphs, Mehta et al. [65, 18]
presented the first (1 − 1/e)-approximation algorithm to maximize the total weight of
the allocation for the AdWords problem and this result has been generalized to more
general weighted allocation problems [18, 33].
Other than the adversarial model, motivated by applications in online advertising,
various stochastic online models have been proposed for this problem. In such stochas-
tic models, online nodes xi ∈ X arrive in a random order. In other words, given a
random permutation σ ∈ Sn, the ball xσ(t) arrives at time t for t = 1, 2, . . . , n.
In this case, the seminal result of Devanur and Hayes [23] gives a (1 − ε)-approximation
for the problem if the number of balls n is known to the algorithm in advance, and
OPT/wi,j ≥ O(m log n / ε³), where m := |Y |. This result has also been generalized and improved
in several follow-up works [34, 2, 76], and for related models like the iid models with
known or unknown distributions [36, 10, 66, 25]1. These stochastic models are partic-
ularly motivated in the context of online ad allocation. In this context, online nodes
correspond to page-views, search queries, or online requests for ads. In these settings,
the incoming traffic of page-views may be predicted with a reasonable precision using
a vast amount of historical data.
All these stochastic models and their algorithms are useful only if the incoming
traffic of online nodes (e.g. page-views) can be predicted with a reasonably good
precision. In other words, such algorithms may rely heavily on a precise forecast of
the online traffic patterns, and may not react quickly to sudden changes in the traffic.
fact, the slow reaction to such traffic spikes imposes a serious limitation on the real-
world use of stochastic algorithms in practical applications. This is a common issue in
applying stochastic optimization techniques for online resource allocation problems
(see e.g., [79]). Various methodologies such as robust or control-based stochastic
optimization [13, 14, 79, 74] have been applied to alleviate this drawback. In this
chapter, we study this problem from a more idealistic perspective and aim to design
algorithms that simultaneously achieve optimal approximation ratios for both the
adversarial and stochastic models. Our goal is to design algorithms that achieve
good performance ratios both in the worst case and in the average case.
Our Contributions and Techniques. In this chapter, we study simultaneous
approximation algorithms for the adversarial and stochastic models for the online
1In the iid stochastic models, online nodes are drawn iid from a known or an unknown distribution.
budgeted allocation problem. Our goal is to design algorithms that achieve a com-
petitive ratio strictly better than 1−1/e on average, while preserving a nearly optimal
worst-case competitive ratio. Ideally, we want to achieve the best of both worlds, i.e.,
to design an algorithm with the optimal competitive ratio in both the adversarial
and random arrival models. Toward this goal, we show that this can be achieved for
unweighted graphs, but not for weighted graphs. Nevertheless, we present improved
approximation algorithms for weighted graphs.
For weighted graphs we prove that no algorithm can simultaneously achieve nearly
optimal competitive ratios on both the adversarial and random arrival models. In
particular, we show that no online algorithm that achieves an approximation factor of
1 − 1/e for the worst-case inputs may achieve an average approximation factor better
than 97.6% for the random inputs (See Corollary 2.5.3). More generally, we show that
any algorithm achieving an approximation factor of 1−ε in the stochastic model may
not achieve a competitive ratio better than 4√ε in the adversarial model (See Theorem
2.5.1). In light of this hardness result, we aim to design algorithms with improved
approximation ratios in the random arrival model while preserving the competitive
ratio of 1 − 1/e in the worst case. To this end, we show an almost tight analysis of
the algorithm proposed in [65] in the random arrival model. In particular, we show
its competitive ratio is at least 0.76, and is no more than 0.81 (See Theorem 2.3.1,
and Lemma 2.5.5). Combining this with the worst-case ratio analysis of Mehta et
al. [65] we obtain an algorithm with the competitive ratio of 0.76 for the random
arrival model, while having a 1 − 1/e ratio in the worst case. It is worth noting that
unlike the result of [23] we do not assume any prior knowledge of the number of balls
is given to the algorithm.
On the other hand, for unweighted graphs, under some mild assumptions, we
show that a generalization of an algorithm in [54] achieves a competitive ratio of 1 − ε in
the random arrival model (See Theorem 2.4.1). Combining this with the worst-
case ratio analysis of [54, 65], we obtain an algorithm with the competitive ratio of
1− ε in the random arrival model, while preserving the optimal competitive ratio of
1 − 1/e in the adversarial model. Previously, a similar result was known for a more
restricted stochastic model where all bins have equal capacities [68]. For the case
of small degrees, an upper bound of 0.82 is known for the approximation ratio of
any algorithm for the online stochastic matching problem (even under the iid
model with known distributions) [66].
Our proofs consist of three main steps: (i) the main technique is to define an
appropriate potential function as an indefinite integral of a scoring function, and in-
terpret the online algorithms as a greedy algorithm acting to improve these potential
functions by optimizing the corresponding scoring functions (see Section 2.2); These
potential functions may prove useful elsewhere; (ii) An important component of the
proof is to write a factor-revealing mathematical program based on the potential
function and its changes; and finally (iii) the last part of the proofs involves converting
the factor-revealing programs to a constant-size LP and solving it using a solver (in the
weighted case), or analyzing the mathematical program explicitly using an interme-
diary algorithm with an oracle access to the optimum (in the unweighted case). The
third step of the proof in the weighted case is inspired by the technique employed
by Mahdian and Yan [64] for unweighted graphs; however, the set of mathematical
programs we use is quite different from theirs.
All of our results hold under two mild assumptions: (i) large capacities (i.e.,
maxi,j wi,j / minj cj → 0), and (ii) a mild lower bound on the value of OPT: the aggregate
sum of the largest-weight ball assigned to each bin by the optimum is much smaller
than OPT, i.e., Σj maxi:opt(i)=j wi,j ≪ OPT. Both of these assumptions are valid in
real-world applications of this problem in online advertising. The first assumption
also appears in the AdWords problem, and the second assumption aims to get rid of
some degenerate cases in which the optimum solution is very small.
2.1 Notations
Let G(X, Y, E) be a (weighted) bipartite graph, where X := {x1, . . . , xn} is the set of
online nodes (or balls), and Y := {y1, . . . , ym} is the set of fixed nodes (or bins). For
each pair of nodes xi, yj, wi,j represents the weight of edge (xi, yj). Each fixed node
yj is associated with a weighted capacity (or budget) cj > 0. The online matching
problem is as follows: first a permutation σ ∈ Sn is chosen (possibly according to an
unknown distribution); then at times t = 1, 2, . . . , n, the ball xσ(t)
arrives and its incident edges are revealed to the algorithm. The algorithm can assign
this ball to at most one of the bins that are adjacent to it. The total weight of balls
assigned to each bin yj may not exceed its weighted capacity cj. The objective is to
maximize the weight of the final matching.
Given the graph G, the optimum offline solution is the maximum weighted bipar-
tite matching in G respecting the weighted capacities, i.e., the total weight of balls
assigned to a bin yj may not exceed cj. For each ball xi, let opt(i) denote the index
of the bin that xi is being matched to in the optimum solution, and alg(i) be the
index of the bin that xi is matched to in the algorithm. Also for each node yj ∈ Y ,
let oj be the weighted degree of yj in the optimum solution. Observe that for each
j, we have 0 ≤ oj ≤ cj. By definition, the size of the optimum solution is
OPT = Σj oj. Throughout the chapter, we use OPT as the total weight of the
optimal solution, and ALG as the total weight of the output of the online algorithm.
Throughout this chapter, we make the assumption that the weights of the edges
are small compared to the capacities, i.e., maxi,j wi,j is small compared to minj cj.
Also, we assume that the aggregate sum of the largest-weight ball assigned to each
bin by the optimum is much smaller than OPT, i.e., Σj maxi:opt(i)=j wi,j ≪ OPT. In
particular, let
γ ≥ max
maxi,j
wi,jcj,
√∑j maxi:opt(i)=j wi,j
OPT
(2.1)
the guarantees of our algorithm are provided for the case when γ → 0. For jus-
tifications behind these mild assumptions, see the discussion in the Introduction.
Throughout this chapter, wlog we assume that the optimum matches all of the balls,
otherwise we can throw out the unmatched ball and it can only make the competitive
ratio of the algorithm worse.
2.2 Main Ideas
In this section, we describe the main ideas of the proof. We start by defining the
algorithms as deterministic greedy algorithms optimizing specific scoring functions.
We also define a concave potential function as an indefinite integral of the scoring
function, and show that a “good” greedy algorithm must try to maximize the potential
function. In section 2.2.1, we show that if σ is chosen uniformly at random, then we
can lower-bound the increase of the potential in an ε fraction of the process; finally, in
section 2.2.2 we write a factor-revealing mathematical program based on the potential
function and its changes.
We consider a class of deterministic greedy algorithms that assign each incoming
ball xσ(t) based on a “scoring function” defined over the bins. Roughly speaking,
the scoring function characterizes the “quality” of a bin, and a larger score implies
a better-quality bin. In this chapter, we assume that the score is independent of
the particular labeling of the bins, and it is a non-negative, non-increasing function
of the fraction of the bin's capacity that is saturated so far (roughly speaking, these
algorithms try to avoid over-saturating a bin while the rest are almost empty). Throughout this
chapter, we assume that the scoring function and its derivative are bounded (i.e.,
|f ′(.)|, |f(.)| ≤ 1). However, all of our arguments in this section can also be applied
to more general scoring functions that may even depend on the overall capacity
$c_j$ of the bins. At a particular time t, let $r_j(t)$ represent the fraction of the capacity
of the bin yj that is saturated so far. Let f(rj(t)) be the score of bin yj at time t.
When the ball xσ(t+1) arrives, the greedy algorithm simply computes the score of all
of the bins and assigns xσ(t+1) to the bin yj maximizing the product of wσ(t+1),j and
f(rj(t)).
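As a concrete illustration of this greedy rule, here is a small sketch of our own (not code from the thesis; the list-of-lists instance format and the function names are invented). Each arriving ball is irrevocably assigned to the eligible bin maximizing $w_{i,j} f(r_j(t))$:

```python
import math

def greedy_assign(weights, capacities, arrival_order, f):
    """Assign each arriving ball to the bin maximizing w[i][j] * f(r_j),
    where r_j is the fraction of bin j's capacity saturated so far.
    weights[i][j] is the weight of ball i on bin j (0 means no edge)."""
    load = [0.0] * len(capacities)
    total = 0.0
    for i in arrival_order:
        best_j, best_score = None, 0.0
        for j, c in enumerate(capacities):
            w = weights[i][j]
            if w <= 0 or load[j] + w > c:   # no edge, or bin would overflow
                continue
            score = w * f(load[j] / c)
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None:              # irrevocably assign (else drop the ball)
            load[best_j] += weights[i][best_j]
            total += weights[i][best_j]
    return total

# The two scoring functions discussed next are instances of this rule:
f_u = lambda r: 1 - r                 # Balance
f_w = lambda r: 1 - math.exp(r - 1)   # Weighted-Balance
```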
Kalyanasundaram and Pruhs [54] designed the algorithm Balance using the scoring function $f_u(r_j(t)) := 1 - r_j(t)$ (i.e., the algorithm simply assigns an incoming ball to the neighbor with the smallest ratio if that ratio is less than 1, and drops the ball otherwise). They show that for any unweighted graph G, it achieves a $1 - 1/e$ competitive ratio against any adversarially chosen permutation σ. Mehta et al. [65] generalized this algorithm to weighted graphs by defining the scoring function $f_w(r_j(t)) = 1 - e^{r_j(t)-1}$. Their algorithm, denoted by Weighted-Balance, achieves a competitive ratio of $1 - 1/e$ for the AdWords problem in the adversarial model. We note that both algorithms never over-saturate bins (i.e., $0 \le r_j(t) \le 1$). Other scoring functions have also
been considered for other variants of the problem (see e.g. [63, 33]). Intuitively, these
scoring functions are chosen to ensure that the algorithm assigns each ball $x_{\sigma(t)}$ as close to $y_{\mathrm{opt}(\sigma(t))}$ as possible. When the permutation is chosen adversarially, any scoring function fails to perfectly track the optimum assignment (as discussed before, no online algorithm can achieve a competitive ratio better than $1 - 1/e$ in the adversarial model). However, we hope that when σ is chosen uniformly at random, for any adversarially chosen graph G, the algorithm can almost capture the optimum assignment. In the following we formalize this observation.
We measure the performance of the algorithm at time t via a potential function that, in some sense, compares the quality of the overall decisions of the algorithm with those of the optimum. Assuming the optimum solution saturates all of the bins (i.e., $c_j = o_j$ for all j), the potential function achieves its maximum at the end of the algorithm if the balls are assigned exactly according to the optimum. The closer the value of the potential function is to the optimum, the better the assignment of the balls.
We define the potential function as the weighted sum of the indefinite integrals of the scoring functions of the bins chosen by the algorithm:
$$\phi(t) := \sum_j c_j \int_{r=0}^{r_j(t)} f(r)\,dr = \sum_j c_j F(r_j(t)).$$
In particular, we use the following potential functions for the Balance and Weighted-Balance algorithms respectively:
$$\phi_u(t) := -\frac{1}{2} \sum_j c_j (1 - r_j(t))^2 \tag{2.2}$$
$$\phi_w(t) := \sum_j c_j \left( r_j(t) - e^{r_j(t)-1} \right). \tag{2.3}$$
Observe that since the scoring function is a non-increasing function of the ratios, its antiderivative $F(\cdot)$ is a concave function of the ratios. Moreover, since the scoring function is always non-negative, the value of the potential function never decreases during the run of the algorithm. By this definition, the greedy algorithm can be seen as an online gradient ascent algorithm that tries to maximize a concave potential function: for each arriving ball $x_{\sigma(t)}$, it assigns the ball to the bin that yields the largest local increase in the function.
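To make this gradient view concrete, here is a small sketch of our own (not the thesis code) of the potential $\phi_w$ from (2.3) and the local increase caused by one assignment; when $w_{i,j}/c_j$ is small, the increase is close to $w_{i,j} f_w(r_j(t))$:

```python
import math

def F_w(r):
    # antiderivative of the scoring function f_w(r) = 1 - e^(r-1)
    return r - math.exp(r - 1)

def phi_w(loads, capacities):
    # potential from (2.3): phi_w(t) = sum_j c_j * F_w(r_j(t))
    return sum(c * F_w(l / c) for l, c in zip(loads, capacities))

def increase(loads, capacities, j, w):
    # change in phi_w when a ball of weight w is assigned to bin j
    new_loads = list(loads)
    new_loads[j] += w
    return phi_w(new_loads, capacities) - phi_w(loads, capacities)
```

For example, with two empty bins of capacity 100, assigning a unit-weight ball to either bin changes $\phi_w$ by roughly $f_w(0) = 1 - 1/e \approx 0.632$.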
To analyze the performance of the algorithm we lower-bound the increase in the
value of the potential function based on the optimum matching. This allows us to
show that the final value of the potential function achieved by the algorithm is close
to its value in the optimum, and thus bound the competitive ratio of the algorithm. In the
next section, we use the fact that σ is chosen randomly to lower-bound the increase in
εn steps. Finally, in section 2.2.2 we write a factor-revealing mathematical program
to compute the competitive ratio of the greedy algorithm.
2.2.1 Lower bounding the Increase in the Potential Function
In this part, we use the randomness defined on the permutation σ to argue that with
high probability the value of the potential function must have a significant increase
during the run of the algorithm. We define a particular event $\mathcal{E}$ corresponding to the event that the arrival process of the balls is approximately close to its expectation. To show that $\mathcal{E}$ occurs with high probability, we only consider the distribution of arriving balls at $1/\varepsilon$ equally spaced times; as a result we can monitor the amount of increase in the potential function over these time intervals. For a carefully chosen $0 < \varepsilon < 1$, we divide the process into $1/\varepsilon$ slabs such that the $k$th slab consists of the balls arriving at times $[kn\varepsilon + 1, (k+1)n\varepsilon]$. Assuming σ is chosen uniformly at random, we show a concentration bound on the weight of the balls arriving in the $k$th slab. Using that, we lower-bound $\phi((k+1)n\varepsilon) - \phi(kn\varepsilon)$ in Lemma 2.2.2.
First we use the randomness to bound the weight of the balls arriving in the $k$th slab. Let $I_{i,k}$ be the indicator random variable for the event that the $i$th ball arrives in the $k$th slab. Observe that for any $k$, the indicators $I_{i,k}$ are negatively correlated: knowing that $I_{i,k} = 1$ can only decrease the probability of the arrival of the other balls in the $k$th slab (i.e., $P[I_{i,k} \mid I_{i',k} = 1] \le P[I_{i,k}]$). Define $N_{j,k} := \sum_{i:\mathrm{opt}(i)=j} w_{i,j} I_{i,k}$ as the sum of the weights of the balls that are matched to the $j$th bin in the optimum and arrive in the $k$th slab. It is easy to see that $E_\sigma[N_{j,k}] = \varepsilon \cdot o_j$; moreover, since it is a linear combination of negatively correlated random variables, it is concentrated around its mean. Define $h(k) := \sum_j |N_{j,k} - \varepsilon o_j|$. The following lemma shows that $h(k)$ is very close to zero for all time slabs $k$ with high probability. Intuitively, this implies that with high probability the balls from each slab are assigned similarly in the optimum solution.
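The expectation claim $E_\sigma[N_{j,k}] = \varepsilon \cdot o_j$ is easy to sanity-check by simulation. The sketch below is our own (the instance of 100 balls with unit or half weights, all matched to a single bin, is invented for illustration); it estimates $N_{j,k}$ over random permutations:

```python
import random

def estimate_slab_weights(n, eps, opt_weights, trials=2000, seed=0):
    """Monte Carlo estimate of E[N_{j,k}] for one bin j: the expected total
    weight of the balls matched to j by the optimum that land in slab k.
    opt_weights[i] = w_{i,opt(i)}; slab k covers positions [k*n*eps, (k+1)*n*eps)."""
    rng = random.Random(seed)
    slab_len = int(n * eps)
    slabs = int(1 / eps)
    acc = [0.0] * slabs
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)
        for pos, i in enumerate(perm):
            acc[pos // slab_len] += opt_weights[i]
    return [a / trials for a in acc]

# o_j = 75 here, so each slab should receive about eps * o_j = 18.75 in expectation
w = [1.0] * 50 + [0.5] * 50
est = estimate_slab_weights(100, 0.25, w)
```

Each of the four estimates comes out close to $\varepsilon \cdot o_j$, matching the claim.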
Lemma 2.2.1. Let $h(k) := \sum_j |N_{j,k} - \varepsilon o_j|$. Then $P_\sigma\!\left[\forall k,\; h(k) \le \frac{5\gamma}{\varepsilon\delta}\mathrm{OPT}\right] \ge 1 - \delta$.
Proof. It suffices to show that $P\!\left[h(k) > \frac{5\gamma}{\varepsilon\delta}\mathrm{OPT}\right] \le \delta\varepsilon$; the lemma can then be proved by a simple application of the union bound. First we use the Azuma-Hoeffding concentration bound to compute $E[|N_{j,k} - \varepsilon o_j|]$; then we simply apply Markov's inequality to upper-bound $h(k)$.
Let $W_j := \sqrt{2\sum_{i:\mathrm{opt}(i)=j} w_{i,j}^2}$; for any $j, k$, we show $E[|N_{j,k} - \varepsilon o_j|] \le 3W_j$. Since $N_{j,k}$ is a linear combination of the negatively correlated random variables $I_{i,k}$ for $\mathrm{opt}(i) = j$, and $E[N_{j,k}] = \varepsilon \cdot o_j$, by a generalization of the Azuma-Hoeffding bound to negatively correlated random variables [70] we have
$$E[|N_{j,k} - \varepsilon \cdot o_j|] \le W_j \sum_{l=0}^{\infty} P\!\left[|N_{j,k} - E[N_{j,k}]| \ge l \cdot W_j\right] \le W_j \left(1 + 2\sum_{l=1}^{\infty} e^{-\frac{l^2 W_j^2}{2\sum_{i:\mathrm{opt}(i)=j} w_{i,j}^2}}\right) \le W_j \left(1 + 2\sum_{l=1}^{\infty} e^{-l^2}\right) \le 3W_j. \tag{2.4}$$
Let $w_{\max}(j) := \max_{i:\mathrm{opt}(i)=j} w_{i,j}$. Since $W_j^2$ is twice the sum of the squares of the weights assigned to the $j$th bin in the optimum, and each such weight is at most $w_{\max}(j)$ while the weights sum to $o_j$, we can write $W_j \le \sqrt{2 w_{\max}(j) o_j}$. Therefore, by the linearity of expectation we have
$$E[h(k)] \le \sum_j 3\sqrt{2 w_{\max}(j) o_j} \le 5 \sum_j \frac{w_{\max}(j)/\gamma + \gamma o_j}{2} \le 5\left(\frac{1}{2\gamma}\sum_j w_{\max}(j) + \frac{\gamma}{2}\mathrm{OPT}\right) \le 5\gamma\,\mathrm{OPT},$$
where the last inequality follows from assumption (2.1). Since $h(k)$ is a non-negative random variable, by Markov's inequality we get $P\!\left[h(k) > \frac{5\gamma}{\varepsilon\delta}\mathrm{OPT}\right] \le \delta\varepsilon$. The lemma follows by applying this inequality for all $k \in \{0, \ldots, 1/\varepsilon\}$ and using the union bound.
Let $\mathcal{E}$ be the event that $\forall k,\; h(k) \le \frac{5\gamma}{\varepsilon\delta}\mathrm{OPT}$. The next lemma shows that, conditioned on $\mathcal{E}$, one can lower-bound the increase in the potential function in any slab (i.e., $\phi((k+1)n\varepsilon) - \phi(kn\varepsilon)$ for any $0 \le k < 1/\varepsilon$):
Lemma 2.2.2. Conditioned on $\mathcal{E}$, for any $0 \le k < 1/\varepsilon$, $t_0 = kn\varepsilon$, and $t_1 = (k+1)n\varepsilon$, we have
$$\phi(t_1) - \phi(t_0) \ge \varepsilon \sum_j o_j f(r_j(t_1)) - \frac{6\gamma}{\varepsilon\delta}\mathrm{OPT}.$$
Proof. First we compute the increase of the potential function at time t + 1, for some $t_0 \le t < t_1$. Then, we lower-bound the increase using the monotonicity of the scoring function $f(\cdot)$. Finally, we condition on $\mathcal{E}$ and lower-bound the final expression in terms of OPT. Let $\sigma(t+1) = i$, and assume the algorithm assigns $x_i$ to the $j$th bin (i.e., $\mathrm{alg}(i) = j$); since the algorithm maximizes $w_{i,j} f(r_j(t))$, we can lower-bound the increase of the potential function based on the optimum. First, using the mean value theorem of calculus we have
$$c_j\left( F\!\left(r_j(t) + \frac{w_{i,j}}{c_j}\right) - F(r_j(t)) \right) = c_j\left( \frac{w_{i,j}}{c_j} f(r_j(t)) + \frac{1}{2}\left(\frac{w_{i,j}}{c_j}\right)^2 f'(r^*) \right),$$
for some $r^* \in [r_j(t), r_j(t) + w_{i,j}/c_j]$. Since the derivative of $f(\cdot)$ is bounded (i.e., $|f'(r)| \le 1$ for all $r \in [0,1]$), we get
$$\phi(t+1) - \phi(t) = c_j\left( F\!\left(r_j(t) + \frac{w_{i,j}}{c_j}\right) - F(r_j(t)) \right) \ge w_{i,\mathrm{opt}(i)} f(r_{\mathrm{opt}(i)}(t)) - w_{i,j}\frac{w_{i,j}}{c_j} \ge w_{i,\mathrm{opt}(i)} f(r_{\mathrm{opt}(i)}(t)) - \gamma w_{i,j}, \tag{2.5}$$
where the first inequality follows from the greedy decision chosen by the algorithm, $w_{i,j} f(r_j(t)) \ge w_{i,\mathrm{opt}(i)} f(r_{\mathrm{opt}(i)}(t))$, and the last inequality follows from assumption (2.1).
Consider a single run of the algorithm; wlog we assume that ALG ≤ OPT. We can monitor the amount of increase in the potential function in the $k$th slab as follows:
$$\phi(t_1) - \phi(t_0) \ge \sum_{t=t_0}^{t_1-1}\left( w_{\sigma(t),\mathrm{opt}(\sigma(t))} f(r_{\mathrm{opt}(\sigma(t))}(t)) - \gamma w_{\sigma(t),\mathrm{alg}(\sigma(t))} \right) \ge \sum_j \sum_{\substack{t_0 \le t < t_1\\ \mathrm{opt}(\sigma(t))=j}} f(r_j(t_1))\, w_{\sigma(t),j} - \gamma\,\mathrm{OPT} = \sum_j f(r_j(t_1))\, N_{j,k} - \gamma\,\mathrm{OPT},$$
where the second inequality follows from the fact that $f(\cdot)$ is a non-increasing function of the ratio and $\sum_{t_0 \le t < t_1} w_{\sigma(t),\mathrm{alg}(\sigma(t))} \le \mathrm{ALG} \le \mathrm{OPT}$, and the equality follows from the definition of $N_{j,k}$. By Lemma 2.2.1 we know $N_{j,k}$ is highly concentrated around $\varepsilon \cdot o_j$. Conditioned on $\mathcal{E}$, we have $h(k) \le \frac{5\gamma}{\varepsilon\delta}\mathrm{OPT}$, thus:
$$\phi(t_1) - \phi(t_0) \ge \varepsilon \sum_j f(r_j(t_1))\, o_j - \sum_j |N_{j,k} - \varepsilon o_j| - \gamma\,\mathrm{OPT} \ge \varepsilon \sum_j f(r_j(t_1))\, o_j - h(k) - \gamma\,\mathrm{OPT} \ge \varepsilon \sum_j f(r_j(t_1))\, o_j - \frac{6\gamma}{\varepsilon\delta}\mathrm{OPT},$$
where the first inequality follows from the assumption $|f(\cdot)| \le 1$.
2.2.2 Description of the Factor-Revealing Mathematical Program
In this section we propose a factor-revealing mathematical program that lower-bounds the competitive ratio of the algorithms Balance and Weighted-Balance. In sections 2.3 and 2.4 we derive a relaxation of the program and analyze that relaxation. Interestingly, the main non-trivial constraints follow from the lower bounds obtained for the amount of increase in the potential function in Lemma 2.2.2.

The details of the program are described in MP(1). It is worth noting that the last inequality in this program follows from the monotonicity property of the ratios. In other words, we assume the ratio function $r_j(t)$ is a non-decreasing function of t.
MP(1):
minimize $\frac{1}{\mathrm{OPT}} \sum_j \min\{r_j(n), 1\}\, c_j$
subject to
$\sum_j c_j F(r_j(t)) = \phi(t)$, ∀t ∈ [n],
$\varepsilon \sum_j o_j f(r_j((k+1)n\varepsilon)) - \frac{6\gamma}{\varepsilon\delta}\mathrm{OPT} \le \phi((k+1)n\varepsilon) - \phi(kn\varepsilon)$, ∀k ∈ [1/ε − 1],
$o_j \le c_j$, ∀j ∈ [m],
$\sum_j o_j = \mathrm{OPT}$,
$r_j(t) \le r_j(t+1)$, ∀j, t ∈ [n − 1].
The following proposition summarizes our arguments and shows that the program
MP(1) is a relaxation for any deterministic greedy algorithm that works based on
a scoring function. It is worth noting that the whole argument still follows when
the scoring function is not necessarily non-negative; we state the proposition in this
general form.
Proposition 2.2.3. Let f be any non-increasing scoring function of the ratios rj(t)
of the bins such that |f(r)|, |f ′(r)| ≤ 1 for the range of ratios that may be encountered
in the running time of the algorithm. For any (weighted) graph G = (X, Y ), and
ε > 0, with probability at least 1−δ, MP(1) is a factor-revealing mathematical program
for the greedy deterministic algorithm that uses scoring function f(.).
Since the potential function $F(\cdot)$ is a concave function, this program may not be solvable in polynomial time. In section 2.4, we show that after adding a new constraint it is possible to analyze the program analytically for unweighted graphs. To deal with this issue for weighted graphs, we write a constant-size LP relaxation of the program that lower-bounds the optimum solution (after losing a small error). Finally, we solve the constant-size LP with an LP solver, and thus obtain a nearly tight bound for the competitive ratio of Weighted-Balance (see section 2.3 for more details).
In the rest of this section, we write a simpler mathematical program that will be
used later in section 2.3 for analyzing the Weighted-Balance algorithm. In particular,
we simplify the critical constraint that measures the increase in the potential function by removing the term $-\frac{6\gamma}{\varepsilon\delta}\mathrm{OPT}$. Moreover, since Weighted-Balance never over-saturates bins, we can also add the constraint $r_j(n) \le 1$ to both MP(1) and MP(2) and still have a relaxation of Weighted-Balance.
MP(2):
minimize $\sum_j r_j(n)\, c_j$
subject to
$\sum_j c_j (r_j(t) - e^{r_j(t)-1}) = \phi(t)$, ∀t ∈ [n],
$\varepsilon \sum_j o_j (1 - e^{r_j((k+1)n\varepsilon)-1}) \le \phi((k+1)n\varepsilon) - \phi(kn\varepsilon)$, ∀k ∈ [1/ε],
$o_j \le c_j$, ∀j ∈ [m],
$\sum_j o_j = 1$,
$r_j(t) \le r_j(t+1)$, ∀j, t ∈ [n − 1],
$r_j(n) \le 1$, ∀j ∈ [m].
In the next lemma we show that the optimum value of MP(1) is at least $\left(1 - \sqrt{\frac{12\gamma}{\varepsilon^2\delta}}\right)$ times the minimum of 1 and the optimum value of MP(2):

Lemma 2.2.4. For any weighted graph G, we have $\mathrm{MP}(1) \ge (1-\alpha)\min\{1, \mathrm{MP}(2)\}$, where $\alpha := \sqrt{\frac{12\gamma}{\varepsilon^2\delta}}$.
Proof. Wlog we can set $\mathrm{OPT} = 1$ in MP(1). Let $s_1 := \{r_j(t), c_j, o_j, \phi(t)\}$ be a feasible solution of MP(1). If $\sum_j r_j(n) c_j \ge 1 - \alpha$ we are done; otherwise we construct a feasible solution $s_2$ of MP(2) such that the value of $s_1$ is at least $(1-\alpha)$ times the value of $s_2$. The lemma then follows from the fact that the value of the optimum solution of MP(1) is at least $(1-\alpha)$ of the value of the optimum of MP(2).

Define $s_2 := \{r_j(t), c_j/(1-\alpha), o_j, \phi(t)/(1-\alpha)\}$. Trivially, $s_2$ satisfies all except (possibly) the second constraint of MP(2). Moreover, the value of $s_1$ is $(1-\alpha)$ times the value of $s_2$. It remains to prove the feasibility of the second constraint of MP(2), i.e.,
$$\varepsilon(1-\alpha) \sum_j o_j \left(1 - e^{r_j((k+1)n\varepsilon)-1}\right) \le \phi((k+1)n\varepsilon) - \phi(kn\varepsilon),$$
for all $k \in [1/\varepsilon]$. Since $s_1$ is a feasible solution of MP(1) we have
$$\phi((k+1)n\varepsilon) - \phi(kn\varepsilon) \ge \varepsilon \sum_j o_j \left(1 - e^{r_j((k+1)n\varepsilon)-1}\right) - \frac{\varepsilon}{2}\alpha^2 \ge \varepsilon \sum_j o_j \left(1 - e^{r_j((k+1)n\varepsilon)-1}\right) \left( 1 - \frac{\frac{\varepsilon}{2}\alpha^2}{\frac{\varepsilon}{2}\sum_j o_j (1 - r_j((k+1)n\varepsilon))} \right), \tag{2.6}$$
where the last inequality follows from the assumption that $0 \le r_j(t) \le 1$ and the fact that $1 - e^{x-1} \ge \frac{1}{2}(1-x)$ for $x \in [0,1]$. On the other hand, since $\sum_j r_j(n) c_j < 1 - \alpha$,
we can write
$$\sum_j o_j\left(1 - r_j((k+1)n\varepsilon)\right) \ge 1 - \sum_j c_j r_j(n) \ge \alpha.$$
The lemma simply follows from putting the above inequality together with equation (2.6).
2.3 The Competitive Ratio of Weighted-Balance
In this section, we lower-bound the competitive ratio of the Weighted-Balance algorithm in the random arrival model. More specifically, we prove the following theorem:

Theorem 2.3.1. For any weighted graph $G = (X, Y)$ and $\delta > 0$, with probability $1 - \delta$, the competitive ratio of the Weighted-Balance algorithm in the random arrival model is at least $0.76\left(1 - O(\sqrt{\gamma/\delta})\right)$.
To prove the bound in this theorem, we write a constant-size linear programming relaxation of the problem based on MP(2) and solve it with an LP solver. The two main difficulties with solving program MP(2) are as follows: first, as discussed in section 2.2.2, MP(2) is not a convex program; second, the size of the program (i.e., the number of variables and constraints) is a function of the size of the graph G. The main idea follows from a simple observation: the main inequalities in MP(2), those lower-bounding the increase in the potential function, do so only at a constant (i.e., 1/ε) number of times. Hence, we do not need to keep track of the ratios and the potential function for all t ∈ [n]; instead it suffices to monitor these values at 1/ε critical times (i.e., at times knε for k ∈ [1/ε]), for a constant ε. Even at those critical times it suffices to approximately monitor the ratios of the bins by discretizing the ratios into 1/ε slabs.
For any integers $0 \le i < 1/\varepsilon$, $0 \le k \le 1/\varepsilon$, let $c_{i,k}$ be the sum of the capacities of the bins of ratio $r_j(kn\varepsilon) \in [i\varepsilon, (i+1)\varepsilon)$, and $o_{i,k}$ be the sum of the weighted degrees of the bins of ratio $r_j(kn\varepsilon) \in [i\varepsilon, (i+1)\varepsilon)$ in the optimum solution, i.e.,
$$c_{i,k} := \sum_{j: r_j(kn\varepsilon) \in [i\varepsilon, (i+1)\varepsilon)} c_j, \qquad o_{i,k} := \sum_{j: r_j(kn\varepsilon) \in [i\varepsilon, (i+1)\varepsilon)} o_j. \tag{2.7}$$
Now we are ready to describe the constant-size LP relaxation of MP(2). We write the
LP relaxation in terms of the new variables ci,k, oi,k. In particular, instead of writing
the constraints in terms of the actual ratios of the bins, we round down (or round
up) the ratios to the nearest multiple of ε such that the constraint remains satisfied.
The details are described in LP(1).
LP(1):
minimize $\frac{1}{1-1/e}\left( \phi\!\left(\frac{1}{\varepsilon}\right) - \sum_{i=0}^{1/\varepsilon-1} c_{i,1/\varepsilon}\left(\frac{i\varepsilon}{e} - e^{i\varepsilon-1}\right) \right)$
subject to
$\sum_{i=0}^{1/\varepsilon-1} c_{i,k}\left(i\varepsilon - e^{i\varepsilon-1}\right) \le \phi(k)$, ∀k ∈ [1/ε],
$\sum_{i=0}^{1/\varepsilon-1} \varepsilon\, o_{i,k+1}\left(1 - e^{(i+1)\varepsilon-1}\right) \ge \phi(k+1) - \phi(k)$, ∀k ∈ [1/ε − 1],
$c_{i,k} \ge o_{i,k}$, ∀i ∈ [1/ε − 1], k ∈ [1/ε],
$\sum_{i=0}^{1/\varepsilon-1} o_{i,k} = 1$, ∀k ∈ [1/ε],
$\sum_{l=i}^{1/\varepsilon-1} c_{l,k+1} \ge \sum_{l=i}^{1/\varepsilon-1} c_{l,k}$, ∀i ∈ [1/ε − 1], k ∈ [1/ε − 1].
In the next lemma, we show that LP(1) is a linear programming relaxation of the program MP(2):
Lemma 2.3.2. The optimum value of LP(1) lower-bounds the optimum value of
MP(2).
Proof. We show that for any feasible solution $s := \{r_j(t), c_j, o_j, \phi(t)\}$ of MP(2), we can construct a feasible solution $s' = \{c'_{i,k}, o'_{i,k}, \phi'(k)\}$ of LP(1) with a smaller objective value. In particular, we construct $s'$ simply by using equation (2.7) and letting $\phi'(k) := \phi(kn\varepsilon)$. First we show that all constraints of LP(1) are satisfied by $s'$; then we show that the value of LP(1) for $s'$ is smaller than the value of MP(2) for $s$.

The first equation of LP(1) follows from rounding down the ratios in the first equation of MP(2) to the nearest multiple of ε. The equation remains satisfied because the potential function is increasing in the ratios (i.e., $F_w(r) = r - e^{r-1}$ is increasing for $r \in [0,1]$). Similarly, the second equation of LP(1) follows from rounding up the ratios in the second equation of MP(2), noting that the scoring function is decreasing in the ratios (i.e., $f_w(r) = 1 - e^{r-1}$ is decreasing for $r \in [0,1]$). The third and fourth equations can be derived from the corresponding equations in MP(2). Finally, the last equation follows from the monotonicity property of the ratios (i.e., $r_j(t)$ is a non-decreasing function of t).
It remains to compare the values of the two solutions $s$ and $s'$. We have
$$\frac{1}{1-1/e}\left( \phi'\!\left(\tfrac{1}{\varepsilon}\right) - \sum_{i=0}^{1/\varepsilon-1} c'_{i,1/\varepsilon}\left(\frac{i\varepsilon}{e} - e^{i\varepsilon-1}\right) \right) \le \frac{1}{1-1/e}\left( \phi(n) - \sum_j c_j\left(\frac{r_j(n)}{e} - e^{r_j(n)-1}\right) \right) = \sum_j c_j r_j(n),$$
where the inequality follows from the fact that $r/e - e^{r-1}$ is a decreasing function for $r \in [0,1]$, and the equality follows from the definition of $\phi_w(\cdot)$ (i.e., the first equation of MP(2)).
Now we are ready to prove Theorem 2.3.1.

Proof of Theorem 2.3.1. By Proposition 2.2.3, for any ε > 0, with probability 1 − δ the competitive ratio of Weighted-Balance is lower-bounded by the optimum of MP(1). On the other hand, by Lemma 2.2.4 the optimum solution of MP(1) is at least $\left(1 - \sqrt{\frac{12\gamma}{\varepsilon^2\delta}}\right)$ of the optimum solution of MP(2). Finally, by Lemma 2.3.2 the optimum solution of MP(2) is at least the optimum of LP(1). Hence, with probability 1 − δ, the competitive ratio of Weighted-Balance is at least $\left(1 - \sqrt{\frac{12\gamma}{\varepsilon^2\delta}}\right)$ times the optimum of LP(1).

The constant-size linear program LP(1) can be solved numerically for any value of ε > 0. By solving this LP using an LP solver, we can show that for ε = 1/250 the optimum solution is greater than 0.76. This implies that with probability 1 − δ the competitive ratio of Weighted-Balance is at least $0.76\left(1 - O(\sqrt{\gamma/\delta})\right)$.
Remark 2.3.3. We also remark that the optimum solution of LP(1) beats the 1 − 1/e factor even for ε = 1/8; roughly speaking, this implies that even if the permutation σ is only almost random, in the sense that each 1/8 of the input has almost the same distribution, Weighted-Balance still beats the 1 − 1/e factor.
2.4 The Competitive Ratio of Balance
In this section we show that for any unweighted graph G, under some mild assump-
tions, the competitive ratio of Balance approaches 1 in the random arrival model.
Theorem 2.4.1. For any unweighted bipartite graph $G = (X, Y, E)$ and $\delta > 0$, with probability $1 - \delta$, the competitive ratio of Balance in the random arrival model is at least $1 - \beta\frac{\sum_j c_j}{\mathrm{OPT}}$, where $\beta := 3(\gamma/\delta)^{1/6}$.
First we assume that our instance is all-saturated, meaning that the optimum solution saturates all of the bins (i.e., $c_j = o_j$ for all j), and show that the competitive ratio of the algorithm is at least $1 - 3(\gamma/\delta)^{1/6}$:

Lemma 2.4.2. For any δ > 0, with probability 1 − δ, the competitive ratio of Balance on all-saturated instances in the random arrival model is at least 1 − β.
Then we prove Theorem 2.4.1 via a simple reduction to the all-saturated case.
To prove Lemma 2.4.2, we analyze a slightly different algorithm Balance’ that
always assigns an arriving ball (possibly to an over-saturated bin); this will allow us
to keep track of the number of assigned balls at each step of the process. In particular
we have
$$\forall t \in [n]: \quad \sum_j c_j r_j(t) = t, \tag{2.8}$$
where rj(t) does not necessarily belong to [0, 1]. The latter certainly violates some of
our assumptions in Section 2.2. To avoid the violation, we provide some additional
knowledge of the optimum solution to Balance’ such that the required assumptions
are satisfied, and it achieves exactly the same weight as Balance .
We start by describing Balance’ , then we show that it still is a feasible algorithm
for the potential function framework studied in Section 2.2; in particular we show it
satisfies Proposition 2.2.3. When a ball $x_i$ arrives at time t + 1 (i.e., σ(t+1) = i), similar to Balance, Balance' assigns it to a bin maximizing $w_{i,j} f_u(r_j(t))$; let j be such a bin. Unlike Balance, if $r_j(t) \ge 1$ (i.e., all neighbors of $x_i$ are saturated), we do not drop $x_i$; instead Balance' assigns it to the bin $y_{\mathrm{opt}(i)}$.
First note that although Balance' magically knows the optimum assignment of a ball once all of its neighbors are saturated, it achieves a matching of the same weight. This simply follows from the fact that over-saturating bins does not increase our gain and does not alter any future decisions of the algorithm. Next we use Proposition 2.2.3 to show that MP(1) is indeed a mathematical programming relaxation for Balance'. By Proposition 2.2.3, we just need to verify that $|f_u(\cdot)|, |f'_u(\cdot)| \le 1$ for all the ratios we might encounter in the run of Balance'. Since $f_u(r) = 1 - r$ and the ratios are always non-negative, it is sufficient to show that the ratios are always upper-bounded by 2. To prove this, we crucially use the fact that Balance' has access to the optimum assignment for the balls assigned to the over-saturated bins. Observe that the set of balls assigned to a bin after it becomes saturated is always a subset of the balls assigned to it in the optimum assignment. Since the ratios of all bins are at most 1 in the optimum, they are upper-bounded by 2 in Balance'.
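The behavior of Balance' is easy to simulate on a small unweighted instance (a sketch of our own; the adjacency-list representation and the sample instance are invented). The loop also checks the identity (2.8) and the ratio bound just argued:

```python
def balance_prime(adj, capacities, opt_bin, order):
    """Balance' on an unweighted instance: send each ball to its neighbor with
    the smallest ratio; if all neighbors are saturated, fall back to the bin
    the ball receives in the optimum (opt_bin), as Balance' does.
    Returns the final load of every bin."""
    load = [0] * len(capacities)
    for t, i in enumerate(order, start=1):
        j = min(adj[i], key=lambda b: load[b] / capacities[b])  # best f_u score
        if load[j] >= capacities[j]:        # every neighbor is saturated
            j = opt_bin[i]
        load[j] += 1
        assert sum(load) == t               # identity (2.8): sum_j c_j r_j(t) = t
        assert all(load[b] <= 2 * capacities[b] for b in range(len(load)))
    return load
```

Because every ball is assigned, $\sum_j c_j r_j(t) = \sum_j \mathrm{load}_j = t$ holds at every step, exactly as in (2.8).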
The following is a simple mathematical programming relaxation used to analyze Balance' on all-saturated instances:

MP(3):
minimize $\sum_j \min\{r_j(n), 1\}\, c_j$
subject to
$\sum_j c_j r_j(t) = t$, ∀t ∈ [n],
$\varepsilon \sum_j c_j (1 - r_j((k+1)n\varepsilon)) - \frac{6\gamma}{\varepsilon\delta}\mathrm{OPT} \le \phi_u((k+1)n\varepsilon) - \phi_u(kn\varepsilon)$, ∀k ∈ [1/ε − 1].

Note that the first constraint follows from (2.8), and the second constraint follows from the second constraint of MP(1) and the fact that $c_j = o_j$ in all-saturated instances.
Now we are ready to prove Lemma 2.4.2:
Proof of Lemma 2.4.2. With probability 1 − δ, MP(3) is a mathematical programming relaxation of Balance'. First we sum up all 1/ε of the second constraints of MP(3) to obtain a lower bound on φu(n), and find that φu(n) is very close to zero (intuitively, the algorithm almost manages to optimize the potential function). Then, we simply apply the Cauchy-Schwarz inequality to φu(n) to bound the loss of Balance'.
We sum up the second constraint of MP(3) for all $k \in \{0, 1, \ldots, \frac{1}{\varepsilon} - 1\}$; the RHS
telescopes and we obtain
$$\phi_u(n) - \phi_u(0) \ge \mathrm{OPT}\left(1 - \frac{6\gamma}{\varepsilon^2\delta}\right) - \varepsilon \sum_{k=0}^{1/\varepsilon - 1} \sum_j c_j r_j((k+1)n\varepsilon) \ge n\left(1 - \frac{6\gamma}{\varepsilon^2\delta}\right) - \varepsilon^2 n \sum_{k=0}^{1/\varepsilon - 1} (k+1) \ge n\left(\frac{1}{2} - \frac{\varepsilon}{2} - \frac{6\gamma}{\varepsilon^2\delta}\right),$$
where the first inequality follows from the assumption that the instance is all-saturated, and the second inequality follows from applying the first constraint of MP(3) for $t = (k+1)n\varepsilon$ and the fact that $\mathrm{OPT} = n$. Since $\phi_u(0) = -\frac{1}{2}\sum_j c_j(1 - r_j(0))^2 = -n/2$, we obtain $\phi_u(n) \ge -n\left(\frac{\varepsilon}{2} + \frac{6\gamma}{\varepsilon^2\delta}\right)$.
Observe that only the non-saturated bins incur a loss to the algorithm, i.e.,
$$\mathrm{Loss}(\text{Balance'}) = \sum_{j: r_j(n) < 1} c_j(1 - r_j(n)).$$
Using the lower bound on $\phi_u(n)$ we have
$$\sum_{j: r_j(n) < 1} c_j(1 - r_j(n)) \le \sqrt{\sum_{j: r_j(n) < 1} c_j(1 - r_j(n))^2 \cdot \sum_{j: r_j(n) < 1} c_j} \le \sqrt{-2\phi_u(n) \cdot n} \le n\sqrt{\varepsilon + \frac{12\gamma}{\varepsilon^2\delta}},$$
where the first inequality follows from the Cauchy-Schwarz inequality, and the second inequality follows from the definition of $\phi_u(n)$. The lemma simply follows from choosing $\varepsilon = 2(2\gamma/\delta)^{1/3}$ in the above inequality.
Next we prove Theorem 2.4.1; we analyze the general instances by a reduction to
all-saturated instances.
Proof of Theorem 2.4.1. Let G = (X, Y) be an unweighted graph; similar to Lemma 2.4.2, it is sufficient to analyze Balance' on G. For every bin $y_j$ we introduce $c_j - o_j$ dummy balls that are only adjacent to the $j$th bin, and let $G' = (X', Y)$ be the new instance. First we show that the expected number of non-dummy balls matched by Balance' in G' is at most the expected size of the matching that Balance' achieves in G. We then analyze the performance of Balance' on G simply using Lemma 2.4.2 and eliminating the effect of the dummies.

Fix a permutation $\sigma \in S_{|X'|}$; let $W'(\sigma)$ be the number of non-dummy balls matched by Balance' on σ. Similarly, let $W(\sigma[X])$ be the size of the matching obtained on $\sigma[X]$ in G, where $\sigma[X]$ is the projection of σ onto X. Using an argument similar to [17, Lemma 2] (e.g., the monotonicity property), one can show that $W'(\sigma) \le W(\sigma[X])$ for all $\sigma \in S_{|X'|}$. Hence, to compute the competitive ratio of Balance' on G, it is sufficient to upper-bound the expected number of non-dummy balls left unmatched by Balance' on G'. The latter is certainly no more than the total loss of Balance' on G', which is at most $\beta \sum_j c_j$ by Lemma 2.4.2.
2.5 Hardness Results
In this section, we show that there exists a family of weighted graphs G such that for any ε > 0, any online algorithm that achieves a 1 − ε competitive ratio in the random arrival model does not achieve an approximation ratio better than a function g(ε) in the adversarial model, where g(ε) → 0 as ε → 0. More specifically, we prove something stronger:
Theorem 2.5.1. For any constants δ, ε > 0, there exists a family of weighted bipartite graphs G = (X, Y) such that any (randomized) algorithm that achieves a 1 − ε competitive ratio (in expectation) on at least a δ fraction of the permutations $\sigma \in S_{|X|}$ does not achieve more than $4\sqrt{\varepsilon}$ (in expectation) on a particularly chosen permutation in another graph G′.
As a corollary, we can show that any algorithm that achieves a competitive ratio of 1 − 1/e in the adversarial model cannot achieve an approximation factor better than 0.976 in the random arrival model. Moreover, at the end of this section,
we show that for some family of graphs the Weighted-Balance algorithm does not
achieve an approximation factor better than 0.81 in the random arrival model (see
Lemma 2.5.5 for more details). This implies that our analysis of the competitive ratio
of this algorithm is tight up to an additive factor of 5%. We start by presenting the
construction of the hard examples:
Example 2.5.2. Fix a large enough integer l > 0, and let $\alpha := \sqrt{\varepsilon}$; let $Y := \{y_1, y_2\}$ with capacities $c_1 = c_2 = l$. Let C and D be two types of balls (or online nodes), and let the set of online nodes X consist of l copies of C and $l/\alpha$ copies of D. Each type-C ball has a weight of 1 for $y_1$ and a weight of 0 for $y_2$, while a type-D ball has a weight of 1 for $y_1$ and a weight of α for $y_2$.
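For concreteness, the instance of Example 2.5.2 can be generated programmatically; this is a sketch of our own (the tuple representation of ball weights is invented):

```python
import math

def build_example(l, eps):
    """Example 2.5.2: two bins of capacity l; l type-C balls with weights
    (1, 0) and l/alpha type-D balls with weights (1, alpha), alpha = sqrt(eps)."""
    alpha = math.sqrt(eps)
    capacities = [l, l]
    balls = [(1.0, 0.0)] * l + [(1.0, alpha)] * round(l / alpha)
    return capacities, balls, alpha

def opt_value(l):
    # OPT = 2l: the l type-C balls fill y1 (weight l) and the l/alpha
    # type-D balls fill y2 (l/alpha balls of weight alpha = total weight l)
    return 2 * l
```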
First of all, observe that the optimum solution achieves a matching of weight 2l
simply by assigning all type C balls to y1, and type D balls to y2. On the other hand,
any algorithm that achieves a competitive ratio of 1 − ε in the random arrival model must match the balls "very similarly" to this strategy. However, if the algorithm uses
this strategy, then an adversary may construct an instance by preserving the first
l balls of the input followed by l/α dummy balls. But in this new instance it is
“much better” to assign all of the first l balls to y1. In the following we formalize this
observation.
Proof of Theorem 2.5.1. Let G be the graph constructed in Example 2.5.2, and let
A be a (randomized) algorithm that achieves a 1 − ε competitive ratio (in expectation) on at least a δ fraction of permutations σ ∈ Sn, where n = l + l/α, for some constant
δ > 0. First we show that there exists a particular permutation σ∗ such that there
are at most lα balls of type C among σ∗(1), . . . , σ∗(l), and algorithm A achieves at
least (1−ε)2l on σ∗. Then we show that the (expected) gain of A from the first l balls
is at most 4l√ε. Finally, we construct a new graph G′ = (X ′, Y ) and a permutation
σ′ such that the first l balls of σ′ are the same as the first l balls of σ∗. This will imply
that A does not achieve a competitive ratio better than 4√ε on G′.
To find σ∗ it suffices to show that with probability strictly more than 1 − δ, the number of type-C balls among the first l balls of a uniformly random permutation σ is at most lα. This can be proved using the Chernoff-Hoeffding bound. Let $B_i$ be a Bernoulli random variable indicating that $x_{\sigma(i)}$ is of type C, for $1 \le i \le l$. Observe that $E_\sigma[B_i] = \frac{\alpha}{1+\alpha}$, and these variables are negatively correlated.
By a generalization of the Chernoff-Hoeffding bound [70] we have
$$P\left[\sum_{i=1}^{l} B_i > \alpha l\right] \le e^{-\frac{l\alpha^3}{6}} < \delta,$$
where the last inequality follows by choosing l large enough. Hence, there exists a permutation σ∗ such that the number of type-C balls among its first l balls is at most lα, and A achieves $(1-\varepsilon)2l$ on σ∗.
Next we show that the (expected) gain of A from the first l balls of σ∗ is at most $2l(\alpha + \varepsilon/\alpha) = 4l\sqrt{\varepsilon}$. This simply follows from the observation that any ball of type D that is assigned to $y_1$ incurs a loss of α. Since the expected loss of the algorithm is at most $2l\varepsilon$ on σ∗, the expected number of type-D balls assigned to $y_1$ (in the whole process) is no more than $\frac{2l\varepsilon}{\alpha}$. We can upper-bound the (expected) gain of the algorithm from the first l balls by $l\alpha + \frac{2l\varepsilon}{\alpha} + l\alpha$, where the first term follows from the upper bound on the number of C balls, and the last term follows from the number of D balls (that may possibly be) assigned to $y_2$.
It remains to construct the adversarial instance G′ together with the permutation
σ′. G′ has the same set of bins, while X ′ is the union of the first l balls of σ∗ with
l/α dummy balls (a dummy ball has zero weight in both of the bins). We construct
σ′ by preserving the first l balls of σ∗, filling the rest with the dummy balls (i.e.,
xσ′(i) = xσ∗(i) for 1 ≤ i ≤ l). First, observe that the optimum solution in G′ achieves
a matching of weight l simply by assigning all of the first l balls to y1. On the other
hand, as we proved, the (expected) gain of algorithm A is no more than 4l√ε on
G′. Therefore, the competitive ratio of A in this adversarial instance is no more than
4√ε.
The following corollary can be proved simply by choosing δ small enough in The-
orem 2.5.1:
Corollary 2.5.3. For any constant ε > 0, any algorithm that achieves a competitive
ratio of 1 − ε in the random arrival model does not achieve strictly better than 4√ε
in the adversarial model. In particular, it implies that any algorithm that achieves
a competitive ratio of 1 − 1/e
in the adversarial model does not achieve strictly better
than 0.976 in the random order model.
It is also worth noting that Weighted-Balance achieves a competitive ratio of at least
0.89 in the random arrival model for Example 2.5.2, and the worst case happens for
α ≈ 0.48. Next we present a family of examples where Weighted-Balance does
not achieve a factor better than 0.81 in the random arrival model.
Example 2.5.4. Fix a large enough integer n > 0 and α < 1; again let Y := {y1, y2}
with capacities c1 = n and c2 = n². Let X be a union of n identical balls, each of
weight 1 for y1 and α for y2.
Lemma 2.5.5. For a sufficiently large n and a particularly chosen α > 0, the
competitive ratio of Weighted-Balance in the random arrival model for Example
2.5.4 is no more than 0.81.
Proof. First observe that the optimum solution achieves a matching of weight n sim-
ply by assigning all balls to y1. Intuitively, Weighted-Balance starts with the same
strategy, but after partially saturating y1, it sends the rest to y2 (note that each ball
that is sent to y2 incurs a loss of 1 − α to the algorithm). Recall that r1(n) is the
ratio of y1 at the end of the algorithm. The lemma essentially follows from upper-bounding
r1(n) by 1 + 1/n + ln(1 − α(1 − e^{1/n−1})). Since the algorithm achieves a
matching of weight exactly r1(n)n + (1 − r1(n))nα, and OPT = n, the competitive
ratio is r1(n) + (1 − r1(n))α. By optimizing over α, one can show that the minimum
competitive ratio is no more than 0.81, and it is achieved by choosing α ≈ 0.55.
It remains to show that r1(n) ≤ 1 + 1/n + ln(1 − α(1 − e^{1/n−1})). Let t be the last
time where a ball is assigned to y1 (i.e., r1(t − 1) + 1/n = r1(t) = r1(n)). Since the
ball at time t is assigned to y1, we have
1 · fw(r1(t − 1)) ≥ α · fw(r2(t − 1)) ≥ α · fw(1/n),

where the last inequality follows from the fact that the ratio of the second bin cannot
be more than α · n/c2 < 1/n, and fw(·) is a non-increasing function of the
ratios. Using fw(r) = 1 − e^{r−1} and r1(t − 1) + 1/n = r1(n), we obtain that
r1(n) ≤ 1 + 1/n + ln(1 − α(1 − e^{1/n−1})).
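The bound in the proof is easy to evaluate numerically. The following sketch (function name ours, not from the thesis) scans α and confirms the worst case claimed in the lemma:

```python
import math

def wb_ratio_bound(alpha, n=10**9):
    # Upper bound on the final fill ratio r1(n) of bin y1 from the proof:
    # r1(n) <= 1 + 1/n + ln(1 - alpha * (1 - e^{1/n - 1})).
    r1 = 1 + 1 / n + math.log(1 - alpha * (1 - math.exp(1 / n - 1)))
    # The algorithm earns r1(n)*n from y1 and (1 - r1(n))*n*alpha from y2,
    # while OPT = n, so the competitive ratio is r1(n) + (1 - r1(n))*alpha.
    return r1 + (1 - r1) * alpha

# Scan alpha to find the worst case of Example 2.5.4 for Weighted-Balance.
best = min((wb_ratio_bound(a / 1000), a / 1000) for a in range(1, 1000))
print(best)  # ratio ≈ 0.808, attained near alpha ≈ 0.55
```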
2.6 Related Work
In this section, we discuss the related works that are not mentioned at the beginning
of Chapter 2. For unweighted graphs, it has recently been observed that the Karp-Vazirani-Vazirani
(1 − 1/e)-competitive algorithm for the adversarial model also achieves
an improved approximation ratio of 0.70 in the random arrival model [55, 64]. This
holds even without the assumption of large degrees. It is known that without this
assumption, one cannot achieve an approximation factor better than 0.82 for this
problem (even in the case of i.i.d. arrivals with known distributions) [66]. This is in contrast
with our result for unweighted graphs with large degrees.
Online budgeted allocation and its generalizations appear in two main categories
of online advertising: AdWords (AW) problem [65, 18, 23], and the Display Ads
Allocation (DA) problem [33, 34, 2, 76]. In both of these problems, the publisher
must assign online page-views (or impressions) to an inventory of ads, optimizing
efficiency or revenue of the allocation while respecting pre-specified contracts. In the
DA problem, given a set of m advertisers with a set Sj of eligible impressions and
demand of at most n(j) impressions, the publisher must allocate a set of n impressions
that arrive online. Each impression i has value wij ≥ 0 for advertiser j. The goal
of the publisher is to assign each impression to one advertiser maximizing the value
of all the assigned impressions. The adversarial online DA problem has been studied
in [33], in which the authors present a (1 − 1/e)-competitive algorithm for the DA problem
under a free disposal assumption for graphs of large degrees. This result generalizes
the (1 − 1/e)-approximation algorithm by Mehta et al. [65] and by Buchbinder et al. [18].
Following a training-based dual algorithm by Devanur and Hayes [23] for the AW
problem, training-based (1 − ε)-competitive algorithms have been developed for the
DA problem and its generalization to packing linear programs [34, 76, 2] including
the DA problem. These papers develop a (1 − ε)-competitive algorithm for online
stochastic packing problems in which OPT/wij ≥ O(m log n/ε³) (or OPT/wij ≥ O(m log n/ε²)
applying the technique of [2]) and the demand of each advertiser is large, in the random-order
and the i.i.d. models. Although studying a similar set of problems, none of the above
papers study the simultaneous approximations for adversarial and stochastic models,
and the dual-based (1 − ε)-competitive algorithms for the stochastic variants do not
provide a bounded competitive ratio in the adversarial model.
Dealing with traffic spikes and inaccuracy in forecasting the traffic patterns is a
central issue in operations research and stochastic optimization. Various method-
ologies such as robust or control-based stochastic optimization [13, 14, 79, 74] have
been proposed. These techniques either try to deal with a larger family of stochastic
models at once [13, 14, 79], try to handle a large class of demand matrices at the
same time [79, 4, 6], or aim to design asymptotically optimal algorithms that react
more adaptively to traffic spikes [74]. These methods have been applied in particular
for traffic engineering [79] and inter-domain routing [4, 6]. Although dealing with
similar issues, our approach and results are quite different from the approaches taken
in these papers. Finally, an interesting related model for combining stochastic and
online solutions for the Adwords problem is considered in [63], however their approach
does not give an improved approximation algorithm for the iid model.
Chapter 3
Bicriteria Online Matching:
Maximizing Weight and
Cardinality
In the past decade, there has been much progress in designing better algorithms for
online matching problems. This line of research has been inspired by interesting com-
binatorial techniques that are applicable in this setting, and by online ad allocation
problems. For example, the display advertising problem has been modeled as maxi-
mizing the weight of an online matching instance [35, 34, 25, 15, 24]. While weight is
indeed important, this model ignores the fact that cardinality of the matching is also
crucial in the display ad application. This example illustrates the fact that in many
real applications of online allocation, one needs to optimize multiple objective func-
tions, though most of the previous work in this area deals with only a single objective
function. On the other hand, there is a large body of work exploring offline multi-objective
optimization in the approximation algorithms literature. In this chapter, we
focus on simultaneously maximizing two objectives online, both of which have been studied
extensively in matching problems: cardinality and weight. Besides being a natural
mathematical problem, this is motivated by online display advertising applications.
Applications in Display Advertising. In online display advertising, advertisers
typically purchase bundles of millions of display ad impressions from web publishers.
Display ad serving systems that assign ads to pages on behalf of publishers must
satisfy the contracts with advertisers, respecting targeting criteria and delivery goals.
Modulo this, publishers try to allocate ads intelligently to maximize overall quality
(measured, for example, by clicks), and therefore a desirable property of an ad serving
system is to maximize this quality while satisfying the contracts to deliver the
purchased number n(a) of impressions to advertiser a. This has been modeled in the
literature (e.g., [35, 2, 76, 25, 15, 24]) as an online allocation problem, where quality
is represented by edge weights, and contracts are enforced by overall delivery goals:
While trying to maximize the weight of the allocation, the ad serving systems should
deliver n(a) impressions to advertiser a. However, online algorithms with adversarial
input cannot guarantee the delivery of n(a) impressions, and hence the goals n(a)
were previously modeled as upper bounds. But maximizing the cardinality subject to
these upper bounds is identical to delivering as close to the targets as possible. This
motivates our model of the display ad problem as simultaneously maximizing weight
and cardinality.
Problem Formulation. More specifically, we study the following bicriteria online
matching problem: consider a set of bins (also referred to as fixed nodes, or ads) A
with capacity constraints n(a) > 0, and a set of online items (referred to as online
nodes, or impressions or pageviews) I arriving one by one. Upon arrival of an online
item i, a set Si of eligible bins (fixed node neighbors) for the item is revealed, together
with a weight wia for eligible bin a ∈ Si. The problem is to assign each item i to an
eligible bin in Si or discard it online, while respecting the capacity constraints, so bin
a gets at most n(a) online items. The goal is to maximize both the cardinality of the
allocation (i.e. the total number of assigned items) and the sum of the weights of the
allocated online items.
It was shown in [35] that achieving any positive approximation guarantee for the
total weight of the allocation requires the free disposal assumption, i.e. that there is no
penalty for assigning more online nodes to a bin than its capacity, though these extra
nodes do not count towards the objective. In the advertising application, this means
that in the presence of a contract for n(a) impressions, advertisers are only pleased by
– or at least indifferent to – getting more than n(a) impressions. More specifically, if a
set Ia of online items is assigned to each bin a, and Ia(k) denotes the set of the k online
nodes with maximum weight in Ia, the goal is to simultaneously maximize the cardinality,
which is ∑_{a∈A} min(|Ia|, n(a)), and the total weight, which is ∑_{a∈A} ∑_{i∈Ia(n(a))} wia.
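Under free disposal, both objectives are simple to evaluate for a finished allocation; the following sketch (helper name ours) computes them exactly as defined above:

```python
from typing import Dict, List, Tuple

def objectives(assignment: Dict[str, List[float]],
               capacity: Dict[str, int]) -> Tuple[int, float]:
    """Evaluate an allocation under the free disposal assumption.

    assignment[a] lists the weights w_{ia} of the online items placed in bin a
    (possibly more than its capacity n(a)); only the n(a) heaviest ones count.
    Returns (cardinality, weight) = (sum_a min(|I_a|, n(a)),
                                     sum_a top-n(a) weights in I_a).
    """
    card, weight = 0, 0.0
    for a, items in assignment.items():
        n_a = capacity[a]
        card += min(len(items), n_a)
        weight += sum(sorted(items, reverse=True)[:n_a])
    return card, weight

# Bin 'a' has capacity 2 but received 3 items: the lightest one is disposed of.
print(objectives({'a': [5.0, 1.0, 3.0], 'b': [2.0]}, {'a': 2, 'b': 1}))  # → (3, 10.0)
```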
Throughout this chapter, we use Wopt to denote the maximum weight matching,
and overload this notation to also refer to the weight of this matching. Similarly, we
use Copt to denote both the maximum cardinality matching and its cardinality. Note
that Copt and Wopt may be distinct matchings. We aim to find (α, β)-approximations
for the bicriteria online matching problem: These are matchings with weight at least
αWopt and cardinality at least βCopt. Our approach is to study parametrized approx-
imation algorithms that allow a smooth tradeoff curve between the two objectives,
and prove both approximation and hardness results in this framework. As an offline
problem, the above bicriteria problem can be solved optimally in polynomial time,
i.e., one can check if there exists an assignment of cardinality c and weight w respect-
ing capacity constraints. (One can verify this by observing that the integer linear
programming formulation for the offline problem is totally unimodular, and therefore
the problem can be solved by solving the corresponding LP relaxation.) However in
the online competitive setting, even maximizing one of these two objectives does not
admit better than a 1 − 1/e approximation [56]. A naive greedy algorithm gives a
12-approximation for maximizing a single objective, either for cardinality or for total
weight under the free disposal assumption.
3.1 Results and Techniques
The seminal result of Karp, Vazirani and Vazirani [56] gives a simple randomized (1−
1/e)-competitive algorithm for maximizing cardinality. For the weight objective, no
algorithm better than the greedy 1/2-approximation is known, but for the case of large
capacities, a 1−1/e-approximation has been developed [35] following the primal-dual
analysis framework of Buchbinder et al. [18, 65]. Using these results, one can easily
get a (p/2, (1 − p)(1 − 1/e))-approximation for the bicriteria online matching problem with
small capacities, and a (p(1 − 1/e), (1 − p)(1 − 1/e))-approximation for large capacities.
These factors are achieved by applying the online algorithm for weight, WeightAlg, and
the online algorithm for cardinality, CardinalityAlg, as subroutines as follows: When
an online item arrives, pass it to WeightAlg with probability p, and CardinalityAlg with
probability 1− p. As for a hardness result, it is easy to show that an approximation
factor better than (α, 1 − α) is not achievable for any α > 0. There is a large gap
between the above approximation factors and hardness results. For example, the
naive algorithm gives a (0.4(1− 1/e), 0.6(1− 1/e)) ≈ (0.25, 0.38)-approximation, but
the hardness result does not preclude a (0.4, 0.6)-approximation. In this chapter, we
tighten the gap between these lower and upper bounds, and present new tradeoff
curves for both algorithms and hardness results. Our lower and upper bound results
are summarized in Figure 3-1. For the case of large capacities, these upper and lower
bound curves are always close (with a maximum vertical gap of 9%), and exactly
coincide at the point (0.43, 0.43).
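The randomized combination just described can be sketched as follows; simple greedy subroutines stand in for the actual WeightAlg and CardinalityAlg (which are instantiated with primal-dual algorithms in the thesis), and the final merge implements the discard trick from our analysis:

```python
import random

def bicriteria_online(items, bins, p, seed=0):
    """Sketch of the combined algorithm. `items` is a list of dicts mapping
    each eligible bin to the edge weight w_{ia}; `bins` maps bin a -> n(a).
    Each arriving item goes to WeightAlg with probability p, and to
    CardinalityAlg otherwise; both subroutines may use every bin's full capacity.
    """
    rng = random.Random(seed)
    weight_alg = {a: [] for a in bins}  # weights kept by WeightAlg (free disposal)
    card_alg = {a: [] for a in bins}    # weights kept by CardinalityAlg
    for edges in items:
        if not edges:
            continue
        if rng.random() < p:
            # WeightAlg stand-in: greedily take the heaviest eligible edge; by
            # free disposal, only the n(a) heaviest items in a bin ever count.
            a, w = max(edges.items(), key=lambda e: e[1])
            weight_alg[a] = sorted(weight_alg[a] + [w], reverse=True)[:bins[a]]
        else:
            # CardinalityAlg stand-in: pick any eligible bin with spare capacity.
            for a, w in edges.items():
                if len(card_alg[a]) < bins[a]:
                    card_alg[a].append(w)
                    break
    # Merge: if WeightAlg placed t items in a bin, discard t of CardinalityAlg's
    # items there; the result keeps WeightAlg's weight and CardinalityAlg's
    # cardinality while respecting every capacity n(a).
    return {a: weight_alg[a] + card_alg[a][len(weight_alg[a]):] for a in bins}
```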
We first describe our hardness results. In fact, we prove three separate inapprox-
imability results which can be combined to yield a ‘hardness curve’ for the problem.
The first result gives better upper bounds for large values of β; this is based on
structural properties of matchings, proving some invariants for any online algorithm
on a family of instances, and writing a factor-revealing mathematical program (see
Section 3.2.1). The second main result is an improved upper bound for large values
of α, and is based on a new family of instances for which achieving a large value for
α implies very small values of β (see Section 3.2.2). Finally, we show that for any
achievable (α, β), we have α + β ≤ 1 − 1/e² (see Theorem 3.2.2).
These hardness results show the limit of what can be achieved in this model. We
next turn to algorithms, to see how close we can come to these limits. The key to our
new algorithmic results lies in the fact that though each subroutine WeightAlg and
CardinalityAlg only receives a fraction of the online items, it can use the entire set of
bins. This may result in both subroutines filling up a bin, but if WeightAlg places t
items in a bin, we can discard t of the items placed there by CardinalityAlg and still
get at least the cardinality obtained by CardinalityAlg and the weight obtained by
Figure 3-1: New curves for upper and lower bounds.
WeightAlg. Each subroutine therefore has access to the entire bin capacity, which is
more than it ‘needs’ for those items passed to it. Thus, its competitive ratio can be
made better than 1 − 1/e. For large capacities, we prove the following theorem by
extending the primal-dual analysis of Buchbinder et al. and Feldman et al. [35, 65,
18].
Theorem 3.1.1. For all 0 < p < 1, there is an algorithm for the bicriteria online
matching problem with competitive ratios tending to (p(1 − 1/e^{1/p}), (1 − p)(1 − 1/e^{1/(1−p)}))
as min_a n(a) tends to infinity.
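The gain from sharing bins can be seen by comparing the two tradeoff curves numerically (a quick sketch; function names ours):

```python
import math

def naive_curve(p):
    # Naive split: each (1 - 1/e)-competitive subroutine sees only its own items.
    return p * (1 - 1 / math.e), (1 - p) * (1 - 1 / math.e)

def improved_curve(p):
    # Theorem 3.1.1 (large capacities): a subroutine receiving a p fraction of
    # the items but the full bin capacities is (1 - 1/e^{1/p})-competitive.
    return p * (1 - math.exp(-1 / p)), (1 - p) * (1 - math.exp(-1 / (1 - p)))

print(naive_curve(0.5))     # ≈ (0.316, 0.316)
print(improved_curve(0.5))  # ≈ (0.432, 0.432)
```

At p = 1/2 the improved curve gives the point (0.432, 0.432), matching the (0.43, 0.43) point mentioned above.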
For small capacities, our result is more technical and is based on studying struc-
tural properties of matchings, proving invariants for our online algorithm over any
instance, and solving a factor-revealing LP that combines these new invariants and
previously known combinatorial techniques by Karp, Vazirani, Vazirani, and Birn-
baum and Mathieu [56, 17]. Factor revealing LPs have been used in the context of
online allocation problems [65, 64]. In our setting, we need to prove new variants
and introduce new inequalities to take into account and analyze the tradeoff between
the two objective functions. This result can also be parametrized by p, the fraction
of items sent to WeightAlg, but we do not have a closed form expression. Hence, we
state the result for p = 1/2.
Theorem 3.1.2. For all 0 ≤ p ≤ 1, the approximation guarantee of our algorithm
for the bicriteria online matching problem is lower bounded by the green curve of
Figure 3-1. In particular, for p = 1/2, we have the point (1/3, 0.3698).
Related Work. Our work is related to online ad allocation problems, including
the Display Ads Allocation (DA) problem [35, 34, 2, 76], and the AdWords (AW)
problem [65, 23]. In both of these problems, the publisher must assign online im-
pressions to an inventory of ads, optimizing efficiency or revenue of the allocation
while respecting pre-specified contracts. The Display Ad (DA) problem is the online
matching problem described above only considering the weight objective [35, 15, 24].
In the AdWords (AW) problem, the publisher allocates impressions resulting from
search queries. Advertiser a has a budget B(a) on the total spend instead of a bound
n(a) on the number of impressions. Assigning impression i to advertiser a consumes
wia units of a’s budget instead of 1 of the n(a) slots, as in the DA problem. For both
of these problems, (1 − 1/e)-approximation algorithms have been designed under the
assumption of large capacities [65, 18, 35]. None of the above papers for the adversarial
model study multiple objectives at the same time.
Besides the adversarial model studied in this chapter, online ad allocations have
been studied extensively in various stochastic models. In particular, the problem
has been studied in the random order model, where impressions arrive in a random
order [23, 34, 2, 76, 55, 64, 67]; and the i.i.d. model in which impressions arrive i.i.d.
according to a known (or unknown) distribution [36, 66, 46, 25, 26]. In such stochastic
settings, primal and dual techniques have been applied to obtain improved approximation
algorithms. These techniques are based on computing offline optimal primal
or dual solutions of an expected instance, and using this solution online [36, 23]. It is
not hard to generalize these techniques to the bicriteria online matching problem. In
this chapter, we focus on the adversarial model. Note that in order to deal with traffic
spikes, adversarial competitive analysis is important from a practical perspective, as
discussed in [67].
Most previous work on online problems with multiple objectives has been in the
domain of routing and scheduling, and with different models. Typically, goals are to
maximize throughput and fairness; see the work of Goel et al. [43, 42], Buchbinder
and Naor [19], and Wang et al. [78]. In this literature, different objectives often come
from applying different functions on the same set of inputs, such as processing times
or bandwidth allocations. In a model more similar to ours, Bilo et al. [16] consider
scheduling where each job has two different and unrelated requirements, processing
time and memory; the goal is to minimize makespan while also minimizing maximum
memory requirements on each machine. In another problem with distinct metrics,
Flammini and Nicosia [38] consider the k-server problem with a distance metric and
time metric defined on the set of service locations. However, unlike our algorithms,
theirs do not compete simultaneously against the best solution for each objective;
instead, they compete against offline solutions that must simultaneously do well on
both objectives. Further, the competitive ratio depends on the relative values of the
two objectives. Such results are of limited use in advertising applications, for instance,
where click-through rates per impression may vary by several orders of magnitude.
3.2 Hardness Instances
In this section, for any 0 ≤ α ≤ 1 − 1/e, we prove upper bounds on the values β for
which the bicriteria online matching problem admits an (α, β)-approximation. Note that it is not
possible to achieve an α-approximation guarantee for the total weight of the allocation
for any α > 1 − 1/e. We have two types of techniques to achieve upper bounds: a)
Factor-Revealing Linear Programs, b) Super Exponential Weights Instances, which
are discussed in Subsections 3.2.1, and 3.2.2 respectively. Factor revealing LP hard-
ness instances give us the red upper bound curve in Figure 3-1. The orange upper
bound curve in Figure 3-1 is proved by Super Exponential Weights Instances pre-
sented in Subsection 3.2.2, and the black upper bound line in Figure 3-1 is proved in
Theorem 3.2.2.
3.2.1 Better Upper Bounds via Factor-Revealing Linear Pro-
grams
We construct an instance, and a linear program LPα,β based on the instance where
α and β are two parameters in the linear program. We prove that if there exists an
(α, β)-approximation for the bicriteria online matching problem, we can find a feasible
solution for LPα,β based on the algorithm’s allocation for the generated instance.
Finally we find out for which pairs (α, β) the linear program LPα,β is infeasible.
These pairs (α, β) are upper bounds for the bicriteria online matching problem.
For any two integers C, l, and some large weight W ≫ 4l², we construct the
instance as follows. We have l phases, and each phase consists of l sets of C identical
items, i.e. l²C items in total. For any 1 ≤ t, i ≤ l, we define Ot,i to be the set i in
phase t that has C identical items. In each phase, we observe the sets of items in
increasing order of i. There are two types of bins: a) l weight bins b1, b2, · · · , bl which
are shared between different phases, b) l² cardinality bins {b′t,i}1≤t,i≤l. For each phase
1 ≤ t ≤ l, we have l separate bins {b′t,i}1≤i≤l. The capacity of all bins is C. We pick
two permutations πt, σt ∈ Sl uniformly at random at the beginning of each phase t
to construct edges. We note that these permutations are private knowledge, and they
are not revealed to the algorithm. For any 1 ≤ i ≤ j ≤ l, we put an edge between
every item in set Ot,i and bin b′t,σt(j) with weight 1, where σt(j) is the jth number in
permutation σt. We also put an edge between every item in set Ot,i and bin bπt(j) (for
each j ≥ i) with weight W^t.
Suppose there exists an (α, β)-approximation algorithm Aα,β for the bicriteria online
matching problem. For any 1 ≤ t, i ≤ l, let xt,i be the expected number of items
in set Ot,i that algorithm Aα,β assigns to weight bins {bπt(j) : i ≤ j ≤ l}. Similarly we define
yt,i to be the expected number of items in set Ot,i that algorithm Aα,β assigns to cardinality
bins {b′t,σt(j) : i ≤ j ≤ l}. We know that when set Ot,i arrives, although the algorithm
can distinguish between weight and cardinality bins, it sees no difference between
the weight bins {bπt(j) : i ≤ j ≤ l}, and no difference between the cardinality bins {b′t,σt(j) : i ≤ j ≤ l}.
By the uniform selection of π and σ, we ensure that in expectation the xt,i items are
allocated equally among the weight bins {bπt(j) : i ≤ j ≤ l}, and the yt,i items are allocated equally
among the cardinality bins {b′t,σt(j) : i ≤ j ≤ l}. In other words, for 1 ≤ i ≤ j ≤ l, in expectation
xt,i/(l − i + 1) and yt,i/(l − i + 1) items of set Ot,i are allocated to bins bπt(j) and b′t,σt(j),
respectively. It is worth noting that similar ideas have been used in previous papers
on online matching [56, 17].
Since weights of all edges to cardinality bins are 1, we can assume that the items
assigned to cardinality bins are kept until the end of the algorithm, and they will not
be thrown away. We can similarly say that the weights of all items for the weight bins
are the same within a single phase, so we can assume that an item that has been assigned
to some weight bin in a phase will not be thrown away at least until the end of the
phase. However, the algorithm might use the free disposal assumption for weight
bins in different phases. We have the following capacity constraints on bins bπt(j) and
b′t,σt(j):
∀ 1 ≤ t, j ≤ l: ∑_{i=1}^{j} xt,i/(l − i + 1) ≤ C and ∑_{i=1}^{j} yt,i/(l − i + 1) ≤ C. (3.1)
At any stage of phase t, the total weight assigned by the algorithm cannot be
less than α times the optimal weight allocation up to that stage, or we would not
have weight αWopt if the input stopped at this point. After set Ot,i arrives, the
maximum weight allocation achieves total weight at least CiW^t, which is achieved by
assigning the items in set Ot,i′ to weight bin bπt(i′) for each 1 ≤ i′ ≤ i. On the other
hand, the expected weight of the allocation of algorithm Aα,β is at most C(tl + W^{t−1}l) +
W^t ∑_{i′=1}^{i} xt,i′ ≤ W^t(C/√W + ∑_{i′=1}^{i} xt,i′). Therefore we have the following inequality
for any 1 ≤ t, i ≤ l:

∑_{i′=1}^{i} xt,i′/C ≥ αi − 1/√W. (3.2)
We prove in Lemma 3.2.1 that the linear program LPα,β is feasible if there exists
an algorithm Aα,β, by defining pi = ∑_{t=1}^{l} xt,i/(lC) and qi = ∑_{t=1}^{l} yt,i/(lC). Now for any
α, we can find the maximum β for which LPα,β has some feasible solution for
large values of l and W. These factor-revealing linear programs yield the red upper
bound curve in Figure 3-1.
LPα,β:
C1: ∑_{i′=1}^{i} pi′ ≥ αi − 1/√W   ∀ 1 ≤ i ≤ l
C2: ∑_{i=1}^{l} qi ≥ lβ − 1
C3: pi + qi ≤ 1   ∀ 1 ≤ i ≤ l
C4: ∑_{i=1}^{j} pi/(l − i + 1) ≤ 1   ∀ 1 ≤ j ≤ l
C5: ∑_{i=1}^{j} qi/(l − i + 1) ≤ 1   ∀ 1 ≤ j ≤ l
Lemma 3.2.1. If there exists an (α, β)-approximation algorithm for the bicriteria
online matching problem, there exists a feasible solution for LPα,β as well.
Proof. We claim that the values pi = ∑_{t=1}^{l} xt,i/(lC) and qi = ∑_{t=1}^{l} yt,i/(lC) form a
feasible solution for LPα,β. Intuitively, pi and qi are the average values of xt,i/C
and yt,i/C over the l different phases. Since inequality 3.2 holds for each value of t,
their average values also admit the same type of inequality, which is constraint C1 in
LPα,β. To prove that constraint C2 holds, we should look at the total cardinality of
the algorithm's allocation in all phases. The optimal cardinality allocation is to assign
set Ot,i to b′t,σt(i) for all 1 ≤ t, i ≤ l, which assigns all items and achieves l²C in
cardinality. But algorithm Aα,β assigns at most lC items to weight bins at the end
(after applying free disposals), and ∑_{1≤t,i≤l} yt,i items to cardinality bins. Since it is a
β-approximation for cardinality, we should have ∑_{1≤t,i≤l} yt,i + lC ≥ l²Cβ. Based
on the definition of {qi}_{i=1}^{l}, this inequality is equivalent to constraint C2. Constraint C3
holds because the expected total number of items the algorithm assigns from each
set to weight and cardinality bins cannot be more than C, the number of items in
the set. Constraints C4 and C5 are derived from the inequalities of Equation 3.1.
In addition to computational bounds for infeasibility of certain (α, β) pairs, we
can prove analytically in Theorem 3.2.2 that for any (α, β) with α + β > 1 − 1/e²,
LPα,β is infeasible, so there exists no (α, β)-approximation for the problem. We
note that Theorem 3.2.2 is a simple generalization of the 1 − 1/e hardness result for
the classic online matching problem [56, 17].
Theorem 3.2.2. For any small ε > 0 and α + β ≥ 1 − 1/e² + ε, there exists no
(α, β)-approximation algorithm for the bicriteria matching problem.
Proof. We just need to show that LPα,β is infeasible. Given a solution of LPα,β, we
find a feasible solution for LP′ε defined below by setting ri = pi + qi for any 1 ≤ i ≤ l.

LP′ε:
∑_{i=1}^{l} ri ≥ (1 − 1/e² + ε/2)l
ri ≤ 1   ∀ 1 ≤ i ≤ l
∑_{i=1}^{j} ri/(l − i + 1) ≤ 2   ∀ 1 ≤ j ≤ l

The first inequality in LP′ε is implied by summing up constraint C1 for i = l and
constraint C2 in LPα,β, and also using the fact that α + β ≥ (1 − 1/e² + ε/2) + ε/2.
We note that the ε/2 difference between α + β and 1 − 1/e² + ε/2 takes care of
the −1/√W and −1 terms on the right hand sides of constraints C1 and C2 for large enough
values of l and W. Now we prove that LP′ε is infeasible for any ε > 0 and large
enough l. Suppose there exists a feasible solution r1, r2, · · · , rl. For any pair 1 ≤
i < j ≤ l, if we have ri < 1 and rj > 0, we update the values of ri and rj to
r^new_i = ri + min{1 − ri, rj} and r^new_j = rj − min{1 − ri, rj}. Since we are moving the
same amount from rj to ri (for some i < j), all constraints still hold. If we apply this
operation iteratively until there is no pair ri and rj with the above properties, we reach
a solution {r′i}_{i=1}^{l} of the form 1, 1, · · · , 1, x, 0, 0, · · · , 0 for some 0 ≤ x ≤ 1. Let t be
the maximum index for which r′t is 1. Using the third inequality for j = l, we have
∑_{i=1}^{t} 1/(l − i + 1) ≤ 2, which means that ln(l/(l − t + 1)) ≤ 2. So t is not greater than
l(1 − 1/e²), and consequently ∑_{i=1}^{l} r′i ≤ t + 1 ≤ (1 − 1/e²)l + 1 < (1 − 1/e² + ε/2)l. This
contradiction proves that LP′ε is infeasible, which completes the proof of the theorem.
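The counting step at the end of the proof is easy to verify numerically; the sketch below (function name ours) computes the largest t with ∑_{i=1}^{t} 1/(l − i + 1) ≤ 2 and confirms that t/l tends to 1 − 1/e²:

```python
import math

def max_ones_fraction(l: int) -> float:
    # Largest t with sum_{i=1}^{t} 1/(l - i + 1) <= 2: the third constraint of
    # LP'_eps caps how many of the r_i can equal 1 in the extremal solution.
    s, t = 0.0, 0
    for i in range(1, l + 1):
        s += 1.0 / (l - i + 1)
        if s > 2:
            break
        t = i
    return t / l

frac = max_ones_fraction(10**6)
print(frac, 1 - 1 / math.e**2)  # the two values agree to several decimals
```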
3.2.2 Hardness Results for Large Values of Weight Approx-
imation Factor
The factor-revealing linear program LPα,β gives almost tight bounds for small values
of α. In particular, the gap between the upper and lower bounds for the cardinality
approximation ratio β is less than 0.025 for α ≤ (1 − 1/e²)/2. But for large values
of α (α > (1 − 1/e²)/2), this approach does not give anything better than the α +
β ≤ 1 − 1/e² bound proved in Theorem 3.2.2. This leaves a maximum gap of
1/e − 1/e² ≈ 0.23 between the upper and lower bounds at α = 1 − 1/e. In order to
close the gap at α = 1 − 1/e, we present a different analysis based on a new set of
instances, and reduce the maximum gap between the lower and upper bounds from 0.23
to less than 0.09 for all values of α ≥ (1 − 1/e²)/2.
The main idea is to construct a hardness instance Iγ for any 1/e ≤ γ < 1, and
prove that for any 0 ≤ p ≤ 1 − γ, the pair (1 − 1/e − f(p), p/(1 − γ)) is an upper bound
on (α, β), where f(p) = p/(e(γ + p)). In other words, there exists no (α, β)-approximation
algorithm for this problem with both α > 1 − 1/e − f(p) and β > p/(1 − γ). By
enumerating different pairs of γ and p, we find the orange upper bound curve in
Figure 3-1.
For any γ ≥ 1/e, we construct instance Iγ as follows: The instance is identical to
the hardness instance in Subsection 3.2.1, but we change some of the edge weights. To
keep the description short, we only describe the edges with modified weights here. Let
r be ⌊0.5 log_{1/γ} l⌋. In each phase 1 ≤ t ≤ l, we partition the l sets of items {Ot,i}_{i=1}^{l}
into r groups. The first l(1 − γ) sets are in the first group. From the remaining γl
sets, we put the first (1 − γ) fraction in the second group, and so on. Formally, we
put set Ot,i in group 1 ≤ z < r for any i ∈ [l − lγ^{z−1} + 1, l − lγ^z]. Group r of phase t
contains the last lγ^{r−1} sets of items in phase t. The weight of all edges from sets of
items in group z of phase t is W^{(t−1)r+z}, for any 1 ≤ z ≤ r and 1 ≤ t ≤ l.
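For concreteness, the following sketch (function name ours) builds the grouping for one phase and checks that the groups partition the l set indices:

```python
import math

def phase_groups(l: int, gamma: float):
    # Partition set indices 1..l of one phase into r = floor(0.5 * log_{1/gamma} l)
    # geometric groups: group z < r covers i in (l - l*gamma^{z-1}, l - l*gamma^z],
    # and group r contains the last l*gamma^{r-1} sets.
    r = int(0.5 * math.log(l) / math.log(1 / gamma))
    groups = {}
    for z in range(1, r):
        lo = l - int(l * gamma ** (z - 1))
        hi = l - int(l * gamma ** z)
        groups[z] = list(range(lo + 1, hi + 1))
    groups[r] = list(range(l - int(l * gamma ** (r - 1)) + 1, l + 1))
    return groups

g = phase_groups(1000, 0.5)  # r = 4 groups of sizes 500, 250, 125, 125
assert sorted(i for grp in g.values() for i in grp) == list(range(1, 1001))
```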
Given an (α, β)-approximation algorithm Aα,β, we similarly define xt,i and yt,i to
be the expected number of items from set Ot,i assigned to weight and cardinality bins
by algorithm Aα,β respectively. We show in the following lemma that in order to have
a high α, the algorithm should allocate a large fraction of sets of items in each group
to the weight bins.
Lemma 3.2.3. For any phase 1 ≤ t ≤ l and group 1 ≤ z < r, if the expected number
of items assigned to cardinality bins in group z of phase t is at least plCγ^{z−1} (which
is p times the number of all items in groups z, z + 1, · · · , r of phase t), the weight
approximation ratio cannot be greater than 1 − 1/e − f(p), where f(p) = p/(e(γ + p)).
Proof. We define w = W (t−1)r+z which is the weight of edges from items in group z
of phase t to weight bins. At the end of group z of phase t, we look at the expected
number of items allocated to cardinality bins in this group. If it is more than plCγz−1,
we construct a new instance I ′ by changing the weights of the edges of the next items
as follows. Instance I ′ is identical to our instance Iγ up to the end of group z in
phase t. After this group, everything is identical to Iγ except the edge weights that
we change as follows. For every group r ≥ z′ > z in phase t, instead of setting the weights of edges between items and bins to $W^{(t-1)r+z'}$, we set them to the weight of the edges from group z to weight bins, which is $w = W^{(t-1)r+z}$. After phase t, we set
the weights of all edges from items in phases t + 1, t + 2, · · · , l to both weight and
cardinality bins to zero. Since our instance Iγ, and the new instance I ′ are identical
up to the end of group z of phase t, we know that the algorithm has the same expected
allocation up to that point for both instances.
To simplify notation in the rest of the proof, we rename the items in groups z, z + 1, · · · , r of phase t and their associated weight bins. We define l′ to be $l\gamma^{z-1}$, which is the number of sets of items in groups z, z + 1, · · · , r of phase t. For any 1 ≤ i ≤ l′, we let $O'_i$ be the set of items $O_{t,l-l'+i}$, and let $b''_i$ be the weight bin $b_{\pi_t(l-l'+i)}$. In instance I′, the optimum weight allocation is at least wl′C, which can be achieved by assigning the set of items $O'_i$ to weight bin $b''_i$ for each 1 ≤ i ≤ l′. We also know that the weight any algorithm achieves in instance I′ comes from assigning these l′ sets of items to the weight bins, because the total weight achieved before and after these l′ sets of items is negligible by the choice of large W. We define $r_i$ to be the fraction of items in set $O'_i$ assigned to weight bins $\{b''_j\}_{j=i}^{l'}$ for any 1 ≤ i ≤ l′. We want to prove that if $\sum_{i=1}^{(1-\gamma)l'}(1 - r_i) \ge pl'$, the weight approximation ratio of the algorithm is at most $1 - 1/e - f(p)$. We note that group z is the first 1 − γ fraction of these l′ sets. We prove that even if the algorithm tries to allocate all items to weight bins $\{b''_i\}_{i=1}^{l'}$, it cannot saturate more than a 1 − 1/e fraction of these l′ weight bins in total.
Define $r'_j$ to be the fraction of weight bin $b''_j$ that is filled with items from $\{O'_i\}_{i=1}^{j}$ for any 1 ≤ j ≤ l′. When the set of items $O'_i$ arrives, the algorithm does not see any difference between the bins $\{b''_j\}_{j=i}^{l'}$, and we choose their order to be a random permutation. So in expectation, the algorithm cannot saturate more than a $1/(l'-i+1)$ fraction of bin $b''_j$ with set $O'_i$ for any 1 ≤ i ≤ j ≤ l′. Therefore, for any 1 ≤ j ≤ l′, the fraction $r'_j$ is at most $\min\{1, \sum_{i=1}^{j} 1/(l'-i+1)\}$. Summing these upper bounds over all l′ values of j gives the upper bound $(1-1/e)l'$ as l′ goes to infinity. Applying the values of the $r_i$'s, we get the following bound on the total filled fraction of weight bins $\{b''_j\}_{j=1}^{l'}$:
$$\sum_{j=1}^{l'} r'_j \le \sum_{j=1}^{l'} \min\left\{1, \sum_{i=1}^{j} \frac{r_i}{l'-i+1}\right\} = \sum_{j=1}^{l'} \min\left\{1, \sum_{i=1}^{j} \frac{1}{l'-i+1}\right\} - \sum_{j=1}^{l'}\left(\min\left\{1, \sum_{i=1}^{j} \frac{1}{l'-i+1}\right\} - \min\left\{1, \sum_{i=1}^{j} \frac{r_i}{l'-i+1}\right\}\right)$$

$$= (1-1/e)\,l' - \sum_{j=1}^{l'(1-1/e)} \sum_{i=1}^{j} \frac{1-r_i}{l'-i+1} - \sum_{j=l'(1-1/e)+1}^{l'} \max\left\{0,\ \sum_{i=1}^{l'(1-1/e)} \frac{1-r_i}{l'-i+1} - \sum_{i'=l'(1-1/e)+1}^{j} \frac{r_{i'}}{l'-i'+1}\right\}$$
Now we want to upper bound the above expression by $(1 - 1/e - f(p))\,l'$ using the fact that $\sum_{i=1}^{l'(1-\gamma)}(1 - r_i)$ is at least $pl'$. It is not hard to see that the above expression is maximized when $r_{i'} = 1$ for every $i' > l'(1-\gamma)$. We also observe that for any $1 \le i \le l'(1-\gamma)$, the fraction $1 - r_i$, which is the lack of contribution of the items in set $O'_i$ to the weight bins, is distributed evenly among bins $\{b''_j\}_{j=i}^{l'}$. For the first $l'(1-1/e)$ bins, this lack is counted for sure; but for the bins after the threshold $l'(1-1/e)$, there is some chance that the sets of items $\{O'_{i'}\}_{i'=l'(1-1/e)+1}^{l'}$ cover up for the lack of previous items, so their lacks are not counted in the sum. Hence the maximum of the above expression is achieved when the lacks occur for bins with higher indices, because this way the lack is distributed more over the bins after the threshold $l'(1-1/e)$ and less before it. We conclude with the following upper bound on $\sum_{j=1}^{l'} r'_j$, in which $r_i$ is zero for $i \in [l'(1-\gamma-p), l'(1-\gamma)]$ and 1 for the other values of i:
$$(1-1/e)\,l' - \sum_{k=1}^{pl'} \frac{k + l'(\gamma - 1/e)}{k + l'\gamma} - \sum_{k=1}^{l'/e} \max\left\{0,\ \sum_{k'=1}^{pl'} \frac{1}{k' + l'\gamma} - \sum_{k''=1}^{k} \frac{1}{l'/e - k'' + 1}\right\}$$
For large values of l, and therefore of l′, we can write each sum in the above formula in integral form by defining the variable x to be the index of each sum divided by l′. The above formula then equals:
$$l'\left(1 - \frac{1}{e} - \int_{x=0}^{p} \frac{x + \gamma - 1/e}{x + \gamma}\,dx - \int_{x=0}^{p^*} \left(\ln\left(\frac{\gamma+p}{\gamma}\right) - \ln\left(\frac{1/e}{1/e - x}\right)\right)dx\right)$$

$$= l'\left(1 - \frac{1}{e} - \left(p - \frac{1}{e}\ln\left(\frac{p+\gamma}{\gamma}\right)\right) - \frac{p}{e(\gamma+p)}\ln\left(\frac{\gamma+p}{\gamma}\right) + \frac{1}{e}\left(1 + \frac{\gamma\ln\left(\gamma/((\gamma+p)e)\right)}{\gamma+p}\right)\right)$$

$$= l'\left(1 - \frac{1}{e} - \frac{p}{e(\gamma+p)}\right)$$
where p∗ is the fraction such that for x = p∗ we have $\frac{\gamma+p}{\gamma} = \frac{1/e}{1/e-x}$. In other words, we should take the integral up to the point where the maximum of zero and the difference of the two summations becomes zero. Computing the integrals and summing them up gives the simple formula $1 - 1/e - \frac{p}{e(\gamma+p)}$ for the above expression.
We conclude this part with the main result of this subsection:
Theorem 3.2.4. For any small ε > 0, 1/e ≤ γ < 1, and 0 ≤ p ≤ 1 − γ, any algorithm for the bicriteria online matching problem with weight approximation guarantee α at least $1 - 1/e - f(p)$ cannot have cardinality approximation guarantee β greater than $p/(1-\gamma) + \epsilon$.
Proof. Using Lemma 3.2.3, for any group 1 ≤ z < r in any phase 1 ≤ t ≤ l, we know that at most a p fraction of the items is assigned to cardinality bins, because f(p) is strictly increasing in p (so $1 - 1/e - f(p)$ is strictly decreasing). Since in each phase the number of items decreases by a factor of γ in consecutive groups, the total fraction of items assigned to cardinality bins is at most $p + p\gamma + p\gamma^2 + \cdots + p\gamma^{r-2}$ plus the fraction of items assigned to cardinality bins in the last group r of phase t. Even if the algorithm assigns all of group r to cardinality, it does not gain more than a $\gamma^{r-1}$ fraction of the items in each phase. Since the optimal cardinality algorithm can match all items, the cardinality approximation guarantee is at most $p(1 + \gamma + \gamma^2 + \cdots + \gamma^{r-2}) + \gamma^{r-1}$. For large enough l (and consequently large enough r), this sum is not more than $p/(1-\gamma) + \epsilon$.
One way to compute the best values for p and γ corresponding to the best upper
bound curve is to solve some corresponding complex equations explicitly. Instead, we
compute these values numerically by trying different values of p and γ which, in turn,
yield the orange upper bound curve in Figure 3-1.
3.3 Algorithm for Large Capacities
We now turn to algorithms, to see how close one can come to matching the upper bounds of the previous section. In this section, we assume that the capacity n(a) of each bin a ∈ A is "large", and give an algorithm with the guarantees of Theorem 3.1.1 as $\min_{a\in A} n(a) \to \infty$.
Recall that our algorithm Alg uses two subroutines WeightAlg and CardinalityAlg,
each of which, if given an online item, suggests a bin to place it in. Each item
i is independently passed to WeightAlg with probability p and CardinalityAlg with
the remaining probability 1 − p. First note that CardinalityAlg and WeightAlg are
independent and unaware of each other; each of them thinks that the only items
which exist are those passed to it. This allows us to analyze the two subroutines
separately.
We now describe how Alg uses the subroutines. If WeightAlg suggests matching
item i to a bin a, we match i to a. If a already has n(a) items assigned to it in total,
we remove any item assigned by CardinalityAlg arbitrarily; if all n(a) were assigned by
WeightAlg, we remove the item of lowest value for a. If CardinalityAlg suggests match-
ing item i to a′, we make this match unless a′ has already had at least n(a′) total items
assigned to it by both subroutines. In other words, the assignments of CardinalityAlg
might be thrown away by some assignments of WeightAlg; however, the total number of items in a bin is always at least the number assigned by CardinalityAlg. Items
assigned by WeightAlg are never thrown away due to CardinalityAlg; they may only
be replaced by later assignments of WeightAlg. Thus, we have proved the following
proposition.
Proposition 3.3.1. The weight and cardinality of the allocation of Alg are respec-
tively at least as large as the weight of the allocation of WeightAlg and the cardinality
of the allocation of CardinalityAlg.
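The arbitration rule described above can be sketched in a few lines (a simplified model with our own names; each entry records which subroutine placed it):

```python
def place(item, value, source, bin_a, state, cap):
    """Arbitration rule of Alg (sketch). state maps a bin to the list of
    (item, value, source) triples it currently holds; cap is n(bin_a)."""
    slots = state.setdefault(bin_a, [])
    if source == 'cardinality':
        # CardinalityAlg's suggestion is honored only if there is room.
        if len(slots) < cap:
            slots.append((item, value, 'cardinality'))
        return
    # WeightAlg's suggestion is always honored; evict on overflow.
    slots.append((item, value, 'weight'))
    if len(slots) > cap:
        card = [s for s in slots if s[2] == 'cardinality']
        # Prefer evicting an (arbitrary) CardinalityAlg item; otherwise
        # evict the lowest-value WeightAlg item.
        evict = card[0] if card else min(slots, key=lambda s: s[1])
        slots.remove(evict)
```

Since a WeightAlg item can only be displaced by another WeightAlg item, and a CardinalityAlg placement never shrinks the bin, Proposition 3.3.1 follows directly from this rule.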
Note that the above proposition does not hold for any two arbitrary weight func-
tions, and this is where we need one of the objectives to be cardinality. We now
describe WeightAlg and CardinalityAlg, and prove Theorem 3.1.1. WeightAlg is essentially the exponentially-weighted primal-dual algorithm from [35], which was shown to achieve a $1 - 1/e$ approximation for the weighted online matching problem with
large degrees. For completeness, we present the primal and dual LP relaxations for
weighted matching below, and then describe the algorithm. In the primal LP, for each item i and bin a, the variable $x_{ia}$ denotes whether impression i is one of the n(a) most valuable items for bin a.
Primal:
$$\max \sum_{i,a} w_{ia} x_{ia}$$
$$\sum_a x_{ia} \le 1 \quad (\forall\, i), \qquad \sum_i x_{ia} \le n(a) \quad (\forall\, a), \qquad x_{ia} \ge 0 \quad (\forall\, i, a)$$

Dual:
$$\min \sum_a n(a)\beta_a + \sum_i z_i$$
$$\beta_a + z_i \ge w_{ia} \quad (\forall\, i, a), \qquad \beta_a, z_i \ge 0 \quad (\forall\, i, a)$$
Following the techniques of Buchbinder et al. [18], the algorithm of [35] simul-
taneously maintains feasible solutions to both the primal and dual LPs. Each dual
variable $\beta_a$ is initialized to 0. When item i arrives online:
• Assign i to the bin $a' = \arg\max_a \{w_{ia} - \beta_a\}$. (If this quantity is negative for all a, discard i.)
• Set $x_{ia'} = 1$. If a′ previously had n(a′) items assigned to it, set $x_{i'a'} = 0$ for the least valuable item i′ previously assigned to a′.
• In the dual solution, set $z_i = w_{ia'} - \beta_{a'}$ and update the dual variable $\beta_{a'}$ as described below.
Definition 3.3.2 (Exponential Weighting). Let $w_1, w_2, \ldots, w_{n(a)}$ be the weights of the n(a) items currently assigned to bin a, sorted in non-increasing order and padded with 0s if necessary. Set
$$\beta_a = \frac{1}{p \cdot n(a)\left(\left(1 + \frac{1}{p \cdot n(a)}\right)^{n(a)} - 1\right)} \sum_{j=1}^{n(a)} w_j \left(1 + \frac{1}{p \cdot n(a)}\right)^{j-1}.$$
Lemma 3.3.3. If WeightAlg is the primal-dual algorithm, with dual variables $\beta_a$ updated according to the exponential weighting rule defined above, the total weight of the allocation of WeightAlg is at least $p \cdot \left(1 - \frac{1}{k}\right) \cdot W_{opt}$, where $k = \left(1 + \frac{1}{p \cdot d}\right)^d$ and $d = \min_a n(a)$. Note that $\lim_{d\to\infty} k = e^{1/p}$.
Before proving Lemma 3.3.3, we provide some brief intuition. If all items are passed to WeightAlg, it was proved in [35] that the algorithm has competitive ratio tending to $1 - 1/e$ as $d = \min_a n(a)$ tends to ∞; this is the statement of Lemma 3.3.3 when p = 1. Now, suppose each item is passed to WeightAlg with probability p. The expected value of the optimum matching induced by those items passed to WeightAlg is at least $p \cdot W_{opt}$, and this remains nearly true (up to o(1) terms) even if we reduce the capacity of each bin a to $p \cdot n(a)$. This follows since $W_{opt}$ assigns at most n(a) items to bin a, and as we are unlikely to sample more than $p \cdot n(a)$ of these items for the reduced instance, we do not lose much by reducing capacities. But note that WeightAlg can use the entire capacity n(a), while there is a solution of value close to $p \cdot W_{opt}$ even with capacities $p \cdot n(a)$. This extra capacity allows an improved competitive ratio of $1 - \frac{1}{e^{1/p}}$, proving the lemma.
Proof of Lemma 3.3.3. We construct a feasible dual solution as follows. Recall that the dual LP has a variable $\beta_a$ for each bin a and $z_i$ for each online item i. Though WeightAlg is unaware of items which are not passed to it, we maintain dual variables for these items as well, purely for the purpose of analysis. Let S denote the set of items passed to WeightAlg, and I − S those passed to CardinalityAlg. For ease of notation, we use $e^{1/p}$ to represent $\left(1 + \frac{1}{p \cdot d}\right)^d$, where $d = \min_a n(a)$; as d tends to infinity, the latter expression tends to $e^{1/p}$.

For each item i, whether it is in S or not, we set $z_i = \max_a \{w_{ia} - \beta_a\}$, or $z_i = 0$ if $w_{ia} - \beta_a$ is negative for every bin a. For those items i ∈ S, if $z_i$ is positive, we update $\beta_a$ using the update rule of Definition 3.3.2. This gives a feasible solution to the dual LP defined on the entire set of items (including those in I − S).
In the previous analysis of weighted online matching in [35], following previous
work of Buchbinder et al. [18], one shows a competitive ratio of 1 − 1/e by arguing
that the change in the primal is at least (1 − 1/e) times the change in the dual at
each step. Since we end with a feasible primal and dual, the total value obtained by
the algorithm (the value of the primal solution) is at least (1− 1/e) ·Wopt.
Here, however, we do not compare the change in the primal to the change in the dual directly. This is because, when an item in I − S arrives, there is a change in the dual, but perhaps no change in the primal, as the item does not get passed to WeightAlg. To deal with this, we introduce a new 'reduced' objective function $\sum_{a\in A} p \cdot n(a)\beta_a + \sum_{i\in S} z_i$. We show that the change in the primal is comparable to the change in this new reduced objective. Though the new reduced objective is not an upper bound on the true optimal solution, it is not too difficult to see that it is at least $p \cdot W_{opt}$: for the terms involving the variables $\beta_a$, we have simply scaled by p, but the reduced objective only includes the terms $z_i$ for i ∈ S; this latter difference in the objectives requires some care.
First, though, we examine the relationship between the primal and the reduced
objective. Consider how these change when an item i is sent to WeightAlg. If i is
unassigned (i.e., each $w_{ia} - \beta_a$ is non-positive), we have $z_i = 0$ and no change in either the primal or the dual. Otherwise, let a be the bin that item i is assigned to, and let v be
the value of the currently lowest-valued item assigned to bin a. The change in the primal is $w_{ia} - v$.

The change in the reduced objective is $z_i$ plus $p \cdot n(a)$ times the change in $\beta_a$. Let $\beta_n, \beta_o$ represent the new and old values of $\beta_a$, respectively. We assume that i now becomes the most valuable item for bin a.¹ Then, from the definition of our update rule for $\beta_a$, we have:
$$\beta_n = \left(1 + \frac{1}{p \cdot n(a)}\right)\beta_o + \frac{w_{ia}}{(p \cdot n(a))(e^{1/p} - 1)} - \frac{v \cdot e^{1/p}}{(p \cdot n(a))(e^{1/p} - 1)}$$
Overall, then, the change in the reduced objective is:

$$z_i + p \cdot n(a)\,(\beta_n - \beta_o) = (w_{ia} - \beta_o) + \left(\beta_o + \frac{w_{ia}}{e^{1/p} - 1} - \frac{v \cdot e^{1/p}}{e^{1/p} - 1}\right) = w_{ia} + \frac{w_{ia}}{e^{1/p} - 1} - \frac{v \cdot e^{1/p}}{e^{1/p} - 1}$$

$$= (w_{ia} - v) \cdot \left(\frac{e^{1/p}}{e^{1/p} - 1}\right) = (w_{ia} - v) \Big/ \left(1 - \frac{1}{e^{1/p}}\right)$$
Since the change in the primal is $w_{ia} - v$, we have that the change in the primal is at least $1 - \frac{1}{e^{1/p}}$ times the change in the reduced objective. It remains only to argue that the reduced objective is, in expectation, at least p times the original dual objective. Recall that the reduced objective is $\sum_{a\in A} p \cdot n(a)\beta_a + \sum_{i\in S} z_i$, while the original dual objective is $\sum_{a\in A} n(a)\beta_a + \sum_{i\in I} z_i$.

The terms involving $\beta_a$ are simply scaled by p in the reduced objective, so for every run of the algorithm, the contribution of these terms to the reduced objective is exactly p times their contribution to the original dual objective. However, of the terms involving the $z_i$'s, the reduced objective only includes a subset which depends on which items are selected for S. As each item is selected for S with probability p independently, we would like to conclude that in expectation (though clearly not in every run), $\sum_{i\in S} z_i = p \sum_{i\in I} z_i$.
¹It is easy to observe that this is the worst case for our algorithm, as this gives the maximum increase in the dual; see [35].
However, the value of $z_i$ depends on the random allocation of items to S, since it depends on each $\beta_a$ at the time item i arrives, and each $\beta_a$ is affected only by those items in S. Still, it is possible to use linearity of expectation, since $z_i$ is not a function of whether i itself is assigned to S (in all cases, $z_i = w_{ia} - \beta_a$), but only a function of which previous items were assigned to S. Thus, the expected contribution of each item to the reduced objective is exactly p times its expected contribution to the original dual objective; this can be verified by some straightforward algebra, which we omit.
Algorithm CardinalityAlg is identical to WeightAlg, except that it assumes all items
have weight 1 for each bin. Since items are assigned to CardinalityAlg with probability
1 − p, Lemma 3.3.3 implies the following corollary. This concludes the proof of
Theorem 3.1.1.
Corollary 3.3.4. The total cardinality of the allocation of CardinalityAlg is at least $(1-p) \cdot \left(1 - \frac{1}{k}\right) \cdot C_{opt}$, where $k = \left(1 + \frac{1}{(1-p) \cdot d}\right)^d$ and $d = \min_a n(a)$. Note that $\lim_{d\to\infty} k = e^{1/(1-p)}$.
3.4 Algorithm for Small Capacities
We now consider algorithms for the case when the capacities of bins are not large.
Without loss of generality, we assume that the capacity of each bin is one, because we can think of a bin with capacity c as c identical bins each with capacity one. So we have a set A of bins, each with capacity one, and a set of items I arriving online. As
before, we use two subroutines WeightAlg and CardinalityAlg, but the algorithms are
slightly different from those in the previous section.
In WeightAlg, we match item i (that has been passed to WeightAlg) to the bin that maximizes its marginal value. Formally, we match i to bin $a = \arg\max_{a\in A}(w_{i,a} - w_{i',a})$, where i′ is the last item assigned to a before item i arrives.
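This marginal-value rule admits a short sketch (with unit capacities, the "last item assigned" is simply the current occupant; names are ours):

```python
def weight_alg_unit(items, bins, w):
    """Greedy marginal-value matching for unit-capacity bins: item i is
    matched to the bin maximizing w[i][a] minus the value of the item
    currently occupying a (the new item replaces the old one)."""
    occupant = {a: None for a in bins}
    for i in items:
        def marginal(a):
            held = occupant[a]
            return w[i][a] - (w[held][a] if held is not None else 0.0)
        best = max(bins, key=marginal)
        if marginal(best) > 0:
            occupant[best] = i  # replace the previously assigned item
    return occupant
```

Only the last item in each bin counts toward the weight of the allocation, which is exactly why forwarding edges can be "wasted" in the analysis below.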
In CardinalityAlg, we run the RANKING algorithm presented in [56]. CardinalityAlg chooses a permutation π uniformly at random on the set of bins A and assigns each item i (that has been passed to it) to the available bin a of minimum rank in π among the bins that have an edge to i.
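RANKING itself admits a compact sketch (the neighbor predicate and names are ours):

```python
import random

def ranking(bins, items, has_edge):
    """RANKING: fix one uniformly random permutation over the bins up front;
    match each arriving item to its unmatched neighbor of minimum rank."""
    order = list(bins)
    random.shuffle(order)
    rank = {a: r for r, a in enumerate(order)}
    match = {}  # bin -> item
    for i in items:
        free = [a for a in bins if a not in match and has_edge(i, a)]
        if free:
            match[min(free, key=rank.get)] = i
    return match
```

The single random permutation is drawn before any item arrives, so the algorithm's decisions are deterministic given π, which is what the analysis in Subsection 3.4.2 exploits.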
3.4.1 Lower Bounding the Weight Approximation Ratio
Let n = |I| be the number of items. We denote the i-th arriving item by i. Let $a_i$ be the bin that i is matched to in $W_{opt}$ for any 1 ≤ i ≤ n. One can assume that all unmatched items in the optimum weight allocation are matched with zero-weight edges to an imaginary bin, so $W_{opt}$ is equal to $\sum_{i=1}^{n} w_{i,a_i}$. Let S be the set of items that have been passed to WeightAlg. If WeightAlg matches item i to bin $a_j$ for some j > i, we call this a forwarding allocation (edge), because item j (the match of $a_j$ in $W_{opt}$) has not arrived yet. We call it a selected forwarding edge if j ∈ S. We define the marginal value of assigning item i to bin a to be $w_{ia}$ minus the value of any item previously assigned to a.
Lemma 3.4.1. The weight of the allocation of WeightAlg is at least $(p/(p+1)) \cdot W_{opt}$.

Proof. Each forwarding edge will be a selected forwarding edge with probability p, because Pr[j ∈ S] is p for any j ∈ I. Let F be the total weight of the forwarding edges of WeightAlg, where by the weight of a forwarding edge we mean its marginal value (not the actual weight of the edge). Similarly, we define $F_s$ to be the sum of the marginal values of the selected forwarding edges. We have the simple equality that the expected value of F, E(F), is $E(F_s)/p$. We define W′ and $W_s$ to be the total marginal value of the allocation of WeightAlg and the sum $\sum_{i\in S} w_{i,a_i}$, respectively. We know that $E(W_s)$ is $p \cdot W_{opt}$ because Pr[i ∈ S] is p. We prove that W′ is at least $W_s - F_s$.

For every item i that has been selected to be matched by WeightAlg, we gain at least the marginal value $w_{i,a_i}$ minus the sum of all marginal values of items that have been assigned to bin $a_i$ by WeightAlg so far. If we sum these lower bounds on our gains over all selected items, we get $W_s\ (= \sum_{i\in S} w_{i,a_i})$ minus the sum of all marginal values of items that have been assigned to $a_i$ before item i arrives, over all i ∈ S. The latter part is exactly the definition of $F_s$. Therefore W′ is at least $W_s - F_s$. We also know that W′ ≥ F. Using $E[F] = E[F_s]/p$, we have that E(W′) is at least $E(W_s) - p\,E(W')$, and this yields the p/(p + 1) approximation factor.

Corollary 3.4.2. The weight and cardinality approximation guarantees of Alg are at least p/(p + 1) and (1 − p)/((1 − p) + 1), respectively.
3.4.2 Factor Revealing Linear Program for CardinalityAlg
Our goal in this subsection is to prove an approximation factor for CardinalityAlg better than the (1 − p)/((1 − p) + 1) bound of Corollary 3.4.2, proving Theorem 3.1.2, by formulating a factor-revealing LP that lower bounds it.
Proof of Theorem 3.1.2. We prove that the cardinality approximation ratio of CardinalityAlg is lower bounded by the optimal value of the linear program $LP_k$ shown below, for any positive integer k. We state the proof and the linear program $LP_k$ for
the simpler case of p = 1/2, and show the necessary changes for general p ∈ [0, 1]
at the end of the proof. Before showing how this LP lower bounds the cardinality
approximation factor, we note that the first three lines of constraints in LPk hold for
the weighted version and in fact give us the 1/3 lower bound. The last inequality is
specific to cardinality and makes an improvement on the lower bound from 1/3 to
almost 0.37.
Minimize: β subject to
∀ 1 < i ≤ k:  $s_i \ge s_{i-1}$,  $sf_i \ge sf_{i-1}$,  $sb_i \ge sb_{i-1}$
∀ 1 ≤ i ≤ k:  $t_i \ge t_{i-1}$,  $t_i \ge sf_i$,  $s_i = sf_i + sb_i$
$\beta \ge s_k + t_k$,  $\beta \ge 1/2 - sf_k$
∀ 1 ≤ i ≤ k:  $s_i - s_{i-1} \ge 1/(2k) - (s_i + t_i)/k$
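For p = 1/2, $LP_k$ can be handed to an off-the-shelf LP solver; a minimal sketch using scipy (the variable layout and function name are ours):

```python
import numpy as np
from scipy.optimize import linprog

def solve_lp_k(k):
    """Numerically solve LP_k for p = 1/2 (a sketch). Variable layout:
    x = [beta, s_1..s_k, sf_1..sf_k, sb_1..sb_k, t_1..t_k], all >= 0."""
    n = 1 + 4 * k
    s = lambda i: i                      # 1-based indices into x
    sf = lambda i: k + i
    sb = lambda i: 2 * k + i
    t = lambda i: 3 * k + i

    A_ub, b_ub = [], []
    def leq(coeffs, rhs):                # encode sum(c * x_j) <= rhs
        row = np.zeros(n)
        for j, c in coeffs:
            row[j] += c
        A_ub.append(row)
        b_ub.append(rhs)

    for i in range(2, k + 1):            # monotonicity of s, sf, sb, t
        for v in (s, sf, sb, t):
            leq([(v(i - 1), 1.0), (v(i), -1.0)], 0.0)
    for i in range(1, k + 1):
        leq([(sf(i), 1.0), (t(i), -1.0)], 0.0)          # t_i >= sf_i
    leq([(s(k), 1.0), (t(k), 1.0), (0, -1.0)], 0.0)     # beta >= s_k + t_k
    leq([(0, -1.0), (sf(k), -1.0)], -0.5)               # beta >= 1/2 - sf_k
    for i in range(1, k + 1):  # s_i - s_{i-1} >= 1/(2k) - (s_i + t_i)/k
        coeffs = [(s(i), -1.0 - 1.0 / k), (t(i), -1.0 / k)]
        if i > 1:
            coeffs.append((s(i - 1), 1.0))
        leq(coeffs, -0.5 / k)

    A_eq = np.zeros((k, n))              # s_i = sf_i + sb_i
    for i in range(1, k + 1):
        A_eq[i - 1, s(i)] = 1.0
        A_eq[i - 1, sf(i)] = -1.0
        A_eq[i - 1, sb(i)] = -1.0

    c = np.zeros(n)
    c[0] = 1.0                           # minimize beta
    res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub,
                  A_eq=A_eq, b_eq=np.zeros(k), method="highs")
    return res.fun
```

Without the last family of constraints the optimum is 1/3; with them it rises to roughly 0.37, consistent with the bound derived in this proof.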
In CardinalityAlg, a uniformly random permutation π is selected on the set A of bins. Let A′ ⊆ A be the set of bins matched in $C_{opt}$. We define A′′ to be the set of bins matched by CardinalityAlg, which depends on the permutation π. We divide the permutation π into k equal parts, each with |A|/k bins. For each a ∈ A′, we define i(a) to be the match of a in $C_{opt}$. Let $A_c$ be the set of bins a such that i(a) has been selected to be passed to CardinalityAlg. For each 1 ≤ i ≤ k, we define the sets $S_i$ and $T_i$ as follows: $S_i = A' \cap A'' \cap A_c \cap \{\pi_j \mid 1 \le j \le i|A|/k\}$, and $T_i = (A' \cap A'' \cap \{\pi_j \mid 1 \le j \le i|A|/k\}) \setminus A_c$. In other words, $S_i$ is the set of bins in the first i parts of π that are matched in both $C_{opt}$ and our algorithm, and whose matches in $C_{opt}$ are selected to be passed to CardinalityAlg. The only difference for $T_i$ is that their matches in $C_{opt}$ are not selected for CardinalityAlg. We also partition $S_i$ into two sets: $SF_i$, the set of bins that have been matched in our algorithm with forwarding edges, and the rest in the set $SB_i$. We
recall that a forwarding edge in our algorithm is an edge matching an item to a bin a such that i(a) has not arrived yet. We prove that the following is a feasible solution for $LP_k$: $s_i = E[|S_i|/|A'|]$, $sf_i = E[|SF_i|/|A'|]$, $sb_i = E[|SB_i|/|A'|]$, and $t_i = E[|T_i|/|A'|]$.
Now we prove that the constraints of $LP_k$ hold. The first four constraints are the monotonicity constraints, which hold by definition; the same is true for the sixth constraint. The constraint $t_i \ge sf_i$ holds because every forwarding edge (incident to a bin in the first i parts of π) is counted in $sf_i$ with probability 1/2 and in $t_i$ with the remaining probability 1/2. This way everything in $sf_i$ is counted, but there might be other, uncounted bins in $t_i$; this is why we get an inequality instead of an equality. We have β at least $1/2 - sf_k$ because in expectation half of the $|A'|$ items matched in $C_{opt}$ will be selected for cardinality, and the number of them that are not matched in CardinalityAlg is at most $sf_k|A'|$. The inequality $\beta \ge s_k + t_k$ holds by definition. These inequalities give us the 1/3 lower bound on β. We add the following inequality as well to get a better approximation ratio for cardinality. We note that similar (and simpler) inequalities have been presented in the online matching literature, e.g., Lemma 5 in [17].
$$\forall\, 1 \le i \le k: \quad s_i - s_{i-1} \ge 1/(2k) - (s_i + t_i)/k$$
Let $s_0 = 0$ to make the math consistent. This inequality lower bounds the expected number of bins in $A' \cap A'' \cap A_c$ matched in part i of the permutation π. For each a ∈ A′ in part i of the permutation π, we prove that the probability that a ∈ $A_c$ and a ∈ A′′ is at least $1/2 - (s_i + t_i)$. With probability 1/2, a is in the set $A_c$. In this case, we know that if no item is matched to a by CardinalityAlg, item i(a) has been assigned by CardinalityAlg to some bin a′ whose rank in π is smaller than that of a. This means that i(a) is matched to one of the bins in $S_i$ or $T_i$. Since π is selected uniformly at random, a is distributed uniformly at random over $A' \cap A_c$. So the probability of this event (i(a) being matched to some bin before a in π), given that $a \in A' \cap A_c$, is at most $(s_i + t_i)|A'|/|A' \cap A_c|$. Therefore, with probability at least $(1 - (s_i + t_i)|A'|/|A' \cap A_c|)/2$, bin a is matched by CardinalityAlg. Summing these lower bounds on the probabilities over all
different choices of a, and applying the fact that in expectation there are |A′|/k bins a ∈ A′ in part i of π, we get the lower bound $s_i - s_{i-1} \ge 1/(2k) - (s_i + t_i)/k$. We note that $E(|A' \cap A_c|) = E(|A'|)/2$, and for large values of |A′| the ratio $|A'|/|A' \cap A_c|$ is close to 2 with high precision, i.e., it is at most 2 + ε for any small constant ε > 0 and large enough |A′|. Since the solution of this linear program is greater than 0.3698 for large enough k, we know that the approximation ratio of CardinalityAlg, and therefore of our algorithm, is at least 0.3698.
For any other value of p, we can modify $LP_k$ as follows. We just need to change three constraints. Instead of $t_i \ge sf_i$, we have $t_i/p \ge sf_i/(1-p)$, because with probability p we choose each item for weight and with the remaining probability 1 − p for cardinality. For a similar reason, we have $\beta \ge (1-p) - sf_k$ and $s_i - s_{i-1} \ge (1-p)/k - (s_i + t_i)/k$ instead of their simpler versions. We enumerate over different values of p, and solve $LP_k$ for each of these values to get the green lower bound curve in Figure 3-1.
Chapter 4
Submodular Secretary Problem
and its Extensions
Online auctions are the essence of many modern markets, particularly networked markets, in which information about goods, agents, and outcomes is revealed over a period of time, and the agents must make irrevocable decisions without knowing future information. Optimal stopping theory is a powerful tool for analyzing such scenarios, which generally require optimizing an objective function over the space of stopping rules for an allocation process under uncertainty. Combining optimal stopping theory with game theory allows us to model the actions of rational agents applying competing stopping rules in an online market. This was first done by Hajiaghayi et al. [48], who considered the well-known secretary problem in online settings and initiated several follow-up papers (see, e.g., [7, 8, 9, 47, 52, 59]).
Perhaps the most classic problem of stopping theory is the secretary problem.
Imagine that you manage a company, and you want to hire a secretary from a pool of
n applicants. You are very keen on hiring only the best and brightest. Unfortunately,
you cannot tell how good a secretary is until you interview her, and you must make
an irrevocable decision whether or not to make an offer at the time of the interview.
The problem is to design a strategy which maximizes the probability of hiring the
most qualified secretary. It is well known since 1963 [27] that the optimal policy is to interview the first t − 1 applicants, then hire the next one whose quality exceeds that of the first t − 1 applicants, where t is defined by $\sum_{j=t+1}^{n} \frac{1}{j-1} \le 1 < \sum_{j=t}^{n} \frac{1}{j-1}$; as n → ∞, the probability of hiring the best applicant approaches 1/e, as does the ratio t/n. Note that a solution to the secretary problem immediately yields an algorithm for a slightly different objective function: optimizing the expected value of the chosen element. Subsequent papers have extended the problem by varying the objective function, varying the information available to the decision-maker, and so on; see, e.g., [3, 41, 75, 80].
An important generalization of the secretary problem with several applications
(see e.g., a survey by Babaioff et al. [8]) is called the multiple-choice secretary problem
in which the interviewer is allowed to hire up to k ≥ 1 applicants in order to maximize
performance of the secretarial group based on their overlapping skills (or the joint
utility of selected items in a more general setting). More formally, assuming the applicants of a set $S = \{a_1, a_2, \cdots, a_n\}$ (the applicant pool) arrive in a uniformly random order, the goal is to select a set of at most k applicants in order to maximize a profit function $f : 2^S \to \mathbb{R}$. We assume f is non-negative throughout this chapter. For example, when f(T) is the maximum individual value [39, 40], or when f(T) is the sum of the individual values in T [59], the problem has been considered thoroughly in the literature. Indeed, both of these cases are special monotone non-negative submodular functions that we consider in this chapter. A function $f : 2^S \to \mathbb{R}$ is called submodular if and only if $\forall A, B \subseteq S : f(A) + f(B) \ge f(A \cup B) + f(A \cap B)$. An equivalent characterization is that the marginal profit of each item should be non-increasing, i.e., $f(A \cup \{a\}) - f(A) \le f(B \cup \{a\}) - f(B)$ if $B \subseteq A \subseteq S$ and $a \in S \setminus A$. A function $f : 2^S \to \mathbb{R}$ is monotone if and only if $f(A) \le f(B)$ for $A \subseteq B \subseteq S$; otherwise it is non-monotone. Since the number of sets is exponential, we assume value oracle access to the submodular function; i.e., for a given set T, an algorithm can query an oracle to find its value f(T). As we discuss below, maximizing a (monotone or non-monotone) submodular function, which demonstrates economy of scale, is a central and very general problem in combinatorial optimization and has been the subject of a thorough study in the literature.
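As a concrete instance of these definitions, coverage functions are monotone and submodular; a small sketch checking both properties by brute force (the toy instance is ours):

```python
from itertools import combinations

sets = {0: {1, 2}, 1: {2, 3}, 2: {3, 4}}   # ground set S indexes these sets

def f(T):
    """Coverage function: f(T) = size of the union of the chosen sets."""
    return len(set().union(*(sets[i] for i in T))) if T else 0

subsets = [frozenset(c) for r in range(len(sets) + 1)
           for c in combinations(sets, r)]
# submodularity: f(A) + f(B) >= f(A ∪ B) + f(A ∩ B) for all A, B
assert all(f(A) + f(B) >= f(A | B) + f(A & B)
           for A in subsets for B in subsets)
# monotonicity: f(A) <= f(B) whenever A ⊆ B
assert all(f(A) <= f(B) for A in subsets for B in subsets if A <= B)
```

Note that the brute-force check enumerates all $2^{|S|}$ subsets; in the value-oracle model discussed above, an algorithm can only afford polynomially many such queries.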
The closest setting to our submodular multiple-choice secretary problem is the matroid secretary problem considered by Babaioff et al. [9]. In this problem, we are given
a matroid by a ground set U of elements and a collection of independent (feasible)
subsets $I \subseteq 2^U$ describing the sets of elements which can be simultaneously accepted. We recall that a matroid has three properties: 1) the empty set is independent; 2) every subset of an independent set is independent (closed under containment)¹; and finally 3) if A and B are two independent sets and A has more elements than B, then there exists an element in A which is not in B and which, when added to B, still gives an independent set². The goal is to design online algorithms in which the structure of U
and I is known at the outset (assume we have an oracle to answer whether a subset
of U belongs to I or not), while the elements and their values are revealed one at
a time in random order. As each element is presented, the algorithm must make an
irrevocable decision to select or reject it such that the set of selected elements belongs
to I at all times. Babaioff et al. present an O(log r)-competitive algorithm for general
matroids, where r is the rank of the matroid (the size of the maximal independent
set), and constant-competitive algorithms for several special cases arising in practical
scenarios including graphic matroids, truncated partition matroids, and bounded de-
gree transversal matroids. However, they leave as a main open question the existence
of constant-competitive algorithms for general matroids. Our constant-competitive
algorithms for the submodular secretary problem in this chapter can be considered
in parallel with this open question. To generalize both results of Babaioff et al. and
ours, we also consider the submodular matroid secretary problem in which we want
to maximize a submodular function over all independent (feasible) subsets I of the
given matroid. Moreover, we extend our approach to the case in which l matroids
are given and the goal is to find the set of maximum value which is independent with
respect to all the given matroids. We present an O(l log2 r)-competitive algorithm for
the submodular matroid secretary problem generalizing previous results.
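To make the three matroid axioms concrete, here is a brute-force check for a small uniform matroid (the toy instance is ours):

```python
from itertools import combinations

U = {1, 2, 3, 4}
rank = 2
# Uniform matroid: a set is independent iff it has at most `rank` elements.
indep = {frozenset(c) for r in range(rank + 1) for c in combinations(U, r)}

# 1) the empty set is independent
assert frozenset() in indep
# 2) hereditary: every subset of an independent set is independent
assert all(frozenset(B) in indep
           for A in indep for r in range(len(A))
           for B in combinations(A, r))
# 3) augmentation: if |A| > |B|, some x in A \ B keeps B independent
assert all(any(B | {x} in indep for x in A - B)
           for A in indep for B in indep if len(A) > len(B))
```

The multiple-choice secretary problem corresponds exactly to this uniform matroid: a set of accepted applicants is feasible iff it has at most k elements.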
Prior to our work, there was no polynomial-time algorithm with a nontrivial guarantee for the case of l matroids (even in the offline setting) when l is not a fixed
¹This is sometimes called the hereditary property.
²This is sometimes called the augmentation property or the independent set exchange property.
constant. Lee et al. [61] give a local-search procedure for the offline setting that runs in time $O(n^l)$ and achieves approximation ratio l + ε. Even the simpler case of a linear function cannot be approximated to within a factor better than Ω(l/ log l) [51]. Our results imply algorithms with guarantees $O(l \log r)$ and $O(l \log^2 r)$ for the offline and (online) secretary settings, respectively. Both of these algorithms run in time polynomial in l. In the case of knapsack constraints, the only previous relevant work that we are aware of is that of Lee et al. [61], which gives a (5 + ε)-approximation in the offline setting if the number of constraints is a constant. In contrast, our results work for an arbitrary number of knapsack constraints, albeit with a loss in the guarantee; see Theorem 4.1.3.
Our competitive ratio for the submodular secretary problem is 7/(1 − 1/e). Though
our algorithm is relatively simple, it has several phases and its analysis is relatively
involved. As we point out below, we cannot obtain any approximation factor better
than 1 − 1/e even for offline special cases of our setting unless P = NP. A natural
generalization of a submodular function, while still preserving economy of scale, is a subadditive function f : 2^S → ℝ in which f(A) + f(B) ≥ f(A ∪ B) for all A, B ⊆ S. In
this chapter, we show that if we consider the subadditive secretary problem instead
of the submodular secretary problem, there is no algorithm with competitive ratio
o(√n). We complement this result by giving an O(√n)-competitive algorithm for the subadditive secretary problem.
Background on submodular maximization Submodularity, a discrete analog
of convexity, has played a central role in combinatorial optimization [62]. It appears
in many important settings including cuts in graphs [53, 44, 71], plant location prob-
lems [22, 21], rank function of matroids [28], and set covering problems [30].
The problem of maximizing a submodular function is of essential importance,
with special cases including Max Cut [44], Max Directed Cut [49], hypergraph cut
problems, maximum facility location [1, 22, 21], and certain restricted satisfiability
problems [50, 29]. While the Min Cut problem in graphs is a classical polynomial-
time solvable problem, and more generally it has been shown that any submodular
function can be minimized in polynomial time [53, 72], maximization turns out to be
more difficult and indeed all the aforementioned special cases are NP-hard.
Max-k-Cover, where the goal is to choose k sets whose union is as large as possible,
is another related problem. It is shown that a greedy algorithm provides a (1− 1/e)
approximation for Max-k-Cover [58] and this is optimal unless P = NP [30]. More
generally, we can view this problem as maximization of a monotone submodular func-
tion under a cardinality constraint, that is, we seek a set S of size k maximizing f(S).
The greedy algorithm again provides a (1− 1/e) approximation for this problem [69].
A 1/2 approximation has been developed for maximizing monotone submodular func-
tions under a matroid constraint [37]. A (1 − 1/e) approximation has been also ob-
tained for a knapsack constraint [73], and for a special class of submodular functions
under a matroid constraint [20].
Recently, constant-factor approximation algorithms for maximizing non-negative non-monotone submodular functions have also been obtained [32]. Typical
examples of such a problem are max cut and max directed cut. Here, the best
approximation factors are 0.878 for max cut [44] and 0.859 for max directed cut [29].
The approximation factor for max cut has been proved optimal, assuming the Unique
Games Conjecture [57]. Generalizing these results, Vondrák very recently obtained a constant-factor approximation algorithm for maximizing non-monotone submodular functions under a matroid constraint [77]. Subadditive maximization has also been considered recently (e.g., in the context of maximizing welfare [31]).
Submodular maximization also plays a role in maximizing the difference of a
monotone submodular function and a modular function. A typical example of this
type is the maximum facility location problem in which we want to open a subset of
facilities and maximize the total profit from clients minus the opening cost of facilities.
Approximation algorithms have been developed for a variant of this problem which
is a special case of maximizing nonnegative submodular functions [1, 22, 21]. The
current best approximation factor known for this problem is 0.828 [1]. Asadpour et
al. [5] study the problem of maximizing a submodular function in a stochastic setting,
and obtain constant-factor approximation algorithms.
4.1 Our Results and Techniques
The main theorem in this chapter is as follows.
Theorem 4.1.1. There exists a 7/(1 − 1/e)-competitive algorithm for the monotone submodular secretary problem. More generally, there exists an 8e²-competitive algorithm for the non-monotone submodular secretary problem.
We prove Theorem 4.1.1 in Section 4.2. We first present our simple algorithms
for the problem. Since our algorithm for the general non-monotone case uses that of the monotone case, we first present the analysis for the latter case and then extend it to the former case. We divide the input stream into equal-sized segments, and show that
restricting the algorithm to pick only one item from each segment decreases the value
of the optimum by at most a constant factor. Then in each segment, we use a standard
secretary algorithm to pick the best item conditioned on our previous choices. We
next prove that these local optimization steps lead to a global near-optimal solution.
The argument breaks for the non-monotone case since the algorithm actually
approximates a set which is larger than the optimal solution. The trick is to invoke
a new structural property of (non-monotone) submodular functions which allows us
to divide the input into two equal portions, and randomly solve the problem on one.
Indeed Theorem 4.1.1 can be extended for the submodular matroid secretary
problem as follows.
Theorem 4.1.2. There exists an O(l log² r)-competitive algorithm for the (non-
monotone) matroid submodular secretary problem, where r is the maximum rank of
the given l matroids.
We prove Theorem 4.1.2 in Section 4.3. We note that in the submodular matroid
secretary problem, selecting (bad) elements early in the process might prevent us
from selecting (good) elements later since there are matroid independence (feasibility)
constraints. To overcome this issue, we only work with the first half of the input. This
guarantees that at each point in expectation there is a large portion of the optimal
solution that can be added to our current solution without violating the matroid
constraint. However, this set may not have a high value. As a remedy we prove there
is a near-optimal solution all of whose large subsets have a high value. This novel
argument may be of independent interest.
We briefly present in Section 4.4 our results for the submodular secretary problem with respect to l knapsack constraints. In this setting, there are l knapsack capacities C_i : 1 ≤ i ≤ l, and each item j has a weight w_ij associated with each knapsack i. A set T of items is feasible if and only if for each knapsack i, we have Σ_{j∈T} w_ij ≤ C_i.
Theorem 4.1.3. There exists an O(l)-competitive algorithm for the (non-monotone)
multiple knapsack submodular secretary problem, where l denotes the number of given
knapsack constraints.
Lee et al. [61] give a better (5 + ε)-approximation in the offline setting if l is a
fixed constant.
We next show that submodular secretary problems are indeed the most general setting for which we can hope for constant competitiveness.
Theorem 4.1.4. For the subadditive secretary problem, there is no algorithm with competitive ratio o(√n). However, there is an algorithm with an almost tight O(√n) competitive ratio in this case.
We prove Theorem 4.1.4 in Section 4.5. The algorithm for the matching upper
bound is very simple; however, the lower bound uses clever ideas and indeed works in a more general setting. We construct a subadditive function, which interestingly
is almost submodular, and has a “hidden good set”. Roughly speaking, the value
of any query to the oracle is proportional to the intersection of the query and the
hidden good set. However, the oracle’s response does not change unless the query has
considerable intersection with the good set which is hidden. Hence, the oracle does
not give much information about the hidden good set.
Finally in our concluding remarks in Section 4.6, we briefly discuss two other
aggregate functions max and min, where the latter is not even submodular and models a bottleneck situation in the secretary problem.
Remark Subsequent to our study of online submodular maximization [12], Gupta et
al. [45] consider similar problems. By reducing the case of non-monotone submodular
functions to several runs of the greedy algorithm for monotone submodular functions,
they present O(p)-approximation algorithms for maximizing submodular functions (in
the offline setting) subject to p-independence systems (which include the intersection
of p matroids), and constant factor approximation algorithms when the maximization
is subject to a knapsack constraint. In the online secretary setting, they provide O(1)-
competitive results for maximizing a submodular function subject to cardinality or
partition matroid constraints. They also obtain an O(log r) competitive ratio for
maximization subject to a general matroid of rank r. The latter result improves our
Theorem 4.1.2 when l = 1.
4.2 The Submodular Secretary Problem
4.2.1 Algorithms
In this section, we present the algorithms used to prove Theorem 4.1.1. In the classic
secretary problem, the efficiency value of each secretary is known only after she arrives.
In order to marry this with the value oracle model, we say that the oracle answers
the query regarding the efficiency of a set S ′ ⊆ S only if all the secretaries in S ′ have
already arrived and been interviewed.
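To make this convention concrete, here is a minimal Python sketch (not part of the thesis; the class and method names are illustrative) of an arrival-aware value oracle that refuses queries about secretaries who have not yet been interviewed:

```python
class ArrivalOracle:
    """Value-oracle convention used here: the oracle answers f(S')
    only if every secretary in S' has already arrived."""

    def __init__(self, f):
        self._f = f
        self._arrived = set()

    def arrive(self, a):
        # record that secretary a has been interviewed
        self._arrived.add(a)

    def value(self, query):
        if not set(query) <= self._arrived:
            raise ValueError("query contains secretaries not yet interviewed")
        return self._f(frozenset(query))

oracle = ArrivalOracle(lambda S: len(S))  # toy modular f
oracle.arrive("alice")
print(oracle.value({"alice"}))  # 1
```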
Our algorithm for the monotone submodular case is relatively simple, though its analysis is relatively involved. First we assume that n is a multiple of k, since otherwise we could virtually insert k⌈n/k⌉ − n dummy secretaries into the input: for any subset A of dummy secretaries and any set B ⊆ S, we have f(A ∪ B) = f(B). In other words, there is no profit in employing the dummy secretaries. To be more precise, we simulate the augmented input in such a way that these secretaries arrive uniformly at random, similarly to the real ones. Thus, we say that n is a multiple of k without loss of generality.
We partition the input stream into k equally-sized segments, and, roughly speaking, try to employ the best secretary in each segment.

Algorithm 1 Monotone Submodular Secretary Algorithm
Input: A monotone submodular function f : 2^S → ℝ, and a randomly permuted stream of secretaries, denoted by (a_1, a_2, ..., a_n), where n is an integer multiple of k.
Output: A subset of at most k secretaries.

  Let T_0 ← ∅
  Let l ← n/k
  for i ← 1 to k do    {phase i}
    Let u_i ← (i − 1)l + l/e
    Let α_i ← max_{(i−1)l ≤ j < u_i} f(T_{i−1} ∪ {a_j})
    if α_i < f(T_{i−1}) then
      α_i ← f(T_{i−1})
    end if
    Pick an index p_i : u_i ≤ p_i < il such that f(T_{i−1} ∪ {a_{p_i}}) ≥ α_i
    if such an index p_i exists then
      Let T_i ← T_{i−1} ∪ {a_{p_i}}
    else
      Let T_i ← T_{i−1}
    end if
  end for
  Output T_k as the solution

Let l := n/k denote the length of each segment. Let a_1, a_2, ..., a_n be the actual ordering in which the secretaries are interviewed. Break the input into k segments such that S_j = {a_{(j−1)l+1}, a_{(j−1)l+2}, ..., a_{jl}} for 1 ≤ j < k, and S_k = {a_{(k−1)l+1}, a_{(k−1)l+2}, ..., a_n}.
We employ at most one secretary from each segment Si. Note that this way of having
several phases of (almost) equal length for the secretary problem seems novel to this
chapter, since in previous works there are usually only two phases (see e.g. [48]). The
phase i of our algorithm corresponds to the time interval when the secretaries in Si
arrive. Let T_i be the set of secretaries that we have employed from ⋃_{j=1}^{i} S_j. Define T_0 := ∅ for convenience. In phase i, we try to employ a secretary e from S_i that maximizes f(T_{i−1} ∪ {e}) − f(T_{i−1}). For each e ∈ S_i, we define g_i(e) := f(T_{i−1} ∪ {e}) − f(T_{i−1}). Then, we are trying to employ a secretary x ∈ S_i that has the maximum value g_i(x). Using a classic algorithm for the secretary problem (see [27] for instance) for employing the single secretary, we solve this subproblem with constant probability 1/e. Hence, with constant probability, we pick the secretary that maximizes our local profit in each phase. It remains to prove that this local optimization leads to a reasonable global guarantee.
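The phase structure described above can be sketched in Python. This is an illustrative toy implementation, not the thesis's code; the coverage function and all names are assumptions made for the example.

```python
import math
import random

def monotone_submodular_secretary(stream, f, k):
    """Sketch of Algorithm 1: split the randomly ordered stream into k
    segments; in each segment, run the classic secretary rule on marginal
    values (observe a 1/e fraction, then take the first item whose
    marginal gain beats the best observed)."""
    n = len(stream)
    assert n % k == 0, "pad with dummy items so that k divides n"
    l = n // k
    T = set()
    for i in range(k):
        segment = stream[i * l:(i + 1) * l]
        observe = int(l / math.e)
        # threshold: best value seen in the observation window,
        # never below the current value f(T)
        alpha = max([f(T | {a}) for a in segment[:observe]] + [f(T)])
        for a in segment[observe:]:
            if f(T | {a}) >= alpha:
                T = T | {a}
                break  # at most one hire per segment
        # if no item beats alpha, T is unchanged in this phase
    return T

# toy monotone submodular function: coverage of sets of integers
def coverage(T):
    return len(set().union(*T)) if T else 0

items = [frozenset(random.sample(range(30), 5)) for _ in range(20)]
random.shuffle(items)  # the secretary model assumes a random arrival order
picked = monotone_submodular_secretary(items, coverage, k=4)
print(len(picked), coverage(picked))
```

The analysis below shows this local rule is constant-competitive in expectation; the sketch makes no such claim for any single run.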
The previous algorithm fails in the non-monotone case. Observe that the first if statement is never true for a monotone function; for a non-monotone function, however, it guarantees that the values f(T_i) are non-decreasing. Algorithm 2 first divides the input stream into two equal-sized parts: U_1 and U_2. Then, with probability 1/2, it calls Algorithm 1 on U_1, whereas with the same probability, it skips over the first half of the input and runs Algorithm 1 on U_2.
Algorithm 2 Submodular Secretary Algorithm
Input: A (possibly non-monotone) submodular function f : 2^S → ℝ, and a randomly permuted stream of secretaries, denoted by (a_1, a_2, ..., a_n), where n is an integer multiple of 2k.
Output: A subset of at most k secretaries.

  Let U_1 := {a_1, a_2, ..., a_{n/2}}
  Let U_2 := {a_{n/2+1}, ..., a_{n−1}, a_n}
  Let 0 ≤ X ≤ 1 be a uniformly random value
  if X ≤ 1/2 then
    Run Algorithm 1 on U_1 to get S_1
    Output S_1 as the solution
  else
    Run Algorithm 1 on U_2 to get S_2
    Output S_2 as the solution
  end if
4.2.2 Analysis
In this section, we prove Theorem 4.1.1. Since the algorithm for the non-monotone
submodular secretary problem uses that for the monotone submodular secretary prob-
lem, first we start with the monotone case.
Monotone Submodular
We prove in this section that for Algorithm 1, the expected value of f(T_k) is within a constant factor of the optimal solution. Let R = {a_{i_1}, a_{i_2}, ..., a_{i_k}} be the optimal solution. Note that the set {i_1, i_2, ..., i_k} is a uniformly random subset of {1, 2, ..., n} of size k. It is also important to note that the permutation of the elements of the optimal solution on these k places is also uniformly random, and is independent of the set {i_1, i_2, ..., i_k}. For example, any of the k elements of the optimum can appear as a_{i_1}. These are two key facts used in the analysis.
Before starting the analysis, we present a simple property of submodular functions
which will prove useful in the analysis.
Lemma 4.2.1. If f : 2^S → ℝ is a submodular function, we have f(B) − f(A) ≤ Σ_{a∈B\A} [f(A ∪ {a}) − f(A)] for any A ⊆ B ⊆ S.
Proof. Let k := |B| − |A|. Then, define in an arbitrary manner sets {B_i}_{i=0}^{k} such that

• B_0 = A,

• |B_i \ B_{i−1}| = 1 for i : 1 ≤ i ≤ k,

• and B_k = B.

Let b_i be the unique element of B_i \ B_{i−1} for i : 1 ≤ i ≤ k. We can write f(B) − f(A) as follows:

f(B) − f(A) = Σ_{i=1}^{k} [f(B_i) − f(B_{i−1})]
            = Σ_{i=1}^{k} [f(B_{i−1} ∪ {b_i}) − f(B_{i−1})]
            ≤ Σ_{i=1}^{k} [f(A ∪ {b_i}) − f(A)],

where the last inequality follows from the non-increasing marginal profit property of submodular functions. Noticing that each b_i ∈ B \ A and they are distinct, namely b_i ≠ b_{i′} for 1 ≤ i ≠ i′ ≤ k, finishes the argument.
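As a quick numerical sanity check of this inequality (illustrative code, not part of the thesis), one can verify it on random coverage functions, which are submodular:

```python
import random

def coverage_fn(universe_sets):
    # f(S) = size of the union of the ground sets indexed by S
    def f(S):
        return len(set().union(*(universe_sets[i] for i in S))) if S else 0
    return f

random.seed(0)
ground = [set(random.sample(range(40), 6)) for _ in range(10)]
f = coverage_fn(ground)

# check f(B) - f(A) <= sum over a in B \ A of [f(A + a) - f(A)]
for _ in range(100):
    B = set(random.sample(range(10), 6))
    A = set(random.sample(sorted(B), 3))
    lhs = f(B) - f(A)
    rhs = sum(f(A | {a}) - f(A) for a in B - A)
    assert lhs <= rhs
print("Lemma 4.2.1 inequality verified on random coverage instances")
```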
Define X := {S_i : S_i ∩ R ≠ ∅}. For each S_i ∈ X, we pick one element, say s_i, of S_i ∩ R at random. These selected items form a set R′ = {s_1, s_2, ..., s_{|X|}} ⊆ R of size |X|. Since our algorithm approximates such a set, we study the value of such random samples of R in the following lemmas. We first show that restricting ourselves to picking at most one element from each segment does not prevent us from picking many elements from the optimal solution (i.e., R).
Lemma 4.2.2. The expected value of the number of items in R′ is at least k(1−1/e).
Proof. We know that |R′| = |X |, and |X | is equal to k minus the number of sets Si
whose intersection with R is empty. So, we compute the expected number of these
sets, and subtract this quantity from k to obtain the expected value of |X | and thus
|R′|.
Consider a set S_q, 1 ≤ q ≤ k, and the elements of R = {a_{i_1}, a_{i_2}, ..., a_{i_k}}. Define E_j as the event that a_{i_j} is not in S_q. We have Pr(E_1) = (k−1)l/n = 1 − 1/k, and for any i : 1 < i ≤ k, we get

Pr(E_i | ⋂_{j=1}^{i−1} E_j) = [(k−1)l − (i−1)] / [n − (i−1)] ≤ (k−1)l/n = 1 − 1/k,

where the last inequality follows from a simple mathematical fact: (x−c)/(y−c) ≤ x/y if c ≥ 0 and x ≤ y. Now we conclude that the probability of the event S_q ∩ R = ∅ is

Pr(⋂_{i=1}^{k} E_i) = Pr(E_1) · Pr(E_2 | E_1) ··· Pr(E_k | ⋂_{j=1}^{k−1} E_j) ≤ (1 − 1/k)^k ≤ 1/e.

Thus each of the sets S_1, S_2, ..., S_k fails to intersect R with probability at most 1/e. Hence, the expected number of such sets is at most k/e. Therefore, the expected value of |X| = |R′| is at least k(1 − 1/e).
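This bound is easy to check by simulation (illustrative code, not from the thesis): place the k optimal items uniformly at random among the n = kl positions and count the segments hit by at least one of them.

```python
import random
import math

random.seed(2)
k, l = 10, 7
n = k * l
trials = 5000
total = 0
for _ in range(trials):
    positions = random.sample(range(n), k)   # positions of the k optimal items
    segments = {p // l for p in positions}   # segments hit by at least one item
    total += len(segments)
mean_nonempty = total / trials
# empirical mean vs. the k(1 - 1/e) lower bound from Lemma 4.2.2
print(mean_nonempty, k * (1 - 1 / math.e))
```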
The next lemma formalizes an intuitive statement: if you randomly sample elements of the set R, you expect to obtain a profit proportional to the size of your sample. An analog of this is proved in [31] for the case when |R|/|A| is an integer.
Lemma 4.2.3. For a random subset A of R, the expected value of f(A) is at least (|A|/k) · f(R).
Proof. Let (x_1, x_2, ..., x_k) be a random ordering of the elements of R. For r = 1, 2, ..., k, let F_r be the expectation of f({x_1, ..., x_r}), and define D_r := F_r − F_{r−1}, where F_0 is interpreted to be equal to zero. Letting a := |A|, note that f(R) = F_k = D_1 + ··· + D_k, and that the expectation of f(A) is equal to F_a = D_1 + ··· + D_a. We claim that D_1 ≥ D_2 ≥ ··· ≥ D_k, from which the lemma follows easily. Let (y_1, y_2, ..., y_k) be a cyclic shift of (x_1, x_2, ..., x_k), where y_1 = x_k, y_2 = x_1, y_3 = x_2, ..., y_k = x_{k−1}. Notice that for i < k, F_i is equal to the expectation of f({y_2, ..., y_{i+1}}) since {y_2, ..., y_{i+1}} is equal to {x_1, ..., x_i}. F_i is also equal to the expectation of f({y_1, ..., y_i}), since the sequence (y_1, ..., y_i) has the same distribution as (x_1, ..., x_i). Thus, D_{i+1} is the expectation of f({y_1, ..., y_{i+1}}) − f({y_2, ..., y_{i+1}}), whereas D_i is the expectation of f({y_1, ..., y_i}) − f({y_2, ..., y_i}). The submodularity of f implies that f({y_1, ..., y_{i+1}}) − f({y_2, ..., y_{i+1}}) ≤ f({y_1, ..., y_i}) − f({y_2, ..., y_i}), hence D_{i+1} ≤ D_i.
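A Monte Carlo check of the lemma on a coverage function (illustrative code, not part of the thesis; the small slack only absorbs sampling noise, since the inequality holds exactly in expectation):

```python
import random

random.seed(1)
ground = [frozenset(random.sample(range(50), 8)) for _ in range(12)]

def f(S):
    return len(set().union(*S)) if S else 0

R = ground           # pretend this is the optimal set, so k = |R|
k = len(R)
fR = f(set(R))
for a in range(1, k + 1):
    # empirical E[f(A)] over random a-subsets of R
    est = sum(f(set(random.sample(R, a))) for _ in range(2000)) / 2000
    assert est >= (a / k) * fR - 1.0  # slack for sampling noise
print("E[f(A)] >= (|A|/k) f(R) holds on this instance (up to noise)")
```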
Here comes the crux of our analysis where we prove that the local optimization
steps (i.e., trying to make the best move in each segment) indeed lead to a globally
approximate solution.
Lemma 4.2.4. The expected value of f(T_k) is at least (|R′|/7k) · f(R).
Proof. Define m := |R′| for ease of reference. Recall that R′ is a set of secretaries {s_1, s_2, ..., s_m} such that s_i ∈ S_{h_i} ∩ R for i : 1 ≤ i ≤ m and h_i : 1 ≤ h_i ≤ k. Also assume without loss of generality that h_{i′} ≤ h_i for 1 ≤ i′ < i ≤ m; for instance, s_1 is the first element of R′ to appear. Define ∆_j for each j : 1 ≤ j ≤ k as the gain of our algorithm while working on the segment S_j. It is formally defined as ∆_j := f(T_j) − f(T_{j−1}). Note that due to the first if statement in the algorithm, ∆_j ≥ 0 and thus E[∆_j] ≥ 0. With probability 1/e, we choose the element in S_j which maximizes the value of f(T_j) (given that the set T_{j−1} is fixed). Notice that by definition of R′ only one s_i appears in S_{h_i}. Since s_i ∈ S_{h_i} is one of the options,

E[∆_{h_i}] ≥ E[f(T_{h_i−1} ∪ {s_i}) − f(T_{h_i−1})] / e.    (4.1)

To prove by contradiction, suppose E[f(T_k)] < (m/7k) · f(R). Since f is monotone, E[f(T_j)] < (m/7k) · f(R) for any 0 ≤ j ≤ k. Define B := {s_i, s_{i+1}, ..., s_m}. By Lemma 4.2.1 and monotonicity of f,

f(B) ≤ f(B ∪ T_{h_i−1}) ≤ f(T_{h_i−1}) + Σ_{j=i}^{m} [f(T_{h_i−1} ∪ {s_j}) − f(T_{h_i−1})],

which implies

E[f(B)] ≤ E[f(T_{h_i−1})] + Σ_{j=i}^{m} E[f(T_{h_i−1} ∪ {s_j}) − f(T_{h_i−1})].

Since the items in B are distributed uniformly at random, and there is no difference between s_{i_1} and s_{i_2} for i ≤ i_1, i_2 ≤ m, we can say

E[f(B)] ≤ E[f(T_{h_i−1})] + (m − i + 1) · E[f(T_{h_i−1} ∪ {s_i}) − f(T_{h_i−1})].    (4.2)

We conclude from (4.1) and (4.2) that

E[∆_{h_i}] ≥ E[f(T_{h_i−1} ∪ {s_i}) − f(T_{h_i−1})] / e ≥ (E[f(B)] − E[f(T_{h_i−1})]) / (e(m − i + 1)).

Since B is a random sample of R, we can apply Lemma 4.2.3 to get E[f(B)] ≥ (|B|/k) · f(R) = f(R)(m − i + 1)/k. Since E[f(T_{h_i−1})] ≤ (m/7k) · f(R), we reach

E[∆_{h_i}] ≥ (E[f(B)] − E[f(T_{h_i−1})]) / (e(m − i + 1)) ≥ f(R)/(ek) − (m/7k) · f(R) · 1/(e(m − i + 1)).    (4.3)

Adding up (4.3) for i : 1 ≤ i ≤ ⌈m/2⌉, we obtain

Σ_{i=1}^{⌈m/2⌉} E[∆_{h_i}] ≥ ⌈m/2⌉ · f(R)/(ek) − (m/(7ek)) · f(R) · Σ_{i=1}^{⌈m/2⌉} 1/(m − i + 1).

Since Σ_{j=a}^{b} 1/j ≤ ln(b/(a − 1)) for any integer values a, b : 1 < a ≤ b, we conclude

Σ_{i=1}^{⌈m/2⌉} E[∆_{h_i}] ≥ ⌈m/2⌉ · f(R)/(ek) − (m/(7ek)) · f(R) · ln(m/⌊m/2⌋).

A similar argument for the range 1 ≤ i ≤ ⌊m/2⌋ gives

Σ_{i=1}^{⌊m/2⌋} E[∆_{h_i}] ≥ ⌊m/2⌋ · f(R)/(ek) − (m/(7ek)) · f(R) · ln(m/⌈m/2⌉).

We also know that both Σ_{i=1}^{⌊m/2⌋} E[∆_{h_i}] and Σ_{i=1}^{⌈m/2⌉} E[∆_{h_i}] are at most E[f(T_k)] because f(T_k) ≥ Σ_{i=1}^{m} ∆_{h_i}. We conclude that

2E[f(T_k)] ≥ ⌈m/2⌉ · f(R)/(ek) − (m f(R)/(7ek)) · ln(m/⌊m/2⌋) + ⌊m/2⌋ · f(R)/(ek) − (m f(R)/(7ek)) · ln(m/⌈m/2⌉)
           ≥ m f(R)/(ek) − (m f(R)/(7ek)) · ln(m² / (⌊m/2⌋⌈m/2⌉))
           ≥ m f(R)/(ek) − (m f(R)/(7ek)) · ln 4.5        (since m²/(⌊m/2⌋⌈m/2⌉) < 4.5)
           = (m f(R)/k) · (1/e − ln 4.5 / (7e)) ≥ (m f(R)/k) · (2/7),

which contradicts E[f(T_k)] < (m/7k) · f(R), hence proving the supposition false.
The following theorem wraps up the analysis of the algorithm.
Theorem 4.2.5. The expected value of the output of our algorithm is at least ((1 − 1/e)/7) · f(R).
Proof. From Lemma 4.2.2, the expected value of |R′| = m is at least (1 − 1/e)k. In other words, we have Σ_{m=1}^{k} Pr[|R′| = m] · m ≥ (1 − 1/e)k. We know from Lemma 4.2.4 that if the size of R′ is m, the expected value of f(T_k) is at least (m/7k) · f(R), implying that Σ_{v∈V} Pr[f(T_k) = v | |R′| = m] · v ≥ (m/7k) · f(R), where V denotes the set of different values that f(T_k) can take. We also know that

E[f(T_k)] = Σ_{m=1}^{k} E[f(T_k) | |R′| = m] · Pr[|R′| = m] ≥ Σ_{m=1}^{k} (m/7k) · f(R) · Pr[|R′| = m] = (f(R)/7k) · E[|R′|] ≥ ((1 − 1/e)/7) · f(R).
Non-monotone Submodular
Before starting the analysis of Algorithm 2 for non-monotone functions, we show an
interesting property of Algorithm 1. Consistently with the notation of Section 4.2.2,
we use R to refer to some optimal solution. Recall that we partition the input stream into (almost) equal-sized segments S_i : 1 ≤ i ≤ k, and pick one item from each. Then T_i denotes the set of items we have picked at the completion of segment i. We show that f(T_k) ≥ (1/(2e²)) · f(R ∪ T_i) for some integer i, even when f is not monotone. Roughly speaking, the proof mainly follows from the submodularity property and Lemma 4.2.1.

Lemma 4.2.6. If we run the monotone algorithm on a (possibly non-monotone) submodular function f, we obtain f(T_k) ≥ (1/(2e²)) · f(R ∪ T_i) for some i.
Proof. Consider the stage i + 1 in which we want to pick an item from S_{i+1}. Lemma 4.2.1 implies

f(R ∪ T_i) ≤ f(T_i) + Σ_{a∈R\T_i} [f(T_i ∪ {a}) − f(T_i)].

At least one of the two terms on the right-hand side has to be at least f(R ∪ T_i)/2. If this happens to be the first term for some i, we are done: f(T_k) ≥ f(T_i) ≥ (1/2) · f(R ∪ T_i), since f(T_k) ≥ f(T_i) by the definition of the algorithm: the first if statement makes sure the values f(T_i) are non-decreasing. Otherwise, assume that the lower bound holds for the second term for all values of i.

Consider the events that among the elements in R \ T_i exactly one, say a, falls in S_{i+1}. Call this event E_a. Conditioned on E_a, ∆_{i+1} := f(T_{i+1}) − f(T_i) is at least f(T_i ∪ {a}) − f(T_i) with probability 1/e: i.e., if the algorithm picks the best secretary in this interval. Each event E_a occurs with probability at least (1/k) · (1/e). Since these events are disjoint, we have

E[∆_{i+1}] ≥ Σ_{a∈R\T_i} Pr[E_a] · (1/e) · [f(T_i ∪ {a}) − f(T_i)] ≥ (1/(e²k)) · Σ_{a∈R\T_i} [f(T_i ∪ {a}) − f(T_i)] ≥ (1/(2e²k)) · f(R ∪ T_i),

and by summing over all values of i, we obtain

E[f(T_k)] = Σ_i E[∆_i] ≥ Σ_i (1/(2e²k)) · f(R ∪ T_i) ≥ (1/(2e²)) · min_i f(R ∪ T_i).
Unlike the case of monotone functions, we cannot say that f(R∪Ti) ≥ f(R), and
conclude that our algorithm is constant-competitive. Instead, we need to use other
techniques to cover the cases that f(R ∪ Ti) < f(R). The following lemma presents
an upper bound on the value of the optimum.
Lemma 4.2.7. For any pair of disjoint sets Z and Z′, and a nonnegative submodular function f, we have f(R) ≤ f(R ∪ Z) + f(R ∪ Z′).

Proof. The statement follows from the submodularity property, observing that (R ∪ Z) ∩ (R ∪ Z′) = R and f([R ∪ Z] ∪ [R ∪ Z′]) ≥ 0.
We are now in a position to prove the performance guarantee of our main algorithm.
Theorem 4.2.8. Algorithm 2 has competitive ratio 8e².

Proof. Let the outputs of the two runs of Algorithm 1 (on U_1 and on U_2) be the sets Z and Z′, respectively. The expected value of the solution is thus [f(Z) + f(Z′)]/2.

We know that E[f(Z)] ≥ c′ · f(R ∪ X_1) for some constant c′ and some X_1 ⊆ U_1. The only difference from the proof of Lemma 4.2.6 is that each element of R \ Z appears in the set S_i with probability 1/2k instead of 1/k, but we can still prove the lemma for c′ := 1/(4e²). The same holds for Z′: E[f(Z′)] ≥ (1/(4e²)) · f(R ∪ X_2) for some X_2 ⊆ U_2.

Since U_1 and U_2 are disjoint, so are X_1 and X_2. Hence, the expected value of our solution is at least (1/(4e²)) · [f(R ∪ X_1) + f(R ∪ X_2)]/2, which via Lemma 4.2.7 is at least (1/(8e²)) · f(R).
4.3 The Submodular Matroid Secretary Problem
In this section, we prove Theorem 4.1.2. We first design an O(log² r)-competitive
algorithm for maximizing a monotone submodular function, when there are matroid
constraints for the set of selected items. Here we are allowed to choose a subset of
items only if it is an independent set in the given matroid.
The matroid (U, I) is given by oracle access to I. Let n denote the number of items, i.e., n := |U|, and let r denote the rank of the matroid. Let S ∈ I denote an optimal solution that maximizes the function f. We focus our analysis on a refined set S∗ ⊆ S that has two nice properties: 1) f(S∗) ≥ f(S)/4, and 2) f(T) ≥ f(S∗)/log r for any T ⊆ S∗ such that |T| = ⌊|S∗|/2⌋. We cannot necessarily find S∗, but we prove that such a set exists.

Start by letting S∗ = S. As long as there is a set T violating the second property above, remove T from S∗, and continue. The second property clearly holds at the termination of the procedure. In order to prove the first property, consider one iteration. By submodularity (subadditivity, to be more precise) we have f(S∗ \ T) ≥ f(S∗) − f(T) ≥ (1 − 1/log r) · f(S∗). Since each iteration halves the set S∗, there are at most log r iterations. Therefore, f(S∗) ≥ (1 − 1/log r)^{log r} · f(S) ≥ f(S)/4 (for log r ≥ 2).
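The existence argument above can be phrased as a procedure. The sketch below is illustrative only (not the thesis's code): it enumerates half-sized subsets by brute force, so it is exponential in |S| and usable only on tiny examples.

```python
import math
from itertools import combinations

def refine(S, f):
    """Repeatedly remove a half-sized subset of value below
    f(S_star)/log r, if one exists (property 2 of S*)."""
    log_r = max(math.log2(len(S)), 1)
    S_star = set(S)
    while True:
        half = len(S_star) // 2
        if half == 0:
            return S_star
        bad = next((set(T) for T in combinations(S_star, half)
                    if f(set(T)) < f(S_star) / log_r), None)
        if bad is None:
            return S_star  # property 2 now holds for S_star
        S_star -= bad      # each removal halves S_star (up to rounding)

ground = [frozenset({0, 1, 2}), frozenset({2, 3}), frozenset({3, 4, 5}),
          frozenset({5, 6}), frozenset({0, 6, 7}), frozenset({1, 4, 7})]

def cover(S):
    return len(set().union(*S)) if S else 0

S_star = refine(ground, cover)
print(len(S_star), cover(S_star))
```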
We analyze the algorithm assuming the parameter |S∗| is given, and achieve a
competitive ratio O(log r). If |S∗| is unknown, though, we can guess its value (from a
pool of log r different choices) and continue with Lemma 4.3.1. This gives an O(log² r) competitive ratio.
Lemma 4.3.1. Given |S∗|, Algorithm 3 picks an independent subset of items of size at most |S∗|/2 whose expected value is at least f(S∗)/(4e log r).
Proof. Let k := |S∗|. We divide the input stream of n items into k segments of
(almost) equal size. We only pick k/2 items, one from each of the first k/2 segments.
Similarly to Algorithm 1 for the submodular secretary problem, when we work
on each segment, we try to pick an item that maximizes the marginal value of the
function given the previous selection is fixed (see the for loop in Algorithm 1). We
show that the expected gain in each of the first k/2 segments is at least a constant fraction of f(S∗)/(k log r).
Suppose we are working on segment i ≤ k/2, and let Z be the set of items already
picked; so |Z| ≤ i−1. Furthermore, assume f(Z) ≤ f(S∗)/2 log r since otherwise, the
Algorithm 3 Monotone Submodular Secretary Algorithm with Matroid Constraint
Input: A monotone submodular function f : 2^U → ℝ, a matroid (U, I), and a randomly permuted stream of secretaries, denoted by (a_1, a_2, ..., a_n).
Output: A subset of secretaries that is independent according to I.

  Let U_1 := {a_1, a_2, ..., a_{⌊n/2⌋}}
  Pick the parameter k := |S∗| uniformly at random from the pool {2^0, 2^1, ..., 2^{log r}}
  if k = O(log r) then
    Select the best item of U_1 and output the singleton
  else    {run Algorithm 1 on U_1 and respect the matroid}
    Let T_0 ← ∅
    Let l ← ⌊n/k⌋
    for i ← 1 to k do    {phase i}
      Let u_i ← (i − 1)l + l/e
      Let α_i ← max_{(i−1)l ≤ j < u_i, T_{i−1} ∪ {a_j} ∈ I} f(T_{i−1} ∪ {a_j})
      if α_i < f(T_{i−1}) then
        α_i ← f(T_{i−1})
      end if
      Pick an index p_i : u_i ≤ p_i < il such that f(T_{i−1} ∪ {a_{p_i}}) ≥ α_i and T_{i−1} ∪ {a_{p_i}} ∈ I
      if such an index p_i exists then
        Let T_i ← T_{i−1} ∪ {a_{p_i}}
      else
        Let T_i ← T_{i−1}
      end if
    end for
    Output T_k as the solution
  end if
lemma is already proved. By matroid properties we know there is a set T ⊆ S∗ \ Z of size ⌊k/2⌋ such that T ∪ Z ∈ I. The second property of S∗ gives f(T) ≥ f(S∗)/log r. From Lemma 4.2.1 and monotonicity of f, we obtain

Σ_{s∈T} [f(Z ∪ {s}) − f(Z)] ≥ f(T ∪ Z) − f(Z) ≥ f(T) − f(Z) ≥ f(S∗)/(2 log r).

Note that each item in T appears in this segment with probability 2/k because we divided the input stream into k/2 equal segments. Since in each segment we pick the item giving the maximum marginal value with probability 1/e, the expected gain in this segment is at least

Σ_{s∈T} (1/e) · (2/k) · [f(Z ∪ {s}) − f(Z)] ≥ f(S∗)/(ek log r).

We have this for each of the first k/2 segments, so the expected value of our solution is at least f(S∗)/(2e log r).
Finally, it is straightforward (and hence the details are omitted) to combine the algorithm in this section with Algorithm 2 for the non-monotone submodular secretary problem, to obtain an O(log² r)-competitive algorithm for the non-monotone submodular secretary problem subject to a matroid constraint.

Here we show that the same algorithm works when there are l ≥ 1 matroid constraints and achieves a competitive ratio of O(l log² r). We just need to respect all matroid constraints in Algorithm 3. This finishes the proof of Theorem 4.1.2.
Lemma 4.3.2. Given |S∗|, Algorithm 3 picks a subset of items that is independent with respect to all matroids and has expected value at least f(S∗)/(4el log r).
Proof. The proof is similar to the proof of Lemma 4.3.1. We show that the expected gain in each of the first k/2l segments is at least a constant fraction of f(S∗)/(k log r).

Suppose we are working on segment i ≤ k/2l, and let Z be the set of items already picked; so |Z| ≤ i − 1. Furthermore, assume f(Z) ≤ f(S∗)/(2 log r) since otherwise, the lemma is already proved. We claim that there is a set T ⊆ S∗ \ Z of size k − l·⌊k/2l⌋ ≥ k/2 such that T ∪ Z is an independent set in all matroids. The proof is as follows. We know that there exists a set T_1 ⊆ S∗ whose union with Z is an independent set of the first matroid, and the size of T_1 is at least |S∗| − |Z|. This can be proved by the exchange property of matroids, i.e., adding Z to the independent set S∗ does not remove more than |Z| items from S∗. Since T_1 is independent with respect to the second matroid (as it is a subset of S∗), we can prove that there exists a set T_2 ⊆ T_1 of size at least |T_1| − |Z| such that Z ∪ T_2 is an independent set in the second matroid. If we continue this process for all matroid constraints, we can prove that there is a set T_l which is an independent set in all matroids and has size at least |S∗| − l|Z| ≥ k − l·⌊k/2l⌋ ≥ k/2, such that Z ∪ T_l is independent with respect to all the given matroids. The rest of the proof is similar to the proof of Lemma 4.3.1; we just need to use the set T_l instead of the set T.
Since we are gaining a constant times f(S∗)/(k log r) in each of the first k/2l segments, the expected value of the final solution is at least a constant times f(S∗)/(l log r).
4.4 Knapsack Constraints
In this section, we prove Theorem 4.1.3. We first outline how to reduce an instance
with multiple knapsacks to an instance with only one knapsack, and then we show
how to solve the single knapsack instance.
Without loss of generality, we can assume that all knapsack capacities are equal to one. Let I be the given instance with the value function f and item weights w_ij for 1 ≤ i ≤ l and 1 ≤ j ≤ n. Define a new instance I′ with one knapsack of capacity one in which the weight of item j is w′_j := max_i w_ij. We first prove that this reduction loses no more than a factor 4l in the total value. Take note that both the scaling and the weight transformation can be carried out in an online manner as the items arrive. Hence, the results of this section hold for the online as well as the offline setting.
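A sketch of this transformation in Python (illustrative only; the function and variable names are ours, and capacities are scaled to 1 as in the text):

```python
def reduce_to_single_knapsack(weights, capacities):
    """Reduce l knapsacks to one: scale each knapsack to capacity 1,
    then give item j the single weight w'_j = max_i (w_ij / C_i).
    weights[i][j] is the weight of item j in knapsack i."""
    l, n = len(weights), len(weights[0])
    return [max(weights[i][j] / capacities[i] for i in range(l))
            for j in range(n)]

def feasible_single(w_prime, T):
    # feasibility in the reduced single-knapsack instance I'
    return sum(w_prime[j] for j in T) <= 1.0

# any set feasible for the single knapsack is feasible for all l originals
weights = [[0.3, 0.5, 0.2, 0.4],
           [0.1, 0.6, 0.3, 0.2]]
capacities = [1.0, 1.0]
w_prime = reduce_to_single_knapsack(weights, capacities)
T = [0, 2]  # w'_0 = 0.3, w'_2 = 0.3, total 0.6 <= 1
print(w_prime, feasible_single(w_prime, T))
```

This direction of the reduction only shows feasibility is preserved; the factor-4l value loss is what Lemma 4.4.1 proves.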
Lemma 4.4.1. With instance I′ defined above, we have (1/4l) · OPT(I) ≤ OPT(I′) ≤ OPT(I).
Proof. The latter inequality is very simple: Take the optimal solution to I ′. This is
also feasible in I since all the item weights in I are bounded by the weight in I ′.
We next prove the other inequality. Let T be the optimal solution of I. An item
j is called fat if w′j ≥ 1/2. Notice that there can be at most 2l fat items in T since∑j∈T w
′j ≤
∑j∈T∑
iwij ≤ l. If there is any fat item with value at least OPT(I)/4l,
the statement of the lemma follows immediately, so we assume this is not the case.
The total value of the fat items, say F , is at most OPT(I)/2. Submodularity and
non-negativity of f gives f(T \F ) ≥ f(T )−f(F ) ≥ OPT(I)/2. Sort the non-fat items
in decreasing order of their value density (i.e., ratio of value to weight), and let T ′ be
a maximal prefix of this ordering that is feasible with respect to I ′. If T ′ = T \ F ,
we are done; otherwise, T ′ has weight at least 1/2. Let x be the total weight of
items in T′, and let y denote the total weight of items in T \ (F ∪ T′). Let αx and
αy denote the value densities of the two corresponding subsets of items, respectively.
Clearly x + y ≤ l and αx ≥ αy. Thus, f(T \ F) = αx·x + αy·y ≤ αx(x + y) ≤ αx·l.
Now f(T′) ≥ αx·(1/2) ≥ (1/2l)·f(T \ F) ≥ (1/4l)·f(T), which finishes the proof.
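The reduction and Lemma 4.4.1 can be checked mechanically on small instances. The sketch below brute-forces both optima on a tiny made-up instance, using an additive value function for simplicity (the lemma itself holds for any non-negative submodular f); all numeric data is illustrative.

```python
from itertools import combinations

def opt_value(values, feasible, n):
    """Brute-force the maximum total value over feasible subsets (tiny n only)."""
    best = 0.0
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            if feasible(subset):
                best = max(best, sum(values[j] for j in subset))
    return best

def reduce_to_one_knapsack(weights):
    """weights[i][j] = weight of item j in knapsack i (capacities scaled to 1);
    return the single-knapsack weights w'_j = max_i w_ij."""
    n = len(weights[0])
    return [max(row[j] for row in weights) for j in range(n)]

# made-up instance: l = 2 knapsacks, n = 5 items, additive value function
values = [0.5, 0.8, 0.3, 0.9, 0.4]
weights = [[0.4, 0.5, 0.2, 0.6, 0.3],   # knapsack 1
           [0.3, 0.7, 0.4, 0.2, 0.5]]   # knapsack 2
l, n = len(weights), len(values)

w_single = reduce_to_one_knapsack(weights)
feas_multi = lambda S: all(sum(row[j] for j in S) <= 1 for row in weights)
feas_single = lambda S: sum(w_single[j] for j in S) <= 1

opt_I = opt_value(values, feas_multi, n)
opt_I_prime = opt_value(values, feas_single, n)
assert opt_I / (4 * l) <= opt_I_prime <= opt_I   # the sandwich of Lemma 4.4.1
```

The upper bound OPT(I′) ≤ OPT(I) holds on every instance because single-knapsack feasibility is the stricter constraint.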
Here we show how to achieve a constant competitive ratio when there is only one
knapsack constraint. Let wj denote the weight of item j : 1 ≤ j ≤ n, and assume
without loss of generality that the capacity of the knapsack is 1. Moreover, let f
be the value function which is a non-monotone submodular function. Let T be the
optimal solution, and define OPT := f(T ). The value of the parameter λ ≥ 1 will be
fixed below. Define T1 and T2 as the subsets of T that appear in the first and second
half of the input stream, respectively. We first show that the optimal solution is
broken into two balanced portions.
Lemma 4.4.2. If the value of each item is at most OPT/λ for sufficiently large λ,
then the random variable |f(T1) − f(T2)| is bounded by OPT/2 with constant probability.
Proof. Each item of T goes to either T1 or T2 with probability 1/2. Let the random
variable X1_j denote the increase of the value of f(T1) due to the possible addition
of item j; define X2_j similarly for the effect on f(T2). The two variables
X1_j and X2_j have the same probability distribution, and because of submodularity
and the fact that the value of item j is at most OPT/λ, the contribution of item j
to f(T1) − f(T2) can be seen as a random variable that always takes values in the range
[−OPT/λ, OPT/λ] with mean zero. (In fact, we also use the fact that in an optimal
solution, the marginal value of any item is non-negative. Submodularity guarantees
that this holds with respect to any of the subsets of T as well.) Azuma's inequality
ensures that with constant probability the value of |f(T1) − f(T2)| is not more than
max{f(T1), f(T2)}/2 for sufficiently large λ. Since f(T1) + f(T2) ≥ f(T) = OPT by
submodularity and non-negativity, both f(T1) and f(T2) are at least OPT/4, with
constant probability.
The algorithm is as follows. Without loss of generality assume that all items are
feasible, i.e., any one item fits into the knapsack. We flip a coin, and if it turns up
“heads,” we simply try to pick the one item with the maximum value. We do the
following if the coin turns up “tails.” We do not pick any item from the first half of
the stream. Instead, we compute the maximum-value set in the first half with respect
to the knapsack constraint; Lee et al. [61] give a constant-factor approximation for
this task. From the above argument, we know that f(T1) is at least OPT/4, since all
the items have limited value in this case (i.e., at most OPT/λ). Therefore, we obtain
a constant-factor estimate of OPT by looking at the first half of the stream: i.e.,
if the estimate is ÔPT, we get OPT/c ≤ ÔPT ≤ OPT for a constant c. After obtaining
this estimate, we go over the second half of the input, and pick an item j if and
only if it is feasible to do so and, moreover, the ratio of its marginal value to wj
is at least ÔPT/6.
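This two-branch algorithm can be sketched in one pass. The sketch below uses an additive value function (so marginal values are just item values) and a simple max(best item, greedy by density) routine as a stand-in for the constant-factor offline algorithm of Lee et al.; the threshold constant 6 is taken from the text, but the instance data and helper names are illustrative, not the thesis's exact procedure.

```python
import math
import random

def offline_knapsack_approx(items):
    """Stand-in constant-factor offline knapsack (capacity 1, additive values,
    positive weights): max(best single item, greedy fill by value density)
    is a 1/2-approximation."""
    best_single = max((v for v, w in items if w <= 1), default=0.0)
    total, room = 0.0, 1.0
    for v, w in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if w <= room:
            total, room = total + v, room - w
    return max(best_single, total)

def knapsack_secretary(stream, rng=random):
    """One-pass sketch: with prob. 1/2 run the classical secretary rule for the
    single most valuable item; otherwise observe the first half, estimate OPT,
    then apply the density threshold OPT_hat / 6."""
    n = len(stream)
    if rng.random() < 0.5:                       # "heads"
        observe = max(1, int(n / math.e))
        threshold = max(v for v, _ in stream[:observe])
        for v, w in stream[observe:]:
            if v > threshold and w <= 1:
                return [(v, w)]
        return []
    opt_hat = offline_knapsack_approx(stream[:n // 2])   # "tails"
    picked, room = [], 1.0
    for v, w in stream[n // 2:]:
        if w <= room and v / w >= opt_hat / 6:
            picked.append((v, w))
            room -= w
    return picked
```

Any set returned fits in the knapsack by construction; the guarantee of Lemma 4.4.3 below (for the thesis's exact algorithm) concerns the expected value of the picked set.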
Lemma 4.4.3. The above algorithm is a constant competitive algorithm for the non-
monotone submodular secretary problem with one knapsack constraint.
Proof. We give the proof for the monotone case. Extending it to the non-monotone
case requires the same idea as was used in the proof of Theorem 2. First suppose there
is an item with value at least OPT/λ. With probability 1/2, we try to pick the single
best item, and we succeed with probability 1/e. Thus, we get an O(1) competitive ratio
in this case.

In the other case, all the items have small contributions to the solution, i.e., less
than OPT/λ. In this case, with constant probability, both f(T1) and f(T2) are at
least OPT/4. Hence, ÔPT is a constant-factor estimate of OPT. Let T′ be the set of items
picked by the algorithm in this case. If the sum of the weights of the items in T′ is at
least 1/2, we are done, because all these items have (marginal) value density at least
ÔPT/6, so f(T′) ≥ (1/2)·(ÔPT/6) = ÔPT/12 ≥ OPT/48.
Otherwise, the total weight of T′ is less than 1/2. Therefore, there are items in
T2 that are not picked, and there can be two reasons for this. The first is that there
was not enough room in the knapsack; since the total weight of T′ is less than 1/2,
such an item must have weight more than 1/2. However, there cannot be more than one
such item in T2, and the value of this item is not more than OPT/λ. Let z be this
single big item, for future reference. Therefore, f(T′) ≥ f(T2) − OPT/λ in this case.
The second reason is that the density ratios of some items from T2 are less than
ÔPT/6, and thus we do not pick them. Since they are all in T2, their total weight is
at most 1, so, because of submodularity, the total loss due to these missed items is
at most ÔPT/6. Submodularity and non-negativity of f then give
f(T′) ≥ f(T2) − f(z) − ÔPT/6 ≥ OPT/4 − OPT/λ − OPT/6 = Ω(OPT).
4.5 The Subadditive Secretary Problem
In this section, we prove Theorem 4.1.4 by first presenting a hardness result for
approximating subadditive functions in general. The result applies in particular to
our online setting. Surprisingly, the monotone subadditive function that we use here
is almost submodular; see Proposition 4.5.4 below. Hence, our constant competitive
ratio for submodular functions is nearly the most general result we can achieve.
Definition 4.5.1 (Subadditive function maximization). Given a nonnegative subad-
ditive function f on a ground set U , and a positive integer k ≤ |U |, the goal is to
find a subset S of U of size at most k so as to maximize f(S). The function f is
accessible through a value oracle.
4.5.1 Hardness Result
In the following discussion, we assume that there is an upper bound of m on the size
of sets given to the oracle. We believe this restriction can be lifted. If the function
f is not required to be monotone, this is easy to achieve: simply let the value of
the function f be zero for queries of size larger than m. Furthermore, depending on
how we define the online setting, this may not be an additional restriction here. For
example, we may not be able to query the oracle with secretaries that have already
been rejected.
The main result of this section is the following theorem. It shows that subadditive
function maximization is difficult to approximate, even in the offline setting.

Theorem 4.5.2. There is no polynomial-time algorithm that approximates an instance
of subadditive function maximization to within a factor O(√n) of the optimum.
Furthermore, no algorithm with exponential running time 2^t can achieve an
approximation ratio better than O(√(n/t)).
First, we define our hard function. Afterwards, we prove certain properties of the
function, which finally lead to the proof of Theorem 4.5.2.
Let n denote the size of the universe, i.e., n := |U |. Pick a random subset S∗ ⊆ U
by sampling each element of U with probability k/n. Thus, the expected size of S∗
is k.
Define the function g: 2^U → ℕ as g(S) := |S ∩ S∗| for any S ⊆ U. One can easily
verify that g is submodular. Let r be a positive parameter whose value will be fixed
below. Define the final function f: 2^U → ℕ as

f(S) := 1 if g(S) = 0, and f(S) := ⌈g(S)/r⌉ otherwise.
It is not difficult to verify the subadditivity of f ; it is also clearly monotone.
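As a sketch, the hard instance can be packaged as a value oracle; the parameters below are illustrative rather than those fixed later for Lemma 4.5.3, and the construction follows the definitions above.

```python
import math
import random

def make_hard_oracle(n, k, r, seed=0):
    """Value oracle for the hard function of this section: sample S* by keeping
    each element with probability k/n, let g(S) = |S ∩ S*|, and set
    f(S) = 1 if g(S) = 0 and f(S) = ceil(g(S)/r) otherwise."""
    rng = random.Random(seed)
    s_star = {e for e in range(n) if rng.random() < k / n}
    def f(S):
        g = len(set(S) & s_star)
        return 1 if g == 0 else math.ceil(g / r)
    return f, s_star

n, k, r = 100, 10, 4.0   # illustrative parameters only
f, s_star = make_hard_oracle(n, k, r)
A, B = set(range(0, 60)), set(range(40, 100))
assert f(A) <= f(A | B)                  # monotone
assert f(A | B) <= f(A) + f(B)           # subadditive
```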
In order to prove the core of the hardness result in Lemma 4.5.3, we now let
r := λ·mk/n, where λ ≥ 1 + √(3tn/(mk)) and t = Ω(log n) will be determined later.
Lemma 4.5.3. An algorithm making at most 2^t queries to the value oracle cannot
solve the subadditive maximization problem to within a k/r approximation factor.
Proof. Note that for any X ⊆ U, f(X) lies between 0 and ⌈k/r⌉. In fact, the optimal
solution is the set S∗, whose value is at least k/r. We prove that with high probability
the answer to all the queries of the algorithm is one. This implies that the algorithm
cannot achieve an approximation ratio better than k/r.

Assume that Xi is the i-th query of the algorithm for 1 ≤ i ≤ 2^t. Notice that
Xi can be a function of our answers to the previous queries. Define Ei as the event
that f(Xi) = 1; this is equivalent to g(Xi) ≤ r. We show that with high probability
all the events Ei occur.
For any 1 ≤ i ≤ 2^t, we have

Pr[Ei | ∩j<i Ej] = Pr[∩j≤i Ej] / Pr[∩j<i Ej] ≥ Pr[∩j≤i Ej] ≥ 1 − Σj≤i Pr[Ēj].

Thus, by the union bound, we have Pr[∩i Ei] ≥ 1 − Σi Pr[Ēi]. Next we bound
Pr[Ēi]. Consider a subset X ⊆ U such that |X| ≤ m. Since the elements of S∗ are
picked randomly with probability k/n, the expected value of |X ∩ S∗| is at most mk/n.
A standard application of Chernoff bounds gives

Pr[f(X) ≠ 1] = Pr[g(X) > r] = Pr[|X ∩ S∗| > λ·mk/n] ≤ exp(−(λ − 1)²·mk/n) ≤ exp(−3t) ≤ 2^{−2t}/n,

where the last inequality follows from t ≥ log n. Therefore, the probability of all
the events Ei occurring simultaneously is at least 1 − 1/n.
Now we can prove the main theorem of the section.
Proof of Theorem 4.5.2. We just need to set k = m = √n. Then λ = 1 + √(3t), and the
inapproximability ratio is Ω(√(n/t)). Restricting to polynomial-time algorithms, we
obtain t := O(log^{1+ε} n), and considering exponential algorithms with running time
O(2^{t′}), we have t = O(t′), giving the desired results.
In case the query size is not bounded, we can define f(X) := 0 for large sets X
and carry through the same result; however, the function f is no longer monotone in
this case.
We now show that the function f is almost submodular. Recall that a function g
is submodular if and only if g(A) + g(B) ≥ g(A ∪ B) + g(A ∩ B) for all sets A and B.
Proposition 4.5.4. For the hard function f defined above, f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B) − 2
always holds; moreover, f(X) is always positive and attains a maximum value of Θ(√n)
for the parameters fixed in the proof of Theorem 4.5.2.

Proof. The function h(X) := g(X)/r is clearly submodular, and we have h(X) ≤ f(X) ≤ h(X) + 1.
We obtain f(A) + f(B) ≥ h(A) + h(B) ≥ h(A ∪ B) + h(A ∩ B) ≥ f(A ∪ B) + f(A ∩ B) − 2.
4.5.2 Algorithm
An algorithm that only picks the single best item clearly gives a k-competitive ratio.
We now show how to achieve an O(n/k) competitive ratio; by combining the two, we obtain
an O(√n)-competitive algorithm for the monotone subadditive secretary problem. This
result complements our negative result nicely.
Partition the input stream S into ℓ := n/k (almost) equal-sized segments, each
of size at most k. Randomly pick all the elements of one of these segments. Let the
segments be denoted by S1, S2, . . . , Sℓ. Subadditivity of f implies f(S) ≤ Σi f(Si).
Hence, the expected value of our solution is Σi (1/ℓ)·f(Si) ≥ (1/ℓ)·f(S) ≥ (1/ℓ)·OPT,
where the two inequalities follow from subadditivity and monotonicity, respectively.
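This rule is a few lines of code; a hedged sketch follows. The random segment can be fixed in advance, so online the algorithm just skips everything outside the chosen segment.

```python
import random

def random_segment(stream, k, rng=random):
    """O(n/k)-competitive rule for monotone subadditive f: split the stream into
    segments of size at most k and keep all elements of one uniformly random one."""
    segments = [stream[i:i + k] for i in range(0, len(stream), k)]
    return rng.choice(segments)
```

Combined (by a coin flip) with the rule that picks only the single best item, this yields the O(√n)-competitive algorithm described above.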
4.6 Further Results
In this chapter, we consider the (non-monotone) submodular secretary problem for
which we give a constant-competitive algorithm. The result can be generalized when
92
we have a matroid constraint on the set that we pick; in this case we obtain an
O(log2 r)-competitive algorithm where r is the rank of the matroid. However, we
show that it is very hard to compete with the optimum if we consider subadditive
functions instead of submodular functions. This hardness holds even for “almost
submodular” functions; see Proposition 4.5.4.
One may consider special non-submodular functions that enjoy certain structural
properties in order to obtain better guarantees. For example, let f(T) be the minimum
individual value in T, which models a bottleneck situation in the secretary problem:
selecting a group of k secretaries to work together, where the speed (efficiency) of
the group is limited to that of the slowest person in the group (note that, unlike the
submodular case, here the condition of employing exactly k secretaries is enforced).
In this case, we present a simple O(k) competitive ratio for the problem as follows.
Interview the first 1/k fraction of the secretaries without employing anyone. Let α be
the highest efficiency among those interviewed. Employ the first k secretaries whose
efficiency surpasses α.
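A minimal sketch of this O(k)-competitive rule, assuming distinct efficiencies and rounding the observation window to ⌊n/k⌋:

```python
def hire_bottleneck(stream, k):
    """Observe the first n/k applicants without hiring; let alpha be the best
    efficiency seen; then hire the first k later applicants who beat alpha."""
    n = len(stream)
    observe = max(1, n // k)
    alpha = max(stream[:observe])
    hired = []
    for x in stream[observe:]:
        if x > alpha and len(hired) < k:
            hired.append(x)
    return hired
```

Theorem 4.6.1 below shows that this rule hires exactly the k best secretaries with probability at least 1/(e²k).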
Theorem 4.6.1. Following the prescribed approach, we employ the k best secretaries
with probability at least 1/(e²k).
Indeed, we believe that the O(k) competitive ratio for this case should be almost
tight. One can verify that, provided the individual secretary efficiencies are far from
each other, say every two consecutive values differ by more than a multiplicative factor
of n, the problem of maximizing the expected value of the minimum efficiency is no
easier than being required to employ all of the k best secretaries. The following
theorem provides evidence that the latter problem is hard to approximate.
Theorem 4.6.2. Any algorithm with a single threshold—i.e., interviewing applicants
until some point (observation phase), and then employing any one who is better than
all those in the observation phase—misses one of the k best secretaries with probability
1−O(log k/k).
Another important aggregation function f is that of maximizing the performance
of the secretaries we employ: think of picking k candidate secretaries and finally
hiring the best. We consider this function in Subsection 4.6.1, for which we present a
near-optimal solution. In fact, the problem has already been studied, and an optimal
strategy appears in [40]. However, we propose a simpler solution which features
certain "robustness" properties (and thus is of independent interest): in particular,
suppose we are given a vector (γ1, γ2, . . . , γk) such that γi ≥ γi+1 for 1 ≤ i < k.
Sort the elements of a set R of size k in non-increasing order, say a1, a2, . . . , ak.
The goal is to maximize the efficiency Σi γi·ai. The algorithm that we propose maximizes
this more general objective obliviously; i.e., the algorithm runs irrespective of the
vector γ, yet the resulting solution can be shown to approximate the objective for all
vectors γ simultaneously. We refer to Subsection 4.6.1 for more details. In the
following, we provide the proofs of Theorems 4.6.1 and 4.6.2.
Proof of Theorem 4.6.1. Let R = {a1, a2, . . . , a|R|} ⊆ S denote the set of the k best
secretaries. Let S∗ denote the first 1/k fraction of the stream of secretaries. Let E1
denote the event that S∗ ∩ R = ∅, that is, we do not lose the chance of employing
the best secretaries (R) by being a mere observer in S∗. Let E2 denote the event that
we finally pick the set R. Let us first bound Pr[E1]. To do so, define E1_j for
j: 1 ≤ j ≤ |R| as the event that aj ∉ S∗. We know that Pr[E1_1] ≥ 1 − 1/k. In general,
we have for j > 1
we have for j > 1
Pr
[E1j
∣∣∣∣∣⋂i<j
E1i
]≥n− n
k− j + 1
n− j + 1
≥n− n
k− k
n− k
= 1− n/k
n− k
≥ 1− 2
kassuming k ≤ n
2. (4.4)
Notice that the final assumption is justified because we can solve the problem of
finding the k′ = n−k ≤ n/2 smallest numbers in case k > n/2. Using Equation (4.4)
we obtain
Pr[E1] = Pr[E1_1] · Pr[E1_2 | E1_1] · · · Pr[E1_|R| | ∩j<|R| E1_j] ≥ (1 − 2/k)^k ≥ e^{−2}. (4.5)

The event E2 happens when E1 happens and the (k + 1)-th largest element appears in
S∗. Thus, we have Pr[E2] = Pr[E1] · Pr[E2 | E1] ≥ e^{−2} · (1/k) = 1/(e²k).
Proof of Theorem 4.6.2. We assume that we cannot find the actual efficiency of a
secretary, but we only have an oracle that, given two secretaries already interviewed,
reports the better of the two. This model is justified if the range of efficiency values
is large, and a suitable perturbation is introduced into the values.
Suppose the first secretary is hired after interviewing a β fraction of the secretaries.
If β > log k/k, then the probability that we miss at least one of the k best secretaries
is at least 1 − (1 − β)^k ≥ 1 − 1/k. If, on the other hand, β is small, say β ≤ log k/k,
there is little chance that the right threshold is picked. Notice that in the oracle
model, the threshold has to be the efficiency of some prior secretary. Thus, for the
right threshold to be selected, we need the (k + 1)-th best secretary to appear in the
first β fraction—the probability of this event is no more than β. Therefore, the
probability of success cannot be more than log k/k.
4.6.1 The Secretary Problem with the “Maximum” Function
We now turn to a different efficiency aggregation function, namely the maximum of
the efficiencies of the individual secretaries. Alternatively, one can think of this
function as a secretary problem with k choices; that is, we select k secretaries and we
are satisfied as long as one of them is the best secretary interviewed. We propose an
algorithm that accomplishes this task with probability 1 − O(ln k / k) for k > 1.
As before, we assume that n is a multiple of k, and we partition the input stream
into k equally-sized segments, denoted S1, S2, . . . , Sk. Let f(s) denote the
efficiency of the secretary s ∈ S. For each i: 1 ≤ i < k, we compute

αi := max_{s ∈ ∪j≤i Sj} f(s),

which is the efficiency of the best secretary in the first i segments. Clearly, αi can
be computed in an online manner after interviewing the first i segments. For each
i: 1 ≤ i < k, we try to employ the first secretary in ∪j>i Sj whose efficiency surpasses
αi. Let this choice, if present at all, be denoted si. The output of the algorithm
consists of all such secretaries {si}. Notice that such an element may not exist for a
particular i, or we may have si = si′ for i ≠ i′. We employ at most k − 1 secretaries.
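The thresholds α1, . . . , αk−1 and the hires si can be maintained in a single online pass. The sketch below keeps a list of thresholds still awaiting their first surpasser, so coinciding choices si = si′ are hired only once; it assumes k divides n and that efficiencies are distinct, and the rendering is illustrative.

```python
def hire_for_maximum(stream, k):
    """Multi-threshold rule: after each of the first k-1 segments, record the
    best efficiency seen so far as a threshold; hire the first later applicant
    surpassing each threshold. Returns the set of hired efficiencies."""
    n = len(stream)
    l = n // k                       # segment length (assume k divides n)
    hired = set()
    best_so_far = float("-inf")
    pending = []                     # thresholds alpha_i still awaiting a hire
    for t, x in enumerate(stream):
        # one applicant may be the first surpasser of several thresholds
        if any(x > a for a in pending):
            hired.add(x)
            pending = [a for a in pending if x <= a]
        best_so_far = max(best_so_far, x)
        if (t + 1) % l == 0 and (t + 1) < n:    # end of segment i < k
            pending.append(best_so_far)
    return hired
```

At most k − 1 applicants are hired, matching the description above.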
The following theorem bounds the failure probability of the algorithm.
Theorem 4.6.3. The probability of not employing the best secretary is O(ln k / k).
Proof. Let (a1, a2, . . . , an) denote the stream of interviewed secretaries. Let am be
the best secretary, and suppose am ∈ Si, namely (i − 1)l < m ≤ il, where l := n/k.
Our algorithm is successful if the second best secretary of the set {a1, a2, . . . , am−1}
does not belong to Si. The probability of this event is

((i − 1)l)/m ≥ ((i − 1)l)/(il) = (i − 1)/i. (4.6)

The probability of am ∈ Si is 1/k and, conditioned on this event, the probability of
failure is at most 1/i. Hence, the total failure probability is no more than
Σ_{i=1}^{k} (1/k)·(1/i) = O(ln k / k), as claimed.
This problem has been previously studied by Gilbert and Mosteller [40]. Our
algorithm above is simpler and yet “robust” in the following sense. The primary
goal is to select the best secretary, but we also guarantee that many of the “good”
secretaries are also selected. In particular, we show that the better the rank of a
secretary is in our evaluation, the higher is the guarantee we have for employing her.
Theorem 4.6.4. The probability of not hiring a secretary of rank y is O(√(y/k)).
Proof. Let (a1, a2, . . . , an) denote the stream of interviewed secretaries. Let am be
the secretary of rank y, and suppose am ∈ Si, namely (i − 1)l < m ≤ il, where l := n/k.
Below we define three bad events whose probabilities we bound, and we show that am
is hired provided none of these events occurs. In particular, we give an upper bound
of O(√(y/k)) for each event. The claim then follows from the union bound.

Let z := √(k/(y − 1)) − 1. The event E1 occurs if i ≤ z. This event happens with
probability z/k, which is less than √(1/(k(y − 1))) ≤ √(y/k).

We say the event E2 happens if am is not the best secretary among those in the sets
Si, Si−1, . . . , Si−z. This happens when at least one of the y − 1 secretaries better
than am appears in these sets. Let W be a random variable denoting the number of
these y − 1 secretaries in any of the mentioned sets. Since any secretary is in one
of these sets with probability (z + 1)/k (notice that z + 1 is the number of these
sets), the expected value of W is (y − 1)(z + 1)/k. Using Markov's inequality, the
probability that W is at least 1 is at most its expected value, which is
(y − 1)(z + 1)/k. Thus, using the definition of z, we get an upper bound of
O(√((y − 1)/k)) for E2.

Finally, we define E3 as the event that the best secretary among
a(i−z−1)l+1, a(i−z−1)l+2, . . . , am−1 (the secretaries appearing before am in the above-
mentioned sets) is in the set Si. This happens with probability at most 1/(z + 1),
because there are z + 1 sets, and the best such secretary is equally likely to be in
each of them. Thus, we get Pr[E3] = O(√(y/k)) by the definition of z.

If none of the events E1, E2, and E3 happens, we claim that am is employed: if the
maximum of the items a(i−z−1)l+1, a(i−z−1)l+2, . . . , am−1 is in the set Si′ for some
i − z ≤ i′ < i, then we hire am for the set Si′; refer to the step of the algorithm
in which we consider the threshold αi′.
The aforementioned algorithm of [40] misses a good secretary of rank y with prob-
ability roughly 1/y. On the other hand, one can show that the algorithm of Klein-
berg [59] (for maximizing the sum of the efficiencies of the secretaries) picks secretaries
of high rank with probability about 1−Θ(1/√k). However, the latter algorithm guar-
antees the selection of the best secretary with a probability no more than O(1/√k).
Therefore, our algorithm has the nice features of both these algorithms: the best
secretary is hired with a very good probability, while other good secretaries also have
a good chance of being employed.
Bibliography
[1] A. A. Ageev and M. I. Sviridenko. An 0.828-approximation algorithm for the
uncapacitated facility location problem. Discrete Appl. Math., 93(2-3):149–156,
1999.
[2] S. Agrawal, Z. Wang, and Y. Ye. A dynamic near-optimal algorithm for online
linear programming. Working paper posted at http://www.stanford.edu/~yyye/.
[3] Miklos Ajtai, Nimrod Megiddo, and Orli Waarts. Improved algorithms and
analysis for secretary problems and generalizations. SIAM J. Discrete Math.,
14(1):1–27, 2001.
[4] David Applegate and Edith Cohen. Making routing robust to changing traffic
demands: algorithms and evaluation. IEEE/ACM Trans. Netw., 16(6):1193–
1206, 2006.
[5] Arash Asadpour, Hamid Nazerzadeh, and Amin Saberi. Stochastic submodular
maximization. In WINE, pages 477–489, 2008.
[6] Yossi Azar, Edith Cohen, Amos Fiat, Haim Kaplan, and Harald Racke. Optimal
oblivious routing in polynomial time. J. Comput. Syst. Sci., 69(3):383–394, 2004.
[7] Moshe Babaioff, Nicole Immorlica, David Kempe, and Robert Kleinberg. A
knapsack secretary problem with applications. In APPROX, pages 16–28, 2007.
[8] Moshe Babaioff, Nicole Immorlica, David Kempe, and Robert Kleinberg. Online
auctions and generalized secretary problems. SIGecom Exch., 7(2):1–11, 2008.
[9] Moshe Babaioff, Nicole Immorlica, and Robert Kleinberg. Matroids, secretary
problems, and online mechanisms. In SODA, pages 434–443, 2007.
[10] Bahman Bahmani and Michael Kapralov. Improved bounds for online stochastic
matching. In ESA, pages 170–181, 2010.
[11] MohammadHossein Bateni, Mohammad Taghi Hajiaghayi, and Morteza Zadi-
moghaddam. Submodular secretary problem and extensions. ACM Transactions
on Algorithms, 9(4):32, 2013.
[12] MohammadHossein Bateni, MohammadTaghi Hajiaghayi, and Morteza Zadi-
moghaddam. The submodular secretary problem. Technical Report TD-7UEP26,
AT&T Labs–Research, July 2009.
[13] Aharon Ben-Tal and Arkadi Nemirovski. Robust optimization - methodology
and applications. Math. Program., 92(3):453–480, 2002.
[14] Dimitris Bertsimas, Dessislava Pachamanova, and Melvyn Sim. Robust linear
optimization under general norms. Oper. Res. Lett., 32(6):510–516, 2004.
[15] Anand Bhalgat, Jon Feldman, and Vahab S. Mirrokni. Online ad allocation with
smooth delivery. In ACM Conference on Knowledge Discovery (KDD), 2012.
[16] V. Bilo, M. Flammini, and L. Moscardelli. Pareto approximations for the bi-
criteria scheduling problem. Journal of Parallel and Distributed Computing,
66(3):393–402, 2006.
[17] Benjamin Birnbaum and Claire Mathieu. On-line bipartite matching made sim-
ple. SIGACT News, 39(1):80–87, 2008.
[18] N. Buchbinder, K. Jain, and J.S. Naor. Online Primal-Dual Algorithms for
Maximizing Ad-Auctions Revenue. In Proc. ESA, page 253. Springer, 2007.
[19] N. Buchbinder and J.S. Naor. Fair online load balancing. In Proceedings of the
eighteenth annual ACM symposium on Parallelism in algorithms and architec-
tures, pages 291–298. ACM, 2006.
[20] Gruia Calinescu, Chandra Chekuri, Martin Pal, and Jan Vondrak. Maximizing
a submodular set function subject to a matroid constraint (extended abstract).
In IPCO, pages 182–196, 2007.
[21] Gerard Cornuejols, Marshall Fisher, and George L. Nemhauser. On the unca-
pacitated location problem. In Studies in integer programming (Proc. Workshop,
Bonn. 1975), pages 163–177. Ann. of Discrete Math., Vol. 1. North-Holland, Am-
sterdam, 1977.
[22] Gerard Cornuejols, Marshall L. Fisher, and George L. Nemhauser. Location of
bank accounts to optimize float: an analytic study of exact and approximate
algorithms. Manage. Sci., 23(8):789–810, 1976/77.
[23] Nikhil Devanur and Thomas Hayes. The adwords problem: Online keyword
matching with budgeted bidders under random permutations. In ACM EC,
2009.
[24] Nikhil R. Devanur, Zhiyi Huang, Nitish Korula, Vahab S. Mirrokni, and Qiqi
Yan. Whole-page optimization and submodular welfare maximization with online
bidders. In ACM Conference on Electronic Commerce, pages 305–322, 2013.
[25] Nikhil R. Devanur, Kamal Jain, Balasubramanian Sivan, and Christopher A.
Wilkens. Near optimal online algorithms and fast approximation algorithms
for resource allocation problems. In ACM Conference on Electronic Commerce,
pages 29–38, 2011.
[26] Nikhil R. Devanur, Balasubramanian Sivan, and Yossi Azar. Asymptotically
optimal algorithm for stochastic adwords. In ACM Conference on Electronic Commerce,
pages 388–404, 2012.
[27] E. B. Dynkin. The optimum choice of the instant for stopping a markov process.
Sov. Math. Dokl., 4:627–629, 1963.
[28] Jack Edmonds. Submodular functions, matroids, and certain polyhedra. In
Combinatorial Structures and their Applications (Proc. Calgary Internat. Conf.,
Calgary, Alta., 1969), pages 69–87. Gordon and Breach, New York, 1970.
[29] U. Feige and M. X. Goemans. Approximating the value of two prover proof
systems, with applications to MAX 2SAT and MAX DICUT. In ISTCS, page
182, 1995.
[30] Uriel Feige. A threshold of ln n for approximating set cover. J. ACM, 45(4):634–
652, 1998.
[31] Uriel Feige. On maximizing welfare when utility functions are subadditive. In
STOC, pages 41–50, 2006.
[32] Uriel Feige, Vahab S. Mirrokni, and Jan Vondrak. Maximizing non-monotone
submodular functions. In FOCS, pages 461–471, 2007.
[33] J. Feldman, N. Korula, V. Mirrokni, S. Muthukrishnan, and M. Pal. Online ad
assignment with free disposal. In WINE, 2009.
[34] Jon Feldman, Monika Henzinger, Nitish Korula, Vahab S. Mirrokni, and Clif-
ford Stein. Online stochastic packing applied to display ad allocation. In Mark
de Berg and Ulrich Meyer, editors, Algorithms - ESA 2010, 18th Annual Eu-
ropean Symposium, Liverpool, UK, September 6-8, 2010. Proceedings, Part I,
volume 6346 of Lecture Notes in Computer Science, pages 182–194. Springer,
2010.
[35] Jon Feldman, Nitish Korula, Vahab S. Mirrokni, S. Muthukrishnan, and Martin
Pal. Online ad assignment with free disposal. In Stefano Leonardi, editor, Inter-
net and Network Economics, 5th International Workshop, WINE 2009, Rome,
Italy, December 14-18, 2009. Proceedings, volume 5929 of Lecture Notes in Com-
puter Science, pages 374–385. Springer, 2009.
[36] Jon Feldman, Aranyak Mehta, Vahab Mirrokni, and S. Muthukrishnan. Online
stochastic matching: Beating 1 - 1/e. In FOCS, 2009.
[37] M. L. Fisher, G. L. Nemhauser, and L. A. Wolsey. An analysis of approximations
for maximizing submodular set functions. II. Math. Prog. Stud., (8):73–87, 1978.
Polyhedral combinatorics.
[38] M. Flammini and G. Nicosia. On multicriteria online problems. In Proceedings
of the 8th Annual European Symposium on Algorithms, pages 191–201. Springer-
Verlag, 2000.
[39] P. R. Freeman. The secretary problem and its extensions: a review. Internat.
Statist. Rev., 51(2):189–206, 1983.
[40] John P. Gilbert and Frederick Mosteller. Recognizing the maximum of a se-
quence. J. Amer. Statist. Assoc., 61:35–73, 1966.
[41] Kenneth S. Glasser, Richard Holzsager, and Austin Barron. The d choice secre-
tary problem. Comm. Statist. C—Sequential Anal., 2(3):177–199, 1983.
[42] A. Goel, A. Meyerson, and S. Plotkin. Approximate majorization and fair online
load balancing. In Proceedings of the twelfth annual ACM-SIAM symposium on
Discrete algorithms, pages 384–390. Society for Industrial and Applied Mathe-
matics, 2001.
[43] A. Goel, A. Meyerson, and S. Plotkin. Combining fairness with throughput: On-
line routing with multiple objectives. Journal of Computer and System Sciences,
63(1):62–79, 2001.
[44] Michel X. Goemans and David P. Williamson. Improved approximation algo-
rithms for maximum cut and satisfiability problems using semidefinite program-
ming. J. Assoc. Comput. Mach., 42(6):1115–1145, 1995.
[45] Anupam Gupta, Aaron Roth, Grant Schoenebeck, and Kunal Talwar. Con-
strained non-monotone submodular maximization: Offline and secretary algo-
rithms. In WINE, pages 246–257, 2010.
[46] B. Haeupler, V. Mirrokni, and M. Zadimoghaddam. Online stochastic weighted
matching: Improved approximation algorithms. In WINE, 2011.
[47] Mohammad T. Hajiaghayi, Robert Kleinberg, and Tuomas Sandholm. Auto-
mated online mechanism design and prophet inequalities. In AAAI, pages 58–65,
2007.
[48] Mohammad Taghi Hajiaghayi, Robert Kleinberg, and David C. Parkes. Adaptive
limited-supply online auctions. In EC, pages 71–80, 2004.
[49] Eran Halperin and Uri Zwick. Combinatorial approximation algorithms for the
maximum directed cut problem. In SODA, pages 1–7, 2001.
[50] Johan Hastad. Some optimal inapproximability results. J. ACM, 48(4):798–859,
2001.
[51] Elad Hazan, Shmuel Safra, and Oded Schwartz. On the complexity of approxi-
mating k-set packing. Computational Complexity, 15(1):20–39, 2006.
[52] Nicole Immorlica, Robert D. Kleinberg, and Mohammad Mahdian. Secretary
problems with competing employers. In WINE, pages 389–400, 2006.
[53] Satoru Iwata, Lisa Fleischer, and Satoru Fujishige. A combinatorial strongly
polynomial algorithm for minimizing submodular functions. J. ACM, 48(4):761–
777, 2001.
[54] Bala Kalyanasundaram and Kirk Pruhs. On-line network optimization problems.
In Developments from a June 1996 seminar on Online algorithms: the state of
the art, pages 268–280, London, UK, 1998. Springer-Verlag.
[55] Chinmay Karande, Aranyak Mehta, and Pushkar Tripathi. Online bipartite
matching with unknown distributions. In STOC, 2011.
[56] Richard M. Karp, Umesh V. Vazirani, and Vijay V. Vazirani. An optimal algo-
rithm for on-line bipartite matching. In STOC, pages 352–358, 1990.
[57] Subhash Khot, Guy Kindler, Elchanan Mossel, and Ryan O'Donnell. Optimal inapproximability results for MAX-CUT and other 2-variable CSPs? In FOCS, pages 146–154, 2004.
[58] Samir Khuller, Anna Moss, and Joseph Naor. The budgeted maximum coverage problem. Inf. Process. Lett., 70(1):39–45, 1999.
[59] Robert Kleinberg. A multiple-choice secretary algorithm with applications to online auctions. In SODA, pages 630–631, 2005.
[60] Nitish Korula, Vahab S. Mirrokni, and Morteza Zadimoghaddam. Bicriteria online matching: Maximizing weight and cardinality. In Proceedings of the Ninth Conference on Web and Internet Economics, WINE '13, 2013.
[61] Jon Lee, Vahab Mirrokni, Viswanath Nagarajan, and Maxim Sviridenko. Maximizing non-monotone submodular functions under matroid and knapsack constraints. In STOC, pages 323–332, 2009.
[62] L. Lovász. Submodular functions and convexity. In Mathematical programming: the state of the art (Bonn, 1982), pages 235–257. Springer, Berlin, 1983.
[63] Mohammad Mahdian, Hamid Nazerzadeh, and Amin Saberi. Allocating online advertisement space with unreliable estimates. In ACM Conference on Electronic Commerce, pages 288–294, 2007.
[64] Mohammad Mahdian and Qiqi Yan. Online bipartite matching with random arrivals: A strongly factor revealing LP approach. In STOC, 2011.
[65] Aranyak Mehta, Amin Saberi, Umesh Vazirani, and Vijay Vazirani. Adwords and generalized online matching. J. ACM, 54(5):22, 2007.
[66] V. Manshadi, S. Oveis Gharan, and A. Saberi. Offline optimization for online stochastic matching. In SODA, 2011.
[67] Vahab S. Mirrokni, Shayan Oveis Gharan, and Morteza Zadimoghaddam. Simultaneous approximations for adversarial and stochastic online budgeted allocation. In SODA, pages 1690–1701, 2012.
[68] Rajeev Motwani, Rina Panigrahy, and Ying Xu. Fractional matching via balls-and-bins. In APPROX-RANDOM, pages 487–498, 2006.
[69] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions. I. Math. Program., 14(3):265–294, 1978.
[70] Alessandro Panconesi and Aravind Srinivasan. Randomized distributed edge coloring via an extension of the Chernoff-Hoeffding bounds. SIAM Journal on Computing, 26:350–368, 1997.
[71] Maurice Queyranne. A combinatorial algorithm for minimizing symmetric submodular functions. In SODA, pages 98–101, 1995.
[72] Alexander Schrijver. A combinatorial algorithm minimizing submodular functions in strongly polynomial time. J. Combin. Theory Ser. B, 80(2):346–355, 2000.
[73] Maxim Sviridenko. A note on maximizing a submodular set function subject to a knapsack constraint. Oper. Res. Lett., 32(1):41–43, 2004.
[74] Bo Tan and R. Srikant. Online advertisement, optimization and stochastic networks. CoRR, abs/1009.0870, 2010.
[75] R. J. Vanderbei. The optimal choice of a subset of a population. Math. Oper. Res., 5(4):481–486, 1980.
[76] Erik Vee, Sergei Vassilvitskii, and Jayavel Shanmugasundaram. Optimal online assignment with forecasts. In ACM EC, 2010.
[77] Jan Vondrák. Symmetry and approximability of submodular maximization problems. In FOCS, 2009.
[78] C. M. Wang, X. W. Huang, and C. C. Hsu. Bi-objective optimization: An online algorithm for job assignment. Advances in Grid and Pervasive Computing, pages 223–234, 2009.
[79] Hao Wang, Haiyong Xie, Lili Qiu, Yang Richard Yang, Yin Zhang, and Albert Greenberg. COPE: traffic engineering in dynamic networks. In Proceedings of the 2006 conference on Applications, technologies, architectures, and protocols for computer communications, SIGCOMM '06, pages 99–110, New York, NY, USA, 2006. ACM.
[80] John G. Wilson. Optimal choice and assignment of the best m of n randomly arriving items. Stochastic Process. Appl., 39(2):325–343, 1991.