
Strategic Pricebot Dynamics

Amy R. Greenwald, Jeffrey O. Kephart, and Gerald J. Tesauro

IBM Institute for Advanced Commerce

IBM Thomas J. Watson Research Center

Yorktown Heights, New York, USA

[email protected], {kephart, tesauro}@[email protected]

Abstract

Shopbots are software agents that automatically query multiple sellers on the Internet to gather information about prices and other attributes of consumer goods and services. Rapidly increasing in number and sophistication, shopbots are helping more and more buyers minimize expenditure and maximize satisfaction. In response at least partly to this trend, it is anticipated that sellers will come to rely on pricebots, automated agents that employ price-setting algorithms in an attempt to maximize profits. This paper reaches toward an understanding of strategic pricebot dynamics.

More specifically, this paper is a comparative study of four candidate price-setting strategies that differ in informational and computational requirements: game-theoretic pricing (GT), myoptimal pricing (MY), derivative following (DF), and Q-learning (Q). In an effort to gain insights into the tradeoffs between practicality and profitability of pricebot algorithms, the dynamic behavior that arises among homogeneous and heterogeneous collections of pricebots and shopbot-assisted buyers is analyzed and simulated.

In homogeneous settings, when all pricebots use the same pricing algorithm, DFs outperform MYs and GTs. Investigation of heterogeneous collections of pricebots, however, reveals an incentive for individual DFs to deviate to MY or GT. The Q strategy exhibits superior performance to all the others, since it learns to predict and account for the long-term consequences of its actions. Although the current implementation of Q is impractically expensive, techniques for achieving similar performance at greatly reduced computational cost are under investigation.

1 Introduction

Shopbots, software agents that automatically query multiple on-line vendors to gather information about prices and other attributes of consumer goods and services, herald a future in which automated agents are an essential component of electronic commerce [2, 5, 11, 16]. Shopbots outperform and out-inform humans, providing extensive product coverage in just a few seconds, far more than a patient and determined human shopper could ever achieve, even after hours of manual search. Rapidly increasing in number and sophistication, shopbots are helping more and more buyers minimize expenditure and maximize satisfaction.

Since the launch of BargainFinder [12], a CD shopbot, on June 30, 1995, the range of products represented by shopbots has expanded dramatically. A shopbot available at shopper.com claims to compare 1,000,000 prices on 100,000 computer-oriented products. Another shopbot, DealPilot.com (formerly acses.com), gathers, collates, and sorts prices and expected delivery times of books, CDs, and movies offered for sale on-line. One of the most popular shopbots, mysimon.com, compares office supplies, groceries, toys, apparel, and consumer electronics, to name just a few of the items on its product line. As the range of products covered by shopbots expands to include more complex products such as consumer electronics, the level of shopbot sophistication is rising accordingly. On August 16, 1999, mysimon.com incorporated technology that, for products with multiple features such as digital cameras, uses a series of questions to elicit multi-attribute utilities from buyers, and then sorts products according to the buyer's specified utility. Also on that day, lycos.com licensed similar technology from frictionless.com.

Shopbots are clearly a boon to buyers who use them. Moreover, when shopbots become adopted by a sufficient portion of the buyer population, it seems likely that sellers will be compelled to decrease prices and improve quality, benefiting even those buyers who do not shop with bots. How the widespread utilization of shopbots might affect sellers, on the other hand, is not quite so apparent. Less established sellers may welcome shopbots as an opportunity to attract buyers who might not otherwise have access to information about them, but more established sellers may feel threatened. Some larger players have even been known to deliberately block automated agents from their web sites [4]. This practice seems to be waning, however; today, sellers like Amazon.com and BarnesandNoble.com tolerate queries from agents such as DealPilot.com on the grounds that buyers take brand name and image as well as price into account as they shop.

As more and more buyers rely on shopbots to increase their awareness of products and prices, it is becoming advantageous for sellers to increase flexibility in their pricing strategies, perhaps by using pricebots: automated agents that employ price-setting algorithms in an attempt to maximize profits. A primitive example of a pricebot is available at books.com, an on-line bookseller. When a prospective buyer expresses interest in a given book, books.com automatically queries Amazon.com, Borders.com, and BarnesandNoble.com to determine the price that is being offered at those sites. books.com then slightly undercuts the lowest of the three quoted prices, typically by 1% of the retail price. Such dynamic pricing on millions of titles is virtually impossible to achieve manually, yet can easily be implemented with a modest amount of programming.
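The undercutting rule just described fits in a few lines of code. The sketch below is our own illustration, not books.com's implementation; the seller names and the 1%-of-retail undercut come from the text, while the function name and sample quotes are hypothetical.

    def undercut_price(competitor_quotes, retail_price, fraction=0.01):
        # Query competitors, then price just below the lowest quote,
        # undercutting by a fixed fraction (here 1%) of the retail price.
        lowest = min(competitor_quotes.values())
        return lowest - fraction * retail_price

    # Hypothetical quotes gathered from the three sellers named above.
    quotes = {"Amazon.com": 24.95, "Borders.com": 23.50, "BarnesandNoble.com": 24.00}
    print(undercut_price(quotes, retail_price=25.00))   # -> 23.25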

As more and more sellers automate price-setting, pricebots are going to interact with one another, yielding unexpected price and profit dynamics. This paper reaches toward an understanding of strategic pricebot dynamics via analysis and simulation of four candidate price-setting algorithms that differ in their informational and computational needs: game-theoretic pricing (GT), myoptimal pricing (MY), derivative following (DF), and Q-learning (Q). Previously, we studied the price and profit dynamics that ensue when shopbot-assisted buyers interact with homogeneous collections of pricebots that utilize these algorithms [9, 10, 15]. In this work, we first establish that our previous results are not significantly altered when buyers' valuations are inhomogeneous rather than identical. Later, we examine the behavior that ensues when pricebots employing different strategies are pitted against one another.

This paper is organized as follows. The next section presents our model of an economy that consists of shopbots and pricebots. This model is analyzed from a game-theoretic point of view in Sec. 3. In Sec. 4, we discuss the price-setting strategies of interest: game-theoretic pricing, myoptimal pricing, derivative following, and Q-learning. Secs. 5 and 6 describe simulations of homogeneous and heterogeneous collections of pricebots that implement these algorithms. Finally, Sec. 7 presents our conclusions and discusses ideas for future work.

2 Model

We study an economy in which there is a single homogeneous good offered for sale by $S$ sellers and of interest to $B$ buyers, with $B \gg S$. Each buyer $b$ generates purchase orders at random times, at rate $\rho_b$, and each seller $s$ reconsiders (and potentially resets) its price $p_s$ at random times, at rate $\rho_s$. The value of the good to buyer $b$ is $v_b$, and the cost of production for seller $s$ is $r_s$.

A buyer $b$'s utility for a good is a risk-neutral function of its price, as follows:

$$u_b(p) = \begin{cases} v_b - p & \text{if } p \le v_b \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

We do not assume that buyers are necessarily utility maximizers. Instead, we assume that they use one of a set of fixed-sample-size search rules in selecting the seller from whom to purchase. A buyer of type $i$ (where $0 \le i \le S$) searches for the lowest price among $i$ sellers chosen at random from the set of $S$ sellers, and purchases the good if that seller's price is less than the buyer's valuation $v_b$. (Price ties are broken randomly.)
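As an illustration, this fixed-sample-size search rule can be sketched in Python as follows (function and variable names are ours, not the paper's):

    import random

    def buyer_purchase(prices, i, valuation):
        # Type-i buyer: sample i sellers uniformly at random, pick the
        # lowest-priced one (ties broken randomly via the shuffle), and buy
        # only if that price does not exceed the buyer's valuation (cf. Eq. 1).
        # Returns the chosen seller's index, or None if no purchase occurs.
        if i == 0:
            return None                      # type 0 opts out of the market
        sampled = random.sample(range(len(prices)), i)
        random.shuffle(sampled)              # random tie-breaking under min()
        best = min(sampled, key=lambda s: prices[s])
        return best if prices[best] <= valuation else None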

A few special cases are worth mentioning. A buyer of type $i = 0$ simply opts out of the market without checking any prices. Buyers of types $i = 1$, $i = 2$, and $i = S$ have been referred to in previous work [9, 10] as employing the Any Seller, Compare Pair, and Bargain Hunter strategies, respectively; the latter corresponds to buyers who take advantage of shopbots.¹ The buyer population is assumed to consist of a mixture of buyers employing one or another of these strategies. Specifically, a fixed, exogenously determined fraction $w_i$ of buyers employs strategy $i$, and $\sum_i w_i = 1$.

The profit function $\pi_s$ for seller $s$ per unit time is determined by the price vector $\vec p$, which describes all sellers' prices: $\pi_s(\vec p) = (p_s - r_s) D_s(\vec p)$, where $D_s(\vec p)$ is the rate of demand for the good produced by seller $s$. This rate of demand is determined by the overall buyer rate of demand, the likelihood that the chosen seller's price $p_s$ will not exceed the buyer's valuation $v_b$, and the likelihood of buyers selecting seller $s$ as their potential seller. If $\rho = \sum_b \rho_b$, and if $h_s(\vec p)$ denotes the probability that seller $s$ is selected, while $g(p_s)$ denotes the fraction of buyers whose valuations satisfy $v_b \ge p_s$, then $D_s(\vec p) = \rho B\, h_s(\vec p)\, g(p_s)$. Note that the function $g(p)$ can be expressed as $g(p) = \int_p^\infty \psi(x)\, dx$, where $\psi(x)$ is the probability density function describing the likelihood that a given buyer has valuation $x$.

¹In this framework, it is also possible to consider buyers as utility maximizers, with the additional cost of searching for the lowest price expressed explicitly in the utility functions. In doing so, the search cost for bargain hunters is taken to be zero, while for those buyers who use the Any Seller strategy, its value is greater than $v_b$. The relationship between exogenously determined buyer behavior and the endogenous approach, which incorporates the cost of information acquisition and explicitly allows for buyer decision-making, is further explored in computational settings in Kephart and Greenwald [10]; in the economics literature, see, for example, Burdett and Judd [1].

Without loss of generality, define the time scale such that $\rho B = v$. Now $\pi_s(\vec p)$ is interpreted as the expected profit for seller $s$ per unit sold systemwide. Moreover, seller $s$'s profit is such that $\pi_s(\vec p) = v(p_s - r_s)\, h_s(\vec p)\, g(p_s)$. We analyze the functions $h_s(\vec p)$ and $g(p)$ in the next section.

3 Analysis

We now present a game-theoretic analysis of the prescribed model, viewed as a one-shot game.² Assuming sellers are utility maximizers, we derive the symmetric mixed-strategy Nash equilibrium. A Nash equilibrium is a vector of prices at which sellers maximize their individual profits and from which they have no incentive to deviate [13]. There are no pure-strategy Nash equilibria in our economic model whenever $0 < w_1 < 1$ [8, 9]. There does, however, exist a symmetric mixed-strategy Nash equilibrium, which we derive presently.

²The analysis presented in this section applies to the one-shot version of our model, although the simulation results reported in Sec. 5 focus on repeated settings. We consider the Nash equilibrium of the one-shot game, rather than its iterated counterpart, for at least two reasons: (i) the Nash equilibrium of the stage game played repeatedly is in fact a Nash equilibrium of the repeated game, and (ii) the Folk Theorem of repeated game theory (see, for example, Fudenberg and Tirole [7]) states that virtually all payoffs in a repeated game correspond to a Nash equilibrium, for sufficiently large values of the discount parameter. Thus, we isolate the stage-game Nash equilibrium as an equilibrium of particular interest.

Let $f(p)$ denote the probability density function according to which sellers set their equilibrium prices, and let $F(p)$ be the corresponding cumulative distribution function. Following Varian [17], we note that in the range for which it is defined, $F(p)$ has no mass points, since otherwise a seller could decrease its price by an arbitrarily small amount and experience a discontinuous increase in profits. Moreover, there are no gaps in the said distribution, since otherwise prices would not be optimal: a seller charging a price at the low end of the gap could increase its price to fill the gap while retaining its market share, thereby increasing its profits. The cumulative distribution function $F(p)$ is computed in terms of the quantity $h_s(\vec p)$.

Recall that $h_s(\vec p)$ represents the probability that buyers select seller $s$ as their potential seller. This function is expressed in terms of the probabilistic demand for seller $s$ by buyers of type $i$, namely $h_{s,i}(\vec p)$, as follows:

$$h_s(\vec p) = \sum_{i=0}^{S} w_i\, h_{s,i}(\vec p) \qquad (2)$$

The first component is $h_{s,0}(\vec p) = 0$. Consider the next component, $h_{s,1}(\vec p)$. Buyers of type 1 select sellers at random; thus, the probability that seller $s$ is selected by such buyers is simply $h_{s,1}(\vec p) = 1/S$. Now consider buyers of type 2. In order for seller $s$ to be selected by a buyer of type 2, $s$ must be included within the pair of sellers being sampled, which occurs with probability $(S-1)\big/\binom{S}{2} = 2/S$, and $s$ must be lower in price than the other seller in the pair. Since, by the assumption of symmetry, the other seller's price is drawn from the same distribution, this occurs with probability $1 - F(p)$. Hence, $h_{s,2}(\vec p) = (2/S)\,[1 - F(p)]$. In general, seller $s$ is selected by a buyer of type $i$ with probability $\binom{S-1}{i-1}\big/\binom{S}{i} = i/S$, and seller $s$ is the lowest-priced among the $i$ sellers selected with probability $[1 - F(p)]^{i-1}$, since these are $i - 1$ independent events. Therefore, we have derived $h_{s,i}(\vec p) = (i/S)\,[1 - F(p)]^{i-1}$, and³

$$h_s(p) = \frac{1}{S} \sum_{i=0}^{S} i\, w_i\, [1 - F(p)]^{i-1} \qquad (3)$$

³In Eq. 3, $h_s(p)$ is expressed as a function of seller $s$'s scalar price $p$, given that the probability distribution $F(p)$ describes the other sellers' expected prices.
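Eq. 3 is easy to sanity-check by simulation. The sketch below is our own illustration, not part of the paper: it estimates $h_{s,i}$ by Monte Carlo, drawing competitors' prices i.i.d. from an assumed equilibrium distribution; with $F$ uniform on $[0, 1]$, the estimate should approach $(i/S)[1 - p]^{i-1}$.

    import random

    def hs_i_empirical(p, i, S, draw_price, trials=100_000):
        # Monte Carlo estimate of h_{s,i}: the probability that seller s
        # (index 0), charging price p, is among the i sellers sampled by a
        # type-i buyer and is the lowest-priced of them. Competitors' prices
        # are i.i.d. draws from the equilibrium distribution via draw_price().
        hits = 0
        for _ in range(trials):
            sample = random.sample(range(S), i)
            if 0 not in sample:
                continue
            others = [draw_price() for _ in range(i - 1)]
            if all(p < q for q in others):
                hits += 1
        return hits / trials

    # With F uniform on [0, 1], theory gives (i/S) * (1 - p)**(i - 1):
    print(hs_i_empirical(0.3, 3, 5, random.random))  # roughly (3/5)*0.7**2 = 0.294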

A Nash equilibrium in mixed strategies requires that all prices assigned positive probability yield equal payoffs; otherwise, it would not be optimal to randomize. Thus, assuming $r_s = r$ for all sellers $s$, the equilibrium payoff $\pi \equiv \pi_s(p) = v(p - r)\, h_s(p)\, g(p)$, for all prices $p$. The precise value of $\pi$ can be derived by considering the maximum price that sellers are willing to charge, say $p_m$. At this price, $F(p_m) = 1$, which by Eq. 3 implies that $h_s(p_m) = w_1/S$. Identifying the expression $v(p - r)\, g(p)$ as the profit function of a monopolist, this function attains its maximal value $\pi_m$ (the monopolist's profit) at price $p_m$. Therefore, for all sellers $s$,

$$\pi = v(p_m - r)\, h_s(p_m)\, g(p_m) = \frac{w_1 \pi_m}{S} \qquad (4)$$

and hence,

$$v(p - r)\, g(p) = \frac{w_1 \pi_m}{\sum_{i=0}^{S} i\, w_i\, [1 - F(p)]^{i-1}} \qquad (5)$$

implicitly defines $p$ and $F(p)$ in terms of one another, and in terms of $g(p)$, for all $p$ such that $0 \le F(p) \le 1$.

In previous work, all buyer valuations were taken to be equal (i.e., for all buyers $b$, $v_b = v$), and hence $g(p) = \theta(v - p)$, the Heaviside step function (see [9] and [10]). In this paper, we assume $\psi(x)$ is a uniform distribution over the interval $[0, v]$, with $v > 0$, in which case the integral yields:

$$g(p) = \begin{cases} 1 - p/v & \text{if } p \le v \\ 0 & \text{otherwise} \end{cases} \qquad (6)$$

For this uniform distribution of buyer valuations, the monopolist's profit function is simply $(p - r)(v - p)$, for $p \le v$, which is maximized at the price $p_m = (v + r)/2$. At this price, the monopolist's profit is $\pi_m = [(v - r)/2]^2$. Inserting these values into Eq. 5 and solving for $p$ in terms of $F$ yields:

$$p(F) = p_m - \sqrt{\pi_m}\left(1 - \frac{w_1}{\sum_{i=0}^{S} i\, w_i\, [1 - F(p)]^{i-1}}\right)^{1/2} \qquad (7)$$
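The step from Eq. 5 to Eq. 7 can be made explicit; the following expansion is ours. Completing the square in the monopolist's profit $v(p - r)\, g(p) = (p - r)(v - p)$ gives $(p - r)(v - p) = \pi_m - (p_m - p)^2$, so Eq. 5 becomes

$$\pi_m - (p_m - p)^2 = \frac{w_1 \pi_m}{\sum_{i=0}^{S} i\, w_i\, [1 - F(p)]^{i-1}},$$

and solving for $p_m - p$ yields Eq. 7.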

Eq. 7 has several important implications. First of all, in a population in which there are no buyers of type 1 (i.e., $w_1 = 0$), the sellers charge the production cost $r$ and earn zero profits; this is the traditional Bertrand equilibrium. On the other hand, if the population consists of just two buyer types, 1 and some $i \ne 1$, then it is possible to invert $p(F)$ to obtain:

$$F(p) = 1 - \left[\left(\frac{w_1}{i\, w_i}\right) \frac{(p_m - p)^2}{(v - p)(p - r)}\right]^{\frac{1}{i-1}} \qquad (8)$$

The case in which $i = S$ and $v_b = v$ for all buyers $b$ was studied previously by Varian [17]; in that model, buyers either choose a single seller at random (type 1) or search all sellers and choose the lowest-priced among them (type $S$), and all buyers have equal valuations.

Since $F(p)$ is a cumulative probability distribution, it is only valid in the domain for which its value lies between 0 and 1. The upper boundary is $p = p_m$, since prices above this threshold lead to decreases in market share that exceed the benefits of increased profit per unit. The lower boundary $p^*$ can be computed by setting $F(p^*) = 0$ in Eq. 7, which yields:

$$p^* = p_m - \sqrt{\pi_m}\left(1 - \frac{w_1}{\sum_{i=0}^{S} i\, w_i}\right)^{1/2} \qquad (9)$$

In general, Eq. 7 cannot be inverted to obtain an analytic expression for $F(p)$. It is possible, however, to plot $F(p)$ without resorting to numerical root-finding techniques. We use Eq. 7 to evaluate $p$ at equally spaced intervals in $F \in [0, 1]$; this produces unequally spaced values of $p$ ranging from $p^*$ to $p_m$.
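The parametric evaluation just described takes only a few lines of code. The sketch below is our own (variable names are assumptions, not the paper's code); it evaluates Eq. 7 on an evenly spaced grid of $F$ values.

    import numpy as np

    def price_of_F(F, w, v=1.0, r=0.0):
        # Eq. 7: price p as a function of cumulative probability F, given
        # the buyer-type weights w[0..S]. Equally spaced F values yield
        # unequally spaced prices ranging from p* to pm.
        pm = (v + r) / 2.0
        pim = ((v - r) / 2.0) ** 2                     # monopolist's profit
        S = len(w) - 1
        H = sum(i * w[i] * (1.0 - F) ** (i - 1) for i in range(1, S + 1))
        return pm - np.sqrt(pim) * np.sqrt(1.0 - w[1] / H)

    w = [0.0, 0.25, 0.25, 0.0, 0.0, 0.5]   # S = 5, with w1, w2, wS as in Sec. 5
    Fs = np.linspace(0.0, 1.0, 201)
    ps = [price_of_F(F, w) for F in Fs]    # plot Fs against ps to trace F(p)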

Figs. 1(a) and 1(b) depict the CDFs in the prescribed model under varying distributions of buyer strategies, in particular $w_1 \in \{0.1, 0.25, 0.5, 0.75, 0.9\}$, when $S = 2$ and $S = 5$, respectively. In Fig. 1(a), most of the probability density is concentrated just above $p^*$, where sellers expect low margins but high volume. In contrast, in Fig. 1(b), the probability density is concentrated either just above $p^*$, where sellers expect low margins but high volume, or just below $p_m$, where they expect high margins but low volume. In both figures, the lower boundary $p^*$ increases as the fraction of random shoppers increases; in other words, $p^*$ decreases as the fraction of shopbot buyers increases. Moving from $S = 2$ to $S = 5$ further decreases $p^*$, which is consistent with Eq. 9. These relationships have a straightforward interpretation: shopbots catalyze competition among sellers, and moreover, small fractions of shopbot users induce competition among large numbers of sellers.

Recall that the profit earned by each seller is $\frac{1}{S} \pi_m w_1$, which is strictly positive so long as $w_1 > 0$. It is as though only buyers of type 1 are contributing to sellers' profits, although the actual distribution of contributions from buyers of type 1 vs. buyers of type $i > 1$ is not as one-sided as it appears. In reality, buyers of type 1 are charged less than $p_m$ on average, and buyers of type $i > 1$ are charged more than $r$ on average, although total profits are equivalent to what they would be if the sellers practiced perfect price discrimination. In effect, buyers of type 1 exert negative externalities on buyers of type $i > 1$, by creating surplus profits for sellers.

Figure 1: Cumulative distribution functions for $w_1 \in \{0.1, 0.25, 0.5, 0.75, 0.9\}$; (a) 2 sellers, (b) 5 sellers. Price (horizontal axis) vs. cumulative probability (vertical axis).


4 Pricebot Strategies

When sufficiently widespread adoption of shopbots by buyers forces sellers to become more competitive, it is likely that sellers will respond by creating pricebots that automatically set prices in an attempt to maximize profitability. It seems unrealistic, however, to expect that pricebots will simply compute a Nash equilibrium and fix prices accordingly. The real business world is fraught with uncertainties, undermining the validity of traditional game-theoretic analyses: sellers lack perfect knowledge of buyer demands, and have an incomplete understanding of competitors' strategies. In order to be profitable, pricebots will need to learn from and adapt to changing market conditions.

We now introduce four pricebot strategies, each of which places different requirements on the type and amount of information available to the agent and upon the agent's computational power.

GT The game-theoretic strategy is designed to reproduce the mixed-strategy Nash equilibrium that was computed in the previous section. It makes use of full information about the buyer population, and it assumes its competitors utilize game-theoretic pricing as well.

GT is a constant function, since it makes no use of historical observations. Nonetheless, it is of interest in our simulation studies, in part because there exist learning algorithms that converge to stage-game equilibria over repeated play (see Foster and Vohra [6] and Greenwald [8]).

MY The myopically optimal, or myoptimal,⁴ pricing strategy (see, for example, [11]) uses information about all the buyer characteristics that factor into the buyer demand function, as well as competitors' prices, but makes no attempt to account for competitors' pricing strategies. Instead, it is based on the assumption of static expectations: even when contemplating a price change, a myoptimal seller does not assume that the change will elicit a response from its competitors; it assumes that competitors' prices will remain fixed.

⁴In the game-theoretic literature, this strategy is often referred to as Cournot best-reply dynamics [3]; here, however, price is being set, rather than quantity.

The myoptimal seller $s$ uses all available information and the assumption of static expectations to perform an exhaustive search for the price $p^*_s$ that maximizes its expected profit $\pi_s$. The computational demands can be reduced greatly if the price quantum $\epsilon$ (the smallest amount by which one seller may undercut another) is sufficiently small. Under such circumstances, the optimal price $p^*_s$ is guaranteed to be either the monopolistic price $p_m$ or $\epsilon$ below some competitor's price, limiting the search for $p^*_s$ to $S$ possible values. In our simulations, we choose $\epsilon = 0.002$.
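Under static expectations, the search described above reduces to evaluating $S$ candidate prices. The sketch below is our own illustration; expected_profit stands in for the model's $\pi_s$, evaluated with competitors' prices held fixed.

    def myoptimal_price(cost, competitor_prices, expected_profit, pm, epsilon=0.002):
        # MY pricing: the optimum is either the monopoly price pm or an
        # epsilon-undercut of some competitor, so only S candidate prices
        # need be examined. expected_profit(p) evaluates the seller's
        # expected profit at price p, holding competitors' prices fixed.
        candidates = [pm] + [q - epsilon for q in competitor_prices]
        candidates = [p for p in candidates if p > cost]  # never price below cost
        return max(candidates, key=expected_profit)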

DF The derivative-following strategy is less informationally intensive than either the myoptimal or the game-theoretic pricing strategy. In particular, this strategy can be used in the absence of any knowledge or assumptions about one's competitors or the buyer demand function. A derivative follower simply experiments with incremental increases (or decreases) in price, continuing to move its price in the same direction until the observed profitability level falls, at which point the direction of movement is reversed. The price increment $\delta$ is chosen randomly from a specified probability distribution; in the simulations described here, the distribution was uniform between 0.01 and 0.02.
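A minimal sketch of a derivative follower, assuming only the update rule and increment distribution given above (class and method names are ours):

    import random

    class DerivativeFollower:
        # DF pricebot: keep moving the price in the same direction until
        # observed profit falls, then reverse direction.
        def __init__(self, price):
            self.price = price
            self.direction = 1          # +1 raises the price, -1 lowers it
            self.last_profit = None

        def update(self, observed_profit):
            if self.last_profit is not None and observed_profit < self.last_profit:
                self.direction = -self.direction     # profit fell: reverse
            self.last_profit = observed_profit
            self.price += self.direction * random.uniform(0.01, 0.02)
            return self.price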

Q The Q-learning price-setting strategy is based on a reinforcement learning procedure called Q-learning [18], which can learn optimal pricing policies for Markov Decision Problems (MDPs). It does so by learning the function $Q(x, a)$ representing the cumulative discounted payoff of taking action $a$ in state $x$. The discounted payoff is expressed as $\sum_n \gamma^n r_n$, where $r_n$ is the expected reward $n$ time steps in the future, and $\gamma$ is a constant "discount parameter" lying between 0 and 1. The optimal policy for a given state is the action that maximizes the Q-function for that state. Q-learning yields a deterministic policy, and is therefore unable to represent equilibrium play in games where the equilibria are composed solely of randomized strategies. Q-learning finds the optimal policy in cases where the Q-learner's opponents use stationary Markovian strategies. Situations that deviate from this, such as history-dependent opponents or non-stationary learning opponents (e.g., another Q-learner), constitute an interesting and open research topic that is touched upon here and in some of our prior work (see Tesauro and Kephart [15]).
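A minimal tabular sketch of the Q-update, in the spirit of Watkins [18], is given below. This is our own illustration; the paper's actual methodology, including its state and action discretization, is detailed in Tesauro and Kephart [15], and here the state is assumed to be the opponent's last quoted price.

    import random

    def q_update(Q, x, a, reward, x_next, actions, alpha=0.1, gamma=0.5):
        # One Q-learning step on a lookup table Q[(state, action)]:
        # Q(x,a) <- (1 - alpha) Q(x,a) + alpha [reward + gamma max_a' Q(x',a')].
        best_next = max(Q.get((x_next, b), 0.0) for b in actions)
        Q[(x, a)] = (1 - alpha) * Q.get((x, a), 0.0) \
                    + alpha * (reward + gamma * best_next)

    def choose_price(Q, x, actions, eps=0.1):
        # Epsilon-greedy exploration over the learned Q-values for state x.
        if random.random() < eps:
            return random.choice(actions)
        return max(actions, key=lambda a: Q.get((x, a), 0.0))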

5 GT, MY, and DF Simulations

We simulated an economy with 1000 buyers and initially 2, and later 5, pricebots employing various mixtures of pricing strategies. In the simulations depicted, buyer valuations are uniformly distributed in the interval $[0, 1]$, and each seller's production cost is $r = 0$. The mixture of buyer types is set at $w_1 = 0.25$, $w_2 = 0.25$, and $w_S = 0.5$; when $S = 2$, $w_2 = w_S = 0.75$. The simulation is asynchronous: at each time step, a buyer or seller is randomly selected to either make a purchase or reset a price. The chance that a given agent is selected to act is determined by its rate; the rate $\rho_b$ at which a given buyer $b$ attempts to purchase the good is set to 0.000999, while the rate $\rho_s$ at which a given seller reconsiders its price is 0.00005. Each simulation was iterated for 100 million time steps.
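The asynchronous scheme amounts to selecting the next actor with probability proportional to its rate. A sketch of the event loop follows (our own scaffolding; attempt_purchase and reset_price are hypothetical agent interfaces, and the default step count is reduced from the paper's 10^8 for illustration):

    import random

    def simulate(buyers, sellers, steps=1_000_000,
                 rho_b=0.000999, rho_s=0.00005):
        # At each time step one agent is chosen in proportion to its rate;
        # a chosen buyer attempts a purchase, a chosen seller resets its price.
        agents = [("buyer", b) for b in buyers] + [("seller", s) for s in sellers]
        weights = [rho_b] * len(buyers) + [rho_s] * len(sellers)
        for _ in range(steps):
            kind, agent = random.choices(agents, weights=weights, k=1)[0]
            if kind == "buyer":
                agent.attempt_purchase(sellers)
            else:
                agent.reset_price(sellers)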

5.1 Homogeneous Simulations

Here, we augment our previous work (see [9]) on simulations of 2 pricebots of matching types by assuming the uniformly distributed buyer valuations introduced in Sec. 3. If we consider 2 GT pricebots, simulations verify that the cumulative distribution of prices closely resembles the derived $F(p)$, to within statistical error. The time-averaged profits for each seller were 0.0319, nearly the theoretical value of 0.03125. When 2 pricebots use the myoptimal pricing strategy, cyclical price wars result (see [9]). The myoptimal sellers' expected profits averaged over one price-war cycle are given by $\pi^{my}_s = \frac{1}{vS\,[p_m - p^*]} \int_{p^*}^{p_m} (v - p)(p - r)\, dp$, which is equal to 0.0893. Simulation results closely match this theoretical value, with an average profit per time step for MYs of 0.0892, nearly thrice the average profit obtained via the game-theoretic pricing strategy.

In related work (see [9]), in which all buyer valuations were assumed to be equal, the behavior of DF pricebots tended towards what is in effect a collusive state in which all prices approached the monopolistic price. In this study, 2 DF pricebots once again track one another closely; in doing so they accumulate greater profits than either 2 myoptimal or 2 game-theoretic pricebots. The effect of uniformly distributed buyer valuations is nonetheless apparent in substantial fluctuations in price, ranging from approximately 0.15 to 0.55, the upper bound of which is above the monopolistic price $p_m = 0.5$, rather than remaining in the neighborhood of the monopolistic price. Specifically, DF pricebots achieved time-averaged profits of 0.1127.

5.2 Heterogeneous Simulations

Figs. 2(a) and 2(b) portray simulations of heterogeneous play. Consider the price dynamics of Fig. 2(a), which depicts 1GT vs. 1DF. In this scenario, the game-theoretic pricebot outperforms the derivative follower by more often than not capturing greater market share with its lower price. In response, the derivative follower charges relatively high prices, although it does not oscillate precisely around the monopolistic price $p_m = 0.5$; instead, it prices in a range slightly below this optimum, since in doing so it more frequently finds itself to be the lower-priced of the two pricebots. The average profits of GT were 0.0682, while DF's average profits were less than half this value, at 0.0334.

In contrast, Fig. 2(b) depicts the price dynamics of 1GT vs. 1MY. Unlike the derivative follower, the myoptimal pricebot outperforms the game-theoretic pricebot; specifically, the time-averaged profits of GT were merely 0.0235, while MY achieved 0.0494. MY pricing has two notable advantages over derivative following: (i) access to full information pertaining to both competitors' prices and buyer demand, and (ii) the ability to change its price discontinuously, if necessary. Accordingly, Fig. 2(b) reveals that MY prices generally just undercut GT prices, unless GT charges $p^*$, in which case MY charges $p_m = 0.5$. We temporarily defer discussion of competition between MY and DF pricebots.

Table 1 summarizes the results of our 1-on-1 GT, MY, and DF simulations by depicting the time-averaged profits obtained by pricebots that employed the various strategies as indicated. It is interesting to consider this profit matrix as representing the payoffs of a normal-form game in which there are three possible strategies, namely MY, DF, and GT. In doing so, we observe that the strategy profiles (1MY, 1MY) and (1DF, 1DF) are both pure-strategy Nash equilibria, with the latter Pareto optimal. Moreover, regardless of the opponent's behavior, it is always preferable to choose strategy MY or DF, rather than behave as prescribed by GT: i.e., the elimination of dominated strategies eliminates the game-theoretic strategists. This outcome is not entirely surprising in view of the fact that GT is rooted in a stage-game analysis, and prescribes play with no regard for historical data. In contrast, MY and DF take into account changing environmental conditions, and are thus more apt in asynchronous, repeated-game settings.

Figure 2: 1-on-1 Pricebot Simulations: prices vs. time for 1GT vs. 1DF (left) and 1GT vs. 1MY (right).

          1MY             1DF             1GT
1MY   .0892, .0892    .0932, .0456    .0494, .0235
1DF   .0456, .0932    .1127, .1127    .0334, .0682
1GT   .0235, .0494    .0682, .0334    .0319, .0319

Table 1: 1-on-1 Profit Matrix. Within a given cell, the left-hand profit is that received by an agent employing the strategy corresponding to that cell's row, while the right-hand profit is that received by an agent employing the strategy corresponding to that cell's column.

We now turn to heterogeneous simulations of 5 pricebots. These simulations were conducted assuming 1 individual pricebot vs. 4 pricebots playing the same strategy: e.g., 1MY vs. 4DF. Fig. 3 displays a $3 \times 3$ matrix depicting the 9 possible combinations of 1 vs. 4 pricebots of strategies MY, DF, and GT, with the cells in the matrix indexed by $1 \le i, j \le 3$: e.g., Fig. $3_{1,2}$ refers to the cell that depicts 1MY vs. 4DF. We describe two of the more interesting off-diagonal entries.

1MY vs. 4DF

The fact that a homogeneous group of derivative followers is able to extract a larger profit than more clever, better-informed, game-theoretic and myoptimal agents (see Sec. 5.1) may seem paradoxical: how is it that it is so smart to be so ignorant? Fig. $3_{1,2}$, in which one myoptimal pricebot is pitted against 4 derivative followers, suggests that in fact ignorance may not be bliss. During the simulation, MY averaged a profit of 0.0690. Hovering close to one another, just below the monopolistic price $p_m$, 2 of the 4 DFs received average profits of 0.0210 and 0.0227. A third DF never learned that it would be better off charging below $p_m$; even after 10 million iterations it continued to set its price above $p_m$, and earned only 0.0188. The final DF pricebot, however, had a more interesting experience: it engaged in head-to-head combat with the MY pricebot, managing in the process to do better than its cohorts, obtaining an average profit of 0.0275. Towards the end of this simulation, though, the prices of the competing DF and MY pricebots approach $p_m$, and consequently, so do those of the other DF pricebots. In longer runs, we observed other DFs going head-to-head with the MY pricebot, suggesting that all DFs ultimately fare equally well.

Figure 3: 4-on-1 Pricebot Simulations: Prices vs. Time (Millions). Panels, by row: 5MY, 1MY vs. 4DF, 1MY vs. 4GT; 1DF vs. 4MY, 5DF, 1DF vs. 4GT; 1GT vs. 4MY, 1GT vs. 4DF, 5GT.


1DF vs. 4MY

A simulation of 1 derivative-following pricebot competing with 4 myoptimal pricebots is depicted in Fig. $3_{2,1}$. In this simulation, the derivative follower achieved average profits of 0.0136, while the myoptimal pricebots earned 0.0335 on average. Note that the profit obtained by a myoptimal seller in this heterogeneous setting is nearly identical to that obtained when all sellers are myoptimal. This equivalence is due to a coincidental cancellation of two opposing effects. First, note in Fig. $3_{2,1}$ that the derivative follower's price forms an upper bound above which no myoptimal seller is willing to price. This upper bound is in general less than $p_m$, since the derivative follower is continually enticed into half-hearted price wars, but its recovery phase is typically too short to allow it to meander up as high as $p_m$. These dynamics tend to reduce the profit received by a myoptimal seller. On the other hand, the more savvy myoptimal pricebots rarely permit the derivative follower to undercut them, so the profits earned in the market segment consisting of the more diligent shoppers are shared by just four myoptimal sellers, rather than five. Thus, each myoptimal seller receives a larger piece of a smaller pie. These opposing effects happen to counterbalance one another almost perfectly for the choice of parameters made in this example.

Prisoners' Dilemma

Table 2 summarizes the results of 5-pricebot simulations by depicting the time-averaged profits obtained by pricebots that employed the various strategies as indicated. (This table parallels Fig. 3.) Given 4 opposing pricebots that behave uniformly, the best response of one additional pricebot is myoptimal pricing; conversely, assuming 1 opposing pricebot behaves according to MY, the best response for 4 pricebots acting in unison is again myoptimal pricing. Thus, in this framework, the strategy profile (1MY, 4MY) can be viewed as a Nash equilibrium; in fact, this is the unique Nash solution in Table 2. This strategy profile is not Pareto optimal, whereas (1DF, 4DF), for example, is Pareto optimal. In effect, this profit matrix has the same basic character as that of the Prisoner's Dilemma: the cooperative strategy is DF, but there is every incentive for individuals to defect to MY.

          4MY             4DF             4GT
1MY   .0337, .0337    .0690, .0225    .0185, .0109
1DF   .0136, .0335    .0387, .0387    .0134, .0159
1GT   .0119, .0169    .0536, .0226    .0129, .0129

Table 2: 4-on-1 Profit Matrix

Consumer Surplus

Table 3 presents the consumer surplus (the benefit achieved beyond the buyers' willingness to pay) obtained by the simulated buyers for the usual combinations of seller strategies. As expected, when all pricebots play DF and seller profits are maximized, consumer surplus is minimized. Notice, however, that the introduction of a single GT pricebot substantially increases consumer surplus; similarly, the introduction of a GT pricebot into a group of otherwise MY pricebots leads to even greater increases in consumer surplus. Apparently, the presence of even a single GT agent substantially benefits buyers. Interestingly, however, the greatest consumer surplus is seen not when all pricebots are GT strategists, but rather when one pricebot plays MY and the remainder play GT. The deterministic, explicit undercutting by the MY pricebot (as opposed to the probabilistic undercutting by the GT pricebot) not only benefits the MY seller (see Table 2), but it also slightly improves the consumers' lot. What is a gain for MY and the consumers is a loss for the GT sellers.

         4MY      4DF      4GT
1MY    .2907    .3066    .4349
1DF    .3010    .2638    .4144
1GT    .4092    .3238    .4326

Table 3: Consumer Surplus

6 Q-Learning Simulations

The MY, DF, and GT pricing strategies studied in the previous section are all predefined strategies that do not vary during a simulation run. In this section, we study Q-learning, which incorporates a training period during which Q pricebots adapt to specific opponent strategies. Simulation results of Q-learning against each of the 4 pricebot strategies (including Q-learning itself) are presented below. Due to the lookup-table representation of the Q-function, these simulations were limited to 2 pricebots. In future work, we plan to study Q-learning in the case of multiple pricebots, using function approximators (e.g., neural networks and decision trees) rather than lookup tables to represent the Q-functions. Details of our Q-learning methodology are presented in Tesauro and Kephart [15].

Q vs. GT

Initially, we trained a Q-learner against a GT pricebot, and obtained a resulting policy that is virtually identical to the myoptimal policy, regardless of the value of the discount parameter $\gamma$. This is due to the fact that the game-theoretic strategy is state-independent, which implies that the actions of the Q-learner have no impact on its opponent's future course of action. Hence, the best the Q-learner can do is optimize its immediate reward, which amounts to following the myoptimal policy.

Q vs. DF

The DF strategy is history-dependent; thus, Q-learning is not guaranteed in theory to find an optimal policy against DF. In practice, however, Q-learning performs quite well. The Q-derived policy, which is once again largely independent of $\gamma$, is similar to the myoptimal policy. The primary difference is that the Q-learner tends to undercut by $2\delta$ rather than by just $\delta$, thereby guaranteeing that it will continue to have the lowest price even after the DF replies, since the DF can only lower its price by $\delta$ at each given time step.⁵

⁵In the simulations reported here, DF was modified slightly so that the size of its price increments was fixed at 0.01, which matched the price discretization of the Q-learner's lookup tables.

Q vs. MY

In Tesauro and Kephart [15], Q-learning vs. MY and Q-learning vs. another Q-learner were studied in a variant of the present model of shopbots and pricebots in which all buyers had equal valuations. The results discussed here, assuming a uniform distribution of buyer valuations, are similar to the results reported previously. Q-learning always yielded exact convergence to a stationary, optimal solution (as expected) against a myoptimal opponent. Moreover, the Q-derived policies always outperformed the myoptimal strategy, and average utility increased monotonically with $\gamma$. Q-learning also had the side effect of improving the utility of its myopic opponent, because the Q-learner learned to abandon undercutting behavior more readily than MY as the price decreased. As shown in Fig. 4(a), the price-war regime is smaller and is confined to higher average prices, leading to a closer approximation to collusive behavior, with greater expected utilities for both sellers.

Figure 4: (a) Prices vs. time for a Q-learner vs. a myoptimal pricebot after training (training used discount parameter $\gamma = 0.5$). Compared to MY vs. MY, the price-war amplitude is diminished, and profitability is increased for both sellers. (b) Prices vs. time for one Q-learner vs. another after training (both Q-learners trained with $\gamma = 0.5$).

Q vs. Q

In simultaneous Q-learning, exact convergence of the Q-functions was found only for small $\gamma$. For large $\gamma$, there was very good approximate convergence, in which the Q-functions converged to stationary solutions to within small random fluctuations. Different solutions were obtained at each value of $\gamma$. For small $\gamma$, a symmetric solution was generally obtained (in which the shapes of $p_1(p_2)$ and $p_2(p_1)$ were identical), whereas a broken-symmetry solution was obtained at large $\gamma$. There was a range of $\gamma$ values, between 0.1 and 0.3, where either a symmetric or an asymmetric solution could be obtained, depending on initial conditions. The asymmetric solution is counter-intuitive, because one would expect that symmetric utility functions would lead to symmetric policies. Fig. 4(b) illustrates the results of simultaneous Q-learning at $\gamma = 0.5$. Note that there is a further reduction in the price-war regime as compared to MY vs. MY and Q vs. MY, leading to even greater profits for both sellers. Profits in the Q-learner simulations are summarized in Table 4.

Pricebots       Profits
(Q, GT)     (.0469, .0360)
(Q, DF)     (.2132, .0301)
(Q, MY)     (.1089, .1076)
(Q, Q)      (.1254, .1171)

Table 4: Profits per time step in 2-pricebot simulations where one of the pricebots was a Q-learner (trained at $\gamma = 0.5$) and the other was as indicated.

In the cases of Q vs. MY and Q vs. GT, it appears that the Q-learner has obtained the theoretically optimal strategies against the respective opponents. For Q vs. DF, the Q-learner does quite well despite the non-Markovian nature of the opponent's strategy, and for simultaneous Q-learning, generally good behavior was found, despite the absence of theoretical guarantees. Exact or very good approximate convergence was obtained to simultaneously self-consistent Q-functions and optimal policies. Spontaneous symmetry breaking is found for large $\gamma$, even though the utility functions for both players are symmetric. Understanding how this broken-symmetry solution comes about is an active area of research. Also of interest would be to better characterize the solutions that appear to converge approximately but not exactly; such behavior has no analog in ordinary Q-learning.

7 Conclusions and Future Work

This paper examined several approaches to the design and implementation of pricebot algorithms that differ in their informational and computational requirements, and further differ in terms of their merits and limitations when tested against themselves and one another. The simplest algorithm is derivative following (DF): its price adjustments are trivial to compute, and it requires no knowledge of either the buyer responses or the other sellers' prices. DF is a reasonable approach in the absence of such knowledge, and performs well when other sellers also use DF. If more knowledge is available, however, other approaches can be designed to yield greater profits. If expected buyer demand is known, it is often possible to compute a game-theoretic equilibrium solution (GT). While this strategy is perhaps appropriate for one-shot games, in iterated games where there is knowledge of other sellers' prices, it is possible to outperform GT. The myoptimal strategy (MY), for example, uses knowledge of expected buyer demand to compute a short-term optimal price (but ignores the expected long-term behavior of its competitors), and outperforms both GT and DF. Computation of MY would appear to require an exhaustive search over all possible prices; however, MY's computational overhead can be reduced to examining only $\epsilon$-undercuts of the other sellers' prices and the monopolistic price, for sufficiently small values of $\epsilon$.

Pricing based on Q-learning is superior to myoptimal pricing, since Q anticipates the longer-term consequences of its actions, resulting both from competitor responses and from its own counter-responses. Thus, among the pricing strategies studied, Q appears to offer the most promising performance; however, its computational requirements are rather substantial. This is true for both off-line and on-line Q-learning. For off-line Q, an extensive simulation is required, including models of both the buyers' demands and the other sellers' strategies. For on-line Q, real competitor prices and buyer responses are used, so that models are not required, but the training process remains long and tedious, and the learner must endure true losses during the training period while exploring suboptimal actions. In future work, we plan to continue investigating adaptive learning approaches such as Q-learning, with an aim towards designing algorithms that are feasible for practical implementation. For example, the Q-function can be represented using function approximators rather than lookup tables. Preliminary (and encouraging) results have been obtained based on neural networks [14], and work based on decision trees is underway. Lastly, while knowledge of competitors' prices seems reasonable (by the very existence of shopbots), accurate models of competitors' strategies or buyer demand may in fact be unavailable in actual agent economies; methods to overcome such limitations remain open to investigation.

References

[1] K. Burdett and K. L. Judd. Equilibrium price dispersion. Econometrica, 51(4):955–969, July 1983.

[2] A. Chavez and P. Maes. Kasbah: an agent marketplace for buying and selling goods. In Proceedings of the First International Conference on the Practical Application of Intelligent Agents and Multi-Agent Technology, London, U.K., April 1996.

[3] A. Cournot. Recherches sur les Principes Mathématiques de la Théorie des Richesses. Hachette, 1838.

[4] J. B. DeLong and A. M. Froomkin. The next economy? In D. Hurley, B. Kahin, and H. Varian, editors, Internet Publishing and Beyond: The Economics of Digital Information and Intellectual Property. MIT Press, Cambridge, Massachusetts, 1998.

[5] J. Eriksson, N. Finne, and S. Janson. Information and interaction in MarketSpace: towards an open agent-based market infrastructure. In Proceedings of the Second USENIX Workshop on Electronic Commerce, November 1996.

[6] D. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 21:40–55, 1997.

[7] D. Fudenberg and J. Tirole. Game Theory. MIT Press, Cambridge, 1991.

[8] A. Greenwald. Learning to Play Network Games. PhD thesis, Courant Institute of Mathematical Sciences, New York University, New York, May 1999.

[9] A. Greenwald and J. Kephart. Shopbots and pricebots. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, to appear, August 1999.

[10] J. Kephart and A. Greenwald. Shopbot economics. In Proceedings of the Fifth European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty, to appear, July 1999.

[11] J. O. Kephart, J. E. Hanson, D. W. Levine, B. N. Grosof, J. Sairamesh, R. B. Segal, and S. R. White. Dynamics of an information filtering economy. In Proceedings of the Second International Workshop on Cooperative Information Agents, 1998.

[12] B. Krulwich. The BargainFinder agent: Comparison price shopping on the Internet. In J. Williams, editor, Agents, Bots and Other Internet Beasties, pages 257–263. SAMS.NET Publishing (MacMillan), 1996. URLs: http://bf.cstar.ac.com/bf, http://www.geocities.com/ResearchTriangle/9430.

[13] J. Nash. Non-cooperative games. Annals of Mathematics, 54:286–295, 1951.

[14] G. Tesauro. Pricing in agent economies using neural networks and multi-agent Q-learning. In Proceedings of the IJCAI Workshop on Learning About, From, and With Other Agents, August 1999.

[15] G. Tesauro and J. Kephart. Pricing in agent economies using multi-agent Q-learning. In Proceedings of the Fifth European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty, to appear, July 1999.

[16] M. Tsvetovatyy, M. Gini, B. Mobasher, and Z. Wieckowski. MAGMA: an agent-based virtual market for electronic commerce. Applied Artificial Intelligence, 1997.

[17] H. Varian. A model of sales. American Economic Review, Papers and Proceedings, 70(4):651–659, September 1980.

[18] C. Watkins. Learning from Delayed Rewards. PhD thesis, Cambridge University, Cambridge, 1989.