Algorithmic Collusion Detection and Regulation
Matteo Courthoud
April 15, 2021
Algorithmic Collusion
High-frequency pricing decisions are delegated to algorithms (Chen et al., 2016)
Computational evidence that algorithms can learn to collude (Calvano et al., 2020; Klein, 2019)
Empirical evidence that it might be happening (Assad et al., 2020)
General concerns from policy: OECD (2017), Autorité de la concurrence & Bundeskartellamt (2019), UK Competition and Markets Authority (2021)
How can we detect collusion? We first need to define it.
Defining Algorithmic Collusion
From the legal perspective:
“Collusion is when firms use strategies that embody a reward–punishment scheme which rewards a firm for abiding by the supracompetitive outcome and punishes it for departing from it.” (Harrington, 2018)
Normally we cannot observe firms’ strategies, so we punish firms’ intent to collude, via communication
However, we can observe algorithm strategies!
What is the best way to detect algorithmic collusion?
Detecting Algorithmic Collusion
Ex-ante
1. Simulate algorithm behavior to classify algorithms into collusive and non-collusive ones (Ezrachi & Stucke, 2017)
Live monitoring
2. Get intermediate output from the algorithm
“Computer scientists might develop algorithms that explain their own behavior” (Calvano et al., 2020)
Ex-post
3. Rely on observational data
4. Look at algorithm strategies
“An AA [artificial agent] is not collusive if its price recommendation is not dependent on rival firms’ responding in a particular manner” (Harrington, 2018)
This Project
1. Show that observing algorithm strategies might not be enough to determine collusion
Even algorithms whose strategies do not depend on rivals’ actions can generate collusive behavior in the form of reward–punishment schemes. How? Self-punishment
2. Propose a method to detect collusive behavior, based on unilateral algorithm re-training
No need to control all algorithms at the same time. No modeling assumptions. But unfeasible: firms would need to reset their algorithms
3. Propose an alternative to (2) that does not require additional data
Work in progress; very noisy results
Literature
Igami (2020): reinforcement learning in economic terms
Calvano et al. (2020): Q-learning algorithms learn grim-trigger strategies
Klein (2019): also with alternating moves
Hansen et al. (2021): multi-armed bandit algorithms learn supra-competitive prices because of correlated experiments
Assad et al. (2020): evidence from the German gasoline market
Asker et al. (2021): how economic modeling can prevent collusion
Mehra (2015), Ballard & Naik (2017), Capobianco & Gonzaga (2017), Ezrachi & Stucke (2017), Deng (2017), Gal (2017), Okuliar & Kamenir (2017), Harrington (2018), Calvano et al. (2019): legislative point of view
Q-Learning
Q-Learning: Overview
Q-learning algorithms are model-free algorithms devised to find optimalpolicies in dynamic environments.
Not designed to be deployed in strategic environments
Most optimality results on Q-learning are derived for stationary environments
Setting
Objective: maximize the flow of discounted payoffs $\mathbb{E}\left[\sum_{t=0}^{\infty} \delta^t \pi_t\right]$
In each period, an agent observes a state s ∈ S (finite)
The agent has to take an action a ∈ A (finite)
Q-Learning: Overview
What does model-free mean?
The algorithm does not rely on assumptions about, nor tries to model:
The relationship between states, actions, and payoffs π(s, a)
The relationship between states, actions, and future states s′(s, a)
Feedback:
The algorithm knows its own observed state and actions
At the end of each period, it observes the realized payoff and the next state
Q-Learning Algorithm
Timing
Action-specific value function Q(s, a) is initialized
In each period
1. The algorithm observes the state s
2. Two different ways of selecting an action a
Exploration: the algorithm takes a random action a ∈ A
Exploitation: the algorithm takes the action a = arg max_a Q(s, a)
3. The algorithm observes the realized payoff π(s, a)
4. The algorithm updates Q(s, a) using π(s, a)
Weighted average of observed payoffs in state s, given action a
Convergence is declared when the optimal actions do not change for enough periods
Exploration vs Exploitation
Trade-off
Exploration is needed to learn policies’ values
Exploitation is needed to reap the benefits of the optimal policy
Over time the algorithm shifts from “exploration-mostly” mode to “exploitation-only” mode
For local optimality in Markov environments (Sutton & Barto, 2018)
Note that the algorithm “learns” also in exploitation mode
It still updates Q
But only for optimal actions, in visited states
A new action a′′ can become optimal in a state s only if the previous optimal action a′ performs worse than a′′ for a sufficient number of visits of s
Important: in exploitation mode, action optimality is determined by the One-Shot Deviation principle
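To make the shift from exploration to exploitation concrete, here is a minimal Python sketch of the exponentially decaying exploration probability e^(−βt) used in Algorithm 1 below; the value of β is a hypothetical assumption, not taken from the slides.

```python
import numpy as np

# Hypothetical decay parameter; the slides do not report a value for beta.
beta = 4e-6

def explores(t: int, rng: np.random.Generator) -> bool:
    """True if the algorithm takes a random (exploratory) action in period t:
    exploration happens with probability exp(-beta * t)."""
    return rng.uniform() < np.exp(-beta * t)

# Early periods are almost pure exploration; after roughly 1/beta periods
# the algorithm is effectively in exploitation-only mode (but keeps updating Q).
rng = np.random.default_rng(0)
for t in (0, 100_000, 1_000_000, 5_000_000):
    print(t, round(np.exp(-beta * t), 4))
```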
Algorithm Details
Algorithm 1: Q-learning
initialize Q_i(s, a) ∀ i, s, a
initialize s_0
while convergence condition not met do
    exploration_i = 1(r_i < e^{−βt}), where r_i ∼ U(0, 1)
    if exploration_i then
        a*_i = a ∈ A chosen uniformly at random
    else
        a*_i = arg max_{a_i} Q_i(s, a_i)
    end
    observe π given (s, a*)
    observe s′ given (s, a*)
    Q_i(s, a*_i) = α Q_i(s, a*_i) + (1 − α) [π(s, a*) + δ max_{a′_i} Q_i(s′, a′_i)]
    s = s′
end
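For concreteness, a runnable Python sketch of the loop above for a single agent. The environment feedback is wrapped in a black-box payoffs(s, a) function, consistent with the model-free setup; all parameter values and the convergence heuristic are illustrative assumptions, not the talk’s calibration.

```python
import numpy as np

def q_learning(payoffs, n_states, n_actions, alpha=0.85, delta=0.95,
               beta=4e-6, stable_target=100_000, max_iter=10_000_000, seed=0):
    """One agent's loop from Algorithm 1. payoffs(s, a) -> (pi, s_next) is a
    black box: the agent models neither payoffs nor transitions.
    Note: alpha weighs the OLD estimate, matching the update rule above."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))        # initialize Q(s, a)
    s, stable = 0, 0                           # initialize s0 and a convergence counter
    for t in range(max_iter):
        if rng.uniform() < np.exp(-beta * t):  # exploration with probability e^(-beta t)
            a = int(rng.integers(n_actions))
        else:                                  # exploitation: greedy action
            a = int(Q[s].argmax())
        pi, s_next = payoffs(s, a)             # observe realized payoff and next state
        greedy_before = int(Q[s].argmax())
        Q[s, a] = alpha * Q[s, a] + (1 - alpha) * (pi + delta * Q[s_next].max())
        # convergence heuristic: greedy action unchanged for many consecutive periods
        stable = stable + 1 if int(Q[s].argmax()) == greedy_before else 0
        if stable >= stable_target:
            break
        s = s_next
    return Q
```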
Experience Based Equilibrium
What does Q converge to?
Experience-Based Equilibrium (Fershtman & Pakes, 2012): at the states of the equilibrium recurrent class,
strategies are optimal given agents’ evaluations, and
these evaluations are consistent with the empirical distribution of outcomes and the primitives of the model
Simulations
Setting
Infinitely repeated game
Unit mass of consumers with logit demand
$q_i(p) = \dfrac{e^{-\mu p_i}}{e^{-\mu p_i} + \sum_{-i} e^{-\mu p_{-i}}}$.
Two firms compete in prices
$\pi_i(p) = q_i(p)(p_i - c)$
Fixed grid of available actions (prices)
State: prices of both firms in the previous period, $p = (p_i, p_{-i})$
Calvano et al. (2020): the algorithms learn grim-trigger strategies to keep supra-competitive prices
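As a concrete instance of this setting, a minimal Python sketch of the demand and profit primitives; μ, c, and the price grid are illustrative values, not the calibration used in the talk. It provides the period payoffs that a learning loop like the q_learning sketch above would observe.

```python
import numpy as np

mu, c = 0.25, 1.0                          # illustrative demand parameter and marginal cost
price_grid = np.linspace(1.0, 2.0, 15)     # hypothetical fixed grid of prices (actions)

def demand(p):
    """Logit demand from the slide: q_i(p) = exp(-mu p_i) / sum_j exp(-mu p_j)."""
    w = np.exp(-mu * np.asarray(p))
    return w / w.sum()

def profits(p):
    """Per-firm profits: pi_i(p) = q_i(p) (p_i - c)."""
    p = np.asarray(p)
    return demand(p) * (p - c)

# Example: two firms at the same mid-grid price split demand equally.
print(profits([price_grid[7], price_grid[7]]))
```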
Calvano et al. (2020)
Firms’ strategies converge to supra-competitive prices.
Calvano et al. (2020)
If one firm unilaterally deviates, it gets higher profits.
Calvano et al. (2020)
But the deviating firm gets punished afterwards.
Comment
How can we prevent collusion?
Harrington (2018), Ezrachi & Stucke (2017): sufficient to have the policy not depend on competitors’ actions
Why? It would be sufficient to look at the algorithm’s inputs to determine the risk of collusion.
At least in the stylized environment of Calvano et al. (2020)
What if the algorithm’s strategy was based only on its own past action?
Firms condition their strategies only on their own past actions
They learn from competitors’ actions indirectly, through payoffs
However, the policies Q do not depend on the rival’s action
Blind State Space
Firms’ prices depend only on their own previous period price.
Outcome: firms set supra-competitive prices. How? Self-punishment.
Blind State Space
The rival does not observe the deviation and hence does not change its actions.
Why is it an equilibrium? How does the algorithm learn such a strategy?
Discussion
Why is it an equilibrium?
Algorithm in exploitation mode only tests one-shot deviations
Unable to learn completely new strategies
How does the algorithm learn such a strategy?
In the exploration phase, the algorithm tests any strategy
High prices + self-punishment is more profitable than the Nash equilibrium
Is it collusive behavior?
It is a reward–punishment scheme with the sole intent of keeping supracompetitive prices
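A stylized illustration of such a scheme on a blind state space, where the policy reads only the firm’s own previous price; the price levels are hypothetical, not learned values from the simulations.

```python
# Stylized blind policy: the state is ONLY the firm's own previous price.
HIGH, LOW = 1.8, 1.2   # hypothetical supra-competitive and punishment prices

def blind_policy(own_last_price: float) -> float:
    if own_last_price == HIGH:
        return HIGH    # reward phase: keep the supra-competitive price
    if own_last_price == LOW:
        return HIGH    # punishment served: return to the supra-competitive price
    return LOW         # own deviation detected in own state: self-punish

# Path after a one-shot deviation to 1.5: 1.5 -> LOW -> HIGH -> HIGH -> ...
# The short-run deviation gain is offset by the self-inflicted LOW period,
# so under the one-shot-deviation test the deviation is not profitable.
```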
Detecting Collusion
Detecting Collusion
1. Inspecting and understanding the algorithm might be hard
2. As in the previous section, even algorithms independent from competitors’ strategies can learn to collude
3. Detecting collusion from observational data is hard
“Absent a natural experiment or counterfactual, enforcers may not readily discern whether (and why) the market price is too high. Is it the result of artificial intervention or natural supply and demand dynamics?” (Ezrachi & Stucke, 2019)
Proposal: re-train algorithms unilaterally
Intuition 1: synchronous learning is essential for collusion
Intuition 2: if algorithms are able to learn to collude, they should be able to learn to exploit collusion.
Re-training Algorithms
Procedure
Stop algorithms as soon as convergence is achieved
Select one of the algorithms and reset its exploration/exploitation counter
Iterate until convergence (see the sketch after the list of cases below)
I analyze three cases
1. Replication of Calvano et al. (2020), conditional on collusion
2. Blind algorithms presented in the previous section
3. Replication of Calvano et al. (2020), conditional on non-collusion
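Before turning to the results, a minimal sketch of the re-training procedure, reusing the q_learning sketch above; the two-agent environment interface env.step(s, a_i, a_j) -> (pi, s_next) and all names are assumptions for illustration.

```python
def retrain_unilaterally(env, Q_own, Q_rival):
    """Unilateral re-training test: freeze the rival at its converged
    greedy policy, reset the re-trained agent's exploration clock (t = 0),
    and run Q-learning again until convergence."""
    frozen = Q_rival.copy()

    def payoffs(s, a):
        a_rival = int(frozen[s].argmax())   # rival keeps playing its converged policy
        return env.step(s, a, a_rival)      # assumed interface: (pi, s_next)

    # Restarting q_learning resets the e^(-beta t) schedule, i.e. the
    # exploration/exploitation counter of the selected algorithm.
    return q_learning(payoffs, env.n_states, env.n_actions)
```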
Re-training
1. Replication of Calvano et al. (2020), conditional on collusion
Outcome: a consistent pattern of profitable, more competitive deviations.
Re-training
2. Blind algorithms presented before.
Outcome: a consistent pattern of profitable, more competitive deviations.
Re-training
3. Replication of Calvano et al. (2020), conditional on non-collusion.
Outcome: no deviation from competitive outcome.
Discussion
Results seem to suggest that coordination in learning is crucial to achieve collusion
Re-training could constitute a model-free test to detect algorithmic collusion
But it’s costly, slow, and hard to enforce
Can we achieve the same outcome without re-training on new data?
Re-training using observational data
Idea: re-train from past observational data.
Take observational data
Compute payoffs from the k most recent observed data points
$\pi_i(s, a) = \frac{1}{k} \sum_{t=T-k}^{T} \pi_{i,t}(s, a) \quad \forall s, a, i$ (1)
Compute state transitions from the k most recent observed data points
$s'_i(s, a) = \frac{1}{k} \sum_{t=T-k}^{T} s'_{i,t}(s, a) \quad \forall s, a, i$ (2)
Given $\pi_i(s, a)$ and $s'_i(s, a)$, compute $Q_i(s, a)$
Observe resulting behavior
We cover the same three settings as in the previous section.
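A minimal sketch of this offline step, assuming pi_bar and s_bar are the n_states × n_actions arrays from equations (1) and (2), with averaged next states rounded back to grid indices. Q is recovered by value iteration on the estimated model rather than by new experimentation; this particular solution method is an assumption, not stated on the slides.

```python
import numpy as np

def offline_q(pi_bar, s_bar, delta=0.95, tol=1e-10):
    """Solve Q(s, a) = pi_bar(s, a) + delta * max_a' Q(s_bar(s, a), a')
    by value iteration on the model estimated from observational data."""
    next_state = np.rint(s_bar).astype(int)   # averaged next states rounded to indices
    Q = np.zeros_like(pi_bar)
    while True:
        V = Q.max(axis=1)                     # continuation value of each state
        Q_new = pi_bar + delta * V[next_state]
        if np.abs(Q_new - Q).max() < tol:
            return Q_new
        Q = Q_new
```

The resulting behavior is then read off as the greedy policy arg max_a Q(s, a).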
Re-training
1. Replication of Calvano et al. (2020), conditional on collusion.
Outcome: a slight undercut paired with higher profits, but results are noisy.
Re-training
2. Blind algorithms presented before.
Outcome: a slight undercut paired with higher profits, but results are noisy.
Re-training
3. Replication of Calvano et al. (2020), conditional on non-collusion.
Outcome: non-collusion unaffected by re-training.
Conclusion
I have shown that
1. Even if one could inspect the algorithm’s strategy, collusive strategies that do not depend, directly or indirectly, on the rival’s behavior are possible
2. If algorithms were colluding, unilaterally re-training one of them leads to more competitive strategies
Thank you!
Bibliography
Asker, J., Fershtman, C., & Pakes, A. (2021). Artificial intelligence and pricing: The impact of algorithm design (Tech. Rep.). National Bureau of Economic Research.
Assad, S., Clark, R., Ershov, D., & Xu, L. (2020). Algorithmic pricing and competition: Empirical evidence from the German retail gasoline market.
Autorité de la concurrence & Bundeskartellamt. (2019). Algorithms and competition.
Ballard, D. I., & Naik, A. S. (2017). Algorithms, artificial intelligence, and joint conduct. CPI Antitrust Chronicle, 29–35.
Calvano, E., Calzolari, G., Denicolò, V., & Pastorello, S. (2019). Algorithmic pricing: What implications for competition policy? Review of Industrial Organization, 55(1), 155–171.
Calvano, E., Calzolari, G., Denicolò, V., & Pastorello, S. (2020). Artificial intelligence, algorithmic pricing, and collusion. American Economic Review, 110(10), 3267–97.
Capobianco, A., & Gonzaga, P. (2017). Algorithms and competition: Friends or foes? Competition Policy International, 1, 2.
Chen, L., Mislove, A., & Wilson, C. (2016). An empirical analysis of algorithmic pricing on Amazon Marketplace. In Proceedings of the 25th International Conference on World Wide Web (pp. 1339–1349).
Deng, A. (2017). When machines learn to collude: Lessons from a recent research study on artificial intelligence. Available at SSRN 3029662.
Ezrachi, A., & Stucke, M. E. (2017). Artificial intelligence & collusion: When computers inhibit competition. U. Ill. L. Rev., 1775.
Ezrachi, A., & Stucke, M. E. (2019). Sustainable and unchallenged algorithmic tacit collusion. Nw. J. Tech. & Intell. Prop., 17, 217.
Fershtman, C., & Pakes, A. (2012). Dynamic games with asymmetric information: A framework for empirical work. The Quarterly Journal of Economics, 127(4), 1611–1661.
Gal, M. S. (2017). Algorithmic facilitated coordination: Market and legal solutions. CPI Antitrust Chronicle, 1–7.
Hansen, K. T., Misra, K., & Pai, M. M. (2021). Frontiers: Algorithmic collusion: Supra-competitive prices via independent algorithms. Marketing Science.
Harrington, J. E. (2018). Developing competition law for collusion by autonomous artificial agents. Journal of Competition Law & Economics, 14(3), 331–363.
Igami, M. (2020). Artificial intelligence as structural estimation: Deep Blue, Bonanza, and AlphaGo. The Econometrics Journal, 23(3), S1–S24.
Klein, T. (2019). Autonomous algorithmic collusion: Q-learning under sequential pricing. Amsterdam Law School Research Paper (2018-15), 2018–05.
Mehra, S. K. (2015). Antitrust and the robo-seller: Competition in the time of algorithms. Minn. L. Rev., 100, 1323.
OECD. (2017). Algorithms and collusion: Competition policy in the digital age.
Okuliar, A., & Kamenir, E. (2017). Pricing algorithms: Conscious parallelism or conscious commitment? Competition Policy International, 1–4.
Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.
UK Competition and Markets Authority. (2021). Algorithms: How they can reduce competition and harm consumers.