

This article was downloaded by: [8.9.237.33] On: 06 March 2020, At: 11:24
Publisher: Institute for Operations Research and the Management Sciences (INFORMS)
INFORMS is located in Maryland, USA

Management Science

Publication details, including instructions for authors and subscription information: http://pubsonline.informs.org

Can Reputation Discipline the Gig Economy? Experimental Evidence from an Online Labor Market
Alan Benson, Aaron Sojourner, Akhmed Umyarov

To cite this article:
Alan Benson, Aaron Sojourner, Akhmed Umyarov (2019) Can Reputation Discipline the Gig Economy? Experimental Evidence from an Online Labor Market. Management Science

Published online in Articles in Advance, 12 Sep 2019.

https://doi.org/10.1287/mnsc.2019.3303

Full terms and conditions of use: https://pubsonline.informs.org/Publications/Librarians-Portal/PubsOnLine-Terms-and-Conditions

This article may be used only for the purposes of research, teaching, and/or private study. Commercial use or systematic downloading (by robots or other automatic processes) is prohibited without explicit Publisher approval, unless otherwise noted. For more information, contact [email protected].

The Publisher does not warrant or guarantee the article’s accuracy, completeness, merchantability, fitness for a particular purpose, or non-infringement. Descriptions of, or references to, products or publications, or inclusion of an advertisement in this article, neither constitutes nor implies a guarantee, endorsement, or support of claims made of that product, publication, or service.

Copyright © 2019, The Author(s)

Please scroll down for article—it is on subsequent pages

With 12,500 members from nearly 90 countries, INFORMS is the largest international association of operations research (O.R.) and analytics professionals and students. INFORMS provides unique networking and learning opportunities for individual professionals, and organizations of all types and sizes, to better understand and use O.R. and analytics tools and methods to transform strategic visions and achieve better outcomes. For more information on INFORMS, its publications, membership, or meetings visit http://www.informs.org

MANAGEMENT SCIENCE
Articles in Advance, pp. 1–24

http://pubsonline.informs.org/journal/mnsc ISSN 0025-1909 (print), ISSN 1526-5501 (online)

Can Reputation Discipline the Gig Economy? Experimental Evidence from an Online Labor Market
Alan Benson,a Aaron Sojourner,a Akhmed Umyarova

aCarlson School of Management, University of Minnesota, Minneapolis, Minnesota 55455
Contact: [email protected], http://orcid.org/0000-0003-4256-3357 (AB); [email protected], http://orcid.org/0000-0001-6839-2512 (AS); [email protected], http://orcid.org/0000-0003-3731-9093 (AU)

Received: July 28, 2017
Revised: July 3, 2018; December 17, 2018
Accepted: December 28, 2018
Published Online in Articles in Advance: September 12, 2019

https://doi.org/10.1287/mnsc.2019.3303

Copyright: © 2019 The Author(s)

Abstract. Just as employers face uncertainty when hiring workers, workers also face uncertainty when accepting employment, and bad employers may opportunistically depart from expectations, norms, and laws. However, prior research in economics and information sciences has focused sharply on the employer’s problem of identifying good workers rather than vice versa. This issue is especially pronounced in markets for gig work, including online labor markets, in which platforms are developing strategies to help workers identify good employers. We build a theoretical model for the value of such reputation systems and test its predictions on Amazon Mechanical Turk, on which employers may decline to pay workers while keeping their work product and workers protect themselves using third-party reputation systems, such as Turkopticon. We find that (1) in an experiment on worker arrival, a good reputation allows employers to operate more quickly and on a larger scale without loss to quality; (2) in an experimental audit of employers, working for good-reputation employers pays 40% higher effective wages because of faster completion times and lower likelihoods of rejection; and (3) exploiting reputation system crashes, the reputation system is particularly important to small, good-reputation employers, which rely on the reputation system to compete for workers against more established employers. This is the first clean field evidence of the effects of employer reputation in any labor market and is suggestive of the special role that reputation-diffusing technologies can play in promoting gig work, in which conventional labor and contract laws are weak.

Open Access Statement: This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License. You are free to download this work and share with others commercially or noncommercially, but cannot change in any way, and you must attribute this work as “Management Science. Copyright © 2019 The Author(s). https://doi.org/10.1287/mnsc.2019.3303, used under a Creative Commons Attribution License: https://creativecommons.org/licenses/by-nd/4.0/.”

History: Accepted by Chris Forman, information science.
Funding: The authors thank the University of Minnesota Social Media and Business Analytics Collaborative for funding.

Supplemental Material: Data are available at https://doi.org/10.1287/mnsc.2019.3303.

Keywords: online labor markets • online ratings • employer reputation • labor markets • economics of information systems • job search • electronic markets and auctions • IT policy and management • contracts and reputation • information and market efficiency

1. Introduction
Amazon Mechanical Turk (M-Turk), TaskRabbit, Upwork, Uber, Instacart, DoorDash, and other online platforms have drastically reduced the cost of seeking, forming, and terminating work arrangements. This development has raised the concern that platforms circumvent regulations that protect workers. A U.S. Government Accountability Office (2015, p. 22) report notes that such “online clearinghouses for obtaining ad hoc jobs” are attempting “to obscure or eliminate the link between the worker and the business . . . which can lead to violations of worker protection laws.” Developers of these online platforms argue that ratings systems can help discipline employers and other trading partners that break rules and norms. To what extent can the threat of a bad public reputation prevent employer opportunism and the promise of a good reputation encourage responsible employer behavior? To what extent can workers voluntarily aggregate their private experiences into shared memory to discipline opportunistic employers? These questions are especially important to gig jobs and online labor platforms, which are characterized by high-frequency matching of workers and employers, and on which platform owners have moved fast and often sought to break out of traditional regulatory regimes.

Our setting focuses on M-Turk, which possesses two useful and rare features for this purpose. First, no authority (neither the government nor the platform) disciplines opportunistic employers. In M-Turk, after workers put forth effort, employers may keep the work product but refuse payment for any reason or no reason. Workers have no contractual recourse or appeal process. Because M-Turk also has no native tool for rating employers, many workers use third-party platforms to help them find the best employers. Among the most popular of these platforms is Turkopticon, which allows workers to share information about employers. Second, such opportunism is observable to researchers with some effort. In traditional labor markets, workers would presumably rely on an employer’s reputation to protect themselves against events not protected by a contract, such as promises for promotion or paying for overtime. Unfortunately, these outcomes are especially difficult to observe for both courts and researchers alike. The literature on both traditional and online labor markets has instead focused on learning about opportunism by workers, not by employers (e.g., List 2006, Oyer and Schaefer 2011, Moreno and Terwiesch 2014, Pallais 2014, Stanton and Thomas 2015, Farronato et al. 2018, and Filippas et al. 2018). These two contextual features enable our empirical study.

We begin by outlining a model of employer reputation. The model allows for two forms of employer reputation: a public reputation disseminated by a formal rating system and an informal reputation governed by an individual employer’s private visibility. The model yields three key, testable predictions regarding the value of reputation. First, employers with better reputations are rewarded through an enhanced ability to attract workers at a given promised payment level, and in this sense, a better reputation acts as collateral against future opportunism. Second, workers earn more when they have information that enables them to work only for better-reputation employers. Third, the value of the public reputation system to employers depends on their visibility: less-visible employers with good reputations on the system rely on it to attract workers, and better-known, good-reputation employers are less reliant. This third prediction has important implications for how a credible reputation system changes what types of trading relationships a market with poor enforcement can bear.

Interpreting this prediction, less well-known employers, such as newer or smaller firms, struggle more to earn workers’ trust to recruit workers and grow. Credible reputation systems especially help these less-visible employers and, thereby, support the entry of firms from the competitive fringe and promote economic competition and dynamism. Our model is consistent with the digitization literature’s findings that rating systems are especially important to relatively unknown agents and not as important to agents with an otherwise highly visible reputation. Luca (2016) finds that Yelp.com reviews are more important to smaller, independent restaurants than to restaurant chains and, indeed, that Yelp penetration in an area is associated with the decline in chains that presumably rely on other forms of reputation. Nagaraj (2016) finds that digitization of magazine articles has a greater effect on Wikipedia entries for less-known individuals than for well-known individuals for whom other information is readily available.

In three tests, we provide, to our knowledge, the first clean field evidence of employer reputation effects in a labor market. These tests follow the model’s three predictions. Specifically, the first experiment measures the effect of employer reputation on the ability to recruit workers. We create 36 employers on M-Turk. Using a third-party employer rating site, Turkopticon, we endow each with (i) 8–12 good ratings, (ii) 8–12 bad ratings, or (iii) no ratings. We then examine the rate at which they can recruit workers to posted jobs. We find that employers with good reputations recruit workers about 50% more quickly than our otherwise-identical employers with no ratings and 100% more quickly than those with very bad reputations. Using M-Turk wage elasticities estimated by Horton and Chilton (2010), we estimate that posted wages would need to be almost 200% greater for bad-reputation employers and 100% greater for no-reputation employers to attract workers at the same rate as good-reputation employers. Outside of M-Turk, one might think of the attractiveness of the job as the firm’s ability to attract applicants and reputation as a substitute for wage for that purpose. A better reputation shifts the firm’s labor supply curve to the right, allowing it to recruit and select from more workers for a given wage offer in a given period of time. We also estimate that about 55% of job searchers use Turkopticon, suggesting that more complete adoption would magnify these effects. We find evidence that Turkopticon is signaling employer characteristics rather than just task characteristics. These results demonstrate that workers use reputation to screen employers and that reputation affects employers’ ability to recruit workers.

The second experiment tests the validity of online reputations from the perspective of a worker. Reputation systems based on individual ratings are vulnerable to inflation and inaccuracy (Filippas et al. 2018). We act as a blinded worker to assess the extent to which other workers’ public employer ratings reflect real variation in employer quality. One research assistant (RA) randomly selects tasks from employers who have good reputations, bad reputations, or no reputation and sends each job to a second RA, who does the jobs while blind to employers’ reputations. This experimental feature ensures that worker effort is independent of employer reputation, and so any observed differences in subsequent employer behavior toward the worker are not a result of differences in worker effort. Consistent with a pooling equilibrium, we observe no difference in the average per-task payment promised up front by employers of different reputations. However, effective wages while working for good-reputation employers are 40% greater than effective wages while working for bad-reputation employers. We decompose this difference into the shares resulting from differences in job quality (how long the job takes) versus the probability of being paid at all. This experiment shows that, although the reputation system aggregates ratings that are all voluntarily provided and unverified, it contains useful information for workers.
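To make the decomposition concrete, consider a minimal sketch in Python. All numbers below are invented for illustration (the text above reports only the aggregate 40% gap): the effective wage is promised pay times the probability of payment divided by time worked, so the log gap splits additively into a completion-time term and a payment-probability term.

```python
# Hypothetical illustration of the effective-wage decomposition.
# Promised pay, rejection risk, and task durations are invented numbers;
# only the rough size of the gap mirrors the 40% figure reported above.
import math

def effective_wage(promised_pay, p_paid, hours):
    """Expected earnings per hour: pay times payment probability over time spent."""
    return promised_pay * p_paid / hours

# Same promised per-task pay for both employer types (the pooling result),
# but different completion times and rejection risks.
good = effective_wage(promised_pay=0.50, p_paid=0.98, hours=0.10)
bad  = effective_wage(promised_pay=0.50, p_paid=0.90, hours=0.13)

gap = good / bad - 1          # relative effective-wage advantage, ~0.42 here

# The log gap decomposes additively into a duration term and a payment term;
# promised pay cancels because it is identical across the two employer types.
log_gap = math.log(good) - math.log(bad)
duration_part = math.log(0.13) - math.log(0.10)   # faster completion
payment_part  = math.log(0.98) - math.log(0.90)   # lower rejection risk
assert abs(log_gap - (duration_part + payment_part)) < 1e-9
```

Because promised per-task pay does not vary with reputation, it drops out of the gap, leaving only the duration and rejection-risk components.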

Finally, we focus on instances when Turkopticon servers stopped working by using a difference-in-differences design to study effects when the reputation system is removed temporarily. We match data on M-Turk tasks completed across employers over time from Ipeirotis (2010a) with each employer’s contemporaneous Turkopticon ratings and compare the change in worker arrival rates for different types of employers when Turkopticon crashes. When Turkopticon crashes, employers with bad reputations are largely unaffected. Taken with the prior evidence, this result suggests that employers with bad reputations do not recruit workers who use Turkopticon and rely only on uninformed workers. However, the effect on employers with good reputations on Turkopticon is heterogeneous by employer visibility, measured as the number of times an employer posted work in the past and proxying for the likelihood that a worker would have encountered it before. Workers sharply withdraw their labor supply from less-visible, good-reputation employers, who presumably were benefiting from Turkopticon informing workers of their good reputations. In contrast, worker arrival rates increase for more-visible, good-reputation employers, who are presumably already better known as safe bets.
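The crash design can be read as a textbook difference-in-differences comparison. A stylized sketch on simulated data (group labels, arrival rates, and the planted crash effect are all our own invention, not the paper’s estimates):

```python
# Stylized two-period difference-in-differences on simulated data.
# "Treated" = less-visible, good-reputation employers, who lose the
# reputation system's endorsement when Turkopticon crashes.
# All rates and the planted -3.0 crash effect are invented.
import random

random.seed(0)

def arrival_sample(base_rate, crash_effect, crashed, n=500):
    """Simulated worker-arrival observations around a mean rate."""
    rate = base_rate + (crash_effect if crashed else 0.0)
    return [random.gauss(rate, 1.0) for _ in range(n)]

def mean(xs):
    return sum(xs) / len(xs)

treated_pre  = mean(arrival_sample(10.0, -3.0, crashed=False))
treated_post = mean(arrival_sample(10.0, -3.0, crashed=True))
control_pre  = mean(arrival_sample(6.0,   0.0, crashed=False))
control_post = mean(arrival_sample(6.0,   0.0, crashed=True))

# The DiD estimate: the treated group's change net of the control group's change.
did = (treated_post - treated_pre) - (control_post - control_pre)
```

Under common trends, `did` recovers roughly the planted crash effect of −3 arrivals; in the actual design the comparison runs across many employers and crash episodes rather than one simulated pair of periods.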

2. Literature Review
Most markets have information problems to some degree. For M-Turk workers, Turkopticon is the Dun & Bradstreet of procurers, the Moody’s of bond buyers, the Fair Isaac of consumer lenders, and the Metacritic of moviegoers. Each of these institutions offers extralegal protections against contractual incompleteness based on information sharing and the implicit threat of coordinated withdrawal of trade by one side of a market.

A large literature studies unilateral ratings of suppliers in online markets. In principle, ratings should be unilateral when suppliers (including workers, sellers of goods, and service providers) vary in ways that are difficult to contract upon, creating problems for buyers. In contrast, suppliers do not face many issues in choosing customers; buyers are essentially money on the barrel. For example, prior work (e.g., Dellarocas and Wood 2008 and Nosko and Tadelis 2015) has studied sellers on eBay, on which buyers rate sellers on the accuracy of product descriptions and timeliness, but sellers only care that buyers submit payment. A similar case could be made for Yelp (e.g., Luca 2016), on which the quality of the food and service is difficult to contract upon, but restaurants serve all comers.

Although the vast majority of work has concerned unilateral rating regimes, Airbnb (e.g., Fradkin et al. 2015) and Uber (e.g., Rosenblat et al. 2015) offer two major exceptions: Airbnb hosts and guests rate each other, as do Uber drivers and passengers. Although both sides value the ease of platform matching and microcontracting, both also worry about opportunistic behavior by counterparties. These platforms try to create healthy markets while avoiding regulation by developing two-sided online ratings systems. The digital labor platform Upwork has also used a two-sided rating system, but research there has not focused on employer reputation.

Issues of trust and contract enforcement among trading partners are especially important to online spot markets and other forms of gig work. The nature of the employment relationship, especially in gig work, features bilateral uncertainty. Despite this, the literature on both traditional and online labor has largely focused on the employer’s problem of evaluating and rating employees rather than vice versa. Labor economics has given great attention to how employers interpret educational credentials, work experience, or their experience at that firm to identify the most productive workers (for a review, see Oyer and Schaefer 2011).

This focus on guarding against worker opportunism now extends to the burgeoning literature on online labor markets. Studies, for instance, have examined how online employers infer workers’ abilities from their work histories (Pallais 2014), oversubscription (Horton 2019), work through outsourcing agencies (Stanton and Thomas 2015), national origin (Agrawal et al. 2013), or platform endorsements (Barach et al. 2019). Although these studies present a sample of the recent literature on online labor markets, they are characteristic of its present focus on rating workers rather than firms. As such, the literature offers little evidence on the risks that workers face when selecting employers or how platforms can design systems that promote trade absent regulation.

Our work builds on a handful of studies that have sought to empirically examine employer reputation. Turban and Cable (2003) provide the first correlational evidence that companies with better reputations tend to attract more applicants using career-services data from two business schools. Hannon and Milkovich (1996) find mixed evidence that news of prominent employer rankings affects stock prices. Using a similar methodology, Chauvin and Guthrie (1994) find small but significant effects. Brown and Matsa (2015) find that distressed financial firms attract fewer and lower-quality applicants. List and Momeni (2017), also in an experiment on M-Turk, find that employers that promise to donate wages to a charity attract more work, albeit with more cheating, suggesting that workers take moral license when performing work for good causes. As such, the prior evidence on employer reputation is either correlational or concerns uncertainty about other characteristics of the employer (e.g., the firm’s future success or altruism), which workers may value in themselves or because they also view these as correlated with their quality as an employer. We provide the first field experimental evidence regarding how employers’ reputation for treating workers affects their ability to attract work.

Theoretically, a better public reputation would allow trading partners to extract greater work or higher prices (Klein and Leffler 1981). Moreno and Terwiesch (2014) find that online service providers leverage better reputations to either charge more or increase their probability of being selected for a project. Ba and Pavlou (2002), in a market “similar to eBay,” find that buyers are willing to pay a price premium to deal with trusted sellers. Similarly, Banerjee and Duflo (2000) find that supplier reputation is important in the Indian software market, in which postsupply service is important but difficult to contract. McDevitt (2011) finds evidence that residential plumbing firms with high records of complaints are more likely to change their name, suggesting that firms seek to purge bad reputations. However, in their study of eBay sellers, Bajari and Hortacsu (2003) find only a small effect of reputation on prices. Although these studies once again focus on rating the supply side, our theory and empirics speak to the demand side by presenting evidence that employers with good reputations can extract lower prices and greater quantities of work.

We also consider how public ratings of employers substitute for more traditional markers of being an established organization. Stinchcombe (1965) famously theorized that new organizations face a credibility problem when attracting trading partners, a phenomenon he refers to as the “liability of newness.” Luca (2016) offers perhaps the most closely related test for the relative importance of online ratings versus being established; he finds that Yelp ratings are more important to independent restaurants than to chains. As we discuss, our setting offers a relatively clean opportunity to exploit exogenous removals of a reputation system, to do so for the demand side, and to do so in a labor market.

3. Theory and Hypotheses
We offer a formal model of job search in which there is no contract enforcement. Under certain conditions, a “public relational contract” emerges, whereby the threat of losing future workers discourages employer opportunism. In doing so, our model illustrates the insight from List (2006), that reputational concerns can lead economic agents to act in a manner that appears more prosocial and fair. We explore how workers and different kinds of employers rely on an employer-reputation system and test the model’s hypotheses in the online market.

The model begins from a basic sequential search model. Workers search for a job to receive a wage offer from a random employer and choose whether to accept each offer and put forth effort. Then, the employer chooses ex post whether to renege on payment. Workers’ decision to work or to forego an offer depends on their beliefs as to whether the employer will pay, which depends on two exogenous factors: whether the worker is informed by a reputation system and whether the employer is visible.1 Specifically, there are two types of workers: workers informed by the reputation system see all employers’ past payment histories, and uninformed workers do not. Being informed, in this sense, signifies access to the collective experience of prior workers as would be diffused by a public reputation system. Second, uninformed workers may nonetheless observe the pay history of an offering employer with a probability that depends on a property of the employer: its visibility.2 The visibility of an employer can be thought of simply as it is treated in the model: an employer characteristic that makes its history known to workers even without the reputation system. Perhaps the most obvious empirical proxy for visibility is simply the size of the employer, which, by definition, signifies the extent of workers’ past experience with the employer.

Although the single-shot model would devolve into the classic hold-up problem in which employers never pay and workers never put forth effort, we characterize an interesting, nonunique, steady-state equilibrium in which an employer’s good reputation acts as collateral against future wage theft. They continue to pay so that they can attract workers and get work done in the future. We explore how the existence of the public reputation system (informed workers) differentially affects highly visible and less-visible employers. Employers with a good reputation (“high-road employers”) continue to pay as long as the share of informed workers is sufficiently high, and employers with a bad reputation (“low-road employers”) never pay.3 We describe conditions that avoid making high-road employers’ reneging temptation too great, which would then cause all workers to exit the labor market.


This temptation is disciplined by the expected future flow of workers, which depends on the extent of workers’ ability to identify high-road employers. As a result, we endogenously characterize the scope of economic activity that this environment can bear and how it depends on the ability of workers to screen employers through either a public reputation system or private knowledge.4

The model makes three key claims. First, the public reputation system deters employer opportunism by creating a credible threat that employers’ ability to attract workers in the future will erode (tested in study 1). Good-reputation employers can attract more workers to the same job offer than either no-reputation or bad-reputation employers; a better reputation shifts the labor supply curve to the right. Second, it is incentive-compatible for workers to screen out low-road employers in their job search, a strategy that boosts workers’ hourly earnings (study 2). Third, the reputation system is a substitute for the visibility of individual employers. The reputation system matters especially for smaller, less-visible high-road employers, whose good payment histories would otherwise be unknown to job seekers (study 3).

Formally, assume a job search environment with measure one of workers indexed by i ∈ [0, 1] and measure one of risk-neutral employers indexed by j ∈ [0, 1]. A share, s ∈ [0, 1], of employers have a high-road history of making promised payments. Workers with i ≤ p ∈ (0, 1) are fully informed of all employers’ past play via a public reputation system. Employers differ in their history’s visibility vj ∈ (0, 1) to other workers, and high values of vj represent “highly visible” employers. Let v ≡ E[vj]. Workers who are indifferent between accepting and rejecting offers choose to accept. Employers indifferent between paying and reneging choose to pay. Normalizing, we assume each earns zero if they do not participate in this market (e.g., nonwork or the labor market characterized by this environment). The timing of a period of job search is:

1. Worker i chooses whether to search. Those who do incur cost c and receive a wage promise w from a random employer j. Fully informed workers observe j’s past decisions to pay or renege. Other workers observe employer j’s history with probability vj. Nonsearching workers receive zero and proceed to the next period of job search. Zero represents the value of not participating in the online labor market.

2. Worker i decides whether to accept or reject employer j’s offer. If the worker accepts, the worker incurs a cost of effort e, and employer j receives work product with value y.5 If the worker rejects, the worker receives zero and proceeds to the next period of job search.

3. Employer j decides whether to pay w or to renege and pay zero. Employers discount future periods at rate δ.

We provide a set of parametric restrictions, prove the existence of an equilibrium, and explore its properties. First, promised wages exceed the value of work effort: w − e > 0. This is a trivial precondition for anyone wanting to work. Second, an uninformed worker will not accept an unknown employer’s offer because the expected payoff for doing so does not justify the certain cost of effort, sw < e. This assures not everyone works for anyone. Third, for uninformed workers, promised wages (w) and the chance of being matched to a visible, high-road employer (vs) times its payoff (w − e) exceed the certain cost of search, vs(w − e) ≥ c. The uninformed worker’s market participation constraint is met. It implies a lower bound on wage, wmin ≡ e + (vs)−1c, necessary to keep uninformed workers from dropping out of the market. It increases in the cost of effort and search and decreases in the share of high-road employers and average employer visibility.

The fourth parameter restriction is less trivial and more interesting. For high-road employers, it requires that the gains from high-road trade (y − w), employer farsightedness δ, and the flow of workers (p + v − pv) outweigh the one-time payoff of reneging (y) and continuing as a low-road employer: (1 − δ)−1(p + v − pv)(y − w) ≥ y. We refer to this as the “reputation diffusion” criterion, noting that an employer’s past play must be observed either through worker informedness or employer visibility. As both p and v approach zero, workers can’t screen employers well enough, and employers don’t extract sufficient rents on a good reputation to justify maintaining a high-road reputation.6 Otherwise, high-road employers would renege, the value of market participation for all workers becomes negative, no work is performed, and the labor market unravels.

share of worker productivity kept as economic rent byhigh-road employers in this equilibrium.The reputation-diffusion criterion implies this share cannot fall be-low [(p + v − pv)−1] * (1 − δ). Better information forworkers, p or v rising to one, creates space for high-road employers to raise wages and to retain a smallershare of worker productivity because they have re-liable access to a larger share of workers. This propertyreflects the reputation system literature’s view thatbetter reputation systems can expand the scope ofeconomic activity that can be completed online byarms-length trading partners (Jøsang et al. 2007).First, consider workers who vary in their inform-

edness. Informed workers encounter a high-road em-ployer in any period with probability s. They accepthigh-road employers’ offers because w − e − c ≥ −c,

Benson, Sojourner, and Umyarov: Can Reputation Discipline the Gig Economy? Management Science, Articles in Advance, pp. 1–24, © 2019 The Author(s)

which follows from the assumption that trade is profitable: w − e > 0. They reject low-road employers’ offers because −c > −c − e. Therefore, the present value of this strategy for fully informed workers is (1 − δ)^(−1)[s(w − e) − c]. Uninformed workers encounter an employer with an observable pay history with probability v. In this case, they face the same incentives and make the same decisions as fully informed workers. Uninformed workers who encounter a nonvisible employer reject the offer because of the parameter restriction sw < e. The present value of this strategy to uninformed workers is (1 − δ)^(−1)[vs(w − e) − c]. Both informed and uninformed workers’ payoffs satisfy their participation constraint under the condition vs(w − e) − c ≥ 0.
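As a numeric illustration of these present values, the following sketch plugs hypothetical parameter values (all invented; the paper specifies no numbers) into the two worker payoffs and checks the parameter restrictions:

```python
# Worker present values in the reputation model (illustrative sketch).
# All parameter values below are hypothetical, chosen only to satisfy the
# restrictions w - e > 0, s*w < e, and v*s*(w - e) >= c.
delta = 0.9   # discount factor (farsightedness)
w = 1.0       # promised wage
e = 0.3       # cost of effort
c = 0.05      # per-period search cost
s = 0.25      # share of high-road employers
v = 0.5       # average employer visibility

# Present value for fully informed workers: (1 - delta)^-1 [s(w - e) - c]
pv_informed = (s * (w - e) - c) / (1 - delta)

# Present value for uninformed workers: (1 - delta)^-1 [v s (w - e) - c]
pv_uninformed = (v * s * (w - e) - c) / (1 - delta)

assert w - e > 0             # trade is profitable
assert s * w < e             # uninformed workers reject unknown employers
assert v * s * (w - e) >= c  # uninformed participation constraint holds
assert pv_informed >= pv_uninformed >= 0
```

Informedness is valuable here: with these numbers the informed worker's present value is well above the uninformed worker's, because the uninformed worker only matches with the share v of visible high-road employers.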

Next, consider employers, which vary in their visibility. Low-road employers are supplied no labor because workers only accept offers from revealed, high-road employers.7 High-road employers receive workers at a rate of p + (1 − p)vj, receiving all informed workers and the share vj of uninformed workers. In equilibrium, it is incentive-compatible for employers to pay if the present value of paying exceeds the immediate temptation of reneging, (1 − δ)^(−1)(p + vj − pvj)(y − w) ≥ y. This follows from the reputation-diffusion criterion discussed earlier. The high-road employer’s participation constraint is simply that y − w ≥ 0.
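The employer side can be sketched the same way; the parameter values here are again hypothetical:

```python
def high_road_ic(delta, p, vj, y, w):
    """Incentive compatibility for a high-road employer with visibility vj:
    the discounted stream of trade gains must beat the one-shot gain from
    reneging, (1 - delta)^-1 (p + vj - p*vj)(y - w) >= y."""
    flow = p + vj - p * vj          # arrival rate of workers
    return flow * (y - w) / (1 - delta) >= y

# Hypothetical numbers: a patient employer with decent visibility stays high-road...
print(high_road_ic(delta=0.9, p=0.5, vj=0.5, y=1.5, w=1.0))  # True
# ...but a myopic employer prefers to renege.
print(high_road_ic(delta=0.2, p=0.5, vj=0.5, y=1.5, w=1.0))  # False
```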

The first two hypotheses concern the model’s key predictions regarding why it is incentive-compatible for employers to maintain a good reputation and for workers to screen for jobs on a good reputation.

Hypothesis 1. For a given job offer, employers with a better reputation attract workers more quickly.

Hypothesis 2. Workers have higher average pay when working for employers with better reputations.

Our final hypothesis follows from an interesting property of the model: the reputation system that informs workers and employer visibility are substitutes. This is the key feature of the reputation-diffusion criterion, and it yields the surprising result that, when the share of informed-type workers is sent to zero (e.g., because of a crash of the reputation system), the only employers affected are those with good reputations and imperfect visibility.

Hypothesis 3. If the reputation system is disabled, sending the share of informed workers to zero, then (a) highly visible employers with good reputation lose a small or no share of workers, (b) less-visible employers with good reputation lose a larger share of workers, and (c) employers with bad reputation are unaffected regardless of visibility.
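A minimal numeric sketch of Hypothesis 3, using the model's worker-flow expression p + vj − pvj (the starting share p = 0.5 and the visibility values are invented for illustration):

```python
def worker_flow(p, vj):
    """Arrival rate of workers to a high-road employer with visibility vj,
    given share p of informed workers (from the model): p + vj - p*vj."""
    return p + vj - p * vj

# Disable the reputation system: p -> 0.
for vj in (1.0, 0.5, 0.0):
    print(vj, worker_flow(p=0.5, vj=vj), worker_flow(p=0.0, vj=vj))
# vj = 1.0: flow stays 1.0   -> highly visible employers lose nothing (H3a)
# vj = 0.5: 0.75 -> 0.50     -> less-visible employers lose more (H3b)
# vj = 0.0: 0.50 -> 0.00     -> invisible high-road employers lose everything
```

Low-road employers are supplied no labor in equilibrium either way, which is part (c).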

To see this, recall that the flow of workers to high-road employers is given by p + vj − pvj. As the share of

informed workers decreases (i.e., the employer reputation system crashes, sending p → 0), a high-road employer’s worker flow converges to vj. The most-visible employers (vj → 1) are unaffected, and the arrival rate to the least-visible employers (vj → 0) falls more. Put another way, the reputation system (the mass of informed workers) is least valuable to the most-visible employers and most valuable to the least-visible employers. In our model, a change in the reputation system p does not affect low-road employers: they get the same arrival rate regardless of informedness or visibility. The relative value of the reputation system to more-

versus less-visible employers is a chief contribution of this model, at least beyond labor markets. Conceptually, we can think of the reputation system as any technology that makes it less costly for one trading party to observe the past behavior of potential partners. In our setting, we observe relatively brief, unexpected instances in which the market’s public reputation system crashes. In these instances, workers who remain in the market must rely on other mechanisms to screen employers. Because our setting’s reputation system is hosted by a third party, its outages present a special opportunity to study how workers adapt and how this change in search behavior affects employers of varying visibility. Before continuing, it’s also important to note some

other interesting features of the model. One relates to the substitutability of informed workers and visible employers. As p → 1, v no longer affects search or the flow of workers; as v → 1, p no longer affects search or the flow of workers. In either case, workers always accept jobs from known high-road employers. In this sense, technologies that give workers a collective memory serve as a scalable substitute for the personal experience that markets have traditionally relied upon to avoid opportunistic trading partners. A second insight relates to the upper bound on the wage that this environment can bear before the market breaks down from high-road employer defection. Rearranging the reputation-diffusion criterion yields w ≤ y[1 − (1 − δ)(p + v − pv)^(−1)] ≡ wmax, which increases in worker informedness p and average employer visibility v. In other words, stronger public reputation systems (higher p) and better private sources of information (higher v) both increase the upper bound on wages that are supportable in the absence of enforceable contracts.
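The wage bound can be computed directly; the inputs below are hypothetical:

```python
def w_max(y, delta, p, v):
    """Upper bound on the wage supportable without enforceable contracts,
    from rearranging the reputation-diffusion criterion:
    w <= y * (1 - (1 - delta) / (p + v - p*v))."""
    return y * (1 - (1 - delta) / (p + v - p * v))

# Hypothetical values: better information (higher p or v) raises the bound.
base = w_max(y=1.0, delta=0.9, p=0.5, v=0.5)           # worker flow 0.75
more_informed = w_max(y=1.0, delta=0.9, p=0.9, v=0.5)  # worker flow 0.95
assert more_informed > base
```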

4. Setting
4.1. M-Turk
M-Turk is an online labor market that allows employers (these purchasers of labor are called requesters) to crowdsource human intelligence tasks (HITs) to workers over a web browser. Common HITs


include audio transcription, image recognition, text categorization, and other tasks not easily performed by machines. M-Turk allows employers to process large batches of HITs with greater flexibility and at generally much greater speed and lower cost than traditional employment.

Amazon does not generally publish detailed usage statistics; however, in 2010, it reported that more than 500,000 workers from more than 190 countries were registered on M-Turk.8 Panos Ipeirotis’s web crawler found that the number of available HITs fluctuated between 200,000 and 800,000 from January to June 2014.9 Ross et al. (2009) found that a majority of workers were female (55%) and from the United States (57%) or India (32%). Horton and Chilton (2010) estimated that the median reservation wage was $1.38 an hour. M-Turk’s platform revenue comes from 10% brokerage fees paid by employers.

M-Turk specifically has many features that make it attractive for studying how workers navigate employer heterogeneity using public employer reputation.10 First, there is no variation in the terms of contracts. In most labor markets, relationships embody a mix of enforceable and unenforceable elements, and the nature of the mix is unknown to the researcher; observed differences between employers may reflect differences in workers’ contracts and access to legal recourse. On M-Turk, workers put forth effort, employers acquire the work product, and then employers choose whether to pay workers. Employers may refuse payment for any reason or no reason, and workers have no contractual recourse. This complete lack of contract enforcement is rare and valuable for research, although potentially maddening for workers. Here, one can be sure that all employer behavior is discretionary and performed absent the possibility of enforcement. Second, M-Turk does not have a native employer-reputation system, a feature it shares with off-line labor markets but not with other online labor markets. This also proves useful by allowing us to decouple worker effort from employer reputation in the audit study.

To help avoid employer opportunism, many M-Turk workers use Turkopticon, a third-party browser plugin that allows workers to review and screen employers (Silberman and Irani 2016). There are several reasons these ratings may be uninformative. First, the system is unnecessary if workers face no information or enforcement problems. Second, the system relies on workers voluntarily contributing accurate, private information to a common pool, which costs time and directs other workers to scarce, high-paying tasks. This distinguishes labor markets from consumer markets, in which trade is nonrival. Third, rating systems vary widely in their informativeness because of reputation inflation and other issues (Nosko and Tadelis 2015,

Horton and Golden 2015). Anyone can post any review on Turkopticon. It has no revenue and is maintained by volunteers. When an employer posts a task, it appears to

workers on a list of available tasks. This list specifies a short description of the task, the number of tasks available in the batch, the promised pay per task, the time allotted for workers to complete the task once they accept it, and the name of the employer. The employer may also restrict eligibility to workers with a sufficiently high approval rating, which requires a history of having submitted work approved and paid for by past employers. Workers may preview the task before accepting. Upon acceptance, a worker has the allotted time to submit the task. The employer then has a predetermined period to approve or reject the task, with or without an accompanying note. If the employer approves the task, the employer pays the posted rate and broker fees to Amazon. The conditions for approval are not contractible; if the employer rejects the task, the worker’s submitted work remains in the employer’s possession, but no payment is made. Moreover, the worker’s approval rate declines, reducing the worker’s eligibility for other tasks in the future. There is no process for appealing a rejection. Opportunism takes many forms in this market.

Employers may disguise wage theft by posting unpaid trial tasks, implicitly promising that workers who submit work matching a known, correct answer will receive work for pay when, in fact, the trial task is the task itself, and the employer rejects all submitted work as defective. In addition to nonpayment, employers may advertise that a task should take a set amount of time when it is likely to take much longer. Therefore, although the promised pay for accepted submissions is known, the effective wage rate, which depends on the time it takes to complete the task, is not. Employers can also delay accepting submitted work for up to 30 days. Employers may or may not communicate with workers. Employers also differ in how generous they are to workers: some do little quality control and pay for all work regardless of quality, or pay well for very easy tasks.

4.2. Reputation on M-Turk
Within M-Turk, there is no tool allowing workers to review employers, and workers cannot observe employers’ effective wages or payment histories. However, several third-party resources allow workers to share information voluntarily regarding employer quality. These include web forums, automatic notification resources, and public-rating sites.11

We test our hypotheses regarding the value of the employer reputation system using Turkopticon, a community ratings database and browser plugin that


we estimate is used by a slight majority of M-Turk job seekers.12 The plugin adds information to the worker’s job-search interface, including community ratings of an employer’s communicativity, generosity, fairness, and promptness. Ratings take integer values from one to five. As of November 2013, Turkopticon included 105,909 reviews by 8,734 workers of 23,031 employers. The attributes have a mean of 3.80 and a standard deviation of 1.72.13 Workers can click a link to read text reviews of an employer. These reviews typically further recommend or warn against doing work for a given employer. Figure 1 provides an illustration.

Figures 1 and 2 illustrate an M-Turk worker’s job-search process. Figure 1 shows how workers search for tasks for pay. Figure 2 shows a preview of the task that we use for this study.

Turkopticon is remarkable because it relies on voluntary feedback from a community of anonymous workers to provide a signal of employer quality. These reviews are costly in terms of the worker’s time, and the content of a review is unverifiable to other workers. More importantly, there is wide variation in the effective pay rate of individual tasks. Because employers typically post tasks in finite batches and allow workers to repeat tasks until the batch is completed, the wage-maximizing behavior would be to hoard tasks posted by good employers by misdirecting other workers.14 Because reviews are anonymous, direct reciprocity and punishment are limited. As such, sharing honest reviews can be thought of as a prosocial behavior that is costly to the worker, in time and in valuable private information, and for which social recognition or direct reciprocity is limited. Other studies of online reputation systems suggest that reviewers are primarily motivated by a “joy of giving” and fairness (Cornes and Sandler 1994, Resnick and Zeckhauser 2002).

5. Experiment 1
5.1. Setup
The first experiment examines the value of the reputation system to employers. Specifically, we examine whether a good reputation helps employers attract workers. We do so by creating employers on M-Turk, exogenously endowing them with reputations on Turkopticon, and then testing the rate at which they attract work.

1. We created 36 employer accounts on M-Turk. The names of these employers consist of permutations of three first names and 12 last names.15 We used multiple employers to protect against the evolution of ratings during the experiment. We chose these names because they are common, Anglo, and male (for first names), and our analysis of Turkopticon ratings finds that these names are not generally rated high or low.

2. We endowed 12 employers with good reputations and 12 employers with bad reputations, and left 12 employers with no ratings or reputation. We created accounts on Turkopticon and posted numerical attribute ratings and long-form text reviews. Reviews for our bad (good)-reputation employers are taken as a sample of actual bad (good) reviews of bad (good)-reputation employers on Turkopticon.16 Good- and bad-reputation employers received 8–12 reviews each. These reviews make our good and bad reputations not unusual with regard to their mean ratings, although the bad-reputation employers do have an unusual degree of rater consensus about their badness.17 Because M-Turk workers may sort tasks alphabetically by employers’ names, we balanced reputations by the first name of the employer so that reputation is random with respect to the alphabetical order of the employer.

3. Our employer identities took turns posting tasks on M-Turk. They did so in 72 one-hour intervals, posting new tasks on the hour. At the start of each hour, the employer posted hundreds of tasks, more than were ever completed within the hour. Each worker was allowed to do only one task, and this was apparent when browsing the job listing. Posts began at 12:00 a.m. on Tuesday, July 7, and ended at 11:59 p.m. on Thursday, July 9. For example, the employer named Mark Kelly, who was endowed with a good reputation on Turkopticon, posted tasks at 12:00 a.m. and ceased accepting new submissions at 12:59 a.m., thereafter disappearing from workers’ search results. At 1:00 a.m., Joseph Warren, who had no reputation on Turkopticon, posted new tasks.

We balanced the intervals so that (1) in each hour of the day, over the three days, the three reputation types are each represented once, and (2) within each six-hour partition of each day, the three reputation types are each represented twice. We chose the final schedule (Table 1) at random from the set of all schedules satisfying these criteria.
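The two balance criteria are easy to state in code. The sketch below builds one deterministic schedule satisfying both (the experiment instead drew at random from all such schedules, with named employer identities):

```python
from collections import Counter

TYPES = ["good", "none", "bad"]

# One deterministic schedule satisfying both balance criteria: hour h on
# day d gets type (h + d) mod 3. This is only an illustration of the
# criteria, not the paper's randomly drawn schedule.
schedule = {(day, hour): TYPES[(hour + day) % 3]
            for day in range(3) for hour in range(24)}

# Criterion 1: across the three days, each hour-of-day sees each type once.
for hour in range(24):
    assert {schedule[(day, hour)] for day in range(3)} == set(TYPES)

# Criterion 2: within each six-hour partition of each day, each type twice.
for day in range(3):
    for start in (0, 6, 12, 18):
        counts = Counter(schedule[(day, h)] for h in range(start, start + 6))
        assert all(counts[t] == 2 for t in TYPES)
```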

The tasks consisted of image-recognition exercises. Workers were asked to enter the names, quantities, and prices of alcoholic items from an image of a grocery receipt that we generated. Receipts were 20 items long and contained three to five alcoholic items.18 Workers could submit only one task in any one-hour interval. The pay rate was $0.20, and workers had 15 minutes to complete the task once they accepted it.

4. Simultaneously, we created three employers that posted 12-cent surveys requesting information from workers’ dashboards. These employers posted new batches of tasks each hour for 24 hours each. Their reputations did not vary. The purpose of this task was to determine a natural baseline arrival rate that could be used as a control in the main regressions.


5. We recorded the quantity and quality of completed tasks. We did not respond to communications and did not pay workers until the experiment concluded.

Note that employer reputation may affect labor supply through multiple causal mechanisms. One obvious mechanism is that workers consult the public reputation of an employer directly each time and pick or reject jobs accordingly. Another mechanism is that some workers who discovered a “good” employer would

Figure 1. (Color online) M-Turk Worker’s Job Search Process: Turkopticon

Notes. Screen capture of an M-Turk worker’s job search interface. The tooltip box left of center is available to workers who have installed Turkopticon and shows color-coded ratings of the employer’s communicativity, generosity, fairness, and promptness. It also offers a link to long-form reviews.


invite others to work for this “good” employer as well.19 Accordingly, some workers who discovered a “bad” employer would warn others not to work for the “bad” employer. In other words, the effect of employer reputation as examined in our study is not

limited to just the first-order effect but rather is the overall effect on labor supply. Because our experiment was three full days long,

we also considered the possibility that the reputations of our employers would start evolving as workers

Figure 2. (Color online) M-Turk Worker’s Job Search Process: Previewing, Accepting, and Submitting Tasks

Notes. Screen capture of an M-Turk worker’s job search interface. From the list of tasks, workers must choose to preview a task before accepting it. They then enter data into the web form and submit their work.


would eventually start adding their own true feedback into the reputation system or might eventually notice the similarities and patterns among the employers. To account for this, we switched between employers quickly: every hour. In addition, we used a balanced randomization procedure such that every six-hour slot20 of the day has at least two good employers, two bad employers, and two neutral employers, as presented in our schedule in Table 1. This way, no treatment group is affected disproportionately by any potential time evolution of the experiment, and each six-hour slot offers equal exposure to all treatment groups.

As we were monitoring the progress of the experiment, on Thursday21 at 4:14 p.m., an observant worker compiled and publicly posted on a Reddit forum a list of our 24 employers with good and bad ratings, noting their similarities and suggesting that the reviews were created by fake accounts. On Thursday at 5:22 p.m., we reached out to this worker and to a concerned group of other workers on a Turkopticon discussion board to address the concerns that our employers were falsifying reviews with the intent of defrauding workers. We disclosed to this group that all workers would be paid. On Thursday at 6:14 p.m., a description of the experiment was cross-posted on Reddit. Because this possibility was anticipated in our randomization procedure, we present two sets of results: one including this last six-hour

shift and one excluding it. As Table 2 shows and discusses, our results are qualitatively identical in both cases.

5.2. Results
Summarizing the results of the experiment, Figure 3 shows the cumulative distribution of arrivals across the three employer-reputation types. As can be seen in Figure 3, our results are qualitatively consistent throughout the experiment, starting from the very beginning, owing to the balanced time-allocation schedule discussed earlier. By the conclusion of each of the 12 six-hour partitions, the employer with good ratings had attracted more work than the employer with neutral ratings, and the employer with neutral ratings had attracted more work than the employer with poor ratings. Table 2 shows results from a negative binomial

model. In all samples except the six-hour partitions, employers with good reputations attract work more quickly than employers with poor reputations, with p < 0.01. However, comparing only against no-reputation employers at the 5% significance level, employers with a good reputation do not receive submitted work significantly faster than those with no reputation, and employers with a poor reputation receive submitted work significantly more slowly only in the full samples.

Table 1. Balanced, Random Allocation of Employer Identities to Time Slots with Reputation

Tuesday Wednesday Thursday

0:00 (g) Mark Kelly (b) Thomas Jordan (n) Mark Jordan
1:00 (n) Joseph Warren (g) Joseph Jordan (b) Mark Warren
2:00 (g) Thomas Warren (n) Mark Jordan (b) Joseph Kelly
3:00 (n) Thomas Kelly (b) Thomas Jordan (g) Thomas Warren
4:00 (b) Mark Warren (n) Joseph Warren (g) Mark Kelly
5:00 (b) Joseph Kelly (g) Joseph Jordan (n) Thomas Kelly
6:00 (g) Joseph Lewis (n) Thomas Lewis (b) Mark Lewis
7:00 (n) Mark Roberts (g) Thomas Roberts (b) Thomas Clark
8:00 (b) Thomas Clark (n) Thomas Lewis (g) Mark Clark
9:00 (g) Mark Clark (b) Mark Lewis (n) Joseph Clark
10:00 (n) Joseph Clark (b) Joseph Roberts (g) Joseph Lewis
11:00 (b) Joseph Roberts (g) Thomas Roberts (n) Mark Roberts
12:00 (b) Thomas Martin (n) Joseph Johnson (g) Joseph Martin
13:00 (n) Thomas Adams (b) Joseph Adams (g) Mark Adams
14:00 (n) Mark Martin (g) Mark Adams (b) Mark Johnson
15:00 (g) Thomas Johnson (n) Thomas Adams (b) Joseph Adams
16:00 (b) Mark Johnson (g) Thomas Johnson (n) Mark Martin
17:00 (g) Joseph Martin (b) Thomas Martin (n) Joseph Johnson
18:00 (n) Thomas Miller (b) Joseph Robinson (g) Thomas Robinson
19:00 (g) Thomas Robinson (n) Mark Robinson (b) Thomas Owens
20:00 (g) Mark Owens (b) Joseph Robinson (n) Mark Robinson
21:00 (n) Joseph Owens (g) Joseph Miller (b) Mark Miller
22:00 (b) Mark Miller (n) Thomas Miller (g) Joseph Miller
23:00 (b) Thomas Owens (g) Mark Owens (n) Joseph Owens

Note. Parentheses denote employers endowed with good (g), no (n), and bad (b) reputations.


Although finding workers at a slower pace may not seem like a major punishment for employers with a poor reputation at first, we examine this finding carefully: the only people working for the bad employers are “uninformed” workers. In other words, if the labor market has approximately 50% informed workers and 50% uninformed workers (such as Amazon Mechanical Turk as of the time of the

experiment, as described), the bad employers would indeed get the work done in approximately twice the time. However, in a labor market in which almost everyone is informed, the already slow pace of work may become too slow for the bad employer to survive in the market. In addition, we also examine differences in esti-

mated effort and quality. The mean time spent per

Table 2. Negative Binomial Regression for Arrival of Submitted Tasks and Other Events

Good reputation No reputation

Sample β Standard error β Standard error Periods Events

Event: submitted tasks
Full sample
(1) All submitted tasks 2.053* (0.500) 1.503 (0.368) 72 1,641
Subsamples
(2) Day 1 only 4.104* (1.969) 2.135 (1.030) 24 695
(3) Days 1–2 only 2.424* (0.766) 1.76 (0.559) 48 1,125
(4) 12 a.m.–6 a.m. 1.679 (0.823) 1.393 (0.689) 18 114
(5) 6 a.m.–12 p.m. 2.843* (1.201) 2.157 (0.915) 18 534
(6) 12 p.m.–6 p.m. 1.096 (0.267) 0.978 (0.239) 18 415
(7) 6 p.m.–12 a.m. 2.694* (0.955) 1.648 (0.589) 18 577
Excluding last 12 hours
(8) No controls 2.466* (0.704) 1.803* (0.516) 60 1,313
(9) Controls for baseline rate 2.523* (0.719) 1.808* (0.515) 60 1,313
(10) Day fixed effects 2.294* (0.654) 1.778* (0.498) 60 1,313
(11) Hour fixed effects 1.858* (0.274) 1.374* (0.205) 60 1,313
(12) Day and hour fixed effects 1.836* (0.262) 1.364* (0.196) 60 1,313
Event: other
(13) Task previews 2.314* (0.571) 1.495 (0.370) 72 1,837
(14) Task accepts 2.141* (0.529) 1.551 (0.384) 72 1,799
(15) Error-free submissions 2.018* (0.548) 1.5 (0.410) 72 1,012
(16) First submissions 2.871* (0.804) 1.644 (0.465) 72 899
(17) Error-free first submissions 2.88* (0.928) 1.641 (0.536) 72 508

Notes. Each row is a regression. Coefficients are incidence rate ratios with bad reputation as the omitted category. Standard errors in parentheses.

*p < 0.05.

Figure 3. (Color online) Cumulative Accepted Jobs by Employer Reputation

Note. Bold points represent active job listings.


task for good-reputation, no-reputation, and poor-reputation employers was 136, 113, and 121 seconds, respectively. The difference between good-reputation and no-reputation employers is statistically significant with p < 0.01. For each of the three groups, the error-free rates were between 61% and 63%, and the major-error rates (e.g., no alcoholic items identified) were between 3.0% and 5.2%. Differences in the error-free rates and major-error rates are not statistically significant.22 Mason and Watts (2010) also found that higher payments raise the quantity but not the quality of submitted work.

In the full sample, 45.2% of submitted tasks were not the first task submitted by an individual worker, and 9.7% of submitted tasks were the sixth task or greater. The high incidence of repeat submissions may be due to a number of factors, including power users, correlated task-search criteria (e.g., individuals continuously searching using the same criteria), automated alerts (e.g., TurkAlert), or purposely searching for the same task across hours.

Table 3 shows results from our preferred specification of the negative binomial regressions, estimating the arrival rates of task previews, acceptances, submissions, first submissions (by worker), and correct first submissions. These specifications omit the last 12 hours, in which the experiment was disclosed, and include day and hour fixed effects. Arrival rates for good-reputation employers are significantly greater than for no-reputation employers for all outcomes, and arrival rates for no-reputation employers are significantly greater than for bad-reputation employers for all outcomes except correct first submissions, with p < 0.05. The results provide evidence that good reputations produce more previews, acceptances, submissions, first submissions, and correct first submissions.

The point estimates in column (3) suggest that arrival rates for employers with good and no reputations exceed those of employers with bad reputations by 84% and 36%, respectively.
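For readers unfamiliar with incidence rate ratios, the arithmetic can be sketched as follows. Without controls, the IRR relative to the omitted bad-reputation category is simply a ratio of mean counts; the counts below are invented, not the experiment's data, and the paper's estimates come from a negative binomial regression with day and hour fixed effects:

```python
# Hypothetical submissions per hour for two reputation types.
good = [22, 18, 20]
bad = [11, 9, 10]

# Incidence rate ratio relative to the omitted "bad" category.
irr = (sum(good) / len(good)) / (sum(bad) / len(bad))
print(irr)  # 2.0 -> good-reputation arrivals at double the bad-reputation rate

# An IRR of 1.836 (Table 3, column 3) corresponds to arrivals exceeding
# the bad-reputation baseline by 83.6%.
print(round((1.836 - 1) * 100, 1))  # 83.6
```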

Table 3 also provides evidence about the effects of reputation on various steps in the matching process. Conditional on a worker previewing a task, the probability of accepting the task is not significantly different by treatment. If information received by previewing a task (e.g., the type of the task, the intuitiveness of the user interface) were a substitute for reputation information, then good-reputation employers would lose fewer workers during the preview stage than no-reputation employers: in the former case, but not the latter, workers would already have received the signal before previewing the task. This evidence suggests that observable task characteristics do not substitute for reputation information. The reputation system adds information beyond what workers can otherwise observe.

Turkopticon is not native to the M-Turk interface and must be installed by the worker. As such, the reputations we endow are visible only to a fraction of workers, and so only part of the “treated” population actually receives the treatment. To estimate the share of M-Turk job seekers who use Turkopticon, we posted a one-question, free-response survey asking, “How do you choose whether to accept HITs from a requester you haven’t worked for before? Please describe any factors you consider, any steps you take, and any tools or resources you use.” Because we posted the survey from a requester account that did not have a Turkopticon rating and because we required workers to identify Turkopticon specifically, we expected this procedure to yield a conservative estimate of the true portion of job seekers who use Turkopticon. Of the 100 responses, 55 mention Turkopticon explicitly, and seven other responses mention other or unspecified websites.23

Experiment 1 also offers three additional pieces of evidence that Turkopticon provides information about employer type rather than task type. First, we find that the observed probability of accepting a task conditional on previewing it does not vary significantly by employer type. Second, we find that the

Table 3. Preferred Specification: Negative Binomial Regression of Arrival Rates in the First 60 Hours with Day and Hour Fixed Effects

Previews (1) Acceptances (2) Submissions (3) First submissions (4) Correct first submissions (5)

Good reputation 1.964* 1.909* 1.836* 2.488* 1.855*
 (0.280) (0.277) (0.262) (0.426) (0.405)
No reputation 1.403* 1.387* 1.364* 1.608* 1.261
 (0.204) (0.203) (0.196) (0.277) (0.278)
Constant 16.56* 14.10* 13.31* 8.024* 3.54*
 (4.907) (4.300) (4.002) (2.788) (1.729)
Day fixed effects Yes Yes Yes Yes Yes
Hour fixed effects Yes Yes Yes Yes Yes
Observations 60 60 60 60 60

Notes. Standard errors in parentheses. Bad reputation is the omitted category. All coefficients for good employers are significantly different from coefficients for bad employers with p < 0.05.

*p < 0.05.


elapsed time that workers spend previewing tasks before accepting does not vary significantly by reputation type. Third, our survey of 100 M-Turk workers featured no workers who reported a belief that certain tasks were inherently more fairly or highly compensated, although nearly all cited observable employer characteristics from past experience or tools such as Turkopticon. These findings suggest that workers screened on Turkopticon ratings and not on information (e.g., task type) gathered during task previews. This, along with the rating criteria used by Turkopticon and the test in Experiment 1, leads us to conclude that workers use Turkopticon to get information about employers that would not be accessible until after they would otherwise have exerted effort (e.g., time to completion and nonpayment), rather than information about task type.

5.3. Alternative Dependent Variables
Employers, especially on Mechanical Turk, want to get work done quickly, cheaply, and accurately. Our results suggest that better reputations help employers attract more workers at a given price and quality, giving an advantage in speed. What if employers wished to use their reputation to achieve lower prices or greater quality?

Horton and Chilton (2010) estimate that M-Turk workers have a median wage elasticity for recruitment of 0.43. If this point elasticity holds for our sample, a bad-reputation employer that pays $0.59, a no-reputation employer that pays $0.37, and a good-reputation employer that pays $0.20 would attract work at the same rate. This is a conservative estimate. Horton and Chilton's (2010) estimate of the mean elasticity is lower (0.24), implying that generating as large a difference in worker arrival rates as reputation generates would require even larger differences in promised payments. Dube et al. (2018) synthesize existing evidence, including Horton and Chilton (2010) and subsequent experiments, and new evidence they generate to estimate that the mean recruitment elasticity is even lower (0.06), implying that reputation is even more valuable as a substitute for higher wages in generating changes in recruitment and worker arrival rates to an employer for a given job posting. Moreno and Terwiesch (2014) also find that online service providers on vWorker.com substitute between higher prices and greater volume. Our study focuses on recruitment, leaving effects on retention for future work.
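As a back-of-envelope check (a sketch, not the paper's exact calculation), a constant recruitment elasticity η implies that arrival rates are proportional to w^η, so matching a higher-reputation employer's arrival rate requires a wage ratio of (arrival ratio)^(1/η). The function name and the arrival ratio of 1.59 below are our assumptions, chosen to be consistent with the $0.20-versus-$0.59 comparison at the median elasticity of 0.43:

```python
def equivalent_wage(base_wage, arrival_ratio, elasticity):
    """Wage a disadvantaged employer must pay to match the arrival rate
    of an employer whose arrival rate is `arrival_ratio` times higher at
    equal pay, under arrivals proportional to wage**elasticity."""
    return base_wage * arrival_ratio ** (1.0 / elasticity)

# A good-reputation employer paying $0.20 with an (assumed) 1.59x
# arrival-rate advantage; a bad-reputation employer needs roughly $0.59.
print(round(equivalent_wage(0.20, 1.59, 0.43), 2))  # ≈ 0.59

# At the lower mean elasticity of 0.24, the required wage gap is larger,
# consistent with the text's observation.
print(equivalent_wage(0.20, 1.59, 0.24) > equivalent_wage(0.20, 1.59, 0.43))
```

Lower elasticities amplify the wage premium a poorly rated employer must offer, which is why the Dube et al. (2018) estimate makes reputation look even more valuable.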

To estimate the value of a good reputation for getting work of better quality, consider moving to a majority-rules process. In particular, each alcoholic-item data collection costs an average of $0.03 in our study, and each item was coded correctly with probability p = 0.890. If a third rater is used only if the first two raters disagree, then the average cost per item rises to $0.071, and the probability that an item is coded correctly rises to 0.966.24 The elasticity estimates imply that reducing the price per worker-task to hold average costs per completed task constant reduces the quantity of work completed by 23.7%: less than the quantity gained by a good reputation relative to no reputation. In other words, a good-reputation employer could implement a majority-rules process, cut the price per worker-task so as to achieve the same cost per completed task, improve accuracy from 0.890 to 0.966, and still get work done more quickly than an employer with no reputation, although doing so may compromise the employer's good reputation (especially "generosity" ratings) in the long run.
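The accuracy figure can be reproduced directly: the majority is correct when both of the first two raters are correct, or when they disagree and the tie-breaking third rater is correct. This is a sketch of the accuracy arithmetic only (the cost figure depends on details in footnote 24), and it assumes independent raters and that one correct and one incorrect rating always register as a disagreement:

```python
def majority_accuracy(p):
    """Probability that a majority-of-three process (third rater called
    only on disagreement) codes an item correctly, given independent
    raters each correct with probability p. Cases where both of the
    first two raters are wrong are counted as incorrect."""
    both_correct = p * p
    disagree_then_correct = 2 * p * (1 - p) * p
    return both_correct + disagree_then_correct  # equals p**2 * (3 - 2*p)

print(round(majority_accuracy(0.890), 3))  # 0.966, matching the paper
```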

6. Experiment 2

6.1. Setup

Our model assumes that the reputation system provides accurate information to workers. In reality, the cheap-talk, voluntarily contributed ratings could be biased or very noisy, such that the "informed" who make decisions based on the reputation system are no more informed than others. Our second experiment validates this assumption empirically and examines the value of the reputation system to workers. Specifically, we examined whether Turkopticon ratings are informative of three employer characteristics that workers value but about which they face uncertainty during the search process: the likelihood of payment, the time to payment, and the implicit wage rate. As reflected in the literature on online ratings, informedness should not be taken for granted. Horton and Golden (2015) show that oDesk, an online labor market with a native bilateral rating system, experiences extensive reputation inflation as employers and workers strategically, rather than truthfully, report experiences. Others report similar biases on eBay (Dellarocas and Wood 2008, Nosko and Tadelis 2015), Airbnb (Fradkin et al. 2015), and Yelp (Luca 2016). The validity of Turkopticon ratings may be even more surprising given that tasks offered by revealed good employers are rival (unlike, for example, good products on retail markets).

We used the following procedure:

1. We produced a random ordering of three reputation types: good, bad, and none.

2. The nonblind research assistant (RA1), using a browser equipped with Turkopticon, screened the list of tasks on M-Turk until finding one that met the requirements of the next task on the random ordering.25

• If the next scheduled item was good, RA1 searched the list for a task posted by an employer for which all attributes are green (all attributes are greater than 3.0/5); 26.3% of the 23,031 employers reviewed on Turkopticon meet this criterion.


• If the next scheduled item was bad, RA1 searched the list for a task posted by an employer with no green attributes and a red rating for pay (all attributes are less than 3.0/5, and pay is less than 2.0/5); 21.6% of employers reviewed on Turkopticon meet this criterion.

• If the next scheduled item was none, RA1 searched the list for a task posted by an employer with no reviews.

3. RA1 sent the task to the blinded RA2, who used a browser not equipped with Turkopticon.

4. RA2 performed and submitted the task. RA2 was instructed to perform all tasks diligently.26

5. RA1 and RA2 repeated steps 2–4. A web crawler recorded payments and rejections by employers to RA2's account with accuracy within one minute of actual payment or rejection.
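The screening rule in step 2 can be expressed as a small helper (a hypothetical sketch: the function, attribute names, and dict representation are our illustration of the protocol's thresholds, not Turkopticon's actual interface):

```python
def classify_employer(ratings):
    """Classify an employer from Turkopticon-style attribute ratings
    (dict mapping attribute -> score on a 5-point scale, or None when
    the employer has no reviews), using the experiment's thresholds."""
    if ratings is None or not ratings:
        return "none"                      # no reviews at all
    if all(score > 3.0 for score in ratings.values()):
        return "good"                      # all attributes green
    if all(score < 3.0 for score in ratings.values()) and ratings.get("pay", 5.0) < 2.0:
        return "bad"                       # no greens and red pay
    return "neither"                       # employers RA1 would skip

print(classify_employer({"pay": 4.2, "fair": 3.8, "fast": 4.0}))  # good
print(classify_employer({"pay": 1.5, "fair": 2.2, "fast": 2.9}))  # bad
print(classify_employer(None))                                    # none
```

Note that the "good" and "bad" criteria do not partition all reviewed employers; mixed-rating employers fall into neither bin and were not sampled.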

The blinding procedure decouples the search process from the job-performance process, thereby protecting against the risk that RA2 inadvertently conditions effort on the employer's reputation.

6.2. Results

Figure 4 shows results for rejection rates and time to payment by the employer's reputation type. Rejection rates were 1.4% for employers with good reputations, 4.3% for employers with no reputation, and 7.5% for employers with bad reputations.

Table 4 presents further results and significance tests for rejection rates, time to payment, and realized hourly wage rates. We define realized wage rates to be payments divided by the time to complete the task if the work is accepted and zero if the work is rejected. We define promised wage rates to be posted payments divided by the time to complete the task; they are not zero if the work is rejected.27 Employers with good reputations have significantly lower rejection rates and faster times to decisions. They do not have statistically different posted pay rates. This distinction is important because the pay for accepted tasks is contractible, but the task's acceptance criteria and realistic time requirements are not.

In principle, the ratings on Turkopticon could be orthogonal to employer type and could instead be providing information on task types (e.g., survey or photo categorization) rather than employer types. We do not find evidence that this is the case. First, Turkopticon asks workers to rate employers on fairness, communicativity, promptness, and generosity; unlike task type, these are revealed only after workers have invested effort and are subject to hold-up. Textual comments also emphasize information that would only be revealed to prospective workers after investing effort. Second, the RA's task classifications in experiment 2 are not significantly correlated with employers' Turkopticon scores. We also found evidence against workers screening on task rather than employer in experiment 1.

Given the low cost of creating new employers, it is puzzling that employers with poor reputations persist rather than creating new accounts. When the study was conducted, the only cost to creating a new employer was the time spent filling forms and awaiting approval. Since then, the cost of producing new aliases has grown.28 If creating new accounts were perfectly costless and employers were informed, we would expect there to be no active employers with poor reputations. However, Turkopticon's textual reviews suggest that workers are aware that employers with bad reputations may create new identities.

Figure 4. (Color online) Time to Payment and Rejection by Employer Reputation

Notes. Whiskers represent standard errors. p-values for a χ2 test that shares are independent of reputation are, respectively, 0.002, 0.011, 0.805, 0.012, and 0.007.

We conclude that the longer work times and lower acceptance rates validate Turkopticon's ratings. In other words, Turkopticon is informative about employer differences that would be unobservable (or at least more costly to observe) in the absence of the reputation system.

To provide an intuition for the magnitude of the value of employer-reputation information to workers, note that our results imply that following a strategy of doing jobs only for good-reputation employers would yield about a 40% higher effective wage than doing jobs only for no-reputation or bad-reputation employers: $2.83 versus just under $2.00 per hour. Results suggest about 20% of the gap in effective pay is explained by nonpayment, and 80% is explained by longer tasks. However, this calculation understates the penalties when an employer rejects tasks because the rejected worker is penalized in two ways: nonpayment and a lower approval rating. The latter reduces the worker's eligibility for future tasks from other employers.
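The rough 20/80 split can be recovered from Table 4's point estimates (a back-of-envelope sketch; the paper's exact decomposition may differ): the nonpayment component is the gap between bad-reputation employers' promised and realized wages, and the remainder of the good-versus-bad realized-wage gap reflects promised-wage differences driven by longer tasks.

```python
# Point estimates from Table 4 (panels 3 and 5), dollars per hour.
realized_good, realized_bad = 2.834, 1.986
promised_bad = 2.142   # for good-reputation employers, promised = realized

total_gap = realized_good - realized_bad          # 0.848
nonpayment_part = promised_bad - realized_bad     # 0.156 lost to rejection
task_length_part = total_gap - nonpayment_part    # 0.692 from longer tasks

print(round(nonpayment_part / total_gap, 2))      # 0.18, i.e. roughly 20%
print(round(task_length_part / total_gap, 2))     # 0.82, i.e. roughly 80%
```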

7. Natural Experiment

Experiments 1 and 2 demonstrate the effects of the reputation system on employers and workers in the online market. So far, these results suggest that workers can earn substantially more by screening employers with good reputations, and employers with better reputations attract workers more quickly (or, alternatively, for a given speed, more cheaply) than those with no or poor reputations. In this section, we address the question of what happens to the job market when the reputation system suddenly disappears.

7.1. Ideal Experiment

Following Rubin (1974), we begin by describing an ideal experiment that would identify the causal partial-equilibrium effect of the reputation system on the market. Assume a researcher had the ability to (1) shut down the reputation system at will and for any periods of time and (2) monitor the entire market, including which jobs are being taken, how fast they are finished, and so on. One could randomly assign the time when the reputation system is removed and randomly decide for how long it is absent.29 Because such a treatment assignment would be independent of market conditions, one could conclude that any changes observed in the market were caused by the treatment. Acknowledging that it is infeasible and unethical to purposefully shut down the reputation-system website, we use a natural experiment with observational data, which serves as an approximation to the ideal experiment described.

Table 4. Rejection and Time to Payment by Employer Reputation

                                                        Paired test p-values
                         Mean     SE        N        Good      None      Bad
Main outcomes
1. Rejection rates
   Good reputation       0.013    (0.008)   223                0.073     0.003
   No reputation         0.043    (0.016)   164      0.073               0.246
   Bad reputation        0.071    (0.018)   211      0.003     0.246
2. Days to decision
   Good reputation       1.679    (0.146)   223                0.132     0.001
   No reputation         2.296    (0.433)   164      0.132               0.03
   Bad reputation        3.715    (0.467)   211      0.001     0.03
3. Realized wage rates
   Good reputation       2.834    (0.228)   173                0.011     0.043
   No reputation         1.957    (0.259)   141      0.011               0.949
   Bad reputation        1.986    (0.352)   168      0.043     0.949
Other outcomes
4. Days to decision, accepts only
   Good reputation       1.643    (0.144)   220                0.083     0.001
   No reputation         2.368    (0.451)   157      0.083               0.023
   Bad reputation        3.943    (0.499)   196      0.001     0.023
5. Promised wage rates
   Good reputation       2.834    (0.228)   173                0.017     0.098
   No reputation         2.011    (0.257)   141      0.017               0.771
   Bad reputation        2.142    (0.352)   168      0.098     0.771
6. Advertised pay
   Good reputation       0.277    (0.025)   223                0.001     0.938
   No reputation         0.159    (0.024)   164      0.001               < 0.001
   Bad reputation        0.28     (0.022)   211      0.938     < 0.001
7. RA log-seconds to complete
   Good reputation       5.737    (0.228)   173                0.372     < 0.001
   No reputation         5.639    (0.085)   141      0.372               0.001
   Bad reputation        6.368    (0.069)   168      < 0.001   < 0.001

Notes. Rejection-rate p-values are from a χ2 test that rejection rates are the same between the row and column. Time-to-pay p-values are from a two-sample t-test that the mean times to pay are the same between the row and column. Standard errors in parentheses.

7.2. Observational Data

To explore the partial-equilibrium effect of reputation-system absences, we exploit the seven instances when the Turkopticon servers went down. To accomplish that, we collected the following data:

• Turkopticon downtime. We assembled data on Turkopticon's downtime using time stamps from worker and Turkopticon administrative posts on the Turkopticon website, Reddit, Twitter, and Google Groups. These are summarized in Table 5. The chief concern is that Turkopticon's downtimes are correlated with one of our variables, for example, because of especially heavy traffic. However, all administrative posts attributed crashes to unrelated technical issues, such as software updates.

• Individual-task-level data on the entire market, collected by the web crawler M-Turk Tracker (Ipeirotis 2010b, Difallah et al. 2015) and summarized in Table 6. M-Turk Tracker scans the M-Turk market every six minutes and records the status of all HITs that it observes, such as the number of tasks left in a particular HIT, the task description, and the reward offered by the task. By studying changes in the number of tasks still left in each HIT, we can explore how fast jobs are taken and, thus, explore shifts in the supply of labor in this market.

Our first goal is to study the total effect of the reputation-system shutdown on the labor market. To do that, we examine the amount of work done by M-Turk workers at any given moment in time with respect to whether the reputation system was active at that moment. We measure work being done as the "promised" pay rate of a given task multiplied by the number of tasks that were done (rewards earned); we prefer this to the number of tasks alone because quick tasks tend to be cheap. We control for time of day, day of week, employer, and episode using fixed effects. More specifically, we use the following model:

log(1 + RewardsEarned_it) = β0 + β1 DOWN_t + β2 H_t + β3 D_t + β4 R_i + β5 E_t + ε_it,   (1)

where RewardsEarned_it is the total promised pay to all workers working on task i at time t; DOWN_t is the indicator variable for whether the reputation system is down at time t (DOWN_t = 1) or not (DOWN_t = 0); H_t is the fixed effect for the hour of the day at time t; D_t is the fixed effect for the day of the week at time t; R_i is the fixed effect for the employer who requested task i; and E_t is the fixed effect for the downtime episode. The analysis sample is restricted to observations occurring between two weeks before and two weeks after the start of a downtime episode. Table 7 presents results.

As shown in Table 7, overall job consumption on the market actually increases when the reputation system shuts down. From this result, we conclude that workers tend to stay in the market when the reputation system is shut down, at least in the short term. There are a number of possible explanations for this, not all of which necessarily correspond to higher pay for workers or a better allocation of work. For example, workers might speed up work because they spend less time screening and reviewing employers. In the short term, this might raise the amount of promised pay earned, but less of this promised pay may be realized (given experiment 1). In the long term, the lack of a reputation system may impair workers' ability to find good but small employers and may discourage smaller employers from investing in a good reputation.
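Equation (1) is a standard fixed-effects OLS regression. A minimal sketch of the estimation on simulated data (illustrative only: the data-generating process, the true coefficient of 0.05, and the hour-only fixed-effect structure are our assumptions, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
down = rng.integers(0, 2, n)          # DOWN_t indicator
hour = rng.integers(0, 24, n)         # hour of day (one fixed effect each)

beta_down = 0.05                      # true DOWN effect (assumed)
hour_fe = rng.normal(0, 0.1, 24)      # hour-of-day fixed effects
y = 1.0 + beta_down * down + hour_fe[hour] + rng.normal(0, 0.05, n)

# Design matrix: intercept, DOWN, and hour dummies (hour 0 omitted
# as the reference category to avoid collinearity with the intercept).
hour_dummies = (hour[:, None] == np.arange(1, 24)).astype(float)
X = np.column_stack([np.ones(n), down, hour_dummies])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print(round(coef[1], 2))  # close to the true DOWN coefficient of 0.05
```

The paper's specification adds day-of-week, employer, and episode fixed effects in exactly the same way, with many more dummy columns (or, in practice, a within transformation).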

Table 5. Summary of Turkopticon Downtime Data

Variable                                            Value
Number of downtime episodes                          7.00
Average length of a downtime episode, hours         10.53
Average time between downtime episodes, days        61.54
Total range spanned by downtime episodes, days     369.27

Table 6. Summary of MTracker Data

Variable                                      Value
Number of HIT status observations       504,840,038
Number of distinct requesters                65,243
Number of distinct crawls                   267,319
Average time between the crawls, minutes      12.07

Table 7. Overall Effect of Reputation System Shutdown on Job Consumption

Dependent variable: log(1 + Rewards Earned)

DOWN                               0.0034* (0.0004)
Hour-of-day fixed effect           Yes
Day-of-week fixed effect           Yes
Employer fixed effect              Yes
Downtime-episode fixed effect      Yes
Observations                       5,572,840
R2                                 0.1422
Adjusted R2                        0.1419
Residual standard error            0.1109 (df = 5,570,882)

Note. Robust standard errors in parentheses.
*p < 0.01.


To examine whether workers' job search changes, we study the heterogeneity of the treatment effect. We want to separate reputation into two dimensions: how good an employer's public reputation is and how widely known or visible an employer is outside the public reputation system. We measure the quality of an employer's reputation using Turkopticon reviews. We measure the visibility of employer i at time t as the number of times the M-Turk Tracker web crawler encountered that employer across all time periods before t. This is designed to capture workers' general familiarity with the employer. Some employers (such as the brokers that use M-Turk to subcontract tasks on behalf of their clients) become well known among M-Turk workers. However, many employers post jobs only infrequently. Independent of Turkopticon, few workers have private knowledge of these less-visible employers' past behavior. An employer frequently encountered by workers in their day-to-day browsing and work history would also tend to be frequently encountered by the web crawler. On the other hand, if the web crawler (which runs every few minutes) encountered a particular employer only a handful of times, then this employer would generally not be familiar to workers.

We performed a semiparametric test to examine heterogeneity by employer reputation and visibility based on the following procedure and plot the results in Figure 5:

1. Pick all the good-reputation employers in the lowest quartile of visibility, whose visibility is in the first 25 percentiles.30 These are the least-visible good-reputation employers. Estimate the DOWN coefficient using only jobs for these employers. Denote it DOWN_0%,good. Plot the estimated coefficient at 0% on the x-axis with a green marker.

2. Shift the percentile window by 5%; that is, pick all good-reputation employers between the 5th and 30th percentiles of visibility. Estimate a new DOWN coefficient using only jobs for these employers, denoted DOWN_5%,good. Plot the estimated coefficient at 5% on the x-axis, again in green.

3. Shift by 5% again and repeat the procedure until DOWN_75%,good is estimated, corresponding to the top quartile by visibility of good-reputation employers, that is, the most-visible good-reputation employers.

4. Repeat the entire procedure for bad-reputation employers to estimate DOWN_0%,bad, DOWN_5%,bad, ..., DOWN_75%,bad.

These results suggest that the instantaneous effect of the reputation system varies by employer. Employers with bad reputations are relatively unaffected by the downtime, consistent with these employers attracting only workers who do not use the reputation system. Less-visible employers with good reputations are the most adversely affected. Results are consistent with these employers no longer being discovered by workers who use Turkopticon as a screen. The most-visible employers with good reputations are positively affected, as though workers using Turkopticon stop screening for less-visible good-reputation employers and instead use the best-known good-reputation employers as a fallback option. In other words, these results suggest that Turkopticon aids workers' discovery of small, high-road employers and provides these employers an incentive to invest in their reputation. To extrapolate, we might expect that the reputation system promotes competition and prevents the market from devolving into a small, oligopsonistic set of well-known employers, because newer and smaller employers require substantial reputational investments to become sufficiently well known to attract new workers reliably.

Figure 5. (Color online) The Effect of Downtime Depends on Visibility of the Requester
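The windowed procedure in steps 1–4 can be sketched as follows (a hypothetical implementation; `estimate_down` stands in for re-estimating equation (1) on each subsample and is our assumption, not the paper's code):

```python
import numpy as np

def rolling_window_coefficients(visibility, outcome, down, estimate_down,
                                step=5, width=25):
    """Estimate the DOWN coefficient within sliding visibility windows:
    start with the bottom `width` percentiles and shift the window by
    `step` until it reaches the top quartile (starts at 0%, 5%, ..., 75%)."""
    starts = list(range(0, 100 - width + 1, step))     # 0, 5, ..., 75
    coefs = []
    for lo in starts:
        lo_v, hi_v = np.percentile(visibility, [lo, lo + width])
        mask = (visibility >= lo_v) & (visibility <= hi_v)
        coefs.append(estimate_down(outcome[mask], down[mask]))
    return starts, coefs

# Toy check with simulated data and a trivial difference-in-means
# estimator in place of the full fixed-effects regression.
rng = np.random.default_rng(1)
vis = rng.exponential(1.0, 2000)
down = rng.integers(0, 2, 2000)
y = 0.1 * down + rng.normal(0, 1, 2000)
est = lambda y_w, d_w: y_w[d_w == 1].mean() - y_w[d_w == 0].mean()
starts, coefs = rolling_window_coefficients(vis, y, down, est)
print(len(starts))   # 16 windows, anchored at 0% through 75%
```

Running this once per reputation group (good, bad) yields the two curves plotted in Figure 5.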

In contrast to the ideal experiment, this study has some caveats. Although we would like to study whether the market would reach a new equilibrium (e.g., whether it would collapse or become an oligopsony) in the absence of a reputation system, we observe only relatively short, expected-to-be-temporary downtimes that surely do not allow employers to adapt their payment strategies endogenously or workers to adjust their labor-market response to such changes. We also cannot observe whether workers are actually paid less for their work; we observe only promised payments and the number of tasks performed. Nonetheless, the results provide relatively clean evidence on the instantaneous effects of the reputation system on how workers search for jobs and how public and private reputation substitute for one another.

8. Discussion

8.1. Types of Reputation and Market Design

Our experimental design focuses on the value of a publicly available employer-reputation system and not on workers' private signals. To test Hypothesis 1, we allowed workers to complete only one task. To test Hypothesis 2, we submitted only one task for each employer. To test Hypothesis 3, we examined how completed work varied by public ratings and total visibility. Even so, private information remains important in this context, and other tools (e.g., Turk Alerts and DCI New HIT Monitor) are available to notify workers when an employer that they privately flag posts a job. Likewise, employers can privately invite workers to apply for future work.

The coexistence of these more traditional matches, which rely on private information and repeated contracting, is arguably a reminder of the shortcomings of current rating systems. Indeed, the chief value proposition of such crowdsourcing platforms is to provide a quick and efficient method of connecting employers and workers to large numbers of trading partners. The inability to do so is a key concern for both employers and workers.

Amazon might do more to encourage workers and employers to use public ratings. For instance, Amazon could give workers access to historical information on each employer, such as average past wage and rejection rates. It could also create a native, subjective rating system, as Upwork has done and as Amazon has done for consumer products. The lack of information about employer reputations, coupled with the lack of contract enforcement, may be limiting the market to the small size that a reputation can discipline and to small tasks that are relatively short and well defined. For instance, Guo et al. (2018) find that workers on online markets avoid jobs that are difficult to codify or that have flexible renegotiation provisions or subjective outcome standards. Ipeirotis (2010a) notes that M-Turk's poor enforcement likely limits the scope of work performed on the platform to very narrow tasks. Since our experiment, Amazon has begun requiring unique tax identification numbers, making it more difficult for employers to reset a bad reputation.

8.2. Why Do Workers Contribute to the Public Reputation System?

The coexistence of public and private reputation systems also raises the question: why do workers contribute to a collective memory when they can instead hoard private knowledge of the best employers? Prior studies have also noted the collective action problem that this entails (see, e.g., Levine and Prietula 2013 and Gao et al. 2015). Again, this literature focuses almost exclusively on ratings of sellers and service providers by buyers and clients.

Nonetheless, online labor markets and ratings of employers are a unique and instructive setting for understanding contributions to public ratings. In a typical product market, goods are nonrival: when a buyer favorably rates a product or seller on Amazon, that buyer's ability to get future products or services is not hindered by an increase in other buyers. However, employers post finite numbers of tasks, and favorable reviews for smaller employers could lead other workers to consume those tasks. In this sense, worker reviews are costly to workers not only in terms of time, but also in that they attract other workers to a rival "good." The ability of Turkopticon to attract large numbers of raters shows that altruism and volunteerism survive even under these conditions. Moreover, the results from our second experiment confirm that these reviews are useful and informative.

8.3. Specific Puzzles from Our Empirical Results

Each study features some empirical results that warrant future attention. In experiment 1, we found that good-reputation employers attract work more quickly with no loss of quality. However, good-reputation employers might also get a reputation for paying regardless of work quality, leading workers to flock to these employers and then exert minimal effort. Although the analysis of workmanship in experiment 1 finds no evidence for this margin of behavior, it remains an open question whether employer-reputation systems can also invite moral hazard.

In experiment 2, why did effective wages for good-reputation employers exceed those for bad-reputation employers? As noted in our review of the literature, studies have generally (although not always) found the opposite result: good reputations allow trading partners to extract more favorable terms, such as the ability to attract workers at lower pay. Such compensating differentials may be impossible in this setting: although pay-per-task is specified, the time to complete the task is not. Ratings may also capture other aspects of the employer. Following Bartling et al. (2013), some employers may be more altruistic; these employers pay higher wages and also have better reputations. Indeed, Turkopticon ratings include an item for generosity, which is intended to capture expected wages. A second alternative is that ratings implicitly capture employers' preferences for getting work done more quickly; impatient employers pay higher wages and maintain good reputations to get work accomplished quickly.

Finally, in the natural experiment, what would happen to the market if the reputation system remained down? Turkopticon's downtimes suggest that smaller, good-reputation employers are especially dependent on Turkopticon to get work done quickly. Following the logic of our formal model, we may hypothesize that the long-term loss of a reputation system would lead the market to become concentrated among the most visible of the good-reputation employers, as smaller employers are deterred by the cost of establishing a good reputation and may need to go through third-party brokers (such as CrowdFlower) that have established reputations.

8.4. Lessons for Reputation Systems in Off-Line Labor Markets

What relevance do these findings have for other markets? M-Turk workers are unconventional in that they are contracted for very small tasks and have minimal interaction with firms. However, the issues that they confront are more general. As Agrawal et al. (2015, p. 219) describe, "the growth of online markets for contract labor has been fast and steady."

Uber, TaskRabbit, DoorDash, and other online platforms are also blurring the boundaries between off-line employment and entrepreneurship (Apte and Mason 1995, Weil 2014, Harris and Krueger 2015). Although these platforms are drawing increasing scrutiny from regulators, wage theft and other forms of opportunism are also pervasive in other settings in which legal enforcement is weak, including among independent contractors, undocumented immigrants, misclassified employees, and low-wage employees (Bobo 2011, Rodgers et al. 2014). Wage theft has prompted the U.S. Department of Labor's Wage and Hour Division to award back pay to an average of 262,996 workers a year for the past 10 years, and far more cases go unremedied (Bernhardt et al. 2009, 2013; Bobo 2011; Lifsher 2014; U.S. Department of Labor Wage and Hour Division 2016). The value of stolen wages restored to workers through enforcement actions is larger than the total value stolen in all bank, gas station, and convenience store robberies (Lafer 2013).

Krueger (2017, p. 13) reports that about a third of American workers spent some time in the prior week "working or self-employed as an independent contractor, independent consultant, or freelance worker," including "working on construction jobs, selling goods or services in their businesses, or working through a digital platform, such as Uber, Upwork, or Avon," and 84% of these workers report self-employment as their main job. Among these workers, over a third report "having an incident in the last year where someone hired you to do a job or project and you were not paid on time." Over a quarter reported at least one incident of being unable to collect the full amount owed for a job or project that the worker completed. The Freelancers Union has used both reputation and regulatory solutions to address client nonpayment, including its "client scorecard" and a successful effort to lobby New York City to pass the Freelance Isn't Free Act.

As on M-Turk, workers in the broader labor market strive to distinguish which employers will treat them well or ill. Workers have always made decisions with partial information about employer quality, and so these forces have always shaped labor markets. Contracts and bilateral relational contracting are important forces disciplining employer opportunism, but they are certainly incomplete. Workers have always relied on public employer reputations propagated through informal, decentralized, word-of-mouth conversations. Although economists have had theories about how employer reputation would work, the informal system has operated largely outside our view, yielding a very thin empirical literature. As the cost of communications and data storage fell in recent years, employer reputation has become more centralized, systematic, and measurable, showing up in general labor-market matching sites, such as Glassdoor.com and Indeed.com, and in more specialized contexts, such as ProjectCallisto.org, which allows workers to share information about sexual harassment and abuse at work, and Contratados.org, which allows migrant workers to review recruiters and employers.

Attention to the worker's information problem also suggests innovative directions for policy and institution building. Can more be done to improve the functioning of the gig economy by helping workers overcome their information problem with respect to employer heterogeneity? Can we improve institutional designs to better elicit, aggregate, and disseminate information about employers? Platform design affects workers' willingness to voluntarily contribute their private information to the public pool (Marinescu et al. 2018). A policy example of this kind of logic in action: in 2009, the U.S. Occupational Safety and Health Administration began systematically issuing press releases to notify the public about large violations of workplace safety laws. This effort attempts to influence employer reputation, to improve the flow of information about employer quality, and to create incentives for providing safer workplaces. Johnson (2019) found that it also induces competing employers to improve compliance with worker-protection laws, although the Trump administration rolled back this effort (Meier and Ivory 2017). Workers have traditionally used labor unions and professional associations as a venue for exchanging information about working conditions and for coordinating the collective withdrawal of trade to discipline employers. The rise of new institutions that facilitate information sharing may be taking up some of this role.

Platforms can better deliver on their promise to reduce matching frictions and increase efficiency to the extent that they help workers distinguish reliable employers. If that goal can be addressed, then the falling costs of information processing and diffusion may move labor markets closer to the competitive ideal.31 Fulfilling this promise requires designing platforms that help workers find great employers and avoid bad ones.

9. Conclusion
Online platforms are making it cheaper to connect trading partners, but issues of trust and reliability remain. The empirical literature has focused almost exclusively on sellers, including sellers of products (e.g., eBay brokers), services (e.g., restaurants), and labor (e.g., contract workers on gig platforms). Labor markets have always faced bilateral uncertainty, although the relative absence of regulation has made gig and online labor markets especially prone to opportunistic employers.

This study provides a theoretical and empirical foundation to better understand how employer reputation systems can partially substitute for legal and other third-party contract enforcement. Moreover, the experience of M-Turk and Turkopticon suggests that reputation systems may have an important role to play in providing employers with incentives to treat workers well, giving lesser-known employers direct access to workers, and ultimately expanding the scope of work that can be completed online. Institutions and policies can combat opportunistic employers, but given the complexities of the employment relationship, it seems implausible that opportunism will ever be fully eliminated.

Acknowledgments
The authors thank their excellent research assistants Harshil Chalal, Sima Sajjadiani, Jordan Skeates-Strommen, Rob Vellela, and Qianyun Xie. They also thank Panos Ipeirotis for sharing M-Turk Tracker data. For their useful feedback, the authors thank John Budd, Eliza Forsythe, Mitch Hoffman, John List, Colleen Manchester, Mike Powell, David Rahman, and Chris Stanton; as well as workshop participants at the ASSA Meetings, MEA-SOLE meetings, Minnesota Applied Microeconomics Workshop, MIT Sloan Organizational Economics Lunch, MIT Conference on Digital Experimentation, MIT Sloan IWER seminar, LERA meetings, Michigan Ross, Northwestern Law School, Organization Science Winter Conference, Barcelona GSE Digital Economics Summer Workshop, IZA World Labor Conference, NBER Personnel summer institute, GSU, the Advances in Field Experiments conference at BU, and Tennessee-Knoxville. Authorship is equal and alphabetical. This paper is an updated version of IZA Discussion Paper 9501.

Endnotes

1 As discussed more later, we estimate that at least one half of the M-Turk labor supply installs Turkopticon, although it remains a puzzle why not all workers do so. Exogenous employer visibility is also interesting: one might imagine that, in small communities, all market participants would have a well-established reputation without the aid of any formal reputation system. M-Turk features both a few large brokerages (e.g., Crowdflower) that constitute a substantial share of labor demand and a long tail of smaller requesters.

2 Perfect revelation of past payment simplifies the exposition. Board and Meyer-ter Vehn (2013) consider reputation building when learning is imperfect. Their model also yields ergodic shirking with increasing incentives for noncontractible investments as reputation becomes noiseless.

3 Workers may face the two standard kinds of information problems with respect to unobserved employer heterogeneity: adverse selection and moral hazard. Employers' technologies or product markets may differ in ways that make low-road practices more or less profitable. In this adverse-selection setting, it is trivial to understand why variation in employment practices emerges. An alternative theory is that there is no essential heterogeneity between employers. Differences in strategic employment practices appear between competing employers (Osterman 2018). We focus on this more interesting case. In all labor markets, both mechanisms are almost certainly empirically relevant. Cabral and Hortacsu (2010) did such an accounting in a consumer-goods market, baseball cards on eBay. We know of no analogous accounting in any labor market. That remains for future work.

4 Other studies show how reputation systems and credentials can improve efficiency in other online markets, including eBay (Nosko and Tadelis 2015, Hui et al. 2016) and Airbnb (Fradkin et al. 2015).

5 Our model abstracts away from the real but very well-studied employer information problem of dealing with workers who may differ in quality or shirk.

6 Another approach is to let employers exogenously vary in their discount rates, in which case farsighted employers become high-road employers.

7 We could allow low-road employers to receive some work and some profit by allowing visibility to yield an imperfect signal. Then s could arise endogenously by allowing employers to exogenously vary in their discount rates; cheap and patient employers renege, rich and impatient employers pay. We omit this complexity because it adds little insight, and our empirics are concerned with whether the signal is valuable, not the degree to which it is precise.

8 Available online at https://archive.fo/FaVE.

9 Available online at http://www.mturk-tracker.com (accessed June 14, 2014).

10 In legal terms, M-Turk is a brokerage that facilitates relationships between two contracting parties: one that seeks work for pay and


another that performs work. We use "employer" as shorthand for the former.

11 Popular resources include Reddit's HitsWorthTurkingFor, CloudMeBaby.com, mturkforum.com, mturkgrind.com, turkalert.com, turkernation.com, and turkopticon.ucsd.edu.

12 For details on our estimates, see the end of the study 2 results section. Silberman et al. (2010), Irani (2012), and Silberman (2013) provide background on Turkopticon.

13 These statistics are based on our analysis of data scraped from the site. Attribute ratings are determined by the mean from the following questions: (i) for communicativity, "How responsive has this requester been to communications or concerns you have raised?"; (ii) for generosity, "How well has this requester paid for the amount of time their HITs take?"; (iii) for fairness, "How fair has this requester been in approving or rejecting your work?"; (iv) for promptness, "How promptly has this requester approved your work and paid?" Their means (standard deviations) are respectively 4.01 (1.68), 3.98 (1.62), 3.71 (1.68), and 3.18 (1.91), suggesting that ratings are meaningfully spread. Their numbers of reviews are 93,596, 93,025, 99,437, and 44,298. Reviews are somewhat consistent across dimensions; the correlation between any one dimension and the mean value of the other three dimensions is 0.57. On workers' displays, average ratings are color coded; scores less than two are red, scores between two and three are yellow, and scores greater than three are green.

14 This competition between workers to get the best jobs is the basis of resources such as TurkAlert.com, which allows workers to receive an alert whenever employers of their choosing post new tasks.

15 The first names are Joseph, Mark, and Thomas. The last names are Adams, Clark, Johnson, Jordan, Kelly, Lewis, Martin, Miller, Owens, Roberts, Robinson, and Warren.

16 For this purpose, we define bad reviews as those giving a score of 1/5 on all rated attributes and a good review as giving a 4/5 or 5/5 on all rated attributes. The text reviews clearly corroborate the numerical rankings; an RA given only the text reviews correctly identified the employer type in 285 of the 288 reviews.

17 At the time of the experiment, of the 23,095 employers rated on Turkopticon, 22.9% met our criteria for being bad-reputation and 48.1% met our definition of being good-reputation. Many of the bottom ratings come from employers with few reviews. Of the 1,564 employers with 8–12 reviews, only 1.1% met our definition of bad and 41.5% met our definition of good. In this sense, our "good" employers had good ratings but not uncommonly so. However, the mean ratings of our bad employers are especially bad given the consensus across so many raters that they merit one on all dimensions. As such, our estimate of the effect of bad reputation (relative to good and no reputation) might be interpreted as an upper bound at which about half of workers are using this reputation system to screen employers.

18 Alcoholic items came from a list of 25 bestselling beers. This task, therefore, features simple image recognition, abbreviation recognition, and domain knowledge.

19 In both of these cases, the workers have no information of their own and, thus, base their decisions only on the available public reputation that they see.

20 There are four six-hour slots in each day—12 a.m.–5 a.m., 6 a.m.–11 a.m., 12 p.m.–5 p.m., 6 p.m.–11 p.m.—that roughly correspond to night, morning, day, and evening shifts.

21 Thursday is the last day of the three days of the experiment.

22 Differences are for a two-sample t-test for equal means of the log work time with α < 0.1. Error-free receipts are those in which all alcoholic items were identified, no nonalcoholic items were identified, and the prices were entered correctly. Major-error receipts are those in which no alcoholic items were identified or more than six items are listed.

23 Otherwise, responses emphasize estimated pay, estimated time to completion, and perceived trustworthiness (e.g., from a known organization). To the extent one is interested in the effect of reputation among informed workers, this treatment-on-treated effect is 82% (from 0.55⁻¹) larger than the estimated effect in the observed equilibrium. The estimated effect is a weighted average of a larger effect among workers who use the reputation system and a zero effect among those who don't.

24 Assuming errors are independent, the expected cost is 2c[p² + (1 − p)²] + 3c[1 − p² − (1 − p)²]. The probability of a correct decision is p² + 2p²(1 − p).

25 To hold skill constant, the RA omitted any tasks requiring master's qualification. The task classifications were uncorrelated with Turkopticon scores.

26 RA2 was not able to complete all jobs sent by RA1. Some expired quickly. Also, bad-reputation employers' jobs were more likely to be so dysfunctional as to be unsubmittable.

27 Counts are lower for wage rates because the blinded RA lost track of time to completion for some tasks.

28 On July 27, 2014, Amazon began requiring employers to post a legal personal or company name, physical address, and Social Security number or employer identification number.

29 The control group is the time when the reputation system is up.

30 Reputations are defined as in experiment 2.

31 According to Manning (2011, p. 978), "If one thinks of frictions as being caused by a lack of awareness of where vacancies are . . . then one might have expected a large effect of the Internet. But if . . . one thinks of frictions as coming from idiosyncracies in the attractiveness of different jobs . . . then one would be less surprised that the effects of the Internet seem to be more modest."
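The cost and accuracy expressions in endnote 24 (two workers complete a task at cost c each; a third is hired only if the first two disagree; errors are independent) can be checked numerically. A minimal sketch, where the function names and the illustrative values p = 0.9 and c = 0.05 are our own and not from the paper:

```python
def expected_cost(p: float, c: float) -> float:
    """Expected total payment: 2c when the first two workers agree,
    3c when a tie-breaking third worker must be hired."""
    agree = p**2 + (1 - p)**2  # both correct or both wrong
    return 2 * c * agree + 3 * c * (1 - agree)

def p_correct(p: float) -> float:
    """Probability the majority decision is correct: the first two agree
    correctly, or they split and the tie-breaker is correct."""
    return p**2 + 2 * p**2 * (1 - p)

# Illustrative (hypothetical) values: per-worker accuracy 0.9, per-task cost $0.05
print(round(expected_cost(0.9, 0.05), 4))
print(round(p_correct(0.9), 4))
```

With these values, the expected cost sits between 2c and 3c, and the majority decision is more accurate than any single worker, which is the logic the endnote relies on.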

References
Agrawal A, Horton J, Lacetera N, Lyons E (2015) Digitization and the contract labor market: A research agenda. Goldfarb A, Greenstein SM, Tucker CE, eds. Economic Analysis of the Digital Economy (University of Chicago Press, Chicago), 219–250.

Agrawal AK, Lacetera N, Lyons E (2013) Does information help or hinder job applicants from less developed countries in online markets? NBER Working Paper No. 18720, National Bureau of Economic Research, Cambridge, MA.

Apte UM, Mason RO (1995) Global disaggregation of information-intensive services. Management Sci. 41(7):1250–1262.

Ba S, Pavlou PA (2002) Evidence of the effect of trust building technology in electronic markets: Price premiums and buyer behavior. MIS Quart. 26(3):243–268.

Bajari P, Hortacsu A (2003) The winner's curse, reserve prices, and endogenous entry: Empirical insights from eBay auctions. RAND J. Econom. 34(2):329–355.

Banerjee AV, Duflo E (2000) Reputation effects and the limits of contracting: A study of the Indian software industry. Quart. J. Econom. 115(3):989–1017.

Barach M, Golden J, Horton J (2019) Steering buyers to selected sellers: The role of platform incentives and credibility. Management Sci. Forthcoming.

Bartling B, Fehr E, Schmidt KM (2013) Use and abuse of authority: A behavioural foundation of the employment relation. J. Eur. Econom. Assoc. 11(4):711–742.

Bernhardt A, Spiller MW, Theodore N (2013) Employers gone rogue: Explaining industry variation in violations of workplace laws. Indust. Labor Relations Rev. 66(4):808–832.

Bernhardt A, Milkman R, Theodore N, Heckathorn D, Auer M, DeFilippis J, Gonzalez AL, et al. (2009) Broken Laws, Unprotected Workers (NELP National Employment Law Project, New York).


Board S, Meyer-ter Vehn M (2013) Reputation for quality. Econometrica 81(6):2381–2462.

Bobo K (2011) Wage Theft in America (New Press, New York).

Brown J, Matsa DA (2016) Boarding a sinking ship? An investigation of job applications to distressed firms. J. Finance 71(2):507–550.

Cabral L, Hortacsu A (2010) The dynamics of seller reputation: Evidence from eBay. J. Indust. Econom. 58(1):54–78.

Chauvin KW, Guthrie JP (1994) Labor market reputation and the value of the firm. Managerial Decision Econom. 15(6):543–552.

Cornes R, Sandler T (1994) The comparative static properties of the impure public good model. J. Public Econom. 54(3):403–421.

Dellarocas C, Wood CA (2008) The sound of silence in online feedback: Estimating trading risks in the presence of reporting bias. Management Sci. 54(3):460–476.

Difallah DE, Catasta M, Demartini G, Ipeirotis PG, Cudré-Mauroux P (2015) The dynamics of micro-task crowdsourcing: The case of Amazon MTurk. Gangemi A, Leonardi S, Panconesi A, eds. Proc. 24th Internat. Conf. World Wide Web (ACM, New York), 238–247.

Dube A, Jacobs J, Naidu S, Suri S (2018) Monopsony in online labor markets. NBER Working Paper No. 24416, National Bureau of Economic Research, Cambridge, MA.

Farronato A, Fradkin A, Larsen B, Brynjolfsson E (2018) Consumer protection in an online world: When does occupational licensing matter? Working paper, Harvard University, Cambridge, MA.

Filippas A, Horton JJ, Golden J (2018) Reputation inflation. Proc. 2018 ACM Conf. Econom. Comput. (ACM, New York), 483–484.

Fradkin A, Grewal E, Holtz D, Pearson M (2015) Bias and reciprocity in online reviews: Evidence from field experiments on Airbnb. Roughgarden T, ed. Proc. 16th ACM Conf. Econom. Comput. (ACM, New York), 641.

Gao GG, Greenwood BN, Agarwal R, McCullough JS (2015) Vocal minority and silent majority: How do online ratings reflect population perceptions of quality? MIS Quart. 39(3):565–589.

Guo X, Gong J, Pavlou P (2018) Enhancing the "call for bids" to improve matching efficiency in online labor markets: Capturing the meaning of unstructured textual content with machine learning. Working paper, Temple University, Philadelphia.

Hannon JM, Milkovich GT (1996) The effect of human resource reputation signals on share prices: An event study. Human Resource Management 35(3):405–424.

Harris SD, Krueger AB (2015) A proposal for modernizing labor laws for twenty-first-century work: The independent worker. The Hamilton Project Discussion Paper 10, Brookings Institution, Washington, DC.

Horton J (2019) Buyer uncertainty about seller capacity: Causes, consequences, and a partial solution. Management Sci., ePub ahead of print May 6, https://doi.org/10.1287/mnsc.2018.3116.

Horton JJ, Chilton LB (2010) The labor economics of paid crowdsourcing. Proc. 11th ACM Conf. Electronic Commerce (ACM, New York), 209–218.

Hui X, Saeedi M, Shen Z, Sundaresan N (2016) Reputation and regulations: Evidence from eBay. Management Sci. 62(12):3604–3616.

Ipeirotis PG (2010a) A plea to Amazon: Fix Mechanical Turk. Accessed July 9, 2019, https://www.behind-the-enemy-lines.com/2010/10/plea-to-amazon-fix-mechanical-turk.html.

Ipeirotis PG (2010b) Analyzing the Amazon Mechanical Turk marketplace. XRDS Crossroads ACM Magazine Students 17(2):16–21.

Irani L (2012) Microworking the crowd. Limn (2): https://escholarship.org/uc/item/7vc0r3sh.

Johnson MS (2019) Regulation by shaming: Deterrence effects of publicizing violations of workplace safety and health laws. Working paper, Duke University, Durham, NC.

Jøsang A, Ismail R, Boyd C (2007) A survey of trust and reputation systems for online service provision. Decision Support Systems 43(2):618–644.

Klein B, Leffler KB (1981) The role of market forces in assuring contractual performance. J. Political Econom. 89(4):615–641.

Krueger A (2017) Independent workers: What role for public policy? Ann. Amer. Acad. Political Soc. Sci. 675(1):8–25.

Lafer G (2013) The legislative attack on American wages and labor standards, 2011–2012. Economic Policy Institute Briefing Paper 364, Economic Policy Institute, Washington, DC.

Levine SS, Prietula MJ (2013) Open collaboration for innovation: Principles and performance. Organ. Sci. 25(5):1414–1433.

Lifsher M (2014) California cracks down on wage theft by employers. Los Angeles Times (October 23), https://www.latimes.com/business/la-fi-wage-theft-action-20141024-story.html.

List J, Momeni F (2017) When corporate social responsibility backfires: Theory and evidence from a natural field experiment. NBER Working Paper No. 24169, National Bureau of Economic Research, Cambridge, MA.

List JA (2006) The behavioralist meets the market: Measuring social preferences and reputation effects in actual transactions. J. Political Econom. 114(1):1–37.

Luca M (2016) Reviews, reputation, and revenue: The case of Yelp.com. Harvard Business School Working Paper No. 12-016, Harvard University, Cambridge, MA.

Manning A (2011) Imperfect competition in the labor market. Card D, Ashenfelter O, eds. Handbook of Labor Economics, vol. 4, part B (Elsevier, Amsterdam), 973–1041.

Marinescu I, Klein N, Chamberlain A, Smart M (2018) Incentives can reduce bias in online reviews. NBER Working Paper No. 24372, National Bureau of Economic Research, Cambridge, MA.

Mason W, Watts DJ (2010) Financial incentives and the performance of crowds. ACM SigKDD Explorations Newsletter 11(2):100–108.

McDevitt RC (2011) Names and reputations: An empirical analysis. Amer. Econom. J. Microeconom. 3(3):193–209.

Meier B, Ivory D (2017) Federal rules on worker safety and record keeping are likely targets for rollbacks. New York Times (March 13), https://www.nytimes.com/2017/03/13/business/us-worker-safety-rules-osha.html.

Moreno A, Terwiesch C (2014) Doing business with strangers: Reputation in online service marketplaces. Inform. Systems Res. 25(4):865–886.

Nagaraj A (2016) Does copyright affect reuse? Evidence from the Google Books digitization project. Working paper, University of California, Berkeley, Berkeley.

Nosko C, Tadelis S (2015) The limits of reputation in platform markets: An empirical analysis and field experiment. NBER Working Paper No. 20830, National Bureau of Economic Research, Cambridge, MA.

Osterman P (2018) In search of the high road: Meaning and evidence. ILR Rev. 71(1):3–34.

Oyer P, Schaefer S (2011) Personnel economics: Hiring and incentives. Card D, Ashenfelter O, eds. Handbook of Labor Economics, vol. 4, part B (Elsevier, Amsterdam), 1769–1823.

Pallais A (2014) Inefficient hiring in entry-level labor markets. Amer. Econom. Rev. 104(11):3565–3599.

Resnick P, Zeckhauser R (2002) Trust among strangers in internet transactions: Empirical analysis of eBay's reputation system. Econom. Internet E-commerce 11(2):23–25.

Rodgers WM, Horowitz S, Wuolo G (2014) The impact of client nonpayment on the income of contingent workers: Evidence from the Freelancers Union independent worker survey. Indust. Labor Relations Rev. 67(3 suppl):702–733.

Rosenblat A, Levy KE, Barocas S, Hwang T (2017) Discriminating tastes: Uber's customer ratings as vehicles for workplace discrimination. Policy Internet 9(3):256–279.

Ross J, Zaldivar A, Irani L, Tomlinson B (2009) Who are the Turkers? Worker demographics in Amazon Mechanical Turk. Technical report, University of California, Irvine, Irvine.


Rubin DB (1974) Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educational Psych. 66(5):688–701.

Silberman MS (2013) Dynamics and governance of crowd work markets. Presentation, International Workshop on Human Computation and Crowd Work, September 30, Karlsruhe, Germany.

Silberman MS, Irani L (2016) Operating an employer reputation system: Lessons from Turkopticon, 2008–2015. Comparative Labor Law Policy J. 37(3):472–505.

Silberman MS, Ross J, Irani L, Tomlinson B (2010) Sellers' problems in human computation markets. Proc. ACM SIGKDD Workshop Human Comput. (ACM, New York), 18–21.

Stanton C, Thomas C (2015) Landing the first job: The value of intermediaries in online hiring. Rev. Econom. Stud. 83(2):810–854.

Stinchcombe AL (1965) Social structure and organizations. March JG, ed. Handbook of Organizations (Rand McNally, Chicago), 142–193.

Turban DB, Cable DM (2003) Firm reputation and applicant pool characteristics. J. Organ. Behav. 24(6):733–751.

U.S. Department of Labor Wage and Hour Division (2016) Working for a fair day's pay. Accessed July 9, 2019, https://archive.fo/DGVX1.

U.S. Government Accountability Office (2015) Contingent workforce: Size, characteristics, earnings, and benefits. Accessed July 9, 2019, https://www.gao.gov/products/GAO-15-168R.

Weil D (2014) The Fissured Workplace (Harvard University Press, Cambridge, MA).
