Modelling Cooperation and Competition in Urban Retail

Modelling Cooperation and Competition in Urban

Retail Ecosystems with Complex Network Metrics

Jordan Cambe∗†

Laboratoire de Physique,

ENS de Lyon

[email protected]

Krittika D’Silva∗

Computer Laboratory,

University of Cambridge

[email protected]

Anastasios NoulasNew York University

[email protected]

Cecilia MascoloComputer Laboratory,

University of Cambridge

[email protected]

Adam WaksmanFoursquare Labs

[email protected]

Abstract

Understanding the impact that a new business has onthe local market ecosystem is a challenging task as itis multifaceted in nature. Past work in this space hasexamined the collaborative or competitive role of ho-mogeneous venue types (i.e. the impact of a new book-store on existing bookstores). However, these priorworks have been limited in their scope and explana-tory power. To better measure retail performance ina modern city, a model should consider a number offactors that interact synchronously. This paper is thefirst which considers the multifaceted types of inter-actions that occur in urban cities when examining theimpact of new businesses. We first present a model-ing framework which examines the role of new busi-nesses in their respective local areas. Using a lon-gitudinal dataset from location technology platformFoursquare, we model new venue impact across 26major cities worldwide. Representing cities as con-nected networks of venues, we quantify their structureand characterise their dynamics over time. We note astrong community structure emerging in these retailnetworks, an observation that highlights the interplayof cooperative and competitive forces that emerge inlocal ecosystems of retail establishments. We next de-vise a data-driven metric that captures the first-ordercorrelation on the impact of a new venue on retailerswithin its vicinity accounting for both homogeneousand heterogeneous interactions between venue types.Lastly, we build a supervised machine learning modelto predict the impact of a given new venue on its local

∗These two authors contributed equally and are listed alpha-betically.

†Work was performed while a visitor at the University ofCambridge. Complete affiliation: Univ Lyon, ENS de Lyon,UCB Lyon 1, CNRS, Laboratoire de Physique, F-69342 Lyon,France; Institut Rhonalpin des Systemes Complexes, IXXI, F-69342 Lyon, France

retail ecosystem. Using two classes of features that ac-count for the presence of different venue types as wellas how these venue form cooperative networks, themodel achieves area under the curve above 80% scoresfor certain venue categories. Our approach highlightsthe power of complex network measures in buildingmachine learning prediction models. These modelshave numerous applications within the retail sector.More broadly, however, the methodology and resultscan support policymakers, business owners, and urbanplanners in the development of models to characterizeand predict changes in urban settings.

Keywords— Urban mobility, Spatio-temporal pat-terns, Predictive modeling

1 Introduction

Location is known to be highly influential in the success ofa new business opening in a city. Where a business is posi-tioned across the urban plane not only determines its reachby clienteles of relevant demographics, but more critically,it determines its exposure to a local ecosystem of busi-nesses who strive to increase their own share in a local mar-ket. The types of businesses and brands that are present inan urban neighborhood in particular has been shown [13]to play a vital role in determining whether a new retailfacility will grow and blossom, or instead whether it willbecome a sterile investment and eventually close. Compe-tition is nonetheless only one determinant in retail success.How a local business establishes a cooperative network withother places in its vicinity has been shown to also play adecisive role in its sales growth [4]. Local businesses cancomplement each other by exchanging customer flows withregards to activities that succeed each other (e.g. going toa bar after dining at a restaurant), or through the forma-tion of urban enclaves of similar local businesses that giverise to characteristic identities that then become recognis-able by urban dwellers. A classic example of the latteris the presence of many Chinese restaurants in a China-town [20].

1

arX

iv:2

104.

1398

1v1

[ph

ysic

s.so

c-ph

] 2

8 A

pr 2

021

It is therefore natural to hypothesize that the rise of abusiness lies on the complex interplay between cooperationand competition that manifests in a local area. Measuringthese cooperative and competitive forces in a city remains,however, a major challenge. Today’s cities change rapidlydriven by urban migration and phenomena such as gentri-fication as well as large urban development projects, whichcan lead to shops opening and closing at increasing rates.In 2011, the fail rate of restaurants in certain cities, such asNew York, was as high as 80% 1 with some businesses clos-ing in only a matter of months. A similar picture has beenreported for high street retailers in the United Kingdomwith part of the crisis being attributed to the increasingdominance of online retailers 2.

Data generated in location technology platforms by mo-bile users who navigate the city provides a unique oppor-tunity to respond to the aforementioned challenges. Inaddition to providing quick updates, in almost real time,on the places that open (or close) in cities - thus accu-rately reflecting the set of local businesses in a given area- they offer a view on urban mobility flows between areasand places at fine spatial and temporal scales. The abil-ity to describe these two dimensions of urban activity -places and mobility - paves the way for measuring the im-pact, either positive or negative, that retail facilities haveon each another. In this work, we harness this opportu-nity, building on a longitudinal dataset by Foursquare thatdescribes mobility interactions between places in 26 citiesaround the world. Our contributions are summarized inmore detail in the following:

• Detecting patterns of cooperation in urban ac-tivity networks: We model businesses in a city asa connected network of nodes belonging to differentactivity types. We examine the properties of thesenetworks spatially and temporally. With respect to anull network model, we observe higher clustering co-efficient, higher modularity and lower closeness cen-trality scores which are indicators of strong tenden-cies for local businesses to cluster and form collab-orative communities that exchange customer flows.In numerical terms, the modularity of urban activitynetworks is ≈ 0.6 relative to the corresponding nullmodels with ≈ 0.15.

• Measuring the impact of new businesses us-ing spatio-temporal metrics: We next model theimpact of a new business opening in a given area.Previous work [4] conducted a preliminary analysisof homogeneous impact of new businesses. However,this work was limited as it did not isolate numer-ous contributing factors such as whether multiple newvenues have opened when attributing the impact to agiven business. We develop a methodology more ro-bust to bias through the use of spatio-temporal filtersMoreover our approach generalizes to heterogeneousinteractions between venue types present in an urbansystem by considering impact measurements betweendifferent venue categories (e.g. measuring the impact

1https://www.businessinsider.com/new-york-

restaurants-fail-rate-2011-82https://www.theguardian.com/cities/ng-interactive/

2019/jan/30/high-street-crisis-town-centres-lose-8-

of-shops-in-five-years

of a restaurant to a bar). Notably, we observe thatthe opening of a Fast Food Restaurant near otherFast Food Restaurants results in the most significantcompetition, with a median decline in customer flowsof 21% over 6 months. We also show how this com-petitive ranking can be created for heterogeneous cat-egories.

• Predicting optimal retail environment usingurban dynamics and network topology mea-sures: Lastly, we built a supervised learning modelto predict the impact of multiple new venues on anexisting venue. We incorporate additional networkmetrics into our model to consider the urban envi-ronment of a given venue and its capacity to oper-ate cooperatively with other places in vicinity. Weobserve significant heterogeneity between categorieswhere some have more predictable trends while oth-ers have greater variations. In light of this hetero-geneity across venue types we tailor supervised learn-ing models by training them in a manner that re-flects the idiosyncrasies of its category. Despite theinherent difficulty of the prediction task, due to themulti-factorial nature of dynamic and complex inter-actions in retail ecosystems, our results suggest thatincorporating complex signals in predictive machinelearning frameworks can offer meaningful insight inreal world application scenarios. For certain businesscategories, AUC scores above 0.7 are consistently at-tained, whereas complex network metrics consistentlyboost the performance of classifiers offering a clearadvantage over baseline methods.

Our results are especially important in a digital agewith shifting customer preferences as physical business areforced to adapt to remain competitive. Our methodologycan enable a better understanding of interactions withinlocal retail ecosystems. Modern data and methods, suchas those employed in the present work, not only can allowfor monitoring these phenomena at scale, but also offernovel opportunities for retail facility owners to assess therisk of opening a new venture through location-based ana-lytics. Similar methods can be applied beyond the scope ofthe retail sector we study here, namely for urban planningand innovation e.g. through assessing the impact of open-ing transport hubs, leisure and social centers or health andsanitation facilities in city neighborhoods.

2 Related Work

Understanding retail ecosystems and determining the op-timal location for a business to open have been longbeen questions in operations research and spatial eco-nomics [10, 7]. Compared to modern approaches, thesemethods were characterized by static datasets inform-ing on population distribution across geographies, trackedthrough census surveys and the extraction of retail catch-ment areas through spatial optimization methods [1].Gravity models on population location and mobility laterbecame a common approach for site placement of newbrands [11].

The availability of spatio-temporally granular urbandatasets and the popularization of spatial analysis meth-

2

https://www.businessinsider.com/new-york-restaurants-fail-rate-2011-8

https://www.businessinsider.com/new-york-restaurants-fail-rate-2011-8

https://www.theguardian.com/cities/ng-interactive/2019/jan/30/high-street-crisis-town-centres-lose-8-of-shops-in-five-years



ods in the past decade led to a new generation of ap-proaches to quantify retail success in cities. In this line,network-based approaches have been proposed to under-stand the retail survival of local businesses through qual-ity assessment on the interactions of urban activities lo-cally [13]. In addition to networks of places, street networkanalysis emerged as an alternative medium to understandcustomer flows in cities, with various network centralitybeing proposed as a proxy to understand urban economicactivities [3, 17]. While this previous work examines theimpact of new businesses, it is limited in scope. It consid-ers only homogeneous categories, does not compare net-work properties across different cities, and does not builda prediction model. This is where the primary novelty ofthe present paper lies.

More recently, machine learning and optimization meth-ods have been introduced to solve location optimizationproblems in the urban domain, focusing not only on re-tail store optimization [14] but also real estate ranking [9]amongst other applications. Location technology plat-forms such as Foursquare opened the window of oppor-tunity for customer mobility patterns to be studied at finespatio-temporal scales [6, 5] and moreover, semantic anno-tations on places presented direct knowledge on the typesof urban activities that emerge geographically and led toworks that allowed for the tracking and comparing urbangrowth patterns at global scale [4]. Closer to the spiritof the present work from a modelling perspective, the au-thors in [12] study co-location patterns of urban activitiesin Boston and subsequently recommend areas where cer-tain types of activities may be missing.

3 Dataset Description

Within the last decade, Online Location-based Serviceshave experienced a surge in popularity, attracting hun-dreds of millions of users worldwide. These systems havecreated troves of data which describe, at a fine spatio-temporal granularity, the ways in which users visit differ-ent businesses and areas of a city. We hypothesize thesedata can be used to build a predictive model of the im-pact of a new venue on the surrounding businesses. Tothis end, we utilize data from Foursquare, a location tech-nology platform with a consumer application that allowsusers to check into different locations. As of August 2015,Foursquare had more than 50 million active users and morethan 10 billion check-ins [19].

The basis of our analysis is a longitudinal dataset from26 cities that spans three years, from 2011 to 2013, andincluded over 80 million checkins. We aggregate data fromthe 10 most represented cities in North America3 and the16 most represented cities in Europe4. A summary of ourdata is described in Table 1.

For each venue, we have the following information: geo-graphic coordinates, specific and general category, creationdate, total number of check-ins, and number of unique

3North American cities are: Austin, Boston, Dallas, SanFrancisco, New York City, Houston, Las Vegas, Los Angeles,Toronto, Washington.

4European cities are: Amsterdam, Antwerpen, Barcelona,Berlin, Brussels, Budapest, Copenhagen, Gent, Helsinki, Kiev,Madrid, Milano, Paris, Prague, Riga, London

Region#

venues# newvenues

# transi-tions

North America 94,094 29,552 43,200,432Europe 101,101 40,275 44,600,446Total 195,195 69,827 87,800,878

Table 1: Foursquare dataset description for 10 NorthAmerican cities and 16 European cities.

visitors. The specific and general categories fall withinFoursquare’s API of hierarchical categories. A full list ofthe categories can be found by querying the FoursquareAPI [8]. The dataset also contains a list of transitionswithin a given city. A transition is defined as a pair ofcheck-ins by an anonymous user to two different venueswithin the span of three hours. It is identified by a starttime, end time, source venue, and destination venue. Weconsider the set of venues V in a city. A venue v ∈ Vis represented with a tuple < loc, date, category > whereloc is the geographic coordinates of the venue, date is itscreation date, and category is the specific category of thevenue. The creation date, date, for a given venue refers tothe date it was added to the Foursquare platform. Priorwork by Daggitt et al. [4] showed that across all citieswhen examining the number of venues added per month,the last 20% of venues were new venues rather than exist-ing venues added to the database for the first time. Weapply this methodology to all cities and define new venuesas those that fall within the last 20% of venues added toFoursquare for that city.

4 Urban Activity Networks

We begin by examining transitions between Foursquarevenues of different category types that we refer to as urbanactivities. While we are considering a mix of categoriesusers check in in the city, our focus from an analysis andmodelling point of view will be focusing on urban activitiescorresponding to retail establishments (e.g. restaurants).

4.1 Visualizing Mobility Interactions

To visualize an urban activity network, we create a graphGi for each city i, where the set of nodes Ncat is the setof business categories defined previously in Section 3. Inthis network, business categories are linked by weighteddirected edges es→d. A directed link is created from thesource category cs to the destination category cd if at leastone transition happens during the time window we con-sider (e.g. weekend, weekday, or a period of hours duringa day). Thus, the weight of each edge is proportional tothe total number of transitions from the source category tothe destination category for the particular time period ofinterest for each city. The weights are then normalized bythe total number of check-ins that occurred at cd. There-fore, the weight can be interpreted as the percentage ofcustomers of cd who come from cs. To eliminate insignifi-cant links, we filter out edges that have less than 50 tran-sitions total. We examine two time intervals of interest:

3

Figure 1: Network visualization of categories during the evening in London (left) and Paris (right). Differentcolors represent different Louvain communities [2]. We see clusters of travel and transport (blue), nightlife(red), and shopping activities (green).

morning AM (6am-12pm) and evening PM (6pm-12am).In Figure 1 we visualize the network in the evening for

two cities, London and Paris. The colors represent differ-ent communities, obtained using the Louvain communitydetection algorithm [2]. Further, the size of nodes is pro-portional to their degree. This visualization, as one ex-ample, describes similarities and variations in the struc-ture of urban activities in different cities. We observean underlying common structure for the two cities, eventhough cultural distinctions can also be noted. We haveobserved a similar pattern across different cities which wedon’t visualize due to lack of space. In terms of similar-ity in network structure, we see a shopping cluster (green)centered around Department Stores; a cluster for traveland transport (blue) centered around categories such asTrain Stations and Subways; a leisure cluster (light brown)centered around Plazas and containing outdoor categories(e.g. Parks, Gardens, Soccer Stadiums). On the otherhand, differences in network structure become also appar-ent. We note for instance how recreation activities in theevenings differs across the two cities. London has a con-siderably large nightlife cluster (red) centered around pubsfrom which a number of different nightlife categories un-fold (e.g. Nightclubs, restaurants of different types, The-ater). Paris is more segregated and contains two nightlifeclusters: one cluster around French Restaurants (red)linked to Coffee Shops, Theaters, Nightclubs; and anothercluster (gray) centered around Bars which contains FoodTrucks, Fast Food Restaurants, and Music Venues. Thisdichotomy translates to the presence of two classes of cus-tomers each of which adheres to different types of activitysequences during nighttime. Another observation is re-garding variations in network structure over time: the Cof-fee Shop category in London is separated from the nightlifecluster, which may indicate different kind of customer be-haviors between daytime and evening. Interestingly, wealso see associations emerging between types of businesses.Taking Paris as an example, French Restaurants interact alot with Coffee Shops and Nightclubs and so do Bars withFood Trucks. In both cities, Coffee Shops are drawing

crowds from Subways, Toy Stores with Electronics Storesand Sport Stores.

Overall, these results suggest strong structural charac-teristics in urban activity networks where different cate-gories of places form interaction patterns of cooperation,where mobile users move from one to the other. Competi-tion on the other hand manifests in a more implicit mannerin the network in two ways: first, retail facilities that aregrouped in the same node (e.g. Bars) have to share cus-tomers that have been previously performing a differentactivity (e.g. going to a Restaurant) and second, throughactivities that do not share an edge in the network and asa result they do not interact with one another in terms ofmobility patterns.

4.2 Network Properties

We next quantify the structure of these networks in termsof different network properties considering also differenttime intervals. For our two cities of comparison, we listthe network metrics in Table 2 and enlist those next.

• The average clustering coefficient, < C >, is the ten-dency of categories to form triangles, that is to gatherlocally into fully connected groups. It varies between0 and 1 with higher values implying a higher numberof triangles in the network (see [16] for more details).

• The average closeness centrality, < Cc >, is the av-erage length of the shortest path between the nodesand here accounts for the tendency of categories to beclose to each in terms of shortest paths [16]. It variesbetween 0 and 1 where a higher closeness centralityscore for a node suggests higher proximity to othernodes in the network.

• The modularity, Q, is a well established metric indi-cating how well defined communities are within thenetwork [2]. Modularity values fall within the range[−1, 1], with greater positive values indicating greaterpresence of community structure.

4

London Paris

AMRandom

AMPM

RandomPM

AMRandom

AMPM

RandomPM

# ofnodes

176 176 199 199 157 157 150 150

# ofedges

1727 1125 2539 1636 1708 1084 1818 1105

< C > 0.655 0.361 0.657 0.410 0.671 0.456 0.701 0.437< Cc > 0.298 0.426 0.373 0.444 0.367 0.445 0.390 0.454

Q 0.380 0.172 0.398 0.153 0.326 0.156 0.310 0.147

Table 2: Network metrics for London and Paris for during the morning AM (6am - 12pm) and evening PM(6pm-12am). These metrics are compared to a configuration model (Random).

We compare our network metrics to a random baseline,a configuration model[15], which maintains the degree dis-tributions of our networks, in Table 2. The comparisonwith the null model provides an indication of how signifi-cant empirical observations are with respect to the randomcase. First we note that for all three metrics the real net-works are very different to the corresponding null models.In general, high clustering coefficient and modularity to-gether with a lower closeness centrality scores point to thetendency of local businesses to form significantly tight clus-ters that are well isolated from one another. Furthermore,we also investigated variations of these networks propertiesfor different period of the day. We observed in some casesthat the closeness centrality was higher in the evening rel-ative to morning hours, which is the case of London forexample (see Table 2).

Looking closer at the network modularity scores pre-sented in Table 2 we note a partitioning of different cat-egories into communities with scores around 0.3/0.4 forboth cities compared to much smaller values ≈ 0.15 forthe null model. Finally, the similarity in terms of net-work properties values between the two cities, as well asthe prominent community structure in both suggest thatthe hypothesis that the organization of the retail businessecosystem is similar across cities is a plausible one. Thisis true to a certain degree, nonetheless variations are alsonoted due to apparent cultural differences.

In this section, we highlighted the dominance of commu-nity structure and local clustering in urban activity net-works. This observation suggests that categories gain (orlose) attractiveness as a result of other activities aroundthem. It further raises the question of how businesses af-fect each other and in particular how these dynamics alterwhen a new business opens in proximity of another one.In the next section we perform an impact analysis withregards to opening events of new retail facilities aiming toshed light to the aforementioned questions.

5 Measuring Impact

In this section we examine the impact of new businesseson other establishments within their vicinity. Previouswork [4] has considered the homogeneous impact of a newbusiness opening, that is the impact that venue categorieshave on categories of the same type (e.g. the impact of a

new Coffee Shop on another Coffee Shop). In this sectionwe generalize the impact metric to heterogeneous mixesof categories, and develop a methodology more robust tospatio-temporal bias effects.

5.1 Spatio-temporal Scope of Impact

To measure the impact of a new venue opening in an areawe need to define its geographic scope. We define thespatial neighborhood of a venue as the set of venues thatare located within the radius rs. Formally, we define thespatial neighbourhood of a venue vn as:

SN(vn, rs) = {ve ∈ V : dist(vn, ve) < rs ∧ vn 6= ve} (1)

where V is the set of venues in the city and dist(vn, ve) isthe Euclidean distance between venue vn and ve.

Further we introduce a similar notion across the tempo-ral dimension. In particular, given a temporal radius rt,then two businesses opening within rt are considered tobe temporal neighbors. Formally, we define the temporalneighbors of a venue vn as:

TN(vn, rt) = {ve ∈ V : t dist(vn, ve) < rt ∧ vn 6= ve} (2)

where t dist is the difference in the number of monthsbetween the creation date of vn and ve. Finally, we defineWT as the temporal window of observation, i.e. the totalperiod in months over which we measure the number ofcheck-ins at a given business.

5.2 Measuring Impact

5.2.1 Defining the impact formula

We base our impact metric Ivn(ve, tvn) of a new venuevn on an existing venue ve on the metric introduced inDagitt et al. [4]. This is defined as the normalized num-ber of transitions for an existing venue in a specific timeperiod prior to and after a new venue opens in its spatialneighborhood. We define the number of check-ins duringthe posterior time interval as follows:

Cpostvn (ve, tvn ,∆t) =

tvn+∆t−1∑d=tvn

nve (d, d + 1) (3)

This calculates the sum of check-ins at venue ve, nve , afterthe opening of vn at tvn and over a period of ∆t = WT /2

5

CategoryMedianImpact

% Busi-nessesI < 1

# Busi-nesses

Fast FoodRestau-

rants0.79 67.9 28

Bakeries 0.81 73.1 26PizzaPlaces

0.81 69.3 39

CoffeeShops

0.84 66.4 202

SandwichPlaces

0.84 62.0 63

Table 3: Ranking of the categories which have thestrongest homogeneous negative impact.

months. We define the number of checkins during the priortime interval as follows:

Cpriorvn (ve, tvn ,∆t) =

tvn−∆t−1∑d=tvn

nve (d− 1, d) (4)

Similarly, this calculates the number of check-ins overthe period of ∆t months before the opening of vn. For-mally, the impact metric is defined as follows:

Ivn(ve, tvn ,∆t) =Cpost

vn (ve, tvn ,∆t)/Ncat(tvn , tvn + ∆t)

Cpriorvn (ve, tvn ,∆t)/Ncat(tvn , tvn −∆t)

(5)Ncat(tvn , tvn + ∆t) (and respectively Ncat(tvn , tvn −∆t))is the total number of check-ins to all businesses of ve’scategory during the period [tvn , tvn + ∆t] (respectively[tvn , tvn − ∆t]). This normalizing factor takes into ac-count both the background trend of the category and thepotential season effect which can occur in the market. Thisfactor is calculated at a per city level.

5.2.2 Impact metric interpretation

Above, we define Ncat(tvn , tvn + ∆t) as the total numberof check-ins to all businesses of a specific category. We de-fine the market share of a given venue as the total numberof check-ins to that venue as a percentage of Ncat. Usingthe impact metric defined in Section 5.2.1 we can calculatethe percentage of market share gained or lost by venue veafter the opening of venue vn. For example, if vn is a cloth-ing store and ve is a sandwich shop Ivn(ve, tvn ,∆t) = 1.2means the market share of venue ve (the sandwich shop)increased by 20% over the ∆t months following the openingof vn (the clothing store). Conversely, a number below 1means the market share of the venue ve decreased after theopening of venue vn, suggesting the new venue was a com-petitor. For the example above if Ivn(ve, tvn ,∆t) = 0.81this would correspond to the market share of venue ve (thesandwich shop) decreasing by 19% over the ∆t months fol-lowing the opening of vn (the clothing store).

(a) Effect of the size of the spatial radius on the num-ber of data points when rt is set to 3 months. Anoptimal value of rs is drawn for 100 m.

(b) Effect of the size of the temporal radius on thenumber of data points when rs is set to 100m. Thenumber of data points continues to decrease for rt > 1months.

Figure 2: Tuning the spatial and temporal parametersof our model.

5.3 Tuning Spatial and Temporal Win-dows

A major challenge in analyzing correlations in a complexsystem is implementing a methodology which can limitthe effect of potential hidden variables which may bias theobserved correlations.

For our model, our methodology must isolate the impactof one business in constantly changing neighborhoods. Forthis reason, as a first analysis, we set rt to ∆t = WT /2,that is half of the window of observation, and we onlychoose a pair (vn, ve) if the new business vn has no tem-poral neighbors (defined above in Equation 2). In otherwords, if we set ∆t = WT /2 = 3 months, we only measurethe impact of vn on ve if ve was open at least 3 monthsbefore vn and if no other business opened in the spatialneighborhood of ve during [tvn −∆t, tvn + ∆t]. We applythis form of filtering to isolate the confounding effect ofmultiple businesses opening next to each other. This nat-urally introduces a trade-off with regards to the numberof data points considered for our measurement.

The number of data points resulting from our approachdepends on two parameters. The spatial radius rs andthe temporal radius rt. As shown in Figure 2(a) where rtis fixed at 3 months, when the spatial radius is small theprobability of a new business nearby opening is low. Hencethe number of data points remains small as well. As thespatial radius increases so does the number of data pointsuntil a maximum is reached after which it decreases again.This likely due to the fact that after a certain spatial ra-dius the probability of finding only one business openingwithin rs decreases as well. Based on these results we setparameter rs to 100 meters for our subsequent analysis.We apply the same reasoning for tuning the temporal ra-

6

Figure 3: Plot of the impact measure of Coffee Shopson Burger Joints. Each column is a Burger Joint forwhich only a Coffee Shop opened in its neighborhoodduring a period ∆t = 3 months. We see that 37/56joints are over I = 1, that is 66% of Burger Joints havebeen impacted negatively by the opening of a CoffeeShop.

dius rt setting rs to 100m. However, tuning rt is morechallenging. Our hypothesis is that after a time period,tvn + rt, the positive or negative impact of a new businesswill become stable within the scope of a neighborhood.After this period of time, vn will be considered as partof the baseline of the neighborhood, when examining theimpact of future new venues. Our hypothesis is that asrt increases, the less likely we are to be considering othernew shops that are interfering with the impact we measure.Figure 2(b) shows that the number of data points steadilydecreases from rt = ∆t > 1 month onwards. This im-plies that the optimal value for rt in terms of data pointsis smaller than one month. However, as suggested ear-lier increasing rt limits the risk to consider previously newvenues that opened just before the time window of interestin the impact measure. For this reason we choose a valueof rt longer than one month. For the analysis describedbelow, we set rt to three months.

5.4 Measuring Impact on Retail Activ-ity

We perform the analysis using the aforementioned setupon the 16 most popular retail categories and aggregateddata from the 26 cities listed above in Section 3. We ag-gregated data from these cities based on observed similar-ities in consumer trends as discussed in Section 4. Thisenabled us to build a set of 10, 238 businesses. The top16 retail categories examined were as follows: Bakeries,Bars, Burger Joints, Clothing Stores, Cocktail Bars, CoffeeShops, Fast Food Restaurants, French Restaurants, Gro-cery Stores, Hotels, Italian Restaurants, Nightclubs, PizzaPlaces, Pubs, Sandwich Places, and Supermarkets.

Figure 3 shows the impact measured for each BurgerJoint where a Coffee Shop opened within its spatial neigh-borhood. We observe a heterogeneous distribution of im-pact scores. However 66.1% of the 56 burger joints inthis configuration were impacted negatively. The medianimpact is 0.85 which corresponds to a 15% decrease in de-mand to Burger Joint when a Coffee Shop opens within100 meters. In this analysis, due to the skewness of theimpact measures distribution, we use the median and the

percentage of positively (or negatively) impacted busi-nesses as key numbers to represent the directed impactof pairs of categories. Furthermore, we note the struc-ture of Burger Joints in Figure 3. It is composed of twoparts: a high impact peak followed by a linear decline.Interestingly, this structure is consistent across all pairsof categories, generalized with a peak of ∼ 10% of highimpacts shops (I > 1.5), followed by a linear decline ofimpact (∼ 80− 90% of shops), and succeeded with an op-tional ∼ 10% discontinuously negatively impacted shops,representing competition in the area (0 < I < 0.5).

In Figure 4 we show the matrix of median impact of ourtop 16 retail categories on each other. A few observationsare worth noting. First, impact scores on the diagonaltend to be negative. This observation shows and quantifiesthe direct competitive effect of businesses belonging to thesame category. Second, most impact scores are negativesuggesting the general tendency for retail establishmentsto compete. Third, some cooperative pairs are noticeable(e.g. Hotels on Pizza Place: 70% of positive impact, me-dian impact: 1.2; Nightclubs on French Restaurants: 67%of positive impact, median impact: 1.8). We also notethat certain network links seen in Figure 1 between cate-gories can be found as positive impact pairs in the matrix,such as Nightclubs and French Restaurants and Fast FoodRestaurants and Bars.

5.4.1 Building a competitive ranking

Using the aforementioned results we list a ranking of ho-mogeneous competition in Table 3. This ranking showsthe top categories in terms of negative impact, when abusiness of the same category opens within 100 meters.The list suggests that fast food restaurants feature thestrongest homogeneous negative impact (−21%), closelyfollowed by other businesses where people may visit forfood or a snack, namely Bakeries, Pizza Places, CoffeeShops, Sandwich places. Note also the high probability ofnegative impact for Bakeries with a 73.1% chance to benegatively impacted.

Our methodology can also be applied to heterogeneouspairs of businesses to determine which new business typewould create the greatest competition or cooperation foran existing business. Using Coffee Shops as an example,we saw in Table 3 that a new Coffee Shop creates a me-dian impact of I = 0.84. We similarly see in Figure 4 acompetitive effect from new Burger Joints, which resultin a median impact of I = 0.85. A competitive rankingof this type can be established for all categories to betterunderstand trends in the impact of new businesses.

To our knowledge the methodology described above isnovel, and despite the limitations that we discuss in moredetail in Section 7, it allows for the principled quantifica-tion of the impact that new businesses have in the retailecosystem they operate. We exploit this formulation inthe context of a prediction task in Section 6 where impactscores are modeled as target labels and the goal becomes topredict the impact of new businesses opening on their localenvironment. From an economic perspective these resultshighlight the most competitive types of business categoriesthat operate in cities and moreover the highlight cases ofcategories where cooperation in terms of customer flows is

7

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

Bakeries

Bars

Burger Joints

Clothing Stores

Cocktail Bars

Coffee Shops

Fast Food Restaurants

French Restaurants

Grocery Stores

Hotels

Italian Restaurants

Nightclubs

Pizza Places

Pubs

Sandwich Places

Supermarkets

Baker

ies Bars

Burge

r Join

ts

Clothin

g Sto

res

Cockta

il Bar

s

Coffee

Sho

ps

Fast

Food

Res

taur

ants

Frenc

h Res

taur

ants

Groce

ry S

tore

s

Hotels

Italia

n Res

taur

ants

Nightcl

ubs

Pizza

Places

Pubs

Sandw

ich P

laces

Super

mar

kets

Impacted category

Impa

ctin

g ca

tego

ry

0.51.01.52.02.5

Impact

# of data points●●

50100150

200

Figure 4: Matrix of median impact of categories on each other. The size of the circles represent the number ofpairs of businesses which were used (e.g. Coffee Shops - Coffee Shops is 200). The color varies according to themedian impact value of the pair, dark blue corresponds to impact below 1 (representing competition), greyrepresents neutral median impact equal to 1, and dark red translates to high median impact (representingcooperation). Read: French Restaurants (9th row) have a median impact on Bars (2nd column) of 1.2 andless than 100 pairs were counted.

8

more likely to emerge.

5.4.2 Takeaways

Our approach can enable a quantitative metric to mea-sure competitive and cooperative behaviors between pairsof categories. Further, it highlights general trends, suchas competition between food-related categories and coop-eration between certain clusters of categories. However,reducing the complexity has two limitations. It limitsthe number of data as it filters out neighborhoods withmore than one business opening within WT . It also over-simplifies the problem. Often, for a given pair of cate-gories, there is heterogeneity in the impact measure be-tween those two categories (e.g. there is a range in theimpact of a new Coffee Shop on a Bakery). This suggeststhere are complex interactions between a new venue andits environment and there may be additional factors toconsider. In the following sections, we study the impact ofcombinations of multiple new businesses coupled with thenetwork topology of their environment. This methodologyleads to insights which can help determine the optimallocation for a new venue.

6 Predicting New Business Im-pact

In Section 5, we explored the impact of the opening of asingle business on its neighborhood. Next, we investigatein the form of a prediction task the cumulative impact ofmultiple shops opening within the same spatial and tem-poral neighborhoods, as defined in the previous section.

6.1 Prediction Task

To illustrate our methodology, let us consider the exam-ple of French Restaurants. To understand the impact ofnew venues on French Restaurants in Europe and NorthAmerica, we consider all new venues that opened withinthe spatial radius rs of a French restaurant within a givenperiod rt. We then examine how the demand of that givenFrench restaurant changed after the opening of the newvenues. We aim to predict whether the opening of thosenew venues will have a positive or negative impact on theFrench restaurant given network features about the venue,and a count of the types of venues that opened within rt.We consider this prediction task for all retail categoriesaiming to understand how each category is impacted bynew venues opening nearby.

We model the impact of new venues as a binary clas-sification task where the impact on the existing venue isthe dependent variable and the features described belowin Section 6.2 are independent variables. We further rep-resent a positive impact label as 1 and a negative im-pact label with a 0. Considering a supervised learningmethodogy, our goal then becomes to learn an associationof the input feature vector x with a binary label y. We ex-periment with a number of supervised learning algorithmsdescribed in the following paragraphs. All network fea-tures were min-max normalised. We split our dataset intotraining and test sets with the training set consisting of

80% of the data and the test set consisting of the remaining20%. We perform 5-fold cross-validation to pick the bestperforming model and report the subsequent accuracy ofprediction. Finally, we also sub-sample our dataset per-forming our predictions on a balanced dataset of positiveand negative classes.

6.2 Extracting features

6.2.1 Business features

When a new venue vn opens at time tvn within a givenspatial neighborhood SN(vn, rs) we count all other busi-nesses which opened within the temporal neighborhoodTN(vn, rt) of vn and in the same spatial neighborhoodSN(vn, rs). For each existing venue ve, the counts of eachcategory of venue within its spatial and temporal neigh-borhood is encoded as a feature. We refer to these featuresas Business features.

6.2.2 Network features

As described above, we utilize network topology measuresat the venue neighborhood level considering each venueve ∈ SN(vn, rs) where each venue is a node in the network.Edges in the network represent the number of transitionsfrom the source venue to the destination venue. Thesefeatures are described as follows. The in-degree of thevenue ve is the number of edges coming from the othervenues to ve. The out-degree of the venue ve is the numberof edges coming from ve to the other venues. The degreeof ve is the total number of edges between ve and theother venues. The closeness centrality of a venue ve isthe average of its shortest paths to all the other venuesof the network. The diversity of a venue ve is defined bythe Shannon equitability index from information theorywhich calculates the variety of the neighbors of venue ve[18]. These features give a representation of how well agiven venue is connected to the rest of the city networkand also describe the distribution of its customers.

6.3 Evaluation

In this section, we report our findings on the predictiveability of the features as well as the supervised learningmodels we employ in the prediction task. We compare thepredictions against different baselines:

• Network connectivity features, as described in section6.2.1.

• Business features, as described in section 6.2.2.

We first discuss how our combined model outperformsboth baselines, then show the predictive capabilities of dif-ferent categories, and lastly discuss the value of networkmetrics in our prediction task.

Model Selection: We explored a number of differentmodels, including Logistic Regression, Gradient Boosting,Support Vector Machines, Random Forests, and NeuralNetworks. As described above, we train our models topredict the impact of new businesses on a given type ofretail category. When aggregating our data across all cat-egories and working to predict the impact of a new venueon any type of existing venue category, this resulted in

9

Category Coffee Shops Bars Italian Restaurants Hotels French Restaurants BakeriesTrain/Test Size 1956/218 2275/253 1629/181 2129/237 1631/181 1526/170AUC (Business

Features Baseline)0.611 0.603 0.618 0.621 0.701 0.771

AUC (NetworkFeatures Baseline)

0.549 0.551 0.564 0.557 0.591 0.584

AUC (All Features) 0.627 0.642 0.658 0.702 0.742 0.861

Table 4: AUC Scores for a subset of categories using a Gradient Boosting model.

models with low predictive power. As discussed previouslyin Section 5, this suggests there is significant heterogeneityof behavior across different category types and therefore atraining strategy tailored for each category because moreeffective. To take this into account, we segregated ourdata by category and built a separate model for each typeof category. We individually modelled the ten retail cate-gories with the largest total number of data points. Thesecategories were as follows: Bars, Hotels, Coffee Shops, Ital-ian Restaurants, French Restaurants, Bakeries, ClothingStores, Pizza Places, Nightclubs, and Grocery Stores. Tocompare our supervised learning models, we calculated theaverage AUC scores across all ten categories using our dif-ferent feature baselines.

Our combined models, using both business and networkfeatures, had a resulting AUC of 0.611 for Logistic Re-gression, 0.687 for Gradient Boosting, 0.631 for SupportVector Machines, 0.640 for Random Forest, and 0.667 forNeural Networks. These results suggest that venue-specificmetrics can support the prediction of the impact of a newvenue. Across all categories, we saw that Gradient Boost-ing was the most robust model for our task with the high-est average AUC across all ten categories. As such, we useGradient Boosting to further analyze our models below.

Variations across Categories We next examine thepredictability of different category types. Table 4 showsthe predictability of six of the ten categories trained us-ing a Gradient Boosting model. The AUCs of the re-maining categories are as follows: Clothing Stores (AUC:0.664, train/test size: 1362/152), Pizza Places (AUC:0.682, train/test size: 1287/143), Nightclubs (AUC: 0.686,train/test size: 622/70), and Grocery Stores (AUC: 0.690,train/test size: 581/65). We see a range in the predictabil-ity of different venue types: Bakeries have the highest AUCof 0.861 and Coffee Shops have the lowest AUC of 0.627.This suggests that certain activities adapt to a more het-erogeneous set of environment, making them more chal-lenging to predict. These more challenging categories tendto be those that are ubiquitous and general, such as Barsand Coffee Shops.

The Value of Network Metrics: To understand theinfluence of different feature classes on our results, we runour Gradient Boosting model on each class separately andin combination. We report our observations in Figure 5and note that for Bakeries the category features alonereach AUC of ≈ 0.77, the network features alone reachAUC of ≈ 0.60 and combined model leads to a higheroverall AUC of ≈ 0.85. These results, which were con-sistent across all retail categories, suggest using networkfeatures in lieu of venue features alone supports our predic-tion task. We see that the two feature classes together have

Figure 5: ROC Curves of Bakeries for the performanceof each class of features and for the combined model.

the best prediction results, suggesting that the impact ofnew venues is driven by a number of forces, including thenetwork connectivity, which modulate the nature of theirinteractions within their neighborhood.

7 Conclusion And Future Work

Our methodology highlights the power of complex net-works measures in building machine learning predictionmodels. This is especially valuable for systems in whichinteractions between agents must be taken into account.We began in Section 4 of this work by detecting patterns ofcooperation in urban networks and quantifying similaritiesand differences in network structure across cities. Next, inSection 5 we measured the impact of new businesses isolat-ing cases in which exactly one new venue opened withina given region. Using urban network topology measureand our business impact metrics, we developed a machinelearning model to predict the optimal location for a newbusiness in Section 6. The novelty of our approach inmethodological terms stems from the use of complex net-works measures, combined with machine learning meth-ods to tackle modern societal challenges. These resultscan support policy makers, business owners, and urbanplanners as they have the potential to pave the way forthe development of sophisticated models describing urbanneighborhoods and help determine the optimal conditionsfor establishing a new venue.

10

One of the limitations of our work is that we only ex-amine the impact of retail venues. However, this analysiscan be more broadly applied to venues of other categoriesas well. The present study sets the frame for further gen-eral studies. As one example of future work, one couldexamine the impact of new bus stops or transport hubson those venues within their proximity. Additionally, inthis work we conduct a preliminary examination of thevariations in network trends across cities worldwide. Fu-ture work could expand upon this to explore the dualitybetween general network trends and cultural consumer id-iosyncrasies across cities. Our analysis could also be fur-ther expanded by considering temporal network analysis,examining the variations in features across different timeintervals of interest.

8 Acknowledgements

We thank Foursquare for supporting this research by pro-viding the dataset used in the analysis.Jordan Cambe would like to thank the GdR-ISIS and theED PHAST (52) for supporting his work in the Universityof Cambridge.Krittika D’Silva’s work was supported through the GatesCambridge Trust.

References

[1] W. Applebaum. Methods for determining store tradeareas, market penetration, and potential sales. Jour-nal of marketing Research, pages 127–141, 1966.

[2] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, andE. Lefebvre. Fast unfolding of communities in largenetworks. Journal of Statistical Mechanics: Theoryand Experiment, 2008(10):P10008, 2008.

[3] P. Crucitti, V. Latora, and S. Porta. Centrality mea-sures in spatial networks of urban streets. PhysicalReview E, 73(3):036125, 2006.

[4] M. L. Daggitt, A. Noulas, B. Shaw, and C. Mascolo.Tracking urban activity growth globally with big lo-cation data. Royal Society Open Science, 3(4):150688,2016.

[5] K. D’Silva, K. Jayarajah, A. Noulas, C. Mascolo, andA. Misra. The role of urban mobility in retail busi-ness survival. Proc. ACM Interact. Mob. WearableUbiquitous Technol., 2(3):100:1–100:22, Sept. 2018.

[6] K. D’Silva, A. Noulas, M. Musolesi, C. Mascolo, andM. Sklar. Predicting the temporal activity patternsof new venues. EPJ Data Science, 7(1):13, May 2018.

[7] H. A. Eiselt and G. Laporte. Competitive spatialmodels. European Journal of Operational Research,39(3):231–242, 1989.

[8] Get details of a venue. https://developer.

foursquare.com/docs/api. [Online; Last accessed09-May-2018].

[9] Y. Fu, Y. Ge, Y. Zheng, Z. Yao, Y. Liu, H. Xiong, andJ. Yuan. Sparse real estate ranking with online user

reviews and offline moving behaviors. In Data Min-ing (ICDM), 2014 IEEE International Conference on,pages 120–129. IEEE, 2014.

[10] A. Ghosh and C. S. Craig. Formulating retail locationstrategy in a changing environment. The Journal ofMarketing, pages 56–68, 1983.

[11] M. Gibson and M. Pullen. Retail turnover in the eastmidlands: A regional application of a gravity model.Regional Studies, 6(2):183–196, 1972.

[12] C. A. Hidalgo and E. E. Castaner. Do we need anothercoffee house? the amenity space and the evolution ofneighborhoods. ArXiv, 2015.

[13] P. Jensen. Network-based predictions of retail storecommercial categories and optimal locations. Phys.Rev. E, 74:035101, Sep 2006.

[14] D. Karamshuk, A. Noulas, S. Scellato, V. Nicosia, andC. Mascolo. Geo-spotting: mining online location-based services for optimal retail store placement. InProceedings of the 19th ACM SIGKDD internationalconference on Knowledge discovery and data mining,pages 793–801. ACM, 2013.

[15] M. Molloy and B. Reed. A critical point for randomgraphs with a given degree sequence. Random Struct.Algorithms, 6(2-3):161–180, Mar. 1995.

[16] M. Newman. Networks: An Introduction. OxfordUniversity Press, Inc., New York, NY, USA, 2010.

[17] S. Porta, V. Latora, F. Wang, S. Rueda, E. Strano,S. Scellato, A. Cardillo, E. Belli, F. Cardenas, B. Cor-menzana, et al. Street centrality and the locationof economic activities in barcelona. Urban Studies,49(7):1471–1488, 2012.

[18] A. L. Sheldon. Equitability indices: Dependence onthe species count. Ecology, 50(3):466–467, 1969.

[19] VentureBeat. Foursquare by the numbers, 2015.https://goo.gl/Vi1UUf.

[20] M. Zhou. Chinatown: The socioeconomic potential ofan urban enclave. Temple University Press, 2010.

11

https://developer.foursquare.com/docs/api

https://developer.foursquare.com/docs/api

https://goo.gl/Vi1UUf

Documents

Modelling Cooperation and Competition in Urban Retail