Predictive analytics can facilitate proactive property vacancy policies for cities

    Sheila U. Appel, Derek Botti, James Jamison, Leslie Plant, Jing Y. Shyr, Lav R. Varshney
    IBM Thomas J. Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, NY 10598, United States

    Technological Forecasting & Social Change xxx (2013) xxx–xxx

    TFS-17827; No of Pages 13

    Contents lists available at ScienceDirect

    Article history: Received 4 September 2012; received in revised form 6 July 2013; accepted 28 August 2013; available online xxxx

    Keywords: Predictive analytics; Urban planning; Property vacancy; Systems of systems

    Abstract

    Is it possible for a city to understand, analyze, predict, and therefore prevent vacant properties? In this paper, we demonstrate the feasibility of using techniques from machine learning and data mining to determine the future vacancy risks for individual properties and for neighborhoods using a variety of structural, demographic, socioeconomic, and city activity features with high accuracy. Within a larger systems-of-systems framework that we develop, these predictive analytics will allow a city to move from decision-making based on educated anecdotes and reactive strategies aimed at the most urgent need, to policy development based on informed, holistic insight and proactive interventions that prevent and reverse decline. A demonstration of the use of predictive analytics within the sociotechnical system is provided using data from Syracuse, New York.

    © 2013 Elsevier Inc. All rights reserved.

    This paper is based in part on a technical report [1], prepared in Nov. 2011.

    Corresponding author at: IBM Thomas J. Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, NY 10598, United States (L.R. Varshney).

    0040-1625/$ – see front matter © 2013 Elsevier Inc. All rights reserved.

    Please cite this article as: S.U. Appel, et al., Predictive analytics can facilitate proactive property vacancy policies for cities, Technol. Forecast. Soc. Change (2013).

    1. Introduction

    Many cities in the industrial Northeast and Midwest regions of the United States, the so-called rust belt, have seen the proliferation of rusting factories, declining home prices, population losses, high unemployment, and economic malaise [2]. As there has been an outmigration of jobs and people from city centers, one of the major consequences has been a rise in vacant residential properties [3]. Elected and appointed officials in many of these cities perceive property vacancy as a major problem that affects all citizens [4].

    Although abandoned homes are symptomatic of other problems, they also contribute to neighborhood decline and frustrate revitalization, e.g. in Baltimore, Maryland [5]. Indeed, housing abandonment can attract criminal activity, lead to an increased risk of residential fire, and lead to unwelcome public health trends, independently of the socioeconomic status of the area [6]. There are a variety of policy actions that can be taken to address property vacancy [7], but they require an understanding of underlying causes.

    Broadly speaking, we have found that property vacancy can be linked to three hierarchical levels of cause. An overarching factor is regional population dynamics: when people leave a region, it results in a mismatch between supply of housing stock and demand for housing. Local spatial factors such as nearness to bus routes, schools, and grocery stores, as well as blight and crime in local neighborhoods, are also contributing factors. Finally, there are property-specific factors such as the floor area, the number of bathrooms, and the owner's residency status.

    Several previous studies have also attempted to understand causes for property vacancy. Bassett et al. found that housing abandonment in Flint, Michigan is not due to any single cause but is significantly related to a variety of economic, spatial, and demographic factors [8]. In Buffalo, New York, Silverman et al. found that the vacant residential property rate of a census tract increases with the poverty rate, the rate of renters receiving rental assistance, and higher percentages of business addresses [9]. In Philadelphia, Pennsylvania, Hillier et al. found that outstanding housing code violations and tax arrearages, as well as characteristics of nearby properties, were predictive of abandoned properties [10]. They also developed a basic predictive algorithm.

    As part of the IBM Smarter Cities Challenge, we worked with the government of the City of Syracuse, New York to help understand, analyze, predict, and therefore prevent vacant residential properties. The city's goal is to move from decision-making based on educated anecdotes and reactive strategies aimed at the most urgent need, to policy development based on informed, holistic insight, and proactive interventions that prevent and reverse decline. Specifically, Syracuse asked us how to:

    1. identify indicators for factors contributing to the causes of property vacancy,

    2. integrate and analyze relevant data from disparate sources across a broad ecosystem of stakeholders, and

    3. develop a predictive, flexible model to show the impact various events or actions could have on a neighborhood's stability.

    In this paper, we report our results.

    Since Syracuse does not have tens of thousands of abandoned houses like Detroit, Michigan or Philadelphia, the problem seems more manageable [11] and a proactive approach based on data-driven risk forecasting seems more amenable to effecting social change.

    In developing a proactive approach to the residential vacancy problem, we developed a systems-of-systems framework drawing on the field of engineering systems [12]. Within this framework, we defined a specific information technology architecture that would support the data gathering, data analysis, and knowledge dissemination necessary for the various city departments, regional agencies, not-for-profit organizations, and citizen groups to work together. Such an information technology system would also integrate into the several city subsystems, leading to coordinated preventative actions.

    Government databases and data warehouses in Syracuse and elsewhere are siloed, and so it was a non-trivial task to bring different data sources together. Furthermore, many useful data were held by not-for-profit organizations rather than by government departments. Data, such as those maintained in codes enforcement systems, in housing partner data systems, and in police intelligence systems, are meant for specific tasks within the realm of each specific system. However, these data reveal a comprehensive view of the city if combined. Our research and development focused on combining and exploiting these data for the development of a predictive analytics solution.

    The keystone of the system is a predictive analytics subsystem with algorithms drawn from machine learning and data mining [13–15]. Indeed, many of the algorithmic techniques developed for business analytics and service delivery in the private sector can be adapted almost directly to municipal government service delivery [16]. Our predictive algorithms operate both to identify neighborhoods that are on the bubble with respect to vacancy and to identify individual vacant properties that should by all rights be occupied. These neighborhoods and individual properties are where most effort should be devoted. Consistent and unbiased estimates of predictive accuracy demonstrate that our algorithms will be very effective. Important factors include male unemployment rates and nuisance crime rates in neighborhoods, and housing code violations in individual properties.

    In closing this introductory section, let us note that we believe that the analytical frameworks and techniques described herein may be applicable throughout the rust belt and beyond. The reason for this belief is that there are several universal laws which describe how cities are structured, how they behave, and how they evolve [17–19]. By deriving insight from the vast oceans of data that are now starting to be collected and collated with digital devices, cities can positively affect issues that fundamentally impact the quality of their citizens' lives and become smarter [20,21].

    2. City of Syracuse

    The City of Syracuse evolved from a crossroads settlement in the early nineteenth century to a bustling industrial and transportation hub by 1900. Since the 1950s Syracuse has grappled with balancing the ebb and flow of its population and aligning that to its housing stock. At its peak, Syracuse was home to 250,000 people. While the population has stabilized in the past 10 years, the distribution of those residents has followed a common trend among many cities, particularly those in the rust belt: the outmigration of jobs and people from the city center to suburbs, as shown in Figs. 1 and 2.

    The City of Syracuse is located in the geographic center of New York State within Onondaga County. More than 85% of the 42,000 parcels in the City of Syracuse are residential in nature. There are roughly 25,000 single family homes in the City and an additional 10,000 multi-unit residential structures housing more than 60,000 households. The nature, type, and condition of these residential uses vary widely but all fit together to form a patchwork of neighborhoods that provide a variety of living experiences. Of the total housing units, about 75% were built before 1960 and 47% were constructed in 1939 or earlier. Houses built after 1980 make up only 6% of the total. By contrast, only about 53% of housing units in the county were built before 1960, and three times as many houses were built in the county after 1980 as in the city, reflecting a continuation of residential suburban sprawl [22].

    Of the approximately 35,000 residential parcels in the city, about 1500 are vacant today, and the mayor, members of the Common Council, and other civic leaders have identified vacant properties as one issue that unites all Syracusans, regardless of ethnicity, age, income, or education. Their byproducts (blight, crime, and declining property values and tax revenues) impact the quality of life of a diverse population which includes refugees, academics, artists, and blue collar workers, among others.

    Global economic factors mean Syracuse shares a housing dynamic common among many cities. Declining property values and neighborhood degradation have removed the impetus for many homeowners to upgrade or maintain these properties; under- or unemployment has made it impossible for others. Declining property values have also led to an influx of speculators who, unfamiliar with local market dynamics, purchase rental properties as investments. In fact, nearly 50% of Syracuse's housing stock is occupied by renters, compared to 33% occupied by owners. Absentee landlords or poor landlord management have led to rentals becoming untenanted and abandoned, exacerbating the problem.

    2.1. Making Syracuse a smarter city

    Syracuse offers an ideal opportunity to demonstrate the tenets of a smarter city: its relatively small size and population make it easier to drive and effect change in a way that can scale out to address what is a prevalent issue for other cities, of any size, across the globe. Its size also makes it relatively easy to identify, connect, and communicate with key stakeholders across an ecosystem which, while broad, is also relatively shallow. In particular, the city has identified 13 formal partners in their housing ecosystem and recognizes there are countless other civic and private organizations that impact the cause and effect of vacant property. The breadth of this ecosystem is a testament to the strength of the shared belief that something must be done.

    Perhaps more important than size, city leaders have demonstrated a commitment to innovative thinking and an appreciation of how to drive efficiencies by sharing common resources, such as combining the information technology infrastructures for the city and the local school district. They are also adept at working collaboratively with partners including housing associations, community organizations, police, fire officials, and others. In fact, in 2010, for the first time, the City created a comprehensive housing plan [22] delineating its many distinct neighborhoods and outlining objectives to preserve and rehabilitate existing housing stock, fill gaps created by demolition with quality new construction, and provide home maintenance support and incentives to owners.

    Fig. 1. Population trends for Onondaga County (source: US Census Bureau).

    Fig. 2. Housing stock trends for Onondaga County and Syracuse (source: US Census Bureau).

    The vision within the current mayor's administration is to enhance the quality of life for Syracusans and encourage an environment where vibrancy can flourish, and in doing so provide an example to other cities in similar straits.

    3. Current state of affairs

    In order to qualitatively understand the housing vacancy problem in the city and to understand obstacles to the effectiveness of current methods, we interviewed more than 50 people representing city and county planners, academia, housing association partners, neighborhood organizations, educators, philanthropic organizations, homeowners, and entrepreneurs. These interviews led to the following findings.

    There is a broad housing ecosystem whose existence is itself a testament to the strength of the shared belief that something must be done. The challenge is that there is no clear method of data exchange between these stakeholders. Data exist in multiple silos. The city has one set of data, the police another, and other stakeholders including housing partners, builders and renovators, and not-for-profit organizations also hold pieces of the puzzle. Collaboration seems strong, but few connections are more than social. In most cases, if data are exchanged it is done reactively, via a physical handoff.

    The result is that there is no single organization responsible for data governance, nor is there a universally accepted definition of vacancy, and few stakeholders are able to speak in quantitative terms about the scope of the problem and the impact of interventions. Historical data are not systematically archived.

    There is also no predictive modeling being done by any of the stakeholders focused on the vacancy problem. In fact, most analysis, if any, is focused on geospatial mapping of the data. While a good visualization tool, geospatial analysis is insufficient to provide prediction or deep insight.¹

    ¹ However, geospatially tagged numerical and categorical data may allow modeling of the spread of vacancy/property decline in space and in time. In particular, a spatiotemporal hidden Markov random field model [23] may be appropriate to understand spread, as it captures how features are related to each other in neighboring space or time. Methods from spatial econometrics may be used to assess the impact of vacancy on the area [3].

    We worked with city planners to map their current analysis processes, as in Fig. 3. The system reflects many manual interventions throughout the analysis process. City planners are required to serve as intermediary in almost every step in the process to create an analysis of vacant properties. Data are manually input and normalized as part of preprocessing; results are manually adjusted to provide planning priorities; and reports are created by hand and delivered by the planners to the stakeholder making the request. Once action has been taken, the city planner is required to monitor status through external reporting systems, and know which outcomes were the result of prior actions to be sure those are reflected in subsequent analysis.

    3.1. Financial themes

    Lost tax revenue from vacant properties exacerbates the stringent budgetary constraints local governments face. Vacant properties also represent other direct costs, such as maintenance.

    The city must rely on state or federal funds to address the vacancy problem, but those programs have specific parameters for use. Often these do not align with the city's need. For example, funding to provide low cost housing does not necessarily increase property value nor does it help stimulate neighborhoods identified to be at or near a tipping point. Other policies can work at cross-purposes, creating an environment that makes it difficult to effect change.

    4. Systems of systems framework

    To develop a principled methodology for addressing the housing vacancy problem, we introduce a system of systems framework.

    A system of systems is the integration of independently useful systems into a larger system that delivers unique capabilities. In this paper, a system is said to be composed of organizations, people, and technologies that are contained within a logical grouping and have a common objective, interest, or purpose. A key characteristic of the systems of systems approach is an adherence to a loosely coupled design principle when integrating the systems of interest. Loose coupling minimizes the dependencies between systems such that a change within one system can be accommodated by another system or systems with minimal to no change or modification.

    This enables people and systems to interact in entirely new and more accurate ways through the exchange of information.

    Based on our study, we have defined a vacant property predictive system of systems focused on interconnecting stakeholders who are either affected by, reacting to, or trying to prevent the impacts of property vacancies within neighborhoods. Fig. 4 illustrates this system of systems, comprising five elements:

    Neighborhood: the place where people live and work within the city limits, including the infrastructure that supports them.

    Planning and development: the urban planning, business development, and neighborhood sustainability functions for a city.

    Common services: the safety, protection, transportation, infrastructure maintenance, and improvement functions that impact quality of life within the city limits.

    Support services: the financial, housing, educational, community, social, and charitable functions focused on a city, in particular those functions that impact quality of life associated with neighborhoods.

    Predictive situational analysis system: the core analysis functions used to predict property vacancy and provide prioritized recommendations and neighborhood classifications.

    A specific information technology architecture that implements the systems-of-systems framework is depicted in Fig. 5. This would be the target state of affairs from the systems point of view. Comparing Fig. 3 to Fig. 5, we observe a significant simplification and increase in automation. Such a change would increase efficiencies, improve real-time visibility, and allow data-driven policy making.

    The predictive situational analysis is the main information technology and analytics piece of the system that is developed in our work. As seen in Fig. 6, it also comprises several pieces.

    4.1. Data clearinghouse

    Data, either structured or unstructured, flow into the situational analysis system from several other loosely coupled systems. The purpose of this component is to normalize data into standardized forms to facilitate analysis. Since data inputs could be from disparate sources such as government agencies, not-for-profit organizations, or citizens, such standardization is critical. For example, the time-stamped listing of all nuisance crime reports from the police department may be normalized into the number of nuisance crime reports in the past year by neighborhood, to enable the spatial join operation in geospatial databases.
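The normalization step described above can be illustrated with a minimal sketch. It assumes each police report arrives as a (date, neighborhood) pair; the function name, the record layout, the 365-day window, and the sample records are all hypothetical and are not taken from the Syracuse system.

```python
from collections import Counter
from datetime import date, timedelta


def nuisance_counts_by_neighborhood(reports, as_of, window_days=365):
    """Count reports per neighborhood in the window ending at `as_of`.

    `reports` is an iterable of (report_date, neighborhood) pairs.
    """
    start = as_of - timedelta(days=window_days)
    return Counter(
        neighborhood
        for report_date, neighborhood in reports
        if start < report_date <= as_of
    )


# Hypothetical records, for illustration only.
reports = [
    (date(2011, 3, 14), "Neighborhood A"),
    (date(2011, 7, 2), "Neighborhood A"),
    (date(2011, 9, 30), "Neighborhood B"),
    (date(2009, 1, 5), "Neighborhood B"),  # older than one year, so ignored
]
counts = nuisance_counts_by_neighborhood(reports, as_of=date(2011, 12, 31))
```

The resulting per-neighborhood counts have one row per neighborhood, which is the shape a join against neighborhood-level features would need.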

    4.2. Prediction

    The prediction component should be designed to use historical data to learn which features are indicative of either vacancy or the risk of future vacancy. As demonstrated in Section 5, there are sets of features and indicators that can predict vacancy at the individual parcel level with very high accuracy. This component determines the vacancy state of a given property and generates a score on its vacancy risk. A property that is in an occupied state but has a high vacancy score is at high risk for falling into vacancy. Conversely, if there is a property that is in a vacant state but has a low vacancy score, then simple ameliorative actions may lead to occupancy.
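The way the vacancy state and the vacancy score combine can be sketched as a small triage rule. This is only an illustration: the 0.5 cut-off, the function name, and the category labels are assumptions made here, not values or terms from the paper.

```python
def triage(is_vacant, score, threshold=0.5):
    """Combine the observed vacancy state with the predicted risk score.

    `threshold` is a hypothetical cut-off between "low" and "high" scores.
    """
    if not is_vacant and score >= threshold:
        return "occupied, at risk of vacancy"
    if is_vacant and score < threshold:
        return "vacant, likely recoverable"
    if is_vacant:
        return "vacant, high score"
    return "occupied, low score"


labels = [
    triage(is_vacant=False, score=0.8),
    triage(is_vacant=True, score=0.1),
]
```

The two mismatched cases (occupied with a high score, vacant with a low score) are the ones the text singles out for attention.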

    Fig. 3. Current city vacancy analysis system.

    4.3. Cost estimation

    This component uses many of the same features and indicators as the prediction component, but rather than computing the vacancy risk, it estimates the costs, both direct and indirect, and the quality of life impacts of a given property falling into vacancy or rising into occupancy.

    Direct costs such as tax revenue loss or demolition costs may be computed directly from input features. Indirect costs such as decreased property values or increased crime, however, must be estimated using geospatially oriented mathematical models. Geospatial, numerical, and categorical data available in the ecosystem may be leveraged to make these estimates using spatiotemporal hidden Markov random field models and spatial econometrics.

    An example output of this component would be that if Property A were to fall into vacancy, it would increase direct costs to the city by $10,000; decrease the value of properties within 500 ft by 10% within two years; increase the probability of its two next door neighbor properties falling into vacancy by 5% in one year and 35% in five years; and increase nuisance crime in the adjacent block by 10%.
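The example output above could be carried in a simple record, from which a few follow-on quantities can be derived. In this sketch the record type, its field names, and the nearby property values are hypothetical; it also assumes the neighbor-risk figures are added probabilities, which the text does not state.

```python
from dataclasses import dataclass


@dataclass
class VacancyImpact:
    direct_cost: float                # dollars of direct cost to the city
    nearby_value_drop: float          # fractional drop within 500 ft, two years
    neighbor_risk_increase_1y: float  # added vacancy probability, one year
    neighbor_risk_increase_5y: float  # added vacancy probability, five years
    nuisance_crime_increase: float    # fractional increase, adjacent block


# Figures from the Property A example in the text.
property_a = VacancyImpact(10_000, 0.10, 0.05, 0.35, 0.10)

# Hypothetical assessed values of properties within 500 ft.
nearby_values = [80_000, 95_000, 60_000]
value_loss = property_a.nearby_value_drop * sum(nearby_values)

# Expected additional vacancies among the two next door neighbors in five years.
extra_vacancies_5y = 2 * property_a.neighbor_risk_increase_5y
```

With these assumed inputs, the indirect value loss exceeds the direct cost, which illustrates why the component estimates both.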

    Fig. 4. Vacant property predictive system of systems.

    4.4. Decision analysis

    The first function of the decision analysis component is to combine vacancy risk scores, vacancy states, and estimates of costs and quality of life impacts to determine the expected impact of addressing a given property under various actions. These impacts can be at the level of individual properties, blocks, or neighborhoods. The result of this first function may be raw impact assessments, or a prioritized list of properties that are the easiest to address according to the assessment, but also have the greatest potential impact.

    The set of potential actions that can be taken in response to prediction and impact assessment ranges from doing nothing to providing social services for occupants to demolishing a structure. The decision analysis component is designed to use the impact assessment, together with known values of resource constraints, to suggest a sequence of actions.
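One simple way to turn impact assessments and a resource constraint into a sequence of actions is a greedy ranking by impact per unit cost. The sketch below is a stand-in heuristic chosen for illustration, not the decision procedure used in the paper; the field names, parcel identifiers, and numbers are hypothetical, and every action is assumed to have a positive cost.

```python
def prioritize(candidates, budget):
    """Greedily pick actions by impact per unit cost until the budget runs out.

    Each candidate is a dict with "parcel", "impact", and "cost" (> 0) keys.
    """
    ranked = sorted(
        candidates, key=lambda c: c["impact"] / c["cost"], reverse=True
    )
    chosen, remaining = [], budget
    for candidate in ranked:
        if candidate["cost"] <= remaining:
            chosen.append(candidate["parcel"])
            remaining -= candidate["cost"]
    return chosen


# Hypothetical impact assessments, for illustration only.
candidates = [
    {"parcel": "P1", "impact": 50, "cost": 10},
    {"parcel": "P2", "impact": 90, "cost": 30},
    {"parcel": "P3", "impact": 20, "cost": 5},
    {"parcel": "P4", "impact": 40, "cost": 40},
]
plan = prioritize(candidates, budget=45)
```

A greedy rule of this kind favors properties that are cheap to address relative to their impact, which matches the "easiest to address but greatest potential impact" criterion, although it is not guaranteed to be optimal.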

    4.5. Event correlation

    This is designed to provide information to the prediction component on what actions have been taken, so as to be able to correctly interpret data in the future. It also provides a mechanism to notify stakeholders of events of interest related to conditions to the dashboard component.

    4.6. Dashboard

    The dashboard allows various stakeholders to see the results of the impact assessment, decision analysis, and event correlation engine before the fact. This is the core information that can be used to make policy decisions, understand gaps, and monitor the status of actions and their impact on achieving the desired outcomes.

    5. Predictive analytics for prioritization

    Forecasting the vacancy risk of either individual properties or of neighborhoods is central to proactive policy making. In this section, we bring together several datasets of physical and socioeconomic factors to make predictions.

    Fig. 5. Target city vacancy analysis system.

    Characterizing individual properties is an instance of the bipartite ranking problem in machine learning [24], since we have binary-labeled data on whether a given property is vacant or occupied and want to learn a real-valued scoring rule that can be used to rank properties according to vacancy risk. To solve the bipartite ranking problem, we use a univariate loss function and an ensemble of decision trees [25].

    On the other hand, characterizing the vacancy rates of neighborhoods is an instance of the regression problem. As a starting point, we use Chi-squared Automatic Interaction Detection (CHAID) trees [26] to perform feature selection. After finding the most important features, we use multivariate linear regression to develop a predictive algorithm.

    The primary datasets we used are as follows:

    The parcel data (Data 1) used for analysis are provided by the Bureau of Planning and Sustainability and have 38 fields and 41,805 parcels. The fields include important parcel (property) features such as whether a property is vacant, the full value assessed, the land value assessed, the number of code violations, whether owner occupied, owner name and address, size, neighborhood, number of bedrooms, number of bathrooms, land use classification, shape, length, depth, area, street frontage, etc.

    The neighborhood indicator data (Data 2) are also provided by the Bureau of Planning and Sustainability (through the 2010 Syracuse Housing Plan) and include vacancy rate, median family income, average family size, male unemployment rate, female unemployment rate, average household size, number of properties owned, number of properties rented, median year of house built, etc. This data set is based on the 2000 census (U.S. Census Bureau).

    Police call data (Data 3) are provided by the Syracuse Police Department. The data are a listing of all telephone calls made to the department in 2011. The information included for each call has time, date, call type, and neighborhood.

    Fig. 6. Predictive situational analysis system.

    Although not presented in this paper, our analysis also used population race data and vacancy data by census tract from the 2010 census (U.S. Census Bureau), as well as vacancy data from the United States Postal Service and Department of Housing and Urban Development for census tracts in Syracuse.

    The remainder of this section details the methodologies and results of our predictive analytics algorithms. Since the bipartite ranking problem may be less familiar to readers, we provide greater detail than for the regression problem.

    5.1. Individual properties

    5.1.1. Bipartite ranking

    We consider a bipartite ranking formulation of supervised learning since the dataset on individual property parcels contains labeled data on whether or not a given parcel is vacant or occupied as well as several other features that can be used as predictors. The goal is to learn a real-valued vacancy risk scoring function to be able to prioritize actions. That is to say, we have binary-labeled training data and want to learn a real-valued scoring function for ranking: the bipartite ranking problem [24].

    The goal of a ranking algorithm is to establish a total order on parcels such that positive instances precede negative ones in the ranked list. Consequently, traditional algorithms for bipartite ranking call for minimizing the number of disagreements (or misorderings) among pairs of ranked samples [24]. Such algorithms reduce ranking to a binary classification problem by treating each pair of instances as a single object that should be classified as positive if the pairwise ordering is correct and negative if the pairwise ordering is incorrect. Given both the large numbers of features in housing data and the large training set size, this is often computationally infeasible.
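To make the pairwise criterion concrete, the following sketch counts misordered (vacant, occupied) pairs for a given set of scores. It assumes labels of 1 for vacant and 0 for occupied and counts ties as misorderings; both conventions, along with the function name and sample values, are choices made here for illustration.

```python
def pairwise_misorderings(scores, labels):
    """Return (misordered pairs, total pairs) over all vacant/occupied pairs.

    A pair is misordered when the occupied parcel (label 0) scores at least
    as high as the vacant parcel (label 1).
    """
    vacant = [s for s, y in zip(scores, labels) if y == 1]
    occupied = [s for s, y in zip(scores, labels) if y == 0]
    errors = sum(1 for v in vacant for o in occupied if v <= o)
    return errors, len(vacant) * len(occupied)


# Hypothetical scores and labels, for illustration only.
errors, total = pairwise_misorderings(
    scores=[0.9, 0.2, 0.4, 0.1],
    labels=[1, 1, 0, 0],
)
```

The number of pairs is the product of the two class sizes, so it grows much faster than the number of parcels, which is the scaling concern raised above.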

    Using classification algorithms directly on the labeled training data to perform ranking has been demonstrated in many domains to perform well in practice [27] and is also computationally simpler. Hence we operate directly on the labeled training data rather than pairs of training data points and perform bipartite ranking through minimization of a (standard) univariate loss function [28]. This approach of using univariate loss to do ranking has provably good performance for margin-based classifiers [28]. Many margin-based classifiers, however, like support vector machines and AdaBoost, do not work well with the kind of categorical features that are present in housing data.

    We train random forest classifiers for binary classification [25,29], which are ensembles of decision trees trained on random subsets of training data. In operation, new instances are classified using each tree in the ensemble. If we were to use the random forest for the original binary classification tasks, we would take the majority vote of the individual trees, but since we are interested in ranking, we use the actual distribution of votes as a score upon whose basis to rank. For example, if there are 100 trees in the ensemble and 36 trees say an unseen parcel i is predicted to be vacant and 64 trees say the parcel is predicted to be occupied, then the vacancy risk score for parcel i will be 0.36.
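The vote-fraction score can be sketched without any particular library. Here each "tree" is a hypothetical stand-in callable that returns 1 for vacant and 0 for occupied; a trained random forest would supply real trees in its place, and the feature name and thresholds below are invented for the example.

```python
def vacancy_risk_score(trees, parcel):
    """Fraction of trees in the ensemble that vote 'vacant' (1) for a parcel."""
    votes = [tree(parcel) for tree in trees]
    return sum(votes) / len(votes)


# Stand-in ensemble of 100 threshold rules on a hypothetical feature;
# these are not trained trees.
trees = [
    (lambda parcel, t=t: int(parcel["code_violations"] > t))
    for t in range(100)
]

parcels = {
    "i": {"code_violations": 36},  # 36 of the 100 stand-in trees vote vacant
    "j": {"code_violations": 5},
}
scores = {name: vacancy_risk_score(trees, p) for name, p in parcels.items()}

# Rank parcels from highest to lowest vacancy risk score.
ranking = sorted(scores, key=scores.get, reverse=True)
```

Using the vote fraction rather than the majority vote keeps the information needed to order parcels that would otherwise receive the same class label.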

    cilitate proactive property vacancy policies for cities, Technol.013.08.028

  • icting individual property vacancy risk.

    9S.U. Appel et al. / Technological Forecasting & Social Change xxx (2013) xxxxxxAlgorithm 1 describes the learning phase of the randomforest bipartite ranking algorithm and Algorithm 2 outlinesthe score computation phase of the bipartite ranking.

    Fig. 7. Feature importance for predIn contrast to margin-based classifiers or discriminanttechniques such as linear discriminant analysis or partialleast squares discriminant analysis, random forests do notexplicitly maximize the margin, thus making the score/margin an unbiased measure that is directly related togeneralization error.

    Please cite this article as: S.U. Appel, et al., Predictive analytics can faForecast. Soc. Change (2013), Scoring vacancy riskAfter filtering for residential use and a valid vacancy code

    (vacant or occupied), Data 1 had 33,522 properties for analysis.From these data, we found that the residential vacant propertyrate is 4.6%. Although single family and two-family homesaccount for 89% of the vacant properties, three-family homeshave the highest vacancy rate at 8.8%.

    We applied the random forest-based bipartite ranking algorithm and key indicators, as shown in Fig. 7, emerged. The number of code violations is strongly indicative of a vacant property. Other indicators include the full value assessment of the parcel, whether the property owner of record lives in Syracuse, and the year built. Even without using any neighborhood indicators, the ensemble-based model provides a 99.975% (training accuracy) assessment that a parcel is in fact vacant. We ensured that the model did not overfit the data by controlling the out-of-bag estimate of the generalization error.

    Fig. 8. Scoring vacant properties according to vacancy risk. Properties with low vacancy score should be prioritized.

    The vacancy score that arises from the model indicates the ease of rehabilitation and sale to turn a vacant property into an occupied one. This scoring system could also be used to provide an indicator that a property is at risk of becoming vacant. As such, the scoring system can be used to prioritize parcels either by vacancy risk or by ease of rehabilitation.

    As it turns out, there are some vacant properties with low vacancy score and there are occupied properties that have high vacancy score. These outlying properties are ones where actions can be focused.

    The first histogram of scores (Fig. 8) depicts low-scoring vacant properties in green, where ameliorative action may be easiest. The next histogram of scores (Fig. 9) depicts high-scoring occupied properties in green, which are at highest risk of falling vacant. These properties may be prioritized for preventative action. This analysis directly yields ordered lists of parcels for action.

    Fig. 9. Scoring occupied properties according to vacancy risk. Properties with high vacancy score should be prioritized.

    5.2. Neighborhoods

    Aggregating socioeconomic data from the U.S. Census as well as crime report data from the Syracuse Police Department, we examine vacancy rates by neighborhood.

    The first step in our analysis is to perform feature selection using a CHAID tree [26], finding six key features in the census and police data. CHAID is a method to build decision trees using chi-square statistics to identify optimal splits. CHAID first examines cross tabulations between each of the features and the vacancy rate, testing for significance using a chi-square independence test. First, all categories of a feature that do not yield statistically significant differences in vacancy rate are merged. Next, each merged group of three or more categories is re-split by all possible binary divisions; a split is retained if it yields a statistically significant difference in vacancy rate. Once each of the features has been grouped to produce the maximum possible diversity of classes in the vacancy rate, the chi-square test is applied to the resulting groupings. The predictor that generates groupings differing most according to this test is selected as the splitter for the current node. The result is a CHAID tree, and features used in splits near the root of the tree are considered the most important and are selected for further analysis.
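The chi-square comparison that CHAID uses to choose a splitter can be illustrated with a small sketch. It covers only the final selection step, not category merging or tree growth, and it makes a simplifying assumption: every candidate feature is summarized by a contingency table of the same shape (feature groups by binned vacancy level), so the largest statistic also has the smallest p-value. The function names and the example tables are hypothetical.

```python
from typing import Dict, List


def chi_square_statistic(table: List[List[float]]) -> float:
    """Pearson chi-square statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # skip cells in empty rows or columns
                stat += (observed - expected) ** 2 / expected
    return stat


def pick_splitter(tables: Dict[str, List[List[float]]]) -> str:
    """Return the feature whose table has the largest chi-square statistic.

    Assumes all tables share one shape (same degrees of freedom), so
    ranking by statistic matches ranking by p-value.
    """
    return max(tables, key=lambda name: chi_square_statistic(tables[name]))


# Hypothetical counts: rows are feature groups, columns are
# (high vacancy, low vacancy) neighborhoods.
candidate_tables = {
    "feature_a": [[10, 10], [10, 10]],  # no association
    "feature_b": [[20, 0], [0, 20]],    # perfect association
}
best = pick_splitter(candidate_tables)
```

When candidate tables differ in shape, the comparison should be made on p-values from the chi-square distribution with the appropriate degrees of freedom rather than on raw statistics.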

    After feature selection, we use a linear regression model to assess the impact of key features on vacancies at the neighborhood level. Male unemployment emerged as the most dominant factor. Average family size, percentage of median family income, percentage of controlled substance calls to a neighborhood, percentage of disturbance calls, and percentage of local law violation codes also added significance to the model.

    Overall, the regression model with these indicators obtained 81% accuracy for predicting neighborhood vacancy rates. The regression model parameters and significance values are shown in Fig. 10. Moreover, using these factors, it is possible to generate simulations of the vacancy in the neighborhoods and to manipulate key indicators to show the outcomes of policy and investment changes or to determine the percentage increase in a neighborhood's likelihood of having a vacancy issue.

    Fig. 10. Regression model for predicting neighborhood property vacancy rate risk.

    The significance values in the coefficient table show that the male unemployment rate and the percent of controlled substance calls are the two most important indicators. Average family size and percent of disturbance calls to police are the next set of significant indicators. Percent of median household income and percent of local law violations are the final set of significant indicators. Writing out the regression equation can help to evaluate the impact on the vacancy rate of increasing or decreasing the value of key indicators.

    The regression equation can be written as follows:

    VR = 21.265 (±) 9.81 AFS (±) 1.034 MUR (±) 0.032 PMHI (±) 1.237 PDC (±) 1.885 PCSC (±) 0.646 PLLVC,

    where each (±) stands for the sign of that term as given with the regression model parameters in Fig. 10, VR is the vacancy rate, AFS is the average family size, MUR is the male unemployment rate, PMHI is the percentage median household income, PDC is the percentage of disturbance calls, PCSC is the percentage of controlled substance calls, and PLLVC is the percentage of local law violations.
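Because the model is linear, the predicted effect of changing an indicator is simply its coefficient times the change. The sketch below shows such a what-if calculation in general form. The coefficient values, their signs, and the neighborhood inputs are made up for illustration and are NOT the fitted values from the paper; the function names are hypothetical as well.

```python
from typing import Dict


def predict_vacancy_rate(intercept: float,
                         coefficients: Dict[str, float],
                         indicators: Dict[str, float]) -> float:
    """Evaluate VR = intercept + sum(coefficient * indicator)."""
    return intercept + sum(coefficients[name] * indicators[name]
                           for name in coefficients)


def what_if(intercept: float,
            coefficients: Dict[str, float],
            indicators: Dict[str, float],
            changes: Dict[str, float]) -> float:
    """Predicted change in VR when some indicators shift by given amounts."""
    adjusted = {k: v + changes.get(k, 0.0) for k, v in indicators.items()}
    return (predict_vacancy_rate(intercept, coefficients, adjusted)
            - predict_vacancy_rate(intercept, coefficients, indicators))


# Hypothetical coefficients and inputs, not the paper's fitted model.
demo_intercept = 5.0
demo_coefficients = {"MUR": 0.5, "AFS": -2.0}
demo_neighborhood = {"MUR": 12.0, "AFS": 3.0}

baseline = predict_vacancy_rate(demo_intercept, demo_coefficients, demo_neighborhood)
# Effect of lowering male unemployment by two percentage points.
delta = what_if(demo_intercept, demo_coefficients, demo_neighborhood, {"MUR": -2.0})
```

Substituting the fitted intercept and coefficients from the coefficient table would turn the same two functions into the policy simulation described in the text.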

    As a result of the regression analysis, it is possible to categorize neighborhoods into four types:

    Distressed, where a significant vacancy problem currently exists.

    Transitional, where substantial investment has previously been made and the neighborhood is improving.

    Bubble, where a significant vacancy problem does not exist, but factors are in place which could tip the neighborhood to either distressed or stable.

    Stable, where vacancy is not a problem and the market is operating correctly to move new occupants into vacant property as part of the regular course of real estate.

    Mathematically, this categorization is based on the current vacancy rate and on the standardized residuals from the regression analysis. Since the average vacancy rate (treating each neighborhood as a point sample) is 11.78%, we can adopt a rule to classify a neighborhood with a vacancy rate higher than average as High and one with a vacancy rate lower than average as Low. The standardized residual shows how far the predicted vacancy rate is from the observed vacancy rate and is used to classify the future vacancy risk. This is divided into negative and positive. In Fig. 11, several neighborhoods in Syracuse are plotted along the dimensions of vacancy rate and standardized residual and categorized into the four quadrants.

    Fig. 11. Classifying neighborhoods according to vacancy rate and vacancy risk.
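The quadrant rule can be sketched as follows. Two choices here are assumptions rather than details from the paper: the residual is taken as observed minus predicted, and it is standardized by the sample standard deviation of the residuals (a simplification that ignores leverage). The sketch returns only the quadrant labels and does not map quadrants to the four neighborhood types, since that mapping is not spelled out in the text. The function names and data are hypothetical.

```python
import statistics
from typing import List, Tuple


def standardized_residuals(observed: List[float],
                           predicted: List[float]) -> List[float]:
    """Residuals (observed - predicted) divided by their sample std. dev.

    Requires at least two neighborhoods and residuals that are not all equal.
    """
    residuals = [o - p for o, p in zip(observed, predicted)]
    scale = statistics.stdev(residuals)
    return [r / scale for r in residuals]


def classify_quadrants(observed: List[float],
                       predicted: List[float]) -> List[Tuple[str, str]]:
    """Label each neighborhood by vacancy level and residual sign."""
    mean_rate = statistics.mean(observed)
    z = standardized_residuals(observed, predicted)
    return [("High" if rate > mean_rate else "Low",
             "positive" if resid > 0 else "negative")  # zero counts as negative
            for rate, resid in zip(observed, z)]


# Hypothetical vacancy rates (%) for four neighborhoods.
observed_rates = [20.0, 15.0, 8.0, 4.0]
predicted_rates = [18.0, 17.0, 6.0, 7.0]
quadrants = classify_quadrants(observed_rates, predicted_rates)
```

Here the threshold is the mean of the observed rates in the sample, which plays the role of the 11.78% average reported for Syracuse.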

    In the current reactive approach to housing policy, resources are devoted to distressed neighborhoods, whereas in a more proactive approach, the goal would be to prevent neighborhoods on the bubble from becoming distressed.

    6. Effecting change

    The systems framework, information technology architecture, and predictive analytics algorithm suite that we have developed in the previous sections define a mechanism for transforming from a reactive mode of operation based on gut instincts to a proactive mode of operation based on mathematical models. This can be implemented by a city in a phased fashion by following a 36-month roadmap that is detailed in a longer technical report that we have written and presented to the City of Syracuse and key stakeholders in the housing ecosystem [1].

    Beyond implementing the technical solution we have described in this paper, and the data governance needed to support the systems of systems framework, another key aspect is communication, both externally through public communication channels to the citizenry and internally through an organizational change management program that ensures new programs, behaviors, and processes are adhered to.

    7. Conclusion

    The problem of increasing vacant properties is not easily resolved and creates additional hardships on already constrained city resources and budgets. Although Syracuse had been taking actions, the approach was reactive and transactional rather than systematic and predictive. This paper has provided a systems solution that allows the City of Syracuse to begin transforming to a proactive and preventative model using analytics based on mathematical models and data available in the housing ecosystem. Our proposed approach has now impacted city policy [30].

    Moreover, the basic predictive analytics applied at the neighborhood level through statistical regression analysis and at the individual property level through ensemble methods from machine learning may be applicable to cities in similar straits as Syracuse. Indeed, there is an emerging science of cities that indicates broad structural similarities across urban systems [19]. The basic systems of systems methodology developed herein has been used to address urban planning problems in other IBM Smarter Cities Challenge projects, such as in Birmingham, UK. Going forward it is important to validate the effectiveness of our proactive and predictive methodology in other cities with similar problems, both from an information technology infrastructure point of view and from a predictive analytics point of view.


    Acknowledgments

    This work was conducted as part of an IBM Smarter Cities Challenge engagement with the City of Syracuse. Thanks to leaders of the Syracuse government including Stephanie A. Miner (Mayor) and Andrew M. Maxwell (Director of Bureau of Planning & Sustainability).

    References

    [1] IBM, IBM's Smarter Cities Challenge Syracuse Report, 2011.

    [2] S. Safford, Why the Garden Club Couldn't Save Youngstown: The Transformation of the Rust Belt, Harvard University Press, Cambridge, MA, 2009.

    [3] B.A. Mikelbank, Spatial analysis of the impact of vacant, abandoned and foreclosed properties, Technical report, Federal Reserve Bank of Cleveland, 2008.

    [4] J. Accordino, G.T. Johnson, Addressing the vacant and abandoned property problem, J. Urban Aff. 22 (2000) 301-315.

    [5] J.R. Cohen, Abandoned housing: exploring lessons from Baltimore, Hous. Policy Debate 12 (2001) 415-448.

    [6] N.G. Helmholdt, Neighborhood Effects of Physical Interventions to Abandoned Housing, Cornell University, 2009. (Master's thesis).

    [7] J. Schilling, J. Logan, Greening the rust belt: a green infrastructure model for right sizing America's shrinking cities, J. Am. Plan. Assoc. 74 (2008) 451-466.

    [8] E.M. Bassett, Understanding housing abandonment and owner decision-making in Flint, Michigan: an exploratory analysis, Technical report WPE06EB1, Lincoln Institute of Land Policy, 2006.

    [9] R.M. Silverman, L. Yin, K.L. Patterson, Dawn of the dead city: an exploratory analysis of vacant addresses in Buffalo, NY 2008-2010, J. Urban Aff. 35 (2) (2013) 131-152.

    [10] A.E. Hillier, D.P. Culhane, T.E. Smith, C.D. Tomlin, Predicting housing abandonment with the Philadelphia neighborhood information system, J. Urban Aff. 25 (2003) 91-106.

    [11] D. Wachsmuth, S. Pasternak, Use it or lose it: Toronto's abandonment issues campaign for affordable housing, Crit. Plann. 15 (2008) 7-22.

    [12] O.L. de Weck, D. Roos, C.L. Magee, Engineering Systems: Meeting Human Need in a Complex Technological World, MIT Press, Cambridge, MA, 2011.

    [13] C.V. Apte, S.J. Hong, R. Natarajan, E.P.D. Pednault, F.A. Tipu, S.M. Weiss, Data-intensive analytics for predictive modeling, IBM J. Res. Dev. 47 (2003) 17-23.

    [14] T.H. Davenport, J.G. Harris, Competing on Analytics: The New Science of Winning, Harvard Business School Press, Boston, 2007.

    [15] K.R. Varshney, A. Mojsilović, Business analytics based on financial time series, IEEE Signal Process. Mag. 28 (2011) 83-93.

    [16] P.R. Messinger, Municipal Service Delivery: A Multi-stakeholder Framework, Human Factors and Ergonomics in Manufacturing & Service Industries 23 (1) (2013) 37-46.

    [17] L. Bettencourt, G. West, A unified theory of urban living, Nature 467 (2010) 912-913.

    [18] M.A. Changizi, M. Destefano, Common scaling laws for city highway systems and the mammalian neocortex, Complexity 15 (2010) 11-18.

    [19] L.M.A. Bettencourt, The origins of scaling in cities, Science 340 (2013) 1438-1441.

    [20] C. Harrison, B. Eckman, R. Hamilton, P. Hartswick, J. Kalagnanam, J. Paraszczak, P. Williams, Foundations for smarter cities, IBM J. Res. Dev. 54 (2010) 1:1-1:16.

    [21] M. Naphade, G. Banavar, C. Harrison, J. Paraszczak, R. Morris, Smarter cities and their innovation challenges, IEEE Comput. 44 (2011) 32-39.

    [22] P. Driscoll, S. Owens, B. Walsh, A.M. Maxwell, S. Kearney, Syracuse housing plan, technical report, City of Syracuse, 2010.

    [23] D. Koller, N. Friedman, Probabilistic Graphical Models: Principles and Techniques, MIT Press, Cambridge, MA, 2009.

    [24] S. Agarwal, T. Graepel, R. Herbrich, S. Har-Peled, D. Roth, Generalization bounds for the area under the ROC curve, J. Mach. Learn. Res. 6 (2005) 393-425.

    [25] L. Breiman, Random forests, Mach. Learn. 45 (2001) 5-32.

    [26] G.V. Kass, An exploratory technique for investigating large quantities of categorical data, Appl. Stat. 29 (1980) 119-127.

    [27] C. Cortes, M. Mohri, AUC optimization vs. error rate minimization, in: S. Thrun, L. Saul, B. Schölkopf (Eds.), Advances in Neural Information Processing Systems 16, MIT Press, Cambridge, MA, 2004.

    [28] W. Kotowski, K. Dembczyński, E. Hüllermeier, Bipartite ranking through minimization of univariate loss, Proceedings of the 28th International Conference on Machine Learning (ICML 2011), 2011, pp. 1113-1120.

    [29] S. Clémençon, N. Vayatis, Tree-based ranking methods, IEEE Trans. Inf. Theory 55 (2009) 4316-4336.

    [30] T. Knauss, Syracuse Mayor Proposes Crackdown on Vacant Homes, The Post-Standard, 2013.

    Sheila U. Appel is the East regional director, Corporate Citizenship and Corporate Affairs for IBM US, where she began her career in 1977. She has received two IBM People Management Awards and has been recognized as one of the Top 100 Women Leaders in the Hudson Valley. Sheila currently mentors several individuals both inside and outside of IBM, with emphasis on women. In her current position, she manages the team that has corporate responsibility for the twenty-six states in the East United States, and is the lead for IBM in New York overseeing a range of programs helping to establish IBM as a leader in a new breed of corporate philanthropy, an approach that promotes systemic social improvement largely through investments of technology, services, employee skills and volunteerism. She has recently been assigned to lead IBM's Corporate Citizenship's worldwide economic development strategy. Sheila is a graduate of Dutchess Community College with a degree in Business Administration and holds a B.S. in Organizational Leadership from Marist College.

    Derek Botti is a certified infrastructure architect at IBM, focusing on the integration of computer systems, servers and networks to create complex software systems. He is presently a CIO Security Architect in the IT Risk and Transformation Department for IBM Security. Prior to this, Derek was the lead architect for Service Management solutions for healthcare, Smart Cities, and electronics and manufacturing. Derek began his career at IBM in 2001 working with the web hosting team responsible for managing corporate sponsorship marketing websites for the tennis grand slam events and The Masters golf tournament. Additionally, this team was responsible for hosting, for the IBM chief information officer's office. He holds a B.S. degree in computer science from the University of Arkansas at Little Rock.

    James Jamison is a technical executive who supports IBM's Software Solutions Group. He is currently focused on software based technologies that enable Smarter Commerce and Internet of Things solutions. James joined IBM in 1996 and was appointed a Director and IBM Distinguished Engineer in 2009. His areas of expertise are systems engineering, enterprise architecture, application development and technical sales. He holds a master's degree in computer science, and undergraduate degrees in electrical engineering and mathematics. James is best known for his ability to take creative approaches to new challenges using well-grounded methods to meet client needs and expectations.

    Leslie Plant is with IBM Canada's external communications team, where she is responsible for media relations support to the country general manager and healthcare, software, retail and research executive leaders. She joined IBM in 1997 as communications manager for ISM Information Management Systems, a wholly owned subsidiary. Previously, she worked in progressive roles within corporate communications at Magna International. She graduated with Honors from Carleton University in 1986 with a combined B.A. degree in journalism and English.

    Jing Y. Shyr is a distinguished engineer and chief statistician at the IBM Business Analytics group. Before the IBM acquisition in 2009, Jing was chief statistician and senior vice president of Technology Solutions at SPSS Inc., a worldwide provider of analytical technology. She leads a team of researchers and software developers responsible for the creation of data mining technology and statistical methodology. She holds a bachelor's degree in applied mathematics from National Chiao Tung University, a master's degree in applied statistics from National Tsing Hua University, Taiwan, and a Ph.D. in statistics from Purdue University.

    Lav R. Varshney is a research scientist at the IBM Thomas J. Watson Research Center in Yorktown Heights, NY. His current research is focused on data analytics, collective intelligence, and computational creativity. He was born in Syracuse and received the B.S. degree with honors in electrical and computer engineering (magna cum laude) from Cornell University. He received the S.M., E.E., and Ph.D. degrees in electrical engineering and computer science from the Massachusetts Institute of Technology (MIT). His master's thesis was awarded the E. A. Guillemin Thesis Award for best electrical engineering thesis at MIT and his doctoral thesis received the J.-A. Kong Award Honorable Mention for best electrical engineering thesis at MIT.

