Big data for development: LIRNEasia’s...

Preview:

Citation preview

Bigdatafordevelopment:LIRNEasia’sexperiences

SriganeshLokanathanh>p://lirneasia.net/projects/bd4d/

ThisworkwascarriedoutwiththeaidofagrantfromtheInternaGonalDevelopmentResearchCentre,CanadaandtheDepartmentforInternaGonalDevelopmentUK

UniversityofDhakaDhaka,3rdFebruary2017

Catalyzing policy change through research to improve people’s lives in the emerging Asia Pacific by facilitating their use of hard and soft infrastructures through the use of knowledge, information and technology.

LIRNEasiaisaregionalnon-profitthinktank.Ourmissionisthatof

Wherewework

SomedevelopmentproblemsofinteresttoLIRNEasia...

•  MorepeopleliveinciGesthaninruralareassince2008–  HowcanwemakeciGesmorelivable?–  IstherearoleofICTs,notjustmoreroads,transit,etc.?

•  InfecGousdiseasesareposingthreats–  Canwemakebe>erdecisionsreallocaGngscarceresources?

•  GovernmentsareflyingblindwithalackofGmelydatatobe>ertargetexpenditures,assessprograms,orachieveSDGs–  Aretherewaystoremedythis?

4

ComprehensivecoverageofpopulaGonneeded.Sourcesofdata?

•  AdministraGvedata–  E.g.,digiGzedmedicalrecords,insurancerecords,taxrecords

•  CommercialtransacGons(transacGon-generateddata)–  E.g.,Stockexchangedata,banktransacGons,creditcardrecords,

supermarkettransacGonsconnectedbyloyaltycardnumber

•  OnlineacGviGes/socialmedia–  E.g.,onlinesearchacGvity,onlinepageviews,blogs/FB/twi>erposts

•  Sensorsandtrackingdevices–  E.g.,roadandtrafficsensors,climatesensors,equipment&

infrastructuresensors,mobilephonescommunicaGngwithbasestaGons,satellite/GPSdevices

5

MobileNetworkBigDataisonlyopGonatthisGme

6

Sources: https://www.gsmaintelligence.com/; http://www.itu.int/en/ITU-D/Statistics/Documents/publications/misr2015/MISR2015-w5.pdf; Facebook advertising portal; http://data.worldbank.org/indicator/SP.POP.TOTL

ButwhatkindofMNBDisappropriaterightnowforSriLanka?

•  StreetbumpisaBostoncrowdsourcing+bigdataapplicaGonthatusesthenaturalmovementofciGzenstoimprovestreetmaintenance–  Datageneratedfromanapp

downloadedtoasmartphone“mounted”inacar

•  CanStreetbumpbetransplantedinColomboatthisGme?–  Featurephones>>

Smartphones

•  “Somethingbe>erthannothing”maynotapply–  Biastowardroadstraversedby

smartphoneownersèIncondiGonsoflimitedresources,mayskewresourceallocaGon

7

Bigdatausedintheresearch•  MulGplemobileoperatorsinSriLankaprovidedfourdifferenttypesof

meta-data–  CallDetailRecords(CDRs):Recordsofcalls,SMS,Internetaccess–  AirGmerechargerecords–  NoVisitorLocaGonRegistry(VLR)data,becausetheyarewri>enover&not

stored

•  DatasetsdonotincludeanyPersonallyIdenGfiableInformaGon–  Allphonenumbersarepseudonymized–  LIRNEasiadoesnotmaintainanymappingsofidenGfierstooriginalphone

numbers•  Historical,notrealGme;thereforeanalyzedinbatchmodeinahardware

stackcosGng<USD30k•  Cover50-60%ofusers;veryhighcoverageinWestern(whereColombo

thecapitalcityinlocated)&Northern(mostaffectedbycivilconflict)Provinces,basedoncorrelaGonwithcensusdata

8

•  NowalsousingCCTVfootageaswellassatelliteimagery

Mobilenetworkbigdata+otherdataèrich,Gmelyinsightsthatserveprivateaswellaspublicpurposes

9

Construct Behavioral Variables

1. Mobility variables 2. Social variables 3. Consumption variables

Other Data Sources

1. Data from Dept. of Census & Statistics

2. Transportation data 3. Health data 4. Financial data 5. Etc.

Dual purpose insights

Private purposes

1. Mobility & location based services

2. Financial services 3. Richer customer

profiles 4. Targeted

marketing 5. New VAS

Public purposes

1. Transportation & Urban planning

2. Crises response + DRR

3. Health services 4. Poverty mapping 5. Financial inclusion

Mobile network big data (CDRs, Internet access usage, airtime recharge

records)

EXAMPLESOFONGOINGWORK

10

MNBDdatacangiveusgranular&high-frequencyesGmatesofpopulaGondensity

11

DSD population density from 2012 census

DSD population density estimate from MNBD

Voronoi cell population density estimate from MNBD

Popula9ondensitychangesinColomboregion:weekday/weekendPicturesdepictthechangeinpopula9ondensityatapar9cular9merela9vetomidnight

12

Wee

kday

Su

nday

Decrease in Density Increase in Density

Time 18:30 Time 12:30 Time 06:30

Ourfindingscloselymatchresultsfromexpensive&infrequenttransportaGonsurveys;arecheaper&canbeproducedasneeded

13

46.9%ofcity’sdayGmepopulaGoncomesfromoutside.PotenGalconfiguraGonsofaMetropolitanCorporaGon

HomeDSD Popula9on Percentagecontribu9ontoColombo’sday9me

popula9on

ColomboCity(2DSDs)

555,031

53.1

Total 555,031 53.1

HomeDSD Popula9on Percentagecontribu9ontoColombo’sday9me

popula9on

ColomboCity(2DSDs)

555,031

53.1

Maharagama 195,355

3.7

Kolonnawa 190,817

3.5

Kaduwela 252,057

3.3

SriJ’puraKo>e

107,508

2.9

Total 1,300,768 66.5

HomeDSD Popula9on Percentagecontribu9ontoColombo’sday9me

popula9on

ColomboCity(2DSDs)

555,031

53.1

Maharagama 195,355

3.7

Kolonnawa 190,817

3.5

Kaduwela 252,057

3.3

SriJ’puraKo>e

107,508

2.9

Dehiwala 87,834

2.6

Kesbewa 244,062 2.5

Wa>ala 174,336

2.5

Kelaniya 134,693

2.1

Ratmalana 95,162 2.0

Moratuwa 167,160 1.8

Total 2,204,015 79.9

HomeDSD Popula9on Percentagecontribu9ontoColombo’sday9me

popula9on

ColomboCity(2DSDs)

555,031

53.1

Maharagama 195,355

3.7

Kolonnawa 190,817

3.5

Kaduwela 252,057

3.3

SriJ’puraKo>e

107,508

2.9

Dehiwala 87,834

2.6

Kesbewa 244,062 2.5

Wa>ala 174,336

2.5

Total 1,807,000 74.1

Wecanunderstandtheextantoflocalizedtravel

15

Each colored area represents a region where the density of travel within it is greater than that across its borders: useful for

decisions on last-mile solutions for multi-modal transport

UnderstandingtrafficcondiGons

•  CDRdataisn’tgoodenoughtounderstandtrafficcondiGons

•  CCTVfootagegivesusabe>erchanceofunderstandingtrafficflow(volume,speeds)

16

Haar-feature classification Deep learning based classifier

WeexploitdiurnalbasestaGonsignaturesto…..

•  WecanusethisinsighttogroupbasestaGonsintodifferentgroups,usingunsupervisedmachinelearningtechniques

17

TypeY:?TypeX:?

…..understandlandusepa>erns

18

Highly commercial

Highly residential

ButusingjustMNBDgetsusonlysofar

•  Analysescurrentlybeingimprovedusingmachinevisiontechniquesonsatellitedata,aswellasusingclassifiersfromFoursquaredata

•  4differentsourcesofdataarebeingusedtocollecGvelygiveabe>erGmelypictureoflanduse–  ExisGnggroundtruthsurveydata(dataotenoldandgeographically

sparse)–  MNBD–  Satelliteimagery–  Foursquare

19

Finalresultswillhelpabe>ercomparisonofrealityagainstplans

20

2013MNBDanalyses 2020UDAplan 2000UDAlandusesurvey

2010SurveyDept.LandUseMap

WearealsoleveragingmobilitytoinfereconomicacGvity

Economicac9vity=

(numberofworkers)x(produc9vityperworker)

Observed Mustbeinferred

• WeassumemoreproducGveregionsaremorea>racGvedesGnaGons • CommuGngpa>ernsemergefromthetrade-offbetweena>racGvenessofaworkplaceandthecostofgewngthere

21

ExampleofcommuGngflowsfromoneoriginlocaGon

22

Biyagama Export

Processing Zone

HighCommuGngFlows

LowCommuGngFlows

23

WecandevelopnewproxymeasuresofeconomicacGvity

HighEsGmatedWage

LowEsGmatedWage

EsGmatedlog(wage)

Intuition: log(wage) is estimated as employment in excess of what is predicted by distance to residential population.

24

Nightlights Householddata IndustrialData

GeographicvariaGon

TimevariaGon yearly quarterly/2-3yrs/decade yearly/decade

RelevantvariablesEduca9on,

(un)employment,skilllevels

Employment,capitalintensity

Idealfor: ImprovingMeasure

Improving&ValidaGon

IncorporaGngotherdatacangivefurtherinsights

Householddata:Census/HIES/LFSIndustrialdata:ASI,IndustrialCensus

BenefitofanimprovedframeworkformodelingeconomicacGvity

•  IncreasethecoverageofexisGngsurveys(bothtemporalandgeographic)–  Bycalibra9ngwithhousehold,industrycensusandsurveydata,whenavailable

–  Then,mobiledatacanbeusedtopredict/extrapolateforGmeperiodsandregionswithoutsurveydata

•  Cancaptureinformaleconomicac9vity–  Otherresearchsuggestsinformaleconomyisalmost30%ofGDPinSriLanka

25

HowdocommuniGesmatchexisGngadministraGvedivisions?

26 The 9 provinces The 11 detected

communities

WithsomeexcepGons,boundariesofcommuniGesdifferfromexisGngadministraGvedivisions

27

•  Northern(1),Uva(10)andSouthern(11)communiGesmostsimilartoexisGngprovincialboundaries;but11takesEmbilipiGyaandKataragama

•  Colombodistrictisclusteredasasinglecommunity(7)•  GampahamergeswithcoastalbeltofNorthWesternProvince

(2)andKalutara(8)isitsowncommunity–  WhatdoesthismeanforWesternProvinceMegapolis?

•  Bawcaloa&AmparadistrictsoftheEasternProvincemergewiththePolonnaruwadistrictofNorthCentralProvincetoformitsowndisGnctcommunity(6)–  PossiblyreflecGveofeconomiclinkagessincethisisthericebelt

ofSriLanka–  Doeseconomicsoverrideethnicity?

Moredifferencesappearwhenwezoominfurther•  Theli>oralregionsform

theirowndisGnctsub-communiGes

•  ThenorthernpartofColombocityformsacommunitywithWa>ala,acrosstheKelaniriver

•  Ingeneral,riversnolongerformnaturalboundariesofcommuniGes

•  SuchworkcanpossiblyinformelectoralreformprocessinSriLankaandtheworkoftheDelimitaGonCommission

28 Bridge

PredicGngspaGalspreadofdengueinSriLanka

•  SuchanalyseswillbeveryimportantshouldanycasesofzikabedetectedinSriLanka

•  IniGalsresultsarebeingimprovedfurtherinpartnershipwiththeUniversityofMoratuwaandtheEpidUnitoftheMinistryofHealth

29

Comparing predicted & actual dengue outbreaks for Dehiwala MOH region in 2014

Week

# of

cas

es

POLICYIMPLICATIONS

30

ImplicaGonsforpublicpolicy•  PopulaGonmapsandmobility/migraGonpa>ernsare

essenGalpolicytools–  Includedinmostcensusandhouseholdsurveys

•  Improve&extendmeasurement–  Largesamplesizes–  Veryfrequentmeasurement

•  Moreprecisemeasures–  CommuGng/seasonal/long-termmigraGon

•  Improveplanningfor“spiky”eventsthatcreatepressureongovernmentandprivatelysuppliedservices–  Humanitarianresponsetodisasters–  Specialevents/holidays

31

ImplicaGonsforpublicpolicy•  Urban&transportaGonplanning

–  CanidenGfyhighvolumetransportcorridorstoprioriGzeforprovisionofmasstransit

–  Mapdefactomunicipalboundaries

•  Healthpolicy–  Mobilitypa>ernsthatcanhelprespondtospreadofinfecGous

diseases(e.g.,dengue,malaria,zika,etc.)

•  Almostreal-Gmemonitoringofurbanlanduse–  Cancomplementcostlyandinfrequentlandusesurveys

•  Providebe>ermeasuresofundetectedinformaleconomicacGvity

32

CHALLENGE1:POLICYIMPACT

33

Fromsupply-pushtodemand-pull

34

Event

2014Oct •  FoundingChairhasone-on-onemeeGngwithSecretary,UrbanDevelopmentèpresentaGontourbandevelopmentprofessionalsinNovember2014agreedupon,butpostponedduetoearlyannouncementofPresidenGalElecGon

2015Jan •  BigDatateamconductsapubliclectureorganizedbyInsGtuteofEngineersSriLanka(IESL)

2015Feb •  EmailcontactmadewithnewDGofUDAwithoffertobriefonLA’songoingresearch(meeGngsplannedbutdon’thappen)

•  FirstmediainteracGonsinSriLanka

2015May •  BigDataTeamLeadermakes5-minutepresentaGonatWorkshoponimplementaGonoftransportaGonmasterplanforMinistryofInternalTransportorganizedbyUoM’sDept.ofTransport&LogisGcs.A>endedbydomainspecialists,academics,researchers,officialsfromUDA,RDA,etc.

2015Jun-Sept

•  BigDataTeamLeaderinvitedtojoinplanningteamfortheWesternRegionMegapolisPlanningProjecttoprovideinsightsfromMNBD.Ournamesuggestedbyoneofthea>endeesintheMay2015event,whowasappointedtoleadoneofthecommi>eesworkingontheWRMPP

•  Intotal9meeGngswerea>ended,culminaGnginapresentaGontoallthecommi>eesoninsightsfromLIRNEasiabigdataresearchrelatedtourbanandtransportaGonplanning

2015Aug •  BigDataTeamLeaderpresentsatWorkshoponIntegratedLandUseTransportModelingPracGcesinSriLankaandaroundtheWorldorganizedbyUoM’sTransportaGonEngineeringDivision.A>endedbydomainspecialists,academics,researchers,officialsfromUDA,RDA,etc.

Polic

y En

light

enm

ent

/ Sup

ply-

push

D

eman

d-Pu

ll

35

Sri Lanka’s highest-circulation English newspaper

Sunday April, 12, 2015

Fromsupply-pushtodemand-pull

36

Event

2015Sept •  BigDataTeamLeadera>endslaunchofWorldBankReporton“LeveragingUrbanizaGoninSouthAsia”

•  SidediscussionswithDGofUDAonLIRNEasia’songoingresearchleadstoone-on-onemeeGngwithDGthefollowingdaytobriefhimonLIRNEasiaresearch

2015Dec •  UDA,UDA’sProfessionalsAssociaGon,&YoungPlannersForumofInsGtuteofTownPlanners,SriLankaorganizespecialsessionforLAtopresentongoingresearch.

•  EchelonMagazinereportonMegapolisplansincludechartsgivenbyUDAonsourcelocaGonsofColombo’sdayGmepopulaGondevelopedbyLIRNEasia(withoutacknowledgement)

2016Jan •  BigDatateaminvitedtomakepresentaGontoSriLankaStrategicCiGesDevelopmentProjectworkingonKandy

2016Feb •  DGUDArequestsaddiGonalmobilityandland-useinsightsonKandy

2016Feb •  SriLankaStrategicCiGesDevelopmentProjectreachesouttoLIRNEasiaforinsightsonfoottrafficinKandy.Ourdatanotsuitablebutwebrainstormpossiblemethodologies

2016Aug •  UDArequestsaddiGonalfiner-grainedmobilityinsightsforspecificareasinWesternProvince

Polic

y En

light

enm

ent

/ Sup

ply-

push

Dem

and-

Pull

WesternRegionalMegapolisProject(WRMP)

37

Source: Interview with Western Region Megapolis Authority in Echelon magazine (December 2015, pp. 63)

Lessons•  Nodemandforinsightswhenwestartedin2012;buthadwe

notstartedontheresearchwhenwedid,noresultswouldhavebeenavailablewhenthepolicywindowopened

•  Givendifficultyofassessingqualityofbig-dataresearch,thecredibilityofLIRNEasiahasbeenofvalue–  MoreworkneedstobedonetomakepotenGalusersofthisresearch

moreknowledgeable

•  Supply-pushapproachhelpedcreatethecondiGonsforessenGaldemand-pull–  However,poliGcalchangesandappointmentswhichwereoutsideour

controlwerecriGcalincreaGngdemand

38

CHALLENGE2:HUMANRESOURCES

39

DatascienGstsareinshortsupply;NeededaremulG-disciplinaryteams

•  WeprioriGzeanalyGcalthinkingoverknowledgeofbigdatatoolsinourrecruitmentinterviews

•  TeammembershavedifferentspecialGes•  Staffandcollaboratorshaveamixofcomputer-scienceskills,staGsGcs,anddomainknowledge

40

41

•  DanajaMaldeniya(ComputerScience,StaGsGcs)

–  NowatUofMichiganbutsGllcollaboraGng

•  CDAthuraliya(ComputerScience)•  DedunuDhananjaya(SotwareEngineer,

SystemsAdministrator)–  Movedtoprivatesector,butsGllcollaboraGng

•  IsuruJayasooriya(ComputerScience,MachineVision)

•  KaushalyaMadhawa(ComputerScience,StaGsGcs)

–  NowatTokyoInsGtuteofTechnology

•  KeshandeSilva(ComputerScience,AgentBasedSimulaGons)

•  LasanthaFernando(ComputerScience)•  MadhushiBandara(ComputerScience)

–  NowatUofNewSouthWalesbutsGllcollaboraGng

•  NisansadeSilva(ComputerScience)–  NowatUofOregon,butsGllcollaboraGng

•  RobertGalyean(MathemaGcs,Physics)•  Prof.RohanSamarajiva(PublicPolicy)•  SriganeshLokanathan(PublicPolicy,

ComputerScience)•  ShaznaZuhyle(ResearchManager)•  ThavishaGomez(ResearchManager)

Staff Collaborators •  Prof.AmalKumarage(Dept.of

Transport&LogisGcs,UoM)–  TransportaGon,UrbanPlanning

•  DrAmalShehanPerera(Dept.ofCSE,UoM)

–  DataMining

•  FieldsofView–  IndianresearchinsGtutespecializingin

gamesandsimulaGonsforpublicpolicyissues

•  GabrielKreindler(MIT)–  Economics,StaGsGcs

•  ProfJoshuaBlumenstock(UCBerkley,SchoolofInformaGon)

–  DataScience,Economics,StaGsGcs

•  ProfMoinulZaber(UofDhaka)–  DataScience,PublicPolicy

•  Shibasaki&SekimotoLaboratory,UniversityofTokyo

–  BigDataforDevelopmentresearchpracGce

•  YuheiMiyauchi(MIT)–  Economics,StaGsGcs

Flowthrough:We’realwayshiring•  Acomputer-sciencegraduatedoesnotthinkofworkingina

think-tank;otenlookingtojoinasotwarefirm–  LIRNEasiahasdonepresentaGonsinpublicforumsandatuniversiGes

tobroadenhorizons

•  Keysellingpointistheworkthatwedoandourpartnerships–  Rewardingtoseeresearchbeingused–  GoodopportuniGesforpublicaGonandconferencepapers–  LIRNEasiaencouragesindividualresearcherstobuildtheir“brands”–  IdealforadmissionintogoodPhDprograms;currentfunded

placements•  UOregon•  UMichigan•  TokyoInsGtuteofTechnology•  UNewSouthWales

42

Privacy•  ThehighertheresoluGonofyourinsights,thegreatertheprivacyimplicaGons

•  Mixingnon-personaldatawithothersourcescanrevealpersonala>ributes

•  Informedconsentismeaninglessinabigdataworld

•  Peopleotendon’tknowwhattheirgeneralizableprivacyneedsareorhowtheirpreferencesmightevolve

43

Weshouldalsobeaskingthefollowing:

•  Shouldweconcentrateondefiningandenforcingprivacyrightsorreducingharms?

•  Whenitcomestodevelopingeconomies,shouldweworryaboutinclusionorexclusioninbigdata?

•  WhataboutcompeGGon?

44

SelectedPublicaGons&Reports•  Lokanathan,S.,Kreindler,G.,deSilva,N.D.,Miyauchi,Y.,Dhananjaya,D.,&Samarajiva,R.

(2016).UsingMobileNetworkBigDataforInformingTransportaGonandUrbanPlanninginColombo.Informa*onTechnologies&Interna*onalDevelopment

•  Samarajiva,R.,Lokanathan,S.,Madhawa,K.,Kriendler,G.,&Maldeniya,D.(2015).Bigdatatoimproveurbanplanning.EconomicandPoli*calWeekly,VolL.No.22,May30

•  Maldeniya,D.,Lokanathan,S.,&Kumarage,A.(2015).Origin-DesGnaGonmatrixesGmaGonforSriLankausingmobilenetworkbigdata.13thInternaGonalConferenceonSocialImplicaGonsofComputersinDevelopingCountries.Colombo

•  Kreindler,G.&Miyauchi,Y.(2015).CommuGngandProducGvity:QuanGfyingUrbanEconomicAcGvityusingCellPhoneData.LIRNEasia

•  Lokanathan,S&Gunaratne,R.L.(2015).MobileNetworkBigDataforDevelopment:DemysGfyingtheUsesandChallenges.Communica*ons&Strategies.

•  Lokanathan,S.(2014).TheroleofbigdataforICTmonitoringandfordevelopment.InMeasuringtheInforma*onSociety2014.InternaGonalTelecommunicaGonUnion.

Moreinforma9on:hdp://lirneasia.net/projects/bd4d/

45

Recommended