Leveraging mobile network big data for development

This work was carried out with the aid of a grant from the International Development Research Centre, Ottawa, Canada.

Sriganesh Lokanathan, LIRNEasia
IIT Delhi, New Delhi
22 August 2014



About LIRNEasia (www.lirneasia.net)

• We are a regional think tank based in Colombo, working across South Asia, South East Asia and the Pacific Small Island States

• Our mission: catalyzing policy change through research to improve people’s lives in the emerging Asia Pacific by facilitating their use of hard and soft infrastructures through the use of knowledge, information and technology


What is Big Data?


Big Data: the Vs

• VARIETY: different forms of data

• VALUE: potential value of data

• VOLUME: size of data

• VELOCITY: rate of change of data

• VERACITY: uncertainty of data


What has facilitated the rise of big data?

• Vast drops in the cost of storing and retrieving information

• Exponential growth in computer power and memory
  – Data can reside in persistent memory instead of disk and tape

• Major improvements in techniques for performing machine learning and reasoning


Sources of big data

• Administrative data
  – E.g. digitized medical records, insurance records, tax records, etc.

• Commercial transactions
  – E.g. bank transactions, credit card purchases, supermarket purchases, online purchases, etc.

• Sensors and tracking devices
  – E.g. road and traffic sensors, climate sensors, equipment & infrastructure sensors, mobile phones, satellite/GPS devices, etc.

• Online activities/social media
  – E.g. online search activity, online page views, blogs/FB/Twitter posts, online audio/video/images, etc.


If we want comprehensive coverage of the population, what are the sources of big data in developing economies?

• Administrative data?
  – E.g. digitized medical records, insurance records, tax records, etc.

• Commercial transactions?
  – E.g. bank transactions, credit card purchases, supermarket purchases, online purchases, etc.

• Sensors and tracking devices?
  – E.g. road and traffic sensors, climate sensors, equipment & infrastructure sensors, mobile phones, satellite/GPS devices, etc.

• Online activities/social media?
  – E.g. online search activity, online page views, blogs/FB/Twitter posts, online audio/video/images, etc.


Currently, only mobile network big data offers the widest possible population coverage


LIRNEasia’s Big Data for Development (BD4D) Research

• LIRNEasia has negotiated access to historical and anonymized telecom network meta-data from multiple operators in Sri Lanka

• In the current research cycle we are
  – conducting exploratory research on answering a few social science questions related to mobility and connectedness
  – developing a framework with privacy and self-regulatory guidelines for the collection, use and sharing of mobile phone data.

• http://lirneasia.net/projects/bd4d/


The data sets

• Multiple mobile operators in Sri Lanka have provided LIRNEasia access to 4 different types of meta-data:
  – Call Detail Records (CDRs)
    • Records of calls, SMSes, Internet access
  – Airtime top-up records

• Data sets do not include any Personally Identifiable Information (PII).
  – All phone numbers are anonymized, and
  – LIRNEasia does not maintain any mappings of identifiers to original phone numbers


What are some types of big data captured by mobile network operators?

• Call Detail Record (CDR)
  – Records of all calls made and received by a person, created mainly for the purposes of billing
  – Similar records exist for all SMSes sent and received, as well as for all Internet sessions
  – The Cell ID in turn has a lat-long position associated with it.
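As a rough illustration of how a CDR ties a call to a place through its Cell ID, the sketch below models one record and a cell lookup table. The field names, identifiers and coordinates are all hypothetical; the operators' actual schema is not described in these slides.

```python
from dataclasses import dataclass
from datetime import datetime


# Hypothetical CDR layout: these field names are illustrative assumptions,
# not the operators' actual schema.
@dataclass(frozen=True)
class CallDetailRecord:
    caller_id: str    # anonymized identifier of the calling party
    callee_id: str    # anonymized identifier of the called party
    cell_id: str      # cell that handled the call
    start: datetime   # when the call began
    duration_s: int   # call duration in seconds


# Hypothetical lookup table from Cell ID to (latitude, longitude).
# The coordinates are made up for illustration.
CELL_LOCATIONS = {
    "cell-001": (6.9271, 79.8612),
    "cell-002": (7.0840, 80.0098),
}


def locate(record, cell_locations):
    """Return the (lat, long) of the cell that handled the record, or None."""
    return cell_locations.get(record.cell_id)


record = CallDetailRecord("a1f3", "9c2e", "cell-001", datetime(2014, 8, 22, 13, 0), 95)
print(locate(record, CELL_LOCATIONS))
```

The location obtained this way is that of the cell, not of the handset, so its spatial precision depends on how densely cells are deployed.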


What are some types of big data captured by mobile network operators (contd.)?

• Airtime reload records
  – Records of all airtime reloads performed by prepaid SIMs
  – Each row corresponds to a record of one person’s activity.


The overall process of leveraging mobile network big data for development

• Mobile network big data (CDRs, Internet access usage, airtime recharge records)

• Construct behavioral variables: (i) mobility variables, (ii) social variables, (iii) consumption variables

• Other data sources: (i) census data, (ii) HIES data, (iii) survey maps, (iv) transportation schedules, (v) and more

• Analytics

• Insights: (i) urban planning, (ii) crisis management & DRR, (iii) health monitoring & planning, (iv) socio-economic monitoring & planning


Why are we doing this work?

• To bring timely evidence into the policy-making process in developing economies


An example of timely policy-relevant evidence

• The image on the left depicts the relative density of people in Colombo city and the surrounding regions at 1300 compared to 0000 (midnight the previous day) on a normal weekday.

• The yellow to red colors depict areas whose density has increased relative to midnight. The blue color depicts areas whose density has decreased relative to midnight (the darker the blue, the greater the loss in density). The clear areas are those where the overall density has not changed.
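The slides do not say how relative density is computed. One plausible definition, assumed here purely for illustration, compares each area's share of all observed people at the two times. The function name and the sample counts are hypothetical.

```python
def density_change(counts_midnight, counts_1300):
    """Change in each area's share of observed people between 0000 and 1300.

    Positive values mean the area gained density relative to midnight,
    negative values mean it lost density, zero means no change.
    """
    total_midnight = sum(counts_midnight.values())
    total_1300 = sum(counts_1300.values())
    if total_midnight == 0 or total_1300 == 0:
        raise ValueError("need at least one observed person at each time")
    change = {}
    for area in counts_midnight.keys() | counts_1300.keys():
        share_midnight = counts_midnight.get(area, 0) / total_midnight
        share_1300 = counts_1300.get(area, 0) / total_1300
        change[area] = share_1300 - share_midnight
    return change


# Made-up counts of distinct users seen in two areas.
midnight = {"city": 25, "suburb": 75}
early_afternoon = {"city": 75, "suburb": 25}
change = density_change(midnight, early_afternoon)
for area in sorted(change):
    print(area, change[area])
```

Working with shares rather than raw counts keeps the comparison meaningful when more phones are active at 1300 than at midnight.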


Our findings closely match results from expensive & infrequent transportation surveys


Using mobile network big data we can understand the mobility patterns of the population

(Figure legend: volume of people, from low to high.)


Understanding land use characteristics

• People leave digital traces when they use communication devices.

• Mobile communication patterns at different locations can be leveraged to classify them into land-use categories.


The diurnal activity of users connected to base stations revealed interesting patterns

(Figure panels: Base station 1, Base station 2.)


How do we leverage this insight?

• The time series of each base station is normalized to a (0-1) range

• The Euclidean distance between two time series is used to cluster base stations in an unsupervised manner using the k-means algorithm
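The two steps above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the sample series are invented, and the deterministic farthest-point seeding is an assumption made only so the example is reproducible.

```python
import math


def min_max_normalize(series):
    """Rescale one time series to the 0-1 range."""
    lo, hi = min(series), max(series)
    if hi == lo:  # flat series: avoid division by zero
        return [0.0 for _ in series]
    return [(x - lo) / (hi - lo) for x in series]


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm); returns one cluster label per point."""
    # Assumed seeding: start from the first point, then repeatedly add the
    # point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        )
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up user counts per time slot for four base stations.
stations = {
    "A": [5, 10, 80, 90, 30, 10],      # busiest in the middle of the day
    "B": [2, 6, 40, 50, 12, 4],        # same shape as A, lower volume
    "C": [50, 40, 10, 12, 45, 60],     # busiest at night
    "D": [100, 90, 20, 30, 80, 110],   # same shape as C, higher volume
}
names = list(stations)
normalized = [min_max_normalize(stations[name]) for name in names]
labels = k_means(normalized, k=2)
print(dict(zip(names, labels)))
```

Normalizing first means stations are grouped by the shape of their daily profile rather than by their absolute traffic volume, which is why A and B (or C and D) can land in the same cluster despite very different counts.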


Distribution of base stations in Colombo district


What does this reveal?

• Cluster 1 exhibits patterns consistent with a commercial area

• Cluster 2 exhibits patterns consistent with less commercial and more residential characteristics (or possibly mixed use)

 


A closer look at base stations in each cluster

(Figure panels: Cluster 1 examples, Cluster 2 examples.)


Our results show how the Central Business District (CBD) has expanded


Seethawaka Export Processing Zone (EPZ)

(Photo: Seethawaka EPZ. © Senanayaka Bandara, Panoramio.)


So how can mobile network big data help transportation and urban policy?

• Can help us understand population mobility at a very high resolution (both temporally and spatially)
  – Visitor Location Register (VLR) data is very useful for understanding traffic congestion

• Can help us understand land-use characteristics

• Can provide insights in between infrequent and costly surveys and land-use censuses, and in some cases even replace them


Can mobile network big data help us understand Sri Lanka’s communities?


The geo-spatial distribution of Sri Lanka’s social networks?

• Each link represents the raw number of outgoing and incoming calls between two DSDs

• A Divisional Secretariat Division (DSD) is a third-level administrative division; there are 331 in total in Sri Lanka

(Figure legend: number of calls, from low to high.)


A different picture emerges when call volume is normalized by population

• Strongly connected components are visible

(Figure legend: number of calls, from low to high.)
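The slides do not state how call volume is normalized by population. As one possible reading, assumed here only for illustration, each pair's call count can be divided by the product of the two populations. The DSD names and all figures below are invented.

```python
def normalize_by_population(calls, population):
    """Divide each DSD-pair call count by the product of the two populations.

    This particular normalization is an assumption for illustration; the
    slides do not specify the one actually used.
    """
    return {
        (a, b): count / (population[a] * population[b])
        for (a, b), count in calls.items()
    }


# Made-up call counts between hypothetical DSDs, populations in thousands.
calls = {("X", "Y"): 9000, ("X", "Z"): 600}
population = {"X": 300, "Y": 100, "Z": 2}

normalized = normalize_by_population(calls, population)
for pair in sorted(normalized):
    print(pair, normalized[pair])
```

In this toy case the X-Y link dominates in raw counts, but the X-Z link is the stronger one once population is taken into account, which is the kind of shift the slide describes.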


Identifying communities: methodology

• The social network is partitioned such that connections crossing community boundaries are minimized.

• Strength of a community is determined by modularity

• Modularity Q = (edges inside the community) – (expected number of edges inside the community)

M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks”, Physical Review E, APS, Vol. 69, No. 2, p. 1-16, 2004.
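To make the modularity definition concrete, the sketch below evaluates it for a given partition using the commonly used normalized form, Q = Σ_c [L_c/m − (d_c/2m)²], where m is the total number of edges, L_c the number of edges inside community c and d_c the sum of the degrees of its nodes. It treats the graph as undirected and unweighted, which is a simplification of the call graph in these slides, and it only scores a partition; it does not search for the best one.

```python
def modularity(edges, community_of):
    """Modularity of a partition of an undirected, unweighted graph.

    edges        -- list of (u, v) pairs, each undirected edge listed once
    community_of -- dict mapping every node to its community label
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    inside = {}   # number of edges with both ends in the community
    degree = {}   # sum of node degrees per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    # Observed fraction of edges inside each community minus the fraction
    # expected if edges were placed at random with the same degrees.
    return sum(
        inside.get(c, 0) / m - (degree[c] / (2 * m)) ** 2 for c in degree
    )


# Toy graph: two triangles joined by a single edge.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_groups = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
one_group = {node: "all" for node in range(1, 7)}

print(modularity(edges, two_groups))
print(modularity(edges, one_group))
```

Splitting the toy graph into its two triangles scores higher than lumping every node together, which scores zero, matching the intuition that a good partition keeps more edges inside communities than chance would predict.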


The resultant communities are centered around geographic neighborhoods

• For Sri Lanka, the optimal number of communities discovered by the algorithm was 11


How much do these communities mesh with existing administrative boundaries?

• Southern and Northern provinces have the highest similarity to their respective provincial boundaries.

 


How much do these communities mesh with existing administrative boundaries (contd.)?

• Colombo district is clustered as a single community and Gampaha is merged with North Western Province

 


Zooming into a community

• Community detection is done within identified communities

• These communities are less related to DSD boundaries in Colombo district


Implications for public policy?

• Administrative boundaries based on history and geography may not reflect current community structures


Analytical challenges

• Challenge: data is biased towards frequent users
  – Solution(s): understand and adjust for selection bias

• Challenge: data sparsity
  – Solution(s): interpolation techniques; probability-based models

• Challenge: different tower densities
  – Solution(s): different scales of analysis depending on region

• Challenge: validating results
  – Solution(s): using other data sources, e.g. data from the Dept. of Census and Statistics, transportation survey data, etc.


In sum

• Mobile network big data has many uses for developmental policy:
  – Helping us understand human mobility at a fine temporal and geo-spatial scale
  – Helping us understand social connectedness and community structures

• But it is not without analytical challenges
  – Incorporating other data (even ‘small data’) can improve the results


Thank you

• More information:
  – http://lirneasia.net/projects/bd4d/
