mining the social web, Aristides Gionis, Michael Mathioudakis, firstname.lastname@aalto.fi, Aalto University, Spring 2015

Mining the Social Web - Lecture 1 - T61.6020


mining the social web

Aristides Gionis, Michael Mathioudakis

firstname.lastname@aalto.fi

Aalto University, Spring 2015


social web

facebook, twitter, linkedin, foursquare, flickr, instagram, pinterest, youtube, ustream, github, stackoverflow, wikipedia


social web

• websites and platforms that enable users to
  – produce content: blog posts, ‘status’ messages, videos, pictures, podcasts
  – consume content: read text (blog posts, ‘status’ messages), listen to podcasts, watch videos
  – interact with each other: comment on each other’s posts, ‘like’ or rate items


mining the social web: a lot of users... a lot of data...

what could we learn?* (*assuming we have the data; more on that later)

gain insights into...
• social behavior
  – how many connections does an average person have?
  – do people connect with like-minded people?
• political sentiment
  – what do people think about current political issues?
• how we experience our cities
  – what’s the best neighborhood for food/nightlife?
• how we build our careers
  – how often do people change careers?
  – how beneficial is it to ‘network’ professionally?
• other?


mining the social web

• there is already research that explores those questions
• we will discuss some of it, now and in the next two lectures


twitter

• a social sensor
  – social network + news media
  – what is happening? where? who?
  – trends
  – events
  – opinions
  – political views
  – sentiments
  – demographics


twitter studies

• finding news events and stories
• detecting trends
• predicting consumer behavior
• predicting the stock market(!)
• disaster response
• rumor analysis and credibility assessment
• influence analysis
• political analysis
  – polarization, bias of news media
• sociology studies
  – sentiment vs. demographics, gender inequality


• photo sharing + social network
• photos contain additional information
  – tags
  – geolocation
  – comments, favorites
  – assigned to groups


(image credit: Eric Fischer)


recommend tourist itineraries


foursquare

• location-based social network
• users check in to different locations
• locations have types (hierarchy)
  – restaurant, sport venue, museum, college, …
• questions:
  – where do people hang out?
  – where do events take place?
  – do friends influence each other?


when/where do people check in?

[Figure: Check-ins temporal pattern, for New York, London, Barcelona, Helsinki, and all cities in total.
(a) Hourly check-in frequency during the day (x-axis: hour). The activity is at its lowest in the early morning, and after that there are three peaks: one when people go to work in the morning, one in the middle of the day, and the last one at the end of the evening. Yet, depending on the city, these peaks do not happen at the same time, nor with the same intensity. Therefore, instead of working directly with the raw values of features, we use the number of standard deviations from the mean, or z-score.
(b) Four-hour check-in frequency during the week (x-axis: four-hour bins from Monday to Sunday). We observe a day/night cycle, which becomes less distinctive as the weekend approaches and disappears on Saturday.]

[Figure: Venues clustered by time of check-ins. Two panels show time clusters of venues in Paris (x-axis: hour; y-axis: percentage).]
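The caption above normalizes each city's check-in counts by z-score, so that cities with different overall activity levels become comparable. A minimal Python sketch of that step follows; it assumes the population standard deviation and uses made-up hourly counts, since the source gives neither the exact variant nor the data.

```python
from statistics import mean, pstdev

def zscores(counts):
    """Convert one city's raw hourly check-in counts into z-scores
    (number of standard deviations from that city's mean)."""
    mu = mean(counts)
    sigma = pstdev(counts)  # population standard deviation (an assumption)
    if sigma == 0:
        return [0.0 for _ in counts]  # flat profile: every hour is "average"
    return [(c - mu) / sigma for c in counts]

# Hypothetical hourly check-in counts (hours 0..23) for a large city,
# and a small city with the same daily shape but one tenth of the volume.
big_city = [40, 20, 10, 5, 5, 10, 30, 90, 120, 80, 70, 110,
            150, 120, 90, 80, 100, 130, 160, 170, 150, 120, 90, 60]
small_city = [c / 10 for c in big_city]

z_big = zscores(big_city)
z_small = zscores(small_city)
print([round(z, 2) for z in z_big[:6]])
```

Because the two hypothetical cities differ only in scale, their z-score profiles coincide, which is the point of normalizing before comparing cities.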


when/where do people check in?

(a) Venues in Paris and Barcelona with the lowest and highest user entropy.

Lowest user entropy
City       Venue                       Category             Entropy
Barcelona  Castellers de Barcelona     Non-Profit           0.0139
Barcelona  Café de la Pompeu           Café                 0.0172
Barcelona  Ràdio …                     Radio Station        0.0176
Paris      Boutique Orange             Electronics Store    0.0099
Paris      Métro Goncourt […]          Subway               0.0105
Paris      Blue Acacia                 Office               0.0112

Highest user entropy
City       Venue                       Category             Entropy
Barcelona  Plaça de Catalunya          Plaza                0.5835
Barcelona  Sants Estació               Train Station        0.6298
Barcelona  Sagrada Família             Government Building  0.6309
Barcelona  Camp Nou                    Stadium              0.6852
Paris      Gare SNCF : Gare de Lyon    Train Station        0.6725
Paris      Gare SNCF : Paris Nord      Train Station        0.6911
Paris      Musée du Louvre             Museum               0.6924
Paris      Tour Eiffel                 Government Building  0.7167

[(b) Plot: user entropy H_u versus time entropy H_t of venues in Paris, i.e., entropy with respect to user population and to time of check-in during the day, with a quadratic and an exponential curve fitted to the points.]

Figure: Extreme values of user entropy, and its relation with time entropy.
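The table and plot compare venues by user entropy (how evenly a venue's check-ins are spread over users) and time entropy (how evenly they are spread over the hours of the day). The source does not define these precisely; the sketch below assumes Shannon entropy normalized by the log of the number of possible categories, so values fall in [0, 1]. The record format, the `num_users_in_city` normalizer, and the sample venues are all hypothetical.

```python
import math
from collections import Counter

def normalized_entropy(labels, num_categories):
    """Shannon entropy of the empirical label distribution, divided by
    log(num_categories) so that the result lies in [0, 1]."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0 or num_categories < 2:
        return 0.0
    h = sum(-(c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(num_categories)

def venue_entropies(checkins, num_users_in_city):
    """Return (user entropy, time entropy) for one venue.

    checkins: list of (user_id, hour_of_day) pairs (hypothetical format).
    num_users_in_city: normalizer for user entropy (an assumption).
    """
    users = [user for user, _ in checkins]
    hours = [hour for _, hour in checkins]
    h_user = normalized_entropy(users, num_users_in_city)
    h_time = normalized_entropy(hours, 24)  # 24 hourly bins
    return h_user, h_time

# Hypothetical venues: one dominated by a single regular visiting at 9 a.m.,
# one visited by many different users at all hours.
regular_spot = [("u1", 9)] * 50 + [("u2", 9)]
busy_station = [(f"u{i}", i % 24) for i in range(48)]
print(venue_entropies(regular_spot, num_users_in_city=100))
print(venue_entropies(busy_station, num_users_in_city=100))
```

Under this normalization, a venue with a single dominant visitor scores near 0 and a busy, all-day venue scores high on both measures; the actual values in the table depend on definitions the source does not spell out.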


data sources (less obvious)

traffic sensors


detecting events with traffic sensors


project ideas (less obvious)


your project

• come up with a project idea
• implement it!
• report on your results and findings


types of projects

• form a hypothesis and set out to test it
  – are rich people happier?
• start with an interesting question
  – which are the hipster neighborhoods in my city?
• start with a business idea
  – recommend relevant music to music listeners
  – recommend clothes to music listeners
• start with a problem that you (think you) can solve
  – how to identify trends in space and time?
• start with a cool dataset and explore it
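For the first project type (form a hypothesis and test it), one common way to check whether two groups differ, without assuming a particular distribution, is a permutation test. A minimal sketch follows; the scores and group names are hypothetical, and the slides do not prescribe this particular test.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def permutation_test(group_a, group_b, num_permutations=10_000, seed=0):
    """One-sided permutation test of H1: mean(group_a) > mean(group_b).

    Shuffles the pooled observations to simulate the null hypothesis that
    group labels do not matter; returns (observed difference, p-value estimate).
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(num_permutations):
        rng.shuffle(pooled)
        if mean(pooled[:n_a]) - mean(pooled[n_a:]) >= observed:
            count += 1
    # The +1 terms count the observed labeling itself, so the p-value is never 0.
    return observed, (count + 1) / (num_permutations + 1)

# Hypothetical happiness scores (e.g., average tweet sentiment) for two groups of users.
high_income = [0.7, 0.6, 0.8, 0.75, 0.65]
other = [0.5, 0.55, 0.6, 0.45, 0.5]
diff, p = permutation_test(high_income, other)
print(f"difference in means = {diff:.3f}, p = {p:.4f}")
```

A small p-value says the observed gap would be unusual if group membership were irrelevant; it does not by itself establish why the groups differ.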


your project

1. set a goal for your project (what’s the question you want to answer?)
2. study related literature (what has or hasn’t been done already? or do you think you can do it better?)
3. collect data (some data are more difficult to come by)
4. analyze data
5. evaluate results (have you answered the question asked originally? possible improvements? future work?)


coming up with a project idea

• conferences: SIGKDD, ICWSM, WWW, WSDM
• themes
  – urban computing, trend / event detection, social networks, political sentiment, privacy
  – other
• google scholar
• talk with us

office hours: Mon, 14:15-15:30 and by appointment


collecting the data

• what data are available?
  – different platforms share different data about their users’ activity
  – browse dev sites of social networks & find out about privacy policies and APIs
  – browse public data repositories
  – the data mining group has data for blog posts, twitter, google+, facebook, foursquare
• code: Mining the Social Web (github) https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition


schedule

• Today: overview
• February 2nd: discuss literature (Aris)
• February 9th: discuss literature (Michael)
• February 16th & 23rd: present project proposals
• March 30th: students submit progress report
• March 30th & April 6th: intermediate presentations
• May 4th & May 11th: final presentations
• May 15th: final report due


final report

• introduction
• related work
• problem statement
• proposed technique (algorithms)
• data description
• empirical evaluation
  – results
  – comparison with state of the art
• future work


grading

• originality (has it been done before?)
• potential impact (how interesting it is & why)
• rigorousness of proposed technique
• reproducibility (public code)
• presentation

• teams of 2 are encouraged
• presentations & reports are required
• surveys of existing techniques are ok, too


schedule

• Today: overview
• February 2nd: discuss literature (Aris)
• February 9th: discuss literature (Michael)
• February 16th and 23rd: students present project proposals
• March 30th: students submit progress report
• March 30th & April 6th: intermediate presentations
• May 4th & May 11th: final presentations
• May 15th: final report due


until then...

• browse literature
  – see papers posted on noppa
  – for a sample, conferences KDD, ICWSM, WWW, WSDM
  – google scholar
• dev websites, for example: https://dev.twitter.com, https://developers.facebook.com, https://developer.github.com/, https://developer.foursquare.com
• code samples: https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition
• data repositories: http://snap.stanford.edu/, http://icwsm.org/2013/datasets/datasets/, http://wadam-data.dis.uniroma1.it
• and talk to us!
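Many network datasets, including those on SNAP, come as plain-text edge lists: comment lines starting with '#', then one pair of node ids per line. Assuming that format (check each dataset's description, since some graphs are directed), the sketch below loads such a list as an undirected graph and answers the earlier question of how many connections an average person has, as the average degree 2|E|/|V|. The sample data is hypothetical.

```python
from collections import defaultdict

def read_edge_list(lines):
    """Parse an edge list into a set of undirected edges.

    Lines starting with '#' are comments; every other non-empty line holds
    two node ids separated by whitespace. Self-loops and duplicate edges
    (including the reverse direction) are dropped.
    """
    edges = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        u, v = line.split()[:2]
        if u != v:
            edges.add((min(u, v), max(u, v)))
    return edges

def average_degree(edges):
    """Average connections per node, 2|E| / |V|; nodes with no edges never
    appear in an edge list, so they are not counted."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sum(degree.values()) / len(degree) if degree else 0.0

# Tiny hypothetical sample; a downloaded file could be read with
#   with open(path) as f: edges = read_edge_list(f)
sample = """# Undirected graph: hypothetical sample
# FromNodeId\tToNodeId
1\t2
1\t3
2\t3
3\t4
2\t1
""".splitlines()
edges = read_edge_list(sample)
print(len(edges), average_degree(edges))
```

The duplicate line "2 1" collapses into the existing edge between 1 and 2, so the sample has 4 edges over 4 nodes.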


see you next week!

Aristides Gionis
Michael Mathioudakis
contact: firstname.lastname@aalto.fi

Office Hours: Mon, 14:15-15:30 and by appointment
