25
Haipeng Zhang, Mohammed Korayem, David Crandall School of Informa-cs and Compu-ng Indiana University, Bloomington, USA Mining photo8sharing websites to study ecological phenomena Gretchen LeBuhn Department of Biology San Francisco State University, San Francisco, USA

Mining(photo8sharing(websites(to(( study(ecological(phenomena(

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Mining(photo8sharing(websites(to(( study(ecological(phenomena(

Haipeng(Zhang,(Mohammed(Korayem,(David(Crandall(School&of&Informa-cs&and&Compu-ng&

Indiana&University,&Bloomington,&USA&

Mining(photo8sharing(websites(to((study(ecological(phenomena(

Gretchen(LeBuhn(Department&of&Biology&

San&Francisco&State&University,&San&Francisco,&USA&

Page 2: Mining(photo8sharing(websites(to(( study(ecological(phenomena(

Social&photo&sharing&websites&

100+&billion(photos&&6+&billion(photos&&

Page 3: Mining(photo8sharing(websites(to(( study(ecological(phenomena(

Snow(

Flowers(

Cloud(cover(

Wildlife(

Foliage(

Page 4: Mining(photo8sharing(websites(to(( study(ecological(phenomena(
Page 5: Mining(photo8sharing(websites(to(( study(ecological(phenomena(

Need&for&ecological&data&

&How&is&nature&changing&due&to&

global&warming?&

–  Plot8based(studies:(FineHgrained&informa-on&but&only&at&a&few&

loca-ons,&and&laborHintensive&

–  Aerial(surveillance:(Con-nentalHscale&informa-on,&but&only&

useful&for&some&phenomena&

[IPCC2007]&

Page 6: Mining(photo8sharing(websites(to(( study(ecological(phenomena(

Our&paper&

•  Can&we&observe&nature&by&mining&photo&websites?&

•  We&study&two&phenomena:&snow(and&vegetaHon(cover(–  Es-mate&geoHtemporal&distribu-ons&at&con-nental&scale,&

using&~150&million&photos&from&Flickr&(via&public&API)&

–  Analyze&geoHtags,&-mestamps,&text&tags,&visual&content&

–  Evaluate&techniques&for&es-ma-on&in&crowdHsourced&data&

–  Compare&to&data&from&weather&sta-ons&and&satellites&

Page 7: Mining(photo8sharing(websites(to(( study(ecological(phenomena(

Related&Work&

•  CrowdHsourced&observa-onal&data,&e.g.:&–  Es-ma-ng&public&mood&from&Twi]er&[Bollen11]&&

–  Predic-ng&product&sales&from&Flickr&tags&[Jin10]&

–  Es-ma-ng&spread&of&flu&from&search&queries&[Ginsberg09]&

– Monitoring&forest&fires&from&Twi]er&[DeLongueville09]&

•  VolunteerHbased&ci-zen&science&

& The Great Sunflower Project

Page 8: Mining(photo8sharing(websites(to(( study(ecological(phenomena(

Challenges&

•  Incorrect&geotags&and&-mestamps&

•  Difficult&to&recognize&image&content &&

automa-cally&

•  Text&tags&helpful&but&noisy&–  Some&tags&are&completely&incorrect,&others&are&misleading&

•  Dataset&biases&– Many&more&photos&in&ci-es&than&rural&areas&

–  People&more&likely&to&take&photos&of&the&unusual&

•  Misleading&image&data&

–  e.g.&zoos,&ski&slopes,&synthe-c&images,&etc.&

Page 9: Mining(photo8sharing(websites(to(( study(ecological(phenomena(

Combining&evidence&

•  Photos&by&different&people&are&(almost)&independent&

observa-ons,&with&uncorrelated&noise&

#&of&users&tagging&a&photo&

with&“snow”&

Probability&of&actual&snow&

Page 10: Mining(photo8sharing(websites(to(( study(ecological(phenomena(

A&simple&model&

•  Suppose&we’re&interested&in&some&object&X&(e.g.&snow)&–  Specifically,&whether&X&was&present&at&a&given&-me&and&place&

–  Let&s&denote&the&event&that&a&given&user&takes&a&picture&of&X#–  Assume&s&depends&on&presence&of&X:&

P(s&|&X)&=&probability&of&taking&picture&of&X,&given&X&was&present&&

&Could&be&factored&into:&Probability&of&seeing&X,&probability&of&

& &taking&photo,&probability&of&uploading&to&Flickr,&…&

P(s&|&X)&=&probability&of&taking&picture&of&X,&given&X&was&not&present&&

&Bad&-mestamps&or&geotags,&misleading&image&content,&…&

Page 11: Mining(photo8sharing(websites(to(( study(ecological(phenomena(

A&simple&model&

•  Suppose&m&users&took&photos&of&X,&and&n&users&did&not&–  Using&Bayes&law,&

–  Assuming&each&user&acts&independently&(condi-oned&on&X),&&

&

–  High&or&low&ra-o&means&high&or&low&probability&of&X;&&

&ra-o&near&1&means&low&confidence&either&way&

Page 12: Mining(photo8sharing(websites(to(( study(ecological(phenomena(

False&posi-ve&rate&

True&posi-ve&rate&

Snow&es-ma-on&in&ci-es&

•  Es-mate&daily&snow&cover&(presence&or&absence)&&

–  Predict&using&Flickr&photo&tags,&compare&to&ground&truth&

from&Na-onal&Weather&Service&historical&data&

–  Es-mate&parameters&on&2007H2008,&test&on&2009H2010.&&

Tag(set((hand8selected):(({snow,&snowy,&snowing,&&

&&snowstorm}&

&

Model(parameters(((esHmated(from(training(data):(P(s|snow)&=&17.12%&

P(s|no&snow)&=&0.14%&

Page 13: Mining(photo8sharing(websites(to(( study(ecological(phenomena(

Learning&relevant&tags&

•  Find&tags&that&correlate&well&with&snow&cover&in&GT&–  Feature&vector&for&each&day&is&histogram&of&number&of&

people&that&used&each&tag;&labels&are&snow/no&snow&from&GT&

–  Train&on&2007H2008&data,&test&on&2009H2010&data&–  Increases&classifica-on&accuracies&significantly:&

False&posi-ve&rate&

True&posi-ve&

rate&

False&posi-ve&rate&

True&posi-ve&rate&

HandHselected&tags& Learned&tags&(via&SVM)&

Page 14: Mining(photo8sharing(websites(to(( study(ecological(phenomena(

Con-nentalHscale&observa-on&

•  Es-mate&snow&cover&on&each&day&at&

each&place&in&North&America&

–  For&each&geographic&bin&of&size&1°&x&1°&–  Use&ground&truth&data&from&Terra&

satellite&

Snow&cover&

(green)&

No&snow&cover&

(blue)&&

Missing&data&

and&cloud&cover&

(black)&

NASA&Terra&

Page 15: Mining(photo8sharing(websites(to(( study(ecological(phenomena(

Satellite(map((1(degree(geo(bins)(

Map(esHmated(by(Flickr(photo(analysis(

Dec(21,(2009&

No&snow&cover&

(blue)&&

Snow&cover&

(green)&Missing&

data&

(black/

gray)&

Dec(21,(2009&

Page 16: Mining(photo8sharing(websites(to(( study(ecological(phenomena(

Con-nentalHscale&es-ma-on&

•  Predict&presence&of&snow&on&each&day&for&each&geo&bin&–  ~35&million&total&decisions&

False&posi-ve&rate&

True&posi-ve&

rate&

Learned&tags&(with&

different&classifiers)&

Recall&

Precision&

Learned&tags&

Page 17: Mining(photo8sharing(websites(to(( study(ecological(phenomena(

Visual&features&

L&=&(L11,&L

12,&…&L

44)&

&&&

&

&

a&=&(a11,&a

12,&…&a

44)&

&

&

b&=&(b

11,&b

12,&…&b

44)&

Gradient(magnitude(

•  Color&and&texture&features&similar&to&GIST&[Torralba03]&

–  Divide&image&into&array&of&4x4&cells;&in&each&cell&compute&

mean&color&value&(in&CIELab&space)&and&mean&gradient&energy&

& Color(channels(

Image(

Page 18: Mining(photo8sharing(websites(to(( study(ecological(phenomena(

Visual&features&

•  Color&and&texture&features&similar&to&GIST&[Torralba03]&

–  Divide&image&into&array&of&4x4&cells;&in&each&cell&compute&

mean&color&value&(in&CIELab&space)&and&mean&gradient&energy&

&

Gradient(magnitude(

L&=&(L11,&L

12,&…&L

44)&

&&&

&

&

a&=&(a11,&a

12,&…&a

44)&

&

&

b&=&(b

11,&b

12,&…&b

44)&

Color(channels(

L&=&(L11,&L

12,&…,&L

44)&

&&&

&

&

A&=&(a11,&a

12,&…,&a

44)&

&

&

B&=&(b

11,&b

12,&…,&b

44)&

G&=&(G11,&G

12,&…,&G

44)&

Image&descriptor&

is&concatena-on&&

of&L,&A,&B,&and&G((64&dimensions);&

then&learn&&

SVM&classifier&

Image(

Page 19: Mining(photo8sharing(websites(to(( study(ecological(phenomena(

Classifica-on&with&visual&features&

•  Vision&yields&modest&(~3%)&improvement&in&precision&

Correctly(classified(as(non8snow:( Incorrectly(classified(as(snow:(

Page 20: Mining(photo8sharing(websites(to(( study(ecological(phenomena(

Es-ma-ng&vegeta-on&cover&

•  We&also&es-mate&vegeta-on&cover&(greenery&index)&

on&a&con-nental&scale&

–  Again&using&ground&truth&data&from&Terra&satellite&

Page 21: Mining(photo8sharing(websites(to(( study(ecological(phenomena(

BuSerfly(

Page 22: Mining(photo8sharing(websites(to(( study(ecological(phenomena(

Leaves(

Page 23: Mining(photo8sharing(websites(to(( study(ecological(phenomena(

Conclusion&

•  We&propose&to&observe&the&natural&world&through&mining&&

public&photos&from&online&social&sharing&sites&

–  Hundreds&of&billions&of&images&available&

–  But&noise,&bias,&content&extrac-on&are&challenges&•  We&study&two&phenomena,&snow&cover&and&vegeta-on&

–  Using&geoHtags,&-me&stamps,&text&tags,&and&visual&features&

–  Use&ground&truth&from&satellites&to&measure&es-ma-on&accuracy&

•  Future&work&–  More&sophis-cated&computer&vision&techniques&

–  Combine&our&noisy,&sparse&data&with&biologists’&noisy,&sparse&data&

–  Study&other&phenomena,&like&migra-on&pa]erns&of&wildlife,&

distribu-ons&of&blooming&flowers,&etc.&

Page 24: Mining(photo8sharing(websites(to(( study(ecological(phenomena(

Thank&you!&

Page 25: Mining(photo8sharing(websites(to(( study(ecological(phenomena(

False&posi-ves&

Visible(snow,((i.e.&bad&ground&truth,&&

-mestamps,&geotags&

16%&

No(visible(snow,((i.e.&Incorrect&or&&

misleading&tags&

42%&

Trace(or(distant(snow(33%&

Man8made(snow(9%&

(Total&of&N=1,855&photos)&