
Bird Feed


a real-time bird tracker for Central Park

Eamon Kavanagh, Insight Data Engineering Fellowship Summer 2016

Motivation & Main Problems

•  Birds can be fast and elusive unless you know where to look
•  How do you process real-time location and trending data?
•  How do you properly handle unreliable sensor data?
•  Can you store data in a way to ensure accuracy in batch?

[Images: Hooded Warbler, Yellow-rumped Warbler]


Demo

eamonkavanagh.com/bird-feed

Pipeline

{"name": "Catbird", "family": "Thrush", "lat": …}
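As a rough illustration of how a record like the one above might be checked before it enters the pipeline, the sketch below parses a JSON sighting and rejects malformed or out-of-range readings. It is not the project's code: the `lon` field, the bounding-box values, and the function name `parse_sighting` are all assumptions made for this example.

```python
import json
from typing import Optional

# Hypothetical bounding box roughly covering Central Park (assumed values).
LAT_MIN, LAT_MAX = 40.764, 40.801
LON_MIN, LON_MAX = -73.982, -73.949


def _is_number(value) -> bool:
    """True for int/float values, excluding booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def parse_sighting(raw: str) -> Optional[dict]:
    """Return a cleaned sighting dict, or None if the record is unusable."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return None  # not valid JSON: treat as a bad sensor message
    if not isinstance(record, dict):
        return None

    name = record.get("name")
    family = record.get("family")
    lat = record.get("lat")
    lon = record.get("lon")  # hypothetical field, not shown on the slide

    if not isinstance(name, str) or not isinstance(family, str):
        return None
    if not _is_number(lat) or not _is_number(lon):
        return None
    # Drop readings that fall outside the assumed park boundary.
    if not (LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX):
        return None

    return {"name": name, "family": family, "lat": float(lat), "lon": float(lon)}
```

Returning `None` for bad records keeps the decision about what to do with them (drop, log, or store for later batch correction) outside the parser.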

Challenges & Solutions

•  Managing real-time location and trending data to have up-to-date queries
•  Properly handling out-of-order real-time data so you have a sense of computational accuracy
•  Using very new open-source technology (cloned Flink locally to implement a bug fix before it was officially released)


Challenges & Solutions

•  Managing real-time location and trending data to have up-to-date ‘near me’ queries

[Figure: Streaming Windows in Apache Flink. Retrieved June 23, 2016]
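The windowing idea behind trending counts can be sketched in plain Python, without Flink. The example below assigns each sighting to every overlapping sliding window and counts species per window. The function name, the integer-second timestamps, and the window sizes are assumptions chosen for illustration; this is not the Flink API or the project's implementation.

```python
from collections import Counter, defaultdict


def sliding_window_counts(events, size, slide):
    """Count species per sliding window.

    events: iterable of (timestamp_seconds, species) pairs (integer timestamps).
    size:   window length in seconds.
    slide:  distance between consecutive window starts in seconds.
    Returns {window_start: Counter of species}, where each window
    covers the half-open interval [window_start, window_start + size).
    """
    windows = defaultdict(Counter)
    for ts, species in events:
        # Latest slide-aligned window start that is <= ts.
        start = ts - (ts % slide)
        # Walk backwards over every window that still contains ts.
        while start > ts - size:
            windows[start][species] += 1
            start -= slide
    return dict(windows)


sightings = [
    (5, "Catbird"),
    (20, "Catbird"),
    (40, "Hooded Warbler"),
    (65, "Catbird"),
]
counts = sliding_window_counts(sightings, size=60, slide=30)
top_species, top_count = counts[0].most_common(1)[0]
```

With a 60-second window sliding every 30 seconds, each sighting lands in two windows, so a "trending" query can read the most recent window rather than rescanning all history.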


Challenges & Solutions

•  Properly handling out-of-order real-time data so you have a sense of computational accuracy

[Figure: Watermarks in Apache Flink. Retrieved June 23, 2016]
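To make the watermark idea concrete, here is a small pure-Python simulation of a bounded-out-of-orderness watermark over tumbling windows. It is a simplified model under stated assumptions, not Flink's implementation: the watermark is taken as the largest event time seen minus `max_delay`, a window fires once the watermark reaches its end, and events for already-fired windows are reported as late. All names and parameters are hypothetical.

```python
from collections import defaultdict


def run_with_watermarks(events, window_size, max_delay):
    """Simulate tumbling event-time windows with a simple watermark.

    events: (event_time, species) pairs in ARRIVAL order (integer times).
    Returns (fired, late):
      fired: list of (window_start, count) in the order windows fired.
      late:  events that arrived after their window had already fired.
    """
    open_windows = defaultdict(int)
    fired, late = [], []
    watermark = float("-inf")

    for ts, species in events:
        window_start = ts - (ts % window_size)
        if window_start + window_size <= watermark:
            # The window already closed: the event is too late to count.
            late.append((ts, species))
        else:
            open_windows[window_start] += 1

        # Advance the watermark; it never moves backwards.
        watermark = max(watermark, ts - max_delay)

        # Fire every open window whose end the watermark has reached.
        for start in sorted(open_windows):
            if start + window_size <= watermark:
                fired.append((start, open_windows.pop(start)))

    # End of stream: flush whatever is still open.
    for start in sorted(open_windows):
        fired.append((start, open_windows.pop(start)))
    return fired, late


arrivals = [
    (1, "Catbird"),
    (12, "Hooded Warbler"),
    (8, "Catbird"),            # out of order, but within the allowed delay
    (17, "Catbird"),
    (3, "Yellow-rumped Warbler"),  # arrives after window [0, 10) fired
    (25, "Catbird"),
]
fired, late = run_with_watermarks(arrivals, window_size=10, max_delay=5)
```

The size of `late` relative to the fired counts gives a rough sense of how accurate the real-time results are, which is the trade-off the bullet above refers to: a larger `max_delay` loses fewer events but delays every result.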

About Me

•  ~2 years experience as a data scientist in ad tech
•  MSc in Applied Mathematics (University of British Columbia)
•  BSc in Pure Mathematics (McMaster University)