Let’s introduce Amazon Kinesis: inaugural meetup of the Amazon Kinesis - London User Group

Snowplow and Kinesis - Presentation to the inaugural Amazon Kinesis London User Group


DESCRIPTION

This is my presentation to the inaugural meetup of the Amazon Kinesis London User Group. In it I briefly introduced Snowplow, explained why we were excited about Kinesis (drawing on my "three eras" blog post) and then set out how we are updating Snowplow to run on Kinesis. I concluded with a live demo of what we have running on Kinesis so far.


Page 1: Snowplow and Kinesis - Presentation to the inaugural Amazon Kinesis London User Group

Let’s introduce Amazon Kinesis

Inaugural meetup of the Amazon Kinesis - London User Group

Page 2

This evening

• Introducing Amazon Kinesis, Ian Meyers, AWS

• Pizza and drinks break

• Kinesis and Snowplow, Alex Dean, Snowplow Analytics

• Drinks

• All courtesy of our hosts:

Page 3

Introducing Amazon Kinesis

Page 4

Snowplow and Kinesis

1. Snowplow – who we are

2. Why are we excited about Kinesis?

3. Adding Kinesis support to Snowplow

4. Live demo!

5. Questions

Page 5

Snowplow – who we are

Page 6

Today, Snowplow is primarily an open source web analytics platform

[Diagram: Website / webapp → Snowplow data pipeline (Collect → Transform and enrich) → Amazon Redshift / PostgreSQL and Amazon S3]

• Your granular, event-level and customer-level data, in your own data warehouse

• Connect any analytics tool to your data

• Join your web analytics data with any other data set

Page 7

Snowplow was born out of our frustration with traditional web analytics tools…

• Limited set of reports that don’t answer business questions
  • Traffic levels by source
  • Conversion levels
  • Bounce rates
  • Pages / visit

• Web analytics tools don’t understand the entities that matter to business
  • Customers, intentions, behaviours, articles, videos, authors, subjects, services…
  • …vs pages, conversions, goals, clicks, transactions

• Web analytics tools are siloed
  • Hard to integrate with other data sets, including digital (marketing spend, ad server data), customer data (CRM) and financial data (cost of goods, customer lifetime value)

Page 8

…and out of the opportunities that new technologies presented to tame big data

These tools make it possible to capture, transform, store and analyse all your granular, event-level data, so you can perform any analysis

Page 9

Snowplow is composed of a set of loosely coupled subsystems, architected to be robust and scalable

[Diagram: 1. Trackers → 2. Collectors → 3. Enrich → 4. Storage → 5. Analytics, with standardised data protocols (A, B, C, D) linking each pair of adjacent subsystems]

1. Trackers: generate event data. Examples: JavaScript tracker; Python / Lua / No-JS / Arduino tracker

2. Collectors: receive data from trackers and log it to S3. Examples: CloudFront collector; Clojure collector for Amazon Elastic Beanstalk

3. Enrich: clean and enrich raw data. Built on Scalding / Cascading / Hadoop and powered by Amazon EMR

4. Storage: store data ready for analysis. Examples: Amazon Redshift; PostgreSQL; Amazon S3

• Batch-based

• Normally run overnight; sometimes every 4-6 hours
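To make "loosely coupled subsystems joined by standardised protocols" concrete, here is a minimal Python sketch of the batch flow above. It is not Snowplow code: the record types, the function names (`track`, `collect`, `enrich`, `store`) and the querystring fields are hypothetical, and in-memory lists stand in for S3 and the warehouse. Each stage depends only on the record format it receives, so any one stage could be swapped out without touching the others.

```python
from dataclasses import dataclass
from typing import Dict, List
from urllib.parse import parse_qsl, urlencode

# Hypothetical record types standing in for the standardised protocols that
# link the subsystems. Field names are illustrative, not Snowplow's own.
@dataclass(frozen=True)
class RawEvent:
    querystring: str  # what a tracker sends to a collector

@dataclass(frozen=True)
class EnrichedEvent:
    fields: Dict[str, str]  # what enrichment hands on to storage

def track(event_type: str, page_url: str) -> RawEvent:
    """1. Tracker: generate event data."""
    return RawEvent(urlencode({"e": event_type, "url": page_url}))

def collect(events: List[RawEvent], log: List[str]) -> None:
    """2. Collector: receive events and log them (a list stands in for S3)."""
    log.extend(event.querystring for event in events)

def enrich(log: List[str]) -> List[EnrichedEvent]:
    """3. Enrich: a batch job over everything logged so far; malformed lines are skipped."""
    enriched = []
    for line in log:
        try:
            enriched.append(EnrichedEvent(dict(parse_qsl(line, strict_parsing=True))))
        except ValueError:
            continue  # a line with no key=value pairs cannot be parsed
    return enriched

def store(events: List[EnrichedEvent], table: List[Dict[str, str]]) -> None:
    """4. Storage: load enriched events (a list stands in for Redshift/PostgreSQL)."""
    table.extend(event.fields for event in events)

s3_log: List[str] = []
warehouse: List[Dict[str, str]] = []
collect([track("pv", "/home"), track("pv", "/about"), RawEvent("garbage")], s3_log)
store(enrich(s3_log), warehouse)  # in a batch setup this step runs on a schedule
print(warehouse)  # [{'e': 'pv', 'url': '/home'}, {'e': 'pv', 'url': '/about'}]
```

The batch nature shows up in `enrich` running over the whole accumulated log at once, which is what makes the latency "overnight or every few hours" rather than seconds.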

Page 10

Why are we excited about Kinesis?

Page 11

A quick history lesson: the three eras of business data processing

1. The classic era, 1996+

2. The hybrid era, 2005+

3. The unified era, 2013+

For more see http://snowplowanalytics.com/blog/2014/01/20/the-three-eras-of-business-data-processing/

Page 12

The classic era, 1996+

[Diagram, OWN DATA CENTER: CMS, CRM, E-comm and ERP each sit in their own silo with a local loop (NARROW DATA SILOES, LOW LATENCY LOCAL LOOPS), with point-to-point connections between systems. A nightly batch ETL process feeds a data warehouse used for management reporting (WIDE DATA COVERAGE, FULL DATA HISTORY, HIGH LATENCY).]

Page 13

The hybrid era, 2005+

[Diagram, CLOUD VENDOR / OWN DATA CENTER: Search, E-comm, ERP and CMS, each a silo with its own local loop. SAAS VENDORS #1 to #3: CRM, Email marketing and Web analytics, each with its own local loop (NARROW DATA SILOES, LOW LATENCY LOCAL LOOPS). Data leaves the silos through APIs and bulk exports into separate pipelines: stream processing for product rec’s and micro-batch processing for systems monitoring (LOW LATENCY); batch processing into a data warehouse for management reporting, and batch processing into Hadoop for ad hoc analytics (HIGH LATENCY).]

Page 14

The unified era, 2013+

[Diagram, CLOUD VENDOR / OWN DATA CENTER and SAAS VENDORS #1 and #2: Search, E-comm, ERP, CMS, CRM and Email marketing remain narrow data siloes with some low latency local loops. Streaming APIs / web hooks feed their events into a unified log (LOW LATENCY, WIDE DATA COVERAGE, FEW DAYS’ DATA HISTORY). From the log, archiving into Hadoop provides wide data coverage and full data history for high-latency ad hoc analytics and management reporting, while the low-latency eventstream drives systems monitoring, product rec’s, fraud detection and churn prevention. APIs also appear alongside these consumers.]

Page 15

[Same unified-era diagram as on page 14]

The unified log is Kinesis (or Kafka)
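What makes a unified log useful is that it is an ordered, append-only record of events that many applications can read independently, each tracking how far it has got, with only a limited window of history retained. The toy class below illustrates just that idea in Python. It is not the Kinesis or Kafka API: the names `UnifiedLog`, `Consumer` and `read_after`, and the three-day retention window, are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class UnifiedLog:
    """Toy in-memory unified log (hypothetical; not the Kinesis or Kafka API).

    Records are appended in order and given increasing sequence numbers;
    records older than the retention window are trimmed away.
    """
    retention_seconds: float
    _records: List[Tuple[int, float, str]] = field(default_factory=list, init=False)
    _next_seq: int = field(default=0, init=False)

    def append(self, payload: str, now: float) -> int:
        seq = self._next_seq
        self._records.append((seq, now, payload))
        self._next_seq += 1
        return seq

    def trim(self, now: float) -> None:
        """Drop records that have aged out of the retention window."""
        cutoff = now - self.retention_seconds
        self._records = [r for r in self._records if r[1] >= cutoff]

    def read_after(self, seq: int) -> List[Tuple[int, str]]:
        """Return retained records whose sequence number is greater than seq."""
        return [(s, payload) for s, _, payload in self._records if s > seq]


class Consumer:
    """One downstream app; it remembers its own position in the log."""

    def __init__(self, log: UnifiedLog) -> None:
        self.log = log
        self.position = -1  # sequence number of the last record processed

    def poll(self) -> List[str]:
        batch = self.log.read_after(self.position)
        if batch:
            self.position = batch[-1][0]
        return [payload for _, payload in batch]


log = UnifiedLog(retention_seconds=3 * 24 * 3600)  # assumed "few days" window
log.append("page_view", now=0)
log.append("add_to_basket", now=60)
monitoring = Consumer(log)
print(monitoring.poll())  # ['page_view', 'add_to_basket']
log.append("checkout", now=120)
print(monitoring.poll())  # ['checkout']
log.trim(now=3 * 24 * 3600 + 30)  # 'page_view' (t=0) ages out
archiver = Consumer(log)
print(archiver.poll())  # ['add_to_basket', 'checkout']
```

Because each consumer keeps its own position, adding another reader (say, a fraud detection app) does not disturb existing ones. A reader that starts late only sees what is still inside the retention window, which is why the diagram archives the log into Hadoop for full history.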

Page 16

[Same unified-era diagram as on page 14]

Can we implement Snowplow on top of Kinesis?

Page 17

Adding Kinesis support to Snowplow

Page 18

Where we are heading with our Kinesis architecture

[Diagram: Snowplow Trackers → Scala Stream Collector → Raw event stream → Enrich Kinesis app, which writes to an Enriched event stream and a Bad raw events stream. Sink apps: S3 sink Kinesis app → S3; Redshift sink Kinesis app → Redshift.]
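In this design, the Enrich Kinesis app reads each raw event, enriches it, and routes it either to the enriched event stream or, if it cannot be processed, to the bad raw events stream. A minimal sketch of that routing follows, assuming Python lists in place of Kinesis streams and a made-up enrichment rule (parse the querystring, require a `url` field, derive hypothetical `page_host` and `page_path` fields). It is not Snowplow's actual enrichment logic.

```python
from typing import Dict, List
from urllib.parse import parse_qsl, urlsplit

def enrich_event(raw: str) -> Dict[str, str]:
    """Hypothetical enrichment of one raw event (a querystring).
    Raises ValueError if the event cannot be enriched."""
    fields = dict(parse_qsl(raw, strict_parsing=True))  # ValueError on malformed pairs
    if "url" not in fields:
        raise ValueError("missing page URL")
    parts = urlsplit(fields["url"])
    fields["page_host"] = parts.netloc  # derived fields: the "enrich" step
    fields["page_path"] = parts.path
    return fields

def run_enrich_app(raw_stream: List[str],
                   enriched_stream: List[Dict[str, str]],
                   bad_stream: List[Dict[str, object]]) -> None:
    """Read every raw event; write successes to the enriched stream and
    failures, with the original line and the reason, to the bad stream."""
    for raw in raw_stream:
        try:
            enriched_stream.append(enrich_event(raw))
        except ValueError as err:
            bad_stream.append({"line": raw, "errors": [str(err)]})

raw_events = ["e=pv&url=http%3A%2F%2Fexample.com%2Fshop", "e=pv", "garbage"]
enriched: List[Dict[str, str]] = []
bad: List[Dict[str, object]] = []
run_enrich_app(raw_events, enriched, bad)
print(len(enriched), len(bad))  # 1 2
```

Keeping the untouched raw line in the bad stream, rather than just a count of failures, means failed events can later be inspected and, once the cause is fixed, fed back through enrichment.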

Page 19

We took an important first step in our last release…

[Diagram: pre-0.8.12, record-level enrichment functionality lived inside hadoop-etl. In 0.8.12 it moves into scala-common-enrich, which is shared by scala-hadoop-enrich and scala-kinesis-enrich.]
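Pulling record-level enrichment out into a shared library lets the batch (Hadoop) and streaming (Kinesis) jobs become thin wrappers around the same function, so both produce identical output for the same input. The Python sketch below illustrates that pattern only; `enrich_line`, its `event_type|user_id` input format, and the wrapper names are hypothetical stand-ins for scala-common-enrich, scala-hadoop-enrich and scala-kinesis-enrich.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

def enrich_line(line: str) -> Optional[dict]:
    """Shared record-level enrichment (hypothetical stand-in for scala-common-enrich).
    Assumed input format: 'event_type|user_id'. Returns None if the line is invalid."""
    event_type, sep, user_id = line.partition("|")
    if not sep or not event_type or not user_id:
        return None
    return {"event_type": event_type.strip().lower(), "user_id": user_id.strip()}

def batch_enrich(lines: List[str]) -> Tuple[List[dict], List[str]]:
    """Batch wrapper (cf. scala-hadoop-enrich): process a complete input in one run."""
    good, bad = [], []
    for line in lines:
        result = enrich_line(line)
        if result is None:
            bad.append(line)
        else:
            good.append(result)
    return good, bad

def stream_enrich(records: Iterable[str]) -> Iterator[Tuple[str, object]]:
    """Streaming wrapper (cf. scala-kinesis-enrich): emit each result as its record arrives."""
    for record in records:
        result = enrich_line(record)
        yield ("bad", record) if result is None else ("good", result)

lines = ["PV|u1", "broken", "se|u2"]
batch_good, batch_bad = batch_enrich(lines)
streamed = list(stream_enrich(iter(lines)))
stream_good = [r for kind, r in streamed if kind == "good"]
print(batch_good == stream_good)  # True
```

Only the wrappers know about batches versus streams; all business rules live in `enrich_line`, so a fix there reaches both pipelines at once.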

Page 20

… and the next release should get us much closer

[Kinesis architecture diagram from page 18, repeated]

Page 21

Live demo!

Page 22

Questions?

http://snowplowanalytics.com

https://github.com/snowplow/snowplow

@snowplowdata

Page 23

And finally…

Huge thanks to our hosts!