© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. July 13, 2016 Streaming Data Processing with Amazon Kinesis Alan Lewis, Principal Architect, Realtor.com Ray Zhu, Sr. Product Manager, AWS

Streaming Data Processing with Amazon Kinesis


Page 1: Streaming Data Processing with Amazon Kinesis

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

July 13, 2016

Streaming Data Processing with Amazon Kinesis

Alan Lewis, Principal Architect, Realtor.com

Ray Zhu, Sr. Product Manager, AWS

Page 2: Streaming Data Processing with Amazon Kinesis

What to expect from this session

Amazon Kinesis: Getting Started with streaming data on AWS

• Streaming scenarios

• Amazon Kinesis Streams overview

• Amazon Kinesis Firehose overview

• Firehose getting started experience

• Amazon Kinesis at Realtor.com

Page 3: Streaming Data Processing with Amazon Kinesis

Need to go a bit faster

Page 4: Streaming Data Processing with Amazon Kinesis

Streaming data scenarios across segments

Scenarios: (1) Accelerated Ingest-Transform-Load, (2) Continual Metrics Generation, (3) Responsive Data Analysis

Data types: IT logs, application logs, social media / clickstreams, sensor or device data, market data

Ad/Marketing Tech
• (1) Publisher, bidder data aggregation
• (2) Advertising metrics like coverage, yield, conversion
• (3) Analytics on user engagement with ads, optimized bid / buy engines

IoT
• (1) Sensor, device telemetry data ingestion
• (2) IT operational metrics dashboards
• (3) Sensor operational intelligence, alerts, and notifications

Gaming
• (1) Online customer engagement data aggregation
• (2) Consumer engagement metrics for level success, transition rates, CTR
• (3) Clickstream analytics, leaderboard generation, player-skill match engines

Consumer Engagement
• (1) Online customer engagement data aggregation
• (2) Consumer engagement metrics like page views, CTR
• (3) Clickstream analytics, recommendation engines

Page 5: Streaming Data Processing with Amazon Kinesis

Amazon Kinesis

Services make it easy to capture, deliver, and process streams on AWS

• Amazon Kinesis Streams: Stores data as a continuous replayable stream for custom applications

• Amazon Kinesis Firehose: Loads streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service

• Amazon Kinesis Analytics (in preview): Analyzes data streams using standard SQL queries

Page 6: Streaming Data Processing with Amazon Kinesis

Amazon Kinesis Streams

Page 7: Streaming Data Processing with Amazon Kinesis

Amazon Kinesis Streams

Store data as a continuous stream

Easy administration: Simply create a new stream and set the desired level of capacity with shards. Scale to match your data throughput rate and volume.

Build real-time applications: Perform continual processing on streaming big data using Amazon Kinesis Client Library (KCL), Apache Spark/Storm, AWS Lambda, and more.

Low cost: Cost-efficient for workloads of any scale.
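As a rough illustration of "set the desired level of capacity with shards", the sketch below estimates a shard count from an expected write rate. It assumes the published per-shard write limits of 1 MB per second and 1,000 records per second; the function name and the example traffic figures are hypothetical and do not come from the talk.

```python
import math

# Assumed per-shard write limits for Kinesis Streams (check current AWS docs).
SHARD_WRITE_BYTES_PER_SEC = 1_000_000
SHARD_WRITE_RECORDS_PER_SEC = 1_000


def shards_needed(records_per_sec: float, avg_record_bytes: float) -> int:
    """Return the smallest shard count that satisfies both write limits."""
    by_bytes = records_per_sec * avg_record_bytes / SHARD_WRITE_BYTES_PER_SEC
    by_count = records_per_sec / SHARD_WRITE_RECORDS_PER_SEC
    # A stream always has at least one shard.
    return max(1, math.ceil(max(by_bytes, by_count)))


if __name__ == "__main__":
    # Hypothetical workload: 1,500 events/s at roughly 500 bytes each.
    print(shards_needed(1500, 500))
```

This only covers the write side; read throughput and the number of consumers can also drive the shard count.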

Page 8: Streaming Data Processing with Amazon Kinesis

Amazon Kinesis Firehose

Page 9: Streaming Data Processing with Amazon Kinesis

Amazon Kinesis Firehose

Load massive volumes of streaming data into destinations

Zero administration: Capture and deliver streaming data into Amazon S3, Amazon Redshift, and other destinations without writing an application or managing infrastructure.

Direct-to-data store integration: Batch, compress, and encrypt streaming data for delivery into data destinations in as little as 60 seconds using simple configurations.

Seamless elasticity: Scale to match data throughput without intervention.

Capture and submit streaming data to Firehose

Firehose loads streaming data continuously into Amazon S3 and Amazon Redshift

Analyze streaming data using your favorite BI tools
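The "capture and submit" step can be sketched as a single call to the Firehose `PutRecord` API. This is a minimal sketch, not the presenters' code: it assumes a client that behaves like `boto3.client("firehose")` is passed in, and the helper name and the newline-delimited JSON framing are illustrative choices.

```python
import json


def send_to_firehose(client, stream_name: str, event: dict) -> str:
    """Serialize one event as a JSON line and submit it to a delivery stream.

    `client` is assumed to behave like boto3.client("firehose").
    """
    # A trailing newline keeps records separable once Firehose batches them.
    payload = (json.dumps(event) + "\n").encode("utf-8")
    response = client.put_record(
        DeliveryStreamName=stream_name,
        Record={"Data": payload},
    )
    return response["RecordId"]
```

Passing the client in keeps the helper testable without AWS credentials; in a real producer it would be created once and reused.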

Page 10: Streaming Data Processing with Amazon Kinesis

Amazon Kinesis Firehose

Customer Experience

Page 11: Streaming Data Processing with Amazon Kinesis

Amazon Kinesis Firehose console experience

Unified console experience for Firehose and Streams

Page 12: Streaming Data Processing with Amazon Kinesis

Amazon Kinesis Firehose console (Amazon S3)

Create fully managed resources for delivery without building an app

Page 13: Streaming Data Processing with Amazon Kinesis

Amazon Kinesis Firehose console (Amazon S3)

Configure data delivery options simply using the console

Page 14: Streaming Data Processing with Amazon Kinesis

Amazon Kinesis Firehose console (Amazon Redshift)

Configure data delivery to Amazon Redshift simply using the console

Page 15: Streaming Data Processing with Amazon Kinesis

Amazon Kinesis Firehose console (Amazon ES)

Configure data delivery to Amazon ES simply using the console

Page 16: Streaming Data Processing with Amazon Kinesis

Amazon Kinesis Firehose monitoring

Visibility into and transparency of data delivery

Page 17: Streaming Data Processing with Amazon Kinesis

Amazon Kinesis Firehose monitoring

Error logging for troubleshooting delivery failures

Page 18: Streaming Data Processing with Amazon Kinesis

Amazon Kinesis Firehose pricing

Simple, pay-as-you-go, and no upfront costs

Dimension: Per 1 GB of data ingested

Value: $0.035
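To make the pricing concrete, here is a small arithmetic sketch using the $0.035 per GB figure from the slide. The event volume and size in the example are hypothetical, 1 GB is assumed to mean 10^9 bytes, and any per-record rounding that the service applies for billing is not modeled.

```python
PRICE_PER_GB_USD = 0.035  # per 1 GB ingested, from the pricing slide
BYTES_PER_GB = 1_000_000_000  # assumption: decimal gigabytes


def ingest_cost(gb_ingested: float) -> float:
    """Cost in USD for a given volume of ingested data."""
    return gb_ingested * PRICE_PER_GB_USD


def monthly_cost(events_per_day: int, avg_event_bytes: int, days: int = 30) -> float:
    """Estimate a month's ingest cost from a daily event count and average size."""
    total_gb = events_per_day * avg_event_bytes * days / BYTES_PER_GB
    return ingest_cost(total_gb)


if __name__ == "__main__":
    # Hypothetical workload: 5 million events per day at about 1 KB each.
    print(f"${monthly_cost(5_000_000, 1_000):.2f} per month")
```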

Page 19: Streaming Data Processing with Amazon Kinesis

Kinesis at Realtor.com

Page 20: Streaming Data Processing with Amazon Kinesis

What I’d like you to take away

Amazon Kinesis is:

• Simple, reliable, and offers high performance

• A transformative building block with broad applicability

• An enabler for “real time everywhere”

Page 21: Streaming Data Processing with Amazon Kinesis

About Realtor.com

First national US real estate search site

Most accurate real estate content

Gets data from 99% of MLSs

55 million unique users in April

Page 22: Streaming Data Processing with Amazon Kinesis

Realtor.com cloud strategy

Going “all in” on cloud, mostly on AWS

About half done: BI, search, geo services, and photos are all in AWS now

Strong bias towards AWS managed services

Page 23: Streaming Data Processing with Amazon Kinesis

Customer problem

My listings get lots of traffic at the start, but less over time

I only want people searching for relevant listings

I want to get more brand exposure in search

Page 24: Streaming Data Processing with Amazon Kinesis

Solution: “Turbo listings” product

Native ad product that provides customers more exposure in search

Placements are 100% relevant and look like any other listing

Shows the agent profile photo in search

Page 25: Streaming Data Processing with Amazon Kinesis

Turbo technical requirements

Extreme availability and throughput

Multiple systems, both inside and outside VPCs (and inside/outside AWS)

Auditable, secure billing database

Page 26: Streaming Data Processing with Amazon Kinesis

Why Kinesis?

Great performance

Multiproducer, multiconsumer queues

Worry-free managed service

Page 27: Streaming Data Processing with Amazon Kinesis

Turbo architecture

[Diagram: Campaign Manager running on AWS. It exposes Create Campaign, Update Campaign, Delete Campaign, and Decrement impressions APIs; mobile native apps are among the clients. Decision nodes: "Campaign expired?" and "Count reached zero?", with True/False branches.]

Page 28: Streaming Data Processing with Amazon Kinesis

Impression data

{
  "campaign_id": "01d329aa-9eb2-426c-9b7b-4877a32fb176",
  "id": "a34f271f-058d-47ba-9d45-8140261742a0",
  "listing_id": 593893632,
  "property_id": 1258201259,
  "advertiser_id": "8675309",
  "event_type": "turbo_search_impression",
  "producer": "fesl",
  "client_source": "rdc_web",
  "client_version": "8.0",
  "page_variation": "list_view",
  "timestamp": "2016-03-02T00:47:25+00:00",
  "user_agent": "..."
}
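An event shaped like the one above could be published to a Kinesis stream roughly as follows. This is a sketch under assumptions: the client is expected to behave like `boto3.client("kinesis")`, the helper name is hypothetical, and using `campaign_id` as the partition key is an illustrative choice rather than something stated in the talk.

```python
import json


def publish_impression(client, stream_name: str, impression: dict) -> dict:
    """Put one impression event onto a Kinesis stream.

    `client` is assumed to behave like boto3.client("kinesis").
    """
    return client.put_record(
        StreamName=stream_name,
        Data=json.dumps(impression).encode("utf-8"),
        # Assumption: partitioning by campaign keeps a campaign's events
        # on one shard, and therefore in order.
        PartitionKey=impression["campaign_id"],
    )
```

A partition key with more distinct values (for example the event `id`) would spread load more evenly across shards, at the cost of per-campaign ordering.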

Page 29: Streaming Data Processing with Amazon Kinesis

Impression tracking flow

[Diagram: components are Amazon EC2, Amazon Kinesis Streams, AWS Lambda, the campaign manager, and Amazon RDS. Step labels: "Pull events", "Post to web service", "Decrement in DB".]
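The "Pull events" and "Post to web service" steps can be sketched as a Lambda-style handler. When Lambda reads from a Kinesis stream, each record's payload arrives base64-encoded under `Records[i].kinesis.data`. The function names and the `post_decrement` callback are hypothetical stand-ins for the call to the campaign manager, whose actual API is not shown in the slides.

```python
import base64
import json


def decode_kinesis_records(event: dict) -> list:
    """Decode the base64 payloads of a Lambda Kinesis event into dicts."""
    return [
        json.loads(base64.b64decode(record["kinesis"]["data"]))
        for record in event.get("Records", [])
    ]


def handle(event: dict, post_decrement) -> int:
    """Forward each impression to the campaign manager; return the count sent.

    `post_decrement` is a hypothetical callable that takes a campaign ID and
    posts to the web service that decrements the remaining impressions.
    """
    impressions = decode_kinesis_records(event)
    for impression in impressions:
        post_decrement(impression["campaign_id"])
    return len(impressions)
```

Lambda delivers Kinesis records in batches, so one invocation of the handler may carry several impressions.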

Page 30: Streaming Data Processing with Amazon Kinesis

Billing flow

[Diagram: components are Amazon Kinesis Streams, three AWS Lambda functions, Amazon Kinesis Firehose, Amazon S3, Amazon DynamoDB, Amazon Redshift, AWS KMS, and a private subnet. Labels: "Event source", "Validate event", "Firehose PutRecord", "Firehose destination", "SSE-KMS encryption on Amazon S3", "Amazon S3 notification", "Status tracking", "COPY command", "KMS encryption on Amazon Redshift", "Event data in JSON", "Data transfer in JSON".]
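The "Validate event" step is not detailed in the slides. As one possible reading, a validator might check that required fields are present, correctly typed, and that the timestamp parses. The field list and rules below are assumptions based on the sample impression event, not the actual billing rules.

```python
from datetime import datetime

# Hypothetical required fields and types, taken from the sample impression.
REQUIRED_FIELDS = {
    "id": str,
    "campaign_id": str,
    "listing_id": int,
    "event_type": str,
    "timestamp": str,
}


def validation_errors(event: dict) -> list:
    """Return a list of problems found in the event; empty means valid."""
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in event:
            errors.append(f"missing field: {name}")
        elif not isinstance(event[name], expected_type):
            errors.append(f"wrong type for {name}")
    timestamp = event.get("timestamp")
    if isinstance(timestamp, str):
        try:
            datetime.fromisoformat(timestamp)
        except ValueError:
            errors.append("timestamp is not ISO 8601")
    return errors
```

Events that pass could then be handed to Firehose, while failures are logged or routed elsewhere for inspection.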

Page 31: Streaming Data Processing with Amazon Kinesis

Amazon Redshift: 15-minute batches

Page 32: Streaming Data Processing with Amazon Kinesis

Outcomes: Huge scale

Serving millions of impressions per day on 2 Kinesis shards

Tested up to 20x current site traffic

Basically, we couldn’t break it

Page 33: Streaming Data Processing with Amazon Kinesis

Outcomes: Great performance

Latencies in single or low double-digit milliseconds

Events are processed in small batches for efficiency

For our purposes, Kinesis gives us real-time data streaming

Page 34: Streaming Data Processing with Amazon Kinesis

Lessons learned

Complexity with Amazon Redshift and private subnets

Must consider what dedupe behavior you need

A simple key-value JSON data structure pays dividends
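On the dedupe point: stream consumers can receive the same record more than once (for example after a retry), so anything that counts or bills per event needs a policy for duplicates. A minimal sketch, assuming each event carries a unique `id` as in the sample impression, is to drop events whose ID has already been seen. The in-memory set is an assumption for illustration; across separate invocations a durable store would be needed.

```python
def dedupe_by_id(events, seen_ids=None):
    """Return events whose "id" has not been seen, preserving order.

    `seen_ids` is an optional set that is updated in place, so it can carry
    state between batches.
    """
    seen = set() if seen_ids is None else seen_ids
    unique = []
    for event in events:
        if event["id"] in seen:
            continue  # duplicate delivery, skip it
        seen.add(event["id"])
        unique.append(event)
    return unique
```

Whether duplicates should be dropped, counted once, or flagged depends on the consumer: a billing table and a metrics dashboard may reasonably choose differently.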

Page 35: Streaming Data Processing with Amazon Kinesis

Future: Real-time pipeline

Real time is the pinnacle

Collect data on page 1, and act on page 2

What we’ve built on Kinesis with the turbo feature is the starting point for us

Photo by @snordq on Flickr. Creative Commons License

Page 36: Streaming Data Processing with Amazon Kinesis

What I’d like you to take away

Amazon Kinesis is:

• Simple, reliable, and offers high performance

• A transformative building block with broad applicability

• An enabler for “real time everywhere”

Page 37: Streaming Data Processing with Amazon Kinesis

One final thing…

Hiring! Search for “realtor.com careers” (careers.move.com)

Software engineers, QA engineers, data scientists, product managers, and project managers

In Santa Clara; Ventura County; Vancouver, Canada; and Morgantown, WV

Thank you: Eddy Luten, Viren Nagtode, and Sonal Shirke

Page 38: Streaming Data Processing with Amazon Kinesis

Thank you!