© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
July 13, 2016
Streaming Data Processing
with Amazon Kinesis
Alan Lewis, Principal Architect, Realtor.com
Ray Zhu, Sr. Product Manager, AWS
What to expect from this session
Amazon Kinesis: Getting Started with streaming data on AWS
• Streaming scenarios
• Amazon Kinesis Streams overview
• Amazon Kinesis Firehose overview
• Firehose getting started experience
• Amazon Kinesis at Realtor.com
Streaming data scenarios across segments
Data types: IT logs, application logs, social media/clickstreams, sensor or device data, market data
| Segment | 1. Accelerated Ingest-Transform-Load | 2. Continual Metrics Generation | 3. Responsive Data Analysis |
|---|---|---|---|
| Ad/Marketing Tech | Publisher, bidder data aggregation | Advertising metrics like coverage, yield, conversion | Analytics on user engagement with ads, optimized bid/buy engines |
| IoT | Sensor, device telemetry data ingestion | IT operational metrics dashboards | Sensor operational intelligence, alerts, and notifications |
| Gaming | Online customer engagement data aggregation | Consumer engagement metrics for level success, transition rates, CTR | Clickstream analytics, leaderboard generation, player-skill match engines |
| Consumer Engagement | Online customer engagement data aggregation | Consumer engagement metrics like page views, CTR | Clickstream analytics, recommendation engines |
Amazon Kinesis: Services make it easy to capture, deliver, and process streams on AWS
• Amazon Kinesis Streams: Stores data as a continuous, replayable stream for custom applications
• Amazon Kinesis Firehose: Loads streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service
• Amazon Kinesis Analytics (in preview): Analyzes data streams using standard SQL queries
Amazon Kinesis Streams
Amazon Kinesis Streams: Store data as a continuous stream
Easy administration: Simply create a new stream and set the desired level of capacity
with shards. Scale to match your data throughput rate and volume.
Build real-time applications: Perform continual processing on streaming big data using
Amazon Kinesis Client Library (KCL), Apache Spark/Storm, AWS Lambda, and more.
Low cost: Cost-efficient for workloads of any scale.
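Capacity is set with shards, and each write is routed to one shard: Kinesis Streams hashes the record's partition key with MD5 into a 128-bit integer and stores the record in the shard whose hash-key range contains that value. The sketch below reproduces that routing locally so you can see how keys spread across shards. The helper names are hypothetical, and the evenly split ranges are an assumption that matches a freshly created stream, not necessarily one that has been resharded.

```python
import hashlib
from typing import List, Tuple

# Kinesis hash keys are unsigned 128-bit integers: 0 .. 2**128 - 1.
MAX_HASH_KEY = 2**128 - 1


def hash_key_for(partition_key: str) -> int:
    """MD5 digest of the UTF-8 partition key, read as a big-endian unsigned integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_shard_ranges(shard_count: int) -> List[Tuple[int, int]]:
    """Split the hash-key space into equal, contiguous, inclusive (start, end) ranges.

    Assumption: equal ranges, as for a newly created stream. After resharding,
    read the real ranges from the stream's shard list instead.
    """
    width = (MAX_HASH_KEY + 1) // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * width
        # The last shard absorbs any remainder so the whole space is covered.
        end = MAX_HASH_KEY if i == shard_count - 1 else start + width - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key: str, ranges: List[Tuple[int, int]]) -> int:
    """Index of the shard whose hash-key range contains the key's hash."""
    h = hash_key_for(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash key is not covered by any shard range")
```

Because the mapping is deterministic, every record with the same partition key (for example, a hypothetical choice of campaign ID as the key) lands on the same shard and keeps its order there; the flip side is that one very busy key can saturate a single shard while others stay idle.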
Amazon Kinesis Firehose
Amazon Kinesis Firehose: Load massive volumes of streaming data into destinations
Zero administration: Capture and deliver streaming data into Amazon S3, Amazon
Redshift, and other destinations without writing an application or managing infrastructure.
Direct-to-data-store integration: Batch, compress, and encrypt streaming data for
delivery into data destinations in as little as 60 seconds using simple configurations.
Seamless elasticity: Seamlessly scale to match data throughput without intervention.
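The "as little as 60 seconds" figure comes from Firehose's buffering: incoming records are held until either a configured buffer size or a buffer interval is reached, whichever comes first, and are then delivered as one batch. The class below is a hypothetical local model of that rule, not an AWS API; the limits passed in the usage line are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DeliveryBuffer:
    """Hypothetical model of size-or-interval buffering (not an AWS API)."""
    size_limit_bytes: int
    interval_seconds: float
    records: List[bytes] = field(default_factory=list)
    buffered_bytes: int = 0
    opened_at: Optional[float] = None  # time the first record of this batch arrived

    def add(self, record: bytes, now: float) -> Optional[List[bytes]]:
        """Buffer one record; return a batch to deliver if a limit was reached."""
        if self.opened_at is None:
            self.opened_at = now
        self.records.append(record)
        self.buffered_bytes += len(record)
        return self.poll(now)

    def poll(self, now: float) -> Optional[List[bytes]]:
        """Flush when the size limit OR the interval is reached, whichever comes first."""
        if not self.records:
            return None
        size_hit = self.buffered_bytes >= self.size_limit_bytes
        time_hit = now - self.opened_at >= self.interval_seconds
        if size_hit or time_hit:
            batch = self.records
            self.records, self.buffered_bytes, self.opened_at = [], 0, None
            return batch
        return None
```

For example, with `DeliveryBuffer(size_limit_bytes=5 * 1024 * 1024, interval_seconds=60)`, a trickle of small records is delivered every 60 seconds, while a burst that fills 5 MB is delivered immediately. This is why low-volume streams see latency near the interval and high-volume streams see latency driven by size.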
1. Capture and submit streaming data to Firehose
2. Firehose loads streaming data continuously into Amazon S3 and Amazon Redshift
3. Analyze streaming data using your favorite BI tools
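On the producer side, step 1 usually means batching events into Firehose `PutRecordBatch` calls. The sketch below assumes the documented batch limits (500 records and 4 MiB per call, 1,000 KiB per record) and only builds the keyword arguments for boto3's `firehose.put_record_batch(DeliveryStreamName=..., Records=[{"Data": ...}])`, so it runs without AWS access. Firehose concatenates records as-is, so each event gets a trailing newline to keep the files in S3 line-delimited. The function name and the `"turbo-events"` stream name are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, List

MAX_RECORDS_PER_CALL = 500              # PutRecordBatch record-count limit
MAX_BYTES_PER_CALL = 4 * 1024 * 1024    # PutRecordBatch total-size limit (4 MiB)
MAX_RECORD_BYTES = 1000 * 1024          # per-record limit (1,000 KiB)


def build_put_record_batch_requests(
    stream_name: str, events: Iterable[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Group events into kwargs dicts for boto3's firehose put_record_batch()."""
    requests: List[Dict[str, Any]] = []
    records: List[Dict[str, bytes]] = []
    batch_bytes = 0
    for event in events:
        # Firehose adds no delimiter between records, so append our own newline.
        data = (json.dumps(event, separators=(",", ":")) + "\n").encode("utf-8")
        if len(data) > MAX_RECORD_BYTES:
            raise ValueError("event too large for a single Firehose record")
        # Start a new call when the current one would exceed either limit.
        if records and (
            len(records) == MAX_RECORDS_PER_CALL
            or batch_bytes + len(data) > MAX_BYTES_PER_CALL
        ):
            requests.append({"DeliveryStreamName": stream_name, "Records": records})
            records, batch_bytes = [], 0
        records.append({"Data": data})
        batch_bytes += len(data)
    if records:
        requests.append({"DeliveryStreamName": stream_name, "Records": records})
    return requests
```

In a real producer, each dict would be passed as `client.put_record_batch(**request)`, and entries reported as failed (the response's `FailedPutCount` and per-record `RequestResponses`) would need to be retried.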
Amazon Kinesis Firehose
Customer Experience
Amazon Kinesis Firehose console experience: Unified console experience for Firehose and Streams
Amazon Kinesis Firehose console (Amazon S3): Create fully managed resources for delivery without building an app
Amazon Kinesis Firehose console (Amazon S3): Configure data delivery options simply using the console
Amazon Kinesis Firehose console (Amazon Redshift): Configure data delivery to Amazon Redshift simply using the console
Amazon Kinesis Firehose console (Amazon ES): Configure data delivery to Amazon ES simply using the console
Amazon Kinesis Firehose monitoring: Visibility into and transparency of data delivery
Amazon Kinesis Firehose monitoring: Error logging for troubleshooting delivery failures
Amazon Kinesis Firehose pricing: Simple pay-as-you-go pricing with no upfront costs
| Dimension | Value |
|---|---|
| Per 1 GB of data ingested | $0.035 |
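To turn the per-GB price into an estimate, you also need to know how records are metered. The sketch assumes, per the Firehose pricing page, that each record is rounded up to the next 5 KB before being counted, and it treats 1 GB as 2**30 bytes; both are assumptions to check against current pricing, and the function names are hypothetical.

```python
import math

PRICE_PER_GB_USD = 0.035             # Firehose ingestion price shown on the slide (2016)
BILLING_INCREMENT_BYTES = 5 * 1024   # assumption: each record rounded up to the next 5 KB
BYTES_PER_GB = 1024 ** 3             # assumption: 1 GB treated as 2**30 bytes


def billed_bytes(record_sizes):
    """Sum record sizes after rounding each one up to the billing increment."""
    return sum(
        math.ceil(size / BILLING_INCREMENT_BYTES) * BILLING_INCREMENT_BYTES
        for size in record_sizes
    )


def estimated_ingest_cost(record_sizes) -> float:
    """Estimated ingestion charge in USD for an iterable of record sizes in bytes."""
    return billed_bytes(record_sizes) / BYTES_PER_GB * PRICE_PER_GB_USD
```

Under these assumptions, one million 500-byte events cost about $0.17 to ingest, because each is billed as 5 KB; packing several small events into one record would bill far fewer bytes.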
Kinesis at Realtor.com
What I’d like you to take away
Amazon Kinesis is:
• Simple, reliable, and offers high performance
• A transformative building block with broad applicability
• An enabler for “real time everywhere”
About Realtor.com
• First national US real estate search site
• Most accurate real estate content
• Gets data from 99% of MLSs
• 55 million unique users in April
Realtor.com cloud strategy
• Going “all in” on cloud, mostly on AWS
• About ½ done: BI, search, geo services, and photos are all in AWS now
• Strong bias toward AWS managed services
Customer problem
• My listings get lots of traffic at the start, but less over time
• I only want people searching for relevant listings
• I want to get more brand exposure in search
Solution: “Turbo listings” product
• Native ad product that gives customers more exposure in search
• Placements are 100% relevant and look like any other listing
• Shows the agent profile photo in search
Turbo technical requirements
• Extreme availability and throughput
• Multiple systems, both inside and outside VPCs (and inside/outside AWS)
• Auditable, secure billing database
Why Kinesis?
Great performance
Multiproducer, multiconsumer queues
Worry-free managed service
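Turbo architecture (diagram): components shown are AWS, Mobile/Native Apps, and the Campaign Manager, which exposes a Create Campaign API, an Update Campaign API, a Delete Campaign API, and a Decrement impressions API. Decision points in the diagram: "Campaign expired?" and "Count reached zero?" (with True/False branches).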
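The two decision points, "Campaign expired?" and "Count reached zero?", suggest how the Decrement impressions API behaves. The sketch below is one reading of those branches, not Realtor.com's implementation: the `Campaign` fields and the rule that either check ending a campaign are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Campaign:
    """Hypothetical in-memory campaign record (field names are illustrative)."""
    campaign_id: str
    remaining_impressions: int
    expires_at: datetime
    active: bool = True


def decrement_impressions(campaign: Campaign, count: int, now: datetime) -> bool:
    """Apply `count` impressions; return whether the campaign is still active.

    Mirrors the diagram's checks: if the campaign has expired, or its
    remaining count reaches zero, it stops serving.
    """
    if not campaign.active:
        return False
    if now >= campaign.expires_at:           # "Campaign expired?" -> True
        campaign.active = False
        return False
    campaign.remaining_impressions = max(0, campaign.remaining_impressions - count)
    if campaign.remaining_impressions == 0:  # "Count reached zero?" -> True
        campaign.active = False
    return campaign.active                   # both checks False: keep serving
```

Clamping at zero means a late batch of impressions cannot drive the count negative; whether over-delivery should be billed is a business decision this sketch does not model.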
Impression data
{
  "campaign_id": "01d329aa-9eb2-426c-9b7b-4877a32fb176",
  "id": "a34f271f-058d-47ba-9d45-8140261742a0",
  "listing_id": 593893632,
  "property_id": 1258201259,
  "advertiser_id": "8675309",
  "event_type": "turbo_search_impression",
  "producer": "fesl",
  "client_source": "rdc_web",
  "client_version": "8.0",
  "page_variation": "list_view",
  "timestamp": "2016-03-02T00:47:25+00:00",
  "user_agent": "..."
}
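The billing flow includes a "Validate event" step before events reach Firehose. A minimal validator for events shaped like the sample above might look like this; which fields are required, their types, and the allow-list of event types are assumptions inferred from the single sample, and the function name is hypothetical.

```python
import json
import uuid
from datetime import datetime
from typing import Any, Dict

# Assumption: required fields and types inferred from the sample impression event.
REQUIRED_FIELDS = {
    "campaign_id": str,
    "id": str,
    "listing_id": int,
    "property_id": int,
    "advertiser_id": str,
    "event_type": str,
    "producer": str,
    "timestamp": str,
}
# Hypothetical allow-list; the source shows only this one event type.
ALLOWED_EVENT_TYPES = {"turbo_search_impression"}


def validate_impression(raw: bytes) -> Dict[str, Any]:
    """Parse one impression event; raise ValueError if it is malformed."""
    event = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(event, dict):
        raise ValueError("event must be a JSON object")
    for name, expected in REQUIRED_FIELDS.items():
        value = event.get(name)
        # bool is a subclass of int, so reject it explicitly.
        if value is None or isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"missing or mistyped field: {name}")
    uuid.UUID(event["campaign_id"])             # ValueError for a non-UUID string
    uuid.UUID(event["id"])
    datetime.fromisoformat(event["timestamp"])  # ValueError if unparseable
    if event["event_type"] not in ALLOWED_EVENT_TYPES:
        raise ValueError(f"unexpected event_type: {event['event_type']}")
    return event
```

Rejecting bad events before they reach Firehose keeps malformed rows out of the billing tables, where a Redshift `COPY` would otherwise fail or load incorrect data.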
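Impression tracking flow (diagram): impression events go into Amazon Kinesis Streams; AWS Lambda pulls events from the stream and posts them to a web service, the Campaign manager on Amazon EC2, which decrements the impression count in the DB on Amazon RDS.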
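When Lambda pulls from a Kinesis stream, it receives a batch whose `Records[i]["kinesis"]["data"]` fields hold base64-encoded payloads. The handler below sketches the "pull events, post to web service" step by tallying impressions per `campaign_id` so that each batch produces one decrement per campaign rather than one per event. `post_to_campaign_manager` is a hypothetical stand-in for the HTTP call; the real web service interface is not shown in the source.

```python
import base64
import json
from collections import Counter
from typing import Any, Dict


def count_impressions(event: Dict[str, Any]) -> Counter:
    """Tally impressions per campaign from one Lambda batch of Kinesis records."""
    counts: Counter = Counter()
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        impression = json.loads(payload)
        counts[impression["campaign_id"]] += 1
    return counts


def post_to_campaign_manager(campaign_id: str, count: int) -> None:
    """Hypothetical stand-in for the call to the campaign manager web service."""
    print(f"decrement {campaign_id} by {count}")


def lambda_handler(event, context):
    """Entry point Lambda invokes with a batch of Kinesis records."""
    counts = count_impressions(event)
    for campaign_id, count in counts.items():
        post_to_campaign_manager(campaign_id, count)
    return {"campaigns": len(counts), "impressions": sum(counts.values())}
```

If the handler raises, Lambda retries the same batch from the same position in the shard, so the downstream decrement should tolerate replayed events.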
Billing flow (diagram). Components: an event source, Amazon Kinesis Streams, AWS Lambda (three functions), Amazon Kinesis Firehose, Amazon S3, Amazon DynamoDB, Amazon Redshift, and AWS KMS, with a private subnet. Labeled steps: validate event; Firehose PutRecord; Firehose destination with SSE-KMS encryption on Amazon S3; Amazon S3 notification; status tracking; COPY command with KMS encryption on Amazon Redshift. Event data and data transfer are in JSON; Redshift loads run in 15-minute batches.
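The diagram pairs Amazon S3 notifications and status tracking with a COPY into Amazon Redshift every 15 minutes. One way to wire that, sketched under assumptions: group the S3 objects announced by notifications into 15-minute windows and write one Redshift COPY manifest (`{"entries": [{"url": ..., "mandatory": true}]}`) per window. The function names and the in-memory grouping are hypothetical; the source's status tracking would persist this state (the diagram includes Amazon DynamoDB) rather than keep it in memory.

```python
import json
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List
from urllib.parse import unquote_plus

WINDOW = timedelta(minutes=15)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts: datetime) -> datetime:
    """Floor a UTC timestamp to the start of its 15-minute window."""
    return ts - ((ts - EPOCH) % WINDOW)


def group_by_window(s3_event: Dict[str, Any]) -> Dict[datetime, List[str]]:
    """Map each 15-minute window to the s3:// URLs of objects created in it."""
    groups: Dict[datetime, List[str]] = defaultdict(list)
    for record in s3_event.get("Records", []):
        when = datetime.strptime(record["eventTime"], "%Y-%m-%dT%H:%M:%S.%fZ")
        when = when.replace(tzinfo=timezone.utc)
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        groups[window_start(when)].append(f"s3://{bucket}/{key}")
    return dict(groups)


def build_copy_manifest(urls: List[str]) -> str:
    """Redshift COPY manifest in which every listed file must load."""
    entries = [{"url": url, "mandatory": True} for url in sorted(urls)]
    return json.dumps({"entries": entries})
```

Marking entries `mandatory` makes the COPY fail loudly if a file is missing, which suits an auditable billing load better than silently skipping data.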
Outcomes: Huge scale
• Serving millions of impressions per day on 2 Kinesis shards
• Tested up to 20x current site traffic
• Basically, we couldn’t break it
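Why two shards go so far: on the write side each shard accepts up to 1,000 records or 1 MB per second. The arithmetic below converts a daily volume into a minimum shard count. The traffic figures in the usage note are hypothetical, not Realtor.com's numbers; treating 1 MB as 2**20 bytes is an assumption; and an average-rate model only covers bursts up to the multiplier you pass in.

```python
import math

RECORDS_PER_SEC_PER_SHARD = 1000
BYTES_PER_SEC_PER_SHARD = 1024 * 1024  # assumption: "1 MB" taken as 2**20 bytes
SECONDS_PER_DAY = 86_400


def shards_needed(records_per_day: float, avg_record_bytes: int, peak_factor: float = 1.0) -> int:
    """Minimum write-side shard count for a daily volume.

    `peak_factor` scales the average rate, e.g. for bursts or a 20x load test.
    """
    records_per_sec = records_per_day / SECONDS_PER_DAY * peak_factor
    by_count = records_per_sec / RECORDS_PER_SEC_PER_SHARD
    by_bytes = records_per_sec * avg_record_bytes / BYTES_PER_SEC_PER_SHARD
    return max(1, math.ceil(max(by_count, by_bytes)))
```

For example, a hypothetical 5 million 1 KB events per day averages about 58 records per second, which fits in one shard; at 20 times that rate it needs two.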
Outcomes: Great performance
• Latencies in the single-digit or low double-digit milliseconds
• Events are processed in small batches for efficiency
• For our purposes, Kinesis gives us real-time data streaming
Lessons learned
• Amazon Redshift in private subnets adds complexity
• You must decide what deduplication behavior you need
• A simple key-value JSON data structure pays dividends
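Deduplication matters because Kinesis Streams delivers records at least once: producer retries and consumer retries can both replay an event. One possible behavior, sketched here, is to drop events whose `id` (the unique field in the impression JSON) was seen recently. The class is hypothetical, and its bounded memory is a deliberate trade-off: exact, durable dedupe would need a persistent store.

```python
from collections import OrderedDict


class RecentIdFilter:
    """Drop events whose id appeared among the last `capacity` distinct ids.

    A duplicate that arrives after `capacity` newer ids have been seen will
    slip through; this trades exactness for bounded memory.
    """

    def __init__(self, capacity: int = 100_000) -> None:
        self.capacity = capacity
        self._seen = OrderedDict()  # insertion-ordered: oldest id first

    def is_new(self, event_id: str) -> bool:
        if event_id in self._seen:
            self._seen.move_to_end(event_id)  # refresh recency of a repeated id
            return False
        self._seen[event_id] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)    # evict the oldest id
        return True
```

Usage might be `fresh = [e for e in events if ids.is_new(e["id"])]` before decrementing counts; billing, where double-counting costs money, may justify a durable store keyed on the same `id`.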
Future: Real-time pipeline
• Real time is the pinnacle
• Collect data on page 1, and act on page 2
• What we’ve built on Kinesis with the Turbo feature is our starting point
Photo by @snordq on Flickr. Creative Commons License
What I’d like you to take away
Amazon Kinesis is:
• Simple, reliable, and offers high performance
• A transformative building block with broad applicability
• An enabler for “real time everywhere”
One final thing…
Hiring! Search for “realtor.com careers” (careers.move.com)
Software engineers, QA engineers, data scientists, product managers, and project managers
In Santa Clara; Ventura County; Vancouver, Canada; and Morgantown, WV
Thank you: Eddy Luten, Viren Nagtode, and Sonal Shirke
Thank you!