© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Streaming Analytics—Getting Started with Amazon Kinesis
June 20, 2016
What to expect from this session
Amazon Kinesis: Getting started with streaming data on AWS
• Streaming scenarios
• Amazon Kinesis Streams overview
• Amazon Kinesis Firehose overview
• Firehose experience for Amazon S3 and Amazon Redshift
Amazon Kinesis Streams
Build your own custom applications that
process or analyze streaming data
Amazon Kinesis Firehose
Easily load massive volumes of streaming data into Amazon S3 and Amazon Redshift
Amazon Kinesis Analytics
Easily analyze data streams using
standard SQL queries
Amazon Kinesis: Streaming data made easy
Services make it easy to capture, deliver, and process streams on AWS
In Preview
What to expect from this session
Amazon Kinesis streaming data in the AWS cloud
• Amazon Kinesis Streams
• Amazon Kinesis Firehose (focus of this session)
• Amazon Kinesis Analytics
In Preview
Streaming data scenarios across segments
Scenarios: (1) Accelerated ingest-transform-load, (2) continual metrics generation, (3) responsive data analysis
Data types: IT logs, application logs, social media/clickstreams, sensor or device data, market data
• Ad/marketing tech: (1) publisher and bidder data aggregation; (2) advertising metrics like coverage, yield, and conversion; (3) analytics on user engagement with ads, optimized bid/buy engines
• IoT: (1) sensor and device telemetry data ingestion; (2) IT operational metrics dashboards; (3) sensor operational intelligence, alerts, and notifications
• Gaming: (1) online customer engagement data aggregation; (2) consumer engagement metrics for level success, transition rates, and click-through rate (CTR); (3) clickstream analytics, leaderboard generation, player-skill match engines
• Consumer engagement: (1) online customer engagement data aggregation; (2) consumer engagement metrics like page views and CTR; (3) clickstream analytics, recommendation engines
Amazon Kinesis: Streaming data done the AWS way
Makes it easy to capture, deliver, and process real-time data streams
Pay as you go, no up-front costs
Elastically scalable
Right services for your specific use cases
Real-time latencies
Easy to provision, deploy, and manage
Amazon Kinesis Streams
Build your own data streaming applications
• Easy administration: Simply create a new stream and set the desired level of capacity with shards. Scale to match your data throughput rate and volume.
• Build real-time applications: Perform continual processing on streaming big data using the Amazon Kinesis Client Library (KCL), Apache Spark/Storm, AWS Lambda, and more.
• Low cost: Cost-efficient for workloads of any scale.
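Because capacity is set in shards, sizing a stream comes down to dividing your workload by the documented per-shard limits (1 MB/s and 1,000 records/s for writes, 2 MB/s for reads). A minimal sizing sketch, with the workload numbers purely illustrative:

```python
import math

# Documented per-shard capacity limits for Kinesis Streams:
# 1 MB/s and 1,000 records/s for writes, 2 MB/s for reads.
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0

def shards_needed(write_mb_per_sec, write_records_per_sec, read_mb_per_sec):
    """Estimate the number of shards a workload requires."""
    return max(
        math.ceil(write_mb_per_sec / WRITE_MB_PER_SHARD),
        math.ceil(write_records_per_sec / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_per_sec / READ_MB_PER_SHARD),
        1,  # a stream always has at least one shard
    )

# Example workload: writing 3 MB/s across 2,500 records/s, reading 4 MB/s.
print(shards_needed(3, 2500, 4))  # -> 3
```

The answer is driven by whichever limit binds first; here the 2,500 records/s write rate requires three shards even though read throughput alone would need only two.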
Sending and reading data from Streams
Sending: AWS SDK, AWS Mobile SDK, LOG4J, Flume, Fluentd, Amazon Kinesis Producer Library
Consuming: Get* APIs, Amazon Kinesis Client Library + Connector Library, Apache Storm, Apache Spark, Amazon Elastic MapReduce, AWS Lambda
Amazon Kinesis Streams
Managed service for real-time processing
• Real-time streaming data ingestion
• Custom-built streaming applications
• Inexpensive: $0.014 per 1,000,000 PUT payload units
We listened to our customers…
Amazon Kinesis Streams: selected new features…
• Amazon Kinesis Producer Library
• PutRecords API: up to 500 records or 5 MB of payload per call
• Amazon Kinesis Client Library in Python, Node.js, Ruby…
• Server-side time stamps
• Increased maximum individual record payload from 50 KB to 1 MB
• Reduced end-to-end propagation delay
• Extended stream retention from 24 hours to 7 days
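A producer using the PutRecords API has to respect both limits at once: no more than 500 records and no more than 5 MB of payload per call. A minimal batching sketch (the actual `put_records` call via boto3 is indicated in a comment rather than executed):

```python
MAX_RECORDS_PER_CALL = 500            # PutRecords record-count limit
MAX_BYTES_PER_CALL = 5 * 1024 * 1024  # PutRecords 5 MB payload limit

def batch_records(records):
    """Group raw byte strings into PutRecords-sized batches."""
    batches, current, current_bytes = [], [], 0
    for data in records:
        # Start a new batch when adding this record would break either limit.
        if current and (len(current) >= MAX_RECORDS_PER_CALL
                        or current_bytes + len(data) > MAX_BYTES_PER_CALL):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(data)
        current_bytes += len(data)
    if current:
        batches.append(current)
    return batches

# Each batch would then be sent with something like:
#   kinesis.put_records(StreamName=..., Records=[{"Data": d, "PartitionKey": k}, ...])
# checking the response's FailedRecordCount and retrying any failed entries.
```

A real producer should also retry individual failed records reported in the PutRecords response, since the call can partially succeed.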
Amazon Kinesis Firehose
Amazon Kinesis Firehose
Load massive volumes of streaming data into Amazon S3 and Amazon Redshift
Zero administration: Capture and deliver streaming data into Amazon S3, Amazon Redshift, and other destinations without writing an application or managing infrastructure.
Direct-to-data store integration: Batch, compress, and encrypt streaming data for delivery into data destinations in as little as 60 seconds using simple configurations.
Seamless elasticity: Seamlessly scales to match data throughput without intervention.
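The "as little as 60 seconds" delivery behavior comes from buffering hints: Firehose flushes to the destination when either a size or a time threshold is reached. A sketch of the S3 destination settings, assuming boto3; the bucket and role ARNs are placeholders:

```python
# Illustrative S3 destination settings for a Firehose delivery stream.
# The role and bucket ARNs below are placeholders, not real resources.
s3_destination = {
    "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
    "BucketARN": "arn:aws:s3:::example-firehose-bucket",
    "BufferingHints": {
        "SizeInMBs": 5,           # flush once 5 MB have accumulated...
        "IntervalInSeconds": 60,  # ...or every 60 seconds, whichever comes first
    },
    "CompressionFormat": "GZIP",  # compress objects before delivery
}

# With boto3, this dict would be passed as the S3 destination configuration to
# firehose_client.create_delivery_stream(...).
```

Whichever threshold fires first triggers delivery, so a low-volume stream still sees data land within the configured interval.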
1. Capture and submit streaming data to Firehose
2. Firehose loads streaming data continuously into Amazon S3 and Amazon Redshift
3. Analyze streaming data using your favorite BI tools
Amazon Kinesis Firehose
Sources: AWS platform SDKs, mobile SDKs, Amazon Kinesis agent, AWS IoT
Destinations: Amazon S3, Amazon Redshift, Amazon Elasticsearch Service
• Send data from IT infrastructure, mobile devices, and sensors
• Integrated with the AWS SDKs, agents, and AWS IoT
• Fully managed service to capture streaming data
• Elastic without resource provisioning
• Pay as you go: 3.5 cents per GB transferred
• Batches, compresses, and encrypts data before loads
• Loads data into Amazon Redshift tables by using the COPY command
Capture IT and app logs, device and sensor data, and more
Enable near-real-time analytics using existing tools such as Amazon Elasticsearch Service
Amazon Kinesis Firehose: three simple concepts
1. Delivery stream: The underlying entity of Firehose. Use Firehose by creating a delivery stream to a specified destination and sending data to it.
• You do not have to create a stream or provision shards.
• You do not have to specify partition keys.
2. Records: A data producer sends data blobs as large as 1,000 KB to a delivery stream. Each data blob is called a record.
3. Data producers: Producers send records to a delivery stream. For example, a web server that sends log data to a delivery stream is a data producer.
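A producer sketch tying the three concepts together, assuming boto3: each log line becomes one record, checked against the 1,000 KB per-record limit before being sent to a delivery stream (the stream name is a placeholder, and the client call itself is illustrative rather than executed here):

```python
MAX_RECORD_BYTES = 1000 * 1024  # 1,000 KB per-record limit from the slide

def validate_record(data: bytes) -> bytes:
    """Reject blobs that exceed the per-record Firehose limit."""
    if len(data) > MAX_RECORD_BYTES:
        raise ValueError(
            f"record is {len(data)} bytes; limit is {MAX_RECORD_BYTES}")
    return data

def send_line(firehose_client, stream_name: str, line: str):
    """Send one log line as a single Firehose record (illustrative call)."""
    record = validate_record(line.encode("utf-8"))
    # With boto3, the data producer simply names the delivery stream;
    # there are no shards or partition keys to manage.
    return firehose_client.put_record(
        DeliveryStreamName=stream_name,
        Record={"Data": record},
    )
```

For higher throughput, records can also be sent in batches with `put_record_batch`, subject to its own per-call limits.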
Amazon Kinesis Firehose console experience Unified console experience for Firehose and Streams
Amazon Kinesis Firehose console (S3) Create fully managed resources for delivery without building an app
Amazon Kinesis Firehose console (S3) Configure data delivery options simply using the console
Amazon Kinesis Firehose console (Amazon Redshift)
Configure data delivery to Amazon Redshift simply using the console
Amazon Kinesis Firehose console (Amazon Elasticsearch Service)
Amazon Kinesis agent
Software agent makes submitting data to Firehose easy
• Monitors files and sends new data records to your delivery stream
• Handles file rotation, checkpointing, and retry upon failures
• Preprocessing capabilities such as format conversion and log parsing
• Delivers all data in a reliable, timely, and simple manner
• Emits Amazon CloudWatch metrics to help you better monitor and troubleshoot the streaming process
• Supported on Amazon Linux AMI version 2015.09 or later, or Red Hat Enterprise Linux version 7 or later; install on Linux-based server environments such as web servers, front ends, log servers, and more
• Also enabled for Streams
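The agent is driven by a JSON configuration file (by default `/etc/aws-kinesis/agent.json`). A minimal sketch, assuming a delivery stream named `my-delivery-stream` and application logs under `/var/log/` (both placeholders):

```json
{
  "cloudwatch.emitMetrics": true,
  "flows": [
    {
      "filePattern": "/var/log/app.log*",
      "deliveryStream": "my-delivery-stream"
    }
  ]
}
```

To send to Kinesis Streams instead, a flow names a `kinesisStream` rather than a `deliveryStream`.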
Amazon Kinesis Firehose pricing
Simple, pay as you go, and no up-front costs
• Per GB of data ingested: $0.035
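With a single ingestion dimension, estimating a bill is just arithmetic. A back-of-the-envelope sketch using the $0.035/GB figure from the slide (it ignores any per-record size rounding the actual billing may apply):

```python
PRICE_PER_GB = 0.035  # Firehose ingestion price from the slide

def monthly_firehose_cost(gb_per_day: float, days: int = 30) -> float:
    """Rough monthly ingestion cost; ignores per-record size rounding."""
    return gb_per_day * days * PRICE_PER_GB

# Ingesting 100 GB/day for a 30-day month:
print(round(monthly_firehose_cost(100), 2))  # -> 105.0
```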
Amazon Kinesis Firehose or Amazon Kinesis Streams?
Amazon Kinesis Streams is a service for workloads that require custom per-record processing, sub-second processing latency, and a choice of stream processing frameworks.
Amazon Kinesis Firehose is a service for workloads that require zero administration, the ability to use existing analytics tools based on Amazon S3, Amazon Redshift, or Amazon Elasticsearch Service, and a data latency of 60 seconds or higher.
Amazon Kinesis Analytics
Amazon Kinesis Analytics
Analyze data streams continuously with standard SQL
Apply SQL on streams: Easily connect to data streams and apply existing SQL skills.
Build real-time applications: Perform continual processing on streaming big data with sub-second processing latencies.
Scale elastically: Elastically scales to match data throughput without any operator intervention.
Preview!
• Connect to Amazon Kinesis streams and Firehose delivery streams
• Run standard SQL queries against data streams
Amazon Kinesis Analytics can send processed data to analytics tools so you can create alerts and respond in real time
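An illustrative sense of what "standard SQL queries against data streams" looks like: Kinesis Analytics applications define in-application streams and pumps in SQL. The sketch below (held in a Python string as it would be when submitted programmatically) computes a continuous windowed count; the stream and column names are hypothetical, not taken from this deck:

```python
# Illustrative Kinesis Analytics application code. The stream and column
# names (page, DESTINATION_SQL_STREAM, SOURCE_SQL_STREAM_001) are
# hypothetical examples, not values from the presentation.
APPLICATION_CODE = """
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (page VARCHAR(64), views INTEGER);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS
  INSERT INTO "DESTINATION_SQL_STREAM"
  SELECT STREAM "page", COUNT(*)
  FROM "SOURCE_SQL_STREAM_001"
  GROUP BY "page",
           STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '10' SECOND);
"""
```

The pump runs continuously, emitting per-page counts every ten seconds to the destination stream, from which results can be sent on to alerts or dashboards.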
Select Amazon Kinesis Customer Case Studies
Ad tech Gaming IoT
DataXu: Digital ad tech (streaming pipeline into Amazon Redshift)
Sushiro: Kaiten sushi restaurants
380 stores stream data from sushi plate sensors to Amazon Kinesis
Final Hearst data pipeline (architecture diagram)
Components: users on Hearst properties, clickstream, Node.js app (proxy), Amazon Kinesis Streams, Firehose, S3, ETL on Amazon EMR, Amazon Redshift, data science application, aggregated data models, Buzzing API (API-ready data)
Per-stage latency ranges from milliseconds at ingest to roughly 100 seconds downstream; per-stage throughput ranges from 100 GB/day down to 1 GB/day.
Thank you!