Deep Dive on Big Data

Page 1: Deep Dive on Big Data

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

John Yeung, Solutions Architect

31 October 2017

Deep Dive on AWS with Demo

AWS Big Data and Machine Learning Day | Hong Kong

Page 2: Deep Dive on Big Data

What to expect from the session

• Big Data Challenges
• Architectural Principles
• Design Patterns
• Demo (around 15 mins)

Page 3: Deep Dive on Big Data

Ever-Increasing Big Data

Volume

Velocity

Variety

Page 4: Deep Dive on Big Data

Big Data Evolution

Batch Processing

Stream Processing

Machine Learning

Page 5: Deep Dive on Big Data

Plenty of Tools

Amazon Glacier, Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon EMR, Amazon Redshift, AWS Data Pipeline, Amazon Kinesis, Amazon Kinesis Streams apps, AWS Lambda, Amazon ML, Amazon SQS, Amazon ElastiCache, DynamoDB Streams, Amazon Elasticsearch Service, Amazon Kinesis Analytics

Page 6: Deep Dive on Big Data

Big Data Challenges

Why?

How?

What tools should I use?

Is there a reference architecture?

Page 7: Deep Dive on Big Data

Architectural Principles

Page 8: Deep Dive on Big Data

Architecture Principles

#1: Build Decoupled Systems
• Data → Store → Process → Store → Analyze → Answers

#2: Use the Right Tool for the Job
• Data structure, latency, throughput, access patterns

#3: Leverage AWS Managed Services
• Scalable/elastic, available, reliable, secure, no/low admin

#4: Use Lambda Architecture Ideas
• Immutable (append-only) log; batch, speed, and serving layers

#5: Be Cost-Conscious
• Big data ≠ big cost
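Principle #4 can be illustrated with a minimal Python sketch built around a toy page-view counter. Events go to an append-only log. A batch layer recomputes a view over the whole log. A speed layer covers only the events that arrived after the last batch run, and a serving layer merges the two views. The event shape, the function names, and the checkpoint handling are illustrative assumptions, not an AWS API.

```python
from collections import Counter

# Hypothetical event shape for illustration: {"page": "/index.html"}
master_log = []  # immutable, append-only log: events are never updated in place


def append(event):
    master_log.append(event)


def batch_view(log):
    """Batch layer: recompute the view from scratch over the whole log."""
    return Counter(e["page"] for e in log)


def speed_view(recent):
    """Speed layer: cheap view over events the batch layer has not seen yet."""
    return Counter(e["page"] for e in recent)


def query(batch, speed, page):
    """Serving layer: answer = precomputed batch result + real-time delta."""
    return batch[page] + speed[page]


for p in ["/a", "/b", "/a"]:
    append({"page": p})
batch = batch_view(master_log)   # e.g. a periodic batch job
checkpoint = len(master_log)     # log position covered by the batch view

for p in ["/a", "/c"]:           # events arriving after the batch run
    append({"page": p})
speed = speed_view(master_log[checkpoint:])

print(query(batch, speed, "/a"))  # 3
```

In an AWS deployment, the log might live in Amazon Kinesis Streams or Amazon S3, the batch layer might run on Amazon EMR, and the speed layer might be AWS Lambda or Amazon Kinesis Analytics. The sketch only shows how the layers divide the work.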

Page 9: Deep Dive on Big Data

Simplify Big Data Processing

COLLECT → STORE → PROCESS/ANALYZE → CONSUME

1. Time to answer (latency)
2. Throughput
3. Cost

Page 10: Deep Dive on Big Data

COLLECT

Page 11: Deep Dive on Big Data

COLLECT: Types of Data

• Applications (mobile apps, web apps, data centers via AWS Direct Connect) → RECORDS: in-memory data, database records
• Logging (Amazon CloudWatch, AWS CloudTrail) → DOCUMENTS and FILES: search documents, log files
• Transport (AWS Import/Export Snowball) → FILES
• Messaging → MESSAGES
• IoT (devices, sensors & IoT platforms, AWS IoT) → STREAMS: data streams

Data types range from transaction-based (records) through file-based (documents, files) to event-based (messages, streams).

Page 12: Deep Dive on Big Data

Store

Page 13: Deep Dive on Big Data

STORE

Diagram: the COLLECT sources (applications, logging, transport, messaging, IoT) feeding into STORE.

Types of Data Stores
• Database: SQL & NoSQL databases
• Search: search engines
• File store: file systems
• Queue: message queues
• Stream storage: pub/sub message queues
• In-memory: caches

Page 14: Deep Dive on Big Data

Diagram: COLLECT → STORE. Stream storage (hot): Apache Kafka, Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon DynamoDB Streams. Message storage: Amazon SQS. File storage: Amazon S3. Remaining store types: In-memory, Database, Search.

Page 15: Deep Dive on Big Data

Diagram: COLLECT → STORE with all store types. It shows stream storage (Apache Kafka, Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon DynamoDB Streams), message storage (Amazon SQS), and file storage (Amazon S3). It also shows the database tier: Amazon ElastiCache (cache), Amazon DynamoDB (NoSQL), Amazon RDS (SQL), and Amazon Elasticsearch Service (search).

• Amazon ElastiCache: managed Memcached or Redis service
• Amazon DynamoDB: managed NoSQL database service
• Amazon RDS: managed relational database service
• Amazon Elasticsearch Service: managed Elasticsearch service

Page 16: Deep Dive on Big Data

Use the Right Tool for the Job

Data Tier
• Search: Amazon Elasticsearch Service
• In-memory: Amazon ElastiCache (Redis, Memcached)
• SQL: Amazon Aurora, Amazon RDS (MySQL, PostgreSQL, Oracle, SQL Server)
• NoSQL: Amazon DynamoDB, Cassandra, HBase, MongoDB

Page 17: Deep Dive on Big Data

File Storage

Diagram: COLLECT → STORE, highlighting Amazon S3 as the file store alongside the in-memory, database, search, message, and stream stores.

Page 18: Deep Dive on Big Data

Why Is Amazon S3 Good for Big Data?
• Natively supported by big data frameworks (Spark, Hive, Presto, etc.)
• Multiple and heterogeneous analysis clusters can use the same data
• Unlimited number of objects and volume of data
• Very high bandwidth; no aggregate throughput limit
• Designed for 99.99% availability; can tolerate zone failure
• Designed for 99.999999999% durability
• No need to pay for data replication
• Native support for versioning
• Tiered storage (Standard, IA, Amazon Glacier) via lifecycle policies
• Secure: SSL, client/server-side encryption at rest
• Low cost
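Tiered storage via lifecycle policies can be sketched as a plain Python dictionary in the shape the S3 lifecycle configuration API accepts. The `logs/` prefix and the 30/90-day thresholds are illustrative assumptions. The dictionary could be passed as `LifecycleConfiguration` to boto3's `put_bucket_lifecycle_configuration`, but that call is deliberately not made here.

```python
import json


def tiering_rule(prefix, ia_days=30, glacier_days=90):
    """Build one S3 lifecycle rule: Standard -> Standard-IA -> Amazon Glacier."""
    # S3 requires objects to stay at least 30 days in Standard before moving to
    # STANDARD_IA; this sketch also keeps the Glacier transition after the IA one.
    if not (30 <= ia_days < glacier_days):
        raise ValueError("expected 30 <= ia_days < glacier_days")
    return {
        "ID": f"tier-{prefix.rstrip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_days, "StorageClass": "GLACIER"},
        ],
    }


lifecycle = {"Rules": [tiering_rule("logs/")]}
print(json.dumps(lifecycle, indent=2))
```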

Page 19: Deep Dive on Big Data

Message & Stream Storage

Diagram: COLLECT → STORE, highlighting message and stream storage.

• Amazon SQS: managed message queue service
• Apache Kafka: high-throughput distributed streaming platform
• Amazon Kinesis Streams: managed stream storage + processing
• Amazon Kinesis Firehose: managed data delivery
• Amazon DynamoDB: managed NoSQL database; tables can be stream-enabled

Page 20: Deep Dive on Big Data

Why Stream Storage

Decouple producers & consumers

Persistent buffer

Collect multiple streams

Preserve client ordering

Parallel consumption

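Diagram: producers write numbered records (1–4) of four colours into a DynamoDB stream, Amazon Kinesis stream, or Kafka topic. The stream is split into shard 1 / partition 1 and shard 2 / partition 2, and records keep their order within each shard. Consumer 1 reads one shard (count of red = 4, count of violet = 4), while Consumer 2 reads the other (count of blue = 4, count of green = 4).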
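The sharding and per-shard ordering in the diagram can be sketched in Python as a simplified model with two shards and records keyed by colour. CRC32 serves as the partition hash; this is an assumption, since Amazon Kinesis Streams actually maps an MD5 hash of the partition key onto shard hash-key ranges. Each shard keeps records in arrival order, so one consumer per shard can count its keys independently. Which colours share a shard depends on the hash.

```python
import zlib
from collections import Counter, defaultdict

NUM_SHARDS = 2


def shard_for(partition_key):
    """Map a partition key to a shard with a stable hash (Python's hash() is salted per run)."""
    return zlib.crc32(partition_key.encode("utf-8")) % NUM_SHARDS


def produce(records):
    """Append each (key, sequence) record to its shard; arrival order is kept per shard."""
    shards = defaultdict(list)
    for key, seq in records:
        shards[shard_for(key)].append((key, seq))
    return shards


def consume(shard):
    """One consumer per shard: count records per key and check per-key ordering."""
    last_seq, counts = {}, Counter()
    for key, seq in shard:
        if seq <= last_seq.get(key, 0):
            raise ValueError(f"out-of-order record for {key}")
        last_seq[key] = seq
        counts[key] += 1
    return counts


# Four colours, each producing records 1..4 in order, interleaved as in the diagram
records = [(colour, n) for n in range(1, 5) for colour in ("red", "violet", "blue", "green")]
shards = produce(records)
results = [consume(shards[s]) for s in range(NUM_SHARDS)]  # consumers could run in parallel
for s, counts in enumerate(results, start=1):
    print(f"Consumer {s}:", dict(counts))
```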

Page 21: Deep Dive on Big Data

What Stream Storage should I use?

| | Amazon DynamoDB Streams | Amazon Kinesis Streams | Amazon Kinesis Firehose | Apache Kafka | Amazon SQS |
| AWS managed service | Yes | Yes | Yes | No | Yes |
| Guaranteed ordering | Yes | Yes | Yes | Yes | No |
| Delivery | exactly-once | at-least-once | exactly-once | at-least-once | at-least-once |
| Data retention period | 24 hours | 7 days | N/A | Configurable | 14 days |
| Availability | 3 AZ | 3 AZ | 3 AZ | Configurable | 3 AZ |
| Scale / throughput | No limit / ~ table IOPS | No limit / ~ shards | No limit / automatic | No limit / ~ nodes | No limits / automatic |
| Parallel clients | Yes | Yes | No | Yes | No |
| Stream MapReduce | Yes | Yes | N/A | Yes | N/A |
| Record/object size | 400 KB | 1 MB | Redshift row size | Configurable | 256 KB |
| Cost | Higher (table cost) | Low | Low | Low (+admin) | Low-medium |

Data temperature: hot (left) to warm (right).
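The Delivery row matters for consumer design. With at-least-once delivery, the same record can arrive more than once, so consumers are commonly made idempotent. The following minimal Python sketch assumes each record carries a unique `id` field, which is a hypothetical, producer-assigned event ID. It keeps the set of seen IDs in memory; a production consumer would use a durable store such as a database table instead.

```python
def process_at_least_once(deliveries, handle):
    """Apply `handle` once per record ID, even when records are redelivered."""
    seen = set()  # in-memory for the sketch; use a durable store in practice
    applied = 0
    for record in deliveries:
        if record["id"] in seen:
            continue  # duplicate delivery: skip so the side effect happens once
        handle(record)
        seen.add(record["id"])  # mark only after success, so failures get retried
        applied += 1
    return applied


values = []
deliveries = [{"id": 1, "v": 10}, {"id": 2, "v": 5}, {"id": 1, "v": 10}]  # id 1 arrives twice
applied = process_at_least_once(deliveries, lambda r: values.append(r["v"]))
print(applied, sum(values))  # 2 15
```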

Page 22: Deep Dive on Big Data

Which Data Store Should I Use

Data Structure → Fixed schema, JSON, key-value

Access Patterns → Store data in the format you will access it

Data Characteristics → Hot, Warm, Cold

Cost → Right cost

Page 23: Deep Dive on Big Data

Data Structure and Access Patterns

| Access Patterns | What to use? |
| Put/Get (key, value) | In-memory, NoSQL |
| Simple relationships → 1:N, M:N | NoSQL |
| Multi-table joins, transaction, SQL | SQL |
| Faceting, search | Search |

| Data Structure | What to use? |
| Fixed schema | SQL, NoSQL |
| Schema-free (JSON) | NoSQL, Search |
| (Key, value) | In-memory, NoSQL |
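The two tables can be combined into a small lookup. This Python sketch is one possible reading of them: it treats a store type as a good candidate when both the access pattern and the data structure suggest it. The slide itself does not prescribe intersecting the tables. The keys are transcribed from the tables above.

```python
# Lookup tables transcribed from the slide
BY_ACCESS_PATTERN = {
    "put/get (key, value)": ["In-memory", "NoSQL"],
    "simple relationships (1:N, M:N)": ["NoSQL"],
    "multi-table joins, transactions, SQL": ["SQL"],
    "faceting, search": ["Search"],
}
BY_DATA_STRUCTURE = {
    "fixed schema": ["SQL", "NoSQL"],
    "schema-free (JSON)": ["NoSQL", "Search"],
    "(key, value)": ["In-memory", "NoSQL"],
}


def candidate_stores(access_pattern, data_structure):
    """Store types suggested by both tables, in access-pattern order."""
    by_structure = set(BY_DATA_STRUCTURE[data_structure])
    return [s for s in BY_ACCESS_PATTERN[access_pattern] if s in by_structure]


print(candidate_stores("put/get (key, value)", "(key, value)"))   # ['In-memory', 'NoSQL']
print(candidate_stores("faceting, search", "schema-free (JSON)"))  # ['Search']
```

An empty result means the two requirements pull in different directions, for example multi-table joins over schema-free JSON. In that case, storing the data in the format you will access it (or in more than one store) is worth considering.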

Page 24: Deep Dive on Big Data

What is the temperature of your data

Page 25: Deep Dive on Big Data

Data characteristics: Hot, Warm or Cold

| | Hot data | Warm data | Cold data |
| Volume | MB–GB | GB–TB | PB–EB |
| Item size | B–KB | KB–MB | KB–TB |
| Latency | ms | ms, sec | min, hrs |
| Durability | Low–high | High | Very high |
| Request rate | Very high | High | Low |
| Cost/GB | $$-$ | $-¢¢ | ¢ |

Page 26: Deep Dive on Big Data

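Diagram: data stores placed from hot to cold data, from in-memory stores through NoSQL and SQL databases to Amazon Glacier. Moving from hot to cold, request rate and cost/GB go from high to low, while latency and data volume go from low to high. A second axis shows the degree of structure, from low to high.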

Page 27: Deep Dive on Big Data

Which Data Store Should I Use

| | Amazon ElastiCache | Amazon DynamoDB | Amazon RDS/Aurora | Amazon ES | Amazon S3 | Amazon Glacier |
| Average latency | ms | ms | ms, sec | ms, sec | ms, sec, min (~ size) | hrs |
| Typical data stored | GB | GB–TBs (no limit) | GB–TB (64 TB max) | GB–TB | MB–PB (no limit) | GB–PB (no limit) |
| Typical item size | B–KB | KB (400 KB max) | KB (64 KB max) | B–KB (2 GB max) | KB–TB (5 TB max) | GB (40 TB max) |
| Request rate | High – very high | Very high (no limit) | High | High | Low – high (no limit) | Very low |
| Storage cost GB/month | $$ | ¢¢ | ¢¢ | ¢¢ | ¢ | ¢4/10 |
| Durability | Low – moderate | Very high | Very high | High | Very high | Very high |
| Availability | High (2 AZ) | Very high (3 AZ) | Very high (3 AZ) | High (2 AZ) | Very high (3 AZ) | Very high (3 AZ) |

Data temperature: hot data (left) → warm data → cold data (right).

Page 28: Deep Dive on Big Data

PROCESS / ANALYZE

Page 29: Deep Dive on Big Data

Analytics & Frameworks
• Interactive: takes seconds. Example: self-service dashboards. Amazon Redshift, Amazon Athena, Amazon EMR (Presto, Spark)
• Batch: takes minutes to hours. Example: daily/weekly/monthly reports. Amazon EMR (MapReduce, Hive, Pig, Spark)
• Message: takes milliseconds to seconds. Example: message processing. Amazon SQS applications on Amazon EC2
• Stream: takes milliseconds to seconds. Example: fraud alerts, 1-minute metrics. Amazon EMR (Spark Streaming), Amazon Kinesis Analytics, KCL, Storm, AWS Lambda

PROCESS / ANALYZE

Diagram: processing tools grouped by type, from fast to slow:
• Streaming: Amazon Kinesis Analytics, KCL apps, AWS Lambda, Amazon EC2, Amazon EMR
• Message: Amazon SQS apps on Amazon EC2
• Interactive: Amazon Redshift, Presto on Amazon EMR, Amazon Athena
• Batch: Amazon EMR
• ML: Amazon Machine Learning

Page 30: Deep Dive on Big Data

What about ETL

https://aws.amazon.com/big-data/partner-solutions/

STORE → ETL → PROCESS / ANALYZE

Data Integration Partners: reduce the effort to move, cleanse, synchronize, manage, and automate data-related processes.

AWS Glue (new): a fully managed ETL service that makes it easy to understand your data sources, prepare the data, and move it reliably between data stores.

Page 31: Deep Dive on Big Data

CONSUME

Page 32: Deep Dive on Big Data

COLLECT → STORE → PROCESS / ANALYZE → CONSUME

Diagram: the COLLECT sources and STORE tiers from the previous slides, followed by ETL and the PROCESS / ANALYZE tools (streaming, message, interactive, batch, and ML), feeding CONSUME.

Page 33: Deep Dive on Big Data

COLLECT → STORE → ETL → PROCESS / ANALYZE → CONSUME

CONSUME:
• Applications & API (apps & services)
• Analysis and visualization (Amazon QuickSight), for business users
• Notebooks and IDE, for data scientists and developers

Page 34: Deep Dive on Big Data

Put them together

Page 35: Deep Dive on Big Data

Diagram: the complete reference architecture, COLLECT → STORE → ETL → PROCESS / ANALYZE → CONSUME, combining every tier and service from the previous slides.

Page 36: Deep Dive on Big Data

Design Patterns

Page 37: Deep Dive on Big Data

Concept #1: Decoupled Data Bus

• Storage decoupled from processing
• Multiple stages

Store → Process → Store → Process
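Concept #1 can be sketched in Python with in-process queues standing in for the stores between stages; in AWS these might be Amazon Kinesis, Amazon SQS, or Amazon S3. Each hypothetical stage reads only from the store before it and writes only to the store after it. As a result, stages can be changed, scaled, or replaced independently.

```python
import queue
import threading


def stage(name, fn, inbox, outbox):
    """Start a processing stage that only talks to the stores on either side of it."""
    def run():
        while True:
            item = inbox.get()
            if item is None:       # sentinel: propagate shutdown downstream
                outbox.put(None)
                return
            outbox.put(fn(item))
    t = threading.Thread(target=run, name=name)
    t.start()
    return t


raw, parsed, enriched = queue.Queue(), queue.Queue(), queue.Queue()  # the "stores"
threads = [
    stage("parse", lambda line: line.split(","), raw, parsed),
    stage("enrich", lambda f: {"user": f[0], "bytes": int(f[1])}, parsed, enriched),
]

for line in ["alice,120", "bob,300"]:
    raw.put(line)
raw.put(None)
for t in threads:
    t.join()

results = []
while (item := enriched.get()) is not None:
    results.append(item)
print(results)
```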

Page 38: Deep Dive on Big Data

Concept #2: Multiple Stream Processing

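Diagram: one Amazon Kinesis stream is processed in parallel by two consumers. AWS Lambda writes to Amazon DynamoDB, and a KCL application built with the Amazon Kinesis Connector Library writes to Amazon S3.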

• Parallel processing
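Parallel processing works because reading a stream does not consume it: each application keeps its own position (in the KCL, a checkpoint per shard). The following simplified single-shard Python sketch uses hypothetical class names. One consumer archives every record, as a KCL app writing to Amazon S3 might. The other keeps only server errors, as a Lambda function writing to Amazon DynamoDB might.

```python
class Stream:
    """Append-only, single-shard log that several consumers read independently."""

    def __init__(self):
        self._records = []

    def put(self, record):
        self._records.append(record)

    def read(self, position, limit=100):
        """Return up to `limit` records starting at `position`; nothing is removed."""
        return self._records[position:position + limit]


class Consumer:
    def __init__(self, stream, handle):
        self.stream = stream
        self.handle = handle
        self.checkpoint = 0  # this consumer's own position in the stream

    def poll(self):
        batch = self.stream.read(self.checkpoint)
        for record in batch:
            self.handle(record)
        self.checkpoint += len(batch)


archive, errors = [], []


def keep_errors(record):
    if record["status"] >= 500:
        errors.append(record)


stream = Stream()
archiver = Consumer(stream, archive.append)
alerter = Consumer(stream, keep_errors)

for status in (200, 503, 200):
    stream.put({"status": status})
archiver.poll()
alerter.poll()
print(len(archive), errors)  # 3 [{'status': 503}]
```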

Page 39: Deep Dive on Big Data

Concept #3: Multiple Data Stores

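Diagram: Amazon Kinesis is processed by AWS Lambda, by a KCL application (Amazon Kinesis Connector Library), and by Spark Streaming on Amazon EMR. Results are stored in Amazon DynamoDB and Amazon S3, and Spark SQL on Amazon EMR queries the stored data.

• Analysis framework reads from or writes to multiple data stores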

Page 40: Deep Dive on Big Data

Real-time Analytics Design Pattern

Diagram: streams from Amazon Kinesis, Apache Kafka, or DynamoDB Streams are processed by KCL apps, AWS Lambda, Spark Streaming on Amazon EMR, or Apache Storm. The outputs are:
• Alerts: notifications through Amazon SNS
• Real-time predictions: Amazon ML
• App state: Amazon ElastiCache (Redis) or Amazon DynamoDB
• KPIs: Amazon RDS or Amazon ES

Page 41: Deep Dive on Big Data

Message / Event Processing Design Pattern

Diagram: messages and events go to an Amazon SQS queue and an Amazon SQS priority queue. Amazon SQS apps running in an Auto Scaling group process both queues. The apps publish to Amazon SNS subscribers, keep app state in Amazon ElastiCache (Redis) or Amazon DynamoDB, and store KPIs in Amazon RDS or Amazon ES.
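Amazon SQS has no per-message priority, which is why the pattern uses a separate priority queue. The following minimal Python sketch shows the worker side, with in-memory deques standing in for the two queues; it is not the SQS API. It also computes a hypothetical backlog-per-worker figure of the kind an Auto Scaling policy could track.

```python
from collections import deque

priority_q = deque()  # stands in for the Amazon SQS priority queue
standard_q = deque()  # stands in for the regular Amazon SQS queue


def receive():
    """Worker polling order: check the priority queue before the standard one."""
    for q in (priority_q, standard_q):
        if q:
            return q.popleft()
    return None  # both queues empty; a real worker would long-poll and retry


def backlog_per_worker(num_workers):
    """Hypothetical scaling signal: queued messages per running worker."""
    return (len(priority_q) + len(standard_q)) / max(num_workers, 1)


standard_q.extend(["report-1", "report-2"])
priority_q.append("payment-1")
print(backlog_per_worker(2))  # 1.5

order = []
while (msg := receive()) is not None:
    order.append(msg)
print(order)  # ['payment-1', 'report-1', 'report-2']
```

In AWS, a comparable scaling signal could be derived from the CloudWatch metric ApproximateNumberOfMessagesVisible for each queue.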

Page 42: Deep Dive on Big Data

Interactive & Batch Analytics Design Pattern

Diagram: Amazon Kinesis Firehose, optionally with Amazon Kinesis Analytics, delivers files to Amazon S3. From there:
• Batch mode: Amazon EMR (Hive, Pig, Spark) and Amazon Machine Learning for batch prediction
• Interactive mode: Amazon Redshift, Amazon EMR (Presto, Spark), and Amazon Athena
• Real-time prediction: Amazon Machine Learning

Results then flow on to CONSUME.

Page 43: Deep Dive on Big Data

Demonstration

Apply what we’ve just learnt

Page 44: Deep Dive on Big Data

Real-time Analytics Design Pattern

Diagram: Apache Web Server (Availability Zone #1) → Amazon Kinesis Firehose → Amazon S3 bucket. The same data also feeds Amazon Kinesis Analytics → a second Amazon Kinesis Firehose → Amazon Elasticsearch Service → Kibana.

Page 45: Deep Dive on Big Data

Amazon Elastic Compute Cloud (Amazon EC2)

Amazon EC2 provides virtual machines (VMs), known as instances, to run your web application on the platform you choose. It lets you configure and scale your compute capacity easily to meet changing requirements and demand.

In this demo, the instance runs two pieces of software. Apache Web Server continuously generates web access log records, and the Amazon Kinesis Agent streams these records to Amazon Kinesis Firehose.
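Before Amazon Kinesis Analytics can group by `"response"`, each raw access-log line has to become a structured record. The Kinesis Agent can do this conversion itself through its LOGTOJSON processing option. The Python sketch below only illustrates the idea by parsing the Apache combined log format into JSON. The field names (`host`, `request`, `response`, and so on) are an assumption, chosen to match the column that the demo's SQL references.

```python
import json
import re

# Apache "combined" log format:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) \[(?P<datetime>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<response>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def log_to_json(line):
    """Turn one access-log line into a JSON record, or None if it does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    record = m.groupdict()
    record["response"] = int(record["response"])  # HTTP status code as a number
    return json.dumps(record)


line = ('203.0.113.7 - - [31/Oct/2017:10:00:00 +0800] "GET /index.html HTTP/1.1" '
        '200 512 "-" "Mozilla/5.0"')
print(log_to_json(line))
```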

Page 46: Deep Dive on Big Data

Amazon Kinesis Firehose

Amazon Kinesis Firehose is a fully managed service for delivering real-time streaming data to destinations such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, or Amazon Elasticsearch Service (Amazon ES).

In this step, we will create an Amazon Kinesis Firehose delivery stream to save each log entry in Amazon S3 and to provide the log data to the Amazon Kinesis Analytics application.
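Firehose does not write each record to Amazon S3 individually. It buffers incoming data and delivers it when either a buffer-size or a buffer-interval threshold is reached. The following toy Python model of that behaviour uses a hypothetical class name and deliberately tiny thresholds; real delivery streams use megabytes and seconds, configured as buffering hints. The real service checks the interval on a timer, whereas this sketch checks it only when `maybe_flush` is called.

```python
class BufferingDelivery:
    """Toy model of Firehose-style buffering: flush when size OR age threshold is hit."""

    def __init__(self, size_limit_bytes, interval_seconds):
        self.size_limit = size_limit_bytes
        self.interval = interval_seconds
        self.buffer, self.buffered_bytes, self.opened_at = [], 0, None
        self.delivered = []  # each entry stands for one object written to S3

    def put(self, record, now):
        if self.opened_at is None:
            self.opened_at = now
        self.buffer.append(record)
        self.buffered_bytes += len(record)
        self.maybe_flush(now)

    def maybe_flush(self, now):
        too_big = self.buffered_bytes >= self.size_limit
        too_old = self.opened_at is not None and now - self.opened_at >= self.interval
        if self.buffer and (too_big or too_old):
            # Newline-joined for readability; a real delivery stream does not add
            # delimiters, so producers usually append their own newline.
            self.delivered.append(b"\n".join(self.buffer))
            self.buffer, self.buffered_bytes, self.opened_at = [], 0, None


sink = BufferingDelivery(size_limit_bytes=10, interval_seconds=60)
sink.put(b"aaaa", now=0)
sink.put(b"bbbbbb", now=1)  # 10 bytes buffered: size threshold reached, flush
sink.put(b"cc", now=2)
sink.maybe_flush(now=62)    # 60 s since this buffer opened: interval reached, flush
print(sink.delivered)       # [b'aaaa\nbbbbbb', b'cc']
```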

Page 47: Deep Dive on Big Data

Example: Real-time Analytics (1)

Diagram: Apache Web Server (Availability Zone #1) → streaming data → Amazon Kinesis Firehose. Stage: COLLECT.

1. The Amazon Kinesis Agent, installed on a Linux instance, continuously sends log records to Amazon Kinesis Firehose.

Page 48: Deep Dive on Big Data

Amazon Simple Storage Service (Amazon S3)

Amazon S3 has a simple web services interface that you can use to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to highly scalable, reliable, fast, inexpensive data storage infrastructure.

Example uses: web access logs, static websites, and data lakes.

Page 49: Deep Dive on Big Data

Amazon Kinesis Analytics

Amazon Kinesis Analytics enables you to query streaming data or build entire streaming applications using SQL, so that you can gain actionable insights promptly.

It takes care of everything required to run your queries continuously and scales automatically to match the volume and throughput rate of your incoming data.

Page 50: Deep Dive on Big Data

Example: Real-time Analytics (2)

Diagram: Apache Web Server → Amazon Kinesis Firehose → Amazon S3 bucket. Stages: COLLECT, STORE.

2a. Amazon Kinesis Firehose writes each log record to Amazon S3 for durable storage.

Page 51: Deep Dive on Big Data

Example: Real-time Analytics (2)

Diagram: Apache Web Server → Amazon Kinesis Firehose → Amazon S3 bucket, with Amazon Kinesis Analytics reading the same delivery stream. Stages: COLLECT, STORE, PROCESS / ANALYZE.

2b. Amazon Kinesis Analytics runs a SQL statement against the streaming input data.

Page 52: Deep Dive on Big Data

SQL Operations Inside Kinesis Analytics

Source stream → INSERT & SELECT (pump) → destination stream, all inside Amazon Kinesis Analytics; the destination stream is delivered to Amazon Kinesis Firehose.

-- In-application output stream: one row per HTTP status per minute
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    datetime    VARCHAR(30),
    status      INTEGER,
    statusCount INTEGER);

-- Pump: continuously select from the source stream and insert into the output stream
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM
        TIMESTAMP_TO_CHAR('yyyy-MM-dd''T''HH:mm:ss.SSS', LOCALTIMESTAMP) AS datetime,
        "response" AS status,
        COUNT(*) AS statusCount
    FROM "SOURCE_SQL_STREAM_001"
    -- Tumbling one-minute window: group by status and by the minute of ROWTIME
    GROUP BY "response",
        FLOOR(("SOURCE_SQL_STREAM_001".ROWTIME - TIMESTAMP '1970-01-01 00:00:00') MINUTE / 1 TO MINUTE);
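The pump's grouping is a one-minute tumbling window. The same computation can be written in plain Python as an offline sketch over hypothetical (epoch seconds, status) pairs. The minute is derived with integer division, the way FLOOR(... MINUTE / 1 TO MINUTE) buckets ROWTIME. There is one deliberate difference: the SQL stamps each row with LOCALTIMESTAMP (the time the row is emitted), whereas this sketch labels each row with the start of its window so that the output is deterministic.

```python
from collections import Counter
from datetime import datetime, timezone


def per_minute_status_counts(events):
    """Count events per (one-minute window, HTTP status), like the pump's GROUP BY."""
    counts = Counter()
    for epoch_seconds, status in events:
        minute = int(epoch_seconds // 60)  # tumbling window index
        counts[(minute, status)] += 1
    rows = []
    for (minute, status), n in sorted(counts.items()):
        start = datetime.fromtimestamp(minute * 60, tz=timezone.utc)
        rows.append({
            # Same text shape as TIMESTAMP_TO_CHAR('yyyy-MM-dd''T''HH:mm:ss.SSS', ...)
            "datetime": start.strftime("%Y-%m-%dT%H:%M:%S.000"),
            "status": status,
            "statusCount": n,
        })
    return rows


for row in per_minute_status_counts([(0, 200), (10, 200), (30, 404), (65, 200)]):
    print(row)
```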

Page 53: Deep Dive on Big Data

Example: Real-time Analytics (3)

Diagram: Apache Web Server → Amazon Kinesis Firehose → Amazon S3 bucket, plus Amazon Kinesis Analytics → a second Amazon Kinesis Firehose. Stages: COLLECT, STORE, PROCESS / ANALYZE, STORE.

3. Amazon Kinesis Analytics creates an aggregated data set every minute and outputs that data to a second Firehose delivery stream.

Page 54: Deep Dive on Big Data

Amazon Elasticsearch Service (Amazon ES)

Amazon Elasticsearch Service makes it easy to deploy, secure, operate, and scale Elasticsearch for log analytics, full text search, application monitoring, and more. Amazon Elasticsearch Service is a fully managed service that delivers real-time analytics capabilities alongside the availability, scalability, and security that production workloads require.

The service offers built-in integrations with Kibana, Logstash, and AWS services. It enables you to go from raw data to actionable insights quickly and securely.

Page 55: Deep Dive on Big Data

Example: Real-time Analytics (4)

Diagram: Apache Web Server → Amazon Kinesis Firehose → Amazon S3 bucket, plus Amazon Kinesis Analytics → a second Amazon Kinesis Firehose → Amazon Elasticsearch Service. Stages: COLLECT, STORE, PROCESS / ANALYZE, STORE.

4. The second Firehose delivery stream writes the aggregated data to an Amazon ES domain.

Page 56: Deep Dive on Big Data

Kibana

Kibana lets you visualize your Elasticsearch data. It provides interactive visualizations of various types, including histograms, line graphs, and pie charts, and it leverages the full aggregation capabilities of Elasticsearch.
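As an example of the aggregations Kibana relies on, the Python sketch below builds an Elasticsearch search body. It buckets the aggregated documents by minute and by HTTP status, and it sums `statusCount`, because each document is already a per-minute count from Kinesis Analytics. The sketch assumes that `datetime` is mapped as a date and `status` as a number. The `interval` parameter matches the Elasticsearch versions of that time; newer versions use `fixed_interval` instead.

```python
import json


def status_per_minute_body(interval="1m"):
    """Search body: per-minute buckets, split by status, summing statusCount."""
    return {
        "size": 0,  # return only aggregation results, no individual documents
        "aggs": {
            "per_minute": {
                "date_histogram": {"field": "datetime", "interval": interval},
                "aggs": {
                    "by_status": {
                        "terms": {"field": "status"},
                        "aggs": {"requests": {"sum": {"field": "statusCount"}}},
                    }
                },
            }
        },
    }


print(json.dumps(status_per_minute_body(), indent=2))
```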

Page 57: Deep Dive on Big Data

Example: Real-time Analytics (5)

Diagram: the complete pipeline. Apache Web Server → Amazon Kinesis Firehose → Amazon S3 bucket, plus Amazon Kinesis Analytics → a second Amazon Kinesis Firehose → Amazon Elasticsearch Service → Kibana. Stages: COLLECT, STORE, PROCESS / ANALYZE, STORE, CONSUME.

5. Finally, use Kibana to visualize the results.

Page 58: Deep Dive on Big Data

Implementation Steps

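Diagram: the complete pipeline across the COLLECT, STORE, PROCESS / ANALYZE, STORE, and CONSUME stages, annotated with implementation steps 1, 2a, 2b, 3, 4, 5, and 6. The pipeline runs Apache Web Server → Amazon Kinesis Firehose → Amazon S3 bucket, plus Amazon Kinesis Analytics → a second Amazon Kinesis Firehose → Amazon Elasticsearch Service → Kibana.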

Page 59: Deep Dive on Big Data

Let’s build your own in 60 minutes!

https://aws.amazon.com/getting-started/projects/build-log-analytics-solution/

Page 60: Deep Dive on Big Data

Thank you!

John Yeung | [email protected]