Deep Dive on Big Data

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

John Yeung, Solutions Architect

31 October 2017

Deep Dive on AWS with Demo

AWS Big Data and Machine Learning Day | Hong Kong

What to expect from the session

• Big Data Challenges
• Architectural Principles
• Design Patterns
• Demo (around 15 mins)

Ever-Increasing Big Data

Volume

Velocity

Variety

Big Data Evolution

Batch Processing

Stream Processing

Machine Learning

Plenty of Tools

• Amazon Glacier
• S3
• DynamoDB
• RDS
• EMR
• Amazon Redshift
• Data Pipeline
• Amazon Kinesis
• Amazon Kinesis Streams app
• Lambda
• Amazon ML
• SQS
• ElastiCache
• DynamoDB Streams
• Amazon Elasticsearch Service
• Amazon Kinesis Analytics

Big Data Challenges

Why?

How?

What tools should I use?

Is there a reference architecture?

Architectural Principles

#1: Build Decoupled Systems
• Data → Store → Process → Store → Analyze → Answers

#2: Use Right Tool for the Job
• Data structure, latency, throughput, access patterns

#3: Leverage AWS Managed Services
• Scalable/elastic, available, reliable, secure, no/low admin

#4: Use Lambda Architecture Ideas
• Immutable (append-only) log, batch/speed/serving layer

#5: Be Cost-conscious
• Big data ≠ big cost

Simplify Big Data Processing

COLLECT → STORE → PROCESS/ANALYZE → CONSUME

1. Time to answer (latency)
2. Throughput
3. Cost

COLLECT

Types of Data

• Applications (mobile apps, web apps, data centers via AWS Direct Connect) → RECORDS: in-memory data, database records (transaction-based)
• Logging (Amazon CloudWatch, AWS CloudTrail) and Transport (AWS Import/Export Snowball) → DOCUMENTS and FILES: search documents, log files (file-based)
• Messaging → MESSAGES: messages (event-based)
• IoT (devices, sensors & IoT platforms, AWS IoT) → STREAMS: data streams (event-based)

STORE

Types of Data Stores

• Database: SQL & NoSQL databases
• Search: search engines
• File store: file systems
• Queue: message queues
• Stream storage: pub/sub message queues
• In-memory: caches

In-memory, Database, Search

Amazon ElastiCache
• Managed Memcached or Redis service

Amazon DynamoDB
• Managed NoSQL database service

Amazon RDS
• Managed relational database service

Amazon Elasticsearch Service
• Managed Elasticsearch service

[Diagram: the STORE stage groups Apache Kafka, Amazon Kinesis Streams, Amazon Kinesis Firehose, and Amazon DynamoDB Streams as hot stream storage; Amazon SQS as message storage; Amazon S3 as file storage; and Amazon Elasticsearch Service, Amazon RDS, Amazon DynamoDB, and Amazon ElastiCache as search, SQL, NoSQL, and cache stores.]

Use the Right Tool for the Job

Data Tier

• Search: Amazon Elasticsearch Service
• In-memory: Amazon ElastiCache (Redis, Memcached)
• SQL: Amazon Aurora, Amazon RDS (MySQL, PostgreSQL, Oracle, SQL Server)
• NoSQL: Amazon DynamoDB, Cassandra, HBase, MongoDB

File Storage

[Diagram: the COLLECT → STORE pipeline with Amazon S3 highlighted as the file store.]

Why Is Amazon S3 Good for Big Data?

• Natively supported by big data frameworks (Spark, Hive, Presto, etc.)
• Multiple, heterogeneous analysis clusters can use the same data
• Unlimited number of objects and volume of data
• Very high bandwidth, with no aggregate throughput limit
• Designed for 99.99% availability; can tolerate zone failure
• Designed for 99.999999999% durability
• No need to pay for data replication
• Native support for versioning
• Tiered storage (Standard, IA, Amazon Glacier) via lifecycle policies
• Secure: SSL, client/server-side encryption at rest
• Low cost
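To put the two S3 design targets in perspective, the short calculation below converts them into more tangible numbers. It is a back-of-the-envelope sketch, not an AWS formula: it assumes a 365-day year, reads 99.99% availability as a yearly downtime budget, and reads 99.999999999% durability as an average annual loss rate per object. The object count of 10,000,000 is an arbitrary illustration.

```python
# Back-of-the-envelope arithmetic for the S3 design targets quoted above.
# Assumptions (not from the slides): a 365-day year, and durability read as
# an average annual loss probability per object.
from fractions import Fraction

MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_percent: str) -> float:
    """Yearly downtime budget implied by an availability percentage."""
    unavailable = 1 - Fraction(availability_percent) / 100
    return float(unavailable * MINUTES_PER_YEAR)


def expected_objects_lost_per_year(durability_percent: str, objects: int) -> float:
    """Average number of objects lost per year at the given durability."""
    loss_rate = 1 - Fraction(durability_percent) / 100
    return float(loss_rate * objects)


if __name__ == "__main__":
    print(downtime_minutes_per_year("99.99"))
    print(expected_objects_lost_per_year("99.999999999", 10_000_000))
```

Under these assumptions the availability target allows roughly 52.56 minutes of downtime per year, and 10,000,000 stored objects would lose on average 0.0001 objects per year, that is, about one object every 10,000 years.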

Message & Stream Storage

Amazon SQS
• Managed message queue service

Apache Kafka
• High-throughput distributed streaming platform

Amazon Kinesis Streams
• Managed stream storage + processing

Amazon Kinesis Firehose
• Managed data delivery

Amazon DynamoDB
• Managed NoSQL database
• Tables can be stream-enabled

Why Stream Storage

Decouple producers & consumers

Persistent buffer

Collect multiple streams

Preserve client ordering

Parallel consumption

[Figure: records numbered 1 to 4 from several producers are interleaved into a stream and spread across shard 1 / partition 1 and shard 2 / partition 2, keeping each producer's order. Consumer 1 reports count of red = 4 and count of violet = 4; Consumer 2 reports count of blue = 4 and count of green = 4. The same model applies to a DynamoDB stream, an Amazon Kinesis stream, and a Kafka topic.]
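The properties listed for stream storage (client ordering preserved, parallel consumption) can be illustrated with a toy model. The sketch below is only an approximation under stated assumptions: Python lists stand in for shards, CRC32 stands in for whatever hash a real service applies to the partition key, and the function names are invented for this example. The colors and counts mirror the figure.

```python
# Toy model of stream storage: records are routed to shards by partition key,
# each shard keeps arrival order, and shards can be consumed in parallel.
# Assumption: CRC32 is a stand-in hash; real services use their own scheme.
import zlib
from collections import Counter, defaultdict


def shard_for(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index deterministically."""
    return zlib.crc32(partition_key.encode("utf-8")) % shard_count


def write_stream(records, shard_count):
    """Append (key, sequence) records to per-shard ordered logs."""
    shards = defaultdict(list)
    for key, seq in records:
        shards[shard_for(key, shard_count)].append((key, seq))
    return shards


def consume(shard_log):
    """A consumer owns one shard and counts the records it sees per key."""
    return Counter(key for key, _ in shard_log)


if __name__ == "__main__":
    colors = ["red", "violet", "blue", "green"]
    # Four producers interleave their records 1..4, as in the figure.
    records = [(color, seq) for seq in range(1, 5) for color in colors]
    shards = write_stream(records, shard_count=2)
    for shard_id, log in sorted(shards.items()):
        print(shard_id, dict(consume(log)))
```

Because every record with the same key lands in the same shard, each color's records stay in order 1 to 4 and are counted by exactly one consumer, whichever shard the hash happens to pick.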

What Stream Storage Should I Use?

| | Amazon DynamoDB Streams | Amazon Kinesis Streams | Amazon Kinesis Firehose | Apache Kafka | Amazon SQS |
|---|---|---|---|---|---|
| AWS managed service | Yes | Yes | Yes | No | Yes |
| Guaranteed ordering | Yes | Yes | Yes | Yes | No |
| Delivery | exactly-once | at-least-once | exactly-once | at-least-once | at-least-once |
| Data retention period | 24 hours | 7 days | N/A | Configurable | 14 days |
| Availability | 3 AZ | 3 AZ | 3 AZ | Configurable | 3 AZ |
| Scale / throughput | No limit / ~ table IOPS | No limit / ~ shards | No limit / automatic | No limit / ~ nodes | No limit / automatic |
| Parallel clients | Yes | Yes | No | Yes | No |
| Stream MapReduce | Yes | Yes | N/A | Yes | N/A |
| Record/object size | 400 KB | 1 MB | Redshift row size | Configurable | 256 KB |
| Cost | Higher (table cost) | Low | Low | Low (+admin) | Low-medium |

Temperature scale beneath the table, left to right: Hot → Warm

Which Data Store Should I Use

Data Structure → Fixed schema, JSON, key-value

Access Patterns → Store data in the format you will access it

Data Characteristics → Hot, Warm, Cold

Cost → Right cost

Data Structure and Access Patterns

| Access patterns | What to use? |
|---|---|
| Put/Get (key, value) | In-memory, NoSQL |
| Simple relationships → 1:N, M:N | NoSQL |
| Multi-table joins, transaction, SQL | SQL |
| Faceting, search | Search |

| Data structure | What to use? |
|---|---|
| Fixed schema | SQL, NoSQL |
| Schema-free (JSON) | NoSQL, Search |
| (Key, value) | In-memory, NoSQL |
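One way to read the two tables together is to intersect their recommendations. The sketch below transcribes the tables into dictionaries and returns the store types that both agree on. The dictionary keys, the function name, and the idea of intersecting the two columns are assumptions made for illustration; they are not part of the talk or of any AWS SDK.

```python
# Lookup tables transcribed from the two tables above. The names used here
# are illustrative only and do not come from any AWS SDK.
ACCESS_PATTERN_TO_STORE = {
    "put/get (key, value)": ["In-memory", "NoSQL"],
    "simple relationships (1:N, M:N)": ["NoSQL"],
    "multi-table joins, transactions, SQL": ["SQL"],
    "faceting, search": ["Search"],
}

DATA_STRUCTURE_TO_STORE = {
    "fixed schema": ["SQL", "NoSQL"],
    "schema-free (JSON)": ["NoSQL", "Search"],
    "(key, value)": ["In-memory", "NoSQL"],
}


def candidate_stores(data_structure: str, access_pattern: str) -> list:
    """Return store types recommended by both tables, in table order."""
    by_structure = DATA_STRUCTURE_TO_STORE[data_structure]
    by_access = ACCESS_PATTERN_TO_STORE[access_pattern]
    return [store for store in by_structure if store in by_access]


if __name__ == "__main__":
    print(candidate_stores("fixed schema", "multi-table joins, transactions, SQL"))
    print(candidate_stores("schema-free (JSON)", "faceting, search"))
```

An empty result (for example, fixed schema combined with faceted search) suggests that no single store type fits, which is where using multiple data stores becomes relevant.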

What is the temperature of your data

Data characteristics: Hot, Warm or Cold

| | Hot | Warm | Cold |
|---|---|---|---|
| Volume | MB–GB | GB–TB | PB–EB |
| Item size | B–KB | KB–MB | KB–TB |
| Latency | ms | ms, sec | min, hrs |
| Durability | Low–high | High | Very high |
| Request rate | Very high | High | Low |
| Cost/GB | $$-$ | $-¢¢ | ¢ |

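[Figure: data stores placed along a hot data → warm data → cold data axis, with in-memory, NoSQL, and SQL stores toward the hot end and Amazon Glacier at the cold end. From hot to cold, request rate goes from high to low, cost/GB from high to low, latency from low to high, and data volume from low to high. The vertical axis is structure, from low to high.]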

Which Data Store Should I Use?

| | Amazon ElastiCache | Amazon DynamoDB | Amazon RDS/Aurora | Amazon ES | Amazon S3 | Amazon Glacier |
|---|---|---|---|---|---|---|
| Average latency | ms | ms | ms, sec | ms, sec | ms, sec, min (~ size) | hrs |
| Typical data stored | GB | GB–TBs (no limit) | GB–TB (64 TB max) | GB–TB | MB–PB (no limit) | GB–PB (no limit) |
| Typical item size | B–KB | KB (400 KB max) | KB (64 KB max) | B–KB (2 GB max) | KB–TB (5 TB max) | GB (40 TB max) |
| Request rate | High – very high | Very high (no limit) | High | High | Low – high (no limit) | Very low |
| Storage cost GB/month | $$ | ¢¢ | ¢¢ | ¢¢ | ¢ | ¢4/10 |
| Durability | Low – moderate | Very high | Very high | High | Very high | Very high |
| Availability | High, 2 AZ | Very high, 3 AZ | Very high, 3 AZ | High, 2 AZ | Very high, 3 AZ | Very high, 3 AZ |

Columns run from hot data (left) through warm data to cold data (right).

PROCESS / ANALYZE

Analytics & Frameworks

Interactive
• Takes seconds
• Example: Self-service dashboards
• Amazon Redshift, Amazon Athena, Amazon EMR (Presto, Spark)

Batch
• Takes minutes to hours
• Example: Daily/weekly/monthly reports
• Amazon EMR (MapReduce, Hive, Pig, Spark)

Message
• Takes milliseconds to seconds
• Example: Message processing
• Amazon SQS applications on Amazon EC2

Stream
• Takes milliseconds to seconds
• Example: Fraud alerts, 1-minute metrics
• Amazon EMR (Spark Streaming), Amazon Kinesis Analytics, KCL, Storm, AWS Lambda

[Diagram: the PROCESS / ANALYZE stage with fast-to-slow indicators, grouping stream processing (Amazon Kinesis Analytics, KCL apps, AWS Lambda, Amazon EC2, Amazon EMR), message processing (Amazon SQS apps on Amazon EC2), interactive (Amazon Redshift, Amazon Athena, Presto on Amazon EMR), batch (Amazon EMR), and ML (Amazon Machine Learning).]

What about ETL?

• Data integration partners: reduce the effort to move, cleanse, synchronize, manage, and automate data-related processes. See https://aws.amazon.com/big-data/partner-solutions/
• AWS Glue (new): AWS Glue is a fully managed ETL service that makes it easy to understand your data sources, prepare the data, and move it reliably between data stores.

[Diagram: ETL shown between the STORE and PROCESS / ANALYZE stages.]

CONSUME

[Diagram: the COLLECT → STORE → ETL → PROCESS / ANALYZE pipeline from the previous sections, extended with a CONSUME stage.]

• Analysis and visualization: Amazon QuickSight, for business users
• Notebooks and IDE: for data scientists and developers
• Applications & API: apps and services

Put them together

• COLLECT: applications (mobile apps, web apps, data centers via AWS Direct Connect), logging (Amazon CloudWatch, AWS CloudTrail), transport (AWS Import/Export Snowball), messaging, and IoT (devices, sensors & IoT platforms, AWS IoT), producing records, documents, files, messages, and streams
• STORE: stream (Apache Kafka, Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon DynamoDB Streams), message (Amazon SQS), file (Amazon S3), search (Amazon Elasticsearch Service), SQL (Amazon RDS), NoSQL (Amazon DynamoDB), cache (Amazon ElastiCache)
• ETL
• PROCESS / ANALYZE: stream (Amazon Kinesis Analytics, KCL apps, AWS Lambda, Amazon EC2, Amazon EMR), message (Amazon SQS apps on Amazon EC2), interactive (Amazon Redshift, Amazon Athena, Presto on Amazon EMR), batch (Amazon EMR), ML (Amazon Machine Learning)
• CONSUME: analysis and visualization (Amazon QuickSight), notebooks, IDE, apps and services via API

Design Patterns

Concept #1: Decoupled Data Bus

• Storage decoupled from processing
• Multiple stages

Store → Process → Store → Process
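The decoupled data bus idea can be sketched in a few lines: each processing stage only reads from one store and writes to the next, so stages never call each other directly. In the sketch below, in-memory deques stand in for durable stores such as a stream or an object store, and the stage functions and sample log lines are invented for illustration.

```python
# Minimal sketch of "Store → Process → Store → Process": every stage reads
# from one store and writes to the next, so stages never call each other.
# Assumption: in-memory deques stand in for durable stores.
from collections import deque


def run_stage(source: deque, process, sink: deque) -> None:
    """Drain `source`, apply `process` to each item, append results to `sink`."""
    while source:
        sink.append(process(source.popleft()))


def parse(line: str) -> dict:
    """First processing stage: turn a raw 'path status' line into a record."""
    path, status = line.split()
    return {"path": path, "status": int(status)}


def flag_errors(record: dict) -> dict:
    """Second processing stage: enrich the record with an error flag."""
    return {**record, "is_error": record["status"] >= 500}


if __name__ == "__main__":
    raw_store = deque(["/index.html 200", "/cart 503"])
    parsed_store, enriched_store = deque(), deque()
    run_stage(raw_store, parse, parsed_store)             # Store → Process → Store
    run_stage(parsed_store, flag_errors, enriched_store)  # → Process → Store
    print(list(enriched_store))
```

Because the stages share only the stores between them, either stage can be replaced, scaled, or rerun without changing the other.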

Concept #2: Multiple Stream Processing

• Parallel processing

[Diagram: an Amazon Kinesis stream is processed in parallel by AWS Lambda, the Amazon Kinesis Connector Library, and KCL; results land in Amazon DynamoDB and Amazon S3.]
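Parallel processing of one stream works because reading does not remove records: each consumer tracks its own position. The sketch below models that with a plain Python list as the append-only stream and a per-consumer checkpoint. The `Consumer` class and its handlers are hypothetical and only loosely resemble how real stream consumers track progress.

```python
# Sketch of several independent consumers reading the same stream.
# Assumption: a list models the append-only stream, and each consumer keeps
# its own checkpoint (index of the next record to read).
class Consumer:
    def __init__(self, name, handler):
        self.name = name
        self.handler = handler
        self.checkpoint = 0
        self.results = []

    def poll(self, stream, max_records=10):
        """Read up to max_records new records without removing them."""
        batch = stream[self.checkpoint:self.checkpoint + max_records]
        self.results.extend(self.handler(record) for record in batch)
        self.checkpoint += len(batch)
        return len(batch)


if __name__ == "__main__":
    stream = [1, 2, 3, 4, 5]
    archiver = Consumer("archiver", lambda record: record)
    doubler = Consumer("doubler", lambda record: record * 2)
    archiver.poll(stream)                # reads everything available
    doubler.poll(stream, max_records=2)  # a slower consumer falls behind
    print(archiver.results, doubler.results, doubler.checkpoint)
```

The slower consumer does not hold back the faster one, and it can catch up later by polling again from its own checkpoint.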

Concept #3: Multiple Data Stores

• Analysis framework reads from or writes to multiple data stores

[Diagram: components shown are Amazon Kinesis, AWS Lambda, the Amazon Kinesis Connector Library, KCL, Amazon EMR (Spark Streaming, Spark SQL), Amazon S3, and Amazon DynamoDB.]

Real-time Analytics Design Pattern

[Diagram: streams from Apache Kafka, Amazon Kinesis, and DynamoDB Streams (store) are processed by KCL, AWS Lambda, Spark Streaming on Amazon EMR, and Apache Storm (process). Outputs shown: alerts as notifications through Amazon SNS, real-time prediction through Amazon ML, and app state and KPIs in Amazon ElastiCache (Redis), Amazon DynamoDB, Amazon RDS, and Amazon ES.]

Message / Event Processing Design Pattern

[Diagram: messages and events enter Amazon SQS, including an Amazon SQS priority queue (store), and are processed by Amazon SQS apps running in an Auto Scaling group (process). The apps publish to Amazon SNS subscribers and write app state and KPIs to Amazon ElastiCache (Redis), Amazon DynamoDB, Amazon RDS, and Amazon ES.]

Interactive & Batch Analytics Design Pattern

[Diagram: files are stored in Amazon S3 (store). Batch mode: Amazon EMR (Hive, Pig, Spark). Interactive mode: Amazon Redshift, Amazon EMR (Presto, Spark), and Amazon Athena. Also shown: Amazon Kinesis Firehose, Amazon Kinesis Analytics, and Amazon Machine Learning (batch prediction, real-time prediction), with results flowing to Consume.]

Demonstration

Apply what we’ve just learnt

Real-time Analytics Design Pattern

[Diagram: Apache Web Server (Availability Zone #1) → Amazon Kinesis Firehose → Amazon S3 bucket and Amazon Kinesis Analytics → Amazon Kinesis Firehose → Amazon Elasticsearch Service → Kibana.]

Amazon Elastic Compute Cloud (Amazon EC2)

Amazon EC2 provides virtual machines (VMs), known as instances, to run your web application on the platform you choose. It allows you to configure and scale your compute capacity easily to meet changing requirements and demand.

In this demo, the instance runs Apache Web Server, which continuously generates web access log records, and Amazon Kinesis Agent, which streams these records to Amazon Kinesis Firehose.
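To make the web access log records concrete, the sketch below parses one Apache access log line into a JSON record. It assumes Common Log Format input and assumes the agent is configured to convert log lines to JSON before streaming; the output field names are chosen so that the status code ends up in a `response` field, matching the column used in the demo SQL. The sample line is invented.

```python
# Sketch of converting one Apache access log line into a JSON record.
# Assumptions: Common Log Format input; field names picked to match the
# "response" column used in the demo SQL; the regex is illustrative.
import json
import re

COMMON_LOG = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<datetime>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-)'
)


def log_line_to_json(line: str) -> str:
    """Parse one log line; raise ValueError if it is not Common Log Format."""
    match = COMMON_LOG.match(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    record = match.groupdict()
    record["response"] = int(record["response"])
    # Apache writes "-" when no bytes were sent.
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    return json.dumps(record)


if __name__ == "__main__":
    sample = '192.0.2.10 - - [31/Oct/2017:10:15:32 +0800] "GET /index.html HTTP/1.1" 200 5120'
    print(log_line_to_json(sample))
```

Each JSON record produced this way is what the downstream delivery stream and analytics application would receive as one row.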

Amazon Kinesis Firehose

Amazon Kinesis Firehose is a fully managed service for delivering real-time streaming data to destinations such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, or Amazon Elasticsearch Service (Amazon ES).

In this step, we will create an Amazon Kinesis Firehose delivery stream to save each log entry in Amazon S3 and to provide the log data to the Amazon Kinesis Analytics application.

Example: Real-time Analytics (1)

[Diagram: COLLECT. Apache Web Server (Availability Zone #1) sends streaming data to Amazon Kinesis Firehose.]

1. A Linux instance is installed with Amazon Kinesis Agent, which continuously sends log records to Amazon Kinesis Firehose.

Amazon Simple Storage Service (Amazon S3)

Amazon S3 has a simple web services interface that you can use to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to highly scalable, reliable, fast, inexpensive data storage infrastructure.

Examples: web access logs, static websites, and data lakes.

Amazon Kinesis Analytics

Amazon Kinesis Analytics enables you to query streaming data or build entire streaming applications using SQL, so that you can gain actionable insights promptly.

It takes care of everything required to run your queries continuously and scales automatically to match the volume and throughput rate of your incoming data.

Example: Real-time Analytics (2)

[Diagram: COLLECT → STORE. Apache Web Server (Availability Zone #1) → Amazon Kinesis Firehose → Amazon S3 bucket.]

2a. Amazon Kinesis Firehose writes each log record to Amazon Simple Storage Service (Amazon S3) for durable storage.

Example: Real-time Analytics (2)

[Diagram: COLLECT → STORE → PROCESS / ANALYZE. Amazon Kinesis Firehose also feeds Amazon Kinesis Analytics.]

2b. Amazon Kinesis Analytics runs a SQL statement against the streaming input data.

SQL Operations Inside Kinesis Analytics

[Diagram: inside Amazon Kinesis Analytics, data from Amazon Kinesis Firehose flows through Source Stream → Insert & Select (Pump) → Destination Stream.]

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    datetime VARCHAR(30),
    status INTEGER,
    statusCount INTEGER);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM
        TIMESTAMP_TO_CHAR('yyyy-MM-dd''T''HH:mm:ss.SSS', LOCALTIMESTAMP) AS datetime,
        "response" AS status,
        COUNT(*) AS statusCount
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY
        "response",
        FLOOR(("SOURCE_SQL_STREAM_001".ROWTIME - TIMESTAMP '1970-01-01 00:00:00') minute / 1 TO MINUTE);
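The grouping in the pump is a one-minute tumbling window: each record is assigned to the minute its row time falls in, and records are counted per response code within that minute. The pure-Python imitation below shows the same bucketing on a finite list. It assumes each record carries an epoch timestamp in seconds (standing in for ROWTIME) and a `response` field; the function name and sample records are invented, and unlike the streaming SQL it computes all windows at once rather than emitting each as it closes.

```python
# Pure-Python imitation of the pump above: count records per response code
# in one-minute tumbling windows. Assumption: "rowtime" is an epoch
# timestamp in seconds standing in for ROWTIME.
from collections import Counter


def tumbling_counts(records, window_seconds=60):
    """Return {(window_start, response): count}, like GROUP BY response, minute."""
    counts = Counter()
    for rec in records:
        # Floor the timestamp to the start of its window.
        window_start = rec["rowtime"] - rec["rowtime"] % window_seconds
        counts[(window_start, rec["response"])] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [
        {"rowtime": 0, "response": 200},
        {"rowtime": 59, "response": 200},
        {"rowtime": 60, "response": 200},
        {"rowtime": 61, "response": 404},
    ]
    for key, count in sorted(tumbling_counts(sample).items()):
        print(key, count)
```

Records at seconds 0 and 59 share the first window, while the record at second 60 starts a new one, which is the behavior the FLOOR expression in the GROUP BY clause produces.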

Example: Real-time Analytics (3)

[Diagram: COLLECT → STORE → PROCESS / ANALYZE → STORE. Amazon Kinesis Analytics outputs to a second Amazon Kinesis Firehose delivery stream.]

3. Amazon Kinesis Analytics creates an aggregated data set every minute and outputs that data to a second Firehose delivery stream.

Amazon Elasticsearch Service (Amazon ES)

Amazon Elasticsearch Service makes it easy to deploy, secure, operate, and scale Elasticsearch for log analytics, full text search, application monitoring, and more. Amazon Elasticsearch Service is a fully managed service that delivers real-time analytics capabilities alongside the availability, scalability, and security that production workloads require.

The service offers built-in integrations with Kibana, Logstash, and other AWS services. It enables you to go from raw data to actionable insights quickly and securely.

Example: Real-time Analytics (4)

[Diagram: COLLECT → STORE → PROCESS / ANALYZE → STORE. The second Amazon Kinesis Firehose delivery stream feeds Amazon Elasticsearch Service.]

4. This Firehose delivery stream will write the aggregated data to an Amazon ES domain.

Kibana

Kibana lets you visualize your Elasticsearch data. It provides interactive visualizations of various types, including histograms, line graphs, pie charts, and more. It leverages the full aggregation capabilities of Elasticsearch.

Example: Real-time Analytics (5)

[Diagram: COLLECT → STORE → PROCESS / ANALYZE → STORE → CONSUME. Kibana reads from Amazon Elasticsearch Service.]

5. Finally, use Kibana to visualize the results.

Implementation Steps

[Diagram: the complete demo pipeline annotated with implementation steps 1, 2a, 2b, 3, 4, 5, and 6: Apache Web Server (Availability Zone #1) → Amazon Kinesis Firehose → Amazon S3 bucket and Amazon Kinesis Analytics → Amazon Kinesis Firehose → Amazon Elasticsearch Service → Kibana, spanning COLLECT → STORE → PROCESS / ANALYZE → STORE → CONSUME.]

Let’s build your own in 60 minutes!

https://aws.amazon.com/getting-started/projects/build-log-analytics-solution/

Thank you!

John Yeung | jyeung@amazon.com