
Modern data architectures for real time analytics and engagement


Modern Data Architectures for Real-Time Analytics & Engagement

Russell Nash, APAC Solutions Architect, Amazon Web Services

Modern Data Architecture

Scalable, flexible, manageable, cost-effective

[Figure: Modern Data Architecture. Data flows from sources through ingest into a speed (real-time) layer and a scale (batch) layer, then through serving to data analysts, data scientists, business users, engagement platforms, and automation/events.]

[Figure: Modern Data Architecture with the real-time pipeline highlighted. Sources such as machines, devices, mobile, and clickstream are ingested through Amazon Kinesis into the speed (real-time) layer.]

Kinesis Family

Amazon Kinesis Streams
Amazon Kinesis Firehose
Amazon Kinesis Analytics

[Figure: An Amazon Kinesis stream spanning three Availability Zones, consumed by AWS Lambda, a KCL application, and Amazon EMR (streaming) to produce logs, alerts, analysis, dashboards, and predictions.]

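Amazon Kinesis Stream

Capacity scales linearly with the number of shards:
1 shard: writes up to 1,000 records/s or 1 MB/s; reads up to 5 transactions/s or 2 MB/s
2 shards: writes up to 2,000 records/s or 2 MB/s; reads up to 10 transactions/s or 4 MB/s
3 shards: writes up to 3,000 records/s or 3 MB/s; reads up to 15 transactions/s or 6 MB/s

Retention: 24 hours to 7 days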
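A rough sizing calculation follows directly from the per-shard limits above. The Python sketch below (the function name `shards_needed` is hypothetical) takes the largest of the shard counts required by record rate, write bandwidth, and read bandwidth. It assumes that every consuming application reads the whole stream and shares each shard's 2 MB/s read limit, treats 1 MB as 1,024 KB, and ignores the 5 reads per second limit. Treat it as a starting point, not a capacity plan.

```python
import math

# Per-shard limits from the slide above (1 MB taken as 1,024 KB).
WRITE_RECORDS_PER_SEC = 1000
WRITE_MB_PER_SEC = 1.0
READ_MB_PER_SEC = 2.0


def shards_needed(records_per_sec: float, avg_record_kb: float, consumers: int = 1) -> int:
    """Estimate a shard count for a write rate and a number of consuming applications."""
    write_mb_per_sec = records_per_sec * avg_record_kb / 1024
    by_record_rate = math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC)
    by_write_bandwidth = math.ceil(write_mb_per_sec / WRITE_MB_PER_SEC)
    # Each consumer reads the full stream, so read bandwidth grows with consumers.
    by_read_bandwidth = math.ceil(write_mb_per_sec * consumers / READ_MB_PER_SEC)
    return max(1, by_record_rate, by_write_bandwidth, by_read_bandwidth)


print(shards_needed(2500, 0.5))            # 3 (record rate dominates)
print(shards_needed(500, 4))               # 2 (write bandwidth dominates)
print(shards_needed(500, 4, consumers=3))  # 3 (read bandwidth dominates)
```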


Creating a Kinesis Stream

Amazon Kinesis Stream

[Figure: Event producers send records to the Kinesis endpoint and specify a partition key with each record; the partition key determines which of the stream's shards receives the record.]
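Kinesis maps a partition key to a shard by taking the MD5 hash of the key as a 128-bit integer and finding the shard whose hash key range contains that value. The sketch below reproduces the lookup locally. It assumes the hash key space is divided evenly across shards (resharding can change the ranges), and the helper names are illustrative, not part of any AWS SDK.

```python
import hashlib

HASH_KEY_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def even_hash_ranges(shard_count: int) -> list[tuple[int, int]]:
    """Split the hash key space into contiguous, nearly equal, inclusive ranges."""
    step = HASH_KEY_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_KEY_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for_partition_key(partition_key: str, ranges: list[tuple[int, int]]) -> int:
    """Return the index of the shard whose range contains MD5(partition_key)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return index
    raise ValueError("hash key not covered by any shard")


ranges = even_hash_ranges(3)
for key in ["user-17", "user-42", "sensor-9"]:
    print(key, "-> shard", shard_for_partition_key(key, ranges))
```

Because the mapping is deterministic, every record with the same partition key lands on the same shard, which is what preserves per-key ordering; a key with few distinct values can, by the same logic, concentrate traffic on one "hot" shard.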

Kinesis Producer Library

• Writes to one or more Amazon Kinesis streams
• Retry mechanism
• Uses PutRecords to write multiple records per request
• Aggregates small user records into larger Kinesis records
• Integrates with the Amazon KCL to de-aggregate records
• Submits Amazon CloudWatch metrics
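Aggregation packs many small user records into one Kinesis record so that each write carries more payload, and the consumer has to unpack them again. The sketch below shows the idea with a simple length-prefixed framing. It is not the KPL's actual record format (which is what the KCL knows how to de-aggregate), and the size cap is an illustrative parameter.

```python
import struct

# Illustrative cap only; real limits come from Kinesis and the KPL configuration.
MAX_AGGREGATE_BYTES = 50 * 1024


def aggregate(user_records: list[bytes], max_bytes: int = MAX_AGGREGATE_BYTES) -> list[bytes]:
    """Pack user records into as few blobs as possible, each record prefixed by a 4-byte length."""
    blobs, current, current_size = [], [], 0
    for record in user_records:
        framed = struct.pack(">I", len(record)) + record
        # Start a new blob if this record would push the current one over the cap.
        # A single oversized record still gets a blob of its own.
        if current and current_size + len(framed) > max_bytes:
            blobs.append(b"".join(current))
            current, current_size = [], 0
        current.append(framed)
        current_size += len(framed)
    if current:
        blobs.append(b"".join(current))
    return blobs


def deaggregate(blob: bytes) -> list[bytes]:
    """Reverse of aggregate(): split one blob back into the original user records."""
    records, offset = [], 0
    while offset < len(blob):
        (length,) = struct.unpack_from(">I", blob, offset)
        offset += 4
        records.append(blob[offset:offset + length])
        offset += length
    return records


events = [b"click:home", b"click:cart", b"view:item-7"]
blobs = aggregate(events)
print(len(blobs), "Kinesis record(s) for", len(events), "user records")
assert [r for b in blobs for r in deaggregate(b)] == events
```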

Kinesis Agent

• Monitors files and sends new data records to your delivery stream
• Handles file rotation, checkpointing, and retry upon failures
• Delivers all data in a reliable, timely, and simple manner
• Emits Amazon CloudWatch metrics
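At its core the agent tails files: it remembers how far into each file it has read, sends only complete new lines, and starts over when a file is rotated. The Python sketch below illustrates that checkpointing idea. The function name, the in-memory checkpoint, and the "the file got smaller, so it was rotated" heuristic are simplifying assumptions; the real Kinesis Agent is configured rather than programmed.

```python
import os


def read_new_records(path: str, checkpoint: int) -> tuple[list[str], int]:
    """Return complete lines written after `checkpoint` (a byte offset) and the new offset."""
    size = os.path.getsize(path)
    if size < checkpoint:
        checkpoint = 0  # file shrank: assume it was rotated or truncated
    with open(path, "rb") as f:
        f.seek(checkpoint)
        data = f.read()
    last_newline = data.rfind(b"\n")
    if last_newline == -1:
        return [], checkpoint  # no complete line yet; keep the partial one for later
    complete = data[: last_newline + 1]
    lines = [line.decode("utf-8") for line in complete.splitlines()]
    return lines, checkpoint + len(complete)
```

A caller would persist the returned offset after each successful send, so a restarted agent resumes where it left off instead of resending the whole file.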


Kinesis Data Out: Kinesis Client Library

[Figure: Shards 1 to N of a stream, each processed by one KCL worker; the workers run on several EC2 instances.]

KCL languages: Java, Node.js, Python, .NET, Ruby

twitter-trends.com

[Figure: the twitter-trends.com website]

twitter-trends.com: The Solution, Local Top-10

[Figure: each worker computes its own local top-10 ("My top-10") from its share of the tweets; the local lists are combined into a global top-10.]
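The local/global split works because top-k lists are cheap to merge. The sketch below assumes tweets are partitioned by hashtag, so each hashtag is counted on exactly one shard; under that assumption the merged lists give the true global top-k (up to ties). If the same hashtag could be counted on several shards, merging truncated local lists would only approximate the global ranking. The function names are hypothetical.

```python
import heapq
from collections import Counter


def local_top_k(hashtags: list[str], k: int = 10) -> list[tuple[str, int]]:
    """Count the hashtags seen by one worker (one shard) and keep its k most frequent."""
    return Counter(hashtags).most_common(k)


def global_top_k(local_lists: list[list[tuple[str, int]]], k: int = 10) -> list[tuple[str, int]]:
    """Merge per-shard top-k lists; exact when each hashtag lives on only one shard."""
    combined = Counter()
    for local in local_lists:
        for tag, count in local:
            combined[tag] += count
    return heapq.nlargest(k, combined.items(), key=lambda item: item[1])


shard_1 = ["#movies"] * 5 + ["#news"] * 2
shard_2 = ["#weather"] * 4 + ["#sports"]
shard_3 = ["#music"] * 3
local_lists = [local_top_k(s, k=2) for s in (shard_1, shard_2, shard_3)]
print(global_top_k(local_lists, k=3))  # [('#movies', 5), ('#weather', 4), ('#music', 3)]
```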

twitter-trends.com: Challenges Using the Kinesis API Directly

[Figure: a Kinesis application reading directly from Kinesis.]

Manual creation of workers and assignment to shards
How many workers per EC2 instance? How many EC2 instances?

twitter-trends.com: Using the Kinesis Client Library

[Figure: the Kinesis application is built on the KCL, which tracks which worker processes each shard in a shard management table (an Amazon DynamoDB table).]

twitter-trends.com: Elasticity and Load Balancing

[Figure: KCL workers run in an Auto Scaling group; as instances are added or removed, the shard management table is used to rebalance shards across the workers.]
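The shard management table is what lets workers share a stream without manual assignment: each shard has a lease, and workers take leases until the load is even. The sketch below models the table as a Python dict mapping shard to owner and rebalances whenever workers join or leave, for example when the Auto Scaling group adds or removes an instance. The function and its "move one lease from the busiest worker to the idlest" rule are simplifications; the real KCL lease-taking algorithm is more involved and runs independently on each worker.

```python
from typing import Optional


def rebalance(leases: dict[str, Optional[str]], workers: list[str]) -> dict[str, str]:
    """Return a shard -> worker mapping in which worker loads differ by at most one."""
    if not workers:
        raise ValueError("need at least one worker")
    load = {worker: [] for worker in workers}
    free = []
    for shard, owner in sorted(leases.items()):
        if owner in load:
            load[owner].append(shard)  # keep existing leases where possible
        else:
            free.append(shard)         # unowned, or the owner is no longer running
    for shard in free:
        idlest = min(workers, key=lambda w: len(load[w]))
        load[idlest].append(shard)
    # Move one lease at a time from the busiest worker until loads are even.
    while True:
        busiest = max(workers, key=lambda w: len(load[w]))
        idlest = min(workers, key=lambda w: len(load[w]))
        if len(load[busiest]) - len(load[idlest]) <= 1:
            break
        load[idlest].append(load[busiest].pop())
    return {shard: worker for worker, shards in load.items() for shard in shards}


table = rebalance({f"shard-{i}": None for i in range(1, 7)}, ["worker-a", "worker-b"])
table = rebalance(table, ["worker-a", "worker-b", "worker-c"])  # scale out: 2 leases each
table = rebalance(table, ["worker-a", "worker-c"])              # worker-b's instance removed
print(table)
```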

twitter-trends.com: Fault Tolerance Support in KCL

[Figure: workers in Availability Zone 1 fail (marked X); using the shard management table, workers in Availability Zone 3 take over their shards.]
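Takeover after a failure rests on leases expiring. A worker keeps renewing the leases it holds; if it stops, for example because its Availability Zone is lost, other workers treat those leases as expired after a failover interval and claim them. The sketch below shows only that expiry check, with explicit timestamps so it is deterministic. The field names and the 10-second interval are assumptions for illustration; the KCL has a comparable failover-time setting (`failoverTimeMillis`).

```python
from dataclasses import dataclass
from typing import Optional

FAILOVER_SECONDS = 10.0  # assumed failover interval


@dataclass
class Lease:
    shard_id: str
    owner: Optional[str]
    last_renewed: float  # seconds on a clock shared by the workers


def renew(lease: Lease, worker: str, now: float) -> None:
    """Called periodically by the worker that holds the lease."""
    if lease.owner == worker:
        lease.last_renewed = now


def take_expired_leases(leases: list[Lease], worker: str, now: float) -> list[str]:
    """Claim every lease that is unowned or has not been renewed within the failover interval."""
    taken = []
    for lease in leases:
        expired = lease.owner is None or now - lease.last_renewed > FAILOVER_SECONDS
        if expired and lease.owner != worker:
            lease.owner, lease.last_renewed = worker, now
            taken.append(lease.shard_id)
    return taken


leases = [Lease("shard-1", "az1-worker", 0.0), Lease("shard-2", "az3-worker", 0.0)]
renew(leases[1], "az3-worker", now=8.0)  # the AZ 3 worker keeps renewing; AZ 1 has gone silent
print(take_expired_leases(leases, "az3-worker", now=12.0))  # ['shard-1']
```

In the real KCL, lease updates are conditional writes to the DynamoDB lease table, so two workers cannot both claim the same lease; this sketch omits that guard.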

Checkpoint, Replay Design Pattern

[Figure: Host A reads shard-i from Kinesis, maintains a local top-10, and records the lock holder and the last processed sequence number for shard-i in the shard management table. When Host A fails (X), Host B takes the lock and replays shard-i from the last checkpointed sequence number, rebuilding its local top-10.]
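The checkpoint/replay pattern can be simulated in a few lines. In the sketch below, a worker stores both the last sequence number it processed and a snapshot of its local counts in the shard management table. A replacement host restores that snapshot and replays only the records after the checkpoint, so no record is counted twice. Storing the counts with the checkpoint is an assumption of this sketch (the KCL itself checkpoints a sequence number and leaves application state to you), and the table layout and function name are hypothetical.

```python
from collections import Counter


def run_worker(shard, table, shard_id, host, checkpoint_every, crash_after=None):
    """Process a shard from its last checkpoint; return False if the host 'crashes'.

    shard: list of (sequence_number, hashtag) pairs in sequence order.
    table: hypothetical shard management table {shard_id: {"owner", "seq", "counts"}}.
    """
    entry = table[shard_id]
    entry["owner"] = host                      # take the lock on the shard
    counts = Counter(entry["counts"])          # restore state saved with the checkpoint
    processed = 0
    for seq, hashtag in shard:
        if seq <= entry["seq"]:
            continue                           # already reflected in the checkpoint
        counts[hashtag] += 1
        processed += 1
        if crash_after is not None and processed == crash_after:
            return False                       # work since the last checkpoint is lost
        if processed % checkpoint_every == 0:
            entry["seq"], entry["counts"] = seq, dict(counts)
    if shard:                                  # final checkpoint at the end of the shard
        entry["seq"], entry["counts"] = max(entry["seq"], shard[-1][0]), dict(counts)
    return True


shard = [(2, "#Movies"), (3, "#Weather"), (5, "#Movies"), (8, "#Movies"),
         (10, "#Weather"), (14, "#Movies"), (17, "#Weather"), (18, "#Movies")]
table = {"shard-i": {"owner": None, "seq": 0, "counts": {}}}

run_worker(shard, table, "shard-i", "host-a", checkpoint_every=3, crash_after=5)
print(table["shard-i"]["seq"])     # 5: host A checkpointed after seq 5, then crashed at seq 10
run_worker(shard, table, "shard-i", "host-b", checkpoint_every=3)
print(table["shard-i"]["counts"])  # {'#Movies': 5, '#Weather': 3}
```

The final counts match a single uninterrupted run over the shard, which is the point of checkpointing state and position together.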


Kinesis & Lambda

[Figure: AWS Lambda processes records from shards 1 to N of a Kinesis stream.]

AWS Lambda languages: Node.js, Java, Python, C#
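With Lambda, the service polls the shards and invokes your function with batches of records. In the event, each record's payload arrives base64-encoded under `Records[i].kinesis.data`. A minimal Python handler that decodes the payloads might look like the sketch below. The assumption that producers send JSON, and the `event_type` field it counts, are illustrative.

```python
import base64
import json
from collections import Counter


def handler(event, context):
    """Entry point for a Lambda function subscribed to a Kinesis stream."""
    counts = Counter()
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        message = json.loads(payload)  # assumes producers write JSON
        counts[message.get("event_type", "unknown")] += 1
    # A real function might push these counts to a dashboard or raise an alert.
    return dict(counts)


# Local check with a hand-built event in the Kinesis event shape.
def _record(obj):
    data = base64.b64encode(json.dumps(obj).encode("utf-8")).decode("ascii")
    return {"kinesis": {"partitionKey": "user-1", "sequenceNumber": "1", "data": data}}


sample_event = {"Records": [_record({"event_type": "click"}),
                            _record({"event_type": "view"}),
                            _record({"event_type": "click"})]}
print(handler(sample_event, None))  # {'click': 2, 'view': 1}
```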

Lambda Blueprints


[Figure: The Apache Spark stack: Spark SQL, Spark Streaming, SparkR, Spark ML, and GraphX on top of Spark Core.]


[Figure: Spark Streaming divides an incoming stream from Amazon Kinesis or Apache Kafka into micro-batches, processes each batch, and produces results per batch.]
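The micro-batch model can be illustrated without a cluster: chop the incoming stream into fixed time intervals and run an ordinary batch computation on each one. The pure-Python sketch below does that for timestamped events with a 2-second interval. It is not the Spark API (in Spark Streaming the interval is the batch duration given when the `StreamingContext` is created); it shows only the grouping logic, and the sample events are invented.

```python
from collections import Counter, defaultdict


def micro_batches(events, batch_seconds):
    """Group (timestamp_seconds, value) events into consecutive fixed-length batches."""
    batches = defaultdict(list)
    for timestamp, value in events:
        batch_start = int(timestamp // batch_seconds) * batch_seconds
        batches[batch_start].append(value)
    return dict(sorted(batches.items()))


def process_batch(values):
    """The per-batch job: here, a simple count of each value."""
    return dict(Counter(values))


events = [(0.2, "#movies"), (1.1, "#weather"), (1.9, "#movies"), (2.5, "#movies"), (4.0, "#news")]
for start, values in micro_batches(events, batch_seconds=2).items():
    print(f"batch [{start}s, {start + 2}s):", process_batch(values))
```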


[Figure: Training data is prepared with SQL and split into a 70% training set and a 30% test set; the prediction model is trained and tested with ML, then applied to near real-time data.]
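The 70%/30% split is a standard hold-out: shuffle the prepared data, fit the model on the first 70%, and measure it on the 30% it has never seen. A plain-Python version is sketched below with a fixed seed so the split is reproducible. In a Spark ML job on Amazon EMR, the equivalent step would typically be `DataFrame.randomSplit([0.7, 0.3], seed)`, which yields approximately rather than exactly those proportions.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows and cut it into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # local RNG: reproducible, no global state
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))  # stand-in for prepared feature rows
train, test = train_test_split(rows)
print(len(train), len(test))  # 70 30
```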

[Figure: Modern Data Architecture with the real-time pipeline filled in. Amazon Kinesis feeds AWS Lambda, an application, and Amazon EMR (streaming); results go to Amazon S3 (logs) and Amazon Elasticsearch (dashboards).]

Amazon Elasticsearch

• Search and analytics
• Scalable
• Fully managed
• Integrated with Logstash and Kibana


[Figure: The complete Modern Data Architecture. Sources feed Amazon Kinesis; the real-time pipeline runs on AWS Lambda, an application, and Amazon EMR (streaming), with outputs to Amazon S3 (logs), Amazon Elasticsearch (dashboards), Amazon EMR with ML (predictions), and Amazon SNS (alerts), plus Amazon Redshift (analytics). Consumers are data analysts, data scientists, business users, engagement platforms, and automation/events.]


Amazon Kinesis Firehose

[Figure: Amazon Kinesis Firehose batches, compresses, and encrypts incoming data and delivers it to Amazon S3, Amazon Redshift, or Amazon Elasticsearch.]

Auto provisioning
Auto partition keys
End-to-end elastic
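Firehose batching is driven by buffer hints: incoming records accumulate until either a size threshold or a time interval is reached, whichever comes first, and the batch is then compressed (for example with GZIP) and delivered. The class below sketches that flush rule with explicit timestamps. The thresholds are small illustrative values, not Firehose defaults; the caller polls `maybe_flush` instead of a background timer; and delivery is represented by appending to a list.

```python
import gzip


class BufferingDeliverer:
    """Sketch of size-or-time buffering followed by GZIP compression."""

    def __init__(self, max_bytes: int, max_seconds: float):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.buffer: list[bytes] = []
        self.buffer_bytes = 0
        self.first_arrival = None
        self.delivered: list[bytes] = []  # stands in for objects written to S3

    def put(self, record: bytes, now: float) -> None:
        if self.first_arrival is None:
            self.first_arrival = now
        self.buffer.append(record)
        self.buffer_bytes += len(record)
        self.maybe_flush(now)

    def maybe_flush(self, now: float) -> None:
        """Flush when either threshold is reached, whichever comes first."""
        if not self.buffer:
            return
        size_hit = self.buffer_bytes >= self.max_bytes
        time_hit = now - self.first_arrival >= self.max_seconds
        if size_hit or time_hit:
            batch = b"\n".join(self.buffer) + b"\n"
            self.delivered.append(gzip.compress(batch))
            self.buffer, self.buffer_bytes, self.first_arrival = [], 0, None


d = BufferingDeliverer(max_bytes=20, max_seconds=60)
d.put(b"event-1", now=0)
d.put(b"event-2", now=5)
d.put(b"event-3", now=9)    # 21 bytes buffered: the size threshold triggers a flush
d.put(b"event-4", now=10)
d.maybe_flush(now=70)       # 60 s since event-4 arrived: the time threshold triggers
print(len(d.delivered))     # 2
```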


Kinesis Analytics

[Figure: Data in from a Kinesis stream or Firehose delivery stream is processed continuously with SQL in Kinesis Analytics; data out goes to a stream or Firehose delivery stream.]
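Kinesis Analytics runs SQL continuously over the incoming stream, commonly aggregating over time windows. The Python generator below mimics one such continuous query, a count per key over 60-second tumbling windows, emitting each window's rows as soon as a later record closes it. It illustrates the windowing semantics only and is not Kinesis Analytics SQL; it assumes records arrive in timestamp order, and the ticker records are invented sample input.

```python
from collections import Counter


def tumbling_counts(events, window_seconds=60):
    """Yield (window_start, key, count) rows as each window closes.

    events: (timestamp_seconds, key) pairs, assumed to arrive in timestamp order.
    """
    current_start, counts = None, Counter()
    for timestamp, key in events:
        start = int(timestamp // window_seconds) * window_seconds
        if current_start is not None and start != current_start:
            # A record from a later window closes the current one: emit its results.
            for k, c in sorted(counts.items()):
                yield current_start, k, c
            counts = Counter()
        current_start = start
        counts[key] += 1
    for k, c in sorted(counts.items()):  # end of input closes the last window
        yield current_start, k, c


trades = [(3, "AMZN"), (15, "INTC"), (42, "AMZN"), (61, "AMZN"), (118, "INTC")]
for row in tumbling_counts(trades):
    print(row)
```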


Sonos

New X1 Instance: Tons of Memory

• Large-scale, in-memory applications

• Intel® Xeon® E7-8880 v3 (Haswell) processors

• Up to 2 TB of memory

• Up to 128 vCPUs per instance

Intel® Processor Technologies

Intel® AVX: dramatically increases performance for highly parallel HPC workloads such as life sciences, engineering, data mining, financial analysis, and media processing

Intel® AES-NI: enhances security with new encryption instructions that reduce the performance penalty of encrypting and decrypting data

Intel® Turbo Boost Technology: increases computing power with performance that adapts to spikes in workloads

Intel® Transactional Synchronization Extensions (TSX): enables independent transactions to execute concurrently, accelerating throughput

P-state and C-state control: provides granular control of core performance states and sleep states to improve overall application performance


twitter.com/awsawscloudseasia

[email protected]

facebook.com/amazonwebservices/

youtube.com/user/AmazonWebServices

slideshare.net/amazonwebservices

Thank you for joining us today. Please complete the survey & let us know what you think of the webinar.

REGISTER NOW: http://amzn.to/2jFt11N
Complimentary labs are available only until 31 March 2017.

Get hands-on experience working with AWS technology. Access the complimentary Big Data on AWS self-paced labs.


Q&A