Transcript
Page 1: Deep Dive into Concepts and Tools for Analyzing Streaming Dataaws-de-media.s3.amazonaws.com/images/AWS_Summit... · © 2018, Amazon Web Services, Inc. or its affiliates. All rights

© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Dr. Steffen Hausmann

Sr. Solutions Architect, Amazon Web Services

Deep Dive into Concepts and Tools for Analyzing Streaming Data


Data originates in real time

Photo by mountainamoeba

https://www.flickr.com/photos/mountainamoeba/2527300028/


Analytics is done in batches

Photo by PracticalHacks

https://www.flickr.com/photos/29225844@N05/2828724211


Insights are perishable

Photo by Lucas Cobb

https://www.flickr.com/photos/cobblucas/4780005097/


Analyzing Streaming Data on AWS


Challenges of Stream Processing

Photo by FollowYour Nose

https://www.flickr.com/photos/laprimadonna/3294467673


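Comparing Streams and Relations

Relation: $R \subseteq \mathit{Id} \times \mathit{Color}$

Stream: $S \subseteq \mathit{Id} \times \mathit{Color} \times \mathit{Time}$

[Figure: stream elements arriving along a time axis up to "now"]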
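The two definitions differ only in the time attribute. As a minimal plain-Java sketch (not code from the talk; `StreamElement` and `snapshot` are hypothetical names), the relation a stream has produced "as of now" can be derived by keeping the elements that arrived up to that instant and projecting away their time:

```java
import java.util.*;

// Hypothetical stream element: one (id, color, time) tuple of S, a subset of Id x Color x Time.
final class StreamElement {
    final int id;
    final String color;
    final long time;

    StreamElement(int id, String color, long time) {
        this.id = id;
        this.color = color;
        this.time = time;
    }
}

class StreamsAndRelations {
    // Returns the relation R (a subset of Id x Color) that the stream has produced up to
    // "now": keep the elements whose time is at or before now and drop the time attribute.
    static Set<Map.Entry<Integer, String>> snapshot(List<StreamElement> stream, long now) {
        Set<Map.Entry<Integer, String>> relation = new HashSet<>();
        for (StreamElement e : stream) {
            if (e.time <= now) {
                relation.add(Map.entry(e.id, e.color));
            }
        }
        return relation;
    }
}
```

Calling `snapshot` with a later `now` can only add tuples, which is why a relation is a fixed set while a stream keeps growing.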


Querying Streams and Relations

Relation: fixed data and ad-hoc queries

Stream: fixed queries and continuously ingested data


Challenges of Querying Infinite Streams

SELECT * FROM S WHERE color = 'black'

SELECT * FROM S JOIN S'

SELECT color, COUNT(1) FROM S GROUP BY color

... NOT EXISTS (SELECT * FROM S WHERE color = 'red')
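Why these queries behave differently on an infinite stream can be sketched in plain Java (a hypothetical illustration, not Kinesis Analytics or Flink code; `ColorEvent` and the method names are assumptions). The filter is monotonic: each match can be emitted immediately and is never retracted. A running GROUP BY count changes with every matching event, so on an unbounded stream no value is ever final. A NOT EXISTS check is worse still, since a single later red event would flip its answer.

```java
import java.util.*;

// Hypothetical event type for illustration: an id and a color.
final class ColorEvent {
    final int id;
    final String color;

    ColorEvent(int id, String color) {
        this.id = id;
        this.color = color;
    }
}

class MonotonicityDemo {
    // Monotonic (a filter): each match is emitted as soon as it arrives,
    // and no later event can invalidate an emitted result.
    static List<Integer> filterBlack(List<ColorEvent> events) {
        List<Integer> emitted = new ArrayList<>();
        for (ColorEvent e : events) {
            if (e.color.equals("black")) {
                emitted.add(e.id);
            }
        }
        return emitted;
    }

    // Non-monotonic (GROUP BY count): records the count for one color after every
    // event, showing that the value keeps changing and is never final.
    static List<Long> countAfterEachEvent(List<ColorEvent> events, String color) {
        Map<String, Long> counts = new HashMap<>();
        List<Long> history = new ArrayList<>();
        for (ColorEvent e : events) {
            counts.merge(e.color, 1L, Long::sum);
            history.add(counts.getOrDefault(color, 0L));
        }
        return history;
    }
}
```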


Analyzing Streaming Data on AWS

Amazon Kinesis Analytics
• Runs standard SQL queries on top of streaming data
• Fully managed and scales automatically
• You pay only for the resources your queries consume

Apache Flink
• Open-source stream processing framework
• Included in Amazon Elastic MapReduce (EMR)
• Flexible APIs with Java and Scala, SQL, and CEP support


Evaluating Queries over Streams

Photo by Brad Greenlee

https://www.flickr.com/photos/bgreenlee/91309374/


Evaluating Non-monotonic Operators: Tumbling Windows

SELECT STREAM color, COUNT(1)
FROM ...
GROUP BY STEP(rowtime BY INTERVAL '10' SECOND), color;

[Figure: events t1, t3, t5, t6, t9 on a timeline, grouped into consecutive, non-overlapping 10-second windows]
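The assignment of events to tumbling windows can be sketched in plain Java. This is a hypothetical batch illustration, not how Kinesis Analytics evaluates the query: it rounds each timestamp down to the start of its 10-second window (as STEP does) and counts events per window and color. A streaming engine would instead emit each window's rows once the window has closed. `TimedEvent` and the "start/color" key format are assumptions.

```java
import java.util.*;

// Hypothetical event: a color plus a timestamp in seconds.
final class TimedEvent {
    final String color;
    final long timeSec;

    TimedEvent(String color, long timeSec) {
        this.color = color;
        this.timeSec = timeSec;
    }
}

class TumblingWindowCount {
    // Rounds a timestamp down to the start of its window, similar in spirit to
    // STEP(rowtime BY INTERVAL '10' SECOND). Assumes non-negative timestamps.
    static long windowStart(long timeSec, long sizeSec) {
        return timeSec - (timeSec % sizeSec);
    }

    // Counts events per (window start, color); every event lands in exactly one window.
    static Map<String, Long> count(List<TimedEvent> events, long sizeSec) {
        Map<String, Long> counts = new TreeMap<>();
        for (TimedEvent e : events) {
            String key = windowStart(e.timeSec, sizeSec) + "/" + e.color;
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }
}
```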


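Evaluating Non-monotonic Operators: Sliding Windows

SELECT STREAM color, COUNT(1) OVER w
FROM ...
WINDOW w AS (PARTITION BY color RANGE INTERVAL '10' SECOND PRECEDING);

[Figure: events t1, t3, t5, t6, t9 on a timeline with a 10-second window that slides along with each arriving event]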
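A sliding window produces one result per input row rather than one per window. The following plain-Java sketch (hypothetical names; events are assumed to arrive in timestamp order) keeps, per color, only the timestamps still inside the range and reports their count for every arriving event. Whether the lower bound of the 10-second range is inclusive is an assumption of this sketch.

```java
import java.util.*;

// Hypothetical event: a color plus a timestamp in seconds.
final class SlidingEvent {
    final String color;
    final long timeSec;

    SlidingEvent(String color, long timeSec) {
        this.color = color;
        this.timeSec = timeSec;
    }
}

class SlidingWindowCount {
    // Emits one count per input event: the number of events with the same color whose
    // timestamp lies in [t - rangeSec, t], including the event itself.
    static List<Long> countPerEvent(List<SlidingEvent> events, long rangeSec) {
        // Per-color timestamps, pruned whenever that color receives a new event.
        Map<String, Deque<Long>> recent = new HashMap<>();
        List<Long> out = new ArrayList<>();
        for (SlidingEvent e : events) {
            Deque<Long> times = recent.computeIfAbsent(e.color, c -> new ArrayDeque<>());
            times.addLast(e.timeSec);
            // Evict timestamps that have slid out of the window; the current event always stays.
            while (times.peekFirst() < e.timeSec - rangeSec) {
                times.pollFirst();
            }
            out.add((long) times.size());
        }
        return out;
    }
}
```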


Evaluating Non-monotonic Operators: Session Windows

[Figure: events t1, t3, t5, t6, t8, t9 on a timeline, grouped into sessions separated by a session gap]

stream
    .keyBy(<key selector>)
    .window(EventTimeSessionWindows.withGap(Time.minutes(10)))
    .<windowed transformation>(<window function>);
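For a single key, session assignment can be sketched in plain Java. This is hypothetical code, not Flink's implementation (which merges per-event windows incrementally as events arrive): it sorts the key's timestamps and starts a new session whenever the distance to the previous event exceeds the gap. Whether a gap exactly equal to the session gap splits a session is a convention of this sketch.

```java
import java.util.*;

class SessionSplitter {
    // Splits one key's event timestamps into sessions: a session ends when the
    // next event is more than `gap` time units after the previous one.
    static List<List<Long>> sessions(List<Long> timestamps, long gap) {
        List<Long> sorted = new ArrayList<>(timestamps);
        Collections.sort(sorted);
        List<List<Long>> result = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long t : sorted) {
            if (!current.isEmpty() && t - current.get(current.size() - 1) > gap) {
                result.add(current); // gap exceeded: close the current session
                current = new ArrayList<>();
            }
            current.add(t);
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }
}
```

Unlike tumbling or sliding windows, session boundaries depend on the data itself, so the number and length of sessions is not known in advance.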


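Evaluating Unbounded Queries

Unbounded join (every past event of both streams must be kept):

SELECT STREAM *
FROM S AS s JOIN S' AS t
ON s.color = t.color;

Windowed join (only the last 10 seconds of each stream must be kept):

SELECT STREAM *
FROM S OVER w AS s JOIN S' OVER w AS t
ON s.color = t.color
WINDOW w AS (RANGE INTERVAL '10' SECOND PRECEDING);

[Figure: two streams, S and S', with events t1, t3, t5, t6, t9 and t2, t4, t7, t8 on a shared timeline]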
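The windowed join can be sketched in plain Java to show why its state stays bounded. This is hypothetical code, not how Kinesis Analytics executes the query; it assumes both inputs are merged in timestamp order and tags each event with the stream it came from. Each side buffers only the events of the last 10 seconds, and an arriving event is joined with the buffered events of the other side that have the same color. The inclusive window boundary is again an assumption.

```java
import java.util.*;

// Hypothetical event: its input stream ("S" or "S'"), a color, and a timestamp in seconds.
final class JoinEvent {
    final String stream;
    final String color;
    final long timeSec;

    JoinEvent(String stream, String color, long timeSec) {
        this.stream = stream;
        this.color = color;
        this.timeSec = timeSec;
    }
}

class WindowedJoin {
    // Joins events of S and S' with equal color whose timestamps are at most rangeSec apart.
    // Results are formatted as "<S time>-<S' time>:<color>".
    static List<String> join(List<JoinEvent> merged, long rangeSec) {
        Map<String, Deque<JoinEvent>> buffers = new HashMap<>();
        buffers.put("S", new ArrayDeque<>());
        buffers.put("S'", new ArrayDeque<>());
        List<String> results = new ArrayList<>();
        for (JoinEvent e : merged) {
            // Evict events that are older than the window on both sides.
            for (Deque<JoinEvent> buffer : buffers.values()) {
                while (!buffer.isEmpty() && buffer.peekFirst().timeSec < e.timeSec - rangeSec) {
                    buffer.pollFirst();
                }
            }
            boolean fromS = e.stream.equals("S");
            for (JoinEvent o : buffers.get(fromS ? "S'" : "S")) {
                if (o.color.equals(e.color)) {
                    JoinEvent s = fromS ? e : o;
                    JoinEvent t = fromS ? o : e;
                    results.add(s.timeSec + "-" + t.timeSec + ":" + e.color);
                }
            }
            buffers.get(e.stream).addLast(e);
        }
        return results;
    }
}
```

Dropping the eviction loop turns this into the unbounded join: the buffers would then grow for as long as the streams run.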


Different Time Semantics


Maintaining Order of Events

[Figure: events t1, t3, t7, t8 shown in event-time order and in processing-time order; in processing time, t7 arrives after t8, around t11]


Maintaining Order of Events: Using processing-time-based windows

[Figure: events t1, t3, t8, t7 in processing-time order (t7 arriving around t11), counted in 10-second processing-time windows starting at 0 and 10]


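Maintaining Order of Events: Using multiple time windows

-- rowtime is when the record arrived (processing time); event_time is when it happened
SELECT STREAM
    STEP(rowtime BY INTERVAL '10' SECOND) AS processing_time,
    STEP(event_time BY INTERVAL '10' SECOND) AS event_time,
    color,
    COUNT(1)
FROM ...
GROUP BY
    STEP(rowtime BY INTERVAL '10' SECOND),
    STEP(event_time BY INTERVAL '10' SECOND),
    color;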
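The effect of a late event such as t7 can be reproduced with a plain-Java sketch (hypothetical names and timestamps, not the slide's exact data). Counting by processing-time window alone puts the late event into the window in which it arrived; grouping by both windows keeps it visible as a separate row for the earlier event-time window, which a downstream consumer can then reconcile.

```java
import java.util.*;

// Hypothetical event carrying both timestamps, in seconds.
final class DualTimeEvent {
    final long eventTime;      // when the event happened at the source
    final long processingTime; // when the stream processor received it (like rowtime)

    DualTimeEvent(long eventTime, long processingTime) {
        this.eventTime = eventTime;
        this.processingTime = processingTime;
    }
}

class MultiTimeWindows {
    // Rounds down to the start of a tumbling window, like STEP(... BY INTERVAL ...).
    static long step(long t, long size) {
        return t - (t % size);
    }

    // Processing-time windows only: a late event is counted where it arrived.
    static Map<Long, Long> byProcessingTime(List<DualTimeEvent> events, long size) {
        Map<Long, Long> counts = new TreeMap<>();
        for (DualTimeEvent e : events) {
            counts.merge(step(e.processingTime, size), 1L, Long::sum);
        }
        return counts;
    }

    // Grouped by (processing-time window, event-time window), as in the slide's query.
    // Keys are formatted as "<processing window>/<event window>".
    static Map<String, Long> byBoth(List<DualTimeEvent> events, long size) {
        Map<String, Long> counts = new TreeMap<>();
        for (DualTimeEvent e : events) {
            String key = step(e.processingTime, size) + "/" + step(e.eventTime, size);
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }
}
```

With events at (event, processing) times (1, 1), (3, 3), (8, 8), (7, 11), and (12, 13), processing-time windows alone report 3 and 2 events, while the combined grouping reports the rows 0/0 = 3, 10/0 = 1, and 10/10 = 1.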


Maintaining Order of Events: Using multiple time windows

[Figure: events t1, t3, t8, t7 in processing-time order (t7 arriving around t11); counts are reported per (processing time, event time) window pair: (0, 0), (10, 0), and (10, 10)]


Maintaining Order of Events: Using event time and watermarks

[Figure: events t1, t3, t8, t7 and watermarks 10 and 20 along the processing-time axis (up to t11), with counts per event-time window 0 and 10]
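A minimal plain-Java sketch of event-time windows driven by watermarks (hypothetical code, not Flink internals): events are counted in the window of their own timestamp, and a window is emitted only once a watermark at or beyond its end arrives. An out-of-order event such as t7 is therefore still counted in the right window as long as it arrives before that watermark. Firing at "watermark >= window end" and silently dropping later events are conventions of this sketch.

```java
import java.util.*;

class EventTimeWindows {
    // An input element is either an event (time = its event time) or a watermark
    // (time = the assertion that no older events are still expected).
    static final class Element {
        final boolean isWatermark;
        final long time;

        Element(boolean isWatermark, long time) {
            this.isWatermark = isWatermark;
            this.time = time;
        }
    }

    // Counts events per event-time tumbling window of the given size and emits
    // "<window start>:<count>" once a watermark reaches the window end.
    static List<String> run(List<Element> input, long size) {
        TreeMap<Long, Long> open = new TreeMap<>(); // window start -> count, not yet emitted
        List<String> emitted = new ArrayList<>();
        long watermark = Long.MIN_VALUE;
        for (Element el : input) {
            if (!el.isWatermark) {
                long start = el.time - (el.time % size);
                if (start + size <= watermark) {
                    continue; // its window has already fired; late data is dropped here
                }
                open.merge(start, 1L, Long::sum);
            } else {
                watermark = Math.max(watermark, el.time);
                // Fire, in order, every open window whose end is covered by the watermark.
                while (!open.isEmpty() && open.firstKey() + size <= watermark) {
                    Map.Entry<Long, Long> window = open.pollFirstEntry();
                    emitted.add(window.getKey() + ":" + window.getValue());
                }
            }
        }
        return emitted;
    }
}
```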


Adding Watermarks to a Stream

- Periodic watermarks

- Assuming ascending timestamps

- Punctuated watermarks

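stream.assignTimestampsAndWatermarks(new AscendingTimestampExtractor<MyEvent>() {
    // Use the event's creation time as its event-time timestamp;
    // timestamps are assumed to arrive in ascending order
    @Override
    public long extractAscendingTimestamp(MyEvent element) {
        return element.getCreationTime();
    }
});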
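The two watermark strategies can be sketched in plain Java (hypothetical classes, not Flink's API). A periodic generator remembers the largest timestamp seen so far and, each time a timer fires, reports it minus a tolerated delay; with a delay of zero this roughly matches the ascending-timestamps assumption of the extractor above. A punctuated generator instead emits a watermark only when an event carries a marker, modeled here as a boolean flag that a producer would set.

```java
import java.util.*;

class WatermarkGenerators {
    // Periodic watermarks with bounded out-of-orderness.
    static final class BoundedOutOfOrderness {
        private final long maxDelay;                 // how far events may lag behind the newest one
        private long maxTimestamp = Long.MIN_VALUE;  // largest event timestamp seen so far

        BoundedOutOfOrderness(long maxDelay) {
            this.maxDelay = maxDelay;
        }

        void onEvent(long timestamp) {
            maxTimestamp = Math.max(maxTimestamp, timestamp);
        }

        // Called on every timer tick; returns the current watermark.
        long currentWatermark() {
            return maxTimestamp == Long.MIN_VALUE ? Long.MIN_VALUE : maxTimestamp - maxDelay;
        }
    }

    // Punctuated watermarks: only events carrying a marker (a hypothetical flag set by
    // the producer) produce a watermark, equal to that event's timestamp.
    static OptionalLong punctuated(long timestamp, boolean carriesMarker) {
        return carriesMarker ? OptionalLong.of(timestamp) : OptionalLong.empty();
    }
}
```

A larger delay tolerates more disorder but holds window results back for longer; that trade-off is the main knob of periodic watermarking.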


Different Processing Semantics

Photo by Dominic Alves

https://www.flickr.com/photos/dominicspics/6854063597/


Consuming Data from a Stream

[Figure: a consumer reads records from a stream and writes results to an output sink]


Different Processing Semantics: At-most-once Semantics

[Figure: consumer, output sink, and offset store; the offset store advances from pos 561 to pos 1105]
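The ordering that produces at-most-once behavior can be simulated in plain Java (a hypothetical sketch; offsets are record indexes rather than the slide's positions, and the crash point is a test parameter). The consumer commits the next offset before writing to the output sink, so a crash between the two steps makes the restarted consumer skip that record.

```java
import java.util.*;

class AtMostOnceConsumer {
    // Consumes all records with "commit the offset, then write to the sink".
    // crashAfterCommitAt is a hypothetical failure point: the consumer dies right after
    // committing the offset for that record but before writing it. The restarted
    // consumer resumes from the committed offset, so that record is never written.
    static List<String> run(List<String> records, int crashAfterCommitAt) {
        List<String> sink = new ArrayList<>();
        int committed = 0; // offset store: index of the next record to read
        boolean crashed = false;
        while (committed < records.size()) {
            int pos = committed;
            committed = pos + 1;                 // 1. commit first
            if (!crashed && pos == crashAfterCommitAt) {
                crashed = true;                  // 2. crash before the write, then restart
                continue;
            }
            sink.add(records.get(pos));          // 3. write to the output sink
        }
        return sink;
    }
}
```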


Different Processing Semantics: At-least-once Semantics

[Figure: consumer, output sink, and offset store; the consumer has reached pos 561 while the offset store still holds pos 0]
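Reversing the two steps gives at-least-once behavior, shown here as a hypothetical plain-Java simulation (record indexes as offsets, crash point as a test parameter). The record is written first and the offset committed afterwards, so a crash in between makes the restarted consumer read and write that record a second time: nothing is lost, but duplicates are possible.

```java
import java.util.*;

class AtLeastOnceConsumer {
    // Consumes all records with "write to the sink, then commit the offset".
    // crashBeforeCommitAt is a hypothetical failure point: the consumer dies after
    // writing that record but before committing, so the restart re-reads it.
    static List<String> run(List<String> records, int crashBeforeCommitAt) {
        List<String> sink = new ArrayList<>();
        int committed = 0; // offset store: index of the next record to read
        boolean crashed = false;
        while (committed < records.size()) {
            int pos = committed;
            sink.add(records.get(pos));          // 1. write to the output sink first
            if (!crashed && pos == crashBeforeCommitAt) {
                crashed = true;                  // 2. crash before committing, then restart
                continue;
            }
            committed = pos + 1;                 // 3. commit only after the write succeeded
        }
        return sink;
    }
}
```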


Different Processing Semantics: Exactly-once Semantics

Message Deduplication
• At-least-once event delivery plus message deduplication
• Keep a transaction log of processed messages
• On failure, replay events and remove duplicated events for every operator

Distributed Snapshots
• State for each operator is periodically checkpointed
• On failure, rewind each operator to the previous consistent state
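The message-deduplication approach can be sketched in plain Java. The class is hypothetical, and the in-memory set only stands in for the transaction log, which a real system must persist and update atomically with the output. Because delivery is at least once, a replayed message is recognized by its id and skipped. The distributed-snapshot approach is not shown here.

```java
import java.util.*;

class DeduplicatingSink {
    // Stand-in for the transaction log of processed message ids.
    private final Set<String> processedIds = new HashSet<>();
    private final List<String> output = new ArrayList<>();

    // Applies a message only if its id has not been recorded yet; a replayed
    // duplicate (same id) is ignored, so each message affects the output once.
    void deliver(String messageId, String payload) {
        if (processedIds.add(messageId)) {
            output.add(payload);
        }
    }

    List<String> output() {
        return output;
    }
}
```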


Go Build!


Please complete the session survey in the summit mobile app.


Thank you!


Watermarks and Allowed Lateness

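[Figure: events including t3, t1, t8, t4, and t5 arriving out of order along the processing-time axis]

stream
    .keyBy(<key selector>)
    .window(<window assigner>)
    .allowedLateness(<time>)            // keep window state for <time> after the watermark passes the window end
    .sideOutputLateData(lateOutputTag)  // events arriving even later are emitted to this side output
    .<windowed transformation>(<window function>);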
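How allowed lateness and the side output interact can be sketched in plain Java (hypothetical code, not Flink internals; sizes and timestamps are in arbitrary units). A tumbling window fires when the watermark reaches its end, and its state is kept for a further `allowedLateness`. An event arriving during that period updates the window and re-emits its result, while an event arriving after the state has been dropped goes to the side output instead. The exact boundary conditions are conventions of this sketch.

```java
import java.util.*;

class AllowedLatenessSketch {
    final long size;
    final long allowedLateness;
    long watermark = Long.MIN_VALUE;
    final TreeMap<Long, Long> counts = new TreeMap<>();   // window start -> count (live state)
    final List<String> emitted = new ArrayList<>();       // main output, "<start>:<count>"
    final List<Long> lateSideOutput = new ArrayList<>();  // events too late for any window

    AllowedLatenessSketch(long size, long allowedLateness) {
        this.size = size;
        this.allowedLateness = allowedLateness;
    }

    void onEvent(long t) {
        long start = t - (t % size);
        long end = start + size;
        if (end + allowedLateness <= watermark) {
            lateSideOutput.add(t); // window state already purged: route to the side output
            return;
        }
        long count = counts.merge(start, 1L, Long::sum);
        if (end <= watermark) {
            emitted.add(start + ":" + count); // late but allowed: emit an updated result
        }
    }

    void onWatermark(long wm) {
        if (wm <= watermark) {
            return; // watermarks only move forward
        }
        long previous = watermark;
        watermark = wm;
        Iterator<Map.Entry<Long, Long>> it = counts.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Long> w = it.next();
            long end = w.getKey() + size;
            if (end > previous && end <= watermark) {
                emitted.add(w.getKey() + ":" + w.getValue()); // regular, on-time firing
            }
            if (end + allowedLateness <= watermark) {
                it.remove(); // allowed lateness has passed: drop the window state
            }
        }
    }
}
```

With a window size of 10 and an allowed lateness of 5, events 3 and 8 fire as "0:2" at watermark 10, a late event 4 produces the update "0:3", and event 6 arriving after watermark 15 ends up in the side output.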

