Deep Dive into Concepts and Tools for Analyzing Streaming Data

Dr. Steffen Hausmann
Sr. Solutions Architect, Amazon Web Services

© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Data originates in real-time

Photo by mountainamoeba

https://www.flickr.com/photos/mountainamoeba/2527300028/

Analytics is done in batches

Photo by PracticalHacks

https://www.flickr.com/photos/29225844@N05/2828724211

Insights are Perishable

Photo by Lucas Cobb

https://www.flickr.com/photos/cobblucas/4780005097/

Analyzing Streaming Data on AWS

Challenges of Stream Processing

Photo by FollowYour Nose

https://www.flickr.com/photos/laprimadonna/3294467673

Comparing Streams and Relations

Relation: R ⊆ Id × Color

Stream: S ⊆ Id × Color × Time

(Figure: stream elements on a timeline that ends at "now")

Querying Streams and Relations

Relation: fixed data and ad-hoc queries

Stream: fixed queries and continuously ingested data

Challenges of Querying Infinite Streams

SELECT * FROM S WHERE color = 'black'

SELECT * FROM S JOIN S'

SELECT color, COUNT(1) FROM S GROUP BY color

... NOT EXISTS (SELECT * FROM S WHERE color = 'red')

Analyzing Streaming Data on AWS

Amazon Kinesis Analytics

• Runs standard SQL queries on top of streaming data
• Fully managed and scales automatically
• Only pay for the resources your queries consume

Apache Flink

• Open-source stream processing framework
• Included in Amazon Elastic MapReduce (EMR)
• Flexible APIs with Java and Scala, SQL, and CEP support

Evaluating Queries over Streams

Photo by Brad Greenlee

https://www.flickr.com/photos/bgreenlee/91309374/

Evaluating Non-monotonic Operators: Tumbling Windows

SELECT STREAM color, COUNT(1)
FROM ...
GROUP BY STEP(rowtime BY INTERVAL '10' SECOND), color;

(Figure: events t1, t3, t5, t6, t9 on a timeline divided into 10-second windows)
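The grouping performed by a tumbling window can be sketched in plain Java. This is a minimal illustration, not the Kinesis Analytics implementation: the class `TumblingWindowSketch`, its methods, and the `"<windowStart>/<color>"` key format are hypothetical names chosen for this example, and timestamps are assumed to be milliseconds.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for illustration; not part of Kinesis Analytics or Flink. */
final class TumblingWindowSketch {

    /** Start of the tumbling window that contains the given timestamp. */
    static long windowStart(long timestampMillis, long sizeMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, sizeMillis);
    }

    /** Counts events per window and color; keys look like "<windowStart>/<color>". */
    static Map<String, Integer> countPerWindow(long[] timestamps, String[] colors, long sizeMillis) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < timestamps.length; i++) {
            String key = windowStart(timestamps[i], sizeMillis) + "/" + colors[i];
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] timestamps = {1_000, 3_000, 5_000, 6_000, 9_000, 12_000};
        String[] colors = {"red", "black", "red", "red", "black", "red"};
        // prints {0/black=2, 0/red=3, 10000/red=1}
        System.out.println(countPerWindow(timestamps, colors, 10_000));
    }
}
```

Every event falls into exactly one window, so a window's count is final once the window has closed. That is what makes the aggregate computable on an unbounded stream.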

Evaluating Non-monotonic Operators: Sliding Windows

SELECT STREAM color, COUNT(1) OVER w
FROM ...
GROUP BY color
WINDOW w AS (RANGE INTERVAL '10' SECOND PRECEDING);

(Figure: events t1, t3, t5, t6, t9 on a timeline with a window that slides along with each event)
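As a rough sketch of what a sliding window computes, the following Java snippet emits one count per incoming event, covering that event and everything in the preceding range. The class and method names are hypothetical. The sketch assumes sorted millisecond timestamps, an inclusive lower bound, and no per-color partitioning, so it is simpler than the query above.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; ignores grouping by color. */
final class SlidingWindowSketch {

    /** For each event, counts the events with a timestamp in [t - rangeMillis, t]. */
    static int[] countsOverPreceding(long[] sortedTimestamps, long rangeMillis) {
        int[] counts = new int[sortedTimestamps.length];
        int oldest = 0; // index of the oldest event still inside the window
        for (int i = 0; i < sortedTimestamps.length; i++) {
            while (sortedTimestamps[oldest] < sortedTimestamps[i] - rangeMillis) {
                oldest++; // this event has slid out of the window
            }
            counts[i] = i - oldest + 1;
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] timestamps = {1_000, 3_000, 5_000, 6_000, 9_000, 14_000};
        // prints [1, 2, 3, 4, 5, 4]
        System.out.println(Arrays.toString(countsOverPreceding(timestamps, 10_000)));
    }
}
```

Unlike a tumbling window, the result is updated with every event, and only the events inside the range need to be kept as state.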

Evaluating Non-monotonic Operators: Session Windows

stream
    .keyBy(<key selector>)
    .window(EventTimeSessionWindows.withGap(Time.minutes(10)))
    .<windowed transformation>(<window function>);

(Figure: events t1, t3, t5, t6 and t8, t9 grouped into two sessions separated by the session gap)
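The idea behind session windows can be sketched without Flink: events belong to the same session as long as the pause between consecutive events stays within the gap. The class below is a hypothetical illustration. It assumes a single key, timestamps and gap in the same unit, and that a new session starts only when the pause strictly exceeds the gap; a real engine may treat the exact boundary differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; handles a single key only. */
final class SessionWindowSketch {

    /** Splits event timestamps into sessions separated by pauses longer than the gap. */
    static List<List<Long>> sessions(long[] eventTimes, long gap) {
        long[] sorted = eventTimes.clone();
        Arrays.sort(sorted);
        List<List<Long>> result = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long t : sorted) {
            if (!current.isEmpty() && t - current.get(current.size() - 1) > gap) {
                result.add(current);          // pause exceeded the gap: close the session
                current = new ArrayList<>();
            }
            current.add(t);
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        // Timestamps in minutes, gap of 10 minutes.
        // prints [[1, 3, 5, 6], [20, 22]]
        System.out.println(sessions(new long[] {1, 3, 5, 6, 20, 22}, 10));
    }
}
```

Session windows have no fixed size: their length depends on the data, which is why the gap is the only parameter.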

Evaluating Unbounded Queries

SELECT STREAM *
FROM S AS s JOIN S' AS t
ON s.color = t.color

SELECT STREAM *
FROM S OVER w AS s JOIN S' OVER w AS t
ON s.color = t.color
WINDOW w AS (RANGE INTERVAL '10' SECOND PRECEDING);

(Figure: timelines of the two streams S and S', carrying events t1 through t9)
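Why the window makes the join tractable can be seen in a small simulation: without a window, every event of both streams would have to be kept forever, whereas a window allows old events to be evicted. The sketch below is hypothetical (class, `Event` type, and output format are invented for this example). It assumes the merged input is ordered by time and that two events pair up when they share a color and are at most one window length apart.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper for illustration; not the Kinesis Analytics join implementation. */
final class WindowedJoinSketch {

    /** Minimal event: source stream, color, and timestamp in milliseconds. */
    static final class Event {
        final boolean fromS; // true for S, false for S'
        final String color;
        final long time;

        Event(boolean fromS, String color, long time) {
            this.fromS = fromS;
            this.color = color;
            this.time = time;
        }
    }

    /** Joins S and S' on color; returns entries like "red@1000+4000" (time in S, time in S'). */
    static List<String> join(List<Event> eventsInTimeOrder, long windowMillis) {
        Deque<Event> bufferS = new ArrayDeque<>();
        Deque<Event> bufferSPrime = new ArrayDeque<>();
        List<String> output = new ArrayList<>();
        for (Event e : eventsInTimeOrder) {
            // Evicting expired events is what keeps the join state bounded.
            evict(bufferS, e.time - windowMillis);
            evict(bufferSPrime, e.time - windowMillis);
            for (Event other : (e.fromS ? bufferSPrime : bufferS)) {
                if (other.color.equals(e.color)) {
                    Event s = e.fromS ? e : other;
                    Event t = e.fromS ? other : e;
                    output.add(s.color + "@" + s.time + "+" + t.time);
                }
            }
            (e.fromS ? bufferS : bufferSPrime).addLast(e);
        }
        return output;
    }

    private static void evict(Deque<Event> buffer, long minTime) {
        while (!buffer.isEmpty() && buffer.peekFirst().time < minTime) {
            buffer.removeFirst();
        }
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
            new Event(true, "red", 1_000),
            new Event(false, "red", 4_000),
            new Event(true, "black", 5_000),
            new Event(false, "red", 15_000),
            new Event(false, "black", 16_000));
        // prints [red@1000+4000]
        System.out.println(join(events, 10_000));
    }
}
```

The later red and black events find no partner because their counterparts have already left the 10-second window.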

Different Time Semantics

Maintaining Order of Events

(Figure: events t1, t3, t7, t8, t11 shown on an event-time axis and on a processing-time axis; in processing time, t7 arrives out of order, after t8)

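Maintaining Order of Events: Using processing time based windows

(Figure: events t1, t3, t8, t7, t11 in processing-time order; result tables with the columns "processing time" and "count", first for the window starting at 0, then for the window starting at 10)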

Maintaining Order of Events: Using multiple time-windows

SELECT STREAM
    STEP(rowtime BY INTERVAL '10' SECOND) AS processing_time,
    STEP(event_time BY INTERVAL '10' SECOND) AS event_time,
    color,
    COUNT(1)
FROM ...
GROUP BY processing_time, event_time, color;

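Maintaining Order of Events: Using multiple time-windows

(Figure: events t1, t3, t8, t7, t11 in processing-time order; result tables with the columns "processing time", "event time", and "count". The first table has a row for (0, 0); the second has rows for (10, 0) and (10, 10).)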

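Maintaining Order of Events: Using event time and watermarks

(Figure: events t1, t3, t8, t7, t11 in processing-time order with watermarks 10 and 20; result tables with the columns "event time" and "count", first with a row for window 0, then with rows for windows 10 and 0)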

Adding Watermarks to a Stream

- Periodic watermarks
- Assuming ascending timestamps
- Punctuated watermarks

stream.assignTimestampsAndWatermarks(new AscendingTimestampExtractor<MyEvent>() {
    @Override
    public long extractAscendingTimestamp(MyEvent element) {
        // Use the creation time carried by the event as its event-time timestamp.
        return element.getCreationTime();
    }
});
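How a watermark decides when an event-time window may fire can be sketched in a few lines of plain Java. This is a simplified model, not Flink's implementation: the class name, the output strings, and the rule "watermark = highest timestamp seen so far minus a fixed out-of-orderness bound" are assumptions made for this example. Setting the bound to 0 corresponds to the ascending-timestamps case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, simplified model of event-time tumbling windows driven by a watermark. */
final class WatermarkSketch {

    /** Processes events in arrival order and reports fired windows and late events. */
    static List<String> run(long[] eventTimesInArrivalOrder, long windowSize, long maxOutOfOrderness) {
        TreeMap<Long, Integer> openWindows = new TreeMap<>(); // window start -> count
        List<String> log = new ArrayList<>();
        long maxSeen = Long.MIN_VALUE;
        long watermark = Long.MIN_VALUE;
        for (long t : eventTimesInArrivalOrder) {
            long start = t - Math.floorMod(t, windowSize);
            if (start + windowSize <= watermark) {
                log.add("late event=" + t); // its window has already fired
                continue;
            }
            openWindows.merge(start, 1, Integer::sum);
            maxSeen = Math.max(maxSeen, t);
            watermark = maxSeen - maxOutOfOrderness;
            // Fire every window that ends at or before the watermark.
            while (!openWindows.isEmpty() && openWindows.firstKey() + windowSize <= watermark) {
                Map.Entry<Long, Integer> fired = openWindows.pollFirstEntry();
                log.add("window=" + fired.getKey() + " count=" + fired.getValue());
            }
        }
        return log;
    }

    public static void main(String[] args) {
        // t7 arrives after t8 but is still counted, because the watermark lags by 2.
        // prints [window=0 count=4, late event=5]
        System.out.println(run(new long[] {1, 3, 8, 7, 11, 13, 5}, 10, 2));
    }
}
```

A larger out-of-orderness bound tolerates more disorder but delays results, since windows fire later.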

Different Processing Semantics

Photo by Dominic Alves

https://www.flickr.com/photos/dominicspics/6854063597/

Consuming Data from a Stream

(Figure: a consumer reads from the stream and writes to an output sink)

Different Processing Semantics: At-most Once Semantics

(Figure: consumer, output sink, and offset store; the positions shown are pos 561 and pos 1105)

Different Processing Semantics: At-least Once Semantics

(Figure: consumer, output sink, and offset store; the positions shown are pos 561 and pos 0)
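The difference between at-most-once and at-least-once processing comes down to whether the consumer stores its position before or after writing to the sink. The following simulation is a hypothetical sketch: the class, the per-record commit, and the single simulated crash are assumptions for illustration, not how a particular consumer library works.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical simulation of a consumer that crashes once and restarts from the offset store. */
final class DeliverySemanticsSketch {

    /**
     * Consumes all records and returns what reached the sink.
     * The consumer crashes once, at record index crashAt, between its two steps.
     */
    static List<String> consume(List<String> records, boolean commitBeforeProcessing, int crashAt) {
        List<String> sink = new ArrayList<>();
        int storedOffset = 0;    // durable position in the offset store
        boolean crashed = false; // the simulated failure happens at most once
        int pos = 0;
        while (pos < records.size()) {
            boolean crashNow = !crashed && pos == crashAt;
            if (commitBeforeProcessing) {
                storedOffset = pos + 1;              // commit first
                if (!crashNow) {
                    sink.add(records.get(pos));      // output is lost if the crash hits here
                }
            } else {
                sink.add(records.get(pos));          // write first
                if (!crashNow) {
                    storedOffset = pos + 1;          // commit is lost if the crash hits here
                }
            }
            if (crashNow) {
                crashed = true;
                pos = storedOffset;                  // restart from the last stored position
            } else {
                pos++;
            }
        }
        return sink;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "c", "d");
        System.out.println(consume(records, true, 2));  // prints [a, b, d]
        System.out.println(consume(records, false, 2)); // prints [a, b, c, c, d]
    }
}
```

Committing first loses record "c" (at-most-once); writing first delivers it twice (at-least-once).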

Different Processing Semantics: Exactly-once Semantics

Message Deduplication

• At-least-once event delivery plus message deduplication
• Keep a transaction log of processed messages
• On failure, replay events and remove duplicated events for every operator

Distributed Snapshots

• State for each operator is periodically checkpointed
• On failure, rewind operator to the previous consistent state
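The message deduplication approach can be illustrated with a sink that remembers which message ids it has already applied, so that replayed events have no further effect. This is a minimal in-memory sketch: the class name is hypothetical, each message is assumed to carry a unique id, and the `HashSet` merely stands in for a durable transaction log.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sink that turns at-least-once delivery into exactly-once effects. */
final class DeduplicatingSinkSketch {

    private final Set<String> processedIds = new HashSet<>(); // stands in for the transaction log
    private final List<String> output = new ArrayList<>();

    /** Applies the message once; returns false if this id was processed before. */
    boolean accept(String messageId, String payload) {
        if (!processedIds.add(messageId)) {
            return false; // duplicate from a replay: drop it
        }
        output.add(payload);
        return true;
    }

    List<String> output() {
        return output;
    }

    public static void main(String[] args) {
        DeduplicatingSinkSketch sink = new DeduplicatingSinkSketch();
        sink.accept("1", "a");
        sink.accept("2", "b");
        sink.accept("2", "b"); // replayed after a failure
        sink.accept("3", "c");
        System.out.println(sink.output()); // prints [a, b, c]
    }
}
```

In a real system the log and the output would have to be updated atomically, which is where most of the cost of this approach lies.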

Go Build!

Please complete the session survey in the summit mobile app.

Thank you!

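Watermarks and Allowed Lateness

stream
    .keyBy(<key selector>)
    .window(<window assigner>)
    .allowedLateness(<time>)
    .sideOutputLateData(lateOutputTag)

(Figure: events t3, t1, t8, t4, and t5 on a processing-time axis, illustrating events that arrive late relative to the watermark)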
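The effect of allowed lateness can be summarized as a three-way decision per event. The sketch below is a simplified model with hypothetical names: it assumes tumbling windows, and it treats an event as on time while the watermark is before the window end, as late but accepted within the allowed lateness, and as side output afterwards. Flink's exact boundary conditions differ slightly.

```java
/** Hypothetical, simplified model of allowed lateness for tumbling event-time windows. */
final class AllowedLatenessSketch {

    enum Outcome { ON_TIME, LATE_BUT_ACCEPTED, SIDE_OUTPUT }

    /** Decides what happens to an event, given the watermark at the moment it arrives. */
    static Outcome classify(long eventTime, long watermark, long windowSize, long allowedLateness) {
        long windowEnd = eventTime - Math.floorMod(eventTime, windowSize) + windowSize;
        if (watermark < windowEnd) {
            return Outcome.ON_TIME;            // window has not fired yet
        }
        if (watermark < windowEnd + allowedLateness) {
            return Outcome.LATE_BUT_ACCEPTED;  // window fires again with an updated result
        }
        return Outcome.SIDE_OUTPUT;            // window state is gone; route to the late-data output
    }

    public static void main(String[] args) {
        // Window [0, 10), allowed lateness 5.
        System.out.println(classify(8, 9, 10, 5));  // prints ON_TIME
        System.out.println(classify(8, 12, 10, 5)); // prints LATE_BUT_ACCEPTED
        System.out.println(classify(8, 15, 10, 5)); // prints SIDE_OUTPUT
    }
}
```

Allowed lateness trades state for completeness: window state must be retained for the extra period so that late events can still update the result.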