Upload
others
View
5
Download
0
Embed Size (px)
Citation preview
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Dr. Steffen Hausmann
Sr. Solutions Architect, Amazon Web Services
Deep Dive into Concepts and Tools for
Analyzing Streaming Data
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Data originates in real-time
Photo by mountainamoeba
https://www.flickr.com/photos/mountainamoeba/2527300028/
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Analytics is done in batches
Photo by PracticalHacks
https://www.flickr.com/photos/29225844@N05/2828724211
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Insights are Perishable
Photo by Lucas Cobb
https://www.flickr.com/photos/cobblucas/4780005097/
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Analyzing Streaming Data on AWS
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Challenges of Stream Processing
Photo by FollowYour Nose
https://www.flickr.com/photos/laprimadonna/3294467673
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Comparing Streams and Relations
𝑅 ⊆ 𝐼𝑑 × 𝐶𝑜𝑙𝑜𝑟
Relation
𝑆 ⊆ 𝐼𝑑 × 𝐶𝑜𝑙𝑜𝑟 × 𝑇𝑖𝑚𝑒
Stream
7
now
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Querying Streams and Relations
Relation Stream
Fixed data and ad-hoc queries
Fixed queries and
continuously ingested data
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Challenges of Querying Infinite Streams
SELECT * FROM S WHERE color = ‘black’
SELECT * FROM S JOIN S’
SELECT color, COUNT(1) FROM S GROUP BY color
... NOT EXISTS (SELECT * FROM S WHERE color = ‘red’)
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Analyzing Streaming Data on AWS
• Runs standard SQL queries on
top of streaming data
• Fully managed and scales
automatically
• Only pay for the resources your
queries consume
Amazon Kinesis Analytics
• Open-source stream processing
framework
• Included in Amazon Elastic Map
Reduce (EMR)
• Flexible APIs with Java and
Scalar, SQL, and CEP support
Apache Flink
SQL
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Evaluating Queries over Streams
Photo by Brad Greenlee
https://www.flickr.com/photos/bgreenlee/91309374/
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Evaluating Non-monotonic OperatorsTumbling Windows
SELECT STREAM color, COUNT(1)
FROM ...
GROUP BY STEP(rowtime BY INTERVAL ‘10’ SECOND), color;
t1 t3 t5 t6 t9
10 sec
SQL
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Evaluating Non-monotonic OperatorsSliding Windows
SELECT STREAM color, COUNT(1) OVER w
FROM ...
GROUP BY color
WINDOW w AS (RANGE INTERVAL ’10’ SECOND PRECEDING);
t1 t3 t5 t6 t9
SQL
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Evaluating Non-monotonic OperatorsSession Windows
t5 t6t1 t3 t8 t9
stream.keyBy(<key selector>).window(EventTimeSessionWindows.withGap(Time.minutes(10))).<windowed transformation>(<window function>);
session gap
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
SELECT STREAM *
FROM S AS s JOIN S’ AS t
ON s.color = t.color
SELECT STREAM *
FROM S OVER w AS s JOIN S’ OVER w AS t
ON s.color = t.color
WINDOW w AS (RANGE INTERVAL ‘10’ SECOND PRECEDING);
Evaluating Unbounded Queries
t2 t4 t8t7
t1 t3 t5 t6 t9
S
S‘
SQL
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Different Time Semantics
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Maintaining Order of Events
t1 t3 t8t7
Event Time
t1 t3 t8 7
Processing Time
t7
t11
t11
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Maintaining Order of EventsUsing processing time based windows
t1 t3 t8 t7
Processing
Time
processing
time
count
0
processing
time
count
10
t11
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Maintaining Order of EventsUsing multiple time-windows
SELECT STREAM
STEP(rowtime BY INTERVAL ’10’ SECOND) AS processing_time,
STEP(event_time BY INTERVAL ’10’ SECOND) AS event_time,
color,
COUNT(1)
FROM ...
GROUP BY processing_time, event_time, color;
SQL
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Maintaining Order of EventsUsing multiple time-windows
t1 t3 t8 t7
Processing
Time
processing
time
event time count
0 0
processing
time
event time count
10 0
10 10
t11
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Maintaining Order of EventsUsing event time and watermarks
t1 t3 t8 t710 20
event time count
0
event time count
10
0
Processing
Time
t11
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Adding Watermarks to a Stream
- Periodic watermarks
- Assuming ascending timestamps
- Punctuated watermarks
stream.assignTimestampsAndWatermarks(new AscendingTimestampExtractor<MyEvent>() {
@Overridepublic long extractAscendingTimestamp(MyEvent element) {
return element.getCreationTime();}
});
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Different Processing Semantics
Photo by Dominic Alves
https://www.flickr.com/photos/dominicspics/6854063597/
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Consuming Data from a Stream
Consumer
Output sink
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Different Processing SemanticsAt-most Once Semantics
Consumer
Output sink
Offset store
pos 561
pos 561
pos 1105
pos 1105
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Different Processing SemanticsAt-least Once Semantics
Consumer
Output sink
Offset store
pos 561
pos 0
pos 0
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Different Processing SemanticsExactly-once Semantics
• At-least-once event delivery plus
message deduplication
• Keep a transaction log of
processed messages
• On failure, replay events and
remove duplicated events for
every operator
Message Deduplication
• State for each operator is
periodically checkpointed
• On failure, rewind operator to
the previous consistent state
Distributed Snapshots
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Go Build!
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Please complete the session
survey in the summit mobile app.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Thank you!
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Watermarks and Allowed Lateness
t3 t1 t8 t480
Processing
Time
stream.keyBy(<key selector>).window(<window assigner>).allowedLateness(<time>).sideOutputLateData(lateOutputTag)
t5