Event Hub & Azure Stream Analytics

Davide Mauri

Join the conversation on Twitter: @DevWeek // #DW2016 // #DevWeek

About Me
• Microsoft SQL Server MVP
• Works with SQL Server since 6.5, on BI since 2003
• Specialized in Data Solution Architecture, Database Design, Performance Tuning, High-Performance Data Warehousing, BI, Big Data
• President of UGISS (Italian SQL Server UG)
• Regular speaker @ SQL Server events
• Consulting & Training, Mentor @ SolidQ
• E-mail: [email protected]
• Twitter: @mauridb
• Blog: http://sqlblog.com/blogs/davide_mauri/default.aspx

Agenda
• Complex Event Processing
• The Lambda Architecture
• Azure Stream Analytics
  • Data Ingestion
  • Azure Stream Analytics Query Language
  • Advanced Features
• Additional Resources
• Conclusions

Complex Event Processing
• Event processing is a method of tracking and analyzing (processing) streams of information (data) about things that happen (events)
• Complex event processing, or CEP, is event processing that combines data from multiple sources to infer events or patterns that suggest more complicated circumstances
• Started to appear around 1990
• Goal: identify meaningful events (such as opportunities or threats) and respond to them as quickly as possible
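As a rough illustration of "combining data from multiple sources to infer a pattern", here is a minimal Python sketch. Everything in it is hypothetical: the two sources ("auth" and "payments"), the event kinds and the rule itself (a withdrawal that follows several failed logins on the same account within a short time) are assumptions made only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str   # hypothetical source name: "auth" or "payments"
    account: str
    kind: str     # hypothetical event kind: "login_failed" or "withdrawal"
    time: float   # seconds since an arbitrary start


def infer_suspicious(events, max_gap=60.0, min_failures=3):
    """Return (account, time) pairs where a withdrawal follows at least
    `min_failures` failed logins on the same account within `max_gap` seconds."""
    alerts = []
    failures = {}  # account -> times of the failed logins seen so far
    for ev in sorted(events, key=lambda e: e.time):
        if ev.kind == "login_failed":
            failures.setdefault(ev.account, []).append(ev.time)
        elif ev.kind == "withdrawal":
            recent = [t for t in failures.get(ev.account, [])
                      if ev.time - t <= max_gap]
            if len(recent) >= min_failures:
                alerts.append((ev.account, ev.time))
    return alerts


events = [
    Event("auth", "alice", "login_failed", 1.0),
    Event("auth", "alice", "login_failed", 5.0),
    Event("auth", "alice", "login_failed", 9.0),
    Event("payments", "alice", "withdrawal", 30.0),
    Event("auth", "bob", "login_failed", 2.0),
    Event("payments", "bob", "withdrawal", 10.0),
]
alerts = infer_suspicious(events)
print(alerts)  # [('alice', 30.0)]
```

Neither source alone reveals the pattern; the "complex" event only appears once the two streams are looked at together and in time order.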

Complex Event Processing Use Cases
• Network monitoring
• Intelligence and surveillance
• Risk management
• E-commerce
• Fraud detection
• Smart order routing
• Transaction cost analysis
• Pricing and analytics
• Market data management
• Algorithmic trading
• Data warehouse augmentation
Ref: http://www.infoq.com/articles/stream-processing-hadoop

The Lambda Architecture
Generic, scalable and fault-tolerant data processing architecture […] in which low-latency reads and updates are required.
Ref: http://lambda-architecture.net/
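The Lambda Architecture is commonly described as three layers: a batch layer that recomputes views over all the historical data, a speed layer that covers the most recent events, and a serving layer that merges the two at query time. The Python sketch below only illustrates that idea; the function names and the event shape (a dict with a "key" field) are assumptions made for the example.

```python
from collections import Counter


def batch_view(historical_events):
    """Batch layer: recompute the count per key over all historical data."""
    return Counter(e["key"] for e in historical_events)


def speed_view(recent_events):
    """Speed layer: count per key for events not yet absorbed by the batch layer."""
    return Counter(e["key"] for e in recent_events)


def query(batch, speed, key):
    """Serving layer: merge both views to answer a read with low latency."""
    return batch[key] + speed[key]


historical = [{"key": "a"}, {"key": "a"}, {"key": "b"}]
recent = [{"key": "a"}]

batch = batch_view(historical)
speed = speed_view(recent)
print(query(batch, speed, "a"))  # 3
```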

Hadoop but not only that!
• The Apache Hadoop ecosystem is the typical solution nowadays
  • “Mature” option
    • Flume (optional collector and streaming data movement system)
    • Kafka (distributed messaging system)
    • Storm (distributed real-time computation system)
  • “Innovative” option
    • Spark + Spark Streaming
• Very powerful, but very complex

Why the Cloud? And why Azure?
• Due to the high scalability and computing power that a streaming solution may require, the cloud is a perfect environment for it
• Very cheap and very simple to start a project
• Very well integrated with all other Azure offerings
  • From Monitoring to Power BI

Stream analytics
• Real-time (to some extent) complex event processing engine
• Enables real-time event processing in a very simple and cheap way
  • SQL-like language
  • Temporal semantics support
• Different from SQL Server 2016
  • Specific for streaming data
• Azure only at present

Stream analytics
• Platform-as-a-Service
• Can handle millions of events per second
  • Based on the REEF project (now Apache incubated)
• Main objects: Job, Query, Functions, Inputs & Outputs
  • Totally manageable from a REST interface
• “Streaming Units” are the base concept to manage performance, scalability and costs
  • Roughly 1 Streaming Unit = 1 MB/sec of throughput
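The "roughly 1 Streaming Unit = 1 MB/sec" rule of thumb lends itself to a quick back-of-the-envelope estimate. The Python sketch below is only that: the function name is hypothetical, a megabyte is assumed to be 10^6 bytes, and real sizing also depends on query complexity, so treat the result as a starting point.

```python
import math

BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes


def estimate_streaming_units(events_per_second, avg_event_bytes, mb_per_su=1.0):
    """Estimate Streaming Units from the rule of thumb 1 SU ~ 1 MB/sec.

    Rounds up and never returns less than one unit.
    """
    throughput_mb = events_per_second * avg_event_bytes / BYTES_PER_MB
    return max(1, math.ceil(throughput_mb / mb_per_su))


# 10,000 events/sec of 500 bytes each is 5 MB/sec, so about 5 units.
print(estimate_streaming_units(10_000, 500))  # 5
```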

Stream analytics - Data ingestion
• Inputs for Stream Analytics
  • Streaming sources (“data in motion”)
    • JSON, CSV or AVRO
  • Reference data (“data at rest”)
    • JSON or CSV
    • Blob Store (max 50 MB)
• Streaming sources
  • Event Hubs
  • IoT Hub
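To make the difference between "data in motion" and "data at rest" concrete, the following Python sketch parses a JSON event and enriches it with a lookup against a small CSV reference set. It runs locally and does not talk to Azure; the field names (device_id, temperature, location) and the sample data are assumptions made for the example.

```python
import csv
import io
import json

# Hypothetical reference data ("data at rest"), as it could sit in a CSV blob.
REFERENCE_CSV = """device_id,location
d1,Milan
d2,London
"""


def load_reference(csv_text):
    """Index the reference rows by device_id for fast lookups."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["device_id"]: row for row in reader}


def enrich(raw_event, reference):
    """Parse one JSON event ("data in motion") and attach its location, if known."""
    event = json.loads(raw_event)
    ref = reference.get(event["device_id"], {})
    event["location"] = ref.get("location")
    return event


reference = load_reference(REFERENCE_CSV)
enriched = enrich('{"device_id": "d1", "temperature": 21.5}', reference)
print(enriched)
```

The reference set is small and changes rarely, which is why it can simply be loaded into memory and joined against every incoming event.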

Stream analytics - High-Level Architecture
[Diagram: INPUT (source of events: Azure Blob Storage, Azure Event Hubs; reference data) → Stream Analytics query → OUTPUT (result of query: Azure SQL DB, Azure Event Hubs, Azure Blob Storage, other Azure stuff)]
• The query runs continuously against the incoming stream of events
• Events have a defined schema and are temporal (sequenced in time)

Data ingestion
• A nice tool to monitor Event Hub is the “Service Bus Explorer”
  • https://github.com/paolosalvatori/ServiceBusExplorer

DEMO
Simple setup of Event Hubs, source and destination

Stream Analytics Query Engine
• Takes data from one or more inputs
• Sends the resulting data to one or more outputs
• Supports the most common data types:
  • bigint, float, Unicode strings, datetime
  • key-value pairs
  • arrays

Stream Analytics Query Language
• Stream Analytics Query Language Reference
  • https://msdn.microsoft.com/library/azure/dn834998.aspx
• Subset of T-SQL
• With specific temporal extensions
  • The time values to be used can be set with the TIMESTAMP BY directive
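The idea behind TIMESTAMP BY is choosing which time an event "happened" at: the time it arrived at the service, or a time carried inside the event itself. The Python sketch below mimics that choice outside of Stream Analytics; the function name and the "created" field are hypothetical, and the payload time is assumed to be an ISO 8601 string.

```python
from datetime import datetime, timezone


def event_time(event, arrival_time, timestamp_field=None):
    """Return the time used to order and window an event.

    Without a timestamp field the arrival time is used; with one, the
    time comes from the event payload (the idea behind TIMESTAMP BY).
    """
    if timestamp_field is None:
        return arrival_time
    return datetime.fromisoformat(event[timestamp_field])


arrival = datetime(2016, 4, 20, 10, 0, 5, tzinfo=timezone.utc)
event = {"created": "2016-04-20T10:00:00+00:00", "value": 42}

print(event_time(event, arrival))             # arrival time
print(event_time(event, arrival, "created"))  # time carried by the event
```

The two times can differ by seconds or more when events are buffered or delayed, which changes the window an event falls into.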

Stream Analytics Query Language
DML Statements
• SELECT
• FROM
• WHERE
• GROUP BY
• HAVING
• CASE
• JOIN
• UNION
Windowing Extensions
• Tumbling Window
• Hopping Window
• Sliding Window
• Duration
Aggregate Functions
• SUM
• COUNT
• AVG
• MIN
• MAX
Scaling Functions
• WITH
• PARTITION BY
Date and Time Functions
• DATENAME
• DATEPART
• DAY
• MONTH
• YEAR
• DATETIMEFROMPARTS
• DATEDIFF
• DATEADD
String Functions
• LEN
• CONCAT
• CHARINDEX
• SUBSTRING
• PATINDEX
Statistical Functions
• VAR/VARP
• STDEV/STDEVP

DEMO
Stream Analytics Query in action

Advanced features
• Partitioning support
  • Especially useful for high scalability
• CTE-like constructs that also help scaling out
• Temporal aggregations
  • Tumbling, Hopping and Sliding Windows
• (Temporal) join between input streams
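A temporal join differs from a regular join in that two events match only if they are also close enough in time. The Python sketch below shows the idea with a naive nested loop over two in-memory lists; the event shape (dicts with "key" and "time" in seconds) and the function name are assumptions made for the example, not how the service implements it.

```python
def temporal_join(left, right, max_seconds):
    """Pair events that share a key and whose times differ by at most
    `max_seconds`. Naive O(n*m) scan, fine for a sketch."""
    matches = []
    for l in left:
        for r in right:
            same_key = l["key"] == r["key"]
            close_in_time = abs(l["time"] - r["time"]) <= max_seconds
            if same_key and close_in_time:
                matches.append((l, r))
    return matches


# Hypothetical streams: requests and the responses that should follow them.
requests = [{"key": "k1", "time": 10}, {"key": "k2", "time": 50}]
responses = [{"key": "k1", "time": 14}, {"key": "k2", "time": 90}]

pairs = temporal_join(requests, responses, max_seconds=5)
print(pairs)  # only the k1 pair: the k2 response arrives 40 seconds later
```

The time bound is what keeps a join over unbounded streams tractable: events older than the bound can never match and need not be kept.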

Tumbling window
• Adjacent, non-overlapping windows
• Answers the question: “What happened in the last X seconds? And in the next X? And in the next X?” And so on…
[Figure: a 20-second Tumbling Window over a 0 to 60 second timeline (Time in secs)]
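A tumbling window can be sketched in a few lines of Python: each event is assigned to exactly one fixed-size bucket based on its time. The sketch assumes events are (time in seconds, value) pairs and treats each window as half-open, [start, start + size); the boundary convention of the actual service may differ, and the sample events are made up.

```python
from collections import defaultdict


def tumbling_windows(events, size):
    """Group (time, value) events into adjacent windows of `size` seconds.

    The window starting at s covers [s, s + size), so every event
    belongs to exactly one window.
    """
    windows = defaultdict(list)
    for time, value in events:
        start = (time // size) * size
        windows[start].append(value)
    return dict(sorted(windows.items()))


events = [(2, 1), (7, 5), (12, 4), (18, 2), (25, 8), (31, 6), (44, 5)]
print(tumbling_windows(events, size=20))
# {0: [1, 5, 4, 2], 20: [8, 6], 40: [5]}
```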

Hopping window
• Overlapping windows
• Answers the question: “Every X seconds, tell me what happened in the previous Y seconds”
• The same event can be in more than one window
• Think of a “moving average”
[Figure: a 20-second Hopping Window with a 10-second “Hop” over a 0 to 60 second timeline]
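The overlapping behaviour of a hopping window is easy to see in a small Python sketch. It assumes (time in seconds, value) events with time starting at 0 and half-open windows, [start, start + size); these conventions and the sample events are assumptions made for the example.

```python
def hopping_windows(events, size, hop):
    """Build windows of `size` seconds that start every `hop` seconds.

    The window starting at s covers [s, s + size). When hop < size the
    windows overlap, so one event can land in several of them.
    """
    if not events:
        return {}
    last = max(time for time, _ in events)
    windows = {}
    start = 0
    while start <= last:
        windows[start] = [v for t, v in events if start <= t < start + size]
        start += hop
    return windows


events = [(2, 1), (12, 4), (25, 8), (31, 6)]
windows = hopping_windows(events, size=20, hop=10)
print(windows)
# {0: [1, 4], 10: [4, 8], 20: [8, 6], 30: [6]}

# The "moving average" reading: one average per window.
averages = {start: sum(vals) / len(vals) for start, vals in windows.items() if vals}
print(averages)
```

The event at second 12 shows up in both the window starting at 0 and the one starting at 10, which is exactly what a tumbling window never does.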

Sliding window
• A forward-moving window: every time something happens, you get the data of what happened in the last “X” seconds
[Figure: a 20-second Sliding Window over a 0 to 50 second timeline (Time in secs)]
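A sliding window is driven by the events themselves rather than by a clock. The Python sketch below is a simplification that produces one result per arriving event, containing everything seen in the previous `size` seconds (the event itself included); the actual service may also emit results at other moments, for example when an event leaves the window. Events are assumed to be (time in seconds, value) pairs.

```python
def sliding_windows(events, size):
    """For each event, list the values seen in the `size` seconds up to
    and including that event, i.e. times in (time - size, time]."""
    ordered = sorted(events)
    results = []
    for time, _ in ordered:
        in_window = [v for t, v in ordered if time - size < t <= time]
        results.append((time, in_window))
    return results


events = [(5, 1), (12, 5), (30, 8), (33, 9)]
print(sliding_windows(events, size=20))
# [(5, [1]), (12, [1, 5]), (30, [5, 8]), (33, [8, 9])]
```

Unlike tumbling and hopping windows, nothing is produced while no events arrive.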

DEMO
Stream Analytics Full Power!

Stream analytics and machine learning
• Apply an AzureML model to streaming data
• Sample use cases
  • Fraud Detection
  • Product Recommendation
  • Customer Sentiment Analysis
  • Maintenance Prediction
• Right now in preview and available only through the “old” portal
  • https://manage.windowsazure.com/

DEMO
Stream Analytics & Machine Learning

Stream analytics alternative (on Azure)
• Apache Storm
  • IaaS or PaaS (with HDInsight)
  • Much more complex to manage and develop… but much more powerful
  • https://azure.microsoft.com/en-us/documentation/articles/stream-analytics-comparison-storm/

Stream analytics on-premises?
• Apache Hadoop ecosystem
  • Flume / Kafka / Storm
• StreamInsight
  • CEP solution part of the SQL Server platform
• EventStore
  • JavaScript open-source CEP
• None of them (except EventStore) has a native temporal extension

Additional resources
• Online Documentation
• Stream Analytics Reference Architecture
• Lambda Architecture
• GitHub Repository

Thanks!
Questions?

Demos available on GitHub
https://github.com/yorek/devweek2016