IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on Apache Ignite

  • Published on
    09-Jan-2017


Transcript


Test Driving Streaming and CEP on Apache Ignite
Matt Coventon
See all the presentations from the In-Memory Computing Summit at http://imcsummit.org

About me
  • Big Data Services Lead at Innovative Software Engineering
  • http://www.iseinc.biz
  • mattcoventon@iseinc.biz

What are we going to do today?
  • An overview of Apache Ignite streaming and CEP
  • Dive into some code! A simple streaming/CEP use case

Apache Ignite streaming and CEP: overview

What is streaming?
Most commonly, streaming refers to processing unbounded data sets as they arrive, to achieve lower latency and therefore more timely results.
If you haven't already, read these helpful posts that clarify the terms, techniques, and design patterns:
  • https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
  • https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102

What is CEP?
Complex event processing, or CEP, is event processing that combines data from multiple sources to infer events or patterns that suggest more complicated circumstances. The goal of complex event processing is to identify meaningful events (such as opportunities or threats) and respond to them as quickly as possible (https://en.wikipedia.org/wiki/Complex_event_processing).

In the Apache Ignite context
Apache Ignite In-Memory Data Fabric is a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real-time, orders of magnitude faster than possible with traditional disk-based or flash technologies.

Apache Ignite streaming
  • Primarily a high-performance means of inserting unbounded data sets into the Ignite Data Grid (cache) using the IgniteDataStreamer API
  • The StreamReceiver API offers custom pre-processing
  • Other data processing through queries (including continuous queries) and cache policies
  • Backed by all kinds of Ignite goodness:
      • Scalable
      • Fault-tolerant
      • High throughput
  • Streaming functionality atop a convergent data platform: the future is bright!

Ignite data streamer API

Ignite data streamer API
  • The IgniteDataStreamer API is the basic building block for writing unbounded data to Ignite
  • Scalable
  • Fault-tolerant
  • At-least-once guarantee (watch out for duplicate data)
  • Buffers data and writes in batches (this may introduce unwanted latency; set perNodeBufferSize() and autoFlushFrequency() accordingly)
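To make the buffering trade-off concrete, here is a minimal conceptual sketch in plain Python. It is not the Ignite API: `BufferedStreamer`, `buffer_size`, and `flush_interval_s` are hypothetical names that only mirror the spirit of perNodeBufferSize() and autoFlushFrequency(). Writes go into a dict keyed by (line, second), an assumed key scheme, which is one way a redelivered entry under at-least-once delivery ends up overwriting itself instead of creating a duplicate.

```python
class BufferedStreamer:
    """Toy buffering writer; a conceptual stand-in, not the Ignite API."""

    def __init__(self, cache, buffer_size, flush_interval_s, clock):
        self.cache = cache                        # dict acting as the target cache
        self.buffer_size = buffer_size            # flush once this many entries are buffered
        self.flush_interval_s = flush_interval_s  # flush if this much time has passed
        self.clock = clock                        # injected so the demo is deterministic
        self.buffer = []
        self.last_flush = clock()

    def add_data(self, key, value):
        self.buffer.append((key, value))
        self.maybe_flush()

    def maybe_flush(self):
        # A real streamer would flush from a background timer; here the caller polls.
        full = len(self.buffer) >= self.buffer_size
        stale = self.clock() - self.last_flush >= self.flush_interval_s
        if self.buffer and (full or stale):
            for key, value in self.buffer:
                self.cache[key] = value           # keyed write: duplicates overwrite themselves
            self.buffer.clear()
            self.last_flush = self.clock()


now = [0.0]                                       # fake clock, in seconds
cache = {}
streamer = BufferedStreamer(cache, buffer_size=3, flush_interval_s=1.0,
                            clock=lambda: now[0])

streamer.add_data(("line-1", 0), 5)
streamer.add_data(("line-1", 1), 7)
visible_while_buffered = dict(cache)              # nothing written yet: this is the latency cost

now[0] = 1.5
streamer.maybe_flush()                            # time-based flush
visible_after_timed_flush = dict(cache)

streamer.add_data(("line-1", 1), 7)               # redelivered duplicate
streamer.add_data(("line-1", 2), 6)
streamer.add_data(("line-1", 3), 4)               # third buffered entry triggers a size-based flush
```

A smaller buffer or a shorter flush interval makes data visible sooner at the cost of more, smaller writes.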

Stream receiver API
  • The StreamReceiver API allows you to add custom, collocated pre-processing of the streaming data before it is put into the cache
  • Does not put data into the cache automatically; you need to handle that during processing
  • Single receiver per IgniteDataStreamer
  • Two out-of-the-box implementations of StreamReceiver:
      • StreamTransformer: updates data in the stream cache based on its previous value
      • StreamVisitor: visits every key-value tuple in the stream
  • Might be possible to implement watermark, trigger, and accumulation patterns (depending on use case, see https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102)
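The difference between the two receiver styles can be sketched in a few lines of plain Python. This is a conceptual illustration only, not the Ignite API: `transformer` and `visitor` are hypothetical helpers, and the cache is an ordinary dict. The transformer stores a value derived from the previous one; the visitor sees every tuple and stores nothing unless its callback chooses to.

```python
def transformer(update):
    """Build a receiver that stores update(previous_value, new_value) under each key."""
    def receive(cache, entries):
        for key, value in entries:
            cache[key] = update(cache.get(key), value)
    return receive


def visitor(visit):
    """Build a receiver that only calls visit(cache, key, value) for each tuple."""
    def receive(cache, entries):
        for key, value in entries:
            visit(cache, key, value)   # nothing is stored unless visit() does it
    return receive


entries = [("line-1", 5), ("line-2", 3), ("line-1", 7)]

# Transformer: keep a running total per line, based on the previous value.
totals = {}
running_total = transformer(lambda previous, new: (previous or 0) + new)
running_total(totals, entries)

# Visitor: look at every tuple, but only keep readings above a threshold.
high_readings = {}

def keep_if_high(cache, key, value):
    if value > 6:
        cache[key] = value

visitor(keep_if_high)(high_readings, entries)
```

Here `totals` ends up holding the accumulated count per line, while `high_readings` holds only the entries the callback decided to keep.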

Windowing
  • Achieved through cache eviction and expiry policies
  • Use eviction policies for size/batch-based windows
      • Consider SortedEvictionPolicy with a custom comparator for the x most recent events
  • Use expiry policies for time-based windows
      • Consider the notions of event time, ingestion time, and processing time
      • CreatedExpiryPolicy is ingestion-time based
      • What if data is delayed?
      • Consider a custom expiry policy based on event time
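The delayed-data question is easier to see with numbers. The sketch below is plain Python, not Ignite code: the `Event` fields and the `live` and `most_recent` helpers are hypothetical, and the 60-second window is an assumption for illustration. It compares which events survive a time window measured from ingestion time versus event time, and shows a size-based window that keeps the x most recent events.

```python
from dataclasses import dataclass

WINDOW_S = 60  # assumed window length in seconds


@dataclass(frozen=True)
class Event:
    line_id: str
    items: int
    event_time: float      # when the sensor took the reading
    ingestion_time: float  # when the reading reached the cache


def live(events, now, time_of, window_s=WINDOW_S):
    """Time-based window: keep events whose chosen timestamp is younger than window_s."""
    return [e for e in events if now - time_of(e) < window_s]


def most_recent(events, x):
    """Size-based window: keep the x events with the latest event time."""
    return sorted(events, key=lambda e: e.event_time, reverse=True)[:x]


on_time = Event("line-1", 5, event_time=100, ingestion_time=101)
delayed = Event("line-1", 9, event_time=30, ingestion_time=102)  # arrived 72 s late
events = [on_time, delayed]
now = 120

by_ingestion = live(events, now, lambda e: e.ingestion_time)
by_event_time = live(events, now, lambda e: e.event_time)
latest_one = most_recent(events, 1)
```

With an ingestion-time window the late reading is still counted as recent, even though it describes what happened 90 seconds ago; an event-time window drops it.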

Querying
  • All Ignite data indexing capabilities, as well as Ignite SQL, TEXT, and predicate-based cache queries, are available (it's just another cache after all)
  • Leverage continuous queries to filter events on the node and receive real-time notifications for events that match your criteria
  • Another option to implement watermark, trigger, and accumulation patterns
  • This is where the complex event processing (CEP) magic happens, leveraging distributed joins and cross-cache joins
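The continuous-query idea (register a filter once, then get notified for every matching update) can be sketched without any Ignite dependency. The class below is a hypothetical, single-process stand-in written in plain Python; the `ObservableCache` name, its methods, and the "fewer than 2 items per second" threshold are all assumptions for illustration.

```python
class ObservableCache:
    """Toy cache that notifies registered listeners about matching updates."""

    def __init__(self):
        self.data = {}
        self.queries = []

    def continuous_query(self, predicate, listener):
        # The filter runs where the data is written; only matches reach the listener.
        self.queries.append((predicate, listener))

    def put(self, key, value):
        self.data[key] = value
        for predicate, listener in self.queries:
            if predicate(key, value):
                listener(key, value)


alerts = []
cache = ObservableCache()

# Notify whenever a line reports fewer than 2 items in a second (assumed threshold).
cache.continuous_query(
    predicate=lambda key, items: items < 2,
    listener=lambda key, items: alerts.append((key, items)),
)

cache.put(("line-1", 0), 5)   # healthy reading, no notification
cache.put(("line-2", 0), 1)   # slow line, listener fires
cache.put(("line-1", 1), 0)   # stalled line, listener fires
```

Every reading is stored, but only the two slow readings are pushed to the listener, so the consumer never has to poll the cache.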

Dive into some code: a simple IoT use case

A simple IoT use case
  • Monitor productivity on manufacturing lines
  • Sensors stream the number of items per second through IgniteDataStreamer
  • Data is retained in the cache for 60 seconds (windowing)
  • Dashboard shows the number of items per minute for each active line and the total items per minute for the entire factory
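The dashboard numbers boil down to two SQL aggregations over the last 60 seconds of events. The sketch below uses Python's built-in sqlite3 purely as a stand-in for a SQL-queryable cache; the table name, column names, and sample rows are hypothetical, and the WHERE clause plays the role that the 60-second expiry policy plays in the use case above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE production_line_event ("
    "line_id TEXT, event_time INTEGER, items INTEGER)"
)

# One row per line per second: (line, second of the reading, items counted).
rows = [
    ("line-1", 10, 5),   # older than 60 s when now = 100, so outside the window
    ("line-1", 50, 4),
    ("line-1", 99, 6),
    ("line-2", 70, 3),
]
conn.executemany("INSERT INTO production_line_event VALUES (?, ?, ?)", rows)

now = 100
window_start = now - 60   # with an expiry policy, older rows would already be gone

# Items per minute for each active line.
per_line = conn.execute(
    "SELECT line_id, SUM(items) FROM production_line_event "
    "WHERE event_time > ? GROUP BY line_id ORDER BY line_id",
    (window_start,),
).fetchall()

# Total items per minute for the whole factory.
factory_total = conn.execute(
    "SELECT COALESCE(SUM(items), 0) FROM production_line_event "
    "WHERE event_time > ?",
    (window_start,),
).fetchone()[0]
```

A line that has reported nothing inside the window simply does not appear in `per_line`, which matches the "each active line" wording of the dashboard requirement.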

Ignite POM dependencies

Production line event

Cache config

Monitoring application main class

Monitoring application rest controller

Starting the Ignite node

H2 debug console with one line reporting

Dashboard

Thank you!
