Managing Real-time Data Streaming Effectively with CEP Solutions

Introduction

As Big Data has grown, so has the need to respond to it in near real-time. Businesses rely on real-time analytics, alerting, online machine learning, and continuous computation to achieve their goals. Real-time streaming analytics is an emerging technique that enables enterprises to collect, integrate, analyze, and visualize data as and when it is generated. It processes data as it is being produced, without disrupting the activity of existing sources, storage, and enterprise systems. This technique is an element of data stream processing, a broader term representing a group of techniques.

Data stream processing is commonly applied to unstructured machine data streams – event data generated by servers, applications, sensors, devices, and networks. Many industries have started adopting data stream processing techniques to gain continuous, real-time visibility into their operations. This enables an organization to drive production, operations, sales, marketing, and other critical business functions with higher clarity.

Industries have significant investments and business processes based on their data analytics. Real-time data streams strengthen industries with dynamic data analytics, helping them reduce losses, gain operational insights, and seize new opportunities. The processing of messages as they arrive is termed “real-time processing”. Complex Event Processing (CEP) combines data from multiple sources to infer events or patterns that suggest more complicated circumstances. It is an emerging technology that creates actionable, situational knowledge from distributed message-based systems, databases, and applications in real-time or near real-time. When complex events are processed and used effectively, they give organizations the capability to define, manage, and predict events, situations, exceptional conditions, opportunities, and threats in complex, heterogeneous networks.

HTC has invested considerable time in understanding the intricacies of CEP and the CEP engines available for industrial support. Having understood the value and benefits of real-time processing, HTC has created a team of professionals to work on potential CEP solutions for the benefit of industries at large.

The application areas of CEP engines span a wide range of industries where real-time events decide the growth of the industry. For example, real-time events are critical in the security industry, where intruder events are collected and analyzed for intrusion and data leakage. Real-time events have emerged as a critical component in stock trading for quick decisions, and as a core input in emergency healthcare.

What is the Use of Real-time Streaming Analytics?

CEP solutions have started finding their rightful place in several organizations as part of their emerging applications. For example, to identify a fraudulent debit card transaction at an ATM, banks evaluate the time of the transaction, the geographical location of the withdrawal event, and other related data from various sources. Streaming analytics moves one step further and allows banks to do event processing against massive volumes of data streaming into the enterprise at extremely high velocity. A stream analytics platform can process millions and even tens of millions of events per second.
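As a rough illustration of the ATM example, the sketch below flags a withdrawal when the same card is used at two locations that could not plausibly be reached in the elapsed time. The `Withdrawal` and `FraudDetector` names, the event fields, and the 900 km/h speed ceiling are assumptions made for illustration; they do not describe any bank's actual rules.

```python
import math
from dataclasses import dataclass


@dataclass
class Withdrawal:
    card_id: str
    timestamp: float  # seconds since epoch
    lat: float        # degrees
    lon: float        # degrees


def distance_km(a, b):
    """Great-circle distance between two withdrawals (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a.lat, a.lon, b.lat, b.lon))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


class FraudDetector:
    """Keeps only the latest withdrawal per card; no event history is stored."""

    def __init__(self, max_speed_kmh=900.0):  # assumed ceiling, roughly airliner speed
        self.max_speed_kmh = max_speed_kmh
        self.last_seen = {}  # card_id -> most recent Withdrawal

    def process(self, event):
        """Return True if the event looks suspicious given the previous one."""
        previous = self.last_seen.get(event.card_id)
        self.last_seen[event.card_id] = event
        if previous is None:
            return False
        hours = (event.timestamp - previous.timestamp) / 3600.0
        km = distance_km(previous, event)
        if hours <= 0:
            # Simultaneous or out-of-order events: suspicious if the sites differ.
            return km > 1.0
        return km / hours > self.max_speed_kmh
```

A production rule would combine several such signals (amount, merchant, device) rather than rely on travel speed alone.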

The data in a streaming analytics environment is processed before it lands in a database. In traditional analytics, information was gathered, stored, and later used for deriving analytics, an approach called “at-rest analytics”. Streaming analytics is a method through which organizations gain “perishable” insights, that is, insights they can only detect and act on at a moment’s notice. With data streaming technologies, analytics is carried out on moving data, addressing the data “as is” to derive actionable insights.

HTC has analyzed the application of CEP solutions to various industry verticals. This white paper is built around the business benefits of CEP adoption in the semi-conductor industry.

The semi-conductor industry deploys numerous production sensors. These sensors are used for thermal sensing and temperature measurement of silicon devices, which is a necessary step towards the qualification and certification of the device. Deployment of CEP solutions enables real-time actionable measurements. The sequential sensor data events are fed into a CEP engine, which continuously processes the data stream and checks adherence to prescribed business rules. Whenever a deviation from the normal business rules is detected, the CEP engine is designed to trigger actions or alarms. These triggers, coupled with the triggered actions, empower the semi-conductor industry to improve production and increase the return on its investments.
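The rule-and-alarm behaviour described above can be sketched in a few lines. This is a minimal illustration, assuming a single business rule of the form "temperature must stay within a permitted range"; the `Reading` and `RuleMonitor` names and the limits are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    sensor_id: str
    temperature_c: float


class RuleMonitor:
    """Checks each reading against one range rule and calls back on deviation."""

    def __init__(self, low_c, high_c, on_alarm):
        self.low_c = low_c
        self.high_c = high_c
        self.on_alarm = on_alarm  # action to trigger when the rule is violated

    def process(self, reading):
        if not (self.low_c <= reading.temperature_c <= self.high_c):
            self.on_alarm(reading)


# Usage: collect alarms in a list; a real deployment might page an operator.
alarms = []
monitor = RuleMonitor(low_c=20.0, high_c=85.0, on_alarm=alarms.append)
for r in [Reading("s1", 42.0), Reading("s1", 91.5), Reading("s2", 18.0)]:
    monitor.process(r)
```

Each reading is evaluated as it arrives and then discarded, so the monitor holds no history.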

What is Streaming Technology?

Stream processing requires two specific technology capabilities. The first component for stream processing is ingesting data from multiple sources. Organizations need to have suitable techniques to ingest data. Often, the data types and sources vary, and data arrives at different speeds. Any technology that is used for stream processing must consume different data types at very high volumes from varying sources. The second component is an analytics engine that must be capable of filtering, aggregating, and correlating streaming data to identify useful patterns. Such patterns are termed business rules. Sometimes, to detect patterns and enable useful insights, there may be a need to enrich the data stream with data from existing databases. This is required to match the streams with historical patterns for understanding long-term variations.
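The filter, enrich, and aggregate steps can be chained as a small pipeline. The sketch below is illustrative only: the event shape, the `TOOL_LOCATIONS` lookup (standing in for an existing database), and the function names are assumptions.

```python
from collections import defaultdict

# Hypothetical reference data that would normally come from an existing database.
TOOL_LOCATIONS = {"tool-A": "fab-1", "tool-B": "fab-2"}


def valid(events):
    """Filter: drop events that carry no measurement."""
    return (e for e in events if e.get("value") is not None)


def enrich(events, lookup):
    """Enrich: attach the site of each tool from the reference data."""
    for e in events:
        yield {**e, "site": lookup.get(e["tool"], "unknown")}


def average_by_site(events):
    """Aggregate: mean measurement per site."""
    totals = defaultdict(lambda: [0.0, 0])  # site -> [sum, count]
    for e in events:
        totals[e["site"]][0] += e["value"]
        totals[e["site"]][1] += 1
    return {site: s / n for site, (s, n) in totals.items()}


stream = [
    {"tool": "tool-A", "value": 10.0},
    {"tool": "tool-B", "value": 30.0},
    {"tool": "tool-A", "value": None},  # filtered out
    {"tool": "tool-A", "value": 20.0},
]
result = average_by_site(enrich(valid(stream), TOOL_LOCATIONS))
```

Because the first two stages are generators, events flow through one at a time; only the running totals are held in memory.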

Organizations use the power of a CEP engine to combine stream processing with batch processing and derive optimal results from their data. Such situations call for storing the relevant streaming data for historical reasons and for analytics. Data streams which pass through the CEP engine queries are stored for later retrieval and analysis.

HTC deployed a variant of an event processing solution for the banking domain to process high-volume data. The processed data was used to maintain export logs for reporting to RBI. The system resulted in measurable benefits and enabled the banks to retrieve the applicable information on demand.

What are Streaming Analytics Use Cases?

Multiple use cases exist for streaming analytics. For instance, dashboards and visualization software with streaming analytics can help industries visualize and monitor their business in real-time. Such tools can be used to monitor social sentiment and changing customer attitudes.

Similarly, streaming analytics capabilities can be used to enable real-time alerts or leverage new business opportunities, like making promotional offers to customers based on where they might be at a specific time. Streaming analytics capabilities are also vital in the security-monitoring context, as they give industries a way to quickly correlate seemingly disparate events to detect threat patterns and their risks.

HTC suggested solutions built around a CEP engine for a leading retailer when it was setting up its online business. Customers’ preferences were analyzed in real time and projected for prompt decisions, enabling the retailer to post offers and discounts.

What is the Use of Real-time Data Streaming in the Semi-Conductor Industry?

With the Internet of Things (IoT) ramping up, the demand for chips has increased, from smartphone to automotive manufacturers. With this fast-moving chip market and billions of devices being manufactured and deployed, defective device returns have become a central issue that impacts profitability. Semi-conductor companies face a significant challenge in managing the production of chips on the production line along with the associated sensor data points. Production line data has become very complex and cumbersome to collect, store, and analyze later. The question is whether internally developed data analytics solutions can continue to meet the demands of their users, generate operational insight, and provide the actionable information needed to achieve yield, quality, and productivity targets.

How does Faster Data Access Enable Better Decision-making?

Semi-conductor operations management teams and executive management are collectively interested in capturing and analyzing the streaming data sets from the production line and across their industry's global supply chain to improve yield, quality, and productivity. The conventional practice was built around analyzing the manufacturing data days or weeks after production was completed and the chips were tested, which introduced latency. Due to this data latency, the primary outcome of data analysis was restricted to safeguarding against future manufacturing or testing errors and did not address concerns with devices that had already left the production floor. Some bad chips escaped into the supply chain, and many good chips were labelled as bad and thrown away due to faults with either test equipment or test programs. This was an accepted outcome since it was the only solution available at that point in time.

With the introduction of real-time big data analytics, semi-conductor manufacturers can now make actionable decisions to prevent test escapes (bad chips passed off as good chips) or reclaim yield (good chips labeled as bad chips) while it is still possible, during the manufacturing process, to improve quality or yield.

“Using the right tools at right place at the right time brings enormous business benefits by taking preventive action immediately to avert catastrophes.” Practice Head – Big Data, HTC Global Services

What is Failure Analysis?

Fault and failure management is an essential part of operations. Failure management is an element of fault management. It is the process of determining the failure of devices and production line elements. This is addressed through root cause analysis, in which process owners determine the root cause of the failure and not just the intermediate causes that occur after the root cause triggers the failure. Failure analysis is necessary to prevent similar failures in the future.

What is Preventive Maintenance?

Preventive maintenance is an element of fault management. In earlier days, the semi-conductor industry performed maintenance only after equipment failed. This resulted in the need for preventive maintenance. More recently, Condition Based Maintenance (CBM), enabled by the advent of monitors and sensors, has begun to provide views of the equipment’s operating conditions through data streams in real-time. Advanced analysis techniques such as real-time Fault Detection and Classification (FDC) are used to identify whether performance indicators have deteriorated to a predetermined threshold or control limit. The CBM applications trigger a maintenance event in factory systems when required.
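A control-limit check of this kind can be sketched as follows. The white paper does not say how limits are set, so the sketch assumes a common statistical process control convention: limits at the baseline mean plus or minus three standard deviations. The `ControlLimitMonitor` name and the callback signature are hypothetical.

```python
import statistics


class ControlLimitMonitor:
    """Raises a maintenance event when an indicator leaves its control limits."""

    def __init__(self, baseline, sigmas=3.0, on_maintenance=print):
        # Assumption: limits are derived from a sample of healthy readings.
        mean = statistics.mean(baseline)
        spread = statistics.stdev(baseline)
        self.lower = mean - sigmas * spread
        self.upper = mean + sigmas * spread
        self.on_maintenance = on_maintenance

    def process(self, tool_id, value):
        """Return True and fire the callback if the value breaches a limit."""
        if value < self.lower or value > self.upper:
            self.on_maintenance(tool_id, value)
            return True
        return False
```

For example, with a baseline of `[10.0, 10.2, 9.8, 10.1, 9.9]`, a reading of 10.3 stays inside the limits while a reading of 11.0 triggers the maintenance callback.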

How is the Dispersed Data Collected and Managed?

Quality Control and Fault Management call for macroscopic and microscopic data visualizations. However, the data is highly dispersed, because economic conditions and the business benefits of outsourcing have created dispersed supply chain environments. Semi-conductor manufacturers have globally dispersed supply chains. As a result, wafer manufacturing, wafer sort, and final test are often conducted across countries, which leads to the generation of fragmented data across their manufacturing operations. By the time the test data is aggregated from the locations within a given semi-conductor supply chain, the opportunity to analyze the data and make meaningful decisions has passed, leading to lost opportunities. This situation has resulted in many companies accepting a trade-off with the ongoing problems of yield loss and test escapes that could have been avoided if the source of the manufacturing problem had been found sooner.

There are several losses in wafer manufacturing that demand attention. These include equipment downtime, yield losses, setup time, batching/dispatching, rework, speed loss, lot wait, and hot lots. The losses indicate that the scope for data mining and analysis is vast. Until the emergence of big data solutions, semi-conductor companies found themselves having to make wide-ranging business decisions based on data that was inconsistent, incomplete, and in many cases, out-of-date. A CEP solution framework makes dispersed data collection a reality through publish/subscribe techniques coupled with CEP engine capabilities built on business-specific rules to generate the required insight.

More information on publish/subscribe solutions is available in the Publish/Subscribe Solutions for Responding to Events in Real-time white paper.
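To make the publish/subscribe idea concrete, here is a minimal in-process sketch. The `Broker` class and the topic names are hypothetical; a real deployment would use a networked message broker so that each site in the supply chain can publish independently.

```python
from collections import defaultdict


class Broker:
    """Routes each published message to every handler subscribed to its topic."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of handlers

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, message):
        """Deliver the message and return how many handlers received it."""
        handlers = self.subscribers[topic]
        for handler in handlers:
            handler(message)
        return len(handlers)


# Usage: a central collector subscribes to test results from dispersed sites.
received = []
broker = Broker()
broker.subscribe("wafer-sort", received.append)
broker.publish("wafer-sort", {"site": "A", "lot": "L1", "yield": 0.93})
```

Publishers and subscribers know only the topic name, not each other, which is what lets new sites or new consumers be added without changing existing ones.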

How does Real-time Big Data Analytics Enable Detecting and Acting?

Effective fault management and better manufacturing operational decisions lead to a healthier return on investment (ROI). Though adequate fault management and quality control measures are in place, the measured data sets are dispersed. A big data solution analyzes manufacturing data streams directly from every location in a global supply chain to allow companies to make actionable business decisions within minutes of test completion. This process markedly improves yield, quality, and productivity. Once a single view of global supply chain data streams is a reality, proactively and automatically mining data in real time emerges as an important outcome. The ability to automate data rules and algorithms enables engineers to easily identify potential problems. Immediate returns in test time reduction, yield recovery, and escape prevention are all derivatives of investing in a big data solution. Business units can find unique value by accessing the same data in real time. Operations managers, test engineers, and product planners can now take full advantage of a consistent, complete, and accurate dataset and use it in their daily tasks to improve overall business operations.

Big Data Upshot

Big data solutions help semi-conductor manufacturers address the problems of data collection, detection, and action to improve their yield, quality, or productivity, providing extensive business value. Today, many of the world's largest semi-conductor companies are harnessing the power of big data analytics to help them collect and analyze their manufacturing data. These companies are using big data solutions to safeguard their investments and produce high-quality chips while simultaneously augmenting profit margins and market share.

CEP Engine in Real-time Analytics

CEP technologies were created to process huge streams of events at a high rate with low latency. The most widely adopted open source CEP engine is Esper. It enables rapid development of applications that process large volumes of incoming messages or events, regardless of whether the incoming messages are historical or real-time in nature. Esper is an open source solution that has earned strong scores for runtime architecture, platform administration, and event processing features. The CEP engine’s highly embeddable Java architecture, strong CEP feature set, and open source status make it a top candidate to be embedded in other vendor tools or applications. It provides the Event Processing Language (EPL), designed for concisely expressing situations and for fast execution against both historical and currently-arriving events.

A CEP engine is applied to streaming data and triggers actions when the incoming data satisfies the predefined rules. CEP libraries are available for the Java and .NET platforms. The engine keeps all the required data in memory, making the processing fast. The core of the CEP system is the engine, which consists of a set of standing queries (or rules). It provides real-time big data analytics as a 'NoDatabase' technology, meaning that no data has to be saved. The CEP engine stores rules, and when new data arrives it checks whether or not these rules fire. This procedure is continuous, as newly arriving data is processed serially, and the engine responds in real time if any incoming event meets the constraints. The triggered events can be pushed further into the engine, feeding other rules, or sent to their listeners.
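The store-rules, stream-data model can be illustrated with a toy engine. This is a conceptual sketch in Python, not Esper's API: the `Rule` and `MiniEngine` classes, the dictionary event format, and the `derive` hook for feeding triggered events back in are all assumptions made for illustration.

```python
class Rule:
    """A standing query: a condition plus the listeners to notify when it fires."""

    def __init__(self, name, condition, derive=None):
        self.name = name
        self.condition = condition  # event -> bool
        self.derive = derive        # optional: event -> new event fed back in
        self.listeners = []         # callables taking (rule_name, event)


class MiniEngine:
    """Holds rules, not data: each event is checked against every rule once."""

    def __init__(self):
        self.rules = []

    def add_rule(self, rule):
        self.rules.append(rule)
        return rule

    def send(self, event):
        for rule in self.rules:
            if rule.condition(event):
                for listener in rule.listeners:
                    listener(rule.name, event)
                if rule.derive is not None:
                    # Push the triggered event further into the engine.
                    self.send(rule.derive(event))


engine = MiniEngine()
log = []
hot = engine.add_rule(Rule(
    "overheat",
    lambda e: e["type"] == "reading" and e["temp"] > 85,
    derive=lambda e: {"type": "alert", "tool": e["tool"]},
))
alert = engine.add_rule(Rule("alert-seen", lambda e: e["type"] == "alert"))
hot.listeners.append(lambda name, e: log.append(name))
alert.listeners.append(lambda name, e: log.append(name))

engine.send({"type": "reading", "tool": "t1", "temp": 70})  # no rule fires
engine.send({"type": "reading", "tool": "t1", "temp": 90})  # fires both rules
```

The second event fires the "overheat" rule, whose derived alert event is fed back in and fires the "alert-seen" rule, mirroring how triggered events can feed other rules.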

Listeners are associated with rules and define the actions to be taken when the rule is activated. The user can create queries and add them to the engine. These queries are written in EPL, and their syntax is similar to SQL queries. The main difference between EPL and SQL is that EPL uses views instead of tables. Views are the different operations applied to the incoming data to structure data in an event stream. Each EPL query defines a sliding or batch window of the incoming stream that it monitors.
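The notion of a sliding window can be shown with a small Python analogue. This is not EPL; it is a sketch of what a time-based sliding window computes, assuming events arrive in timestamp order. The `SlidingTimeWindow` class is hypothetical.

```python
from collections import deque


class SlidingTimeWindow:
    """Maintains the average of values seen in the last `seconds` seconds."""

    def __init__(self, seconds):
        self.seconds = seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        """Insert an event, expire old ones, and return the current average."""
        self.events.append((timestamp, value))
        cutoff = timestamp - self.seconds
        # Events at or before the cutoff have slid out of the window.
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()
        return sum(v for _, v in self.events) / len(self.events)
```

A batch window would differ only in when results are released: it would emit one result per completed interval and then clear its contents, rather than updating on every event.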

What are the Benefits of EPL?

EPL converges event stream processing (filtering, joins, aggregation) and complex event processing (causality) into one single language. The core language is SQL-conforming, to ensure rapid learning. It is also highly oriented towards supporting modern technologies: it is object-oriented (rather than table-oriented), which enables simple extension. The language includes event windows and causality patterns as first-class citizens. The engine natively supports several event formats, from Java/.NET objects and maps to XML documents.

What Latency can be Achieved with CEP Engine?

The engine provides real-time Big Data analytics for immediate insight, turning high-velocity log and other machine data into streaming operational intelligence. Esper is a 'NoDatabase' technology since no data is stored. Instead, data arrives as real-time streams and is processed in-memory using continuous SQL-conforming queries. This allows for massively parallel streaming data processing, ensuring the best use of today's multi-core, multi-blade servers. It permits applications to be deployed within a fraction of the time and at a fraction of the cost of alternative Big Data analytics solutions.

Immediate Benefits of Applying CEP Engine

The CEP engine works like a database turned upside-down: instead of storing the data and running queries against stored data, the CEP engine enables applications to store queries and run the data through them. The engine responds in real-time when conditions occur that match queries. The execution model is thus continuous.

When such event streams are available, it is easy to add additional queries. Very little or no coding effort is required to analyze existing event streams by adding new queries or refining existing ones. Esper EPL is designed for expressiveness and ease-of-use with a familiar SQL look-alike syntax, and can also be extended in numerous ways, providing great agility in expressing business imperatives.

As query results are available with very little or no delay, the technology brings tremendous business value by providing immediate information. The technology is designed to handle very high volumes of events with minimal latency, and removes the need to store events.

Events can be domain representations of relevant steps in a business process or problem space, and remain agnostic to the technology used to accomplish integration, be it format, transport, or protocol, which ensures existing systems tailored for such tasks can be fully reused.

Acronyms

The acronyms used in this white paper and their expansions are provided below:

Acronym   Expansion
CBM       Condition Based Maintenance
CEP       Complex Event Processing
EPL       Event Processing Language
FDC       Fault Detection and Classification
IoT       Internet of Things
ROI       Return on Investment

About HTC's Big Data CoE

HTC's Center of Excellence for Big Data Management and Analytics brings in mature technologies and thought leadership. Our dedicated R&D team develops highly customized and cost-effective cutting-edge solutions to enable clients to manage and understand big data for improved and quicker decision making.

This white paper was developed by HTC's Big Data CoE.

World Headquarters
3270 West Big Beaver Road
Troy, MI 48084, U.S.A.
Phone: 248.786.2500
Fax: 248.786.2515
www.htcinc.com
Reaching out… through IT ®

USA | UK | Germany | India | Malaysia | Singapore | UAE | Australia | Indonesia