© 2015 IBM Corporation
InfoSphere Streams V4 update
Mike Spicer - STSM, Lead Architect InfoSphere Streams
For questions about this presentation contact Mike Spicer via [email protected]
Important Disclaimer
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.
WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.
IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.
IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.
NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:
• CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR
• ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.
IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Streams V4 Update
A Major Release
–Next generation Architecture
–Automated System High Availability
–Application Resiliency
–Streams for Excel
–Toolkit Enhancements
Released March 2015
Automated System High Availability
“Without specialized HA skills, an administrator can
quickly and easily configure Streams to be resilient
and use a single console to manage multiple
instances with common users and hosts.”
New next generation architecture
– Simpler Setup & Administration
– More Secure
– More Resilient
– More Automatic
– More Dynamic
– New JMX API
Automated System High Availability
Simpler Setup & Administration
– Reduce dependencies (Shared FS, DB2, SSH) and support versioning
– Multi instance management with new Domain concept, single console, and interactive streamtool
– Simpler Management Service Configuration, fully automatic or controlled using tags
Comprehensive Monitoring and Management API
– JMX API provides secure interface for full programmatic management and monitoring of Streams
More Secure
– Removed dependency on SSH and OS users
– Authentication and authorization checks for all API and tooling requests
– Improved LDAP support (Microsoft Active Directory and multi-part lookup)
– Improved security model (Roles and Job Groups)
– Improved Audit Log support
More Resilient
– Recovery always on (ZooKeeper), support redundant services to remove single point of failure
More Automatic
– Automatic failover & restart of services, automatic recovery of host failures
– Automatic notification of system changes and service relocation for resource changes
More Dynamic
– Support more dynamic configuration changes
Moving beyond an instance-centric model
[Diagram labels: Tooling; New Streams Domain with Domain Services and Domain Metadata Catalog; Instances, each with Instance Services and an Instance Metadata Catalog; Hosts, each with a Host Controller and PECs.]
New Streams Domain
A container for instances which provides a single point for configuring and managing common resources, security and instances
– A domain can contain 0 or more instances
– A single management console for a domain and all of its instances
The domain is responsible for the following:
– Configuration: Global configuration for the Streams domain and defaults for new instances.
– Instance management: Allow users to configure and manage instances.
– Resources: Allow users to configure and manage the host resources available for instances in the domain.
– Security: Users are configured and managed by the domain. The domain is responsible for authenticating users and checking that they are authorized to perform actions against the domain and instances.
– Public API: Provide JMX and REST APIs to manage and monitor the domain and instances.
Scenario 1: Management Host Failure
Services are running with an HA count of 3
A Host failure is detected
If a Service on the Host was the leader, a standby takes over
A replacement service is started
Another Host becomes available and is tagged for management services
The Services are load balanced across the management hosts
[Diagram: Resources A, B, C and D, each tagged “Management”, run copies of Service A; one copy is the leader and the others are standbys.]
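The failover sequence in this scenario can be sketched as a small simulation. This is a toy model under stated assumptions, not the Streams implementation: the `ManagementPool` class, its method names, and the "lowest host name becomes leader" rule are all hypothetical and exist only to make the steps concrete.

```python
class ManagementPool:
    """Toy model of one management service replicated across tagged hosts."""

    def __init__(self, hosts, ha_count):
        self.available = set(hosts)   # hosts tagged for management services
        self.ha_count = ha_count      # desired number of service replicas
        self.replicas = set()         # hosts currently running a replica
        self.leader = None
        self._rebalance()

    def _rebalance(self):
        # Forget replicas that were running on hosts that are no longer available.
        self.replicas &= self.available
        # Start replacement replicas on spare hosts until the HA count is met.
        for host in sorted(self.available - self.replicas):
            if len(self.replicas) >= self.ha_count:
                break
            self.replicas.add(host)
        # If the leader was lost, promote a standby (hypothetical rule: lowest name).
        if self.leader not in self.replicas:
            self.leader = min(self.replicas) if self.replicas else None

    def host_failed(self, host):
        self.available.discard(host)
        self._rebalance()

    def host_added(self, host):
        self.available.add(host)
        self._rebalance()


pool = ManagementPool(["A", "B", "C"], ha_count=3)
initial_leader = pool.leader

pool.host_failed("A")                 # the leader's host fails; a standby takes over
leader_after_failure = pool.leader
replicas_after_failure = set(pool.replicas)

pool.host_added("D")                  # a new management host restores the HA count
replicas_after_recovery = set(pool.replicas)
```

With only two hosts left after the failure, the pool runs below its HA count until Resource D is tagged for management, at which point the third replica is started there as a standby.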
Scenario 2: Application Host Failure
An application’s PEs are running across several Hosts
A Host failure is detected
PEs are started on alternative application Hosts
Streams are reconnected
[Diagram: Resources A, B and C, each tagged “Application”, host the application’s operators: Source, Op 1, Op 2, Sink 1 and Sink 2.]
All New Admin Console
A single console for a domain
A summary of system health is always visible
Dashboard widgets can be flipped for graphical & textual views
Tree-based view similar to Streams Studio system Explorer
Context-based actions
Application Resiliency
“With a simple annotation and HA compliant operators, a developer can guarantee all data is processed.”
Consistent State – A point in time where all tuples for all streams in a consistent region have been fully processed by the operators in the consistent region.
[Diagram: operators op1, op2 and op3]
Establishing A Consistent State
Source initiates consistency with the Controller
Source drains processing and checkpoints state
Operators in the region drain processing and checkpoint state
Controller confirms a consistent state has been established
Processing resumes
[Diagram: Source, Op 1, Op 2, Sink and Controller. Labels: “Initiate Consistent State”, “Region is Consistent”.]
Recovering To A Consistent State On Failure
The controller detects a failure and initiates a reset
Source resets state from the last consistent state checkpoint
Failed PEs are restarted and Streams are reconnected
Operators in the region reset state from the last consistent state checkpoint
Controller confirms recovering to a consistent state
Processing resumes with the source replaying tuples since the last consistent state
[Diagram: Source, Op 1, Op 2, Sink and Controller. Labels: “Reset Region”, “Region is Consistent”.]
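The checkpoint and reset steps can be illustrated with a short single-process simulation. It is a sketch under assumptions, not Streams code: `ReplayableSource`, `SummingOperator` and `run_region` are hypothetical names, and because processing here is synchronous, the "drain" step is trivially satisfied before each checkpoint.

```python
class ReplayableSource:
    """Source that can rewind to the position saved at the last consistent state."""

    def __init__(self, data):
        self.data = list(data)
        self.position = 0
        self.checkpoint_position = 0

    def has_more(self):
        return self.position < len(self.data)

    def next_tuple(self):
        value = self.data[self.position]
        self.position += 1
        return value

    def checkpoint(self):
        self.checkpoint_position = self.position

    def reset(self):
        self.position = self.checkpoint_position


class SummingOperator:
    """Stateful operator whose running total is checkpointed and restored."""

    def __init__(self):
        self.total = 0
        self.checkpoint_total = 0

    def process(self, value):
        self.total += value

    def checkpoint(self):
        self.checkpoint_total = self.total

    def reset(self):
        self.total = self.checkpoint_total


def run_region(data, checkpoint_every, fail_at=None):
    """Process all data; optionally simulate one failure after `fail_at` tuples."""
    source, op = ReplayableSource(data), SummingOperator()
    processed = 0
    while source.has_more():
        op.process(source.next_tuple())
        processed += 1
        if processed == fail_at:
            # Failure: source and operator both reset to the last consistent
            # state, so the source replays every tuple since that point.
            source.reset()
            op.reset()
            fail_at = None
            continue
        if source.position % checkpoint_every == 0:
            # Consistent state: all tuples so far are fully processed.
            source.checkpoint()
            op.checkpoint()
    return op.total


clean_total = run_region([1, 2, 3, 4, 5, 6], checkpoint_every=2)
recovered_total = run_region([1, 2, 3, 4, 5, 6], checkpoint_every=2, fail_at=3)
```

Because the operator's state and the source's position are restored together, the run with a simulated failure ends with the same total as the failure-free run.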
Consistent Region Application Semantics
Streams guarantees that a consistent region will process all data at least once
– Operators at the start of a consistent region must be able to replay data
• Can be achieved using a new replay operator which we provide
Exactly once semantics can be achieved when all operators in a consistent region have at least one of the following characteristics:
– Can reset their state and the state of any external system they interact with to the last consistent state on a reset marker
– Can detect duplicate tuples being replayed since the last consistent state and do not process them again
– Are idempotent (tuples can be processed multiple times without changing the result beyond the initial processing of the tuple)
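The gap between at-least-once and exactly-once results can be shown with three toy sinks that receive the same replayed tuples. This is an illustrative sketch only: the sink classes, the `deliver_with_replay` helper, and the use of a sequence number as the first field of each tuple are assumptions made for the example, not part of the Streams API.

```python
def deliver_with_replay(tuples, sink, replay_from):
    """Deliver every tuple, then replay those from index `replay_from` onward."""
    for item in tuples:
        sink.write(item)
    for item in tuples[replay_from:]:   # at-least-once: some tuples arrive twice
        sink.write(item)


class AppendSink:
    """Neither idempotent nor duplicate-aware: replayed tuples are stored twice."""

    def __init__(self):
        self.rows = []

    def write(self, item):
        self.rows.append(item)


class DedupSink:
    """Detects replayed tuples by sequence number and does not process them again."""

    def __init__(self):
        self.rows = []
        self.seen = set()

    def write(self, item):
        seq, _value = item
        if seq in self.seen:
            return
        self.seen.add(seq)
        self.rows.append(item)


class KeyedSink:
    """Idempotent: writing the same key and value again leaves the result unchanged."""

    def __init__(self):
        self.table = {}

    def write(self, item):
        seq, value = item
        self.table[seq] = value


tuples = [(0, "a"), (1, "b"), (2, "c")]
append_sink, dedup_sink, keyed_sink = AppendSink(), DedupSink(), KeyedSink()
for sink in (append_sink, dedup_sink, keyed_sink):
    deliver_with_replay(tuples, sink, replay_from=1)
```

After the replay, `append_sink` holds five rows while the duplicate-detecting and idempotent sinks each hold exactly the three original tuples.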
IBM InfoSphere Streams for Microsoft Excel
“An Excel user can quickly and easily identify and access streaming data, to enable analysis and visualization on continually updating data with the full power of Excel”
Streams for Excel
An Excel add-in using Excel Real Time Data (RTD)
A Stream is made available to Excel using a simple annotation
    @view(name = "VMStatData", port = Stats, sampleSize = 50, bufferSize = 100,
          description = "Memory related statistics", activateOption = automatic)
Streams for Excel shows the Streams the user has authorization for
–Name, description, attributes and properties of the Stream
–Search and Favorites to locate streams of interest
Streams are dragged onto a spreadsheet
–Entire Stream or individual attributes
–Data is continually updated, and can be paused
Full Excel functionality on the data
–Charts, Formulas, Cut & Paste
Spreadsheets can be saved and sent to others
–Stream data resumes updating when the spreadsheet reconnects to Streams
Toolkit Enhancements
Timeseries Toolkit
– New Operators & Functions
• AnomalyDetector: Online detection of anomalous patterns
• KmeansClustering: Builds a K-Means cluster model and scores incoming data against it
• DSPFilterFinite: filters a “fixed-length” time series
• CrossCorrelateMulti: Cross correlates more than two time series simultaneously
• Distance functions: Calculate the distance between two time series (DTW, LCSS & LpNorm)
– Improved Operators
• CrossCorrelate2, DWT2, VAR2, ReSample, FFT, RLSFilter
GeoSpatial Toolkit
– GeoFence – returns set of polygons (fenced areas) that contain a location
– Hangout – determines if an entity is “hanging out”
– SpatialGridIndex – objects in the index within the given radius of the point
– SpatialSplit – route tuples based on location (similar to Split operator)
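To make the GeoFence description concrete, the sketch below shows what "returns the set of fenced areas that contain a location" means computationally. It is not the toolkit's implementation or API: the function names are hypothetical, and it uses a plain ray-casting test on planar (x, y) coordinates rather than real geographic coordinates.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if `point` lies inside `polygon` (list of (x, y) vertices)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def fences_containing(point, fences):
    """Return the names of all fences whose polygon contains the point."""
    return [name for name, polygon in fences.items()
            if point_in_polygon(point, polygon)]


# Two overlapping square fences (hypothetical example data).
fences = {
    "depot": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "yard": [(3, 3), (6, 3), (6, 6), (3, 6)],
}
in_both = fences_containing((3.5, 3.5), fences)
in_depot_only = fences_containing((1, 1), fences)
outside_all = fences_containing((10, 10), fences)
```

A location in the overlap of the two squares is reported as inside both fences, which is why the operator is described as returning a set of polygons rather than a single match.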
Open Source Streams Toolkits on GitHub
IBMStreams repository on GitHub
–https://github.com/IBMStreams
Over 25 Toolkits as well as samples, benchmarks and demos
–MongoDB, HBase, Kafka, Thrift, JSON, Parquet
–Streams YARN Resource Manager
The Messaging, iNet & HDFS Toolkits available on GitHub
–https://github.com/IBMStreams/streamsx.messaging
• New support for Kafka & improvements to JMS & MQTT
–https://github.com/IBMStreams/streamsx.inet
• New & extended inet operators
–https://github.com/IBMStreams/streamsx.hdfs
• Added support for compressed binary files
InfoSphere Streams V4 & Roadmap
Mike Spicer - STSM, Lead Architect InfoSphere Streams