© 2015 IBM Corporation
At-least-once tuple processing with consistent regions
IBM InfoSphere Streams Version 4.0
Gabriela Jacques da Silva
Research Staff Member
Important Disclaimer
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.
WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.
IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.
IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.
NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:
• CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR
• ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.
IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Agenda
What are consistent regions?
Tuple processing guarantees
Demonstration
Stages of consistent state establishment
@consistent annotation
Standard toolkit support
Consistent regions enable topologies to checkpoint a state consistent with fully processing a set of tuples
[Figure: two timelines of op1, op2, and op3 processing tuples m1 to m4 over time, one labeled "consistent" and one labeled "inconsistent"]
On failure, the state of the topology is reset to a consistent one and tuples are replayed
[Figure: timelines of op1, op2, and op3 over time; tuple labels m1, m2, m3, m4, m3]
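The reset-and-replay behavior on this slide can be sketched as a small plain-Python simulation. This is not the Streams API: `ReplayableSource`, `SumOperator`, `run`, and the `checkpoint_every` and `fail_after` parameters are hypothetical names chosen for illustration. The sketch assumes a source that can rewind to a saved offset and a single stateful operator whose state is saved together with that offset.

```python
class ReplayableSource:
    """Hypothetical source that can rewind to a previously saved offset."""

    def __init__(self, tuples):
        self.tuples = list(tuples)
        self.offset = 0

    def has_next(self):
        return self.offset < len(self.tuples)

    def next(self):
        value = self.tuples[self.offset]
        self.offset += 1
        return value


class SumOperator:
    """Hypothetical stateful operator that keeps a running sum."""

    def __init__(self):
        self.total = 0

    def process(self, value):
        self.total += value


def run(tuples, checkpoint_every, fail_after=None):
    """Process every tuple, optionally injecting one failure.

    Returns (final_sum, number_of_process_calls). The second value counts
    replayed tuples too, so it exceeds len(tuples) after a failure.
    """
    source, op = ReplayableSource(tuples), SumOperator()
    saved = (source.offset, op.total)  # initial consistent state
    processed = 0
    failed = False
    while source.has_next():
        op.process(source.next())
        processed += 1
        if fail_after is not None and not failed and processed == fail_after:
            failed = True
            # Failure: reset source and operator to the last consistent state.
            source.offset, op.total = saved
            continue
        if source.offset % checkpoint_every == 0:
            # Source position and operator state are saved together.
            saved = (source.offset, op.total)
    return op.total, processed


no_failure = run([1, 2, 3, 4], checkpoint_every=2)
with_failure = run([1, 2, 3, 4], checkpoint_every=2, fail_after=3)
print(no_failure, with_failure)
```

In the failure run, the third tuple is processed twice, yet the final sum matches the failure-free run because the operator state was rolled back before the replay.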
Use consistent regions when the application needs to process every tuple at-least-once
Fine-grained selection of regions by using @consistent and @autonomous
@consistent: at-least-once* (may receive duplicates on replay)
@autonomous: at-most-once
*Exactly once if
• The operator can reset its own state and the state of the external system
• Duplicates are detected to avoid reprocessing
• Tuple processing is idempotent
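To illustrate the duplicate-detection condition above, here is a minimal plain-Python sketch. `ExternalStore`, `append`, and `append_once` are hypothetical names, not part of Streams, and the sketch assumes each tuple carries a unique id that survives replay.

```python
class ExternalStore:
    """Hypothetical external system whose contents survive operator resets."""

    def __init__(self):
        self.rows = []
        self.seen_ids = set()

    def append(self, tuple_id, payload):
        # Non-idempotent write: a replayed tuple produces a duplicate row.
        self.rows.append((tuple_id, payload))

    def append_once(self, tuple_id, payload):
        # Duplicate detection: skip ids that were already written.
        if tuple_id in self.seen_ids:
            return False
        self.seen_ids.add(tuple_id)
        self.rows.append((tuple_id, payload))
        return True


# At-least-once delivery: tuple 3 arrives twice because of a replay.
delivered = [(1, "a"), (2, "b"), (3, "c"), (3, "c"), (4, "d")]

naive = ExternalStore()
for tuple_id, payload in delivered:
    naive.append(tuple_id, payload)

dedup = ExternalStore()
for tuple_id, payload in delivered:
    dedup.append_once(tuple_id, payload)

print(len(naive.rows), len(dedup.rows))
```

The naive store ends up with a duplicate row for tuple 3, while the deduplicating store holds each tuple once, which is the exactly-once effect the footnote describes.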
Demo
1. Run without consistent regions – incomplete output on failures
2. Run with consistent regions – complete output on failures
Establishment of a consistent state has two stages, while restoration has a single stage
Establishment:
• Drain: all in-flight tuples are forced to be processed
• Checkpoint: operator state (including state variables) is written to the checkpoint backend
Restoration:
• Reset: operator state is read back from the checkpoint backend
Checkpoint backends:
• File system: LevelDB
• Redis: sharding, replicas
The new StateHandler interface exposes these stages to primitive operators
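The three stages can be sketched as callbacks on an operator. This is a plain-Python simulation, not the StateHandler interface itself: `CountingOperator`, `establish`, and `restore` are hypothetical, the method names simply mirror the stage names on this slide, and a dictionary stands in for the checkpoint backend. The operators here are independent, so the order in which they are drained does not matter in this sketch.

```python
class CountingOperator:
    """Hypothetical operator that counts the tuples it has processed."""

    def __init__(self, name):
        self.name = name
        self.count = 0     # operator state
        self.pending = []  # in-flight tuples not yet processed

    def submit(self, value):
        self.pending.append(value)

    def drain(self):
        # Establishment, stage 1: force all in-flight tuples to be processed.
        while self.pending:
            self.pending.pop(0)
            self.count += 1

    def checkpoint(self, backend):
        # Establishment, stage 2: write operator state to the backend.
        backend[self.name] = self.count

    def reset(self, backend):
        # Restoration: drop in-flight tuples and read the state back.
        self.pending.clear()
        self.count = backend.get(self.name, 0)


def establish(operators, backend):
    """Two stages: drain every operator, then checkpoint every operator."""
    for op in operators:
        op.drain()
    for op in operators:
        op.checkpoint(backend)


def restore(operators, backend):
    """Single stage: reset every operator from the backend."""
    for op in operators:
        op.reset(backend)


backend = {}  # stand-in for a file system or Redis backend
op1, op2 = CountingOperator("op1"), CountingOperator("op2")
for value in ("m1", "m2", "m3"):
    op1.submit(value)
for value in ("m1", "m2"):
    op2.submit(value)
establish([op1, op2], backend)

# More work arrives, then a failure triggers restoration.
op1.submit("m4")
op1.drain()
op2.submit("m3")
restore([op1, op2], backend)
print(op1.count, op2.count, backend)
```

After `restore`, both operators hold the counts recorded at the last checkpoint and have no in-flight tuples; a replayable source would then resend the tuples that followed the checkpoint.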
Configure a consistent region by parameterizing the @consistent annotation
@consistent(
    trigger={periodic|operatorDriven},   // How to start the establishment of a consistent state?
    period=3.0,                          // How often (in seconds)?
    drainTimeout=30.0,                   // When to time out a drain (in seconds)?
    resetTimeout=30.0,                   // When to time out a reset (in seconds)?
    maxConsecutiveResetAttempts=5)       // How many reset retries?
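As a rough illustration of how these parameters relate to one another, the following plain-Python sketch models them as a validated configuration object and shows one way a retry limit could bound reset attempts. `ConsistentConfig` and `reset_with_retries` are hypothetical; the default values are the sample values from this slide, not product defaults, and the retry loop is an assumption about the general idea rather than a description of how Streams implements it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsistentConfig:
    """Hypothetical mirror of the @consistent parameters shown on this slide."""

    trigger: str = "periodic"
    period: float = 3.0
    drain_timeout: float = 30.0
    reset_timeout: float = 30.0
    max_consecutive_reset_attempts: int = 5

    def __post_init__(self):
        if self.trigger not in ("periodic", "operatorDriven"):
            raise ValueError("trigger must be 'periodic' or 'operatorDriven'")
        if self.trigger == "periodic" and self.period <= 0:
            raise ValueError("a periodic trigger needs a positive period")
        if self.max_consecutive_reset_attempts < 1:
            raise ValueError("at least one reset attempt is required")


def reset_with_retries(config, attempt_reset):
    """Call attempt_reset() until it succeeds or the retry limit is reached.

    Returns the number of attempts used; raises RuntimeError when every
    attempt fails.
    """
    for attempt in range(1, config.max_consecutive_reset_attempts + 1):
        if attempt_reset():
            return attempt
    raise RuntimeError("reset failed after %d consecutive attempts"
                       % config.max_consecutive_reset_attempts)


# Simulated reset outcomes: two failures followed by a success.
outcomes = iter([False, False, True])
attempts_used = reset_with_retries(ConsistentConfig(), lambda: next(outcomes))
print(attempts_used)
```

Keeping validation in the configuration object means an invalid combination, such as a periodic trigger without a positive period, is rejected before any region logic runs.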
Many Standard Toolkit operators have been adapted
Aggregate
Filter
Functor
Punctor
Join
Barrier
Beacon
CharacterTransform
Compress
Custom
Decompress
DeDuplicate
Delay
DynamicFilter
Format
Pair
Split
ThreadedSplit
Throttle
Union
DirectoryScan
FileSink
FileSource
MetricsSink
UDPSink
XMLParse
ReplayableStart
More details can be found at streamsdev and InfoCenter
http://www-01.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.dev.doc/doc/consistentregions.html
https://developer.ibm.com/streamsdev/2015/02/20/processing-tuples-least-infosphere-streams-consistent-regions/
https://developer.ibm.com/streamsdev/docs/setup-redis-replication-infosphere-streams-4-0/
https://github.com/IBMStreams/samples/tree/master/ConsistentRegions/
Samples at $STREAMS_INSTALL/samples/spl/feature/ConsistentRegion