© 2016 IBM Corporation
IBM Streams Hyperstate Accelerator
IBM Streams Version 4.2
John MacMillan
Senior Software Developer, HSA
For questions about this presentation contact [email protected]
Important Disclaimer
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.
WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.
IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.
IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.
NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:
• CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR
• ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.
IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
What is Hyperstate Accelerator (HSA)?
Hardware accelerated shared state management
– Designed for real-time analytics
– Can be configured as the Streams data store for:
• Distributed Process Store (DPS) Toolkit
• Consistent Regions
What is Hyperstate Accelerator again?
Designed for emerging analytics workloads
– Faster, persistent, reliable
HSA is fast
– Hardware accelerated communication
– Dramatically better performance than current offerings
• Nearly 3x operations per second in accelerated mode
• Nearly 10x reduction in latency
Supports high-speed persistence
– Hardware accelerated storage using IBM FlashSystems
– Built on IBM’s Trillion Operations Technology
• Optimized for analytics workloads
High Availability
– Fast fail-over without replication
Hyperstate Accelerator in Streams
Exploits a high-speed network and an in-memory KVS for the Real-Time Analytics Application tier.
IBM’s Trillion Operations Technology (TOT) supports trillions of KV operations a day. It is optimized for IBM's Shared Flash Systems, with petabyte (PB) scale persistent data management and High Availability without replication.
[Diagram labels: Streams Servers (Real-Time Analytics App Tier); RDMA over Converged Ethernet (RoCE) Network; HSA Servers, each with RDMA Optimized KVS and TOT Flash Optimized Persistence; Storage Network; IBM FlashSystems]
Architecture Overview
Client API supporting multiple stores (maps) of KV data
RoCE (RDMA over Converged Ethernet) for client-server transfers
– Can also use TCP/IP
High-speed single-node systems with in-memory-only stores
Persistent stores built on IBM’s Trillion Operations Technology
– Up to 4 nodes sharing IBM FlashSystems storage for High Availability
– No need to replicate storage
– Fast fail-over because data is already available to the spare node
[Diagram labels: HSA Client; RoCE; Pipelines, each with Shards and TOT Containers; IBM FlashSystems Shared Storage]
Data storage and access
Client API
– Internal in Streams, used by DPS Toolkit and Consistent Regions
– Connect to the server to get a session
– Within the session, open and operate on separate KV stores called maps
– Map attributes include
• Name
• Contents: raw binary key-value pairs or hashtables
• Whether lookup is supported
• Persistence mode: immediate or deferred
– Typical access operations
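The map attributes listed on this slide can be pictured as a small record. The client API is internal to Streams and is not shown in this deck, so the sketch below is only an illustrative stand-in: `MapAttributes`, `Contents`, `Persistence`, `Session`, and `open_map` are all hypothetical names, and the "server" is a plain in-memory dictionary.

```python
from dataclasses import dataclass
from enum import Enum


class Contents(Enum):
    """What a map holds (hypothetical names for the two listed kinds)."""
    RAW_KV = "raw binary key-value pairs"
    HASHTABLE = "hashtables"


class Persistence(Enum):
    """Persistence mode of a map, as listed on the slide."""
    IMMEDIATE = "immediate"
    DEFERRED = "deferred"


@dataclass(frozen=True)
class MapAttributes:
    """Hypothetical record of the four map attributes named on the slide."""
    name: str
    contents: Contents
    lookup_supported: bool
    persistence: Persistence


class Session:
    """In-memory stand-in for a server session (assumption, not the real API)."""

    def __init__(self):
        self._maps = {}  # map name -> (attributes, key-value dict)

    def open_map(self, attrs):
        # Opening an existing name returns the store that is already there.
        entry = self._maps.setdefault(attrs.name, (attrs, {}))
        return entry[1]


session = Session()
attrs = MapAttributes("window-state", Contents.RAW_KV, True, Persistence.DEFERRED)
store = session.open_map(attrs)
store[b"sensor-42"] = b"\x00\x01"
```

Each map is an independent store, so a second `open_map` call with a different name yields a separate dictionary within the same session.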
Data storage and access
Data storage
– Map is sharded over multiple containers
– Containers are spread across servers
– Immediate or deferred persistence
Access
– Client hashes key and maps hash to container
– Connects to pipeline on server that manages that container
– Pipeline directs request to appropriate shard process
– Shard process reads / writes container
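The first two access steps (hash the key, map the hash to a container, find the server that manages it) can be sketched as follows. The deck does not say which hash function or placement scheme HSA uses, so SHA-256 with a modulo and the `placement` table are assumptions made for illustration, and all function names are hypothetical.

```python
import hashlib
from typing import Dict


def container_for_key(key: bytes, num_containers: int) -> int:
    """Map a key to a container id with a stable hash (assumed scheme)."""
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_containers


def route(key: bytes, placement: Dict[int, str]) -> str:
    """Return the server whose pipeline manages the key's container.

    `placement` maps container id -> server name and is a hypothetical
    stand-in for whatever location information the client holds.
    """
    container = container_for_key(key, len(placement))
    return placement[container]


# Eight containers spread across two servers.
placement = {c: "server-%d" % (c % 2) for c in range(8)}
target = route(b"sensor-42", placement)
```

Because the hash is deterministic, every client that holds the same placement table sends a given key to the same server without any central lookup.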
IBM’s Trillion Operations Technology
Optimized for high-velocity small writes and random reads
– Random put operations batched into large, sequential write operations
– Combined log and data eliminates extra write operations
– Single index for unified lookup
– Single I/O read operation per key-value get operation
– Built directly on block interface to eliminate file system overhead
[Diagram labels: TOT Index in Memory; TOT Containers striped across LUNs]
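The ideas above (batched sequential writes, a combined log and data area, one in-memory index, one read per get) resemble a log-structured key-value store. The sketch below illustrates that general technique only; it is not TOT's actual design. The `bytearray` stands in for raw block storage, and every name and the record layout are assumptions.

```python
import struct


class LogStructuredStore:
    """Toy log-structured KV store: puts are buffered, then appended in one write."""

    def __init__(self, batch_bytes=4096):
        self.device = bytearray()  # stand-in for a raw block device
        self.index = {}            # key -> (value offset, value length)
        self.pending = {}          # puts buffered since the last flush
        self.pending_bytes = 0
        self.batch_bytes = batch_bytes
        self.device_writes = 0     # counts sequential appends
        self.device_reads = 0      # counts reads that hit the device

    def put(self, key, value):
        self.pending[key] = value
        self.pending_bytes += len(key) + len(value)
        if self.pending_bytes >= self.batch_bytes:
            self.flush()

    def flush(self):
        """Append all pending records as one large sequential write."""
        if not self.pending:
            return
        base = len(self.device)
        batch = bytearray()
        for key, value in self.pending.items():
            header = struct.pack(">II", len(key), len(value))
            value_offset = base + len(batch) + len(header) + len(key)
            batch += header + key + value
            # The log record is the data: the index points straight into it.
            self.index[key] = (value_offset, len(value))
        self.device += batch
        self.device_writes += 1
        self.pending.clear()
        self.pending_bytes = 0

    def get(self, key):
        if key in self.pending:  # not yet on the device
            return self.pending[key]
        offset, length = self.index[key]
        self.device_reads += 1   # exactly one device read per get
        return bytes(self.device[offset:offset + length])


store = LogStructuredStore(batch_bytes=1 << 20)
for i in range(100):
    store.put(b"key-%d" % i, b"value-%d" % i)
store.flush()
```

In this toy run, 100 random puts reach the device as a single append, and each later get costs one device read because the index already holds the value's exact offset.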
High Availability
2-4 node cluster, with 1 node spare (1-3 active)
– Containers are assigned to active nodes
When a node fails
– Containers from the failed node are assigned to the spare
– Client is informed about the new location of the containers
If the failed node returns, it becomes the spare
[Diagram labels: Pipelines, each with Shards; TOT Containers; IBM FlashSystems Shared Storage]
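The fail-over rules on this slide can be sketched as simple bookkeeping over a container-to-node table. This is an illustration of the described behaviour under assumptions, not HSA's implementation: the `Cluster` class, its method names, and the round-robin initial placement are hypothetical. No data is copied on fail-over because, per the deck, the containers live on shared storage.

```python
class Cluster:
    """Toy model of a 2-4 node cluster with one spare node."""

    def __init__(self, nodes, num_containers):
        if not 2 <= len(nodes) <= 4:
            raise ValueError("expected 2 to 4 nodes")
        self.active = list(nodes[:-1])  # 1 to 3 active nodes
        self.spare = nodes[-1]          # the last node starts as the spare
        # Assumed initial placement: round-robin over the active nodes.
        self.placement = {
            c: self.active[c % len(self.active)] for c in range(num_containers)
        }

    def fail(self, node):
        """Reassign the failed node's containers to the spare.

        Returns the moved container ids, i.e. what clients must be told.
        """
        if node not in self.active:
            raise ValueError("node is not active")
        if self.spare is None:
            raise RuntimeError("no spare available")
        replacement, self.spare = self.spare, None
        self.active[self.active.index(node)] = replacement
        moved = [c for c, n in self.placement.items() if n == node]
        for c in moved:
            self.placement[c] = replacement
        return moved

    def rejoin(self, node):
        """A returning node becomes the new spare."""
        if self.spare is not None:
            raise RuntimeError("cluster already has a spare")
        self.spare = node


cluster = Cluster(["node-a", "node-b", "node-c"], num_containers=6)
moved = cluster.fail("node-a")
cluster.rejoin("node-a")
```

After this run, containers 0, 2, and 4 are served by `node-c`, and `node-a` waits as the spare for the next failure.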
How fast is fast?
Internal benchmark
HSA with RDMA vs HSA with TCP
[Chart: Per Server Throughput (op/s), higher is better; RDMA vs TCP; axis 0 to 2,000,000]
[Chart: Latency (μs), lower is better; RDMA vs TCP; axis 0 to 120]
How fast is fast?
Internal analytics workload
HSA vs RDB
[Chart: Analytics Workload Performance (TPM), higher is better; series HSA and RDB; x-axis 1 to 10; y-axis 0 to 900,000]
How fast is fast?
Trillion Operations Technology vs. Linux stock driver
[Chart: Writes by I/O size, higher is better; I/O sizes 64, 128, 140, 160, 256, 512, 1K, 2K; y-axis 0 to 6; legend: Linux Stock Driver. Data labels in order of appearance: 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.41, 0.41 and 5.468, 5.454, 5.342, 5.342, 5.303, 5.021, 4.393, 2.32]
How fast is fast?
Trillion Operations Technology vs. Flash-optimized Open Source
[Chart: Get/Put Performance by I/O size, higher is better; I/O sizes 64, 128, 256, 512, 1K; y-axis 0 to 6000; series: Get on OSS, Get on TOT, Put on OSS, Put on TOT]