Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
February 2013
Mellanox Big Data Solutions
© 2013 Mellanox Technologies 2
Leading Supplier of End-to-End Interconnect Solutions
Host/Fabric Software ICs Switches/Gateways Adapter Cards Cables
Comprehensive End-to-End InfiniBand and Ethernet Portfolio
Virtual Protocol Interconnect
Storage Front / Back-End
Server / Compute Switch / Gateway
56G IB & FCoIB 56G InfiniBand
10/40/56GbE & FCoE 10/40/56GbE
Fibre Channel
Virtual Protocol Interconnect
© 2013 Mellanox Technologies 3
A scalable fault-tolerant distributed system for data storage and processing
Hadoop has two main systems
• Hadoop Distributed File System: self-healing high-bandwidth clustered storage.
• MapReduce: distributed fault-tolerant resource management and scheduling coupled with a scalable data
programming abstraction.
Key values
• Flexibility – Store any data, Run any analysis.
• Scalability – Start at 1TB/3-nodes grow to petabytes/1000s of nodes.
• Economics – Cost per TB at a fraction of traditional options.
Hadoop Framework
HDFS™ (Hadoop Distributed File System)
Map Reduce HBase
DISK DISK DISK DISK DISK DISK
Hive Pig
Map Reduce
HDFS™ (Hadoop Distributed File System)
© 2013 Mellanox Technologies 4
Capabilities are Determined by the weakest component in the system
Different approaches in Big Data marketplace – Same needs
• Better Throughput and Latency
• Scalable, Faster data Movement
Big Data Needs Big Pipes
* Data Source: Intersect360 Research, 2012, IT and Data scientists survey
Big Data Applications Require High Bandwidth and Low Latency Interconnect
© 2013 Mellanox Technologies 5
The Map Reduce Process
~~~~~~~
~~~~~~~
~~~~~~~
~~~~~~~
~~~~~~~
~~~~~~~
~~~~~~~
~~~~~~~
~~~~~~~
~~~~~~~
~~~~~~~
~~~~~~~
~~~~~~~
~~~~~~~
~~~~~~~
~~~~~~~
~~~~~~~
~~~~~~~
© 2013 Mellanox Technologies 6
Mapping
Themis
Hera
Zeus
Haas
Sloan
Fuqua
Themis
~~~~~~~
~~~~~~~
~~~~~~~
Geo( ),1
Geo( ),1 Geo( ),1 Geo( ),1 Geo( ),1 Geo( ),1 Geo( ),1
Geo( ),1
Ggod (Hera),1 Ggod (Zeus),1
Titan (Themis),1 Cartoon ( ),1
BS(Haas),1
BS(Sloan),1
BS(Fuqua),1
Titan ( ),1
Titan (Themis),1
Geo( ),1
Geo( ),1
Cartoon ( ),1
Titan ( ),1 Titan ( ),1
Titan ( ),1 Titan ( ),1
© 2013 Mellanox Technologies 7
Reducing
Geo( ),1
Geo( ),1 Geo( ),1 Geo( ),1 Geo( ),1 Geo( ),1 Geo( ),1
Geo( ),1
Ggod (Hera),1 Ggod (Zeus),1
Titan (Themis),1 Cartoon ( ),1
BS(Haas),1
BS(Sloan),1
BS(Fuqua),1
Titan ( ),1
Titan (Themis),1
Geo( ),1
Geo( ),1
Cartoon ( ),1
Titan ( ),1 Titan ( ),1
Titan ( ),1 Titan ( ),1
Geo , 10
Ggod , 2
BS , 3
Titan , 7
Cartoon , 2
Geo( ),1
Geo( ),1
Geo( ),1 Geo( ),1 Geo( ),1 Geo( ),1 Geo( ),1
Geo( ),1
Geo( ),1
Geo( ),1
BS(Haas),1
BS(Sloan),1
BS(Fuqua),1
Titan ( ),1
Titan (Themis),1
Titan ( ),1 Titan ( ),1
Titan ( ),1 Titan ( ),1
Titan (Themis),1 Cartoon ( ),1
Cartoon ( ),1
Ggod (Hera),1 Ggod (Zeus),1
© 2013 Mellanox Technologies 8
Map Reduce Serialization
© 2013 Mellanox Technologies 9
Shuffle and Merge
Shuffle
• All nodes to nodes data transfer
• Co-exist with map and merge workload
Merge
• Heavy lifting of mapped data
• Machines’ memory bounded
• Multiple merges per MoF
© 2013 Mellanox Technologies 10
Network-Levitated Merge Algorithm
© 2013 Mellanox Technologies 11
Shuffle
Merge
New
Algorithm
Time start
Map Map Map Map
Map Map Map Map
Map Map Map Map
Map
Stage
Reduce
Reduce
Header fetch
Header fetch
shuffle merge
shuffle merge
New Pipelined Data Flow
11
© 2013 Mellanox Technologies 12
Plug-in architecture • Open-source
• Coming soon – Git / Googlecode repository
Accelerates Map Reduce Jobs • Accelerated merge sort
Efficient Shuffle Provider • Data transfer over RDMA
• Supports InfiniBand and Ethernet
Supported Hadoop Distributions • Apache 3.0 – In the main trunk! • Apache 2.0.3 – In the main trunk*!
• Apache Hadoop 1.0.x ; 1.1.x • Cloudera Distribution Hadoop 3 update 4 (CDH3u4) • Hortonworks HDP 1.1
Supported Hardware • ConnectX®-3 VPI • SwitchX-2 based systems
Unstructured Data Accelerator - UDA
HDFS™ (Hadoop Distributed File System)
Map Reduce HBase
DISK DISK DISK DISK DISK DISK
Hive Pig
Map Reduce
© 2013 Mellanox Technologies 13
UDA - Software Architecture
JobTracker
TaskTracker
ReduceTask
TaskTracker
MapTask
Hadoop (Java)
RDMA NIC / HCA
UDA Plugin (C++)
MOFSupplier
Data Engine RDMA
Server
NetMerger
RDMA Client Merging
Thread
Merging
Thread
Merging
Thread
© 2013 Mellanox Technologies 14
Double Hadoop Performance with UDA
*TeraSort is a popular benchmark used to measure the performance of Hadoop cluster
~50% Disk Access CPU Efficiency 2.5X
**1TB Data Set, 5x E5-2680 Machines, 4x HDD Base; Vanilla Apache Hadoop 1.0.3; UDA Apache Hadoop 1.0.3+UDA
* **
~2X Faster Job Completion! Increase the Value of Data!
© 2013 Mellanox Technologies 15
With RDMA Solution No RDMA Solution
Leveraging RDMA and Network Levitated Merge
Mappers Reducers Mappers Reducers
© 2013 Mellanox Technologies 16
Where Do you Get the Savings?
400GB Terasort Results, Vanilla Vs. UDA, CDH3u4, 5 Nodes, IB FDR
Vanilla UDA
Decrease HDFS and File Disk Access by ~50%
© 2013 Mellanox Technologies 17
HDFS is the Hadoop File System
• The underlying File system for HBase and other NoSQL Data Bases
More Drives, Higher Throughput is Needed
SSDs Solutions Must use Higher Throughput
• Bounded by 1GbE and 10GbE
HDFS Acceleration; Joint Project With Ohio State University
HDFS™ (Hadoop Distributed File System)
Map Reduce HBase
DISK DISK DISK DISK DISK DISK
Hive Pig
© 2013 Mellanox Technologies 18
Hadoop and 1GbE? – Romance is Over, It’s Time for Real Life…
© 2013 Mellanox Technologies 19
High Throughput, High Utilization, Faster Analytics
*http://www.youtube.com/watch?v=zQdDi9pdf3I
© 2013 Mellanox Technologies 20
Turnkey, high performance solution for Hadoop applications
• Available now!
• Ease of installation, usage and management
<16 min to analyze 1TB of Unstructured Data, with 5 servers!
• Using Mellanox end-to-end 40GbE, STEC SSDs and Intel® Xeon® E5 Systems
• Fully pre-configured, manageable solution with Zettaset Orchestrator®
An outcome of collaboration with
High-Performance Hadoop Cluster – Turnkey Solution
80% CAPAX Reduction!
© 2013 Mellanox Technologies 21
Record times on data analytics metrics
High-Performance Hadoop Cluster – Turnkey Solution
0
5
10
15
20
25
30
35
10GbE Competitor 40GbE Mellanox
Tim
e (
Min
ute
s)
Hadoop Terasort Benchmark
40% Reduction in Runtime
© 2013 Mellanox Technologies 22
Mellanox VPI Card
• MCX354A-FCBT
Mellanox Edge Switches
• MSX10xx; MSX60xx
Cloudera Certified – CDH3 and CDH4
© 2013 Mellanox Technologies 23
EMC 1000-Node Analytic Platform
Accelerates Industry's Hadoop Development
24 PetaByte of physical storage
• Half of every written word since inception of mankind
Mellanox VPI Solutions
Test Drive Your Application
2X Faster Hadoop Job Run-Time Hadoop
Acceleration
High Throughput, Low Latency, RDMA Critical for ROI
© 2013 Mellanox Technologies 24
Mellanox Performance For Hadoop Infrastructure
Bandwidth
• Highest in the industry – 56Gbp/s
• Imperative for mega-scale data transfers
Latency
• Lowest in the industry - <0.7uSec
• Required for fastest response time, user experience
Power Efficiency
• Lowest Power End-to-End solution, <0.1W/Gbp/s
• Low CPU overhead, CPU cycles for computation
Scalability
• Linear growth in performance as cluster size scale
© 2013 Mellanox Technologies 25
Thank You