Designing Hadoop for the Enterprise Data Center


Strata/Hadoop World 2012 with Jacob Rapp, Cisco & Eric Sammer, Cloudera


<ul><li> 1. Designing Hadoop for the Enterprise Data Center. Jacob Rapp, Cisco; Eric Sammer, Cloudera </li>
<li> 2. Agenda. Hadoop Considerations: Traffic Types, Job Patterns, Network Considerations, Compute. Integration: Co-exist with current Data Center infrastructure. Multi-tenancy: Remove the Silo clusters. </li>
<li> 3. Data in the Enterprise. Data lives in a confined zone of the enterprise repository. Long lived, regulatory and compliance driven. Heterogeneous data life cycle. Many data models. Diverse data: structured and unstructured. Diverse data sources: subscriber based. Diverse workload from many sources/groups/process/technology. Virtualized and non-virtualized, with mostly SAN/NAS base. Scaling &amp; Integration Dynamics are different: Data Warehousing (structured) with diverse repository + Unstructured Data; few hundred to thousand nodes, few PB; Integration, Policy &amp; Security Challenges. Each Apps/Group/Technology is limited in: data generation, consumption, servicing confined domains. [Diagram: enterprise data sources, including Call Center, Sales Pipeline, ERP Module A, ERP Module B, Doc Mgmt A, Doc Mgmt B, Records Mgmt, Data Service, Social Media, Office Apps, Video Conf, Collab, Product Catalog, Catalog Data, VOIP, Exec Reports, Customer DB (Oracle/SAP)] </li>
<li> 4. 
Enterprise Data Center Infrastructure. [Diagram: WAN Edge Layer; Core Layer (Nexus 7000, Layer 3, 10 GE Core); Aggregation &amp; Services Layer (Nexus 7000 10 GE Aggr, vPC+, FabricPath, L3/L2, Network Services); Access Layer (Nexus 5500 with B22 FEX, CBS 31xx blade switch, Nexus 2148TP-E, Nexus 2232, Nexus 3000; End-of-Row and Top-of-Rack; UCS, HP blade and bare-metal C-class servers); SAN (FC SAN A and SAN B, MDS 9500 SAN Director, SAN Edge MDS 9200 / 9100). Link legend: Layer 2 - 1GE, Layer 2 - 10GE, 10 GE DCB (LAN &amp; SAN), 10 GE FCoE/DCB, 4/8 Gb FC.] Server access options: 1 GbE Server Access &amp; 4/8Gb FC via dual HBA (SAN A // SAN B); 10Gb DCB / FCoE Server Access; or 10 GbE Server Access &amp; 4/8Gb FC via dual HBA (SAN A // SAN B). 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential </li>
<li> 5. Hadoop Cluster Design &amp; Network Architecture </li>
<li> 6. Validated 96 Node Hadoop Cluster. Two validated topologies: Traditional DC Design (Nexus 55xx/2248) and Nexus 7K-N3K based Topology. [Diagram: Nexus 7000 pairs above Nexus 5548 with 2248TP-E in one topology and Nexus 3000 in the other; Name Node and Data Nodes 1-48 and 49-96, each a Cisco UCS C200 with a single NIC.] Hadoop Framework: Apache 0.20.2; Linux 6.2; Slots: 10 Maps &amp; 2 Reducers per node. Compute: UCS C200 M2; Cores: 12; Processor: 2 x Intel(R) Xeon(R) CPU X5670 @ 2.93GHz; Disk: 4 x 2TB (7.2K RPM); Network: 1G: LOM, 10G: Cisco UCS P81E. Network: Three Racks each with 32 nodes; Distribution Layer: Nexus 7000 or Nexus 5000; ToR: FEX or Nexus 3000; 2 FEX per Rack; each Rack with either 32 single or dual attached hosts. </li>
<li> 7. Hadoop Job Patterns and Network Traffic </li>
<li> 8. Job Patterns. Reduce Ingress vs. Egress Data Set, by job type: Analyze 1:0.3; Extract Transform Load (ETL) 1:1; Explode 1:2. The time the reducers start is dependent on: [...]mpleted.maps. It doesn't change the amount of data sent to Reducers, but may change the timing to send that data. </li>
<li> 9. Traffic Types. Small Flows/Messaging (Admin Related, Heart-beats, Keep-alive, delay sensitive application messaging). Small - Medium Incast (Hadoop Shuffle). Large Flows (HDFS Ingest). Large Incast (Hadoop Replication). </li>
<li> 10. Map and Reduce Traffic. NameNode, JobTracker, ZooKeeper. Many-to-Many Traffic Pattern: Map 1, Map 2, Map 3 ... Map N; Shuffle; Reducer 1, Reducer 2, Reducer 3 ... Reducer N; Output Replication; HDFS. </li>
<li> 11....</li></ul>
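The Job Patterns slide gives reducer ingress-to-egress ratios of 1:0.3 (Analyze), 1:1 (ETL) and 1:2 (Explode). As a rough illustration of what those ratios imply for the volume written back out, here is a minimal Python sketch. The function name, the dictionary, and the 100 GB input figure are hypothetical and not from the slides; the sketch also assumes the ratio applies linearly to whatever volume reaches the reducers.

```python
# Reducer egress per unit of reducer ingress, taken from the Job Patterns slide.
# Names and the linear-scaling assumption are illustrative, not from the deck.
EGRESS_PER_INGRESS = {
    "analyze": 0.3,   # 1:0.3, output much smaller than the shuffled data
    "etl": 1.0,       # 1:1, output about the same size as the shuffled data
    "explode": 2.0,   # 1:2, output twice the shuffled data
}


def reducer_egress_gb(pattern: str, ingress_gb: float) -> float:
    """Estimate the data leaving the reducers for a given job pattern."""
    return ingress_gb * EGRESS_PER_INGRESS[pattern]


if __name__ == "__main__":
    # Hypothetical example: 100 GB arriving at the reducers.
    for name in EGRESS_PER_INGRESS:
        print(name, reducer_egress_gb(name, 100.0))
```

Under these assumptions, an Explode job places the heaviest load on the output and replication path, while an Analyze job is dominated by the shuffle into the reducers.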

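The validated cluster slide lists three racks of 32 nodes with 10 map slots and 2 reducer slots per node. The arithmetic for the cluster-wide task capacity can be sketched as below. This is an illustrative calculation only: the function and constant names are hypothetical, and it assumes every one of the 96 nodes runs tasks, which the slides do not state explicitly.

```python
# Figures from the "Validated 96 Node Hadoop Cluster" slide.
RACKS = 3
NODES_PER_RACK = 32
MAP_SLOTS_PER_NODE = 10
REDUCE_SLOTS_PER_NODE = 2


def cluster_slots(racks: int = RACKS,
                  nodes_per_rack: int = NODES_PER_RACK,
                  map_slots: int = MAP_SLOTS_PER_NODE,
                  reduce_slots: int = REDUCE_SLOTS_PER_NODE) -> dict:
    """Total nodes and concurrent task slots, assuming all nodes run tasks."""
    nodes = racks * nodes_per_rack
    return {
        "nodes": nodes,
        "map_slots": nodes * map_slots,
        "reduce_slots": nodes * reduce_slots,
    }


if __name__ == "__main__":
    print(cluster_slots())
```

With the slide's figures this gives 96 nodes, 960 map slots and 192 reducer slots, so each reducer slot may be fed by several concurrent map outputs during the shuffle.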
