What do data center operators need to know when deploying Hadoop in the data center? Multi-tenancy, network topology, workload types, and myriad other factors affect the way applications run and perform in the data center. Understanding the performance characteristics of the distributed system is key not only to optimizing for Hadoop, but also to letting Hadoop operate seamlessly alongside existing applications.
Hadoop Considerations
• Traffic Types, Job Patterns, Network Considerations, Compute
Network Integration
• Co-exist with current Data Center infrastructure
• Open, Programmable and Application-Aware Networks
Multi-tenancy
• Remove the “Silo clusters”
Job patterns and their ingress vs. egress data set ratios:
• Analyze: 1:0.3
• Extract Transform Load (ETL): 1:1
• Explode: 1:2
The time at which the reducers start depends on mapred.reduce.slowstart.completed.maps.
It doesn't change the amount of data sent to the reducers, but it may change when that data is sent.
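As a rough illustration of the slow-start setting, the sketch below computes how many map tasks must finish before reducers can be scheduled. It assumes the property is a fraction between 0 and 1, uses Hadoop's documented default of 0.05, and assumes the threshold is rounded up; the function name and the 200-map job are hypothetical.

```python
import math


def maps_before_reducers(total_maps: int, slowstart: float = 0.05) -> int:
    """Map tasks that must complete before reducers are scheduled.

    Assumption: the threshold is ceil(slowstart * total_maps).
    """
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be between 0.0 and 1.0")
    # round() guards against floating-point noise before taking the ceiling
    return math.ceil(round(slowstart * total_maps, 9))


if __name__ == "__main__":
    # Hypothetical job with 200 map tasks
    for fraction in (0.05, 0.5, 1.0):
        needed = maps_before_reducers(200, fraction)
        print(f"slowstart={fraction}: reducers may start after {needed} maps")
```

A low value starts the shuffle early and spreads it over the map phase; a value of 1.0 delays all shuffle traffic until every map has finished. The total data shuffled is the same in both cases.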
• Small Flows/Messaging (admin related, heartbeats, keep-alive, delay-sensitive application messaging)
• Small to Medium Incast (Hadoop Shuffle)
• Large Flows (HDFS Ingest)
• Large Incast (Hadoop Replication)
Many-to-Many Traffic Pattern
[Diagram: Map 1 … Map N, Shuffle, Reducer 1 … Reducer N, Output Replication, HDFS; also shown: NameNode, JobTracker, ZooKeeper]
Job patterns have varying impact on network utilization:
• Analyze: simulated with Shakespeare WordCount
• Extract Transform Load (ETL): simulated with Yahoo TeraSort
• Extract Transform Load (ETL): simulated with Yahoo TeraSort with output replication
Network Attributes
• Architecture
• Availability
• Capacity, Scale & Oversubscription
• Flexibility
• Management & Visibility
Integration Considerations
• Single 1GE: 100% utilized
• Dual 1GE: 75% utilized
• 10GE: 40% utilized
Generally, 1G is used largely because of cost/performance trade-offs, though 10GE can provide benefits depending on the workload.
• No single point of failure from the network viewpoint; no impact on job completion time
• NIC bonding configured in Linux, using the LACP mode of bonding
• Effective load-sharing of traffic flows across the two NICs
• Recommended: change the hashing to src-dst-ip-port (on both the network and the Linux NIC bonding) for optimal load-sharing
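The hashing recommendation can be illustrated with a small sketch. It uses a simplified XOR hash, loosely modeled on the Linux bonding transmit hash policies; it is not the exact kernel or switch algorithm, and the IP addresses and port numbers are hypothetical. With an IP-only hash, every flow between the same two servers lands on one link; adding the ports lets the flows spread across both.

```python
import ipaddress


def _ip_xor(src_ip: str, dst_ip: str) -> int:
    """XOR of the two IPv4 addresses, reduced to the low 16 bits."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) & 0xFFFF


def hash_ip_only(src_ip: str, dst_ip: str, links: int) -> int:
    """Pick a link from the IP pair alone (simplified illustration)."""
    return _ip_xor(src_ip, dst_ip) % links


def hash_ip_port(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                 links: int) -> int:
    """Pick a link from the IP pair and the L4 ports (simplified illustration)."""
    return ((src_port ^ dst_port) ^ _ip_xor(src_ip, dst_ip)) % links


# Eight hypothetical flows between the same two servers, differing only in source port
flows = [("10.0.0.1", 50000 + i, "10.0.0.2", 50010) for i in range(8)]
ip_only_links = {hash_ip_only(src, dst, 2) for src, _sp, dst, _dp in flows}
ip_port_links = {hash_ip_port(src, sp, dst, dp, 2) for src, sp, dst, dp in flows}

if __name__ == "__main__":
    print("links used with IP-only hash:", sorted(ip_only_links))
    print("links used with IP+port hash:", sorted(ip_port_links))
```

Because Hadoop shuffle and replication open many connections between the same pairs of nodes, a hash that ignores ports tends to pin that traffic to a single member of the bond.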
1GE vs. 10GE Buffer Usage
[Chart: axes Cell Usage and Job Completion. Series: 1G Buffer Used, 10G Buffer Used, 1G Map %, 1G Reduce %, 10G Map %, 10G Reduce %]
Moving from 1GE to 10GE actually lowers the buffer requirement at the switching layer.
By moving to 10GE, the data node has a wider pipe to receive data, which lessens the need for buffers in the network because the total aggregate transfer rate and the amount of data do not increase substantially. This is due, in part, to the limits of I/O and compute capabilities.
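A back-of-the-envelope model shows why the wider pipe reduces buffering. The sketch assumes a burst arrives at a fixed offered rate (bounded by server I/O and compute, as noted above) and that the switch must queue whatever exceeds the speed of the server-facing link. The function name and the numbers are hypothetical, not measurements from the tests.

```python
def peak_buffer_bytes(offered_gbps: float, link_gbps: float,
                      burst_seconds: float) -> float:
    """Bytes queued at the switch port by the end of a burst.

    Assumption: traffic arrives at a constant offered rate and drains at
    the link rate; only the excess has to be buffered.
    """
    excess_gbps = max(0.0, offered_gbps - link_gbps)
    return excess_gbps * 1e9 * burst_seconds / 8  # bits to bytes


if __name__ == "__main__":
    # Hypothetical incast: 3 Gb/s offered to one server for 10 ms
    for link in (1.0, 10.0):
        queued = peak_buffer_bytes(3.0, link, 0.010)
        print(f"{link:>4} Gb/s link: about {queued / 1e6:.2f} MB queued")
```

In this simplified model the 1GE port has to absorb the excess, while the 10GE port drains the same burst without queuing, since the offered load stays below the link speed.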
Goals
• Extensive Validation of Hadoop Workload
• Reference Architecture
• Make it easy for the enterprise
• Demystify the network for Hadoop deployment
• Integrate with the enterprise through efficient choices of network topology/devices
Findings
• 10G and/or dual-attached servers provide consistent job completion time and better buffer utilization
• 10G provides reduced burst at the access layer
• A dual-attached server is the recommended design, at 1G or 10G; 10G for future proofing
• Rack failure has the biggest impact on job completion time
• Does not require a non-blocking network
• Latency does not matter much in Hadoop workloads
More details from Hadoop Summit 2012 at:
http://www.slideshare.net/Hadoop_Summit/ref-arch-validated-and-tested-approach-to-define-a-network-design
http://youtu.be/YJODsK0T67A
© 2011 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 15
n3548-001# show interface brief
--------------------------------------------------------------------------------
Ethernet   VLAN  Type  Mode    Status  Reason                 Speed   Port
Interface                                                             Ch #
--------------------------------------------------------------------------------
Eth1/1     1     eth   access  up      none                   10G(D)  --
Eth1/2     1     eth   access  up      none                   10G(D)  --
Eth1/3     1     eth   access  up      none                   10G(D)  --
Eth1/4     1     eth   access  up      none                   10G(D)  --
Eth1/5     1     eth   access  up      none                   10G(D)  --
.
.
Eth1/33    1     eth   access  up      none                   10G(D)  --
Eth1/34    1     eth   access  up      none                   10G(D)  --
Eth1/35    1     eth   access  down    SFP not inserted       10G(D)  --
Eth1/36    1     eth   access  down    SFP not inserted       10G(D)  --
Eth1/37    1     eth   access  down    Administratively down  10G(D)  --
.
n3548-001# show mac address-table dynamic
Legend:
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since first seen,+ - primary entry using vPC Peer-Link
   VLAN   MAC Address       Type     age       Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
* 1       e8b7.484d.a208    dynamic  60570     F      F    Eth1/31
* 1       e8b7.484d.a20a    dynamic  60560     F      F    Eth1/31
* 1       e8b7.484d.a73e    dynamic  60560     F      F    Eth1/34
* 1       e8b7.484d.a740    dynamic  60560     F      F    Eth1/34
* 1       e8b7.484d.ad15    dynamic  60560     F      F    Eth1/28
* 1       e8b7.484d.ad17    dynamic  60560     F      F    Eth1/28
* 1       e8b7.484d.b3e9    dynamic  60570     F      F    Eth1/25
* 1       e8b7.484d.b3eb    dynamic  60560     F      F    Eth1/25
.
.
MAC addresses of the connected devices … and the port they are on…
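As a minimal sketch of how a script might turn this output into a port-to-MAC lookup, the example below parses the dynamic rows with a regular expression. It is an illustration only, not the presenters' script; the function name and the regular expression are assumptions based on the row layout shown above.

```python
import re
from collections import defaultdict

# One dynamic entry: optional '*', VLAN, dotted MAC, type, age, two flags, port
MAC_ROW = re.compile(
    r"^\*?\s*(\d+)\s+([0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4})\s+dynamic"
    r"\s+\d+\s+\S+\s+\S+\s+(\S+)$"
)


def macs_by_port(cli_output: str) -> dict:
    """Group learned MAC addresses by the switch port they were seen on."""
    ports = defaultdict(list)
    for line in cli_output.splitlines():
        match = MAC_ROW.match(line.strip())
        if match:  # header, legend and separator lines simply do not match
            _vlan, mac, port = match.groups()
            ports[port].append(mac)
    return dict(ports)


SAMPLE = """\
VLAN MAC Address Type age Secure NTFY Ports
* 1 e8b7.484d.a208 dynamic 60570 F F Eth1/31
* 1 e8b7.484d.a20a dynamic 60560 F F Eth1/31
* 1 e8b7.484d.a73e dynamic 60560 F F Eth1/34
"""

if __name__ == "__main__":
    for port, macs in macs_by_port(SAMPLE).items():
        print(port, macs)
```

Resolving each MAC address to a server name (for example through ARP and DNS) would be a separate step and is not shown here.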
© 2013 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 17
n3548-001# portServerMap
=======================================
Port Server FQDN
---------------------------------------
Eth1/1 c200-m2-10g2-001.cluster10g.com
Eth1/2 c200-m2-10g2-002.cluster10g.com
Eth1/3 c200-m2-10g2-003.cluster10g.com
Eth1/4 c200-m2-10g2-004.cluster10g.com
Eth1/5 c200-m2-10g2-005.cluster10g.com
Eth1/6 c200-m2-10g2-006.cluster10g.com
Eth1/7 c200-m2-10g2-031.cluster10g.com
Eth1/8 c200-m2-10g2-008.cluster10g.com
Eth1/9 c200-m2-10g2-009.cluster10g.com
Eth1/11 c200-m2-10g2-011.cluster10g.com
.
.
.
n3548-001# trackerList
===========================================
Port Server Server Port
-------------------------------------------
Eth1/2 c200-m2-10g2-002 50544
Eth1/3 c200-m2-10g2-003 41909
Eth1/4 c200-m2-10g2-004 36480
Eth1/5 c200-m2-10g2-005 38179
Eth1/6 c200-m2-10g2-006 51375
Eth1/7 c200-m2-10g2-031 41915
Eth1/8 c200-m2-10g2-008 50983
Eth1/9 c200-m2-10g2-009 37056
Eth1/11 c200-m2-10g2-011 35882
Eth1/12 c200-m2-10g2-012 44551
.
.
.
n3548-001# bufferServerMap
===================================================================
Port Server 1sec 5sec 60sec 5min 1hr
-------------------------------------------------------------------
Eth1/1 c200-m2-10g2-001 0KB 0KB 0KB 0KB 0KB
Eth1/2 c200-m2-10g2-002 384KB 384KB 1536KB 2304KB 2304KB
Eth1/3 c200-m2-10g2-003 384KB 384KB 1152KB 1536KB 1536KB
Eth1/4 c200-m2-10g2-004 384KB 384KB 2304KB 2304KB 2304KB
Eth1/5 c200-m2-10g2-005 384KB 384KB 768KB 1536KB 1536KB
Eth1/6 c200-m2-10g2-006 384KB 2304KB 2304KB 2304KB 2304KB
Eth1/7 c200-m2-10g2-031 384KB 384KB 3456KB 3840KB 3840KB
Eth1/8 c200-m2-10g2-008 768KB 768KB 2688KB 2688KB 2688KB
Eth1/9 c200-m2-10g2-009 384KB 384KB 2304KB 2304KB 2304KB
Eth1/11 c200-m2-10g2-011 384KB 384KB 1920KB 1920KB 1920KB
.
.
.
Eth1/1 (c200-m2-10g2-001) has 0 buffer usage because it's the name node
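Output in this shape is straightforward to post-process. The sketch below parses rows like the ones above and reports which port saw the highest buffer usage in a given window. The function names are hypothetical, and the parser assumes the seven-column layout shown (port, server, then five values in KB).

```python
WINDOWS = ("1sec", "5sec", "60sec", "5min", "1hr")


def _kb(value: str) -> int:
    """Convert a value such as '384KB' to an integer number of kilobytes."""
    if not value.endswith("KB"):
        raise ValueError(f"unexpected buffer value: {value!r}")
    return int(value[:-2])


def parse_buffer_rows(text: str) -> dict:
    """Map each port to its server name and per-window buffer usage in KB."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 7 and fields[0].startswith("Eth"):
            port, server, *values = fields
            usage = {window: _kb(v) for window, v in zip(WINDOWS, values)}
            rows[port] = {"server": server, **usage}
    return rows


def peak_port(rows: dict, window: str) -> str:
    """Port with the highest buffer usage in the given window."""
    return max(rows, key=lambda port: rows[port][window])


SAMPLE = """\
Port Server 1sec 5sec 60sec 5min 1hr
Eth1/1 c200-m2-10g2-001 0KB 0KB 0KB 0KB 0KB
Eth1/6 c200-m2-10g2-006 384KB 2304KB 2304KB 2304KB 2304KB
Eth1/7 c200-m2-10g2-031 384KB 384KB 3456KB 3840KB 3840KB
Eth1/8 c200-m2-10g2-008 768KB 768KB 2688KB 2688KB 2688KB
"""

if __name__ == "__main__":
    rows = parse_buffer_rows(SAMPLE)
    for window in WINDOWS:
        port = peak_port(rows, window)
        print(window, port, rows[port]["server"], rows[port][window], "KB")
```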
n3548-001# jobsBuffer
Hadoop Job Info ...
===================================================================
1 jobs currently running
JobId RunTime(secs) User Priority
job_201306131423_0009 120 hadoop NORMAL
===================================================================
Buffer Info - Per Port
Port Server 1sec 5sec 60sec 5min 1hr
-------------------------------------------------------------------
Eth1/1 c200-m2-10g2-001 0KB 0KB 0KB 0KB 0KB
Eth1/2 c200-m2-10g2-002 384KB 384KB 768KB 768KB 768KB
Eth1/3 c200-m2-10g2-003 384KB 384KB 1152KB 1152KB 1152KB
Eth1/4 c200-m2-10g2-004 384KB 1536KB 1536KB 1536KB 1536KB
Eth1/5 c200-m2-10g2-005 384KB 768KB 1152KB 1152KB 1152KB
.
.
What jobs were running during peak buffer usage … and for how long were they running
n3548-001(config)# jobsBuffer
Hadoop Job Info ...
===================================================================
0 jobs currently running
JobId RunTime(secs) User Priority
===================================================================
Buffer Info - Per Port
Port Server 1sec 5sec 60sec 5min 1hr
-------------------------------------------------------------------
Eth1/1 c200-m2-10g2-001 0KB 0KB 0KB 0KB 0KB
Eth1/2 c200-m2-10g2-002 0KB 0KB 0KB 1920KB 1920KB
Eth1/3 c200-m2-10g2-003 0KB 0KB 0KB 2304KB 2304KB
Eth1/4 c200-m2-10g2-004 0KB 0KB 0KB 2688KB 2688KB
Eth1/5 c200-m2-10g2-005 0KB 0KB 0KB 2304KB 2304KB
Eth1/6 c200-m2-10g2-006 0KB 0KB 0KB 2304KB 2304KB
Eth1/7 c200-m2-10g2-031 0KB 0KB 0KB 1920KB 2688KB
.
Historic look at the buffer usage …
[Chart: Buffer Usage over time (x-axis 0 to 780) across the Map, Shuffle, Reduce and Replication phases]
[Diagram: Push Data (Python socket) to Analyze; PTP Grandmaster (optional)]
github.com/datacenter
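The push model in the diagram can be sketched with the Python standard library. This is an illustration under stated assumptions, not the code published at the address above: the record fields, the function names, and the newline-delimited JSON framing are all hypothetical, and a local socket pair stands in for the connection between a switch and the analysis host.

```python
import json
import socket


def push_sample(sock: socket.socket, port: str, server: str,
                buffer_kb: int) -> None:
    """Send one buffer-usage record as a line of JSON."""
    record = {"port": port, "server": server, "buffer_kb": buffer_kb}
    sock.sendall((json.dumps(record) + "\n").encode("utf-8"))


def read_samples(sock: socket.socket):
    """Yield records from the socket until the sender closes it."""
    with sock.makefile("r", encoding="utf-8") as stream:
        for line in stream:
            yield json.loads(line)


if __name__ == "__main__":
    # A connected local pair stands in for the switch-to-collector connection
    switch_side, collector_side = socket.socketpair()
    push_sample(switch_side, "Eth1/2", "c200-m2-10g2-002", 384)
    push_sample(switch_side, "Eth1/3", "c200-m2-10g2-003", 1152)
    switch_side.close()
    for sample in read_samples(collector_side):
        print(sample)
    collector_side.close()
```

Newline-delimited JSON keeps each record self-describing, so the collector can process samples as they arrive.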
Various Multitenant Environments
• Hadoop + HBase: need to understand traffic patterns
• Job based: scheduling dependent
• Department based: permissions and scheduling dependent
[Diagram: Hadoop MapReduce (Map 1 … Map N, Shuffle, Reducer 1 … Reducer N, Output Replication, HDFS) alongside HBase (clients issuing Read and Update operations to Region Servers, Major Compaction)]
HBase During Major Compaction
[Chart: Latency (us) over Time. Series: UPDATE-AverageLatency(us), READ-AverageLatency(us), QoS-UPDATE-AverageLatency(us), QoS-READ-AverageLatency(us)]
• Read/Update latency comparison of non-QoS vs. QoS policy: ~45% improvement for read
[Chart: Switch buffer usage with a network QoS policy to prioritize HBase update/read operations]
HBase + Hadoop MapReduce
[Chart: Latency (us) over Time. Series: UPDATE-AverageLatency(us), READ-AverageLatency(us), QoS-UPDATE-AverageLatency(us), QoS-READ-AverageLatency(us)]
• Read/Update latency comparison of non-QoS vs. QoS policy: ~60% improvement for read
[Chart: Buffer Used over Timeline, switch buffer usage with a network QoS policy to prioritize HBase update/read operations. Series: Hadoop TeraSort, HBase]
Cisco Unified Data Center
• UNIFIED FABRIC: Highly Scalable, Secure Network Fabric
• UNIFIED COMPUTING: Modular Stateless Computing Elements
• UNIFIED MANAGEMENT: Automated Management, Manages Enterprise Workloads
THANK YOU FOR LISTENING
www.cisco.com/go/ucs
www.cisco.com/go/nexus
http://www.cisco.com/go/workloadautomation
Cisco.com Big Data: www.cisco.com/go/bigdata
Data Center Script Examples from Presentation: github.com/datacenter