A presentation made by Altoros and Joyent together at the NoSQL Now! 2013 conference.
NoSQL Now!
Aug 21, 2013
Ben Wen, Joyent
Renat Khasanshyn, Altoros
About Joyent
The high-performance public cloud infrastructure provider
Cloud IaaS Virtual Machines: Linux, Windows, BSD, SmartOS (fka Solaris) with Zones
Core founding sponsors of Node.js
Four global datacenters
Key markets: Big data, mobile, e-commerce, finsvc, SaaS
Open Source contributions: Node.js, KVM, DTrace, ZFS, SmartOS
Running on bare metal is practical only for some organizations
Performance varies significantly across various job types
In fact, for many jobs, less = more
Utilization of most clusters in production is low
Optimizing Hadoop/MapReduce performance is hard
Get upset when truth comes out!
Biased (to the shiny side of the coin)
Often add controversy and confusion
- For Hadoop, what is the impact of container-based virtualization vs. hardware emulation (KVM)?
- What are the Hadoop optimization strategies? Is there a “rule of thumb” for determining the optimization approach?
- What are the optimal Hadoop cluster settings for the 1TB TeraSort benchmark on 100- and 400-node clusters running Linux and SmartOS on the Joyent Public Cloud?
Physical (disks, cpu, network)
OS/Hypervisor (especially for virtualized environments)
Hadoop/MapReduce (tons of settings)
Algorithmic (data structures, join strategies, big-O…)
Implementation (code efficiency, architecture decisions that fit all other factors)
SmartOS: Open source Unix operating system for the cloud, based on illumos, the active fork of OpenSolaris technology. Uses containerized OS virtualization called Zones (think of a mature LXC with secure RBAC and auditing).
Ubuntu: Operating system based on the Debian Linux distribution and distributed as free and open source software.
Apache Hadoop is an open-source software framework that supports data-intensive distributed applications, licensed under the Apache v2 license. Derived from Google's MapReduce and Google File System (GFS) papers, Hadoop enables applications to work with thousands of computation-independent computers and petabytes of data.
Written by Opscode and released as open source under the Apache License 2.0, Chef is a DevOps tool used for configuring cloud services or streamlining the configuration of a company's internal servers. Chef automatically sets up and tweaks the operating systems and programs that run in massive data centers.
Developed by the creators of the Starfish project at Duke University, Unravel combines run-time profiling of Hadoop jobs with cost-based optimization in the style of a database query optimizer. Unravel connects to streams of Hadoop and system instrumentation data and applies statistical machine learning to optimize the cost of Hadoop jobs and increase cluster utilization.
Comparing I/O Path on Bare Metal Unix vs. Zones vs. KVM
OS Virtualization (Zones):
• Zones partition at the OS level
• Code path is essentially the same as bare metal
• Performance is higher
Kernel Virtualization (KVM):
• KVM is encapsulated by the hypervisor
• Code path is much more circuitous in a KVM process
• Performance is impacted
[Diagram columns: Bare-metal, OS Virtualization, Kernel Virtualization]
No overhead for Zones: stack traces show how a network packet is transmitted from Bare Metal vs. Joyent Zone vs. Fedora VM on KVM
Bare Metal | Joyent Zone (aka SmartMachine) | Fedora VM on KVM
Start Start Start
1 kernel`start_xmit
2 kernel`dtrace_int3_handler+0xd2
3 kernel`kmem_cache_free+0x2f
4 kernel`dtrace_int3+0x3a
5 kernel`eth_header
6 kernel`__kfree_skb+0x47
7 kernel`start_xmit+0x1
8 kernel`dev_hard_start_xmit+0x322
9 kernel`sch_direct_xmit+0xef
10 kernel`dev_queue_xmit+0x184
11 kernel`eth_header+0x3a
12 kernel`neigh_resolve_output+0x11e
13 kernel`nf_hook_slow+0x75
14 kernel`ip_finish_output
15 kernel`ip_finish_output+0x17e
16 kernel`ip_output+0x98
17 kernel`__ip_local_out+0xa4
18 kernel`ip_local_out+0x29
19 kernel`ip_queue_xmit+0x14f
20 kernel`tcp_transmit_skb+0x3e4
21 kernel`__kmalloc_node_track_caller+0x185
22 kernel`sk_stream_alloc_skb+0x41
23 kernel`tcp_write_xmit+0xf7
24 kernel`__alloc_skb+0x8c
25 kernel`__tcp_push_pending_frames+0x26
26 kernel`tcp_sendmsg+0x895
27 kernel`inet_sendmsg+0x64
28 kernel`sock_aio_write+0x13a
29 kernel`do_sync_write+0xd2
30 kernel`security_file_permission+0x2c
31 kernel`rw_verify_area+0x61
32 kernel`vfs_write+0x16d
33 kernel`sys_write+0x4a
34 kernel`sys_rt_sigprocmask+0x84
35 kernel`system_call_fastpath+0x16
36 igb`igb_tx_ring_send+0x33
37 mac`mac_hwring_tx+0x1d
38 mac`mac_tx_send+0x5dc
39 mac`mac_tx_single_ring_mode+0x6e
mac`mac_tx+0xda mac`mac_tx+0xda mac`mac_tx+0xda
dld`str_mdata_fastpath_put+0x53 dld`str_mdata_fastpath_put+0x53 dld`str_mdata_fastpath_put+0x53
ip`ip_xmit+0x82d ip`ip_xmit+0x82d ip`ip_xmit+0x82d
ip`ire_send_wire_v4+0x3e9 ip`ire_send_wire_v4+0x3e9 ip`ire_send_wire_v4+0x3e9
ip`conn_ip_output+0x190 ip`conn_ip_output+0x190 ip`conn_ip_output+0x190
ip`tcp_send_data+0x59 ip`tcp_send_data+0x59 ip`tcp_send_data+0x59
ip`tcp_output+0x58c ip`tcp_output+0x58c ip`tcp_output+0x58c
ip`squeue_enter+0x426 ip`squeue_enter+0x426 ip`squeue_enter+0x426
ip`tcp_sendmsg+0x14f ip`tcp_sendmsg+0x14f ip`tcp_sendmsg+0x14f
sockfs`so_sendmsg+0x26b sockfs`so_sendmsg+0x26b sockfs`so_sendmsg+0x26b
sockfs`socket_sendmsg+0x48 sockfs`socket_sendmsg+0x48 sockfs`socket_sendmsg+0x48
sockfs`socket_vop_write+0x6c sockfs`socket_vop_write+0x6c sockfs`socket_vop_write+0x6c
genunix`fop_write+0x8b genunix`fop_write+0x8b genunix`fop_write+0x8b
genunix`write+0x250 genunix`write+0x250 genunix`write+0x250
genunix`write32+0x1e genunix`write32+0x1e genunix`write32+0x1e
unix`_sys_sysenter_post_swapgs+0x14 unix`_sys_sysenter_post_swapgs+0x14 unix`_sys_sysenter_post_swapgs+0x149
Skips stepping through 39 functions required when Fedora is running on KVM/qemu
Note that a Joyent Zone is exactly the same as “Bare Metal”
Three identical Apache Hadoop 1.0.4 clusters were provisioned on Joyent
infrastructure using the Joyent REST API and Opscode Chef.
Each cluster was tuned for optimal performance following best practices for
the TeraSort benchmark.
A custom script launches virtual machines using the Joyent API and stores information
about them in a JSON file.
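The launch-and-record step might look roughly like the sketch below. The deck does not show the script itself, so everything here is an assumption: the `launch` callable stands in for the real Joyent API call, and the machine names, roles, and record fields (`id`, `ip`, `name`, `role`) are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical launcher signature: takes a machine name and a role and
# returns a dict describing the machine created by the cloud API.
Launcher = Callable[[str, str], Dict[str, str]]


def provision_cluster(launch: Launcher, workers: int,
                      inventory_path: Path) -> List[Dict[str, str]]:
    """Launch one master plus `workers` worker machines and save them as JSON."""
    plan = [("hadoop-master-1", "master")]
    plan += [(f"hadoop-worker-{i}", "worker") for i in range(1, workers + 1)]

    machines = []
    for name, role in plan:
        info = dict(launch(name, role))  # copy so the launcher's dict is untouched
        info.update({"name": name, "role": role})
        machines.append(info)

    # The JSON file is the inventory that later steps read.
    inventory_path.write_text(json.dumps(machines, indent=2))
    return machines


def fake_launch(name: str, role: str) -> Dict[str, str]:
    """Stand-in for the real API call; returns made-up machine details."""
    return {"id": f"id-{name}", "ip": "192.0.2.10"}
```

Calling `provision_cluster(fake_launch, 3, Path("cluster.json"))` writes four records. Swapping `fake_launch` for a function that calls the real API would leave the rest unchanged.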
Each machine in the cluster is configured according to its role using
Chef cookbooks.
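One way to drive role-based configuration is to map each machine's role to a Chef run list and build a `knife bootstrap` command from it. This is only a sketch: the role names and the record fields (`name`, `ip`, `role`) are hypothetical, since the deck does not list the actual cookbooks. The commands are built but not executed.

```python
import shlex
from typing import Dict, List

# Hypothetical Chef role names; the deck does not name the real ones.
RUN_LISTS = {
    "master": "role[hadoop-master]",
    "worker": "role[hadoop-worker]",
}


def bootstrap_command(machine: Dict[str, str]) -> List[str]:
    """Build a `knife bootstrap` argument list for one machine record."""
    return [
        "knife", "bootstrap", machine["ip"],
        "--node-name", machine["name"],
        "--run-list", RUN_LISTS[machine["role"]],
    ]


def render(machines: List[Dict[str, str]]) -> List[str]:
    """Return shell-quoted command lines, one per machine, without running them."""
    return [shlex.join(bootstrap_command(m)) for m in machines]
```

Keeping the role-to-run-list mapping in one table means that adding a new node type only requires a new entry, not new code.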
As part of the TeraSort benchmark, a dataset is generated using the TeraGen utility
included in Apache Hadoop.
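TeraGen takes a row count rather than a byte size, and each row is 100 bytes, so the size has to be converted first. The sketch below does that arithmetic and builds the command line. It assumes a decimal terabyte (10^12 bytes), and the jar path and output directory are placeholders.

```python
from typing import List

ROW_BYTES = 100  # TeraGen writes fixed-size 100-byte rows


def teragen_rows(dataset_bytes: int) -> int:
    """Number of rows TeraGen must generate for a dataset of the given size."""
    return dataset_bytes // ROW_BYTES


def teragen_command(examples_jar: str, dataset_bytes: int, out_dir: str) -> List[str]:
    """Argument list for a TeraGen run; paths are supplied by the caller."""
    rows = teragen_rows(dataset_bytes)
    return ["hadoop", "jar", examples_jar, "teragen", str(rows), out_dir]


ONE_TB = 10**12  # assumption: decimal terabyte
```

For a 1TB dataset this gives 10,000,000,000 rows: `teragen_command("hadoop-examples-1.0.4.jar", ONE_TB, "/terasort/input")`.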
On one of the nodes, a Hadoop TeraSort job is submitted using the previously
generated dataset.
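A submission helper could look like the sketch below. It assumes the Hadoop 1.x examples jar, whose `terasort` program accepts generic `-D` options ahead of its input and output directories. The jar path, directories, and reducer count are placeholders, not the values used in the benchmark.

```python
from typing import List


def terasort_command(examples_jar: str, in_dir: str, out_dir: str,
                     reduce_tasks: int) -> List[str]:
    """Argument list for a TeraSort run with an explicit reducer count."""
    if reduce_tasks < 1:
        raise ValueError("reduce_tasks must be at least 1")
    return [
        "hadoop", "jar", examples_jar, "terasort",
        # Generic options go before the positional arguments.
        f"-Dmapred.reduce.tasks={reduce_tasks}",
        in_dir, out_dir,
    ]
```

Passing the reducer count per job, rather than fixing it in the site configuration, makes it easier to compare runs with different settings.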
See: Hadoop job_201210261134_0010 on hadoop-smartos-r-1.html
The key difference between the two clusters emerged when monitoring I/O and
CPU utilization. The Ubuntu cluster spent too much time in the OS kernel while
performing I/O operations, as shown in Figure 1.
The SmartOS cluster used CPU much more efficiently and could run a larger
number of Hadoop mappers and reducers, which are key configuration parameters for Hadoop.
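To see why the number of mappers a node can run matters, it helps to estimate how many "waves" of map tasks a job needs. The sketch below assumes one map task per HDFS block and a fixed number of map slots per node (in Hadoop 1.x, `mapred.tasktracker.map.tasks.maximum`). The block size and slot counts are illustrative, not the benchmark's settings.

```python
def map_task_count(input_bytes: int, block_bytes: int) -> int:
    """Map tasks needed, assuming one task per HDFS block (ceiling division)."""
    return (input_bytes + block_bytes - 1) // block_bytes


def task_waves(tasks: int, nodes: int, slots_per_node: int) -> int:
    """Rounds of tasks needed when every slot in the cluster runs one task at a time."""
    slots = nodes * slots_per_node
    return (tasks + slots - 1) // slots


ONE_TB = 10**12              # assumption: decimal terabyte
BLOCK_128MB = 128 * 1024**2  # illustrative HDFS block size

tasks = map_task_count(ONE_TB, BLOCK_128MB)
print(tasks, task_waves(tasks, 100, 8), task_waves(tasks, 400, 8))
```

Under these assumptions, a cluster that can run more concurrent mappers per node finishes the map phase in fewer waves.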
For copies of config files and job reports, email [email protected]
1) Basic cluster configuration is key (a one-time effort for typical workloads)
DATA DISK SCALING
COMPRESSION
JVM REUSE POLICY
HDFS BLOCK SIZE
MAP-SIDE SPILLS
COPY/SHUFFLE PHASE TUNING
REDUCE-SIDE SPILLS
2) Tune the number of map and reduce tasks appropriately
3) Consider GPU for some workloads
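Several of the tuning areas listed under item 1 correspond to Hadoop 1.x properties in mapred-site.xml. The sketch below groups a few of those property names by area and renders a configuration fragment. The values and directory paths are placeholders for illustration only, not the presenters' settings. HDFS block size (`dfs.block.size`) and data directories (`dfs.data.dir`) belong in hdfs-site.xml and are left out here.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hadoop 1.x property names grouped by tuning area.
# All values are placeholders, not recommendations from the deck.
TUNING: Dict[str, Dict[str, str]] = {
    "data disk scaling": {"mapred.local.dir": "/data1/mapred,/data2/mapred"},
    "compression": {"mapred.compress.map.output": "true"},
    "jvm reuse policy": {"mapred.job.reuse.jvm.num.tasks": "-1"},
    "map-side spills": {"io.sort.mb": "256", "io.sort.spill.percent": "0.9"},
    "copy/shuffle phase": {"mapred.reduce.parallel.copies": "20"},
    "reduce-side spills": {"mapred.job.reduce.input.buffer.percent": "0.5"},
}


def render_site_xml(groups: Dict[str, Dict[str, str]]) -> str:
    """Render all properties as a Hadoop <configuration> XML fragment."""
    root = ET.Element("configuration")
    for props in groups.values():
        for name, value in props.items():
            prop = ET.SubElement(root, "property")
            ET.SubElement(prop, "name").text = name
            ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

Keeping the settings in a table like this makes it easy to generate one configuration per experiment and to record which values were used for each run.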
• Forthcoming in October
• Includes cloud performance
• Co-author DTrace book
• More here on his techniques:
• http://dtrace.org/blogs/brendan/
Thank you!
Ben Wen: [email protected]
Renat Khasanshyn: [email protected]
@renatco (650) 395-7002