Data Center Networking
CS 6250: Computer Networking, Fall 2011
Cloud Computing
• Elastic resources
– Expand and contract resources
– Pay-per-use
– Infrastructure on demand
• Multi-tenancy
– Multiple independent users
– Security and resource isolation
– Amortize the cost of the (shared) infrastructure
• Flexible service management
– Resiliency: isolate failures of servers and storage
– Workload movement: move work to other locations
Trend of Data Centers
By J. Nicholas Hoover, InformationWeek, June 17, 2008
Data centers will grow larger and larger in the future cloud-computing era to benefit from economies of scale: from tens of thousands to hundreds of thousands of servers (e.g., Google's roughly 200 million Euro mega data center in Finland; see link below).
Most important things in data center management:
– Economies of scale
– High utilization of equipment
– Maximize revenue
– Amortize administration cost
– Low power consumption (http://www.energystar.gov/ia/partners/prod_development/downloads/EPA_Datacenter_Report_Congress_Final1.pdf)
http://gigaom.com/cloud/google-builds-megadata-center-in-finland/
http://www.datacenterknowledge.com
Cloud Service Models
• Software as a Service
– Provider licenses applications to users as a service
– e.g., customer relationship management, email, …
– Avoids costs of installation, maintenance, patches, …
• Platform as a Service
– Provider offers a software platform for building applications
– e.g., Google's App Engine
– Avoids worrying about scalability of the platform
• Infrastructure as a Service
– Provider offers raw computing, storage, and network
– e.g., Amazon's Elastic Compute Cloud (EC2)
– Avoids buying servers and estimating resource needs
Multi-Tier Applications
• Applications consist of tasks
– Many separate components
– Running on different machines
• Commodity computers
– Many general-purpose computers
– Not one big mainframe
– Easier scaling
[Diagram: a front-end server feeds a tree of aggregators, which fan out to many workers]
Enabling Technology: Virtualization
• Multiple virtual machines on one physical machine
• Applications run unmodified, as on a real machine
• VMs can migrate from one computer to another
Status Quo: Virtual Switch in Server
Top-of-Rack Architecture
• Rack of servers
– Commodity servers
– And a top-of-rack switch
• Modular design
– Preconfigured racks
– Power, network, and storage cabling
• Aggregate to the next level
Modularity
• Containers
• Many containers
Data Center Challenges
• Traffic load balancing
• Support for VM migration
• Achieving bisection bandwidth
• Power savings / cooling
• Network management (provisioning)
• Security (dealing with multiple tenants)
Data Center Costs (Monthly Costs)
• Servers: 45%
– CPU, memory, disk
• Infrastructure: 25%
– UPS, cooling, power distribution
• Power draw: 15%
– Electrical utility costs
• Network: 15%
– Switches, links, transit
http://perspectives.mvdirona.com/2008/11/28/CostOfPowerInLargeScaleDataCenters.aspx
Common Data Center Topology
[Diagram: the Internet connects to the data center through Layer-3 core routers, then Layer-2/3 aggregation switches, then Layer-2 access switches, down to the servers]
Data Center Network Topology
[Diagram: two core routers (CR) connect to access routers (AR), each of which connects to pairs of Ethernet switches (S) aggregating racks of application servers (A); ~1,000 servers/pod]

Key:
• CR = Core Router
• AR = Access Router
• S = Ethernet Switch
• A = Rack of app. servers
Requirements for Future Data Centers
• To catch up with the trend of mega data centers, DCN technology should meet the following requirements:
– High scalability
– Transparent VM migration (high agility)
– Easy deployment requiring little human administration
– Efficient communication
– Loop-free forwarding
– Fault tolerance
• Current DCN technology cannot meet these requirements.
– Layer-3 protocols cannot support transparent VM migration.
– Current Layer-2 protocols are not scalable, due to the size of the forwarding table and native broadcasting for address resolution.
Problems with Common Topologies
• Single point of failure
• Oversubscription of links higher up in the topology
• Tradeoff between cost and provisioning
Capacity Mismatch
[Diagram: the same CR/AR/S hierarchy, annotated with oversubscription ratios of ~5:1 at the top-of-rack level, ~40:1 at aggregation, and ~200:1 at the core]
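The ratios above compound as traffic climbs the hierarchy: each stage's oversubscription multiplies with the stages below it. As a sanity check, here is a minimal arithmetic sketch; the link counts and speeds are illustrative assumptions chosen to reproduce the slide's figures, not numbers from the slide itself:

```python
def oversubscription(downstream_gbps, uplink_gbps):
    # Worst-case ratio of offered load from below to capacity going up.
    return downstream_gbps / uplink_gbps

# Hypothetical stage sizes chosen to reproduce the slide's ratios:
tor = oversubscription(40, 8)    # 40 x 1G servers over 8G of ToR uplinks -> 5:1
aggr = oversubscription(64, 8)   # 8 such ToRs (8 x 8G) over 8G to the core -> 8:1
end_to_end = tor * aggr          # stage ratios multiply: ~40:1 at aggregation
core = end_to_end * 5            # one more 5:1 stage -> ~200:1 at the core
print(tor, end_to_end, core)     # 5.0 40.0 200.0
```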
Data-Center Routing
[Diagram: the conventional topology split into a DC-Layer 3 part (core routers and access routers, reached from the Internet) and a DC-Layer 2 part (Ethernet switches and racks of servers); ~1,000 servers/pod == one IP subnet]

Key:
• CR = Core Router (L3)
• AR = Access Router (L3)
• S = Ethernet Switch (L2)
• A = Rack of app. servers
Reminder: Layer 2 vs. Layer 3
• Ethernet switching (layer 2)
– Cheaper switch equipment
– Fixed addresses and auto-configuration
– Seamless mobility, migration, and failover
• IP routing (layer 3)
– Scalability through hierarchical addressing
– Efficiency through shortest-path routing
– Multipath routing through equal-cost multipath
• So, like in enterprises…
– Data centers often connect layer-2 islands by IP routers
Need for Layer 2
• Certain monitoring apps require servers with the same role to be on the same VLAN
• Using the same IP on dual-homed servers
• Allows organic growth of server farms
• Migration is easier
Review of Layer 2 & Layer 3
• Layer 2
– One spanning tree for the entire network
• Prevents loops
• Ignores alternate paths
• Layer 3
– Shortest-path routing between source and destination
– Best-effort delivery
FAT Tree-Based Solution
• Connect end hosts together using a fat-tree topology
– Infrastructure consists of cheap devices
• Each port supports the same speed as an end host
– All devices can transmit at line speed if packets are distributed along existing paths
– A k-port fat tree can support k³/4 hosts
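The k³/4 figure falls out of the standard fat-tree construction: k pods, each with k/2 edge and k/2 aggregation switches, k/2 hosts per edge switch, and (k/2)² core switches. A small sketch of the arithmetic:

```python
def fat_tree_sizes(k):
    # k-port switches, standard fat-tree construction:
    # k pods, each with k/2 edge and k/2 aggregation switches;
    # each edge switch connects k/2 hosts; (k/2)^2 core switches.
    assert k % 2 == 0
    hosts = k * (k // 2) * (k // 2)               # = k**3 // 4
    switches = (k // 2) ** 2 + k * (k // 2 + k // 2)
    return hosts, switches

print(fat_tree_sizes(4))     # (16, 20): the classic small example
print(fat_tree_sizes(48))    # (27648, 2880): a full DC from 48-port switches
```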
Fat-Tree Topology

[Figure: a fat-tree topology]
Problems with a Vanilla Fat-tree
• Layer 3 will only use one of the existing equal cost paths
• Packet re-ordering occurs if layer 3 blindly takes advantage of path diversity
Modified Fat Tree
• Enforce special addressing scheme in DC– Allows host attached to same switch to route only
through switch– Allows inter-pod traffic to stay within pod– unused.PodNumber.switchnumber.Endhost
• Use two level look-ups to distribute traffic and maintain packet ordering.
Two-Level Lookups
• First level is a prefix lookup
– Used to route down the topology to the end host
• Second level is a suffix lookup
– Used to route up towards the core
– Diffuses and spreads out traffic
– Maintains packet ordering by using the same ports for the same end host
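A toy sketch of the two-level table, assuming the pod/switch/host-style addressing described above; the concrete prefixes, port numbers, and use of Python dicts are illustrative, not from the paper:

```python
import ipaddress

# First level: terminating /24 prefixes for subnets below this switch (route DOWN).
prefix_table = {
    ipaddress.ip_network("10.2.0.0/24"): 0,   # port 0 -> local subnet 0
    ipaddress.ip_network("10.2.1.0/24"): 1,   # port 1 -> local subnet 1
}
# Second level: host-ID suffixes for all other destinations (route UP).
# Keying on the host octet keeps every packet for one end host on one uplink,
# spreading traffic across uplinks while preserving per-destination ordering.
suffix_table = {0: 2, 1: 3, 2: 2, 3: 3}       # host ID -> uplink port

def lookup(dst):
    addr = ipaddress.ip_address(dst)
    for net, port in prefix_table.items():    # level 1: prefix match
        if addr in net:
            return port
    return suffix_table[int(addr) & 0xFF]     # level 2: suffix match

print(lookup("10.2.1.7"))    # routes down on port 1
print(lookup("10.4.0.2"))    # routes up on the uplink chosen by host suffix 2
```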
Diffusion Optimizations
• Flow classification
– Eliminates local congestion
– Assigns traffic to ports on a per-flow basis instead of a per-host basis
• Flow scheduling
– Eliminates global congestion
– Prevents long-lived flows from sharing the same links
– Assigns long-lived flows to different links
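Per-flow (rather than per-host) assignment can be sketched as hashing a flow's 5-tuple to an uplink, so every packet of one flow takes one path while different flows spread out; the hash function here is an illustrative stand-in for a switch's hardware hash:

```python
import hashlib

def port_for_flow(src_ip, dst_ip, src_port, dst_port, proto, num_uplinks):
    # Hash the 5-tuple; real switches use a hardware hash, sha256 is a stand-in.
    key = f"{src_ip},{dst_ip},{src_port},{dst_port},{proto}".encode()
    return hashlib.sha256(key).digest()[0] % num_uplinks

# Every packet of one flow maps to the same uplink (no reordering) ...
a = port_for_flow("10.0.0.1", "10.4.0.2", 5000, 80, "tcp", 4)
b = port_for_flow("10.0.0.1", "10.4.0.2", 5000, 80, "tcp", 4)
assert a == b
# ... while different flows between the same pair of hosts can diverge.
c = port_for_flow("10.0.0.1", "10.4.0.2", 5001, 80, "tcp", 4)
```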
Drawbacks
• No inherent support for VLAN traffic
• Data center size is fixed
• Ignores connectivity to the Internet
• Wastes address space
– Requires NAT at the border
Data Center Traffic Engineering
Challenges and Opportunities
Wide-Area Network
[Diagram: clients on the Internet are directed to one of several data centers via DNS-based site selection; each data center's servers sit behind a router]
Wide-Area Network: Ingress Proxies
[Diagram: client traffic enters through ingress proxies in front of the data centers' routers and servers]
Traffic Engineering Challenges
• Scale
– Many switches, hosts, and virtual machines
• Churn
– Large number of component failures
– Virtual Machine (VM) migration
• Traffic characteristics
– High traffic volume and dense traffic matrix
– Volatile, unpredictable traffic patterns
• Performance requirements
– Delay-sensitive applications
– Resource isolation between tenants
Traffic Engineering Opportunities
• Efficient network
– Low propagation delay and high capacity
• Specialized topology
– Fat tree, Clos network, etc.
– Opportunities for hierarchical addressing
• Control over both network and hosts
– Joint optimization of routing and server placement
– Can move network functionality into the end host
• Flexible movement of workload
– Services replicated at multiple servers and data centers
– Virtual Machine (VM) migration
PortLand: Main Idea
Scenarios illustrated: adding a new host; transferring a packet.
Key features:
– Layer-2 protocol based on a tree topology
– A PMAC (pseudo MAC) encodes a host's position information
– Data forwarding proceeds based on PMACs
– Edge switches are responsible for the mapping between PMAC and AMAC (actual MAC)
– The fabric manager is responsible for address resolution
– Edge switches make PMACs invisible to end hosts
– Each switch node can identify its position by itself
– The fabric manager keeps information on the overall topology; when a fault occurs, it notifies the affected nodes
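A sketch of the PMAC idea: the 48-bit pseudo MAC packs pod, position, port, and VM id (the pod.position.port.vmid layout from the PortLand paper), so a host's address directly encodes its location in the tree; the concrete field values below are illustrative:

```python
# PortLand-style PMAC: pod(16 bits).position(8).port(8).vmid(16) = 48 bits.
def encode_pmac(pod, position, port, vmid):
    value = (pod << 32) | (position << 24) | (port << 16) | vmid
    return ":".join(f"{b:02x}" for b in value.to_bytes(6, "big"))

def decode_pmac(pmac):
    value = int(pmac.replace(":", ""), 16)
    return ((value >> 32) & 0xFFFF, (value >> 24) & 0xFF,
            (value >> 16) & 0xFF, value & 0xFFFF)

pmac = encode_pmac(pod=2, position=1, port=3, vmid=7)
print(pmac)                                # 00:02:01:03:00:07
assert decode_pmac(pmac) == (2, 1, 3, 7)   # position is recoverable from the MAC
```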
Questions from the Discussion Board
• Question about the Fabric Manager
– How to make the Fabric Manager robust?
– How to build a scalable Fabric Manager?
– Redundant deployment or a clustered Fabric Manager could be a solution.
• Question about the realism of the baseline tree topology
– Is the tree topology common in the real world?
– Yes, the multi-rooted tree topology has been a traditional topology in data centers. [A Scalable, Commodity Data Center Network Architecture, Mohammad Al-Fares et al., SIGCOMM '08]
• Discussion points
– Is PortLand applicable to other topologies? The idea of central ARP management could be applicable; to solve the forwarding-loop problem, a TRILL-header-like method would be necessary. The benefit of PMAC? Without it, a larger forwarding table would be required.
• Question about the benefits of VM migration
– VM migration helps reduce traffic going through aggregation/core switches.
– What about changes in user requirements?
– What about power consumption?
• Etc.
– Feasibility of mimicking PortLand in a layer-3 protocol: how about using pseudo IPs?
– Delay to boot up the whole data center
Status Quo: Conventional DC Network
Reference – “Data Center: Load balancing Data Center Services”, Cisco 2004
[Diagram (repeated from above): core routers (CR) and access routers (AR) over Ethernet switches (S) and racks of app servers (A), split into a DC-Layer 3 and a DC-Layer 2 part; ~1,000 servers/pod == one IP subnet]
Conventional DC Network Problems
[Diagram: the conventional hierarchy annotated with oversubscription ratios of ~5:1, ~40:1, and ~200:1]

• Dependence on high-cost proprietary routers
• Extremely limited server-to-server capacity
And More Problems …

[Diagram: the same hierarchy partitioned into IP subnets (VLANs) #1 and #2, with ~200:1 oversubscription]

• Resource fragmentation, significantly lowering utilization (and cost-efficiency)
• Complicated manual L2/L3 re-configuration
All We Need is Just a Huge L2 Switch,
or an Abstraction of One
[Diagram: the conventional CR/AR/S hierarchy replaced by the abstraction of one huge L2 switch connecting all servers]
VL2 Approach
• Layer-2 based, using future commodity switches
• Two-level hierarchy:
– Access switches (top of rack)
– Load-balancing switches
• Eliminate spanning tree
– Flat routing
– Allows the network to take advantage of path diversity
• Prevent MAC address learning
– 4D architecture to distribute data-plane information
– ToR switches: only need to learn addresses of the intermediate switches
– Core switches: learn addresses of the ToR switches
• Support efficient grouping of hosts (VLAN replacement)
VL2

[Figure: VL2 architecture]
VL2 Components
• Top-of-Rack switch
– Aggregates traffic from 20 end hosts in a rack
– Performs IP-to-MAC translation
• Intermediate switch
– Disperses traffic
– Balances traffic among switches
– Used for Valiant load balancing
• Decision element
– Places routes in switches
– Maintains a directory service of IP-to-MAC mappings
• End host
– Performs IP-to-MAC lookups
Routing in VL2
• The end host checks its flow cache for the MAC of the flow
– If not found, it asks an agent to resolve it
– The agent returns a list of MACs for the server and MACs for intermediate switches
• Traffic is sent to the top-of-rack switch, triple-encapsulated
• Traffic is sent to an intermediate switch
• Traffic is sent to the top-of-rack switch of the destination
The Illusion of a Huge L2 Switch
1. L2 semantics
2. Uniform high capacity
3. Performance isolation

[Diagram: all servers attached to the abstraction of one huge L2 switch]

Objectives and Solutions
• 1. Layer-2 semantics – Approach: employ flat addressing – Solution: name-location separation & resolution service
• 2. Uniform high capacity between servers – Approach: guarantee bandwidth for hose-model traffic – Solution: flow-based random traffic indirection (Valiant LB)
• 3. Performance isolation – Approach: enforce hose model using existing mechanisms only – Solution: TCP
Name/Location Separation
[Diagram: servers x, y, z use flat names; switches run link-state routing and maintain only switch-level topology. A packet to y is encapsulated with locator ToR3; the directory service answers lookups with mappings x → ToR2, y → ToR3, z → ToR4, and after z migrates, z → ToR3.]

• Copes with host churn with very little overhead
• Allows use of low-cost switches
• Protects the network and hosts from host-state churn
• Obviates host and switch reconfiguration
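The mechanism can be sketched as a lookup table plus encapsulation; the server names and ToR identifiers below mirror the diagram and are illustrative:

```python
# VL2-style name-location separation: a directory service maps a server's
# flat application name to its current ToR locator.
directory = {"x": "ToR2", "y": "ToR3", "z": "ToR4"}

def send(dst, payload):
    # The sender's agent queries the directory, then encapsulates the packet
    # with the destination's ToR as the outer (routable) address.
    return {"outer_dst": directory[dst], "inner_dst": dst, "payload": payload}

packet = send("z", "hello")
assert packet["outer_dst"] == "ToR4"

# VM migration only updates the directory entry; z keeps its flat name,
# so neither hosts nor switches need reconfiguration.
directory["z"] = "ToR3"
assert send("z", "hello")["outer_dst"] == "ToR3"
```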
Clos Network Topology
[Diagram: Clos topology with intermediate (Int) switches at the top, K aggregation (Aggr) switches with D ports each, and ToR switches each connecting 20 servers, for 20·(DK/4) servers in total. Offers huge aggregate capacity and multipath at modest cost.]

D (# of 10G ports) → Max DC size (# of servers):
48 → 11,520
96 → 46,080
144 → 103,680
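The sizing table follows directly from 20 servers per ToR and the 20·(DK/4) formula, taking K = D aggregation switches; a quick check:

```python
def max_servers(d_ports, k_aggr=None):
    # 20 servers per ToR; K aggregation switches of D ports each; K = D here.
    k = d_ports if k_aggr is None else k_aggr
    return 20 * (d_ports * k) // 4

for d in (48, 96, 144):
    print(d, max_servers(d))
# 48 11520
# 96 46080
# 144 103680
```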
Valiant Load Balancing: Indirection
[Diagram: a flow from x to y is first forwarded to a randomly chosen intermediate switch via an anycast address (IANY), then down to the destination's ToR (T1…T6); distinct links carry up paths and down paths]

• Copes with arbitrary traffic matrices with very little overhead
• [ECMP + IP Anycast]
– Harness huge bisection bandwidth
– Obviate esoteric traffic engineering or optimization
– Ensure robustness to failures
– Work with switch mechanisms available today
• Equal-cost multipath forwarding: must spread traffic, and must ensure destination independence
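Valiant load balancing can be sketched as picking a random intermediate switch per flow, independent of the destination, so any traffic matrix is spread evenly; the switch names and flow keys here are illustrative:

```python
import random

intermediates = ["Int1", "Int2", "Int3"]   # top-tier switches behind one anycast address
flow_cache = {}                            # flow id -> chosen intermediate

def route(flow_id):
    # Randomize per flow, not per packet, so one flow stays on one path
    # (avoiding reordering) while the set of flows spreads evenly.
    if flow_id not in flow_cache:
        flow_cache[flow_id] = random.choice(intermediates)
    return flow_cache[flow_id]

first = route(("x", "y", 5000, 80, "tcp"))
assert route(("x", "y", 5000, 80, "tcp")) == first   # sticky per flow
assert all(route(("x", "z", p, 80, "tcp")) in intermediates for p in range(50))
```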
Properties of Desired Solutions
• Backwards compatible with existing infrastructure
– No changes in applications
– Support for layer 2 (Ethernet)
• Cost-effective
– Low power consumption & heat emission
– Cheap infrastructure
• Allows host communication at line speed
• Zero configuration
• No loops
• Fault-tolerant
Research Questions
• What topology to use in data centers?– Reducing wiring complexity
– Achieving high bisection bandwidth
– Exploiting capabilities of optics and wireless
• Routing architecture?– Flat layer-2 network vs. hybrid switch/router
– Flat vs. hierarchical addressing
• How to perform traffic engineering?– Over-engineering vs. adapting to load
– Server selection, VM placement, or optimizing routing
• Virtualization of NICs, servers, switches, …
Research Questions
• Rethinking TCP congestion control?
– Low propagation delay and high bandwidth– “Incast” problem leading to bursty packet loss
• Division of labor for TE, access control, …– VM, hypervisor, ToR, and core switches/routers
• Reducing energy consumption– Better load balancing vs. selective shutting down
• Wide-area traffic engineering– Selecting the least-loaded or closest data center
• Security– Preventing information leakage and attacks