048866: Packet Switch Architectures
Dr. Isaac Keslassy, Electrical Engineering, Technion
http://comnet.technion.ac.il/~isaac/
Scaling
Spring 2006 048866 – Packet Switch Architectures 2
Achieving 100% throughput
1. Switch model
2. Uniform traffic. Technique: Uniform schedule (easy)
3. Non-uniform traffic, but known traffic matrix. Technique: Non-uniform schedule (Birkhoff-von Neumann)
4. Unknown traffic matrix. Technique: Lyapunov functions (MWM)
5. Faster scheduling algorithms. Techniques: Speedup (maximal matchings); Memory and randomization (Tassiulas); Twist architecture (buffered crossbar)
6. Accelerated scheduling algorithms. Techniques: Pipelining; Envelopes; Slicing
7. No scheduling algorithm. Technique: Load-balanced router
Outline
Up until now, we have focused on high performance packet switches with:
1. A crossbar switching fabric,
2. Input queues (and possibly output queues as well),
3. Virtual output queues, and
4. A centralized arbitration/scheduling algorithm.
Today we’ll talk about the implementation of the crossbar switch fabric itself. How is it built, how does it scale, and what limits its capacity?
Crossbar switch: Limiting factors

1. N^2 crosspoints per chip, or N multiplexors of size N-to-1.
2. It’s not obvious how to build a crossbar from multiple chips.
3. “I/O” capacity per chip. State of the art: about 300 pins, each operating at 3.125 Gb/s, i.e. ~1 Tb/s per chip. Only about 1/3 to 1/2 of this capacity is available in practice because of overhead and speedup. Crossbar chips today are limited by “I/O” capacity.
Scaling
1. Scaling Line Rate: Bit-slicing; Time-slicing
2. Scaling Time (Scheduling Speed): Time-slicing; Envelopes; Frames
3. Scaling Number of Ports: Naïve approach; Clos networks; Benes networks
Bit-sliced parallelism
Cell is “striped” across k identical planes.
The scheduler makes the same decision for all slices.
However, this doesn’t decrease the scheduling speed.
Other problem(s)?

[Figure: each input linecard splits every cell into k slices, one per plane 1…k, under a single scheduler]
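The striping idea can be sketched as follows (hypothetical helper functions, illustration only; real slicing happens in linecard hardware):

```python
def stripe(cell: bytes, k: int) -> list[bytes]:
    """Stripe one cell across k identical switching planes.

    Each plane carries 1/k of the cell, so each plane's crossbar and
    serial links only need to run at 1/k of the line rate."""
    if len(cell) % k:
        cell += b"\x00" * (k - len(cell) % k)  # pad to a multiple of k
    w = len(cell) // k
    return [cell[i * w:(i + 1) * w] for i in range(k)]

def reassemble(slices: list[bytes]) -> bytes:
    """At the output linecard, concatenate the k slices back into a cell."""
    return b"".join(slices)
```

This is why bit-slicing scales the line rate but not the scheduling speed: the scheduler still makes one decision per cell time, applied to all k planes.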
Time-sliced parallelism
A cell is carried whole by one plane and takes k cell times.
The centralized scheduler is unchanged: it works for each slice in turn.
Problem: same scheduling speed.

[Figure: each input linecard sends successive cells to planes 1…k, under a single scheduler]
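A minimal sketch of the bookkeeping, assuming a round-robin dispatch rule (the rule is our assumption; the slides only say each cell is carried by one plane):

```python
def dispatch(num_cells: int, k: int) -> list[int]:
    """Time-sliced parallelism: cell j goes whole to plane (j mod k) and
    occupies it for k cell times. By the time plane (j mod k) is needed
    again, it has just freed up, so no cell ever waits for a plane.
    Returns the time at which each plane frees up."""
    free_at = [0] * k                 # cell time when each plane frees up
    for j in range(num_cells):
        p = j % k
        assert free_at[p] <= j        # the plane is free when needed
        free_at[p] = j + k            # busy for the next k cell times
    return free_at
```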
Scaling
1. Scaling Line Rate: Bit-slicing; Time-slicing
2. Scaling Time (Scheduling Speed): Time-slicing; Envelopes; Frames
3. Scaling Number of Ports: Naïve approach; Clos networks; Benes networks
Time-sliced parallelism with parallel scheduling

Now scheduling is distributed to each slice.
Each scheduler has k cell times to make its decision.
Problem(s)?

[Figure: planes 1…k, each with its own slow scheduler]
Envelopes
Envelopes of k cells [Kar et al., 2000]
Problem: “Should I stay or should I go now?”
- Waiting → starvation (“Waiting for Godot”)
- Timeouts → loss of throughput

[Figure: at each VOQ, the linecard aggregates k cells into an envelope for the slow scheduler]
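The stay-or-go tension can be sketched as a VOQ that closes an envelope either when it is full or when a timeout expires (class and rule are illustrative, not from Kar et al.):

```python
class EnvelopeVOQ:
    """Sketch of envelope aggregation at one VOQ.

    The VOQ hands the scheduler an envelope of k cells, so the scheduler
    decides only once per k cell times. A timeout bounds the waiting of a
    partially filled envelope (avoiding starvation), at the cost of
    shipping partially empty envelopes (losing throughput)."""

    def __init__(self, k: int, timeout: int):
        self.k, self.timeout = k, timeout
        self.cells: list = []
        self.age = 0   # cell times since cells started waiting

    def enqueue(self, cell) -> None:
        self.cells.append(cell)

    def tick(self):
        """Called once per cell time; returns an envelope when ready."""
        if not self.cells:
            self.age = 0
            return None
        self.age += 1
        if len(self.cells) >= self.k or self.age >= self.timeout:
            env = self.cells[:self.k]
            del self.cells[:self.k]
            self.age = 0
            return env
        return None
```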
Frames for scheduling
The slow scheduler simply takes its decision every k cell times and holds it for k cell times.
Often associated with pipelining.
Note: a pipelined MWM is still stable (intuitively: the weight doesn’t change much).
Possible problem(s)?

[Figure: at each VOQ, cells are grouped into frames of k cells for the slow scheduler]
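A sketch of the frame discipline, with `scheduler` standing in for any matching algorithm (e.g. MWM); the generator is hypothetical scaffolding:

```python
def framed_schedule(scheduler, voq_state, num_slots: int, k: int):
    """Frame scheduling: run the (slow) scheduler only once every k cell
    times and hold its matching for the whole frame, so the scheduler
    gets k cell times per decision."""
    match = None
    for t in range(num_slots):
        if t % k == 0:               # new frame: one decision per k slots
            match = scheduler(voq_state)
        yield match                  # the same matching is reused k times
```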
Scaling a crossbar
Conclusion: Scaling the line rate is relatively straightforward (although the chip count and power may become a problem).
Scaling the scheduling decision is more difficult, and often comes at the expense of packet delay.
What if we want to increase the number of ports?
Can we build a crossbar-equivalent from multiple stages of smaller crossbars?
If so, what properties should it have?
Scaling
1. Scaling Line Rate: Bit-slicing; Time-slicing
2. Scaling Time (Scheduling Speed): Time-slicing; Envelopes; Frames
3. Scaling Number of Ports: Naïve approach; Clos networks; Benes networks
Scaling number of outputs: Naïve approach

Building block: 16x16 crossbar switch. Eight inputs and eight outputs required!

[Figure: 4 inputs, 4 outputs]
3-stage Clos Network

[Figure: m first-stage switches of size n × k, k middle-stage switches of size m × m, and m third-stage switches of size k × n, with N = n × m inputs and outputs, and k ≥ n]
With k = n, is a Clos network non-blocking like a crossbar?
Consider the example: the scheduler chooses to match (1,1), (2,4), (3,3), (4,2).
With k = n, is a Clos network non-blocking like a crossbar?

Consider the example: the scheduler chooses to match (1,1), (2,2), (4,4), (5,3), …
By rearranging existing matches, the new connections could be added.
Q: Is this Clos network “rearrangeably non-blocking”?
With k = n, a Clos network is rearrangeably non-blocking

Route matching is equivalent to edge-coloring in a bipartite multigraph; colors correspond to middle-stage switches.
Each vertex corresponds to an n × k or k × n switch, and no two edges at a vertex may be colored the same.
Example: (1,1), (2,4), (3,3), (4,2)
König’s theorem: a bipartite multigraph with maximum degree D can be edge-colored with D colors. (Remember the Birkhoff-von Neumann decomposition theorem.)
Therefore, if k = n, a Clos network is rearrangeably non-blocking (and can therefore perform any permutation).
How complex is the rearrangement?
Method 1: Find a maximum size bipartite matching for each of the D colors in turn, O(DN^2.5). Why does it work?
Method 2: Partition the graph into Euler sets, O(N log D) [Cole et al. ’00]
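Method 1 can be sketched as below. For simplicity this uses a plain augmenting-path matching, O(V·E), rather than the Hopcroft-Karp algorithm the O(DN^2.5) bound assumes. It works because peeling a perfect matching off a d-regular bipartite multigraph leaves a (d−1)-regular multigraph, so repeating d times colors every edge:

```python
def max_matching(adj, n_left, n_right):
    """Maximum bipartite matching by augmenting paths.
    adj[u] = set of right-side neighbors of left vertex u."""
    match_r = [-1] * n_right          # match_r[v] = matched left vertex
    def augment(u, seen):
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if match_r[v] == -1 or augment(match_r[v], seen):
                    match_r[v] = u
                    return True
        return False
    for u in range(n_left):
        augment(u, set())
    return match_r

def color_edges(edges, n_left, n_right, d):
    """Color the edges of a d-regular bipartite multigraph with d colors
    by peeling off one perfect matching per color; each color class is
    the traffic routed through one middle-stage switch."""
    remaining = list(enumerate(edges))  # (edge index, (u, v))
    colors = [None] * len(edges)
    for c in range(d):
        adj = [set() for _ in range(n_left)]
        ids = {}                        # (u, v) -> indices of parallel edges
        for i, (u, v) in remaining:
            adj[u].add(v)
            ids.setdefault((u, v), []).append(i)
        match_r = max_matching(adj, n_left, n_right)
        matched = set()
        for v, u in enumerate(match_r):
            if u != -1:
                i = ids[(u, v)].pop()   # color one of the parallel edges
                colors[i] = c
                matched.add(i)
        remaining = [e for e in remaining if e[0] not in matched]
    return colors
```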
Euler partition of a graph
Euler partition of graph G:
1. Each odd-degree vertex is at the end of one open path.
2. Each even-degree vertex is at the end of no open path.
Euler split of a graph
Euler split of G into G1 and G2:
1. Scan each path in an Euler partition.
2. Place alternate edges into G1 and G2.

[Figure: G split into G1 and G2]
Edge-Coloring using Euler sets
Assume for simplicity that the graph is regular (all vertices have the same degree D) and that D = 2^i.
Perform i rounds of “Euler splits” and 1-color each resulting graph. This is log D rounds, each of total cost O(E).
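A sketch of the Euler-split coloring for the regular, D = 2^i case described above (edge indices and helper names are ours):

```python
def euler_split(edges, n_left, n_right):
    """Walk the trails of an Euler partition of a bipartite multigraph,
    placing alternate edges into two halves. When every vertex has even
    degree D, every trail is a circuit and each half is D/2-regular.
    Returns two lists of edge indices."""
    adj_l = [[] for _ in range(n_left)]   # stacks of (neighbor, edge index)
    adj_r = [[] for _ in range(n_right)]
    for i, (u, v) in enumerate(edges):
        adj_l[u].append((v, i))
        adj_r[v].append((u, i))
    used = [False] * len(edges)
    g1, g2 = [], []
    for start in range(n_left):
        while adj_l[start]:                   # start a new trail at `start`
            side, u, out = adj_l, start, g1
            while True:
                while side[u] and used[side[u][-1][1]]:
                    side[u].pop()             # discard already-used edges
                if not side[u]:
                    break                     # trail is stuck: circuit done
                v, i = side[u].pop()
                used[i] = True
                out.append(i)
                out = g2 if out is g1 else g1   # alternate the halves
                side = adj_r if side is adj_l else adj_l
                u = v
    return g1, g2

def edge_color(edges, n_left, n_right, D):
    """Edge-color a D-regular bipartite multigraph, D a power of two, by
    recursive Euler splits: log2(D) rounds, each of cost O(E).
    Returns D color classes (lists of edge indices)."""
    def rec(ids, d):
        if d == 1:
            return [ids]
        g1, g2 = euler_split([edges[i] for i in ids], n_left, n_right)
        return (rec([ids[j] for j in g1], d // 2) +
                rec([ids[j] for j in g2], d // 2))
    return rec(list(range(len(edges))), D)
```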
Implementation
SchedulerScheduler Route connections
Route connections
Requestgraph
Permutation Paths
Implementation
Pros:
- A rearrangeably non-blocking switch can perform any permutation.
- A cell switch is time-slotted, so all connections are rearranged every time slot anyway.

Cons:
- Rearrangement algorithms are complex (in addition to the scheduler).

Can we eliminate the need to rearrange?
Strictly non-blocking Clos Network
Clos’ Theorem: If k ≥ 2n – 1, then a new connection can always be added without rearrangement.
Clos Theorem

[Figure: first-stage switches I1…Im of size n × k, middle-stage switches M1…Mk of size m × m, third-stage switches O1…Om of size k × n; N = n × m, k ≥ 2n – 1]
Clos Theorem

1. Consider adding the n-th connection between first-stage switch Ia and third-stage switch Ob.
2. We need to ensure that there is always some middle-stage switch M available.
3. The n – 1 connections already in use at the input can each occupy one middle-stage switch, and likewise for the n – 1 already in use at the output. So if k > (n – 1) + (n – 1), there is always an M available, i.e. we need k ≥ 2n – 1.

[Figure: Ia and Ob, each with n – 1 of their links to the middle stage already in use]
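The counting argument reduces to a single inequality; a trivial check:

```python
def strictly_nonblocking(n: int, k: int) -> bool:
    """Clos' counting argument: when adding the n-th connection at input
    switch Ia and output switch Ob, at most n-1 middle switches are busy
    on Ia's side and, in the worst case, n-1 distinct ones on Ob's side.
    A free middle switch is guaranteed iff k > 2(n-1), i.e. k >= 2n-1."""
    worst_case_busy = (n - 1) + (n - 1)
    return k > worst_case_busy
```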
Benes networks: Recursive construction

[Figure: an N-port Benes network built recursively, with two (N/2)-port Benes networks between outer stages of 2 × 2 switches]
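One payoff of the recursive construction is the crosspoint count. A small sketch, assuming the standard construction from 2 × 2 building blocks (the closed form is 4N log2(N) - 2N):

```python
def benes_crosspoints(N: int) -> int:
    """Crosspoints in an N-port Benes network built recursively from
    2x2 crossbars (4 crosspoints each): an N-port network is two outer
    columns of N/2 2x2 switches around two parallel (N/2)-port Benes
    networks. N must be a power of two."""
    if N == 2:
        return 4                      # base case: a single 2x2 switch
    return 2 * (N // 2) * 4 + 2 * benes_crosspoints(N // 2)
```

For N = 64 this gives 1408 crosspoints, versus 4096 for a single 64 × 64 crossbar: O(N log N) instead of O(N^2).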
Scaling Crossbars: Summary
Scaling the bit-rate through parallelism is easy.
Scaling the scheduler is hard.
Scaling the number of ports is harder.
Clos network:
- Rearrangeably non-blocking with k = n, but routing is complicated.
- Strictly non-blocking with k ≥ 2n – 1, so routing is simple, but requires more bisection bandwidth.
Benes network: scaling with small components.