Stage-Distributed Time-Division Permutation Routing in a Multistage Optically Interconnected Switching Fabric Alvaro Cassinelli (1), Makoto Naruse (2),

Stage-Distributed Time-Division Permutation Routing in a Multistage Optically Interconnected Switching Fabric Alvaro Cassinelli(1), Makoto Naruse (2), Masatoshi Ishikawa(1)

1: University of Tokyo, Dept. Information Physics and Computing, 7-3-1 Hongo Bunkyo-ku, Tokyo 113-0033, Japan. ([email protected])2: Communications Research Laboratory, 4-2-1 Nukui-kita, Koganei, Tokyo 184-8795, Japan.

Mechanically Reconfigurable wave-guide-based interconnections

Introduction Column-controlled buffered MIN architecturePlane-to-plane guide-wave based optical interconnections

Bandwidth, cross-talk and volume consumption considerations, all advocate for plane-to-plane optical interconnects within massively interconnected electronic system as a promising substitute to conventional electronic circuitry in the near future.

Multistage Interconnection Networks (MINs) are capable of supporting arbitrary input-output interconnections, and have been long proposed as an interesting alternative to the full crossbar and the time-shared bus (in terms of performance, cost and scalability) for use as permutation networks for multiprocessors but also as point-to-point switching fabrics for telecom applications.

As a consequence, multistage architectures using optical plane-to-plane regular interconnections may well represent a theoretically optimal architecture for use within a massively interconnected systems.

The work presented here develops on some fundamental aspects of such optical multistage architecture:

Transparent column-controlled Multistage Network

Possible implementation of an electro-optical bi-permutation module using stacked layers and electro-optically controlled normal coupling between the layers.

Performance analysis of a buffered GSMIN

Conclusions and further research

Column-control certainly reduces overall interconnection capacity, but if things are properly designed, the MIN architecture can still accommodate communication primitives of most static interconnection networks. The interest of such an arrangement lies in the ease of control and straightforward implementation. We presented here preliminary experimental results demonstrating a simple optical architecture using cascaded fiber-based bi-permutation modules. An electro-mechanical system has been developed providing stage-switching times on the order of milliseconds, making this architecture suitable for reconfigurable, high-bandwidth inter-processor communications. Further research on this topic include the feasibility study of dense guided-wave multi-permutation modules by stacking layers of planar lightwave circuits, as well as the introduction of electro-optical switching.

6

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Input request probability per unit time ()

Pro

ba

bil

ity

of

pa

ck

et

ac

ce

pta

nc

e

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

00

1

2

3

4

5

1

2

3

4

65

0

crossbar

Depth Analysis = Buffer Size

Depth Transfer = Buffer Size

SEMIN

GSMIN

Bu

ffer s

ize

6

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1


Pro

ba

bil

ity

of

pa

ck

et

ac

ce

pta

nc

e

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

00

1

2

3

4

5

crossbar

Size: 128 x 128

GSMIN (alternate)GSMIN (forced alternate)

Bu

ffer s

ize

6

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1


Pro

ba

bil

ity

of

pa

ck

et

ac

ce

pta

nc

e

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

00

1

2

3

4

5

1

2

3

4

65

0

crossbar

SEMIN

GSMINSize: 128 x 128

SEMIN (alternate)SEMIN/GSMIN (forced alternate)

Bu

ffer s

ize

“Fair” selection mode (the state of the global-switch is chosen randomly if the poll is a draw)

Time slot

Permutation appearance period

time

Inte

rco

nn

ect

1

Inte

rco

nn

ect

2

Inte

rco

nn

ect

3

Inte

rco

nn

ect

N

Inte

rco

nn

ect

1

Inte

rco

nn

ect

2

Inte

rco

nn

ect

3

Inte

rco

nn

ect

N

Inte

rco

nn

ect

1

Inte

rco

nn

ect

2

Inte

rco

nn

ect

3

Inte

rco

nn

ect

N

(B)

(A) (C)

A small mechanical (or optical ) perturbation produces a drastic change of the interconnection pattern from input to output.

• better efficiency (than holograms) for long-range interconnections.

• no cross-talk in 3D, just like free-space optics.

• No space-invariance required.

• Theoretically more volume efficient than free-space

• Precise and robust alignment possible…

• multiple interleaved permutations possible.

Fiber-based Modules vs. Free-Space

• Maybe “hard” to build? Boring, but not a fundamentally difficult (can be automated, can be done by “layers”, see below).

• Alignment of both output and input needed.

• Power dissipation may be a fundamental limitation, but we are far from these limits.

Routing Strategy

Conflicts are not resolved individually at the switch level, but globally at the stage level by a “tournament” between all the incoming requests to that particular stage. Provided that these request are uniformly directed to any possible output, "votes" leading to the adoption of one of the two possible states of the global-switch will be evenly distributed. Such behaviour takes place for all stages of the network, so that at each stage, half of the requests will be dropped and half will be able to pass to the next stage. This means there is an enormous number of discarded packets, certainly much bigger than that occurring by internal blocking in a standard SEMIN. However, if one considers a buffered architecture, then presumably there will be no need to provide it with a large buffer memory, because the packets that have been retained in the buffers are very likely to go forward in the following tournament.

Figures represent performance (normalized amount of satisfied requests) as a function of the input load (computed as the probability of a request being issued at any input per unit time) for the MIN with stage-distributed switching, and for a standard Shuffle Exchange MIN with independent control of switches (both 128x128 large networks). The input traffic follows the uniform request model.

Individual control of switches as well as arbitration may be unnecessary on a standard SEMIN if the buffer size is larger than four. Also, when the buffer size is equal to three, the 128x128 GSMIN already outperforms a 128x128 full-crossbar.

Why common control of switches per switching stage (column-control)?

A column-controlled MIN can be an efficient (circuit-switched) permutation network.

“indirect” implementation obtained by electrically joining individual switches (common control lines a, b and c) of a Shuffle-Exchange network (here the Inverse Baseline)

A more “direct” Implementation of a column-controlled Multistage Network using multi-permutation switching modules (the GSMIN network)

L2 E2

I4

switching region

inputcross

straight

In the case of “column/row-decomposable” permutations (see left), which happen to be the ones required in most regular interconnected MINs, 2D interconnection modules can be easily implemented by stacking layers of printed lightwave circuits containing 1D permutations (see right).

(A) Guide-wave based interconnection modules

While many demonstrator have been built to illustrate the advantages of free-space optics over electronics for dense plane-to-plane interconnections, there has been relatively little research on the use of two-dimensional wave-guide-based interconnections, (or guide-wave based interconnection modules). Yet, these can easily achieve better transmission efficiency than holographic-based interconnections while almost completely cancelling cross-talk, and contrary to the common belief they may be more volume efficient than free-space optics for both space-invariant and space-variant interconnects.

(B) Transparent column-controlled Optical Multistage Network

The use of “modular” inter-stage interconnections leads naturally to consider a new paradigm for the optical MIN architecture: interconnection modules containing several interleaved global interconnections (i.e. permutations). Addressing of the required permutation can be done by mechanical displacement of the whole multi-permutation module.

Cascading such modules without intermediate optoelectronic arrays gives a transparent ”globally-stage switched” multistage interconnection network (GSMIN) that can be used as a circuit-switched permutation network for multiprocessor communications (this architecture represents an original optical implementation of a column-controlled shuffle-exchange MIN).

(C) Buffered column-controlled Optical Multistage Network

Column-controlled unbuffered MINs have been extensively studied for circuit-switched applications. We found that an inter-stage buffered column-controlled MIN architecture may also represent an interesting alternative to the well-known Shuffle-Exchange MINs (SEMINs) for point-to-point packet-routed communications, both from the point of view of its implementation complexity and simplicity of routing protocol.

Recently, we tested this approach by implementing several 4x4 fiber-based modules each integrating a

different fixed topology.

Automatic alignment of modules, both dynamically and statically (pre-aligned ”plug-and-play”

exchangeable blocks), is a critical issue now being studied.

(b)(a)

input plane

0

1

2

3

4

5

6

8

output p

lane

“Brute” folding and row/column decomposed folding of the perfect shuffle inter-stage permutation

Stacking planar lightwave circuits to produce 2D guided-wave interconnection modules

Experimental results

Stacking layers

Multistage architecture

parallel computers

switching networks

Dense optical interconnect:

interconnection folded in 2D

Optical Multistage Architecture Paradigm

0000000100100011010001010110

011110001001101010111100110111101111

Inp

ut

Ou

tpu

t

E(1) E(1) E(1) E(1)0000000100100011010001010110011110001001101010111100110111101111

(4) (4) (4)(4)

Shuffle interconnectionExchange switch

0000000100100011010001010110

011110001001101010111100110111101111

Inp

ut

Ou

tpu

t

E(1) E(1) E(1) E(1)0000000100100011010001010110011110001001101010111100110111101111

(4) (4) (4)(4)

Shuffle interconnectionExchange switch

+

(2)

…an optical “3D optical wiring” module between

2D VLSI arrays.

networkinterconnection modules

Processor arrays

interconnection modules

arrays

networkinterconnection modules

Processor arrays

interconnection modules

arrays

Multi-permutation interconnection module paradigm

output

input

3D bi-permutation module built by stacking planar

lightwave circuits

output

input

3D bi-permutation module built by stacking planar

lightwave circuits

actuators

inputs

outputsactuators

inputs

outputs

Input(VCSEL array)

Output

C1 or IdC2 or Id

Id . Id E1 . E2Id . C2 C1 . IdId . Id E1 . E2Id . C2 C1 . Id

c3

c4

c2

c1

c2

c1

c 3 c4

…topology mapped on a plane (optical interconnects, VLSI integration)

four-dimensional hypercube connected multiprocessor…

{c2, id}{c1, id}

{c3, id}{c4, id}

Four-dimensional “spanned” hypercube network (16 nodes) using four bi-permutation modules, each providing a cube permutation and the identity.

A total of 24=16 global permutations for the whole network.

First two modules of a spanned version of a 4-dimensional hypercube were fabricated using interleaved optical fibers, and the resulting four possible interconnections observed.

Spanned hypercube using Bi-permutation modules

Guide-wave based multi-permutation interconnection module

Mechanically reconfigurable multi-stage architecture based on cascading multi-permutation modules

Channels are multi mode fibers: MFD = 9.5 mGrad diam. 125 m NA: 0.1

{c2, id}input output

Output (to CCD)

Input(from VCSEL array)

Exit first module

Input second module

Displacement stages

Prototype using interleaved fibers

Experience with two cascaded modules:

Displacement pitch for commutation: 125 m

Alignment tolerance: 5 m (half peak power).

Inter-module Coupling Efficiency: 1.7dB (no additional optics, matching oil or antireflection coating).

Validation of simple cascaded architecture

Electro-mechanical switching device

The switching speed seems to be limited to the millisecond range (resonant frequency).

Micro electro-mechanical actuators (MEMS) may also be an interesting alternative when switching latency in the millisecond range is tolerable.

This electro-magnetic actuated device can position the module

in both X and Y directions.

Non-integrated bi-permutation module:

Only the older packets waiting on the queues are given attention in the selection phase, and that only these packets are candidates to be transferred during the transfer phase. 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1


Pro

ba

bil

ity

of

pa

ck

et

ac

ce

pta

nc

e

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0

Depth Analysis = 1

Depth Transfer = 1

SEMIN

GSMIN

crossbar

Bu

ffer s

ize

3

4

6

Depth Analysis = Buffer Size

Depth Transfer = Buffer Size

0

crossbar

0

1

2

3

4

56

012

5

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1


Pro

ba

bil

ity

of

pa

ck

et

ac

ce

pta

nc

e

SEMIN

GSMIN

Bu

ffer s

ize

“Alternate” selection mode (the state of the global-switch changes if the poll is even)

“Forced-alternate” selection mode (the state of the global-switch changes periodically)Bi-permutation module (integrates switching and

permutation stage)Column-controlled Shuffle-Exchange stage

a b ca b c c b ac b a

Despite the reduced permutation capacity of a column-controlled MIN, a permutation network can emulate (by time-multiplexing) a full-crossbar provided that the reconfiguration latency remains small with respect to the interconnection request rate, and that the bandwidth is sufficiently high to accommodate in a single time slot all the data to be transferred. An unbuffered column-controlled MIN may provide enough optical bandwidth - but the reconfiguration time may be still too slow (using thermo-optical switches for instance).

A clear improvement in performance is seen in the standard SEMIN for buffer

sizes larger than one.

This performance may exceed that of the GSMIN, but it seems to increase slowly

(with buffer size), so that when the buffer size is equal to five (or four in the case of

a 64×64 network), both networks show roughly the same performance.

From the routing point of view, simulations showed that a buffered column-controlled MIN would not require excessive buffer size to achieve respectable performances (under uniform traffic). The path-selection mechanism can be further reduced to the simple alternation of the available stage permutations, without degrading the performance. Under such stage-distributed time-division permutation multiplexing, the SEMIN and GSMIN fabrics become strictly equivalent routing architectures; hence (provided that buffer size is chosen to be larger than three) this analysis-free strategy will provide a very simple arbitration mechanism for standard SEMIN networks. This is an interesting result on its own.

There is no significant change in performance between a buffered GSMIN with "alternate" selection and a buffered GSMIN with "forced alternate" selection. A continual "blind" alternation of switching states performs just as well.

When a standard SEMIN and a column-control SEMIN (GSMIN) are operated in forced alternate mode, they become strictly equivalent architectures. Therefore, when a standard SEMIN is operated in forced alternate mode, its performance roughly degrades to that of an alternate operated GSMIN; on the other hand, when a GSMIN is operated in forced alternate mode, the performance remains substantially the same.

A multistage ”spanned” version of most direct network topologies (hypercube, cube-connected-cycles, etc.) can be implemented as an unbuffered column-controlled MIN architecture.

A time-division multiplexing (TDM) technique can be used to select the interconnections at each stage.

A column-controlled MIN can emulate a full-crossbar by time-multiplexing

Simplicity and Scalability: A whole switching stage needs a unique control signal.

Can have a “direct” optical implementation: the switching stage and its adjacent inter-stage permutation can be physically merged into a unique “permutation module" providing two different interconnection patterns (bi-permutation switching module).

=

arbitration

Total length of buffer

depth of analysis

Length of transferred packets/cycle

…

…

…

Buffer 1

Buffer N

…

…

Bi-p

erm

uta

tio

n

mo

du

le

arbitration

Total length of buffer

depth of analysis

Length of transferred packets/cycle

…

…

…

Buffer 1

Buffer N

…

…

Bi-p

erm

uta

tio

n

mo

du

le

Routing parameters

Optimal situation: both “depth of analysis” and “depth of transfer” are equal to the actual buffer size.

The performance of a "fair" operated GSMIN is superior to that of a standard ("fair" operated) MIN for buffer sizes larger than one.

Also, for a buffer size equal to three, the GSMIN network gives better performance than the crossbar, even at full-load.

This arbitration free mechanism may be very appealing for all-optical networks, if optical buffering functions (e.g. delay lines) could be integrated on the cascaded modules themselves, an issue worth exploring.

Two implementation of a column-controlled Multistage Interconnection Network

inputoutput

inputoutput

inp

ut

Alignments hardware

inp

ut

Alignments hardware

Input pattern (exit VCSEL array)

Output pattern (CCD image)

Ou

tpu

t (C

CD

)

Inp

ut

(VC

SE

Ls)





Ou

tpu

t (C

CD

)

Inp

ut

(VC

SE

Ls)

Ou

tpu

t (C

CD

)

Inp

ut

(VC

SE

Ls)

ECOC 2003 / We4.P.137

processing (analysis/buffer)

bi-permutation module

module control

processing (analysis/buffer)

bi-permutation module

module control

Documents

Stage-Distributed Time-Division Permutation Routing in a Multistage Optically Interconnected Switching Fabric Alvaro Cassinelli (1), Makoto Naruse (2),