Keynote SSS'08

Distributed Algorithms and VLSI

Ulrich Schmid
Vienna University of Technology
s@ecs.tuwien.ac.at


Content

Short introduction to Very Large Scale Integration (VLSI): A photo gallery …
– Great perspectives
– But …

VLSI Circuits ↔ Distributed Algorithms
– DAs and VLSI: Do's and Don'ts

Do's – an Example: DARTS Fault-tolerant Clocks
– Starting point: A simple distributed algorithm
– How to implement it in VLSI?
– Proofs
– [Under the rug: Metastability …]


Short introduction to VLSI: A photo gallery …


VLSI Circuits


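Major Ingredients

Transistors (nMOS):
[Figure: cross-section of an nMOS transistor with polysilicon gate, SiO2 insulator, n-doped source and drain in a p substrate, and a channel of length L and width W; circuit symbol with gate, source and drain]

Interconnect (wires): Form & connect gates
[Figure: inverter]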


Miniaturization: Moore's Law

Intel 4004 (1971)
• 2,250 transistors
• 12 mm² / 10 µm
• 0.74 MHz, 1 W

Intel P4 (2001)
• 42,000,000 transistors
• 217 mm² / 0.180 µm = 180 nm
• 2 GHz, 50 W
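As a rough worked example, the two data points on this slide imply a doubling time for the transistor count. The sketch below assumes pure exponential growth between 1971 and 2001; the function name `doubling_time_years` is illustrative only and not part of the talk.

```python
import math

def doubling_time_years(n_start, n_end, years):
    """Years per doubling, assuming pure exponential growth between two data points."""
    return years / math.log2(n_end / n_start)

# Data points taken from the slide: Intel 4004 (1971) and Intel P4 (2001).
t_double = doubling_time_years(2_250, 42_000_000, 2001 - 1971)
print(f"transistor count doubled roughly every {t_double:.1f} years")
```

Under this assumption the count doubles about every two years, which matches the usual statement of Moore's Law.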


Multicore Processors

IBM POWER4 (dual-core)

IBM Cell (8-core)

Tilera TILE64

Today: < 45 nm


Systems-on-Chip (SoC)

Assemble whole SoC from suitable components
Market for "IP cores", from different vendors
Sync/async interfaces

Nvidia Tegra


Great perspectives for VLSI circuits.

But …


Manufacturing Limitations

VLSILab, Politecnico di Torino

Optical Proximity Correction, Intel Corp.


Defects (Electromigration)

P. Gutman, IBM T.J.Watson Research Center

M. Ohring, Reliability and Failure of Electronic Materials and Devices, 1998 ASM Corp. Shanghai

[Figure labels: whiskers, hillock, void]


Defects (Gate Oxide BD)

K.-L. Pey, C.-H. Tung, Physical characterization of breakdown in metal-oxide-semiconductor transistors

Breakdown-induced thermochemical reactions in (a) poly-Si gate and (b) p-Si substrate of n-channel MOSFETs.

Semitracks, Inc.

ESD-induced gate oxide breakdown
www.siliconfareast.com


Power Dissipation Problems

A. Choudhary, UMass

Small transistor dissipating 5 mW in an SOI wafer; University of Bolton

→ Reduce supply voltage!


Radiation-induced Soft Errors

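[Figures: SLAC National Accelerator Lab, Stanford; single-event transient (SET) and single-event upset (SEU); plot from Powell, 1959, with axis labels 0 km, 10 km and 1, 10^-3]

Soft error rates dominate in VLSI!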


Slow Signal Propagation

Transistors switch faster

BUT
- Wires thinner
- Less transistor driving strength
- RC signal propagation along wires dominates circuit speed


Clock Distribution Problem

[Figures: clock distribution in the POWER4 (from "Circuit & physical design of the POWER4 microprocessor", IBM J. Res. Dev.) and in the Cell processor]

Synchronous design paradigm:

[Figure: flip-flops FF1, FF2, …, FFk feed flip-flop FFm through combinational logic (gates) with data delays t_dly,DATA,1m, t_dly,DATA,2m, …, t_dly,DATA,km; all flip-flops are driven by a common clock signal CLK with propagation delay t_PD,CLK]

→ Synchronous abstraction increasingly difficult to maintain!


Hence, deep submicron VLSI circuits …


… are in fact FT Distributed Systems

- Spatial distribution
- Message-passing communication
- Massive concurrency
- Asynchrony
- Failures
- Security issues (IP cores!)

Worthwhile undertaking:
Explore the applicability of DA results & approaches to VLSI circuits …


Applying DA Research in VLSI?

2008 Dagstuhl-Seminar Distributed Algorithms in VLSI Chips (B. Charron-Bost, J. Ebergen, S. Dolev, U. Schmid, http://www.dagstuhl.de/08371)

[Great place for such undertakings …]


DA and VLSI – Don'ts

Apply standard DAs in the VLSI context – too heavy-weight in terms of computation & communication
Apply standard replication-based FT (for coping with "classic" VLSI faults) – too heavy-weight in terms of power & area penalties

BUT …


DA and VLSI – Do's (I)

Apply "light-weight" DAs for decentralized handling of [nowadays centralized] functions, e.g. in large multicores
– Memory access scheduling (Moscibroda & Mutlu, PODC'08)
– Apply self-stabilizing algorithms for handling transient failures (S. Dolev & Haviv, IEEE ToC, 2006)
– Fault-tolerant clock generation in SoCs (Függer, Schmid, Fuchs, Kempf, EDCC'06)

Apply replication-based FT to cope with malicious failures in VLSI
– IP core security threats in SoCs
– Inconsistently propagated errors in high-dependability applications

Tilera TILE64


DA and VLSI – Do’s (II)

Apply VLSI results & approaches in DA research
– Error-correcting codes and asynchronous consensus (Friedman, Mostefaoui, Rajsbaum & Raynal, IEEE ToC, 2007)
– Corruption-resilient codes (S. Dolev & Tzachar, DISC'08)

Extend DA approaches, to contribute to a (still lacking!) "Theory of Dependable VLSI Circuits"
– Early example: Arbiter problem (Lamport, ~1980)
– Handle massive concurrency (continuously computing gates!)
– Handle computation and communication resource restrictions
– Handle "non-closed" specifications
– Define suitable failure models


Do’s – an Example: DARTS Fault-tolerant Clocks


DARTS – Distributed Algorithms for Robust Tick Synchronization

Joint work with A. Steininger, M. Függer, G. Fuchs [and many others]

http://ti.tuwien.ac.at/ecs/research/projects/darts


Clocking in SoCs (I)

Classic synchronous paradigm
Concept: Common notion of time for entire chip
Method: Single quartz oscillator; global, phase-accurate clock tree

Disadvantages
- Cumbersome clock tree design
- High power consumption
- Clock is single point of failure!

[Figure: SoC with DSP, WLAN, Video, GPRS and GPS modules]


Clocking in SoCs (II)

Alternative: DARTS clocks
Concept: Multiple synchronized tick generators
Method: Distributed FT tick generation algorithms (TG algs), interacting via a dedicated clock network (TG net)

Advantages
- No quartz oscillator(s)
- No critical clock tree
- Clock is no single point of failure!
- Reasonable synchrony

[Figure: SoC with DSP, WLAN, Video, GPRS and GPS modules]


Reasonable Synchrony?

Phase synchronization

Clock synchronization

- max precision, - min/max frequency

Tick synchronization


Starting point: A Distributed Algorithm


A Distributed Algorithm (I)

Failure-free case (f = 0): Simple barrier synchronization

    On booting do:
        send tick(0) to all; C := 0;   /* C is last tick number sent */

    Continuously do:
        If received tick(C) from all n processes:
            send tick(C+1) to all; C := C+1;

Failure case f > 0?

    On booting do:
        send tick(0) to all; C := 0;   /* C is last tick number sent */

    Continuously do:
        If received tick(C) from n – f different processes:
            send tick(C+1) to all; C := C+1;

(Modified) Srikanth & Toueg algorithm

    On booting do:
        send tick(0) to all; C := 0;   /* C is last tick number sent */

    Continuously do:
        If received tick(X) from f+1 different processes and X > C:
            send tick(C+1),…, tick(X) to all [once]; C := X;
        If received tick(C) from n – f different processes:
            send tick(C+1) to all [once]; C := C+1;
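To get a feel for how the two rules interact, the algorithm can be run in a small event-driven simulation. The following is a minimal sketch under stated assumptions, not the DARTS implementation: the f faulty processes simply stay silent (only one of many possible Byzantine behaviours), every message, including a process's message to itself, takes a random delay in [d, d+ε], and names such as `simulate`, `seen` and `sent_at` are made up for this illustration.

```python
import heapq
import random

def simulate(n=4, f=1, d=1.0, eps=0.5, max_tick=20, seed=1):
    """Event-driven run of the tick algorithm; returns sent_at[p][k] = send time of tick(k)."""
    rng = random.Random(seed)
    correct = list(range(n - f))        # assumption: the f faulty processes stay silent
    C = {p: 0 for p in correct}         # last tick number sent
    seen = {p: {} for p in correct}     # seen[p][k] = set of senders of tick(k)
    sent_at = {p: {} for p in correct}
    events, seq = [], 0                 # heap of (delivery time, seq, receiver, sender, k)

    def broadcast(p, k, now):
        nonlocal seq
        if k in sent_at[p] or k > max_tick:    # "[once]", plus a cut-off so the run ends
            return
        sent_at[p][k] = now
        for q in correct:                       # includes p itself
            heapq.heappush(events, (now + d + rng.uniform(0.0, eps), seq, q, p, k))
            seq += 1

    for p in correct:                           # on booting
        broadcast(p, 0, 0.0)

    while events:
        now, _, q, p, k = heapq.heappop(events)
        seen[q].setdefault(k, set()).add(p)
        fired = True
        while fired:                            # re-evaluate both rules until neither fires
            fired = False
            ahead = [x for x, s in seen[q].items() if x > C[q] and len(s) >= f + 1]
            if ahead:                           # f+1 rule: catch up
                for j in range(C[q] + 1, max(ahead) + 1):
                    broadcast(q, j, now)
                C[q] = max(ahead)
                fired = True
            elif C[q] < max_tick and len(seen[q].get(C[q], ())) >= n - f:
                broadcast(q, C[q] + 1, now)     # n-f rule: advance by one
                C[q] += 1
                fired = True
    return sent_at

sent_at = simulate()
skew = max(
    max(s[k] for s in sent_at.values()) - min(s[k] for s in sent_at.values())
    for k in range(21)
)
print(f"largest skew between correct processes: {skew:.3f}")
```

In this setting every correct process sends each tick exactly once, and the send times of the same tick stay within d + 2ε of each other.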


A Distributed Algorithm (III)

For n ≥ 3f + 1 and up to f Byzantine failures, with end-to-end delays ∈ [d, d+ε]:
Suppose process p sends tick(C+1) at time t.
Then, process q also sends tick(C+1) by time t+d+2ε.

⇒ Clock ticks occur approximately synchronously

[Figure: timeline with p at t, any q' at t+ε (≤ ε later, f + 1 threshold) and any q at t+d+2ε (≤ d+ε later, n − f ≥ 2f + 1 threshold)]
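The bound can be retraced step by step. What follows is a reconstruction of the usual argument under the slide's assumptions, not the talk's own proof. The symbol $t_0 \le t$ is introduced here for the first time at which any correct process sends tick(C+1). That process must have used the n − f rule, since the f + 1 rule would require a correct process that had already sent tick(C+1).

```latex
\begin{align*}
&\text{$n-f$ rule fired at } t_0
  &&\Rightarrow\ \text{at least } n-2f \ge f+1 \text{ correct processes sent } \mathit{tick}(C) \text{ by } t_0 - d \\
&\text{these messages take at most } d+\varepsilon
  &&\Rightarrow\ \text{every correct } q' \text{ sends } \mathit{tick}(C) \text{ by } (t_0 - d) + (d+\varepsilon) = t_0 + \varepsilon \\
&\text{all } n-f \text{ correct } \mathit{tick}(C) \text{ take at most } d+\varepsilon
  &&\Rightarrow\ \text{every correct } q \text{ sends } \mathit{tick}(C+1) \text{ by } (t_0+\varepsilon) + (d+\varepsilon) \\
& &&\phantom{\Rightarrow}\ = t_0 + d + 2\varepsilon \;\le\; t + d + 2\varepsilon
\end{align*}
```

The second step uses the f + 1 rule (catch-up), the third uses the n − f rule, which is why both thresholds appear in the timeline.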


How to implement this DA in VLSI?

Note: We don't have any clock available for a synchronous implementation …


Asynchronous Basic Circuits

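[Figure: Muller C-gate with inputs a, b and output y; propagation delay t_prop, feedback loop delay t_loop]

Truth table of the Muller C-gate:

    a  b  |  y
    0  0  |  0
    0  1  |  y_old
    1  0  |  y_old
    1  1  |  1

AND, OR, …; Muller C-Gate:
- Continuously computes y = y(a,b) [with delay t_prop]
- AND gate for signal transitions (→ barrier synchronization)
- Note: Inevitably involves feedback loop [t_loop]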
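The behaviour of the C-gate is easy to mimic in software. The class below is a purely behavioural sketch: it ignores t_prop and t_loop entirely, and the name `MullerC` is chosen here for illustration.

```python
class MullerC:
    """Behavioural model of a two-input Muller C-element (delays not modelled)."""

    def __init__(self, y=0):
        self.y = y  # stored output, plays the role of y_old

    def update(self, a, b):
        # Output follows the inputs only when they agree; otherwise it holds.
        if a == b:
            self.y = a
        return self.y

c = MullerC()
trace = [c.update(a, b) for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]]
print(trace)  # [0, 1, 1, 0]
```

The output rises only after both inputs have risen and falls only after both have fallen, which is the "AND for transitions" behaviour used for barrier synchronization.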


Asynchronous Communication

Convey alternating up/down signal transitions only
→ FIFO "zero-bit message" channels [with delay]

k-bit data transmission costly: Additional circuitry +
- performance penalty (serial data transmission)
- additional wires (parallel data transmission)

[Figure: sender and receiver connected by signal wires (k-bit)]
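A zero-bit message carries no payload: the event itself is the message. A minimal sketch of this idea, assuming a single wire whose level is toggled by the sender and a receiver that observes every level change (the class names `Wire` and `TickReceiver` are hypothetical):

```python
class Wire:
    """One signal wire; each toggle is one zero-bit message."""

    def __init__(self):
        self.level = 0

    def toggle(self):
        self.level ^= 1  # alternating up/down transitions


class TickReceiver:
    """Counts transitions; the tick number lives at the receiver, not in the message."""

    def __init__(self, wire):
        self.wire = wire
        self.last = wire.level
        self.count = 0

    def poll(self):
        if self.wire.level != self.last:
            self.last = self.wire.level
            self.count += 1
        return self.count


wire = Wire()
rx = TickReceiver(wire)
for _ in range(5):
    wire.toggle()
    rx.poll()
print(rx.count)  # 5
```

The sketch only works because the receiver looks at the wire between any two transitions; in hardware that requirement turns into delay constraints rather than polling.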


Major Challenges

    If received tick(X) from f+1 processes and X > C:
        send tick(C+1),…, tick(X) to all [once]
        C := X
    If received tick(C) from n − f processes:
        send tick(C+1) to all [once]
        C := C+1

Challenges:
- k-bit message, k unbounded → to be replaced by zero-bit messages, k kept at receiver
- Atomicity of actions → to be ensured by architecture + path delay constraints
- Threshold comparison → build suitable threshold circuits


k-bit → Zero-bit Messages

Move tick number maintenance from sender to receiver

TG net feeds every clock signal to every TG alg (bus of width n)
At every TG alg, n − 1 Counter Modules [one per remote TG alg] maintain tick numbers
Anonymous ticks ⇒ rules only distinguish
– r_rem > r_loc (f + 1, GR rule)
– r_rem ≥ r_loc (n − f, GEQ rule)

[Figure: TG alg 1 … TG alg 6 connected via the TG net]

[Figure: Counter Module (asynchronous up/down-counter): Remote Pipe and Local Pipe built from Muller C-elements (C), fed by R_remote,in and LocalClk and joined by a Diff-Gate; Pipe Compare Signal Generation (NAND/NOR gates) produces GEQe, GRe, GEQo, GRo; C_top]
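Since ticks are anonymous, a Counter Module only needs the difference between the number of remote and local ticks. A minimal software sketch of that bookkeeping, with `CounterModule`, `remote_tick` and `local_tick` as names assumed for this illustration:

```python
from dataclasses import dataclass


@dataclass
class CounterModule:
    """Tracks r_rem - r_loc for one remote TG alg (software stand-in for the up/down counter)."""

    diff: int = 0

    def remote_tick(self):
        self.diff += 1

    def local_tick(self):
        self.diff -= 1

    @property
    def gr(self):
        return self.diff > 0   # r_rem > r_loc

    @property
    def geq(self):
        return self.diff >= 0  # r_rem >= r_loc


m = CounterModule()
m.remote_tick()
status_ahead = (m.gr, m.geq)    # remote is one tick ahead
m.local_tick()
m.local_tick()
status_behind = (m.gr, m.geq)   # remote is one tick behind
print(status_ahead, status_behind)
```

Only the two status bits leave the module, so the unbounded tick number k never has to be transmitted.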


Asynchronous Up/Down Counter

[Figure: Counter Module: Remote Pipe and Local Pipe built from Muller C-elements (C), fed by R_remote,in and LocalClk and joined by a Diff-Gate; Pipe Compare Signal Generation (NAND/NOR gates) produces GEQe, GRe, GEQo, GRo; C_top]

Ingredients:
– Two elastic pipelines (= FIFO buffers for signal transitions) count remote and local clock ticks
– Common transitions removed by Diff-Gate
– GR and GEQ status signals derived from last stages

Metastability-free by construction [well, almost …]
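The ingredients listed above can be mimicked with two bounded FIFOs. This is only a software analogy under assumptions made here: the Diff-Gate is modelled as cancelling one entry from each pipe whenever both are non-empty, GR is read as "remote pipe non-empty", GEQ as "local pipe empty", and `PipelineCounter` is a hypothetical name.

```python
from collections import deque


class PipelineCounter:
    """Two bounded FIFOs of pending transitions plus a Diff-Gate that cancels common ones."""

    def __init__(self, depth=4):
        self.depth = depth
        self.remote = deque()
        self.local = deque()

    def _push(self, pipe):
        if len(pipe) >= self.depth:
            raise OverflowError("pipeline full")  # bounded space must be guaranteed elsewhere
        pipe.append(1)
        # Diff-Gate: remove transitions that are present in both pipes.
        while self.remote and self.local:
            self.remote.popleft()
            self.local.popleft()

    def remote_tick(self):
        self._push(self.remote)

    def local_tick(self):
        self._push(self.local)

    @property
    def gr(self):
        return len(self.remote) > 0   # remote ahead of local

    @property
    def geq(self):
        return len(self.local) == 0   # remote not behind local


pc = PipelineCounter(depth=2)
pc.remote_tick()
pc.remote_tick()
pc.local_tick()
state = (len(pc.remote), len(pc.local), pc.gr, pc.geq)
print(state)  # (1, 0, True, True)
```

After cancellation at most one of the two pipes holds entries, so the pipe depth bounds how far the local and remote tick counts may drift apart.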


Atomicity of Actions

The gates making up the f + 1 and the n − f rule compute continuously and concurrently, hence
– may both produce tick(k), for the same k
– this must be prevented by all means ["once"]

How to ensure this atomicity?
– Use separate circuitry for generating up-transitions (odd k) and down-transitions (even k) → tick(k − 1) and tick(k) never mixed up
– Ensure that the ratio of the maximum and minimum delay along certain paths is bounded (cp. Θ-Model [WLS05], ABC Model [RS08]) → tick(k − 2) and tick(k) never mixed up


Threshold Modules

[Figure: threshold gates fed by the status signals of the Counter Modules]

GR and GEQ status signals of the n − 1 Counter Modules fed into f + 1 and n − f threshold gates
Back-transition from status signals to transition-signalling for generating tick(k)
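In software terms a threshold gate is a k-out-of-n test on the status signals. The sketch below assumes that the node's own tick counts towards the n − f (GEQ) threshold, since only n − 1 Counter Modules exist; that reading, like the names `threshold_gate` and `tick_enabled`, is an assumption of this example rather than something the slide states.

```python
def threshold_gate(signals, k):
    """True if at least k of the input status signals are active."""
    return sum(bool(s) for s in signals) >= k


def tick_enabled(gr, geq, n, f):
    """Combine the f+1 (GR) and n-f (GEQ) rules for one node.

    gr, geq: status signals of the n-1 Counter Modules.
    Assumption: the node itself always counts as one GEQ vote.
    """
    return threshold_gate(gr, f + 1) or threshold_gate(list(geq) + [True], n - f)


n, f = 4, 1
catch_up = tick_enabled(gr=[True, True, False], geq=[True, True, False], n=n, f=f)
waiting = tick_enabled(gr=[False, False, False], geq=[True, False, False], n=n, f=f)
print(catch_up, waiting)  # True False
```

The hardware version additionally has to turn the resulting level back into a single transition, which is where the "[once]" requirement comes in.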


Proofs


Proofs & Implementations (SW)

[Figure: three abstraction levels linked by proof and implementation steps:
 specification: tick synchronization (max precision, min/max frequency); tick-synced FT clocks
 model (alg+sys): n TG algs, f Byzantine; distributed state machine, Byzantine failures
 implementation (SW): executable machine code, real system; TTP implementation]

Proof goals:
– Prove that the model meets the specification
– Minimize "proof gap" between model and implementation


Proofs & Implementations (HW)

[Figure: abstraction levels specification, model (alg+sys) and implementation, linked by proof and abstraction steps; the implementation is split into SW and HW, with partitioning & constraints and HW capabilities connecting model and implementation]


Hierarchical Proof

- Specification of low-level building blocks
- Up/down ticks correctly simulate tick(k)
- Synchronization properties
- Bounded Precision & Frequency
- Bounded space (pipeline)

[Figure: proof hierarchy: tick-up/down → (interlocking proof) → tick(k), tick(k+1), … → (P), (U), (S) → Precision & Frequency, Bounded space]


Interlocking Proof - "[once]"

    On booting:
        send tick(0) to all; C := 0;
    If got tick(X) from f+1 procs and X > C:
        send tick(C+1),…, tick(X) to all [once]; C := X;
    If got tick(C) from n − f processes:
        send tick(C+1) to all [once]; C := C+1;

[Figure: interlocking proof step from tick-up/down to tick(k), tick(k+1), …; diagram labels: k − 2, k, k + 1, tick(k), tick(k+1), x]


Higher-Level Properties

Prove elementary synchronization properties

(P) Progress. If all correct nodes send tick(k) by time t, then every correct node sends at least tick(k+1) by t + T^+.
(U) Unforgeability. If no correct node sends tick(k) by time t, then no correct node sends tick(k+1) by t + T^-_first.
(S) Simultaneity. If some correct node sends tick(k) by time t, then every correct process sends at least tick(k) by t + T^-_first.

and, on top of those,
– Precision & Frequency
– Bounded pipeline size

[Figure: proof hierarchy: tick(k), tick(k+1), … → (P), (U), (S) → Precision & Frequency, Bounded pipes]


Complete Suite of Proofs

[EDCC’06]


Complete Implementation

[Figure: node p with 3f+1 pipelines (Pipeline 1 … Pipeline 3f+1), each consisting of a Remote (External) Pipe fed by remote clk_in, a Local Pipe, a Diff-Gate and Pipe Compare Signal Generators producing GEQe, GRe, GEQo, GRo; threshold logic with thresholds 2f+1 and f+1 derives clk_out; handshake signals req_ext/ack_ext and req_int/ack_int]

Implementation of the model only needs to
– implement the low-level building blocks as specified
– ensure the additional delay ratio bounds for interlocking proof (place & route constraints)

[DFT'06]


DARTS - Lessons Learned

- Fault-tolerant distributed algorithms are indeed applicable in the VLSI context, but need "down-sizing"
- Distributed computing models with bounded delay ratio (Θ-Model, ABC model) are well-suited for the VLSI context (technology migration, re-use of models, etc.)
- Sole transition logic approach not sufficient for fault-tolerance ⇒ need a model that integrates event and state representation
- Time-free models suffer from a large "proof gap" ⇒ need a model incorporating (continuous) time
- Failures raise new metastability concerns ⇒ metastability (MS) needs further investigation


Under the rug: Metastability …

[Stolen from Dagstuhl presentation of A. Steininger …]


Metastability

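Bistable element (memory cell) with positive feedback

[Figure: two cross-coupled inverters (Inv 1, Inv 2) and their transfer curves u_i,2 = u_o,1 and u_i,1 = u_o,2 on axes from 1 to 5; the curves intersect in two stable points (HI, LO) and one metastable point]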
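The three intersection points can be reproduced numerically. The sketch below is a toy model, not a circuit simulation: it assumes a 5 V supply, a tanh-shaped inverter transfer curve with an arbitrarily chosen gain, and it iterates the loop in discrete rounds instead of solving continuous-time dynamics.

```python
import math

VDD = 5.0  # supply voltage in volts (assumed for this sketch)


def inverter(u, gain=4.0):
    """Idealised inverter transfer curve; the tanh shape and gain are assumptions."""
    return VDD / 2 * (1 - math.tanh(gain * (u - VDD / 2) / VDD))


def settle(u, rounds=100):
    """Send the voltage around the two-inverter loop repeatedly."""
    for _ in range(rounds):
        u = inverter(inverter(u))
    return u


hi = settle(VDD / 2 + 1e-6)   # a tiny push upwards ends in the stable HI state
lo = settle(VDD / 2 - 1e-6)   # a tiny push downwards ends in the stable LO state
meta = settle(VDD / 2)        # exactly balanced: stays at the metastable point
print(f"HI={hi:.2f} V, LO={lo:.2f} V, metastable={meta:.2f} V")
```

The balanced point is an equilibrium but not a stable one: any disturbance, however small, is amplified by the positive feedback until one of the two stable states is reached.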


Revisit Muller C-Element

[Figure: Muller C-element (inputs a, b; output y) with waveforms of a, x and y for three cases: normal operation, oscillation, creeping; annotations: pure delay at gate and interconnect, limited output slope]


Error Containment

[Figure: nodes p, q and r, each with two counter modules (count pq and count pr, count qp and count qr, count rp and count rq), a threshold module (ThM) and a tick generator (TG)]

According to our proofs the wall holds – but we ignored metastability!


The Counter Module

[Figure: nodes p, q and r with their counter modules, threshold modules (ThM) and tick generators (TG); detail of the Counter Module with Remote Pipe, Local Pipe, Diff-Gate and Pipe Compare Signal Generation (GEQe, GRe, GEQo, GRo)]

Purely combinational logic: won't hurt, BUT won't help

Muller C-Element: Metastable input may pass through!


The Threshold Module

[Figure: nodes p, q and r with their counter modules, threshold modules (ThM) and tick generators (TG)]

Threshold Module: purely combinational logic
⇒ will not create metastability problem

BUT: will propagate metastability while being near the threshold

NO masking, NO protection


Metastability Containment?

[Figure: nodes p, q and r with their counter modules, threshold modules (ThM) and tick generators (TG)]


The End … © 2007, WDR


Some References

[Bau05] R. Baumann. Radiation-induced soft errors in advanced semiconductor technologies. IEEE Transactions on Device and Materials Reliability, 5(3):305–316, Sept. 2005.
[BJ83] J. C. Barros and B. W. Johnson. Equivalence of the arbiter, the synchronizer, the latch, and the inertial delay. IEEE Trans. Comput., 32(7):603–614, 1983.
[BZMLCLD02] R. Bhamidipati, A. Zaidi, S. Makineni, K. Low, R. Chen, K.-Y. Liu, and J. Dalgrehn. Challenges and methodologies for implementing high-performance network processors. Intel Technology Journal, 6(3):83–92, Aug. 2002.
[BY07] A. Bink and R. York. ARM996HS, the first licensable, clockless 32-bit processor core. IEEE Micro, 25(2):58–68, February 2007.
[Bor05] S. Borkar. Designing reliable systems from unreliable components: the challenges of transistor variability and degradation. IEEE Micro, 25(6):10–16, Nov. 2005.
[Cha84] D. M. Chapiro. Globally-Asynchronous Locally-Synchronous Systems. PhD thesis, Stanford University, Oct. 1984.
[Con03] C. Constantinescu. Trends and challenges in VLSI circuit reliability. IEEE Micro, 23(4):14–19, July 2003.
[DH06a] S. Dolev and Y. Haviv. Self-stabilizing microprocessors, analyzing and overcoming soft-errors. IEEE Transactions on Computers, 55(4):385–399, Apr. 2006.
[Dol00] S. Dolev. Self-Stabilization. MIT Press, 2000.
[DR98] C. Dyer and D. Rodgers. Effects on spacecraft & aircraft electronics. In Proceedings ESA Workshop on Space Weather, ESA WPP-155, pages 17–27, Noordwijk, The Netherlands, Nov. 1998. ESA.
[DT08] S. Dolev and N. Tzachar. Brief announcement: Corruption resilient fountain codes. In DISC, pages 502–503, 2008.
[FFSK06:DFT] M. Ferringer, G. Fuchs, A. Steininger, and G. Kempf. VLSI Implementation of a Fault-Tolerant Distributed Clock Generation. IEEE International Symposium on Defect and Fault-Tolerance in VLSI Systems (DFT2006), pages 563–571, Oct. 2006.


[FMRR07] R. Friedman, A. Mostefaoui, S. Rajsbaum, and M. Raynal. Asynchronous agreement and its relation with error-correcting codes. IEEE Trans. Comput., 56(7):865–875, 2007.
[Fri01] E. G. Friedman. Clock distribution networks in synchronous digital integrated circuits. Proceedings of the IEEE, 89(5):665–692, May 2001.
[FSFK06] M. Függer, U. Schmid, G. Fuchs, and G. Kempf. Fault-Tolerant Distributed Clock Generation in VLSI Systems-on-Chip. In Proceedings of the Sixth European Dependable Computing Conference (EDCC-6), pages 87–96. IEEE Computer Society Press, Oct. 2006.
[ITRS05] International technology roadmap for semiconductors, 2005.
[KHP04] T. Karnik, P. Hazucha, and J. Patel. Characterization of soft errors caused by single-event upsets in CMOS processes. IEEE Transactions on Dependable and Secure Computing, 1(2):128–143, April-June 2004.
[KK98] I. Koren and Z. Koren. Defect tolerance in VLSI circuits: Techniques and yield analysis. Proceedings of the IEEE, 86(9):1819–1838, Sep. 1998.
[Lam84] L. Lamport. Buridan's principle. Technical report, SRI Technical Report, 1984.
[Lam03] L. Lamport. Arbitration-free synchronization. Distributed Computing, 16(2/3):219–237, September 2003.
[LP76] L. Lamport and R. Palais. On the glitch phenomenon. Technical report, SRI Technical Report, 1976.
[LS03] G. Le Lann and U. Schmid. How to implement a timer-free perfect failure detector in partially synchronous systems. Technical Report 183/1-127, Department of Automation, Technische Universität Wien, January 2003.
[Mar81] L. Marino. General theory of metastable operation. IEEE Transactions on Computers, C-30(2):107–115, February 1981.
[MA01] M. S. Maza and M. L. Aranda. Analysis of clock distribution networks in the presence of crosstalk and groundbounce. In Proceedings International IEEE Conference on Electronics, Circuits, and Systems (ICECS), pages 773–776, 2001.


[Nic05] M. Nicolaidis. Design for soft error mitigation. IEEE Transactions on Device and Materials Reliability, 5(3):405–418, Sept. 2005.
[Nor96] E. Normand. Single-event effects in avionics. IEEE Transactions on Nuclear Science, 43(2):461–474, Apr. 1996.
[PB93] M. Peercy and P. Banerjee. Fault tolerant VLSI systems. Proceedings of the IEEE, 81(5):745–758, May 1993.
[Res01] P. J. Restle and others. A clock distribution network for microprocessors. IEEE Journal of Solid-State Circuits, 36(5):792–799, May 2001.
[RDS90] L. M. Reyneri, D. DelCorso, and B. Sacco. Oscillatory metastability in homogeneous and inhomogeneous flip-flops. IEEE Journal of Solid-State Circuits, SC-25(1):254–264, February 1990.
[RS08] P. Robinson and U. Schmid. The Asynchronous Bounded-Cycle Model. Proceedings SSS'08, 2008.
[SE02] I. E. Sutherland and J. Ebergen. Computers without Clocks. Scientific American, 287(2):62–69, Aug. 2002.
[Sut89] I. E. Sutherland. Micropipelines. Communications of the ACM, Turing Award, 32(6):720–738, June 1989. ISSN: 0001-0782.
[WLS05] J. Widder, G. Le Lann, and U. Schmid. Failure detection with booting in partially synchronous systems. In Proceedings of the 5th European Dependable Computing Conference (EDCC-5), volume 3463 of LNCS, pages 20–37, Budapest, Hungary, Apr. 2005. Springer Verlag.
[WS05] J. Widder and U. Schmid. Achieving synchrony without clocks. Research Report 49/2005, Technische Universität Wien, Institut für Technische Informatik, 2005. (submitted).