Verification at HP Labs

Mark Tuttle

(with the help of many friends at)

HP Labs

Slide 2 of 47

Overview of verification work

• Cache coherence protocols
– Alpha EV6, EV7, EV8 protocols
– Itanium

• Bus protocols
– PCI-X, Infiniband (FIO/NGIO/SIO)

• Database systems

• Distributed algorithms

• A SAT-based bounded model checker
– Applications to Itanium software

Slide 3 of 47

Most of this work uses TLA+

• Lamport’s specification language based on set theory, first-order logic, temporal logic

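• Hierarchical style improves readability, rigor
– specifications: flat formulas become structured lists
– proofs: flat arguments become numbered, nested steps (<1>1. <2>1. CASE, <2>2. CASE, <2>3. QED)

• Most find reading easy, writing not too hard

[Figure: formulas over A, B, and C shown before and after restructuring.]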

Slide 4 of 47

Wildfire: EV6 cache coherence

Kourosh Gharachorloo, Madhu Sharma, Simon Steely, Steve Van Doren

[Figure: a 32-processor server built from eight quads joined by a global switch. Each quad has four processors (P1 to P4) on a local switch, together with mem, DIR, TTT, DTAG, a global port, and an arbiter.]

Slide 5 of 47

Directory-based cache coherence

[Figure: processors P1 to P4, a memory holding x = 5, and a directory entry for x that records its copies and its owner.]

To get x, go to x’s directory to see who owns x.

Slide 6 of 47

Get read-only copy

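[Figure: P sends Rd(x) to the directory, where x has copies=Q and owner=Q. Fwd(x) goes to Q, and Fill(x,5) returns the data to P. Afterwards the directory lists copies=P,Q.]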

Slide 7 of 47

Get writable copy

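[Figure: P sends RdEx(x) to the directory, where x has copies=Q,R,S and owner=Q. FwdRdEx(x) goes to Q, FillEx(x,5) returns the data to P, and two Inval(x) messages go to the other copy holders. Afterwards the directory lists copies=P and owner=P.]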
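The read-only and writable requests on the last two slides can be sketched as a toy directory in Python. This is a minimal sketch under assumptions, not the Wildfire protocol: the names DirEntry, Directory, rd, and rdex are hypothetical, messages are only logged rather than delivered asynchronously, and the owner is assumed to give up its copy when it answers FwdRdEx.

```python
from dataclasses import dataclass, field


@dataclass
class DirEntry:
    owner: str                                 # processor that supplies the data
    copies: set = field(default_factory=set)   # processors holding a copy


class Directory:
    """Toy directory: one entry per address, messages recorded in a log."""

    def __init__(self):
        self.entries = {}
        self.log = []

    def add(self, adr, owner, copies):
        self.entries[adr] = DirEntry(owner, set(copies))

    def rd(self, proc, adr):
        """Read-only copy: forward to the owner, who fills the requester."""
        e = self.entries[adr]
        self.log += [f"Rd({adr}) {proc}->dir",
                     f"Fwd({adr}) dir->{e.owner}",
                     f"Fill({adr}) {e.owner}->{proc}"]
        e.copies.add(proc)

    def rdex(self, proc, adr):
        """Writable copy: owner fills the requester, other copies are invalidated."""
        e = self.entries[adr]
        self.log += [f"RdEx({adr}) {proc}->dir",
                     f"FwdRdEx({adr}) dir->{e.owner}",
                     f"FillEx({adr}) {e.owner}->{proc}"]
        for holder in sorted(e.copies - {e.owner, proc}):
            self.log.append(f"Inval({adr}) dir->{holder}")
        e.owner, e.copies = proc, {proc}


d = Directory()
d.add("x", owner="Q", copies={"Q", "R", "S"})
d.rdex("P", "x")
print(d.entries["x"])
print(d.log)
```

Even in this toy form the directory state changes at once while the logged messages would still be in flight in real hardware, which is where the difficulty in the following slides comes from.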

Slide 8 of 47

A complicated protocol

Directory can be many steps ahead of processors

[Figure: message diagram with R1, R2, R3, and Dir, showing three RdEx(x) requests, two FwdRdEx(x) forwards, and three Data responses.]

Slide 9 of 47

A complicated protocol

Generates data and commit events independently

• Memory barriers impose instruction order
– Maintain count of outstanding off-chip requests
– Pass memory barrier only when count is 0

[Figure: two timelines for the program read A; MB; read B, each receiving inval(x) and inval(y). In the first, only data(A) arrives; in the second, commit(A) and data(A) arrive as separate events.]
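The counting rule for memory barriers can be sketched in a few lines of Python. This is an illustration only: the class and method names are hypothetical, and the sketch assumes that the count drops when a request's commit event arrives, which the slide does not state explicitly.

```python
class BarrierCounter:
    """Tracks outstanding off-chip requests for one processor (hypothetical model)."""

    def __init__(self):
        self.outstanding = 0

    def issue(self):
        # An off-chip request leaves the processor.
        self.outstanding += 1

    def commit(self):
        # Assumption: the commit event, not the data, retires the request.
        if self.outstanding == 0:
            raise ValueError("commit without an outstanding request")
        self.outstanding -= 1

    def can_pass_mb(self):
        # A memory barrier may be passed only when nothing is outstanding.
        return self.outstanding == 0


p = BarrierCounter()
p.issue()                        # read A goes off chip
blocked = not p.can_pass_mb()    # the MB must wait here
p.commit()                       # commit(A) arrives
print(blocked, p.can_pass_mb())
```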

Slide 10 of 47

Dramatic speedups possible

[Figure: two panels. "Reads are fast": read A; … work …; MB; read B, with inval(x), inval(y), commit(A), and data(A). "MBs are fast": read A; MB; read B, with commit(A) and data(A) as separate events and fwd(A) sent to the owner.]

“Intuitively surprising this actually works!”

Slide 11 of 47

Wildfire verification

Paul Harter, Leslie Lamport, Mark Tuttle, Yuan Yu

• We are asked to look at the protocol

• We arrive very late (almost tape-out)

• No time for complete proof

• But enough time for a rigorous analysis

Slide 12 of 47

Wildfire cache coherence in “three easy steps” + “two man-years”

Step 1: Model Alpha memory model. (200 lines)

Step 2: Model abstract protocol. (500 lines)

Prove implementation. (550 lines, 2 months, informal)

Step 3: Model complete protocol. (2000 lines, 3 months)

Prove implementation. (5500 lines, 4+ months, incomplete)

Slide 13 of 47

Step 1: Alpha memory model

• Official specification is
– Informal: an English document
– Behavioral: defines acceptable sequences of memory operations

• Our specification is
– Precise: a single logical formula
– State-based: required for invariance-style proofs

• We did simplify the model slightly:
– Operations read and write entire cache lines
– Some “impossible” implementations ruled out

• Compare the specifications: 12 pages vs 200 lines

Slide 14 of 47

The heart of the model

• A Before order
– Orders reads and writes in an execution
– Determines return values for the reads

• A GoodExecutionOrder predicate
– Defines the Before orders allowed by the model

Slide 15 of 47

State machine actions

• ReceiveRequest(proc, req): Receive a request

• ChooseNewData(proc, idx): Choose the return value for a request

• Respond(proc, idx): Return the value to a request

• ExtendBefore: Expand the Before relation

• Actions must preserve GoodExecutionOrder.

Slide 16 of 47

GoodExecutionOrder

GoodExecutionOrder ==
  LET [some definitions deleted]
  IN  /\ (* Before is a partial order. *)
         /\ Before \subseteq ReqId \X ReqId
         /\ \A r1, r2 \in ReqId : IsBefore(r1, r2) => ~IsBefore(r2, r1)
         /\ \A r1, r2, r3 \in ReqId :
               IsBefore(r1, r2) /\ IsBefore(r2, r3) => IsBefore(r1, r3)
      /\ (* SourceOrder implies the Before order. *)
         \A r1, r2 \in ReqId : SourceOrder(r1, r2) => IsBefore(r1, r2)
      /\ (* RequestOrder implies the Before order. *)
         \A r1, r2 \in ReqId : RequestOrder(r1, r2) => IsBefore(r1, r2)

This is the hard part --- look how short it is!
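The first three conjuncts of GoodExecutionOrder can be mirrored as a small executable check. This is a sketch under assumptions: Before, SourceOrder, and RequestOrder are represented as Python sets of pairs, the function names are hypothetical, and the later axioms (write ordering, LL/SC, values) are left out.

```python
def is_partial_order(before, req_ids):
    """Before relates request ids only, is asymmetric, and is transitive."""
    in_domain = all(a in req_ids and b in req_ids for a, b in before)
    asymmetric = all((b, a) not in before for a, b in before)
    transitive = all((a, d) in before
                     for a, b in before
                     for c, d in before if b == c)
    return in_domain and asymmetric and transitive


def good_execution_order(before, req_ids, source_order, request_order):
    """Partial order that contains both SourceOrder and RequestOrder."""
    return (is_partial_order(before, req_ids)
            and source_order <= before
            and request_order <= before)


reqs = {1, 2, 3}
before = {(1, 2), (2, 3), (1, 3)}
print(good_execution_order(before, reqs, {(1, 2)}, {(2, 3)}))
```

Asymmetry applied to a pair (r, r) also rules out reflexive entries, just as the TLA+ conjunct does when r1 = r2.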

Slide 17 of 47

      /\ (* Writes and successful SCs to the same location that *)
         (* have issued a response are totally ordered.         *)
         \A r1, r2 \in ReqId :
            /\ ReqIdQ[r1].req.type \in {"Wr", "SC"}
            /\ ReqIdQ[r1].req.newData # "Failed"
            /\ ReqIdQ[r1].req.responded
            /\ ReqIdQ[r2].req.type \in {"Wr", "SC"}
            /\ ReqIdQ[r2].req.newData # "Failed"
            /\ ReqIdQ[r2].req.responded
            /\ ReqIdQ[r1].req.adr = ReqIdQ[r2].req.adr
            => IsBefore(r1, r2) \/ IsBefore(r2, r1)

Slide 18 of 47

      /\ (* LL/SC Axiom: For each successful SC, there is a matching LL and *)
         (* there is no write to the same address from a different          *)
         (* processor between the LL and SC in the Before order.            *)
         \A r2 \in ReqId :
            /\ ReqIdQ[r2].req.type = "SC"
            /\ ReqIdQ[r2].newData \notin {Failed, NotChosen}
            => \E r1 \in ReqId :
                  /\ LLSCPair(r1, r2)
                  /\ \A r \in ReqId :
                        /\ \/ ReqIdQ[r].req.type = "Wr"
                           \/ /\ ReqIdQ[r].req.type = "SC"
                              /\ ReqIdQ[r].newData \notin {NotChosen, Failed}
                        /\ r[1] # r2[1]
                        /\ ReqIdQ[r2].req.adr = ReqIdQ[r].req.adr
                        => ~IsBefore(r1, r) \/ ~IsBefore(r, r2)

Slide 19 of 47

      /\ (* Value Axiom: A read reads from the preceding write in the *)
         (* Before order.                                             *)
         \A r1, r2 \in ReqId :
            /\ ReqIdQ[r2].source # NoSource
            /\ ReqIdQ[r1].req.type = "Wr"
            /\ ReqIdQ[r1].req.adr = ReqIdQ[r2].req.adr
            => IF ReqIdQ[r2].source = FromInitMem
                 THEN ~IsBefore(r1, r2)
                 ELSE \/ ~IsBefore(ReqIdQ[r2].source, r1)
                      \/ ~IsBefore(r1, r2)

Slide 20 of 47

Step 2: Model abstract protocol

protocol = abstract protocol + implementation junk

Surprisingly,
– abstract protocol’s correctness was far from obvious
– we discovered a bug… in the memory model

Proved hardest part of correctness:
– Proved the Before order is acyclic
– 35-line invariant based on 300 lines of definitions
– 550-line proof, cases nested 10 levels deep

Slide 21 of 47

Found: Alpha memory model bug

Original Alpha memory model allowed

Initially: x=0, y=0

P: if x=1 then y:=2

Q: if y=2 then x:=1

Finally: x=1, y=2

This behavior breaks the critical section implementation recommended in the SRM.

(Jim Saxe)
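For comparison, the two-processor program on this slide can be run under every interleaving of a single shared memory. This is a sketch under a stated assumption: it uses sequential consistency as a baseline, which is not the Alpha memory model, and the helper names are hypothetical. Under that baseline the outcome x=1, y=2 never appears, which is why a model that allows it is surprising.

```python
from itertools import permutations

STEPS = [("P", "read"), ("P", "write"), ("Q", "read"), ("Q", "write")]


def run(schedule):
    """Execute one interleaving against a single shared memory."""
    mem = {"x": 0, "y": 0}
    seen = {}                        # value each processor read
    for proc, step in schedule:
        if proc == "P":              # P: if x=1 then y:=2
            if step == "read":
                seen["P"] = mem["x"]
            elif seen["P"] == 1:
                mem["y"] = 2
        else:                        # Q: if y=2 then x:=1
            if step == "read":
                seen["Q"] = mem["y"]
            elif seen["Q"] == 2:
                mem["x"] = 1
    return mem["x"], mem["y"]


def sc_outcomes():
    """Final (x, y) values over all interleavings that respect program order."""
    outcomes = set()
    for order in permutations(STEPS):
        if all(order.index((p, "read")) < order.index((p, "write")) for p in "PQ"):
            outcomes.add(run(order))
    return outcomes


print(sc_outcomes())   # prints {(0, 0)}
```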

Slide 22 of 47

Revised Alpha memory model

[Figure: the reads and writes of P: if x=1 then y:=2 and Q: if y=2 then x:=1 form a causal cycle; the revised model breaks the cycle.]


Slide 23 of 47

Wildfire counterexample

The Alpha memory model says x=3, but in Wildfire it could be x=1…

P: x:=1

Q: if x=1 then y:=2

R: if y=2 then x:=3

Slide 24 of 47

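P: x:=1

Q: if x=1 then y:=2

R: if y=2 then x:=3

[Figure, step 1: P executes x:=1. ITD(x) goes to the directory and is answered ok, and the directory issues Inval(x). P holds x=1; R holds x=0.]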

Slide 25 of 47

[Figure, step 2: Q reads x with Rd(x) and Fwd(x) and receives x=1. The directory's Inval(x) is still outstanding. P holds x=1; R holds x=0.]

Slide 26 of 47

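[Figure, step 3: Q executes y:=2. ITD(y) is answered ok, so Q holds x=1 and y=2. The directory's Inval(x) is still outstanding. P holds x=1; R holds x=0.]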

Slide 27 of 47

[Figure, step 4: R reads y with Rd(y) and Fwd(y), receives y=2, and writes x:=3 (R: x=0,3). Inval(x) is still in flight. P holds x=1; Q holds x=1 and y=2.]

Slide 28 of 47

[Figure, step 5: R holds x=3 and y=2; P holds x=1; Q holds x=1 and y=2. Inval(x) is still in flight.]

The result must be x=3, but the result is x=1.

The same thing was possible in other machines. (Kourosh Gharachorloo)

Slide 29 of 47

What went wrong?

An ordering internal to P … forced an ordering for Q:

[Figure: P: if x=1 then y:=2, shown twice.]

The fix: use internal orderings to forbid orderings, but not to force orderings.

Slide 30 of 47

New Alpha memory model

P: x:=1

Q: if x=1 then y:=2

R: if y=2 then x:=3

There is no dependency/source cycle:

[Figure: a chain over reads R1, R2 and writes W1, W2, …, Wn.]

Slide 31 of 47

Step 3: Model complete protocol

Obstacle: no single, complete description

English documents: 12 documents, 4-inch stack

Lisp simulator: crucial to understanding some details

None compact, none mathematically tractable

Different levels of abstraction, some inconsistency

We had to write our own description

Slide 32 of 47

Obstacle: algorithm complexity

ChangeToDirty DummyRdVic FailedChangeToDirty Fetch InvalToDirty InvalToDirtyVic Rd RdMod RdVic RdVicMod QV_Fetch QV_Rd QV_RdMod WrVic ChangeToDirtyFailure ChangeToDirtySuccess

FetchFillMarker FillMarker FillMarkerMod ForwardFetch ForwardFetchWithFetchFillMarker ForwardRd ForwardRdMod ForwardRdWithFillMarker ForwardRdModWithFillMarkerMod

InvalAck InvalToDirtySuccess Invalidate LoopComsig LoopComsigWithInvalAck LoopComsigWithShadowClear

LoopComsigWithShadowInvalAndShadowClear ShadowChangeToDirtySuccess ShadowForwardFetch

ShadowForwardRd ShadowForwardRdMod ShadowInvalToDirtySuccess ShadowInvalidate ShadowShortFillMod

ShadowSnap ShortFetchFill ShortFill ShortFillMod VictimAck FetchFill Fill FillMod VCFetchFill VCFill VCFillMod

Slide 33 of 47

Solution: Quarks

• Ack
• ChangeToDirty
• Clear
• Comsig
• Fill
• ForwardedGet
• GetValue
• InvalidToDirty
• QuadInvalidate
• ReleaseMAF
• ReleaseVDB
• SetCacheLineState
• Victimize
• Write

Quarks combine to form messages.

Slide 34 of 47

Quarks form messages, then split up

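[Figure: quarks traveling between the reader, the home quad, the global switch, the owner, and the copy holders. The labels are GetValue and QuadInval; ForwardedGet; the combined message ForwardedGet, QuadInval, Comsig; and Comsig on its own.]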

Slide 35 of 47

Quarks resolve message overloading

• “ChangeToDirtySuccess” could mean
– {AckChangeToDirty, Comsig, QuadInvalidate^*, ClearOutstandingInval}
– {AckChangeToDirty, Comsig, QuadInvalidate^*}
– {Comsig, ReleaseMAF, SetCacheLineState}

Quarks simplify algorithm description

• Each quark processed separately, independently

• Each data structure changed by a single quark
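The idea that a message is a set of quarks, each handled on its own, can be sketched as a dispatch table. This is a hypothetical illustration: the quark names come from the third reading of ChangeToDirtySuccess, but the state layout and the effect of each handler are invented for the example and are not taken from the Wildfire specification.

```python
def handle_comsig(state, proc):
    # Touches only the commit-signal counters.
    state["comsigs"][proc] = state["comsigs"].get(proc, 0) + 1


def handle_release_maf(state, proc):
    # Touches only the set of processors with an outstanding MAF entry.
    state["maf"].discard(proc)


def handle_set_cache_line_state(state, proc):
    # Touches only the cache line state.
    state["cache_state"][proc] = "Dirty"


HANDLERS = {
    "Comsig": handle_comsig,
    "ReleaseMAF": handle_release_maf,
    "SetCacheLineState": handle_set_cache_line_state,
}


def process_message(state, proc, quarks):
    """Process each quark separately; handlers touch disjoint structures."""
    for quark in sorted(quarks):
        HANDLERS[quark](state, proc)
    return state


state = {"comsigs": {}, "maf": {"P"}, "cache_state": {"P": "Invalid"}}
process_message(state, "P", {"Comsig", "ReleaseMAF", "SetCacheLineState"})
print(state)
```

Because each handler changes a different data structure, the order in which the quarks of one message are processed does not affect the result in this sketch.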

Slide 36 of 47

Quark handling

If a processor receives a Fill quark carrying cacheable data, how is the cache updated?

ProcFieldsMessage(proc, msg) ==
  /\ ...
  /\ Cache' = CASE ...
                [] ("Fill" \in msg) /\ (subtype("Fill") # "Fetch")
                   -> [Cache EXCEPT
                         ![proc, cacheIndex].state =
                            IF subtype("Fill") = "Mod"
                              THEN "ExclusiveDirty"
                              ELSE "Clean",
                         ![proc, cacheIndex].tag = AddressToTag(msg.adr),
                         ![proc, cacheIndex].data = msg.data ]
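The Fill case of the TLA+ fragment on this slide can be mirrored in Python to make the update explicit. This is a sketch under assumptions: the message is modeled as a dictionary whose "quarks" entry maps a quark name to its subtype, the cache is a dictionary keyed by (proc, cache_index), and the address_to_tag function used in the example (dropping six low-order bits) is a placeholder, not the real AddressToTag.

```python
def apply_fill(cache, proc, cache_index, msg, address_to_tag):
    """Return the cache after proc processes msg; unchanged without a cacheable Fill."""
    subtype = msg["quarks"].get("Fill")      # None when the message has no Fill quark
    if subtype is None or subtype == "Fetch":
        return cache                         # fetches are uncached reads
    new_cache = dict(cache)                  # build the next state, like Cache'
    new_cache[(proc, cache_index)] = {
        "state": "ExclusiveDirty" if subtype == "Mod" else "Clean",
        "tag": address_to_tag(msg["adr"]),
        "data": msg["data"],
    }
    return new_cache


def example_tag(adr):
    return adr >> 6                          # placeholder tag function


msg = {"quarks": {"Fill": "Mod"}, "adr": 0x1240, "data": 5}
print(apply_fill({}, "P", 0, msg, example_tag))
```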

Slide 37 of 47

Wildfire invariant

Define an invariant describing all reachable states. (1000 lines)

Prove invariance.

We focused on the most difficult, error-prone parts:

[Figure: cache, dtag, and directory connected by messages, with an on-quad part (150 lines) and an off-quad part (150 lines).]

Slide 38 of 47

2./\ a.\/ (* proc is the owner of adr *)

1./\ Dir[adr].owner = proc

b.\/ (* proc is not the owner of adr *) ...

2./\ a.\/ (* dtag is dirty *)

1./\ DTagState(adr, proc) = Dirty...

b.\/ (* dtag is invalid *) ...

c.\/ (* dtag is clean *) ...

Dir - Dtag Invariant

DTagCacheInvariant == ...

Mother == DirDTagInvariant /\ DTagCacheInvariant /\ ...

DirDTagInvariant ==

\A adr \in MemBlockAddress, proc \in Processor :

a.\/ (* local address *) ...

b.\/ (* nonlocal address *)

1./\ ProcToQuad(proc) # AddressToQuad(adr)

2./\ Proj(HomeToArbQ) =[ [FG* [QFI] QI* AckWrite] QI* AGV(mod,1) | FG* AckCTD(Success)] FG*

Slide 39 of 47

DTag-Cache Invariance

ASSUME: /\ Mother /\ Wildfire

/\ DTagCacheInvariant(proc,adr)

PROVE: DTagCacheInvariant(proc,adr)'

<1>1. CASE a (* DTagState(proc, adr) = "Invalid" *)

<1>2. CASE b (* DTagState(proc, adr) # "Invalid" *)

<1>3. QED

Slide 40 of 47





<1>1. CASE a (* DTagState(proc, adr) = "Invalid" *)

<2>1. CASE a2a (* AddressCache(proc, adr).state' = "Invalid" *)

<2>2. CASE a2b (* AddressCache(proc, adr).state' # "Invalid" *)

<2>3. QED

<1>2. CASE b (* DTagState(proc, adr) # "Invalid" *)

<1>3. QED

Slide 41 of 47





<1>1. CASE a (* 1./\ DTagState(proc, adr) = "Invalid" *)

<2>1. CASE a2a (* 1. AddressCache(proc, adr).state' = "Invalid" *)

...

<14>1. CASE doing something at the proc

Pf: ....

<14>2. CASE doing something at the arb

<14>3. QED

...

<2>2. CASE a2b (* 1. AddressCache(proc, adr).state' # "Invalid" *)

<2>3. QED

<1>2. CASE b (* 1./\ DTagState(proc, adr) # "Invalid" *)

<1>3. QED

Slide 42 of 47

The implementation proof

In Step 2, we defined an abstract model of the Wildfire algorithm

In Step 3, we defined a complete model of the Wildfire algorithm

Now use the invariant to prove that the complete model implements the abstract model.

This proof is unfinished.

Slide 43 of 47

Results: one bug

A fetch is an uncached read.

Victimization removes data from the cache.

The bug allows a fetch to interfere with victimization.

To demonstrate the bug, we need to describe more of the hardware…

Slide 44 of 47

The quad architecture

[Figure: a quad containing four processors (proc) with caches, P, Arb, GP, dtag, directory, memory, and ttt, with a switch to other quads.]

Slide 45 of 47

Dtag: a duplicate copy of cache state

One use: invalidate all copies on a quad.

[Figure: P's cache holds y r/w, and the dtag at the Arb records y P r/w. An inval(y) reaching the Arb is passed on as inval(y) to P's cache.]

Slide 46 of 47

TTT: tells state of off-quad requests

[Figure: P's cache holds y. A write(y) from P passes the GP and its ttt on the way off quad, and ackwrite(y) comes back.]

Slide 47 of 47

The Bug

By causing a fetch to interfere with a victimization, we can generate an Inval(y) to a cache that has no copy of y.

[Figure: the dtag at the Arb records y r/w for P, so an incoming inval(y) is forwarded as inval(y) to P's cache.]

Slide 48 of 47

Initial state: P owns y

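[Figure: dir: y: P. mem: y. dtag: y: P. P's cache holds y; Q, R, and S hold nothing. ttt, gp, and arb are shown between the processors and the directory.]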

Slide 49 of 47

Now P victimizes y to read x into the same cache line

[Figure: dir: y: P. mem: y. dtag: y: P. P's cache still holds y. ttt: write(y), get(x).]

Slide 50 of 47

So P is waiting for x

[Figure: dir: y has no owner. mem: y. dtag: y: P. P's cache no longer holds y. ttt: ackwrite(y), get(x).]

Slide 51 of 47

Now R becomes owner of y

[Figure: dir: y: R. mem: y. dtag: y: P. R owns y. ttt: ackwrite(y), get(x).]

Slide 52 of 47

Now P fetches y while waiting for x

[Figure: dir: y: R. mem: y. dtag: y: P. R owns y. ttt: ackwrite(y), get(x), fetch(y). Messages shown: fetch(y), ackfetch(y), fill(y).]

Slide 53 of 47

Now P gets its copy of y

[Figure: dir: y: R. mem: y. dtag: y: P. R owns y. ttt: ackwrite(y), get(x), ackfetch(y). Messages shown: fetch(y), ackfetch(y), fwd(y), fill(y).]

Slide 54 of 47

Now ackwrite arrives: the bug

[Figure: dir: y: R. mem: y. ttt: ackwrite(y), ackfetch(y). Messages shown: fetch(y), ackfetch(y). dtag: y: P.]

• ackwrites normally invalidate dtag entries

• half-completed reads normally inhibit this: the new data has reached the cache, and the dtag entry is for the new data

• but fetches are not cached, and should not be treated like cached reads

Slide 55 of 47

So P is still waiting for x

[Figure: dir: y: R. mem: y. dtag: y: P. R owns y. ttt: get(x).]

Slide 56 of 47

Now Q reads y

[Figure: dir: y: R,Q. mem: y. dtag: y: P,Q. Q gets y; R owns y. ttt: get(x).]

Slide 57 of 47

Now S becomes owner of y

[Figure: dir: y: R,Q. mem: y. dtag: y: P,Q. S owns y, and an inval(y) is sent. ttt: get(x).]

Slide 58 of 47

Now we are in trouble…

[Figure: inval(y) arrives at the quad, where the dtag still lists y: P,Q. ttt: get(x).]

The inval is forwarded to both P and Q, but P doesn’t have a copy to invalidate!

Slide 59 of 47

The bug is obvious in hindsight

• But our scenario exhibiting the bug
– is very long,
– uses 4 processors,
– uses 2 locations, and
– uses 15 messages.

• Finding this scenario seemed beyond the power of automated tools like model checkers.

Slide 60 of 47

Wildfire conclusion

• We performed a rigorous analysis.

• We studied the hardest parts of the algorithm.

• Designers said their confidence in the algorithm was much improved.

• We expected to find more errors
– Designers knew what they were doing
– We joined late in the design cycle:
  • We had been asked to study the protocol
  • Bugs at protocol level had already been found
  • All remaining bugs were at the implementation level

Slide 61 of 47

EV7 cache coherence

Joshua Scheid, Homayoon Akhiani, Jonathan Nall, Damien Doligez, Scott Kreider, Scott Taylor, Brannon Batson

• Much simpler protocol, proof actually completed

• First TLA+ specification written by engineers

• First intense application of TLC model checker

• New, interesting uses of spec in simulation

Slide 62 of 47

Results

• 73 bugs found
– Most bugs were ambiguities in design documents
  • 37 minor: typos, type errors, etc.
  • 11 bugs: wrong message sent/wrong state set
  • 14 missing cases
  • 7 spurious cases (dead code)
– 5 bugs were actual implementation bugs
  • 1 found by TLC
  • 4 found by using TLC error traces for RTL simulation…

Slide 63 of 47

Interesting spec applications

• Translate TLC error traces into RTL stimulus
– Force RTL simulator into interesting corner cases

• Translate random TLC traces:
– Better than random stimulus: satisfies TLA+ spec!

• Translate RTL simulator output to TLA+
– TLC can check that RTL satisfies TLA+ spec
– TLC can trace visited states, improve coverage

• TLA+ specs yield good RTL assertions to check
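Checking a simulator trace against a specification amounts to checking that the first state is initial and that every step is allowed by the next-state relation. The sketch below illustrates that idea only: it is not how TLC works internally, the function name is hypothetical, and the counter specification used in the example is invented.

```python
def first_violation(trace, init, next_rel):
    """Return the index of the first state that breaks the spec, or None."""
    if not trace or not init(trace[0]):
        return 0
    for i in range(len(trace) - 1):
        if not next_rel(trace[i], trace[i + 1]):
            return i + 1
    return None


def counter_init(s):
    return s == 0


def counter_next(s, t):
    # The counter increments by one or stutters.
    return t == s + 1 or t == s


print(first_violation([0, 1, 1, 2], counter_init, counter_next))
print(first_violation([0, 1, 3], counter_init, counter_next))
```

The first call reports no violation and the second points at the state where the trace leaves the specification, which is the kind of feedback an RTL engineer would want from a translated simulation log.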

Slide 64 of 47

Itanium cache coherence

Mark Tuttle, Jae Yang

• Used Intel chips, modeled their external behavior

• Simply writing spec yielded two design changes

• Too big for TLC:
– Intel chip models allowed too many behaviors
– Interesting scenarios required large configurations
– Used TLC for simulation, not model checking

• Most interesting: TLA+ Itanium memory model
– With Gil Neiger, Leslie Lamport, Yuan Yu

Slide 65 of 47

Interconnect verification

• PCI-X: a high-speed extension to PCI bus
– Tom Rodeheffer, Mark Tuttle
– Found fatal flaws in submissions to standards group

• Infiniband (FIO/NGIO/SIO)
– Mark Tuttle, Jae Yang, George Zhang, …

Slide 66 of 47

Database recovery

Dave Lomet, Mark Tuttle

[Figure: in memory, a cache manager with a cache holding x and a log manager with a log cache holding the operation O: x := x+1; on disk, the data and the log.]

Slide 67 of 47

After a crash, only the disk remains

[Figure: the disk, holding the data and the log.]

• Recovery manager must reconstruct the database

• Recovery manager has only the bits left on disk

• Our theory explains how bits must be managed

Slide 68 of 47

Recovery theory

• Define an ordering on the database operations

• Theorem: If cache manager follows order, then state remains recoverable.

• Theorem: If recovery manager follows order from a recoverable state, then recovery succeeds.

The proofs were done and perfect …

… but model checking found 3 subtle mistakes!

Slide 69 of 47

Robot rendezvous

Maurice Herlihy, Mark Tuttle

• Robots parachute onto a graph, move around the graph, and rendezvous on a single node.

• Protocol is complete, except for the “move” function

• Model checking shows no “move” will work

• Saved a week of useless search for “move”!

Slide 70 of 47

The dream

• Model checking whiteboard conversations.
– This is where the real design happens.
– This is where the problems are encountered.

• This requires an abstract, expressive language

• TLA+ is a good language

• TLC is too slow.

What can we take from TLA+ …

… and still get reasonable performance?

Slide 71 of 47

SAT-based BMC

Rajeev Joshi, John Matthews, Mark Tuttle

• TLA+ is the right language, TLC is too slow

• Why not use TLA+?
– Thought a typed language would help (TLA+ is untyped)
– Wanted to be able to change the language (TLA+ is hard to parse)
– In hindsight, this may not have been necessary

Slide 72 of 47

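SAT-based Model Checking

Rajeev Joshi, John Matthews, Mark Tuttle

[Figure: a pipeline. The model checker turns the protocol and property into a Boolean formula, e.g. x and (y or z). A SAT checker returns a satisfying assignment, e.g. x = true, y = true, z = false. The assignment is read back as a counterexample trace S0 S1 S2 S3 … S (property violated!). One step of the pipeline is labeled "Only nontrivial step".]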
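The pipeline on this slide can be illustrated with a tiny bounded model checker. This is a sketch under loud assumptions: exhaustive enumeration of Boolean assignments stands in for a real SAT solver, the function bmc and the 2-bit counter example are invented for illustration, and none of this reflects how the MLA compiler is built.

```python
from itertools import product


def bmc(init, trans, bad, num_bits, max_depth):
    """Return the shortest trace s0..sk (k <= max_depth) that starts in an
    initial state, follows the transition relation, and ends in a bad state;
    return None if no such trace exists within the bound."""
    for depth in range(max_depth + 1):
        num_vars = num_bits * (depth + 1)          # one copy of the state per step
        for bits in product([False, True], repeat=num_vars):
            states = [bits[i * num_bits:(i + 1) * num_bits]
                      for i in range(depth + 1)]
            if (init(states[0])
                    and all(trans(s, t) for s, t in zip(states, states[1:]))
                    and bad(states[-1])):
                return states                      # satisfying assignment = trace
    return None


def value(state):
    """Read a 2-bit state as an integer."""
    return 2 * state[0] + state[1]


def init(s):
    return value(s) == 0


def trans(s, t):
    return value(t) == (value(s) + 1) % 4


def bad(s):
    return value(s) == 3                           # violates "counter never reaches 3"


trace = bmc(init, trans, bad, num_bits=2, max_depth=3)
print([value(s) for s in trace])                   # prints [0, 1, 2, 3]
```

The unrolled formula is the conjunction of init on the first state copy, trans on each adjacent pair, and bad on the last copy; a satisfying assignment to the state bits is exactly a counterexample trace.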

Slide 73 of 47

MLA Language

• Types: booleans, integer ranges, records, enums, bounded sequences, finite sets, recursive functions

• Operators: arithmetic, logical, relational

• Value constructors: lambdas, etc.

• Expressions: let, case, if, quantification

• MLA compiler:
– 16,000 lines of ML (wc)
– Function translation only source of trouble

Slide 74 of 47

Itanium program verification

• Given
– Assembly language program (compiler output)
– Safety property (an invariant)

• Is safety violated by an execution of the program allowed by the Itanium memory model?

• Thinking of synchronization code (mutex)

• Examples were the Dekker and Bakery algorithms

Slide 75 of 47

Conclusion

Formal methods are up to industrial problems.

Formal methods have incremental payoffs:

Specification documents the design.

Model checking quickly finds design errors.

Proof writing finds deeper design errors.

Any partial proof is still a rigorous analysis.
