One Flip per Clock Cycle Martin Henz, Edgar Tan, Roland Yap


Page 1: One Flip per Clock Cycle

One Flip per Clock Cycle

Martin Henz, Edgar Tan, Roland Yap

Page 2: One Flip per Clock Cycle

SAT Problems

Find an assignment of n variables that satisfies all m clauses (disjunctions of literals of variables)

Notation:

V: array of boolean values; V[3] is the value of the third variable in assignment V

EVAL_i(V): evaluation function of clause i; returns the boolean value that results from evaluating clause i under assignment V
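The notation above can be mirrored in a few lines of Python. This is a minimal sketch, not code from the talk: it assumes clauses are stored as DIMACS-style signed integers (+k for variable k, -k for its negation) and pads V with an unused V[0] so that V[3] really is the third variable. The name `eval_clause` is hypothetical.

```python
from typing import List

Clause = List[int]  # assumption: DIMACS-style literals, +k is variable k, -k its negation

def eval_clause(clause: Clause, V: List[bool]) -> bool:
    """EVAL_i(V): True if at least one literal of the clause holds under V."""
    return any(V[abs(lit)] == (lit > 0) for lit in clause)

# Assignment for n = 3 variables; V[0] is unused padding (1-based variables).
V = [False, True, False, True]
cnf = [[1, -2], [-1, 3], [2, -3]]

# One boolean per clause; V satisfies the formula only if all of them are True.
results = [eval_clause(c, V) for c in cnf]
print(results)
```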

Page 3: One Flip per Clock Cycle

GenSAT

procedure GenSAT(cnf, maxtries, maxflips)
  for i = 1 to maxtries do
    INITASSIGN(V);
    for j = 1 to maxflips do
      if V satisfies cnf then return V
      else
        f = CHOOSEFLIP();
        V := V with variable f flipped
      end
    end
  end
end
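A direct software transcription of the GenSAT loop might look as follows. It is a sketch under stated assumptions: clauses are DIMACS-style signed integers, V[0] is unused padding, INITASSIGN is taken to be a uniformly random assignment, and `random_flip` is only a hypothetical placeholder for CHOOSEFLIP.

```python
import random

def satisfies(cnf, V):
    """True if every clause has at least one true literal under V."""
    return all(any(V[abs(l)] == (l > 0) for l in c) for c in cnf)

def gensat(cnf, n, maxtries, maxflips, choose_flip, rng=None):
    """Generic local search; returns a satisfying assignment or None."""
    if rng is None:
        rng = random.Random(0)
    for _ in range(maxtries):
        # INITASSIGN (assumed random); V[0] is unused so variables are 1-based
        V = [False] + [rng.random() < 0.5 for _ in range(n)]
        for _ in range(maxflips):
            if satisfies(cnf, V):
                return V
            f = choose_flip(cnf, V, rng)
            V[f] = not V[f]
    return None

def random_flip(cnf, V, rng):
    """Placeholder CHOOSEFLIP: pick any variable uniformly at random."""
    return rng.randrange(1, len(V))

# A tautology is satisfied by the first assignment; x1 AND NOT x1 never is.
trivial = gensat([[1, -1]], n=1, maxtries=1, maxflips=1, choose_flip=random_flip)
unsat = gensat([[1], [-1]], n=1, maxtries=3, maxflips=5, choose_flip=random_flip)
```

Passing CHOOSEFLIP in as a function keeps the outer loops fixed while the selection strategy varies.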

Page 4: One Flip per Clock Cycle

Instances of GenSAT

• GSAT: CHOOSEFLIP randomly chooses a flip that produces maximal score

• WSAT: CHOOSEFLIP randomly chooses a violated clause, and randomly chooses among the variables of that clause a flip that produces maximal score

• GWSAT: choose randomly whether to do GSAT flip or WSAT flip

• GSAT/Tabu: prevent quick flipping back

• HSAT: use history for tie breaking: choose least recently flipped variable
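To make the difference between the first two variants concrete, here is a rough Python sketch of a GSAT-style and a WSAT-style CHOOSEFLIP. It assumes that "score" means the number of satisfied clauses after the flip and that clauses are DIMACS-style signed integers; the function names are hypothetical.

```python
import random

def num_satisfied(cnf, V):
    return sum(any(V[abs(l)] == (l > 0) for l in c) for c in cnf)

def score_after_flip(cnf, V, i):
    """Score of V with variable i flipped; V is restored before returning."""
    V[i] = not V[i]
    s = num_satisfied(cnf, V)
    V[i] = not V[i]
    return s

def gsat_flip(cnf, V, rng):
    """GSAT: random choice among all variables whose flip gives maximal score."""
    scores = {i: score_after_flip(cnf, V, i) for i in range(1, len(V))}
    best = max(scores.values())
    return rng.choice([i for i, s in scores.items() if s == best])

def wsat_flip(cnf, V, rng):
    """WSAT: pick a violated clause at random, then the best flip within it.
    Precondition: at least one clause is violated under V."""
    violated = [c for c in cnf if not any(V[abs(l)] == (l > 0) for l in c)]
    clause = rng.choice(violated)
    candidates = sorted({abs(l) for l in clause})
    scores = {i: score_after_flip(cnf, V, i) for i in candidates}
    best = max(scores.values())
    return rng.choice([i for i in candidates if scores[i] == best])

cnf = [[1, 2], [2, 3], [-1, -3]]
V = [False, False, False, False]   # V[0] unused
rng = random.Random(1)
g = gsat_flip(cnf, V, rng)
w = wsat_flip(cnf, V, rng)
```

In this small instance flipping variable 2 satisfies all three clauses, so both strategies agree; in general WSAT only looks at the variables of one violated clause.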

Page 5: One Flip per Clock Cycle

FPGAs

• ASICs: application-specific integrated circuits
  – customer describes logic behavior in a hardware description language such as VHDL
  – vendor designs and produces integrated circuit with this behavior

• Masked gate arrays
  – ASIC with transistors arranged in a grid-like manner
  – initially unconnected; mass produced
  – add final conductor layers for connecting components

• FPGAs: field programmable gate arrays

Page 6: One Flip per Clock Cycle

Current Line of FPGAs: Example

• Xilinx XCV1000

• 4MBytes on-board RAM

• max clock rate 300 MHz

• max clock rate using on-board RAM 33MHz

• 6144 CLBs (configurable logic blocks)

• roughly 1M system gates

• 1 Mbit of distributed RAM

• each CLB is divided into 2 slices

• thus 12,288 slices available

Page 7: One Flip per Clock Cycle

Programming FPGAs

• Massively parallel computer with random access memory

• Instructions are compiled into hardware; no runtime stacks; no functions; no recursion…

• In practice, hardware description languages like VHDL are used to program FPGAs

• Newer development: Handel C

Page 8: One Flip per Clock Cycle

NESL-like Syntax for Parallelism

| P | gates for P | depth of P |
|---|---|---|
| x := y + z | g(P) = O(1) | d(P) = O(1) |
| Q; R | g(P) = g(Q) + g(R) | d(P) = d(Q) + d(R) |
| {e(i) : i ∈ S} | g(P) = Σ_i g(e(i)) | d(P) = max_i d(e(i)) |
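The gate and depth rules can be tried out with a tiny cost calculator. This is only an illustrative sketch: it reads the sequential rule as adding both gates and depths, reads the parallel rule as summing gates and taking the maximum depth, and uses unit cost as a stand-in for O(1). The names `Cost`, `seq` and `par` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cost:
    gates: int
    depth: int

def seq(q: Cost, r: Cost) -> Cost:
    """Q; R  (both gate counts and depths add up)."""
    return Cost(q.gates + r.gates, q.depth + r.depth)

def par(costs) -> Cost:
    """{e(i) : i in S}  (gates add up, depth is the maximum)."""
    costs = list(costs)
    return Cost(sum(c.gates for c in costs), max(c.depth for c in costs))

ASSIGN = Cost(1, 1)  # unit stand-in for the O(1) cost of x := y + z

eight_parallel = par([ASSIGN] * 8)            # eight additions side by side
then_one_more = seq(eight_parallel, ASSIGN)   # followed by one more assignment
```

The parallel form buys constant depth at the price of proportionally more gates, which is the trade-off the later steps exploit.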

Page 9: One Flip per Clock Cycle

Example

Let S be an array of statically known size n, where n is a power of 2.

macro SUM(S, n):
  if n = 1 then S[0]
  else SUM({ S[2i] + S[2i + 1] : i ∈ [0..n/2-1] }, n/2)

g(SUM(S, n)) = O(n)

d(SUM(S, n)) = O(log n)
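The two bounds can be checked by running the same reduction in software and counting. The sketch below is an assumption-laden stand-in for the hardware macro: each list comprehension plays the role of one layer of adders working in parallel, and `tree_sum` is a hypothetical name.

```python
def tree_sum(S):
    """Pairwise reduction as in SUM(S, n); returns (total, adders, depth)."""
    n = len(S)
    assert n >= 1 and n & (n - 1) == 0, "n must be a power of 2"
    adders = depth = 0
    while len(S) > 1:
        # one layer: all n/2 additions are independent of each other
        S = [S[2 * i] + S[2 * i + 1] for i in range(len(S) // 2)]
        adders += len(S)
        depth += 1
    return S[0], adders, depth

total, adders, depth = tree_sum(list(range(8)))
print(total, adders, depth)
```

For n inputs the loop uses n - 1 adders in log2(n) layers, matching g = O(n) and d = O(log n).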

Page 10: One Flip per Clock Cycle

Previous GSAT/FPGA Work

• Hamadi/Merceron: first non-software design of a local search algorithm; CP 97

• Yung/Seung/Lee/Leong: runtime reconfigurable version of Hamadi/Merceron work; first implementation; Conference on Field-programmable Logic and Applications, 1999

Page 11: One Flip per Clock Cycle

Naïve Parallel GSAT (Ham/Merc)

macro CHOOSEFLIP(f):
  max := -1; f := -1;
  for i = 1 to n do
    score := SUM({EVAL_j(V[¬V[i]/i]) : j ∈ [1…m]});
    if score > max ∨ (score = max ∧ RANDOMBIT()) then
      max := score; f := i
    end
  end

g(CHOOSEFLIP(f)) = O(n m)

d(CHOOSEFLIP(f)) = n * (O(log m) + O(log n)) = O(n log m)
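In software terms, this macro is a sequential scan over the variables with a coin flip on ties. The sketch below assumes the substitution in the score line means "V with variable i flipped" and that clauses are DIMACS-style signed integers; `choose_flip_naive` is a hypothetical name.

```python
import random

def choose_flip_naive(cnf, V, rng):
    """Sequential scan: keep the best score so far, break ties with a random bit."""
    best, f = -1, -1
    for i in range(1, len(V)):
        W = V[:i] + [not V[i]] + V[i + 1:]          # V with variable i flipped
        score = sum(any(W[abs(l)] == (l > 0) for l in c) for c in cnf)
        if score > best or (score == best and rng.getrandbits(1)):
            best, f = score, i
    return f

cnf = [[1, 2], [2, 3], [-1, -3]]
V = [False, False, False, False]   # V[0] unused
f = choose_flip_naive(cnf, V, random.Random(0))
```

Note that one coin flip per tie is not a uniform choice: among k tied variables the last one wins with probability 1/2 and the first with probability 1/2^(k-1).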

Page 12: One Flip per Clock Cycle

Step 1: Naïve Random GSAT

macro CHOOSEFLIP(f):
  max := -1; f := -1;
  MaxV := {0 : k ∈ [1…n]};
  for i = 1 to n do
    score := SUM({EVAL_j(V[¬V[i]/i]) : j ∈ [1…m]});
    if score > max then
      max := score; MaxV := {0 : k ∈ [1…n]}[1/i]
    else if score = max then MaxV := MaxV[1/i]
    end
  end
  f := CHOOSE_ONE(MaxV)

g and d are unchanged; d(CHOOSE_ONE) = O(log n), g(CHOOSE_ONE) = O(n)
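The bit-vector bookkeeping of Step 1 can be sketched in Python as below. It assumes MaxV marks every variable whose flip reaches the current maximum and that CHOOSE_ONE picks uniformly among the marked positions; the function name is hypothetical and the mask is returned only so it can be inspected.

```python
import random

def choose_flip_uniform(cnf, V, rng):
    """Collect all best flips in a 0/1 mask, then choose one of them uniformly."""
    n = len(V) - 1
    best = -1
    max_v = [0] * (n + 1)                     # max_v[0] is unused padding
    for i in range(1, n + 1):
        W = V[:i] + [not V[i]] + V[i + 1:]    # V with variable i flipped
        score = sum(any(W[abs(l)] == (l > 0) for l in c) for c in cnf)
        if score > best:
            best = score
            max_v = [0] * (n + 1)             # new maximum: reset the mask
            max_v[i] = 1
        elif score == best:
            max_v[i] = 1                      # tie: add variable i to the mask
    f = rng.choice([i for i in range(1, n + 1) if max_v[i]])   # CHOOSE_ONE
    return f, max_v

f, mask = choose_flip_uniform([[1, 2]], [False, False, False], random.Random(0))
```

Separating "collect the maxima" from "choose one" makes every tied variable equally likely.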

Page 13: One Flip per Clock Cycle

Step 2: Parallel Variable Scoring

macro CHOOSEFLIP(f):
  Scores := { SUM({EVAL_j(V[¬V[i]/i]) : j ∈ [1…m]}) : i ∈ [1…n] };
  f := CHOOSE_MAX(Scores);

d(CHOOSEFLIP(f)) = O(log m + log n) = O(log m)

g(CHOOSEFLIP(f)) = O(m n²)
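The structure of Step 2 can be imitated in Python, even though Python runs it sequentially. In this sketch every entry of the score list is computed independently (the part hardware would do side by side), and CHOOSE_MAX is modelled as a pairwise tournament whose number of rounds grows logarithmically. Ties are broken deterministically toward the larger index here, which is a simplification and not the random choice used in the talk; all names are hypothetical.

```python
def all_scores(cnf, V):
    """Score of every single-variable flip; entries do not depend on each other."""
    def score(W):
        return sum(any(W[abs(l)] == (l > 0) for l in c) for c in cnf)
    return [score(V[:i] + [not V[i]] + V[i + 1:]) for i in range(1, len(V))]

def choose_max(scores):
    """Tournament over (score, variable) pairs; returns (variable, rounds)."""
    layer = [(s, i) for i, s in enumerate(scores, start=1)]
    rounds = 0
    while len(layer) > 1:
        nxt = [max(layer[k], layer[k + 1]) for k in range(0, len(layer) - 1, 2)]
        if len(layer) % 2:
            nxt.append(layer[-1])   # odd element advances unchallenged
        layer = nxt
        rounds += 1
    return layer[0][1], rounds

cnf = [[1, 2], [2, 3], [-1, -3]]
V = [False, False, False, False]   # V[0] unused
scores = all_scores(cnf, V)
f, rounds = choose_max(scores)
```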

Page 14: One Flip per Clock Cycle

Step 3: Relative Scoring

• Selman/Levesque/Mitchell use a technique of relative scoring in their implementation.

• First thorough analysis of relative scoring in Hoos’ Diplomarbeit (diploma thesis)

• Idea: After every flip, update the score of those variables that are affected by the flip.

• Since clauses are small, the number of affected variables is much smaller than the overall number of variables

Page 15: One Flip per Clock Cycle

Some Notation

• NCl[i] is the number of clauses that contain the variable i

• MaxClauses = max_i NCl[i]; usually MaxClauses << m

• MaxVariables = max_j (number of variables in clause j)

• EVAL_j^{C(i)} evaluates the j-th clause from the set of clauses that contain the variable i

Page 16: One Flip per Clock Cycle

Relative Scoring

macro CHOOSE_FLIP(f):
  NewS := { SUM({EVAL_j^{C(i)}(V[¬V[i]/i]) : j ∈ [1…NCl[i]]}) : i ∈ [1…n] };
  OldS := { SUM({EVAL_j^{C(i)}(V) : j ∈ [1…NCl[i]]}) : i ∈ [1…n] };
  Diff := { NewS[i] - OldS[i] : i ∈ [1…n] };
  f := CHOOSE_MAX(Diff)

g(CHOOSE_FLIP(f)) = O(MaxVariables · MaxClauses · n)

d(CHOOSE_FLIP(f)) = O(log MaxClauses + log MaxVariables)
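A small Python sketch shows why restricting each variable to its own clauses is enough. It assumes DIMACS-style clauses and builds, for each variable i, the list of clauses that mention it (the set written C(i) above); the names `occurrences` and `relative_scores` are hypothetical.

```python
def sat_count(clauses, V):
    return sum(any(V[abs(l)] == (l > 0) for l in c) for c in clauses)

def occurrences(cnf, n):
    """occ[i] = clauses that contain variable i (positively or negatively)."""
    occ = [[] for _ in range(n + 1)]
    for c in cnf:
        for v in {abs(l) for l in c}:
            occ[v].append(c)
    return occ

def relative_scores(cnf, V):
    """Diff[i] = NewS[i] - OldS[i], evaluated only on the clauses containing i."""
    n = len(V) - 1
    occ = occurrences(cnf, n)
    diff = []
    for i in range(1, n + 1):
        old = sat_count(occ[i], V)
        W = V[:]
        W[i] = not W[i]
        new = sat_count(occ[i], W)
        diff.append(new - old)
    return diff

cnf = [[1, 2], [2, 3], [-1, -3]]
V = [False, False, False, False]   # V[0] unused
diff = relative_scores(cnf, V)

# Cross-check against full scoring: the gap to the current score is the same.
full = [sat_count(cnf, V[:i] + [not V[i]] + V[i + 1:]) for i in range(1, len(V))]
baseline = sat_count(cnf, V)
```

Clauses that do not mention variable i evaluate the same before and after the flip, so they cancel in the difference and the variable with maximal Diff is also the one with maximal full score.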

Page 17: One Flip per Clock Cycle

Step 4: Pipelining

procedure GenSAT(cnf, maxtries, maxflips)
  for i = 1 to maxtries do
    INITASSIGN(V);
    for j = 1 to maxflips do
      if V satisfies cnf then return V
      else
        f = CHOOSEFLIP();
        V := V with variable f flipped
      end
    end
  end
end

Page 18: One Flip per Clock Cycle

Pipelining Outer Loop

macro CHOOSE_FLIP(f):
  NewS := { SUM({EVAL_j^{C(i)}(V[¬V[i]/i]) : j ∈ [1…NCl[i]]}) : i ∈ [1…n] };
  OldS := { SUM({EVAL_j^{C(i)}(V) : j ∈ [1…NCl[i]]}) : i ∈ [1…n] };
  Diff := { NewS[i] - OldS[i] : i ∈ [1…n] };
  f := CHOOSE_MAX(Diff)

[Figure: the macro is annotated with four pipeline stages, STAGE I to STAGE IV. A timing diagram has one row per try (Try 1 to Try 4); each row repeats the sequence S I, S II, S III, S IV, and the rows appear staggered relative to one another.]
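One way to read the timing diagram is as an interleaving of independent tries: consecutive flips of the same try depend on each other, so the four stages are kept busy by four different tries. The following sketch simulates such a schedule under the assumption that try t enters the pipeline one clock cycle after try t-1; the exact offsets in the original figure are not recoverable, and `schedule` is a hypothetical name.

```python
STAGES = ["S I", "S II", "S III", "S IV"]

def schedule(num_tries, cycles):
    """Row t lists the stage try t occupies in each clock cycle (None = not started)."""
    table = []
    for t in range(num_tries):
        row = [None] * cycles
        for c in range(t, cycles):   # assumption: try t enters at cycle t
            row[c] = STAGES[(c - t) % len(STAGES)]
        table.append(row)
    return table

table = schedule(num_tries=4, cycles=8)
for t, row in enumerate(table, start=1):
    print(f"Try {t}:", " ".join(f"{s or '-':>5}" for s in row))

# Once the pipeline is full, S IV finishes for exactly one try in every cycle.
completions = [sum(row[c] == "S IV" for row in table) for c in range(3, 8)]
```

With the pipeline full, every stage is occupied in every cycle and one flip completes per clock cycle, spread across the tries.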

Page 19: One Flip per Clock Cycle

Preliminary Experiments

• Conducted on hill-climbing variant of GSAT

• Comparing software implementation by Selman/Kautz with Hamadi/Merceron and Step 4

• Software: running on Pentium II at 400MHz

• FPGA: running on Xilinx XCV1000 at 20 MHz; programmed using Handel C by Celoxica

Page 20: One Flip per Clock Cycle

Flips per Second

| DIMACS Problems | Software Sel/Kau | FPGA Ham/Mer | FPGA Step 4 | Speedup vs H/M |
|---|---|---|---|---|
| 50-80-1.6 | 128.5 K | 520 K | 25 M | 48 |
| 50-100-2.0 | 107.4 K | 520 K | 25 M | 48 |
| 100-160-1.6 | 139.6 K | 284 K | 22 M | 77.5 |
| 100-200-2.0 | 110.9 K | 284 K | 22 M | 77.5 |
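The speedup column can be recomputed from the other columns. The short script below only redoes that arithmetic with the figures from the table (K read as 10^3 and M as 10^6, which is an assumption about the units); the comparison against the software column is an extra ratio not shown on the slide.

```python
# name: (software, Hamadi/Merceron FPGA, Step 4 FPGA) in flips per second,
#       followed by the reported speedup of Step 4 over Hamadi/Merceron
rows = {
    "50-80-1.6":   (128.5e3, 520e3, 25e6, 48.0),
    "50-100-2.0":  (107.4e3, 520e3, 25e6, 48.0),
    "100-160-1.6": (139.6e3, 284e3, 22e6, 77.5),
    "100-200-2.0": (110.9e3, 284e3, 22e6, 77.5),
}

speedup_vs_hm = {name: step4 / hm for name, (_, hm, step4, _) in rows.items()}
speedup_vs_sw = {name: step4 / sw for name, (sw, _, step4, _) in rows.items()}

for name in rows:
    print(f"{name:12s} vs H/M: {speedup_vs_hm[name]:6.1f}   vs software: {speedup_vs_sw[name]:6.1f}")
```

The recomputed ratios agree with the reported speedups to within rounding, and on every listed instance the Step 4 design is more than 150 times faster than the software figure.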

Page 21: One Flip per Clock Cycle

Flips per Slice Second

| DIMACS Problems | Slices Ham/Mer | Flips / slice sec Ham/Mer | Slices Step 4 | Flips / slice sec Step 4 | Improvement |
|---|---|---|---|---|---|
| 50-80-1.6 | 651 | 800 | 1671 | 14950 | 18.7 |
| 50-100-2.0 | 704 | 740 | 1697 | 14700 | 19.9 |
| 100-160-1.6 | 1136 | 250 | 3154 | 6975 | 27.9 |
| 100-200-2.0 | 1240 | 230 | 3186 | 6900 | 30 |
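The flips-per-slice-second figures look like flips per second divided by the number of slices used. The script below checks that reading, taking the flips-per-second values from the "Flips per Second" slide (K as 10^3, M as 10^6); that the authors computed the column this way is an assumption, though the numbers agree to within rounding.

```python
# name: (slices H/M, flips/s H/M, reported H/M, slices Step 4, flips/s Step 4,
#        reported Step 4, reported improvement)
rows = {
    "50-80-1.6":   (651,  520e3, 800, 1671, 25e6, 14950, 18.7),
    "50-100-2.0":  (704,  520e3, 740, 1697, 25e6, 14700, 19.9),
    "100-160-1.6": (1136, 284e3, 250, 3154, 22e6, 6975,  27.9),
    "100-200-2.0": (1240, 284e3, 230, 3186, 22e6, 6900,  30.0),
}

def rel_err(computed, reported):
    return abs(computed - reported) / reported

checks = {}
for name, (sl_hm, fps_hm, rep_hm, sl_s4, fps_s4, rep_s4, impr) in rows.items():
    checks[name] = (
        rel_err(fps_hm / sl_hm, rep_hm),   # H/M flips per slice second
        rel_err(fps_s4 / sl_s4, rep_s4),   # Step 4 flips per slice second
        abs(rep_s4 / rep_hm - impr),       # improvement factor
    )
    print(name, [round(x, 4) for x in checks[name]])
```

Normalising by slices matters because the Step 4 design uses roughly 2.5 to 3 times as many slices as the Hamadi/Merceron design.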

Page 22: One Flip per Clock Cycle

Conclusions

• Fastest known one-chip implementation of GSAT

• using parallel relative scoring plus pipelining

• current size and speed make it feasible to use FPGAs as platforms for parallel algorithms

• FPGAs are one-chip parallel machines with serious limitations of programmability

• higher-level languages needed

• stack support needed: towards compiling parallel languages to hardware