ETTI Colloquia, Nov. 6, 2014
Can Parallel Computing Be Liberated From Ad Hoc Solutions? A Recursive MapReduce Approach and Its Implementation
Gheorghe M. Ștefan
http://arh.pub.ro/gstefan/
Slide 2
"The semiconductor industry threw the equivalent of a Hail Mary pass when it switched from making microprocessors run faster to putting more of them on a chip, doing so without any clear notion of how such devices would in general be programmed."
David Patterson, IEEE Spectrum, July 2010
Slide 3
Outline:
A little history
How parallel computing could be restarted
Kleene's mathematical model
The recursive MapReduce abstract model
Backus' architectural description
Programming the MapReduce hierarchy
Generic one-chip parallel structure
Concluding remarks
Slide 4
History: mono-core computation
1936, mathematical computational models: Turing, Post, Church, Kleene
1944-45, abstract machine models: the Harvard abstract model, the von Neumann abstract model
1953, manufacturing in quantity: IBM 701
1964, computer architecture: the concept allows independent evolution of software and hardware
Consequently, we now have a few stable and successful sequential architectures: x86, ARM, PowerPC, ...
Slide 5
History: parallel computation
1962, manufacturing in quantity: the first MIMD engine is introduced on the computer market by Burroughs
1965, architectural issues: Dijkstra formulates the first concerns about parallel programming
1974-76, abstract machine models: the first abstract models (the PRAM models) start to appear, after almost two decades of non-systematic experiments
?, computation model: it is out there, waiting for us
Consequently, "the semiconductor industry threw the equivalent of a Hail Mary pass when it switched from making microprocessors run faster to putting more of them on a chip"
Slide 6
About PRAM-like models
The Parallel Random Access Machine, PRAM (bit-vector models in [Pratt et al. 1974] and PRAM models in [Fortune and Wyllie 1978]), is considered a natural generalization of the Random Access Machine (RAM) model.
The Parallel Memory Hierarchy [Alpern et al. 1993] is also a generalization, this time of the Memory Hierarchy model applied to the RAM model.
The Bulk Synchronous Parallel (BSP) model divides the program into super-steps [Valiant 1990].
LogP (Latency, overhead, gap, Processors) is designed to model the communication cost [Culler et al. 1991].
Slide 7
How parallel computing could be consistently restarted
1. Use Kleene's partial recursive functions model as the foundational mathematical framework
2. Define an abstract machine model using meaningful forms derived from Kleene's model
3. Interface the abstract machine with an architectural (low-level) description based on Backus' FP Systems
4. Provide the simplest generic parallel structure able to run the functions requested by the architecture
5. Evaluate, using the computational motifs highlighted by Berkeley's View, the options made in the previous three steps, and improve them when needed
Slide 8
Kleene's mathematical model for parallel computation
Of the three rules (composition, primitive recursion, minimalization), only the first one, composition, is independent:
f(x) = g(h_1(x), ..., h_m(x))
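Because the functions h_1, ..., h_m depend only on x, they can be evaluated concurrently and g applied to their results. A minimal Python sketch of this reading of composition (the names compose, g, hs are illustrative, not from the talk):

```python
from concurrent.futures import ThreadPoolExecutor

def compose(g, hs):
    """Kleene composition f(x) = g(h_1(x), ..., h_m(x)).
    The h_i are mutually independent, so they may run in parallel."""
    def f(x):
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(lambda h: h(x), hs))
        return g(*results)
    return f

# Example: f(x) = (x + 1) * (x - 1)
f = compose(lambda a, b: a * b, [lambda x: x + 1, lambda x: x - 1])
```

The parallelism lives entirely in the evaluation of the h_i; g is the sequential synchronization point.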
Slide 9
Integral parallel abstract model: data-parallel
Slide 10
Integral parallel abstract model: reduction-parallel
Slide 11
Integral parallel abstract model: speculative-parallel
Slide 12
Integral parallel abstract model: time-parallel
Slide 13
Integral parallel abstract model: thread-parallel
Slide 14
Putting all the forms together: the integral parallel abstract model
The MapReduce abstract model:
Map covers data, speculative and thread parallelism
Reduce covers reduction parallelism
Slide 15
From one-chip to cloud: the MapReduce recursive abstract model
Slide 16
Backus' architectural description
John Backus: "Can Programming Be Liberated from the von Neumann Style? A Functional Style and Its Algebra of Programs", Communications of the ACM, August 1978.
Functional Programming Systems:
primitive functions
functional forms
definitions
Slide 17
Functional forms
Apply to all: af : x = <f:x_1, ..., f:x_p>, where x = <x_1, ..., x_p>
Construction: [f_1, ..., f_p] : x = <f_1:x, ..., f_p:x>
Threaded construction: [f_1, ..., f_p] : x = <f_1:x_1, ..., f_p:x_p>, where x = <x_1, ..., x_p>
Insert: /f : x = f : <x_1, /f:<x_2, ..., x_p>>, where x = <x_1, ..., x_p> and p >= 2
Composition: (f_q f_q-1 ... f_1) : x = f_q : (f_q-1 : (f_q-2 : (... : (f_1 : x))))
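The functional forms can be transcribed into Python as higher-order functions, one per form; this is a sketch for readability, not the FP notation itself (all names are illustrative):

```python
from functools import reduce

def apply_to_all(f):            # af : <x1,...,xp> = <f:x1,...,f:xp>
    return lambda xs: [f(x) for x in xs]

def construction(fs):           # [f1,...,fp] : x = <f1:x,...,fp:x>
    return lambda x: [f(x) for f in fs]

def threaded(fs):               # [f1,...,fp] : <x1,...,xp> = <f1:x1,...,fp:xp>
    return lambda xs: [f(x) for f, x in zip(fs, xs)]

def insert(f):                  # /f : <x1,...,xp> = f:<x1, /f:<x2,...,xp>>
    return lambda xs: reduce(lambda acc, x: f(x, acc), reversed(xs))

def composition(*fs):           # functions listed in application order, f1 first
    return lambda x: reduce(lambda acc, f: f(acc), fs, x)
```

Apply-to-all and construction are the Map-style forms; insert is the Reduce-style form.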
Slide 18
The Kleene - Backus synergy
Slide 19
MapReduce hierarchy programming
Any level in the hierarchy uses the same programming forms: Map & Reduce

(define (Map funcs args)
  (cond ((and (atom? funcs) (atom? args)) ; one func, one arg
         (funcs args))
        ((and (atom? funcs) (list? args)) ; one func, many args
         (if (null? args)
             '()
             (cons (funcs (car args)) (Map funcs (cdr args)))))
        ((and (list? funcs) (atom? args)) ; many funcs, one arg
         (if (null? funcs)
             '()
             (cons ((car funcs) args) (Map (cdr funcs) args))))
        ((and (list? funcs) (list? args)) ; many funcs, many args
         (if (or (null? funcs) (null? args))
             '()
             (cons ((car funcs) (car args)) (Map (cdr funcs) (cdr args)))))))
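For readers less at home in Scheme, the four cases of the Map form (one or many functions applied to one or many arguments) can be transcribed into Python; this is a sketch, not the talk's own code:

```python
def Map(funcs, args):
    """The four cases of the Map form."""
    f_many = isinstance(funcs, list)
    a_many = isinstance(args, list)
    if not f_many and not a_many:               # one func, one arg
        return funcs(args)
    if not f_many and a_many:                   # one func, many args
        return [funcs(a) for a in args]
    if f_many and not a_many:                   # many funcs, one arg
        return [f(args) for f in funcs]
    return [f(a) for f, a in zip(funcs, args)]  # many funcs, many args
```

The "one func, many args" case is data parallelism; "many funcs, one arg" is speculative parallelism; "many funcs, many args" is thread parallelism.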
Slide 20
MapReduce hierarchy programming

(define (Reduce binaryOp argList)
  (cond ((atom? argList) argList)
        ((null? (cdr argList)) (car argList))
        (#t (binaryOp (car argList) (Reduce binaryOp (cdr argList))))))

The 0-level functions in the hierarchy are: Add, Sub, Mult, And, Or, Xor, Inc, Dec, Not, Max, Min, ...
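Combining the two forms, one level of the hierarchy can be exercised end to end; a Python sketch, with Add and Max standing in for the 0-level functions (names are illustrative):

```python
def Reduce(binary_op, arg_list):
    """Right-fold a binary operation over a list; an atom reduces to itself."""
    if not isinstance(arg_list, list):
        return arg_list
    if len(arg_list) == 1:
        return arg_list[0]
    return binary_op(arg_list[0], Reduce(binary_op, arg_list[1:]))

Add = lambda a, b: a + b
Max = lambda a, b: a if a > b else b

# Map then Reduce, e.g. a sum of squares over one level of the hierarchy
total = Reduce(Add, [x * x for x in [1, 2, 3]])
```

Because each level exposes only Map and Reduce, the same two forms compose recursively from a single chip up to the cloud.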
Slide 21
Generic one-chip parallel structure
Slide 22
The ConnexArray(TM): BA1024
Last version, March 2008:
65 nm, 99 mm² (entire chip)
1024 16-bit cells, 1 KB/cell
400 MHz, 400 GOPS
> 120 GOPS/W
> 6.25 GOPS/mm²
The first version, 1111 mm², in 90 nm
Slide 23
Updated version in 28 nm:
2048 32-bit cells with 8 KB/cell
1 MHz
< 15 W, at T