ETTI Colloquia, Nov. 6, 2014
Can Parallel Computing Be Liberated From Ad Hoc Solutions? A Recursive MapReduce Approach and Its Implementation
Gheorghe M. Ștefan
http://arh.pub.ro/gstefan/
Slide 2
"The semiconductor industry threw the equivalent of a Hail Mary pass when it switched from making microprocessors run faster to putting more of them on a chip, doing so without any clear notion of how such devices would in general be programmed."
David Patterson, IEEE Spectrum, July 2010
Slide 3
Outline:
A little history
How parallel computing could be restarted
Kleene's mathematical model
The recursive MapReduce abstract model
Backus' architectural description
Programming the MapReduce hierarchy
Generic one-chip parallel structure
Concluding remarks
Slide 4
History: mono-core computation
1936, mathematical computational models: Turing, Post, Church, Kleene
1944-45, abstract machine models: the Harvard abstract model, the von Neumann abstract model
1953, manufacturing in quantity: IBM 701
1964, computer architecture: the concept allows independent evolution of software and hardware
Consequently, we now have a few stable and successful sequential architectures: x86, ARM, PowerPC, ...
Slide 5
History: parallel computation
1962, manufacturing in quantity: the first MIMD engine is introduced on the computer market by Burroughs
1965, architectural issues: Dijkstra formulates the first concerns about parallel programming
1974-76, abstract machine models: the first abstract models (the PRAM models) start to appear, after almost two decades of non-systematic experiments
?, computation model: it is out there, waiting for us
Consequently, "the semiconductor industry threw the equivalent of a Hail Mary pass when it switched from making microprocessors run faster to putting more of them on a chip"
Slide 6
About PRAM-like models
The Parallel Random Access Machine, PRAM (bit-vector models in [Pratt et al. 1974] and PRAM models in [Fortune and Wyllie 1978]), is considered a natural generalization of the Random Access Machine (RAM) model.
The Parallel Memory Hierarchy [Alpern et al. 1993] is also a generalization, this time of the Memory Hierarchy model applied to the RAM model.
The Bulk Synchronous Parallel (BSP) model divides the program into super-steps [Valiant 1990].
LogP (Latency, overhead, gap, Processors) is designed to model the communication cost [Culler et al. 1991].
Slide 7
How parallel computing could be consistently restarted
1. Use Kleene's partial recursive functions model as the foundational mathematical framework
2. Define an abstract machine model using meaningful forms derived from Kleene's model
3. Interface the abstract machine with an architectural (low-level) description based on Backus' FP Systems
4. Provide the simplest generic parallel structure able to run the functions requested by the architecture
5. Evaluate, using the computational motifs highlighted by Berkeley's View, the options made in the previous three steps, and improve them when needed
Slide 8
Kleene's mathematical model for parallel computation
Of the three rules (composition, primitive recursion, minimalization), only the first one, composition, is independent:
f(x) = g(h_1(x), ..., h_m(x))
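Because the functions h_1, ..., h_m depend only on x, they can be evaluated concurrently and g applied to their results. A minimal Python sketch of this reading of composition (the names compose, g, hs are illustrative, not from the talk):

```python
from concurrent.futures import ThreadPoolExecutor

def compose(g, hs):
    """Kleene composition f(x) = g(h_1(x), ..., h_m(x)).
    The h_i are mutually independent, so they may run in parallel."""
    def f(x):
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(lambda h: h(x), hs))
        return g(*results)
    return f

# Example: f(x) = (x + 1) * (x - 1)
f = compose(lambda a, b: a * b, [lambda x: x + 1, lambda x: x - 1])
```

The parallelism lives entirely in the evaluation of the h_i; g is the sequential synchronization point.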
Slide 9
Integral parallel abstract model: data-parallel
Slide 10
Integral parallel abstract model: reduction-parallel
Slide 11
Integral parallel abstract model: speculative-parallel
Slide 12
Integral parallel abstract model: time-parallel
Slide 13
Integral parallel abstract model: thread-parallel
Slide 14
Putting all the forms together: the integral parallel abstract model
The MapReduce abstract model:
Map covers data, speculative and thread parallelism
Reduce covers reduction parallelism
Slide 15
From one-chip to cloud: the MapReduce recursive abstract model
Slide 16
Backus' architectural description
John Backus: "Can Programming Be Liberated from the von Neumann Style? A Functional Style and Its Algebra of Programs", Communications of the ACM, August 1978.
Functional Programming Systems:
primitive functions
functional forms
definitions
Slide 17
Functional forms
Apply to all: af : x = <f:x_1, ..., f:x_p>, where x = <x_1, ..., x_p>
Construction: [f_1, ..., f_p] : x = <f_1:x, ..., f_p:x>
Threaded construction: [f_1, ..., f_p] : x = <f_1:x_1, ..., f_p:x_p>, where x = <x_1, ..., x_p>
Insert: /f : x = f : <x_1, /f:<x_2, ..., x_p>>, where x = <x_1, ..., x_p> and p >= 2
Composition: (f_q f_q-1 ... f_1) : x = f_q : (f_q-1 : (f_q-2 : (... : (f_1 : x))))
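The functional forms can be transcribed into Python as higher-order functions, one per form; this is a sketch for readability, not the FP notation itself (all names are illustrative):

```python
from functools import reduce

def apply_to_all(f):            # af : <x1,...,xp> = <f:x1,...,f:xp>
    return lambda xs: [f(x) for x in xs]

def construction(fs):           # [f1,...,fp] : x = <f1:x,...,fp:x>
    return lambda x: [f(x) for f in fs]

def threaded(fs):               # [f1,...,fp] : <x1,...,xp> = <f1:x1,...,fp:xp>
    return lambda xs: [f(x) for f, x in zip(fs, xs)]

def insert(f):                  # /f : <x1,...,xp> = f:<x1, /f:<x2,...,xp>>
    return lambda xs: reduce(lambda acc, x: f(x, acc), reversed(xs))

def composition(*fs):           # functions listed in application order, f1 first
    return lambda x: reduce(lambda acc, f: f(acc), fs, x)
```

Apply-to-all and construction are the Map-style forms; insert is the Reduce-style form.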
Slide 18
The Kleene - Backus synergy
Slide 19
MapReduce hierarchy programming
Any level in the hierarchy uses the same programming forms: Map & Reduce

(define (Map funcs args)
  (cond ((and (atom? funcs) (atom? args)) ; one func, one arg
         (funcs args))
        ((and (atom? funcs) (list? args)) ; one func, many args
         (if (null? args)
             '()
             (cons (funcs (car args)) (Map funcs (cdr args)))))
        ((and (list? funcs) (atom? args)) ; many funcs, one arg
         (if (null? funcs)
             '()
             (cons ((car funcs) args) (Map (cdr funcs) args))))
        ((and (list? funcs) (list? args)) ; many funcs, many args
         (if (or (null? funcs) (null? args))
             '()
             (cons ((car funcs) (car args)) (Map (cdr funcs) (cdr args)))))))
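For readers less at home in Scheme, the four cases of the Map form (one or many functions applied to one or many arguments) can be transcribed into Python; this is a sketch, not the talk's own code:

```python
def Map(funcs, args):
    """The four cases of the Map form."""
    f_many = isinstance(funcs, list)
    a_many = isinstance(args, list)
    if not f_many and not a_many:               # one func, one arg
        return funcs(args)
    if not f_many and a_many:                   # one func, many args
        return [funcs(a) for a in args]
    if f_many and not a_many:                   # many funcs, one arg
        return [f(args) for f in funcs]
    return [f(a) for f, a in zip(funcs, args)]  # many funcs, many args
```

The "one func, many args" case is data parallelism; "many funcs, one arg" is speculative parallelism; "many funcs, many args" is thread parallelism.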
Slide 20
MapReduce hierarchy programming

(define (Reduce binaryOp argList)
  (cond ((atom? argList) argList)
        ((null? (cdr argList)) (car argList))
        (#t (binaryOp (car argList) (Reduce binaryOp (cdr argList))))))

The 0-level functions in the hierarchy are: Add, Sub, Mult, And, Or, Xor, Inc, Dec, Not, Max, Min, ...
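Combining the two forms, one level of the hierarchy can be exercised end to end; a Python sketch, with Add and Max standing in for the 0-level functions (names are illustrative):

```python
def Reduce(binary_op, arg_list):
    """Right-fold a binary operation over a list; an atom reduces to itself."""
    if not isinstance(arg_list, list):
        return arg_list
    if len(arg_list) == 1:
        return arg_list[0]
    return binary_op(arg_list[0], Reduce(binary_op, arg_list[1:]))

Add = lambda a, b: a + b
Max = lambda a, b: a if a > b else b

# Map then Reduce, e.g. a sum of squares over one level of the hierarchy
total = Reduce(Add, [x * x for x in [1, 2, 3]])
```

Because each level exposes only Map and Reduce, the same two forms compose recursively from a single chip up to the cloud.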
Slide 21
Generic one-chip parallel structure
Slide 22
The ConnexArray(TM): BA1024
Last version, March 2008:
65 nm, 99 mm² (entire chip)
1024 16-bit cells, 1 KB/cell
400 MHz, 400 GOPS
> 120 GOPS/W
> 6.25 GOPS/mm²
The first version, 1111 mm², in 90 nm
Slide 23
Updated version in 28 nm:
2048 32-bit cells with 8 KB/cell
1 MHz
< 15 W, at T