How to Parallelize an Algorithm

Lecture 2

Today’s Outline
• Quiz
• Functional Programming Review
• Algorithm Parallelization
• Announcements
    – Projects
    – Readings

Quiz
• This one is graded (unlike the last one)

Fold & Map review

Fold:
    foldl f z [] = z
    foldl f z (x:xs) = foldl f (f z x) xs
    [foldr f z (x:xs) = f x (foldr f z xs)]
– Applies a function to every element in the list
– Each iteration can access the result of the previous

Map:
    map f [] = []
    map f (x:xs) = f x : map f xs
– Applies a function to every element in the list
– Each iteration is independent of all others

How would you parallelize Map? Fold?
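The difference between the two can be seen in a small Python sketch. The names `square`, `xs` and the hand-written `foldl` are illustrative assumptions, not part of the lecture; `functools.reduce` is the standard-library left fold.

```python
from functools import reduce

def square(x):
    return x * x

xs = [1, 2, 3, 4]

# map: each call sees only its own element, so the calls are independent.
squares = list(map(square, xs))

# foldl: each step receives the accumulator produced by the previous step.
def foldl(f, z, items):
    acc = z
    for x in items:
        acc = f(acc, x)
    return acc

total = foldl(lambda acc, x: acc + x, 0, squares)

# functools.reduce is a left fold and gives the same result.
same_total = reduce(lambda acc, x: acc + x, squares, 0)
```

The calls to `square` could run in any order; the steps inside `foldl` cannot, because each one needs `acc` from the step before.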

Group Exercises
• See handout

Answers to Group Exercises

Concat:
    concat xss = foldr (++) [] xss
– Given a list of lists, concatenates all sublists

Group:
    group xss = foldl group_func [] xss
    group_func result (k,v) =
        if (has (k,v) result)
        then (map (update (k,v)) result)
        else (k,[v]) :: result
    update (k1,v1) (k2,v_list) =
        if (EQ k1 k2) then (k1, v1::v_list) else (k2,v_list)
– Given a list of key-value pairs, collects the values for each key into one list
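A minimal Python sketch of the two answers, assuming that `group` takes a list of (key, value) pairs and returns a list of (key, value-list) pairs. New values are added to the front of the list, as the `::` in the slide suggests.

```python
from functools import reduce

def concat(xss):
    # Right fold with list concatenation, written as a loop over the reversed input.
    result = []
    for xs in reversed(xss):
        result = xs + result
    return result

def group_func(result, pair):
    k, v = pair
    if any(k2 == k for k2, _ in result):
        # Key already present: add v to the front of that key's value list.
        return [(k2, [v] + vs) if k2 == k else (k2, vs) for k2, vs in result]
    # New key: start a one-element value list at the front of the result.
    return [(k, [v])] + result

def group(pairs):
    # Left fold over the pairs, starting from an empty result.
    return reduce(group_func, pairs, [])

flat = concat([[1, 2], [3], []])
grouped = group([("a", 1), ("b", 2), ("a", 3)])
```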

Why Parallelize?

Reasons to parallelize:
– Your reasons here

What issues are there in parallelization?

Reasons to not parallelize:
– Your reasons here

Implicit Serialization Example

DocInfo:
    f = read_file("file.txt")

    words = count_uniq_words(f)
    spaces = count_spaces(f)

    f = capitalize(f)
    f = remove_punct(f)

    words2 = count_uniq_words(f)

    puts("unique words: " + words)
    puts("num spaces: " + spaces)
    puts("unique scrubbed words: " + words2)

Which statements can be reordered?
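To experiment with reordering, here is a runnable Python sketch of DocInfo. The helper bodies are assumptions (the slides only name the functions): it works on a string instead of a file, `capitalize` is taken to mean upper-casing, and words are split on whitespace.

```python
import string

def count_uniq_words(text):
    return len(set(text.split()))

def count_spaces(text):
    return text.count(" ")

def capitalize(text):
    return text.upper()

def remove_punct(text):
    return text.translate(str.maketrans("", "", string.punctuation))

def doc_info(f):
    words = count_uniq_words(f)   # reads the original f
    spaces = count_spaces(f)      # reads the original f; independent of the line above
    f = capitalize(f)             # rebinds f: the two reads above must already be done
    f = remove_punct(f)           # needs the capitalized f
    words2 = count_uniq_words(f)  # must see the fully scrubbed f
    return words, spaces, words2

result = doc_info("The cat. the Cat!")
```

Swapping the first two statements changes nothing, but moving either of them below `f = capitalize(f)` changes their results, because the name `f` is reused for a different value.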

Data Dependency Graph

[Figure: data dependency graph for DocInfo]
– read_file produces f0
– count_uniq_words reads f0 and produces words
– count_spaces reads f0 and produces spaces
– capitalize reads f0 and produces f1
– remove_punct reads f1 and produces f2
– count_uniq_words reads f2 and produces words2
– the three puts calls are chained through console0, console1 and console2

Which operations can be done in parallel?
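One way to answer the question mechanically is to write the graph down and group operations into stages whose inputs are all ready. The sketch below is an illustration, not the lecture's method: the node names are made up for this example, each version of f is treated as a separate value (as the f0/f1/f2 labels suggest), and the puts calls are assumed to be chained in program order. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# operation -> set of operations whose output it needs
deps = {
    "read_file": set(),
    "count_uniq_words": {"read_file"},
    "count_spaces": {"read_file"},
    "capitalize": {"read_file"},
    "remove_punct": {"capitalize"},
    "count_uniq_words_2": {"remove_punct"},
    "puts_words": {"count_uniq_words"},
    "puts_spaces": {"count_spaces", "puts_words"},
    "puts_words2": {"count_uniq_words_2", "puts_spaces"},
}

def parallel_stages(graph):
    """Return a list of stages; operations within a stage have no mutual dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose inputs are complete
        stages.append(ready)
        sorter.done(*ready)
    return stages

stages = parallel_stages(deps)
```

Under these assumptions the second stage contains `capitalize`, `count_spaces` and `count_uniq_words`, which all need only the output of `read_file`.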

Distributing Computation

[Figure: the DocInfo data dependency graph, with its operations and data placed on Cpu1, Cpu2, Cpu3, Cpu4, Storage and Ram]

Eliminating Dependencies 1 of 3

Synchronization points?
– f[0,1,2]
– console[0,1,2]

[Figure: the DocInfo data dependency graph]

Ideas for removing them:
– Your ideas here


DocInfo 2.0:
    f = read_file("file.txt")
    scrubbed_f = scrub_words(f)

    words2 = count_uniq_words(scrubbed_f)

capitalize and remove_punct can be combined into a single step, scrub_words, and run first to create a scrubbed copy of the data before “counting”.
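A hedged Python sketch of DocInfo 2.0. The helper bodies are assumptions (string input, upper-casing for capitalize, whitespace word splitting), and `ThreadPoolExecutor` stands in for "separate CPUs" only to show that the three counts no longer depend on each other.

```python
import string
from concurrent.futures import ThreadPoolExecutor

def count_uniq_words(text):
    return len(set(text.split()))

def count_spaces(text):
    return text.count(" ")

def scrub_words(text):
    # capitalize + remove_punct combined; returns a new string, leaving text untouched
    return text.upper().translate(str.maketrans("", "", string.punctuation))

def doc_info_2(f):
    scrubbed_f = scrub_words(f)  # a copy: f itself is never rebound
    with ThreadPoolExecutor() as pool:
        # All three counts read immutable inputs, so they can run in any order.
        words = pool.submit(count_uniq_words, f)
        spaces = pool.submit(count_spaces, f)
        words2 = pool.submit(count_uniq_words, scrubbed_f)
        return words.result(), spaces.result(), words2.result()

result = doc_info_2("The cat. the Cat!")
```

Because `f` is no longer overwritten, the synchronization on f0, f1 and f2 reduces to a single dependency on `scrubbed_f`.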

Dependency Graph 2.0

[Figure: data dependency graph for DocInfo 2.0]
– read_file produces f
– count_uniq_words reads f and produces words
– count_spaces reads f and produces spaces
– scrub_words reads f and produces scrubbed_f
– count_uniq_words reads scrubbed_f and produces words2
– the three puts calls are chained through console0, console1 and console2

Distributing Computation 2.0

[Figure: the DocInfo 2.0 dependency graph, with its operations and data placed on Cpu1, Cpu2, Cpu3, Cpu4, Storage and Ram]




DocInfo 3.0:
    words2 = count_uniq_scrubbed_words(f)

capitalize and remove_punct only need to be applied to each word (not the whole file) before “counting”.
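A minimal sketch of what `count_uniq_scrubbed_words` might look like in Python. The helper `scrub_word`, the whitespace tokenization and the upper-casing are assumptions; the slides only give the function name.

```python
import string

_PUNCT = str.maketrans("", "", string.punctuation)

def scrub_word(word):
    # capitalize + remove_punct applied to a single word
    return word.upper().translate(_PUNCT)

def count_uniq_scrubbed_words(text):
    scrubbed = {scrub_word(w) for w in text.split()}
    scrubbed.discard("")  # a token made only of punctuation scrubs to nothing
    return len(scrubbed)

words2 = count_uniq_scrubbed_words("The cat. the Cat!")
```

No scrubbed copy of the whole file is ever built, so `words2` depends only on the original `f`, just like `words` and `spaces`.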

Dependency Graph 3.0



[Figure: data dependency graph for DocInfo 3.0]
– read_file produces f0
– count_uniq_words reads f0 and produces words
– count_spaces reads f0 and produces spaces
– count_uniq_scrubbed_words reads f0 and produces words2
– the three puts calls are chained through console0, console1 and console2

Distributing Computation 3.0

[Figure: the DocInfo 3.0 dependency graph, with its operations and data placed on Cpu1, Cpu2, Cpu3, Cpu4, Storage and Ram]

Parallelization Summary

Why avoid data dependencies?
– lowers complexity
– makes parallelization possible

How do you avoid data dependencies?
– avoid stateful algorithms
– avoid side effects (clone state instead of modifying)
– avoid global variables and member variables

Parallelization tradeoff:
Good: better scalability
Bad: less algorithmic flexibility, higher complexity
Neutral: optimizes for large input over small input
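"Clone state instead of modifying" can be illustrated with two versions of the same update. Both function names are hypothetical and exist only for this sketch.

```python
def add_word_inplace(counts, word):
    # Side effect: mutates the caller's dictionary.
    counts[word] = counts.get(word, 0) + 1
    return counts

def add_word_pure(counts, word):
    # No side effect: copies the state and returns the updated copy.
    updated = dict(counts)
    updated[word] = updated.get(word, 0) + 1
    return updated

base = {"a": 1}
result = add_word_pure(base, "a")  # base is left unchanged
```

Two workers can call `add_word_pure` on the same `base` without coordinating; with `add_word_inplace` they would be updating shared state.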

Parallelizing Map

What’s required to parallelize map?
– function needs to be stateless
– function available to each computation unit
– input accessible by all computation units
– output ordering isn’t important

Definition of map:
    map f [] = []
    map f (x:xs) = f x : map f xs
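As a sketch of these requirements in practice, a parallel map can be written with `concurrent.futures` from the Python standard library. `parallel_map` and `word_length` are illustrative names. A thread pool is used here only to keep the example self-contained; in CPython, CPU-bound pure-Python functions usually need a process pool to see a speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_map(f, xs, workers=4):
    # f must be stateless: calls may run concurrently and in any order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map hands results back in input order,
        # even though the calls themselves may finish out of order.
        return list(pool.map(f, xs))

def word_length(word):
    return len(word)

lengths = parallel_map(word_length, ["map", "fold", "reduce"])
```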

Parallelizing Fold

What’s required to parallelize fold?
You can’t.

Why can’t you parallelize fold?
Each step depends on the result of the previous.

Definition of fold:
    foldl f z [] = z
    foldl f z (x:xs) = foldl f (f z x) xs

How is fold useful in parallel computing then?
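A small Python experiment shows what goes wrong if a fold over an arbitrary function is split naively. The input list, the starting value and the choice of subtraction as f are assumptions made for illustration.

```python
from functools import reduce
from operator import sub

xs = [1, 2, 3, 4]
z = 100

# Sequential left fold: ((((100 - 1) - 2) - 3) - 4)
sequential = reduce(sub, xs, z)

# Naive "parallel" attempt: fold each half on its own, then combine with f.
left = reduce(sub, xs[:2], z)    # (100 - 1) - 2
right = reduce(sub, xs[2:], z)   # (100 - 3) - 4
naive_parallel = sub(left, right)
```

The two results differ, because the second half needed the accumulator produced by the first half rather than a fresh copy of `z`.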

MapReduce

MapReduce:
    mapreduce fm fr l =
        map (reducePerKey fr) (group (map fm l))

    reducePerKey fr (k,v_list) =
        (k, (foldl (fr k) [] v_list))

– Assume map here is actually concatMap.
– Argument l is a list of documents
– The result of the first map is a list of key-value pairs
– The function fr takes 3 arguments: key, context, current.

With currying, this allows for locking the value of “key” for each list during the fold.

MapReduce maps a fold over the sorted result of a map!
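The definition can be tried out with a word count in Python. This is a sketch under assumptions: `fm` and `fr` below are example functions, `group` is implemented by sorting and then grouping on the key, and `functools.partial` plays the role of currying by fixing the key argument of `fr`.

```python
from functools import partial, reduce
from itertools import chain, groupby
from operator import itemgetter

def group(pairs):
    # Sort by key, then collect the values of each run of equal keys.
    ordered = sorted(pairs, key=itemgetter(0))
    return [(k, [v for _, v in kvs])
            for k, kvs in groupby(ordered, key=itemgetter(0))]

def reduce_per_key(fr, item):
    k, v_list = item
    # partial(fr, k) locks the key; the fold then sees (context, current).
    return (k, reduce(partial(fr, k), v_list, []))

def mapreduce(fm, fr, docs):
    pairs = chain.from_iterable(map(fm, docs))  # concatMap
    # Each per-key fold is independent, so this outer map could run in parallel.
    return [reduce_per_key(fr, item) for item in group(pairs)]

def fm(doc):
    # Example mapper: emit (word, 1) for every word in the document.
    return [(w, 1) for w in doc.split()]

def fr(key, context, current):
    # Example reducer: the context is a list holding the running total.
    return [sum(context) + current]

counts = mapreduce(fm, fr, ["a b a", "b c"])
```

Each fold is still sequential internally, but there is one fold per key and the folds do not share state.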
