How to Parallelize an Algorithm
Lecture 2
Today's Outline
• Quiz
• Functional Programming Review
• Algorithm Parallelization
• Announcements
  – Projects
  – Readings
Quiz
• This one is graded (unlike the last one)
Fold & Map review

Fold:
foldl f z [] = z
foldl f z (x:xs) = foldl f (f z x) xs
[foldr f z (x:xs) = f x (foldr f z xs)]
– Applies a function to every element in the list
– Each iteration can access the result of the previous

Map:
map f [] = []
map f (x:xs) = f x : map f xs
– Applies a function to every element in the list
– Each iteration is independent of all others

How would you parallelize Map? Fold?
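A minimal runnable check of the two definitions above (GHC Haskell; `sumList` and `doubleAll` are my names, not the lecture's):

```haskell
-- foldl threads an accumulator through the list left to right:
-- each step sees the result of the previous one.
sumList :: [Int] -> Int
sumList = foldl (+) 0

-- map applies the function to each element independently of the others.
doubleAll :: [Int] -> [Int]
doubleAll = map (* 2)

main :: IO ()
main = do
  print (sumList [1, 2, 3, 4])   -- 10
  print (doubleAll [1, 2, 3, 4]) -- [2,4,6,8]
```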
Group Exercises
• See handout
Answers to Group Exercises

Concat:
concat xss = foldr (++) [] xss
– Given a list of lists, concats all sublists

Group:
group xss = foldl group_func [] xss
group_func result (k,v) =
  if (has k result)
    then (map (update (k,v)) result)
    else (k, [v]) :: result
update (k1,v1) (k2,v_list) =
  if (EQ k1 k2) then (k1, v1::v_list) else (k2,v_list)
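The pseudocode above mixes Haskell and ML styles; a runnable Haskell rendering of the same fold-based grouping (`groupKV` is my name, to avoid the Prelude's `group`):

```haskell
-- Fold (key, value) pairs into an association list of (key, values),
-- mirroring group_func / update above: an existing key gets the new
-- value consed onto its list; a new key is consed onto the result.
groupKV :: Eq k => [(k, v)] -> [(k, [v])]
groupKV = foldl insertKV []
  where
    insertKV result (k, v)
      | any ((== k) . fst) result = map update result
      | otherwise                 = (k, [v]) : result
      where
        update (k', vs)
          | k' == k   = (k', v : vs)
          | otherwise = (k', vs)

main :: IO ()
main = print (groupKV [("a", 1), ("b", 2), ("a", 3)])
-- [("b",[2]),("a",[3,1])]
```

As in the slide's version, both keys and values come out newest-first because each step conses onto the front.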
Why Parallelize?
Reasons to parallelize:
– Your reasons here
What issues are there in parallelization?
Reasons to not parallelize:
– Your reasons here
Implicit Serialization Example

DocInfo:
f = read_file("file.txt")
words = count_uniq_words(f)
spaces = count_spaces(f)
f = capitalize(f)
f = remove_punct(f)
words2 = count_uniq_words(f)
puts("unique words: " + words)
puts("num spaces: " + spaces)
puts("unique scrubbed words: " + words2)

Which statements can be reordered?
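A self-contained Haskell sketch of DocInfo (the file read is replaced by a string literal, and the helper bodies are my assumptions, not the lecture's):

```haskell
import Data.Char (isPunctuation, toUpper)
import Data.List (nub)

countUniqWords :: String -> Int
countUniqWords = length . nub . words

countSpaces :: String -> Int
countSpaces = length . filter (== ' ')

capitalize :: String -> String
capitalize = map toUpper

removePunct :: String -> String
removePunct = filter (not . isPunctuation)

main :: IO ()
main = do
  let f   = "Hello, world. Hello!"      -- stands in for read_file
      ws  = countUniqWords f            -- 3: "Hello,", "world.", "Hello!"
      sp  = countSpaces f               -- 2
      f'  = removePunct (capitalize f)  -- "HELLO WORLD HELLO"
      ws2 = countUniqWords f'           -- 2: scrubbing merges the variants
  putStrLn ("unique words: " ++ show ws)
  putStrLn ("num spaces: " ++ show sp)
  putStrLn ("unique scrubbed words: " ++ show ws2)
```

Note that where the imperative version reassigns `f`, the pure version is forced to introduce a new name `f'`, which makes the data dependency between the statements explicit.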
Data Dependency Graph

f = read_file("file.txt")
words = count_uniq_words(f)
spaces = count_spaces(f)
f = capitalize(f)
f = remove_punct(f)
words2 = count_uniq_words(f)
puts("unique words: " + words)
puts("num spaces: " + spaces)
puts("unique scrubbed words: " + words2)

[Figure: dependency graph — read_file → f0; count_uniq_words(f0) → words; count_spaces(f0) → spaces; capitalize(f0) → f1; remove_punct(f1) → f2; count_uniq_words(f2) → words2; each puts → console0/console1/console2.]

Which operations can be done in parallel?
Distributing Computation

[Figure: the same dependency graph mapped onto hardware — Cpu1–Cpu4, Ram, and Storage — showing which operations could run on which computation unit.]
Eliminating Dependencies 1 of 3

Synchronization points?
– f[0,1,2]
– console[0,1,2]

[Figure: the dependency graph with these synchronization points highlighted.]

Ideas for removing them:
– Your ideas here
Eliminating Dependencies 2 of 3

DocInfo 2.0:
f = read_file("file.txt")
scrubbed_f = scrub_words(f)
words = count_uniq_words(f)
spaces = count_spaces(f)
words2 = count_uniq_words(scrubbed_f)
puts("unique words: " + words)
puts("num spaces: " + spaces)
puts("unique scrubbed words: " + words2)

capitalize and remove_punct can be combined and run first to create a copy of the data before "counting".
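The combined pass might look like this (scrub_words is the slide's name; the body is my assumption, reusing the capitalize/remove_punct semantics):

```haskell
import Data.Char (isPunctuation, toUpper)

-- One pass that both capitalizes and strips punctuation, producing a
-- scrubbed copy instead of overwriting f — so the counts on the
-- original f no longer have to wait for (or guard against) mutation.
scrubWords :: String -> String
scrubWords = filter (not . isPunctuation) . map toUpper

main :: IO ()
main = putStrLn (scrubWords "Hello, world!")  -- HELLO WORLD
```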
Dependency Graph 2.0

f = read_file("file.txt")
scrubbed_f = scrub_words(f)
words = count_uniq_words(f)
spaces = count_spaces(f)
words2 = count_uniq_words(scrubbed_f)
puts("unique words: " + words)
puts("num spaces: " + spaces)
puts("unique scrubbed words: " + words2)

[Figure: dependency graph 2.0 — read_file → f; scrub_words(f) → scrubbed_f; count_uniq_words(f), count_spaces(f), and count_uniq_words(scrubbed_f) now proceed independently; each puts → console0/console1/console2.]
Distributing Computation 2.0

[Figure: dependency graph 2.0 mapped onto Cpu1–Cpu4, Ram, and Storage.]
Eliminating Dependencies 3 of 3

DocInfo 3.0:
f = read_file("file.txt")
words = count_uniq_words(f)
spaces = count_spaces(f)
words2 = count_uniq_scrubbed_words(f)
puts("unique words: " + words)
puts("num spaces: " + spaces)
puts("unique scrubbed words: " + words2)

capitalize and remove_punct only need to be applied to each word (not the whole file) before "counting".
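A sketch of count_uniq_scrubbed_words along these lines (the body is my assumption): scrubbing happens per word inside the count, so no separate whole-file scrubbing pass — and no second copy of f — is needed.

```haskell
import Data.Char (isPunctuation, toUpper)
import Data.List (nub)

-- Scrub each word individually while counting; only f itself
-- remains as a shared (read-only) input.
countUniqScrubbedWords :: String -> Int
countUniqScrubbedWords = length . nub . map scrub . words
  where scrub = filter (not . isPunctuation) . map toUpper

main :: IO ()
main = print (countUniqScrubbedWords "Hello, world. Hello!")  -- 2
```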
Dependency Graph 3.0

f = read_file("file.txt")
words = count_uniq_words(f)
spaces = count_spaces(f)
words2 = count_uniq_scrubbed_words(f)
puts("unique words: " + words)
puts("num spaces: " + spaces)
puts("unique scrubbed words: " + words2)

[Figure: dependency graph 3.0 — read_file → f0; count_uniq_words, count_spaces, and count_uniq_scrubbed_words all depend only on f0; each puts → console0/console1/console2.]
Distributing Computation 3.0

[Figure: dependency graph 3.0 mapped onto Cpu1–Cpu4, Ram, and Storage.]
Parallelization Summary

Why avoid data dependencies?
– lowers complexity
– makes parallelization possible

How do you avoid data dependencies?
– avoid stateful algorithms
– avoid side effects (clone state instead of modifying)
– avoid global variables and member variables

Parallelization tradeoff:
– Good: better scalability
– Bad: less algorithmic flexibility, higher complexity
– Neutral: optimizes for large input over small input
Parallelizing Map

What's required to parallelize map?
– function needs to be stateless
– function available to each computation unit
– input accessible by all computation units
– output ordering isn't important

Definition of map:
map f [] = []
map f (x:xs) = f x : map f xs
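Because each application of f is independent, map decomposes into chunks that separate computation units could evaluate. A sequential sketch of that decomposition (names and chunk size are mine; actual parallel evaluation would need e.g. threads or the `parallel` package):

```haskell
-- Split a list into chunks of at most n elements.
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = take n xs : chunksOf n (drop n xs)

-- Each inner `map f` could run on its own computation unit;
-- concatenating in chunk order preserves the output ordering.
parMapSketch :: Int -> (a -> b) -> [a] -> [b]
parMapSketch n f = concat . map (map f) . chunksOf n

main :: IO ()
main = print (parMapSketch 2 (* 10) [1 .. 6 :: Int])  -- [10,20,30,40,50,60]
```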
Parallelizing Fold

What's required to parallelize fold?
You can't.

Why can't you parallelize fold?
Each step depends on the result of the previous.

Definition of fold:
foldl f z [] = z
foldl f z (x:xs) = foldl f (f z x) xs
How is fold useful in parallel computing then?
MapReduce

MapReduce:
mapreduce fm fr l =
  map (reducePerKey fr) (group (map fm l))
reducePerKey fr (k, v_list) =
  (k, (foldl (fr k) [] v_list))

– Assume map here is actually concatMap.
– Argument l is a list of documents.
– The result of the first map is a list of key-value pairs.
– The function fr takes 3 arguments: key, context, current. With currying, this allows locking the value of "key" for each list during the fold.

MapReduce maps a fold over the sorted result of a map!
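A self-contained rendering of the skeleton above, applied to word count (my assumptions: group is implemented by sort-then-cluster, and the reducer's initial value, which the slide fixes to [], is taken as a parameter z):

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Sort-then-cluster grouping: collect all values for each key.
groupKV :: Ord k => [(k, v)] -> [(k, [v])]
groupKV kvs =
  [ (fst (head g), map snd g)
  | g <- groupBy ((==) `on` fst) (sortOn fst kvs) ]

-- The slide's skeleton, with concatMap made explicit: fm emits
-- key-value pairs per document; fr k is the per-key fold function,
-- with k locked in by currying.
mapreduce :: Ord k => (a -> [(k, v)]) -> (k -> r -> v -> r) -> r -> [a] -> [(k, r)]
mapreduce fm fr z l = map reducePerKey (groupKV (concatMap fm l))
  where reducePerKey (k, vs) = (k, foldl (fr k) z vs)

-- Word count: fm emits (word, 1) for every word; fr sums per key.
wordCount :: [String] -> [(String, Int)]
wordCount = mapreduce (\doc -> [ (w, 1) | w <- words doc ]) (\_k acc v -> acc + v) 0

main :: IO ()
main = print (wordCount ["a b a", "b c"])  -- [("a",2),("b",2),("c",1)]
```

The per-key folds are independent of one another, which is exactly why the fold becomes useful again once the map-and-group phase has partitioned the data.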