Upload
silvia-moore
View
229
Download
1
Tags:
Embed Size (px)
Citation preview
Data Parallelism
Companion slides forThe Art of Multiprocessor Programming
by Maurice Herlihy & Nir Shavit
Ever Wonder …
Art of Multiprocessor Programming 2
When did the term multicore become popular?
A multi-core processor is a single computing component with two or more independent actual central processing units, which are the units that read and execute program instructions.)
wikipedia
Let’s Ask Google Ngram!
Art of Multiprocessor Programming 3
usage of multicore in books by publication year
Let’s Ask Google Ngram!
Art of Multiprocessor Programming 4
This part since 2000 is obvious …
Let’s Ask Google Ngram!
Art of Multiprocessor Programming 5
???
Let’s Ask Google Ngram!
Art of Multiprocessor Programming 6
multicore cablemulticore fiber
…
but we digress …
WordCount
Art of Multiprocessor Programming 7
alpha 8
easy to do sequentially …what about in parallel?
bravo 3charlie 9
…
zulu 1
MapReduce
Art of Multiprocessor Programming 8
…
chapter 1 chapter 2 chapter k
split text among mapping threads
Map Phase
Art of Multiprocessor Programming 9
…
chapter 1 chapter 2 chapter k
must count words!
must count words!
must count words!
a mapping thread per chapter
Map Phase
Art of Multiprocessor Programming 10
chapter 1
alpha 9juliet 2,alpha 1tango 4
…each mapper thread produces a stream …of key-value pairs …
key: wordvalue: local count
Art of Multiprocessor Programming 11
Mapper Class
abstract class Mapper<IN, K, V> extends RecursiveTask<Map<K, V>> { IN input; public void setInput(IN anInput) { input = anInput; }}
Art of Multiprocessor Programming 12
Mapper Class
abstract class Mapper<IN, K, V> extends RecursiveTask<Map<K, V>> { IN input; public void setInput(IN anInput) { input = anInput; }}
input: document fragment
Art of Multiprocessor Programming 13
Mapper Class
abstract class Mapper<IN, K, V> extends RecursiveTask<Map<K, V>> { IN input; public void setInput(IN anInput) { input = anInput; }}
key: individual word
Art of Multiprocessor Programming 14
Mapper Class
abstract class Mapper<IN, K, V> extends RecursiveTask<Map<K, V>> { IN input; public void setInput(IN anInput) { input = anInput; }}
value: local count
Art of Multiprocessor Programming 15
Mapper Class
abstract class Mapper<IN, K, V> extends RecursiveTask<Map<K, V>> { IN input; public void setInput(IN anInput) { input = anInput; }}
a task that runs in parallel with other tasks
Art of Multiprocessor Programming 16
Mapper Class
abstract class Mapper<IN, K, V> extends RecursiveTask<Map<K, V>> { IN input; public void setInput(IN anInput) { input = anInput; }}
produces a map: word count
Art of Multiprocessor Programming 17
Mapper Class
abstract class Mapper<IN, K, V> extends RecursiveTask<Map<K, V>> { IN input; public void setInput(IN anInput) { input = anInput; }}
initialize input: which document fragment?
Art of Multiprocessor Programming 18
WordCount Mapperclass WordCountMapper extends mapreduce.Mapper< List<String>, String, Long > { … }
Art of Multiprocessor Programming 19
WordCount Mapperclass WordCountMapper extends mapreduce.Mapper< List<String>, String, Long > { … }document fragment is list of words
Art of Multiprocessor Programming 20
WordCount Mapperclass WordCountMapper extends mapreduce.Mapper< List<String>, String, Long > { … }document fragment is list of words
map each word …
Art of Multiprocessor Programming 21
WordCount Mapperclass WordCountMapper extends mapreduce.Mapper< List<String>, String, Long > { … }document fragment is list of words
map each word …to its count in the fragment
Art of Multiprocessor Programming 22
WordCount MapperMap<String,Long> compute() { Map<String,Long> map = new HashMap<>(); for (String word : input) { map.merge(word, 1L, (x, y) -> x + y); } return map; } }
Art of Multiprocessor Programming 23
WordCount MapperMap<String,Long> compute() { Map<String,Long> map = new HashMap<>(); for (String word : input) { map.merge(word, 1L, (x, y) -> x + y); } return map; } }
the compute() method constructs the local word count
Art of Multiprocessor Programming 24
WordCount MapperMap<String,Long> compute() { Map<String,Long> map = new HashMap<>(); for (String word : input) { map.merge(word, 1L, (x, y) -> x + y); } return map; } }
create a map to hold the output
Art of Multiprocessor Programming 25
WordCount MapperMap<String,Long> compute() { Map<String,Long> map = new HashMap<>(); for (String word : input) { map.merge(word, 1L, (x, y) -> x + y); } return map; } }
examine each word in the document fragment
Art of Multiprocessor Programming 26
WordCount MapperMap<String,Long> compute() { Map<String,Long> map = new HashMap<>(); for (String word : input) { map.merge(word, 1L, (x, y) -> x + y); } return map; } } increment that word’s
count in the map
Art of Multiprocessor Programming 27
WordCount MapperMap<String,Long> compute() { Map<String,Long> map = new HashMap<>(); for (String word : input) { map.merge(word, 1L, (x, y) -> x + y); } return map; } }
when the local count is complete, return the map
Reduce Phase
Art of Multiprocessor Programming 28
…
alpha 4bravo 2
…zulu 1
a reducer thread merges mapper outputs
alpha 2juliet 1tango 1
…
alpha 1foxtrot 1papa 1tango 1
…
alpha 1oscar 1,bravo
2…
alpha 3bravo 2
…zulu 1
Reduce Phase
Art of Multiprocessor Programming 29
the reducer task produces a stream …of key-value pairs …
key: word value: word count
Art of Multiprocessor Programming 30
Reducer Classabstract class Reducer<K, V, OUT> extends RecursiveTask<OUT> { K key; List<V> valueList; public void setInput( K aKey, List<V> aList) { key = aKey; valueList = aList; }}
Art of Multiprocessor Programming 31
Reducer Classabstract class Reducer<K, V, OUT> extends RecursiveTask<OUT> { K key; List<V> valueList; public void setInput( K aKey, List<V> aList) { key = aKey; valueList = aList; }}
each reducer is givena single key (word)
Art of Multiprocessor Programming 32
Reducer Classabstract class Reducer<K, V, OUT> extends RecursiveTask<OUT> { K key; List<V> valueList; public void setInput( K aKey, List<V> aList) { key = aKey; valueList = aList; }}
and a list of associated values (word count per fragment)
Art of Multiprocessor Programming 33
Reducer Classabstract class Reducer<K, V, OUT> extends RecursiveTask<OUT> { K key; List<V> valueList; public void setInput( K aKey, List<V> aList) { key = aKey; valueList = aList; }}
It produces a single summary value(the total count for that word)
WordCount
Art of Multiprocessor Programming 34
0.037
normalizing document wordcountgives a fingerprint vector
0.002
0.045
…
0.000
Document Fingerprint
Art of Multiprocessor Programming 35
a fingerprint is a point in a high-dimensional space
Clustering
Art of Multiprocessor Programming 36
romance novels
tango lyrics
Usenix Procedings
k-means
Art of Multiprocessor Programming 37
Find k clusters from raw data
k-means
Art of Multiprocessor Programming 38
Find k clusters from raw data
each vector closer to those in same cluster …
than in different clusters.
MapReduce
Art of Multiprocessor Programming 39
…
thread 1 thread 2 thread k
split points among mapping threads
k-means
Art of Multiprocessor Programming 40
Reducer picks k “centers” at random
Reduce Phase
Art of Multiprocessor Programming 41
0 c0
1 c1
2 c2
…
thread 1 thread 2 thread k
reducer sends key-value pair to mappers
key: cluster number
value: center point
Mappers
Art of Multiprocessor Programming 42
Each mapper uses centers to assign each vector to a cluster
0 c0
1 c1
2 c2
Mappers
Art of Multiprocessor Programming 43
Each mapper uses centers to assign each vector to a cluster
0 c0
1 c1
2 c2
Mappers
Art of Multiprocessor Programming 44
p0 2p1 1p2 1p3 0
mapper sends key-value stream to reducerkey: point
value: cluster ID
C0 = {…}C1 = {…}C2 = {…}
Back at the Reducer
Art of Multiprocessor Programming 45
The reducer merges the streams ….and assembles clusters …
Back at the Reducer
Art of Multiprocessor Programming 46
The reducer computes new centersbased on new clusters …
0 c0’1 c1’2 c2’
Once is Not Enough
Art of Multiprocessor Programming 47
…
thread 1 thread 2 thread k
reducer sends new centers to mappers
process ends when centers become stable
0 c0’1 c1’2 c2’
To Recaptulate
Art of Multiprocessor Programming 48
We saw two problems …
wordcount & k-means …
with similar parallel solutions
Map part is parallel …
Reduce part is sequential.
Art of Multiprocessor Programming 49
abstraction
Map Function
Art of Multiprocessor Programming 50
(k1, v1) list(k2, v2)
doc, contents word, count
cluster ID, center point, cluster ID
Reduce Function
Art of Multiprocessor Programming 51
(k2, list(v2)) list(v2)
word, list of counts count
cluster ID, list of points new cluster center
Example
Art of Multiprocessor Programming 52
Distributed Grep
Map:
line of document
Reduce:
copy line to display
Example
Art of Multiprocessor Programming 53
URL Access Frequency
Map:
(URL, local count)
Reduce:
(URL, total count)
Example
Art of Multiprocessor Programming 54
Reverse web link graph
Map:
(target link, source page)
Reduce:
(target link, list of source pages)
Other Examples
Art of Multiprocessor Programming 55
histogram
matrix multiplication
PageRank
Betweenness centrality
Art of Multiprocessor Programming 56
Distributed MapReduce
Google, Hadoop, etc…
Communication by message
Fault-tolerance important
Art of Multiprocessor Programming 57
Multicore MapReduce
Phoenix, Phoenix++, Metis …
Communication by shared memory objects
Fault-tolerance unimportant
Costs
Art of Multiprocessor Programming 58
key-value layout
memory allocation
mechanism overhead
cache pressure
static vs dynamic
Part Two
Data Streams59
Streams
Art of Multiprocessor Programming 60
sequence of transformations
transformation
transformation
data
source
data
data
consumer
no relation toI/O streams
sometimes in parallel
Streams
Art of Multiprocessor Programming 61
transformation
transformation
data
source
data
data
consumer
transformations given bymathematical functions
Streams
Art of Multiprocessor Programming 62
transformation
transformation
data
source
data
data
consumer
transformation createsnew stream
no modifications or side-effects
correctness easier?
Functional Programming
Art of Multiprocessor Programming 63
functions map old state to new state
old state never changed
no complex side-effects
elegant, easier proofs of correctness
Oh, Really?
Art of Multiprocessor Programming 64
“Functional languages are unnatural to use; but so are knives and forks, diplomatic protocols, double-entry bookkeeping, and a host of other things modern civilization has found useful.”
Jim Morris, 1982
Haiku
Art of Multiprocessor Programming 65
esthetically pleasing
only works on those who understand Haiku
Karate
Art of Multiprocessor Programming 66
esthetically pleasing
works even on those who do not understand Karate
Jim Morris’s Question
Art of Multiprocessor Programming 67
Is functional programming more like Haiku or Karate?
1981: Haiku Today: Karate
Laziness
Art of Multiprocessor Programming 68
x x+1
1,2,3,…
x 2x
sum
No computation until absolutely necessary
add 1 to each element: no computation
double each element: no computation
2(xi+1) is terminal
Laziness
Art of Multiprocessor Programming 69
x x+1
1,2,3,…
x 2x
collect in Listmove to container is terminal
Laziness
Art of Multiprocessor Programming 70
x x+1
1,2,3,…
x 2x
collect in ListLaziness permits optimizations
x 2(x+1)
Laziness
Art of Multiprocessor Programming 71
1 1 2 3 5 8 12 20 32 …
Laziness permits infinite streams …
Stream<Integer> fib = new FibStream();
Art of Multiprocessor Programming 72
Unbounded Random Stream
Stream<Double> randomDoubleStream() { return Stream.generate( () -> random.nextDouble() );}
Art of Multiprocessor Programming 73
Unbounded Random Stream
Stream<Double> randomDoubleStream() { return Stream.generate( () -> random.nextDouble() );}Unbounded stream of double-precision random numbers
Art of Multiprocessor Programming 74
Random Stream
Stream<Double> randomDoubleStream() { return Stream.generate( () -> random.nextDouble() );}
Stream that generates new elements on the fly
Art of Multiprocessor Programming 75
Random Stream
Stream<Double> randomDoubleStream() { return Stream.generate( () -> random.nextDouble() );}
function to call when generating new element
example of Java lambda expression (anon method)
Art of Multiprocessor Programming 76
WordCountList<String> readFile(String fileName) { … return reader .lines() .map(String::toLowerCase) .flatMap(s -> pattern.splitAsStream(s)) .collect(Collectors.toList());}
Art of Multiprocessor Programming 77
WordCountList<String> readFile(String fileName) { … return reader .lines() .map(String::toLowerCase) .flatMap(s -> pattern.splitAsStream(s)) .collect(Collectors.toList());}
puts each word from the document into a List
Art of Multiprocessor Programming 78
WordCountList<String> readFile(String fileName) { … return reader .lines() .map(String::toLowerCase) .flatMap(s -> pattern.splitAsStream(s)) .collect(Collectors.toList());}
open the file, create a FileReader
Art of Multiprocessor Programming 79
WordCountList<String> readFile(String fileName) { … return reader .lines() .map(String::toLowerCase) .flatMap(s -> pattern.splitAsStream(s)) .collect(Collectors.toList());}
how a stream program looks
each line creates a new stream
Art of Multiprocessor Programming 80
WordCountList<String> readFile(String fileName) { … return reader .lines() .map(String::toLowerCase) .flatMap(s -> pattern.splitAsStream(s)) .collect(Collectors.toList());}
turn the FileReader into a stream of lines, each line a string
Art of Multiprocessor Programming 81
WordCountList<String> readFile(String fileName) { … return reader .lines() .map(String::toLowerCase) .flatMap(s -> pattern.splitAsStream(s)) .collect(Collectors.toList());}
map creates a new stream by applying a function to each stream element
here, converts to lower case
Art of Multiprocessor Programming 82
WordCountList<String> readFile(String fileName) { … return reader .lines() .map(String::toLowerCase) .flatMap(s -> pattern.splitAsStream(s)) .collect(Collectors.toList());}
flatMap replaces one stream element with multiple stream elements
here, splits line into words
Art of Multiprocessor Programming 83
WordCountList<String> readFile(String fileName) { … return reader .lines() .map(String::toLowerCase) .flatMap(s -> pattern.splitAsStream(s)) .collect(Collectors.toList());}
collect puts stream elements in a container
here, in a List
terminal operation
Art of Multiprocessor Programming 84
WordCountList<String> readFile(String fileName) { … return reader .lines() .map(String::toLowerCase) .flatMap(s -> pattern.splitAsStream(s)) .collect(Collectors.toList());}
No loops
No conditionals
No mutable objects
Art of Multiprocessor Programming 85
WordCountMap<String,Long> map = text .stream() .collect( Collectors.groupingBy( Function.identity(), Collectors.counting()));
now let’s count the words
Art of Multiprocessor Programming 86
WordCountMap<String,Long> map = text .stream() .collect( Collectors.groupingBy( Function.identity(), Collectors.counting()));
start with list of words
Art of Multiprocessor Programming 87
WordCountMap<String,Long> map = text .stream() .collect( Collectors.groupingBy( Function.identity(), Collectors.counting()));
turn List into a stream
Art of Multiprocessor Programming 88
WordCountMap<String,Long> map = text .stream() .collect( Collectors.groupingBy( Function.identity(), Collectors.counting()));
put stream into a container (Map)word count
Art of Multiprocessor Programming 89
WordCountMap<String,Long> map = text .stream() .collect( Collectors.groupingBy( Function.identity(), Collectors.counting()));
each element’s key is that element
Art of Multiprocessor Programming 90
WordCountMap<String,Long> map = text .stream() .collect( Collectors.groupingBy( Function.identity(), Collectors.counting()));
each element’s value is the number of times it appears
Art of Multiprocessor Programming 91
WordCountMap<String,Long> map = text .stream() .collect( Collectors.groupingBy( Function.identity(), Collectors.counting()));
No loops
No conditionals
No mutable objects
k-Means
Art of Multiprocessor Programming 92
class Point { Point(double x, double y) {…} Point plus(Point other) {…} Point scale(double x) {…} static Point barycenter( List<Point> cluster ) {…}}
k-Means
Art of Multiprocessor Programming 93
class Point { Point(double x, double y) {…} Point plus(Point other) {…} Point scale(double x) {…} static Point barycenter( List<Point> cluster ) {…}}
stream-based!
BaryCenter
Art of Multiprocessor Programming 94
Stream Barycenter
Art of Multiprocessor Programming 95
Point barycenter(List<Point> cluster){ double numPoints = cluster.size(); Optional<Point> sum = cluster .stream() .reduce(Point::plus); return sum.get() .scale(1 / numPoints); }
Stream Barycenter
Art of Multiprocessor Programming 96
Point barycenter(List<Point> cluster){ double numPoints = cluster.size(); Optional<Point> sum = cluster .stream() .reduce(Point::plus); return sum.get() .scale(1 / numPoints); }
cluster size
Stream Barycenter
Art of Multiprocessor Programming 97
Point barycenter(List<Point> cluster){ double numPoints = cluster.size(); Optional<Point> sum = cluster .stream() .reduce(Point::plus); return sum.get() .scale(1 / numPoints); }
turn List into Stream
Reduce
Art of Multiprocessor Programming 98
stream.reduce(+):
() ;
(a) a
(a,b) a+b
(a,b,c) (a+b)+c
etc. …terminal operation
k-Means
Art of Multiprocessor Programming 99
Point barycenter(List<Point> cluster){ double numPoints = cluster.size(); Optional<Point> sum = cluster .stream() .reduce(Point::plus); return sum.get() .scale(1 / numPoints); } sum points in cluster
Optional because sum might be empty!
k-Means
Art of Multiprocessor Programming 100
Point barycenter(List<Point> cluster){ double numPoints = cluster.size(); Optional<Point> sum = cluster .stream() .reduce(Point::plus); return sum.get() .scale(1 / numPoints); }
extract sum and divide by # points
k-Means
Art of Multiprocessor Programming 101
List<Point> points = readFile("cluster.dat");centers = randomDistinctCenters(points);double convergence = 1.0; while (convergence > EPSILON) { … }
k-Means
Art of Multiprocessor Programming 102
List<Point> points = readFile("cluster.dat");centers = randomDistinctCenters(points);double convergence = 1.0; while (convergence > EPSILON) { … }
read points from file
k-Means
Art of Multiprocessor Programming 103
List<Point> points = readFile("cluster.dat");centers = randomDistinctCenters(points);double convergence = 1.0; while (convergence > EPSILON) { … }
pick random centers
k-Means
Art of Multiprocessor Programming 104
List<Point> points = readFile("cluster.dat");centers = randomDistinctCenters(points);double convergence = 1.0; while (convergence > EPSILON) { … }
keep going until centers are stable
Compute New Clusters
Art of Multiprocessor Programming 105
while (convergence > EPSILON) { Map<Integer,List<Point>> clusters = points .stream() .collect( Collectors.groupingBy( p -> closestCenter(centers, p) ) );}
turn list of points into a Stream
Compute New Clusters
Art of Multiprocessor Programming 106
while (convergence > EPSILON) { Map<Integer,List<Point>> clusters = points .stream() .collect( Collectors.groupingBy( p -> closestCenter(centers, p) ) );}
put each point in a mapkey is closest centervalue is list of points with that center
k-Means Con’t
Art of Multiprocessor Programming 107
while (convergence > EPSILON) { ... Map<Integer, Point> newCenters = clusters .entrySet() .stream() .collect( Collectors.toMap( e -> e.getKey(), e -> Point.barycenter(e.getValue()) ) ); convergence = distance(centers, newCenters); centers = newCenters;}
compute new centers map: ID center
k-Means Con’t
Art of Multiprocessor Programming 108
while (convergence > EPSILON) { ... Map<Integer, Point> newCenters = clusters .entrySet() .stream() .collect( Collectors.toMap( e -> e.getKey(), e -> Point.barycenter(e.getValue()) ) ); convergence = distance(centers, newCenters); centers = newCenters;}
turn map into a Stream of (ID, point) pairs
k-Means Con’t
Art of Multiprocessor Programming 109
while (convergence > EPSILON) { ... Map<Integer, Point> newCenters = clusters .entrySet() .stream() .collect( Collectors.toMap( e -> e.getKey(), e -> Point.barycenter(e.getValue()) ) ); convergence = distance(centers, newCenters); centers = newCenters;}
turn stream into a mapID barycenter
k-Means Con’t
Art of Multiprocessor Programming 110
while (convergence > EPSILON) { ... Map<Integer, Point> newCenters = clusters .entrySet() .stream() .collect( Collectors.toMap( e -> e.getKey(), e -> Point.barycenter(e.getValue()) ) ); convergence = distance(centers, newCenters); centers = newCenters;}
new Key is still the cluster ID
k-Means Con’t
Art of Multiprocessor Programming 111
while (convergence > EPSILON) { ... Map<Integer, Point> newCenters = clusters .entrySet() .stream() .collect( Collectors.toMap( e -> e.getKey(), e -> Point.barycenter(e.getValue()) ) ); convergence = distance(centers, newCenters); centers = newCenters;}
new value is the barycenter computed earlier
k-Means Con’t
Art of Multiprocessor Programming 112
while (convergence > EPSILON) { ... Map<Integer, Point> newCenters = clusters .entrySet() .stream() .collect( Collectors.toMap( e -> e.getKey(), e -> Point.barycenter(e.getValue()) ) ); convergence = distance(centers, newCenters); centers = newCenters;}
If centers have moved, start again with the new centers
Functional k-Means
Art of Multiprocessor Programming 113
many fewer lines of code
easier to read (really!)
easier to reason
easier to optimize
Art of Multiprocessor Programming 114
Parallelism?Arrays.asList("Arlington", "Berkeley", "Clarendon", "Dartmouth", "Exeter") .stream() .forEach(s -> printf("%s\n", s));
So far, Streams are sequential
Art of Multiprocessor Programming 115
Parallelism?Arrays.asList("Arlington", "Berkeley", "Clarendon", "Dartmouth", "Exeter") .stream() .forEach(s -> printf("%s\n", s));
make List of strings and turn them into a stream
Art of Multiprocessor Programming 116
Parallelism?Arrays.asList("Arlington", "Berkeley", "Clarendon", "Dartmouth", "Exeter") .stream() .forEach(s -> printf("%s\n", s));
forEach applies a method to each element (not functional)
Art of Multiprocessor Programming 117
Parallelism?Arrays.asList("Arlington", "Berkeley", "Clarendon", "Dartmouth", "Exeter") .stream() .forEach(s -> printf("%s\n", s));
ArlingtonBerkeleyClarendonDartmouthExeter
prints
Art of Multiprocessor Programming 118
Parallelism?Arrays.asList("Arlington", "Berkeley", "Clarendon", "Dartmouth", "Exeter") .parallelStream() .forEach(s -> printf("%s\n", s));
turn List into a parallel stream
Art of Multiprocessor Programming 119
Parallelism?Arrays.asList("Arlington", "Berkeley", "Clarendon", "Dartmouth", "Exeter") .parallelStream() .forEach(s -> printf("%s\n", s));
ArlingtonBerkeleyClarendonDartmouthExeter
prints
Dartmouth ArlingtonClarendonBerkeleyExeter
Dartmouth BerkeleyExeterArlingtonClarendon
Art of Multiprocessor Programming 120
Parallelism?Arrays.asList("Arlington", "Berkeley", "Clarendon", "Dartmouth", "Exeter") .stream() .parallel() .forEach(s -> printf("%s\n", s));
can turn stream into a parallel stream
Art of Multiprocessor Programming 121
Pitfallslist.stream().forEach( s -> list.add(0));
Art of Multiprocessor Programming 122
Pitfallslist.stream().forEach( s -> list.add(0));
lambda (function) must not modify source!
Art of Multiprocessor Programming 123
Pitfallssource.parallelStream() .forEach( s -> target.add(s));
exception if target not thread-safeorder added is non-deterministic
Conclusions
Art of Multiprocessor Programming 124
Streams support
functional programming
data parallelism
compiler optimizations