Problem-solving on large-scale clusters: theory and applications Lecture 1: Introduction and Theoretical Background

Page 1: Problem-solving on large-scale clusters:   theory and applications

Problem-solving on large-scale clusters: theory and applications
Lecture 1: Introduction and Theoretical Background

Page 2: Problem-solving on large-scale clusters:   theory and applications

Today’s Outline
• Introductions
• Quiz
• Course Objective & Administrative Info
• fold and map: Theory

Page 3: Problem-solving on large-scale clusters:   theory and applications

Introductions
• Name + trivia

Page 4: Problem-solving on large-scale clusters:   theory and applications

Quiz Time!
• Not graded; helps us calibrate how difficult to make this seminar
• Okay (and encouraged!) to leave questions blank

Page 5: Problem-solving on large-scale clusters:   theory and applications

Course Outline
• Introduction to parallel programming and distributed system design
  – successfully decompose problems into map and reduce stages
  – decide whether a problem can be solved with a parallel algorithm, and evaluate its strengths and weaknesses
  – understand the basic tradeoffs and major issues in distributed system design
  – know the common pitfalls of distributed system design
• This seminar is light on “facts” and “recipes”, heavy on “tradeoffs”

Page 6: Problem-solving on large-scale clusters:   theory and applications

Course Information (1 of 2)
• Lecturers:
  – Albert J. Wong
  – Hannah Tang
• Lab consultant:
  – Alden King
• Liaisons:
  – John Zahorjan
  – Christophe Bisciglia

Page 7: Problem-solving on large-scale clusters:   theory and applications

Course Information (2 of 2)
• Textbook
  – None; see online course readings
• Webpage: http://www.cs.washington.edu/cse490h
• Mailing lists:
  – Course discussion: cse490h@...

Page 8: Problem-solving on large-scale clusters:   theory and applications

Warning: Theory Ahead!
• Before we can talk about MapReduce, we need to talk about the concepts on which it is founded:
  – Programming languages: fold and map
  – Distributed systems: data dependencies

Page 9: Problem-solving on large-scale clusters:   theory and applications

Digression: Function Objects (1 of 3)
• A function object is a function that can be manipulated as an object
  – Sometimes referred to as a “functor”
• In Java, this is usually implemented with a class that has an execute() (or similarly named) method

Page 10: Problem-solving on large-scale clusters:   theory and applications

Digression: Function Objects (2 of 3)
• Example: Implementing the Comparator interface to use Collections.sort()

    import java.util.*;

    class ReverseAlphaOrder implements Comparator<String> {
        // Swapping the operands of compareTo() reverses the natural order.
        public int compare(String s1, String s2) {
            return s2.compareTo(s1);
        }
    }

    List<String> myStrings = new ArrayList<>();  // fill with the strings to sort
    ReverseAlphaOrder rao = new ReverseAlphaOrder();
    Collections.sort(myStrings, rao);

The underlying idea is to pass the “greater than” operation to sort()

Page 11: Problem-solving on large-scale clusters:   theory and applications

Digression: Function Objects (3 of 3)
• In Java, methods that take function objects are “higher-order functions”
  – Collections.sort() is a higher-order function
• Mathematically, a “higher-order function” is a function that does at least one of the following:
  – Takes one or more functions as input
  – Outputs a function
• Examples:
  – The derivative (from calculus)
    d/dx (x^3 + 2x) = 3x^2 + 2
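The derivative example can be sketched in Java as a function that both takes a function and returns one. This is only an illustration under assumptions: the class name Derivative, the method derivative(), and the step size h are hypothetical names introduced here, the derivative is approximated numerically with a central difference, and Java 8 or later is assumed for lambdas.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical helper class; the names are illustrative, not from the lecture.
class Derivative {
    // Takes a function f and returns a NEW function that approximates df/dx
    // using a central difference with step h.
    static DoubleUnaryOperator derivative(DoubleUnaryOperator f, double h) {
        return x -> (f.applyAsDouble(x + h) - f.applyAsDouble(x - h)) / (2 * h);
    }

    public static void main(String[] args) {
        DoubleUnaryOperator f = x -> x * x * x + 2 * x;   // x^3 + 2x
        DoubleUnaryOperator df = derivative(f, 1e-4);      // approximates 3x^2 + 2
        System.out.println(df.applyAsDouble(2.0));         // close to 3*4 + 2 = 14
    }
}
```

Because the result is an approximation, compare it against 3x^2 + 2 with a tolerance rather than for exact equality.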

Page 12: Problem-solving on large-scale clusters:   theory and applications

fold - Introduction
• fold is a family of higher-order functions that process a data structure and return a single value
  – Commonly, fold takes a function f and a list l, and recursively applies f to “combine” the elements of l
  – The return value may be “complex”, e.g. a list
• Example:
  – fold (+) [1,2,4,8] -> ???
  – fold (/) [64,8,4,2] -> ???

Page 13: Problem-solving on large-scale clusters:   theory and applications

fold - Directionality
• Remember how we said fold was “a family of functions”?
  – foldr (/) [64,8,4,2] -> 64 / (8 / (4/2)) -> 16
  – foldl (/) [64,8,4,2] -> ((64/8) / 4) / 2 -> 1
• “fold right” – recursively applies f over the right side of the list
• “fold left” – recursively applies f over the left side of the list

[Figure: expression trees. Right fold: 64 ÷ (8 ÷ (4 ÷ 2)). Left fold: ((64 ÷ 8) ÷ 4) ÷ 2.]
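The two groupings on this slide can be checked with a small Java sketch. The class FoldDirection and its methods foldRight() and foldLeft() are hypothetical names introduced for illustration; like the slide's examples they take no initial value, so they assume a non-empty list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical helper class; assumes Java 8+ and non-empty input lists.
class FoldDirection {
    // Right fold without an initial value: x1 op (x2 op (... op xn)).
    static <T> T foldRight(BinaryOperator<T> f, List<T> xs) {
        T acc = xs.get(xs.size() - 1);
        for (int i = xs.size() - 2; i >= 0; i--) {
            acc = f.apply(xs.get(i), acc);   // the element goes on the left
        }
        return acc;
    }

    // Left fold without an initial value: ((x1 op x2) op ...) op xn.
    static <T> T foldLeft(BinaryOperator<T> f, List<T> xs) {
        T acc = xs.get(0);
        for (int i = 1; i < xs.size(); i++) {
            acc = f.apply(acc, xs.get(i));   // the element goes on the right
        }
        return acc;
    }

    public static void main(String[] args) {
        List<Double> xs = Arrays.asList(64.0, 8.0, 4.0, 2.0);
        System.out.println(foldRight((a, b) -> a / b, xs)); // 16.0
        System.out.println(foldLeft((a, b) -> a / b, xs));  // 1.0
    }
}
```

The results differ only because division is not associative; with an associative operator such as addition, both directions give the same answer.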

Page 14: Problem-solving on large-scale clusters:   theory and applications

fold - Questions
• Discussion questions:
  – What should the base case return?
    • foldr (+) [] -> ???
    • foldr (/) [] -> ???
  – Can a right fold be implemented as a loop (using tail recursion)? What about left fold?
• Enrichment questions:
  – What happens to a right fold when given an infinite list? What about left fold?

Page 15: Problem-solving on large-scale clusters:   theory and applications

fold - Formal Definition
• fold takes a function and a list as its inputs, but it can also take more values.
  – In particular, fold maintains context / state across each invocation of f

    -- If the list is empty, return the initial value ‘z’
    foldr f z [] = z
    -- If the list is not empty, calculate the result of folding the
    -- rest, and apply f to the first element and to that result.
    -- The context from previous invocations of f is implicitly
    -- passed to the current invocation of f via foldr
    foldr f z (x:xs) = f x (foldr f z xs)

What is the formal definition of foldl?
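For readers more comfortable in Java, the foldr definition translates almost line for line. The sketch below also includes one common definition of foldl for comparison; it is a possible answer to the question, not necessarily the one intended in the lecture. The class name FormalFold is hypothetical, and the recursion depth grows with the list length, so this is suitable only for short lists.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical helper class mirroring the recursive definitions; assumes Java 8+.
class FormalFold {
    // foldr f z []     = z
    // foldr f z (x:xs) = f x (foldr f z xs)
    static <T, A> A foldr(BiFunction<T, A, A> f, A z, List<T> xs) {
        if (xs.isEmpty()) {
            return z;
        }
        T head = xs.get(0);
        List<T> tail = xs.subList(1, xs.size());
        return f.apply(head, foldr(f, z, tail));
    }

    // One common definition of the left fold:
    // foldl f z []     = z
    // foldl f z (x:xs) = foldl f (f z x) xs
    static <T, A> A foldl(BiFunction<A, T, A> f, A z, List<T> xs) {
        if (xs.isEmpty()) {
            return z;
        }
        return foldl(f, f.apply(z, xs.get(0)), xs.subList(1, xs.size()));
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3);
        Integer right = foldr((x, acc) -> x - acc, 0, xs); // 1 - (2 - (3 - 0)) = 2
        Integer left = foldl((acc, x) -> acc - x, 0, xs);  // ((0 - 1) - 2) - 3 = -6
        System.out.println(right + " " + left);
    }
}
```

Note where the state travels: in foldr it comes back from the recursive call, while in foldl it is passed forward as the new z.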

Page 16: Problem-solving on large-scale clusters:   theory and applications

fold – An Intuition
• fold “iterates” over a data structure, and maintains one unit of state
  – At each iteration, f is invoked with the current element and the current state
  – fold’s return value is the result of f’s final invocation
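This intuition maps directly onto a loop with a single accumulator variable. The following is a minimal sketch, assuming a left-to-right traversal; the class FoldState and the sample word list are hypothetical and chosen only to show that the state may have a different type from the elements.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical helper class; assumes Java 8+.
class FoldState {
    static <A, T> A foldLeft(BiFunction<A, T, A> f, A z, List<T> xs) {
        A state = z;                     // the one unit of state
        for (T x : xs) {
            state = f.apply(state, x);   // f sees the current state and the current element
        }
        return state;                    // the result of f's final invocation (or z if xs is empty)
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("fold", "map", "reduce");
        // State is an Integer (running character count); elements are Strings.
        Integer totalChars = foldLeft((n, w) -> n + w.length(), 0, words);
        System.out.println(totalChars); // 13
    }
}
```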

Page 17: Problem-solving on large-scale clusters:   theory and applications

map - Introduction
• map is a higher-order function that “transforms” each element in a sequence of elements
  – Commonly, map takes a function f and a sequence s, and applies f to each element of s
• Example:
  – map square_root [1,4,9,16] -> ???

Page 18: Problem-solving on large-scale clusters:   theory and applications

map’s Return Value
• map returns a sequence
  – The new sequence s’ is not necessarily the same size as s
  – The elements of s’ do not necessarily have the same type as the elements of s

Page 19: Problem-solving on large-scale clusters:   theory and applications

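map’s Return Value – Example
• Recall that the sum of N vectors was equal to the sum of their components:
  [Figure: vectors a and b, and their sum a+b]
• Let components() decompose a vector into its X and Y components
  [Figure: map components applied to a list of three vectors; the slide asks whether the result is a list of three (X, Y) pairs or a flat list of six components]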
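One way to make the vector example concrete is to map components() over the list and then fold the resulting pairs with component-wise addition. Everything in this sketch is an assumption made for illustration: the class VectorSum, the polar representation {magnitude, angle in radians}, and the choice to return one (X, Y) pair per vector.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; a vector is assumed to be {magnitude, angle in radians}.
class VectorSum {
    // Decompose one polar vector into its {X, Y} components.
    static double[] components(double[] polar) {
        return new double[] {
            polar[0] * Math.cos(polar[1]),
            polar[0] * Math.sin(polar[1])
        };
    }

    static double[] resultant(List<double[]> polarVectors) {
        // map: one {X, Y} pair per input vector
        List<double[]> pairs = new ArrayList<>();
        for (double[] v : polarVectors) {
            pairs.add(components(v));
        }
        // fold: add the pairs component-wise, starting from the zero vector
        double[] sum = { 0.0, 0.0 };
        for (double[] p : pairs) {
            sum = new double[] { sum[0] + p[0], sum[1] + p[1] };
        }
        return sum;
    }

    public static void main(String[] args) {
        List<double[]> vs = Arrays.asList(
            new double[] { 3.0, 0.0 },          // length 3 along the X axis
            new double[] { 4.0, Math.PI / 2 }); // length 4 along the Y axis
        double[] r = resultant(vs);
        System.out.println(r[0] + ", " + r[1]); // approximately 3.0, 4.0
    }
}
```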

Page 20: Problem-solving on large-scale clusters:   theory and applications

map - Questions
• Enrichment questions:
  – For what values of f and z will fold f z l = l? How can you modify f such that fold f z l = map f l?
  – Bonus question: can you implement map in terms of fold?
  – Visit foldl.com and foldr.com :)
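For the bonus question, one possible answer (among several) is sketched below: use an empty list as the initial state and let the folding function append f applied to each element. The class MapViaFold is a hypothetical name, and the sketch mutates the accumulator list for brevity, which a purely functional version would avoid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical helper class; assumes Java 8+.
class MapViaFold {
    static <A, T> A foldLeft(BiFunction<A, T, A> f, A z, List<T> xs) {
        A acc = z;
        for (T x : xs) {
            acc = f.apply(acc, x);
        }
        return acc;
    }

    // map expressed as a fold: the state is the output list built so far.
    static <T, R> List<R> map(Function<T, R> f, List<T> xs) {
        return MapViaFold.<List<R>, T>foldLeft(
            (acc, x) -> { acc.add(f.apply(x)); return acc; },
            new ArrayList<R>(),
            xs);
    }

    public static void main(String[] args) {
        System.out.println(map(x -> x * x, Arrays.asList(1, 2, 3))); // [1, 4, 9]
    }
}
```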

Page 21: Problem-solving on large-scale clusters:   theory and applications

map – Formal definition
• map takes a function and a data structure as its inputs

    -- If the list is empty, there’s nothing to do
    map f [] = []
    -- If the list is not empty, apply f to the first element and
    -- add the result to the mapping of f on all other elements
    map f (x:xs) = f x : map f xs

What is the complexity of map? What is its runtime?
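The recursive definition of map can be mirrored in Java as a rough sketch. The class FormalMap is a hypothetical name; a LinkedList is used so that prepending the head result is cheap, and the example input is chosen to show the element type changing from String to Integer.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper class mirroring the recursive definition; assumes Java 8+.
class FormalMap {
    // map f []     = []
    // map f (x:xs) = f x : map f xs
    static <T, R> LinkedList<R> map(Function<T, R> f, List<T> xs) {
        if (xs.isEmpty()) {
            return new LinkedList<>();
        }
        LinkedList<R> result = map(f, xs.subList(1, xs.size())); // map f xs
        result.addFirst(f.apply(xs.get(0)));                      // f x : ...
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("fold", "map", "reduce");
        System.out.println(map(String::length, words)); // [4, 3, 6]
    }
}
```

Each call to f depends only on its own element, never on the result for another element.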

Page 22: Problem-solving on large-scale clusters:   theory and applications

Exercise (1 of 2)
• Individually:
  – Determine how these operations can be solved with a fold, a map, or some combination of fold and map:
    • Given a list of vectors, add them to determine the resultant vector.
    • Ray tracing a single ray
      – Ray tracing takes a list of rays that intersect the camera, and traces their paths back to their respective light sources, even across their reflections over several surfaces
    • Assuming you had access to a company’s monthly paystubs for all employees for an entire year, calculate how much annual income tax is owed per person.
    • Run-length encoding.
      – Run-length encoding takes a possibly repetitive string and rewrites it as a sequence of (value, frequency) pairs, e.g. “aaa b ccccc dd” -> “a3 b c5 d2”.
    • Find the smallest element in an array
  – Come up with some challenging problems yourself!

Page 23: Problem-solving on large-scale clusters:   theory and applications

Exercise (2 of 2)
• In small groups, compare your answers to the above, and stump your team with the problems you came up with!