Minimal MapReduce Algorithms Yufei Tao Chinese University of Hong Kong, Hong Kong


outline

• INTRODUCTION
• PRELIMINARY AND RELATED WORK
• SORTING
• BASIC MINIMAL ALGORITHMS IN DATABASES
• SLIDING AGGREGATION
• EXPERIMENTS
• CONCLUSIONS

introduction
• Motivation

Although these principles have guided the design of MapReduce algorithms, previous practice has mostly been on a best-effort basis, paying relatively little attention to enforcing serious constraints on different performance metrics.

introduction
• Minimal MapReduce Algorithms

Minimum footprint
Bounded net-traffic
Constant round
Optimal computation

introduction
• Contributions

The core of this work comprises neat minimal algorithms for two problems:

Sorting
Sliding Aggregation


related work

MapReduce
TeraSort
Algorithms on MapReduce
Relevance to Minimal Algorithms

related work-MR

Statelessness for Fault Tolerance

Some MapReduce implementations (e.g., Hadoop) place the requirement that, at the end of a round, each machine should send all the data in its storage to a distributed file system.

related work-TS

What's TeraSort?

sorting-TS

sorting

Define S_i = S ∩ (b_{i−1}, b_i], for 1 ≤ i ≤ t. In Round 2, all the objects in S_i are gathered by M_i, which sorts them in the reduce phase. For TeraSort to be minimal, it must hold:

P1. s = O(m).
P2. |S_i| = O(m) for all 1 ≤ i ≤ t.
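The two rounds can be tried out in a single process to see how the boundary objects shape the partitions. The sketch below is only an illustration: the function name `terasort_sim`, the parameters `rho` and `seed`, and the rule of taking every ⌈s/t⌉-th sampled object as a boundary are assumptions made for this example, since the slides do not spell out Round 1.

```python
import bisect
import random

def terasort_sim(machines, rho, seed=0):
    """Simulate a two-round TeraSort over a list of per-machine object lists.

    rho is an assumed per-object sampling probability (hypothetical parameter).
    Returns (sorted partitions S_1..S_t, sample size s).
    """
    rng = random.Random(seed)
    t = len(machines)

    # Round 1: each machine samples each of its objects independently.
    sample = sorted(x for part in machines for x in part if rng.random() < rho)
    s = len(sample)

    # Assumed rule: pick b_1..b_{t-1} evenly spaced in the sorted sample.
    # b_0 = -infinity and b_t = +infinity are implicit.
    step = max(1, -(-s // t))  # ceil(s / t), at least 1
    boundaries = sample[step - 1::step][:t - 1]

    # Round 2: object x is sent to the machine i with b_{i-1} < x <= b_i.
    buckets = [[] for _ in range(t)]
    for part in machines:
        for x in part:
            buckets[bisect.bisect_left(boundaries, x)].append(x)

    # Reduce phase: every machine sorts its own S_i locally.
    return [sorted(b) for b in buckets], s
```

Concatenating the returned partitions in machine order gives the fully sorted input. Comparing `s` and the partition sizes against m = n/t for different values of `rho` gives a feel for when P1 and P2 are plausible.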

sortingPr

Discussion

Minimality

sorting

Removing the Broadcast Assumption

(by changing round 1)

in databases

Ranking & Skyline

Group by

Semi-Join

in databases

Group by

example

sliding aggregation

The window sum of o equals:

win-sum(o) = Σ_{o′ ∈ window(o)} w(o′)
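A window sum can be checked on a single machine with prefix sums over the sorted order. The sketch below assumes that window(o) is the set of the l largest objects of S not exceeding o (the slide does not state the definition), that objects are distinct, and that w(o) is given as a parallel list; the name `window_sums` is hypothetical.

```python
from itertools import accumulate

def window_sums(objects, weights, l):
    """Return {o: win-sum(o)}, assuming window(o) = the l largest objects <= o.

    objects must be distinct; weights[i] is w(objects[i]).
    """
    # Indices of the objects in ascending order.
    order = sorted(range(len(objects)), key=lambda i: objects[i])
    # prefix[r] = total weight of the r smallest objects.
    prefix = [0] + list(accumulate(weights[i] for i in order))

    result = {}
    for rank, i in enumerate(order, start=1):
        lo = max(0, rank - l)  # the window covers ranks lo+1 .. rank
        result[objects[i]] = prefix[rank] - prefix[lo]
    return result
```

For example, with objects 1 to 5, weights ten times each object, and l = 2, the window of 3 is {2, 3} and its window sum is 50. Objects with fewer than l predecessors simply get a shorter window.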

sliding aggregation

Sorting with Perfect Balance

sliding aggregation

Sliding Aggregate Computation

experiments-sorting


experiments-skyline

The main contribution of this paper is to fill a gap in the concept of minimal MR algorithms.

thx @hh's
