Yihua Huang (黄宜华): Octopus, a Cross-Platform Unified MLDM Programming Model and Platform

8/10/2019 Octopus MLDM


[email protected]@nju.edu.cn

2014.10.23



Research Motivations



Challenges that Big Data Brings to MLDM

Computation over large-scale datasets must finish in acceptable time

Serial machine learning algorithms do not fit or work on any of the existing parallel computing platforms

MLDM algorithms need to run on different parallel computing platforms



Challenges that Big Data Brings to MLDM

Existing machine learning and data mining algorithms:

need to be rewritten in parallel for big data

need to be rewritten for different big data processing platforms



Frequent Itemset Mining Algorithm

An often-used algorithm for data mining

The Apriori algorithm is the most established algorithm for finding frequent itemsets in a transactional dataset

Tao Xiao, Shuai Wang, Chunfeng Yuan, Yihua Huang. PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets. The Fourth International Symposium on Parallel Architectures, Algorithms and Programming (PAAP 2011), pp. 252-257, 2011

Hongjian Qiu, Rong Gu, Chunfeng Yuan and Yihua Huang. YAFIM: A Parallel Frequent Itemset Mining Algorithm with Spark. The 3rd International Workshop on Parallel and Distributed Computing for Large Scale Machine Learning and Big Data Analytics, in conjunction with IPDPS 2014, May 23, 2014. Phoenix, USA



Frequent Itemset Mining Algorithm: Apriori Algorithm

Needs multiple passes over the database

In the first pass, all frequent 1-itemsets are discovered

In each subsequent pass, frequent (k+1)-itemsets are discovered, using the frequent k-itemsets found in the previous pass as the seed for generating new potentially frequent itemsets (referred to as candidate itemsets)

Repeat until no more frequent itemsets can be found
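The level-wise passes above can be sketched on a single machine as follows. This is a minimal illustration, not the authors' code: it assumes `min_support` is an absolute transaction count, uses the classic join-and-prune candidate generation, and the function name `apriori` and the toy basket data are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    min_support is an absolute count (an assumption made for this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items to find the frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 1
    while frequent:
        # Join: merge frequent k-itemsets into (k+1)-item candidates.
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune: every k-subset of a candidate must itself be frequent.
                if len(union) == k + 1 and all(
                    frozenset(sub) in frequent for sub in combinations(union, k)
                ):
                    candidates.add(union)

        # Another full pass over the data counts the candidates' support.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical toy transactions.
BASKETS = [
    ["bread", "milk"],
    ["bread", "diaper", "beer", "eggs"],
    ["milk", "diaper", "beer", "cola"],
    ["bread", "milk", "diaper", "beer"],
    ["bread", "milk", "diaper", "cola"],
]
frequent_sets = apriori(BASKETS, min_support=3)
```

With `min_support=3` this finds four frequent single items and four frequent pairs; the only 3-item candidate that survives pruning, {bread, milk, diaper}, occurs in just two baskets and is discarded.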



Frequent Itemset Mining Algorithm

Apriori Algorithm [1]:

[1] Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499




Apriori in MapReduce:




Tao Xiao, Shuai Wang, Chunfeng Yuan, Yihua Huang. PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets. The Fourth International Symposium on Parallel Architectures, Algorithms and Programming (PAAP 2011), pp. 252-257, 2011



Frequent Itemset Mining Algorithm with MapReduce

The parallel Apriori algorithm with MapReduce needs to run the MapReduce job iteratively. It needs to scan the dataset in every iteration and store all the intermediate data in HDFS.

As a result, the parallel Apriori algorithm with MapReduce is not efficient enough.
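To make the cost concrete, the sketch below simulates an iterative MapReduce driver in plain Python: each Apriori level is a separate map, shuffle, reduce job, and every job rescans the whole input. The function names and the `scans` counter are hypothetical illustrations, not the PSON implementation; subset pruning is omitted for brevity, so the reducer's support threshold does all the filtering.

```python
from collections import defaultdict


def map_phase(transaction, candidates):
    # Mapper: emit (candidate, 1) for each candidate contained in the transaction.
    t = frozenset(transaction)
    for c in candidates:
        if c <= t:
            yield c, 1


def reduce_phase(key, values, min_support):
    # Reducer: sum the counts and keep the itemset only if it is frequent.
    total = sum(values)
    if total >= min_support:
        yield key, total


def run_job(dataset, candidates, min_support):
    """Simulate one MapReduce job: map -> shuffle (group by key) -> reduce."""
    groups = defaultdict(list)
    for record in dataset:  # every job re-reads the whole input
        for key, value in map_phase(record, candidates):
            groups[key].append(value)
    output = {}
    for key, values in groups.items():
        for k, total in reduce_phase(key, values, min_support):
            output[k] = total
    return output  # in Hadoop this result would be written back to HDFS


def iterative_apriori(dataset, min_support):
    """Driver that launches one job per level and counts full input scans."""
    singles = {frozenset([item]) for t in dataset for item in t}
    frequent = run_job(dataset, singles, min_support)
    all_frequent, scans, k = dict(frequent), 1, 1
    while frequent:
        keys = list(frequent)
        candidates = {a | b for a in keys for b in keys if len(a | b) == k + 1}
        frequent = run_job(dataset, candidates, min_support)
        scans += 1
        all_frequent.update(frequent)
        k += 1
    return all_frequent, scans


# Hypothetical toy transactions.
BASKETS = [
    ["bread", "milk"],
    ["bread", "diaper", "beer", "eggs"],
    ["milk", "diaper", "beer", "cola"],
    ["bread", "milk", "diaper", "beer"],
    ["bread", "milk", "diaper", "cola"],
]
frequent_sets, scans = iterative_apriori(BASKETS, 3)
```

On these toy baskets the driver performs three full scans: two levels produce frequent itemsets and the third finds none. On real data each of those scans is a disk read plus an HDFS write, which is the inefficiency described above.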



Frequent Itemset Mining Algorithm with Spark

YAFIM: the Apriori algorithm implemented in the Spark model

Our YAFIM contains two phases to find all frequent itemsets:

Phase I: Load the transaction datasets as a Spark RDD object and generate the 1-frequent itemsets;

Phase II: Iteratively generate the (k+1)-frequent itemsets from the k-frequent itemsets.

Hongjian Qiu, Rong Gu, Chunfeng Yuan and Yihua Huang. YAFIM: A Parallel Frequent Itemset Mining Algorithm with Spark. The 3rd International Workshop on Parallel and Distributed Computing for Large Scale Machine Learning and Big Data Analytics, in conjunction with IPDPS 2014, May 23, 2014. Phoenix, USA




Load the transaction data into an RDD

All transaction data reside in the RDD




Phase




Phase I




Benchmarks [3] with different characteristics:

Mushroom

T10I4D100K

Chess

Pumsb_star



K-Means

K-Means Clustering Algorithm

Input: a dataset of N data points that need to be clustered into K clusters

Output: K clusters

Choose K cluster centers Centers[K] as the initial cluster centers

Loop: for each data point P in the dataset

    Calculate the distance between P and each of Centers[i]

    Assign P to the nearest cluster center

Recalculate the new Centers[K]

Go to Loop until the cluster centers converge
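The loop above can be sketched in plain Python as follows. This is a minimal single-machine illustration under a few assumptions of our own: Euclidean distance, convergence measured as the total squared movement of the centers, an iteration cap, and empty clusters keeping their old center. The names `closest` and `kmeans` are hypothetical; `math.dist` needs Python 3.8 or later.

```python
import math


def closest(p, centers):
    # Index of the nearest center by Euclidean distance.
    return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))


def kmeans(points, centers, converge_dist=1e-6, max_iter=100):
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assign each point to its nearest center, accumulating sums and counts.
        sums = [[0.0] * len(centers[0]) for _ in centers]
        counts = [0] * len(centers)
        for p in points:
            i = closest(p, centers)
            counts[i] += 1
            for d, x in enumerate(p):
                sums[i][d] += x
        # Recalculate each center as the mean of its points (empty clusters stay put).
        new_centers = [
            tuple(s / n for s in sums[i]) if n else centers[i]
            for i, n in enumerate(counts)
        ]
        shift = sum(math.dist(a, b) ** 2 for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= converge_dist:
            break
    return centers


POINTS = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
final_centers = kmeans(POINTS, [(0.0, 0.0), (10.0, 10.0)])
```

Here the assignment stabilizes after the first iteration, so the second iteration moves nothing and the loop stops with centers at (4/3, 4/3) and (25/3, 25/3).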



K-Means with MapReduce

class Mapper
{
    setup() { read the k cluster centers Centers[K]; }

    map(key, p)   // p is a data point
    {
        minDis = Double.MAX_VALUE; index = -1;
        for i = 0 to Centers.length - 1
        {
            dis = ComputeDist(p, Centers[i]);
            if (dis < minDis) { minDis = dis; index = i; }
        }
        emit(Centers[index].ClusterID, (p, 1));
    }
}



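K-Means with MapReduce

To reduce data I/O and network transfer, we can use a Combiner to aggregate the points of each cluster locally before the shuffle:

class Combiner
{
    reduce(ClusterID, [(p1,1), (p2,1), ...])
    {
        pm = 0.0;
        n = number of points in [(p1,1), (p2,1), ...];
        for i = 0 to n - 1
            pm += pi;
        pm = pm / n;   // average of the points of this cluster on this node
        emit(ClusterID, (pm, n));   // partial center plus its point count
    }
}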



K-Means with MapReduce

class Reducer
{
    reduce(ClusterID, valueList = [(m1,n1), (m2,n2), ...])
    {
        pm = 0.0; n = 0;
        k = length of valueList belonging to this ClusterID;
        for i = 0 to k - 1
        {
            pm += mi * ni;
            n += ni;
        }
        pm = pm / n;   // calculate the new center of the cluster
        emit(ClusterID, (pm, n));   // output the new center of the cluster
    }
}

In the main() function of the MapReduce job, set up a loop that reruns the job until the cluster centers converge.
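The Mapper, Combiner and Reducer above can be checked end to end with a small in-memory simulation. This is a sketch under assumptions: each input split is handled by one mapper plus one combiner, the shuffle is a dictionary keyed by cluster ID, and all function names are hypothetical. It shows why the Reducer must weight each partial mean mi by its count ni: the result equals the plain mean over all points of the cluster.

```python
import math


def nearest(p, centers):
    return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))


def mapper(split, centers):
    # Emit (cluster_id, (point, 1)) for every point in this input split.
    for p in split:
        yield nearest(p, centers), (p, 1)


def combiner(pairs):
    # Local pre-aggregation on one node: emit (cluster_id, (partial_mean, count)).
    acc = {}
    for cid, (p, one) in pairs:
        s, n = acc.get(cid, ([0.0] * len(p), 0))
        acc[cid] = ([a + b for a, b in zip(s, p)], n + one)
    for cid, (s, n) in acc.items():
        yield cid, (tuple(x / n for x in s), n)


def reducer(cid, value_list):
    # Weighted average of the partial means: sum(mi * ni) / sum(ni).
    total = [0.0] * len(value_list[0][0])
    n = 0
    for m, ni in value_list:
        total = [t + x * ni for t, x in zip(total, m)]
        n += ni
    return cid, (tuple(t / n for t in total), n)


def one_iteration(splits, centers):
    shuffled = {}
    for split in splits:
        for cid, value in combiner(mapper(split, centers)):
            shuffled.setdefault(cid, []).append(value)
    return dict(reducer(cid, vals) for cid, vals in shuffled.items())


SPLITS = [
    [(1.0, 1.0), (1.0, 2.0)],
    [(2.0, 1.0), (8.0, 8.0)],
    [(8.0, 9.0), (9.0, 8.0)],
]
new_centers = one_iteration(SPLITS, [(0.0, 0.0), (10.0, 10.0)])
```

A driver corresponding to the main() loop would call `one_iteration` repeatedly, feeding the new centers back in, until they stop moving.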



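K-Means with Spark

// closestPoint and squaredDistance are helper functions defined elsewhere
while (tempDist > convergeDist && tempIter < MaxIter) {
  // determine the nearest center for each point p
  val closest = data.map(p => (closestPoint(p, kPoints), (p, 1)))
  // calculate the average of all points in a cluster as the new center
  val pointStats = closest.reduceByKey { case ((x1, y1), (x2, y2)) => (x1 + x2, y1 + y2) }
  val newPoints = pointStats.map { pair => (pair._1, pair._2._1 / pair._2._2) }.collectAsMap()
  // total squared movement of the centers, used as the convergence test
  tempDist = 0.0
  for (i <- 0 until K) {
    tempDist += squaredDistance(kPoints(i), newPoints(i))
  }
  for (newP <- newPoints) {
    kPoints(newP._1) = newP._2
  }
  tempIter += 1
}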



K-Means with Spark

Spark achieves a speedup of about 4-5 times compared to MapReduce

[Figure: execution time (s) vs. number of nodes, for the 1st iteration and the subsequent iterations]

Peng Liu, Jiayu Teng, Yihua Huang. Study of k-means algorithm parallelization performance based on Spark. CCF Big Data 2014, Beijing (under review)



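Naive Bayes Classification Algorithm

Given m classes from the training dataset: {C1, C2, ..., Cm}, a sample X is assigned to the class

$$c = \arg\max_{C_i} P(C_i \mid X), \quad 1 \le i \le m$$

By Bayes' theorem,

$$P(C_i \mid X) = \frac{P(X \mid C_i)\, P(C_i)}{P(X)}$$

Since P(X) is the same for every class, we only need to calculate $P(X \mid C_i)\, P(C_i)$.

Supposing the attributes $x_k$ are independent of each other (given the class),

$$P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i)$$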




Training Map pseudocode to calculate P(X|Ci) and P(Ci)

class Mapper
{
    map(key, tr)   // tr is a training sample
    {
        parse tr into (trid, X, Ci);
        emit(Ci, 1);
        for (j = 0 to X.length - 1)
        {
            split X[j] into xnj & xvj;   // xnj: name of xj, xvj: value of xj
            emit(<Ci, xnj, xvj>, 1);
        }
    }
}




Training Reduce pseudocode to calculate P(xj|Ci) and P(Ci)

class Reducer
{
    reduce(key, value_list)   // key: either Ci or <Ci, xnj, xvj>
    {
        sum = 0;   // count for P(xj|Ci) and P(Ci)
        while (value_list.hasNext())
            sum += value_list.next().get();
        emit(key, sum);
    }
}
// Trim and save the output as P(xj|Ci) and P(Ci) tables in HDFS
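The training job above can be simulated in memory to show how the emitted counts turn into the two probability tables. This is a sketch under assumptions: a sample is a tuple `(trid, X, Ci)` with `X` a list of `(attribute_name, value)` pairs, the "trim" step simply divides counts, and no smoothing is applied. The function names and toy data are hypothetical.

```python
from collections import defaultdict


def train_mapper(sample):
    _trid, X, c = sample
    yield c, 1                      # one count towards P(Ci)
    for name, value in X:
        yield (c, name, value), 1   # one count towards P(xj | Ci)


def train_reducer(key, values):
    return key, sum(values)


def train(samples):
    groups = defaultdict(list)
    for s in samples:
        for k, v in train_mapper(s):
            groups[k].append(v)
    counts = dict(train_reducer(k, vs) for k, vs in groups.items())
    # "Trim" step: split the counts by key type and normalize them.
    class_counts = {k: v for k, v in counts.items() if not isinstance(k, tuple)}
    total = sum(class_counts.values())
    p_c = {c: n / total for c, n in class_counts.items()}
    p_x_c = {k: v / class_counts[k[0]] for k, v in counts.items() if isinstance(k, tuple)}
    return p_c, p_x_c


SAMPLES = [
    (1, [("outlook", "sunny"), ("windy", "no")], "play"),
    (2, [("outlook", "rain"), ("windy", "yes")], "stay"),
    (3, [("outlook", "sunny"), ("windy", "yes")], "play"),
    (4, [("outlook", "overcast"), ("windy", "no")], "play"),
]
p_c, p_x_c = train(SAMPLES)
```

For example, two of the three "play" samples are sunny, so P(outlook=sunny | play) comes out as 2/3.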




Predict Map pseudocode to predict the class of a test sample

class Mapper
{
    setup()
    {
        load the P(xj|Ci) and P(Ci) data from the training stage:
        FC = { (Ci, P(Ci)) }, FxC = { (<Ci, xnj, xvj>, P(xj|Ci)) }
    }

    map(key, ts)   // ts is a test sample
    {
        parse ts into (tsid, X);
        MaxF = -Double.MAX_VALUE; idx = -1;
        for (i = 0 to FC.length - 1)
        {
            FXCi = 1.0; Ci = FC[i].Ci; FCi = FC[i].P(Ci);
            for (j = 0 to X.length - 1)
            {
                xnj = X[j].xnj; xvj = X[j].xvj;
                look up P(xj|Ci) in FxC with key <Ci, xnj, xvj>;
                FXCi = FXCi * P(xj|Ci);
            }
            if (FXCi * FCi > MaxF) { MaxF = FXCi * FCi; idx = i; }
        }
        emit(tsid, FC[idx].Ci);
    }
}
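The prediction step reduces to an argmax over classes of P(Ci) times the product of P(xj|Ci). Below is a minimal sketch with hard-coded toy tables; the table layout (keys `(Ci, name, value)`), the function name and the data are assumptions. Unseen attribute values get probability 0.0 here to match the plain product in the pseudocode; production code would normally add Laplace smoothing and sum log-probabilities to avoid underflow.

```python
def predict(x, p_c, p_x_c):
    """Return the class Ci that maximizes P(Ci) * prod_j P(xj | Ci)."""
    best_class, best_score = None, -1.0
    for c, prior in p_c.items():
        score = prior
        for name, value in x:
            score *= p_x_c.get((c, name, value), 0.0)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical tables, as a training stage might produce them.
P_C = {"play": 0.75, "stay": 0.25}
P_X_C = {
    ("play", "outlook", "sunny"): 2 / 3,
    ("play", "outlook", "overcast"): 1 / 3,
    ("play", "windy", "no"): 2 / 3,
    ("play", "windy", "yes"): 1 / 3,
    ("stay", "outlook", "rain"): 1.0,
    ("stay", "windy", "yes"): 1.0,
}
label = predict([("outlook", "sunny"), ("windy", "yes")], P_C, P_X_C)
```

For the sunny, windy sample, "play" scores 0.75 * 2/3 * 1/3 while "stay" scores 0, so the sample is labelled "play".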




Training SparkR Code to calculate P(xj|Ci) and P(Ci)




Predict SparkR Code




Training dataset (thousand)   Time A (s)   Time B (s)   Speedup (A/B)

250                           35           13           2.69

500                           40           14           2.85

1000                          49           16           3.06

2000                          66           18           3.67

q ang u, ong u, ua uang.

The Parallelization of Classification Algorithms Based on SparkR.

CCF Big Data 2014, Beijing, Accepted



More Parallel Algorithms We Do

Large Scale Deep Learning on Intel Xeon Phi Manycore Coprocessor with OpenMP

                                           60 cores   30 cores

BaseLine                                   16024 s    15960 s

OpenMP                                     892 s      2122 s

OpenMP+MKL                                 97 s       120 s

Improved OpenMP+MKL                        53 s       81 s

Speedup (fully optimized vs. BaseLine)     302        197

Lei Jin, Rong Gu, Chunfeng Yuan and Yihua Huang. Large Scale Deep Learning On Xeon Phi Many-core Coprocessor. The 3rd International Workshop on Parallel and Distributed Computing for Large Scale Machine Learning and Big Data Analytics, in conjunction with IPDPS 2014, May 23, 2014. Phoenix, USA



Large Scale Learning to Rank based on Gradient Boosting Decision Tree with MPI

Research grant from Baidu



Large Scale Learning to Rank based on Gradient Boosting Decision Tree with MPI

The parallel algorithm implemented with MPI achieves a 1.5x speedup compared with the existing algorithm from Baidu



Customized Lightweight Parallel Computing Platform for Large Scale Neural Network Training

Rong Gu, Furao Shen, and Yihua Huang. A Parallel Computing Platform for Training Large Scale Neural Networks. Proceedings of the IEEE International Conference on Big Data (IEEE BigData 2013), pp. 376-384, Santa Clara, CA, USA, Oct. 6-9, 2013



Summary

MLDM

MLDM



Part 2

Unified Programming Model and Platform for Machine Learning and Data Mining



Research Motivations and Goals

Two fundamental goals of developing computing technology:

Fast: continuously improve performance

Easy to use: continuously improve usability




2007: Hadoop

2013: Spark




OpenMP

MPI




vs.

SQL

Hive, Impala, Spark, Transwarp Inceptor

Spark MLlib

Octopus



What do we do about this?

We provide a unified programming model and platform to bridge the gap between data analysts and parallel computing.

[Figure: unified and easy-to-use programming on top of MapReduce, MPI, and Spark]



Problem for professional parallel programmers:

Multiplying a number of parallel computing platforms by hundreds of machine learning algorithms creates a lot of duplicated work and the burden of rewriting every algorithm for each platform.

[Figure: hundreds of MLDM algorithms × MapReduce, MPI, and Spark; lots of duplicated work to rewrite all MLDM algorithms]

What do we do about this? We provide a unified programming model and platform for parallel programmers to write their MLDM algorithms once and run them anywhere!



Recent Research Status

RHadoop: Revolution Analytics, RHadoop, R, Java

RHadoop, SparkR, pbdR

SparkR: SparkR, R API, R, Spark RDD API

MapReduce, Spark RDD, MPI

Spark MLlib, MLDM

R

Hadoop/Spark

pbdRRMPI

RHPC

/MPI

R

R

MLDM



MLDM

Basic Ideas

MLDM algorithms can be represented as matrix computations

Adopt matrix as the unified abstraction to represent a variety of machine

learning and data mining(MLDM) algorithms

Provide a high-level, matrix model-based MLDM parallel programming and computing model

Provide a matrix model-based, easy-to-use and unified programming language and software framework to support the model
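As an illustration of "MLDM algorithms as matrix computations", least-squares linear regression trained by gradient descent needs only multiply, transpose, subtract and scale. The pure-Python sketch below is hypothetical (helper names, learning rate, iteration count and data are our own choices, not Octopus APIs); the point is that once these few matrix operations are provided, the algorithm body never mentions a platform.

```python
def matmul(A, B):
    # (m x k) @ (k x n) -> (m x n); matrices are lists of rows.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scale(A, s):
    return [[a * s for a in row] for row in A]


def linear_regression(X, y, lr=0.1, iters=2000):
    """Gradient descent written only with matrix operations:
    w <- w - lr * X^T (X w - y) / n."""
    n = len(X)
    w = [[0.0] for _ in X[0]]  # column vector of weights
    for _ in range(iters):
        grad = scale(matmul(transpose(X), sub(matmul(X, w), y)), 1.0 / n)
        w = sub(w, scale(grad, lr))
    return w


# The first column of ones acts as the intercept; the data follow y = 1 + 2x.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [[1.0], [3.0], [5.0], [7.0]]
w = linear_regression(X, y)
```

After 2000 steps the weights converge to an intercept of 1 and a slope of 2.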

MLDM



MLDM

Basic Ideas

MLDM plug-ins

Implement plug-ins for each of the underlying parallel computing platforms, mapping the high-level MLDM programs along with their matrix computations to the underlying platforms

Implement and provide optimized large-scale matrix computation on top of the underlying platforms, so that programmers can write once and run anywhere

Design and provide a parallel MLDM algorithm library
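The plug-in idea can be sketched as a registry of backends that share one matrix interface, with the high-level program written against that interface only. Everything here is a hypothetical illustration, not the Octopus design: `LocalBackend` computes on one machine, and `RecordingBackend` merely stands in for a distributed plug-in (for Spark or MPI) by logging calls before computing locally.

```python
class LocalBackend:
    """Hypothetical single-machine plug-in; matrices are lists of rows."""

    name = "local"

    def multiply(self, A, B):
        return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


class RecordingBackend(LocalBackend):
    """Hypothetical stand-in for a distributed plug-in. It only records each
    call and then computes locally, so the dispatch path is visible without a cluster."""

    name = "recording"

    def __init__(self):
        self.calls = []

    def multiply(self, A, B):
        self.calls.append(("multiply", len(A), len(B[0])))
        return super().multiply(A, B)


BACKENDS = {}


def register(backend):
    BACKENDS[backend.name] = backend


def run(program, platform):
    # The same high-level program receives whichever plug-in is selected.
    return program(BACKENDS[platform])


def gram_matrix(backend):
    """A platform-independent building block: X^T X via the backend's multiply."""
    X = [[1.0, 2.0], [3.0, 4.0]]
    Xt = [list(r) for r in zip(*X)]
    return backend.multiply(Xt, X)


recorder = RecordingBackend()
register(LocalBackend())
register(recorder)
local_result = run(gram_matrix, "local")
recorded_result = run(gram_matrix, "recording")
```

Both runs return [[10.0, 14.0], [14.0, 20.0]]; only the plug-in chosen at run time differs, which is the "write once, run anywhere" property the slide describes.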

MLDM



MLDM

Architectural Overview: Octopus Project

We have initiated a research project, Octopus, to develop a cross-platform and unified MLDM programming model and platform.



MLDM



MLDM

Architectural Overview

Spark



Spark

Distributed Matrix Computation Lib with Spark

Marlin: an Octopus sub-project, a Spark distributed matrix library

https://code.csdn.net/u014252240/sparkmatrixlib

Currently, neither R nor Spark provides the ability to operate on large-scale matrices

A distributed matrix is a critical and fundamental component of the matrix-based programming model on top of Spark

Spark



Spark

Spark MLlib Overview

BLAS/LAPACK

Spark



Spark

Spark MLlib

Marlin

API

Spark


[Figure: an automated large-scale matrix partition and parallel execution manager that schedules and dispatches sub-matrix tasks to the server nodes of a Spark cluster]
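The partition-and-dispatch idea can be illustrated with block-partitioned matrix multiplication: both operands are cut into square sub-matrices, every block product A[i][k] · B[k][j] is an independent task that a scheduler could send to a different node, and the partial products for each output block are summed. This single-process sketch is an assumption-laden illustration (hypothetical names, a fixed block size `bs`, ragged edge blocks allowed), not Marlin's actual partitioning strategy.

```python
def split_blocks(M, bs):
    """Partition M (list of rows) into bs x bs sub-matrices keyed by (block_row, block_col)."""
    blocks = {}
    for i in range(0, len(M), bs):
        for j in range(0, len(M[0]), bs):
            blocks[(i // bs, j // bs)] = [row[j:j + bs] for row in M[i:i + bs]]
    return blocks


def local_multiply(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def block_multiply(A, B, bs):
    """C(i, j) = sum over k of A(i, k) @ B(k, j); each block product is an independent task."""
    Ab, Bb = split_blocks(A, bs), split_blocks(B, bs)
    Cb = {}
    for (i, k), a in Ab.items():
        for (k2, j), b in Bb.items():
            if k == k2:
                prod = local_multiply(a, b)
                Cb[(i, j)] = add(Cb[(i, j)], prod) if (i, j) in Cb else prod
    # Reassemble the full result from its blocks.
    n_block_rows = max(i for i, _ in Cb) + 1
    n_block_cols = max(j for _, j in Cb) + 1
    C = []
    for bi in range(n_block_rows):
        for r in range(len(Cb[(bi, 0)])):
            row = []
            for bj in range(n_block_cols):
                row.extend(Cb[(bi, bj)][r])
            C.append(row)
    return C


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
C = block_multiply(A, B, bs=2)
```

With a block size of 2, the 3 x 3 operands split into four blocks each (including ragged edge blocks), and the reassembled result matches the ordinary product.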

Spark



Spark

Octopus, Hadoop, MPI

Spark


Spark-Matrix Lib Performance

Marlin

Spark



Spark-Matrix



Integrate Spark with the Unified Platform

Allow the Spark-Matrix Lib to be called from the R language

Load and manage matrix data, and partition and schedule sub-matrices for distributed execution;

or call the R matrix library for small matrices that can be executed on a single machine.
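The choice between distributed execution and a single-machine matrix library can be expressed as a simple size-based planner. This is a hypothetical sketch: the threshold `LOCAL_LIMIT` and the rule of looking at the largest matrix involved are our assumptions, not documented Octopus behavior.

```python
# Hypothetical cut-off: operations whose matrices stay below this size run locally.
LOCAL_LIMIT = 1_000_000


def plan_multiply(shape_a, shape_b, local_limit=LOCAL_LIMIT):
    """Decide where to run A (m x k) @ B (k x n), judging by the largest matrix involved."""
    (m, k), (k2, n) = shape_a, shape_b
    if k != k2:
        raise ValueError(f"inner dimensions differ: {k} vs {k2}")
    largest = max(m * k, k * n, m * n)
    return "local" if largest <= local_limit else "distributed"


small_plan = plan_multiply((100, 100), (100, 100))
large_plan = plan_multiply((100_000, 1_000), (1_000, 10))
```

A 100 x 100 product stays local, while a 100,000 x 1,000 operand (10^8 elements) is sent to the distributed engine; a real planner would also weigh memory and data location.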

Spark-Matrix



Octopus-R User Interface

User interface from RStudio to work with Octopus

Panel to write MLDM or any other algorithm code with matrices

Command and result window

Spark-Matrix



Octopus-R Demo Algorithm

Logistic regression algorithm coded with matrices

Underneath, this program will be executed on top of our Octopus engine
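The demo's logistic regression is written in R against Octopus matrices. A rough Python analogue of "logistic regression coded with matrices" is sketched below; it is not the Octopus-R code, and the helper names, learning rate, iteration count and toy data are assumptions. Each gradient step is one matrix product, an element-wise sigmoid, and one product with the transpose.

```python
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def sigmoid(M):
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in M]


def logistic_regression(X, y, lr=0.5, iters=1000):
    """Batch gradient descent on the log-loss: w <- w - lr * X^T (sigmoid(X w) - y) / n."""
    n = len(X)
    w = [[0.0] for _ in X[0]]
    for _ in range(iters):
        pred = sigmoid(matmul(X, w))
        err = [[p[0] - t[0]] for p, t in zip(pred, y)]
        grad = matmul(transpose(X), err)
        w = [[wi[0] - lr * g[0] / n] for wi, g in zip(w, grad)]
    return w


def classify(X, w):
    return [1 if p[0] >= 0.5 else 0 for p in sigmoid(matmul(X, w))]


# Intercept column of ones plus one feature; the label is 1 when the feature is positive.
X = [[1.0, -3.0], [1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [[0.0], [0.0], [0.0], [1.0], [1.0], [1.0]]
w = logistic_regression(X, y)
labels = classify(X, w)
```

Because the toy data are symmetric around zero, the intercept stays near zero and the positive slope separates the two classes.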



Project Research Progress

Octopus

SparkRSparkRMLDM

Hadoop MapReduce

RHadoop

MPI



Project Research Progress

RMLDM

R

MLD

Spark Hadoop MPI



PASA



What we do at our NJU PASA Big Data Lab

Our lab studies parallel algorithms, systems, and applications for big data

Now we are a contributor to Tachyon



Hadoop Intel ,

Tachyon, UC Berkeley AMP, HBase

HBaseRDF Intel Intel MIC Intel

, , ,

GBDT

Web Web , Intel



Contact Information

Dr. Yihua Huang, Professor

NJU PASA Big Data Lab, http://pasabigdata.nju.edu.cn

Department of Computer Science and Technology

Nanjing University, Nanjing, P. R. China

[email protected]@nju.edu.cn

TeTe 18918951675167

91279127
