28
An Evaluation of Systems for Scalable Linear Algebra Anthony Thomas UC San Diego July 2, 2018 Anthony Thomas (UC San Diego) July 2, 2018 1 / 21

An Evaluation of Systems for Scalable Linear Algebra · 02-07-2018  · Introduction Our Contributions 1 Empirical evaluation of systems focusing on linear algebra (LA) based machine

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: An Evaluation of Systems for Scalable Linear Algebra · 02-07-2018  · Introduction Our Contributions 1 Empirical evaluation of systems focusing on linear algebra (LA) based machine

An Evaluation of Systems for Scalable LinearAlgebra

Anthony Thomas

UC San Diego

July 2, 2018

Anthony Thomas (UC San Diego) July 2, 2018 1 / 21

Page 2: An Evaluation of Systems for Scalable Linear Algebra · 02-07-2018  · Introduction Our Contributions 1 Empirical evaluation of systems focusing on linear algebra (LA) based machine

Introduction

Our Contributions

1 Empirical evaluation of systems focusing on linear algebra (LA) basedmachine learning

2 Articulate a new set of LA workloads stress testing different dataaccess and communication patterns

3 Extensive empirical comparison of several popular LA systems on realand synthetic data

4 Analysis and discussion of system strengths and weaknesses

Under submission to VLDB 2019https://adalabucsd.github.io/slab.html

Anthony Thomas (UC San Diego) July 2, 2018 2 / 21

Page 3: An Evaluation of Systems for Scalable Linear Algebra · 02-07-2018  · Introduction Our Contributions 1 Empirical evaluation of systems focusing on linear algebra (LA) based machine

Background

What is linear algebra?I Formal mathematical language for describing transformations to

matricesI Characterizes systems of linear equations (in a vector space)I Example: Ax = b ⇒ x = A

−1b

Why should we care?I Most common statistics algorithms can be expressed as transformations

to matricesI Elegant language of abstraction for programming statistical algorithmsI Algorithms can be expressed in “near math” syntaxI Loosely analogous to SQL and relational algebra

Anthony Thomas (UC San Diego) July 2, 2018 3 / 21

Page 4: An Evaluation of Systems for Scalable Linear Algebra · 02-07-2018  · Introduction Our Contributions 1 Empirical evaluation of systems focusing on linear algebra (LA) based machine

Numerical Linear Algebra (Classical)

LAPACK (C/FORTRAN) R Language (C) Eigen (C++)1979/1992 1993 2009

Mostly low level libraries and wrappers

Some work towards scalability (ScaLAPACK)

Anthony Thomas (UC San Diego) July 2, 2018 4 / 21

Page 5: An Evaluation of Systems for Scalable Linear Algebra · 02-07-2018  · Introduction Our Contributions 1 Empirical evaluation of systems focusing on linear algebra (LA) based machine

Yet Another Linear Algebra Benchmark?

Linear Algebra has been extensively studied, but...I Focus mostly on single-node in-memory settingI Target low level libraries (BLAS, Eigen, etc...)I Goal is to optimize primitive linear algebra operations

Source: http://eigen.tuxfamily.org/index.php?title=Benchmark

Anthony Thomas (UC San Diego) July 2, 2018 5 / 21

Page 6: An Evaluation of Systems for Scalable Linear Algebra · 02-07-2018  · Introduction Our Contributions 1 Empirical evaluation of systems focusing on linear algebra (LA) based machine

What’s new in Systems for Scalable Linear Algebra?

Scaling beyond main memory:I RDBMS based systems: Apache MADlib, SimSQL, RIOTI Map-reduce/Spark: Spark ML/MLlib, Apache SystemML, Mahout

SamsaraI Something new: TensorFlow

Moving towards declarative programming style:I Less painful (not painless) implementation of distributed programsI Decoupling physical implementation from program designI Hollistic inter-operator program optimization

Anthony Thomas (UC San Diego) July 2, 2018 6 / 21

Page 7: An Evaluation of Systems for Scalable Linear Algebra · 02-07-2018  · Introduction Our Contributions 1 Empirical evaluation of systems focusing on linear algebra (LA) based machine

What’s new in Systems for Scalable Linear Algebra?

Scaling beyond main memory:I RDBMS based systems: Apache MADlib, SimSQL, RIOTI Map-reduce/Spark: Spark ML/MLlib, Apache SystemML, Mahout

SamsaraI Something new: TensorFlow

Moving towards declarative programming style:I Less painful (not painless) implementation of distributed programsI Decoupling physical implementation from program designI Hollistic inter-operator program optimization

Anthony Thomas (UC San Diego) July 2, 2018 6 / 21

Page 8: An Evaluation of Systems for Scalable Linear Algebra · 02-07-2018  · Introduction Our Contributions 1 Empirical evaluation of systems focusing on linear algebra (LA) based machine

Example: Apache SystemML

Aims to bring SQL style ”declarative programming” to machinelearning

Compiles programs written in a custom R-like language (DML) intobatch jobs run on Spark.

Sophisticated program optimization:I Physical operator selection (e.g. GMM implementation)I Optimization based on LA semanticsI Automatic choice of dense/sparse matrices, local/distributed

computation

Anthony Thomas (UC San Diego) July 2, 2018 7 / 21

Page 9: An Evaluation of Systems for Scalable Linear Algebra · 02-07-2018  · Introduction Our Contributions 1 Empirical evaluation of systems focusing on linear algebra (LA) based machine

Experimental Evaluation

Performance evaluation targeted at the “typical data science user”

Focus on bulk LA operations - not mini-batch SGD and neuralnetworks

Two scale factors controlling data complexity:1 Number of rows2 Data sparsity (% cells which are 0)

And two scale factors controlling the computational environment:1 Number of CPU cores2 Number of cluster nodes (implcitly scales RAM)

Anthony Thomas (UC San Diego) July 2, 2018 8 / 21

Page 10: An Evaluation of Systems for Scalable Linear Algebra · 02-07-2018  · Introduction Our Contributions 1 Empirical evaluation of systems focusing on linear algebra (LA) based machine

Experimental Evaluation

Performance evaluation targeted at the “typical data science user”

Focus on bulk LA operations - not mini-batch SGD and neuralnetworks

Two scale factors controlling data complexity:1 Number of rows2 Data sparsity (% cells which are 0)

And two scale factors controlling the computational environment:1 Number of CPU cores2 Number of cluster nodes (implcitly scales RAM)

Anthony Thomas (UC San Diego) July 2, 2018 8 / 21

Page 11: An Evaluation of Systems for Scalable Linear Algebra · 02-07-2018  · Introduction Our Contributions 1 Empirical evaluation of systems focusing on linear algebra (LA) based machine

Experimental Evaluation

Performance evaluation targeted at the “typical data science user”

Focus on bulk LA operations - not mini-batch SGD and neuralnetworks

Two scale factors controlling data complexity:1 Number of rows2 Data sparsity (% cells which are 0)

And two scale factors controlling the computational environment:1 Number of CPU cores2 Number of cluster nodes (implcitly scales RAM)

Anthony Thomas (UC San Diego) July 2, 2018 8 / 21

Page 12: An Evaluation of Systems for Scalable Linear Algebra · 02-07-2018  · Introduction Our Contributions 1 Empirical evaluation of systems focusing on linear algebra (LA) based machine

Task Categories

Primitive Matrix Operators

Aggregation Operators: Frobenius Norm, Matrix Vector Multiplication

Binary Block Operators: Matrix Addition

Multiplication: General matrix-matrix multiplication, transpose-selfmultiplication

Pipelines and Decompositions

Multiplication chains:

pN×1

= uN×1· v1×N· wN×1

Singular Value Decomposition:

SVD(M)N×K

→ UN×K

· ΣK×K

· V T

K×K

Anthony Thomas (UC San Diego) July 2, 2018 9 / 21

Page 13: An Evaluation of Systems for Scalable Linear Algebra · 02-07-2018  · Introduction Our Contributions 1 Empirical evaluation of systems focusing on linear algebra (LA) based machine

Task Categories - ML Algorithms

OLS Regression solved via normal equations:

Input: X - data, y - outcomesResult: β = solve(XTX ,XTy)

Logistic Regression solved via gradient descent:

Input: X - data, y - outcomesβ = rand(K , 1)while not converged do

β = β + XT (y − 11+exp(−Xβ))

endResult: β

Anthony Thomas (UC San Diego) July 2, 2018 10 / 21

Page 14: An Evaluation of Systems for Scalable Linear Algebra · 02-07-2018  · Introduction Our Contributions 1 Empirical evaluation of systems focusing on linear algebra (LA) based machine

Task Categories - ML Algorithms

OLS Regression solved via normal equations:

Input: X - data, y - outcomesResult: β = solve(XTX ,XTy)

Logistic Regression solved via gradient descent:

Input: X - data, y - outcomesβ = rand(K , 1)while not converged do

β = β + XT (y − 11+exp(−Xβ))

endResult: β

Anthony Thomas (UC San Diego) July 2, 2018 10 / 21

Page 15: An Evaluation of Systems for Scalable Linear Algebra · 02-07-2018  · Introduction Our Contributions 1 Empirical evaluation of systems focusing on linear algebra (LA) based machine

Task Categories - ML AlgorithmsNon-Negative Matrix Factorization solved via multiplicative updates:

Input: X - data matrix, r - rankW = rand(N, r)H = rand(r ,K )while not converged do

W = W · XHT

WHHT

H = H · WTX

WTWH

endResult: (H ,H)

Heteroscedasticity Robust Standard Errors solved by White’sMethod

Input: X - data, ε - OLS ResidualsV =

(XTX )−1XTdiag(ε2)X (XTX )−1

Result: VAnthony Thomas (UC San Diego) July 2, 2018 11 / 21

Page 16: An Evaluation of Systems for Scalable Linear Algebra · 02-07-2018  · Introduction Our Contributions 1 Empirical evaluation of systems focusing on linear algebra (LA) based machine

Task Categories - ML AlgorithmsNon-Negative Matrix Factorization solved via multiplicative updates:

Input: X - data matrix, r - rankW = rand(N, r)H = rand(r ,K )while not converged do

W = W · XHT

WHHT

H = H · WTX

WTWH

endResult: (H ,H)

Heteroscedasticity Robust Standard Errors solved by White’sMethod

Input: X - data, ε - OLS ResidualsV =

(XTX )−1XTdiag(ε2)X (XTX )−1

Result: VAnthony Thomas (UC San Diego) July 2, 2018 11 / 21

Page 17: An Evaluation of Systems for Scalable Linear Algebra · 02-07-2018  · Introduction Our Contributions 1 Empirical evaluation of systems focusing on linear algebra (LA) based machine

Technical Details

Systems Compared (Distributed Context)I Spark MLLib - Spark basedI Apache SystemML - Spark basedI SciDB - Custom Array DBMSI Apache MADLib - RDBMS (Greenplum/Postgres) basedI “Programming with Big Data in R” (pbdR) DMAT - MPI/ScaLAPACK

based

Measurements performed on CloudLab “Clemson” siteI 200 GB RAM, 24 CPU, 800GB per nodeI Most experiments (and intermediates) fit in distributed RAM

Each test is run 5 timesI First measurement is discardedI Median, min and max a reported

Data loading time is not included

Disk spills are allowed

Anthony Thomas (UC San Diego) July 2, 2018 12 / 21

Page 18: An Evaluation of Systems for Scalable Linear Algebra · 02-07-2018  · Introduction Our Contributions 1 Empirical evaluation of systems focusing on linear algebra (LA) based machine

Technical Details

Systems Compared (Distributed Context)I Spark MLLib - Spark basedI Apache SystemML - Spark basedI SciDB - Custom Array DBMSI Apache MADLib - RDBMS (Greenplum/Postgres) basedI “Programming with Big Data in R” (pbdR) DMAT - MPI/ScaLAPACK

based

Measurements performed on CloudLab “Clemson” siteI 200 GB RAM, 24 CPU, 800GB per nodeI Most experiments (and intermediates) fit in distributed RAM

Each test is run 5 timesI First measurement is discardedI Median, min and max a reported

Data loading time is not included

Disk spills are allowed

Anthony Thomas (UC San Diego) July 2, 2018 12 / 21

Page 19: An Evaluation of Systems for Scalable Linear Algebra · 02-07-2018  · Introduction Our Contributions 1 Empirical evaluation of systems focusing on linear algebra (LA) based machine

Technical Details

Systems Compared (Distributed Context)I Spark MLLib - Spark basedI Apache SystemML - Spark basedI SciDB - Custom Array DBMSI Apache MADLib - RDBMS (Greenplum/Postgres) basedI “Programming with Big Data in R” (pbdR) DMAT - MPI/ScaLAPACK

based

Measurements performed on CloudLab “Clemson” siteI 200 GB RAM, 24 CPU, 800GB per nodeI Most experiments (and intermediates) fit in distributed RAM

Each test is run 5 timesI First measurement is discardedI Median, min and max a reported

Data loading time is not included

Disk spills are allowed

Anthony Thomas (UC San Diego) July 2, 2018 12 / 21

Page 20: An Evaluation of Systems for Scalable Linear Algebra · 02-07-2018  · Introduction Our Contributions 1 Empirical evaluation of systems focusing on linear algebra (LA) based machine

Results!

Figure 1: Distributed Dense Matrix Ops - Vary Rows

Cluster size: 8 nodes, Matrix fixed axis: 100, CPUs: 24What’s going on with SystemML?Anthony Thomas (UC San Diego) July 2, 2018 13 / 21

Page 21: An Evaluation of Systems for Scalable Linear Algebra · 02-07-2018  · Introduction Our Contributions 1 Empirical evaluation of systems focusing on linear algebra (LA) based machine

Figure 2: Distributed Sparse Matrix Ops - Vary Sparsity

Cluster size: 8 nodes, CPUs: 24, Logical Matrix Size: 100GBWhat happened to MADlib for MVM?

Anthony Thomas (UC San Diego) July 2, 2018 14 / 21

Page 22: An Evaluation of Systems for Scalable Linear Algebra · 02-07-2018  · Introduction Our Contributions 1 Empirical evaluation of systems focusing on linear algebra (LA) based machine

Figure 3: Distributed Pipelines and Decompositions

Anthony Thomas (UC San Diego) July 2, 2018 15 / 21

Page 23: An Evaluation of Systems for Scalable Linear Algebra · 02-07-2018  · Introduction Our Contributions 1 Empirical evaluation of systems focusing on linear algebra (LA) based machine

Figure 4: Distributed Dense LA Algorithms - Vary Rows

Implemented by us...

Anthony Thomas (UC San Diego) July 2, 2018 16 / 21

Page 24: An Evaluation of Systems for Scalable Linear Algebra · 02-07-2018  · Introduction Our Contributions 1 Empirical evaluation of systems focusing on linear algebra (LA) based machine

Figure 5: Distributed Dense Algorithms - Criteo Adclick Data

Implemented by us and them...

Anthony Thomas (UC San Diego) July 2, 2018 17 / 21

Page 25: An Evaluation of Systems for Scalable Linear Algebra · 02-07-2018  · Introduction Our Contributions 1 Empirical evaluation of systems focusing on linear algebra (LA) based machine

Comparing LA Abstractions - Example I

Apache SystemML Spark MLLib

Anthony Thomas (UC San Diego) July 2, 2018 18 / 21

Page 26: An Evaluation of Systems for Scalable Linear Algebra · 02-07-2018  · Introduction Our Contributions 1 Empirical evaluation of systems focusing on linear algebra (LA) based machine

Comparing LA Abstractions - Example II

pbdR SciDB

Anthony Thomas (UC San Diego) July 2, 2018 19 / 21

Page 27: An Evaluation of Systems for Scalable Linear Algebra · 02-07-2018  · Introduction Our Contributions 1 Empirical evaluation of systems focusing on linear algebra (LA) based machine

Some Commentary

Challenges Remain...

Physical data independence is often poor - need to decide a priori ondense vs sparse, distributed vs. local, which data type to use etc...

Tuning remains a “significant challenge”:1 Often requires substantial systems knowledge (GC tuning, caching and

buffer pools)2 Poor tuning can kill performance3 Often labor intensive - especially for RDBMS type systems4 Too many tunable parameters - leads to “tuning fatigue”5 Tuning parameters may be workload specific

More nodes does not always lead to better performance!

Anthony Thomas (UC San Diego) July 2, 2018 20 / 21

Page 28: An Evaluation of Systems for Scalable Linear Algebra · 02-07-2018  · Introduction Our Contributions 1 Empirical evaluation of systems focusing on linear algebra (LA) based machine

Key Takeaways

Transparently switching between distributed and local executionimproves performance and improves usability

Automatically detecting LA optimizations (diagonal matrix multiply,multiplication chain order) improves performance and improvesusability

Intermediate results should not be needlessly materialized andcomputations should be pipelined whenever possible

Strong physical data independence and LA based abstractions are keyto an enjoyable programming experience

Anthony Thomas (UC San Diego) July 2, 2018 21 / 21