40
Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University Jena, Germany

Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

  • Upload
    others

  • View
    11

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

Scaling up Generic Optimization usingautoBLAS

Joachim Giesen Lars Kühne Sören Laue Jens Müller

Friedrich-Schiller-University Jena, Germany

Page 2: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

Big Data Analytics

About 600-700 papers published at NIPS and ICML per year.

~30% design / implement a new optimization algorithm

Page 3: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

Big Data Analytics

About 600-700 papers published at NIPS and ICML per year.

~30% design / implement a new optimization algorithm

Page 4: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

Motivation

Look at big data analytics through the lens of optimization.

I Least squares regression

I SVMs

I Kernel learning

I k-means

I . . .

Page 5: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

Motivation

Look at big data analytics through the lens of optimization.

I Least squares regression

I SVMs

I Kernel learning

I k-means

I . . .

Page 6: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

Generic Optimization (GENO)

matrix A

vector b

minimize norm(A*x - b, 2)

subject to

x >= 0

GENO

non-negativeleast squares

solver

matrix X sparse

vector y

scalar c

minimize

1/2 * w’*w + c * sum(xi)

subject to

y.*(X*w+vector(b)) >= 1-xi

xi >= 0

SVM solver

Page 7: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

GENO stack

Matrix CalculusSymbolic Computation

Optimization ProblemSolver

Linear Algebra ExpressionsEigen

Page 8: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

GENO stack

Matrix CalculusSymbolic Computation

Optimization ProblemSolver

Linear Algebra ExpressionsEigen

Page 9: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

Matrix Calculus

gradient computation

example: f (x) = xT Ax

gradient: ∇f (x) = (AT + A)x

example: logistic regressionf (w) = 1

2wT w + C · sum(log(1 + exp(−y . ∗ (Xw + b))))gradient: ∇f (w) = ?

I Maple, Mathematica, Sage cannot do it

Page 10: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

Matrix Calculus

gradient computation

example: f (x) = xT Axgradient: ∇f (x) = (AT + A)x

example: logistic regressionf (w) = 1

2wT w + C · sum(log(1 + exp(−y . ∗ (Xw + b))))gradient: ∇f (w) = ?

I Maple, Mathematica, Sage cannot do it

Page 11: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

Matrix Calculus

gradient computation

example: f (x) = xT Axgradient: ∇f (x) = (AT + A)x

example: logistic regressionf (w) = 1

2wT w + C · sum(log(1 + exp(−y . ∗ (Xw + b))))

gradient: ∇f (w) = ?

I Maple, Mathematica, Sage cannot do it

Page 12: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

Matrix Calculus

gradient computation

example: f (x) = xT Axgradient: ∇f (x) = (AT + A)x

example: logistic regressionf (w) = 1

2wT w + C · sum(log(1 + exp(−y . ∗ (Xw + b))))gradient: ∇f (w) = ?

I Maple, Mathematica, Sage cannot do it

Page 13: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

Matrix Calculus

gradient computation

example: f (x) = xT Axgradient: ∇f (x) = (AT + A)x

example: logistic regressionf (w) = 1

2wT w + C · sum(log(1 + exp(−y . ∗ (Xw + b))))gradient: ∇f (w) = ?

I Maple, Mathematica, Sage cannot do it

Page 14: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

Matrix Calculus – Demo

Page 15: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

GENO stack

Matrix CalculusSymbolic Computation

Optimization ProblemSolver

Linear Algebra ExpressionsEigen

Page 16: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

GENO stack

Matrix CalculusSymbolic Computation

Optimization ProblemSolver

Linear Algebra ExpressionsEigen

Page 17: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

Optimization Problem Solver

I tested on many data analytics problems

I tested on CUTEst benchmark (1150 optimizationproblems)

I competitive on large problems

I room for improvement on small problems

Page 18: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

Optimization Problem Solver

I tested on many data analytics problems

I tested on CUTEst benchmark (1150 optimizationproblems)

I competitive on large problems

I room for improvement on small problems

Page 19: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

GENO stack

Matrix CalculusSymbolic Computation

Optimization ProblemSolver

Linear Algebra ExpressionsEigen

Page 20: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

GENO stack

Matrix CalculusSymbolic Computation

Optimization ProblemSolver

Linear Algebra ExpressionsEigen

Page 21: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

GENO stack

Matrix CalculusSymbolic Computation

Optimization ProblemSolver

Linear Algebra ExpressionsautoBLAS

Page 22: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

Motivation autoBLAS

GENO compute time

solver timefunction value

&gradient

Page 23: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

What takes most of the time?

EN KDSVM LR LSQ NNLSQ SPCA0

10

20

30

40

50

60

70

80

90

100 98

9296 95 95

100

Time spent on function value & gradient

Page 24: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

Computing gradient and function value

Example: logistic regression

t1 = ones(rows(X))t2 = exp(-y .* (X * w))t3 = t1 + t2g = w + c*X’ * ((t1 ./ t3) .* t2 .* -y)f = 0.5 * w’ * w + c * t1’ * log(t3)

Page 25: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

autoBLAS

I linear algebra expression pre-compiler

I host language independent

I decouples expression generation from evaluation

I performs symbolic expression optimization

I efficient temporary allocation and reuse

I expressive error-messages

I mapping to BLASI standard interface: portable

I efficient implementations available for all major hardwareplatforms and frameworks

Page 26: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

autoBLAS

I linear algebra expression pre-compiler

I host language independent

I decouples expression generation from evaluation

I performs symbolic expression optimization

I efficient temporary allocation and reuse

I expressive error-messages

I mapping to BLASI standard interface: portable

I efficient implementations available for all major hardwareplatforms and frameworks

Page 27: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

autoBLAS

I linear algebra expression pre-compiler

I host language independent

I decouples expression generation from evaluation

I performs symbolic expression optimization

I efficient temporary allocation and reuse

I expressive error-messages

I mapping to BLASI standard interface: portable

I efficient implementations available for all major hardwareplatforms and frameworks

Page 28: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

autoBLAS

I linear algebra expression pre-compiler

I host language independent

I decouples expression generation from evaluation

I performs symbolic expression optimization

I efficient temporary allocation and reuse

I expressive error-messages

I mapping to BLASI standard interface: portable

I efficient implementations available for all major hardwareplatforms and frameworks

Page 29: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

autoBLAS

I linear algebra expression pre-compiler

I host language independent

I decouples expression generation from evaluation

I performs symbolic expression optimization

I efficient temporary allocation and reuse

I expressive error-messages

I mapping to BLASI standard interface: portable

I efficient implementations available for all major hardwareplatforms and frameworks

Page 30: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

autoBLAS

I linear algebra expression pre-compiler

I host language independent

I decouples expression generation from evaluation

I performs symbolic expression optimization

I efficient temporary allocation and reuse

I expressive error-messages

I mapping to BLASI standard interface: portable

I efficient implementations available for all major hardwareplatforms and frameworks

Page 31: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

autoBLAS

I linear algebra expression pre-compiler

I host language independent

I decouples expression generation from evaluation

I performs symbolic expression optimization

I efficient temporary allocation and reuse

I expressive error-messages

I mapping to BLASI standard interface: portable

I efficient implementations available for all major hardwareplatforms and frameworks

Page 32: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

Declarative Language

Example: logistic regression

t1 = Vector(1, (rows(X)));t2 = exp(-y .* (X * w));t3 = t1 + t2;g = w + c*X’ * ((t1 ./ t3) .* t2 .* -y);f = 0.5 * w’ * w + c * t1’ * log(t3);

Page 33: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

autoBLAS

• C• Eigen-Classes• SQL-Tables• Python array

• expression to statementmapping

• expression optimzation• temporary allocation

• BLAS• Eigen3• MKL• NumPy• sqlBLAS

Frontend Core Backend

Page 34: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

Frontend

#autoblas {Vector y, w;Vector g f=c data=gPtr size=N;Matrix X sparse=csc;...

}

autoblas -f eigen -b mkl < file.cpp.in >file.cpp

Page 35: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

Frontend

#autoblas {Vector y, w;Vector g f=c data=gPtr size=N;Matrix X sparse=csc;...

}

autoblas -f eigen -b mkl < file.cpp.in >file.cpp

Page 36: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

Experiments

KDSVM LR0

2

4

6

8

10

12

Eigen with multicore MKL Eigen with autoBLAS TemporariesautoBLAS with multicore MKL

Sp

ee

du

p

Page 37: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

Experiments

1 2 3 4 5 6 7 8 Xeon Phi0

1

2

3

4

5

6

LSQ - 4GB dataset

# of cores

spe

ed

up

Page 38: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

Conclusion

I with autoBLAS, GENO scales to medium-sized datasets(a few dozen GB) within a single compute node

I exploits all available compute resources within a node(CPU, Xeon Phi, GPU)

I Outlook: go open source

Page 39: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

Workshop (Sparse) Linear Algebra for Graphs andOptimization

Organizers: Joachim Giesen, Sören Laue,Henning Meyerhenke

Venue: Institut für Informatik,Friedrich-Schiller-Universität Jena

Date: Spring 2016 (TBD)

If interested please send an email [email protected].

Page 40: Scaling up Generic Optimization using autoBLAS · Scaling up Generic Optimization using autoBLAS Joachim Giesen Lars Kühne Sören Laue Jens Müller Friedrich-Schiller-University

Workshop (Sparse) Linear Algebra for Graphs andOptimization

Schedule: Friday14:15 - 15:45: Linear Algebra, Graph Algorithms, and theGraphBLAS (Henning Meyerhenke)14:45 - 16:15: Coffee break16:15 - 17:45: Non-linear Optimization (Sören Laue/JoachimGiesen)19:00 Joint Dinner

Saturday9:15 - 10:45: A Practical Introduction to the Intel Xeon PhiArchitecture (Lars Kühne)10:45 - 11:15: Coffee break11:15 - 12:45: Linear Algebra on the Intel Xeon Phi (LarsKühne)

If interested please send an email [email protected].