
© The 2018 International Conference on Artificial Life and Robotics (ICAROB2018), Feb. 1-4, B-Con Plaza, Beppu, Oita, Japan

Harnessing over a Million CPU Cores to Solve a Single Hard Mixed Integer Programming Problem on a Supercomputer

Yuji Shinano

Mathematical Optimization and Scientific Information, Zuse Institute Berlin, Takustrasse 7, 14195 Berlin, Germany

E-mail: [email protected]

Abstract

The performance of mixed integer programming (MIP) solvers has improved tremendously in the last two decades, and these solvers have been used to solve many real-world problems. ParaSCIP is the most successful parallel MIP solver in terms of solving previously unsolvable instances from the well-known benchmark instance set MIPLIB by using supercomputers. ParaSCIP has been developed by using the Ubiquity Generator (UG) framework, which is a general software package to parallelize any state-of-the-art branch-and-bound based solver. ParaSCIP is a parallelization of the single-threaded MIP solver SCIP; ParaXpress is the corresponding UG-based parallelization of the solver Xpress. Since Xpress is a multi-threaded solver and ParaSCIP can run at least 80,000 processes in parallel for solving a single MIP, ParaXpress could handle over a million CPU cores. In this talk, the basic design of the UG framework and its latest extensions to harness over a million CPU cores will be presented, and preliminary computational results will be provided.

Keywords: mixed integer programming problem, massively parallel computing, SCIP, ParaSCIP, ParaXpress, UG

Slides

Harnessing over a Million CPU Cores to Solve a Single Hard Mixed Integer Programming Problem on a Supercomputer

Yuji Shinano, Zuse Institute Berlin

04/02/2018

ug[SCIP,*]: joint work with Tobias Achterberg, Timo Berthold, Gerald Gamrath, Stefan Heinz, Thorsten Koch, Stephen J. Maher, Daniel Rehfeldt, Stefan Vigerske, Michael Winkler

ug[Xpress,*]: joint work with Timo Berthold, Stefan Heinz

ug[PIPS-SBB,MPI]: joint work with Lluis Miquel Munguia, Geoffrey Oxberry, Deepak Rajan

UGS: joint work with Lluis Miquel Munguia

Outline

- Background and Purpose
  - State-of-the-art Mixed Integer Programming (MIP) solvers
  - Parallelization of MIP solvers
- Ubiquity Generator (UG) framework and ParaSCIP
- Computational results for solving previously unsolved MIP instances on supercomputers
- How to harness over a million CPU cores
- Concluding remarks


Background and Purpose

MIP (Mixed Integer Linear Programming)
- minimizes or maximizes a linear function
- is subject to linear constraints
- has integer and continuous variables

\[
\min\{\, c^{\top} x : Ax \le b,\ l \le x \le u,\ x_j \in \mathbb{Z} \text{ for all } j \in I \,\}
\]
\[
A \in \mathbb{R}^{m \times n},\quad b \in \mathbb{R}^{m},\quad c, l, u \in \mathbb{R}^{n},\quad I \subseteq \{1, \dots, n\}
\]

The most general form of combinatorial optimization problems

Many applications

MIP solvability has been improving

Progress in a state-of-the-art MIP solver

Time limit: 10,000 sec.

Test set: 3,741 models
- 235 discarded due to inconsistent answers
- 934 discarded because none of the versions could solve them
- speed-up measured on the >100 s bracket: 1,205 models

Development of a massively parallel MIP solver
- can solve instances that cannot be solved by state-of-the-art MIP solvers
- keeps up with the performance improvements of state-of-the-art MIP solvers

Parallelization of MIP solvers

Branch-and-bound looks suitable for parallelization
- MIP solvers: LP-based branch-and-cut algorithm

Subproblems (sub-MIPs) can be processed independently

Utilize the large number of processors for solving extremely hard MIP instances (previously unsolved problem instances from MIPLIB)

[Figure: branch-and-bound tree with branching constraints x_1 <= 0 / x_1 >= 1, x_2 <= 0 / x_2 >= 1, x_5 <= 0 / x_5 >= 1; each node is a subproblem (sub-MIP)]

Subproblem (sub-MIP) i:

\[
\min\{\, c^{\top} x : Ax \le b,\ l_i \le x \le u_i,\ x_j \in \mathbb{Z} \text{ for all } j \in I \,\}
\]

[Figure: objective value versus time, with the gap marked]
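A minimal sketch of why sub-MIPs can be processed independently: each subproblem shares c, A, b and I with the root problem and differs only in its bound vectors l_i and u_i. The `SubMIP` class and `branch` function are hypothetical names chosen for illustration; they are not part of SCIP or UG.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SubMIP:
    """A subproblem is described only by its bound vectors (l_i, u_i)."""
    lower: Tuple[float, ...]
    upper: Tuple[float, ...]


def branch(node: SubMIP, j: int, value: int) -> List[SubMIP]:
    """Split on integer variable x_j (0-based index j).

    The first child adds x_j <= value, the second adds x_j >= value + 1.
    """
    down_upper = list(node.upper)
    down_upper[j] = min(node.upper[j], value)
    up_lower = list(node.lower)
    up_lower[j] = max(node.lower[j], value + 1)
    return [
        SubMIP(node.lower, tuple(down_upper)),
        SubMIP(tuple(up_lower), node.upper),
    ]


# Root problem with three binary variables; branch on the first one,
# which corresponds to x_1 <= 0 and x_1 >= 1 in the tree figure.
root = SubMIP(lower=(0, 0, 0), upper=(1, 1, 1))
left, right = branch(root, j=0, value=0)
```

Because a child carries everything needed to define its sub-MIP, it can be handed to any other solver process without further reference to the parent.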


Performance of state-of-the-art MIP Solvers

MIP solver benchmark (1 thread): shifted geometric mean of results taken from the homepage of Hans Mittelmann (23/Mar/2014). Unsolved or failed instances are accounted for with the time limit of 1 hour.

Huge performance difference!

As of 14/April/2017
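For readers unfamiliar with the metric, the shifted geometric mean of running times can be sketched as follows. The shift value of 10 seconds is an assumption (the slide does not state it), and the function name is hypothetical; unsolved or failed runs are charged the 1-hour time limit, as the slide describes.

```python
import math
from typing import Optional, Sequence


def shifted_geometric_mean(
    times: Sequence[Optional[float]],
    shift: float = 10.0,        # assumed shift; not given in the slides
    time_limit: float = 3600.0  # 1 hour, as stated on the slide
) -> float:
    """Return exp(mean(log(t + shift))) - shift over all instances.

    A value of None marks an unsolved or failed instance and is
    counted with the time limit.
    """
    capped = [time_limit if t is None else min(t, time_limit) for t in times]
    log_sum = sum(math.log(t + shift) for t in capped)
    return math.exp(log_sum / len(capped)) - shift
```

The shift damps the influence of very easy instances, so a solver is not rewarded disproportionately for shaving fractions of a second off trivial models.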

Solving techniques involved in SCIP

Dynamic load balancing is needed
- A highly unbalanced tree is generated
- Two types of irregularity can be handled well
  - Irregular number of nodes generated by a sub-MIP (from 1 to 1,297,605)
  - Irregular computing time for solving a node (from 0.001 sec. to 1.5 h)

Real observation for solving ds in parallel with 4,095 solvers

GAMS and Condor: M.R.Bussieck and M.C.Ferris (2006)


Dynamic load balancing

[Diagram: a LoadCoordinator connected to Solver 1, Solver 2, ..., Solver n]

LoadCoordinator: global view of tree search

Notification message: best dual bound, number of nodes remaining, number of nodes solved
- Sent periodically and asynchronously
- Interval is specified by a parameter

Open nodes: p
- Try to keep p open nodes in the LoadCoordinator
- The LoadCoordinator puts selected Solvers into collecting mode
- Expected to have heavy nodes: nodes with a large subtree underneath

[Figure: objective function value versus computing time (sec.), showing incumbents, the optimal value, global lower bounds, and the dual bounds of Solver1 to Solver39]

Collecting mode Solver: a Solver which has the best dual bound node
- Changes its search strategy to best dual bound first
- Sends the requested number of nodes

[Diagram labels on the LoadCoordinator's open nodes: p and p*mp]
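The selection of collecting mode Solvers can be illustrated with a small sketch. It assumes a minimization problem (so the best dual bound is the smallest lower bound), and it simplifies the stopping rule to "the pool holds at least p nodes"; the role of the p*mp level in the diagram is not modelled. All names (`SolverStatus`, `update_collecting_mode`, `max_collectors`) are hypothetical and do not reflect UG's actual interface.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SolverStatus:
    """Latest notification message received from one Solver."""
    solver_id: int
    best_dual_bound: float
    nodes_left: int
    nodes_solved: int
    collecting: bool = False


def update_collecting_mode(
    statuses: List[SolverStatus],
    pool_size: int,
    p: int,
    max_collectors: int,
) -> List[int]:
    """Choose which Solvers should be in collecting mode.

    If the coordinator already holds at least p open nodes, nobody
    collects. Otherwise the Solvers with the best (smallest) dual
    bounds that still have nodes to spare are asked to send nodes.
    Returns the chosen solver ids, best dual bound first.
    """
    if pool_size >= p:
        for status in statuses:
            status.collecting = False
        return []
    candidates = sorted(
        (s for s in statuses if s.nodes_left > 1),
        key=lambda s: s.best_dual_bound,
    )
    chosen = [s.solver_id for s in candidates[:max_collectors]]
    for status in statuses:
        status.collecting = status.solver_id in chosen
    return chosen
```

Picking the Solvers with the best dual bounds matches the slide's expectation that the collected nodes are heavy ones, which keeps idle Solvers busy for longer once the nodes are redistributed.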

Why can it handle large scale?

Collecting mode Solver: the number of such Solvers at a time is restricted
- Starts from 1
- The number is increased by at most 250, even if run with 80,000 Solvers
- Dynamically switching

Layered presolving

Global view of tree search
- A: original (sub-)problem -> A': presolved (sub-)problem
- A': original (sub-)problem -> A'': presolved (sub-)problem

\[
\min\{\, c'^{\top} x : A'x \le b',\ l' \le x \le u',\ x_j \in \mathbb{Z} \text{ for all } j \in I' \,\}
\]


Checkpointing

Only the essential nodes are saved, depending on the run-time situation.

Restarting

Huge trees might be thrown away, but the saved nodes' dual bound values are calculated more precisely.

Main results for MIP solving by ParaSCIP

- November 2015: triptim3 was solved with 864 cores in 9.5 hours
- December 2015: rmine10 was solved with 48 restarted jobs

Supercomputers:
- Titan: Cray XK7, Opteron 6274 16C 2.2 GHz, Cray Gemini interconnect
- ISM: Fujitsu PRIMERGY RX200S5
- HLRN II: SGI Altix ICE 8200EX (Xeon QC E5472 3.0 GHz/X5570 2.93 GHz)
- HLRN III: Cray XC30 (Intel Xeon E5-2695v2 12C 2.400 GHz, Aries interconnect)

The biggest and the longest computation

Solving rmine10: 48 restarted runs with 6,144 to 80,000 cores (Titan with 80,000 cores; the others on HLRN III)

[Figure: How upper and lower bounds evolved. Objective function value (incumbents and global lower bounds) versus computing time (sec.)]

[Figure: How open nodes and active solvers evolved. Number of nodes left, number of nodes in the check-point file, and number of active solvers + 1 versus computing time (sec.)]

It took about 75 days and 6,405 CPU core years!

UG can handle up to 80,000 MPI processes

Combining with internal parallelization

ug[Xpress,MPI]: ParaXpress
- A powerful massively parallel MIP solver
- Can handle, hopefully efficiently, up to 80,000 (MPI processes) x 24 (threads) = 1,920,000 (cores)

[Diagram: ParaXpress architecture. Processing elements (compute nodes) are linked by a connection network. Each compute node has shared memory and hosts UG Solver processes, with the ParaXpress LoadCoordinator process on one node; each UG Solver process runs Xpress with several threads, one per CPU core.]
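The core count above is a simple product of external (MPI) and internal (thread) parallelism. The sketch below reproduces that arithmetic and, as an illustration, computes how many MPI processes are needed to pass one million cores; it assumes one thread per core, and the function names are hypothetical.

```python
def total_cores(mpi_processes: int, threads_per_process: int) -> int:
    """Cores used when every MPI process runs the given number of threads."""
    return mpi_processes * threads_per_process


def processes_needed(target_cores: int, threads_per_process: int) -> int:
    """Smallest number of MPI processes whose threads cover target_cores."""
    return (target_cores + threads_per_process - 1) // threads_per_process


print(total_cores(80_000, 24))          # 1920000
print(processes_needed(1_000_000, 24))  # 41667
```

With 24 threads per process, roughly half of the 80,000 MPI processes that UG has already handled would be enough to exceed one million cores.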

Solving open instances of MIPLIB2010

[Figure: number of open instances solved per year, 2011 to 2017, by UG versus not UG (commercial solvers and SCIP). Annotations: MIPLIB2010 was published; HLRN II -> HLRN III (7,000 cores -> 43,000 cores); ParaXpress started; new ParaXpress. Source: http://miplib.zib.de]

75 open instances left


Conclusions

UG is a general framework to parallelize any kind of state-of-the-art branch-and-bound based solver.

ug[SCIP,*] is a tool to develop parallel general branch-and-cut solvers. A customized SCIP solver can be parallelized with minimal effort.

ug[SCIP-Jack,*] is a solver for Steiner tree problems and their variants, and the only such solver that can run in a distributed-memory computing environment (it solved three open benchmark instances).

ug[Xpress,MPI] (= ParaXpress) and ug[PIPS-SBB,MPI] are ready to run on over a million CPU cores.

UGS is another general framework to configure a parallel solver that can realize any combination of algorithm portfolio and racing.


UGS: UG synthesizer

Run different solvers with different configurations in parallel
- Runs many different solvers in parallel as an MPMD (Multiple Program, Multiple Data) model MPI program
- UGS mediates solution sharing
- The solvers can be distributed memory solvers

[Diagram: UGS mediating solution sharing among ugsXpress (config1), ugsXpress (config2), ugsGurobi (config1), ugsCPLEX (config1), ugsParaXpress (config1), and ugsPAC_Xpress (config1); the legend distinguishes B&B solvers from heuristic solvers]

The first job: multiple ParaXpresses run with different configurations, together with completely different solver implementations, sharing incumbent solutions

The following jobs: restarted from the most promising checkpoint file