40
A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Maxime Martinasso * , Grzegorz Kwasniewski , Sadaf R. Alam * , Thomas C. Schulthess *‡§ , Torsten Hoefler * Swiss National Supercomputing Centre, ETH Zurich, 6900 Lugano, Switzerland Department of Computer Science, ETH Zurich, Universit ¨ atstr. 6, 8092 Zurich, Switzerland Institute for Theoretical Physics, ETH Zurich, 8093 Zurich, Switzerland § Computer Science and Mathematics Division, Oak Ridge National Laboratory, USA

HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

A PCIe Congestion-Aware Performance Modelfor Densely Populated Accelerator Servers

Maxime Martinasso∗, Grzegorz Kwasniewski†, Sadaf R. Alam∗,Thomas C. Schulthess∗‡§, Torsten Hoefler†∗Swiss National Supercomputing Centre, ETH Zurich, 6900 Lugano, Switzerland

†Department of Computer Science, ETH Zurich, Universitatstr. 6, 8092 Zurich, Switzerland‡ Institute for Theoretical Physics, ETH Zurich, 8093 Zurich, Switzerland

§Computer Science and Mathematics Division, Oak Ridge National Laboratory, USA

Page 2: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

Why more densely populated accelerator servers?

accelerators are faster and more energy-efficient than CPU

densely populated accelerator servers are high performance nodes

reduce space occupancy of the data center

HPC Advisory Council – A PCIe performance model | 2

Page 3: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

Cray CS Storm – new MeteoSwiss supercomputer

2 cabinets

12 hybrid computing nodes per cabinet

2 Intel Haswell 12-core CPUs per node

8 NVIDIA Tesla K80 GPU acceleratorsper node

2 GPU processors per accelerator

192 GPU processors in total

360 GPU teraflops in total

Production system

GPUs connected by PCI-Express

HPC Advisory Council – A PCIe performance model | 3

Page 4: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

generation 3, 16 GB/s using x16 wide lane

dual simplex (a pair of unidirectional links)

exchange buffer availability between pair of ports of a link

tree-based topology

Building a densely populatedaccelerator servers with PCIe:

PortRCPCIe RootComplex

PCIe Switch GPU

Legend:

x16 Link

RC

CPU CPU

Root Complex

HPC Advisory Council – A PCIe performance model | 4

Page 5: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

generation 3, 16 GB/s using x16 wide lane

dual simplex (a pair of unidirectional links)

exchange buffer availability between pair of ports of a link

tree-based topology

Building a densely populatedaccelerator servers with PCIe:

PortRCPCIe RootComplex

PCIe Switch GPU

Legend:

x16 Link

RC

CPU CPU

Root Complex

GPUK80

HPC Advisory Council – A PCIe performance model | 4

Page 6: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

generation 3, 16 GB/s using x16 wide lane

dual simplex (a pair of unidirectional links)

exchange buffer availability between pair of ports of a link

tree-based topology

Building a densely populatedaccelerator servers with PCIe:

PortRCPCIe RootComplex

PCIe Switch GPU

Legend:

x16 Link

RC

CPU CPU

Root Complex

GPUK80

GPUK80

HPC Advisory Council – A PCIe performance model | 4

Page 7: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

generation 3, 16 GB/s using x16 wide lane

dual simplex (a pair of unidirectional links)

exchange buffer availability between pair of ports of a link

tree-based topology

Building a densely populatedaccelerator servers with PCIe:

PortRCPCIe RootComplex

PCIe Switch GPU

Legend:

x16 Link

RC

CPU CPU

Root Complex

GPUK80

GPUK80

GPUK80

HPC Advisory Council – A PCIe performance model | 4

Page 8: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

generation 3, 16 GB/s using x16 wide lane

dual simplex (a pair of unidirectional links)

exchange buffer availability between pair of ports of a link

tree-based topology

Building a densely populatedaccelerator servers with PCIe:

PortRCPCIe RootComplex

PCIe Switch GPU

Legend:

x16 Link

RC

CPU CPU

Root Complex

GPUK80

GPUK80

GPUK80

GPUK80

HPC Advisory Council – A PCIe performance model | 4

Page 9: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

generation 3, 16 GB/s using x16 wide lane

dual simplex (a pair of unidirectional links)

exchange buffer availability between pair of ports of a link

tree-based topology

Building a densely populatedaccelerator servers with PCIe:

PortRCPCIe RootComplex

PCIe Switch GPU

Legend:

x16 Link

RC

K80

0 1 2 3 4 5 6 7

CPU CPU

Root Complex

HPC Advisory Council – A PCIe performance model | 4

Page 10: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

Communication conflicts0→7, 1→4, 6→7

RC

0 1 2 3 4 5 6 7

0 1 2 3 4 5 6 7

8

12

16

18

13

9 10 11

1514

17

19

HPC Advisory Council – A PCIe performance model | 5

Page 11: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

Communication conflicts0→7, 1→4, 6→7

RC

0 1 2 3 4 5 6 7

0 1 2 3 4 5 6 7

8

12

16

18

13

9 10 11

1514

17

19

Upstream portconflict

HPC Advisory Council – A PCIe performance model | 5

Page 12: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

Communication conflicts0→7, 1→4, 6→7

RC

0 1 2 3 4 5 6 7

0 1 2 3 4 5 6 7

8

12

16

18

13

9 10 11

1514

17

19

Upstream portconflict

Downstream portconflict

HPC Advisory Council – A PCIe performance model | 5

Page 13: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

Communication conflicts0→7, 1→4, 6→7

RC

0 1 2 3 4 5 6 7

0 1 2 3 4 5 6 7

8

12

16

18

13

9 10 11

1514

17

19

Upstream portconflict

Downstream portconflict

Crossing the Root Complexconflict

HPC Advisory Council – A PCIe performance model | 5

Page 14: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

Communication conflicts0→7, 1→4, 6→7

RC

0 1 2 3 4 5 6 7

0 1 2 3 4 5 6 7

8

12

16

18

13

9 10 11

1514

17

19

Upstream portconflict

Downstream portconflict

Crossing the Root Complexconflict

Head-of-Line blocking

HPC Advisory Council – A PCIe performance model | 5

Page 15: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

RC

0 1 2 3 4 5 6 7

0 1 2 3 4 5 6 7

8

12

16

18

13

9 10 11

1514

17

19

Upstream portconflict

Downstream portconflict

Crossing the Root Complexconflict

Head-of-Line blocking

Conflicts identify by:

Carefully reading the PCIe and vendor documentation

Self experiments with micro-benchmarks (looking for HoL Blocking)

HPC Advisory Council – A PCIe performance model | 5

Page 16: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

Motivation – COSMO halo exchange

GPU0 GPU1

GPU5GPU4

GPU3GPU2

GPU6 GPU7

2D domain decomposition GPU0 GPU1

GPU4

GPU3

GPU7GPU6

GPU2

GPU5

3D domain decomposition

Which order of communications is the fastest?

0→4 1→0 2→1 3→7 4→0 5→4 6→2 7→60→1 1→2 2→6 3→2 4→5 5→6 6→7 7→3

1→5 2→3 5→1 6→5

0→1 1→0 2→1 3→2 4→0 5→1 6→2 7→30→4 1→5 2→6 3→7 4→5 5→6 6→5 7→6

1→2 2→3 5→4 6→7. . .

HPC Advisory Council – A PCIe performance model | 6

Page 17: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

Motivation – COSMO halo exchange

GPU0 GPU1

GPU5GPU4

GPU3GPU2

GPU6 GPU7

2D domain decomposition GPU0 GPU1

GPU4

GPU3

GPU7GPU6

GPU2

GPU5

3D domain decomposition

Which order of communications is the fastest?

0→4 1→0 2→1 3→7 4→0 5→4 6→2 7→60→1 1→2 2→6 3→2 4→5 5→6 6→7 7→3

1→5 2→3 5→1 6→5

0→1 1→0 2→1 3→2 4→0 5→1 6→2 7→30→4 1→5 2→6 3→7 4→5 5→6 6→5 7→6

1→2 2→3 5→4 6→7. . .

2D domain decomposition example: 20, 376 possibilities

3D domain decomposition has more than 1.6 Million possibilities

HPC Advisory Council – A PCIe performance model | 6

Page 18: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

PCIe performance model

We want to identify the congestion factors ρ ∈ [0, 1] which limit theavailable bandwidth per communication at each communication phase.

Step (D)Step (C)Step (B)Step (A)

Update messages

Head-of-LineBlocking

Downstreamport

conflicts+ RC

Upstreamport

conflicts

Source Arbitration

Communicationgraph

model: compute

Advance to next time stepUpdate communication graph

Remainingmessages?

Yes

No

End

HPC Advisory Council – A PCIe performance model | 7

Page 19: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

Communication phase – update messages

Elapsed time Lc , message size Mc , set of communication phases Sc :

Time

LC0 = t0 = 1B ·

m0,0

ρ0,0and MC0 = m0,0

HPC Advisory Council – A PCIe performance model | 8

Page 20: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

Communication phase – update messages

Elapsed time Lc , message size Mc , set of communication phases Sc :

Time

LC0 = t0 + t1 = 1B · (

m0,0

ρ0,0+

m0,1

ρ0,1) and MC0 = m0,0 + m0,1

HPC Advisory Council – A PCIe performance model | 8

Page 21: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

Communication phase – update messages

Elapsed time Lc , message size Mc , set of communication phases Sc :

Time

Lc =∑

i∈Scti =

1B

∑i∈Sc

mc,i

ρc,iand Mc =

∑i∈Sc

mc,i

(1) start times tstarti and Mc are known(2) ρc,i are given by the model

with (1) and (2) Lc are computable

HPC Advisory Council – A PCIe performance model | 8

Page 22: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

Model conflicts on switch

We want to identify the congestion factors ρ ∈ [0, 1] which limit theavailable bandwidth per communication at each communication phase.

Each communication enters a switch with a congestion factor ρ andleaves with a congestion factor ρ′

If∑

i ρi > 1 then an arbitration policy is required,ρ′ = ρ otherwise

HPC Advisory Council – A PCIe performance model | 9

Page 23: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

Upstream port conflict

Proportional sharing of availablebandwidth

HPC Advisory Council – A PCIe performance model | 10

Page 24: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

Downstream port conflict

Round-robin policy

Performance reduction for crossingthe root complex

CR – set of communications crossing the root complex;n – number of grouped communication sets;R – congestion factor of a grouped communication set;τ – congestion factor for crossing the root complex;

– if CR = ∅ then

R′ =1

n

– if CR 6= ∅ then

R′ =

{min(max( 1

n − τ, 0),R) if R contains comm. ∈ CR

min( 1n + τ,R) otherwise

HPC Advisory Council – A PCIe performance model | 11

Page 25: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

Complete example

Step (D)Step (C)Step (B)Step (A)

Update messages

Head-of-LineBlocking

Downstreamport

conflicts+ RC

Upstreamport

conflicts

Source Arbitration

RC

0 1 2 3 4 5 6 7

0 1 2 3 4 5 6 7

8

12

16

18

13

9 10 11

1514

17

19

τ = 0.2Message size: 300MB

Bandwidth: 11.6 GB/s

comm. Step (A) Step (B) Step (C) Step (D)

(a) 0→2 1 1/2 1/2 3/10 3/10

(b) 1→4 1 1/2 3/10 3/10 3/10

(c) 3→2 1 1 1/2 1/2 7/10

(d) 6→4 1 1 7/10 7/10 7/10

Congestion graph step 1

comm. cong. factor data remaining elapsed time

(a) 0→2 3/10 128 MB 36 ms

(b) 1→4 3/10 128 MB 36 ms

(c) 3→2 7/10 0 MB 36 ms

(d) 6→4 7/10 0 MB 36 ms

Congestion graph step 2

comm. cong. factor data remaining elapsed time

(a) 0→2 1/2 0 MB 65 ms

(b) 1→4 1/2 0 MB 65 ms

HPC Advisory Council – A PCIe performance model | 12

Page 26: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

Model Validation

Architecture parameters:B = 11.6GB/sτ = 0.1735

22,259 graphs:non-isomorphiccudaMemcpyAsyncCommunication pattern:scatter, gather, all-to-allEntire set of graphs forsubsets of GPUsRandomly generated'100K communications

Message size: 300 MB

Time no contention:Tref = 25.3ms

95% of communication are in range +/- 15%

HPC Advisory Council – A PCIe performance model | 13

Page 27: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

Back to the motivation – COSMO halo exchangeUpper limit on time to solution, throughput approach

Running mode one instance per socket (8 GPUs)

Large domain size 256x256x80 per GPU

One step triggers 312 halo exchanges

Message size: 40 KB to 254 KB

Uses MPI

(3!)4 × (2!)4 = 20, 736 communication graphs for 2D domain

Congestion graphs sorted by elapsed time0.14

0.16

0.18

0.20

0.22

0.24

0.26

0.28

Tim

e [

s]

current implemented graph in COSMO

1.6x 1.9x

HPC Advisory Council – A PCIe performance model | 14

GPU0 GPU1

GPU5GPU4

GPU3GPU2

GPU6 GPU7

2D domain decomposition GPU0 GPU1

GPU4

GPU3

GPU7GPU6

GPU2

GPU5

3D domain decomposition

Page 28: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

Fastest schedule for 2D decomposition

RC

0 1 2 3 4 5 6 7

RC

0 1 2 3 4 5 6 7

RC

0 1 2 3 4 5 6 7

4 5 6 7

0 1 2 3

4 5 6 7

0 1 2 3

4 5 6 7

0 1 2 3

HPC Advisory Council – A PCIe performance model | 15

Page 29: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

Fastest schedule for 2D decomposition

HPC Advisory Council – A PCIe performance model | 15

RC

0 1 2 3 4 5 6 7

4 5 6 7

0 1 2 3

RC

Page 30: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

Fastest schedule for 2D decomposition

HPC Advisory Council – A PCIe performance model | 15

RC

0 1 2 3 4 5 6 74 5 6 7

0 1 2 3

RC

Page 31: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

Fastest schedule for 2D decomposition

HPC Advisory Council – A PCIe performance model | 15

RC

0 1 2 3 4 5 6 7

4 5 6 7

0 1 2 3

RC

Page 32: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

Fastest schedule for 3D decomposition

RC

0 1 2 3 4 5 6 7

RC

0 1 2 3 4 5 6 7

RC

0 1 2 3 4 5 6 7

4 5 6 7

0 1 2 3

4 5 6 7

0 1 2 3

4 5 6 7

0 1 2 3

HPC Advisory Council – A PCIe performance model | 16

Page 33: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

Fastest schedule for 3D decomposition

HPC Advisory Council – A PCIe performance model | 16

RC

RC

0 1 2 3 4 5 6 7

4 5 6 7

0 1 2 3

Page 34: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

Fastest schedule for 3D decomposition

HPC Advisory Council – A PCIe performance model | 16

RC

RC

0 1 2 3 4 5 6 74 5 6 7

0 1 2 3

Page 35: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

Fastest schedule for 3D decomposition

HPC Advisory Council – A PCIe performance model | 16

RC

RC

0 1 2 3 4 5 6 7

4 5 6 7

0 1 2 3

Page 36: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

COSMO improvement – fastest schedule

RC

0 1 2 3 4 5 6 7

RC

0 1 2 3 4 5 6 7

RC

0 1 2 3 4 5 6 7

4 5 6 7

0 1 2 3

4 5 6 7

0 1 2 3

4 5 6 7

0 1 2 3

COSMO gain: 5.6% per halo exchange step,gain is limited by MPI 2-sided overhead,

initial tests with 1-sided seems to clearly improve the performance.

HPC Advisory Council – A PCIe performance model | 17

Page 37: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

Conclusion

- Latency not modeled

- MPI 2-sided overhead not modeled (use one-sided in progress)

+ Captures all PCIe features including congestion

+ Simple model only 2 parameters (B and τ )

+ Precise for large messages

+ Design of topology-aware algorithms

+ COSMO halo exchange performance gain

HPC Advisory Council – A PCIe performance model | 18

Page 38: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

Thank you for your attention.

Page 39: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

2 8 32 128 512 2K 8K 32K 128K 512K 2M 8M 32MMessage size [Byte]

0

1

2

3

4

5

6

7

8

9

10

11

12

Measu

red b

andw

itdh [

GB

/s]

1.88xslower forconflict 1

1.80xslower forconflict 2

1.44xslower forconflict 3

1.21x

0→ 1 or 0→ 3 alone4→ 1 alone0→ 3 in conflict 10→ 1 in conflict 20→ 1 in conflict 3

τ = 1− 1/1.21

HPC Advisory Council – A PCIe performance model | 19

Page 40: HPC-AI Advisory Council - A community effort support ... · Title: A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Author: Maxime Martinasso[0pt][0pt]*,

RCSocket

K80

TOPOLOGY T2

0 1 2 3 4 5 6 7

PortRCPCIe RootComplex

PCIe Switch GPU

K80

TOPOLOGY T1

0 1 2 3 4 5 6 7

RCSock

et

Legend:

x16 Link

HPC Advisory Council – A PCIe performance model | 20