Refactoring Conventional Task Schedulers to Exploit Asymmetric ARM big.LITTLE Architectures in Dense Linear Algebra

Luis Costero, Francisco D. Igual, Katzalin Olcoz

Sandra Catalán, Rafael Rodríguez-Sánchez, Enrique S. Quintana-Ortí

https://www.youtube.com/watch?v=KClygZtp8mA

Task parallelism

Contribution

Asymmetry-oblivious scheduler + Asymmetry-aware DLA library

Task parallelism + Data parallelism

Virtual Cores

Software execution models for ARM big.LITTLE

Target architecture

Execution Models

Cluster switching mode

CPU Migration

Global task scheduling

Parallel execution of DLA operations on multi-threaded architectures

A = U^T U

Runtime task scheduling of DLA operations

● Task scheduling for the Cholesky factorization
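A minimal sketch of how a runtime could derive the task graph for a tiled Cholesky factorization A = U^T U. The function names (`cholesky_tasks`, `build_dag`) and the tuple layout are illustrative assumptions, not taken from the slides or from any specific runtime; the kernel names follow the usual LAPACK/BLAS naming.

```python
def cholesky_tasks(s):
    """List the tasks of a tiled A = U^T U factorization on an s x s tile grid.

    Each task is (kernel, output_tile, input_tiles). Hypothetical layout,
    chosen only for this illustration.
    """
    tasks = []
    for k in range(s):
        # Factor the diagonal tile: A_kk = U_kk^T U_kk
        tasks.append(("POTRF", (k, k), []))
        for j in range(k + 1, s):
            # Triangular solve on the row panel: A_kj = U_kk^-T A_kj
            tasks.append(("TRSM", (k, j), [(k, k)]))
        for i in range(k + 1, s):
            # Symmetric update of a diagonal tile: A_ii -= A_ki^T A_ki
            tasks.append(("SYRK", (i, i), [(k, i)]))
            for j in range(i + 1, s):
                # General update of an off-diagonal tile: A_ij -= A_ki^T A_kj
                tasks.append(("GEMM", (i, j), [(k, i), (k, j)]))
    return tasks


def build_dag(tasks):
    """Map each task id to the ids it depends on.

    A task depends on the last writer of every tile it reads or updates.
    """
    last_writer = {}
    deps = {}
    for tid, (_, out, reads) in enumerate(tasks):
        touched = list(reads) + [out]
        deps[tid] = sorted({last_writer[t] for t in touched if t in last_writer})
        last_writer[out] = tid
    return deps


tasks = cholesky_tasks(3)
deps = build_dag(tasks)
for tid, (kernel, out, _) in enumerate(tasks):
    print(tid, kernel, out, "depends on", deps[tid])
```

Tasks whose dependency list is already satisfied can run concurrently, which is the task-level parallelism the scheduler exploits.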

Runtime task scheduling of DLA operations

● Task scheduling in heterogeneous architectures
  – The runtime distinguishes between CPU and GPU targets: OmpSs, StarPU, MAGMA, libflame
  – Tasks are assigned depending on target properties, and target-specific techniques are applied

Runtime task scheduling of DLA operations

● Task scheduling in asymmetric architectures
  – Asymmetry-conscious runtime: Botlev-OmpSs
  – Criticality-aware Task Scheduler policy
  – Each task is mapped to a single core
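A rough sketch of the general idea behind a criticality-aware policy, assuming criticality is estimated from each task's bottom level (the longest path from the task to the end of the graph). This is not the Botlev-OmpSs implementation; the function names, the example graph and the task costs are all hypothetical.

```python
def bottom_levels(succ, cost):
    """Return the cost-weighted longest path from each task to a sink."""
    memo = {}

    def visit(t):
        if t not in memo:
            tail = max((visit(s) for s in succ.get(t, [])), default=0)
            memo[t] = cost[t] + tail
        return memo[t]

    for t in cost:
        visit(t)
    return memo


def pick_core_type(task, levels, ready):
    """Send the most critical ready task to a big core, the rest to LITTLE cores."""
    critical = max(ready, key=lambda t: levels[t])
    return "big" if task == critical else "LITTLE"


# Hypothetical diamond-shaped graph: a -> {b, c} -> d
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
cost = {"a": 1, "b": 3, "c": 1, "d": 1}
levels = bottom_levels(succ, cost)
for t in ("b", "c"):
    print(t, "->", pick_core_type(t, levels, ready=["b", "c"]))
```

In this model each task still runs on exactly one core, so a long task placed on a LITTLE core can delay everything that depends on it.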

Data parallel libraries of BLAS3 kernels

● Multi-threaded implementation of the BLAS-3

Data parallel libraries of BLAS3 kernels

● Data-parallel libraries for asymmetric architectures:
  – Global Task Scheduling
  – Dynamic workload distribution between the clusters
  – Static workload distribution in a cluster
  – Specific loop strides for each type of core
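The distribution scheme listed above can be illustrated with a small simulation: clusters grab chunks of a loop dynamically, each with its own stride, and every chunk is then split statically among the cores of that cluster. The chunk sizes, relative speeds and function names below are assumptions made for the example, not values from the slides or from BLIS.

```python
def distribute(n_iters, chunk, speed):
    """Simulate dynamic chunk grabbing between clusters.

    chunk[c] is the loop stride of cluster c, speed[c] its relative throughput.
    Returns the iteration ranges each cluster ends up processing.
    """
    clock = {c: 0.0 for c in chunk}   # simulated time at which each cluster is free
    ranges = {c: [] for c in chunk}
    start = 0
    while start < n_iters:
        # The cluster that becomes free first takes the next chunk.
        c = min(clock, key=lambda name: clock[name])
        end = min(start + chunk[c], n_iters)
        ranges[c].append((start, end))
        clock[c] += (end - start) / speed[c]
        start = end
    return ranges


def split_static(lo, hi, n_cores):
    """Split [lo, hi) into n_cores contiguous, near-equal pieces."""
    size, extra = divmod(hi - lo, n_cores)
    pieces, s = [], lo
    for i in range(n_cores):
        e = s + size + (1 if i < extra else 0)
        pieces.append((s, e))
        s = e
    return pieces


# Hypothetical strides and speeds: the big cluster takes larger chunks.
ranges = distribute(1000, chunk={"big": 160, "LITTLE": 40},
                    speed={"big": 4.0, "LITTLE": 1.0})
for cluster, chunks in ranges.items():
    total = sum(e - s for s, e in chunks)
    print(cluster, "iterations:", total, "first chunk per core:",
          split_static(*chunks[0], n_cores=4))
```

The dynamic step balances load between clusters of different speed, while the static step avoids synchronization overhead among the identical cores inside one cluster.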

Retargeting existing task schedulers to asymmetric architectures

Evaluation of conventional runtimes on AMPs

Combining conventional runtimes with asymmetric libraries

● GTS model (inspired by CPUM)
  – Virtual cores composed of 1 A15 + 1 A7
  – Both cores are active simultaneously

● Parallelism:
  – Task-level: symmetric runtime
  – Data-level: asymmetric library
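A small sketch of the virtual-core idea, assuming each virtual core pairs one A15 with one A7 and the runtime treats the pairs as identical workers. The core ids, the `big_share` ratio and the function names are hypothetical; in practice the split inside a virtual core would be decided by the asymmetry-aware library.

```python
def make_virtual_cores(big_ids, little_ids):
    """Pair each big core with one LITTLE core; the runtime sees one worker per pair."""
    return [{"big": b, "little": l} for b, l in zip(big_ids, little_ids)]


def assign_round_robin(task_names, vcores):
    """Symmetric scheduling: every virtual core is treated as an identical worker."""
    return {name: i % len(vcores) for i, name in enumerate(task_names)}


def split_task(vcore, n_rows, big_share=0.8):
    """Data-level split of one task between the two physical cores of a virtual core.

    big_share is an assumed fraction of the work given to the big core.
    """
    big_rows = round(n_rows * big_share)
    return {vcore["big"]: big_rows, vcore["little"]: n_rows - big_rows}


# Hypothetical numbering: cores 4-7 are A15 (big), cores 0-3 are A7 (LITTLE).
vcores = make_virtual_cores(big_ids=[4, 5, 6, 7], little_ids=[0, 1, 2, 3])
placement = assign_round_robin(["POTRF", "TRSM", "SYRK", "GEMM", "GEMM"], vcores)
print(placement)
print(split_task(vcores[0], n_rows=100))
```

Because the scheduler only ever sees uniform virtual cores, it needs no knowledge of the asymmetry; both physical cores of a pair work on the same task at the same time.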

Combining conventional runtimes with asymmetric libraries

● Comparison with other approaches:
  ✔ Any conventional task scheduler will work transparently with no special modifications
  ✔ Any improvement in the runtime will impact the performance on an AMP
  ✔ Any improvement in the asymmetry-aware library will impact the performance on an AMP
  ✗ Requires a tuned asymmetry-aware DLA library

Experimental results

Performance evaluation of the asymmetric BLIS

Integration of the asymmetric BLIS in a conventional task scheduler

Performance comparison versus asymmetry-aware task scheduler

Conclusions

In this work...

● Task-parallelism + Data-parallelism on AMPs
● Reuse of existing task schedulers
● Competitive with asymmetry-aware schedulers

Thank you