32
1 © Bull, 2012 23/09/14 Yiannis Georgiou David Glesser Matthieu Hautreux Denis Trystram Adaptive Resource and Job Management for limited power consumption

Adaptive Resource and Job Management for limited power consumption · 2016. 6. 24. · [22] M. Etinski, J. Corbalan, J. Labarta, and M. Valero, “BSLD threshold driven power management

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

  • 1© Bull, 2012

    23/09/14 Yiannis GeorgiouDavid GlesserMatthieu HautreuxDenis Trystram

    Adaptive Resource and Job Managementfor limited power consumption

  • 2© Bull, 2012

    Introduction DVFS & Switch-of

    The model Algorithm and implementation Experimentations

    Conclusion and future works

  • 3© Bull, 2012

    Introduction DVFS & Switch-of

    The model Algorithm and implementation Experimentations

    Conclusion and future works

  • 4© Bull, 2014

    Introduction - Powercap

    Powercap: limit the power consumption during a certain amount of time

  • 5© Bull, 2014

    Introduction - Energy

    Why control? Power peak = O(power of a city) Power installations lifetime Electricity providers limitations Controling energy consumption = Controling cost

    How control?– DVFS– Switch-of

    • (or shutdown, or sleep mode, or hibernation...)

  • 6© Bull, 2012

    Introduction DVFS & Switch-of

    The model Algorithm and implementation Experimentations

    Conclusion and future works

  • 7© Bull, 2014

    The RJMS level – Switch-of

    Switch-of Switch-of some resources switched-of has a cost Not possible on all clusters Jobs can not run on switched-of nodes!

  • 8© Bull, 2014

    The RJMS level – Switch-of

    « Power Bonuses » If all components of a level are switched-of, the component of

    the upper level can be switched-of and provide an additional gain

    Exemples : Nodes are made of processors Chassis are made of nodes Rack are made of Chassis

    ...

  • 9© Bull, 2014

    The RJMS level – Switch-of

    « Power Bonuses » on CURIE cluster: 18 nodes per chassis, 5 chassis per rack Power gained by switching of a Chassis

    ~= Power(computing node) Power gained by switching of a Rack

    ~= 10 * Power(computing node)

    ...

    Computing node

    Switched-off node

    Chassis

    Rack

    - 344 W

    - 500 W

    - 3400 W

    ...

    ...

  • 10© Bull, 2014

    The RJMS level - DVFS

    DVFS It's a trade-of between performance and power

    consumption What about performance / energy trade-of ?

    ∫POWER .dt=Energy

  • 11© Bull, 2014

    The RJMS level - DVFS

    DVFS It's a trade-of between performance and power

    consumption What about performance / energy trade-of ?

  • 12© Bull, 2014

    The RJMS level - DVFS

    DVFS It's a trade-of between performance and power

    consumption What about performance / energy trade-of?

  • 13© Bull, 2014

    The RJMS level - DVFS

    DVFS is a trade-of between completion time and power

    No obvious performance / energy trade-of– Minimizing power != minimizing energy– The impact of DVFS is highly dependant on the job

  • 14© Bull, 2012

    Introduction DVFS & Switch-of

    The model Algorithm and implementation Experimentations

    Conclusion and future works

  • 15© Bull, 2014

    Our Model

    We work with maximum power consumptions W is the maximal computational work possible

    Powercap limitation

    W=T .(N−Noff−N dvfsσMax +N dvfsσMin )

    N off . Poff +N dvfs .PMin+(N−Noff−N dvfs) .PMax⩽P

    N X=numberof node in state XΣZ=speed degradationat state ZPY=power consumptionat YP= powercap

  • 16© Bull, 2014

    Our model

    ● In the space 3D (Ndvfs, Nof, W) is a plane

    is an half space

    ⇒ The intersection is a straight line● Within the bound of the total number of

    nodes, W is maximized when:

  • 17© Bull, 2014

    Our Model

    3 cases:

    – DVFS is better ⇒ we only use DVFS

    – Switch-of is better ⇒ we only use Switch-of

    – The powercap is so low that we should use both

  • 18© Bull, 2014

    Our model – switch-of or DVFS?

    How to choose ?

    When ρ

  • 19© Bull, 2014

    Our Model – DVFS or switch-of ?

    On CURIE cluster:

  • 20© Bull, 2012

    Introduction DVFS & Switch-of

    The model Algorithm and implementation Experimentations

    Conclusion and future works

  • 21© Bull, 2014

    The algorithm

    When a powercap limit is set Choose between DVFS and switch-of

    If DVFS When a job is being launched, Try to schedule it at the highest frequency

    If switch-of switch-of nodes at runtime, mark these nodes as « reserved » for the scheduler

  • 22© Bull, 2014

    The algorithm

    $ scontrol create res Watts=123151 ...

  • 23© Bull, 2012

    Introduction DVFS & Switch-of

    The model Algorithm and implementation Experimentations

    Conclusion and future works

  • 24© Bull, 2014

    Experimental validation

    Replay interesting parts of the CURIE workload– 5 hours, high utilization, jobs representative of the

    whole workload

    Slurm can emulate his environement– 336 Slurm nodes on 1 physical node– Sleep instead of real computational jobs

    Add a powercap Case study: 1 hour, in the middle of the trace, at diferent

    powers

  • 25© Bull, 2014

    Experimental validation

  • 26© Bull, 2014

    Experimental validation

    Idle Switch-off

  • 27© Bull, 2014

    Experimental validation

    Switch-off DVFS

  • 28© Bull, 2012

    Introduction DVFS & Switch-of

    The model Algorithm and implementation Experimentations

    Conclusion and future works

  • 29© Bull, 2014

    Current and future works

    Powercap on live power values– Implemented using Layouts

    Powercap on nodes

    DVFS– What about reproducibility of jobs runs?– To do DVFS right, we need to know the job

    Switch-of New scheduling algorithms Switch-of (with bonuses) whithout powercaps Switch-of particular components (cpus, gpus, network...)

  • 30© Bull, 2012

  • 31© Bull, 2014

    References

    [11] V. W. Freeh, D. K. Lowenthal, F. Pan, N. Kappiah, R. Springer, B. L. Rountree, and M. E. Femal, “Analyzing the energy-time trade-of in high-performance computing applications,” IEEE Transactions on Parallel and Distributed Systems, 2007.

    [22] M. Etinski, J. Corbalan, J. Labarta, and M. Valero, “BSLD threshold driven power management policy for HPC centers,” in 2010 IEEE International Symposium on Parallel Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010.

  • 32© Bull, 2014

    Experimental validation

    Diapo 1Diapo 2Diapo 3Diapo 4Diapo 5Diapo 6Diapo 7Diapo 8Diapo 9Diapo 10Diapo 11Diapo 12Diapo 13Diapo 14Diapo 15Diapo 16Diapo 17Diapo 18Diapo 19Diapo 20Diapo 21Diapo 22Diapo 23Diapo 24Diapo 25Diapo 26Diapo 27Diapo 28Diapo 29Diapo 30Diapo 31Diapo 32