A Tale of Two Systems: Flexibility of Usage of Kraken and Nautilus at the National Institute for Computational Sciences

Amy F. Szczepański*‡, Jian Huang*‡, Sean Ahern*†, Mark Fahey†¶

* Remote Data Analysis and Visualization Center
† National Institute for Computational Sciences
‡ Electrical Engineering and Computer Science
¶ Industrial and Information Engineering
University of Tennessee

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

2012. 9. 5.



  • [Figure: systems labeled "Nautilus" and "Keeneland"; annotation: "Brand new!!!"]

  • Kraken: 112,896 cores, 1.33 GB memory/core

    Nautilus: 1024 cores, 4 GB memory/core, 4 TB shared memory

    March 8, 2011 – March 7, 2012

  • Three types of comparisons

    • Comparisons about overall job mix.

    • Explicitly comparing Kraken to Nautilus: quantitatively describing jobs run on each system.

    • Comparison of overall usage pattern: the 80/20 rule (Pareto Principle) is replaced by the 90/10 rule.

  • Comparison of overall job mix

    • NICS-Kraken (XSEDE) vs. ORNL-Jaguar (DOE)

    • NICS-Nautilus vs. other XSEDE systems

    • NICS-Kraken vs. other XSEDE systems

    Conclusion: Usage cannot necessarily be extrapolated from one system to another. Usage patterns depend on both users and architecture.

  • NSF usage should not be extrapolated from DOE

  • Job mix on Kraken

    [Chart: job-size bins of 0–512 cores, 513–8192 cores, 8193–49,536 cores, and 49,537+ cores]

    43% of Kraken’s 112,896 cores

  • Job mix on Kraken: 10% of the work is done by jobs using > 43% of the system

    Jaguar: over 25% of CPU-hours from jobs that used > 40% of the system
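A job-mix breakdown like the one above can be computed from per-job accounting records. The following is a minimal sketch, not the authors' method: it assumes each job is available as a hypothetical `(cores, walltime_hours)` pair and uses the core-count bins from the Kraken slide. The example jobs are made up.

```python
from bisect import bisect_left

# Upper edges of the first three bins on the slide; the last bin is 49,537+.
BIN_EDGES = [512, 8192, 49_536]
BIN_LABELS = ["0-512", "513-8192", "8193-49,536", "49,537+"]


def cpu_hour_share_by_bin(jobs):
    """Return each bin's fraction of total CPU-hours.

    `jobs` is an iterable of (cores, walltime_hours) pairs; this record
    format is an assumption made for illustration.
    """
    totals = [0.0] * len(BIN_LABELS)
    for cores, hours in jobs:
        # bisect_left gives the index of the first edge >= cores, so a
        # 512-core job lands in bin 0 and a 513-core job in bin 1.
        totals[bisect_left(BIN_EDGES, cores)] += cores * hours
    grand_total = sum(totals)
    if grand_total == 0:
        raise ValueError("no CPU-hours recorded")
    return {label: t / grand_total for label, t in zip(BIN_LABELS, totals)}


# Made-up jobs, for illustration only.
example_jobs = [(256, 2.0), (512, 1.0), (4096, 3.0), (60_000, 0.5)]
shares = cpu_hour_share_by_bin(example_jobs)
```

The share in the last bin corresponds to the kind of figure quoted for Kraken and Jaguar: the fraction of CPU-hours from jobs above a given fraction of the machine.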

  • Kraken vs. Jaguar

                             Kraken            Jaguar
    allocation process       XRAC and NICS     INCITE and ORNL
    proposal restrictions    all considered    use over 20% of system
    projects/year            hundreds          ~35 (INCITE)
    users                    thousands         hundreds
    projects > 30M hours     rare              common

  • What this means for XSEDE

    • Lower core counts: Different needs for support in scaling applications to larger numbers of cores.

    • Many small jobs running: Possibly different patterns of I/O contention between running jobs.

    • More projects: Potential for supporting a wider variety of software.

    • More users: Increased demand for account setup and user support.

  • Job mix on Nautilus

  • XSEDE: Blacklight vs. Nautilus

    [Figures for NICS Nautilus and PSC Blacklight, from the XSEDE 2012-Q1 report]

    https://www.xsede.org/web/guest/project-documents

  • XSEDE: Longhorn vs. Nautilus

    [Figures for TACC Longhorn and NICS Nautilus, from the XSEDE 2012-Q1 report]

    https://www.xsede.org/web/guest/project-documents

  • XSEDE: More from 2012-Q1 Report

    [Figures for Kraken, Gordon, Ranger, and Trestles]

    https://www.xsede.org/web/guest/project-documents

  • Comparisons between Kraken and Nautilus

    • Projects that only ran < 1024 cores on Kraken vs. all projects on Nautilus
      244 of 493 projects on Kraken
      62 projects on Nautilus

    • Projects that ran jobs on both Kraken and Nautilus
      28 projects ran on both machines

    • Nautilus: Computation vs. analysis

  • Comparisons of small jobs

    • Size of jobs - number of processors and how long they run

    • Quality of service - queue wait time and scheduler expansion factor

    Observation: No meaningful difference in the size of the jobs, but users tend to run longer jobs on Kraken.

    Observation: Based solely on quality of service, there is no compelling reason for small users of Kraken to move time to Nautilus.

  • Number of cores used in small jobs on Kraken and all jobs on Nautilus

    Running time, in hours, of small jobs on Kraken and all jobs on Nautilus

  • Number of hours that jobs up to 1024 cores wait in the queue before they run

    Kraken is fully allocated, and Nautilus is not. Kraken ran at 91% utilization during this time period. Nautilus ran at 41% utilization during this time period.

  • Scheduler expansion factor

    The scheduler expansion factor for small jobs on Kraken and all jobs on Nautilus
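The slides do not state how the expansion factor is calculated. One common definition, assumed here, is the ratio of a job's total turnaround time (queue wait plus run time) to its run time, so a job that starts immediately has a factor of 1. Some schedulers use the requested wall-clock limit in place of the actual run time; the sketch below uses the actual run time.

```python
def expansion_factor(wait_hours, run_hours):
    """Expansion factor under the assumed definition (wait + run) / run."""
    if run_hours <= 0:
        raise ValueError("run_hours must be positive")
    return (wait_hours + run_hours) / run_hours


# A job that starts at once has factor 1; a 1-hour job that waits
# 3 hours in the queue has factor 4.
no_wait = expansion_factor(0.0, 2.0)
long_wait = expansion_factor(3.0, 1.0)
```

Under this definition, short jobs are penalized more than long jobs for the same queue wait, which is one reason to report it alongside raw wait time.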

  • Projects using both systems

    • 28 projects running 90,540 jobs on Kraken and 59,330 jobs on Nautilus

    • 74% of jobs and 66% of CPU-hours on Nautilus come from projects that use both Kraken and Nautilus

    • 77% of this usage is in the analysis queue

    • Median of 255 and mean of 2996 CPU-hours on Kraken per CPU-hour used on Nautilus
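The last figure above is a per-project ratio summarized two ways. A minimal sketch of that calculation follows, assuming a hypothetical mapping from project ID to `(kraken_cpu_hours, nautilus_cpu_hours)`; the data is invented. When the mean is far above the median, as in the reported 2996 vs. 255, a few projects with very large ratios are likely pulling the mean up.

```python
from statistics import mean, median


def kraken_per_nautilus_ratios(usage):
    """Return Kraken CPU-hours per Nautilus CPU-hour for each project.

    `usage` maps a project ID to (kraken_cpu_hours, nautilus_cpu_hours);
    projects with no Nautilus usage are skipped to avoid dividing by zero.
    """
    return [k / n for k, n in usage.values() if n > 0]


# Made-up usage for three projects, for illustration only.
example_usage = {
    "proj_a": (100.0, 1.0),
    "proj_b": (200.0, 1.0),
    "proj_c": (9000.0, 1.0),
}
ratios = kraken_per_nautilus_ratios(example_usage)
ratio_median = median(ratios)  # middle value, robust to the outlier
ratio_mean = mean(ratios)      # pulled upward by proj_c
```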

  • Projects using both systems

                                  Kraken       Nautilus
    average walltime              4.6 hours    1.3 hours
    interactive jobs (-I flag)    1.5%         4.9%
    mean memory per core          ≤ 1.33 GB    1786 MB

  • The 90/10 rule

    On both Kraken and Nautilus, 10% of the projects use 90% of the CPU-hours.
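A concentration figure of this kind can be checked directly from per-project totals. The sketch below is illustrative only: the function name, the rounding choice (round the project count up, with at least one project), and the sample data are assumptions, not taken from the slides.

```python
import math


def top_share(cpu_hours_by_project, top_fraction=0.10):
    """Fraction of all CPU-hours used by the largest `top_fraction` of projects."""
    hours = sorted(cpu_hours_by_project, reverse=True)
    total = sum(hours)
    if total == 0:
        raise ValueError("no CPU-hours recorded")
    # Round up so that the top group always contains at least one project.
    n_top = max(1, math.ceil(top_fraction * len(hours)))
    return sum(hours[:n_top]) / total


# Made-up totals for ten projects; one project dominates.
example_hours = [900, 20, 20, 10, 10, 10, 10, 10, 5, 5]
share = top_share(example_hours)
```

With these invented numbers the single largest project accounts for 900 of 1000 CPU-hours, which is the 90/10 pattern described above.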

  • This is an XSEDE-wide issue

    XSEDE 2012-Q1 report

    https://www.xsede.org/web/guest/project-documents

  • Usage analyses useful for XSEDE

    • Software mix
      Need for measurement tools
      http://doi.ieeecomputersociety.org/10.1109/MC.2012.192

    • Job mix — this paper

    • User persistence
      David Hart, NCAR: Today at 11:30 (EOT track)

  • Acknowledgments

    • National Science Foundation

    • ARRA-NSF-OCI-0906324 for the NICS-RDAV center

    • NSF-OCI-1136246 for undergraduate research in the NICS-RDAV center

    • NSF-OCI-0711134 for initial funding to support Kraken

    • NSF-OCI-1053575 for funding to support NICS's participation in XSEDE.

    • Troy Baer

    • Nick Lineback