A Tale of Two Systems: Flexibility of Usage of Kraken and Nautilus at the National Institute for Computational Sciences
Amy F. Szczepański*‡, Jian Huang*‡, Sean Ahern*†, Mark Fahey†¶
* Remote Data Analysis and Visualization Center
† National Institute for Computational Sciences
‡ Electrical Engineering and Computer Science
¶ Industrial and Information Engineering
University of Tennessee
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Kraken: 112,896 cores, 1.33 GB memory/core
Nautilus: 1024 cores, 4 GB memory/core, 4 TB shared memory
Keeneland: brand new!
Study period: March 8, 2011 – March 7, 2012
Three types of comparisons
• Comparisons about overall job mix.
• Explicitly comparing Kraken to Nautilus: quantitatively describing jobs run on each system.
• Comparison of overall usage pattern: the 80/20 rule (Pareto Principle) is replaced by the 90/10 rule.
Comparison of overall job mix
• NICS-Kraken (XSEDE) vs. ORNL-Jaguar (DOE)
• NICS-Nautilus vs. other XSEDE systems
• NICS-Kraken vs. other XSEDE systems
Conclusion: One cannot necessarily extrapolate from one system to another; usage patterns depend on both the users and the architecture.
NSF usage should not be extrapolated from DOE
Job mix on Kraken
[Figure: Kraken job mix in four job-size classes: 0–512, 513–8192, 8193–49,536, and 49,537+ cores. Jobs in the largest class use more than 43% of Kraken’s 112,896 cores.]
Job mix on Kraken: 10% of the work is done by jobs using more than 43% of the system.
Jaguar: over 25% of CPU-hours come from jobs that used more than 40% of the system.
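A minimal Python sketch of how CPU-hours could be split into the four Kraken job-size classes, and of how the "more than 43%" figure follows from the class boundary. The `(cores, cpu_hours)` record layout and the function names are hypothetical; the slides do not describe how the accounting data were stored.

```python
from bisect import bisect_left

KRAKEN_CORES = 112_896

# Inclusive upper bounds of the first three job-size classes on Kraken.
SIZE_BOUNDS = [512, 8_192, 49_536]
SIZE_LABELS = ["0-512", "513-8192", "8193-49,536", "49,537+"]


def size_class(cores: int) -> str:
    """Return the job-size class label for a job that used `cores` cores."""
    # bisect_left places a value equal to a bound in that bound's class,
    # so 512 maps to "0-512" and 513 maps to "513-8192".
    return SIZE_LABELS[bisect_left(SIZE_BOUNDS, cores)]


def share_by_class(jobs):
    """Fraction of total CPU-hours in each size class.

    `jobs` is an iterable of hypothetical (cores, cpu_hours) pairs;
    it must contain some CPU-hours to avoid dividing by zero.
    """
    totals = {label: 0.0 for label in SIZE_LABELS}
    for cores, cpu_hours in jobs:
        totals[size_class(cores)] += cpu_hours
    grand_total = sum(totals.values())
    return {label: hours / grand_total for label, hours in totals.items()}


# The smallest job in the top class uses just under 44% of Kraken,
# which is why that class is described as "more than 43%" of the system.
print(f"{49_537 / KRAKEN_CORES:.1%}")  # prints 43.9%
```

For example, `share_by_class([(256, 100.0), (4096, 300.0), (60_000, 100.0)])` assigns 20% of these made-up CPU-hours to the 49,537+ class.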
Kraken vs. Jaguar
                        Kraken            Jaguar
allocation process      XRAC and NICS     INCITE and ORNL
proposal restrictions   all considered    use over 20% of system
projects per year       hundreds          ~35 (INCITE)
users                   thousands         hundreds
projects > 30M hours    rare              common
What this means for XSEDE
• Lower core counts: Different support needs for scaling applications to larger numbers of cores.
• Many small jobs running: Possibly different patterns of I/O contention between running jobs.
• More projects: Potential for supporting a wider variety of software.
• More users: Increased demand for account setup and user support.
Job mix on Nautilus
XSEDE: Blacklight vs. Nautilus
[Figure: NICS Nautilus and PSC Blacklight. Source: XSEDE 2012-Q1 report, https://www.xsede.org/web/guest/project-documents]
XSEDE: Longhorn vs. Nautilus
[Figure: TACC Longhorn and NICS Nautilus. Source: XSEDE 2012-Q1 report, https://www.xsede.org/web/guest/project-documents]
XSEDE: More from 2012-Q1 Report
[Figure: Kraken, Gordon, Ranger, and Trestles. Source: XSEDE 2012-Q1 report, https://www.xsede.org/web/guest/project-documents]
Comparisons between Kraken and Nautilus
• Projects that ran only jobs of < 1024 cores on Kraken vs. all projects on Nautilus
  ◦ 244 of 493 projects on Kraken
  ◦ 62 projects on Nautilus
• Projects that ran jobs on both Kraken and Nautilus
  ◦ 28 projects ran on both machines
• Nautilus: Computation vs. analysis
Comparisons of small jobs
• Size of jobs - number of processors and how long they run
• Quality of service - queue wait time and scheduler expansion factor
Observation: There is no meaningful difference in job size, but users tend to run longer jobs on Kraken.
Observation: Based solely on quality of service, there is no compelling reason for small users of Kraken to move time to Nautilus.
Number of cores used in small jobs on Kraken and all jobs on Nautilus
Running time, in hours, of small jobs on Kraken and all jobs on Nautilus
Number of hours that jobs up to 1024 cores wait in the queue before they run
Kraken is fully allocated, and Nautilus is not. Kraken ran at 91% utilization during this time period. Nautilus ran at 41% utilization during this time period.
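Utilization is assumed here to mean core-hours charged to jobs divided by the core-hours the machine could have delivered over the study period; the slides give no formula, and this sketch ignores scheduled downtime and reserved nodes. Because the period includes February 29, 2012, it spans 366 days. Function names are illustrative.

```python
from datetime import date

# Study period from the slides, inclusive of both endpoints.
START = date(2011, 3, 8)
END = date(2012, 3, 7)


def available_core_hours(cores: int, start: date, end_inclusive: date) -> int:
    """Core-hours a machine could deliver if every core ran every hour."""
    days = (end_inclusive - start).days + 1  # +1 counts the final day
    return cores * days * 24


def utilization(used_core_hours: float, cores: int,
                start: date = START, end_inclusive: date = END) -> float:
    """Charged core-hours as a fraction of available core-hours.

    Assumed definition: no allowance for downtime or reserved nodes.
    """
    return used_core_hours / available_core_hours(cores, start, end_inclusive)


kraken_capacity = available_core_hours(112_896, START, END)
nautilus_capacity = available_core_hours(1_024, START, END)
print(nautilus_capacity)  # 1024 cores * 366 days * 24 h = 8994816
```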
Scheduler expansion factor
The scheduler expansion factor for small jobs on Kraken and all jobs on Nautilus
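The slides do not define the expansion factor; a common definition, assumed here, is (queue wait + run time) / run time, so a job that starts immediately scores 1.0 and a job that waits as long as it runs scores 2.0. The `Job` record is hypothetical, and the 1024-core cutoff for "small" jobs is taken from the queue-wait figure.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Job:
    cores: int
    wait_hours: float  # submission to start
    run_hours: float   # start to finish (wall clock)


def expansion_factor(job: Job) -> float:
    """(wait + run) / run; 1.0 means the job started as soon as it was submitted."""
    return (job.wait_hours + job.run_hours) / job.run_hours


def mean_small_job_expansion(jobs, max_cores: int = 1024) -> float:
    """Mean expansion factor over jobs using at most `max_cores` cores.

    Jobs with zero run time are skipped to avoid dividing by zero.
    Raises statistics.StatisticsError if no job qualifies.
    """
    return mean(
        expansion_factor(j)
        for j in jobs
        if j.cores <= max_cores and j.run_hours > 0
    )
```

For instance, a 512-core job that waits 1 hour and runs 1 hour has an expansion factor of 2.0, while a 1024-core job that starts immediately and runs 4 hours has 1.0; their mean is 1.5.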
Projects using both systems
• 28 projects running 90,540 jobs on Kraken and 59,330 jobs on Nautilus
• 74% of jobs and 66% of CPU-hours on Nautilus come from projects that use both Kraken and Nautilus
• 77% of this usage is in the analysis queue
• Median of 255 and mean of 2996 CPU-hours on Kraken per CPU-hour used on Nautilus
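The overlap statistics above can be derived from per-project usage totals on each machine. This sketch assumes hypothetical dictionaries mapping a project ID to `(job_count, cpu_hours)`; the ratio is Kraken CPU-hours per Nautilus CPU-hour for each shared project. A mean far above the median, as with the 2996 versus 255 reported here, indicates that a few shared projects use Kraken far more heavily, relative to Nautilus, than the typical shared project.

```python
from statistics import mean, median


def overlap_summary(kraken_usage, nautilus_usage):
    """Summarize projects that appear on both machines.

    Both arguments are hypothetical dicts: project ID -> (job_count, cpu_hours).
    Raises statistics.StatisticsError if no project appears on both.
    """
    both = kraken_usage.keys() & nautilus_usage.keys()  # set of shared projects
    total_jobs = sum(jobs for jobs, _ in nautilus_usage.values())
    total_hours = sum(hours for _, hours in nautilus_usage.values())

    # Share of all Nautilus jobs and CPU-hours that come from shared projects.
    job_share = sum(nautilus_usage[p][0] for p in both) / total_jobs
    hour_share = sum(nautilus_usage[p][1] for p in both) / total_hours

    # One ratio per shared project: Kraken CPU-hours per Nautilus CPU-hour.
    ratios = [kraken_usage[p][1] / nautilus_usage[p][1] for p in both]
    return {
        "projects": len(both),
        "job_share": job_share,
        "hour_share": hour_share,
        "median_ratio": median(ratios),
        "mean_ratio": mean(ratios),
    }
```

With the made-up inputs `{"A": (10, 1000.0), "B": (5, 500.0), "C": (3, 90000.0)}` for Kraken and `{"A": (4, 10.0), "B": (2, 100.0), "C": (1, 10.0), "D": (3, 80.0)}` for Nautilus, the median ratio is 100 while the mean is 3035, driven almost entirely by project C.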
Projects using both systems
                              Kraken        Nautilus
average walltime              4.6 hours     1.3 hours
interactive jobs (-I flag)    1.5%          4.9%
mean memory per core          ≤ 1.33 GB     1786 MB
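The three rows of the table are simple aggregates over job records. A sketch, assuming a hypothetical `JobRecord` with walltime, an interactive flag (jobs submitted with `qsub -I`), reported memory, and core count; how memory was actually measured on each system is not stated in the slides, and averaging per-job memory per core is an assumed definition.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class JobRecord:
    walltime_hours: float
    interactive: bool  # True if submitted as an interactive job (qsub -I)
    mem_mb: float      # memory reported for the whole job, in MB
    cores: int


def summarize(records):
    """Per-system averages matching the rows of the table (assumed definitions)."""
    records = list(records)
    return {
        "avg_walltime_hours": mean(r.walltime_hours for r in records),
        "interactive_pct": 100 * sum(r.interactive for r in records) / len(records),
        # Mean of per-job memory per core, not total memory / total cores.
        "mean_mem_per_core_mb": mean(r.mem_mb / r.cores for r in records),
    }
```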
The 90/10 rule
On both Kraken and Nautilus, 10% of the projects use 90% of the CPU-hours.
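One way to check a 90/10 (or 80/20) claim is to sort projects by CPU-hours and count how many of the heaviest users are needed to reach the target share of the total. The function name and the input values below are illustrative, not data from Kraken or Nautilus.

```python
def top_share_needed(cpu_hours_by_project, target=0.9):
    """Smallest fraction of projects whose combined CPU-hours reach `target` of the total.

    Projects are ranked from heaviest to lightest user, so the answer is the
    fraction of the heaviest users needed to account for `target` of usage.
    """
    hours = sorted(cpu_hours_by_project, reverse=True)
    total = sum(hours)
    running = 0.0
    for count, h in enumerate(hours, start=1):
        running += h
        if running >= target * total:
            return count / len(hours)
    return 1.0  # only reached if the input is empty or target > 1
```

For the made-up usage `[950, 10, 10, 10, 5, 5, 4, 3, 2, 1]`, `top_share_needed(...)` returns 0.1: one project in ten accounts for 95% of the CPU-hours, which satisfies a 90/10 pattern.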
This is an XSEDE-wide issue
XSEDE 2012-Q1 report
https://www.xsede.org/web/guest/project-documents
Usage analyses useful for XSEDE
• Software mix
  ◦ Need for measurement tools
  ◦ http://doi.ieeecomputersociety.org/10.1109/MC.2012.192
• Job mix: this paper
• User persistence
  ◦ David Hart, NCAR: Today at 11:30 (EOT track)
Acknowledgments
• National Science Foundation
  ◦ ARRA-NSF-OCI-0906324 for the NICS-RDAV center
  ◦ NSF-OCI-1136246 for undergraduate research in the NICS-RDAV center
  ◦ NSF-OCI-0711134 for initial funding to support Kraken
  ◦ NSF-OCI-1053575 for funding to support NICS's participation in XSEDE
• Troy Baer
• Nick Lineback