A PRACTICAL INTRODUCTION TO THE ANSELM SUPERCOMPUTER
Infrastructure, access and user support
David Hrbáč, 2013-09-27
• Intro
• What is a supercomputer
• Infrastructure
• Access to the cluster
• Support
• Log-in

http://www.it4i.cz/files/hrbac.pdf
http://www.it4i.cz/files/hrbac.pptx
Why Anselm
• 6000 name suggestions
• The very first coal mine in the region
• The very first mine to have a steam engine
• Anselm of Canterbury
Early days
Future - Hal
What is a supercomputer
• A bunch of computers
• Having a lot of CPU power
• Having a lot of RAM
• Local storage
• Shared storage
• High-speed interconnect
• Message Passing Interface (MPI)
Supercomputer
Supercomputer ?!?
Anselm HW
• 209 compute nodes
• 3344 cores
• 15 TB RAM
• 300 TB /home
• 135 TB /scratch
• Bull Extreme Computing Linux (RHEL clone)
Type of Nodes
• 180 compute nodes
• 23 GPU-accelerated nodes
• 4 MIC-accelerated nodes
• 2 fat nodes
General Node
• 180 nodes
• 2880 cores in total
• two Intel Sandy Bridge E5-2665, 8-core, 2.4 GHz processors per node
• 64 GB of physical memory per node
• one 500 GB SATA 2.5" 7.2 krpm HDD per node
• bullx B510 blade servers
• cn[1-180]
GPU Accelerated Nodes
• 23 nodes
• 368 cores in total
• two Intel Sandy Bridge E5-2470, 8-core, 2.3 GHz processors per node
• 96 GB of physical memory per node
• one 500 GB SATA 2.5" 7.2 krpm HDD per node
• one NVIDIA Tesla Kepler K20 GPU accelerator per node
• bullx B515 blade servers
• cn[181-203]
MIC Accelerated Nodes
Intel Many Integrated Core Architecture
• 4 nodes
• 64 cores in total
• two Intel Sandy Bridge E5-2470, 8-core, 2.3 GHz processors per node
• 96 GB of physical memory per node
• one 500 GB SATA 2.5" 7.2 krpm HDD per node
• one Intel Xeon Phi 5110P MIC accelerator per node
• bullx B515 blade servers
• cn[204-207]
Fat Node
• 2 nodes
• 32 cores in total
• two Intel Sandy Bridge E5-2665, 8-core, 2.4 GHz processors per node
• 512 GB of physical memory per node
• two 300 GB SAS 3.5" 15 krpm HDDs (RAID 1) per node
• two 100 GB SLC SSDs per node
• bullx R423-E3 servers
• cn[208-209]
Storage
• 300 TB /home
• 135 TB /scratch
• InfiniBand 40 Gb/s
  – native: 3600 MB/s
  – over TCP: 1700 MB/s
• Ethernet: 114 MB/s
• Lustre FS
Lustre File System
• Clustered
• OSS – object storage server
• MDS – metadata server
• Limits in petabytes
• Parallel – striped
Stripes
• Stripe count
  – enables parallel access
  – mind the number of processes in your scripts
  – rule of thumb: one stripe per gigabyte
• lfs setstripe | getstripe
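The striping bullets above can be sketched with the standard Lustre user commands; the directory path and the stripe count of 4 are only illustrative assumptions, and the commands require a Lustre client mount (i.e. they run on the cluster, not on your workstation):

```shell
# Show the current striping of a directory (illustrative path)
lfs getstripe /scratch/$USER/mydir

# Stripe new files in this directory across 4 OSTs, roughly
# following the "one stripe per gigabyte" rule of thumb
lfs setstripe -c 4 /scratch/$USER/mydir
```

Striping is inherited: files created inside the directory afterwards get the new layout, while existing files keep theirs.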
Quotas
• /home – 250 GB
• /scratch – no quota
• lfs quota -u hrb33 /home
Access to Anselm
• Internal Access Call – 4x a year
  – currently in its 3rd round
• Open Access Call – 2x a year
  – currently in its 2nd round
Proposals
• Proposals undergo evaluation:
  – scientific
  – technical
  – economic
• Primary Investigator
  – list of collaborators
Login Credentials
• Personal certificate
• Signed request
• Credentials sent encrypted:
  – login
  – password
  – SSH keys
  – passphrase for the key
Credentials lifetime
• Valid with active project(s) or affiliation to IT4Innovations
• Deleted 1 year after the last project ends
• Announcements:
  – 3 months before removal
  – 1 month before removal
  – 1 week before removal
Support
• Bug tracking and trouble ticketing system
• Documentation
• IT4I internal command-line tools
• IT4I web applications
• IT4I Android application
• End-user courses
Documentation
• https://support.it4i.cz/docs/anselm-cluster-documentation/
• Still evolving
• Changes almost every day
IT4I internal command-line tools
• it4free
• rspbs
• license allocation
• internal in-house scripts
  – automation to handle the credentials
  – cluster automation
  – PBS accounting
IT4I web applications
• Internal information system
  – project management
  – project accounting
  – user management
• Cluster monitoring
IT4I android application
• Internal tool
• Considering release to end users
• Features:
  – news
  – graphs
• Feature requests:
  – accounting
  – support
  – node allocation
  – job status
Log-in to Anselm
Finally!
• SSH protocol
• via anselm.it4i.cz
  – login1.anselm.it4i.cz
  – login2.anselm.it4i.cz
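A minimal log-in sketch, assuming the key-based SSH access described in the credentials slides; the key path is an assumption, and "hrb33" is just the example login reused from the quota slide (use your own):

```shell
# Connect via the round-robin alias, which lands on one of
# the two login nodes
ssh -i ~/.ssh/id_rsa hrb33@anselm.it4i.cz

# Or target a specific login node directly
ssh -i ~/.ssh/id_rsa hrb33@login2.anselm.it4i.cz
```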
VNC
• ssh anselm -L 5961:localhost:5961
• Remmina
• vncviewer 127.0.0.1:5961
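Putting the VNC bullets together, an end-to-end sketch; the display number, geometry, and the availability of `vncserver` on the login node are assumptions for illustration:

```shell
# 1. On a login node: start a VNC server on display :61
#    (display N listens on TCP port 5900+N, so :61 -> 5961)
vncserver :61 -geometry 1440x900

# 2. On your workstation: forward local port 5961 to the
#    same port on the login node over SSH
ssh anselm -L 5961:localhost:5961

# 3. In another local terminal: connect through the tunnel
#    (Remmina works too; point it at 127.0.0.1:5961)
vncviewer 127.0.0.1:5961
```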
Links
• https://support.it4i.cz/docs/anselm-cluster-documentation/
• https://support.it4i.cz/
• https://www.it4i.cz/
Questions
Thank you.