Advanced Lustre® Infrastructure Monitoring (Resolving the Storage I/O Bottleneck and managing the beast). Torben Kling Petersen, PhD, Principal Architect, High Performance Computing




• Slide 1: Advanced Lustre® Infrastructure Monitoring (Resolving the Storage I/O Bottleneck and managing the beast). Torben Kling Petersen, PhD, Principal Architect, High Performance Computing

• Slide 2: The Challenge

• Slide 3: The REAL challenge
  - File system: up/down, slow, fragmented, capacity planning, HA (fail-overs etc.)
  - Hardware: nodes crashing, components breaking, FRUs, disk rebuilds, cables, ...
  - Software: upgrades/patches, bugs, clients, quotas, workload optimization
  - Other: documentation, scalability, power consumption, maintenance windows, back-ups

• Slide 4: The Answer??
  - Tightly integrated solutions: hardware, software, support
  - Extensive testing, clear roadmaps, in-depth training, and even more extensive testing ...

• Slide 5: ClusterStor Software Stack Overview
  - ClusterStor 6000 Embedded Application Server: Intel Sandy Bridge CPU, up to 4 DIMM slots; FDR & 40GbE front-end, SAS-2 (6G) back-end; SBB v2 form factor, PCIe Gen-3; embedded RAID & Lustre support
  - CS 6000 SSU stack (on each embedded server module): Lustre File System (2.x), ClusterStor Manager, Data Protection Layer (RAID 6 / PD-RAID), Linux OS, Unified System Management (GEM-USM)

• Slide 6: ClusterStor dashboard, problems found

• Slide 7: Hardware inventory

• Slide 8: (screenshot only)

• Slide 9: Finding problems??? (a minimal command-line health-check sketch follows the slide listing)

• Slide 10: But things break, especially disk drives. What then???

• Slide 11: Let's do some math ... (the arithmetic is sketched in code after the slide listing)
  - Large systems use many HDDs to deliver both performance and capacity; NCSA Blue Waters uses 17,000+ HDDs for the main scratch file system.
  - At 3% AFR this means 531 HDDs fail annually. That's ~1.5 drives per day!
  - RAID 6 rebuild time under load is 24 to 36 hours.
  - Bottom line: the scratch system would NEVER be fully operational, and there would constantly be a risk of losing additional drives, leading to data loss.

• Slide 12: Drive Technology/Reliability
  - Xyratex pre-tests all drives used in ClusterStor solutions: each drive is subjected to 24-28 hours of intense I/O, reads and writes are performed to all sectors, and the ambient temperature cycles between 40°C and 5°C. Any drive that survives goes on to additional testing.
  - As a result, Xyratex disk drives deliver proven reliability with less than a 0.3% annual failure rate.
  - Real-life impact: on a large system such as NCSA Blue Waters with 17,000+ disk drives, this means a predicted failure of 50 drives per year. Other vendors publicly state a failure rate of 3%*, which (given an equivalent number of disk drives) means 500+ drive failures per year.
  - With a fairly even distribution, the file system will ALWAYS be in a state of rebuild. In addition, since a file system with wide stripes performs according to its slowest OST, the entire system will always run in degraded mode.
  - *DDN, Keith Miller, LUG 2012

• Slide 13: Annual Failure Rate of Xyratex Disks
  - (Chart: actual AFR data for 2012/13 experienced by Xyratex-sourced SAS drives)
  - The Xyratex drive failure rate is less than half of the industry standard. At 0.3%, the annual failure would be 53 HDDs.

• Slide 14: As areal density growth slows (
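The problem detection on slides 6 to 9 is done through the ClusterStor Manager dashboard, which is only shown as screenshots in the deck. As a rough illustration of the same idea at the command line, the sketch below polls two standard Lustre utilities, lctl (its health_check parameter) and lfs df. It assumes a plain Lustre node with those tools installed and reachable targets; it is not part of ClusterStor Manager and is only a minimal example of what the dashboard automates.

```python
#!/usr/bin/env python3
"""Minimal Lustre health poll (illustrative sketch, not ClusterStor Manager)."""
import subprocess


def run(cmd):
    """Run a command and return (exit_code, stdout) without raising."""
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
        return out.returncode, out.stdout.strip()
    except (OSError, subprocess.TimeoutExpired) as exc:
        return -1, str(exc)


def main():
    # health_check reports "healthy" when the local Lustre services are OK.
    rc, health = run(["lctl", "get_param", "-n", "health_check"])
    print(f"health_check: {health if rc == 0 else 'unavailable (' + health + ')'}")

    # lfs df lists every MDT/OST with its usage; a missing or nearly full
    # OST shows up here long before users start filing tickets.
    rc, usage = run(["lfs", "df", "-h"])
    print(usage if rc == 0 else f"lfs df failed: {usage}")


if __name__ == "__main__":
    main()
```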
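The failure-rate arithmetic on slides 11 to 13 can be reproduced in a few lines. The snippet below assumes exactly 17,000 drives; the slide says "17,000+", which is why it quotes 531 and 53 failures per year rather than the 510 and 51 computed here. The 36-hour figure is the upper end of the quoted 24 to 36 hour RAID 6 rebuild window, and the "rebuilds in flight" estimate simply assumes failures are spread evenly over the year.

```python
"""Reproduce the drive-failure arithmetic from slides 11-13 (approximate)."""
DRIVES = 17_000          # "17,000+" HDDs in the Blue Waters scratch file system
REBUILD_HOURS = 36       # upper end of the quoted 24-36 h RAID 6 rebuild time

for afr in (0.03, 0.003):                # industry-quoted 3% vs. Xyratex 0.3%
    fails_per_year = DRIVES * afr
    fails_per_day = fails_per_year / 365
    # Expected number of arrays rebuilding at any instant, assuming failures
    # arrive evenly through the year (a deliberate simplification).
    concurrent_rebuilds = fails_per_day * REBUILD_HOURS / 24
    print(f"AFR {afr:.1%}: {fails_per_year:.0f} failures/year, "
          f"{fails_per_day:.2f}/day, ~{concurrent_rebuilds:.1f} rebuilds in flight")
```

Under these assumptions roughly two arrays are rebuilding at any moment at 3% AFR, which is the slide's point that a wide-striped file system would effectively always run degraded, while at 0.3% the expected number of concurrent rebuilds drops well below one.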