Moving Lustre Forward
Brent Gorda Intel Corporation
INTEL CONFIDENTIAL
Intel’s Technical Computing: Built for Breakthroughs
Delivering essential HPC solutions at every scale
Why? Ingenuity
Programmability Longevity
Agenda
The State of Lustre* software
• From Whamcloud to Intel
• Current Releases
• Important features and enhancements
• Emerging Software, Partner and Solution Ecosystem
FastForward update: Lustre-powered storage for Exascale
* Other names and brands may be claimed as the property of others.
From Whamcloud to Intel
Founded on July 16, 2010
• Brent Gorda – CEO
• Eric Barton – CTO
Founded Whamcloud to keep Lustre* moving forward for HPC
• Recognized by OpenSFS and EOFS as the maintainer of source repositories
Acquired by Intel in July 2012
• Becomes the High Performance Data Division
• Same team, same mission, more resources
Development of a Vibrant Ecosystem
[Figure: the Lustre ecosystem in 2010 compared with 2012]
LUSTRE* SOFTWARE
Community Lustre Roadmap
1 Maintenance releases focus on bug fixes and stability. Updates to the current version are made at 3 month intervals. Updates to past versions will be made on an ad hoc basis.
2 Feature releases focus on introducing new features. New release versions are expected at 6 month intervals. New maintenance versions from the feature release stream are anticipated at 18 month intervals.
Sponsors for Development and Releases: LLNL ORNL Intel OpenSFS CEA Xyratex Indiana University
Timeline: Q3 2012 through Q2 2014 (roadmap dated 06-01-2013)

Feature Releases (see note 2)
• 2.3.0: Server Stack SMP Scaling, Online check/scrub, Job Stats
• 2.4.0: OSD restructuring, DNE Phase 1, LFSCK MDT FID/LinkEA, Network Request Scheduler, 4 MB I/O RPC
• 2.5.0: LFSCK MDT-OST Consistency, HSM
• 2.6.0: DNE Phase 2, LFSCK MDT-MDT Consistency, UID Mapping & Shared Key

Maintenance Releases (see note 1)
• 2.1 stream: 2.1.3, 2.1.4, 2.1.5, 2.1.6, then ad hoc
• 2.4 stream: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4
Increasing Community Participation
[Chart: lines of approved code per contributor for releases 2.1, 2.2 and 2.3 (axis 0 to 100,000). Contributors: Bull, CEA, Cray, DDN, EMC, Fujitsu, LLNL, NICS, ORNL, TACC, Ultrascale, UVT, Whamcloud/Intel, Xyratex, Sun/Oracle]
Source: Intel internal statistics related to the lines of approved code per contributor per release.
Lustre 2.3 Feature Highlights
Server stack SMP scaling
• Performance improvements for multi-core servers
OI scrub
• Background consistency check
Lustre 2.3 is available today
Lustre 2.3 / 2.4 Highlights
• SMP Scaling
• DNE Phase 1
  • Multiple MDS/MDTs in a single file system
• Layout lock
  • Required for HSM; ensures clients direct I/O to the proper OST
• OSD API
• Improved single client performance
• 4 MB I/O
  • Larger RPC size improves performance to back-end disk
• Network Request Scheduler
• Lustre 2.4.0 released 2Q, 2013
Intel® Enterprise Edition for Lustre*
• Purpose-built for commercial market
• Based on fully open sourced Lustre 2.3 core
• Enhanced with Intel® Manager for Lustre*
• Worldwide technical support
• Creating vibrant solution ecosystem and partner network
• Available from channel partners
Intel® Enterprise Edition for Lustre* software
Timeline: Q2 2013 through Q1 2015

Intel® EE for Lustre* software Release Highlights

IEEL – Q3 2013
Based on Lustre 2.3 release and 2.4 clients and Intel features:
• Lustre file system plus Selected Stability Patches
• Intel Manager for Lustre* software
• Hadoop Adapter for Lustre* software
• Enterprise support enhancements
• Maintenance Releases

IEEL Version 1.1 – Q1 2014 *
Based on Lustre 2.5 release and anticipated features:
• Previous IEEL Features, plus:
• Hierarchical Storage Management
• Dynamic LNET Configuration
• Enhanced Alerts & Logging
• Intel Xeon® Phi Support

IEEL Version 2.0 – Q3 2014 *
Based on Lustre 2.6 release and anticipated features:
• Previous IEEL Features, plus:
• CIFS/NFS support
• Custom Charting & Monitoring
• Performance Diagnostics & Troubleshooting

Community Release Highlights

Lustre 2.4 – Q2 2013
• OSD Restructuring
• DNE Phase 1
• 4MB I/O RPC

Lustre 2.5 – Q4 2013
• Hierarchical Storage Management

Lustre 2.6 – Q2 2014
• DNE Phase 2
• UID Mapping & Shared Key

*Roadmap is subject to change. Dates and features are based on current expectations and are subject to changes in scheduling, alteration, or removal.
Moving Lustre* Forward
Bringing ‘Big Data’ to HPC
• Lustre* accelerates Hadoop applications
• Global namespace allows all nodes to access all data
• Larger capacity and higher I/O
• Resource efficient and simpler to manage
• Fast, shared and easy access to data is critical
• Open, collaboratively developed software is key
• Linux • Lustre • Hadoop
• Intel is investing in these important technologies
Using Lustre to Improve Hadoop
As HPC moves toward exascale levels, simulations will get larger and more complex
Better tools are needed to analyze ever larger datasets
Lustre and Hadoop form an ideal foundation:
• Hadoop is the most popular software stack for big data analytics
• Lustre is the leading file system for HPC
Combined benefits:
• Easier to manage a single, shared storage solution
• No data transfer overhead for staging inputs and extracting results
• No need to partition storage into HPC (Lustre) and Analytics (HDFS)
Using Lustre with Hadoop
• Hadoop uses pluggable extensions to work with different file systems
• Lustre is POSIX compliant:
  • Use Hadoop’s built-in LocalFileSystem class
  • Uses native file system support in Java
• Extends the default file system behavior
• Optimizes the performance of the shuffle phase

[Diagram: class hierarchy in package org.apache.hadoop.fs: FileSystem, extended by RawLocalFileSystem, extended by LustreFileSystem]
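The plug-in idea on this slide can be pictured with a toy model. The sketch below is not Hadoop's API: the class names `LocalFS` and `LustreFS`, the `REGISTRY` table and the `shuffle_dir` method are all hypothetical, and the slide does not say how the shuffle phase is actually optimized. It only shows a subclass reusing ordinary POSIX file I/O while overriding one behavior, with the implementation chosen by URI scheme.

```python
import os
import tempfile


class LocalFS:
    """Toy stand-in for a local (POSIX) file system plug-in."""

    def __init__(self, root):
        self.root = root

    def write(self, name, data):
        with open(os.path.join(self.root, name), "wb") as fh:
            fh.write(data)

    def read(self, name):
        with open(os.path.join(self.root, name), "rb") as fh:
            return fh.read()

    def shuffle_dir(self):
        # Assumption for illustration: intermediate data goes to a node-local area.
        return os.path.join(self.root, "local-shuffle")


class LustreFS(LocalFS):
    """Hypothetical subclass: inherits POSIX I/O, overrides one behavior."""

    def shuffle_dir(self):
        # Assumption for illustration: a shared mount is visible to every node.
        return os.path.join(self.root, "shared-shuffle")


# Scheme -> implementation, selected by configuration rather than by the caller.
REGISTRY = {"file": LocalFS, "lustre": LustreFS}


def get_fs(uri, root):
    scheme = uri.split("://", 1)[0]
    return REGISTRY[scheme](root)


mount = tempfile.mkdtemp()  # stands in for a Lustre mount point
fs = get_fs("lustre://job/output", mount)
fs.write("part-00000", b"hello")
print(fs.read("part-00000"), os.path.basename(fs.shuffle_dir()))
```

Because the subclass inherits `read` and `write` unchanged, code written against the base class keeps working on the shared mount.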
TestDFSIO Benchmark
• Tests the raw performance of a file system
• Write and read very large files (35G each) in parallel
• One mapper per file, and single reducer to collect stats
• Embarrassingly parallel, does not test shuffle & sort
[Chart: TestDFSIO write and read throughput in MB/s (axis 0 to 100), HDFS vs. Lustre. Higher is better!]
$$\text{Throughput (MB/s)} = \frac{\sum \text{filesize}}{\sum \text{time}}$$
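The throughput metric on this slide (sum of file sizes divided by sum of times) can be sketched as below. The per-file times are invented for illustration, and treating "35G" as 35 × 1024 MB is an assumption; nothing here reproduces the measured results in the chart.

```python
def aggregate_throughput(sizes_mb, times_s):
    """Total megabytes moved divided by total task time, in MB/s."""
    if len(sizes_mb) != len(times_s):
        raise ValueError("need one time per file")
    total_time = sum(times_s)
    if total_time <= 0:
        raise ValueError("total time must be positive")
    return sum(sizes_mb) / total_time


# Hypothetical run: three 35 GB files with made-up per-mapper times in seconds.
sizes = [35 * 1024] * 3  # assumption: 35G = 35 * 1024 MB
times = [400.0, 500.0, 380.0]

throughput = aggregate_throughput(sizes, times)
# For comparison: the plain average of each file's own rate.
mean_of_rates = sum(s / t for s, t in zip(sizes, times)) / len(sizes)
print(round(throughput, 2), round(mean_of_rates, 2))
```

Because times are summed across mappers, this is a per-task average rate rather than the combined bandwidth of the cluster, and slow tasks weigh more heavily than they would in a plain average of per-file rates.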
Terasort Benchmark
• Distributed sort: The primary Map-Reduce primitive
• Sort 1 Billion records, i.e. approximately 100G
  • Record: Randomly generated 10 byte key + 90 bytes garbage data
• Block Size: 128M, maps: 752 @ 4/node, reduces: 16 @ 2/node
[Chart: Terasort runtime in seconds (axis 0 to 500), Lustre vs. HDFS. Less is better!]
Lustre 10-15% Faster
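The "approximately 100G" figure follows from the record layout, as the short calculation below shows. Reading "128M" as 128 MiB is an assumption, and the block count is only a rough estimate of the number of input splits.

```python
import math

records = 1_000_000_000          # "1 Billion records"
record_bytes = 10 + 90           # 10-byte key + 90 bytes of payload
block_bytes = 128 * 1024 * 1024  # assumption: "128M" means 128 MiB

total_bytes = records * record_bytes
blocks = math.ceil(total_bytes / block_bytes)

print(total_bytes / 1e9, "GB in", blocks, "blocks")
```

The estimate lands near the 752 maps reported on the slide. It need not match exactly, since the number of splits also depends on how the input files were laid out, which the slide does not state.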
Exascale and FastForward
Dept. of Energy “FastForward” Program
• Goal: Deliver Exascale computing before 2020
• FastForward RFP provides funding for R & D
• Sponsored by 7 leading US national labs
• RFP elements were Processor, Memory and Storage
• Whamcloud-led group won the Storage portion:
  • HDF Group for HDF5 modifications and extensions
  • EMC for Burst Buffer manager and I/O Dispatcher
  • Cray for large scale testing
  • DDN for versioning object storage
Asynchronous Programming
Bulk synchronous programming
• Simplifies application development
• But, is susceptible to jitter
• Makes strong scaling harder
Asynchronous programming
• Loosely coupled between processes
• No waiting at barriers
• Closes “gaps” provided jitter balances out over time
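The effect of barriers on jitter can be illustrated with a small simulation. This is a simplified model under stated assumptions: every process does the same number of steps, jitter is uniform random noise, and the asynchronous case ignores any data dependencies between processes. The step count, process count and jitter range are arbitrary.

```python
import random


def bsp_time(work):
    """Barrier after every step: each step costs as much as its slowest process."""
    return sum(max(step) for step in work)


def async_time(work):
    """No barriers: the run ends when the slowest process finishes all its steps."""
    n_procs = len(work[0])
    return max(sum(step[p] for step in work) for p in range(n_procs))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
steps, procs = 100, 8
# Each step takes 1.0 time unit plus random jitter in [0, 0.5) per process.
work = [[1.0 + rng.uniform(0.0, 0.5) for _ in range(procs)] for _ in range(steps)]

print(f"bulk synchronous: {bsp_time(work):.1f}")
print(f"asynchronous:     {async_time(work):.1f}")
```

With barriers, every step pays for whichever process happened to be slowest that step. Without them, a process that was slow in one step can catch up in another, so the total is bounded by the slowest process overall and is never larger than the barrier case in this model.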
Transactions
Consistency and Integrity
• Guarantee required on any and all failures
  • Foundational component of system resilience
• Required at all levels of the I/O stack
  • Metadata at one level is data to the level below
No blocking protocols
• Non-blocking on each OSD
• Non-blocking across OSDs
I/O Epochs demark globally consistent snapshots
• Guarantees all updates in one epoch are atomic
• Recovery == roll back to last globally persistent epoch
  • Roll forward using client replay logs for transparent fault handling
• Cull old epochs when next epoch persistent on all OSDs
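The epoch rules above can be sketched in a few lines. This is a toy model, not the FastForward implementation: the function names, the per-OSD dictionary and the replay-log layout are hypothetical, and it assumes each OSD makes epochs persistent in order, so that one number per OSD describes its state.

```python
def globally_persistent_epoch(persistent_by_osd):
    """Highest epoch that every OSD has made persistent."""
    return min(persistent_by_osd.values())


def recover(persistent_by_osd, replay_log):
    """Roll back to the last globally persistent epoch, then roll forward.

    replay_log maps epoch -> list of updates kept by clients (hypothetical).
    Returns the rollback epoch and the updates to replay, in epoch order.
    """
    base = globally_persistent_epoch(persistent_by_osd)
    to_replay = [u for e in sorted(replay_log) if e > base for u in replay_log[e]]
    return base, to_replay


def cull(stored_epochs, persistent_by_osd):
    """Drop epochs older than the one that is persistent on all OSDs."""
    base = globally_persistent_epoch(persistent_by_osd)
    return [e for e in stored_epochs if e >= base]


state = {"osd0": 7, "osd1": 5, "osd2": 6}  # highest persistent epoch per OSD
log = {5: ["a"], 6: ["b", "c"], 7: ["d"]}  # client-side replay log
base, replay = recover(state, log)
print(base, replay, cull([3, 4, 5, 6, 7], state))
```

Taking the minimum across OSDs is what makes the rollback point globally consistent: an epoch counts only once the slowest OSD has persisted it, and nothing older needs to be kept after that.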
Architecture/Workflow
Exascale File System
Integrated I/O Stack
• Epoch transaction model
• Non-blocking scalable object I/O
HDF5/other schema
• High level application object I/O model
• I/O forwarding
I/O Dispatcher
• Burst Buffer management
• Impedance match application I/O performance to storage system capabilities
DAOS
• Conventional namespace
• Containers for transactional, scalable, object I/O
Learn. Contribute. Join.
Calendar: Fri 14, Sat 15, Sun 16, Mon 17, Tues 18, Wed 19, Thurs 20
• 7:15 – 8:45 AM: EOFS panel, “Lustre and Big Data” [ Mark Seager ]
• Noon – 3:00 PM: Lustre tutorials, and 3:15 to 5:30 PM: Lustre Community BoF [ Congress Center ]
• 4:00: Lustre Community Party [ EOFS booth ]
• “Ask the Architect” Eric Barton, Intel booth #350: Mo: 7 PM, Tues/Wed: 10 AM
Legal Disclaimers
Technical Collateral Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. 
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm
Roadmap Notice
All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change without notice.
Intel’s Technical Computing: Built for Breakthroughs
What will be your Breakthrough? Delivering essential HPC solutions at every scale
Why? Ingenuity
Programmability Longevity