Moving Lustre Forward - HPC Advisory Council

Moving Lustre Forward

Brent Gorda Intel Corporation

INTEL CONFIDENTIAL

Intel’s Technical Computing: Built for Breakthroughs

Delivering essential HPC solutions at every scale

Why? Ingenuity

Programmability Longevity

Agenda

The State of Lustre* software
•  From Whamcloud to Intel
•  Current Releases
•  Important features and enhancements
•  Emerging Software, Partner and Solution Ecosystem

FastForward update: Lustre-powered storage for Exascale

Moving Lustre* Forward

* Other names and brands may be claimed as the property of others.

From Whamcloud to Intel

Founded on July 16, 2010
•  Brent Gorda – CEO
•  Eric Barton – CTO

Founded Whamcloud to keep Lustre* moving forward for HPC
•  Recognized by OpenSFS and EOFS as the maintainer of source repositories

Acquired by Intel in July 2012
•  Becomes the High Performance Data Division
•  Same team, same mission, more resources

Development of a Vibrant Ecosystem

[Figure: the Lustre ecosystem in 2010 and in 2012]

LUSTRE* SOFTWARE

Community Lustre Roadmap

1 Maintenance releases focus on bug fixes and stability. Updates to the current version are made at 3 month intervals. Updates to past versions will be made on an ad hoc basis.

2 Feature releases focus on introducing new features. New release versions are expected at 6 month intervals. New maintenance versions from the feature release stream are anticipated at 18 month intervals.

Sponsors for Development and Releases: LLNL ORNL Intel OpenSFS CEA Xyratex Indiana University

Roadmap as of 06-01-2013, covering Q3 2012 through Q2 2014

Feature Releases (note 2)
•  2.3.0: Server Stack SMP Scaling, Online check/scrub, Job Stats
•  2.4.0: OSD restructuring, DNE Phase 1, LFSCK MDT FID/LinkEA, Network Request Scheduler, 4 MB I/O RPC
•  2.5.0: LFSCK MDT-OST Consistency, HSM
•  2.6.0: DNE Phase 2, LFSCK MDT-MDT Consistency, UID Mapping & Shared Key

Maintenance Releases (note 1)
•  2.1 stream: 2.1.3, 2.1.4, 2.1.5, 2.1.6, then ad hoc
•  2.4 stream: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4

Increasing Community Participation

[Figure: bar chart of lines of approved code per contributor, one bar per release ([2.1], 2.1, 2.2, 2.3), axis 0 to 100,000 lines]

Contributors shown: Bull, CEA, Cray, DDN, EMC, Fujitsu, LLNL, NICS, ORNL, TACC, Ultrascale, UVT, Whamcloud/Intel, Xyratex, Sun/Oracle

Source: Intel internal statistics related to the lines of approved code per contributor per release.

Lustre 2.3 Feature Highlights

Server stack SMP scaling
•  Performance improvements for multi-core servers

OI scrub
•  Background consistency check

Lustre 2.3 is available today

Lustre 2.3 / 2.4 Highlights

•  SMP Scaling
•  DNE Phase 1
    •  Multiple MDS/MDTs in a single file system
•  Layout lock
    •  Required for HSM; ensures client I/O goes to the proper OST
•  OSD API
•  Improved single client performance
•  4 MB I/O
    •  Larger RPC size improves performance to back-end disk
•  Network Request Scheduler
•  Lustre 2.4.0 released 2Q, 2013

Intel® Enterprise Edition for Lustre*

•  Purpose-built for commercial market
•  Based on fully open sourced Lustre 2.3 core
•  Enhanced with Intel® Manager for Lustre*
•  Worldwide technical support
•  Creating vibrant solution ecosystem and partner network
•  Available from channel partners

Intel® Enterprise Edition for Lustre* software

Timeline: Q2 2013 through Q1 2015

Community Release Highlights

Lustre 2.4 – Q2 2013
•  OSD Restructuring
•  DNE Phase 1
•  4MB I/O RPC

Lustre 2.5 – Q4 2013
•  Hierarchical Storage Management

Lustre 2.6 – Q2 2014
•  DNE Phase 2
•  UID Mapping & Shared Key

Intel® EE for Lustre* software Release Highlights

IEEL – Q3 2013
Based on Lustre 2.3 release and 2.4 clients and Intel features:
•  Lustre file system plus Selected Stability Patches
•  Intel Manager for Lustre* software
•  Hadoop Adapter for Lustre* software
•  Enterprise support enhancements
•  Maintenance Releases

IEEL Version 1.1 – Q1 2014 *
Based on Lustre 2.5 release and anticipated features:
•  Previous IEEL Features, plus:
•  Hierarchical Storage Management
•  Dynamic LNET Configuration
•  Enhanced Alerts & Logging
•  Intel Xeon® Phi Support

IEEL Version 2.0 – Q3 2014 *
Based on Lustre 2.6 release and anticipated features:
•  Previous IEEL Features, plus:
•  CIFS/NFS support
•  Custom Charting & Monitoring
•  Performance Diagnostics & Troubleshooting

*Roadmap is subject to change. Dates and features are based on current expectations and are subject to changes in scheduling, alteration, or removal.

Moving Lustre* Forward

Bringing ‘Big Data’ to HPC

•  Lustre* accelerates Hadoop applications
    •  Global namespace allows all nodes to access all data
    •  Larger capacity and higher I/O
    •  Resource efficient and simpler to manage
•  Fast, shared and easy access to data is critical
•  Open, collaboratively developed software is key
    •  Linux
    •  Lustre
    •  Hadoop
•  Intel is investing in these important technologies

Using Lustre to Improve Hadoop

As HPC moves toward exascale levels, simulations will get larger and more complex

Better tools are needed to analyze ever larger datasets

Lustre and Hadoop form an ideal foundation:
•  Hadoop is the most popular software stack for big data analytics
•  Lustre is the leading file system for HPC

Combined benefits:
•  Easier to manage a single, shared storage solution
•  No data transfer overhead for staging inputs and extracting results
•  No need to partition storage into HPC (Lustre) and Analytics (HDFS)

Using Lustre with Hadoop

•  Hadoop uses pluggable extensions to work with different file systems
•  Lustre is POSIX compliant:
    •  Use Hadoop’s built-in LocalFileSystem class
    •  Uses native file system support in Java
•  Extends the default file system behavior
•  Optimizes the performance of the shuffle phase

[Diagram: classes in package org.apache.hadoop.fs: FileSystem, RawLocalFileSystem, LustreFileSystem]
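Because Lustre is mounted as an ordinary POSIX file system, one way to point Hadoop at it is to make the local file system the default. The Python sketch below only renders such a `core-site.xml` fragment. The property names `fs.defaultFS` (called `fs.default.name` in older Hadoop releases) and `fs.file.impl` are standard Hadoop settings rather than anything taken from these slides, `build_core_site` is a hypothetical helper, and the configuration of the Lustre adapter class itself is not covered here.

```python
import xml.etree.ElementTree as ET


def build_core_site(properties):
    """Render a dict of Hadoop properties as core-site.xml text."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Assumption: the Lustre mount is visible at the same path on every node,
# so file:/// URIs resolve to shared data through the POSIX layer.
POSIX_PROPS = {
    "fs.defaultFS": "file:///",
    "fs.file.impl": "org.apache.hadoop.fs.LocalFileSystem",
}

core_site_xml = build_core_site(POSIX_PROPS)
print(core_site_xml)
```

Job input and output paths would then be plain paths under the Lustre mount point rather than `hdfs://` URIs.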

TestDFSIO Benchmark
•  Tests the raw performance of a file system
•  Write and read very large files (35G each) in parallel
•  One mapper per file, and single reducer to collect stats
•  Embarrassingly parallel, does not test shuffle & sort

[Figure: bar chart of write and read throughput in MB/s (axis 0 to 100) for HDFS and Lustre. Higher is better!]

$$\text{Throughput} = \frac{\sum \text{filesize}}{\sum \text{time}}$$
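The throughput figure used on this slide is the total data moved divided by the total task time. A minimal Python sketch of that ratio follows. The task times are invented illustration values, not measurements from the chart, `aggregate_throughput_mb_s` is a hypothetical helper name, and reading "35G" as 35 * 1024 MB is an assumption.

```python
def aggregate_throughput_mb_s(file_sizes_mb, task_times_s):
    """Sum of file sizes divided by sum of per-task times, in MB/s."""
    if len(file_sizes_mb) != len(task_times_s):
        raise ValueError("need exactly one task time per file")
    total_time_s = sum(task_times_s)
    if total_time_s <= 0:
        raise ValueError("total time must be positive")
    return sum(file_sizes_mb) / total_time_s


# Hypothetical run: four 35 GB files, one mapper each, invented task times.
sizes_mb = [35 * 1024] * 4
times_s = [500, 520, 480, 500]

throughput = aggregate_throughput_mb_s(sizes_mb, times_s)
print(f"{throughput:.2f} MB/s")
```

Because the sums are taken before dividing, a slow task pulls the result down more than it would in a plain average of per-file rates.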

Terasort Benchmark
•  Distributed sort: The primary Map-Reduce primitive
•  Sort 1 Billion records, i.e. approximately 100G
    •  Record: Randomly generated 10 byte key + 90 bytes garbage data
•  Block Size: 128M, maps: 752 @ 4/node, reduces: 16 @ 2/node

[Figure: bar chart of runtime in seconds (axis 0 to 500) for Lustre and HDFS. Less is better!]

Lustre 10-15% Faster
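The data volume quoted for this benchmark follows directly from the record layout. The short Python check below works through that arithmetic and the minimum number of 128M blocks the input would occupy. Treating "128M" as 128 MiB is an assumption, and the variable names are illustrative only.

```python
RECORDS = 1_000_000_000          # "1 Billion records"
RECORD_BYTES = 10 + 90           # 10-byte key + 90 bytes of payload
BLOCK_BYTES = 128 * 1024 ** 2    # assumption: "128M" means 128 MiB

total_bytes = RECORDS * RECORD_BYTES

# Integer ceiling division: smallest number of blocks that holds the input.
min_blocks = (total_bytes + BLOCK_BYTES - 1) // BLOCK_BYTES

print(total_bytes)   # 100000000000, i.e. about 100 GB
print(min_blocks)    # 746
```

The 752 maps reported on the slide sit slightly above this lower bound; the slides do not say how the input was split, so the difference is left unexplained here.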

Exascale and FastForward

Dept. of Energy “FastForward” Program

•  Goal: Deliver Exascale computing before 2020
•  FastForward RFP provides funding for R&D
•  Sponsored by 7 leading US national labs
•  RFP elements were Processor, Memory and Storage
•  Whamcloud-led group won the Storage portion:
    •  HDF Group for HDF5 modifications and extensions
    •  EMC for Burst Buffer manager and I/O Dispatcher
    •  Cray for large scale testing
    •  DDN for versioning object storage

Asynchronous Programming

Bulk synchronous programming
•  Simplifies application development
•  But, is susceptible to jitter
•  Makes strong scaling harder

Asynchronous programming
•  Loosely coupled between processes
•  No waiting at barriers
•  Closes “gaps” provided jitter balances out over time
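The effect of jitter on the two styles can be illustrated with a toy timing model in Python. It assumes an idealized asynchronous run in which processes never wait on each other, and the step times are invented values rather than measurements; the function names are hypothetical.

```python
def bulk_synchronous_time(step_times):
    """A barrier ends every step, so each step costs its slowest process."""
    return sum(max(step) for step in step_times)


def asynchronous_time(step_times):
    """Idealized: no barriers, so each process runs its steps back to back."""
    per_process_totals = [sum(column) for column in zip(*step_times)]
    return max(per_process_totals)


# Rows are steps, columns are processes; times in milliseconds (invented).
# Each process is delayed by jitter once, in a different step.
step_times_ms = [
    [100, 140, 100],
    [130, 100, 100],
    [100, 100, 150],
]

bsp_ms = bulk_synchronous_time(step_times_ms)
async_ms = asynchronous_time(step_times_ms)
print(bsp_ms, async_ms)  # 420 350
```

The gain depends on jitter being spread across processes. If the same process were the slow one in every step, both functions would return the same total, which is the "balances out over time" condition on the slide.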

Transactions

Consistency and Integrity
•  Guarantee required on any and all failures
    •  Foundational component of system resilience
•  Required at all levels of the I/O stack
    •  Metadata at one level is data to the level below

No blocking protocols
•  Non-blocking on each OSD
•  Non-blocking across OSDs

I/O Epochs demark globally consistent snapshots
•  Guarantees all updates in one epoch are atomic
•  Recovery == roll back to last globally persistent epoch
    •  Roll forward using client replay logs for transparent fault handling
•  Cull old epochs when next epoch persistent on all OSDs
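The epoch rules above can be illustrated with a toy in-memory model in Python. This is a sketch under simplifying assumptions, not the FastForward design: `ToyOSD`, its methods and `recover` are hypothetical names, persisting epoch N is taken to cover every earlier epoch on that OSD, and replay logs and epoch culling are left out.

```python
class ToyOSD:
    """One object storage device holding per-epoch updates (toy model)."""

    def __init__(self):
        self.epochs = {}         # epoch number -> {key: value} written in it
        self.persisted_upto = 0  # highest epoch durably stored on this OSD

    def write(self, epoch, key, value):
        self.epochs.setdefault(epoch, {})[key] = value

    def persist(self, epoch):
        self.persisted_upto = max(self.persisted_upto, epoch)

    def rollback(self, epoch):
        """Drop every update that belongs to an epoch newer than `epoch`."""
        self.epochs = {e: u for e, u in self.epochs.items() if e <= epoch}
        self.persisted_upto = min(self.persisted_upto, epoch)

    def read(self, key):
        """Return the newest surviving value for `key`, or None."""
        for e in sorted(self.epochs, reverse=True):
            if key in self.epochs[e]:
                return self.epochs[e][key]
        return None


def globally_persistent_epoch(osds):
    """An epoch is globally persistent only once every OSD has persisted it."""
    return min(osd.persisted_upto for osd in osds)


def recover(osds):
    """Roll all OSDs back to the last globally persistent epoch."""
    epoch = globally_persistent_epoch(osds)
    for osd in osds:
        osd.rollback(epoch)
    return epoch


osds = [ToyOSD(), ToyOSD()]
for osd in osds:
    osd.write(1, "x", "v1")
    osd.persist(1)
osds[0].write(2, "x", "v2")
osds[0].persist(2)            # epoch 2 is durable on only one OSD
osds[1].write(2, "x", "v2")   # written but not persisted before the failure

recovered = recover(osds)
print(recovered, [osd.read("x") for osd in osds])
```

After recovery both OSDs expose the epoch 1 value, so no reader sees an epoch that was only partly applied.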

Architecture/Workflow

Exascale File System

Integrated I/O Stack
•  Epoch transaction model
•  Non-blocking scalable object I/O

HDF5/other schema
•  High level application object I/O model
•  I/O forwarding

I/O Dispatcher
•  Burst Buffer management
•  Impedance match application I/O performance to storage system capabilities

DAOS
•  Conventional namespace
•  Containers for transactional, scalable, object I/O

Learn. Contribute. Join.

Calendar: Fri 14, Sat 15, Sun 16, Mon 17, Tues 18, Wed 19, Thurs 20

•  7:15 – 8:45 AM: EOFS panel “Lustre and Big Data” [ Mark Seager ]
•  Noon – 3:00 PM: Lustre tutorials, and 3:15 to 5:30 PM: Lustre Community BoF [ Congress Center ]
•  4:00: Lustre Community Party [ EOFS booth ]
•  “Ask the Architect” Eric Barton, Intel booth #350. Mo: 7 PM, Tues/Wed: 10 AM

Legal Disclaimers

Technical Collateral Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. 
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm

Roadmap Notice All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change without notice

Intel’s Technical Computing: Built for Breakthroughs

What will be your Breakthrough? Delivering essential HPC solutions at every scale

Why? Ingenuity

Programmability Longevity