
NASA Center for Computational Sciences

Hierarchical Storage Management at the NASA Center for Computational Sciences: From UniTree to SAM-QFS

[email protected]
Science Computing Branch
Earth and Space Science Computing Division
NASA Goddard Space Flight Center, Code 931

IEEE MSST 2004, NASA Goddard, Apr. 14, 2004


NCCS’s Mission and Customers

• NASA Center for Computational Sciences (NCCS) at NASA Goddard Space Flight Center

• Mission: Enable Earth and space sciences research (via data assimilation and computational modeling) by providing state-of-the-art facilities in
  – High Performance Computing (HPC)
  – Mass Storage
  – High-speed Networking
  – HPC Computational Science Expertise

• Earth and space science customers:
  – Seasonal-to-interannual climate and ocean prediction
  – Global weather and climate data sets incorporating data assimilated from numerous land-based and satellite-borne instruments


NCCS Resources

• High Performance Compute Engines
  – HP Compaq ES45 AlphaServer SC (1392p)
  – SGI Origin 3800s (608p total)
  – ~4.5 TFLOPS peak total

• Mass Storage Systems and Servers
  – Mass Data Storage and Delivery System (MDSDS): was UniTree, now SAM-QFS, on a Sun Fire 15K, 2 domains, Shared QFS "HA"
    • ~355 TiB*, ~12M files, DDN S2A 8000 disk arrays
  – SGI DMF on an Origin 3800 server, to be converted to SAM-QFS via "DMS" (Data Management System, based on SRB)
    • ~350 TiB*, ~14M files, HDS 9960 and SGI TP 9x00 disk arrays

• Tape Libraries, Intra-Machine/Device Networks, Switches
  – Nine STK Powderhorn tape libraries (~51,000 slots)
    • STK 9840C, 9940B, and 9840A tape drives
  – Gigabit Ethernet: Foundry BigIron 15K
  – 2-Gb Fibre Channel: Brocade SilkWorm 12K and 3900s

* Unique data (does not include risk-mitigation duplicates)


NCCS Mass Storage Growth

[Chart: NCCS Mass Data Storage and Delivery System (MDSDS) growth by fiscal year, 1999-2003. Left axis: total data (TB), 0-700; right axis: millions of files, 0-14. Includes risk-mitigation duplicated data.]


NCCS Projected Requirements

• Earth and space science drivers, all increasing:
  – Model resolution
  – Number of assimilated observations
  – Number of concurrent model ensembles

• Total data held (including risk-mitigation duplicates):
  – Current: ~1.5 PiB
  – End of FY 2005: ~6 PiB
  – End of FY 2007: ~19 PiB

• Files (unique; a quick consistency check follows this list):
  – Current: ~25 million, growing ~33% per year
  – FY 2005: ~44 million
  – FY 2007: ~78 million
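The projected file counts are consistent with simple compounding; a quick check in Python (a sketch assuming the ~33%/year rate compounds and that each milestone sits two growth steps after the last, an interpretation of the slide's figures, not an NCCS formula):

    # Consistency check for the unique-file projections above.
    # Assumption: ~33%/yr compounding, two steps between milestones.
    base = 25e6                    # current unique files, ~25 million
    fy2005 = base * 1.33 ** 2      # ~44.2 million
    fy2007 = fy2005 * 1.33 ** 2    # ~78.2 million
    print(f"FY2005 ~{fy2005 / 1e6:.0f}M, FY2007 ~{fy2007 / 1e6:.0f}M")
    # -> FY2005 ~44M, FY2007 ~78M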


HSM Evaluation (Spring 2002)

• High-end HSM vendors’ responses to 60-some technical questions:
  – SGI’s DMF
  – Sun’s SAM-QFS
  – Legato’s DiskXtender (UniTree Central File Manager)
  – IBM’s HPSS

• NCCS and CSC team evaluated:
  – Performance
  – Integrity, high availability
  – Scalability, modularity, flexibility
  – Balance (avoiding bottlenecks)
  – Manageability


UniTree to SAM-QFS Migration

• Employed Sun’s SAM migration toolkit and migration libraries written by Instrumental, Inc.
  – Legacy UniTree directory and file “inode” info was harvested, then inserted into the SAM-QFS file systems
    • Legacy UniTree: ~300 TB, ~11M files
    • Only 5 days of downtime, including QC checks and server recabling (~11M files, ~300K directories)
  – Transparent user retrieval of legacy files: SAM sees UniTree files as “stranger” media, so it reads them via the migration library, then archives them to SAM tapes
    • This approach requires the UniTree system to read the legacy files/media
  – Background migration: a Perl script (D. Duffy et al.) pre-stages UniTree files tape by tape for efficiency (sketched below)
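A minimal sketch of that tape-by-tape background loop, in Python rather than the production Perl; the unitree_stage command and the /samqfs mount point are invented placeholders, and the real script's details are not public:

    # Sketch of the background migration driver. Hypothetical names:
    # "unitree_stage" stands for whatever stages a file to the legacy
    # UniTree disk cache; /samqfs is the SAM-QFS mount point.
    import subprocess
    from collections import defaultdict

    def group_by_tape(inventory):
        """inventory: iterable of (path, tape_vsn) pairs harvested from
        the legacy UniTree metadata."""
        by_tape = defaultdict(list)
        for path, vsn in inventory:
            by_tape[vsn].append(path)
        return by_tape

    def migrate(inventory):
        for vsn, paths in sorted(group_by_tape(inventory).items()):
            # Pre-stage every file held on this legacy tape in one pass,
            # so the tape streams once instead of seeking per file.
            for path in paths:
                subprocess.run(["unitree_stage", path], check=True)
            # Reading each file through SAM-QFS then makes SAM fetch it
            # via the migration library ("stranger" media) and archive
            # it onto SAM-managed tape.
            for path in paths:
                with open("/samqfs" + path, "rb") as f:
                    while f.read(1 << 20):  # drain in 1 MiB chunks
                        pass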


Strong Benefits in NCCS’s Current SAM-QFS Configuration

• Performance observed in daily use: over 10 TB/day archived (roughly 115 MB/s sustained) while handling 2+ TB/day of user traffic
• Shared QFS works well to make the underlying cluster appear as a single entity
• Using an “HA flip-flop” for significant software upgrades has greatly reduced the downtime they require
• A test cluster system has been invaluable
• Restoring files after accidental deletions is much simpler and faster than with the previous solution


Lessons Learned

• The complexities of clustered HSM systems make configuring automated high-availability software challenging
• The “Release Currency Conundrum”:
  – A software release’s newest features will be the most immature
  – Keeping current on OS and HSM patches can help avoid significant pitfalls
• Make “risk mitigation” duplicate tape copies (see the archiver.cmd sketch after this list)
• Keep your expectations of vendors high
  – Great support/cooperation from Sun in getting “Traffic Manager” (a.k.a. MPxIO) to work with a third-party Fibre Channel RAID array (DataDirect Networks S2A 8000)
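In SAM-QFS the duplicate copies are an archiver setting: each archive set can specify multiple copies with their own archive ages and tape pools. A minimal, illustrative archiver.cmd fragment (file system name and ages invented, not the NCCS production configuration):

    # archiver.cmd -- illustrative two-copy fragment
    fs = samfs1
    allfiles .
        1 4h    # copy 1: archive files once they are 4 hours old
        2 1d    # copy 2: risk-mitigation duplicate, a day later; point
                # it at a separate tape pool (via vsns) so the two
                # copies never share media

Keeping the second copy on separate media, ideally in a separate library, is what makes the duplicates genuine risk mitigation.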


Background Detail


The Large Team

Ellen Salmon, Adina Tarshish, Nancy Palm, Tom Schardt
NASA Center for Computational Sciences (NCCS)
NASA Goddard Space Flight Center, Code 931
Greenbelt, Maryland
Email: [email protected]

Sanjay Patel, Marty Saletta, Ed Vanderlan, Mike Rouch, Lisa Burns, Dr. Daniel Duffy, Roman Turkevich
Computer Sciences Corporation, c/o NASA GSFC Code 931
Greenbelt, Maryland
Email: [email protected]

Robert Caine, Randall Golay, Craig Flaskerud, Linda Radford, Matt Hatley
Sun Microsystems, Inc.
7900 Westpark Drive
McLean, VA 22102
Email: [email protected]

Jeff Paffel, Nathan Schumann
Instrumental, Inc.
2748 East 82nd Street
Bloomington, MN
Email: [email protected]


NCCS MDSDS SAM-QFS User Transfer Traffic

[Chart: GiB stored and GiB retrieved per day (0-2500 GiB/day), weekly ticks from 9/16/03 through 4/6/04.]

From 16 Sep. 2003 to 12 Apr. 2004:
• 119.5 TiB stored (2.6 Mfiles)
• 37.5 TiB retrieved (1.0 Mfiles)
• 157.1 TiB transferred total (3.7 Mfiles)


NCCS MDSDS SAM-QFS Daily Tape Write Activity

[Chart: daily tape writes, TiB/day (0-9, left axis) and Kfiles/day (0-450, right axis), from 11/1/03 through 4/9/04; series: total TiB, total Kfiles.]

From 1 Nov. 2003 to 12 Apr. 2004:
• 605.5 TiB written
• 16.4 million files written
(includes risk-mitigation duplicates)


The Future (1)

• Further optimize data placement on tape to favor data retrieval (see the sketch after this list)
  – Open issue: adequately characterizing retrievals
• Explore SATA disk as the most nearline part of the HSM hierarchy
  – The NCCS data retrieval profile makes this somewhat problematic
  – But it becomes more attractive as time-to-first-data rises on growing-capacity tape
  – Not expected to replace tape any time soon
• National Lambda Rail participation: enable large-scale, long-distance science team collaboration
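One way to make retrieval-friendly placement concrete, assuming retrievals cluster by project and by time (precisely the characterization question raised above): bin files into archive sets keyed on those attributes, so files likely to be recalled together share tapes. A hypothetical Python sketch, not an NCCS tool:

    # Hypothetical placement heuristic: co-locate files likely to be
    # retrieved together (same top-level project, same month).
    from collections import defaultdict

    def archive_set(path, month):
        """Key on project directory plus month written; both stand in
        for whatever retrieval characterization the site adopts."""
        parts = path.strip("/").split("/")
        project = parts[0] if parts else "misc"
        return (project, month)

    def plan_placement(files):
        """files: iterable of (path, 'YYYY-MM') tuples.  Returns
        {set_key: [paths]}; each key maps to its own tape pool."""
        sets = defaultdict(list)
        for path, month in files:
            sets[archive_set(path, month)].append(path)
        return sets

    # Example: both March files from /climate land in the same set.
    print(plan_placement([("/climate/run42/a.nc", "2004-03"),
                          ("/climate/run42/b.nc", "2004-03"),
                          ("/ocean/run7/c.nc", "2004-01")]))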


The Future (2): Data Management System

• Goal: help users manage their data
• Based on the San Diego Supercomputer Center’s Storage Resource Broker (SRB) middleware; the system was developed by Halcyon Systems, Inc.
• Replaces file system access
• Allows for extremely useful metadata and queries for monitoring and management (illustrated below), e.g.:
  – File content and provenance
  – File expiration
• Allows for transparent (to the user) migration between underlying HSMs
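To illustrate the kind of metadata query such a system enables, a toy Python/SQLite example (the schema and records are invented; SRB’s actual MCAT catalog is not shown):

    # Toy DMS-style metadata catalog: find files past their expiration
    # so they can be reported, migrated, or purged transparently.
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("""CREATE TABLE files
                   (path TEXT, owner TEXT, provenance TEXT, expires TEXT)""")
    con.executemany("INSERT INTO files VALUES (?, ?, ?, ?)", [
        ("/dms/run42/out.nc", "usera", "model run 42", "2004-06-01"),
        ("/dms/run43/out.nc", "userb", "model run 43", "2005-01-01"),
    ])
    for path, owner in con.execute(
            "SELECT path, owner FROM files WHERE expires < ?",
            ("2004-12-31",)):
        print(path, owner)  # -> /dms/run42/out.nc usera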




Standard Disclaimers and Legalese Eye Chart

• All trademarks, logos, or otherwise registered identification markers are owned by their respective parties.

• Disclaimer of Liability: With respect to this presentation, neither the United States Government nor any of its employees makes any warranty, express or implied, including the warranties of merchantability and fitness for a particular purpose, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights.

• Disclaimer of Endorsement: Reference herein to any specific commercial products, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government. In addition, NASA does not endorse or sponsor any commercial product, service, or activity.

• The views and opinions of the author(s) expressed herein do not necessarily state or reflect those of the United States Government and shall not be used for advertising or product endorsement purposes.