
Using the Mass Storage System HPSS

Presented by:

Mitchell Griffith Oak Ridge Leadership Computing Facility (OLCF)


2

HPSS (archival storage)

• What is HPSS?
  – HPSS is software that manages petabytes of data on disk and robotic tape libraries. HPSS provides highly flexible and scalable hierarchical storage management that keeps recently used data on disk and less recently used data on tape.
  – Hierarchical storage system
    • Disk cache
    • Tape backend
    • Core Server, DB2 Metadata, Tape Control Software

• References
  – http://www.hpss-collaboration.org
  – http://www.hpss-collaboration.org/documents/hpss732/install_guide.pdf
  – http://www.mgleicher.us


3

HPSS Layout

[Diagram: HPSS shared between OLCF/NCCS and NICS]

HPSS is shared between the OLCF and NICS.

If you need to move data between centers, contact the NICS or OLCF helpline.

Typical downtimes are Tuesday 8 AM – 11 AM.


4


5

HPSS Terms

• Storage Class
  – Collection of storage devices
  – Either disk or tape (not a combination)

• Class of Service (COS)
  – Collection of disks and tapes
  – Sets the hierarchy levels for the storage classes
  – Top level is disk, bottom levels are tape


6

HPSS Terms

• Migrate
  – Copy the file from a higher storage class to a lower storage class (migrate from disk to tape)

• Stage
  – Copy the file from a lower storage class to a higher storage class (stage from tape to disk)

• Purge
  – Remove the file from the top-level storage class (frees disk cache)
  – Does NOT delete the file


7

HPSS Layout

• COS
  – Single copy COS (lscos)

COS    Name       Min                    Max
5081   Xsmall     0                      131,071 (128K)
6001   Small      131,072 (128K)         16,777,215 (16M)
6002   Medium     16,777,216 (16M)       536,870,911 (512M)
6054   Large_T    536,870,912 (512M)     8,589,934,591 (8G)
6056   X-Large_T  8,589,934,591 (8G)     281,474,976,710,656 (256T)


8

HPSS Layout

• Disk Cache
  – Disk cache is striped (multiple disks, faster device)
  – Upgraded disk cache (660TB in disk cache)
  – (12) disk movers in production

• Tape
  – (17) production movers

• Future Work
  – Planning on updating hardware
  – Adding additional movers
  – Faster network backplane


9

HPSS Layout

• (6) SL8500 tape libraries
  – 10,000 tape slots per silo

• 40,000+ tapes in use
  – Tape capacities: 500GB, 1TB, 5TB, 8.5TB

• 112 tape drives
  – 16 T10K-A
  – 60 T10K-B
  – 36 T10K-C


10

HPSS Interfaces

HSI and HTAR are the only way to interface with HPSS

• HSI
  – Similar to ftp
  – Authentication (keytab or combo)
  – Can be interactive or non-interactive
  – Preferred: only 1 stream at a time
  – Batch options allow HPSS to retrieve files more efficiently

• HTAR
  – Acts like a wrapper around tar for HPSS
  – Faster than creating a tar file and then putting it into HPSS
  – Good for file aggregation
  – Preferred: only 1 stream at a time
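A minimal sketch of the non-interactive form, where the command is passed on the command line rather than typed at the prompt; the file and directory names below are illustrative, not taken from the slides:

$ hsi "put mydata.dat : /home/$USER/mydata.dat"   # archive one file and exit
$ hsi "get mydata.dat : /home/$USER/mydata.dat"   # retrieve it again later
$ htar cf mydata.tar ./mydir                      # htar builds the archive directly in HPSS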


11

HSI

• For a complete list of commands:
  – http://www.mgleicher.us/GEL/hsi/hsi_reference_manual_2/hsi_commands/

• Most commonly used commands:
  – cd, chgrp, chmod, chown, du
  – get, ln, ls, mkdir, mv
  – put, rm, rmdir, pwd
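A short sketch stringing a few of these commands together non-interactively; the directory and file names are illustrative:

$ hsi "mkdir proj1"                           # create a directory in your HPSS home
$ hsi "put results.tar : proj1/results.tar"   # archive a local file into it
$ hsi "ls -l proj1"                           # confirm the file landed and check its size
$ hsi "rm proj1/results.tar"                  # remove the file when it is no longer needed
$ hsi "rmdir proj1"                           # then remove the now-empty directory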


12

HSI get and put example

Syntax: local_file_name : hpss_file_name

K:[/home/mitchell]: put 1MB : /home/mitchell/test/1mb

put '1MB' : '/home/mitchell/test/1mb' ( 1048576 bytes, 30173.8 KBS (cos=6001))

K:[/home/mitchell]: get test.file : /home/mitchell/test/1mb

get 'test.file' : '/home/mitchell/test/1mb' (2014/02/06 01:00:35 1048576 bytes, 4965.3 KBS )


13

htar examples

mitchell@krakenpf6:/lustre/scratch/mitchell> htar cf backup.tar .

HTAR: HTAR SUCCESSFUL

mitchell@krakenpf6:/lustre/scratch/mitchell> htar tvf backup.tar

HTAR: drwx------ mitchell/tsg01 0 2014-02-06 01:26 ./

HTAR: drwxr-xr-x mitchell/tsg01 0 2014-02-06 01:24 ./a/

HTAR: -rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:23 ./a/a.file

HTAR: -rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:24 ./a/b.file

HTAR: -rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:24 ./a/c.file

HTAR: -rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:24 ./a/d.file

HTAR: drwxr-xr-x mitchell/tsg01 0 2014-02-06 01:24 ./d/

HTAR: -rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:23 ./d/a.file

HTAR: -rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:24 ./d/b.file

HTAR: -rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:24 ./d/c.file

HTAR: -rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:24 ./d/d.file

HTAR: -rw------- mitchell/tsg01 256 2014-02-06 01:26 /tmp/HTAR_CF_CHK_28882_1391667988

HTAR: Listing complete for backup.tar, 9 files 12 total objects

HTAR: HTAR SUCCESSFUL


14

htar examples

mitchell@krakenpf6:/lustre/scratch/mitchell> hsi get backup.tar

mitchell@krakenpf6:/lustre/scratch/mitchell> tar tvf backup.tar

drwx------ mitchell/tsg01 0 2014-02-06 01:26 .

drwxr-xr-x mitchell/tsg01 0 2014-02-06 01:24 ./a

-rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:23 ./a/a.file

-rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:24 ./a/b.file

-rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:24 ./a/c.file

-rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:24 ./a/d.file

drwxr-xr-x mitchell/tsg01 0 2014-02-06 01:24 ./d

-rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:23 ./d/a.file

-rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:24 ./d/b.file

-rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:24 ./d/c.file

-rw-r--r-- mitchell/tsg01 1048576 2014-02-06 01:24 ./d/d.file

tar: Removing leading `/' from member names

-rw------- mitchell/tsg01 256 2014-02-06 01:26 /tmp/HTAR_CF_CHK_28882_1391667988


15

htar examples

mitchell@krakenpf6:/lustre/scratch/mitchell/test> htar xf backup.tar

HTAR: HTAR SUCCESSFUL

mitchell@krakenpf6:/lustre/scratch/mitchell/test> ls

a d

mitchell@krakenpf6:/lustre/scratch/mitchell/test> rm -fr a d

mitchell@krakenpf6:/lustre/scratch/mitchell/test> ls -l

total 0

mitchell@krakenpf6:/lustre/scratch/mitchell/test> htar xvf backup.tar ./d/a.file

HTAR: x ./d/a.file, 1048576 bytes, 2049 media blocks

HTAR: Extract complete for backup.tar, 1 files. total bytes read: 1,049,088 in 0.006 seconds (188.955 MB/s )

HTAR: HTAR SUCCESSFUL

mitchell@krakenpf6:/lustre/scratch/mitchell/test> ls -l d/a.file

-rw-r--r-- 1 mitchell tsg01 1048576 2014-02-06 01:23 d/a.file

mitchell@krakenpf6:/lustre/scratch/mitchell/test>


16

HPSS Interfaces

• Access time can be slow if accessing from tape
  – HSI get command:

K:[/home/USER]: get a.pdf

Scheduler: retrieving file(s)

    • Waiting on a tape drive (resource contention)
  – With HTAR, you will not see anything

• HTAR can give you just a standard tar file
  – hsi get file.tar
  – $ tar xf file.tar
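A brief sketch of that retrieval path, reusing the backup.tar archive from the earlier HTAR slides; the scratch path is illustrative:

$ cd /lustre/scratch/$USER
$ hsi "get backup.tar"     # may pause at "Scheduler: retrieving file(s)" while a tape is staged
$ tar xf backup.tar        # any standard tar can unpack an HTAR-created archive
                           # (it also contains htar's HTAR_CF_CHK_* consistency-check member)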


17

HPSS Errors (possible tape issue)

[/home/USER]: get test.tar

Scheduler: retrieving file(s)

HPSS EIO error, will retry in 10 seconds   [Done: 0 Failed: 0]
HPSS EIO error, will retry in 60 seconds   [Done: 0 Failed: 0]
HPSS EIO error, will retry in 360 seconds  [Done: 0 Failed: 0]
HPSS EIO error, aborting                   [Done: 0 Failed: 0]

*** hpss_Open: I/O error [-5: HPSS_EIO]

/home/USER/10_5.5A.tar

*** get: Error -1 on transfer. /lustre/USER/test.tar from /home/USER/test.tar


18

Troubleshooting Common Codes

• HPSS_EPERM (-1): permission issue; the user does not have permission to perform the task
• HPSS_ENOSPACE (-28): HPSS does not have enough space (disk or tape) to perform the task
• HPSS_EIO (-5): I/O error; could be anything from HPSS to the network to the Lustre file system
• HPSS_ECONN (-50): connection issue
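When scripting transfers it helps to check hsi's exit status against these failures; a minimal retry sketch, assuming hsi exits nonzero on error (file names are illustrative):

#!/bin/bash
# Retry a retrieval a few times; HPSS_EIO and HPSS_ECONN are often transient
# (tape drive contention, network hiccups). Give up after three attempts.
for attempt in 1 2 3; do
    hsi "get test.tar : /home/$USER/test.tar" && break
    echo "hsi get failed (attempt $attempt), retrying in 60s..." >&2
    sleep 60
done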


19

Authentication Issue

> hsi
result = -11000, errno = 0
Unable to authenticate user with HPSS.
result = -11000, errno = 9
Unable to setup communication to HPSS...
*** HSI: error opening logging
Error - authentication/initialization failed

Workaround
  – hsi -A combo (only if you have an OTP)


20

HSI Batch

K:[/home/mitchell]: in hpss.marker
in hpss.marker
get <<MARKER
get 'b1.tar' : '/home/mitchell/b1.tar' (2014/02/06 01:52:41 8395776 bytes, 19723.2 KBS )
get 'b4.tar' : '/home/mitchell/b4.tar' (2014/02/06 01:54:48 8395776 bytes, 400446.2 KBS )
get 'b2.tar' : '/home/mitchell/b2.tar' (2014/02/06 01:52:46 8395776 bytes, 19706.4 KBS )
get 'b3.tar' : '/home/mitchell/b3.tar' (2014/02/06 01:52:51 8395776 bytes, 13278.8 KBS )
K:[/home/mitchell]:

• When retrieving multiple files, it is strongly recommended to use the MARKER option
• This schedules the retrievals in an optimal way so as to minimize HPSS tape mounts

$ cat hpss.marker
get <<MARKER
b1.tar
b2.tar
b3.tar
b4.tar
MARKER
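A sketch of building such a command file from a plain list of archive names and running it in one call; the list file name is illustrative:

$ cat files_to_fetch.txt
b1.tar
b2.tar
b3.tar
b4.tar
$ { echo "get <<MARKER"; cat files_to_fetch.txt; echo "MARKER"; } > hpss.marker
$ hsi "in hpss.marker"     # hsi reads the file and schedules the gets to minimize tape mounts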


21

HPSS Tips

• HPSS is archival storage
  – Many very small files are bad for HPSS (metadata overhead)
  – Files that are too large are bad as well (the disk cache fills up)
  – Optimal file size for HPSS is between 2GB and 256GB
  – Use HTAR to archive several smaller files into one (see the sketch below)
    • A member file of an HTAR archive cannot be > 64GB

• Submit to the HPSS queue (schedule the transfers)
  – http://www.nics.tennessee.edu/computing-resources/hpss/queue
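A sketch of that aggregation advice in practice; the directory and archive names are illustrative:

$ du -sh run042/                # check the total size first: aim for 2GB–256GB per archive
$ htar cf run042.tar ./run042   # bundle the many small files into one archive in HPSS
$ htar tvf run042.tar           # verify the member listing (each member must stay under 64GB)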


22

HPSS Tips

• For additional information, try the transfer with the debug option
  – hsi -d5 "get /dev/null : file.tar"
  – debug 5; get /dev/null : file.tar

• When moving data to Lustre, be mindful of the Lustre striping (see the sketch below)
  – Rule of thumb for NICS: 1 stripe for every 50GB
  – http://www.nics.tennessee.edu/node/311

• Once data is on scratch, use standard transfer methods:
  – http://www.nics.tennessee.edu/computing-resources/data-transfer
  – https://www.olcf.ornl.gov/kb_articles/transferring-data/?nccssystems=DTN
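A sketch of applying the striping rule of thumb before a large retrieval, using the standard Lustre lfs tool; the directory, size, and archive name are illustrative:

# A ~200GB archive at 1 stripe per 50GB suggests a stripe count of 4.
$ mkdir -p /lustre/scratch/$USER/restore
$ lfs setstripe -c 4 /lustre/scratch/$USER/restore   # new files created here get 4 stripes
$ cd /lustre/scratch/$USER/restore
$ hsi "get big_run.tar"                              # the retrieved file inherits the striping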


23

Questions and Contact Information

• OLCF: [email protected]
• NICS and XSEDE users: [email protected]
• NICS and non-XSEDE users: [email protected]