Scientific Computing Division
Trends and Directions of Mass Storage in the Scientific Computing Arena
CAS 2001
Gene Harano
National Center for Atmospheric Research
CAS 2001 – October 30, 2001
Copyright © 2001 University Corporation for Atmospheric Research
Scientific Computing Division
Vision
• How do we accomplish that vision?
• Handling large datasets – Analysis and Visualization
• Shared File Systems and Cache Pools
• Middleware and layering
• Management tools
• Emerging Technologies
• (To name a few)
Large Datasets
• The NCAR MSS was originally a tape-based archive.
• NCAR MSS average file size is 35 MB (11 M files); small due to historical restrictions (single-volume datasets, model history files) and a large number (25%) of files < 1 MB (user backups).
• Single TB-sized files are common for visualization and analysis.
• Currently these large files are sliced up prior to landing in the archive.
• Access is generally sequential, with some random access.
Large Datasets
• Are tape-based archives obsolete?
• No, but there is a need to reevaluate the entire storage structure at NCAR.
  • Cache pools
  • Data warehouses, data sub-setting
• The NCAR MSS is being treated as a shared file system rather than an archive.
Shared File System
• Heterogeneous
• High-Performance
• High-Capacity
• Doesn’t yet exist.
[Diagram labels: Shared Data; Web/GRID/servers; Programmatic; Command Line]
Cache Pools
• External to the archive
  • Minimize archive activity
  • Temporary data stays out of the archive
  • Customized for a smaller set of associated data
• Internal to the archive
  • Minimize tape activity
  • Improve response time
  • Federate and distribute
  • Repackage small files for tape storage under system control
Terascale Modeling & Analysis
[Diagram labels: MSS Proxy; Data analysis; GPFS Shared File System; Advanced Research Computing System (IBM SP)]
Terascale Analysis & Visualization
[Diagram labels: Vislab; MSS Proxy; Data analysis; Storage Area Network Shared File System]
Data Provisioning & Access
[Diagram labels: CDP/ESG Data Processor; DSS server; Storage Area Network Shared File System; Unidata, DODS; MSS Proxy]
Internal Cache Pools
• NCAR MSS event log modeling (April 2000 – April 2001) – looking at tape activity
• 20 TB cache pool – can be federated and distributed
• 30-day average cache residency
• 70% reduction in tape read-backs
• Greatly enhanced response time
• Reduce the amount of tape resources or redefine their use.
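Event-log modeling of this kind can be sketched as a replay of logged reads and writes through a simulated disk cache. The sketch below is only an illustration under stated assumptions: the log record layout, the function name `replay_events`, and the policy (residency-based expiry plus least-recently-used eviction) are hypothetical and are not taken from the NCAR study.

```python
from collections import OrderedDict

def replay_events(events, capacity_bytes, residency_days=30.0):
    """Replay (day, op, file_id, size_bytes) events through a simulated cache.

    Returns (total_reads, tape_reads). Without a cache every read is a tape
    read, so total_reads is the baseline to compare against.
    Assumed model: events are sorted by day; reads and writes both place the
    file in the cache; files expire after `residency_days` without access;
    least recently used files are evicted when the pool is full.
    """
    cache = OrderedDict()  # file_id -> (last_access_day, size_bytes)
    used = 0
    total_reads = 0
    tape_reads = 0
    for day, op, file_id, size in events:
        # Expire entries that have outlived the residency window.
        while cache:
            oldest_id, (last_day, old_size) = next(iter(cache.items()))
            if day - last_day <= residency_days:
                break
            del cache[oldest_id]
            used -= old_size
        if op == "read":
            total_reads += 1
            if file_id not in cache:
                tape_reads += 1  # cache miss: the file is read back from tape
        # Insert or refresh the file, then evict LRU entries if over capacity.
        if file_id in cache:
            used -= cache.pop(file_id)[1]
        if size <= capacity_bytes:
            cache[file_id] = (day, size)
            used += size
            while used > capacity_bytes:
                _, (_, evicted_size) = cache.popitem(last=False)
                used -= evicted_size
    return total_reads, tape_reads
```

The reduction in tape read-backs for a given pool size is then `1 - tape_reads / total_reads`, which can be evaluated for several candidate capacities.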
Middleware and Layering
• An archive performs 2 basic functions
  • Reliably storing data
  • Returning data on demand
• Data analysis, data mining, data assimilation, distributed data servers, etc. are functions utilizing middleware that sits on top of an archive and should be implemented independently of the underlying archive.
Role of an archive
Middleware and Layering
• Separate archive functionality from
  • Visualization
  • Data servers
  • Data warehousing, data mining, data subsetting
  • Web and Grid access
  • Etc.
• Maximally enables the use of COTS
• Allows (transparent) replacement of components as needed
• Fill the gaps with custom software
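The layering argument can be illustrated with a minimal sketch: an archive interface limited to the two basic functions, and a piece of middleware written only against that interface so the back end can be replaced transparently. All class and method names here (`Archive`, `InMemoryArchive`, `SubsettingService`) are hypothetical and are not part of the NCAR MSS.

```python
from abc import ABC, abstractmethod

class Archive(ABC):
    """The two basic archive functions: store data, return it on demand."""

    @abstractmethod
    def store(self, name, data):
        """Reliably store `data` (bytes) under `name`."""

    @abstractmethod
    def retrieve(self, name):
        """Return the bytes previously stored under `name`."""

class InMemoryArchive(Archive):
    """Stand-in back end; a tape or disk archive would offer the same two calls."""

    def __init__(self):
        self._files = {}

    def store(self, name, data):
        self._files[name] = bytes(data)

    def retrieve(self, name):
        return self._files[name]

class SubsettingService:
    """Middleware example: data subsetting that depends only on the interface."""

    def __init__(self, archive):
        self._archive = archive

    def read_range(self, name, start, length):
        return self._archive.retrieve(name)[start:start + length]
```

Because `SubsettingService` never touches anything beyond `store` and `retrieve`, swapping `InMemoryArchive` for another implementation requires no change to the middleware.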
Future Data Services
[Diagram labels: Visualization; Web; Digital Libraries, Data Servers; Data Cataloging/Searching; Data Analysis/Mining/Assimilation; File Services; Cache Pools; Data Storage; Data Storage; NCAR MSS Archive]
Management Tools
• There is a need for better user and system management tools as MSS capacity scales.
• How does a single user manage 1 million files?
• How does an MSS administrator dynamically tune a system, predict workloads, and find and correct bottlenecks?
Management Tools
• Defining new roles
  • Single ordinary user
  • MSS superuser
  • As users come and go, there is a need for:
    • Project superuser (new)
    • Division data administrator (new)
• Web-based metadata user tools
  • List, search, catalog holdings – metadata mining
  • Remove unwanted files
NCAR MSS tools
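As a rough illustration of what a metadata tool for a user with very many files might do, the sketch below summarizes holdings by directory prefix and lists files that have not been accessed recently. The record layout `(path, size_bytes, last_access_day)` and both function names are assumptions made for this example; they do not describe actual NCAR MSS tools.

```python
from collections import defaultdict

def summarize_holdings(records, depth=2):
    """Group (path, size_bytes, last_access_day) records by directory prefix.

    Returns {prefix: (file_count, total_bytes)}, where the prefix is the
    first `depth` directory components of each path.
    """
    summary = defaultdict(lambda: [0, 0])
    for path, size, _last_access in records:
        dirs = path.strip("/").split("/")[:-1]  # drop the file name itself
        prefix = "/" + "/".join(dirs[:depth])
        summary[prefix][0] += 1
        summary[prefix][1] += size
    return {prefix: tuple(totals) for prefix, totals in summary.items()}

def stale_files(records, today, max_idle_days):
    """List paths not accessed within `max_idle_days`: candidates for removal."""
    return [path for path, _size, last in records if today - last > max_idle_days]
```

A per-prefix summary keeps the listing to a manageable size, and the stale list gives the user a starting point for removing unwanted files.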
Management Tools
• From the system perspective – utilize data warehousing and data mining techniques
  • System modeling using event logs
    • Capacity planning
    • Identify bottlenecks
  • Operational monitoring
    • Track errors, identify trends (media problems)
    • Intrusion detection
    • Dynamic system tuning
NCAR MSS tools
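Tracking errors to spot media problems could look something like the sketch below, which flags tape volumes that fail repeatedly and on more than one drive (an error that follows the cartridge rather than the drive hints at the media). The event layout, thresholds, and function name are hypothetical choices for illustration only.

```python
from collections import defaultdict

def suspect_volumes(error_events, min_errors=3, min_drives=2):
    """Flag volumes with repeated errors seen on several distinct drives.

    `error_events` is an iterable of (day, volume_id, drive_id) tuples.
    Returns [(volume_id, error_count), ...] sorted with the worst first.
    """
    errors = defaultdict(int)
    drives = defaultdict(set)
    for _day, volume, drive in error_events:
        errors[volume] += 1
        drives[volume].add(drive)
    flagged = [
        (volume, count)
        for volume, count in errors.items()
        if count >= min_errors and len(drives[volume]) >= min_drives
    ]
    return sorted(flagged, key=lambda item: (-item[1], item[0]))
```

Swapping the roles of volume and drive in the same counting scheme would point at suspect drives instead.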
Emerging Technologies
• Data Path
• Tape
• Holographic Storage
• Probe-Based MEMS
• High-Density Rosetta (analog)
Data Path
• HIPPI in use today in the NCAR archive
• Fibre Channel will replace our HIPPI in the near term
  • FC SAN for RAID cache pools
  • FC SAN for tape sharing
• Others
  • iSCSI
  • FC over IP
  • InfiniBand
Tape
[Chart: data rate (MB/sec, 0 to 40) versus native cartridge capacity (GB, 0 to 100) for linear and helical tape formats. Drives shown: 3480/90, 3490E, 3570, 3570C, 3590, 3590E, 9490EE, 9840, 9840B, 9940, DLT-4000, DLT-7000, SDLT, Ultrium, Accelis, Mammoth, Mammoth 2, AIT, AIT-2, DTF, SD-3. Annotations: 2001; 2H02; 200 GB, 1Q02; 500 GB, 2003; Opt, 2003, 1 TB; 1 TB, 60 MB, 2004.]
Tape
• To be competitive with magnetic disk, magnetic tape capacity must grow 10x every 5 years.
• Achieved by a combination of increased areal density and longer (and possibly wider) tape.
(from a storage vendor)
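To put the 10x-in-5-years figure in yearly terms, the short calculation below converts it to a constant compound growth factor. It assumes smooth year-over-year growth, which is a simplification; the function name is only illustrative.

```python
def annual_growth_factor(total_factor, years):
    """Constant yearly multiplier that compounds to `total_factor` over `years`."""
    return total_factor ** (1.0 / years)

factor = annual_growth_factor(10, 5)
print(f"{factor:.3f}x per year, about {100 * (factor - 1):.0f}% annual growth")
# prints "1.585x per year, about 58% annual growth"
```

In other words, the vendor's target corresponds to capacity growing by a bit under 60% each year.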
Tape
• RAIT (Redundant Array of Independent Tapes)
  • Increased performance
  • Higher reliability with the use of parity
  • Higher single “volume” capacity
  • Large datasets on a single “volume”
• RAIL (Redundant Array of Independent Libraries)
  • Greater total system capacity
  • Improved response time
• These are resource-intensive solutions – dedicated libraries and drives
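The reliability gain from parity can be shown with a small sketch of single-parity striping, the same XOR idea used in disk RAID. The slides do not say which parity scheme a RAIT product would use, so this is a generic illustration with hypothetical function names: data is split across several tapes, one extra tape holds the XOR of the blocks, and any one lost block can be rebuilt from the rest.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def write_stripe(data, n_data_tapes):
    """Split `data` into n equal blocks (zero padded) plus one parity block."""
    block_len = -(-len(data) // n_data_tapes)  # ceiling division
    padded = data.ljust(block_len * n_data_tapes, b"\0")
    blocks = [padded[i * block_len:(i + 1) * block_len] for i in range(n_data_tapes)]
    return blocks + [xor_blocks(blocks)]

def rebuild_block(stripe, lost_index):
    """Recover one lost block by XORing all surviving blocks in the stripe."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)
```

With four data tapes, for example, a stripe occupies five drives at once, which is one reason such arrays call for dedicated drives.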
Holographic
• Large capacity – 10 GB in a single cubic centimeter (compared with 10 Gbits/in² for magnetic disk)
• High speed – 2 Gigabits/sec
• Low power
• Billions of write cycles
Probe-Based MEMS
• MEMS – Micro-Electro-Mechanical Systems
• Probe-based storage arrays
  • Dense
  • Highly parallel to achieve high bandwidth
  • Rectilinear 2D positioning
  • Commercial devices in the next several years
HD Rosetta
• Product marketed by Norsam Technologies
• Developed at Los Alamos National Lab
• Analog
• Lifetime of 1000s of years
• Can be read back with only a microscope
• Stores text and images