January 17, 2001 Xiaohui Shen 1
Data Management, Storage and Access Optimization in High Performance Distributed Environment
Xiaohui Shen
Department of Electrical and Computer Engineering
Northwestern University
Jan 17, 2001
Outline
• Problem Definition
• Solutions
  • Meta-data Management System
  • Remote Storage Access Optimizations
  • Multi-Storage I/O System
  • Distributed Parallel File System
  • I/O performance prediction and evaluation
  • Integrated working environment
Motivation
Current Solutions
• Parallel file systems and runtime libraries: smart I/O optimizations (caching, prefetching, parallel I/O), but user interfaces are low-level, not portable, and hard-coded, and I/O selection is difficult for runtime systems.
• Database systems: high-level, easy to use, and portable, but lack powerful I/O optimizations.
System Architecture
Tasks
• Meta-data Management System
• Remote Storage Access Optimizations
• Efficient Storage Organization
  • Multi-Storage I/O System
  • Distributed Parallel File System
• I/O performance prediction and evaluation
• Integrated working environment
Part 1: Meta-data Management System (MDMS)
• Abstract Storage Devices (ASDs)
• Storage patterns and access patterns
• Access history and trail of navigation
MDMS Tables
• Run table: records each run of the application with user-specified attributes. Primary key: run id.
• Dataset table: keeps information about the datasets used in each run. Primary key: run id + association id.
• Access pattern table: keeps the access pattern specified by the user for each dataset. Primary key: run id + dataset name.
• Storage pattern table: keeps information on how the data of each dataset is stored. Primary key: dataset name.
• Execution table: records the I/O activities of the run, including file path and name, offset, etc. Primary key: run id + dataset + iteration number.
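The five MDMS tables and their primary keys map naturally onto a relational schema. The sketch below illustrates this with Python's built-in sqlite3 module instead of the Postgres database used in the original system; only the table roles and primary keys come from the table list above, while every other column name and type is a hypothetical choice for illustration.

```python
import sqlite3

# Hypothetical schema: table roles and primary keys follow the MDMS table list;
# all other columns are illustrative assumptions.
SCHEMA = """
CREATE TABLE run_table (
    run_id     INTEGER PRIMARY KEY,
    attributes TEXT                  -- user-specified run attributes
);
CREATE TABLE dataset_table (
    run_id         INTEGER,
    association_id INTEGER,
    dataset_name   TEXT,
    PRIMARY KEY (run_id, association_id)
);
CREATE TABLE access_pattern_table (
    run_id       INTEGER,
    dataset_name TEXT,
    pattern      TEXT,               -- e.g. a data partition such as BBB
    PRIMARY KEY (run_id, dataset_name)
);
CREATE TABLE storage_pattern_table (
    dataset_name TEXT PRIMARY KEY,
    layout       TEXT                -- how the dataset is stored
);
CREATE TABLE execution_table (
    run_id       INTEGER,
    dataset_name TEXT,
    iteration    INTEGER,
    file_path    TEXT,
    file_offset  INTEGER,
    PRIMARY KEY (run_id, dataset_name, iteration)
);
"""

def create_mdms(conn: sqlite3.Connection) -> None:
    """Create the illustrative MDMS tables on the given connection."""
    conn.executescript(SCHEMA)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_mdms(conn)
    # One I/O record per (run, dataset, iteration), as the primary key requires.
    conn.execute("INSERT INTO execution_table VALUES (1, 'density', 0, '/data/density.0', 0)")
    print([r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")])
```

The composite keys mean a second execution record for the same run, dataset, and iteration is rejected by the database itself.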
MDMS Internal Representation
MDMS I/O Flow (API)
Optimizations inside MDMS
Part 2: Remote Storage Access Optimization for HSS
• Secondary storage access techniques: collective I/O, data sieving, caching, prefetching, etc.
• Tertiary storage systems interact directly with applications
• Remote environment
Optimizations
• Remote collective I/O
• Remote data sieving
• Asynchronous I/O
• Subfile
• Superfile
• Migration, stage and purge
• SRB container
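Data sieving, one of the optimizations listed, replaces many small noncontiguous reads with a single contiguous read covering all of them, then extracts the requested pieces in memory; over a remote link this trades some extra bytes for far fewer round trips. A minimal sketch, where `read_remote(offset, length)` is a hypothetical stand-in for whatever remote read primitive the system uses (for example, a read through SRB):

```python
from typing import Callable, List, Tuple

Request = Tuple[int, int]  # (offset, length) in bytes

def sieved_read(read_remote: Callable[[int, int], bytes],
                requests: List[Request]) -> List[bytes]:
    """Serve noncontiguous requests with one contiguous remote read.

    read_remote is a hypothetical remote read primitive.
    """
    if not requests:
        return []
    start = min(off for off, _ in requests)
    end = max(off + length for off, length in requests)
    # One round trip covering the whole extent, including the unused holes.
    buf = read_remote(start, end - start)
    # Sieve the requested pieces out of the in-memory buffer.
    return [buf[off - start:off - start + length] for off, length in requests]
```

For requests (10, 4), (50, 2), and (30, 3), this issues one remote read of 42 bytes starting at offset 10 instead of three separate reads.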
Optimization: Subfile
[Figure: a dataset stored as a set of subfiles]
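When a huge dataset is written but later read only partially (the WriteSize = ‘huge’ & FutureReadSize = ‘partial’ case), splitting it into subfiles means a partial read only has to fetch, or stage from tape, the subfiles that overlap the requested range. A minimal sketch, assuming fixed-size subfiles; the subfile size and the naming scheme in `subfile_name` are hypothetical.

```python
from typing import List

def subfile_name(dataset: str, index: int) -> str:
    """Hypothetical naming scheme for the index-th subfile of a dataset."""
    return f"{dataset}.sub{index:04d}"

def subfiles_for_range(dataset: str, offset: int, length: int,
                       subfile_size: int) -> List[str]:
    """Return the subfiles that overlap bytes [offset, offset + length)."""
    if length <= 0:
        return []
    first = offset // subfile_size
    last = (offset + length - 1) // subfile_size
    return [subfile_name(dataset, i) for i in range(first, last + 1)]
```

With 100-byte subfiles, a read of bytes 150 to 249 touches only subfiles 1 and 2 rather than the whole dataset.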
Optimization: Superfile
• Create: one large file
• Access: the first access brings the whole large file into memory; subsequent accesses are serviced directly from memory.
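The superfile behavior can be sketched as follows: small files are packed into one large file with an index of (offset, length) entries, the first access loads the whole superfile into memory, and later accesses are answered from that in-memory copy. The packing format, the index, and the `fetch_remote` callback are assumptions for illustration, not the system's actual on-disk format.

```python
from typing import Callable, Dict, List, Optional, Tuple

def pack(files: Dict[str, bytes]) -> Tuple[bytes, Dict[str, Tuple[int, int]]]:
    """Concatenate small files into one superfile and build an offset index."""
    index: Dict[str, Tuple[int, int]] = {}
    parts: List[bytes] = []
    offset = 0
    for name, data in files.items():
        index[name] = (offset, len(data))
        parts.append(data)
        offset += len(data)
    return b"".join(parts), index

class SuperfileReader:
    """Loads the whole superfile on first access, then serves from memory."""

    def __init__(self, fetch_remote: Callable[[], bytes],
                 index: Dict[str, Tuple[int, int]]):
        self._fetch_remote = fetch_remote  # hypothetical: one large remote transfer
        self._index = index
        self._cache: Optional[bytes] = None

    def read(self, name: str) -> bytes:
        if self._cache is None:            # first access: fetch everything once
            self._cache = self._fetch_remote()
        offset, length = self._index[name]
        return self._cache[offset:offset + length]
```

Reading three packed time steps costs one remote transfer instead of three.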
Other Optimizations
• Migration
• Stage
• Purge
• SRB container
Part 3: MS-I/O: A Multi-storage I/O System
• Further performance improvement is limited by the nature of the storage media.
• The problem is rooted in the traditional single-storage-resource architecture.
Solution: Multi-storage Resource Architecture
• Increases logical storage capacity
• Provides a more flexible and reliable computing environment
• Provides new opportunities for further performance improvement
Multi-storage Resource Architecture
Experimental Environment
• Local Postgres database
• Local disks, remote disks, remote tapes
• Compute resource: Argonne SP2
Multi-storage I/O System
Database Tables and I/O Routines
• Run table
• Dataset table
• Access pattern table
• Storage pattern table
• Execution table
User Access Pattern (write)
• Data Partition: how data is partitioned among processors. Values: BBB, B**, BB, B*, etc. Optimization: collective I/O.
• Write Size: the size of the dataset. Values: huge, large, medium, small. Optimizations: data location, subfile, superfile.
• Write Sequence: whether there is a sequence of data files (time steps). Values: yes, no. Optimizations: superfile, asynchronous I/O.
• When Use: when this dataset will be accessed. Values: soon, long, never. Optimizations: data location, data duplication.
• Use Frequency: how often this dataset will be accessed. Values: frequent, seldom, never. Optimizations: data location, data duplication.
• Compute Time: whether compute time is a significant part of the run. Values: large, small. Optimization: asynchronous I/O.
• Future Read Size: how much of the dataset will be accessed. Values: whole, partial. Optimization: subfile.
• Future Read Sequence: whether a sequence of data files will be accessed. Values: yes, no. Optimizations: asynchronous I/O, superfile.
• Duration: the data's lifetime on storage. Values: permanent, temporary. Optimizations: data location, data duplication.
User Access Pattern (read)
• Data Partition: how data is partitioned among processors. Values: BBB, B**, BB, B*, etc. Optimization: collective I/O.
• Use Frequency: how often this dataset will be accessed. Values: frequent, seldom, never. Optimizations: data location, data duplication.
• Compute Time: whether compute time is a significant part of the run. Values: large, small. Optimization: asynchronous I/O.
• Read Size: how much of the dataset will be accessed. Values: whole, partial. Optimization: subfile.
• Read Sequence: whether a sequence of data files will be accessed. Values: yes, no. Optimizations: asynchronous I/O, superfile.
Optimization Decision Flow
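The decision-flow diagram did not survive extraction, but the rules it encodes can be partly reconstructed from the user access-pattern tables and the MS-I/O experiment legends. A minimal rule-based sketch, assuming the hints arrive as a dictionary keyed by the field names used on these slides; the function `choose_storage` and the precedence between rules are illustrative assumptions, not the system's actual decision procedure.

```python
from typing import Dict, List, Tuple

def choose_storage(hints: Dict[str, str]) -> Tuple[str, List[str]]:
    """Pick a storage location and I/O optimizations from user hints.

    Field names and values follow the access-pattern tables; the rule
    precedence is an assumption made for this sketch.
    """
    location = "remote tape"                        # default: no access pattern
    if hints.get("WhenUse") == "soon" and hints.get("WriteSize") == "medium":
        location = "remote disk"
    if hints.get("UseFrequency") == "frequent":     # frequent use wins: keep it local
        location = "local disk"

    optimizations: List[str] = []
    if "DataPartition" in hints:                    # e.g. 'BBB', 'B**'
        optimizations.append("collective I/O")
    if hints.get("ComputeTime") == "large":         # overlap I/O with computation
        optimizations.append("asynchronous I/O")
    if hints.get("WriteSize") == "huge" and hints.get("FutureReadSize") == "partial":
        optimizations.append("subfile")
    if (hints.get("WriteSize") == "small"
            and hints.get("WriteSequence") == "yes"
            and hints.get("FutureReadSequence") == "yes"):
        optimizations.append("superfile")
    return location, optimizations
```

For example, `choose_storage({"DataPartition": "BBB"})` yields remote tape with collective I/O, matching the second configuration of the astrophysics data-analysis experiment.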
Applications and Tools
Experimental Environment
• Applications: IBM SP2 at Argonne
• Multiple storage resources:
  • Local disks: Argonne SP2
  • Remote disks: SDSC
  • Remote tapes: SDSC HPSS
  • Local database: Postgres at NWU
MS-I/O Experiments: Data Analysis on Astrophysics Data
[Chart: I/O Time of Data Analysis for configurations 1 to 6]
1. No access pattern then Remote Tape
2. DataPartition = ‘BBB’ then Remote Tape + Collective I/O
3. WhenUse = ‘soon’ & Size = ‘medium’ then Remote Disk
4. Plus DataPartition = ‘BBB’ then Remote Disk + Collective I/O
5. Plus UseFrequency = ‘frequent’ then Local Disk
6. Plus DataPartition = ‘BBB’ then Local Disk + Collective I/O
MS-I/O Experiments: Volume Rendering
[Chart: Execution Time of Volren for configurations 1 to 6]
1. No access pattern then Remote Tape
2. ComputeTime = ‘large’ then Remote Tape + Asynchronous I/O
3. WhenUse = ‘soon’ & Size = ‘medium’ then Remote Disk
4. Plus ComputeTime = ‘large’ then Remote Disk + Asynchronous I/O
5. Plus UseFrequency = ‘frequent’ then Local Disk
6. Plus ComputeTime = ‘large’ then Local Disk + Asynchronous I/O
MS-I/O Experiments: Subfile and Superfile
[Chart: I/O Time of Subfile, Naive vs. Subfile, on Remote Disks and Remote Tapes]
[Chart: I/O Time of Superfile, Naive vs. Superfile, for 10 Files and 20 Files]
Subfile: WriteSize = ‘huge’ & FutureReadSize = ‘partial’
Superfile: WriteSize = ‘small’ & WriteSequence = ‘y’ & FutureReadSequence = ‘y’
MS-I/O Experiments: Replication and Access History
[Chart: Data Access History, x-axis 1 to 6]
[Chart: I/O Time of Data Duplication, x-axis 1 to 6]
• The dataset was first placed at a remote site.
• Read.UseFrequency = ‘frequent’
• A frequently used dataset is detected.
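In the replication experiment, the dataset starts at a remote site and the system later detects from the access history that it is used frequently. A minimal sketch of such detection, assuming a simple count-based policy; the `AccessHistory` class and its `threshold` parameter are hypothetical, since the slides do not say how "frequent" is decided.

```python
from collections import Counter
from typing import Set

class AccessHistory:
    """Counts dataset reads and flags datasets worth duplicating locally."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold           # hypothetical policy parameter
        self.counts: Counter = Counter()
        self.local_copies: Set[str] = set()

    def record_read(self, dataset: str) -> str:
        """Record one read and return where it was served from."""
        location = "local disk" if dataset in self.local_copies else "remote"
        self.counts[dataset] += 1
        if dataset not in self.local_copies and self.counts[dataset] >= self.threshold:
            # Frequent use detected: keep a local duplicate for later reads.
            self.local_copies.add(dataset)
        return location
```

With a threshold of 3, the first three reads go to the remote copy and every later read is served from the local duplicate.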
Part 4: DPFS: A Distributed Parallel File System
• Collects idle distributed storage as a supplement to the native storage of parallel computing systems
• Characteristics: distributed, parallel, file system, database
System Architecture of DPFS
Software Architecture of DPFS
• Parallelism
• Concurrency
DPFS BSU and File view
A basic striping unit (BSU) is called a brick in DPFS. Its size is 64 KB.
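With a fixed brick size, a byte offset in a DPFS file maps to a brick number and an offset within that brick by integer division, and a request that crosses brick boundaries splits into per-brick pieces. A minimal sketch, assuming 64K means 64 × 1024 = 65,536 bytes; the function names are hypothetical.

```python
from typing import List, Tuple

BRICK_SIZE = 64 * 1024  # assumed: "64K" read as 65,536 bytes

def locate(offset: int) -> Tuple[int, int]:
    """Map a file offset to (brick number, offset within the brick)."""
    return divmod(offset, BRICK_SIZE)

def split_by_brick(offset: int, length: int) -> List[Tuple[int, int, int]]:
    """Split [offset, offset + length) into (brick, brick_offset, nbytes) pieces."""
    pieces = []
    while length > 0:
        brick, inner = locate(offset)
        nbytes = min(length, BRICK_SIZE - inner)  # stop at the brick boundary
        pieces.append((brick, inner, nbytes))
        offset += nbytes
        length -= nbytes
    return pieces
```

A 1,000-byte read starting at offset 65,000 becomes 536 bytes from the end of brick 0 and 464 bytes from the start of brick 1.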
Striping Methods
• Linear Striping
• Multi-dimensional Striping
• Array Striping
Linear Striping
Problems of Linear Striping
Multi-dimensional Striping
Array Striping
Striping Algorithms
• Round-robin
• Greedy algorithm
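The slides name the two striping algorithms without giving their details. A minimal sketch of both, assuming bricks are assigned to I/O nodes: round-robin places brick i on node i mod n, and the greedy variant shown here is one plausible interpretation, assigning each brick to the node with the earliest estimated finish time given a per-node bandwidth estimate so that faster storage classes receive more bricks. The `bandwidth` values are hypothetical inputs.

```python
from typing import List

def round_robin(num_bricks: int, num_nodes: int) -> List[int]:
    """Brick i goes to I/O node i mod num_nodes."""
    return [i % num_nodes for i in range(num_bricks)]

def greedy(num_bricks: int, bandwidth: List[float],
           brick_size: float = 1.0) -> List[int]:
    """Assign each brick to the node that would finish it earliest.

    bandwidth[k] is an assumed throughput estimate for I/O node k.
    """
    finish = [0.0] * len(bandwidth)       # estimated busy time per node
    placement = []
    for _ in range(num_bricks):
        node = min(range(len(bandwidth)),
                   key=lambda k: finish[k] + brick_size / bandwidth[k])
        finish[node] += brick_size / bandwidth[node]
        placement.append(node)
    return placement
```

With one node twice as fast as the other, six bricks split four to two under this greedy rule, while round-robin would split them three to three regardless of node speed.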
Request Combination
P0: 0-7     P1: 8-15    P2: 16-23   P3: 24-31
P0(0,4)     P1(9,13)    P2(18,22)   P3(27,31)
P0(1,5)     P1(10,14)   P2(19,23)   P3(24,28)
...
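The request pattern above is consistent with four processes each owning eight consecutive bricks, bricks striped round-robin over four I/O nodes (brick b on node b mod 4), and each step having process p combine all of its bricks that live on node (p + step) mod 4 into a single request, so no two processes target the same I/O node in the same step. The sketch below reproduces the listed rows under those assumptions; the layout is inferred from the numbers, not stated on the slide, and `combined_schedule` is a hypothetical name.

```python
from typing import Dict, List

def combined_schedule(num_procs: int, bricks_per_proc: int,
                      num_nodes: int) -> List[Dict[int, List[int]]]:
    """For each step, return the combined brick request of every process.

    Assumes brick b lives on I/O node b % num_nodes and that process p owns
    bricks [p * bricks_per_proc, (p + 1) * bricks_per_proc).
    """
    steps = []
    for step in range(num_nodes):
        requests = {}
        for p in range(num_procs):
            node = (p + step) % num_nodes        # rotate to avoid node contention
            owned = range(p * bricks_per_proc, (p + 1) * bricks_per_proc)
            # Combine all of p's bricks on this node into one request.
            requests[p] = [b for b in owned if b % num_nodes == node]
        steps.append(requests)
    return steps
```

`combined_schedule(4, 8, 4)[0]` gives P0 bricks 0 and 4, P1 bricks 9 and 13, P2 bricks 18 and 22, and P3 bricks 27 and 31, matching the first combined row.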
Meta-data and Database
Tree Structure
Application Programming Interface
• DPFS-Open()
• DPFS-Write()
• DPFS-Read()
• DPFS-Close()
User Interface
• File system commands: cp, mkdir, rm, ls, etc.
• File transfer between DPFS and a general sequential file system. Example: cp local:my.data DPFS:/home/xhshen:4:greedy
Experimental Environment
• Compute resource: Argonne IBM SP2
• Storage resources:
  • Class 1: Argonne Linux machines (Fast Ethernet and ATM)
  • Class 2: NWU workstations (155 Mbps ATM)
  • Class 3: NWU workstations (10 Mbps Ethernet)
DPFS Performance Numbers: File Level Comparison
[Chart: File Level Comparisons, 8 compute nodes, 4 I/O nodes; storage Class 3, Class 2, Class 1; series Linear, Combined Linear, Multi-dim, Combined Multi-dim, Array, Combined Array]
[Chart: File Level Comparisons, 16 compute nodes, 8 I/O nodes; same storage classes and series]
DPFS Performance Numbers: Striping Algorithm Comparison
[Chart: Striping Algorithm Comparison, 8 compute nodes, 8 I/O nodes; Round-robin vs. Greedy; series Write, Combined Write, Read, Combined Read]
[Chart: Striping Algorithm Comparison, 16 compute nodes, 16 I/O nodes; same algorithms and series]
Part 5: I/O Performance Prediction and Evaluation
• Performance Model
• Performance Prediction Algorithm
Performance Model
$$T(s) = T_{\mathrm{conn}} + T_{\mathrm{open}} + T_{\mathrm{seek}} + T_{\mathrm{read/write}}(s) + T_{\mathrm{fileclose}} + T_{\mathrm{connclose}}$$
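The performance model is a plain sum of per-phase costs in which only the read/write term depends on the request size s. A minimal sketch, assuming the read/write term is a fixed latency plus s divided by a bandwidth; that linear form and every number in the usage example are illustrative assumptions rather than measured values, and `IoCosts` is a hypothetical name.

```python
from dataclasses import dataclass

@dataclass
class IoCosts:
    """Per-phase costs in seconds; all values are hypothetical inputs."""
    conn: float          # T_conn: connect to the storage resource
    open_file: float     # T_open
    seek: float          # T_seek
    close_file: float    # T_fileclose
    close_conn: float    # T_connclose
    latency: float       # assumed fixed part of T_read/write
    bandwidth: float     # assumed bytes per second of T_read/write

def io_time(c: IoCosts, s: int) -> float:
    """T(s) = Tconn + Topen + Tseek + Tread/write(s) + Tfileclose + Tconnclose."""
    transfer = c.latency + s / c.bandwidth
    return c.conn + c.open_file + c.seek + transfer + c.close_file + c.close_conn
```

With 1 s to connect, 0.5 s to open, 0.25 s to seek, 0.125 s for each close, and 2 MB moved at 1 MB/s, T(s) = 1 + 0.5 + 0.25 + 2 + 0.125 + 0.125 = 4 s. For small requests the fixed connection and open costs dominate, which is why remote storage benefits from combining requests.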
Performance Prediction Algorithm
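• M: number of datasets
• N: total number of iterations
• freq(j): I/O frequency of dataset j
• n(j): number of I/O calls of dataset j
• t_j(s): data transfer time of dataset j (stored in the database)
$$T_{\mathrm{prediction}} = \sum_{j=1}^{M} \left( \frac{N}{\mathrm{freq}(j)} + 1 \right) n(j)\, t_j(s)$$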
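Read this way, dataset j performs I/O N/freq(j) + 1 times over the run, each time issuing n(j) calls that take t_j(s) each, and the prediction sums this over the M datasets. A minimal sketch, assuming each t_j is supplied as a lookup function (in the system these times come from the database) together with a per-dataset size; whether N/freq(j) is rounded is not stated, so plain division is used here, and `predict_io_time` is a hypothetical name.

```python
from typing import Callable, Sequence

def predict_io_time(N: int, freq: Sequence[int], n: Sequence[int],
                    t: Sequence[Callable[[int], float]],
                    s: Sequence[int]) -> float:
    """T_prediction = sum over j of (N / freq(j) + 1) * n(j) * t_j(s_j)."""
    total = 0.0
    for j in range(len(freq)):          # j ranges over the M datasets
        phases = N / freq[j] + 1        # number of I/O phases for dataset j
        total += phases * n[j] * t[j](s[j])
    return total
```

For N = 100 iterations and two datasets written every 10 and every 50 iterations, with (n, t) = (2 calls, 0.5 s) and (1 call, 2 s), the prediction is 11 × 2 × 0.5 + 3 × 1 × 2 = 17 s.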
Part 6: Integrated Java Graphical User Interface
Functions of IJ-GUI
• Registering new applications
• Running applications remotely
• Data analysis and visualization
• Table browsing and searching
• Automatic code generator
• I/O performance prediction
Reducing I/O Latency for Interactive Visualization
[Chart: I/O Latency Reducing; Time (s) over x-axis values 1 to 10; series Visualization Time and I/O Time]
Summary of Contributions
• Meta-data Management System
• Remote Storage Access Optimizations
• Multi-Storage I/O System
• Distributed Parallel File System
• I/O performance prediction and evaluation
• Integrated working environment
Publications
• A Multi-Storage Resource Architecture and I/O Performance Prediction for Scientific Computing, by X. Shen and A. Choudhary. Cluster Computing Journal.
• A Novel Application Development Environment for Large-Scale Scientific Computations, by X. Shen, W. Liao, A. Choudhary, et al. ACM ICS 2000.
• Remote I/O Optimization and Evaluation for Tertiary Storage Systems through Storage Resource Broker, by X. Shen, W. Liao, and A. Choudhary. IASTED Applied Informatics, Innsbruck, Austria, 2001.
• A Java Graphical User Interface for Large-Scale Scientific Computations in Heterogeneous Systems, by X. Shen, G. Thiruvathukal, W. Liao, A. Choudhary, and A. Singh. HPC-ASIA, May 2000.
• Meta-Data Management System for High-Performance Large-Scale Scientific Data Access, by W. Liao, X. Shen, and A. Choudhary. HiPC 2000.
• Data Management for Large-Scale Scientific Computations in High Performance Distributed Systems, by A. Choudhary, M. Kandemir, H. Nagesh, J. No, X. Shen, V. Taylor, S. More, and R. Thakur. In Proc. HPDC-99.
• A Multi-Storage Resource Architecture and I/O Performance Prediction for Scientific Computing, by X. Shen and A. Choudhary. HPDC-00.
• A Distributed Multi-Storage I/O System for High Performance Data Intensive Computing, by X. Shen and A. Choudhary.
• DPFS: A Distributed Parallel File System, by X. Shen and A. Choudhary.
• An Integrated Graphical User Interface for High Performance Distributed Computing, by X. Shen, W. Liao, and A. Choudhary.
• A Multimedia Integrated Parallel File System, by J. Carretero, W. Zhu, X. Shen, and A. Choudhary. JCIS98.
Future Directions-1
Future Directions-2