January 17, 2001 Xiaohui Shen 1 Data Management, Storage and Access Optimization in High Performance Distributed Environment Xiaohui Shen Department of Electrical and Computer Engineering Northwestern University Jan 17, 2001


Page 1: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Data Management, Storage and Access Optimization in High Performance Distributed Environment

Xiaohui Shen
Department of Electrical and Computer Engineering
Northwestern University
Jan 17, 2001

Page 2: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Outline

• Problem Definition
• Solutions
  • Meta-data Management System
  • Remote Storage Access Optimizations
  • Multi-Storage I/O System
  • Distributed Parallel File System
  • I/O performance prediction and evaluation
  • Integrated working environment

Page 3: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Motivation

Page 4: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Current Solutions

• Parallel file systems and runtime libraries: smart I/O optimizations (caching, prefetching, parallel I/O)
  • User interfaces are low-level
  • Not portable
  • Hard-coded I/O selection is difficult for runtime systems
• Database systems: high-level, easy to use, portable
  • Lack powerful I/O optimizations

Page 5: Data Management, Storage and Access Optimization in High Performance Distributed Environment

System Architecture

Page 6: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Tasks

• Meta-data Management System
• Remote Storage Access Optimizations
• Efficient Storage Organization
  • Multi-Storage I/O System
  • Distributed Parallel File System
• I/O performance prediction and evaluation
• Integrated working environment

Page 7: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Part 1: Meta-data Management System (MDMS)

• Abstract Storage Devices (ASDs)
• Storage patterns & access patterns
• Access history and trail of navigation

Page 8: Data Management, Storage and Access Optimization in High Performance Distributed Environment

MDMS Tables

Table Name | Functionality | Primary Key
Run table | Records each run of the application with user-specified attributes | Run id
Dataset table | Keeps information about the datasets used in each run | Run id + association id
Access pattern table | Keeps the access pattern specified by the user for each dataset | Run id + dataset name
Storage pattern table | Keeps information on how data is stored for each dataset | Dataset name
Execution table | Records I/O activities of the run, including file path and name, offset, etc. | Run id + dataset + iteration number
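
The five tables and their primary keys can be sketched as a small relational schema. The sketch below uses Python's built-in sqlite3 module instead of the Postgres database used in the talk. Only the table names and key fields come from the slide. Every other column name (attributes, pattern, layout, file_path, file_offset) and both helper functions are assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema: table names and primary keys follow the slide,
# all non-key columns are illustrative assumptions.
SCHEMA = """
CREATE TABLE run (
    run_id      INTEGER PRIMARY KEY,
    attributes  TEXT
);
CREATE TABLE dataset (
    run_id          INTEGER REFERENCES run(run_id),
    association_id  INTEGER,
    dataset_name    TEXT,
    PRIMARY KEY (run_id, association_id)
);
CREATE TABLE access_pattern (
    run_id        INTEGER REFERENCES run(run_id),
    dataset_name  TEXT,
    pattern       TEXT,
    PRIMARY KEY (run_id, dataset_name)
);
CREATE TABLE storage_pattern (
    dataset_name  TEXT PRIMARY KEY,
    layout        TEXT
);
CREATE TABLE execution (
    run_id        INTEGER REFERENCES run(run_id),
    dataset_name  TEXT,
    iteration     INTEGER,
    file_path     TEXT,
    file_offset   INTEGER,
    PRIMARY KEY (run_id, dataset_name, iteration)
);
"""


def create_mdms(conn):
    """Create the five metadata tables on an open connection."""
    conn.executescript(SCHEMA)
    return conn


def record_io(conn, run_id, dataset_name, iteration, file_path, file_offset):
    """Record where one dataset was written during one iteration of a run."""
    conn.execute(
        "INSERT INTO execution VALUES (?, ?, ?, ?, ?)",
        (run_id, dataset_name, iteration, file_path, file_offset),
    )


def locate(conn, run_id, dataset_name, iteration):
    """Return (file_path, file_offset) for a recorded I/O, or None."""
    return conn.execute(
        "SELECT file_path, file_offset FROM execution "
        "WHERE run_id = ? AND dataset_name = ? AND iteration = ?",
        (run_id, dataset_name, iteration),
    ).fetchone()
```

With such a layout, an application can ask for a dataset by run, name and iteration, and the metadata system resolves the physical file and offset on its behalf.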

Page 9: Data Management, Storage and Access Optimization in High Performance Distributed Environment

MDMS Internal Representation

Page 10: Data Management, Storage and Access Optimization in High Performance Distributed Environment

MDMS I/O Flow (API)

Page 11: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Optimizations inside MDMS

Page 12: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Part 2: Remote Storage Access Optimization for HSS

• Secondary storage access techniques: collective I/O, data sieving, caching, prefetching, etc.
• Tertiary storage systems interact directly with applications
• Remote environment

Page 13: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Optimizations

• Remote collective I/O
• Remote data sieving
• Asynchronous I/O
• Subfile
• Superfile
• Migration, Stage and Purge
• SRB Container

Page 14: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Optimization: Subfile

[Figure: diagram with 16 blocks, each labeled "Subfile"]

Page 15: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Optimization: Superfile

• Create: one large file
• Access: the first access brings the whole large file into memory; subsequent accesses can be serviced directly from memory
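
A minimal sketch of the superfile idea, assuming the small files are concatenated into one large file and located through an in-memory index of (offset, length) pairs. The class name, method names and index layout are hypothetical; the slide only states the create and access behavior.

```python
class Superfile:
    """Packs many small files into one large file (illustrative sketch)."""

    def __init__(self, path):
        self.path = path
        self.index = {}      # member name -> (offset, length) inside the large file
        self._cache = None   # whole superfile contents, filled on first read

    def create(self, members):
        """Write all members (name -> bytes) back to back into one large file."""
        offset = 0
        with open(self.path, "wb") as f:
            for name, data in members.items():
                f.write(data)
                self.index[name] = (offset, len(data))
                offset += len(data)
        self._cache = None

    def read(self, name):
        """First call loads the whole large file; later calls are served from memory."""
        if self._cache is None:
            with open(self.path, "rb") as f:
                self._cache = f.read()
        offset, length = self.index[name]
        return self._cache[offset:offset + length]
```

The design trades one large transfer for many small ones, which pays off when each access to remote storage carries a high fixed cost.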

Page 16: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Other Optimizations

• Migration
• Stage
• Purge
• SRB Container

Page 17: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Part 3: MS-I/O: A Multi-storage I/O System

• Further performance improvement is limited by the nature of storage media.
• The problem is rooted in the traditional single-storage resource architecture.

Page 18: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Solution: Multi-storage Resource Architecture

• Increases logical storage capacity
• Provides a more flexible and reliable computing environment
• Provides new opportunities for further performance improvement

Page 19: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Multi-storage Resource Architecture

Page 20: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Experimental Environment

• Local Postgres database
• Local disks, remote disks, remote tapes
• Compute resource: Argonne SP2

Page 21: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Multi-storage I/O System

Page 22: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Database Tables and I/O Routines

• Run table
• Dataset table
• Access pattern table
• Storage pattern table
• Execution table

Page 23: Data Management, Storage and Access Optimization in High Performance Distributed Environment

User Access Pattern (write)

Field | Description | Value | Optimizations
Data Partition | How data is partitioned among processors | BBB, B**, BB, B*, etc. | Collective I/O
Write Size | The size of the dataset | Huge, large, medium, small | Data location, subfile, superfile
Write Sequence | Whether there is a sequence of data files (time steps) | Yes, no | Superfile, asynchronous I/O
When Use | When this dataset will be accessed | Soon, long, never | Data location, data duplication
Use Frequency | How often this dataset will be accessed | Frequent, seldom, never | Data location, data duplication
Compute Time | Whether compute time is a significant part | Large, small | Asynchronous I/O
Future Read Size | How much of the dataset will be accessed | Whole, partial | Subfile
Future Read Sequence | Whether a sequence of data files will be accessed | Yes, no | Asynchronous I/O, superfile
Duration | Data's lifetime on storage | Permanent, temporary | Data location, data duplication

Page 24: Data Management, Storage and Access Optimization in High Performance Distributed Environment

User Access Pattern (read)

Field | Description | Value | Optimizations
Data Partition | How data is partitioned among processors | BBB, B**, BB, B*, etc. | Collective I/O
Use Frequency | How often this dataset will be accessed | Frequent, seldom, never | Data location, data duplication
Compute Time | Whether compute time is a significant part | Large, small | Asynchronous I/O
Read Size | How much of the dataset will be accessed | Whole, partial | Subfile
Read Sequence | Whether a sequence of data files will be accessed | Yes, no | Asynchronous I/O, superfile
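
The read-pattern table can be read as a lookup from hint fields to the optimizations each field may enable. The sketch below encodes only that field-to-optimization mapping. It does not decide which hint value triggers an optimization, since the table does not say. The dictionary keys and the function name are hypothetical.

```python
# Field -> optimizations it can influence, as listed in the read-pattern table.
READ_HINT_OPTIMIZATIONS = {
    "DataPartition": ["collective I/O"],
    "UseFrequency": ["data location", "data duplication"],
    "ComputeTime": ["asynchronous I/O"],
    "ReadSize": ["subfile"],
    "ReadSequence": ["asynchronous I/O", "superfile"],
}


def candidate_optimizations(hints):
    """Return optimizations worth considering for the hint fields a user supplied.

    `hints` maps field names to user-specified values. The result keeps table
    order and contains no duplicates. Fields not in the table are ignored.
    """
    result = []
    for field, optimizations in READ_HINT_OPTIMIZATIONS.items():
        if field in hints:
            for opt in optimizations:
                if opt not in result:
                    result.append(opt)
    return result
```

For example, a user who supplies only ComputeTime and ReadSequence narrows the candidates to asynchronous I/O and superfile.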

Page 25: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Optimization decision Flow

Page 26: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Applications and Tools

Page 27: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Experimental Environment

• Applications: IBM SP2 at Argonne
• Multiple storage resources:
  • Local disks: Argonne SP2
  • Remote disks: SDSC
  • Remote tapes: SDSC HPSS
  • Local database: Postgres at NWU

Page 28: Data Management, Storage and Access Optimization in High Performance Distributed Environment

MS-I/O Experiments: Data Analysis on Astrophysics Data

[Figure: "I/O Time of Data Analysis", bar chart; x-axis: configurations 1 to 6; y-axis: 0 to 200000 (units not shown)]

Configurations:
• No access pattern, then Remote Tape
• DataPartition='BBB', then Remote Tape + Collective I/O
• WhenUse='soon' & Size='medium', then Remote Disk
• Plus DataPartition='BBB', then Remote Disk + Collective I/O
• Plus UseFrequency='frequent', then Local Disk
• Plus DataPartition='BBB', then Local Disk + Collective I/O

Page 29: Data Management, Storage and Access Optimization in High Performance Distributed Environment

MS-I/O Experiments: Volume Rendering

[Figure: "Execution Time of Volren", bar chart; x-axis: configurations 1 to 6; y-axis: 0 to 1200 (units not shown)]

Configurations:
• No access pattern, then Remote Tape
• ComputeTime='large', then Remote Tape + Asynchronous I/O
• WhenUse='soon' & Size='medium', then Remote Disk
• Plus ComputeTime='large', then Remote Disk + Asynchronous I/O
• Plus UseFrequency='frequent', then Local Disk
• Plus ComputeTime='large', then Local Disk + Asynchronous I/O
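
The configurations in the data analysis and volume rendering experiments suggest a simple rule set from user hints to a storage location plus optimizations. The sketch below encodes only those rules as listed on the experiment slides. It is not the full decision flow of the system, which the slides show only as a figure. The function name and the lowercase result strings are hypothetical.

```python
def choose_plan(hints):
    """Map user hints to (storage location, optimizations) per the experiment slides."""
    # With no usable hints, data goes to remote tape.
    storage = "remote tape"
    if hints.get("WhenUse") == "soon" and hints.get("Size") == "medium":
        storage = "remote disk"
        # Frequently used data is additionally moved to local disk.
        if hints.get("UseFrequency") == "frequent":
            storage = "local disk"

    optimizations = []
    if hints.get("DataPartition") == "BBB":
        optimizations.append("collective I/O")
    if hints.get("ComputeTime") == "large":
        optimizations.append("asynchronous I/O")
    return storage, optimizations
```

Each additional hint either moves the data closer to the compute resource or enables one more optimization, which matches the step-by-step progression of the six configurations.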

Page 30: Data Management, Storage and Access Optimization in High Performance Distributed Environment

MS-I/O Experiments: Subfile and Superfile

[Figure: "I/O Time of Subfile", bar chart; x-axis: Remote Disks, Remote Tapes; y-axis: 0 to 80000 (units not shown); series: Naive, Subfile]
• WriteSize='huge' & FutureReadSize='partial'

[Figure: "I/O Time of Superfile", bar chart; x-axis: 10 Files, 20 Files; y-axis: 0 to 40 (units not shown); series: Naive, Superfile]
• WriteSize='small' & WriteSequence='y' & FutureReadSequence='y'

Page 31: Data Management, Storage and Access Optimization in High Performance Distributed Environment

MS-I/O Experiments: Replication and Access History

[Figure: "Data Access History", chart; x-axis: 1 to 6; y-axis: 0 to 700 (units not shown)]

[Figure: "I/O Time of Data Duplication", chart; x-axis: 1 to 6; y-axis: 0 to 700 (units not shown)]

Annotations:
• Dataset was first placed at the remote site
• Read.UseFrequency='frequent'
• A frequently used dataset is detected

Page 32: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Part 4: DPFS: A Distributed Parallel File System

Collects idle distributed storage as a supplement to the native storage of parallel computing systems

Characteristics Distributed Parallel File System Database

Page 33: Data Management, Storage and Access Optimization in High Performance Distributed Environment

System Architecture of DPFS

Page 34: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Software Architecture of DPFS

• Parallelism
• Concurrency

Page 35: Data Management, Storage and Access Optimization in High Performance Distributed Environment

DPFS BSU and File view

A Basic Striping Unit (BSU) is called a brick in DPFS. Its size is 64K.
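
Because a file is viewed as a sequence of fixed-size bricks, any byte range maps to a list of bricks. The sketch below shows that arithmetic. It assumes "64K" means 64 KiB (65536 bytes); the function name and tuple layout are hypothetical.

```python
BRICK_SIZE = 64 * 1024  # assumption: "64K" on the slide means 64 KiB


def bricks_for_range(offset, length, brick_size=BRICK_SIZE):
    """Split a byte range into (brick_index, start_in_brick, nbytes) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        brick = offset // brick_size          # which brick holds this byte
        start = offset % brick_size           # position inside that brick
        nbytes = min(brick_size - start, end - offset)
        pieces.append((brick, start, nbytes))
        offset += nbytes
    return pieces
```

A request that straddles a brick boundary therefore turns into two pieces, which may live on different storage servers depending on the striping method.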

Page 36: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Striping Methods

• Linear Striping
• Multi-dimensional Striping
• Array Striping

Page 37: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Linear Striping

Page 38: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Problems of Linear Striping

Page 39: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Multi-dimensional Striping

Page 40: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Array Striping

Page 41: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Striping Algorithms

• Round-robin
• Greedy algorithm
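
The two placement strategies can be contrasted in a short sketch. Round-robin is the standard cyclic assignment of bricks to servers. The slide does not define the greedy algorithm, so the version below is only one plausible reading: each brick goes to the server that would finish it earliest, given an assumed per-brick cost for every server. Function names and the cost_per_brick parameter are hypothetical.

```python
import heapq


def round_robin(num_bricks, num_servers):
    """Assign brick b to server b mod num_servers."""
    return [b % num_servers for b in range(num_bricks)]


def greedy(num_bricks, cost_per_brick):
    """Assign each brick to the server with the earliest estimated finish time.

    cost_per_brick[s] is the assumed time server s needs for one brick.
    This is an illustrative interpretation, not the algorithm from the talk.
    """
    # Heap entries: (finish time if this server takes the next brick, server id)
    heap = [(cost, s) for s, cost in enumerate(cost_per_brick)]
    heapq.heapify(heap)
    placement = []
    for _ in range(num_bricks):
        finish, s = heapq.heappop(heap)
        placement.append(s)
        heapq.heappush(heap, (finish + cost_per_brick[s], s))
    return placement
```

With identical servers both functions produce the same cyclic layout. With heterogeneous servers, the greedy sketch places more bricks on the faster ones.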

Page 42: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Request Combination

P0: 0-7    P1: 8-15    P2: 16-23    P3: 24-31

P0(0,4)    P1(9,13)    P2(18,22)    P3(27,31)

P0(1,5)    P1(10,14)   P2(19,23)    P3(24,28)

...
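
One way to read the example: each processor owns eight bricks, and bricks that fall on the same I/O server are sent as one combined request, so P0 issues (0,4), then (1,5), and so on. The sketch below groups brick numbers by server under an assumed round-robin placement over four servers. The placement rule and the function name are assumptions, since the slide gives only the numbers.

```python
from collections import defaultdict


def combine_requests(bricks, num_servers):
    """Group requested bricks by I/O server (round-robin placement assumed).

    Returns {server: [bricks]} so that each server receives one combined
    request instead of one request per brick.
    """
    combined = defaultdict(list)
    for b in sorted(set(bricks)):
        combined[b % num_servers].append(b)
    return dict(combined)
```

Applied to P0's bricks 0 to 7 with four servers, this yields the pairs (0,4), (1,5), (2,6) and (3,7), the first two of which appear on the slide.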

Page 43: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Meta-data and Database

Page 44: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Tree Structure

Page 45: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Application Programming Interface

• DPFS-Open ()
• DPFS-Write ()
• DPFS-Read ()
• DPFS-Close ()

Page 46: Data Management, Storage and Access Optimization in High Performance Distributed Environment

User Interface

• File system commands: cp, mkdir, rm, ls, etc.
• File transfer between DPFS and a general sequential file system
• Example: cp local:my.data DPFS:/home/xhshen:4:greedy

Page 47: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Experimental Environment

• Compute resource: Argonne IBM SP2
• Storage resources:
  • Class 1: Argonne Linux machines (Fast Ethernet and ATM)
  • Class 2: NWU workstations (155M ATM)
  • Class 3: NWU workstations (10M Ethernet)

Page 48: Data Management, Storage and Access Optimization in High Performance Distributed Environment

DPFS Performance Numbers: File Level Comparison

[Figure: "File Level Comparisons, 8 compute nodes, 4 I/O nodes", bar chart; x-axis: Class 3, Class 2, Class 1; y-axis: 0 to 15 (units not shown); series: Linear, Combined Linear, Multi-dim, Combined Multi-dim, Array, Combined Array]

[Figure: "File Level Comparisons, 16 compute nodes, 8 I/O nodes", bar chart; x-axis: Class 3, Class 2, Class 1; y-axis: 0 to 20 (units not shown); series: Linear, Combined Linear, Multi-dim, Combined Multi-dim, Array, Combined Array]

Page 49: Data Management, Storage and Access Optimization in High Performance Distributed Environment

DPFS Performance Numbers: Striping Algorithm Comparison

[Figure: "Striping Algorithm Comparison (8 compute nodes, 8 I/O nodes)", bar chart; x-axis: Round-robin, Greedy; y-axis: 0 to 6 (units not shown); series: Write, Combined Write, Read, Combined Read]

[Figure: "Striping Algorithm Comparison (16 compute nodes, 16 I/O nodes)", bar chart; x-axis: Round-robin, Greedy; y-axis: 0 to 10 (units not shown); series: Write, Combined Write, Read, Combined Read]

Page 50: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Part 5: I/O Performance Prediction and Evaluation

• Performance Model
• Performance Prediction Algorithm

Page 51: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Performance Model

T(s) = Tconn + Topen + Tseek + Tread/write(s) + Tfileclose + Tconnclose
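
The model adds five fixed per-access costs to a size-dependent transfer term. The sketch below evaluates it for one storage resource. It assumes the transfer time grows linearly with size at a fixed bandwidth, which is a simplification introduced here; the class, field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StorageCosts:
    """Fixed costs in seconds for one storage resource (illustrative values only)."""
    t_conn: float        # open connection
    t_open: float        # open file
    t_seek: float        # seek
    t_fileclose: float   # close file
    t_connclose: float   # close connection
    bandwidth: float     # bytes per second; assumed linear transfer model


def io_time(costs, size_bytes):
    """T(s) = Tconn + Topen + Tseek + Tread/write(s) + Tfileclose + Tconnclose."""
    t_transfer = size_bytes / costs.bandwidth
    return (costs.t_conn + costs.t_open + costs.t_seek
            + t_transfer + costs.t_fileclose + costs.t_connclose)
```

For small requests the fixed terms dominate, which is one reason combining many small files or requests into larger ones can reduce total I/O time.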

Page 52: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Performance Prediction Algorithm

• M: number of datasets
• N: total number of iterations
• freq(j): I/O frequency
• n(j): number of I/O calls
• t_j(s): data transfer time (stored in database)

$$T_{prediction} = \sum_{j=1}^{M} \left( \frac{N}{freq(j)} + 1 \right) n(j) \, t_j(s)$$
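
A minimal sketch of the prediction, reading the slide's formula as the sum over all datasets j of (N / freq(j) + 1) * n(j) * t_j(s). Two points are assumptions made here: the division is treated as integer division, and the +1 is taken to account for an initial dump. The function name and the tuple layout are hypothetical.

```python
def predict_io_time(total_iterations, datasets):
    """Predict total I/O time for a run.

    datasets: iterable of (freq, n_calls, t_transfer) tuples, one per dataset,
    where freq is the number of iterations between dumps, n_calls the number
    of I/O calls per dump, and t_transfer the time per call in seconds.
    """
    total = 0.0
    for freq, n_calls, t_transfer in datasets:
        dumps = total_iterations // freq + 1   # assumed integer division
        total += dumps * n_calls * t_transfer
    return total
```

In the system described, the per-call time would come from measurements stored in the database rather than being passed in by hand.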

Page 53: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Part 6: Integrated Java Graphical User Interface

Page 54: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Functions of IJ-GUI

• Registering new applications
• Running applications remotely

Page 55: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Functions of IJ-GUI

• Data analysis and visualization
• Table browsing and searching

Page 56: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Functions of IJ-GUI

Automatic code generator

Page 57: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Functions of IJ-GUI

• I/O performance prediction

Page 58: Data Management, Storage and Access Optimization in High Performance Distributed Environment

I/O Latency Reducing for Interactive Visualization

[Figure: "I/O Latency Reducing", chart; x-axis: 1 to 10; y-axis: Time (s), 0 to 450; series: Visualization Time, I/O Time]

Page 59: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Summary of Contributions

• Meta-data Management System
• Remote Storage Access Optimizations
• Multi-Storage I/O System
• Distributed Parallel File System
• I/O performance prediction and evaluation
• Integrated working environment

Page 60: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Publications

• A Multi-Storage Resource Architecture and I/O Performance Prediction for Scientific Computing, by X. Shen and A. Choudhary. Cluster Computing Journal.
• A Novel Application Development Environment for Large-Scale Scientific Computations, by X. Shen, W. Liao, A. Choudhary, et al. ACM ICS2000.
• Remote I/O Optimization and Evaluation for Tertiary Storage Systems through Storage Resource Broker, by X. Shen, W. Liao and A. Choudhary. IASTED Applied Informatics, Innsbruck, Austria, 2001.
• A Java Graphical User Interface for Large-Scale Scientific Computations in Heterogeneous Systems, by X. Shen, G. Thiruvathukal, W. Liao, A. Choudhary, and A. Singh. HPC-ASIA, May 2000.
• Meta-Data Management System for High-Performance Large-Scale Scientific Data Access, by W. Liao, X. Shen, A. Choudhary. HiPC 2000.
• Data Management for Large-Scale Scientific Computations in High Performance Distributed Systems, by A. Choudhary, M. Kandemir, H. Nagesh, J. No, X. Shen, V. Taylor, S. More, and R. Thakur. In Proc. HPDC-99.
• A Multi-Storage Resource Architecture and I/O Performance Prediction for Scientific Computing, by Xiaohui Shen and Alok Choudhary. HPDC-00.
• A Distributed Multi-Storage I/O System for High Performance Data Intensive Computing, by Xiaohui Shen and Alok Choudhary.
• DPFS: A Distributed Parallel File System, by Xiaohui Shen and Alok Choudhary.
• An Integrated Graphical User Interface for High Performance Distributed Computing, by Xiaohui Shen, Wei-keng Liao and Alok Choudhary.
• A Multimedia Integrated Parallel File System, by J. Carretero, W. Zhu, X. Shen, A. Choudhary. JCIS98.

Page 61: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Future Directions-1

Page 62: Data Management, Storage and Access Optimization in High Performance Distributed Environment

Future Directions-2