1

The Case for Versatile Storage System

NetSysLab, The University of British Columbia

Samer Al-Kiswany, Abdullah Gharaibeh, Matei Ripeanu

2

Introduction

HotStorage ‘09

Versatile Storage System for large-scale platforms:

• Underutilized resources
• Application specialization

The Deployment Approach:
• Configured at deployment time
• Coupled with the target application

Potential: Higher performance and scalability

3

Platform Example – Argonne Blue Gene/P

[Figure: Blue Gene/P compute and IO architecture]

• 160K cores
• 3D torus network: 2.5 GBps per node
• Tree network: 850 MBps per 64 nodes
• 2.5K IO nodes
• 10 Gb/s switch complex
• GPFS: 24 servers

IO rate: 8 GBps = 51 KBps / core!

Underutilized resources.
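The per-core figure follows from dividing the aggregate IO rate by the core count. A minimal sketch of that arithmetic, assuming binary prefixes for "8 GBps" and "160K cores" (the slide does not say which convention it uses):

```python
# Back-of-the-envelope check of the per-core IO rate quoted on the slide.
# Assumption (not stated on the slide): both figures use binary prefixes.
GIB = 2**30
KIB = 2**10

aggregate_rate = 8 * GIB   # bytes/s for the whole machine, per the slide
cores = 160 * KIB          # "160K cores", read here as 163,840

per_core_kib = aggregate_rate / cores / KIB
print(f"{per_core_kib:.1f} KiB/s per core")

# With decimal prefixes the result is nearly the same.
per_core_kb = 8e9 / 160e3 / 1e3
print(f"{per_core_kb:.1f} KB/s per core")
```

Either reading lands at roughly 50 KBps per core, which is the point of the slide.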

4

Workload Characteristics

Workflows – Execution stages communicating through intermediate temporary files

[Figure: example workflow; legend: input file, output file, compute]

Source: [Zhao et al., SIGMOD Record '05]

5

Workload Characteristics

Workflows – Execution stages communicating through intermediate temporary files

Source: [Tibi Stef-Praun et al., e-Social Science '07]

6

Workload Characteristics

Workflows – Execution stages communicating through intermediate temporary files

Workloads: Workflows

Axes                        Optimizations
Data life time (temporary)  Application-informed caching
Read (Seq.)                 Read-ahead
Write (Seq.)                Asynch. write
Consistency (no)            Relaxed consistency

7

Workload Characteristics

Data Analysis – Analyze/search large data sets (e.g. BLAST)

BLAST: match new sequences against a data set of known sequences (linear search)

Workloads: Workflows, Data Analysis

Axes                        Optimizations
Data life time (temporary)  Application-informed caching
Read (Seq.)                 Read-ahead
Write (Seq.)                Asynch. write
Consistency (no)            Relaxed consistency
Locality                    Caching

8

Workload Characteristics

Checkpointing

Workloads: Workflows, Data Analysis, Checkpointing

Axes                        Optimizations
Data life time (temporary)  Application-informed caching
Read (Seq.)                 Read-ahead
Write (Seq.)                Asynch. write
Consistency (no)            Relaxed consistency
Locality                    Caching
Compressibility             Similarity detection

9

Workload Characteristics

Workloads: Workflows, Data Analysis, Checkpointing

Axes                        Optimizations
Data life time (temporary)  Application-informed caching
Read (Seq.)                 Read-ahead
Write (Seq.)                Asynch. write
Consistency (no)            Relaxed consistency
Locality                    Caching
Compressibility             Similarity detection
Security                    Tunable sec. levels
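The axis-to-optimization pairing above can be read as a lookup from a workload profile to the set of optimizations worth enabling. A minimal sketch of that idea follows; the key names, the profile format, and the `select_optimizations` helper are hypothetical and are not part of the presented system.

```python
# Hypothetical encoding of the slide's table: (axis, observed value) -> optimization.
OPTIMIZATIONS = {
    ("data_lifetime", "temporary"): "application-informed caching",
    ("read", "sequential"): "read-ahead",
    ("write", "sequential"): "asynchronous write",
    ("consistency", "none"): "relaxed consistency",
    ("locality", True): "caching",
    ("compressibility", True): "similarity detection",
    ("security", True): "tunable security levels",
}

def select_optimizations(profile):
    """Return the optimizations whose (axis, value) pair matches the profile."""
    return [opt for (axis, value), opt in OPTIMIZATIONS.items()
            if profile.get(axis) == value]

# Illustrative profile for a workflow-style workload (values are assumptions).
workflow_profile = {
    "data_lifetime": "temporary",
    "read": "sequential",
    "write": "sequential",
    "consistency": "none",
}
print(select_optimizations(workflow_profile))
```

A table-driven lookup keeps the workload description separate from the mechanisms, which matches the idea of configuring the storage system per application.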

10

Opportunities

• Specialization: application-specialized storage
• Underutilized resources:
  • Compute node storage space
  • Interconnect bandwidth

11

Our Solution

Versatile Storage System: Application specialized

The Deployment Approach:
• Configured at deployment time
• Life time coupled with the target application

Potential: Higher performance and scalability

12

Versatile Storage System Architecture

[Figure: system architecture]

• Manager (metadata management)
• Access Module
• Storage Node
• Compute Node

13

Configurable / Extensible IO Pipeline

[Figure: IO pipeline in the Access Module, connected to the Storage Node]

Pipeline components: Application IO, Queue, Dispatcher, Buffer Manag., …, Consistency, Metadata Operations, Content Addressability, Data Security, Communication Agent

Reduced pipeline shown: Application IO, Queue, Dispatcher, Buffer Manag., Metadata Operations

14

Configurable / Extensible IO Pipeline

[Figure: IO pipeline in the Access Module, connected to the Storage Node]

Pipeline components: Application IO, Queue, Dispatcher, Buffer Manag., …, Consistency, Metadata Operations, Content Addressability, Data Security, Communication Agent

Reduced pipeline shown: Dispatcher, …, Consistency, Content Addressability, Data Security, Communication Agent
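One way to picture a configurable pipeline is as an ordered list of module names chosen at deployment time, with a dispatcher that passes each request through the selected modules. The sketch below only illustrates that idea: the module names echo the figure, but `make_stage`, `REGISTRY`, `build_pipeline`, and the request format are hypothetical, and each stage does nothing except record that it ran.

```python
from typing import Callable, Dict, List

Request = dict
Stage = Callable[[Request], Request]

def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records that it ran."""
    def stage(req: Request) -> Request:
        req.setdefault("trace", []).append(name)
        return req
    return stage

# Hypothetical registry of available modules, named after the figure's boxes.
REGISTRY: Dict[str, Stage] = {
    name: make_stage(name)
    for name in (
        "buffer_management",
        "consistency",
        "metadata_operations",
        "content_addressability",
        "data_security",
        "communication_agent",
    )
}

def build_pipeline(config: List[str]) -> Stage:
    """Assemble a dispatcher that runs the configured modules in order."""
    stages = [REGISTRY[name] for name in config]  # KeyError on an unknown module
    def dispatch(req: Request) -> Request:
        for stage in stages:
            req = stage(req)
        return req
    return dispatch

# A deployment that enables only two modules.
light = build_pipeline(["buffer_management", "metadata_operations"])
result = light({"op": "write", "path": "/tmp/stage1.out"})
print(result["trace"])
```

Because the configuration is just a list, two deployments of the same code base can run different pipelines without touching the modules themselves.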

15

Configurable / Extensible Support

[Figure: extension points across the Access Module, Storage Node, and Manager]

• Metadata Service API
• Dispatcher request: header, request data
• New Module Support

Pipeline shown: Application IO, Queue, Dispatcher, Buffer Manag., …, Metadata Operations, NM (new module), Communication Agent
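The figure suggests that a request carries a header plus request data and that the dispatcher can route it to a newly added module. A minimal sketch under those assumptions follows; the `Request` fields, the `Dispatcher` class, the `"module"` header key, and the compression module are all hypothetical stand-ins rather than the system's actual interfaces.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Request:
    header: Dict[str, str]  # hypothetical: names the module that should handle the request
    data: bytes             # the request payload

Handler = Callable[[Request], bytes]

class Dispatcher:
    """Routes each request to the module named in its header."""

    def __init__(self) -> None:
        self._modules: Dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        # Adding a new module requires no change to the dispatcher itself.
        self._modules[name] = handler

    def dispatch(self, req: Request) -> bytes:
        handler = self._modules[req.header["module"]]  # KeyError if unregistered
        return handler(req)

def compress_module(req: Request) -> bytes:
    """Example new module: compress the payload with zlib."""
    return zlib.compress(req.data)

dispatcher = Dispatcher()
dispatcher.register("compress", compress_module)

payload = b"checkpoint " * 100
packed = dispatcher.dispatch(Request(header={"module": "compress"}, data=payload))
print(len(payload), len(packed))
```

Keeping routing information in the header lets the dispatcher stay unaware of what each module does with the data.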

16

Preliminary Evaluation – Real Application

DOCK6 workflow:

Stages                                            Versatile Storage Optimizations  Results (8K processors)
Read input, compute, and write temporary results  Cache the input data             1.06x
Summarize, sort, and select                       Cache temporary files            11.76x
Archive                                           Asynch. flush results to GPFS    1.51x

Overall: 1.52x

17

Summary

Versatile Storage System
• Underutilized resources
• Application specialization

The Deployment Approach:
• Configured at deployment time
• Coupled with the target application

Potential: Higher performance and scalability

18

Not addressed – Future work

Configurability / extensibility evaluation
• Complete prototype
• Evaluation with a diverse set of applications

Configuration
• Application profiling
• File system automated configuration

19

Thank you

netsyslab.ece.ubc.ca