29
DDSS: A Low-Overhead Distributed Data Sharing Substrate for Cluster-Based Data-Centers over Modern Interconnects K. Vaidyanathan, S. Narravula and D. K. Panda Network Based Computing Laboratory (NBCL) The Ohio State University

NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

Embed Size (px)

DESCRIPTION

 

Citation preview

Page 1: NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

DDSS: A Low-Overhead Distributed Data Sharing Substrate

for Cluster-Based Data-Centers over Modern Interconnects

K. Vaidyanathan, S. Narravula and D. K. Panda

Network Based Computing Laboratory (NBCL)

The Ohio State University

Page 2: NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

Presentation Outline

• Introduction and Motivation

• Proposed DDSS Framework

• Experimental Results

• Conclusions and Future Work

Page 3: NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

Introduction and Motivation

WANWAN

Clients

Web-server(Apache)

DatabaseServer

(MySQL)

Storage

• Internet growth– Number of Users, Type of Service, Amount of data

– E-Commerce, online-banking, stocks, airline reservations

• Data-centers enable such services– Process data and reply to queries

– Need for services like caching, resource adaptation for performance, scalability

ProxyServer

Caching,load

balancing

Application Server (PHP)

CGI, PHP

Multi-TierData-Centers

Page 4: NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

High-Performance Networks

• InfiniBand, 10 GigE– High Bandwidth– Low Latency

• Provides rich features– RDMA semantics, Atomic operations, Protocol offload

• OpenFabrics stack– Single interface for InfiniBand, iWARP/10 GigE, etc

• Targeted for Multi-Tier Data-Centers• Can the data-center processes coordinate

better?

Page 5: NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

Information-Sharing is common• Applications typically employ their own

– Data placement and management protocols– Synchronization protocols

• Data-Center services– Active Resource Adaptation

• Maintain Server state information

• Locking requirements– Caching

• Coherency & Consistency requirements

– Resource Monitoring (IBM Websphere)

• Load information shared across several servers

– Critical decisions based on shared information

ProxyModule M1

(S1, S2S3, S4 load)

ProxyModule M2

(S1, S2S3, S4 load)

ProxyModule M3

(S1, S2S3, S4 load)

Load ofServer S1

Load ofServer S2

Load ofServer S3

Load ofServer S4

Resource Monitoring Service

Page 6: NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

Problems with Existing approaches

• Ad-hoc messaging protocols for exchanging data

• May have high overheads

• Performance may depend on the system load

• May not use the advanced features

• May not be scalable

Page 7: NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

Objective

• Can we design a load resilient substrate (DDSS) for data-center applications and services utilizing advanced features such as RDMA, remote atomic operations?

Page 8: NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

Presentation Outline

• Introduction and Motivation

• Proposed DDSS Framework

• Experimental Results

• Conclusions and Future Work

Page 9: NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

Distributed Data Sharing Mechanism

Shared Data

Data-CenterApplication

ResourceAdaptationServices

LoadBalancingServices

Data-CenterApplication

ResourceMonitoringServices

ResourceMonitoringServices

Get

Get load

Get load

Put

Put load

Put load

Lock Data

Provide an effective mechanism to share data across the data-center

Page 10: NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

Proposed DDSS Framework

InfiniBand 10 GigE High-Speed Interconnects

ProtocolOffload RDMA Atomic Multicast

IPC Management

ConnectionMgmt

MemoryMgmt

DataMgmt

BasicLocks

Coherency,ConsistencyMaintenance

Data-CenterApplications

Data-CenterServices

High-SpeedNetworks

AdvancedNetworkFeatures

DistributedData-Sharing

SubstrateComponents

Page 11: NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

Proposed Framework Contd…

• Data Management– Local vs Remote, for load

balancing

• Basic Locking– Through atomic operations

(IBA)

• Coherency and Consistency Maintenance– Strict, Write/Read, Null, Delta,

Version– Use of RDMA and atomic

operations

IPC Management

ConnectionMgmt

MemoryMgmt

DataMgmt

BasicLocks

Coherency,ConsistencyMaintenance

ProtocolOffload

RDMAAtomic

Page 12: NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

Proposed Framework Contd…

• Connection Management– Takes care of connection-setup and

teardown for nodes participating in DDSS

• Memory Management– Allocates a pool of memory for

DDSS on each node– Manages allocation, release

operations

• IPC Management– Access for multiple threads– Message Queues

Data-CenterApplications

&Services

Module

IPC

OpenFabricsStack

OtherApplications

OtherModules

TCP/IPStack

Page 13: NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

DDSS Interface

DDSS Interface• allocate_ss(…)

• release_ss(…)

• get(…)

• put(…)

• acquire_lock_ss(…)

• release_lock_ss(…)

• …

Key = allocate_ss(1024, NONCOHERENT_SS, 5000);

put(key, data, 10);

compute();

get(key,data, 10);

release_ss(key);

Key = allocate_ss(1024, WRITE_COHERENT_SS, 5000);

acquire_lock_ss(key);

put(key, data, 10); release_lock_ss(key);

compute();

get(key,data, 10);

release_ss(key);

Page 14: NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

Presentation Outline

• Introduction and Motivation

• Proposed Framework

• Experimental Results

• Conclusions and Future Work

Page 15: NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

Experimental Testbed

• InfiniBand– Cluster with dual Intel Xeon 3.4 GHz, 1GB memory– MT25128 Mellanox HCA

• iWARP/GigE– Cluster with Intel dual Xeon 3.0 GHz, 512 MB

memory– Ammasso 1100 Gigabit Ethernet NIC

• OpenFabrics stack– IB, Ammasso (iWARP)

Page 16: NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

Experimental Results Outline

• Microbenchmarks– Performance of put() and get() operations

• Distributed Applications– Distributed STORM– Checkpointing Application

• Data-Center Services– Active Resource Adaptation– Active Caching

Page 17: NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

Microbenchmarks

• performance of put() and get() operation for small messages is less than 65 usecs for all coherence models

0

2 0

4 0

6 0

8 0

1 0 0

1 2 0

1 4 0

1 1 6 2 5 6 4 0 9 6 6 5 5 3 6

M e s s a g e S iz e (b yte s )

Late

ncy

(use

cs)

N u ll

R e a d

W rite

S tr ic t

V e rs io n

D e lta

0

2 0

4 0

6 0

8 0

1 0 0

1 2 0

1 4 0

1 1 6 2 5 6 4 0 9 6 6 5 5 3 6

M e s s a g e S iz e (b yte s )

Late

ncy

(use

cs)

N u ll

R e a d

W rite

S tr ic t

V e rs io n

D e lta

put() performance get() performance

Page 18: NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

Distributed STORM

• Select data of interest and transfer from storage to compute nodes

• Same dataset is processed by multiple STORM applications

this shared dataset is placed in DDSS

• STORM using DDSS shows close to 19% improvement

01 0 0 02 0 0 03 0 0 04 0 0 05 0 0 06 0 0 07 0 0 08 0 0 0

Q u e ry E x e c u t io n

T im e (u s e c s )

1 K 5 K 1 0 K 1 0 0 K

# R e c o rd s

S T O R M S T O R M -D D S S

Page 19: NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

CR Coordination

• Checkpoint at random time

• Simulates restart from a consistent checkpoint

• Checkpoint uses DDSS for maintaining checkpoint information, locks, versions, etc

• Check-pointing applications using DDSS are highly scalable

050

100150200250300350

2 3 4 5 6 7 8 9 10 11 12

Number of clientsTim

e (us

ecs)

Avg Sync Time Avg Total Time

Page 20: NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

Active Resource Adaptation

• Monitors the load of different websites

• If a website is loaded, shift under-utilized servers to loaded websites

• Software Overhead of DDSS is < 2%

0

5 0

1 0 0

1 5 0

5 1 0 2 0 4 0 6 0 8 0

L o a d (%)

Time (

usec

s)

05 0 01 0 0 01 5 0 02 0 0 02 5 0 03 0 0 0

R e c o n fig u ra t io n T im e s o ftw a re -o v e rh e a d

N o o f R e c o n fig u ra t io n s

Page 21: NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

Active Caching

• Supports Strong Coherency for cached dynamic data

• Checks the back-end for current version using RDMA

• Active cache using DDSS is load-resilient

01 0 02 0 03 0 04 0 05 0 06 0 07 0 0

1 2 4 8 1 6 3 2

N u m b e r o f C o m p u te /C o m m u n ic a t io n T h re a d s

Time (

usec

s)

V e rs io n C h e c k - D D S S V e rs io n C h e c k - T C P

Page 22: NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

Conclusions & Future Work

• Proposed a distributed data sharing substrate• Using DDSS, data-center applications and

services, with very little modification, can get significant benefits in performance and scalability

• Implemented over OpenFabrics – applicable across InfiniBand, iWARP-capable adapters

• Future work on Fault-tolerance, support for large file sizes, advanced resource management schemes.

Page 23: NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

Acknowledgements

Our research is supported by the following organizations

• Current Funding support by

• Current Equipment support by

Page 24: NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

Web Pointers

Group Homepage: http://nowlab.cse.ohio-state.edu

Emails: {vaidyana, narravul, panda}@cse.ohio-state.edu

NBC-LAB

Page 25: NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

Backup Slides

Page 26: NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

High-Performance Networks in Data-Centers

• InfiniBand, iWARP-capable adapters– Offer several features like RDMA, atomic operations (IB), iWARP

(Ammasso, 10 GigE)

Cluster-Based Data-Center Environment

(InfiniBand, iWARP-capable Ammasso, 10 GigE)

WideArea

Network

Distributed Data-Center Environment

iWARP Cluster

iWARP Cluster

iWARP Cluster

Page 27: NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

Active Resource Adaptation Design

ServerWebsite A

LoadBalancer

ServerWebsite B

Not Loaded Loaded

Load QueryLoad Query

Successful Atomic (Lock)

Successful Atomic (Update Counter)

Reconfigure Node

Successful Atomic (Unlock)

Load Shared Load Shared

RDMARDMA

P. Balaji, K. Vaidyanathan, S. Narravula and D.K. Panda “Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand”

presented at RAIT 2004

Page 28: NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

RDMA based Client Polling Design

Front-End Back-End

Request

Cache Hit

Cache Miss

Response

Version Read

Response

S. Narravla, P. Balaji, K. Vaidyanathan, S. Krishnamoorthy, J. Wu and D.K. Panda “Supporting Strong Coherency for Active Caches in Multi-

Tier Data-Centers over InfiniBand” presented at SAN 2004

Page 29: NETWORK-BASED COMPUTING LABORATORY DDSS: A Low-Overhead

Microbenchmarks

• performance of put() and get() operation is less than 50 usecs

0

1 0

2 0

3 0

4 0

5 0

6 0

1 5 9

N u m b e r o f C l ie n ts

Late

ncy

(use

cs)

N u ll

R e a d

W rite

S tr ic t

V e rs io n

D e lta

0

1 0 0

2 0 0

3 0 0

4 0 0

5 0 0

6 0 0

0 .1 0 .8

L o c k C o n te n t io n (% )La

tenc

y (u

secs

)

N u ll

R e a d

W rite

S tr ic t

V e rs io n

D e lta