DieCast: Testing Distributed Systems with an Accurate Scale Model

Diwaker Gupta

Kashi V. Vishwanath

Amin Vahdat

University of California, San Diego

High performance filesystem (Alice)

June 7, 2008 NSDI 2008 | DieCast 2

Limited testing infrastructure

Diverse deployment environments

Use smaller infrastructure to test a much larger system

Goals

• Fidelity

– How closely can we replicate the target system?

• Reproducibility

– Can we do controlled experiments?

• Efficiency

– Use fewer resources

DieCast can scale up a test infrastructure by an order of magnitude

DieCast Overview

✓ Replicate target system using fewer machines

✓ Resource equivalence: perceived CPU capacity, disk and network characteristics

✓ Preserve application performance

× Not scaled

× Physical memory: mitigating solutions

× Secondary storage: cheap

Original System

Application servers

Load balancer

Web servers

Database servers

Switches

• Fidelity

• Reproducibility

• Efficiency

Server Consolidation (VMs)

Network emulation

• Fidelity

• Reproducibility

• Efficiency

Multiplexing Leads to Resource Partitioning

3 GHz CPU, 1 Gbps N/W, 15 Mbps disk I/O, 2 GB RAM

Split equally among 5 VMs

~ 600 MHz CPU, 200 Mbps N/W, 3 Mbps disk I/O, 400 MB RAM each

Time Dilation [NSDI 2006]

Key idea: time is also a resource!

• Slow down passage of time within the OS

• CPU, network, disk – all appear faster

• Experiments take longer

[Figure: event timelines]

Real time (no dilation): 10 Mb in 1 sec, perceived bandwidth = 10 Mb/s

Dilated time: 10 Mb in 100 msec, perceived bandwidth = 100 Mb/s

Time Dilation Factor (TDF) = Real time/Virtual time

In this example, TDF = 1 sec / 100 msec = 10
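The definition above can be checked with a few lines of Python. This is only an illustrative sketch: the function names `time_dilation_factor` and `perceived_bandwidth` are hypothetical and not part of DieCast; the numbers are the ones from the slide (10 Mb delivered in 1 sec of real time, seen as 100 msec by the dilated OS).

```python
def time_dilation_factor(real_seconds: float, virtual_seconds: float) -> float:
    """TDF = real time / virtual time, as defined on the slide."""
    return real_seconds / virtual_seconds


def perceived_bandwidth(bits: float, real_seconds: float, tdf: float) -> float:
    """Bandwidth seen inside the dilated OS, in bits per (virtual) second.

    The dilated OS believes only real_seconds / tdf have elapsed, so the
    same amount of data appears to arrive tdf times faster.
    """
    virtual_seconds = real_seconds / tdf
    return bits / virtual_seconds


# Slide example: 10 Mb arrive during 1 sec of real time (100 msec virtual).
tdf = time_dilation_factor(real_seconds=1.0, virtual_seconds=0.1)
perceived = perceived_bandwidth(bits=10e6, real_seconds=1.0, tdf=tdf)
print(f"TDF = {tdf:.0f}, perceived bandwidth = {perceived / 1e6:.0f} Mb/s")
```

The cost is visible in the same arithmetic: one second of perceived time takes TDF seconds of real time, which is why experiments take longer.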

Multiplexing Under Time Dilation

3 GHz CPU, 1 Gbps N/W, 15 Mbps disk I/O, 2 GB RAM

Split equally among 5 VMs: ~ 600 MHz CPU, 200 Mbps N/W, 3 Mbps disk I/O, 400 MB RAM each

Perceived at TDF 5: ~ 3 GHz CPU, 1 Gbps N/W, 15 Mbps disk I/O?, 400 MB RAM each
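The arithmetic on this slide can be sketched as follows. The `Resources` class and the `partition` and `perceived` helpers are hypothetical names introduced for illustration. The sketch assumes an equal split among VMs, takes 2 GB as 2000 MB to match the slide's 400 MB figure, and multiplies disk throughput by the TDF like the other rates, which is the naive model the slide marks with a question mark. Memory is left unscaled because time dilation does not affect capacity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_mhz: float
    network_mbps: float
    disk_mbps: float
    ram_mb: float


def partition(host: Resources, num_vms: int) -> Resources:
    """Real share of each VM when the host is split equally."""
    return Resources(
        cpu_mhz=host.cpu_mhz / num_vms,
        network_mbps=host.network_mbps / num_vms,
        disk_mbps=host.disk_mbps / num_vms,
        ram_mb=host.ram_mb / num_vms,
    )


def perceived(vm: Resources, tdf: float) -> Resources:
    """Resources as seen from inside a VM running with the given TDF.

    Rates (CPU, network, disk) appear tdf times faster; RAM is a
    capacity, not a rate, so it is not scaled.
    """
    return Resources(
        cpu_mhz=vm.cpu_mhz * tdf,
        network_mbps=vm.network_mbps * tdf,
        disk_mbps=vm.disk_mbps * tdf,  # naive assumption, see lead-in
        ram_mb=vm.ram_mb,
    )


host = Resources(cpu_mhz=3000, network_mbps=1000, disk_mbps=15, ram_mb=2000)
share = partition(host, num_vms=5)
view = perceived(share, tdf=5)
print(share)
print(view)
```

With the TDF equal to the number of VMs per host, every rate returns to the host's original value while RAM stays at the per-VM share.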

Time Dilation: External Interactions

Dilated Time Frame

Network

External systems running in the real time frame

Disk I/O Scaling

• Invariant: perceived disk characteristics are preserved

– Seek time

– Read/write throughput

• Issues

– Low level functionality in firmware

– Different I/O models

– Per request scaling is difficult

Implementation Details

• Supported platforms

– Xen 2.0.7, 3.0.4, 3.1

– Can be ported to non-virtualized systems

• Support for unmodified guest OSes

• Disk I/O scaling for different I/O models

– Fully virtualized: integration with DiskSim

– Paravirtualized: scaling in device driver

Disk I/O Scaling: Fully Virtualized VMs

[Figure: Xen, Domain-0, VM (unmodified OS), guest OS filesystem, disk device driver, ioemu (I/O emulation), DiskSim, VM disk image]

Guest OS unaware that no real disk exists

Request completion time in simulated disk

Disk I/O Scaling: Fully Virtualized VMs

[Figure: Xen, Domain-0, VM (unmodified OS), disk device driver, ioemu, DiskSim, VM disk image]

Service time in simulated disk: $T_{\mathrm{sim}}$

DiskSim running time: $T_{\mathrm{disksim}}$

Actual time to service: $T_{\mathrm{ioemu}}$

Delay: $\mathrm{Delay}$

Required perceived time: $T_{\mathrm{sim}}$

⇒ Total real time $T_{\mathrm{real}} = \mathrm{TDF} \cdot T_{\mathrm{sim}}$

$T_{\mathrm{real}} = T_{\mathrm{ioemu}} + \mathrm{Delay} + T_{\mathrm{disksim}}$

⇒ $\mathrm{Delay} = (\mathrm{TDF} \cdot T_{\mathrm{sim}}) - T_{\mathrm{disksim}} - T_{\mathrm{ioemu}}$
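The delay formula can be written as a small function. This is a sketch, not the DieCast implementation: `disk_delay` is a hypothetical name, the timing values in the example are made up for illustration, and clamping the result at zero (for the case where emulation overhead already exceeds the required real time) is an assumption not stated on the slide.

```python
def disk_delay(tdf: float, t_sim: float, t_disksim: float, t_ioemu: float) -> float:
    """Extra delay to add so a request completes after TDF * Tsim of real time.

    t_sim:     service time reported by the simulated disk (perceived time)
    t_disksim: real time spent running the disk simulation itself
    t_ioemu:   real time spent actually servicing the request
    All values are in seconds.
    """
    t_real = tdf * t_sim                    # Treal = TDF * Tsim
    delay = t_real - t_disksim - t_ioemu    # Delay = Treal - Tdisksim - Tioemu
    # Assumption: a request cannot be completed earlier than it was serviced,
    # so a negative delay is clamped to zero.
    return max(delay, 0.0)


# Hypothetical numbers: 8 ms simulated service time at TDF 10,
# 2 ms of simulator overhead, 5 ms of real I/O.
delay = disk_delay(tdf=10, t_sim=0.008, t_disksim=0.002, t_ioemu=0.005)
print(f"delay = {delay * 1000:.1f} ms")
```

A guest running at TDF 10 then sees the request complete after 80 ms of real time, which it perceives as the 8 ms the simulated disk reported.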

Network I/O Scaling

Invariant: Perceived network characteristics (bandwidths and latencies) must be preserved

[Figure: link labeled 10 Mb/s, 20 ms RTT]

Configuration | Real Configuration | Perceived Configuration
Original system (TDF 1) | 10 Mb/s, 20 ms | 10 Mb/s, 20 ms
Time Dilation (TDF 5) | 10 Mb/s, 20 ms | 50 Mb/s, 4 ms
DieCast (TDF 5) | 2 Mb/s, 100 ms | 10 Mb/s, 20 ms

Network emulation: ModelNet, Dummynet
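The rows above follow from two simple rules: under time dilation, perceived bandwidth is the real bandwidth multiplied by the TDF and perceived latency is the real latency divided by it. The sketch below inverts those rules to find the link settings a network emulator would need. The function names are hypothetical and no ModelNet or Dummynet API is used or implied.

```python
from typing import Tuple


def perceived_link(real_bw_mbps: float, real_rtt_ms: float, tdf: float) -> Tuple[float, float]:
    """Link characteristics as seen from inside a time-dilated host."""
    return real_bw_mbps * tdf, real_rtt_ms / tdf


def emulated_link(target_bw_mbps: float, target_rtt_ms: float, tdf: float) -> Tuple[float, float]:
    """Real link settings to configure so the dilated host perceives the target."""
    return target_bw_mbps / tdf, target_rtt_ms * tdf


# Time dilation alone (TDF 5): a real 10 Mb/s, 20 ms link looks faster.
dilated_only = perceived_link(10, 20, tdf=5)

# DieCast (TDF 5): slow the real link down so the perceived link is unchanged.
real_settings = emulated_link(10, 20, tdf=5)
round_trip = perceived_link(*real_settings, tdf=5)

print(dilated_only)   # perceived (Mb/s, ms) with no network scaling
print(real_settings)  # real (Mb/s, ms) to configure in the emulator
print(round_trip)     # perceived (Mb/s, ms) after scaling
```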

Recap

• Multiplex VMs for efficiency

• Time dilation to scale resources

• Disk I/O scaling

• Network I/O scaling

At this point, the scaled system looks almost like the original system!


• Fidelity

• Reproducibility

• Efficiency

Validation

• How well does DieCast scaled performance match the original system?

– Application specific metrics

• Can a smaller system be configured to match the resources of a larger system?

– Resource utilization profiles

• Applications: RUBiS, BitTorrent, Isaac

• RUBiS

– eBay like e-Commerce service

– Ships with workload generator

RUBiS: Topology

[Figure: two sites, each with 4 DB servers and 8 web servers; 16 workload generators; wide area links]

Experimental Setup

Baseline configuration: 40 physical machines

DieCast scaled configuration: 4 physical machines, 10 VMs each

• Xen 3.1, fully virtualized VMs

• Debian Etch, Linux 2.6.17, 256 MB RAM

• DiskSim emulating Seagate ST3217

• Network emulation using ModelNet

RUBiS: Throughput

RUBiS: Response Time

RUBiS: Resource Usage

[Figure panels: CPU, Memory, Network]

Validation Recap

• Evaluated

– RUBiS

– BitTorrent

– Isaac

Many more details in the paper

• Demonstrated

– Match application specific metrics

– Preserve resource utilization profile

Case study: Panasas

• Panasas builds scalable storage systems for high performance computing

– http://www.panasas.com

• Caters to a variety of clients

• Difficult or even impossible to replicate deployment environment of all clients

• Limited resources for testing

DieCast in Panasas

• Custom OS

• Integrated hw/sw offering

• Not runnable on Xen

• Porting DieCast to non-virtualized environments

[Figure: clients and storage cluster]

Clients run Linux, can be virtualized

Dummynet for network scaling

Panasas: Evaluation Summary

Baseline

DieCast scaled: 1 PM, 10 VMs

• Validation

– Two benchmarks from standard test suite: IOZone, MPI-IO; varying block sizes

– Match performance metrics

Scaling: Used 100 machines to scale to 1000 clients

Limitations

• Memory scaling

• Long running workloads

• Specialized hardware appliances

• Fine grained timing

Summary

• DieCast: scalable testing

– Fidelity, Reproducibility, Efficiency

• Contributions

– Support for unmodified operating systems

– Implement disk I/O scaling (DiskSim integration)

– CPU scheduler enhancements for time dilation

– Comprehensive evaluation, including a commercial storage system

Thanks!

Questions?

[email protected]