
Page 1:

Gordon: Using Flash Memory to Build Fast, Power-efficient Clusters for Data-intensive Applications
A. Caulfield, L. Grupp, S. Swanson, UCSD, ASPLOS'09

Shimin Chen, Big Data Reading Group

Page 2:

Introduction
Gordon: a flash-based system architecture for massively parallel, data-centric computing.
- Solid-state disks
- Low-power processors
- Data-centric programming paradigms
Can deliver:
- Up to 2.5x the computation per unit of energy of a conventional cluster-based system
- While increasing performance by up to 1.5x

Page 3:

Outline
- Gordon's system architecture
- Gordon's storage system
- Configuring Gordon
- Discussion
- Summary

Page 4:

Gordon's System Architecture
[Figure: a Gordon node, and 16 nodes in an enclosure]

Page 5:

Prototype Photo:

http://www-cse.ucsd.edu/users/swanson/projects/gordon.html

The evaluation in the ASPLOS'09 paper uses simulation.

Page 6:

Gordon node configuration:
- 256 GB of flash storage
- A flash storage controller (with 512 MB of dedicated DRAM)
- 2 GB of ECC DDR2 SDRAM
- A 1.9 GHz Intel Atom processor
- Runs a minimal Linux installation
Power: no more than 19 W, compared to 81 W for a conventional server
900 MB/s of read and write bandwidth to the 256 GB of flash storage

Page 7:

Enclosures
Within an enclosure, 16 nodes plug into a backplane that provides a 1 Gb Ethernet-style network.
A rack holds 16 enclosures (16 x 16 = 256 nodes):
- 64 TB of storage
- 230 GB/s of aggregate I/O bandwidth
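A quick check of how the rack-level numbers follow from the per-node figures quoted earlier (256 GB of flash and 900 MB/s of flash bandwidth per node):

```python
# Rack-level totals derived from the per-node figures in the slides.
nodes = 16 * 16                          # 16 nodes/enclosure x 16 enclosures/rack
storage_tb = nodes * 256 / 1024          # 256 GB of flash per node -> 64 TB
aggregate_io_gb_s = nodes * 900 / 1000   # 900 MB/s per node -> ~230 GB/s
print(nodes, storage_tb, aggregate_io_gb_s)  # 256 64.0 230.4
```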

Page 8:

Programming
From the software and users' perspectives, a Gordon system appears to be a conventional computing cluster.
Benchmarks: Hadoop
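Because Gordon presents itself as an ordinary cluster, standard Hadoop jobs run unchanged. Purely as an illustration (this specific job is not from the paper), a Hadoop Streaming word count written in Python:

```python
#!/usr/bin/env python3
# Word count for Hadoop Streaming; run this file as the mapper ("map")
# and as the reducer ("reduce").
import sys

def mapper():
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")                # emit (word, 1)

def reducer():
    current, count = None, 0
    for line in sys.stdin:                     # input arrives sorted by key
        word, n = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(n)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```

It would be submitted through the standard Hadoop Streaming jar (with the usual -input, -output, -mapper, and -reducer options), exactly as on a conventional cluster.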

Page 9:

Outline
- Gordon's system architecture
- Gordon's storage system
- Configuring Gordon
- Discussion
- Summary

Page 10:

Flash Array Hardware
[Diagram: CPU with 2 GB of system DRAM; flash controller with 512 MB of FTL DRAM; array of flash memory packages]
4 buses, each supporting 4 flash packages
The flash controller has over 300 pins

Page 11:
Page 12:

Super-Page
Three ways to stripe data across flash memory:
- Horizontal: across buses
- Vertical: across packages on the same bus
- 2-D: combined
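A minimal sketch of what the three striping choices mean for where consecutive pieces of a super-page land, using the 4-bus x 4-package geometry from the hardware slide; the mapping functions are illustrative, not the controller's actual layout:

```python
# Illustrative mapping of a super-page's sub-pages onto (bus, package) slots.
BUSES, PACKAGES_PER_BUS = 4, 4   # geometry from the hardware slide

def horizontal(n):
    """Stripe across buses."""
    return [(i % BUSES, 0) for i in range(n)]

def vertical(n):
    """Stripe across packages on the same bus."""
    return [(0, i % PACKAGES_PER_BUS) for i in range(n)]

def two_d(n):
    """2-D: across buses first, then across packages on each bus."""
    return [(i % BUSES, (i // BUSES) % PACKAGES_PER_BUS) for i in range(n)]

print(two_d(8))  # [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]
```

Wider striping exposes more parallelism per access, which is one reason the super-page size matters for performance (see the trace-based study below).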

Page 13:

Bypassing and Write Combining
- Read bypassing: merging read requests to the same page
- Write combining: merging write requests to the same page
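A toy sketch of the idea (not the controller's actual queue logic; it also ignores read-after-write ordering): requests accumulate in a queue and are merged before being issued to flash:

```python
# Merge a queue of requests before issuing them to flash.
def merge_requests(queue):
    """queue: list of ('R', page) or ('W', page, data) in arrival order."""
    reads, writes, merged = set(), {}, []
    for req in queue:
        if req[0] == 'R':
            page = req[1]
            if page not in reads:        # read bypassing: later reads to the
                reads.add(page)          # same page merge with the first one
                merged.append(('R', page))
        else:
            _, page, data = req
            writes[page] = data          # write combining: keep only the
                                         # latest write to each page
    merged += [('W', p, d) for p, d in writes.items()]
    return merged

q = [('R', 7), ('W', 3, b'a'), ('R', 7), ('W', 3, b'b')]
print(merge_requests(q))  # [('R', 7), ('W', 3, b'b')]
```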

Page 14:

Trace-Based Study

A 64 KB super-page achieves the best performance.

Page 15:

Outline
- Gordon's system architecture
- Gordon's storage system
- Configuring Gordon
- Discussion
- Summary

Page 16:

Workloads
MapReduce workloads using Hadoop.
To collect traces, each workload is run on eight 2.4 GHz Core 2 Quad machines with 8 GB of RAM, a single SATA hard disk, and 1 Gb Ethernet.

Page 17:

Power Model

Page 18:

Simulation
High-level trace-driven cluster simulator:
- Trace: second-by-second utilization (number of instructions, L2 cache misses, bytes sent and received over the network, and hard drive accesses)
- Replay the trace against a conventional cluster model and a Gordon model
Detailed storage system simulators:
- DiskSim, simulating a 1 TB, 7200 RPM SATA II hard drive
- The Gordon flash storage simulator
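A minimal sketch of the trace-replay idea, assuming per-second records of instruction counts and storage accesses. The node parameters and the flat power model are placeholders rather than the paper's calibrated models; only the 19 W and 81 W node powers come from the earlier slides:

```python
# Replay one utilization trace against a simple node model.
def replay(trace, node_ips, storage_latency_s, active_w):
    """trace: per-second dicts with 'instructions' and 'disk_accesses'."""
    total_time = total_energy = 0.0
    for second in trace:
        compute_t = second["instructions"] / node_ips        # CPU model
        io_t = second["disk_accesses"] * storage_latency_s   # storage model
        t = max(compute_t, io_t)                             # overlap CPU and I/O
        total_time += t
        total_energy += active_w * t                         # watts * seconds = joules
    return total_time, total_energy

trace = [{"instructions": 2e9, "disk_accesses": 120},
         {"instructions": 1e9, "disk_accesses": 40}]
# Same trace, two node models: a conventional server vs. a Gordon node.
print(replay(trace, node_ips=4e9, storage_latency_s=5e-3, active_w=81))
print(replay(trace, node_ips=1.5e9, storage_latency_s=1e-4, active_w=19))
```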

Page 19:
Page 20:

88 different node configurations

Page 21:

Optimal Designs
- MinT: minimum time
- MaxE: maximum performance per watt
- MinP: minimum power
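In code terms, picking these three designs amounts to scanning the simulated configurations; the configuration list below is made up for illustration (the real sweep covers the 88 configurations above), and performance per watt is taken here as 1 / (runtime x average power):

```python
# Select the MinT, MaxE, and MinP designs from simulated configurations.
configs = [
    {"name": "A", "runtime_s": 120.0, "power_w": 30.0},
    {"name": "B", "runtime_s": 150.0, "power_w": 18.0},
    {"name": "C", "runtime_s": 100.0, "power_w": 45.0},
]

min_t = min(configs, key=lambda c: c["runtime_s"])                        # MinT
max_e = max(configs, key=lambda c: 1 / (c["runtime_s"] * c["power_w"]))   # MaxE
min_p = min(configs, key=lambda c: c["power_w"])                          # MinP

print(min_t["name"], max_e["name"], min_p["name"])  # C B B
```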

Page 22:
Page 23:

Outline
- Gordon's system architecture
- Gordon's storage system
- Configuring Gordon
- Discussion
- Summary

Page 24:

Exploit Disks
- Cheap redundancy
- Virtualize Gordon: Gordon + a large disk-based data warehouse
  - Load data from disk storage into Gordon for a job
  - May overlap a job's data transfer time with computation
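A sketch of that overlap only; the staging and job functions are stand-ins, not an interface from the paper. While one job computes over data already in Gordon's flash, the next job's data is staged in from the disk warehouse:

```python
# Pipeline job data staging (warehouse -> flash) with computation on Gordon.
from concurrent.futures import ThreadPoolExecutor

def load_from_warehouse(job):      # stand-in: copy a job's input into flash
    print(f"staging data for {job}")

def run_on_gordon(job):            # stand-in: run the MapReduce job
    print(f"running {job}")

jobs = ["job1", "job2", "job3"]
with ThreadPoolExecutor(max_workers=1) as stager:
    staged = stager.submit(load_from_warehouse, jobs[0])
    for i, job in enumerate(jobs):
        staged.result()                                   # this job's data is ready
        if i + 1 < len(jobs):
            staged = stager.submit(load_from_warehouse, jobs[i + 1])
        run_on_gordon(job)                                # overlaps with staging
```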

Page 25:

System Costs

Actual price is $7200

Page 26:

System Costs for Storing 1 PB of Data

Page 27:

Summary
- Use flash memory plus a low-power processor (Atom)
- Support data-intensive computing, such as MapReduce operations
- The design is attractive because of its higher power/performance efficiency