CSci5221: Data Centers and Cloud Computing. Cloud Computing and Data Centers: Overview



  • Slide 1
  • CSci5221: Data Centers, Cloud Computing
    Cloud Computing and Data Centers: Overview
    What's Cloud Computing?
    Data Centers and "Computing at Scale"
    Case Studies: Google File System; the MapReduce Programming Model
    Optional material: Google Bigtable
    Readings: do the required readings; also do some of the optional readings if interested
  • Slide 2
  • Why Study Cloud Computing and Data Centers?
    Using Google as an example: GFS, MapReduce, etc. are mostly distributed-systems topics, not really networking.
    Two primary goals:
    - they represent part of current and future trends: how applications will be serviced and delivered, and what the important new networking problems are
    - more importantly, what lessons can we learn for (future) networking design? The fields are closely related, with many similar issues and challenges (availability, reliability, scalability, manageability, ...), though networking of course has its own unique challenges
  • Slide 3
  • Internet and Web
    Simple client-server model: a number of clients served by a single server; performance is determined by peak load
    Doesn't scale well (e.g., the server crashes) when the number of clients suddenly increases: a "flash crowd"
    From single server to blade server to server farm (or data center)
  • Slide 4
  • Internet and Web
    From the traditional web to web services (or SOA):
    - no longer simply file (or web page) downloads; pages are often dynamically generated, with more complicated objects (e.g., Flash videos used in YouTube)
    - HTTP is used simply as a transfer protocol; many other application protocols are layered on top of HTTP
    - web services and SOA (service-oriented architecture)
    A schematic view of modern web services: front-end web rendering, request routing, aggregators; back-end databases, storage, computing, ...
  • Slide 5
  • Data Center and Cloud Computing
    Data center: large server farms + data warehouses; not simply for web/web services; a managed infrastructure, which is expensive!
    From web hosting to cloud computing:
    - individual web/content providers must provision for peak load: expensive, and resources are typically under-utilized
    - web hosting: a third party provides and owns the (server farm) infrastructure, hosting web services for content providers
    - server consolidation via virtualization: a VMM hosting guest OSes and apps, with the guest under client web-service control
  • Slide 6
  • Cloud Computing
    Cloud computing and cloud-based services go beyond web-based information access or delivery: computing, storage, ...
    NIST definition: "Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction."
    Models of cloud computing:
    - Infrastructure as a Service (IaaS), e.g., Amazon EC2, Rackspace
    - Platform as a Service (PaaS), e.g., Microsoft Azure
    - Software as a Service (SaaS), e.g., Google
  • Slide 7
  • Data Centers: Key Challenges
    With thousands of servers within a data center:
    - how do we write applications (services) for them?
    - how do we allocate and manage resources? In particular, how do we ensure performance, reliability, availability, ...?
    Scale and complexity bring other key challenges: with thousands of machines, failures are the default case! Load balancing, handling heterogeneity, ...
    The data center (server cluster) as a computer: super-computer vs. cluster computer
    - a single super-high-performance, highly reliable computer vs. a computer built out of thousands of cheap and unreliable PCs
    - pros and cons?
  • Slide 8
  • Case Studies
    Google File System (GFS): a file system (or "OS") for a cluster computer
    - an overlay on top of the native OS on individual machines
    - designed with certain (common) types of applications in mind, and with failures as the default case
    Google MapReduce (cf. Microsoft Dryad): a new programming paradigm for certain (common) types of applications, built on top of GFS
    Other examples (optional):
    - BigTable: a (semi-)structured database for efficient key-value queries, etc., built on top of GFS
    - Amazon Dynamo: a distributed storage system for which high availability is a key design goal
    - Google's Chubby, Sawzall, etc.
    - open-source systems: Hadoop, ...
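The MapReduce paradigm above splits a computation into a user-supplied map function, a shuffle that groups intermediate pairs by key, and a user-supplied reduce function. A minimal in-memory sketch with the canonical word-count example; this illustrates only the programming model, not Google's distributed implementation:

```python
from collections import defaultdict

def map_phase(documents, map_fn):
    """Apply the user's map function to every input record,
    emitting a flat list of (key, value) pairs."""
    pairs = []
    for doc in documents:
        pairs.extend(map_fn(doc))
    return pairs

def shuffle(pairs):
    """Group intermediate values by key (the 'shuffle' step)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reduce_fn):
    """Apply the user's reduce function to each key's value list."""
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# Canonical word count: map emits (word, 1); reduce sums the counts.
def word_map(doc):
    return [(word, 1) for word in doc.split()]

def word_reduce(word, counts):
    return sum(counts)

docs = ["the quick brown fox", "the lazy dog", "the fox"]
counts = reduce_phase(shuffle(map_phase(docs, word_map)), word_reduce)
# counts["the"] == 3, counts["fox"] == 2
```

In the real system the map and reduce invocations run in parallel on many machines, with the shuffle performed over the network; the programmer writes only `word_map` and `word_reduce`.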
  • Slide 9
  • Google: Scale and Philosophy
    Lots of data: copies of the web, satellite data, user data, email and USENET, Subversion backing store
    Workloads are large and easily parallelizable
    No commercial system is big enough; Google couldn't afford one if there were, and it might not have made the appropriate design choices. Instead: truckloads of low-cost machines (roughly 450,000 machines, per a New York Times estimate, June 14, 2006)
    Failures are the norm: even reliable systems fail at Google scale, so software must tolerate failures, and which machine an application runs on should not matter
    Firm believers in the end-to-end argument
    Care about performance per dollar, not absolute machine performance
  • Slide 10
  • Typical Cluster at Google
    (Figure.) Cluster-wide services (Cluster Scheduling Master, Lock Service, GFS Master) run alongside per-machine components: each machine runs a Scheduler Slave and a GFS Chunkserver on Linux, plus user tasks; BigTable Servers and the BigTable Master run as tasks on these same machines.
  • Slide 11
  • Google: System Building Blocks
    Google File System (GFS): raw storage
    (Cluster) Scheduler: schedules jobs onto machines
    Lock service: a distributed lock manager; can also reliably hold tiny files (100s of bytes) with high availability
    Bigtable: a multi-dimensional database
    MapReduce: simplified large-scale data processing
  • Slide 12
  • Chubby: Distributed Lock Service
    A {lock/file/name} service
    Coarse-grained locks; a small amount of data can be stored in a lock
    5 replicas; a majority vote is needed to be active
    Also an OSDI '06 paper
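The majority-vote rule above is what lets Chubby stay available through replica failures: with 5 replicas, any 3 form a quorum, so up to 2 replicas can be down. A one-line sketch of the quorum condition (illustrative, not Chubby's actual protocol code):

```python
def has_quorum(live_replicas, total_replicas=5):
    """The service is active only if a strict majority of replicas
    can vote: more than half of the total."""
    return live_replicas > total_replicas // 2

# With 5 replicas: 3 live replicas form a quorum, 2 do not.
assert has_quorum(3) and not has_quorum(2)
```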
  • Slide 13
  • Google File System: Key Design Considerations
    Component failures are the norm: hardware component failures, software bugs, human errors, power-supply issues, ...
    - solution: built-in mechanisms for monitoring, error detection, fault tolerance, and automatic recovery
    Files are huge by traditional standards: multi-GB files are common; billions of objects
    Most writes (modifications, or "mutations") are appends
    Two types of reads: a large number of streaming (i.e., sequential) reads, plus a small number of random reads
    High concurrency (multiple producers/consumers on a file): atomicity with minimal synchronization
    Sustained bandwidth is more important than latency
  • Slide 14
  • GFS Architectural Design
    A GFS cluster: a single master + multiple chunkservers, running on commodity Linux machines
    A file: a sequence of fixed-size chunks (64 MB each), labeled with 64-bit globally unique IDs and stored at chunkservers (as native Linux files on local disk); each chunk is replicated across (by default 3) chunkservers
    The master maintains all metadata: namespace, access control, file-to-chunk mappings, garbage collection, chunk migration
    Why only a single master (with read-only shadow masters)? Simplicity: the master only answers chunk-location queries from clients!
    Chunkservers (slaves, or workers) interact directly with clients and perform the actual reads and writes
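Because chunks are a fixed 64 MB, a client can translate any byte offset in a file into a chunk index with simple arithmetic before asking the master for that chunk's location. A minimal sketch (function names are illustrative, not the GFS paper's API):

```python
CHUNK_SIZE = 64 * 1024 * 1024  # fixed 64 MB chunks

def chunk_index(file_offset):
    """Which chunk of the file holds this byte offset?"""
    return file_offset // CHUNK_SIZE

def chunk_offset(file_offset):
    """Byte offset within that chunk."""
    return file_offset % CHUNK_SIZE

# A read at byte 200,000,000 falls in chunk 2, since chunks 0 and 1
# together cover the first 128 MB (134,217,728 bytes).
assert chunk_index(200_000_000) == 2
```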
  • Slide 15
  • GFS Architecture: Illustration
    GFS clients consult the master for metadata (typically asking for multiple chunk locations per request), then access data directly from chunkservers
    Separation of control and data flows
  • Slide 16
  • Chunk Size and Metadata
    Chunk size: 64 MB
    - fewer chunk-location requests to the master
    - a client can perform many operations on a chunk, reducing the overhead of accessing it, and can maintain a persistent TCP connection to a chunkserver
    - fewer metadata entries, so metadata can be kept in memory (at the master); in-memory data structures allow fast periodic scanning
    - some potential problems with fragmentation
    Metadata: file and chunk namespaces (files and chunk identifiers), file-to-chunk mappings, and the locations of each chunk's replicas
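The three kinds of master metadata listed above can be sketched as simple in-memory maps. This is a toy model to make the roles concrete; the names and structures are illustrative, not Google's actual data structures:

```python
# Toy model of the GFS master's in-memory metadata (illustrative only).
namespace = {}        # file path -> list of chunk IDs (file-to-chunk mapping)
chunk_locations = {}  # chunk ID -> list of chunkserver addresses (not persisted)

def create_file(path):
    namespace[path] = []

def add_chunk(path, chunk_id, servers):
    namespace[path].append(chunk_id)
    chunk_locations[chunk_id] = list(servers)

def lookup(path, index):
    """Answer a client's chunk-location query: which servers hold
    the index-th chunk of this file?"""
    chunk_id = namespace[path][index]
    return chunk_id, chunk_locations[chunk_id]

create_file("/logs/web.0")
add_chunk("/logs/web.0", 0xABC1, ["cs1", "cs2", "cs3"])  # default 3 replicas
```

Note that only the namespace and file-to-chunk mappings are made durable via the operation log; the location map is rebuilt from chunkserver reports, as the next slide explains.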
  • Slide 17
  • Chunk Locations and Logs
    Chunk locations: the master does not keep a persistent record of chunk locations; it polls chunkservers at startup and uses heartbeat messages to monitor them. Simplicity! Because chunkservers fail, a persistent record of chunk locations is hard to keep accurate; the on-demand approach beats coordination when changes (failures) are frequent
    Operation log: maintains a historical record of critical metadata changes (namespace and mappings); for reliability and consistency, the operation log is replicated on multiple remote machines (shadow masters)
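The heartbeat scheme above can be sketched as follows: each chunkserver periodically reports which chunks it holds, and the master treats any server whose last report is stale as failed, so the location table is always rebuilt from recent reports rather than persisted. A toy model (class and message names are illustrative, not the paper's protocol):

```python
import time

class Master:
    """Toy master that learns chunk locations only from heartbeats."""

    def __init__(self, timeout=10.0):
        self.timeout = timeout
        self.last_seen = {}  # chunkserver -> (timestamp, set of chunk IDs)

    def heartbeat(self, server, chunk_ids, now=None):
        """A chunkserver reports the chunks it currently holds."""
        now = time.time() if now is None else now
        self.last_seen[server] = (now, set(chunk_ids))

    def locations(self, chunk_id, now=None):
        """Live servers believed to hold this chunk; servers whose
        heartbeats are stale are treated as failed."""
        now = time.time() if now is None else now
        return [s for s, (t, chunks) in self.last_seen.items()
                if now - t <= self.timeout and chunk_id in chunks]

m = Master(timeout=10.0)
m.heartbeat("cs1", [1, 2], now=100.0)
m.heartbeat("cs2", [2], now=100.0)
# At t=105 both servers are live, so chunk 2 is on cs1 and cs2;
# at t=115 with no new heartbeats, both are considered failed.
```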
  • Slide 18
  • Clients and APIs
    GFS is not transparent to clients: it requires clients to perform certain consistency verification (using chunk IDs and version numbers), make snapshots if needed, ...
    APIs: open, delete, read, write (as expected), plus:
    - append: at least once, possibly with gaps and/or inconsistencies among clients
    - snapshot: quickly create a copy of a file
    Separation of data and control
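The "at least once" append semantics above mean a client retries an append whose acknowledgment was lost, so the same record can land in the chunk more than once, separated by padding; readers must therefore skip padding and de-duplicate by record ID. A toy sketch of why (the failure model is deliberately simplified and the names are illustrative, not the GFS client library):

```python
import random

random.seed(7)
chunk = []  # the tail chunk's contents, as a list of records

def record_append(record, fail_rate=0.5):
    """At-least-once append: the data may reach the chunk even when
    the ack is lost, so the client retries and the record is
    duplicated, with padding left behind by the failed attempt."""
    while True:
        chunk.append(record)           # data lands on the chunk...
        if random.random() > fail_rate:
            return                     # ...and the ack arrives
        chunk.append(None)             # ack lost: padding, then retry

record_append("rec-A")
record_append("rec-B")

# Readers skip padding and de-duplicate by record identity:
clean = []
for r in chunk:
    if r is not None and r not in clean:
        clean.append(r)
# clean is ["rec-A", "rec-B"] regardless of how many retries occurred
```

This is the trade-off the slide describes: GFS guarantees each record is appended atomically at least once, and pushes duplicate and gap handling to the application.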

