CSci5221: Data Centers, Cloud Computing. Cloud Computing and Data Centers: Overview.

Transcript

Slide 1: Cloud Computing and Data Centers: Overview
What's cloud computing? Data centers and "computing at scale". Case studies: the Google File System and the MapReduce programming model. Optional material: Google Bigtable. Readings: do the required readings, and some of the optional readings if interested.

Slide 2: Why Study Cloud Computing and Data Centers?
Using Google as an example: GFS, MapReduce, etc. are mostly distributed-systems work, not really networking. Two primary goals: (1) they represent part of current and future trends in how applications will be serviced and delivered, and they raise important new networking problems; (2) more importantly, what lessons can we learn for (future) networking design? The two areas are closely related and share many issues and challenges (availability, reliability, scalability, manageability, ...), though networking of course has its own unique challenges.

Slide 3: Internet and Web
Simple client-server model: a number of clients served by a single server, with performance determined by peak load. It doesn't scale well (e.g., the server crashes) when the number of clients suddenly increases -- a flash crowd. Hence the evolution from a single server, to blade servers, to server farms (data centers).

Slide 4: Internet and Web (cont.)
From the traditional web to web services (SOA): no longer simple file (web page) downloads; pages are often dynamically generated, with more complicated objects (e.g., the Flash videos used by YouTube). HTTP is used simply as a transfer protocol, with many other application protocols layered on top of it: web services and SOA (service-oriented architecture). A schematic modern web service has a front end (web rendering, request routing, aggregators) and a back end (database, storage, computing).

Slide 5: Data Center and Cloud Computing
A data center is a large server farm plus data warehouse, not simply for web services, and its managed infrastructure is expensive. From web hosting to cloud computing: individual web/content providers must provision for peak load, which is expensive and typically leaves resources under-utilized. In web hosting, a third party provides and owns the server-farm infrastructure and hosts web services for content providers; server consolidation is achieved via virtualization (VMM, guest OS, applications), with the guest under the client web service's control.

Slide 6: Cloud Computing
Cloud computing and cloud-based services go beyond web-based information access or delivery: computing, storage, and so on. NIST definition: "Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction." Models of cloud computing: Infrastructure as a Service (IaaS), e.g., Amazon EC2, Rackspace; Platform as a Service (PaaS), e.g., Microsoft Azure; Software as a Service (SaaS), e.g., Google.

Slide 7: Data Centers: Key Challenges
With thousands of servers within a data center: how do we write applications (services) for them? How do we allocate and manage resources, and in particular ensure performance, reliability, and availability? Scale and complexity bring other key challenges: with thousands of machines, failures are the default case!
Other challenges include load balancing and handling heterogeneity. Think of the data center (server cluster) as a computer: a super-computer vs. a cluster computer, i.e., a single super-high-performance, highly reliable machine vs. a computer built out of thousands of cheap and unreliable PCs. Pros and cons?

Slide 8: Case Studies
Google File System (GFS): a file system (or OS) for a cluster computer, an overlay on top of the native OS on individual machines, designed with certain (common) types of applications in mind and with failures as the default case. Google MapReduce (cf. Microsoft Dryad): a new programming paradigm for certain (common) types of applications, built on top of GFS. Other examples (optional): BigTable, a (semi-)structured database for efficient key-value queries, also built on top of GFS; Amazon Dynamo, a distributed storage system with high availability as a key design goal; Google's Chubby, Sawzall, etc.; open-source systems such as Hadoop.

Slide 9: Google Scale and Philosophy
Lots of data: copies of the web, satellite data, user data, email and USENET, a Subversion backing store. Workloads are large and easily parallelizable. No commercial system is big enough; Google couldn't afford one if there was, and it might not have made appropriate design choices anyway. But there are truckloads of low-cost machines: 450,000 machines (NYTimes estimate, June 14, 2006). Failures are the norm: even reliable systems fail at Google scale, so software must tolerate failures, and which machine an application is running on should not matter. Google are firm believers in the end-to-end argument, and care about performance per dollar, not absolute machine performance.

Slide 10: Typical Cluster at Google
(Figure: a cluster runs a cluster scheduling master, a lock service, and a GFS master; each machine runs Linux, a scheduler slave, a GFS chunkserver, and user tasks, with some machines also hosting BigTable servers and a BigTable master.)

Slide 11: Google System Building Blocks
Google File System (GFS): raw storage. (Cluster) scheduler: schedules jobs onto machines. Lock service: a distributed lock manager that can also reliably hold tiny files (hundreds of bytes) with high availability. Bigtable: a multi-dimensional database. MapReduce: simplified large-scale data processing.
Slide 12: Chubby: Distributed Lock Service
A {lock/file/name} service with coarse-grained locks; a lock can also store a small amount of data. Five replicas, with a majority vote needed to be active. Also an OSDI '06 paper.

Slide 13: Google File System: Key Design Considerations
Component failures are the norm: hardware component failures, software bugs, human errors, power supply issues, and so on. The solution is built-in mechanisms for monitoring, error detection, fault tolerance, and automatic recovery. Files are huge by traditional standards: multi-GB files are common, with billions of objects. Most writes (modifications or mutations) are appends, and there are two types of reads: a large number of streaming (i.e., sequential) reads and a small number of random reads. High concurrency (multiple producers/consumers on a file) calls for atomicity with minimal synchronization. Sustained bandwidth is more important than latency.

Slide 14: GFS Architectural Design
A GFS cluster: a single master plus multiple chunkservers per master, running on commodity Linux machines. A file is a sequence of fixed-size chunks (64 MB), labeled with 64-bit globally unique IDs and stored at chunkservers (as native Linux files on local disk); each chunk is replicated across (by default 3) chunkservers. The master maintains all metadata: namespace, access control, file-to-chunk mappings, garbage collection, chunk migration. Why only a single master (with read-only shadow masters)? It is simple, and it only answers chunk-location queries from clients. Chunkservers (slaves or workers) interact directly with clients and perform the actual reads and writes.

Slide 15: GFS Architecture: Illustration
GFS clients consult the master for metadata, typically asking for multiple chunk locations per request, and access data directly from chunkservers: separation of control and data flows.

Slide 16: Chunk Size and Metadata
Chunk size is 64 MB. This means fewer chunk-location requests to the master; a client can perform many operations on a chunk, reducing the overhead of accessing it, and can keep a persistent TCP connection to the chunkserver. It also means fewer metadata entries, so metadata can be kept in memory at the master; in-memory data structures allow fast periodic scanning. There are some potential problems with fragmentation. The metadata comprises the file and chunk namespaces (file names and chunk identifiers), the file-to-chunk mappings, and the locations of each chunk's replicas.
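To make the 64 MB chunk-size trade-off concrete, here is a small back-of-the-envelope sketch in Python. The roughly-64-bytes-of-metadata-per-chunk figure comes from the GFS paper, not from these slides, so treat it as an order-of-magnitude assumption.

    CHUNK_SIZE = 64 * 2**20          # 64 MB chunks
    BYTES_PER_CHUNK_META = 64        # order-of-magnitude figure reported in the GFS paper

    def chunks_needed(file_size_bytes):
        # A file is a sequence of fixed-size chunks; the last one may be partial.
        return -(-file_size_bytes // CHUNK_SIZE)   # ceiling division

    def master_metadata_bytes(file_sizes):
        # Rough in-memory footprint at the master for chunk metadata only
        # (ignores namespace entries and replica locations).
        return sum(chunks_needed(s) for s in file_sizes) * BYTES_PER_CHUNK_META

    # Example: one thousand 10 GB files -> 160 chunks each, ~10 MB of chunk metadata.
    files = [10 * 2**30] * 1000
    print(chunks_needed(10 * 2**30))       # 160
    print(master_metadata_bytes(files))    # 10240000 bytes, i.e. roughly 10 MB

This is why keeping all metadata in the master's memory is feasible: even a large file population costs the master only megabytes of chunk metadata.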
Slide 17: Chunk Locations and Logs
Chunk locations: the master does not keep a persistent record of chunk locations; it polls the chunkservers at startup and uses heartbeat messages to monitor them. Simplicity: because of chunkserver failures, it is hard to keep a persistent record of chunk locations accurate; in the on-demand approach vs. coordination trade-off, on-demand wins when changes (failures) are frequent. Operation log: maintains a historical record of critical metadata changes (namespace and mappings); for reliability and consistency, the operation log is replicated on multiple remote machines (shadow masters).

Slide 18: Clients and APIs
GFS is not transparent to clients: it requires clients to perform certain consistency verification (using the chunk ID and version number), make snapshots if needed, and so on. APIs: open, delete, read, write (as expected); append (at least once, possibly with gaps and/or inconsistencies among clients); snapshot (quickly create a copy of a file). Separation of data and control: a client issues control (metadata) requests to the master and data requests directly to chunkservers. It caches metadata but does no caching of data, so there are no consistency difficulties among clients; streaming reads (read once) and append writes (write once) don't benefit much from client caching anyway.

Slide 19: System Interaction: Read
The client sends the master: read(file name, chunk index). The master replies with the chunk ID, the chunk version number, and the locations of the replicas. The client sends the closest chunkserver holding a replica: read(chunk ID, byte range), where "closest" is determined by IP address on a simple rack-based network topology. The chunkserver replies with the data.

Slide 20: System Interactions: Write and Record Append
Write and record append have slightly different semantics: record append is atomic. The master grants a chunk lease to one chunkserver (the primary) and replies to the client. The client first pushes the data to all chunkservers; the data is pushed linearly, each replica forwarding as it receives, so the pipelined transfer can approach roughly 12.5 MB/s on a 100 Mbps network. The client then issues the write/append to the primary chunkserver, which determines the order of updates on all replicas. For record append, the primary checks whether the append would exceed the maximum chunk size; if so, it pads the chunk (and asks the secondaries to do the same) and asks the client to append to the next chunk.

Slide 21: Leases and Mutation Order
Leases have 60-second timeouts and can be extended indefinitely; extension requests are piggybacked on heartbeat messages, and after a timeout expires the master can grant new leases. Leases are used to maintain a consistent mutation order across replicas: the master grants the lease to one of the replicas (the primary), the primary picks a serial order for all mutations, and the other replicas follow the primary's order.
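A minimal single-process sketch of the control flow in Slides 20-21: data is first pushed to every replica, then the primary (the lease holder) picks the serial order and forwards it to the secondaries. The class and method names are illustrative only, not GFS's actual interfaces, and all of the real protocol's failure handling is omitted.

    class Chunkserver:
        def __init__(self, name):
            self.name = name
            self.buffered = {}       # data pushed by clients, keyed by a client-chosen id
            self.chunk = bytearray()
            self.last_serial = 0

        def push_data(self, data_id, data):
            # Step 1: data flows to every replica before any mutation is ordered.
            self.buffered[data_id] = data

        def apply(self, serial_no, data_id):
            # Replicas apply mutations in the serial order chosen by the primary.
            self.last_serial = serial_no
            self.chunk += self.buffered.pop(data_id)
            return len(self.chunk)

    class Primary(Chunkserver):
        def __init__(self, name, secondaries):
            super().__init__(name)
            self.secondaries = secondaries
            self.next_serial = 0

        def record_append(self, data_id):
            # Step 2: the primary assigns a serial number and forwards it,
            # so all replicas apply mutations in the same order.
            self.next_serial += 1
            offset = self.apply(self.next_serial, data_id)
            for s in self.secondaries:
                s.apply(self.next_serial, data_id)
            return offset            # offset at which the record ended up

    secondaries = [Chunkserver("cs2"), Chunkserver("cs3")]
    primary = Primary("cs1", secondaries)
    for replica in [primary, *secondaries]:
        replica.push_data("d1", b"record-1")     # client pushes data to all replicas
    print(primary.record_append("d1"))           # primary orders the mutation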
Slide 22: Consistency Model
Changes to the namespace (i.e., metadata) are atomic: they are handled by the single master server, which uses its log to define a global total order of namespace-changing operations. Data has relaxed consistency: concurrent changes are consistent but undefined; "defined" means that after a data mutation the file region is consistent and all clients see the entire mutation. An append is atomically committed at least once, with occasional duplication. All changes to a chunk are applied in the same order on all replicas, and version numbers are used to detect missed updates.

Slide 23: Master Namespace Management and Logs
The namespace covers files and their chunk metadata, maintained as flat names with no hard or symbolic links: a full-pathname-to-metadata mapping with prefix compression. Each node in the namespace has an associated read-write lock (giving a total global order, hence no deadlock), so concurrent operations can be properly serialized by this locking mechanism. Metadata updates are logged, and the logs are replicated on remote machines; the master takes global snapshots (checkpoints) to truncate the logs (checkpoints can be created while updates keep arriving). Recovery uses the latest checkpoint plus the subsequent log files.

Slide 24: Replica Placement
Goals: maximize data reliability and availability, and maximize network bandwidth. This requires spreading chunk replicas across machines and racks. Chunks with lower replication factors get higher re-replication priority, and only limited resources are spent on replication.

Slide 25: Other Operations
Locking: one lock per path, so a directory can be modified concurrently. To access /d1/d2/leaf, a client needs to lock /d1, /d1/d2, and /d1/d2/leaf; each thread acquires read locks on the directories and a write lock on the file, and totally ordered locking prevents deadlocks. Garbage collection is simpler than eager deletion (which has trouble with unfinished replicated creation and lost deletion messages): deleted files are hidden for three days and then garbage-collected, combined with other background operations (e.g., taking snapshots), providing a safety net against accidents.

Slide 26: Fault Tolerance and Diagnosis
Fast recovery: the master and chunkservers are designed to restore their state and start in seconds, regardless of termination conditions. Chunk replication. Data integrity: a chunk is divided into 64 KB blocks, each with its own checksum, verified at read and write time, with background scans for rarely used data. Master replication: shadow masters provide read-only access when the primary master is down.
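A small sketch of the block-level data-integrity check from Slide 26, using CRC32 as a stand-in checksum (an assumption; the slides do not say which checksum GFS uses). Every 64 KB block touched by a read is verified before data is returned.

    import zlib

    BLOCK_SIZE = 64 * 1024   # each chunk is divided into 64 KB blocks

    def checksum_blocks(chunk_bytes):
        # One checksum per 64 KB block, kept alongside the chunk.
        return [zlib.crc32(chunk_bytes[i:i + BLOCK_SIZE])
                for i in range(0, len(chunk_bytes), BLOCK_SIZE)]

    def verified_read(chunk_bytes, checksums, offset, length):
        # Verify every block that the requested byte range overlaps
        # before returning data to the client.
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        for b in range(first, last + 1):
            block = chunk_bytes[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]
            if zlib.crc32(block) != checksums[b]:
                raise IOError(f"corrupt block {b}: report to master, re-replicate")
        return chunk_bytes[offset:offset + length]

    chunk = bytes(256 * 1024)                 # a 256 KB chunk fragment (4 blocks)
    sums = checksum_blocks(chunk)
    data = verified_read(chunk, sums, offset=70_000, length=10_000)
    print(len(data))                          # 10000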
Slide 27: GFS Summary
GFS is a distributed file system that supports large-scale data-processing workloads on commodity hardware, and it sits at a different point in the design space: component failures are treated as the norm, and it is optimized for huge files. It has been a success, used actively by Google to support the search service and other applications, but its performance may not be good for all apps: it assumes a read-once, write-once workload (with no client data caching). GFS provides fault tolerance by replicating data (chunk replication) and by fast, automatic recovery. Its simple, centralized master does not become a bottleneck. Its semantics are not transparent to applications (the end-to-end principle?): applications must verify file contents to cope with inconsistent regions and repeated appends (at-least-once semantics).

Slide 28: Google MapReduce
The problem: many simple operations at Google (grep over data, computing indexes, computing summaries, etc.), but the input data is large, really large -- the whole web, billions of pages -- and Google has lots of machines (clusters of 10K and more). There are many computations over very large datasets, and the question is how to use a large number of machines efficiently. The computational model can be reduced to two steps: Map (take one operation, apply it to many, many data tuples) and Reduce (take the results and aggregate them). MapReduce is a generalized interface for massively parallel cluster processing.

Slide 29: MapReduce Programming Model
Intuitively just like functional languages (Scheme, Lisp, Haskell, etc.). Map is the initial parallel computation: map(in_key, in_value) -> list(out_key, intermediate_value); in: a set of key/value pairs; out: a set of intermediate key/value pairs (note that keys might change during Map). Reduce aggregates the intermediate values by key: reduce(out_key, list(intermediate_value)) -> list(out_value); it combines all intermediate values for a particular key and produces a set of merged output values (usually just one).

Slide 30: Example: Word Counting
Goal: count the number of occurrences of each word in many documents. Sample data: Page 1: "the weather is good"; Page 2: "today is good"; Page 3: "good weather is good". In MapReduce this looks as follows (a runnable sketch appears after Slide 35):

    map(String key, String value):
        // key: document name
        // value: document contents
        for each word w in value:
            EmitIntermediate(w, "1");

    reduce(String key, Iterator values):
        // key: a word
        // values: a list of counts
        int result = 0;
        for each v in values:
            result += ParseInt(v);
        Emit(AsString(result));

Slide 31: Map/Reduce in Action
map (over Page 1: "the weather is good", Page 2: "today is good", Page 3: "good weather is good"): Worker 1: (the 1), (weather 1), (is 1), (good 1); Worker 2: (today 1), (is 1), (good 1); Worker 3: (good 1), (weather 1), (is 1), (good 1).
feed: Worker 1: (the 1); Worker 2: (is 1), (is 1), (is 1); Worker 3: (weather 1), (weather 1); Worker 4: (today 1); Worker 5: (good 1), (good 1), (good 1), (good 1).
reduce: Worker 1: (the 1); Worker 2: (is 3); Worker 3: (weather 2); Worker 4: (today 1); Worker 5: (good 4).

Slide 32: Illustration
(Figure.)

Slide 33: More Examples
Distributed grep: Map emits a line if it matches a given pattern P; Reduce is the identity function, just copying the result to the output. Count of URL access frequency: Map parses URL access logs and outputs (URL, 1); Reduce adds together the counts for the same URL and outputs (URL, total count). Reverse web-link graph (who links to this page?): Map goes through all source pages and emits (target, source) links; Reduce concatenates all source links for each target. Many more examples in the paper [MapReduce, OSDI '04].

Slide 34: MapReduce Architecture
A single master node and many worker bees.

Slide 35: MapReduce Operation
The input data is split into 64 MB blocks; map workers compute over them and store the results locally; the master is informed of the result locations and passes them to the reduce workers; the final output is written.
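The word-count example of Slides 30-31, rewritten as a self-contained Python sketch that runs the map, shuffle, and reduce phases in a single process. It illustrates only the programming model; none of the distribution, 64 MB input splits, or fault tolerance described above is modeled.

    from collections import defaultdict

    def map_fn(doc_name, contents):
        # map(in_key, in_value) -> list of (out_key, intermediate_value)
        return [(word, 1) for word in contents.split()]

    def reduce_fn(word, counts):
        # reduce(out_key, list of intermediate values) -> output value
        return sum(counts)

    def run_mapreduce(inputs, map_fn, reduce_fn):
        # Map phase, with an in-memory "shuffle" that groups values by key.
        intermediate = defaultdict(list)
        for key, value in inputs:
            for out_key, out_value in map_fn(key, value):
                intermediate[out_key].append(out_value)
        # Reduce phase.
        return {key: reduce_fn(key, values) for key, values in intermediate.items()}

    pages = [("Page 1", "the weather is good"),
             ("Page 2", "today is good"),
             ("Page 3", "good weather is good")]
    print(run_mapreduce(pages, map_fn, reduce_fn))
    # {'the': 1, 'weather': 2, 'is': 3, 'good': 4, 'today': 1}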
Slide 36: What if Workers Die?
And you know they will. The master periodically pings workers: still alive and working? Good. If a corpse is found, its task is allocated to the next idle worker (ruthless!). If a Map worker dies, all of its data needs to be recomputed -- why? (Its intermediate output lives only on that worker's local disk.) If a corpse comes back to life (zombies!), give it a task and a clean slate. What if the master dies? There is only one master; if it dies, the whole computation stops -- a fairly rare occurrence.

Slides 37-38: What if You Find Stragglers?
Some workers can be slower than others: faulty hardware, software misconfiguration or bugs, whatever. Near the completion of the job, the master looks at the stragglers and their tasks and assigns backup workers to also compute those tasks. Whoever finishes first wins, and the stragglers can be left behind.

Slide 39: Optional Materials: Google BigTable

Slide 40: Google Bigtable
A distributed multi-level map with an interesting data model; fault-tolerant and persistent. Scalable: thousands of servers, terabytes of in-memory data, a petabyte of disk-based data, millions of reads/writes per second, and efficient scans. Self-managing: servers can be added and removed dynamically, and servers adjust to load imbalance. Key points: the data model and the implementation structure (tablets, SSTables, compactions, locality groups), plus the API and details such as shared logs, compression, and replication.

Slide 41: Basic Data Model
A distributed, multi-dimensional, sparse map: (row, column, timestamp) -> cell contents. A good match for most of Google's applications. (Figure: a cell addressed by row www.cnn.com, a "contents" column, and timestamps t1, t2, t3.)

Slide 42: Rows
A row key is an arbitrary string, typically 10-100 bytes and up to 64 KB. Every read or write of data under a single row key is atomic. Data is maintained in lexicographic order by row key, and the row range of a table is dynamically partitioned; each partition (row range) is called a tablet, the unit of distribution and load balancing. The objective is to make read operations single-sited. E.g., in Webtable, pages in the same domain are grouped together by reversing the hostname components of the URLs: com.google.maps instead of maps.google.com.

Slide 43: Columns
Columns have a two-level name structure, family:optional_qualifier, e.g., Language:English. A column family must be created before data can be stored under a column key; it is the unit of access control and has associated type information. The qualifier gives unbounded columns: an additional level of indexing, if desired. (Figure: an example row for cnn.com with a contents: column and anchor columns such as anchor:cnnsi.com and anchor:stanford.edu, whose cell values are anchor text like "CNN" and "CNN homepage".)
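A toy rendering of the (row, column, timestamp) -> value model of Slides 41-43, with "family:qualifier" column names and newest-first versions. It illustrates the data model only; it is not Bigtable's API.

    from collections import defaultdict
    import time

    class Table:
        """Toy (row, column, timestamp) -> value map."""
        def __init__(self):
            # row key -> column ("family:qualifier") -> list of (timestamp, value)
            self.rows = defaultdict(lambda: defaultdict(list))

        def put(self, row, column, value, ts=None):
            ts = ts if ts is not None else time.time()
            cell = self.rows[row][column]
            cell.append((ts, value))
            cell.sort(reverse=True)          # keep the newest version first

        def get(self, row, column, n_versions=1):
            # Return the n most recent versions, newest first.
            return self.rows[row][column][:n_versions]

    t = Table()
    t.put("com.cnn.www", "contents:", "<html>...v1", ts=1)
    t.put("com.cnn.www", "contents:", "<html>...v2", ts=2)
    t.put("com.cnn.www", "anchor:cnnsi.com", "CNN", ts=1)
    print(t.get("com.cnn.www", "contents:"))                  # newest version only
    print(t.get("com.cnn.www", "contents:", n_versions=2))    # both versions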
Slide 44: Locality Groups
Column families can be assigned to a locality group, used to organize the underlying storage representation for performance: the data in a locality group can be mapped into memory and is stored in its own SSTable, avoiding mingling data such as page contents and page metadata. Locality groups can be compressed, and Bloom filters on the SSTables in a locality group avoid searching an SSTable if the bit is not set. Tablet movement: a major compaction (with concurrent updates), then a minor compaction (to catch up with updates) without any concurrent updates, then load on the new server without requiring any recovery action.

Slide 45: Timestamps (64-bit integers)
Used to store different versions of data in a cell. New writes default to the current time, but timestamps can also be set explicitly by clients. They are assigned either by Bigtable (real time in microseconds) or by the client application (when unique timestamps are a necessity). Items in a cell are stored in decreasing timestamp order. The application specifies how many versions (n) of a data item are maintained in a cell, and Bigtable garbage-collects obsolete versions. Lookup options: return the most recent K values, or return all values in a timestamp range (or all values). Column families can be marked with attributes: only retain the most recent K values in a cell, or keep values until they are older than K seconds.

Slide 46: Tablets
Large tables are broken into tablets at row boundaries; a tablet holds a contiguous range of rows, and clients can often choose row keys to achieve locality. The aim is roughly 100-200 MB of data per tablet, with a serving machine responsible for about 100 tablets. This gives fast recovery (100 machines each pick up one tablet from a failed machine) and fine-grained load balancing (migrate tablets away from an overloaded machine, with the master making the load-balancing decisions).

Slide 47: Tablets and Splitting
(Figure: tablets over rows such as aaa.com, cnn.com, cnn.com/sports.html, Website.com, and Zuppa.com/menu.html, with "contents" and "language" columns, e.g. language = EN.)

Slide 48: Tablets and Splitting (cont.)
(Same figure, now also showing rows Yahoo.com/kids.html and Yahoo.com/kids.html?D after a split.)

Slide 49: Table, Tablet, and SSTable
Multiple tablets make up a table. SSTables can be shared; tablets do not overlap, but SSTables can. (Figure: one tablet covering rows aardvark to apple and another covering apple_two_E to boat, sharing an SSTable.)

Slide 50: Tablet Representation
An SSTable is an immutable, on-disk ordered map from string to string; the string keys are (row, column, timestamp) triples. A tablet combines a random-access write buffer in memory, an append-only log on GFS, and SSTables on GFS (memory-mapped); writes go to the log and the write buffer, while reads see both.

Slide 51: Tablet
A tablet contains some range of rows of the table and is built out of multiple SSTables. (Figure: a tablet with start key "aardvark" and end key "apple", built from SSTables, each consisting of an index plus 64 KB blocks.)

Slide 52: SSTable
An immutable, sorted file of key-value pairs: chunks of data plus an index. The index is of block ranges, not values. An SSTable is stored in GFS. (Figure: an SSTable as 64 KB blocks plus an index.)

Slide 53: Immutability
SSTables are immutable, which simplifies caching and sharing across GFS, and no concurrency control is needed. The SSTables of a tablet are recorded in the METADATA table, and garbage collection of SSTables is done by the master. On a tablet split, the split tablets can start off quickly on shared SSTables, splitting them lazily. Only the memtable sees reads and updates concurrently; copy-on-write rows allow concurrent read/write.
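A toy, in-memory version of the SSTable idea from Slides 50-52: an immutable, sorted set of key-value pairs broken into blocks, with an index over block start keys so a lookup touches only one block. The block size here is two entries rather than 64 KB, purely for illustration.

    import bisect

    class SSTable:
        """Toy in-memory SSTable: immutable, sorted, indexed by block start keys."""
        BLOCK_ENTRIES = 2   # stand-in for the 64 KB blocks of Slide 52

        def __init__(self, items):
            items = sorted(items)                       # sorted key/value pairs
            self.blocks = [items[i:i + self.BLOCK_ENTRIES]
                           for i in range(0, len(items), self.BLOCK_ENTRIES)]
            # The index holds block ranges (the first key of each block), not values.
            self.index = [block[0][0] for block in self.blocks]

        def get(self, key):
            i = bisect.bisect_right(self.index, key) - 1   # block that could hold key
            if i < 0:
                return None
            for k, v in self.blocks[i]:                    # scan only that block
                if k == key:
                    return v
            return None

    sst = SSTable([("apple", 1), ("aardvark", 2), ("boat", 3), ("banana", 4), ("cat", 5)])
    print(sst.get("banana"))   # 4
    print(sst.get("zebra"))    # None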
Slide 54: Overall Architecture
A tablet server is a GFS chunkserver and a CMS (cluster management system) client; the master server is a GFS master server and a CMS client. The Chubby client handles master election, stores the bootstrap location of the Bigtable data, discovers tablet servers, and stores access control lists. The Bigtable master assigns tablets, detects the addition and expiration of tablet servers, balances tablet-server load, and handles schema changes. The CMS server schedules jobs, manages resources on the cluster, and deals with machine failures.

Slide 55: Tablet Assignment
The master keeps track of the set of live tablet servers and the current assignment of tablets to tablet servers, including which tablets are unassigned. (Figure, a sequence among the cluster manager, a tablet server, Chubby, and the master: 1) start a server; 2) create a lock; 3) acquire the lock; 4) monitor; 5) assign tablets; 6) check lock status; 8) acquire and delete the lock; 9) reassign unassigned tablets.)

Slide 56: Placement of Tablets
A tablet is assigned to one tablet server at a time. The master maintains the set of live tablet servers and the current assignment of tablets to tablet servers (including the unassigned ones). Chubby keeps track of tablet servers: a tablet server creates and acquires an exclusive lock on a uniquely named file in a specific Chubby directory (the server directory); the master monitors this directory to discover tablet servers; and a tablet server stops processing requests if it loses its exclusive lock (e.g., due to network partitioning). A tablet server will keep trying to re-acquire the exclusive lock on its uniquely named file as long as the file exists; if the file no longer exists, the tablet server kills itself and goes back to a free pool to be assigned tablets by the master.

Slide 57: How a Client Locates Tablets
A 3-level hierarchical lookup scheme for tablets; a location is the ip:port of the relevant server. 1st level: bootstrapped from the lock service, pointing to the owner of META0. 2nd level: uses the META0 data to find the owner of the appropriate META1 tablet. 3rd level: the META1 table holds the locations of the tablets of all other tables; the META1 table itself can be split into multiple tablets.

Slide 58: Editing a Table
Mutations are logged, then applied to an in-memory memtable, which may contain deletion entries to handle updates. Group commit on the log: collect multiple updates before a log flush. (Figure: inserts and deletes flow through the tablet log on GFS into the memtable in memory, with SSTables on GFS underneath, for a tablet covering apple_two_E to boat.)

Slide 59: Client Write and Read Operations
When a write operation arrives at a tablet server: the server checks that the client has sufficient privileges for the write (via Chubby), a log record is appended to the commit log file, and once the write commits its contents are inserted into the memtable. Once the memtable reaches a threshold, it is frozen, a new memtable is created, and the frozen memtable is converted to an SSTable and written to GFS. When a read operation arrives at a tablet server: the server checks that the client has sufficient privileges for the read (via Chubby), and the read is performed on a merged view of (a) the SSTables that constitute the tablet and (b) the memtable.

Slide 60: Compaction
(Figure: read and write operations against the memtable, the tablet log, a frozen memtable, and SSTable files V1.0-V6.0 in the distributed file system.) Minor compaction: the memtable is converted into a new SSTable. Merging compaction: the memtable plus a few SSTables are merged into a new SSTable; done periodically, with deleted data still alive. Major compaction: the memtable plus all SSTables are merged into one SSTable; deleted data is removed and the storage can be reused.
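A toy sketch of the write/read path and minor compaction from Slides 58-60: writes go to a log and a memtable, the memtable is frozen and flushed to an immutable SSTable when it grows too large, and reads see a merged view. The threshold and data structures are simplified stand-ins, not Bigtable's actual formats.

    class Tablet:
        """Toy tablet: commit log + memtable, flushed to immutable SSTables."""
        MEMTABLE_LIMIT = 3          # flush threshold (entries, not bytes, for brevity)

        def __init__(self):
            self.log = []           # stands in for the commit log on GFS
            self.memtable = {}
            self.sstables = []      # newest last; each is an immutable dict here

        def write(self, key, value):
            self.log.append((key, value))        # 1) append to the commit log
            self.memtable[key] = value           # 2) insert into the memtable
            if len(self.memtable) >= self.MEMTABLE_LIMIT:
                self.minor_compaction()

        def minor_compaction(self):
            # Freeze the memtable and convert it into a new (immutable) SSTable.
            self.sstables.append(dict(self.memtable))
            self.memtable = {}

        def read(self, key):
            # Reads see a merged view: memtable first, then SSTables newest-first.
            if key in self.memtable:
                return self.memtable[key]
            for sst in reversed(self.sstables):
                if key in sst:
                    return sst[key]
            return None

    t = Tablet()
    for i in range(5):
        t.write(f"row{i}", f"v{i}")
    print(len(t.sstables), t.read("row1"), t.read("row4"))   # 1 v1 v4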
Slide 61: System Structure (Again)
The cluster scheduling master handles failover and monitoring; GFS holds tablet data and logs; the lock service holds metadata and handles master election. A Bigtable cell consists of many Bigtable tablet servers (which serve data) and a Bigtable master (which performs metadata operations and load balancing); Bigtable clients go through the Bigtable client library (Open(), etc.).

Slide 62: Master's Tasks
Use Chubby to monitor the health of tablet servers and restart failed servers. A tablet server registers itself by getting a lock in a specific Chubby directory; Chubby gives a lease on the lock, which must be renewed periodically, and the server loses the lock if it gets disconnected. The master monitors this directory to find which servers exist and are alive; if a server is not contactable or has lost its lock, the master grabs the lock and reassigns its tablets. GFS replicates the data, and the preference is to start a tablet server on the same machine where the data already is.

Slide 63: Master's Tasks (cont.)
When a (new) master starts, it grabs the master lock on Chubby (ensuring only one master at a time), finds the live servers (by scanning the Chubby directory), communicates with the servers to find their assigned tablets, and scans the metadata table to find all tablets. It keeps track of unassigned tablets and assigns them; the metadata root tablet comes from Chubby, and the other metadata tablets are assigned before scanning.

Slide 64: Shared Logs
Bigtable is designed for 1M tablets and thousands of tablet servers, but 1M logs being written simultaneously performs badly. The solution is shared logs: write one log file per tablet server instead of per tablet, so updates for many tablets are co-mingled in the same file, and start a new log chunk every so often (64 MB). The problem: during recovery, a server needs to read the log data to apply the mutations for a tablet, and there is a lot of wasted I/O if many machines each need to read the data for many tablets from the same log chunk.

Slide 65: Shared Log Recovery
During recovery, servers inform the master of the log chunks they need to read; the master aggregates these requests and orchestrates sorting of the needed chunks, assigning log chunks to different tablet servers for sorting. Those servers sort the chunks by tablet and write the sorted data to local disk; other tablet servers ask the master which servers have the sorted chunks they need, and then issue direct RPCs to those peer tablet servers to read the sorted data for their tablets. (A small sketch of the per-tablet grouping step appears after Slide 67.)

Slide 66: Application 1: Google Analytics
Enables webmasters to analyze the traffic pattern at their web sites: statistics such as the number of unique visitors per day, the page views per URL per day, or the percentage of users that made a purchase given that they earlier viewed a specific page. How? A small JavaScript program that the webmaster embeds in their web pages; every time a page is visited, the program is executed and records the following information about each request: a user identifier and the page being fetched.

Slide 67: Application 2: Personalized Search
Records user queries and clicks across Google properties; users can browse their search histories and request personalized search results based on their historical usage patterns. One Bigtable: the row name is the userid, and a column family is reserved for each action type (e.g., web queries, clicks). User profiles are generated using MapReduce, and these profiles personalize live search results. The table is replicated geographically to reduce latency and increase availability.
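A small sketch of the per-tablet grouping step in shared-log recovery (Slides 64-65): entries from one shared log chunk, co-mingled across tablets, are grouped by tablet and put back into sequence order so that each recovering server can fetch only the tablets assigned to it. The entry format here is an assumption for illustration.

    from collections import defaultdict

    def sort_log_chunk(entries):
        """Group a shared log chunk's co-mingled entries by tablet.

        entries: list of (tablet_id, sequence_no, mutation) tuples, interleaved
        across many tablets because each tablet server writes one shared log.
        Returns tablet_id -> mutations in sequence order.
        """
        by_tablet = defaultdict(list)
        for tablet_id, seq, mutation in entries:
            by_tablet[tablet_id].append((seq, mutation))
        return {t: [m for _, m in sorted(muts)] for t, muts in by_tablet.items()}

    chunk = [("tabletB", 2, "put y=2"), ("tabletA", 1, "put x=1"),
             ("tabletA", 3, "del x"),   ("tabletB", 1, "put y=1")]
    sorted_chunk = sort_log_chunk(chunk)
    print(sorted_chunk["tabletA"])   # ['put x=1', 'del x']
    print(sorted_chunk["tabletB"])   # ['put y=1', 'put y=2']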
Slide 68: Recap: Google Scale and Philosophy
Lots of data: copies of the web, satellite data, user data, email and USENET, a Subversion backing store. Workloads are large and easily parallelizable. No commercial system is big enough; Google couldn't afford one if there was, and it might not have made appropriate design choices anyway. But there are truckloads of low-cost machines: 450,000 machines (NYTimes estimate, June 14, 2006). Failures are the norm: even reliable systems fail at Google scale, so software must tolerate failures, and which machine an application is running on should not matter. Firm believers in the end-to-end argument; care about performance per dollar, not absolute machine performance.
