
Google File System

Lalit Kumar, M.Tech Final Year

Computer Science & Engineering Dept., KEC Dwarahat, Almora


Overview

- Introduction to GFS
- Architecture
- Data Flow
- System Interactions
- Master Operations
- Metadata Management
- Garbage Collection
- Fault Tolerance
- Latest Advancements
- Drawbacks
- Conclusion
- References


Introduction

- More than 15,000 commodity-class PCs.
- Multiple clusters distributed worldwide.
- Thousands of queries served per second.
- One query reads hundreds of MB of data.
- One query consumes tens of billions of CPU cycles.
- Google stores dozens of copies of the entire Web!

Conclusion: Google needs a large, distributed, highly fault-tolerant file system.


Architecture

A GFS cluster consists of a single master and multiple chunkservers, and is accessed by multiple clients.

Figure 1: GFS Architecture. Source: Howard Gobioff, “The GFS”, presented at SOSP 2003.


Master
- Manages namespace/metadata.
- Manages chunk creation, replication, and placement.
- Performs the snapshot operation to create a duplicate of a file or directory tree.
- Performs checkpointing and logging of changes to metadata.

Chunkservers
- On startup or failure recovery, report their chunks to the master.
- Periodically report a subset of their chunks to the master (to detect no-longer-needed chunks).

Metadata
- Types of metadata: file and chunk namespaces, mapping from files to chunks, and the location of each chunk's replicas.
- Kept easy and efficient for the master to scan periodically.
- Periodic scanning is used to implement chunk garbage collection, re-replication, and chunk migration.


Data Flow

- Data is pushed linearly along a carefully picked chain of chunkservers in a pipelined fashion over TCP.
- Once a chunkserver receives some data, it starts forwarding immediately to the next chunkserver.
- Each machine forwards the data to the closest machine in the network topology that has not yet received it.

Figure 2: Data flow between chunkservers. Source: http://research.google.com/archive/gfs-sosp2003.pdf
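A minimal sketch of this chain construction, with hypothetical distance and send callbacks: the client hands the data to its nearest replica, and each hop forwards to the nearest chunkserver that has not yet received it.

```python
# A sketch of the pipelined data push (all names are hypothetical).
# The chain is built greedily: each hop hands the data to the closest
# chunkserver that has not received it yet. In the real system every
# server streams bytes onward over TCP while still receiving them.

def build_forwarding_chain(client, replicas, distance):
    """Order the replicas so each hop forwards to its nearest neighbor."""
    chain, current, remaining = [], client, set(replicas)
    while remaining:
        nxt = min(remaining, key=lambda server: distance(current, server))
        chain.append(nxt)
        remaining.remove(nxt)
        current = nxt
    return chain

def pipelined_push(client, data, replicas, distance, send):
    chain = build_forwarding_chain(client, replicas, distance)
    # Each (src, dst) pair is one hop; forwarding starts as soon as the
    # first bytes arrive, so latency grows with hops, not with copies.
    for src, dst in zip([client] + chain[:-1], chain):
        send(src, dst, data)
    return chain
```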


System Interactions

Read Algorithm

1. The application originates the read request.
2. The GFS client translates the request from (filename, byte range) to (filename, chunk index) and sends it to the master.
3. The master responds with the chunk handle and replica locations (i.e., the chunkservers where the replicas are stored).
4. The client picks a location and sends the (chunk handle, byte range) request to that location.
5. The chunkserver sends the requested data to the client.
6. The client forwards the data to the application.

Figure 3: Block diagram for the read operation. Source: Howard Gobioff, “The GFS”, presented at SOSP 2003.
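The read path above can be sketched as follows; the master.lookup and chunkserver.read calls are hypothetical stand-ins for the client library's RPCs (the real client also caches handle/location pairs).

```python
# A minimal sketch of the six read steps (hypothetical RPC names).

CHUNK_SIZE = 64 * 1024 * 1024  # GFS uses a fixed 64 MB chunk size

def gfs_read(master, filename, offset, length):
    # Step 2: translate (filename, byte range) -> (filename, chunk index)
    chunk_index = offset // CHUNK_SIZE
    # Step 3: master returns the chunk handle and replica locations
    handle, replicas = master.lookup(filename, chunk_index)
    # Step 4: pick one replica (e.g. the closest) and request the bytes;
    # this sketch assumes the range stays within a single chunk
    chunkserver = replicas[0]
    start = offset % CHUNK_SIZE
    # Steps 5-6: chunkserver returns the data, which goes to the application
    return chunkserver.read(handle, (start, start + length))
```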


Write Algorithm

1. The application originates the write request.
2. The GFS client translates the request from (filename, data) to (filename, chunk index) and sends it to the master.
3. The master responds with the chunk handle and the (primary + secondary) replica locations.
4. The client pushes the write data to all locations; the data is stored in each chunkserver's internal buffer.
5. The client sends the write command to the primary.
6. The primary determines the serial order for the data instances stored in its buffer and writes the instances in that order to the chunk.
7. The primary sends the serial order to the secondaries and tells them to perform the write.
8. The secondaries respond to the primary, and the primary responds back to the client.

Figure 4: Block diagram for the write operation. Source: Howard Gobioff, “The GFS”, presented at SOSP 2003.
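A matching sketch of the write path, again with hypothetical RPC names. The essential point: data flows to all replicas first (in any order), and only then does the primary impose a single serial order that every secondary applies.

```python
# A minimal sketch of the eight write steps (hypothetical RPC names).

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks

def gfs_write(master, filename, offset, data):
    chunk_index = offset // CHUNK_SIZE        # step 2: translate request
    # Step 3: master returns the handle plus primary/secondary replicas
    handle, primary, secondaries = master.lookup_for_write(filename, chunk_index)
    # Step 4: push data to every replica's internal buffer
    for server in [primary] + secondaries:
        server.buffer_data(handle, data)
    # Steps 5-6: primary picks a serial order and applies the mutation
    serial = primary.apply_buffered(handle, offset)
    # Step 7: secondaries replay the mutation in that same serial order
    acks = [s.apply_in_order(handle, serial) for s in secondaries]
    # Step 8: primary reports success only if every secondary acked
    return all(acks)
```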


Master Operations

1. Namespace Management and Locking

- GFS maps each full pathname to its metadata in a table.
- Each master operation acquires a set of locks.
- The locking scheme allows concurrent mutations in the same directory.
- Locks are acquired in a consistent total order to prevent deadlock.

2. Replica Placement

3. Chunk Creation

4. Re-Replication

5. Balancing


1. Namespace Management & Locking

- Each master operation acquires a set of locks before it runs.
- To operate on /dir1/dir2/dir3/leaf, it first needs the following locks:
  - Read-lock on /dir1
  - Read-lock on /dir1/dir2
  - Read-lock on /dir1/dir2/dir3
  - Read-lock or write-lock on /dir1/dir2/dir3/leaf
- File creation does not require a write-lock on the parent directory; a read-lock on its name is sufficient to protect the parent directory from being deleted, renamed, or snapshotted.
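A minimal sketch of computing that lock set (the helper is hypothetical): read-locks on every ancestor directory plus a read- or write-lock on the leaf, acquired in a consistent total order to avoid deadlock.

```python
# Compute the lock set for a master operation on a full pathname.

def locks_for(path, leaf_write):
    """Return (ancestor read-locks, leaf lock) for an operation on `path`."""
    parts = path.strip("/").split("/")
    ancestors = ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
    leaf_lock = ("write" if leaf_write else "read", path)
    return ancestors, leaf_lock

# Example: mutating /dir1/dir2/dir3/leaf read-locks /dir1, /dir1/dir2,
# and /dir1/dir2/dir3, and takes a write-lock on the leaf itself.
# Acquiring locks in a consistent total order (e.g. by depth, then
# lexicographically) is what prevents deadlock between operations.
reads, leaf = locks_for("/dir1/dir2/dir3/leaf", leaf_write=True)
```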


3. Chunk Creation

The master considers several factors:
- Place new replicas on chunkservers with below-average disk space utilization.
- Limit the number of “recent” creations on each chunkserver.
- Spread replicas of a chunk across racks.
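A placement sketch combining these three factors; the server fields and the recent-creation limit are assumptions, not the real heuristic's parameters.

```python
# Pick chunkservers for a new chunk's replicas (fields hypothetical).

def place_new_replicas(servers, n_replicas, recent_limit):
    # Skip servers that created too many chunks recently (hot-spot guard),
    # and prefer the emptiest disks (below-average utilization first).
    candidates = sorted(
        (s for s in servers if s.recent_creations < recent_limit),
        key=lambda s: s.disk_utilization,
    )
    chosen, racks_used = [], set()
    # First pass: one replica per rack, to survive a whole-rack failure.
    for s in candidates:
        if len(chosen) < n_replicas and s.rack not in racks_used:
            chosen.append(s)
            racks_used.add(s.rack)
    # Second pass: if there are fewer racks than replicas, fill the rest.
    for s in candidates:
        if len(chosen) < n_replicas and s not in chosen:
            chosen.append(s)
    return chosen
```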


4. Re-Replication

The master re-replicates a chunk as soon as the number of available replicas falls below a user-specified goal:

- When a chunkserver becomes unavailable.
- When a chunkserver reports a corrupted chunk.
- When the replication goal is increased.

Re-replication placement follows the same policy as chunk creation.
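A minimal sketch of the trigger (the structures are hypothetical): count the live, uncorrupted replicas of each chunk, and queue any chunk below its goal, most-deficient first.

```python
# Find chunks needing re-replication (chunk/replica fields hypothetical).

def find_chunks_to_rereplicate(chunks, live_servers):
    work = []
    for chunk in chunks:
        live = [r for r in chunk.replicas
                if r.server in live_servers and not r.corrupted]
        deficit = chunk.replication_goal - len(live)
        if deficit > 0:
            # Chunks furthest below their goal are the most urgent.
            work.append((deficit, chunk))
    work.sort(reverse=True, key=lambda pair: pair[0])
    return [chunk for _, chunk in work]
```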


5. Balancing

- The master rebalances replicas periodically for better disk space usage and load balancing.
- The master gradually fills up a new chunkserver rather than instantly swamping it with new chunks (and the heavy write traffic that comes with them!).
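One way to picture the gradual fill: cap the number of chunks moved per rebalancing round. The cap and the utilization-gap threshold below are purely assumptions; GFS does not publish its throttle.

```python
# One rebalancing round, throttled so a new (empty) chunkserver fills
# up gradually instead of absorbing a burst of traffic at once.

MOVES_PER_ROUND = 10   # assumed throttle, not a published GFS value
MIN_GAP = 0.05         # assumed utilization gap worth correcting

def rebalance_round(servers):
    # Assumes copy_chunk/delete_replica update disk_utilization as they run.
    ordered = sorted(servers, key=lambda s: s.disk_utilization)
    emptiest, fullest = ordered[0], ordered[-1]
    moved = 0
    while (moved < MOVES_PER_ROUND
           and emptiest.disk_utilization + MIN_GAP < fullest.disk_utilization):
        chunk = fullest.pick_movable_chunk()
        emptiest.copy_chunk(chunk)      # copy first...
        fullest.delete_replica(chunk)   # ...then drop the old replica
        moved += 1
```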


Metadata Management

The master stores three major types of metadata:
- File and chunk namespaces
- Mapping from files to chunks
- Locations of each chunk’s replicas

All metadata is kept in the master’s memory.

Figure 5: Logical structure of metadata. Source: Naushad UzZaman, “Survey on Google File System”, CSC 456, 2007.
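The three tables can be pictured as below (field names hypothetical). Note that replica locations are not persisted: the master re-learns them from chunkserver reports at startup, while the namespaces and file-to-chunk mapping are also checkpointed and logged.

```python
# A sketch of the master's in-memory metadata (all names hypothetical).

master_metadata = {
    # 1. File and chunk namespaces (full pathname -> file metadata)
    "namespace": {"/dir1/fileA": {"hidden": False}},
    # 2. Mapping from files to chunks (pathname -> ordered chunk handles)
    "file_chunks": {"/dir1/fileA": ["handle-001", "handle-002"]},
    # 3. Locations of each chunk's replicas (handle -> chunkservers);
    #    rebuilt from chunkserver reports, never written to the log
    "chunk_locations": {"handle-001": ["cs-17", "cs-42", "cs-88"]},
}
```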


Garbage Collection

- Storage is reclaimed lazily by garbage collection.
- A deleted file is first renamed to a hidden name.
- Hidden files are removed once they are more than three days old.
- When a hidden file is removed, its in-memory metadata is removed as well.
- The master regularly scans the chunk namespace, identifying orphaned chunks; these are removed.
- Chunkservers periodically report the chunks they hold, and the master replies with the identity of all chunks that are no longer present in the master’s metadata. The chunkserver is then free to delete its replicas of such chunks.
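A minimal sketch of that report/reply handshake (names hypothetical): the chunkserver reports what it stores, the master answers with the handles it no longer knows, and the chunkserver deletes those replicas.

```python
# The lazy garbage-collection handshake (structures hypothetical).

HIDDEN_RETENTION_DAYS = 3  # hidden files older than this are reclaimed
                           # by the master's regular namespace scan

def master_handle_report(master_metadata, reported_handles):
    """Return the reported chunks that are orphaned in the master's view."""
    known = master_metadata["chunk_locations"]
    return [h for h in reported_handles if h not in known]

def chunkserver_gc(chunkserver, master_metadata):
    # Report stored chunks; delete whatever the master no longer tracks.
    orphans = master_handle_report(master_metadata,
                                   chunkserver.stored_handles())
    for handle in orphans:
        chunkserver.delete_replica(handle)  # storage reclaimed lazily
```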


Fault Tolerance

High availability:
- Fast recovery.
- Chunk replication.
- Master replication.

Data integrity:
- Each chunkserver uses checksumming.
- Chunks are broken into 64 KB blocks, each with its own checksum.
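A sketch of block-level checksumming; the choice of CRC32 is an assumption, since the paper specifies only that each 64 KB block carries its own checksum.

```python
# Chunkserver-side integrity checks over 64 KB blocks (CRC32 assumed).

import zlib

BLOCK_SIZE = 64 * 1024  # 64 KB checksum blocks within a 64 MB chunk

def checksum_blocks(chunk_bytes):
    """Compute one checksum per 64 KB block of a chunk."""
    return [zlib.crc32(chunk_bytes[i:i + BLOCK_SIZE])
            for i in range(0, len(chunk_bytes), BLOCK_SIZE)]

def verify_read(chunk_bytes, stored_checksums, block_index):
    """Verify a block before returning it; a mismatch marks the replica
    corrupt, which in turn triggers re-replication from a good copy."""
    block = chunk_bytes[block_index * BLOCK_SIZE:
                        (block_index + 1) * BLOCK_SIZE]
    return zlib.crc32(block) == stored_checksums[block_index]
```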


Latest Advancements

1. Gmail: an easily configurable email service with 15 GB of storage.

2. Blogger: a free web-based service that helps consumers publish on the web without writing code or installing software.

3. Google “next-generation corporate software”: a smaller version of the Google software, modified for private use.


Drawbacks

- Small files have only a few chunks, sometimes just one. The chunkservers storing these files can become hot spots under many client requests.
- Internal fragmentation.
- With many such small files, master involvement increases and can become a potential bottleneck.
- Having a single master node can become an issue.
- Master memory is a limitation.
- Performance might degrade when there are many writers and random writes.
- No reasoning is provided for the choice of the standard chunk size (64 MB).


Conclusion

GFS meets Google’s storage requirements:

- Incremental growth.
- Regular checks for component failures.
- Data optimization from special operations.
- Simple architecture.
- Fault tolerance.


References

[1] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Google File System. ACM SIGOPS Operating Systems Review, 37(5), 2003.
[2] Sean Quinlan and Kirk McKusick. GFS: Evolution on Fast-Forward. Communications of the ACM, 53(3), 2010.
[3] Thomas Anderson, Michael Dahlin, Jeanna Neefe, David Patterson, Drew Roselli, and Randolph Wang. Serverless network file systems. In Proceedings of the 15th ACM Symposium on Operating System Principles, pages 109-126, Copper Mountain Resort, Colorado, December 1995.
[4] Luis-Felipe Cabrera and Darrell D. E. Long. Swift: Using distributed disk striping to provide high I/O data rates. Computer Systems, 4(4):405-436, 1991.
[5] InterMezzo. http://www.inter-mezzo.org, 2003.


Thank You….