Bricks and Translators: The distributed file system made by Red Hat

Dr. Udo Seidel
Linux-Strategy @ Amadeus

LinuxTag 2013 (spring 2013)

To my Mum

Agenda

● Introduction
● High level overview
● Storage inside
● Use cases
● Summary

Introduction

Me ;-)

● Teacher of mathematics & physics
● PhD in experimental physics
● Started with Linux in 1996
● Linux/UNIX trainer
● Solution engineer in HPC and CAx environment
● Head of the Linux Strategy team @Amadeus

Storage: History

● Reviewing storage task responsibilities
● Block allocation
● Space management

● Extension of SCSI standard
● Object based storage
● Meta-Data handling separated from data management

Object based storage

● Storage objects quite general
● Partition, file, ...
● Unique identifier

● OSD (Object based Storage Device)
● Hardware -> original trigger
● Software -> common implementation
● Main component of distributed file systems

Distributed storage: Paradigm changes

● Block -> Object
● Central -> Distributed

● Few -> Many
● Big -> Small

● Server <-> Storage

Distributed File Systems

● 'Recent' attention on distributed storage
● Cloud hype
● Big Data

● See also CEPH talks

Distributed storage – Now what?!?

● Several implementations
● Different functions
● Support models
● Storage vendors initiatives
● Relation to Linux distributions

Here and now ==> GlusterFS

High level overview

History

● Gluster founded in 2005
● Gluster = GNU + cluster
● Acquisition by Red Hat in 2011
● Community project

● 3.2 in 2011
● 3.3 in 2012

● Commercial product: Red Hat Storage Server

The Client

● Native
● 'speaks' GLUSTERFS
● Not part of the Linux Kernel
● FUSE-based

● NFS
● Normal NFS client stack

● S3/Swift compatible
● Proxy needed

The Server

● Data
● Bricks
● Translators
● Volumes -> exported/served to the client

● Meta-Data
● No dedicated instance
● Distributed hashing approach

The picture

Storage inside

The Brick

● Trust each other
● Interconnect

● TCP/IP and/or RDMA/Infiniband

● Dedicated file systems on GlusterFS server
● XFS recommended, EXT4 works too
● Extended attributes a must

● Two main processes/daemons
● glusterd and glusterfsd

The Translator

● One per purpose
● Replication
● POSIX
● Quota
● I/O behaviour

● Chained -> brick graph
● Technically: configuration
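
The chaining idea can be sketched in a few lines of Python. This is a conceptual model only: real GlusterFS translators are not Python classes, and the names below (`Posix`, `Quota`, `Replicate`, the `write` method, the byte limit) are hypothetical and merely mirror the purposes listed above.

```python
class Posix:
    """Bottom of the stack: stores data (a dict stands in for the file system)."""

    def __init__(self):
        self.files = {}

    def write(self, path, data):
        self.files[path] = data
        return len(data)


class Quota:
    """Passes a write down only while a simple byte limit is respected."""

    def __init__(self, child, limit):
        self.child = child
        self.limit = limit
        self.used = 0  # naive accounting: overwrites are counted again

    def write(self, path, data):
        if self.used + len(data) > self.limit:
            raise OSError("quota exceeded")
        self.used += len(data)
        return self.child.write(path, data)


class Replicate:
    """Fans every write out to all of its children."""

    def __init__(self, children):
        self.children = children

    def write(self, path, data):
        results = [child.write(path, data) for child in self.children]
        return min(results)


# The "graph" is just configuration: quota -> replicate -> two posix bricks.
brick_a, brick_b = Posix(), Posix()
volume = Quota(Replicate([brick_a, brick_b]), limit=10)
volume.write("/hello", b"world")
```

Each layer only knows its child, so adding or removing a function means changing the wiring, not the other layers.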

The Volume

● Service unit
● Layer of configuration

● distributed, replicated, striped, ...
● NFS
● Cache
● Permissions
● ...
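
The volume type is chosen when the volume is created. The sketch below only assembles the argument list for `gluster volume create` and does not run it. The helper name, host names and brick paths are hypothetical, and the argument order follows the CLI usage of the 3.3 era as far as known here; check `gluster volume help` on the installed version before relying on it.

```python
def volume_create_cmd(name, bricks, replica=1, stripe=1, transport="tcp"):
    """Build the argv for 'gluster volume create' (hypothetical helper)."""
    if len(bricks) % (replica * stripe) != 0:
        raise ValueError("brick count must be a multiple of replica * stripe")
    cmd = ["gluster", "volume", "create", name]
    if stripe > 1:
        cmd += ["stripe", str(stripe)]
    if replica > 1:
        cmd += ["replica", str(replica)]
    cmd += ["transport", transport]
    cmd += list(bricks)
    return cmd


# A two-way replicated volume on two made-up servers.
cmd = volume_create_cmd(
    "vol0",
    ["server1:/export/brick1", "server2:/export/brick1"],
    replica=2,
)
print(" ".join(cmd))
```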

The Striped Volume

The Distributed Volume

The Replicated Volume

The Distributed-Replicated Volume

Meta Data

● 2 kinds
● More of local file system style
● Related to distributed nature

● Some stored in backend file system
● Permissions
● Time stamps
● Distribution/replication

● Some calculated on the fly
● Brick location

Elastic Hash Algorithm

● Based on file names
● Name space divided
● Full brick handled via relinking
● Stored in extended attributes
● Client needs to know topology
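
A minimal sketch of the principle, under stated assumptions: the 32-bit hash space is divided evenly among the bricks, and `zlib.crc32` stands in for the hash function GlusterFS actually uses. The function names and brick names are hypothetical, and the relinking of files away from full bricks is not modelled.

```python
import zlib

HASH_SPACE = 2 ** 32


def build_layout(bricks):
    """Divide the hash space into one contiguous range per brick."""
    step = HASH_SPACE // len(bricks)
    layout = []
    for i, brick in enumerate(bricks):
        start = i * step
        # The last brick absorbs the remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == len(bricks) - 1 else start + step - 1
        layout.append((start, end, brick))
    return layout


def locate(filename, layout):
    """Return the brick whose range contains the hash of the file name."""
    h = zlib.crc32(filename.encode("utf-8"))  # stand-in hash, 0 <= h < 2**32
    for start, end, brick in layout:
        if start <= h <= end:
            return brick
    raise LookupError("layout does not cover hash value")


layout = build_layout(["brick1", "brick2", "brick3"])
print(locate("report.txt", layout))
```

Because the location is computed from the name and the layout alone, any client that knows the layout can find a file without asking a meta-data server.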

Distributed Hash Tables

Self-Healing

● On demand vs. scheduled
● File based
● Based on extended attributes
● Split-brain

● Quorum function
● Sometimes: manual intervention
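
How such bookkeeping can drive healing, and where split-brain comes from, can be sketched as follows. The model is deliberately simplified and its names (`heal_direction`, `has_quorum`) are hypothetical: each of two replicas keeps a counter of operations it believes its peer has missed, and the quorum rule is a plain majority, which is an assumption and not necessarily the exact rule GlusterFS applies.

```python
def heal_direction(a_blames_b, b_blames_a):
    """Decide what to do with one file that is replicated on bricks a and b.

    Each argument is the number of operations one replica recorded as
    pending on the other one.
    """
    if a_blames_b > 0 and b_blames_a > 0:
        # Both sides claim to be newer: no automatic decision possible.
        return "split-brain"
    if a_blames_b > 0:
        return "a->b"  # a is the source, b gets healed
    if b_blames_a > 0:
        return "b->a"
    return "clean"


def has_quorum(bricks_up, bricks_total):
    """Plain majority rule: allow writes only if more than half are reachable."""
    return bricks_up > bricks_total // 2


print(heal_direction(3, 0))
print(heal_direction(2, 5))
print(has_quorum(1, 2))
```

Refusing writes without quorum prevents both sides from accumulating changes independently, which is how split-brain is avoided rather than repaired.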

Geo replication

● Asynchronous
● Based on rsync/ssh
● Master-Slave
● If needed: cascading
● One way street
● Clocks in sync!

From files to objects

● Introduced with version 3.3
● Hard links with some hierarchy

● Re-uses GFID (inode number)

● UFO
● Unified File and Object
● Combination with RESTful API
● S3 and Swift compatible
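
As an illustration of "hard links with some hierarchy": assuming the brick layout used since 3.3, where every file has a hard link below a `.glusterfs` directory, named after its GFID and sorted into two directory levels taken from the first four hex digits, the link path can be computed like this. The function name, brick path and GFID value are made up for the example.

```python
import posixpath
import uuid


def gfid_backend_path(brick_root, gfid):
    """Return the assumed .glusterfs hard-link path for a GFID on a brick."""
    g = str(uuid.UUID(gfid))  # normalise to the lower-case canonical form
    return posixpath.join(brick_root, ".glusterfs", g[0:2], g[2:4], g)


path = gfid_backend_path(
    "/export/brick1", "0F3A7C1E-5B2D-4E8A-9C6F-1234567890AB"
)
print(path)
```

Since the link name depends only on the GFID, an object can be addressed without knowing the file's path in the directory tree.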

Operations: Growth, shrinkage ... failures

● A Must!
● Easy
● Rebalance!
● Order of servers important
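
One reason the order matters, presumably the one meant here: in a replicated volume, consecutive bricks in the list given at creation (or when adding bricks) form one replica set. The sketch below groups bricks that way and flags sets whose copies would all sit on the same server. The function names and the `host:/path` brick strings are hypothetical.

```python
def replica_sets(bricks, replica):
    """Group bricks into replica sets in the order they were listed."""
    if len(bricks) % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def sets_on_one_server(bricks, replica):
    """Return the replica sets in which at least two copies share a host."""
    bad = []
    for group in replica_sets(bricks, replica):
        hosts = [brick.split(":", 1)[0] for brick in group]
        if len(set(hosts)) < len(hosts):
            bad.append(group)
    return bad


good_order = ["s1:/b1", "s2:/b1", "s1:/b2", "s2:/b2"]
bad_order = ["s1:/b1", "s1:/b2", "s2:/b1", "s2:/b2"]
print(sets_on_one_server(good_order, 2))
print(sets_on_one_server(bad_order, 2))
```

With the second ordering, losing one server takes both copies of a replica set offline even though the same four bricks are in use.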

What else ...?

● Encryption :-|
● Compression :-(
● Snapshots :-(
● Hadoop connector :-)
● Locking granularity :-|
● File system statistics :-)

Use cases

NAS replacement

● NFS as 1:1
● Server: GlusterFS
● Client: NFS

● NFS as such
● Server: GlusterFS
● Client: GlusterFS

Storage back-end for KVM and Co

● Stacked (indirect)
● Not smart
● Workable for main hypervisors

● Direct
● QEMU
● libvirt
● oVirt/RHEV

SAN replacement

● Not quite advanced (yet)
● New translator needed

● Development started
● Presenting GlusterFS as block device

● Additional items needed
● Locking
● ...

Summary

Take aways

● Thin distributed file system layer
● Modular architecture
● Operationally ready
● Still some surprises
● Active development and community

Thank you!
