
Bricks and Translators - The distributed file system made by Red Hat (spring 2013)

Bricks and Translators: The distributed file system made by Red Hat

Dr. Udo Seidel
Linux-Strategy @ Amadeus

LinuxTag 2013 2

To my Mum

Agenda

● Introduction
● High level overview
● Storage inside
● Use cases
● Summary

Introduction

Me ;-)

● Teacher of mathematics & physics
● PhD in experimental physics
● Started with Linux in 1996
● Linux/UNIX trainer
● Solution engineer in HPC and CAx environment
● Head of the Linux Strategy team @Amadeus

Storage: History

● Reviewing storage task responsibilities
● Block allocation
● Space management

● Extension of SCSI standard
● Object based storage
● Meta-Data handling separated from data management

Object based storage

● Storage objects quite general
● Partition, file, ...
● Unique identifier

● OSD (Object based Storage Device)
● Hardware -> original trigger
● Software -> common implementation
● Main component of distributed file systems

Distributed storage: Paradigm changes

● Block -> Object
● Central -> Distributed

● Few -> Many
● Big -> Small

● Server <-> Storage

Distributed File Systems

● 'Recent' attention on distributed storage
● Cloud hype
● Big Data

● See also Ceph talks

Distributed storage – Now what?!?

● Several implementations
● Different functions
● Support models
● Storage vendors' initiatives
● Relation to Linux distributions

Here and now ==> GlusterFS

High level overview

History

● Gluster founded in 2005
● Gluster = GNU + cluster
● Acquisition by Red Hat in 2011
● Community project

● 3.2 in 2011
● 3.3 in 2012

● Commercial product: Red Hat Storage Server

The Client

● Native
● 'speaks' GLUSTERFS
● Not part of the Linux Kernel
● FUSE-based

● NFS
● Normal NFS client stack

● S3/Swift compatible
● Proxy needed

The Server

● Data
● Bricks
● Translators
● Volumes -> exported/served to the client

● Meta-Data
● No dedicated instance
● Distributed hashing approach

The picture

Storage inside

The Brick

● Trust each other
● Interconnect

● TCP/IP and/or RDMA/InfiniBand

● Dedicated file systems on GlusterFS server
● XFS recommended, EXT4 works too
● Extended attributes a must

● Two main processes/daemons
● glusterd and glusterfsd

The Translator

● One per purpose
● Replication
● POSIX
● Quota
● I/O behaviour

● Chained -> brick graph
● Technically: configuration
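
The chaining idea can be illustrated with a small Python sketch. It is not GlusterFS code: the class names (Translator, Posix, Quota, Replicate) and the write method are hypothetical stand-ins, chosen only to show how each translator serves one purpose and hands the call on to the next element of the graph.

```python
# Conceptual sketch only: all class and method names are hypothetical,
# not the GlusterFS translator API.

class Translator:
    """One purpose per translator; calls are passed down to the children."""

    def __init__(self, *children):
        self.children = children

    def write(self, path, data):
        # Default behaviour: hand the call to the first child unchanged.
        return self.children[0].write(path, data)


class Posix(Translator):
    """Bottom of the graph: stores data (here in a dict instead of a disk)."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.files = {}

    def write(self, path, data):
        self.files[path] = data
        return len(data)


class Quota(Translator):
    """Rejects writes larger than a fixed limit, otherwise passes them on."""

    def __init__(self, limit, child):
        super().__init__(child)
        self.limit = limit

    def write(self, path, data):
        if len(data) > self.limit:
            raise OSError("quota exceeded")
        return super().write(path, data)


class Replicate(Translator):
    """Sends every write to all of its children."""

    def write(self, path, data):
        results = [child.write(path, data) for child in self.children]
        return results[0]


# The "graph" is just the way the translators are stacked: configuration.
brick_a, brick_b = Posix("brick-a"), Posix("brick-b")
graph = Quota(1024, Replicate(brick_a, brick_b))
graph.write("/hello.txt", b"hi")
```

Reordering or swapping the stacked objects changes the behaviour without touching any single translator, which is roughly what "Technically: configuration" points at.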

The Volume

● Service unit
● Layer of configuration

● distributed, replicated, striped, ...
● NFS
● Cache
● Permissions
● ...
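
As a rough illustration of the three volume types named above, the following Python sketch shows where the bytes of one file would end up. The function names, the brick labels and the 4-byte chunk size are assumptions made for the example; zlib.crc32 with a modulo is only a stand-in for the real placement logic.

```python
import zlib

# Conceptual sketch: function names, brick labels and chunk size are
# hypothetical; crc32 + modulo is a stand-in, not the GlusterFS algorithm.


def distribute(name, data, bricks):
    """Distributed: the whole file lands on exactly one brick."""
    target = bricks[zlib.crc32(name.encode("utf-8")) % len(bricks)]
    return {target: data}


def replicate(name, data, bricks):
    """Replicated: every brick holds a full copy of the file."""
    return {brick: data for brick in bricks}


def stripe(name, data, bricks, chunk_size=4):
    """Striped: the file is cut into chunks spread round-robin over bricks."""
    layout = {brick: b"" for brick in bricks}
    for offset in range(0, len(data), chunk_size):
        brick = bricks[(offset // chunk_size) % len(bricks)]
        layout[brick] += data[offset:offset + chunk_size]
    return layout


bricks = ["brick1", "brick2"]
payload = b"abcdefghij"
for placement in (distribute, replicate, stripe):
    print(placement.__name__, placement("demo.txt", payload, bricks))
```

A distributed-replicated volume combines the first two: files are distributed over groups of bricks, and each group replicates internally.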

The Striped Volume

The Distributed Volume

The Replicated Volume

The Distributed-Replicated Volume

Meta Data

● 2 kinds
● More of local file system style
● Related to distributed nature

● Some stored in backend file system
● Permissions
● Time stamps
● Distribution/replication

● Some calculated on the fly
● Brick location

Elastic Hash Algorithm

● Based on file names
● Name space divided
● Full brick handled via relinking
● Stored in extended attributes
● Client needs to know topology
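
A minimal sketch of the idea behind "based on file names" and "name space divided", assuming a 32-bit hash space cut into equal ranges. The helper names build_layout and locate, the server paths and the use of zlib.crc32 are illustrative assumptions; GlusterFS uses its own hash function and keeps the ranges in extended attributes.

```python
import zlib

# Conceptual sketch: build_layout, locate and the brick names are
# hypothetical; crc32 is a stand-in for the hash GlusterFS actually uses.

HASH_SPACE = 2 ** 32


def build_layout(bricks):
    """Divide the 32-bit hash space into one contiguous range per brick."""
    step = HASH_SPACE // len(bricks)
    layout = []
    for index, brick in enumerate(bricks):
        start = index * step
        # The last brick takes whatever is left, so no hash value is orphaned.
        end = HASH_SPACE - 1 if index == len(bricks) - 1 else start + step - 1
        layout.append((start, end, brick))
    return layout


def locate(filename, layout):
    """Return the brick whose range contains the hash of the file name."""
    value = zlib.crc32(filename.encode("utf-8"))
    for start, end, brick in layout:
        if start <= value <= end:
            return brick
    raise LookupError("hash value outside of layout")


layout = build_layout(["server1:/brick", "server2:/brick", "server3:/brick"])
print(locate("report.pdf", layout))
```

Because the location is computed from the name and the layout alone, any client that knows the topology can find a file without asking a metadata server.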

Distributed Hash Tables

Self-Healing

● On demand vs. scheduled
● File based
● Based on extended attributes
● Split-brain

● Quorum function
● Sometimes: manual intervention
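
Why a quorum helps against split-brain can be shown with a simple majority rule. The sketch below is an assumption-laden simplification: has_quorum is a hypothetical helper, and the quorum options in GlusterFS itself offer more modes than a plain majority.

```python
# Conceptual sketch: has_quorum is a hypothetical helper implementing a
# plain majority rule, not the GlusterFS quorum implementation.


def has_quorum(reachable, replica_count):
    """True if strictly more than half of the replicas are reachable."""
    if not 0 <= reachable <= replica_count:
        raise ValueError("reachable must be between 0 and replica_count")
    return 2 * reachable > replica_count


# With two replicas, a network split leaves each side with 1 of 2 copies:
# neither side has a majority, so neither should accept writes.
# With three replicas, the side that still sees 2 of 3 may continue.
for replica_count in (2, 3):
    for reachable in range(replica_count + 1):
        print(replica_count, reachable, has_quorum(reachable, replica_count))
```

If both halves of a split kept writing anyway, the copies would diverge, and that is the case where manual intervention becomes necessary.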

Geo replication

● Asynchronous
● Based on rsync/ssh
● Master-Slave
● If needed: cascading
● One way street
● Clocks in sync!

From files to objects

● Introduced with version 3.3
● Hard links with some hierarchy

● Re-uses GFID (inode number)

● UFO
● Unified File and Object
● Combination with RESTful API
● S3 and Swift compatible

Operations: Growth, shrinkage ... failures

● A Must!
● Easy
● Rebalance!
● Order of servers important
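
One plausible reading of "Order of servers important" is that, for replicated volumes, consecutive bricks in the list given at creation time form a replica set. Under that assumption, the Python sketch below groups a brick list and checks whether every set spans more than one server. The helper names and the server:/path labels are made up for the example.

```python
# Conceptual sketch: replica_sets and survives_one_server_loss are
# hypothetical helpers; the brick labels are invented for illustration.


def replica_sets(bricks, replica):
    """Group consecutive bricks into replica sets of the given size."""
    if replica < 1 or len(bricks) % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def survives_one_server_loss(sets):
    """True if no replica set keeps all of its copies on a single server."""
    return all(len({brick.split(":")[0] for brick in group}) > 1
               for group in sets)


good = ["server1:/b1", "server2:/b1", "server1:/b2", "server2:/b2"]
bad = ["server1:/b1", "server1:/b2", "server2:/b1", "server2:/b2"]

print(replica_sets(good, 2), survives_one_server_loss(replica_sets(good, 2)))
print(replica_sets(bad, 2), survives_one_server_loss(replica_sets(bad, 2)))
```

The same four bricks give very different failure behaviour depending only on the order in which they are listed.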

What else ...?

● Encryption :-|
● Compression :-(
● Snapshots :-(
● Hadoop connector :-)
● Locking granularity :-|
● File system statistics :-)

Use cases

NAS replacement

● NFS as 1:1
● Server: GlusterFS
● Client: NFS

● NFS as such
● Server: GlusterFS
● Client: GlusterFS

Storage back-end for KVM and Co

● Stacked (indirect)
● Not smart
● Workable for main hypervisors

● Direct
● QEMU
● libvirt
● oVirt/RHEV

SAN replacement

● Not quite advanced (yet)
● New translator needed

● Development started
● Presenting GlusterFS as block device

● Additional items needed
● Locking
● ...

Summary

Takeaways

● Thin distributed file system layer
● Modular architecture
● Operationally ready
● Still some surprises
● Active development and community

References

● http://www.gluster.org
● http://www.sxc.hu (pictures)

Thank you!
