GlusterFS – What, Why, How ...
Presented by: Sahina Bose, Red Hat – 18th Mar, 2015
Slide credits: Vijay Bellur, GlusterFS architect, Red Hat
03/18/15
Agenda
● 4 Ws and a H
● Why GlusterFS?
● What is GlusterFS?
● How does GlusterFS work?
● Where is GlusterFS used?
● When does feature foo appear in GlusterFS?
● Q & A
Why GlusterFS?
● 2.5+ exabytes of data produced every day!
● 90% of the world's data was created in the last two years
● Data needs to be stored somewhere!
● Commoditization and democratization are the way to go
source: http://www-01.ibm.com/software/data/bigdata/what-is-big-data.html
What is GlusterFS?
● Scale-out distributed storage system.
● Aggregates storage exports over network interconnects to provide a unified namespace.
● Layered on disk file systems that support extended attributes.
● Provides file, object and block interfaces for data access.
GlusterFS Architecture – Foundations
● Software only, runs on commodity hardware
● No external metadata servers
● Scale-out with Elasticity
● Extensible and modular
● Deployment agnostic
GlusterFS concepts – Trusted Storage Pool
[Diagram: Node1 sends a probe to Node2; once the probe is accepted, Node1 and Node2 are peers in a trusted storage pool]
GlusterFS concepts – Trusted Storage Pool
[Diagram: probing Node3 expands the trusted storage pool to Node1, Node2 and Node3; detaching Node3 removes it from the pool again]
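The pool operations in the diagrams map directly to the gluster CLI. A minimal sketch, assuming hostnames node2/node3 stand in for your own servers:

```shell
# Run on node1: add node2 and node3 to the trusted storage pool
gluster peer probe node2
gluster peer probe node3

# Verify pool membership
gluster peer status

# Remove node3 from the pool (its bricks must not be part of any volume)
gluster peer detach node3
```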
GlusterFS concepts – Bricks

● A brick is the combination of a node and an export directory, e.g. hostname:/dir
● Each brick inherits the limits of the underlying filesystem
● No limit on the number of bricks per node
● Data and metadata are stored on bricks

[Diagram: three storage nodes exporting 3, 5 and 3 bricks respectively, e.g. /export1, /export2, /export3]
GlusterFS concepts - Volumes
● Logical collection of bricks.
● Identified by an administrator-provided name.
● A volume, or a part of it, can be mounted on a client:
● mount -t glusterfs server1:/<volname> /my/mnt/point
GlusterFS concepts - Volumes
[Diagram: two volumes, "music" and "videos", each spanning bricks /export/brick1 and /export/brick2 across Node1, Node2 and Node3]
Volume Types
➢ Type of a volume is specified at the time of volume creation
➢ Volume type determines how and where data is placed
➢ The following volume types are supported in GlusterFS:
1) Distribute
2) Stripe
3) Replicate
4) Distributed Replicate
5) Striped Replicate
6) Distributed Striped Replicate
7) Disperse
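Creating the common types can be sketched with the CLI. Hostnames and brick paths below are placeholders; `force` may be required if bricks sit on the root partition:

```shell
# Distribute: files are spread across all bricks
gluster volume create dist-vol node1:/export/brick1 node2:/export/brick1

# Replicate: every file is kept on all three bricks
gluster volume create repl-vol replica 3 \
    node1:/export/brick2 node2:/export/brick2 node3:/export/brick2

# Start and inspect a volume
gluster volume start repl-vol
gluster volume info repl-vol
```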
Distributed Volume
➢ Distributes files across the bricks of the volume.
➢ Directories are present on all bricks of the volume.
➢ Removes the need for an external metadata server and provides O(1) lookup.
Replicated Volume
● Synchronous replication of all updates.
● Provides high availability for data.
● Transaction driven, ensuring consistency.
● Changelogs maintained for reconciliation.
● Any number of replicas can be configured.
Disperse Volume
● Introduced in GlusterFS 3.6
● Erasure coding / RAID 5 over the network
● “Disperses” data on to various bricks
● IDA: Reed-Solomon codes
● Non-systematic erasure coding
● Encoding/decoding done on the client side
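A dispersed volume trades capacity for redundancy: with 6 bricks and redundancy 2, any 2 bricks can fail without data loss, and usable capacity equals 4 bricks. A sketch with placeholder hostnames:

```shell
# 6 bricks, 2 of which hold redundancy data
gluster volume create ec-vol disperse 6 redundancy 2 \
    node{1..6}:/export/brick1
gluster volume start ec-vol
```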
Access Mechanisms
Gluster volumes can be accessed via the following mechanisms:
– FUSE based Native protocol
– NFSv3
– SMB
– libgfapi
– ReST/HTTP
– HDFS
– iSCSI
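The first three access paths map to standard mount commands. A sketch with placeholder hostnames; NFSv3 is served by the built-in Gluster NFS server, and the SMB share name assumes the common Samba auto-export naming:

```shell
# FUSE based native client
mount -t glusterfs server1:/myvol /mnt/gluster

# NFSv3 (the built-in server speaks v3 only, hence vers=3)
mount -t nfs -o vers=3 server1:/myvol /mnt/nfs

# SMB (volume exported through Samba)
mount -t cifs -o user=admin //server1/gluster-myvol /mnt/smb
```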
Object/ReST - SwiftonFile
[Diagram: an HTTP request via the Swift REST API travels client → proxy → account/container/object, which map to volume/directory/file; the same data is reachable as files over an NFS or GlusterFS mount]

● Unified file and object view.
● Entity mapping between file and object building blocks.
Features

● Scale-out NAS
● Elasticity
● Directory & volume quotas
● Data protection and recovery
● Volume and file snapshots
● User-serviceable snapshots
● Geographic/asynchronous replication
● Archival
● Read-only
● WORM
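Quotas and snapshots are driven from the same CLI. A sketch (volume name, directory and limit are placeholders; volume snapshots require bricks on thinly provisioned LVM):

```shell
# Directory quota: cap /projects inside myvol at 10 GB
gluster volume quota myvol enable
gluster volume quota myvol limit-usage /projects 10GB

# Point-in-time volume snapshot
gluster snapshot create nightly myvol
gluster snapshot list myvol
```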
Features

● Performance
● Client-side in-memory caching for performance
● Data, metadata and readdir caching
● Monitoring
● Built-in I/O statistics
● /proc-like interface for introspection
● Provisioning
● puppet-gluster
● gluster-deploy
● More...
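The built-in I/O statistics are exposed through `volume profile` and `volume top`; a sketch:

```shell
# Per-brick latency and throughput counters
gluster volume profile myvol start
gluster volume profile myvol info

# Most-accessed files and busiest operations
gluster volume top myvol read
gluster volume top myvol open
```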
Translators in GlusterFS
● Building blocks for a GlusterFS process.
● Based on translators in GNU Hurd.
● Each translator is a functional unit.
● Translators can be stacked together for achieving desired functionality.
● Translators are deployment agnostic – can be loaded in either the client or server stacks.
Develop with GlusterFS
● Write your own translator
● Possible in C and Python as of today
● Write your custom application using libgfapi
● Interfaces similar to system calls
● C, Python, Java and Go bindings available
● Talk to Gluster through ReST APIs
Ecosystem Integration
● Currently used with various ecosystem projects
● Virtualization
– OpenStack
– oVirt
– Qemu
– CloudStack
● Big Data Analytics
– Hadoop
– Tachyon
● File Sync and Share
– ownCloud
OpenStack Juno + GlusterFS – Current Integration

[Diagram: GlusterFS storage servers back the OpenStack services – Nova nodes use GlusterFS as ephemeral storage for KVM guests via FUSE/libgfapi, Cinder stores block data, Glance stores images, Swift stores objects behind the Swift API, and Manila stores file share data]
New Features in GlusterFS 3.7
● Data Tiering
● Bitrot detection
● Sharding
● Performance improvements
● Better protection against split brain
● NFSv4, 4.1 and pNFS access using NFS Ganesha
● Netgroups style configuration for NFS
Data Tiering in 3.7
● Policy based data movement across hot and cold tiers
● New translator for identifying candidates for promotion/demotion
● Enables better utilization of different classes of storage device/SSDs
[Diagram: on the client side, a Tier xlator sits above a hot DHT and a cold DHT, with replication and other client xlators below; on each brick, a CTR xlator and POSIX xlator maintain a heat data store; files are promoted from the cold tier to the hot tier and demoted in the opposite direction]
Bitrot detection in 3.7
● Detection of silent data corruption
● Checksum associated with each file
● Periodic data scrubbing
● Detection also upon access
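Bitrot detection is enabled per volume; a sketch, using the scrub settings documented for 3.7:

```shell
# Enable checksumming and background scrubbing on myvol
gluster volume bitrot myvol enable

# Scrub data weekly instead of the default
gluster volume bitrot myvol scrub-frequency weekly

# Throttle scrubbing to reduce I/O impact
gluster volume bitrot myvol scrub-throttle lazy
```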
Sharding in 3.7
● Solves fragmentation in Gluster volumes
● Chunks files and places the chunks on any node that has space
● Suitable for large-file workloads
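Sharding is switched on through volume options; a sketch (the 64 MB block size is just an example value):

```shell
# Split large files into 64 MB shards distributed across bricks
gluster volume set myvol features.shard on
gluster volume set myvol features.shard-block-size 64MB
```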
Netgroups and Exports for NFS in 3.7
● More advanced configuration for authentication based on /etc/exports like syntax
● Support for netgroups
● Patches written at Facebook
● Forward ported from 3.4 to 3.7
● Cleaned up and posted for review
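The syntax mirrors /etc/exports. A hedged sketch – the file location and the `nfs.exports-auth-enable` option are assumptions based on the 3.7 feature documentation:

```shell
# Enable exports/netgroups authentication for the built-in NFS server
gluster volume set myvol nfs.exports-auth-enable on

# Entries in /var/lib/glusterd/nfs/exports, /etc/exports-like syntax:
# /myvol 10.0.0.0/24(rw) @build-hosts(ro)
```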
NFS Ganesha support in 3.7
● Supports NFSv4, NFSv4.1 with Kerberos
● pNFS support for Gluster volumes follows later
● Modifications to Gluster internals
● Upcall infrastructure
● Gluster CLI to manage NFS-Ganesha
● libgfapi improvements
● High-Availability based on Pacemaker and Corosync
Small-file performance enhancements in 3.7
● Multithreaded epoll (transport layer)
● In-memory caching of stat and extended attributes on the bricks
● Improvements for directory reads
● Tiering of data within a volume
Features beyond GlusterFS 3.7
● HyperConvergence with oVirt
● At rest compression
● De-duplication
● Multi-protocol support with NFS, FUSE and SMB
● Native ReST APIs for gluster management
● More integration with OpenStack, Containers
Hyperconverged oVirt – GlusterFS
● Server nodes are used both for virtualization and serving replicated images from the GlusterFS Bricks
● The boxes can be standardized (hardware and deployment) for easy addition and replacement
● Support for both scaling up (adding more disks) and scaling out (adding more hosts)

[Diagram: the oVirt Engine manages nodes that run both VMs and storage; a GlusterFS volume spans bricks on every node]
GlusterFS 4.0
● Not evolutionary anymore
● Intended for massive scalability and manageability improvements; removes known bottlenecks
● Make it easier for devops to provision, manage and monitor
● Enable larger deployments and new use cases
GlusterFS 4.0
● New Style Replication
● Improved Distributed hashing Translator
● Composite operations in the GlusterFS RPC protocol
● Support for multiple networks
● Coherent client-side caching
● Advanced data tiering
● ... and much more
Resources
Mailing lists: [email protected]@nongnu.org
IRC: #gluster and #gluster-dev on freenode
Web: http://www.gluster.org