Gluster – Overview & Future Directions
Vijay Bellur, GlusterFS Co-maintainer, Red Hat
03/12/15
Agenda
● Overview
  ● Why Gluster?
  ● What is Gluster?
  ● Use Cases & Features
● Future Directions
● Q & A
Why Gluster?
Why Gluster?
● 2.5+ exabytes of data produced every day!
● 90% of it created in the last two years
● Data needs to be stored somewhere!
● Commoditization and Democratization – the way to go
source: http://www-01.ibm.com/software/data/bigdata/what-is-big-data.html
What is Gluster?
What is Gluster?
● Scale-out distributed storage system.
● Aggregates storage exports over network interconnects to provide a unified namespace.
● File, Object and Block interfaces
● Layered on disk file systems that support extended attributes.
Typical Gluster Deployment
Gluster Architecture – Foundations
● Software only, runs on commodity hardware
● No external metadata servers
● Scale-out with Elasticity
● Extensible and modular
Volumes in Gluster
● Logical collection of exports, aka bricks.
● Identified by an administrative name.
● Clients use a volume, or a part of a volume, for data CRUD operations.
● Multiple volume types are currently supported.
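The bullets above map directly to the gluster CLI. A minimal sketch of creating the common volume types, assuming hypothetical hosts server1–server4 that each export /export/brick1 (exact option behavior can vary by release):

```shell
# Distributed volume: files are spread across the bricks
gluster volume create dist-vol server1:/export/brick1 server2:/export/brick1

# Replicated volume: each file is mirrored on both bricks
gluster volume create rep-vol replica 2 server1:/export/brick1 server2:/export/brick1

# Distributed-replicated: two replica pairs, with distribution on top
gluster volume create dr-vol replica 2 \
    server1:/export/brick1 server2:/export/brick1 \
    server3:/export/brick1 server4:/export/brick1

# A volume must be started before clients can mount it
gluster volume start dist-vol
```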
Distributed Volume
Replicated Volume
Distributed Replicated Volume
Dispersed Volume
● Introduced in GlusterFS 3.6
● Erasure Coding / RAID 5 over the network
● “Disperses” data on to various bricks
● Algorithm: Reed-Solomon
● Non-systematic erasure coding
● Encoding / decoding done on client side
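A dispersed volume can be sketched with the same CLI, assuming six hypothetical hosts; `disperse 6 redundancy 2` gives the usable capacity of 4 bricks while tolerating the loss of any 2:

```shell
# 6 bricks total, any 2 may fail without data loss (4+2 erasure coding)
gluster volume create ec-vol disperse 6 redundancy 2 \
    server1:/export/brick1 server2:/export/brick1 server3:/export/brick1 \
    server4:/export/brick1 server5:/export/brick1 server6:/export/brick1
gluster volume start ec-vol
```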
Access Mechanisms
FUSE based native access
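A native mount might look like the following (hypothetical server and volume names); the client fetches the volume graph from server1 and then talks to all bricks directly:

```shell
mount -t glusterfs server1:/myvol /mnt/gluster
# equivalently, invoking the FUSE client directly:
glusterfs --volfile-server=server1 --volfile-id=myvol /mnt/gluster
```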
NFSv3 access with Gluster NFS
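Gluster NFS serves volumes over NFSv3, so a standard NFS client suffices; a hedged sketch with hypothetical names (unlike the FUSE client, all traffic flows through the named server):

```shell
# vers=3 and tcp are needed for the in-built Gluster NFS server
mount -t nfs -o vers=3,proto=tcp server1:/myvol /mnt/gluster-nfs
```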
Object/ReST - SwiftonFile
[Diagram: HTTP requests using the Swift REST API flow from the client through the proxy to account, container and object servers, which map to volume, directory and file; the same files are also reachable via an NFS or GlusterFS mount]
● Unified file and object view.
● Entity mapping between file and object building blocks.
HDFS access
libgfapi access
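libgfapi lets applications link against Gluster directly and bypass FUSE; qemu is a well-known consumer. A sketch assuming a qemu build with the gluster block driver and a hypothetical volume myvol:

```shell
# Create and boot from a VM image on a Gluster volume, with no mount at all
qemu-img create -f qcow2 gluster://server1/myvol/vm.qcow2 8G
qemu-system-x86_64 -drive file=gluster://server1/myvol/vm.qcow2,if=virtio
```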
Nfs-Ganesha with GlusterFS
SMB with GlusterFS
Block/iSCSi access
Features
● Scale-out NAS
  ● Elasticity, quotas
● Data Protection and Recovery
  ● Volume and File Snapshots, User Serviceable Snapshots, Geographic/Asynchronous replication
● Archival
  ● Read-only, WORM
● Native CLI / API for management
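Several of the features above are toggled per volume via the native CLI; a hedged sketch against a hypothetical volume myvol (option names as documented for the 3.x releases):

```shell
# Quotas: enable, then cap a directory
gluster volume quota myvol enable
gluster volume quota myvol limit-usage /projects 10GB

# Snapshots, plus user-serviceable snapshots (.snaps virtual directory)
gluster snapshot create snap1 myvol
gluster volume set myvol features.uss enable

# Archival modes
gluster volume set myvol features.read-only on
gluster volume set myvol features.worm enable
```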
Features
● Isolation for multi-tenancy
  ● SSL for data/connection, Encryption at rest
● Performance
  ● Data, metadata and readdir caching
● Monitoring
  ● Built-in I/O statistics, /proc-like interface for introspection
● Provisioning
  ● Puppet-gluster, gluster-deploy
● More..
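The built-in I/O statistics mentioned above are exposed through the CLI; a sketch against a hypothetical volume myvol:

```shell
# Per-brick latency and throughput counters
gluster volume profile myvol start
gluster volume profile myvol info

# Most-accessed files, open fd counts, and similar top-style views
gluster volume top myvol read
gluster volume top myvol open
```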
Gluster & oVirt
Gluster Monitoring with Nagios
http://www.ovirt.org/Features/Nagios_Integration
How is it implemented?
Translators in Gluster
● Translator = shared library
● Each translator is a self-contained functional unit.
● Translators can be stacked together for achieving desired functionality.
● Translators are deployment agnostic – write once, use anywhere!
Customizable Translator Stack
Where is Gluster used?
Gluster Use Cases
Source: 2014 GlusterFS user survey
Future Directions
Recent Gluster Releases
● 3.5 – April 2014
● 3.6 – Oct 2014
● 3.7 – April 2015 (currently in development)
New Features in Gluster 3.7
Data Tiering
● Policy based data movement across hot and cold tiers
● New translator for identifying candidates for promotion/demotion
● Enables better utilization of different classes of storage device/SSDs
[Diagram: Data Tiering – a Tier xlator routes between a HOT tier and a COLD tier; each tier is its own DHT subtree with a replication xlator on top and POSIX, CTR and other server xlators over the brick storage and a heat data store. Files are promoted from the cold tier to the hot tier and demoted back]
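Attaching a hot tier to an existing volume might look like the following (hypothetical SSD-backed bricks; the attach syntax changed between 3.7 point releases, so treat this as a sketch):

```shell
# Attach a replicated SSD tier in front of the existing cold volume
gluster volume attach-tier myvol replica 2 \
    hot1:/ssd/brick1 hot2:/ssd/brick1
gluster volume tier myvol status
```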
Bitrot detection
● Detection of at-rest data corruption
● Checksum associated with each file
● Asynchronous checksum signing
● Periodic data scrubbing
● Bitrot detection upon access
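Bitrot detection is enabled per volume; a hedged sketch with hypothetical names (scrub frequency and throttle values as documented for 3.7):

```shell
gluster volume bitrot myvol enable
gluster volume bitrot myvol scrub-frequency weekly
gluster volume bitrot myvol scrub-throttle lazy
```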
Sharding
● Solves fragmentation in Gluster volumes
● Chunks files and places data on any node that has space
● Suitable for large-file workloads requiring parallelism
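Sharding is a volume option; a minimal sketch with a hypothetical volume (the shard size value is illustrative):

```shell
gluster volume set myvol features.shard on
gluster volume set myvol features.shard-block-size 64MB
```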
Netgroups and Exports for NFS in 3.7
● More advanced authentication configuration, based on an /etc/exports-like syntax
● Support for netgroups
● Patches written at Facebook
● Forward ported from 3.4 to 3.7
NFS Ganesha improvements
● Supports active-active NFSv4, NFSv4.1 with Kerberos
● pNFS support for Gluster
● New upcall infrastructure added in Gluster
● Gluster CLI to manage NFS Ganesha
● High-Availability based on Pacemaker and Corosync
Performance enhancements
● Small file
  ● Multi-threaded epoll
  ● In-memory metadata caching on bricks
  ● Improvements for directory listing
● Rebalance
  ● Parallel rebalance
  ● More efficient disk crawling
● Data tiering
TrashCan
● Protection from fat-finger deletions and truncations.
● Deleted files are stored in a designated directory within the brick.
● Captures deletions performed by maintenance operations like self-heal and rebalance.
Arbiter Replication
● 2 data copies, 3 metadata copies per replica set
● The additional metadata copy is used for arbitration
● Greatly minimizes the possibility of split-brain
● Existing replica 2 volumes can be converted to arbiter replica volumes
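Creating an arbiter volume uses the replica syntax with an arbiter count; the third brick (hypothetical server3) stores only metadata:

```shell
gluster volume create arb-vol replica 3 arbiter 1 \
    server1:/export/brick1 server2:/export/brick1 server3:/export/brick1
```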
Split-brain Resolution
● Existing behavior – EIO
● Administrative policies to automatically resolve split-brain
● Users can view split objects and resolve split-brain manually
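The resolution workflow above maps to heal subcommands introduced in 3.7; a sketch with a hypothetical volume and file path:

```shell
# List objects that are in split-brain
gluster volume heal myvol info split-brain

# Resolve a file by policy: newest mtime, bigger file, or a chosen brick
gluster volume heal myvol split-brain latest-mtime /dir/file
gluster volume heal myvol split-brain bigger-file /dir/file
gluster volume heal myvol split-brain source-brick server1:/export/brick1 /dir/file
```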
Other major improvements
● Support for inode quotas
● Volume clone from snapshot
● Snapshot scheduling
● glusterfind – 'Needle in a haystack'
● Loads of bug fixes
Features beyond GlusterFS 3.7
● HyperConvergence with oVirt
● Compression (at rest)
● De-duplication
● Overlay translator
● Multi-protocol support with NFS, FUSE and SMB
● Native ReST APIs for gluster management
● More integration with OpenStack, Containers
Hyperconverged oVirt – Gluster
● Server nodes are used both for virtualization and storage
● Support for both scaling up (adding more disks) and scaling out (adding more hosts)
[Diagram: VMs and the oVirt storage engine run on the same nodes that contribute bricks to a GlusterFS volume]
GlusterFS Native Driver – OpenStack Manila
● Supports Manila's certificate-based access type
● Provision shares that use the 'glusterfs' protocol
● Multi-tenant
● Separation using tenant specific certificates
● Supports certificate chaining and cipher lists
GlusterFS Native Driver – OpenStack Manila
[Diagram: Manila orchestration maps tenant networks (Admin 10.1.1.1-24, Tech 192.168.1.2, HR 10.1.2.1-12) to per-tenant shares on a Gluster pool, each share allowing only its own tenant]
GlusterFS Ganesha Driver for OpenStack Manila
[Diagram: each tenant gets a service VM running an NFS-Ganesha server with the Gluster FSAL; the service VMs export shares from a GlusterFS storage backend to the tenant's Nova VMs]
Gluster 4.0
Gluster 4.0
● Address higher scale
  ● Not just higher node count, but also correctness and consistency at higher node count
  ● glusterd and DHT changes
● Support more heterogeneous environments
  ● Multiple OSes, multiple storage types, multiple networks, NSR
● Increase deployment flexibility
  ● e.g. data classification, multiple replication/erasure types and levels
New Style Replication
● Server-side replication
● Controlled by a designated “leader”, also known as the sweeper
● Advantages
  ● Client network bandwidth usage optimized for direct (FUSE) mounts
  ● Avoidance of split-brain
New Style Replication
DHTv2
● Improved scalability and performance for all directory-entry operations.
● High consistency and reliability for conflicting directory-entry operations, and for layout repair.
● Better performance for rebalance
Thousand node glusterd
● Scale glusterd to manage more than 1000 nodes
● Paxos/Raft for membership and configuration management
Gluster 4.0 – What's next?
● Code name for the release? Open to suggestions
● Submissions for feature proposals are still open!
● Implementation of key features in progress.
● Voting on feature proposals during design summit
● Tentatively planned for May 2016
Resources
Mailing lists: [email protected], [email protected]
IRC: #gluster and #gluster-dev on freenode
Web: http://www.gluster.org
Thank You!
vijay at gluster.org
twitter: @vbellur
BACKUP
Striped Volume
● Aggregation of chunks of files placed on various bricks.
● Recommended normally for workloads involving very large files and parallel access.
● WIP sharding feature likely to supersede striped volumes.
GlusterFS concepts – Trusted Storage Pool
● a.k.a. cluster
● glusterd uses a membership protocol to form the trusted storage pool.
● Trusted Storage Pool is invite only.
● Membership information used for determining quorum.
● Members can be dynamically added and removed from the pool.
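Pool membership is managed with the peer subcommands, run from an existing member; a sketch with hypothetical hostnames:

```shell
# Invite a node into the trusted storage pool, then inspect membership
gluster peer probe server2
gluster peer status
gluster pool list

# Remove a node from the pool (it must hold no bricks)
gluster peer detach server2
```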
How does a distributed volume work?
GlusterFS concepts – Bricks
● A brick is the combination of a node and an export directory – e.g. hostname:/dir
● Each brick inherits limits of the underlying filesystem
● No limit on the number of bricks per node
● Data and metadata are stored on bricks
[Diagram: three storage nodes exporting /export1../export5 directories, contributing 3 bricks, 5 bricks and 3 bricks respectively]