Gluster – Overview & Future Directions (Vault 2015)


Gluster – Overview & Future Directions

Vijay Bellur, GlusterFS Co-maintainer, Red Hat

03/12/15

Agenda

● Overview
● Why Gluster?
● What is Gluster?
● Use Cases & Features
● Future Directions
● Q & A


Why Gluster?


Why Gluster?

● 2.5+ exabytes of data produced every day!
● 90% of the world's data was created in the last two years
● Data needs to be stored somewhere!
● Commoditization and Democratization – the way to go

source: http://www-01.ibm.com/software/data/bigdata/what-is-big-data.html


What is Gluster?


What is Gluster?

● Scale-out distributed storage system.

● Aggregates storage exports over network interconnects to provide a unified namespace.

● File, Object and Block interfaces

● Layered on disk file systems that support extended attributes.


Typical Gluster Deployment


Gluster Architecture – Foundations

● Software only, runs on commodity hardware

● No external metadata servers

● Scale-out with Elasticity

● Extensible and modular


Volumes in Gluster

● Logical collection of exports, a.k.a. bricks.
● Identified by an administrative name.
● A volume, or a part of it, is used by clients for data CRUD operations.

● Multiple volume types supported currently


Distributed Volume
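A distributed volume stores each file in full on exactly one brick, chosen by hashing the file name. The toy C sketch below only illustrates that idea; Gluster's DHT translator actually assigns 32-bit hash ranges to bricks through extended attributes on directories, and its hash function differs from the FNV-1a stand-in used here.

/* Toy placement for a distributed volume: hash the file name and pick one
 * brick. Gluster's DHT really assigns hash ranges to bricks via directory
 * extended attributes; the modulo here is a simplification. */
#include <stdint.h>
#include <stdio.h>

static uint32_t toy_hash(const char *name)
{
    uint32_t h = 2166136261u;              /* FNV-1a: stand-in for Gluster's hash */
    for (; *name; name++) {
        h ^= (uint32_t)(unsigned char)*name;
        h *= 16777619u;
    }
    return h;
}

int main(void)
{
    const char *bricks[] = { "server1:/export1", "server2:/export1",
                             "server3:/export1" };
    const char *files[]  = { "a.txt", "b.txt", "photo.jpg" };
    size_t nbricks = sizeof(bricks) / sizeof(bricks[0]);

    for (size_t i = 0; i < sizeof(files) / sizeof(files[0]); i++)
        printf("%-10s -> %s\n", files[i], bricks[toy_hash(files[i]) % nbricks]);
    return 0;
}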


Replicated Volume


Distributed Replicated Volume


Dispersed Volume

● Introduced in GlusterFS 3.6

● Erasure Coding / RAID 5 over the network

● “Disperses” data onto various bricks
● Algorithm: Reed-Solomon
● Non-systematic erasure coding
● Encoding / decoding done on the client side (a simplified sketch follows)
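For intuition, the simplified sketch below splits a block into two data fragments plus one XOR parity fragment, so any single lost fragment can be rebuilt from the other two. This is a systematic, RAID-5-style toy; the disperse translator itself uses non-systematic Reed-Solomon coding with configurable redundancy.

/* Simplified dispersal: K data fragments plus one XOR parity fragment.
 * Losing any single fragment is recoverable. Gluster's disperse translator
 * actually uses non-systematic Reed-Solomon and more than one redundancy
 * fragment is possible; this only shows the recover-from-parity idea. */
#include <stdio.h>
#include <string.h>

enum { K = 2, FRAG = 8 };                           /* 2 data fragments of 8 bytes */

int main(void)
{
    const char *data = "16 bytes of data";          /* exactly K * FRAG bytes */
    unsigned char frag[K + 1][FRAG] = { { 0 } };    /* K data + 1 parity */

    for (int i = 0; i < K; i++) {
        memcpy(frag[i], data + i * FRAG, FRAG);
        for (int b = 0; b < FRAG; b++)
            frag[K][b] ^= frag[i][b];               /* parity = XOR of data fragments */
    }

    /* Pretend fragment 0 was lost: rebuild it from fragment 1 and the parity. */
    unsigned char rebuilt[FRAG];
    for (int b = 0; b < FRAG; b++)
        rebuilt[b] = frag[1][b] ^ frag[K][b];

    printf("rebuilt fragment 0: %.8s\n", (const char *)rebuilt);   /* "16 bytes" */
    return 0;
}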


Access Mechanisms


FUSE based native access


NFSv3 access with Gluster NFS


Object/ReST - SwiftonFile

[Diagram: a Swift REST API request goes from the client through the proxy, account, container and object services; the account maps to a GlusterFS volume, a container to a directory and an object to a file, which remain accessible over an NFS or GlusterFS mount.]

● Unified File and object view.

● Entity mapping between file and object building blocks


HDFS access


libgfapi access
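libgfapi lets applications access a volume directly, without a FUSE mount. Below is a minimal C sketch; "myvol" and "gluster-server" are placeholders for a real volume and a node in the trusted storage pool, the header path can vary by distribution, and the program links against libgfapi (pkg-config glusterfs-api).

/* Minimal libgfapi example: create a file on a Gluster volume without a
 * FUSE mount. Build with something like:
 *   cc gfapi_demo.c $(pkg-config --cflags --libs glusterfs-api)
 * "myvol" and "gluster-server" are placeholders. */
#include <glusterfs/api/glfs.h>   /* header path may differ by distribution */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    glfs_t *fs = glfs_new("myvol");                    /* volume name */
    if (!fs)
        return 1;

    glfs_set_volfile_server(fs, "tcp", "gluster-server", 24007);
    glfs_set_logging(fs, "/dev/stderr", 7);

    if (glfs_init(fs) != 0) {                          /* fetch volfile, connect */
        perror("glfs_init");
        glfs_fini(fs);
        return 1;
    }

    glfs_fd_t *fd = glfs_creat(fs, "/hello.txt", O_WRONLY, 0644);
    if (fd) {
        const char msg[] = "written via libgfapi\n";
        glfs_write(fd, msg, strlen(msg), 0);
        glfs_close(fd);
    }

    glfs_fini(fs);
    return 0;
}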


NFS-Ganesha with GlusterFS


SMB with GlusterFS


Block/iSCSI access


Features

● Scale-out NAS
  ● Elasticity, quotas
● Data Protection and Recovery
  ● Volume and File Snapshots, User Serviceable Snapshots, Geographic/Asynchronous replication
● Archival
  ● Read-only, WORM
● Native CLI / API for management


Features

● Isolation for multi-tenancy
  ● SSL for data/connection, encryption at rest
● Performance
  ● Data, metadata and readdir caching
● Monitoring
  ● Built-in I/O statistics, /proc-like interface for introspection
● Provisioning
  ● Puppet-gluster, gluster-deploy
● More...


Gluster & oVirt



Gluster Monitoring with Nagios

http://www.ovirt.org/Features/Nagios_Integration


How is it implemented?


Translators in Gluster

● Translator = shared library

● Each translator is a self-contained functional unit.

● Translators can be stacked together for achieving desired functionality.

● Translators are deployment agnostic – write once, use anywhere! (conceptual sketch below)
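To make the layering concrete, here is a conceptual C sketch, not Gluster's actual xlator interface (which revolves around xlator_t, call frames and STACK_WIND): every translator exposes the same operation table and either handles a call or hands it to the translator below it.

/* Conceptual translator stack: each translator implements the same ops table
 * and passes calls to the translator beneath it. Gluster's real xlator API
 * is richer; this shows only the stacking idea. */
#include <stdio.h>

struct xlator;
typedef int (*writev_fn)(struct xlator *this, const char *path,
                         const void *buf, size_t len);

struct xlator {
    const char    *name;
    writev_fn      writev;
    struct xlator *child;              /* next translator in the stack */
};

static int posix_writev(struct xlator *this, const char *path,
                        const void *buf, size_t len)
{
    (void)buf;                         /* a real translator would write it */
    printf("[%s] writing %zu bytes to %s on disk\n", this->name, len, path);
    return 0;
}

static int iocache_writev(struct xlator *this, const char *path,
                          const void *buf, size_t len)
{
    printf("[%s] invalidating cached pages for %s\n", this->name, path);
    return this->child->writev(this->child, path, buf, len);   /* pass down */
}

int main(void)
{
    struct xlator posix = { "storage/posix",        posix_writev,   NULL   };
    struct xlator cache = { "performance/io-cache", iocache_writev, &posix };

    cache.writev(&cache, "/data/file.txt", "hello", 5);
    return 0;
}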


Customizable Translator Stack


Where is Gluster used?


Gluster Use Cases

Source: 2014 GlusterFS user survey


Future Directions


Recent Gluster Releases

● 3.5 – April 2014

● 3.6 – Oct 2014

● 3.7 – April 2015 (currently in development)


New Features in Gluster 3.7


Data Tiering

● Policy based data movement across hot and cold tiers

● New translator for identifying candidates for promotion/demotion

● Enables better utilization of different classes of storage devices, e.g. SSDs (policy sketch below)

[Diagram: the tier translator sits above a hot DHT subvolume (with a replication translator) and a cold DHT subvolume; each brick stacks POSIX, CTR (change time recorder) and other server-side translators over brick storage and a heat data store. Files are promoted to the hot tier and demoted to the cold tier.]
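The promotion/demotion policy is essentially a rule evaluated over recorded access heat. The sketch below invents a simple such rule for illustration; the thresholds, counters and record layout are not the CTR translator's actual schema.

/* Toy tiering policy: promote files that are hot (recently and frequently
 * accessed), demote files that have gone cold. Thresholds and the "record"
 * layout are invented for illustration only. */
#include <stdio.h>
#include <time.h>

struct file_heat {
    const char *path;
    time_t      last_access;   /* as recorded by an access/change log */
    unsigned    accesses;      /* accesses within the measurement window */
};

enum { HOT_ACCESSES = 10, COLD_SECONDS = 24 * 60 * 60 };

static const char *tier_decision(const struct file_heat *f, time_t now)
{
    if (f->accesses >= HOT_ACCESSES)
        return "promote to hot tier";
    if (now - f->last_access > COLD_SECONDS)
        return "demote to cold tier";
    return "leave in place";
}

int main(void)
{
    time_t now = time(NULL);
    struct file_heat files[] = {
        { "/vol/db/index",     now - 60,          42 },
        { "/vol/logs/old.log", now - 7 * 86400,    0 },
        { "/vol/docs/new.txt", now - 3600,         2 },
    };

    for (size_t i = 0; i < sizeof(files) / sizeof(files[0]); i++)
        printf("%-18s -> %s\n", files[i].path, tier_decision(&files[i], now));
    return 0;
}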

Bitrot detection

● Detection of at-rest data corruption
● Checksum associated with each file
● Asynchronous checksum signing
● Periodic data scrubbing
● Bitrot detection upon access (simplified sketch below)
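The core cycle is: checksum a file once it goes quiet, store the signature, and recompute and compare during scrubbing or on access. The sketch below shows that cycle with a toy checksum and an in-memory "stored" signature; the real feature signs asynchronously and keeps the signature in an extended attribute on the brick.

/* Toy bitrot cycle: "sign" a buffer by checksumming it, then recompute and
 * compare later. The FNV-1a checksum is only a stand-in for the
 * cryptographic hash the real feature uses, and the stored signature lives
 * in a variable rather than an extended attribute on the brick. */
#include <stdint.h>
#include <stdio.h>

static uint64_t checksum(const unsigned char *buf, size_t len)
{
    uint64_t h = 1469598103934665603ull;            /* FNV-1a, 64-bit */
    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 1099511628211ull;
    }
    return h;
}

int main(void)
{
    unsigned char file_data[] = "important file contents";
    uint64_t signature = checksum(file_data, sizeof(file_data));   /* signing */

    file_data[0] ^= 0x01;                           /* simulate a silent bit flip */

    if (checksum(file_data, sizeof(file_data)) != signature)       /* scrubbing */
        puts("bitrot detected: checksum mismatch, flag object as bad");
    else
        puts("object is clean");
    return 0;
}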


Sharding

● Solves fragmentation in Gluster volumes
● Chunks files and places the shards on any node that has space
● Suitable for large-file workloads requiring parallelism (chunking sketch below)
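Conceptually, sharding turns one large file into fixed-size chunks that can be placed, healed and rebalanced independently. The sketch below shows only the chunking step; the 4 MiB size and the "<name>.N" naming are invented for the example, while the shard translator uses its own block size and keeps shards under a hidden directory on the volume.

/* Illustrative chunking: split an input file into fixed-size shard files.
 * The shard size and "<name>.N" naming are only for the example. */
#include <stdio.h>
#include <stdlib.h>

#define SHARD_SIZE (4 * 1024 * 1024)

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    FILE *in = fopen(argv[1], "rb");
    if (!in) {
        perror("fopen");
        return 1;
    }

    char *buf = malloc(SHARD_SIZE);
    size_t n;
    unsigned shard = 0;

    while (buf && (n = fread(buf, 1, SHARD_SIZE, in)) > 0) {
        char name[4096];
        snprintf(name, sizeof(name), "%s.%u", argv[1], shard++);

        FILE *out = fopen(name, "wb");              /* one chunk per shard file */
        if (!out) {
            perror("fopen");
            break;
        }
        fwrite(buf, 1, n, out);
        fclose(out);
    }

    free(buf);
    fclose(in);
    printf("wrote %u shard(s)\n", shard);
    return 0;
}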


Netgroups and Exports for NFS in 3.7

● More advanced configuration for authentication based on an /etc/exports-like syntax

● Support for netgroups

● Patches written at Facebook

● Forward ported from 3.4 to 3.7


NFS Ganesha improvements

● Supports active-active NFSv4 and NFSv4.1 with Kerberos

● pNFS support for Gluster

● New upcall infrastructure added in Gluster

● Gluster CLI to manage NFS Ganesha

● High-Availability based on Pacemaker and Corosync


Performance enhancements

● Small file
  ● Multi-threaded epoll
  ● In-memory metadata caching on bricks
  ● Improvements for directory listing
● Rebalance
  ● Parallel rebalance
  ● More efficient disk crawling
● Data tiering


TrashCan

● Protection from fat-finger deletions and truncations
● Stored in a designated directory within the brick
● Captures deletions performed by maintenance operations like self-healing, rebalance etc.


Arbiter Replication

● 2 data copies, 3 metadata copies
● Additional metadata copy used for arbitration
● Greatly minimizes the possibility of split-brain (toy quorum check below)
● Existing replica 2 volumes can be converted to arbiter replica volumes
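As a rough illustration of why the arbiter helps, the toy check below allows a write only when a majority of the three bricks (two data bricks plus the arbiter) is reachable, which is the intuition behind client quorum; AFR's real arbiter rules are more involved.

/* Toy quorum check for a replica 3 arbiter volume (2 data bricks + 1
 * arbiter holding only metadata): allow a write only when a majority of the
 * bricks is up, so two partitions can never both accept writes. */
#include <stdbool.h>
#include <stdio.h>

static bool write_allowed(bool data0_up, bool data1_up, bool arbiter_up)
{
    int up = (int)data0_up + (int)data1_up + (int)arbiter_up;
    return up >= 2;                    /* majority of the 3 bricks */
}

int main(void)
{
    printf("all bricks up:        %s\n", write_allowed(true, true, true)   ? "write allowed" : "write blocked");
    printf("one data brick down:  %s\n", write_allowed(true, false, true)  ? "write allowed" : "write blocked");
    printf("only arbiter up:      %s\n", write_allowed(false, false, true) ? "write allowed" : "write blocked");
    return 0;
}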


Split-brain Resolution

● Existing behavior – EIO on access
● Administrative policies to automatically resolve split-brain
● Users can view split objects & resolve split-brain


Other major improvements

● Support for inode quotas
● Volume clone from snapshot
● Snapshot scheduling
● glusterfind – 'Needle in a haystack'
● Loads of bug fixes


Features beyond GlusterFS 3.7

● HyperConvergence with oVirt

● Compression (at rest)

● De-duplication

● Overlay translator

● Multi-protocol support with NFS, FUSE and SMB

● Native ReST APIs for gluster management

● More integration with OpenStack, Containers


Hyperconverged oVirt – Gluster

● Server nodes are used both for virtualization and storage

● Support for both scaling up (adding more disks) and scaling out (adding more hosts)

[Diagram: nodes hosting VMs, storage and the engine; the bricks on these nodes form a single GlusterFS volume.]

GlusterFS Native Driver – OpenStack Manila

● Supports Manila's certificate-based access type

● Provision shares that use the 'glusterfs' protocol

● Multi-tenant

● Separation using tenant specific certificates

● Supports certificate chaining and cipher lists


GlusterFS Native Driver – OpenStack Manila

[Diagram: Manila orchestration provisions shares (Admin, Tech, HR) from a Gluster pool; each share only allows its own tenant network, e.g. 10.1.1.1-24 for Admin, 192.168.1.2 for Tech and 10.1.2.1-12 for HR.]

GlusterFS Ganesha Driver for OpenStack Manila

[Diagram: each tenant has a service VM running an NFS-Ganesha server with the Gluster FSAL, exporting the GlusterFS storage backend to that tenant's Nova VMs.]

Gluster 4.0


Gluster 4.0

● Address higher scale
  ● Not just higher node count, but also correctness and consistency at higher node count
  ● glusterd, DHT changes
● Support more heterogeneous environments
  ● Multiple OSes, multiple storage types, multiple networks, NSR
● Increase deployment flexibility
  ● e.g. data classification, multiple replication/erasure types and levels


New Style Replication

● Server-side replication
● Controlled by a designated “leader”, also known as the sweeper
● Advantages
  ● Bandwidth usage of the client network optimized for direct (FUSE) mounts
  ● Avoidance of split-brain



DHTv2

● Improved scalability and performance for all directory-entry operations.

● High consistency and reliability for conflicting directory-entry operations, and for layout repair.

● Better performance for rebalance


Thousand node glusterd

● Scale glusterd to manage more than 1000 nodes

● Paxos/Raft for membership and configuration management


Gluster 4.0 – What's next?

● Code name for the release? Open to suggestions

● Submissions for feature proposals are still open!

● Implementation of key features in progress.

● Voting on feature proposals during design summit

● Tentatively planned for May 2016


Resources

Mailing lists: gluster-users@gluster.org, gluster-devel@nongnu.org

IRC: #gluster and #gluster-dev on freenode

Web: http://www.gluster.org

Thank You!

vijay at gluster.org
Twitter: @vbellur

BACKUP


Striped Volume

● Aggregation of chunks of files placed on various bricks.
● Normally recommended for workloads involving very large files and parallel access.
● The WIP sharding feature is likely to supersede striped volumes.


GlusterFS concepts – Trusted Storage Pool

● a.k.a. cluster

● glusterd uses a membership protocol to form trusted storage pool.

● Trusted Storage Pool is invite only.

● Membership information used for determining quorum.

● Members can be dynamically added and removed from the pool.


How does a distributed volume work?


GlusterFS concepts – Bricks

● A brick is the combination of a node and an export directory, e.g. hostname:/dir
● Each brick inherits the limits of the underlying filesystem
● No limit on the number of bricks per node
● Data and metadata are stored on bricks

[Diagram: three storage nodes exporting /export1-/export3 (3 bricks), /export1-/export5 (5 bricks) and /export1-/export3 (3 bricks) respectively.]
