
When IaaS Meets DFS

Storage component considerations and requirements in an IaaS platform

Huang Chih-Chieh (soem) @ NEET


Outline

• What is IaaS
• What is OpenStack
• Storage Types in IaaS
• Ceph
  – Issues

• Summary


WHAT IS IAAS


Cloud Service Models Overview

• What if you want to have an IT department?
  – Similar to building a new house in the previous analogy
    • You can rent virtualized infrastructure and build your own IT system on top of those resources, which you fully control.
    • Technically speaking, this is the Infrastructure as a Service (IaaS) solution.
  – Similar to buying an empty house in the previous analogy
    • You can develop your IT system directly on a cloud platform, without caring about any lower-level resource management.
    • Technically speaking, this is the Platform as a Service (PaaS) solution.
  – Similar to living in a hotel in the previous analogy
    • You can directly use existing IT system solutions provided by a cloud application service provider, without knowing any details of how these services are implemented.
    • Technically speaking, this is the Software as a Service (SaaS) solution.


From IaaS to PaaS

(Figure: which layers you manage versus which the provider manages, across Networking, Storage, Servers, Virtualization, OS, Middleware, Runtime, Data, and Applications)

• Traditional IT: you manage every layer yourself.
• IaaS: the provider manages Networking, Storage, Servers, and Virtualization; you manage the OS, Middleware, Runtime, Data, and Applications.
• PaaS: the provider manages everything up through the Runtime; you manage only your Data and Applications.


Service Model Overview


WHAT IS OPENSTACK


OpenStack Storage

cinder-volume


STORAGE TYPES IN IAAS


OpenStack Storage

• Instance Storage Provider (see the sketch below)
  – Off Compute Node Storage
    • Shared File System
  – On Compute Node Storage
    • Shared File System
  – On Compute Node Storage
    • Non-shared File System
• Image Repository
  – Glance
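For illustration only (the export path, server names, and instance ID below are placeholders, not taken from the slides): "off compute node" shared-file-system storage usually just means every compute node mounts the same export over Nova's instance directory, while the non-shared variant keeps that directory on local disk and copies images between hosts when needed.

  # Shared: all compute nodes mount one export over Nova's default
  # instance directory (/var/lib/nova/instances).
  mount -t nfs nfs-server:/export/nova-instances /var/lib/nova/instances

  # Non-shared: the directory stays on local disk; a disk image is
  # copied to another compute node only when needed (e.g. migration).
  scp -r /var/lib/nova/instances/instance-0001 compute-02:/var/lib/nova/instances/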


OpenNebula Storage


SSCloud Storage

• Properties
  – File System vs. Block Device
  – Shared vs. Non-shared
• Four types:
  – Shared File System
  – Non-Shared File System
  – Shared Block Device
  – Non-Shared Block Device


SSCloud Storage

• File System
  – Bootable image / small image
    • Small image
      – FS cache in host memory
      – Random accessing
  – Type
    • Shared
      – DFS (Ceph), NFS (nfsd)
    • Non-Shared
      – Local filesystem + scp


SSCloud Storage

• Block device (with LVM)
  – Additional space / large image
  – Heavily accessed images
    • Large image
      – No FS cache in host memory => saves host memory
      – Large chunk access
        » Hadoop (64 MB~128 MB per file)
  – Type
    • Shared
      – iSCSI + LVM (see the sketch below)
    • Non-Shared
      – LVM
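A minimal sketch of the two block-device variants, assuming a storage host named storage-01 that exports a LUN over iSCSI; the device paths, volume group name, and IQN are placeholders:

  # Non-shared: carve VM volumes out of a local disk with LVM.
  pvcreate /dev/sdb
  vgcreate vg_vm /dev/sdb
  lvcreate -L 20G -n vm01-disk vg_vm       # volume appears as /dev/vg_vm/vm01-disk

  # Shared: log in to a remote iSCSI LUN first, then run LVM on top of it.
  iscsiadm -m discovery -t sendtargets -p storage-01
  iscsiadm -m node -T iqn.2013-01.tw.example:storage.lun1 -p storage-01 --login
  # the LUN shows up as a new /dev/sd* device; pvcreate/vgcreate/lvcreate as above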


Storage Systems

• File Based
  – NFS
  – DFS
    • Lustre
    • GlusterFS
    • MooseFS
    • Ceph


Storage Systems

• Block Based
  – iSCSI + LVM
  – DRBD
  – VastSky
  – KPS: Kernel-based Programmable Storage System
  – Ceph


Storage Systems

• Object Based
  – OpenStack Swift
  – Hadoop HDFS
    • with WebHDFS (1.0.4-stable) or HttpFS (2.0.3-alpha) (see the curl sketch below)
  – Ceph
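As a small illustration of the HTTP access mentioned above (host name and path are placeholders; in Hadoop 1.0.4 WebHDFS is served from the NameNode's web port, 50070 by default):

  # List a directory through WebHDFS.
  curl -i "http://namenode:50070/webhdfs/v1/user/hadoop?op=LISTSTATUS"

  # Read a file; the NameNode redirects the request to a DataNode.
  curl -i -L "http://namenode:50070/webhdfs/v1/user/hadoop/data.txt?op=OPEN"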


CEPH

CEPH: THE FUTURE OF STORAGE™


Ceph

• Overview
  – Ceph is a free-software distributed file system.
  – Ceph's main goals are to be POSIX-compatible and completely distributed, without a single point of failure.
  – Data is seamlessly replicated, making it fault tolerant.
• Release
  – On July 3, 2012, the Ceph development team released Argonaut, the first release of Ceph with long-term support.


Ceph

• Introduction
  – Ceph is a distributed file system that provides excellent performance, reliability, and scalability.
  – Object-based storage.
  – Ceph separates data and metadata operations by eliminating file allocation tables and replacing them with generating functions.
  – Ceph utilizes a highly adaptive distributed metadata cluster, improving scalability.
  – Clients use OSDs to access data directly, giving high performance.


Ceph

• Object-based Storage


Ceph

• Goals
  – Scalability
    • Storage capacity, throughput, client performance. Emphasis on HPC.
  – Reliability
    • Failures are the norm rather than the exception, so the system must have fault detection and recovery mechanisms.
  – Performance
    • Load balancing under dynamic workloads.


Ceph

• Ceph Filesystem
  – POSIX
    • File based
• Ceph Block Device
  – RBD
    • Block based
• Ceph Object Gateway
  – Swift / S3 RESTful API
    • Object based (see the sketch below)
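A rough sketch of how the block and object interfaces are consumed (pool, image, account, and gateway names are placeholders; the CephFS mount commands appear on the later "Mount Ceph" slide):

  # Block based: create an RBD image and map it to a local block device.
  rbd create mypool/vm-disk --size 10240   # size in MB
  rbd map mypool/vm-disk                   # shows up as a /dev/rbd* device

  # Object based: the Ceph Object Gateway (radosgw) speaks Swift- and
  # S3-compatible REST, e.g. with the swift command-line client:
  swift -A http://gateway-host/auth/v1.0 -U account:user -K secret-key stat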


Ceph

• Three main components
  – Clients: near-POSIX file system interface.
  – Cluster of OSDs: stores all data and metadata.
  – Metadata cluster: manages the namespace (file names).


Three Fundamental Designs

1. Separating Data and Metadata
  – Separation of file metadata management from the storage of file data.
  – Metadata operations are collectively managed by a metadata server cluster.
  – Users can access OSDs directly to get data, using the metadata.
  – Ceph removes data allocation lists entirely.
  – CRUSH is used to assign objects to storage devices.


Separating Data and Metadata

• Ceph separates data and metadata operations


Separating Data and Metadata

• Data Distribution with CRUSH
  – To avoid imbalance (idle or empty OSDs) or load asymmetries (hot data on new devices), new data is distributed randomly.
  – Ceph maps objects into placement groups (PGs); PGs are assigned to OSDs by CRUSH (see the sketch below).
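The object-to-PG-to-OSD mapping that CRUSH computes can be inspected from the command line; a small sketch (the pool and object names are made up):

  # Ask the cluster where CRUSH places a given object: the command
  # prints the PG the object hashes to and the OSDs that hold it.
  ceph osd map rbd some-object

  # Show the OSD hierarchy (hosts, racks, ...) that CRUSH draws from.
  ceph osd tree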


Dynamic Distributed Metadata Management

2. Dynamic Distributed Metadata Management
  – Ceph utilizes a metadata cluster architecture based on Dynamic Subtree Partitioning (for workload balance).
  – Dynamic Subtree Partitioning
    • Most file systems use static subtree partitioning or a simple hash function, which leads to imbalanced workloads.
    • Ceph's MDS cluster is based on dynamic subtree partitioning, which balances workloads.


Reliable Distributed Object Storage

3. Reliable Autonomic Distributed Object Storage
  – Replication.
  – Failure detection and recovery.


Client

• Client Operation
  – File I/O and Capabilities
    • The client sends an open-file request to the MDS.
    • The MDS translates the file name into an inode (inode number, file owner, mode, size, …) and, if the check is OK, returns the inode number.
    • The client maps file data into objects (CRUSH) and accesses the OSDs directly.


Client

• Client Synchronization
  – If multiple clients (readers and writers) use the same file, any previously issued read and write capabilities are revoked until the OSD checks are OK.
    • Traditional: update serialization → bad performance.
    • Ceph: uses the HPC (high-performance computing community) extensions, so clients can read and write different parts of the same file (different objects) → increased performance.


Metadata

• Dynamically Distributed Metadata
  – MDSs use journaling
    • Repetitive metadata updates are handled in memory.
    • Optimizes the on-disk layout for read access.
  – Each MDS has its own journal; when an MDS fails, another node can quickly recover from that journal.
  – Inodes are embedded directly within directories.
  – Each directory's content is written to the OSD cluster using the same striping and distribution strategy as metadata journals and file data.


Replica

• Replication
  – Data is replicated in terms of PGs.
  – Clients send all writes to the first non-failed OSD in an object's PG (the primary), which assigns a new version number for the object and PG and forwards the write to any additional replica OSDs (see the sketch below).
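The number of replicas is a per-pool setting; a minimal sketch (the pool name is a placeholder):

  # Keep three copies of every object in this pool; the primary OSD of
  # each PG forwards writes to the other replicas, as described above.
  ceph osd pool set mypool size 3
  ceph osd pool get mypool size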


Failure detection

• Failure detection
  – When an OSD does not respond → it is marked "down".
  – Its responsibilities pass to the next OSD.
  – If the first OSD does not recover → it is marked "out".
  – Another OSD joins to take its place.


Failure Recovery

• Recovery and Cluster Updates (see the sketch below)
  – If OSD1 crashes → it is marked "down".
  – OSD2 takes over as primary.
  – If OSD1 recovers → it is marked "up".
  – OSD2 receives the update request and sends the new version of the data to OSD1.
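The up/down and in/out transitions on these two slides can be watched, and forced by hand, with the standard CLI; a small sketch:

  # Stream cluster events (OSDs marked down/out, PGs peering, recovery).
  ceph -w

  # Show every OSD together with its up/down status.
  ceph osd tree

  # Manually mark OSD 1 out (its data is re-replicated elsewhere) and back in.
  ceph osd out 1
  ceph osd in 1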


EVERYTHING LOOKS GOOD, BUT…


Issues

• Under heavy development
  – 0.48
    • Monitors waste CPU.
    • Recovery can end in an inconsistent state.
  – 0.56
    • Bugs in the file-extend behavior.
      – Qcow2 images get I/O errors in the VM's kernel,
        » but everything looks fine in Ceph's log.
  – 0.67
    • ceph-deploy (see the sketch below)
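For reference, a ceph-deploy run from that era looks roughly like the following; the host names and disk paths are placeholders, and the exact sub-commands vary between ceph-deploy versions:

  ceph-deploy new mon-01                     # write an initial ceph.conf
  ceph-deploy install mon-01 osd-01 osd-02   # install Ceph on the nodes
  ceph-deploy mon create mon-01              # create and start the monitor
  ceph-deploy gatherkeys mon-01              # collect the bootstrap keyrings
  ceph-deploy osd prepare osd-01:/dev/sdb    # prepare a disk for an OSD
  ceph-deploy osd activate osd-01:/dev/sdb1  # activate the prepared OSD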


Issues

• Correct the time (see the sketch below)
  – 0.56
    • OSDs waste CPU.
      – ntpdate tock.stdtime.gov.tw
  – 0.67
    • health HEALTH_WARN clock skew detected on mon.1
      – ntpdate tock.stdtime.gov.tw
      – run an NTP server
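A minimal way to clear the warning, run on each monitor host (tock.stdtime.gov.tw is the NTP server named above; in practice a permanently running NTP daemon is preferable to one-shot syncs):

  ceph health detail              # shows which monitors are skewed
  ntpdate tock.stdtime.gov.tw     # one-shot clock sync on this host
  ceph health                     # HEALTH_WARN clears once the clocks agree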


Issues

• CephFS is not stable
  – Newly built systems can use Ceph RBD.
  – Traditional systems can only use the POSIX interface.
    • 0.56
      – Operations in a folder may freeze,
        » if that folder is under heavy load.
      – Bugs in the file-extend behavior.

REF: http://www.sebastien-han.fr/blog/2013/06/24/what-i-think-about-cephfs-in-openstack/


Issues

• Mount Ceph with (filled-in forms below)
  – Kernel module
    • mount -t ceph …
  – FUSE
    • ceph-fuse -c /etc/ceph/ceph.conf …
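Filled-in forms of the two commands, with a placeholder monitor address and the default admin keyring; this is a sketch, not the exact command line behind the "…" above:

  # Kernel client: needs the ceph module and the admin secret.
  mount -t ceph 192.168.1.10:6789:/ /mnt/ceph -o name=admin,secretfile=/etc/ceph/admin.secret

  # FUSE client: reads the monitor list and keyring from ceph.conf.
  ceph-fuse -c /etc/ceph/ceph.conf /mnt/ceph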


Issues

root@SSCloud-01:/# cephfs /mnt/dev set_layout -p 5
Segmentation fault

cephfs is not a super-friendly tool right now, sorry! :(
I believe you will find it works correctly if you specify all the layout parameters, not just one of them.

root@SSCloud-01:/# cephfs -h
not enough parameters!
usage: cephfs path command [options]*
Commands:
 show_layout    -- view the layout information on a file or dir
 set_layout     -- set the layout on an empty file, or the default layout on a directory
 show_location  -- view the location information on a file
 map            -- display file objects, pgs, osds
Options:
 Useful for setting layouts:
  --stripe_unit, -u:  set the size of each stripe
  --stripe_count, -c: set the number of objects to stripe across
  --object_size, -s:  set the size of the objects to stripe across
  --pool, -p:         set the pool to use
 Useful for getting location data:
  --offset, -l:       the offset to retrieve location data for

root@SSCloud-01:/# cephfs /mnt/dev set_layout -u 4194304 -c 1 -s 4194304 -p 5
root@SSCloud-01:/# cephfs /mnt/dev show_layout
layout.data_pool:     5
layout.object_size:   4194304
layout.stripe_unit:   4194304
layout.stripe_count:  1


SUMMARY


Summary

• There are three types of storage in IaaS
  – File-based, block-based, object-based
• Ceph is a good choice for IaaS
  – OpenStack can store images in the Ceph Block Device (RBD)
  – Cinder or nova-volume can boot a VM
    • using a copy-on-write clone of an image (see the sketch below)
• CephFS is still under heavy development
  – However, each newer version is better.
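Under the hood, the copy-on-write boot path in the summary relies on RBD layering; a hand-run sketch of what Cinder/Nova do with a Glance image stored in RBD (pool, image, and snapshot names are placeholders, and the parent must be a format-2 RBD image):

  # The Glance image lives as an RBD image; snapshot and protect it.
  rbd snap create images/ubuntu-12.04@base
  rbd snap protect images/ubuntu-12.04@base

  # Each new VM disk is a copy-on-write clone of that snapshot: created
  # instantly, storing only the blocks the VM later changes.
  rbd clone images/ubuntu-12.04@base volumes/vm-0001-disk
  rbd ls volumes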