OpenDedup / SDFS

OpenDedup
Open Source Block-Level Deduplication

Copyright

© 2011 Adam Tauno Williams (awilliam@whitemice.org)

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. You may obtain a copy of the GNU Free Documentation License from the Free Software Foundation by visiting their Web site or by writing to: Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

If you find this document useful or further its distribution, we would appreciate you letting us know.

The Proverb

“As storage becomes less and less expensive, storage becomes more and more expensive.” --unknown

Ways To Shrink Data

● Single-Instance Store (sketched in code below)
  ● rsync with --link-dest
  ● Cyrus IMAP singleinstancestore

● Instance Compression
  ● zip
  ● gzip
  ● bzip
  ● e[2/3]compr
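
The single-instance idea can be sketched in a few lines: a file is keyed by its content hash, and every duplicate becomes a hard link to one pooled copy. This only illustrates the principle, not how rsync --link-dest or Cyrus implement it; the store() function and paths are invented for the example.

#!/usr/bin/env python
# Toy single-instance store: a file's content hash names one pooled copy,
# and every duplicate becomes a hard link to that copy.
import hashlib
import os
import shutil

def store(src_path, dst_path, pool_dir):
    with open(src_path, 'rb') as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    pooled = os.path.join(pool_dir, digest)
    if not os.path.exists(pooled):      # first time this content is seen
        shutil.copy2(src_path, pooled)
    os.link(pooled, dst_path)           # duplicates share one inode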

Traditional File-System

[Slide diagram: a filename resolves to an inode, the inode holds a block list, and the block list points directly at numbered data blocks (block 9,463; block 7,112; block 32,762; block 9,463; block 64,789; block 14,561); every reference is to a physical block, even where the same data occurs more than once.]

Block-Level Deduplication

[Slide diagram: a filename resolves to an inode, the inode holds a hash list (301a27, e4a033, e4a033, af7b43, e4a033), and each hash is looked up in a shared hash index that maps it to a stored block (block 10,112; block 99,017; block 72,465); the repeated hash e4a033 is stored only once. A code sketch of this idea follows.]
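
The diagram's idea can be made concrete with a toy chunk store: split data into fixed-size chunks, hash each chunk, and store a chunk only if its hash is not already in the index. This is a sketch of the technique, not SDFS's engine; the chunk size and hash are arbitrary choices.

#!/usr/bin/env python
# Toy block-level deduplicating store: the hash index maps a chunk hash to
# the stored bytes, and a file is kept only as its ordered list of hashes.
import hashlib

CHUNK_SIZE = 4096      # fixed 4 KB chunks for the example
hash_index = {}        # chunk hash -> chunk bytes (the "chunk store")

def write_file(data):
    """Store 'data' and return the chunk-hash list that reconstructs it."""
    hashes = []
    for off in range(0, len(data), CHUNK_SIZE):
        chunk = data[off:off + CHUNK_SIZE]
        h = hashlib.sha256(chunk).hexdigest()
        if h not in hash_index:        # only unseen content consumes space
            hash_index[h] = chunk
        hashes.append(h)
    return hashes

def read_file(hashes):
    """Rebuild the original data from its chunk-hash list."""
    return b''.join(hash_index[h] for h in hashes)

blob = b'A' * 8192 + b'B' * 4096       # two identical 4 KB chunks plus one more
refs = write_file(blob)
assert read_file(refs) == blob
print('%d chunk references, %d chunks stored' % (len(refs), len(hash_index)))
# -> 3 chunk references, 2 chunks stored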

Block-Level Deduplication

● Reduced Storage Requirements
● Little impact on read performance
  ● May actually be faster
● Transparent write operations
  ● Cost can be off-loaded from the host
● No file-system meta-data bloat
  ● A problem with rsync --link-dest
● Efficient Replication

Block-Level Deduplication

● Everything hangs on the index
  ● Corrupt the index, lose everything.
● Recovery next to impossible
  ● Disk images are even less coherent.
● Memory intensive
  ● Requires very large hash-tables.
● Some CPU cost

Types Of Data

● Things that de-duplicate well:
  ● Backups
  ● Virtual Disk Images (VMDKs)
  ● Documents [doc/docx/odt/xls/ods/ppt/odp]
  ● E-Mail
● Things that do not de-duplicate well:
  ● Multimedia [music/video]
  ● Compressed Images [JPEG/PNG/GIF]
  ● Encrypted Data

OPENDEDUP

● Also known as “SDFS”
● Supports volume replication
● Distributed storage
  ● RAIN
● Scalable
  ● In excess of a petabyte of data
● File cloning
● Block sizes as small as 4K
● User space

Requirements

● fuse 2.8 or greater
● Java 1.7
● attr

Install

tar -C /opt -xzvf /tmp/sdfs-1.0.7.tar.gz
mv /opt/sdfs-bin /opt/sdfs
cd /opt/sdfs
tar xzvf /tmp/jre-7-fcs-bin-b147-linux-x64-27_jun_2011.tar.gz
ln -s jre1.7.0/ jre
export JAVA_HOME=/opt/sdfs/jre
export PATH=/opt/sdfs/jre/bin:/opt/sdfs:$PATH
export CLASSPATH=/opt/sdfs/lib

Creating a local volume

--base-path <PATH>
--chunk-store-data-location <base-path/chunkstore/chunks>
--dedup-db-store <base-path/ddb>
--chunk-store-encrypt <true|false>
--chunk-store-local <true|false>
--io-chunk-size <SIZE in kB; use 4 for VMDKs, defaults to 128>
--volume-capacity <SIZE [MB|GB|TB]>
--volume-name <STRING>

mkfs.sdfs --volume-name=volume1 \
    --volume-capacity=100GB \
    --base-path=/srv/sdfs/volume1 \
    --io-chunk-size 64

Mounting a volume

$ mount.sdfs -v volume1 -m /mnt1
Running SDFS Version 1.0.7
reading config file = /etc/sdfs/volume1-volume-cfg.xml
...

$ df -k
...
volume1-volume-cfg.xml 104857600 0 104857600 0% /mnt1
$ time cp -pvR /vms/Vista-100 .
...
`/vms/Vista-100/Primary Disk 001-000004-s029.vmdk' -> \
    `./Vista-100/Primary Disk 001-000004-s029.vmdk'

Seeing the metadata

$ getfattr -d "Primary Disk 001-s001.vmdk"
# file: Primary Disk 001-s001.vmdk
user.dse.maxsize="107898470400"
user.dse.size="3519938560"
user.sdfs.ActualBytesWritten="1,523,056,640"
user.sdfs.BytesRead="0"
user.sdfs.DuplicateData="15,925,248"
user.sdfs.VMDK="false"
user.sdfs.VirtualBytesWritten="1,538,981,888"
user.sdfs.dedupAll="true"
user.sdfs.dfGUID="c39f5c25-328a-456b-a329-77f...
user.sdfs.file.isopen="true"
user.sdfs.fileGUID="75f9e4f4-56a4-48b0-a3a4-a2...

Can we really use the metadata?

#!/usr/bin/env python
# Dump every extended attribute SDFS exposes on one file in the volume.
import xattr
f = open('/mnt1/Vista-100/Primary Disk 001-000003-s045.vmdk')
z = xattr.xattr(f)
for x, y in z.items():
    print x, y

user.sdfs.file.isopen false
user.sdfs.ActualBytesWritten 0
user.sdfs.VirtualBytesWritten 131072
user.sdfs.BytesRead 0
user.sdfs.DuplicateData 131072
user.dse.size 11560353792
user.dse.maxsize 107898470400
…

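Because the counters are exposed as ordinary extended attributes, a small script can report how much of a file was deduplicated. A minimal sketch using the same python xattr module as above (the path is a made-up example; the counters are the ones shown in the getfattr output):

#!/usr/bin/env python
# Rough per-file deduplication figure from the SDFS xattr counters.
import xattr

attrs = xattr.xattr('/mnt1/Vista-100/Primary Disk 001-s001.vmdk')  # example path

def counter(name):
    raw = attrs[name]
    if not isinstance(raw, str):        # the xattr module returns bytes on Python 3
        raw = raw.decode()
    return int(raw.replace(',', ''))    # some counters carry thousands separators

virtual = counter('user.sdfs.VirtualBytesWritten')
duplicate = counter('user.sdfs.DuplicateData')
if virtual:
    print('%.1f%% of bytes written were duplicates' % (100.0 * duplicate / virtual))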

Going network!

[Slide diagram: on the client, a Volume is exposed through FUSE and handled by the DeDup Engine; on the server, a Storage Engine sits in front of the storage; the two halves are linked over TCP, locally, or via HTTP.]

$ umount /mnt1
$ rm /etc/sdfs/volume1-volume-cfg.xml
$ rm -fR /srv/sdfs/*

/etc/sdfs/hashserver-config.xml

<chunk-server>
  <network port="2222"
           hostname="0.0.0.0"
           use-udp="false"/>
  <locations chunk-store="/srv/sdfs/dchunks/chunkstore/chunks"
             hash-db-store="/srv/sdfs/ddb/hdb"/>
  <chunk-store pre-allocate="false"
               chunk-gc-schedule="0 0 0/2 * * ?"
               eviction-age="4"
               allocation-size="161061273600"
               page-size="4096"
               read-ahead-pages="8"/>
</chunk-server>

Start Our Own Hash Server

$ export JAVA_HOME=/opt/sdfs/jre
$ export PATH=/opt/sdfs/jre/bin:/opt/sdfs:$PATH
$ export CLASSPATH=/opt/sdfs/lib
$ startDSEService.sh /etc/sdfs/hashserver-config.xml

$ netstat --listen --tcp --numeric --program
...
tcp   0   0 :::2222   :::*   LISTEN   13414/java
...

$ mkfs.sdfs --volume-name=volume1 \
    --volume-capacity=100GB \
    --io-chunk-size 4

Create a “remote” volume

● Edit /etc/sdfs/volume1-volume-cfg.xml
● Change the “enabled” attribute of the “local-chunkstore” element to “false”

<local-chunkstore allocation-size="107374182400"
                  chunk-gc-schedule="0 0 0/4 * * ?"
                  chunk-store="/opt/sdfs/volumes/volume1/chunkstore/chunks"
                  chunk-store-dirty-timeout="1000"
                  chunk-store-read-cache="5"
                  chunkstore-class="org.opendedup.sdfs.filestore.FileChunkStore"
                  enabled="false"
                  encrypt="false"
                  encryption-key="q@98lYEN@mqb6jkj2pV9gZlzSv3@WsUHh4J"
                  eviction-age="6"
                  gc-class="org.opendedup.sdfs.filestore.gc.PFullGC"
                  hash-db-store="/opt/sdfs/volumes/volume1/chunkstore/hdb"
                  pre-allocate="false"
                  read-ahead-pages="8"/>

/etc/sdfs/routing-config.xml

<routing-config>
  <servers>
    <server name="server1" host="127.0.0.1" port="2222"
            enable-udp="false" compress="false" network-threads="8"/>
    <server name="server2" host="127.0.0.1" port="2222"
            enable-udp="false" compress="false" network-threads="8"/>
  </servers>
  <chunks>
    <chunk name="00" server="server1"/>
    <chunk name="01" server="server1"/>
    …
    <chunk name="fe" server="server2"/>
    <chunk name="ff" server="server2"/>
  </chunks>
</routing-config>

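The <chunks> section maps every two-hex-digit hash prefix to a DSE, so the client can decide which server owns a chunk from nothing but the chunk's hash. A rough sketch of that lookup (the even 50/50 split and the SHA-256 hash are assumptions; the config above elides the middle entries, and SDFS's own hashing and wire protocol are not shown):

#!/usr/bin/env python
# Route a chunk by the first two hex digits of its hash, mirroring the
# <chunk name="00" server="server1"/> ... <chunk name="ff" server="server2"/>
# entries in routing-config.xml.  The split point is an assumption.
import hashlib

routing = {}
for i in range(256):
    routing['%02x' % i] = 'server1' if i < 128 else 'server2'

def server_for(chunk):
    prefix = hashlib.sha256(chunk).hexdigest()[:2]
    return routing[prefix]

print(server_for(b'a 4KB chunk of data'))   # -> server1 or server2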

Mount our remote store

$ mount.sdfs -r /etc/sdfs/routing-config.xml \
    -v volume1 -m /mnt1

$ df -k
...
volume1-volume-cfg.xml 104857600 0 104857600 0% /mnt1
$ cp -pvR /iso/Heretic2.iso /mnt1/Heretic2-1.iso
$ cp -pvR /iso/Heretic2.iso /mnt1/Heretic2-1.iso

Do we have two copies?

After the first copy:

$ getfattr -d /mnt1/Heretic2-1.iso
…
user.sdfs.ActualBytesWritten="242,933,760"
user.sdfs.DuplicateData="335,872"
user.sdfs.VirtualBytesWritten="243,269,632"
...

After the second copy:

$ getfattr -d /mnt1/Heretic2-1.iso
…
user.sdfs.ActualBytesWritten="0"
user.sdfs.DuplicateData="243,269,632"
user.sdfs.VirtualBytesWritten="243,269,632"
...

The server's chunkstore

$ cd /srv/sdfs
$ du -ks *
10,670,192  dchunks
184,972     ddb
$ ls -l dchunks/chunkstore/chunks/
-rw-r--r--  10,915,586,048  Jul 27 17:11  chunks.chk
$ ls -l ddb/hdb/
-rw-r--r--  189,210,598  Jul 27 17:11  hashstore-sdfs

32TB of data at 128KB chunks requires about 8GB of RAM; 1TB at 4KB chunks needs the same 8GB, because the chunk count is identical.

Recommendations

● Memory
  ● 2GB allocation OK for:
    ● 200GB @ 4KB chunks
    ● 6TB @ 128KB chunks
  ● Edit mount.sdfs/startDSE.sh to increase
    ● Change this: "-Xmx2g"
  ● Each chunk requires 25 bytes
    ● footprint = (volume / chunk-size) * 25 (worked through below)
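
The footprint rule of thumb is easy to check. A quick calculation (the 25-bytes-per-chunk figure is the slide's own; real heap use adds JVM overhead, which is roughly where the 8GB figure for 32TB @ 128KB comes from):

#!/usr/bin/env python
# footprint = (volume / chunk-size) * 25 bytes, per the rule of thumb above.
def index_footprint(volume_bytes, chunk_bytes):
    return (volume_bytes // chunk_bytes) * 25

GB = 1024 ** 3
TB = 1024 ** 4
print(index_footprint(200 * GB, 4 * 1024) / float(GB))    # ~1.2 GB: 200GB @ 4KB
print(index_footprint(6 * TB, 128 * 1024) / float(GB))    # ~1.2 GB: 6TB @ 128KB
print(index_footprint(32 * TB, 128 * 1024) / float(GB))   # ~6.3 GB: 32TB @ 128KB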

Janitorial Jobs

● How do chunks get eliminated? (a toy model follows below)
  ● FUSE tells the DSE what blocks are in use.
  ● DSE checks for unclaimed blocks.
    ● Every four hours.
    ● For 8 hours upon mount.
  ● Blocks unclaimed for 10 hours are released.
● Configuration options:
  ● FUSE: claim-hash-schedule
  ● DSE: chunk-gc-schedule
  ● Both must be more frequent than eviction-age.
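
A toy model of that claim/evict cycle, only to make the timing rules concrete (nothing here is SDFS code; the data structures are invented for illustration):

#!/usr/bin/env python
# Claim/evict cycle: the FUSE side periodically claims every hash still
# referenced by a file; the DSE releases chunks whose last claim is older
# than eviction-age hours.
import time

EVICTION_AGE_HOURS = 10
last_claimed = {}               # chunk hash -> time of the most recent claim

def claim(hashes_in_use):
    """Run on the FUSE side, on the claim-hash-schedule."""
    now = time.time()
    for h in hashes_in_use:
        last_claimed[h] = now

def evict_unclaimed(chunk_store):
    """Run on the DSE, on the chunk-gc-schedule."""
    cutoff = time.time() - EVICTION_AGE_HOURS * 3600
    for h in list(chunk_store):
        if last_claimed.get(h, 0) < cutoff:
            del chunk_store[h]  # unreferenced long enough: release the chunk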

Calling the janitor

● A chunk-store cleaning can be manually requested.

$ setfattr -n user.cmd.cleanstore \
    -v 5555:15 /var/lib/pgsql

(The value after the colon is in minutes; /var/lib/pgsql is the mount point of the SDFS volume.)

● Many parameters can be tweaked via setfattr.
● Deduplication can be disabled on specific files.

$ setfattr -n user.cmd.dedupAll \
    -v 556:false <path to file>

$ setfattr -n user.cmd.cleanstore \
    -v 5555:15 /var/lib/pgsql

Making a cloud drive on Amazon S3

$ mkfs.sdfs --volume-name=<volume name> \
    --volume-capacity=<volume capacity> \
    --aws-enabled=true \
    --aws-access-key=<the aws assigned access key> \
    --aws-bucket-name=<bucket name> \
    --aws-secret-key=<assigned aws secret key> \
    --chunk-store-encrypt=true

$ mount.sdfs <volume name> <mount point>

Recommended