ZFS Cheatsheet
This is a quick and dirty cheatsheet on Sun's ZFS
Directories and Files
error messages: /var/adm/messages and the console
States
DEGRADED One or more top-level devices are in the degraded state because they have gone offline. Sufficient replicas exist to keep functioning
FAULTED One or more top-level devices are in the faulted state because they have gone offline. Insufficient replicas exist to keep functioning
OFFLINE The device was explicitly taken offline by the "zpool offline" command
ONLINE The device is online and functioning
REMOVED The device was physically removed while the system was running
UNAVAIL The device could not be opened
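As a rough sketch of how the states above can be checked from a script: the helper below (the name unhealthy_pools is made up for this example) reads "name health" pairs, in the format printed by "zpool list -H -o name,health", and reports every pool that is not ONLINE. The pool names in the demonstration line are invented.

```shell
# Hypothetical helper: reads "name<TAB>health" lines on stdin and reports
# every pool whose health is anything other than ONLINE.
unhealthy_pools() {
    awk '$2 != "ONLINE" { print $1 ": " $2 }'
}

# Typical use on a live system (needs ZFS installed):
#   zpool list -H -o name,health | unhealthy_pools

# Self-contained demonstration with made-up pool names:
printf 'data01\tONLINE\ndata02\tDEGRADED\ndata03\tFAULTED\n' | unhealthy_pools
```

Reading from stdin keeps the filter usable without a pool to hand; on a real system "zpool status -x" gives a similar summary without any scripting.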
Scrubbing and Resilvering
Scrubbing Examines all data to discover hardware faults or disk failures. Only one scrub can run at a time, and you can start a scrub manually.
Resilvering The same concept as rebuilding or resyncing data onto new disks in an array. The smart thing about resilvering is that it does not rebuild the whole disk, only the data that is required (the data blocks, not the free blocks), which reduces the time needed to resync a disk. Resilvering is automatic when you replace disks, etc. If a scrub is already running it is suspended until the resilvering has finished, and then the scrub continues.
ZFS Devices
Disk A physical disk drive
File The absolute path of pre-allocated files/images
Mirror Standard raid-1 mirror
Raidz1/2/3
## Non-standard, distributed parity-based software raid levels. The common problem called the "write-hole" is eliminated because in
## raidz the data and the stripe are written simultaneously: if a power failure occurs in the middle of a write then you either have
## the data plus the parity or you don't. ZFS also supports self-healing: if it cannot read a bad block it will reconstruct it using
## the parity, and repair it or indicate that this block should not be used.
## You should keep the raidz array at a low power of two plus parity
raidz1 - 3, 5, 9 disks
raidz2 - 4, 6, 10, 18 disks
raidz3 - 5, 7, 11, 19 disks
## The more parity bits, the longer it takes to resilver an array. Standard mirroring does not have to create parity
## so it is quicker at resilvering
## raidz is more like raid3 than raid5 but does use parity to protect from disk failures
raidz/raidz1 - minimum of 3 devices (one parity disk), you can suffer a one disk loss
raidz2 - minimum of 4 devices (two parity disks), you can suffer a two disk loss
raidz3 - minimum of 5 devices (three parity disks), you can suffer a three disk loss
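The "power of two plus parity" rule above is plain arithmetic, so here is a small sketch that lists the array widths it produces. The function name raidz_widths and its arguments are made up for this example; it only prints numbers and does not touch any pool.

```shell
# Hypothetical helper: raidz_widths PARITY [MAX_DATA_DISKS]
# Prints the array widths made of a power-of-two number of data disks
# (2, 4, 8, ...) plus PARITY parity disks.
raidz_widths() {
    parity=$1
    max_data=${2:-16}
    data=2
    out=""
    while [ "$data" -le "$max_data" ]; do
        out="$out $((data + parity))"
        data=$((data * 2))
    done
    echo "${out# }"
}

raidz_widths 1 8    # raidz1
raidz_widths 2 16   # raidz2
raidz_widths 3 16   # raidz3
```

The second argument caps the number of data disks, which is an assumption of this sketch and not a ZFS limit.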
spare Hard drives marked as "hot spare" for ZFS raid. By default hot spares are not used when a disk fails; you must turn on the "autoreplace" feature.
cache Linux caching mechanisms use what are known as least recently used (LRU) algorithms: the block that has gone longest without being used is the first to be moved out of cache. The ZFS cache is different in that it tracks both recently used and frequently used block requests (the adaptive replacement cache, ARC). A cache device provides a level 2 adaptive read cache (L2ARC).
log
There are two terms here
ZFS intent log (ZIL) - a logging mechanism where all the data to be written is stored, then later flushed as a transactional write. This is similar to a journaling filesystem (ext3 or ext4).
Separate intent log (SLOG) - a separate logging device that caches the synchronous parts of the ZIL before flushing them to the slower disk. It does not cache asynchronous data (asynchronous data is flushed directly to the disk). If the SLOG exists the ZIL will be moved to it rather than residing on platter disk; everything in the SLOG will always be in system memory. Basically the SLOG is the device and the ZIL is the data on the device.
Storage Pools
displaying
zpool list
zpool list -o name,size,altroot
# zdb can view the inner workings of ZFS (zdb has a number of options)
zdb
Note: there are a number of properties that you can select, the default is: name, size, used, available, capacity, health, altroot
status
zpool status
## Show only errored pools with more verbosity
zpool status -xv
Sun - ZFS cheatsheet http://www.datadisk.co.uk/html_docs/sun/sun_zfs_cs.htm
1 of 7 3/29/15, 11:14 PM
statistics
zpool iostat -v 5 5
Note: use this command like you would iostat
history
zpool history -il
Note: once a pool has been removed the history is gone
creating
## You cannot shrink a pool, only grow it
## perform a dry run without actually creating the pool (notice the -n)
zpool create -n data01 c1t0d0s0
# you can presume that I created two files called /zfs1/disk01 and /zfs1/disk02 using mkfile
zpool create data01 /zfs1/disk01 /zfs1/disk02
# using a standard disk slice
zpool create data01 c1t0d0s0
## using a different mountpoint than the default /<pool name>
zpool create -m /zfspool data01 c1t0d0s0
# mirror and hot spare disk examples, hot spares are not used by default so turn on the "autoreplace" feature for each pool
zpool create data01 mirror c1t0d0 c2t0d0 mirror c1t0d1 c2t0d1
zpool create data01 mirror c1t0d0 c2t0d0 spare c3t0d0
## setting up a log device and mirroring it
zpool create data01 mirror c1t0d0 c2t0d0 log mirror c3t0d0 c4t0d0
## setting up a cache device
zpool create data01 mirror c1t0d0 c2t0d0 cache c3t0d0 c3t1d0
## you can also create raid pools (raidz/raidz1 - single parity, raidz2 - double parity, raidz3 - triple parity)
zpool create data01 raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0
destroying
zpool destroy data01
## in the event of a disaster you can re-import a destroyed pool
zpool import -f -D -d /zfs1 data03
adding
zpool add data01 c2t0d0
Note: make sure that you get this right as zpool only supports the removal of hot spares and cache disks, for mirrors see attach and detach below
Resizing
## When replacing a disk with a larger one you must enable the "autoexpand" feature to allow you to use the extended space, you must do this before replacing the first disk
removing
zpool remove data01 c2t0d0
Note: zpool only supports the removal of hot spares and cache disks, for mirrors see attach and detach below
clearing faults
zpool clear data01
## Clearing a specific disk fault
zpool clear data01 c2t0d0
attaching (mirror)
## c2t0d0 is an existing disk that is not mirrored, by attaching c3t0d0 both disks will become a mirror pair
zpool attach data01 c2t0d0 c3t0d0
detaching (mirror)
zpool detach data01 c2t0d0
Note: see the notes above on attaching
onlining
zpool online data01 c2t0d0
offlining
zpool offline data01 c2t0d0
## Temporary offlining (will revert back after a reboot)
zpool offline -t data01 c2t0d0
Replacing
## replacing like for like
zpool replace data03 c2t0d0
## replacing with another disk
zpool replace data03 c2t0d0 c3t0d0
scrubbing
zpool scrub data01
## stop a scrub in progress, check the scrub line using "zpool status data01" to see any errors
zpool scrub -s data01
Note: see top of table for more information about resilvering and scrubbing
exporting
zpool export data01
## you can list exported pools using the import command
zpool import
importing
## when using standard disk devices i.e c2t0d0
zpool import data01
## if using files in say the /zfs filesystem
zpool import -d /zfs
## importing a destroyed pool
zpool import -f -D -d /zfs1 data03
getting parameters
zpool get all data01
Note: the source column denotes if the value has been changed from its default value, a dash in this column means it is a read-only value
setting parameters
zpool set autoreplace=on data01
Note: use the command "zpool get all" to obtain a list of the current settings
upgrade
## List upgrade paths
zpool upgrade -v
## upgrade all pools
zpool upgrade -a
## upgrade a specific pool, use "zpool get all" to obtain the version number of a pool
zpool upgrade data01
## upgrade to a specific version
zpool upgrade -V 10 data01
Filesystem
displaying
zfs list
## list different types
zfs list -t filesystem
zfs list -t snapshot
zfs list -t volume
zfs list -t all -r
## recursive display
zfs list -r data01/oracle
## complex listing
zfs list -o name,mounted,sharenfs,mountpoint
Note: there are a number of attributes that you can use in a complex listing, so use the man page to see them all
creating
## presuming I have a pool called data01, create a /data01/apache filesystem
zfs create data01/apache
## using a different mountpoint
zfs create -o mountpoint=/oracle data01/oracle
## create a volume - the device can be accessed via /dev/zvol/[rdsk|dsk]/data01/swap
zfs create -V 50mb data01/swap
swap -a /dev/zvol/dsk/data01/swap
Note: don't use a zfs volume as a dump device, it is not supported
destroying
zfs destroy data01/oracle
## using the recursive options -r = all children, -R = all dependents
zfs destroy -r data01/oracle
zfs destroy -R data01/oracle
mounting
zfs mount data01
# you can create a temporary mount that expires after unmounting
zfs mount -o mountpoint=/tmpmnt data01/oracle
Note: all the normal mount options can be applied i.e ro/rw, setuid
unmounting
zfs umount data01
share
zfs share data01
## Persist over reboots
zfs set sharenfs=on data01
## specific hosts
zfs set sharenfs="[email protected]/24" data01/apache
unshare
zfs unshare data01
## persist over reboots
zfs set sharenfs=off data01
snapshotting
## snapshotting is like taking a picture, delta changes are recorded to the snapshot when the original file system changes, to
## remove a dataset all previous snapshots have to be removed, you can also rename snapshots.
## You cannot destroy a snapshot if it has a clone
## creating a snapshot
zfs snapshot data01@10022010
## renaming a snapshot
zfs rename data01@10022010 data01@keep_this
## destroying a snapshot
zfs destroy data01@10022010
rollback
## by default you can only roll back to the latest snapshot, to roll back to an older one you must delete all newer snapshots
zfs rollback data01@10022010
cloning/promoting
## clones are writeable filesystems that are created from a snapshot, a dependency will remain on the snapshot as long as the
## clone exists. A clone uses the data from the snapshot to exist. As you use the clone it uses space separate from the snapshot.
## clones cannot be created across zpools, you need to use send/receive, see the topics below
## cloning
zfs clone data01@10022010 data01/clone
zfs clone -o mountpoint=/clone data01@10022010 data01/clone
## promoting a clone, this allows you to destroy the original file system that the clone is attached to
zfs promote data01/clone
Note: the clone must reside in the same pool
renaming
## the dataset must be kept within the same pool
zfs rename data03/ora_disk01 data03/ora_d01
Note: you have two options
-p creates all the non-existing parent datasets
-r recursively renames the snapshots of all descendant datasets (used with snapshots only)
Compression
## You enable compression by setting a property, the values are on, off, lzjb, gzip, gzip-[1-9] and zle. Note that it only starts
## compressing when you turn it on, existing data will not be compressed
zfs set compression=lzjb data03/apache
## you can get the compression ratio
zfs get compressratio data03/apache
Deduplication
## You can save disk space using deduplication, which can work on files, blocks or bytes. With file deduplication each file is hashed with a
## cryptographic hashing algorithm such as SHA-256, if a file matches then we just point to the existing file rather than storing a
## new file. This is ideal for small files, but for large files a single character change would mean that all the data has to be copied.
## Block deduplication allows you to share all the identical blocks in a file minus the blocks that are different. The unique blocks are
## stored on disk and the shared blocks are referenced in RAM. It may need a lot of RAM to keep track of which blocks
## are shared and which are not, however this is the preferred option over file or byte deduplication. Shared blocks are
## stored in what is called a "deduplication table", the more deduplicated blocks the larger the table. The table is read every time
## a block changes, so it should be held in fast RAM, if you run out of RAM then the table will spill over onto disk.
## So how much RAM do you need? You can use the zdb command to check: take the "bp count" and allow about 320 bytes of RAM
## for each deduplicated block in the pool. In my case 288674 blocks means I would need about 92MB, for example a 200GB pool would need
## about 670MB for the table. A good rule is to allow 5GB of RAM for every 1TB of disk.
## to see the blocks the dataset consumes
zdb -b data01
## to turn on deduplication
zfs set dedup=on data01/text_files
## to see the deduplication ratio (dedupratio is a pool property)
zpool get dedupratio data01
## to see the histogram of how many blocks are referenced how many times
zdb -DD
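The RAM estimate above (about 320 bytes per block) is easy to script. This is only a sketch of the arithmetic: the function names are made up, the 320-byte figure is the rule of thumb quoted above and not a value read from ZFS, and the block count is assumed to come from the "bp count" line of "zdb -b".

```shell
# Assumption: about 320 bytes of RAM per block in the deduplication table,
# the rule of thumb quoted in the notes above.
BYTES_PER_BLOCK=320

# dedup_table_bytes BLOCK_COUNT (hypothetical helper name)
dedup_table_bytes() {
    echo $(( $1 * BYTES_PER_BLOCK ))
}

# dedup_table_mb BLOCK_COUNT: the same figure in whole decimal megabytes
dedup_table_mb() {
    echo $(( $1 * BYTES_PER_BLOCK / 1000000 ))
}

dedup_table_bytes 288674   # prints 92375680
dedup_table_mb 288674      # prints 92
```

The megabyte figure is rounded down by integer division, so treat it as a lower bound when sizing memory.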
getting parameters
## List all the properties
zfs get all data03/oracle
## get a specific property
zfs get setuid data03/oracle
## get a specific property for all datasets
zfs get compression
Note: the source column denotes if the value has been changed from its default value, a dash in this column means it is a read-only value
setting parameters
## set and unset a quota
zfs set quota=50M data03/oracle
zfs set quota=none data03/oracle
Note: use the command "zfs get all" to obtain a list of the current settings
inherit
## set back to the default value
zfs inherit compression data03/oracle
upgrade
## List the upgrade paths
zfs upgrade -v
## List all the datasets that are not at the current level
zfs upgrade
## upgrade a specific dataset
zfs upgrade -V <version> data03/oracle
send/receive
## here is a complete example of a send and receive with an incremental update
## create some test files
mkfile -v 100m /zfs/master
mkfile -v 100m /zfs/slave
## create mountpoints
mkdir /master
mkdir /slave
## create the pools on the test files
zpool create master /zfs/master
zpool create slave /zfs/slave
## create the data filesystem
zfs create master/data
## create a test file
echo "created: 09:58" > /master/data/test.txt
## create a snapshot and send it to the slave, you could use SSH or tape to transfer to another server (see below)
zfs snapshot master/data@1
zfs send master/data@1 | zfs receive slave/data
## set the slave to read-only because otherwise you can cause data corruption, make sure you do this before accessing anything in the
## slave/data directory
zfs set readonly=on slave/data
## update the original test.txt file
echo "`date`" >> /master/data/test.txt
## create a second snapshot and send the differences, you may get an error message saying that the destination has been
## modified, this is because you did not set slave/data to read-only (see above)
zfs snapshot master/data@2
zfs send -i master/data@1 master/data@2 | zfs receive slave/data
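For repeated incremental updates the two commands above follow a fixed pattern, so a small sketch like the one below can generate them. The function name incr_send_cmds is made up; it only prints the commands for review and runs nothing, and the dataset and snapshot names are the ones from the example above.

```shell
# Hypothetical helper: incr_send_cmds DATASET OLD_SNAP NEW_SNAP TARGET
# Prints (does not run) the commands for one incremental update.
incr_send_cmds() {
    src=$1
    old=$2
    new=$3
    dst=$4
    echo "zfs snapshot ${src}@${new}"
    echo "zfs send -i ${src}@${old} ${src}@${new} | zfs receive ${dst}"
}

incr_send_cmds master/data 1 2 slave/data
```

Printing instead of executing is a deliberate choice for this sketch: the output can be checked, then pasted or piped to a shell on a system that has the pools.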
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## using SSH
zfs send master/data@1 | ssh backup_server zfs receive backups/data@1
## using a tape drive, you can also use cpio
zfs send master/data@1 > /dev/rmt/0
zfs receive slave/data2@1 < /dev/rmt/0
zfs rename slave/data slave/data.old
zfs rename slave/data2 slave/data
## you can also save incremental data
zfs send master/data@12022010 > /dev/rmt/0
zfs send -i master/data@12022010 master/data@13022010 > /dev/rmt/0
## Using gzip to compress the snapshot
zfs send master/fs@snap | gzip > /dev/rmt/0
allow/unallow
## display the permission sets and any user permissions
zfs allow master
## create a permission set
zfs allow -s @permset1 create,mount,snapshot,clone,promote master
## delete a permission set
zfs unallow -s @permset1 master
## grant a user permissions
zfs allow vallep @permset1 master
## revoke a user's permissions
zfs unallow vallep @permset1 master
Note: there are many permissions that you can set so see the man page or just use the "zfs allow" command
Quota/Reservation
## Not strictly a command but I wanted to discuss it here. You can apply a quota to a dataset, you can reduce this quota only if the
## quota has not already been exceeded, if you exceed the quota you will get an error message. You also have reservations which will
## guarantee that a specified amount of disk space is available to the filesystem, both are applied to datasets and their
## descendants (snapshots, clones)
## Newer versions of Solaris allow you to set group and user quotas
## you can also use refquota and refreservation to manage the space without accounting for disk space consumed by descendants
## such as snapshots and clones. Generally you would set quota and reservation higher than refquota and refreservation
quota & reservation - properties used for managing disk space consumed by datasets and their descendants
refquota & refreservation - properties used for managing disk space consumed by datasets only
## set a quota
zfs set quota=100M data01/apache
## get a quota
zfs get quota data01/apache
## set up a user quota (use groupquota for groups)
zfs set userquota@vallep=100M data01/apache
## remove a user quota (use groupquota for groups)
zfs set userquota@vallep=none data01/apache
## List user quotas (use groupspace for groups), you can also list users with quotas, for example the root user
zfs userspace data01/apache
zfs get userused@vallep data01/apache
ZFS tasks
Replace failed disk
# List the zpools and identify the failed disk
zpool list
# replace the disk (can use same disk or new disk)
zpool replace data01 c1t0d0
zpool replace data01 c1t0d0 c1t1d0
# clear any existing errors
zpool clear data01
# scrub the pool to check for any more errors (this depends on the size of the zpool as it can take a long time to complete)
zpool scrub data01
# you can now remove the failed disk in the normal way depending on your hardware
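The replace, clear and scrub steps above can be wrapped in a small script. This is a sketch under assumptions: the names run and replace_disk are made up, and by default it only prints the commands (set DRY_RUN=0 to really run them, which needs a pool and root access).

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: replace_disk POOL OLD_DISK [NEW_DISK]
# Follows the order of the steps listed above.
replace_disk() {
    pool=$1
    old=$2
    new=${3:-}
    if [ -n "$new" ]; then
        run zpool replace "$pool" "$old" "$new" || return 1
    else
        run zpool replace "$pool" "$old" || return 1
    fi
    run zpool clear "$pool" || return 1
    run zpool scrub "$pool"
}

replace_disk data01 c1t0d0 c1t1d0
```

Leaving out the third argument gives the like-for-like form of the replace. On a live pool it is worth checking "zpool status" between steps, since the resilver started by the replace can take a long time.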
Expand a pool's capacity
# you cannot remove a disk from a pool but you can replace it with a larger disk, enable "autoexpand" first so the extra space can be used
zpool set autoexpand=on data01
zpool replace data01 c1t0d0 c2t0d0
Install the bootblock
# the command depends on whether you are using a sparc or an x86 system
sparc - installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t1d0
x86 - installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t1d0s0
Lost root password
# You have two options to recover the root password
## Option one
ok> boot -F failsafe
## when requested follow the instructions to mount the rpool on /a
cd /a/etc
vi passwd|shadow
init 6
## Option two (you can boot from the network or cdrom)
ok> boot cdrom|net -s
zpool import -R /a rpool
zfs mount rpool/ROOT/zfsBE
cd /a/etc
vi passwd|shadow
init 6
Primary mirror disk in root is unavailable or fails
# boot the secondary mirror
ok> boot disk1
## offline and unconfigure the failed disk, the options for unconfiguring a disk may differ depending on the hardware
zpool offline rpool c0t0d0s0
cfgadm -c unconfigure c1::dsk/c0t0d0
# Now you can physically replace the disk, reconfigure it and bring it online
cfgadm -c configure c1::dsk/c0t0d0
zpool online rpool c0t0d0s0
# Let the pool know you have replaced the disk
zpool replace rpool c0t0d0s0
# if the replace above fails then detach and reattach the primary mirror
zpool detach rpool c0t0d0s0
zpool attach rpool c0t1d0s0 c0t0d0s0
# make checks
zpool status rpool
# don't forget to add the boot block (see above)
Resize swap area (and dump areas)
# You can resize the swap if it is not being used, first record the size and whether it is being used
swap -l
# resize the swap area, first by removing it
swap -d /dev/zvol/dsk/rpool/swap
zfs set volsize=2G rpool/swap
# Now activate the swap and check the size, if the -a option does not work then use the "swapadd" command
swap -a /dev/zvol/dsk/rpool/swap
swap -l
Note: if you cannot delete the original swap area because it is too busy then simply add another swap area, the same procedure is used for dump areas but using the "dumpadm" command