
Zettabyte File System (File System Married to Volume Manager)

Dusan Baljevic, Sydney, Australia

Server Market Trends

Vendor     2Q 2005 Revenue   Share    2Q 2006 Revenue   Share    Growth
IBM        $3,893            31.9%    $3,808            31.2%    -2.2%
HP         $3,480            28.5%    $3,420            28.0%    -1.7%
Sun        $1,372            11.2%    $1,585            13.0%    15.5%
Dell       $1,286            10.5%    $1,269            10.4%    -1.3%
Fujitsu    $551               4.5%    $554               4.5%     0.5%
Others     $1,638            13.4%    $1,651            13.5%     0.8%
All        $12,219                    $12,287                     0.6%

Note: Revenue in millions of USD

Source: IDC http://www.itjungle.com/tug/tug082406-story02.html

ZFS Design Goals
• A zettabyte (derived from the SI prefix zetta-) is a unit of information or computer storage equal to one sextillion (one long-scale trilliard) bytes: 1 zettabyte = 10^21 bytes ≈ 2^70 bytes
• Officially introduced with Solaris 10 06/06
• Combines volume manager and file system (storage pool concept)
• Based on a transactional object model
• 128-bit file system (if Moore's Law holds, in 10 to 15 years humanity will need the 65th bit)
• Data integrity

ZFS Design Goals (continued)
• Snapshot and compression support
• Endian-neutral
• Automated common administrative tasks and near-zero administration
• Performance
• Virtually unlimited scalability
• Flexibility

ZFS – Modern Approach

F/S       Creator            Introduced   O/S
HFS       Apple Computer     1985         MacOS
VxFS      VERITAS            1991         SVR4.0
AdvFS     DEC                pre-1993     Digital Unix
UFS1      Kirk McKusick      1994         4.4BSD
ZFS       Sun Microsystems   2004         Solaris
Reiser4   Namesys            2004         Linux
OCFS2     Oracle             2005         Linux
NILFS     NTT                2005         Linux
GFS2      Red Hat            2006         Linux
ext4      Andrew Morton      2006         Linux


Traditional File Systems and ZFS (diagram courtesy of Sun Microsystems)

ZFS Storage Pools
• Logical collection of physical storage devices
• Virtual storage pools make it easy to expand or contract file systems by adding physical devices
• A storage pool is also the root of the ZFS file system hierarchy. The root of the pool can be accessed as a file system (for example, mount or unmount, snapshots, change properties)
• ZFS storage pools are divided into datasets (file system, volume, or snapshot). Datasets are identified by unique paths (see the sketch below):

/pool-name/dataset-name
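As a minimal sketch of the pool/dataset naming (the pool name custpool, the dataset name, and the disk devices are hypothetical examples, not fixed ZFS names):

zpool create custpool mirror c0t0d0 c1t0d0   # pool root, mounted at /custpool
zfs create custpool/home                     # dataset, mounted at /custpool/home
zfs list                                     # lists custpool and custpool/home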

ZFS Administration Tools
• Standard tools are zpool and zfs
• Most tasks can be run through a web interface (the ZFS Administration GUI and the SAM FS Manager use the same underlying web console packages):

https://hostname:6789/zfs

Prerequisite:

# /usr/sbin/smcwebserver start
# /usr/sbin/smcwebserver enable

• zdb (ZFS debugger) command for support engineers

ZFS Transactional Object Model
• All operations are copy-on-write (COW). Live data is never overwritten
• ZFS writes data to a new block before changing the data pointers and committing the write. Copy-on-write provides several benefits:
  * Always-valid on-disk state
  * Consistent, reliable backups
  * Data rollback to a known point in time
• Time-consuming recovery procedures like fsck are not required if the system is shut down in an unclean manner

ZFS Snapshot versus Clone
• A snapshot is a read-only copy of a file system (or volume) that initially consumes no additional space. It cannot be mounted as a file system. ZFS snapshots are immutable. This feature is critical for supporting legal compliance requirements, such as Sarbanes-Oxley, where businesses have to demonstrate that the view of the data at a given point in time is correct
• A clone is a write-enabled “snapshot”. It can only be created from a snapshot. A clone can be mounted (see the example below)
• Snapshot properties are inherited at creation time and cannot be changed
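A minimal sketch of the snapshot/clone relationship (pool and dataset names are hypothetical):

zfs snapshot custpool/home@before_upgrade                     # read-only, point-in-time view
zfs clone custpool/home@before_upgrade custpool/home_test     # writable clone of that snapshot
zfs list -t snapshot                                          # snapshots are listed with the @ suffix

The clone is mounted (by default under /custpool/home_test) and can be modified freely, while the snapshot it was created from stays immutable.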

ZFS Data Integrity
• Solaris 10 with ZFS is the only known operating system designed to provide end-to-end checksum capability for all data

• All data is protected by 256-bit checksums

• ZFS constantly reads and checks data to help ensure it is correct. If it detects an error in a mirrored pool, the technology can automatically repair the corrupt data

ZFS Data Integrity (continued)
• Checksums stored with indirect blocks
• Self-validating, self-authenticating checksum tree
• Detects phantom writes, misdirections, and common administrative errors (for example, swap on an active ZFS disk)

ZFS Endianness
• ZFS is supported on both SPARC and x86 platforms
• One can easily move storage pools from a SPARC to an x86 server. Neither architecture pays a byte-swapping "tax", due to "adaptive endianness" technology (unique to ZFS)

ZFS Scalability
• Enables administrators to state the intent of their storage policies rather than all of the details needed to implement them
• Resizing ZFS is easy; resizing a UFS requires downtime and restoring data from tapes or other disks (see the example below)
• Maximum filename length = 255 bytes
• Allowable characters in directory entries = any Unicode except NUL
• Maximum pathname length = no limit defined
• Maximum file size = 16 EB
• Maximum volume size = 16 EB
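As a sketch of growing a pool online (hypothetical pool and device names); the added mirror extends the pool's capacity immediately, with no downtime:

zpool add custpool mirror c2t0d0 c3t0d0   # add another mirrored pair to the pool
zpool list custpool                       # capacity grows as soon as the command returns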

ZFS Scalability (continued)
• 2^48 snapshots in any file system
• 2^48 files in any individual file system
• 16 EB files
• 16 EB attributes
• 3x10^23 PB storage pools
• 2^48 attributes for a file
• 2^48 files in a directory
• 2^64 devices in a storage pool
• 2^64 storage pools per system
• 2^64 file systems per storage pool

ZFS Scalability (continued)

Feature                                               ZFS Status
Stores File Owner                                     Yes
POSIX File Permissions                                Yes
Creation Timestamps                                   Yes
Last Access/Read Timestamp                            Yes
Last Metadata Change Timestamps                       Yes
Last Archive Timestamps                               Yes
Access Control Lists                                  Yes
Extended Attributes                                   Yes
Checksums                                             Yes
XIP (Hotbar Image Compression Format)                 No
Hard Links                                            Yes
Soft Links                                            Yes
Block Journaling                                      Yes
Metadata-only Journaling                              No
Transparent Compression                               Yes
Extents                                               No
Variable Block Size                                   Yes
Allocate-on-flush                                     Yes
Incremental Snapshots                                 Yes
Case-sensitive                                        Yes
Case-preserving                                       Yes
Iostat Gives Details of IO Utilization                Yes
Rollbacks                                             Yes
Clones                                                Yes
Snapshots Use More Than 1% of Data Space to Create    No
Handles Whole Disk Failures                           Yes
Built-in Backup/Restore                               Yes
Integrated Quotas                                     Yes (per file system)

ZFS Flexibility
• When additional disks are added to an existing mirrored or RAID-Z pool, ZFS is “rebuilt” to redistribute the data. This feature owes a lot to its conceptual predecessor, WAFL, the “Write Anywhere File Layout” file system developed by NetApp for their network file server appliances
• Dynamic striping across all devices to maximize throughput
• Copy-on-write design (most disk writes are sequential)
• Variable block sizes (up to 128 kilobytes), automatically selected to match the workload (see the example below)
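Where the automatic choice is not ideal (for example, a database with a fixed I/O size), the block size can also be capped per dataset with the recordsize property. A minimal sketch with a hypothetical dataset name:

zfs set recordsize=8k custpool/db    # cap block size at 8 KB to match the database page size
zfs get recordsize custpool/db       # verify the property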

ZFS Flexibility (continued)
• Globally optimal I/O sorting and aggregation
• Multiple independent prefetch streams with automatic length and stride detection
• Unlimited, instantaneous read/write snapshots
• Parallel, constant-time directory operations
• Explicit I/O priority with deadline scheduling

ZFS Simplifies NFS
• To share file systems via NFS, no entries in /etc/dfs/dfstab are required
• Automatically handled by ZFS if the property sharenfs=on is set (see the example below)
• Commands: zfs share and zfs unshare
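A minimal sketch of NFS sharing through ZFS properties (the dataset name is hypothetical):

zfs set sharenfs=on custpool/web     # share the dataset over NFS; persists across reboots
zfs get sharenfs custpool/web        # check the property
zfs share -a                         # (re)share all datasets that have sharenfs set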

ZFS RAID Levels
• ZFS file systems automatically stripe across all top-level disk devices
• Mirrors and RAID-Z devices are considered to be top-level devices
• It is not recommended to mix RAID types in a pool (zpool tries to prevent this, but it can be forced with the -f flag)

ZFS RAID Levels (continued)

The following RAID levels are supported (creation examples below):

* RAID-0 (striping)
* RAID-1 (mirroring)
* RAID-Z (similar to RAID-5, but with variable-width stripes to avoid the RAID-5 write hole)
* RAID-Z2 (double-parity RAID-5)
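Minimal creation sketches for each level (pool and device names are hypothetical):

zpool create ex0 c0t0d0 c0t1d0                                # RAID-0: plain stripe across two disks
zpool create ex1 mirror c0t0d0 c0t1d0                         # RAID-1: two-way mirror
zpool create exz raidz c0t0d0 c0t1d0 c0t2d0                   # RAID-Z: single parity, 2+1
zpool create exz2 raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0   # RAID-Z2: double parity, 3+2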

ZFS RAID Levels (continued)
• A RAID-Z configuration with N disks of size X, of which P are parity disks, can hold approximately (N-P)*X bytes and can withstand P devices failing (worked example below)
• Start a single-parity RAID-Z configuration at 3 disks (2+1)
• Start a double-parity RAID-Z2 configuration at 5 disks (3+2)
• (N+P) with P = 1 (RAID-Z) or 2 (RAID-Z2) and N equal to 2, 4, or 8
• The recommended number of disks per group is between 3 and 9 (use multiple groups if larger)
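As a rough illustration with hypothetical 500 GB disks:

* 5-disk RAID-Z (4+1): usable capacity ≈ (5-1) x 500 GB = 2 TB, survives 1 disk failure
* 5-disk RAID-Z2 (3+2): usable capacity ≈ (5-2) x 500 GB = 1.5 TB, survives 2 disk failures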


ZFS Copy-On-Write (courtesy of Jeff Bonwick)

ZFS and Disk Arrays with Own Cache
• ZFS does not “trust” that anything it writes to the ZFS Intent Log (ZIL) made it to your storage until it flushes the storage cache
• After every write to the ZIL, ZFS issues a cache-flush command to instruct the storage to flush its write cache to the disk. ZFS will not consider a write operation done until the ZIL write and flush have completed
• A problem might occur when trying to layer ZFS over an intelligent storage array with a battery-backed cache (due to the array's ability to use its cache and override ZFS activities)

ZFS and Disk Arrays with Own Cache (continued)
• Initial tests show that HP EVA and Hitachi/HP XP SAN arrays are not affected and work well with ZFS
• Lab tests will provide more comprehensive data for HP EVA and XP SAN

ZFS and Disk Arrays with Own Cache (continued)

Two possible solutions for SANs that are affected:

• Disable the ZIL. The ZIL is the way ZFS maintains consistency until it can get the blocks written to their final place on the disk. BAD OPTION!
• Configure the disk array to ignore ZFS flush commands. Quite safe and beneficial

Configure Disk Array to Ignore ZFS Flush

For Engenio arrays (Sun StorageTek FlexLine 200/300 series, Sun StorEdge 6130, Sun StorageTek 6140/6540, IBM DS4x00, many SGI InfiniteStorage arrays):

• Shut down the server or, at minimum, export ZFS pools before running this
• Cut and paste the following into the script editor of the "Enterprise Management Window" of the SANtricity management GUI:

Configure Disk Array to Ignore ZFS Flush – Script Example

//Show Solaris ICS option
show controller[a] HostNVSRAMbyte[0x2, 0x21];
show controller[b] HostNVSRAMbyte[0x2, 0x21];
//Enable ICS
set controller[a] HostNVSRAMbyte[0x2, 0x21]=0x01;
set controller[b] HostNVSRAMbyte[0x2, 0x21]=0x01;
//Make changes - rebooting controllers
show "Rebooting A controller.";
reset controller[a];
show "Rebooting B controller.";
reset controller[b];

Why ZFS Now?
• ZFS is positioned to support more file systems, snapshots, and files in a file system than can possibly be created in the foreseeable future
• Complicated storage administration concepts are automated and consolidated into straightforward language, reducing administrative overhead by up to 80 percent
• Unlike traditional file systems that require a separate volume manager, ZFS integrates volume management functions. It breaks out of the “one-to-one mapping between the file system and its associated volumes” limitation with the storage pool model. When capacity is no longer required by one file system in the pool, it becomes available to others

Dusan’s own experience
• Numerous tests on a limited number of Sun SPARC and Intel platforms consistently showed that ZFS outperformed Solaris Volume Manager (SVM) on the same physical volumes by at least 25%
• Ease of maintenance made ZFS a very attractive option
• Ease of adding new physical volumes to pools and automated resilvering was impressive

Typical ZFS on SPARC Platform

Unless specifically requested, this type of configuration is very common:

• 73 GB internal disks are used
• Internal disks are c0t0d0s2 and c1t0d0s2
• SVM is used for mirroring
• Physical memory is 8 GB

Typical Swap Consideration
• Primary swap is calculated using the following rules
• If no additional storage is available (only the two internal disks used):

2 x RAM, if RAM <= 8 GB
1 x RAM, if RAM > 8 GB

• If additional storage is available, make primary swap relatively small (4 GB) and add additional swap areas on other disks as necessary

Typical ZFS on SPARC Platform (no Sun Cluster)

Slice   Filesystem    Size    Type   Mirror   Description
s0      /             11GB    UFS    Yes      Root
s1      Swap          8GB            Yes      Swap
s2      All           73GB
s3      /var          5GB     UFS    Yes      /var
s4      /             11GB    UFS    Yes      Alternate Root
s5      /var          5GB     UFS    Yes      Alternate /var
s6                    30GB    ZFS    Yes      Application SW
s7      SDB replica   64MB                    SVM mgmt

Typical ZFS on SPARC Platform (up to Sun Cluster 3.1 – ZFS not supported)

Slice   Filesystem    Size     Type   Mirror   Description
s0      /             30GB     UFS    Yes      Root
s1      Swap          8GB             Yes      Swap
s2      All           73GB
s3      /global       512MB    UFS    Yes      Sun Cluster (SC)
s4      /             30GB     UFS    Yes      Alternate Root
s5      /global       512MB    UFS    Yes      Alternate SC
s6      Not used
s7      SDB replica   64MB                     SVM mgmt

Support for ZFS in Sun Cluster 3.2
• ZFS is supported as a highly available local file system in the Sun Cluster 3.2 release (configuration sketch below)
• ZFS with Sun Cluster offers a file system solution combining high availability, data integrity, performance, and scalability, covering the needs of the most demanding environments
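One possible shape of that configuration, sketched around the SUNW.HAStoragePlus resource type and its Zpools extension property; the resource-group, resource, and pool names are hypothetical, and the exact property name is an assumption from memory rather than something stated in these slides:

clresourcetype register SUNW.HAStoragePlus                    # register the storage resource type
clresourcegroup create custpool-rg                            # resource group that will own the pool
clresource create -g custpool-rg -t SUNW.HAStoragePlus \
    -p Zpools=custpool custpool-hasp-rs                       # import the pool on whichever node hosts the group
clresourcegroup online -M custpool-rg                         # bring the group (and the pool) online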

ZFS Best Practices
• Run ZFS on servers that run a 64-bit kernel
• One GB or more of RAM is recommended
• Because ZFS caches data in kernel addressable memory, the kernel will possibly be larger than with other file systems. Use the size of physical memory as an upper bound to the extra amount of swap space that might be required
• Do not use slices on the same disk for both swap space and ZFS file systems. Keep the swap areas separate from the ZFS file systems
• Set up one storage pool using whole disks per system

ZFS Best Practices (continued)
• Set up a replicated pool (raidz, raidz2, or mirrored configuration) for all production environments
• Do not use disk slices for storage pools intended for production use
• Set up hot spares to speed up healing in the face of hardware failures
• For replicated pools, use multiple controllers to reduce hardware failures and improve performance

ZFS Best Practices (continued)
• Run zpool scrub on a regular basis to identify data integrity problems. For consumer-quality drives, set up a weekly scrubbing schedule; for datacenter-quality drives, a monthly scrubbing schedule (see the sketch below)
• If workloads have predictable performance characteristics, separate the loads into different pools
• For better performance, use individual disks or at least LUNs made up of just a few disks
• Pool performance can degrade when a pool is very full and its file systems are updated frequently (busy mail or web proxy server). Keep pool space under 80% utilization to maintain pool performance
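One way to schedule the weekly scrub is a root crontab entry (the pool name and time of day are hypothetical):

# Add via crontab -e as root: scrub custpool every Sunday at 03:00
0 3 * * 0 /usr/sbin/zpool scrub custpool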

ZFS Backups
• Currently, ZFS does not provide a comprehensive backup or restore utility like ufsdump and ufsrestore
• Use the zfs send and zfs receive commands to capture ZFS data streams
• You can use the ufsrestore command to restore UFS data into a ZFS file system
• Use ZFS snapshots as a quick and easy way to back up file systems
• Create an incremental snapshot stream (zfs send -i)

ZFS Backups (continued)
• The zfs send and zfs receive commands are not enterprise-backup solutions
• Sun StorEdge Enterprise Backup Software (Legato NetWorker 7.3.2 and above) can fully back up and restore ZFS files, including ACLs
• The Veritas NetBackup product can be used to back up ZFS files, and this configuration is supported. However, it does not currently support backing up or restoring NFSv4-style ACL information from ZFS files. Traditional permission bits and other file attributes are correctly backed up and restored

ZFS Backups (continued)
• IBM Tivoli Storage Manager backs up and restores ZFS file systems with the CLI tools, but the GUI seems to exclude ZFS file systems. Non-trivial ZFS ACLs are not preserved
• Computer Associates' BrightStor ARCserve product backs up and restores ZFS file systems. ZFS ACLs are not preserved

ZFS and Data Protector
• According to “HP OpenView Storage Data Protector Planned Enhancements to Platforms, Integrations, Clusters and Zero Downtime Backups version 4.2”, dated 12 January 2007:

Backup Agent (Disk Agent) support for Solaris 10 ZFS will be released in Data Protector 6 in March 2007

http://storage.corp.hp.com/Application/View/ProdCenter.asp?OID=326828&rdoType=filter&inpCriteria1=QuickLink&inpCriteria2=nothing&inpCriteriaChild1=3#QuickLink

ZFS by Example
• To create a simple ZFS RAID-1 pool:

zpool create custpool mirror c0t0d0 c1t0d0

• To create a pool of two mirrors (striped):

zpool create p2m mirror c0t0d0 c0t1d0 mirror c0t2d0 c0t3d0

• When there is only one file system in a storage pool, the zpool create command does everything:
  * Writes an EFI label on the disk
  * Creates a pool of the specified name, including structures for data protection
  * Creates a file system of the same name, creates the directory, and mounts the file system on /custpool
  * The mount point is preserved across reboots

ZFS by Example (continued)
• The pool itself is created with zpool; the more powerful zfs command manages the file systems (datasets) within it
• To create three file systems that share the same pool, with one of them limited to, say, 525 MB:

zfs create custpool/apps
zfs create custpool/home
zfs create custpool/web
zfs set quota=525m custpool/web

ZFS by Example (continued)
• Mount a file system via /etc/vfstab (legacy mode):

zfs set mountpoint=legacy custpool/db

• Check the status of pools:

zpool status

• Unmount / mount all ZFS file systems:

zfs umount -a
zfs mount -a

• Upgrade a pool to the latest on-disk format (needed, for example, before newer features such as RAID-Z2 can be used):

zpool upgrade custpool

ZFS by Example (continued)
• Clear a pool's error count:

zpool clear custpool

• Delete a file system:

zfs destroy custpool/web

• Data integrity can be checked by running a manual scrub:

zpool scrub custpool
zpool status -v custpool

• Check if any problem pools exist:

zpool status -x

ZFS by Example (continued)
• To offline a failing disk drive:

zpool offline custpool c0t1d0

• Once the drive has been physically replaced, run the replace command against the device:

zpool replace custpool c0t1d0

• After an offlined drive has been replaced, it can be brought back online:

zpool online custpool c0t1d0

• Show the installed ZFS version (V2 on Solaris 10u2, V3 on Solaris 10u3):

zpool upgrade

ZFS by Example (continued)
• Create a snapshot of a file system:

zfs snapshot custpool/web@Mon

• Roll back a snapshot (unmounts and remounts the file system automatically):

zfs rollback -r custpool/web@Mon

• Delete a snapshot (important: a clone must be deleted first, if one exists for the given snapshot):

zfs destroy custpool/web@Mon

• A destroyed pool can sometimes be recovered:

zpool import -D

ZFS by Example (continued)
• Display I/O statistics for ZFS pools and file systems:

zpool iostat 5
fsstat zfs
fsstat -F 5 5

• Show current mirror/pool device properties:

zpool vdevs

• Add a mirror to a pool:

zpool add -f pool mirror c0t1d0s3 c0t1d0s4

• The zfs promote command enables you to replace an existing ZFS file system with a clone of that file system. This feature is helpful when you want to run tests on an alternative version of a file system and then make that alternative version the active file system

ZFS by Example (continued)
• Display ZFS properties:

zfs get -o property,value,source all custpool

• List a pool and its snapshots/clones:

zfs list -r custpool/web

• Detach a device from a mirror (the operation is refused if there are no other valid replicas of the data):

zpool detach custpool c0t2d0

ZFS by Example (continued)
• Snapshots cannot be mounted, hence they cannot be backed up directly. Backups of snapshots can be done in two ways:
  * Create a clone, or
  * Run:

zfs send custpool/web@Mon | zfs receive bckpool/webtmp

• Incremental backup (from the @Mon snapshot to a later @Tue snapshot):

zfs send -i custpool/web@Mon custpool/web@Tue > /dev/rmt/0

ZFS by Example (continued)
• The following commands illustrate how to test out changes to a file system and then replace the original file system with the changed one, using clones, clone promotion, and renaming:

zfs create pool/project/production
zfs snapshot pool/project/production@today
zfs clone pool/project/production@today pool/project/beta
zfs promote pool/project/beta
zfs rename pool/project/production pool/project/legacy
zfs rename pool/project/beta pool/project/production
zfs destroy pool/project/legacy

ZFS by Example (continued)
• The following commands send a full backup and then an incremental to a remote machine, restoring them into poolB/restored/web@a and poolB/restored/web@b, respectively. poolB must contain the file system poolB/restored, and must not initially contain poolB/restored/web:

zfs send custpool/web@a | ssh host zfs receive poolB/restored/web@a
zfs send -i custpool/web@a custpool/web@b | ssh host zfs receive -d poolB/restored/web

ZFS-based swap
• Do not swap to a file on a ZFS file system. A ZFS swap file configuration is not supported
• Using a ZFS volume as a dump device is not supported
• A ZFS volume can, however, be used as a swap device:

zfs create -V 5gb custpool/swapvol
swap -a /dev/zvol/dsk/custpool/swapvol

Cross-platform ZFS Data Migration

To move a pool from SPARC machine sunsrv1 to AMD machine amdsrv2:

1. On sunsrv1:

zpool export tank

2. Physically move the disks from sunsrv1 to amdsrv2

3. On amdsrv2:

zpool import tank

Fun with ZFS Pool Planning
• Create a pool of two 3-way mirrors:

zpool create -f ex1 mirror c1t0d0 c1t1d0 c1t2d0
zpool add -f ex1 mirror c1t6d0 c1t7d0 c1t8d0

• Stripe across three 2-way mirrors:

zpool create -f ex2 mirror c1t0d0 c1t6d0 mirror c1t1d0 c1t7d0 mirror c1t2d0 c1t8d0

ZFS and RBAC
• Role-Based Access Control already supports ZFS (usage sketch below). From /etc/security/prof_attr:

ZFS File System Management:::Create and Manage ZFS File Systems:help=RtZFSFileSysMngmnt.html
ZFS Storage Management:::Create and Manage ZFS Storage Pools:help=RtZFSStorageMngmnt.html
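A minimal sketch of putting one of these profiles to use (the user name jdoe and the dataset are hypothetical):

# Assign the ZFS file system profile to a user
usermod -P "ZFS File System Management" jdoe

# The user then runs ZFS administration commands through pfexec
pfexec zfs create custpool/jdoe_data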

ZFS Command History

ZFS logs the last group of commands onto the zpool devices. Even if the pool is moved to a different system (export/import), the command history is still available:

zpool history custpool

History for 'custpool':
2006-11-10.03:01:56 zpool create custpool raidz c0t0d0 c0t0d1 c1t0d0 c1t1d1
2006-11-10.12:19:47 zpool export custpool
2006-11-10.12:20:07 zpool import custpool

ZFS Boot Recovery
• If a panic-reboot loop is caused by a ZFS software programming error, the server can be instructed to boot without the ZFS file systems:

boot -m milestone=none

• When the system is up, remount the root file system read-write and remove the file /etc/zfs/zpool.cache. The remainder of the boot can then proceed with:

svcadm milestone all

• At that point, import the good pools. The damaged pools may need to be re-initialized (see the consolidated sketch below)
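The whole recovery as one hedged command sequence (the remount syntax assumes a UFS root, and the pool name is hypothetical):

boot -m milestone=none             # boot without starting ZFS services
mount -o remount,rw /              # make the root file system writable
rm /etc/zfs/zpool.cache            # forget all pools so none is imported during boot
svcadm milestone all               # continue the normal boot
zpool import custpool              # import the healthy pools one by one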

ZFS and Containers
• Solaris 10 06/06 supports the use of ZFS file systems
• It is possible to install a zone into a ZFS file system, but the installer/upgrader program does not understand ZFS well enough to upgrade zones that reside on a ZFS file system
• Upgrading a server that has a zone installed on a ZFS file system is not yet supported

ZFS Test (OpenSolaris only)
• ztest was written by the ZFS developers as a ZFS unit test. The tool was developed in parallel with the ZFS functionality
• As features were added to ZFS, unit tests were also added to ztest. In addition, a separate test development team wrote and executed more functional and stress tests
• On an OpenSolaris server, ztest is located in /usr/bin/ztest:

Usage: ztest
  [-v vdevs (default: 5)]
  [-s sizeofeachvdev (default: 64M)]
  [-a alignmentshift (default: 9) (use 0 for random)]
  [-m mirrorcopies (default: 2)]
  [-r raidzdisks (default: 4)]
  [-R raidzparity (default: 1)]
  [-d datasets (default: 7)]
  [-t threads (default: 23)]
  [-g gangblockthreshold (default: 32K)]
  [-i initialize pool i times (default: 1)]
  [-k kill percentage (default: 70%)]
  [-p poolname (default: ztest)]
  [-f file directory for vdev files (default: /tmp)]
  [-V(erbose)] (use multiple times for ever more blather)
  [-E(xisting)] (use existing pool instead of creating new one)
  [-T time] total run time (default: 300 sec)
  [-P passtime] time per pass (default: 60 sec)

ZFS Current Status
• Solaris 10 Update 3 (11/06) has been released. It brings a large number of features that have been in OpenSolaris into the fully supported release, including ZFS command improvements and changes: RAID-Z2, hot spares, recursive snapshots, promotion of clones, compact NFSv4 ACLs, destroyed pool recovery, error clearing, ZFS integration with FMA, and more
• As of OpenSolaris Build 53, a powerful new feature: ZFS and iSCSI integration
• One of the missing features: file systems are not mountable after detaching “submirrors”

ZFS Current Status (continued)
• Improvements for NFS over ZFS: ZFS complies with all NFS semantics even with write caches enabled. Disabling the ZIL (setting zil_disable to 1 using mdb and then mounting the file system) is one way to generate an improper NFS service. With the ZIL disabled, commit requests are ignored, with potential corruption of the client's view of the data
• No support for reducing a zpool's size by removing devices

ZFS Mountroot (currently x86 only)
• ZFS Mountroot provides the capability to configure a ZFS root file system. It is not a complete boot solution - it relies on the existence of a small UFS boot environment
• ZFS Mountroot was integrated in Solaris Nevada Build 37 - the OpenSolaris release (disabled by default)
• ZFS Mountroot does not currently work on SPARC
• ACLs are not fully preserved when copying files from a UFS root to a ZFS root. ZFS will convert UFS-style ACLs to the new NFSv4 ACLs, but they may not be entirely identical