Flexible Storage Allocation
A. L. Narasimha Reddy
Department of Electrical and Computer Engineering
Texas A&M University
Students: Sukwoo Kang (now at IBM Almaden), John Garrison
2 Texas A&M University Narasimha Reddy 5/1/2008
Outline
Big Picture
Part I: Flexible Storage Allocation
– Introduction and Motivation
– Design of Virtual Allocation
– Evaluation
Part II: Data Distribution in Networked Storage Systems
– Introduction and Motivation
– Design of User-Optimal Data Migration
– Evaluation
Part III: Storage Management across Diverse Devices
Conclusion
Storage Allocation
Allocate entire storage space at the time of the file system creation
Storage space owned by one operating system cannot be used by another
[Figure: static allocation: Windows NT (NTFS), Linux (ext2), and AIX (JFS) each own fixed partitions (50 GB, 50 GB, 98 GB); actual allocations (30 GB, 70 GB) diverge, so one file system is running out of space while others sit on unused capacity]
Big Picture
Memory systems employ virtual memory for several reasons
Current storage systems lack such flexibility
Current file systems allocate storage statically at the time of their creation
– Storage allocation: Space on the disk is not allocated well across multiple file systems
File Systems with Virtual Allocation
When a file system is created with X GB,
– Allows the file system to be created with only Y GB, where Y << X
– Remaining space used as one common available pool
– As the file system grows, the storage space can be allocated on demand
[Figure: virtual allocation: the same file systems (NTFS, ext2, JFS) start with only 10 GB actually allocated each; the remaining capacity forms a common storage pool (e.g., 100 GB spread as 60 GB + 40 GB across devices) drawn on demand]
Our Approach to Design
[Figure: physical disk addressed by physical block address]
Employ Allocate-on-write policy
– Storage space is allocated when the data is written
– Writes all data to disk sequentially based on the time at which data is written to the device
– Once data is written, data can be accessed from the same location, i.e., data is updated in-place
Allocate-on-write Policy
[Figure: a write at t = t' allocates one extent on the physical disk]
Storage space is allocated by the unit of the extent when the data is written
Extent is a group of file system blocks
– Fixed size
– Retain more spatial locality
– Reduce information that must be maintained
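The allocate-on-write and extent mechanics above can be sketched in a few lines (a minimal Python model with assumed names such as `VirtualAllocator`; the real implementation is a kernel block layer): the first write to a logical extent allocates the next sequential physical extent, and later writes to the same extent reuse the mapping, i.e., update in place.

```python
EXTENT_SIZE = 8  # file-system blocks per extent (fixed size)

class VirtualAllocator:
    def __init__(self):
        self.block_map = {}     # logical extent -> physical extent
        self.next_physical = 0  # next free extent at the end of the "log"

    def write(self, logical_block):
        extent = logical_block // EXTENT_SIZE
        if extent not in self.block_map:       # first write: allocate
            self.block_map[extent] = self.next_physical
            self.next_physical += 1            # sequential, write-time order
        return self.block_map[extent]          # later writes: in place

va = VirtualAllocator()
p0 = va.write(3)    # first write to extent 0 allocates physical extent 0
p1 = va.write(100)  # extent 12 gets the next sequential physical extent
p2 = va.write(5)    # update inside extent 0: same physical location
```

Note that physical placement follows write time, not the logical address: extent 12 lands right after extent 0.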
Allocate-on-write Policy
[Figure: writes at t = t' and t = t'' (t'' > t') allocate Extent0 and Extent1 sequentially on the physical disk]
Data is written to disk sequentially based on write-time
– Further writes to the same data updated in-place
– VA (Virtual Allocation) requires additional data structure
Block Map
[Figure: a later write allocates Extent2; the block map records where each logical extent lives on the physical disk]
Block map keeps a mapping of logical storage locations and real (physical) storage locations
VA Metadata
[Figure: the in-memory block map is hardened to the on-disk VA metadata region alongside Extent0-Extent2]
This block map is maintained in memory and regularly written to disk for hardening against system failures
VA Metadata represents the on-disk block map
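Hardening can be sketched as a periodic checkpoint of the in-memory map, assuming for illustration that the on-disk VA metadata area behaves like a file (the function name, path, and JSON format are all stand-ins for the real binary layout). Writing to a temporary file and atomically swapping it in keeps the previous on-disk copy valid if a crash hits mid-write.

```python
import json
import os
import tempfile

def harden_block_map(block_map, path):
    """Checkpoint the in-memory block map to its on-disk location."""
    d = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=d)
    with os.fdopen(fd, "w") as f:
        # JSON keys must be strings; a real format would be binary.
        json.dump({str(k): v for k, v in block_map.items()}, f)
        f.flush()
        os.fsync(f.fileno())  # force the data to stable storage
    os.replace(tmp, path)     # atomic swap: old copy stays valid until here
```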
On-disk Layout & Storage Expansion
[Figure: on-disk layout: FS metadata, VA metadata, and allocated extents (Extent0-Extent2) on the physical disk; the virtual disk exposes further extents (Extent3-Extent7), and crossing the storage expansion threshold triggers storage expansion]
When the capacity is exhausted or reaches storage expansion threshold, a physical disk can be expanded to other available storage resources
– File system unaware of the actual space allocation and expansion
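A toy version of the expansion check (the threshold fraction, growth unit, and pool representation are illustrative, not the implementation's values): once allocation crosses the threshold fraction of current physical capacity, more capacity is drawn from the common pool, with the file system none the wiser.

```python
def maybe_expand(allocated, capacity, pool, threshold=0.9, grow=100):
    """Expand physical capacity from the common pool when the
    storage expansion threshold is reached (units are arbitrary)."""
    if allocated >= threshold * capacity and pool >= grow:
        return capacity + grow, pool - grow  # expanded
    return capacity, pool                    # no change needed/possible
```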
Write Operation
[Figure: write path: application write request → file system → buffer/page cache layer → block I/O layer (VA), which searches the VA block map, allocates a new extent and updates mapping information when needed, hardens VA metadata, and writes to disk before acknowledgement]
Read Operation
[Figure: read path: application read request → file system → buffer/page cache layer → block I/O layer (VA), which searches the VA block map to locate the extent on disk]
Allocate-on-write vs. Other Work
Key difference from log-structured file systems (LFS)
– Only allocation is done at the end of the log
– Updates are done in-place after allocation
LVM still ties up storage at the time of file system creation
Design Issues
Extent-based Policy Example (with Ext2)
– I (inode), B (data block), V (VA block map)
– A → B (B is allocated to A)
File system-based Policy Example (with Ext3 ordered mode)
VA Metadata Hardening (File System Integrity)
– Must keep certain update ordering of VA metadata and FS (meta)data
Design Issues (cont.)
Extent Size
– Larger extent size: reduces block map size and retains more spatial locality, but can cause data fragmentation
Reclaiming allocated storage space of deleted files
– Needed to continue to provide the benefits of virtual allocation
– Without reclamation, virtual allocation can degenerate into static allocation
Interaction with RAID
– RAID remaps blocks to physical devices to provide device characteristics
– VA remaps blocks for flexibility
– Need to resolve performance impact of VA’s extent size and RAID’s chunk size
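One plausible way to reconcile VA's extent size with RAID's chunk size (an assumption for illustration, not the paper's stated rule) is to round the extent size up to a whole stripe, so an extent never straddles a stripe boundary and full-stripe writes avoid RAID-5 read-modify-write:

```python
def aligned_extent_size(extent_kb, chunk_kb, data_disks):
    """Round the requested extent size up to a multiple of the full
    RAID stripe width (chunk size x number of data disks)."""
    stripe = chunk_kb * data_disks
    return ((extent_kb + stripe - 1) // stripe) * stripe
```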
Spatial Locality Observations & Issues
Metadata and data separation
Data clustering: Reduce seek distance
Multiple file systems
Data placement policy
– Allocate hot data in a high data-rate region of the disk
– Allocate hot data in the middle of the partition
Implementation & Experimental Setup
Virtual allocation prototype
– Kernel module for Linux 2.4.22
– Employs an in-memory hash table to speed up VA block-map lookups
Setup
– A 3GHz Pentium 4 processor, 1GB main memory
– Red Hat Linux 9 with a 2.4.22 kernel
– Ext2 file system and Ext3 file system
Workloads
– Bonnie++ (Large-file workload)
– Postmark (Small-file workload)
– TPC-C (Database workload)
VA Metadata Hardening
[Chart: VA metadata hardening overhead: measured differences of -7.3, -3.3, -1.2, +4.9, +8.4, +9.5]
Compare EXT2 and VA-EXT2-EX
Compare EXT3 and VA-EXT3-EX, VA-EXT3-FS
Reclaiming Allocated Storage Space
Reclaim operation for deleted large files
How to keep track of deleted files?
– Employed a stackable file system that maintains a duplicate block bitmap
– Alternatively, could employ “Life or Death at Block-Level” (OSDI’04) work
VA with RAID-5
[Charts: large-file and small-file workloads; large-file workload with NVRAM, comparing VA-RAID-5 NO-HARDEN, NVRAM-17%, NVRAM-4%, and NVRAM-1%]
Used Ext2 with software RAID-5 + VA
NVRAM-X%: X% of total VA metadata size
Data Placement Policy (Postmark)
VA NORMAL partition: same data rate assumed across the partition
VA ZCAV partition: hot data is placed in the high data-rate region of the partition
VA-NORMAL: start allocation from the outer cylinders
VA-MIDDLE: start allocation from the middle of the partition
Multiple File Systems
VA-7GB: 2 x 3.5GB partition, 30% utilization
VA-32GB: 2 x 16GB partition, 80% utilization
Used Postmark
VA-HALF: the 2nd file system is created after 40% of the 1st file system is written
VA-FULL: after 80%
Real-World Deployment of Virtual Allocation
Prototype built
VA in Networked Storage Environment
Flexible allocation provided by VA leads to
– The need to balance data access locality against load balancing
Part II: Data Distribution
Locality-based approach
– Use data migration (e.g. HP AutoRAID)
– Employ “hot” data migration from slower device (remote disk) to faster device (local disk)
Load balancing-based approach (Striping)
– Exploit multiple devices to support the required data rates (e.g. Slice-OSDI’00)
[Figure: hot data kept on the faster local disk, cold data on the slower remote disk]
User-Optimal Data Migration
Locality is exploited first
– Data is migrated from Disk B to Disk A
Load balancing is also considered
– If the load on Disk A is too high, data is migrated from Disk A to Disk B
Migration Decision Issues
Where to migrate: Use I/O request response time
When to migrate: Migration threshold
– Initiate migration from Disk A to Disk B only when the load on Disk A exceeds the migration threshold
How to migrate: Limit number of concurrent migrations (Migration token)
What data to migrate: Active data
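The decisions above can be combined into a small predicate (the threshold value, token count, and names are illustrative, not the measured policy): migrate active data toward the disk with the lower observed response time, but only when the gap crosses the migration threshold and a migration token is available.

```python
def should_migrate(rt_local, rt_remote, threshold=1.5, tokens=4, in_flight=0):
    """Decide whether to migrate a piece of active data to the local disk.

    'Where' -> toward the lower response time; 'when' -> threshold on the
    response-time ratio; 'how' -> capped by migration tokens.
    """
    if in_flight >= tokens:                  # token limit on concurrency
        return False
    return rt_remote > threshold * rt_local  # response-time gap test
```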
Design Issues
Allocation policy
– Striping with user-optimal migration: will improve data access locality
– Sequential allocation with user-optimal migration: will improve load balancing
Multi-user environment
– Each user migrates data in a user-selfish manner
– Migrations will tend to improve the performance of all users over longer periods of time
Evaluation
Implemented as a kernel block device driver
Evaluated it using SPECsfs benchmark
[Charts: configuration and SPECsfs performance curve, for single-user and multi-user environments]
Single-User Environment
Striping with user-optimal migration
Seq. allocation with user-optimal migration
Configuration: (Allocation Policy)-(Migration Policy)
– STR (Striping), SEQ (Seq. Alloc.), NOMIG (No migration), MIG (User-Optimal migration)
Single-User Environment (cont.)
Comparison between migration systems
– Migration based on locality: hot data (remote → local), cold data (local → remote)
Multi-User Environment - Striping
Server A: Load from 100 to 700
Server B: Load from 50 to 350
Multi-User Environment – Seq. Allocation
Server A: Load from 100 to 1100
Server B: Load from 30 to 480
Storage Management Across Diverse Devices
Flash storage becoming widely available
– More expensive than hard drives
– Faster random accesses
– Low power consumption
In Laptops now
In hybrid storage systems soon
Manage data across different devices
– Match application needs to device characteristics
– Optimize for performance, power consumption
Motivation
VFS Allows many file systems underneath
VFS maintains a 1-to-1 mapping from the namespace to storage
Can we provide different storage options for different files for a single user?
– /user1/file1 → storage system 1, /user2/file2 → storage system 2…
Normal File System Architecture
[Figure: normal architecture: applications (Calc, Impress, Writer, WinAmp) → VFS → one file system per subtree, e.g., Ext2 serving /user1/* on a magnetic disk and FAT32 serving /user2/* on a flash drive]
Umbrella File System
[Figure: UmbrellaFS sits between the VFS and several file systems (Ext3 on an encrypted magnetic disk, Ext2 on a magnetic disk, FAT32 on a flash drive); user-visible files /user1/file1-/user1/file4 map to branch paths such as /FS2/user1/file1, /FS2/user1/file2, /FS1/user1/file3, /FS3/user1/file4]
Example Data Organization
User view: /usr/dir1/foo.avi, /usr/dir1/foo.txt, /usr/dir1/foo.jpg
Underlying data organization: /media/usr/dir1/foo.avi, /text/usr/dir1/foo.txt, /images/usr/dir1/foo.jpg
Motivation: Policy-Based Storage
User or system administrator choice
– Allow different types of files on different devices
– Reliability, performance, power consumption
Layered Architecture
– Leverage benefits of underlying file systems
– Map applications to file systems and underlying storage
Policy decisions can depend on namespace and metadata
– Example: Files not touched in a week → slow storage system
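The week-old-files example could be checked from ordinary inode metadata. A sketch follows; the branch names `/slow_fs` and `/fast_fs` are hypothetical, and the real UmbrellaFS enforces such policies below the VFS rather than via `os.stat`:

```python
import os
import time

WEEK = 7 * 24 * 3600  # seconds

def pick_branch(path, now=None):
    """Route a file to slow storage if it hasn't been accessed in a week."""
    now = time.time() if now is None else now
    st = os.stat(path)
    return "/slow_fs" if now - st.st_atime > WEEK else "/fast_fs"
```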
Rules Structure
Provided at mount time
User specified
Based on inode values (metadata) and filenames (namespace)
Provides array of branches
Umbrella File System
Sits under VFS to enforce policy
Policy enforced at open and close times
Policy also enforced periodically (less often)
UmbrellaFS acts as a “router” for files
– Not only based on namespace, but also metadata
Inode Rules Structure
Rule  Inode/Filename  Field               Match  Value                      Branch
1     Inode           file permissions    =      Read Only                  /fs1, /fs2
2     Filename        n/a                 n/a    n/a                        n/a
3     Inode           file creation time  >=     8:00 am, August 3rd, 2007  /fs2
4     Inode           file length         <      20 KB                      /fs3
…
Inode Rules
Rules are provided in order of precedence; the first match wins
Compare the inode's value against each rule
– At file creation, some inode values are indeterminate
– Those rules are passed over
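First-match evaluation with pass-over of indeterminate fields might look like this (the rule encoding and the `/default_fs` fallback are assumptions for illustration, not the driver's actual structures):

```python
import operator

OPS = {"=": operator.eq, ">=": operator.ge, "<": operator.lt}

def route(inode, rules):
    """Return the branch of the first rule the inode matches.

    Each rule is (field, op, value, branch); fields that are still
    indeterminate (None) at file-creation time are passed over.
    """
    for field, op, value, branch in rules:  # rules in precedence order
        actual = inode.get(field)
        if actual is None:                  # indeterminate -> skip this rule
            continue
        if OPS[op](actual, value):
            return branch                   # first match wins
    return "/default_fs"
```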
Filename Rules Structure
Rule  Match String       Branch
1     /*.avi             /fs2, /fs1
2     /home/*.txt        /fs1
3     /home/jgarrison/*  /fs3
…
Filename Rules
Once the first filename rule matches, all filename rules are still checked to find the most specific match
Similar to longest prefix matching
Double index based on
– Path matching
– Filename matching
Example:
– Rules: /home/*/*.bar, /home/jgarrison/foo.bar
– File: /home/jgarrison/foo.bar
– File matches second rule more closely (3 path length and 7 characters of file name vs. 3 path length and 4 characters of file name)
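The foo.bar example can be reproduced with glob matching plus a specificity score. The scoring tuple (path depth first, then literal characters in the filename component) is an assumption consistent with the example above, and `fnmatch` stands in for the real double index:

```python
import fnmatch

def best_rule(path, patterns):
    """Among patterns that match `path`, pick the most specific one:
    deeper paths first, then more literal (non-wildcard) filename chars."""
    def score(pat):
        parts = pat.split("/")
        literal = sum(c not in "*?" for c in parts[-1])
        return (len(parts), literal)
    matches = [p for p in patterns if fnmatch.fnmatchcase(path, p)]
    return max(matches, key=score, default=None)
```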
Evaluation
Overhead
– Throughput
– CPU Limited
– I/O Limited
Example Improvement
UmbrellaFS Overhead
[Chart: Bonnie read overhead: throughput (MB/s, 0-40) vs. number of rules (1-32, with an Ext2 baseline), for inode rules and filename rules]
CPU Limited Benchmarks
I/O Limited Benchmarks
Flash vs. RAID5 Read Performance
Flash vs. RAID5 Write Performance
[Chart: write performance: throughput (MB/s, 0-70) vs. file size (1-10,000 kB) for RAID 5 and Flash SSD]
Flash and Disk Hybrid System
Disks with Encryption hardware
[Chart: encryption example: time (s, 0-800) for partial encryption vs. full encryption]
Conclusion
Virtual allocation allows flexibility
– Improve the flexibility of managing storage across multiple file systems/platforms
Enabled user-optimal migration
– Balances disk access locality against load balancing, automatically and transparently
– Adapts to changes in workloads and load on each storage device
Policy-based storage: Umbrella File System
– Allows matching application characteristics to devices