Sept 23, 2002 1
Object Based Disk: the key to intelligent I/O
George Gorbatenko
Data Machine International
St Paul, MN 55115
Sept 23, 2002 DMI 2
Why are we interested?
• faster
• transportable
• more accessible
• cheaper
• facilitates holistic design
• improves reliability
Sept 23, 2002 DMI 3
I/O is considered the weak link in systems architecture
• I/O problem
• memory wall
• bottleneck
Sept 23, 2002 DMI 4
Issues
• randomness is painful
  - mechanical time vs electronic time
  - ratio of times is about 200:1
• operating system obscures the disk
Sept 23, 2002 DMI 5
Operating System
• seamless view of space
• legacy of data storage goes back to punched card
• accommodates all applications
Sept 23, 2002 DMI 6
Data evolution
• tape reflected an 80-column card image
• disk reflected tape
Sept 23, 2002 DMI 7
In short…
• nothing much has changed data format-wise since the 1930s
• we are pretty much dealing with records in a linear format, one record after the next
Sept 23, 2002 DMI 8
Advantages of object based design
• encapsulates the data
• defines the application subset
• keeps the operating system from getting in the way
Sept 23, 2002 DMI 9
SQL object is a good choice
• broad user base
• de facto standard for databases
• high enough to exploit the power in the I/O
  - yesterday’s CPU is in today’s disk (controller)
  - aggregate compute power exceeds the host
Sept 23, 2002 DMI 10
Researchers in Intelligent Disks are motivated by…
• exploiting the latent processing potential
• filtering data in place
Sept 23, 2002 DMI 11
Consider a disk farm…
[Figure: HOST connected to IOP 0, IOP 1, … IOP N; labels: fbus, fchannel, fIOP]
Sept 23, 2002 DMI 12
But where do we place the intelligence?
• host
• I/O controller
• disk
[Figure labels: HOST Processor, I/O Processor (IOP)]
Sept 23, 2002 DMI 13
Disk basics
• many platters (each with a fixed head): 10
• many concentric tracks per platter: 10k
• each track holds many sectors: 100
• total number of 512-byte sectors: 10M
• disk capacity: 5 GB
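The capacity figure follows directly from the geometry on this slide. A minimal sketch of the arithmetic, using only the slide's numbers (the function and constant names are illustrative, not from the talk):

```python
# Geometry figures as stated on the "Disk basics" slide.
PLATTERS = 10                 # one fixed head per platter
TRACKS_PER_PLATTER = 10_000
SECTORS_PER_TRACK = 100
SECTOR_BYTES = 512


def total_sectors(platters: int, tracks: int, sectors: int) -> int:
    """Number of addressable sectors on the drive."""
    return platters * tracks * sectors


def capacity_bytes(platters: int, tracks: int, sectors: int, sector_bytes: int) -> int:
    """Raw capacity in bytes."""
    return total_sectors(platters, tracks, sectors) * sector_bytes


if __name__ == "__main__":
    n = total_sectors(PLATTERS, TRACKS_PER_PLATTER, SECTORS_PER_TRACK)
    cap = capacity_bytes(PLATTERS, TRACKS_PER_PLATTER, SECTORS_PER_TRACK, SECTOR_BYTES)
    print(f"{n:,} sectors, {cap / 1e9:.2f} GB")
```

The product comes to 10 million sectors and 5.12 x 10^9 bytes, which the slide rounds to 5 GB.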
Sept 23, 2002 DMI 14
To access a random block
• seek to track: 10-15 ms
• wait for block to roll around: 4-5 ms
• read block: 80 us
hence… 200:1
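The 200:1 figure can be checked with a short calculation. This sketch assumes the seek and rotational delays are in milliseconds and the block read is in microseconds, which is the only reading under which the stated ratio works out; the function name is illustrative.

```python
def mechanical_to_electronic_ratio(seek_ms: float, rotate_ms: float, read_us: float) -> float:
    """Ratio of mechanical positioning time to the time spent reading the block."""
    mechanical_us = (seek_ms + rotate_ms) * 1000.0  # convert ms to microseconds
    return mechanical_us / read_us


if __name__ == "__main__":
    best = mechanical_to_electronic_ratio(seek_ms=10, rotate_ms=4, read_us=80)
    worst = mechanical_to_electronic_ratio(seek_ms=15, rotate_ms=5, read_us=80)
    print(f"ratio between {best:.0f}:1 and {worst:.0f}:1")
```

The two ends of the range bracket the "about 200:1" quoted on the slide.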
Sept 23, 2002 DMI 15
Design Goals
• synchronous operation
  - next data you want is beneath head
• process data in place (filter)
• touch the min amount of data
  - for what you touch you pay in time and space
• exploit locality
• amortize random access read over large data block
Sept 23, 2002 DMI 16
Access strategies…
• Amortize the (inefficient) access over large block of data
• Make sure the data has utility
Sept 23, 2002 DMI 17
Optimum Block Size
Sept 23, 2002 DMI 18
SELECT name, address, salary WHERE salary > 22K

Last Name    First   Address            City           State  Zip    Empl_no  Salary  DOH
Gorbatenko   George  106 Wildwood Bay   St Paul        MN     55115  636222   10K     1961
Roth         Chas    1218 First Ave     Hudson         WI     54016  123456   8K      1980
Anderson     Tim     4345 N Polaris     Plymouth       MN     55123  663322   11K     1982
Fittingfoff  Maria   2088 Mulberry St   Pleasant Hill  CA     93010  997431   20K     1960
Brubaker     Susan   280 Meadowbrook    Reno           NV     89509  654321   15K     1965
Groza        Galina  3365 Broderick St  San Francisco  CA     94102  23417    24K     1975
Sept 23, 2002 DMI 19
Data Utility…
[Figure: the same table with the Salary and DOH columns pulled out and stored separately]

Last Name    First   Address            City           State  Zip    Empl_no
Gorbatenko   George  106 Wildwood Bay   St Paul        MN     55901  636222
Roth         Chas    1218 First Ave     Hudson         WI     54016  123456
Anderson     Tim     4345 N Polaris     Plymouth       MN     55123  663322
Fittingfoff  Maria   2088 Mulberry St   Pleasant Hill  CA     93010  997431
Brubaker     Susan   280 Meadowbrook    Reno           NV     89509  654321
Groza        Galina  3365 Broderick St  San Francisco  CA     94102  23417

Salary: 10K   8K    11K   20K   15K   24K
DOH:    1961  1980  1982  1960  1965  1975
Sept 23, 2002 DMI 20
Consider the travels of an inchworm…
A1 B1 C1 D1 E1 A2 B2 C2 D2 E2 A3 B3 C3 D3 E3 A4 B4 C4 D4 E4 A5 B5 C5 D5 E5
Sept 23, 2002 DMI 21
Travels of an inchworm…
A1 B1 C1 D1 E1 A2 B2 C2 D2 E2 A3 B3 C3 D3 E3 A4 B4 C4 D4 E4 A5 B5 C5 D5 E5
Sept 23, 2002 DMI 22
Travels of an inchworm…
A1 B1 C1 D1 E1 A2 B2 C2 D2 E2 A3 B3 C3 D3 E3 A4 B4 C4 D4 E4 A5 B5 C5 D5 E5
Sept 23, 2002 DMI 23
Locality of Reference
A1 B1 C1 D1 E1
A2 B2 C2 D2 E2
A3 B3 C3 D3 E3
A4 B4 C4 D4 E4
A5 B5 C5 D5 E5
(a) Logical view of two dimensional table.
A1 B1 C1 D1 E1 A2 B2 C2 D2 E2 A3 B3 C3 D3……
(b) Row ordered mapping (physical).
A1 A2 A3 A4 A5 B1 B2 B3 B4 B5 C1 C2 C3 C4……
(c) Column ordered mapping (physical)
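The two physical mappings differ only in how a (row, column) pair is turned into a linear offset. A minimal sketch that reproduces orderings (b) and (c) for the 5 x 5 table, with columns lettered A to E and rows numbered 1 to 5 (all function names are illustrative):

```python
from string import ascii_uppercase
from typing import List


def row_ordered_offset(row: int, col: int, n_cols: int) -> int:
    """Linear offset when whole rows are stored one after the next."""
    return row * n_cols + col


def column_ordered_offset(row: int, col: int, n_rows: int) -> int:
    """Linear offset when whole columns are stored one after the next."""
    return col * n_rows + row


def physical_layout(n_rows: int, n_cols: int, order: str) -> List[str]:
    """Return cell labels (e.g. 'C2') in the order they land on the medium."""
    cells = [""] * (n_rows * n_cols)
    for row in range(n_rows):
        for col in range(n_cols):
            label = f"{ascii_uppercase[col]}{row + 1}"
            if order == "row":
                cells[row_ordered_offset(row, col, n_cols)] = label
            elif order == "column":
                cells[column_ordered_offset(row, col, n_rows)] = label
            else:
                raise ValueError("order must be 'row' or 'column'")
    return cells


def column_offsets(col: int, n_rows: int, n_cols: int, order: str) -> List[int]:
    """Offsets touched when scanning one logical column from top to bottom."""
    if order == "row":
        return [row_ordered_offset(r, col, n_cols) for r in range(n_rows)]
    return [column_ordered_offset(r, col, n_rows) for r in range(n_rows)]


if __name__ == "__main__":
    print(" ".join(physical_layout(5, 5, "row")[:8]))
    print(" ".join(physical_layout(5, 5, "column")[:8]))
    print(column_offsets(2, 5, 5, "row"), column_offsets(2, 5, 5, "column"))
```

Scanning column C touches every fifth cell under the row-ordered mapping but five adjacent cells under the column-ordered one, which is the locality argument the slide is making.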
Sept 23, 2002 DMI 24
Preservation of Logical Topology
To preserve the logical topology of n dimensional logical data space, the physical space must at least be of like dimension.
- for a 2D table (rows and columns) we need to view disk as two dimensional
Sept 23, 2002 DMI 25
Observations:
• SQL can be decomposed in two operations
  - select: favored by column order
  - extract: favored by row order
• granular access permits touching min data
• map data so as to preserve topology when going from logical to physical medium
• reading a track's worth of data appears reasonable
Sept 23, 2002 DMI 26
Treating disk as 2D space
• data objects are 2D spaces
• solves “design boundaries”
• disk is basically a 3D medium
  - cylinder-track-sector
Sept 23, 2002 DMI 27
The disk is 3 dimensional
Sept 23, 2002 DMI 28
Consider the first cylinder of the set…
Sept 23, 2002 DMI 29
Examining a single cylinder…
Sept 23, 2002 DMI 30
which has tracks and sectors…
Sept 23, 2002 DMI 31
track for each head…
Sept 23, 2002 DMI 32
track read…
Sept 23, 2002 DMI 33
diagonal (sector block) read…
Sept 23, 2002 DMI 34
sector block shadow
Sept 23, 2002 DMI 35
Unwrap a cylinder…
Sept 23, 2002 DMI 36
2 dimensional space: hd x sector
[Figure: grid of sectors (columns, sect 0 to sect 11) by heads (rows, hd 0 to hd 9)]
Sept 23, 2002 DMI 37
track read or sector block read…
[Figure: grid of sectors (columns, sect 0 to sect 11) by heads (rows, hd 0 to hd 9)]
Sept 23, 2002 DMI 38
Physical Sector Block Organization…
[Figure: rows hd 0 to hd 9; labels: Logical sector size (lss), Physical sector (512)]
Sept 23, 2002 DMI 39
Logical Sector Block Organization…
[Figure: rows hd 0 to hd 9; labels: Logical sector size (lss), Physical sector (512)]
Sept 23, 2002 DMI 40
record structure…
typedef struct _record {
    char employee_no[8];   // employee number; field A
    char name[12];         // name; field B
    char address[24];      // address; field C
    char zip[5];           // zip code; field D
    char salary[6];        // salary; field E
    char doh[6];           // date of hire; field F
    char dept[3];          // department; field G
    char tbd[16];          // reserved for future use; field H
} Record;
Sept 23, 2002 DMI 41
modified best fit algorithm
LSS = ceil (rec_len / num_hds)
    = ceil (64 / 10) = 7, rounded up to the next multiple of 4 (4n) = 8
rec_space = LSS * num_hds = 80 bytes
[Figure: rows hd 0 to hd 9, each one logical sector wide; LSS (8 bytes)]
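The logical sector size calculation can be written out as a small function. This is a sketch under one assumption: the "4n" on the slide is read as rounding the per-head share up to the next multiple of 4 bytes. The function names and the `granule` parameter are illustrative, not from the talk.

```python
import math


def logical_sector_size(rec_len: int, num_hds: int, granule: int = 4) -> int:
    """Bytes of each record stored under one head.

    The record is split evenly across the heads, then the share is rounded
    up to a multiple of `granule` (assumed reading of the slide's "4n").
    """
    per_head = math.ceil(rec_len / num_hds)
    return math.ceil(per_head / granule) * granule


def record_space(rec_len: int, num_hds: int, granule: int = 4) -> int:
    """Total bytes a record occupies across all heads, padding included."""
    return logical_sector_size(rec_len, num_hds, granule) * num_hds


if __name__ == "__main__":
    lss = logical_sector_size(rec_len=64, num_hds=10)
    print(f"LSS = {lss} bytes, rec_space = {record_space(64, 10)} bytes")
```

With the slide's 64-byte record and 10 heads this gives an LSS of 8 bytes and 80 bytes of record space, so 16 bytes per record are padding. That matches the 16-byte reserved field in the record structure.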
Sept 23, 2002 DMI 42
modified best fit algorithm
[Figure: rows hd 0 to hd 9, LSS (8 bytes), with fields A through G mapped onto the heads]

typedef struct _record {
    char employee_no[8];   // field A
    char name[12];         // field B
    char address[24];      // field C
    char zip[5];           // field D
    char salary[6];        // field E
    char doh[6];           // field F
    char dept[3];          // field G
    char tbd[16];          // field H
} Record;
Sept 23, 2002 DMI 43
SQL Decomposition…
• Select records
  - scan the salary field
  - store ordinal position in bit vector
• Extract records
  - optimizer decides strategy (track or sector block read)
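The two phases can be illustrated on the employee table from the earlier slide. This is a minimal in-memory sketch, not the prototype's code: the table is held as Python lists per column, the bit vector is a plain integer, and `select` and `extract` are illustrative names.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Column-ordered copy of three fields from the example table (salary in K).
EMPLOYEES: Dict[str, List] = {
    "name": ["Gorbatenko", "Roth", "Anderson", "Fittingfoff", "Brubaker", "Groza"],
    "address": ["106 Wildwood Bay", "1218 First Ave", "4345 N Polaris",
                "2088 Mulberry St", "280 Meadowbrook", "3365 Broderick St"],
    "salary": [10, 8, 11, 20, 15, 24],
}


def select(column: Sequence[int], predicate: Callable[[int], bool]) -> int:
    """Scan one column and set bit i for every ordinal position i that matches."""
    bits = 0
    for i, value in enumerate(column):
        if predicate(value):
            bits |= 1 << i
    return bits


def extract(table: Dict[str, List], bits: int, fields: Sequence[str]) -> List[Tuple]:
    """Pull the requested fields for every record whose bit is set."""
    n_records = len(table[fields[0]])
    return [
        tuple(table[f][i] for f in fields)
        for i in range(n_records)
        if (bits >> i) & 1
    ]


if __name__ == "__main__":
    hits = select(EMPLOYEES["salary"], lambda s: s > 22)
    print(bin(hits))
    print(extract(EMPLOYEES, hits, ["name", "address", "salary"]))
```

Only the salary column is touched during selection; the other fields are read only for the single record whose bit is set.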
Sept 23, 2002 DMI 44
Comparison Results…
PROCESSING TIME FOR CYLINDER *

                        IDEAL   2D      TRK BUFF  4K BLOCK
Data Utility (%)        100     75      9.4       9.4
Records/cylinder (K)    10.24   8.19    10.24     10.24
Selection (%)           0.01    0.01    0.01      0.01
Records extracted       1       1       1         1
Total time (ms)         32      48      320       2720
Proc rate (K rec/sec)   320     171     32        3.8
Number of spins         2+      3       20        170

* ASSUMPTIONS:
latency = 8 ms
sectors per track = 128
num_hds = 10
random access = 16 ms
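Most rows of the comparison table can be recomputed from the stated assumptions. The sketch below assumes that one spin takes 16 ms (twice the 8 ms latency), that records are 64 bytes in the linear layouts and 80 bytes in the 2D layout (the sizes from the modified best fit slide), and that the query touches only the 6-byte salary field. These readings are inferences from the slides, and the function names are illustrative.

```python
SECTOR_BYTES = 512


def records_per_cylinder(rec_bytes: int, num_hds: int = 10, sectors_per_track: int = 128) -> int:
    """Whole records that fit in one cylinder."""
    return (num_hds * sectors_per_track * SECTOR_BYTES) // rec_bytes


def total_time_ms(spins: int, rotation_ms: float = 16.0) -> float:
    """Elapsed time if each spin costs one full rotation (assumed 2 x latency)."""
    return spins * rotation_ms


def proc_rate_k(records: int, time_ms: float) -> float:
    """Processing rate in thousands of records per second (records per ms)."""
    return records / time_ms


def data_utility_pct(useful_bytes: int, touched_bytes: int) -> float:
    """Share of the bytes read that the query actually needs."""
    return 100.0 * useful_bytes / touched_bytes


if __name__ == "__main__":
    for name, rec_bytes, spins in [("IDEAL", 64, 2), ("2D", 80, 3),
                                   ("TRK BUFF", 64, 20), ("4K BLOCK", 64, 170)]:
        recs = records_per_cylinder(rec_bytes)
        t = total_time_ms(spins)
        print(f"{name:9s} {recs:6d} rec  {t:6.0f} ms  {proc_rate_k(recs, t):6.1f} K rec/s")
    print(data_utility_pct(6, 8), data_utility_pct(6, 64))
```

Under these assumptions the output agrees with the table: the 2D layout pays for its padding with fewer records per cylinder, but reads 75% useful data instead of 9.4%.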
Sept 23, 2002 DMI 45
Prototype
[Figure 10-1: Block diagram of prototype. Blocks: DB Driver, API, Kernel, IOP, Spindle, Spindle]
• two 4 GB Seagate Barracudas
• 21 heads (29 zones)
• 40 KLOC
• skew = 5 sectors
• Solaris 2.5.1 OS
• emulated intelligence in IOP
• context switch every 60 ms
Sept 23, 2002 DMI 46
Data particulars…
• 168 byte records
• LSS = 8 bytes
• 63 records per Sector Block
• 7,749 records per cylinder
• 3 fields (2 heads) involved in query
• 2 records extracted from disjoint blocks
Sept 23, 2002 DMI 47
Test Runs

(a) write a cylinder's worth of data without optimizer
(b) write same with optimizer enabled
(c) scan cylinder involving 3 columns; extract 2 blocks
(d) repeat operation (c)
Sept 23, 2002 DMI 48
Results…

            Observed   Calculated
case (a)    2.5 sec    2.427 sec
case (b)    196 ms     216 ± 4 ms
case (c)    51 ms      54.5 ± 4 ms
case (d)    42 ms      37.6 ± 4 ms
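One way to read these results is to compare each observed time with its calculated value and stated tolerance. A small sketch using only the numbers above (the helper names are illustrative, and the slide gives no tolerance for case (a)):

```python
def relative_error_pct(observed: float, calculated: float) -> float:
    """Signed deviation of the observation from the model, in percent."""
    return 100.0 * (observed - calculated) / calculated


def within_tolerance(observed: float, calculated: float, tolerance: float) -> bool:
    """True if the observation lies inside calculated ± tolerance."""
    return abs(observed - calculated) <= tolerance


if __name__ == "__main__":
    print(f"case (a): {relative_error_pct(2.5, 2.427):+.1f}%")
    for case, obs, calc in [("b", 196, 216), ("c", 51, 54.5), ("d", 42, 37.6)]:
        print(f"case ({case}): {relative_error_pct(obs, calc):+.1f}%, "
              f"within ±4 ms: {within_tolerance(obs, calc, 4)}")
```

Only case (c) falls inside the ±4 ms band. Cases (b) and (d) miss by roughly 10%, on opposite sides of the calculated value.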
Sept 23, 2002 DMI 49
Benchmark Analysis
• 3 benchmarks selected
  - Wisconsin
  - Set Query
  - TPC D/H
• selected non-join cases
• reverse engineered the I/O detail
Sept 23, 2002 DMI 50
Wisconsin results…
[Chart: WISCONSIN BENCHMARK. x-axis: Benchmarks (Q1 to Q5); y-axis: Time (seconds), log scale from 0.1 to 1000; series: WIS, 2D]
Sept 23, 2002 DMI 51
Average time within class…
[Chart: AVERAGE TIME WITHIN CLASS. x-axis: Query Group (Q1, Q2A, Q3A, Q4A, Q5); y-axis: Average Time (seconds), 0 to 200; series: DB2, M204, 2D]
Sept 23, 2002 DMI 52
TPC Q6
LINEAL
block size    8,192    65,536
time (sec)    155      19.38     8.00 ratio
FOM           0.009    0.05      5.47 ratio

2D
lss (bytes)   8        4
time (sec)    1.95     1.59      1.23 ratio
FOM           0.454    0.639     1.41 ratio

FOM ratio     49.38    12.71
time ratio    79.52    12.22
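The ratio entries appear to be simple quotients of the times and FOM values shown. A short sketch, assuming exactly that, recomputes them from the displayed figures. The recomputed time ratios differ slightly from the slide's, presumably because the slide used unrounded inputs; the FOM ratios are left out because the displayed FOM values are rounded too coarsely to reproduce them.

```python
def ratio(numerator: float, denominator: float) -> float:
    """Plain quotient, as assumed for the 'ratio' entries in the table."""
    return numerator / denominator


# Values as displayed on the TPC Q6 slide.
LINEAL_TIME = {8192: 155.0, 65536: 19.38}    # seconds, keyed by block size
TWO_D_TIME = {8: 1.95, 4: 1.59}              # seconds, keyed by lss in bytes
TWO_D_FOM = {8: 0.454, 4: 0.639}

if __name__ == "__main__":
    print(f"lineal, small vs large block: {ratio(LINEAL_TIME[8192], LINEAL_TIME[65536]):.2f}")
    print(f"2D, lss 8 vs lss 4:           {ratio(TWO_D_TIME[8], TWO_D_TIME[4]):.2f}")
    print(f"2D FOM, lss 4 vs lss 8:       {ratio(TWO_D_FOM[4], TWO_D_FOM[8]):.2f}")
    print(f"lineal 8K vs 2D lss 8:        {ratio(LINEAL_TIME[8192], TWO_D_TIME[8]):.2f}")
    print(f"lineal 64K vs 2D lss 4:       {ratio(LINEAL_TIME[65536], TWO_D_TIME[4]):.2f}")
```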
Sept 23, 2002 DMI 53
But what about indexes?
• Depending on selection rate, may make sense.
• useful when…
  - enforcing unique key
  - SQL JOIN operation
• adds noise to a coherent system
Sept 23, 2002 DMI 54
The “S” curve
[Chart: EXTRACTION TIME vs SELECTION. x-axis: Selection (percent), log scale from 0.01% to 100%; y-axis: Number of Spins, 0 to 25; series: DEL=1, DEL=2, DEL=4, DEL=8, DEL=16]
Sept 23, 2002 DMI 55
Index vs Exhaustive Scan
[Chart: COMPARISON: INDEX vs EXHAUSTIVE SCAN. x-axis: Extraction Rate (expressed as percentage), log scale from 0.001% to 100%; y-axis: Time to Extract Records from 1M Row Table (seconds), log scale from 0.01 to 100,000; series: seq X, rand X, iso X, 2D seq, 2D rand]
Sept 23, 2002 DMI 56
Penalty for wrong guess is less
[Chart: COMPARISON: INDEX vs EXHAUSTIVE SCAN. x-axis: Extraction Rate (expressed as percentage), log scale from 0.001% to 100%; y-axis: Time to Extract Records from 1M Row Table (seconds), log scale from 0.01 to 100,000; series: seq X, rand X, iso X, 2D seq, 2D rand; annotations: 3 seconds, 7 hours]
Sept 23, 2002 DMI 57
Index Strategy
• Use on unique fields
• consider for JOIN operations
• evaluate based on knowledge of data
• AVOID where no information is available about data
  - penalty is bounded
Sept 23, 2002 DMI 58
Intelligent I/O Processor
[Figure: IOP block; labels: 200 MB/s, 20 KC, 36 MB/s]
Sept 23, 2002 DMI 59
• I/O ops are atomic to the cylinder
• Information contained in a “clip”
  - spindle & cylinder number & LSS
  - data
  - number records
  - scan info (selection criteria)
  - command
  - fields of interest (project)
  - read/write/scan
• Data and bit (beta) vector returned
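The contents of a clip can be pictured as a small request record. The sketch below is hypothetical: the slide lists what a clip carries but not its layout, so every field name, type and default here is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Command(Enum):
    """The three operations named on the slide."""
    READ = "read"
    WRITE = "write"
    SCAN = "scan"


@dataclass
class Clip:
    """Hypothetical request unit; one clip addresses exactly one cylinder."""
    spindle: int
    cylinder: int
    lss: int                                   # logical sector size in bytes
    command: Command
    num_records: int = 0
    fields_of_interest: List[str] = field(default_factory=list)  # projection
    scan_info: Optional[str] = None            # selection criteria, free-form here
    data: bytes = b""                          # payload for writes

    def validate(self) -> None:
        """Reject clips that cannot be executed as described."""
        if self.lss <= 0:
            raise ValueError("lss must be positive")
        if self.command is Command.SCAN and not self.scan_info:
            raise ValueError("a scan needs selection criteria")
        if self.command is Command.WRITE and not self.data:
            raise ValueError("a write needs data")


if __name__ == "__main__":
    clip = Clip(spindle=0, cylinder=17, lss=8, command=Command.SCAN,
                num_records=7749, fields_of_interest=["name", "salary"],
                scan_info="salary > 22K")
    clip.validate()
    print(clip)
```

Keeping the selection criteria and projection inside the request is what would let the IOP filter in place and return only the data and bit vector.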
Sept 23, 2002 DMI 60
IOP Characteristics…
• Manages own memory and
  - tracks are cached
• DSP is ideal building block
  - columns, vectors
• Has no semantic awareness of data
  - offsets and widths
• Process data in real time
• RISC type
  - executes simple jobs quickly
Sept 23, 2002 DMI 61
Parallel Architecture
[Figure: HOST PROCESSOR connected to IOP 1, IOP 2, IOP 3, … IOP n]
10 MZ / .020 = 500 IOPs
500 IOPs => 4K spindles
Sept 23, 2002 DMI 62
Comments…
• Issue becomes being able to evenly distribute data
• DB vendors have to pass the info down
• Disk OEMs… capacity vs intelligence
• Performance is scalable
• Evaluate performance of indexes in new light…
  - could introduce noise in otherwise coherent system
Sept 23, 2002 DMI 63
Summary
• Object Based Design permits local optimization with minimum compromise
• 2D mapping
• Synchronous disk
• Granular access is key
• Intelligent IOP (real time processing)
• For I/O limited applications… significant performance gains are possible
Sept 23, 2002 DMI 64
Challenge and future work
• Model a more rigorous system
• Explore other applications
  - spatial
  - scientific
• Integrate concept in a communication model
• add disk instrumentation
• define role for disk