Upload
hillary-smith
View
217
Download
2
Tags:
Embed Size (px)
Citation preview
Oracle Index study for Event TAG DB
M. Boschini [email protected]. Della Torre [email protected]
Outline TAG DB issue:
space usage issue INDEX accounting for ~ 50% of used space...
query performance Tests with random data Tests with AMS01 data Tests with AMS02 Cosmic data
M. Boschini Oracle Index Study for STATUS DB – TIM October 2008 - CERN
EVENT TAG DB Goal: test feasibility of storing in an Oracle DB TAG
info for AMS02 events TAG is a 64bit number AMS02 MC has, as for now, no meaningful TAG info
(0 or error bit) We start from previous studies:
http://ams.cern.ch/AMS/Reports/AMSnote-2000_09_05b.ps.gz
http://ams.cern.ch/AMS/Reports/Computing/Apr2004/M.Boschini_LV3_DB.pdf
EVENT TAG DB
Test INDEX type space usage Test INDEX usage Time insert & selections What can we use ?
test environment SLC 4.6 (64bit) 2 dual core Intel(R) Xeon(R) CPU @ 2.00GHz Oracle 11g 64 bit
we opted for Oracle11g because of product lifetime All data is fed to DB using C-OCI programs Oracle setup:
user AMSDES with dedicated Bigfile tablespace overkill wrt Oracle's suggestion (>= 1 TB) ...?
default TEMP TBLSpace (3 GB)
test environment Dummy Table:
RUN (NUMBER) EVENT (NUMBER) TAG (BINARY_DOUBLE)
TAG field will be used for indexing ...assuming queries will be mainly on TAG
10^8 records, equivalent to ~ 10 days of DAQ Test B-Tree vs BitMap Index
as hinted by our previous studies...
Indexes (theory) Btree
default index type in Oracle very space consuming. Not advised for dataware housing
dataware house: write few, read many, HUGE data sample BitMap
advised for dataware housing Bitmap indexes are stored as an array of zero-and-one
values, with one entry for each row. Should be used only on low cardinality fields
Indexes (theory) BitMap
From Oracle Advanced Programming White Paper, Sep. 2008
“...Conventional wisdom holds that bitmap indexes are most appropriate for columns having low distinct values—such as GENDER, MARITAL_STATUS. This assumption is not completely accurate, however. In reality, a bitmap index is always advisable for systems in which data is not frequently updated by many concurrent systems...”
Tests Items to be tested are:
SPACE OCCUPANCY = f(index type) QUERY PERFORMANCE = f(index type)
Index types: B-Tree and Bitmap We started with a random generated sample We then used an AMS01 sample
40 ntuples no request on charge, pmass, pmom, lat (nothing!)
AMS02 Cosmic data
Disk Space TABLE SIZE = 5 GB
this means that for AMS02 we'll need 500 GB B-Tree INDEX SIZE = 4.8 GB
this means that for AMS02 we'd need 480 GB BitMap INDEX SIZE = 70 MB
this means that for AMS02 we'd need 7 GB thus, at least for space reasons, we should use
BitMap INDEX, a factor 30 smaller.
when to create an INDEX After TABLE has been populated ! B-Tree INDEX creation time:
4 µsec/record mainly SORT and disk space allocation
BitMap INDEX creation time: 0.5 µsec/record
Theory: is an INDEX useful ? Not always... ORACLE has a built-in decision taking algorithm
(EXECUTION PLAN) which decides how to actually implement a query.
Execution plan tries to optimize disk accesses and CPU usage.
So, e.g., if reading an index which returns “many” records takes more than X, than the index is not used.
These are based, since Oracle 10.0.2i on Cost Based Optimizer and Dynamic Sampling (by default ON)
EXECUTION PLANPLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3591811347
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 36793 | 503K| 297 (1) | 00:00:04 |
| 1 | TABLE ACCESS FULL | N_TAG_RAND_NOISE | 36793 | 503K| 297 (1)| 00:00:05 |
EXECUTION PLANPLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3591811347
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 36793 | 503K| 297 (1) | 00:00:04 |
| 1 | TABLE ACCESS BY INDEX ROWID| N_TAG_RAND_NOISE | 36793 | 503K| 297 (1)| 00:00:04 |
|* 2 | INDEX RANGE SCAN | TAG_RAND_IDX | 36793 | | 106 (0)| 00:00:02 |
IDX usage Test To test if Index is actually used, we wrote a simple
PLSQL+Perl program that Dynamically populates statistics table
HOW-MANY <--> TAG_STA For each TAG_STA value, generates and analyzes
EXECUTION PLAN for SELECT run, event from TABLE wher tag=TAG_STA;
We thus run CARDINALITY queries to DB and analyze the Optimizer decisions...
Sample1 CARDINALITY:5relative population > 5.7%
Sample2 CARDINALITY:2517relative population: 6x10E-4
Performance Sample1 and Sample2 both do NOT use INDEX in the
select query. This is because Oracle's built-in optimization algorithm
discovers that too many records will be returned, and thus decides to ignore the INDEX
Thus: LOW/MEDIUM cardinality with HIGH relative
population makes INDEX uneffective
Sample3 CARDINALITY:1024relative population 6x10E-5
Performance Sample3 uses INDEX ! This is because Oracle's builtin optimization algorithm
discovers that TABLE ACCESS BY INDEX ROWID is efficient.
Thus LOW/MEDIUM cardinality with “correct for
Oracle” relative population makes INDEX effective !
Real Data: AMS01 We then used an AMS01 sample
40 ntuples (no request on charge, pmass, pmom, ecc) “scrambled” this sample to get 10^8 records
Average relative cardinality = 2x10E-6 BitMap Index is USED ! Selection time
no IDX: 22 sec IDX: 0.05 sec
INDEX really effective
Real Data: AMS02 !!! We also used AMS02 Cosmic Rays data
we used only 2 2x10E8 events no request on charge, pmass, pmom, ...(nothing!)
Table size: 5.5 GB 2 BitMap IDX size: 57 MB 2
just for the sake of curiosity: a UNIQUE requirement on RUN+EVENTNO creates an INDEX 5 GB big !!!
Btree IDX size: 4.2 GB 2
Real Data: AMS02 !!! Mean Relative cardinality: 1.8x10E-4 Max Relative cardinality: 3x10E-3 BitMap Index can be USED !
tested EXECUTION PLAN for all TAG values.
Real Data: AMS02 !!! Selection time
no IDX: Average Selection time: 24 sec 0.01 sec/record once retrieved
IDX: Average Selection time: 0.05 sec 0.005 sec/record once retrieved
Again, B-Map index is very effective.
naïve usage... The most naïve way to design the DB is 1 table for
each month of DAQ. This leads to ~36 tables. Most naïve way to query them all is using a UNION
statement which still uses index with nearly no overhead for the SORT
UNIQUE/UNION-ALL part...
PLAN_TABLE_OUTPUT------------------------------------------------------------------------------------------------------------------------------------Plan hash value: 2624995586
----------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |----------------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 44 | 1408 | 290K (50)| 00:58:03 || 1 | SORT UNIQUE | | 44 | 1408 | 290K (50)| 00:58:03 || 2 | UNION-ALL | | | | | || 3 | TABLE ACCESS BY INDEX ROWID | COSMIC_TDV | 42 | 1344 | 145K (1)| 00:29:10 || 4 | BITMAP CONVERSION TO ROWIDS| | | | | ||* 5 | BITMAP INDEX SINGLE VALUE | COMIX_TDV | | | | || 6 | TABLE ACCESS BY INDEX ROWID | COSMIC_TDV1 | 2 | 64 | 144K (1)| 00:28:53 || 7 | BITMAP CONVERSION TO ROWIDS| | | | | ||* 8 | BITMAP INDEX SINGLE VALUE | COMIX_TDV1 | | | | |----------------------------------------------------------------------------------------------
44 rows returned out of 377,516,896 rows in total
Rough estimate for SELECT on 3 years: 25 minutes...
Conclusions In general, Bit-Map index is very efficient in space usage
30 times smaller than normal Btree index Index is used in queries if relative cardinality is low If so, selection time is very low. According to AMS01 and AMS02 Cosmic Rays
analyzed, BitMap index can be used We could thus use it for AMS02 expected space usage: 500 GB (table)+ 6 GB (idx)
EventStatusTable02 files will use ~ 150 GB...
AMS01 sample CARDINALITY:89860
average relative population 5.7%