Improved Query Performance With Variant Indexes - review presentation

Improved Query Performance With Variant Indexes

paper by Patrick O’Neil & Dallan Quass

Presented by V B Wickramasinghe - 148245F

Overview

•What are database indexes?•Value-List Index (B+ tree)•Bitmap Indexes•Projection Indexes•Bit-Sliced Indexes•Comparison of Index Types for different queries.•Conclusion

What are database indexes?

“A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and the use of more storage space to maintain the extra copy of data.”- Wikipedia

http://en.wikipedia.org/wiki/Data_structure

http://en.wikipedia.org/wiki/Table_(database)

Value-List Index (B+ tree)Used to retrieve rows of a table with specified values involving one or more indexed columns.Leaf values of the B+ tree consists of a sequence of entries for index key values. Each key value entry references the set of rows with that value.

----usually B+ trees references each row as RID(Row ID)

Problem- In indexes with a relatively small number of key values compared tothe number of rows, most key values will have large number of associated RID’s

Bitmap Indexes

•Bitmap is an alternate method of representing RID in a Value-List index A “Bitmap B” is defined on table T as a sequence of M bits, where for each row with ordinal number j, we set the jth bit in B if that row satisfies the property of the index.

Bitmap Indexes (contd..)Bitmap Index Performance :-• boolean operations such as AND, OR, and Not are extremely fast for

Bitmaps --- AND

for ( I=0 ; I < len(B1) ; I++) B3[I] = B1[I] & B2[I]; --- OR for ( I=0 ; I < len(B1) ; I++) B3[I] = B1[I] || B2[I]; ---NOT for ( I=0 ; I < len(B1) ; I++) B3[I] = ~B1[I] & EBM[I];

These loops to calculate AND,OR, or NOT are extremely fast compared to same done on RID lists as long as the Bitmaps involved have reasonably

high density.

Advantaged of Bitmap Indexes

•Fast operations

—The most common operations are the bitwise logical operations

—They are well supported by hardware

•Easy to compress, potentially small index size.

•Each individual bitmap is small and frequently used ones can be cached in memory.

•Efficient for read-mostly data: data produced from scientific experiments can be appended in large groups.

•Bitmaps are more space efficient than RID lists in a Value-List index.

•Disk savings are enormous in case when Bitmaps are dense i.e number of one bits as a proportion of all bits in a Bitmap are large.

Projection Index

A projection index for column duplicates all column values for lookup by ordinal number .

Projection Index (contd..)

----projection index turns out to be quite useful in cases where column values must be retrieved for all rows of the foundset, where the foundset is dense enough such that several column values would probably be found on the same disk page of the projection index, but rows themselves being larger, would appear on several different pages

----faster to retrieve a foundset of column value from projection indexthan from rows themselves

Bit-sliced IndexA Bit-Sliced index for a column creates Bitmaps for significant bits in the column value,

Bnn – bitmap representing set of non null values in the indexed column

Comparison of Indexes evaluating Single-Column Sum Aggregates

Plan 1 : Direct access to the rows to calculate the Sum -- costs includes I/O cost and CPU cost for getting proper row and column value from buffer resident page.

Plan 2 : Calculating Sum through a Projection Index Plan 3 : Calculating the Sum through a Bitmap Index

if (COUNT (Bf AND Bnn) = = 0) Return null;

SUM = 0.0; for each non-null value v in the index for C { Designate the set of rows with value v as Bv SUM += v * COUNT(Bf AND BV); }

Return SUM;


Plan 4 : Calculating the SUM through a Bit-Sliced Index

if (COUNT (Bf AND Bnn) = = 0) Return null;SUM = 0.0;for i = 0 to N SUM += 2i * COUNT(Bi AND Bf);Return SUM;


Evaluating Range PredicatesSELECT target-list FROM T WHERE C-range AND <condition>

Plan 1 : Evaluating the Range using the Projection Index :- --- we create BF in a straightforward way by accessing each C value in the index corresponding to an ordinal row number in Bf and testing whether it lies with the specified range

Plan 2 : Evaluating range predicates using Value-List index :- Br = empty set; for each entry v in the index for C that satisfies the range specified Designate the set of rows with the value v as Bv Br = Br OR Bv BF = Bf AND Br

Evaluating OLAP-style Queries

Data warehouses are often built to support OLAP. OLAP query performance depends on creating a set of summary tables to efficiently evaluate an expected set of queries. This approach is possible only when the expected set of queries is known in advance. Specifically, the OLAP Approach addresses queries that group by different combination columns, known as dimensions. But when ad-hoc queries must be issued that selects the rows on the criteria that are not part of the dimensional scheme, summary tables that don’t foresee such selection cannot be used and in such cases other indexes on the base data must be used.

➢ Join Indexes and Bitmap-Join-Indexes are tried to address this problem.

Join IndexesA join index is an index on one table that involves a column value from different table through a commonly encountered join.

but, if there are numerous columns used for restrictions in each dimension table, then the number of star join indexes needed to be able to combine any single column choice from each dimension table is a product of the number of columns in each dimension. There will be a combinatorial explosion of join indexes in terms of the number of useful columns.

✓ bitmap join index addresses this problem

Bitmap Join Index

It is an index on a table T based on a single column of a table S, where S commonly joins with T in a specified way.

Obviously the number of indexes of this kind increases linearly with the number of useful columns in the dimension tables, but the speed of combining Bitmapped indexes to create ad-hoc combinations takes care of the problem of explosion of star join indexes.

Segmentation: It is one of the methods used for speed up the query with one or more group-by attributes.Clustering: This is clustering for the finely divided fact table F to improve the performance. Not only queries on fact tables, it also improves group-by queries on the dimensions.

Improved Grouping Efficiency Using Segmentation & Clustering

● Read-mostly environment of data warehousing has made it feasible to use more complex index structures to speed up the evaluation of queries.

● The paper shows how ad-hoc OLAP style queries involving aggregation and grouping can be efficiently evaluated using indexing and clustering.

● The paper also introduces a new index type, Groupset indexes, that are especially well-suited for evaluating OLAP type of query.

Conclusion

Engineering

Improved Query Performance With Variant Indexes - review presentation