An open source DBMS for handheld devices
Stage 2
by Rajkumar Sen
IIT Bombay
Under the guidance of
Prof. Krithi Ramamritham
04/13/23 An open source DBMS for handheld devices
Outline
• Introduction
• Storage Management
• Query Processing
• Future Work
Introduction
Stage 1: Survey of
– Storage Models: Flat Storage, Domain Storage, and Ring Storage
– Query Processing issues
– Data Synchronization
– Concurrency Control and Recovery
Goals for Stage 2
– New storage models to further reduce storage cost
– Memory cognizant query processing
– Data Synchronization issues
– System Implementation issues
Storage Management
Aim at compactness in the representation of data
Existing storage models
– Flat Storage
– Pointer-based Domain Storage
Domain Storage uses a pointer of size p (typically 4 bytes) to point to the domain value. Can we further reduce the storage cost?
Storage Management
ID Storage:
– An identifier for each of the domain values
– The identifier is the ordinal value in the domain table
– Store the identifier instead of the pointer
– Use the identifier as an offset into the domain table
– Extendable IDs: the length of the identifier grows and shrinks depending on the number of domain values
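The ID Storage idea can be sketched in a few lines. This is an illustrative sketch only, not the system's implementation: the class name `IdColumn`, the reverse-lookup dictionary, and the sample city values are all assumptions made for the example.

```python
class IdColumn:
    """One attribute stored as small integer IDs plus a domain table (hypothetical)."""

    def __init__(self):
        self.domain = []   # domain table: the ID is the ordinal position of the value
        self.id_of = {}    # reverse lookup, an assumption used only to speed up inserts
        self.ids = []      # one ID per tuple, in tuple order

    def append(self, value):
        # Reuse the existing ID if the value is already in the domain table
        if value not in self.id_of:
            self.id_of[value] = len(self.domain)
            self.domain.append(value)
        self.ids.append(self.id_of[value])

    def value_at(self, row):
        # The stored ID is used as an offset into the domain table
        return self.domain[self.ids[row]]


city = IdColumn()
for v in ["Mumbai", "Pune", "Mumbai", "Delhi", "Pune"]:
    city.append(v)
```

Each distinct value is stored once, and every tuple carries only a small integer.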
Storage Management
D domain values can be distinguished by identifiers of length $\lceil \log_2 D / 8 \rceil$ bytes (rounded up to whole bytes).
Starting with 1-byte identifiers, the length grows and shrinks.
ID values are projected out from the rest of the relation and stored separately, maintaining Positional Indexing.
Why not bit identifiers?
– Storage is byte addressable.
– Packing bit identifiers in bytes increases the storage management complexity.
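As a rough illustration of the identifier length rule, the helper below computes the number of whole bytes needed for D domain values. The function name `id_width_bytes` and the 1-byte floor are assumptions for the example (the floor matches "starting with 1 byte identifiers").

```python
def id_width_bytes(num_domain_values):
    """Whole bytes needed to give each of D domain values a distinct ID (at least 1)."""
    if num_domain_values < 1:
        raise ValueError("need at least one domain value")
    # (D - 1).bit_length() equals ceil(log2(D)) for every D >= 1
    bits = (num_domain_values - 1).bit_length()
    # Round the bit count up to whole bytes, never going below 1 byte
    return max(1, (bits + 7) // 8)
```

The width changes only at the boundaries D = 256, D = 65,536, and so on. Those are the points where stored ID values would have to be reorganized.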
Storage Management
Figure: ID Storage. Relation R holds ID values (0, 1, 2, 1, …, n) that refer, through Positional Indexing, to the Domain Values v0, v1, …, vn.
Storage Management
Ping Pong Effect
– At the boundaries, ID values are reorganized when the identifier length changes
– Frequent insertions and deletions at the boundaries might result in a lot of reorganization
– This phenomenon should be avoided
No deletion of Domain values
– Domain structure means a future insertion might reference the deleted value
– Do not delete a domain value even if it is not referenced
Setting a threshold for deletion
– Delete only if the number of deletions exceeds a threshold
– Increase the threshold when boundaries are being crossed
Storage Management
Primary Key-Foreign Key relationship
– Primary key: a domain in itself
– IDs for primary key values
– Values present in the child table are the corresponding primary key IDs
– Projected foreign key column forms a Join Index
Figure: Primary Key-Foreign Key Join Index. The S.BID values of the child table (Relation S) are IDs (0, 1, 2, 1, …, n) that refer to the rows v0, v1, …, vn of the parent table (Relation R).
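A minimal sketch of how a projected foreign key column can act as a join index, assuming the parent row position is the primary key ID. The table contents and the function name `join_child_to_parent` are hypothetical.

```python
# Parent table R: the row position is the primary key ID (assumption for this sketch)
parent_rows = [("B0", "Books"), ("B1", "Music"), ("B2", "Games")]

# Child table S, with the foreign key column projected out and stored as parent IDs
child_rows = [("s0",), ("s1",), ("s2",), ("s3",)]
child_fk_ids = [0, 2, 2, 1]


def join_child_to_parent(child, fk_ids, parent):
    """Join by direct positional lookup; no search or hashing is needed."""
    return [c + parent[pid] for c, pid in zip(child, fk_ids)]


joined = join_child_to_parent(child_rows, child_fk_ids, parent_rows)
```

Each child tuple reaches its parent tuple with a single array access, which is why the join between parent and child table is cheap under this model.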
Storage Management
ID based Storage wins over Domain Storage when $p > \lceil \log_2 D / 8 \rceil$
Relations in a small device do not have a very high cardinality, so the above condition is true for most of the data.
Advantages
(i) Considerable saving in storage cost.
(ii) Efficient join between parent table and child table
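A worked example with hypothetical numbers makes the condition concrete. Assume a column of $N = 10{,}000$ tuples over $D = 200$ distinct values and a pointer size of $p = 4$ bytes. The domain table itself is needed in both models, so only the per-tuple cost is compared:

$$\text{Domain Storage: } N \cdot p = 10{,}000 \times 4 = 40{,}000 \text{ bytes}$$

$$\text{ID Storage: } N \cdot \lceil \log_2 D / 8 \rceil = 10{,}000 \times 1 = 10{,}000 \text{ bytes}$$

Under these assumptions the ID column is a quarter of the size of the pointer column. With $p = 4$, the condition holds as long as identifiers fit in 3 bytes, that is, for $D \le 2^{24}$.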
Storage Management
Bitmap Storage
– Useful when the number of domain values is very small compared to the number of tuples, e.g., True, False
– Selection on multiple attributes
A Data + Index Model
– A bitmap index is created for every bitmap attribute
– Attribute values are not stored in the base relation
– The index can be used to retrieve the domain value of each tuple
Cost of Projection becomes high, as is the case with Ring Storage
A join index between parent table and child table is possible by storing bitmaps for every primary key value
Storage Management
• Bitmap Storage is not an alternative to Ring Storage
• Indexing capabilities of both models are different
• Depending on attribute characteristics, choose the appropriate model
Memory requirement for selection
– Number of bit vectors is equal to the number of attributes that form part of the selection
– Bit vectors are held in memory
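Selection over bitmap attributes can be sketched as follows. Python integers stand in for bit vectors here, which is an assumption of this example rather than the storage layout of the system. The attribute names and data are hypothetical.

```python
def build_bitmap_index(column):
    """Return {value: bit vector}, where bit i is set if tuple i has that value."""
    index = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def rows_of(bits):
    """Decode a bit vector into the list of matching tuple positions."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


in_stock = build_bitmap_index([True, False, True, True, False])
colour = build_bitmap_index(["red", "red", "blue", "red", "blue"])

# Selection on two attributes: one bit vector per attribute, combined with AND
matches = rows_of(in_stock[True] & colour["red"])
```

Only one bit vector per attribute in the selection has to be in memory at a time, in line with the memory requirement stated above.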
Query Processing
Considerations
– Minimize writes to secondary storage
– Efficient usage of limited main memory
– Read buffer not required
– Main memory as write buffer
– If the read:write ratio is very high, flash memory as write buffer
Query Plan
– An optimal query plan is needed
– Reduce materialization; if absolutely necessary, use main memory
– Bushy trees and right-deep trees are ruled out
– Left-deep tree is most suited for pipelined evaluation
– Right operand in a left-deep tree is always a stored relation
– Only one input is pipelined
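The pipelined left-deep plan described above can be sketched with Python generators. This is an illustration under assumed data: the relations, the helper names `scan` and `nested_loop_join`, and the join predicates are hypothetical.

```python
def scan(relation):
    """Leaf of the plan: stream the tuples of a stored relation one at a time."""
    for row in relation:
        yield row


def nested_loop_join(left, stored_right, match):
    """Left input is pipelined; right input is a stored relation rescanned per tuple."""
    for left_row in left:
        for right_row in stored_right:
            if match(left_row, right_row):
                yield left_row + right_row


# Stored relations (hypothetical data)
orders = [(1, "alice"), (2, "bob")]
items = [(1, "pen"), (1, "ink"), (2, "pad")]
prices = [("pen", 5), ("ink", 7), ("pad", 3)]

# Left-deep plan: (orders JOIN items) JOIN prices, with no intermediate result stored
step1 = nested_loop_join(scan(orders), items, lambda a, b: a[0] == b[0])
plan = nested_loop_join(step1, prices, lambda a, b: a[3] == b[0])
result = list(plan)
```

Because every right operand is a stored relation, each join pulls one tuple at a time from its left input and nothing is materialized between operators.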
Query Processing
Memory Allocation to Operators
– Main memory is limited; we cannot assume that the entire memory is available for every operator in the left-deep tree plan
– Can the plan be executed with the available memory?
If nested loop algorithms are used for every operator, the minimum amount of memory is needed to execute the plan
– Nested loop algorithms are inefficient
– Should memory usage be reduced to a minimum at the cost of performance?
– Memory is increasing with every new device
– Different devices come with different memory sizes
– Query plans should make extensive use of memory
– Memory must be optimally allocated among all operators
Query Processing
Operator evaluation schemes
– Different schemes for an operator
– All have different memory usage and cost
– Schemes conform to the left-deep tree query plan
– Cost of a scheme is the computation time
Schemes for Join
– Nested Loop Join
– Indexed Nested Loop Join
– Hash Join
Similar schemes for other operators
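To illustrate how a scheme trades memory for computation time, here is a minimal hash join sketch that still fits a left-deep plan (pipelined left input, stored right input). The function signature and the sample data are assumptions for the example.

```python
from collections import defaultdict


def hash_join(left, stored_right, left_key, right_key):
    """Build a hash table on the stored relation, then probe it with pipelined tuples."""
    table = defaultdict(list)
    for right_row in stored_right:          # extra memory: one entry per right tuple
        table[right_row[right_key]].append(right_row)
    for left_row in left:                   # one probe per left tuple, no rescans
        for right_row in table.get(left_row[left_key], []):
            yield left_row + right_row


orders = [(1, "alice"), (2, "bob"), (3, "carol")]
items = [(1, "pen"), (2, "pad"), (1, "ink")]
result = list(hash_join(iter(orders), items, 0, 0))
```

Compared with a nested loop join, this scheme needs memory proportional to the stored relation but avoids rescanning it for every left tuple.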
Query Processing
Benefit/Size of a scheme
Every scheme is characterized by a benefit/size ratio, which represents its benefit per unit memory allocation.
The minimum scheme for an operator is the scheme that has maximum cost and minimum memory.
Assume n schemes $s_1, s_2, \ldots, s_n$ to implement an operator $o$:
$$\min(o) = s_{\min}, \qquad \forall i,\ 1 \le i \le n:\ \mathrm{Cost}(s_i) \le \mathrm{Cost}(s_{\min}),\ \mathrm{Memory}(s_i) \ge \mathrm{Memory}(s_{\min})$$
$s_{\min}$ is the minimum scheme for operator $o$. Then,
$$\mathrm{Benefit}(s_i) = \mathrm{Cost}(s_{\min}) - \mathrm{Cost}(s_i)$$
$$\mathrm{Size}(s_i) = \mathrm{Memory}(s_i) - \mathrm{Memory}(s_{\min})$$
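The benefit and size definitions can be checked with a small sketch. The scheme names and the memory and cost figures below are hypothetical, as is the function name `size_benefit_points`.

```python
def size_benefit_points(schemes):
    """Map {name: (memory, cost)} to {name: (size, benefit)} relative to the minimum scheme."""
    # Minimum scheme: least memory (and, by the assumption above, highest cost)
    min_name = min(schemes, key=lambda s: schemes[s][0])
    min_mem, min_cost = schemes[min_name]
    return {
        name: (mem - min_mem, min_cost - cost)
        for name, (mem, cost) in schemes.items()
    }


# Hypothetical join schemes: (memory in KB, cost in ms)
join_schemes = {
    "nested_loop": (4, 900),
    "indexed_nested_loop": (16, 300),
    "hash": (64, 120),
}
points = size_benefit_points(join_schemes)
```

The minimum scheme always maps to the point (0, 0); every other scheme is described by how much extra memory it needs and how much cost it saves.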
Query Processing
An operator is defined by the benefit and size of its schemes.
Every operator is a collection of (size, benefit) points: n points for n schemes.
Figure: (Size, Benefit) points for an operator. Axes: Size, Benefit; plotted points: (0,0), (s1,b1), (s2,b2).
Query Processing
Optimal Memory Allocation
Determine the amount of memory allocated to each operator to get maximum benefit
2-Phase Approach
Phase 1: Query is first optimized to get a query plan
Phase 2: Division of memory among the operators
The scheme for every operator is determined in phase 1 and remains unchanged after phase 2; memory allocation in phase 2 is done on the basis of the cost functions of the schemes
Memory is assumed to be available for all the schemes; this may not be true for a resource-constrained device
Query Processing
Depending on the available memory, we need to determine the best scheme for every operator out of all possible ones
Schemes in phase 1 and after phase 2 need not be the same
Optimal division of memory involves the decision of selecting the best scheme for every operator
Query Processing
Our Solution
– We use a heuristic to determine which operator gains the most per unit memory allocation and allocate memory to that operator
– Gain of every operator is determined by its best possible scheme
– Repeat the process till memory allocation is done
Heuristic:
Select the scheme that has the maximum benefit/size and allocate its memory
Query Processing
MemAllocate(MTotal) {
1. Mmin = Σ_{i=1}^{m} Memory(min(i))
2. for i = 1 to m do
3.     Scheme(i) = min(i)
4. end for
5. Mavail = MTotal - Mmin
6. sbest, obest = GetBestScheme(Mavail)
7. if no best scheme then return
8. else {
9.     Mavail = Mavail - Memory(sbest) + Memory(Scheme(obest))
10.    Scheme(obest) = sbest
11.    RemoveSchemes(sbest, obest)
12.    RecomputeBenefits(sbest, obest)
13. }
14. goto step 6
}
Complexity = O(nm^2), m = no. of operators, n = no. of schemes
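The pseudocode can be turned into a runnable sketch along the following lines. It is one possible reading, not the authors' code: the input layout `{operator: {scheme: (memory, cost)}}`, the choice to return `None` when even the minimum schemes do not fit, and the sample figures are assumptions. Benefits are measured from each operator's current scheme, which plays the role of `RemoveSchemes` and `RecomputeBenefits`.

```python
def mem_allocate(operators, m_total):
    """Greedy allocation over {op: {scheme: (memory, cost)}}.

    Returns ({op: scheme}, leftover memory), or None if the minimum schemes do not fit.
    """
    # Steps 1-5: start every operator on its minimum-memory scheme
    chosen = {op: min(s, key=lambda k: s[k][0]) for op, s in operators.items()}
    m_avail = m_total - sum(operators[op][chosen[op]][0] for op in operators)
    if m_avail < 0:
        return None

    while True:
        best = None  # (ratio, operator, scheme, extra memory)
        # Step 6: find the upgrade with the highest benefit per unit of extra memory
        for op, schemes in operators.items():
            cur_mem, cur_cost = schemes[chosen[op]]
            for name, (mem, cost) in schemes.items():
                extra, gain = mem - cur_mem, cur_cost - cost
                # Skip schemes that are not upgrades or that do not fit
                if extra > 0 and gain > 0 and extra <= m_avail:
                    ratio = gain / extra
                    if best is None or ratio > best[0]:
                        best = (ratio, op, name, extra)
        if best is None:          # Step 7: nothing left to improve
            return chosen, m_avail
        _, op, name, extra = best
        chosen[op] = name         # Steps 9-10: pay only the extra memory
        m_avail -= extra


# Hypothetical plan with two joins: (memory in KB, cost in ms)
operators = {
    "join1": {"nested_loop": (4, 900), "indexed": (16, 300), "hash": (64, 120)},
    "join2": {"nested_loop": (4, 500), "hash": (32, 100)},
}
allocation = mem_allocate(operators, 60)
```

With 60 KB available, the sketch first upgrades `join1` to the indexed scheme (ratio 50), then `join2` to the hash scheme, and stops because the hash scheme for `join1` no longer fits.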
Query Processing
Recomputation of Benefits
Once the operator obest gets memory Memory(sbest), the benefit and size of all the schemes of obest that have higher memory than sbest change.
New benefit and size values will be the difference between their old values and those of sbest.
Figure: Benefit and Size Recomputation. Axes: Size, Benefit; plotted points: (0,0), (s1,b1), (s2,b2). Scheme 1 has the highest benefit/size ratio, so after it is chosen:
Benefit(Scheme 2) = (b2 - b1)
Size(Scheme 2) = (s2 - s1)
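A worked example with hypothetical values shows the effect of recomputation. Suppose $(s_1, b_1) = (12, 600)$ and $(s_2, b_2) = (60, 780)$. Before any allocation the ratios are

$$\frac{b_1}{s_1} = \frac{600}{12} = 50, \qquad \frac{b_2}{s_2} = \frac{780}{60} = 13,$$

so Scheme 1 is chosen first. Scheme 2 is then measured relative to Scheme 1:

$$\mathrm{Size}(\text{Scheme 2}) = s_2 - s_1 = 48, \qquad \mathrm{Benefit}(\text{Scheme 2}) = b_2 - b_1 = 180, \qquad \frac{180}{48} = 3.75$$

Under these assumed numbers, the ratio of Scheme 2 drops from 13 to 3.75, so another operator with a better ratio may be served before this operator is upgraded again.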
Query Processing
1-Phase Approach
The 2-phase solution optimally allocates memory to all the operators in the query plan. However, the plan itself might be suboptimal for the given available memory.
The 1-phase approach takes into account memory division among operators while choosing between plans.
Ideally, 1-phase optimization should be done, but the optimizer becomes complex.
Future Work
Implementation Status
1. Flat Storage, Domain Storage, Ring Storage, and ID Storage
2. Join algorithms
Future Work
– Bitmap Storage implementation
– Algorithms for aggregation
– Query optimizer and the iterator
– Test using sample relations and data from handheld apps
– Examine the feasibility of a 1-phase optimizer
– Database Module Toolkit
– An operator that returns first-k results of a query
– Application specific DBMS
Thank You