Data Partitioning in VLDB

Data Partitioning in VLDB

Tal [email protected]

Why am I here?

Tal Olier – [email protected]

~15 years in various software development positions. All of them involved database practice.

I work in HP Software, I love working there and I came to tell you this; the lecture is just an excuse getting me into the building :)

mailto:[email protected]

Agenda

• RDBMS in short (basic terms)• SQL reminder• A bit about (RDBMS) architecture• Performance - access paths • What is table join• VLDB - the size factor• VLDB - industry practice• How joins are executed• Summary

Relational Database Management System

A little history

• Was invented in 1970– By Edgar Frank "Ted" Codd– In IBM labs– Oracle emerged first to the market

Basics – a table

• Rows • Columns• Primary key

Emp_id Emp_name Salary

1 Dany 10,000

2 Yosi 20,000

3 Moshe 30,000

4 Eli 40,000

Basics – a relation

• A foreign key (constraint)• A reference

– Source table– Source column/s– Target table– Target column/s

People example

• People: name, height, smoking, father• Books read: title, author• Schedule details: from, to, activity• Resume details: from, to, salary

People example

Structured Query Language

Query language

• SQL – Structured Query Language• Declarative (vs. procedural)• Requires Internal optimization

SELECT query structure

• SELECT • FROM… JOIN• WHERE • GROUP BY• HAVING• ORDER BY

SQL modules

• DML (+Select) – Data manipulation language• DDL – Data definition language• TC – Transaction controls (commit/rollback)• DCL – Data control language (grant/revoke)• PE – Procedural extensions

A bit about architecture

Database server

MemoryProcess

I/O System

Client Process

Data Files Log Files

Server ProcessBuffer cache Log

cacheOther cache

Everything is blocks

IO bound vs. CPU bound

• CPU – what is it consumed for?• IO – what is it consumed for?

Performance?

FTS – full table scan

• Scan the whole table – from top to bottom200720082009201020072008200920102007200820092010

B Tree index

• B tree – allows great spanning that derives small tree height

B+ tree

• The leaves are organized in a doubly linked list• B+ tree – allows searching through all values by

searching the leaf level only

Database index

• Data is sorted according to the index columns• The leaf contain pointers to rows in the table• Search of 1 value in a tree - o (log n)• Smaller index height in B+ trees• Index (database) operations:

– Add/remove values– Index seek– Index scan

Index seek/scan

200720082009201020072008200920102007200820092010…

Join (logical)

Inner join

• Use join predicate to match rows from 2 table: A and B

• Each row in table A is compared to each row in table B to find the pairs of rows that satisfy the join predicate

• Than column values for each matched pairs are combined into a result row

dept_id Dept_name

1 Sales2 Engineering3 Marketing

department

employee

Emp_name Dept_idRina 1

Moshe 2

Shira 2

Yossi null

emp_dept_id

emp_name dept_dept_id

dept_name

1 Rina 1 Sales

1 Rina 2 Engineering

1 Rina 3 Marketing

2 Moshe 1 Sales

2 Moshe 2 Engineering

2 Moshe 3 Marketing

2 Shira 1 Sales

2 Shira 2 Engineering

2 Shira 3 Marketing

null Yossi 1 Sales

null Yossi 2 Engineering

null Yossi 3 Marketing

Cartesian product

Equi join

• A inner join that uses equality comparison in the join predicate

• Example:select * from employee emp join department dept on emp.dept_id = dept.dept_id

Equi join

OK

OK

OK

emp_dept_id

emp_name dept_dept_id

dept_name

1 Rina 1 Sales

1 Rina 2 Engineering

1 Rina 3 Marketing

2 Moshe 1 Sales

2 Moshe 2 Engineering

2 Moshe 3 Marketing

2 Shira 1 Sales

2 Shira 2 Engineering

2 Shira 3 Marketing

null Yossi 1 Sales

null Yossi 2 Engineering

null Yossi 3 Marketing

RDBMS – summary in a nutshell

• Tables• References• Joins• Indexes• Blocks• I/O

Very Large Data Base

RDBMS – summary in a nutshell

• Tables• References• Indexes

• Blocks• I/O

VLDB – a table – size factor

Use case: Sales Information

• Table:– Customer name– Order number– Order date and time– List of items, amount and prices

Order details (2007-2010)

200720082009201020072008200920102007200820092010

Remove 2007’s orders

200720082009201020072008200920102007200820092010…

Order details kept in 4 tables

20072007…

20082008…

20092009…

20102010…

… 4 tables – remove 2007’s data

20072007…

20082008…

20092009…

20102010…

Union view

Select * from t2007Union all Select * from t2008Union all Select * from t2009Union all Select * from t2010

Order details kept in 4 tables and a view

20072007…

20082008…

20092009…

20102010…

Partitioned table

20072007…

20082008…

20092009…

20102010…

Get back to: Remove 2007’s data?

20072007…

20082008…

20092009…

20102010…

Impact on index behavior

200720082009201020072008200920102007200820092010…

20072007…

20082008…

20092009…

20102010…

Partitioned index (local index)

200720082009201020072008200920102007200820092010…

20072007…

20082008…

20092009…

20102010…

Local indexes

• Index is bound to it’s partition• Drop partition derives drop index• Smaller index heights• Index is always usable• Harder to maintain uniqueness with it

Partitioned table - concepts

• Partition column is the key for dividing the data

• Performance – only relevant partitions used• Add/drop partition – DDL• Local index – index is bound to a partition

Star schema

Data tables

block blockblockblockblockblockblockblock blockblockblockblockblockblockblock blockblockblockblockblockblockblock blockblockblockblockblockblockblock blockblockblockblockblockblockblock blockblockblockblockblockblockblock blockblockblockblockblockblockblock blockblockblockblockblockblock

T a b l e - A T

a b

l e -

B

T a b l e - C

Let’s get back to our partitioned table

20072007…

20082008…

20092009…

20102010…

Dimension referencing

Year Cust_id …

…

2007 1715

…

Cust_id Name Serial signature …

…

1715 Bank of America Ltd/ FFAA23472394- …

…

…

…

Making fact tables thin

20072007…

20082008…

20092009…

20102010…

Dimension

Dimension

Dimension

Dimension

Dimension

Dimension

Dimension

Dimension

Join (physical)

Join

To perform a join the optimizer need make the following decisions:• Access path

how to access each table

• Join orderif more than 2 tables/views are joined, which join to do first

• Join methodfor each pair of row resource how to perform the join

Join methods– nested loop

• One input is the outer loop, the other input is the inner loop

• The inner loop is executed for each row in the outer loop

• Effective when – The outer loop is small– The inner loop is pre indexed

Join methods– hash

• The smaller of the 2 inputs is named the build input

• The second is probe input• Hash table is build from build input• Each row in the build input is put in the

appropriate bucket• The entire probe input is scanned

Join methods– hash cont’

• For each row the hash value is calculated• The corresponding hash bucket is scanned to

find matched rows in the build input• Good for joining large amount

of data

Join methods– merge

• There is no concept of driving table• Both input sources are sorted according the

join key ( or use sorted source such as index)• The sorted lists are merged together • The merge itself is very fast, but it can be

expensive to sort the sources

Summary

• It’s all about I/O• Star schema – facts and dimensions• Partitions + local indexes• SQL joins (probably hash)

Q & A

Documents

Data Partitioning in VLDB