
Indexes: The Second Pillar of Database Wisdom


The most important things that most developers don't know about database index design. If you think designing the indexes in your database is a simple matter, you have a lot to learn!


Page 1: Indexes: The Second Pillar of Database Wisdom

Indexes: The second pillar of database wisdom

Page 2: Indexes: The Second Pillar of Database Wisdom

Guyren Howe (“Guy”)

[email protected]

Page 3: Indexes: The Second Pillar of Database Wisdom

“MySQL doesn’t support this” (the meaning of the “MySQL” badge on later slides)

Page 4: Indexes: The Second Pillar of Database Wisdom

What everyone knows about indexes

• Trade insert/update/delete time for query time

• Trade storage space for query time

Page 5: Indexes: The Second Pillar of Database Wisdom

Other things everyone knows about indexes

Nonsense some folks believe:

• Index all join columns

• Index for every field you search

• Indexes should be rebuilt regularly

• Most selective first

• Dynamic SQL is slow

• Faster computers execute queries faster

Page 6: Indexes: The Second Pillar of Database Wisdom

http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html

Page 7: Indexes: The Second Pillar of Database Wisdom

http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html

Page 8: Indexes: The Second Pillar of Database Wisdom

Howe’s Law

Businesses can’t afford the servers they really need

Page 9: Indexes: The Second Pillar of Database Wisdom

Corollary to Howe’s Law

Your database is never fast enough

Page 10: Indexes: The Second Pillar of Database Wisdom

Query

• not a program

• balances:

• random vs sequential I/O

• CPU use

• memory use

• optimization is NP-hard

Page 11: Indexes: The Second Pillar of Database Wisdom

“Or” and complex queries

• AND queries are the most common type of query, particularly in applications

• A query involving OR clauses can usually be considered as combining the results of two or more AND queries

• Hard to give general advice

• Can be a reason to stick to small, simple indexes
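A minimal sketch of the "OR as a combination of AND queries" idea, using Python's bundled SQLite as a stand-in for whichever database you run. The `personnel` table, its columns and its rows are made up for illustration.

```python
import sqlite3

# Hypothetical table and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE personnel ("
    "id INTEGER PRIMARY KEY, family_name TEXT, branch TEXT, full_time INTEGER)"
)
conn.executemany(
    "INSERT INTO personnel (family_name, branch, full_time) VALUES (?, ?, ?)",
    [("Adams", "Austin", 1), ("Adams", "Boston", 0),
     ("Smith", "Austin", 0), ("Yu", "Boston", 1)],
)

# One query with an OR between two AND groups...
or_query = """
    SELECT id FROM personnel
    WHERE (family_name = 'Adams' AND full_time = 1)
       OR (branch = 'Boston' AND full_time = 1)
    ORDER BY id
"""
# ...returns the same rows as the union of two pure AND queries,
# each of which could be served by its own small index.
union_query = """
    SELECT id FROM personnel WHERE family_name = 'Adams' AND full_time = 1
    UNION
    SELECT id FROM personnel WHERE branch = 'Boston' AND full_time = 1
    ORDER BY id
"""
or_ids = [row[0] for row in conn.execute(or_query)]
union_ids = [row[0] for row in conn.execute(union_query)]
print(or_ids, union_ids)
```

Whether a given planner actually rewrites an OR this way is database dependent; the sketch only shows that the two forms are equivalent.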

Page 12: Indexes: The Second Pillar of Database Wisdom
Page 13: Indexes: The Second Pillar of Database Wisdom

How to use an index

Page 14: Indexes: The Second Pillar of Database Wisdom

Phone Book Theory of Indexing

Page 15: Indexes: The Second Pillar of Database Wisdom

WhoCo

Employee Register

Page 16: Indexes: The Second Pillar of Database Wisdom

Smith Andrew Houston 555-413-6182

Adams Maggie Houston 555-111-8943

Medville Anne Los Angeles 555-413-1183

Samuels Fred New York 555-112-8943

p72

Page 17: Indexes: The Second Pillar of Database Wisdom

Jacob, a programmer

Page 18: Indexes: The Second Pillar of Database Wisdom

Family name index

Family Name Pages

Adams 17/3, 44/1, 44/5, 72/4

Andrews 1/3, 19/5, 44/5

Ambers 11/2

Amberson 19/1, 32/1, 32/2, 32/3, 32/4, 99/2

Page 19: Indexes: The Second Pillar of Database Wisdom
Page 20: Indexes: The Second Pillar of Database Wisdom

Given Name Pages

Adam 13/3, 14/3, 124/4

Ainsley 12/3

Aston 134/2, 135/2

Atley 19/5, 32/1, 32/3, 32/4, 99/1

Given name index

Page 21: Indexes: The Second Pillar of Database Wisdom

Branch index

Office Pages

Austin, TX 3/1, 4/1, 5/2, 6/5, 7/4, 11/4, 15/5, 18/3, 19/3, 20/2, 21/3, 22/1, 23/1, 25/4, 26/5, 27/2, 28/4, 29/1, 30/1, 31, 33, 34…

Bakersville, MD 1/1, 2/2, 3/3, 4/3, 7/2, 11/3, 15/5, 19/4, 20/2, 21/2, 23/4, 44/1, …

Boston, MA …

Page 22: Indexes: The Second Pillar of Database Wisdom

Gwen, a programmer

Page 23: Indexes: The Second Pillar of Database Wisdom

Comprehensive index

Branch Family Name Given Name Pages

Austin Jeffries Leslie 11/2

Austin Kew Nelson 124/2

Austin Samuelson Steven 13/4

Austin Simpson Zaphod 98/1

Page 24: Indexes: The Second Pillar of Database Wisdom
Page 25: Indexes: The Second Pillar of Database Wisdom

Comprehensive index

Family Name Given Name Branch Pages

Adams Leslie Bakersville 44/2

Adams Nelson Boston 17/2

Adams Steven Bakersville 44/1

Adams Zaphod Austin 72/3

Page 26: Indexes: The Second Pillar of Database Wisdom
Page 27: Indexes: The Second Pillar of Database Wisdom

1. Prefer compound indexes
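As a rough illustration of the phone-book idea, the sketch below builds one compound index and asks SQLite (a stand-in here; the talk targets Postgres) how it would run three queries. The table, column and index names are assumptions modelled on the WhoCo register, and the exact plan text varies between SQLite versions.

```python
import sqlite3

# Hypothetical schema modelled on the WhoCo register; SQLite stands in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE personnel ("
    "id INTEGER PRIMARY KEY, family_name TEXT, given_name TEXT, branch TEXT, phone TEXT)"
)
conn.execute(
    "CREATE INDEX personnel_name_branch_idx "
    "ON personnel (family_name, given_name, branch)"
)

def plan(sql):
    """Return SQLite's query plan for sql as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

# All three indexed columns constrained: one index lookup.
full_match = plan(
    "SELECT * FROM personnel "
    "WHERE family_name = 'Adams' AND given_name = 'Zaphod' AND branch = 'Austin'"
)
# Only the leading column constrained: the same index still helps.
prefix_match = plan("SELECT * FROM personnel WHERE family_name = 'Adams'")
# Only the last column constrained: like looking up a phone book by branch.
no_prefix = plan("SELECT * FROM personnel WHERE branch = 'Austin'")

for label, text in [("full", full_match), ("prefix", prefix_match), ("none", no_prefix)]:
    print(label, "->", text)
```

The third query falls back to scanning the table, which is why column order in a compound index matters as much as which columns it contains.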

Page 28: Indexes: The Second Pillar of Database Wisdom

Full-time index

Full-time Pages

Y 1/7, 1/8, 2/1, 2/2, 2/3, 2/4, 2/5…

N 1/1, 1/2, 1/3, 1/4, 1/5, 1/6, 2/6, …

Page 29: Indexes: The Second Pillar of Database Wisdom

2. Prefer selective predicates
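One common way to put a number on selectivity is distinct values divided by total rows; that definition is an assumption here, as the slides do not give a formula. The sketch uses made-up data to compare a two-valued full-time flag against family names.

```python
import sqlite3

# Made-up sample rows; selectivity is taken here as distinct values / total rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE personnel (family_name TEXT, full_time TEXT)")
conn.executemany(
    "INSERT INTO personnel VALUES (?, ?)",
    [("Adams", "Y"), ("Andrews", "N"), ("Ambers", "N"), ("Amberson", "Y"),
     ("Adams", "N"), ("Smith", "Y"), ("Yu", "N"), ("Kew", "Y")],
)

def selectivity(column):
    """Fraction of distinct values in column (column is a trusted name, not user input)."""
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM personnel"
    ).fetchone()
    return distinct / total

print("full_time  :", selectivity("full_time"))    # 2 distinct values in 8 rows
print("family_name:", selectivity("family_name"))  # 7 distinct values in 8 rows
```

A value near 1 means a lookup narrows to very few rows; a value near 0, as with the full-time flag, means the index still points at a large share of the pages.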

Page 30: Indexes: The Second Pillar of Database Wisdom

Comprehensive index

Family Name Given Name Branch Pages

Adams Leslie Bakersville 44/2

Adams Nelson Boston 17/2

Adams Steven Bakersville 44/1

Adams Zaphod Austin 72/3

Page 31: Indexes: The Second Pillar of Database Wisdom

3. Consider a covering index
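A small sketch of what "covering" means in practice, again with SQLite standing in and with hypothetical table and index names: when every column a query needs is already in the index, the planner can answer from the index alone and never visit the table.

```python
import sqlite3

# Hypothetical schema; SQLite is used only because it ships with Python.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE personnel ("
    "id INTEGER PRIMARY KEY, family_name TEXT, given_name TEXT, branch TEXT, phone TEXT)"
)
conn.execute(
    "CREATE INDEX personnel_name_branch_idx "
    "ON personnel (family_name, given_name, branch)"
)

def plan(sql):
    """Return SQLite's query plan for sql as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

# given_name and branch both live in the index: no table visit is needed.
covered = plan("SELECT given_name, branch FROM personnel WHERE family_name = 'Adams'")
# phone is not in the index: each match costs an extra trip to the table.
not_covered = plan("SELECT phone FROM personnel WHERE family_name = 'Adams'")

print(covered)
print(not_covered)
```

SQLite labels the first plan as using a covering index; Postgres reports the comparable case as an index-only scan.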

Page 32: Indexes: The Second Pillar of Database Wisdom

Database index

1979-05-27 Tampa

1979-06-07 Grover

1979-07-08 Austin

1979-02-01 Austin

1979-02-12 Houston

1979-02-17 Bakersville

1979-03-08 Tampa

1979-05-27 Tampa

1979-05-29 Newtown

1979-05-29 Seattle

1979-06-04 Seattle

1979-06-07 Grover

1979-06-14 South End

1979-06-15 Newtown

1979-06-16 London

Page 33: Indexes: The Second Pillar of Database Wisdom

Database index

Austin 1990-01-01

Bakersville 1981-06-01

Bakersville 1985-06-04

Austin 1980-01-01

Austin 1981-06-01

Austin 1983-01-12

Austin 1987-11-31

Austin 1990-01-01

Austin 1991-06-01

Bakersville 1979-08-03

Bakersville 1979-09-07

Bakersville 1981-06-01

Bakersville 1982-12-13

Bakersville 1983-03-28

Bakersville 1985-01-01

Page 34: Indexes: The Second Pillar of Database Wisdom

4. Be subtle with compound index predicates

• Put covering fields you don’t search on at the end

• Put a field to do range searches on next

• Put other fields you search on at the front in order of use and selectivity

• Unless you’re dealing with different searches, in which case compromise, or use multiple compound indexes
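The ordering advice above can be checked against a planner. This sketch, with a hypothetical `personnel` table and SQLite as a stand-in, builds the same three-column index in two orders and compares how much of the WHERE clause each one can use for the index seek.

```python
import sqlite3

QUERY = (
    "SELECT family_name FROM personnel "
    "WHERE branch = 'Austin' AND hire_date > '1980-01-01'"
)

def plan_with_index(index_columns):
    """Plan for QUERY in a fresh database that has one index on index_columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE personnel ("
        "id INTEGER PRIMARY KEY, branch TEXT, hire_date TEXT, family_name TEXT, phone TEXT)"
    )
    conn.execute(f"CREATE INDEX personnel_idx ON personnel ({index_columns})")
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    conn.close()
    return " | ".join(row[3] for row in rows)

# Equality column first, range column next, covering-only column last.
good = plan_with_index("branch, hire_date, family_name")
# Range column first: the seek stops at hire_date, branch is only filtered afterwards.
poor = plan_with_index("hire_date, branch, family_name")

print("good:", good)
print("poor:", poor)
```

With the range column in front, everything after the cutoff date has to be walked and filtered; with the equality column in front, the seek lands directly on the Austin entries after that date.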

Page 35: Indexes: The Second Pillar of Database Wisdom

WhoCo

Employee Register

Page 36: Indexes: The Second Pillar of Database Wisdom

Senior Personnel Directory

Division Family Name Given Name Branch Pages

Accounting Adams Leslie Bakersville 44/2

Accounting Adams Nelson Boston 17/1

R&D Yu Stephen Austin 72/5

Page 37: Indexes: The Second Pillar of Database Wisdom

Partial index

CREATE INDEX personnel_main_office_idx
ON personnel (division, family_name, given_name)
WHERE division IS NOT NULL;

SELECT *
FROM personnel
WHERE division IS NOT NULL
AND division = 'R&D';

MySQL
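SQLite also supports partial indexes, so the slide's index can be tried without a Postgres server. The sketch below is only an approximation of the Postgres behaviour: the rows are invented, and SQLite's rules for deciding when a partial index applies are its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE personnel ("
    "id INTEGER PRIMARY KEY, division TEXT, family_name TEXT, given_name TEXT)"
)
conn.executemany(
    "INSERT INTO personnel (division, family_name, given_name) VALUES (?, ?, ?)",
    [
        ("Accounting", "Adams", "Leslie"),
        ("Accounting", "Adams", "Nelson"),
        ("R&D", "Yu", "Stephen"),
        (None, "Smith", "Andrew"),   # no division: not stored in the partial index
        (None, "Samuels", "Fred"),
    ],
)
# Same index definition as on the slide.
conn.execute(
    "CREATE INDEX personnel_main_office_idx "
    "ON personnel (division, family_name, given_name) "
    "WHERE division IS NOT NULL"
)

def plan(sql):
    """Return SQLite's query plan for sql as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

rd_query = (
    "SELECT family_name, given_name FROM personnel "
    "WHERE division IS NOT NULL AND division = 'R&D'"
)
rd_plan = plan(rd_query)
rd_rows = conn.execute(rd_query).fetchall()
# A query that says nothing about division cannot use the partial index.
name_only_plan = plan("SELECT * FROM personnel WHERE family_name = 'Smith'")

print(rd_plan, rd_rows)
print(name_only_plan)
```

The index stays small because rows without a division are never added to it, which is the point of the Senior Personnel Directory analogy.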

Page 38: Indexes: The Second Pillar of Database Wisdom

Functional index

CREATE INDEX personnel_name_lc
ON personnel (lower(family_name), lower(given_name));

SELECT *
FROM personnel
WHERE lower(family_name) = 'smith'
AND lower(given_name) = 'mary';

MySQL
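SQLite calls these expression indexes; the sketch below reuses the slide's index definition on invented rows to show that the query has to repeat the indexed expression exactly for the index to be considered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE personnel (id INTEGER PRIMARY KEY, family_name TEXT, given_name TEXT)"
)
conn.executemany(
    "INSERT INTO personnel (family_name, given_name) VALUES (?, ?)",
    [("Smith", "Mary"), ("SMITH", "MARY"), ("Smithers", "Zaphod")],
)
# Same index definition as on the slide.
conn.execute(
    "CREATE INDEX personnel_name_lc "
    "ON personnel (lower(family_name), lower(given_name))"
)

def plan(sql):
    """Return SQLite's query plan for sql as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

lowered = (
    "SELECT id FROM personnel "
    "WHERE lower(family_name) = 'smith' AND lower(given_name) = 'mary' ORDER BY id"
)
indexed_plan = plan(lowered)
matches = [row[0] for row in conn.execute(lowered)]
# Comparing the raw columns does not match the indexed expressions.
raw_plan = plan(
    "SELECT id FROM personnel WHERE family_name = 'Smith' AND given_name = 'Mary'"
)

print(indexed_plan, matches)
print(raw_plan)
```

Both spellings of Mary Smith are found through the index, while the raw-column query falls back to scanning the table.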

Page 39: Indexes: The Second Pillar of Database Wisdom

Comprehensive index

Family Name Given Name Street Address Pages

Smith Leslie 13 Main St 44/2

Smith Nelson 11a Rails St 17/2

Smith Steven 19 Main St 44/1

Smithers Zaphod 12 Rails St 72/3

SELECT *
FROM personnel
WHERE lower(family_name) LIKE 'smi%';

SELECT *
FROM personnel
WHERE lower(street_addr) LIKE '%st';

Page 40: Indexes: The Second Pillar of Database Wisdom

Functional index tricks

CREATE INDEX personnel_addr_rev
ON personnel (reverse(lower(street_addr)));

SELECT *
FROM personnel
WHERE reverse(lower(street_addr)) LIKE 'ts%';

MySQL
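Why the reversal helps can be shown without a database at all. In this sketch a sorted Python list plays the role of the index on `reverse(lower(street_addr))` (an analogy only, with invented addresses): a suffix search on the original strings becomes a prefix search on the reversed keys, and prefix matches sit next to each other in sorted order.

```python
from bisect import bisect_left

# Invented addresses; the sorted list is a stand-in for the index's key order.
addresses = ["13 Main St", "11a Rails St", "19 Main St", "12 Rails Rd", "7 Oak Ave"]

# Each entry is (reversed lowercase key, original value), sorted by key.
reversed_index = sorted((addr.lower()[::-1], addr) for addr in addresses)
keys = [key for key, _ in reversed_index]

def ends_with(suffix):
    """Find addresses ending in suffix via a prefix search on the reversed keys."""
    prefix = suffix.lower()[::-1]
    start = bisect_left(keys, prefix)      # binary search to the first candidate
    matches = []
    for key, addr in reversed_index[start:]:
        if not key.startswith(prefix):     # matches are contiguous, so stop early
            break
        matches.append(addr)
    return sorted(matches)

print(ends_with("st"))
print(ends_with("Rd"))
```

Searching the unreversed strings for `'%st'` would have to examine every entry, since nothing about their sort order groups equal endings together.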

Page 41: Indexes: The Second Pillar of Database Wisdom

Sorting

• Need same predicates in WHERE and ORDER BY

• Index can be used in either direction

• But not both…

• not SELECT… ORDER BY family_name DESC, branch ASC

• Use Postgres’ CREATE INDEX… NULLS LAST (or FIRST)
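A sketch of the direction rule, using SQLite as a stand-in and a hypothetical `personnel` table. SQLite reports an explicit sort step as a "TEMP B-TREE" line in its plan, so the presence or absence of that line shows whether the index alone delivered the requested order. Declaring per-column directions in the index is the assumption used here for the mixed case; the slide's NULLS LAST option is Postgres syntax and is not exercised.

```python
import sqlite3

def order_by_plan(index_columns, order_by):
    """Plan for an ORDER BY query in a fresh database with one index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE personnel ("
        "id INTEGER PRIMARY KEY, family_name TEXT, branch TEXT, phone TEXT)"
    )
    conn.execute(f"CREATE INDEX personnel_sort_idx ON personnel ({index_columns})")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT family_name, branch FROM personnel "
        f"ORDER BY {order_by}"
    ).fetchall()
    conn.close()
    return " | ".join(row[3] for row in rows)

# Whole index read backwards: both columns descending, no extra sort.
backwards = order_by_plan("family_name, branch", "family_name DESC, branch DESC")
# Mixed directions against a plain ascending index: a sort step appears.
mixed_plain = order_by_plan("family_name, branch", "family_name DESC, branch ASC")
# Mixed directions declared in the index itself: the sort step goes away.
mixed_matching = order_by_plan(
    "family_name DESC, branch ASC", "family_name DESC, branch ASC"
)

for label, text in [("backwards", backwards), ("mixed/plain", mixed_plain),
                    ("mixed/matching", mixed_matching)]:
    print(label, "->", text)
```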

Page 42: Indexes: The Second Pillar of Database Wisdom

Partial Results

• LIMIT n or FETCH FIRST n

• Need pipelined ORDER BY

• Otherwise we have to sort the whole result set

• Database dependent

• Postgres window functions aren’t pipelined

Page 43: Indexes: The Second Pillar of Database Wisdom

Paginating

"SELECT * FROM people ORDER BY hire_date LIMIT 50 OFFSET 1000"

Page 44: Indexes: The Second Pillar of Database Wisdom

Paginating: much better

Postgres:

People.where(
  "(hire_date, creation_date) > (:hire_date, :creation_date)",
  hire_date: @last_person.hire_date,
  creation_date: @last_person.creation_date
).order(:hire_date, :creation_date).limit(50)

Other databases:

.where("(hire_date > :hire_date) OR (hire_date = :hire_date AND creation_date > :creation_date)")

Possibly: OR hire_date IS NULL
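The two pagination styles can be compared side by side. This is a sketch with invented rows and a page size of 4, using SQLite and the portable OR form of the condition; the `people` table and its two date columns follow the slide, everything else is an assumption.

```python
import sqlite3

PAGE_SIZE = 4

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people ("
    "id INTEGER PRIMARY KEY, hire_date TEXT NOT NULL, creation_date TEXT NOT NULL)"
)
conn.execute("CREATE INDEX people_hire_idx ON people (hire_date, creation_date)")
# Ten made-up rows; several share a hire_date, so creation_date breaks ties.
conn.executemany(
    "INSERT INTO people (hire_date, creation_date) VALUES (?, ?)",
    [(f"1979-0{1 + i // 3}-01", f"2020-01-{i + 1:02d}") for i in range(10)],
)

def offset_page(page_number):
    """OFFSET style: the database still has to walk past every skipped row."""
    return conn.execute(
        "SELECT hire_date, creation_date FROM people "
        "ORDER BY hire_date, creation_date LIMIT ? OFFSET ?",
        (PAGE_SIZE, page_number * PAGE_SIZE),
    ).fetchall()

def keyset_page(last_row=None):
    """Keyset style: seek to just after the last row already shown."""
    if last_row is None:
        return offset_page(0)
    hire_date, creation_date = last_row
    return conn.execute(
        "SELECT hire_date, creation_date FROM people "
        "WHERE hire_date > :hire_date "
        "   OR (hire_date = :hire_date AND creation_date > :creation_date) "
        "ORDER BY hire_date, creation_date LIMIT :page_size",
        {"hire_date": hire_date, "creation_date": creation_date,
         "page_size": PAGE_SIZE},
    ).fetchall()

first = keyset_page()
second = keyset_page(first[-1])
third = keyset_page(second[-1])
print(second)
```

Both styles return the same pages; the difference is that the keyset query starts from an index position instead of counting rows, so its cost does not grow with the page number. The sort key has to be unique for this to be safe, which is why the tie-breaking column is part of both the ORDER BY and the condition.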

Page 45: Indexes: The Second Pillar of Database Wisdom

Clustering

• Based on existing index

• Physically reorganize the heap in index order

• Reduce seek time for blocks of records

• One-off procedure; new records not clustered

• Locks table for duration

Page 46: Indexes: The Second Pillar of Database Wisdom

Use case: partitioning

• use Postgres table inheritance

• eg log data, very fast, last 30 days searchable

• split table into numerous identical subtables

• cluster a table once we’re done writing to it

Page 47: Indexes: The Second Pillar of Database Wisdom

Joins

Page 48: Indexes: The Second Pillar of Database Wisdom

Three Join Methods

• Nested Loop

• Query 1 table; loop and query the other

• Hash Join

• Avoid nested loop cost by making hash of one side

• Merge Join

• Query each table, sort and compare

MySQL

MySQL
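To make the three methods concrete, here is a toy version of each over in-memory lists. It is a sketch of the textbook algorithms on invented data, not how any particular database implements them.

```python
from collections import defaultdict

# Invented rows: (family_name, branch_id) and (branch_id, city).
employees = [("Adams", 10), ("Smith", 20), ("Yu", 10), ("Kew", 30)]
branches = [(10, "Austin"), (20, "Boston"), (40, "Tampa")]

def nested_loop_join(left, right):
    """For every left row, search the right side for matches."""
    return [(name, city)
            for name, left_id in left
            for right_id, city in right
            if left_id == right_id]

def hash_join(left, right):
    """Build a hash table on one side once, then probe it for each left row."""
    table = defaultdict(list)
    for right_id, city in right:
        table[right_id].append(city)
    return [(name, city) for name, left_id in left for city in table.get(left_id, [])]

def merge_join(left, right):
    """Sort both sides on the join key, then advance through them together."""
    left = sorted(left, key=lambda row: row[1])
    right = sorted(right, key=lambda row: row[0])
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        left_id, right_id = left[i][1], right[j][0]
        if left_id < right_id:
            i += 1
        elif left_id > right_id:
            j += 1
        else:
            # Emit every right row sharing this key for the current left row.
            k = j
            while k < len(right) and right[k][0] == left_id:
                result.append((left[i][0], right[k][1]))
                k += 1
            i += 1
    return result

for join in (nested_loop_join, hash_join, merge_join):
    print(join.__name__, sorted(join(employees, branches)))
```

All three produce the same pairs; they differ in how much work finding each match takes, which is what the indexing advice for each method is about.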

Page 49: Indexes: The Second Pillar of Database Wisdom

Nested Loop Join

• Index for each side separately

Page 50: Indexes: The Second Pillar of Database Wisdom

Hash Join

• Not for range joins; only equality

• No point indexing join fields

• Index independent predicates

• Reduce memory footprint of hash table by minimizing number of fields or rows

MySQL

Page 51: Indexes: The Second Pillar of Database Wisdom

Merge Join

• Less often used because sort is expensive

• Mainly for outer joins

• Or when an index on both sides speeds sorting

• same index for where and order by

• No point indexing join fields, only independent predicates

• Reduce memory footprint by minimizing number of fields or rows

MySQL

Page 52: Indexes: The Second Pillar of Database Wisdom

Index Data Structures

Page 53: Indexes: The Second Pillar of Database Wisdom

Index Data Structures

• B-Tree

• “Normal” index type

• equality and range searches

• Hash

• Faster for equality

• Test
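The difference between the two structures can be sketched with plain Python containers: a dict as the hash index and a sorted list searched with `bisect` as a stand-in for a B-tree's ordered leaves. The names and dates are invented, and real index implementations are far more involved.

```python
from bisect import bisect_left, bisect_right

# Invented data: family name -> hire date.
hire_dates = {
    "Adams": "1979-02-01",
    "Smith": "1979-05-29",
    "Yu": "1979-06-07",
    "Kew": "1979-03-08",
}

# Hash-style index: exact key -> matching names. No useful ordering.
hash_index = {}
for name, date in hire_dates.items():
    hash_index.setdefault(date, []).append(name)

# Ordered index: (key, name) pairs kept sorted, like B-tree leaf entries.
sorted_index = sorted((date, name) for name, date in hire_dates.items())
sorted_keys = [date for date, _ in sorted_index]

def equality_lookup(date):
    """Single probe; works with either structure, shown here with the hash."""
    return hash_index.get(date, [])

def range_lookup(low, high):
    """Needs ordering: find both ends by binary search, read what lies between."""
    start = bisect_left(sorted_keys, low)
    end = bisect_right(sorted_keys, high)
    return [name for _, name in sorted_index[start:end]]

print(equality_lookup("1979-06-07"))
print(range_lookup("1979-03-01", "1979-05-31"))
```

A hash cannot answer the range question without visiting every key, which is why the ordered structure is the default and the hash index is something to test before adopting.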

Page 54: Indexes: The Second Pillar of Database Wisdom

Index Data Structures

• GiST

• Variety of data types with complex operators, e.g. spatial contains/near/etc.

• hstore

• SP-GiST

• Similar to GiST but better for some data types

• GIN

• more-or-less GiST

• (3x) more space and (3x) slower insert gives faster (3x) search

• Some data types only support one

MySQL

Page 55: Indexes: The Second Pillar of Database Wisdom

Dynamic SQL

• Prepared statements or bind queries let the server cache the query plan

• SQL Server or Oracle

• Save query plan time

• Usually saves time, but doesn’t consider values

• Uneven distributions suffer

MySQL

Postgres
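A small sketch of a bind query, using Python's `sqlite3` module with invented rows. The SQL text stays identical across calls and only the bound value changes, which is what makes a compiled statement or plan reusable; how much is actually cached, and whether the plan takes the bound values into account, depends on the database and driver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE personnel (id INTEGER PRIMARY KEY, family_name TEXT, branch TEXT)"
)
conn.executemany(
    "INSERT INTO personnel (family_name, branch) VALUES (?, ?)",
    [("Adams", "Austin"), ("Smith", "Austin"), ("Yu", "Boston")],
)

# One fixed SQL string with a placeholder, instead of a new string per value.
FIND_BY_BRANCH = "SELECT family_name FROM personnel WHERE branch = ? ORDER BY family_name"

def names_in(branch):
    """Run the same statement text with a different bound value each time."""
    return [row[0] for row in conn.execute(FIND_BY_BRANCH, (branch,))]

print(names_in("Austin"))
print(names_in("Boston"))
```

Because the plan is chosen without looking at the value, a branch holding most of the rows and a branch holding a handful get the same treatment, which is the uneven-distribution problem noted above.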

Page 56: Indexes: The Second Pillar of Database Wisdom

Statistics

ALTER TABLE table
ALTER COLUMN column
SET STATISTICS 1000;