38
pgpool: Features and Development Tatsuo Ishii pgpool Global Development Group SRA OSS, Inc. Japan

pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

  • Upload
    others

  • View
    45

  • Download
    0

Embed Size (px)

Citation preview

Page 1: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

pgpool: Features and Development

Tatsuo Ishiipgpool Global Development Group

SRA OSS, Inc. Japan

Page 2: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

Agenda

● Developers● History● Existing pgpool project● Ongoing pgpool-II project● Demonstration

Page 3: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 3

Who are we?

● pgpool Global Development Group– Tatsuo Ishii(SRA OSS, Inc. Japan)– Devrim Gunduz(Command Prompt, Inc.)– Yoshiyuki Asaba(SRA OSS, Inc. Japan)– Taiki Yamaguchi(SRA OSS, Inc. Japan)– pgpoo-II development team

● In addition to Tatsuo, Yoshiyuki and Taiki:– Tomoaki Sato, Yoshiharu Mori, Kaori Inaba (all from SRA

OSS, Inc. Japan)

Page 4: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 4

Developers!

YoshiyukiProject leaderpgpool developer

YoshiharuQuery rewritingParallel executionengin

KaoriProject managerSystem DB

TatsuoEnhancepgpool-Ipgpooldeveloper

TaikiPCPQuery Cachepgpooldeveloper

TomoakiCommunicationmanager

Our Boss“Green Turtle”

Page 5: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 5

pgpool-I: the history

● V0.1: Started as a personal project (2003/6/27)

● V1.0: Synchronous replication (2004/3)● V2.0: Load balance (2004/6)● V3.0: pgpool Global Development

Group(2006/2)

Page 6: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 6

Why pgpool?

● No general purpose connection pooling software was available– Java has its own, but PHP does not...

● No small to mid scale/light weight synchronous replication tool was available

Page 7: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 7

pgpoolparent process

pgpoolchild process

pre-fork

PostgreSQLbackend process

pgpool process architecture

pgpool.conf

Page 8: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 8

pgpool functionality:connection pooling

● Reduce connection overhead● Limit maximum number of connections to

the PostgreSQL backend– New incoming connections are queued in

the kernel if all pgpool processes are busy

Page 9: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 9

How does connection pooling work?(1)

PostgreSQL

pgpool

user1/db1

user1/db1PostgreSQL

user1/db1

user1/db1PostgreSQL

user1/db1PostgreSQLuser1/db1

Page 10: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 10

How does connection pooling work?(2)

pgpool

user1/db1

user2/db2PostgreSQL

user2/db2

user2/db2PostgreSQL

user3/db3

user3/db3

user2/db2PostgreSQL

user3/db3

Page 11: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 11

pgpool understands frontend/backend protocol

● Pgpool is transparent to both applications and PostgreSQL

● Virtually no modifications to applications are needed

● No special APIs are needed● Can be used with existing language APIs● Can be more efficient than libpq because

of smaller controlling granuality

Page 12: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 12

Limitations of current implementation

● resetting pooled connection may not be perfect(need modification to applications)– temp tables– need help from PostgreSQL

● SSL is not supported– fall back to non SSL mode

● no pg_hba.conf like IP based authentication

Page 13: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 13

pgpool functionality: replication

● Queries are duplicated and sent to PostgreSQL servers

pgpool

PostgreSQL

PostgreSQL

query

query

Page 14: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 14

Dead lock problem

session 1 session 2 session 1 session 2master secondary

lock

lock

lock

lock

twait

wait

Page 15: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 15

“Strict” mode in replicationto avoid deadlock problem

pgpoolPostgreSQL

PostgreSQL

Querypacket

pgpoolPostgreSQL

PostgreSQL

Wait untilsomethingreturns

pgpoolPostgreSQL

PostgreSQLQuery

pgpoolPostgreSQL

PostgreSQL

Wait untilsomethingreturns

reply backthe result

Query

Page 16: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 16

pgpool functionality: load balanace

● SELECT queries are sent to randomly chosen backend

● The ratio for load balancing can be changed

pgpool

PostgreSQL

PostgreSQL

Page 17: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 17

Limitations of current implementation in replication

● CURRENT_TIMESTAMP and server- dependent-value-returning-functions cannot be replicated

● MD5 and crypt authentication are not supported● Sequences and SERIAL needs table locking if

there are more than 1 connections● Functions having side effects cannot be load

balanced

Page 18: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 18

pgpool-II project● Information-Technology Promotion

Agency, Japan (IPA: http://www.ipa.go.jp) granted project

● Started in February 2006, expected to release the first version in September 2006 under BSD license

● Features including parallel query and enhancement to pgpool

● Successor to pgpool

Page 19: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 19

Goal of pgpool-II project

● Implement parallel query processing● Enhance pgpool

– Allow to have more than 2 DB nodes– More precise control using shared memory– Easy to manage

● Control port/protocol● Detailed statistics on node status● GUI management tool

Page 20: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 20

pgpool-II architecture overview

CommunicationManager

PCP lib

PCP command

SQLParser

pgpoolCatalog

PostgreSQL

QueryRewriting

Parallelexecution

engine

PostgreSQL

sharedmemorymanager

pgpoolAdmin

Replication/loadbalanceengine

pgpool-II System DB

DB node

ClientQueryCache

Page 21: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 21

pgpool-I and pgpool-II mode

pgpoo-Imode

1000200030004000

1000200030004000

pgpoo-IImode

10002000

30004000

replication load balance fail over virtually compatible with pgpool

parallel query

Page 22: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 22

Table partitioning controlCREATE TABLE dist_def ( dbname TEXT, -- database name schema_name TEXT, --schema name table_name TEXT, -- table name col_name TEXT, -- key col name col_list TEXT[], -- col names type_list TEXT[], -- col types dist_def_func TEXT, -- function name PRIMARY KEY (dbname, schema_name, table_name));

INSERT INTO dist_def VALUES ('y-mori','public','accounts','aid',ARRAY['aid','bid','abalance','filler'],ARRAY['integer','integer','integer','character(84)'],'dist_def_accounts');

CREATE OR REPLACE FUNCTION dist_def_accounts (val ANYELEMENT) RETURNS INTEGER AS ' SELECT CASE WHEN $1 >= 0 and $1 < 100001 THEN 0 WHEN $1 >= 100001 and $1 < 200001 THEN 1 WHEN $1 >= 200001 and $1 < 300000 THEN 2 END' LANGUAGE SQL;

Page 23: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 23

Simple parallel query

SELECT * FROM accounts WHERE aid = 1000;

pgpool

PostgreSQL

PostgreSQL

SELECT * FROM accounts WHERE aid = 1000;

SELECT * FROM accounts WHERE aid = 1000;

Page 24: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 24

Complex query exampleSELECT * FROM accounts WHERE aid = 1000ORDER BY aid;

pgpool

PostgreSQLSELECT * FROM accounts WHERE aid = 1000;

System DBPostgreSQL

SELECT * dblink('con', 'SELECT pool_parallel('SELECT * FROM accounts WHERE aid = 1000')') AS foo(...)ORDER BY aid;

pgpool SELECT pool_parallel('SELECT * FROM accounts WHERE aid = 1000')')

Page 25: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 25

DML

● INSERT recognize partition key value in a query and INSERT into appropreate DB node

● UPDATE/DELETE simply issues the same query to all DB nodes

Page 26: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 26

INSERT

INSERT INTO accounts(aid, bid)VALUES(500,100);

SELECT dist_def_accounts(500);

call

returnDB node = 1

Syetm DB

DB node 0 DB node 1 DB node 2

aid = 0-499 aid = 500-999 aid = 1000-1499

INSERT

Page 27: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 27

Query Cache

CREATE TABLE pgpool_catalog.query_cache (hash TEXT, -- query string(MD5 hashed)query TEXT, -- query stringvalue bytea, -- query result(RowDescription and

DataRow packets)dbname TEXT, -- database namecreate_time TIMESTAMP WITH TIME ZONE, -- cache creation timePRIMARY KEY(hash));

● Caches query result● Caches can be validated by pgpoolAdmin

Page 28: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 28

Using parallel query and replication together

pgpool-IIwith parallel query

pgpool-IIwith replication

pgpool-IIwith replication

PostgreSQL PostgreSQL PostgreSQL PostgreSQL

Parallel querylayer

Replicationand loadbalancelayer

Page 29: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 29

pgpoolAdmin

● Web based pgpool management tool

● Apache/PHP/Smarty● functions

– Stopping pgpool

– Switch over

– Monitoring connection pool status

– Monitoring process status

– Editing configuration file

– Query Cache management

Page 30: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 30

Benchmarks

● Hitachi Blade Symphony BS1000

– 10 blades

– Xeon 3.0GHz x 1

– 1GB Mem

– UL320 SCSI Disks● Cent OS 4.3

● PostgreSQL 8.1.4/pgbench

● Please note that these results are measured on pre-alpha version of pgpool-II and maybe slightly different from the future official release version

Page 31: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 31

Sequencial Scan case(mid size)

1 2 4 8 16 320

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1.1

1.2

1.3

PostgreSQL pgpool-II

concurrent users

TPS

● scale factor = 90 (90M rows)

● SELECT * FROM accounts WHERE aid = :aid

● pgpool-II is 7 times faster at the best (32 concurrent users)

Page 32: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 32

Index Scan case(mid size)

1 2 4 8 16 320

250500750

100012501500175020002250250027503000325035003750

PostgreSQL pgpool-II

concurrent users

TPS

● scale factor = 90 (90M rows)

● SELECT * FROM accounts WHERE aid = :aid

● pgpool-II is 9 times faster at the best (16 concurrent users)

Page 33: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 33

Index Scan case(large size)

1 2 4 8 16 320

100

200

300

400

500

600

700

800

900

1000

1100

1200

1300

PostgreSQL pgpool-II

concurrent users

TPS

● scale factor = 900 (900M rows)

● SELECT * FROM accounts WHERE aid = :aid

● pgpool-II is 18 times faster at the best (32 concurrent users)

Page 34: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 34

complex query case

1 2 4 80

5

10

15

20

25

30

35

40

45

PostgreSQL pgpool-II

concurrent users

TPS

● scale factor = 900 (900M rows)

● SELECT a1.abalance FROM accounts as a1 ,accounts as a2 WHERE a1.aid = :aid1 and a2.aid = :aid2

● pgpool-II is faster only when at 8 concurrent users

Page 35: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 35

pgpool-II restrictions

● SQL restrictions– COPY is not supported

– SELECT INTO, INSERT INTO ... SELECT is not supported

– Extended protocol is not supported

– INSERT requires explicit values if target column is the key for data patitioning

– UPDATE with WHERE clause including function call is not supported

– CREATE TABLE/ALTER TABLE requires pgpool restarting

● Transactions– If a DB node fails during INSERT/UPDATE/DELETE, data consistency will

be broken

– SQL commands via dblink will be executed as separate transactions

Page 36: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 36

Future plans(TODO)

● More intelligent query rewriting● Remove SQL restrictions● Employ two phase commit to keep consistency

among DB nodes. However this technique can be used only for queries outside transaction block since PREPARE TRANSACTION closes current transaction block

● More intelligent cache validation

Page 37: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 37

References

● pgpool page– http://pgpool.projects.postgresql.org/

● pgpool-II page– English page

● http://pgpool.projects.postgresql.org/pgpool-II/en/

– Japanese page

● http://pgpool.projects.postgresql.org/pgpool-II/ja/

Page 38: pgpool: Features and Development · to avoid deadlock problem pgpool PostgreSQL PostgreSQL Query packet pgpool PostgreSQL PostgreSQL Wait until something returns ... pgpool SELECT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 38

Thank you!