
Page 1: What we love about pgpool and what we don't!

What we love about pgpool and what we don't?

Sameer Kumar, Sr. Solution Architect, Ashnik (@AshnikBiz), Singapore

17/3/2017

Page 2: What we love about pgpool and what we don't!

What am I going to talk about?

• What is pgpool?

• What got us hooked on pgpool?

• Architectures we deployed with pgpool

• Challenges we faced

• Some diagnostics and resolutions

• Some tips!

• IMHO, when not to use pgpool

Page 3: What we love about pgpool and what we don't!

What got us hooked on pgpool?

Page 4: What we love about pgpool and what we don't!

What is pgpool?

• pgpool-II is a middleware

• It works between PostgreSQL servers and a PostgreSQL database client

• Licensed under the BSD license

• This presentation is based on our experience with pgpool-II v3.3.7 and 3.5.1


Page 5: What we love about pgpool and what we dont!

pgpool is Swiss Army knife for PostgreSQL!

• It offers various features• Connection Pooling

• Replication

• Load Balancing

• Limiting Exceeding Connections

• Parallel Query (deprecated)


Page 6: What we love about pgpool and what we don't!

Connection Pooling

• pgpool-II saves connections to the PostgreSQL servers

• It reuses them whenever a new connection with the same properties (user, database, protocol version) comes in

• This reduces connection overhead and improves the system's overall throughput
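A minimal pgpool.conf sketch of the pooling-related parameters (the parameter names are real pgpool-II settings; the values are illustrative, not recommendations):

# pgpool.conf -- connection pooling (pgpool-II 3.x)
connection_cache = on        # cache backend connections for reuse
num_init_children = 32       # number of pre-forked pgpool child processes
max_pool = 4                 # cached connections per child, one per (user, database) pair
child_life_time = 300        # recycle an idle child after 5 minutes
connection_life_time = 0     # 0 = cached backend connections never expire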

Page 7: What we love about pgpool and what we don't!

Replication

• pgpool-II can manage multiple PostgreSQL servers

• Using the replication function enables creating a real-time backup on two or more physical disks

• It is usually better to use the native streaming replication feature of PostgreSQL

Page 8: What we love about pgpool and what we don't!

Load Balancing

• Read queries are automatically load balanced across the nodes

• pgpool can detect a failover and start sending read/write traffic to the surviving node

• It can also run a failover command to promote a standby upon failure of the master (a configuration sketch follows)
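A sketch of the relevant pgpool.conf settings for streaming-replication mode with load balancing (parameter names and the % placeholders are real pgpool-II 3.x features; hostnames and the script path are illustrative):

# pgpool.conf -- load balancing over streaming replication
master_slave_mode = on
master_slave_sub_mode = 'stream'
load_balance_mode = on

backend_hostname0 = 'pg-master'    # illustrative hostname
backend_port0 = 5432
backend_weight0 = 1                # relative share of read queries

backend_hostname1 = 'pg-standby'   # illustrative hostname
backend_port1 = 5432
backend_weight1 = 1

# run on failover; %d = failed node id, %h = failed host,
# %m = new master node id, %H = new master host
failover_command = '/etc/pgpool-II/failover.sh %d %h %m %H'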

Page 9: What we love about pgpool and what we don't!

Watchdog Mode

Page 10: What we love about pgpool and what we don't!

Limiting Exceeding Connections a.k.a. Connection Queuing

• In PostgreSQL, connections beyond max_connections are rejected with an error

• pgpool-II also has a limit on the maximum number of connections

• Extra connections are queued instead (see the sketch below)

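In pgpool-II 3.x the ceiling is num_init_children: once all children are busy, additional clients wait in the listen queue instead of getting an error. A sketch of the two limits side by side (values illustrative):

# postgresql.conf
max_connections = 100        # beyond this, PostgreSQL rejects with an error

# pgpool.conf
num_init_children = 90       # beyond this, clients queue instead of failing
                             # guideline: num_init_children * max_pool should stay
                             # below (max_connections - superuser_reserved_connections)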

Page 11: What we love about pgpool and what we don't!

What got us hooked on pgpool?

• It understands the PostgreSQL wire protocol and has a Postgres query parser

• In plain English, it means:

• You don't have to make major changes in your application to use features like load balancing and connection pooling

• Most of the time you can use your application with pgpool literally unchanged

• This makes it perfect:

• While scaling after development

• When using a packaged application

Page 12: What we love about pgpool and what we don't!

We really started loving it!

• More standby servers can be added, and pgpool can be configured for load balancing across more nodes at runtime (a sketch follows)

• Works in tandem with virtualization and on-the-fly provisioning

• Has built-in capability for HA with Watchdog
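A hedged sketch of adding a standby at runtime (pcp_attach_node and pgpool reload are real pgpool-II tools; the 3.5-style option syntax, hostnames, and PCP credentials below are illustrative):

# 1. append the new backend to pgpool.conf
#    backend_hostname2 = 'pg-standby2'
#    backend_port2 = 5432
#    backend_weight2 = 1
# 2. tell pgpool to pick up the new node definition
pgpool reload
# 3. attach the node so it starts receiving read traffic
pcp_attach_node -h localhost -p 9898 -U pcpadmin -w -n 2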

Page 13: What we love about pgpool and what we don't!

Architectures we deployed with pgpool

Page 14: What we love about pgpool and what we don't!

Streaming Replication – Master Slave

Page 15: What we love about pgpool and what we don't!

Streaming Replication – with Load Balancing

Page 16: What we love about pgpool and what we don't!

Load Balancing – with third-party failover solution

You can combine this with a more reliable failover solution, e.g. EDB Failover Manager, repmgr, or corosync (a sketch of the division of duties follows)

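A hedged sketch of the pgpool side when an external tool owns failover (the parameter names are real pgpool-II 3.x settings; the division of duties is our suggestion, not from the slides):

# pgpool.conf -- let EFM/repmgr/corosync drive failover
fail_over_on_backend_error = off   # don't degenerate a node on a plain connection error
health_check_period = 0            # 0 disables pgpool's own health checks
failover_command = ''              # the external tool promotes the standby; it then
                                   # calls pcp_attach_node / pcp_detach_node against pgpool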

Page 17: What we love about pgpool and what we don't!

Load Balancing with pgpool Watchdog – a.k.a. NoSPoF

Page 18: What we love about pgpool and what we don't!

pgpool tied with Application Servers

• If the pgpool service fails, you can make your application server reject any further client requests

• This way, high availability and load balancing at the application server tier take care of pgpool's high availability as well

Page 19: What we love about pgpool and what we don't!

Challenges we faced!

Page 20: What we love about pgpool and what we don't!

Does not work very well with application connection pooling!

"An I/O error occurred while sending to the backend."

"A PooledConnection that has already signalled a Connection error is still in use"

"com.edb.util.PSQLException: This connection has been closed."

• These errors occur when a connection was idle in your application pool but was closed by pgpool

• Also, when the application already pools connections, there is less chance of pgpool's connection pool being useful (a mitigation sketch follows)

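One common application-side mitigation (our suggestion, not from the slides) is to validate pooled connections before use. A sketch with the Tomcat JDBC pool, whose errors the slide quotes; the URL and credentials are illustrative:

import org.apache.tomcat.jdbc.pool.DataSource;
import org.apache.tomcat.jdbc.pool.PoolProperties;

PoolProperties p = new PoolProperties();
p.setUrl("jdbc:postgresql://pgpool-host:9999/mydb");  // illustrative host/port
p.setDriverClassName("org.postgresql.Driver");
p.setUsername("appuser");
p.setPassword("secret");
p.setTestOnBorrow(true);             // validate before handing a connection out
p.setValidationQuery("SELECT 1");    // cheap liveness probe
p.setValidationInterval(30000);      // validate at most every 30s per connection
DataSource ds = new DataSource(p);   // stale connections closed by pgpool are discarded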

Page 21: What we love about pgpool and what we don't!

It is not guaranteed that pooled connections will be used

• In pgpool, connections are pooled by child processes

• Each child process can pool multiple connections if they have different user-database combinations

• When pgpool receives a request, it lands on an arbitrary child – effectively at random

• It is not guaranteed that a child pooling a relevant connection will receive the request (see the arithmetic below)

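The arithmetic behind this (parameter names are real, values illustrative):

# pgpool.conf
num_init_children = 32   # 32 independent child processes, each with its own pool
max_pool = 4             # each child caches up to 4 (user, database) pairs

# Worst case, pgpool may hold num_init_children * max_pool = 128 backend
# connections, yet a new client has roughly a 1-in-32 chance of landing on
# the child that already caches a connection for its (user, database) pair.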

Page 22: What we love about pgpool and what we don't!

We hit several issues and known bugs over 6 months

"connect_inet_domain_socket: select() interrupted"

"pid 22001: die: close listen socket"

"pcp_child: pcp_read() failed. reason: Success"

"pcp_child: authentication failed"

"an error message: ERROR: unable to read data from frontend DETAIL: EOF encountered with frontend"


Page 23: What we love about pgpool and what we don't!

Logging is not the friendliest for someone who has not read the code

"pid 22001: die: close listen socket"

"pcp_child: pcp_read() failed. reason: Success"

"health_check: health check timer has been already expired before attempting to connect to 1 th backend"


Page 24: What we love about pgpool and what we don't!

Split Brain in Master-Slave configuration

• False promotion of the standby node during a network split

• No way to define quorum or voting before deciding to promote

• No auto-reconfiguration of replication if you have multiple standbys – you end up scripting a lot of things


Page 25: What we love about pgpool and what we don't!

You cannot use pg_terminate_backend()

• pg_terminate_backend() sends a signal back to the client, which is very similar to the response sent when the server is being shut down

• pgpool uses the response from the server (on connection sockets) to determine its health

• Firing pg_terminate_backend() may lead to a connection being closed abruptly, leading to pgpool initiating a failover (see the example below)

• This has been fixed to a large extent in v3.6, but it still does not cover all corner cases

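For illustration, the kind of call that can trip pgpool (standard PostgreSQL; the WHERE clause is illustrative):

-- terminates a backend; issued through pgpool (< 3.6), the resulting socket
-- close can look like a backend crash and trigger an unwanted failover
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle' AND usename = 'appuser';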

Page 26: What we love about pgpool and what we don't!

No way to poll lost nodes automatically

• If a node is removed from pgpool, pgpool never checks whether it is ready again

• A periodic check for whether the same node is available again would be nice

• This is especially relevant when you use a separate tool for failovers

• Imagine a case where failover did not happen, but pgpool thought the master had gone down (a re-attach sketch follows)

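A hedged sketch of the workaround we mean – a cron-style script that re-attaches a node once it answers again (pcp_node_info and pcp_attach_node are real pgpool-II tools; the 3.5-style options, status parsing, hostnames, and credentials are illustrative):

#!/bin/sh
# re-attach backend node 1 if PostgreSQL answers but pgpool has marked it down
NODE=1
STATUS=$(pcp_node_info -h localhost -p 9898 -U pcpadmin -w -n $NODE | awk '{print $3}')
# status 3 = down in pgpool's view
if [ "$STATUS" = "3" ] && pg_isready -h pg-standby -p 5432 -q; then
    pcp_attach_node -h localhost -p 9898 -U pcpadmin -w -n $NODE
fi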

Page 27: What we love about pgpool and what we don't!

Connection restart upon node failure

2016-08-13 03:17:20 LOG: pid 8662: health_check: health check timer has been already expired before attempting to connect to 1 th backend

2016-08-13 03:17:20 LOG: pid 8662: set 1 th backend down status

2016-08-13 03:17:20 LOG: pid 8662: wd_start_interlock: start interlocking

2016-08-13 03:17:20 LOG: pid 8662: starting degeneration. shutdown host 172.163.169.150(5432)

2016-08-13 03:17:20 LOG: pid 8662: Restart all children

.

.

2016-08-13 03:17:34 LOG: pid 16338: pcp child process received restart request

2016-08-13 03:17:34 LOG: pid 8662: PCP child 16338 exits with status 256 in failover()

2016-08-13 03:17:34 LOG: pid 8662: fork a new PCP child pid 5133 in failover()

2016-08-13 03:18:11 LOG: pid 9204: worker process received restart request


Page 28: What we love about pgpool and what we don't!

Time skew can lead to issues in Watchdog mode

• pgpool exchanges UDP messages to keep tabs on the health of the other pgpool nodes

• These UDP messages contain a timestamp

• If server time deviates or gets synchronized, there is a chance that pgpool will discard messages, thinking they are from the past

if (!WD_TIME_ISSET(node->hb_send_time) ||
    WD_TIME_BEFORE(node->hb_send_time, pkt.send_time))
{
    ereport(DEBUG1,
            (errmsg("received heartbeat signal from \"%s:%d\"",
                    from, from_pgpool_port)));
    node->hb_send_time = pkt.send_time;
    node->hb_last_recv_time = tv;
}
else
{
    ereport(NOTICE,
            (errmsg("received heartbeat signal is older than the latest, ignored")));
}


Page 29: What we love about pgpool and what we don't!

Some tips!

Page 30: What we love about pgpool and what we don't!

• When running in a virtual environment, reserve the resources for the virtual machine – i.e. no ballooning

• Make sure your pgpool timeouts for node failover are higher than the time the failover itself takes

• Make sure you understand the authentication flow and define pg_hba and pool_hba rules accordingly (see the sketch after this list):

• Application --- AUTHENTICATES AGAINST --- pgpool --- AUTHENTICATES AGAINST --- PostgreSQL

• There are a few hidden (undocumented) checks done by pgpool

• It is good to allow connections from pgpool to PostgreSQL for system databases, e.g. postgres, template0, etc.

Some tips while running pgpool
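A sketch of the two-hop authentication (pool_hba.conf shares pg_hba.conf's format and is enabled with enable_pool_hba; addresses, users, and the system-database rule are illustrative):

# pgpool.conf
enable_pool_hba = on

# pool_hba.conf -- client -> pgpool
host    all    appuser    10.0.0.0/24     md5

# pg_hba.conf -- pgpool -> PostgreSQL (pgpool host address is illustrative)
host    all                  appuser    10.0.1.10/32    md5
host    postgres,template1   all        10.0.1.10/32    md5   # system DBs pgpool may probe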

Page 31: What we love about pgpool and what we don't!

• Ensure that the servers hosting pgpool are synchronized for time (e.g. via NTP)

• pgpool's timeout for node failover > the maximum time lag allowed / time-sync frequency in your environment

• Use a different user for pgpool health checks – with read-only permissions/restrictions (a sketch follows)

• Learn to use the pcp_* commands – manually and with your failover tool

Some more tips
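A sketch of the restricted health-check user (the health_check_* parameters are real pgpool-II settings; the role name, password, and connection limit are illustrative):

-- on PostgreSQL: a login-only role used solely for health checks
CREATE ROLE pgpool_hc LOGIN PASSWORD 'secret' CONNECTION LIMIT 2;

# pgpool.conf
health_check_period = 10          # seconds between checks
health_check_timeout = 20         # give up after 20 seconds
health_check_user = 'pgpool_hc'
health_check_password = 'secret'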

Page 32: What we love about pgpool and what we don't!

• You have a huge number of connections – any node failure can lead to a restart of connections, disconnecting clients

• When you want to use it just for connection pooling – use pgbouncer instead (see the sketch after this list)

• When you want lost nodes to join back automatically

• When you are operating in a network where latency and QoS are not guaranteed

• When you want queries to continue during a failure

• When you want to use it with different DB clusters

IMHO pgpool is not best when
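If pooling is the only requirement, a minimal pgbouncer.ini does the job with far fewer moving parts (values and paths illustrative):

; pgbouncer.ini -- pooling only, no load balancing or failover
[databases]
mydb = host=pg-master port=5432 dbname=mydb

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction      ; reuse server connections between transactions
max_client_conn = 500
default_pool_size = 20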

Page 33: What we love about pgpool and what we don't!

If you have more questions, write to us at: [email protected]

Website: www.ashnik.com