Michael Banck
- Senior Consultant / Project Manager at credativ (since 2009)
- credativ database team
- Debian Developer (since 2001)
- Debian PostgreSQL packaging team
- Several PostgreSQL patches, e.g.:
  - checksum validation during base backups
  - exclude schemas during pg_restore
  - permanent replication slot setup in pg_basebackup
Michael Banck <[email protected]> credativ GmbH 1
PostgreSQL - Overview
- “The World’s Most Advanced Open Source Relational Database”
- Extensible, object-relational database system
- Created as a research project at Berkeley; community-based development since the mid-90s
- Vendor-neutral; commercial support available from multiple companies
- “PostgreSQL Global Development Group”, core team (5 members), around 30 committers
- No copyright assignments, no open core, no dual licensing
- BSD/MIT-style licence
- Many (including proprietary) forks
Top Feature-Requests 2009
- Simple built-in replication
- In-place upgrades
- Administration/monitoring
- Driver quality/maintenance
- Extension management
- Per-column locale/collation
- Materialized and updatable views
- Autonomous transactions
- Parallel queries
- Index-only scans
- Merge/upsert statement
- Managed partitioning
- Hot Standby
- Recursive queries and window functions
Top Feature-Requests 2009 - Status in 2018
- Simple built-in replication
- In-place upgrades
- Administration/monitoring
- Driver quality/maintenance
- Extension management
- Per-column locale/collation
- Materialized and updatable views
- Autonomous transactions
- Parallel queries
- Index-only scans
- Merge/upsert statement
- Managed partitioning
- Hot Standby
- Recursive queries and window functions
PostgreSQL: What the Analysts Say
- Donald Feinberg, Gartner:
  - “Postgres functionality has increased greatly and is now more than sufficient to run both mission-critical and non-mission-critical applications.”
- Noel Yuhanna, Forrester:
  - “PostgreSQL has the second-largest open source community; has competitive technology and features and continues to expand its growth across various industries.”
  - “Performance, integration, security, unpredictable workloads, and high availability are companies’ top data management challenges.”
- Matt Aslett, 451 Group:
  - “PostgreSQL is a proven database for enterprise relational application workloads”
  - “Increased commercial offerings and cloud-based functionality are driving adoption”
http://2013.pgconf.de/de/talks/edb-pggermany-2013-v01.pdf
Enterprise Features - Definition
- Predictable major and patch releases, long support timeframes
- Fault tolerance and data consistency
- Enterprise-relevant security features
- Interoperability and extensibility
- Integrated operations, monitoring, backup
- Replication and high availability
- Big data analytics
- Vertical and horizontal scaling
But Wait - What about Antivirus?
pg_snakeoil - The PostgreSQL Antivirus
- Running typical antivirus software on PostgreSQL servers has severe drawbacks:
  - Severe performance impact
  - Filesystem becomes unreliable
  - Unclear failure modes
- Running antivirus software is sometimes required by local policy
- The PostgreSQL extension pg_snakeoil provides antivirus capabilities
- Leverages ClamAV to scan PostgreSQL data
- Technology preview
https://github.com/credativ/pg_snakeoil
Predictable Major and Patch Releases, Long Support Timeframes
- One major version per year, usually in September/October
Version   Release Date
11        October 18, 2018
10        October 5, 2017
9.6       September 29, 2016
9.5       January 7, 2016
9.4       December 18, 2014
9.3       September 9, 2013
9.2       September 10, 2012
9.1       September 12, 2011
9.0       September 20, 2010
- Time-based code freeze (Q1), followed by a beta phase
  - Release happens when no more serious bugs are present
  - Release management team (since 2016)
- Major releases are supported for 5 years (so-called back branches)
- Quarterly, predictable point releases for critical and security-relevant bugs
  - Always on the second Thursday of the second month of the quarter
  - Schedule: https://www.postgresql.org/developer/roadmap/
  - Security team handles security issues
  - Potentially out-of-band point releases in case of emergencies
- Distribution packages for all supported versions for Red Hat/CentOS/SLES and Debian/Ubuntu
  - http://yum.postgresql.org, http://apt.postgresql.org
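The point-release rule (second Thursday of February, May, August and November) is easy to compute. A minimal Python sketch follows; the function names are hypothetical, and the roadmap page stays authoritative, since out-of-band releases or shifted dates are possible.

```python
import calendar
import datetime


def second_thursday(year: int, month: int) -> datetime.date:
    """Return the second Thursday of the given month."""
    first = datetime.date(year, month, 1)
    # weekday(): Monday == 0 ... Thursday == 3 (calendar.THURSDAY)
    days_to_first_thursday = (calendar.THURSDAY - first.weekday()) % 7
    return first + datetime.timedelta(days=days_to_first_thursday + 7)


def scheduled_point_releases(year: int) -> list:
    """Second Thursday of the second month of each quarter (Feb, May, Aug, Nov)."""
    return [second_thursday(year, month) for month in (2, 5, 8, 11)]
```

For 2019 this yields February 14, May 9, August 8 and November 14.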
- No bug tracker, but a bug submission form
- Reported bugs are fixed promptly
- Patch support available from companies
- “When I submitted a bug to the list, usually within an hour or two I would get an email back saying ‘confirmed that that’s a bug, I’m gonna look at it’ and for the first three or four months, I never submitted a bug that I didn’t have a fix installed and running within 24 hours of submitting the initial post. And after my experience with Sybase and MySQL/MaxDB, it was totally amazing.”
http://archives.postgresql.org/pgsql-advocacy/2007-08/msg00620.php
Fault Tolerance and Data Consistency
- “I manage thousands of databases (PostgreSQL, SQL Server, and MySQL), and this past weekend we had a massive power surge that knocked out two APC cabinets. [...] Long story short, every single PostgreSQL machine survived the failure with zero data corruption. I had a few issues with SQL Server machines, and virtually every MySQL machine has required data cleanup and table scans and tweaks to get it back to ‘production’ status.”
- “I had exactly the same experience 3 years ago. Complete power failure (the stand-by generator took fire) in one small datacenter (around 500 machines). We had Oracle, SQL Server, DB2, MySQL, Progress, and of course PostgreSQL. The only database engine that restarted with no operation required was PostgreSQL. There were very minimal problems with Oracle (typing recover on some instances), but we had quite a few problems with the other engines.”
http://archives.postgresql.org/pgsql-advocacy/2011-04/msg00085.php
- Write-Ahead Log (WAL) protects transactions against crashes
- Automatic replay of the transaction log during crash recovery
- Synchronous replication to standbys possible
- Data checksums protect against storage errors
  - https://github.com/credativ/pg_checksums
- Verification of index/data consistency (amcheck extension)
- Regression, isolation and WAL consistency checks during development
- Fuzz testing via SQLsmith
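The crash-safety idea behind the first two bullets can be shown with a deliberately tiny model: a change is appended to the log before it counts as committed, and recovery redoes every log record written after the last checkpoint. The Python sketch below is a toy under exactly those assumptions; PostgreSQL's real WAL works on data pages with page LSNs, full-page writes and fsync, none of which is modelled here, and all class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LogRecord:
    lsn: int      # log sequence number, strictly increasing
    key: str
    value: str


@dataclass
class ToyStore:
    """Toy key/value store with redo-only write-ahead logging (not PostgreSQL code)."""
    wal: list = field(default_factory=list)     # survives a crash (assumed fsync'ed)
    data: dict = field(default_factory=dict)    # survives a crash ("data files")
    checkpoint_lsn: int = 0                     # records <= this are already in data
    cache: dict = field(default_factory=dict)   # volatile, lost on crash
    next_lsn: int = 1

    def write(self, key: str, value: str) -> None:
        # Log first; the data files are only updated later by a checkpoint.
        self.wal.append(LogRecord(self.next_lsn, key, value))
        self.next_lsn += 1
        self.cache[key] = value

    def checkpoint(self) -> None:
        # Flush cached changes to the data files and remember the log position.
        self.data.update(self.cache)
        self.checkpoint_lsn = self.next_lsn - 1

    def crash(self) -> None:
        self.cache = {}

    def recover(self) -> None:
        # Start from the data files and redo every record after the checkpoint.
        self.cache = dict(self.data)
        for record in self.wal:
            if record.lsn > self.checkpoint_lsn:
                self.cache[record.key] = record.value
        self.next_lsn = self.wal[-1].lsn + 1 if self.wal else 1
```

A write that was logged but never checkpointed is lost from memory by crash() and restored by recover(), which is why no manual repair is needed after a power failure.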
Enterprise-Relevant Security Features
- Authentication
  - Source IP/user/database based
  - LDAP
  - SSL certificates
  - SCRAM-SHA-256
- Database access control
  - Column-based grants
  - Row-level security (RLS)
  - SELinux extension (sepgsql)
- Auditing
  - PGAudit extension
  - Object audit logging
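With SCRAM-SHA-256 the server stores not the password but a verifier derived from it (RFC 5802 / RFC 7677). A minimal Python sketch of that derivation follows, producing a string in the SCRAM-SHA-256$<iterations>:<salt>$<StoredKey>:<ServerKey> layout PostgreSQL keeps in pg_authid. Assumptions: 4096 iterations and a 16-byte random salt as defaults, and no SASLprep normalization (PostgreSQL applies SASLprep, so non-ASCII passwords may yield different verifiers here). The function name is hypothetical.

```python
import base64
import hashlib
import hmac
import os
from typing import Optional


def scram_sha256_verifier(password: str, salt: Optional[bytes] = None,
                          iterations: int = 4096) -> str:
    """Derive a SCRAM-SHA-256 verifier (sketch; no SASLprep normalization)."""
    if salt is None:
        salt = os.urandom(16)
    # SaltedPassword := Hi(password, salt, i), i.e. PBKDF2 with HMAC-SHA-256
    salted = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    client_key = hmac.new(salted, b"Client Key", hashlib.sha256).digest()
    stored_key = hashlib.sha256(client_key).digest()   # StoredKey := H(ClientKey)
    server_key = hmac.new(salted, b"Server Key", hashlib.sha256).digest()

    def b64(raw: bytes) -> str:
        return base64.b64encode(raw).decode("ascii")

    return f"SCRAM-SHA-256${iterations}:{b64(salt)}${b64(stored_key)}:{b64(server_key)}"
```

Only StoredKey and ServerKey are kept, so the server can check a client proof without ever receiving or storing the plain password.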
Enterprise-Relevant Security Features - STIG
https://crunchydata.com/postgres-stig/PGSQL-STIG-9.5+.pdf
Interoperability and Extensibility
- Federation via Foreign Data Wrappers (FDW), SQL/MED standard
  - Particularly to other Postgres instances (postgres_fdw)
  - Other SQL databases: MySQL, Oracle, Informix, SQLAlchemy
- Extensions
  - Available since Postgres 9.1
  - Pure SQL or additional C-based libraries
  - Powerful API and hooks
  - Large, growing number:
    - Additional data types
    - Procedural languages
    - Administrative helpers
    - Auditing/logging
    - Foreign Data Wrappers
    - New index types (since 10)
Enterprise-Relevant Extensions - Examples
- pgaudit - Event auditing
- pglogical - Logical replication
- orafce - Oracle compatibility
- postgis - Spatial
- pg_partman - Partition management
- pgcrypto - Cryptographic functions (column-level encryption)
- tsearch/pg_trgm - Full-text search / similarity search
- sepgsql - SELinux-based mandatory access control
- pgstrom - GPU offloading of compute-intensive workloads
Integrated Operations, Monitoring, Backup
PostgreSQL Appliance Dashboard
https://elephant-shed.io
https://github.com/credativ/elephant-shed
Elephant-Shed
- pgAdmin 4 - Web-based PostgreSQL administration
- Grafana - Monitoring dashboards
- pgBadger - Logfile analysis
- pgBackRest - Backups
- Prometheus - Monitoring metrics
- Cockpit - System and service administration
- Shell In A Box - Web-based terminal emulator
Elephant-Shed Monitoring Dashboard
Replication and High Availability
Physical (Streaming) Replication
- Transaction log streaming to standby
- Read-only queries possible on standby (Hot Standby)
- Quorum-based synchronous replication, optionally per transaction
- Consistent reads from synchronous standbys
- Crash-proof retention of required transaction logs per standby via replication slots
- Standby cloning via base backup
- Switchover, switchback, promote and remastering
- Cascading and/or delayed replication
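Quorum-based synchronous replication (a setting such as synchronous_standby_names = 'ANY 2 (s1, s2, s3)') means a commit returns only once any k of the listed standbys have confirmed its WAL position. A minimal Python sketch of that decision, assuming the confirmed positions are available as text LSNs like those in pg_stat_replication.flush_lsn; the function names are hypothetical and this is not PostgreSQL's implementation.

```python
from typing import Dict, Iterable


def parse_lsn(text: str) -> int:
    """Convert an LSN such as '16/B374D848' into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def quorum_reached(commit_lsn: str, flushed: Dict[str, str],
                   candidates: Iterable[str], k: int) -> bool:
    """True if at least k candidate standbys have flushed WAL up to commit_lsn."""
    target = parse_lsn(commit_lsn)
    acks = sum(
        1
        for name in candidates
        if name in flushed and parse_lsn(flushed[name]) >= target
    )
    return acks >= k
```

A disconnected standby simply never acknowledges, so with ANY 2 out of three standbys commits keep flowing as long as two of them are healthy.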
Logical Replication - Use Cases
- Native (since 10) or via pglogical extension
- Major upgrades
- Change Data Capture
  - Database changes as e.g. JSON
- Data aggregation and integration
  - Individual tables
  - Row/column filtering (pglogical)
- Bi-directional replication
  - Third-party solution
  - Geographically distributed cluster
  - Conflict resolution handling required
High Availability - Definition
- Protection against hardware/software outages
  - CPU defect
  - Network card failure
  - Kernel panic
  - Postgres process crash
- Maintenance does not impair service
  - Restart of Postgres process after patching or configuration change
  - Major version upgrade of Postgres
  - Operating system upgrade
- Application is continuously available
  - No long-lasting locks during schema changes
High Availability - Failover Solutions
- Pacemaker/Corosync
  - pgsql resource agent (standard)
  - pgsqlms resource agent (PostgreSQL Automatic Failover, PAF)
- Patroni
- repmgr
- pgpool-II
- Kubernetes Operator
  - Patroni
  - Crunchy Data Container Suite
- Client-based failover via definition of multiple hosts
  - PgJDBC (since 9.3-1100)
  - libpq (since 10)
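With libpq (10 and later), client-side failover is configured by listing several hosts in one connection string together with target_session_attrs=read-write; libpq tries the hosts in order and keeps the first connection that accepts writes. The Python sketch below builds such a URI and models the selection with injected callbacks instead of real connections; build_conninfo, pick_host and the host names are hypothetical.

```python
from typing import Callable, Optional, Sequence


def build_conninfo(hosts: Sequence[str], dbname: str, port: int = 5432) -> str:
    """Multi-host libpq URI that only accepts a read-write (primary) server."""
    host_list = ",".join(f"{host}:{port}" for host in hosts)
    return f"postgresql://{host_list}/{dbname}?target_session_attrs=read-write"


def pick_host(hosts: Sequence[str],
              is_reachable: Callable[[str], bool],
              is_read_write: Callable[[str], bool]) -> Optional[str]:
    """Model of the client-side choice: first reachable host that accepts writes."""
    for host in hosts:
        if not is_reachable(host):
            continue          # connection failed: try the next host
        if is_read_write(host):
            return host       # primary found
        # reachable but read-only (a standby): keep looking
    return None
```

After a failover the application reconnects with the unchanged connection string and ends up on the new primary.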
High Availability - Pacemaker Master/Slave Set (PAF)
- Resource agent pgsqlms, developed by Dalibo, PostgreSQL licence
- Master/slave set, streaming replication
- Besides failover/promote, controlled switchover/switchback and demote are possible
- Switchover only if the current primary can become a standby without any problems
- On promote, a notify event is intercepted to check whether other standbys have replayed further transactions
- Relatively simple configuration
- STONITH device required; timeouts need to be tested/adjusted
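The promote check boils down to comparing how far each standby has replayed the WAL: if another standby is ahead, promoting the local one would lose transactions. A minimal Python sketch, assuming the replayed positions are known as text LSNs (as returned by pg_last_wal_replay_lsn() on each standby); should_promote is a hypothetical name, and PAF itself works through Pacemaker's notification and scoring mechanisms rather than a function like this.

```python
from typing import Dict


def parse_lsn(text: str) -> int:
    """Convert an LSN such as '0/5000100' into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def should_promote(local: str, replayed: Dict[str, str]) -> bool:
    """Promote the local standby only if no other standby has replayed further."""
    best = max(parse_lsn(lsn) for lsn in replayed.values())
    return parse_lsn(replayed[local]) == best
```

LSNs must be compared numerically: the hexadecimal parts are not zero-padded, so comparing "0/10000000" and "0/FFFFFFF" as text would pick the wrong standby.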
High Availability - Pacemaker Example Setup
High Availability - Patroni
- Agent that configures instances and replication and enables switchover (bot pattern)
- Uses a distributed consensus store (etcd, Consul, ZooKeeper) for leader election and split-brain avoidance
- Offers a REST API for status, health checks and configuration changes
- Optional HAProxy for master/replica service endpoints
  - HTTP checks against the REST API on /master and /replica, respectively
- Deployment in containers, Kubernetes, bare metal or via Debian/Ubuntu packages
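HAProxy routes connections purely from the HTTP status codes of Patroni's REST API: /master answers 200 only on the current leader, /replica answers 200 only on a healthy replica. A minimal Python sketch of the same classification; port 8008 (Patroni's usual REST API port) is an assumption, the status lookup is injectable so it can be tested without a cluster, and classify_nodes is a hypothetical name.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, Optional


def http_status(url: str, timeout: float = 2.0) -> Optional[int]:
    """Return the HTTP status code of a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:   # non-2xx codes such as 503 are raised
        return error.code
    except OSError:                            # connection refused, timeout, DNS
        return None


def classify_nodes(nodes: Iterable[str],
                   status: Callable[[str], Optional[int]] = http_status,
                   port: int = 8008) -> Dict[str, str]:
    """Map each node to 'master', 'replica' or 'unavailable'."""
    roles = {}
    for node in nodes:
        if status(f"http://{node}:{port}/master") == 200:
            roles[node] = "master"
        elif status(f"http://{node}:{port}/replica") == 200:
            roles[node] = "replica"
        else:
            roles[node] = "unavailable"
    return roles
```

Because the leader decision lives in Patroni's consensus store, the proxy never has to guess: at most one node answers 200 on /master.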
High Availability - Continuous Service Maintenance
- Transparent Postgres restart
  - pgBouncer: PostgreSQL connection proxy/pooler/router
    - Holds incoming connections with the PAUSE command
    - Postgres restart after all active connections have ended
    - Application sees delayed connections instead of error messages
    - Requires short-lived sessions/transactions
    - Incoming connection routing during major-version upgrade switchover
- Near-zero-downtime major upgrades
  - Logical replication (pglogical, built-in (from 10), Slony-I)
    - Requires redundant hardware/storage and primary keys
  - In-place upgrades with pg_upgrade
    - Does not require primary keys, but a second data directory
    - Hardlink mode (without possibility of switchback): downtime from 10 s
    - Scales with the number of database objects, not database size
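The PAUSE-based restart works because the pooler queues new requests instead of rejecting them while Postgres is away. The toy Python model below (hypothetical names; not pgBouncer's code, which additionally waits for running queries to finish before PAUSE returns) illustrates why the application only sees a delay.

```python
from collections import deque
from typing import Deque, List


class ToyPooler:
    """Toy model of a pooler's PAUSE/RESUME behaviour."""

    def __init__(self) -> None:
        self.paused = False
        self.waiting: Deque[str] = deque()   # requests held during PAUSE
        self.served: List[str] = []          # requests forwarded to Postgres

    def request(self, client: str) -> None:
        if self.paused:
            self.waiting.append(client)       # held, not rejected with an error
        else:
            self.served.append(client)

    def pause(self) -> None:
        self.paused = True                    # Postgres can now be restarted

    def resume(self) -> None:
        self.paused = False
        while self.waiting:                   # forward held requests in arrival order
            self.served.append(self.waiting.popleft())
```

This is also why short-lived transactions matter: the longer clients hold their connections, the longer the pause lasts.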
High Availability - Long-Lasting Locks
The following operations do not require long-lasting exclusive locks or table rewrites:
- Adding columns with NULL or a non-volatile DEFAULT (the latter from 11)
- Dropping columns
- Dropping or validating constraints
- Concurrent index creation
- Foreign key creation
- Unique constraint creation via concurrent index
- Table reorganization with pg_repack
Big Data Analytics
- Declarative partitioning (since 10) allows management of huge tables
- CUBE, ROLLUP, and GROUPING SETS for multi-level aggregation
- Block Range Indexes (BRIN) summarize ranges of table blocks at around 1% of the size of a default (B-tree) index
- TABLESAMPLE clause allows sampling data with an upper-bounded runtime
- Parallel query allows usage of multiple cores for reporting queries
- PL/R procedural language allows for statistical analysis in R
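A BRIN index stores only a small summary (for example minimum and maximum) per range of table blocks, 128 pages per range by default, which is why it is so compact. The Python sketch below mimics the idea on a plain list, using rows instead of pages per range; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def build_brin(values: Sequence[int], rows_per_range: int) -> List[Tuple[int, int]]:
    """One (min, max) summary per consecutive range of rows."""
    summaries = []
    for start in range(0, len(values), rows_per_range):
        chunk = values[start:start + rows_per_range]
        summaries.append((min(chunk), max(chunk)))
    return summaries


def ranges_to_scan(summaries: List[Tuple[int, int]], low: int, high: int) -> List[int]:
    """Ranges whose [min, max] overlaps [low, high]; all others are skipped."""
    return [i for i, (mn, mx) in enumerate(summaries) if mx >= low and mn <= high]
```

This only pays off when values correlate with the physical row order (e.g. an insert timestamp); on randomly ordered data every range spans almost the full value range and nothing can be skipped.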
Vertical and Horizontal Scaling
Vertical and Horizontal Scaling - Definition
- Vertical scaling: improved utilization of the server’s existing resources
  - More transactions per CPU core
  - Usage of multiple CPU cores for individual queries
- Horizontal scaling: load distribution across several servers
  - Distributing queries to multiple servers
    - Data replicated to every server: load balancing
    - Data distributed between servers: sharding
  - Usage of multiple servers for individual queries
    - Massively parallel processing
Vertical Scaling - TPC-H Benchmark 5 GB, v9.5-v11: Sequential vs. Parallel
Vertical Scaling - TPC-H Benchmark 5 GB, v9.5-v11: Selected Queries
Vertical Scaling - TPC-H Benchmark 1 TB, v9.6, 72 Cores, Q1
https://blog.2ndquadrant.com/parallel-monster-benchmark/
Horizontal Scaling - Load Balancing
- Read queries get distributed
- Write queries on primary
- Data is replicated to all nodes
- Application-transparent
  - pgpool-II
- Application support required
  - HAProxy
  - pgBouncer DNS round-robin
  - PgJDBC connection option loadBalanceHosts=true
- Parameter synchronous_commit = remote_apply for consistent read queries
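When the application does the load balancing itself, the core rule is: writes go to the primary, reads are spread over the replicas. A naive Python sketch of such a router follows (hypothetical class; it classifies statements only by their first keyword, so e.g. SELECT ... FOR UPDATE or functions with side effects would be misrouted).

```python
import itertools
from typing import Optional, Sequence


class ReadWriteRouter:
    """Send SELECTs round-robin to standbys, everything else to the primary."""

    def __init__(self, primary: str, standbys: Sequence[str]) -> None:
        self.primary = primary
        self._standbys: Optional[itertools.cycle] = (
            itertools.cycle(standbys) if standbys else None
        )

    def route(self, sql: str) -> str:
        words = sql.split(None, 1)
        first_word = words[0].upper() if words else ""
        if first_word == "SELECT" and self._standbys is not None:
            return next(self._standbys)
        return self.primary
```

A read issued right after a write may still miss that write on a standby unless the write was committed with the remote_apply setting mentioned above.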
Horizontal Scaling - Sharding
- Read and write queries get distributed
- Data is distributed between nodes
- Small (dimension) tables usually replicated to all nodes for efficient joins
- Postgres-XL
- Greenplum
- CitusDB
- PL/Proxy
- FDW-based native sharding probably coming in the future
“Towards Built-in Sharding in Community PostgreSQL”
https://www.pgcon.org/2017/schedule/events/1069.en.html
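Distributing rows between nodes needs a deterministic mapping from a distribution key to a shard. The simplest form is hash-modulo routing, shown below as a Python sketch with a hypothetical function name; it is a generic technique, not the scheme of any product listed above (Citus, for instance, assigns ranges of hash values to shards, which makes rebalancing easier than plain modulo).

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a distribution key to a shard number in [0, shard_count)."""
    # A stable hash is required: Python's built-in hash() of str is
    # randomized per process and would route the same key differently.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

Changing shard_count remaps most keys, which is the practical reason to fix the number of shards up front or to route by hash ranges instead.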
Thanks for your attention - Contact
- Questions?
- Michael Banck <[email protected]>
- http://www.credativ.de
- http://www.credativ.de/postgresql-competence-center
- http://www.credativ.de/jobs
- http://www.credativ.de/blog