To PostgreSQL and Beyond: The Definitive Guide to Mitigating PostgreSQL Challenges
Accelerating Application Development and Delivery in any Enterprise Environment


CONTENTS

INTRODUCTION

READY: PLAN BEFORE YOU LAUNCH

START: DAY-ONE CHALLENGES

ACCELERATE: DAY-TWO AND BEYOND

COMPARING OPERATING ENVIRONMENTS

CONCLUSION

ABOUT STRATOSCALE


INTRODUCTION

A brief historical context

The evolution of database software can be traced back to the era of punch cards and has naturally aligned with the capabilities and sophistication of underlying storage hardware. A key inflection point in this history occurred in the early 70s through the work of Edgar Codd at IBM. Codd developed the relational database model, including the foundations of Structured Query Language (SQL), which was subsequently realized in practice in IBM's System R. Shortly after, Oracle (known at the time as Relational Software, Inc.) recognized the market opportunity for a relational database.

The rise of open source databases and PostgreSQL

As with all legacy software from that era, Oracle's proprietary system was closed source and incurred high licensing and support costs to implement, all attributes which remain true today. The landscape for alternative solutions began to change as open source options were developed in the 80s and 90s. One such project emerged from the University of California at Berkeley during this timeframe and became the PostgreSQL software that is widely adopted today. PostgreSQL is ACID-compliant and exhibits a high degree of conformance with the SQL standard. Over its history, PostgreSQL has proven to provide a high degree of stability and performance. Because of these capabilities, a natural question many IT teams managing legacy relational databases find themselves facing is whether, and how, to replace proprietary systems with this popular open source alternative.

The business case for open source

A naive assessment, easily dismissed by experienced IT leaders, is that the business case for adopting open source software rests on its ability to translate directly into reduced operating costs. While it's tempting to simply compare the licensing fees of proprietary systems against those of open source, that approach is both misleading and incomplete when evaluating the potential return on investment of a database technology migration. Indeed, many of the compelling value propositions for open source solutions revolve around freedom of choice and agility, whether in implementation (by virtue of source code access), deployment, maintenance, or application architectures. Similarly, while proprietary software products typically imply a single choice of vendor for support, open source ecosystems by their very nature promote a marketplace of differentiated expertise and providers. This allows IT leaders to select the combination of internal and external resources necessary to enable a successful implementation.

Flexibility can be a double-edged sword, however, as there is an implicit tradeoff between choice and assumed risk. Indeed, the downside risks associated with open source databases can be quite high due to the pivotal role these systems serve in application stacks, and thereby the overall business. The best strategy to mitigate these risks is to adequately plan and execute a staged evaluation and rollout approach which allows IT teams to obtain clarity around key challenges across the lifecycle of a PostgreSQL deployment. With this in hand, IT leaders can feel confident in their team's ability to implement PostgreSQL in a manner that realizes the promised business case potential of open source in practice.


Helping you get prepared

The purpose of this paper is to provide an overview of key concerns and challenges that IT leaders should weigh when considering a PostgreSQL implementation. By identifying these points early in the process, teams are better positioned to create and execute against a pragmatic evaluation and rollout plan.

The goal is to give readers context around these critical questions, including the hurdles they'll face, along with guidance that can inform an internal evaluation process.

To help readers frame these discussions, this paper incrementally steps through the stages of deployment and the corresponding questions that should be top of mind during planning:

Before you launch:

• What do we need to know and consider before getting started?

• What common pitfalls and tools should be on our radar?

Day-one challenges:

• What needs to happen once we turn the system on?

• How should we approach important operational tasks?

Day-two and beyond:

• How can we mitigate the risks of data loss and service unavailability?

• How do we manage the system over time?


READY: PLAN BEFORE YOU LAUNCH

The foundation of any successful technology implementation lies in the planning and research conducted prior to execution. This section outlines two important areas that IT teams must consider before launching a PostgreSQL rollout:

• Deployment

• Migrating existing databases

Deployment

Open source software projects are conducive to multiple implementation approaches, allowing operators to adopt a deployment model that works best for them. Accordingly, PostgreSQL can be instantiated in a variety of environments including:

• On-premise via bare metal or VM host

• On-premise via private cloud

• Public cloud

Each of these options has its own tradeoffs as well as implementation considerations.


PostgreSQL on bare metal or VM host

When running PostgreSQL on-premise without a cloud orchestration layer, potential options include deploying directly onto bare metal or using virtual machines. When there is a high degree of sensitivity towards performance concerns, it's common to employ a bare metal install to avoid any overhead incurred from virtualization software as well as to support any desire to directly control CPU, memory, and storage configurations. Conversely, for IT environments that have adopted virtualization technologies and established a higher degree of comfort with the resource management and efficiency benefits provided therein, deploying databases using a virtual machine may be the preferable approach.

Both directions share common implementation options:

• Hardware configuration

• Operating system

• Installation approach (package versus source)

• Manual versus automation tools for deployment and configuration

Hardware considerations

In general, there is no prescriptive hardware specification for PostgreSQL, and it has been proven to run reliably on even modest configurations. In practice, the specific CPU and memory configuration for a PostgreSQL server should be commensurate with the expected load for the intended application and environment. For example, an application may have high resource demands on its database tier in production, but for a testing environment the needs might be significantly reduced. In this respect, virtual machines offer advantages over bare metal hosts since they provide a higher degree of flexibility for mapping resources appropriately and adjusting them as needed over time.

When it comes to storage, it is critical that the configured data target for the server maps to a physical resource that is highly reliable and fault tolerant. For bare metal implementations this often equates to the use of RAID (beyond RAID 0, and typically RAID 1 or RAID 10). When installing on a virtual machine, the virtual storage layer can map virtual disks to some form of networked storage (SAN or NAS) which is orchestrated using features specific to the virtualization platform. The ability to leverage solid state drives (SSDs) at the physical storage layer can provide a performance boost, but this isn't necessarily required and may not be a cost-effective choice depending upon the performance requirements and characteristics of the overall application architecture.


Software considerations

PostgreSQL can be installed on a variety of Unix-like operating systems as well as Windows, though the broader ecosystem and community tend to coalesce around the former. Once a host operating system is selected, PostgreSQL binaries can be installed via package or compiled directly from source code.

Packages for all major operating systems can be obtained from the PostgreSQL website and are also maintained by Linux distributions. As we'll discuss next, the selection decision between these often gets integrated into the automation tools adopted to streamline deployments.

Automation tools

While IT administrators can always install and configure PostgreSQL manually on a host via command line or UI (depending upon the operating system and installation package), it's highly preferable to utilize some form of automation so that the process is both repeatable (e.g. for CI / CD as we'll discuss later) and auditable.

There are open source recipes available for commonly adopted tools such as Ansible and Chef, amongst others, that encapsulate best practices and allow administrators to leverage knowledge from the broad community of users with experience deploying the database software across a variety of operating systems and configurations.

PostgreSQL within cloud environments

Public cloud providers such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP) all support managed database services that allow users to quickly deploy PostgreSQL with minimal complexity. Compared to the option of directly installing the database within a deployed virtual machine, which is always available through traditional public cloud Infrastructure-as-a-Service (IaaS) abstractions, managed services offer the benefit of automating basic deployment and lifecycle tasks. IT teams can alleviate deployment complexity by adopting these Database-as-a-Service (DBaaS) capabilities.

Luckily, the benefits of DBaaS aren't limited to the province of public cloud providers. Within on-premise environments, there are commercial solutions available which allow enterprises to provide capabilities comparable to public clouds. Stratoscale Symphony, for example, equips administrators with the ability to deploy a private cloud within private data centers that is compatible with AWS and GCP (Microsoft Azure coming soon) APIs. Moreover, Symphony runs on any hardware and can be deployed in any enterprise environment, including edge locations, thereby providing a compelling degree of flexibility within a single software-defined data center platform. Symphony also offers its own DBaaS, Symphony Managed Databases (SMD), which can be exposed via any of the aforementioned public cloud APIs. Administrators can adopt Symphony to realize the benefits of a managed database service through their on-premise infrastructure while simultaneously supporting APIs that their teams are already familiar with.


Comparing deployment approaches

Table 1 summarizes the various deployment choices and environments reviewed in this section by comparing how key attributes vary when deploying PostgreSQL in a private datacenter, a private cloud (via DBaaS), or a public cloud (via DBaaS).

Table 1: Comparing PostgreSQL deployment approaches.

| Attribute | Public cloud (DBaaS) | Private cloud (DBaaS) | Private DC / Bare metal |
| Hardware selection | Select from available options | Select from operator-configurable options | Selected by administrator |
| Operating system | Managed by vendor | Managed by service operator | Selected by administrator |
| Packages versus source | Pre-packaged by vendor | Pre-packaged by operator | Option of OS-specific packages or source |
| Automation | Orchestrated by managed service | Orchestrated by managed service | Limited option to orchestrate via open source scripts / tools |


Migrating existing databases

A key decision criterion for adopting any technology within an IT infrastructure is whether it can support the requirements of existing applications. The first step in that evaluation is to conduct functional tests across workloads. This level of analysis requires migrating data from existing storage systems to a proof-of-concept environment of the candidate database, in this case PostgreSQL.

Elements of this process include:

• Schema and code migration

• Data migration

• Application code migration

• Testing and evaluation

Schema and code migration

As highlighted earlier, PostgreSQL offers ANSI standard SQL syntax and data types, which helps to make converting schemas from other databases a relatively straightforward process. However, there are still nuances that must be considered and addressed. For example, when converting from an Oracle database, one must map data types across systems due to subtle variances or to reap the benefits of data types in PostgreSQL which may not be available with Oracle. For instance, the text type supported by PostgreSQL can store up to 1GB of data and can be used in place of the special large CLOB type in Oracle. While some of the migration tools mentioned shortly can help automate the process, this data type mapping is a critical stage of the process and often requires some degree of manual oversight.
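To make the mapping step concrete, the following is a minimal sketch (not taken from any particular migration tool) of how a team might express an Oracle-to-PostgreSQL type map; the table and column names are purely illustrative.

```python
# Minimal sketch of an Oracle-to-PostgreSQL column type map (illustrative names only).
ORACLE_TO_PG = {
    "VARCHAR2": "varchar",
    "NVARCHAR2": "varchar",
    "NUMBER": "numeric",
    "DATE": "timestamp",   # Oracle DATE carries a time component
    "CLOB": "text",        # PostgreSQL text holds up to ~1 GB
    "BLOB": "bytea",
}

def convert_column(name, oracle_type, length=None):
    """Render one column definition for a PostgreSQL CREATE TABLE statement."""
    pg_type = ORACLE_TO_PG.get(oracle_type.upper(), "text")  # fall back conservatively
    if pg_type == "varchar" and length:
        pg_type = f"varchar({length})"
    return f"{name} {pg_type}"

# Example: an Oracle column CUSTOMER_NAME VARCHAR2(120) becomes:
print(convert_column("customer_name", "VARCHAR2", 120))  # customer_name varchar(120)
```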


In addition to data types, existing databases typically have triggers, stored procedures, scripts, etc. that require modifications to instantiate equivalent functionality within PostgreSQL. Similar to the issues with data types encountered with schema migration, there are differences in database programming languages, PL/SQL for Oracle versus PL/pgSQL for PostgreSQL, that must be accounted for during migration. In some cases, this work might require mapping Oracle supported functions to their ANSI standard equivalents. For example, code that uses the PL/SQL nvl() function needs to be modified to instead call the PL/pgSQL coalesce() function. Depending upon the legacy database, this porting process may be the most intensive effort required for migration.

Along with migrating the basic data schema, it will also be necessary to identify and instantiate any users and corresponding permission models that applications depend upon. In some cases, this might well be a good opportunity to revisit overall data governance and security (items reviewed later in this paper).


Data migration

The next step after converting the database schema is to migrate the underlying data from the legacy database to the PostgreSQL server. An iterative approach here consists of migrating a sample test dataset and validating functionality before incurring the overheads of migrating an entire dataset (a brief sketch of this approach follows the tool list below). Depending on the deployment environment, tools that can help streamline and automate this process, often along with the schema conversion outlined earlier, include:

• AWS SCT and DMS - Migration tools available for the AWS cloud

• Full Convert and DBConvert - Third-party tools for converting across databases

• ora2pg and ora_migrator - Open source tools for migrating Oracle to PostgreSQL
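As a rough illustration of the sample-first approach, the sketch below assumes a DB-API 2.0 connection to the legacy source (for example cx_Oracle) and psycopg2 on the PostgreSQL side; the customers table and its columns are hypothetical.

```python
# Minimal sketch: copy a bounded sample from a legacy source into PostgreSQL and
# validate the row count before committing to a full migration run.
# Assumes `source_conn` is any DB-API 2.0 connection and the target schema exists.
import psycopg2
from psycopg2.extras import execute_values

SAMPLE_SIZE = 10_000

def migrate_sample(source_conn, pg_dsn):
    src = source_conn.cursor()
    src.execute("SELECT id, name, created_at FROM customers")
    rows = src.fetchmany(SAMPLE_SIZE)

    with psycopg2.connect(pg_dsn) as pg_conn:
        with pg_conn.cursor() as cur:
            execute_values(
                cur,
                "INSERT INTO customers (id, name, created_at) VALUES %s",
                rows,
            )
            cur.execute("SELECT count(*) FROM customers")
            copied = cur.fetchone()[0]

    assert copied == len(rows), f"expected {len(rows)} rows, found {copied}"
    return copied
```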

Application code migration

A critical component of any migration consists of porting application-level code that accesses the underlying database. PostgreSQL has broad support for modern programming languages, and compatible libraries are widely available. For an internal application architected so that it can simply move from an existing driver to the PostgreSQL equivalent, the risk of extensive changes is minimized. Applications that, for example, use generic object-relational mappings (ORM) or standards-compliant JDBC / ODBC interfaces and do not contain hard-coded SQL typically carry over easily. In contrast, applications that incorporate database-specific classes (e.g. Oracle) or in-line SQL queries will likely require revising as part of the migration.
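The contrast below is illustrative only: the same hypothetical query written once with Oracle-specific constructs that would need rewriting, and once in portable form that runs unchanged on PostgreSQL.

```python
# Illustrative only: the same query with Oracle-specific constructs versus portable SQL.
# Table and column names are hypothetical.

# Oracle-flavoured in-line SQL -- NVL and SYSDATE would need rewriting during migration.
ORACLE_QUERY = """
    SELECT order_id, NVL(discount, 0)
    FROM orders
    WHERE created_at > SYSDATE - 7
"""

# ANSI / PostgreSQL-friendly equivalent -- COALESCE and CURRENT_TIMESTAMP are standard.
PORTABLE_QUERY = """
    SELECT order_id, COALESCE(discount, 0)
    FROM orders
    WHERE created_at > CURRENT_TIMESTAMP - INTERVAL '7 days'
"""
```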

Enterprises often make heavy use of commercial applications across their business units. Common examples include third-party software for enterprise resource planning (ERP), customer relationship management (CRM), accounting, and integrated workplace management systems (IWMS). These applications typically allow administrators to install and configure a vendor-provided connector for their specific backend database. As part of assessing potential issues across workloads, administrators should confirm and validate the availability of PostgreSQL support with all third-party applications under consideration for migration.

Testing and evaluation

At the end of a successful database migration from a legacy database, teams should find themselves with an equivalent schema, procedures, and data in a PostgreSQL environment. Along with the requisite modifications to application-level code and connectors, this provides the foundation to conduct the in-depth testing and refinements necessary to develop confidence that the migrated database environment provides functionality equivalent to the legacy system.



START: DAY-ONE CHALLENGES

Once an IT team has planned out a deployment and validated application functionality post-migration, the next concerns center on hardening the deployment for production usage. In this section, we'll highlight corresponding areas of PostgreSQL operations including:

• Security

• Monitoring and logging

Security

In our modern era of Internet-connected systems and heightened awareness around cybersecurity incidents, security is always under scrutiny, both as a reliability concern and as a business risk.

In a PostgreSQL deployment, security can be addressed through:

• Data encryption

• Server configuration

• Network configuration


Data encryption

For compliance reasons as well as general best practice, there is often a desire to ensure data is encrypted both in flight and at rest. To encrypt data in flight, client access to databases should be configured to utilize SSL, and the server should be configured to accept (and, at the administrator's discretion, require) secure connections. Packaged and managed versions of PostgreSQL support SSL out of the box. However, when compiling binaries from source, the build configuration must specify SSL support.
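As a minimal sketch of an encrypted client connection, assuming psycopg2 and placeholder host, database, and certificate paths (server-side SSL and any "require" policy are configured separately in postgresql.conf and pg_hba.conf):

```python
# Minimal sketch: require an encrypted, certificate-verified connection from a client.
# Host, database, credentials, and certificate paths are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="db.example.internal",
    dbname="appdb",
    user="app_user",
    password="change-me",
    sslmode="verify-full",              # refuse unencrypted or unverified connections
    sslrootcert="/etc/ssl/certs/pg-root.crt",
)
with conn.cursor() as cur:
    cur.execute("SELECT ssl FROM pg_stat_ssl WHERE pid = pg_backend_pid()")
    print(cur.fetchone())               # (True,) when the session is encrypted
conn.close()
```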

Encryption at rest can be implemented at various levels of the stack depending upon application and infrastructure requirements. At the database level, PostgreSQL's pgcrypto module can be applied to encrypt and decrypt data based upon client-provided keys. This approach is useful if there is no other encryption mechanism in place and there is specific sensitive data that should not be stored unencrypted. Otherwise, administrators can employ encryption at the file system level (e.g. in a bare metal or virtual machine deployment) or at the cloud layer. For example, AWS RDS and, for on-premise deployments, Stratoscale SMD allow administrators to easily enable encryption of all data with minimal performance impact and without having to manually manage keys.
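A minimal sketch of column-level encryption with pgcrypto is shown below; the patients table and the in-code passphrase are purely illustrative, and a real deployment would source the key from a secrets manager rather than application code.

```python
# Minimal sketch of column-level encryption with pgcrypto's symmetric functions.
# Table, column, and key handling are purely illustrative.
import psycopg2

KEY = "example-passphrase"

with psycopg2.connect("dbname=appdb user=app_user") as conn:
    with conn.cursor() as cur:
        cur.execute("CREATE EXTENSION IF NOT EXISTS pgcrypto")
        cur.execute("CREATE TABLE IF NOT EXISTS patients (name text, ssn bytea)")
        cur.execute(
            "INSERT INTO patients (name, ssn) VALUES (%s, pgp_sym_encrypt(%s, %s))",
            ("Jane Doe", "123-45-6789", KEY),
        )
        cur.execute(
            "SELECT pgp_sym_decrypt(ssn, %s) FROM patients WHERE name = %s",
            (KEY, "Jane Doe"),
        )
        print(cur.fetchone()[0])   # "123-45-6789" -- stored on disk only as ciphertext
```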

Server configuration

When it comes to server configuration, one of the most important security aspects is to use users and roles to grant each user, and thereby the applications authenticating as that user, only the minimum access needed. PostgreSQL supports the ability to assign ownership and permissions for specific tables, functions, etc. based upon administrator-defined roles.

One effective strategy to employ this role abstraction is to define three types of roles:

1. Role roles

2. Group roles

3. User roles

The first type of role can be used to assign privileges over objects, while the second, group roles, serves as a collection of the former. Finally, user roles are granted one or more group roles and are the entities used for login and authentication by administrators and applications. This decoupling allows for granular and maintainable permission management as well as flexibility over time.
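A minimal sketch of this three-tier pattern is shown below; the role and table names are illustrative, and the statements could equally be run from psql by a sufficiently privileged administrator.

```python
# Minimal sketch of the three-tier role pattern described above, run as a superuser
# or sufficiently privileged administrator. Role and table names are illustrative.
import psycopg2

STATEMENTS = [
    # 1. Privilege-holding roles scoped to objects
    "CREATE ROLE orders_read NOLOGIN",
    "GRANT SELECT ON orders TO orders_read",
    # 2. Group roles that collect privilege roles
    "CREATE ROLE reporting NOLOGIN",
    "GRANT orders_read TO reporting",
    # 3. User (login) roles granted one or more group roles
    "CREATE ROLE report_app LOGIN PASSWORD 'change-me'",
    "GRANT reporting TO report_app",
]

with psycopg2.connect("dbname=appdb user=postgres") as conn:
    with conn.cursor() as cur:
        for stmt in STATEMENTS:
            cur.execute(stmt)
```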



Network configuration

The network layer provides a final layer of security that should be implemented by IT and network administrators. PostgreSQL does include the ability to specify inbound IP address ranges on a per-client basis via its pg_hba.conf file. However, in many cases this can become unwieldy to manage over time, particularly if virtualization or cloud technologies are employed. In these cases, or simply as a supplement, network ACLs and topologies can be managed via virtual networking or VPC primitives based upon the underlying cloud platform. Solutions such as AWS and Stratoscale Symphony, for public or private cloud deployments respectively, shine here with their support for API-driven software-defined networks, which can be used to provision strict security policies while allowing for ease of management over time.

Looking to automate PostgreSQL ops on-prem? Stratoscale brings the power of the public cloud to any enterprise environment and enables automation against public cloud APIs.


Monitoring and logging

An effective monitoring and logging strategy is critical for maintaining the reliability, availability, and performance of database environments. The first step towards developing such a strategy is to create a plan based upon answers to questions such as:

• What are your key performance indicators (KPIs) for monitoring?

• What metrics, and at what frequencies, do you need to monitor for these KPIs?

• What monitoring tools do you need to track these metrics?

• Where do you employ passive monitoring and where do you need active notifications?

Monitoring KPIs

It’s tempting to simply define monitoring requirements as “capture everything”. However, in practice the key is to determine what scenarios necessitate action and ensure monitoring instrumentation provides the insights needed to trigger the corresponding workflows.

Examples of monitoring KPIs might include:

• Application-level performance relative to SLAs

• Database query performance

• Database resource capacity

• Network connection performance

• Storage IO performance

• Session events

• Authentication failure events

While this list is by no means exhaustive, it illustrates the basic concept that there are a variety of performance indicators, including those that might be indicative of database security intrusions.


Metrics and stats

Once the KPIs to monitor have been established, the next step is to map these to underlying metrics as well as define relevant statistics on those metrics. The point of the latter is to define whether it's important to monitor for outliers, average values, percentile statistics, etc. Table 2 illustrates a few examples drawing on the KPIs from earlier. As highlighted, some statistics relate to threshold values while others revolve around time-series behavior, which can be relevant for performance and anomaly detection purposes.

| KPI | Metric | Statistic |
| Application performance | Application-specific metric X | 99.5th percentile value of X < SLA |
| Network connection performance | Throughput | percent_of_capacity(t) |
| Database query performance | Query latency / throughput (reads and writes) | average_value(t) |
| Storage IO performance | Throughput | percent_of_capacity(t) |
| Database resource capacity | Resource utilizations | average_utilizations(t) |
| Session events | # of active sessions | active_sessions(t) |
| Authentication failure events | # of failures | number_per_hour < threshold |


Monitoring tools

Implementing metric collection and statistical calculations typically requires incorporating monitoring tools that extract and aggregate information from across the system stack. Specifically pertaining to the database server, there are a variety of tools that support drivers to pull relevant data from PostgreSQL, including Nagios, Datadog, and pgDash. Cloud environments often integrate advanced monitoring and logging capabilities via native services such as AWS CloudWatch or Symphony's Monitoring Service (SMS). Services such as these significantly reduce the complexity of data collection and instrumentation for monitoring while relieving the administrator of managing the storage of the corresponding data.
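As a minimal sketch of what such instrumentation collects, the example below polls two of the metrics from Table 2 directly from PostgreSQL's statistics views; connection details are placeholders, and a real deployment would forward the values to one of the tools above rather than print them.

```python
# Minimal sketch: poll a couple of the metrics from Table 2 from PostgreSQL's
# statistics views. Connection details are placeholders.
import psycopg2

with psycopg2.connect("dbname=appdb user=monitor") as conn:
    with conn.cursor() as cur:
        # Session events: number of currently active sessions
        cur.execute("SELECT count(*) FROM pg_stat_activity WHERE state = 'active'")
        active_sessions = cur.fetchone()[0]

        # Database throughput / resource indicators for the current database
        cur.execute(
            "SELECT datname, numbackends, xact_commit, blks_read, blks_hit "
            "FROM pg_stat_database WHERE datname = current_database()"
        )
        db_stats = cur.fetchone()

print("active sessions:", active_sessions)
print("database stats:", db_stats)
```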

Passive monitoring versus active notifications

The final step of implementing an effective monitoring and logging strategy is to determine how the collected data and analytics correlate to actions by the IT operations team. There are two approaches that can be used jointly at the discretion of the administrator:

1. Proactive workflows: Rely upon continuous, passive human monitoring via dashboards

2. Reactive workflows: Rely upon automated notification events which are received by human operation teams

Both of these can be built upon the monitoring tools mentioned earlier. Here, yet again, cloud solutions incorporate native features that significantly simplify implementation of monitoring and alerting strategies. For example, the Symphony Monitoring Service (SMS) allows private cloud deployments of PostgreSQL to easily incorporate outbound monitoring alerts that target specific teams and / or individuals. Moreover, Stratoscale SMD natively offers a single pane-of-glass dashboard which administrators can use to track various statistics related to database status, health, and usage.


ACCELERATE: DAY-TWO AND BEYOND

After a PostgreSQL deployment is live, there are a variety of day-two scenarios that an IT team needs to plan for. In this final section, we'll highlight a couple of corresponding areas including:

• Backups and availability

• Upgrades and DevOps

Backups and availability

While security revolves around operational risk scenarios one hopes to prevent, failure recovery is all about mitigating risks from events you know will eventually happen. Specifically, hardware, network, and application failures are a matter of when, not if, and enterprise IT teams must be prepared to handle these scenarios without loss of critical business data and, ideally, with minimal downtime.

Approaches towards achieving these goals in practice must address:

• What happens in the case of a catastrophic failure?

• What happens if there are data integrity issues (perhaps due to application failure or bugs)?

• How can availability issues related to failures or increased load be handled?


Backup and disaster recovery strategies

Backups lie at the core of any strategy to address potential catastrophic database failures. How backups are realized in practice can vary based upon the selected deployment approach. For example, with a bare metal or virtual machine installation where there is no managed service layer, the built-in pg_dump utility can be leveraged to create periodic backup files. It is important that this utility is invoked with superuser credentials; otherwise the backup will be scoped to the limited permissions of a less privileged user. The output of pg_dump is a text file that can be stored in a secure environment isolated from the database and later retrieved as needed for recovery. The corresponding disaster recovery operation consists of simply providing the file as input to the psql program via the command line.
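A minimal sketch of this pg_dump / psql cycle, driven from a script for illustration, is shown below; host names, paths, and the restore target database are placeholders.

```python
# Minimal sketch: logical backup with pg_dump and the matching restore with psql.
# Paths, host names, and credentials are placeholders; in production the dump should
# land on storage isolated from the database server.
import subprocess
from datetime import datetime

DB = {"host": "db.example.internal", "user": "postgres", "dbname": "appdb"}
backup_file = f"/backups/appdb-{datetime.utcnow():%Y%m%d%H%M}.sql"

# Create a plain-text dump (pg_dump reads the password from ~/.pgpass or PGPASSWORD).
subprocess.run(
    ["pg_dump", "-h", DB["host"], "-U", DB["user"], "-d", DB["dbname"],
     "--format=plain", "-f", backup_file],
    check=True,
)

# Disaster recovery: replay the dump into an empty database with psql.
subprocess.run(
    ["psql", "-h", DB["host"], "-U", DB["user"], "-d", "appdb_restore",
     "-f", backup_file],
    check=True,
)
```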

While pg_dump can be adopted for user-implemented backup strategies, cloud environments typically offer backup and recovery as features within their managed database services. Taking the AWS RDS and Stratoscale SMD services as examples, both give administrators the ability to define backup schedules as well as features to instantiate databases from any historical snapshot image.

Point-in-time recovery

Application failures or bugs can sometimes result in database corruption or integrity issues. These artifacts arise independently of the schema and are application specific, but when encountered they often necessitate rolling back the database to a previously known consistent state. The backup strategy highlighted earlier is, in most cases, not granular enough for these types of situations.

Luckily, PostgreSQL supports continuous archiving of transaction logs to allow for point-in-time recovery operations. Similar to backup operations, these archives can be created via command line utilities and custom scripts.

Again, these advanced operations are vastly simplified by managed cloud environments while maintaining temporal granularity. For example, with the AWS RDS service, database instances can be rolled back to the nearest five minute increment for a point-in-time recovery operation.

By combining application logs with these types of features, even advanced recovery scenarios such as those caused by application bugs or user operational errors can be resolved with relative ease.



Read replicas for availability and scalability

One final scenario to consider is the ability to address application-level impacts from failure scenarios as well as transient performance issues that can arise from load demands. For example, in the backup and recovery strategies reviewed earlier, any applications that require access to the underlying database would incur downtime between the moment a failure is detected and the moment a human intervenes to resolve the issue. Particularly for user-facing applications, this can result in significant negative business impact and must be avoided whenever possible. IT administrators often adopt high-availability (HA) configurations to address this requirement.

PostgreSQL can be deployed with a read replica to create HA configurations. This functionality is enabled by the same built-in transaction logging feature used for point-in-time recovery, except here logs are shipped to one or more standby servers. These standby servers can then provide two benefits: 1) In the case of high read query loads, requests can be distributed amongst replicas to avoid overloading the master; 2) On a master failure event, a standby replica can become the new primary, thereby minimizing any application downtime.
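A minimal sketch of seeding a standby with the built-in pg_basebackup utility is shown below; it assumes a replication-capable role already exists on the primary, and the host names and data directory paths are placeholders (exact recovery configuration details also vary by PostgreSQL version).

```python
# Minimal sketch: seed a streaming-replication standby from the primary using
# pg_basebackup. Assumes a replication-capable role ("replicator") already exists
# on the primary; host names and paths are placeholders.
import subprocess

subprocess.run(
    [
        "pg_basebackup",
        "-h", "primary.example.internal",    # primary server
        "-U", "replicator",                  # role with REPLICATION privilege
        "-D", "/var/lib/postgresql/standby", # empty data directory for the standby
        "-X", "stream",                      # stream WAL while the base backup runs
        "-R",                                # write recovery settings for the standby
        "--progress",
    ],
    check=True,
)
# Starting PostgreSQL against this data directory brings the standby online; read
# queries can then be routed to it, and it can be promoted if the primary fails.
```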

The orchestration of failure detection and failover is outside of the scope of the core PostgreSQL software, but there are various approaches such as stolon and Patroni that IT administrators can build upon to instantiate HA deployments. Alternatively, for cloud scenarios both AWS RDS and Stratoscale SMD services simplify clustering configurations by integrating the requisite orchestration functionality as part of their respective platforms.

Looking to deploy PostgreSQL on-prem? Stratoscale brings the power of the public cloud to any enterprise environment and enables rapid deployment of PostgreSQL clusters on-prem.


Upgrades and DevOps

Another day-two dynamic that IT administrators need to prepare for revolves around software updates. When any component of a software system incurs an inevitable version change, it's critical that there is a defined process to ensure: 1) the functionality and reliability of the overall application is not detrimentally impacted (e.g. there are no regressions); and 2) the availability of the application's components is minimally impacted during the live upgrade (e.g. no significant downtime).

PostgreSQL upgrade process

PostgreSQL provides multiple approaches to conduct a database software upgrade. The first consists of creating a new, separate instance side-by-side with the older version, migrating data as reviewed earlier, and then conducting a switch-over once the updated version is confirmed to be working as desired. While these steps may seem straightforward, in practice complications can arise during migration as well as due to the need to manage any corresponding cluster replicas (which would need to be updated as well) if deployed in such a configuration.

The second approach entails an in-place upgrade. The precise steps vary based upon whether PostgreSQL is deployed using packages or compiled binaries, but in both cases, once binaries for the new version have been installed on the host (bare metal or virtual machine), administrators can use the pg_upgrade tool to complete the upgrade operation. Since the database must be shut down during this operation, some downtime is incurred.
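A minimal sketch of invoking pg_upgrade is shown below; the binary and data directory paths are placeholders, and both the old and new clusters must be stopped before running it.

```python
# Minimal sketch: in-place major-version upgrade with pg_upgrade. Binary and data
# directory paths are placeholders, and both clusters must be stopped first.
import subprocess

subprocess.run(
    [
        "pg_upgrade",
        "--old-bindir", "/usr/lib/postgresql/11/bin",
        "--new-bindir", "/usr/lib/postgresql/14/bin",
        "--old-datadir", "/var/lib/postgresql/11/main",
        "--new-datadir", "/var/lib/postgresql/14/main",
        "--check",   # dry-run compatibility check; drop this flag to run the upgrade
    ],
    check=True,
)
```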

It goes without saying that regardless of the approach taken (side-by-side or in-place), a backup should be created prior to attempting an upgrade to ensure that there is no accidental data loss.

Given the high potential for things to go wrong during an upgrade, cloud DBaaS services integrate database upgrades and patching as part of their offerings. Stratoscale SMD, for example, allows administrators to schedule these activities, including conducting the corresponding backups. While there's no such thing as a risk-free database upgrade, managed database services allow administrators to significantly reduce the potential for unexpected issues.

Application upgrades and CI / CD

Applications commonly require updates either independently of, or in some cases due to, database software upgrades. In fact, it's common for internal application upgrades to occur at a relatively high velocity, and as a best practice IT and application development teams work together to adopt a continuous integration and continuous delivery (CI / CD) process. Commonly cited in the context of DevOps, a successful CI / CD pipeline builds upon many of the mechanisms we've already reviewed including:

• Deployment automation: Systems such as Jenkins or GoCD rely upon the ability to automate architectural component deployments for QA stages, including database and application code. These systems can leverage automation tools targeting hosts (e.g. Ansible playbooks or Chef cookbooks) or APIs exposed by cloud managed services.

• Migration and backup recovery tools: Automated functional QA tests rely upon an ability to test against deterministic datasets. These can be instantiated as part of CI / CD automation through the data migration tools or backup recovery operations reviewed earlier.


COMPARING OPERATING ENVIRONMENTS


Table 3 summarizes some of the points made in this paper by providing an overview of the planning, day-one, and day-two challenges discussed in the context of potential operating environments:

• Public cloud with DBaaS (e.g. AWS RDS or similar platform)

• Private cloud with DBaaS (e.g. Stratoscale SMD or similar platform)

• Private DC with a bare metal or VM PostgreSQL host

While an IT administrator is unlikely to select an environment based solely upon these tradeoffs, it is useful to review them when multiple candidates are under consideration. Several observations emerge from the resulting summary:

• Private datacenter (DC) installations via bare metal or VM hosts offer the greatest flexibility, albeit with the most customization / work required

• Managed database services, which are widely adopted in the context of public cloud environments, provide many advantages in terms of reducing operational complexity and overhead for an IT team

• Private cloud environments implemented with solutions such as Stratoscale Symphony and its managed database capabilities compare extremely well with public cloud alternatives, thereby providing significant flexibility to an IT team seeking to maintain on-premise applications

Looking to scale PostgreSQL on-prem? Stratoscale brings the power of the public cloud to any enterprise environment and enables rapid scaling of PostgreSQL clusters on-prem.


Public cloud (DBaaS such as AWS RDS)

Automated

Built-in platform support for automated backups and recovery

Cloud specific tools (example: AWS SCT / DMS)

Built-in platform support for replication and failover

Managed SSL.Managed encryption at-rest

Managed by platform

Easily integrate APIs with CI / CD tools

Built-in platform services for collection, analytics, and notification

Automated

Built-in platform support for replication and failover

Choice of open or proprietary tools

Choice of open or proprietary tools

Managed SSL.Managed encryption at-rest

Managed by platform

Easily integrate APIs with CI / CD tools

Built-in platform services for collection, analytics, and notification

Manual or automated through open source tools

PostgreSQL supported tools for backup and recovery steps

Choice of open or proprietary tools

Ecosystem of tools and proprietary products for HA configurations and replication

Self-managed SSL Encryption via pgcrypto or file system

Managed by platform

Can integrate CI / CD tools with selected deployment tools / processes (time-consuming, requires special skill-set)

Third party tools that support PostgreSQL

Deployment

Backups

Migration

Availability

Security

Upgrades

Devops

Monitoring

Private cloud (DBaaS such as Stratoscale SMD)

Private DC (bare metal / VM)

Table 3: Comparison of potential operating environments.


CONCLUSION

While the business case for adopting open source seems straightforward, extracting the value in practice can be challenging for even the most experienced IT leaders. The key to success lies in effective preparation and planning. Accordingly, this paper offered foundational information and guidance to help identify the important decisions and questions for an initial launch as well as for addressing day-one and day-two issues.

By incorporating this information into a staged rollout strategy, an IT team can proceed confidently when executing a PostgreSQL implementation across a variety of operating environments including private data centers, private clouds, and public cloud infrastructures.


ABOUT STRATOSCALE

Stratoscale delivers a software-defined data center platform that enables true IaaS, PaaS and CaaS in data centers and edge locations. Symphony runs on any hardware and is combined with cloud management features such as centralized user access management, self-service portals, integrated metering for showback / chargeback, and more.

In addition, Stratoscale delivers a suite of managed open source platforms for developers to accelerate application development and delivery, including Kubernetes, a wide array of databases (MySQL, PostgreSQL, MongoDB, Redis, Cassandra, MariaDB, etc.), MapReduce (Hadoop, Spark, HBase), load balancers, object storage, file systems, and many more. By offering AWS compatible APIs, Stratoscale enables multi-cloud and hybrid applications and supports advanced DevOps and Infrastructure-as-Code in enterprise environments.

For more information please visit us:

www.stratoscale.com US Phone: +1 877 420-3244 | Email: [email protected]
