High Availability DevOps HA Features for Docker Swarm and GitLab

High Availability DevOps - NERCOMP...• 5-10 minutes to deploy a Linux template after adding it to authentication domain and defining some metadata. • Build and deploy a Docker



High Availability DevOps

HA Features for Docker Swarm and GitLab


High-Availability DevOps

Deploying and managing a DevOps environment requires attention to the elimination of single points of failure. Using open source High-Availability and Desired State Configuration tools, we address the availability and maintainability of our overall DevOps environment and the resources and services that it requires.


Topics to be Covered

• DevOps single points of failure

• Tools and methods to ameliorate risk

• Infrastructure as Code

• SRE Error Budgeting


Environment Overview

• Test and Prod Swarms, each 5 nodes

  – Docker CE

  – Ubuntu 18.04

• GitLab CE for CI/CD and Docker Registry

• Apache 2 Load Balancers for Apps

• SaltStack codebase defines infrastructure


Failure Modes


Example DevOps Infrastructure

[Diagram labels: Orchestration (TEST and PROD swarms, nodes 1 to 5 each); Codebase, Integration & Deployment; Infrastructure & Services; GitLab; Databases; Services; Applications; Infrastructure as Code (SaltStack).]


Application Dependencies

[Diagram: a network load balancer (LB) in front of Swarm nodes 1 to 5, which depend on infrastructure services (LDAP, SMTP, SQL).]

• RISK: 1 load balancer for ingress
• RISK: node availability, ingress availability


Deployment Dependencies

• Repository
  – RISK: single VM
  – REMEDIATION: HA Deployment or Cloud
• CI/CD Script
  – RISK: TEST==PROD?
  – REMEDIATION: Same CI Code with Interpolation
• Deploy Container (Runner)
  – RISK: Runner Available
  – REMEDIATION: Pacemaker Bundle
• Validation (audit + health + monitor)
  – RISK: Did it deploy and stay deployed?
  – REMEDIATION: Auditing, Healthcheck, Monitoring
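
The "Same CI Code with Interpolation" remediation can be pictured with a small sketch: one deploy command template is rendered for either environment, so TEST and PROD differ only in the values substituted. The environment table, host names, compose file name, and stack name below are hypothetical placeholders, not the presenters' actual CI variables.

```python
from string import Template

# Hypothetical per-environment values; a real pipeline would take these
# from CI variables rather than a hard-coded table.
ENVIRONMENTS = {
    "test": {"swarm_host": "swarm-test.example.edu", "replicas": "1"},
    "prod": {"swarm_host": "swarm-prod.example.edu", "replicas": "3"},
}

# One template shared by both environments.
DEPLOY_TEMPLATE = Template(
    "DOCKER_HOST=tcp://$swarm_host:2376 REPLICAS=$replicas "
    "docker stack deploy --with-registry-auth -c docker-compose.yml $stack"
)


def render_deploy_command(environment: str, stack: str) -> str:
    """Render the shared deploy command for one environment."""
    values = dict(ENVIRONMENTS[environment], stack=stack)
    return DEPLOY_TEMPLATE.substitute(values)


if __name__ == "__main__":
    for env in ("test", "prod"):
        print(render_deploy_command(env, "demo"))
```

Because both environments go through the same template, a change tested in TEST is the same code that later runs in PROD.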


Swarm Topology

[Diagram: five Swarm nodes (node1 to node5), with labels marking the Manager, Leader, Runner, and Ingress roles.]


Swarm Topology Failure Response

● A network partition might lead to a leader election.
● The mesh network means any node can provide an ingress to a stack’s service.
● Swarm will try to maintain the replica requirement.
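
Swarm managers keep cluster state with Raft consensus, so whether a partition leaves a working leader depends on which side still holds a majority of the managers. A minimal sketch of that arithmetic, assuming every node in the swarm is a manager (the function names are illustrative only):

```python
def raft_quorum(managers: int) -> int:
    """Smallest number of managers that forms a majority."""
    if managers < 1:
        raise ValueError("a swarm needs at least one manager")
    return managers // 2 + 1


def tolerated_failures(managers: int) -> int:
    """Managers that can be lost while a majority remains."""
    return managers - raft_quorum(managers)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, raft_quorum(n), tolerated_failures(n))
```

With five managers the majority is three, so the swarm keeps a leader after losing up to two managers; an even manager count adds no extra tolerance over the odd count below it.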


(our) Swarm Integration 1

● To run docker stack deploy, a GitLab runner (a container) must be on a manager node. We are making all peer nodes managers and using a Pacemaker bundle to ensure the container starts.
● Having a DNS VIP ingress requires network and Docker reconfiguration and restart. (We have a script in salt-call and call that from a Pacemaker alert monitor.)


(our) Swarm Integration 2

● Although Docker Swarm is supposed to ensure that the requested number of replicas is started, in practice there is occasionally a deficit, especially after an event.
● After a cluster event, another salt-call script is run that simply updates any service not running enough replicas.
● Automated deployment and service updates require valid registry authorization. (We use CI_TOKEN in deployment with a credential helper.)
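
The replica check described above could look roughly like the following. This is a sketch, not the presenters' salt-call script: it assumes the input is the text printed by docker service ls --format '{{.Name}} {{.Replicas}}', and it assumes that "updating" a service means a forced docker service update. The function names are illustrative.

```python
import re

# Matches the leading "running/desired" pair in the REPLICAS column,
# e.g. "2/3" or "2/3 (max 1 per node)".
_REPLICAS = re.compile(r"^(\d+)/(\d+)")


def under_replicated(service_ls_output: str) -> list[str]:
    """Return names of services running fewer replicas than desired.

    Expects one "NAME REPLICAS" pair per line.
    """
    deficient = []
    for line in service_ls_output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, replicas = parts
        match = _REPLICAS.match(replicas)
        if match and int(match.group(1)) < int(match.group(2)):
            deficient.append(name)
    return deficient


def update_commands(names: list[str]) -> list[list[str]]:
    """Build one forced service update command per deficient service."""
    return [["docker", "service", "update", "--force", name] for name in names]
```

The command lists can be handed to subprocess.run on a manager node; keeping the parsing separate from execution makes the check easy to test without a running swarm.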


(our) Load Balancer

• Apache2 with mod_proxy

• Location directive to map URI to a service

• One load balancer: unscheduled maintenance impossible

• One proxy entry: single point of ingress


Application Environment

● Applications behind the LB could be in container environments, on VMs, or in the cloud.
● The container environment is Docker Swarm.
● Services are generally provisioned by VMs.


Load Balancer Topologies

[Diagram: one load balancer (LB) proxying to Swarm nodes 1 to 5.]

    <Location /app1>
      RedirectMatch "(.*)/app1$" \
        "https://appsdemo.holycross.edu/apps1/$1"
      require all granted
      ProxyPass https://swarmdemo1.holycross.edu:6549 retry=5 \
        acquire=3000 timeout=600 Keepalive=On
      ...
      ProxyPass https://swarmdemo5.holycross.edu:6549 retry=5 \
        acquire=3000 timeout=600 Keepalive=On
      ProxyPassReverse https://swarmdemo1.holycross.edu:6549
      ...
      ProxyPassReverse https://swarmdemo5.holycross.edu:6549
      SetEnv proxy-sendchunked 1
    </Location>


Load Balancer Clustered Ingress

[Diagram: one load balancer (LB) in front of Swarm nodes 1 to 5; the Swarm side is labeled Pacemaker.]

    <Location /app1>
      RedirectMatch "(.*)/app1$" \
        "https://appsdemo.holycross.edu/apps1/$1"
      require all granted
      ProxyPass https://swarmdemo.holycross.edu:6549 retry=5 \
        acquire=3000 timeout=600 Keepalive=On
      ProxyPassReverse https://swarmdemo.holycross.edu:6549
      SetEnv proxy-sendchunked 1
    </Location>


Clustered Load Balancer

[Diagram: two load balancers (LB) in front of Swarm nodes 1 to 5; both the load balancer side and the Swarm side are labeled Pacemaker.]

    <Location /app1>
      RedirectMatch "(.*)/app1$" \
        "https://appsdemo.holycross.edu/apps1/$1"
      require all granted
      ProxyPass https://swarmdemo.holycross.edu:6549 retry=5 \
        acquire=3000 timeout=600 Keepalive=On
      ProxyPassReverse https://swarmdemo.holycross.edu:6549
      SetEnv proxy-sendchunked 1
    </Location>


Dual Ingress

[Diagram: two load balancers (LB) in front of Swarm nodes 1 to 5, with two ingress addresses labeled A and B; both the load balancer side and the Swarm side are labeled Pacemaker.]

    <Location /app1>
      RedirectMatch "(.*)/app1$" \
        "https://appsdemo.holycross.edu/apps1/$1"
      require all granted
      ProxyPass https://swarmdemoA.holycross.edu:6549 retry=5 \
        acquire=3000 timeout=600 Keepalive=On
      ProxyPass https://swarmdemoB.holycross.edu:6549 retry=5 \
        acquire=3000 timeout=600 Keepalive=On
      ProxyPassReverse https://swarmdemoA.holycross.edu:6549
      ProxyPassReverse https://swarmdemoB.holycross.edu:6549
      SetEnv proxy-sendchunked 1
    </Location>


Reducing Risk

                            1 Ingress,     HA Ingress,    HA Ingress,    HA Ingress (2)
                            1 Balancer     1 Balancer     HA Balancer
    Single Point Failure?   YES            YES            NO             NO
    Transition Ingress (s)  Intervention   13 sec.        13 sec.        < 13 sec.
    Transition Balancer (s) Intervention   Intervention   1 sec.         < 1 sec.


HA Load Balancer

● Configure 2 (or more) Apache servers with the proxy configuration as a Pacemaker cluster with a VIP.
● If a load balancer crashes or needs maintenance, Pacemaker can move the load balancer service to an alternate node, manually or automatically.


DevOps Storage Models

• Storage reliability and manageability are already fairly high because of clustering and LVM.
• Many storage requirements can be managed using databases, repositories, or tagged storage.


Storage Failure Modes

• One way to manage larger storage usage by a service is to map it to a Docker volume through a share/mount.
• This presents an availability issue for the sharing node, whether from node failure or a maintenance window.


Tools & Methods


Tools & Methods Overview

• HA Cluster
  – Pacemaker/Corosync
• Desired State Configuration
  – SaltStack
• Highly Available Storage
  – DRBD, S2D


High-Availability Clustering

• IPaddr2 resource: a virtual IP resource that is auto-managed by the cluster.
• alerts: event handlers that run on nodes before or after a cluster event, used to update configuration.
• Docker bundle: ensures that GitLab runner containers are on each node.


Desired State Configuration

• Configuration for Docker, the cluster, alerts and the ingress VIP is stored in a YAML pillar database.
• (push) salt state.apply to build Docker nodes, configure alerts, VIP, etc.
• (pull) salt-call state.apply to update the running configuration of a node’s daemon.json.


Redundant Swarm Ingress

pillar YAML configuration for Virtual IP:

    swarmtest_vip_cib:
      resource:
        swarmtest_vip:
          resource_type: "ocf:heartbeat:IPaddr2"
          resource_options:
            - 'ip=192.168.1.120'
            - 'cidr_netmask=32'
            - 'iflabel=IP_VIRTUAL'
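
To show what this pillar data corresponds to on a cluster node, the sketch below turns the same structure into a pcs resource create command line. It is an illustration only: how the presenters' Salt states consume the pillar is not shown in the slides, and the helper name pcs_create_commands is hypothetical.

```python
# Pillar data from the slide, expressed as a Python dict.
PILLAR = {
    "swarmtest_vip_cib": {
        "resource": {
            "swarmtest_vip": {
                "resource_type": "ocf:heartbeat:IPaddr2",
                "resource_options": [
                    "ip=192.168.1.120",
                    "cidr_netmask=32",
                    "iflabel=IP_VIRTUAL",
                ],
            }
        }
    }
}


def pcs_create_commands(cib: dict) -> list[list[str]]:
    """Build one `pcs resource create` argument list per resource."""
    commands = []
    for name, spec in cib.get("resource", {}).items():
        commands.append(
            ["pcs", "resource", "create", name, spec["resource_type"]]
            + list(spec.get("resource_options", []))
        )
    return commands


if __name__ == "__main__":
    for command in pcs_create_commands(PILLAR["swarmtest_vip_cib"]):
        print(" ".join(command))
```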


Docker Node Self-Configuration

● At initial node build, or on an event, SaltStack reads the configuration in serialized (JSON) form from a Salt ‘pillar’ data set.
● The Salt ‘pillar’ is also dynamically configured with current network configuration, independently of the logical configuration of the Swarm.
● Changes to the /etc/docker/daemon.json file will trigger a restart of Docker (i.e., with updated network addresses).


Daemon_JSON Salt pillar fragment

    Daemon_JSON:
      {{grains.get('docker-swarm-name','')}}:
        hosts:
          - "fd://"
    {% for interface,addresses in grains.get('ip4_interfaces',{}).items() %}
    {% if interface is not match('docker*') %}
    {% for ip in addresses %}
          - "tcp://{{ip}}:2376"
    {# addresses #}
    {% endfor %}
    {% endif %}
    {% endfor %}
        storage-driver: overlay2
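
For readers less familiar with Jinja, the loop in this fragment can be restated in plain Python. This is a sketch of the same idea, not Salt code: the startswith("docker") test is one reading of the template's match('docker*') condition, and the interface names and addresses in the usage note are made up.

```python
import json


def docker_hosts(ip4_interfaces: dict[str, list[str]], port: int = 2376) -> list[str]:
    """List daemon listen addresses, skipping Docker's own interfaces."""
    hosts = ["fd://"]
    for interface, addresses in ip4_interfaces.items():
        if interface.startswith("docker"):
            continue  # e.g. docker0, docker_gwbridge
        for ip in addresses:
            hosts.append(f"tcp://{ip}:{port}")
    return hosts


def daemon_json(ip4_interfaces: dict[str, list[str]]) -> str:
    """Serialize the daemon settings the pillar fragment describes."""
    settings = {
        "hosts": docker_hosts(ip4_interfaces),
        "storage-driver": "overlay2",
    }
    return json.dumps(settings, indent=2)
```

Calling daemon_json({"ens160": ["192.168.1.11"], "docker0": ["172.17.0.1"]}) yields a hosts list with fd:// and the ens160 address only. Like the template, the sketch does not filter the loopback interface.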


Docker.daemon state fragment 1

    Daemon_Running:
      service.running:
        - name: docker
        - enable: True
        - restart: True
        - watch:
          - file: /etc/docker/daemon.json


Docker.daemon state fragment 2

    Daemon_JSON_{{pillar_name}}:
      file.serialize:
        - name: /etc/docker/daemon.json
        - dataset_pillar: "{{pillar_path}}"
        - formatter: json
        - merge_if_exists: True
        - show_changes: True
        - user: root
        - group: root
        - mode: 644


Redundant Services

● Load balancers (Apache, NGINX)
● SMTP gateway (Postfix, sendmail)


Redundant Filesystems and Shares

● Vendor solutions
  ○ EMC Isilon (CIFS+NFS)
  ○ NetApp (CIFS+NFS)
  ○ Pure Storage (CIFS+NFS)
● Microsoft
  ○ Azure Stack HCI (CIFS/ReFS)
● Open source
  ○ DRBD (NFS)
  ○ Ceph (CIFS, NFS, S3)


Docker Volumes and HA

● Most of our containerized applications either use a database directory, or manage data on a Docker volume through a repository.
● We have a few static websites where we need HA disks, which we map onto Docker volumes via network filesystems.


Mapping NFS to Docker

● Allow Docker Swarm to manage the NFS or CIFS mount in the compose file.
● The HA disk server keeps the mount available during unexpected or scheduled downtime.


Compose NFS Mount Definition

    volumes:
      - type: volume
        source: web-cgibinintranet
        target: $HC_WEB_CGIBININTRANET_MOUNTPOINT
        volume:
          nocopy: true


Compose NFS Volume Definition

    volumes:
      web-cgibinintranet:
        driver_opts:
          type: "nfs"
          o: "nfsvers=4,addr=sanfs1.holycross.edu,ro"
          device: ":/sanfs/cgibinintranet"


Compose CIFS Mount Definition

    volumes:
      - type: volume
        source: web-cifs
        target: $HC_ALT_LEGACY_MOUNTPOINT
        volume:
          nocopy: true


Compose CIFS Volume Definition

    volumes:
      web-cifs:
        driver_opts:
          type: "cifs"
          o: "username=${USER},password=${PASS},domain=${DOMAIN},iocharset=utf8,uid=${UID},gid=${GID}"
          device: "//${SMB_SERVER}/web/legacy"


Infrastructure as Code


Motivations and Benefits

• Apply DevOps to system administration

– Repository, pipelines, issues, documentation

• Push configuration to build standardized templates, validate, deploy and audit.

• Pull configuration for events, triggers, self-configuration.

• Extend and reuse code across platforms.


Build and Deployment

• We apply about 333 formulas on a typical Linux deploy to ensure desired configuration.
• 5-10 minutes to deploy a Linux template after adding it to authentication domain and defining some metadata.
• Build and deploy a Docker node in about 20-30 minutes using a base template deploy.
• We also build complex clusters and application services on top of nodes built this way.


Validation and Audit

• We validate and apply over 200 CIS rules on a base Linux deployment, and additional CIS rules for Docker, MySQL, Postgres, Apache, as well as internally developed best practices.
• Standardized tags on best practice rules can be parsed into JSON to generate compliance reports and documentation.


Self-Configuration

• Interactively fix a configuration knowing the change is already documented in code.
• An event or trigger can reconfigure the running system according to current state rather than the state at build time.


SRE Error Budgeting


Infeasible 100%

• As much as we’d like to have 100% uptime, we cannot possibly guarantee that, and all of our infrastructure needs occasional maintenance.
• We perform scheduled maintenance, but it is difficult to schedule, and disruptive. DevOps, clustering and virtualization have generally increased our ability to safely perform unscheduled maintenance.


Current Monitoring

• Our current monitoring is cloud based, and simply measures service availability.
• We need richer indicators with measurable objectives that lead to defined responses.


Service Level Indicators: SLI

• SLI - service level indicator.
  – A good SLI should be at least a scalar, e.g., instead of measuring ‘uptime’, we could measure ‘errors per interval.’
  – Try to standardize common SLIs for reuse.
  – Naturally, some SLIs will be specific to the service.
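
As one illustration of an ‘errors per interval’ indicator, the sketch below counts error events in fixed-width time buckets. The input format (a list of error timestamps in seconds) and the function name are assumptions made for the example; the slides do not describe how such an SLI is collected.

```python
from collections import Counter


def errors_per_interval(error_timestamps: list[float], interval_seconds: float) -> dict[int, int]:
    """Count errors in fixed-width time buckets.

    Keys are bucket indexes (timestamp // interval); values are error counts.
    Buckets with no errors are absent from the result.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    buckets = Counter(int(ts // interval_seconds) for ts in error_timestamps)
    return dict(sorted(buckets.items()))
```

Unlike a single up/down flag, a per-interval count is a scalar that can be compared against a numeric objective and reused across services.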


Service Level Objectives: SLO

• Set internal objectives which will be used to manage change.
• Typically, once you set the SLO, say, “99.5% of transactions will have a latency of less than 500 ms”, you define your error budget as 1 - n, where n is the SLO target. In that case the budget is 0.5%: if more than 0.5% of transactions take longer than 500 ms, you have exceeded your error budget. When we exceed our error budget, we change our focus from new features to stability.
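
The error budget arithmetic can be sketched in a few lines. The sketch reads the example SLO as a limit on the share of transactions slower than 500 ms, which is an interpretation rather than something the slides spell out; the function names and sample data are illustrative.

```python
def error_budget(slo_target: float) -> float:
    """Fraction of events allowed to miss the objective (1 - n)."""
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in (0, 1]")
    return 1.0 - slo_target


def budget_exceeded(latencies_ms: list[float], threshold_ms: float, slo_target: float) -> bool:
    """True when too many transactions were slower than the threshold."""
    if not latencies_ms:
        return False
    slow = sum(1 for latency in latencies_ms if latency >= threshold_ms)
    return slow / len(latencies_ms) > error_budget(slo_target)
```

With a 99.5% target, 6 slow transactions out of 1000 (0.6%) exceed the budget while 4 out of 1000 (0.4%) do not.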


Service Level Agreement: SLA

• The SLA will be the agreement you have with the customer, and it will generally be a looser objective than the SLO.
• As with the SLO, the SLA will need to have consequences for exceeding the error budget; in the case of an internal customer, perhaps a review.


Tool Versions

• ClusterLabs pacemaker 1.1.18

• Red Hat corosync 2.4.3

• Docker CE 19.03.4

• Ubuntu Server 18.04

• Windows Server 2016 Datacenter 1607

• VirtualBox (for demos) 6.0.14


Some Unreviewed Tools

• Load balancing with IPVS/VRRP
  – Keepalived
  – NGINX
  – Traefik
• Storage Alternatives
  – S2D Azure Stack HCI
  – CEPH