PostgreSQL High-Availability and Geographic Locality Using Consul
Sean Chittenden
Engineering, HashiCorp
keybase.io/seanc
Quick Demo
[Diagram: Consul running in two datacenters, dc1 and dc2, with one PostgreSQL leader and two PostgreSQL followers]
CONSULHASHICORP
Consul solves four central challenges with SOA
- Service Discovery (HTTP + DNS)
- Host & Service Level Health Checks
- Key Value Store (HTTP API)
- Datacenter Aware
Consul Installation
Overview
1. Introduction to Consul
2. Review of Consul
a. Architecture
b. Agent Functionality
c. Agent Configuration
d. Features
3. Further Reading
Introduction
Consul powers runtime orchestration
1. Service discovery
2. Service registry
3. Key/value store
4. Health checks
Glossary
Agent - Long-running daemon on every member of the Consul
cluster. The agent is able to run in either client or server mode.
Client - Agent that forwards all RPCs to a server and
participates in the LAN gossip pool.
Server - Agent that maintains cluster state, responds to RPC
queries, exchanges WAN gossip with other datacenters, and
forwards queries to leaders of remote datacenters.
Consensus - Agreement among the servers on the elected leader and on
the ordering of transactions.
Gossip - Random node-to-node communication primarily over
UDP that provides membership, failure detection, and event
broadcast information to the cluster. Built on Serf. Consul has
both LAN and WAN Gossip.
Datacenter - Networking environment that is private, low latency,
and high bandwidth. A Consul cluster is run per datacenter, so it's
important to have low latency for the gossip protocol.
Consul vs. Other Software
- Opinionated framework for service discovery using DNS or HTTP
- Scalable gossip system that links server nodes and clients
- Distributed health checking with edge-triggered updates
- Globally aware with multi-datacenter support
- Operationally simple
- Incorporation into the HashiCorp ecosystem
Architecture
Single Datacenter
[Diagram: six CLIENT agents and three SERVER agents. Clients reach the servers over RPC, the servers replicate to one another, and every agent takes part in LAN gossip.]
Multi-Datacenter
[Diagram: two datacenters, each with three SERVER agents replicating to one another. The first also shows six CLIENT agents using RPC and LAN gossip. The servers of the two datacenters are linked by WAN gossip.]
Raft Introduction
~/src/raft/thesecretlivesofdata/raft
open index.html
~/src/raft/raftscope
open index.html
TCP and UDP Ports
- Client RPC (HTTP): TCP/8500
- DNS: TCP/8600, UDP/8600
- CLI RPC: TCP/8400
- Server RPC: TCP/8300
- LAN Gossip: TCP/8301, UDP/8301
- WAN Gossip: TCP/8302, UDP/8302
[Diagram: clients, consul1.dc1, consul2.dc1, and consulN.dc2 connected over the ports listed above]
Agent functionality (client or server)
- RPC, HTTP, DNS APIs
- Health Checks
- Event Execution
- Gossip Participation
- Membership
- Failure detection
Agent functionality (server)
- State replication
- Query Handling
- Leader election
- WAN Gossip
Failover via DNS
DNS Failover
Pro
• Works across L3 boundaries in LAN environments
• Works across L3 boundaries in WAN environments
• Small TTLs
• Workload Distribution
• Clients cache DNS data
• Not subject to spanning-tree
Con
• Requires TCP connections be reset on failover
• Clients can cache stale DNS data
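The caching trade-off listed above (clients cache DNS data, and that data can go stale) can be sketched with a small client-side cache. `TTLCache`, its injected `resolve` function, and the injected `clock` are hypothetical names for illustration only; none of them are part of Consul.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Hypothetical client-side DNS cache; not part of Consul."""

    def __init__(self, resolve: Callable[[str], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve  # function that performs the real lookup
        self._ttl = ttl          # seconds an answer may be reused
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def lookup(self, name: str) -> str:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            # Served from cache: stale if the service moved after caching.
            return cached[0]
        address = self._resolve(name)
        self._entries[name] = (address, now + self._ttl)
        return address
```

With a small TTL, the window in which a client can hold a stale address is bounded by that TTL, which is why the slide pairs small TTLs with DNS failover.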
Consul Installation
Consul Server 1/3
% cat config.json
{
  "acl_datacenter": "lab1",
  "acl_default_policy": "deny",
  "acl_master_token": "rootToken",
  "addresses": {
    "dns": "0.0.0.0",
    "http": "unix:///tmp/.consul.http.sock",
    "https": "0.0.0.0",
    "rpc": "unix:///tmp/.consul.rpc.sock"
  },
  "bootstrap_expect": 3,
  "datacenter": "lab1",
  "data_dir": "./svc/data",
  "disable_remote_exec": true,
Consul Server 2/3
  "dns_config": {
    "allow_stale": true,
    "max_stale": "10080m",
    "node_ttl": "60s",
    "service_ttl": {
      "*": "5s",
      "stable-service": "86400s"
    }
  },
  "encrypt": "[ random mime encoded data ]",
  "log_level": "debug",
  "ports": {
    "https": -1
  },
  "server": true,
  "unix_sockets": {
    "mode": "0700"
  }
}
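The `service_ttl` block in the configuration above gives one service its own TTL and lets everything else use the `"*"` wildcard. A minimal sketch of that lookup follows; `parse_seconds` and `service_ttl` are hypothetical helper names, and the duration parser handles only the `s`, `m`, and `h` suffixes seen here.

```python
from typing import Dict


def parse_seconds(value: str) -> int:
    """Parse a duration such as "5s" or "10080m" (s/m/h suffixes only)."""
    units = {"s": 1, "m": 60, "h": 3600}
    return int(value[:-1]) * units[value[-1]]


def service_ttl(service_ttls: Dict[str, str], service: str) -> int:
    """An exact service name wins; otherwise use the "*" entry, else 0."""
    if service in service_ttls:
        return parse_seconds(service_ttls[service])
    return parse_seconds(service_ttls.get("*", "0s"))


if __name__ == "__main__":
    ttls = {"*": "5s", "stable-service": "86400s"}
    print(service_ttl(ttls, "pg-db"), service_ttl(ttls, "stable-service"))
```

Under this reading, `pg-db` lookups are cached for 5 seconds while `stable-service` lookups are cached for a day.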
Consul Server 3/3
% cat svc/run
#!/bin/sh --
set -e
exec 2>&1
exec \
  /usr/bin/env -i \
  ./bin/consul agent \
    -config-file=./config.json \
    -config-dir=./conf.d/
% cat svc/log/run
#!/bin/sh --
set -e
exec 2>&1
exec chpst -u _log:_log svlogd ./main
Consul Cluster
% consul members
Node  Address              Status  Type    Build     Protocol  DC
vm1   172.16.139.140:8301  alive   server  0.7.0dev  2         lab1
% consul join 172.16.139.139 172.16.139.138
Successfully joined cluster by contacting 2 nodes.
% consul members
Node  Address              Status  Type    Build     Protocol  DC
vm1   172.16.139.140:8301  alive   server  0.7.0dev  2         lab1
vm2   172.16.139.138:8301  alive   server  0.7.0dev  2         lab1
vm3   172.16.139.139:8301  alive   server  0.7.0dev  2         lab1
% consul info
agent:
    check_monitors = 0
    check_ttls = 0
    checks = 0
    services = 1
build:
    prerelease = dev
    revision = 'fa26d5f
    version = 0.7.0
consul:
    bootstrap = false
    known_datacenters = 2
    leader = false
    leader_addr = 172.16.139.139:8300
    server = true
[snip]
% consul info
[snip]
raft:
    applied_index = 103339
    commit_index = 103339
    fsm_pending = 0
    last_contact = 82.95803ms
    last_log_index = 103339
    last_log_term = 50663
    last_snapshot_index = 98437
    last_snapshot_term = 2228
    num_peers = 2
    raft_peers = 172.16.139.139:8300,172.16.139.138:8300,172.16.139.140:8300
    state = Follower
    term = 50663
[snip]
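The `raft_peers` line above lists three servers. As a rough illustration of why three is a common choice, the sketch below computes the Raft quorum size and the number of server failures a cluster can absorb; `quorum` and `failure_tolerance` are hypothetical helper names.

```python
def quorum(servers: int) -> int:
    """Smallest majority of a Raft cluster with `servers` voting members."""
    return servers // 2 + 1


def failure_tolerance(servers: int) -> int:
    """How many servers can fail while a majority remains."""
    return servers - quorum(servers)


if __name__ == "__main__":
    # Peer list copied from the `consul info` output above.
    peers = "172.16.139.139:8300,172.16.139.138:8300,172.16.139.140:8300".split(",")
    print(len(peers), quorum(len(peers)), failure_tolerance(len(peers)))
```

For the three-server lab cluster this gives a quorum of two, so the cluster keeps a leader with one server down.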
dnsmasq Config
% cat /usr/local/etc/dnsmasq.conf
local-service
port=53
server=/consul/127.0.0.1#8600
rev-server=172.16.0.0/12,127.0.0.1#8600
server=208.67.222.222
server=208.67.220.220
cache-size=65536
% cat /etc/resolv.conf
search localdomain
nameserver 127.0.0.1
Service Discovery: HTTP + DNS
- Nodes, Services, Checks
- Simple registration (JSON)
- DNS Interface
- HTTP API
PostgreSQL Service
% hostname
pg002
% cat config.d/pg-db.json
{
  "service": {
    "name": "pg-db",
    "tags": ["follower"],
    "port": 5432,
    "checks": [{
      "id": "pg-alive",
      "notes": "Make sure connect and queries work",
      "script": "/usr/local/bin/check_postgresql",
      "interval": "10s"
    }]
  }
}
$ dig follower.pg-db.service.consul
; <<>> DiG 9.8.3-P1 <<>> follower.pg-db.service.consul
; (3 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 946
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;follower.pg-db.service.consul. IN A

;; ANSWER SECTION:
follower.pg-db.service.consul. 0 IN A 172.16.139.141
$ dig follower.pg-db.service.consul SRV
; <<>> DiG 9.8.3-P1 <<>> follower.pg-db.service.consul SRV
; (3 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 480
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;follower.pg-db.service.consul. IN SRV

;; ANSWER SECTION:
follower.pg-db.service.consul. 0 IN SRV 1 1 5432
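The names queried above follow Consul's service lookup pattern, `[tag.]<service>.service[.datacenter].<domain>`. A small sketch of assembling such a name is shown below; `service_dns_name` is a hypothetical helper, and the default domain `consul` is assumed.

```python
from typing import List, Optional


def service_dns_name(service: str, tag: Optional[str] = None,
                     datacenter: Optional[str] = None,
                     domain: str = "consul") -> str:
    """Build [tag.]<service>.service[.datacenter].<domain>."""
    labels: List[str] = [tag] if tag else []
    labels += [service, "service"]
    if datacenter:
        labels.append(datacenter)
    labels.append(domain)
    return ".".join(labels)


if __name__ == "__main__":
    print(service_dns_name("pg-db", tag="follower"))
    print(service_dns_name("pg-db", tag="follower", datacenter="lab1"))
```

Leaving out the datacenter label asks the local datacenter; adding it directs the lookup to a specific one.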
DNS Interface
- Zero Touch
- Randomized Round-Robin DNS
- Filters on Health Checks
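The last two bullets can be pictured as a filter followed by a shuffle. This is a simplified sketch, not Consul's implementation: the instance dictionaries and the `dns_answers` helper are hypothetical, and it assumes the default behaviour in which only instances with a failing (critical) check are excluded.

```python
import random
from typing import Dict, List, Optional


def dns_answers(instances: List[Dict[str, str]],
                rng: Optional[random.Random] = None) -> List[str]:
    """Drop instances whose check is critical, then randomize the order."""
    rng = rng or random.Random()
    healthy = [i["address"] for i in instances if i["status"] != "critical"]
    rng.shuffle(healthy)  # randomized round-robin
    return healthy


if __name__ == "__main__":
    print(dns_answers([
        {"address": "172.16.139.141", "status": "passing"},
        {"address": "172.16.139.142", "status": "critical"},
        {"address": "172.16.139.143", "status": "warning"},
    ]))
```

Because the unhealthy instance never appears in an answer, clients need no extra logic to avoid it.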
HTTP API
- HTTP API
- Custom Integrations
Host & Service Level Health Checks
What is a health check?
Any command that returns an exit code
- 0: PASSING
- 1: WARNING
- Any other value: FAILING
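The exit-code convention above can be sketched as a tiny check runner. `run_check` is a hypothetical helper that only illustrates the mapping; it is not how the Consul agent is implemented.

```python
import subprocess
import sys
from typing import List


def run_check(command: List[str]) -> str:
    """Run a check command and map its exit code to a status."""
    result = subprocess.run(command, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    if result.returncode == 0:
        return "passing"
    if result.returncode == 1:
        return "warning"
    return "failing"  # any other exit code


if __name__ == "__main__":
    # Stand-in check: a Python one-liner that exits with status 1.
    print(run_check([sys.executable, "-c", "raise SystemExit(1)"]))
```

Any existing script that follows this convention, including Nagios-style plugins, can be reused as a check.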
Health Checks & Monitoring
- Nagios-compatible
- Scalable
- Actionable
- Edge Triggered
Creating a check
Use a custom script
% cat conf.d/mem-check.json
{
  "check": {
    "id": "mem-util",
    "name": "Memory utilization",
    "script": "/usr/local/bin/mem_check.sh",
    "interval": "10s"
  }
}
Creating a check
Use a built-in check type
% cat conf.d/http-check.json
{
  "check": {
    "id": "api",
    "name": "HTTP API on port 4455",
    "http": "http://localhost:4455/_health",
    "interval": "10s",
    "timeout": "1s"
  }
}
Traditional Health Checking (pull)
[Diagram: a central health checking service polls DB 1, DB 2, ... DB N in turn ("Are you healthy?" "Yessir!" "What about you?" "Nah"), which adds up to 1,000s of requests]
Consul Health Checking (push)
[Diagram: DB 1, DB 2, ... DB N contact Consul only on change ("My status has changed"), which comes to 10s of requests]
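The difference between the pull and push models comes down to reporting only on a state change (edge triggering). A minimal sketch follows; `EdgeTriggeredReporter` and its `send` callback are hypothetical names, not Consul APIs.

```python
from typing import Callable, Dict


class EdgeTriggeredReporter:
    """Forward a check result only when it differs from the last one sent."""

    def __init__(self, send: Callable[[str, str], None]) -> None:
        self._send = send                 # called with (check_id, status)
        self._last: Dict[str, str] = {}   # last status sent per check

    def observe(self, check_id: str, status: str) -> bool:
        """Record a local check result; return True if an update was sent."""
        if self._last.get(check_id) == status:
            return False  # no change, so nothing goes over the network
        self._last[check_id] = status
        self._send(check_id, status)
        return True
```

A check that runs every 10 seconds and keeps passing therefore generates one update, not one per interval.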
Liveness
- No Heartbeats
- Gossip-based Failure Detector built on Serf
- Constant Load
HTTP UI
http://172.16.139.138:8500/ui/#/lab1/services
Key Value Store: HTTP API
$ curl -X PUT -d 'bar' http://localhost:8500/v1/kv/foo
true
$ curl http://localhost:8500/v1/kv/foo
[
  {
    "CreateIndex": 100,
    "ModifyIndex": 200,
    "Key": "foo",
    "Flags": 0,
    "Value": "YmFy"
  }
]
% echo -n 'bar' | base64
YmFy
% echo -n 'YmFy' | base64 -d ; echo
bar
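Since the KV endpoint returns values base64-encoded, a client has to decode them before use. The sketch below does this for a response body shaped like the one above; `decode_kv` is a hypothetical helper, and it assumes the values are UTF-8 text.

```python
import base64
import json
from typing import Dict


def decode_kv(body: str) -> Dict[str, str]:
    """Map each Key in a /v1/kv response body to its decoded Value."""
    entries = json.loads(body)
    return {
        # A missing (null) Value is treated as an empty string.
        entry["Key"]: base64.b64decode(entry["Value"] or "").decode("utf-8")
        for entry in entries
    }


if __name__ == "__main__":
    body = ('[{"CreateIndex": 100, "ModifyIndex": 200, '
            '"Key": "foo", "Flags": 0, "Value": "YmFy"}]')
    print(decode_kv(body))
```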
% cat <<EOF > acl.anonymous.json
{
  "ID": "anonymous",
  "Name": "Anonymous Token",
  "Type": "client",
  "Rules": "# Default all keys to read-only
key \"\" {
  policy = \"read\"
}

# Default all services to read-only
service \"\" {
  policy = \"read\"
}

# Allow hearing any user event by default.
event \"\" {
  policy = \"read\"
}

# Default prepared queries to read-only.
query \"\" {
  policy = \"read\"
}

# Read-only mode for the encryption keyring by default (list only)
keyring = \"read\""
}
EOF
% curl -v -X PUT -d @acl.anonymous.json --unix-socket /tmp/.consul.http.sock 'http://consul/v1/acl/update?token=rootToken'
Prepared Queries: Use Case
• Multiple instances of a given service exist in multiple datacenters
• Clients can talk to any of them, and always prefer the instances with lowest latency
• Policies can change, so clients should not need to know the details of how to locate a healthy service
Prepared Queries
• New query namespace, similar to services
• Register queries to answer for parts of this namespace
• Clients use APIs, or ".query.consul" DNS lookups, to run queries
• Magic happens :-)
pg-db with Failover
$ curl -X POST -d \
'{
  "Name": "geo-pg-db-follower",
  "Service": {
    "Service": "pg-db",
    "Failover": {
      "NearestN": 3
    },
    "Tags": ["follower"]
  }
}' localhost:8500/v1/query
geo-pg-db-follower.query.consul
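The effect of `"NearestN": 3` can be sketched as: answer locally when possible, otherwise try the nearest remote datacenters in order of round trip time. This is a simplified model rather than Consul's code; `resolve_with_failover` and its dictionary inputs are hypothetical.

```python
from typing import Dict, List


def resolve_with_failover(local_dc: str,
                          healthy: Dict[str, List[str]],
                          rtt_to_dc: Dict[str, float],
                          nearest_n: int) -> List[str]:
    """Return healthy instances, preferring the local datacenter."""
    if healthy.get(local_dc):
        return healthy[local_dc]
    # Remote datacenters ordered by estimated round trip time, nearest first.
    remote = sorted((dc for dc in rtt_to_dc if dc != local_dc),
                    key=lambda dc: rtt_to_dc[dc])
    for dc in remote[:nearest_n]:
        if healthy.get(dc):
            return healthy[dc]
    return []  # nothing healthy within the nearest N datacenters
```

Clients keep using the same query name throughout; only the datacenter that answers changes.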
PostgreSQL Template
$ curl -X POST -d \
'{
  "Name": "geo-db",
  "Template": {
    "Type": "name_prefix_match",
    "Regexp": "^geo-db-(.*?)-([^\\-]+?)$"
  },
  "Service": {
    "Service": "pg-${match(1)}",
    "Failover": {
      "NearestN": 3,
      "Datacenters": ["dc1", "dc2"]
    },
    "OnlyPassing": true,
    "Tags": ["${match(2)}"]
  }
}' localhost:8500/v1/query
Query names and the service lookups they correspond to:
geo-db-customer-leader.query.consul -> leader.pg-customer.service.consul
geo-db-customer-follower.query.consul -> follower.pg-customer.service.consul
geo-db-billing-follower.query.consul -> follower.pg-billing.service.consul
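The regular expression in the template can be tried out on the query names above. The sketch uses Python's `re` module as a stand-in; Consul evaluates the pattern with Go's regexp package, which is assumed to treat this particular pattern the same way. `expand` is a hypothetical helper that mimics the `${match(1)}` and `${match(2)}` substitutions.

```python
import re
from typing import Optional, Tuple

# Same pattern as the template's Regexp, after JSON unescaping.
TEMPLATE = re.compile(r"^geo-db-(.*?)-([^\-]+?)$")


def expand(query_name: str) -> Optional[Tuple[str, str]]:
    """Return (service, tag) for a query name, or None if it does not match."""
    match = TEMPLATE.match(query_name)
    if match is None:
        return None
    return ("pg-" + match.group(1), match.group(2))


if __name__ == "__main__":
    for name in ("geo-db-customer-leader", "geo-db-billing-follower"):
        print(name, expand(name))
```

The last hyphen-separated label becomes the tag, and everything between `geo-db-` and that label becomes the service suffix.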
Catch All Template
$ curl -X POST -d \
'{
  "Name": "",
  "Template": {
    "Type": "name_prefix_match"
  },
  "Service": {
    "Service": "${name.full}",
    "Failover": {
      "NearestN": 3
    }
  }
}' localhost:8500/v1/query
*.query.consul
With a single query template, all services can fail over to the nearest healthy service in a different datacenter!
Under the Hood: Network Tomography
• Rides on pings that are part of LAN and WAN gossip
• Models networking round trip time using simple physics simulation with masses and springs
• Develops a set of “network coordinates” for round trip time estimation with a simple calculation
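The "simple calculation" in the last bullet can be sketched as a distance between two coordinates. The version below follows the calculation described in Consul's network coordinates documentation (Euclidean distance plus per-node height, with an optional adjustment); the `Coordinate` class and its field names are assumptions made for this example, and all values are in seconds.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Coordinate:
    vec: List[float]   # position in the coordinate space
    height: float      # per-node access-link latency
    adjustment: float  # per-node correction term


def estimate_rtt(a: Coordinate, b: Coordinate) -> float:
    """Estimate the round trip time between two nodes, in seconds."""
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a.vec, b.vec)))
    rtt = dist + a.height + b.height
    adjusted = rtt + a.adjustment + b.adjustment
    # Use the adjusted value only when it stays positive.
    return adjusted if adjusted > 0.0 else rtt


if __name__ == "__main__":
    here = Coordinate(vec=[0.003, 0.004], height=0.001, adjustment=0.0)
    there = Coordinate(vec=[0.0, 0.0], height=0.001, adjustment=0.0)
    print(estimate_rtt(here, there))
```

No probe is needed at query time: any two nodes with known coordinates can be compared, which is what lets prepared queries rank datacenters by distance.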
Conclusion
Consul solves four central challenges with SOA
- Service Discovery (HTTP + DNS)
- Host & Service Level Health Checks
- Key Value Store (HTTP API)
- Datacenter Aware
Further reading
- Consul vs. Other Software: consul.io/intro/vs/index.html
- Consul Agent: consul.io/docs/agent/basics.html
- Consul Commands: consul.io/docs/commands/index.html
- Consul Internals: consul.io/docs/internals/index.html
Questions?