Novel Multi-region Clusters
Cassandra Deployments Split Between Heterogeneous Data Centres with NAT & DNS-SD
#CassandraSummit
Instaclustr
• Instaclustr provides Cassandra-as-a-service in the cloud (Currently only on AWS — Google Cloud in private beta)
• We currently manage 50+ Cassandra nodes for various customers
• We often get requests to do cool things, and we try to make them happen!
Multi-DC @ Instaclustr
• Cloud ⇄ cloud, “classic” internet-facing data centre ⇄ cloud
• Works out-of-the-box today.
• Requires per-node public IP
• Private network clusters ⇄ Cloud clusters
• Easy if your private network allocates per-node public IP addresses
• VPNs
• Something else?
• Overview of multi-region/data centre clusters
• What is supported out-of-the-box
• Alternative solutions
• Supporting technology overview (NAT/PAT and DNS-SD)
• Implementation
Single Node
• What you get from running apt-get install cassandra and /usr/bin/cassandra
• Fragile (no redundancy)
• Dev/test/sandbox only
[Diagram: a single C* node]
Multi-node, Single Data Centre
• Two or more servers running Cassandra within one DC
• Replication of data (redundancy)
• Increased capacity (storage + throughput)
• Baseline for production clusters
[Diagram: three C* nodes in one data centre]
Multi-node, Multi-DC
• Cassandra running in two or more data centres
• Global deployments
• Data near your customers (reduced latency)
• Supported out-of-the-box
[Diagram: three data centres, each containing three C* nodes]
Snitches
• Understands data centres and racks
• Implementation may automatically determine node DC and rack (Ec2MultiRegionSnitch uses the AWS internal metadata service; GossipingPropertyFileSnitch loads a .properties file)
• Node DC and rack is advertised via Gossip
• Determine node proximity (estimated link latency)
• Cluster may use a combination of Snitch implementations
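As a rough illustration of the file-based approach: GossipingPropertyFileSnitch reads the node's data centre and rack from the `dc` and `rack` keys of cassandra-rackdc.properties. The sketch below only parses text in that format; the class name `RackDcExample`, the fallback values, and the sample DC/rack names are illustrative assumptions, not Cassandra code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class RackDcExample {
    /** Parses rackdc-style properties text and returns "dc/rack". */
    static String describe(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // Cannot realistically happen for an in-memory reader.
            throw new UncheckedIOException(e);
        }
        // Fallback values are placeholders chosen for this sketch.
        String dc = props.getProperty("dc", "UNKNOWN_DC");
        String rack = props.getProperty("rack", "UNKNOWN_RACK");
        return dc + "/" + rack;
    }

    public static void main(String[] args) {
        // Sample values are hypothetical.
        System.out.println(describe("dc=US_EAST_1\nrack=us-east-1a\n"));
    }
}
```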
Data Centres
• Collection of Racks
• Complete replications
• Geographically separate
• Possibly high-latency interconnects (e.g. East Coast US → Sydney, ~300ms round-trip)
Racks
• Collection of nodes
• May fail as a single unit
• Modelled on the traditional DC rack/cage (n servers running off a UPS)
☁
• Amazon Web Services (use Ec2MultiRegionSnitch)
• Data Centre ≡ AWS Region (e.g. US_East_1, AP_SOUTHEAST_2)
• Rack ≡ Availability Zone (e.g. us-east-1a, ap-southeast-2b)
• Google Cloud Platform (no out-of-the-box auto-configuring snitch: use GossipingPropertyFileSnitch, or roll your own!)
• Data Centre ≡ GCP Region (e.g. US, Europe)
• Rack ≡ Zone (e.g. us-central1-a, europe-west1-a)
Data Centre Aware
• Cassandra is data centre aware
• Only fetch data from a remote DC if absolutely required (remote data is more “expensive”)
• Clients can be made data centre aware
• If your app knows its DC, client will talk to the closest DC
Cluster cluster = Cluster.builder()
    .addContactPoint(…)
    .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("US_EAST_1"))
    .build();
Multi DC Support
• Per-node public (internet-facing) IP address
• Optionally, per-node private IP address
• Per-node public address is used for inter-data centre connectivity
• Per-node private address is used for intra-data centre connectivity
Multi DC Support
• Cloud ⇄ cloud, traditional ⇄ cloud, traditional ⇄ traditional
• Easy to set up per-node public and private addresses
• Private network clusters ⇄ Cloud clusters
• Private networks: 𝑛 public addresses, shared by 𝑥 private addresses. Not 1 ↔ 1 (where often 𝑥 > 𝑛)
• Done via Network Address Translation (NAT)
IPv4 Address Space Exhaustion
Source: http://www.potaroo.net/tools/ipv4/
Multi-DC Support
• IPv4
• Address exhaustion
• Over time, will become more expensive to purchase addresses
• Wasteful (being a good internet citizen)
Alternatives
• IPv6
• Java supports it ∴ Cassandra probably supports it (untested by us)
• Global IPv6 adoption is ~4% (according to Google: google.com/intl/en/ipv6/statistics.html)
• IPv6/IPv4 hybrid (Teredo, 6over4, et al.)
• AWS EC2 does not support IPv6. End of story. (Elastic Load Balancer does support IPv6)
Alternatives
• VPNs
• tinc, OpenVPN, etc.
• All private address space — no dual addressing
• Requires multiple links — between every DC and per client
• Address space overlaps between multiple VPNs
• Connectivity to multiple clusters an issue (for multi-cluster apps, centralised monitoring, etc)
Data Centres   Links
3              3
5              10
7              21
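The link counts in the table are consistent with a full mesh, where every data centre needs a VPN link to every other one: n data centres need n(n − 1)/2 links. A small sketch (the class and method names are made up for illustration) reproduces the three rows:

```java
class MeshLinks {
    /** Number of point-to-point links needed to fully connect n sites. */
    static int fullMeshLinks(int dataCentres) {
        return dataCentres * (dataCentres - 1) / 2;
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 5, 7}) {
            System.out.println(n + " data centres -> " + fullMeshLinks(n) + " links");
        }
    }
}
```

The quadratic growth is the point: each added data centre needs a link to every existing one, and per-client links come on top of that.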
Alternatives
• Network Address Translation (NAT) (aka IP Masquerading or Port Address Translation (PAT))
• Deployed on most private networks
• Connectivity between private network clusters ⇄ Cloud clusters
• Supports client connectivity to multiple clusters
NAT Basics
• Re-maps IP address spaces (e.g. Public 96.31.81.80 ↔ Private 192.168.*.*)
• 𝑛 public addresses, shared by 𝑥 private addresses. Not 1 ↔ 1 (where often 𝑛 = 1, 𝑥 > 𝑛)
• Port Address Translation
• Private port ↔ Public port
• Outbound connections only without port forwarding or NAT traversal
• Per DC gateway device — performs NAT and port forwarding
NAT with Inbound Connections
• Static port forwarding (configured on the gateway)
• Automatic port forwarding — UPnP, NAT-PMP/PCP (configured by the application, e.g. Cassandra)
• NAT Traversal — STUN, ICE, etc.
NAT + C∗
Situation: 𝑛 Cassandra nodes, 1 public address per data centre
• Port forward different public ports for each node
• Advertise assigned ports
• Modify Cassandra and client applications to connect to advertised ports
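To make the port-forwarding idea concrete, here is a minimal sketch of the table a gateway would hold: one public address, with each public port forwarded to a different private node and port. `PortForwardTable` is a hypothetical name, and the private ports assume Cassandra's default Thrift (9160) and CQL native (9042) ports; the public port numbers are arbitrary examples.

```java
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;

class PortForwardTable {
    // Public port on the gateway -> private node address and port.
    private final Map<Integer, InetSocketAddress> rules = new HashMap<>();

    /** Adds a forwarding rule; an existing rule for the same public port is replaced. */
    void forward(int publicPort, String privateIp, int privatePort) {
        rules.put(publicPort, InetSocketAddress.createUnresolved(privateIp, privatePort));
    }

    /** Returns the private destination for a public port, or null if none is mapped. */
    InetSocketAddress route(int publicPort) {
        return rules.get(publicPort);
    }

    public static void main(String[] args) {
        PortForwardTable gateway = new PortForwardTable();
        // Two ports per node, all behind the same public address.
        gateway.forward(1236, "192.168.1.4", 9160); // Thrift (default port assumed)
        gateway.forward(1237, "192.168.1.4", 9042); // CQL native (default port assumed)
        gateway.forward(1238, "192.168.1.3", 9160);
        System.out.println("public 1237 -> " + gateway.route(1237));
    }
}
```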
Advertising Port Mappings
• Extend Cassandra Gossip
• Include port numbers in node address announcements
• Allow seed node addresses to include port numbers
• Allow multiple nodes to have identical public & private addresses (only port numbers differ per DC)
• How to bootstrap? SIP?
• Cassandra must be aware of the allocated ports in order to advertise
• Hard if C* is not directly responsible for the port mapping (e.g. static port forwarding)
• Too many modifications to internals
Advertising Port Mappings
• DNS-SD (dns-sd.org, aka Bonjour/Zeroconf)
• Reads: works with existing DNS implementations (it’s just a DNS query)
• Even inside restrictive networks, DNS usually works
• Combination of DNS TXT, SRV and PTR records.
• Updates
• via DNS Update & TSIG, supported by BIND
• via API — e.g. for AWS Route 53
Advertising Port Mappings
• DNS-SD cont’d.
• SRV records contain hostname and port (i.e., hostname of the NAT gateway and public C* port)
• TXT records contain key=value pairs (useful for additional connection & config details)
• Modify C* connection code to look up foreign node port from DNS
• Modify client driver connection code to look up ports from DNS
• Can be queried & updated out-of-band (updated by the NAT device or central management server which knows which ports were mapped)
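A hedged sketch of how the SRV and TXT data could be consumed from Java. SRV record data has the form "priority weight port target", and TXT strings are key=value pairs. The class `DnsSdRecords` and its helpers are hypothetical; `lookupSrv` uses the JDK's JNDI DNS provider and needs a reachable DNS server, so only the parsing is exercised in `main`. The sample host name is a placeholder.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.List;
import java.util.Map;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

class DnsSdRecords {
    /** Host and port carried by one SRV record. */
    static final class SrvTarget {
        final String host;
        final int port;
        SrvTarget(String host, int port) { this.host = host; this.port = port; }
    }

    /** Parses SRV record data of the form "priority weight port target". */
    static SrvTarget parseSrv(String rdata) {
        String[] parts = rdata.trim().split("\\s+");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not SRV data: " + rdata);
        }
        String target = parts[3];
        // Drop the trailing dot of a fully qualified name, if present.
        String host = target.endsWith(".") ? target.substring(0, target.length() - 1) : target;
        return new SrvTarget(host, Integer.parseInt(parts[2]));
    }

    /** Collects key=value TXT strings into a map; strings without '=' are skipped. */
    static Map<String, String> parseTxt(List<String> strings) {
        Map<String, String> result = new HashMap<>();
        for (String s : strings) {
            int eq = s.indexOf('=');
            if (eq > 0) {
                result.put(s.substring(0, eq), s.substring(eq + 1));
            }
        }
        return result;
    }

    /** Queries the SRV record of a service instance via the JDK's JNDI DNS provider. */
    static SrvTarget lookupSrv(String serviceFqdn) throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
        DirContext ctx = new InitialDirContext(env);
        try {
            Attribute srv = ctx.getAttributes(serviceFqdn, new String[] {"SRV"}).get("SRV");
            if (srv == null) {
                throw new NamingException("No SRV record for " + serviceFqdn);
            }
            return parseSrv((String) srv.get());
        } finally {
            ctx.close();
        }
    }

    public static void main(String[] args) {
        SrvTarget t = parseSrv("0 0 1236 gateway.example.com.");
        Map<String, String> txt = parseTxt(Arrays.asList("version=2.0.7", "cqlport=1237"));
        System.out.println(t.host + ":" + t.port + " cqlport=" + txt.get("cqlport"));
    }
}
```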
Advertised Details
• Each cluster is its own browse domain
• Each NAT gateway device has an A record in the browse domain
• Each DNS-SD service is named based on the private IP address
• Requires unique private IP addresses across data centres
• SRV port is the C* thrift port
• Additional ports are advertise via TXT
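The naming scheme can be sketched as a pair of string helpers: the service instance name is the node's private IPv4 address with dots replaced by hyphens, placed under `_cassandra._tcp` in the cluster's browse domain. The class and method names are hypothetical, and the browse domain in `main` is a placeholder.

```java
class ServiceNames {
    /** Instance name derived from a private IPv4 address, e.g. 192.168.1.4 -> 192-168-1-4. */
    static String instanceName(String privateIp) {
        return privateIp.replace('.', '-');
    }

    /** Full DNS-SD service name for a node within the cluster's browse domain. */
    static String serviceFqdn(String privateIp, String browseDomain) {
        return instanceName(privateIp) + "._cassandra._tcp." + browseDomain;
    }

    public static void main(String[] args) {
        // "cluster.example.com." stands in for a real per-cluster browse domain.
        System.out.println(serviceFqdn("192.168.1.4", "cluster.example.com."));
    }
}
```

Because the instance name is derived only from the private address, two nodes in different data centres with the same private IP would collide, which is why private addresses must be unique across data centres.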
Configuration
• Cassandra is configured to only use private addresses
• On cluster creation
• Establish a new DNS-SD browse domain
• Create A records for each gateway device
• NAT gateway device is notified when a new C* node is started
• Allocates random public ports for C* and configures Port Forwarding
• Updates DNS-SD
• New SRV and TXT record
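The gateway's port-allocation step might look roughly like the following. This is a sketch under assumptions: `GatewayPortAllocator` is a hypothetical name, the port range (1024 to 65535) is a choice made here rather than something the slides specify, and the actual port-forwarding and DNS update calls are left out.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

class GatewayPortAllocator {
    private static final int MIN_PORT = 1024;   // assumed lower bound (avoid well-known ports)
    private static final int MAX_PORT = 65535;

    private final Set<Integer> inUse = new HashSet<>();
    private final Random random;

    GatewayPortAllocator(Random random) {
        this.random = random;
    }

    /**
     * Picks a random public port that has not been handed out yet.
     * Note: loops forever if every port in the range is taken; a real
     * gateway would need a bound and an error path.
     */
    int allocate() {
        int port;
        do {
            port = MIN_PORT + random.nextInt(MAX_PORT - MIN_PORT + 1);
        } while (!inUse.add(port));
        return port;
    }

    /** Returns a port to the pool, e.g. when a node is decommissioned. */
    void release(int port) {
        inUse.remove(port);
    }

    public static void main(String[] args) {
        GatewayPortAllocator allocator = new GatewayPortAllocator(new Random());
        // One public port per forwarded C* port on the new node.
        int srvPort = allocator.allocate();
        int cqlPort = allocator.allocate();
        System.out.println("SRV port " + srvPort + ", TXT cqlport=" + cqlPort);
    }
}
```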
$ dns-sd -B _cassandra._tcp 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.
Browsing for _cassandra._tcp
A/R  Flags  if  Domain                                                             Service Type      Instance Name
Add  3      0   1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.  _cassandra._tcp.  192-168-2-4
Add  3      0   1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.  _cassandra._tcp.  192-168-1-2
Add  3      0   1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.  _cassandra._tcp.  192-168-2-3
Add  3      0   1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.  _cassandra._tcp.  192-168-2-2
Add  3      0   1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.  _cassandra._tcp.  192-168-1-4
Add  2      0   1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.  _cassandra._tcp.  192-168-1-3

$ dns-sd -L 192-168-1-4 _cassandra._tcp 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.
Lookup 192-168-1-4._cassandra._tcp.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.
192-168-1-4._cassandra._tcp.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. can be reached at aws-us-east1-gateway.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.:1236 (interface 0)
 version=2.0.7 cqlport=1237

$ nslookup aws-us-east1-gateway.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.
Non-authoritative answer:
Name: aws-us-east1-gateway.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au
Address: 54.209.123.195
Output of dns-sd (Can also use avahi-browse, dig, or any other DNS query tool)
Java Driver Modifications
• The driver's AddressTranslater is usually a no-op (the default is IdentityTranslater)
• Modify translate() to perform a DNS-SD lookup.
• The address parameter is a node private IP address.
• Locate a service with a name = private IP address to determine public IP/port.
public interface AddressTranslater {
    public InetSocketAddress translate(InetSocketAddress address);
}
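A minimal sketch of what such a translater could look like. To stay self-contained it declares a local copy of the interface instead of importing the driver, and it takes the DNS-SD lookup as an injected function (instance name in, public address out); both choices, and the class name `DnsSdAddressTranslater`, are assumptions for illustration rather than the speakers' implementation.

```java
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Local stand-in for the driver interface so the sketch compiles on its own. */
interface AddressTranslater {
    InetSocketAddress translate(InetSocketAddress address);
}

class DnsSdAddressTranslater implements AddressTranslater {
    // Hypothetical lookup: DNS-SD instance name -> gateway host and public port.
    private final Function<String, InetSocketAddress> serviceLookup;

    DnsSdAddressTranslater(Function<String, InetSocketAddress> serviceLookup) {
        this.serviceLookup = serviceLookup;
    }

    @Override
    public InetSocketAddress translate(InetSocketAddress address) {
        if (address.getAddress() == null) {
            return address; // unresolved address: nothing to derive a name from
        }
        // The service is named after the node's private IP, e.g. 192-168-1-4.
        String instance = address.getAddress().getHostAddress().replace('.', '-');
        InetSocketAddress mapped = serviceLookup.apply(instance);
        // Fall back to the identity behaviour when no record is advertised.
        return mapped != null ? mapped : address;
    }

    public static void main(String[] args) {
        Map<String, InetSocketAddress> records = new HashMap<>();
        // Placeholder gateway host; a real lookup would query SRV/TXT records.
        records.put("192-168-1-4", InetSocketAddress.createUnresolved("gateway.example.com", 1237));
        AddressTranslater translater = new DnsSdAddressTranslater(records::get);
        System.out.println(translater.translate(new InetSocketAddress("192.168.1.4", 9042)));
    }
}
```

Injecting the lookup keeps the DNS query separate from the translation logic, so the translater can be exercised without a DNS server.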
Modifying Cassandra
• OutboundTcpConnectionPool is responsible for managing Socket connections.
• Modify newSocket() to perform a DNS-SD lookup.
• The endpoint parameter is a node private IP address.
• Locate a service with a name = private IP address to determine public IP/port
public class OutboundTcpConnectionPool {
    ⋮
    public static Socket newSocket(InetAddress endpoint) throws IOException {…}
    ⋮
}
[Diagram: two data centres of C* nodes, each behind a NAT Gateway; a DNS (+ DNS-SD) Server (Route 53, self-hosted, etc.); a Client Application]
Thanks! Questions?