Couchbase Server™ Technical Overview

Key concepts, system architecture and subsystem design

COUCHBASE SERVER TECHNICAL OVERVIEW

© 2012 COUCHBASE ALL RIGHTS RESERVED. WWW.COUCHBASE.COM


Table of Contents

What is Couchbase Server?

System overview and architecture

Overview – Couchbase Server and client software

Couchbase Server in the application stack

Data flow in a Couchbase Server environment

Between application and Couchbase Server

Within the Couchbase Server cluster

Top-level software block architecture

Couchbase Server – Data manager

TCP ports

Embedded Moxi

Memcached protocol listener/sender

Couchbase Server storage engine

Couchbase Server – Cluster manager

TCP ports

REST management API

Per node configuration management and monitoring functions

Per cluster functions

Getting started with Couchbase Server

Glossary


What is Couchbase Server?

Couchbase Server is a simple, fast, elastic NoSQL database, optimized for the data management needs of interactive web applications. Couchbase Server makes it easy to optimally match resources to the changing needs of an application by automatically distributing data and I/O across commodity servers or virtual machines. It scales out and supports live cluster topology changes while continuing to service data operations. Its managed object caching technology delivers consistent, sub-millisecond random reads, while sustaining high-throughput writes. As a document-oriented database, Couchbase Server accommodates changing data management requirements without the burden of schema management.

Key Couchbase Server characteristics and capabilities include:

• Push-button elasticity

◦ Add or remove multiple servers simultaneously with the push of a button

◦ Efficient data rebalancing without requiring application changes

• Memcached compatible

◦ Easy to get started with Couchbase – drop-in replacement for memcached

◦ Simple, easy to use and widely supported key-value interface

• Zero-downtime maintenance

◦ Add or remove servers, upgrade software and perform any maintenance tasks in a live cluster

◦ No application downtime required

◦ No application performance degradation

• Enterprise-class monitoring and administration

◦ Deeply instrumented monitoring with rich administration GUI

◦ Dynamic system monitoring charts

◦ Backup and restore capability

◦ RESTful management API

◦ Easy interface to external monitoring and management systems

◦ Easy to automate deployment to the cloud

• Reliable low-latency storage architecture

◦ Memcached inside. Caching technology has 10 years of production maturity and powers 18 of the top 20 web applications on the planet

◦ Efficient use of memory (object-level cache prevents thrashing inherent to page-level approaches)

◦ Predictable low latency

◦ No memory mapped files

◦ Pull the plug on a server without fear of stored data corruption

• Data replication with auto-failover

◦ Maintain multiple copies of your data within the cluster for high-availability

◦ User configurable replication count

◦ User configurable failover policy to ensure data availability in the face of hardware failure


• Professional SDKs for a wide variety of languages

◦ Well-documented, easy-to-use SDKs make it easy for developers to build applications that store data in Couchbase

◦ Support for Java, C#, PHP, C, Python, Ruby

At the highest level, Couchbase Server is simple, fast, elastic, and reliable. Every feature and design decision is weighed against these core principles:

Simple. Everything about Couchbase Server is easy: getting, installing, managing, expanding and using it. As a document database, there is no need to create and manage schemas; and never a need to normalize, shard or tune the database. Build applications faster, keep them running reliably and easily adapt them to changing business requirements.

Fast. Couchbase Server is screamingly, predictably fast. It is the lowest latency, highest throughput NoSQL database technology available. Read and write data with consistently low latency and sustained high throughput across the scaling spectrum. Get the performance you need at lower cost.

Elastic. By automatically distributing data and I/O across commodity servers or virtual machines, Couchbase Server makes it easy to match the optimal quantity of resources to the changing needs of an application. Quickly grow a cluster from 1 node to 25 nodes to 100 nodes or shrink a cluster to sustain application performance, while precisely matching cost to demand. There are no single points of failure in a Couchbase cluster and all operations function across the entire cluster. Sophisticated replication and persistence subsystems guarantee continuous operations.

Reliable. Couchbase Server is enterprise-ready software that you can depend on for mission critical applications. With zero-downtime maintenance and rich monitoring capabilities, deploy mission critical applications with confidence.


System overview and architecture

Overview – Couchbase Server and client software

A Couchbase Server is a computer (e.g., commodity Intel server, VMware virtual machine, Amazon machine instance) running Couchbase Server software. Couchbase Server runs on 32- and 64-bit Linux, Windows and Mac operating systems. The source code is a mix of C, C++ and Erlang, with some utility functionality authored in Python.

Each server in a Couchbase Server cluster runs identical Couchbase Server software, meaning “all Couchbase Server nodes are created equal.”

A number of benefits flow from the decision to avoid special-case nodes running differentiated software or exhibiting differentiated functionality (e.g., masters, slaves, cluster managers, configuration servers):

1. No single point of failure. Nodes can fail at any time (up to the replication count of the cluster) and a Couchbase Server cluster can continue to process data operations for the entire key space of data, with no loss of administrative functionality. If the server holding the global singleton (the elected leader of the cluster) is lost, the Erlang-based cluster management system will elect a new leader and cluster management operations will continue without impacting applications on top. And given the distributed architecture of Couchbase Server, even if the cluster management subsystem were to completely fail, data operations would continue uninterrupted.

2. Get started with one node. The full functionality of Couchbase Server is available with just a single package installation. Download, install and begin using Couchbase Server in five minutes or less, on just one node if desired.

3. Clone to grow. Because all nodes are alike, you can literally clone a virtual machine running Couchbase Server software, join it to a cluster (one mouse click) and rebalance the cluster (another mouse click) to migrate data to the newly added server, balancing data and I/O across the cluster. You can do this with many servers at once, and the entire process can be automated through the Couchbase Server CLI utility or REST calls.

An application interacts with a Couchbase Server cluster through a memcached client library, typically over a network connection.

The client library employs an algorithm (pluggable, but a hashing algorithm is the default in Couchbase Server) to calculate the virtual “bucket” in which a given key’s value is to be located. Couchbase Server will hash a key to 1 of 1024 vBuckets.


The vBucket number is then used as an index by the client to look up, in the vBucket map data structure, the individual server in the cluster responsible for the data in that vBucket (including master and replica server responsibilities). Memcached client libraries are available for practically every language and application framework.
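The hash-then-look-up sequence can be sketched in a few lines of Python. The CRC32 hash, the server addresses and the map layout here are illustrative assumptions for this sketch, not Couchbase's exact client implementation:

```python
import zlib

NUM_VBUCKETS = 1024

def vbucket_for_key(key: str) -> int:
    # Hash the key to 1 of 1024 vBuckets (hash choice is illustrative).
    return zlib.crc32(key.encode()) % NUM_VBUCKETS

# A tiny, hypothetical cluster: the vBucket map holds, per vBucket,
# a list of indices into the server list, master first, then replicas.
servers = ["10.0.0.1:11210", "10.0.0.2:11210", "10.0.0.3:11210"]
vbucket_map = [[i % 3, (i + 1) % 3] for i in range(NUM_VBUCKETS)]

def master_for_key(key: str) -> str:
    # The client indexes the vBucket map to find the responsible server.
    vb = vbucket_for_key(key)
    return servers[vbucket_map[vb][0]]
</antml>```

A set for key "user:42" would then be transmitted to `master_for_key("user:42")`. Note that only the map, never the hash function, changes when cluster topology changes.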

Couchbase Server in the application stack

As shown in Figure 1, Couchbase Server supports a “scale out” architecture at the data layer. Couchbase Servers are deployed as a cluster behind web application servers, spreading the data and I/O operations evenly across the cluster. Servers can be added to, and removed from, a live cluster. This deployment model matches what is already best practice architecture at the application logic tier, where new web servers are deployed alongside existing servers and placed into rotation behind a load balancer. With Couchbase Server, client-side logic effectively “load balances” data operations across the cluster through a key hashing and server mapping algorithm.

Figure 1: Couchbase Server deployment architecture



Data flow in a Couchbase Server environment

Between application and Couchbase Server

Figure 2 shows the flow of data from an application to a Couchbase Server cluster, illustrating a data write operation.

The illustration starts at the presentation layer:

1. An application user takes an action that results in the need to update a data item in Couchbase Server

2. The application server responding to the user action updates the key’s value and makes a call to a memcached client library to set the key-value pair

3. The memcached client library selects the server currently serving as master for the referenced key and transmits the operation to the server

4. (and 5.) Upon arrival, Couchbase Server replicates, caches and stores the data, as detailed in the next section

Within the Couchbase Server cluster

Picking up from step 5 in figure 2, figure 3 shows the processing of the set operation inside the Couchbase Server cluster.

1. The set arrives into the Couchbase Server listener-receiver.

2. Couchbase Server immediately replicates the data to replica servers – the number of replica copies is user defined. Upon arrival at replica servers, the data is persisted.

3. The data is cached in main memory.



4. The data is queued for persistence and de-duplicated if a write is already pending. Once the pending write is pulled from the queue, the value is retrieved from cache and written to disk (or SSD).

5. A set acknowledgment is returned to the application.

Figure 3: Data flow within the Couchbase Server cluster on write
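Steps 3 through 5 can be sketched with a dictionary cache and an ordered persistence queue; the class and method names are invented for illustration and do not mirror Couchbase's internals:

```python
from collections import OrderedDict

class WritePath:
    def __init__(self):
        self.cache = {}               # in-memory object cache
        self.pending = OrderedDict()  # keys queued for persistence
        self.disk = {}                # stands in for the storage engine

    def set(self, key, value):
        self.cache[key] = value       # cache in main memory
        # Queue for persistence; re-setting a key whose write is still
        # pending does not add a second queue entry (de-duplication).
        self.pending[key] = True
        return "STORED"               # acknowledge to the application

    def flush_one(self):
        # Drain one pending write, fetching the *current* cached value,
        # so a de-duplicated key is written once with its latest value.
        key, _ = self.pending.popitem(last=False)
        self.disk[key] = self.cache[key]

wp = WritePath()
wp.set("k", 1)
wp.set("k", 2)   # second set de-duplicates: still one pending write
wp.flush_one()   # the latest value, 2, reaches "disk" in a single write
</antml>```

The de-duplication step is why a hot key that is updated many times between flushes costs only one disk write.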

Top-level software block architecture

At the highest level, Couchbase Server has two distinct functional blocks: the Data Manager and the Cluster Manager. With some effort, it is possible to selectively build Couchbase Server completely devoid of a Cluster Management subsystem; node configuration management, replication, health monitoring and other capabilities would then have to be performed by an external system.

Figure 4: Couchbase Server software architecture


Data Manager. The data manager does the work of storing and retrieving data in response to data operation requests from applications. It exposes two “memcapable” ports to the network – one port supports non-vBucket-aware memcached client libraries (pre-memcapable 2.0 API), which are proxied if required; the other port expects to communicate with vBucket-aware clients (memcapable 2.0+ API). The majority of code in the Data Manager is C and C++.

Cluster Manager. The cluster manager supervises the configuration and behavior of all nodes in a Couchbase Server cluster. Cluster management code runs on every node in the cluster, but one node (the one holding a global singleton) is elected to perform aggregation, consensus building and cross-node control decisions at any point in time. The majority of code in the Cluster Manager is written in Erlang/OTP, a language that makes writing correct concurrent code (notoriously difficult) far more tractable.

The following sections provide a high-level look at the subsystems inside the data and cluster manager systems.

Couchbase Server – Data manager

Figure 5 below highlights the key subsystems, and their interconnections, in the data path within a Couchbase Server node.

Figure 5: Couchbase Server data manager


TCP ports

The Couchbase Server data manager listens for requests on two TCP ports (the port numbers are configurable, defaults are shown):

• Port 11211 – The traditional memcached port processes requests from clients supporting version 1.0 of the memcapable API specification. These clients rely on a consistent hashing algorithm to map keys directly to servers in a variable-length server list. Most memcached clients today support memcapable 1.0, though memcapable 2.0 clients for the most popular platforms are being introduced (e.g., spymemcached for Java, enyim for .NET, fauna for Ruby, libmemcached for C and other languages that wrap this client library).

• Port 11210 – A port directly accessible to clients implementing version 2.0 of the memcapable API. These clients are “vBucket aware,” using a hashing algorithm to map keys to one of a fixed number of “vBuckets” (in Couchbase Server, the key space is grouped into 1024 vBuckets; for more information, see the vBucket entry in the Glossary). vBuckets are then mapped to a server, providing a layer of indirection enabling dynamic cluster rebalancing, non-disruptive cluster expansion or contraction, replication, failover and a host of other capabilities.
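The practical difference between the two client styles can be shown with a small simulation (the hash and server names are illustrative): growing the server list remaps most keys under direct key-to-server hashing, whereas a key's vBucket never changes, so only vBucket-to-server map entries need to move.

```python
import zlib

def server_v1(key, servers):
    # memcapable 1.0 style: the key hashes straight into the server list
    return servers[zlib.crc32(key.encode()) % len(servers)]

def vbucket(key, num_vbuckets=1024):
    # memcapable 2.0 style: the key hashes to a fixed vBucket id
    return zlib.crc32(key.encode()) % num_vbuckets

keys = [f"user:{i}" for i in range(1000)]

# Add a fourth server: under 1.0, most keys now hash to a different
# server and their cached data is effectively lost to the client.
moved_keys = sum(
    server_v1(k, ["s0", "s1", "s2"]) != server_v1(k, ["s0", "s1", "s2", "s3"])
    for k in keys
)
# Under 2.0, vbucket(k) is unchanged by topology changes; the cluster
# only has to migrate some vBuckets and update the map.
</antml>```

With this naive modulo scheme roughly three quarters of the keys change servers; the vBucket indirection is what lets Couchbase Server rebalance incrementally instead.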

Embedded Moxi

For non-vBucket-aware clients, moxi provides high-performance proxy services. When clients send operations to port 11211, moxi processes them and, if required, forwards them to the server(s) currently servicing requests for the key(s) referenced by the operation. This mapping and forwarding function is unnecessary for vBucket-aware clients.

Memcached protocol listener/sender

As mentioned previously, the latest stable memcached front-end source code is directly linked into Couchbase Server, guaranteeing protocol compatibility with memcached (both ASCII and binary protocols) now and into the future. A number of capabilities are embodied within this subsystem: network listener, protocol parser, thread manager, and the tap stream sender logic.


Couchbase Server storage engine

The Couchbase Server storage engine does the heavy lifting of caching and persisting data within a Couchbase Server node.

Figure 6: Data storage hierarchy behind the Couchbase Server storage engine

As shown in Figure 6, the Couchbase Server storage engine can manage a hierarchy of storage media, including main memory and spinning disk drives. Couchbase Server supports both on- and off-node storage; each node can be configured to use local storage media or to store data on an external data path, including mixing the two.

Data is automatically migrated up and down the latency/cost stack (RAM-Disk) based on data access patterns (Figure 7).

Figure 7: Data migrates up and down the latency stack


In Couchbase Server, data migration is based on an LRU algorithm, keeping recently used items in low-latency media while “aging out” colder items; first to SSD (if available) and then to spinning media. Alternative storage migration (and replication management, covered later) algorithms offer a rich set of community research and development opportunities.
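The LRU "aging out" policy can be sketched as a bounded, recency-ordered RAM tier in front of an unbounded "disk" tier. This is a toy model under simplifying assumptions; real eviction, SSD tiering and persistence are more involved:

```python
from collections import OrderedDict

class TieredStore:
    def __init__(self, ram_capacity):
        self.ram = OrderedDict()   # recency-ordered hot items, bounded
        self.disk = {}             # colder items, unbounded
        self.ram_capacity = ram_capacity

    def get(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)   # refresh recency on access
            return self.ram[key]
        value = self.disk[key]
        self._admit(key, value)         # hot again: migrate up to RAM
        return value

    def set(self, key, value):
        self._admit(key, value)

    def _admit(self, key, value):
        self.ram[key] = value
        self.ram.move_to_end(key)
        while len(self.ram) > self.ram_capacity:
            # Age out the least recently used item to the slower tier.
            cold_key, cold_val = self.ram.popitem(last=False)
            self.disk[cold_key] = cold_val
</antml>```

Accessing an aged-out item pulls it back into RAM and pushes out whichever resident item is now least recently used, which is the up-and-down migration Figure 7 depicts.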

Couchbase Server – Cluster manager

The Couchbase Server cluster manager monitors health and coordinates data manager behavior on each node; configures and supervises inter-node behavior (e.g. replication streams and rebalancing operations); provides aggregation and consensus functions for the cluster (e.g. global singleton election); and provides a RESTful cluster management API. The cluster manager is built atop Erlang/OTP, a proven environment for building and operating robust, fault-tolerant distributed applications.

Figure 8: Couchbase Server cluster manager

TCP ports

The Couchbase Server cluster manager listens for HTTP requests on a configurable TCP port (default is 8091) – a REST API and web user interface receive and process this traffic. By default, port 4369 and the range 21100-21199 are dedicated to Erlang/OTP functions: the Erlang port mapper runs on 4369, and inter-node Erlang communication operates in the 21100-21199 range.


REST management API

This port services cluster management requests via a published RESTful API. A CLI utility that leverages the REST interface provides a convenient way to programmatically manage a Couchbase Server cluster. Figure 9 summarizes the capabilities of the Couchbase Server CLI (and the underlying REST API).

Figure 9: CLI utility uses Couchbase Server REST interface


Per node configuration management and monitoring functions

The Couchbase Server cluster manager executes on each node in a Couchbase Server cluster. There are four primary subsystems that operate on each node.

1. Heartbeat. A watchdog process periodically communicates with the currently elected cluster leader (the node with the global singleton) to provide Couchbase Server health updates.

2. Process monitor. This subsystem monitors execution of the local data manager, restarting failed processes as required and contributing status information to the heartbeat module.

3. Configuration Manager. Each Couchbase Server node has a configuration – a vBucket map, active replication streams, a target rebalance map, etc. The configuration manager receives, processes and monitors local configuration, in concert with a cluster-wide configuration distribution system.

4. Global Singleton Supervisor. In a Couchbase Server cluster, one node is elected leader. If the leader dies, a new leader is elected. The Global Singleton Supervisor is responsible for electing a cluster leader and supervising “per-cluster” processes if the local node is the current leader.

Per cluster functions

In addition to the per-node functions, which are always executing on each node in a Couchbase Server cluster, there is a set of functions that is active on only one node in the cluster at any point in time. Possession of a global singleton data structure indicates to a node that it should execute these functions.

1. Rebalance Orchestrator. The rebalance orchestrator calculates, distributes and provides cluster-wide supervision of a rebalance operation. When a rebalance operation is initiated, it calculates a target vBucket map based on the current pending set of servers to be added and removed from the cluster; distributes commands to individual nodes to build a network of vBucket migration streams; and monitors migration completion events, updating and distributing the current vBucket map as migrations complete (note: there is a companion white paper that details the operation of the Couchbase Server rebalance orchestrator).

2. Node Health Monitor. The node health monitor (also known as The Doctor) receives heartbeat updates from individual nodes in the cluster, updating configuration and raising alerts as required.

3. vBucket state and replication manager. Responsible for establishing and monitoring the current network of replication streams.
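The orchestrator's first step, computing a target map from the current map and the pending set, can be sketched as below. The round-robin placement and data shapes are simplifying assumptions for illustration; the production algorithm also handles replica chains and migration ordering:

```python
def target_vbucket_map(current_map, current_servers, pending_add, pending_remove):
    # Post-rebalance server list: drop pending removals, append pending adds.
    new_servers = [s for s in current_servers if s not in pending_remove] + pending_add
    # Round-robin placement: an even spread that ignores data-movement cost.
    return {vb: new_servers[vb % len(new_servers)] for vb in current_map}

# A toy cluster of 8 vBuckets on 2 servers, with a third being added.
current_servers = ["s0", "s1"]
current = {vb: current_servers[vb % 2] for vb in range(8)}
target = target_vbucket_map(current, current_servers,
                            pending_add=["s2"], pending_remove=[])

# The vBuckets whose master changes are exactly the migrations the
# orchestrator must schedule via migration tap streams.
migrations = [vb for vb in current if current[vb] != target[vb]]
</antml>```

Comparing the two maps, rather than recomputing key placements, is what makes the rebalance incremental and restartable.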


Getting started with Couchbase Server

Couchbase Server is freely available in both binary and source form. Downloading, installing and configuring Couchbase Server takes less than five minutes. This paper has outlined the internal workings of Couchbase Server, but experiencing the simple, fast and elastic properties of Couchbase Server first-hand is the only way to really get a feel for the technology and how it may be useful in your application development environment. To download Couchbase Server, go to http://www.couchbase.com/downloads.

Glossary

• Bucket: A Bucket is a Couchbase Server data partition with its own key space. Each Bucket therefore has its own vBucket map. Couchbase Server allows multiple buckets to exist on a single Couchbase Server cluster – providing secure multi-tenancy and separation of data sets. Each bucket can have its own properties and settings (e.g., replication count, blocking behavior, and cache and storage quotas). In most cases, a bucket can be thought of as a “virtual Couchbase Server cluster.”

• Cache: The caching layer in Couchbase Server is derived from the Memcached open source project. The Couchbase Server Cache transparently provides in-memory caching services to any application interacting with Couchbase Server.

• Couchbase Server: A distributed database management system optimized for storing data behind interactive web applications.

• Couchbase Server Cluster Manager: A Couchbase Server module (written in Erlang) which provides a number of cluster-wide services, such as consensus formation, configuration management/distribution, and rebalance orchestration. To maximize performance, the cluster manager is never in the data flow path for any data operation (including replication and rebalancing streams). It is responsible only for configuring and coordinating the interaction between servers in a Couchbase Server cluster.

• Current vBucket Map: A table identifying the active Master and Replica Servers for each vBucket. During a rebalance operation, this map is updated by the Rebalance Orchestrator as individual vBucket migrations complete.


• Failover: If a server in a Couchbase Server cluster fails, the Failover mechanism can rapidly (< 100 msec) transfer Master Server status for all vBuckets previously “mastered” on that server to servers which have replica copies of those vBuckets. This operation leaves the cluster with one less replica copy of any data object which was stored (either in master or replica form) on the “failed-over” server. Failover ensures all objects stored in Couchbase Server are quickly available to an application for reading and writing, following failure of a server (because only one server can service reads and writes for any given vBucket, at any point in time). After initiating a failover, a Couchbase Server cluster administrator will typically repair, add or remove servers, then rebalance the cluster to restore a full set of replica copies.

• Master Migration Tap Stream: A special type of tap stream that copies all data objects in a given vBucket to the server which requested the tap stream. The special behavior happens at the end of the iteration process and provides a rapid, but orderly, transfer of Master Server status while maintaining data consistency.

• Master Server: Each vBucket has one active Master Server at any point in time. The Master Server for a given vBucket is the only server that will accept reads and writes for keys that map to that vBucket.

• Migrate: To transfer Master Server or Replica Server status for a given vBucket (along with all the data associated with that vBucket) from one server to another.

• Migration Command: A request which can be sent to a Couchbase Server cluster member by the Rebalance Orchestrator, asking for specific actions in support of the rebalancing process. These commands can be used to establish Migration Tap Streams, to purge data associated with a given vBucket, or to order a Server to cease serving as a Master or Replica Server for a given vBucket.

• Node: A single server in a Couchbase Server cluster.

• Node vBucket Master List: Each server in a Couchbase Server cluster has a Node vBucket Master List, identifying the vBuckets for which it is currently acting as Master Server.

• Pending Set: The list of all servers which are to be added to, or removed from, the Couchbase Server cluster during the next rebalance operation. When administrators add servers to a Couchbase Server cluster, whether through the graphical or a programmatic interface, those new servers enter a “pending add” state; when administrators remove servers from the Couchbase Server cluster, they enter a “pending removal” state. On the next Rebalance operation, the Rebalance Orchestrator places vBucket data on the “pending add” servers while removing it from the “pending removal” servers.


• Persistence: Storing data in a technology that enables retrieval even in the case of complete data center power loss. Couchbase Server has a multi-tier persistence model – data can be stored in SSD devices or on spinning disk media, with auto-migration of the data to the lowest-latency device available, based on data access patterns. Couchbase Server uses an LRU model which migrates data based on temporal access patterns.

• Rebalance: The systematic process of redistributing data within a live cluster. In Couchbase Server, the Rebalance Orchestrator rebalances by selecting and then migrating certain vBuckets, including the data objects belonging to each vBucket, from old (Current) to new (Target) servers. Rebalancing moves both Master and Replica copies of objects. The intent is to spread the data, and in particular I/O requests, evenly across the cluster. Rebalancing is typically done following the removal or addition of servers to a cluster. A Couchbase Server rebalance operation can be stopped and restarted at any time.

• Rebalance Calculator: Logic in the Couchbase Server Cluster Manager subsystem which calculates a Target vBucket Map. It takes as input the Current vBucket Map and the Pending Set, calculates the optimal placement of vBuckets, and returns the Target vBucket Map.

• Rebalance Orchestrator: Logic within the Couchbase Server Cluster Manager (executed on the Node with the global singleton) which coordinates a Rebalancing process (primarily by issuing Migration Commands to individual servers in the cluster).

• Replica Migration Tap Stream: A special type of tap stream that copies all data objects in a given vBucket to the server which requested the tap stream. The special behavior happens at the end of the iteration process and provides a rapid, but orderly, transfer of Replica Server status while maintaining data consistency.

• Replica Server: Couchbase Server replicates object data (the number of Replicants is user-defined) to Replica Servers. A Replica Server can rapidly (within 100 msec) become the Master Server for a given key in case of original Master Server failure.

• Replicant: A replica (backup) copy of an object stored in Couchbase Server.

• Replication: The process of storing multiple copies of an object across different servers, facilitating high availability of any object stored in the cluster. Specifically, Replication supports rapid accessibility of an object via the Couchbase Server Failover mechanism. Couchbase Server supports both Master-Slave and Peer-to-Peer replication topologies.


• Tap Stream: A publish-and-subscribe mechanism allowing a subscribing server to request copies of all data objects associated with one or more vBuckets on the publishing server. There are a number of Tap Stream types allowing only subsets of the data to be streamed, based on time and other selection filters. Tap Streams are a core building block of Couchbase Server replication and dynamic cluster rebalancing.

• Target vBucket Map: The vBucket Map that represents the state a cluster will be in once a currently running rebalance operation completes. The Rebalance Orchestrator compares the target and current maps to determine which Migration Tap Streams to create and supervise. The rebalance operation is complete when the Current and Target vBucket Maps are identical.

• vBucket: A vBucket is the “owner” of a subset of the key space of a Couchbase Server cluster. Every key is “contained within” a vBucket. A mapping function is used to calculate the vBucket in which a given key belongs. In Couchbase Server, the mapping function is a hash function that takes a key as input and outputs a vBucket identifier.

• vBucket Map: A table identifying the servers acting as Master and Replica Servers for each vBucket. A server appearing in this table can be (and usually is) responsible for multiple vBuckets. The number of vBuckets in a Couchbase Server cluster must exceed the number of physical servers that may eventually be present in the cluster. In Couchbase Server, the vBucket map supports up to 1024 servers per cluster. See also Current vBucket Map and Target vBucket Map.
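Tying the Failover and vBucket Map entries together, the promotion of replicas can be sketched as follows; the per-vBucket [master, replica, ...] chain layout is an illustrative assumption, not the exact internal representation:

```python
def failover(vbucket_map, failed_server):
    # Drop the failed server from every vBucket's chain; the head of
    # each surviving chain is the (possibly newly promoted) master.
    new_map = {}
    for vb, chain in vbucket_map.items():
        new_map[vb] = [s for s in chain if s != failed_server]
    return new_map

# Three vBuckets spread over three servers, one replica each.
vb_map = {0: ["s0", "s1"], 1: ["s1", "s0"], 2: ["s2", "s0"]}
after = failover(vb_map, "s1")
</antml>```

After failing over "s1", vBucket 1 is mastered by its former replica "s0", and every vBucket that involved "s1" is left with one fewer copy, which a subsequent rebalance restores.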