42
Using Rust to Build a Distributed Transactional Key-Value Database LiuTang | [email protected]

Using Rust to Build a Distributed Transactional Key-Value ... · a Distributed Transactional Key-Value Database LiuTang | [email protected]. About me Chief Architect at PingCAP ... Compaction

  • Upload
    others

  • View
    12

  • Download
    0

Embed Size (px)

Citation preview

Using Rust to Build a Distributed Transactional Key-Value Database

LiuTang | [email protected]

About me● Chief Architect at PingCAP● TiDB and TiKV● Open source projects

○ LedisDB○ go-mysql○ go-mysql-elasticsearch○ rust-prometheus○ ...

Agenda● Introduction● Hierarchy

○ Storage○ Raft○ Transaction○ RPC Framework○ Monitor○ Test

● Combine them all

When we want to build a distributed transactional key-value database...

Performance

ACID

Scalability

Stability

Consistency

HA Others…

A High Building,A Low Foundation

Language

Let’s start from scratch!!!

Memory Table

ImmutableMemory Table

SST

SST SST …...

SST SST

WAL

Flush

SST

…... SST

CompactionMemory

Disk

Level 0

Level 1

Level 2

Info Log

Manifest

Current

RocksDB

https://github.com/pingcap/rust-rocksdb

Raft

a = 1 b = 2

a = 1

b = 2

State Machine

Log

RaftModule

Client

a = 1 b = 2

a = 1

b = 2

State Machine

Log

RaftModule

a = 1 b = 2

a = 1

b = 2

State Machine

Log

RaftModule

Multi-Raft

Region 1

Region 2

Region 3

Region 1

Region 2

Region 3

Region 1

Region 2

Region 3

Raft Group

Raft Group

Raft Group

A - B

B - C

C - D

Key Space

Multi-Raft - Scalability

Region 1

Region 2

Region 1

Region 2

Region 1

Region 2

A B C D

Multi-Raft - Scalability

Raft ConfChange - AddNode

Region 1

Region 2

Region 1

Region 2

Region 1

Region 2

A B C D

Region 2

Multi-Raft - Scalability

Raft ConfChange - RemoveNode

Region 1

Region 2

Region 1

Region 2

Region 1

A B C D

Region 2

https://github.com/pingcap/raft-rs

Transaction

Transaction

let mut txn = store.begin()let value1 = txn.get(region1_key)let value2 = txn.get(region2_key)// do something with valuetxn.set(region1_key, new_value1)txn.set(region2_key, new_value2)txn.commit()// or txn.rollback()

How to keep consistency crossing multi-Raft Groups?

Transaction1. Inspired by Google Percolator2. Optimized Two Phase Commit (2 PC)3. Multiversion Concurrency Control (MVCC)4. Snapshot Isolation5. Optimistic Transaction

gRPC● Mode

○ Unary○ Client streaming○ Server streaming ○ Duplex streaming

● Using Futures to wrap the asynchronous C gRPC API

let f = unary(service, method, request);let resp = f.wait();

https://github.com/pingcap/grpc-rs

Prometheus● Type

○ Counter○ Gauge○ Histogram

lazy_static! {

static ref HTTP_COUNTER: Counter = register_counter!(

"http_request_total",

"Total number of HTTP request."

).unwrap();

}

HTTP_COUNTER.inc();

https://github.com/pingcap/rust-prometheus

Testing

Testing - Failure Injection

// Ingest a failurefn function_foo() { fail_point!("foo");}

// Run and Trigger the failureFAILPOINTS=foo=panic cargo run

https://github.com/pingcap/fail-rs

Architecture

Architecture

RocksDB

Raft

MVCC

Txn API

RocksDB

Raft

MVCC

Txn API

RocksDB

Raft

MVCC

Txn API

Client

gRPC

Prometheus

https://github.com/pingcap/tikv

Beyond TiKV

A Distributed Relational Database

TiDB

Applications

MySQL Drivers(e.g. JDBC)

TiDB

TiKV

MySQL Protocol

RPC

A Distributed Analytical Database

TiSpark

Worker

Spark Driver

Job

Spark Cluster

TiKV

RPC

Worker Worker

Hybrid Transactional/Analytical Processing Database

TiDB

TiDB

Worker

Spark Driver

TiKV Cluster (Storage)

Meta data

TiKV TiKV

TiKV

Data location

Job

TiSpark

API

TiKV

TiDB

TSO/Data location

Worker

Worker

Spark Cluster

TiDB Cluster

TiDB

... ......

API

PD PD

PD

PD Cluster

TiKV TiKVTiDB

Thank you!https://github.com/pingcap/tidbhttps://github.com/pingcap/tikv

We are hiring…@China @Silicon Valley @Home