Zing Database – Distributed Key-Value Database
Nguyễn Quang Nam – Zing Web-Technical Team
Content
1. Why
2. Introduction
3. Overview architecture
4. Single Server/Storage
5. Distribution
Introduction
Some statistics:
- Feeds: 1.6 B, 700 GB on disk across 4 DB instances, 8 caching servers, 136 GB of memory cache in use.
- User Profiles: 44.5 M registered accounts, 2 database instances, 30 GB of memory cache.
- Comments: 350 M, 50 GB on disk across 2 DB instances, 20 GB of memory cache.
Why
Access time
L1 cache reference                             0.5 ns
Branch mispredict                                5 ns
L2 cache reference                               7 ns
Mutex lock/unlock                              100 ns
Main memory reference                          100 ns
Compress 1K bytes with Zippy                10,000 ns
Send 2K bytes over 1 Gbps network           20,000 ns
Read 1 MB sequentially from memory         250,000 ns
Round trip within same datacenter          500,000 ns
Disk seek                               10,000,000 ns
Read 1 MB sequentially from network     10,000,000 ns
Read 1 MB sequentially from disk        30,000,000 ns
Send packet CA->Netherlands->CA        150,000,000 ns
by Jeff Dean (http://labs.google.com/people/jeff)
Standards & Real Requirements
- Time to load a page < 200 ms
- Read rate ~12K ops/sec
- Write rate ~8K ops/sec
- Caching service/database recovery time < 5 mins
Existing solutions
- RDBMS (MySQL, MSSQL): writes are too slow; reads are acceptable with a small DB, very poor with a huge DB
- Cassandra (by Facebook): difficult to operate/maintain, and performance is not good enough
- HBase/Hadoop: we use this for our log system
- MongoDB, Membase, Tokyo Tyrant, …: OK! We use these in several cases, but none suits all of them
Overview architecture
[Architecture diagram: requests arrive over TCP at the ZNonblockingServer (transport layer), pass through the Model (business) layer, and reach the storage layer. The storage layer combines memory storage (LRU ICache, RW), persistent storage (Commitlog Storage (W) and ZiDB Storage (RW) over a local database on disk), and remote storage (RW, backed by a remote system).]

Model (business) layer:
- Load configuration
- Create & manage backend storages
- Implement business rules
Single Server/Storage
ZNonblockingServer
- Based on TNonblockingServer (Apache Thrift)
- 185K reqs/sec (the original TNonblockingServer reaches just 45K reqs/sec)
- Serializes/deserializes data
- Prevents server overload
- Data is not secured while transferring
- Protects the service from invalid requests
ICache
- Least Recently Used / time-based expiration strategy
- zlru_table<key_type, value_type>: hash table data structure
- Custom malloc/free functions instead of the standard glibc malloc/free, to reduce memory fragmentation
- Supports dirty-item marking => enables lazy DB flushes
ZiDB
- Separated into a DataFile & an IndexFile
- 1 seek for a read, 1-2 seeks for a write
- The IndexFile (hash structure) is loaded into memory as a mapped file (shared memory) to reduce system calls
- Write-ahead log to avoid data loss
- Data magic-padding
- Checksums & checkpoints for data repair
- DB partitioning for easier maintenance
Distribution
Key requirements:
- Scalability
- Load balance
- Availability
- Consistency

2 models:
- Centralized: 1 addressing server & multiple storage servers => bottleneck & single point of failure
- Peer-to-peer: each server includes an addressing module & storage

2 types of routing:
- Client routing: each client does the addressing itself and queries the data
- Server routing: the addressing is done at the server
Operation Flows
[Operation-flow diagram: (1) the Business Logic Server requests key locations from the Addressing Server (DHT); (2) the Addressing Server returns the key locations; (3) the Business Logic Server issues Get & Set operations against the storage layer (Storage Node 1..N, each with an ICache and a ZiDB storage module); (4) the operations return.]

* In the peer-to-peer model, the addressing module is moved into each storage node
Addressing:
- Provides the key locations of resources
- Basically a Distributed Hash Table, using consistent hashing
- Hashing: Jenkins, Murmur, or any algorithm that satisfies two conditions:
  - Uniform distribution of generated keys in the key space
  - Consistency
  (MD5 and SHA are bad choices because of their performance)
Addressing - Node location:
Each node is assigned a contiguous range of IDs (hashed keys)
Addressing - Node location: Golden ratio principle (a/b = 2b/a)
- Initial ratio = 1.618
- Max ratio ~ 2.6
- Easy to implement
- Easy for routing from the client

[Ring diagram: 9 ranges placed around the ring; Server 1 holds ranges 1, 2, 3; Server 2 holds 4, 5, 6, 7; Server 3 holds 8, 9.]
Addressing - Node location: Virtual nodes
- Each real server has multiple virtual nodes on the ring
- More virtual nodes => better load balance
- Harder to maintain the table of nodes
[Ring diagram: virtual nodes of servers A, B, and C interleaved around the ring.]

Addressing – Multi-layer rings
- Store the change history of the system
- Provide availability/reconfigurability
- A node can be placed on a ring manually
* Write: data is placed on the highest ring
* Read: data is looked up on the highest ring, then on lower rings if not found
Replication & Backup
- Each node has one primary range of IDs and some secondary ranges of IDs
- Each real node needs a backup instance that can take over in case it goes down

* Data is queried from the primary node first, then from the secondary nodes
Configuration: finding the best parameters to configure the DB, or choosing the most suitable DB type:
- How many reads/writes per second?
- Deviation of data length: are data items roughly the same length, or very different from each other?
- Are there updates/deletions of data?
- How important is the data: is loss acceptable or not?
- Can old data be recycled?
Q & A
Contact:
Nguyễn Quang Nam
[email protected]
http://me.zing.vn/nam.nq