WalB: A Fast and Low Latency Backup System for Block Devices
Cybozu Meetup #8 SRE WalB
Kota Uchida
September 25, 2017
About me
▌Kota Uchida
▌SRE team at Cybozu, Inc.
▌A WalB developer
About Cybozu
▌A large cloud service vendor in Japan.
▌Largest market share in the field of collaborative software.
▌We serve web applications on our own cloud platform.
kintone: a low-code business app platform
and more
Customer companies: 20,000+
Accesses per day: 210 million
Write I/O per day: 24.5 TiB
Service Level Objective
▌24/7 nonstop service
▌99.99% availability (4 min / month)
▌Daily backup (retention period is 14 days)
▌Disaster recovery: copy data to a remote site once a day
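A quick check of the availability target, assuming a 30-day month (43,200 minutes):

$$ (1 - 0.9999) \times 30 \times 24 \times 60\ \text{min} = 4.32\ \text{min} $$

This rounds to the 4 minutes of allowed downtime per month stated above.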
Architecture of our platform
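[Figure: architecture of our platform: L7LB, Application Server, Database Server, Blob Server, Mapping Info, Storage Servers using dm-snap (RAID 1), Backup Server, and Remote Site, with diffs sent from storage to backup. The storage and backup part is marked "The scope of this talk".]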
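Snapshot Management with dm-snap
[Figure: writing A' and B' to the latest image. (1) CoW: the old blocks A and B are copied from the original volume area to the snapshot area. (2) Write: A' and B' are written to the original volume area (blocks 0 to 4). Logical structure: latest image and snapshot image; physical structure: original volume area and snapshot area.]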
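The copy-on-write step in the figure can be sketched as a toy Python model. This is only an illustration under our own assumptions: blocks are list entries, a dict stands in for the snapshot area, and the class name `CowSnapshotVolume` is hypothetical. It is not dm-snap's implementation, which works on chunks and keeps its exception store on disk.

```python
class CowSnapshotVolume:
    """Toy model of a dm-snap style snapshot (hypothetical, for illustration)."""

    def __init__(self, blocks):
        self.origin = list(blocks)  # original volume area: always holds the latest image
        self.snap_area = {}         # snapshot area: block index -> data before the snapshot

    def write(self, index, data):
        # (1) CoW: save the old block the first time it is overwritten after the snapshot
        if index not in self.snap_area:
            self.snap_area[index] = self.origin[index]
        # (2) Write: update the original volume area in place
        self.origin[index] = data

    def read_latest(self, index):
        return self.origin[index]

    def read_snapshot(self, index):
        # Blocks never overwritten since the snapshot are still valid in the origin
        return self.snap_area.get(index, self.origin[index])


vol = CowSnapshotVolume(["A", "B", "C", "D", "E"])
vol.write(0, "A'")
vol.write(1, "B'")
print([vol.read_latest(i) for i in range(5)])    # ["A'", "B'", 'C', 'D', 'E']
print([vol.read_snapshot(i) for i in range(5)])  # ['A', 'B', 'C', 'D', 'E']
```

In this model, the first write to each block after the snapshot costs an extra read and an extra write, which is the CoW overhead the latency results later attribute to dm-snap.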
Backup using dm-snap
(1) Full-scan an old snapshot (Snapshot0)
(2) Full-scan a new snapshot (Snapshot1)
(3) Generate a diff image by comparing the two snapshots
[Figure: logical structure of Snapshot0 (A, B) and Snapshot1 (A', B'); the diff image holds A' and B'.]
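The three steps above can be sketched in Python, assuming each snapshot image is a list of blocks. The helper name `full_scan_diff` is hypothetical; the point is that both images are read completely, so the cost grows with the volume size rather than with the amount of changed data.

```python
def full_scan_diff(old_snapshot, new_snapshot):
    """Compare two full snapshot images block by block (hypothetical helper).

    Returns {block index: new data} for every block that differs.
    """
    if len(old_snapshot) != len(new_snapshot):
        raise ValueError("snapshots must have the same size")
    diff = {}
    # (1) and (2): every block of both snapshots is read, changed or not
    for index, (old, new) in enumerate(zip(old_snapshot, new_snapshot)):
        # (3): keep only the blocks whose contents differ
        if old != new:
            diff[index] = new
    return diff


snapshot0 = ["A", "B", "C", "D", "E"]
snapshot1 = ["A'", "B'", "C", "D", "E"]
print(full_scan_diff(snapshot0, snapshot1))  # {0: "A'", 1: "B'"}
```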
Full-scan at night
[Figure: backup processing time by hour of day (o'clock); the full scan runs at night rather than in the daytime.]
UX degradation during a full scan
[Figure: graph with the full-scanning period marked.]
We have no more “nights”
▌Until now:
A full scan was allowed only when the access rate was low, i.e., at night.
▌From now on:
We have to handle accesses from multiple time zones.
▌We must be able to back up at any time without UX degradation.
New Solution
▌We need a new solution with:
No IO spikes
Short backup time
▌We compared dm-thin with WalB
What is dm-thin?
▌dm-thin provides thin-provisioning volume management to
share the same data among volumes
reduce disk usage by using snapshots
▌In the mainline Linux kernel
Snapshot Management with dm-thin
[Figure, in three steps. (a) The latest tree maps the latest image to physical block A. (b) Taking a snapshot adds a snapshot tree that shares block A with the latest tree. (c) On Write A': (1) CoW, then (2) A' is written to a new physical block and the latest tree is updated, while the snapshot tree still points to A. Panels: Logical Structure and Physical Structure.]
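The sharing in the figure can be sketched as a toy Python model. All names here (`ThinPool`, `alloc`, `snapshot`, `write`) are hypothetical. The model assumes every write covers a whole block, so breaking sharing needs no data copy; dm-thin copies the old block first when a write covers only part of it, and it keeps mappings in copy-on-write B-trees that share nodes rather than copying a whole dict as done here.

```python
class ThinPool:
    """Toy model of dm-thin style block sharing (hypothetical names)."""

    def __init__(self):
        self.data = []  # physical blocks in the pool
        self.refs = []  # number of mappings pointing at each physical block

    def alloc(self, payload):
        self.data.append(payload)
        self.refs.append(1)
        return len(self.data) - 1

    def snapshot(self, mapping):
        # A snapshot shares every physical block with its origin
        for phys in mapping.values():
            self.refs[phys] += 1
        return dict(mapping)

    def write(self, mapping, logical, payload):
        phys = mapping.get(logical)
        if phys is not None and self.refs[phys] == 1:
            self.data[phys] = payload  # block not shared: overwrite in place
            return
        if phys is not None:
            self.refs[phys] -= 1  # break sharing with the snapshot
        # Write to a newly allocated block and update the latest mapping
        mapping[logical] = self.alloc(payload)


pool = ThinPool()
latest = {0: pool.alloc("A")}
snap = pool.snapshot(latest)
pool.write(latest, 0, "A'")
print(pool.data[latest[0]], pool.data[snap[0]])  # A' A
```

Unlike the dm-snap model, the old block A stays where it is and the new data goes to a fresh location; only the latest tree changes.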
Backup using dm-thin
Generate a diff image using dm-thin metadata
[Figure: Snapshot0 maps to A and B, Snapshot1 maps to A' and B'; the physical structure holds A, B, A', and B'. Panels: Logical Structure and Physical Structure.]
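Generating a diff from metadata can be sketched by comparing two block mappings (logical block to physical block) without reading any data; only the blocks found this way then have to be read. The helper `metadata_diff` and the dict representation are assumptions for illustration and do not call any dm-thin tool.

```python
def metadata_diff(old_map, new_map):
    """Return the logical blocks whose mapping differs between two snapshots.

    old_map, new_map: {logical block: physical block} (hypothetical layout).
    A block mapped in only one of the snapshots also counts as changed.
    """
    changed = []
    for logical in sorted(old_map.keys() | new_map.keys()):
        if old_map.get(logical) != new_map.get(logical):
            changed.append(logical)
    return changed


# Physical blocks 0 and 1 hold A and B; the writes A' and B' went to blocks 2 and 3.
snapshot0 = {0: 0, 1: 1, 2: 4}
snapshot1 = {0: 2, 1: 3, 2: 4}
print(metadata_diff(snapshot0, snapshot1))  # [0, 1]
```

The comparison itself is cheap, but the changed blocks may be scattered across the pool, so reading them back can still be slow.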
What is WalB?
▌A real-time and incremental backup system
developed at Cybozu Labs
▌Can back up block devices without IO spikes
[Figure: dm-snap (full scanning) vs. WalB (no spikes).]
Special Block Devices for WalB
[Figure: any application (file system, DBMS, etc.) reads from and writes to a WalB device, which is backed by a data device (linear mapped) and a log device (ring buffer).]
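The device layout above can be sketched as a toy Python class. The class name, the `seq` counter, and the record format `(seq, address, data)` are hypothetical and not WalB's code. Writes go to the log device first and then to the data device; reads are served from the data device. Here `deque(maxlen=...)` silently drops the oldest record when full, which a real log device must not do for records that have not been extracted yet; this toy does not model that.

```python
from collections import deque


class ToyWalbDevice:
    """Toy model of a WalB device (hypothetical names, not WalB's code)."""

    def __init__(self, num_blocks, log_capacity):
        self.data_device = [None] * num_blocks         # linear mapped
        self.log_device = deque(maxlen=log_capacity)   # ring buffer of (seq, address, data)
        self.next_seq = 0                              # sequence number of the next log record

    def write(self, address, data):
        # Log first, then apply the write to the data device
        self.log_device.append((self.next_seq, address, data))
        self.next_seq += 1
        self.data_device[address] = data

    def read(self, address):
        # Reads never touch the log device
        return self.data_device[address]


dev = ToyWalbDevice(num_blocks=5, log_capacity=8)
dev.write(1, "A'")
dev.write(4, "B'")
print(dev.read(1), list(dev.log_device))  # A' [(0, 1, "A'"), (1, 4, "B'")]
```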
Write IO Logging and Backup with WalB
Scan the log device and generate a diff image
[Figure, in three steps: the data device (blocks 0 to 4) holds A and B. Writes A' and B' update the data device and are also appended to the log device as a time series of write I/Os, each entry with its block address. The log device is scanned to generate a diff image containing A' and B'.]
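Turning scanned log records into a diff image can be sketched as follows, assuming records of the form `(seq, address, data)`; the helper name `logs_to_diff` is hypothetical. Replaying records in sequence order means a later write to the same address overrides an earlier one, and the work is proportional to the amount of data written, not to the volume size.

```python
def logs_to_diff(log_records):
    """Build a diff image {address: data} from write I/O log records.

    log_records: iterable of (seq, address, data), in any order (hypothetical format).
    """
    diff = {}
    for _seq, address, data in sorted(log_records, key=lambda record: record[0]):
        diff[address] = data  # the latest write to each address wins
    return diff


records = [(0, 1, "A'"), (1, 4, "B'"), (2, 1, "A''")]
print(logs_to_diff(records))  # {1: "A''", 4: "B'"}
```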
Performance test
▌Compared dm-snap, dm-thin, and WalB
▌Executed a workload during a backup
The workload and the backup affect each other
▌Measured the following metrics:
Latencies of the workload
Backup time
Environment & Settings
▌Test environment:
CPU: 2.40 GHz x 12 cores
MEM: 192 GiB
HDD: 4 TB HDD, RAID 6 (8D2P)
NIC: 10 Gbps x 2
Kernel: 4.11 (latest upstream)
▌Test settings:
100 GiB volumes
Workload: 4 KiB random writes over a 5 GiB range
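The talk does not say which tool generated the workload, so the following is only a hypothetical stand-in showing its shape: 4 KiB writes at random 4 KiB-aligned offsets inside a fixed range of a larger target. The function name and the use of buffered I/O (no `O_DIRECT`) are our assumptions; `os.pwrite` is available on Unix.

```python
import os
import random

BLOCK_SIZE = 4096  # 4 KiB


def random_write_workload(path, range_bytes, num_writes, seed=0):
    """Issue 4 KiB writes at random block-aligned offsets in [0, range_bytes).

    Hypothetical benchmark sketch; returns the offsets that were written.
    """
    rng = random.Random(seed)
    num_blocks = range_bytes // BLOCK_SIZE
    buf = os.urandom(BLOCK_SIZE)
    offsets = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        for _ in range(num_writes):
            offset = rng.randrange(num_blocks) * BLOCK_SIZE
            os.pwrite(fd, buf, offset)
            offsets.append(offset)
        os.fsync(fd)  # flush so the writes actually reach the device
    finally:
        os.close(fd)
    return offsets
```

For the settings above, `range_bytes` would be `5 * 1024**3` on a 100 GiB volume, leaving 95 GiB untouched; point `path` at a scratch file when trying it out.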
Measuring the Backup Time (dm-snap, dm-thin)
▌dm-snap: take a snapshot and scan the full image
▌dm-thin: get the structure of the snapshot trees, find the modified blocks, and read those blocks
[Figure: 4 KiB random writes hit a 5 GiB range; the remaining 95 GiB is unchanged. dm-snap scans the full image; dm-thin scans only the changed chunks (tree traversal).]
Measuring the Backup Time (WalB)
▌WalB: scan logs from a log device and send them to a backup server continuously
[Figure: 4 KiB random writes hit a 5 GiB range (95 GiB unchanged). The WalB device records write IO logs on the log device; WalB scans the logs and sends them over the network to the backup server, which stores them as diffs.]
Write I/O latency
[Figure: write I/O latency of the workload with dm-thin, dm-snap, WalB, and no-backup.]
dm-thin: IO spikes due to CoW, worse than dm-snap!
dm-snap: large due to CoW
WalB: small overhead
Backup time
[Figure: dm-snap 1146, dm-thin 2260 (slower than dm-snap), WalB 1.2 (so fast!).]
Conclusion
▌dm-snap & dm-thin
High I/O latency during a backup
Long backup time
▌WalB
Stable and low I/O latency (no spikes)
Short backup time
WalB satisfies our requirements for production use.
Try WalB!
▌Project page
https://walb-linux.github.io/
▌Tutorial
https://github.com/walb-linux/walb-tools/tree/master/misc/vagrant/
Vagrantfile for Ubuntu 16.04 and CentOS 7
Incremental backup
▌Daily backup (retention period is 14 days)
▌A WalB worker daemon selects diff files older than 14 days and applies them to a base image.
[Figure: on the backup host and the remote host, each volume has a base image followed by diff files for 14 days; the expired diffs are applied to the base every day.]
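The daily merge can be sketched in Python, assuming the base image is a list of blocks and each diff is a `(backup_date, {block index: data})` pair sorted by date. The function name and the data layout are hypothetical; WalB's worker daemon operates on diff files on disk.

```python
from datetime import date, timedelta

RETENTION_DAYS = 14


def apply_old_diffs(base, diffs, today, retention_days=RETENTION_DAYS):
    """Merge diffs older than the retention period into the base image.

    base:  list of blocks, updated in place
    diffs: list of (backup_date, {block index: data}) sorted by date
    Returns the diffs that are still within the retention period.
    """
    cutoff = today - timedelta(days=retention_days)
    kept = []
    for backup_date, diff in diffs:
        if backup_date < cutoff:
            # Diffs are sorted, so expired ones are applied oldest first
            for index, data in diff.items():
                base[index] = data
        else:
            kept.append((backup_date, diff))
    return kept


base = ["A", "B", "C"]
diffs = [(date(2017, 9, 1), {0: "A'"}), (date(2017, 9, 20), {1: "B'"})]
kept = apply_old_diffs(base, diffs, today=date(2017, 9, 25))
print(base, len(kept))  # ["A'", 'B', 'C'] 1
```

Merging daily keeps the number of diffs bounded by the retention period while the base image moves forward one day at a time.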
Restoring a volume
▌To restore the latest state of a volume:
take a snapshot of a base image, and
apply all diff files to it.
[Figure: on the remote host, a writable snapshot Base' is taken from the base image, and all diffs are applied to it.]
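The restore procedure can be sketched in Python, assuming a list stands in for the base image and each diff is a `{block index: data}` dict, oldest first. Copying the list plays the role of the writable snapshot Base', so the base image itself stays untouched; the function name is hypothetical.

```python
def restore_latest(base, diffs):
    """Restore the latest image from a base image and all diffs (hypothetical sketch)."""
    image = list(base)  # writable snapshot Base' of the base image
    for diff in diffs:  # apply all diffs, oldest first
        for index, data in diff.items():
            image[index] = data
    return image


base = ["A", "B", "C"]
diffs = [{0: "A'"}, {1: "B'"}, {0: "A''"}]
print(restore_latest(base, diffs))  # ["A''", "B'", 'C']
print(base)                         # ['A', 'B', 'C']
```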
Make restoration faster 1/2
▌Fast restoration by preparing read-only snapshots for each day
[Figure: on the remote host, read-only dm-thin snapshots, one for each day (labeled 14 down to 1), are kept alongside the base image and the diffs.]
Make restoration faster 2/2
▌Apply some diffs to the appropriate snapshot.
▌At most 24 hours of diffs need to be applied.
Faster!
[Figure: the remaining diffs are applied to the closest daily snapshot instead of to the base image.]
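Picking the appropriate snapshot can be sketched in Python. The assumptions are ours: times are plain numbers (for example hours), each daily snapshot already includes every diff up to its own time, and the function name is hypothetical. With one snapshot per day, only diffs from the last 24 hours before the target are replayed.

```python
import bisect


def restore_from_daily_snapshots(snapshots, diffs, target_time):
    """Restore the state at target_time from the newest snapshot at or before it.

    snapshots: list of (time, image) sorted by time; images are treated as read-only
    diffs:     list of (time, {block index: data}) sorted by time
    """
    times = [t for t, _ in snapshots]
    pos = bisect.bisect_right(times, target_time) - 1
    if pos < 0:
        raise ValueError("no snapshot at or before target_time")
    snap_time, snap_image = snapshots[pos]
    image = list(snap_image)  # writable copy of the read-only snapshot
    for diff_time, diff in diffs:
        # Only diffs newer than the snapshot and not newer than the target
        if snap_time < diff_time <= target_time:
            for index, data in diff.items():
                image[index] = data
    return image


snapshots = [(0, ["A", "B"]), (24, ["A'", "B"])]
diffs = [(12, {0: "A'"}), (30, {1: "B'"}), (40, {0: "A''"})]
print(restore_from_daily_snapshots(snapshots, diffs, 35))  # ["A'", "B'"]
```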
Worldline: restoring a whole environment
▌"Worldline" means a parallel world.
▌We back up configurations in addition to user data.
Configurations:
definitions for each customer (ID, FQDN, Apps, …),
application version definition,
host definition, etc.
▌It is important to run application versions that are consistent with the user data backed up at that time.
▌A daily script takes a snapshot of the whole environment.
▌A weekly script restores the latest backup, so we can use it for investigating failures or developing our services.
[Figure: user data (diffs) and a ConfigDB snapshot are backed up; both are restored to spare hosts as a Worldline (ConfigDB').]