SAVE BANDWIDTH (AND LEARN TO LOVE BLOBS)

Blob sync. Optimized updating of blobs on Azure


DESCRIPTION

Optimize how you update blobs in Azure Blob Storage: upload or download only the deltas instead of wasting bandwidth on the whole blob.


Page 1: Blob sync. Optimized updating of blobs on Azure

SAVE BANDWIDTH (AND LEARN TO LOVE BLOBS)


CLOUD STORAGE 101

• DURABLE

• HIGHLY AVAILABLE

• ACCESS ANYWHERE (WITH CREDENTIALS)

• SCALABLE


CLOUD STORAGE 101

CHEAP!!


CLOUD STORAGE 101: BLOB BASICS

• BUILD A BLOB FROM BLOCKS OF DATA

• REPLACE BLOCKS IN AN EXISTING BLOB
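A toy in-memory model of the block idea, offered as a sketch only: `ToyBlockBlob`, `stage_block` and `commit` are hypothetical names, not the Azure SDK. On Azure the matching REST operations are Put Block (upload a block that is not yet part of the blob) and Put Block List (commit an ordered list of block IDs as the blob's content). What it shows is that replacing one block only sends that block's bytes; the untouched blocks are referenced by ID.

```python
class ToyBlockBlob:
    """Hypothetical in-memory model of a block blob (not the Azure SDK)."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}   # block ID -> data (staged or committed)
        self.committed: list[str] = []       # ordered block IDs that form the blob
        self.bytes_uploaded = 0              # bandwidth spent sending block data

    def stage_block(self, block_id: str, data: bytes) -> None:
        # Analogous to Put Block: send the data without changing the blob yet.
        self.blocks[block_id] = data
        self.bytes_uploaded += len(data)

    def commit(self, block_ids: list[str]) -> None:
        # Analogous to Put Block List: the blob becomes these blocks, in this order.
        missing = [b for b in block_ids if b not in self.blocks]
        if missing:
            raise KeyError(f"unknown block IDs: {missing}")
        self.committed = list(block_ids)

    def read(self) -> bytes:
        return b"".join(self.blocks[b] for b in self.committed)


blob = ToyBlockBlob()
for block_id, data in [("A", b"aaaa"), ("B", b"bbbb"), ("C", b"cccc")]:
    blob.stage_block(block_id, data)
blob.commit(["A", "B", "C"])

# Replace only the middle block: upload one new block, reuse A and C by ID.
before = blob.bytes_uploaded
blob.stage_block("B2", b"BBBB")
blob.commit(["A", "B2", "C"])
print(blob.read())                    # b'aaaaBBBBcccc'
print(blob.bytes_uploaded - before)   # 4
```

The real service adds rules the toy ignores; for example, block IDs must be Base64-encoded strings of the same length within a blob.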


CLOUD STORAGE 101

UPLOAD ENTIRE BLOB AGAIN


WHY?


CLOUD STORAGE 101

TRY AGAIN


UPLOAD SINGLE BLOCK


BLOBSYNC AWESOMESAUCE

• DETECTS CHANGES

• DOES NOT NEED ORIGINAL FILE TO DETECT CHANGES

• UPLOADS/DOWNLOADS CHANGES ONLY

• A TRANSPARENT BLACK BOX: OPEN SOURCE, BUT YOU CAN TREAT IT AS A BLACK BOX
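A minimal sketch of how changes can be detected without keeping the original file, assuming fixed-size blocks: store one hash per block when the blob is uploaded, later hash the current file's blocks and compare against that stored list. `block_signatures` and `changed_blocks` are hypothetical helper names, and the 4-byte block size is only for illustration. This compares blocks at fixed offsets, which is the "theory" case; it breaks down once data is inserted or deleted.

```python
import hashlib

BLOCK_SIZE = 4  # illustrative only; real block sizes are far larger


def block_signatures(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """MD5 of each fixed-size block; this list is all we keep of the old version."""
    return [
        hashlib.md5(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(stored_sigs: list[str], current: bytes,
                   block_size: int = BLOCK_SIZE) -> list[int]:
    """Indices of current blocks whose hash differs from the stored one, or that are new.

    Blocks removed from the end of the file are not reported by this sketch.
    """
    current_sigs = block_signatures(current, block_size)
    return [
        i for i, sig in enumerate(current_sigs)
        if i >= len(stored_sigs) or stored_sigs[i] != sig
    ]


stored = block_signatures(b"aaaabbbbccccdddd")        # saved at upload time
print(changed_blocks(stored, b"aaaaXbbbccccdddd"))    # [1]
```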


THEORY VS REALITY

• THEORY

[Figure: blocks of the blob in Azure Blob Storage and of the file on the local machine, on an offset axis from 0 to 400]


THEORY…

• IT'S ALL GOOD IN THEORY


THEORY VS REALITY

• REALITY

[Figure: blocks of the blob in Azure Blob Storage and of the file on the local machine, on an offset axis from 0 to 400; the local file holds blocks A, B’, C, D]
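Why reality hurts, as a small self-contained sketch: with fixed-offset comparison, inserting a single byte near the start shifts every later block, so nearly every block looks changed even though almost all of the data is still there. The helper name `fixed_block_hashes` and the 4-byte block size are illustrative assumptions.

```python
import hashlib


def fixed_block_hashes(data: bytes, block_size: int = 4) -> list[str]:
    # Hash each block at offsets 0, block_size, 2 * block_size, ...
    return [hashlib.md5(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


old = b"AAAABBBBCCCCDDDD"
new = b"AAAAxBBBBCCCCDDDD"   # one byte inserted after block A

old_sigs = fixed_block_hashes(old)
new_sigs = fixed_block_hashes(new)
unchanged = [i for i, sig in enumerate(new_sigs)
             if i < len(old_sigs) and old_sigs[i] == sig]
print(unchanged)   # [0]: only block A still lines up
```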


FINDING COMMON GROUND

• HOW DO WE FIND MOVED BLOCKS?

• COMPUTE A HASH/SIGNATURE FOR EACH BLOCK

• SEARCH FOR EACH SIGNATURE THROUGHOUT THE WHOLE FILE
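The approach on this slide, taken literally: hash the target block once, then slide a window one byte at a time across the file and hash every window. `find_block_naive` is a hypothetical name, and MD5 is used only because the slides mention it. It finds moved blocks, but every offset costs a full hash of `len(block)` bytes.

```python
import hashlib


def find_block_naive(haystack: bytes, block: bytes) -> list[int]:
    """Return every offset where `block` occurs, hashing each window with MD5."""
    target = hashlib.md5(block).digest()
    n = len(block)
    matches = []
    for offset in range(len(haystack) - n + 1):
        # A full MD5 over n bytes at every offset: O(len(haystack) * n) work overall.
        if hashlib.md5(haystack[offset:offset + n]).digest() == target:
            matches.append(offset)
    return matches


print(find_block_naive(b"AAAAxBBBBCCCCDDDD", b"CCCC"))   # [9]
```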


THEORY VS REALITY

• SEARCH LOCAL

• E.G. SEARCH THE LOCAL FILE FOR BLOCK ‘C’

[Figure: the file on the local machine, on an offset axis from 0 to 400, holding blocks A, B’, C, D]


SUCCESS!

• CAN NOW FIND BLOCKS EVEN WHEN MOVED

• IF WE CAN FIND A BLOCK, WE CAN DECIDE WHETHER TO REUSE IT

• BUT…

• MD5/SHA ETC. ARE TOO SLOW TO COMPUTE AT EVERY OFFSET


• TOO SLOW? NO WAY!

• E.G.

• 100 MB FILE/BLOB

• BLOCK SIZE OF 100 KB

• > 104M HASH CALCULATIONS (ONE PER BYTE OFFSET), JUST TO FIND THAT ONE BLOCK
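The figure checks out if 100 MB and 100 K are read as binary units (MiB/KiB), which is our assumption: a sliding window has one candidate position per byte offset at which a whole block still fits, and each position means one MD5 over 100 KiB.

```python
FILE_SIZE = 100 * 1024 * 1024   # 100 MB, assuming binary megabytes
BLOCK_SIZE = 100 * 1024         # 100 K block, assuming binary kilobytes

# One window (and one MD5) per byte offset where a whole block still fits.
positions = FILE_SIZE - BLOCK_SIZE + 1
bytes_hashed = positions * BLOCK_SIZE

print(f"{positions:,} hash calculations")                   # 104,755,201 hash calculations
print(f"{bytes_hashed / 1e12:.1f} TB pushed through MD5")   # 10.7 TB pushed through MD5
```

So locating a single block this way means hashing roughly 10.7 TB of data.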


YOU HAVE TO ROLL WITH IT.

• ROLLING SIGNATURE

• EXTREMELY QUICK.

• ROLLING SIGNATURES GIVE FALSE POSITIVES, SO USE MD5/SHA AS A CONFIRMATION STEP
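Why the confirmation step is needed, using the plain additive signature implied by the update rule on the following slides (an assumption; BlobSync's actual rolling function may differ): any two windows containing the same bytes in a different order have the same sum, so a signature match is only a candidate until a strong hash agrees. `additive_sig` is a hypothetical name.

```python
import hashlib


def additive_sig(window: bytes) -> int:
    # Weak signature: just the sum of the byte values.
    return sum(window)


a, b = b"abcd", b"dcba"
print(additive_sig(a) == additive_sig(b))                    # True: false positive
print(hashlib.md5(a).digest() == hashlib.md5(b).digest())    # False: MD5 rejects it
```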


YOU HAVE TO ROLL WITH IT.

• SIG = FUNC( ARRAY[0] .. ARRAY[4] )

• CALCULATE THE SIG OF ARRAY[1] .. ARRAY[5] FROM THE OLD SIG

• NEW SIG = OLD SIG - ARRAY[0] + ARRAY[5]
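The update rule on this slide, written out for a general window size, assuming FUNC is a plain sum of the bytes (with that choice the rule holds exactly). `rolling_sums` is a hypothetical name.

```python
def rolling_sums(data: bytes, window: int) -> list[int]:
    """Signature (sum of bytes) of every window, each derived from the previous one."""
    if len(data) < window:
        return []
    sig = sum(data[:window])            # SIG = FUNC(ARRAY[0] .. ARRAY[window-1]), computed once
    sigs = [sig]
    for start in range(1, len(data) - window + 1):
        # NEW SIG = OLD SIG - byte leaving the window + byte entering it
        sig = sig - data[start - 1] + data[start + window - 1]
        sigs.append(sig)
    return sigs


data = bytes(range(10))                 # 0, 1, ..., 9
print(rolling_sums(data, 5))            # [10, 15, 20, 25, 30, 35]
```

Each step costs one subtraction and one addition regardless of window size. Real tools usually prefer a stronger rolling checksum; rsync, for instance, pairs the plain sum with a position-weighted sum so that reordered bytes collide less often.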


YOU HAVE TO ROLL WITH IT.

• CAN SEARCH THE ENTIRE FILE WITH MINIMAL CALCULATION, I.E. FAST!
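Combining the rolling signature with the MD5 confirmation step gives a search like the following sketch: roll the cheap signature across the file and only compute MD5 where it matches. The plain-sum signature and the name `find_block` are assumptions for illustration.

```python
import hashlib


def find_block(haystack: bytes, block: bytes) -> list[int]:
    """Offsets of `block` in `haystack`: rolling sum to filter, MD5 to confirm."""
    n = len(block)
    if n == 0 or len(haystack) < n:
        return []
    weak_target = sum(block)
    strong_target = hashlib.md5(block).digest()
    matches = []
    sig = sum(haystack[:n])
    for offset in range(len(haystack) - n + 1):
        if offset > 0:
            # Roll the window one byte: drop the byte leaving, add the byte entering.
            sig += haystack[offset + n - 1] - haystack[offset - 1]
        if sig == weak_target:
            # Cheap signature matched; confirm with MD5 to rule out false positives.
            if hashlib.md5(haystack[offset:offset + n]).digest() == strong_target:
                matches.append(offset)
    return matches


print(find_block(b"dcbaxxabcdyy", b"abcd"))   # [6]  ("dcba" at 0 is rejected by MD5)
```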


SO WHAT NOW?

• CAN NOW SEARCH FILES QUICKLY FOR SIGNATURE MATCHES

• SO WE CAN WORK OUT WHAT THE CLOUD AND LOCAL COPIES HAVE IN COMMON

• CAN DOWNLOAD/UPLOAD ONLY THE DIFFERENCES.
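How matches could turn into a transfer plan, sketched in the spirit of rsync (an assumption about the general technique, not BlobSync's exact code): index the remote blob's block signatures, scan the local file, emit "reuse remote block i" whenever a block is found at the current position, and collect everything else as new bytes to upload. `make_plan`, `apply_plan` and the tuple-based plan format are hypothetical.

```python
import hashlib


def make_plan(remote_blocks: list[bytes], local: bytes, block_size: int):
    """Return ("reuse", index) / ("upload", bytes) steps that rebuild `local`."""
    # Remote signatures: weak sum -> [(md5, block index), ...]
    index: dict[int, list[tuple[bytes, int]]] = {}
    for i, blk in enumerate(remote_blocks):
        if len(blk) == block_size:  # only full-size blocks are searched for
            index.setdefault(sum(blk), []).append((hashlib.md5(blk).digest(), i))

    plan, pending, pos = [], bytearray(), 0
    sig = sum(local[:block_size])
    while pos + block_size <= len(local):
        match = None
        for strong, i in index.get(sig, []):  # weak hit, now confirm with MD5
            if hashlib.md5(local[pos:pos + block_size]).digest() == strong:
                match = i
                break
        if match is not None:
            if pending:
                plan.append(("upload", bytes(pending)))
                pending.clear()
            plan.append(("reuse", match))
            pos += block_size
            sig = sum(local[pos:pos + block_size])  # restart the window after a match
        else:
            pending.append(local[pos])  # this byte is new data
            if pos + block_size < len(local):
                sig += local[pos + block_size] - local[pos]  # roll forward one byte
            pos += 1
    pending.extend(local[pos:])  # tail shorter than one block
    if pending:
        plan.append(("upload", bytes(pending)))
    return plan


def apply_plan(remote_blocks: list[bytes], plan) -> bytes:
    return b"".join(remote_blocks[arg] if op == "reuse" else arg for op, arg in plan)


remote = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
local = b"AAAAxBBBBCCCCDDDD"
plan = make_plan(remote, local, 4)
print(plan)
# [('reuse', 0), ('upload', b'x'), ('reuse', 1), ('reuse', 2), ('reuse', 3)]
```

Only the `upload` steps cost bandwidth. On Azure, a reuse step could map to naming an already-committed block ID in Put Block List, and each upload step to a new Put Block.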


PROVE IT!


FILE INTERNALS

[Figure: file-internals diagram with ADD, DELETE and REPLACE panels]


LIES, MORE LIES AND STATISTICS

• SMALL DB (14 MB)

• CLEARED A SMALL TABLE

• UPDATE: 340 KB

• LARGE DB (555 MB)

• CLEARED A SMALL TABLE

• UPDATE: 720 KB

• VM (8 GB)

• DELETED SOME FILES

• UPDATE: 800 MB
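Rough arithmetic on these numbers, assuming M/K/G are binary megabytes/kilobytes/gigabytes and that "UPDATE" is the amount actually transferred (both assumptions on our part):

```python
# (name, full size in KB, update transferred in KB); binary units assumed
results = [
    ("Small DB", 14 * 1024, 340),
    ("Large DB", 555 * 1024, 720),
    ("VM", 8 * 1024 * 1024, 800 * 1024),
]

percentages = {name: 100 * update_kb / total_kb for name, total_kb, update_kb in results}

for name, pct in percentages.items():
    print(f"{name}: {pct:.2f}% of the full size transferred")
```

Under those assumptions this prints roughly 2.37%, 0.13% and 9.77%.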


UPCOMING CHANGES

• DEFRAG

• DYNAMICALLY DETERMINE BLOCK SIZE

• BETTER PARALLEL UPLOAD/DOWNLOAD

• 32-BIT VERSION


LINKS

• BLOG ON BLOBSYNC:

• HTTPS://KPFAULKNER.WORDPRESS.COM/CATEGORY/BLOBSYNC/

• NUGET PACKAGE:

• HTTPS://WWW.NUGET.ORG/PACKAGES/BLOBSYNC/

• GITHUB WITH SOURCE:

• HTTPS://GITHUB.COM/KPFAULKNER/BLOBSYNC/