6
Encrypted Incremental Backup without Server-Side Software Shih-Yu Lu Cloud System Software Institute Institute for Information Industry Taipei City, Taiwan (R.O.C) [email protected]  Abstract  With the development of science and technology, the capacity of hard disk and quantity of data are constantly increasing. Backup has become an important mechanism for  prevention of data loss. When the amount of data is small, full  backup is appropriate. However, if the amount of data is considerably large, ensuring full backup every time becomes time consuming. In this case, incremental backup is used for saving time. Incremental backup preserves data by not creating multiple copies; therefore, incremental backup is faster than full backup. To solve the related security problem, encryption is necessary when the backup data are stored in the storage server. However, most incremental backup cannot be encrypted. Thus, this paper presents a method to take encrypted incremental backup without any additional server-side software requirements. The encrypted incremental backup collects information from every file and stores it into a single file called the checksum file. This information includes filename, checksum, last modified time, file size, delete stamp, and encryption key. When the backup begins, the client collects the filename, checksum, last modified time, and file size. Then, the client gets another checksum file from the storage server. By comparing these two checksum files, the system will know which files have been changed and should be transmitted in the  backup this time. Before these files are sent to the storage server, the client will generate random keys to encrypt the files and store the encrypted keys in the checksum file. The system will use another key (key encryption key; KEK) to encrypt the checksum file; the administrator password is used for encrypting/decrypting this KEK.  Keywords-In cremental Backup, En crypted Backup, Res tore I. I  NTRODUCTION Extensive network coverage and mobile devices have made information and files readily available. The security of  business information and user’s data has become pivotal for enterprises [1][2][3][3][4][5][6][7]. Therefore, there is a need for security technology that can be used for protecting  backup data in a storage server. The term “encrypted  backup” refers to the b ackup data that have been encrypted to keep data secure. If such a technology is used, then even if other users get these encrypted data, they will not be able to access the data content easily. On the other hand, because the amount of data used in day-to-day life has increased considerably in the recent years, incremental backup is has  become increasingly important now. The main concept of incremental backup is to back up only the difference  between two file versions as doing so will reduce the transmitting time and the data transfer. Unfortunately, as encryption protects the file content, it may not allow the incremental backup algorithm [8] to detect the parts of a file that have been changed since the last  backup. Therefore, most incremental backup methods do not support encryption. In this paper, I propose a backup mechanism that supports encryption and incremental backup. In order to achieve this goal, we need to do the following: (1) support incremental backup, (2) support encrypted backup, (3) take multiple versions of backup/restoration, and (4) support different storage servers. In summary, I divide the  proposed method into three main functions in this paper; the overview is presented in Figure 1: Check Files: To determine what parts of a file have been changed since the last backup and need to be transmitted to the storage server in the current backup. Encryption: After the system generates the transmitted file list, all files in this list will be encrypted before sending files to the storage server. Storage Server Switcher: Different users have different storage server requirements, so a storage server switcher will transmit data through a different application interface on the  basis of the storage s erver that the use r chooses. After the encryption of a file, the encrypted data will be treated as normal data in the storage server, and the storage server will need to have the ability to perform only a few  basic functions such as copy, move, and transmit file. When a user wants to restore a file, the system can use the checksum file to retrieve the encrypted file from the server and decrypt it on the client side. In this way, the server does not need additional software for backup; it becomes a simple storage space because most of the computing is performed on the client side. It is easy to implement this system. We do not need to implement checksum, split file, and encryption ourselves. We can use well-designed open-source software to help us complete this system. II. CHECK FILES  A. Check Files After the data are encrypted, the system cannot compare the original file with the encrypted file for determining the modified parts of the file. Therefore, the system must have the file information from the last backup for determining 2013 27th International Conference on Advanced Information Networking and Applications Workshops 978-0-7695-4 952-1/13 $26.00 © 2013 IEEE DOI 10.1109/WAINA.2013.178 1467

Encrypted Incremental Backup Without Server-Side Software

Embed Size (px)

Citation preview

Page 1: Encrypted Incremental Backup Without Server-Side Software

8/10/2019 Encrypted Incremental Backup Without Server-Side Software

http://slidepdf.com/reader/full/encrypted-incremental-backup-without-server-side-software 1/6

Encrypted Incremental Backup without Server-Side Software

Shih-Yu Lu

Cloud System Software Institute

Institute for Information Industry

Taipei City, Taiwan (R.O.C)

[email protected]

 Abstract  — With the development of science and technology, the

capacity of hard disk and quantity of data are constantlyincreasing. Backup has become an important mechanism for

 prevention of data loss. When the amount of data is small, full backup is appropriate. However, if the amount of data is

considerably large, ensuring full backup every time becomes time

consuming. In this case, incremental backup is used for savingtime. Incremental backup preserves data by not creating multiplecopies; therefore, incremental backup is faster than full backup. To

solve the related security problem, encryption is necessary when

the backup data are stored in the storage server. However, most

incremental backup cannot be encrypted. Thus, this paper presentsa method to take encrypted incremental backup without any

additional server-side software requirements. The encrypted

incremental backup collects information from every file and stores

it into a single file called the checksum file. This information

includes filename, checksum, last modified time, file size, deletestamp, and encryption key. When the backup begins, the client

collects the filename, checksum, last modified time, and file size.

Then, the client gets another checksum file from the storage server.

By comparing these two checksum files, the system will know

which files have been changed and should be transmitted in the backup this time. Before these files are sent to the storage server,

the client will generate random keys to encrypt the files and store

the encrypted keys in the checksum file. The system will useanother key (key encryption key; KEK) to encrypt the checksum

file; the administrator password is used for encrypting/decrypting

this KEK.

 Keywords-Incremental Backup, Encrypted Backup, Restore

I.  I NTRODUCTION 

Extensive network coverage and mobile devices havemade information and files readily available. The security of business information and user’s data has become pivotal for

enterprises [1][2][3][3][4][5][6][7]. Therefore, there is aneed for security technology that can be used for protecting backup data in a storage server. The term “encrypted backup” refers to the backup data that have been encryptedto keep data secure. If such a technology is used, then evenif other users get these encrypted data, they will not be ableto access the data content easily. On the other hand, becausethe amount of data used in day-to-day life has increasedconsiderably in the recent years, incremental backup is has become increasingly important now. The main concept ofincremental backup is to back up only the difference

 between two file versions as doing so will reduce thetransmitting time and the data transfer.

Unfortunately, as encryption protects the file content, itmay not allow the incremental backup algorithm [8] todetect the parts of a file that have been changed since the last backup. Therefore, most incremental backup methods do not

support encryption. In this paper, I propose a backupmechanism that supports encryption and incremental backup.In order to achieve this goal, we need to do the following: (1)support incremental backup, (2) support encrypted backup,(3) take multiple versions of backup/restoration, and (4)support different storage servers. In summary, I divide the proposed method into three main functions in this paper; theoverview is presented in Figure 1:

Check Files: To determine what parts of a file have beenchanged since the last backup and need to be transmitted tothe storage server in the current backup.

Encryption: After the system generates the transmittedfile list, all files in this list will be encrypted before sendingfiles to the storage server.

Storage Server Switcher: Different users have differentstorage server requirements, so a storage server switcher willtransmit data through a different application interface on the basis of the storage server that the user chooses.

After the encryption of a file, the encrypted data will betreated as normal data in the storage server, and the storageserver will need to have the ability to perform only a few basic functions such as copy, move, and transmit file. Whena user wants to restore a file, the system can use thechecksum file to retrieve the encrypted file from the serverand decrypt it on the client side. In this way, the server doesnot need additional software for backup; it becomes a simplestorage space because most of the computing is performedon the client side. It is easy to implement this system. We do

not need to implement checksum, split file, and encryptionourselves. We can use well-designed open-source softwareto help us complete this system.

II.  CHECK FILES 

 A.  Check Files

After the data are encrypted, the system cannot comparethe original file with the encrypted file for determining themodified parts of the file. Therefore, the system must havethe file information from the last backup for determining

2013 27th International Conference on Advanced Information Networking and Applications Workshops

978-0-7695-4952-1/13 $26.00 © 2013 IEEE

DOI 10.1109/WAINA.2013.178

1467

Page 2: Encrypted Incremental Backup Without Server-Side Software

8/10/2019 Encrypted Incremental Backup Without Server-Side Software

http://slidepdf.com/reader/full/encrypted-incremental-backup-without-server-side-software 2/6

Figure 1. Overview of Encrypted Incremental Backu

what parts of the file have been changed since then. In this proposed method, the system will collect file information before the backup; this file information includes the full pathof the file, last modified time, file size, checksum, deletestamp, and encryption key. Hence, the system can comparethe two checksum lists to determine which part of the filehas been modified since the last backup.

Let T   be the file size threshold,  F   be a collection of backup data, C(F i ) be the checksum of F i where F i   F , S(F i ) be the file size, and N  be the file number in  F. Assume thatthere are N  files, and each is of a different size. If S(F1)  T ,

the checksum of F 1 will be C(F 1 ). If S 3 > T , F 3 will split into(S3 mod T ) +1 blocks. Assume that there are I  blocks in F 3;this implies that  F 3  will have  I + 1 checksums: C(F 31)C(F 32)…. C(F 3 I-1), C(F 3 I),  and C(F3). The details of thesequence are as follows:

1)  Check whether there are backup data in the targetstorage server.

a)  If there are no any backup data in the storage server,

the system will perform full backup.

b)  If there are backup data in the storage server, we

will require the checksum file from the storage server. 

2)  Start making the checksum file. The first system

records the full path, last modified time, and file size of the

 backup data first. Then, it starts comparing the two

checksum lists.

a)  If the last modified time and the file size of a file

are the same in two checksum lists, this file has not been

changed since the last backup; so nothing needs to be done

for this file. Remove this file information from the

checksum list of the client.b)  If the file information is different in the two

checksum file, the file will need to be backed up in the

storage server or deleted. If the file information only exists

in the checksum list of the storage server, add a delete

stamp to this file in the checksum list of the storage server.

3)  Start generating the checksum of the file in the

checksum list from the client.

a)  If S(F i )  T , generate C(F i ), add it to the checksum

list, and transmit the list.

b)  If S(Fi)  > T , split  F i  into I blocks, generate the

checksum for every block, and then go to step 2; do not

compare the last modified time and the file size, but

compare the checksum of every block.

The basic principle is that the delete stamp should bechanged from 0 to 1 when the file exists only in thechecksum list obtained from the storage server and therevised parts of the file are transmitted. In fact, the check filestep may be the most important part of the entire system, andit costs the maximum time to implement. There are so manyscenarios that need to be taken care of that if the developers

1468

Page 3: Encrypted Incremental Backup Without Server-Side Software

8/10/2019 Encrypted Incremental Backup Without Server-Side Software

http://slidepdf.com/reader/full/encrypted-incremental-backup-without-server-side-software 3/6

do not think carefully and consider them, the system maydelete the wrong data or not back up the necessary data.

 B.  Checksum File Format

The checksum file is the one of the importantcomponents of the proposed method, and Table 1 shows theformat of the checksum file when the file size threshold is

300 kB. The first column is File Name and includes the full path and the filename of a file. The second column has thelast modified time, and its format is as follows:YYYYMMDDhhmmss.nnn. FS stands for file size. Becausethe file size threshold is 300 kB in this table and file1 andfile2 are smaller than 300 kB, they do not change. File3 is bigger than 300 kB and hence will be split into two chunks.Chunk 1 is 300 kB, and chunk 2 is 262 kB (562 kB - 300 kB= 262 kB). CKS is the checksum value and depends on theselected hash algorithm [9] [10] [11]. The fifth column D isfor the delete stamp; when D is equal to 1, it implies that thisfile or chunk should be deleted or should not be transmitted.When the system is in the backup step, if the systemmodifies the delete stamp from 0 to 1, it implies that this fileno longer exists on the client now but it exists in the previous version of the backup data in the storage server.Hence, I use the delete stamp as an indicator for restoringdata from multiple backup versions and letting the systemknow which file should be transferred from the storageserver to the client, when the system begins the restoreoperation. Key refers to the encryption key and is used forencrypting a file; it is generated by a random key generator.The format and the length of encryption key also depend onthe encryption algorithm [12][13][14]. Different algorithmswill generate different keys with different encryptionstrengths.

The two most important parameters given in Table 1 areCKS and Key; these two values depend on the algorithms orfunctions that the developers use for implementing the

 backup system. Further, they affect the performance of the backup; in most cases, increased security means increasedcomputing, and increased computing needs relatively moretime to complete.

TABLE I. CHECKSUM FILE FORMAT 

FN LMT FS CKS D Key

Path/F1 20120812131425.111 290390 C(F 1 )  0 34sdrt6y

Path/F2 20120903012256.234 153787 C(F 2 )  1 rewg645

Path/F3 20120831173073.729 562000 C(F 3 )  0 7yjf8qil

Path/F3_1 20120831173073.729 300000 C(F 31)  0 3098ller

Path/F3_2 20120831173073.729 262000 C(F 32)  0 u63jduk

C.   Notice

The following need to be noted: First, the reason for thenon-generation of the checksum when the system writes thelast modified time and the file size into the checksum file is performance. Checksum needs more computing power andcost more time. In particular, when there are many files in amodern computer, if the system generates the checksum forevery bit of backup data, the backup process will take moretime to complete. For saving the backup time of the checkfile, I designed two file check steps in this study. Second, the

last modified time and the file size are the indicators in thefirst check file step. In order to avoid the collision problem[15], the last modified time should have millisecondaccuracy, and the unit of data size should be bytes. Assumethat there are two different files; they have the same lastmodified time, file size, and path, but they are saved ondifferent storage servers; the system will not transmit these

files, and this will be a big problem.

III.  E NCRYPTION 

Encryption is the process of transforming data using analgorithm to make the data unreadable to anyone exceptthose possessing a special key [16]. Using encryption to protect certain information and data from other people is avery common procedure. In this paper, I do not propose myown security algorithm; I only propose a flow that showshow to use an encryption function in the backup and restore process.

Figure 2 shows the detailed process of encryption. Afterthe check file step, the system will generate a transmittingfile list and encrypt every file in the transmitting file list.

The key generator will produce random keys for encryption.Every file has different encryption keys, and these keys arewritten into the checksum file. The system will encrypt thechecksum file by using KEK; KEK is a hashed combinationof the user password and a random key. For increasing thedifficulty of decryption, the user password is used forencrypting the random key, and the encrypted key is storedin the storage server. Thus, even if people know the user password or the random key, they will still not be able todecrypt the data easily.

Figure 2. Encryption

Encryption is used for backup that stores data in thestorage server, and decryption is used for restoring thedecrypted data so that the data owner can use these data.Figure 3 shows the details of the decryption process. If wewant to decrypt data, we need to get the encryption keystored in the checksum file. Although the checksum file is protected by KEK, we have to get the key combination andthen hash it by a specific hash key designed by the developer.In order to obtain the key combination, the administrator password is used for decrypting the encrypted key to get thekey in the first step; this key is combined with the

1469

Page 4: Encrypted Incremental Backup Without Server-Side Software

8/10/2019 Encrypted Incremental Backup Without Server-Side Software

http://slidepdf.com/reader/full/encrypted-incremental-backup-without-server-side-software 4/6

administrator password and a specific hash function is usedfor hashing the key combination. In this way, we can obtainthe key required for decrypting the checksum file. Forgetting the encryption keys of every file, the system will usethe hashed key combination to decrypt the encryptedchecksum file. After we obtain the encryption key of everyfile, we can start decrypting the file for the restore operation.

.

Figure 3. Decryption

IV.  STORAGE SERVER SWITCHER  

In this section, the system already has the transmittingfile list and encrypts all the files in the list. Now, we have todecide the storage server to store the encrypted data in.Considering that customers have difference storage serversthat they want to backup to, this backup system must havethe ability to store data in different storage servers. In mostcases, the backup software is for Windows, Linux, and Mac, but customers may have other choices now. The choice is a

network storage service. Network storage services have become increasingly popular in the last few years (e.g.,Amazon S3, Dropbox, Microsoft Skydrive, and GoogleDrive). Users upload, download, back up, and share filesusing these network storage services. These service providers not only offer application software that can beinstalled on computers and mobile devices but also offer anapplication interface for synchronizing the client and theserver data in real time. Hence, a developer can use theseapplication interfaces to connect to the network storageservice. Combining the abovementioned two kinds ofstorage servers may be a good way to enhance the userexperience.

As users have more choices to store their data now, forcommunication with different storage servers, developershave to implement many sets of application interfaces, asshown in Figure 4. The user interface should change whenthe user chooses a different storage server because theapplication interfaces of different storage servers aredifferent.

Figure 4. Storage Switcher

V.  MULTIPLE VERSIONS OF BACKUP AND R ESTORATION 

In this section, I will show how to use the checksum fileto obtain multiple versions of the backup and restoration.

Figure 5 shows an example when the maximum versionlimitation is two versions.

•  The first backup is the full backup; the system will back up all data to the storage server, and checksumfile version 1 has information about every file. Weassume that the first backup is stored in a foldernamed V1.

•  When the second backup begins, the system onlytransmits the revised parts of the files and chunks tothe storage server and generates checksum fileversion 2 that has the information of the transmittingfile and chunk.

•  In this example, the system begins the third backup.The system will first check the number of versions

of the backup in the storage server. If the number ofversions is greater than or equal to the maximumversion limitation, the system will move all the datafrom the V2 folder to the V1 folder, and the twochecksum files will be merged into a new checksumfile version1. After all the merge processes arecompleted, the system will begin backing up data tothe storage server.

Figure 5. Backup process when maximum version limitation is two

1470

Page 5: Encrypted Incremental Backup Without Server-Side Software

8/10/2019 Encrypted Incremental Backup Without Server-Side Software

http://slidepdf.com/reader/full/encrypted-incremental-backup-without-server-side-software 5/6

A complete backup system can not only back up datafrom the client to the storage server but also restore datafrom the storage server. Hence, in this part, I will discuss therestoration of data from the storage server by using thechecksum file. Figures 6, 7, and 8 show the scenarios inwhich the user restores data from three different versions ofthe backup. Let  Fn  be file n,  Fn’   be the new file n, and

 Fn_m be the number m chunk of file n.If we want to restore from the incremental backup data,

we need to access the storage server that has the base datafor the restore operation. In this method, I always use thedata in folder V1 as the base data. The other folders store themodified parts of the files and chunks.

Figure 6 shows the scenario in which the user restoresdata from the second version of the backup. We consider thedata in folder V1 as the base data, and the data in folder V2as the modified date. When the user chooses version 2 datafor restoration, we require both checksum files, version 1and version 2, from the storage server first. After we getthese two checksum files, the system will merge them into anew transmitting file list containing information regarding

the file and chunks that need to be transmitted from theserver to the client. In this case, the user modified file3 andthe third chunk of file2, so we keep this two pieces of data inthe transmitting file list; the other data are obtained fromfolder V1.

Figure 6. Restoration from backup data version 2

.

Figure 7. Restoration from backup data version 3

As shown in Figure 7, the user modifies file1 and deletesfile 4, so there is a delete stamp on the information of file4,and the information of the new file1 in the checksum fileversion 3. The only difference from Figure 6 is that the

version 3 data is based on version 2, so there are F2_3’ andF3 data in version 3. Finally, we obtain F1’ from folder V3,F2_1 F2_2 from folder V1, and F2_3’ F3’ from folder V2.Because F4 no longer exists in backup data version 3, we donot need to transmit F4. Figure 8 is similar to Figure 7; it just shows the restoration from backup data version 4

Figure 8. Restoration from backup data version 4

VI.  CONCLUSION 

This paper presents an encrypted incremental backupmethod and system. By pre-collect information from everyfile, developers can implement this method easily. There aremany parts of system are functional programs, like split file,combine file, encryption, decryption, and hash function. Ifdeveloper want communicates with cloud storage service,there are a lot of APIs (application interface), provided byservice provider, for implementation.

There are still two issues in this system. First problemis key management; the second is size of file size threshold.But those two issues came from the same source, system performance.

VII.  ACKNOWLEDGMENT 

The authors would like to thank Enago (www.enago.tw)for the English language review.

R EFERENCES 

[1]  S. Chaitanya, B. Urgaonkar, A. Sivasubramaniam, “Multi-levelCrypto Disk: Secondary Storage with Flexible PerformanceVersus Security Trade-offs”, 2010 IEEE International Symposium onModeling, Analysis & Simulation of Computer andTelecommunication Systems, pp. 434-436, 2010.

[2]  M. Liang, C. Chang, “Research and design of full disk encryption based on virtual machine”, 3rd IEEE International Conference onComputer Science and Information Technology, pp. 642-646, 2010.

[3] 

R. Prabhakar, Seung Woo Son, C. Patrick, S. Narayanan, M.Kandemir, “Securing Disk-Resident Data through Application LevelEncryption”, Fourth International Workshop on Security in Storage, pp. 46-57, 2007.

[4]  C. Gebhardt, A. Tomlinson, “Secure Virtual Disk Images for GridComputing”, Third Asia-Pacific Conference on TrustedInfrastructure Technologies, pp. 19-29, 2008.

[5]  J. He, M. Xu, “Research on Storage Security Based on TrustedComputing Platform”, International Symposium on ElectronicCommerce and Security, pp. 448-452, 2008.

1471

Page 6: Encrypted Incremental Backup Without Server-Side Software

8/10/2019 Encrypted Incremental Backup Without Server-Side Software

http://slidepdf.com/reader/full/encrypted-incremental-backup-without-server-side-software 6/6

[6]  F. Hou, N. Xiao, F. Liu, H. He, “Secure Disk with AuthenticatedEncryption and IV Verification”, Fifth International Conference onInformation Assurance and Security, pp. 41-44, 2009.

[7]  J. Li, H. Yu, “Trusted full disk encryption model based on TPM”,2nd International Conference on Information Science andEngineering, pp. 1-4, 2010.

[8]  A. Tridgell, P. Mackerras, "The rsync algorithm", available at :http://cs.anu.edu.au/techreports/1996/TR-CS-96-05.html, Retrieved:

2012-02-28 [9]  R. L. Rivest, "The MD5 Message Digest Algorithm", Internet RFC

1321 (1992-04)

[10]  R. Housley, “A 224-bit One-way Hash Function: SHA-224”,available at : http://tools.ietf.org/html/rfc3874, Retrieved: 2012-05-

12 

[11]  D. Eastlake, “US Secure Hash Algorithms (SHA and SHA-basedHMAC and HKDF)” available at : http://tools.ietf.org/html/rfc6234,Retrieved 2012-05-26 

[12]  J. Daemen, S. Borg, V. Rijmen, “The Design of Rijndael : AES –The Advanced Encryption Standard”, Springer-Verlag, 2002. ISBN3-540-42580-2

[13]  B. Schneier, “Description of a New Variable-Length Key, 64-bitBlock ciphers”, Dast Software Encryption 1993: 191-204

[14]  B. Schneier, J. Kelsey, D. Whiting, D. Wanger, C. Hall, N. Ferguson,“Twofish: A 128-Bit Block Cipher”, available at :

http://www.schneier.com/paper-twofish-paper.pdf . Retrieved 2012-03-025

[15]  X. Wang; H. Yu. "How to Break MD5 and Other Hash Functions".EUROCRYPT. 2005. ISBN 3-540-25910-4.

[16]  Wikipedia, “Encryption”, available at :http://en.wikipedia.org/wiki/Encryption, Retrieved 2012-07-30

1472