Using Algebraic Signatures in Storage Applications Thomas Schwarz, S.J. Associate Professor, Santa Clara University Associate, SSRC UCSC Storage Systems Research Center, University of California, Santa Cruz, Retreat June 1,2 2004


Using Algebraic Signatures in Storage Applications

Thomas Schwarz, S.J.

Associate Professor, Santa Clara University

Associate, SSRC UCSC

Storage Systems Research Center, University of California, Santa Cruz, Retreat June 1,2 2004


Signatures

Small strings that characterize objects.

Calculated from the object.

Distinct signatures ⇒ objects are different.

Same signatures ⇒ objects are the same, with high probability.

Error probability is $2^{-f}$, where f is the length of the signature in bits.

A.k.a. checksums, hashes, fingerprint, condensed representation, …


Signatures

Examples

Tripwire: Protection against malware.

Maintains the signatures of all system libraries in a secure location.

Before a library module is called, verify the signature of the module.
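The check described above can be sketched in a few lines. This is only an illustration of the idea, not Tripwire's actual interface: the function names `file_signature`, `record_baseline`, and `verify_module` are hypothetical, and SHA-256 from Python's `hashlib` is an assumed choice of signature.

```python
import hashlib
from pathlib import Path


def file_signature(path: Path) -> str:
    """Return the SHA-256 digest of a file's contents as a hex string."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def record_baseline(paths) -> dict:
    """Build the signature table that would be kept in a secure location."""
    return {str(p): file_signature(p) for p in paths}


def verify_module(path: Path, baseline: dict) -> bool:
    """Check a module against its recorded signature before it is used."""
    expected = baseline.get(str(path))
    # Unknown modules are rejected without reading them.
    return expected is not None and file_signature(path) == expected
```

A module whose contents changed after the baseline was recorded, or that was never recorded at all, fails `verify_module`.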


Signatures

Examples

Remote comparison of files:

Problem arose out of first prototypes of replicated databases.

Divide records into pages.

Calculate and compare signatures for all pages.

Do this efficiently by combining the signatures of a set of pages into a super-signature.
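A minimal sketch of the page and super-signature comparison follows. It makes several assumptions: SHA-1 stands in for whatever signature the original prototypes used, the page size is tiny for readability, and both signature lists are passed to one function, whereas two remote sites would first exchange only the super-signature. All names are hypothetical.

```python
import hashlib
from itertools import zip_longest

PAGE_SIZE = 4  # tiny pages for illustration only (assumption)


def page_signatures(data: bytes, page_size: int = PAGE_SIZE) -> list:
    """Split data into pages and compute one signature per page."""
    return [hashlib.sha1(data[i:i + page_size]).digest()
            for i in range(0, len(data), page_size)]


def super_signature(signatures: list) -> bytes:
    """Combine a set of page signatures into a single super-signature."""
    return hashlib.sha1(b"".join(signatures)).digest()


def differing_pages(local_sigs: list, remote_sigs: list) -> list:
    """Return indices of pages that differ or exist on one side only."""
    # Equal super-signatures: treat the copies as identical and stop early.
    if super_signature(local_sigs) == super_signature(remote_sigs):
        return []
    return [i for i, (a, b) in enumerate(zip_longest(local_sigs, remote_sigs))
            if a != b]
```

The early exit is where the saving comes from: when the copies agree, one short value is compared instead of one signature per page.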


Signatures

Integrity check for archival storage

Keep two copies of archived data.

Maintain the signatures of tape contents.

Periodically “scrub” tapes.


Signatures

Similarity Measurement between Files. Similarity of web-pages. Similarity of files in Deep-Store.


Signatures For Scalable Distributed Data Structures

SDDS implement a large file of records in buckets distributed over a network.

SDDS operations (insert, update, delete, read, scan) have execution times independent of SDDS file size.

Use signatures of blocks to decide which portions of the bucket need to be backed up.

More secure than a dirty bit.

Litwin, W., Mokadem, R., Schwarz, T.: Disk Backup through algebraic signatures in scalable and distributed data structures. Proc. 5th Workshop on Distributed Data and Structures, Thessaloniki, June 2003 (WDAS 2003).


Signatures For Scalable Distributed Data Structures

Use signatures of records to test whether they have been changed.

Leads to a read-verification based concurrency scheme:

Read the record.

Process the record.

Verify by signature that the record has not changed.
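The read, process, verify sequence can be sketched as follows. This is a single-threaded illustration under stated assumptions, not the scheme from the cited papers: `Bucket` and `update_if_unchanged` are hypothetical names, CRC-32 from `zlib` stands in for the algebraic signature, and the check-then-write step is assumed to be atomic at the bucket.

```python
import zlib


def signature(record: bytes) -> int:
    """Stand-in record signature (CRC-32), not the algebraic signature."""
    return zlib.crc32(record)


class Bucket:
    """Hypothetical in-memory bucket holding records by key."""

    def __init__(self):
        self.records = {}

    def read(self, key) -> bytes:
        return self.records[key]

    def update_if_unchanged(self, key, sig_at_read: int, new_value: bytes) -> bool:
        """Apply the update only if the stored record still has the
        signature the client saw when it read the record."""
        if signature(self.records[key]) != sig_at_read:
            return False  # record changed since the read; client must retry
        self.records[key] = new_value
        return True


bucket = Bucket()
bucket.records["k"] = b"balance=100"

# Two clients read the same record and remember its signature.
sig_a = signature(bucket.read("k"))
sig_b = signature(bucket.read("k"))

first = bucket.update_if_unchanged("k", sig_a, b"balance=090")   # accepted
second = bucket.update_if_unchanged("k", sig_b, b"balance=120")  # rejected
```

The second client's update is refused because the record's signature no longer matches the one it saw, so no lock has to be held while the record is processed.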

Litwin, W., Mokadem, R., Schwarz, T.: Disk Backup through algebraic signatures in scalable and distributed data structures. Proc. 5th Workshop on Distributed Data and Structures, WDAS’03, Thessaloniki, June 2003.

Schwarz, T., Holliday, J.: A Signature Based Concurrency Scheme for Scalable Distributed Data Structures. Workshop on Distributed Data and Structures, WDAS'04, Lausanne, 2004.


Cryptographically Secure Signatures

Computationally infeasible to find another object with the same signature.

Protects against malicious attacks.

Used to protect data integrity or to sign data:

Object → signature → apply private key → K(signature)

Store the encrypted signature K(signature) with the object.


Cryptographically Secure Signatures

MD5: Rivest, 1992 (RFC 1321), 16 B

SHA-1: NSA, 1995, FIPS 180-1 / ANSI X9.30, 20 B

…

Implement a one-way hash.


Signatures with Algebraic Properties

Composable signatures*: The signature of an object can be calculated from the signatures of its component objects.

Updatable signatures: The new signature of a changed object can be calculated from the old signature and the signature and location of the change.

* Suel, T., Noel, P., and Trendafilov, D.: Improved File Synchronization for Maintaining Large Replicated Collections over Slow Networks. In Proc. 20th Int. Conf. on Data Engineering, ICDE, Boston, 2004, p. 153-164.

Litwin, W., Schwarz, T. Algebraic Signatures for Scalable Distributed Data Structures. Proc. of the 20th International Conference on Data Engineering (ICDE), Boston, 2004, p. 412-423.


Signatures with Algebraic Properties

Algebraic properties prevent cryptographic security. Fundamental Design Trade-off.


Algebraic Signatures Karp-Rabin signatures over Galois fields.

A Galois field defines addition, multiplication, subtraction, division, etc. over bit strings of length f.

Same mathematical rules as for rational numbers, real numbers, complex numbers, etc.

Single and compound signature of P=(p1,p2, …)

Karp, R. and Rabin, M.: Efficient randomized pattern-matching algorithms. In IBM Journal of Research and Development, Vol. 31, No. 2, March 1987.

Schwarz, T., Bowdidge, R. and Burkhard, W.: Low Cost Comparison of File Copies. In Proc. Intern. Conf. on Distributed Computing Systems, Paris, Fr., 1990, (ICDCS 5 Proceedings), p. 196-202.

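$$\operatorname{sig}_{\alpha}(P) := \sum_{i=1}^{l} p_i \, \alpha^{i-1}$$

$$\operatorname{sig}_{\alpha,m}(P) := \left(\operatorname{sig}_{\alpha}(P), \operatorname{sig}_{\alpha^2}(P), \ldots, \operatorname{sig}_{\alpha^m}(P)\right)$$

Here $\alpha$ is a fixed element of the Galois field and $l$ is the number of symbols in $P$.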
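The single and compound signatures can be sketched in Python. The sketch assumes f = 8 (one symbol per byte), the field GF(2^8) built from the polynomial 0x11D, and α = 2, which is a primitive element for that polynomial. None of these parameters comes from the slides, and the helper names are hypothetical.

```python
def gf_mul(a: int, b: int, poly: int = 0x11D) -> int:
    """Multiply two GF(2^8) elements: carry-less product reduced modulo poly."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    """Raise a GF(2^8) element to a non-negative integer power."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def sig(page: bytes, alpha: int = 2) -> int:
    """Single signature: sum over i of p_i * alpha^(i-1); addition is XOR."""
    acc, power = 0, 1
    for symbol in page:
        acc ^= gf_mul(symbol, power)
        power = gf_mul(power, alpha)
    return acc


def compound_sig(page: bytes, m: int, alpha: int = 2) -> tuple:
    """Compound signature (sig_alpha, sig_alpha^2, ..., sig_alpha^m)."""
    return tuple(sig(page, gf_pow(alpha, k)) for k in range(1, m + 1))
```

With these assumed parameters, `compound_sig(page, 2)` is a two-byte signature that changes whenever one or two symbols of a page shorter than 255 symbols are altered.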


Algebraic Signatures

Properties of the compound signature:

Size is mf bits.

Detects for sure any change of up to m symbols. A symbol is a GF element, i.e. a bit string of length f.

Collision probability is $2^{-fm}$.


Algebraic Signatures

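Algebraic Signatures Properties

Can update the signature after a simple change:

$$\operatorname{sig}_{\alpha}(P') = \operatorname{sig}_{\alpha}(P) + \alpha^{r} \operatorname{sig}_{\alpha}(\Delta)$$

where $\Delta$ holds the symbol-wise differences between the new and old contents of the changed region and $r$ is the number of symbols that precede the region.

Discovers changes from a cut-and-paste operation.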
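The update rule can be checked numerically. The sketch below assumes GF(2^8) with polynomial 0x11D, α = 2, and a signature whose i-th symbol (counting from 1) is weighted by α^(i-1). These are illustrative choices, and `updated_sig` is a hypothetical helper name.

```python
def gf_mul(a: int, b: int, poly: int = 0x11D) -> int:
    """Multiply two GF(2^8) elements: carry-less product reduced modulo poly."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    """Raise a GF(2^8) element to a non-negative integer power."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def sig(page: bytes, alpha: int = 2) -> int:
    """Signature: sum over i of p_i * alpha^(i-1); addition is XOR."""
    acc, power = 0, 1
    for symbol in page:
        acc ^= gf_mul(symbol, power)
        power = gf_mul(power, alpha)
    return acc


def updated_sig(old_sig: int, delta: bytes, r: int, alpha: int = 2) -> int:
    """sig(P') = sig(P) + alpha^r * sig(delta), with '+' being XOR."""
    return old_sig ^ gf_mul(gf_pow(alpha, r), sig(delta, alpha))


old_page = b"algebraic signatures"
r = 10                      # ten symbols precede the changed region
new_region = b"SIG"         # replaces the three symbols "sig"
delta = bytes(o ^ n for o, n in zip(old_page[r:r + len(new_region)], new_region))
new_page = old_page[:r] + new_region + old_page[r + len(new_region):]

incremental = updated_sig(sig(old_page), delta, r)  # touches only the change
recomputed = sig(new_page)                          # rereads the whole page
```

Both routes give the same value, but the incremental one only needs the old signature and the changed region, not the rest of the page.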


Algebraic Signatures

Algebraic Signatures Properties

Can calculate the signature of a parity object from the signatures of the data objects.

Holds for normal parity (RAID Level 5)

But also for some forms of generalized parity. Reed-Solomon Codes. Convolutional Array Codes.

Thomas Schwarz, S.J.: Verification of Parity Data in Large Scale Storage Systems, PDPTA 2004, Las Vegas.

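$$\operatorname{sig}_{\alpha,n}\left(P^{(\mathrm{par})}\right) = \operatorname{sig}_{\alpha,n}\left(P^{(1)}\right) + \operatorname{sig}_{\alpha,n}\left(P^{(2)}\right) + \cdots + \operatorname{sig}_{\alpha,n}\left(P^{(r)}\right)$$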
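For normal (XOR) parity, this property can be checked with a short sketch. It assumes GF(2^8) with polynomial 0x11D and α = 2, uses a single signature rather than a compound one (the same argument applies to each component), and covers neither Reed-Solomon nor convolutional array codes. `xor_pages` is a hypothetical helper.

```python
from functools import reduce


def gf_mul(a: int, b: int, poly: int = 0x11D) -> int:
    """Multiply two GF(2^8) elements: carry-less product reduced modulo poly."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def sig(page: bytes, alpha: int = 2) -> int:
    """Signature: sum over i of p_i * alpha^(i-1); addition is XOR."""
    acc, power = 0, 1
    for symbol in page:
        acc ^= gf_mul(symbol, power)
        power = gf_mul(power, alpha)
    return acc


def xor_pages(pages) -> bytes:
    """RAID Level 5 style parity: bytewise XOR of equal-length pages."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*pages))


data_pages = [b"page-one", b"page-two", b"page-3!!"]
parity_page = xor_pages(data_pages)

sig_from_parity = sig(parity_page)  # computed by reading the parity page
sig_from_data = reduce(lambda x, y: x ^ y, (sig(p) for p in data_pages))
```

Because the signature is linear over the field, XOR-ing the data signatures gives the signature of the parity page without reading the parity page itself.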


Algebraic Signatures in Large Scale Storage Systems

Data – Parity Coherency:

If we miss an update to parity data, then we can no longer reconstruct data:

[Figure: a stripe D1 D2 D3 D4 D5 P. D1 is updated to D1', but P is not. The stripe D1' D2 D3 D4 D5 P can then no longer be reconstructed correctly (?).]


Protecting Data in a Large Archival Storage System.

Disk-Based Archival Storage System

Data is cold: Power down disks between accesses.

Data on disk storage systems is lost because of: Device Failure. Block Failure.

Periodically check whether we can access disks.

Periodically check whether we can still read all data on disks.


Protecting Data in a Large Archival Storage System.

Since we need to read all the data anyway, and since we also need to be concerned about software failures:

Check the signatures of the data.


Protecting Data in a Large Archival Storage System.

Divide disks into scrubbing blocks.

Assume that the redundancy scheme creates generalized parity blocks for scrubbing blocks.

Maintain a map of the signatures of the scrubbing blocks.

[Figure: disks divided into scrubbing blocks, holding data blocks (e.g. D1, D9, D15, D23) and generalized parity blocks (e.g. P1,4,7, P2,5,13, P9,12,25).]


Protecting Data in a Large Archival Storage System.

When data in a scrubbing block is updated, change its signature. This happens rarely.

When we scrub, check whether the actual signature of the block coincides with the signature in the metadata.

If not: Something bad has happened. Typically a software error, but occasionally data corruption.

Comes at almost no cost: we need to read the data anyway.
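The scrubbing check can be sketched as a pass over the blocks that recomputes each signature and compares it with the stored map. The sketch assumes GF(2^8) with polynomial 0x11D and α = 2 for the signature, and models the disk and the metadata as plain dictionaries. `scrub` and the block contents are hypothetical.

```python
def gf_mul(a: int, b: int, poly: int = 0x11D) -> int:
    """Multiply two GF(2^8) elements: carry-less product reduced modulo poly."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def sig(block: bytes, alpha: int = 2) -> int:
    """Signature: sum over i of p_i * alpha^(i-1); addition is XOR."""
    acc, power = 0, 1
    for symbol in block:
        acc ^= gf_mul(symbol, power)
        power = gf_mul(power, alpha)
    return acc


def scrub(blocks: dict, signature_map: dict) -> list:
    """Return the ids of scrubbing blocks whose current signature no
    longer matches the signature recorded in the metadata."""
    return [block_id for block_id, content in blocks.items()
            if sig(content) != signature_map[block_id]]


blocks = {"D1": b"cold archival data", "D2": b"more cold data"}
signature_map = {block_id: sig(content) for block_id, content in blocks.items()}

clean = scrub(blocks, signature_map)       # nothing has changed yet
blocks["D2"] = b"more cold dat\x00"        # simulate one corrupted symbol
suspect = scrub(blocks, signature_map)     # D2 no longer matches its signature
```

The signature is computed from data the scrubber has to read anyway, so the only extra work is the field arithmetic and one lookup per block.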


Protecting Data in a Large Archival Storage System.

Periodically check whether parity blocks and data blocks cohere:

Access the signatures of the data blocks.

Calculate the signature of the parity block(s) from them.

Compare with the actual signature on file.


Protecting Data in a Large Archival Storage System.

Conclusion

Low cost scheme.

Protects against data corruption and parity / data incoherence.