BigTable is Google's distributed storage system, designed to manage large-scale structured data. It was built for internal (i.e., trusted) use, so its design did not take security into account. Since 2006, when the paper describing BigTable's architecture was published, several open-source BigTable-like systems have been developed (e.g., HBase, Hypertable). One of the primary uses of such systems is cloud storage, a service that gives users access to data without the need to manage hardware or software. However, users may not trust the cloud provider, so appropriate security techniques should be applied. This seminar reviews three security approaches for BigTable-like systems: 1. iBigTable, an enhancement of BigTable that provides scalable data integrity assurance. 2. BigSecret, a secure data management framework for BigTable-like storage systems. 3. Accumulo, an extension of BigTable that provides cell-level access control.
Security approaches in BigTable-like storage systems
22951 Research Seminar: Information Security and Privacy July 2014
Open University of Israel
Grisha Weintraub
Abstract
• BigTable is Google's scalable storage system.
• It was designed for internal (i.e., trusted) use.
• Open-source implementations exist (e.g., HBase).
• It can be deployed in a public cloud (i.e., DBaaS).
• However, one may not trust the public cloud provider.
• Our focus is on approaches that make BigTable-like systems secure.
Outline
• BigTable
• Security approaches:
– Integrity (iBigTable)
– Encryption (BigSecret)
– Access Control (Accumulo)
BigTable - Introduction
• Fay Chang et al., Bigtable: A Distributed Storage System for Structured Data, OSDI 2006 (Best Paper)
• Distributed storage system for managing structured data that is designed to scale to a very large size.
BigTable – Data Model
• BigTable is a sparse, distributed, persistent multidimensional sorted map.
• The map is indexed by a row key, column key, and a timestamp.
• (row_key, column_key, time) → string
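The mapping above can be sketched in a few lines of Python. This is only a toy model: an in-memory dict stands in for the distributed store, and the helper names `put` and `get_latest` are hypothetical, not part of BigTable's API.

```python
# Toy model of the data model: (row_key, column_key, time) -> string.
# The in-memory dict is an assumption made for illustration only.
table = {}


def put(row_key, column_key, timestamp, value):
    """Store one version of a cell."""
    table[(row_key, column_key, timestamp)] = value


def get_latest(row_key, column_key):
    """Return the value with the highest timestamp for a cell, or None."""
    versions = [
        (ts, value)
        for (row, col, ts), value in table.items()
        if row == row_key and col == column_key
    ]
    return max(versions)[1] if versions else None


put("29", "name", 1, "Bob")
put("29", "name", 2, "Robert")
print(get_latest("29", "name"))  # prints "Robert"
```

Both versions stay addressable by their full key, so `table[("29", "name", 1)]` still returns the older value.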
BigTable – Data Model
Example: two rows, keyed by user_id, with different columns.

user_id  name  phone
15       John  178145

user_id  name                   email
29       Bob (t1), Robert (t2)  [email protected]

Each value is addressed by row_key, column_key, and timestamp:
(29, name, t2) → "Robert"

RDBMS approach:

user_id  name  phone   email
15       John  178145  null
29       Bob   null    [email protected]
BigTable – Data Model
• Columns are grouped into column families:
– family:optional_qualifier

Example (each column key is a column family followed by an optional qualifier):

contactInfo:email  contactInfo:phone  name:  user_id
[email protected]  17814552           John   15

RDBMS approach:

user_id  name
15       John

user_id  type   value
15       phone  178145
15       email  [email protected]
BigTable – Data Model
• Key: Row-Key, Column (Family, Qualifier), Timestamp
• Value
• Sorting order:
– Row-Key → Family → Qualifier → Timestamp
BigTable – Data Model
• Tablets:
– Large tables are broken into tablets at row boundaries.
– A tablet holds a contiguous range of rows.
– Each tablet holds approximately 100-200 MB of data.
[Figure: a table sorted by id is split into Tablet 1 (ids 15000 to 20000) and Tablet 2 (ids 20001 to 25000).]
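Because each tablet holds a contiguous row range, finding the tablet for a row key reduces to a search over the range boundaries. The sketch below assumes the two tablets from the figure; `find_tablet` and the boundary list are hypothetical names used for illustration, and rows before the first tablet's start are not modeled.

```python
import bisect

# Last row key of each tablet, in sorted order (values taken from the figure).
tablet_end_keys = [20000, 25000]
tablet_names = ["Tablet 1", "Tablet 2"]


def find_tablet(row_key):
    """Return the tablet whose row range contains row_key, or None."""
    index = bisect.bisect_left(tablet_end_keys, row_key)
    if index == len(tablet_end_keys):
        return None  # the row key lies beyond the last tablet
    return tablet_names[index]


print(find_tablet(20000))  # prints "Tablet 1"
print(find_tablet(20001))  # prints "Tablet 2"
```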
BigTable – API
• Metadata operations:
– Creating and deleting tables and column families, changing access control rights.
• Client operations:
– Write/delete values
– Read values
– Scan row ranges

// Open the table
Table *T = OpenOrDie("/bigtable/users");

// Update name and delete a phone
RowMutation r1(T, "29");
r1.Set("name:", "Robert");
r1.Delete("contactInfo:phone");
Operation op;
Apply(&op, &r1);
BigTable – System Structure
• Three major components:
– Client library
– Master (exactly one):
  • Assigns tablets to tablet servers.
  • Detects the addition and expiration of tablet servers.
  • Balances tablet-server load.
  • Garbage-collects files in GFS.
  • Handles schema changes such as table and column family creation.
– Tablet servers (multiple, dynamically added):
  • Each manages 10-1000 tablets.
  • Handles read and write requests to its tablets.
  • Splits tablets that have grown too large.
BigTable – System Structure
BigTable – Tablet Location
• Three-level hierarchy analogous to that of a B+ tree to store tablet location information.
• Client library caches tablet locations.
BigTable – Tablet Serving
• Writes:
– Updates are committed to a commit log.
– Recently committed updates are stored in memory, in the memtable.
– Older updates are stored in a sequence of SSTables.
• Reads:
– A read is executed on a merged view of the sequence of SSTables and the memtable.
– Since the SSTables and the memtable are sorted, the merged view can be formed efficiently.
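The write and read paths described above can be sketched as follows. This is a simplified model, not BigTable's implementation: plain dicts stand in for the memtable and the SSTables, a list stands in for the commit log, and the function names are hypothetical. The `scan` helper uses `heapq.merge` to show why sorted inputs make the merged view cheap to build.

```python
import heapq

memtable = {}    # recently committed updates: key -> (timestamp, value)
sstables = []    # older updates, one dict per SSTable
commit_log = []  # stand-in for the durable commit log


def write(key, timestamp, value):
    commit_log.append((key, timestamp, value))  # commit to the log first
    memtable[key] = (timestamp, value)          # then apply to the memtable


def read(key):
    """Point read: the newest version across all stores wins."""
    candidates = [store[key] for store in [memtable, *sstables] if key in store]
    return max(candidates)[1] if candidates else None


def scan():
    """Merged view in key order, built from the sorted stores."""
    streams = [sorted(store.items()) for store in [memtable, *sstables]]
    result = {}
    for key, (timestamp, value) in heapq.merge(*streams):
        result[key] = value  # entries for one key arrive oldest first
    return list(result.items())


sstables.append({"29/name": (1, "Bob"), "15/name": (1, "John")})
write("29/name", 2, "Robert")
print(read("29/name"))  # prints "Robert"
```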
BigTable - Compactions
• Minor compaction:
– Converts the memtable into an SSTable.
– Reduces memory usage.
– Reduces log reads during recovery.
• Major compaction:
– A merging compaction that results in a single SSTable.
– The result contains no deletion records, only live data.
– A good place to apply a policy such as "keep only N versions".
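A minimal sketch of both compactions, under simplifying assumptions: each store maps a key to a list of (timestamp, value) versions, a deletion record is modeled as a value of `None`, and a deletion hides every older version of the same key. All names here are hypothetical.

```python
TOMBSTONE = None  # hypothetical marker for a deletion record


def minor_compaction(memtable, sstables):
    """Freeze the memtable into a new SSTable and free the memory."""
    sstables.append(dict(memtable))
    memtable.clear()


def major_compaction(sstables, max_versions):
    """Merge all SSTables into one that holds only live data."""
    merged = {}
    for sstable in sstables:
        for key, versions in sstable.items():
            merged.setdefault(key, []).extend(versions)

    compacted = {}
    for key, versions in merged.items():
        versions.sort(key=lambda version: version[0], reverse=True)  # newest first
        live = []
        for timestamp, value in versions:
            if value is TOMBSTONE:
                break  # everything older than the deletion is dropped
            live.append((timestamp, value))
        if live:
            compacted[key] = live[:max_versions]  # "keep only N versions"
    return [compacted]


old = [
    {"a": [(1, "x"), (2, "y")], "b": [(1, "old")]},
    {"a": [(3, "z")], "b": [(2, TOMBSTONE)]},
]
print(major_compaction(old, max_versions=2))
```

In this run key `b` disappears entirely because its newest record is a deletion, and key `a` keeps only its two newest versions.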
Outline
• BigTable √
• Security approaches:
– Integrity (iBigTable)
– Encryption (BigSecret)
– Access Control (Accumulo)
iBigTable - Introduction
• Wei Wei, Ting Yu, Rui Xue: iBigTable: practical data integrity for bigtable in public cloud. CODASPY 2013
• Enhancement of BigTable that provides scalable data integrity assurance.
iBigTable – System Model
[Figure: the data owner writes to BigTable; clients read from BigTable.]
iBigTable - Goals
• Correctness:
– Returned records have not been modified in any way.
• Completeness:
– No answers have been omitted from the result.
• Freshness:
– Results are based on the most current version of the data.
iBigTable – System Design
• Basic idea:
– Build a Merkle-hash-tree-based authenticated data structure for each tablet.
• Verification Object (VO): data returned along with the result and used to authenticate the result.
• Example: the VO for data block 1 is {Hash 0-1, Hash 1}.
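The VO example above can be reproduced with a plain binary Merkle hash tree over four data blocks. Note the assumptions: iBigTable uses a Merkle B+ tree, whereas this sketch uses a binary tree with SHA-256, requires a power-of-two number of blocks, and all function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(blocks):
    """Return all tree levels, leaf hashes first and the root hash last."""
    levels = [[h(block) for block in blocks]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def generate_vo(levels, index):
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    vo = []
    for level in levels[:-1]:
        sibling = index ^ 1
        vo.append((level[sibling], sibling < index))  # (hash, sibling is on the left)
        index //= 2
    return vo


def verify(block, vo, root_hash):
    """Recompute the root from the block and the VO, then compare."""
    digest = h(block)
    for sibling_hash, sibling_is_left in vo:
        digest = h(sibling_hash + digest) if sibling_is_left else h(digest + sibling_hash)
    return digest == root_hash


blocks = [b"block 1", b"block 2", b"block 3", b"block 4"]
levels = build_levels(blocks)
root_hash = levels[-1][0]
vo = generate_vo(levels, 0)  # VO for data block 1: {Hash 0-1, Hash 1}
print(verify(b"block 1", vo, root_hash))   # prints True
print(verify(b"tampered", vo, root_hash))  # prints False
```

The verifier only needs the trusted root hash; the VO grows with the height of the tree, not with the number of blocks.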
iBigTable – System Design
Merkle B+ Tree
iBigTable – System Design
[Figure: the root tablet, meta tablet, and user tablets form a single hierarchy; the data owner keeps one root hash.]
• Pros:
– Only one hash is maintained for all data.
• Cons:
– Requires update propagation.
– Concurrent updates could cause issues.
iBigTable – System Design
[Figure: the data owner keeps a separate root hash for each tablet (root tablet, meta tablet, user tablets, …).]
iBigTable – Reads
1.1 Client → tablet server serving the ROOT tablet: getMetaTabletLocation(table name, row key)
1.2 Tablet server: generate VO
1.3 Tablet server → client: meta tablet location, VO
1.4 Client: verify
2.1 Client → tablet server serving the META tablet: getUserTabletLocation(table name, row key)
2.2 Tablet server: generate VO
2.3 Tablet server → client: user tablet location, VO
2.4 Client: verify
3.1 Client → tablet server serving the USER tablet: getRow(row key)
3.2 Tablet server: generate VO
3.3 Tablet server → client: row data, VO
3.4 Client: verify
iBigTable – Updates
3.1 Data owner → tablet server serving the USER tablet: new/updated row
3.2 Tablet server: generate PT-VO
3.3 Tablet server → data owner: PT-VO
3.4 Data owner: verify and update the tablet root hash
Partial Tree Verification Object (PT-VO): unlike a VO, a PT-VO contains keys along with hashes.
iBigTable – Updates
[Figure: Initial MB+ row tree of a tablet in a tablet server.]
iBigTable – Updates
[Figure: Inserting a row with the new key 45 into the partial tree VO.]
[Figure: Partial tree VO after key 45 is inserted.]
iBigTable – Authenticated Data Structure
• Projected range queries - expensive to generate and verify VOs.
SL-MBT: A single-level Merkle B+ tree
iBigTable – Authenticated Data Structure
TL-MBT: A two-level Merkle B+ tree.
Outline
• BigTable √
• Security approaches:
– Integrity (iBigTable) √
– Encryption (BigSecret)
– Access Control (Accumulo)
BigSecret - Introduction
• Erman Pattuk et al., BigSecret: A Secure Data Management Framework for Key-Value Stores. IEEE CLOUD 2013
• A secure data management framework for BigTable-like storage systems.
BigSecret – System Model
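[Figure: clients talk to BigSecret, which sits in front of BigTable. A client call get("Bob", "email") is forwarded to BigTable as Get("A4Vc", "Zx$23"); BigTable returns "DF77Xs9", which BigSecret returns to the client as "[email protected]".]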
BigSecret – Goals
• Secure storage of data on untrusted servers.
• Efficient query execution on encrypted data.
• Supported queries:
– Put
– Get
– Delete
– Scan
BigSecret – Preliminaries
• Key:
– row||fam||qua||ts
• Symmetric encryption:
– E(p) → c // encryption
– D(c) → p // decryption
• Pseudo-random functions (PRF):
– H(m) → h // deterministic, pseudo-random
• Bucketization:
– Partitions p1, p2, … of a domain Z.
– An Ident function that assigns a unique random identifier to each partition.
– A Map function that takes a partitioned domain and a value v from the domain, and returns Ident(p), where v belongs to p.
BigSecret – Bucketization
[Figure: the domain 0 to 10000 is split at 2000, 4000, 6000, and 8000 into five partitions with identifiers 34, 97, 123, 266, and 771.]
Map(100) = 34, Map(6451) = 266
Order-preserving mapping: x < y ⇒ Map(x) ≤ Map(y)
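The Map function for this example can be sketched with a binary search over the split points. The split points and identifiers come from the figure; assigning each split point to the partition on its right is an assumption, and `bucket_map` is a hypothetical name.

```python
import bisect

# Split points of the domain [0, 10000] and one identifier per partition.
split_points = [2000, 4000, 6000, 8000]
identifiers = [34, 97, 123, 266, 771]


def bucket_map(value):
    """Return the identifier of the partition that contains value."""
    if not 0 <= value <= 10000:
        raise ValueError("value outside the domain")
    return identifiers[bisect.bisect_right(split_points, value)]


print(bucket_map(100))   # prints 34
print(bucket_map(6451))  # prints 266
```

Values in the same partition share an identifier, so a row range such as 200 to 300 collapses to the single bucket 34. The server then returns the whole bucket and the client filters out the rows it did not ask for.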
BigSecret – Encryption Models
Naive approach: encrypt values only
Client → BigSecret: Put(row, fam, qua, ts, value)
BigSecret → BigTable: Put(row, fam, qua, ts, E(value))
On reads, BigTable returns E(value) and BigSecret returns D(E(value)) to the client.
– All operations are supported.
– Relatively good performance.
– Only minor changes to the system are required.
– Poor privacy.
BigSecret – Encryption Models
Model-1: bucketization for all key parts
Client → BigSecret: Put(row, fam, qua, ts, value)
BigSecret → BigTable: Put(Map(row), Map(fam), Map(qua)||E(key), Map(ts), E(value))
– All operations are supported.
– Relatively bad performance.
– Privacy-performance trade-off.
Scan example:
Scan(row_from, row_to, fam) → Scan(Map(row_from), Map(row_to), Map(fam))
Scan(200, 300, contactInfo) → Scan(34, 34, 452)
BigSecret – Encryption Models
Model-2: PRF for all key parts
Client → BigSecret: Put(row, fam, qua, ts, value)
BigSecret → BigTable: Put(H(row), H(fam), H(qua)||E(key), H(ts), E(value))
– Scan is not supported.
– Relatively good performance.
– Vulnerable to frequency-based attacks.
Get example:
Get(row, fam, qua) → Get(H(row), H(fam), H(qua))
Get(200, contactInfo, email) → Get(Az54Et, q8dj8, qWd29h)
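A minimal sketch of the Get translation in Model-2. The source does not say which PRF BigSecret uses; HMAC-SHA256 is swapped in here as a standard keyed PRF, and the key, the 8-character truncation, and the function names are all assumptions made for illustration.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key"  # hypothetical; a real key would be random and kept secret


def prf(message: str, length: int = 8) -> str:
    """Deterministic keyed function H(m), built from HMAC-SHA256."""
    digest = hmac.new(SECRET_KEY, message.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


def encode_get(row, fam, qua):
    """Translate Get(row, fam, qua) into Get(H(row), H(fam), H(qua))."""
    return prf(row), prf(fam), prf(qua)


print(encode_get("200", "contactInfo", "email"))
```

Because H is deterministic, the same plaintext key always maps to the same token, which is what makes exact-match Get possible. The tokens carry no order information, which is why Scan is not supported in this model.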
BigSecret – Encryption Models
Frequency-based attacks (Damiani et al. 2003)

Plaintext table:

id  name   city
19  Alice  Tel-Aviv
24  Bob    New York
32  Carol  Paris
38  Alice  New York

Encrypted table:

id  name  city
j   27    $
a   14    &
t   23    *
z   27    &

An attacker who knows the value frequencies can infer that 27 = "Alice" and & = "New York", and therefore that Alice lives in NY.

Possible solutions:
• Decreasing the range of the PRFs.
• Model-3
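The inference in this example can be sketched as a simple frequency match. The tokens are the ones shown in the example; the attacker's background knowledge (that "New York" and "Alice" are the most common values) is an assumption of the sketch, and `most_frequent` is a hypothetical helper.

```python
from collections import Counter

# Encrypted (city, name) columns as the server sees them.
encrypted_rows = [("$", "27"), ("&", "14"), ("*", "23"), ("&", "27")]

# Assumed background knowledge of the attacker.
most_common_city = "New York"
most_common_name = "Alice"


def most_frequent(values):
    return Counter(values).most_common(1)[0][0]


city_token = most_frequent(city for city, _ in encrypted_rows)
name_token = most_frequent(name for _, name in encrypted_rows)
guesses = {city_token: most_common_city, name_token: most_common_name}

# Any row that contains both tokens links the two plaintext values.
linked = [row for row in encrypted_rows if row == (city_token, name_token)]
print(guesses)  # prints {'&': 'New York', '27': 'Alice'}
print(linked)   # prints [('&', '27')]
```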
BigSecret – Encryption Models
Model-3: PRF only for the row key
Client → BigSecret: Put(row, fam, qua, ts, value)
BigSecret → BigTable: Put(H(row), 0, E(key), 1, E(value))
– Scan is not supported.
– Relatively good privacy.
– Performance?
Get example:
Get(row, fam, qua) → Get(H(row), 0, null)
Get(200, contactInfo, email) → Get(Az54Et, 0, null)
BigSecret – Encryption Models
Outline
• BigTable √
• Security approaches:
– Integrity (iBigTable) √
– Encryption (BigSecret) √
– Access Control (Accumulo)
Accumulo- Introduction
• Adam Fuchs, Apache Accumulo: Extensions to Google's Bigtable Design, 2012, lecture conducted from Morgan State University
• An extension of BigTable that provides cell-level access control.
Accumulo – System Model
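Example BigTable contents:

Row  Family       Qualifier       Value
14   name                         Bob
14   contactInfo  email           [email protected]
14   healthData   blood test      sodium : 137 …
14   healthData   doctor's notes  Patient suffers from …
…    …            …               …

[Figure: different users need access to different cells, e.g. Bob to the email and the blood test, another user to the blood test and the notes.]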
Accumulo – System Model
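[Figure: the client sends credentials and a query; a user lookup returns the user's authorization set; the authorizations and the query are passed to BigTable; the matching data is returned to the client.]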
Accumulo- Data Model
• Accumulo key: Row-Key, Column (Family, Qualifier, Visibility), Timestamp
• BigTable key: Row-Key, Column (Family, Qualifier), Timestamp
• Visibility holds security labels (e.g., A|(B&C)).
Accumulo- Visibility
• Syntax:
– A&B: both A and B are required
– A|B: either A or B is required
– A|(B&C): A, or both B and C, are required
• Examples:
– Admin|(Manager & Sales)
– Citizen & Adult
– Secret | Top Secret
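The syntax above can be evaluated against a user's authorization set with a small recursive parser. This is an illustrative sketch, not Accumulo's implementation: `is_visible` is a hypothetical name, labels are assumed to be single words, and operators are applied left to right without precedence, so mixed expressions should be parenthesized as in the examples.

```python
import re


def is_visible(expression, authorizations):
    """Return True if the authorizations satisfy a label such as 'A|(B&C)'."""
    tokens = re.findall(r"[A-Za-z0-9_]+|[&|()]", expression)
    position = 0

    def parse_term():
        nonlocal position
        token = tokens[position]
        position += 1
        if token == "(":
            value = parse_expression()
            position += 1  # skip the closing ")"
            return value
        return token in authorizations

    def parse_expression():
        nonlocal position
        value = parse_term()
        while position < len(tokens) and tokens[position] in ("&", "|"):
            operator = tokens[position]
            position += 1
            right = parse_term()
            value = (value and right) if operator == "&" else (value or right)
        return value

    return parse_expression()


print(is_visible("A|(B&C)", {"B", "C"}))  # prints True
print(is_visible("A|(B&C)", {"B"}))       # prints False
```

A cell is returned to a user only when its visibility label evaluates to true for that user's authorization set.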
Accumulo- Visibility
Row  Family       Qualifier       Visibility    Value
14   name                                       Bob
14   contactInfo  email           bob14         [email protected]
14   healthData   blood test      bob14|doctor  sodium : 137 …
14   healthData   doctor's notes  doctor        Patient suffers from …
…    …            …               …             …
Accumulo – Visibility
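[Figure: Bob sends (bob, ***) and a query for health data. The user lookup returns the authorization set {bob14}, and the query is passed to BigTable together with {bob14}. Only the blood test is returned.]

Family      Qualifier   Visibility
HealthData  notes       doctor
HealthData  blood test  bob14|doctor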
Accumulo- Iterators
Iterator
Accumulo- Iterators
Outline
• BigTable √
• Security approaches:
– Integrity (iBigTable) √
– Encryption (BigSecret) √
– Access Control (Accumulo) √
References
• Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Michael Burrows, Tushar Chandra, Andrew Fikes, Robert Gruber: Bigtable: A Distributed Storage System for Structured Data (Awarded Best Paper!). OSDI 2006:205-218
• Wei Wei, Ting Yu, Rui Xue: iBigTable: practical data integrity for bigtable in public cloud. CODASPY 2013:341-352
• Erman Pattuk, Murat Kantarcioglu, Vaibhav Khadilkar, Huseyin Ulusoy, Sharad Mehrotra: BigSecret: A Secure Data Management Framework for Key-Value Stores. IEEE CLOUD 2013:147-154
• http://accumulo.apache.org/