C-Store: An Introduction to Berkeley DB Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Mar. 13, 2009

Overview of Berkeley DB

Berkeley DB stands for the Berkeley Database: an open-source, embedded, transactional data management system, and a key/value store.

Embedded? It is a library that is linked with an application, and it hides data management from the end user.

It scales from bytes to petabytes and runs on everything from cell phones to large servers.
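To make "embedded" concrete, here is a minimal sketch that uses Python's standard `dbm.dumb` module as a stand-in. It is a descendant of the dbm interface, not Berkeley DB itself. The database is opened through ordinary library calls inside the application's own process, with no separate server. The key `b"user:42"` and the file name are illustrative.

```python
import dbm.dumb
import os
import tempfile

# dbm.dumb stands in for Berkeley DB here only to illustrate the embedded
# model: no server process, just library calls on files the application owns.
def demo(path: str) -> bytes:
    # Open (creating if needed) a database file inside this process.
    with dbm.dumb.open(path, "c") as db:
        db[b"user:42"] = b"alice"          # store a key/value pair
    # Reopen read-only: the data survived because the library wrote it to disk.
    with dbm.dumb.open(path, "r") as db:
        return db[b"user:42"]

with tempfile.TemporaryDirectory() as tmp:
    value = demo(os.path.join(tmp, "accounts"))
print(value)  # b'alice'
```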

Berkeley DB: Examples of Applications

Google Accounts: stores all user and service account information and preferences.

Amazon’s user customization.

Berkeley DB offers high reliability and high performance.

Berkeley DB: A Brief History (1)

Began life in 1991 as a dynamic linear hashing implementation, designed to replace the historic UNIX database libraries dbm, ndbm and hsearch.

Released as a library in 4.4BSD in 1992.

db-1.85 = Hash + B-Tree

LIBTP: a transactional implementation of db-1.85. It was a research prototype that was never released.
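Linear hashing grows a hash table one bucket at a time instead of doubling it all at once. The following minimal in-memory sketch shows the general technique and is not Berkeley DB's implementation. The hash function (`zlib.crc32`), the initial bucket count and the load threshold are arbitrary choices made for illustration.

```python
import zlib

class LinearHashTable:
    """Toy in-memory linear hashing: the table grows one bucket per split."""

    def __init__(self, initial_buckets: int = 2, max_load: float = 2.0):
        self.n0 = initial_buckets      # number of buckets at level 0
        self.level = 0                 # number of completed doublings
        self.split = 0                 # next bucket to be split
        self.max_load = max_load       # average entries per bucket before a split
        self.buckets = [[] for _ in range(initial_buckets)]
        self.count = 0

    def _addr(self, key: bytes) -> int:
        h = zlib.crc32(key)
        size = self.n0 * (2 ** self.level)
        b = h % size
        if b < self.split:             # bucket already split in this round
            b = h % (size * 2)
        return b

    def get(self, key: bytes):
        for k, v in self.buckets[self._addr(key)]:
            if k == key:
                return v
        return None

    def put(self, key: bytes, value: bytes) -> None:
        bucket = self.buckets[self._addr(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))
        self.count += 1
        if self.count / len(self.buckets) > self.max_load:
            self._split_one()

    def _split_one(self) -> None:
        size = self.n0 * (2 ** self.level)
        old = self.buckets[self.split]
        self.buckets.append([])        # new bucket lands at index split + size
        self.buckets[self.split] = []
        for k, v in old:               # rehash with the next level's modulus
            self.buckets[zlib.crc32(k) % (size * 2)].append((k, v))
        self.split += 1
        if self.split == size:         # every bucket split: start the next round
            self.level += 1
            self.split = 0
```

After any number of inserts the table holds `n0 * 2**level + split` buckets. Each split rehashes only the bucket under the split pointer, which is what keeps growth incremental.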

Berkeley DB: A Brief History (2)

In 1996, Seltzer and Bostic started Sleepycat Software, initially to support the use of Berkeley DB in the Netscape browser.

Berkeley DB 2.0, released in 1997: a transactional implementation and the first commercial release.

Berkeley DB 3.0, released in 1999: transformed into an object-oriented, handle-and-method style API.

Berkeley DB: A Brief History (3)

Berkeley DB 4.0, released in 2001: single-master, multiple-reader replication.

High availability: replicas can take over for a failed master.

High scalability: read-only replicas can reduce the master's load. C-Store adopts similar ideas.

In Feb. 2006, Oracle acquired Sleepycat.

Sleepycat Public License: a Dual License

The code is open source and may be downloaded and used freely.

However, redistribution requires either that the package using Berkeley DB be released as open source, or that the distributors obtain a commercial license from Sleepycat (now Oracle, which acquired Sleepycat in Feb. 2006).

Berkeley DB: Product Family Today

The original Berkeley DB library.

Berkeley DB XML, built atop the library.

Berkeley DB Java Edition, a 100% pure Java implementation.

Berkeley DB: Product Family Architecture

[Figure: Berkeley DB product family architecture]

Berkeley DB: The Design Philosophy

Provide mechanisms without specifying policies.

For example, Berkeley DB is abstracted as a store of <key, value> pairs. Both keys and values are opaque byte strings. In other words, Berkeley DB has no schema, and the application that embeds Berkeley DB is responsible for imposing its own schema on the data.
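The following minimal sketch shows what "the application imposes its own schema" means, assuming a hypothetical account record with an id, a balance and a name. The application packs typed fields into byte strings before storing them and unpacks them after reading. A plain `dict` stands in for the key/value store, and Python's `struct` module does the encoding.

```python
import struct

# Hypothetical schema for an "account" record. The store only ever sees
# the opaque bytes these helpers produce.
_VALUE = struct.Struct(">qH")  # balance in cents (int64), name length (uint16)

def encode_account(acct_id: int, balance_cents: int, name: str) -> tuple[bytes, bytes]:
    """Turn typed fields into an opaque (key, value) pair of byte strings."""
    # Big-endian unsigned ids, so byte-wise key order equals numeric order.
    key = struct.pack(">I", acct_id)
    name_bytes = name.encode("utf-8")
    value = _VALUE.pack(balance_cents, len(name_bytes)) + name_bytes
    return key, value

def decode_account(key: bytes, value: bytes) -> tuple[int, int, str]:
    """Reapply the application's schema to bytes read back from the store."""
    (acct_id,) = struct.unpack(">I", key)
    balance_cents, name_len = _VALUE.unpack_from(value, 0)
    name = value[_VALUE.size:_VALUE.size + name_len].decode("utf-8")
    return acct_id, balance_cents, name

store: dict[bytes, bytes] = {}   # stand-in for a schema-less key/value store
k, v = encode_account(7, 12_345, "Ada")
store[k] = v
print(decode_account(k, store[k]))  # (7, 12345, 'Ada')
```

Encoding the id big-endian is a deliberate choice. If the store orders keys by comparing bytes, as a B-Tree with a byte-wise comparator would, then key order matches numeric order for non-negative ids.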

Advantages of <key, value> Pairs

An application is free to store data in whatever form is most natural to it:

objects (like structures in the C language);

rows, as in Oracle and SQL Server;

columns, as in C-Store.

Different data formats can be stored in the same database, as long as the application understands how to interpret the data items.
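The store never interprets values, so one database can hold records in several formats at once. The sketch below follows a hypothetical convention. Keys prefixed `row:` hold a whole record encoded as JSON. Keys prefixed `col:` hold one column's values packed as 32-bit integers. The application's `decode` function dispatches on the prefix.

```python
import json
import struct

# Hypothetical key-prefix convention: the application tags each key so it
# knows how to decode the value. The store treats both kinds as raw bytes.
def put_row(store: dict, row_id: int, row: dict) -> None:
    # Row format: one whole record per key, encoded here as JSON.
    store[b"row:" + struct.pack(">I", row_id)] = json.dumps(row).encode()

def put_column(store: dict, column: str, values: list[int]) -> None:
    # Column format: all values of one attribute packed as int32s under one key.
    store[b"col:" + column.encode()] = struct.pack(f">{len(values)}i", *values)

def decode(key: bytes, value: bytes):
    """Interpret a value according to the format its key prefix announces."""
    if key.startswith(b"row:"):
        return json.loads(value)
    if key.startswith(b"col:"):
        return list(struct.unpack(f">{len(value) // 4}i", value))
    raise ValueError(f"unknown key format: {key!r}")

store: dict[bytes, bytes] = {}
put_row(store, 1, {"name": "Ada", "age": 36})
put_column(store, "age", [36, 41, 29])
for key in sorted(store):
    print(key, decode(key, store[key]))
```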

Indexing Key Values

Indexing methods:

B-Tree;

Hash;

Queue;

a record-number-based index implemented atop B-Tree (Recno).

Data manipulation:

Put: store key/value pairs.

Get: retrieve key/value pairs.

Delete: remove key/value pairs.
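The data-manipulation operations can be sketched as follows. This is a stand-in, not a B-Tree. A sorted Python list kept in order with `bisect` provides the same put/get/delete semantics and the ordered range scans that distinguish a B-Tree index from a Hash index. Pages, node splits and persistence are omitted, and the class and method names are hypothetical.

```python
import bisect
from typing import Iterator, Optional

class SortedIndex:
    """Stand-in for a B-Tree access method: keys kept in byte order.

    A real B-Tree stores keys in fixed-size pages. A sorted Python list is
    used here only to show the operations and the ordering they provide.
    """

    def __init__(self) -> None:
        self._keys: list[bytes] = []
        self._vals: list[bytes] = []

    def _find(self, key: bytes) -> int:
        return bisect.bisect_left(self._keys, key)

    def put(self, key: bytes, value: bytes) -> None:
        i = self._find(key)
        if i < len(self._keys) and self._keys[i] == key:
            self._vals[i] = value             # overwrite an existing key
        else:
            self._keys.insert(i, key)         # insert in sorted position
            self._vals.insert(i, value)

    def get(self, key: bytes) -> Optional[bytes]:
        i = self._find(key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._vals[i]
        return None

    def delete(self, key: bytes) -> bool:
        i = self._find(key)
        if i < len(self._keys) and self._keys[i] == key:
            del self._keys[i], self._vals[i]
            return True
        return False

    def range(self, lo: bytes, hi: bytes) -> Iterator[tuple[bytes, bytes]]:
        """Yield pairs with lo <= key < hi; a hash index cannot do this cheaply."""
        i = self._find(lo)
        while i < len(self._keys) and self._keys[i] < hi:
            yield self._keys[i], self._vals[i]
            i += 1
```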

How Do Applications Access Key/Value Pairs?

Through handles on databases, which are similar to relational tables.

Or through cursor handles, which represent a specific place within a database and are used for iteration, i.e., fetching one key/value pair at a time.

Databases are implemented atop the OS file system. A file may contain one or more databases.
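Below is a minimal sketch of the two kinds of handles, assuming an in-memory sorted list as the database. `Database` plays the role of a database handle. `Cursor` remembers a position and returns one key/value pair per call. The method names only loosely echo Berkeley DB's `DB_FIRST`, `DB_NEXT` and `DB_SET_RANGE` cursor flags. They are hypothetical and not its API.

```python
import bisect
from typing import Optional

Pair = tuple[bytes, bytes]

class Database:
    """Toy database handle: a sorted list of (key, value) pairs."""

    def __init__(self, pairs: dict[bytes, bytes]) -> None:
        self.items: list[Pair] = sorted(pairs.items())

    def cursor(self) -> "Cursor":
        return Cursor(self)

class Cursor:
    """Toy cursor handle: remembers one position inside a Database."""

    def __init__(self, db: Database) -> None:
        self.db = db
        self.pos = -1                        # not yet positioned

    def _current(self) -> Optional[Pair]:
        if 0 <= self.pos < len(self.db.items):
            return self.db.items[self.pos]
        return None                          # ran off the end

    def first(self) -> Optional[Pair]:
        self.pos = 0
        return self._current()

    def next(self) -> Optional[Pair]:
        self.pos += 1                        # from unpositioned, this is the first pair
        return self._current()

    def set_range(self, key: bytes) -> Optional[Pair]:
        # Position on the smallest key >= key.
        keys = [k for k, _ in self.db.items]
        self.pos = bisect.bisect_left(keys, key)
        return self._current()

db = Database({b"cherry": b"3", b"apple": b"1", b"banana": b"2"})
cur = db.cursor()
pair = cur.first()
while pair is not None:                      # fetch one key/value pair per call
    print(pair)
    pair = cur.next()
```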

Berkeley DB Replication: A Log-Shipping System

A replication group consists of a single master and one or more read-only replicas.

All write operations must be processed transactionally by the master.

The master sends log records to each of the replicas.

The replicas apply log records only when they receive a transaction commit record.
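The log-shipping protocol can be sketched in a few classes. This is a single-process simplification: records are delivered by direct method calls rather than over a network, transactions never abort, and the class names and the `"put"`/`"commit"` record kinds are hypothetical. The sketch shows that a replica buffers a transaction's records and applies them only when the commit record arrives.

```python
from dataclasses import dataclass, field

@dataclass
class LogRecord:
    txn_id: int
    kind: str                 # "put" or "commit" (hypothetical record kinds)
    key: bytes = b""
    value: bytes = b""

@dataclass
class Replica:
    data: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)   # txn_id -> buffered records

    def receive(self, rec: LogRecord) -> None:
        if rec.kind == "commit":
            # Apply the whole transaction only once its commit record arrives.
            for r in self.pending.pop(rec.txn_id, []):
                self.data[r.key] = r.value
        else:
            self.pending.setdefault(rec.txn_id, []).append(rec)

@dataclass
class Master:
    replicas: list
    data: dict = field(default_factory=dict)
    next_txn: int = 1

    def transaction(self, writes: dict) -> int:
        """All writes go through a master-side transaction (no aborts here)."""
        txn = self.next_txn
        self.next_txn += 1
        for key, value in writes.items():
            self.data[key] = value
            self._ship(LogRecord(txn, "put", key, value))
        self._ship(LogRecord(txn, "commit"))
        return txn

    def _ship(self, rec: LogRecord) -> None:
        for replica in self.replicas:       # send each log record to every replica
            replica.receive(rec)
```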

Berkeley DB: Configuration Flexibility

Configuration flexibility is critical because Berkeley DB serves a wide range of applications.

Three ways:

compile-time configuration;

feature set selection;

runtime configuration.

Compile Time Configuration

Option 1: small footprint build (--enable-smallbuild). For use in a cell phone. The compiled library contains only the B-Tree index and omits replication, cryptography, statistics collection, etc. The library is about 0.5 MB.

Option 2: higher concurrency locking (--enable-fine-grained-lock-manager). For use in a data center, with lock-based concurrency control.

Feature Set Selection

1. The Data Store (DS) feature set: most similar to the original db-1.85 library; good for temporary data storage.

2. The Concurrent Data Store (CDS) feature set: acquires a single lock per API invocation; good for read-mostly applications.

3. The Transactional Data Store (TDS) feature set: currently the most widely used feature set; locks at the granularity of a page.

4. The High Availability (HA) feature set: can continue running even after a site fails.
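The difference in lock granularity between CDS and TDS can be sketched as follows. Both classes are hypothetical simplifications. The CDS-style store uses one mutex per API call, whereas real CDS allows many concurrent readers alongside a single writer. The TDS-style store hashes keys onto a fixed number of pretend "pages", each with its own lock, and releases that lock at the end of the call. Two-phase locking would instead hold it until the transaction ends.

```python
import threading
import zlib

class SingleLockStore:
    """CDS-style sketch: one lock guards the whole store for each API call."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        with self._lock:                  # one lock per API invocation
            self._data[key] = value

    def get(self, key: bytes):
        with self._lock:
            return self._data.get(key)

class PageLockStore:
    """TDS-style sketch: keys map to pretend "pages", each with its own lock."""

    def __init__(self, num_pages: int = 8) -> None:
        self._locks = [threading.Lock() for _ in range(num_pages)]
        self._pages: list[dict[bytes, bytes]] = [{} for _ in range(num_pages)]

    def page_of(self, key: bytes) -> int:
        return zlib.crc32(key) % len(self._pages)

    def put(self, key: bytes, value: bytes) -> None:
        p = self.page_of(key)
        with self._locks[p]:              # only this page is locked
            self._pages[p][key] = value

    def get(self, key: bytes):
        p = self.page_of(key)
        with self._locks[p]:
            return self._pages[p].get(key)
```

With page-level locks, two writers touching different pages can proceed in parallel, whereas the single-lock store serializes every call. That extra concurrency is what the finer-grained locking buys.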

Runtime Configuration

Index selection and tuning: applications can select the page size in an index.

Trading off durability and performance: no-force log writes. In the extreme case, applications can run completely in memory.

Trading off two-phase locking and multiversion concurrency control.

Note: C-Store adopts similar ideas for high performance.
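The durability/performance trade-off comes down to what happens to the log at commit time. This minimal sketch assumes a simple append-only log file with one record per line. The `WriteAheadLog` class and its `sync_on_commit` switch are hypothetical and are not Berkeley DB's configuration API. Forcing the log calls `os.fsync` on every commit, the no-force setting leaves records buffered, and passing `path=None` keeps the log entirely in memory.

```python
import os
import tempfile
from typing import Optional

class WriteAheadLog:
    """Sketch of the durability/performance knob applied at commit.

    sync_on_commit=True  -> force the log to stable storage at every commit.
    sync_on_commit=False -> "no-force": the record stays buffered and reaches
                            disk later; faster, but a crash may lose recent commits.
    path=None            -> purely in-memory log: fastest, nothing survives.
    """

    def __init__(self, path: Optional[str], sync_on_commit: bool = True) -> None:
        self.sync_on_commit = sync_on_commit
        self.mem: list[bytes] = []
        self.f = open(path, "ab") if path is not None else None
        self.fsync_calls = 0

    def commit(self, record: bytes) -> None:
        line = record + b"\n"
        if self.f is None:
            self.mem.append(line)          # in-memory mode
            return
        self.f.write(line)
        if self.sync_on_commit:
            self.f.flush()                 # push Python's buffer to the OS
            os.fsync(self.f.fileno())      # ask the OS to write it to disk
            self.fsync_calls += 1

    def close(self) -> None:
        if self.f is not None:
            self.f.close()                 # close() flushes any buffered records

def demo() -> tuple[int, int, bytes]:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "log")
        forced = WriteAheadLog(path, sync_on_commit=True)
        forced.commit(b"txn-1")
        forced.commit(b"txn-2")
        forced.close()
        lazy = WriteAheadLog(path, sync_on_commit=False)
        lazy.commit(b"txn-3")
        lazy.close()
        with open(path, "rb") as fh:
            return forced.fsync_calls, lazy.fsync_calls, fh.read()
```

Each forced commit pays for a disk synchronization. The no-force log skips that cost and gives up the guarantee that a commit which has returned will survive a crash.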

Challenges of Berkeley DB’s Flexibility

Demands flexibility from Berkeley DB's designers.

Demands flexibility from application developers.

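Any Dreams? Any Ideas?

iGoogle China College Student Innovation Design Contest

The 4th Software Innovation Design Contest, School of Software, Sun Yat-sen University

Want to do some research with me?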

References

M. Seltzer. Berkeley DB: A Retrospective. IEEE Data Engineering Bulletin, Vol. 30, No. 3, pp. 21-28, September 2007.

M. A. Olson, K. Bostic, M. Seltzer. Berkeley DB. USENIX Annual Technical Conference, pp. 183-192, Monterey, California, USA, June 6-11, 1999.

Oracle Berkeley DB site. http://www.oracle.com/technology/products/berkeley-db