Berkeley DB Reference Guide: Introduction

What is Berkeley DB?

Berkeley DB is an embeddable database system that supports keyed access to data. The software is distributed in source code form, and developers can compile and link the source code into a single library for inclusion directly in their applications.

Developers may choose to store data in any of several different storage structures to satisfy the requirements of a particular application. In database terminology, these storage structures and the code that operates on them are called access methods. The library includes support for the following access methods:

B+tree: Stores keys in sorted order, using either a programmer-supplied ordering function or a default function that does lexicographical ordering of keys. Applications may perform equality or range searches.
Hashing: Stores records in a hash table for fast searches based on strict equality. Extended Linear Hashing modifies the hash function used by the table as new records are inserted, in order to keep buckets underfull in the steady state.
Fixed and Variable-Length Records: Stores fixed- or variable-length records in sequential order. Record numbers may be mutable or immutable, that is, the database may either permit new records to be inserted between existing records or require that new records be added only at the end of the database.
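The practical difference between these access methods can be sketched in a few lines of Python. This is only an illustration of the access patterns and not Berkeley DB code: `SortedStore` is a hypothetical stand-in that keeps keys in a sorted list (not a real B+tree), a plain `dict` stands in for a hash table, and a `list` stands in for a record-number database.

```python
import bisect


class SortedStore:
    """Toy stand-in for a sorted-key access method (not a real B+tree)."""

    def __init__(self):
        self._keys = []    # keys kept in lexicographic order
        self._values = {}  # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        # Equality search.
        return self._values.get(key)

    def range(self, low, high):
        # Range search: every pair with low <= key <= high, in key order.
        start = bisect.bisect_left(self._keys, low)
        stop = bisect.bisect_right(self._keys, high)
        return [(k, self._values[k]) for k in self._keys[start:stop]]


names = (b"pear", b"apple", b"fig", b"cherry")

# Sorted keys support both equality and range searches.
store = SortedStore()
for name in names:
    store.put(name, len(name))
in_range = store.range(b"b", b"g")

# A hash table supports equality searches only; it has no useful key order.
hashed = {name: len(name) for name in names}

# Record numbers: appending leaves existing numbers unchanged (immutable
# numbering); inserting in the middle renumbers later records (mutable).
records = ["first", "second", "third"]
records.append("fourth")       # "second" is still at index 1
records.insert(1, "inserted")  # "second" moves to index 2
```

In this sketch the range query returns the keys between `b"b"` and `b"g"` in sorted order, something the hash table cannot do without scanning every entry.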

Berkeley DB also provides core database services to developers. These services include:

Page cache management: The page cache keeps recently used database pages in memory for fast access. It handles the associated I/O, ensuring that dirty pages are written back to the file system and that new pages are allocated in unused places.
Transactions: The transaction system provides recoverability and atomicity for multiple database operations. The transaction system uses two-phase locking and write-ahead logging protocols to ensure that database operations may be undone or redone in the case of application or system failure.
Locking: The locking system provides multiple reader or single writer access to objects. The Berkeley DB access methods use the locking system to acquire the right to read or write database pages.
Logging: The logging system implements the write-ahead log, so that changes to database pages are captured in a separate log file. Log records are always written to stable storage before the changed data pages they describe, guaranteeing that the database state can be restored to either its pre-change or post-change state even after a system crash or hard-disk failure.
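The write-ahead rule described above can be sketched as a toy model. This is not how Berkeley DB is implemented: `TinyWAL`, its in-memory "disk", and the record layout are all hypothetical. The sketch also assumes that no two active transactions update the same page, which is the situation two-phase locking is meant to guarantee.

```python
class TinyWAL:
    """Toy write-ahead log over an in-memory 'disk' of pages."""

    def __init__(self):
        self.log = []    # stable log: survives a simulated crash
        self.disk = {}   # stable data pages: survive a simulated crash
        self.cache = {}  # volatile page cache: lost on a crash

    def write(self, txn, page, new_value):
        old_value = self.cache.get(page, self.disk.get(page))
        # The log record is appended before the page itself is changed.
        self.log.append(("update", txn, page, old_value, new_value))
        self.cache[page] = new_value

    def commit(self, txn):
        self.log.append(("commit", txn))

    def flush(self, page):
        # Write a dirty page back; its log record is already in the log.
        self.disk[page] = self.cache[page]

    def crash(self):
        self.cache = {}

    def recover(self):
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        # Redo the updates of committed transactions, in log order.
        for rec in self.log:
            if rec[0] == "update" and rec[1] in committed:
                self.disk[rec[2]] = rec[4]
        # Undo the updates of uncommitted transactions, in reverse order.
        for rec in reversed(self.log):
            if rec[0] == "update" and rec[1] not in committed:
                if rec[3] is None:
                    self.disk.pop(rec[2], None)  # page did not exist before
                else:
                    self.disk[rec[2]] = rec[3]


wal = TinyWAL()
wal.write("t1", "page1", "alice")
wal.commit("t1")              # committed, but page1 never reached the disk
wal.write("t2", "page2", "bob")
wal.flush("page2")            # uncommitted change did reach the disk
wal.crash()
wal.recover()
```

After recovery in this sketch, the committed change to `page1` has been redone from the log and the uncommitted change to `page2` has been undone, so each page is in either its pre-change or post-change state.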

By combining the page cache, transaction, locking, and logging systems, Berkeley DB provides the same services found in much larger, more complex, and more expensive database systems. Berkeley DB supports multiple simultaneous readers and writers and guarantees that all changes are recoverable, even in the case of a catastrophic hardware failure during a database update.

Developers may select some or all of the core database services for any access method or database. Therefore, it is possible to choose the appropriate storage structure and the right degrees of concurrency and recoverability for any application.

In addition, some of the subsystems (for example, the locking subsystem) can be called separately from the Berkeley DB access methods. As a result, developers can integrate non-database objects into their transactional applications using Berkeley DB.
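To illustrate what guarding a non-database object with a multiple-reader, single-writer lock looks like, here is a minimal standalone sketch. It does not use the Berkeley DB locking API: `ReadWriteLock` and the `counter` object are hypothetical, built only on Python's standard `threading` module.

```python
import threading


class ReadWriteLock:
    """Toy lock allowing many concurrent readers or one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # number of readers currently holding the lock
        self._writer = False   # True while a writer holds the lock

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


# A non-database object shared between threads.
counter = {"hits": 0}
lock = ReadWriteLock()


def bump():
    for _ in range(1000):
        lock.acquire_write()
        try:
            counter["hits"] += 1
        finally:
            lock.release_write()


threads = [threading.Thread(target=bump) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

This simple version lets a steady stream of readers starve writers; a production lock manager would also need fairness and deadlock handling, which the sketch leaves out.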

Berkeley DB includes callable APIs in C, C++ and Java. Callable APIs for Tcl, Python and Perl are separately available.

The Berkeley DB library does not provide user-level interfaces, data entry GUIs, SQL support, or any of the other standard database interfaces. What it does provide are the programmatic building blocks that allow you to easily embed database-style functionality and support into other objects or interfaces.