A library that provides an embeddable, persistent key-value store for fast storage.
Go to file
Tomislav Novak 9f690ec62c Fix a deadlock in CompactRange()
Summary:
The way DBImpl::TEST_CompactRange() throttles down the number of bg compactions
can cause it to deadlock when CompactRange() is called concurrently from
multiple threads. Imagine a following scenario with only two threads
(max_background_compactions is 10 and bg_compaction_scheduled_ is initially 0):

   1. Thread #1 increments bg_compaction_scheduled_ (to LargeNumber), sets
      bg_compaction_scheduled_ to 9 (newvalue), schedules the compaction
      (bg_compaction_scheduled_ is now 10) and waits for it to complete.
   2. Thread #2 calls TEST_CompactRange(), increments bg_compaction_scheduled_
      (now LargeNumber + 10) and waits on a cv for bg_compaction_scheduled_ to
      drop to LargeNumber.
   3. BG thread completes the first manual compaction, decrements
      bg_compaction_scheduled_ and wakes up all threads waiting on bg_cv_.
      Thread #1 runs, increments bg_compaction_scheduled_ by LargeNumber again
      (now 2*LargeNumber + 9). Since that's more than LargeNumber + newvalue,
      thread #2 also goes to sleep (waiting on bg_cv_), without resetting
      bg_compaction_scheduled_.

This diff attempts to address the problem by introducing a new counter
bg_manual_only_ (when positive, MaybeScheduleFlushOrCompaction() will only
schedule manual compactions).

Test Plan:
I could pretty much consistently reproduce the deadlock with a program that
calls CompactRange(nullptr, nullptr) immediately after Write() from multiple
threads. This no longer happens with this patch.

Tests (make check) pass.

Reviewers: dhruba, igor, sdong, haobo

Reviewed By: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14799
2014-01-07 10:37:34 -08:00
build_tools Revert change in 8f6e319. 2014-01-06 11:53:19 -08:00
coverage Fix the gcov/lcov related issues 2013-08-22 17:01:06 -07:00
db Fix a deadlock in CompactRange() 2014-01-07 10:37:34 -08:00
doc Fix typo. 2013-11-28 03:57:16 +09:00
hdfs Add appropriate LICENSE and Copyright message. 2013-10-16 17:48:41 -07:00
helpers/memenv Change Function names from Compaction->Flush When they really mean Flush 2013-10-14 15:12:15 -07:00
include Support multi-threaded DisableFileDeletions() and EnableFileDeletions() 2014-01-02 03:33:42 -08:00
linters/src fixing linters. 2012-12-14 14:05:27 -08:00
port Print stack trace on assertion failure 2013-12-06 17:11:09 -08:00
table Fix #26 by putting the implementation of CreateDBStatistics() to a cc file 2013-12-05 22:29:03 -08:00
tools Killing Transform Rep 2013-12-03 12:42:15 -08:00
util Fix issue #57 2014-01-06 11:11:19 -08:00
utilities Support multi-threaded DisableFileDeletions() and EnableFileDeletions() 2014-01-02 03:33:42 -08:00
.arcconfig Enable linting in arc. 2013-02-01 11:34:25 -08:00
.clang-format Add clang-format rules 2014-01-02 14:32:15 -08:00
.gitignore Refactor build_tools/build_detect_version 2014-01-06 08:44:43 +02:00
CONTRIBUTING.md Reformat CONTRIBUTING.md with less than 80 characters. 2013-11-16 22:51:09 -08:00
INSTALL.md docs for shared library builds 2013-12-30 21:34:45 +08:00
LICENSE Add appropriate LICENSE and Copyright message. 2013-10-16 17:48:41 -07:00
Makefile moving autovector_test after db_test 2014-01-02 03:30:29 -08:00
PATENTS Fix the patent format 2013-10-16 15:37:32 -07:00
README Add a pointer to the engineering design discussion forum. 2013-12-23 12:19:18 -08:00
README.fb update the latest version in README.fb to 2.7 2013-12-30 16:16:24 -08:00

rocksdb: A persistent key-value store for flash storage
Authors: * The Facebook Database Engineering Team
         * Build on earlier work on leveldb by Sanjay Ghemawat
           (sanjay@google.com) and Jeff Dean (jeff@google.com)

This code is a library that forms the core building block for a fast
key value server, especially suited for storing data on flash drives.
It has an Log-Structured-Merge-Database (LSM) design with flexible tradeoffs
between Write-Amplification-Factor(WAF), Read-Amplification-Factor (RAF)
and Space-Amplification-Factor(SAF). It has multi-threaded compactions,
making it specially suitable for storing multiple terabytes of data in a
single database.

The core of this code has been derived from open-source leveldb.

The code under this directory implements a system for maintaining a
persistent key/value store.

See doc/index.html and github wiki (https://github.com/facebook/rocksdb/wiki)
for more explanation.

The public interface is in include/*.  Callers should not include or
rely on the details of any other header files in this package.  Those
internal APIs may be changed without warning.

Guide to header files:

include/rocksdb/db.h
    Main interface to the DB: Start here

include/rocksdb/options.h
    Control over the behavior of an entire database, and also
    control over the behavior of individual reads and writes.

include/rocksdb/comparator.h
    Abstraction for user-specified comparison function.  If you want
    just bytewise comparison of keys, you can use the default comparator,
    but clients can write their own comparator implementations if they
    want custom ordering (e.g. to handle different character
    encodings, etc.)

include/rocksdb/iterator.h
    Interface for iterating over data. You can get an iterator
    from a DB object.

include/rocksdb/write_batch.h
    Interface for atomically applying multiple updates to a database.

include/rocksdb/slice.h
    A simple module for maintaining a pointer and a length into some
    other byte array.

include/rocksdb/status.h
    Status is returned from many of the public interfaces and is used
    to report success and various kinds of errors.

include/rocksdb/env.h
    Abstraction of the OS environment.  A posix implementation of
    this interface is in util/env_posix.cc

include/rocksdb/table_builder.h
    Lower-level modules that most clients probably won't use directly

include/rocksdb/cache.h
    An API for the block cache.

include/rocksdb/compaction_filter.h
    An API for a application filter invoked on every compaction.

include/rocksdb/filter_policy.h
    An API for configuring a bloom filter.

include/rocksdb/memtablerep.h
    An API for implementing a memtable.

include/rocksdb/statistics.h
    An API to retrieve various database statistics.

include/rocksdb/transaction_log.h
    An API to retrieve transaction logs from a database.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/