* Added check_snapshot option in the DB's AddFile function
* change check_snapshot to skip_snapshot_check
* add unit test for skip_snapshot_check
* Add skip_snapshot_check comment
Summary: Implement a time series database that supports DateTieredCompactionStrategy. It wraps a db object and separate SST files in different column families (time windows).
Test Plan: Add `date_tiered_test`.
Reviewers: dhruba, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D61653
Summary: Add a utility function that trigger necessary full compaction and put output to the correct level by looking at new options and old options.
Test Plan: Add unit tests for it.
Reviewers: andrewkr, igor, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: muthu, sumeet, leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D60783
EnvLibrados is a customized RocksDB Env to use RADOS as the backend file system of RocksDB. It overrides all file system related API of default Env. The easiest way to use it is just like following:
std::string db_name = "test_db";
std::string config_path = "path/to/ceph/config";
DB* db;
Options options;
options.env = EnvLibrados(db_name, config_path);
Status s = DB::Open(options, kDBPath, &db);
Then EnvLibrados will forward all file read/write operation to the RADOS cluster assigned by config_path. Default pool is db_name+"_pool".
There are some options that users could set for EnvLibrados.
- write_buffer_size. This variable is the max buffer size for WritableFile. After reaching the buffer_max_size, EnvLibrados will sync buffer content to RADOS, then clear buffer.
- db_pool. Rather than using default pool, users could set their own db pool name
- wal_dir. The dir for WAL files. Because RocksDB only has 2-level structure (dir_name/file_name), the format of wal_dir is "/dir_name"(CAN'T be "/dir1/dir2"). Default wal_dir is "/wal".
- wal_pool. Corresponding pool name for WAL files. Default value is db_name+"_wal_pool"
The example of setting options looks like following:
db_name = "test_db";
db_pool = db_name+"_pool";
wal_dir = "/wal";
wal_pool = db_name+"_wal_pool";
write_buffer_size = 1 << 20;
env_ = new EnvLibrados(db_name, config, db_pool, wal_dir, wal_pool, write_buffer_size);
DB* db;
Options options;
options.env = env_;
// The last level dir name should match the dir name in prefix_pool_map
options.wal_dir = "/tmp/wal";
// open DB
Status s = DB::Open(options, kDBPath, &db);
Librados is required to compile EnvLibrados. Then use "$make LIBRADOS=1" to compile RocksDB. If you want to only compile EnvLibrados test, just run "$ make env_librados_test LIBRADOS=1". To run env_librados_test, you need to have a running RADOS cluster with the configure file located in "../ceph/src/ceph.conf" related to "rocksdb/".
Summary: Refactor cache.cc so that I can plugin clock cache (D55581). Mainly move `ShardedCache` to separate file, move `LRUHandle` back to cache.cc and rename it lru_cache.cc.
Test Plan:
make check -j64
Reviewers: lightmark, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D59655
Summary:
add DestroyColumnFamilyHandle(ColumnFamilyHandle**) to close column family instead of deleting cfh*
User should call this to close a cf and then we can detect the deletion in this function.
Test Plan: make all check -j64
Reviewers: andrewkr, yiwu, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D60765
Summary:
DB::AddFile(std::string file_path) API that allow them to ingest an SST file created using SstFileWriter
We want to update this interface to be able to accept a list of files that will be ingested, DB::AddFile(std::vector<std::string> file_path_list).
Test Plan:
Add test case `AddExternalSstFileList` in `DBSSTTest`. To make sure:
1. files key ranges are not overlapping with each other
2. each file key range dont overlap with the DB key range
3. make sure no snapshots are held
Reviewers: andrewkr, sdong, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D58587
Summary: char is not signed in some platforms. Having negative values confuse those compilers.
Test Plan: Run all existing tests.
Reviewers: andrewkr, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D59619
Summary:
Rocksdb backup and restore rate limiting is currently done per backup/restore.
So, it is difficult to control rate across multiple backup/restores. With this
change, a throttler can be provided. If a throttler is provided, it is used.
Otherwise, a new throttler is created based on the actual rate limits specified
in the options.
Test Plan: Added unit tests
Reviewers: ldemailly, andrewkr, sdong
Reviewed By: andrewkr
Subscribers: igor, yiwu, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D56265
Summary:
This enables configurable Envs without recompiling. For example, my
next diff will make env_test test an Env created by NewEnvFromUri(). Then,
users can determine which Env is tested simply by providing the URI for
NewEnvFromUri() (e.g., through a CLI argument or environment variable).
The registration process allows us to register any Env that is linked with the
RocksDB library, so we can register our internal Envs as well.
The registration code is inspired by our internal InitRegistry.
Test Plan: new unit test
Reviewers: IslamAbdelRahman, lightmark, ldemailly, sdong
Reviewed By: sdong
Subscribers: leveldb, dhruba, andrewkr
Differential Revision: https://reviews.facebook.net/D58449
Summary: The function wrapper for LDBCommand::SelectCommand is too long so that Windows build fails with warning "decorated name length exceeded, name was truncated". Shrink the length by using a struct.
Test Plan: Build on both of Linux and Windows and make sure the warning doesn't show in either platform.
Reviewers: andrewkr, adsharma, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D58965
Summary:
So a customized ldb tool can pass it's own Selector.
Such a selector is expected to call LDBCommand::SelectCommand
and then add some of its own customized commands
Test Plan: make ldb
Reviewers: sdong, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D57249
Summary:
The implementation remains where it is. Only the
header is exported. This is so that a customized
ldb tool can print help along with its own
extra commands
Test Plan: make ldb
Reviewers: sdong, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D57255
Summary: Adds three new WriteBatch data types: Prepare(xid), Commit(xid), Rollback(xid). Prepare(xid) should precede the (single) operation to which is applies. There can obviously be multiple Prepare(xid) markers. There should only be one Rollback(xid) or Commit(xid) marker yet not both. None of this logic is currently enforced and will most likely be implemented further up such as in the memtableinserter. All three markers are similar to PutLogData in that they are writebatch meta-data, ie stored but not counted. All three markers differ from PutLogData in that they will actually be written to disk. As for WriteBatchWithIndex, Prepare, Commit, Rollback are all implemented just as PutLogData and none are tested just as PutLogData.
Test Plan: single unit test in write_batch_test.
Reviewers: hermanlee4, sdong, anthony
Subscribers: leveldb, dhruba, vasilep, andrewkr
Differential Revision: https://reviews.facebook.net/D57867
Summary:
This is needed so that rocksdb users can add more
commands to the included ldb tool by adding more custom
commands.
Test Plan: make -j ldb
Reviewers: sdong, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D57243
Summary: Adds three new WriteBatch data types: Prepare(xid), Commit(xid), Rollback(xid). Prepare(xid) should precede the (single) operation to which is applies. There can obviously be multiple Prepare(xid) markers. There should only be one Rollback(xid) or Commit(xid) marker yet not both. None of this logic is currently enforced and will most likely be implemented further up such as in the memtableinserter. All three markers are similar to PutLogData in that they are writebatch meta-data, ie stored but not counted. All three markers differ from PutLogData in that they will actually be written to disk. As for WriteBatchWithIndex, Prepare, Commit, Rollback are all implemented just as PutLogData and none are tested just as PutLogData.
Test Plan: single unit test in write_batch_test.
Reviewers: hermanlee4, sdong, anthony
Subscribers: andrewkr, vasilep, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D54093
Summary:
This interface is redundant and has been deprecated for a while.
It's also unused internally. Let's delete it.
I moved the comments to the corresponding functions in BackupEngine/
BackupEngineReadOnly. This caused the diff tool to not work cleanly.
Test Plan:
unit tests
$ ./backupable_db_test
Reviewers: yhchiang, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D56331
Summary:
Rocksdb backup engine maintains metadata about backups in separate files. But,
there was no way to add extra application specific data to it. Adding support
for that.
In some use cases, applications decide to restore a backup based on some
metadata. This will help those cases to cheaply decide whether to restore or
not.
Test Plan:
Added a unit test. Existing ones are passing
Sample meta file for BinaryMetadata test-
```
1459454043
0
metadata 6162630A64656600676869
2
private/1/MANIFEST-000001 crc32 1184723444
private/1/CURRENT crc32 3505765120
```
Reviewers: sdong, ldemailly, andrewkr
Reviewed By: andrewkr
Subscribers: andrewkr, dhruba, ldemailly
Differential Revision: https://reviews.facebook.net/D56007
Summary:
Now that we get sizes efficiently, we no longer need the workaround to
embed file size in filename.
Test Plan:
$ ./backupable_db_test
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D55035
Summary: Add function to reinitialize a transaction object so that it can be reused. This is an optimization so users can potentially avoid reallocating transaction objects.
Test Plan: added tests
Reviewers: yhchiang, kradhakrishnan, IslamAbdelRahman, sdong
Reviewed By: sdong
Subscribers: jkedgar, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D53835
Summary: This was fixed by 0c2bd5cb
Test Plan: n/a
Reviewers: gabijs
Reviewed By: gabijs
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D54753
Summary: MyRocks wants to be able to un-lock a key that was just locked by GetForUpdate(). To do this safely, I am now keeping track of the number of reads(for update) and writes for each key in a transaction. UndoGetForUpdate() will only unlock a key if it hasn't been written and the read count reaches 0.
Test Plan: more unit tests
Reviewers: igor, rven, yhchiang, spetrunia, sdong
Reviewed By: spetrunia, sdong
Subscribers: spetrunia, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D47043
Summary: Stopped using std::timed_mutex as it has known issues in older versiong of gcc. Ran into these problems when testing MongoRocks.
Test Plan: unit tests. Manual mongo testing on gcc 4.8.
Reviewers: igor, yhchiang, rven, IslamAbdelRahman, kradhakrishnan, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52197
Summary: Add support to change write options after creating a transaction. This is needed for MongoRocks.
Test Plan: added test
Reviewers: sdong, rven, kradhakrishnan, IslamAbdelRahman, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D51867
This is an Env implementation that mirrors all storage-related methods on
two different backend Env's and verifies that they return the same
results (return status and read results). This is useful for implementing
a new Env and verifying its correctness.
Signed-off-by: Sage Weil <sage@redhat.com>
Summary: When SetSnapshot() is used the caller immediately knows a snapshot has been created, but when SetSnapshotOnNextOperation() is used the caller needs a way to get notified when that snapshot has been generated. This creates an interface that the client can implement that will be called at the time the snapshot is created.
Test Plan: Added a new SetSnapshotOnNextOperationWithNotification test into the transaction_test.
Reviewers: sdong, anthony
Reviewed By: anthony
Subscribers: yoshinorim, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D51177
Summary:
Fixes T8781168.
Added a new function EnableAutoCompactions in db.h to be publicly
avialable. This allows compaction to be re-enabled after disabling it via
SetOptions
Refactored code to set the dbptr earlier on in TransactionDB::Open and DB::Open
Temporarily disable auto_compaction in TransactionDB::Open until dbptr is set to
prevent race condition.
Test Plan:
Ran make all check
verified fix on myrocks side:
was able to reproduce the seg fault with
../tools/mysqltest.sh --mem --force rocksdb.drop_table
method was to manually sleep the thread after DB::Open but before TransactionDB ptr was
assigned in transaction_db_impl.cc:
DB::Open(db_options, dbname, column_families_copy, handles, &db);
clock_t goal = (60000 * 10) + clock();
while (goal > clock());
...dbptr(aka rdb) gets assigned below
verified my changes fixed the issue.
Also added unit test 'ToggleAutoCompaction' in transaction_test.cc
Reviewers: hermanlee4, anthony
Reviewed By: anthony
Subscribers: alex, dhruba
Differential Revision: https://reviews.facebook.net/D51147
Summary: Getting file size from all the backup files can take a long time. In some cases, the sizes are available in file names. We allow a mode to get those sizes from file name.
Test Plan:
Make some unit tests in backupable_db_test to run in such a mode.
Make sure RocksDB Lite builds too.
Reviewers: IslamAbdelRahman, rven, yhchiang, kradhakrishnan, anthony, igor
Reviewed By: igor
Subscribers: muthu, asameet, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D51243
Summary:
Add CheckOptionsCompatibility() API to options_util that returns
Status::OK if the input DBOptions and ColumnFamilyDescriptors
are compatible with the latest options stored in the specified DB path.
Test Plan: Added tests in options_util_test
Reviewers: igor, anthony, IslamAbdelRahman, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D50649
Summary:
This patch adds OptionsUtil::LoadOptionsFromFile() and
OptionsUtil::LoadLatestOptionsFromDB(), which allow developers
to construct DBOptions and ColumnFamilyOptions from a RocksDB
options file. Note that most pointer-typed options such as
merge_operator will not be constructed.
With this API, developers no longer need to remember all the
options in order to reopen an existing rocksdb instance like
the following:
DBOptions db_options;
std::vector<std::string> cf_names;
std::vector<ColumnFamilyOptions> cf_opts;
// Load primitive-typed options from an existing DB
OptionsUtil::LoadLatestOptionsFromDB(
dbname, &db_options, &cf_names, &cf_opts);
// Initialize necessary pointer-typed options
cf_opts[0].merge_operator.reset(new MyMergeOperator());
...
// Construct the vector of ColumnFamilyDescriptor
std::vector<ColumnFamilyDescriptor> cf_descs;
for (size_t i = 0; i < cf_opts.size(); ++i) {
cf_descs.emplace_back(cf_names[i], cf_opts[i]);
}
// Open the DB
DB* db = nullptr;
std::vector<ColumnFamilyHandle*> cf_handles;
auto s = DB::Open(db_options, dbname, cf_descs,
&handles, &db);
Test Plan:
Augment existing tests in column_family_test
options_test
db_test
Reviewers: igor, IslamAbdelRahman, sdong, anthony
Reviewed By: anthony
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D49095
Summary:
This patch introduces utilities/memory, which currently includes
GetApproximateMemoryUsageByType that reports different types of
rocksdb memory usage given a list of input DBs.
The API also take care of the case where Cache could be shared
across multiple column families / multiple db instances.
Currently, it reports memory usage of memtable, table-readers
and cache.
Test Plan: utilities/memory/memory_test.cc
Reviewers: igor, anthony, IslamAbdelRahman, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D49257
Summary:
This patch adds GetAggregatedIntProperty() that returns the aggregated
value from all CFs
Test Plan: Added a test in db_test
Reviewers: igor, sdong, anthony, IslamAbdelRahman, rven
Reviewed By: rven
Subscribers: rven, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D49497
Summary:
This diff is a first step towards an iterator based interface for the
SearchRadial method which replaces a vector of GeoObjects with an
iterator for GeoObjects. This diff works by just wrapping the iterator
for the encapsulated vector of GeoObjects. A future diff could extend
this approach by defining an interator in terms of the underlying
iteration in SearchRadial which would then remove the need to have
an in-memory representation for all the matching GeoObjects.
Fixes T8421387
Test Plan:
The existing tests have been modified to work with the new
interface.
Reviewers: IslamAbdelRahman, kradhakrishnan, dhruba, igor
Reviewed By: igor
Subscribers: igor, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D50031
Summary: Need to pass through the memtable parameter.
Test Plan: built, tested through myrocks
Reviewers: igor, sdong, rven
Reviewed By: rven
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D49167
Summary: Remove warning about an issue that was resolved. Turns out the issue was a false-alarm.
Test Plan: n/a
Reviewers: igor, yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D49011
Summary: As above.
Test Plan: USE_CLANG=1 make check -j
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D48981
Summary: In MyRocks, it is sometimes important to get propeties only for the subset of the database. This diff implements the API in RocksDB.
Test Plan: ran the GetPropertiesOfTablesInRange
Reviewers: rven, sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D48651