Summary:
Replace dynamic_cast<> so that users can choose to build with RTTI off, so that they can save several bytes per object, and get tiny more memory available.
Some nontrivial changes:
1. Add Comparator::GetRootComparator() to get around the internal comparator hack
2. Add the two experiemental functions to DB
3. Add TableFactory::GetOptionString() to avoid unnecessary casting to get the option string
4. Since 3 is done, move the parsing option functions for table factory to table factory files too, to be symmetric.
Closes https://github.com/facebook/rocksdb/pull/2645
Differential Revision: D5502723
Pulled By: siying
fbshipit-source-id: fd13cec5601cf68a554d87bfcf056f2ffa5fbf7c
Summary:
platform_dependent tests in Travis now builds all tests, which is not needed. Only build those tests we need to run.
Closes https://github.com/facebook/rocksdb/pull/2647
Differential Revision: D5513954
Pulled By: siying
fbshipit-source-id: 4d540b146124e70dd25586c47939d19f93655b0a
Summary:
Major changes in this PR:
* Implement CassandraCompactionFilter to remove expired columns and rows (if all column expired)
* Move cassandra related code from utilities/merge_operators/cassandra to utilities/cassandra/*
* Switch to use shared_ptr<> from uniqu_ptr for Column membership management in RowValue. Since columns do have multiple owners in Merge and GC process, use shared_ptr helps make RowValue immutable.
* Rename cassandra_merge_test to cassandra_functional_test and add two TTL compaction related tests there.
Closes https://github.com/facebook/rocksdb/pull/2588
Differential Revision: D5430010
Pulled By: wpc
fbshipit-source-id: 9566c21e06de17491d486a68c70f52d501f27687
Summary:
Instead of ignoring UBSan checks, fix the negative shifts in
Hash(). Also add test to make sure the hash values are stable over
time. The values were computed before this change, so the test also
verifies the correctness of the change.
Closes https://github.com/facebook/rocksdb/pull/2546
Differential Revision: D5386369
Pulled By: yiwu-arbug
fbshipit-source-id: 6de4b44461a544d6222cc5d72d8cda2c0373d17e
Summary:
1. The buckifier script assume each test "foo" comes with a .cc file of the same name (i.e. foo.cc). Update cassandra tests to follow this pattern so that the buckifier script can recognize them.
2. add blob_db_test
Closes https://github.com/facebook/rocksdb/pull/2506
Differential Revision: D5331517
Pulled By: yiwu-arbug
fbshipit-source-id: 86f3eba471fc621186ab44cbd073b6162cde8e57
Summary:
This PR adds support for encrypting data stored by RocksDB when written to disk.
It adds an `EncryptedEnv` override of the `Env` class with matching overrides for sequential&random access files.
The encryption itself is done through a configurable `EncryptionProvider`. This class creates is asked to create `BlockAccessCipherStream` for a file. This is where the actual encryption/decryption is being done.
Currently there is a Counter mode implementation of `BlockAccessCipherStream` with a `ROT13` block cipher (NOTE the `ROT13` is for demo purposes only!!).
The Counter operation mode uses an initial counter & random initialization vector (IV).
Both are created randomly for each file and stored in a 4K (default size) block that is prefixed to that file. The `EncryptedEnv` implementation is such that clients of the `Env` class do not see this prefix (nor data, nor in filesize).
The largest part of the prefix block is also encrypted, and there is room left for implementation specific settings/values/keys in there.
To test the encryption, the `DBTestBase` class has been extended to consider a new environment variable called `ENCRYPTED_ENV`. If set, the test will setup a encrypted instance of the `Env` class to use for all tests.
Typically you would run it like this:
```
ENCRYPTED_ENV=1 make check_some
```
There is also an added test that checks that some data inserted into the database is or is not "visible" on disk. With `ENCRYPTED_ENV` active it must not find plain text strings, with `ENCRYPTED_ENV` unset, it must find the plain text strings.
Closes https://github.com/facebook/rocksdb/pull/2424
Differential Revision: D5322178
Pulled By: sdwilsh
fbshipit-source-id: 253b0a9c2c498cc98f580df7f2623cbf7678a27f
Summary:
Improve write buffer manager in several ways:
1. Size is tracked when arena block is allocated, rather than every allocation, so that it can better track actual memory usage and the tracking overhead is slightly lower.
2. We start to trigger memtable flush when 7/8 of the memory cap hits, instead of 100%, and make 100% much harder to hit.
3. Allow a cache object to be passed into buffer manager and the size allocated by memtable can be costed there. This can help users have one single memory cap across block cache and memtable.
Closes https://github.com/facebook/rocksdb/pull/2350
Differential Revision: D5110648
Pulled By: siying
fbshipit-source-id: b4238113094bf22574001e446b5d88523ba00017
Summary:
Blob db rely on base db returning sequence number through write batch after DB::Write(). However after recent changes to the write path, DB::Writ()e no longer return sequence number in some cases. Fixing it by have WriteBatchInternal::InsertInto() always encode sequence number into write batch.
Stacking on #2375.
Closes https://github.com/facebook/rocksdb/pull/2385
Differential Revision: D5148358
Pulled By: yiwu-arbug
fbshipit-source-id: 8bda0aa07b9334ed03ed381548b39d167dc20c33
Summary:
Re-enable blob_db_test with some update:
* Commented out delay at the end of GC tests. Will update the logic later with sync point to properly trigger GC.
* Added some helper functions.
Also update make files to include blob_dump tool.
Closes https://github.com/facebook/rocksdb/pull/2375
Differential Revision: D5133793
Pulled By: yiwu-arbug
fbshipit-source-id: 95470b26d0c1f9592ba4b7637e027fdd263f425c
Summary:
zstd files are downloaded and used as part of JNI build, but are left behind even after doing a `make clean`. This PR updates the `clean` target to remove these zstd files as well.
Closes https://github.com/facebook/rocksdb/pull/2365
Differential Revision: D5123537
Pulled By: sagar0
fbshipit-source-id: a8f355da5ba961aa89d5852e35751ffc35de03ea
Summary:
added ctags -e to the tags target in the makefile. It creates an etags file suitable for emacs.
Closes https://github.com/facebook/rocksdb/pull/2193
Differential Revision: D4983535
Pulled By: siying
fbshipit-source-id: 1077ef0676025b8109df37433572533c9e8fe86e
Summary:
-pic seems to be not working in gcc-5 and it is curently broken. Remove it to fix the build.
Closes https://github.com/facebook/rocksdb/pull/2320
Differential Revision: D5082775
Pulled By: siying
fbshipit-source-id: 5055f987353f1417643a394e7ce05905670410a4
Summary:
snprintf is in <stdio.h> and not in namespace std.
Closes https://github.com/facebook/rocksdb/pull/2287
Reviewed By: anirbanr-fb
Differential Revision: D5054752
Pulled By: yiwu-arbug
fbshipit-source-id: 356807ec38f3c7d95951cdb41f31a3d3ae0714d4
Summary:
As an alternative to Vagrant, we can now also use Docker to cross-build RocksDB. The advantages are:
1. The Docker images are fixed; they include all the latest updates and build tools.
2. The Vagrant image, required scripts that ran for every build that would update CentOS and install the buildtools. This lead to slow repeatable builds, we don't need to do this with Docker as they are already in the provided images.
The Docker images I have used have their Docker build files here: https://github.com/evolvedbinary/docker-rocksjava and the images themselves are available from Docker hub: https://hub.docker.com/r/evolvedbinary/rocksjava/
I have added the following targets to the `Makefile`:
1. `rocksdbjavastaticreleasedocker` this uses Docker to perform the cross-builds. It is basically the Docker version of the existing Vagrant `rocksdbjavastaticrelease` target.
2. `rocksdbjavastaticpublishdocker` delegates to `rocksdbjavastaticreleasedocker` and then `rocksdbjavastaticpublishcentral` to upload the artiacts to Maven Central. Equivalent to the existing Vagrant target: `rocksdbjavastaticpublish`
Closes https://github.com/facebook/rocksdb/pull/2278
Differential Revision: D5048206
Pulled By: yiwu-arbug
fbshipit-source-id: 78fa96ef9d966fe09638ed01de282cd4e31961a9
Summary:
When building rocksdb in fbcode using `make`, util/build_version.cc is always updated (gitignore/hgignore doesn't apply because the file is already checked into fbcode). To use the rocksdb makefile from our own makefile, I would like an option to prevent the metadata update, which is of no value for us.
Closes https://github.com/facebook/rocksdb/pull/2264
Differential Revision: D5037846
Pulled By: siying
fbshipit-source-id: 9fa005725c5ecb31d9cbe2e738cbee209591f08a
Summary:
I'm going to add more DB tests for statistics as currently we have very few. I started a file dedicated to this purpose and moved the existing stats-specific tests there.
Closes https://github.com/facebook/rocksdb/pull/2211
Differential Revision: D4951558
Pulled By: ajkr
fbshipit-source-id: 05d11c35079c40ecabdfd2cf5556ccb761f694a4
Summary:
Replacement of #2147
The change was squashed due to a lot of conflicts.
Closes https://github.com/facebook/rocksdb/pull/2194
Differential Revision: D4929799
Pulled By: siying
fbshipit-source-id: 5cd49c254737a1d5ac13f3c035f128e86524c581
Summary:
Previously, the shared library (make shared_lib) was built with only one
compile line, compiling all .cc files and linking the shared library in
one step. That step would often take 10+ minutes on one machine, and
could not take advantage of multiple CPUs (it's only one invocation of
the compiler).
This commit changes the shared_lib build to compile .o files
individually (placing the resulting .o files in the directory
shared-objects) and then link them into the shared library at the end,
similarly to how the java static build (jls) does it.
Tested by making sure that both static and shared libraries work, and by
making sure that "make clean" cleans up the shared-objects directory.
Closes https://github.com/facebook/rocksdb/pull/2165
Differential Revision: D4897121
Pulled By: yiwu-arbug
fbshipit-source-id: 9811e043d1c01e10503593f3489d186c786ee7d7
Summary:
In some CI test environment, compression libraries can't be successfully built. It still helps to build RocksDB there. Provide such an option to skip to download and build compression libraries.
Closes https://github.com/facebook/rocksdb/pull/2135
Differential Revision: D4872617
Pulled By: siying
fbshipit-source-id: bb21ac373bc62a2528cdf1ca4547e05fcae86214
Summary:
siying this is a resubmission of #2081 with the 4th commit fixed. From that commit message:
> Note that the previous use of quotes in PLATFORM_{CC,CXX}FLAGS was
incorrect and caused GCC to produce the incorrect define:
>
> #define ROCKSDB_JEMALLOC -DJEMALLOC_NO_DEMANGLE 1
>
> This was the cause of the Linux build failure on the previous version
of this change.
I've tested this locally, and the Linux build succeeds now.
Closes https://github.com/facebook/rocksdb/pull/2097
Differential Revision: D4839964
Pulled By: siying
fbshipit-source-id: cc51322
Summary:
Move some files under util/ to new directories env/, monitoring/ options/ and cache/
Closes https://github.com/facebook/rocksdb/pull/2090
Differential Revision: D4833681
Pulled By: siying
fbshipit-source-id: 2fd8bef
Summary:
I've needed Env timing measurements a few times now, so finally built something for it.
Closes https://github.com/facebook/rocksdb/pull/2073
Differential Revision: D4811231
Pulled By: ajkr
fbshipit-source-id: 218a249
Summary:
It is confusing to have auto_roll_logger to stay under db/, which has nothing to do with database. Move filename together as it is a dependency.
Closes https://github.com/facebook/rocksdb/pull/2080
Differential Revision: D4821141
Pulled By: siying
fbshipit-source-id: ca7d768
Summary:
Right now, building rocksdbjava in PowerPC is broken due to JNI library name. I figured it out that "uname -m" and java's os.arch matches in PowerPC architecture. I made use of this advantage to fix the issue. More info can found from this issue --> https://github.com/facebook/rocksdb/issues/1317
Closes https://github.com/facebook/rocksdb/pull/2040
Differential Revision: D4779967
Pulled By: siying
fbshipit-source-id: 259f939
Summary:
After we have db_basic_test and external_sst_file_basic_test, we don't need to run db_test and external_sst_file_test in Travis's MAC OS run anymore. Move it out.
Closes https://github.com/facebook/rocksdb/pull/1940
Differential Revision: D4659361
Pulled By: siying
fbshipit-source-id: e64e291
Summary:
Separate the platform dependent tests from external_sst_file_test. Only those tests need to run on platforms like OSX
Closes https://github.com/facebook/rocksdb/pull/1923
Differential Revision: D4622461
Pulled By: siying
fbshipit-source-id: d2d6f04
Summary:
Separate a smal subset of tests in DBTest to DBBasicTest. Tests in DBTest don't have to run in CI tests on platforms like OSX, as long as they are covered by Linux.
Closes https://github.com/facebook/rocksdb/pull/1924
Differential Revision: D4616702
Pulled By: siying
fbshipit-source-id: 13e6549
Summary:
Travis is short of OSX resource. Try to move platform independent test suites out of OSX
Closes https://github.com/facebook/rocksdb/pull/1922
Differential Revision: D4616070
Pulled By: siying
fbshipit-source-id: 786342c
Summary:
valgrind tests always timeout with parallel run. Black list some slowest ones. It is better to run fewer tests than always have the tests timeout.
Closes https://github.com/facebook/rocksdb/pull/1908
Differential Revision: D4607875
Pulled By: siying
fbshipit-source-id: 7062664
Summary:
The previous version of zlib is no longer available. I have also updated the versions of the other static libraries and added checkum checks for the downloads; This is related to https://github.com/facebook/rocksdb/issues/1769
Closes https://github.com/facebook/rocksdb/pull/1863
Differential Revision: D4550742
Pulled By: yiwu-arbug
fbshipit-source-id: 4414150
Summary:
The Env registration framework supports registering client Envs and selecting which one to instantiate according to a text field. This enabled things like adding the -env_uri argument to db_bench, so the same binary could be reused with different Envs just by changing CLI config.
Now this problem has come up again in a non-Env context, as I want to instantiate a client Statistics implementation from db_bench, which is configured entirely via text parameters. Also, in the future we may wish to use it for deserializing client objects when loading OPTIONS file.
This diff generalizes the Env registration logic to work with arbitrary types.
- Generalized registration and instantiation code by templating them
- The entire implementation is in a header file as that's Google style guide's recommendation for template definitions
- Pattern match with std::regex_match rather than checking prefix, which was the previous behavior
- Rename functions/files to be non-Env-specific
Closes https://github.com/facebook/rocksdb/pull/1776
Differential Revision: D4421933
Pulled By: ajkr
fbshipit-source-id: 34647d1
Summary:
Cleanable objects will perform the registered cleanups when
they are destructed. We however rather to delay this cleaning like when
we are gathering the merge operands. Current approach is to create the
Cleanable object on heap (instead of on stack) and delay deleting it.
By allowing Cleanables to delegate their cleanups to another cleanable
object we can delay the cleaning without however the need to craete the
cleanable object on heap and keeping it around. This patch applies this
technique for the cleanups of BlockIter and shows improved performance
for some in-memory benchmarks:
+1.8% for merge worklaod, +6.4% for non-merge workload when the merge
operator is specified.
https://our.intern.facebook.com/intern/tasks?t=15168163
Non-merge benchmark:
TEST_TMPDIR=/dev/shm/v100nocomp/ ./db_bench --benchmarks=fillrandom
--num=1000000 -value_size=100 -compression_type=none
Reading random with no merge operator specified:
TEST_TMPDIR=/dev/shm/v100nocomp/ ./db_bench
--benchmarks="read
Closes https://github.com/facebook/rocksdb/pull/1711
Differential Revision: D4361163
Pulled By: maysamyabandeh
fbshipit-source-id: 9801e07
Summary:
Added a tombstone-collapsing mode to RangeDelAggregator, which eliminates overlap in the TombstoneMap. In this mode, we can check whether a tombstone covers a user key using upper_bound() (i.e., binary search). However, the tradeoff is the overhead to add tombstones is now higher, so at first I've only enabled it for range scans (compaction/flush/user iterators), where we expect a high number of calls to ShouldDelete() for the same tombstones. Point queries like Get() will still use the linear scan approach.
Also in this diff I changed RangeDelAggregator's TombstoneMap to use multimap with user keys instead of map with internal keys. Callers sometimes provided ParsedInternalKey directly, from which it would've required string copying to derive an internal key Slice with which we could search the map.
Closes https://github.com/facebook/rocksdb/pull/1614
Differential Revision: D4270397
Pulled By: ajkr
fbshipit-source-id: 93092c7
Summary:
Iterator should be in corrupted status if merge operator return false.
Also add test to make sure if max_successive_merges is hit during write,
data will not be lost.
Closes https://github.com/facebook/rocksdb/pull/1665
Differential Revision: D4322695
Pulled By: yiwu-arbug
fbshipit-source-id: b327b05
Summary:
currently when running a portable build we have to do the following
PORTABLE=1 make ...
this commit adds support for the following
make PORTABLE=1 ...
this might be seem subtle but it makes PORTABLE like all other
makefile args and simplifies invocation from numerous build systems
including things like ExternalProject_Add in cmake.
Signed-off-by: Bassam Tabbara <bassam.tabbara@quantum.com>
Closes https://github.com/facebook/rocksdb/pull/1643
Differential Revision: D4315870
Pulled By: yiwu-arbug
fbshipit-source-id: ee43755
Summary:
The second variable "SHELL" simply tells make explicitly which shell to use, instead of allowing it to default to "/bin/sh", which may or may not be Bash.
However, simply defining the second variable by itself causes make to throw an error concerning a circular definition, as it would be attempting to use the "shell" command while simultaneously trying to set which shell to use. Thus, the first variable "BASH_EXISTS" is defined such that make already knows about "/path/to/bash" before trying to use it to set "SHELL".
A more technically correct solution would be to edit the makefile itself to make it compatible with non-bash shells (see the original Issue discussion for details). However, as it seems very few of the people working on this project were building with non-bash shells, I figured this solution would be good enough.
Closes https://github.com/facebook/rocksdb/pull/1631
Differential Revision: D4295689
Pulled By: yiwu-arbug
fbshipit-source-id: e4f9532
Summary:
Travis now is building for ldb tests. Disable for now to unblock other tests while we are investigating.
Closes https://github.com/facebook/rocksdb/pull/1546
Differential Revision: D4209404
Pulled By: siying
fbshipit-source-id: 47edd97
Summary:
This diff includes an implementation of CompactionFilter that allows
users to write CompactionFilter in Lua. With this ability, users can
dynamically change compaction filter logic without requiring building
the rocksdb binary and restarting the database.
To compile, WITH_LUA_PATH must be specified to the base directory
of lua.
Closes https://github.com/facebook/rocksdb/pull/1478
Differential Revision: D4150138
Pulled By: yhchiang
fbshipit-source-id: ed84222
Summary:
Currently our skip-list have an optimization to speedup sequential
inserts from a single stream, by remembering the last insert position.
We extend the idea to support sequential inserts from multiple streams,
and even tolerate small reordering wihtin each stream.
This PR is the interface part adding the following:
- Add `memtable_insert_prefix_extractor` to allow specifying prefix for each key.
- Add `InsertWithHint()` interface to memtable, to allow underlying
implementation to return a hint of insert position, which can be later
pass back to optimize inserts.
- Memtable will maintain a map from prefix to hints and pass the hint
via `InsertWithHint()` if `memtable_insert_prefix_extractor` is non-null.
Closes https://github.com/facebook/rocksdb/pull/1419
Differential Revision: D4079367
Pulled By: yiwu-arbug
fbshipit-source-id: 3555326
* util/build_verion.cc.in: add this file, so cmake and make can share the
template file for generating util/build_version.cc.
* CMakeLists.txt: also, cmake v2.8.11 does not support file(GENERATE ...),
so we are using configure_file() for creating build_version.cc.
* Makefile: use util/build_verion.cc.in for creating build_version.cc.
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Summary:
Valgrind does not work well with JEMALLOC. If you run
a simple make valgrind_check, you will see lots of issues and
crashes. When precommit runs, this is taken care of. Here we
make sure valgrind_check is passed in DISABLE_JEMALLOC=1
Test Plan: Ran local valgrind_test and noticed the difference
Reviewers: IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D65379
Summary:
Revert the behavior where we don't read sequence id from WAL, but increase it as we replay the log. We still keep the behave for 2PC for now but will fix later.
This change fixes github issue 1339, where some writes come with WAL disabled and we may recover records with wrong sequence id.
Test Plan: Added unit test.
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D64275
Summary:
I just realized that when we run parallel valgrind we actually don't run the parallel tests under valgrind (we run the normally)
This patch make sure that we run both parallel and non-parallel tests with valgrind
Test Plan: DISABLE_JEMALLOC=1 make valgrind_check -j64
Reviewers: andrewkr, yiwu, lightmark, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D62469
Summary:
Add mid-point insertion functionality to LRU cache. Caller of `Cache::Insert()` can set an additional parameter to make a cache entry have higher priority. The LRU cache will reserve at most `capacity * high_pri_pool_pct` bytes for high-pri cache entries. If `high_pri_pool_pct` is zero, the cache degenerates to normal LRU cache.
Context: If we are to put index and filter blocks into RocksDB block cache, index/filter block can be swap out too early. We want to add an option to RocksDB to reserve some capacity in block cache just for index/filter blocks, to mitigate the issue.
In later diffs I'll update block based table reader to use the interface to cache index/filter blocks at high priority, and expose the option to `DBOptions` and make it dynamic changeable.
Test Plan: unit test.
Reviewers: IslamAbdelRahman, sdong, lightmark
Reviewed By: lightmark
Subscribers: andrewkr, dhruba, march, leveldb
Differential Revision: https://reviews.facebook.net/D61977
Summary:
This is a proof of concept of a RocksDB blob log file. The actual value of the Put() is appended to a blob log using normal data block format, and the handle of the block is written as the value of the key in RocksDB.
The prototype only supports Put() and Get(). It doesn't support DB restart, garbage collection, Write() call, iterator, snapshots, etc.
Test Plan: Add unit tests.
Reviewers: arahut
Reviewed By: arahut
Subscribers: kradhakrishnan, leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D61485
Summary: With read_options.background_purge_on_iterator_cleanup=true, File deletion and closing can still happen in forward iterator, or WAL file closing. Cover those cases too.
Test Plan: I am adding unit tests.
Reviewers: andrewkr, IslamAbdelRahman, yiwu
Reviewed By: yiwu
Subscribers: leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D61503
Summary:
compact_on_deletion_collector_test does not support --gtest_list_tests
since it isn't gtest, so the full program would run for the target
gen_parallel_tests. This caused gen_parallel_tests to take 8+ minutes for tsan
and prevented compact_on_deletion_collector_test from running during check_0
since no t/run-* script could be generated.
Test Plan:
run make check, verify generating t/run-* scripts is fast and
./compact_on_deletion_collector_test is now run
Reviewers: IslamAbdelRahman, wanning, lightmark, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D61695
Summary: Implement a time series database that supports DateTieredCompactionStrategy. It wraps a db object and separate SST files in different column families (time windows).
Test Plan: Add `date_tiered_test`.
Reviewers: dhruba, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D61653
Summary: Add a utility function that trigger necessary full compaction and put output to the correct level by looking at new options and old options.
Test Plan: Add unit tests for it.
Reviewers: andrewkr, igor, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: muthu, sumeet, leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D60783
Summary:
parallel tests are broken because gnu_parallel is reading deprecated options from `/etc/parallel/config`
Fix this by passing `--plain` to ignore `/etc/parallel/config`
Test Plan: make check -j64
Reviewers: kradhakrishnan, sdong, andrewkr, yiwu, arahut
Reviewed By: arahut
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D61359
Summary:
Experiments on column-aware encodings. Supported features: 1) extract data blocks from SST file and encode with specified encodings; 2) Decode encoded data back into row format; 3) Directly extract data blocks and write in row format (without prefix encoding); 4) Get column distribution statistics for column format; 5) Dump data blocks separated by columns in human-readable format.
There is still on-going work on this diff. More refactoring is necessary.
Test Plan: Wrote tests in `column_aware_encoding_test.cc`. More tests should be added.
Reviewers: sdong
Reviewed By: sdong
Subscribers: arahut, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D60027
Summary:
Removing moreutils from sandcastle and adding gnu parallel.
Then passing in J= nproc command
Test Plan: Testing on sandcastle
Reviewers: sdong, kradhakrishnan
Reviewed By: kradhakrishnan
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D61017
Summary:
TickersNameMap is not consistent with Tickers enum.
this cause us to report wrong statistics and sometimes to access TickersNameMap outside it's boundary causing crashes (in Fb303 statistics)
Test Plan: added new unit test
Reviewers: sdong, kradhakrishnan
Reviewed By: kradhakrishnan
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D61083
Summary:
Tsan crash white-box test was disabled because it never ends. Re-enable it with reduced killing odds.
Add a parameter in crash test script to allow we pass it through an environment variable.
Test Plan: Run it manually.
Reviewers: IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D61053
Summary: Multiput atomiciy is broken across multiple column families if we don't sync WAL before flushing one column family. The WAL file may contain a write batch containing writes to a key to the CF to be flushed and a key to other CF. If we don't sync WAL before flushing, if machine crashes after flushing, the write batch will only be partial recovered. Data to other CFs are lost.
Test Plan: Add a new unit test which will fail without the diff.
Reviewers: yhchiang, IslamAbdelRahman, igor, yiwu
Reviewed By: yiwu
Subscribers: yiwu, leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D60915
Summary: RocksDB behavior is slightly different between data on tmpfs and normal file systems. Add a test case to run RocksDB on normal file system.
Test Plan: See the tests launched by Phabricator
Reviewers: kradhakrishnan, IslamAbdelRahman, gunnarku
Reviewed By: gunnarku
Subscribers: leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D60963
EnvLibrados is a customized RocksDB Env to use RADOS as the backend file system of RocksDB. It overrides all file system related API of default Env. The easiest way to use it is just like following:
std::string db_name = "test_db";
std::string config_path = "path/to/ceph/config";
DB* db;
Options options;
options.env = EnvLibrados(db_name, config_path);
Status s = DB::Open(options, kDBPath, &db);
Then EnvLibrados will forward all file read/write operation to the RADOS cluster assigned by config_path. Default pool is db_name+"_pool".
There are some options that users could set for EnvLibrados.
- write_buffer_size. This variable is the max buffer size for WritableFile. After reaching the buffer_max_size, EnvLibrados will sync buffer content to RADOS, then clear buffer.
- db_pool. Rather than using default pool, users could set their own db pool name
- wal_dir. The dir for WAL files. Because RocksDB only has 2-level structure (dir_name/file_name), the format of wal_dir is "/dir_name"(CAN'T be "/dir1/dir2"). Default wal_dir is "/wal".
- wal_pool. Corresponding pool name for WAL files. Default value is db_name+"_wal_pool"
The example of setting options looks like following:
db_name = "test_db";
db_pool = db_name+"_pool";
wal_dir = "/wal";
wal_pool = db_name+"_wal_pool";
write_buffer_size = 1 << 20;
env_ = new EnvLibrados(db_name, config, db_pool, wal_dir, wal_pool, write_buffer_size);
DB* db;
Options options;
options.env = env_;
// The last level dir name should match the dir name in prefix_pool_map
options.wal_dir = "/tmp/wal";
// open DB
Status s = DB::Open(options, kDBPath, &db);
Librados is required to compile EnvLibrados. Then use "$make LIBRADOS=1" to compile RocksDB. If you want to only compile EnvLibrados test, just run "$ make env_librados_test LIBRADOS=1". To run env_librados_test, you need to have a running RADOS cluster with the configure file located in "../ceph/src/ceph.conf" related to "rocksdb/".
Summary:
Fix flush not being commit while writing manifest, which is a recent bug introduced by D60075.
The issue:
# Options.max_background_flushes > 1
# Background thread A pick up a flush job, flush, then commit to manifest. (Note that mutex is released before writing manifest.)
# Background thread B pick up another flush job, flush. When it gets to `MemTableList::InstallMemtableFlushResults`, it notices another thread is commiting, so it quit.
# After the first commit, thread A doesn't double check if there are more flush result need to commit, leaving the second flush uncommitted.
Test Plan: run the test. Also verify the new test hit deadlock without the fix.
Reviewers: sdong, igor, lightmark
Reviewed By: lightmark
Subscribers: andrewkr, omegaga, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D60969
The tests run by `make check` require Bash. On Debian you'd need to run
the test as `make SHELL=/bin/bash check`. This commit makes it work on
all POSIX compatible shells (tested on Debian with `dash`).
Summary: As Title.
Test Plan: See how the diff works.
Reviewers: kradhakrishnan, andrewkr, gunnarku
Reviewed By: gunnarku
Subscribers: leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D60933
Summary:
When write stalls because of auto compaction is disabled, or stop write trigger is reached,
user may change these two options to unblock writes. Unfortunately we had issue where the write
thread will block the attempt to persist the options, thus creating a deadlock. This diff
fix the issue and add two test cases to detect such deadlock.
Test Plan:
Run unit tests.
Also, revert db_impl.cc to master (but don't revert `DBImpl::BackgroundCompaction:Finish` sync point) and run db_options_test. Both tests should hit deadlock.
Reviewers: sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D60627
Summary: In D33849 we updated Makefile to generate .d files for all .cc sources. Since we have more types of source files now, this needs to be updated so that this mechanism can work for new files.
Test Plan: change a dependent .h file, re-make and see if .o file is recompiled.
Reviewers: sdong, andrewkr
Reviewed By: andrewkr
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D60591
Summary:
Update Makefile to show warnings when we have invalid paths in our make_config.mk file
sample output
```
$ make static_lib -j64
Makefile:150: Warning: /mnt/gvfs/third-party2/libgcc/53e0eac8911888a105aa98b9a35fe61cf1d8b278/4.9.x/gcc-4.9-glibc-2.20/024dbc3/libs dont exist
Makefile:150: Warning: /mnt/gvfs/third-party2/llvm-fb/b91de48a4974ec839946d824402b098d43454cef/stable/centos6-native/7aaccbe/../../src/clang/tools/scan-build/scan-build dont exist
GEN util/build_version.cc
```
Test Plan: check that warning is printed visually
Reviewers: sdong, yiwu, andrewkr
Reviewed By: andrewkr
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D59523
Summary:
This provides provides an implementation of PersistentCacheTier that is
specialized for RAM. This tier does not persist data though.
Why do we need this tier ?
This is ideal as tier 0. This tier can host data that is too hot.
Why can't we use Cache variants ?
Yes you can use them instead. This tier can potentially outperform BlockCache
in RAW mode by virtue of compression and compressed cache in block cache doesn't
seem very popular. Potentially this tier can be modified to under stand the
disadvantage of the tier below and retain data that the tier below is bad at
handling (for example index and bloom data that is huge in size)
Test Plan: Run unit tests added
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D57069
Summary:
- Provide env_test as a static library. We will build it for future releases so internal Envs can use env_test by linking against this library.
- Add tests for CustomEnv, which is configurable via ENV_TEST_URI environment variable. It uses the URI-based Env lookup (depends on D58449).
- Refactor env_basic_test cases to use a unique/configurable directory for test files.
Test Plan:
built a test binary against librocksdb_env_test.a. It registered the
default Env with URI prefix "a://".
- verify runs all CustomEnv tests when URI with correct prefix is provided
```
$ ENV_TEST_URI="a://ok" ./tmp --gtest_filter="CustomEnv/*"
...
[ PASSED ] 12 tests.
```
- verify runs no CustomEnv tests when URI with non-matching prefix is provided
```
$ ENV_TEST_URI="b://ok" ./tmp --gtest_filter="CustomEnv/*"
...
[ PASSED ] 0 tests.
```
Reviewers: ldemailly, IslamAbdelRahman, lightmark, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D58485
Summary:
Extracted basic Env-related tests from mock_env_test and memenv_test into a
parameterized test for Envs: env_basic_test.
Depends on D58449. (The dependency is here only so I can keep this series of
diffs in a chain -- there is no dependency on that diff's code.)
Test Plan: ran tests
Reviewers: IslamAbdelRahman, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D58635
Summary:
This enables configurable Envs without recompiling. For example, my
next diff will make env_test test an Env created by NewEnvFromUri(). Then,
users can determine which Env is tested simply by providing the URI for
NewEnvFromUri() (e.g., through a CLI argument or environment variable).
The registration process allows us to register any Env that is linked with the
RocksDB library, so we can register our internal Envs as well.
The registration code is inspired by our internal InitRegistry.
Test Plan: new unit test
Reviewers: IslamAbdelRahman, lightmark, ldemailly, sdong
Reviewed By: sdong
Subscribers: leveldb, dhruba, andrewkr
Differential Revision: https://reviews.facebook.net/D58449
Summary:
This patch allows db_bench to initialize it's RocksDB Options via a
options file, specified by the --options_file flag. Note that if
--options_file flag is set, then it has higher priority than the
command-line argument.
Test Plan: db_bench_tool_test
Reviewers: sdong, IslamAbdelRahman, kradhakrishnan, yiwu, andrewkr
Reviewed By: andrewkr
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D58533
Summary:
Currently all the tools are included in librocksdb.a (db_bench is not). With
this separate library, we can access db_bench functionality from our internal
repo and eventually move tools out of librocksdb.a.
Test Plan: built a simple binary against this library that invokes db_bench_tool().
Reviewers: IslamAbdelRahman, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D58977
Summary: Deprecate this one option and delete code and tests that are now superfluous.
Test Plan: all tests pass
Reviewers: igor, yhchiang, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: msalib, leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D55317
Summary:
Introduce MaxOperator a simple merge operator that return the max of all operands.
This merge operand help me in benchmarking
Test Plan: Add new unitttests
Reviewers: sdong, andrewkr, yhchiang
Reviewed By: yhchiang
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D57873
Summary:
This is a part of effort to reduce the size of db_test.cc. We move the following tests to a separate file `db_io_failure_test.cc`:
* DropWrites
* DropWritesFlush
* NoSpaceCompactRange
* NonWritableFileSystem
* ManifestWriteError
* PutFailsParanoid
Test Plan: Run `make check` to see if the tests are working properly.
Reviewers: sdong, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D58341
Summary:
We expect the persistent read cache to perform at speeds upto 8 GB/s. In order
to accomplish that, we need build a index mechanism which operate in the order
of multiple millions per sec rate.
This patch provide the basic data structure to accomplish that:
(1) Hash table implementation with lock contention spread
It is based on the StripedHashSet<T> implementation in
The Art of multiprocessor programming by Maurice Henry & Nir Shavit
(2) LRU implementation
Place holder algorithm for further optimizing
(3) Evictable Hash Table implementation
Building block for building index data structure that evicts data like files
etc
TODO:
(1) Figure if the sharded hash table and LRU can be used instead
(2) Figure if we need to support configurable eviction algorithm for
EvictableHashTable
Test Plan: Run unit tests
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D55785
Summary:
Introduced option to dump malloc statistics using new option flag.
Added new command line option to db_bench tool to enable this
funtionality.
Also extended build to support environments with/without jemalloc.
Test Plan:
1) Build rocksdb using `make` command. Launch the following command
`./db_bench --benchmarks=fillrandom --dump_malloc_stats=true
--num=10000000` end verified that jemalloc dump is present in LOG file.
2) Build rocksdb using `DISABLE_JEMALLOC=1 make db_bench -j32` and ran
the same db_bench tool and found the following message in LOG file:
"Please compile with jemalloc to enable malloc dump".
3) Also built rocksdb using `make` command on MacOS to verify behavior
in non-FB environment.
Also to debug build configuration change temporary changed
AM_DEFAULT_VERBOSITY = 1 in Makefile to see compiler and build
tools output. For case 1) -DROCKSDB_JEMALLOC was present in compiler
command line. For both 2) and 3) this flag was not present.
Reviewers: sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D57321
* Musl libc does not provide adaptive mutex. Added feature test for PTHREAD_MUTEX_ADAPTIVE_NP.
* Musl libc does not provide backtrace(3). Added a feature check for backtrace(3).
* Fixed compiler error.
* Musl libc does not implement backtrace(3). Added platform check for libexecinfo.
* Alpine does not appear to support gcc -pg option. By default (gcc has PIE option enabled) it fails with:
gcc: error: -pie and -pg|p|profile are incompatible when linking
When -fno-PIE and -nopie are used it fails with:
/usr/lib/gcc/x86_64-alpine-linux-musl/5.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find gcrt1.o: No such file or directory
Added gcc -pg platform test and output PROFILING_FLAGS accordingly. Replaced pg var in Makefile with PROFILING_FLAGS.
* fix segfault when TEST_IOCTL_FRIENDLY_TMPDIR is undefined and default candidates are not suitable
* use ASSERT_DOUBLE_EQ instead of ASSERT_EQ
* When compiled with ROCKSDB_MALLOC_USABLE_SIZE UniversalCompactionFourPaths and UniversalCompactionSecondPathRatio tests fail due to premature memtable flushes on systems with 16-byte alignment. Arena runs out of block space before GenerateNewFile() completes.
Increased options.write_buffer_size.
Summary:
Generate t/run-* scripts to run tests in $PARALLEL_TEST separately, then make check_0 rule execute all of them.
Run `time make check` after running `make all`.
master: 71 sec
with this diff: 63 sec.
It seems moving more tests to $PARALLEL_TEST doesn't help improve test time though.
Test Plan:
Run the following
make check
J=16 make check
J=1 make check
make valgrind_check
J=1 make valgrind_check
J=16 make_valgrind_check
Reviewers: IslamAbdelRahman, sdong
Reviewed By: sdong
Subscribers: leveldb, kradhakrishnan, dhruba, andrewkr, yhchiang
Differential Revision: https://reviews.facebook.net/D56805
Summary:
Before
{F1131675}
After
{F1131681}
This will have no effect on normal make check
Test Plan: make watch-log
Reviewers: sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D56847
Summary:
Extend "J=<parallel>" to valgrind_check.
For DBTest, modify the script to run valgrind. For other tests, prefix launch command with valgrind.
Test Plan: Run valgrind_check with J=1 and J>1 and make sure tests run under valgrind. Manually change codes to introduce memory leak and make sure "make watch-log" correctly report it.
Reviewers: yhchiang, yiwu, andrewkr, kradhakrishnan, IslamAbdelRahman
Reviewed By: kradhakrishnan, IslamAbdelRahman
Subscribers: leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D56727
Summary:
- Update Makefile check_some command to run a subset of the tests
- Update travis config to split the unittests job into 2 jobs
-- job testing db_test, db_test2
-- job testing the rest of the unittests
Test Plan:
Run the new travis.yml on my own branch
https://travis-ci.org/facebook/rocksdb/builds/122691453
Reviewers: kradhakrishnan, yhchiang, sdong, andrewkr
Reviewed By: andrewkr
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D56673
Summary: Test DBOptionsAllFieldsSettable sometimes fails under valgrind. Move option settable tests to a separate test file and disable it in valgrind..
Test Plan: Run valgrind test and make sure the test doesn't run.
Reviewers: andrewkr, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: kradhakrishnan, yiwu, yhchiang, leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D56529
Summary:
Undefined Behavior Sanitizer (ubsan) is //a good thing// which will help us to find sneaky bugs with low cost. Please see http://developerblog.redhat.com/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/ for more details and official GCC documentation for more context: https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html.
Changes itself are quite simple and pretty much imitating whatever is implemented for ASan.
Hooking the UBsan validation build to Sandcastle is a separate step and will be dealt as separate diff because code is in internal repository.
Test Plan: Make sure that that there no regressions when it comes to builds and test pass rate.
Reviewers: leveldb, sdong, IslamAbdelRahman, yhchiang
Reviewed By: yhchiang
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D56049
Summary:
Basic test cases:
- Manifest is lost or corrupt
- Manifest refers to too many or too few SST files
- SST file is corrupt
- Unflushed data is present when RepairDB is called
Depends on D55065 for its CreateFile() function in file_utils
Test Plan: Ran the tests.
Reviewers: IslamAbdelRahman, yhchiang, yoshinorim, sdong
Reviewed By: sdong
Subscribers: leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D55485
Summary:
Cache to have an option to fail Cache::Insert() when full. Update call sites to check status and handle error.
I totally have no idea what's correct behavior of all the call sites when they encounter error. Please let me know if you see something wrong or more unit test is needed.
Test Plan: make check -j32, see tests pass.
Reviewers: anthony, yhchiang, andrewkr, IslamAbdelRahman, kradhakrishnan, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D54705
Summary: We want to provide a way to detect whether an iterator is stale and needs to be recreated. Add a iterator property to return version number.
Test Plan: Add two unit tests for it.
Reviewers: IslamAbdelRahman, yhchiang, anthony, kradhakrishnan, andrewkr
Reviewed By: andrewkr
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D54921
Summary: similar to D52809 add option to exclude zero counters.
Test Plan:
[yiwu@dev4504.prn1 ~/rocksdb] ./iostats_context_test
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from IOStatsContextTest
[ RUN ] IOStatsContextTest.ToString
[ OK ] IOStatsContextTest.ToString (0 ms)
[----------] 1 test from IOStatsContextTest (0 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (0 ms total)
[ PASSED ] 1 test.
Reviewers: anthony, yhchiang, andrewkr, IslamAbdelRahman, kradhakrishnan, sdong
Reviewed By: sdong
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D54591
Summary:
Users are confused on how to get the parallel compilation going. This
can help wire the parallelism.
Test Plan: Run manually
Reviewers: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D53931
Summary:
as titled, this will prevent the error that was printed because
test_names was evaluated before db_test was built.
Test Plan:
verified below command works and no longer prints errors:
$ make release -j32
verified below command still finds the right tests:
$ make J=32 parallel_check
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D54117
Summary: unit_481 is misspelt. Fixing it.
Test Plan: Running make commit_prereq
Reviewers: leveldb
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D53757
Summary:
db/c_test.c uses the functions in db/c.cc. If we have tags generated
for one but not the other, it's easy to make mistakes like updating a function
signature and missing a call site.
Test Plan:
$ make tags
in vim:
:cscope find s rocksdb_options_set_compression_options
...
3 325 db/c_test.c <<main>>
rocksdb_options_set_compression_options(options, -14, -1, 0);
Reviewers: sdong, yhchiang, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D53685
Summary:
Added make targets parallel_test and parallel_dbtest to run
tests in parallel. Each test is run 32 times in parallel. There is a
timeout to catch hangs. The test continues after a failure and reports
non-zero status on failure
Test Plan: Run the two make targets
Reviewers: anthony, yhchiang, IslamAbdelRahman, kradhakrishnan, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D53079
Summary: This patch provides a mechanism to run pre commit tests on the local
branch before committing. This can help prevent frequent build breaks.
The tests can be run in parallel by specifying the J=<..> environment
variable.
Test Plan: Run manually
Reviewers: sdong rven tec
CC: leveldb@
Task ID: #9689218
Blame Rev:
Summary:
Moved all the tests that verify property correctness into a separate
file. The goal is to reduce compile time and complexity of db_test. I didn't
add parallelism for db_properties_test, even though these tests were
parallelized in db_test, since the file is small enough that it won't matter.
Some of these moves may be controversial since it's hard to say whether the
test is "verifying property correctness," or "using properties to verify
rocksdb's correctness." I'm interested in any opinions.
Test Plan: ran db_properties_test, also waiting on "make commit-prereq -j32"
Reviewers: yhchiang, IslamAbdelRahman, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52995
These simple changes are required to allow builds on ppc64[le] systems
consistent with X86. The Makefile now recognizes both ppc64 and ppc64le, and
in the absence of PORTABLE=1, the code will be built analogously to the X86
-march=native.
Note that although GCC supports -mcpu=native -mtune=native on POWER, it
doesn't work correctly on all systems. This is why we need to get the actual
machine model from the AUX vector.
Makefile adjust paths for solaris build
Makefile enable _GLIBCXX_USE_C99 so that std::to_string is available
db_compaction_test.cc Initialise a variable to avoid a compilation error
db_impl.cc Include <alloca.h>
db_test.cc Include <alloca.h>
Environment.java recognise solaris envrionment
options_bulder.cc Make log unambiguous
geodb_impl.cc Make log and floor unambiguous
Summary:
GTEST dont compile under clang when -Werror=missing-field-initializers is set
revert this change
Test Plan:
USE_CLANG=1 make check
make check
Reviewers: rven
Reviewed By: rven
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D52455
Summary:
myrocks seems to build rocksdb using
-Wmissing-field-initializers (and treats warnings as errors). This diff
adds that flag to the rocksdb build, and fixes the compilation failures
that result. I have not checked for any other differences in the build
flags for rocksdb build as part of myrocks.
Test Plan: make check
Reviewers: sdong, rven
Reviewed By: rven
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D52443
This is an Env implementation that mirrors all storage-related methods on
two different backend Env's and verifies that they return the same
results (return status and read results). This is useful for implementing
a new Env and verifying its correctness.
Signed-off-by: Sage Weil <sage@redhat.com>
Summary: Ldb and sst_dump are not included in shared library now. Add it.
Test Plan:
Build
make release
make shared_lib
Reviewers: igor, kradhakrishnan, rven, yhchiang, IslamAbdelRahman, anthony
Reviewed By: IslamAbdelRahman, anthony
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D51735
Summary:
This diff is 1/3 in a sequence that introduces a skip list optimized for
a key that is a freshly-allocated const char*. The diff is broken into
pieces to make it easier to review. This piece only introduces the new
type by copying the existing SkipList, with mechanical naming changes
and reformatting.
Test Plan: new unit test
Reviewers: igor, sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D51279
Summary: forward_iterator_bench is not stable enough for build. Remove it for now.
Test Plan: Build it with both of CLANG and non-CLANG and make sure it builds.
Reviewers: rven, kradhakrishnan, anthony, yhchiang, igor, IslamAbdelRahman
Reviewed By: igor, IslamAbdelRahman
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D50991
Summary:
In case rocksdb java package is built using make rocksdbjavastaticrelease, then
only those rocksdb binary built under the virtual environments is release build.
This patch fix this issue.
Test Plan:
PORTABLE=1 V=2 make rocksdbjavastaticrelease -j32
and make sure -O2 and -NDEBUG is included when compiling all source files.
Reviewers: sdong, anthony, IslamAbdelRahman, rven, kradhakrishnan, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D50895
Summary:
Under a tailing workload, there were increased block cache
misses when a memtable was flushed because we were rebuilding iterators
in that case since the version set changed. This was exacerbated in the
case of iterate_upper_bound, since file iterators which were over the
iterate_upper_bound would have been deleted and are now brought back as
part of the Rebuild, only to be deleted again. We now renew the iterators
and only build iterators for files which are added and delete file
iterators for files which are deleted.
Refer to https://reviews.facebook.net/D50463 for previous version
Test Plan: DBTestTailingIterator.TailingIteratorTrimSeekToNext
Reviewers: anthony, IslamAbdelRahman, igor, tnovak, yhchiang, sdong
Reviewed By: sdong
Subscribers: yhchiang, march, dhruba, leveldb, lovro
Differential Revision: https://reviews.facebook.net/D50679
Summary:
This patch adds OptionsUtil::LoadOptionsFromFile() and
OptionsUtil::LoadLatestOptionsFromDB(), which allow developers
to construct DBOptions and ColumnFamilyOptions from a RocksDB
options file. Note that most pointer-typed options such as
merge_operator will not be constructed.
With this API, developers no longer need to remember all the
options in order to reopen an existing rocksdb instance like
the following:
DBOptions db_options;
std::vector<std::string> cf_names;
std::vector<ColumnFamilyOptions> cf_opts;
// Load primitive-typed options from an existing DB
OptionsUtil::LoadLatestOptionsFromDB(
dbname, &db_options, &cf_names, &cf_opts);
// Initialize necessary pointer-typed options
cf_opts[0].merge_operator.reset(new MyMergeOperator());
...
// Construct the vector of ColumnFamilyDescriptor
std::vector<ColumnFamilyDescriptor> cf_descs;
for (size_t i = 0; i < cf_opts.size(); ++i) {
cf_descs.emplace_back(cf_names[i], cf_opts[i]);
}
// Open the DB
DB* db = nullptr;
std::vector<ColumnFamilyHandle*> cf_handles;
auto s = DB::Open(db_options, dbname, cf_descs,
&handles, &db);
Test Plan:
Augment existing tests in column_family_test
options_test
db_test
Reviewers: igor, IslamAbdelRahman, sdong, anthony
Reviewed By: anthony
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D49095
Summary:
This patch allows rocksdb to persist options into a file on
DB::Open, SetOptions, and Create / Drop ColumnFamily.
Options files are created under the same directory as the rocksdb
instance.
In addition, this patch also adds a fail_if_missing_options_file in DBOptions
that makes any function call return non-ok status when it is not able to
persist options properly.
// If true, then DB::Open / CreateColumnFamily / DropColumnFamily
// / SetOptions will fail if options file is not detected or properly
// persisted.
//
// DEFAULT: false
bool fail_if_missing_options_file;
Options file names are formatted as OPTIONS-<number>, and RocksDB
will always keep the latest two options files.
Test Plan:
Add options_file_test.
options_test
column_family_test
Reviewers: igor, IslamAbdelRahman, sdong, anthony
Reviewed By: anthony
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D48285
Summary:
Reverting c745f1d2c4 because it
was based on an incorrect understanding of the correct way to enable
TSAN tests (it assumes "make COMPILE_WITH_TSAN=1 check" but in fact only
"COMPILE_WITH_TSAN=1 make check" is supported).
Test Plan: COMPILE_WITH_TSAN=1 make check
Reviewers: kradhakrishnan
Reviewed By: kradhakrishnan
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D50445
Summary:
TSAN builds for gcc 4.9 need a PIC version of the libraries
taken from the fbcode platform. This is accomplished by assuming every
.a has a _pic.a sibling, and by fixing the third-party2 zlib build.
Test Plan: make COMPILE_WITH_TSAN=1 check
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D50331
Summary:
This patch introduces utilities/memory, which currently includes
GetApproximateMemoryUsageByType that reports different types of
rocksdb memory usage given a list of input DBs.
The API also take care of the case where Cache could be shared
across multiple column families / multiple db instances.
Currently, it reports memory usage of memtable, table-readers
and cache.
Test Plan: utilities/memory/memory_test.cc
Reviewers: igor, anthony, IslamAbdelRahman, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D49257
Summary:
The goal of this diff is to create a simple stress test with focus on catching:
* bugs in compaction/flush processes, especially the ones that cause assertion errors
* bugs in the code that deletes obsolete files
There are two parts of the test:
* write_stress, a binary that writes to the database
* write_stress_runner.py, a script that invokes and kills write_stress
Here are some interesting parts of write_stress:
* Runs with very high concurrency of compactions and flushes (32 threads total) and tries to create a huge amount of small files
* The keys written to the database are not uniformly distributed -- there is a 3-character prefix that mutates occasionally (in prefix mutator thread), in such a way that the first character mutates slower than second, which mutates slower than third character. That way, the compaction stress tests some interesting compaction features like trivial moves and bottommost level calculation
* There is a thread that creates an iterator, holds it for couple of seconds and then iterates over all keys. This is supposed to test RocksDB's abilities to keep the files alive when there are references to them.
* Some writes trigger WAL sync. This is stress testing our WAL sync code.
* At the end of the run, we make sure that we didn't leak any of the sst files
write_stress_runner.py changes the mode in which we run write_stress and also kills and restarts it. There are some interesting characteristics:
* At the beginning we divide the full test runtime into smaller parts -- shorter runtimes (couple of seconds) and longer runtimes (100, 1000) seconds
* The first time we run write_stress, we destroy the old DB. Every next time during the test, we use the same DB.
* We can run in kill mode or clean-restart mode. Kill mode kills the write_stress violently.
* We can run in mode where delete_obsolete_files_with_fullscan is true or false
* We can run with low_open_files mode turned on or off. When it's turned on, we configure table cache to only hold a couple of files -- that way we need to reopen files every time we access them.
Another goal was to create a stress test without a lot of parameters. So tools/write_stress_runner.py should only take one parameter -- runtime_sec and it should figure out everything else on its own.
In a separate diff, I'll add this new test to our nightly legocastle runs.
Test Plan:
The goal of this test was to retroactively catch the following bugs: D33045, D48201, D46899, D42399. I failed to reproduce D48201, but all others have been caught!
When i reverted https://reviews.facebook.net/D33045:
./write_stress --runtime_sec=200 --low_open_files_mode=true
Iterator statuts not OK: IO error: /fast-rocksdb-tmp/rocksdb_test/write_stress/089166.sst: No such file or directory
When i reverted https://reviews.facebook.net/D42399:
python tools/write_stress_runner.py --runtime_sec=5000
Running write_stress, will kill after 5 seconds: ./write_stress --runtime_sec=-1
Running write_stress, will kill after 2 seconds: ./write_stress --runtime_sec=-1 --destroy_db=false --delete_obsolete_files_with_fullscan=true
Running write_stress, will kill after 7 seconds: ./write_stress --runtime_sec=-1 --destroy_db=false
Running write_stress, will kill after 5 seconds: ./write_stress --runtime_sec=-1 --destroy_db=false
Running write_stress, will kill after 8 seconds: ./write_stress --runtime_sec=-1 --destroy_db=false --low_open_files_mode=true
Write to DB failed: IO error: /fast-rocksdb-tmp/rocksdb_test/write_stress/019250.sst: No such file or directory
ERROR: write_stress died with exitcode=-6
When i reverted https://reviews.facebook.net/D46899:
python tools/write_stress_runner.py --runtime_sec=1000
runtime: 1000
Going to execute write stress for [3, 3, 100, 3, 2, 100, 1, 788]
Running write_stress for 3 seconds: ./write_stress --runtime_sec=3 --low_open_files_mode=true
Running write_stress for 3 seconds: ./write_stress --runtime_sec=3 --destroy_db=false --delete_obsolete_files_with_fullscan=true
Running write_stress, will kill after 100 seconds: ./write_stress --runtime_sec=-1 --destroy_db=false --delete_obsolete_files_with_fullscan=true
write_stress: db/db_impl.cc:2070: void rocksdb::DBImpl::MarkLogsSynced(uint64_t, bool, const rocksdb::Status&): Assertion `log.getting_synced' failed.
ERROR: write_stress died with exitcode=-6
Reviewers: IslamAbdelRahman, yhchiang, rven, kradhakrishnan, sdong, anthony
Reviewed By: anthony
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D49533
Summary: Use DEBUG_LEVEL=0 in make release and make clean
Test Plan:
make clean
make release -j32
Reviewers: MarkCallaghan, sdong, anthony, IslamAbdelRahman, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D49125
Summary:
merge tools/db_crashtest2.py into tools/db_crashtest.py
python tools/db_crashtest.py -h # show help message, ALL parameters can be overwrite by arguments
Example usages:
python tools/db_crashtest.py blackbox # run blackbox with default parameters
python tools/db_crashtest.py blackbox --simple
python tools/db_crashtest.py whitebox # run whitebox with default parameters
python tools/db_crashtest.py whitebox --simple
all default parameters are identical to previous version.
Test Plan: `make crash_test` and make sure it can run with expected parameters pased to db_stress.
Reviewers: igor, rven, anthony, IslamAbdelRahman, yhchiang, sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D48567
Summary:
In the current make file, DEBUG_LEVEL is forced to set to 1
instead of default to 1. This patch fix this issue.
Test Plan:
DEBUG_LEVEL=2 make db_bench -j32
DEBUG_LEVEL=0 make db_bench -j32
DEBUG_LEVEL=0 make release -j32
And see whether there's a warning pops up.
Reviewers: MarkCallaghan, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D48933