Summary:
Rename Version::Builder to VersionBuilder and expose its definition to a header.
Make VerisonBuilder not reference Version or ColumnFamilyData, only working with VersionStorageInfo.
Add version_builder_test which has a simple test.
Test Plan: make all check
Reviewers: rven, yhchiang, igor, ljin
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D27969
Summary: Apply InfoLogLevel to the logs in db/db_iter.cc
Test Plan: make
Reviewers: igor, ljin, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27861
Summary: CompactionPickerTest.Level1Trigger2 now depends on the STL implementation to be correct. Fix it.
Test Plan: Run the test
Reviewers: ljin, yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D27963
Summary: Decoupling code that deals with archived log files outside of DBImpl. That will make this code easier to reason about and test. It will also make the code easier to improve, because an improver doesn't have to understand DBImpl code in entirety.
Test Plan: added test
Reviewers: ljin, yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27873
Summary:
comparator_db_test now adds verification for three more comparators:
(1) one that store double as string
(2) one that cast uint64 to string
(3) one that concatenate two strings, prefixing their sizes.
(4) one that order by hash of the string
Test Plan:
Run ./comparator_db_test
Reviewers: ljin, yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D27927
Summary:
Increase the level size so that impact of a single file is smaller.
Also relax the bound
Test Plan: ran locally
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27939
Summary: Add a new unit test in compaction_picker_test to make sure level-based compaction to pick up the level with the largest score.
Test Plan: Run the new test
Reviewers: ljin, yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D27933
Summary:
Add some helper functions to make sure DB works well for non-default comparators.
Add a test for SimpleSuffixReverseComparator.
Test Plan: Run the new test
Reviewers: ljin, rven, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D27831
Summary:
Make compaction picker easier to test.
The basic idea is to separate a minimum subcomponent of Version to VersionStorageInfo, which just responsible to LSM tree. A stub VersionStorageInfo can then be easily created and passed into compaction picker so that we can check the outputs.
It now passes most tests. Still two things need to be done:
(1) deal with the FIFO compaction's file size.
(2) write an example test to make sure the interface can do the job.
Add a compaction_picker_test to make sure compaction picker codes can be easily unit tested.
Test Plan:
Pass all unit tests and compaction_picker_test
Reviewers: yhchiang, rven, igor, ljin
Reviewed By: ljin
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D27639
Summary: Apply InfoLogLevel to the logs in db/transaction_log_impl.h
Test Plan: make
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27867
Summary: Apply InfoLogLevel to the logs in db/repair.cc
Test Plan: make
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27855
Summary: Apply InfoLogLevel to the logs in db/flush_job.cc
Test Plan: make
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27849
Summary: Apply InfoLogLevel to the logs in db/column_family.cc
Test Plan: make
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27843
Summary: Apply InfoLogLevel to the logs in db/compaction_picker.cc
Test Plan: make
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27837
Summary: Apply InfoLogLevel to the logs in db/db_filesnapshot.cc
Test Plan: make
Reviewers: ljin, sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27813
Summary:
So now all open() in db_test should get options from callsite. And
destroy() always uses the last used options saved on open()
I will start to integrate env_mem in the next diff
Test Plan: make all check -j32
Reviewers: sdong, yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27819
Summary: as title
Test Plan: as part 1
Reviewers: sdong, yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27789
Summary: as title
Test Plan: same as part 1
Reviewers: sdong, yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27705
Summary: as title
Test Plan: same as part 1
Reviewers: sdong, yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27693
Summary:
DBTest has several functions (Reopen(), TryReopen(), ChangeOptins(), etc
that takes a pointer to options), depending on if it is nullptr, it uses
different options underneath. This makes it really hard to track what
options is used in different test case. We should just kill the default
value and make it being passed into explicitly. It is going to be very
hairy. I will start with simple ones.
Test Plan:
make db_test
stacked diffs, will run test with full stack
Reviewers: sdong, yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27687
Summary:
This diff replaces BlockBasedTable in flush_job_test with TableMock, making it depend on less things and making it closer to an unit test than integration test.
It also introduces a framework to compile mock classes -- Any file named *mock.cc will not be compiled into the build. It will only get compiled into the tests. What way we can mock out most other classes, Version, VersionSet, DBImpl, etc.
Test Plan: flush_job_test
Reviewers: ljin, rven, yhchiang, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27681
Summary:
This diff has two fixes.
1. Fix the bug where compaction does not fail when RocksDB can't create a new file.
2. When NewWritableFiles() fails in OpenCompactionOutputFiles(), previously such fail-to-created file will be still be included as a compaction output. This patch also fixes this bug.
3. Allow VersionEdit::EncodeTo() to return Status and add basic check.
Test Plan:
./version_edit_test
export ROCKSDB_TESTS=FileCreationRandomFailure
./db_test
Reviewers: ljin, sdong, nkg-, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D25581
Summary:
Abstract out FlushProcess and take it out of DBImpl.
This also includes taking DeletionState outside of DBImpl.
Currently this diff is only doing the refactoring. Future work includes:
1. Decoupling flush_process.cc, make it depend on less state
2. Write flush_process_test, which will mock out everything that FlushProcess depends on and test it in isolation
Test Plan: make check
Reviewers: rven, yhchiang, sdong, ljin
Reviewed By: ljin
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27561
Summary: No need to expose them in .h
Test Plan: make release
Reviewers: igor, yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27645
Summary: as title
Test Plan:
make release
will run full test on all stacked diffs
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27597
Summary: as title
Test Plan:
make release
will run full test on all stacked diffs before committing
Reviewers: sdong, yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27591
Summary:
We have several different types of data structures for file information.
FileLevel is kinda of confusing since it only contains file range and
fd. Rename it to LevelFilesBrief to make it clear.
Unfriend CompactedDBImpl as a by product
Test Plan:
make release / make all
will run full test with all stacked diffs
Reviewers: sdong, yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27585
Summary: as title
Test Plan:
make release
I will run make all check for all stacked diffs before commit
Reviewers: sdong, yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27573
Summary: as title
Test Plan: running make all check
Reviewers: sdong, yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27549
Summary: RocksDB already depends on C++11, so we might as well all the goodness that C++11 provides. This means that we don't need AtomicPointer anymore. The less things in port/, the easier it will be to port to other platforms.
Test Plan: make check + careful visual review verifying that NoBarried got memory_order_relaxed, while Acquire/Release methods got memory_order_acquire and memory_order_release
Reviewers: rven, yhchiang, ljin, sdong
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D27543
Summary:
Make inplace_update_support and inplace_update_num_locks dynamic.
inplace_callback becomes immutable
We are almost free of references to cfd->options() in db_impl
Test Plan: unit test
Reviewers: igor, yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D25293
Summary: D24513 introduced a bug that a variable is not initialized. It also causes valgrind issue.
Test Plan: Run tests used to fail valgrind and make sure it passes
Reviewers: yhchiang, ljin, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D25569
Summary:
Previously, the log for Universal Compaction does not include the current
number of files in case the compaction is triggered by the number of files.
This diff includes the number of files in the log.
Test Plan:
make
Summary: as title
Test Plan: unit test
Reviewers: sdong, yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D25347
Summary: as title
Test Plan: n/a
Reviewers: yhchiang, sdong, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24963
Summary:
This is not a critical options. Making it dynamic so that we can remove
more reference to cfd->options()
Test Plan: unit test
Reviewers: yhchiang, sdong, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24957
Summary: Now --bench_size is only used in multireadrandom tests, although the codes allow it to run in all write tests. I don't see a reason why we can't enable it.
Test Plan:
Run
./db_bench -benchmarks multirandomwrite --threads=5 -batch_size=16
and see the stats printed out in LOG to make sure batching really happened.
Reviewers: ljin, yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D25509
Summary: It is useful to print out number of keys in DB Stats
Test Plan:
./db_bench --benchmarks fillrandom --num 1000000 -threads 16 -batch_size=16
and watch the outputs in LOG files
Reviewers: MarkCallaghan, ljin, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24513
Summary:
DeleteFile() call was broken for non-default column family. This fixes it. We might need this feature for mongo.
I also introduced a possibility of deleting oldest file in level 0.
Test Plan: added unit test to deletefile_test
Reviewers: ljin, yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24909
Summary:
Add a function as tittle.
Also use the same parameter to fillseekseq too.
Test Plan: Run seekrandom using the new parameter
Reviewers: ljin, MarkCallaghan
Reviewed By: MarkCallaghan
Subscribers: rven, igor, yhchiang, leveldb
Differential Revision: https://reviews.facebook.net/D25035
Summary:
This diff speeds up DB::Open() and Version creation by limiting the number of FileMetaData initialization. The behavior of Version::UpdateAccumulatedStats() is changed as follows:
* It only initializes the first 20 uninitialized FileMetaData from file. This guarantees the size of the latest 20 files will always be compensated when they have any deletion entries. Previously it may initialize all FileMetaData by loading all files at DB::Open().
* In case none the first 20 files has any data entry, UpdateAccumulatedStats() will initialize the FileMetaData of the oldest file.
Test Plan: db_test
Reviewers: igor, sdong, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24255
Summary:
Fix the following Mac compile error.
db/db_test.cc:8686:52: error: C++11 forbids default arguments for lambda expressions [-Werror,-Wlambda-extensions]
auto gen_l0_kb = [this](int start, int size, int stride = 1) {
^ ~
Test Plan:
db_test
Summary: as title
Test Plan: make release
Reviewers: igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D25029
Summary: Added one new counter for GetProperty
Test Plan: Not sure if needs a test case. compiles
Reviewers: sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D25023
Summary: as title
Test Plan: make release
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24903
Summary: as title
Test Plan:
unit test
I am only able to build the test case for hard_rate_limit.
soft_rate_limit is essentially the same thing as hard_rate_limit
Reviewers: igor, sdong, yhchiang
Reviewed By: yhchiang
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24759
Summary: Add more tests as well
Test Plan: unit test
Reviewers: igor, sdong, yhchiang
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24747
Summary: as title
Test Plan: unit test
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D24729
Summary:
Fixed the following compile error on Mac.
db/db_test.cc:8618:52: error: C++11 forbids default arguments for lambda expressions [-Werror,-Wlambda-extensions]
auto gen_l0_kb = [this](int start, int size, int stride = 1) {
^ ~
1 error generated.
Test Plan:
db_test
Summary: option max_background_flushes doesn't make sense if thread pool size is not set accordingly. Set the thread pool size as what we do for max_background_compactions.
Test Plan: Run db_bench with max_background_flushes > 1
Reviewers: yhchiang, igor, rven, ljin
Reviewed By: ljin
Subscribers: MarkCallaghan, leveldb
Differential Revision: https://reviews.facebook.net/D24717
Summary:
This diff introduces the `lookahead` argument to `SkipListFactory()`. This is an
optimization for the tailing use case which includes many seeks. E.g. consider
the following operations on a skip list iterator:
Seek(x), Next(), Next(), Seek(x+2), Next(), Seek(x+3), Next(), Next(), ...
If `lookahead` is positive, `SkipListRep` will return an iterator which also
keeps track of the previously visited node. Seek() then first does a linear
search starting from that node (up to `lookahead` steps). As in the tailing
example above, this may require fewer than ~log(n) comparisons as with regular
skip list search.
Test Plan:
Added a new benchmark (`fillseekseq`) which simulates the usage pattern. It
first writes N records (with consecutive keys), then measures how much time it
takes to read them by calling `Seek()` and `Next()`.
$ time ./db_bench -num 10000000 -benchmarks fillseekseq -prefix_size 1 \
-key_size 8 -write_buffer_size $[1024*1024*1024] -value_size 50 \
-seekseq_next 2 -skip_list_lookahead=0
[...]
DB path: [/dev/shm/rocksdbtest/dbbench]
fillseekseq : 0.389 micros/op 2569047 ops/sec;
real 0m21.806s
user 0m12.106s
sys 0m9.672s
$ time ./db_bench [...] -skip_list_lookahead=2
[...]
DB path: [/dev/shm/rocksdbtest/dbbench]
fillseekseq : 0.153 micros/op 6540684 ops/sec;
real 0m19.469s
user 0m10.192s
sys 0m9.252s
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb, march, lovro
Differential Revision: https://reviews.facebook.net/D23997
Summary: This will be much easier than reviewing git sha's we currently have in our LOGs
Test Plan: none
Reviewers: sdong, yhchiang, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24591
Summary:
The test only covers changing write_buffer_size. Other changable
parameters such bloom bits/probes are not obvious how to test.
Suggestions are welcome
Test Plan: db_test
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24429
Summary:
Fix a check in database shutdown or Column family drop during flush.
Special thanks to Maurice Barnum who spots the problem :)
Test Plan: db_test
Reviewers: ljin, igor, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24273
Summary:
Add two stats to compaction summary:
1. Total input records from previous level
2. Total number of records dropped after compaction
Test Plan: See outputs of printing when runnning locally
Reviewers: ljin, igor, MarkCallaghan
Reviewed By: MarkCallaghan
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24411
Summary: perf_context.get_from_output_files_time is now only set writable DB's DB::Get(). Extend it to MultiGet() and read only DB.
Test Plan:
make all check
Fix perf_context_test and extend it to cover MultiGet(), as long as read-only DB. Run it and watch the results
Reviewers: ljin, yhchiang, igor
Reviewed By: igor
Subscribers: rven, leveldb
Differential Revision: https://reviews.facebook.net/D24207
Summary:
Fixed signed-unsigned comparison warning in db_test.cc
db/db_test.cc:8606:3: note: in instantiation of function template specialization 'rocksdb::test::Tester::IsEq<int, unsigned long>' requested here
ASSERT_EQ(2, metadata.size());
^
Test Plan:
make db_test
Summary:
Fixed compile warning caused by unused variables.
./db/compaction_picker.h:118:7: error: private field 'max_grandparent_overlap_factor_' is not used [-Werror,-Wunused-private-field]
int max_grandparent_overlap_factor_;
^
./db/compaction_picker.h:119:7: error: private field 'expanded_compaction_factor_' is not used [-Werror,-Wunused-private-field]
int expanded_compaction_factor_;
^
2 errors generated.
Test Plan:
make db_test
Summary:
Since ForwardIterator is on a level below DBIter, the latter may call Next() on
it (e.g. in order to skip deletion markers). Since this also updates
`prev_key_`, it may prevent the Seek() optimization.
For example, assume that there's only one SST file and it contains the following
entries: 0101, 0201 (`ValueType::kTypeDeletion`, i.e. a tombstone record), 0201
(`kTypeValue`), 0202. Memtable is empty. `Seek(0102)` will result in `prev_key_`
being set to `0201` instead of `0102`, since `DBIter::Seek()` will call
`ForwardIterator::Next()` to skip record 0201. Therefore, when `Seek(0102)` is
called again, `NeedToSeekImmutable()` will return true.
This fix relies on `prefix_extractor_` to detect prefix changes. `prev_key_` is
only set to `current_->key()` as long as they have the same prefix.
I also made a small change to `NeedToSeekImmutable()` so it no longer returns
true when the db is empty (i.e. there's nothing but a memtable).
Test Plan:
$ TEST_TMPDIR=/dev/shm/rocksdbtest ROCKSDB_TESTS=TailingIterator ./db_test
Reviewers: sdong, igor, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23823
Summary:
make compaction related options changeable. Most of changes are tedious,
following the same convention: grabs MutableCFOptions at the beginning
of compaction under mutex, then pass it throughout the job and register
it in SuperVersion at the end.
Test Plan: make all check
Reviewers: igor, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23349
Fix for:
[db/version_set.cc:1219]: (style) Unsigned variable 'last_file'
can't be negative so it is unnecessary to test it.
[db/version_set.cc:1234]: (style) Unsigned variable 'first_file'
can't be negative so it is unnecessary to test it.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Fix for:
[db/compaction_picker.cc:923]: (style) Unsigned variable
'start_index' can't be negative so it is unnecessary to test it.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Fix for:
[db/db_impl.cc:4039]: (error) Instance of 'StopWatch' object is
destroyed immediately.
[db/db_impl.cc:4042]: (error) Instance of 'StopWatch' object is
destroyed immediately.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Use empty() since it should be prefered as it has, following
the standard, a constant time complexity regardless of the
containter type. The same is not guaranteed for size().
Fix for:
[db/version_set.cc:2250]: (performance) Possible inefficient
checking for 'column_families_not_found' emptiness.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Fix for:
[db/db_test.cc:6141]: (performance) Function parameter
'key' should be passed by reference.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Fix for:
[db/corruption_test.cc:134]: (performance) Function parameter
'fname' should be passed by reference.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
It seems that a FlushMemTable() call is needed in the
Uint64Comparator test after call Delete(). Otherwise the later
via Put() added keys get lost with the next FlushMemTable()
call before the check.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Summary:
From this line there used to be one column (MB/sec) that includes reads and writes. This change splits it and for real workloads the rd and wr rates might not match when keys are dropped.
2014/09/29-17:31:01.213162 7f929fbff700 (Original Log Time 2014/09/29-17:31:01.180025) [default] compacted to: files[2 5 0 0 0 0 0], MB/sec: 14.0 rd, 14.0 wr, level 1, files in(4, 0) out(5) MB in(8.5, 0.0) out(8.5), read-write-amplify(2.0) write-amplify(1.0) OK
Test Plan:
make check, grepped LOG
- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
Reviewers: igor
Differential Revision: https://reviews.facebook.net/D24237
Summary: Building master on OS X has some compile errors due to implicit type conversions which generate warnings which RocksDB's build settings raise as errors.
Test Plan: It compiles!
Reviewers: ljin, igor
Reviewed By: ljin
Differential Revision: https://reviews.facebook.net/D24135
Summary: see above
Test Plan:
make check, ran db_bench and looked at output
- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
Reviewers: igor
Differential Revision: https://reviews.facebook.net/D24189
Summary:
info_log from supplied Options can be nullptr. Using the one from
db_impl. Also call flush after that since no more loggging will happen
and LOG can contain partial output
Test Plan: verified with db_bench
Reviewers: igor, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24183
Summary:
Intead of passing callback function pointer and its arg on Table::Get()
interface, passing GetContext. This makes the interface cleaner and
possible better perf. Also adding a fast pass for SaveValue()
Test Plan: make all check
Reviewers: igor, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24057
Summary: There is a possible overflow case in universal compaction picker. Use double to make the logic straight-forward
Test Plan: make all check
Reviewers: yhchiang, igor, MarkCallaghan, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23817
Summary:
Add the MultiGet API to allow prefetching.
With file size of 1.5G, I configured it to have 0.9 hash ratio that can
fill With 115M keys and result in 2 hash functions, the lookup QPS is
~4.9M/s vs. 3M/s for Get().
It is tricky to set the parameters right. Since files size is determined
by power-of-two factor, that means # of keys is fixed in each file. With
big file size (thus smaller # of files), we will have more chance to
waste lot of space in the last file - lower space utilization as a
result. Using smaller file size can improve the situation, but that
harms lookup speed.
Test Plan: db_bench
Reviewers: yhchiang, sdong, igor
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23673
Summary:
Add a CompactedDBImpl that will enabled when calling OpenForReadOnly()
and the DB only has one level (>0) of files. As a performan comparison,
CuckooTable performs 2.1M/s with CompactedDBImpl vs. 1.78M/s with
ReadOnlyDBImpl.
Test Plan: db_bench
Reviewers: yhchiang, igor, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23553
Summary: While debugging clients compaction issues, I noticed bunch of delete bugs: P16329995. MakeTableName returns sst file with "/" prefix. We also need "/" prefix when we get the files though GetChildren(), so that we can properly dedup the files.
Test Plan: none
Reviewers: sdong, yhchiang, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23457
Summary:
Now the file summary is too small for printing. Enlarge it.
To enable it, allow to pass a size to log buffer.
Test Plan:
Add a unit test.
make all check
Reviewers: ljin, yhchiang
Reviewed By: yhchiang
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D21723
Summary:
Previously, one single column family is given to WriteBatchWithIndex to index keys for all column families. An extra map from column family ID to comparator is maintained which can override the default comparator given in the constructor. A WriteBatchWithIndex::SetComparatorForCF() is added for user to add comparators per column family.
Also move more codes into anonymous namespace.
Test Plan: Add a unit test
Reviewers: ljin, igor
Reviewed By: igor
Subscribers: dhruba, leveldb, yhchiang
Differential Revision: https://reviews.facebook.net/D23355
Summary: It contrains the file size to be 4G max with int
Test Plan:
tried to grep instance and made sure other related variables are also
uint64
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23697
Summary:
Compaction creates backup_input iterator even though it only needed
when compaction filter v2 is enabled
Test Plan: make all check
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23769
Summary: Use PRIu64 to format uint64 in a portable manner
Test Plan: Run "make all check"
Reviewers: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23595
Summary: Those were introduced with 2fb1fea30f because the flushing behavior changed when max_background_flushes is > 0.
Test Plan: make check
Reviewers: ljin, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23577
Summary:
MurmurHash becomes expensive when we do millions Get() a second in one
thread. Add this option to allow the first hash function to use identity
function as hash function. It results in QPS increase from 3.7M/s to
~4.3M/s. I did not observe improvement for end to end RocksDB
performance. This may be caused by other bottlenecks that I will address
in a separate diff.
Test Plan:
```
[ljin@dev1964 rocksdb] ./cuckoo_table_reader_test --enable_perf --file_dir=/dev/shm --write --identity_as_first_hash=0
==== Test CuckooReaderTest.WhenKeyExists
==== Test CuckooReaderTest.WhenKeyExistsWithUint64Comparator
==== Test CuckooReaderTest.CheckIterator
==== Test CuckooReaderTest.CheckIteratorUint64
==== Test CuckooReaderTest.WhenKeyNotFound
==== Test CuckooReaderTest.TestReadPerformance
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.272us (3.7 Mqps) with batch size of 0, # of found keys 125829120
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.138us (7.2 Mqps) with batch size of 10, # of found keys 125829120
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.142us (7.1 Mqps) with batch size of 25, # of found keys 125829120
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.142us (7.0 Mqps) with batch size of 50, # of found keys 125829120
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.144us (6.9 Mqps) with batch size of 100, # of found keys 125829120
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.201us (5.0 Mqps) with batch size of 0, # of found keys 104857600
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.121us (8.3 Mqps) with batch size of 10, # of found keys 104857600
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.123us (8.1 Mqps) with batch size of 25, # of found keys 104857600
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.121us (8.3 Mqps) with batch size of 50, # of found keys 104857600
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.112us (8.9 Mqps) with batch size of 100, # of found keys 104857600
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.251us (4.0 Mqps) with batch size of 0, # of found keys 83886080
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.107us (9.4 Mqps) with batch size of 10, # of found keys 83886080
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.099us (10.1 Mqps) with batch size of 25, # of found keys 83886080
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.100us (10.0 Mqps) with batch size of 50, # of found keys 83886080
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.116us (8.6 Mqps) with batch size of 100, # of found keys 83886080
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.189us (5.3 Mqps) with batch size of 0, # of found keys 73400320
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.095us (10.5 Mqps) with batch size of 10, # of found keys 73400320
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.096us (10.4 Mqps) with batch size of 25, # of found keys 73400320
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.098us (10.2 Mqps) with batch size of 50, # of found keys 73400320
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.105us (9.5 Mqps) with batch size of 100, # of found keys 73400320
[ljin@dev1964 rocksdb] ./cuckoo_table_reader_test --enable_perf --file_dir=/dev/shm --write --identity_as_first_hash=1
==== Test CuckooReaderTest.WhenKeyExists
==== Test CuckooReaderTest.WhenKeyExistsWithUint64Comparator
==== Test CuckooReaderTest.CheckIterator
==== Test CuckooReaderTest.CheckIteratorUint64
==== Test CuckooReaderTest.WhenKeyNotFound
==== Test CuckooReaderTest.TestReadPerformance
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.230us (4.3 Mqps) with batch size of 0, # of found keys 125829120
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.086us (11.7 Mqps) with batch size of 10, # of found keys 125829120
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.088us (11.3 Mqps) with batch size of 25, # of found keys 125829120
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.083us (12.1 Mqps) with batch size of 50, # of found keys 125829120
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.083us (12.1 Mqps) with batch size of 100, # of found keys 125829120
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.159us (6.3 Mqps) with batch size of 0, # of found keys 104857600
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.078us (12.8 Mqps) with batch size of 10, # of found keys 104857600
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.080us (12.6 Mqps) with batch size of 25, # of found keys 104857600
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.080us (12.5 Mqps) with batch size of 50, # of found keys 104857600
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.082us (12.2 Mqps) with batch size of 100, # of found keys 104857600
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.154us (6.5 Mqps) with batch size of 0, # of found keys 83886080
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.077us (13.0 Mqps) with batch size of 10, # of found keys 83886080
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.077us (12.9 Mqps) with batch size of 25, # of found keys 83886080
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.078us (12.8 Mqps) with batch size of 50, # of found keys 83886080
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.079us (12.6 Mqps) with batch size of 100, # of found keys 83886080
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.218us (4.6 Mqps) with batch size of 0, # of found keys 73400320
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.083us (12.0 Mqps) with batch size of 10, # of found keys 73400320
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.085us (11.7 Mqps) with batch size of 25, # of found keys 73400320
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.086us (11.6 Mqps) with batch size of 50, # of found keys 73400320
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.078us (12.8 Mqps) with batch size of 100, # of found keys 73400320
```
Reviewers: sdong, igor, yhchiang
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23451
Summary: as title
Test Plan:
make all check
I will think a way to set up stress test for this
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23055
Summary: We currently don't test mmap reads as part of db_test. Piggyback it on kWalDir test config.
Test Plan: make check
Reviewers: ljin, sdong, yhchiang
Reviewed By: yhchiang
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23337
Summary: This diff just moves the write thread control out of the DBImpl. I will need this as I will control column family data concurrency by only accessing some data in the write thread. That way, we won't have to lock our accesses to column family hash table (mappings from IDs to CFDs).
Test Plan: make check
Reviewers: sdong, yhchiang, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23301
Summary:
1. wrap a filter policy like what fbcode/multifeed/rocksdb/MultifeedRocksDbKey.h
to ensure that rocksdb works fine after filterpolicy interface change
Test Plan: 1. valgrind ./bloom_test
Reviewers: ljin, igor, yhchiang, dhruba, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23229
Summary:
If bg_error_ is set, that means that we mark DB read only. However, current behavior still continues the flushes and compactions, even though bg_error_ is set.
On the other hand, if bg_error_ is set, we will return Status::OK() from CompactRange(), although the compaction didn't actually succeed.
This is clearly not desired behavior. I found this when I was debugging t5132159, although I'm pretty sure these aren't related.
Also, when we're shutting down, it's dangerous to exit RunManualCompaction(), since that will destruct ManualCompaction object. Background compaction job might still hold a reference to manual_compaction_ and this will lead to undefined behavior. I changed the behavior so that we only exit RunManualCompaction when manual compaction job is marked done.
Test Plan: make check
Reviewers: sdong, ljin, yhchiang
Reviewed By: yhchiang
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23223
Summary: Get valgrind to stop complaining about uninitialized value
Test Plan: valgrind not complaining anymore
Reviewers: sdong, yhchiang, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23289
Summary: The test makes sure that we don't call flush too often. For that, it's ok to check if we have less than 10 table files. Otherwise, the test is flaky because it's hard to estimate number of entries in the memtable before it gets flushed (any ideas?)
Test Plan: Still works, but hopefully less flaky.
Reviewers: ljin, sdong, yhchiang
Reviewed by: yhchiang
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23241
Summary:
When memtable is full it calls the registered callback. That callback then registers column family as needing the flush. Every write checks if there are some column families that need to be flushed. This completely eliminates the need for MakeRoomForWrite() function and simplifies our Write code-path.
There is some complexity with the concurrency when the column family is dropped. I made it a bit less complex by dropping the column family from the write thread in https://reviews.facebook.net/D22965. Let me know if you want to discuss this.
Test Plan: make check works. I'll also run db_stress with creating and dropping column families for a while.
Reviewers: yhchiang, sdong, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23067
Summary:
See t5106397.
Also, few more changes:
1. in unit tests, the assumption is that writes will be dropped when there is no space left on device. I changed the wording around it.
2. InvalidArgument() errors are only when user-provided arguments are invalid. When the file is corrupted, we need to return Status::Corruption
Test Plan: make check
Reviewers: sdong, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23145
Summary: Correct some comments and typos in RocksDB.
Test Plan: Inspection
Reviewers: sdong, igor
Reviewed By: igor
Differential Revision: https://reviews.facebook.net/D23133
Summary:
In column family's SanitizeOptions() [1], we make sure that min_write_buffer_number_to_merge is normal value. However, this test depended on the fact that setting min_write_buffer_number_to_merge to be bigger than max_write_buffer_number will cause a deadlock. I'm not sure how it worked before.
This diff fixes it by scheduling sleeping background task, which will actually block any attempts of flushing.
[1] https://github.com/facebook/rocksdb/blob/master/db/column_family.cc#L104
Test Plan: the test works now
Reviewers: yhchiang, sdong, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23103
Summary: To follow the coding convention and make sure when passing reference as a parameter it is also const, pass MergeContext as a pointer to mem tables.
Test Plan: make all check
Reviewers: ljin, igor
Reviewed By: igor
Subscribers: leveldb, dhruba, yhchiang
Differential Revision: https://reviews.facebook.net/D23085
Summary: Avoid creating unnecessary sst files while db opening
Test Plan: make all check
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: zagfox, yhchiang, ljin, leveldb
Differential Revision: https://reviews.facebook.net/D20661
Summary: removed reference to options in WriteBatch and DBImpl::Get()
Test Plan: make all check
Reviewers: yhchiang, igor, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23049
Summary: as title
Test Plan: make release
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23007
Summary: ...
Test Plan: Can't repro the test failure, but let's see what jenkins says
Reviewers: zagfox, sdong, ljin
Reviewed By: sdong, ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23061
Summary:
all shared_ptrs are in immutable_options now. This will also make
options assignment a little cheaper
Test Plan: make release
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23001
Summary:
I found it is almost impossible to get rid of this function in a single
batch. I will take a step by step approach
Test Plan: make release
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22995
Summary:
When we have multiple column families, users can issue Flush() on every column families to make sure everything is flushes, even if some of them might be empty. By skipping the waiting for empty cases, it can be greatly speed up.
Still wait for people's comments before writing unit tests for it.
Test Plan: Will write a unit test to make sure it is correct.
Reviewers: ljin, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D22953
Summary:
Introducing WriteController, which is a source of truth about per-DB write delays. Let's define an DB epoch as a period where there are no flushes and compactions (i.e. new epoch is started when flush or compaction finishes). Each epoch can either:
* proceed with all writes without delay
* delay all writes by fixed time
* stop all writes
The three modes are recomputed at each epoch change (flush, compaction), rather than on every write (which is currently the case).
When we have a lot of column families, our current pull behavior adds a big overhead, since we need to loop over every column family for every write. With new push model, overhead on Write code-path is minimal.
This is just the start. Next step is to also take care of stalls introduced by slow memtable flushes. The final goal is to eliminate function MakeRoomForWrite(), which currently needs to be called for every column family by every write.
Test Plan: make check for now. I'll add some unit tests later. Also, perf test.
Reviewers: dhruba, yhchiang, MarkCallaghan, sdong, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22791
Summary:
1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The previous one is the traditional filter block. The full_filter_block is newly added. It would generate a filter block that contain all the keys in SST file.
2. When querying a key, table would first check if full_filter is available. If not, it would go to the exact data block and check using block_based filter.
3. User could choose to use full_filter or tradional(block_based_filter). They would be stored in SST file with different meta index name. "filter.filter_policy" or "full_filter.filter_policy". Then, Table reader is able to know the fllter block type.
4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h.
5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc
Benchmark: base commit 1d23b5c470
Command:
db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1
Read QPS increase for about 30% from 2230002 to 2991411.
Test Plan:
make all check
valgrind db_test
db_stress --use_block_based_filter = 0
./auto_sanity_test.sh
Reviewers: igor, yhchiang, ljin, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D20979
Summary: If we drop column family only from (single) write thread, we can be sure that nobody will drop the column family while we're writing (and our mutex is released). This greatly simplifies my patch that's getting rid of MakeRoomForWrite().
Test Plan: make check, but also running stress test
Reviewers: ljin, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22965
Summary:
That way we can see when this graph goes up and be happy.
Couple of changes:
1. title
2. fix db_bench to delete column families before deleting the DB. this was asserting when compiled in debug mode
3. don't sync manifest when disableDataSync. We discussed this offline. I can move it to separate diff if you'd like
Test Plan: ran it
Reviewers: sdong, yhchiang, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22815
Summary: as title
Test Plan: make release
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22935
Summary: Fix compaction bug in Cuckoo Table Builder. Use kvs_.size() instead of num_entries in FileSize() method. Also added tests.
Test Plan:
make check all
Also ran db_bench to generate multiple files.
Reviewers: sdong, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22743
Summary: fixed memory leak in unit test DBIteratorBoundTest
Test Plan: ran valgrind test on my unit test
Reviewers: sdong
Differential Revision: https://reviews.facebook.net/D22911
Summary:
PlainTable takes reference instead of a copy. Keep a copy in the test
code
Test Plan: make asan_check
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22899
Summary:
Simply code by removing code path which does not use Arena
from NewInternalIterator
Test Plan:
make all check
make valgrind_check
Reviewers: sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22395
Summary:
As a preparation to support updating some options dynamically, I'd like
to first introduce ImmutableOptions, which is a subset of Options that
cannot be changed during the course of a DB lifetime without restart.
ColumnFamily will keep both Options and ImmutableOptions. Any component
below ColumnFamily should only take ImmutableOptions in their
constructor. Other options should be taken from APIs, which will be
allowed to adjust dynamically.
I am yet to make changes to memtable and other related classes to take
ImmutableOptions in their ctor. That can be done in a seprate diff as
this one is already pretty big.
Test Plan: make all check
Reviewers: yhchiang, igor, sdong
Reviewed By: sdong
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D22545
1, const qualifiers on return types make no sense and will trigger a compile warning: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
2, class HistogramImpl has virtual functions and thus should have a virtual destructor
3, with some toolchain, the macro __STDC_FORMAT_MACROS is predefined and thus should be checked before define
Change-Id: I69747a03bfae88671bfbb2637c80d17600159c99
Signed-off-by: liuhuahang <liuhuahang@zerus.co>
Summary: It only covers Open() with default column family right now
Test Plan: make release
Reviewers: igor, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22467
Summary:
Before this diff, whenever we Write to non-existing column family, Write() would fail.
This diff adds an option to not fail a Write() when WriteBatch points to non-existing column family. MongoDB said this would be useful for them, since they might have a transaction updating an index that was dropped by another thread. This way, they don't have to worry about checking if all indexes are alive on every write. They don't care if they lose writes to dropped index.
Test Plan: added a small unit test
Reviewers: sdong, yhchiang, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22143
Summary: 1. db/db_impl.cc:2324 (DBImpl::BackgroundCompaction) should not raise bg_error_ when column family is dropped during compaction.
Test Plan: 1. db_stress
Reviewers: ljin, yhchiang, dhruba, igor, sdong
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22653
This eliminates the need to remember to call PERF_TIMER_STOP when a section has
been timed. This allows more useful design with the perf timers and enables
possible return value optimizations. Simplistic example:
class Foo {
public:
Foo(int v) : m_v(v);
private:
int m_v;
}
Foo makeFrobbedFoo(int *errno)
{
*errno = 0;
return Foo();
}
Foo bar(int *errno)
{
PERF_TIMER_GUARD(some_timer);
return makeFrobbedFoo(errno);
}
int main(int argc, char[] argv)
{
Foo f;
int errno;
f = bar(&errno);
if (errno)
return -1;
return 0;
}
After bar() is called, perf_context.some_timer would be incremented as if
Stop(&perf_context.some_timer) was called at the end, and the compiler is still
able to produce optimizations on the return value from makeFrobbedFoo() through
to main().
Summary: We need to set contbuild for this :)
Test Plan: compiles
Reviewers: sdong, yhchiang, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22701
Summary:
I have an application configured with 16 background threads. Write rates are high. L0->L1 compactions is very slow and it limits the concurrency of the system. While it's happening, other 15 threads are idle. However, when there is a need of a flush, that one thread busy with L0->L1 is doing flush, instead of any other 15 threads that are just sitting there.
This diff prevents that. If there are threads that are idle, we don't let flush preempt compaction.
Test Plan: Will run stress test
Reviewers: ljin, sdong, yhchiang
Reviewed By: sdong, yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D22299
Summary:
When reading from kBlockCacheTier, ForwardIterator's internal child iterators
may end up in the incomplete state (read was unable to complete without doing
disk I/O). `ForwardIterator::status()` will correctly report that; however, the
iterator may be stuck in that state until all sub-iterators are rebuilt:
* `NeedToSeekImmutable()` may return false even if some sub-iterators are
incomplete
* one of the child iterators may be an empty iterator without any state other
that the kIncomplete status (created using `NewErrorIterator()`); seeking on
any such iterator has no effect -- we need to construct it again
Akin to rebuilding iterators after a superversion bump, this diff makes forward
iterator reset all incomplete child iterators when `Seek()` or `Next()` are
called.
Test Plan: TEST_TMPDIR=/dev/shm/rocksdbtest ROCKSDB_TESTS=TailingIterator ./db_test
Reviewers: igor, sdong, ljin
Reviewed By: ljin
Subscribers: lovro, march, leveldb
Differential Revision: https://reviews.facebook.net/D22575
Summary: It is too expensive to bump ticker to every key/vaue pair
Test Plan: make release
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22527
Summary:
1. remove class InternalFilterPolicy in db/dbformat.h
2. Transformation from internal key to user key is done in filter_block.cc
3. This is a preparation for patch D20979
Test Plan:
make all check
valgrind ./db_test
Reviewers: igor, yhchiang, ljin, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22509
Summary:
Based on discussions from t4982833. This is just a short-term fix, I plan to revamp manual compaction process as part of t4982812.
Also, I think we should schedule automatic compactions at the very end of manual compactions, not when we're done with one level. I made that change as part of this diff. Let me know if you disagree.
Test Plan: make check for now
Reviewers: sdong, tnovak, yhchiang, ljin
Reviewed By: yhchiang
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22401
Summary: No __thread for ios.
Test Plan: compile works for ios now
Reviewers: ljin, dhruba
Reviewed By: dhruba
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22491
Summary:
- New Uint64 comparator
- Modify Reader and Builder to take custom user comparators instead of bytewise comparator
- Modify logic for choosing unused user key in builder
- Modify iterator logic in reader
- test changes
Test Plan:
cuckoo_table_{builder,reader,db}_test
make check all
Reviewers: ljin, sdong
Reviewed By: ljin
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D22377
Summary: In DBImpl::Recover method, while loading memtables, also check if memtables are empty. Use this in DBImplReadonly to determine whether to lookup memtable or not.
Test Plan:
db_test
make check all
Reviewers: sdong, yhchiang, ljin, igor
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22281
Summary: also fix HISTORY.md
Test Plan: make all check
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22437
Summary:
It was creating BlockBasedTableOptions object in a loop without calling
destroy()
Test Plan: valgrind ./c_test --leak-check=full --show-reachable=yes
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22431
Summary: Add a virtual function in table factory that will print table options
Test Plan: make release
Reviewers: igor, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22149
Summary:
I will move compression related options in a separate diff since this
diff is already pretty lengthy.
I guess I will also need to change JNI accordingly :(
Test Plan: make all check
Reviewers: yhchiang, igor, sdong
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D21915
Summary:
Currently, PlainTable must use mmap_reads. When PlainTable is used but
allow_mmap_reads is not set, rocksdb will fail in flush.
This diff improve Options sanitization and add MmapReadRequired() to
TableFactory.
Test Plan:
export ROCKSDB_TESTS=PlainTableOptionsSanitizeTest
make db_test -j32
./db_test
Reviewers: sdong, ljin
Reviewed By: ljin
Subscribers: you, leveldb
Differential Revision: https://reviews.facebook.net/D21939
Summary: A previous change triggered a change by mistake: DestroyDB() will keep info logs under DB directory. Revert the unintended change.
Test Plan: Add a unit test case to verify it.
Reviewers: ljin, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22209
Summary:
Adding num_column_families flag. Adding support for column families in DoWrite and ReadRandom methods.
[Igor, please let me know if this approach sounds good. I shall add it to other methods too.]
Test Plan: Ran fillseq on 1M keys and 10 Column families and ran readrandom.
Reviewers: sdong, yhchiang, igor, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D21387
Summary:
Add WriteBatchWithIndex so that a user can query data out of a WriteBatch, to support MongoDB's read-its-own-write.
WriteBatchWithIndex uses a skiplist to store the binary index. The index stores the offset of the entry in the write batch. When searching for a key, the key for the entry is read by read the entry from the write batch from the offset.
Define a new iterator class for querying data out of WriteBatchWithIndex. A user can create an iterator of the write batch for one column family, seek to a key and keep calling Next() to see next entries.
I will add more unit tests if people are OK about this API.
Test Plan:
make all check
Add unit tests.
Reviewers: yhchiang, igor, MarkCallaghan, ljin
Reviewed By: ljin
Subscribers: dhruba, leveldb, xjin
Differential Revision: https://reviews.facebook.net/D21381
Summary: Adding flags to use cuckoo table SST in db_bench.cc
Test Plan: Ran benchmark with fillseq and readrandom
Reviewers: sdong, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D21729
Summary:
Avoid retrying to read property block from a table when it does not exist
in updating stats for compensating deletion entries.
In addition, ReadTableProperties() now returns Status::NotFound instead
of Status::Corruption when table properties does not exist in the file.
Test Plan:
make db_test -j32
export ROCKSDB_TESTS=CompactionDeleteionTrigger
./db_test
Reviewers: ljin, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D21867
Summary:
1. Support purging info logs from a separate paths from DB path. Refactor the codes of generating info log prefixes so that it can be called when generating new files and scanning log directory.
2. Fix the bug of not scanning multiple DB paths (should only impact multiple DB paths)
Test Plan:
Add unit test for generating and parsing info log files
Add end-to-end test in db_test
Reviewers: yhchiang, ljin
Reviewed By: ljin
Subscribers: leveldb, igor, dhruba
Differential Revision: https://reviews.facebook.net/D21801
Summary: This is a linux-specific system call.
Test Plan: ran db_bench
Reviewers: igor, yhchiang, sdong
Reviewed By: sdong
Subscribers: haobo, leveldb
Differential Revision: https://reviews.facebook.net/D21183
Summary: as title
Test Plan: make release
Reviewers: sdong, yhchiang
Reviewed By: yhchiang
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D21717
Summary: 1. write db MANIFEST, CURRENT, IDENTITY, sst files, log files to log before open
Test Plan: run db and check LOG file
Reviewers: ljin, yhchiang, igor, dhruba, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D21459
Summary: as title
Test Plan:
make all check
ran db_bench and saw seek stats at the end
Reviewers: yhchiang, igor, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D21651
Summary: as title
Test Plan: make all check
Reviewers: igor, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D21201
Previously, the prefix extractor was being supplied with the RocksDB
key instead of a parsed user key. This makes correct interpretation
by calling application fragile or impossible.
Summary: Small change: replace mutex_.Lock/mutex_.Unlock() with scope guard
Test Plan: make all check
Reviewers: igor, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D21609
Summary:
Currently DBImpl::Flush() triggers flushes in all column families.
Instead we need to trigger just the column family specified.
Test Plan: make all check
Reviewers: igor, ljin, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D20841
Summary:
Contains the following changes:
- Implementation of cuckoo_table_factory
- Adding cuckoo table into AdaptiveTableFactory
- Adding cuckoo_table_db_test, similar to lines of plain_table_db_test
- Minor fixes to Reader: When a key is found in the table, return the key found instead of the search key.
- Minor fixes to Builder: Add table properties that are required by Version::UpdateTemporaryStats() during Get operation. Don't define curr_node as a reference variable as the memory locations may get reassigned during tree.push_back operation, leading to invalid memory access.
Test Plan:
cuckoo_table_reader_test --enable_perf
cuckoo_table_builder_test
cuckoo_table_db_test
make check all
make valgrind_check
make asan_check
Reviewers: sdong, igor, yhchiang, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D21219
Changes to support unity build:
Script for building the unity.cc file via Makefile
Unity executable Makefile target for testing builds
Source code changes to fix compilation of unity build
* Script for building the unity.cc file via Makefile
* Unity executable Makefile target for testing builds
* Source code changes to fix compilation of unity build
Summary: In FindObsoleteFiles(), we don't scan db_log_dir. Add it.
Test Plan: make all check
Reviewers: ljin, igor, yhchiang
Reviewed By: yhchiang
Subscribers: leveldb, yhchiang
Differential Revision: https://reviews.facebook.net/D21429
Summary: If there is an outstanding compaction scheduled but at the time a manual compaction is triggered, the manual compaction will preempt. In the end of the manual compaction, we should try to schedule compactions to make sure those preempted ones are not skipped.
Test Plan: make all check
Reviewers: yhchiang, ljin
Reviewed By: ljin
Subscribers: leveldb, dhruba, igor
Differential Revision: https://reviews.facebook.net/D21321
Summary: Fix old issue with DBTest.Randomized with BlockBasedTableWithWholeKeyHashIndex + added printing in DBTest.Randomized.
Test Plan: make all check
Reviewers: zagfox, igor, ljin, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D21003
Summary:
Add a DB Property "rocksdb.estimate-table-readers-mem" to return estimated memory usage by all loaded table readers, other than allocated from block cache.
Refactor the property codes to allow getting property from a version, with DB mutex not acquired.
Test Plan: Add several checks of this new property in existing codes for various cases.
Reviewers: yhchiang, ljin
Reviewed By: ljin
Subscribers: xjin, igor, leveldb
Differential Revision: https://reviews.facebook.net/D20733
Summary:
1. logging when create and delete manifest file
2. fix formating in table/format.cc
Test Plan:
make all check
run db_bench, track the LOG file.
Reviewers: ljin, yhchiang, igor, yufei.zhu, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D21009
Summary: I was bit by this when developing SpatialDB. In case all files are at level 0, CompactRange() will output the compacted files to level 0. This is not ideal, since read amp. is much better at level 1 and higher.
Test Plan: Compacted data in SpatialDB, read manifest using ldb, verified that files are now at level 1 instead of 0.
Reviewers: sdong, ljin, yhchiang, dhruba
Reviewed By: dhruba
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D20901
Summary:
Fixed the following compilation error detected in mac:
db/db_test.cc:2524:3: note: in instantiation of function template
specialization 'rocksdb::test::Tester::IsEq<unsigned long long, int>' requested here
ASSERT_EQ(int_num, 0);
^
Test Plan:
make
Summary:
Previously, builder.cc has a check for merge operator which prevents
RocksDB from crash when reopening a DB w/o properly specifying the merge
operator. However, currently we observed a memory leak on failing in
RocksDB recovery. This diff removes such check and let it crash instead of
causing memory leak for now before we have identified the real cause of
the memory leak.
Test Plan: make all check
Reviewers: sdong
Subscribers: ljin, igor
Differential Revision: https://reviews.facebook.net/D20913
Summary:
DBImplReadOnly::CompactRange wasn't override DBImpl::CompactRange;
this can cause problem when using StackableDB inheritors like
DbWithTtl.
P. S. Thanks C++11 for override :)
Test Plan: make all check
Reviewers: igor, sdong
Reviewed By: sdong
Subscribers: yhchiang, leveldb
Differential Revision: https://reviews.facebook.net/D20829
Summary:
Fixed the crash when merge_operator is not properly set after reopen
and added two test cases for this.
Test Plan:
make merge_test
./merge_test
Reviewers: igor, ljin, sdong
Reviewed By: sdong
Subscribers: benj, mvikjord, leveldb
Differential Revision: https://reviews.facebook.net/D20793
Summary: const string& dbname parameter is not used
Test Plan: make all
Reviewers: sdong, igor
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D20703
Summary:
We now reads table properties in VersionSet::LogAndApply(), which requires options.db_paths to be set. But since ldb_cmd directly creates VersionSet without initialization db_paths, causing a seg fault. This patch fix it by initializing db_paths.
log_and_apply_bench still shows segfault, because table cache is nullptr in VersionSet created.
Test Plan: Run ldb dump_manifest which used to fail.
Reviewers: yhchiang, ljin, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D20751
Summary: We have quite some properties that are integers and we are adding more. Add a function to directly return them as an integer, instead of a string
Test Plan: Add several unit test checks
Reviewers: yhchiang, igor, dhruba, haobo, ljin
Reviewed By: ljin
Subscribers: yoshinorim, leveldb
Differential Revision: https://reviews.facebook.net/D20637
Summary: Add a DB property of estimated number of live keys, by adding number of entries of all mem tables and all files, subtracted by all deletions in all files.
Test Plan: Add the case in unit tests
Reviewers: hobbymanyp, ljin
Reviewed By: ljin
Subscribers: MarkCallaghan, yoshinorim, leveldb, igor, dhruba
Differential Revision: https://reviews.facebook.net/D20631
Summary:
It has one-to-one relationship with CFD. Take a pointer to CFD on
constructor to avoid passing cfd through member functions.
Test Plan: make
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D20565
Summary: So that we can avoid calling NowSecs() in MakeRoomForWrite twice
Test Plan: make all check
Reviewers: yhchiang, igor, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D20529
Summary:
Make StatisticsImpl being able to forward stats to provided statistics
implementation. The main purpose is to allow us to collect internal
stats in the future even when user supplies custom statistics
implementation. It avoids intrumenting 2 sets of stats collection code.
One immediate use case is tuning advisor, which needs to collect some
internal stats, users may not be interested.
Test Plan:
ran db_bench and see stats show up at the end of run
Will run make all check since some tests rely on statistics
Reviewers: yhchiang, sdong, igor
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D20145
Summary:
Make StatisticsImpl being able to forward stats to provided statistics
implementation. The main purpose is to allow us to collect internal
stats in the future even when user supplies custom statistics
implementation. It avoids intrumenting 2 sets of stats collection code.
One immediate use case is tuning advisor, which needs to collect some
internal stats, users may not be interested.
Test Plan:
ran db_bench and see stats show up at the end of run
Will run make all check since some tests rely on statistics
Reviewers: yhchiang, sdong, igor
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D20145
Summary:
User gets undefinied error since the definition is not exposed.
Also re-enable the db test with only upper bound check
Test Plan: db_test, rate_limit_test
Reviewers: igor, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D20403
Summary:
Fixed compaction-related errors where number of input levels are hard-coded.
It's a bug found in compaction branch.
This diff will be pushed into master.
Test Plan:
export ROCKSDB_TESTS=Compact
make db_test -j32
./db_test
also passed the tests in compaction branch
Reviewers: igor, sdong, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D20577
Summary:
Modify a functioin TrimAppend in dbformat.h: IterKey. Write a test for it in dbformat_test
Use IterKey in block::Iter to replace std::string to reduce malloc.
Evaluate it using perf record.
malloc: 4.26% -> 2.91%
free: 3.61% -> 3.08%
Test Plan:
make all check
./valgrind db_test dbformat_test
Reviewers: ljin, haobo, yhchiang, dhruba, igor, sdong
Reviewed By: sdong
Differential Revision: https://reviews.facebook.net/D20433
Summary:
Fixed a bug in Compaction.cc where input_levels_ was not properly resized.
Without this fix, there would be invalid access in input_levels_ when more
than two levels are involved in one compaction run.
This fix will go to master instead of compaction branch.
Test Plan: tested in compaction branch.
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D20481
Summary:
All public headers need to be under `include/rocksdb` directory. Otherwise, clients include our header files like this:
#include <rocksdb/db.h>
#include <utilities/backupable_db.h> // still our public header!
Also, internally, we include:
#include "utilities/backupable/backupable_db.h" // internal header
#include "utilities/backupable_db.h" // public header
which is confusing.
This way, when we install rocksdb as a system library, we can just copy `include/rocksdb` directory to system's header files. We can't really copy `utilities` directory to system's header files.
Test Plan: compiles
Reviewers: dhruba, ljin, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D20409
Summary: Add a parameter path_id to DB::CompactRange(), to indicate where the output file should be placed to.
Test Plan: add a unit test
Reviewers: yhchiang, ljin
Reviewed By: ljin
Subscribers: xjin, igor, dhruba, MarkCallaghan, leveldb
Differential Revision: https://reviews.facebook.net/D20085
Summary:
Add block_restart_interval in db_bench, default value 16
Test Plan:
make
Reviewers: sdong
Reviewed By: sdong
Differential Revision: https://reviews.facebook.net/D20331
Summary:
Adding option to save PlainTable index and bloom filter in SST file.
If there is no bloom block and/or index block, PlainTableReader builds
new ones. Otherwise PlainTableReader just use these blocks.
Test Plan: make all check
Reviewers: sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19527
Summary:
Since we have a lot of options for PlainTable, add a struct PlainTableOptions
to manage them
Test Plan: make all check
Reviewers: sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D20175
Summary:
Having if-then branch for different compaction strategies is considered
hacky and make CompactionPicker less pluggable. This diff removes two
of such if-then branches in version_set.cc by adding MaxInputLevel() to
CompactionPicker.
// Given the current number of levels, returns the lowest allowed level
// for compaction input.
virtual int MaxInputLevel(int current_num_levels) const;
Test Plan:
make db_test
export ROCKSDB_TESTS=Compaction
./db_test
Reviewers: igor, sdong, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19971
Summary: Adding guards to files_ attribute of FilePicker class. This attribute is used only in DEBUG mode. This fixes build of static_lib in mac.
Test Plan:
make static_lib in mac
make check all in devserver
Reviewers: ljin, igor, sdong
Reviewed By: sdong
Differential Revision: https://reviews.facebook.net/D20163
Summary:
Allow class Compaction to handle input files from multiple levels.
This diff is a subset of https://reviews.facebook.net/D19263 where
only db/compaction.cc and db/compaction.h are changed.
Test Plan:
make db_test
export ROCKSDB_TESTS=Compaction
./db_test
Reviewers: igor, sdong, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19923
Summary:
add a flag called use_hash_search in db_bench
Test Plan:
make all check
./db_bench --use_hash_search=1
Reviewers: ljin, haobo, yhchiang, sdong
Reviewed By: sdong
Subscribers: igor, dhruba
Differential Revision: https://reviews.facebook.net/D20067
Summary:
One line change to fix a bug in the LOG when flush memtable
Test Plan:
NONE
Reviewers: sdong
Reviewed By: sdong
Differential Revision: https://reviews.facebook.net/D20049
Summary: Refactoring Version::Get() method to move file picker logic to a separate class.
Test Plan: make check all
Reviewers: igor, sdong, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19713
Summary: Implement Prev() with merge operator for DBIterator. Request from mongoDB. Task 4673663.
Test Plan: make all check
Reviewers: sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19743
Summary:
This patch adds a target size parameter in options.db_paths and universal compaction will base it to determine which DB path to place a new file.
Level-style stays the same.
Test Plan: Add new unit tests
Reviewers: ljin, yhchiang
Reviewed By: yhchiang
Subscribers: MarkCallaghan, dhruba, igor, leveldb
Differential Revision: https://reviews.facebook.net/D19869
Summary: Browsing through the code, looks like StatsLogger is not used at all!
Test Plan: compiles
Reviewers: ljin, sdong, yhchiang, dhruba
Reviewed By: dhruba
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D19827
Summary: We should use PRIu64 instead of "%lu" for portability
Test Plan: compiles now
Reviewers: ljin, dhruba
Reviewed By: dhruba
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19809
Summary:
Use FileLevel in LevelFileNumIterator, thus use new version of findFile.
Old version of findFile function is deleted.
Write a function in version_set.cc to generate FileLevel from files_.
Add GenerateFileLevelTest in version_set_test.cc
Test Plan:
make all check
Reviewers: ljin, haobo, yhchiang, sdong
Reviewed By: sdong
Subscribers: igor, dhruba
Differential Revision: https://reviews.facebook.net/D19659
Summary:
If `NeedToSeekImmutable()` returns false, `SeekInternal()` won't reset the
contents of `immutable_min_heap_`. However, since it calls `UpdateCurrent()`
unconditionally, if `current_` is one of immutable iterators (previously popped
from `immutable_min_heap_`), `UpdateCurrent()` will overwrite it. As a result,
if old `current_` in fact pointed to the smallest entry, forward iterator will
skip some records.
Fix implemented in this diff pushes `current_` back to `immutable_min_heap_`
before calling `UpdateCurrent()`.
Test Plan:
New unit test (courtesy of @lovro):
$ ROCKSDB_TESTS=TailingIteratorSeekToSame ./db_test
Reviewers: igor, dhruba, haobo, ljin
Reviewed By: ljin
Subscribers: lovro, leveldb
Differential Revision: https://reviews.facebook.net/D19653
Summary:
Forward iterator only checked `status_` and `mutable_iter_->status()`, which is
not sufficient. For example, when reading exclusively from cache
(kBlockCacheTier), `mutable_iter_->status()` may return kOk (e.g. there's
nothing in the memtable), but one of immutable iterators could be in
kIncomplete. In this case, `ForwardIterator::status()` ought to return that
status instead of kOk.
This diff changes `status()` to also check `imm_iters_`, `l0_iters_`, and
`level_iters_`.
Test Plan:
ROCKSDB_TESTS=TailingIteratorIncomplete ./db_test
Reviewers: ljin, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D19581
Summary: Add a function to return the perf level. It is to allow a wrapper of DB to increase the perf level and restore the original perf level after finishing the function call.
Test Plan: Add a verification in db_test
Reviewers: yhchiang, igor, ljin
Reviewed By: ljin
Subscribers: xjin, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D19551
Summary:
Seems like NewTotalOrderPlainTableFactory is useless and is semantically incorrect.
Total order mode indicator is prefix_extractor == nullptr,
but NewTotalOrderPlainTableFactory doesn't set it to be nullptr. That's why some tests
in plain_table_db_tests is incorrect.
Test Plan: make all check
Reviewers: sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19587
Summary:
This is a rough sketch of our new document API. Would like to get some thoughts and comments about the high-level architecture and API.
I didn't optimize for performance at all. Leaving some low-hanging fruit so that we can be happy when we fix them! :)
Currently, bunch of features are not supported at all. Indexes can be only specified when creating database. There is no query planner whatsoever. This will all be added in due time.
Test Plan: Added a simple unit test
Reviewers: haobo, yhchiang, dhruba, sdong, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D18747
Summary:
fix a bug in improve_file_key_search, change the parameter for FileDescriptor
Test Plan:
make all check
Reviewers: sdong
Reviewed By: sdong
Differential Revision: https://reviews.facebook.net/D19611
Summary:
The test is not stable because it relies on disk and only runs for a
short period of time. So misisng a compaction/flush would greatly affect
the rate. I am disabling it for now. What do you guys think?
Test Plan: make
Reviewers: yhchiang, igor, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19599
Summary:
Define CompressedFileMetaData that just contains fd, smallest_slice, largest_slice. Create compressed_levels_ in Version, the space is allocated using arena
Thus increase the file meta data locality, speed up "Get" and "FindFile"
benchmark with in-memory tmpfs, could have 4% improvement under "random read" and 2% improvement under "read while writing"
benchmark command:
./db_bench --db=/mnt/db/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --block_size=4096 --cache_size=17179869184 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --disable_wal=0 --sync=0 --disable_data_sync=1 --verify_checksum=1 --delete_obsolete_files_period_micros=314572800 --max_grandparent_overlap_factor=10 --max_background_compactions=4 --max_background_flushes=0 --level0_slowdown_writes_trigger=16 --level0_stop_writes_trigger=24 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --mmap_read=1 --mmap_write=0 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --perf_level=0 --benchmarks=readwhilewriting,readwhilewriting,readwhilewriting --use_existing_db=1 --num=52428800 --threads=1 —writes_per_second=81920
Read Random:
From 1.8363 ms/op, improve to 1.7587 ms/op.
Read while writing:
From 2.985 ms/op, improve to 2.924 ms/op.
Test Plan:
make all check
Reviewers: ljin, haobo, yhchiang, sdong
Reviewed By: sdong
Subscribers: dhruba, igor
Differential Revision: https://reviews.facebook.net/D19419
Summary:
This patch include two fixes:
1. newly created Version will now takes the aggregated stats for average-value-size from the latest Version.
2. compensated size of a file is now computed only for newly created / loaded file, this addresses the issue where files are already sorted by their compensated file size but might sometimes observe some out-of-order due to later update on compensated file size.
Test Plan:
export ROCKSDB_TESTS=CompactionDele
./db_test
Reviewers: ljin, igor, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19557
Summary: stall count is wrong
Test Plan: make release
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19539
Summary:
Randomize keys so that compaction actually happens.
Change the config so that compaction happens more aggressively.
The test takes longer time, but the results are more stable shown by
iostat
Test Plan: ran it
Reviewers: igor, yhchiang
Reviewed By: yhchiang
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19533
Summary: as title
Test Plan: make release
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19515
Summary:
Add option and plugin rate limiter for PosixWritableFile. The rate
limiter only applies to flush and compaction. WAL and MANIFEST are
excluded from this enforcement.
Test Plan: db_test
Reviewers: igor, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19425
Summary:
Use search hint to reduce FindFile range thus avoid comparison
For a small DB with 50M keys, perf_context counter shows it reduces
comparison from 2B to 1.3B for a 15-minute run. No perf change was
observed for 1 seek thread, but quite good improvement was seen for 32
seek threads, when CPU was busy.
will post detail results when ready
Test Plan: db_bench and db_test
Reviewers: haobo, sdong, dhruba, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D18879
Summary:
Changes:
- Adding numa_aware flag to db_bench.cc
- Using numa.h library to bind memory and cpu of threads to a fixed NUMA node
Result: There seems to be no significant change in the micros/op time with numa_aware enabled. I also tried this with other implementations, including a combination of pthread_setaffinity_np, sched_setaffinity and set_mempolicy methods. It'd be great if someone could point out where I'm going wrong and if we can achieve a better micors/op.
Test Plan:
Ran db_bench tests using following command:
./db_bench --db=/mnt/tmp --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --block_size=4096 --cache_size=17179869184 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=134217728 --max_bytes_for_level_base=1073741824 --disable_wal=0 --wal_dir=/mnt/tmp --sync=0 --disable_data_sync=1 --verify_checksum=1 --delete_obsolete_files_period_micros=314572800 --max_grandparent_overlap_factor=10 --max_background_compactions=4 --max_background_flushes=0 --level0_slowdown_writes_trigger=16 --level0_stop_writes_trigger=24 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --mmap_read=1 --mmap_write=0 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --perf_level=0 --duration=300 --benchmarks=readwhilewriting --use_existing_db=1 --num=157286400 --threads=24 --writes_per_second=10240 --numa_aware=[False/True]
The tests were run in private devserver with 24 cores and the db was prepopulated using filluniquerandom test. The tests resulted in 0.145 us/op with numa_aware=False and 0.161 us/op with numa_aware=True.
Reviewers: sdong, yhchiang, ljin, igor
Reviewed By: ljin, igor
Subscribers: igor, leveldb
Differential Revision: https://reviews.facebook.net/D19353
Summary:
SimpleWriteTimeoutTest has two parts: 1) insert two large key/values
to make memtable full and expect both of them are successful; 2) insert
another key / value and expect it to be timed-out. Previously we also
set a timeout in the first step, but this might sometimes cause
false alarm.
This diff makes the first two writes run without timeout setting.
Test Plan:
export ROCKSDB_TESTS=Time
make db_test
Reviewers: sdong, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19461
Summary: Removed a variable that is only used in assertion check.
Test Plan: make release
Reviewers: ljin, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19455
Summary:
This diff allows the I/O stats about Flush and Compaction to be reported
in a more accurate way. Instead of measuring the size of a file, it
measure I/O cost in per read / write basis.
Test Plan: make all check
Reviewers: sdong, igor, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19383
Summary:
This diff adds timeout_hint_us to WriteOptions. If it's non-zero, then
1) writes associated with this options MAY be aborted when it has been
waiting for longer than the specified time. If an abortion happens,
associated writes will return Status::TimeOut.
2) the stall time of the associated write caused by flush or compaction
will be limited by timeout_hint_us.
The default value of timeout_hint_us is 0 (i.e., OFF.)
The statistics of timeout writes will be recorded in WRITE_TIMEDOUT.
Test Plan:
export ROCKSDB_TESTS=WriteTimeoutAndDelayTest
make db_test
./db_test
Reviewers: igor, ljin, haobo, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D18837
Summary:
In this patch, we allow RocksDB to support multiple DB paths internally.
No user interface is supported yet so this patch is silent to users.
Test Plan: make all check
Reviewers: igor, haobo, ljin, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D18921
Summary:
Before this diff, we're deciding enable_compression in CompactionPicker and then we're deciding final compression type in DBImpl. This is kind of confusing.
After the diff, the final compression type will be decided in CompactionPicker.
The reason for this is that I want CompactFiles() to specify output compression type, so that people can mix and match compression styles in their compaction algorithms. This diff makes it much easier to do that.
Test Plan: make check
Reviewers: dhruba, haobo, sdong, yhchiang, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19137
Commit 6634844dba by sdong
Two small fixes in db_test
Summary:
Two fixes:
(1) WalDir to pick a directory under TmpDir to allow two tests running in parallel without impacting each other
(2) kBlockBasedTableWithWholeKeyHashIndex is disabled by mistake (I assume). Enable it.
Test Plan: ./db_test
Reviewers: yhchiang, ljin
Reviewed By: ljin
Subscribers: nkg-, igor, dhruba, haobo, leveldb
Differential Revision: https://reviews.facebook.net/D19389
Summary:
In this patch, we enhance HashLinkList memtable to reduce performance outliers when a bucket contains too many entries. We switch to skip list for this case to enable binary search.
Add threshold_use_skiplist parameter to determine when a bucket needs to switch to skip list.
The new data structure is documented in comments in the codes.
Test Plan:
make all check
set threshold_use_skiplist in several tests
Reviewers: yhchiang, haobo, ljin
Reviewed By: yhchiang, ljin
Subscribers: nkg-, xjin, dhruba, yhchiang, leveldb
Differential Revision: https://reviews.facebook.net/D19299
Summary:
Two fixes:
(1) WalDir to pick a directory under TmpDir to allow two tests running in parallel without impacting each other
(2) kBlockBasedTableWithWholeKeyHashIndex is disabled by mistake (I assume). Enable it.
Test Plan: ./db_test
Reviewers: yhchiang, ljin
Reviewed By: ljin
Subscribers: nkg-, igor, dhruba, haobo, leveldb
Differential Revision: https://reviews.facebook.net/D19389
Summary: files_by_size_ is sorted by time in case of universal compaction. However, Version::files_ is also sorted by time. So no need for files_by_size_
Test Plan:
1) make check with the change
2) make check with `assert(last_index == c->input_version_->files_[level].size() - 1);` in compaction picker
Reviewers: dhruba, haobo, yhchiang, sdong, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19125
Summary:
Bloomfilter and hashskiplist's buckets_ allocated by memtable's arena
DynamicBloom: pass arena via constructor, allocate space in SetTotalBits
HashSkipListRep: allocate space of buckets_ using arena.
do not delete it in deconstructor because arena would take care of it.
Several test files are changed.
Test Plan:
make all check
Reviewers: ljin, haobo, yhchiang, sdong
Reviewed By: sdong
Subscribers: igor, dhruba
Differential Revision: https://reviews.facebook.net/D19335
Summary: Code cleaning up, since we are already using __STDC_FORMAT_MACROS in printing uint64_t, change other places. Only logging is changed.
Test Plan: make all check
Reviewers: ljin
Reviewed By: ljin
Subscribers: dhruba, yhchiang, haobo, leveldb
Differential Revision: https://reviews.facebook.net/D19113