Summary:
This is is based on https://reviews.facebook.net/D15027. It's not finished but I would like to give a prototype to avoid arena over-allocation while making better use of the already allocated memory blocks.
Instead of check approximate memtable size, we will take a deeper look at the arena, which incorporate essential idea that @sdong suggests: flush when arena has allocated its last and the last is "almost full"
Test Plan: N/A
Reviewers: haobo, sdong
Reviewed By: sdong
CC: leveldb, sdong
Differential Revision: https://reviews.facebook.net/D15051
Summary: A bad Auto-Merge caused log buffer is flushed twice. Remove the unintended one.
Test Plan: Should already be tested (the code looks the same as when I ran unit tests).
Reviewers: haobo, igor
Reviewed By: haobo
CC: ljin, yhchiang, leveldb
Differential Revision: https://reviews.facebook.net/D16821
Summary:
(1) Fix SanitizeOptions() to also check HashLinkList. The current
dynamic case just happens to work because the 2 classes have the same
layout.
(2) Do not delete SliceTransform object in HashSkipListFactory and
HashLinkListFactory destructor. Reason: SanitizeOptions() enforces
prefix_extractor and SliceTransform to be the same object when
Hash**Factory is used. This makes the behavior strange: when
Hash**Factory is used, prefix_extractor will be released by RocksDB. If
other memtable factory is used, prefix_extractor should be released by
user.
Test Plan: db_bench && make asan_check
Reviewers: haobo, igor, sdong
Reviewed By: igor
CC: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D16587
Summary: I wrote a test that triggers assertion in MergingIterator. I have not touched that code ever, so I'm looking for somebody with good understanding of the MergingIterator code to fix this. The solution is probably a one-liner. Let me know if you're willing to take a look.
Test Plan: This test fails with an assertion `use_heap_ == false`
Reviewers: dhruba, haobo, sdong, kailiu
Reviewed By: sdong
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16521
Summary:
Add an optional input parameter ReadOptions to DB::GetUpdateSince(),
which allows the verification of checksums to be disabled by setting
ReadOptions::verify_checksums to false.
Test Plan: Tests are done off-line and will not be included in the regular unit test.
Reviewers: igor
Reviewed By: igor
CC: leveldb, xjin, dhruba
Differential Revision: https://reviews.facebook.net/D16305
Summary:
Two new column family tests:
* DifferentMergeOperators -- three column families, one without merge operator, one with add operator and one with append operator. verify that operations work as expected.
* DifferentCompactionStyles -- three column families, two with level compactions and one with universal compaction. trigger the compactions and verify they work as expected.
Test Plan: nope
Reviewers: dhruba, haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16377
Summary:
* Add ColumnFamilyHandle::GetID() function. Client needs to know column family's ID to be able to construct WriteBatch
* Handle WriteBatch::Handler failure gracefully. Since WriteBatch is not a very smart function (it takes raw CF id), client can add data to WriteBatch for column family that doesn't exist. In that case, we need to gracefully return failure status from DB::Write(). To do that, I added a return Status to WriteBatch functions PutCF, DeleteCF and MergeCF.
Test Plan: Added test to column_family_test
Reviewers: dhruba, haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16323
Summary: Adapt table properties to column family world
Test Plan: make check
Reviewers: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16161
Summary:
This is a huge diff and it was hectic, but the idea is actually quite simple. Every operation (Put, Get, etc.) done on default column family in DBTest is now forwarded to non-default ("pikachu"). The good news is that we had zero test failures! Column families look stable so far.
One interesting test that I adapted for column families is MultiThreadedTest. I replaced every Put() with a WriteBatch writing to all column families concurrently. Every Put in the write batch contains unique_id. Instead of Get() I do a multiget across all column families with the same key. If atomicity holds, I expect to see the same unique_id in all column families.
Test Plan: This is a test!
Reviewers: dhruba, haobo, kailiu, sdong
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16149
Summary: Provide a public API for users to access the table properties for each SSTable.
Test Plan: Added a unit tests to test the function correctness under differnet conditions.
Reviewers: haobo, dhruba, sdong
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16083
Summary:
The change to the public behavior:
* When opening a DB or creating new column family client gets a ColumnFamilyHandle.
* As long as column family handle is alive, client can do whatever he wants with it, even drop it
* Dropped column family can still be read from (using the column family handle)
* Added a new call CloseColumnFamily(). Client has to close all column families that he has opened before deleting the DB
* As soon as column family is closed, any calls to DB using that column family handle will fail (also any outstanding calls)
Internally:
* Ref-counting ColumnFamilyData
* New thread-safety for ColumnFamilySet
* Dropped column families are now completely dropped and their memory cleaned-up
Test Plan: added some tests to column_family_test
Reviewers: dhruba, haobo, kailiu, sdong
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16101
Summary: Added a bit more information to compaction context, requested by internal team at FB.
Test Plan: Modified CompactionFilter test to make sure is_manual_compaction is properly set.
Reviewers: haobo
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16095
Summary: This covers existing table files before DB open happens and avoids contention on table cache
Test Plan: db_test
Reviewers: haobo, sdong, igor, dhruba
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16089
Summary:
Use super_version insider NewIterator to avoid Ref() each component
separately under mutex
The new added bench shows NewIterator QPS increases from 515K to 719K
No meaningful improvement for multiget I guess due to its relatively small
cost comparing to 90 keys fetch in the test.
Test Plan: unit test and db_bench
Reviewers: igor, sdong
Reviewed By: igor
CC: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D15609
Summary: In PlainTable, use one single byte to represent 8 bytes of internal bytes, if seqID = 0 and it is value type (which should be common for bottom most files). It is to save 7 bytes for uncompressed cases.
Test Plan: make all check
Reviewers: haobo, dhruba, kailiu
Reviewed By: haobo
CC: igor, leveldb
Differential Revision: https://reviews.facebook.net/D15489
Summary: This removes the default implementation of LogAndApply that applied the changed to the default column family by default. It is mostly simple reformatting.
Test Plan: make check
Reviewers: dhruba, kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15465
Summary: By removing some includes form options.h and reply on forward declaration, we can more easily reason the dependencies.
Test Plan: make all check
Reviewers: kailiu, haobo, igor, dhruba
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15411
Summary:
Mixing index/filter blocks with data blocks resulted in some known
issues. To make sure in next release our users won't be affected,
we added a new option in BlockBasedTableFactory::TableOption to
conceal this functionality for now.
This patch also introduced a BlockBasedTableReader::OpenOptions,
which avoids the "infinite" growth of parameters in
BlockBasedTableReader::Open().
Test Plan: make check
Reviewers: haobo, sdong, igor, dhruba
Reviewed By: igor
CC: leveldb, tnovak
Differential Revision: https://reviews.facebook.net/D15327
Summary: as title
Test Plan:
make all check
What else tests shall I cover?
Reviewers: igor, haobo
CC:
Differential Revision: https://reviews.facebook.net/D15339
Summary:
This diff implements a special type of iterator that doesn't create a snapshot
(can be used to read newly inserted data) and is optimized for doing sequential
reads.
TailingIterator uses current superversion number to determine whether to
invalidate its internal iterators. If the version hasn't changed, it can often
avoid doing expensive seeks over immutable structures (sst files and immutable
memtables).
Test Plan:
* new unit tests
* running LD with this patch
Reviewers: igor, dhruba, haobo, sdong, kailiu
Reviewed By: sdong
CC: leveldb, lovro, march
Differential Revision: https://reviews.facebook.net/D15285
Summary:
I created a separate class ColumnFamilySet to keep track of column families. Before we did this in VersionSet and I believe this approach is cleaner.
Let me know if you have any comments. I will commit tomorrow.
Test Plan: make check
Reviewers: dhruba, haobo, kailiu, sdong
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15357
Summary:
This diff does two things:
* Rethinks how we call Recover() with read_only option. Before, we call it with pointer to memtable where we'd like to apply those changes to. This memtable is set in db_impl_readonly.cc and it's actually DBImpl::mem_. Why don't we just apply updates to mem_ right away? It seems more intuitive.
* Changes when we apply updates to manifest. Before, the process is to recover all the logs, flush it to sst files and then do one giant commit that atomically adds all recovered sst files and sets the next log number. This works good enough, but causes some small troubles for my column family approach, since I can't have one VersionEdit apply to more than single column family[1]. The change here is to commit the files recovered from logs right away. Here is the state of the world before the change:
1. Recover log 5, add new sst files to edit
2. Recover log 7, add new sst files to edit
3. Recover log 8, add new sst files to edit
4. Commit all added sst files to manifest and mark log files 5, 7 and 8 as recoverd (via SetLogNumber(9) function)
After the change, we'll do:
1. Recover log 5, commit the new sst files and set log 5 as recovered
2. Recover log 7, commit the new sst files and set log 7 as recovered
3. Recover log 8, commit the new sst files and set log 8 as recovered
The added (small) benefit is that if we fail after (2), the new recovery will only have to recover log 8. In previous case, we'll have to restart the recovery from the beginning. The bigger benefit will be to enable easier integration of multiple column families in Recovery code path.
[1] I'm happy to dicuss this decison, but I believe this is the cleanest way to go. It also makes backward compatibility much easier. We don't have a requirement of adding multiple column families atomically.
Test Plan: make check
Reviewers: dhruba, haobo, kailiu, sdong
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15237
Summary: I'm separating code-cleanup part of https://reviews.facebook.net/D14517. This will make D14517 easier to understand and this diff easier to review.
Test Plan: make check
Reviewers: haobo, kailiu, sdong, dhruba, tnovak
Reviewed By: tnovak
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15099