Summary: Client doesn't need to know anything about ColumnFamily ID. By making WriteBatch take ColumnFamilyHandle as a parameter, we can eliminate method GetID() from ColumnFamilyHandle
Test Plan: column_family_test
Reviewers: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16887
Summary:
I'm cleaning up some code preparing for the big diff review tomorrow. This is the first part of the cleanup.
Changes are mostly cosmetic. The goal is to decrease amount of code difference between columnfamilies and master branch.
This diff also fixes race condition when dropping column family.
Test Plan: Ran db_stress with variety of parameters
Reviewers: dhruba, haobo
Differential Revision: https://reviews.facebook.net/D16833
Summary:
This diff fixes two bugs:
* Increase sequence number even if WriteBatch fails. This is important because WriteBatches in WAL logs have implictly increasing sequence number, even if one update in a write batch fails. This caused some writes to get lost in my CF stress testing
* Tolerate 'invalid column family' errors on recovery. When a column family is dropped, processing WAL logs can have some WriteBatches that still refer to the dropped column family. In recovery environment, we want to ignore those errors. In client's Write() code path, however, we want to return the failure to the client if he's trying to add data to invalid column family.
Test Plan: db_stress's verification works now
Reviewers: dhruba, haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16533
Summary:
* Add ColumnFamilyHandle::GetID() function. Client needs to know column family's ID to be able to construct WriteBatch
* Handle WriteBatch::Handler failure gracefully. Since WriteBatch is not a very smart function (it takes raw CF id), client can add data to WriteBatch for column family that doesn't exist. In that case, we need to gracefully return failure status from DB::Write(). To do that, I added a return Status to WriteBatch functions PutCF, DeleteCF and MergeCF.
Test Plan: Added test to column_family_test
Reviewers: dhruba, haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16323
Summary:
The change to the public behavior:
* When opening a DB or creating new column family client gets a ColumnFamilyHandle.
* As long as column family handle is alive, client can do whatever he wants with it, even drop it
* Dropped column family can still be read from (using the column family handle)
* Added a new call CloseColumnFamily(). Client has to close all column families that he has opened before deleting the DB
* As soon as column family is closed, any calls to DB using that column family handle will fail (also any outstanding calls)
Internally:
* Ref-counting ColumnFamilyData
* New thread-safety for ColumnFamilySet
* Dropped column families are now completely dropped and their memory cleaned-up
Test Plan: added some tests to column_family_test
Reviewers: dhruba, haobo, kailiu, sdong
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16101
Summary:
WriteBatch can have multiple column families in one batch. Every column family has different options. So we have to add a way for write batch to get options for an arbitrary column family.
This required a bit more acrobatics since lots of interfaces had to be changed.
Test Plan: make check
Reviewers: dhruba, haobo, sdong, kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15957
Summary: This one is big. It adds ability to write to and read from different column families (see the unit test). It also supports recovery of different column families from log, which was the hardest part to reason about. We need to make sure to never delete the log file which has unflushed data from any column family. To support that, I added another concept, which is versions_->MinLogNumber()
Test Plan: Added a unit test in column_family_test
Reviewers: dhruba, haobo, sdong, kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15537
Summary: I'm separating code-cleanup part of https://reviews.facebook.net/D14517. This will make D14517 easier to understand and this diff easier to review.
Test Plan: make check
Reviewers: haobo, kailiu, sdong, dhruba, tnovak
Reviewed By: tnovak
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15099
Summary:
This diff fixes 2 hacks:
* The callback function can modify the existing value inplace, if the merged value fits within the existing buffer size. But currently the existing buffer size is not being modified. Now the callback recieves a int* allowing the size to be modified. Since size is encoded as a varint in the internal key for memtable. It might happen that the entire value might have be copied to the new location if the new size varint is smaller than the existing size varint.
* The callback function has 3 functionalities
1. Modify existing buffer inplace, and update size correspondingly. Now to indicate that, Returns 1.
2. Generate a new buffer indicating merged value. Returns 2.
3. Fails to do either of above, based on whatever application logic. Returns 0.
Test Plan: Just make all for now. I'm adding another unit test to test each scenario.
Reviewers: dhruba, haobo
Reviewed By: haobo
CC: leveldb, sdong, kailiu, xinyaohu, sumeet, danguo
Differential Revision: https://reviews.facebook.net/D15195
Summary:
In one of CPU profiles, we see some CPU costs of string::reserve() inside Batch.Put(). This patch should be able to reduce some of the costs by allocating sufficient buffer before hand.
Since it is a trivial percentage of CPU costs, I didn't find a way to show the improvement in one of the benchmarks. I'll deploy it to same application and do the same CPU profiling to make sure those CPU costs are reduced.
Test Plan: make all check
Reviewers: haobo, kailiu, igor
Reviewed By: haobo
CC: leveldb, nkg-
Differential Revision: https://reviews.facebook.net/D15135
Summary:
In one of CPU profiles, we see some CPU costs of string::reserve() inside Batch.Put(). This patch should be able to reduce some of the costs by allocating sufficient buffer before hand.
Since it is a trivial percentage of CPU costs, I didn't find a way to show the improvement in one of the benchmarks. I'll deploy it to same application and do the same CPU profiling to make sure those CPU costs are reduced.
Test Plan: make all check
Reviewers: haobo, kailiu, igor
Reviewed By: haobo
CC: leveldb, nkg-
Differential Revision: https://reviews.facebook.net/D15135
Summary: The application can set a callback function, which is applied on the previous value. And calculates the new value. This new value can be set, either inplace, if the previous value existed in memtable, and new value is smaller than previous value. Otherwise the new value is added normally.
Test Plan: fbmake. Added unit tests. All unit tests pass.
Reviewers: dhruba, haobo
Reviewed By: haobo
CC: sdong, kailiu, xinyaohu, sumeet, leveldb
Differential Revision: https://reviews.facebook.net/D14745
Summary:
Added an option (max_successive_merges) that can be used to specify the
maximum number of successive merge operations on a key in the memtable.
This can be used to improve performance of the "get" operation. If many
successive merge operations are performed on a key, the performance of "get"
operations on the key deteriorates, as the value has to be computed for each
"get" operation by applying all the successive merge operations.
FB Task ID: #3428853
Test Plan:
make all check
db_bench --benchmarks=readrandommergerandom
counter_stress_test
Reviewers: haobo, vamsi, dhruba, sdong
Reviewed By: haobo
CC: zshao
Differential Revision: https://reviews.facebook.net/D14991
Summary:
I have added three new value types:
* kTypeColumnFamilyDeletion
* kTypeColumnFamilyValue
* kTypeColumnFamilyMerge
which include column family Varint32 before the data (value, deletion and merge). These values are used only in WAL (not in memtables yet).
This endeavour required changing some WriteBatch internals.
Test Plan: Added a unittest
Reviewers: dhruba, haobo, sdong, kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15045
Summary: The previous patch is wrong. rep_.resize(kHeader) just resets the header portion to zero, and should not cause a re-allocation if g++ does it right. I will go ahead and revert it.
Test Plan: make check
Reviewers: dhruba, sdong
CC: leveldb
Differential Revision: https://reviews.facebook.net/D14793
Summary: tmp_batch_ will get re-allocated for every merged write batch because of the existing resize in WriteBatch::Clear. Note that in DBImpl::BuildBatchGroup, we have a hard coded upper limit of batch size 1<<20 = 1MB already.
Test Plan: make check
Reviewers: dhruba, sdong
CC: leveldb
Differential Revision: https://reviews.facebook.net/D14787
Summary:
<This diff is for Column Family branch>
Sharing some of the work I've done so far. This diff compiles and passes the tests.
The biggest change is in options.h - I broke down Options into two parts - DBOptions and ColumnFamilyOptions. DBOptions is DB-specific (env, create_if_missing, block_cache, etc.) and ColumnFamilyOptions is column family-specific (all compaction options, compresion options, etc.). Note that this does not break backwards compatibility at all.
Further, I created DBWithColumnFamily which inherits DB interface and adds new functions with column family support. Clients can transparently switch to DBWithColumnFamily and it will not break their backwards compatibility.
There are few methods worth checking out: ListColumnFamilies(), MultiNewIterator(), MultiGet() and GetSnapshot(). [GetSnapshot() returns the snapshot across all column families for now - I think that's what we agreed on]
Finally, I made small changes to WriteBatch so we are able to atomically insert data across column families.
Please provide feedback.
Test Plan: make check works, the code is backward compatible
Reviewers: dhruba, haobo, sdong, kailiu, emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D14445
Summary: liveness of the statistics object is already ensured by the shared pointer in DB options. There's no reason to pass again shared pointer among internal functions. Raw pointer is sufficient and efficient.
Test Plan: make check
Reviewers: dhruba, MarkCallaghan, igor
Reviewed By: dhruba
CC: leveldb, reconnect.grayhat
Differential Revision: https://reviews.facebook.net/D14289
Summary: In our project, when writing to the database, we want to form the value as the concatenation of a small header and a larger payload. It's a shame to have to copy the payload just so we can give RocksDB API a linear view of the value. Since RocksDB makes a copy internally, it's easy to support gather writes.
Test Plan: write_batch_test, new test case
Reviewers: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13947
Summary:
Currently for each put, a fresh memory is allocated, and a new entry is added to the memtable with a new sequence number irrespective of whether the key already exists in the memtable. This diff is an attempt to update the value inplace for existing keys. It currently handles a very simple case:
1. Key already exists in the current memtable. Does not inplace update values in immutable memtable or snapshot
2. Latest value type is a 'put' ie kTypeValue
3. New value size is less than existing value, to avoid reallocating memory
TODO: For a put of an existing key, deallocate memory take by values, for other value types till a kTypeValue is found, ie. remove kTypeMerge.
TODO: Update the transaction log, to allow consistent reload of the memtable.
Test Plan: Added a unit test verifying the inplace update. But some other unit tests broken due to invalid sequence number checks. WIll fix them next.
Reviewers: xinyaohu, sumeet, haobo, dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12423
Automatic commit by arc
Summary:
Change namespace from leveldb to rocksdb. This allows a single
application to link in open-source leveldb code as well as
rocksdb code into the same process.
Test Plan: compile rocksdb
Reviewers: emayanke
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13287
Summary: Replace include/leveldb with include/rocksdb.
Test Plan:
make clean; make check
make clean; make release
Differential Revision: https://reviews.facebook.net/D12489
Summary:
Sometimes you don't need to iterate through the whole WriteBatch. This diff makes the Handler member functions return a bool that indicates whether to abort or not. If they return true, the iteration stops.
One thing I just thought of is that this will break backwards-compability. Maybe it would be better to add a virtual member function WriteBatch::Handler::ShouldAbort() that returns false by default. Comments requested.
I still have to add a new unit test for the abort code, but let's finalize the API first.
Test Plan: make -j32 check
Reviewers: dhruba, haobo, vamsi, emayanke
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12339
Summary:
This patch adds the ability for the user to add sequences of arbitrary data (blobs) to write batches. These blobs are saved to the log along with everything else in the write batch. You can add multiple blobs per WriteBatch and the ordering of blobs, puts, merges, and deletes are preserved.
Blobs are not saves to SST files. RocksDB ignores blobs in every way except for writing them to the log.
Before committing this patch, I need to add some test code. But I'm submitting it now so people can comment on the API.
Test Plan: make -j32 check
Reviewers: dhruba, haobo, vamsi
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12195
Summary: Removed KeyMayExistImpl because KeyMayExist demanded Get like semantics now. Removed no_io from memtable and imm because we need the proper value now and shouldn't just stop when we see Merge in memtable. Added checks to block_cache. Updated documentation and unit-test
Test Plan: make all check;db_stress for 1 hour
Reviewers: dhruba, haobo
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D11853
Summary:
Introduced KeyMayExist checking during writebatch-delete and removed from Outer Delete API because it uses writebatch-delete.
Added code to skip getting Table from disk if not already present in table_cache.
Some renaming of variables.
Introduced KeyMayExistImpl which allows checking since specified sequence number in GetImpl useful to check partially written writebatch.
Changed KeyMayExist to not be pure virtual and provided a default implementation.
Expanded unit-tests in db_test to check appropriately.
Ran db_stress for 1 hour with ./db_stress --max_key=100000 --ops_per_thread=10000000 --delpercent=50 --filter_deletes=1 --statistics=1.
Test Plan: db_stress;make check
Reviewers: dhruba, haobo
Reviewed By: dhruba
CC: leveldb, xjin
Differential Revision: https://reviews.facebook.net/D11745
Summary: As title. Exposed a Count function that returns the number of updates in a batch. Could be handy for replication sequence number check.
Test Plan: make check;
Reviewers: emayanke, sheki, dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D11523
Summary:
This diff introduces a new Merge operation into rocksdb.
The purpose of this review is mostly getting feedback from the team (everyone please) on the design.
Please focus on the four files under include/leveldb/, as they spell the client visible interface change.
include/leveldb/db.h
include/leveldb/merge_operator.h
include/leveldb/options.h
include/leveldb/write_batch.h
Please go over local/my_test.cc carefully, as it is a concerete use case.
Please also review the impelmentation files to see if the straw man implementation makes sense.
Note that, the diff does pass all make check and truly supports forward iterator over db and a version
of Get that's based on iterator.
Future work:
- Integration with compaction
- A raw Get implementation
I am working on a wiki that explains the design and implementation choices, but coding comes
just naturally and I think it might be a good idea to share the code earlier. The code is
heavily commented.
Test Plan: run all local tests
Reviewers: dhruba, heyongqiang
Reviewed By: dhruba
CC: leveldb, zshao, sheki, emayanke, MarkCallaghan
Differential Revision: https://reviews.facebook.net/D9651
- Replace raw slice comparison with a call to user comparator.
Added test for custom comparators.
- Fix end of namespace comments.
- Fixed bug in picking inputs for a level-0 compaction.
When finding overlapping files, the covered range may expand
as files are added to the input set. We now correctly expand
the range when this happens instead of continuing to use the
old range. For example, suppose L0 contains files with the
following ranges:
F1: a .. d
F2: c .. g
F3: f .. j
and the initial compaction target is F3. We used to search
for range f..j which yielded {F2,F3}. However we now expand
the range as soon as another file is added. In this case,
when F2 is added, we expand the range to c..j and restart the
search. That picks up file F1 as well.
This change fixes a bug related to deleted keys showing up
incorrectly after a compaction as described in Issue 44.
(Sync with upstream @25072954)
* env_chromium.cc should not export symbols.
* Fix MSVC warnings.
* Removed large value support.
* Fix broken reference to documentation file
git-svn-id: https://leveldb.googlecode.com/svn/trunk@24 62dab493-f737-651d-591e-8d6aee1b9529