rocksdb

Author	SHA1	Message	Date
Yi Wu	42564ada53	Blob DB: not using PinnableSlice move assignment Summary: The current implementation of PinnableSlice move assignment have an issue #3163. We are moving away from it instead of try to get the move assignment right, since it is too tricky. Closes https://github.com/facebook/rocksdb/pull/3164 Differential Revision: D6319201 Pulled By: yiwu-arbug fbshipit-source-id: 8f3279021f3710da4a4caa14fd238ed2df902c48	2017-11-13 18:12:20 -08:00
Maysam Yabandeh	2515266725	WritePrepared Txn: Refactoring TrackKeys Summary: This patch clarifies and refactors the logic around tracked keys in transactions. Closes https://github.com/facebook/rocksdb/pull/3140 Differential Revision: D6290258 Pulled By: maysamyabandeh fbshipit-source-id: 03b50646264cbcc550813c060b180fc7451a55c1	2017-11-11 13:14:20 -08:00
Maysam Yabandeh	2edc92bc28	WritePrepared Txn: cross-compatibility test Summary: Add tests to ensure that WritePrepared and WriteCommitted policies are cross compatible when the db WAL is empty. This is important when the admin want to switch between the policies. In such case, before the switch the admin needs to empty the WAL by i) committing/rollbacking all the pending transactions, ii) FlushMemTables Closes https://github.com/facebook/rocksdb/pull/3118 Differential Revision: D6227247 Pulled By: maysamyabandeh fbshipit-source-id: bcde3d92c1e89cda3b9cfa69f6a20af5d8993db7	2017-11-11 11:28:37 -08:00
Maysam Yabandeh	857adf388f	WritePrepared Txn: Refactor conf params Summary: Summary of changes: - Move seq_per_batch out of Options - Rename concurrent_prepare to two_write_queues - Add allocate_seq_only_for_data_ Closes https://github.com/facebook/rocksdb/pull/3136 Differential Revision: D6304458 Pulled By: maysamyabandeh fbshipit-source-id: 08e685bfa82bbc41b5b1c5eb7040a8ca6e05e58c	2017-11-10 17:28:12 -08:00
Dmitri Smirnov	f8e2db0717	Fix crashes, address test issues and adjust windows test script Summary: Add per-exe execution capability Add fix parsing of groups/tests Add timer test exclusion Fix unit tests Ifdef threadpool specific tests that do not pass on Vista threadpool. Remove spurious outout from prefix_test so test case listing works properly. Fix not using standard test directories results in file creation errors in sst_dump_test. BlobDb fixes: In C++ end() iterators can not be dereferenced. They are not valid. When deleting blob_db_ set it to nullptr before any other code executes. Not fixed:. On Windows you can not delete a file while it is open. [ RUN ] BlobDBTest.ReadWhileGC d:\dev\rocksdb\rocksdb\utilities\blob_db\blob_db_test.cc(75): error: DestroyBlobDB(dbname_, options, bdb_options) IO error: Failed to delete: d:/mnt/db\testrocksdb-17444/blob_db_test/blob_dir/000001.blob: Permission denied d:\dev\rocksdb\rocksdb\utilities\blob_db\blob_db_test.cc(75): error: DestroyBlobDB(dbname_, options, bdb_options) IO error: Failed to delete: d:/mnt/db\testrocksdb-17444/blob_db_test/blob_dir/000001.blob: Permission denied write_batch Should not call front() if there is a chance the container is empty Closes https://github.com/facebook/rocksdb/pull/3152 Differential Revision: D6293274 Pulled By: sagar0 fbshipit-source-id: 318c3717c22087fae13b18715dffb24565dbd956	2017-11-10 10:41:57 -08:00
Yi Wu	5e9e5a4702	Blob DB: Fix race condition between flush and write Summary: A race condition will happen when: * a user thread writes a value, but it hits the write stop condition because there are too many un-flushed memtables, while holding blob_db_impl.write_mutex_. * Flush is triggered and call flush begin listener and try to acquire blob_db_impl.write_mutex_. Fixing it. Closes https://github.com/facebook/rocksdb/pull/3149 Differential Revision: D6279805 Pulled By: yiwu-arbug fbshipit-source-id: 0e3c58afb78795ebe3360a2c69e05651e3908c40	2017-11-08 19:42:22 -08:00
Yi Wu	ca75f0a64a	Blob DB: Fix release build Summary: `compression` shadow the method name in `BlobFile`. Rename it. Closes https://github.com/facebook/rocksdb/pull/3148 Differential Revision: D6274498 Pulled By: yiwu-arbug fbshipit-source-id: 7d293596530998b23b6b8a8940f983f9b6343a98	2017-11-08 13:14:20 -08:00
Yi Wu	4f9f124347	Blob DB: use compression in file header instead of global options Summary: To fix the issue of failing to decompress existing value after reopen DB with a different compression settings. Closes https://github.com/facebook/rocksdb/pull/3142 Differential Revision: D6267260 Pulled By: yiwu-arbug fbshipit-source-id: c7cf7f3e33b0cd25520abf4771cdf9180cc02a5f	2017-11-07 17:42:17 -08:00
Manuel Ung	e03377c7fd	Add lock wait time as a perf context counter Summary: Adds two new counters: `key_lock_wait_count` counts how many times a lock was blocked by another transaction and had to wait, instead of being granted the lock immediately. `key_lock_wait_time` counts the time spent acquiring locks. Closes https://github.com/facebook/rocksdb/pull/3107 Differential Revision: D6217332 Pulled By: lth fbshipit-source-id: 55d4f46da5550c333e523263422fd61d6a46deb9	2017-11-06 10:57:19 -08:00
Yi Wu	be410dede8	Fix PinnableSlice move assignment Summary: After move assignment, we need to re-initialized the moved PinnableSlice. Also update blob_db_impl.cc to not reuse the moved PinnableSlice since it is supposed to be in an undefined state after move. Closes https://github.com/facebook/rocksdb/pull/3127 Differential Revision: D6238585 Pulled By: yiwu-arbug fbshipit-source-id: bd99f2e37406c4f7de160c7dee6a2e8126bc224e	2017-11-03 18:13:21 -07:00
Yi Wu	2581c0a5a1	Blob DB: Fix BlobDBTest::SnapshotAndGarbageCollection asan failure Summary: Fix unreleased snapshot at the end of the test. Closes https://github.com/facebook/rocksdb/pull/3126 Differential Revision: D6232867 Pulled By: yiwu-arbug fbshipit-source-id: 651ca3144fc573ea2ab0ab20f0a752fb4a101d26	2017-11-03 10:26:59 -07:00
Yi Wu	62578d80c1	Blob DB: Add compaction filter to remove expired blob index entries Summary: After adding expiration to blob index in #3066, we are now able to add a compaction filter to cleanup expired blob index entries. Closes https://github.com/facebook/rocksdb/pull/3090 Differential Revision: D6183812 Pulled By: yiwu-arbug fbshipit-source-id: 9cb03267a9702975290e758c9c176a2c03530b83	2017-11-02 17:27:38 -07:00
Yi Wu	7bfa88037e	Blob DB: fix snapshot handling Summary: Blob db will keep blob file if data in the file is visible to an active snapshot. Before this patch it checks whether there is an active snapshot has sequence number greater than the earliest sequence in the file. This is problematic since we take snapshot on every read, if it keep having reads, old blob files will not be cleanup. Change to check if there is an active snapshot falls in the range of [earliest_sequence, obsolete_sequence) where obsolete sequence is 1. if data is relocated to another file by garbage collection, it is the latest sequence at the time garbage collection finish 2. otherwise, it is the latest sequence of the file Closes https://github.com/facebook/rocksdb/pull/3087 Differential Revision: D6182519 Pulled By: yiwu-arbug fbshipit-source-id: cdf4c35281f782eb2a9ad6a87b6727bbdff27a45	2017-11-02 15:58:27 -07:00
Yi Wu	f662f8f0b6	Blob DB: option to enable garbage collection Summary: Add an option to enable/disable auto garbage collection, where we keep counting how many keys have been evicted by either deletion or compaction and decide whether to garbage collect a blob file. Default disable auto garbage collection for now since the whole logic is not fully tested and we plan to make major change to it. Closes https://github.com/facebook/rocksdb/pull/3117 Differential Revision: D6224756 Pulled By: yiwu-arbug fbshipit-source-id: cdf53bdccec96a4580a2b3a342110ad9e8864dfe	2017-11-02 15:58:27 -07:00
Yi Wu	167ba599ec	Blob DB: Fix flaky BlobDBTest::GCExpiredKeyWhileOverwriting test Summary: The test intent to wait until key being overwritten until proceed with garbage collection. It failed to wait for `PutUntil` finally finish. Fixing it. Closes https://github.com/facebook/rocksdb/pull/3116 Differential Revision: D6222833 Pulled By: yiwu-arbug fbshipit-source-id: fa9b57a772b92a66cf250b44e7975c43f62f45c5	2017-11-02 13:27:34 -07:00
Sagar Vemuri	25ac1697b4	Blob DB: Evict oldest blob file when close to blob db size limit Summary: Evict oldest blob file and put it in obsolete_files list when close to blob db size limit. The file will be delete when the `DeleteObsoleteFiles` background job runs next time. For now I set `kEvictOldestFileAtSize` constant, which controls when to evict the oldest file, at 90%. It could be tweaked or made into an option if really needed; I didn't want to expose it as an option pre-maturely as there are already too many :) . Closes https://github.com/facebook/rocksdb/pull/3094 Differential Revision: D6187340 Pulled By: sagar0 fbshipit-source-id: 687f8262101b9301bf964b94025a2fe9d8573421	2017-11-02 12:11:21 -07:00
Maysam Yabandeh	60d83df23d	WritePrepared Txn: Move DB class to its own file Summary: Move WritePreparedTxnDB from pessimistic_transaction_db.h to its own header, write_prepared_txn_db.h Closes https://github.com/facebook/rocksdb/pull/3114 Differential Revision: D6220987 Pulled By: maysamyabandeh fbshipit-source-id: 18893fb4fdc6b809fe117dabb544080f9b4a301b	2017-11-02 11:14:30 -07:00
Maysam Yabandeh	02693f64fc	WritePrepared Txn: ValidateSnapshot Summary: Implements ValidateSnapshot for WritePrepared txns and also adds a unit test to clarify the contract of this function. Closes https://github.com/facebook/rocksdb/pull/3101 Differential Revision: D6199405 Pulled By: maysamyabandeh fbshipit-source-id: ace509934c307ea5d26f4bbac5f836d7c80fd240	2017-11-01 19:11:09 -07:00
Maysam Yabandeh	17731a43a6	WritePrepared Txn: Optimize for recoverable state Summary: GetCommitTimeWriteBatch is currently used to store some state as part of commit in 2PC. In MyRocks it is specifically used to store some data that would be needed only during recovery. So it is not need to be stored in memtable right after each commit. This patch enables an optimization to write the GetCommitTimeWriteBatch only to the WAL. The batch will be written to memtable during recovery when the WAL is replayed. To cover the case when WAL is deleted after memtable flush, the batch is also buffered and written to memtable right before each memtable flush. Closes https://github.com/facebook/rocksdb/pull/3071 Differential Revision: D6148023 Pulled By: maysamyabandeh fbshipit-source-id: 2d09bae5565abe2017c0327421010d5c0d55eaa7	2017-11-01 17:26:46 -07:00
Maysam Yabandeh	c1cf94c787	WritePrepared Txn: sort indexes before batch collapse Summary: The collapse of duplicate keys in write batch needs to sort the indexes of duplicate keys since it only checks the index in the batch with the head of the list of duplicate keys. Closes https://github.com/facebook/rocksdb/pull/3093 Differential Revision: D6186800 Pulled By: maysamyabandeh fbshipit-source-id: abc9ae8c2f1840445a5584f925cf86ecc6f37154	2017-11-01 08:56:57 -07:00
Yi Wu	f6082d1944	Blob DB: cleanup unused options Summary: * cleanup num_concurrent_simple_blobs. We don't do concurrent writes (by taking write_mutex_) so it doesn't make sense to have multiple non TTL files open. We can revisit later when we want to improve writes. * cleanup eviction callback. we don't have plan to use it now. * rename s/open_simple_blob_files_/open_non_ttl_file_/ and s/open_blob_files_/open_ttl_files_/ to avoid confusion. Closes https://github.com/facebook/rocksdb/pull/3088 Differential Revision: D6182598 Pulled By: yiwu-arbug fbshipit-source-id: 99e6f5e01fa66d31309cdb06ce48502464bac6ad	2017-10-31 16:42:08 -07:00
Sagar Vemuri	f5078dde2d	Blob DB: Initialize all fields in Blob Header, Footer and Record structs Summary: Fixing un-itializations caught by valgrind. Closes https://github.com/facebook/rocksdb/pull/3103 Differential Revision: D6200195 Pulled By: sagar0 fbshipit-source-id: bf35a3fb03eb1d308e4c5ce30dee1e345d7b03b3	2017-10-31 16:42:08 -07:00
Yi Wu	3ebb7ba7b9	Blob DB: update blob file format Summary: Changing blob file format and some code cleanup around the change. The change with blob log format are: * Remove timestamp field in blob file header, blob file footer and blob records. The field is not being use and often confuse with expiration field. * Blob file header now come with column family id, which always equal to default column family id. It leaves room for future support of column family. * Compression field in blob file header now is a standalone byte (instead of compact encode with flags field) * Blob file footer now come with its own crc. * Key length now being uint64_t instead of uint32_t * Blob CRC now checksum both key and value (instead of value only). * Some reordering of the fields. The list of cleanups: * Better inline comments in blob_log_format.h * rename ttlrange_t and snrange_t to ExpirationRange and SequenceRange respectively. * simplify blob_db::Reader * Move crc checking logic to inside blob_log_format.cc Closes https://github.com/facebook/rocksdb/pull/3081 Differential Revision: D6171304 Pulled By: yiwu-arbug fbshipit-source-id: e4373e0d39264441b7e2fbd0caba93ddd99ea2af	2017-10-27 13:27:12 -07:00
Yi Wu	5a2a6483dc	Blob DB: Inline small values in base DB Summary: Adding the `min_blob_size` option to allow storing small values in base db (in LSM tree) together with the key. The goal is to improve performance for small values, while taking advantage of blob db's low write amplification for large values. Also adding expiration timestamp to blob index. It will be useful to evict stale blob indexes in base db by adding a compaction filter. I'll work on the compaction filter in future patches. See blob_index.h for the new blob index format. There are 4 cases when writing a new key: * small value w/o TTL: put in base db as normal value (i.e. ValueType::kTypeValue) * small value w/ TTL: put (type, expiration, value) to base db. * large value w/o TTL: write value to blob log and put (type, file, offset, size, compression) to base db. * large value w/TTL: write value to blob log and put (type, expiration, file, offset, size, compression) to base db. Closes https://github.com/facebook/rocksdb/pull/3066 Differential Revision: D6142115 Pulled By: yiwu-arbug fbshipit-source-id: 9526e76e19f0839310a3f5f2a43772a4ad182cd0	2017-10-26 12:30:54 -07:00
Sagar Vemuri	96e3a600ba	Return write error on reaching blob dir size limit Summary: I found that we continue accepting writes even when the blob db goes beyond the configured blob directory size limit. Now, we return an error for writes on reaching `blob_dir_size` limit and if `is_fifo` is set to false. (We cannot just drop any file when `is_fifo` is true.) Deleting the oldest file when `is_fifo` is true will be handled in a later PR. Closes https://github.com/facebook/rocksdb/pull/3060 Differential Revision: D6136156 Pulled By: sagar0 fbshipit-source-id: 2f11cb3f2eedfa94524fbfa2613dd64bfad7a23c	2017-10-25 16:30:37 -07:00
zach shipko	386a57e6ef	Fix build on OpenBSD Summary: A few simple changes to allow RocksDB to be built on OpenBSD. Let me know if any further changes are needed. Closes https://github.com/facebook/rocksdb/pull/3061 Differential Revision: D6138800 Pulled By: ajkr fbshipit-source-id: a13a17b5dc051e6518bd56a8c5efd1d24dd81b0c	2017-10-24 13:27:38 -07:00
Yi Wu	66a2c44ef4	Add DB::Properties::kEstimateOldestKeyTime Summary: With FIFO compaction we would like to get the oldest data time for monitoring. The problem is we don't have timestamp for each key in the DB. As an approximation, we expose the earliest of sst file "creation_time" property. My plan is to override the property with a more accurate value with blob db, where we actually have timestamp. Closes https://github.com/facebook/rocksdb/pull/2842 Differential Revision: D5770600 Pulled By: yiwu-arbug fbshipit-source-id: 03833c8f10bbfbee62f8ea5c0d03c0cafb5d853a	2017-10-23 15:27:27 -07:00
Dmitri Smirnov	d2a65c59e1	Fix unused var warnings in Release mode Summary: MSVC does not support unused attribute at this time. A separate assignment line fixes the issue probably by being counted as usage for MSVC and it no longer complains about unused var. Closes https://github.com/facebook/rocksdb/pull/3048 Differential Revision: D6126272 Pulled By: maysamyabandeh fbshipit-source-id: 4907865db45fd75a39a15725c0695aaa17509c1f	2017-10-23 14:27:04 -07:00
Maysam Yabandeh	63822eb761	Enable two write queues for transactions Summary: Enable concurrent_prepare flag for WritePrepared transactions and extend the existing transaction tests with this config. Closes https://github.com/facebook/rocksdb/pull/3046 Differential Revision: D6106534 Pulled By: maysamyabandeh fbshipit-source-id: 88c8d21d45bc492beb0a131caea84a2ac5e7d38c	2017-10-23 14:27:04 -07:00
Dmitri Smirnov	ebab2e2d42	Enable MSVC W4 with a few exceptions. Fix warnings and bugs Summary: Closes https://github.com/facebook/rocksdb/pull/3018 Differential Revision: D6079011 Pulled By: yiwu-arbug fbshipit-source-id: 988a721e7e7617967859dba71d660fc69f4dff57	2017-10-19 10:57:12 -07:00
Maysam Yabandeh	7e38238981	WritePrepared Txn: Disable GC during recovery Summary: Disables GC during recovery of a WritePrepared txn db to avoid GCing uncommitted key values. Closes https://github.com/facebook/rocksdb/pull/2980 Differential Revision: D6000191 Pulled By: maysamyabandeh fbshipit-source-id: fc4d522c643d24ebf043f811fe4ecd0dd0294675	2017-10-18 09:11:50 -07:00
Yi Wu	eaaef91178	Blob DB: Store blob index as kTypeBlobIndex in base db Summary: Blob db insert blob index to base db as kTypeBlobIndex type, to tell apart values written by plain rocksdb or blob db. This is to make it possible to migrate from existing rocksdb to blob db. Also with the patch blob db garbage collection get away from OptimisticTransaction. Instead it use a custom write callback to achieve similar behavior as OptimisticTransaction. This is because we need to pass the is_blob_index flag to DBImpl::Get but OptimisticTransaction don't support it. Closes https://github.com/facebook/rocksdb/pull/3000 Differential Revision: D6050044 Pulled By: yiwu-arbug fbshipit-source-id: 61dc72ab9977625e75f78cd968e7d8a3976e3632	2017-10-17 17:28:11 -07:00
Yi Wu	0552029b5c	Blob DB: not writing sequence number as blob record footer Summary: Previously each time we write a blob we write blog_record_header + key + value + blob_record_footer to blob log. The footer only contains a sequence and a crc for the sequence number. The sequence number was used in garbage collection to verify the value is recent. After #2703 we moved to use optimistic transaction and no longer use sequence number from the footer. Remove the footer altogether. There's another usage of sequence number and we are keeping it: Each blob log file keep track of sequence number range of keys in it, and use it to check if it is reference by a snapshot, before being deleted. Closes https://github.com/facebook/rocksdb/pull/3005 Differential Revision: D6057585 Pulled By: yiwu-arbug fbshipit-source-id: d6da53c457a316e9723f359a1b47facfc3ffe090	2017-10-17 12:13:08 -07:00
Yi Wu	10ba50e9eb	Blob DB: Move BlobFile definition to a separate file Summary: simply move BlobFile definition from blob_db_impl.h to blob_file.h. Closes https://github.com/facebook/rocksdb/pull/3002 Differential Revision: D6050143 Pulled By: yiwu-arbug fbshipit-source-id: a8fb6e094fe39bdeace6279569834bc65aa64a34	2017-10-13 14:42:26 -07:00
Zhongyi Xie	e2548366e1	add GetLiveFiles and GetLiveFilesMetaData for BlobDB Summary: Closes https://github.com/facebook/rocksdb/pull/2976 Differential Revision: D5994759 Pulled By: miasantreble fbshipit-source-id: 985c31dccb957cb970c302f813cd07a1e8cb6438	2017-10-09 19:56:04 -07:00
Yi Wu	8c392a31d7	WritePrepared Txn: Iterator Summary: On iterator create, take a snapshot, create a ReadCallback and pass the ReadCallback to the underlying DBIter to check if key is committed. Closes https://github.com/facebook/rocksdb/pull/2981 Differential Revision: D6001471 Pulled By: yiwu-arbug fbshipit-source-id: 3565c4cdaf25370ba47008b0e0cb65b31dfe79fe	2017-10-09 17:15:28 -07:00
Yi Wu	17c6325e8a	WritePrepare Txn: Cancel flush/compaction before destruction Summary: On WritePreparedTxnDB destruct there could be running compaction/flush holding a SnapshotChecker, which holds a pointer back to WritePreparedTxnDB. Make sure those jobs finished before destructing WritePreparedTxnDB. This is caught by TransactionTest::SeqAdvanceTest. Closes https://github.com/facebook/rocksdb/pull/2982 Differential Revision: D6002957 Pulled By: yiwu-arbug fbshipit-source-id: f1e70390c9798d1bd7959f5c8e2a1c14100773c3	2017-10-06 20:55:53 -07:00
Maysam Yabandeh	ec6c5383d0	WritePrepared Txn: end-to-end tests Summary: Enable WritePrepared policy for existing transaction tests. Closes https://github.com/facebook/rocksdb/pull/2972 Differential Revision: D5993614 Pulled By: maysamyabandeh fbshipit-source-id: d1eb53e2920c4e2a56434bb001231c98426f3509	2017-10-06 14:26:45 -07:00
Sagar Vemuri	da29eba43b	Enable WAL for blob index Summary: Enabled WAL, during GC, for blob index which is stored on regular RocksDB. Closes https://github.com/facebook/rocksdb/pull/2975 Differential Revision: D5997384 Pulled By: sagar0 fbshipit-source-id: b76c1487d8b5be0e36c55e8d77ffe3d37d63d85b	2017-10-06 10:59:31 -07:00
Yi Wu	d1b74b0c82	WritePrepared Txn: Compaction/Flush Summary: Update Compaction/Flush to support WritePreparedTxnDB: Add SnapshotChecker which is a proxy to query WritePreparedTxnDB::IsInSnapshot. Pass SnapshotChecker to DBImpl on WritePreparedTxnDB open. CompactionIterator use it to check if a key has been committed and if it is visible to a snapshot. In CompactionIterator: * check if key has been committed. If not, output uncommitted keys AS-IS. * use SnapshotChecker to check if key is visible to a snapshot when in need. * do not output key with seq = 0 if the key is not committed. Closes https://github.com/facebook/rocksdb/pull/2926 Differential Revision: D5902907 Pulled By: yiwu-arbug fbshipit-source-id: 945e037fdf0aa652dc5ba0ad879461040baa0320	2017-10-06 10:41:53 -07:00
Yi Wu	cc20ec3689	WritePrepared Txn: Test sequence number 0 is visible Summary: Compaction will output keys with sequence number 0, if it is visible to earliest snapshot. Adding a test to make sure IsInSnapshot() report sequence number 0 is visible to any snapshot. Closes https://github.com/facebook/rocksdb/pull/2974 Differential Revision: D5990665 Pulled By: yiwu-arbug fbshipit-source-id: ef50ebc777ff8ca688771f3ab598c7a609b0b65e	2017-10-05 16:26:44 -07:00
Maysam Yabandeh	4e3c3d8c6a	WritePrepared Txn: duplicate keys Summary: With WriteCommitted, when the write batch has duplicate keys, the txn db simply inserts them to the db with different seq numbers and let the db ignore/merge the duplicate values at the read time. With WritePrepared all the entries of the batch are inserted with the same seq number which prevents us from benefiting from this simple solution. This patch applies a hackish solution to unblock the end-to-end testing. The hack is to be replaced with a proper solution soon. The patch simply detects the duplicate key insertions, and mark the previous one as obsolete. Then before writing to the db it rewrites the batch eliminating the obsolete keys. This would incur a memcpy cost. Furthermore handing duplicate merge would require to do FullMerge instead of simply ignoring the previous value, which is not handled by this patch. Closes https://github.com/facebook/rocksdb/pull/2969 Differential Revision: D5976337 Pulled By: maysamyabandeh fbshipit-source-id: 114e65b66f137d8454ff2d1d782b8c05da95f989	2017-10-05 07:41:02 -07:00
Maysam Yabandeh	283d60761e	fix valgrind leak report in unit test Summary: I cannot locally reproduce the valgrind leak report but based on my code inspection not deleting txn1 might be the reason. ``` ==197848== 2,990 (544 direct, 2,446 indirect) bytes in 1 blocks are definitely lost in loss record 15 of 16 ==197848== at 0x4C2D06F: operator new(unsigned long) (in /usr/local/fbcode/gcc-5-glibc-2.23/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==197848== by 0x7D5B31: rocksdb::WritePreparedTxnDB::BeginTransaction(rocksdb::WriteOptions const&, rocksdb::TransactionOptions const&, rocksdb::Transaction) (pessimistic_transaction_db.cc:173) ==197848== by 0x7D80C1: rocksdb::PessimisticTransactionDB::Initialize(std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<rocksdb::ColumnFamilyHandle, std::allocator<rocksdb::ColumnFamilyHandle> > const&) (pessimistic_transaction_db.cc:115) ==197848== by 0x7DC42F: rocksdb::WritePreparedTxnDB::Initialize(std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<rocksdb::ColumnFamilyHandle, std::allocator<rocksdb::ColumnFamilyHandle> > const&) (pessimistic_transaction_db.cc:151) ==197848== by 0x7D8CA0: rocksdb::TransactionDB::WrapDB(rocksdb::DB, rocksdb::TransactionDBOptions const&, std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<rocksdb::ColumnFamilyHandle, std::allocator<rocksdb::ColumnFamilyHandle> > const&, rocksdb::TransactionDB*) (pessimistic_transaction_db.cc:275) ==197848== by 0x7D9F26: rocksdb::TransactionDB::Open(rocksdb::DBOptions const&, rocksdb::TransactionDBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle, std::allocator<rocksdb::ColumnFamilyHandle> >, rocksdb::TransactionDB) (pessimistic_transaction_db.cc:227) ==197848== by 0x7DB349: rocksdb::TransactionDB::Open(rocksdb::Options const&, rocksdb::TransactionDBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::TransactionDB) (pessimistic_transaction_db.cc:198) ==197848== by 0x52ABD2: rocksdb::TransactionTest::ReOpenNoDelete() (transaction_test.h:87) ==197848== by 0x51F7B8: rocksdb::WritePreparedTransactionTest_BasicRecoveryTest_Test::TestBody() (write_prepared_transaction_test.cc:843) ==197848== by 0x857557: HandleSehExceptionsInMethodIfSupported<testing::Test, void> (gtest-all.cc:3824) ==197848== by 0x857557: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test, void (testing::Test::)(), char const*) (gtest-all.cc:3860) ==197848== by 0x84E7EB: testing::Test::Run() [clone .part.485] (gtest-all.cc:3897) ==197848== by 0x84E9BC: Run (gtest-all.cc:3888) ==197848== by 0x84E9BC: testing::TestInfo::Run() [clone .part.486] (gtest-all.cc:4072) ``` Closes https://github.com/facebook/rocksdb/pull/2963 Differential Revision: D5968856 Pulled By: maysamyabandeh fbshipit-source-id: 2ac512bbcad37dc8eeeffe4f363978913354180c	2017-10-03 14:58:07 -07:00
Maysam Yabandeh	d27258d3a6	WritePrepared Txn: Rollback Summary: Implement the rollback of WritePrepared txns. For each modified value, it reads the value before the txn and write it back. This would cancel out the effect of transaction. It also remove the rolled back txn from prepared heap. Closes https://github.com/facebook/rocksdb/pull/2946 Differential Revision: D5937575 Pulled By: maysamyabandeh fbshipit-source-id: a6d3c47f44db3729f44b287a80f97d08dc4e888d	2017-10-02 19:59:27 -07:00
Sagar Vemuri	bb38cd03a9	Limit number of merge operands in Cassandra merge operator Summary: Now that RocksDB supports conditional merging during point lookups (introduced in #2923), Cassandra value merge operator can be updated to pass in a limit. The limit needs to be passed in from the Cassandra code. Closes https://github.com/facebook/rocksdb/pull/2947 Differential Revision: D5938454 Pulled By: sagar0 fbshipit-source-id: d64a72d53170d8cf202b53bd648475c3952f7d7f	2017-10-02 16:11:40 -07:00
Andrew Kryczka	5df172da2f	fix deletion-triggered compaction in table builder Summary: It was broken when `NotifyCollectTableCollectorsOnFinish` was introduced. That function called `Finish` on each of the `TablePropertiesCollector`s, and `CompactOnDeletionCollector::Finish()` was resetting all its internal state. Then, when we checked whether compaction is necessary, the flag had already been cleared. Fixed above issue by avoiding resetting internal state during `Finish()`. Multiple calls to `Finish()` are allowed, but callers cannot invoke `AddUserKey()` on the collector after any finishes. Closes https://github.com/facebook/rocksdb/pull/2936 Differential Revision: D5918659 Pulled By: ajkr fbshipit-source-id: 4f05e9d80e50ee762ba1e611d8d22620029dca6b	2017-09-28 18:17:30 -07:00
Maysam Yabandeh	385049baf2	WritePrepared Txn: Recovery Summary: Recover txns from the WAL. Also added some unit tests. Closes https://github.com/facebook/rocksdb/pull/2901 Differential Revision: D5859596 Pulled By: maysamyabandeh fbshipit-source-id: 6424967b231388093b4effffe0a3b1b7ec8caeb0	2017-09-28 16:56:45 -07:00
Quinn Jarrell	6a541afcc4	Make bytes_per_sync and wal_bytes_per_sync mutable Summary: SUMMARY Moves the bytes_per_sync and wal_bytes_per_sync options from immutableoptions to mutable options. Also if wal_bytes_per_sync is changed, the wal file and memtables are flushed. TEST PLAN ran make check all passed Two new tests SetBytesPerSync, SetWalBytesPerSync check that after issuing setoptions with a new value for the var, the db options have the new value. Closes https://github.com/facebook/rocksdb/pull/2893 Reviewed By: yiwu-arbug Differential Revision: D5845814 Pulled By: TheRushingWookie fbshipit-source-id: 93b52d779ce623691b546679dcd984a06d2ad1bd	2017-09-27 17:49:45 -07:00
Yi Wu	ec48e5c77f	Add TransactionDB::SingleDelete() Summary: Looks like the API is simply missing. Adding it. Closes https://github.com/facebook/rocksdb/pull/2937 Differential Revision: D5919955 Pulled By: yiwu-arbug fbshipit-source-id: 6e2e9c96c29882b0bb4113d1f8efb72bffc57878	2017-09-27 10:27:26 -07:00
Yi Wu	be97dbb15c	Fix WritePreparedTransactionTest::SeqAdvanceTest ASAN failure Summary: Closes https://github.com/facebook/rocksdb/pull/2922 Differential Revision: D5895310 Pulled By: yiwu-arbug fbshipit-source-id: 52c635a25d22478ec1eca49b6817551202babac2	2017-09-22 15:26:42 -07:00
Yi Wu	1480e6f7cf	Fix TransactionTest::SeqAdvanceTest ASAN failure Summary: The test didn't delete txn before creating a new one. Closes https://github.com/facebook/rocksdb/pull/2913 Differential Revision: D5880236 Pulled By: yiwu-arbug fbshipit-source-id: 7a4fcaada3d86332292754502cd8f4341143bf4f	2017-09-21 09:56:54 -07:00
Pengchao Wang	e4234fbdcf	collecting kValue type tombstone Summary: In our testing cluster, we found large amount tombstone has been promoted to kValue type from kMerge after reaching the top level of compaction. Since we used to only collecting tombstone in merge operator, those tombstones can never be collected. This PR addresses the issue by adding a GC step in compaction filter, which is only for kValue type records. Since those record already reached the top of compaction (no earlier data exists) we can safely remove them in compaction filter without worrying old data appears. This PR also removes an old optimization in cassandra merge operator for single merge operands. We need to do GC even on a single operand, so the optimation does not make sense anymore. Closes https://github.com/facebook/rocksdb/pull/2855 Reviewed By: sagar0 Differential Revision: D5806445 Pulled By: wpc fbshipit-source-id: 6eb25629d4ce917eb5e8b489f64a6aa78c7d270b	2017-09-18 16:27:12 -07:00
Maysam Yabandeh	60beefd6e0	WritePrepared Txn: Advance seq one per batch Summary: By default the seq number in DB is increased once per written key. WritePrepared txns requires the seq to be increased once per the entire batch so that the seq would be used as the prepare timestamp by which the transaction is identified. Also we need to increase seq for the commit marker since it would give a unique id to the commit timestamp of transactions. Two unit tests are added to verify our understanding of how the seq should be increased. The recovery path requires much more work and is left to another patch. Closes https://github.com/facebook/rocksdb/pull/2885 Differential Revision: D5837843 Pulled By: maysamyabandeh fbshipit-source-id: a08960b93d727e1cf438c254d0c2636fb133cc1c	2017-09-18 14:45:08 -07:00
Yi Wu	9a970c81af	Fix WriteBatchWithIndex::GetFromBatchAndDB not allowing StackableDB Summary: Closes https://github.com/facebook/rocksdb/pull/2881 Differential Revision: D5829682 Pulled By: yiwu-arbug fbshipit-source-id: abb8fa14b58cea7c416282f9be19e8b1a7961c6e	2017-09-13 17:26:35 -07:00
Maysam Yabandeh	09713a64b3	WritePrepared Txn: Lock-free CommitMap Summary: We had two proposals for lock-free commit maps. This patch implements the latter one that was simpler. We can later experiment with both proposals. In this impl each entry is an std::atomic of uint64_t, which are accessed via memory_order_acquire/release. In x86_64 arch this is compiled to simple reads and writes from memory. Closes https://github.com/facebook/rocksdb/pull/2861 Differential Revision: D5800724 Pulled By: maysamyabandeh fbshipit-source-id: 41abae9a4a5df050a8eb696c43de11c2770afdda	2017-09-13 12:12:11 -07:00
Amy Xu	5785b1fcb8	Fix naming in InternalKey Summary: - Switched all instances of SetMinPossibleForUserKey and SetMaxPossibleForUserKey in accordance to InternalKeyComparator's comparison logic Closes https://github.com/facebook/rocksdb/pull/2868 Differential Revision: D5804152 Pulled By: axxufb fbshipit-source-id: 80be35e04f2e8abc35cc64abe1fecb03af24e183	2017-09-12 17:17:42 -07:00
Andrew Kryczka	f5148ade10	support opening zero backups during engine init Summary: There are internal users who open BackupEngine for writing new backups only, and they don't care whether old backups can be read or not. The condition `BackupableDBOptions::max_valid_backups_to_open == 0` should be supported (previously in `df74b775e6` I made the mistake of choosing 0 as a special value to disable the limit). Closes https://github.com/facebook/rocksdb/pull/2819 Differential Revision: D5751599 Pulled By: ajkr fbshipit-source-id: e73ac19eb5d756d6b68601eae8e43407ee4f2752	2017-09-12 13:26:34 -07:00
Siying Dong	64b6452e0c	Make InternalKeyComparator final and directly use it in merging iterator Summary: Merging iterator invokes InternalKeyComparator.Compare() frequently to heap merge. By making InternalKeyComparator final and merging iterator to directly use InternalKeyComparator rather than through Iterator interface, we can give compiler a choice to avoid one more virtual function call if possible. I ran readseq benchmark in memory-only use case to make sure the performance at least doesn't regress. I have to disable the final key word in debug build, as a hack test class depends on overriding the class. Closes https://github.com/facebook/rocksdb/pull/2860 Differential Revision: D5800461 Pulled By: siying fbshipit-source-id: ab876f22a09bb5c560740911412336e0e25ccb53	2017-09-11 12:04:21 -07:00
Maysam Yabandeh	f46464d383	write-prepared txn: call IsInSnapshot Summary: This patch instruments the read path to verify each read value against an optional ReadCallback class. If the value is rejected, the reader moves on to the next value. The WritePreparedTxn makes use of this feature to skip sequence numbers that are not in the read snapshot. Closes https://github.com/facebook/rocksdb/pull/2850 Differential Revision: D5787375 Pulled By: maysamyabandeh fbshipit-source-id: 49d808b3062ab35e7ae98ad388f659757794184c	2017-09-11 09:14:48 -07:00
Maysam Yabandeh	9a4df72994	WritePrepared Txn: CommitBatch Summary: Implements CommitBatch and CommitWithoutPrepare for WritePreparedTxn Closes https://github.com/facebook/rocksdb/pull/2854 Differential Revision: D5793999 Pulled By: maysamyabandeh fbshipit-source-id: d8b9858221162c6ac7a1f6912cbd3481d0d8a503	2017-09-08 15:56:39 -07:00
Maysam Yabandeh	fce6c892ab	Advance max evicted seq in coarser granularity Summary: This patch advances the max_evicted_seq_ is larger granularities to reduce the overhead of updating the relevant data structures. It also refactor the related code and adds testing to that. As part of this patch some of the TODOs for removing usage of non-static const members are also addressed. Closes https://github.com/facebook/rocksdb/pull/2844 Differential Revision: D5772928 Pulled By: maysamyabandeh fbshipit-source-id: f4fcc2948be69c034f10812cf922ce5ab82ef98c	2017-09-08 14:41:22 -07:00
Yi Wu	dcd36a6aee	Make it explicit blob db doesn't support CF Summary: Blob db doesn't currently support column families. Return NotSupported status explicitly. Closes https://github.com/facebook/rocksdb/pull/2825 Differential Revision: D5757438 Pulled By: yiwu-arbug fbshipit-source-id: 44de9408fd032c98e8ae337d4db4ed37169bd9fa	2017-09-08 11:11:04 -07:00
Siying Dong	0e99323ac2	Fix CLANG Analyze Summary: clang analyze shows warnings after we upgrade the CLANG version. Fix them. Closes https://github.com/facebook/rocksdb/pull/2839 Differential Revision: D5769060 Pulled By: siying fbshipit-source-id: 3f8e4df715590d8984f6564b608fa08cfdfa5f14	2017-09-07 14:28:06 -07:00
Maysam Yabandeh	7e19a571e9	Remove unused TransactionCallback Summary: TransactionCallback was never used. Remove it to avoid confusion. Closes https://github.com/facebook/rocksdb/pull/2853 Differential Revision: D5787219 Pulled By: maysamyabandeh fbshipit-source-id: e2b6a89537e3770a269ad38be71c4b0b160a88ac	2017-09-07 12:17:18 -07:00
Maysam Yabandeh	79810e2d49	Skip write_prepared_transaction_test in travis Summary: The patch skips write_prepared_transaction_test from travis as they time out there. They are still covered in daily runs of tests. Closes https://github.com/facebook/rocksdb/pull/2836 Differential Revision: D5767203 Pulled By: maysamyabandeh fbshipit-source-id: 51045ef98a745197136e14b2ec02fc6f38081b75	2017-09-05 15:29:52 -07:00
Yi Wu	ab95e293d2	Fix memory leak on blob db open Summary: Fixes #2820 Closes https://github.com/facebook/rocksdb/pull/2826 Differential Revision: D5757527 Pulled By: yiwu-arbug fbshipit-source-id: f495b63700495aeaade30a1da5e3675848f3d72f	2017-09-01 14:13:51 -07:00
Maysam Yabandeh	37ae8cc60f	Signal progress of the test to avoid timeout Summary: Closes https://github.com/facebook/rocksdb/pull/2824 Differential Revision: D5756457 Pulled By: maysamyabandeh fbshipit-source-id: dff53e945d8ac4ffe6775a2176424fd1a27fc189	2017-09-01 11:28:52 -07:00
Dmitri Smirnov	0ec90a7cc2	Add -DPORTABLE=1 to MSVC CI build Summary: Add -DPORTABLE=1 port::cacheline_aligned_alloc() has arguments swapped which prevents every single test from running. Closes https://github.com/facebook/rocksdb/pull/2815 Differential Revision: D5751661 Pulled By: siying fbshipit-source-id: e0857d6e138ec46035b3c23d7c3c751901a0a4a0	2017-08-31 16:42:48 -07:00
Andrew Kryczka	b97685aef6	fix backup engine when latest backup corrupt Summary: Backup engine is intentionally openable even when some backups are corrupt. Previously the engine could write new backups as long as the most recent backup wasn't corrupt. This PR makes the backup engine able to create new backups even when the most recent one is corrupt. We now maintain two ID instance variables: - `latest_backup_id_` is used when creating backup to choose the new ID - `latest_valid_backup_id_` is used when restoring latest backup since we want most recent valid one Closes https://github.com/facebook/rocksdb/pull/2804 Differential Revision: D5734148 Pulled By: ajkr fbshipit-source-id: db440707b31df2c7b084188aa5f6368449e10bcf	2017-08-31 15:41:49 -07:00
Pengchao Wang	825a22c00c	garbage collect tombstones in merge operator Summary: Remove cassandra tombstone when reaching the max compaction level (full merge). if all columns collected key will be removed in next compaction via compaction filter Closes https://github.com/facebook/rocksdb/pull/2791 Reviewed By: sagar0 Differential Revision: D5722465 Pulled By: wpc fbshipit-source-id: 61e9898a5686551653a16383255aeaab3197e65e	2017-08-31 10:11:54 -07:00
Maysam Yabandeh	26ac24f199	Add more unit test to write_prepared txns Summary: Closes https://github.com/facebook/rocksdb/pull/2798 Differential Revision: D5724173 Pulled By: maysamyabandeh fbshipit-source-id: fb6b782d933fb4be315b1a231a6a67a66fdc9c96	2017-08-31 09:41:27 -07:00
Maysam Yabandeh	fbfa3e7a43	WriteAtPrepare: Efficient read from snapshot list Summary: Divide the old snapshots to two lists: a few that fit into a cached array and the rest in a vector, which is expected to be empty in normal cases. The former is to optimize concurrent reads from snapshots without requiring locks. It is done by an array of std::atomic, from which std::memory_order_acquire reads are compiled to simple read instructions in most of the x86_64 architectures. Closes https://github.com/facebook/rocksdb/pull/2758 Differential Revision: D5660504 Pulled By: maysamyabandeh fbshipit-source-id: 524fcf9a8e7f90a92324536456912a99aaa6740c	2017-08-26 01:00:38 -07:00
Yi Wu	503db684f7	make blob file close synchronous Summary: Fixing flaky blob_db_test. To close a blob file, blob db used to add a CloseSeqWrite job to the background thread to close it. Changing file close to be synchronous in order to simplify logic, and fix flaky blob_db_test. Closes https://github.com/facebook/rocksdb/pull/2787 Differential Revision: D5699387 Pulled By: yiwu-arbug fbshipit-source-id: dd07a945cd435cd3808fce7ee4ea57817409474a	2017-08-25 10:41:49 -07:00
Maysam Yabandeh	cd26af3476	Add unit test for WritePrepared skeleton Summary: Closes https://github.com/facebook/rocksdb/pull/2756 Differential Revision: D5660516 Pulled By: maysamyabandeh fbshipit-source-id: f3f3d3b5f544007a7fbdd78e49e4738b4437c7ee	2017-08-23 13:56:03 -07:00
Andrew Kryczka	234f33a3f9	allow nullptr Slice only as sentinel Summary: Allow `Slice` holding nullptr as a sentinel value but not in comparisons. This new restriction eliminates the need for the manual checks in `39ef900551`, while still conforming to glibc's `memcmp` API. Thanks siying for the idea. Users may need to migrate, so mentioned it in HISTORY.md. Closes https://github.com/facebook/rocksdb/pull/2777 Differential Revision: D5686016 Pulled By: ajkr fbshipit-source-id: 03a2ca3fd9a0ebade9d0d5686c81d59a9534f563	2017-08-23 10:56:06 -07:00
Maysam Yabandeh	ccf7f833e3	Use PinnableSlice in Transactions Summary: The ::Get from DB is not augmented with an overload method that takes a PinnableSlice instead of a string. Transactions however are not yet upgraded to use the new API. As a result, transaction users such as MyRocks cannot benefit from it. This patch updates the transactional API with a PinnableSlice overload. Closes https://github.com/facebook/rocksdb/pull/2736 Differential Revision: D5645770 Pulled By: maysamyabandeh fbshipit-source-id: f6af520df902f842de1bcf99bed3e8dfc43ad96d	2017-08-23 10:11:45 -07:00
yiwu-arbug	5b68b114f1	Blob db create a snapshot before every read Summary: If GC kicks in between * A Get() reads index entry from base db. * The Get() read from a blob file The GC can delete the corresponding blob file, making the key not found. Fortunately we have existing logic to avoid deleting a blob file if it is referenced by a snapshot. So the fix is to explicitly create a snapshot before reading index entry from base db. Closes https://github.com/facebook/rocksdb/pull/2754 Differential Revision: D5655956 Pulled By: yiwu-arbug fbshipit-source-id: e4ccbc51331362542e7343175bbcbdea5830f544	2017-08-20 18:26:19 -07:00
yiwu-arbug	4624ae52c9	GC the oldest file when out of space Summary: When out of space, blob db should GC the oldest file. The current implementation GC the newest one instead. Fixing it. Closes https://github.com/facebook/rocksdb/pull/2757 Differential Revision: D5657611 Pulled By: yiwu-arbug fbshipit-source-id: 56c30a4c52e6ab04551dda8c5c46006d4070b28d	2017-08-20 17:11:06 -07:00
Archit Mishra	bddd5d3630	Added mechanism to track deadlock chain Summary: Changes: * extended the wait_txn_map to track additional information * designed circular buffer to store n latest deadlocks' information * added test coverage to verify the additional information tracked is accurately stored in the buffer Closes https://github.com/facebook/rocksdb/pull/2630 Differential Revision: D5478025 Pulled By: armishra fbshipit-source-id: 2b138de7b5a73f5ca554fc3ff8220a3be49f39e7	2017-08-17 18:56:21 -07:00
yiwu-arbug	29877ec7b4	Fix blob db crash during calculating write amp Summary: On initial call to BlobDBImpl::WaStats() `all_periods_write_` would be empty, so it will crash when we call pop_front() at line 1627. Apparently it is mean to pop only when `all_periods_write_.size() > kWriteAmplificationStatsPeriods`. The whole write amp calculation doesn't seems to be correct and it is not being exposed. Will work on it later. Test Plan Change kWriteAmplificationStatsPeriodMillisecs to 1000 (1 second) and run db_bench --use_blob_db for 5 minutes. Closes https://github.com/facebook/rocksdb/pull/2751 Differential Revision: D5648269 Pulled By: yiwu-arbug fbshipit-source-id: b843d9a09bb5f9e1b713d101ec7b87e54b5115a4	2017-08-17 15:01:09 -07:00
Sagar Vemuri	8f2598ac9d	Enable Cassandra merge operator to be called with a single merge operand Summary: Updating Cassandra merge operator to make use of a single merge operand when needed. Single merge operand support has been introduced in #2721. Closes https://github.com/facebook/rocksdb/pull/2753 Differential Revision: D5652867 Pulled By: sagar0 fbshipit-source-id: b9fbd3196d3ebd0b752626dbf9bec9aa53e3e26a	2017-08-17 15:01:09 -07:00
follitude	ac8fb77afd	fix some misspellings Summary: PTAL ajkr Closes https://github.com/facebook/rocksdb/pull/2750 Differential Revision: D5648052 Pulled By: ajkr fbshipit-source-id: 7cd1ddd61364d5a55a10fdd293fa74b2bf89dd98	2017-08-16 21:57:20 -07:00
Maysam Yabandeh	eb6425303e	Update WritePrepared with the pseudo code Summary: Implement the main body of WritePrepared pseudo code. This includes PrepareInternal and CommitInternal, as well as AddCommitted which updates the commit map. It also provides a IsInSnapshot method that could be later called form the read path to decide if a version is in the read snapshot or it should other be skipped. This patch lacks unit tests and does not attempt to offer an efficient implementation. The idea is that to have the API specified so that we can work on related tasks in parallel. Closes https://github.com/facebook/rocksdb/pull/2713 Differential Revision: D5640021 Pulled By: maysamyabandeh fbshipit-source-id: bfa7a05e8d8498811fab714ce4b9c21530514e1c	2017-08-16 16:57:47 -07:00
Sagar Vemuri	132306fbf0	Remove PartialMerge implementation from Cassandra merge operator Summary: `PartialMergeMulti` implementation is enough for Cassandra, and `PartialMerge` is not required. Implementing both will just duplicate the code. As per https://github.com/facebook/rocksdb/blob/master/include/rocksdb/merge_operator.h#L130-L135 : ``` // The default implementation of PartialMergeMulti will use this function // as a helper, for backward compatibility. Any successor class of // MergeOperator should either implement PartialMerge or PartialMergeMulti, // although implementing PartialMergeMulti is suggested as it is in general // more effective to merge multiple operands at a time instead of two // operands at a time. ``` Closes https://github.com/facebook/rocksdb/pull/2737 Reviewed By: scv119 Differential Revision: D5633073 Pulled By: sagar0 fbshipit-source-id: ef4fa102c22fec6a0175ed12f5c44c15afe3c8ca	2017-08-15 14:59:34 -07:00
yiwu-arbug	e5a1b727c0	Fix blob DB transaction usage while GC Summary: While GC, blob DB use optimistic transaction to delete or replace the index entry in LSM, to guarantee correctness if there's a normal write writing to the same key. However, the previous implementation doesn't call SetSnapshot() nor use GetForUpdate() of transaction API, instead it do its own sequence number checking before beginning the transaction. A normal write can sneak in after the sequence number check and overwrite the key, and the GC will delete or relocate the old version of the key by mistake. Update the code to property use GetForUpdate() to check the existing index entry. After the patch the sequence number store with each blob record is useless, So I'm considering remove the sequence number from blob record, in another patch. Closes https://github.com/facebook/rocksdb/pull/2703 Differential Revision: D5589178 Pulled By: yiwu-arbug fbshipit-source-id: 8dc960cd5f4e61b36024ba7c32d05584ce149c24	2017-08-11 12:43:17 -07:00
Daniel Black	64f8484356	block_cache_tier: fix gcc-7 warnings Summary: Error was: utilities/persistent_cache/block_cache_tier.cc: In instantiation of ‘void rocksdb::Add(std::map<std::__cxx11::basic_string<char>, double>*, const string&, const T&) [with T = double; std::__cxx11::string = std::__cxx11::basic_string<char>]’: utilities/persistent_cache/block_cache_tier.cc:147:40: required from here utilities/persistent_cache/block_cache_tier.cc:141:23: error: type qualifiers ignored on cast result type [-Werror=ignored-qualifiers] stats->insert({key, static_cast<const double>(t)}); fixing like #2562 Closes https://github.com/facebook/rocksdb/pull/2603 Differential Revision: D5600910 Pulled By: yiwu-arbug fbshipit-source-id: 891a5ec7e451d2dec6ad1b6b7fac545657f87363	2017-08-10 11:58:53 -07:00
Maysam Yabandeh	bdc056f8aa	Refactor PessimisticTransaction Summary: This patch splits Commit and Prepare into lock-related logic and db-write-related logic. It moves lock-related logic to PessimisticTransaction to be reused by all children classes and movies the existing impl of db-write-related to PrepareInternal, CommitSingleInternal, and CommitInternal in WriteCommittedTxnImpl. Closes https://github.com/facebook/rocksdb/pull/2691 Differential Revision: D5569464 Pulled By: maysamyabandeh fbshipit-source-id: d1b8698e69801a4126c7bc211745d05c636f5325	2017-08-07 16:12:29 -07:00
Maysam Yabandeh	c9804e007a	Refactor TransactionDBImpl Summary: This opens space for the new implementations of TransactionDBImpl such as WritePreparedTxnDBImpl that has a different policy of how to write to DB. Closes https://github.com/facebook/rocksdb/pull/2689 Differential Revision: D5568918 Pulled By: maysamyabandeh fbshipit-source-id: f7eac866e175daf3793ae79da108f65cc7dc7b25	2017-08-05 17:26:15 -07:00
Yi Wu	0d4a2b7330	Avoid blob db call Sync() while writing Summary: The FsyncFiles background job call Fsync() periodically for blob files. However it can access WritableFileWriter concurrently with a Put() or Write(). And WritableFileWriter does not support concurrent access. It will lead to WritableFileWriter buffer being flush with same content twice, and blob file end up corrupted. Fixing by simply let FsyncFiles hold write_mutex_. Closes https://github.com/facebook/rocksdb/pull/2685 Differential Revision: D5561908 Pulled By: yiwu-arbug fbshipit-source-id: f0bb5bcab0e05694e053b8c49eab43640721e872	2017-08-04 13:12:07 -07:00
Yi Wu	92afe830f9	Update all blob db TTL and timestamps to uint64_t Summary: The current blob db implementation use mix of int32_t, uint32_t and uint64_t for TTL and expiration. Update all timestamps to uint64_t for consistency. Closes https://github.com/facebook/rocksdb/pull/2683 Differential Revision: D5557103 Pulled By: yiwu-arbug fbshipit-source-id: e4eab2691629a755e614e8cf1eed9c3a681d0c42	2017-08-03 17:57:30 -07:00
Yi Wu	0b814ba92d	Allow concurrent writes to blob db Summary: I'm going with brute-force solution, just letting Put() and Write() holding a mutex before writing. May improve concurrent writing with finer granularity locking later. Closes https://github.com/facebook/rocksdb/pull/2682 Differential Revision: D5552690 Pulled By: yiwu-arbug fbshipit-source-id: 039abd675b5d274a7af6428198d1733cafecef4c	2017-08-03 15:11:26 -07:00
Yi Wu	2c45ada4c4	Blob DB garbage collection should keep keys with newer version Summary: Fix the bug where if blob db garbage collection revmoe keys with newer version. It shouldn't delete the key from base db when sequence number in base db is not equal to the one in blob log. Closes https://github.com/facebook/rocksdb/pull/2678 Differential Revision: D5549752 Pulled By: yiwu-arbug fbshipit-source-id: abb8649260963b5c389748023970fd746279d227	2017-08-03 13:12:12 -07:00
Maysam Yabandeh	c3d5c4d38a	Refactor TransactionImpl Summary: This patch refactors TransactionImpl by separating the logic for pessimistic concurrency control from the implementation of how to write the data to rocksdb. The existing implementation is named WriteCommittedTxnImpl as it writes committed data to the db. A template named WritePreparedTxnImpl is also added which will be later completed to provide a an alternative implementation. Closes https://github.com/facebook/rocksdb/pull/2676 Differential Revision: D5549998 Pulled By: maysamyabandeh fbshipit-source-id: 16298e86b43ca4849324c1f35c731913c6d17bec	2017-08-03 08:57:22 -07:00
Yi Wu	1900771bd2	Dump Blob DB options to info log Summary: * Dump blob db options to info log * Remove BlobDBOptionsImpl to disallow dynamic cast BlobDBOptions into BlobDBOptionsImpl. Move options there to be constants or into BlobDBOptions. The dynamic cast is broken after #2645 * Change some of the default options * Remove blob_db_options.min_blob_size, which is unimplemented. Will implement it soon. Closes https://github.com/facebook/rocksdb/pull/2671 Differential Revision: D5529912 Pulled By: yiwu-arbug fbshipit-source-id: dcd58ca981db5bcc7f123b65a0d6f6ae0dc703c7	2017-08-01 13:01:47 -07:00
Siying Dong	21696ba502	Replace dynamic_cast<> Summary: Replace dynamic_cast<> so that users can choose to build with RTTI off, so that they can save several bytes per object, and get tiny more memory available. Some nontrivial changes: 1. Add Comparator::GetRootComparator() to get around the internal comparator hack 2. Add the two experiemental functions to DB 3. Add TableFactory::GetOptionString() to avoid unnecessary casting to get the option string 4. Since 3 is done, move the parsing option functions for table factory to table factory files too, to be symmetric. Closes https://github.com/facebook/rocksdb/pull/2645 Differential Revision: D5502723 Pulled By: siying fbshipit-source-id: fd13cec5601cf68a554d87bfcf056f2ffa5fbf7c	2017-07-28 16:27:16 -07:00
Yi Wu	aaf42fe775	Move blob_db/ttl_extractor.h into blob_db/blob_db.h Summary: Move blob_db/ttl_extractor.h into blob_db/blob_db.h Also exclude TTLExtractor from LITE build. Closes https://github.com/facebook/rocksdb/pull/2665 Differential Revision: D5520009 Pulled By: yiwu-arbug fbshipit-source-id: 4813dcc272c7cc4bf2cdac285256d9a17d78c7b7	2017-07-28 14:28:21 -07:00
Sagar Vemuri	aace46516b	Fix license headers in Cassandra related files Summary: I might have missed these while doing some recent cassandra code reviews. Closes https://github.com/facebook/rocksdb/pull/2663 Differential Revision: D5520138 Pulled By: sagar0 fbshipit-source-id: 340930afe9efe03c75f535a1da1f89bd3e53c1f9	2017-07-28 13:56:56 -07:00
Islam AbdelRahman	50a969131f	CacheActivityLogger, component to log cache activity into a file Summary: Simple component that will add a new entry in a log file every time we lookup/insert a key in SimCache. API: ``` SimCache::StartActivityLogging(<file_name>, <env>, <optional_max_size>) SimCache::StopActivityLogging() ``` Sending for review, Still need to add more comments. I was thinking about a better approach, but I ended up deciding I will use a mutex to sync the writes to the file, since this feature should not be heavily used and only used to collect info that will be analyzed offline. I think it's okay to hold the mutex every time we lookup/add to the SimCache. Closes https://github.com/facebook/rocksdb/pull/2295 Differential Revision: D5063826 Pulled By: IslamAbdelRahman fbshipit-source-id: f3b5daed8b201987c9a071146ddd5c5740a2dd8c	2017-07-28 12:36:48 -07:00
Yi Wu	6083bc79f8	Blob DB TTL extractor Summary: Introducing blob_db::TTLExtractor to replace extract_ttl_fn. The TTL extractor can be use to extract TTL from keys insert with Put or WriteBatch. Change over existing extract_ttl_fn are: * If value is changed, it will be return via std::string* (rather than Slice). With Slice the new value has to be part of the existing value. With std::string* the limitation is removed. * It can optionally return TTL or expiration. Other changes in this PR: * replace `std::chrono::system_clock` with `Env::NowMicros` so that I can mock time in tests. * add several TTL tests. * other minor naming change. Closes https://github.com/facebook/rocksdb/pull/2659 Differential Revision: D5512627 Pulled By: yiwu-arbug fbshipit-source-id: 0dfcb00d74d060b8534c6130c808e4d5d0a54440	2017-07-27 23:26:04 -07:00
Maysam Yabandeh	2b259c9d49	Lower num of iterations in DeadlockCycle test Summary: Currently this test times out with tsan. This is likely due to decreased speed with tsan. By lowering the number of iterations we can still catch a bug as the test is run regularly and multiple runs of the test is equivalent with running the test with more iterations. Closes https://github.com/facebook/rocksdb/pull/2639 Differential Revision: D5490549 Pulled By: maysamyabandeh fbshipit-source-id: bd69c42a9728d337ac95a06a401088384e51731a	2017-07-25 11:42:26 -07:00

1 2 3 4 5 ...

760 Commits