rocksdb

Author	SHA1	Message	Date
Yi Wu	2581c0a5a1	Blob DB: Fix BlobDBTest::SnapshotAndGarbageCollection asan failure Summary: Fix unreleased snapshot at the end of the test. Closes https://github.com/facebook/rocksdb/pull/3126 Differential Revision: D6232867 Pulled By: yiwu-arbug fbshipit-source-id: 651ca3144fc573ea2ab0ab20f0a752fb4a101d26	2017-11-03 10:26:59 -07:00
Yi Wu	62578d80c1	Blob DB: Add compaction filter to remove expired blob index entries Summary: After adding expiration to blob index in #3066, we are now able to add a compaction filter to cleanup expired blob index entries. Closes https://github.com/facebook/rocksdb/pull/3090 Differential Revision: D6183812 Pulled By: yiwu-arbug fbshipit-source-id: 9cb03267a9702975290e758c9c176a2c03530b83	2017-11-02 17:27:38 -07:00
Yi Wu	7bfa88037e	Blob DB: fix snapshot handling Summary: Blob db keeps a blob file if data in the file is visible to an active snapshot. Before this patch it checked whether there is an active snapshot with a sequence number greater than the earliest sequence in the file. This is problematic since we take a snapshot on every read: if reads keep coming, old blob files will never be cleaned up. Change to check whether there is an active snapshot that falls in the range [earliest_sequence, obsolete_sequence), where obsolete_sequence is: 1. if data is relocated to another file by garbage collection, the latest sequence at the time garbage collection finishes; 2. otherwise, the latest sequence of the file. Closes https://github.com/facebook/rocksdb/pull/3087 Differential Revision: D6182519 Pulled By: yiwu-arbug fbshipit-source-id: cdf4c35281f782eb2a9ad6a87b6727bbdff27a45	2017-11-02 15:58:27 -07:00
Yi Wu	f662f8f0b6	Blob DB: option to enable garbage collection Summary: Add an option to enable/disable auto garbage collection, where we keep counting how many keys have been evicted by either deletion or compaction and decide whether to garbage collect a blob file. Default disable auto garbage collection for now since the whole logic is not fully tested and we plan to make major change to it. Closes https://github.com/facebook/rocksdb/pull/3117 Differential Revision: D6224756 Pulled By: yiwu-arbug fbshipit-source-id: cdf53bdccec96a4580a2b3a342110ad9e8864dfe	2017-11-02 15:58:27 -07:00
Yi Wu	167ba599ec	Blob DB: Fix flaky BlobDBTest::GCExpiredKeyWhileOverwriting test Summary: The test intends to wait until the key is overwritten before proceeding with garbage collection. It failed to wait for `PutUntil` to finish. Fixing it. Closes https://github.com/facebook/rocksdb/pull/3116 Differential Revision: D6222833 Pulled By: yiwu-arbug fbshipit-source-id: fa9b57a772b92a66cf250b44e7975c43f62f45c5	2017-11-02 13:27:34 -07:00
Sagar Vemuri	25ac1697b4	Blob DB: Evict oldest blob file when close to blob db size limit Summary: Evict the oldest blob file and put it in the obsolete_files list when close to the blob db size limit. The file will be deleted the next time the `DeleteObsoleteFiles` background job runs. For now I set the `kEvictOldestFileAtSize` constant, which controls when to evict the oldest file, at 90%. It could be tweaked or made into an option if really needed; I didn't want to expose it as an option prematurely as there are already too many :) . Closes https://github.com/facebook/rocksdb/pull/3094 Differential Revision: D6187340 Pulled By: sagar0 fbshipit-source-id: 687f8262101b9301bf964b94025a2fe9d8573421	2017-11-02 12:11:21 -07:00
Maysam Yabandeh	60d83df23d	WritePrepared Txn: Move DB class to its own file Summary: Move WritePreparedTxnDB from pessimistic_transaction_db.h to its own header, write_prepared_txn_db.h Closes https://github.com/facebook/rocksdb/pull/3114 Differential Revision: D6220987 Pulled By: maysamyabandeh fbshipit-source-id: 18893fb4fdc6b809fe117dabb544080f9b4a301b	2017-11-02 11:14:30 -07:00
Maysam Yabandeh	02693f64fc	WritePrepared Txn: ValidateSnapshot Summary: Implements ValidateSnapshot for WritePrepared txns and also adds a unit test to clarify the contract of this function. Closes https://github.com/facebook/rocksdb/pull/3101 Differential Revision: D6199405 Pulled By: maysamyabandeh fbshipit-source-id: ace509934c307ea5d26f4bbac5f836d7c80fd240	2017-11-01 19:11:09 -07:00
Maysam Yabandeh	17731a43a6	WritePrepared Txn: Optimize for recoverable state Summary: GetCommitTimeWriteBatch is currently used to store some state as part of commit in 2PC. In MyRocks it is specifically used to store some data that would be needed only during recovery. So it does not need to be stored in memtable right after each commit. This patch enables an optimization to write the GetCommitTimeWriteBatch only to the WAL. The batch will be written to memtable during recovery when the WAL is replayed. To cover the case when WAL is deleted after memtable flush, the batch is also buffered and written to memtable right before each memtable flush. Closes https://github.com/facebook/rocksdb/pull/3071 Differential Revision: D6148023 Pulled By: maysamyabandeh fbshipit-source-id: 2d09bae5565abe2017c0327421010d5c0d55eaa7	2017-11-01 17:26:46 -07:00
Maysam Yabandeh	c1cf94c787	WritePrepared Txn: sort indexes before batch collapse Summary: The collapse of duplicate keys in write batch needs to sort the indexes of duplicate keys since it only checks the index in the batch with the head of the list of duplicate keys. Closes https://github.com/facebook/rocksdb/pull/3093 Differential Revision: D6186800 Pulled By: maysamyabandeh fbshipit-source-id: abc9ae8c2f1840445a5584f925cf86ecc6f37154	2017-11-01 08:56:57 -07:00
Yi Wu	f6082d1944	Blob DB: cleanup unused options Summary: * cleanup num_concurrent_simple_blobs. We don't do concurrent writes (by taking write_mutex_) so it doesn't make sense to have multiple non TTL files open. We can revisit later when we want to improve writes. * cleanup eviction callback. we don't have plan to use it now. * rename s/open_simple_blob_files_/open_non_ttl_file_/ and s/open_blob_files_/open_ttl_files_/ to avoid confusion. Closes https://github.com/facebook/rocksdb/pull/3088 Differential Revision: D6182598 Pulled By: yiwu-arbug fbshipit-source-id: 99e6f5e01fa66d31309cdb06ce48502464bac6ad	2017-10-31 16:42:08 -07:00
Sagar Vemuri	f5078dde2d	Blob DB: Initialize all fields in Blob Header, Footer and Record structs Summary: Fixing uninitialized fields caught by valgrind. Closes https://github.com/facebook/rocksdb/pull/3103 Differential Revision: D6200195 Pulled By: sagar0 fbshipit-source-id: bf35a3fb03eb1d308e4c5ce30dee1e345d7b03b3	2017-10-31 16:42:08 -07:00
Yi Wu	3ebb7ba7b9	Blob DB: update blob file format Summary: Changing blob file format and some code cleanup around the change. The changes to the blob log format are: * Remove timestamp field in blob file header, blob file footer and blob records. The field is not being used and is often confused with the expiration field. * Blob file header now comes with column family id, which always equals the default column family id. It leaves room for future support of column families. * Compression field in blob file header is now a standalone byte (instead of compactly encoded with the flags field) * Blob file footer now comes with its own crc. * Key length is now uint64_t instead of uint32_t * Blob CRC now checksums both key and value (instead of value only). * Some reordering of the fields. The list of cleanups: * Better inline comments in blob_log_format.h * rename ttlrange_t and snrange_t to ExpirationRange and SequenceRange respectively. * simplify blob_db::Reader * Move crc checking logic to inside blob_log_format.cc Closes https://github.com/facebook/rocksdb/pull/3081 Differential Revision: D6171304 Pulled By: yiwu-arbug fbshipit-source-id: e4373e0d39264441b7e2fbd0caba93ddd99ea2af	2017-10-27 13:27:12 -07:00
Yi Wu	5a2a6483dc	Blob DB: Inline small values in base DB Summary: Adding the `min_blob_size` option to allow storing small values in base db (in LSM tree) together with the key. The goal is to improve performance for small values, while taking advantage of blob db's low write amplification for large values. Also adding expiration timestamp to blob index. It will be useful to evict stale blob indexes in base db by adding a compaction filter. I'll work on the compaction filter in future patches. See blob_index.h for the new blob index format. There are 4 cases when writing a new key: * small value w/o TTL: put in base db as normal value (i.e. ValueType::kTypeValue) * small value w/ TTL: put (type, expiration, value) to base db. * large value w/o TTL: write value to blob log and put (type, file, offset, size, compression) to base db. * large value w/TTL: write value to blob log and put (type, expiration, file, offset, size, compression) to base db. Closes https://github.com/facebook/rocksdb/pull/3066 Differential Revision: D6142115 Pulled By: yiwu-arbug fbshipit-source-id: 9526e76e19f0839310a3f5f2a43772a4ad182cd0	2017-10-26 12:30:54 -07:00
Sagar Vemuri	96e3a600ba	Return write error on reaching blob dir size limit Summary: I found that we continue accepting writes even when the blob db goes beyond the configured blob directory size limit. Now, we return an error for writes on reaching `blob_dir_size` limit and if `is_fifo` is set to false. (We cannot just drop any file when `is_fifo` is true.) Deleting the oldest file when `is_fifo` is true will be handled in a later PR. Closes https://github.com/facebook/rocksdb/pull/3060 Differential Revision: D6136156 Pulled By: sagar0 fbshipit-source-id: 2f11cb3f2eedfa94524fbfa2613dd64bfad7a23c	2017-10-25 16:30:37 -07:00
zach shipko	386a57e6ef	Fix build on OpenBSD Summary: A few simple changes to allow RocksDB to be built on OpenBSD. Let me know if any further changes are needed. Closes https://github.com/facebook/rocksdb/pull/3061 Differential Revision: D6138800 Pulled By: ajkr fbshipit-source-id: a13a17b5dc051e6518bd56a8c5efd1d24dd81b0c	2017-10-24 13:27:38 -07:00
Yi Wu	66a2c44ef4	Add DB::Properties::kEstimateOldestKeyTime Summary: With FIFO compaction we would like to get the oldest data time for monitoring. The problem is we don't have timestamp for each key in the DB. As an approximation, we expose the earliest of sst file "creation_time" property. My plan is to override the property with a more accurate value with blob db, where we actually have timestamp. Closes https://github.com/facebook/rocksdb/pull/2842 Differential Revision: D5770600 Pulled By: yiwu-arbug fbshipit-source-id: 03833c8f10bbfbee62f8ea5c0d03c0cafb5d853a	2017-10-23 15:27:27 -07:00
Dmitri Smirnov	d2a65c59e1	Fix unused var warnings in Release mode Summary: MSVC does not support unused attribute at this time. A separate assignment line fixes the issue probably by being counted as usage for MSVC and it no longer complains about unused var. Closes https://github.com/facebook/rocksdb/pull/3048 Differential Revision: D6126272 Pulled By: maysamyabandeh fbshipit-source-id: 4907865db45fd75a39a15725c0695aaa17509c1f	2017-10-23 14:27:04 -07:00
Maysam Yabandeh	63822eb761	Enable two write queues for transactions Summary: Enable concurrent_prepare flag for WritePrepared transactions and extend the existing transaction tests with this config. Closes https://github.com/facebook/rocksdb/pull/3046 Differential Revision: D6106534 Pulled By: maysamyabandeh fbshipit-source-id: 88c8d21d45bc492beb0a131caea84a2ac5e7d38c	2017-10-23 14:27:04 -07:00
Dmitri Smirnov	ebab2e2d42	Enable MSVC W4 with a few exceptions. Fix warnings and bugs Summary: Closes https://github.com/facebook/rocksdb/pull/3018 Differential Revision: D6079011 Pulled By: yiwu-arbug fbshipit-source-id: 988a721e7e7617967859dba71d660fc69f4dff57	2017-10-19 10:57:12 -07:00
Maysam Yabandeh	7e38238981	WritePrepared Txn: Disable GC during recovery Summary: Disables GC during recovery of a WritePrepared txn db to avoid GCing uncommitted key values. Closes https://github.com/facebook/rocksdb/pull/2980 Differential Revision: D6000191 Pulled By: maysamyabandeh fbshipit-source-id: fc4d522c643d24ebf043f811fe4ecd0dd0294675	2017-10-18 09:11:50 -07:00
Yi Wu	eaaef91178	Blob DB: Store blob index as kTypeBlobIndex in base db Summary: Blob db inserts blob index to base db as kTypeBlobIndex type, to tell apart values written by plain rocksdb or blob db. This is to make it possible to migrate from existing rocksdb to blob db. Also with the patch blob db garbage collection moves away from OptimisticTransaction. Instead it uses a custom write callback to achieve similar behavior as OptimisticTransaction. This is because we need to pass the is_blob_index flag to DBImpl::Get but OptimisticTransaction doesn't support it. Closes https://github.com/facebook/rocksdb/pull/3000 Differential Revision: D6050044 Pulled By: yiwu-arbug fbshipit-source-id: 61dc72ab9977625e75f78cd968e7d8a3976e3632	2017-10-17 17:28:11 -07:00
Yi Wu	0552029b5c	Blob DB: not writing sequence number as blob record footer Summary: Previously each time we write a blob we write blog_record_header + key + value + blob_record_footer to blob log. The footer only contains a sequence and a crc for the sequence number. The sequence number was used in garbage collection to verify the value is recent. After #2703 we moved to use optimistic transaction and no longer use sequence number from the footer. Remove the footer altogether. There's another usage of sequence number and we are keeping it: Each blob log file keeps track of the sequence number range of keys in it, and uses it to check if it is referenced by a snapshot, before being deleted. Closes https://github.com/facebook/rocksdb/pull/3005 Differential Revision: D6057585 Pulled By: yiwu-arbug fbshipit-source-id: d6da53c457a316e9723f359a1b47facfc3ffe090	2017-10-17 12:13:08 -07:00
Yi Wu	10ba50e9eb	Blob DB: Move BlobFile definition to a separate file Summary: simply move BlobFile definition from blob_db_impl.h to blob_file.h. Closes https://github.com/facebook/rocksdb/pull/3002 Differential Revision: D6050143 Pulled By: yiwu-arbug fbshipit-source-id: a8fb6e094fe39bdeace6279569834bc65aa64a34	2017-10-13 14:42:26 -07:00
Zhongyi Xie	e2548366e1	add GetLiveFiles and GetLiveFilesMetaData for BlobDB Summary: Closes https://github.com/facebook/rocksdb/pull/2976 Differential Revision: D5994759 Pulled By: miasantreble fbshipit-source-id: 985c31dccb957cb970c302f813cd07a1e8cb6438	2017-10-09 19:56:04 -07:00
Yi Wu	8c392a31d7	WritePrepared Txn: Iterator Summary: On iterator create, take a snapshot, create a ReadCallback and pass the ReadCallback to the underlying DBIter to check if key is committed. Closes https://github.com/facebook/rocksdb/pull/2981 Differential Revision: D6001471 Pulled By: yiwu-arbug fbshipit-source-id: 3565c4cdaf25370ba47008b0e0cb65b31dfe79fe	2017-10-09 17:15:28 -07:00
Yi Wu	17c6325e8a	WritePrepare Txn: Cancel flush/compaction before destruction Summary: On WritePreparedTxnDB destruct there could be running compaction/flush holding a SnapshotChecker, which holds a pointer back to WritePreparedTxnDB. Make sure those jobs finished before destructing WritePreparedTxnDB. This is caught by TransactionTest::SeqAdvanceTest. Closes https://github.com/facebook/rocksdb/pull/2982 Differential Revision: D6002957 Pulled By: yiwu-arbug fbshipit-source-id: f1e70390c9798d1bd7959f5c8e2a1c14100773c3	2017-10-06 20:55:53 -07:00
Maysam Yabandeh	ec6c5383d0	WritePrepared Txn: end-to-end tests Summary: Enable WritePrepared policy for existing transaction tests. Closes https://github.com/facebook/rocksdb/pull/2972 Differential Revision: D5993614 Pulled By: maysamyabandeh fbshipit-source-id: d1eb53e2920c4e2a56434bb001231c98426f3509	2017-10-06 14:26:45 -07:00
Sagar Vemuri	da29eba43b	Enable WAL for blob index Summary: Enabled WAL, during GC, for blob index which is stored on regular RocksDB. Closes https://github.com/facebook/rocksdb/pull/2975 Differential Revision: D5997384 Pulled By: sagar0 fbshipit-source-id: b76c1487d8b5be0e36c55e8d77ffe3d37d63d85b	2017-10-06 10:59:31 -07:00
Yi Wu	d1b74b0c82	WritePrepared Txn: Compaction/Flush Summary: Update Compaction/Flush to support WritePreparedTxnDB: Add SnapshotChecker which is a proxy to query WritePreparedTxnDB::IsInSnapshot. Pass SnapshotChecker to DBImpl on WritePreparedTxnDB open. CompactionIterator use it to check if a key has been committed and if it is visible to a snapshot. In CompactionIterator: * check if key has been committed. If not, output uncommitted keys AS-IS. * use SnapshotChecker to check if key is visible to a snapshot when in need. * do not output key with seq = 0 if the key is not committed. Closes https://github.com/facebook/rocksdb/pull/2926 Differential Revision: D5902907 Pulled By: yiwu-arbug fbshipit-source-id: 945e037fdf0aa652dc5ba0ad879461040baa0320	2017-10-06 10:41:53 -07:00
Yi Wu	cc20ec3689	WritePrepared Txn: Test sequence number 0 is visible Summary: Compaction will output keys with sequence number 0, if it is visible to earliest snapshot. Adding a test to make sure IsInSnapshot() report sequence number 0 is visible to any snapshot. Closes https://github.com/facebook/rocksdb/pull/2974 Differential Revision: D5990665 Pulled By: yiwu-arbug fbshipit-source-id: ef50ebc777ff8ca688771f3ab598c7a609b0b65e	2017-10-05 16:26:44 -07:00
Maysam Yabandeh	4e3c3d8c6a	WritePrepared Txn: duplicate keys Summary: With WriteCommitted, when the write batch has duplicate keys, the txn db simply inserts them to the db with different seq numbers and lets the db ignore/merge the duplicate values at the read time. With WritePrepared all the entries of the batch are inserted with the same seq number which prevents us from benefiting from this simple solution. This patch applies a hackish solution to unblock the end-to-end testing. The hack is to be replaced with a proper solution soon. The patch simply detects the duplicate key insertions, and marks the previous one as obsolete. Then before writing to the db it rewrites the batch eliminating the obsolete keys. This would incur a memcpy cost. Furthermore handling duplicate merge would require doing a FullMerge instead of simply ignoring the previous value, which is not handled by this patch. Closes https://github.com/facebook/rocksdb/pull/2969 Differential Revision: D5976337 Pulled By: maysamyabandeh fbshipit-source-id: 114e65b66f137d8454ff2d1d782b8c05da95f989	2017-10-05 07:41:02 -07:00
Maysam Yabandeh	283d60761e	fix valgrind leak report in unit test Summary: I cannot locally reproduce the valgrind leak report but based on my code inspection not deleting txn1 might be the reason. ``` ==197848== 2,990 (544 direct, 2,446 indirect) bytes in 1 blocks are definitely lost in loss record 15 of 16 ==197848== at 0x4C2D06F: operator new(unsigned long) (in /usr/local/fbcode/gcc-5-glibc-2.23/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==197848== by 0x7D5B31: rocksdb::WritePreparedTxnDB::BeginTransaction(rocksdb::WriteOptions const&, rocksdb::TransactionOptions const&, rocksdb::Transaction) (pessimistic_transaction_db.cc:173) ==197848== by 0x7D80C1: rocksdb::PessimisticTransactionDB::Initialize(std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<rocksdb::ColumnFamilyHandle, std::allocator<rocksdb::ColumnFamilyHandle> > const&) (pessimistic_transaction_db.cc:115) ==197848== by 0x7DC42F: rocksdb::WritePreparedTxnDB::Initialize(std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<rocksdb::ColumnFamilyHandle, std::allocator<rocksdb::ColumnFamilyHandle> > const&) (pessimistic_transaction_db.cc:151) ==197848== by 0x7D8CA0: rocksdb::TransactionDB::WrapDB(rocksdb::DB, rocksdb::TransactionDBOptions const&, std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<rocksdb::ColumnFamilyHandle, std::allocator<rocksdb::ColumnFamilyHandle> > const&, rocksdb::TransactionDB*) (pessimistic_transaction_db.cc:275) ==197848== by 0x7D9F26: rocksdb::TransactionDB::Open(rocksdb::DBOptions const&, rocksdb::TransactionDBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle, std::allocator<rocksdb::ColumnFamilyHandle> >, rocksdb::TransactionDB) (pessimistic_transaction_db.cc:227) ==197848== by 0x7DB349: rocksdb::TransactionDB::Open(rocksdb::Options const&, rocksdb::TransactionDBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::TransactionDB) (pessimistic_transaction_db.cc:198) ==197848== by 0x52ABD2: rocksdb::TransactionTest::ReOpenNoDelete() (transaction_test.h:87) ==197848== by 0x51F7B8: rocksdb::WritePreparedTransactionTest_BasicRecoveryTest_Test::TestBody() (write_prepared_transaction_test.cc:843) ==197848== by 0x857557: HandleSehExceptionsInMethodIfSupported<testing::Test, void> (gtest-all.cc:3824) ==197848== by 0x857557: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test, void (testing::Test::)(), char const*) (gtest-all.cc:3860) ==197848== by 0x84E7EB: testing::Test::Run() [clone .part.485] (gtest-all.cc:3897) ==197848== by 0x84E9BC: Run (gtest-all.cc:3888) ==197848== by 0x84E9BC: testing::TestInfo::Run() [clone .part.486] (gtest-all.cc:4072) ``` Closes https://github.com/facebook/rocksdb/pull/2963 Differential Revision: D5968856 Pulled By: maysamyabandeh fbshipit-source-id: 2ac512bbcad37dc8eeeffe4f363978913354180c	2017-10-03 14:58:07 -07:00
Maysam Yabandeh	d27258d3a6	WritePrepared Txn: Rollback Summary: Implement the rollback of WritePrepared txns. For each modified value, it reads the value before the txn and write it back. This would cancel out the effect of transaction. It also remove the rolled back txn from prepared heap. Closes https://github.com/facebook/rocksdb/pull/2946 Differential Revision: D5937575 Pulled By: maysamyabandeh fbshipit-source-id: a6d3c47f44db3729f44b287a80f97d08dc4e888d	2017-10-02 19:59:27 -07:00
Sagar Vemuri	bb38cd03a9	Limit number of merge operands in Cassandra merge operator Summary: Now that RocksDB supports conditional merging during point lookups (introduced in #2923), Cassandra value merge operator can be updated to pass in a limit. The limit needs to be passed in from the Cassandra code. Closes https://github.com/facebook/rocksdb/pull/2947 Differential Revision: D5938454 Pulled By: sagar0 fbshipit-source-id: d64a72d53170d8cf202b53bd648475c3952f7d7f	2017-10-02 16:11:40 -07:00
Andrew Kryczka	5df172da2f	fix deletion-triggered compaction in table builder Summary: It was broken when `NotifyCollectTableCollectorsOnFinish` was introduced. That function called `Finish` on each of the `TablePropertiesCollector`s, and `CompactOnDeletionCollector::Finish()` was resetting all its internal state. Then, when we checked whether compaction is necessary, the flag had already been cleared. Fixed above issue by avoiding resetting internal state during `Finish()`. Multiple calls to `Finish()` are allowed, but callers cannot invoke `AddUserKey()` on the collector after any finishes. Closes https://github.com/facebook/rocksdb/pull/2936 Differential Revision: D5918659 Pulled By: ajkr fbshipit-source-id: 4f05e9d80e50ee762ba1e611d8d22620029dca6b	2017-09-28 18:17:30 -07:00
Maysam Yabandeh	385049baf2	WritePrepared Txn: Recovery Summary: Recover txns from the WAL. Also added some unit tests. Closes https://github.com/facebook/rocksdb/pull/2901 Differential Revision: D5859596 Pulled By: maysamyabandeh fbshipit-source-id: 6424967b231388093b4effffe0a3b1b7ec8caeb0	2017-09-28 16:56:45 -07:00
Quinn Jarrell	6a541afcc4	Make bytes_per_sync and wal_bytes_per_sync mutable Summary: SUMMARY Moves the bytes_per_sync and wal_bytes_per_sync options from immutableoptions to mutable options. Also if wal_bytes_per_sync is changed, the wal file and memtables are flushed. TEST PLAN ran make check all passed Two new tests SetBytesPerSync, SetWalBytesPerSync check that after issuing setoptions with a new value for the var, the db options have the new value. Closes https://github.com/facebook/rocksdb/pull/2893 Reviewed By: yiwu-arbug Differential Revision: D5845814 Pulled By: TheRushingWookie fbshipit-source-id: 93b52d779ce623691b546679dcd984a06d2ad1bd	2017-09-27 17:49:45 -07:00
Yi Wu	ec48e5c77f	Add TransactionDB::SingleDelete() Summary: Looks like the API is simply missing. Adding it. Closes https://github.com/facebook/rocksdb/pull/2937 Differential Revision: D5919955 Pulled By: yiwu-arbug fbshipit-source-id: 6e2e9c96c29882b0bb4113d1f8efb72bffc57878	2017-09-27 10:27:26 -07:00
Yi Wu	be97dbb15c	Fix WritePreparedTransactionTest::SeqAdvanceTest ASAN failure Summary: Closes https://github.com/facebook/rocksdb/pull/2922 Differential Revision: D5895310 Pulled By: yiwu-arbug fbshipit-source-id: 52c635a25d22478ec1eca49b6817551202babac2	2017-09-22 15:26:42 -07:00
Yi Wu	1480e6f7cf	Fix TransactionTest::SeqAdvanceTest ASAN failure Summary: The test didn't delete txn before creating a new one. Closes https://github.com/facebook/rocksdb/pull/2913 Differential Revision: D5880236 Pulled By: yiwu-arbug fbshipit-source-id: 7a4fcaada3d86332292754502cd8f4341143bf4f	2017-09-21 09:56:54 -07:00
Pengchao Wang	e4234fbdcf	collecting kValue type tombstone Summary: In our testing cluster, we found a large number of tombstones had been promoted to kValue type from kMerge after reaching the top level of compaction. Since we used to collect tombstones only in the merge operator, those tombstones could never be collected. This PR addresses the issue by adding a GC step in the compaction filter, which is only for kValue type records. Since those records have already reached the top of compaction (no earlier data exists) we can safely remove them in the compaction filter without worrying about old data reappearing. This PR also removes an old optimization in cassandra merge operator for single merge operands. We need to do GC even on a single operand, so the optimization does not make sense anymore. Closes https://github.com/facebook/rocksdb/pull/2855 Reviewed By: sagar0 Differential Revision: D5806445 Pulled By: wpc fbshipit-source-id: 6eb25629d4ce917eb5e8b489f64a6aa78c7d270b	2017-09-18 16:27:12 -07:00
Maysam Yabandeh	60beefd6e0	WritePrepared Txn: Advance seq one per batch Summary: By default the seq number in DB is increased once per written key. WritePrepared txns requires the seq to be increased once per the entire batch so that the seq would be used as the prepare timestamp by which the transaction is identified. Also we need to increase seq for the commit marker since it would give a unique id to the commit timestamp of transactions. Two unit tests are added to verify our understanding of how the seq should be increased. The recovery path requires much more work and is left to another patch. Closes https://github.com/facebook/rocksdb/pull/2885 Differential Revision: D5837843 Pulled By: maysamyabandeh fbshipit-source-id: a08960b93d727e1cf438c254d0c2636fb133cc1c	2017-09-18 14:45:08 -07:00
Yi Wu	9a970c81af	Fix WriteBatchWithIndex::GetFromBatchAndDB not allowing StackableDB Summary: Closes https://github.com/facebook/rocksdb/pull/2881 Differential Revision: D5829682 Pulled By: yiwu-arbug fbshipit-source-id: abb8fa14b58cea7c416282f9be19e8b1a7961c6e	2017-09-13 17:26:35 -07:00
Maysam Yabandeh	09713a64b3	WritePrepared Txn: Lock-free CommitMap Summary: We had two proposals for lock-free commit maps. This patch implements the latter one that was simpler. We can later experiment with both proposals. In this impl each entry is an std::atomic of uint64_t, which are accessed via memory_order_acquire/release. In x86_64 arch this is compiled to simple reads and writes from memory. Closes https://github.com/facebook/rocksdb/pull/2861 Differential Revision: D5800724 Pulled By: maysamyabandeh fbshipit-source-id: 41abae9a4a5df050a8eb696c43de11c2770afdda	2017-09-13 12:12:11 -07:00
Amy Xu	5785b1fcb8	Fix naming in InternalKey Summary: - Switched all instances of SetMinPossibleForUserKey and SetMaxPossibleForUserKey in accordance to InternalKeyComparator's comparison logic Closes https://github.com/facebook/rocksdb/pull/2868 Differential Revision: D5804152 Pulled By: axxufb fbshipit-source-id: 80be35e04f2e8abc35cc64abe1fecb03af24e183	2017-09-12 17:17:42 -07:00
Andrew Kryczka	f5148ade10	support opening zero backups during engine init Summary: There are internal users who open BackupEngine for writing new backups only, and they don't care whether old backups can be read or not. The condition `BackupableDBOptions::max_valid_backups_to_open == 0` should be supported (previously in `df74b775e6` I made the mistake of choosing 0 as a special value to disable the limit). Closes https://github.com/facebook/rocksdb/pull/2819 Differential Revision: D5751599 Pulled By: ajkr fbshipit-source-id: e73ac19eb5d756d6b68601eae8e43407ee4f2752	2017-09-12 13:26:34 -07:00
Siying Dong	64b6452e0c	Make InternalKeyComparator final and directly use it in merging iterator Summary: Merging iterator invokes InternalKeyComparator.Compare() frequently to heap merge. By making InternalKeyComparator final and merging iterator to directly use InternalKeyComparator rather than through Iterator interface, we can give compiler a choice to avoid one more virtual function call if possible. I ran readseq benchmark in memory-only use case to make sure the performance at least doesn't regress. I have to disable the final key word in debug build, as a hack test class depends on overriding the class. Closes https://github.com/facebook/rocksdb/pull/2860 Differential Revision: D5800461 Pulled By: siying fbshipit-source-id: ab876f22a09bb5c560740911412336e0e25ccb53	2017-09-11 12:04:21 -07:00
Maysam Yabandeh	f46464d383	write-prepared txn: call IsInSnapshot Summary: This patch instruments the read path to verify each read value against an optional ReadCallback class. If the value is rejected, the reader moves on to the next value. The WritePreparedTxn makes use of this feature to skip sequence numbers that are not in the read snapshot. Closes https://github.com/facebook/rocksdb/pull/2850 Differential Revision: D5787375 Pulled By: maysamyabandeh fbshipit-source-id: 49d808b3062ab35e7ae98ad388f659757794184c	2017-09-11 09:14:48 -07:00
Maysam Yabandeh	9a4df72994	WritePrepared Txn: CommitBatch Summary: Implements CommitBatch and CommitWithoutPrepare for WritePreparedTxn Closes https://github.com/facebook/rocksdb/pull/2854 Differential Revision: D5793999 Pulled By: maysamyabandeh fbshipit-source-id: d8b9858221162c6ac7a1f6912cbd3481d0d8a503	2017-09-08 15:56:39 -07:00
Maysam Yabandeh	fce6c892ab	Advance max evicted seq in coarser granularity Summary: This patch advances max_evicted_seq_ in larger granularities to reduce the overhead of updating the relevant data structures. It also refactors the related code and adds testing for it. As part of this patch some of the TODOs for removing usage of non-static const members are also addressed. Closes https://github.com/facebook/rocksdb/pull/2844 Differential Revision: D5772928 Pulled By: maysamyabandeh fbshipit-source-id: f4fcc2948be69c034f10812cf922ce5ab82ef98c	2017-09-08 14:41:22 -07:00
Yi Wu	dcd36a6aee	Make it explicit blob db doesn't support CF Summary: Blob db doesn't currently support column families. Return NotSupported status explicitly. Closes https://github.com/facebook/rocksdb/pull/2825 Differential Revision: D5757438 Pulled By: yiwu-arbug fbshipit-source-id: 44de9408fd032c98e8ae337d4db4ed37169bd9fa	2017-09-08 11:11:04 -07:00
Siying Dong	0e99323ac2	Fix CLANG Analyze Summary: clang analyze shows warnings after we upgrade the CLANG version. Fix them. Closes https://github.com/facebook/rocksdb/pull/2839 Differential Revision: D5769060 Pulled By: siying fbshipit-source-id: 3f8e4df715590d8984f6564b608fa08cfdfa5f14	2017-09-07 14:28:06 -07:00
Maysam Yabandeh	7e19a571e9	Remove unused TransactionCallback Summary: TransactionCallback was never used. Remove it to avoid confusion. Closes https://github.com/facebook/rocksdb/pull/2853 Differential Revision: D5787219 Pulled By: maysamyabandeh fbshipit-source-id: e2b6a89537e3770a269ad38be71c4b0b160a88ac	2017-09-07 12:17:18 -07:00
Maysam Yabandeh	79810e2d49	Skip write_prepared_transaction_test in travis Summary: The patch skips write_prepared_transaction_test from travis as they time out there. They are still covered in daily runs of tests. Closes https://github.com/facebook/rocksdb/pull/2836 Differential Revision: D5767203 Pulled By: maysamyabandeh fbshipit-source-id: 51045ef98a745197136e14b2ec02fc6f38081b75	2017-09-05 15:29:52 -07:00
Yi Wu	ab95e293d2	Fix memory leak on blob db open Summary: Fixes #2820 Closes https://github.com/facebook/rocksdb/pull/2826 Differential Revision: D5757527 Pulled By: yiwu-arbug fbshipit-source-id: f495b63700495aeaade30a1da5e3675848f3d72f	2017-09-01 14:13:51 -07:00
Maysam Yabandeh	37ae8cc60f	Signal progress of the test to avoid timeout Summary: Closes https://github.com/facebook/rocksdb/pull/2824 Differential Revision: D5756457 Pulled By: maysamyabandeh fbshipit-source-id: dff53e945d8ac4ffe6775a2176424fd1a27fc189	2017-09-01 11:28:52 -07:00
Dmitri Smirnov	0ec90a7cc2	Add -DPORTABLE=1 to MSVC CI build Summary: Add -DPORTABLE=1. port::cacheline_aligned_alloc() has its arguments swapped, which prevents every single test from running. Closes https://github.com/facebook/rocksdb/pull/2815 Differential Revision: D5751661 Pulled By: siying fbshipit-source-id: e0857d6e138ec46035b3c23d7c3c751901a0a4a0	2017-08-31 16:42:48 -07:00
Andrew Kryczka	b97685aef6	fix backup engine when latest backup corrupt Summary: Backup engine is intentionally openable even when some backups are corrupt. Previously the engine could write new backups as long as the most recent backup wasn't corrupt. This PR makes the backup engine able to create new backups even when the most recent one is corrupt. We now maintain two ID instance variables: - `latest_backup_id_` is used when creating backup to choose the new ID - `latest_valid_backup_id_` is used when restoring latest backup since we want most recent valid one Closes https://github.com/facebook/rocksdb/pull/2804 Differential Revision: D5734148 Pulled By: ajkr fbshipit-source-id: db440707b31df2c7b084188aa5f6368449e10bcf	2017-08-31 15:41:49 -07:00
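The two-ID bookkeeping described in the backup engine commit above can be sketched roughly as follows. This is only an illustration, not the BackupEngine code: `BackupInfo`, `BackupIds` and `ScanBackups` are hypothetical names, and only the roles of the two IDs (`latest_backup_id_` for choosing a new ID, `latest_valid_backup_id_` for restoring) come from the commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the metadata of one backup.
struct BackupInfo {
  uint32_t id;
  bool corrupt;
};

// Hypothetical holder for the two IDs named in the commit message.
struct BackupIds {
  uint32_t latest_backup_id = 0;        // highest ID seen, corrupt or not
  uint32_t latest_valid_backup_id = 0;  // highest ID among non-corrupt backups
};

// Scan all known backups and track both IDs separately, so that a corrupt
// latest backup does not block creating the next one.
BackupIds ScanBackups(const std::vector<BackupInfo>& backups) {
  BackupIds ids;
  for (const BackupInfo& b : backups) {
    ids.latest_backup_id = std::max(ids.latest_backup_id, b.id);
    if (!b.corrupt) {
      ids.latest_valid_backup_id = std::max(ids.latest_valid_backup_id, b.id);
    }
  }
  return ids;
}
```

Under these assumptions a new backup would take `latest_backup_id + 1`, while a "restore latest" request would use `latest_valid_backup_id`.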
Pengchao Wang	825a22c00c	garbage collect tombstones in merge operator Summary: Remove Cassandra tombstones when reaching the max compaction level (full merge). If all columns are collected, the key will be removed in the next compaction via the compaction filter. Closes https://github.com/facebook/rocksdb/pull/2791 Reviewed By: sagar0 Differential Revision: D5722465 Pulled By: wpc fbshipit-source-id: 61e9898a5686551653a16383255aeaab3197e65e	2017-08-31 10:11:54 -07:00
Maysam Yabandeh	26ac24f199	Add more unit test to write_prepared txns Summary: Closes https://github.com/facebook/rocksdb/pull/2798 Differential Revision: D5724173 Pulled By: maysamyabandeh fbshipit-source-id: fb6b782d933fb4be315b1a231a6a67a66fdc9c96	2017-08-31 09:41:27 -07:00
Maysam Yabandeh	fbfa3e7a43	WriteAtPrepare: Efficient read from snapshot list Summary: Divide the old snapshots to two lists: a few that fit into a cached array and the rest in a vector, which is expected to be empty in normal cases. The former is to optimize concurrent reads from snapshots without requiring locks. It is done by an array of std::atomic, from which std::memory_order_acquire reads are compiled to simple read instructions in most of the x86_64 architectures. Closes https://github.com/facebook/rocksdb/pull/2758 Differential Revision: D5660504 Pulled By: maysamyabandeh fbshipit-source-id: 524fcf9a8e7f90a92324536456912a99aaa6740c	2017-08-26 01:00:38 -07:00
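The lock-free read path mentioned in the snapshot-list commit above might look roughly like the sketch below. It is a simplified illustration under stated assumptions: `SnapshotCache`, `kCacheSize`, `Publish` and `Contains` are hypothetical names, there is a single writer that is synchronized externally, and removal of snapshots and the overflow vector are left out.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical cache size; the real value is not given in the commit message.
constexpr size_t kCacheSize = 4;

struct SnapshotCache {
  std::array<std::atomic<uint64_t>, kCacheSize> slots;
  std::atomic<size_t> count{0};

  SnapshotCache() {
    for (auto& s : slots) s.store(0, std::memory_order_relaxed);
  }

  // Writer side (assumed to be called by one thread at a time).
  // Returns false when the cache is full; a caller would then fall back
  // to the lock-protected overflow vector.
  bool Publish(uint64_t seq) {
    size_t n = count.load(std::memory_order_relaxed);
    if (n == kCacheSize) return false;
    slots[n].store(seq, std::memory_order_release);
    count.store(n + 1, std::memory_order_release);
    return true;
  }

  // Reader side: takes no lock, only acquire loads.
  bool Contains(uint64_t seq) const {
    size_t n = count.load(std::memory_order_acquire);
    for (size_t i = 0; i < n; ++i) {
      if (slots[i].load(std::memory_order_acquire) == seq) return true;
    }
    return false;
  }
};
```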
Yi Wu	503db684f7	make blob file close synchronous Summary: Fixing flaky blob_db_test. To close a blob file, blob db used to add a CloseSeqWrite job to the background thread to close it. Changing file close to be synchronous in order to simplify logic, and fix flaky blob_db_test. Closes https://github.com/facebook/rocksdb/pull/2787 Differential Revision: D5699387 Pulled By: yiwu-arbug fbshipit-source-id: dd07a945cd435cd3808fce7ee4ea57817409474a	2017-08-25 10:41:49 -07:00
Maysam Yabandeh	cd26af3476	Add unit test for WritePrepared skeleton Summary: Closes https://github.com/facebook/rocksdb/pull/2756 Differential Revision: D5660516 Pulled By: maysamyabandeh fbshipit-source-id: f3f3d3b5f544007a7fbdd78e49e4738b4437c7ee	2017-08-23 13:56:03 -07:00
Andrew Kryczka	234f33a3f9	allow nullptr Slice only as sentinel Summary: Allow `Slice` holding nullptr as a sentinel value but not in comparisons. This new restriction eliminates the need for the manual checks in `39ef900551`, while still conforming to glibc's `memcmp` API. Thanks siying for the idea. Users may need to migrate, so mentioned it in HISTORY.md. Closes https://github.com/facebook/rocksdb/pull/2777 Differential Revision: D5686016 Pulled By: ajkr fbshipit-source-id: 03a2ca3fd9a0ebade9d0d5686c81d59a9534f563	2017-08-23 10:56:06 -07:00
Maysam Yabandeh	ccf7f833e3	Use PinnableSlice in Transactions Summary: The ::Get from DB is now augmented with an overload method that takes a PinnableSlice instead of a string. Transactions however are not yet upgraded to use the new API. As a result, transaction users such as MyRocks cannot benefit from it. This patch updates the transactional API with a PinnableSlice overload. Closes https://github.com/facebook/rocksdb/pull/2736 Differential Revision: D5645770 Pulled By: maysamyabandeh fbshipit-source-id: f6af520df902f842de1bcf99bed3e8dfc43ad96d	2017-08-23 10:11:45 -07:00
yiwu-arbug	5b68b114f1	Blob db create a snapshot before every read Summary: If GC kicks in between (1) a Get() reading the index entry from base db and (2) the same Get() reading from a blob file, the GC can delete the corresponding blob file, making the key not found. Fortunately we have existing logic to avoid deleting a blob file if it is referenced by a snapshot. So the fix is to explicitly create a snapshot before reading the index entry from base db. Closes https://github.com/facebook/rocksdb/pull/2754 Differential Revision: D5655956 Pulled By: yiwu-arbug fbshipit-source-id: e4ccbc51331362542e7343175bbcbdea5830f544	2017-08-20 18:26:19 -07:00
yiwu-arbug	4624ae52c9	GC the oldest file when out of space Summary: When out of space, blob db should GC the oldest file. The current implementation GCs the newest one instead. Fix it. Closes https://github.com/facebook/rocksdb/pull/2757 Differential Revision: D5657611 Pulled By: yiwu-arbug fbshipit-source-id: 56c30a4c52e6ab04551dda8c5c46006d4070b28d	2017-08-20 17:11:06 -07:00
Archit Mishra	bddd5d3630	Added mechanism to track deadlock chain Summary: Changes: * extended the wait_txn_map to track additional information * designed circular buffer to store n latest deadlocks' information * added test coverage to verify the additional information tracked is accurately stored in the buffer Closes https://github.com/facebook/rocksdb/pull/2630 Differential Revision: D5478025 Pulled By: armishra fbshipit-source-id: 2b138de7b5a73f5ca554fc3ff8220a3be49f39e7	2017-08-17 18:56:21 -07:00
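The "circular buffer to store n latest deadlocks" idea from the commit above can be illustrated with a small fixed-capacity ring. This is a hedged sketch: `DeadlockRing`, `Add` and `Latest` are hypothetical names, and a plain string stands in for whatever per-deadlock information the real buffer stores.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical ring buffer that keeps only the most recent `capacity` entries.
class DeadlockRing {
 public:
  explicit DeadlockRing(size_t capacity)
      : slots_(capacity), next_(0), size_(0) {}

  // Overwrites the oldest entry once the buffer is full.
  void Add(const std::string& info) {
    if (slots_.empty()) return;
    slots_[next_] = info;
    next_ = (next_ + 1) % slots_.size();
    if (size_ < slots_.size()) ++size_;
  }

  // Returns the stored entries, newest first.
  std::vector<std::string> Latest() const {
    std::vector<std::string> out;
    for (size_t i = 0; i < size_; ++i) {
      size_t idx = (next_ + slots_.size() - 1 - i) % slots_.size();
      out.push_back(slots_[idx]);
    }
    return out;
  }

 private:
  std::vector<std::string> slots_;
  size_t next_;  // index the next Add() will write to
  size_t size_;  // number of valid entries, at most slots_.size()
};
```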
yiwu-arbug	29877ec7b4	Fix blob db crash during calculating write amp Summary: On the initial call to BlobDBImpl::WaStats() `all_periods_write_` would be empty, so it will crash when we call pop_front() at line 1627. Apparently it is meant to pop only when `all_periods_write_.size() > kWriteAmplificationStatsPeriods`. The whole write amp calculation doesn't seem to be correct and it is not being exposed. Will work on it later. Test Plan Change kWriteAmplificationStatsPeriodMillisecs to 1000 (1 second) and run db_bench --use_blob_db for 5 minutes. Closes https://github.com/facebook/rocksdb/pull/2751 Differential Revision: D5648269 Pulled By: yiwu-arbug fbshipit-source-id: b843d9a09bb5f9e1b713d101ec7b87e54b5115a4	2017-08-17 15:01:09 -07:00
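The guard described in the write amp commit above (pop only once the window has overflowed) can be sketched as below. `RecordPeriod` and `kPeriods` are hypothetical names used for illustration; `kPeriods` stands in for `kWriteAmplificationStatsPeriods`, whose real value is not given here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical window size for this sketch.
constexpr size_t kPeriods = 3;

// Append one period's byte count and trim the window.
// pop_front() is only reached when the deque holds more than kPeriods
// entries, so it is never called on an empty deque.
void RecordPeriod(std::deque<uint64_t>* periods, uint64_t bytes) {
  periods->push_back(bytes);
  while (periods->size() > kPeriods) {
    periods->pop_front();
  }
}
```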
Sagar Vemuri	8f2598ac9d	Enable Cassandra merge operator to be called with a single merge operand Summary: Updating Cassandra merge operator to make use of a single merge operand when needed. Single merge operand support has been introduced in #2721. Closes https://github.com/facebook/rocksdb/pull/2753 Differential Revision: D5652867 Pulled By: sagar0 fbshipit-source-id: b9fbd3196d3ebd0b752626dbf9bec9aa53e3e26a	2017-08-17 15:01:09 -07:00
follitude	ac8fb77afd	fix some misspellings Summary: PTAL ajkr Closes https://github.com/facebook/rocksdb/pull/2750 Differential Revision: D5648052 Pulled By: ajkr fbshipit-source-id: 7cd1ddd61364d5a55a10fdd293fa74b2bf89dd98	2017-08-16 21:57:20 -07:00
Maysam Yabandeh	eb6425303e	Update WritePrepared with the pseudo code Summary: Implement the main body of WritePrepared pseudo code. This includes PrepareInternal and CommitInternal, as well as AddCommitted which updates the commit map. It also provides an IsInSnapshot method that could later be called from the read path to decide if a version is in the read snapshot or should otherwise be skipped. This patch lacks unit tests and does not attempt to offer an efficient implementation. The idea is to have the API specified so that we can work on related tasks in parallel. Closes https://github.com/facebook/rocksdb/pull/2713 Differential Revision: D5640021 Pulled By: maysamyabandeh fbshipit-source-id: bfa7a05e8d8498811fab714ce4b9c21530514e1c	2017-08-16 16:57:47 -07:00
Sagar Vemuri	132306fbf0	Remove PartialMerge implementation from Cassandra merge operator Summary: `PartialMergeMulti` implementation is enough for Cassandra, and `PartialMerge` is not required. Implementing both will just duplicate the code. As per https://github.com/facebook/rocksdb/blob/master/include/rocksdb/merge_operator.h#L130-L135 : ``` // The default implementation of PartialMergeMulti will use this function // as a helper, for backward compatibility. Any successor class of // MergeOperator should either implement PartialMerge or PartialMergeMulti, // although implementing PartialMergeMulti is suggested as it is in general // more effective to merge multiple operands at a time instead of two // operands at a time. ``` Closes https://github.com/facebook/rocksdb/pull/2737 Reviewed By: scv119 Differential Revision: D5633073 Pulled By: sagar0 fbshipit-source-id: ef4fa102c22fec6a0175ed12f5c44c15afe3c8ca	2017-08-15 14:59:34 -07:00
yiwu-arbug	e5a1b727c0	Fix blob DB transaction usage while GC Summary: During GC, blob DB uses an optimistic transaction to delete or replace the index entry in the LSM, to guarantee correctness if there's a normal write writing to the same key. However, the previous implementation doesn't call SetSnapshot() nor use GetForUpdate() of the transaction API; instead it does its own sequence number checking before beginning the transaction. A normal write can sneak in after the sequence number check and overwrite the key, and the GC will delete or relocate the old version of the key by mistake. Update the code to properly use GetForUpdate() to check the existing index entry. After the patch the sequence number stored with each blob record is useless, so I'm considering removing the sequence number from the blob record in another patch. Closes https://github.com/facebook/rocksdb/pull/2703 Differential Revision: D5589178 Pulled By: yiwu-arbug fbshipit-source-id: 8dc960cd5f4e61b36024ba7c32d05584ce149c24	2017-08-11 12:43:17 -07:00
Daniel Black	64f8484356	block_cache_tier: fix gcc-7 warnings Summary: Error was: utilities/persistent_cache/block_cache_tier.cc: In instantiation of ‘void rocksdb::Add(std::map<std::__cxx11::basic_string<char>, double>*, const string&, const T&) [with T = double; std::__cxx11::string = std::__cxx11::basic_string<char>]’: utilities/persistent_cache/block_cache_tier.cc:147:40: required from here utilities/persistent_cache/block_cache_tier.cc:141:23: error: type qualifiers ignored on cast result type [-Werror=ignored-qualifiers] stats->insert({key, static_cast<const double>(t)}); fixing like #2562 Closes https://github.com/facebook/rocksdb/pull/2603 Differential Revision: D5600910 Pulled By: yiwu-arbug fbshipit-source-id: 891a5ec7e451d2dec6ad1b6b7fac545657f87363	2017-08-10 11:58:53 -07:00
Maysam Yabandeh	bdc056f8aa	Refactor PessimisticTransaction Summary: This patch splits Commit and Prepare into lock-related logic and db-write-related logic. It moves lock-related logic to PessimisticTransaction to be reused by all children classes and moves the existing impl of db-write-related logic to PrepareInternal, CommitSingleInternal, and CommitInternal in WriteCommittedTxnImpl. Closes https://github.com/facebook/rocksdb/pull/2691 Differential Revision: D5569464 Pulled By: maysamyabandeh fbshipit-source-id: d1b8698e69801a4126c7bc211745d05c636f5325	2017-08-07 16:12:29 -07:00
Maysam Yabandeh	c9804e007a	Refactor TransactionDBImpl Summary: This opens space for the new implementations of TransactionDBImpl such as WritePreparedTxnDBImpl that has a different policy of how to write to DB. Closes https://github.com/facebook/rocksdb/pull/2689 Differential Revision: D5568918 Pulled By: maysamyabandeh fbshipit-source-id: f7eac866e175daf3793ae79da108f65cc7dc7b25	2017-08-05 17:26:15 -07:00
Yi Wu	0d4a2b7330	Avoid blob db call Sync() while writing Summary: The FsyncFiles background job calls Fsync() periodically for blob files. However it can access WritableFileWriter concurrently with a Put() or Write(), and WritableFileWriter does not support concurrent access. This leads to the WritableFileWriter buffer being flushed with the same content twice, and the blob file ends up corrupted. Fix by simply letting FsyncFiles hold write_mutex_. Closes https://github.com/facebook/rocksdb/pull/2685 Differential Revision: D5561908 Pulled By: yiwu-arbug fbshipit-source-id: f0bb5bcab0e05694e053b8c49eab43640721e872	2017-08-04 13:12:07 -07:00
Yi Wu	92afe830f9	Update all blob db TTL and timestamps to uint64_t Summary: The current blob db implementation use mix of int32_t, uint32_t and uint64_t for TTL and expiration. Update all timestamps to uint64_t for consistency. Closes https://github.com/facebook/rocksdb/pull/2683 Differential Revision: D5557103 Pulled By: yiwu-arbug fbshipit-source-id: e4eab2691629a755e614e8cf1eed9c3a681d0c42	2017-08-03 17:57:30 -07:00
Yi Wu	0b814ba92d	Allow concurrent writes to blob db Summary: I'm going with the brute-force solution, just letting Put() and Write() hold a mutex before writing. May improve concurrent writing with finer granularity locking later. Closes https://github.com/facebook/rocksdb/pull/2682 Differential Revision: D5552690 Pulled By: yiwu-arbug fbshipit-source-id: 039abd675b5d274a7af6428198d1733cafecef4c	2017-08-03 15:11:26 -07:00
Yi Wu	2c45ada4c4	Blob DB garbage collection should keep keys with newer version Summary: Fix the bug where blob db garbage collection removes keys with a newer version. It shouldn't delete the key from base db when the sequence number in base db is not equal to the one in the blob log. Closes https://github.com/facebook/rocksdb/pull/2678 Differential Revision: D5549752 Pulled By: yiwu-arbug fbshipit-source-id: abb8649260963b5c389748023970fd746279d227	2017-08-03 13:12:12 -07:00
Maysam Yabandeh	c3d5c4d38a	Refactor TransactionImpl Summary: This patch refactors TransactionImpl by separating the logic for pessimistic concurrency control from the implementation of how to write the data to rocksdb. The existing implementation is named WriteCommittedTxnImpl as it writes committed data to the db. A template named WritePreparedTxnImpl is also added which will be later completed to provide an alternative implementation. Closes https://github.com/facebook/rocksdb/pull/2676 Differential Revision: D5549998 Pulled By: maysamyabandeh fbshipit-source-id: 16298e86b43ca4849324c1f35c731913c6d17bec	2017-08-03 08:57:22 -07:00
Yi Wu	1900771bd2	Dump Blob DB options to info log Summary: * Dump blob db options to info log * Remove BlobDBOptionsImpl to disallow dynamic cast BlobDBOptions into BlobDBOptionsImpl. Move options there to be constants or into BlobDBOptions. The dynamic cast is broken after #2645 * Change some of the default options * Remove blob_db_options.min_blob_size, which is unimplemented. Will implement it soon. Closes https://github.com/facebook/rocksdb/pull/2671 Differential Revision: D5529912 Pulled By: yiwu-arbug fbshipit-source-id: dcd58ca981db5bcc7f123b65a0d6f6ae0dc703c7	2017-08-01 13:01:47 -07:00
Siying Dong	21696ba502	Replace dynamic_cast<> Summary: Replace dynamic_cast<> so that users can choose to build with RTTI off, so that they can save several bytes per object and get a tiny bit more memory available. Some nontrivial changes: 1. Add Comparator::GetRootComparator() to get around the internal comparator hack 2. Add the two experimental functions to DB 3. Add TableFactory::GetOptionString() to avoid unnecessary casting to get the option string 4. Since 3 is done, move the parsing option functions for table factory to table factory files too, to be symmetric. Closes https://github.com/facebook/rocksdb/pull/2645 Differential Revision: D5502723 Pulled By: siying fbshipit-source-id: fd13cec5601cf68a554d87bfcf056f2ffa5fbf7c	2017-07-28 16:27:16 -07:00
Yi Wu	aaf42fe775	Move blob_db/ttl_extractor.h into blob_db/blob_db.h Summary: Move blob_db/ttl_extractor.h into blob_db/blob_db.h Also exclude TTLExtractor from LITE build. Closes https://github.com/facebook/rocksdb/pull/2665 Differential Revision: D5520009 Pulled By: yiwu-arbug fbshipit-source-id: 4813dcc272c7cc4bf2cdac285256d9a17d78c7b7	2017-07-28 14:28:21 -07:00
Sagar Vemuri	aace46516b	Fix license headers in Cassandra related files Summary: I might have missed these while doing some recent cassandra code reviews. Closes https://github.com/facebook/rocksdb/pull/2663 Differential Revision: D5520138 Pulled By: sagar0 fbshipit-source-id: 340930afe9efe03c75f535a1da1f89bd3e53c1f9	2017-07-28 13:56:56 -07:00
Islam AbdelRahman	50a969131f	CacheActivityLogger, component to log cache activity into a file Summary: Simple component that will add a new entry in a log file every time we lookup/insert a key in SimCache. API: ``` SimCache::StartActivityLogging(<file_name>, <env>, <optional_max_size>) SimCache::StopActivityLogging() ``` Sending for review, Still need to add more comments. I was thinking about a better approach, but I ended up deciding I will use a mutex to sync the writes to the file, since this feature should not be heavily used and only used to collect info that will be analyzed offline. I think it's okay to hold the mutex every time we lookup/add to the SimCache. Closes https://github.com/facebook/rocksdb/pull/2295 Differential Revision: D5063826 Pulled By: IslamAbdelRahman fbshipit-source-id: f3b5daed8b201987c9a071146ddd5c5740a2dd8c	2017-07-28 12:36:48 -07:00
Yi Wu	6083bc79f8	Blob DB TTL extractor Summary: Introducing blob_db::TTLExtractor to replace extract_ttl_fn. The TTL extractor can be used to extract TTL from keys inserted with Put or WriteBatch. Changes over the existing extract_ttl_fn are: * If the value is changed, it will be returned via std::string* (rather than Slice). With Slice the new value has to be part of the existing value. With std::string* the limitation is removed. * It can optionally return TTL or expiration. Other changes in this PR: * replace `std::chrono::system_clock` with `Env::NowMicros` so that I can mock time in tests. * add several TTL tests. * other minor naming changes. Closes https://github.com/facebook/rocksdb/pull/2659 Differential Revision: D5512627 Pulled By: yiwu-arbug fbshipit-source-id: 0dfcb00d74d060b8534c6130c808e4d5d0a54440	2017-07-27 23:26:04 -07:00
Maysam Yabandeh	2b259c9d49	Lower num of iterations in DeadlockCycle test Summary: Currently this test times out with tsan. This is likely due to decreased speed with tsan. By lowering the number of iterations we can still catch a bug as the test is run regularly and multiple runs of the test is equivalent with running the test with more iterations. Closes https://github.com/facebook/rocksdb/pull/2639 Differential Revision: D5490549 Pulled By: maysamyabandeh fbshipit-source-id: bd69c42a9728d337ac95a06a401088384e51731a	2017-07-25 11:42:26 -07:00
Siying Dong	e67b35c076	Add Iterator::Refresh() Summary: Add and implement Iterator::Refresh(). When this function is called, if the super version doesn't change, update the sequence number of the iterator to the latest one and invalidate the iterator. If the super version changed, recreate the whole iterator. This can help users reuse the iterator more easily. Closes https://github.com/facebook/rocksdb/pull/2621 Differential Revision: D5464500 Pulled By: siying fbshipit-source-id: f548bd35e85c1efca2ea69273802f6704eba6ba9	2017-07-24 10:54:37 -07:00
Sagar Vemuri	72502cf227	Revert "comment out unused parameters" Summary: This reverts the previous commit `1d7048c598`, which broke the build. Did a `git revert 1d7048c`. Closes https://github.com/facebook/rocksdb/pull/2627 Differential Revision: D5476473 Pulled By: sagar0 fbshipit-source-id: 4756ff5c0dfc88c17eceb00e02c36176de728d06	2017-07-21 18:26:26 -07:00
Victor Gao	1d7048c598	comment out unused parameters Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually. Reviewed By: igorsugak Differential Revision: D5454343 fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2	2017-07-21 14:57:44 -07:00
Pengchao Wang	534c255c7a	Cassandra compaction filter for purge expired columns and rows Summary: Major changes in this PR: * Implement CassandraCompactionFilter to remove expired columns and rows (if all columns expired) * Move cassandra related code from utilities/merge_operators/cassandra to utilities/cassandra/* * Switch to use shared_ptr<> from unique_ptr for Column membership management in RowValue. Since columns do have multiple owners in the Merge and GC process, using shared_ptr helps make RowValue immutable. * Rename cassandra_merge_test to cassandra_functional_test and add two TTL compaction related tests there. Closes https://github.com/facebook/rocksdb/pull/2588 Differential Revision: D5430010 Pulled By: wpc fbshipit-source-id: 9566c21e06de17491d486a68c70f52d501f27687	2017-07-21 14:57:44 -07:00
Yi Wu	0302da47a7	Reduce blob db noisy logging Summary: Remove some of the per-key logging by blob db to reduce noise. Closes https://github.com/facebook/rocksdb/pull/2587 Differential Revision: D5429115 Pulled By: yiwu-arbug fbshipit-source-id: b89328282fb8b3c64923ce48738c16017ce7feaf	2017-07-20 15:02:31 -07:00
Yedidya Feldblum	f1a056e005	CodeMod: Prefer ADD_FAILURE() over EXPECT_TRUE(false), et cetera Summary: CodeMod: Prefer `ADD_FAILURE()` over `EXPECT_TRUE(false)`, et cetera. The tautologically-conditioned and tautologically-contradicted boolean expectations/assertions have better alternatives: unconditional passes and failures. Reviewed By: Orvid Differential Revision: D5432398 Tags: codemod, codemod-opensource fbshipit-source-id: d16b447e8696a6feaa94b41199f5052226ef6914	2017-07-16 21:26:02 -07:00
Siying Dong	3c327ac2d0	Change RocksDB License Summary: Closes https://github.com/facebook/rocksdb/pull/2589 Differential Revision: D5431502 Pulled By: siying fbshipit-source-id: 8ebf8c87883daa9daa54b2303d11ce01ab1f6f75	2017-07-15 16:11:23 -07:00
Yi Wu	26ce69b195	Update blob db to use ROCKS_LOG_* macro Summary: Update blob db to use the newer ROCKS_LOG_* macro. Closes https://github.com/facebook/rocksdb/pull/2574 Differential Revision: D5414526 Pulled By: yiwu-arbug fbshipit-source-id: e428753aa5917e8b435cead2db26df586e5d1def	2017-07-13 10:14:04 -07:00
foolenough	21b17d7686	Fix BlobDB::Get which only get out the value offset Summary: Blob db uses StackableDB::get, which only gets out the value offset, but not the value. Fix by making BlobDB::Get override the designated getter. Closes https://github.com/facebook/rocksdb/pull/2553 Differential Revision: D5396823 Pulled By: yiwu-arbug fbshipit-source-id: 5a7d1cf77ee44490f836a6537225955382296878	2017-07-12 17:57:40 -07:00
Andrew Kryczka	33042573db	Fix GetCurrentTime() initialization for valgrind Summary: Valgrind had false positive complaints about the initialization pattern for `GetCurrentTime()`'s argument in #2480. We can instead have the client initialize the time variable before calling `GetCurrentTime()`, and have `GetCurrentTime()` promise to only overwrite it in success case. Closes https://github.com/facebook/rocksdb/pull/2526 Differential Revision: D5358689 Pulled By: ajkr fbshipit-source-id: 857b189f24c19196f6bb299216f3e23e7bc4be42	2017-07-05 12:12:00 -07:00
Mike Kolupaev	397ab11152	Improve Status message for block checksum mismatches Summary: We've got some DBs where iterators return Status with message "Corruption: block checksum mismatch" all the time. That's not very informative. It would be much easier to investigate if the error message contained the file name - then we would know e.g. how old the corrupted file is, which would be very useful for finding the root cause. This PR adds file name, offset and other stuff to some block corruption-related status messages. It doesn't improve all the error messages, just a few that were easy to improve. I'm mostly interested in "block checksum mismatch" and "Bad table magic number" since they're the only corruption errors that I've ever seen in the wild. Closes https://github.com/facebook/rocksdb/pull/2507 Differential Revision: D5345702 Pulled By: al13n321 fbshipit-source-id: fc8023d43f1935ad927cef1b9c55481ab3cb1339	2017-06-28 21:27:01 -07:00
Siying Dong	18c63af6ef	Make "make analyze" happy Summary: "make analyze" is reporting some errors. It's complicated to look but it seems to me that they are all false positive. Anyway, I think cleaning them up is a good idea. Some of the changes are hacky but I don't know a better way. Closes https://github.com/facebook/rocksdb/pull/2508 Differential Revision: D5341710 Pulled By: siying fbshipit-source-id: 6070e430e0e41a080ef441e05e8ec827d45efab6	2017-06-28 15:42:27 -07:00
Yi Wu	982cec22af	Fix TARGETS file tests list Summary: 1. The buckifier script assume each test "foo" comes with a .cc file of the same name (i.e. foo.cc). Update cassandra tests to follow this pattern so that the buckifier script can recognize them. 2. add blob_db_test Closes https://github.com/facebook/rocksdb/pull/2506 Differential Revision: D5331517 Pulled By: yiwu-arbug fbshipit-source-id: 86f3eba471fc621186ab44cbd073b6162cde8e57	2017-06-27 14:12:02 -07:00
Maysam Yabandeh	499ebb3ab5	Optimize for serial commits in 2PC Summary: Throughput: 46k tps in our sysbench settings (filling the details later) The idea is to have the simplest change that gives us a reasonable boost in 2PC throughput. Major design changes: 1. The WAL file internal buffer is not flushed after each write. Instead it is flushed before critical operations (WAL copy via fs) or when FlushWAL is called by MySQL. Flushing the WAL buffer is also protected via mutex_. 2. Use two sequence numbers: last seq, and last seq for write. Last seq is the last visible sequence number for reads. Last seq for write is the next sequence number that should be used to write to WAL/memtable. This allows to have a memtable write be in parallel to WAL writes. 3. BatchGroup is not used for writes. This means that we can have parallel writers which changes a major assumption in the code base. To accommodate for that i) allow only 1 WriteImpl that intends to write to memtable via mem_mutex_--which is fine since in 2PC almost all of the memtable writes come via group commit phase which is serial anyway, ii) make all the parts in the code base that assumed to be the only writer (via EnterUnbatched) to also acquire mem_mutex_, iii) stat updates are protected via a stat_mutex_. Note: the first commit has the approach figured out but is not clean. Submitting the PR anyway to get the early feedback on the approach. If we are ok with the approach I will go ahead with this updates: 0) Rebase with Yi's pipelining changes 1) Currently batching is disabled by default to make sure that it will be consistent with all unit tests. Will make this optional via a config. 2) A couple of unit tests are disabled. They need to be updated with the serial commit of 2PC taken into account. 3) Replacing BatchGroup with mem_mutex_ got a bit ugly as it requires releasing mutex_ beforehand (the same way EnterUnbatched does). This needs to be cleaned up. Closes https://github.com/facebook/rocksdb/pull/2345 Differential Revision: D5210732 Pulled By: maysamyabandeh fbshipit-source-id: 78653bd95a35cd1e831e555e0e57bdfd695355a4	2017-06-24 14:11:29 -07:00
Siying Dong	88cd2d96e7	Downgrade option sanity check level for prefix_extractor Summary: With `c7004840d2`, it's safe to open a DB with a different prefix extractor. So it's safe to skip the prefix extractor check. Closes https://github.com/facebook/rocksdb/pull/2474 Differential Revision: D5294700 Pulled By: siying fbshipit-source-id: eeb500da795eecb29b8c9c56a14cfd4afda12ecc	2017-06-22 16:26:36 -07:00
Andrew Kryczka	048446fc74	Fix cassandra ASAN use-after-free Summary: When we create a column based on the `string::c_str()`, we need to make sure that char array doesn't get deleted when calls to `string::append()` cause the string to expand. Closes https://github.com/facebook/rocksdb/pull/2470 Differential Revision: D5285049 Pulled By: ajkr fbshipit-source-id: f918dd426ff3c024e7a293dcb10448f10b6c98e8	2017-06-20 13:27:16 -07:00
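The hazard in the ASAN commit above is that a pointer obtained from `string::c_str()` can dangle once `string::append()` reallocates the buffer. One way to sidestep it, shown here purely as an assumption since the commit message does not say how the fix was made, is to record offsets into the string instead of raw pointers. `ColumnRef`, `AppendColumn` and `ReadColumn` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical reference to a column stored inside a growing buffer.
// Offsets stay valid across reallocation; raw char pointers do not.
struct ColumnRef {
  size_t offset;
  size_t size;
};

// Append a serialized column and return where it lives in the buffer.
ColumnRef AppendColumn(std::string* buf, const std::string& value) {
  ColumnRef ref{buf->size(), value.size()};
  buf->append(value);  // may reallocate and invalidate earlier c_str() results
  return ref;
}

// Resolve the reference against the buffer's current storage.
std::string ReadColumn(const std::string& buf, const ColumnRef& ref) {
  return buf.substr(ref.offset, ref.size);
}
```

Reserving the full capacity up front before taking any pointers would be another option, at the cost of needing to know the final size in advance.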
Dmitri Smirnov	a21db161c9	Implement ReopenWritableFile on Windows and other fixes Summary: Make default impl return NotSupported so the db_blob tests exist in a meaningful manner. Replace std::thread with port::Thread. Closes https://github.com/facebook/rocksdb/pull/2465 Differential Revision: D5275563 Pulled By: yiwu-arbug fbshipit-source-id: cedf1a18a2c05e20d768c1308b3f3224dbd70ab6	2017-06-20 10:31:13 -07:00
Chen Shen	cbd825deea	Create a MergeOperator for Cassandra Row Value Summary: This PR implements the MergeOperator for Cassandra Row Values. Closes https://github.com/facebook/rocksdb/pull/2289 Differential Revision: D5055464 Pulled By: scv119 fbshipit-source-id: 45f276ef8cbc4704279202f6a20c64889bc1adef	2017-06-16 14:27:00 -07:00
Yi Wu	ae8571f5c2	Fix blob db compression bug Summary: `CompressBlock()` will return the uncompressed slice (i.e. `Slice(value_unc)`) if compression ratio is not good enough. This is undesired. We need to always assign the compressed slice to `value`. Closes https://github.com/facebook/rocksdb/pull/2447 Differential Revision: D5244682 Pulled By: yiwu-arbug fbshipit-source-id: 6989dd8852c9622822ba9acec9beea02007dff09	2017-06-14 13:56:42 -07:00
Yi Wu	7a380deff7	Update blob_db_test Summary: I'm trying to improve unit test of blob db. I'm rewriting blob db test. In this patch: * Rewrite tests of basic put/write/delete operations. * Add disable_background_tasks to BlobDBOptionsImpl to allow me not running any background job for basic unit tests. * Move DestroyBlobDB out from BlobDBImpl to be a standalone function. * Remove all garbage collection related tests. Will rewrite them in following patch. * Disabled compression test since it is failing. Will fix in a followup patch. Closes https://github.com/facebook/rocksdb/pull/2446 Differential Revision: D5243306 Pulled By: yiwu-arbug fbshipit-source-id: 157c71ad3b699307cb88baa3830e9b6e74f8e939	2017-06-14 13:12:34 -07:00
Sagar Vemuri	89ad9f3adb	Allow ignoring unknown options when loading options from a file Summary: Added a flag, `ignore_unknown_options`, to skip unknown options when loading an options file (using `LoadLatestOptions`/`LoadOptionsFromFile`) or while verifying options (using `CheckOptionsCompatibility`). This will help in downgrading the db to an older version. Also added `--ignore_unknown_options` flag to ldb Example Use case: In MyRocks, if copying from newer version to older version, it is often impossible to start because of new RocksDB options that don't exist in older version, even though data format is compatible. MyRocks uses these load and verify functions in [ha_rocksdb.cc::check_rocksdb_options_compatibility](`e004fd9f41/storage/rocksdb/ha_rocksdb.cc (L3348-L3401)`). Test Plan: Updated the unit tests. `make check` ldb: $ ./ldb --db=/tmp/test_db --create_if_missing put a1 b1 OK Now edit /tmp/test_db/<OPTIONS-file> and add an unknown option. Try loading the options now, and it fails: $ ./ldb --db=/tmp/test_db --try_load_options get a1 Failed: Invalid argument: Unrecognized option DBOptions:: abcd Passes with the new --ignore_unknown_options flag $ ./ldb --db=/tmp/test_db --try_load_options --ignore_unknown_options get a1 b1 Closes https://github.com/facebook/rocksdb/pull/2423 Differential Revision: D5212091 Pulled By: sagar0 fbshipit-source-id: 2ec17636feb47dc0351b53a77e5f15ef7cbf2ca7	2017-06-13 16:58:01 -07:00
Andrew Kryczka	c217e0b9c7	Call RateLimiter for compaction reads Summary: Allow users to rate limit background work based on read bytes, written bytes, or sum of read and written bytes. Support these by changing the RateLimiter API, so no additional options were needed. Closes https://github.com/facebook/rocksdb/pull/2433 Differential Revision: D5216946 Pulled By: ajkr fbshipit-source-id: aec57a8357dbb4bfde2003261094d786d94f724e	2017-06-13 14:56:46 -07:00
Yi Wu	91e2aa3ce2	write exact sequence number for each put in write batch Summary: At the beginning of write batch write, grab the latest sequence from base db and assume sequence number will increment by 1 for each put and delete, and write the exact sequence number with each put. This is assuming we are the only writer to increment sequence number (no external file ingestion, etc) and there should be no holes in the sequence number. Also having some minor naming changes. Closes https://github.com/facebook/rocksdb/pull/2402 Differential Revision: D5176134 Pulled By: yiwu-arbug fbshipit-source-id: cb4712ee44478d5a2e5951213a10b72f08fe8c88	2017-06-13 12:42:36 -07:00
Maysam Yabandeh	550a1df72c	Fix clang errors by asserting the precondition Summary: USE_CLANG=1 make -j32 analyze The two errors would disappear after the assertion. Closes https://github.com/facebook/rocksdb/pull/2416 Differential Revision: D5193526 Pulled By: maysamyabandeh fbshipit-source-id: 16a21f18f68023f862764dd3ab9e00ca60b0eefa	2017-06-06 12:56:52 -07:00
hyunwoo	c7662a44a4	fixed typo Summary: fixed typo Closes https://github.com/facebook/rocksdb/pull/2376 Differential Revision: D5183630 Pulled By: ajkr fbshipit-source-id: 133cfd0445959e70aa2cd1a12151bf3c0c5c3ac5	2017-06-05 11:27:34 -07:00
Aaron Gao	7f6c02dda1	using ThreadLocalPtr to hide ROCKSDB_SUPPORT_THREAD_LOCAL from public… Summary: … headers https://github.com/facebook/rocksdb/pull/2199 should not reference RocksDB-specific macros (like ROCKSDB_SUPPORT_THREAD_LOCAL in this case) to public headers, `iostats_context.h` and `perf_context.h`. We shouldn't do that because users have to provide these compiler flags when building their binary with RocksDB. We should hide the thread local global variable inside our implementation and just expose a function api to retrieve these variables. It may break some users for now but good for long term. make check -j64 Closes https://github.com/facebook/rocksdb/pull/2380 Differential Revision: D5177896 Pulled By: lightmark fbshipit-source-id: 6fcdfac57f2e2dcfe60992b7385c5403f6dcb390	2017-06-02 17:26:19 -07:00
Yi Wu	ad19eb8686	Fixing blob db sequence number handling Summary: Blob db relies on base db returning sequence number through write batch after DB::Write(). However after recent changes to the write path, DB::Write() no longer returns sequence number in some cases. Fixing it by having WriteBatchInternal::InsertInto() always encode sequence number into write batch. Stacking on #2375. Closes https://github.com/facebook/rocksdb/pull/2385 Differential Revision: D5148358 Pulled By: yiwu-arbug fbshipit-source-id: 8bda0aa07b9334ed03ed381548b39d167dc20c33	2017-05-31 10:56:45 -07:00
Yi Wu	345878a7fb	update blob_db_test Summary: Re-enable blob_db_test with some update: * Commented out delay at the end of GC tests. Will update the logic later with sync point to properly trigger GC. * Added some helper functions. Also update make files to include blob_dump tool. Closes https://github.com/facebook/rocksdb/pull/2375 Differential Revision: D5133793 Pulled By: yiwu-arbug fbshipit-source-id: 95470b26d0c1f9592ba4b7637e027fdd263f425c	2017-05-30 22:26:13 -07:00
Tamir Duberstein	103d0692ea	Avoid unsupported attributes when not building with UBSAN Summary: yiwu-arbug see individual commits. Closes https://github.com/facebook/rocksdb/pull/2318 Differential Revision: D5141520 Pulled By: yiwu-arbug fbshipit-source-id: 7987c92ab4461eef36afce5a133d3a0ee0c96300	2017-05-30 11:13:01 -07:00
Sagar Vemuri	02594b5f11	Fix build errors in blob_dump_tool with GCC 4.8 Summary: Fixing the build errors seen with GCC 4.8.1. ``` Makefile:105: Warning: Compiling in debug mode. Don't use the resulting binary in production utilities/blob_db/blob_dump_tool.cc: In member function ‘rocksdb::Status rocksdb::blob_db::BlobDumpTool::DumpBlobLogFooter(uint64_t, uint64_t)’: utilities/blob_db/blob_dump_tool.cc:149:42: error: expected ‘)’ before ‘PRIu64’ fprintf(stdout, " Blob count : %" PRIu64 "\n", footer.GetBlobCount()); ^ utilities/blob_db/blob_dump_tool.cc:149:76: error: spurious trailing ‘%’ in format [-Werror=format=] fprintf(stdout, " Blob count : %" PRIu64 "\n", footer.GetBlobCount()); ^ utilities/blob_db/blob_dump_tool.cc:149:76: error: too many arguments for format [-Werror=format-extra-args] utilities/blob_db/blob_dump_tool.cc: In member function ‘rocksdb::Status rocksdb::blob_db::BlobDumpTool::DumpRecord(rocksdb::blob_db::BlobDumpTool::DisplayType, rocksdb::blob_db::BlobDumpTool::DisplayType, uint64_t)’: utilities/blob_db/blob_dump_tool.cc:161:49: error: expected ‘)’ before ‘PRIx64’ fprintf(stdout, "Read record with offset 0x%" PRIx64 " (%" PRIu64 "):\n", ^ utilities/blob_db/blob_dump_tool.cc:162:27: error: spurious trailing ‘%’ in format [-Werror=format=] offset, offset); ^ utilities/blob_db/blob_dump_tool.cc:162:27: error: too many arguments for format [-Werror=format-extra-args] utilities/blob_db/blob_dump_tool.cc:176:38: error: expected ‘)’ before ‘PRIu64’ fprintf(stdout, " blob size : %" PRIu64 "\n", record.GetBlobSize()); ^ utilities/blob_db/blob_dump_tool.cc:176:71: error: spurious trailing ‘%’ in format [-Werror=format=] fprintf(stdout, " blob size : %" PRIu64 "\n", record.GetBlobSize()); ^ utilities/blob_db/blob_dump_tool.cc:176:71: error: too many arguments for format [-Werror=format-extra-args] utilities/blob_db/blob_dump_tool.cc:178:38: error: expected ‘)’ before ‘PRIu64’ fprintf(stdout, " time : %" PRIu64 "\n", record.GetTimeVal()); ^ 
utilities/blob_db/blob_dump_tool.cc:178:70: error: spurious trailing ‘%’ in format [-Werror=format=] fprintf(stdout, " time : %" PRIu64 "\n", record.GetTimeVal()); ^ utilities/blob_db/blob_dump_tool.cc:178:70: error: too many arguments for format [-Werror=format-extra-args] utilities/blob_db/blob_dump_tool.cc:214:38: error: expected ‘)’ before ‘PRIu64’ fprintf(stdout, " sequence : %" PRIu64 "\n", record.GetSN()); ^ utilities/blob_db/blob_dump_tool.cc:214:65: error: spurious trailing ‘%’ in format [-Werror=format=] fprintf(stdout, " sequence : %" PRIu64 "\n", record.GetSN()); ``` Closes https://github.com/facebook/rocksdb/pull/2359 Differential Revision: D5117684 Pulled By: sagar0 fbshipit-source-id: 7480346bcd96205fcae890927c5e68cf004e87be	2017-05-24 00:11:36 -07:00
Yi Wu	578fb0b1dc	Simple blob file dumper Summary: A simple blob file dumper. Closes https://github.com/facebook/rocksdb/pull/2242 Differential Revision: D5097553 Pulled By: yiwu-arbug fbshipit-source-id: c6e00d949fcd3658f9f68da9352f06339fac418d	2017-05-23 10:42:59 -07:00
Aaron Gao	3e86c0f07c	disable direct reads for log and manifest and add direct io to tests Summary: Disable direct reads for log and manifest. Direct reads should not affect sequential_file Also add kDirectIO for option_config_ in db_test_util Closes https://github.com/facebook/rocksdb/pull/2337 Differential Revision: D5100261 Pulled By: lightmark fbshipit-source-id: 0ebfd13b93fa1b8f9acae514ac44f8125a05868b	2017-05-22 18:41:28 -07:00
Sagar Vemuri	228f49d20a	Fix data races caught by tsan Summary: This fixes the tsan build failures in: - write_callback_test - persistent_cache_test.* Closes https://github.com/facebook/rocksdb/pull/2339 Differential Revision: D5101190 Pulled By: sagar0 fbshipit-source-id: 537e19ed05272b1f34cfbf793aa822b2264a1643	2017-05-22 10:27:23 -07:00
Yi Wu	d746aead1a	Suppress clang-analyzer false positive Summary: Fixing two types of clang-analyzer false positives: * db is deleted and then reopen, and clang-analyzer thinks we are reusing the pointer after it has been deleted. Adding asserts to hint clang-analyzer the pointer is recreated. * ParsedInternalKey is (intentionally) uninitialized. Initialize the struct only when clang-analyzer is running. Closes https://github.com/facebook/rocksdb/pull/2334 Differential Revision: D5093801 Pulled By: yiwu-arbug fbshipit-source-id: f51355382098eb3da5ab9f64e094c6d03e6bdf7d	2017-05-19 10:56:28 -07:00
yizhu.sun	f5ba131bf8	Fixed some spelling mistakes Summary: Closes https://github.com/facebook/rocksdb/pull/2314 Differential Revision: D5079601 Pulled By: sagar0 fbshipit-source-id: ae5696fd735718f544435c64c3179c49b8c04349	2017-05-17 23:12:36 -07:00
hyunwoo	0ebdd70579	fixed typo Summary: fixed typo Closes https://github.com/facebook/rocksdb/pull/2312 Differential Revision: D5079631 Pulled By: sagar0 fbshipit-source-id: e4c8d1d89b244ee69e9dea1dd013227cc5241026	2017-05-17 16:41:49 -07:00
Yi Wu	445f1235bf	s/std::snprintf/snprintf Summary: Looks like std::snprintf is not available on all platforms (e.g. MSVC 2010). Change it back to snprintf, where we have a macro in port.h to workaround compatibility. Closes https://github.com/facebook/rocksdb/pull/2308 Differential Revision: D5070988 Pulled By: yiwu-arbug fbshipit-source-id: bedfc1660bab0431c583ad434b7e68265e1211b1	2017-05-16 12:01:04 -07:00
Yi Wu	86d5492530	Fix build error with blob DB. Summary: snprintf is in <stdio.h> and not in namespace std. Closes https://github.com/facebook/rocksdb/pull/2287 Reviewed By: anirbanr-fb Differential Revision: D5054752 Pulled By: yiwu-arbug fbshipit-source-id: 356807ec38f3c7d95951cdb41f31a3d3ae0714d4	2017-05-15 14:05:46 -07:00
Andrew Kryczka	3fa9a39c68	Add GetAllKeyVersions API Summary: - Introduced an include/ file dedicated to db-related debug functions to avoid making db.h more complex - Added debugging function, `GetAllKeyVersions()`, to return a listing of internal data for a range of user keys. The new `struct KeyVersion` exposes data similar to internal key without exposing any internal type. - Migrated the "ldb idump" subcommand to use this function - The API takes an inclusive-exclusive range to match behavior of "ldb idump". This will be quite annoying for users who want to query a single user key's versions :(. Closes https://github.com/facebook/rocksdb/pull/2232 Differential Revision: D4976007 Pulled By: ajkr fbshipit-source-id: cab375da53a7595d6575af2b7e3b776aa3ad793e	2017-05-12 15:54:06 -07:00
Anirban Rahut	d85ff4953c	Blob storage pr Summary: The final pull request for Blob Storage. Closes https://github.com/facebook/rocksdb/pull/2269 Differential Revision: D5033189 Pulled By: yiwu-arbug fbshipit-source-id: 6356b683ccd58cbf38a1dc55e2ea400feecd5d06	2017-05-10 15:14:44 -07:00
siddontang	b551104e04	support PopSavePoint for WriteBatch Summary: Try to fix https://github.com/facebook/rocksdb/issues/1969 Closes https://github.com/facebook/rocksdb/pull/2170 Differential Revision: D4907333 Pulled By: yiwu-arbug fbshipit-source-id: 417b420ff668e6c2fd0dad42a94c57385012edc5	2017-05-03 10:57:45 -07:00
Yi Wu	da4b2070b3	Fix WriteBatchWithIndex address use after scope error Summary: Fix use after scope error caught by ASAN. Closes https://github.com/facebook/rocksdb/pull/2228 Differential Revision: D4968028 Pulled By: yiwu-arbug fbshipit-source-id: a2a266c98634237494ab4fb2d666bc938127aeb2	2017-04-28 13:12:10 -07:00
Siying Dong	d616ebea23	Add GPLv2 as an alternative license. Summary: Closes https://github.com/facebook/rocksdb/pull/2226 Differential Revision: D4967547 Pulled By: siying fbshipit-source-id: dd3b58ae1e7a106ab6bb6f37ab5c88575b125ab4	2017-04-27 18:06:12 -07:00
Dmitri Smirnov	cdad04b051	Remove double buffering on RandomRead on Windows. Summary: Remove double buffering on RandomRead on Windows. With more logic appear in file reader/write Read no longer obeys forwarding calls to Windows implementation. Previously direct_io (unbuffered) was only available on Windows but now is supported as generic. We remove intermediate buffering on Windows. Remove random_access_max_buffer_size option which was windows specific. Non-zero values for that option introduced unnecessary lock contention. Remove Env::EnableReadAhead(), Env::ShouldForwardRawRequest() that are no longer necessary. Add aligned buffer reads for cases when requested reads exceed read ahead size. Closes https://github.com/facebook/rocksdb/pull/2105 Differential Revision: D4847770 Pulled By: siying fbshipit-source-id: 8ab48f8e854ab498a4fd398a6934859792a2788f	2017-04-27 12:30:05 -07:00
Andrew Kryczka	e5e545a021	Reunite checkpoint and backup core logic Summary: These code paths forked when checkpoint was introduced by copy/pasting the core backup logic. Over time they diverged and bug fixes were sometimes applied to one but not the other (like fix to include all relevant WALs for 2PC), or it required extra effort to fix both (like fix to forge CURRENT file). This diff reunites the code paths by extracting the core logic into a function, CreateCustomCheckpoint(), that is customizable via callbacks to implement both checkpoint and backup. Related changes: - flush_before_backup is now forcibly enabled when 2PC is enabled - Extracted CheckpointImpl class definition into a header file. This is so the function, CreateCustomCheckpoint(), can be called by internal rocksdb code but not exposed to users. - Implemented more functions in DummyDB/DummyLogFile (in backupable_db_test.cc) that are used by CreateCustomCheckpoint(). Closes https://github.com/facebook/rocksdb/pull/1932 Differential Revision: D4622986 Pulled By: ajkr fbshipit-source-id: 157723884236ee3999a682673b64f7457a7a0d87	2017-04-24 15:06:46 -07:00
Maysam Yabandeh	4c9447d889	Add erase option to release cache Summary: This is useful when we put the entries in the block cache for accounting purposes and do not expect it to be used after it is released. If the cache does not erase the item in such cases not only the performance of cache is negatively affected but the item's destructor not being called at the time of release might violate the assumptions about the lifetime of the object. The new change adds a force_erase option to the Release method and returns a boolean to indicate whether the item is successfully deleted. Closes https://github.com/facebook/rocksdb/pull/2180 Differential Revision: D4916032 Pulled By: maysamyabandeh fbshipit-source-id: 94409a346069923cac9de8e57adc313b4ed46f28	2017-04-24 11:28:36 -07:00
Tomas Kolda	04d58970cb	AIX and Solaris Sparc Support Summary: Replacement of #2147 The change was squashed due to a lot of conflicts. Closes https://github.com/facebook/rocksdb/pull/2194 Differential Revision: D4929799 Pulled By: siying fbshipit-source-id: 5cd49c254737a1d5ac13f3c035f128e86524c581	2017-04-21 20:48:04 -07:00
Siying Dong	7534ba7bde	StackableDB should pass ResetStats() Summary: Closes https://github.com/facebook/rocksdb/pull/2190 Differential Revision: D4922688 Pulled By: siying fbshipit-source-id: eaa3d122f8d389ae0508ec8b61f7780fd8b0a7ef	2017-04-20 16:11:56 -07:00
Andrew Kryczka	df74b775e6	Limit backups opened Summary: This was requested by a customer who wants to proactively monitor whether any valid backups are available. The existing performance was poor because Open() serially reads every small meta-file (one per backup), which was slow on HDFS. Now we only read the minimum number of meta-files to find `max_valid_backups_to_open` valid backups. The customer mentioned above can just set it to one. Closes https://github.com/facebook/rocksdb/pull/2151 Differential Revision: D4882564 Pulled By: ajkr fbshipit-source-id: cb0edf9e8ac693e4d5f24902e725a011ed8c0c2f	2017-04-19 13:26:47 -07:00
Siying Dong	ca96654d85	Change Build Env to gcc-5 Summary: Default to build using gcc-5. Only apply to Facebook-only environments. Closes https://github.com/facebook/rocksdb/pull/2158 Differential Revision: D4887568 Pulled By: siying fbshipit-source-id: 53496c9af3273ccd44441bd0bef9d29beefbc00b	2017-04-14 11:12:56 -07:00
Manuel Ung	9300ef5455	Fix shared lock upgrades Summary: Upgrading a shared lock was silently succeeding because the actual locking code was skipped. This is because if the keys are tracked, it is assumed that they are already locked and do not require locking. Fix this by recording in tracked keys whether the key was locked exclusively or not. Note that lock downgrades are impossible, which is the behaviour we expect. This fixes facebook/mysql-5.6#587. Closes https://github.com/facebook/rocksdb/pull/2122 Differential Revision: D4861489 Pulled By: IslamAbdelRahman fbshipit-source-id: 58c7ebe7af098bf01b9774b666d3e9867747d8fd	2017-04-10 16:06:00 -07:00
Manuel Ung	1f8b119ed6	Limit maximum memory used in the WriteBatch representation Summary: Extend TransactionOptions to include max_write_batch_size which determines the maximum size of the writebatch representation. If memory limit is exceeded, the operation will abort with subcode kMemoryLimit. Closes https://github.com/facebook/rocksdb/pull/2124 Differential Revision: D4861842 Pulled By: lth fbshipit-source-id: 46fd172ea67cc90bbba829bf0d70cfab2261c161	2017-04-10 15:42:26 -07:00
Sagar Vemuri	7124268a09	Reduce the number of params needed to construct DBIter Summary: DBIter, and in-turn NewDBIterator and NewArenaWrappedDBIterator, take a bunch of params. They can be reduced by passing in ReadOptions directly instead of passing in every new param separately. It also seems much cleaner as a bunch of the params towards the end seem to be optional. (Recently I introduced max_skippable_internal_keys, which added one more to the already huge count). Idea courtesy IslamAbdelRahman Closes https://github.com/facebook/rocksdb/pull/2116 Differential Revision: D4857128 Pulled By: sagar0 fbshipit-source-id: 7d239df094b94bd9ea79d145cdf825478ac037a8	2017-04-10 11:14:14 -07:00
Sagar Vemuri	343b59d6ee	Move various string utility functions into string_util Summary: This is an effort to club all string related utility functions into one common place, in string_util, so that it is easier for everyone to know what string processing functions are available. Right now they seem to be spread out across multiple modules, like logging and options_helper. Check the sub-commits for easier reviewing. Closes https://github.com/facebook/rocksdb/pull/2094 Differential Revision: D4837730 Pulled By: sagar0 fbshipit-source-id: 344278a	2017-04-06 14:54:12 -07:00
Yi Wu	df6f5a3772	Move memtable related files into memtable directory Summary: Move memtable related files into memtable directory. Closes https://github.com/facebook/rocksdb/pull/2087 Differential Revision: D4829242 Pulled By: yiwu-arbug fbshipit-source-id: ca70ab6	2017-04-06 14:09:13 -07:00
Siying Dong	d2dce5611a	Move some files under util/ to separate dirs Summary: Move some files under util/ to new directories env/, monitoring/ options/ and cache/ Closes https://github.com/facebook/rocksdb/pull/2090 Differential Revision: D4833681 Pulled By: siying fbshipit-source-id: 2fd8bef	2017-04-05 19:09:16 -07:00
Andrew Kryczka	d659faad54	Level-based L0->L0 compaction Summary: Level-based L0->L0 compaction operates on spans of files that aren't currently being compacted. It reduces the number of L0 files, thus making write stall conditions harder to reach. - L0->L0 is triggered when base level is unavailable due to pending compactions - L0->L0 always outputs one file of at most `max_level0_burst_file_size` bytes. - Subcompactions are disabled for L0->L0 since we want to output one file. - Input files are chosen as the longest span of available files that will fit within the size limit. This minimizes number of files in L0. Closes https://github.com/facebook/rocksdb/pull/2027 Differential Revision: D4760318 Pulled By: ajkr fbshipit-source-id: 9d07183	2017-04-04 18:09:11 -07:00
Andrew Kryczka	e2c6c06366	add TimedEnv Summary: I've needed Env timing measurements a few times now, so finally built something for it. Closes https://github.com/facebook/rocksdb/pull/2073 Differential Revision: D4811231 Pulled By: ajkr fbshipit-source-id: 218a249	2017-04-04 11:24:12 -07:00
Yi Wu	9e44531803	Refactor WriteImpl (pipeline write part 1) Summary: Refactor WriteImpl() so when I plug-in the pipeline write code (which is an alternative approach for WriteThread), some of the logic can be reuse. I split out the following methods from WriteImpl(): * PreprocessWrite() * HandleWALFull() (previous MaybeFlushColumnFamilies()) * HandleWriteBufferFull() * WriteToWAL() Also adding a constructor to WriteThread::Writer, and move WriteContext into db_impl.h. No real logic change in this patch. Closes https://github.com/facebook/rocksdb/pull/2042 Differential Revision: D4781014 Pulled By: yiwu-arbug fbshipit-source-id: d45ca18	2017-04-04 10:24:32 -07:00
Siying Dong	6ef8c620d3	Move auto_roll_logger and filename out of db/ Summary: It is confusing to have auto_roll_logger to stay under db/, which has nothing to do with database. Move filename together as it is a dependency. Closes https://github.com/facebook/rocksdb/pull/2080 Differential Revision: D4821141 Pulled By: siying fbshipit-source-id: ca7d768	2017-04-03 18:39:14 -07:00

800 Commits