rocksdb

Author	SHA1	Message	Date
Yi Wu	4f9f124347	Blob DB: use compression in file header instead of global options Summary: To fix the issue of failing to decompress existing value after reopen DB with a different compression settings. Closes https://github.com/facebook/rocksdb/pull/3142 Differential Revision: D6267260 Pulled By: yiwu-arbug fbshipit-source-id: c7cf7f3e33b0cd25520abf4771cdf9180cc02a5f	2017-11-07 17:42:17 -08:00
Andrew Kryczka	114896c4e0	db_bench compression options Summary: - moved existing compression options to `InitializeOptionsGeneral` since they cannot be set through options file - added flag for `zstd_max_train_bytes` which was recently introduced by #3057 Closes https://github.com/facebook/rocksdb/pull/3128 Differential Revision: D6240460 Pulled By: ajkr fbshipit-source-id: 27dbebd86a55de237ba6a45cc79cff9214e82ebc	2017-11-07 14:00:03 -08:00
Andrew Kryczka	65c95d9c59	support db_bench compact benchmark on bottommost files Summary: Without this option, running the compact benchmark on a DB containing only bottommost files simply returned immediately. Closes https://github.com/facebook/rocksdb/pull/3138 Differential Revision: D6256660 Pulled By: ajkr fbshipit-source-id: e3b64543acd503d821066f4200daa201d4fb3a9d	2017-11-07 10:57:24 -08:00
Manuel Ung	e03377c7fd	Add lock wait time as a perf context counter Summary: Adds two new counters: `key_lock_wait_count` counts how many times a lock was blocked by another transaction and had to wait, instead of being granted the lock immediately. `key_lock_wait_time` counts the time spent acquiring locks. Closes https://github.com/facebook/rocksdb/pull/3107 Differential Revision: D6217332 Pulled By: lth fbshipit-source-id: 55d4f46da5550c333e523263422fd61d6a46deb9	2017-11-06 10:57:19 -08:00
Yi Wu	be410dede8	Fix PinnableSlice move assignment Summary: After move assignment, we need to re-initialized the moved PinnableSlice. Also update blob_db_impl.cc to not reuse the moved PinnableSlice since it is supposed to be in an undefined state after move. Closes https://github.com/facebook/rocksdb/pull/3127 Differential Revision: D6238585 Pulled By: yiwu-arbug fbshipit-source-id: bd99f2e37406c4f7de160c7dee6a2e8126bc224e	2017-11-03 18:13:21 -07:00
Sagar Vemuri	a6d8e30c05	Remove unnecessary status check in TableCache::NewIterator Summary: While investigating the usage of `new_table_iterator_nanos` perf counter, I saw some code was wrapper around with unnecessary status check ... so removed it. Closes https://github.com/facebook/rocksdb/pull/3120 Differential Revision: D6229181 Pulled By: sagar0 fbshipit-source-id: f8a44fe67f5a05df94553fdb233b21e54e88cc34	2017-11-03 14:42:08 -07:00
Prashant D	4c8f336401	util: Fix coverity issues Summary: util/concurrent_arena.h: CID 1396145 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR) 2. uninit_member: Non-static class member free_begin_ is not initialized in this constructor nor in any functions that it calls. 94 Shard() : allocated_and_unused_(0) {} util/dynamic_bloom.cc: 1. Condition hash_func == NULL, taking true branch. CID 1322821 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR) 3. uninit_member: Non-static class member data_ is not initialized in this constructor nor in any functions that it calls. 47 hash_func_(hash_func == nullptr ? &BloomHash : hash_func) {} 48 util/file_reader_writer.h: 204 private: 205 AlignedBuffer buffer_; member_not_init_in_gen_ctor: The compiler-generated constructor for this class does not initialize buffer_offset_. 206 uint64_t buffer_offset_; CID 1418246 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR) member_not_init_in_gen_ctor: The compiler-generated constructor for this class does not initialize buffer_len_. 207 size_t buffer_len_; 208}; util/thread_local.cc: 341#endif CID 1322795 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR) 3. uninit_member: Non-static class member pthread_key_ is not initialized in this constructor nor in any functions that it calls. 342} 40struct ThreadData { 2. uninit_member: Non-static class member next is not initialized in this constructor nor in any functions that it calls. CID 1400668 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR) 4. uninit_member: Non-static class member prev is not initialized in this constructor nor in any functions that it calls. 41 explicit ThreadData(ThreadLocalPtr::StaticMeta* _inst) : entries(), inst(_inst) {} 42 std::vector<Entry> entries; 1. member_decl: Class member declaration for next. 43 ThreadData* next; 3. member_decl: Class member declaration for prev. 44 ThreadData* prev; 45 ThreadLocalPtr::StaticMeta* inst; 46}; Closes https://github.com/facebook/rocksdb/pull/3123 Differential Revision: D6233566 Pulled By: sagar0 fbshipit-source-id: aa2068790ea69787a0035c0db39d59b0c25108db	2017-11-03 14:42:08 -07:00
Andrew Kryczka	cfb120f737	fix CopyFile status checks Summary: copied from internal diff D6156261 Closes https://github.com/facebook/rocksdb/pull/3124 Differential Revision: D6230167 Pulled By: ajkr fbshipit-source-id: 17926bb1152d607556364e3aacfec0ef3c115748	2017-11-03 11:57:10 -07:00
Yi Wu	d956169563	Fix clang build error Summary: Fix cast from size_t to unsigned int. Closes https://github.com/facebook/rocksdb/pull/3125 Differential Revision: D6232863 Pulled By: yiwu-arbug fbshipit-source-id: 4c6131168b1faec26f7820b2cf4a09c242d323b7	2017-11-03 11:26:54 -07:00
Yi Wu	2581c0a5a1	Blob DB: Fix BlobDBTest::SnapshotAndGarbageCollection asan failure Summary: Fix unreleased snapshot at the end of the test. Closes https://github.com/facebook/rocksdb/pull/3126 Differential Revision: D6232867 Pulled By: yiwu-arbug fbshipit-source-id: 651ca3144fc573ea2ab0ab20f0a752fb4a101d26	2017-11-03 10:26:59 -07:00
Andrew Kryczka	24ad430600	pass key/value samples through zstd compression dictionary generator Summary: Instead of using samples directly, we now support passing the samples through zstd's dictionary generator when `CompressionOptions::zstd_max_train_bytes` is set to nonzero. If set to zero, we will use the samples directly as the dictionary -- same as before. Note this is the first step of #2987, extracted into a separate PR per reviewer request. Closes https://github.com/facebook/rocksdb/pull/3057 Differential Revision: D6116891 Pulled By: ajkr fbshipit-source-id: 70ab13cc4c734fa02e554180eed0618b75255497	2017-11-02 22:56:36 -07:00
Andrew Kryczka	c4c1f961e7	dynamically change current memtable size Summary: Previously setting `write_buffer_size` with `SetOptions` would only apply to new memtables. An internal user wanted it to take effect immediately, instead of at an arbitrary future point, to prevent OOM. This PR makes the memtable's size mutable, and makes `SetOptions()` mutate it. There is one case when we preserve the old behavior, which is when memtable prefix bloom filter is enabled and the user is increasing the memtable's capacity. That's because the prefix bloom filter's size is fixed and wouldn't work as well on a larger memtable. Closes https://github.com/facebook/rocksdb/pull/3119 Differential Revision: D6228304 Pulled By: ajkr fbshipit-source-id: e44bd9d10a5f8c9d8c464bf7436070bb3eafdfc9	2017-11-02 22:28:10 -07:00
Zhongyi Xie	30e4e01e05	add missing else Summary: Closes https://github.com/facebook/rocksdb/pull/3121 Differential Revision: D6229415 Pulled By: miasantreble fbshipit-source-id: 57c7ad2fddf5dd6b8d7e3aaf6f62348151327dfb	2017-11-02 22:28:06 -07:00
Prashant D	602fe9454c	Fix coverity issues in include/rocksdb Summary: include/rocksdb/metadata.h: struct ColumnFamilyMetaData { CID 1322804 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR) 2. uninit_member: Non-static class member file_count is not initialized in this constructor nor in any functions that it calls. struct SstFileMetaData { 2. uninit_member: Non-static class member size is not initialized in this constructor nor in any functions that it calls. 4. uninit_member: Non-static class member smallest_seqno is not initialized in this constructor nor in any functions that it calls. 6. uninit_member: Non-static class member largest_seqno is not initialized in this constructor nor in any functions that it calls. 8. uninit_member: Non-static class member num_reads_sampled is not initialized in this constructor nor in any functions that it calls. CID 1322807 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR) 10. uninit_member: Non-static class member being_compacted is not initialized in this constructor nor in any functions that it calls. include/rocksdb/sst_file_writer.h: struct ExternalSstFileInfo { 2. uninit_member: Non-static class member sequence_number is not initialized in this constructor nor in any functions that it calls. 4. uninit_member: Non-static class member file_size is not initialized in this constructor nor in any functions that it calls. 6. uninit_member: Non-static class member num_entries is not initialized in this constructor nor in any functions that it calls. CID 1351697 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR) 8. uninit_member: Non-static class member version is not initialized in this constructor nor in any functions that it calls. 31 ExternalSstFileInfo() {} include/rocksdb/utilities/transaction.h: explicit Transaction(const TransactionDB* db) {} 2. uninit_member: Non-static class member log_number_ is not initialized in this constructor nor in any functions that it calls. CID 1396133 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR) 4. uninit_member: Non-static class member field txn_state_._M_i is not initialized in this constructor nor in any functions that it calls. 473 Transaction() {} Closes https://github.com/facebook/rocksdb/pull/3100 Differential Revision: D6227651 Pulled By: sagar0 fbshipit-source-id: 5caa4a2cf9471d1f9c3c073f81473636e1f0aa14	2017-11-02 17:56:48 -07:00
Yi Wu	62578d80c1	Blob DB: Add compaction filter to remove expired blob index entries Summary: After adding expiration to blob index in #3066, we are now able to add a compaction filter to cleanup expired blob index entries. Closes https://github.com/facebook/rocksdb/pull/3090 Differential Revision: D6183812 Pulled By: yiwu-arbug fbshipit-source-id: 9cb03267a9702975290e758c9c176a2c03530b83	2017-11-02 17:27:38 -07:00
Sagar Vemuri	76c3fbd651	Add Memtable Read Tier to RocksJava Summary: This options was introduced in the C++ API in #1953 . Closes https://github.com/facebook/rocksdb/pull/3064 Differential Revision: D6139010 Pulled By: sagar0 fbshipit-source-id: 164de11d539d174cf3afe7cd40e667049f44b0bc	2017-11-02 17:27:37 -07:00
Yi Wu	7bfa88037e	Blob DB: fix snapshot handling Summary: Blob db will keep blob file if data in the file is visible to an active snapshot. Before this patch it checks whether there is an active snapshot has sequence number greater than the earliest sequence in the file. This is problematic since we take snapshot on every read, if it keep having reads, old blob files will not be cleanup. Change to check if there is an active snapshot falls in the range of [earliest_sequence, obsolete_sequence) where obsolete sequence is 1. if data is relocated to another file by garbage collection, it is the latest sequence at the time garbage collection finish 2. otherwise, it is the latest sequence of the file Closes https://github.com/facebook/rocksdb/pull/3087 Differential Revision: D6182519 Pulled By: yiwu-arbug fbshipit-source-id: cdf4c35281f782eb2a9ad6a87b6727bbdff27a45	2017-11-02 15:58:27 -07:00
Yi Wu	f662f8f0b6	Blob DB: option to enable garbage collection Summary: Add an option to enable/disable auto garbage collection, where we keep counting how many keys have been evicted by either deletion or compaction and decide whether to garbage collect a blob file. Default disable auto garbage collection for now since the whole logic is not fully tested and we plan to make major change to it. Closes https://github.com/facebook/rocksdb/pull/3117 Differential Revision: D6224756 Pulled By: yiwu-arbug fbshipit-source-id: cdf53bdccec96a4580a2b3a342110ad9e8864dfe	2017-11-02 15:58:27 -07:00
Yi Wu	167ba599ec	Blob DB: Fix flaky BlobDBTest::GCExpiredKeyWhileOverwriting test Summary: The test intent to wait until key being overwritten until proceed with garbage collection. It failed to wait for `PutUntil` finally finish. Fixing it. Closes https://github.com/facebook/rocksdb/pull/3116 Differential Revision: D6222833 Pulled By: yiwu-arbug fbshipit-source-id: fa9b57a772b92a66cf250b44e7975c43f62f45c5	2017-11-02 13:27:34 -07:00
Sagar Vemuri	25ac1697b4	Blob DB: Evict oldest blob file when close to blob db size limit Summary: Evict oldest blob file and put it in obsolete_files list when close to blob db size limit. The file will be delete when the `DeleteObsoleteFiles` background job runs next time. For now I set `kEvictOldestFileAtSize` constant, which controls when to evict the oldest file, at 90%. It could be tweaked or made into an option if really needed; I didn't want to expose it as an option pre-maturely as there are already too many :) . Closes https://github.com/facebook/rocksdb/pull/3094 Differential Revision: D6187340 Pulled By: sagar0 fbshipit-source-id: 687f8262101b9301bf964b94025a2fe9d8573421	2017-11-02 12:11:21 -07:00
Prashant D	3c208e7616	HistogramStat: Handle divide by zero situation Summary: The num() might return cur_num as 0 and we are making sure that cur_num will not be 0 down the path. The mult variable is being set to 100.0/cur_num which makes program crash when cur_num is 0. Closes https://github.com/facebook/rocksdb/pull/3105 Differential Revision: D6222594 Pulled By: ajkr fbshipit-source-id: 986154709897ff4dbbeb0e8aa81eb8c0b2a2db76	2017-11-02 11:41:50 -07:00
Maysam Yabandeh	25fbd9a996	Remove the experimental notes about partitioning Summary: This patch will remove the existing comments that declare partitioning indexes and filters as experimental. Closes https://github.com/facebook/rocksdb/pull/3115 Differential Revision: D6222227 Pulled By: maysamyabandeh fbshipit-source-id: 6179ec43b22c518494051b674d91c9e1b54d4ac0	2017-11-02 11:14:30 -07:00
Maysam Yabandeh	60d83df23d	WritePrepared Txn: Move DB class to its own file Summary: Move WritePreparedTxnDB from pessimistic_transaction_db.h to its own header, write_prepared_txn_db.h Closes https://github.com/facebook/rocksdb/pull/3114 Differential Revision: D6220987 Pulled By: maysamyabandeh fbshipit-source-id: 18893fb4fdc6b809fe117dabb544080f9b4a301b	2017-11-02 11:14:30 -07:00
Andrew Kryczka	6778690b51	fix duplicate definition of GetEntryType() Summary: It's also defined in db/dbformat.cc per 7fe3b32896ecbb21d67ec52fccb713cb9bc6a644 Closes https://github.com/facebook/rocksdb/pull/3111 Differential Revision: D6219140 Pulled By: ajkr fbshipit-source-id: 0f2b14e41457334a4665c6b7e3f42f1a060a0f35	2017-11-01 22:56:17 -07:00
Andrew Kryczka	cd124215df	release 5.9 Summary: updated HISTORY.md and version.h for the release. Closes https://github.com/facebook/rocksdb/pull/3110 Differential Revision: D6218645 Pulled By: ajkr fbshipit-source-id: 99ab8473e9088b02d7596e92351cce7a60a99e93	2017-11-01 21:26:14 -07:00
Maysam Yabandeh	02693f64fc	WritePrepared Txn: ValidateSnapshot Summary: Implements ValidateSnapshot for WritePrepared txns and also adds a unit test to clarify the contract of this function. Closes https://github.com/facebook/rocksdb/pull/3101 Differential Revision: D6199405 Pulled By: maysamyabandeh fbshipit-source-id: ace509934c307ea5d26f4bbac5f836d7c80fd240	2017-11-01 19:11:09 -07:00
Mikhail Antonov	7fe3b32896	Added support for differential snapshots Summary: The motivation for this PR is to add to RocksDB support for differential (incremental) snapshots, as snapshot of the DB changes between two points in time (one can think of it as diff between to sequence numbers, or the diff D which can be thought of as an SST file or just set of KVs that can be applied to sequence number S1 to get the database to the state at sequence number S2). This feature would be useful for various distributed storages layers built on top of RocksDB, as it should help reduce resources (time and network bandwidth) needed to recover and rebuilt DB instances as replicas in the context of distributed storages. From the API standpoint that would like client app requesting iterator between (start seqnum) and current DB state, and reading the "diff". This is a very draft PR for initial review in the discussion on the approach, i'm going to rework some parts and keep updating the PR. For now, what's done here according to initial discussions: Preserving deletes: - We want to be able to optionally preserve recent deletes for some defined period of time, so that if a delete came in recently and might need to be included in the next incremental snapshot it would't get dropped by a compaction. This is done by adding new param to Options (preserve deletes flag) and new variable to DB Impl where we keep track of the sequence number after which we don't want to drop tombstones, even if they are otherwise eligible for deletion. - I also added a new API call for clients to be able to advance this cutoff seqnum after which we drop deletes; i assume it's more flexible to let clients control this, since otherwise we'd need to keep some kind of timestamp < -- > seqnum mapping inside the DB, which sounds messy and painful to support. Clients could make use of it by periodically calling GetLatestSequenceNumber(), noting the timestamp, doing some calculation and figuring out by how much we need to advance the cutoff seqnum. - Compaction codepath in compaction_iterator.cc has been modified to avoid dropping tombstones with seqnum > cutoff seqnum. Iterator changes: - couple params added to ReadOptions, to optionally allow client to request internal keys instead of user keys (so that client can get the latest value of a key, be it delete marker or a put), as well as min timestamp and min seqnum. TableCache changes: - I modified table_cache code to be able to quickly exclude SST files from iterators heep if creation_time on the file is less then iter_start_ts as passed in ReadOptions. That would help a lot in some DB settings (like reading very recent data only or using FIFO compactions), but not so much for universal compaction with more or less long iterator time span. What's left: - Still looking at how to best plug that inside DBIter codepath. So far it seems that FindNextUserKeyInternal only parses values as UserKeys, and iter->key() call generally returns user key. Can we add new API to DBIter as internal_key(), and modify this internal method to optionally set saved_key_ to point to the full internal key? I don't need to store actual seqnum there, but I do need to store type. Closes https://github.com/facebook/rocksdb/pull/2999 Differential Revision: D6175602 Pulled By: mikhail-antonov fbshipit-source-id: c779a6696ee2d574d86c69cec866a3ae095aa900	2017-11-01 18:56:43 -07:00
Maysam Yabandeh	17731a43a6	WritePrepared Txn: Optimize for recoverable state Summary: GetCommitTimeWriteBatch is currently used to store some state as part of commit in 2PC. In MyRocks it is specifically used to store some data that would be needed only during recovery. So it is not need to be stored in memtable right after each commit. This patch enables an optimization to write the GetCommitTimeWriteBatch only to the WAL. The batch will be written to memtable during recovery when the WAL is replayed. To cover the case when WAL is deleted after memtable flush, the batch is also buffered and written to memtable right before each memtable flush. Closes https://github.com/facebook/rocksdb/pull/3071 Differential Revision: D6148023 Pulled By: maysamyabandeh fbshipit-source-id: 2d09bae5565abe2017c0327421010d5c0d55eaa7	2017-11-01 17:26:46 -07:00
Maysam Yabandeh	c1cf94c787	WritePrepared Txn: sort indexes before batch collapse Summary: The collapse of duplicate keys in write batch needs to sort the indexes of duplicate keys since it only checks the index in the batch with the head of the list of duplicate keys. Closes https://github.com/facebook/rocksdb/pull/3093 Differential Revision: D6186800 Pulled By: maysamyabandeh fbshipit-source-id: abc9ae8c2f1840445a5584f925cf86ecc6f37154	2017-11-01 08:56:57 -07:00
Yi Wu	f6082d1944	Blob DB: cleanup unused options Summary: * cleanup num_concurrent_simple_blobs. We don't do concurrent writes (by taking write_mutex_) so it doesn't make sense to have multiple non TTL files open. We can revisit later when we want to improve writes. * cleanup eviction callback. we don't have plan to use it now. * rename s/open_simple_blob_files_/open_non_ttl_file_/ and s/open_blob_files_/open_ttl_files_/ to avoid confusion. Closes https://github.com/facebook/rocksdb/pull/3088 Differential Revision: D6182598 Pulled By: yiwu-arbug fbshipit-source-id: 99e6f5e01fa66d31309cdb06ce48502464bac6ad	2017-10-31 16:42:08 -07:00
Sagar Vemuri	f5078dde2d	Blob DB: Initialize all fields in Blob Header, Footer and Record structs Summary: Fixing un-itializations caught by valgrind. Closes https://github.com/facebook/rocksdb/pull/3103 Differential Revision: D6200195 Pulled By: sagar0 fbshipit-source-id: bf35a3fb03eb1d308e4c5ce30dee1e345d7b03b3	2017-10-31 16:42:08 -07:00
Shaohua Li	33c7d4ccd9	Make writable_file_max_buffer_size dynamic Summary: The DBOptions::writable_file_max_buffer_size can be changed dynamically. Closes https://github.com/facebook/rocksdb/pull/3053 Differential Revision: D6152720 Pulled By: shligit fbshipit-source-id: aa0c0cfcfae6a54eb17faadb148d904797c68681	2017-10-31 13:56:35 -07:00
Prashant D	c1be8d86c6	Fix removed structurally dead return statement Summary: There seems to be a typo mistake in env ReuseWritableFile func where status is being returned twice. Closes https://github.com/facebook/rocksdb/pull/3099 Differential Revision: D6196204 Pulled By: ajkr fbshipit-source-id: abb6e3e1c1e772dd485fc39e7f1b9d502fa188fe	2017-10-31 01:26:13 -07:00
Andrew Kryczka	4d43c6a6a4	db_stress snapshot compatibility with reopens Summary: - Release all snapshots before crashing and reopening the DB. Without this, we may attempt to release snapshots from an old DB using a new DB. That tripped an assertion. - Release multiple snapshots in the same operation if needed. Without this, we would sometimes leak snapshots. Closes https://github.com/facebook/rocksdb/pull/3098 Differential Revision: D6194923 Pulled By: ajkr fbshipit-source-id: b9c89bcca7ebcbb6c7802c616f9d1175a005aadf	2017-10-31 01:26:08 -07:00
Andrew Kryczka	b7bc9cc038	fix tracking oldest snapshot for bottom-level compaction Summary: The assertion was caught by `MySQLStyleTransactionTest/MySQLStyleTransactionTest.TransactionStressTest/5` when run in a loop. The caller doesn't track whether the released snapshot is oldest, so let this function handle that case. Closes https://github.com/facebook/rocksdb/pull/3080 Differential Revision: D6185257 Pulled By: ajkr fbshipit-source-id: 4b3015c11db5d31e46521a00af568546ef4558cd	2017-10-30 00:55:58 -07:00
Yi Wu	792ef10ca8	Return Status::InvalidArgument if user request sync write while disabling WAL Summary: write_options.sync = true and write_options.disableWAL is incompatible. When WAL is disabled, we are not able to persist the write immediately. Return an error in this case to avoid misuse of the options. Closes https://github.com/facebook/rocksdb/pull/3086 Differential Revision: D6176822 Pulled By: yiwu-arbug fbshipit-source-id: 1eb10028c14fe7d7c13c8bc12c0ef659f75aa071	2017-10-28 22:11:18 -07:00
Andrew Kryczka	6a9335dbbb	always drop tombstones compacted to bottommost level Summary: Problem was in bottommost compaction, when an L0->L0 compaction happened and L0 was bottommost. Then we'd preserve tombstones according to `Compaction::KeyNotExistsBeyondOutputLevel`, while zeroing seqnum according to `CompactionIterator::PrepareOutput`, thus triggering the assertion in `PrepareOutput`. To fix, we can just drop tombstones in L0->L0 when the output is "bottommost", i.e., the compaction includes the oldest L0 file and there's nothing at lower levels. Closes https://github.com/facebook/rocksdb/pull/3085 Differential Revision: D6175742 Pulled By: ajkr fbshipit-source-id: 8ab19a2e001496f362e9eb0a71757e2f6ecfdb3b	2017-10-27 15:56:35 -07:00
Yi Wu	84a04af9a9	TableProperty::oldest_key_time defaults to 0 Summary: We don't propagate TableProperty::oldest_key_time on compaction and just write the default value to SST files. It is more natural to default the value to 0. Also revert db_sst_test back to before #2842. Closes https://github.com/facebook/rocksdb/pull/3079 Differential Revision: D6165702 Pulled By: yiwu-arbug fbshipit-source-id: ca3ce5928d96ae79a5beb12bb7d8c640a71478a0	2017-10-27 15:00:05 -07:00
Islam AbdelRahman	05993155ef	Mark files as trash by using .trash extension Summary: SstFileManager move files that need to be deleted into a trash directory. Deprecate this behaviour and instead add ".trash" extension to files that need to be deleted Closes https://github.com/facebook/rocksdb/pull/2970 Differential Revision: D5976805 Pulled By: IslamAbdelRahman fbshipit-source-id: 27374ece4315610b2792c30ffcd50232d4c9a343	2017-10-27 13:27:12 -07:00
Yi Wu	3ebb7ba7b9	Blob DB: update blob file format Summary: Changing blob file format and some code cleanup around the change. The change with blob log format are: * Remove timestamp field in blob file header, blob file footer and blob records. The field is not being use and often confuse with expiration field. * Blob file header now come with column family id, which always equal to default column family id. It leaves room for future support of column family. * Compression field in blob file header now is a standalone byte (instead of compact encode with flags field) * Blob file footer now come with its own crc. * Key length now being uint64_t instead of uint32_t * Blob CRC now checksum both key and value (instead of value only). * Some reordering of the fields. The list of cleanups: * Better inline comments in blob_log_format.h * rename ttlrange_t and snrange_t to ExpirationRange and SequenceRange respectively. * simplify blob_db::Reader * Move crc checking logic to inside blob_log_format.cc Closes https://github.com/facebook/rocksdb/pull/3081 Differential Revision: D6171304 Pulled By: yiwu-arbug fbshipit-source-id: e4373e0d39264441b7e2fbd0caba93ddd99ea2af	2017-10-27 13:27:12 -07:00
Dmitri Smirnov	682db81385	Enable cacheline_aligned_alloc() to allocate from jemalloc if enabled. Summary: Reuse WITH_JEMALLOC option in preparation for module search unification. Move jemalloc overrides into a separate .cc Remote obsolete JEMALLOC_NOINIT option. Closes https://github.com/facebook/rocksdb/pull/3078 Differential Revision: D6174826 Pulled By: yiwu-arbug fbshipit-source-id: 9970a0289b4490272d15853920d9d7531af91140	2017-10-27 13:27:12 -07:00
Prashant D	d9240b548c	Fix coverity uninitialized fields warnings in lru_cache Summary: Coverity uninitialized member variable warnings in lru_cache Closes https://github.com/facebook/rocksdb/pull/3082 Differential Revision: D6173062 Pulled By: sagar0 fbshipit-source-id: 7bcfc653457bd362d46045d06527838c9a6adad6	2017-10-27 11:26:43 -07:00
Prashant D	50e95a63dd	Fix coverity issues column_family, compaction_db/iterator Summary: db/column_family.h : 79 ColumnFamilyHandleInternal() CID 1322806 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR) 2. uninit_member: Non-static class member internal_cfd_ is not initialized in this constructor nor in any functions that it calls. 80 : ColumnFamilyHandleImpl(nullptr, nullptr, nullptr) {} db/compacted_db_impl.cc: 18CompactedDBImpl::CompactedDBImpl( 19 const DBOptions& options, const std::string& dbname) 20 : DBImpl(options, dbname) { 2. uninit_member: Non-static class member cfd_ is not initialized in this constructor nor in any functions that it calls. 4. uninit_member: Non-static class member version_ is not initialized in this constructor nor in any functions that it calls. CID 1396120 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR) 6. uninit_member: Non-static class member user_comparator_ is not initialized in this constructor nor in any functions that it calls. 21} db/compaction_iterator.cc: 9. uninit_member: Non-static class member current_user_key_sequence_ is not initialized in this constructor nor in any functions that it calls. 11. uninit_member: Non-static class member current_user_key_snapshot_ is not initialized in this constructor nor in any functions that it calls. CID 1419855 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR) 13. uninit_member: Non-static class member current_key_committed_ is not initialized in this constructor nor in any functions that it calls. Closes https://github.com/facebook/rocksdb/pull/3084 Differential Revision: D6172999 Pulled By: sagar0 fbshipit-source-id: 084d73393faf8022c01359cfb445807b6a782460	2017-10-27 11:26:42 -07:00
Prashant D	47166baeac	Fix coverity uninitialized fields warnings Pulled By: ajkr Differential Revision: D6170448 fbshipit-source-id: 5fd6d1608fc0df27c94d9f5059315ce7f79b8f5c	2017-10-26 21:11:50 -07:00
Prashant D	67b29e26be	Fix coverity issue for MutableDBOptions default constructor Summary: 228MutableDBOptions::MutableDBOptions() 229 : max_background_jobs(2), 230 base_background_compactions(-1), 231 max_background_compactions(-1), 232 avoid_flush_during_shutdown(false), 233 delayed_write_rate(2 * 1024U * 1024U), 234 max_total_wal_size(0), 235 delete_obsolete_files_period_micros(6ULL * 60 * 60 * 1000000), 236 stats_dump_period_sec(600), 2. uninit_member: Non-static class member bytes_per_sync is not initialized in this constructor nor in any functions that it calls. CID 1419857 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR) 4. uninit_member: Non-static class member wal_bytes_per_sync is not initialized in this constructor nor in any functions that it calls. 237 max_open_files(-1) {} Closes https://github.com/facebook/rocksdb/pull/3069 Differential Revision: D6170424 Pulled By: ajkr fbshipit-source-id: 1f94e86b87611ad2330b8b1707911150978d68b8	2017-10-26 20:56:45 -07:00
Andrew Kryczka	95667383db	implement lower bound for iterators Summary: - for `SeekToFirst()`, just convert it to a regular `Seek()` if lower bound is specified - for operations that iterate backwards over user keys (`SeekForPrev`, `SeekToLast`, `Prev`), change `PrevInternal` to check whether user key went below lower bound every time the user key changes -- same approach we use to ensure we stay within a prefix when `prefix_same_as_start=true`. Closes https://github.com/facebook/rocksdb/pull/3074 Differential Revision: D6158654 Pulled By: ajkr fbshipit-source-id: cb0e3a922e2650d2cd4d1c6e1c0f1e8b729ff518	2017-10-26 17:27:42 -07:00
Yi Wu	5a2a6483dc	Blob DB: Inline small values in base DB Summary: Adding the `min_blob_size` option to allow storing small values in base db (in LSM tree) together with the key. The goal is to improve performance for small values, while taking advantage of blob db's low write amplification for large values. Also adding expiration timestamp to blob index. It will be useful to evict stale blob indexes in base db by adding a compaction filter. I'll work on the compaction filter in future patches. See blob_index.h for the new blob index format. There are 4 cases when writing a new key: * small value w/o TTL: put in base db as normal value (i.e. ValueType::kTypeValue) * small value w/ TTL: put (type, expiration, value) to base db. * large value w/o TTL: write value to blob log and put (type, file, offset, size, compression) to base db. * large value w/TTL: write value to blob log and put (type, expiration, file, offset, size, compression) to base db. Closes https://github.com/facebook/rocksdb/pull/3066 Differential Revision: D6142115 Pulled By: yiwu-arbug fbshipit-source-id: 9526e76e19f0839310a3f5f2a43772a4ad182cd0	2017-10-26 12:30:54 -07:00
Andrew Kryczka	9b18cc2363	single-file bottom-level compaction when snapshot released Summary: When snapshots are held for a long time, files may reach the bottom level containing overwritten/deleted keys. We previously had no mechanism to trigger compaction on such files. This particularly impacted DBs that write to different parts of the keyspace over time, as such files would never be naturally compacted due to second-last level files moving down. This PR introduces a mechanism for bottommost files to be recompacted upon releasing all snapshots that prevent them from dropping their deleted/overwritten keys. - Changed `CompactionPicker` to compact files in `BottommostFilesMarkedForCompaction()`. These are the last choice when picking. Each file will be compacted alone and output to the same level in which it originated. The goal of this type of compaction is to rewrite the data excluding deleted/overwritten keys. - Changed `ReleaseSnapshot()` to recompute the bottom files marked for compaction when the oldest existing snapshot changes, and schedule a compaction if needed. We cache the value that oldest existing snapshot needs to exceed in order for another file to be marked in `bottommost_files_mark_threshold_`, which allows us to avoid recomputing marked files for most snapshot releases. - Changed `VersionStorageInfo` to track the list of bottommost files, which is recomputed every time the version changes by `UpdateBottommostFiles()`. The list of marked bottommost files is first computed in `ComputeBottommostFilesMarkedForCompaction()` when the version changes, but may also be recomputed when `ReleaseSnapshot()` is called. - Extracted core logic of `Compaction::IsBottommostLevel()` into `VersionStorageInfo::RangeMightExistAfterSortedRun()` since logic to check whether a file is bottommost is now necessary outside of compaction. Closes https://github.com/facebook/rocksdb/pull/3009 Differential Revision: D6062044 Pulled By: ajkr fbshipit-source-id: 123d201cf140715a7d5928e8b3cb4f9cd9f7ad21	2017-10-25 16:30:37 -07:00
Sagar Vemuri	96e3a600ba	Return write error on reaching blob dir size limit Summary: I found that we continue accepting writes even when the blob db goes beyond the configured blob directory size limit. Now, we return an error for writes on reaching `blob_dir_size` limit and if `is_fifo` is set to false. (We cannot just drop any file when `is_fifo` is true.) Deleting the oldest file when `is_fifo` is true will be handled in a later PR. Closes https://github.com/facebook/rocksdb/pull/3060 Differential Revision: D6136156 Pulled By: sagar0 fbshipit-source-id: 2f11cb3f2eedfa94524fbfa2613dd64bfad7a23c	2017-10-25 16:30:37 -07:00
Islam AbdelRahman	addfe1ef4b	Fix tombstone scans in SeekForPrev outside prefix Summary: When doing a Seek() or SeekForPrev() we should stop the moment we see a key with a different prefix as start if ReadOptions:: prefix_same_as_start was set to true Right now we don't stop if we encounter a tombstone outside the prefix while executing SeekForPrev() Closes https://github.com/facebook/rocksdb/pull/3067 Differential Revision: D6149638 Pulled By: IslamAbdelRahman fbshipit-source-id: 7f659862d2bf552d3c9104a360c79439ceba2f18	2017-10-25 15:12:00 -07:00

1 2 3 4 5 ...

6655 Commits