rocksdb

Author	SHA1	Message	Date
Yanqin Jin	22368965a0	Modify verification logic of ObsoleteOptionsFileTest (#4218 ) Summary: The current verification logic does not consider the case in which multiple threads (foreground and background) may execute `PurgeObsoleteFiles` function simultaneously. Each invocation will trigger the callback adding elements to a vector. Then we verify the elements in the vector, which can fail sometimes. The solution is to give up checking the elements. Instead, we check the number of OPTIONS file in the database dir. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4218 Differential Revision: D9128727 Pulled By: riversand963 fbshipit-source-id: 2b13b705fb21bc0ddd41940c4ec9b6b0c8d88224	2018-08-03 13:57:40 -07:00
Sagar Vemuri	fefdac1004	Fix lite build failure in db_bench due to trace/replay (#4225 ) Summary: Fix lite build failure in db_bench due to trace/replay feature. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4225 Differential Revision: D9153303 Pulled By: sagar0 fbshipit-source-id: 9f7a8035429d0dcdbe99616d11389ed7bccf44be	2018-08-03 11:58:55 -07:00
DorianZheng	f9373e2d5c	Make sure to call ReleaseFileNumberFromPendingOutputs Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4219 Differential Revision: D9144294 Pulled By: riversand963 fbshipit-source-id: e46b72e5f8a149dc7a0512e38edcd0ddb0150f30	2018-08-02 18:57:34 -07:00
Pooja Malik	9dbf39399e	Rules Advisor: some fixes to support fetching stats from ODS (#4223 ) Summary: This PR includes fixes for some bugs that I encountered while testing the Optimizer with ODS stats support. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4223 Differential Revision: D9140786 Pulled By: poojam23 fbshipit-source-id: 045cb3f27d075c2042040ac2d561938349419516	2018-08-02 15:42:42 -07:00
Pooja Malik	892a156267	Advisor: README and blog, and also tests for DBBenchRunner, DatabaseOptions (#4201 ) Summary: This pull request adds a README file and a blog post for the Advisor tool. It also adds the missing tests for some Optimizer modules. Some comments are added to the classes being tested for improved readability. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4201 Reviewed By: maysamyabandeh Differential Revision: D9125311 Pulled By: poojam23 fbshipit-source-id: aefcf2f06eaa05490cc2834ef5aa6e21f0d1dc55	2018-08-01 16:13:09 -07:00
Andrew Kryczka	f8f6983f89	Skip range deletions at seqno zero when collapsing (#4216 ) Summary: `CollapsedRangeDelMap` internally uses seqno zero as a sentinel value to denote a gap between range tombstones or the end of range tombstones. It therefore expects to never have consecutive sentinel tombstones. However, since `DeleteRange` is now supported in `SstFileWriter`, an ingested file may contain range tombstones, and that ingested file may be assigned global seqno zero. When such tombstones are added to the collapsed map, they resemble sentinel tombstones due to having seqno zero. Then, the invariant mentioned above about never having consecutive sentinel tombstones can be violated. The symptom of this violation was dereferencing the `end()` iterator (#4204). The fix in this PR is to not add range tombstones with seqno zero to the collapsed map. They're not needed anyways since they can't possibly cover anything (in case of a key and a range tombstone with the same seqno, the key is visible). Pull Request resolved: https://github.com/facebook/rocksdb/pull/4216 Differential Revision: D9121716 Pulled By: ajkr fbshipit-source-id: f5b78a70bea9527354603ea7ac8542a7e2b6a210	2018-08-01 12:12:02 -07:00
Sagar Vemuri	12b6cdeed3	Trace and Replay for RocksDB (#3837 ) Summary: A framework for tracing and replaying RocksDB operations. A binary trace file is created by capturing the DB operations, and it can be replayed back at the same rate using db_bench. - Column-families are supported - Multi-threaded tracing is supported. - TraceReader and TraceWriter are exposed to the user, so that tracing to various destinations can be enabled (say, to other messaging/logging services). By default, a FileTraceReader and FileTraceWriter are implemented to capture to a file and replay from it. - This is not yet ideal to be enabled in production due to large performance overhead, but it can be safely tried out in a shadow setup, say, for analyzing RocksDB operations. Currently supported DB operations: - Writes: -- Put -- Merge -- Delete -- SingleDelete -- DeleteRange -- Write - Reads: -- Get (point lookups) Pull Request resolved: https://github.com/facebook/rocksdb/pull/3837 Differential Revision: D7974837 Pulled By: sagar0 fbshipit-source-id: 8ec65aaf336504bc1f6ed0feae67f6ed5ef97a72	2018-08-01 00:27:08 -07:00
Fenggang Wu	ee7617167f	DataBlockHashIndex: Specify that DataBlockHashIndex is not yet implemented in the comment Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4203 Differential Revision: D9090912 Pulled By: fgwu fbshipit-source-id: 6a68be83693ddf2a5c060290382141f0d2fb400b	2018-07-31 11:43:08 -07:00
Andrew Kryczka	a1a546a634	Avoid integer division in filter probing (#4071 ) Summary: The cache line size was computed dynamically based on the length of the filter bits, and the number of cache-lines encoded in the footer. This calculation had to be dynamic in case users migrate their data between platforms with different cache line sizes. The downside, though, was bloom filter probing became expensive as it did integer mod and division. However, since we know all possible cache line sizes are powers of two, we should be able to use bit shift to find the cache line, and bitwise-and to find the bit within the cache line. To do this, we compute the log-base-two of cache line size in the constructor, and use that in bitwise operations to replace division/mod. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4071 Differential Revision: D8684067 Pulled By: ajkr fbshipit-source-id: 50298872fba5acd01e8269cd7abcc51a095e0f61	2018-07-30 17:57:44 -07:00
Yanqin Jin	8abafb1feb	Generalize parameters generation. (#4046 ) Summary: Make the generation of column families and keys virtual functions so that subclasses of StressTest can override them to provide custom parameter generation for more flexibility. This will be useful for future tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4046 Differential Revision: D9073382 Pulled By: riversand963 fbshipit-source-id: 2754f0fdfa5c24d95c1f92d4944bc479552fb665	2018-07-30 17:42:12 -07:00
Yanqin Jin	54de56844d	Remove random writes from SST file ingestion (#4172 ) Summary: RocksDB used to store global_seqno in external SST files written by SstFileWriter. During file ingestion, RocksDB uses `pwrite` to update the `global_seqno`. Since random write is not supported in some non-POSIX compliant file systems, external SST file ingestion is not supported on these file systems. To address this limitation, we no longer update `global_seqno` during file ingestion. Later RocksDB uses the MANIFEST and other information in table properties to deduce global seqno for externally-ingested SST files. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4172 Differential Revision: D8961465 Pulled By: riversand963 fbshipit-source-id: 4382ec85270a96be5bc0cf33758ca2b167b05071	2018-07-27 16:12:23 -07:00
Fenggang Wu	a11df583ec	Add DataBlockIndexType option in BlockBasedTableOptions (#4150 ) Summary: Added DataBlockIndexType option in BlockBasedTableOptions. ``` enum DataBlockIndexType : char { kDataBlockBinarySearch = 0, // traditional block type kDataBlockHashIndex = 1, // additional hash index appended to the end. }; ``` The default type is the traditional binary seek option: `kDataBlockBinarySearch`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4150 Differential Revision: D8895958 Pulled By: fgwu fbshipit-source-id: 480adef48104cf11d30db3bad9a73f98b4a80c10	2018-07-27 15:42:27 -07:00
DorianZheng	f5e46354d2	Protect external file when ingesting (#4099 ) Summary: If a crash happens after a hard link is established, the Recover function may reuse the file number that has already been assigned to the internal file, which will overwrite the external file. To protect the external file, we have to make sure the file number is never reused. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4099 Differential Revision: D9034092 Pulled By: riversand963 fbshipit-source-id: 3f1a737440b86aa2ef01673e5013aacbb7c33e28	2018-07-27 14:13:12 -07:00
Maysam Yabandeh	c33b32671e	Correct description of GetColumnFamilyMetaData (#4196 ) Summary: The inline doc incorrectly mentioned a return status, while the function does not return a value. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4196 Differential Revision: D9030927 Pulled By: maysamyabandeh fbshipit-source-id: 07c34dc6bf521021bf790ac1bfedb676171129ec	2018-07-27 11:42:37 -07:00
Maysam Yabandeh	e0906eb785	Clarify max_total_wal_size's scope (#4194 ) Summary: max_total_wal_size takes effect only when there is more than one column family. The patch clarifies that in the inline docs. Closes https://github.com/facebook/rocksdb/issues/4180 Pull Request resolved: https://github.com/facebook/rocksdb/pull/4194 Differential Revision: D9028767 Pulled By: maysamyabandeh fbshipit-source-id: 8d730ca7f15e76e7ee9ff88b2b48030b2d1b7078	2018-07-27 09:29:44 -07:00
Pooja Malik	134a52e144	Optimizer's skeleton: use advisor to optimize config options (#4169 ) Summary: In https://github.com/facebook/rocksdb/pull/3934 we introduced advisor scripts that make suggestions in the config options based on the log file and stats from a run of rocksdb. The optimizer runs the advisor on a benchmark application in a loop and automatically applies the suggested changes until the config options are optimized. This is a work in progress and the patch is the initial skeleton for the optimizer. The sample application that is run in the loop is currently dbbench. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4169 Reviewed By: maysamyabandeh Differential Revision: D9023671 Pulled By: poojam23 fbshipit-source-id: a6192d475c462cf6eb2b316716f97cb400fcb64d	2018-07-26 17:13:32 -07:00
Yanqin Jin	bdc6abd0b4	Enable cscope to exclude test source files (#4190 ) Summary: Usually when using cscope, the query results contain a lot of function calls in test, making it hard to browse. So this PR aims to provide an option to exclude test source files. Add a new PHONY target, tags0, to exclude test source files while using cscope. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4190 Differential Revision: D9015901 Pulled By: riversand963 fbshipit-source-id: ea9a45756ccff5b26344d37e9ff1c02c5d9736d6	2018-07-26 11:12:29 -07:00
Siying Dong	fd45495cf5	DBImpl::IngestExternalFile() should grab mutex when releasing file number in failure case (#4189 ) Summary: `995fcf7573` has a bug: ReleaseFileNumberFromPendingOutputs() added is not protected by the DB mutex. Fix it by grabbing the lock for this operation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4189 Differential Revision: D9015447 Pulled By: siying fbshipit-source-id: b8506e09a96c3f95a6fe32b5ca5fcdb9bee88937	2018-07-26 11:12:29 -07:00
Siying Dong	2a81633da2	Fix bug when seeking backward against an out-of-bound iterator (#4187 ) Summary: `92ee3350e0` introduces an out-of-bound check in BlockBasedTableIterator::Valid(). However, this flag is not reset when re-seeking in the backward direction. This caused the iterator to be invalid by mistake. Fix it by always resetting the out-of-bound flag in every seek. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4187 Differential Revision: D8996600 Pulled By: siying fbshipit-source-id: b6235ea614f71381e50e7904c4fb036300604ac1	2018-07-25 17:14:01 -07:00
Yanqin Jin	18f538038a	Increase version number to 5.16 (#4176 ) Summary: Given that we have cut 5.15, we should bump the version number to the next version, i.e. 5.16. Also update HISTORY.md cc sagar0 Pull Request resolved: https://github.com/facebook/rocksdb/pull/4176 Differential Revision: D8977965 Pulled By: riversand963 fbshipit-source-id: 481d75d2f446946f0eb2afb7e94ef894c8c87e1e	2018-07-24 13:43:33 -07:00
Fenggang Wu	8805ec2f49	DataBlockHashIndex: Standalone Implementation with Unit Test (#4139 ) Summary: The first step of the `DataBlockHashIndex` implementation. A string-based hash table is implemented and unit-tested. `DataBlockHashIndexBuilder`: `Add()` takes pairs of `<key, restart_index>` and formats them into a string when `Finish()` is called. `DataBlockHashIndex`: is initialized from the formatted string and can interpret it as a hash table. Lookup for a key is supported by iterator operations. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4139 Reviewed By: sagar0 Differential Revision: D8866764 Pulled By: fgwu fbshipit-source-id: 7f015f0098632c65979a22898a50424384730b10	2018-07-24 11:43:37 -07:00
Manuel Ung	ea212e5316	WriteUnPrepared: Implement unprepared batches for transactions (#4104 ) Summary: This adds support for writing unprepared batches based on size defined in `TransactionOptions::max_write_batch_size`. This is done by overriding methods that modify data (Put/Delete/SingleDelete/Merge) and checking first if write batch size has exceeded threshold. If so, the write batch is written to DB as an unprepared batch. Support for Commit/Rollback for unprepared batch is added as well. This has been done by simply extending the WritePrepared Commit/Rollback logic to take care of all unprep_seq numbers either when updating prepare heap, or adding to commit map. For updating the commit map, this logic exists inside `WriteUnpreparedCommitEntryPreReleaseCallback`. A test change was also made to have transactions unregister themselves when committing without prepare. This is because with write unprepared, there may be unprepared entries (which act similarly to prepared entries) already when a commit is done without prepare. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4104 Differential Revision: D8785717 Pulled By: lth fbshipit-source-id: c02006e281ec1ce00f628e2a7beec0ee73096a91	2018-07-24 00:13:18 -07:00
Chang Su	374c37da5b	move static msgs out of Status class (#4144 ) Summary: The member msgs of class Status contains all types of status messages. When users dump a Status object, msgs will confuse users. So move it out of class Status by making it a file-local static variable. Closes #3831. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4144 Differential Revision: D8941419 Pulled By: sagar0 fbshipit-source-id: 56b0510258465ff26db15aa6b04e01532e053e3d	2018-07-23 15:44:16 -07:00
Adam Retter	c6d2a7f821	Build improvements: Split docker targets and parallelize java builds Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4165 Differential Revision: D8955531 Pulled By: sagar0 fbshipit-source-id: 97d5a1375e200bde3c6414f94703504a4ed7536a	2018-07-23 13:28:37 -07:00
Siying Dong	4b0a43574a	db_stress to cover upper bound in iterators (#4162 ) Summary: db_stress doesn't cover upper or lower bound in iterators. Try to cover it by randomly assigning a random one. Also in prefix scan tests, with 50% of the chance, set next prefix as the upper bound. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4162 Differential Revision: D8953507 Pulled By: siying fbshipit-source-id: f0f04e9cb6c07cbebbb82b892ca23e0daeea708b	2018-07-23 10:45:29 -07:00
Zhongyi Xie	f95a5b2464	Avoid unnecessary big for-loop when reporting ticker stats stored in GetContext (#3490 ) Summary: Currently in `Version::Get` when reporting ticker stats stored in `GetContext`, there is a big for-loop through all `Ticker` which adds unnecessary cost to overall CPU usage. We can optimize by storing only ticker values that are used in `Get()` calls in a new struct `GetContextStats` since only a small fraction of all tickers are used in `Get()` calls. For comparison, with the new approach we only need to visit 17 values while old approach will require visiting 100+ `Ticker` Pull Request resolved: https://github.com/facebook/rocksdb/pull/3490 Differential Revision: D6969154 Pulled By: miasantreble fbshipit-source-id: fc27072965a3a94125a3e6883d20dafcf5b84029	2018-07-20 16:58:13 -07:00
Zhichao Cao	6811fb0658	Fixed the db_bench MergeRandom only access CF_default (#4155 ) Summary: While running tracing and analysis, I found that the MergeRandom benchmark in db_bench only accesses the default column family, even when -num_column_families is specified > 1. Changes: use db_with_cfh as the DB to randomly select the column family for the Merge operation if -num_column_families is specified > 1. Tested with make asan_check and verified in tracing. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4155 Differential Revision: D8907888 Pulled By: zhichao-cao fbshipit-source-id: 2b4bc8fe0e99c8f262f5be6b986c7025d62cf850	2018-07-20 15:58:54 -07:00
Siying Dong	a5e851e113	Reformatting some recent changes (#4161 ) Summary: Lint is not happy with some new code recently committed. Format them. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4161 Differential Revision: D8940582 Pulled By: siying fbshipit-source-id: c9b43b1ef8c88b5e923911058b44eb77234b36b7	2018-07-20 14:43:38 -07:00
Siying Dong	8425c8bd4d	BlockBasedTableReader: automatically adjust tail prefetch size (#4156 ) Summary: Right now we use one hard-coded prefetch size to prefetch data from the tail of the SST files. However, this may waste reads in some use cases while being inefficient in others. Introduce a way to adjust this prefetch size by tracking the 32 most recent tail reads and picking a value with which the wasted read is less than 10%. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4156 Differential Revision: D8916847 Pulled By: siying fbshipit-source-id: 8413f9eb3987e0033ed0bd910f83fc2eeaaf5758	2018-07-20 14:43:37 -07:00
Andrew Kryczka	ab35505e21	Write properties metablock last in block-based tables (#4158 ) Summary: The properties meta-block should come at the end since we always need to read it when opening a file, unlike index/filter/other meta-blocks, which are sometimes read depending on the user's configuration. This ordering will allow us to (in a future PR) do a small readahead on the end of the file to read properties and meta-index blocks with one I/O. The bulk of this PR is a refactoring of the `BlockBasedTableBuilder::Finish` function. It was previously too large with inconsistent error handling, which made it difficult to change. So I broke it up into one function per meta-block write, and tried to make error handling consistent within those functions. Then reordering the metablocks was trivial -- just reorder the calls to these helper functions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4158 Differential Revision: D8921705 Pulled By: ajkr fbshipit-source-id: 96c9cc3182eb1adf11af46adab79dbeba7b12fcc	2018-07-20 09:11:59 -07:00
Yanqin Jin	2736752b33	Fix a bug in MANIFEST group commit (#4157 ) Summary: PR #3944 introduces group commit of `VersionEdit` in MANIFEST. The implementation has a bug. When updating the log file number of each column family, we must consider only `VersionEdit`s that operate on the same column family. Otherwise, a column family may accidentally set its log file number higher than actual value, indicating that log files with smaller file number will be ignored, thus causing some updates to be lost. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4157 Differential Revision: D8916650 Pulled By: riversand963 fbshipit-source-id: 8f456cf688f17bf35ad87b38e30e899aa162f201	2018-07-19 17:27:56 -07:00
Andrew Kryczka	b5613227a9	Smaller tail readahead when not reading index/filters (#4159 ) Summary: In all cases during `BlockBasedTable::Open`, we issue at least three read requests to the file's tail: (1) footer, (2) metaindex block, and (3) properties block. Depending on the config, we may also read other metablocks like filter and index. This PR issues smaller readahead when we expect to do only the three necessary reads mentioned above. Then, 4KB should be enough (ignoring the case where there are lots of user-defined properties). We can keep doing 512KB readahead when additional reads are expected. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4159 Differential Revision: D8924002 Pulled By: ajkr fbshipit-source-id: cfc713275de4d05ce11f18571f1d72e27ccd3356	2018-07-19 16:13:22 -07:00
Dmitri Smirnov	78ab11cd71	Return new operator for Status allocations for Windows (#4128 ) Summary: Windows requires new/delete for memory allocations to be overridden. Refactor to be less intrusive. Differential Revision: D8878047 Pulled By: siying fbshipit-source-id: 35f2b5fec2f88ea48c9be926539c6469060aab36	2018-07-19 15:09:06 -07:00
Sagar Vemuri	f3801528c1	Disable DBFlushTest.SyncFail and DBTest.GroupCommitTest on Travis (#4154 ) Summary: I am temporarily disabling DBFlushTest.SyncFail and DBTest.GroupCommitTest tests on Travis until we figure out the root-cause. These tests will still continue to run locally though. I haven't been able to reproduce these failures locally so far (even on a [local Travis environment](https://docs.travis-ci.com/user/common-build-problems/#Troubleshooting-Locally-in-a-Docker-Image) ). These tests are failing way too frequently causing everyone to wonder why their PR failed on travis, and waste time in debugging. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4154 Differential Revision: D8907258 Pulled By: sagar0 fbshipit-source-id: f40068b16e9245fb3791b6a4796435d1ce1ed205	2018-07-18 18:43:11 -07:00
Pooja Malik	1857576e03	db_bench support for OPTIONS+bloom and nicer output for perf_context (#4153 ) Summary: Adding the string "PERF_CONTEXT:" before the perf_context stats are printed. Setting the filter policy if it's a block based table even when options are being loaded from the provided FLAGS_options_file. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4153 Differential Revision: D8905517 Pulled By: poojam23 fbshipit-source-id: 5956ed7882d39ec8ae654d5dadeb88727a36f0dd	2018-07-18 16:27:49 -07:00
Tomas Kolda	80afa84903	Windows JNI build fixes (#4015 ) Summary: Fixing compilation, unsatisfied link exceptions (updated list of files that needs to be linked) and warnings for Windows build. ```C++ //MSVC 2015 does not support dynamic arrays like: rocksdb::Slice key_parts[jkey_parts_len]; //I have converted to: std::vector<rocksdb::Slice> key_parts; ``` Also reusing `free_key_parts` that does the same as `free_key_value_parts` that was removed. Java elapsedTime unit test increase of sleep to 2 ms. Otherwise it was failing. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4015 Differential Revision: D8558215 Pulled By: sagar0 fbshipit-source-id: d3c34f846343f9218424da2402a2bd367bbd0aa2	2018-07-18 12:31:48 -07:00
Siying Dong	4bb1e239b5	Cap concurrent arena's shard block size to 128KB (#4147 ) Summary: Users sometimes see their memtable size far smaller than expected. They have probably hit fragmentation of shard blocks. Cap their size anyway to reduce the impact of the problem. 128KB is conservative, so I don't imagine it can cause any performance problem. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4147 Differential Revision: D8886706 Pulled By: siying fbshipit-source-id: 8528a2a4196aa4457274522e2565fd3ff28f621e	2018-07-18 10:43:54 -07:00
Yanqin Jin	79f009f22e	Release 5.15. (#4148 ) Summary: Cut 5.15.fb Pull Request resolved: https://github.com/facebook/rocksdb/pull/4148 Differential Revision: D8886802 Pulled By: riversand963 fbshipit-source-id: 6b6427ce97f5b323a7eebf92458fda8b24b0cece	2018-07-17 21:44:51 -07:00
Siying Dong	37e0fdc824	DBSSTTest.DeleteSchedulerMultipleDBPaths data race (#4146 ) Summary: Fix a minor data race in DBSSTTest.DeleteSchedulerMultipleDBPaths reported by TSAN Pull Request resolved: https://github.com/facebook/rocksdb/pull/4146 Differential Revision: D8880945 Pulled By: siying fbshipit-source-id: 25c632f685757735c59ad4ff26b2f346a443a446	2018-07-17 17:57:46 -07:00
Yi Wu	d538ebdff0	Fix write get stuck when pipelined write is enabled (#4143 ) Summary: Fix an issue where, with pipelined write enabled, writers can get stuck indefinitely and never finish the write. It can be shown with the following example. Assume there are 4 writers W1, W2, W3, W4 (W1 is the first, W4 is the last). T1: all writers are pending in the WAL writer queue. WAL writer queue: W1, W2, W3, W4; memtable writer queue: empty. T2: W1 finishes its WAL write and moves to the memtable writer queue. WAL writer queue: W2, W3, W4; memtable writer queue: W1. T3: W2 and W3 finish the WAL write as a batch group. W2 enters ExitAsBatchGroupLeader and moves the group to the memtable writer queue, but has not yet woken up the next leader. WAL writer queue: W4; memtable writer queue: W1, W2, W3. T4: W1, W2, W3 finish the memtable write as a batch group. Note that W2 is still in the previous ExitAsBatchGroupLeader, although W1 has done the memtable write for W2. WAL writer queue: W4; memtable writer queue: empty. T5: the thread corresponding to W3 creates another writer W3' with the same address as W3. WAL writer queue: W4, W3'; memtable writer queue: empty. T6: W2 continues with ExitAsBatchGroupLeader. Because the address of W3' is the same as that of W3, the last writer in its group, it thinks there are no pending writers, so it resets newest_writer_ to null, emptying the queue. W4 and W3' are removed from the queue and will never be woken up. The issue has existed since pipelined write was introduced in 5.5.0. Closes #3704 Pull Request resolved: https://github.com/facebook/rocksdb/pull/4143 Differential Revision: D8871599 Pulled By: yiwu-arbug fbshipit-source-id: 3502674e51066a954a0660257e24ac588f815e2a	2018-07-17 17:27:51 -07:00
Siying Dong	ddc07b40fc	Remove managed iterator Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4124 Differential Revision: D8829910 Pulled By: siying fbshipit-source-id: f3e952ccf3a631071a5d77c48e327046f8abb560	2018-07-17 14:43:18 -07:00
Siying Dong	995fcf7573	Pending output file number should be released after bulkload failure (#4145 ) Summary: If bulkload fails due to an input error, the pending output file number isn't released. This bug can prevent all future files with a larger number than the current one from being deleted, even after they are compacted. This commit fixes the bug. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4145 Differential Revision: D8877900 Pulled By: siying fbshipit-source-id: 080be92a23d43305ca1e13fe1c06eb4cd0b01466	2018-07-17 14:13:16 -07:00
Fenggang Wu	5a59ce4149	Coding.h: Added Fixed16 support (#4142 ) Summary: Added Get Put Encode Decode support for Fixed16 (uint16_t). Unit test added in `coding_test.cc` Pull Request resolved: https://github.com/facebook/rocksdb/pull/4142 Differential Revision: D8873516 Pulled By: fgwu fbshipit-source-id: 331913e0a9a8fe9c95606a08e856e953477d64d3	2018-07-16 23:43:41 -07:00
Sagar Vemuri	fb768a4289	Dump mutable FIFO and Universal compaction options (#4140 ) Summary: We forgot to dump FIFO and Universal compaction options to the LOG when any option was dynamically changed via `SetOptions` API. Now added those options also to `MutableCFOptions::Dump`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4140 Differential Revision: D8865634 Pulled By: sagar0 fbshipit-source-id: 05a93e26ab8e72fec6249acccd09b0eb3e1ef0ac	2018-07-16 22:28:24 -07:00
Maysam Yabandeh	b55da012f6	Refactor IndexBlockIter (#4141 ) Summary: Refactor IndexBlockIter to reduce conditional branches on key_includes_seq_. IndexBlockIter::Prev is also separated from DataBlockIter::Prev, not to cache the prev entries as they are of less importance when iterating over the index block. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4141 Differential Revision: D8866437 Pulled By: maysamyabandeh fbshipit-source-id: fdac76880426fc2be7d3c6354c09ab98f6657d4b	2018-07-16 17:13:10 -07:00
Sagar Vemuri	991120fa10	Allow ttl to be changed dynamically (#4133 ) Summary: Allow ttl to be changed dynamically. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4133 Differential Revision: D8845440 Pulled By: sagar0 fbshipit-source-id: c8c87ae643b3a8c4123e4c037c4645efc094a2d3	2018-07-16 14:27:53 -07:00
Siying Dong	8f06b4fa01	Separate some IndexBlockIter logic from BlockIter (#4136 ) Summary: Some logic only related to IndexBlockIter is separated from BlockIter to IndexBlockIter. This is done by writing an exclusive Seek() and SeekForPrev() for DataBlockIter, and all metadata block iter and tombstone block iter now use data block iter. Dealing with the BinarySeek() sharing problem by passing in the comparator to use. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4136 Reviewed By: maysamyabandeh Differential Revision: D8859673 Pulled By: siying fbshipit-source-id: 703e5e6824b82b7cbf4721f3594b94127797ca9e	2018-07-16 10:13:18 -07:00
Nathan VanBenschoten	ef7815b803	Support range deletion tombstones in IngestExternalFile SSTs (#3778 ) Summary: Fixes #3391. This change adds a `DeleteRange` method to `SstFileWriter` and adds support for ingesting SSTs with range deletion tombstones. This is important for applications that need to atomically ingest SSTs while clearing out any existing keys in a given key range. Pull Request resolved: https://github.com/facebook/rocksdb/pull/3778 Differential Revision: D8821836 Pulled By: anand1976 fbshipit-source-id: ca7786c1947ff129afa703dab011d524c7883844	2018-07-13 22:43:09 -07:00
Zhongyi Xie	91d7c03cdc	Exclude time waiting for rate limiter from rocksdb.sst.read.micros (#4102 ) Summary: Our "rocksdb.sst.read.micros" stat includes time spent waiting for rate limiter. It probably only affects people rate limiting compaction reads, which is fairly rare. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4102 Differential Revision: D8848506 Pulled By: miasantreble fbshipit-source-id: 01258ac5ae56e4eee372978cfc9143a6869f8bfc	2018-07-13 18:44:14 -07:00
Peter Mattis	90fc40690a	Relax VersionStorageInfo::GetOverlappingInputs check (#4050 ) Summary: Do not consider the range tombstone sentinel key as causing 2 adjacent sstables in a level to overlap. When a range tombstone's end key is the largest key in an sstable, the sstable's end key is set to a "sentinel" value that is the smallest key in the next sstable with a sequence number of kMaxSequenceNumber. This "sentinel" is guaranteed to not overlap in internal-key space with the next sstable. Unfortunately, GetOverlappingFiles uses user-keys to determine overlap and was thus considering 2 adjacent sstables in a level to overlap if they were separated by this sentinel key. This in turn would cause compactions to be larger than necessary. Note that this conflicts with https://github.com/facebook/rocksdb/pull/2769 and causes `DBRangeDelTest.CompactionTreatsSplitInputLevelDeletionAtomically` to fail. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4050 Differential Revision: D8844423 Pulled By: ajkr fbshipit-source-id: df3f9f1db8f4cff2bff77376b98b83c2ae1d155b	2018-07-13 17:42:38 -07:00
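
Commit f8f6983f89 argues that a range tombstone with seqno zero can never cover anything. The reason is that when a key and a tombstone share the same seqno, the key stays visible. Below is a minimal sketch of that rule. It assumes a simplified tombstone type and a "covers only if strictly newer" check. The type and function names are hypothetical and are not RocksDB's `CollapsedRangeDelMap` API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified range tombstone: deletes user keys in [start, end).
struct RangeTombstoneSketch {
  std::string start;
  std::string end;
  uint64_t seq;
};

// A tombstone hides a key only if the key is in range and the tombstone is
// strictly newer; with equal seqnos the key stays visible.
bool CoversSketch(const RangeTombstoneSketch& t, const std::string& key,
                  uint64_t key_seq) {
  return t.start <= key && key < t.end && t.seq > key_seq;
}

// Mirrors the fix described in the commit: seqno-zero tombstones are skipped
// before insertion. Seqno zero doubles as a sentinel in the collapsed map, and
// such a tombstone cannot cover any key because no key has a seqno below zero.
std::vector<RangeTombstoneSketch> FilterForCollapsedMapSketch(
    const std::vector<RangeTombstoneSketch>& input) {
  std::vector<RangeTombstoneSketch> kept;
  for (const auto& t : input) {
    if (t.seq == 0) continue;
    kept.push_back(t);
  }
  return kept;
}
```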
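
Commit a1a546a634 relies on a simple identity: when a divisor `d` is a power of two, `x / d` equals `x >> log2(d)` and `x % d` equals `x & (d - 1)`. The sketch below computes `log2` once, as the commit does in the constructor, and checks that both approaches locate the same cache line and bit. It is not RocksDB's filter code, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: log2 of a power of two, computed once so the hot
// probing path needs no division.
inline uint32_t Log2OfPowerOfTwo(uint32_t x) {
  uint32_t log = 0;
  while (x > 1) {
    x >>= 1;
    ++log;
  }
  return log;
}

struct ProbeLocation {
  uint32_t cache_line;   // which cache line of the filter
  uint32_t bit_in_line;  // which bit inside that cache line
};

// Baseline: integer division and modulo.
inline ProbeLocation LocateByDivision(uint32_t bit_pos, uint32_t line_bits) {
  return {bit_pos / line_bits, bit_pos % line_bits};
}

// Same result via shift and mask; valid only when line_bits is a power of two.
inline ProbeLocation LocateByShift(uint32_t bit_pos, uint32_t log2_line_bits) {
  const uint32_t mask = (uint32_t{1} << log2_line_bits) - 1;
  return {bit_pos >> log2_line_bits, bit_pos & mask};
}
```

For a 64-byte cache line (512 bits), `Log2OfPowerOfTwo(512)` is 9, so each probe becomes one shift and one AND.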
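
Commit 4b0a43574a sets "the next prefix" as the iterator upper bound in some prefix-scan tests. Under a bytewise comparator, one common way to compute that bound is to increment the last byte of the prefix, carrying past trailing 0xFF bytes. This is a hypothetical helper, not necessarily how db_stress computes it.

```cpp
#include <cassert>
#include <string>

// Returns the smallest byte string that sorts after every key starting with
// `prefix` under bytewise ordering. Returns "" when no such bound exists
// (the prefix is empty or consists only of 0xFF bytes).
std::string NextPrefixSketch(std::string prefix) {
  while (!prefix.empty()) {
    unsigned char last = static_cast<unsigned char>(prefix.back());
    if (last != 0xFF) {
      prefix.back() = static_cast<char>(last + 1);
      return prefix;
    }
    prefix.pop_back();  // 0xFF cannot be incremented; carry into previous byte
  }
  return prefix;
}
```

An empty result signals "no finite bound", in which case a caller would leave the upper bound unset.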
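
Commit 8425c8bd4d adjusts the tail prefetch size using recent history, choosing a value for which wasted reads stay under 10%. The sketch below rests on several assumptions, none of which are RocksDB's implementation:

- "Wasted" means bytes prefetched beyond what each file open actually needed.
- The chosen size is the largest observed size that stays within the 10% budget.
- The class name and the 32-entry ring buffer are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

class TailPrefetchTrackerSketch {
 public:
  static constexpr size_t kNumTracked = 32;

  // Record how many tail bytes one file open actually needed.
  void Record(size_t needed_bytes) {
    if (records_.size() < kNumTracked) {
      records_.push_back(needed_bytes);
    } else {
      records_[next_] = needed_bytes;  // overwrite the oldest observation
    }
    next_ = (next_ + 1) % kNumTracked;
  }

  size_t NumRecords() const { return records_.size(); }

  // Largest observed size whose wasted bytes stay within 10% of the bytes it
  // would prefetch across all recorded opens; 0 means "no data yet".
  size_t Suggest() const {
    if (records_.empty()) return 0;
    std::vector<size_t> sorted(records_);
    std::sort(sorted.begin(), sorted.end());
    size_t best = sorted.front();  // smallest size never wastes anything
    for (size_t candidate : sorted) {
      size_t useful = 0;
      for (size_t r : sorted) useful += std::min(r, candidate);
      const size_t total = candidate * sorted.size();
      const size_t wasted = total - useful;
      if (wasted * 10 <= total) best = candidate;
    }
    return best;
  }

 private:
  std::vector<size_t> records_;
  size_t next_ = 0;
};
```

Picking the largest size within budget trades a small amount of waste for fewer follow-up reads on opens that need more than the typical tail.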
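
Commit 2736752b33 notes that, in a MANIFEST group commit, each column family's log number must come only from edits on that same column family. The sketch below shows that per-CF aggregation. It uses a hypothetical, pared-down stand-in for `VersionEdit` rather than RocksDB's real class.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, pared-down stand-in for a VersionEdit in a group commit.
struct EditSketch {
  uint32_t cf_id;
  bool has_log_number;
  uint64_t log_number;
};

// For each column family, take the max log number only from edits that
// target that same column family. Taking one max across the whole group
// (the bug the commit describes) could push a CF's log number too high and
// make recovery skip WAL files that CF still needs.
std::map<uint32_t, uint64_t> LogNumberPerCfSketch(
    const std::vector<EditSketch>& group) {
  std::map<uint32_t, uint64_t> result;
  for (const EditSketch& e : group) {
    if (!e.has_log_number) continue;
    uint64_t& slot = result[e.cf_id];  // value-initialized to 0
    if (e.log_number > slot) slot = e.log_number;
  }
  return result;
}
```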
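
Commit 5a59ce4149 adds Get/Put/Encode/Decode helpers for Fixed16 (`uint16_t`) in coding.h. The sketch below shows the idea with a little-endian layout. That byte order is assumed here to match RocksDB's other fixed-width encodings. The sketch also works on `std::string` instead of RocksDB's `Slice`, and the `*Sketch` names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Write `value` into buf[0..1], least-significant byte first.
inline void EncodeFixed16Sketch(char* buf, uint16_t value) {
  buf[0] = static_cast<char>(value & 0xFF);
  buf[1] = static_cast<char>((value >> 8) & 0xFF);
}

// Read a little-endian uint16_t from ptr[0..1].
inline uint16_t DecodeFixed16Sketch(const char* ptr) {
  const unsigned char lo = static_cast<unsigned char>(ptr[0]);
  const unsigned char hi = static_cast<unsigned char>(ptr[1]);
  return static_cast<uint16_t>(lo | (hi << 8));
}

// Append the 2-byte encoding to dst.
inline void PutFixed16Sketch(std::string* dst, uint16_t value) {
  char buf[2];
  EncodeFixed16Sketch(buf, value);
  dst->append(buf, 2);
}

// Consume 2 bytes from the front of input; false if fewer than 2 remain.
inline bool GetFixed16Sketch(std::string* input, uint16_t* value) {
  if (input->size() < 2) return false;
  *value = DecodeFixed16Sketch(input->data());
  input->erase(0, 2);
  return true;
}
```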

7325 Commits