rocksdb

Author	SHA1	Message	Date
Zhichao Cao	8ea087ad16	Workload generator (Mixgraph) based on prefix hotness (#5953 ) Summary: In the previous PR https://github.com/facebook/rocksdb/issues/4788, user can use db_bench mix_graph option to generate the workload that is from the social graph. The key is generated based on the key access hotness. In this PR, user can further model the key-range hotness and fit those to two-term-exponential distribution. First, user cuts the whole key space into small key ranges (e.g., key-ranges are the same size and the key-range number is the number of SST files). Then, user calculates the average access count per key of each key-range as the key-range hotness. Next, user fits the key-range hotness to two-term-exponential distribution (f(x) = f(x) = aexp(bx) + cexp(dx)) and generate the value of a, b, c, and d. They are the parameters in db_bench: prefix_dist_a, prefix_dist_b, prefix_dist_c, and prefix_dist_d. Finally, user can run db_bench by specify the parameters. For example: `./db_bench --benchmarks="mixgraph" -use_direct_io_for_flush_and_compaction=true -use_direct_reads=true -cache_size=268435456 -key_dist_a=0.002312 -key_dist_b=0.3467 -keyrange_dist_a=14.18 -keyrange_dist_b=-2.917 -keyrange_dist_c=0.0164 -keyrange_dist_d=-0.08082 -keyrange_num=30 -value_k=0.2615 -value_sigma=25.45 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.85 -mix_put_ratio=0.14 -mix_seek_ratio=0.01 -sine_mix_rate_interval_milliseconds=5000 -sine_a=350 -sine_b=0.0105 -sine_d=50000 --perf_level=2 -reads=1000000 -num=5000000 -key_size=48` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5953 Test Plan: run db_bench with different parameters and checked the results. Differential Revision: D18053527 Pulled By: zhichao-cao fbshipit-source-id: 171f8b3142bd76462f1967c58345ad7e4f84bab7	2019-11-06 13:02:20 -08:00
Yanqin Jin	c0abc6bbc1	Use FLAGS_env for certain operations in db_bench (#5943 ) Summary: Since we already parse env_uri from command line and creates custom Env accordingly, we should invoke the methods of such Envs instead of using Env::Default(). Test Plan (on devserver): ``` $make db_bench db_stress $./db_bench -benchmarks=fillseq ./db_stress ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5943 Differential Revision: D18018550 Pulled By: riversand963 fbshipit-source-id: 03b61329aaae0dfd914a0b902cc677f570f102e3	2019-10-22 11:43:21 -07:00
Zhichao Cao	526e3b9763	Enable trace_replay with multi-threads (#5934 ) Summary: In the current trace replay, all the queries are serialized and called by single threads. It may not simulate the original application query situations closely. The multi-threads replay is implemented in this PR. Users can set the number of threads to replay the trace. The queries generated according to the trace records are scheduled in the thread pool job queue. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5934 Test Plan: test with make check and real trace replay. Differential Revision: D17998098 Pulled By: zhichao-cao fbshipit-source-id: 87eecf6f7c17a9dc9d7ab29dd2af74f6f60212c8	2019-10-18 14:13:50 -07:00
Levi Tamasi	78b28d80b0	Support non-TTL Puts for BlobDB in db_bench (#5921 ) Summary: Currently, db_bench only supports PutWithTTL operations for BlobDB but not regular Puts. The patch adds support for regular (non-TTL) Puts and also changes the default for blob_db_max_ttl_range to zero, which corresponds to no TTL. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5921 Test Plan: make check ./db_bench -benchmarks=fillrandom -statistics -stats_interval_seconds=1 -duration=90 -num=500000 -use_blob_db=1 -blob_db_file_size=1000000 -target_file_size_base=1000000 (issues Put operations with no TTL) ./db_bench -benchmarks=fillrandom -statistics -stats_interval_seconds=1 -duration=90 -num=500000 -use_blob_db=1 -blob_db_file_size=1000000 -target_file_size_base=1000000 -blob_db_max_ttl_range=86400 (issues PutWithTTL operations with random TTLs in the [0, blob_db_max_ttl_range) interval, as before) Differential Revision: D17919798 Pulled By: ltamasi fbshipit-source-id: b946c3522b836b92b4c157ffbad24f92ba2b0a16	2019-10-14 17:49:20 -07:00
sdong	e8263dbdaa	Apply formatter to recent 200+ commits. (#5830 ) Summary: Further apply formatter to more recent commits. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5830 Test Plan: Run all existing tests. Differential Revision: D17488031 fbshipit-source-id: 137458fd94d56dd271b8b40c522b03036943a2ab	2019-09-20 12:04:26 -07:00
Lingjing You	1a928c22a0	Add insert hints for each writebatch (#5728 ) Summary: Add insert hints for each writebatch so that they can be used in concurrent write, and add write option to enable it. Bench result (qps): `./db_bench --benchmarks=fillseq -allow_concurrent_memtable_write=true -num=4000000 -batch-size=1 -threads=1 -db=/data3/ylj/tmp -write_buffer_size=536870912 -num_column_families=4` master: \| batch size \ thread num \| 1 \| 2 \| 4 \| 8 \| \| ----------------------- \| ------- \| ------- \| ------- \| ------- \| \| 1 \| 387883 \| 220790 \| 308294 \| 490998 \| \| 10 \| 1397208 \| 978911 \| 1275684 \| 1733395 \| \| 100 \| 2045414 \| 1589927 \| 1798782 \| 2681039 \| \| 1000 \| 2228038 \| 1698252 \| 1839877 \| 2863490 \| fillseq with writebatch hint: \| batch size \ thread num \| 1 \| 2 \| 4 \| 8 \| \| ----------------------- \| ------- \| ------- \| ------- \| ------- \| \| 1 \| 286005 \| 223570 \| 300024 \| 466981 \| \| 10 \| 970374 \| 813308 \| 1399299 \| 1753588 \| \| 100 \| 1962768 \| 1983023 \| 2676577 \| 3086426 \| \| 1000 \| 2195853 \| 2676782 \| 3231048 \| 3638143 \| Pull Request resolved: https://github.com/facebook/rocksdb/pull/5728 Differential Revision: D17297240 fbshipit-source-id: b053590a6d77871f1ef2f911a7bd013b3899b26c	2019-09-12 17:15:18 -07:00
anand76	eb9026f09b	Add a db_bench benchmark to warm up the row cache Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5707 Differential Revision: D17242698 Pulled By: anand1976 fbshipit-source-id: 5d1bfda3c9e8f56176ae391cae6c91e6262016b8	2019-09-10 11:06:36 -07:00
Zhongyi Xie	2f41ecfe75	Refactor trimming logic for immutable memtables (#5022 ) Summary: MyRocks currently sets `max_write_buffer_number_to_maintain` in order to maintain enough history for transaction conflict checking. The effectiveness of this approach depends on the size of memtables. When memtables are small, it may not keep enough history; when memtables are large, this may consume too much memory. We are proposing a new way to configure memtable list history: by limiting the memory usage of immutable memtables. The new option is `max_write_buffer_size_to_maintain` and it will take precedence over the old `max_write_buffer_number_to_maintain` if they are both set to non-zero values. The new option accounts for the total memory usage of flushed immutable memtables and mutable memtable. When the total usage exceeds the limit, RocksDB may start dropping immutable memtables (which is also called trimming history), starting from the oldest one. The semantics of the old option actually works both as an upper bound and lower bound. History trimming will start if number of immutable memtables exceeds the limit, but it will never go below (limit-1) due to history trimming. In order the mimic the behavior with the new option, history trimming will stop if dropping the next immutable memtable causes the total memory usage go below the size limit. For example, assuming the size limit is set to 64MB, and there are 3 immutable memtables with sizes of 20, 30, 30. Although the total memory usage is 80MB > 64MB, dropping the oldest memtable will reduce the memory usage to 60MB < 64MB, so in this case no memtable will be dropped. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5022 Differential Revision: D14394062 Pulled By: miasantreble fbshipit-source-id: 60457a509c6af89d0993f988c9b5c2aa9e45f5c5	2019-08-23 13:55:34 -07:00
Vijay Nadimpalli	d150e01474	New API to get all merge operands for a Key (#5604 ) Summary: This is a new API added to db.h to allow for fetching all merge operands associated with a Key. The main motivation for this API is to support use cases where doing a full online merge is not necessary as it is performance sensitive. Example use-cases: 1. Update subset of columns and read subset of columns - Imagine a SQL Table, a row is encoded as a K/V pair (as it is done in MyRocks). If there are many columns and users only updated one of them, we can use merge operator to reduce write amplification. While users only read one or two columns in the read query, this feature can avoid a full merging of the whole row, and save some CPU. 2. Updating very few attributes in a value which is a JSON-like document - Updating one attribute can be done efficiently using merge operator, while reading back one attribute can be done more efficiently if we don't need to do a full merge. ---------------------------------------------------------------------------------------------------- API : Status GetMergeOperands( const ReadOptions& options, ColumnFamilyHandle* column_family, const Slice& key, PinnableSlice* merge_operands, GetMergeOperandsOptions* get_merge_operands_options, int* number_of_operands) Example usage : int size = 100; int number_of_operands = 0; std::vector<PinnableSlice> values(size); GetMergeOperandsOptions merge_operands_info; db_->GetMergeOperands(ReadOptions(), db_->DefaultColumnFamily(), "k1", values.data(), merge_operands_info, &number_of_operands); Description : Returns all the merge operands corresponding to the key. If the number of merge operands in DB is greater than merge_operands_options.expected_max_number_of_operands no merge operands are returned and status is Incomplete. Merge operands returned are in the order of insertion. merge_operands-> Points to an array of at-least merge_operands_options.expected_max_number_of_operands and the caller is responsible for allocating it. If the status returned is Incomplete then number_of_operands will contain the total number of merge operands found in DB for key. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5604 Test Plan: Added unit test and perf test in db_bench that can be run using the command: ./db_bench -benchmarks=getmergeoperands --merge_operator=sortlist Differential Revision: D16657366 Pulled By: vjnadimpalli fbshipit-source-id: 0faadd752351745224ee12d4ae9ef3cb529951bf	2019-08-06 14:26:44 -07:00
Mark Rambacher	cfcf045acc	The ObjectRegistry class replaces the Registrar and NewCustomObjects.… (#5293 ) Summary: The ObjectRegistry class replaces the Registrar and NewCustomObjects. Objects are registered with the registry by Type (the class must implement the static const char *Type() method). This change is necessary for a few reasons: - By having a class (rather than static template instances), the class can be passed between compilation units, meaning that objects could be registered and shared from a dynamic library with an executable. - By having a class with instances, different units could have different objects registered. This could be useful if, for example, one Option allowed for a dynamic library and one did not. When combined with some other PRs (being able to load shared libraries, a Configurable interface to configure objects to/from string), this code will allow objects in external shared libraries to be added to a RocksDB image at run-time, rather than requiring every new extension to be built into the main library and called explicitly by every program. Test plan (on riversand963's devserver) ``` $COMPILE_WITH_ASAN=1 make -j32 all && sleep 1 && make check ``` All tests pass. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5293 Differential Revision: D16363396 Pulled By: riversand963 fbshipit-source-id: fbe4acb615bfc11103eef40a0b288845791c0180	2019-07-23 17:13:05 -07:00
sdong	e4dcf5fd22	db_bench to add a new "benchmark" to print out all stats history (#5532 ) Summary: Sometimes it is helpful to fetch the whole history of stats after benchmark runs. Add such an option Pull Request resolved: https://github.com/facebook/rocksdb/pull/5532 Test Plan: Run the benchmark manually and observe the output is as expected. Differential Revision: D16097764 fbshipit-source-id: 10b5b735a22a18be198b8f348be11f11f8806904	2019-07-03 20:03:28 -07:00
haoyuhuang	66464d1fde	Remove multiple declarations o kMicrosInSecond. Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5526 Test Plan: OPT=-g V=1 make J=1 unity_test -j32 make clean && make -j32 Differential Revision: D16079315 Pulled By: HaoyuHuang fbshipit-source-id: 294ab439cf0db8dd5da44e30eabf0cbb2bb8c4f6	2019-07-01 15:15:12 -07:00
Eli Pozniansky	3e6c185381	Formatting fixes in db_bench_tool (#5525 ) Summary: Formatting fixes in db_bench_tool that were accidentally omitted Pull Request resolved: https://github.com/facebook/rocksdb/pull/5525 Test Plan: Unit tests Differential Revision: D16078516 Pulled By: elipoz fbshipit-source-id: bf8df0e3f08092a91794ebf285396d9b8a335bb9	2019-07-01 14:57:28 -07:00
Eli Pozniansky	f872009237	Fix from some C-style casting (#5524 ) Summary: Fix from some C-style casting in bloom.cc and ./tools/db_bench_tool.cc Pull Request resolved: https://github.com/facebook/rocksdb/pull/5524 Differential Revision: D16075626 Pulled By: elipoz fbshipit-source-id: 352948885efb64a7ef865942c75c3c727a914207	2019-07-01 13:05:34 -07:00
Zhongyi Xie	671d15cbdd	Persistent Stats: persist stats history to disk (#5046 ) Summary: This PR continues the work in https://github.com/facebook/rocksdb/pull/4748 and https://github.com/facebook/rocksdb/pull/4535 by adding a new DBOption `persist_stats_to_disk` which instructs RocksDB to persist stats history to RocksDB itself. When statistics is enabled, and both options `stats_persist_period_sec` and `persist_stats_to_disk` are set, RocksDB will periodically write stats to a built-in column family in the following form: key -> (timestamp in microseconds)#(stats name), value -> stats value. The existing API `GetStatsHistory` will detect the current value of `persist_stats_to_disk` and either read from in-memory data structure or from the hidden column family on disk. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5046 Differential Revision: D15863138 Pulled By: miasantreble fbshipit-source-id: bb82abdb3f2ca581aa42531734ac799f113e931b	2019-06-17 15:21:50 -07:00
haoyuhuang	d43b4cd570	Integrate block cache tracing into db_bench (#5459 ) Summary: This PR integrates the block cache tracing into db_bench. It adds three command line arguments. -block_cache_trace_file (Block cache trace file path.) type: string default: "" -block_cache_trace_max_trace_file_size_in_bytes (The maximum block cache trace file size in bytes. Block cache accesses will not be logged if the trace file size exceeds this threshold. Default is 64 GB.) type: int64 default: 68719476736 -block_cache_trace_sampling_frequency (Block cache trace sampling frequency, termed s. It uses spatial downsampling and samples accesses to one out of s blocks.) type: int32 default: 1 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5459 Differential Revision: D15832031 Pulled By: HaoyuHuang fbshipit-source-id: 0ecf2f2686557251fe741a2769b21170777efa3d	2019-06-17 11:08:21 -07:00
Zhongyi Xie	d68f9f4580	simplify include directive involving inttypes (#5402 ) Summary: When using `PRIu64` type of printf specifier, current code base does the following: ``` #ifndef __STDC_FORMAT_MACROS #define __STDC_FORMAT_MACROS #endif #include <inttypes.h> ``` However, this can be simplified to ``` #include <cinttypes> ``` as long as flag `-std=c++11` is used. This should solve issues like https://github.com/facebook/rocksdb/issues/5159 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5402 Differential Revision: D15701195 Pulled By: miasantreble fbshipit-source-id: 6dac0a05f52aadb55e9728038599d3d2e4b59d03	2019-06-06 13:56:07 -07:00
Vijay Nadimpalli	49c5a12dbe	Organizing rocksdb/db directory Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5390 Differential Revision: D15579388 Pulled By: vjnadimpalli fbshipit-source-id: 5bfc95e31554b8ff05b97b76d6534113f527f366	2019-05-31 11:57:01 -07:00
Yanqin Jin	83f7a8eed0	Fix compilation error in LITE mode (#5391 ) Summary: Add macro ROCKSDB_LITE to fix compilation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5391 Differential Revision: D15574522 Pulled By: riversand963 fbshipit-source-id: 95aea83c5d9b2bf98a3ba0ef9167b63c9be2988b	2019-05-31 08:32:22 -07:00
Yanqin Jin	b9f5900658	Fix WAL replay by skipping old write batches (#5170 ) Summary: 1. Fix a bug in WAL replay in which write batches with old sequence numbers are mistakenly inserted into memtables. 2. Add support for benchmarking secondary instance to db_bench_tool. With changes made in this PR, we can start benchmarking secondary instance using two processes. It is also possible to vary the frequency at which the secondary instance tries to catch up with the primary. The info log of the secondary can be found in a directory whose path can be specified with '-secondary_path'. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5170 Differential Revision: D15564608 Pulled By: riversand963 fbshipit-source-id: ce97688ed3d33f69d3a0b9266ebbbbf887aa0ec8	2019-05-30 19:33:33 -07:00
Siying Dong	8843129ece	Move some memory related files from util/ to memory/ (#5382 ) Summary: Move arena, allocator, and memory tools under util to a separate memory/ directory. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5382 Differential Revision: D15564655 Pulled By: siying fbshipit-source-id: 9cd6b5d0d3d52b39606e19221fa154596e5852a5	2019-05-30 17:44:09 -07:00
Siying Dong	e9e0101ca4	Move test related files under util/ to test_util/ (#5377 ) Summary: There are too many types of files under util/. Some test related files don't belong to there or just are just loosely related. Mo ve them to a new directory test_util/, so that util/ is cleaner. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5377 Differential Revision: D15551366 Pulled By: siying fbshipit-source-id: 0f5c8653832354ef8caa31749c0143815d719e2c	2019-05-30 11:25:51 -07:00
Maysam Yabandeh	eab4f49a2c	WritePrepared: skip_concurrency_control option (#5330 ) Summary: This enables the user to set TransactionDBOptions::skip_concurrency_control so the standard `DB::Write(const WriteOptions& opts, WriteBatch* updates)` would skip the concurrency control. This would give higher throughput to the users who know their use case doesn't need concurrency control. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5330 Differential Revision: D15525932 Pulled By: maysamyabandeh fbshipit-source-id: 68421ac1ba34f549a4a8de9ce4c2dccf6fb4b06b	2019-05-28 16:29:45 -07:00
Zhichao Cao	a13026fb2f	Added trace replay fast forward function (#5273 ) Summary: In the current db_bench trace replay, the replay process strictly follows the timestamp to issue the queries. In some cases, user does not care about the time. Therefore, fast forward is needed for users to speed up the replay process. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5273 Differential Revision: D15389232 Pulled By: zhichao-cao fbshipit-source-id: 735d629b9d2a167b05af3e4fa0ddf9d5d0be1806	2019-05-16 20:21:18 -07:00
Maysam Yabandeh	f383641a1d	Unordered Writes (#5218 ) Summary: Performing unordered writes in rocksdb when unordered_write option is set to true. When enabled the writes to memtable are done without joining any write thread. This offers much higher write throughput since the upcoming writes would not have to wait for the slowest memtable write to finish. The tradeoff is that the writes visible to a snapshot might change over time. If the application cannot tolerate that, it should implement its own mechanisms to work around that. Using TransactionDB with WRITE_PREPARED write policy is one way to achieve that. Doing so increases the max throughput by 2.2x without however compromising the snapshot guarantees. The patch is prepared based on an original by siying Existing unit tests are extended to include unordered_write option. Benchmark Results: ``` TEST_TMPDIR=/dev/shm/ ./db_bench_unordered --benchmarks=fillrandom --threads=32 --num=10000000 -max_write_buffer_number=16 --max_background_jobs=64 --batch_size=8 --writes=3000000 -level0_file_num_compaction_trigger=99999 --level0_slowdown_writes_trigger=99999 --level0_stop_writes_trigger=99999 -enable_pipelined_write=false -disable_auto_compactions --unordered_write=1 ``` With WAL - Vanilla RocksDB: 78.6 MB/s - WRITER_PREPARED with unordered_write: 177.8 MB/s (2.2x) - unordered_write: 368.9 MB/s (4.7x with relaxed snapshot guarantees) Without WAL - Vanilla RocksDB: 111.3 MB/s - WRITER_PREPARED with unordered_write: 259.3 MB/s MB/s (2.3x) - unordered_write: 645.6 MB/s (5.8x with relaxed snapshot guarantees) - WRITER_PREPARED with unordered_write disable concurrency control: 185.3 MB/s MB/s (2.35x) Limitations: - The feature is not yet extended to `max_successive_merges` > 0. The feature is also incompatible with `enable_pipelined_write` = true as well as with `allow_concurrent_memtable_write` = false. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5218 Differential Revision: D15219029 Pulled By: maysamyabandeh fbshipit-source-id: 38f2abc4af8780148c6128acdba2b3227bc81759	2019-05-13 17:47:21 -07:00
Yi Wu	92c60547fe	db_bench: fix hang on IO error (#5300 ) Summary: db_bench will wait indefinitely if there's background error. Fix by pass `abs_time_us` to cond var. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5300 Differential Revision: D15319945 Pulled By: miasantreble fbshipit-source-id: 0034fb7f6ec7c3303c4ccf26e54c20fbdac8ab44	2019-05-13 11:30:35 -07:00
Sagar Vemuri	3548e4220d	Improve explicit user readahead performance (#5246 ) Summary: Improve the iterators performance when the user explicitly sets the readahead size via `ReadOptions.readahead_size`. 1. Stop creating new table readers when the user explicitly sets readahead size. 2. Make use of an internal buffer based on `FilePrefetchBuffer` instead of using `ReadaheadRandomAccessFileReader`, to handle the user readahead requests (for both buffered and direct io cases). 3. Add `readahead_size` to db_bench. Benchmarks: https://gist.github.com/sagar0/53693edc320a18abeaeca94ca32f5737 For 1 MB readahead, Buffered IO performance improves by 28% and Direct IO performance improves by 50%. For 512KB readahead, Buffered IO performance improves by 30% and Direct IO performance improves by 67%. Test Plan: Updated `DBIteratorTest.ReadAhead` test to make sure that: - no new table readers are created for iterators on setting ReadOptions.readahead_size - At least "readahead" number of bytes are actually getting read on each iterator read. TODO later: - Use similar logic for compactions as well. - This ties in nicely with #4052 and paves the way for removing ReadaheadRandomAcessFile later. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5246 Differential Revision: D15107946 Pulled By: sagar0 fbshipit-source-id: 2c1149729ca7d779e4e8b7710ba6f4e8cbfd3bea	2019-04-26 21:24:10 -07:00
Adam Retter	990b2f4cb3	Fix compilation on db_bench_tool.cc on Windows (#5227 ) Summary: I needed this change to be able to build the v6.0.1 release on Windows. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5227 Differential Revision: D15033815 Pulled By: sagar0 fbshipit-source-id: 579f3b8e694c34c0d43527eb2fa37175e37f5911	2019-04-23 11:16:51 -07:00
Zhongyi Xie	baa5302447	Avoid double-compacting data in bottom level in manual compactions (#5138 ) Summary: Depending on the config, manual compaction (leveled compaction style) does following compactions: L0->L1 L1->L2 ... Ln-1 -> Ln Ln -> Ln The final Ln -> Ln compaction is partly unnecessary as it recompacts all the files that were just generated by the Ln-1 -> Ln. We should avoid recompacting such files. This rule should be applied to Lmax only. Resolves issue https://github.com/facebook/rocksdb/issues/4995 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5138 Differential Revision: D14940106 Pulled By: miasantreble fbshipit-source-id: 8d3cf5507a17e76f3333cfd4bac5256d005636e5	2019-04-16 23:32:20 -07:00
Yi Wu	b70967aac7	db_bench: support seek to non-exist prefix (#5163 ) Summary: Add `--seek_missing_prefix` flag to db_bench to allow benchmarking seeking to non-existing prefix. Usage example: ``` ./db_bench --db=/dev/shm/db_bench --use_existing_db=false --benchmarks=fillrandom --num=100000000 --prefix_size=9 --keys_per_prefix=10 ./db_bench --db=/dev/shm/db_bench --use_existing_db=true --benchmarks=seekrandom --disable_auto_compactions=true --num=100000000 --prefix_size=9 --keys_per_prefix=10 --reads=1000 --prefix_same_as_start=true --seek_missing_prefix=true ``` Also adding `--total_order_seek` and `--prefix_same_as_start` flags. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5163 Differential Revision: D14935724 Pulled By: riversand963 fbshipit-source-id: 7c41023f007febe373eb1589861f215432a9e18a	2019-04-15 10:54:58 -07:00
Yanqin Jin	3189398c00	Fix bugs detected by clang analyzer (#5185 ) Summary: as titled. False positive included, fixed anyway to make the check pass. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5185 Differential Revision: D14909384 Pulled By: riversand963 fbshipit-source-id: dc5177e72b1929ccfd6175a60e2cd7bdb9bd80f3	2019-04-12 10:45:56 -07:00
anand76	fefd4b98c5	Introduce a new MultiGet batching implementation (#5011 ) Summary: This PR introduces a new MultiGet() API, with the underlying implementation grouping keys based on SST file and batching lookups in a file. The reason for the new API is twofold - the definition allows callers to allocate storage for status and values on stack instead of std::vector, as well as return values as PinnableSlices in order to avoid copying, and it keeps the original MultiGet() implementation intact while we experiment with batching. Batching is useful when there is some spatial locality to the keys being queries, as well as larger batch sizes. The main benefits are due to - 1. Fewer function calls, especially to BlockBasedTableReader::MultiGet() and FullFilterBlockReader::KeysMayMatch() 2. Bloom filter cachelines can be prefetched, hiding the cache miss latency The next step is to optimize the binary searches in the level_storage_info, index blocks and data blocks, since we could reduce the number of key comparisons if the keys are relatively close to each other. The batching optimizations also need to be extended to other formats, such as PlainTable and filter formats. This also needs to be added to db_stress. Benchmark results from db_bench for various batch size/locality of reference combinations are given below. Locality was simulated by offsetting the keys in a batch by a stride length. Each SST file is about 8.6MB uncompressed and key/value size is 16/100 uncompressed. To focus on the cpu benefit of batching, the runs were single threaded and bound to the same cpu to eliminate interference from other system events. The results show a 10-25% improvement in micros/op from smaller to larger batch sizes (4 - 32). Batch Sizes 1 \| 2 \| 4 \| 8 \| 16 \| 32 Random pattern (Stride length 0) 4.158 \| 4.109 \| 4.026 \| 4.05 \| 4.1 \| 4.074 - Get 4.438 \| 4.302 \| 4.165 \| 4.122 \| 4.096 \| 4.075 - MultiGet (no batching) 4.461 \| 4.256 \| 4.277 \| 4.11 \| 4.182 \| 4.14 - MultiGet (w/ batching) Good locality (Stride length 16) 4.048 \| 3.659 \| 3.248 \| 2.99 \| 2.84 \| 2.753 4.429 \| 3.728 \| 3.406 \| 3.053 \| 2.911 \| 2.781 4.452 \| 3.45 \| 2.833 \| 2.451 \| 2.233 \| 2.135 Good locality (Stride length 256) 4.066 \| 3.786 \| 3.581 \| 3.447 \| 3.415 \| 3.232 4.406 \| 4.005 \| 3.644 \| 3.49 \| 3.381 \| 3.268 4.393 \| 3.649 \| 3.186 \| 2.882 \| 2.676 \| 2.62 Medium locality (Stride length 4096) 4.012 \| 3.922 \| 3.768 \| 3.61 \| 3.582 \| 3.555 4.364 \| 4.057 \| 3.791 \| 3.65 \| 3.57 \| 3.465 4.479 \| 3.758 \| 3.316 \| 3.077 \| 2.959 \| 2.891 dbbench command used (on a DB with 4 levels, 12 million keys)- TEST_TMPDIR=/dev/shm numactl -C 10 ./db_bench.tmp -use_existing_db=true -benchmarks="readseq,multireadrandom" -write_buffer_size=4194304 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5011 Differential Revision: D14348703 Pulled By: anand1976 fbshipit-source-id: 774406dab3776d979c809522a67bedac6c17f84b	2019-04-11 14:28:26 -07:00
Siying Dong	2b4d5ceb47	Remove some "using std::..." from header files. (#5113 ) Summary: The code convention we are following, Google C++ Style, discourage alias in header files, especially public headers: https://google.github.io/styleguide/cppguide.html#Aliases Remove some of them. Might removed some from .cc files as well to be consistent. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5113 Differential Revision: D14633030 Pulled By: siying fbshipit-source-id: b990edc919d5de60295992284f980195e501d424	2019-03-27 10:28:21 -07:00
Shobhit Dayal	b45b1cde3e	Feature for sampling and reporting compressibility (#4842 ) Summary: This is a feature to sample data-block compressibility and and report them as stats. 1 in N (tunable) blocks is sampled for compressibility using two algorithms: 1. lz4 or snappy for fast compression 2. zstd or zlib for slow but higher compression. The stats are reported to the caller as raw-bytes and compressed-bytes. The block continues to be compressed for storage using the specified CompressionType. The db_bench_tool how has a command line option for specifying the sampling rate. It's default value is 0 (no sampling). To test the overhead for a certain value, users can compare the performance of db_bench_tool, varying the sampling rate. It is unlikely to have a noticeable impact for high values like 20. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4842 Differential Revision: D13629011 Pulled By: shobhitdayal fbshipit-source-id: 14ca668bcab6499b2a1734edf848eb62a4f4fafa	2019-03-18 12:15:34 -07:00
Zhichao Cao	05ebfebc17	Fixed the potential stack overflow of MixGraph in db_bench (#5051 ) Summary: In the MixGraph benchmark of db_bench, The max buffer size used for value of KV-pair might be extremely large (64MB), which might cause function stack overflow in some platforms, reduced to 1MB. Added the finished ops printing in MixGraph benchmark. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5051 Differential Revision: D14379571 Pulled By: zhichao-cao fbshipit-source-id: 24084fbe38f60f2902d9a40f6bc9a25e4e2c9bb9	2019-03-08 14:10:17 -08:00
Andrew Kryczka	18d2e4beb7	Run db_bench on database generated externally (#5017 ) Summary: Added an option, `-use_existing_keys`, which can be set to run benchmarks against an arbitrary existing database. Now users can benchmark against their actual database rather than synthetic data. Before the run begins, it loads all the keys into memory, then uses that set of keys rather than synthesizing new ones in `GenerateKeyFromInt`. This is mainly intended for small-scale DBs where the memory consumption is not a concern. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5017 Differential Revision: D14270303 Pulled By: riversand963 fbshipit-source-id: 6328df9dffb5e19170270dd00a69f4bbe424e5ed	2019-03-01 11:19:03 -08:00
Siying Dong	aef763b6d6	Make statistics's stats_level change thread-safe (#5030 ) Summary: Right now, users can change statistics.stats_level while DB is running, but TSAN may report data race. We make stats_level_ to be atomic, and access them using accessors. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5030 Differential Revision: D14267519 Pulled By: siying fbshipit-source-id: 37d7ebeff7a43a406230143422a16af899163f73	2019-03-01 10:42:09 -08:00
Siying Dong	5e298f865b	Add two more StatsLevel (#5027 ) Summary: Statistics cost too much CPU for some use cases. Add two stats levels so that people can choose to skip two types of expensive stats, timers and histograms. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5027 Differential Revision: D14252765 Pulled By: siying fbshipit-source-id: 75ecec9eaa44c06118229df4f80c366115346592	2019-02-28 10:27:59 -08:00
Zhongyi Xie	c4f5d0aa15	add GetStatsHistory to retrieve stats snapshots (#4748 ) Summary: This PR adds public `GetStatsHistory` API to retrieve stats history in the form of an std map. The key of the map is the timestamp in microseconds when the stats snapshot is taken, the value is another std map from stats name to stats value (stored in std string). Two DBOptions are introduced: `stats_persist_period_sec` (default 10 minutes) controls the intervals between two snapshots are taken; `max_stats_history_count` (default 10) controls the max number of history snapshots to keep in memory. RocksDB will stop collecting stats snapshots if `stats_persist_period_sec` is set to 0. (This PR is the in-memory part of https://github.com/facebook/rocksdb/pull/4535) Pull Request resolved: https://github.com/facebook/rocksdb/pull/4748 Differential Revision: D13961471 Pulled By: miasantreble fbshipit-source-id: ac836d401ecb84ea92216bf9966f969dedf4ad04	2019-02-20 15:52:54 -08:00
Zhongyi Xie	ed995c6a69	add whole key bloom filter support in memtables (#4985 ) Summary: MyRocks calls `GetForUpdate` on `INSERT`, for unique key check, and in almost all cases GetForUpdate returns empty result. For such cases, whole key bloom filter is helpful. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4985 Differential Revision: D14118257 Pulled By: miasantreble fbshipit-source-id: d35cb7109c62fd5ad541a26968e3a3e16d3e85ea	2019-02-19 12:15:39 -08:00
Siying Dong	4db46aa2e6	Fix LITE Build (#4989 ) Summary: LITE mode has EventListener to be an empty class. However in db_bench, it is used. When "override" is added to the functions, the build breaks. Fix it by keeping the listener empty in LITE mode. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4989 Differential Revision: D14108132 Pulled By: siying fbshipit-source-id: 80121aab35b1120e502b37b782301dd700692697	2019-02-15 16:13:11 -08:00
Aubin Sanyal	3231a2e581	Deprecate ttl option from CompactionOptionsFIFO (#4965 ) Summary: We introduced ttl option in CompactionOptionsFIFO when ttl-based file deletion (compaction) was supported only as part of FIFO Compaction. But with the extension of ttl semantics even to Level compaction, CompactionOptionsFIFO.ttl can now be deprecated. Instead we will start using ColumnFamilyOptions.ttl for FIFO compaction as well. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4965 Differential Revision: D14072960 Pulled By: sagar0 fbshipit-source-id: c98cc2ae695a28136295787cd88d36a220fc219e	2019-02-15 09:51:41 -08:00
Michael Liu	ca89ac2ba9	Apply modernize-use-override (2nd iteration) Summary: Use C++11’s override and remove virtual where applicable. Change are automatically generated. Reviewed By: Orvid Differential Revision: D14090024 fbshipit-source-id: 1e9432e87d2657e1ff0028e15370a85d1739ba2a	2019-02-14 14:41:36 -08:00
Siying Dong	cf3a671733	Remove cuckoo hash memtable (#4953 ) Summary: Cuckoo Hash is less useful than we initially expected. Remove it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4953 Differential Revision: D13979264 Pulled By: siying fbshipit-source-id: 2a60afdaa989f045357398b43a1cc5d46f4492ed	2019-02-07 16:15:27 -08:00
Siying Dong	d9c9f3c809	db_bench: fix "micros/op" reporting (#4949 ) Summary: `4985a9f73b (diff-e5276985b26a0551957144f4420a594bR511)` changes the meaning of latency reporting from running time per query, to elapse_time / #ops, without providing a reason why. Considering that this is a counter-intuitive reporting, Reverting the change. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4949 Differential Revision: D13964684 Pulled By: siying fbshipit-source-id: d6304d3d4b5a802daa292302623c7dbca9a680bc	2019-02-05 17:20:02 -08:00
Alexander Zinoviev	32a6dd9a41	Add a new CPU time counter to compaction report (#4889 ) Summary: Measure CPU time consumed for a compaction and report it in the stats report Enable NowCPUNanos() to work for MacOS Pull Request resolved: https://github.com/facebook/rocksdb/pull/4889 Differential Revision: D13701276 Pulled By: zinoale fbshipit-source-id: 5024e5bbccd4dd10fd90d947870237f436445055	2019-01-29 17:24:00 -08:00
zhichao-cao	e2547103fd	Fix the build error caused by the dynamic array (#4918 ) Summary: In the MixGraph benchmark of db_bench #4788 , the char array is initialized with an argument from user's input, which can cause build error on some platforms. Also, the msg char array size can be potentially smaller than the printed data, which should be extended from 100 to 256. Tested with make check. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4918 Differential Revision: D13844298 Pulled By: sagar0 fbshipit-source-id: 33c4809c5c4438f0a9f7b289d3f42e20c545bbab	2019-01-28 12:39:57 -08:00
Andrew Kryczka	e242fa4664	Add latest toolchain (gcc-8, etc.) build support for fbcode users (#4923 ) Summary: - When building with internal dependencies, specify this toolchain by setting `ROCKSDB_FBCODE_BUILD_WITH_PLATFORM007=1` - It is not enabled by default. However, it is enabled for TSAN builds in CI since there is a known problem with TSAN in gcc-5: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71090 - I did not add support for Lua since (1) we agreed to deprecate it, and (2) we only have an internal build for v5.3 with this toolchain while that has breaking changes compared to our current version (v5.2). Pull Request resolved: https://github.com/facebook/rocksdb/pull/4923 Differential Revision: D13827226 Pulled By: ajkr fbshipit-source-id: 9aa3388ed3679777cfb15ef8cbcb83c07f62f947	2019-01-28 11:26:32 -08:00
Sagar Vemuri	0cead31d10	Fix Clang static analyzer warning in db_bench (#4910 ) Summary: Fixed clang static analyzer warning about division by 0. ``` ar: creating librocksdb_debug.a tools/db_bench_tool.cc:4650:43: warning: Division by zero int pos = static_cast<int>(rand_num % range_); ~~~~~~~~~^~~~~~~~ 1 warning generated. make: *** [analyze] Error 1 ``` This is from the new code I recently merged in `ce8e88d2d7`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4910 Differential Revision: D13788037 Pulled By: sagar0 fbshipit-source-id: f48851dca85047c19fbb1a361e25ce643aa4c7ea	2019-01-23 13:33:02 -08:00
Zhongyi Xie	cbe0239270	add cast to avoid loss of precision error (#4906 ) Summary: this PR address the following error: > tools/db_bench_tool.cc:4776:68: error: implicit conversion loses integer precision: 'int64_t' (aka 'long') to 'unsigned int' [-Werror,-Wshorten-64-to-32] s = db_with_cfh->db->Put(write_options_, key, gen.Generate(value_size)); Pull Request resolved: https://github.com/facebook/rocksdb/pull/4906 Differential Revision: D13780185 Pulled By: miasantreble fbshipit-source-id: 1c83a77d341099518c72f0f4a63e97ab9c4784b3	2019-01-22 22:44:17 -08:00

1 2 3 4 5

220 Commits