rocksdb

Author	SHA1	Message	Date
sdong	4e0dcd36df	db_stress sometimes generates keys close to SST file boundaries (#6037 ) Summary: Recently, a bug was found related to a seek key that is close to SST file boundary. However, it only occurs in a very small chance in db_stress, because the chance that a random key hits SST file boundaries is small. To boost the chance, with 1/16 chance, we pick keys that are close to SST file boundaries. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6037 Test Plan: Did some manual printing out, and hack to cover the key generation logic to be correct. Differential Revision: D18598476 fbshipit-source-id: 13b76687d106c5be4e3e02a0c77fa5578105a071	2019-11-19 13:17:03 -08:00
sdong	a150604e10	db_stress to cover total order seek (#6039 ) Summary: Right now, in db_stress, as long as a prefix extractor is defined, TestIterator always uses it. There is value in covering total_order_seek = true when a prefix extractor is defined. Add a small chance that this flag is turned on. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6039 Test Plan: Run the test for a while. Differential Revision: D18539689 fbshipit-source-id: 568790dd7789c9986b83764b870df0423a122d99	2019-11-18 15:01:38 -08:00
sdong	6123611c42	crash_test: use large max_manifest_file_size most of the time. (#6034 ) Summary: Right now, crash_test always uses 16KB max_manifest_file_size value. It is good to cover logic of manifest file switch. However, information stored in manifest files might be useful in debugging failures. Switch to only use small manifest file size in 1/15 of the time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6034 Test Plan: Observe command generated by db_crash_test.py multiple times and see the --max_manifest_file_size value distribution. Differential Revision: D18513824 fbshipit-source-id: 7b3ae6dbe521a0918df41064e3fa5ecbf2466e04	2019-11-14 14:01:06 -08:00
sdong	a19de78da5	db_stress to cover SeekForPrev() (#6022 ) Summary: Right now, db_stress doesn't cover SeekForPrev(). Add the coverage, which mirrors what we do for Seek(). Pull Request resolved: https://github.com/facebook/rocksdb/pull/6022 Test Plan: Run "make crash_test". Do some manual source code hack to simulate wrong iterator results and see it caught. Differential Revision: D18442193 fbshipit-source-id: 879b79000d5e33c625c7e970636de191ccd7776c	2019-11-11 17:33:54 -08:00
sdong	1da1f04231	Stress test to relax the iterator verification case for lower bound (#5869 ) Summary: In stress test, all iterator verification is turned off if lower bound is enabled. This might be stricter than needed. This PR relaxes the condition and includes the case where lower bound is lower than both of seek key and upper bound. It seems to work mostly fine when I run crash test locally. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5869 Test Plan: Run crash_test Differential Revision: D18363578 fbshipit-source-id: 23d57e11ea507949b8100f4190ddfbe8db052d5a	2019-11-07 11:16:59 -08:00
sdong	111ebf3161	db_stress: improve TestGet() failure printing (#5989 ) Summary: Right now, in db_stress's CF consistency test's TestGet case, if failure happens, we do normal string printing, rather than hex printing, so that some text is not printed out, which makes debugging harder. Fix it by printing hex instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5989 Test Plan: Build db_stress and see it passes. Differential Revision: D18363552 fbshipit-source-id: 09d1b8f6fbff37441cbe7e63a1aef27551226cec	2019-11-06 17:38:25 -08:00
Zhichao Cao	8ea087ad16	Workload generator (Mixgraph) based on prefix hotness (#5953 ) Summary: In the previous PR https://github.com/facebook/rocksdb/issues/4788, user can use db_bench mix_graph option to generate the workload that is from the social graph. The key is generated based on the key access hotness. In this PR, user can further model the key-range hotness and fit those to two-term-exponential distribution. First, user cuts the whole key space into small key ranges (e.g., key-ranges are the same size and the key-range number is the number of SST files). Then, user calculates the average access count per key of each key-range as the key-range hotness. Next, user fits the key-range hotness to two-term-exponential distribution (f(x) = a*exp(b*x) + c*exp(d*x)) and generates the values of a, b, c, and d. They are the parameters in db_bench: keyrange_dist_a, keyrange_dist_b, keyrange_dist_c, and keyrange_dist_d. Finally, user can run db_bench by specifying the parameters. For example: `./db_bench --benchmarks="mixgraph" -use_direct_io_for_flush_and_compaction=true -use_direct_reads=true -cache_size=268435456 -key_dist_a=0.002312 -key_dist_b=0.3467 -keyrange_dist_a=14.18 -keyrange_dist_b=-2.917 -keyrange_dist_c=0.0164 -keyrange_dist_d=-0.08082 -keyrange_num=30 -value_k=0.2615 -value_sigma=25.45 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.85 -mix_put_ratio=0.14 -mix_seek_ratio=0.01 -sine_mix_rate_interval_milliseconds=5000 -sine_a=350 -sine_b=0.0105 -sine_d=50000 --perf_level=2 -reads=1000000 -num=5000000 -key_size=48` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5953 Test Plan: run db_bench with different parameters and checked the results. Differential Revision: D18053527 Pulled By: zhichao-cao fbshipit-source-id: 171f8b3142bd76462f1967c58345ad7e4f84bab7	2019-11-06 13:02:20 -08:00
Maysam Yabandeh	50804656d2	Enable write-conflict snapshot in stress tests (#5897 ) Summary: DBImpl extends the public GetSnapshot() with GetSnapshotForWriteConflictBoundary() method that takes snapshots specially for write-write conflict checking. Compaction treats such snapshots differently to avoid GCing a value written after that, so that the write conflict remains visible even after the compaction. The patch extends stress tests with such snapshots. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5897 Differential Revision: D17937476 Pulled By: maysamyabandeh fbshipit-source-id: bd8b0c578827990302194f63ae0181e15752951d	2019-11-06 11:13:22 -08:00
sdong	e4e1d35cc2	Revert "Disable pre-5.5 versions in the format compatibility test (#5990 )" (#5999 ) Summary: This reverts commit `351e25401b`. All branches have been fixed to buildable on FB environments, so we can revert it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5999 Differential Revision: D18281947 fbshipit-source-id: 6deaaf1b5df2349eee5d6ed9b91208cd7e23ec8e	2019-11-01 15:57:15 -07:00
sdong	5b656584af	crash_test: disable periodic compaction in FIFO compaction. (#5993 ) Summary: A recent commit made the periodic compaction option valid in FIFO, where it means TTL. But we failed to disable it in crash test, causing an assert failure. Fix it by having it disabled. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5993 Test Plan: Restart "make crash_test" many times and make sure --periodic_compaction_seconds=0 is always the case when --compaction_style=2 Differential Revision: D18263223 fbshipit-source-id: c91a802017d83ae89ac43827d1b0012861933814	2019-10-31 17:28:03 -07:00
Levi Tamasi	351e25401b	Disable pre-5.5 versions in the format compatibility test (#5990 ) Summary: We have updated earlier release branches going back to 5.5 so they are built using gcc7 by default. Disabling ancient versions before that until we figure out a plan for them. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5990 Test Plan: Ran the script locally. Differential Revision: D18252386 Pulled By: ltamasi fbshipit-source-id: a7bbb30dc52ff2eaaf31a29ecc79f7cf4e2834dc	2019-10-31 13:45:02 -07:00
sdong	0337d87b42	crash_test: disable atomic flush with pipelined write (#5986 ) Summary: Recently, pipelined write is enabled even if atomic flush is enabled, which causes sanitizing failure in db_stress. Revert this change. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5986 Test Plan: Run "make crash_test_with_atomic_flush" and see it run for a while so that the old sanitizing error (which showed up quickly) doesn't show up. Differential Revision: D18228278 fbshipit-source-id: 27fdf2f8e3e77068c9725a838b9bef4ab25a2553	2019-10-30 11:36:55 -07:00
sdong	15119f08e2	Add more release branches to tools/check_format_compatible.sh (#5985 ) Summary: More release branches are created. We should include them in continuous format compatibility checks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5985 Test Plan: Let's see whether it passes. Differential Revision: D18226532 fbshipit-source-id: 75d8cad5b03ccea4ce16f00cea1f8b7893b0c0c8	2019-10-30 11:20:49 -07:00
sdong	a3960fc875	Move pipeline write waiting logic into WaitForPendingWrites() (#5716 ) Summary: In pipeline writing mode, memtable switching needs to wait for memtable writing to finish to make sure that when memtables are made immutable, inserts are not going to them. This is currently done in DBImpl::SwitchMemtable(). This is done after flush_scheduler_.TakeNextColumnFamily() is called to fetch the list of column families to switch. The function flush_scheduler_.TakeNextColumnFamily() itself, however, is not thread-safe when being called together with flush_scheduler_.ScheduleFlush(). This change provides a fix, which moves the waiting logic before flush_scheduler_.TakeNextColumnFamily(). WaitForPendingWrites() is a natural place where the logic can happen. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5716 Test Plan: Run all tests with ASAN and TSAN. Differential Revision: D18217658 fbshipit-source-id: b9c5e765c9989645bf10afda7c5c726c3f82f6c3	2019-10-29 18:16:36 -07:00
sdong	f22aaf8b3f	db_stress: CF Consistency check to use random CF to validate iterator results (#5983 ) Summary: Right now, in db_stress's iterator tests, we always use the same CF to validate iterator results. This commit changes it so that a randomized CF is used in Cf consistency test, where every CF should have exactly the same data. This would help catch more bugs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5983 Test Plan: Run "make crash_test_with_atomic_flush". Differential Revision: D18217643 fbshipit-source-id: 3ac998852a0378bb59790b20c5f236f6a5d681fe	2019-10-29 18:16:35 -07:00
sdong	9f1e5a0b87	CfConsistencyStressTest to validate key consistent across CFs in TestGet() (#5863 ) Summary: Right now in CF consistency stress test's TestGet(), keys are just fetched without validation. With this change, in 1/2 the time, compare that all the CFs share the same value with the same key. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5863 Test Plan: Run "make crash_test_with_atomic_flush" and see tests pass. Hack the code to generate some inconsistency and observe the test fails as expected. Differential Revision: D17934206 fbshipit-source-id: 00ba1a130391f28785737b677f80f366fb83cced	2019-10-23 16:57:16 -07:00
Yanqin Jin	c0abc6bbc1	Use FLAGS_env for certain operations in db_bench (#5943 ) Summary: Since we already parse env_uri from command line and creates custom Env accordingly, we should invoke the methods of such Envs instead of using Env::Default(). Test Plan (on devserver): ``` $make db_bench db_stress $./db_bench -benchmarks=fillseq ./db_stress ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5943 Differential Revision: D18018550 Pulled By: riversand963 fbshipit-source-id: 03b61329aaae0dfd914a0b902cc677f570f102e3	2019-10-22 11:43:21 -07:00
Yanqin Jin	c53db172a1	Fix TestIterate for HashSkipList in db_stress (#5942 ) Summary: Since SeekForPrev (used by Prev) is not supported by HashSkipList when prefix is used, we disable it when stress testing HashSkipList. - Change the default memtablerep to skip list. - Avoid Prev() when memtablerep is HashSkipList and prefix is used. Test Plan (on devserver): ``` $make db_stress $./db_stress -ops_per_thread=10000 -reopen=1 -destroy_db_initially=true -column_families=1 -threads=1 -column_families=1 -memtablerep=prefix_hash $# or simply $./db_stress $./db_stress -memtablerep=prefix_hash ``` Results must print "Verification successful". Pull Request resolved: https://github.com/facebook/rocksdb/pull/5942 Differential Revision: D18017062 Pulled By: riversand963 fbshipit-source-id: af867e59aa9e6f533143c984d7d529febf232fd7	2019-10-18 15:49:12 -07:00
Peter Dillinger	fe464bca5c	Fix PlainTableReader not to crash sst_dump (#5940 ) Summary: Plain table SSTs could crash sst_dump because of a bug in PlainTableReader that can leave table_properties_ as null. Even if it was intended not to keep the table properties in some cases, they were leaked on the offending code path. Steps to reproduce: $ db_bench --benchmarks=fillrandom --num=2000000 --use_plain_table --prefix-size=12 $ sst_dump --file=0000xx.sst --show_properties from [] to [] Process /dev/shm/dbbench/000014.sst Sst file format: plain table Raw user collected properties ------------------------------ Segmentation fault (core dumped) Also added missing unit testing of plain table full_scan_mode, and an assertion in NewIterator to check for regression. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5940 Test Plan: new unit test, manual, make check Differential Revision: D18018145 Pulled By: pdillinger fbshipit-source-id: 4310c755e824c4cd6f3f86a3abc20dfa417c5e07	2019-10-18 14:44:42 -07:00
Zhichao Cao	526e3b9763	Enable trace_replay with multi-threads (#5934 ) Summary: In the current trace replay, all the queries are serialized and called by single threads. It may not simulate the original application query situations closely. The multi-threads replay is implemented in this PR. Users can set the number of threads to replay the trace. The queries generated according to the trace records are scheduled in the thread pool job queue. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5934 Test Plan: test with make check and real trace replay. Differential Revision: D17998098 Pulled By: zhichao-cao fbshipit-source-id: 87eecf6f7c17a9dc9d7ab29dd2af74f6f60212c8	2019-10-18 14:13:50 -07:00
Yanqin Jin	e60cc0925c	Expose db stress tests (#5937 ) Summary: expose db stress test by providing db_stress_tool.h in public header. This PR does the following: - adds a new header, db_stress_tool.h, in include/rocksdb/ - renames db_stress.cc to db_stress_tool.cc - adds a db_stress.cc which simply invokes a test function. - update Makefile accordingly. Test Plan (dev server): ``` make db_stress ./db_stress ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5937 Differential Revision: D17997647 Pulled By: riversand963 fbshipit-source-id: 1a8d9994f89ce198935566756947c518f0052410	2019-10-18 09:46:44 -07:00
Levi Tamasi	fdc1cb43a6	Support decoding blob indexes in sst_dump (#5926 ) Summary: The patch adds a new command line parameter --decode_blob_index to sst_dump. If this switch is specified, sst_dump prints blob indexes in a human readable format, printing the blob file number, offset, size, and expiration (if applicable) for blob references, and the blob value (and expiration) for inlined blobs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5926 Test Plan: Used db_bench's BlobDB mode to generate SST files containing blob references with and without expiration, as well as inlined blobs with and without expiration (note: the latter are stored as plain values), and confirmed sst_dump correctly prints all four types of records. Differential Revision: D17939077 Pulled By: ltamasi fbshipit-source-id: edc5f58fee94ba35f6699c6a042d5758f5b3963d	2019-10-17 19:36:54 -07:00
Levi Tamasi	78b28d80b0	Support non-TTL Puts for BlobDB in db_bench (#5921 ) Summary: Currently, db_bench only supports PutWithTTL operations for BlobDB but not regular Puts. The patch adds support for regular (non-TTL) Puts and also changes the default for blob_db_max_ttl_range to zero, which corresponds to no TTL. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5921 Test Plan: make check ./db_bench -benchmarks=fillrandom -statistics -stats_interval_seconds=1 -duration=90 -num=500000 -use_blob_db=1 -blob_db_file_size=1000000 -target_file_size_base=1000000 (issues Put operations with no TTL) ./db_bench -benchmarks=fillrandom -statistics -stats_interval_seconds=1 -duration=90 -num=500000 -use_blob_db=1 -blob_db_file_size=1000000 -target_file_size_base=1000000 -blob_db_max_ttl_range=86400 (issues PutWithTTL operations with random TTLs in the [0, blob_db_max_ttl_range) interval, as before) Differential Revision: D17919798 Pulled By: ltamasi fbshipit-source-id: b946c3522b836b92b4c157ffbad24f92ba2b0a16	2019-10-14 17:49:20 -07:00
Maysam Yabandeh	a6e615a7ba	Enable partitioned index/filter in stress tests (#5918 ) Summary: This is the 3rd attempt after the revert of https://github.com/facebook/rocksdb/issues/4020 and https://github.com/facebook/rocksdb/issues/5895 The last bug is fixed https://github.com/facebook/rocksdb/pull/5907 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5918 Test Plan: ``` make -j32 crash_test ``` Differential Revision: D17909489 Pulled By: maysamyabandeh fbshipit-source-id: 7dfb8cf998c2d295c86465dd21734593d277887e	2019-10-14 10:35:18 -07:00
Yanqin Jin	bc8b05cb77	Revert "Enable partitioned index/filter in stress tests (#5895 )" (#5904 ) Summary: This reverts commit `2f4e288143`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5904 Differential Revision: D17871282 Pulled By: riversand963 fbshipit-source-id: d210725f8f3b26d8eac25892094da09d9694337e	2019-10-10 19:19:39 -07:00
anand76	80ad996b35	Make the db_stress reopen loop in OperateDb() more robust (#5893 ) Summary: The loop in OperateDb() is getting quite complicated with the introduction of multiple key operations such as MultiGet and Reseeks. This is resulting in a number of corner cases that hangs db_stress due to synchronization problems during reopen (i.e when -reopen=<> option is specified). This PR makes it more robust by ensuring all db_stress threads vote to reopen the DB the exact same number of times. Most of the changes in this diff are due to indentation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5893 Test Plan: Run crash test Differential Revision: D17823827 Pulled By: anand1976 fbshipit-source-id: ec893829f611ac7cac4057c0d3d99f9ffb6a6dd9	2019-10-09 09:27:10 -07:00
Yanqin Jin	167cdc9f17	Support custom env in sst_dump (#5845 ) Summary: This PR allows for the creation of custom env when using sst_dump. If the user does not set options.env or set options.env to nullptr, then sst_dump will automatically try to create a custom env depending on the path to the sst file or db directory. In order to use this feature, the user must call ObjectRegistry::Register() beforehand. Test Plan (on devserver): ``` $make all && make check ``` All tests must pass to ensure this change does not break anything. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5845 Differential Revision: D17678038 Pulled By: riversand963 fbshipit-source-id: 58ecb4b3f75246d52b07c4c924a63ee61c1ee626	2019-10-08 19:19:12 -07:00
Maysam Yabandeh	2f4e288143	Enable partitioned index/filter in stress tests (#5895 ) Summary: This is the 2nd attempt after the revert of https://github.com/facebook/rocksdb/pull/4020 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5895 Test Plan: ``` ./tools/db_crashtest.py blackbox --simple --interval=10 --max_key=10000000 ``` Differential Revision: D17822137 Pulled By: maysamyabandeh fbshipit-source-id: 3d148c0d8cc129080410ff859c04b544223c8ea3	2019-10-08 16:50:21 -07:00
anand76	cca87d7722	Fix reopen voting logic in db_stress to prevent hangs (#5876 ) Summary: When multiple operations are performed in a db_stress thread in one loop iteration, the reopen voting logic needs to take that into account. It was doing that for MultiGet, but a new option was introduced recently to do multiple iterator seeks per iteration, which broke it again. Fix the logic to be more robust and agnostic of the type of operation performed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5876 Test Plan: Run db_stress Differential Revision: D17733590 Pulled By: anand1976 fbshipit-source-id: 787f01abefa1e83bba43e0b4f4abb26699b2089e	2019-10-03 10:22:26 -07:00
sdong	503a756e42	Fix clang analyze warning in db_stress (#5870 ) Summary: Recent changes trigger clang analyze warning. Fix it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5870 Test Plan: "USE_CLANG=1 TEST_TMPDIR=/dev/shm/rocksdb OPT=-g make -j60 analyze" and make sure it passes. Differential Revision: D17682533 fbshipit-source-id: 02716f2a24572550a22db4bbe9b54d4872dfae32	2019-09-30 22:15:27 -07:00
Jay Zhuang	51413e0a85	Fix a compile error (#5864 ) Summary: ``` tools/block_cache_analyzer/block_cache_trace_analyzer.cc:653:48: error: implicit conversion loses integer precision: 'uint64_t' (aka 'unsigned long long') to 'std::__1::linear_congruential_engine<unsigned int, 48271, 0, 2147483647>::result_type' (aka 'unsigned int') [-Werror,-Wshorten-64-to-32] std::default_random_engine rand_engine(env_->NowMicros()); ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5864 Differential Revision: D17668962 fbshipit-source-id: e08fa58b2a78a8dd8b334862b5714208f696b8ab	2019-09-30 14:02:19 -07:00
sdong	69c4ccb970	Fix three more db_stress bugs (#5867 ) Summary: Three more bug fixes in db_stress: 1. this is to complete the fix of the regression bug causing overflowing when supporting FLAGS_prefix_size = -1. 2. Fix regression bug in compare iterator itself: (1) when creating control iterator, which used the same read option as the normal iterator by mistake; (2) the logic of comparing has some problems. Fix them. (3) disable validation for lower bound now, which generated some wildly different results. Disabling it to make normal tests pass while investigating it. 3. Cleaning up snapshots in verification failure cases. Memory is leaked otherwise. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5867 Test Plan: Run "make crash_test" for a while and see at least 1 is fixed. Differential Revision: D17671712 fbshipit-source-id: 011f98ea1a72aef23e19ff28656830c78699b402	2019-09-30 12:38:23 -07:00
sdong	5cd8aaf75f	db_stress: fix run time error when prefix_size = -1 (#5862 ) Summary: When prefix_size = -1, stress test crashes with run time error because of overflow. Fix it by not using -1 but 7 in prefix scan mode. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5862 Test Plan: Run python -u tools/db_crashtest.py --simple whitebox --random_kill_odd \ 888887 --compression_type=zstd and see it doesn't crash. Differential Revision: D17642313 fbshipit-source-id: f029e7651498c905af1b1bee6d310ae50cdcda41	2019-09-27 16:55:57 -07:00
sdong	679a45d0cb	crash_test to do some verification for prefix extractor and iterator bounds. (#5846 ) Summary: For now, crash_test is not able to report any failure for the logic related to iterator upper, lower bounds or iterators, or reseek. These are features prone to errors. Improve db_stress in several ways: (1) For each iterator run, reseek up to 3 times. (2) For every iterator, create control iterator with upper or lower bound, with total order seek. Compare the results with the iterator. (3) Make simple crash test to avoid prefix size to have more coverage. (4) make prefix_size = 0 a valid size and -1 to indicate disabling prefix extractor. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5846 Test Plan: Manually hack the code to create wrong results and see they are caught by the tool. Differential Revision: D17631760 fbshipit-source-id: acd460a177bd2124a5ffd7fff490702dba63030b	2019-09-27 11:10:44 -07:00
sdong	e8263dbdaa	Apply formatter to recent 200+ commits. (#5830 ) Summary: Further apply formatter to more recent commits. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5830 Test Plan: Run all existing tests. Differential Revision: D17488031 fbshipit-source-id: 137458fd94d56dd271b8b40c522b03036943a2ab	2019-09-20 12:04:26 -07:00
sdong	c06b54d0c6	Apply formatter on recent 45 commits. (#5827 ) Summary: Some recent commits might not have passed through the formatter. I formatted recent 45 commits. The script hangs for more commits so I stopped there. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5827 Test Plan: Run all existing tests. Differential Revision: D17483727 fbshipit-source-id: af23113ee63015d8a43d89a3bc2c1056189afe8f	2019-09-19 12:34:17 -07:00
Maysam Yabandeh	6ec6a4a9a4	Remove snap_refresh_nanos option (#5826 ) Summary: The snap_refresh_nanos option didn't bring much benefit. Remove the feature to simplify the code. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5826 Differential Revision: D17467147 Pulled By: maysamyabandeh fbshipit-source-id: 4f950b046990d0d1292d7fc04c2ccafaf751c7f0	2019-09-18 20:26:04 -07:00
Levi Tamasi	94d62d771e	Temporarily disable partitioned index/filter in stress test (#5811 ) Summary: PR https://github.com/facebook/rocksdb/issues/4020 enabled partitioned indexes/filters in stress tests; however, this causes assertion failures in BatchedOpsStressTest. This patch disables them until we can root cause the failures. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5811 Test Plan: Ran the script and made sure it only uses the binary search index. Differential Revision: D17399366 Pulled By: ltamasi fbshipit-source-id: adb116e6297f9c6ccd7ac15b6a16c9aa91f21ac5	2019-09-16 11:41:35 -07:00
sdong	b931f84e56	Divide file_reader_writer.h and .cc (#5803 ) Summary: file_reader_writer.h and .cc contain several files and helper function, and it's hard to navigate. Separate it to multiple files and put them under file/ Pull Request resolved: https://github.com/facebook/rocksdb/pull/5803 Test Plan: Build whole project using make and cmake. Differential Revision: D17374550 fbshipit-source-id: 10efca907721e7a78ed25bbf74dc5410dea05987	2019-09-16 10:33:51 -07:00
Peter (Stig) Edwards	2ed91622fb	sst_dump recompress show #blocks compressed and not compressed (#5791 ) Summary: Closes https://github.com/facebook/rocksdb/issues/1474 Helps show when the 12.5% threshold for GoodCompressionRatio (originally from ldb) is hit. Example output: ``` > ./sst_dump --file=/tmp/test.sst --command=recompress from [] to [] Process /tmp/test.sst Sst file format: block-based Block Size: 16384 Compression: kNoCompression Size: 122579836 Blocks: 2300 Compressed: 0 ( 0.0%) Not compressed (ratio): 2300 (100.0%) Not compressed (abort): 0 ( 0.0%) Compression: kSnappyCompression Size: 46289962 Blocks: 2300 Compressed: 2119 ( 92.1%) Not compressed (ratio): 181 ( 7.9%) Not compressed (abort): 0 ( 0.0%) Compression: kZlibCompression Size: 29689825 Blocks: 2300 Compressed: 2301 (100.0%) Not compressed (ratio): 0 ( 0.0%) Not compressed (abort): 0 ( 0.0%) Unsupported compression type: kBZip2Compression. Compression: kLZ4Compression Size: 44785490 Blocks: 2300 Compressed: 1950 ( 84.8%) Not compressed (ratio): 350 ( 15.2%) Not compressed (abort): 0 ( 0.0%) Compression: kLZ4HCCompression Size: 37498895 Blocks: 2300 Compressed: 2301 (100.0%) Not compressed (ratio): 0 ( 0.0%) Not compressed (abort): 0 ( 0.0%) Unsupported compression type: kXpressCompression. Compression: kZSTD Size: 32208707 Blocks: 2300 Compressed: 2301 (100.0%) Not compressed (ratio): 0 ( 0.0%) Not compressed (abort): 0 ( 0.0%) ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5791 Differential Revision: D17347870 fbshipit-source-id: af10849c010b46b20e54162b70123c2805ffe526	2019-09-13 16:30:41 -07:00
Lingjing You	1a928c22a0	Add insert hints for each writebatch (#5728 ) Summary: Add insert hints for each writebatch so that they can be used in concurrent write, and add write option to enable it. Bench result (qps): `./db_bench --benchmarks=fillseq -allow_concurrent_memtable_write=true -num=4000000 -batch-size=1 -threads=1 -db=/data3/ylj/tmp -write_buffer_size=536870912 -num_column_families=4` master: \| batch size \ thread num \| 1 \| 2 \| 4 \| 8 \| \| ----------------------- \| ------- \| ------- \| ------- \| ------- \| \| 1 \| 387883 \| 220790 \| 308294 \| 490998 \| \| 10 \| 1397208 \| 978911 \| 1275684 \| 1733395 \| \| 100 \| 2045414 \| 1589927 \| 1798782 \| 2681039 \| \| 1000 \| 2228038 \| 1698252 \| 1839877 \| 2863490 \| fillseq with writebatch hint: \| batch size \ thread num \| 1 \| 2 \| 4 \| 8 \| \| ----------------------- \| ------- \| ------- \| ------- \| ------- \| \| 1 \| 286005 \| 223570 \| 300024 \| 466981 \| \| 10 \| 970374 \| 813308 \| 1399299 \| 1753588 \| \| 100 \| 1962768 \| 1983023 \| 2676577 \| 3086426 \| \| 1000 \| 2195853 \| 2676782 \| 3231048 \| 3638143 \| Pull Request resolved: https://github.com/facebook/rocksdb/pull/5728 Differential Revision: D17297240 fbshipit-source-id: b053590a6d77871f1ef2f911a7bd013b3899b26c	2019-09-12 17:15:18 -07:00
Levi Tamasi	d35ffd569c	Temporarily disable hash index in stress tests (#5792 ) Summary: PR https://github.com/facebook/rocksdb/issues/4020 implicitly enabled the hash index as well in stress/crash tests, resulting in assertion failures in Block. This patch disables the hash index until we can pinpoint the root cause of these issues. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5792 Test Plan: Ran tools/db_crashtest.py and made sure it only uses index types 0 and 2 (binary search and partitioned index). Differential Revision: D17346777 Pulled By: ltamasi fbshipit-source-id: b4318f37f1fda3ee1bbff4ef2c2f556ca9e6b551	2019-09-12 12:11:34 -07:00
Adam Retter	e8c2e68b4e	Fix RocksDB bug in block_cache_trace_analyzer.cc on Windows (#5786 ) Summary: This is required to compile on Windows with Visual Studio 2015. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5786 Differential Revision: D17335994 fbshipit-source-id: 8f9568310bc6f697e312b5e24ad465e9084f0011	2019-09-11 18:36:41 -07:00
Andrew Kryczka	dd2a35f13f	Support partitioned index and filters in stress/crash tests (#4020 ) Summary: - In `db_stress`, support choosing index type and whether to enable filter partitioning, and randomly set those options in crash test - When partitioned filter is enabled by crash test, force partitioned index to also be enabled since it's a prerequisite Pull Request resolved: https://github.com/facebook/rocksdb/pull/4020 Test Plan: currently this is blocked on fixing the bug that crash test caught: ``` $ TEST_TMPDIR=/data/compaction_bench python ./tools/db_crashtest.py blackbox --simple --interval=10 --max_key=10000000 ... Verification failed for column family 0 key 937501: Value not found: NotFound: Crash-recovery verification failed :( ``` Differential Revision: D8508683 Pulled By: maysamyabandeh fbshipit-source-id: 0337e5d0558bcef26b1f3699f47265a2c1e99629	2019-09-11 14:13:38 -07:00
anand76	eb9026f09b	Add a db_bench benchmark to warm up the row cache Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5707 Differential Revision: D17242698 Pulled By: anand1976 fbshipit-source-id: 5d1bfda3c9e8f56176ae391cae6c91e6262016b8	2019-09-10 11:06:36 -07:00
sdong	1daff8f85a	crash_test to skip compaction TTL for FIFO compaction. (#5749 ) Summary: https://github.com/facebook/rocksdb/pull/5741 added compaction TTL to crash test, but it causes assertion fails for FIFO compaction. Disable this combination for now while we debug the assertion failure. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5749 Test Plan: Run crash test and observe that when compaction_style=2, compaction_ttl is always 0. Differential Revision: D17078292 fbshipit-source-id: 446821a3b9739956094d5e4f9be1251a15b57f5d	2019-08-27 17:55:37 -07:00
sdong	1d6a10f52d	Extend stress test to cover periodic compaction and compaction TTL (#5741 ) Summary: Covering periodic compaction and compaction TTL can help us expose potential issues. Add it there. Randomly select value for these two options. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5741 Test Plan: Run crash_test and see the parameters generated. Differential Revision: D17059515 fbshipit-source-id: 8213974846a0b6a22fc13be705825c9054d1d097	2019-08-26 15:03:25 -07:00
Zhongyi Xie	2f41ecfe75	Refactor trimming logic for immutable memtables (#5022 ) Summary: MyRocks currently sets `max_write_buffer_number_to_maintain` in order to maintain enough history for transaction conflict checking. The effectiveness of this approach depends on the size of memtables. When memtables are small, it may not keep enough history; when memtables are large, this may consume too much memory. We are proposing a new way to configure memtable list history: by limiting the memory usage of immutable memtables. The new option is `max_write_buffer_size_to_maintain` and it will take precedence over the old `max_write_buffer_number_to_maintain` if they are both set to non-zero values. The new option accounts for the total memory usage of flushed immutable memtables and mutable memtable. When the total usage exceeds the limit, RocksDB may start dropping immutable memtables (which is also called trimming history), starting from the oldest one. The semantics of the old option actually work as both an upper bound and a lower bound. History trimming will start if the number of immutable memtables exceeds the limit, but it will never go below (limit-1) due to history trimming. In order to mimic the behavior with the new option, history trimming will stop if dropping the next immutable memtable would cause the total memory usage to go below the size limit. For example, assume the size limit is set to 64MB and there are 3 immutable memtables with sizes of 20MB, 30MB, and 30MB. Although the total memory usage is 80MB > 64MB, dropping the oldest memtable would reduce the memory usage to 60MB < 64MB, so in this case no memtable will be dropped. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5022 Differential Revision: D14394062 Pulled By: miasantreble fbshipit-source-id: 60457a509c6af89d0993f988c9b5c2aa9e45f5c5	2019-08-23 13:55:34 -07:00
sdong	d8a27d9331	Atomic Flush Crash Test also covers the case that WAL is enabled. (#5729 ) Summary: AtomicFlushStressTest is a powerful test, but right now we only run it for atomic_flush=true + disable_wal=true. We further extend it to the case where atomic_flush=false + disable_wal = false. All the workload generation and validation can stay the same. Atomic flush crash test is also changed to switch between the two test scenarios. This makes the name "atomic flush crash test" out of sync with what it really does. We leave it as it is to avoid trouble with the continuous test set-up. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5729 Test Plan: Run "CRASH_TEST_KILL_ODD=188 TEST_TMPDIR=/dev/shm/ USE_CLANG=1 make whitebox_crash_test_with_atomic_flush", observe the settings used and see it passed. Differential Revision: D16969791 fbshipit-source-id: 56e37487000ae631e31b0100acd7bdc441c04163	2019-08-22 16:32:55 -07:00
sdong	8e12638f3d	Slightly adjust atomic white box test's kill odd (#5717 ) Summary: Atomic white box test's kill odd is the same as normal test. However, in the scenario that only WritableFileWriter::Append() is blacklisted, WritableFileWriter::Flush() dominates the killing odds. Normally, most of the WritableFileWriter::Flush() calls come from WAL writes, where every write triggers a WAL flush. In the atomic test, WAL is disabled, so the kill happens less frequently than we anticipated. In some rare cases, the kill didn't end up happening (for reasons I still don't fully understand) and caused the stress test to time out. If WAL is disabled, make the kill 5x more likely to trigger. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5717 Test Plan: Run whitebox_crash_test_with_atomic_flush and whitebox_crash_test and observe the kill odds printed out. Differential Revision: D16897237 fbshipit-source-id: cbf5d96f6fc0e980523d0f1f94bf4e72cdb82d1c	2019-08-19 10:51:59 -07:00
sdong	e1c468d16f	Do readahead in VerifyChecksum() (#5713 ) Summary: Right now VerifyChecksum() doesn't do read-ahead. In some use cases, users won't be able to achieve good performance. With this change, by default, RocksDB will do a default readahead, and users will be able to overwrite the readahead size by passing in a ReadOptions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5713 Test Plan: Add a new unit test. Differential Revision: D16860874 fbshipit-source-id: 0cff0fe79ac855d3d068e6ccd770770854a68413	2019-08-16 16:42:56 -07:00
sdong	bd2c753dd0	Add command "list_file_range_deletes" in ldb (#5615 ) Summary: Add a command in ldb so that users can print out tombstones in SST files. In order to test the code, change the interface of LDBCommandRunner::RunCommand() so that it doesn't return from the program, but return the status code. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5615 Test Plan: Add a new unit test Differential Revision: D16550326 fbshipit-source-id: 88ddfe6984bdcbb3a528abdd115089df09eba52e	2019-08-15 17:01:03 -07:00
haoyuhuang	3da225716c	Block cache analyzer: Support reading from human readable trace file. (#5679 ) Summary: This PR adds support in block cache trace analyzer to read from human readable trace file. This is needed when a user does not have access to the binary trace file. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5679 Test Plan: USE_CLANG=1 make check -j32 Differential Revision: D16697239 Pulled By: HaoyuHuang fbshipit-source-id: f2e29d7995816c389b41458f234ec8e184a924db	2019-08-09 13:13:54 -07:00
haoyuhuang	6e78fe3c8d	Pysim more algorithms (#5644 ) Summary: This PR adds four more eviction policies. - OPT [1] - Hyperbolic caching [2] - ARC [3] - GreedyDualSize [4] [1] L. A. Belady. 1966. A Study of Replacement Algorithms for a Virtual-storage Computer. IBM Syst. J. 5, 2 (June 1966), 78-101. DOI=http://dx.doi.org/10.1147/sj.52.0078 [2] Aaron Blankstein, Siddhartha Sen, and Michael J. Freedman. 2017. Hyperbolic caching: flexible caching for web applications. In Proceedings of the 2017 USENIX Conference on Usenix Annual Technical Conference (USENIX ATC '17). USENIX Association, Berkeley, CA, USA, 499-511. [3] Nimrod Megiddo and Dharmendra S. Modha. 2003. ARC: A Self-Tuning, Low Overhead Replacement Cache. In Proceedings of the 2nd USENIX Conference on File and Storage Technologies (FAST '03). USENIX Association, Berkeley, CA, USA, 115-130. [4] N. Young. The k-server dual and loose competitiveness for paging. Algorithmica, June 1994, vol. 11,(no.6):525-41. Rewritten version of ''On-line caching as cache size varies'', in The 2nd Annual ACM-SIAM Symposium on Discrete Algorithms, 241-250, 1991. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5644 Differential Revision: D16548817 Pulled By: HaoyuHuang fbshipit-source-id: 838f76db9179f07911abaab46c97e1c929cfcd63	2019-08-06 18:50:59 -07:00
Vijay Nadimpalli	d150e01474	New API to get all merge operands for a Key (#5604 ) Summary: This is a new API added to db.h to allow for fetching all merge operands associated with a Key. The main motivation for this API is to support use cases where doing a full online merge is not necessary as it is performance sensitive. Example use-cases: 1. Update subset of columns and read subset of columns - Imagine a SQL Table, a row is encoded as a K/V pair (as it is done in MyRocks). If there are many columns and users only updated one of them, we can use merge operator to reduce write amplification. While users only read one or two columns in the read query, this feature can avoid a full merging of the whole row, and save some CPU. 2. Updating very few attributes in a value which is a JSON-like document - Updating one attribute can be done efficiently using merge operator, while reading back one attribute can be done more efficiently if we don't need to do a full merge. ---------------------------------------------------------------------------------------------------- API : Status GetMergeOperands( const ReadOptions& options, ColumnFamilyHandle* column_family, const Slice& key, PinnableSlice* merge_operands, GetMergeOperandsOptions* get_merge_operands_options, int* number_of_operands) Example usage : int size = 100; int number_of_operands = 0; std::vector<PinnableSlice> values(size); GetMergeOperandsOptions merge_operands_info; db_->GetMergeOperands(ReadOptions(), db_->DefaultColumnFamily(), "k1", values.data(), merge_operands_info, &number_of_operands); Description : Returns all the merge operands corresponding to the key. If the number of merge operands in DB is greater than merge_operands_options.expected_max_number_of_operands no merge operands are returned and status is Incomplete. Merge operands returned are in the order of insertion. 
merge_operands-> Points to an array of at-least merge_operands_options.expected_max_number_of_operands and the caller is responsible for allocating it. If the status returned is Incomplete then number_of_operands will contain the total number of merge operands found in DB for key. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5604 Test Plan: Added unit test and perf test in db_bench that can be run using the command: ./db_bench -benchmarks=getmergeoperands --merge_operator=sortlist Differential Revision: D16657366 Pulled By: vjnadimpalli fbshipit-source-id: 0faadd752351745224ee12d4ae9ef3cb529951bf	2019-08-06 14:26:44 -07:00
haoyuhuang	f4a616ebf9	Block cache analyzer: python script to plot graphs (#5673 ) Summary: This PR updated the python script to plot graphs for stats output from block cache analyzer. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5673 Test Plan: Manually run the script to generate graphs. Differential Revision: D16657145 Pulled By: HaoyuHuang fbshipit-source-id: fd510b5fd4307835f9a986fac545734dbe003d28	2019-08-05 18:35:52 -07:00
haoyuhuang	70c7302fb5	Block cache simulator: Add pysim to simulate caches using reinforcement learning. (#5610 ) Summary: This PR implements cache eviction using reinforcement learning. It includes two implementations: 1. An implementation of Thompson Sampling for the Bernoulli Bandit [1]. 2. An implementation of LinUCB with disjoint linear models [2]. The idea is that a cache uses multiple eviction policies, e.g., MRU, LRU, and LFU. The cache learns which eviction policy is the best and uses it upon a cache miss. Thompson Sampling is contextless and does not include any features. LinUCB includes features such as level, block type, caller, column family id to decide which eviction policy to use. [1] Daniel J. Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, and Zheng Wen. 2018. A Tutorial on Thompson Sampling. Found. Trends Mach. Learn. 11, 1 (July 2018), 1-96. DOI: https://doi.org/10.1561/2200000070 [2] Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. 2010. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th international conference on World wide web (WWW '10). ACM, New York, NY, USA, 661-670. DOI=http://dx.doi.org/10.1145/1772690.1772758 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5610 Differential Revision: D16435067 Pulled By: HaoyuHuang fbshipit-source-id: 6549239ae14115c01cb1e70548af9e46d8dc21bb	2019-07-26 14:41:13 -07:00
Mark Rambacher	cfcf045acc	The ObjectRegistry class replaces the Registrar and NewCustomObjects.… (#5293 ) Summary: The ObjectRegistry class replaces the Registrar and NewCustomObjects. Objects are registered with the registry by Type (the class must implement the static const char *Type() method). This change is necessary for a few reasons: - By having a class (rather than static template instances), the class can be passed between compilation units, meaning that objects could be registered and shared from a dynamic library with an executable. - By having a class with instances, different units could have different objects registered. This could be useful if, for example, one Option allowed for a dynamic library and one did not. When combined with some other PRs (being able to load shared libraries, a Configurable interface to configure objects to/from string), this code will allow objects in external shared libraries to be added to a RocksDB image at run-time, rather than requiring every new extension to be built into the main library and called explicitly by every program. Test plan (on riversand963's devserver) ``` $COMPILE_WITH_ASAN=1 make -j32 all && sleep 1 && make check ``` All tests pass. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5293 Differential Revision: D16363396 Pulled By: riversand963 fbshipit-source-id: fbe4acb615bfc11103eef40a0b288845791c0180	2019-07-23 17:13:05 -07:00
sdong	3782accf7d	ldb sometimes specify a string-append merge operator (#5607 ) Summary: Right now, the default ldb cannot scan a DB with merge operands. There is no harm in giving it a general merge operator so that it can at least print out something. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5607 Test Plan: Run ldb against a DB with merge operands and see the outputs. Differential Revision: D16442634 fbshipit-source-id: c66c414ec07f219cfc6e6ec2cc14c783ee95df54	2019-07-23 14:25:18 -07:00
haoyuhuang	3778470061	Block cache analyzer: Compute correlation of features and human readable trace file. (#5596 ) Summary: - Compute correlation between a few features and predictions, e.g., number of accesses since the last access vs number of accesses till the next access on a block. - Output human readable trace file so python can consume it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5596 Test Plan: make clean && USE_CLANG=1 make check -j32 Differential Revision: D16373200 Pulled By: HaoyuHuang fbshipit-source-id: c848d26bc2e9210461f317d7dbee42d55be5a0cc	2019-07-22 17:51:34 -07:00
Yanqin Jin	a78503bd6c	Temporarily disable snapshot list refresh for atomic flush stress test (#5581 ) Summary: Atomic flush test started to fail after https://github.com/facebook/rocksdb/issues/5099. Then https://github.com/facebook/rocksdb/issues/5278 provided a fix after which the same error occurred much less frequently. However, it still occurs occasionally. Not sure what the root cause is. This PR disables the feature of snapshot list refresh, and we should keep an eye on the failure in the future. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5581 Differential Revision: D16295985 Pulled By: riversand963 fbshipit-source-id: c9e62e65133c52c21b07097de359632ca62571e4	2019-07-22 14:38:16 -07:00
sdong	6bb3b4b567	ldb idump to support non-default column families. (#5594 ) Summary: ldb idump now only works for default column family. Extend it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5594 Test Plan: Compile and run the tool against a multiple CF DB. Differential Revision: D16380684 fbshipit-source-id: bfb8af36fdad1806837c90aaaab492d71528aceb	2019-07-19 11:36:59 -07:00
haoyuhuang	8a008d4170	Block access tracing: Trace referenced key for Get on non-data blocks. (#5548 ) Summary: This PR traces the referenced key for Get for all types of blocks. This is useful when evaluating hybrid row-block caches. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5548 Test Plan: make clean && USE_CLANG=1 make check -j32 Differential Revision: D16157979 Pulled By: HaoyuHuang fbshipit-source-id: f6327411c9deb74e35e22a35f66cdbae09ab9d87	2019-07-17 13:05:58 -07:00
Levi Tamasi	3bde41b5a3	Move the filter readers out of the block cache (#5504 ) Summary: Currently, when the block cache is used for the filter block, it is not really the block itself that is stored in the cache but a FilterBlockReader object. Since this object is not pure data (it has, for instance, pointers that might dangle, including in one case a back pointer to the TableReader), it's not really sharable. To avoid the issues around this, the current code erases the cache entries when the TableReader is closed (which, BTW, is not sufficient since a concurrent TableReader might have picked up the object in the meantime). Instead of doing this, the patch moves the FilterBlockReader out of the cache altogether, and decouples the filter reader object from the filter block. In particular, instead of the TableReader owning, or caching/pinning the FilterBlockReader (based on the customer's settings), with the change the TableReader unconditionally owns the FilterBlockReader, which in turn owns/caches/pins the filter block. This change also enables us to reuse the code paths historically used for data blocks for filters as well. Note: Eviction statistics for filter blocks are temporarily broken. We plan to fix this in a separate phase. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5504 Test Plan: make asan_check Differential Revision: D16036974 Pulled By: ltamasi fbshipit-source-id: 770f543c5fb4ed126fd1e04bfd3809cf4ff9c091	2019-07-16 13:14:58 -07:00
haoyuhuang	68d43b4d30	A python script to plot graphs for csv files generated by block_cache_trace_analyzer Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5563 Test Plan: Manually run the script on files generated by block_cache_trace_analyzer. Differential Revision: D16214400 Pulled By: HaoyuHuang fbshipit-source-id: 94485eed995e9b2b63e197c5dfeb80129fa7897f	2019-07-12 18:56:20 -07:00
haoyuhuang	3e9c5a3523	Block cache analyzer: Add more stats (#5516 ) Summary: This PR provides more command line options for block cache analyzer to better understand block cache access pattern. -analyze_bottom_k_access_count_blocks -analyze_top_k_access_count_blocks -reuse_lifetime_labels -reuse_lifetime_buckets -analyze_callers -access_count_buckets -analyze_blocks_reuse_k_reuse_window Pull Request resolved: https://github.com/facebook/rocksdb/pull/5516 Test Plan: make clean && COMPILE_WITH_ASAN=1 make check -j32 Differential Revision: D16037440 Pulled By: HaoyuHuang fbshipit-source-id: b9a4ac0d4712053fab910732077a4d4b91400bc8	2019-07-12 16:55:34 -07:00
haoyuhuang	1a59b6e2a9	Cache simulator: Add a ghost cache for admission control and a hybrid row-block cache. (#5534 ) Summary: This PR adds a ghost cache for admission control. Specifically, it admits an entry on its second access. It also adds a hybrid row-block cache that caches the referenced key-value pairs of a Get/MultiGet request instead of its blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5534 Test Plan: make clean && COMPILE_WITH_ASAN=1 make check -j32 Differential Revision: D16101124 Pulled By: HaoyuHuang fbshipit-source-id: b99edda6418a888e94eb40f71ece45d375e234b1	2019-07-11 12:43:29 -07:00
Yanqin Jin	f786b4a5b4	Improve result print on atomic flush stress test failure (#5549 ) Summary: When atomic flush stress test fails, we print internal keys within the range with mismatched key/values for all column families. Test plan (on devserver) Manually hack the code to randomly insert wrong data. Run the test. ``` $make clean && COMPILE_WITH_TSAN=1 make -j32 db_stress $./db_stress -test_atomic_flush=true -ops_per_thread=10000 ``` Check that proper error messages are printed, as follows: ``` 2019/07/08-17:40:14 Starting verification Verification failed Latest Sequence Number: 190903 [default] 000000000000050B => 56290000525350515E5F5C5D5A5B5859 [3] 0000000000000533 => EE100000EAEBE8E9E6E7E4E5E2E3E0E1FEFFFCFDFAFBF8F9 Internal keys in CF 'default', [000000000000050B, 0000000000000533] (max 8) key 000000000000050B seq 139920 type 1 key 0000000000000533 seq 0 type 1 Internal keys in CF '3', [000000000000050B, 0000000000000533] (max 8) key 0000000000000533 seq 0 type 1 ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5549 Differential Revision: D16158709 Pulled By: riversand963 fbshipit-source-id: f07fa87763f87b3bd908da03c956709c6456bcab	2019-07-09 16:27:22 -07:00
sdong	aa0367aabb	Allow ldb to open DB as secondary (#5537 ) Summary: Right now ldb can open running DB through read-only DB. However, it might leave info logs files to the read-only DB directory. Add an option to open the DB as secondary to avoid it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5537 Test Plan: Run ./ldb scan --max_keys=10 --db=/tmp/rocksdbtest-2491/dbbench --secondary_path=/tmp --no_value --hex and ./ldb get 0x00000000000000103030303030303030 --hex --db=/tmp/rocksdbtest-2491/dbbench --secondary_path=/tmp against a normal db_bench run and observe the output changes. Also observe that no new info logs files are created under /tmp/rocksdbtest-2491/dbbench. Run without --secondary_path and observe that new info logs created under /tmp/rocksdbtest-2491/dbbench. Differential Revision: D16113886 fbshipit-source-id: 4e09dec47c2528f6ca08a9e7a7894ba2d9daebbb	2019-07-09 12:51:28 -07:00
Tim Hatch	a6a9213a36	Fix interpreter lines for files with python2-only syntax. Reviewed By: lisroach Differential Revision: D15362271 fbshipit-source-id: 48fab12ab6e55a8537b19b4623d2545ca9950ec5	2019-07-09 10:51:37 -07:00
sdong	872a261ffc	db_stress to print some internal keys after verification failure (#5543 ) Summary: Print out some more information when db_stress fails with verification failures to help debugging problems. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5543 Test Plan: Manually ingest some failures and observe the outputs are like this: Verification failed [default] 0000000000199A5A => 7C3D000078797A7B74757677707172736C6D6E6F68696A6B [6] 000000000019C8BD => 65380000616063626D6C6F6E69686B6A internal keys in default CF [0000000000199A5A, 000000000019C8BD] (max 8) key 0000000000199A5A seq 179246 type 1 key 000000000019C8BD seq 163970 type 1 Lastest Sequence Number: 292234 Differential Revision: D16153717 fbshipit-source-id: b33fa50a828c190cbf8249a37955432044f92daf	2019-07-08 13:36:37 -07:00
sdong	e4dcf5fd22	db_bench to add a new "benchmark" to print out all stats history (#5532 ) Summary: Sometimes it is helpful to fetch the whole history of stats after benchmark runs. Add such an option Pull Request resolved: https://github.com/facebook/rocksdb/pull/5532 Test Plan: Run the benchmark manually and observe the output is as expected. Differential Revision: D16097764 fbshipit-source-id: 10b5b735a22a18be198b8f348be11f11f8806904	2019-07-03 20:03:28 -07:00
haoyuhuang	66464d1fde	Remove multiple declarations of kMicrosInSecond. Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5526 Test Plan: OPT=-g V=1 make J=1 unity_test -j32 make clean && make -j32 Differential Revision: D16079315 Pulled By: HaoyuHuang fbshipit-source-id: 294ab439cf0db8dd5da44e30eabf0cbb2bb8c4f6	2019-07-01 15:15:12 -07:00
Eli Pozniansky	3e6c185381	Formatting fixes in db_bench_tool (#5525 ) Summary: Formatting fixes in db_bench_tool that were accidentally omitted Pull Request resolved: https://github.com/facebook/rocksdb/pull/5525 Test Plan: Unit tests Differential Revision: D16078516 Pulled By: elipoz fbshipit-source-id: bf8df0e3f08092a91794ebf285396d9b8a335bb9	2019-07-01 14:57:28 -07:00
Eli Pozniansky	f872009237	Fix from some C-style casting (#5524 ) Summary: Fix from some C-style casting in bloom.cc and ./tools/db_bench_tool.cc Pull Request resolved: https://github.com/facebook/rocksdb/pull/5524 Differential Revision: D16075626 Pulled By: elipoz fbshipit-source-id: 352948885efb64a7ef865942c75c3c727a914207	2019-07-01 13:05:34 -07:00
haoyuhuang	9f0bd56889	Cache simulator: Refactor the cache simulator so that we can add alternative policies easily (#5517 ) Summary: This PR creates cache_simulator.h file. It contains a CacheSimulator that runs against a block cache trace record. We can add alternative cache simulators derived from CacheSimulator later. For example, this PR adds a PrioritizedCacheSimulator that inserts filter/index/uncompressed dictionary blocks with high priority. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5517 Test Plan: make clean && COMPILE_WITH_ASAN=1 make check -j32 Differential Revision: D16043689 Pulled By: HaoyuHuang fbshipit-source-id: 65f28ed52b866ffb0e6eceffd7f9ca7c45bb680d	2019-07-01 12:46:32 -07:00
Yanqin Jin	c360675750	Add secondary instance to stress test (#5479 ) Summary: This PR allows users to run stress tests on secondary instance. Test plan (on devserver) ``` ./db_stress -ops_per_thread=100000 -enable_secondary=true -threads=32 -secondary_catch_up_one_in=10000 -clear_column_family_one_in=1000 -reopen=100 ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5479 Differential Revision: D16074325 Pulled By: riversand963 fbshipit-source-id: c0ed959e7b6c7cda3efd0b3070ab379de3b29f1c	2019-07-01 11:49:50 -07:00
sdong	10bae8ceb3	Add more release versions to tools/check_format_compatible.sh (#5518 ) Summary: tools/check_format_compatible.sh has lagged behind. Catch up. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5518 Test Plan: Run the command Differential Revision: D16063180 fbshipit-source-id: d063eb42df9653dec06a2cf0fb982b8a60ca3d2f	2019-06-28 17:41:58 -07:00
Aaron Gao	5c2f13fb14	add create_column_family and drop_column_family cmd to ldb tool (#5503 ) Summary: `create_column_family` cmd already exists but was somehow missed in the help message. also add `drop_column_family` cmd which can drop a cf without opening db. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5503 Test Plan: Updated existing ldb_test.py to test deleting a column family. Differential Revision: D16018414 Pulled By: lightmark fbshipit-source-id: 1fc33680b742104fea86b10efc8499f79e722301	2019-06-27 11:11:48 -07:00
haoyuhuang	554a6456aa	Block cache trace analysis: Write time series graphs in csv files (#5490 ) Summary: This PR adds a feature in block cache trace analysis tool to write statistics into csv files. 1. The analysis tool supports grouping the number of accesses per second by various labels, e.g., block, column family, block type, or a combination of them. 2. It also computes reuse distance and reuse interval. Reuse distance: The cumulated size of unique blocks read between two consecutive accesses on the same block. Reuse interval: The time between two consecutive accesses on the same block. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5490 Differential Revision: D15901322 Pulled By: HaoyuHuang fbshipit-source-id: b5454fea408a32757a80be63de6fe1c8149ca70e	2019-06-24 20:42:12 -07:00
Yanqin Jin	1bfeffab2d	Stop printing after verification fails (#5493 ) Summary: Stop verification and printing once verification fails. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5493 Differential Revision: D15928992 Pulled By: riversand963 fbshipit-source-id: 699feac034a217d57280aa3fb50f5aba06adf317	2019-06-20 22:16:58 -07:00
haoyuhuang	705b8eecb4	Add more callers for table reader. (#5454 ) Summary: This PR adds more callers for table readers. This information is only used for block cache analysis so that we can know which caller accesses a block. 1. It renames the BlockCacheLookupCaller to TableReaderCaller as passing the caller from upstream requires changes to table_reader.h and TableReaderCaller is a more appropriate name. 2. It adds more table reader callers in table/table_reader_caller.h, e.g., kCompactionRefill, kExternalSSTIngestion, and kBuildTable. This PR is long as it requires modification of interfaces in table_reader.h, e.g., NewIterator. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5454 Test Plan: make clean && COMPILE_WITH_ASAN=1 make check -j32. Differential Revision: D15819451 Pulled By: HaoyuHuang fbshipit-source-id: b6caa704c8fb96ddd15b9a934b7e7ea87f88092d	2019-06-20 14:31:48 -07:00
haoyuhuang	2e8ad03ab3	Add more stats in the block cache trace analyzer (#5482 ) Summary: This PR adds more stats in the block cache trace analyzer. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5482 Differential Revision: D15883553 Pulled By: HaoyuHuang fbshipit-source-id: 6d440e4f657af75690420102d532d0ee1ed4e9cf	2019-06-18 18:38:42 -07:00
Huisheng Liu	92f631da33	replace sprintf with its safe version snprintf (#5475 ) Summary: sprintf is unsafe and has buffer overrun risk. Replace it with the safer version snprintf where buffer size is supplied to avoid overrun. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5475 Differential Revision: D15879481 Pulled By: sagar0 fbshipit-source-id: 7ae1958ffc9727fa50261dfbb98ddd74e70a72d8	2019-06-18 16:42:26 -07:00
haoyuhuang	bcfc53b436	Block cache tracing: Fix minor bugs with downsampling and some benchmark results. (#5473 ) Summary: As the code changes for block cache tracing are almost complete, I did a benchmark to compare the performance when block cache tracing is enabled/disabled. With 1% downsampling ratio, the performance overhead of block cache tracing is negligible. When we trace all block accesses, the throughput drops by 6 folds with 16 threads issuing random reads and all reads are served in block cache. Setup: RocksDB: version 6.2 Date: Mon Jun 17 17:11:13 2019 CPU: 24 * Intel Core Processor (Skylake) CPUCache: 16384 KB Keys: 20 bytes each Values: 100 bytes each (100 bytes after compression) Entries: 10000000 Prefix: 20 bytes Keys per prefix: 0 RawSize: 1144.4 MB (estimated) FileSize: 1144.4 MB (estimated) Write rate: 0 bytes/second Read rate: 0 ops/second Compression: NoCompression Compression sampling rate: 0 Memtablerep: skip_list Perf Level: 1 I ran the readrandom workload for 1 minute. Detailed throughput results: (ops/second) Sample rate 0: no block cache tracing. Sample rate 1: trace all block accesses. Sample rate 100: trace accesses 1% blocks. 1 thread \| \| \| -- \| -- \| -- \| -- Sample rate \| 0 \| 1 \| 100 1 MB block cache size \| 13,094 \| 13,166 \| 13,341 10 GB block cache size \| 202,243 \| 188,677 \| 229,182 16 threads \| \| \| -- \| -- \| -- \| -- Sample rate \| 0 \| 1 \| 100 1 MB block cache size \| 208,761 \| 178,700 \| 201,872 10 GB block cache size \| 2,645,996 \| 426,295 \| 2,587,605 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5473 Differential Revision: D15869479 Pulled By: HaoyuHuang fbshipit-source-id: 7ae802abe84811281a6af8649f489887cd7c4618	2019-06-17 17:59:02 -07:00
haoyuhuang	2d1dd5bce7	Support computing miss ratio curves using sim_cache. (#5449 ) Summary: This PR adds a BlockCacheTraceSimulator that reports the miss ratios given different cache configurations. A cache configuration contains "cache_name,num_shard_bits,cache_capacities". For example, "lru, 1, 1K, 2K, 4M, 4G". When we replay the trace, we also perform lookups and inserts on the simulated caches. In the end, it reports the miss ratio for each tuple <cache_name, num_shard_bits, cache_capacity> in a output file. This PR also adds a main source block_cache_trace_analyzer so that we can run the analyzer in command line. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5449 Test Plan: Added tests for block_cache_trace_analyzer. COMPILE_WITH_ASAN=1 make check -j32. Differential Revision: D15797073 Pulled By: HaoyuHuang fbshipit-source-id: aef0c5c2e7938f3e8b6a10d4a6a50e6928ecf408	2019-06-17 16:41:12 -07:00
Zhongyi Xie	671d15cbdd	Persistent Stats: persist stats history to disk (#5046 ) Summary: This PR continues the work in https://github.com/facebook/rocksdb/pull/4748 and https://github.com/facebook/rocksdb/pull/4535 by adding a new DBOption `persist_stats_to_disk` which instructs RocksDB to persist stats history to RocksDB itself. When statistics is enabled, and both options `stats_persist_period_sec` and `persist_stats_to_disk` are set, RocksDB will periodically write stats to a built-in column family in the following form: key -> (timestamp in microseconds)#(stats name), value -> stats value. The existing API `GetStatsHistory` will detect the current value of `persist_stats_to_disk` and either read from in-memory data structure or from the hidden column family on disk. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5046 Differential Revision: D15863138 Pulled By: miasantreble fbshipit-source-id: bb82abdb3f2ca581aa42531734ac799f113e931b	2019-06-17 15:21:50 -07:00
haoyuhuang	d43b4cd570	Integrate block cache tracing into db_bench (#5459 ) Summary: This PR integrates the block cache tracing into db_bench. It adds three command line arguments. -block_cache_trace_file (Block cache trace file path.) type: string default: "" -block_cache_trace_max_trace_file_size_in_bytes (The maximum block cache trace file size in bytes. Block cache accesses will not be logged if the trace file size exceeds this threshold. Default is 64 GB.) type: int64 default: 68719476736 -block_cache_trace_sampling_frequency (Block cache trace sampling frequency, termed s. It uses spatial downsampling and samples accesses to one out of s blocks.) type: int32 default: 1 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5459 Differential Revision: D15832031 Pulled By: HaoyuHuang fbshipit-source-id: 0ecf2f2686557251fe741a2769b21170777efa3d	2019-06-17 11:08:21 -07:00
haoyuhuang	7a8d7358bb	Integrate block cache tracer in block based table reader. (#5441 ) Summary: This PR integrates the block cache tracer into block based table reader. The tracer will write the block cache accesses using the trace_writer. The tracer is null in this PR so that nothing will be logged. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5441 Differential Revision: D15772029 Pulled By: HaoyuHuang fbshipit-source-id: a64adb92642cd23222e0ba8b10d86bf522b42f9b	2019-06-14 17:40:31 -07:00
haoyuhuang	bb4178066d	Integrate block cache tracer into db_impl (#5433 ) Summary: This PR integrates the block cache tracer class into db_impl.cc. db_impl.cc contains a member variable of AtomicBlockCacheTraceWriter class and passes its reference to the block_based_table_reader. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5433 Differential Revision: D15728016 Pulled By: HaoyuHuang fbshipit-source-id: 23d5659e8c82d556833dcc1a5558aac8c1f7db71	2019-06-13 15:43:10 -07:00
Maysam Yabandeh	f9842869cf	Disable pipeline writes in stress test (#5445 ) Summary: The tsan crash tests are failing with a data race complaint with the pipelined write option. Temporarily disable it until its concurrency issues are fixed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5445 Differential Revision: D15783824 Pulled By: maysamyabandeh fbshipit-source-id: 413a0c3230b86f524fc7eeea2cf8e8375406e65b	2019-06-12 11:12:36 -07:00
haoyuhuang	9bbccda01e	First commit for block cache trace analyzer (#5425 ) Summary: This PR contains the first commit for block cache trace analyzer. It reads a block cache trace file and prints statistics of the traces. We will extend this class to provide more functionalities. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5425 Differential Revision: D15709580 Pulled By: HaoyuHuang fbshipit-source-id: 2f43bd2311f460ab569880819d95eeae217c20bb	2019-06-11 12:22:44 -07:00
Zhongyi Xie	d68f9f4580	simplify include directive involving inttypes (#5402 ) Summary: When using `PRIu64` type of printf specifier, current code base does the following: ``` #ifndef __STDC_FORMAT_MACROS #define __STDC_FORMAT_MACROS #endif #include <inttypes.h> ``` However, this can be simplified to ``` #include <cinttypes> ``` as long as flag `-std=c++11` is used. This should solve issues like https://github.com/facebook/rocksdb/issues/5159 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5402 Differential Revision: D15701195 Pulled By: miasantreble fbshipit-source-id: 6dac0a05f52aadb55e9728038599d3d2e4b59d03	2019-06-06 13:56:07 -07:00
Siying Dong	5851cb7fdb	Move util/trace_replay.* to trace_replay/ (#5376 ) Summary: util/ is meant for lower-level libraries. trace_replay is tightly integrated with DB and sometimes calls DB. Move it out to a separate directory. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5376 Differential Revision: D15550938 Pulled By: siying fbshipit-source-id: f46dce5ceffdc05a73f26379c7bb1b79ebe6c207	2019-06-03 13:25:26 -07:00
Siying Dong	000b9ec217	Move some logging related files to logging/ (#5387 ) Summary: Many logging related source files are under util/. It will be more structured if they are together. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5387 Differential Revision: D15579036 Pulled By: siying fbshipit-source-id: 3850134ed50b8c0bb40a0c8ae1f184fa4081303f	2019-05-31 17:23:59 -07:00
Vijay Nadimpalli	49c5a12dbe	Organizing rocksdb/db directory Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5390 Differential Revision: D15579388 Pulled By: vjnadimpalli fbshipit-source-id: 5bfc95e31554b8ff05b97b76d6534113f527f366	2019-05-31 11:57:01 -07:00
Yanqin Jin	83f7a8eed0	Fix compilation error in LITE mode (#5391 ) Summary: Add macro ROCKSDB_LITE to fix compilation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5391 Differential Revision: D15574522 Pulled By: riversand963 fbshipit-source-id: 95aea83c5d9b2bf98a3ba0ef9167b63c9be2988b	2019-05-31 08:32:22 -07:00
Yanqin Jin	b9f5900658	Fix WAL replay by skipping old write batches (#5170 ) Summary: 1. Fix a bug in WAL replay in which write batches with old sequence numbers are mistakenly inserted into memtables. 2. Add support for benchmarking secondary instance to db_bench_tool. With changes made in this PR, we can start benchmarking secondary instance using two processes. It is also possible to vary the frequency at which the secondary instance tries to catch up with the primary. The info log of the secondary can be found in a directory whose path can be specified with '-secondary_path'. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5170 Differential Revision: D15564608 Pulled By: riversand963 fbshipit-source-id: ce97688ed3d33f69d3a0b9266ebbbbf887aa0ec8	2019-05-30 19:33:33 -07:00
Siying Dong	8843129ece	Move some memory related files from util/ to memory/ (#5382 ) Summary: Move arena, allocator, and memory tools under util to a separate memory/ directory. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5382 Differential Revision: D15564655 Pulled By: siying fbshipit-source-id: 9cd6b5d0d3d52b39606e19221fa154596e5852a5	2019-05-30 17:44:09 -07:00
Vijay Nadimpalli	50e470791d	Organizing rocksdb/table directory by format Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5373 Differential Revision: D15559425 Pulled By: vjnadimpalli fbshipit-source-id: 5d6d6d615582bedd96a4b879bb25d429a6de8b55	2019-05-30 14:51:11 -07:00

950 Commits