rocksdb

Author	SHA1	Message	Date
Levi Tamasi	238a9c3f68	Bump version number to 6.23.2	2021-08-04 13:07:49 -07:00
Yanqin Jin	2511b42c7e	Fix NotifyOnFlushCompleted() for atomic flush (#8585 ) Summary: PR https://github.com/facebook/rocksdb/issues/5908 added `flush_jobs_info_` to `FlushJob` to make sure `OnFlushCompleted()` is called after committing flush results to MANIFEST. However, `flush_jobs_info_` is not updated in atomic flush, causing `NotifyOnFlushCompleted()` to skip `OnFlushCompleted()`. This PR fixes this, in a similar way to https://github.com/facebook/rocksdb/issues/5908 that handles regular flush. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8585 Test Plan: make check Reviewed By: jay-zhuang Differential Revision: D29913720 Pulled By: riversand963 fbshipit-source-id: 4ff023c98372fa2c93188d4a5c8a4e9ffa0f4dda	2021-08-04 11:24:41 -07:00
Levi Tamasi	1920121cef	Mention PR 8605 in HISTORY.md (#8619 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8619 Reviewed By: riversand963 Differential Revision: D30081937 Pulled By: ltamasi fbshipit-source-id: 57505957ae2c22d4b194aa28cb3fd261b3b39919	2021-08-04 11:20:31 -07:00
Levi Tamasi	7c393c3fc6	Fix a race in ColumnFamilyData::UnrefAndTryDelete (#8605 ) Summary: The `ColumnFamilyData::UnrefAndTryDelete` code currently on the trunk unlocks the DB mutex before destroying the `ThreadLocalPtr` holding the per-thread `SuperVersion` pointers when the only remaining reference is the back reference from `super_version_`. The idea behind this was to break the circular dependency between `ColumnFamilyData` and `SuperVersion`: when the penultimate reference goes away, `ColumnFamilyData` can clean up the `SuperVersion`, which can in turn clean up `ColumnFamilyData`. (Assuming there is a `SuperVersion` and it is not referenced by anything else.) However, unlocking the mutex throws a wrench in this plan by making it possible for another thread to jump in and take another reference to the `ColumnFamilyData`, keeping the object alive in a zombie `ThreadLocalPtr`-less state. This can cause issues like https://github.com/facebook/rocksdb/issues/8440 , https://github.com/facebook/rocksdb/issues/8382 , and might also explain the `was_last_ref` assertion failures from the `ColumnFamilySet` destructor we sometimes observe during close in our stress tests. Digging through the archives, this unlocking goes way back to 2014 (or earlier). The original rationale was that `SuperVersionUnrefHandle` used to lock the mutex so it can call `SuperVersion::Cleanup`; however, this logic turned out to be deadlock-prone. https://github.com/facebook/rocksdb/pull/3510 fixed the deadlock but left the unlocking in place. https://github.com/facebook/rocksdb/pull/6147 then introduced the circular dependency and associated cleanup logic described above (in order to enable iterators to keep the `ColumnFamilyData` for dropped column families alive), and moved the unlocking-relocking snippet to its present location in `UnrefAndTryDelete`. Finally, https://github.com/facebook/rocksdb/pull/7749 fixed a memory leak but apparently exacerbated the race by (otherwise correctly) switching to `UnrefAndTryDelete` in `SuperVersion::Cleanup`. The patch simply eliminates the unlocking and relocking, which has been unnecessary ever since https://github.com/facebook/rocksdb/issues/3510 made `SuperVersionUnrefHandle` lock-free. This closes the window during which another thread could increase the reference count, and hopefully fixes the issues above. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8605 Test Plan: Ran `make check` and stress tests locally. Reviewed By: pdillinger Differential Revision: D30051035 Pulled By: ltamasi fbshipit-source-id: 8fe559e4b4ad69fc142579f8bc393ef525918528	2021-08-04 11:00:18 -07:00
Jay Zhuang	7513e9011b	Update HISTORY.md and version.h for 6.23.1	2021-07-22 13:46:33 -07:00
Jay Zhuang	bf117c54ef	Fix an race condition during multiple DB opening (#8574 ) Summary: ObjectLibrary is shared between multiple DB instances, the Register() could have race condition. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8574 Test Plan: pass the failed test Reviewed By: ajkr Differential Revision: D29855096 Pulled By: jay-zhuang fbshipit-source-id: 541eed0bd495d2c963d858d81e7eabf1ba16153c	2021-07-22 13:43:40 -07:00
Jay Zhuang	c04a86a0e9	Update HISTORY.md and version.h 6.23 release (#8552 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8552 Reviewed By: ajkr Differential Revision: D29746828 Pulled By: jay-zhuang fbshipit-source-id: 17d564895ae9cb675d455e73626b9a6717db6279	2021-07-16 17:52:14 -07:00
Merlin Mao	3455ab0e2b	Remove extra double quote in options.h (#8550 ) Summary: There is an extra " in options.h (`"index block""`) Pull Request resolved: https://github.com/facebook/rocksdb/pull/8550 Test Plan: None Reviewed By: jay-zhuang Differential Revision: D29746077 Pulled By: autopear fbshipit-source-id: 2e5117296e5414b7c7440d990926bc1e567a0b4f	2021-07-16 17:05:25 -07:00
sdong	39a07c9651	DB Stress Reopen write failure to skip WAL (#8548 ) Summary: When DB Stress enables write failure in reopen, WAL files are also created with a wrapper writalbe file which buffers write until fsync. However, crash test currently expects all writes to WAL is persistent. This is at odd with the unsynced bytes dropped. To work it around temporarily, we disable WAL write failure for now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8548 Test Plan: Run db_stress. Manual printf to make sure only WAL files are skipped. Reviewed By: jay-zhuang Differential Revision: D29745095 fbshipit-source-id: 1879dd2c01abad7879ca243ee94570ec47c347f3	2021-07-16 16:09:33 -07:00
Jay Zhuang	a379dae4f7	Minor Makefile update to exclude microbench as dependency (#8523 ) Summary: Otherwise the build may report warning about missing `benchmark.h` for some targets, the error won't break the build. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8523 Test Plan: `make blackbox_ubsan_crash_test` on a machine without benchmark lib installed. Reviewed By: pdillinger Differential Revision: D29682478 Pulled By: jay-zhuang fbshipit-source-id: e1261fbcda46bc6bd3cd39b7b03b7f78927d0430	2021-07-16 15:36:51 -07:00
mrambacher	ac37bfded0	Allow CreateFromString to work on complex URIs (#8547 ) Summary: Some URIs for creating instances (ala SecondaryCache) use complex URIs like (cache://name;prop=value). These URIs were treated as name-value properties. With this change, if the URI does not contain an "id=XX" setting, it will be treated as a single string value (and not an ID and map of name-value properties). Pull Request resolved: https://github.com/facebook/rocksdb/pull/8547 Reviewed By: anand1976 Differential Revision: D29741386 Pulled By: mrambacher fbshipit-source-id: 0621f62bec3a6699a7b66c7c0b5634b2856653aa	2021-07-16 15:05:45 -07:00
Peter Dillinger	df5dc73bec	Don't hold DB mutex for block cache entry stat scans (#8538 ) Summary: I previously didn't notice the DB mutex was being held during block cache entry stat scans, probably because I primarily checked for read performance regressions, because they require the block cache and are traditionally latency-sensitive. This change does some refactoring to avoid holding DB mutex and to avoid triggering and waiting for a scan in GetProperty("rocksdb.cfstats"). Some tests have to be updated because now the stats collector is populated in the Cache aggressively on DB startup rather than lazily. (I hope to clean up some of this added complexity in the future.) This change also ensures proper treatment of need_out_of_mutex for non-int DB properties. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8538 Test Plan: Added unit test logic that uses sync points to fail if the DB mutex is held during a scan, covering the various ways that a scan might be triggered. Performance test - the known impact to holding the DB mutex is on TransactionDB, and the easiest way to see the impact is to hack the scan code to almost always miss and take an artificially long time scanning. Here I've injected an unconditional 5s sleep at the call to ApplyToAllEntries. Before (hacked): $ TEST_TMPDIR=/dev/shm ./db_bench.base_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 \| egrep 'db.db.write.micros\|micros/op' randomtransaction : 433.219 micros/op 2308 ops/sec; 0.1 MB/s ( transactions:78999 aborts:0) rocksdb.db.write.micros P50 : 16.135883 P95 : 36.622503 P99 : 66.036115 P100 : 5000614.000000 COUNT : 149677 SUM : 8364856 $ TEST_TMPDIR=/dev/shm ./db_bench.base_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 \| egrep 'db.db.write.micros\|micros/op' randomtransaction : 448.802 micros/op 2228 ops/sec; 0.1 MB/s ( transactions:75999 aborts:0) rocksdb.db.write.micros P50 : 16.629221 P95 : 37.320607 P99 : 72.144341 P100 : 5000871.000000 COUNT : 143995 SUM : 13472323 Notice the 5s P100 write time. After (hacked): $ TEST_TMPDIR=/dev/shm ./db_bench.new_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 \| egrep 'db.db.write.micros\|micros/op' randomtransaction : 303.645 micros/op 3293 ops/sec; 0.1 MB/s ( transactions:98999 aborts:0) rocksdb.db.write.micros P50 : 16.061871 P95 : 33.978834 P99 : 60.018017 P100 : 616315.000000 COUNT : 187619 SUM : 4097407 $ TEST_TMPDIR=/dev/shm ./db_bench.new_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 \| egrep 'db.db.write.micros\|micros/op' randomtransaction : 310.383 micros/op 3221 ops/sec; 0.1 MB/s ( transactions:96999 aborts:0) rocksdb.db.write.micros P50 : 16.270026 P95 : 35.786844 P99 : 64.302878 P100 : 603088.000000 COUNT : 183819 SUM : 4095918 P100 write is now ~0.6s. Not good, but it's the same even if I completely bypass all the scanning code: $ TEST_TMPDIR=/dev/shm ./db_bench.new_skip -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 \| egrep 'db.db.write.micros\|micros/op' randomtransaction : 311.365 micros/op 3211 ops/sec; 0.1 MB/s ( transactions:96999 aborts:0) rocksdb.db.write.micros P50 : 16.274362 P95 : 36.221184 P99 : 68.809783 P100 : 649808.000000 COUNT : 183819 SUM : 4156767 $ TEST_TMPDIR=/dev/shm ./db_bench.new_skip -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 \| egrep 'db.db.write.micros\|micros/op' randomtransaction : 308.395 micros/op 3242 ops/sec; 0.1 MB/s ( transactions:97999 aborts:0) rocksdb.db.write.micros P50 : 16.106222 P95 : 37.202403 P99 : 67.081875 P100 : 598091.000000 COUNT : 185714 SUM : 4098832 No substantial difference. Reviewed By: siying Differential Revision: D29738847 Pulled By: pdillinger fbshipit-source-id: 1c5c155f5a1b62e4fea0fd4eeb515a8b7474027b	2021-07-16 14:13:08 -07:00
sdong	1e5b631e51	db_bench seekrandom with multiDB should only create iterators queried (#7818 ) Summary: Right now, db_bench with seekrandom and multiple DB setup creates iterator for all DBs just to query one of them. It's different from most real workloads. Fix it by only creating iterators that will be queried. Also fix a bug that DBs are not destroyed in multi-DB mode. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7818 Test Plan: Run db_bench with single/multiDB X using/not using tailing iterator with ASAN build, and validate the behavior is expected. Reviewed By: ajkr Differential Revision: D25720226 fbshipit-source-id: c2ff7ff7120e5ba64287a30b057c5d29b2cbe20b	2021-07-16 12:28:10 -07:00
Baptiste Lemaire	0229a88dfe	Crashtest mempurge (#8545 ) Summary: Add `experiemental_allow_mempurge` flag support for `db_stress` and `db_crashtest.py`, with a `false` default value. I succesfully tested locally both `whitebox` and `blackbox` crash tests with `experiemental_allow_mempurge` flag set as true. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8545 Reviewed By: akankshamahajan15 Differential Revision: D29734513 Pulled By: bjlemaire fbshipit-source-id: 24316c0eccf6caf409e95c035f31d822c66714ae	2021-07-16 10:20:22 -07:00
Mark Rambacher	42ba60b3ba	Make EncryptionProvider and BlockCipher into Customizable objects (#8354 ) Summary: Made the EncryptionProvider and BlockCipher classes inherit from Customizable. Added/fixed the CreateFromString method to these classes to create instances from builtin or registered classes. Added tests to verify that instances can be registered and retrieved as appropriate. Added the ability to configure the builtin (CTR, ROT13) classes from configurable properties. Added the appropriate tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8354 Reviewed By: zhichao-cao Differential Revision: D29558949 Pulled By: mrambacher fbshipit-source-id: c20286b32d179777e060f51a58943e9b0cf81d04	2021-07-16 07:58:51 -07:00
Peter Dillinger	aeb913dd01	Standardize on GCC for TSAN conditional compilation (#8543 ) Summary: In https://github.com/facebook/rocksdb/issues/8539 I accidentally only checked for GCC TSAN, which is what I tested locally, while CircleCI and FB CI use clang TSAN. Related: other existing code like in stack_trace.cc only check for clang TSAN. I've now standardized these to the GCC convention in port/lang.h, so now #ifdef __SANITIZE_THREAD__ can check for any TSAN (assuming lang.h include) Pull Request resolved: https://github.com/facebook/rocksdb/pull/8543 Test Plan: Put an assert(false) in slice_test and look for the NOTE about "signal-unsafe call", both GCC and clang. Eventually, CircleCI TSAN in https://github.com/facebook/rocksdb/issues/8538 Reviewed By: zhichao-cao Differential Revision: D29728483 Pulled By: pdillinger fbshipit-source-id: 8a3b8015c2ed48078214c3ee17146a2c3f11c9f7	2021-07-15 23:50:00 -07:00
zaorangyang	b678cb1f86	The formal parameter types of CompressionOptions constructor should b… (#8510 ) Summary: …e consistent with the member variables's Pull Request resolved: https://github.com/facebook/rocksdb/pull/8510 Reviewed By: ajkr Differential Revision: D29654067 Pulled By: mrambacher fbshipit-source-id: 908baaddfb20c266db7c5aca6a87971393d62ee6	2021-07-15 18:06:50 -07:00
Baptiste Lemaire	206845c057	Mempurge support for wal (#8528 ) Summary: In this PR, `mempurge` is made compatible with the Write Ahead Log: in case of recovery, the DB is now capable of recovering the data that was "mempurged" and kept in the `imm()` list of immutable memtables. The twist was to add a uint64_t to the `memtable` struct to store the number of the earliest log file containing entries from the `memtable`. When a `Flush` operation is replaced with a `MemPurge`, the `VersionEdit` (which usually contains the new min log file number to pick up for recovery and the level 0 file path of the newly created SST file) is no longer appended to the manifest log, and every time the `deleteWal` method is called, a check is made on the list of immutable memtables. This PR also includes a unit test that verifies that no data is lost upon Reopening of the database when the mempurge feature is activated. This extensive unit test includes two column families, with valid data contained in the imm() at time of "crash"/reopening (recovery). Pull Request resolved: https://github.com/facebook/rocksdb/pull/8528 Reviewed By: pdillinger Differential Revision: D29701097 Pulled By: bjlemaire fbshipit-source-id: 072a900fb6ccc1edcf5eef6caf88f3060238edf9	2021-07-15 17:49:13 -07:00
longlijian	4e4ec16957	Replace the namespace "rocksdb" to "ROCKSDB_NAMESPACE" (#8531 ) Summary: For more detail can reference the https://github.com/facebook/rocksdb/issues/6433 (https://github.com/facebook/rocksdb/pull/6433) Pull Request resolved: https://github.com/facebook/rocksdb/pull/8531 Reviewed By: siying Differential Revision: D29717057 Pulled By: ajkr fbshipit-source-id: 3ccad9501e5612590e54a7cf8c447118f323c7f4	2021-07-15 17:23:39 -07:00
Peter Dillinger	5ad3227650	Work around falsely reported data race on LRUHandle::flags (#8539 ) Summary: Some bits are mutated and read while holding a lock, other immutable bits (esp. secondary cache compatibility) can be read by arbitrary threads without holding a lock. AFAIK, this doesn't cause an issue on any architecture we care about, because you will get some legitimate version of the value that includes the initialization, as long as synchronization guarantees the initialization happens before the read. I've only seen this in https://github.com/facebook/rocksdb/issues/8538 so far, but it should be fixed regardless. Otherwise, we'll surely get these false reports again some time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8539 Test Plan: some local TSAN test runs and in CircleCI Reviewed By: zhichao-cao Differential Revision: D29720262 Pulled By: pdillinger fbshipit-source-id: 365fd7e565577c648815161f71b339bcb5ce12d5	2021-07-15 16:09:18 -07:00
Jay Zhuang	31193a73a4	Add missing steps for cmake build (#8524 ) Summary: Some cmake and test configuration are set in pre-steps enviroment variables. Add the missing steps. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8524 Test Plan: CI pass Reviewed By: siying Differential Revision: D29682731 Pulled By: jay-zhuang fbshipit-source-id: afda1acf6a7b76989db450442b0b27f387388b9d	2021-07-15 13:37:49 -07:00
longlijian	803a40d412	Delete legacy code not used any more. (#8508 ) Summary: The removed function in this PR, just only have declared and dose not have any reference used. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8508 Reviewed By: mrambacher Differential Revision: D29649033 Pulled By: jay-zhuang fbshipit-source-id: df98143b73d6c184a2a60c9f7ea2548a065ee35d	2021-07-14 16:04:56 -07:00
hongrubb	870033291a	Fix Get() return status when block cache is disabled (#8485 ) Summary: This PR is for https://github.com/facebook/rocksdb/issues/8453 We need to update `s = biter.status();` when `biter.status().IsIncomplete()` is true. By doing this, can fix the problem in issue. Besides, we still need to update `db_statistics` in `get_context.ReportCounters()` before return back. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8485 Reviewed By: jay-zhuang Differential Revision: D29604835 Pulled By: ajkr fbshipit-source-id: c7f2f1cd058223ce1b507ec05d57cf264b9c9710	2021-07-13 18:13:24 -07:00
sherriiiliu	7b9ecd4067	fix several MSVC build errors (#8519 ) Summary: Fixed a few MSVC (VCToolsVersion=14.0) build errors and warnings * `DEFINE_string` is a macro and VC compiler complains that it cannot put [ifdef-inside-define](https://stackoverflow.com/questions/5586429/ifdef-inside-define) * `sleep()` is not a recognizable function. Use `FLAGS_env->SleepForMicroseconds` instead * Define precise type in comparison to avoid mismatch warning Pull Request resolved: https://github.com/facebook/rocksdb/pull/8519 Reviewed By: jay-zhuang Differential Revision: D29683086 fbshipit-source-id: 8c80941472089f8daba84ae29597e75e603850e4	2021-07-13 12:40:43 -07:00
dependabot[bot]	e8e911a11c	Bump addressable from 2.7.0 to 2.8.0 in /docs (#8515 ) Summary: Bumps [addressable](https://github.com/sporkmonger/addressable) from 2.7.0 to 2.8.0. <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/sporkmonger/addressable/blob/main/CHANGELOG.md">addressable's changelog</a>.</em></p> <blockquote> <h1>Addressable 2.8.0</h1> <ul> <li>fixes ReDoS vulnerability in Addressable::Template#match</li> <li>no longer replaces <code>+</code> with spaces in queries for non-http(s) schemes</li> <li>fixed encoding ipv6 literals</li> <li>the <code>:compacted</code> flag for <code>normalized_query</code> now dedupes parameters</li> <li>fix broken <code>escape_component</code> alias</li> <li>dropping support for Ruby 2.0 and 2.1</li> <li>adding Ruby 3.0 compatibility for development tasks</li> <li>drop support for <code>rack-mount</code> and remove Addressable::Template#generate</li> <li>performance improvements</li> <li>switch CI/CD to GitHub Actions</li> </ul> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="`6469a232c0`"><code>6469a23</code></a> Updating gemspec again</li> <li><a href="`24336385de`"><code>2433638</code></a> Merge branch 'main' of github.com:sporkmonger/addressable into main</li> <li><a href="`e9c76b8897`"><code>e9c76b8</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/sporkmonger/addressable/issues/378">https://github.com/facebook/rocksdb/issues/378</a> from ashmaroli/flat-map</li> <li><a href="`56c5cf7ece`"><code>56c5cf7</code></a> Update the gemspec</li> <li><a href="`c1fed1ca0a`"><code>c1fed1c</code></a> Require a non-vulnerable rake</li> <li><a href="`0d8a3127e3`"><code>0d8a312</code></a> Adding note about ReDoS vulnerability</li> <li><a href="`89c76130ce`"><code>89c7613</code></a> Merge branch 'template-regexp' into main</li> <li><a href="`cf8884f815`"><code>cf8884f</code></a> Note about alias fix</li> <li><a href="`bb03f7112e`"><code>bb03f71</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/sporkmonger/addressable/issues/371">https://github.com/facebook/rocksdb/issues/371</a> from charleystran/add_missing_encode_component_doc_entry</li> <li><a href="`6d1d8094a6`"><code>6d1d809</code></a> Adding note about :compacted normalization</li> <li>Additional commits viewable in <a href="https://github.com/sporkmonger/addressable/compare/addressable-2.7.0...addressable-2.8.0">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=addressable&package-manager=bundler&previous-version=2.7.0&new-version=2.8.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `dependabot rebase` will rebase this PR - `dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `dependabot merge` will merge this PR after your CI passes on it - `dependabot squash and merge` will squash and merge this PR after your CI passes on it - `dependabot cancel merge` will cancel a previously requested merge and block automerging - `dependabot reopen` will reopen this PR if it is closed - `dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) - `dependabot use these labels` will set the current labels as the default for future PRs for this repo and language - `dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language - `dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language - `dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/facebook/rocksdb/network/alerts). </details> Pull Request resolved: https://github.com/facebook/rocksdb/pull/8515 Reviewed By: jay-zhuang Differential Revision: D29668988 Pulled By: ajkr fbshipit-source-id: c4b7abd4a879a7b562cb8ba745088dba6644f503	2021-07-12 17:06:07 -07:00
Peter Dillinger	a53d6d25e0	Improve support for valgrind error on reachable (#8503 ) Summary: MyRocks apparently uses valgrind to check for unreachable unfreed data, which is stricter than our valgrind checks. Internal ref: D29257815 This patch adds valgrind support to STATIC_AVOID_DESTRUCTION so that it's not reported with those stricter checks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8503 Test Plan: make valgrind_test Also, with modified VALGRIND_OPTS (see Makefile), more kinds of failures seen before than after this commit. Reviewed By: ajkr, yizhang82 Differential Revision: D29597784 Pulled By: pdillinger fbshipit-source-id: 360de157a176aec4d1be99ca20d160ecd47c0873	2021-07-12 17:00:27 -07:00
mrambacher	da90e23998	Improvements to benchmark.sh script (#8346 ) Summary: 1. Fix printing of stats when there are no writes (wamp=0). Previously had a div0 error 2. Added multireadrandom command as a valid target 3. Added ability to pass additional command line options to db_bench. Now can say things like benchmark.sh readrandom --mmap_read and the option will be passed to db_bench. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8346 Reviewed By: zhichao-cao Differential Revision: D29500436 Pulled By: mrambacher fbshipit-source-id: 54e90708aae9133be3a903e35efdf8f8abbd86fa	2021-07-12 12:18:17 -07:00
bjlemaire	955b80e84f	Add WARN/INFO for mempurge output status. (#8514 ) Summary: The MemPurge output status can either be an Abort if the mempurge is aborted due to the new_mem memtable reaching more than the target capacity (currently 60%), or for other reasons. As a result, in the log, we want to differentiate between an abort status, which in this PR only leads to a ROCKS_LOG_INFO, and any other status, which in this PR leads to a ROCKS_LOG_WARN. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8514 Reviewed By: pdillinger Differential Revision: D29662446 Pulled By: bjlemaire fbshipit-source-id: c9bec8e238ebc7ecb14fbbddf580e6887e281c16	2021-07-12 10:42:14 -07:00
Dmitry Vorobev	0b75b22321	Implement missing Handler methods in ColumnFamilyCollector. (#8456 ) Summary: When db is open as secondary, there are basically 2 step process: 1) Collect column families from wal log 2) Apply changes to Memtable In case primary db is TransactionDB instance, wal log will contain some additional data, like noop, etc. ColumnFamilyCollector doesn't implement methods to handle these, so it fails to open a wal log written by TransactionDB. (Everything works fine with standard DB::Open). Memtable recovery process knows how to handle such wal logs, so only missing piece seems to be ColumnFamilyCollector. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8456 Reviewed By: ajkr Differential Revision: D29455945 Pulled By: mrambacher fbshipit-source-id: 5b29560fcbc008e17e95d0dc4b07558f3d63e26f	2021-07-12 09:23:45 -07:00
Myth	bbdc4f2e9a	Fix a minor issue in checkpoint test case (#8483 ) Summary: A very simple change :) Pull Request resolved: https://github.com/facebook/rocksdb/pull/8483 Reviewed By: ajkr Differential Revision: D29558904 Pulled By: mrambacher fbshipit-source-id: bbe68c20c861103726cb6231ca3fb8fbe1e5a546	2021-07-12 09:09:09 -07:00
mrambacher	c8665611bc	Make FlushBlockPolicyFactory into a Customizable class (#8432 ) Summary: Add ability to treat FlushBlockPolicyFactory as a Customizable. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8432 Reviewed By: zhichao-cao Differential Revision: D29558941 Pulled By: mrambacher fbshipit-source-id: 4a791af941ea4a65fc2f1fdfb1d7a95f42ca6774	2021-07-12 09:04:59 -07:00
Adam Retter	5afd1e309c	Correct CVS -> CSV typo (#8513 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8513 Reviewed By: jay-zhuang Differential Revision: D29654066 Pulled By: mrambacher fbshipit-source-id: b8f492fe21edd37fe1f1c5a4a0e9153f58bbf3e2	2021-07-12 05:05:16 -07:00
anand76	d1b70b05a6	Avoid passing existing BG error to WriteStatusCheck (#8511 ) Summary: In ```DBImpl::WriteImpl()```, we call ```PreprocessWrite()``` which, among other things, checks the BG error and returns it set. This return status is later on passed to ```WriteStatusCheck()```, which calls ```SetBGError()```. This results in a spurious call, and info logs, on every user write request. We should avoid passing the ```PreprocessWrite()``` return status to ```WriteStatusCheck()```, as the former would have called ```SetBGError()``` already if it encountered any new errors, such as error when creating a new WAL file. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8511 Test Plan: Run existing tests Reviewed By: zhichao-cao Differential Revision: D29639917 Pulled By: anand1976 fbshipit-source-id: 19234163969e1645dbeb273712aaf5cd9ea2b182	2021-07-11 22:37:52 -07:00
Baptiste Lemaire	837705ad80	Make mempurge a background process (equivalent to in-memory compaction). (#8505 ) Summary: In https://github.com/facebook/rocksdb/issues/8454, I introduced a new process baptized `MemPurge` (memtable garbage collection). This new PR is built upon this past mempurge prototype. In this PR, I made the `mempurge` process a background task, which provides superior performance since the mempurge process does not cling on the db_mutex anymore, and addresses severe restrictions from the past iteration (including a scenario where the past mempurge was failling, when a memtable was mempurged but was still referred to by an iterator/snapshot/...). Now the mempurge process ressembles an in-memory compaction process: the stack of immutable memtables is filtered out, and the useful payload is used to populate an output memtable. If the output memtable is filled at more than 60% capacity (arbitrary heuristic) the mempurge process is aborted and a regular flush process takes place, else the output memtable is kept in the immutable memtable stack. Note that adding this output memtable to the `imm()` memtable stack does not trigger another flush process, so that the flush thread can go to sleep at the end of a successful mempurge. MemPurge is activated by making the `experimental_allow_mempurge` flag `true`. When activated, the `MemPurge` process will always happen when the flush reason is `kWriteBufferFull`. The 3 unit tests confirm that this process supports `Put`, `Get`, `Delete`, `DeleteRange` operators and is compatible with `Iterators` and `CompactionFilters`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8505 Reviewed By: pdillinger Differential Revision: D29619283 Pulled By: bjlemaire fbshipit-source-id: 8a99bee76b63a8211bff1a00e0ae32360aaece95	2021-07-09 17:23:59 -07:00
qieqieplus	bb485e986a	Add ribbon filter to C API (#8486 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8486 Reviewed By: jay-zhuang Differential Revision: D29625501 Pulled By: pdillinger fbshipit-source-id: e6e2a455ae62a71f3a202278a751b9bba17ad03c	2021-07-09 16:22:48 -07:00
Jay Zhuang	5dd18a8d8e	Add micro-benchmark support (#8493 ) Summary: Add google benchmark for microbench. Add ribbon_bench for benchmark ribbon filter vs. other filters. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8493 Test Plan: added test to CI To run the benchmark on devhost: Install benchmark: `$ sudo dnf install google-benchmark-devel` Build and run: `$ ROCKSDB_NO_FBCODE=1 DEBUG_LEVEL=0 make microbench` or with cmake: `$ mkdir build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DWITH_BENCHMARK=1 && make microbench` Reviewed By: pdillinger Differential Revision: D29589649 Pulled By: jay-zhuang fbshipit-source-id: 8fed13b562bef4472f161ecacec1ab6b18911dff	2021-07-08 18:22:45 -07:00
sdong	f127d459ad	Add comments to options.bottommost_compression (#8415 ) Summary: Add comments to options.bottommost_compression for options.num_levels=1 Pull Request resolved: https://github.com/facebook/rocksdb/pull/8415 Reviewed By: ajkr Differential Revision: D29181997 fbshipit-source-id: 5f0f49470f75d796320ecb24d5dc4ef4eb6fbe0f	2021-07-08 10:50:59 -07:00
longlijian	ac3f3f3719	Eliminate compiler complaining, which the return type of the function… (#8498 ) Summary: … should be uint64_t. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8498 Reviewed By: jay-zhuang Differential Revision: D29605064 Pulled By: ajkr fbshipit-source-id: e431448ac9d8a37ae83679c4cc5732e29fe49de4	2021-07-08 10:09:05 -07:00
sdong	b1a53db327	FaultInjectionTestFS::DeleteFilesCreatedAfterLastDirSync() to recover… (#8501 ) Summary: … small overwritten files. If a file is overwritten with renamed and the parent path is not synced, FaultInjectionTestFS::DeleteFilesCreatedAfterLastDirSync() will delete the file. However, RocksDB relies on file renaming to be atomic no matter whether the parent directory is synced or not, and the current behavior breaks the assumption and caused some false positive: https://github.com/facebook/rocksdb/pull/8489 Since the atomic renaming is used in CURRENT files, to fix the problem, in FaultInjectionTestFS::DeleteFilesCreatedAfterLastDirSync(), we recover the state of overwritten file if the file is small. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8501 Test Plan: Run stress test for a while and see it doesn't break. Reviewed By: anand1976 Differential Revision: D29594384 fbshipit-source-id: 589b5c2f0a9d2aca53752d7bdb0231efa5b3ae92	2021-07-07 16:23:23 -07:00
Andrew Kryczka	ed8eb436db	Move slow valgrind tests behind -DROCKSDB_FULL_VALGRIND_RUN (#8475 ) Summary: Various tests had disabled valgrind due to it slowing down and timing out (as is the case right now) the CI runs. Where a test was disabled with no comment, I assumed slowness was the cause. For these tests that were slow under valgrind, as well as the ones identified in https://github.com/facebook/rocksdb/issues/8352, this PR moves them behind the compiler flag `-DROCKSDB_FULL_VALGRIND_RUN`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8475 Test Plan: running `make full_valgrind_test`, `make valgrind_test`, `make check`; will verify they appear working correctly Reviewed By: jay-zhuang Differential Revision: D29504843 Pulled By: ajkr fbshipit-source-id: 2aac90749cfbd30d5ce11cb29a07a1b9314eeea7	2021-07-07 11:14:05 -07:00
Baptiste Lemaire	714ce5041d	Fix clang_analyzer failure (#8492 ) Summary: Previously, the following command: ```USE_CLANG=1 TEST_TMPDIR=/dev/shm/rocksdb OPT=-g make -j$(nproc) analyze``` was raising an error/warning the new_mem could potentially be a `nullptr`. This error appeared due to code changes from https://github.com/facebook/rocksdb/issues/8454, including an if-statement containing "`... && new_mem != nullptr && ...`", which made the analyzer believe that past this `if`-statement, a `new_mem==nullptr` was a possible scenario. This code patch simply introduces `assert`s and removes this condition in the `if`-statement. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8492 Reviewed By: jay-zhuang Differential Revision: D29571275 Pulled By: bjlemaire fbshipit-source-id: 75d72246b70ebbbae7dea11ccb5778686d8bcbea	2021-07-06 18:48:56 -07:00
anand76	df4197ca6e	Bypass buffer in TestFSWritableFile if direct IO is enabled (#8490 ) Summary: ```TestFSWritableFile``` buffers data in ```Append``` in order to simulate unsynced data loss on crash. This is only required for buffered IO and should be disabled for direct IO. Otherwise, it causes crash tests to assert on the buffer address alignment - ```db_stress: env/io_posix.cc:1194: virtual rocksdb::IOStatus rocksdb::PosixWritableFile::Append(const rocksdb::Slice&, const rocksdb::IOOptions&, rocksdb::IODebugContext*): Assertion `IsSectorAligned(data.data(), GetRequiredBufferAlignment())' failed.```. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8490 Reviewed By: zhichao-cao Differential Revision: D29565080 Pulled By: anand1976 fbshipit-source-id: 682831fd66ed3b9597caa74fc453e22dfaf9b973	2021-07-06 16:46:16 -07:00
anand76	fcd8088333	Temporarily disable file deletion after open failure in db_stress (#8489 ) Summary: Write and metadata error injection during DB open was enabled in https://github.com/facebook/rocksdb/issues/8474. This causes crash tests to fail very frequently due to another fault injection feature that deletes files created after the last dir sync during DB open. In real life, a similar failure would happen if the FS returns error on the CURRENT file rename, but the rename actually succeeded and got partially persisted (dir entry for the old CURRENT file got removed, but the entry for the new one is not persisted). Temporarily disable the fault injection feature until we figure out the likelihood of this bug happening and the proper way to fix it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8489 Test Plan: Stress test can open the DB successfully Reviewed By: siying Differential Revision: D29564516 Pulled By: anand1976 fbshipit-source-id: ffd1650715ea3c5bf7131936b0ca6fcf66f4e14e	2021-07-06 14:16:57 -07:00
sdong	f33611d5e9	Stress test to inject read failures in DB reopen (#8476 ) Summary: Inject read failures in DB reopen, just as what we do for metadata writes and writes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8476 Test Plan: Some manual tests and make sure failures are triggered. Reviewed By: anand1976 Differential Revision: D29507283 fbshipit-source-id: d04da0163973447041038bd87701686a417c4e0c	2021-07-06 11:05:27 -07:00
Levi Tamasi	1ae026c400	Partially revert the "apply subrange of table property collectors" change (#8465 ) Summary: We ended up using a different approach for tracking the amount of garbage in blob files (see e.g. https://github.com/facebook/rocksdb/pull/8450), so the ability to apply only a range of table property collectors is now unnecessary. The patch reverts this part of https://github.com/facebook/rocksdb/pull/8298 while keeping the cleanup done in that PR. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8465 Test Plan: `make check` Reviewed By: jay-zhuang Differential Revision: D29399921 Pulled By: ltamasi fbshipit-source-id: af64816c357d0829b9d7ba8ca1477038138f6f0a	2021-07-06 10:14:32 -07:00
mrambacher	570248aeff	Make SecondaryCache Customizable (#8480 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8480 Reviewed By: zhichao-cao Differential Revision: D29528740 Pulled By: mrambacher fbshipit-source-id: fd0f70d15f66611c8498257a9973f7e98ca13839	2021-07-06 09:18:08 -07:00
Baptiste Lemaire	9dc887ece0	Memtable "MemPurge" prototype (#8454 ) Summary: Implement an experimental feature called "MemPurge", which consists in purging "garbage" bytes out of a memtable and reuse the memtable struct instead of making it immutable and eventually flushing its content to storage. The prototype is by default deactivated and is not intended for use. It is intended for correctness and validation testing. At the moment, the "MemPurge" feature can be switched on by using the `options.experimental_allow_mempurge` flag. For this early stage, when the allow_mempurge flag is set to `true`, all the flush operations will be rerouted to perform a MemPurge. This is a temporary design decision that will give us the time to explore meaningful heuristics to use MemPurge at the right time for relevant workloads . Moreover, the current MemPurge operation only supports `Puts`, `Deletes`, `DeleteRange` operations, and handles `Iterators` as well as `CompactionFilter`s that are invoked at flush time . Three unit tests are added to `db_flush_test.cc` to test if MemPurge works correctly (and checks that the previously mentioned operations are fully supported thoroughly tested). One noticeable design decision is the timing of the MemPurge operation in the memtable workflow: for this prototype, the mempurge happens when the memtable is switched (and usually made immutable). This is an inefficient process because it implies that the entirety of the MemPurge operation happens while holding the db_mutex. Future commits will make the MemPurge operation a background task (akin to the regular flush operation) and aim at drastically enhancing the performance of this operation. The MemPurge is also not fully "WAL-compatible" yet, but when the WAL is full, or when the regular MemPurge operation fails (or when the purged memtable still needs to be flushed), a regular flush operation takes place. Later commits will also correct these behaviors. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8454 Reviewed By: anand1976 Differential Revision: D29433971 Pulled By: bjlemaire fbshipit-source-id: 6af48213554e35048a7e03816955100a80a26dc5	2021-07-02 05:23:02 -07:00
Akanksha Mahajan	c76778e2bd	Call OnCompactionCompleted API in case of DisableManualCompaction (#8469 ) Summary: Call OnCompactionCompleted API in case of DisableManualCompaction() with updated Status::Incomplete Pull Request resolved: https://github.com/facebook/rocksdb/pull/8469 Reviewed By: ajkr Differential Revision: D29475517 Pulled By: akankshamahajan15 fbshipit-source-id: a1726c5e6ee18c0b5097ea04f5e6975fbe108055	2021-07-01 19:18:55 -07:00
Peter (Stig) Edwards	b20737709f	Add -report_open_timing to db_bench (#8464 ) Summary: Hello and thanks for RocksDB, This PR adds support for ```-report_open_timing true``` to ```db_bench```. It can be useful when tuning RocksDB on filesystem/env with high latencies for file level operations (create/delete/rename...) seen during ```((Optimistic)Transaction)DB::Open```. Some examples: ``` > db_bench -benchmarks updaterandom -num 1 -db /dev/shm/db_bench > db_bench -benchmarks updaterandom -num 0 -db /dev/shm/db_bench -use_existing_db true -report_open_timing true -readonly true 2>&1 \| grep OpenDb OpenDb: 3.90133 milliseconds > db_bench -benchmarks updaterandom -num 0 -db /dev/shm/db_bench -use_existing_db true -report_open_timing true -use_secondary_db true 2>&1 \| grep OpenDb OpenDb: 3.33414 milliseconds > db_bench -benchmarks updaterandom -num 0 -db /dev/shm/db_bench -use_existing_db true -report_open_timing true 2>&1 \| grep -A1 OpenDb OpenDb: 6.05423 milliseconds > db_bench -benchmarks updaterandom -num 1 > db_bench -benchmarks updaterandom -num 0 -use_existing_db true -report_open_timing true -readonly true 2>&1 \| grep OpenDb OpenDb: 4.06859 milliseconds > db_bench -benchmarks updaterandom -num 0 -use_existing_db true -report_open_timing true -use_secondary_db true 2>&1 \| grep OpenDb OpenDb: 2.85794 milliseconds > db_bench -benchmarks updaterandom -num 0 -use_existing_db true -report_open_timing true 2>&1 \| grep OpenDb OpenDb: 6.46376 milliseconds > db_bench -benchmarks updaterandom -num 1 -db /clustered_fs/db_bench > db_bench -benchmarks updaterandom -num 0 -db /clustered_fs/db_bench -use_existing_db true -report_open_timing true -readonly true 2>&1 \| grep OpenDb OpenDb: 3.79805 milliseconds > db_bench -benchmarks updaterandom -num 0 -db /clustered_fs/db_bench -use_existing_db true -report_open_timing true -use_secondary_db true 2>&1 \| grep OpenDb OpenDb: 3.00174 milliseconds > db_bench -benchmarks updaterandom -num 0 -db /clustered_fs/db_bench -use_existing_db true -report_open_timing true 2>&1 \| grep OpenDb OpenDb: 24.8732 milliseconds ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/8464 Reviewed By: hx235 Differential Revision: D29398096 Pulled By: zhichao-cao fbshipit-source-id: 8f05dc3284f084612a3f30234e39e1c37548f50c	2021-07-01 18:42:19 -07:00
Zhichao Cao	a95a776d75	Inject fatal write failures to db_stress when DB is running (#8479 ) Summary: add the injest_error_severity to control if it is a retryable IO Error or a fatal or unrecoverable error. Use a flag to indicate, if fatal error comes, the flag is set and db is stopped (but not corrupted). Pull Request resolved: https://github.com/facebook/rocksdb/pull/8479 Test Plan: run ./db_stress --reopen=0 --read_fault_one_in=1000 --write_fault_one_in=5 --disable_wal=true --write_buffer_size=3000000 -writepercent=5 -readpercent=50 --injest_error_severity=2 --column_families=1, make check Reviewed By: anand1976 Differential Revision: D29524271 Pulled By: zhichao-cao fbshipit-source-id: 1aa9fb9b5655b0adba6f5ad12005ca8c074c795b	2021-07-01 14:16:47 -07:00

1 2 3 4 5 ...

10152 Commits