rocksdb

Author	SHA1	Message	Date
Yanqin Jin	ab202e8d72	Add a new stats level to exclude tickers (#7329 ) Summary: Currently, application may pass a statistics object to db but later wants to reduce stats tracking overhead by setting stats level to kExceptHistogramOrTimers (the current lowest level). Tickers will still be incremented, causing up to 1% CPU. We can add a new lowest stats level `kExceptTickers` to disable ticker incrementing as well, thus reducing CPU cycles spent on tickers. Test Plan (devserver): ``` make check make clean DEBUG_LEVEL=0 make db_bench ./db_bench -perf_level=1 -stats_level=0 -statistics -benchmarks=fillseq,readrandom -duration=120 ``` Measure CPU util (%) before and after change: CPU util by rocksdb::RecordTick: 1.1 vs (<0.1) Pull Request resolved: https://github.com/facebook/rocksdb/pull/7329 Reviewed By: pdillinger Differential Revision: D23434014 Pulled By: riversand963 fbshipit-source-id: 72ff0f02a192ac476d4b0044b9f37fd4a22ff0d4	2020-09-04 23:25:03 -07:00
Cheng Chang	3f9b75604d	Fix wrong level args (#7346 ) Summary: The level args should be output level instead of input levels. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7346 Test Plan: make check Reviewed By: ajkr Differential Revision: D23506373 Pulled By: cheng-chang fbshipit-source-id: b2f701d44c13581c5c10c4dbebded4fcd354d641	2020-09-03 23:17:37 -07:00
Eduardo Barreto Alexandre	5b1ccdc191	Expose rocksdb_open_column_families_with_ttl C function (#7314 ) Summary: This PR creates `rocksdb_open_column_families_with_ttl` which allows C API users to open a DBWithTTL with column families. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7314 Reviewed By: cheng-chang Differential Revision: D23430287 Pulled By: ajkr fbshipit-source-id: 307aa21d170d1402653263a91f6f832ef76afba0	2020-09-03 14:39:58 -07:00
Hiep	d0c1a01c1b	Avoid converting MERGES to PUTS when allow_ingest_behind is true (#7166 ) Summary: - Closes https://github.com/facebook/rocksdb/issues/6490 - Currently MERGEs are converted to PUTs at the bottom level or when compaction has reached the beginning of the key; this can wrongly cover a future PUT base case. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7166 Test Plan: - Automated: `make all check` - Manual: With `allow_ingest_behind = true`, add Merge operations to a key then run compaction. Then ingest external files to make sure the base case is properly compacted with existing Merges. Reviewed By: cheng-chang Differential Revision: D23325425 Pulled By: ajkr fbshipit-source-id: 3eb415eb7b381b5453e45245393566153b1abb68	2020-09-03 14:39:58 -07:00
Andrew Kryczka	177f8bd063	Bound L0->Lbase fanout in dynamic leveled compaction (#7325 ) Summary: L0 score is based on size target and number of files. The size target used is `max_bytes_for_level_base`. However, the base level's size can dynamically expand in write burst mode. In fact, it can expand so much that L0->Lbase becomes the highest fanout in target sizes. This doesn't make sense from an efficiency perspective, so this PR bounds the L0->Lbase fanout to the smoothed level multiplier. The L0 scoring based on file count remains unchanged. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7325 Test Plan: contrived benchmark that exhibits the problem: ``` $ TEST_TMPDIR=/data/users/andrewkr/ ./db_bench -benchmarks=filluniquerandom,readrandom -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 -level0_file_num_compaction_trigger=4 -level_compaction_dynamic_level_bytes=true -compression_type=none -max_background_jobs=12 -rate_limiter_bytes_per_sec=104857600 -benchmark_write_rate_limit=10485760 -num=100000000 ``` Results: - "Burst W-Amp" is the write-amp near the end of the fillrandom benchmark - "Total W-Amp" is the write-amp after readrandom has run a while and all levels no longer need compaction Branch \| Burst W-Amp \| Total W-Amp \| fillrandom (MB/s) -- \| -- \| -- \| -- master \| 20.2 \| 21.5 \| 4.7 dynamic-l0-score \| 12.6 \| 14.1 \| 7.2 Reviewed By: siying Differential Revision: D23412935 Pulled By: ajkr fbshipit-source-id: f91f2067188e432dd39deab02f1c56f195057a0e	2020-09-01 19:34:01 -07:00
Levi Tamasi	792d2f906e	Log info about generated blob files in BlobFileBuilder (#7324 ) Summary: The patch adds a log message to `BlobFileBuilder` that is logged upon generating a blob file, similarly to how we log the generation of table files during flush and compaction. The log message contains the column family name, job id, blob file number, and the number and total size of blobs in the new file. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7324 Test Plan: Ran `make check` and checked the actual log messages using a custom `db_bench`. Reviewed By: riversand963 Differential Revision: D23402229 Pulled By: ltamasi fbshipit-source-id: ca42beb4db284b783d1eb2651f321032a45d0c5f	2020-08-31 13:24:12 -07:00
Akanksha Mahajan	963314ffd6	Add unit test for max_write_buffer_size_to_maintain (#7311 ) Summary: Add a unit test case to check memory usage when max_write_buffer_size_to_maintain is set if flushed immutable memtables are trimmed timely or not. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7311 Test Plan: Compared the results with before bug fix. Reviewed By: ltamasi Differential Revision: D23321702 Pulled By: akankshamahajan15 fbshipit-source-id: da04ee21137d641a07fd499a9e2749eb036fcb1e	2020-08-28 17:38:05 -07:00
Levi Tamasi	5043960623	Add a blob file builder class that can be used in background jobs (#7306 ) Summary: The patch adds a class called `BlobFileBuilder` that can be used to build and cut blob files in background jobs (flushes/compactions). The class enforces a value size threshold (`min_blob_size`; smaller blobs will be inlined in the LSM tree itself), and supports specifying a blob file size limit (`blob_file_size`), as well as compression (`blob_compression_type`) and checksums for blob files. It also keeps track of the generated blob files and their associated `BlobFileAddition` metadata, which can be applied as part of the background job's `VersionEdit`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7306 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D23298817 Pulled By: ltamasi fbshipit-source-id: 38f35d81dab1ba81f15236240612ec173d7f21b5	2020-08-27 11:55:54 -07:00
Akanksha Mahajan	8e0df9050c	Store FSRandomAccessPtr object in RandomAccessFileReader (#7192 ) Summary: Replace FSRandomAccessFile pointer with FSRandomAccessFilePtr object in RandomAccessFileReader. This new object wraps FSRandomAccessFile pointer. Objective: If tracing is enabled, FSRandomAccessFile Ptr returns FSRandomAccessFileTracingWrapper pointer that includes all necessary information in IORecord and calls underlying FileSystem and invokes IOTracer to dump that record in a binary file. If tracing is disabled then, underlying FileSystem pointer is returned directly. FSRandomAccessFilePtr wrapper class is added to bypass the FSRandomAccessFileWrapper when tracing is disabled. Test Plan: make check -j64 Pull Request resolved: https://github.com/facebook/rocksdb/pull/7192 Reviewed By: anand1976 Differential Revision: D23356867 Pulled By: akankshamahajan15 fbshipit-source-id: 48f31168166a17a7444b40be44a9a9d4a5c7182c	2020-08-27 11:21:52 -07:00
Peter Dillinger	9aad24da55	Real fix for race in backup custom checksum checking (#7309 ) Summary: This is a "real" fix for the issue worked around in https://github.com/facebook/rocksdb/issues/7294. To get DB checksum info for live files, we now read the manifest file that will become part of the checkpoint/backup. This requires a little extra handling in taking a custom checkpoint, including only reading the manifest file up to the size prescribed by the checkpoint. This moves GetFileChecksumsFromManifest from backup code to file_checksum_helper.{h,cc} and removes apparently unnecessary checking related to column families. Updated HISTORY.md and warned potential future users of DB::GetLiveFilesChecksumInfo() Pull Request resolved: https://github.com/facebook/rocksdb/pull/7309 Test Plan: updated unit test, before and after Reviewed By: ajkr Differential Revision: D23311994 Pulled By: pdillinger fbshipit-source-id: 741e30a2dc1830e8208f7648fcc8c5f000d4e2d5	2020-08-26 10:39:20 -07:00
sdong	722814e357	Get() to fail with underlying failures in PartitionIndexReader::CacheDependencies() (#7297 ) Summary: Right now all I/O failures under PartitionIndexReader::CacheDependencies() are swallowed. This doesn't impact correctness, but we've made a decision that any I/O error in the read path should now be returned to users for awareness. Return errors in those cases instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7297 Test Plan: Add a new unit test that injects errors in this code path and checks that Get() fails. Only one I/O path is hit in PartitionIndexReader::CacheDependencies(). Several option changes were attempted but none got other pread paths triggered. Not sure whether other failure cases would even be possible. Would rely on continuous stress test to validate it. Reviewed By: anand1976 Differential Revision: D23257950 fbshipit-source-id: 859dbc92fa239996e1bb378329344d3d54168c03	2020-08-25 19:01:05 -07:00
sdong	cecdd5d2ab	Parameterize DBBasicTest.CompactBetweenSnapshots (#7301 ) Summary: DBBasicTest.CompactBetweenSnapshots can time out on some slow-I/O hosts. Parameterize it so that each single test runs shorter. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7301 Test Plan: Run the test and see, in a hacky way, that different runs use different configurations. Reviewed By: ltamasi Differential Revision: D23277733 fbshipit-source-id: 1f717b4131322d175abf9e211131fe7e9b1ef758	2020-08-25 15:42:11 -07:00
Zhichao Cao	d51f88c9e4	Pass SST file checksum information through OnTableFileCreated (#7108 ) Summary: When SST file is created, application is able to know the file information through OnTableFileCreated callback in LogAndNotifyTableFileCreationFinished. Since file checksum information can be useful for application when the SST file is created, we add file_checksum and file_checksum_func_name information to TableFileCreationInfo, which will be passed through OnTableFileCreated. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7108 Test Plan: make check, listener_test. Reviewed By: ajkr Differential Revision: D22470240 Pulled By: zhichao-cao fbshipit-source-id: 92c20344d9b986eadfe3480f3769bf4add0dbaae	2020-08-25 10:46:11 -07:00
Connor1996	416943bf28	Eliminates a no-op compaction upon snapshot release when disabling auto compactions (#7267 ) Summary: After releasing a snapshot, it checks whether it is suitable to trigger bottom compactions. When disabling auto compactions, it may still schedule compaction when releasing a snapshot. Whereas no compaction job will be actually handled, so the state of LSM is not changed and compaction will be triggered again and again every time releasing a snapshot. Too frequent compactions lead to high CPU usage and high db_mutex lock contention which affects foreground write duration finally. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7267 Test Plan: - make check - manual test Reviewed By: akankshamahajan15 Differential Revision: D23252880 Pulled By: ajkr fbshipit-source-id: 4431e071a35d9912a2a3592875db27bae521434b	2020-08-24 22:06:45 -07:00
mrambacher	b7e1c5213f	Add some simulator cache and block tracer tests to ASSERT_STATUS_CHECKED (#7305 ) Summary: More tests now pass. When in doubt, I added a TODO comment to check what should happen with an ignored error. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7305 Reviewed By: akankshamahajan15 Differential Revision: D23301262 Pulled By: ajkr fbshipit-source-id: 5f120edc7393560aefc0633250277bbc7e8de9e6	2020-08-24 16:43:31 -07:00
sdong	21ce018a32	Disable fsync in some ExternalSSTFileTest tests (#7303 ) Summary: Some ExternalSSTFileTest runs very long on some places. Disable fsync in some tests to speed them up. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7303 Test Plan: Run these tests. Reviewed By: riversand963 Differential Revision: D23280261 fbshipit-source-id: 0dca862e462f9e6d807f393320a1f82aa5b87e59	2020-08-24 11:26:09 -07:00
Akanksha Mahajan	3844612625	Bug Fix for memtables not trimmed down. (#7296 ) Summary: When a memtable is trimmed in MemTableListVersion, the memtable is only added to delete list if it is the last reference. However it is not the last reference as it is held by the super version. But the super version would not be switched if the delete list is empty. So the memtable is never destroyed and memory usage increases beyond write_buffer_size + max_write_buffer_size_to_maintain. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7296 Test Plan: 1. ./db_bench -benchmarks=randomtransaction -optimistic_transaction_db=1 -statistics -stats_interval_seconds=1 -duration=90 -num=500000 --max_write_buffer_size_to_maintain=16000000 --transaction_set_snapshot Reviewed By: ltamasi Differential Revision: D23267395 Pulled By: akankshamahajan15 fbshipit-source-id: 3a8d437fe9f4015f851ff84c0e29528aa946b650	2020-08-21 13:29:05 -07:00
mrambacher	e9befdebbf	Add EnvTestWithParam::OptionsTest to the ASSERT_STATUS_CHECKED passes (#7283 ) Summary: This test uses database functionality and required more extensive work to get it to pass than the other tests. The DB functionality required for this test now passes the check. When it was unclear what the proper behavior was for unchecked status codes, a TODO was added. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7283 Reviewed By: akankshamahajan15 Differential Revision: D23251497 Pulled By: ajkr fbshipit-source-id: 52b79629bdafa0a58de8ead1d1d66f141b331523	2020-08-20 19:18:35 -07:00
Stanislav Tkach	b288f0131b	Add getters for the read options to the C API (#7289 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7289 Reviewed By: akankshamahajan15 Differential Revision: D23252520 Pulled By: ajkr fbshipit-source-id: 85cea485a6dcaa1c67c32a83eb49a1b623966609	2020-08-20 16:36:19 -07:00
Cheng Chang	ce4192375d	Track WAL in MANIFEST: minor updates (#7282 ) Summary: The updates resolve comments left from https://github.com/facebook/rocksdb/pull/7164. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7282 Test Plan: wal_edit_test Reviewed By: ltamasi Differential Revision: D23196824 Pulled By: cheng-chang fbshipit-source-id: 797f3fef27fc72114c2be777d9eadd3429da5301	2020-08-20 15:12:00 -07:00
Jay Zhuang	3e422ce0ca	Fix a timer_test deadlock (#7277 ) Summary: There's a potential deadlock caused by the MockTimeEnv time value reaching a large number, which causes TimedWait() to wait forever. The test misuses microseconds as seconds, making it more likely to happen. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7277 Reviewed By: pdillinger Differential Revision: D23183873 Pulled By: jay-zhuang fbshipit-source-id: 6fc38ebd40b4125a99551204b271f91a27e70086	2020-08-20 08:43:13 -07:00
Jay Zhuang	ac7dcfda10	Add missing ComputeCompactionScore() for a new universal manual compaction (#7281 ) Summary: Seems it's only causing assert failure during compaction pick, but in production code, the problematic compactions are excluded at a later step. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7281 Reviewed By: akankshamahajan15 Differential Revision: D23228000 Pulled By: jay-zhuang fbshipit-source-id: 2e4055aeebe0f5a2b07e299e0a2d51a1ad2e216d	2020-08-19 17:42:08 -07:00
Levi Tamasi	b9bb59d49d	Add initial set of options for integrated blob write path (#7280 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7280 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D23195192 Pulled By: ltamasi fbshipit-source-id: 743b382de391963e62ba86119e9fbd0233ea3b3a	2020-08-18 18:32:37 -07:00
Akanksha Mahajan	cc24ac14eb	Store FSSequentialFilePtr object in SequenceFileReader (#7190 ) Summary: This diff contains following changes: 1. Replace `FSSequentialFile` pointer with `FSSequentialFilePtr` object that wraps `FSSequentialFile` pointer in `SequenceFileReader`. Objective: If tracing is enabled, `FSSequentialFilePtr` returns `FSSequentialFileTracingWrapper` pointer that includes all necessary information in `IORecord` and calls underlying FileSystem and invokes `IOTracer` to dump that record in a binary file. If tracing is disabled then, underlying `FileSystem` pointer is returned directly. `FSSequentialFilePtr` wrapper class is added to bypass the `FSSequentialFileTracingWrapper` when tracing is disabled. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7190 Test Plan: make check -j64 COMPILE_WITH_TSAN=1 make check -j64 Reviewed By: anand1976 Differential Revision: D23059616 Pulled By: akankshamahajan15 fbshipit-source-id: 1564b94dd1297cd0fbfe2ed5c9cc3e20f7395301	2020-08-18 16:20:54 -07:00
sdong	b194c21bba	Whole DBTest to skip fsync (#7274 ) Summary: After https://github.com/facebook/rocksdb/pull/7036, we still see extra DBTest that can timeout when running 10 or 20 in parallel. Expand skip-fsync mode in whole DBTest. Still preserve other tests from doing this mode to be conservative. This commit reinstates https://github.com/facebook/rocksdb/issues/7049, whose un-revert was lost in an automatic infrastructure mis-merge. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7274 Test Plan: Run all existing files. Reviewed By: pdillinger Differential Revision: D23177444 fbshipit-source-id: 1f61690b2ac6333c3b2c87176fef6b2cba086b33	2020-08-17 18:42:25 -07:00
Andrew Kryczka	5d5ff82408	Disable `recycle_log_file_num` with `kTolerateCorruptedTailRecords` (#7271 ) Summary: The two features are naturally incompatible. WAL recycling expects the recovery to succeed upon encountering a corrupt record at the point where new data ends and recycled data remains at the tail. However, `WALRecoveryMode::kTolerateCorruptedTailRecords` must fail upon encountering any such corrupt record, as it cannot differentiate between this and a real corruption, which would cause committed updates to be truncated. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7271 Reviewed By: riversand963 Differential Revision: D23169923 Pulled By: ajkr fbshipit-source-id: 2cf8a3bcd2c9a0ecb0055a84725047a10fd4db50	2020-08-17 18:21:10 -07:00
Yanqin Jin	92593d511a	Add a new EntryType for deletion with timestamp (#7195 ) Summary: Add `kEntryDeleteWithTimestamp` to `EntryType` which is a public API. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7195 Test Plan: make check Reviewed By: ajkr Differential Revision: D22914704 Pulled By: riversand963 fbshipit-source-id: 886f73c6b70c527cad1c8fc9fc8d3afe60e1ea39	2020-08-17 16:26:06 -07:00
Levi Tamasi	9b083cb11c	Build blob file reader/writer classes in LITE mode as well (#7272 ) Summary: The patch makes sure that the functionality required for the new integrated BlobDB implementation (most importantly, the classes related to reading and writing blob files) is also built in LITE mode by removing the corresponding `#ifndef`s. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7272 Test Plan: Ran `make check` in both regular and LITE mode. Reviewed By: zhichao-cao Differential Revision: D23173280 Pulled By: ltamasi fbshipit-source-id: 1596bd1a76409a8a6d83d8f1dbfe08bfdea7ffe6	2020-08-17 15:19:05 -07:00
sdong	1760637539	CompactRange() refit level should confirm destination level is not empty (#7261 ) Summary: There is a potential data race related to CompactRange() with level refitting. Between the compaction step and the refitting step, some automatic compaction could put data to the destination level and cause the DB to be corrupted. Fix the bug by checking the target level to be empty. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7261 Test Plan: Add a unit test, which would fail with "Corruption: L1 have overlapping ranges '666F6F' seq:6, type:1 vs. '626172' seq:2, type:1", and now it succeeds. Reviewed By: ajkr Differential Revision: D23142269 fbshipit-source-id: 28bc14d5ac934c192260b23a4ce3f10a95e3ee91	2020-08-17 14:21:53 -07:00
matthewvon	2ad88ceae9	Populate cf_id member of CompactionJobInfo for OnCompactionBegin (#6938 ) Summary: Looks like somebody simply missed initializing a member variable. The column family ID, cf_id, is not set during OnCompactionBegin. But it is set properly in the next function for OnCompactionCompleted. Need this cf_id for tracking progress of a Stardog optimize since there may be multiple compactions required for a given column family. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6938 Reviewed By: siying Differential Revision: D23153235 Pulled By: ajkr fbshipit-source-id: 932938de3a4ebbc7ac89702f655583862587d251	2020-08-17 11:57:47 -07:00
Jay Zhuang	69760b4d05	Introduce a global StatsDumpScheduler for stats dumping (#7223 ) Summary: Have a global StatsDumpScheduler for all DB instance stats dumping, including `DumpStats()` and `PersistStats()`. Before this, there were 2 dedicated threads for every DB instance, one for DumpStats() and one for PersistStats(), which could create lots of threads if there are hundreds of DB instances. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7223 Reviewed By: riversand963 Differential Revision: D23056737 Pulled By: jay-zhuang fbshipit-source-id: 0faa2311142a73433ebb3317361db7cbf43faeba	2020-08-14 20:12:44 -07:00
Yanqin Jin	d758273ceb	Get() with timestamp should respect snapshot (#7227 ) Summary: If user-defined timestamp is enabled, current implementation can expose newer data to queries even if an older sequence number is specified via read_options.snapshot. This PR makes Get() respect sequence-number-based snapshot. Solution is simple. Besides using <ukey, ts, seq> to search the index for the key, we also verify that the candidate result's seq is smaller than or equal to seq. This requires passing a seq via `GetContext`, which results in the majority of code change caused by this PR. Also added a few unit tests to demonstrate standard visibility during point lookup and range scan when timestamp and snapshot are both present. Test plan (devserver): ``` make check $./db_bench --benchmarks=fillseq,readrandom -cache_size=$[64*1024*1024] ``` Result this PR: readrandom : 4.827 micros/op 207180 ops/sec; 22.9 MB/s (1000000 of 1000000 found) master: readrandom : 4.936 micros/op 202610 ops/sec; 22.4 MB/s (1000000 of 1000000 found) Pull Request resolved: https://github.com/facebook/rocksdb/pull/7227 Reviewed By: ltamasi Differential Revision: D23015242 Pulled By: riversand963 fbshipit-source-id: ea7b85a728654553ba357d2e6a207b5e40f7376a	2020-08-14 19:20:58 -07:00
Andrew Kryczka	a1aa3f8385	Disable manual compaction during `ReFitLevel()` (#7250 ) Summary: Manual compaction with `CompactRangeOptions::change_levels` set could refit to a level targeted by another manual compaction. If force_consistency_checks were disabled, it could be possible for overlapping files to be written at that target level. This PR prevents the possibility by calling `DisableManualCompaction()` prior to `ReFitLevel()`. It also improves the manual compaction disabling mechanism to wait for pending manual compactions to complete before returning, and support disabling from multiple threads. Fixes https://github.com/facebook/rocksdb/issues/6432. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7250 Test Plan: crash test command that repro'd the bug reliably: ``` $ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py blackbox --simple -target_file_size_base=524288 -write_buffer_size=1048576 -clear_column_family_one_in=0 -reopen=0 -max_key=10000000 -column_families=1 -max_background_compactions=8 -compact_range_one_in=100000 -compression_type=none -compaction_style=1 -num_levels=5 -universal_min_merge_width=4 -universal_max_merge_width=8 -level0_file_num_compaction_trigger=12 -rate_limiter_bytes_per_sec=1048576000 -universal_max_size_amplification_percent=100 --duration=3600 --interval=60 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --enable_compaction_filter=0 ``` Reviewed By: ltamasi Differential Revision: D23090800 Pulled By: ajkr fbshipit-source-id: afcbcd51b42ce76789fdb907d8b9ada790709c13	2020-08-14 11:29:52 -07:00
sdong	e7358da9a2	Upgrade tool chain (#7251 ) Summary: Upgrade tool chain to the latest. It is done mostly manually as build_tools/build_detect_platform fails to update many of them. Try to fix a new clang analyze warning with the new tool chain. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7251 Test Plan: "make all", "USE_CLANG=1 make all" Reviewed By: riversand963 Differential Revision: D23091090 fbshipit-source-id: 732e5a30137837431438f85f36296406b641f975	2020-08-12 19:30:00 -07:00
Levi Tamasi	9d6f48ec1d	Clean up CompressBlock/CompressBlockInternal a bit (#7249 ) Summary: The patch cleans up and refactors `CompressBlock` and `CompressBlockInternal` a bit. In particular, it does the following: * It renames `CompressBlockInternal` to `CompressData` and moves it to `util/compression.h`, where other general compression-related utilities are located. This will facilitate reuse in the BlobDB write path. * The signature of the method is changed so it now takes `compression_format_version` (similarly to the compression library specific methods) instead of `format_version` (which is specific to the block based table). * `GetCompressionFormatForVersion` no longer takes `compression_type` as a parameter. This parameter was only used in a (not entirely up-to-date) assertion; also, removing it eliminates the need to ensure this precondition holds at all call sites. * Does some minor cleanup in `CompressBlock`, for instance, it is now possible to pass only one of `sampled_output_fast` and `sampled_output_slow`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7249 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D23087278 Pulled By: ltamasi fbshipit-source-id: e6316e45baed8b4e7de7c1780c90501c2a3439b3	2020-08-12 18:25:48 -07:00
Akanksha Mahajan	1f9f630b27	Store FileSystemPtr object that contains FileSystem ptr (#7180 ) Summary: As part of the IOTracing project, this PR 1. Caches "FileSystemPtr" object(wrapper class that returns file system pointer based on tracing enabled) instead of "FileSystem" pointer. 2. FileSystemPtr object is created using FileSystem pointer and IOTracer pointer. 3. IOTracer shared_ptr is created in DBImpl and it is passed to different classes through constructor. 4. When tracing is enabled through DB::StartIOTrace, FileSystemPtr returns FileSystemTracingWrapper pointer for tracing purpose and when it is disabled underlying FileSystem pointer is returned. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7180 Test Plan: make check -j64 COMPILE_WITH_TSAN=1 make check -j64 Reviewed By: anand1976 Differential Revision: D22987117 Pulled By: akankshamahajan15 fbshipit-source-id: 6073617e4c2d5bc363914f3a1f55ae3b0a58fbf1	2020-08-12 17:31:23 -07:00
Zitan Chen	b578ca2e4d	BackupEngine supports custom file checksums (#7085 ) Summary: A new option `std::shared_ptr<FileChecksumGenFactory> backup_checksum_gen_factory` is added to `BackupableDBOptions`. This allows custom checksum functions to be used for creating, verifying, or restoring backups. Tests are added. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7085 Test Plan: Passed make check Reviewed By: pdillinger Differential Revision: D22390756 Pulled By: gg814 fbshipit-source-id: 3b7756ca444c2129844536b91c3ca09f53b6248f	2020-08-12 13:31:09 -07:00
Peter Dillinger	6ac1d25fd0	Fix+clean up handling of mock sleeps (#7101 ) Summary: We have a number of tests hanging on MacOS and Windows due to mishandling of code for mock sleeps. In addition, the code was in terrible shape because the same variable (addon_time_) would sometimes refer to microseconds and sometimes to seconds. One test even assumed it was nanoseconds but was written to pass anyway. This has been cleaned up so that DB tests generally use a SpecialEnv function to mock sleep, for either some number of microseconds or seconds depending on the function called. But to call one of these, the test must first call SetMockSleep (precondition enforced with assertion), which also turns sleeps in RocksDB into mock sleeps. To also remove accounting for actual clock time, call SetTimeElapseOnlySleepOnReopen, which implies SetMockSleep (on DB re-open). This latter setting only works by applying on DB re-open, otherwise havoc can ensue if Env goes back in time with DB open. More specifics: Removed some unused test classes, and updated comments on the general problem. Fixed DBSSTTest.GetTotalSstFilesSize using a sync point callback instead of mock time. For this we have the only modification to production code, inserting a sync point callback in flush_job.cc, which is not a change to production behavior. Removed unnecessary resetting of mock times to 0 in many tests. RocksDB deals in relative time. Any behaviors relying on absolute date/time are likely a bug. (The above test DBSSTTest.GetTotalSstFilesSize was the only one clearly injecting a specific absolute time for actual testing convenience.) Just in case I misunderstood some test, I put this note in each replacement: // NOTE: Presumed unnecessary and removed: resetting mock time in env Strengthened some tests like MergeTestTime, MergeCompactionTimeTest, and FilterCompactionTimeTest in db_test.cc stats_history_test and blob_db_test are each their own beast, rather deeply dependent on MockTimeEnv. 
Each gets its own variant of a work-around for TimedWait in a mock time environment. (Reduces redundancy and inconsistency in stats_history_test.) Intended follow-up: Remove TimedWait from the public API of InstrumentedCondVar, and only make that accessible through Env by passing in an InstrumentedCondVar and a deadline. Then the Env implementations mocking time can fix this problem without using sync points. (Test infrastructure using sync points interferes with individual tests' control over sync points.) With that change, we can simplify/consolidate the scattered work-arounds. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7101 Test Plan: make check on Linux and MacOS Reviewed By: zhichao-cao Differential Revision: D23032815 Pulled By: pdillinger fbshipit-source-id: 7f33967ada8b83011fb54e8279365c008bd6610b	2020-08-11 12:41:30 -07:00
Levi Tamasi	a99fb67233	Remove redundant consistency check from VersionStorageInfo::AddFile (#7237 ) Summary: `VersionStorageInfo::AddFile` currently has a debug-mode consistency check to make sure the newly added file does not overlap with the previous one (for levels below L0). Considering that `VersionBuilder::CheckConsistency` also performs similar checks (in fact, those checks are more comprehensive and cover L0 as well), this check is redundant. The patch removes it and also cleans up `AddFile` a little. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7237 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D23041937 Pulled By: ltamasi fbshipit-source-id: e00665f3b83bfd17f86c54c238800f3d77d739bd	2020-08-11 09:23:17 -07:00
anand76	f308da5273	Fix delete triggered compaction for single level universal (#7224 ) Summary: Delete triggered compaction (DTC) for universal compaction style with ```num_levels = 1``` has been disabled for some time due to a data correctness bug. This PR re-enables it with a bug fix. A file marked for compaction can be picked, along with all L0 files after it as the compaction input. We stop adding files to the input once we encounter a file already being compacted (the original bug failed to check the compaction status of the files). Tests: Add unit tests to ```compaction_picker_test.cc``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/7224 Reviewed By: ajkr Differential Revision: D23031845 Pulled By: anand1976 fbshipit-source-id: 9de3cab5f9774cede666c2c48d309a7d9b88a505	2020-08-10 12:19:17 -07:00
Yuhong Guo	5444942f15	Fix cmake build on MacOS (#7205 ) Summary: 1. `std::random_shuffle` is deprecated and now we can use `std::shuffle` ``` /rocksdb/db/prefix_test.cc:590:12: error: 'random_shuffle<std::__1::__wrap_iter<unsigned long long > >' is deprecated [-Werror,-Wdeprecated-declarations] std::random_shuffle(prefixes.begin(), prefixes.end()); ^ /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/algorithm:2982:1: note: 'random_shuffle<std::__1::__wrap_iter<unsigned long long > >' has been explicitly marked deprecated here _LIBCPP_DEPRECATED_IN_CXX14 void ^ /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/__config:1107:39: note: expanded from macro '_LIBCPP_DEPRECATED_IN_CXX14' # define _LIBCPP_DEPRECATED_IN_CXX14 _LIBCPP_DEPRECATED ^ /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/__config:1090:48: note: expanded from macro '_LIBCPP_DEPRECATED' # define _LIBCPP_DEPRECATED __attribute__ ((deprecated)) ``` 2. `c_test` link error with `-DROCKSDB_BUILD_SHARED=OFF`: ``` [ 7%] Linking CXX executable c_test ld: library not found for -lrocksdb-shared clang: error: linker command failed with exit code 1 (use -v to see invocation) make[5]: * [c_test] Error 1 make[4]: * [CMakeFiles/c_test.dir/all] Error 2 make[4]: *** Waiting for unfinished jobs.... ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/7205 Reviewed By: ajkr Differential Revision: D23030641 Pulled By: pdillinger fbshipit-source-id: f270e50fc0b824ca1a0876ec5c65d33f55a72dd0	2020-08-10 10:48:05 -07:00
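The `std::random_shuffle` to `std::shuffle` migration described in #7205 can be sketched as follows. This is a minimal illustration and not the actual change to `prefix_test.cc`; the helper name `ShufflePrefixes` and the caller-supplied seed are assumptions made for the example.

```cpp
#include <algorithm>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper (not part of RocksDB): returns a shuffled copy of
// `prefixes`. std::shuffle requires an explicit random engine, unlike the
// deprecated std::random_shuffle, which relied on an unspecified source.
std::vector<uint64_t> ShufflePrefixes(std::vector<uint64_t> prefixes,
                                      uint32_t seed) {
  std::mt19937 rng(seed);  // the seed is an assumption of this sketch
  std::shuffle(prefixes.begin(), prefixes.end(), rng);
  return prefixes;
}
```

Passing the engine explicitly also makes the order reproducible for a given seed on a given standard library, which can help when a test needs to be replayed.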
Remington Brasga	633bff2f19	Fixed typo on Value mismatch error in db_test (#6587 ) Summary: The debug output is supposed to print two values to show the value mismatch, which was detected just a few lines above. However, the actual print-out shows the same value twice (so the values obviously won't appear mismatched). Pull Request resolved: https://github.com/facebook/rocksdb/pull/6587 Reviewed By: riversand963 Differential Revision: D23025279 Pulled By: ajkr fbshipit-source-id: 4c6c35bc60b273f13c08b5464b6f690d8a5cfe41	2020-08-10 10:06:08 -07:00
anand76	832b056a30	Enable IO timeouts for iterators (#7161 ) Summary: Introduce io_timeout in ReadOptions and enable deadline/io_timeout for Iterators. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7161 Test Plan: New unit tests in db_basic_test Reviewed By: riversand963 Differential Revision: D22687352 Pulled By: anand1976 fbshipit-source-id: 67bbb0e6d7ae80b256589244468494292538c6ec	2020-08-07 12:01:08 -07:00
Zhichao Cao	b79f13b2aa	Fix the potential deadlock in WriteImplWALOnly and UnorderedWriteMemtable (#7199 ) Summary: As pointed out by https://github.com/facebook/rocksdb/issues/7197 , there is a double lock in WriteImplWALOnly. Also found another deadlock in UnorderedWriteMemtable. Move the check after switch_all_.notify_all(). Pull Request resolved: https://github.com/facebook/rocksdb/pull/7199 Test Plan: pass make check Reviewed By: anand1976 Differential Revision: D22961714 Pulled By: zhichao-cao fbshipit-source-id: 0707922dc50d28ea141a15a8cdcbd1c8993ea0d8	2020-08-07 11:28:49 -07:00
mrambacher	56f468b356	Add more tests to ASSERT_STATUS_CHECKED (#7211 ) Summary: Added 4 more tests to those which pass ASSERT_STATUS_CHECKED (cache_test, lru_cache_test, filename_test, filelock_test). Pull Request resolved: https://github.com/facebook/rocksdb/pull/7211 Reviewed By: ajkr Differential Revision: D22982858 Pulled By: zhichao-cao fbshipit-source-id: acdd071582ed6aa7447ed96c5732f10bf720d783	2020-08-06 17:19:41 -07:00
Yingchun Lai	67bbac3621	Remove duplicate colon in Status message (#7041 ) Summary: A colon is added after 'msg' automatically when invoking the function Status(Code _code, const Slice& msg, const Slice& msg2), so there is no need to append a colon explicitly to 'msg'. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7041 Reviewed By: ajkr Differential Revision: D22292801 fbshipit-source-id: 8f2d69065bb779d2613468bf9fc9169f32c3f1ec	2020-08-06 15:18:04 -07:00
jsteemann	5e1808d515	fix typo: paraniod -> paranoid (#7163 ) Summary: Rename "paraniod" to "paranoid" in a few places. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7163 Reviewed By: ajkr Differential Revision: D22678242 fbshipit-source-id: 28b1011a736d0a95612676f7e1b9500a70c324b4	2020-08-06 14:25:34 -07:00
Cheng Chang	cd48ecaa1a	Define WAL related classes to be used in VersionEdit and VersionSet (#7164 ) Summary: `WalAddition`, `WalDeletion` are defined in `wal_version.h` and used in `VersionEdit`. `WalAddition` is used to represent events of creating a new WAL (no size, just log number), or closing a WAL (with size). `WalDeletion` is used to represent events of deleting or archiving a WAL, it means the WAL is no longer alive (won't be replayed during recovery). `WalSet` is the set of alive WALs kept in `VersionSet`. 1. Why use `WalDeletion` instead of relying on `MinLogNumber` to identify outdated WALs? On recovery, we can compute `MinLogNumber()` based on the log numbers kept in MANIFEST, any log with number < MinLogNumber can be ignored. So it seems that we don't need to persist `WalDeletion` to MANIFEST, since we can ignore the WALs based on MinLogNumber. But `MinLogNumber()` is actually a lower bound, it does not exactly mean that logs starting from MinLogNumber must exist. This is because in a corner case, when a column family is empty and never flushed, its log number is set to the largest log number, but not persisted in MANIFEST. So let's say there are 2 column families, when creating the DB, the first WAL has log number 1, so it's persisted to MANIFEST for both column families. Then CF 0 is empty and never flushed, CF 1 is updated and flushed, so a new WAL with log number 2 is created and persisted to MANIFEST for CF 1. But CF 0's log number in MANIFEST is still 1. So on recovery, MinLogNumber is 1, but since log 1 only contains data for CF 1, and CF 1 is flushed, log 1 might have already been deleted from disk. We can make `MinLogNumber()` be exactly the minimum log number that must exist, by persisting the most recent log number for empty column families that are not flushed. But if there are N such column families, then every time a new WAL is created, we need to add N records to MANIFEST.
In the current design, a record is persisted to MANIFEST only when a WAL is created, closed, or deleted/archived, so the number of WAL related records is bounded by 3x the number of WALs. 2. Why keep `WalSet` in `VersionSet` instead of applying the `VersionEdit`s to `VersionStorageInfo`? `VersionEdit`s are originally designed to track the addition and deletion of SST files. The SST files are related to column families, each column family has a list of `Version`s, and each `Version` keeps the set of active SST files in `VersionStorageInfo`. But WALs are a concept of DB, they are not bound to specific column families. So logically it does not make sense to store WALs in a column family's `Version`s. Also, `Version`'s purpose is to keep reference to SST / blob files, so that they are not deleted until there is no version referencing them. But a WAL is deleted regardless of version references. So we keep the WALs in `VersionSet` for the purpose of writing out the DB state's snapshot when creating new MANIFESTs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7164 Test Plan: make version_edit_test && ./version_edit_test make wal_edit_test && ./wal_edit_test Reviewed By: ltamasi Differential Revision: D22677936 Pulled By: cheng-chang fbshipit-source-id: 5a3b6890140e572ffd79eb37e6e4c3c32361a859	2020-08-05 16:34:38 -07:00
sdong	5c1a544122	Clean up InternalIterator upper bound logic a little bit (#7200 ) Summary: InternalIterator::IsOutOfBound() and InternalIterator::MayBeOutOfUpperBound() are two functions related to the upper bound check. It is hard for users to reason about this complexity. Consolidate the two functions into one and assign an enum as the result to improve readability. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7200 Test Plan: Run all existing tests. Would run crash test with atomic for a while. Reviewed By: anand1976 Differential Revision: D22833181 fbshipit-source-id: a0c724267056adbd0476bde74650e6c7226077e6	2020-08-05 10:44:57 -07:00
Yanqin Jin	2735b0275d	ReadOptions.iter_start_ts should support tombstones (#7178 ) Summary: as title. When ReadOptions.iter_start_ts is not nullptr, DBIter::key() should return internal keys including value type. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7178 Test Plan: make check Reviewed By: ltamasi Differential Revision: D22935879 Pulled By: riversand963 fbshipit-source-id: 7508d962cf11ebcfa6386d2529b4f3606b47ccfd	2020-08-04 18:52:08 -07:00
Akanksha Mahajan	493f425e77	Add support to start and end IOTracing through DB APIs (#7203 ) Summary: 1. Add support to start io tracing through DB::StartIOTrace(Env*, const TraceOptions&, std::unique_ptr<TraceWriter>&&) and end tracing through DB::EndIOTrace(). This doesn't trace DB::Open. User side code: /* Open DB */ DB::Open(options, dbname, &db); /* Start tracing */ db->StartIOTrace(env, trace_opt, std::move(trace_writer)); /* Perform Operations */ /* End tracing */ db->EndIOTrace(); 2. Fix the build errors for Windows. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7203 Test Plan: make check -j64 Reviewed By: anand1976 Differential Revision: D22901947 Pulled By: akankshamahajan15 fbshipit-source-id: e59c0b785a802168e6f1aa028d99c224a35cb30c	2020-08-04 18:41:45 -07:00
Andrew Kryczka	a4a4a2dabd	dedup ReadOptions in iterator hierarchy (#7210 ) Summary: Previously, a `ReadOptions` object was stored in every `BlockBasedTableIterator` and every `LevelIterator`. This redundancy consumes extra memory, resulting in the `Arena` making more allocations, and iteration observing worse cache performance. This PR migrates callers of `NewInternalIterator()` and `MakeInputIterator()` to provide a `ReadOptions` object guaranteed to outlive the returned iterator. When the iterator's lifetime will be managed by the user, this lifetime guarantee is achieved by storing the `ReadOptions` value in `ArenaWrappedDBIter`. Then, sub-iterators of `NewInternalIterator()` and `MakeInputIterator()` can hold a reference-to-const `ReadOptions`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7210 Test Plan: - `make check` under ASAN and valgrind - benchmark: on a DB with 2 L0 files and 3 L1+ levels, this PR reduced `Arena` allocation 4792 -> 4160 bytes. Reviewed By: anand1976 Differential Revision: D22861323 Pulled By: ajkr fbshipit-source-id: 54aebb3e89c872eeab0f5793b4b6e42878d093ce	2020-08-03 15:23:04 -07:00
Aaron Kabcenell	56ed601df3	Compaction Read/Write Stats by Compaction Type (#7165 ) Summary: Adds compaction statistics (total bytes read and written) for compactions that occur for delete-triggered, periodic, and TTL compaction reasons. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7165 Test Plan: TTL and periodic can be checked by running db_bench with the options activated: ./db_bench --benchmarks="fillrandom,stats" --statistics --num=10000000 -base_background_compactions=16 -periodic_compaction_seconds=1 ./db_bench --benchmarks="fillrandom,stats" --statistics --num=10000000 -base_background_compactions=16 -fifo_compaction_ttl=1 Setting the time to one second causes non-zero bytes read/written for those compaction reasons. Disabling them or setting them to times longer than the test run length causes the stats to return to zero as expected. Delete-triggered compaction counting is tested in DBTablePropertiesTest.DeletionTriggeredCompactionMarking Reviewed By: ajkr Differential Revision: D22693050 Pulled By: akabcenell fbshipit-source-id: d15cef4d94576f703015c8942d5f0d492f69401d	2020-07-29 13:39:29 -07:00
codingsh	50f206ad84	feat: export SetBackgroundThreads(n, Env::BOTTOM); (#7191 ) Summary: - https://github.com/rust-rocksdb/rust-rocksdb/pull/448 Pull Request resolved: https://github.com/facebook/rocksdb/pull/7191 Reviewed By: riversand963 Differential Revision: D22809066 Pulled By: ajkr fbshipit-source-id: 036939f9a28cacc3f677c318d1aed97fe5f4f85e	2020-07-29 12:24:13 -07:00
sdong	692f6a3138	Implement NextAndGetResult() in memtable and level iterator (#7179 ) Summary: NextAndGetResult() is not implemented in memtable and is very simply implemented in level iterator. The result is that for a normal leveled iterator, a performance regression will be observed from calling PrepareValue() for most iterator Next() calls. Mitigate the problem by implementing the function for both iterators. In level iterator, the implementation cannot be perfect, as when calling file iterator's SeekToFirst() we don't have information about whether the value is prepared. Fortunately, the first key should not account for a big portion of the CPU. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7179 Test Plan: Run normal crash test for a while. Reviewed By: anand1976 Differential Revision: D22783840 fbshipit-source-id: c19f45cdf21b756190adef97a3b66ccde3936e05	2020-07-29 09:45:21 -07:00
mrambacher	d9d190742c	Make env_test work with ASSERT_STATUS_CHECKED (#7176 ) Summary: Make (most of) the env_test pass when ASSERT_STATUS_CHECKED is enabled. One test that opens a database is currently disabled in this mode, as there are many errors that need to be revisited for DB tests and status checks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7176 Reviewed By: cheng-chang Differential Revision: D22799278 Pulled By: ajkr fbshipit-source-id: 16d8a02eaeecd6df1060249b6a5811292801f2ed	2020-07-28 22:59:48 -07:00
codingsh	83ea266b43	export stats_persist_period_sec (#7168 ) Summary: fixed - https://github.com/rust-rocksdb/rust-rocksdb/issues/447 - https://github.com/rust-rocksdb/rust-rocksdb/pull/448 Pull Request resolved: https://github.com/facebook/rocksdb/pull/7168 Reviewed By: cheng-chang Differential Revision: D22736013 Pulled By: ajkr fbshipit-source-id: fdd784aa75d26a367b9108b05ffdd94a2ae117d3	2020-07-28 13:05:34 -07:00
Tomas Kolda	cd4592c220	SST Partitioner interface that allows to split SST files (#6957 ) Summary: SST Partitioner interface that allows splitting SST files during compactions. It basically instructs compaction to create a new file when needed. When one is using well-defined prefixes and a prefixed way of defining tables, it is good to also define partitioning so that promotion of some SST file does not cover a huge key space on the next level (in the worst case, the complete space). Pull Request resolved: https://github.com/facebook/rocksdb/pull/6957 Reviewed By: ajkr Differential Revision: D22461239 fbshipit-source-id: 9ce07bba08b3ba89c2d45630520368f704d1316e	2020-07-24 13:44:49 -07:00
Jay Zhuang	b0c5ecd6b3	Make max_subcompactions dynamically changeable (#7159 ) Summary: Make `max_subcompactions` dynamically changeable by passing the `DBOption` to Compaction. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7159 Reviewed By: siying Differential Revision: D22671238 Pulled By: jay-zhuang fbshipit-source-id: 311ca9f6bb606965544d8708616d358cfed5be42	2020-07-22 18:32:52 -07:00
Levi Tamasi	0d04a8434a	Sync blob files before closing them (#7160 ) Summary: BlobDB currently syncs each blob file periodically after writing a certain amount of data (as specified by the configuration option `BlobDBOptions::bytes_per_sync`) and all open blob files when the base DB's memtables are flushed. With the patch, in addition to the above, blob files are also synced right before being closed, after the footer has been written. This will be beneficial for the new integrated blob file write path as well. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7160 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D22672646 Pulled By: ltamasi fbshipit-source-id: 62b34263543a7e74abcbb7adf011daa1e699998f	2020-07-22 17:25:20 -07:00
Cheng Chang	96ce0470a7	Clean snapshot dir before taking snapshot (#7156 ) Summary: `DBTest::SnapshotFiles` runs the tests in a `while` loop. Currently, the snapshot directory is not cleaned up in each loop, so previous snapshot files may remain in the next loop's snapshot. While working on https://github.com/facebook/rocksdb/pull/7129 and checking the tracked WALs in MANIFEST, I found that this test always fails because it reads some unknown WAL. It turns out that the unknown WAL is left over from previous loops. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7156 Test Plan: make db_test && ./db_test --gtest_filter=*SnapshotFiles Reviewed By: siying Differential Revision: D22668360 Pulled By: cheng-chang fbshipit-source-id: 69d4aa3506038ba30e218e8ae966357935a99c6c	2020-07-22 13:54:01 -07:00
mrambacher	d44cbc5314	Add hash of key/value checks when paranoid_file_checks=true (#7134 ) Summary: When paranoid_file_checks=true, a rolling key-value hash is generated and compared to what is written to the file. If the values do not match, the SST file is rejected. Code put in place for the check for both flush and compaction jobs. Corresponding test added to corruption_test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7134 Reviewed By: cheng-chang Differential Revision: D22646149 fbshipit-source-id: 8fde1984a1a11edd3bd82a413acffc5ea7aa683f	2020-07-22 11:04:40 -07:00
Haosen Wen	dbc51adbac	Use steady_clock instead of system_clock in FileOperationInfo::TimePoint (#7153 ) Summary: Issue https://github.com/facebook/rocksdb/issues/7133 reported that using `system_clock` in `FileOperationInfo::TimePoint` causes the duration of a file flush operation (which can be a noop on MacOS in some scenarios) to appear to be 0 and fail an assertion in listener_test. Using `steady_clock` supposedly fixed the problem. `steady_clock` actually fits better into the use cases of `FileOperationInfo::TimePoint` as all usages care about durations but not wall clock time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7153 Test Plan: make check. Reviewed By: riversand963 Differential Revision: D22654136 Pulled By: roghnin fbshipit-source-id: 5980b1080734bdae496a18071a2c2b5887c67d85	2020-07-22 08:55:02 -07:00
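The point in #7153 that durations are better measured with `steady_clock` than with `system_clock` can be illustrated with a small sketch. The helper `TimeOperation` below is hypothetical and is not the `FileOperationInfo` code; it only assumes the standard `<chrono>` facilities.

```cpp
#include <chrono>

// Hypothetical helper (not part of RocksDB): runs `op` and returns the
// elapsed time. std::chrono::steady_clock is monotonic, so the result can
// never be negative, whereas system_clock may be adjusted while `op` runs.
template <typename Op>
std::chrono::nanoseconds TimeOperation(Op&& op) {
  const auto start = std::chrono::steady_clock::now();
  op();
  const auto finish = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::nanoseconds>(finish - start);
}
```

A caller might write `auto elapsed = TimeOperation([&] { /* flush a file */ });`. The sketch only guarantees a non-negative result; whether a very short operation measures as non-zero still depends on the clock resolution of the platform.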
sdong	1cf4731dbb	column_family_test: fix a data race related to sleeping task (#7150 ) Summary: TSAN reports warning in one column_family_test: WARNING: ThreadSanitizer: data race (pid=16352) Write of size 8 at 0x7ffcdf042158 by main thread: #0 pthread_cond_destroy <null> (column_family_test+0x471f65) #1 rocksdb::port::CondVar::~CondVar() /home/circleci/project/port/port_posix.cc:101:49 (column_family_test+0x8a627a) #2 rocksdb::test::SleepingBackgroundTask::~SleepingBackgroundTask() /home/circleci/project/./test_util/testutil.h:397:7 (column_family_test+0x54b6e2) #3 rocksdb::ColumnFamilyTest_FlushCloseWALFiles_Test::TestBody() /home/circleci/project/db/column_family_test.cc:3008:1 (column_family_test+0x54b6e2) ...... Previous read of size 8 at 0x7ffcdf042158 by thread T2 (mutexes: write M0): #0 pthread_cond_broadcast <null> (column_family_test+0x471dd2) #1 rocksdb::port::CondVar::SignalAll() /home/circleci/project/port/port_posix.cc:139:28 (column_family_test+0x8a651a) #2 rocksdb::test::SleepingBackgroundTask::DoSleep() /home/circleci/project/./test_util/testutil.h:412:12 (column_family_test+0x58574b) ...... Likely, SleepingBackgroundTask::DoSleep() started to execute after the main thread had finished everything, cancelled and waited for sleeping tasks to finish. At this point, although DoSleep() will not sleep, it still accesses the mutex, creating a data race with the destructor of the test. Fix this bug by waiting for the sleeping task to start sleeping after it is scheduled. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7150 Test Plan: Run these modified tests and make sure it doesn't break. Reviewed By: riversand963 Differential Revision: D22630716 fbshipit-source-id: cc5781cf69083685de406490438898238bdfc2d3	2020-07-20 14:19:48 -07:00
sdong	9870704420	Fix a minor data race in stats dumping threads initialization (#7151 ) Summary: https://github.com/facebook/rocksdb/pull/7145 creates a minor data race against the stat creation counter. Turn it to atomic. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7151 Test Plan: Run the test. Reviewed By: ajkr Differential Revision: D22631014 fbshipit-source-id: c6fb69ac5b9df7139795dacea5ce9fb9fd3278d7	2020-07-20 12:12:43 -07:00
Zhichao Cao	ed4712fe7e	Remove time out testing cases in error_handler_fs_test (#7141 ) Summary: Remove the 3 testing cases that cause the time out in linux build by https://github.com/facebook/rocksdb/issues/6765 . Will fix them later. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7141 Test Plan: make asan_check, buck run Reviewed By: ajkr Differential Revision: D22593831 Pulled By: zhichao-cao fbshipit-source-id: 14956c36476ecc3393f613178c22e13df843126e	2020-07-17 23:27:21 -07:00
Andrew Kryczka	9a83fd21e6	stagger first DumpMallocStats after opening DB (#7145 ) Summary: Previously when running `db_bench` with large value for `num_multi_dbs` and enabled `Options::dump_malloc_stats`, we would see most CPU spent in jemalloc locking. After this PR that no longer shows up at the top of the profile. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7145 Reviewed By: riversand963 Differential Revision: D22593031 Pulled By: ajkr fbshipit-source-id: 3b3fc91f93249c6afee53f59f34c487c3fc5add6	2020-07-17 16:13:26 -07:00
sdong	ca5a069a79	Suppress a TSAN warning (#7126 ) Summary: TSAN with clang shows a warning similar to this: WARNING: ThreadSanitizer: data race (pid=10159) Atomic write of size 8 at 0x7b5000002890 by thread T33: #0 __tsan_atomic64_store <null> (db_test+0x4ca2b5) #1 std::__atomic_base<unsigned long>::store(unsigned long, std::memory_order) /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/bits/atomic_base.h:374:2 (db_test+0x774fde) #2 rocksdb::VersionSet::SetLastSequence(unsigned long) /home/circleci/project/./db/version_set.h:1057:20 (db_test+0x774fde) #3 rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch, rocksdb::WriteCallback, unsigned long, unsigned long, bool, unsigned long, unsigned long, rocksdb::PreReleaseCallback) /home/circleci/project/db/db_impl/db_impl_write.cc:449:18 (db_test+0x774fde) ...... Previous read of size 8 at 0x7b5000002890 by thread T5 (mutexes: write M1044689462619020832): #0 rocksdb::DBImpl::ReleaseSnapshot(rocksdb::Snapshot const) /home/circleci/project/db/db_impl/db_impl.cc (db_test+0x6f4ae7) #1 rocksdb::(anonymous namespace)::MTThreadBody(void) /home/circleci/project/db/db_test.cc:2514:13 (db_test+0x56ac59) #2 rocksdb::(anonymous namespace)::StartThreadWrapper(void) /home/circleci/project/env/env_posix.cc:443:3 (db_test+0x88c4cd) It is not limited to ReleaseSnapshot() and rocksdb::DBImpl::MultiCFSnapshot(). While we are not 100% sure it doesn't indicate any correctness violation, we suppress them for now to keep TSAN clean with more tests so that we can cover more bugs with CI. In the gcc runs we have been running, this warning rarely shows up.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7126 Test Plan: See the mini-TSAN test to pass with reasonable run time. Reviewed By: ajkr Differential Revision: D22552375 fbshipit-source-id: ebdd3854cb3becec3403970326a1ca961db2ab00	2020-07-15 13:25:14 -07:00
Zhichao Cao	a10f12eda1	Auto resume the DB from Retryable IO Error (#6765 ) Summary: In the current codebase, in the write path, if a Retryable IO Error happens, SetBGError is called. The retryable IO Error is converted to a hard error and the DB is in read only mode. The user or application needs to resume it. In this PR, if a Retryable IO Error happens in one DB, SetBGError will create a new thread to call Resume (auto resume). options.max_bgerror_resume_count controls if auto resume is enabled or not (if max_bgerror_resume_count<=0, auto resume will not be enabled). options.bgerror_resume_retry_interval controls the time interval to call Resume again if the previous resume fails due to the Retryable IO Error. If a non-retryable error happens during resume, auto resume will terminate. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6765 Test Plan: Added the unit test cases in error_handler_fs_test and pass make asan_check Reviewed By: anand1976 Differential Revision: D21916789 Pulled By: zhichao-cao fbshipit-source-id: acb8b5e5dc3167adfa9425a5b7fc104f6b95cb0b	2020-07-15 11:03:58 -07:00
Yanqin Jin	27735dea9a	Report corrupted keys during compaction (#7124 ) Summary: Currently, RocksDB lets compaction go through even in case of corrupted keys, the number of which is reported in CompactionJobStats. However, RocksDB does not check this value. We should let compaction run in a stricter mode. Temporarily disable two tests that allow corrupted keys in compaction. With this PR, the two tests will assert(false) and terminate. Still need to investigate what the recommended google-test way of doing it is. Death test (EXPECT_DEATH) in gtest has warnings now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7124 Test Plan: make check Reviewed By: ajkr Differential Revision: D22530722 Pulled By: riversand963 fbshipit-source-id: 6a5a6a992028c6d4f92cb74693c92db462ae4ad6	2020-07-14 17:18:17 -07:00
Levi Tamasi	bdf4de6cb9	Remove some dead code from BlobLogWriter (#7125 ) Summary: Periodic syncing of blob files is performed by `WritableFileWriter`; `bytes_per_sync_` and `next_sync_offset_` in `BlobLogWriter` are actually unused (or more precisely, only used by methods that are themselves unused). The patch removes all this dead code. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7125 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D22531021 Pulled By: ltamasi fbshipit-source-id: 6b293ad5a79d3e6bf15c5c68f7aedd7ce7a15f10	2020-07-14 13:51:54 -07:00
Yanqin Jin	c628fae6d1	Report corruption on unrecognized value type (#7121 ) Summary: During memtable lookup, an unrecognized value type should be reported as Status::Corruption. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7121 Test Plan: make check Reviewed By: cheng-chang Differential Revision: D22512124 Pulled By: riversand963 fbshipit-source-id: 9b97be7d9b230c5aae9205f96054420e5ea09066	2020-07-13 20:26:58 -07:00
Stanislav Tkach	393e486e3e	Add getters for options to the C API (#7094 ) Summary: Along with https://github.com/facebook/rocksdb/issues/6925 and https://github.com/facebook/rocksdb/issues/6998, this should add getters for all Options fields except several ones with non-trivial interface (for example rocksdb_options_set_min_level_to_compress). Pull Request resolved: https://github.com/facebook/rocksdb/pull/7094 Reviewed By: riversand963 Differential Revision: D22479800 Pulled By: pdillinger fbshipit-source-id: d14f305e12cfe268d07e0fe229d55cef299c792a	2020-07-10 14:30:04 -07:00
wenh	4924a506b9	Reduce `env_->GetChildren()` calls in DBImpl::Recover() (#7044 ) Summary: There currently exist multiple `GetChildren()` calls in `DBImpl::Recover()`, which can be expensive in cases of distributed file systems. This pull request tries to call `GetChildren()` on each necessary directory only _once_ and reuse the results in the places of repeated calls in the current code. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7044 Test Plan: Run `make check` and use the default test suite. The modified code should be semantically identical to the current code. As a proof of this solution, we may optionally deploy the system onto a (real or simulated) distributed system and expect reduced latency caused by manifest fetching. (WIP) Reviewed By: riversand963 Differential Revision: D22419925 Pulled By: roghnin fbshipit-source-id: d3774fbfbc246c5527101bc16747eb5c90919886	2020-07-10 13:41:08 -07:00
mrambacher	c7c7b07f06	More Makefile Cleanup (#7097 ) Summary: Cleans up some of the dependencies on test code in the Makefile while building tools: - Moves the test::RandomString, DBBaseTest::RandomString into Random - Moves the test::RandomHumanReadableString into Random - Moves the DestroyDir method into file_utils - Moves the SetupSyncPointsToMockDirectIO into sync_point. - Moves the FaultInjection Env and FS classes under env These changes allow all of the tools to build without dependencies on test_util, thereby simplifying the build dependencies. By moving the FaultInjection code, the dependency in db_stress on different libraries for debug vs release was eliminated. Tested both release and debug builds via Make and CMake for both static and shared libraries. More work remains to clean up how the tools are built and remove some unnecessary dependencies. There is also more work that should be done to get the Makefile and CMake to align in their builds -- what is in the libraries and the sizes of the executables are different. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7097 Reviewed By: riversand963 Differential Revision: D22463160 Pulled By: pdillinger fbshipit-source-id: e19462b53324ab3f0b7c72459dbc73165cc382b2	2020-07-09 14:35:17 -07:00
Yanqin Jin	f70ad03137	Parameterize a few tests in DBWALTest (#7105 ) Summary: As title. The goal is to shorten the execution time of several tests when they are combined together in a single TEST_F. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7105 Test Plan: make db_wal_test ./db_wal_test Reviewed By: ltamasi Differential Revision: D22442705 Pulled By: riversand963 fbshipit-source-id: 0ad49b8f21fa86dcd5a4d3c9a06af313735ac217	2020-07-09 11:31:06 -07:00
Akanksha Mahajan	54f171fe90	Update Flush policy in PartitionedIndexBuilder on switching from user-key to internal-key mode (#7096 ) Summary: When format_version is high enough to support user-key and there are index entries for the same user key that span multiple data blocks, the builder changes from user-key mode to internal-key mode. But the flush policy is not reset to point to the Block Builder of internal-keys. After this switch, no entries are added to the user key index partition result, thus it never triggers flushing the block. Fix: 1. After adding the entry in sub_builder_index_, if there is a switch from user-key to internal-key, then the flush policy is updated to point to the Block Builder of the internal-keys index partition. 2. Set sub_builder_index_->seperator_is_key_plus_seq_ = true if seperator_is_key_plus_seq_ is set to true so that subsequent partitions can also use internal key mode. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7096 Test Plan: make check -j64 Reviewed By: ajkr Differential Revision: D22416598 Pulled By: akankshamahajan15 fbshipit-source-id: 01fc2dc07ea1b32f8fb803995ebe6e9a3fbe67ac	2020-07-08 21:03:04 -07:00
rafael-aero	712458fc34	Add RestoreDBFromLatestBackup to C API, add new C# package (#7092 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7092 Reviewed By: riversand963 Differential Revision: D22412323 Pulled By: ajkr fbshipit-source-id: 3fc1c63bb19a8cd2c0ae620800c28f199a7f494b	2020-07-08 11:56:41 -07:00
rockeet	b649d8cb97	Fixed Factory construct just for calling .Name() (#7080 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7080 Reviewed By: riversand963 Differential Revision: D22412352 Pulled By: ajkr fbshipit-source-id: 1d7f4c1621040a0130245139b52c3f4d3deac865	2020-07-08 11:54:00 -07:00
wenh	226d1f9c73	extend listener callback functions to more file I/O operations (#7055 ) Summary: Currently, `EventListener` in listener.h only has callback functions for file read and write. One may favor extended callback functions for more file I/O operations like flush, sync and close. This PR tries to add those interfaces and have them called when appropriate throughout the code base. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7055 Test Plan: Write an experimental listener with those new callback functions with log output in them; run experiments and check logs to see those functions are actually called. Default test suite `make check` should also be included. Reviewed By: riversand963 Differential Revision: D22380624 Pulled By: roghnin fbshipit-source-id: 4121491d45c2c2aae8c255e7998090559a241c6a	2020-07-07 18:21:18 -07:00
Andrew Kryczka	dd29ad4223	Separate internal and user key comparators in `BlockIter` (#6944 ) Summary: Replace `BlockIter::comparator_` and `IndexBlockIter::user_comparator_wrapper_` with a concrete `UserComparatorWrapper` and `InternalKeyComparator`. The motivation for this change was the inconvenience of not knowing the concrete type of `BlockIter::comparator_`, which prevented calling specialized internal key comparison functions to optimize comparison of keys with global seqno applied. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6944 Test Plan: benchmark setup -- single file DBs, in-memory, no compression. "normal_db" created by regular flush; "ingestion_db" created by ingesting a file. Both DBs have same contents. ``` $ TEST_TMPDIR=/dev/shm/normal_db/ ./db_bench -benchmarks=fillrandom,compact -write_buffer_size=10485760000 -disable_auto_compactions=true -compression_type=none -num=1000000 $ ./ldb write_extern_sst ./tmp.sst --db=/dev/shm/ingestion_db/dbbench/ --compression_type=no --hex --create_if_missing < <(./sst_dump --command=scan --output_hex --file=/dev/shm/normal_db/dbbench/000007.sst \| awk 'began {print "0x" substr($1, 2, length($1) - 2), "==>", "0x" $5} ; /^Sst file format: block-based/ {began=1}') $ ./ldb ingest_extern_sst ./tmp.sst --db=/dev/shm/ingestion_db/dbbench/ ``` benchmark run command: ``` $ TEST_TMPDIR=/dev/shm/$DB/ ./db_bench -benchmarks=seekrandom -seek_nexts=$SEEK_NEXT -use_existing_db=true -cache_index_and_filter_blocks=false -num=1000000 -cache_size=0 -threads=1 -reads=200000000 -mmap_read=1 -verify_checksum=false ``` results: perf improved marginally for ingestion_db and did not change significantly for normal_db: SEEK_NEXT \| DB \| code \| ops/sec \| % change -- \| -- \| -- \| -- \| -- 0 \| normal_db \| master \| 350880 \| 0 \| normal_db \| PR6944 \| 351040 \| 0.0 0 \| ingestion_db \| master \| 343255 \| 0 \| ingestion_db \| PR6944 \| 349424 \| 1.8 10 \| normal_db \| master \| 218711 \| 10 \| normal_db \| PR6944 \| 217892 \| -0.4 10 \| ingestion_db \| master \| 220334 \| 10 \| ingestion_db \| PR6944 \| 226437 \| 2.8 Reviewed By: pdillinger Differential Revision: D21924676 Pulled By: ajkr fbshipit-source-id: ea4288a2eefa8112eb6c651a671c1de18c12e538	2020-07-07 17:26:16 -07:00
Levi Tamasi	a693341604	Move the blob file format related classes to the main namespace, rename reader/writer (#7086 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7086 Test Plan: `make check` Reviewed By: zhichao-cao Differential Revision: D22395420 Pulled By: ltamasi fbshipit-source-id: 088a20097bd6b73b0c433cd79725779f97ec04f2	2020-07-06 17:18:14 -07:00
Peter Dillinger	4b107ceb7e	Improve code comments in EstimateLiveDataSize (#7072 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7072 Reviewed By: ajkr Differential Revision: D22391641 Pulled By: pdillinger fbshipit-source-id: 0ef355576454514263ab684eb1a5c06787f3242a	2020-07-06 16:17:02 -07:00
Jay Zhuang	00de699096	Replace reinterpret_cast with static_cast_with_check (#7067 ) Summary: Replace `reinterpret_cast` with `static_cast_with_check` for `DBImpl` and `ColumnFamilyHandleImpl`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7067 Reviewed By: siying Differential Revision: D22361587 Pulled By: jay-zhuang fbshipit-source-id: dfe9e8f3af39c3d27cc372c55ab9ad905eb0a5a1	2020-07-02 19:25:41 -07:00
Zitan Chen	373d5ac485	BackupEngine verifies table file checksums on creating new backups (#7015 ) Summary: When table file checksums are enabled and stored in the DB manifest by using the RocksDB default crc32c checksum function, BackupEngine will calculate the crc32c checksum of the file to be copied and compare the calculated result with the one stored in the DB manifest before copying the file to the backup directory. After copying to the backup directory, BackupEngine will verify the checksum of the copied file with the one calculated before copying. This helps detect some rare corruption events such as bit-flips during the copying process. No verification with checksums in DB manifest will be performed if the table file checksum function is not the RocksDB default crc32c checksum function. In addition, If `share_table_files` and `share_files_with_checksum` are true, BackupEngine will compare the checksums computed before and after copying of the table files. Corresponding tests are added. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7015 Test Plan: Passed make check Reviewed By: pdillinger Differential Revision: D22165732 Pulled By: gg814 fbshipit-source-id: ee0e8cc397c455eba64545c29380b9d9853588ec	2020-07-02 18:15:12 -07:00
Peter Dillinger	a680a7ea37	Un-revert #7049 , revert #7022 (#7071 ) Summary: Even though local bisection gave me a clear signal (and still does) that reverting https://github.com/facebook/rocksdb/issues/7049 would fix the failures in MultiThreadedDBTest, https://github.com/facebook/rocksdb/issues/7022 seems to be the root cause. Reverting https://github.com/facebook/rocksdb/issues/7022 and keeping https://github.com/facebook/rocksdb/issues/7049 seems to fix the issue in local reproducer also. (Had these landed in opposite order, bisection would have found the root cause.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/7071 Reviewed By: akankshamahajan15 Differential Revision: D22362857 Pulled By: pdillinger fbshipit-source-id: ed63df3d74e9d4ce1604de8fe43b216166c7a3f0	2020-07-02 13:30:41 -07:00
Peter Dillinger	52d59e0c93	Revert "Whole DBTest to skip fsync (#7049 )" (#7070 ) Summary: This reverts commit `4f1534bdb0`. This commit caused failures and deadlocks in MultiThreadedDBTest.MultiThreaded/69 and others. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7070 Reviewed By: riversand963 Differential Revision: D22358778 Pulled By: pdillinger fbshipit-source-id: faf8f2cb469a7063a113921c8e9c64a9f7610dac	2020-07-02 10:22:43 -07:00
sdong	4f1534bdb0	Whole DBTest to skip fsync (#7049 ) Summary: After https://github.com/facebook/rocksdb/pull/7036, we still see extra DBTest that can timeout when running 10 or 20 in parallel. Expand skip-fsync mode in whole DBTest. Still preserve other tests from doing this mode to be conservative. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7049 Test Plan: Run all existing files. Reviewed By: pdillinger Differential Revision: D22301700 fbshipit-source-id: f9a9e3b3b26ce640665a47cb8bff33ba0c89b565	2020-07-01 19:37:56 -07:00
Akanksha Mahajan	5edfe3a3d8	Update Flush policy in PartitionedIndexBuilder on switching from user-key to internal-key mode (#7022 ) Summary: When format_version is high enough to support user-key and there are index entries for same user key that spans multiple data blocks then it changes from user-key mode to internal-key mode. But the flush policy is not reset to point to Block Builder of internal-keys. After this switch, no entries are added to user key index partition result, thus it never triggers flushing the block. Fix: After adding the entry in sub_builder_index_, if there is a switch from user-key to internal-key, then flush policy is updated to point to Block Builder of internal-keys index partition. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7022 Test Plan: 1. make check -j64 2. Added one unit test case Reviewed By: ajkr Differential Revision: D22197734 Pulled By: akankshamahajan15 fbshipit-source-id: d87e9e46bccab8e896ee6979d6b79c51f73d479e	2020-07-01 14:58:08 -07:00
Andrew Kryczka	c25a014792	deflake DBCompactionTestWithParam.IntraL0Compaction test (#7065 ) Summary: This check is flaky because compaction could run between the `Flush()` and the `TestGetTickerCount()`, which would increase the `BLOCK_CACHE_INDEX_MISS` count beyond what the test expects. Verified by adding a `sleep(1)` between those two lines and observing the counter is too high every time. The solution is just to remove this check as it doesn't have any use anyway. The latter check of index miss is sufficient to conclude the newest L0 file (i.e., the one generated by intra-L0) does not have its index block pinned in cache. It'd be nice to simultaneously check the L0 files generated by flush do have their index blocks pinned in cache, but that's not what the line deleted in this PR was checking. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7065 Reviewed By: pdillinger Differential Revision: D22340327 Pulled By: ajkr fbshipit-source-id: e076b2c7228b7fa763dd0c0cb13828e176c1abee	2020-07-01 14:53:10 -07:00
Peter Dillinger	e2fd501d44	Stabilize DBTest.ApproximateSizesMemTable (#7064 ) Summary: Random memtable layouts could cause random failure, reproducible with command below running for a while. Test now using deterministic behavior. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7064 Test Plan: while ./db_test --gtest_filter=SizesMemTable; do true; done Reviewed By: siying Differential Revision: D22339442 Pulled By: pdillinger fbshipit-source-id: 8e74e5a9b5e88f7030854045a22c12cf561d5de6	2020-07-01 13:52:20 -07:00
mrambacher	80f71b5863	Use Libraries in the RocksDB Makefile Build (#6660 ) Summary: Change the linking of tests/tools to be against a library rather than a list of objects. This change substantially reduces the size of the objects produced. peterd clean repo size: 264M Before this change, with make all: 40G After this change, with make all: 28G With make LIB_MODE=shared all: 7.0G The list of TESTS was changed from being hard-coded to generated from the test sources variable. Note that there are some test sources that are not built as tests (though the set of tests is identical to the previous version). Added OBJ_DIR option to Makefile to allow objects to be placed in an alternative location. By default, OBJ_DIR is the same as before ("./"). This change is a precursor to being able to build/run the tests/tools linked against static libraries. Additionally, it should be possible to clean up and merge some of the rules for building tests and the like if so desired. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6660 Reviewed By: riversand963 Differential Revision: D22244463 Pulled By: pdillinger fbshipit-source-id: db9c6341d81ed62c2270374f4ede02fb9604c754	2020-06-30 19:33:31 -07:00
Levi Tamasi	e367bc7f4b	Clean up blob files based on the linked SST set (#7001 ) Summary: The earlier `VersionBuilder` code only cleaned up blob files that were marked as entirely consisting of garbage using `VersionEdits` with `BlobFileGarbage`. This covers the cases when table files go through regular compaction, where we iterate through the KVs and thus have an opportunity to calculate the amount of garbage (that is, most cases). However, it does not help when table files are simply dropped (e.g. deletion compactions or the `DeleteFile` API). To deal with such cases, the patch adds logic that cleans up all blob files at the head of the list until the first one with linked SSTs is found. (As an example, let's assume we have blob files with numbers 1..10, and the first one with any linked SSTs is number 8. This means that SSTs in the `Version` only rely on blob files with numbers >= 8, and thus 1..7 are no longer needed.) The code change itself is pretty small; however, changing the logic like this necessitated changes to some tests that have been added recently (namely to the ones that use blob files in isolation, i.e. without any table files referring to them). Some of these cases were fixed by bypassing `VersionBuilder` altogether in order to keep the tests simple (which actually makes them more proper unit tests as well), while the `VersionBuilder` unit tests were fixed by adding dummy table files to the test cases as needed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7001 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D22119474 Pulled By: ltamasi fbshipit-source-id: c6547141355667d4291d9661d6518eb741e7b54a	2020-06-30 15:31:21 -07:00
sdong	80b107a0a9	Divide WriteCallbackTest.WriteWithCallbackTest (#7037 ) Summary: WriteCallbackTest.WriteWithCallbackTest has a deep for-loop and in some cases runs very long. Parameterized it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7037 Test Plan: Run the test and see it passes. Reviewed By: ltamasi Differential Revision: D22269259 fbshipit-source-id: a1b6687b5bf4609754833d14cf383d68bc7ab27a	2020-06-30 12:31:30 -07:00
Burton Li	5be2cb6948	Compaction filter support for BlobDB (#6850 ) Summary: Added compaction filter support for BlobDB non-TTL values. Same as vanilla RocksDB, user compaction filter applies to all k/v pairs of the compaction for non-TTL values. It honors `min_blob_size`, which potentially results in value transitions between inlined data and stored-in-blob data when the size of a value is changed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6850 Reviewed By: siying Differential Revision: D22263487 Pulled By: ltamasi fbshipit-source-id: 8fc03f8cde2a5c831e63b436b3dbf1b7f90939e8	2020-06-29 17:32:14 -07:00
sdong	58547e533b	Disable fsync in some tests to speed them up (#7036 ) Summary: Fsyncing files is not providing more test coverage in many tests. Provide an option in SpecialEnv to turn it off to speed it up and enable this option in some tests with relatively long run time. Most of those tests can be divided as parameterized gtest too. These two speed-up approaches are orthogonal and we can do both if needed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7036 Test Plan: Run all tests and make sure they pass. Reviewed By: ltamasi Differential Revision: D22268084 fbshipit-source-id: 6d4a838a1b7328c13931a2a5d93de57aa02afaab	2020-06-29 16:56:59 -07:00
Anand Ananthabhotla	9a5886bd8c	Extend Get/MultiGet deadline support to table open (#6982 ) Summary: Current implementation of the ```read_options.deadline``` option only checks the deadline for random file reads during point lookups. This PR extends the checks to file opens, prefetches and preloads as part of table open. The main changes are in the ```BlockBasedTable```, partitioned index and filter readers, and ```TableCache``` to take ReadOptions as an additional parameter. In ```BlockBasedTable::Open```, in order to retain existing behavior w.r.t checksum verification and block cache usage, we filter out most of the options in ```ReadOptions``` except ```deadline```. However, having the ```ReadOptions``` gives us more flexibility to honor other options like verify_checksums, fill_cache etc. in the future. Additional changes in callsites due to function signature changes in ```NewTableReader()``` and ```FilePrefetchBuffer```. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6982 Test Plan: Add new unit tests in db_basic_test Reviewed By: riversand963 Differential Revision: D22219515 Pulled By: anand1976 fbshipit-source-id: 8a3b92f4a889808013838603aa3ca35229cd501b	2020-06-29 14:53:17 -07:00
Stanislav Tkach	1b85d57cf5	Expose KeyMayExist in the C API (#7021 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7021 Reviewed By: ajkr Differential Revision: D22246297 Pulled By: pdillinger fbshipit-source-id: 81dfd0a49e4d5ce0c9f00772c17cca425757ea24	2020-06-29 12:21:53 -07:00
Yanqin Jin	d47c871190	Fix data race to VersionSet::io_status_ (#7034 ) Summary: After https://github.com/facebook/rocksdb/issues/6949 , VersionSet::io_status_ can be concurrently accessed by multiple threads without lock, causing tsan test to fail. For example, a bg flush thread resets io_status_ before calling LogAndApply(), while another thread already in the process of LogAndApply() reads io_status_. This is a bug. We do not have to reset io_status_ each time we call LogAndApply(). io_status_ is part of the state of VersionSet, and it indicates the outcome of preceding MANIFEST/CURRENT files IO operations. Its value should be updated only when: 1. MANIFEST/CURRENT files IO fail for the first time. 2. MANIFEST/CURRENT files IO succeed as part of recovering from a prior failure without process restart, e.g. calling Resume(). Test Plan (devserver): COMPILE_WITH_TSAN=1 make check COMPILE_WITH_TSAN=1 make db_test2 ./db_test2 --gtest_filter=DBTest2.CompactionStall Pull Request resolved: https://github.com/facebook/rocksdb/pull/7034 Reviewed By: zhichao-cao Differential Revision: D22247137 Pulled By: riversand963 fbshipit-source-id: 77b83e05390f3ee3cd2d96d3fdd6fe4f225e3216	2020-06-27 08:57:31 -07:00
sdong	f9817201af	Add unity build to CircleCI (#7026 ) Summary: We are still keeping unity build working. So it's a good idea to add to a pre-commit CI. A latest GCC docker image just to get a little bit more coverage. Fix three small issues to make it pass. Also make unity_test to run db_basic_test rather than db_test to cut the test time. There is no point to run expensive tests here. It was set to run db_test before db_basic_test was separated out. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7026 Test Plan: watch tests to pass. Reviewed By: zhichao-cao Differential Revision: D22223197 fbshipit-source-id: baa3b6cbb623bf359829b63ce35715c75bcb0ed4	2020-06-26 11:14:08 -07:00

4196 Commits