rocksdb

Author	SHA1	Message	Date
Yanqin Jin	a8170d774c	Close file to avoid file-descriptor leakage (#6936 ) Summary: When an operation on an open file descriptor fails, we should close the file descriptor. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6936 Test Plan: make check Reviewed By: pdillinger Differential Revision: D21885458 Pulled By: riversand963 fbshipit-source-id: ba077a76b256a8537f21e22e4ec198f45390bf50	2020-06-04 14:21:15 -07:00
Zitan Chen	02df00d97b	API change: DB::OpenForReadOnly will not write to the file system unless create_if_missing is true (#6900 ) Summary: DB::OpenForReadOnly will not write anything to the file system (i.e., create directories or files for the DB) unless create_if_missing is true. This change also fixes some subcommands of ldb, which write to the file system even if the purpose is for readonly. Two tests for this updated behavior of DB::OpenForReadOnly are also added. Other minor changes: 1. Updated HISTORY.md to include this API change of DB::OpenForReadOnly; 2. Updated the help information for the put and batchput subcommands of ldb with the option [--create_if_missing]; 3. Updated the comment of Env::DeleteDir to emphasize that it returns OK only if the directory to be deleted is empty. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6900 Test Plan: passed make check; also manually tested a few ldb subcommands Reviewed By: pdillinger Differential Revision: D21822188 Pulled By: gg814 fbshipit-source-id: 604cc0f0d0326a937ee25a32cdc2b512f9a3be6e	2020-06-03 18:57:49 -07:00
Levi Tamasi	0b8c549b3f	Mention the consistency check improvement in HISTORY.md (#6924 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6924 Reviewed By: cheng-chang Differential Revision: D21865662 Pulled By: ltamasi fbshipit-source-id: 83a01bcbb779cfba941154a36a9e735293a93211	2020-06-03 13:40:41 -07:00
Peter Dillinger	14eca6bf04	For ApproximateSizes, pro-rate table metadata size over data blocks (#6784 ) Summary: The implementation of GetApproximateSizes was inconsistent in its treatment of the size of non-data blocks of SST files, sometimes including them and sometimes not. This was at its worst when a large portion of the table file was used by filters and the query covered a small range that crossed a table boundary: the size estimate would include the large filter size. It's conceivable that someone might want only to know the size in terms of data blocks, but I believe that's unlikely enough to ignore for now. Similarly, there's no evidence the internal function ApproximateOffsetOf is used for anything other than a one-sided ApproximateSize, so I intend to refactor to remove redundancy in a follow-up commit. So to fix this, GetApproximateSizes (and implementation details ApproximateSize and ApproximateOffsetOf) now consistently include in their returned sizes a portion of table file metadata (incl filters and indexes) based on the size portion of the data blocks in range. In other words, if a key range covers data blocks that are X% by size of all the table's data blocks, returned approximate size is X% of the total file size. It would technically be more accurate to attribute metadata based on number of keys, but that's not computationally efficient with data available and rarely a meaningful difference. Also includes miscellaneous comment improvements / clarifications. Also included is a new approximatesizerandom benchmark for db_bench. No significant performance difference seen with this change, whether ~700 ops/sec with cache_index_and_filter_blocks and small cache or ~150k ops/sec without cache_index_and_filter_blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6784 Test Plan: Test added to DBTest.ApproximateSizesFilesWithErrorMargin. Old code running new test... [ RUN ] DBTest.ApproximateSizesFilesWithErrorMargin db/db_test.cc:1562: Failure Expected: (size) <= (11 * 100), actual: 9478 vs 1100 Other tests updated to reflect consistent accounting of metadata. Reviewed By: siying Differential Revision: D21334706 Pulled By: pdillinger fbshipit-source-id: 6f86870e45213334fedbe9c73b4ebb1d8d611185	2020-06-02 12:30:23 -07:00
Andrew Kryczka	c5abf78bca	avoid `IterKey::UpdateInternalKey()` in `BlockIter` (#6843 ) Summary: `IterKey::UpdateInternalKey()` is an error-prone API as it's incompatible with `IterKey::TrimAppend()`, which is used for decoding delta-encoded internal keys. This PR stops using it in `BlockIter`. Instead, it assigns global seqno in a separate `IterKey`'s buffer when needed. The logic for safely getting a Slice with global seqno properly assigned is encapsulated in `GlobalSeqnoAppliedKey`. `BinarySeek()` is also migrated to use this API (previously it ignored global seqno entirely). Pull Request resolved: https://github.com/facebook/rocksdb/pull/6843 Test Plan: benchmark setup -- single file DBs, in-memory, no compression. "normal_db" created by regular flush; "ingestion_db" created by ingesting a file. Both DBs have same contents. ``` $ TEST_TMPDIR=/dev/shm/normal_db/ ./db_bench -benchmarks=fillrandom,compact -write_buffer_size=10485760000 -disable_auto_compactions=true -compression_type=none -num=1000000 $ ./ldb write_extern_sst ./tmp.sst --db=/dev/shm/ingestion_db/dbbench/ --compression_type=no --hex --create_if_missing < <(./sst_dump --command=scan --output_hex --file=/dev/shm/normal_db/dbbench/000007.sst | awk 'began {print "0x" substr($1, 2, length($1) - 2), "==>", "0x" $5} ; /^Sst file format: block-based/ {began=1}') $ ./ldb ingest_extern_sst ./tmp.sst --db=/dev/shm/ingestion_db/dbbench/ ``` benchmark run command: ``` TEST_TMPDIR=/dev/shm/$DB/ ./db_bench -benchmarks=seekrandom -seek_nexts=10 -use_existing_db=true -cache_index_and_filter_blocks=false -num=1000000 -cache_size=1048576000 -threads=1 -reads=40000000 ``` results (throughput): normal_db: master 267.9, PR6843 254.2 (-5.1%); ingestion_db: master 259.6, PR6843 250.5 (-3.5%). Reviewed By: pdillinger Differential Revision: D21562604 Pulled By: ajkr fbshipit-source-id: 937596f836930515da8084d11755e1f247dcb264	2020-05-28 10:51:30 -07:00
Akanksha Mahajan	bcefc59e9f	Allow MultiGet users to limit cumulative value size (#6826 ) Summary: 1. Add a value_size in read options which limits the cumulative value size of keys read in batches. Once the size exceeds read_options.value_size, all the remaining keys are returned with status Abort without further fetching any key. 2. Add a unit test case MultiGetBatchedValueSizeSimple that reads keys from memory and sst files. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6826 Test Plan: 1. make check -j64 2. Add a new unit test case Reviewed By: anand1976 Differential Revision: D21471483 Pulled By: akankshamahajan15 fbshipit-source-id: dea51b8e76d5d1df38ece8cdb29933b1d798b900	2020-05-27 13:07:14 -07:00
Zhichao Cao	545e14b53b	Generate file checksum in SstFileWriter (#6859 ) Summary: If Options.file_checksum_gen_factory is set, rocksdb generates the file checksum during flush and compaction based on the checksum generator created by the factory and stores the checksum and function name in vstorage and Manifest. This PR enables file checksum generation in SstFileWriter and stores the checksum and checksum function name in the ExternalSstFileInfo, such that applications can use them for other purposes, for example, ingesting the file checksum with files in IngestExternalFile(). Pull Request resolved: https://github.com/facebook/rocksdb/pull/6859 Test Plan: add unit test and pass make asan_check. Reviewed By: ajkr Differential Revision: D21656247 Pulled By: zhichao-cao fbshipit-source-id: 78a3570c76031d8832e3d2de3d6c79cdf2b675d0	2020-05-20 11:55:31 -07:00
sdong	4a4b8a1344	sst_dump to reduce number of file reads (#6836 ) Summary: sst_dump can issue many file reads from the file system. This doesn't work well with file systems without an OS cache, especially remote file systems. In order to mitigate this problem, several improvements are done: 1. --readahead_size is added, so that users can specify readahead size when scanning the data. 2. Force a 512KB tail readahead, which prevents three I/Os for footer, meta index and property blocks and hopefully index and filter blocks too. 3. Consolidate SSTDump's I/Os before opening the file for read. Use the same file prefetch buffer. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6836 Test Plan: Add a test that covers this new feature. Reviewed By: pdillinger Differential Revision: D21516607 fbshipit-source-id: 3ae43526286f67b2f4a5bdedfbc92719d579b87e	2020-05-12 18:23:33 -07:00
sdong	a50ea71c00	Improve ldb consistency checks (#6802 ) Summary: When using ldb, users cannot turn on force consistency check in most commands, and they cannot use checkconsistency with --try_load_options. The change fixes both by: 1. checkconsistency now calls OpenDB() so that it gets all the options loading and sanitized options logic 2. use options.force_consistency_checks = true by default, and add a --disable_consistency_checks to turn it off. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6802 Test Plan: Add a new unit test. Some manual tests with corrupted DBs. Reviewed By: pdillinger Differential Revision: D21388051 fbshipit-source-id: 8d122732d391b426e3982a1c3232a8e3763ffad0	2020-05-08 14:17:47 -07:00
Yanqin Jin	e72e2167fd	Fix a few bugs in best-efforts recovery (#6824 ) Summary: 1. Update column_family_memtables_ to point to latest column_family_set in version_set after recovery. 2. Normalize file paths passed by application so that directories end with '/' or '\\'. 3. In addition to missing files, corrupted files are also ignored in best-efforts recovery. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6824 Test Plan: COMPILE_WITH_ASAN=1 make check Reviewed By: anand1976 Differential Revision: D21463905 Pulled By: riversand963 fbshipit-source-id: c48db8843cc93c8c1c7139c474b64e6f775307d2	2020-05-08 13:01:42 -07:00
anand76	94265234de	Fix race due to delete triggered compaction in Universal compaction mode (#6799 ) Summary: Delete triggered compaction in universal compaction mode was causing a corruption when scheduled in parallel with other compactions. 1. When num_levels = 1, a file marked for compaction may be picked along with all older files in L0, without checking if any of them are already being compacted. This can cause unpredictable results like resurrection of older versions of keys or deleted keys. 2. When num_levels > 1, a delete triggered compaction would not get scheduled if it overlaps with a running regular compaction. However, the reverse is not true. This is due to the fact that in `UniversalCompactionBuilder::CalculateSortedRuns`, it assumes that entire sorted runs are picked for compaction and only checks the first file in a sorted run to determine conflicts. This is violated by a delete triggered compaction as it works on a subset of a sorted run. Fix the bug for num_levels > 1, and disable the feature for now when num_levels = 1. After disabling this feature, files would still get marked for compaction, but no compaction would get scheduled. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6799 Reviewed By: pdillinger Differential Revision: D21431286 Pulled By: anand1976 fbshipit-source-id: ae9f0bdb1d6ae2f10284847db731c23f43af164a	2020-05-07 17:32:17 -07:00
Andrew Kryczka	3730b05dc9	Fixup HISTORY.md for `e9ba4ba` "validate range tombstone covers positive range" (#6825 ) Summary: Moved it from the wrong section (6.10) to the right section (Unreleased). Pull Request resolved: https://github.com/facebook/rocksdb/pull/6825 Reviewed By: zhichao-cao Differential Revision: D21464577 Pulled By: ajkr fbshipit-source-id: a836b4ab10be2464182826f9411c9c424c933b70	2020-05-07 16:40:17 -07:00
Peter Dillinger	b27a1448b6	Fix false NotFound from batched MultiGet with kHashSearch (#6821 ) Summary: The error is assigning KeyContext::s to NotFound status in a table reader for a "not found in this table" case, which skips searching in later tables, like only a delete should. (The hash search index iterator is the only one that can return status NotFound even if Valid() == false.) This was detected by intermittent failure in MultiThreadedDBTest.MultiThreaded/5, a kHashSearch configuration. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6821 Test Plan: modified existing unit test to reproduce problem Reviewed By: anand1976 Differential Revision: D21450469 Pulled By: pdillinger fbshipit-source-id: 7478003684d637dbd491cdac81468041a791be2c	2020-05-07 15:41:37 -07:00
Andrew Kryczka	e9ba4ba348	validate range tombstone covers positive range (#6788 ) Summary: We found some files containing nothing but negative range tombstones, and unsurprisingly their metadata specified a negative range, which made things crash. Time to add a bit of user input validation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6788 Reviewed By: zhichao-cao Differential Revision: D21343719 Pulled By: ajkr fbshipit-source-id: f1c16e4c3e9fa150958c8c866176632a3206fb74	2020-05-07 11:55:30 -07:00
anand76	c1e1185b7a	Update release version to 6.10 (#6797 ) Summary: Update HISTORY.md and version.h to 6.10. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6797 Reviewed By: zhichao-cao Differential Revision: D21371390 Pulled By: anand1976 fbshipit-source-id: 6017bca24fc5d12076d1ddaec7783c9b85712d42	2020-05-06 16:42:37 -07:00
Levi Tamasi	06c3b85b9a	Disallow using the base DB's storage directory as blob_dir in BlobDB (#6810 ) Summary: https://github.com/facebook/rocksdb/pull/6807 extends the logic that identifies and purges obsolete files to blob files handled by RocksDB itself. In order to prevent that from interfering with the current BlobDB code, we need to make sure that `BlobDBOptions::blob_dir` is different from the storage directories used by the base DB. (Note: this is true by default.) The patch adds a check that explicitly disallows this configuration and returns `Status::NotSupported` from `BlobDB::Open` in such cases. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6810 Test Plan: Tested using the BlobDB mode of `db_bench`. Reviewed By: riversand963 Differential Revision: D21412676 Pulled By: ltamasi fbshipit-source-id: 6630cc7481e48c8bf55d59423b25f14d52ffe681	2020-05-06 14:00:46 -07:00
Yanqin Jin	5a61e7864d	Fix db_stress when GetLiveFiles() flushes dropped CF (#6805 ) Summary: Current impl. of db_stress will abort verification and report failure if GetLiveFiles() causes a dropped column family to be flushed. This is not desired. To fix, this PR makes the following change: In GetLiveFiles, if flush is triggered and returns a status for which IsColumnFamilyDropped() is true, then set status to Status::OK(). This is OK because dropped column families will be skipped during the rest of this function, and valid column families will have their live files returned to caller. Test plan (dev server): make check ./db_stress -ops_per_thread=1000 -get_live_files_one_in=100 -clear_column_family_one_in=100 ./db_stress -disable_wal=1 -reopen=0 -ops_per_thread=1000 -get_live_files_one_in=100 -clear_column_family_one_in=100 Pull Request resolved: https://github.com/facebook/rocksdb/pull/6805 Reviewed By: ltamasi Differential Revision: D21390044 Pulled By: riversand963 fbshipit-source-id: de67846b95a4f1b88aa0a30c3d70c43cc68625b9	2020-05-04 17:45:49 -07:00
sdong	680c416348	Avoid Swallowing Some File Consistency Checking Bugs (#6793 ) Summary: We are swallowing some file consistency checking failures. This is not expected. We are fixing two cases: DB reopen and manifest dump. More places are not fixed and need follow-up. Error from CheckConsistencyForDeletes() is also swallowed, which is not fixed in this PR. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6793 Test Plan: Add a unit test to cover the reopen case. Reviewed By: riversand963 Differential Revision: D21366525 fbshipit-source-id: eb438a322237814e8d5125f916a3c6de97f39ded	2020-05-04 14:18:11 -07:00
sdong	6277e28039	Flag CompressionOptions::parallel_threads to be experimental (#6781 ) Summary: The feature of CompressionOptions::parallel_threads is not yet mature. Mark it as experimental in the comments for now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6781 Reviewed By: pdillinger Differential Revision: D21330678 fbshipit-source-id: d7dd7d099fb002a5c6a5d8da689ce5ee08a9eb13	2020-04-30 15:22:06 -07:00
Peter Dillinger	bae6f58696	Basic MultiGet support for partitioned filters (#6757 ) Summary: In MultiGet, access each applicable filter partition only once per batch, rather than for each applicable key. Also, * Fix Bloom stats for MultiGet * Fix/refactor MultiGetContext::Range::KeysLeft, including * Add efficient BitsSetToOne implementation * Assert that MultiGetContext::Range does not go beyond shift range Performance test: Generate db: $ ./db_bench --benchmarks=fillrandom --num=15000000 --cache_index_and_filter_blocks -bloom_bits=10 -partition_index_and_filters=true ... Before (middle performing run of three; note some missing Bloom stats): $ ./db_bench --use-existing-db --benchmarks=multireadrandom --num=15000000 --cache_index_and_filter_blocks --bloom_bits=10 --threads=16 --cache_size=20000000 -partition_index_and_filters -batch_size=32 -multiread_batched -statistics --duration=20 2>&1 | egrep 'micros/op|block.cache.filter.hit|bloom.filter.(full|use)|number.multiget' multireadrandom : 26.403 micros/op 597517 ops/sec; (548427 of 671968 found) rocksdb.block.cache.filter.hit COUNT : 83443275 rocksdb.bloom.filter.useful COUNT : 0 rocksdb.bloom.filter.full.positive COUNT : 0 rocksdb.bloom.filter.full.true.positive COUNT : 7931450 rocksdb.number.multiget.get COUNT : 385984 rocksdb.number.multiget.keys.read COUNT : 12351488 rocksdb.number.multiget.bytes.read COUNT : 793145000 rocksdb.number.multiget.keys.found COUNT : 7931450 After (middle performing run of three): $ ./db_bench_new --use-existing-db --benchmarks=multireadrandom --num=15000000 --cache_index_and_filter_blocks --bloom_bits=10 --threads=16 --cache_size=20000000 -partition_index_and_filters -batch_size=32 -multiread_batched -statistics --duration=20 2>&1 | egrep 'micros/op|block.cache.filter.hit|bloom.filter.(full|use)|number.multiget' multireadrandom : 21.024 micros/op 752963 ops/sec; (705188 of 863968 found) rocksdb.block.cache.filter.hit COUNT : 49856682 rocksdb.bloom.filter.useful COUNT : 45684579 rocksdb.bloom.filter.full.positive COUNT : 10395458 rocksdb.bloom.filter.full.true.positive COUNT : 9908456 rocksdb.number.multiget.get COUNT : 481984 rocksdb.number.multiget.keys.read COUNT : 15423488 rocksdb.number.multiget.bytes.read COUNT : 990845600 rocksdb.number.multiget.keys.found COUNT : 9908456 So that's about 25% higher throughput even for random keys Pull Request resolved: https://github.com/facebook/rocksdb/pull/6757 Test Plan: unit test included Reviewed By: anand1976 Differential Revision: D21243256 Pulled By: pdillinger fbshipit-source-id: 5644a1468d9e8c8575be02f4e04bc5d62dbbb57f	2020-04-28 14:49:34 -07:00
Peter Dillinger	a7f0b27b39	HISTORY.md update for bzip upgrade (#6767 ) Summary: See https://github.com/facebook/rocksdb/issues/6714 and https://github.com/facebook/rocksdb/issues/6703 Pull Request resolved: https://github.com/facebook/rocksdb/pull/6767 Reviewed By: riversand963 Differential Revision: D21283307 Pulled By: pdillinger fbshipit-source-id: 8463bec725669d13846c728ad4b5bde43f9a84f8	2020-04-28 12:29:31 -07:00
Peter Dillinger	4574d7513d	Update HISTORY.md for block cache redundant adds (#6764 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6764 Reviewed By: ltamasi Differential Revision: D21267108 Pulled By: pdillinger fbshipit-source-id: a3dfe2dbe4e8f6309a53eb72903ef58d52308f97	2020-04-28 08:26:43 -07:00
Yanqin Jin	d4398e08fc	Fix timestamp support for MultiGet (#6748 ) Summary: 1. Avoid nullptr dereference when passing timestamp to KeyContext creation. 2. Construct LookupKey correctly with timestamp when creating MultiGetContext. 3. Compare without timestamp when sorting KeyContexts. Fixes https://github.com/facebook/rocksdb/issues/6745 Test plan (dev server): make check Pull Request resolved: https://github.com/facebook/rocksdb/pull/6748 Reviewed By: pdillinger Differential Revision: D21258691 Pulled By: riversand963 fbshipit-source-id: 44e65b759c18b9986947783edf03be4f890bb004	2020-04-27 22:49:56 -07:00
Levi Tamasi	bea91d5d61	Destroy any ColumnFamilyHandles in BlobDB::Open upon error (#6763 ) Summary: If an error happens during BlobDBImpl::Open after the base DB has been opened, we need to destroy the `ColumnFamilyHandle`s returned by `DB::Open` to prevent an assertion in `ColumnFamilySet`'s destructor from being hit. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6763 Test Plan: Ran `make check` and tested using the BlobDB mode of `db_bench`. Reviewed By: riversand963 Differential Revision: D21262643 Pulled By: ltamasi fbshipit-source-id: 60ebc7ab19be66cf37fbe5f6d8957d58470f3d3b	2020-04-27 16:45:13 -07:00
Akanksha Mahajan	75b13ea94a	Allow sst_dump to check size of different compression levels and report time (#6634 ) Summary: 1. Add two arguments --compression_level_from and --compression_level_to to check the compression size with different compression levels in the given range. Users must specify one compression type, else it will error out. Both from and to levels must also be specified together. 2. Display the time taken to compress each file with different compressions by default. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6634 Test Plan: make -j64 check Reviewed By: anand1976 Differential Revision: D20810282 Pulled By: akankshamahajan15 fbshipit-source-id: ac9098d3c079a1fad098f6678dbedb4d888a791b	2020-04-27 12:36:16 -07:00
Cheng Chang	40497a875a	Reduce memory copies when fetching and uncompressing blocks from SST files (#6689 ) Summary: In https://github.com/facebook/rocksdb/pull/6455, we modified the interface of `RandomAccessFileReader::Read` to be able to get rid of memcpy in direct IO mode. This PR applies the new interface to `BlockFetcher` when reading blocks from SST files in direct IO mode. Without this PR, in direct IO mode, when fetching and uncompressing compressed blocks, `BlockFetcher` will first copy the raw compressed block into `BlockFetcher::compressed_buf_` or `BlockFetcher::stack_buf_` inside `RandomAccessFileReader::Read` depending on the block size. Then, during uncompression, it will copy the uncompressed block into `BlockFetcher::heap_buf_`. In this PR, we get rid of the first memcpy and directly uncompress the block from `direct_io_buf_` to `heap_buf_`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6689 Test Plan: A new unit test `block_fetcher_test` is added. Reviewed By: anand1976 Differential Revision: D21006729 Pulled By: cheng-chang fbshipit-source-id: 2370b92c24075692423b81277415feb2aed5d980	2020-04-24 15:32:56 -07:00
Yanqin Jin	e04f3bce4f	Update CURRENT file after best-efforts recovery (#6746 ) Summary: After a successful recovery, the CURRENT file should be updated to point to the valid MANIFEST. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6746 Test Plan: make check Reviewed By: anand1976 Differential Revision: D21189876 Pulled By: riversand963 fbshipit-source-id: 7537b49988c5c425ebe9505a5cc260de351ad79b	2020-04-23 16:21:09 -07:00
mrambacher	4cbc19d2a1	Add a ConfigOptions for use in comparing objects and converting to/from strings (#6389 ) Summary: The methods in convenience.h are used to compare/convert objects to/from strings. There is a mishmash of parameters in use here with more needed in the future. This PR replaces those parameters with a single structure. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6389 Reviewed By: siying Differential Revision: D21163707 Pulled By: zhichao-cao fbshipit-source-id: f807b4cc7e2b0af3871536b69546b2604dfa81bd	2020-04-21 17:38:17 -07:00
Akanksha Mahajan	03a1d95db0	Set max_background_flushes dynamically (#6701 ) Summary: 1. Add changes so that max_background_flushes can be set dynamically. 2. Add a testcase DBOptionsTest.SetBackgroundFlushThreads which sets max_background_flushes dynamically using SetDBOptions. TestPlan: 1. make -j64 check 2. Using new testcase DBOptionsTest.SetBackgroundFlushThreads Pull Request resolved: https://github.com/facebook/rocksdb/pull/6701 Reviewed By: ajkr Differential Revision: D21028010 Pulled By: akankshamahajan15 fbshipit-source-id: 5f949e4a8fd3c32537b637947b7ee09a69cfc7c1	2020-04-20 16:19:02 -07:00
Yanqin Jin	243852ec15	Add IsDirectory() to Env and FS (#6711 ) Summary: IsDirectory() is a common API to check whether a path is a regular file or directory. POSIX: call stat() and use S_ISDIR(st_mode) Windows: PathIsDirectoryA() and PathIsDirectoryW() HDFS: FileSystem.isDirectory() Java: File.isDirectory() ... Pull Request resolved: https://github.com/facebook/rocksdb/pull/6711 Test Plan: make check Reviewed By: anand1976 Differential Revision: D21053520 Pulled By: riversand963 fbshipit-source-id: 680aadfd8ce982b63689190cf31b3145d5a89e27	2020-04-17 14:39:18 -07:00
Mike Kolupaev	e45673dece	Properly report IO errors when IndexType::kBinarySearchWithFirstKey is used (#6621 ) Summary: Context: Index type `kBinarySearchWithFirstKey` added the ability for sst file iterator to sometimes report a key from index without reading the corresponding data block. This is useful when sst blocks are cut at some meaningful boundaries (e.g. one block per key prefix), and many seeks land between blocks (e.g. for each prefix, the ranges of keys in different sst files are nearly disjoint, so a typical seek needs to read a data block from only one file even if all files have the prefix). But this added a new error condition, which rocksdb code was really not equipped to deal with: `InternalIterator::value()` may fail with an IO error or Status::Incomplete, but it's just a method returning a Slice, with no way to report error instead. Before this PR, this type of error wasn't handled at all (an empty slice was returned), and kBinarySearchWithFirstKey implementation was considered a prototype. Now that we (LogDevice) have experimented with kBinarySearchWithFirstKey for a while and confirmed that it's really useful, this PR is adding the missing error handling. It's a pretty inconvenient situation implementation-wise. The error needs to be reported from InternalIterator when trying to access value. But there are ~700 call sites of `InternalIterator::value()`, most of which either can't hit the error condition (because the iterator is reading from memtable or from index or something) or wouldn't benefit from the deferred loading of the value (e.g. compaction iterator that reads all values anyway). Adding error handling to all these call sites would needlessly bloat the code. So instead I made the deferred value loading optional: only the call sites that may use deferred loading have to call the new method `PrepareValue()` before calling `value()`. The feature is enabled with a new bool argument `allow_unprepared_value` to a bunch of methods that create iterators (it wouldn't make sense to put it in ReadOptions because it's completely internal to iterators, with virtually no user-visible effect). Lmk if you have better ideas. Note that the deferred value loading only happens for internal iterators. The user-visible iterator (DBIter) always prepares the value before returning from Seek/Next/etc. We could go further and add an API to defer that value loading too, but that's most likely not useful for LogDevice, so it doesn't seem worth the complexity for now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6621 Test Plan: make -j5 check . Will also deploy to some logdevice test clusters and look at stats. Reviewed By: siying Differential Revision: D20786930 Pulled By: al13n321 fbshipit-source-id: 6da77d918bad3780522e918f17f4d5513d3e99ee	2020-04-15 17:40:44 -07:00
Zhichao Cao	38dfa406ff	Add NewFileChecksumGenCrc32cFactory to file checksum (#6688 ) Summary: Add NewFileChecksumGenCrc32cFactory to file checksum public interface such that applications can use the built-in CRC32c checksum factory. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6688 Test Plan: pass make asan_check Reviewed By: riversand963 Differential Revision: D21006859 Pulled By: zhichao-cao fbshipit-source-id: ea8a45196a8b77c310728ab05f6cc0f49f3baef0	2020-04-13 19:13:41 -07:00
Andrew Kryczka	9eca6d651d	fix comparison count for format_version=3 indexes (#6650 ) Summary: In index blocks since `format_version=3`, user keys are written rather than internal keys. When reading such blocks, the comparator is obtained via `InternalKeyComparator::user_comparator()`. That function must not return an unwrapped result as the wrapper class provides accounting logic to populate `PerfContext::user_key_comparison_count`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6650 Test Plan: ran db_bench and verified `PerfContext::user_key_comparison_count` became larger. Reviewed By: cheng-chang Differential Revision: D20866325 Pulled By: ajkr fbshipit-source-id: ad755d46bda31157dacc5b66e532279f19ad538c	2020-04-13 11:18:37 -07:00
sdong	1be3be5522	Auto-Format two recent diffs and add HISTORY.md (#6685 ) Summary: Two recent diffs can be autoformatted. Also add HISTORY.md entry for https://github.com/facebook/rocksdb/pull/6214 Pull Request resolved: https://github.com/facebook/rocksdb/pull/6685 Test Plan: Run all existing tests Reviewed By: cheng-chang Differential Revision: D20965780 fbshipit-source-id: 195b08d7849513d42fe14073112cd19fdda6af95	2020-04-10 11:32:44 -07:00
Cheng Chang	6e6f807917	Add two more optimization improvements to HISTORY (#6679 ) Summary: Although these optimizations are not user facing, still feel it's valuable to call out in HISTORY. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6679 Test Plan: no need Reviewed By: zhichao-cao Differential Revision: D20945916 Pulled By: cheng-chang fbshipit-source-id: f3e790c07f3bcc4a8a74246c4fa232800ddd4438	2020-04-09 11:19:51 -07:00
Yi Wu	eb287c72d7	Fix wrong key being read on ingested file with global seqno and delta encoding (#6669 ) Summary: On reading an ingested SST file, `DataBlockIter` will replace the seqno encoded in a key with the global seqno. However, if the original seqno was part of the prefix used for the next key, the global seqno is mistakenly used as part of the prefix to construct the next key, causing a wrong result to be returned. Although at this point it is only a software error and the data in the file is not corrupted, the issue can further cause compaction output to be out of order and corrupted when the ingested SST participates in compaction. Fix the issue by saving the actual seqno and restoring it before the key is used as a prefix to construct the next key. The unit test is by Little-Wallace from https://github.com/facebook/rocksdb/issues/6666. Fixes https://github.com/facebook/rocksdb/issues/6666. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6669 Test Plan: New unit test Signed-off-by: Yi Wu <yiwu@pingcap.com> Reviewed By: cheng-chang Differential Revision: D20931808 Pulled By: ajkr fbshipit-source-id: f01959c35d6a493954dca981663766c7a5a9e8ab	2020-04-08 21:22:15 -07:00
sdong	94f90ac6bc	compression related options are not copied back from MutableCFOptions to CFOptions (#6668 ) Summary: https://github.com/facebook/rocksdb/pull/6615 made several compression related options dynamically changeable. They are moved to MutableCFOptions. However, they are not copied back to ColumnFamilyOptions, so the changed values are not written to option files or reflected in other uses. Fix it by copying them back. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6668 Test Plan: Add a unit test to make sure that when a MutableCFOptions is converted to CFOptions and back to MutableCFOptions, they stay the same. This test would fail without the fix. Reviewed By: ajkr Differential Revision: D20923999 fbshipit-source-id: c3bccd6923b00d677764e2269bed6a95ad7ed780	2020-04-08 14:40:46 -07:00
Zhichao Cao	278911a2d9	Remove redundant description in HISTORY (#6627 ) Summary: Remove a redundant description in HISTORY; no code change. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6627 Test Plan: make check Reviewed By: anand1976 Differential Revision: D20797269 Pulled By: zhichao-cao fbshipit-source-id: dee4c9a22f6d241c985f250c0f11bfaa9198f4c1	2020-04-02 12:12:05 -07:00
Ziyue Yang	03a781a90c	Add pipelined & parallel compression optimization (#6262 ) Summary: This PR adds support for pipelined & parallel compression optimization for `BlockBasedTableBuilder`. This optimization makes block building, block compression and block appending a pipeline, and uses multiple threads to accelerate block compression. Users can set `CompressionOptions::parallel_threads` greater than 1 to enable compression parallelism. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6262 Reviewed By: ajkr Differential Revision: D20651306 fbshipit-source-id: 62125590a9c15b6d9071def9dc72589c1696a4cb	2020-04-01 16:40:18 -07:00
sdong	57096ab13e	Fix a bug that crashes the service when the write buffer manager fails to insert into the block cache (#6619 ) Summary: https://github.com/facebook/rocksdb/issues/6247 reports that when the write buffer manager fails to insert the dummy entry into the block cache, a null pointer is still stored and later used to release the handle, causing corruption. Fix the bug by not releasing a null handle. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6619 Test Plan: Add a unit test that fails without the fix. Reviewed By: ajkr Differential Revision: D20776769 fbshipit-source-id: 4127fbd9f295a0a3e45774746ffcd91f939f6287	2020-04-01 11:27:40 -07:00
Levi Tamasi	e6f86cfb36	Revert the recent cache deleter change (#6620 ) Summary: Revert "Use function objects as deleters in the block cache (https://github.com/facebook/rocksdb/issues/6545)" This reverts commit `6301dbe7a7`. Revert "Call out the cache deleter related interface change in HISTORY.md (https://github.com/facebook/rocksdb/issues/6606)" This reverts commit `3a35542f86`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6620 Test Plan: `make check` Reviewed By: zhichao-cao Differential Revision: D20773311 Pulled By: ltamasi fbshipit-source-id: 7637a761f718f323ef0e7da959462e8fb06e7a2b	2020-03-31 16:11:06 -07:00
sdong	80979f81c7	Make options.bottommost_compression, compression_opts and bottommost_compression_opts dynamically changeable (#6615 ) Summary: These three options should be dynamically changeable, so add them to MutableCFOptions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6615 Test Plan: Add a unit test to make sure that SetOptions() can change the options. Reviewed By: riversand963 Differential Revision: D20755951 fbshipit-source-id: 8165f4fd7a7a665cc7fb049698935022a5d2e7ff	2020-03-31 12:11:42 -07:00
Yanqin Jin	18cf0de640	Use flush time for the props.creation_time for FIFO compaction (#6612 ) Summary: For FIFO compaction, use the flush time instead of the oldest key time as the creation time. This prevents FIFO compaction from dropping files whose oldest key is older than the TTL but which also contain keys newer than the TTL. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6612 Test Plan: make check Reviewed By: siying Differential Revision: D20748217 Pulled By: riversand963 fbshipit-source-id: 3f7b00a847020760537cdddd12f6fe039e5bc663	2020-03-30 18:59:17 -07:00
Zhichao Cao	eaf95c7d1a	Update release version to 6.9.0 (#6610 ) Summary: Update release version to 6.9.0 Pull Request resolved: https://github.com/facebook/rocksdb/pull/6610 Test Plan: no code change Reviewed By: riversand963 Differential Revision: D20741094 Pulled By: zhichao-cao fbshipit-source-id: 80a9e9ea8d164b6923112352d36fcbc1be85c034	2020-03-30 16:31:02 -07:00
Zhichao Cao	e8d332d97e	Use FileChecksumGenFactory for SST file checksum (#6600 ) Summary: Previously, the SST file checksum was calculated by a shared checksum function object, which can make some checksum functions, such as SHA1, hard to apply. With this change, each SST file has its own checksum generator object, created by FileChecksumGenFactory. Users need to implement their own FileChecksumGenerator and factory to plug in their checksum calculation method. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6600 Test Plan: tested with make asan_check Reviewed By: riversand963 Differential Revision: D20717670 Pulled By: zhichao-cao fbshipit-source-id: 2a74c1c280ac11a07a1980185b43b671acaa71c6	2020-03-29 15:58:46 -07:00
Cheng Chang	ee50b8d499	Allow decreasing background threads' CPU priority when creating a database backup (#6602 ) Summary: When creating a database backup, the background threads not only consume IO resources by copying files but also consume CPU, for example by computing checksums. During peak times, the CPU consumption of the background threads might affect online queries. This PR makes it possible to decrease the CPU priority of these threads when creating a new backup. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6602 Test Plan: make check Reviewed By: siying, zhichao-cao Differential Revision: D20683216 Pulled By: cheng-chang fbshipit-source-id: 9978b9ed9488e8ce135e90ca083e5b4b7221fd84	2020-03-28 19:07:25 -07:00
Levi Tamasi	3a35542f86	Call out the cache deleter related interface change in HISTORY.md (#6606 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6606 Reviewed By: riversand963 Differential Revision: D20708411 Pulled By: ltamasi fbshipit-source-id: c15b4ded19a4b5c84e3e4240bdcec15460806c88	2020-03-27 16:18:23 -07:00
Peter Dillinger	93b80ca7ba	Update default BBTO::format_version from 2 to 4 (#6582 ) Summary: Version 4 has been around long enough, for compatibility and extensive validation, that it should be the default. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6582 Test Plan: CI (w.r.t. changing the default; format_version=4 is well tested and massively in production at Facebook) Reviewed By: siying Differential Revision: D20625233 Pulled By: pdillinger fbshipit-source-id: 2f83ed874cffa4a39bc7a66cdf3833b978fbb948	2020-03-24 21:22:21 -07:00
sdong	921cdd37e2	Fix bug where the number of table loading threads is set to a boolean (#6576 ) Summary: When applying a new version outside of DB open, optimize_filters_for_hits is passed as max_threads, which is clearly a bug. It is not clear what the intended value was in the first place, but a value of 1 makes sense here, since it would create no extra threads. This bug is not expected to cause user-visible problems, since C++ implicitly converts a bool to 0 or 1. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6576 Test Plan: Run all existing tests. Reviewed By: ajkr Differential Revision: D20602467 fbshipit-source-id: 40b2cd8619aba09ae9242b36c415464db3c9b737	2020-03-24 10:17:40 -07:00
Yanqin Jin	fb09ef05dc	Attempt to recover from db with missing table files (#6334 ) Summary: There are situations where RocksDB tries to recover but the DB is in an inconsistent state because SST files referenced in the MANIFEST are missing. In this case, previous versions of RocksDB simply fail the recovery and return a non-ok status. This PR enables another possibility. During recovery, RocksDB checks possible MANIFEST files and tries to recover to the most recent state that has no missing table files. `VersionSet::Recover()` applies version edits incrementally and "materializes" a version only when that version does not reference any missing table file. After the entire MANIFEST is processed, the last version created becomes the latest version. `DBImpl::Recover()` calls `VersionSet::Recover()`. Afterwards, WAL replay is not performed. To use this capability, set `options.best_efforts_recovery = true` when opening the DB. Best-efforts recovery is currently incompatible with atomic flush. Test plan (on devserver): `make check`; `COMPILE_WITH_ASAN=1 make all && make check` Pull Request resolved: https://github.com/facebook/rocksdb/pull/6334 Reviewed By: anand1976 Differential Revision: D19778960 Pulled By: riversand963 fbshipit-source-id: c27ea80f29bc952e7d3311ecf5ee9c54393b40a8	2020-03-20 19:30:48 -07:00
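The seqno-restore fix in eb287c72d7 can be illustrated with a toy model. This is a simplified, hypothetical sketch, not RocksDB's `DataBlockIter`: keys are byte strings ending in an 8-byte little-endian trailer packing `(seqno << 8) | type`, each block entry stores only the bytes that differ from the previous key, and the reader overwrites every trailer with a global seqno. `Entry`, `WithTrailer`, `UserKey`, `Encode` and `Decode` are illustrative names. With `restore = false` the overwritten trailer leaks into the next key's shared prefix; with `restore = true` the original trailer is put back first.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One delta-encoded entry: reuse the first `shared` bytes of the
// previously decoded key, then append `suffix`.
struct Entry {
  std::size_t shared;
  std::string suffix;
};

// Hypothetical helper: append an 8-byte little-endian trailer packing
// (seqno << 8 | type), loosely modeled on an internal-key footer.
std::string WithTrailer(const std::string& user_key, uint64_t seqno,
                        uint8_t type) {
  uint64_t packed = (seqno << 8) | type;
  std::string key = user_key;
  for (int i = 0; i < 8; ++i) {
    key.push_back(static_cast<char>((packed >> (8 * i)) & 0xff));
  }
  return key;
}

// Strip the 8-byte trailer.
std::string UserKey(const std::string& key) {
  return key.substr(0, key.size() - 8);
}

// Delta-encode sorted keys against their predecessors.
std::vector<Entry> Encode(const std::vector<std::string>& keys) {
  std::vector<Entry> out;
  std::string prev;
  for (const auto& k : keys) {
    std::size_t shared = 0;
    while (shared < prev.size() && shared < k.size() &&
           prev[shared] == k[shared]) {
      ++shared;
    }
    out.push_back({shared, k.substr(shared)});
    prev = k;
  }
  return out;
}

// Decode entries, presenting every key with `global_seqno`.
// restore == false reproduces the bug: the rewritten key is reused as the
// prefix source for the next entry. restore == true applies the fix: the
// original trailer is saved and put back before the next prefix is taken.
std::vector<std::string> Decode(const std::vector<Entry>& entries,
                                uint64_t global_seqno, bool restore) {
  std::vector<std::string> result;
  std::string key;
  std::string saved_trailer;
  for (const auto& e : entries) {
    if (restore && !saved_trailer.empty()) {
      key.replace(key.size() - 8, 8, saved_trailer);
    }
    key = key.substr(0, e.shared) + e.suffix;
    saved_trailer = key.substr(key.size() - 8);
    // The low byte of the little-endian trailer is the type.
    uint8_t type = static_cast<uint8_t>(key[key.size() - 8]);
    key = WithTrailer(UserKey(key), global_seqno, type);
    result.push_back(key);
  }
  return result;
}
```

For example, with user keys `"ab"` and `"ab\x01\x00"`, both written with seqno 0, the second key shares four bytes with the first, two of which belong to the first key's trailer. Decoding with a nonzero global seqno and `restore = false` therefore yields a corrupted second user key, while `restore = true` yields the correct one.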

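The best-efforts recovery described in fb09ef05dc can be sketched as replaying version edits and remembering the last file set that is fully present on disk. This is a hypothetical simplification, not `VersionSet::Recover()`: the `VersionEdit` struct here carries only added and deleted table file numbers, and `BestEffortsRecover` is an illustrative name. Levels, MANIFEST selection and WAL replay (which best-efforts recovery skips anyway) are left out.

```cpp
#include <cstdint>
#include <optional>
#include <set>
#include <vector>

// Hypothetical, simplified version edit: table file numbers added/deleted.
struct VersionEdit {
  std::vector<uint64_t> added;
  std::vector<uint64_t> deleted;
};

// Replay edits in order. After each edit, "materialize" the current file
// set only if every file it references exists on disk. The last
// materialized set is the recovered state; std::nullopt means no edit
// produced a complete version.
std::optional<std::set<uint64_t>> BestEffortsRecover(
    const std::vector<VersionEdit>& edits,
    const std::set<uint64_t>& files_on_disk) {
  std::set<uint64_t> live;
  std::optional<std::set<uint64_t>> last_good;
  for (const auto& edit : edits) {
    for (uint64_t f : edit.deleted) live.erase(f);
    for (uint64_t f : edit.added) live.insert(f);
    bool complete = true;
    for (uint64_t f : live) {
      if (files_on_disk.count(f) == 0) {
        complete = false;
        break;
      }
    }
    if (complete) last_good = live;
  }
  return last_good;
}
```

With edits {add 1, 2}, {add 3}, {delete 1, add 4} and only files 1, 2 and 3 on disk, the sketch returns {1, 2, 3}: the third edit references the missing file 4, so recovery settles on the last complete state instead of failing outright.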
740 Commits