rocksdb

Author	SHA1	Message	Date
Peter Dillinger	df4d3cf6fd	Update version to 6.28.2 Summary: Update version to 6.28.2 for bug fix	2022-01-31 15:24:29 -08:00
Jay Zhuang	70a68ddc06	Update circleci xcode version (#9405 ) Summary: xcode 11.3.1 is deprecated https://circleci.com/docs/2.0/testing-ios/ , jobs are failing: ``` failed to create host: Image xcode:11.3.0 is not supported ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/9405 Test Plan: CI Reviewed By: ajkr, hx235 Differential Revision: D33674462 Pulled By: jay-zhuang fbshipit-source-id: 85dd27aad84d26eaaa5c5375015344182b2c50b9	2022-01-31 15:24:29 -08:00
Peter Dillinger	c79005a6b6	Pick in install-jdk8-on-macos from `5602b1d3d9`	2022-01-31 15:24:28 -08:00
Peter Dillinger	0500c49f62	Fix^2 prefix extractor testing in crash test (#9463 ) Summary: Even after https://github.com/facebook/rocksdb/issues/9461 could see ``` Error: please specify prefix_size for test_batches_snapshots test! ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/9463 Test Plan: run `make blackbox_crashtest` for a long time. (Unfortunately, it's taking a long time to reproduce these failures) Reviewed By: akankshamahajan15 Differential Revision: D33838152 Pulled By: pdillinger fbshipit-source-id: b9a73c5bbb68df53f14c22b9b52f61d1f7ef38af	2022-01-31 10:12:33 -08:00
Peter Dillinger	0cf2c6aa3b	Fix/expand prefix extractor testing in crash test (#9461 ) Summary: Changes in https://github.com/facebook/rocksdb/issues/9453 could trigger ``` stderr: Error: prefixpercent is non-zero while prefix_size is not positive! ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/9461 Test Plan: run `make blackbox_crashtest` for a long time Reviewed By: ajkr Differential Revision: D33830751 Pulled By: pdillinger fbshipit-source-id: be88377dcaa47e4bb7adb0347762639eff8f1476	2022-01-31 10:12:33 -08:00
Peter Dillinger	b0e248e2a4	Fix major bug with MultiGet, DeleteRange, and memtable Bloom (#9453 ) Summary: MemTable::MultiGet was not considering range tombstones before querying Bloom filter. This means range tombstones would be skipped for keys (or prefixes) with no other entries in the memtable. This could cause old values for a key (in SST files) to still show up until the range tombstone covering it has been flushed. This is fixed by essentially disabling the memtable Bloom filter when there are any range tombstones. (This could be better optimized in the future, but good enough for now.) Did some other cleanup/optimization in the same code to (more than) offset the cost of checking on range tombstones in more cases. There is now notable improvement when memtable_whole_key_filtering and prefix_extractor are used together (unusual), and this makes MultiGet closer to the Get implementation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9453 Test Plan: new unit test added. Added memtable Bloom to crash test. Performance testing -------------------- Build WAL-only DB (recovers to memtable): ``` TEST_TMPDIR=/dev/shm/rocksdb ./db_bench -benchmarks=fillrandom -num=1000000 -write_buffer_size=250000000 ``` Query test command, to maximize sensitivity to the changed code: ``` TEST_TMPDIR=/dev/shm/rocksdb ./db_bench -use_existing_db -readonly -benchmarks=multireadrandom -num=10000000 -write_buffer_size=250000000 -memtable_bloom_size_ratio=0.015 -multiread_batched -batch_size=24 -threads=8 -memtable_whole_key_filtering=$MWKF -prefix_size=$PXS ``` (Note -num here is 10x larger for mostly memtable misses) Before & after run simultaneously, average over 10 iterations per data point, ops/sec. 
MWKF=0 PXS=0 (Bloom disabled) Before: 5724844 After: 6722066 MWKF=0 PXS=7 (prefixes hardly unique; Bloom not useful) Before: 9981319 After: 10237990 MWKF=0 PXS=8 (prefixes unique; Bloom useful) Before: 12081715 After: 12117603 MWKF=1 PXS=0 (whole key Bloom useful) Before: 11944354 After: 12096085 MWKF=1 PXS=7 (whole key Bloom useful in new version; prefixes not useful in old version) Before: 9444299 After: 11826029 MWKF=1 PXS=8 (whole key Bloom useful in new version; prefixes useful in old version) Before: 11784465 After: 11778591 Only in this last case is the 'before' slightly faster, perhaps because hashing prefixes is slightly faster than hashing whole keys. Otherwise, 'after' is faster. Reviewed By: ajkr Differential Revision: D33805025 Pulled By: pdillinger fbshipit-source-id: 597523cae4f4eafdf6ae6bb2bc6cb46f83b017bf	2022-01-31 10:12:30 -08:00
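The ordering problem described in the commit above (#9453) can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyMemTable`, its members, and the exact `std::set` standing in for the Bloom filter are all hypothetical and are not RocksDB code. It mirrors the described fix by skipping the filter shortcut whenever any range tombstone exists.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a memtable. Every name here is hypothetical.
class ToyMemTable {
 public:
  enum class Result { kFound, kDeleted, kNotInMemTable };

  void Put(const std::string& key, const std::string& value) {
    filter_.insert(key);  // an exact set stands in for the Bloom filter
    entries_[key] = Entry{value, ++seq_};
  }

  // Range tombstone covering [begin, end).
  void DeleteRange(const std::string& begin, const std::string& end) {
    tombstones_.push_back(Tombstone{begin, end, ++seq_});
  }

  Result Get(const std::string& key, std::string* value) const {
    // The filter may only short-circuit the lookup when no range tombstone
    // exists; otherwise a key with no point entry would skip its tombstone.
    if (tombstones_.empty() && filter_.count(key) == 0) {
      return Result::kNotInMemTable;
    }
    // Find the newest tombstone covering the key (0 means none).
    unsigned long long covering_seq = 0;
    for (const Tombstone& t : tombstones_) {
      if (key >= t.begin && key < t.end && t.seq > covering_seq) {
        covering_seq = t.seq;
      }
    }
    // A point entry wins only if it is newer than the covering tombstone.
    auto it = entries_.find(key);
    if (it != entries_.end() && it->second.seq > covering_seq) {
      *value = it->second.value;
      return Result::kFound;
    }
    return covering_seq > 0 ? Result::kDeleted : Result::kNotInMemTable;
  }

 private:
  struct Entry {
    std::string value;
    unsigned long long seq;
  };
  struct Tombstone {
    std::string begin;
    std::string end;
    unsigned long long seq;
  };

  unsigned long long seq_ = 0;
  std::set<std::string> filter_;
  std::map<std::string, Entry> entries_;
  std::vector<Tombstone> tombstones_;
};
```

In this model, a key that has no point entry but falls inside a deleted range reports `kDeleted`, so a caller would not fall through to older data in SST files.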
Yanqin Jin	fdb3125547	Update HISTORY and bump version	2022-01-10 11:31:18 -08:00
Yanqin Jin	128e36ca53	Make RocksDB codebase compatible with newer compilers like clang-12 (#9370 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9370 GCC and newer clang, e.g. clang-12 treat `std::unique_ptr` slightly differently. For the following code ``` #include <iostream> #include <memory> #include <type_traits> struct A { std::unique_ptr<int> m1; }; int main() { std::cout << std::boolalpha; std::cout << std::is_standard_layout<A>::value << '\n'; return 0; } ``` GCC11(C++20) (tested on https://en.cppreference.com/w/cpp/types/is_standard_layout) will print "true", while newer clang, e.g. clang-12 will print "false". This breaks the usage of `offsetof()` on structs with non-static members of type `std::unique_ptr`. Fixing this by replacing the builtin `offsetof` with a trick documented at https://gist.github.com/graphitemaster/494f21190bb2c63c5516. Reviewed By: jay-zhuang Differential Revision: D33420840 fbshipit-source-id: 02bde281dfa28809bec787ad0f7019e85dd9c607	2022-01-10 11:27:59 -08:00
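To show what the commit above (#9370) is working around, here is a minimal sketch of one way to obtain a member offset without the builtin `offsetof`. It is not the trick from the linked gist and not the code that landed in RocksDB. `MemberOffset` is a hypothetical helper that measures the offset on a live, default-constructed object.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>

// Same shape as the struct in the commit message, plus a leading int.
struct A {
  int tag = 0;
  std::unique_ptr<int> m1;
};

// Hypothetical helper: byte offset of `member` within T, measured on a
// temporary object instead of relying on offsetof().
template <typename T, typename M>
std::size_t MemberOffset(M T::*member) {
  static_assert(std::is_default_constructible<T>::value,
                "this sketch needs a default-constructible T");
  T obj{};
  const char* base = reinterpret_cast<const char*>(std::addressof(obj));
  const char* field =
      reinterpret_cast<const char*>(std::addressof(obj.*member));
  return static_cast<std::size_t>(field - base);
}
```

Unlike `offsetof`, this helper is evaluated at run time and constructs an object, so it only illustrates the idea and is not a drop-in replacement.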
Akanksha Mahajan	7bfad07194	Update to version 6.28 (#9312 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9312 Reviewed By: ajkr Differential Revision: D33196324 Pulled By: akankshamahajan15 fbshipit-source-id: 471da75eaedc54d3151672adc28643bc1d6fdf23	2021-12-17 16:20:39 -08:00
Peter Dillinger	0d9b256813	Fix unity build with SUPPORT_CLOCK_CACHE (#9309 ) Summary: After https://github.com/facebook/rocksdb/issues/9126 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9309 Test Plan: CI Reviewed By: ajkr Differential Revision: D33188902 Pulled By: pdillinger fbshipit-source-id: 54bf34e33c2b30b1b8dc2a0229e84c194321b606	2021-12-17 14:15:07 -08:00
Yanqin Jin	6b5e28a43c	Update TARGETS and related scripts (#9310 ) Summary: As title. Remove 'unexported_deps_by_default', replace 'deps' and 'external_deps' with 'exported_deps' and 'exported_external_deps' respectively. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9310 Test Plan: Github action and internal jobs. Reviewed By: DrMarcII Differential Revision: D33190092 Pulled By: riversand963 fbshipit-source-id: 64200e5331d822f88f8d122a55b7a29bfd1f9553	2021-12-17 11:51:51 -08:00
mrambacher	423538a816	Make MemoryAllocator into a Customizable class (#8980 ) Summary: - Make MemoryAllocator and its implementations into a Customizable class. - Added a "DefaultMemoryAllocator" which uses new and delete - Added a "CountedMemoryAllocator" that counts the number of allocs and free - Updated the existing tests to use these new allocators - Changed the memkind allocator test into a generic test that can test the various allocators. - Added tests for creating all of the allocators - Added tests to verify/create the JemallocNodumpAllocator using its options. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8980 Reviewed By: zhichao-cao Differential Revision: D32990403 Pulled By: mrambacher fbshipit-source-id: 6fdfe8218c10dd8dfef34344a08201be1fa95c76	2021-12-17 04:20:47 -08:00
Jermy Li	9828b6d5fd	fix java doc issues (#9253 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9253 Reviewed By: jay-zhuang Differential Revision: D32990516 Pulled By: mrambacher fbshipit-source-id: c7cdb6562ac6871bca6ea0d9efa454f3a902a137	2021-12-16 21:04:41 -08:00
Peter Dillinger	0050a73a4f	New stable, fixed-length cache keys (#9126 ) Summary: This change standardizes on a new 16-byte cache key format for block cache (incl compressed and secondary) and persistent cache (but not table cache and row cache). The goal is a really fast cache key with practically ideal stability and uniqueness properties without external dependencies (e.g. from FileSystem). A fixed key size of 16 bytes should enable future optimizations to the concurrent hash table for block cache, which is a heavy CPU user / bottleneck, but there appears to be measurable performance improvement even with no changes to LRUCache. This change replaces a lot of disjointed and ugly code handling cache keys with calls to a simple, clean new internal API (cache_key.h). (Preserving the old cache key logic under an option would be very ugly and likely negate the performance gain of the new approach. Complete replacement carries some inherent risk, but I think that's acceptable with sufficient analysis and testing.) The scheme for encoding new cache keys is complicated but explained in cache_key.cc. Also: EndianSwapValue is moved to math.h to be next to other bit operations. (Explains some new include "math.h".) ReverseBits operation added and unit tests added to hash_test for both. Fixes https://github.com/facebook/rocksdb/issues/7405 (presuming a root cause) Pull Request resolved: https://github.com/facebook/rocksdb/pull/9126 Test Plan: ### Basic correctness Several tests needed updates to work with the new functionality, mostly because we are no longer relying on filesystem for stable cache keys so table builders & readers need more context info to agree on cache keys. This functionality is so core, a huge number of existing tests exercise the cache key functionality. 
### Performance Create db with `TEST_TMPDIR=/dev/shm ./db_bench -bloom_bits=10 -benchmarks=fillrandom -num=3000000 -partition_index_and_filters` And test performance with `TEST_TMPDIR=/dev/shm ./db_bench -readonly -use_existing_db -bloom_bits=10 -benchmarks=readrandom -num=3000000 -duration=30 -cache_index_and_filter_blocks -cache_size=250000 -threads=4` using DEBUG_LEVEL=0 and simultaneous before & after runs. Before ops/sec, avg over 100 runs: 121924 After ops/sec, avg over 100 runs: 125385 (+2.8%) ### Collision probability I have built a tool, ./cache_bench -stress_cache_key to broadly simulate host-wide cache activity over many months, by making some pessimistic simplifying assumptions: * Every generated file has a cache entry for every byte offset in the file (contiguous range of cache keys) * All of every file is cached for its entire lifetime We use a simple table with skewed address assignment and replacement on address collision to simulate files coming & going, with quite a variance (super-Poisson) in ages. Some output with `./cache_bench -stress_cache_key -sck_keep_bits=40`: ``` Total cache or DBs size: 32TiB Writing 925.926 MiB/s or 76.2939TiB/day Multiply by 9.22337e+18 to correct for simulation losses (but still assume whole file cached) ``` These come from default settings of 2.5M files per day of 32 MB each, and `-sck_keep_bits=40` means that to represent a single file, we are only keeping 40 bits of the 128-bit cache key. With file size of 2^25 contiguous keys (pessimistic), our simulation is about 2^(128-40-25) or about 9 billion billion times more prone to collision than reality. 
More default assumptions, relatively pessimistic: * 100 DBs in same process (doesn't matter much) * Re-open DB in same process (new session ID related to old session ID) on average every 100 files generated * Restart process (all new session IDs unrelated to old) 24 times per day After enough data, we get a result at the end: ``` (keep 40 bits) 17 collisions after 2 x 90 days, est 10.5882 days between (9.76592e+19 corrected) ``` If we believe the (pessimistic) simulation and the mathematical generalization, we would need to run a billion machines all for 97 billion days to expect a cache key collision. To help verify that our generalization ("corrected") is robust, we can make our simulation more precise with `-sck_keep_bits=41` and `42`, which takes more running time to get enough data: ``` (keep 41 bits) 16 collisions after 4 x 90 days, est 22.5 days between (1.03763e+20 corrected) (keep 42 bits) 19 collisions after 10 x 90 days, est 47.3684 days between (1.09224e+20 corrected) ``` The generalized prediction still holds. With the `-sck_randomize` option, we can see that we are beating "random" cache keys (except offsets still non-randomized) by a modest amount (roughly 20x less collision prone than random), which should make us reasonably comfortable even in "degenerate" cases: ``` 197 collisions after 1 x 90 days, est 0.456853 days between (4.21372e+18 corrected) ``` I've run other tests to validate other conditions behave as expected, never behaving "worse than random" unless we start chopping off structured data. Reviewed By: zhichao-cao Differential Revision: D33171746 Pulled By: pdillinger fbshipit-source-id: f16a57e369ed37be5e7e33525ace848d0537c88f	2021-12-16 17:15:13 -08:00
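The "corrected" figures quoted in the commit above (#9126) follow from plain arithmetic: the simulated days between collisions multiplied by 2^(128 - keep_bits - 25). The sketch below reproduces that calculation. Both function names are hypothetical and are not part of `cache_bench`; the defaults of 25 file-offset bits and 128 key bits are taken from the commit message.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: average simulated days between collisions, e.g.
// 17 collisions after 2 x 90 days gives about 10.5882 days.
double EstimatedDaysBetween(int collisions, int periods, int days_per_period) {
  assert(collisions > 0);
  return static_cast<double>(periods) * days_per_period / collisions;
}

// Hypothetical helper: scale the simulated estimate by the number of key
// bits the simulation discarded, 2^(key_bits - keep_bits - file_offset_bits).
double CorrectedDaysBetween(double simulated_days, int keep_bits,
                            int file_offset_bits = 25, int key_bits = 128) {
  const int lost_bits = key_bits - keep_bits - file_offset_bits;
  assert(lost_bits >= 0);
  return std::ldexp(simulated_days, lost_bits);  // simulated_days * 2^lost_bits
}
```

For the 40-bit run this is (2 x 90 / 17) x 2^63, which gives about 9.77e+19 days and agrees with the `9.76592e+19 corrected` figure in the quoted output. The 41-bit and 42-bit runs use 2^62 and 2^61 respectively.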
Andrea Cavalli	9918e1ee5a	Set KeyMayExist fields visibility to public (#9285 ) Summary: Fixes https://github.com/facebook/rocksdb/issues/9284 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9285 Reviewed By: pdillinger Differential Revision: D33062006 Pulled By: mrambacher fbshipit-source-id: c3471c2db717fa5bc2337cf996ce744af0ed877d	2021-12-16 10:59:05 -08:00
Andrew Kryczka	5383f1eec4	Verify recovery correctness in multi-CF blackbox crash test (#9303 ) Summary: db_crashtest.py uses multiple CFs only when run without flag `--simple`. The previous config set `-test_batches_snapshots=1` in that case for blackbox mode. But `-test_batches_snapshots=1` cannot verify recovery correctness, so it should not always be set for multi-CF blackbox tests. We can instead randomly toggle it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9303 Reviewed By: riversand963 Differential Revision: D33155229 Pulled By: ajkr fbshipit-source-id: 4a6fdc4eddccc8ece664063baf6393ce1c5de6b7	2021-12-16 09:05:40 -08:00
Alan Paxton	c1ec0b28eb	java / jni io_uring support (#9224 ) Summary: Existing multiGet() in java calls multi_get_helper() which then calls DB::std::vector MultiGet(). This doesn't take advantage of io_uring. This change adds another JNI level method that runs a parallel code path using the DB::void MultiGet(), using ByteBuffers at the JNI level. We call it multiGetDirect(). In addition to using the io_uring path, this code internally returns pinned slices which we can copy out of into our direct byte buffers; this should reduce the overall number of copies in the code path to/from Java. Some jmh benchmark runs (100k keys, 1000 key multiGet) suggest that for value sizes > 1k, we see about a 20% performance improvement. Performance is slightly reduced for small value sizes, because there is a little more overhead in the JNI methods. Closes https://github.com/facebook/rocksdb/issues/8407 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9224 Reviewed By: mrambacher Differential Revision: D32951754 Pulled By: jay-zhuang fbshipit-source-id: 1f70df7334be2b6c42a9c8f92725f67c71631690	2021-12-15 18:09:25 -08:00
Radek Hubner	7ac3a5d406	ReadOptions - Add missing java API. (#9248 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9248 Reviewed By: mrambacher Differential Revision: D33011237 Pulled By: jay-zhuang fbshipit-source-id: b6544ad40cb722e327bac60a0af711db253e36d7	2021-12-15 17:46:05 -08:00
Akanksha Mahajan	96d0773a11	Update prepopulate_block_cache logic to support block-based filter (#9300 ) Summary: Update prepopulate_block_cache logic to support block-based filter during insertion in block cache Pull Request resolved: https://github.com/facebook/rocksdb/pull/9300 Test Plan: CircleCI tests, make crash_test -j64 Reviewed By: pdillinger Differential Revision: D33132018 Pulled By: akankshamahajan15 fbshipit-source-id: 241deabab8645bda704728e572d6de6354df18b2	2021-12-15 13:20:27 -08:00
Andrew Kryczka	c9818b3325	db_stress verify with lost unsynced operations (#8966 ) Summary: When a previous run left behind historical state/trace files (implying it was run with --sync_fault_injection set), this PR uses them to restore the expected state according to the DB's recovered sequence number. That way, a tail of latest unsynced operations are permitted to be dropped, as is the case when data in page cache or certain `Env`s is lost. The point of the verification in this scenario is just to ensure there is no hole in the recovered data. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8966 Test Plan: - ran it a while, made sure it is restoring expected values using the historical state/trace files: ``` $ rm -rf ./tmp-db/ ./exp/ && mkdir -p ./tmp-db/ ./exp/ && while ./db_stress -compression_type=none -clear_column_family_one_in=0 -expected_values_dir=./exp -sync_fault_injection=1 -destroy_db_initially=0 -db=./tmp-db -max_key=1000000 -ops_per_thread=10000 -reopen=0 -threads=32 ; do : ; done ``` Reviewed By: pdillinger Differential Revision: D31219445 Pulled By: ajkr fbshipit-source-id: f0e1d51fe5b35465b00565c33331190ea38ba0ad	2021-12-15 12:54:44 -08:00
sdong	806d8916da	SimulatedHybridFileSystem to simulate HDD behavior more accurately (#9259 ) Summary: SimulatedHybridFileSystem now performs a more thorough simulation of an HDD: 1. cover writes too, not just reads 2. Latency and throughput are now simulated as seek + read time, using a rate limiter This implementation can be modified to simulate full HDD behavior, which is not yet done. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9259 Test Plan: Run db_bench and observe the desired behavior. Reviewed By: jay-zhuang Differential Revision: D32903039 fbshipit-source-id: a83f5d72143e114d5e75edf39d647bf0b71978e1	2021-12-14 20:07:57 -08:00
Yanqin Jin	e05c2bb549	Stress test for RocksDB transactions (#8936 ) Summary: Current db_stress does not cover complex read-write transactions. Therefore, this PR adds coverage for emulated MyRocks-style transactions in `MultiOpsTxnsStressTest`. To achieve this, we need: - Add a new operation type 'customops' so that we can add new complex groups of operations, e.g. transactions involving multiple read-write operations. - Implement three read-write transactions and two read-only ones to emulate MyRocks-style transactions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8936 Test Plan: ``` make check ./db_stress -test_multi_ops_txns -use_txn -clear_column_family_one_in=0 -column_families=1 -writepercent=0 -delpercent=0 -delrangepercent=0 -customopspercent=60 -readpercent=20 -prefixpercent=0 -iterpercent=20 -reopen=0 -ops_per_thread=100000 ``` Next step is to add more configurability and refine input generation and result reporting, which will be done in separate follow-up PRs. Reviewed By: zhichao-cao Differential Revision: D31071795 Pulled By: riversand963 fbshipit-source-id: 50d7c828346ec643311336b904848a1588a37006	2021-12-14 13:34:43 -08:00
Peter Dillinger	e92a0ed040	Optimize & clean up footer code (#9280 ) Summary: Again, ahead of planned changes in https://github.com/facebook/rocksdb/issues/9058. This change improves performance (vs. pre-https://github.com/facebook/rocksdb/issues/9240 baseline) by separating a FooterBuilder from Footer, where FooterBuilder includes (inline owns) the serialized data so that it can be stack allocated. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9280 Test Plan: existing tests + performance testing below Extreme case performance testing as in https://github.com/facebook/rocksdb/issues/9240 with TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=fillseq -memtablerep=vector -allow_concurrent_memtable_write=false -num=30000000 (Each is ops/s averaged over 50 runs, run simultaneously with competing configuration for load fairness) Pre-https://github.com/facebook/rocksdb/issues/9240 baseline (`f577458`): 436389 With https://github.com/facebook/rocksdb/issues/9240 (`653c392`): 417946 (-4.2% vs. baseline) This change: 443762 (+1.7% vs. baseline) Reviewed By: ajkr Differential Revision: D33077220 Pulled By: pdillinger fbshipit-source-id: 7eaa6499589aac1693414a758e8c799216c5016c	2021-12-13 17:43:07 -08:00
Yanqin Jin	08721293ea	Fix a bug causing duplicate trailing entries in WritableFile (buffered IO) (#9236 ) Summary: `db_stress` is a user of `FaultInjectionTestFS`. After injecting a write error, `db_stress` probabilistically determines data drop (https://github.com/facebook/rocksdb/blob/6.27.fb/db_stress_tool/db_stress_test_base.cc#L2615:L2619). In some of our recent runs of `db_stress`, we found duplicate trailing entries corresponding to a trivial file move in the MANIFEST, causing the recovery to fail, because the file move operation is not idempotent: you cannot delete a file from a given level twice. Investigation suggests that data buffering in both `WritableFileWriter` and `FaultInjectionTestFS` may be the root cause. WritableFileWriter buffers data to write in a memory buffer, `WritableFileWriter::buf_`. After each `WriteBuffered()`/`WriteBufferedWithChecksum()` succeeds, the `buf_` is cleared. If the underlying file `WritableFileWriter::writable_file_` is opened in buffered IO mode, then `FaultInjectionTestFS` buffers data written for each file until the next file sync. After an injected error, the user of `FaultInjectionTestFS` can choose to drop some or none of the previously buffered data. If `db_stress` does not drop any unsynced data, then such data will still exist in the `FaultInjectionTestFS`'s buffer. The existing implementation of `WritableFileWriter::WriteBuffered()` does not clear `buf_` if there is an error. This may lead to two buffered copies of the data: one in `WritableFileWriter`, and another in `FaultInjectionTestFS`. We also know that the `WritableFileWriter` of the MANIFEST file will close upon an error. During `Close()`, it will flush the content in `buf_`. If no write error is injected to `FaultInjectionTestFS` this time, then we end up with two copies of the data appended to the file. To fix, we clear the `WritableFileWriter::buf_` upon failure as well. We focus this PR on files opened in non-direct mode. 
This PR includes a unit test to reproduce a case when write error injection to `WritableFile` can cause duplicate trailing entries. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9236 Test Plan: make check Reviewed By: zhichao-cao Differential Revision: D33033984 Pulled By: riversand963 fbshipit-source-id: ebfa5a0db8cbf1ed73100528b34fcba543c5db31	2021-12-13 09:00:36 -08:00
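The double-append scenario described in the commit above (#9236) can be modelled in a few lines. This is a sketch only: `ToySink` and `ToyWriter` are hypothetical stand-ins for `FaultInjectionTestFS` and `WritableFileWriter`, and the `clear_on_failure` switch exists purely to compare the behavior before and after the fix.

```cpp
#include <string>

// Hypothetical sink that keeps appended bytes even when it reports an
// injected error, like a test file system that drops no unsynced data.
struct ToySink {
  std::string contents;
  bool fail_next = false;

  bool Append(const std::string& data) {
    contents += data;  // data is retained regardless of the reported status
    if (fail_next) {
      fail_next = false;
      return false;  // injected write error
    }
    return true;
  }
};

// Hypothetical writer with an in-memory buffer, loosely modelled on the
// description of WritableFileWriter::buf_ above.
class ToyWriter {
 public:
  ToyWriter(ToySink* sink, bool clear_on_failure)
      : sink_(sink), clear_on_failure_(clear_on_failure) {}

  void Write(const std::string& data) { buf_ += data; }

  bool Flush() {
    if (buf_.empty()) return true;
    const bool ok = sink_->Append(buf_);
    // Before the fix, the buffer was cleared only on success, so a failed
    // flush left the data to be written a second time during Close().
    if (ok || clear_on_failure_) buf_.clear();
    return ok;
  }

  bool Close() { return Flush(); }

 private:
  ToySink* sink_;
  bool clear_on_failure_;
  std::string buf_;
};
```

With `clear_on_failure` set to false, a failed `Flush()` followed by `Close()` leaves the record in the sink twice. With it set to true, the record appears once.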
Davide Angelocola	8a97c541e4	Fix copy constructors of Options and ColumnFamilyOptions (#9166 ) Summary: Looks like some fields are not copied by the copy constructor. Please confirm if it is a real issue! Pull Request resolved: https://github.com/facebook/rocksdb/pull/9166 Reviewed By: jay-zhuang Differential Revision: D32532093 Pulled By: mrambacher fbshipit-source-id: f636ef9425a530a8655947115160ae471916252b	2021-12-13 07:22:56 -08:00
Akanksha Mahajan	eca85cdb66	Fix flaky tests related to Blob file deletions (#9287 ) Summary: CompactRange() only waits for manual.done to be set which happens as soon as new version is installed. Added TEST_WaitForCompact() which waits for compaction thread to actually finish which is after PurgeObsoleteFiles(). Pull Request resolved: https://github.com/facebook/rocksdb/pull/9287 Test Plan: Reproducible by adding `bg_cv_.SignalAll();` inside if condition `297d913275/db/db_impl/db_impl_compaction_flush.cc (L2876)` Reviewed By: ajkr Differential Revision: D33051122 Pulled By: akankshamahajan15 fbshipit-source-id: cd793c79efb8cf8587faaf89f7c51f5d8e5bb71d	2021-12-12 15:31:38 -08:00
Yanqin Jin	5455cacd18	Fix link error reported in issue 9272 (#9278 ) Summary: As title, Closes https://github.com/facebook/rocksdb/issues/9272 Since TimestampAssigner-related classes need to access `WriteBatch::ProtectionInfo` objects, which are for internal use only, it's difficult to make `AssignTimestamp` methods a template and put them in the same public header, `include/rocksdb/write_batch.h`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9278 Test Plan: ``` make check # Also manually test following the repro-steps in issue 9272 ``` Reviewed By: ltamasi Differential Revision: D33012686 Pulled By: riversand963 fbshipit-source-id: 89f24a86a1170125bd0b94ef3b32e69aa08bd949	2021-12-10 20:33:46 -08:00
Levi Tamasi	297d913275	Update HISTORY.md for PR 9273 (#9282 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9282 Reviewed By: akankshamahajan15 Differential Revision: D33027844 Pulled By: ltamasi fbshipit-source-id: 7540d36010414311bc39610fff92a6498be1570c	2021-12-10 14:50:02 -08:00
Hui Xiao	cd85439632	Make TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess less flaky (#9281 ) Summary: Context: [Rapid thread creation and deletion](https://github.com/facebook/rocksdb/blob/6.27.fb/utilities/transactions/write_prepared_transaction_test.cc#L439-L444) in `SnapshotConcurrentAccessTest.SnapshotConcurrentAccess` inside a [potentially big loop](https://github.com/facebook/rocksdb/blob/6.27.fb/utilities/transactions/write_prepared_transaction_test.cc#L1238-L1248) can heavily load the system with many threads, because the kernel sometimes delays cleaning up a thread's resources. We ran into a [flaky failure](https://app.circleci.com/pipelines/github/facebook/rocksdb/10383/workflows/136f1005-80a9-4515-aee9-fe36ac6462a1/jobs/253289) in CI and reproduced it as follows: - Command ``` Added `ROCKSDB_NAMESPACE::port::InstallStackTraceHandler();` like https://github.com/facebook/rocksdb/pull/9276 DEBUG_LEVEL=2 make -j56 write_prepared_transaction_test GTEST_CATCH_EXCEPTIONS=0 ~/gtest-parallel/gtest-parallel -r 200 -w 200 ./write_prepared_transaction_test --gtest_filter=TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1 ``` - Stack, where `write_prepared_transaction_test.cc:442` in `#9` points to thread creation ``` [ RUN ] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1 ....terminate called after throwing an instance of 'std::system_error' what(): Resource temporarily unavailable Received signal 6 (Aborted) #0 /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38) [0x7fc114f39438] ... #7 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8e73) [0x7fc1158a5e73] ?? 
??:0 #8 ./write_prepared_transaction_test() [0x4ca86c] std::thread::thread<rocksdb::WritePreparedTransactionTestBase::SnapshotConcurrentAccessTestInternal(rocksdb::WritePreparedTxnDB*, std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&, rocksdb::WritePreparedTxnDB::CommitEntry&, unsigned long&, unsigned long, unsigned long, unsigned long, unsigned long)::{lambda()#1}>(rocksdb::WritePreparedTransactionTestBase::SnapshotConcurrentAccessTestInternal(rocksdb::WritePreparedTxnDB*, std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&, rocksdb::WritePreparedTxnDB::CommitEntry&, unsigned long&, unsigned long, unsigned long, unsigned long, unsigned long)::{lambda()#1}&&) /usr/include/c++/5/thread:137 (discriminator 4) #9 ./write_prepared_transaction_test() [0x4bb80c] rocksdb::WritePreparedTransactionTestBase::SnapshotConcurrentAccessTestInternal(rocksdb::WritePreparedTxnDB*, std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&, rocksdb::WritePreparedTxnDB::CommitEntry&, unsigned long&, unsigned long, unsigned long, unsigned long, unsigned long) /home/circleci/project/utilities/transactions/write_prepared_transaction_test.cc:442 #10 ./write_prepared_transaction_test() [0x4407b6] rocksdb::SnapshotConcurrentAccessTest_SnapshotConcurrentAccess_Test::TestBody() /home/circleci/project/utilities/transactions/write_prepared_transaction_test.cc:1244 ... 
[109/200] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1 returned/aborted with exit code -6 (34462 ms) ``` - Move thread 2's work into the current thread, which halves the number of threads created and makes no difference to the test. We expect this to make the thread-creation error less frequent, even though we cannot guarantee that it will not happen again. Since this is a trivial change with a positive impact, it is still worth landing and then monitoring whether it is enough to solve the problem in practice. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9281 Test Plan: Before the change, repeating the test 200 times with 200 workers failed `~/gtest-parallel/gtest-parallel -r 200 -w 200 ./write_prepared_transaction_test --gtest_filter=TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1` ``` [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from TwoWriteQueues/SnapshotConcurrentAccessTest [ RUN ] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1 ..unknown file: Failure C++ exception with description "Resource temporarily unavailable" thrown in the test body. [ FAILED ] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1, where GetParam() = (false, true, 1, 0, 1, 20) (11882 ms) [----------] 1 test from TwoWriteQueues/SnapshotConcurrentAccessTest (11882 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test case ran. (11882 ms total) [ PASSED ] 0 tests. 
[ FAILED ] 1 test, listed below: [ FAILED ] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1, where GetParam() = (false, true, 1, 0, 1, 20) ``` After the change: repeating the test 200 times with 200 workers didn't fail, even when that run was itself repeated 10 times, as below `for i in {1..10}; do ~/gtest-parallel/gtest-parallel -r 200 -w 200 ./write_prepared_transaction_test --gtest_filter=TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1; done` ``` [200/200] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1 [200/200] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1 [200/200] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1 [200/200] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1 [200/200] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1 [200/200] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1 [200/200] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1 [200/200] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1 [200/200] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1 [200/200] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1 ``` It does fail when repeating the test 400 times with 400 workers `~/project$ ~/gtest-parallel/gtest-parallel -r 400 -w 400 ./write_prepared_transaction_test --gtest_filter=TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1` ``` [1/400] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1 (2928 ms) Note: Google Test filter = TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1 [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. 
[----------] 1 test from TwoWriteQueues/SnapshotConcurrentAccessTest [ RUN ] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1 unknown file: Failure C++ exception with description "std::bad_alloc" thrown in the test body. [ FAILED ] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/1, where GetParam() = (false, true, 1, 0, 1, 20) (2597 ms) [----------] 1 test from TwoWriteQueues/SnapshotConcurrentAccessTest (2597 ms total) ``` Reviewed By: ajkr Differential Revision: D33026776 Pulled By: hx235 fbshipit-source-id: 509f57126392821e835e48396e5bf224f4f5dcac	2021-12-10 12:52:33 -08:00
Yanqin Jin	bd513fd075	Add commit marker with timestamp (#9266 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9266 This diff adds a new tag `CommitWithTimestamp`. Currently, there is no API to trigger writing this tag to WAL, thus it is unavailable to users. This is an ongoing effort to add user-defined timestamp support to write-committed transactions. This diff also indicates all column families that may potentially participate in the same transaction must either disable timestamp or have the same timestamp format, since `CommitWithTimestamp` tag is followed by a single byte-array denoting the commit timestamp of the transaction. We will enforce this checking in a future diff. We keep this diff small. Reviewed By: ltamasi Differential Revision: D31721350 fbshipit-source-id: e1450811443647feb6ca01adec4c8aaae270ffc6	2021-12-10 11:05:35 -08:00
Jermy Li	c39a808cb6	Deprecate WriteBatch.remove() and use the new style delete() (#9256 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9256 Reviewed By: mrambacher Differential Revision: D32971447 Pulled By: jay-zhuang fbshipit-source-id: 6954d7287229a8c776092bd82af3a8a8cd92b35e	2021-12-10 09:18:17 -08:00
Peter Dillinger	653c392e47	More refactoring ahead of footer & meta changes (#9240 ) Summary: I'm working on a new format_version=6 to support context checksum (https://github.com/facebook/rocksdb/issues/9058) and this includes much of the refactoring and test updates to support that change. Test coverage data and manual inspection agree on dead code in block_based_table_reader.cc (removed). Pull Request resolved: https://github.com/facebook/rocksdb/pull/9240 Test Plan: tests enhanced to cover more cases etc. Extreme case performance testing indicates small % regression in fillseq (w/ compaction), though CPU profile etc. doesn't suggest any explanation. There is enhanced correctness checking in Footer::DecodeFrom, but this should be negligible. TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=fillseq -memtablerep=vector -allow_concurrent_memtable_write=false -num=30000000 -checksum_type=1 --disable_wal={false,true} (Each is ops/s averaged over 50 runs, run simultaneously with competing configuration for load fairness) Before w/ wal: 454512 After w/ wal: 444820 (-2.1%) Before w/o wal: 1004560 After w/o wal: 998897 (-0.6%) Since this doesn't modify WAL code, one would expect real effects to be larger in w/o wal case. This regression will be corrected in a follow-up PR. Reviewed By: ajkr Differential Revision: D32813769 Pulled By: pdillinger fbshipit-source-id: 444a244eabf3825cd329b7d1b150cddce320862f	2021-12-10 08:13:26 -08:00
stefan-zobel	f57745814f	Minor RocksJava Java code cosmetics (#9204 ) Summary: Specifically: - unused imports - code formatting - typos in comments - unnecessary casts - missing default label in switch statement - explicit use of long literals in multiplication - use generics where possible without backward compatibility risk Pull Request resolved: https://github.com/facebook/rocksdb/pull/9204 Reviewed By: ajkr Differential Revision: D32955184 Pulled By: jay-zhuang fbshipit-source-id: 42d05ce42639d982b9ea34c8081266dfba7f1efa	2021-12-09 20:00:48 -08:00
Peter Dillinger	aec95b8c09	Debug "Resource temporarily unavailable" exception in CircleCI (#9276 ) Summary: This changes write_prepared_transaction_test under CircleCI to print a stack trace on unhandled exception, so that we can debug rare exceptions seen in CircleCI: [ RUN ] TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccess/24 .......unknown file: Failure C++ exception with description "Resource temporarily unavailable" thrown in the test body. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9276 Test Plan: manual run test with seeded 'throw', with and without CIRCLECI=true environment variable Reviewed By: ajkr, hx235 Differential Revision: D32996993 Pulled By: pdillinger fbshipit-source-id: e790408ce204b676d3d84a290e41be511b203bfa	2021-12-09 12:58:46 -08:00
mrambacher	5486717ee2	Fix an issue with MemTableRepFactory::CreateFromString (#9273 ) Summary: If ignore_unsupported_options=true, then it is possible for MemTableRepFactory::CreateFromString to succeed without setting a result (result=nullptr). This would cause the original value to be overwritten with null and an error would be raised later when PrepareOptions is invoked. Added unit test for this condition. Will add (in another PR unless required by reviewers) comparable tests for all of the other Customizable classes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9273 Reviewed By: ltamasi Differential Revision: D32990365 Pulled By: mrambacher fbshipit-source-id: b150724c3f5ae7346357b3866244fd93466875c7	2021-12-09 12:36:18 -08:00
Si Ke	79f4a04ee3	Get DBTest passing Assert Status Checked (#7737 ) Summary: Closes https://github.com/facebook/rocksdb/pull/7737 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9231 Reviewed By: hx235 Differential Revision: D32978332 Pulled By: pdillinger fbshipit-source-id: b28900b685d60c668529a90dbaa8e1b357b28f76	2021-12-09 11:00:17 -08:00
Adam Retter	c879910102	Fix fstatfs call for compilation on 32 bit systems (#9251 ) Summary: On some 32-bit systems, BTRFS_SUPER_MAGIC is unsigned while __fsword_t is signed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9251 Reviewed By: ajkr Differential Revision: D32961651 Pulled By: pdillinger fbshipit-source-id: 78e85fc1336f304a21e4d5961e60957c90daed63	2021-12-08 22:01:23 -08:00
Peter Dillinger	80ac7412b5	Polish/deflake BackupEngineTest.FileCollision (#9257 ) Summary: Use smaller and more predictable behaviors Pull Request resolved: https://github.com/facebook/rocksdb/pull/9257 Test Plan: gtest-parallel --repeat=N ./backupable_db_test --gtest_filter=BackupEngineTest.FileCollision before (N=50) we see inconsistent sets of SST files $ find /dev/shm/rocksdb_blah/ | grep -o '/00.sst' | grep -o '^[^_]' | sort | uniq -c 49 /000009 3 /000010 1 /000010.sst 49 /000012 3 /000013 1 /000013.sst 49 /000015 2 /000016 1 /000016.sst 22 /000018 2 /000019 1 /000019.sst 29 /000020 11 /000021 2 /000021.sst 46 /000022 2 /000022.sst 4 /000023 1 /000023.sst 27 /000025 And after (N=5000) we see $ find /dev/shm/rocksdb_blah/ | grep -o '/00.sst' | grep -o '^[^_]' | sort | uniq -c 10000 /000009 10000 /000012 5000 /000015 Reviewed By: ajkr Differential Revision: D32888393 Pulled By: pdillinger fbshipit-source-id: 5bfd075b3184bb66c5613758a53f431c406e9808	2021-12-08 21:57:46 -08:00
anand76	ecf2bec613	Add a listener callback for end of auto error recovery (#9244 ) Summary: Previously, the OnErrorRecoveryCompleted callback was called when RocksDB was able to successfully recover from a retryable error. However, if the recovery failed and was eventually stopped, there was no indication of the status. To fix that, a new OnErrorRecoveryEnd callback is introduced that deprecates the OnErrorRecoveryCompleted callback. The new callback is called with the original error and the new error status. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9244 Test Plan: Add a new unit test in error_handler_fs_test Reviewed By: zhichao-cao Differential Revision: D32922303 Pulled By: anand1976 fbshipit-source-id: f04e77a9cb92c5ea6385590682d3fcf559971b99	2021-12-08 14:30:57 -08:00
Akanksha Mahajan	9e4d56f2c9	Fix segmentation fault in table_options.prepopulate_block_cache when used with partition_filters (#9263 ) Summary: When table_options.prepopulate_block_cache is set to BlockBasedTableOptions::PrepopulateBlockCache::kFlushOnly and table_options.partition_filters is also set to true, a segmentation fault occurs when the top-level filter is fetched, because it is entered in the cache with the wrong type. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9263 Test Plan: Updated unit tests; Ran db_stress: make crash_test -j32 Reviewed By: pdillinger Differential Revision: D32936566 Pulled By: akankshamahajan15 fbshipit-source-id: 8bd79e53830d3e3c1bb79787e1ffbc3cb46d4426	2021-12-08 12:44:38 -08:00
Levi Tamasi	94d99400dc	Fix a typo in DBSSTTest.DBWithMaxSpaceAllowedWithBlobFiles (#9270 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9270 Test Plan: ``` gtest-parallel --repeat=10000 ./db_sst_test --gtest_filter=DBSSTTest.DBWithMaxSpaceAllowedWithBlobFiles ``` Reviewed By: akankshamahajan15 Differential Revision: D32958154 Pulled By: ltamasi fbshipit-source-id: b6ec2fbbece80d73c567cec57638dffd3c84a2ba	2021-12-08 12:05:37 -08:00
Levi Tamasi	d1f053b0ae	Attempt to deflake DBSSTTest.DestroyDBWithRateLimitedDelete (#9269 ) Summary: This test case seems to be occasionally failing due to the code hitting the immediate deletion branch in `DeleteScheduler::DeleteFile`. The patch increases the allowed trash ratio to a huge value to prevent this from happening. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9269 Test Plan: ``` gtest-parallel --repeat=10000 ./db_sst_test --gtest_filter=DBSSTTest.DestroyDBWithRateLimitedDelete ``` Reviewed By: akankshamahajan15 Differential Revision: D32956596 Pulled By: ltamasi fbshipit-source-id: 3945e7c1c19ede76698e03c3f133bc1d9fd61b84	2021-12-08 11:16:46 -08:00
Hui Xiao	66b31c5098	Fix -Werror=maybe-uninitialized in db_stress_tool (#9265 ) Summary: Context/Summary: Uninitialized variable `SequenceNumber old_saved_seqno` causes asan related compilation error/warning below: ``` db_stress_tool/expected_state.cc:308:55: error: ‘old_saved_seqno’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 308 | if (s.ok() && old_saved_seqno != kMaxSequenceNumber && | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~ ``` Fix it by initializing to 0. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9265 Test Plan: - make clean && COMPILE_WITH_ASAN=1 make -j48 db_stress_tool/expected_state.o - monitor if same error happens again after merging Reviewed By: ajkr Differential Revision: D32939630 Pulled By: hx235 fbshipit-source-id: 41697515fd11ada8427f606b5dceb4e58d12cb80	2021-12-07 22:42:30 -08:00
Andrew Kryczka	ce42ae6ffd	Fix Statistics in db_stress (#9260 ) Summary: The `Statistics` objects are meant to be shared across translation units, but this was prevented by declaring them static. We need to ensure they are defined once in the program. The effect is now `StressTest::PrintStatistics()` can actually print statistics since it now sees non-null values when `--statistics=1`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9260 Reviewed By: zhichao-cao Differential Revision: D32910162 Pulled By: ajkr fbshipit-source-id: c926d6f556177987bee5fa3cbc87597803b230ee	2021-12-07 16:24:22 -08:00
Andrew Kryczka	a6a6aad74e	db_stress support tracking historical values (#8960 ) Summary: When `--sync_fault_injection` is set, this PR takes a snapshot of the expected values and starts an operation trace when the DB is opened. These files are stored in `--expected_values_dir`. They will be used for recovering the expected state of the DB following a crash where a suffix of unsynced operations is allowed to be lost. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8960 Test Plan: injected crashes at various points in `FileExpectedStateManager` and verified the next run recovers the state/trace file with highest seqno and removes all older/temporary files. Note we don't use sync_fault_injection in CI crash tests yet. Reviewed By: pdillinger Differential Revision: D31194941 Pulled By: ajkr fbshipit-source-id: b0f935a529a0186c5a9c7709fcaa8829de8a84cf	2021-12-07 13:41:48 -08:00
sdong	88875df821	File temperature information should be preserved when restarting the DB (#9242 ) Summary: Fix a bug that causes file temperature not to be preserved after the DB is restarted, or options.max_manifest_file_size is hit. Also, pass temperature information to NewRandomAccessFile() to allow users to hack a solution where they don't preserve tiering information. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9242 Test Plan: Add a unit test that would fail without the fix. Reviewed By: jay-zhuang Differential Revision: D32818150 fbshipit-source-id: 36aa3f148c60107f7b8e9d65b63b039f9e1a1eec	2021-12-03 14:43:14 -08:00
Hui Xiao	bf2f504188	Add Java API change HISTORY section for #9212 (#9243 ) Summary: Context/Summary: https://github.com/facebook/rocksdb/issues/9212 removed a Java public API without noting it in HISTORY. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9243 Test Plan: Existing tests. Reviewed By: ajkr Differential Revision: D32841050 Pulled By: hx235 fbshipit-source-id: 3b771ffef3ba718f8d70201747ee0e5cbf6de52f	2021-12-03 12:51:38 -08:00
Levi Tamasi	930f2e92e6	Attempt to deflake DBSSTTest.DBWithSFMForBlobFilesAtomicFlush (#9241 ) Summary: When using the SST file manager, the actual deletion of DB files potentially occurs in the background. The patch adds another call to `SstFileManagerImpl::WaitForEmptyTrash` to the test case `DBSSTTest.DBWithSFMForBlobFilesAtomicFlush` to ensure the deletions are performed before the test checks the number of deleted files. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9241 Test Plan: ``` gtest-parallel --repeat=1000 ./db_sst_test --gtest_filter=DBSSTTest.DBWithSFMForBlobFilesAtomicFlush ``` Reviewed By: akankshamahajan15 Differential Revision: D32811427 Pulled By: ltamasi fbshipit-source-id: 7f2ad649a22bd2d7900e5f132372034093cfcf47	2021-12-02 16:54:21 -08:00
lgqss	77c7085594	MemTableList::TrimHistory now uses allocated bytes (#9020 ) Summary: Fix a bug when both max_write_buffer_size_to_maintain and max_write_buffer_number_to_maintain are 0. The bug was introduced in 6.5.0 and https://github.com/facebook/rocksdb/issues/5022. Fix https://github.com/facebook/rocksdb/issues/8371 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9020 Reviewed By: pdillinger Differential Revision: D32767084 Pulled By: ajkr fbshipit-source-id: c401ee6e2557230e892d0fe8abb4966cbd18e85f	2021-12-02 11:45:39 -08:00
Hui Xiao	9daf07305c	Replace TableProperties::properties_offsets map with external_sst_file_global_seqno_offset (#9212 ) Summary: Context: Searching `TableProperties::properties_offsets` across the codebase reveals that internally it is only used to find the external SST file's global seqno offset. Therefore we can narrow it down and replace this map property with a uint64_t property `external_sst_file_global_seqno_offset` to save memory usage related to table properties. Note: - See PR comments for discussion about potential impact on existing external usage of `TableProperties::properties_offsets` - See PR comments for discussion on keeping external SST file global seqno's offset VS using a simple flag indicating seqno's existence. Summary: - Replaced `TableProperties::properties_offsets` with `TableProperties::external_sst_file_global_seqno_offset` Pull Request resolved: https://github.com/facebook/rocksdb/pull/9212 Test Plan: - Relying on existing tests should be sufficient since `TableProperties::properties_offsets` existed before and should already be tested. Reviewed By: ajkr Differential Revision: D32665941 Pulled By: hx235 fbshipit-source-id: 718e44617346dc4f3b1276ee953e61c196277795	2021-12-02 08:30:36 -08:00

10592 Commits