Summary:
When building with clang 9, warning is reported for InternalDBStatsType type names shadowed the one for statistics. Rename them.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5779
Test Plan: Build with clang 9 and see it passes.
Differential Revision: D17239378
fbshipit-source-id: af28fb42066c738cd1b841f9fe21ab4671dafd18
Summary:
Fix the ouput overlap bug when using subcompactions, the upper bound of output
file was extended incorrectly.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4898
Differential Revision: D13736107
Pulled By: ajkr
fbshipit-source-id: 21dca09f81d5f07bf2766bf566f9b50dcab7d8e3
Summary:
1. this commit fixes our handling of a combination of two separate edge
cases. If a flush job does not pick any memtable to flush (because another
flush job has already picked the same memtables), and the column family
assigned to the flush job is dropped right before RocksDB calls
rocksdb::InstallMemtableAtomicFlushResults, our original code passes
a FileMetaData object whose file number is 0, failing the assertion in
rocksdb::InstallMemtableAtomicFlushResults (assert(m->GetFileNumber() > 0)).
2. Also piggyback a small change: since we already create a local copy of column family's mutable CF options to eliminate potential race condition with `SetOptions` call, we might as well use the local copy in other function calls in the same scope.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4932
Differential Revision: D13901322
Pulled By: riversand963
fbshipit-source-id: b936580af7c127ea0c6c19ea10cd5fcede9fb0f9
Summary:
If we do not do this, then reading MutableCFOptions may have a race condition
with SetOptions which modifies MutableCFOptions.
Also reserve space in advance for vectors to avoid reallocation changing the
address of its elements.
Test plan
```
$make clean && make -j32 all check
$make clean && COMPILE_WITH_TSAN=1 make -j32 all check
$make clean && COMPILE_WITH_ASAN=1 make -j32 all check
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4876
Differential Revision: D13644500
Pulled By: riversand963
fbshipit-source-id: 4b8112c5c819d5a2922bb61ad1521b3d2fb2fd47
Summary:
By convention, time_t almost always stores the integral number of seconds since
00:00 hours, Jan 1, 1970 UTC, according to http://www.cplusplus.com/reference/ctime/time_t/.
We surely want more precision than seconds.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4868
Differential Revision: D13633046
Pulled By: riversand963
fbshipit-source-id: 4e01e23a22e8838023c51a91247a286dbf3a5396
Summary:
Previously for point lookup we decided which file to look into based on user key overlap only. We also did not truncate range tombstones in the point lookup code path. These two ideas did not interact well in cases like this:
- L1 has range tombstone [a, c)#1 and point key b#2. The data is split between file1 with range [a#1,1, b#72057594037927935,15], and file2 with range [b#2, c#1].
- L1's file2 gets compacted to L2.
- User issues `Get()` for b#3.
- L1's file1 is opened and the range tombstone [a, c)#1 is found for b, while no point-key for b is found in L1.
- `Get()` assumes that the range tombstone must cover all data in that range in lower levels, so short circuits and returns `NotFound`.
The solution to this problem is to not look into files that only overlap with the point lookup at a range tombstone sentinel endpoint. In the above example, this would mean not opening L1's file1 or its tombstones during the `Get()`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4829
Differential Revision: D13561355
Pulled By: ajkr
fbshipit-source-id: a13c21c816870a2f5d32a48af6dbd719a7d9d19f
Summary:
as titled.
Since different bg flush threads can flush different sets of column families
(due to column family creation and drop), we decide not to let one thread
perform atomic flush result installation for other threads. Bg flush threads
will install their atomic flush results sequentially to MANIFEST, using
a conditional variable, i.e. atomic_flush_install_cv_ to coordinate.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4791
Differential Revision: D13498930
Pulled By: riversand963
fbshipit-source-id: dd7482fc41f4bd22dad1e1ef7d4764ef424688d7
Summary:
in certain cases, we do not perform memtable switching if the active
memtable of the column family is empty. Two exceptions:
1. In manual flush, if cached_recoverable_state_empty_ is false, then we need
to switch memtable due to requirement of transaction.
2. In switch WAL, we need to switch memtable anyway because we have to seal the
memtable if the WAL on which it depends will be closed.
This change can potentially delay the occurence of write stalls because number
of memtables increase more slowly.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4792
Differential Revision: D13499501
Pulled By: riversand963
fbshipit-source-id: 91c9b17ae753578578039f3851667d93610005e1
Summary:
If one column family is dropped, we should simply skip it and continue to flush
other active ones.
Currently we use Status::ShutdownInProgress to notify caller of column families
being dropped. In the future, we should consider using a different Status code.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4708
Differential Revision: D13378954
Pulled By: riversand963
fbshipit-source-id: 42f248cdf2d32d4c0f677cd39012694b8f1328ca
Summary:
1. DBImplReadOnly::GetLiveFiles should not return NotSupported. Instead, it
should call DBImpl::GetLiveFiles(flush_memtable=false).
2. In DBImp::Recover, we should also recover the OPTIONS file name and/or
number so that an immediate subsequent GetLiveFiles will get the correct
OPTIONS name.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4681
Differential Revision: D13069205
Pulled By: riversand963
fbshipit-source-id: 3e6a0174307d06db5a01feb099b306cea1f7f88a
Summary:
Declare Jemalloc non-standard APIs as weak symbols, so that if Jemalloc is linked with the binary, these symbols will be replaced by Jemalloc's, otherwise they will be nullptr. This is similar to how folly detect jemalloc, but we assume the main program use jemalloc as long as jemalloc is linked: https://github.com/facebook/folly/blob/master/folly/memory/Malloc.h#L147
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4844
Differential Revision: D13574934
Pulled By: yiwu-arbug
fbshipit-source-id: 7ea871beb1be7d5a1259cc38f9b78078793db2db
Summary:
Now that v2 is fully functional, the v1 aggregator is removed.
The v2 aggregator has been renamed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4778
Differential Revision: D13495930
Pulled By: abhimadan
fbshipit-source-id: 9d69500a60a283e79b6c4fa938fc68a8aa4d40d6
Summary:
RangeDelAggregatorV2 now supports ShouldDelete calls on
snapshot stripes and creation of range tombstone compaction iterators.
RangeDelAggregator is no longer used on any non-test code path, and will
be removed in a future commit.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4758
Differential Revision: D13439254
Pulled By: abhimadan
fbshipit-source-id: fe105bcf8e3d4a2df37a622d5510843cd71b0401
Summary:
To support the flush/compaction use cases of RangeDelAggregator
in v2, FragmentedRangeTombstoneIterator now supports dropping tombstones
that cannot be read in the compaction output file. Furthermore,
FragmentedRangeTombstoneIterator supports the "snapshot striping" use
case by allowing an iterator to be split by a list of snapshots.
RangeDelAggregatorV2 will use these changes in a follow-up change.
In the process of making these changes, other miscellaneous cleanups
were also done in these files.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4740
Differential Revision: D13287382
Pulled By: abhimadan
fbshipit-source-id: f5aeb03e1b3058049b80c02a558ee48f723fa48c
Summary:
When write stall has already been triggered due to number of L0 files reaching
threshold, file ingestion must proceed with its flush without waiting for the
write stall condition to cleared by the compaction because compaction can wait
for ingestion to finish (circular wait).
In order to avoid this wait, we can set `FlushOptions.allow_write_stall` to be
true (default is false). Setting it to false can cause deadlock.
This can happen when the number of compaction threads is low.
Considere the following
```
Time compaction_thread ingestion_thread
| num_running_ingest_file_++
| while(num_running_ingest_file_>0){wait}
| flush
V
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4751
Differential Revision: D13343037
Pulled By: riversand963
fbshipit-source-id: d3b95938814af46ec4c463feff0b50c70bd8b23f
Summary:
Current implementation of `current_over_upper_bound_` fails to take into consideration that keys might be invalid in either base iterator or delta iterator. Calling key() in such scenario will lead to assertion failure and runtime errors.
This PR addresses the bug by adding check for valid keys before calling `IsOverUpperBound()`, also added test coverage for iterate_upper_bound usage in BaseDeltaIterator
Also recommit https://github.com/facebook/rocksdb/pull/4656 (It was reverted earlier due to bugs)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4702
Differential Revision: D13146643
Pulled By: miasantreble
fbshipit-source-id: 6d136929da12d0f2e2a5cea474a8038ec5cdf1d0
Summary:
Full block (use_block_based_builder=false) Bloom filter has clear CPU saving benefits but with limitation of using temp memory when building an SST file proportional to the SST file size. We reduced the chance of having large SST files with multi-level universal compaction. Now we change to a default with better performance.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4735
Differential Revision: D13266674
Pulled By: siying
fbshipit-source-id: 7594a4c3e32568a5a2adce22bb0e46553e55c602
Summary:
**Summary:**
Simplified the code layout by moving FIFOCompactionPicker to a separate file.
**Why?:**
While trying to add ttl functionality to universal compaction, I found that `FIFOCompactionPicker` class and its impl methods to be interspersed between `LevelCompactionPicker` methods which kind-of made the code a little hard to traverse. So I moved `FIFOCompactionPicker` to a separate compaction_picker_fifo.h/cc file, similar to `UniversalCompactionPicker`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4724
Differential Revision: D13227914
Pulled By: sagar0
fbshipit-source-id: 89471766ea67fa4d87664a41c057dd7df4b3d4e3
Summary:
There is a race condition in DBFlushTest.SyncFail, as illustrated below.
```
time thread1 bg_flush_thread
| Flush(wait=false, cfd)
| refs_before=cfd->current()->TEST_refs() PickMemtable calls cfd->current()->Ref()
V
```
The race condition between thread1 getting the ref count of cfd's current
version and bg_flush_thread incrementing the cfd's current version makes it
possible for later assertion on refs_before to fail. Therefore, we add test
sync points to enforce the order and assert on the ref count before and after
PickMemtable is called in bg_flush_thread.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4633
Differential Revision: D12967131
Pulled By: riversand963
fbshipit-source-id: a99d2bacb7869ec5d8d03b24ef2babc0e6ae1a3b
Summary:
there is chance that
* the caller tries to repair the db when holding the db_lock, in
that case the env implementation might not set the `lock`
parameter of Repairer::Run().
* the caller somehow never calls Repairer::Run().
either way, the desctructor of Repair will compare the uninitialized
db_lock_ with nullptr, and tries to unlock it. there is good chance
that the db_lock_ is not nullptr, then boom.
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4683
Differential Revision: D13260287
Pulled By: riversand963
fbshipit-source-id: 878a119d2e9f10a0fa17ee62cf3fb24b33d49fa5
Summary:
Removed `one_time_use` flag, which removed the need for some
tests, and changed all `NewRangeTombstoneIterator` methods to return
`FragmentedRangeTombstoneIterators`.
These changes also led to removing `RangeDelAggregatorV2::AddUnfragmentedTombstones`
and one of the `MemTableListVersion::AddRangeTombstoneIterators` methods.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4692
Differential Revision: D13106570
Pulled By: abhimadan
fbshipit-source-id: cbab5432d7fc2d9cdfd8d9d40361a1bffaa8f845
Summary:
If user do not end the trace manually, the tracing will continue which can potential use up all the storage space and cause problem. In this PR, the max trace file size is added to the TraceOptions and user can set the value if they need or the default is 64GB.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4610
Differential Revision: D12893400
Pulled By: zhichao-cao
fbshipit-source-id: acf4b5a6076bb691778bdfbac4864e1006758953
Summary:
Previously, every range tombstone iterator was seeked on every
ShouldDelete call, which quickly degraded performance for long range
scans. This PR improves performance by tracking iterator positions and
only advancing iterators when necessary.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4677
Differential Revision: D13205373
Pulled By: abhimadan
fbshipit-source-id: 80c199dace1e19362a4c61c686bf01913eae87cb
Summary:
We haven't been populating `NO_FILE_CLOSES` since v1.5.8 even though it was never marked as deprecated. Start populating it again. Conveniently `DeleteTableReader` has an unused `void*` argument that we can use...
Blame: 63f216ee0aCloses#4700.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4703
Differential Revision: D13146769
Pulled By: ajkr
fbshipit-source-id: ad8d6fb0493e701f60a165a3bca1787d255be008
Summary:
…ons (#4676)"
This reverts commit b32d087dbb.
`MemoryAllocator` needs to be with `Cache`, since cache entry can
outlive DB and block based table. The cache needs to hold reference to
memory allocator when deleting cache entry.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4697
Differential Revision: D13133490
Pulled By: yiwu-arbug
fbshipit-source-id: 8ef7e8a51263bfd929f892fd062665ff4ce9ce5a
Summary:
The old RangeDelAggregator did expensive pre-processing work
to create a collapsed, binary-searchable representation of range
tombstones. With FragmentedRangeTombstoneIterator, much of this work is
now unnecessary. RangeDelAggregatorV2 takes advantage of this by seeking
in each iterator to find a covering tombstone in ShouldDelete, while
doing minimal work in AddTombstones. The old RangeDelAggregator is still
used during flush/compaction for now, though RangeDelAggregatorV2 will
support those uses in a future PR.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4649
Differential Revision: D13146964
Pulled By: abhimadan
fbshipit-source-id: be29a4c020fc440500c137216fcc1cf529571eb3
Summary:
Since a range tombstone seen at one level will cover all keys
in the range at lower levels, there was a short-circuiting check in Get
that reported a key was not found at most one file after the range
tombstone was discovered. However, this was incorrect for merge
operands, since a deletion might only cover some merge operands,
which implies that the key should be found. This PR fixes this logic in
the Version portion of Get, and removes the logic from the MemTable
portion of Get, since the perforamnce benefit provided there is minimal.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4698
Differential Revision: D13142484
Pulled By: abhimadan
fbshipit-source-id: cbd74537c806032f2bfa564724d01a80df7c8f10
Summary:
WriteBufferManger is not invoked when allocating memory for memtable if the limit is not set even if a cache is passed. It is inconsistent from the comment syas. Fix it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4695
Differential Revision: D13112722
Pulled By: siying
fbshipit-source-id: 0b27eef63867f679cd06033ea56907c0569597f4
Summary:
This is a quick fix for the uninitialized bugs in `LiveFileMetaData` and `SstFileMetaData` that were uncovered in #4686.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4693
Differential Revision: D13113189
Pulled By: ajkr
fbshipit-source-id: 18e798d031d2a59d0b55fc010c135e0126f4042d
Summary:
This fixes an assertion.
An atomic flush can have multiple flush jobs. Some of them may fail. If any of
them fails, we need to rollback all of them.
For the flush jobs that do fail, we already call `RollbackMemTableFlush` in
`FlushJob::Run`. The tricky part is for flush jobs that have completed
successfully. We need to call `RollbackMemTableFlush` for them as well.
The newly added DBAtomicFlushTest.AtomicFlushRollbackSomeJobs will SigAbort
without the corresponding change in AtomicFlushMemTablesToOutputFiles.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4641
Differential Revision: D12943649
Pulled By: riversand963
fbshipit-source-id: c66a4a664a1e0938e938fd41edc5a70c34cdd868
Summary:
Rather than storing a `vector<RangeTombstone>`, we now store a
`vector<RangeTombstoneStack>` and a `vector<SequenceNumber>`. A
`RangeTombstoneStack` contains the start and end keys of a range tombstone
fragment, and indices into the seqnum vector to indicate which sequence
numbers the fragment is located at. The diagram below illustrates an
example:
```
tombstones_: [a, b) [c, e) [h, k)
| \ / \ / |
| \ / \ / |
v v v v
tombstone_seqs_: [ 5 3 10 7 2 8 6 ]
```
This format allows binary searching the tombstone list to use less key
comparisons, which helps in cases where there are many overlapping
tombstones. Also, this format makes it easier to add DBIter-like
semantics to `FragmentedRangeTombstoneIterator` in the future.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4632
Differential Revision: D13053103
Pulled By: abhimadan
fbshipit-source-id: e8220cc712fcf5be4d602913bb23ace8ea5f8ef0
Summary:
DBTest.SanitizeNumThreads Sometimes fails. The test waited for 10ms timeout and expect all threads scheduled to be executed. This can be a source of flakiness. Make a check every 1ms and up to 10s.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4659
Differential Revision: D13074174
Pulled By: siying
fbshipit-source-id: b1d5ff87a326a4fc9eab8d1cc307bbb940dfe70c
Summary:
`GenSubcompactionBoundaries` calls `VersionSet::ApproximateSize` which gets BlockBasedTableReader for every file and seeks in its index block to find `key`'s offset. If the table or index block aren't in memory already, this involves I/O. This can be improved by releasing DB mutex when calling ApproximateSize.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4630
Differential Revision: D13052653
Pulled By: miasantreble
fbshipit-source-id: cae31d46d10d0860fa8a26b8d5154b2d17d1685f
Summary:
Currently transaction iterator does not apply `ReadOptions.iterate_upper_bound` when iterating. This PR attempts to fix the problem by having `BaseDeltaIterator` enforcing the upper bound check when iterator state is changed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4656
Differential Revision: D13039257
Pulled By: miasantreble
fbshipit-source-id: 909eb9f6b4597a4d80418fb139f32ec82c6ec1d1
Summary:
Per offline discussion with siying, `MemoryAllocator` and `Cache` should be decouple. The idea is that memory allocator handles memory allocation, while cache handle cache policy.
It is normal that external cache libraries pack couple the two components for better optimization. If we want to integrate with such library in the future, we can make a wrapper of the library implementing both `Cache` and `MemoryAllocator` interface.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4676
Differential Revision: D13047662
Pulled By: yiwu-arbug
fbshipit-source-id: cd42e246d80ab600b4de47d073f7d2db308ce6dd
Summary:
We used to have a bug, which caused every block to be read twice, and none of our tests caught it. Add a very simply unit test to make sure that when reading a data block, we only issue one pread against the SST file.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4657
Differential Revision: D13005260
Pulled By: siying
fbshipit-source-id: 03167b554ad2451192b1707415536d7d05e9026c
Summary:
he ratio of num_deletions to num_entries of a level can be useful to determine if a manual compaction needs to be triggered on a level.
Also refer #3980
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4623
Differential Revision: D13045744
Pulled By: sagar0
fbshipit-source-id: 71f3c8e363a8ffd194ec3bb0ed0b69612231f0b3
Summary:
Currently, `Statistics` can record tick by `recordTick()` whose second parameter is an `uint64_t`.
That means tick can only increase.
If we want to reduce tick, we have to work around like `RecordTick(statistics_, NO_ITERATORS, uint64_t(-1));`.
That's kind of a hack.
So, this PR divide `NO_ITERATORS` into two counters `NO_ITERATOR_CREATED` and `NO_ITERATOR_DELETE`, making the counters increase only.
Fixes#3013 .
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4498
Differential Revision: D10395010
Pulled By: sagar0
fbshipit-source-id: cfb523b22a37411c794b4e9da090f1ae30293db2
Summary:
Call `SyncClosedLogs()` only if there are more than one column families.
Update several unit tests (in `fault_injection_test` and `db_flush_test`) correspondingly.
See #3840 for more info.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4460
Differential Revision: D12896377
Pulled By: riversand963
fbshipit-source-id: f49afdaec32568f12f001219a3aec1dfde3b32bf
Summary:
Use the `DBOptions` that the backup engine already holds to figure out the right `EnvOptions` to use when reading the DB files. This means that, if a user opened a DB instance with `use_direct_reads=true`, then using `BackupEngine` to back up that DB instance will use direct I/O to read files when calculating checksums and copying. Currently the WALs and manifests would still be read using buffered I/O to prevent mixing direct I/O reads with concurrent buffered I/O writes.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4640
Differential Revision: D13015268
Pulled By: ajkr
fbshipit-source-id: 77006ad6f3e00ce58374ca4793b785eea0db6269
Summary:
this PR adds two more per-level perf context counters to track
* number of keys returned in Get call, break down by levels
* total processing time at each level during Get call
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4617
Differential Revision: D12898024
Pulled By: miasantreble
fbshipit-source-id: 6b84ef1c8097c0d9e97bee1a774958f56ab4a6c4
Summary:
Our internal CI test caught RocksDB Lite build failures. The failures are due to a new test introduced in #4665 using `SSTFileWriter` and `IngestExternalFile`, but these is not exposed under lite mode. Fixed by #ifdef'ing out the test.
```
db/db_test2.cc: In member function ‘virtual void rocksdb::DBTest2_TestCompactFiles_Test::TestBody()’:
db/db_test2.cc:2907:3: error: ‘SstFileWriter’ is not a member of ‘rocksdb’
rocksdb::SstFileWriter sst_file_writer{rocksdb::EnvOptions(), options};
^
In file included from ./util/testharness.h:15:0,
from ./table/mock_table.h:23,
from ./db/db_test_util.h:44,
from db/db_test2.cc:13:
db/db_test2.cc:2912:13: error: ‘sst_file_writer’ was not declared in this scope
ASSERT_OK(sst_file_writer.Open(external_file1));
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4675
Differential Revision: D13035984
Pulled By: sagar0
fbshipit-source-id: c1ceac550dfac1a85eeea436693dc7dd467519a6
Summary:
Part of the test required that a compaction start before a
manual flush, but this was not enforced by the test. In some cases,
particularly when writing to tmpfs, this could lead to the compaction
starting after the flush, which caused the base level to be higher than
it was expected to be. Add a sync point in the test to ensure that the
flush and compaction happen simultaneously.
The test also had some stale comments, so those have been removed or
modified, and the test has been simplified so that it no longer uses sleeps
and writes uncompressed SSTs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4668
Differential Revision: D13032440
Pulled By: abhimadan
fbshipit-source-id: 3f23b583a096454dafb8d8ea75678605dec80209
Summary:
`CompactFiles` gets `SuperVersion` before `WaitForIngestFile`, while `IngestExternalFile` may add files that overlap with `input_file_names`
The timeline of execution flow is as follow:
Let's say that level N has two file [1,2] and [5,6]
```
timeline user_thread1 user_thread2
t0 | CompactFiles([1, 2], [5, 6]) begin
t1 | GetReferencedSuperVersion()
t2 | IngestExternalFile([3,4]) to level N begin
t3 | CompactFiles resume
V
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4665
Differential Revision: D13030674
Pulled By: ajkr
fbshipit-source-id: 8be19477fd6e505032267a979d32f3097cc3be51
Summary:
In the past, both `DBImpl::atomic_flush_` and
`DBImpl::immutable_db_options_.atomic_flush` exist. However, we fail to set
`immutable_db_options_.atomic_flush`, but use `DBImpl::atomic_flush_` which is
set correctly. This does not lead to incorrect behavior, but is a duplicate of
information.
Since `immutable_db_options_` is always there and has `atomic_flush`, we should
use it as source of truth and remove `DBImpl::atomic_flush_`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4631
Differential Revision: D12928371
Pulled By: riversand963
fbshipit-source-id: f85a811959d3828aad4a3a1b05f71facf19c636d
Summary:
Hi, yiwu-arbug, I found that `DBImpl::GetColumnFamilyHandleUnlocked` still have data race condition, because `column_family_memtables_` has a stateful cache `current_` and `column_family_memtables_::Seek` maybe call without the protection of `mutex_` by a write thread
check 859dbda6e3/db/write_batch.cc (L1188) and 859dbda6e3/db/write_batch.cc (L1756) and 859dbda6e3/db/db_impl_write.cc (L318)
So it's better to use `versions_->GetColumnFamilySet()->GetColumnFamily` instead.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4666
Differential Revision: D13027117
Pulled By: yiwu-arbug
fbshipit-source-id: 4e3778eaf8e7f7c8577bbd78129b6a5fd7ce79fb