Summary:
Implementation of https://github.com/facebook/rocksdb/issues/8221, plus/including extension of Java options API to allow the get() of options from RocksDB. The extension allows more comprehensive testing of options at the Java side, by validating that the options are set at the C++ side.
Variations on methods:
MutableColumnFamilyOptions.MutableColumnFamilyOptionsBuilder getOptions()
MutableDBOptions.MutableDBOptionsBuilder getDBOptions()
retrieve the options via RocksDB C++ interfaces, and parse the resulting string into one of the Java-style option objects.
This necessitated generalising the parsing of option strings in Java, which now parses the full range of option strings returned by the C++ interface, rather than a useful subset. This necessitates the list-separator being changed to :(colon) from , (comma).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8999
Reviewed By: jay-zhuang
Differential Revision: D31655487
Pulled By: ltamasi
fbshipit-source-id: c38e98145c81c61dc38238b0df580db176ce4efd
Summary:
* New public header unique_id.h and function GetUniqueIdFromTableProperties
which computes a universally unique identifier based on table properties
of table files from recent RocksDB versions.
* Generation of DB session IDs is refactored so that they are
guaranteed unique in the lifetime of a process running RocksDB.
(SemiStructuredUniqueIdGen, new test included.) Along with file numbers,
this enables SST unique IDs to be guaranteed unique among SSTs generated
in a single process, and "better than random" between processes.
See https://github.com/pdillinger/unique_id
* In addition to public API producing 'external' unique IDs, there is a function
for producing 'internal' unique IDs, with functions for converting between the
two. In short, the external ID is "safe" for things people might do with it, and
the internal ID enables more "power user" features for the future. Specifically,
the external ID goes through a hashing layer so that any subset of bits in the
external ID can be used as a hash of the full ID, while also preserving
uniqueness guarantees in the first 128 bits (bijective both on first 128 bits
and on full 192 bits).
Intended follow-up:
* Use the internal unique IDs in cache keys. (Avoid conflicts with https://github.com/facebook/rocksdb/issues/8912) (The file offset can be XORed into
the third 64-bit value of the unique ID.)
* Publish the external unique IDs in FileStorageInfo (https://github.com/facebook/rocksdb/issues/8968)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8990
Test Plan:
Unit tests added, and checking of unique ids in stress test.
NOTE in stress test we do not generate nearly enough files to thoroughly
stress uniqueness, but the test trims off pieces of the ID to check for
uniqueness so that we can infer (with some assumptions) stronger
properties in the aggregate.
Reviewed By: zhichao-cao, mrambacher
Differential Revision: D31582865
Pulled By: pdillinger
fbshipit-source-id: 1f620c4c86af9abe2a8d177b9ccf2ad2b9f48243
Summary:
Add a property_bag option in FileOptions for direct FileSystem users to pass custom properties to the provider in APIs such as NewRandomAccessFile, NewWritableFile etc. This field will be ignored/not populated by RocksDB.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9030
Reviewed By: zhichao-cao
Differential Revision: D31630643
Pulled By: anand1976
fbshipit-source-id: 1e1ddc5e2933ecada99a94eada5f309b674a03e8
Summary:
`dbs` should not be cleared, as it is reused later when reopening the DBs, so we have an out-of-bounds access with `dbnames[dbnum]`. The values left in the vector don't need to be reset, as the db pointer is an out parameter for `DB::Open`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9046
Reviewed By: pdillinger
Differential Revision: D31738263
Pulled By: ot
fbshipit-source-id: c619e947b8d3dbc3d896f29971f093d3e3c794d3
Summary:
If `DB::Close()` is called in multi-thread env, the resource
could be double released, which causes exception or assert.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8970
Test Plan:
Test with multi-thread benchmark, with each thread try to
close the DB at the end.
Reviewed By: pdillinger
Differential Revision: D31242042
Pulled By: jay-zhuang
fbshipit-source-id: a61276b1b61e07732e375554106946aea86a23eb
Summary:
WaitForFlushMemTable() may only wait for mem flush but not background flush
finishing. The the obsoleted file may not be purged yet.
fcaa7ff638/db/db_impl/db_impl_compaction_flush.cc (L2200-L2203)
Use WaitForCompact() instead to wait for background flush job.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9049
Test Plan: `gtest-parallel ./obsolete_files_test --gtest_filter=ObsoleteFilesTest.DeleteObsoleteOptionsFile -r 1000`
Reviewed By: zhichao-cao
Differential Revision: D31737343
Pulled By: jay-zhuang
fbshipit-source-id: 82276ebeae7c7c75a733d3e1fd1c130d45e4761f
Summary:
https://github.com/facebook/rocksdb/issues/8031 broke internal tests. This should fix but also preserve
the intended capability of getting git commit id when hg not used
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9047
Test Plan: already broken ¯\\_(ツ)_/¯
Reviewed By: zhichao-cao
Differential Revision: D31732198
Pulled By: pdillinger
fbshipit-source-id: 7dba8531ddca55a6de5e04978a1a1601aae4cee9
Summary:
Primarily, this change reserves space in the std::string for building
the next block once a block is finished, using `block_size` as
reservation size. Note: also tried reusing same std::string in the
common "unbuffered" path but that showed no benefit or regression.
Secondarily, this slightly reduces the work in resetting `restarts_`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9039
Test Plan:
TEST_TMPDIR=/dev/shm/rocksdb1 ./db_bench -benchmarks=fillseq -memtablerep=vector -allow_concurrent_memtable_write=false -num=50000000
Compiled with DEBUG_LEVEL=0
Test vs. control runs simulaneous for better accuracy, units = ops/sec
Run 1, Primary change only: 292697 vs. 280267 (+4.4%)
Run 2, Primary change only: 288763 vs. 279621 (+3.3%)
Run 1, Secondary change only: 260065 vs. 254232 (+2.3%)
Run 2, Secondary change only: 275925 vs. 272248 (+1.4%)
Run 1, Both changes: 284890 vs. 270372 (+5.3%)
Run 2, Both changes: 263511 vs. 258188 (+2.0%)
Reviewed By: zhichao-cao
Differential Revision: D31701253
Pulled By: pdillinger
fbshipit-source-id: 7e40810afbb98e6b6446955e77bda59e69b19ffd
Summary:
New classes FileStorageInfo and LiveFileStorageInfo and
'experimental' function DB::GetLiveFilesStorageInfo, which is intended
to largely replace several fragmented DB functions needed to create
checkpoints and backups.
This function is now used to create checkpoints and backups, because
it fixes many (probably not all) of the prior complexities of checkpoint
not having atomic access to DB metadata. This also ensures strong
functional test coverage of the new API. Specifically, much of the old
CheckpointImpl::CreateCustomCheckpoint has been migrated to and
updated in DBImpl::GetLiveFilesStorageInfo, with the former now
calling the latter.
Also, the class FileStorageInfo in metadata.h compatibly replaces
BackupFileInfo and serves as a new base class for SstFileMetaData.
Some old fields of SstFileMetaData are still provided (for now) but
deprecated.
Although FileStorageInfo::directory is accurate when using db_paths
and/or cf_paths, these have never been supported by Checkpoint
nor BackupEngine and still are not. This change does now detect
these cases and return NotSupported when appropriate. (More work
needed for support.)
Somehow this change broke ProgressCallbackDuringBackup, but
the progress_callback logic was dubious to begin with because it
would call the callback based on copy buffer size, not size actually
copied. Logic and test updated to track size actually copied
per-thread.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8968
Test Plan:
tests updated.
DB::GetLiveFilesStorageInfo mostly tested by use in CheckpointImpl.
DBTest.SnapshotFiles updated to also test GetLiveFilesStorageInfo,
including reading the data after DB close.
Added CheckpointTest.CheckpointWithDbPath (NotSupported).
Reviewed By: siying
Differential Revision: D31242045
Pulled By: pdillinger
fbshipit-source-id: b183d1ce9799e220daaefd6b3b5365d98de676c0
Summary:
The first parameter of SyncPoint::Process is "const std::string&". The majority, maybe all, of the actual calls to this function use a "const char *". The conversion before entering the function requires a construction of a std::string object on the heap. This std::object is then typically not needed because first use of the string is a rocksdb::Slice which has a less costly conversion of char * to slice.
Example:
We have a load and iterate test. The test loads 10m keys and iterates most via 10 rocksdb::Iterator objects. We used TCMALLOC to gather information about allocation and space usage during iterators.
- Before this PR: test took 32 min 17 sec
- After this PR: test took 1 min 14 sec
The TCMALLOC top object list before this PR:
<pre>
Total: 5105999 objects
5003717 98.0% 98.0% 5009471 98.1% rocksdb::DBIter::MergeValuesNewToOld (inline)
20260 0.4% 98.4% 20260 0.4% std::__cxx11::basic_string::_M_mutate
15214 0.3% 98.7% 15214 0.3% rocksdb::UncompressBlockContentsForCompressionType (inline)
13408 0.3% 99.0% 13408 0.3% std::_Rb_tree::_M_emplace_hint_unique [clone .constprop.416] (inline)
12957 0.3% 99.2% 12957 0.3% std::_Rb_tree::_M_emplace_hint_unique [clone .constprop.405] (inline)
9327 0.2% 99.4% 9327 0.2% std::_Rb_tree::_M_copy (inline)
7691 0.2% 99.5% 9919 0.2% JVM_FindSignal
2859 0.1% 99.6% 2859 0.1% rocksdb::Cleanable::RegisterCleanup
2844 0.1% 99.7% 2844 0.1% std::map::operator[] (inline)
</pre>
The "MergeValuesNewToOld (inline)" objects are the #define wrappers to SyncPoint::Process. We discovered this in a 5.18 rocksdb release. There TCMALLOC was more specific that std::basic_string was being constructed. I believe that was before SyncPoint::Process was declared inline in subsequent releases.
The TCMALLOC top object list after this PR:
<pre>
Total: 104911 objects
45090 43.0% 43.0% 45090 43.0% rocksdb::Cleanable::RegisterCleanup
29995 28.6% 71.6% 29995 28.6% rocksdb::LRUCacheShard::Insert
15229 14.5% 86.1% 15229 14.5% rocksdb::UncompressBlockContentsForCompressionType (inline)
4373 4.2% 90.3% 4551 4.3% JVM_FindSignal
2881 2.7% 93.0% 2881 2.7% rocksdb::::ReadBlockFromFile (inline)
1162 1.1% 94.1% 1176 1.1% rocksdb::BlockFetcher::ReadBlockContents (inline)
1036 1.0% 95.1% 1036 1.0% std::__cxx11::basic_string::_M_mutate
869 0.8% 95.9% 869 0.8% std::vector::_M_realloc_insert (inline)
806 0.8% 96.7% 806 0.8% SnmpAgent::GetVariables (inline)
</pre>
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9023
Reviewed By: pdillinger
Differential Revision: D31610907
Pulled By: mrambacher
fbshipit-source-id: 574ff51b639dd46ad253a8e664a575f06b7cc85d
Summary:
EncryptedWritableFile is derived from FSWritableFile, which implements
the `IsSyncThreadSafe()` function as
bool IsSyncThreadSafe() const { return false; }
EncryptedWritableFile does not override this method from the base class,
so the `IsSyncThreadSafe()` function on an EncryptedWritableFile will
always return false.
This change adds an override of `IsSyncThreadSafe()` to
EncryptedWritableFile so that the latter will now ask its underlying
`file_` object for the thread-safety of sync operations.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8993
Reviewed By: jay-zhuang
Differential Revision: D31613123
Pulled By: ajkr
fbshipit-source-id: b18625e21a9911744eef3215c29913490e4b6001
Summary:
`valueIndexMap_` in histogram is redundant and search in `valueIndexMap_` is slower than search in `bucketValues_`.
this PR delete `valueIndexMap_` and search in `bucketValues_` by `std::lower_bound`
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8625
Reviewed By: zhichao-cao
Differential Revision: D31613386
Pulled By: ajkr
fbshipit-source-id: d7415d724f5c8f41f80cbe82afd7467cfad6f009
Summary:
I get `clang-format-diff` after running `apt install clang-format` on Ubuntu instead of `clang-format-diff.py`. So I think it makes sense to make the format script compatible with this behavior.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9028
Reviewed By: ajkr
Differential Revision: D31634041
Pulled By: jay-zhuang
fbshipit-source-id: b936de791ddcafa6ff304039ef33936e1e04864d
Summary:
closes https://github.com/facebook/rocksdb/issues/8039
Unnecessary use of multiple local JNI references at the same time, 1 per key, was limiting the size of the key array. The local references don't need to be held simultaneously, so if we rearrange the code we can make it work for bigger key arrays.
Incidentally, make errors throw helpful exception messages rather than returning a null pointer.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9012
Reviewed By: mrambacher
Differential Revision: D31580862
Pulled By: jay-zhuang
fbshipit-source-id: ce05831d52ede332e1b20e74d2dc621d219b9616
Summary:
Set the perf_level in ```tools/regression_test.sh``` in order to exercise ```PerfContext``` counters in regression tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8031
Test Plan: Manually run the script
Reviewed By: ajkr
Differential Revision: D31508269
Pulled By: anand1976
fbshipit-source-id: 20ddfd1cbca37f1439eed2870086a86d90653b44
Summary:
The code in `IngestExternalFiles()` that bumps the DB's sequence number
depending on what seqnos were assigned to the files has 3 bugs:
1) There is an assertion that the sequence number is increased in all the
affected column families, but this is unnecessary, it is fine if some files can
stick to a lower sequence number. It is very easy to hit the assertion: it is
sufficient to insert 2 files in 2 CFs, one which overlaps the CF and one that
doesn't (for example the CF is empty). The line added in the
`IngestFilesIntoMultipleColumnFamilies_Success` test makes the assertion fail.
2) SetLastSequence() is called with the sum of all the bumps across CFs, but we
should take the maximum instead, as all CFs start with the current seqno and bump
it independently.
3) The code above is accidentally under a `#ifndef NDEBUG`, so it doesn't run in
optimized builds, so some files may be assigned seqnos from the future.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9005
Test Plan:
Added line in `IngestFilesIntoMultipleColumnFamilies_Success` that
triggers the assertion, verified that the test (and all the others) pass after
the fix.
Reviewed By: ajkr
Differential Revision: D31597892
Pulled By: ot
fbshipit-source-id: c2d3237f90290df1178736ace8653a9623f5a770
Summary:
The patch adds a new BlobDB benchmarking script called `run_blob_bench.sh`.
It is a thin wrapper around `benchmark.sh` (similarly to `run_flash_bench.sh`):
it actually calls `benchmark.sh` a number of times, cycling through six workloads,
two write-only ones (bulk load and overwrite), two read/write ones (point lookups
while writing, range scans while writing), and two read-only ones (point lookups
and range scans).
Note: this is a simpler/cleaned up/reworked version of the script used to produce the
benchmark results in http://rocksdb.org/blog/2021/05/26/integrated-blob-db.html .
The new version takes advantage of several recent `benchmark.sh` improvements
like the ability to pass in arbitrary `db_bench` options or the possibility of using a
job ID.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9015
Test Plan: Ran the script manually with different parameter combinations.
Reviewed By: riversand963
Differential Revision: D31555277
Pulled By: ltamasi
fbshipit-source-id: 0e151b2f7b2cf6f66ed7f95455571492ad7ea87f
Summary:
EndWriteStall has a data race: `queue_.empty()` is checked outside of the
mutex, so once we enter the critical section another thread may already have
cleared the list, and accessing the `front()` is undefined behavior (and causes
interesting crashes under high concurrency).
This PR fixes the bug, and also rewrites the logic to make it easier to reason
about it. It also fixes another subtle bug: if some writers are stalled and
`SetBufferSize(0)` is called, which disables the WBM, the writer are not
unblocked because of an early `enabled()` check in `EndWriteStall()`.
It doesn't significantly change the locking behavior, as before writers won't
lock unless entering a stall condition, and `FreeMem` almost always locks if
stalling is allowed, but that is inevitable with the current design. Liveness is
guaranteed by the fact that if some writes are blocked, eventually all writes
will be blocked due to `stall_active_`, and eventually all memory is freed.
While at it, do a couple of optimizations:
- In `WBMStallInterface::Signal()` signal the CV only after releasing the
lock. Signaling under the lock is a common pitfall, as it causes the woken-up
thread to immediately go back to sleep because the mutex is still locked by
the awaker.
- Move all allocations and deallocations outside of the lock.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9009
Test Plan:
```
USE_CLANG=1 make -j64 all check
```
Reviewed By: akankshamahajan15
Differential Revision: D31550668
Pulled By: ot
fbshipit-source-id: 5125387c3dc7ecaaa2b8bbc736e58c4156698580
Summary:
The current BlobDB garbage collection logic works by relocating the valid
blobs from the oldest blob files as they are encountered during compaction,
and cleaning up blob files once they contain nothing but garbage. However,
with sufficiently skewed workloads, it is theoretically possible to end up in a
situation when few or no compactions get scheduled for the SST files that contain
references to the oldest blob files, which can lead to increased space amp due
to the lack of GC.
In order to efficiently handle such workloads, the patch adds a new BlobDB
configuration option called `blob_garbage_collection_force_threshold`,
which signals to BlobDB to schedule targeted compactions for the SST files
that keep alive the oldest batch of blob files if the overall ratio of garbage in
the given blob files meets the threshold *and* all the given blob files are
eligible for GC based on `blob_garbage_collection_age_cutoff`. (For example,
if the new option is set to 0.9, targeted compactions will get scheduled if the
sum of garbage bytes meets or exceeds 90% of the sum of total bytes in the
oldest blob files, assuming all affected blob files are below the age-based cutoff.)
The net result of these targeted compactions is that the valid blobs in the oldest
blob files are relocated and the oldest blob files themselves cleaned up (since
*all* SST files that rely on them get compacted away).
These targeted compactions are similar to periodic compactions in the sense
that they force certain SST files that otherwise would not get picked up to undergo
compaction and also in the sense that instead of merging files from multiple levels,
they target a single file. (Note: such compactions might still include neighboring files
from the same level due to the need of having a "clean cut" boundary but they never
include any files from any other level.)
This functionality is currently only supported with the leveled compaction style
and is inactive by default (since the default value is set to 1.0, i.e. 100%).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8994
Test Plan: Ran `make check` and tested using `db_bench` and the stress/crash tests.
Reviewed By: riversand963
Differential Revision: D31489850
Pulled By: ltamasi
fbshipit-source-id: 44057d511726a0e2a03c5d9313d7511b3f0c4eab
Summary:
`FaultInjectionTest{Env,FS}::ReopenWritableFile()` functions were accidentally deleting WALs from previous `db_stress` runs causing verification to fail. They were operating under the assumption that `ReopenWritableFile()` would delete any existing file. It was a reasonable assumption considering the `{Env,FileSystem}::ReopenWritableFile()` documentation stated that would happen. The only problem was neither the implementations we offer nor the "real" clients in RocksDB code followed that contract. So, this PR updates the contract as well as fixing the fault injection client usage.
The fault injection change exposed that `ExternalSSTFileBasicTest.SyncFailure` was relying on a fault injection `Env` dropping unsynced data written by a regular `Env`. I changed that test to make its `SstFileWriter` use fault injection `Env`, and also implemented `LinkFile()` in fault injection so the unsynced data is tracked under the new name.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8995
Test Plan:
- Verified it fixes the following failure:
```
$ ./db_stress --clear_column_family_one_in=0 --column_families=1 --db=/dev/shm/rocksdb_crashtest_whitebox --delpercent=5 --expected_values_dir=/dev/shm/rocksdb_crashtest_expected --iterpercent=0 --key_len_percent_dist=1,30,69 --max_key=100000 --max_key_len=3 --nooverwritepercent=1 --ops_per_thread=1000 --prefixpercent=0 --readpercent=60 --reopen=0 --target_file_size_base=1048576 --test_batches_snapshots=0 --write_buffer_size=1048576 --writepercent=35 --value_size_mult=33 -threads=1
...
$ ./db_stress --avoid_flush_during_recovery=1 --clear_column_family_one_in=0 --column_families=1 --db=/dev/shm/rocksdb_crashtest_whitebox --delpercent=5 --destroy_db_initially=0 --expected_values_dir=/dev/shm/rocksdb_crashtest_expected --iterpercent=10 --key_len_percent_dist=1,30,69 --max_bytes_for_level_base=4194304 --max_key=100000 --max_key_len=3 --nooverwritepercent=1 --open_files=-1 --open_metadata_write_fault_one_in=8 --open_write_fault_one_in=16 --ops_per_thread=1000 --prefix_size=-1 --prefixpercent=0 --readpercent=50 --sync=1 --target_file_size_base=1048576 --test_batches_snapshots=0 --write_buffer_size=1048576 --writepercent=35 --value_size_mult=33 -threads=1
...
Verification failed for column family 0 key 000000000000001300000000000000857878787878 (1143): Value not found: NotFound:
Crash-recovery verification failed :(
...
```
- `make check -j48`
Reviewed By: ltamasi
Differential Revision: D31495388
Pulled By: ajkr
fbshipit-source-id: 7886ccb6a07cb8b78ad7b6c1c341ccf40bb68385
Summary:
Add the file temperature to `IngestExternalFileArg` such that when SST files are ingested, user is able to assign the temperature to each SST file. If the temperature vector is empty or its size does not match the file name vector size, all ingested SST files will be assigned with `Temperature::unKnown`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8949
Test Plan: add the new test and make check
Reviewed By: siying
Differential Revision: D31127852
Pulled By: zhichao-cao
fbshipit-source-id: 141a81f0f7b473d88f4ab0cb2a21a114cbc6f83c
Summary:
Instead of hardcoding the stress test type and some args, allow it to be passed through env variables.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8985
Reviewed By: akankshamahajan15
Differential Revision: D31349495
Pulled By: anand1976
fbshipit-source-id: 585c8fcb0232d0a95925b1a8c4e42a0940227e8b
Summary:
I was looking at https://github.com/facebook/rocksdb/issues/2636 and got very confused that `MergingIterator::AddIterator()` is populating `min_heap_` with dangling pointers. There is justification in the comments that `min_heap_` will be cleared before it's used, but it'd be cleaner to not populate it with dangling pointers in the first place. Also made similar change in the constructor for consistency, although the pointers there would not be dangling, just unused.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8975
Test Plan: rely on existing tests
Reviewed By: pdillinger, hx235
Differential Revision: D31273767
Pulled By: ajkr
fbshipit-source-id: 127ca9dd1f82f77f55dd0c3f19511de3282fc229
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8991
Test Plan: the new test hangs forever without this fix and passes with this fix.
Reviewed By: hx235
Differential Revision: D31456419
Pulled By: ajkr
fbshipit-source-id: a82c0e5560b6e6153089dccd8e46163c61b07bff
Summary:
This change introduces warnings instead of a silent override when trying to use level_compaction_dynamic_level_bytes with multiple cf_paths/db_paths.
I have completed the CLA.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8329
Reviewed By: hx235
Differential Revision: D31399713
Pulled By: ajkr
fbshipit-source-id: 29c6fe5258d1f739b4590ecd44aee44f55415595
Summary:
For tiered storage project, we need to know the block read count and read bytes of files with different temperature. Add FileIOByTemperature to IOStatsContext and collect the bytes read and read count from different temperature files through the RandomAccessFileReader.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8710
Test Plan: make check, add the testing cases
Reviewed By: siying
Differential Revision: D30582400
Pulled By: zhichao-cao
fbshipit-source-id: d83173de594374fc8404af5ce93a6a9be72c7141
Summary:
Background: Cache warming up will cause potential read performance degradation due to reading blocks from storage to the block cache. Since in production, the workload and access pattern to a certain DB is stable, it is a potential solution to dump out the blocks belonging to a certain DB to persist storage (e.g., to a file) and bulk-load the blocks to Secondary cache before the DB is relaunched. For example, when migrating a DB form host A to host B, it will take a short period of time, the access pattern to blocks in the block cache will not change much. It is efficient to dump out the blocks of certain DB, migrate to the destination host and insert them to the Secondary cache before we relaunch the DB.
Design: we introduce the interface of CacheDumpWriter and CacheDumpRead for user to store the blocks dumped out from block cache. RocksDB will encode all the information and send the string to the writer. User can implement their own writer it they want. CacheDumper and CacheLoad are introduced to save the blocks and load the blocks respectively.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8912
Test Plan: add new tests to lru_cache_test and pass make check.
Reviewed By: pdillinger
Differential Revision: D31452871
Pulled By: zhichao-cao
fbshipit-source-id: 11ab4f5d03e383f476947116361d54188d36ec48
Summary:
- Update few stale GitHub wiki link references from rocksdb.org
- Update the API comments for ignore_range_deletions
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8983
Reviewed By: ajkr
Differential Revision: D31355965
Pulled By: ramvadiv
fbshipit-source-id: 245ac4a6913976dd82afa308bc4aae6bff3d788c
Summary:
There were three implementations of VectorIterator (util/vector_iterator, test_util/testutil.h and LoggingForwardVectorIterator). Merged them into one class to increase code coverage/testing and reduce duplication.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8901
Reviewed By: pdillinger
Differential Revision: D31022673
Pulled By: mrambacher
fbshipit-source-id: 8e3acbd2dfd60b4df609d02cc72846de2389d531
Summary:
This change adds File IO Notifications to the SequentialFileReader The SequentialFileReader is extended
with a listener parameter.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8982
Test Plan:
A new test EventListenerTest::OnWALOperationTest has been added. The
test verifies that during restore the sequential file reader is called
and the notifications are fired.
Reviewed By: riversand963
Differential Revision: D31320844
Pulled By: shrfb
fbshipit-source-id: 040b24da7c010d7c14ebb5c6460fae9a19b8c168
Summary:
On MacOS, there were errors building in LITE mode related to unused private member variables:
In file included from ./db/compaction/compaction_job.h:20:
./db/blob/blob_file_completion_callback.h:87:19: error: private field ‘sst_file_manager_’ is not used [-Werror,-Wunused-private-field]
SstFileManager* sst_file_manager_;
^
./db/blob/blob_file_completion_callback.h:88:22: error: private field ‘mutex_’ is not used [-Werror,-Wunused-private-field]
InstrumentedMutex* mutex_;
^
./db/blob/blob_file_completion_callback.h:89:17: error: private field ‘error_handler_’ is not used [-Werror,-Wunused-private-field]
ErrorHandler* error_handler_;
This PR resolves those build issues by removing the values as members in LITE mode and fixing the constructor to ignore the input values in LITE mode (otherwise we get unused parameter warnings).
Tested by validating compiles without warnings.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8981
Reviewed By: akankshamahajan15
Differential Revision: D31320141
Pulled By: mrambacher
fbshipit-source-id: d67875ebbd39a9555e4f09b2d37159566dd8a085
Summary:
With test sync points, we can assert on the equality of iterator value in three existing
unit tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8973
Test Plan:
```
gtest-parallel -r 1000 ./db_test2 --gtest_filter=DBTest2.IterRaceFlush2:DBTest2.IterRaceFlush1:DBTest2.IterRefreshRaceFlush
```
make check
Reviewed By: akankshamahajan15
Differential Revision: D31256340
Pulled By: riversand963
fbshipit-source-id: a9440767ab383e0ec61bd43ffa8fbec4ba562ea2
Summary:
Enable SingleDelete with user defined timestamp in db_bench,
db_stress and crash test
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8971
Test Plan:
1. For db_stress, ran the command for full duration: i) python3 -u tools/db_crashtest.py
--enable_ts whitebox --nooverwritepercent=100
ii) make crash_test_with_ts
2. For db_bench, ran: ./db_bench -benchmarks=randomreplacekeys
-user_timestamp_size=8 -use_single_deletes=true
Reviewed By: riversand963
Differential Revision: D31246558
Pulled By: akankshamahajan15
fbshipit-source-id: 29cd8740c9921341e52f09242fca3c44d75a12b7
Summary:
IOSTATS_ADD_IF_POSITIVE() doesn't seem to a macro that aims to improve performance but does the opposite. The counter to add is almost always positive so the if is just a waste. Furthermore, adding to a thread local variable seemse to be much cheaper than an if condition if branch prediction has a possibility to be wrong. Remove the macro.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8984
Test Plan: See CI completes.
Reviewed By: anand1976
Differential Revision: D31348163
fbshipit-source-id: 30af6d45e1aa8bbc09b2c046206cce6f67f4777a
Summary:
The ldb list_live_files_metadata command does not print any information about blob files currently. We would like to add this functionality. Note that list_live_files_metadata has two different modes of operation: the one shown above, which shows the LSM tree structure, and another one, which can be enabled using the flag --sort_by_filename and simply lists the files in numerical order regardless of level. We would like to show blob files in both modes.
Changes:
1. Using GetAllColumnFamilyMetaData API instead of GetLiveFilesMetaData API for fetching live files data.
Testing:
1. Created a sample rocksdb instance using dbbench command (this creates both SST and blob files)
2. Checked if the blob files are listed or not by using ldb commands.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8976
Reviewed By: ltamasi
Differential Revision: D31316061
Pulled By: pradeepambati
fbshipit-source-id: d15cdea192febf7a45f28deee2ba40615d3d84ab