Summary:
* Basic handling of SST file with just range tombstones rather than
failing assertion about smallest_seqno <= largest_seqno
* Adds --verbose option so that there exists a way to see the INFO
output from Repairer.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8544
Test Plan: unit test added, manual testing for --verbose
Reviewed By: ajkr
Differential Revision: D29954805
Pulled By: pdillinger
fbshipit-source-id: 696af25805fc36cc178b04ba6045922a22625fd9
Summary:
Add an argument to ldb to dump live file names, column families, and levels, `list_live_files_metadata`. The output shows all active SST file names, sorted first by column family and then by level. For each level the SST files are sorted alphabetically.
Typically, the output looks like this:
```
./ldb --db=/tmp/test_db list_live_files_metadata
Live SST Files:
===== Column Family: default =====
---------- level 0 ----------
/tmp/test_db/000069.sst
---------- level 1 ----------
/tmp/test_db/000064.sst
/tmp/test_db/000065.sst
/tmp/test_db/000066.sst
/tmp/test_db/000071.sst
---------- level 2 ----------
/tmp/test_db/000038.sst
/tmp/test_db/000039.sst
/tmp/test_db/000052.sst
/tmp/test_db/000067.sst
/tmp/test_db/000070.sst
------------------------------
```
Second, a flag was added `--sort_by_filename`, to change the layout of the output. When this flag is added to the command, the output shows all active SST files sorted by name, in front of which the LSM level and the column family are mentioned. With the same example, the following command would return:
```
./ldb --db=/tmp/test_db list_live_files_metadata --sort_by_filename
Live SST Files:
/tmp/test_db/000038.sst : level 2, column family 'default'
/tmp/test_db/000039.sst : level 2, column family 'default'
/tmp/test_db/000052.sst : level 2, column family 'default'
/tmp/test_db/000064.sst : level 1, column family 'default'
/tmp/test_db/000065.sst : level 1, column family 'default'
/tmp/test_db/000066.sst : level 1, column family 'default'
/tmp/test_db/000067.sst : level 2, column family 'default'
/tmp/test_db/000069.sst : level 0, column family 'default'
/tmp/test_db/000070.sst : level 2, column family 'default'
/tmp/test_db/000071.sst : level 1, column family 'default'
------------------------------
```
Thus, the user can either request to show the files by levels, or sorted by filenames.
This PR includes a simple Python unit test that makes sure the file name and level printed out by this new feature matches the one found with an existing feature, `dump_live_file`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8446
Reviewed By: akankshamahajan15
Differential Revision: D29320080
Pulled By: bjlemaire
fbshipit-source-id: 01fb7b5637c59010d74c80730a28d815994e7009
Summary:
At the moment, the following command : "`./ --db=mypath/ dump_file_files`" returns a series of erronous names with double slashes, ie: "`mypath//000xxx.sst`", including manifest file names with double slashes "`mypath//MANIFEST-00XXX`", whereas "`./ --db=mypath dump_file_files`" correctly returns "`mypath/000xxx.sst`" and "`mypath/MANIFEST-00XXX`".
This (very short) PR simply checks if there is a need to add or remove any '`/`' character when the `db_path` and `manifest_filename`/sst `filenames` are concatenated.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8439
Reviewed By: akankshamahajan15
Differential Revision: D29301349
Pulled By: bjlemaire
fbshipit-source-id: 3e9e58f9749d278b654ae838fcee13ad698705a8
Summary:
- Added CreateFromString method to Env and FilesSystem to replace LoadEnv/Load. This method/signature is a precursor to making these classes extend Customizable.
- Added CreateFromSystem to Env. This method standardizes creating an Env from the environment variables. Previously, some places would check TEST_ENV_URI and others would also check TEST_FS_URI. Now the code is more command/standardized.
- Added CreateFromFlags to Env. These method allows Env to be create from string options (such as GFLAGS options) in a more standard way.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8174
Reviewed By: zhichao-cao
Differential Revision: D28999603
Pulled By: mrambacher
fbshipit-source-id: 88e6911e7e91f908458a7fe10a20e93ecbc275fb
Summary:
Currently, we either use the file system inode or a monotonically incrementing runtime ID as the block cache key prefix. However, if we use a monotonically incrementing runtime ID (in the case that the file system does not support inode id generation), in some cases, it cannot ensure uniqueness (e.g., we have secondary cache migrated from host to host). We use DbSessionID (20 bytes) + current file number (at most 10 bytes) as the new cache block key prefix when the secondary cache is enabled. So can accommodate scenarios such as transfer of cache state across hosts.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8360
Test Plan: add the test to lru_cache_test
Reviewed By: pdillinger
Differential Revision: D29006215
Pulled By: zhichao-cao
fbshipit-source-id: 6cff686b38d83904667a2bd39923cd030df16814
Summary:
Currently, a few ldb commands do not check the execution result of
database operations. This PR checks the execution results and tries to
improve the error reporting.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8072
Test Plan:
```
make check
```
and
```
ASSERT_STATUS_CHECKED=1 make -j20 ldb
python tools/ldb_test.py
```
Reviewed By: zhichao-cao
Differential Revision: D27152466
Pulled By: riversand963
fbshipit-source-id: b94220496a4b3591b61c1d350f665860a6579f30
Summary:
This PR adds support for custom file systems to ldb and sst_dump by adding command line options for specifying --fs_uri and --backup_fs uri (for ldb backup/restore commands). fs_uri is already supported in db_bench and db_stress, and there is already support in ldb and db stress for specifying customized envs.
The PR also fixes what looks like a bug in the ldb backup/restore commands. As it is right now, backups can only be made from and to the same environment/file system which does not seem to be the intended behavior. This PR makes it possible to do/restore backups between different envs/file systems.
Example:
`./ldb backup --fs_uri=zenfs://dev:nvme2n1 --backup_fs_uri=posix:// --backup_dir=/tmp/my_rocksdb_backup --db=rocksdbtest/dbbench
`
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8010
Reviewed By: jay-zhuang
Differential Revision: D26904654
Pulled By: ajkr
fbshipit-source-id: 9b695ed8b944fcc6b27c4daaa9f52e87ee2c1fb4
Summary:
Removed the uses of the Legacy FileWrapper classes from the source code. The wrappers were creating an additional layer of indirection/wrapping, as the Env already has a FileSystem.
Moved the Custom FileWrapper classes into the CustomEnv, as these classes are really for the private use the the CustomEnv class.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7851
Reviewed By: anand1976
Differential Revision: D26114816
Pulled By: mrambacher
fbshipit-source-id: db32840e58d969d3a0fa6c25aaf13d6dcdc74150
Summary:
When the --try_load_options is used in conjunction with the
--column_family option, ldb incorrectly sets the ColumnFamilyOptions for
that column family to defaults. This PR fixes that by retaining from the
OPTIONS file and applying command line overrides.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7847
Test Plan: Add a unit test in ldb_cmd_test
Reviewed By: ajkr
Differential Revision: D25874720
Pulled By: anand1976
fbshipit-source-id: 04bcf23b55e5a30b5b6a59b0e5cb4faef3da7429
Summary:
Prior to this PR it prints the raw bytes which can include non-printable
characters. This PR adds the option to print in hex instead.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7820
Test Plan:
try it out
```
$ ./ldb file_checksum_dump --hex --db=/tmp/rocksdbtest-9383//db_basic_test_12281129388755189514/
16, FileChecksumCrc32c, 0xC789D948
```
Reviewed By: jay-zhuang
Differential Revision: D25738072
Pulled By: ajkr
fbshipit-source-id: 8cf2856877971756c0495cfa63a9a1281c414dc7
Summary:
So that we can more easily get aggregate live table data such
as total filter, index, and data sizes.
Also adds ldb support for getting properties
Also fixed some missing/inaccurate related comments in db.h
For example:
$ ./ldb --db=testdb get_property rocksdb.aggregated-table-properties
rocksdb.aggregated-table-properties.data_size: 102871
rocksdb.aggregated-table-properties.filter_size: 0
rocksdb.aggregated-table-properties.index_partitions: 0
rocksdb.aggregated-table-properties.index_size: 2232
rocksdb.aggregated-table-properties.num_data_blocks: 100
rocksdb.aggregated-table-properties.num_deletions: 0
rocksdb.aggregated-table-properties.num_entries: 15000
rocksdb.aggregated-table-properties.num_merge_operands: 0
rocksdb.aggregated-table-properties.num_range_deletions: 0
rocksdb.aggregated-table-properties.raw_key_size: 288890
rocksdb.aggregated-table-properties.raw_value_size: 198890
rocksdb.aggregated-table-properties.top_level_index_size: 0
$ ./ldb --db=testdb get_property rocksdb.aggregated-table-properties-at-level1
rocksdb.aggregated-table-properties-at-level1.data_size: 80909
rocksdb.aggregated-table-properties-at-level1.filter_size: 0
rocksdb.aggregated-table-properties-at-level1.index_partitions: 0
rocksdb.aggregated-table-properties-at-level1.index_size: 1787
rocksdb.aggregated-table-properties-at-level1.num_data_blocks: 81
rocksdb.aggregated-table-properties-at-level1.num_deletions: 0
rocksdb.aggregated-table-properties-at-level1.num_entries: 12466
rocksdb.aggregated-table-properties-at-level1.num_merge_operands: 0
rocksdb.aggregated-table-properties-at-level1.num_range_deletions: 0
rocksdb.aggregated-table-properties-at-level1.raw_key_size: 238210
rocksdb.aggregated-table-properties-at-level1.raw_value_size: 163414
rocksdb.aggregated-table-properties-at-level1.top_level_index_size: 0
$
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7779
Test Plan: Added a test to ldb_test.py
Reviewed By: jay-zhuang
Differential Revision: D25653103
Pulled By: pdillinger
fbshipit-source-id: 2905469a08a64dd6b5510cbd7be2e64d3234d6d3
Summary:
This is a PR generated **semi-automatically** by an internal tool to remove unused includes and `using` statements.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7604
Test Plan: make check
Reviewed By: ajkr
Differential Revision: D24579392
Pulled By: riversand963
fbshipit-source-id: c4bfa6c6b08da1de186690d37eb73d8fff45aecd
Summary:
As suggested by pdillinger ,The name of kLogFile is misleading, in some tests, kLogFile is defined as info log. Replace it with kWalFile and move it to public, which will be used in https://github.com/facebook/rocksdb/issues/7523
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7580
Test Plan: make check
Reviewed By: riversand963
Differential Revision: D24485420
Pulled By: zhichao-cao
fbshipit-source-id: 955e3dacc1021bb590fde93b0a568ffe9ad80799
Summary:
This is adapted from https://github.com/facebook/rocksdb/issues/6678 but takes a different approach, avoiding opening a read-write DB and avoiding the `DeleteFile()` API.
First, this PR refactors how options variables are initialized in `ldb` so it can be reused in a subcommand that doesn't open a DB:
- Separated remaining option initialization logic out of `OpenDB()`. The new `PrepareOptions()` function initializes the full options state.
- Fixed an old TODO about applying the subcommand CF option overrides to the proper `ColumnFamilyOptions` object.
Second, this PR adds the `ldb unsafe_remove_sst_file` subcommand. It uses the `VersionSet`-level APIs to remove the file with the specified number.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7335
Test Plan: played with interactive python and this file removal command. Verified openability/correct results in case of multiple column families, multiple levels, etc.
Reviewed By: pdillinger
Differential Revision: D23454575
Pulled By: ajkr
fbshipit-source-id: 039b7a8cbfc42fd123dcb25821eef51d61148afe
Summary:
As part of the IOTracing project, this PR
1. Caches "FileSystemPtr" object(wrapper class that returns file system pointer based on tracing enabled) instead of "FileSystem" pointer.
2. FileSystemPtr object is created using FileSystem pointer and IOTracer
pointer.
3. IOTracer shared_ptr is created in DBImpl and it is passed to different classes through constructor.
4. When tracing is enabled through DB::StartIOTrace, FileSystemPtr
returns FileSystemTracingWrapper pointer for tracing purpose and when
it is disabled underlying FileSystem pointer is returned.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7180
Test Plan:
make check -j64
COMPILE_WITH_TSAN=1 make check -j64
Reviewed By: anand1976
Differential Revision: D22987117
Pulled By: akankshamahajan15
fbshipit-source-id: 6073617e4c2d5bc363914f3a1f55ae3b0a58fbf1
Summary:
`BackupableDBOptions::new_naming_for_backup_files` is added. This option is false by default. When it is true, backup table filenames under directory shared_checksum are of the form `<file_number>_<crc32c>_<db_session_id>.sst`.
Note that when this option is true, it comes into effect only when both `share_files_with_checksum` and `share_table_files` are true.
Three new test cases are added.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6997
Test Plan: Passed make check.
Reviewed By: ajkr
Differential Revision: D22098895
Pulled By: gg814
fbshipit-source-id: a1d9145e7fe562d71cde7ac995e17cb24fd42e76
Summary:
The LDB create and drop column family commands failed to check if theere was a valid database prior to dereferencing it, leading to a core dump.
The SstFileDumper prefetch code would dereference a file when the file did not exist as part of the Prefetch code. This dereference was moved inside an st.ok() check.
Tests were added for both failure conditions.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6922
Reviewed By: gg814
Differential Revision: D21884024
Pulled By: pdillinger
fbshipit-source-id: bddd45c299aa9dc7e928c17a37a96521f8c9149e
Summary:
DB::OpenForReadOnly will not write anything to the file system (i.e., create directories or files for the DB) unless create_if_missing is true.
This change also fixes some subcommands of ldb, which write to the file system even if the purpose is for readonly.
Two tests for this updated behavior of DB::OpenForReadOnly are also added.
Other minor changes:
1. Updated HISTORY.md to include this API change of DB::OpenForReadOnly;
2. Updated the help information for the put and batchput subcommands of ldb with the option [--create_if_missing];
3. Updated the comment of Env::DeleteDir to emphasize that it returns OK only if the directory to be deleted is empty.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6900
Test Plan: passed make check; also manually tested a few ldb subcommands
Reviewed By: pdillinger
Differential Revision: D21822188
Pulled By: gg814
fbshipit-source-id: 604cc0f0d0326a937ee25a32cdc2b512f9a3be6e
Summary:
sst_dump can issue many file reads from the file system. This doesn't work well with file systems without a OS cache, especially remote file systems. In order to mitigate this problem, several improvements are done:
1. --readahead_size is added, so that users can specify readahead size when scanning the data.
2. Force a 512KB tail readahead, which prevents three I/Os for footer, meta index and property blocks and hopefully index and filter blocks too.
3. Consoldiate SSTDump's I/Os before opening the file for read. Use the same file prefetch buffer.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6836
Test Plan: Add a test that covers this new feature.
Reviewed By: pdillinger
Differential Revision: D21516607
fbshipit-source-id: 3ae43526286f67b2f4a5bdedfbc92719d579b87e
Summary:
When using ldb, users cannot turn on force consistency check in most commands, while they cannot use checksonsistnecy with --try_load_options. The change fixes both by:
1. checkconsistency now calls OpenDB() so that it gets all the options loading and sanitized options logic
2. use options.check_consistency_checks = true by default, and add a --disable_consistency_checks to turn it off.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6802
Test Plan: Add a new unit test. Some manual tests with corrupted DBs.
Reviewed By: pdillinger
Differential Revision: D21388051
fbshipit-source-id: 8d122732d391b426e3982a1c3232a8e3763ffad0
Summary:
The dynamic_cast in the filter benchmark causes release mode to fail due to
no-rtti. Replace with static_cast_with_check.
Signed-off-by: Derrick Pallas <derrick@pallas.us>
Addition by peterd: Remove unnecessary 2nd template arg on all static_cast_with_check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6732
Reviewed By: ltamasi
Differential Revision: D21304260
Pulled By: pdillinger
fbshipit-source-id: 6e8eb437c4ca5a16dbbfa4053d67c4ad55f1608c
Summary:
The methods in convenience.h are used to compare/convert objects to/from strings. There is a mishmash of parameters in use here with more needed in the future. This PR replaces those parameters with a single structure.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6389
Reviewed By: siying
Differential Revision: D21163707
Pulled By: zhichao-cao
fbshipit-source-id: f807b4cc7e2b0af3871536b69546b2604dfa81bd
Summary:
In the current implementation, sst file checksum is calculated by a shared checksum function object, which may make some checksum function hard to be applied here such as SHA1. In this implementation, each sst file will have its own checksum generator obejct, created by FileChecksumGenFactory. User needs to implement its own FilechecksumGenerator and Factory to plugin the in checksum calculation method.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6600
Test Plan: tested with make asan_check
Reviewed By: riversand963
Differential Revision: D20717670
Pulled By: zhichao-cao
fbshipit-source-id: 2a74c1c280ac11a07a1980185b43b671acaa71c6
Summary:
The current Env/FileSystem API separation has a couple of issues -
1. It requires the user to specify 2 options - ```Options::env``` and ```Options::file_system``` - which means they have to make code changes to benefit from the new APIs. Furthermore, there is a risk of accessing the same APIs in two different ways, through Env in the old way and through FileSystem in the new way. The two may not always match, for example, if env is ```PosixEnv``` and FileSystem is a custom implementation. Any stray RocksDB calls to env will use the ```PosixEnv``` implementation rather than the file_system implementation.
2. There needs to be a simple way for the FileSystem developer to instantiate an Env for backward compatibility purposes.
This PR solves the above issues and simplifies the migration in the following ways -
1. Embed a shared_ptr to the ```FileSystem``` in the ```Env```, and remove ```Options::file_system``` as a configurable option. This way, no code changes will be required in application code to benefit from the new API. The default Env constructor uses a ```LegacyFileSystemWrapper``` as the embedded ```FileSystem```.
1a. - This also makes it more robust by ensuring that even if RocksDB
has some stray calls to Env APIs rather than FileSystem, they will go
through the same object and thus there is no risk of getting out of
sync.
2. Provide a ```NewCompositeEnv()``` API that can be used to construct a
PosixEnv with a custom FileSystem implementation. This eliminates an
indirection to call Env APIs, and relieves the FileSystem developer of
the burden of having to implement wrappers for the Env APIs.
3. Add a couple of missing FileSystem APIs - ```SanitizeEnvOptions()``` and
```NewLogger()```
Tests:
1. New unit tests
2. make check and make asan_check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6552
Reviewed By: riversand963
Differential Revision: D20592038
Pulled By: anand1976
fbshipit-source-id: c3801ad4153f96d21d5a3ae26c92ba454d1bf1f7
Summary:
Preliminary support for iterator with user timestamp. Current implementation does not consider merge operator and reverse iterator. Auto compaction is also disabled in unit tests.
Create an iterator with timestamp.
```
...
read_opts.timestamp = &ts;
auto* iter = db->NewIterator(read_opts);
// target is key without timestamp.
for (iter->Seek(target); iter->Valid(); iter->Next()) {}
for (iter->SeekToFirst(); iter->Valid(); iter->Next()) {}
delete iter;
read_opts.timestamp = &ts1;
// lower_bound and upper_bound are without timestamp.
read_opts.iterate_lower_bound = &lower_bound;
read_opts.iterate_upper_bound = &upper_bound;
auto* iter1 = db->NewIterator(read_opts);
// Do Seek or SeekToFirst()
delete iter1;
```
Test plan (dev server)
```
$make check
```
Simple benchmarking (dev server)
1. The overhead introduced by this PR even when timestamp is disabled.
key size: 16 bytes
value size: 100 bytes
Entries: 1000000
Data reside in main memory, and try to stress iterator.
Repeated three times on master and this PR.
- Seek without next
```
./db_bench -db=/dev/shm/rocksdbtest-1000 -benchmarks=fillseq,seekrandom -enable_pipelined_write=false -disable_wal=true -format_version=3
```
master: 159047.0 ops/sec
this PR: 158922.3 ops/sec (2% drop in throughput)
- Seek and next 10 times
```
./db_bench -db=/dev/shm/rocksdbtest-1000 -benchmarks=fillseq,seekrandom -enable_pipelined_write=false -disable_wal=true -format_version=3 -seek_nexts=10
```
master: 109539.3 ops/sec
this PR: 107519.7 ops/sec (2% drop in throughput)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6255
Differential Revision: D19438227
Pulled By: riversand963
fbshipit-source-id: b66b4979486f8474619f4aa6bdd88598870b0746
Summary:
When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To provide a tool for user to solve the problem, the RocksDB namespace is changed to a flag which can be overridden in build time.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433
Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag.
Differential Revision: D19977691
fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e
Summary:
In the current code base, RocksDB generate the checksum for each block and verify the checksum at usage. Current PR enable SST file checksum. After a SST file is generated by Flush or Compaction, RocksDB generate the SST file checksum and store the checksum value and checksum method name in the vs_info and MANIFEST as part for the FileMetadata.
Added the enable_sst_file_checksum to Options to enable or disable file checksum. Added sst_file_checksum to Options such that user can plugin their own SST file checksum calculate method via overriding the SstFileChecksum class. The checksum information inlcuding uint32_t checksum value and a checksum name (string). A new tool is added to LDB such that user can dump out a list of file checksum information from MANIFEST. If user enables the file checksum but does not provide the sst_file_checksum instance, RocksDB will use the default crc32checksum implemented in table/sst_file_checksum_crc32c.h
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6216
Test Plan: Added the testing case in table_test and ldb_cmd_test to verify checksum is correct in different level. Pass make asan_check.
Differential Revision: D19171461
Pulled By: zhichao-cao
fbshipit-source-id: b2e53479eefc5bb0437189eaa1941670e5ba8b87
Summary:
The current Env API encompasses both storage/file operations, as well as OS related operations. Most of the APIs return a Status, which does not have enough metadata about an error, such as whether its retry-able or not, scope (i.e fault domain) of the error etc., that may be required in order to properly handle a storage error. The file APIs also do not provide enough control over the IO SLA, such as timeout, prioritization, hinting about placement and redundancy etc.
This PR separates out the file/storage APIs from Env into a new FileSystem class. The APIs are updated to return an IOStatus with metadata about the error, as well as to take an IOOptions structure as input in order to allow more control over the IO.
The user can set both ```options.env``` and ```options.file_system``` to specify that RocksDB should use the former for OS related operations and the latter for storage operations. Internally, a ```CompositeEnvWrapper``` has been introduced that inherits from ```Env``` and redirects individual methods to either an ```Env``` implementation or the ```FileSystem``` as appropriate. When options are sanitized during ```DB::Open```, ```options.env``` is replaced with a newly allocated ```CompositeEnvWrapper``` instance if both env and file_system have been specified. This way, the rest of the RocksDB code can continue to function as before.
This PR also ports PosixEnv to the new API by splitting it into two - PosixEnv and PosixFileSystem. PosixEnv is defined as a sub-class of CompositeEnvWrapper, and threading/time functions are overridden with Posix specific implementations in order to avoid an extra level of indirection.
The ```CompositeEnvWrapper``` translates ```IOStatus``` return code to ```Status```, and sets the severity to ```kSoftError``` if the io_status is retryable. The error handling code in RocksDB can then recover the DB automatically.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5761
Differential Revision: D18868376
Pulled By: anand1976
fbshipit-source-id: 39efe18a162ea746fabac6360ff529baba48486f
Summary:
The patch adds a new command line parameter --decode_blob_index to sst_dump.
If this switch is specified, sst_dump prints blob indexes in a human readable format,
printing the blob file number, offset, size, and expiration (if applicable) for blob
references, and the blob value (and expiration) for inlined blobs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5926
Test Plan:
Used db_bench's BlobDB mode to generate SST files containing blob references with
and without expiration, as well as inlined blobs with and without expiration (note: the
latter are stored as plain values), and confirmed sst_dump correctly prints all four types
of records.
Differential Revision: D17939077
Pulled By: ltamasi
fbshipit-source-id: edc5f58fee94ba35f6699c6a042d5758f5b3963d
Summary:
This PR allows for the creation of custom env when using sst_dump. If
the user does not set options.env or set options.env to nullptr, then sst_dump
will automatically try to create a custom env depending on the path to the sst
file or db directory. In order to use this feature, the user must call
ObjectRegistry::Register() beforehand.
Test Plan (on devserver):
```
$make all && make check
```
All tests must pass to ensure this change does not break anything.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5845
Differential Revision: D17678038
Pulled By: riversand963
fbshipit-source-id: 58ecb4b3f75246d52b07c4c924a63ee61c1ee626
Summary:
Further apply formatter to more recent commits.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5830
Test Plan: Run all existing tests.
Differential Revision: D17488031
fbshipit-source-id: 137458fd94d56dd271b8b40c522b03036943a2ab
Summary:
Add a command in ldb so that users can print out tombstones in SST files.
In order to test the code, change the interface of LDBCommandRunner::RunCommand() so that it doesn't return from the program, but return the status code.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5615
Test Plan: Add a new unit test
Differential Revision: D16550326
fbshipit-source-id: 88ddfe6984bdcbb3a528abdd115089df09eba52e
Summary:
The ObjectRegistry class replaces the Registrar and NewCustomObjects. Objects are registered with the registry by Type (the class must implement the static const char *Type() method).
This change is necessary for a few reasons:
- By having a class (rather than static template instances), the class can be passed between compilation units, meaning that objects could be registered and shared from a dynamic library with an executable.
- By having a class with instances, different units could have different objects registered. This could be useful if, for example, one Option allowed for a dynamic library and one did not.
When combined with some other PRs (being able to load shared libraries, a Configurable interface to configure objects to/from string), this code will allow objects in external shared libraries to be added to a RocksDB image at run-time, rather than requiring every new extension to be built into the main library and called explicitly by every program.
Test plan (on riversand963's devserver)
```
$COMPILE_WITH_ASAN=1 make -j32 all && sleep 1 && make check
```
All tests pass.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5293
Differential Revision: D16363396
Pulled By: riversand963
fbshipit-source-id: fbe4acb615bfc11103eef40a0b288845791c0180
Summary:
Right now, ldb cannot scan a DB with merge operands with default ldb. There is no hard to give a general merge operator so that it can at least print out something
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5607
Test Plan: Run ldb against a DB with merge operands and see the outputs.
Differential Revision: D16442634
fbshipit-source-id: c66c414ec07f219cfc6e6ec2cc14c783ee95df54
Summary:
ldb idump now only works for default column family. Extend it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5594
Test Plan: Compile and run the tool against a multiple CF DB.
Differential Revision: D16380684
fbshipit-source-id: bfb8af36fdad1806837c90aaaab492d71528aceb
Summary:
Right now ldb can open running DB through read-only DB. However, it might leave info logs files to the read-only DB directory. Add an option to open the DB as secondary to avoid it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5537
Test Plan:
Run
./ldb scan --max_keys=10 --db=/tmp/rocksdbtest-2491/dbbench --secondary_path=/tmp --no_value --hex
and
./ldb get 0x00000000000000103030303030303030 --hex --db=/tmp/rocksdbtest-2491/dbbench --secondary_path=/tmp
against a normal db_bench run and observe the output changes. Also observe that no new info logs files are created under /tmp/rocksdbtest-2491/dbbench.
Run without --secondary_path and observe that new info logs created under /tmp/rocksdbtest-2491/dbbench.
Differential Revision: D16113886
fbshipit-source-id: 4e09dec47c2528f6ca08a9e7a7894ba2d9daebbb
Summary:
Sometimes it is helpful to fetch the whole history of stats after benchmark runs. Add such an option
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5532
Test Plan: Run the benchmark manually and observe the output is as expected.
Differential Revision: D16097764
fbshipit-source-id: 10b5b735a22a18be198b8f348be11f11f8806904
Summary:
`create_column_family` cmd already exists but was somehow missed in the help message.
also add `drop_column_family` cmd which can drop a cf without opening db.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5503
Test Plan: Updated existing ldb_test.py to test deleting a column family.
Differential Revision: D16018414
Pulled By: lightmark
fbshipit-source-id: 1fc33680b742104fea86b10efc8499f79e722301
Summary:
This PR integrates the block cache tracer class into db_impl.cc.
db_impl.cc contains a member variable of AtomicBlockCacheTraceWriter class and passes its reference to the block_based_table_reader.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5433
Differential Revision: D15728016
Pulled By: HaoyuHuang
fbshipit-source-id: 23d5659e8c82d556833dcc1a5558aac8c1f7db71
Summary:
When using `PRIu64` type of printf specifier, current code base does the following:
```
#ifndef __STDC_FORMAT_MACROS
#define __STDC_FORMAT_MACROS
#endif
#include <inttypes.h>
```
However, this can be simplified to
```
#include <cinttypes>
```
as long as flag `-std=c++11` is used.
This should solve issues like https://github.com/facebook/rocksdb/issues/5159
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5402
Differential Revision: D15701195
Pulled By: miasantreble
fbshipit-source-id: 6dac0a05f52aadb55e9728038599d3d2e4b59d03
Summary:
util/ means for lower level libraries, so it's a good idea to move the files which requires knowledge to DB out. Create a file/ and move some files there.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5375
Differential Revision: D15550935
Pulled By: siying
fbshipit-source-id: 61a9715dcde5386eebfb43e93f847bba1ae0d3f2
Summary:
Depending on the config, manual compaction (leveled compaction style) does following compactions:
L0->L1
L1->L2
...
Ln-1 -> Ln
Ln -> Ln
The final Ln -> Ln compaction is partly unnecessary as it recompacts all the files that were just generated by the Ln-1 -> Ln. We should avoid recompacting such files. This rule should be applied to Lmax only.
Resolves issue https://github.com/facebook/rocksdb/issues/4995
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5138
Differential Revision: D14940106
Pulled By: miasantreble
fbshipit-source-id: 8d3cf5507a17e76f3333cfd4bac5256d005636e5
Summary:
mv port/dirent.h to port/port_dirent.h to avoid compile err when use port dir as header dir output
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5152
Differential Revision: D14779409
Pulled By: siying
fbshipit-source-id: d4162c47c979c6e8cc6a9e601802864ab3768ecb
Summary:
This PR allows RocksDB to run in single-primary, multi-secondary process mode.
The writer is a regular RocksDB (e.g. an `DBImpl`) instance playing the role of a primary.
Multiple `DBImplSecondary` processes (secondaries) share the same set of SST files, MANIFEST, WAL files with the primary. Secondaries tail the MANIFEST of the primary and apply updates to their own in-memory state of the file system, e.g. `VersionStorageInfo`.
This PR has several components:
1. (Originally in #4745). Add a `PathNotFound` subcode to `IOError` to denote the failure when a secondary tries to open a file which has been deleted by the primary.
2. (Similar to #4602). Add `FragmentBufferedReader` to handle partially-read, trailing record at the end of a log from where future read can continue.
3. (Originally in #4710 and #4820). Add implementation of the secondary, i.e. `DBImplSecondary`.
3.1 Tail the primary's MANIFEST during recovery.
3.2 Tail the primary's MANIFEST during normal processing by calling `ReadAndApply`.
3.3 Tailing WAL will be in a future PR.
4. Add an example in 'examples/multi_processes_example.cc' to demonstrate the usage of secondary RocksDB instance in a multi-process setting. Instructions to run the example can be found at the beginning of the source code.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4899
Differential Revision: D14510945
Pulled By: riversand963
fbshipit-source-id: 4ac1c5693e6012ad23f7b4b42d3c374fecbe8886
Summary:
Right now ldb command doesn't allow cases where option values contain equals sign. For example,
```
ldb --db=/tmp/test scan --from='q=3' --max_keys=1
```
after parsing, ldb will have one option 'db', 'max_keys' and one flag 'from'.
This PR updates the parsing logic so that it now supports the above mentioned cases
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5088
Differential Revision: D14600869
Pulled By: miasantreble
fbshipit-source-id: c6ef518c74a98d7b6675ea5954ae08b1bda5554e