Summary:
Only if there is a crash, power failure, or I/O error in
DeleteBackup, shared or private files from the backup might be left
behind that are not cleaned up by PurgeOldBackups or DeleteBackup-- only
by GarbageCollect. This makes the BackupEngine API "leaky by default."
Even if it means a modest performance hit, I think we should make
Delete and Purge do as they say, with ongoing best effort: i.e. future
calls will attempt to finish any incomplete work from earlier calls.
This change does that by having DeleteBackup and PurgeOldBackups do a
GarbageCollect, unless (to minimize performance hit) this BackupEngine
has already done a GarbageCollect and there have been no
deletion-related I/O errors in that GarbageCollect or since then.
Rejected alternative 1: remove meta file last instead of first. This would in theory turn partially deleted backups into corrupted backups, but code changes would be needed to allow the missing files and consider it acceptably corrupt, rather than failing to open the BackupEngine. This might be a reasonable choice, but I mostly rejected it because it doesn't solve the legacy problem of cleaning up existing lingering files.
Rejected alternative 2: use a deletion marker file. If deletion started with creating a file that marks a backup as flagged for deletion, then we could reliably detect partially deleted backups and efficiently finish removing them. In addition to not solving the legacy problem, this could be precarious if there's a disk full situation, and we try to create a new file in order to delete some files. Ugh.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6015
Test Plan: Updated unit tests
Differential Revision: D18401333
Pulled By: pdillinger
fbshipit-source-id: 12944e372ce6809f3f5a4c416c3b321a8927d925
Summary:
file_reader_writer.h and .cc contain several files and helper function, and it's hard to navigate. Separate it to multiple files and put them under file/
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5803
Test Plan: Build whole project using make and cmake.
Differential Revision: D17374550
fbshipit-source-id: 10efca907721e7a78ed25bbf74dc5410dea05987
Summary:
There are too many types of files under util/. Some test related files don't belong to there or just are just loosely related. Mo
ve them to a new directory test_util/, so that util/ is cleaner.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5377
Differential Revision: D15551366
Pulled By: siying
fbshipit-source-id: 0f5c8653832354ef8caa31749c0143815d719e2c
Summary:
util/ means for lower level libraries, so it's a good idea to move the files which requires knowledge to DB out. Create a file/ and move some files there.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5375
Differential Revision: D15550935
Pulled By: siying
fbshipit-source-id: 61a9715dcde5386eebfb43e93f847bba1ae0d3f2
Summary:
The code convention we are following, Google C++ Style, discourage
alias in header files, especially public headers:
https://google.github.io/styleguide/cppguide.html#Aliases
Remove some of them. Might removed some from .cc files as well to be consistent.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5113
Differential Revision: D14633030
Pulled By: siying
fbshipit-source-id: b990edc919d5de60295992284f980195e501d424
Summary:
Use the `DBOptions` that the backup engine already holds to figure out the right `EnvOptions` to use when reading the DB files. This means that, if a user opened a DB instance with `use_direct_reads=true`, then using `BackupEngine` to back up that DB instance will use direct I/O to read files when calculating checksums and copying. Currently the WALs and manifests would still be read using buffered I/O to prevent mixing direct I/O reads with concurrent buffered I/O writes.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4640
Differential Revision: D13015268
Pulled By: ajkr
fbshipit-source-id: 77006ad6f3e00ce58374ca4793b785eea0db6269
Summary:
Ran the following commands to recursively change all the files under RocksDB:
```
find . -type f -name "*.cc" -exec sed -i 's/ unique_ptr/ std::unique_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/<unique_ptr/<std::unique_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/ shared_ptr/ std::shared_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/<shared_ptr/<std::shared_ptr/g' {} +
```
Running `make format` updated some formatting on the files touched.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4638
Differential Revision: D12934992
Pulled By: sagar0
fbshipit-source-id: 45a15d23c230cdd64c08f9c0243e5183934338a8
Summary:
The patch makes sure that two parallel test threads will operate on different db paths. This enables using open source tools such as gtest-parallel to run the tests of a file in parallel.
Example: ``` ~/gtest-parallel/gtest-parallel ./table_test```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4135
Differential Revision: D8846653
Pulled By: maysamyabandeh
fbshipit-source-id: 799bad1abb260e3d346bcb680d2ae207a852ba84
Summary:
Prior to this PR, there was a race condition between `DBImpl::SetOptions` and `BackupEngine::CreateNewBackup`, as illustrated below.
```
Time thread 1 thread 2
| CreateNewBackup -> GetLiveFiles
| SetOptions -> RenameTempFileToOptionsFile
| SetOptions -> RenameTempFileToOptionsFile
| SetOptions -> RenameTempFileToOptionsFile // unlink oldest OPTIONS file
| copy the oldest OPTIONS // IO error!
V
```
Proposed fix is to check the value of `DBImpl::disable_obsolete_files_deletion_` before calling `DeleteObsoleteOptionsFiles`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4108
Differential Revision: D8796360
Pulled By: riversand963
fbshipit-source-id: 02045317f793ea4c7d4400a5bf333b8502fa3e82
Summary:
We used to name private directories like "1.tmp" while BackupEngine populated them, and then rename without the ".tmp" suffix (i.e., rename "1.tmp" to "1") after all files were copied. On glusterfs, directory renames like this require operations across many hosts, and partial failures have caused operational problems.
Fortunately we don't need to rename private directories. We already have a meta-file that uses the tempfile-rename pattern to commit a backup atomically after all its files have been successfully copied. So we can copy private files directly to their final location, so now there's no directory rename.
Closes https://github.com/facebook/rocksdb/pull/3749
Differential Revision: D7705610
Pulled By: ajkr
fbshipit-source-id: fd724a28dd2bf993ce323a5f2cb7e7d6980cc346
Summary:
this PR fixes a few failed contbuild:
1. ASAN memory leak in Block::NewIterator (table/block.cc:429). the proper destruction of first_level_iter_ and second_level_iter_ of two_level_iterator.cc is missing from the code after the refactoring in https://github.com/facebook/rocksdb/pull/3406
2. various unused param errors introduced by https://github.com/facebook/rocksdb/pull/3662
3. updated comment for `ForceReleaseCachedEntry` to emphasize the use of `force_erase` flag.
Closes https://github.com/facebook/rocksdb/pull/3718
Reviewed By: maysamyabandeh
Differential Revision: D7621192
Pulled By: miasantreble
fbshipit-source-id: 476c94264083a0730ded957c29de7807e4f5b146
Summary:
When `max_valid_backups_to_open` is set, the `BackupEngine` doesn't know about the files referenced by existing backups. This PR prevents us from deleting valid files when that option is set, in cases where we are unable to accurately determine refcount. There are warnings logged when we may miss deleting unreferenced files, and a recommendation in the header for users to periodically unset this option and run a full `GarbageCollect`.
Closes https://github.com/facebook/rocksdb/pull/3518
Differential Revision: D7008331
Pulled By: ajkr
fbshipit-source-id: 87907f964dc9716e229d08636a895d2fc7b72305
Summary:
This diff handles cases where compaction causes an ENOSPC error.
This does not handle corner cases where another background job is started while compaction is running, and the other background job triggers ENOSPC, although we do allow the user to provision for these background jobs with SstFileManager::SetCompactionBufferSize.
It also does not handle the case where compaction has finished and some other background job independently triggers ENOSPC.
Usage: Functionality is inside SstFileManager. In particular, users should set SstFileManager::SetMaxAllowedSpaceUsage, which is the reference highwatermark for determining whether to cancel compactions.
Closes https://github.com/facebook/rocksdb/pull/3449
Differential Revision: D7016941
Pulled By: amytai
fbshipit-source-id: 8965ab8dd8b00972e771637a41b4e6c645450445
Summary:
Use the rsync tempfile naming convention in our `BackupEngine`. The temp file follows the format, `.<filename>.<suffix>`, which is later renamed to `<filename>`. We fix `tmp` as the `<suffix>` as we don't need to use random bytes for now. The benefit is gluster treats this tempfile naming convention specially and applies hashing only to `<filename>`, so the file won't need to be linked or moved when it's renamed. Our gluster team suggested this will make things operationally easier.
Closes https://github.com/facebook/rocksdb/pull/3463
Differential Revision: D6893333
Pulled By: ajkr
fbshipit-source-id: fd7622978f4b2487fce33cde40dd3124f16bcaa8
Summary:
There are internal users who open BackupEngine for writing new backups only, and they don't care whether old backups can be read or not. The condition `BackupableDBOptions::max_valid_backups_to_open == 0` should be supported (previously in df74b775e6 I made the mistake of choosing 0 as a special value to disable the limit).
Closes https://github.com/facebook/rocksdb/pull/2819
Differential Revision: D5751599
Pulled By: ajkr
fbshipit-source-id: e73ac19eb5d756d6b68601eae8e43407ee4f2752
Summary:
Backup engine is intentionally openable even when some backups are corrupt. Previously the engine could write new backups as long as the most recent backup wasn't corrupt. This PR makes the backup engine able to create new backups even when the most recent one is corrupt.
We now maintain two ID instance variables:
- `latest_backup_id_` is used when creating backup to choose the new ID
- `latest_valid_backup_id_` is used when restoring latest backup since we want most recent valid one
Closes https://github.com/facebook/rocksdb/pull/2804
Differential Revision: D5734148
Pulled By: ajkr
fbshipit-source-id: db440707b31df2c7b084188aa5f6368449e10bcf
Summary:
This reverts the previous commit 1d7048c598, which broke the build.
Did a `git revert 1d7048c`.
Closes https://github.com/facebook/rocksdb/pull/2627
Differential Revision: D5476473
Pulled By: sagar0
fbshipit-source-id: 4756ff5c0dfc88c17eceb00e02c36176de728d06
Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually.
Reviewed By: igorsugak
Differential Revision: D5454343
fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2
Summary:
Throughput: 46k tps in our sysbench settings (filling the details later)
The idea is to have the simplest change that gives us a reasonable boost
in 2PC throughput.
Major design changes:
1. The WAL file internal buffer is not flushed after each write. Instead
it is flushed before critical operations (WAL copy via fs) or when
FlushWAL is called by MySQL. Flushing the WAL buffer is also protected
via mutex_.
2. Use two sequence numbers: last seq, and last seq for write. Last seq
is the last visible sequence number for reads. Last seq for write is the
next sequence number that should be used to write to WAL/memtable. This
allows to have a memtable write be in parallel to WAL writes.
3. BatchGroup is not used for writes. This means that we can have
parallel writers which changes a major assumption in the code base. To
accommodate for that i) allow only 1 WriteImpl that intends to write to
memtable via mem_mutex_--which is fine since in 2PC almost all of the memtable writes
come via group commit phase which is serial anyway, ii) make all the
parts in the code base that assumed to be the only writer (via
EnterUnbatched) to also acquire mem_mutex_, iii) stat updates are
protected via a stat_mutex_.
Note: the first commit has the approach figured out but is not clean.
Submitting the PR anyway to get the early feedback on the approach. If
we are ok with the approach I will go ahead with this updates:
0) Rebase with Yi's pipelining changes
1) Currently batching is disabled by default to make sure that it will be
consistent with all unit tests. Will make this optional via a config.
2) A couple of unit tests are disabled. They need to be updated with the
serial commit of 2PC taken into account.
3) Replacing BatchGroup with mem_mutex_ got a bit ugly as it requires
releasing mutex_ beforehand (the same way EnterUnbatched does). This
needs to be cleaned up.
Closes https://github.com/facebook/rocksdb/pull/2345
Differential Revision: D5210732
Pulled By: maysamyabandeh
fbshipit-source-id: 78653bd95a35cd1e831e555e0e57bdfd695355a4
Summary:
These code paths forked when checkpoint was introduced by copy/pasting the core backup logic. Over time they diverged and bug fixes were sometimes applied to one but not the other (like fix to include all relevant WALs for 2PC), or it required extra effort to fix both (like fix to forge CURRENT file). This diff reunites the code paths by extracting the core logic into a function, CreateCustomCheckpoint(), that is customizable via callbacks to implement both checkpoint and backup.
Related changes:
- flush_before_backup is now forcibly enabled when 2PC is enabled
- Extracted CheckpointImpl class definition into a header file. This is so the function, CreateCustomCheckpoint(), can be called by internal rocksdb code but not exposed to users.
- Implemented more functions in DummyDB/DummyLogFile (in backupable_db_test.cc) that are used by CreateCustomCheckpoint().
Closes https://github.com/facebook/rocksdb/pull/1932
Differential Revision: D4622986
Pulled By: ajkr
fbshipit-source-id: 157723884236ee3999a682673b64f7457a7a0d87
Summary:
This was requested by a customer who wants to proactively monitor whether any valid backups are available. The existing performance was poor because Open() serially reads every small meta-file (one per backup), which was slow on HDFS.
Now we only read the minimum number of meta-files to find `max_valid_backups_to_open` valid backups. The customer mentioned above can just set it to one.
Closes https://github.com/facebook/rocksdb/pull/2151
Differential Revision: D4882564
Pulled By: ajkr
fbshipit-source-id: cb0edf9e8ac693e4d5f24902e725a011ed8c0c2f
Summary:
Move some files under util/ to new directories env/, monitoring/ options/ and cache/
Closes https://github.com/facebook/rocksdb/pull/2090
Differential Revision: D4833681
Pulled By: siying
fbshipit-source-id: 2fd8bef
Summary:
It is confusing to have auto_roll_logger to stay under db/, which has nothing to do with database. Move filename together as it is a dependency.
Closes https://github.com/facebook/rocksdb/pull/2080
Differential Revision: D4821141
Pulled By: siying
fbshipit-source-id: ca7d768
Summary:
previously we only cleaned up .tmp files under "shared/" and "private/" directories in case the previous backup failed. we need to do the same for "shared_checksum/"; otherwise, the subsequent backup will fail if it tries to backup at least one of the same files.
Closes https://github.com/facebook/rocksdb/pull/2062
Differential Revision: D4805599
Pulled By: ajkr
fbshipit-source-id: eaa6088
Summary:
As the last step in backup creation, the .tmp directory is renamed omitting the .tmp suffix. In case the process terminates before this, the .tmp directory will be left behind. Even if this happens, we want future backups to succeed, so I added some checks/cleanup for this case.
Closes https://github.com/facebook/rocksdb/pull/1896
Differential Revision: D4597323
Pulled By: ajkr
fbshipit-source-id: 48900d8
Summary:
introduce new methods into a public threadpool interface,
- allow submission of std::functions as they allow greater flexibility.
- add Joining methods to the implementation to join scheduled and submitted jobs with
an option to cancel jobs that did not start executing.
- Remove ugly `#ifdefs` between pthread and std implementation, make it uniform.
- introduce pimpl for a drop in replacement of the implementation
- Introduce rocksdb::port::Thread typedef which is a replacement for std::thread. On Posix Thread defaults as before std::thread.
- Implement WindowsThread that allocates memory in a more controllable manner than windows std::thread with a replaceable implementation.
- should be no functionality changes.
Closes https://github.com/facebook/rocksdb/pull/1823
Differential Revision: D4492902
Pulled By: siying
fbshipit-source-id: c74cb11
Summary:
#1733 started using SizeFileBytes(), so our dummy log file implementation should stop asserting that this function isn't called.
Closes https://github.com/facebook/rocksdb/pull/1740
Differential Revision: D4376055
Pulled By: ajkr
fbshipit-source-id: 2854d89
Summary:
We used to treat any failure to read a backup's meta-file as if the backup were corrupted; however, we should distinguish corruption errors from errors in the backup Env. This fixes an issue where callers would get inconsistent results from GetBackupInfo() if they called it on an engine that encountered Env error during initialization. Now we fail Initialize() in this case so callers cannot invoke GetBackupInfo() on such engines.
Closes https://github.com/facebook/rocksdb/pull/1654
Differential Revision: D4318573
Pulled By: ajkr
fbshipit-source-id: f7a7c54
Summary: Backup options file to private directory
Test Plan:
backupable_db_test.cc, BackupOptions
Modify DB options by calling OpenDB for 3 times. Check the latest options file is in the right place. Also check no redundent files are backuped.
Reviewers: andrewkr
Reviewed By: andrewkr
Subscribers: leveldb, dhruba, andrewkr
Differential Revision: https://reviews.facebook.net/D59373
Summary:
Rocksdb backup and restore rate limiting is currently done per backup/restore.
So, it is difficult to control rate across multiple backup/restores. With this
change, a throttler can be provided. If a throttler is provided, it is used.
Otherwise, a new throttler is created based on the actual rate limits specified
in the options.
Test Plan: Added unit tests
Reviewers: ldemailly, andrewkr, sdong
Reviewed By: andrewkr
Subscribers: igor, yiwu, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D56265
Summary: Disable backupable_db_test.cc on Windows since EnvChroot is not supported
Test Plan: check ROCKSDB_LITE
Reviewers: andrewkr
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D58035
Summary:
- Used ChrootEnv so the database and backup Envs are isolated in the filesystem.
- Removed DifferentEnvs test since now every test uses different Envs
Depends on D57543
Test Plan:
- ran backupable_db_test
- verified backupable_db_test now catches the bug when D57159 is backed out (this bug previously passed through the test cases, which motivated this change)
Reviewers: sdong, lightmark, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D57615
Summary: Fix BackupableDBTest.NoDoubleCopy and BackupableDBTest.DifferentEnvs by mocking the db files in db_env instead of backup_env_
Test Plan: make check -j64
Reviewers: sdong, andrewkr
Reviewed By: andrewkr
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D57273
Summary:
Several of backupable_db_test fails if running standalone, because of directory missing. Fix it by:
(1) garbage collector skips shared directory if it doesn't exit
(2) BackupableDBTest.Issue921Test to create the parent directory of the backup directory fist.
Test Plan: Run the tests individually and make sure they pass
Subscribers: leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D56829
Summary:
Rocksdb backup engine maintains metadata about backups in separate files. But,
there was no way to add extra application specific data to it. Adding support
for that.
In some use cases, applications decide to restore a backup based on some
metadata. This will help those cases to cheaply decide whether to restore or
not.
Test Plan:
Added a unit test. Existing ones are passing
Sample meta file for BinaryMetadata test-
```
1459454043
0
metadata 6162630A64656600676869
2
private/1/MANIFEST-000001 crc32 1184723444
private/1/CURRENT crc32 3505765120
```
Reviewers: sdong, ldemailly, andrewkr
Reviewed By: andrewkr
Subscribers: andrewkr, dhruba, ldemailly
Differential Revision: https://reviews.facebook.net/D56007
Summary:
- Keep track of obsolete manifests in VersionSet
- Updated FindObsoleteFiles() to put obsolete manifests in the JobContext for later use by PurgeObsoleteFiles()
- Added test case that verifies a stale manifest is deleted by a non-full purge
Test Plan:
$ ./backupable_db_test --gtest_filter=BackupableDBTest.ChangeManifestDuringBackupCreation
Reviewers: IslamAbdelRahman, yoshinorim, sdong
Reviewed By: sdong
Subscribers: andrewkr, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D55269
Summary:
Now that we get sizes efficiently, we no longer need the workaround to
embed file size in filename.
Test Plan:
$ ./backupable_db_test
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D55035
Summary:
For VerifyBackup(), backup files can be spread across "shared/",
"shared_checksum/", and "private/" subdirectories, so we have to
bulk get all three.
For CreateNewBackup(), we make two separate bulk calls: one for the
data files and one for WAL files.
There is also a new helper function, ExtendPathnameToSizeBytes(),
that translates the file attributes vector to a map. I decided to leave
GetChildrenFileAttributes()'s (from D53781) return type as vector to
keep it consistent with GetChildren().
Depends on D53781.
Test Plan:
verified relevant unit tests
$ ./backupable_db_test
Reviewers: IslamAbdelRahman, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D53919
Summary:
Fixed two related race conditions in backup creation.
(1) CreateNewBackup() uses DB::DisableFileDeletions() to prevent table files
from being deleted while it is copying; however, the MANIFEST file could still
rotate during this time. The fix is to stop deleting the old manifest in the
rotation logic. It will be deleted safely later when PurgeObsoleteFiles() runs
(can only happen when file deletions are enabled).
(2) CreateNewBackup() did not account for the CURRENT file being mutable.
This is significant because the files returned by GetLiveFiles() contain a
particular manifest filename, but the manifest to which CURRENT refers can
change at any time. This causes problems when CURRENT changes between the call
to GetLiveFiles() and when it's copied to the backup directory. To workaround this, I
manually forge a CURRENT file referring to the manifest filename returned in
GetLiveFiles().
(2) also applies to the checkpointing code, so let me know if this approach is
good and I'll make the same change there.
Test Plan:
new test for roll manifest during backup creation.
running the test before this change:
$ ./backupable_db_test --gtest_filter=BackupableDBTest.ChangeManifestDuringBackupCreation
...
IO error: /tmp/rocksdbtest-9383/backupable_db/MANIFEST-000001: No such file or directory
running the test after this change:
$ ./backupable_db_test --gtest_filter=BackupableDBTest.ChangeManifestDuringBackupCreation
...
[ RUN ] BackupableDBTest.ChangeManifestDuringBackupCreation
[ OK ] BackupableDBTest.ChangeManifestDuringBackupCreation (2836 ms)
Reviewers: IslamAbdelRahman, anthony, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D54711
Summary:
As titled. This fixes the tsan error caused by logger_ being used in
backup_engine_'s destructor. It does not fix the transient unit test failure,
which is caused by MANIFEST file changing while backup is happening.
Test Plan:
verified the tsan error no longer happens on either success or
failure.
$ COMPILE_WITH_TSAN=1 make -j32 backupable_db_test
$ while ./backupable_db_test --gtest_filter=BackupableDBTest.CorruptionsTest ; do : ; done
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D54669
Summary: fix memory leak in test code
Test Plan: ran test
Reviewers: rven, igor, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52617
Summary:
See a bug report here: https://github.com/facebook/rocksdb/issues/921
The fix is to not check the shared/ directory if share_table_files is false. We could also check FileExists() before GetChildren(), but that will add extra latency when Env is Hdfs :(
Test Plan: added a unit test
Reviewers: rven, sdong, IslamAbdelRahman, yhchiang, anthony
Reviewed By: anthony
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52593
Summary: Getting file size from all the backup files can take a long time. In some cases, the sizes are available in file names. We allow a mode to get those sizes from file name.
Test Plan:
Make some unit tests in backupable_db_test to run in such a mode.
Make sure RocksDB Lite builds too.
Reviewers: IslamAbdelRahman, rven, yhchiang, kradhakrishnan, anthony, igor
Reviewed By: igor
Subscribers: muthu, asameet, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D51243
Summary:
This patch allows rocksdb to persist options into a file on
DB::Open, SetOptions, and Create / Drop ColumnFamily.
Options files are created under the same directory as the rocksdb
instance.
In addition, this patch also adds a fail_if_missing_options_file in DBOptions
that makes any function call return non-ok status when it is not able to
persist options properly.
// If true, then DB::Open / CreateColumnFamily / DropColumnFamily
// / SetOptions will fail if options file is not detected or properly
// persisted.
//
// DEFAULT: false
bool fail_if_missing_options_file;
Options file names are formatted as OPTIONS-<number>, and RocksDB
will always keep the latest two options files.
Test Plan:
Add options_file_test.
options_test
column_family_test
Reviewers: igor, IslamAbdelRahman, sdong, anthony
Reviewed By: anthony
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D48285
Summary:
CreateLoggerFromOptions have some parameters like db_log_dir and env, these parameters are redundant since they already exist in DBOptions
this patch remove the redundant parameters and expose CreateLoggerFromOptions to users
Test Plan: make check
Reviewers: igor, anthony, yhchiang, rven, kradhakrishnan, sdong
Reviewed By: sdong
Subscribers: dhruba, hermanlee4
Differential Revision: https://reviews.facebook.net/D49713
Summary: FailOverwritingBackups has unexpected results when auto-compaction runs.
Test Plan: ran test a bunch of times
Reviewers: IslamAbdelRahman, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D47181
Summary:
In case of huge db backup infromation about progress of downloading would help.
New callback parameter in CreateNewBackup() function will trigger whenever a some amount of data downloaded.
Task: 8057631
Test Plan:
ProgressCallbackDuringBackup test that cover new functionality added to BackupableDBTest tests.
other test succeed as well.
Reviewers: Guenena, benj, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D46575
Summary: This diff adds a verifyBackup method to BackupEngine. The method verifies the name and size of each file in the backup.
Test Plan: Unit test cases created and passing.
Reviewers: igor, benj
Subscribers: zelaine.fong, yhchiang, sdong, lgalanis, dhruba, AaronFeldman
Differential Revision: https://reviews.facebook.net/D46029
Summary:
In the first implementation of BackupEngine, LATEST_BACKUP was the commit point. The backup became committed after the write to LATEST_BACKUP completed.
However, we can avoid the need for LATEST_BACKUP. Instead of write to LATEST_BACKUP, the commit point can be the rename from `meta/<backup_id>.tmp` to `meta/<backup_id>`. Once we see that there exists a file `meta/<backup_id>` (without tmp), we can assume that backup is valid.
In this diff, we still write out the file LATEST_BACKUP. We need to do this so that we can maintain backward compatibility. However, the new version doesn't depend on this file anymore. We get the latest backup by `ls`-ing `meta` directory.
This diff depends on D41925
Test Plan: Adjusted backupable_db_test to this new behavior
Reviewers: benj, yhchiang, sdong, AaronFeldman
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D42069
Summary: RandomRWFile is not used anywhere in out code base, this patch remove RandomRWFile
Test Plan:
make check -j64
USE_CLANG=1 make all -j64
OPT=-DROCKSDB_LITE make release -j64
Reviewers: sdong, yhchiang, anthony, kradhakrishnan, rven, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D44091
Summary: Add new CheckFileExists method. Considered changing the FileExists api but didn't want to break anyone's builds.
Test Plan: unit tests
Reviewers: yhchiang, igor, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D42003
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
Summary:
Couple of changes here:
* NewBackupEngine() and NewReadOnlyBackupEngine() are now removed. They were deprecated since RocksDB 3.8. Changing these to new functions should be pretty straight-forward. As a followup, I'll fix all fbcode callsights
* Instead of initializing backup engine in the constructor, we initialize it in a separate function now. That way, we can catch all errors and return appropriate status code.
* We catch all errors during initializations and return them to the client properly.
* Added new tests to backupable_db_test, to make sure that we can't open BackupEngine when there are Env errors.
* Transitioned backupable_db_test to use BackupEngine rather than BackupableDB. From the two available APIs, judging by the current use-cases, it looks like BackupEngine API won. It's much more flexible since it doesn't require StackableDB.
Test Plan: Added a new unit test to backupable_db_test
Reviewers: yhchiang, sdong, AaronFeldman
Reviewed By: AaronFeldman
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D41925
Summary:
Add a new field: BackupableDBOptions.max_background_copies.
CreateNewBackup() and RestoreDBFromBackup() will use this number of threads to perform copies.
If there is a backup rate limit, then max_background_copies must be 1.
Update backupable_db_test.cc to test multi-threaded backup and restore.
Update backupable_db_test.cc to test backups when the backup environment is not the same as the database environment.
Test Plan:
Run ./backupable_db_test
Run valgrind ./backupable_db_test
Run with TSAN and ASAN
Reviewers: yhchiang, rven, anthony, sdong, igor
Reviewed By: igor
Subscribers: yhchiang, anthony, sdong, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D40725
Summary: Some test and benchmark codes don't build for CYGWIN. Fix it.
Test Plan: Build "make all" with TARGET_OS=Cygwin on cygwin and make sure it passes.
Reviewers: rven, yhchiang, anthony, igor, kradhakrishnan
Reviewed By: igor, kradhakrishnan
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D39711
Summary:
In D28521 we removed GarbageCollect() from BackupEngine's constructor. The reason was that opening BackupEngine on HDFS was very slow and in most cases we didn't have any garbage. We allowed the user to call GarbageCollect() when it detects some garbage files in his backup directory.
Unfortunately, this left us vulnerable to an interesting issue. Let's say we started a backup and copied files {1, 3} but the backup failed. On another host, we restore DB from backup and generate {1, 3, 5}. Since {1, 3} is already there, we will not overwrite. However, these files might be from a different database so their contents might be different. See internal task t6781803 for more info.
Now, when we're copying files and we discover a file already there, we check:
1. if the file is not referenced from any backups, we overwrite the file.
2. if the file is referenced from other backups AND the checksums don't match, we fail the backup. This will only happen if user is using a single backup directory for backing up two different databases.
3. if the file is referenced from other backups AND the checksums match, it's all good. We skip the copy and go copy the next file.
Test Plan: Added new test to backupable_db_test. The test fails before this patch.
Reviewers: sdong, rven, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D37599
Summary:
The usage I'm fixing here caused trouble on Fedora 21 when
compiling with the current gcc version 4.9.2 20150212 (Red Hat 4.9.2-6) (GCC):
db/write_controller_test.cc: In member function ‘virtual void rocksdb::WriteControllerTest_SanityTest_Test::TestBody()’:
db/write_controller_test.cc:23:165: error: converting ‘false’ to pointer type for argument 1 of ‘char testing::internal::IsNullLiteralHelper(testing::internal::Secret*)’ [-Werror=conversion-null]
ASSERT_EQ(false, controller.IsStopped());
^
This change was induced mechanically via:
git grep -l -E 'ASSERT_EQ\(false'|xargs perl -pi -e 's/ASSERT_EQ\(false, /ASSERT_FALSE(/'
git grep -l -E 'ASSERT_EQ\(true'|xargs perl -pi -e 's/ASSERT_EQ\(true, /ASSERT_TRUE(/'
Except for the three in utilities/backupable/backupable_db_test.cc for which
I ended up reformatting (joining lines) in the result.
As for why this problem is exhibited with that version of gcc, and none
of the others I've used (from 4.8.1 through gcc-5.0.0 and newer), I suspect
it's a bug in F21's gcc that has been fixed in gcc-5.0.0.
Test Plan:
"make" now succeed on Fedora 21
Reviewers: ljin, rven, igor.sugak, yhchiang, sdong, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37329
Summary:
Our existing test notation is very similar to what is used in gtest. It makes it easy to adopt what is different.
In this diff I modify existing [[ https://code.google.com/p/googletest/wiki/Primer#Test_Fixtures:_Using_the_Same_Data_Configuration_for_Multiple_Te | test fixture ]] classes to inherit from `testing::Test`. Also for unit tests that use fixture class, `TEST` is replaced with `TEST_F` as required in gtest.
There are several custom `main` functions in our existing tests. To make this transition easier, I modify all `main` functions to fallow gtest notation. But eventually we can remove them and use implementation of `main` that gtest provides.
```lang=bash
% cat ~/transform
#!/bin/sh
files=$(git ls-files '*test\.cc')
for file in $files
do
if grep -q "rocksdb::test::RunAllTests()" $file
then
if grep -Eq '^class \w+Test {' $file
then
perl -pi -e 's/^(class \w+Test) {/${1}: public testing::Test {/g' $file
perl -pi -e 's/^(TEST)/${1}_F/g' $file
fi
perl -pi -e 's/(int main.*\{)/${1}::testing::InitGoogleTest(&argc, argv);/g' $file
perl -pi -e 's/rocksdb::test::RunAllTests/RUN_ALL_TESTS/g' $file
fi
done
% sh ~/transform
% make format
```
Second iteration of this diff contains only scripted changes.
Third iteration contains manual changes to fix last errors and make it compilable.
Test Plan:
Build and notice no errors.
```lang=bash
% USE_CLANG=1 make check -j55
```
Tests are still testing.
Reviewers: meyering, sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D35157
Summary:
gtest does not use exceptions to fail a unit test by design, and `ASSERT*`s are implemented using `return`. As a consequence we cannot use `ASSERT*` in a function that does not return `void` value ([[ https://code.google.com/p/googletest/wiki/AdvancedGuide#Assertion_Placement | 1]]), and have to fix our existing code. This diff does this in a generic way, with no manual changes.
In order to detect all existing `ASSERT*` that are used in functions that doesn't return void value, I change the code to generate compile errors for such cases.
In `util/testharness.h` I defined `EXPECT*` assertions, the same way as `ASSERT*`, and redefined `ASSERT*` to return `void`. Then executed:
```lang=bash
% USE_CLANG=1 make all -j55 -k 2> build.log
% perl -naF: -e 'print "-- -number=".$F[1]." ".$F[0]."\n" if /: error:/' \
build.log | xargs -L 1 perl -spi -e 's/ASSERT/EXPECT/g if $. == $number'
% make format
```
After that I reverted back change to `ASSERT*` in `util/testharness.h`. But preserved introduced `EXPECT*`, which is the same as `ASSERT*`. This will be deleted once switched to gtest.
This diff is independent and contains manual changes only in `util/testharness.h`.
Test Plan:
Make sure all tests are passing.
```lang=bash
% USE_CLANG=1 make check
```
Reviewers: igor, lgalanis, sdong, yufei.zhu, rven, meyering
Reviewed By: meyering
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33333
Summary:
This diff fixes a bug introduced by D28521. Read-only backup engine can delete a backup that is later than the latest -- we never check the condition.
I also added a bunch of logging that will help with debugging cases like this in the future.
See more discussion at t6218248.
Test Plan: Added a unit test that was failing before the change. Also, see new LOG file contents: https://phabricator.fb.com/P19738984
Reviewers: benj, sanketh, sumeet, yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33897
Summary:
When using latest clang (3.6 or 3.7/trunck) rocksdb is failing with many errors. Almost all of them are missing override errors. This diff adds missing override keyword. No manual changes.
Prerequisites: bear and clang 3.5 build with extra tools
```lang=bash
% USE_CLANG=1 bear make all # generate a compilation database http://clang.llvm.org/docs/JSONCompilationDatabase.html
% clang-modernize -p . -include . -add-override
% make format
```
Test Plan:
Make sure all tests are passing.
```lang=bash
% #Use default fb code clang.
% make check
```
Verify less error and no missing override errors.
```lang=bash
% # Have trunk clang present in path.
% ROCKSDB_NO_FBCODE=1 CC=clang CXX=clang++ make
```
Reviewers: igor, kradhakrishnan, rven, meyering, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34077
Summary:
CorruptionTest for backupable_db_test did not call
GarbageCollect() after deleting a corrupt backup,
which sometimes lead to test failures as the newly created backup
would reuse the same backup ID and files and fail the consistency
check.
Moved around some of the test logic to ensure that GarbageCollect()
is called at the right time.
Test Plan:
Run backupable_db_test eight times and make sure
it passes repeatedly. Also run make check to make sure other
tests don't fail.
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28863
Summary:
Improve the backup engine by not deleting the corrupted
backup when it is detected; instead leaving it to the client
to delete the corrupted backup.
Also add a BackupEngine::Open() call.
Test Plan:
Add check to CorruptionTest inside backupable_db_test
to check that the corrupt backups are not deleted. The previous
version of the code failed this test as backups were deleted,
but after the changes in this commit, this test passes.
Run make check to ensure that no other tests fail.
Reviewers: sdong, benj, sanketh, sumeet, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28521
Summary:
All public headers need to be under `include/rocksdb` directory. Otherwise, clients include our header files like this:
#include <rocksdb/db.h>
#include <utilities/backupable_db.h> // still our public header!
Also, internally, we include:
#include "utilities/backupable/backupable_db.h" // internal header
#include "utilities/backupable_db.h" // public header
which is confusing.
This way, when we install rocksdb as a system library, we can just copy `include/rocksdb` directory to system's header files. We can't really copy `utilities` directory to system's header files.
Test Plan: compiles
Reviewers: dhruba, ljin, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D20409
Summary: added a new option to BackupEngine: if share_files_with_checksum is set to true, sst files are stored in shared_checksum/ and are identified by the triple (file name, checksum, file size) instead of just the file name. This option is targeted at distributed databases that want to backup their primary replica.
Test Plan: unit tests and tested backup and restore on a distributed rocksdb
Reviewers: igor
Reviewed By: igor
Differential Revision: https://reviews.facebook.net/D18393
Summary: Read-only BackupEngine can connect to the same backup directory that is already running BackupEngine. That enables some interesting use-cases (i.e. restoring replica from primary's backup directory)
Test Plan: added a unit test
Reviewers: dhruba, haobo, ljin
Reviewed By: ljin
CC: leveldb
Differential Revision: https://reviews.facebook.net/D18297
Summary: Might be useful if client doesn't want to effect running system during backup too much.
Test Plan: added a test case
Reviewers: dhruba, haobo, ljin
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17091
Summary:
Added an option to BackupableDB implementation that allows users to persist in-memory databases. When the restore happens with keep_log_files = true, it will
*) Not delete existing log files in wal_dir
*) Move log files from archive directory to wal_dir, so that DB can replay them if necessary
Test Plan: Added an unit test
Reviewers: dhruba, ljin
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16941