Summary:
In addition to implementing OpenWithColumnFamilies, this diff also includes some minor changes:
* Changed all column family names from Slice() to std::string. The performance of column family name handling is not critical, and it's more convenient and cleaner to have names as std::strings
* Implemented ColumnFamilyOptions(const Options&) and DBOptions(const Options&)
* Added ColumnFamilyOptions to VersionSet::ColumnFamilyData. ColumnFamilyOptions are specified on OpenWithColumnFamilies() and CreateColumnFamily()
I will keep the diff in the Phabricator for a day or two and will push to the branch then. Feel free to comment even after the diff has been pushed.
Test Plan: Added a simple unit test
Reviewers: dhruba, haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15033
Summary:
This diff provides basic implementations of CreateColumnFamily(), DropColumnFamily() and ListColumnFamilies(). It builds on top of https://reviews.facebook.net/D14733
It also includes a bug fix for DBImplReadOnly, where Get implementation would be redirected to DBImpl instead of DBImplReadOnly.
Test Plan: Added unit test
Reviewers: dhruba, haobo, kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15021
Summary:
We don't want two threads to clash if they concurrently call DisableFileDeletions() and EnableFileDeletions(). I'm adding a counter that will enable file deletions only after all DisableFileDeletions() calls have been negated with EnableFileDeletions().
However, we also don't want to break the old behavior, so I added a parameter force to EnableFileDeletions(). If force is true, we will still enable file deletions after every call to EnableFileDeletions(), which is what is happening now.
Test Plan: make check
Reviewers: dhruba, haobo, sanketh
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D14781
Summary:
<This diff is for Column Family branch>
Sharing some of the work I've done so far. This diff compiles and passes the tests.
The biggest change is in options.h - I broke down Options into two parts - DBOptions and ColumnFamilyOptions. DBOptions is DB-specific (env, create_if_missing, block_cache, etc.) and ColumnFamilyOptions is column family-specific (all compaction options, compresion options, etc.). Note that this does not break backwards compatibility at all.
Further, I created DBWithColumnFamily which inherits DB interface and adds new functions with column family support. Clients can transparently switch to DBWithColumnFamily and it will not break their backwards compatibility.
There are few methods worth checking out: ListColumnFamilies(), MultiNewIterator(), MultiGet() and GetSnapshot(). [GetSnapshot() returns the snapshot across all column families for now - I think that's what we agreed on]
Finally, I made small changes to WriteBatch so we are able to atomically insert data across column families.
Please provide feedback.
Test Plan: make check works, the code is backward compatible
Reviewers: dhruba, haobo, sdong, kailiu, emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D14445
Summary:
In this diff I present you BackupableDB v1. You can easily use it to backup your DB and it will do incremental snapshots for you.
Let's first describe how you would use BackupableDB. It's inheriting StackableDB interface so you can easily construct it with your DB object -- it will add a method RollTheSnapshot() to the DB object. When you call RollTheSnapshot(), current snapshot of the DB will be stored in the backup dir. To restore, you can just call RestoreDBFromBackup() on a BackupableDB (which is a static method) and it will restore all files from the backup dir. In the next version, it will even support automatic backuping every X minutes.
There are multiple things you can configure:
1. backup_env and db_env can be different, which is awesome because then you can easily backup to HDFS or wherever you feel like.
2. sync - if true, it *guarantees* backup consistency on machine reboot
3. number of snapshots to keep - this will keep last N snapshots around if you want, for some reason, be able to restore from an earlier snapshot. All the backuping is done in incremental fashion - if we already have 00010.sst, we will not copy it again. *IMPORTANT* -- This is based on assumption that 00010.sst never changes - two files named 00010.sst from the same DB will always be exactly the same. Is this true? I always copy manifest, current and log files.
4. You can decide if you want to flush the memtables before you backup, or you're fine with backing up the log files -- either way, you get a complete and consistent view of the database at a time of backup.
5. More things you can find in BackupableDBOptions
Here is the directory structure I use:
backup_dir/CURRENT_SNAPSHOT - just 4 bytes holding the latest snapshot
0, 1, 2, ... - files containing serialized version of each snapshot - containing a list of files
files/*.sst - sst files shared between snapshots - if one snapshot references 00010.sst and another one needs to backup it from the DB, it will just reference the same file
files/ 0/, 1/, 2/, ... - snapshot directories containing private snapshot files - current, manifest and log files
All the files are ref counted and deleted immediatelly when they get out of scope.
Some other stuff in this diff:
1. Added GetEnv() method to the DB. Discussed with @haobo and we agreed that it seems right thing to do.
2. Fixed StackableDB interface. The way it was set up before, I was not able to implement BackupableDB.
Test Plan:
I have a unittest, but please don't look at this yet. I just hacked it up to help me with debugging. I will write a lot of good tests and update the diff.
Also, `make asan_check`
Reviewers: dhruba, haobo, emayanke
Reviewed By: dhruba
CC: leveldb, haobo
Differential Revision: https://reviews.facebook.net/D14295
Summary: As title
Test Plan: make clean and make
Reviewers: igor
Reviewed By: igor
CC: leveldb
Differential Revision: https://reviews.facebook.net/D14469
Summary: This would enable rocksdb users to get the db identity without depending on implementation details(storing that in IDENTITY file)
Test Plan: db/db_test (has identity checks)
Reviewers: dhruba, haobo, igor, kailiu
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D14463
Summary: We need access to options for BackupableDB
Test Plan: make check
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb, reconnect.grayhat
Differential Revision: https://reviews.facebook.net/D14331
Summary: This is part of https://reviews.facebook.net/D14295 -- smaller diff that is easier to review
Test Plan: make asan_check
Reviewers: dhruba, haobo, emayanke
Reviewed By: emayanke
CC: leveldb, kailiu, reconnect.grayhat
Differential Revision: https://reviews.facebook.net/D14301
Summary:
Archive cleaning will still happen every WAL_ttl seconds
but archived logs will be deleted only if archive size
is greater then a WAL_size_limit value.
Empty archived logs will be deleted evety WAL_ttl.
Test Plan:
1. Unit tests pass.
2. Benchmark.
Reviewers: emayanke, dhruba, haobo, sdong, kailiu, igor
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13869
Summary:
strict essentially means that we MUST find the startsequence. Thus we should return if starteSequence is not found in the first file in case strict is set. This will take care of ending the iterator in case of permanent gaps due to corruptions in the log files
Also created NextImpl function that will have internal variable to distinguish whether Next is being called from StartSequence or by application.
Set NotFoudn::gaps status to give an indication of gaps happeneing.
Polished the inline documentation at various places
Test Plan:
* db_repl_stress test
* db_test relating to transaction log iterator
* fbcode/wormhole/rocksdb/rocks_log_iterator
* sigma production machine sigmafio032.prn1
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13689
Summary:
Currently for each put, a fresh memory is allocated, and a new entry is added to the memtable with a new sequence number irrespective of whether the key already exists in the memtable. This diff is an attempt to update the value inplace for existing keys. It currently handles a very simple case:
1. Key already exists in the current memtable. Does not inplace update values in immutable memtable or snapshot
2. Latest value type is a 'put' ie kTypeValue
3. New value size is less than existing value, to avoid reallocating memory
TODO: For a put of an existing key, deallocate memory take by values, for other value types till a kTypeValue is found, ie. remove kTypeMerge.
TODO: Update the transaction log, to allow consistent reload of the memtable.
Test Plan: Added a unit test verifying the inplace update. But some other unit tests broken due to invalid sequence number checks. WIll fix them next.
Reviewers: xinyaohu, sumeet, haobo, dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12423
Automatic commit by arc
Summary:
This is to simplify rocksdb public APIs and improve the code quality.
Created an additional parameter to ParseFileName for log sub type and improved the code for deleting a wal file.
Wrote exhaustive unit-tests in delete_file_test
Unification of other redundant APIs can be taken up in a separate diff
Test Plan: Expanded delete_file test
Reviewers: dhruba, haobo, kailiu, sdong
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13647
Summary:
The previous release 2.4 had a mapping to alias the older
namespace to rocksdb. This mapping is not needed in the new
release.
Test Plan:
make check
make release
Reviewers: emayanke
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13359
Summary:
Change namespace from leveldb to rocksdb. This allows a single
application to link in open-source leveldb code as well as
rocksdb code into the same process.
Test Plan: compile rocksdb
Reviewers: emayanke
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13287
Summary:
As explained in comments in GetLiveFiles in db.h, this option will cause flush to be skipped in GetLiveFiles because some use-cases use GetSortedWalFiles after GetLiveFiles to generate more complete snapshots.
Using GetSortedWalFiles after GetLiveFiles allows us to not Flush in GetLiveFiles first because wals have everything.
Note: file deletions will be disabled before calling GLF or GSWF so live logs will not move to archive logs or get delted.
Note: Manifest file is truncated to a proper value in GLF, so it will always reply from the proper wal files on a restart
Test Plan: make
Reviewers: dhruba, haobo
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13257
Summary: LogData doesn't consume sequence numbers and doesn't increase the count of the write-batch. Also it was discussed that GetLiveFiles will have to be followed by GetSortedWalFiles to get a lossless backup
Test Plan: visual
Reviewers: dhruba, haobo
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12753
Summary: So that replication can just download from wherever LogFile.Pathname is pointing them.
Test Plan: make all check;./db_repl_stress
Reviewers: dhruba, haobo
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12609
Summary:
Add new command "change_compaction_style" to ldb tool. For
universal->level, it shows "nothing to do". For level->universal, it
compacts all files into a single one and moves the file to level 0.
Also add check for number of files at level 1+ when opening db with
universal compaction style.
Test Plan:
'make all check'. New unit test for internal convertion function. Also manully test various
cmd like:
./ldb change_compaction_style --old_compaction_style=0
--new_compaction_style=1 --db=/tmp/leveldbtest-3088/db_test
Reviewers: haobo, dhruba
Reviewed By: haobo
CC: vamsi, emayanke
Differential Revision: https://reviews.facebook.net/D12603
Summary:
The DeleteFile API was removing files inside the db-lock. This
is now changed to remove files outside the db-lock.
The GetLiveFilesMetadata() returns the smallest and largest
seqnuence number of each file as well.
Test Plan: deletefile_test
Reviewers: emayanke, haobo
Reviewed By: haobo
CC: leveldb
Maniphest Tasks: T63
Differential Revision: https://reviews.facebook.net/D12567
Summary: Replace include/leveldb with include/rocksdb.
Test Plan:
make clean; make check
make clean; make release
Differential Revision: https://reviews.facebook.net/D12489