Commit Graph

37 Commits

Author SHA1 Message Date
Igor Canadi
9d0577a6be Merge branch 'master' into columnfamilies
Conflicts:
	db/db_impl.cc
	db/db_impl.h
	db/transaction_log_impl.cc
	db/transaction_log_impl.h
	include/rocksdb/options.h
	util/env.cc
	util/options.cc
2014-03-03 18:29:03 -08:00
Yueh-Hsuan Chiang
a77527f2af Add ReadOptions to TransactionLogIterator.
Summary:
Add an optional input parameter ReadOptions to DB::GetUpdateSince(),
which allows the verification of checksums to be disabled by setting
ReadOptions::verify_checksums to false.

Test Plan: Tests are done off-line and will not be included in the regular unit test.

Reviewers: igor

Reviewed By: igor

CC: leveldb, xjin, dhruba

Differential Revision: https://reviews.facebook.net/D16305
2014-02-28 11:50:36 -08:00
Igor Canadi
422bb09cb0 Fix table properties
Summary: Adapt table properties to column family world

Test Plan: make check

Reviewers: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16161
2014-02-14 17:13:10 -08:00
Igor Canadi
76c048183c Merge branch 'master' into columnfamilies
Conflicts:
	db/db_impl.cc
	db/db_test.cc
	include/rocksdb/db.h
2014-02-14 16:46:03 -08:00
kailiu
63690625cd Expose the table properties to application
Summary: Provide a public API for users to access the table properties for each SSTable.

Test Plan: Added a unit tests to test the function correctness under differnet conditions.

Reviewers: haobo, dhruba, sdong

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16083
2014-02-13 16:28:21 -08:00
Igor Canadi
b06840aa7d [CF] Rethinking ColumnFamilyHandle and fix to dropping column families
Summary:
The change to the public behavior:
* When opening a DB or creating new column family client gets a ColumnFamilyHandle.
* As long as column family handle is alive, client can do whatever he wants with it, even drop it
* Dropped column family can still be read from (using the column family handle)
* Added a new call CloseColumnFamily(). Client has to close all column families that he has opened before deleting the DB
* As soon as column family is closed, any calls to DB using that column family handle will fail (also any outstanding calls)

Internally:
* Ref-counting ColumnFamilyData
* New thread-safety for ColumnFamilySet
* Dropped column families are now completely dropped and their memory cleaned-up

Test Plan: added some tests to column_family_test

Reviewers: dhruba, haobo, kailiu, sdong

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16101
2014-02-12 13:47:09 -08:00
Igor Canadi
c1071ed95c Merge branch 'master' into columnfamilies 2014-01-28 16:04:00 -08:00
Igor Canadi
e5ec7384a0 Better interface to create BackupEngine
Summary: I think it looks nicer. In RocksDB we have both styles, but I think that static method is the more common version.

Test Plan: backupable_db_test

Reviewers: ljin, benj, swk

Reviewed By: ljin

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15519
2014-01-28 16:01:53 -08:00
Igor Canadi
ec2fa4a690 Export BackupEngine
Summary:
Lots of clients have problems with using StackableDB interface. It's nice to have BackupableDB as a layer on top of DB, but not necessary.

This diff exports BackupEngine, which can be used to create backups without forcing clients to use StackableDB interface.

Test Plan: backupable_db_test

Reviewers: dhruba, ljin, swk

Reviewed By: ljin

CC: leveldb, benj

Differential Revision: https://reviews.facebook.net/D15477
2014-01-28 11:26:07 -08:00
Igor Canadi
28d1a0c6f5 Merge branch 'master' into columnfamilies
Conflicts:
	db/db_impl.cc
	db/db_impl.h
	db/db_impl_readonly.h
	db/db_test.cc
	include/rocksdb/db.h
	include/utilities/stackable_db.h
2014-01-24 09:27:29 -08:00
Lei Jin
aba2acb5ec CompactRange() to return status
Summary: as title

Test Plan:
make all check
What else tests shall I cover?

Reviewers: igor, haobo

CC:

Differential Revision: https://reviews.facebook.net/D15339
2014-01-23 16:41:46 -08:00
Igor Canadi
151f9e144f Merge branch 'master' into columnfamilies 2014-01-13 09:09:51 -08:00
Igor Canadi
cb37ddf229 Feature requests for BackupableDB
Summary:
This diff introduces some features that were requested by two internal customers:
* Ability for backups not to share table files, because we can't guarantee that equal filename means equal content accross replicas
* Ability for two threads to call EnableFileDeletions() and DisableFileDeletions()
* Ability to stop backup from another thread and not slow down the DB close
* Copy the files to the temporary folder first and then atomically rename

Test Plan: Added some tests to backupable_db_test

Reviewers: dhruba, sanketh, muthu, sdong, haobo

Reviewed By: haobo

CC: leveldb, sanketh, muthu

Differential Revision: https://reviews.facebook.net/D14769
2014-01-09 12:24:28 -08:00
Igor Canadi
6de1b5b83e Merge branch 'master' into columnfamilies 2014-01-02 04:18:07 -08:00
Igor Canadi
b60c14f6ee Support multi-threaded DisableFileDeletions() and EnableFileDeletions()
Summary:
We don't want two threads to clash if they concurrently call DisableFileDeletions() and EnableFileDeletions(). I'm adding a counter that will enable file deletions only after all DisableFileDeletions() calls have been negated with EnableFileDeletions().

However, we also don't want to break the old behavior, so I added a parameter force to EnableFileDeletions(). If force is true, we will still enable file deletions after every call to EnableFileDeletions(), which is what is happening now.

Test Plan: make check

Reviewers: dhruba, haobo, sanketh

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14781
2014-01-02 03:33:42 -08:00
Igor Canadi
9385a5247e [RocksDB] [Column Family] Interface proposal
Summary:
<This diff is for Column Family branch>

Sharing some of the work I've done so far. This diff compiles and passes the tests.

The biggest change is in options.h - I broke down Options into two parts - DBOptions and ColumnFamilyOptions. DBOptions is DB-specific (env, create_if_missing, block_cache, etc.) and ColumnFamilyOptions is column family-specific (all compaction options, compresion options, etc.). Note that this does not break backwards compatibility at all.

Further, I created DBWithColumnFamily which inherits DB interface and adds new functions with column family support. Clients can transparently switch to DBWithColumnFamily and it will not break their backwards compatibility.
There are few methods worth checking out: ListColumnFamilies(), MultiNewIterator(), MultiGet() and GetSnapshot(). [GetSnapshot() returns the snapshot across all column families for now - I think that's what we agreed on]

Finally, I made small changes to WriteBatch so we are able to atomically insert data across column families.

Please provide feedback.

Test Plan: make check works, the code is backward compatible

Reviewers: dhruba, haobo, sdong, kailiu, emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14445
2013-12-18 13:08:22 -08:00
Igor Canadi
5e4ab767cf BackupableDB delete backups with newer seq number
Summary: We now delete backups with newer sequence number, so the clients don't have to handle confusing situations when they restore from backup.

Test Plan: added a unit test

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14547
2013-12-10 20:49:28 -08:00
Igor Canadi
fb9fce4fc3 [RocksDB] BackupableDB
Summary:
In this diff I present you BackupableDB v1. You can easily use it to backup your DB and it will do incremental snapshots for you.
Let's first describe how you would use BackupableDB. It's inheriting StackableDB interface so you can easily construct it with your DB object -- it will add a method RollTheSnapshot() to the DB object. When you call RollTheSnapshot(), current snapshot of the DB will be stored in the backup dir. To restore, you can just call RestoreDBFromBackup() on a BackupableDB (which is a static method) and it will restore all files from the backup dir. In the next version, it will even support automatic backuping every X minutes.

There are multiple things you can configure:
1. backup_env and db_env can be different, which is awesome because then you can easily backup to HDFS or wherever you feel like.
2. sync - if true, it *guarantees* backup consistency on machine reboot
3. number of snapshots to keep - this will keep last N snapshots around if you want, for some reason, be able to restore from an earlier snapshot. All the backuping is done in incremental fashion - if we already have 00010.sst, we will not copy it again. *IMPORTANT* -- This is based on assumption that 00010.sst never changes - two files named 00010.sst from the same DB will always be exactly the same. Is this true? I always copy manifest, current and log files.
4. You can decide if you want to flush the memtables before you backup, or you're fine with backing up the log files -- either way, you get a complete and consistent view of the database at a time of backup.
5. More things you can find in BackupableDBOptions

Here is the directory structure I use:

   backup_dir/CURRENT_SNAPSHOT - just 4 bytes holding the latest snapshot
               0, 1, 2, ... - files containing serialized version of each snapshot - containing a list of files
               files/*.sst - sst files shared between snapshots - if one snapshot references 00010.sst and another one needs to backup it from the DB, it will just reference the same file
               files/ 0/, 1/, 2/, ... - snapshot directories containing private snapshot files - current, manifest and log files

All the files are ref counted and deleted immediatelly when they get out of scope.

Some other stuff in this diff:
1. Added GetEnv() method to the DB. Discussed with @haobo and we agreed that it seems right thing to do.
2. Fixed StackableDB interface. The way it was set up before, I was not able to implement BackupableDB.

Test Plan:
I have a unittest, but please don't look at this yet. I just hacked it up to help me with debugging. I will write a lot of good tests and update the diff.

Also, `make asan_check`

Reviewers: dhruba, haobo, emayanke

Reviewed By: dhruba

CC: leveldb, haobo

Differential Revision: https://reviews.facebook.net/D14295
2013-12-09 14:06:52 -08:00
Mayank Agarwal
92e8316118 Make GetDbIdentity pure virtual and also implement it for StackableDB, DBWithTTL
Summary: As title

Test Plan: make clean and make

Reviewers: igor

Reviewed By: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14469
2013-12-05 12:02:31 -08:00
Igor Canadi
3ce3658411 DB::GetOptions()
Summary: We need access to options for BackupableDB

Test Plan: make check

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb, reconnect.grayhat

Differential Revision: https://reviews.facebook.net/D14331
2013-11-25 15:51:50 -08:00
Igor Canadi
11c26bd4a4 [RocksDB] Interface changes required for BackupableDB
Summary: This is part of https://reviews.facebook.net/D14295 -- smaller diff that is easier to review

Test Plan: make asan_check

Reviewers: dhruba, haobo, emayanke

Reviewed By: emayanke

CC: leveldb, kailiu, reconnect.grayhat

Differential Revision: https://reviews.facebook.net/D14301
2013-11-25 12:39:23 -08:00
Mayank Agarwal
f837f5b1c9 Making the transaction log iterator more robust
Summary:
strict essentially means that we MUST find the startsequence. Thus we should return if starteSequence is not found in the first file in case strict is set. This will take care of ending the iterator in case of permanent gaps due to corruptions in the log files
Also created NextImpl function that will have internal variable to distinguish whether Next is being called from StartSequence or by application.
Set NotFoudn::gaps status to give an indication of gaps happeneing.
Polished the inline documentation at various places

Test Plan:
* db_repl_stress test
* db_test relating to transaction log iterator
* fbcode/wormhole/rocksdb/rocks_log_iterator
* sigma production machine sigmafio032.prn1

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D13689
2013-11-04 20:49:03 -08:00
Mayank Agarwal
56305221c4 Unify DeleteFile and DeleteWalFiles
Summary:
This is to simplify rocksdb public APIs and improve the code quality.
Created an additional parameter to ParseFileName for log sub type and improved the code for deleting a wal file.
Wrote exhaustive unit-tests in delete_file_test
Unification of other redundant APIs can be taken up in a separate diff

Test Plan: Expanded delete_file test

Reviewers: dhruba, haobo, kailiu, sdong

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D13647
2013-10-25 08:32:14 -07:00
Dhruba Borthakur
4463b11cad Migrate names of properties from 'leveldb' prefix to 'rocksdb' prefix.
Summary: Migrate names of properties from 'leveldb' prefix to 'rocksdb' prefix.

Test Plan: make check

Reviewers: emayanke, haobo

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D13311
2013-10-06 00:14:26 -07:00
Dhruba Borthakur
a143ef9b38 Change namespace from leveldb to rocksdb
Summary:
Change namespace from leveldb to rocksdb. This allows a single
application to link in open-source leveldb code as well as
rocksdb code into the same process.

Test Plan: compile rocksdb

Reviewers: emayanke

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D13287
2013-10-04 11:59:26 -07:00
Mayank Agarwal
854d236361 Add backward compatible option in GetLiveFiles to choose whether to not Flush first
Summary:
As explained in comments in GetLiveFiles in db.h, this option will cause flush to be skipped in GetLiveFiles because some use-cases use GetSortedWalFiles after GetLiveFiles to generate more complete snapshots.
Using GetSortedWalFiles after GetLiveFiles allows us to not Flush in GetLiveFiles first because wals have everything.
Note: file deletions will be disabled before calling GLF or GSWF so live logs will not move to archive logs or get delted.
Note: Manifest file is truncated to a proper value in GLF, so it will always reply from the proper wal files on a restart

Test Plan: make

Reviewers: dhruba, haobo

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D13257
2013-10-04 10:20:10 -07:00
Mayank Agarwal
aa5c897d42 Return pathname relative to db dir in LogFile and cleanup AppendSortedWalsOfType
Summary: So that replication can just download from wherever LogFile.Pathname is pointing them.

Test Plan: make all check;./db_repl_stress

Reviewers: dhruba, haobo

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D12609
2013-09-04 13:44:43 -07:00
Xing Jin
42c109cc2e New ldb command to convert compaction style
Summary:
Add new command "change_compaction_style" to ldb tool. For
universal->level, it shows "nothing to do". For level->universal, it
compacts all files into a single one and moves the file to level 0.

Also add check for number of files at level 1+ when opening db with
universal compaction style.

Test Plan:
'make all check'. New unit test for internal convertion function. Also manully test various
cmd like:

./ldb change_compaction_style --old_compaction_style=0
--new_compaction_style=1 --db=/tmp/leveldbtest-3088/db_test

Reviewers: haobo, dhruba

Reviewed By: haobo

CC: vamsi, emayanke

Differential Revision: https://reviews.facebook.net/D12603
2013-09-04 13:13:08 -07:00
Dhruba Borthakur
1186192ed1 Replace include/leveldb with include/rocksdb.
Summary: Replace include/leveldb with include/rocksdb.

Test Plan:
make clean; make check
make clean; make release

Differential Revision: https://reviews.facebook.net/D12489
2013-08-23 10:51:00 -07:00
Mayank Agarwal
404d63ac3e Add TODO for DBToStackableDB function which doesn't work yet
Summary:

Test Plan:

Reviewers:

CC:

Task ID: #

Blame Rev:
2013-08-20 21:33:53 -07:00
Mayank Agarwal
8a3547d38e API for getting archived log files
Summary: Also expanded class LogFile to have startSequene and FileSize and exposed it publicly

Test Plan: make all check

Reviewers: dhruba, haobo

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D12087
2013-08-19 13:37:04 -07:00
Deon Nicholas
e1346968d8 Merge operator fixes part 1.
Summary:
-Added null checks and revisions to DBIter::MergeValuesNewToOld()
-Added DBIter test to stringappend_test
-Major fix with Merge and TTL
More plans for fixes later.

Test Plan:
-make clean; make stringappend_test -j 32; ./stringappend_test
-make all check;

Reviewers: haobo, emayanke, vamsi, dhruba

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D12315
2013-08-19 11:42:47 -07:00
Mayank Agarwal
1d7b4765c3 Expose base db object from ttl wrapper
Summary: rocksdb replicaiton will need this when writing value+TS from master to slave 'as is'

Test Plan: make

Reviewers: dhruba, vamsi, haobo

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11919
2013-08-05 18:44:14 -07:00
Mayank Agarwal
cacd812fb2 Support user's compaction filter in TTL logic
Summary: TTL uses compaction filter to purge key-values and required the user to not pass one. This diff makes it accommodating of user's compaciton filter. Added test to ttl_test

Test Plan: make; ./ttl_test

Reviewers: dhruba, haobo, vamsi

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11973
2013-08-05 11:28:01 -07:00
Mayank Agarwal
3102628873 Bring read_only into ttl
Summary: added an argument to openttldb for read only and open the db in normal readonly mode if the arguments is set to true

Test Plan: make ttl_test; ./ttl_test

Reviewers: dhruba, haobo, vamsi, sheki

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D10749
2013-05-10 16:13:26 -07:00
Mayank Agarwal
ff1a0801fc Correct path of db.h in utility_db.h
Summary: Will not be caught properly from file in fbcode with old path. New path fixes it.

Test Plan: make

Reviewers: sheki, dhruba, haobo, vamsi

Reviewed By: sheki

Differential Revision: https://reviews.facebook.net/D10707
2013-05-09 17:32:39 -07:00
Mayank Agarwal
d786b25e2d Timestamp and TTL Wrapper for rocksdb
Summary:
When opened with DBTimestamp::Open call, timestamps are prepended to and stripped from the value during subsequent Put and Get calls respectively. The Timestamp is used to discard values in Get and custom compaction filter which have exceeded their TTL which is specified during Open.
Have made a temporary change to Makefile to let us test with the temporary file TestTime.cc. Have also changed the private members of db_impl.h to protected to let them be inherited by the new class DBTimestamp

Test Plan: make db_timestamp; TestTime.cc(will not check it in) shows how to use the apis currently, but I will write unit-tests shortly

Reviewers: dhruba, vamsi, haobo, sheki, heyongqiang, vkrest

Reviewed By: vamsi

CC: zshao, xjin, vkrest, MarkCallaghan

Differential Revision: https://reviews.facebook.net/D10311
2013-05-02 16:34:42 -07:00