rocksdb

Author	SHA1	Message	Date
sdong	249e796dfc	Fix Flaky DBCompactionTest.SkipStatsUpdateTest Summary: DBCompactionTest.SkipStatsUpdateTest sometimes fails. I don't see any verification related to the deletes issued. Remove them to avoid the uncertainty. Test Plan: Run the test. Reviewers: IslamAbdelRahman, andrewkr, yhchiang Reviewed By: yhchiang Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D59613	2016-06-15 12:00:51 -07:00
Islam AbdelRahman	f5177c761f	Remove wasteful instrumentation in FullMerge (stacked on D59577) Summary: [ This diff is stacked on top of D59577 ] We keep calling timer.ElapsedNanos() on every call to MergeOperator::FullMerge even when statistics are disabled; this is wasteful. I ran the readseq benchmark on a DB containing 100K merge operands for 100K keys (1 operand per key) with a 1GB block cache and saw a slight performance improvement. Original results ``` $ ./db_bench --benchmarks="readseq,readseq,readseq,readseq,readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --merge_keys=100000 --num=100000 --db="/dev/shm/100K_merge_compacted/" --cache_size=1073741824 --use_existing_db --disable_auto_compactions ------------------------------------------------ DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.498 micros/op 2006597 ops/sec; 222.0 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.295 micros/op 3393627 ops/sec; 375.4 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.285 micros/op 3511155 ops/sec; 388.4 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.286 micros/op 3500470 ops/sec; 387.2 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.283 micros/op 3530751 ops/sec; 390.6 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.289 micros/op 3464811 ops/sec; 383.3 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.277 micros/op 3612814 ops/sec; 399.7 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.283 micros/op 3539640 ops/sec; 391.6 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.285 micros/op 3503766 ops/sec; 387.6 MB/s ``` After patch ``` $ ./db_bench --benchmarks="readseq,readseq,readseq,readseq,readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --merge_keys=100000 --num=100000 --db="/dev/shm/100K_merge_compacted/" --cache_size=1073741824 --use_existing_db --disable_auto_compactions ------------------------------------------------ DB path: 
[/dev/shm/100K_merge_compacted/] readseq : 0.476 micros/op 2100119 ops/sec; 232.3 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.278 micros/op 3600887 ops/sec; 398.4 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.275 micros/op 3636698 ops/sec; 402.3 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.271 micros/op 3691661 ops/sec; 408.4 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.273 micros/op 3661534 ops/sec; 405.1 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.276 micros/op 3627106 ops/sec; 401.3 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.272 micros/op 3682635 ops/sec; 407.4 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.266 micros/op 3758331 ops/sec; 415.8 MB/s DB path: [/dev/shm/100K_merge_compacted/] readseq : 0.266 micros/op 3761907 ops/sec; 416.2 MB/s ``` Test Plan: make check -j64 Reviewers: yhchiang, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D59583	2016-06-13 16:22:14 -07:00
Islam AbdelRahman	7c919deccc	Reuse TimedFullMerge instead of FullMerge + instrumentation Summary: We have a lot of code duplication: whenever we call FullMerge, we duplicate the instrumentation and statistics code. This is a simple diff to refactor the code to use TimedFullMerge instead of FullMerge. Test Plan: COMPILE_WITH_ASAN=1 make check -j64 Reviewers: andrewkr, yhchiang, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D59577	2016-06-13 16:17:26 -07:00
Yi Wu	bc8af90e8c	add option to not flush memtable on open() Summary: Add option to not flush memtable on open() In case the option is enabled, don't delete existing log files by not updating log numbers to MANIFEST. Will still flush if we need to (e.g. memtable full in the middle). In that case we also flush final memtable. If wal_recovery_mode = kPointInTimeRecovery, do not halt immediately after encounter corruption. Instead, check if seq id of next log file is last_log_sequence + 1. In that case we continue recovery. Test Plan: See unit test. Reviewers: dhruba, horuff, sdong Reviewed By: sdong Subscribers: benj, yhchiang, andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D57813	2016-06-13 11:34:16 -07:00
sdong	6faddd7c55	Merge db/slice.cc into util/slice.cc Summary: It confuses some compilers to have slice.cc under multiple directories. Merge them. Test Plan: Run existing tests Reviewers: andrewkr, yhchiang, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D59409	2016-06-10 16:37:36 -07:00
sdong	5009b5326b	BlockBasedTable::FullFilterKeyMayMatch() Should skip prefix bloom if full key bloom exists Summary: Currently, if users define both of full key bloom and prefix bloom in SST files. During Get(), if full key bloom shows the key may exist, we still go ahead and check prefix bloom. This is wasteful. If bloom filter for full keys exists, we should always ignore prefix bloom in Get(). Test Plan: Run existing tests Reviewers: yhchiang, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D57825	2016-06-10 16:27:56 -07:00
sdong	20699df843	memtable_prefix_bloom_bits -> memtable_prefix_bloom_bits_ratio and deprecate memtable_prefix_bloom_probes Summary: memtable_prefix_bloom_probes is not a critical option. Remove it to reduce number of options. It's easier for users to make mistakes with memtable_prefix_bloom_bits, turn it to memtable_prefix_bloom_bits_ratio Test Plan: Run all existing tests Reviewers: yhchiang, igor, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: gunnarku, yoshinorim, MarkCallaghan, leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D59199	2016-06-10 12:12:10 -07:00
Wanning Jiang	56887f6cb8	Backup Options Summary: Backup options file to private directory Test Plan: backupable_db_test.cc, BackupOptions Modify DB options by calling OpenDB 3 times. Check the latest options file is in the right place. Also check no redundant files are backed up. Reviewers: andrewkr Reviewed By: andrewkr Subscribers: leveldb, dhruba, andrewkr Differential Revision: https://reviews.facebook.net/D59373	2016-06-09 19:03:10 -07:00
Anirban Rahut	a73b26f601	Adding test for contiguous WAL detection Summary: Add a test to detect that when WAL gets truncated, seq no's are checked to be contiguous. This test is put in ColumnFamilyTest as it has the necessary infrastructure/functions for flushing column families, which we use to ensure 2 active WAL files Test Plan: This is a test, no feature has been added. This test fails today and hence disabled Reviewers: sdong Reviewed By: sdong Subscribers: lgalanis, dhruba, andrewkr, pritamdamania Differential Revision: https://reviews.facebook.net/D59253	2016-06-07 18:04:15 -07:00
Aaron Gao	e532877940	Add statistics field to show total size of index and filter blocks in block cache Summary: With `table_options.cache_index_and_filter_blocks = true`, index and filter blocks are stored in block cache. Then people are curious how much of the block cache total size is used by indexes and bloom filters. It would be nice to have a way to report that. It can help people tune performance and plan for optimized hardware settings. We add several enum values for db Statistics. BLOCK_CACHE_INDEX/FILTER_BYTES_INSERT - BLOCK_CACHE_INDEX/FILTER_BYTES_ERASE = current INDEX/FILTER total block size in bytes. Test Plan: write a test case called `DBBlockCacheTest.IndexAndFilterBlocksStats`. The result is: ``` [gzh@dev9927.prn1 ~/local/rocksdb] make db_block_cache_test -j64 && ./db_block_cache_test --gtest_filter=DBBlockCacheTest.IndexAndFilterBlocksStats Makefile:101: Warning: Compiling in debug mode. Don't use the resulting binary in production GEN util/build_version.cc make: `db_block_cache_test' is up to date. Note: Google Test filter = DBBlockCacheTest.IndexAndFilterBlocksStats [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from DBBlockCacheTest [ RUN ] DBBlockCacheTest.IndexAndFilterBlocksStats [ OK ] DBBlockCacheTest.IndexAndFilterBlocksStats (689 ms) [----------] 1 test from DBBlockCacheTest (689 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test case ran. (689 ms total) [ PASSED ] 1 test. ``` Reviewers: IslamAbdelRahman, andrewkr, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D58677	2016-06-03 10:47:47 -07:00
Jan Doms	02ec8154e5	allow updating block cache capacity from C (#1149)	2016-06-03 14:04:51 +01:00
Andrew Kryczka	842958651f	Fix race condition in SwitchMemtable Summary: MemTableList::current_ could be written by background flush thread and simultaneously read in the user thread (NumNotFlushed() is used in SwitchMemtable()). Use the lock to prevent this case. Found the error from tsan. Related: D58833 Test Plan: $ OPT=-g COMPILE_WITH_TSAN=1 make -j64 db_test $ TEST_TMPDIR=/dev/shm/rocksdb ./db_test --gtest_filter=DBTest.RepeatedWritesToSameKey Reviewers: lightmark, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D59139	2016-06-02 17:11:45 -07:00
PraveenSinghRao	3a276b0cbe	Add a callback for when memtable is moved to immutable (#1137) * Create a callback for memtable becoming immutable Create a callback for memtable becoming immutable Create a callback for memtable becoming immutable moved notification outside the lock Move sealed notification to unlocked portion of SwitchMemtable * fix lite build	2016-06-02 11:57:31 -07:00
Mike Kolupaev	936973d145	Small tweaks to logging to track the number of immutable memtables Summary: We see some write stalls because of number of unflushed memtables. With existing logging I couldn't figure out what's happening exactly. See internal task t11446054 for details if interested. This diff adds: - logging of memtable creation at info level; I wanted it on multiple occasions for different reasons; also include number of immutable memtables, - logging of number of remaining immutable memtables after a flush. Test Plan: ran tests Reviewers: sdong Reviewed By: sdong Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D58833	2016-06-01 11:11:33 -07:00
siddontang	21c047ab49	add readahead size option (#1146)	2016-06-01 10:48:50 -07:00
Reid Horuff	5d85fdb2c5	add missing lock	2016-05-31 12:26:48 -07:00
sdong	345fd73faf	Fix flaky DBTestDynamicLevel.DynamicLevelMaxBytesBase2 Summary: We added more table properties for each SST file, so when using 2KB SST file size, the estimated size of SST files is off by almost half, so the LSM tree structure is not as expected. Fix it by making the file size, as well as the LSM base size, 4x the previous value. Also avoid sleep-based synchronization and use sync points instead. Test Plan: Run parallel unit tests multiple times and make sure they always pass. Reviewers: IslamAbdelRahman, kradhakrishnan Reviewed By: kradhakrishnan Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D58749	2016-05-26 10:13:24 -07:00
krad	8fc75de327	Minor fix to disable DynamicLevelMaxBytesBase2	2016-05-24 17:45:50 -07:00
Ashish Shenoy	99765ed855	Clean up the ComputeCompactionScore() API Summary: Make CompactionOptionsFIFO a part of mutable_cf_options Test Plan: UT Reviewers: sdong Reviewed By: sdong Subscribers: andrewkr, lgalanis, dhruba Differential Revision: https://reviews.facebook.net/D58653	2016-05-23 15:55:29 -07:00
Shen Li	def2f7bd0e	Expose report_bg_io_stats option in the C API. (#1131)	2016-05-23 13:13:47 -07:00
siddontang	8f1214531e	C API: Expose DeleteFileInRange (#1132)	2016-05-23 04:19:47 -07:00
Sage Weil	11f329bd40	db/db_impl: restrict WALRecoveryMode when using recycled log files kPointInTimeRecovery is indistinguishable from kTolerateCorruptedTailRecords in recycle mode since we define the "end" of the log as the first corrupt record we encounter. kAbsoluteConsistency doesn't make sense because even a clean shutdown leaves old junk at the end of the log file. Signed-off-by: Sage Weil <sage@redhat.com>	2016-05-22 22:00:15 -07:00
Sage Weil	2b2a898e0b	db/log_reader: combine kBadRecord{Len,Checksum} for readability These vary only by the corruption string reported. Signed-off-by: Sage Weil <sage@redhat.com>	2016-05-22 22:00:15 -07:00
Sage Weil	34df1c94d5	db/log_reader: treat bad record length or checksum as EOF If we are in kTolerateCorruptedTailRecords, treat these errors as the end of the log. This is particularly important for recycled logs, where we will regularly see corrupted headers (bad length or checksum) when replaying a log. If we are aligned with a block boundary or get lucky, we will land on an old header and see the log number mismatch, but more commonly we will land midway through some previous block and record and effectively see noise. These must be treated as the end of the log in order for recycling to work. This makes the LogTest.Recycle/1 test pass. We also modify a number of existing tests because the recycled log files behave fundamentally differently in that they always stop when they reach the first bad record. Signed-off-by: Sage Weil <sage@redhat.com>	2016-05-22 22:00:15 -07:00
Sage Weil	7947aba68c	db/log_reader: move kBadRecord{Len,Checksum} handling into ReadRecord The behavior here needs to depend on the WAL recovery mode. No functional change in this patch. Signed-off-by: Sage Weil <sage@redhat.com>	2016-05-22 22:00:15 -07:00
Sage Weil	847e471db6	db/log_test: add recycle log test This currently fails because we do not properly map a corrupt header to the logical end of the log. Signed-off-by: Sage Weil <sage@redhat.com>	2016-05-22 22:00:15 -07:00
Aaron Orenstein	2073cf3775	Eliminate use of 'using namespace std'. Also remove a number of ADL references to std functions. Summary: Reduce use of argument-dependent name lookup in RocksDB. Test Plan: 'make check' passed. Reviewers: andrewkr Reviewed By: andrewkr Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D58203	2016-05-20 07:42:18 -07:00
Richard Cairns Jr	f6e404c20a	Added "number of merge operands" to statistics in ssts. Summary: A couple of notes from the diff: - The namespace block I added at the top of table_properties_collector.cc was in reaction to an issue i was having with PutVarint64 and reusing the "val" string. I'm not sure this is the cleanest way of doing this, but abstracting this out at least results in the correct behavior. - I chose "rocksdb.merge.operands" as the property name. I am open to suggestions for better names. - The change to sst_dump_tool.cc seems a bit inelegant to me. Is there a better way to do the if-else block? Test Plan: I added a test case in table_properties_collector_test.cc. It adds two merge operands and checks to make sure that both of them are reflected by GetMergeOperands. It also checks to make sure the wasPropertyPresent bool is properly set in the method. Running both of these tests should pass: ./table_properties_collector_test ./sst_dump_test Reviewers: IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D58119	2016-05-19 14:24:48 -07:00
omegaga	3c69f77c67	Move IO failure test to separate file Summary: This is a part of effort to reduce the size of db_test.cc. We move the following tests to a separate file `db_io_failure_test.cc`: * DropWrites * DropWritesFlush * NoSpaceCompactRange * NonWritableFileSystem * ManifestWriteError * PutFailsParanoid Test Plan: Run `make check` to see if the tests are working properly. Reviewers: sdong, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D58341	2016-05-18 17:09:20 -07:00
Islam AbdelRahman	c70a9335de	Fix mutex unlock issue between scheduled compaction and ReleaseCompactionFiles() Summary: NotifyOnCompactionCompleted can unlock the mutex. That means that we can schedule a background compaction that will start before we ReleaseCompactionFiles(). Test Plan: added unittest existing unittest Reviewers: yhchiang, sdong Reviewed By: sdong Subscribers: yoshinorim, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D58065	2016-05-18 14:56:30 -07:00
Reid Horuff	a6254f2bd4	Long outstanding prepare test Summary: This tests that a prepared transaction is not lost after several crashes, restarts, and memtable flushes. Test Plan: TwoPhaseLongPrepareTest Reviewers: sdong Subscribers: hermanlee4, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D58185	2016-05-17 18:57:06 -07:00
Aaron Gao	43afd72bee	[rocksdb] make more options dynamic Summary: make more ColumnFamilyOptions dynamic: - compression - soft_pending_compaction_bytes_limit - hard_pending_compaction_bytes_limit - min_partial_merge_operands - report_bg_io_stats - paranoid_file_checks Test Plan: Add sanity check in `db_test.cc` for all above options except for soft_pending_compaction_bytes_limit and hard_pending_compaction_bytes_limit. All passed. Reviewers: andrewkr, sdong, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D57519	2016-05-17 13:11:56 -07:00
Islam AbdelRahman	f6aedb62c0	Fix Transaction memory leak Summary: - Make sure we clean up recovered_transactions_ on DBImpl destructor - delete leaked txns and env in TransactionTest Test Plan: Run transaction_test under valgrind Reviewers: sdong, andrewkr, yhchiang, horuff Reviewed By: horuff Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D58263	2016-05-16 16:32:55 -07:00
krad	a08c8c851a	Added PersistentCache abstraction Summary: Added a new abstraction to cache page to RocksDB designed for the read cache use. RocksDB current block cache is more of an object cache. For the persistent read cache project, what we need is a page cache equivalent. This changes adds a cache abstraction to RocksDB to cache pages called PersistentCache. PersistentCache can cache uncompressed pages or raw pages (content as in filesystem). The user can choose to operate PersistentCache either in COMPRESSED or UNCOMPRESSED mode. Blame Rev: Test Plan: Run unit tests Reviewers: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D55707	2016-05-15 22:17:18 -07:00
Reid Horuff	a400336398	TransactionLogIterator sequence gap fix Summary: DBTestXactLogIterator.TransactionLogIterator was failing due to sequence gaps. This was caused by an off-by-one error when calculating the new sequence number after recovering from logs. Test Plan: db_log_iter_test Reviewers: andrewkr Subscribers: andrewkr, hermanlee4, dhruba, IslamAbdelRahman Differential Revision: https://reviews.facebook.net/D58053	2016-05-12 13:54:08 -07:00
Islam AbdelRahman	560358dc93	Fix data race in GetObsoleteFiles() Summary: GetObsoleteFiles() and LogAndApply() functions modify obsolete_manifests_ vector we need to make sure that the mutex is held when we modify the obsolete_manifests_ Test Plan: run the test under TSAN Reviewers: andrewkr Reviewed By: andrewkr Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D58011	2016-05-10 19:30:09 -07:00
Reid Horuff	c27061dae7	[rocksdb] 2PC double recovery bug fix Summary: 1. prepare() 2. crash 3. recover 4. commit() 5. crash 6. data is lost This is due to the transaction data still only residing in the WAL but because the logs were flushed on the first recovery the data is ignored on the second recovery. We must scan all logs found on recovery and only ignore redundant data at the time of replay. It is not possible to know which logs still contain relevant data at time of recovery. We cannot simply ignore a log because all of the non-2pc data it contains has already been written to L0. The changes made to MemTableInserter are to ensure that prepared sections are still recovered even if all of the non-2pc data in that log has already been flushed to L0. Test Plan: Provided test. Reviewers: sdong Subscribers: andrewkr, hermanlee4, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D57729	2016-05-10 14:06:07 -07:00
Reid Horuff	a657ee9a9c	[rocksdb] Recovery path sequence miscount fix Summary: Consider the following WAL with 4 batch entries prefixed with their sequence at time of memtable insert. [1: BEGIN_PREPARE, PUT, PUT, PUT, PUT, END_PREPARE(a)] [1: BEGIN_PREPARE, PUT, PUT, PUT, PUT, END_PREPARE(b)] [4: COMMIT(a)] [7: COMMIT(b)] The first two batches do not consume any sequence numbers so are both prefixed with seq=1. For 2pc commit, memtable insertion takes place before COMMIT batch is written to WAL. We can see that sequence number consumption takes place between WAL entries giving us the seemingly sparse sequence prefix for WAL entries. This is a valid WAL. Because with 2PC markers one WriteBatch points to another batch containing its inserts a writebatch can consume more or less sequence numbers than the number of sequence consuming entries that it contains. We can see that, given the entries in the WAL, 6 sequence ids were consumed. Yet on recovery the maximum sequence consumed would be 7 + 3 (the number of sequence numbers consumed by COMMIT(b)) So, now upon recovery we must track the actual consumption of sequence numbers. In the provided scenario there will be no sequence gaps, but it is possible to produce a sequence gap. This should not be a problem though. correct? Test Plan: provided test. Reviewers: sdong Subscribers: andrewkr, leveldb, dhruba, hermanlee4 Differential Revision: https://reviews.facebook.net/D57645	2016-05-10 14:06:07 -07:00
Reid Horuff	8a66c85e90	[rocksdb] Two Phase Transaction Summary: Two Phase Commit addition to RocksDB. See wiki: https://github.com/facebook/rocksdb/wiki/Two-Phase-Commit-Implementation Quip: https://fb.quip.com/pxZrAyrx53r3 Depends on: WriteBatch modification: https://reviews.facebook.net/D54093 Memtable Log Referencing and Prepared Batch Recovery: https://reviews.facebook.net/D56919 Test Plan: - SimpleTwoPhaseTransactionTest - PersistentTwoPhaseTransactionTest. - TwoPhaseRollbackTest - TwoPhaseMultiThreadTest - TwoPhaseLogRollingTest - TwoPhaseEmptyWriteTest - TwoPhaseExpirationTest Reviewers: IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: leveldb, hermanlee4, andrewkr, vasilep, dhruba, santoshb Differential Revision: https://reviews.facebook.net/D56925	2016-05-10 14:06:07 -07:00
Reid Horuff	1b8a2e8fdd	[rocksdb] Memtable Log Referencing and Prepared Batch Recovery Summary: This diff is built on top of WriteBatch modification: https://reviews.facebook.net/D54093 and adds the required functionality to rocksdb core necessary for rocksdb to support 2PC. modification of DBImpl::WriteImpl() - added two arguments uint64_t* log_used = nullptr, uint64_t log_ref = 0; - log_used is an output argument which will return the log number which the incoming batch was inserted into, 0 if no WAL insert took place. - log_ref is a supplied log_number which all memtables inserted into will reference after the batch insert takes place. This number will reside in 'FindMinPrepLogReferencedByMemTable()' until all Memtables inserted into have flushed. - Recovery/writepath is now aware of prepared batches and commit and rollback markers. Test Plan: There is currently no test on this diff. All testing of this functionality takes place in the Transaction layer/diff but I will add some testing. Reviewers: IslamAbdelRahman, sdong Subscribers: leveldb, santoshb, andrewkr, vasilep, dhruba, hermanlee4 Differential Revision: https://reviews.facebook.net/D56919	2016-05-10 14:06:07 -07:00
Reid Horuff	0460e9dcce	Modification of WriteBatch to support two phase commit Summary: Adds three new WriteBatch data types: Prepare(xid), Commit(xid), Rollback(xid). Prepare(xid) should precede the (single) operation to which it applies. There can obviously be multiple Prepare(xid) markers. There should only be one Rollback(xid) or Commit(xid) marker yet not both. None of this logic is currently enforced and will most likely be implemented further up such as in the memtableinserter. All three markers are similar to PutLogData in that they are writebatch meta-data, i.e. stored but not counted. All three markers differ from PutLogData in that they will actually be written to disk. As for WriteBatchWithIndex, Prepare, Commit, Rollback are all implemented just as PutLogData and none are tested just as PutLogData. Test Plan: single unit test in write_batch_test. Reviewers: hermanlee4, sdong, anthony Subscribers: leveldb, dhruba, vasilep, andrewkr Differential Revision: https://reviews.facebook.net/D57867	2016-05-10 14:06:07 -07:00
Islam AbdelRahman	d86f9b9c3f	Fix lite build Summary: Fix lite build Test Plan: run under lite Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D57945	2016-05-09 16:08:30 -07:00
Islam AbdelRahman	4b31723433	Add bottommost_compression option Summary: Add a new option that can be used to set a specific compression algorithm for bottommost level. This option will only affect levels larger than base level. I have also updated CompactionJobInfo to include the compression algorithm used in compaction Test Plan: added new unittest existing unittests Reviewers: andrewkr, yhchiang, sdong Reviewed By: sdong Subscribers: lightmark, andrewkr, dhruba, yoshinorim Differential Revision: https://reviews.facebook.net/D57669	2016-05-09 15:57:19 -07:00
sdong	bfb6b1b8a8	Estimate pending compaction bytes more accurately Summary: Currently we estimate bytes needed for compaction by assuming fanout value to be level multiplier. It overestimates when the size of a level exceeds the target by a large margin. We estimate by the ratio of actual sizes in levels instead. Test Plan: Fix existing test cases and add a new one. Reviewers: IslamAbdelRahman, igor, yhchiang Reviewed By: yhchiang Subscribers: MarkCallaghan, leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D57789	2016-05-09 15:30:02 -07:00
Yi Wu	730f7e2e21	Fix win build Summary: Fixing error with win build where we compare int64_t with size_t. Test Plan: make check Reviewers: andrewkr Reviewed By: andrewkr Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D57885	2016-05-09 11:52:28 -07:00
Andrew Kryczka	269f6b2e2d	Revert "Modification of WriteBatch to support two phase commit" Summary: Revert D54093 and D57453 Test Plan: running make check Reviewers: horuff, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D57819	2016-05-06 16:58:24 -07:00
sdong	7ccb8d6ef3	BlockBasedTable::Get() not to use prefix bloom if read_options.total_order_seek = true Summary: This is to provide a way for users to skip prefix bloom in point look-up. Test Plan: Add a new unit test scenario. Reviewers: IslamAbdelRahman Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D57747	2016-05-06 10:16:11 -07:00
Islam AbdelRahman	967476eaee	Fix valgrind (DBIteratorTest.ReadAhead) Summary: This test is failing under valgrind because we don't delete the Env that we allocated Test Plan: run the test under valgrind Reviewers: andrewkr, yhchiang, yiwu, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D57693	2016-05-05 11:24:08 -07:00
Yi Wu	a4ea345b04	Fixing lite build Summary: Fixing lite build broke in unit test. `FilesPerLevel()` depends on `DB::GetProperty()`, which lite build doesn't support. Test Plan: OPT=-DROCKSDB_LITE make check -j64 Reviewers: sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D57651	2016-05-04 17:20:52 -07:00
Yi Wu	24a24f013d	Enable configurable readahead for iterators Summary: Add an option `iterator_readahead_size` to `ReadOptions` to enable configurable readahead for iterators similar to the corresponding option for compaction. Test Plan: ``` make commit_prereq ``` Reviewers: kumar.rangarajan, ott, igor, sdong Reviewed By: sdong Subscribers: yiwu, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D55419	2016-05-04 15:25:58 -07:00

2344 Commits