rocksdb

Author	SHA1	Message	Date
Poornima Chozhiyath Raman	c0b23dd5b0	Enabling trivial move in universal compaction Summary: This change enables trivial move if all the input files are non-overlapping while doing Universal Compaction. Test Plan: ./compaction_picker_test and db_test ran successfully with the new testcases. Reviewers: sdong Reviewed By: sdong Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D40875	2015-07-07 14:18:55 -07:00
Yueh-Hsuan Chiang	59b50dcef9	Update HISTORY.md for Listener Summary: Update HISTORY.md for Listener Test Plan: no code change Reviewers: igor, sdong, IslamAbdelRahman, anthony Reviewed By: anthony Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D41325	2015-07-07 12:39:36 -07:00
agiardullo	4159f5b87b	Prepare 3.12 Summary: About to cut release Test Plan: none Reviewers: igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D41061	2015-07-02 12:20:36 -07:00
Aaron Feldman	a69bc91e37	Multithreaded backup and restore in BackupEngineImpl Summary: Add a new field: BackupableDBOptions.max_background_copies. CreateNewBackup() and RestoreDBFromBackup() will use this number of threads to perform copies. If there is a backup rate limit, then max_background_copies must be 1. Update backupable_db_test.cc to test multi-threaded backup and restore. Update backupable_db_test.cc to test backups when the backup environment is not the same as the database environment. Test Plan: Run ./backupable_db_test Run valgrind ./backupable_db_test Run with TSAN and ASAN Reviewers: yhchiang, rven, anthony, sdong, igor Reviewed By: igor Subscribers: yhchiang, anthony, sdong, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D40725	2015-07-02 11:35:51 -07:00
Igor Canadi	0a019d74a0	Use malloc_usable_size() for accounting block cache size Summary: Currently, when we insert something into block cache, we say that the block cache capacity decreased by the size of the block. However, size of the block might be less than the actual memory used by this object. For example, 4.5KB block will actually use 8KB of memory. So even if we configure block cache to 10GB, our actual memory usage of block cache will be 20GB! This problem showed up a lot in testing and just recently also showed up in MongoRocks production where we were using 30GB more memory than expected. This diff will fix the problem. Instead of counting the block size, we will count memory used by the block. That way, a block cache configured to be 10GB will actually use only 10GB of memory. I'm using a non-portable function and I couldn't find info on portability on Google. However, it seems to work on Linux, which will cover majority of our use-cases. Test Plan: 1. fill up mongo instance with 80GB of data 2. restart mongo with block cache size configured to 10GB 3. do a table scan in mongo 4. memory usage before the diff: 12GB. memory usage after the diff: 10.5GB Reviewers: sdong, MarkCallaghan, rven, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D40635	2015-06-26 11:48:09 -07:00
Islam AbdelRahman	674b1181cf	Bottommost level compaction option Summary: Replace force_bottommost_level_compaction in CompactRangeOption with an option that allows the user to (always skip, always compact, compact if compaction filter is present) the bottommost level for level based compaction. Test Plan: make check Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D40527	2015-06-23 13:32:40 -07:00
Giuseppe Ottaviano	782a1590f9	Implement a table-level row cache Summary: Implementation of a table-level row cache. It only caches point queries done through the `DB::Get` interface, queries done through the `Iterator` interface will completely skip the cache. Supports snapshots and merge operations. Test Plan: Ran `make valgrind_check commit-prereq` Reviewers: igor, philipp, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D39849	2015-06-23 10:25:45 -07:00
Igor Canadi	760e9a94de	Fail DB::Open() when the requested compression is not available Summary: Currently RocksDB silently ignores this issue and doesn't compress the data. Based on discussion, we agree that this is pretty bad because it can cause confusion for our users. This patch fails DB::Open() if we don't support the compression that is specified in the options. Test Plan: make check with LZ4 not present. If Snappy is not present all tests will just fail because Snappy is our default library. We should make Snappy the requirement, since without it our default DB::Open() fails. Reviewers: sdong, MarkCallaghan, rven, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39687	2015-06-18 14:55:05 -07:00
Aaron Feldman	69bb210d58	Add Cache.GetPinnedUsage() Summary: Add the function Cache.GetPinnedUsage() to return the memory size of entries that are in use by the system (that is, all the entries not in the LRU list). Test Plan: Run ./cache_test and examine PinnedUsageTest. Reviewers: tnovak, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D40305	2015-06-18 13:56:31 -07:00
Islam AbdelRahman	4eabbdb7ec	Skip bottommost level compaction if possible Summary: This is https://reviews.facebook.net/D39999 but after introducing an option to force compaction of the bottommost level. Changes in this patch - Introduce force_bottommost_level_compaction to CompactRangeOptions that forces compacting bottommost level during compaction - Skip bottommost level compaction if we don't have a compaction filter and the force_bottommost_level_compaction option is not set. Although tests pass on my machine, I suspect that there may be some tests that I am not aware of that should use force_bottommost_level_compaction to pass in a deterministic way. Test Plan: make check adding new tests Reviewers: igor, sdong, yhchiang Reviewed By: yhchiang Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D40059	2015-06-18 11:03:31 -07:00
Islam AbdelRahman	12e030a992	Use CompactRangeOptions for CompactRange Summary: This diff updates DB::CompactRange to use CompactRangeOptions instead of using multiple parameters. Old CompactRange is still available but deprecated. Test Plan: make all check make rocksdbjava USE_CLANG=1 make all OPT=-DROCKSDB_LITE make release Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D40209	2015-06-17 14:36:14 -07:00
sdong	40f562e747	Allow GetApproximateSize() to include mem table size if it is skip list memtable Summary: Add an option in GetApproximateSize() so that the result will include estimated sizes in mem tables. To implement it, implement an estimated count from the beginning to a key in skip list. The approach is to count to find the entry, how many Next() is issued from each level, and sum them with a weight that is <branching factor> ^ <level>. Test Plan: Add a test case Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D40119	2015-06-16 18:13:23 -07:00
Igor Canadi	d59d90bb1f	db_bench periodically writes QPS to CSV file Summary: This is part of an effort to better understand and optimize RocksDB stalls under high load. I added a feature to db_bench to periodically write QPS to CSV files. That way we can nicely see how our QPS changes in time (especially when DB is stalled) and can do a better job of evaluating our stall system (i.e. we want the QPS to be as constant as possible, as opposed to having bunch of stalls) Cool part of CSV files is that we can easily graph them -- there are a bunch of tools available. Test Plan: Ran ./db_bench --report_interval_seconds=10 --benchmarks=fillrandom --num=10000000 and observed this in report.csv: secs_elapsed,interval_qps 10,2725860 20,1980480 30,1863456 40,1454359 50,1460389 Reviewers: sdong, MarkCallaghan, rven, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D40047	2015-06-12 14:31:53 -07:00
sdong	7842920be5	Slow down writes by bytes written Summary: We slow down data into the database to the rate of options.delayed_write_rate (a new option) with this patch. The thread synchronization approach I take is to still synchronize write controller by DB mutex and GetDelay() is inside DB mutex. Try to minimize the frequency of getting time in GetDelay(). I verified it through db_bench and it seems to work. hard_rate_limit is deprecated. options.delayed_write_rate is still not dynamically changeable. Need to work on it as a follow-up. Test Plan: Add new unit tests in db_test Reviewers: yhchiang, rven, kradhakrishnan, anthony, MarkCallaghan, igor Reviewed By: igor Subscribers: ikabiljo, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D36351	2015-06-11 20:42:18 -07:00
Igor Canadi	821cff114e	Re-generate WriteEntry on WBWIIterator::Entry() Summary: [This is the resubmit of D39813. Tests were failing, so I reverted the diff. I found the bug and I'm now resubmitting] If we don't do this, any calls to Entry() after WBWI mutation will result in undefined behavior. We need to re-fetch the offset from the skip list and regenerate the new pointer (because string's base pointer can change while mutating). Test Plan: COMPILE_WITH_ASAN=1 make write_batch_with_index_test && ./write_batch_with_index_test Reviewers: sdong Reviewed By: sdong Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D39897	2015-06-10 12:57:38 -07:00
Igor Canadi	75222d130e	Revert "Fix compile" This reverts commit `51440f83ec`. Revert "Re-generate WriteEntry on WBWIIterator::Entry()" This reverts commit `4949ef08db`.	2015-06-10 11:05:27 -07:00
Igor Canadi	4949ef08db	Re-generate WriteEntry on WBWIIterator::Entry() Summary: If we don't do this, any calls to Entry() after WBWI mutation will result in undefined behavior. We need to re-fetch the offset from the skip list and regenerate the new pointer (because string's base pointer can change while mutating). Test Plan: COMPILE_WITH_ASAN=1 make write_batch_with_index_test && ./write_batch_with_index_test Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39813	2015-06-10 10:35:19 -07:00
Venkatesh Radhakrishnan	406a5682eb	Fix hang when closing a DB after doing loads with WAL disabled. Summary: There is a hang during DB close in the following scenario: a) a load with WAL disabled was done, b) CancelAllBackgroundWork was called, c) DB Close was called. This was because in that case we will wait for a flush but we cannot do a background flush because we have called CancelAllBackgroundWork which marks the DB as shutting down. Test Plan: Added DBTest FlushOnDestroy Reviewers: sdong Reviewed By: sdong Subscribers: yoshinorim, hermanlee4, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39747	2015-06-09 10:39:49 -07:00
Islam AbdelRahman	643bbbf081	Use nullptr for default compaction_filter_factory Summary: Replacing the default value for compaction_filter_factory and compaction_filter_factory_v2 to be nullptr instead of DefaultCompactionFilterFactory / DefaultCompactionFilterFactoryV2 The reason for this is to be able to determine easily if we have compaction filter factory or not without depending on RTTI Test Plan: make check Reviewers: yoshinorim, ott, igor, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D39693	2015-06-08 16:34:26 -07:00
Yueh-Hsuan Chiang	2e764f06ea	[API Change] Improve EventListener::OnFlushCompleted interface Summary: EventListener::OnFlushCompleted() now passes a structure instead of a list of parameters. This minimizes the API change in the future. Test Plan: listener_test compact_files_test example/compact_files_example Reviewers: kradhakrishnan, sdong, IslamAbdelRahman, rven, igor Reviewed By: rven, igor Subscribers: IslamAbdelRahman, rven, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39543	2015-06-05 12:28:51 -07:00
sdong	4266d4fd90	Allow users to migrate to options.level_compaction_dynamic_level_bytes=true using CompactRange() Summary: In DB::CompactRange(), change parameter "reduce_level" to "change_level". Users can compact all data to the last level if needed. By doing it, users can migrate the DB to options.level_compaction_dynamic_level_bytes=true. Test Plan: Add a unit test for it. Reviewers: yhchiang, anthony, kradhakrishnan, igor, rven Reviewed By: rven Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D39099	2015-06-01 18:21:14 -07:00
agiardullo	dc9d70de65	Optimistic Transactions Summary: Optimistic transactions supporting begin/commit/rollback semantics. Currently relies on checking the memtable to determine if there are any collisions at commit time. Not yet implemented would be a way of ensuring the memtable has some minimum amount of history so that we won't fail to commit when the memtable is empty. You should probably start with transaction.h to get an overview of what is currently supported. Test Plan: Added a new test, but still need to look into stress testing. Reviewers: yhchiang, igor, rven, sdong Reviewed By: sdong Subscribers: adamretter, MarkCallaghan, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D33435	2015-05-29 14:36:35 -07:00
agiardullo	c815351038	Support saving history in memtable_list Summary: For transactions, we are using the memtables to validate that there are no write conflicts. But after flushing, we don't have any memtables, and transactions could fail to commit. So we want to somehow keep around some extra history to use for conflict checking. In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit. After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure). It seems like the best place for this is abstracted inside the memtable_list. I decided to create a separate list in MemtableListVersion as using the same list complicated the flush/installalflushresults logic too much. This diff adds a new parameter to control how much memtable history to keep around after flushing. However, it sounds like people aren't too fond of adding new parameters. So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers. This should not change the maximum amount of memory used, but make it more likely we're using closer to the limit. (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached). So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit). However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions. Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests. Added testing in memtablelist_test and planning on adding more testing here. Reviewers: sdong, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37443	2015-05-28 16:34:24 -07:00
Yueh-Hsuan Chiang	672dda9b3b	[API Change] Move listeners from ColumnFamilyOptions to DBOptions Summary: Move listeners from ColumnFamilyOptions to DBOptions Test Plan: listener_test compact_files_test Reviewers: rven, anthony, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39087	2015-05-28 13:21:39 -07:00
Yueh-Hsuan Chiang	e2c1d4b57f	[Public API Change] Make DB::GetDbIdentity() be const function. Summary: Make DB::GetDbIdentity() be const function. Test Plan: make db_test Reviewers: igor, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38745	2015-05-21 11:01:48 -07:00
Karthikeyan Radhakrishnan	eaf61ba9f3	Minor text correction New features title was repeated twice. Fixed it.	2015-05-21 10:55:58 -07:00
Yueh-Hsuan Chiang	b588505a7f	Update HISTORY.md for GetThreadList() update. Summary: Update HISTORY.md for GetThreadList() update. Test Plan: no code change Reviewers: sdong, rven, anthony, krishnanm86, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38685	2015-05-19 18:41:57 -07:00
Karthikeyan Radhakrishnan	d5de04d20e	Update history for 3.11 Flipped the unreleased section to 3.11	2015-05-19 14:19:11 -07:00
Igor Canadi	4a855c0799	Add an option wal_bytes_per_sync to control sync_file_range for WAL files Summary: sync_file_range is not always asynchronous and thus can block writes if we do this for WAL in the foreground thread. See more here: http://yoshinorimatsunobu.blogspot.com/2014/03/how-syncfilerange-really-works.html Some users don't want us to call sync_file_range on WALs. Some others do. Thus, I'm adding a separate option wal_bytes_per_sync to control calling sync_file_range on WAL files. bytes_per_sync will apply only to table files now. Test Plan: no more sync_file_range for WAL as evidenced by strace Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38253	2015-05-18 17:03:59 -07:00
Aashish Pant	794ccfde89	Task 6532943: Rocksdb - SetCapacity() can dynamically change cache capacity if feasible Summary: When new capacity is larger than existing capacity, simply update the capacity to the new value. When new capacity is less than existing capacity, but more than the usage, simply update the capacity to the new value. When new capacity is less than both the existing capacity and the existing usage, try to purge entries in LRU if feasible to make usage < capacity. Test Plan: Created unit tests in cache_test.cc Reviewers: sdong, rven, yhchiang, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D37527	2015-04-24 14:12:58 -07:00
sdong	953a885ebf	A new call back to TablePropertiesCollector to allow users to know whether the entry is add, delete or merge Summary: Currently users have no idea whether a key is add, delete or merge from the TablePropertiesCollector call back. Add a new function to add it. Also refactor the code so that (1) table property collector and internal table property collector are two separate data structures, with the latter one now exposed (2) table builders only receive internal table properties Test Plan: Add cases in table_properties_collector_test to cover both of old and new ways of using TablePropertiesCollector. Reviewers: yhchiang, igor.sugak, rven, igor Reviewed By: rven, igor Subscribers: meyering, yoshinorim, maykov, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D35373	2015-04-06 10:27:21 -07:00
sdong	b23bbaa82a	Universal Compactions with Small Files Summary: With this change, we use L1 and up to store compaction outputs in universal compaction. The compaction pick logic stays the same. Outputs are stored in the largest "level" possible. If options.num_levels=1, it behaves all the same as now. Test Plan: 1) convert most of existing unit tests for universal compaction to include the option of one level and multiple levels. 2) add a unit test to cover parallel compaction in universal compaction and run it in one level and multiple levels 3) add unit test to migrate from multiple level setting back to one level setting 4) add a unit test to insert keys to trigger multiple rounds of compactions and verify results. Reviewers: rven, kradhakrishnan, yhchiang, igor Reviewed By: igor Subscribers: meyering, leveldb, MarkCallaghan, dhruba Differential Revision: https://reviews.facebook.net/D34539	2015-03-30 15:12:02 -07:00
Igor Canadi	d61cb0b9de	db_bench can now disable flashcache for background threads Summary: Most of the approach is copied from WebSQL's MySQL branch. It's nice that we can do this without touching core RocksDB code. Test Plan: Compiles and runs. Didn't test flashback code, as I don't have flashback device and most of it is c/p Reviewers: MarkCallaghan, sdong Reviewed By: sdong Subscribers: rven, lgalanis, kradhakrishnan, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35391	2015-03-30 09:51:11 -07:00
Yueh-Hsuan Chiang	39d508e34c	Add a missing section title in HISTORY.md Summary: Add a missing section title in HISTORY.md Test Plan: no code change	2015-03-25 14:14:26 -07:00
Yueh-Hsuan Chiang	2d417e52df	Update HISTORY.md for 3.10.0 Summary: Update HISTORY.md for 3.10.0 Test Plan: no code change. Reviewers: sdong, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35871	2015-03-24 16:39:39 -07:00
Igor Canadi	b088c83e6e	Don't delete files when column family is dropped Summary: To understand the bug read t5943287 and check out the new test in column_family_test (ReadDroppedColumnFamily), iter 0. RocksDB contract allows you to read a dropped column family as long as there is a live reference. However, since our iteration ignores dropped column families, AddLiveFiles() didn't mark files of dropped column families as live. So we deleted them. In this patch I no longer ignore dropped column families in the iteration. I think this behavior was confusing and it also led to this bug. Now if an iterator client wants to ignore dropped column families, he needs to do it explicitly. Test Plan: Added a new unit test that is failing on master. Unit test succeeds now. Reviewers: sdong, rven, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32535	2015-03-19 17:04:29 -07:00
agiardullo	81345b90f9	Create an abstract interface for write batches Summary: WriteBatch and WriteBatchWithIndex now both inherit from a common abstract base class. This makes it easier to write code that is agnostic toward the implementation of the particular write batch. In particular, I plan on utilizing this abstraction to allow transactions to support using either implementation of a write batch. Test Plan: modified existing WriteBatchWithIndex tests to test new functions. Running all tests. Reviewers: igor, rven, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34017	2015-03-17 19:23:08 -07:00
Igor Canadi	c88ff4ca76	Deprecate removeScanCountLimit in NewLRUCache Summary: It is no longer used by the implementation, so we should also remove it from the public API. Test Plan: make check Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34971	2015-03-17 15:04:37 -07:00
Igor Canadi	db03739340	options.level_compaction_dynamic_level_bytes to allow RocksDB to pick size bases of levels dynamically. Summary: When having fixed max_bytes_for_level_base, the ratio of size of largest level and the second one can range from 0 to the multiplier. This makes LSM tree frequently irregular and unpredictable. It can also cause poor space amplification in some cases. In this improvement (proposed by Igor Kabiljo), we introduce a parameter option.level_compaction_use_dynamic_max_bytes. When turning it on, RocksDB is free to pick a level base in the range of (options.max_bytes_for_level_base/options.max_bytes_for_level_multiplier, options.max_bytes_for_level_base] so that real level ratios are close to options.max_bytes_for_level_multiplier. Test Plan: New unit tests and pass tests suites including valgrind. Reviewers: MarkCallaghan, rven, yhchiang, igor, ikabiljo Reviewed By: ikabiljo Subscribers: yoshinorim, ikabiljo, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31437	2015-03-02 22:40:41 -08:00
Igor Canadi	b9ff6b050d	Fix a bug in ReadOnlyBackupEngine Summary: This diff fixes a bug introduced by D28521. Read-only backup engine can delete a backup that is later than the latest -- we never check the condition. I also added a bunch of logging that will help with debugging cases like this in the future. See more discussion at t6218248. Test Plan: Added a unit test that was failing before the change. Also, see new LOG file contents: https://phabricator.fb.com/P19738984 Reviewers: benj, sanketh, sumeet, yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33897	2015-02-27 14:03:56 -08:00
sdong	68af7811ea	Remember whole key/prefix filtering on/off in SST file Summary: Remember whole key or prefix filtering on/off in SST files. If user opens the DB with a different setting that cannot be satisfied while reading the SST file, ignore the bloom filter. Test Plan: Add a unit test for it Reviewers: yhchiang, igor, rven Reviewed By: rven Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D32889	2015-02-11 11:20:04 -08:00
fyrz	cfe8837e43	Switch logv with loglevel to virtual	2015-02-09 20:59:29 +01:00
sdong	e63140d52b	Get() to use prefix bloom filter when filter is not block based Summary: Get() now doesn't make use of bloom filter if it is prefix based. Add the check. Didn't touch block based bloom filter. I can't fully reason whether it is correct to do that. But it's straightforward for full bloom filter. Test Plan: make all check Add a test case in DBTest Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: MarkCallaghan, leveldb, dhruba, yoshinorim Differential Revision: https://reviews.facebook.net/D31941	2015-02-04 15:15:41 -08:00
sdong	5917de0bae	CappedFixTransform: return fixed length prefix, or full key if key is shorter than the fixed length Summary: Add CappedFixTransform, which is the same as fixed length prefix extractor, except that when slice is shorter than the fixed length, it will use the full key. Test Plan: Add a test case for db_test options_test and a new test Reviewers: yhchiang, rven, igor Reviewed By: igor Subscribers: MarkCallaghan, leveldb, dhruba, yoshinorim Differential Revision: https://reviews.facebook.net/D31887	2015-01-30 16:04:30 -08:00
Igor Canadi	2fd8f750ab	Compile MemEnv with standard RocksDB library Summary: This was a feature request by osquery. See task t5617758 Test Plan: compiles and memenv_test runs Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32115	2015-01-29 16:33:11 -08:00
Yueh-Hsuan Chiang	cd4c071973	Update HISTORY.md for GetThreadStatus() support on compaction.	2015-01-22 15:46:56 -08:00
Igor Canadi	9ab5adfc59	New BlockBasedTable version -- better compressed block format Summary: This diff adds BlockBasedTable format_version = 2. New format version brings better compressed block format for these compressions: 1) Zlib -- encode decompressed size in compressed block header 2) BZip2 -- encode decompressed size in compressed block header 3) LZ4 and LZ4HC -- instead of doing memcpy of size_t encode size as varint32. memcpy is very bad because the DB is not portable across big/little endian machines or even platforms where size_t might be 8 or 4 bytes. It does not affect format for snappy. If you write a new database with format_version = 2, it will not be readable by RocksDB versions before 3.10. DB::Open() will return corruption in that case. Test Plan: Added a new test in db_test. I will also run db_bench and verify VSIZE when block_cache == 1GB Reviewers: yhchiang, rven, MarkCallaghan, dhruba, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31461	2015-01-14 16:24:24 -08:00
Igor Canadi	62ad0a9b19	Deprecating skip_log_error_on_recovery Summary: Since https://reviews.facebook.net/D16119, we ignore partial tailing writes. Because of that, we no longer need skip_log_error_on_recovery. The documentation says "Skip log corruption error on recovery (If client is ok with losing most recent changes)", while the option actually ignores any corruption of the WAL (not only just the most recent changes). This is very dangerous and can lead to DB inconsistencies. This was originally set up to ignore partial tailing writes, which we now do automatically (after D16119). I have dug up old task t2416297 which confirms my findings. Test Plan: There were actually no tests that verified correct behavior of skip_log_error_on_recovery. Reviewers: yhchiang, rven, dhruba, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D30603	2015-01-05 13:35:56 -08:00
Lei Jin	5045c43944	add support for nested BlockBasedTableOptions in config string Summary: Add support to allow nested config for block-based table factory. The format looks like this: "write_buffer_size=1024;block_based_table_factory={block_size=4k};max_write_buffer_num=2" Test Plan: unit test Reviewers: yhchiang, rven, igor, ljin, jonahcohen Reviewed By: jonahcohen Subscribers: jonahcohen, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D29223	2014-12-22 16:34:21 -08:00
Igor Canadi	0acc738810	Speed up FindObsoleteFiles() Summary: There are two versions of FindObsoleteFiles(): * full scan, which is executed every 6 hours (and it's terribly slow) * no full scan, which is executed every time a background process finishes and iterator is deleted This diff is optimizing the second case (no full scan). Here's what we do before the diff: * Get the list of obsolete files (files with ref==0). Some files in obsolete_files set might actually be live. * Get the list of live files to avoid deleting files that are live. * Delete files that are in obsolete_files and not in live_files. After this diff: * The only files with ref==0 that are still live are files that have been part of move compaction. Don't include moved files in obsolete_files. * Get the list of obsolete files (which exclude moved files). * No need to get the list of live files, since all files in obsolete_files need to be deleted. I'll post the benchmark results, but you can get the feel of it here: https://reviews.facebook.net/D30123 This depends on D30123. P.S. We should do full scan only in failure scenarios, not every 6 hours. I'll do this in a follow-up diff. Test Plan: One new unit test. Made sure that unit test fails if we don't have a `if (!f->moved)` safeguard in ~Version. make check Big number of compactions and flushes: ./db_stress --threads=30 --ops_per_thread=20000000 --max_key=10000 --column_families=20 --clear_column_family_one_in=10000000 --verify_before_write=0 --reopen=15 --max_background_compactions=10 --max_background_flushes=10 --db=/fast-rocksdb-tmp/db_stress --prefixpercent=0 --iterpercent=0 --writepercent=75 --db_write_buffer_size=2000000 Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D30249	2014-12-22 12:04:45 +01:00
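
Note on commit 794ccfde89 (SetCapacity): the three cases in that summary reduce to "update the limit, then purge from the LRU list only while usage still exceeds it". The sketch below illustrates that rule with a hypothetical ToyLruCache class; it is not RocksDB's LRUCache, and every name in it (Insert, AddPinned, Contains, EvictUntilFits) is an assumption made for illustration. AddPinned() stands in for entries that are in use and therefore not on the LRU list, which is why the purge can stop short of the new capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical toy cache for illustration only; this is not RocksDB's LRUCache.
class ToyLruCache {
 public:
  explicit ToyLruCache(std::size_t capacity) : capacity_(capacity) {}

  // Adds an evictable entry, then evicts from the cold end if over capacity.
  void Insert(const std::string& key, std::size_t charge) {
    Erase(key);
    lru_.emplace_front(key, charge);
    index_[key] = lru_.begin();
    usage_ += charge;
    EvictUntilFits();
  }

  // Models memory held by in-use entries, which are not on the LRU list.
  void AddPinned(std::size_t charge) { usage_ += charge; }

  // New capacity >= usage (cases 1 and 2): only the limit changes.
  // New capacity < usage (case 3): purge LRU entries as far as feasible.
  void SetCapacity(std::size_t new_capacity) {
    capacity_ = new_capacity;
    EvictUntilFits();
  }

  std::size_t capacity() const { return capacity_; }
  std::size_t usage() const { return usage_; }
  bool Contains(const std::string& key) const { return index_.count(key) > 0; }

 private:
  using Entry = std::pair<std::string, std::size_t>;  // (key, charge)

  void Erase(const std::string& key) {
    auto it = index_.find(key);
    if (it == index_.end()) return;
    assert(usage_ >= it->second->second);
    usage_ -= it->second->second;
    lru_.erase(it->second);
    index_.erase(it);
  }

  // Evicts least recently inserted entries until usage fits or nothing
  // evictable is left (pinned usage cannot be reclaimed here).
  void EvictUntilFits() {
    while (usage_ > capacity_ && !lru_.empty()) {
      const std::string victim = lru_.back().first;  // copy before the node dies
      Erase(victim);
    }
  }

  std::size_t capacity_;
  std::size_t usage_ = 0;
  std::list<Entry> lru_;  // front = most recent, back = eviction candidate
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

In this sketch, growing the capacity or shrinking it to a value still above the current usage evicts nothing, while shrinking below the usage evicts from the cold end and may leave usage above capacity if only pinned memory remains.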


144 Commits