rocksdb

Author	SHA1	Message	Date
Kai Liu	054c5dda8c	Merge branch 'master' into performance Conflicts: db/db_impl.cc db/db_test.cc db/memtable.cc db/version_set.cc include/rocksdb/statistics.h util/statistics_imp.h	2014-01-23 16:32:49 -08:00
Tomislav Novak	81c9cc9b3b	Tailing iterator Summary: This diff implements a special type of iterator that doesn't create a snapshot (can be used to read newly inserted data) and is optimized for doing sequential reads. TailingIterator uses current superversion number to determine whether to invalidate its internal iterators. If the version hasn't changed, it can often avoid doing expensive seeks over immutable structures (sst files and immutable memtables). Test Plan: * new unit tests * running LD with this patch Reviewers: igor, dhruba, haobo, sdong, kailiu Reviewed By: sdong CC: leveldb, lovro, march Differential Revision: https://reviews.facebook.net/D15285	2014-01-23 16:26:08 -08:00
Kai Liu	bb19b530ca	Aggressively inlining the short functions in coding.cc Summary: This diff takes an even more aggressive way to inline the functions. A decent rule that I followed is "not inline a function if it is more than 10 lines long." Normally optimizing code by inline is ugly and hard to control, but since one of our usecase has significant amount of CPU used in functions from coding.cc, I'd like to try this diff out. Test Plan: 1. the size for some .o file increased a little bit, but most less than 1%. So I think the negative impact of inline is negligible. 2. As the regression test shows (ran for 10 times and I calculated the average number) Metrics Befor After ======================================================================== rocksdb.build.fillseq.qps 426595 444515 (+4.6%) rocksdb.build.memtablefillrandom.qps 121739 123110 rocksdb.build.memtablereadrandom.qps 1285103 1280520 rocksdb.build.overwrite.qps 125816 135570 (+9%) rocksdb.build.readrandom_fillunique_random.qps 285995 296863 rocksdb.build.readrandom_memtable_sst.qps 1027132 1027279 rocksdb.build.readrandom.qps 1041427 1054665 rocksdb.build.readrandom_smallblockcache.qps 1028631 1038433 rocksdb.build.readwhilewriting.qps 918352 914629 Reviewers: haobo, sdong, igor CC: leveldb Differential Revision: https://reviews.facebook.net/D15291	2014-01-23 16:03:34 -08:00
Igor Canadi	83681bf9ef	Statistics code cleanup Summary: I'm separating code-cleanup part of https://reviews.facebook.net/D14517. This will make D14517 easier to understand and this diff easier to review. Test Plan: make check Reviewers: haobo, kailiu, sdong, dhruba, tnovak Reviewed By: tnovak CC: leveldb Differential Revision: https://reviews.facebook.net/D15099	2014-01-17 12:46:06 -08:00
Naman Gupta	1447bb5919	Allow callback to change size of existing value. Change return type of the callback function to an enum status to handle 3 cases. Summary: This diff fixes 2 hacks: * The callback function can modify the existing value inplace, if the merged value fits within the existing buffer size. But currently the existing buffer size is not being modified. Now the callback recieves a int* allowing the size to be modified. Since size is encoded as a varint in the internal key for memtable. It might happen that the entire value might have be copied to the new location if the new size varint is smaller than the existing size varint. * The callback function has 3 functionalities 1. Modify existing buffer inplace, and update size correspondingly. Now to indicate that, Returns 1. 2. Generate a new buffer indicating merged value. Returns 2. 3. Fails to do either of above, based on whatever application logic. Returns 0. Test Plan: Just make all for now. I'm adding another unit test to test each scenario. Reviewers: dhruba, haobo Reviewed By: haobo CC: leveldb, sdong, kailiu, xinyaohu, sumeet, danguo Differential Revision: https://reviews.facebook.net/D15195	2014-01-16 15:12:39 -08:00
kailiu	1304d8c8ce	Merge branch 'master' into performance Conflicts: Makefile db/db_impl.cc db/db_impl.h db/db_test.cc db/memtable.cc db/memtable.h db/version_edit.h db/version_set.cc include/rocksdb/options.h util/hash_skiplist_rep.cc util/options.cc	2014-01-15 23:12:31 -08:00
kailiu	eae1804f29	Remove the unnecessary use of shared_ptr Summary: shared_ptr is slower than unique_ptr (which literally comes with no performance cost compare with raw pointers). In memtable and memtable rep, we use shared_ptr when we'd actually should use unique_ptr. According to igor's previous work, we are likely to make quite some performance gain from this diff. Test Plan: make check Reviewers: dhruba, igor, sdong, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15213	2014-01-15 18:22:01 -08:00
kailiu	c8f16221ed	Fix the return type of WriteBatch::Data(). Summary: Quick fix for https://reviews.facebook.net/D15123 Test Plan: Make check Reviewers: sdong, vkrest CC: leveldb Differential Revision: https://reviews.facebook.net/D15165	2014-01-14 20:24:48 -08:00
Igor Canadi	d9cd7a063f	Fix CompactRange to apply filter to every key Summary: When doing CompactRange(), we should first flush the memtable and then calculate max_level_with_files. Also, we want to compact all the levels that have files, including level `max_level_with_files`. This patch fixed the unit test. Test Plan: Added a failing unit test and a fix, so it's not failing anymore. Reviewers: dhruba, haobo, sdong Reviewed By: haobo CC: leveldb, xjin Differential Revision: https://reviews.facebook.net/D14421	2014-01-14 16:19:09 -08:00
Siying Dong	9ea8bf90f1	DB::Put() to estimate write batch data size needed and pre-allocate buffer Summary: In one of CPU profiles, we see some CPU costs of string::reserve() inside Batch.Put(). This patch should be able to reduce some of the costs by allocating sufficient buffer before hand. Since it is a trivial percentage of CPU costs, I didn't find a way to show the improvement in one of the benchmarks. I'll deploy it to same application and do the same CPU profiling to make sure those CPU costs are reduced. Test Plan: make all check Reviewers: haobo, kailiu, igor Reviewed By: haobo CC: leveldb, nkg- Differential Revision: https://reviews.facebook.net/D15135	2014-01-14 11:24:43 -08:00
Siying Dong	51dd21926c	DB::Put() to estimate write batch data size needed and pre-allocate buffer Summary: In one of CPU profiles, we see some CPU costs of string::reserve() inside Batch.Put(). This patch should be able to reduce some of the costs by allocating sufficient buffer before hand. Since it is a trivial percentage of CPU costs, I didn't find a way to show the improvement in one of the benchmarks. I'll deploy it to same application and do the same CPU profiling to make sure those CPU costs are reduced. Test Plan: make all check Reviewers: haobo, kailiu, igor Reviewed By: haobo CC: leveldb, nkg- Differential Revision: https://reviews.facebook.net/D15135	2014-01-14 10:53:16 -08:00
Naman Gupta	8454cfe569	Add read/modify/write functionality to Put() api Summary: The application can set a callback function, which is applied on the previous value. And calculates the new value. This new value can be set, either inplace, if the previous value existed in memtable, and new value is smaller than previous value. Otherwise the new value is added normally. Test Plan: fbmake. Added unit tests. All unit tests pass. Reviewers: dhruba, haobo Reviewed By: haobo CC: sdong, kailiu, xinyaohu, sumeet, leveldb Differential Revision: https://reviews.facebook.net/D14745	2014-01-14 07:55:16 -08:00
Siying Dong	c4548d5f1f	WriteBatch to provide a way for user to query data size directly and only return constant reference of data in Data() Summary: WriteBatch::Data() now is easily to be misuse by users. Also, there is no cheap way for user of WriteBatch to know the data size accumulated. This patch fix the problem by: (1) return a constant reference to Data() so it's obvious to caller what it means. (2) add a function to return data size directly Test Plan: make all check Reviewers: haobo, igor, kailiu Reviewed By: kailiu CC: zshao, leveldb Differential Revision: https://reviews.facebook.net/D15123	2014-01-13 16:52:14 -08:00
Schalk-Willem Kruger	a09ee1069d	Improve RocksDB "get" performance by computing merge result in memtable Summary: Added an option (max_successive_merges) that can be used to specify the maximum number of successive merge operations on a key in the memtable. This can be used to improve performance of the "get" operation. If many successive merge operations are performed on a key, the performance of "get" operations on the key deteriorates, as the value has to be computed for each "get" operation by applying all the successive merge operations. FB Task ID: #3428853 Test Plan: make all check db_bench --benchmarks=readrandommergerandom counter_stress_test Reviewers: haobo, vamsi, dhruba, sdong Reviewed By: haobo CC: zshao Differential Revision: https://reviews.facebook.net/D14991	2014-01-10 17:33:56 -08:00
Siying Dong	aa0ef6602d	[Performance Branch] If options.max_open_files set to be -1, cache table readers in FileMetadata for Get() and NewIterator() Summary: In some use cases, table readers for all live files should always be cached. In that case, there will be an opportunity to avoid the table cache look-up while Get() and NewIterator(). We define options.max_open_files = -1 to be the mode that table readers for live files will always be kept. In that mode, table readers are cached in FileMetaData (with a reference count hold in table cache). So that when executing table_cache.Get() and table_cache.newInterator(), LRU cache checking can be by-passed, to reduce latency. Test Plan: add a test case in db_test Reviewers: haobo, kailiu Reviewed By: haobo CC: dhruba, igor, leveldb Differential Revision: https://reviews.facebook.net/D15039	2014-01-10 15:57:49 -08:00
Siying Dong	424a524ac9	[Performance Branch] A Hashed Linked List Based Mem Table Summary: Implement a mem table, in which keys are hashed based on prefixes. In each bucket, entries are organized in a sorted linked list. It has the same thread safety guarantee as skip list. The motivation is to optimize memory usage for the case that prefix hashing is primary way of seeking to the entry. Compared to hash skip list implementation, this implementation is more memory efficient, but inside each bucket, search is always linear. The target scenario is that there are only very limited number of records in each hash bucket. Test Plan: Add a test case in db_test Reviewers: haobo, kailiu, dhruba Reviewed By: haobo CC: igor, nkg-, leveldb Differential Revision: https://reviews.facebook.net/D14979	2014-01-09 16:19:11 -08:00
kailiu	12b6d2b839	Separate the aligned and unaligned memory allocation Summary: Use two vectors for different types of memory allocation. Test Plan: run all unit tests. Reviewers: haobo, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15027	2014-01-08 15:11:42 -08:00
Igor Canadi	a45b7d83ba	Merge pull request #59 from mlin/more-c-bindings C API: add rocksdb_env_set_high_priority_background_threads	2014-01-07 16:33:03 -08:00
kailiu	e72aa37cc5	Merge branch 'master' into performance Conflicts: db/table_cache.cc	2014-01-02 16:34:59 -08:00
Igor Canadi	b60c14f6ee	Support multi-threaded DisableFileDeletions() and EnableFileDeletions() Summary: We don't want two threads to clash if they concurrently call DisableFileDeletions() and EnableFileDeletions(). I'm adding a counter that will enable file deletions only after all DisableFileDeletions() calls have been negated with EnableFileDeletions(). However, we also don't want to break the old behavior, so I added a parameter force to EnableFileDeletions(). If force is true, we will still enable file deletions after every call to EnableFileDeletions(), which is what is happening now. Test Plan: make check Reviewers: dhruba, haobo, sanketh Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14781	2014-01-02 03:33:42 -08:00
Mike Lin	4b1d049236	C API: add rocksdb_env_set_high_priority_background_threads	2013-12-31 15:14:18 -08:00
kailiu	f1cec73a76	Merge branch 'master' into performance Conflicts: db/db_impl.cc db/db_test.cc db/memtable.cc db/version_set.cc include/rocksdb/statistics.h	2013-12-27 12:23:17 -08:00
Siying Dong	18df47b79a	Avoid malloc in NotFound key status if no message is given. Summary: In some places we have NotFound status created with empty message, but it doesn't avoid a malloc. With this patch, the malloc is avoided for that case. The motivation of it is that I found in db_bench readrandom test when all keys are not existing, about 4% of the total running time is spent on malloc of Status, plus a similar amount of CPU spent on free of them, which is not necessary. Test Plan: make all check Reviewers: dhruba, haobo, igor Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D14691	2013-12-26 16:23:10 -08:00
Siying Dong	abaf26266d	[RocksDB] [Performance Branch] Some Changes to PlainTable format Summary: Some changes to PlainTable format: (1) support variable key length (2) use user defined slice transformer to extract prefixes (3) Run some test cases against PlainTable in db_test and table_test Test Plan: test db_test Reviewers: haobo, kailiu CC: dhruba, igor, leveldb, nkg- Differential Revision: https://reviews.facebook.net/D14457	2013-12-20 12:08:35 -08:00
Igor Canadi	b26dc95628	Initialize sequence number in BatchResult - issue #39	2013-12-20 10:01:12 -08:00
Mike Lin	2a2506b629	C bindings: add a bunch of the newer options	2013-12-15 13:47:06 -08:00
Kai Liu	2e9efcd6d8	Add the property block for the plain table Summary: This is the last diff that adds the property block to plain table. The format resembles that of the block-based table: https://github.com/facebook/rocksdb/wiki/Rocksdb-table-format [data block] [meta block 1: stats block] [meta block 2: future extended block] ... [meta block K: future extended block] (we may add more meta blocks in the future) [metaindex block] [index block: we only have the placeholder here, we can add persistent index block in the future] [Footer: contains magic number, handle to metaindex block and index block] <end_of_file> Test Plan: extended existing property block test. Reviewers: haobo, sdong, dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14523	2013-12-13 17:18:14 -08:00
kailiu	b660e2d468	Expose usage info for the cache Summary: This diff will help us to figure out the memory usage for the cache part. Test Plan: added a new memory usage test for cache Reviewers: haobo, sdong, dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14559	2013-12-13 12:53:45 -08:00
Mark Callaghan	e9e6b00d29	Add monitoring for universal compaction and add counters for compaction IO Summary: Adds these counters { WAL_FILE_SYNCED, "rocksdb.wal.synced" } number of writes that request a WAL sync { WAL_FILE_BYTES, "rocksdb.wal.bytes" }, number of bytes written to the WAL { WRITE_DONE_BY_SELF, "rocksdb.write.self" }, number of writes processed by the calling thread { WRITE_DONE_BY_OTHER, "rocksdb.write.other" }, number of writes not processed by the calling thread. Instead these were processed by the current holder of the write lock { WRITE_WITH_WAL, "rocksdb.write.wal" }, number of writes that request WAL logging { COMPACT_READ_BYTES, "rocksdb.compact.read.bytes" }, number of bytes read during compaction { COMPACT_WRITE_BYTES, "rocksdb.compact.write.bytes" }, number of bytes written during compaction Per-interval stats output was updated with WAL stats and correct stats for universal compaction including a correct value for write-amplification. It now looks like: Compactions Level Files Size(MB) Score Time(sec) Read(MB) Write(MB) Rn(MB) Rnp1(MB) Wnew(MB) RW-Amplify Read(MB/s) Write(MB/s) Rn Rnp1 Wnp1 NewW Count Ln-stall Stall-cnt -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0 7 464 46.4 281 3411 3875 3411 0 3875 2.1 12.1 13.8 621 0 240 240 628 0.0 0 Uptime(secs): 310.8 total, 2.0 interval Writes cumulative: 9999999 total, 9999999 batches, 1.0 per batch, 1.22 ingest GB WAL cumulative: 9999999 WAL writes, 9999999 WAL syncs, 1.00 writes per sync, 1.22 GB written Compaction IO cumulative (GB): 1.22 new, 3.33 read, 3.78 write, 7.12 read+write Compaction IO cumulative (MB/sec): 4.0 new, 11.0 read, 12.5 write, 23.4 read+write Amplification cumulative: 4.1 write, 6.8 compaction Writes interval: 100000 total, 100000 batches, 1.0 per batch, 12.5 ingest MB WAL interval: 100000 WAL writes, 100000 WAL syncs, 1.00 writes per sync, 0.01 MB written Compaction IO interval (MB): 12.49 new, 14.98 read, 21.50 write, 36.48 read+write Compaction IO interval (MB/sec): 6.4 new, 7.6 read, 11.0 write, 18.6 read+write Amplification interval: 101.7 write, 102.9 compaction Stalls(secs): 142.924 level0_slowdown, 0.000 level0_numfiles, 0.805 memtable_compaction, 0.000 leveln_slowdown Stalls(count): 132461 level0_slowdown, 0 level0_numfiles, 3 memtable_compaction, 0 leveln_slowdown Task ID: #3329644, #3301695 Blame Rev: Test Plan: Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14583	2013-12-12 13:27:43 -08:00
Siying Dong	a8029fdc75	Introduce MergeContext to Lazily Initialize merge operand list Summary: In get operations, merge_operands is only used in few cases. Lazily initialize it can reduce average latency in some cases Test Plan: make all check Reviewers: haobo, kailiu, dhruba Reviewed By: haobo CC: igor, nkg-, leveldb Differential Revision: https://reviews.facebook.net/D14415 Conflicts: db/db_impl.cc db/memtable.cc	2013-12-11 11:37:28 -08:00
Haobo Xu	3c02c363b3	[RocksDB] [Performance Branch] Added dynamic bloom, to be used for memable non-existing key filtering Summary: as title Test Plan: dynamic_bloom_test Reviewers: dhruba, sdong, kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D14385	2013-12-11 00:15:14 -08:00
kailiu	c79e595471	Make Cache::GetCapacity constant Summary: This will allow us to access constant via `DB::GetOptions().table_cache.GetCapacity()` or `DB::GetOptions().block_cache.GetCapacity()` since GetOptions() is also constant method.	2013-12-10 17:34:35 -08:00
Igor Canadi	a204dabb9d	Merge pull request #31 from sepeth/c-api Rename leveldb to rocksdb in C api	2013-12-10 09:18:47 -08:00
Doğan Çeçen	6c4e110c8c	Rename leveldb to rocksdb in C api	2013-12-10 10:48:35 +02:00
Igor Canadi	fb9fce4fc3	[RocksDB] BackupableDB Summary: In this diff I present you BackupableDB v1. You can easily use it to backup your DB and it will do incremental snapshots for you. Let's first describe how you would use BackupableDB. It's inheriting StackableDB interface so you can easily construct it with your DB object -- it will add a method RollTheSnapshot() to the DB object. When you call RollTheSnapshot(), current snapshot of the DB will be stored in the backup dir. To restore, you can just call RestoreDBFromBackup() on a BackupableDB (which is a static method) and it will restore all files from the backup dir. In the next version, it will even support automatic backuping every X minutes. There are multiple things you can configure: 1. backup_env and db_env can be different, which is awesome because then you can easily backup to HDFS or wherever you feel like. 2. sync - if true, it guarantees backup consistency on machine reboot 3. number of snapshots to keep - this will keep last N snapshots around if you want, for some reason, be able to restore from an earlier snapshot. All the backuping is done in incremental fashion - if we already have 00010.sst, we will not copy it again. IMPORTANT -- This is based on assumption that 00010.sst never changes - two files named 00010.sst from the same DB will always be exactly the same. Is this true? I always copy manifest, current and log files. 4. You can decide if you want to flush the memtables before you backup, or you're fine with backing up the log files -- either way, you get a complete and consistent view of the database at a time of backup. 5. More things you can find in BackupableDBOptions Here is the directory structure I use: backup_dir/CURRENT_SNAPSHOT - just 4 bytes holding the latest snapshot 0, 1, 2, ... - files containing serialized version of each snapshot - containing a list of files files/*.sst - sst files shared between snapshots - if one snapshot references 00010.sst and another one needs to backup it from the DB, it will just reference the same file files/ 0/, 1/, 2/, ... - snapshot directories containing private snapshot files - current, manifest and log files All the files are ref counted and deleted immediatelly when they get out of scope. Some other stuff in this diff: 1. Added GetEnv() method to the DB. Discussed with @haobo and we agreed that it seems right thing to do. 2. Fixed StackableDB interface. The way it was set up before, I was not able to implement BackupableDB. Test Plan: I have a unittest, but please don't look at this yet. I just hacked it up to help me with debugging. I will write a lot of good tests and update the diff. Also, `make asan_check` Reviewers: dhruba, haobo, emayanke Reviewed By: dhruba CC: leveldb, haobo Differential Revision: https://reviews.facebook.net/D14295	2013-12-09 14:06:52 -08:00
kailiu	c7707f24c2	Refine the statistics	2013-12-06 16:51:35 -08:00
kailiu	551e9428ce	Merge branch 'master' into performance	2013-12-06 14:15:42 -08:00
Siying Dong	ef2211a9ca	[RocksDB Performance Branch] Introduce MergeContext to Lazily Initialize merge operand list Summary: In get operations, merge_operands is only used in few cases. Lazily initialize it can reduce average latency in some cases Test Plan: make all check Reviewers: haobo, kailiu, dhruba Reviewed By: haobo CC: igor, nkg-, leveldb Differential Revision: https://reviews.facebook.net/D14415	2013-12-06 10:28:59 -08:00
kailiu	90729f8b23	Extract metaindex block from block-based table Summary: This change will allow other table to reuse the code for meta blocks. Test Plan: all existing unit tests passed Reviewers: dhruba, haobo, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D14475	2013-12-05 16:34:16 -08:00
Mayank Agarwal	92e8316118	Make GetDbIdentity pure virtual and also implement it for StackableDB, DBWithTTL Summary: As title Test Plan: make clean and make Reviewers: igor Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D14469	2013-12-05 12:02:31 -08:00
Mayank Agarwal	18802689b8	Make an API to get database identity from the IDENTITY file Summary: This would enable rocksdb users to get the db identity without depending on implementation details(storing that in IDENTITY file) Test Plan: db/db_test (has identity checks) Reviewers: dhruba, haobo, igor, kailiu Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14463	2013-12-04 22:39:17 -08:00
Siying Dong	f040e536e4	[RocksDB Performance Branch] A more customized index in PlainTableReader Summary: PlainTableReader to use a more customized hash table. This patch assumes the SST file is smaller than 2GB: (1) Every bucket uses 32-bit integer (2) no key is stored in bucket (3) use the first bit of the bucket value to distinguish it points to the file offset or a second level index. This index schema fits the use case that most of prefixes have very small number of keys Test Plan: plain_table_db_test Reviewers: haobo, kailiu, dhruba Reviewed By: haobo CC: nkg-, leveldb Differential Revision: https://reviews.facebook.net/D14343	2013-12-04 13:43:45 -08:00
Sajal Jain	28a1b9b95f	[rocksdb] statistics counters for memtable hits and misses Summary: added counters rocksdb.memtable.hit - for memtable hit rocksdb.memtable.miss - for memtable miss Test Plan: db_bench tests Reviewers: igor, dhruba, haobo Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D14433	2013-12-03 12:59:53 -08:00
Igor Canadi	eb12e47e0e	Killing Transform Rep Summary: Let's get rid of TransformRep and it's children. We have confirmed that HashSkipListRep works better with multifeed, so there is no benefit to keeping this around. This diff is mostly just deleting references to obsoleted functions. I also have a diff for fbcode that we'll need to push when we switch to new release. I had to expose HashSkipListRepFactory in the client header files because db_impl.cc needs access to GetTransform() function for SanitizeOptions. Test Plan: make check Reviewers: dhruba, haobo, kailiu, sdong Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14397	2013-12-03 12:42:15 -08:00
lovro	930cb0b9ee	Clarify CompactionFilter thread safety requirements Summary: Documenting our discussion Test Plan: make Reviewers: dhruba, haobo Reviewed By: dhruba CC: igor Differential Revision: https://reviews.facebook.net/D14403	2013-12-02 16:41:43 -08:00
Dhruba Borthakur	38feca4f35	Removed redundant slice_transform.h and memtablerep.h Summary: Removed redundant slice_transform.h and memtablerep.h Test Plan: make check Reviewers: CC: Task ID: # Blame Rev:	2013-11-29 18:03:02 -08:00
Kai Liu	1966b63137	Merge branch 'master' into perf	2013-11-27 11:47:40 -08:00
Haobo Xu	4e6463ea44	[RocksDB][Performance Branch] Make height and branching factor configurable for skiplist implementation Summary: As title. Especially, HashSkipListRepFactory will be able to specify a relatively small height, to reduce the memory overhead of one skiplist per bucket. Test Plan: make check and test it on leaf4 Reviewers: dhruba, sdong, kailiu CC: reconnect.grayhat, leveldb Differential Revision: https://reviews.facebook.net/D14307	2013-11-26 21:59:36 -08:00
Igor Canadi	3ce3658411	DB::GetOptions() Summary: We need access to options for BackupableDB Test Plan: make check Reviewers: dhruba Reviewed By: dhruba CC: leveldb, reconnect.grayhat Differential Revision: https://reviews.facebook.net/D14331	2013-11-25 15:51:50 -08:00
Igor Canadi	11c26bd4a4	[RocksDB] Interface changes required for BackupableDB Summary: This is part of https://reviews.facebook.net/D14295 -- smaller diff that is easier to review Test Plan: make asan_check Reviewers: dhruba, haobo, emayanke Reviewed By: emayanke CC: leveldb, kailiu, reconnect.grayhat Differential Revision: https://reviews.facebook.net/D14301	2013-11-25 12:39:23 -08:00
Haobo Xu	5b825d6964	[RocksDB] Use raw pointer instead of shared pointer when passing Statistics object internally Summary: liveness of the statistics object is already ensured by the shared pointer in DB options. There's no reason to pass again shared pointer among internal functions. Raw pointer is sufficient and efficient. Test Plan: make check Reviewers: dhruba, MarkCallaghan, igor Reviewed By: dhruba CC: leveldb, reconnect.grayhat Differential Revision: https://reviews.facebook.net/D14289	2013-11-25 10:38:15 -08:00
Siying Dong	dfa1460d88	[For Performance Branch] Bloom filter in PlainTableIterator::Seek() - Update 1 Summary: Address @haobo's comments in D14277 Test Plan: ./indexed_table_db_test Reviewers: haobo CC: Task ID: # Blame Rev:	2013-11-21 23:33:45 -08:00
Siying Dong	718488abc5	Add BloomFilter to PlainTableIterator::Seek() Summary: This patch adds a simple bloom filter in PlainTableIterator::Seek() Test Plan: N/A Reviewers: CC: Task ID: # Blame Rev:	2013-11-21 22:26:39 -08:00
Siying Dong	3e35aa6412	Revert "Allow users to profile a query and see bottleneck of the query" This reverts commit `3d8ac31d71`.	2013-11-21 17:40:39 -08:00
Siying Dong	b135d01e7b	Allow users to profile a query and see bottleneck of the query Summary: Provide a framework to profile a query in detail to figure out latency bottleneck. Currently, in Get(), Put() and iterators, 2-3 simple timing is used. We can easily add more profile counters to the framework later. Test Plan: Enable this profiling in seveal existing tests. Reviewers: haobo, dhruba, kailiu, emayanke, vamsi, igor CC: leveldb Differential Revision: https://reviews.facebook.net/D14001 Conflicts: table/merger.cc	2013-11-21 17:39:19 -08:00
Siying Dong	3d8ac31d71	Allow users to profile a query and see bottleneck of the query Summary: Provide a framework to profile a query in detail to figure out latency bottleneck. Currently, in Get(), Put() and iterators, 2-3 simple timing is used. We can easily add more profile counters to the framework later. Test Plan: Enable this profiling in seveal existing tests. Reviewers: haobo, dhruba, kailiu, emayanke, vamsi, igor CC: leveldb Differential Revision: https://reviews.facebook.net/D14001	2013-11-21 16:29:57 -08:00
Siying Dong	58e1956d50	[Only for Performance Branch] A Hacky patch to lazily generate memtable key for prefix-hashed memtables. Summary: For prefix mem tables, encoding mem table key may be unnecessary if the prefix doesn't have any key. This patch is a little bit hacky but I want to try out the performance gain of removing this lazy initialization. In longer term, we might want to revisit the way we abstract mem tables implementations. Test Plan: make all check Reviewers: haobo, igor, kailiu Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D14265	2013-11-20 20:49:23 -08:00
Siying Dong	b59d4d5a50	A Simple Plain Table Summary: A Simple plain table format. No block structure. When creating the table reader, scanning the full table to create indexes. Test Plan:Add unit test Reviewers:haobo,dhruba,kailiu CC: Task ID: # Blame Rev:	2013-11-20 18:44:22 -08:00
kailiu	6eb5649800	Move flush_block_policy from Options to TableFactory Summary: Previously we introduce a `flush_block_policy_factory` in Options, however, that options is strongly releated to Table based tables. It will make more sense to move it to block based table's own factory class. Test Plan: make check to pass existing tests Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14211	2013-11-19 22:00:48 -08:00
kailiu	1415f8820d	Improve the "table stats" Summary: The primary motivation of the changes is to make it easier to figure out the inside of the tables. * rename "table stats" to "table properties" since now we have more than "integers" to store in the property block. * Add filter block size to the basic table properties. * Whenever a table is built, we'll log the table properties (the sample output is in Test Plan). * Make an api to expose deleted keys. Test Plan: Passed all existing test. and the sample output of table stats: ================================================================== Basic Properties ------------------------------------------------------------------ # data blocks: 1 # entries: 1 raw key size: 9 raw average key size: 9 raw value size: 9 raw average value size: 0 data block size: 25 index block size: 27 filter block size: 18 (estimated) table size: 70 filter policy: rocksdb.BuiltinBloomFilter ================================================================== User collected properties: InternalKeyPropertiesCollector ------------------------------------------------------------------ kDeletedKeys: 1 ================================================================== Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14187	2013-11-19 16:29:42 -08:00
Dhruba Borthakur	31295b0a1b	Add License message to public header files. Summary: Add License message to public header files. Test Plan: Reviewers: CC: Task ID: # Blame Rev:	2013-11-18 10:21:35 -08:00
kailiu	97d8e573a6	make util/env_posix.cc work under mac Summary: This diff invoves some more complicated issues in the posix environment. Test Plan: works under mac os. will need to verify dev box. Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14061	2013-11-16 23:44:39 -08:00
Igor Canadi	a0ce3fd00a	PurgeObsoleteFiles() unittest Summary: Created a unittest that verifies that automatic deletion performed by PurgeObsoleteFiles() works correctly. Also, few small fixes on the logic part -- call version_set_->GetObsoleteFiles() in FindObsoleteFiles() instead of on some arbitrary positions. Test Plan: Created a unit test Reviewers: dhruba, haobo, nkg- Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D14079	2013-11-14 18:03:57 -08:00
Kai Liu	88ba331c1a	Add the index/filter block cache Summary: This diff leverage the existing block cache and extend it to cache index/filter block. Test Plan: Added new tests in db_test and table_test The correctness is checked by: 1. make check 2. make valgrind_check Performance is test by: 1. 10 times of build_tools/regression_build_test.sh on two versions of rocksdb before/after the code change. Test results suggests no significant difference between them. For the two key operatons `overwrite` and `readrandom`, the average iops are both 20k and ~260k, with very small variance). 2. db_stress. Reviewers: dhruba Reviewed By: dhruba CC: leveldb, haobo, xjin Differential Revision: https://reviews.facebook.net/D13167	2013-11-12 22:46:51 -08:00
kailiu	21587760b9	Fixing the warning messages captured under mac os # Consider using `git commit -m 'One line title' && arc diff`. # You will save time by running lint and unit in the background. Summary: The work to make sure mac os compiles rocksdb is not completed yet. But at least we can start cleaning some warnings captured only by g++ from mac os.. Test Plan: ran make in mac os Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14049	2013-11-12 20:05:28 -08:00
Igor Canadi	9bc4a26f56	Small changes in Deleting obsolete files Summary: @haobo's suggestions from https://reviews.facebook.net/D13827 Renaming some variables, deprecating purge_log_after_flush, changing for loop into auto for loop. I have not implemented deleting objects outside of mutex yet because it would require a big code change - we would delete object in db_impl, which currently does not know anything about object because it's defined in version_edit.h (FileMetaData). We should do it at some point, though. Test Plan: Ran deletefile_test Reviewers: haobo Reviewed By: haobo CC: leveldb, haobo Differential Revision: https://reviews.facebook.net/D14025	2013-11-12 11:53:26 -08:00
Igor Canadi	65e45f0c4a	Update documentation Summary: Changed leveldb documentation with rocksdb in doc/index.html. Added some of the important options from options.h to doc. Also removed benchmark files and impl.h, since this is all replaced by RocksDB wikis. Test Plan: - Reviewers: dhruba, haobo, kailiu, emayanke, sdong Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13977	2013-11-11 21:02:38 -08:00
Kai Liu	e7c4d823c9	Fix two bugs that caused 3rd party release failure Summary: * Fix the link to gflags. * Fix a warning for the uninitialized data member.	2013-11-10 15:36:30 -08:00
lovro	8a46ecd357	WriteBatch::Put() overload that gathers key and value from arrays of slices Summary: In our project, when writing to the database, we want to form the value as the concatenation of a small header and a larger payload. It's a shame to have to copy the payload just so we can give RocksDB API a linear view of the value. Since RocksDB makes a copy internally, it's easy to support gather writes. Test Plan: write_batch_test, new test case Reviewers: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13947	2013-11-08 16:34:32 -08:00
Igor Canadi	1510339e52	Speed up FindObsoleteFiles Summary: Here's one solution we discussed on speeding up FindObsoleteFiles. Keep a set of all files in DBImpl and update the set every time we create a file. I probably missed few other spots where we create a file. It might speed things up a bit, but makes code uglier. I don't really like it. Much better approach would be to abstract all file handling to a separate class. Think of it as layer between DBImpl and Env. Having a separate class deal with file namings and deletion would benefit both code cleanliness (especially with huge DBImpl) and speed things up. It will take a huge effort to do this, though. Let's discuss offline today. Test Plan: Ran ./db_stress, verified that files are getting deleted Reviewers: dhruba, haobo, kailiu, emayanke Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D13827	2013-11-08 15:23:46 -08:00
Igor Canadi	8b3379dc0a	Implementing DynamicIterator for TransformRepNoLock Summary: What @haobo done with TransformRep, now in TransformRepNoLock. Similar implementation, except that I made DynamicIterator a subclass of Iterator which makes me have less iterator initializations. Test Plan: ./prefix_test. Seeing huge savings vs. TransformRep again! Reviewers: dhruba, haobo, sdong, kailiu Reviewed By: haobo CC: leveldb, haobo Differential Revision: https://reviews.facebook.net/D13953	2013-11-08 00:31:09 -08:00
Kai Liu	fd075d6edd	Provide mechanism to configure when to flush the block Summary: Allow block based table to configure the way flushing the blocks. This feature will allow us to add support for prefix-aligned block. Test Plan: make check Reviewers: dhruba, haobo, sdong, igor Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D13875	2013-11-07 21:27:21 -08:00
Igor Canadi	444cf88a56	Flush the log outside of lock Summary: Added a new call LogFlush() that flushes the log contents to the OS buffers. We never call it with lock held. We call it once for every Read/Write and often in compaction/flush process so the frequency should not be a problem. Test Plan: db_test Reviewers: dhruba, haobo, kailiu, emayanke Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13935	2013-11-07 11:31:56 -08:00
Haobo Xu	fd2044883a	[RocksDB] Generalize prefix-aware iterator to be used for more than one Seek Summary: Added a prefix_seek flag in ReadOptions to indicate that Seek is prefix aware(might not return data with different prefix), and also not bound to a specific prefix. Multiple Seeks and range scans can be invoked on the same iterator. If a specific prefix is specified, this flag will be ignored. Just a quick prototype that works for PrefixHashRep, the new lockless memtable could be easily extended with this support too. Test Plan: test it on Leaf Reviewers: dhruba, kailiu, sdong, igor Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D13929	2013-11-06 20:45:49 -08:00
shamdor	c2be2cba04	WAL log retention policy based on archive size. Summary: Archive cleaning will still happen every WAL_ttl seconds but archived logs will be deleted only if archive size is greater then a WAL_size_limit value. Empty archived logs will be deleted evety WAL_ttl. Test Plan: 1. Unit tests pass. 2. Benchmark. Reviewers: emayanke, dhruba, haobo, sdong, kailiu, igor Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D13869	2013-11-06 18:46:28 -08:00
Igor Canadi	be96f2498e	TransformRep - use array instead of unordered_map Summary: I'm sending this diff together with https://reviews.facebook.net/D13881 because it didn't allow me to send only the array one. Here I also replaced unordered_map with just an array of shared_ptrs. This elminated all the locks. I will run the new benchmark and post the results here. Test Plan: db_test Reviewers: dhruba, haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D13893	2013-11-06 11:55:43 -08:00
Mayank Agarwal	f837f5b1c9	Making the transaction log iterator more robust Summary: strict essentially means that we MUST find the startsequence. Thus we should return if starteSequence is not found in the first file in case strict is set. This will take care of ending the iterator in case of permanent gaps due to corruptions in the log files Also created NextImpl function that will have internal variable to distinguish whether Next is being called from StartSequence or by application. Set NotFoudn::gaps status to give an indication of gaps happeneing. Polished the inline documentation at various places Test Plan: * db_repl_stress test * db_test relating to transaction log iterator * fbcode/wormhole/rocksdb/rocks_log_iterator * sigma production machine sigmafio032.prn1 Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13689	2013-11-04 20:49:03 -08:00
Dhruba Borthakur	b4ad5e89ae	Implement a compressed block cache. Summary: Rocksdb can now support a uncompressed block cache, or a compressed block cache or both. Lookups first look for a block in the uncompressed cache, if it is not found only then it is looked up in the compressed cache. If it is found in the compressed cache, then it is uncompressed and inserted into the uncompressed cache. It is possible that the same block resides in the compressed cache as well as the uncompressed cache at the same time. Both caches have their own individual LRU policy. Test Plan: Unit test case attached. Reviewers: kailiu, sdong, haobo, leveldb Reviewed By: haobo CC: xjin, haobo Differential Revision: https://reviews.facebook.net/D12675	2013-11-01 14:31:35 -07:00
Haobo Xu	8cbe5bb56b	[RocksDB] Add OnCompactionStart to CompactionFilter class Summary: This is to give application compaction filter a chance to access context information of a specific compaction run. For example, depending on whether a compaction goes through all data files, the application could do things differently. Test Plan: make check Reviewers: dhruba, kailiu, sdong Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13683	2013-10-31 13:36:43 -07:00
Naman Gupta	b4fab3be2a	Merge branch 'master' of github.com:facebook/rocksdb into inplace	2013-10-31 11:51:03 -07:00
Naman Gupta	fe25070242	In-place updates for equal keys and similar sized values Summary: Currently for each put, a fresh memory is allocated, and a new entry is added to the memtable with a new sequence number irrespective of whether the key already exists in the memtable. This diff is an attempt to update the value inplace for existing keys. It currently handles a very simple case: 1. Key already exists in the current memtable. Does not inplace update values in immutable memtable or snapshot 2. Latest value type is a 'put' ie kTypeValue 3. New value size is less than existing value, to avoid reallocating memory TODO: For a put of an existing key, deallocate memory take by values, for other value types till a kTypeValue is found, ie. remove kTypeMerge. TODO: Update the transaction log, to allow consistent reload of the memtable. Test Plan: Added a unit test verifying the inplace update. But some other unit tests broken due to invalid sequence number checks. WIll fix them next. Reviewers: xinyaohu, sumeet, haobo, dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D12423 Automatic commit by arc	2013-10-31 11:27:12 -07:00
Siying Dong	f03b2df010	Follow-up Cleaning-up After D13521 Summary: This patch is to address @haobo's comments on D13521: 1. rename Table to be TableReader and make its factory function to be GetTableReader 2. move the compression type selection logic out of TableBuilder but to compaction logic 3. more accurate comments 4. Move stat name constants into BlockBasedTable implementation. 5. remove some uncleaned codes in simple_table_db_test Test Plan: pass test suites. Reviewers: haobo, dhruba, kailiu Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D13785	2013-10-30 10:52:33 -07:00
Siying Dong	d4eec30ed0	Make "Table" pluggable Summary: This patch makes Table and TableBuilder a abstract class and make all the implementation of the current table into BlockedBasedTable and BlockedBasedTable Builder. Test Plan: Make db_test.cc to work with block based table. Add a new test simple_table_db_test.cc where a different simple table format is implemented. Reviewers: dhruba, haobo, kailiu, emayanke, vamsi Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13521	2013-10-28 17:54:09 -07:00
Kai Liu	994575c134	Support user-defined table stats collector Summary: 1. Added a new option that support user-defined table stats collection. 2. Added a deleted key stats collector in `utilities` Test Plan: Added a unit test for newly added code. Also ran make check to make sure other tests are not broken. Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13491	2013-10-28 15:45:14 -07:00
Igor Canadi	100fa8e013	If a Put fails, fail all other puts Summary: When a Put fails, it can leave database in a messy state. We don't want to pretend that everything is OK when it may not be. We fail every write following the failed one. I added checks for corruption to DBImpl::Write(). Is there anywhere else I need to add them? Test Plan: Corruption unit test. Reviewers: dhruba, haobo, kailiu Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13671	2013-10-28 12:36:02 -07:00
Mayank Agarwal	56305221c4	Unify DeleteFile and DeleteWalFiles Summary: This is to simplify rocksdb public APIs and improve the code quality. Created an additional parameter to ParseFileName for log sub type and improved the code for deleting a wal file. Wrote exhaustive unit-tests in delete_file_test Unification of other redundant APIs can be taken up in a separate diff Test Plan: Expanded delete_file test Reviewers: dhruba, haobo, kailiu, sdong Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13647	2013-10-25 08:32:14 -07:00
Haobo Xu	2fb361ad98	[RocksDB] Add perf_context.wal_write_time to track time spent on writing the recovery log. Summary: as title Test Plan: make check; ./perf_context_test Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13629	2013-10-23 13:38:39 -07:00
Mayank Agarwal	9b50106f9a	Dbid feature Summary: Create a new type of file on startup if it doesn't already exist called DBID. This will store a unique number generated from boost library's uuid header file. The use-case is to identify the case of a db losing all its data and coming back up either empty or from an image(backup/live replica's recovery) the key point to note is that DBID is not stored in a backup or db snapshot It's preferable to use Boost for uuid because: 1) A non-standard way of generating uuid is not good 2) /proc/sys/kernel/random/uuid generates a uuid but only on linux environments and the solution would not be clean 3) c++ doesn't have any direct way to get a uuid 4) Boost is a very good library that was already having linkage in rocksdb from third-party Note: I had to update the TOOLCHAIN_REV in build files to get latest verison of boost from third-party as the older version had a bug. I had to put Wno-uninitialized in Makefile because boost-1.51 has an unitialized variable and rocksdb would not comiple otherwise. Latet open-source for boost is 1.54 but is not there in third-party. I have notified the concerned people in fbcode about it. @kailiu : While releasing to third-party, an additional dependency will need to be created for boost in TARGETS file. I can help identify. Test Plan: Expand db_test to test 2 cases 1) Restarting db with Id file present - verify that no change to Id 2)Restarting db with Id file deleted - verify that a different Id is there after reopen Also run make all check Reviewers: dhruba, haobo, kailiu, sdong Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13587	2013-10-22 12:23:34 -07:00
Siying Dong	9edda37027	Universal Compaction to Have a Size Percentage Threshold To Decide Whether to Compress Summary: This patch adds a option for universal compaction to allow us to only compress output files if the files compacted previously did not yet reach a specified ratio, to save CPU costs in some cases. Compression is always skipped for flushing. This is because the size information is not easy to evaluate for flushing case. We can improve it later. Test Plan: add test DBTest.UniversalCompactionCompressRatio1 and DBTest.UniversalCompactionCompressRatio12 Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13467	2013-10-17 13:33:39 -07:00
Kai Liu	aac44226a0	Add bloom filter to predefined table stats Summary: As title. Test Plan: Updated the unit tests to make sure new statistic is correctly written/read. Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D13497	2013-10-17 11:43:06 -07:00
Mayank Agarwal	fe3713961e	Features in Transaction log iterator Summary: * Logstore requests a valid change of reutrning an empty iterator and not an error in case of no log files. * Changed the code to return the writebatch containing the sequence number requested from GetupdatesSince even if it lies in the middle. Earlier we used to return the next writebatch,. This also allows me oto guarantee that no files played upon by the iterator are redundant. I mean the starting log file has at least a sequence number >= the sequence number requested form GetupdatesSince. * Cleaned up redundant logic in Iterator::Next and made a new function SeekToStartSequence for greater readability and maintainibilty. * Modified a test in db_test accordingly Please check the logic carefully and suggest improvements. I have a separate patch out for more improvements like restricting reader to read till written sequences. Test Plan: * transaction log iterator tests in db_test, * db_repl_stress. * rocks_log_iterator_test in fbcode/wormhole/rocksdb/test - 2 tests thriving on hacks till now can get simplified * testing on the shadow setup for sigma with replication Reviewers: dhruba, haobo, kailiu, sdong Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13437	2013-10-14 18:16:21 -07:00
Kai Liu	86ef6c3f74	Add statistics to sst file Summary: So far we only have key/value pairs as well as bloom filter stored in the sst file. It will be great if we are able to store more metadata about this table itself, for example, the entry size, bloom filter name, etc. This diff is the first step of this effort. It allows table to keep the basic statistics mentioned in http://fburl.com/14995441, as well as allowing writing user-collected stats to stats block. After this diff, we will figure out the interface of how to allow user to collect their interested statistics. Test Plan: 1. Added several unit tests. 2. Ran `make check` to ensure it doesn't break other tests. Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D13419	2013-10-14 15:56:13 -07:00
sdong	f8509653ba	LRUCache to try to clean entries not referenced first. Summary: With this patch, when LRUCache.Insert() is called and the cache is full, it will first try to free up entries whose reference counter is 1 (would become 0 after remo\ ving from the cache). We do it in two passes, in the first pass, we only try to release those unreferenced entries. If we cannot free enough space after traversing t\ he first remove_scan_cnt_ entries, we start from the beginning again and remove those entries being used. Test Plan: add two unit tests to cover the codes Reviewers: dhruba, haobo, emayanke Reviewed By: emayanke CC: leveldb, emayanke, xjin Differential Revision: https://reviews.facebook.net/D13377	2013-10-11 09:26:21 -07:00
Igor Canadi	d0beadd456	Env class that can randomly read and write Summary: I have implemented basic simple use case that I need for External Value Store I'm working on. There is a potential for making this prettier by refactoring/combining WritableFile and RandomAccessFile, avoiding some copypasta. However, I decided to implement just the basic functionality, so I can continue working on the other diff. Test Plan: Added a unittest Reviewers: dhruba, haobo, kailiu Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D13365	2013-10-10 00:03:08 -07:00
Dhruba Borthakur	3c37955a2f	Remove obsolete namespace mappings. Summary: The previous release 2.4 had a mapping to alias the older namespace to rocksdb. This mapping is not needed in the new release. Test Plan: make check make release Reviewers: emayanke Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D13359	2013-10-08 22:53:38 -07:00
Naman Gupta	cbf4a06427	Add option for storing transaction logs in a separate dir Summary: In some cases, you might not want to store the data log (write ahead log) files in the same dir as the sst files. An example use case is leaf, which stores sst files in tmpfs. And would like to save the log files in a separate dir (disk) to save memory. Test Plan: make all. Ran db_test test. A few test failing. P2785018. If you guys don't see an obvious problem with the code, maybe somebody from the rocksdb team could help me debug the issue here. Running this on leaf worked well. I could see logs stored on disk, and deleted appropriately after compactions. Obviously this is only one set of options. The unit tests cover different options. Seems like I'm missing some edge cases. Reviewers: dhruba, haobo, leveldb CC: xinyaohu, sumeet Differential Revision: https://reviews.facebook.net/D13239	2013-10-08 17:40:27 -07:00
Dhruba Borthakur	4463b11cad	Migrate names of properties from 'leveldb' prefix to 'rocksdb' prefix. Summary: Migrate names of properties from 'leveldb' prefix to 'rocksdb' prefix. Test Plan: make check Reviewers: emayanke, haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D13311	2013-10-06 00:14:26 -07:00
Dhruba Borthakur	a143ef9b38	Change namespace from leveldb to rocksdb Summary: Change namespace from leveldb to rocksdb. This allows a single application to link in open-source leveldb code as well as rocksdb code into the same process. Test Plan: compile rocksdb Reviewers: emayanke Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D13287	2013-10-04 11:59:26 -07:00
Mayank Agarwal	b3ed08129b	Add a statistic to count the number of calls to GetUpdatesSince Summary: This is useful to keep track of refreshes in transaction log iterator Test Plan: make; db_stress --statistics=1 shows it Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13281	2013-10-04 10:47:20 -07:00
Mayank Agarwal	854d236361	Add backward compatible option in GetLiveFiles to choose whether to not Flush first Summary: As explained in comments in GetLiveFiles in db.h, this option will cause flush to be skipped in GetLiveFiles because some use-cases use GetSortedWalFiles after GetLiveFiles to generate more complete snapshots. Using GetSortedWalFiles after GetLiveFiles allows us to not Flush in GetLiveFiles first because wals have everything. Note: file deletions will be disabled before calling GLF or GSWF so live logs will not move to archive logs or get delted. Note: Manifest file is truncated to a proper value in GLF, so it will always reply from the proper wal files on a restart Test Plan: make Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13257	2013-10-04 10:20:10 -07:00

1 2 3 4

174 Commits