rocksdb

Author	SHA1	Message	Date
Haobo Xu	ecd8db0200	[RocksDB] Minimize Mutex protected code section in the critical path Summary: rocksdb uses a single global lock to protect in memory metadata. We should minimize the mutex protected code section to increase the effective parallelism of the program. See https://our.intern.facebook.com/intern/tasks/?t=2218928 Test Plan: make check db_bench Reviewers: dhruba, heyongqiang CC: zshao, leveldb Differential Revision: https://reviews.facebook.net/D9705	2013-03-26 22:42:26 -07:00
Abhishek Kona	9b70529c86	Disable Unit Test for TransactionLogIteratorStall Summary: The unit test fails as our solution does not work with MMap'd files. Disable the failing unit test. Put it back with the next diff which should fix the problem. Test Plan: db_test Reviewers: heyongqiang CC: dhruba Differential Revision: https://reviews.facebook.net/D9645	2013-03-21 15:51:18 -07:00
Abhishek Kona	27c15fb67e	TransactionLogIter should stall at the last record. Currently it errors out Summary: * Add a method to check if the log reader is at EOF. * If we know a record has been flushed force the log_reader to believe it is not at EOF, using a new method UnMarkEof(). This does not work with MMap'd files. Test Plan: added a unit test. Reviewers: dhruba, heyongqiang Reviewed By: heyongqiang CC: leveldb Differential Revision: https://reviews.facebook.net/D9567	2013-03-21 15:12:35 -07:00
Dhruba Borthakur	d0798f67f4	Run compactions even if workload is readonly or read-mostly. Summary: The events that trigger compaction: * opening the database * Get -> only if seek compaction is not disabled and other checks are true * MakeRoomForWrite -> when memtable is full * BackgroundCall -> If the background thread is about to do a compaction run, it schedules a new background task to trigger a possible compaction. This will cause additional background threads to find and process other compactions that can run concurrently. Test Plan: ran db_bench with overwrite and readonly alternatively. Reviewers: sheki, MarkCallaghan Reviewed By: sheki CC: leveldb Differential Revision: https://reviews.facebook.net/D9579	2013-03-20 23:43:29 -07:00
Dhruba Borthakur	ad96563b79	Ability to configure bufferedio-reads, filesystem-readaheads and mmap-read-write per database. Summary: This patch allows an application to specify whether to use bufferedio, reads-via-mmaps and writes-via-mmaps per database. Earlier, there was a global static variable that was used to configure this functionality. The default setting remains the same (and is backward compatible): 1. use bufferedio 2. do not use mmaps for reads 3. use mmap for writes 4. use readaheads for reads needed for compaction I also added a parameter to db_bench to be able to explicitly specify whether to do readaheads for compactions or not. Test Plan: make check Reviewers: sheki, heyongqiang, MarkCallaghan Reviewed By: sheki CC: leveldb Differential Revision: https://reviews.facebook.net/D9429	2013-03-20 23:14:03 -07:00
Mayank Agarwal	b1bea58457	Fix more signed-unsigned comparisons Summary: Fix the remaining comparisons in log_test.cc and db_test.cc that make complained about. Test Plan: make Reviewers: dhruba, sheki Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D9537	2013-03-19 17:21:36 -07:00
Mayank Agarwal	487168cdcf	Fixed sign-comparison in rocksdb code-base and fixed Makefile Summary: Makefile had options to ignore sign-comparisons and unused-parameters, which should be there. Also fixed the specific errors in the code-base Test Plan: make Reviewers: chip, dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D9531	2013-03-19 14:35:23 -07:00
Mark Callaghan	72d14eafd3	add --benchmarks=levelstats option to db_bench, prevent "nan" in stats output Summary: Add --benchmarks=levelstats option to report per-level stats (#files, #bytes) Change readwhilewriting test to report response time for writes but exclude them from the stats merged by all threads. Prevent "NaN" in stats output by preventing division by 0. Remove "o" file I committed by mistake. Test Plan: make check Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D9513	2013-03-19 13:14:44 -07:00
Abhishek Kona	02c459805b	Ignore a zero-sized file while looking for a seq-no in GetUpdatesSince Summary: Rocksdb can create 0 sized log files when it is opened and closed without any operations. GetUpdatesSince currently fails if there is a log file of size zero. This diff fixes this. If a log file has size 0, it is removed from the probable_file_list Test Plan: unit test Reviewers: dhruba, heyongqiang Reviewed By: heyongqiang CC: leveldb Differential Revision: https://reviews.facebook.net/D9507	2013-03-19 11:00:09 -07:00
Abhishek Kona	7b9db9c98e	Do not report level size as zero when there are no files in L0 Summary: Instead of checking the number of files in L0, check the number of files in the requested level. Bug introduced in D4929 (diff trying to do too many things). Test Plan: db_test. Reviewers: dhruba, MarkCallaghan Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D9483	2013-03-18 12:04:38 -07:00
Mark Callaghan	5a8c8845a9	Enhance db_bench Summary: Add --benchmarks=updaterandom for read-modify-write workloads. This is different from --benchmarks=readrandomwriterandom in a few ways. First, an "operation" is the combined time to do the read & write rather than treating them as two ops. Second, the same key is used for the read & write. Change RandomGenerator to support rows larger than 1M. That was using "assert" to fail and assert is compiled-away when -DNDEBUG is used. Add more options to db_bench --duration - sets the number of seconds for tests to run. When not set the operation count continues to be the limit. This is used by random operation tests. --use_snapshot - when set GetSnapshot() is called prior to each random read. This is to measure the overhead from using snapshots. --get_approx - when set GetApproximateSizes() is called prior to each random read. This is to measure the overhead for a query optimizer. Test Plan: run db_bench Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D9267	2013-03-14 16:00:23 -07:00
Mayank Agarwal	5b278b53ae	Fix valgrind errors in rocksdb tests: auto_roll_logger_test, reduce_levels_test Summary: Fix for memory leaks in rocksdb tests. Also modified the variable NUM_FAILED_TESTS to print the actual number of failed tests. Test Plan: make <test>; valgrind --leak-check=full ./<test> Reviewers: sheki, dhruba Reviewed By: sheki CC: leveldb Differential Revision: https://reviews.facebook.net/D9333	2013-03-12 16:03:16 -07:00
Dhruba Borthakur	ebf16f57c9	Prevent segfault because SizeUnderCompaction was called without any locks. Summary: SizeBeingCompacted was called without any lock protection. This causes crashes, especially when running db_bench with value_size=128K. The fix is to compute SizeUnderCompaction while holding the mutex and passing in these values into the call to Finalize. (gdb) where #4 leveldb::VersionSet::SizeBeingCompacted (this=this@entry=0x7f0b490931c0, level=level@entry=4) at db/version_set.cc:1827 #5 0x000000000043a3c8 in leveldb::VersionSet::Finalize (this=this@entry=0x7f0b490931c0, v=v@entry=0x7f0b3b86b480) at db/version_set.cc:1420 #6 0x00000000004418d1 in leveldb::VersionSet::LogAndApply (this=0x7f0b490931c0, edit=0x7f0b3dc8c200, mu=0x7f0b490835b0, new_descriptor_log=<optimized out>) at db/version_set.cc:1016 #7 0x00000000004222b2 in leveldb::DBImpl::InstallCompactionResults (this=this@entry=0x7f0b49083400, compact=compact@entry=0x7f0b2b8330f0) at db/db_impl.cc:1473 #8 0x0000000000426027 in leveldb::DBImpl::DoCompactionWork (this=this@entry=0x7f0b49083400, compact=compact@entry=0x7f0b2b8330f0) at db/db_impl.cc:1757 #9 0x0000000000426690 in leveldb::DBImpl::BackgroundCompaction (this=this@entry=0x7f0b49083400, madeProgress=madeProgress@entry=0x7f0b41bf2d1e, deletion_state=...) at db/db_impl.cc:1268 #10 0x0000000000428f42 in leveldb::DBImpl::BackgroundCall (this=0x7f0b49083400) at db/db_impl.cc:1170 #11 0x000000000045348e in BGThread (this=0x7f0b49023100) at util/env_posix.cc:941 #12 leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper (arg=0x7f0b49023100) at util/env_posix.cc:874 #13 0x00007f0b4a7cf10d in start_thread (arg=0x7f0b41bf3700) at pthread_create.c:301 #14 0x00007f0b49b4b11d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 Test Plan: make check I am running db_bench with a value size of 128K to see if the segfault is fixed. Reviewers: MarkCallaghan, sheki, emayanke Reviewed By: sheki CC: leveldb Differential Revision: https://reviews.facebook.net/D9279	2013-03-11 14:09:01 -07:00
Dhruba Borthakur	6d812b6afb	A mechanism to detect manifest file write errors and put db in readonly mode. Summary: If there is an error while writing an edit to the manifest file, the manifest file is closed and reopened to check if the edit made it in. However, if the re-opening of the manifest is unsuccessful and options.paranoid_checks is set to true, then the db refuses to accept new puts, effectively putting the db in readonly mode. In a future diff, I would like to change the default value of paranoid_checks to true. Test Plan: make check Reviewers: sheki Reviewed By: sheki CC: leveldb Differential Revision: https://reviews.facebook.net/D9201	2013-03-07 09:45:49 -08:00
Abhishek Kona	d68880a1b9	Do not allow Transaction Log Iterator to fall ahead when writer is writing the same file Summary: Store the last flushed seq no. in db_impl. Check against it in the transaction log iterator. Do not attempt to read ahead if we do not know if the data is flushed completely. Does not work if flush is disabled. Any ideas on fixing that? * Minor change: iter->Next is called automatically the first time. Test Plan: existing tests pass. More ideas on testing this? Planning to run some stress test. Reviewers: dhruba, heyongqiang CC: leveldb Differential Revision: https://reviews.facebook.net/D9087	2013-03-06 14:05:53 -08:00
Dhruba Borthakur	afed60938f	Fix db_stress crash by copying keys before changing sequencenum to zero. Summary: The compaction process zeros out sequence numbers if the output is part of the bottommost level. The Slice is supposed to refer to an immutable data buffer. The merger that implements the priority queue while reading kvs as the input of a compaction run relies on this fact. The bug was that we were updating the sequence number of a record in-place and that was causing succeeding invocations of the merger to return kvs in arbitrary order of sequence numbers. The fix is to copy the key to a local memory buffer before setting its seqno to 0. Test Plan: Set Options.purge_redundant_kvs_while_flush = false and then run db_stress --ops_per_thread=1000 --max_key=320 Reviewers: emayanke, sheki Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D9147	2013-03-06 10:52:08 -08:00
Dhruba Borthakur	f5896681b4	Removed unnecessary file object in table_cache. Summary: TableCache->file is not used. Remove it. I kept the TableAndFile structure and will clean it up in a future patch. Test Plan: make clean check Reviewers: sheki, chip Reviewed By: chip CC: leveldb Differential Revision: https://reviews.facebook.net/D9075	2013-03-04 13:56:23 -08:00
Mark Callaghan	993543d1be	Add rate_delay_limit_milliseconds Summary: This adds the rate_delay_limit_milliseconds option to make the delay configurable in MakeRoomForWrite when the max compaction score is too high. This delay is called the Ln slowdown. This change also counts the Ln slowdown per level to make it possible to see where the stalls occur. From IO-bound performance testing, the Level N stalls occur: * with compression -> at the largest uncompressed level. This makes sense because compaction for compressed levels is much slower. When Lx is uncompressed and Lx+1 is compressed then files pile up at Lx because the (Lx,Lx+1)->Lx+1 compaction process is the first to be slowed by compression. * without compression -> at level 1 Task ID: #1832108 Test Plan: run with real data, added test Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D9045	2013-03-04 07:41:15 -08:00
Dhruba Borthakur	806e264350	Ability for rocksdb to compact when flushing the in-memory memtable to a file in L0. Summary: Rocks accumulates recent writes and deletes in the in-memory memtable. When the memtable is full, it writes the contents of the memtable to a file in L0. This patch removes redundant records at the time of the flush. If there are multiple versions of the same key in the memtable, then only the most recent one is dumped into the output file. The purging of redundant records occurs only if the most recent snapshot is earlier than the earliest record in the memtable. Should we switch on this feature by default or should we keep this feature turned off in the default settings? Test Plan: Added test case to db_test.cc Reviewers: sheki, vamsi, emayanke, heyongqiang Reviewed By: sheki CC: leveldb Differential Revision: https://reviews.facebook.net/D8991	2013-03-04 00:01:47 -08:00
bil	4992633751	enable the ability to set key size in db_bench in rocksdb Summary: 1. the default value for key size is still 16 2. enable the ability to set the key size via command line --key_size= Test Plan: build & run db_bench and pass some value via command line. verify it works correctly. Reviewers: sheki Reviewed By: sheki CC: leveldb Differential Revision: https://reviews.facebook.net/D8943	2013-03-01 14:10:09 -08:00
Abhishek Kona	c41f1e995c	Codemod NULL to nullptr Summary: scripted NULL to nullptr in * include/leveldb/ * db/ * table/ * util/ Test Plan: make all check Reviewers: dhruba, emayanke Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D9003	2013-02-28 18:04:58 -08:00
Dhruba Borthakur	e45c7a8444	Ability to support up to a million .sst files in the database Summary: There was an artificial limit of 50K files per database. This is insufficient if the database is 1 TB in size and each file is 2 MB. Test Plan: make check Reviewers: sheki, emayanke Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D8919	2013-02-26 16:27:51 -08:00
Abhishek Kona	a9866b721b	Refactor statistics. Remove individual functions like incNumFileOpens Summary: Use only the counter mechanism. Do away with incNumFileOpens, incNumFileClose, incNumFileErrors. Also s/NULL/nullptr/g in db/table_cache.cc Test Plan: make clean check Reviewers: dhruba, heyongqiang, emayanke Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D8841	2013-02-25 13:58:34 -08:00
Abhishek Kona	959337ed5b	Measure compaction time. Summary: just record time consumed in compaction Test Plan: compile Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D8781	2013-02-22 11:38:40 -08:00
Abhishek Kona	ec77366e14	Counters for bytes written and read. Summary: * Counters for bytes read and written. As a part of this diff, I also want to measure compaction times. @dhruba, can you point out which function I should time to get compaction times? I was looking at CompactRange. Test Plan: db_test Reviewers: dhruba, emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D8763	2013-02-21 16:06:32 -08:00
Vamsi Ponnekanti	6abb30d4d0	[Missed adding cmdline parsing for new flags added in D8685] Summary: I had added FLAGS_numdistinct and FLAGS_deletepercent for randomwithverify but forgot to add cmdline parsing for those flags. Test Plan: [nponnekanti@dev902 /data/users/nponnekanti/rocksdb] ./db_bench --benchmarks=randomwithverify --numdistinct=500 LevelDB: version 1.5 Date: Thu Feb 21 10:34:40 2013 CPU: 24 * Intel(R) Xeon(R) CPU X5650 @ 2.67GHz CPUCache: 12288 KB Keys: 16 bytes each Values: 100 bytes each (50 bytes after compression) Entries: 1000000 RawSize: 110.6 MB (estimated) FileSize: 62.9 MB (estimated) Compression: snappy WARNING: Assertions are enabled; benchmarks unnecessarily slow ------------------------------------------------ Created bg thread 0x7fbf90bff700 randomwithverify : 4.693 micros/op 213098 ops/sec; ( get:900000 put:80000 del:20000 total:1000000 found:714556) [nponnekanti@dev902 /data/users/nponnekanti/rocksdb] ./db_bench --benchmarks=randomwithverify --deletepercent=5 LevelDB: version 1.5 Date: Thu Feb 21 10:35:03 2013 CPU: 24 * Intel(R) Xeon(R) CPU X5650 @ 2.67GHz CPUCache: 12288 KB Keys: 16 bytes each Values: 100 bytes each (50 bytes after compression) Entries: 1000000 RawSize: 110.6 MB (estimated) FileSize: 62.9 MB (estimated) Compression: snappy WARNING: Assertions are enabled; benchmarks unnecessarily slow ------------------------------------------------ Created bg thread 0x7fe14dfff700 randomwithverify : 4.883 micros/op 204798 ops/sec; ( get:900000 put:50000 del:50000 total:1000000 found:443847) [nponnekanti@dev902 /data/users/nponnekanti/rocksdb] [nponnekanti@dev902 /data/users/nponnekanti/rocksdb] ./db_bench --benchmarks=randomwithverify --deletepercent=5 --numdistinct=500 LevelDB: version 1.5 Date: Thu Feb 21 10:36:18 2013 CPU: 24 * Intel(R) Xeon(R) CPU X5650 @ 2.67GHz CPUCache: 12288 KB Keys: 16 bytes each Values: 100 bytes each (50 bytes after compression) Entries: 1000000 RawSize: 110.6 MB (estimated) FileSize: 62.9 MB (estimated) Compression: snappy WARNING: Assertions are enabled; benchmarks unnecessarily slow ------------------------------------------------ Created bg thread 0x7fc31c7ff700 randomwithverify : 4.920 micros/op 203233 ops/sec; ( get:900000 put:50000 del:50000 total:1000000 found:445522) Revert Plan: OK Task ID: # Reviewers: dhruba, emayanke Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D8769	2013-02-21 12:26:32 -08:00
Vamsi Ponnekanti	945d2b59b9	[Add randomwithverify benchmark option] Summary: Added RandomWithVerify benchmark option. Test Plan: This whole diff is to test. [nponnekanti@dev902 /data/users/nponnekanti/rocksdb] ./db_bench --benchmarks=randomwithverify LevelDB: version 1.5 Date: Tue Feb 19 17:50:28 2013 CPU: 24 * Intel(R) Xeon(R) CPU X5650 @ 2.67GHz CPUCache: 12288 KB Keys: 16 bytes each Values: 100 bytes each (50 bytes after compression) Entries: 1000000 RawSize: 110.6 MB (estimated) FileSize: 62.9 MB (estimated) Compression: snappy WARNING: Assertions are enabled; benchmarks unnecessarily slow ------------------------------------------------ Created bg thread 0x7fa9c3fff700 randomwithverify : 5.004 micros/op 199836 ops/sec; ( get:900000 put:80000 del:20000 total:1000000 found:711992) Revert Plan: OK Task ID: # Reviewers: dhruba, emayanke Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D8685	2013-02-21 10:27:02 -08:00
amayank	b2c50f1c3f	Fix for the weird behaviour encountered by ldb Get where it could read only the second-latest value Summary: Changed the Get and Scan options with openForReadOnly mode to have access to the memtable. Changed the visibility of NewInternalIterator in db_impl from private to protected so that the derived class db_impl_read_only can call that in its NewIterator function for the scan case. The previous approach which changed the default for flush_on_destroy_ from false to true caused many problems in the unit tests due to empty sst files that it created. All unit tests pass now. Test Plan: make clean; make all check; ldb put and get and scans Reviewers: dhruba, heyongqiang, sheki Reviewed By: dhruba CC: kosievdmerwe, zshao, dilipj, kailiu Differential Revision: https://reviews.facebook.net/D8697	2013-02-20 10:45:52 -08:00
Abhishek Kona	fe10200ddc	Introduce histogram in statistics.h Summary: * Introduce histogram in statistics.h * stop watch to measure time. * introduce two timers as a poc. Replaced NULL with nullptr to fight some lint errors Should be useful for google. Test Plan: ran db_bench and check stats. make all check Reviewers: dhruba, heyongqiang Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D8637	2013-02-20 10:43:32 -08:00
amayank	f3901e0647	Revert "Fix for the weird behaviour encountered by ldb Get where it could read only the second-latest value" This reverts commit `4c696ed001`.	2013-02-18 22:32:27 -08:00
Dhruba Borthakur	fd367e677e	Fix unit test failure in db_filename.cc Summary: c_test: db/filename.cc:74: std::string leveldb::DescriptorFileName(const string&,.... Test Plan: this is a failure in a unit test Differential Revision: https://reviews.facebook.net/D8667	2013-02-18 21:53:56 -08:00
Dhruba Borthakur	4564915446	Zero out redundant sequence numbers for kvs to increase compression efficiency Summary: The sequence numbers in each record eat up plenty of space on storage. The optimization zeroes out sequence numbers on kvs in the Lmax layer that are earlier than the earliest snapshot. Test Plan: Unit test attached. Differential Revision: https://reviews.facebook.net/D8619	2013-02-18 21:51:15 -08:00
amayank	4c696ed001	Fix for the weird behaviour encountered by ldb Get where it could read only the second-latest value Summary: flush_on_destroy has a default value of false and the memtable is flushed in the dbimpl-destructor only when that is set to true. Because we want the memtable to be flushed every time the destructor is called (db is closed), and the cases where we work with the memtable only are very few, it is a good idea to give this a default value of true. Thus the put from ldb will have its data flushed to disk in the destructor and the next Get will be able to read it when opened with OpenForReadOnly. The reason that ldb could read the latest value when the db was opened in the normal Open mode is that the Get from normal Open first reads the memtable and directly finds the latest value written there, and the Get from OpenForReadOnly doesn't have access to the memtable (which is correct because all its Put/Modify operations are disabled) Test Plan: make all; ldb put and get and scans Reviewers: dhruba, heyongqiang, sheki Reviewed By: heyongqiang CC: kosievdmerwe, zshao, dilipj, kailiu Differential Revision: https://reviews.facebook.net/D8631	2013-02-15 16:56:06 -08:00
Kai Liu	b63aafce42	Allow the logs to be purged by TTL. Summary: * Add a SplitByTTLLogger to enable this feature. In this diff I implemented a generalized AutoSplitLoggerBase class to simplify the development of such classes. * Refactor the existing AutoSplitLogger and fix several bugs. Test Plan: * Added unit tests for the different types of "auto splittable" loggers individually. * Tested the composite logger, which allows the log files to be split by both TTL and log size. Reviewers: heyongqiang, dhruba Reviewed By: heyongqiang CC: zshao, leveldb Differential Revision: https://reviews.facebook.net/D8037	2013-02-04 19:42:40 -08:00
Chip Turner	0b83a83191	Fix poor error on num_levels mismatch and few other minor improvements Summary: Previously, if you opened a db with num_levels set lower than the database, you received the unhelpful message "Corruption: VersionEdit: new-file entry." Now you get a more verbose message describing the issue. Also, fix handling of compression_levels (both the run-over-the-end issue and the memory management of it). Lastly, unique_ptr'ify a couple of minor calls. Test Plan: make check Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D8151	2013-01-25 15:37:26 -08:00
Chip Turner	772f75b3fb	Stop continually re-creating build_version.c Summary: We continually rebuilt build_version.c because we put the current date into it, but that's what __DATE__ already is. This makes builds faster. This also fixes an issue with 'make clean FOO' not working properly. Also tweak the build rules to be more consistent, always have warnings, and add a 'make release' rule to handle flags for release builds. Test Plan: make, make clean Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D8139	2013-01-24 17:51:39 -08:00
Chip Turner	3dafdfb2c4	Use fallocate to prevent excessive allocation of sst files and logs Summary: On some filesystems, pre-allocation can be a considerable amount of space. xfs in our production environment pre-allocates by 1GB, for instance. By using fallocate to inform the kernel of our expected file sizes, we eliminate this wastage (that isn't recovered until the file is closed which, in the case of LOG files, can be a considerable amount of time). Test Plan: created an xfs loopback filesystem, mounted with allocsize=4M, and ran db_stress. LOG file without this change was 4M, and with it it was 128k then grew to normal size. Reviewers: dhruba Reviewed By: dhruba CC: adsharma, leveldb Differential Revision: https://reviews.facebook.net/D7953	2013-01-24 12:25:13 -08:00
Chip Turner	2fdf91a4f8	Fix a number of object lifetime/ownership issues Summary: Replace manual memory management with std::unique_ptr in a number of places; not exhaustive, but this fixes a few leaks with file handles as well as clarifies semantics of the ownership of file handles with log classes. Test Plan: db_stress, make check Reviewers: dhruba Reviewed By: dhruba CC: zshao, leveldb, heyongqiang Differential Revision: https://reviews.facebook.net/D8043	2013-01-23 16:54:11 -08:00
Abhishek Kona	16903c35b0	Add counters to count gets and writes Summary: Add Tickers to count Write's and Get's Test Plan: make check Reviewers: dhruba, chip Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D7977	2013-01-17 12:27:56 -08:00
Kosie van der Merwe	3c3df7402f	Fixed issues Valgrind found. Summary: Found issues with `db_test` and `db_stress` when running valgrind. `DBImpl` had an issue where, if a compaction failed, the uninitialised file size of an output file was used. This manifested as the final call to output to the log in `DoCompactionWork()` branching on uninitialized memory (all the way down in printf's innards). Test Plan: Ran `valgrind --track_origins=yes ./db_test` and `valgrind ./db_stress` to see if issues disappeared. Ran `make check` to see if there were no regressions. Reviewers: vamsi, dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D8001	2013-01-17 10:04:45 -08:00
Abhishek Kona	7d5a4383bb	rollover manifest file. Summary: Check in LogAndApply if the file size is more than the limit set in Options. Things to consider : will this be expensive? Test Plan: make all check. Inputs on a new unit test? Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D7701	2013-01-16 12:09:44 -08:00
Chip Turner	a2dcd79c1e	Add optional clang compile mode Summary: clang is an alternate compiler based on llvm. It produces nicer error messages and finds some bugs that gcc doesn't, such as the size_t change in this file (which caused some write return values to be misinterpreted!) Clang isn't the default; to try it, do "USE_CLANG=1 make" or "export USE_CLANG=1" then make as normal Test Plan: "make check" and "USE_CLANG=1 make check" Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D7899	2013-01-15 18:48:37 -08:00
Chip Turner	9bbcab57a9	Fix broken build Summary: Mis-merged from HEAD, had a duplicate declaration. Test Plan: make -j32 OPT=-g Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D7911	2013-01-15 14:05:49 -08:00
Kosie van der Merwe	28fe86c48a	Fixed bug with seek compactions on Level 0 Summary: Due to how the code handled compactions in Level 0 in `PickCompaction()` it could be the case that two compactions on level 0 ran that produced tables in level 1 that overlap. However, this case seems like it would only occur on a seek compaction which is unlikely on level 0. Furthermore, level 0 and level 1 had to have a certain arrangement of files. Test Plan: make check Reviewers: dhruba, vamsi Reviewed By: dhruba CC: leveldb, sheki Differential Revision: https://reviews.facebook.net/D7923	2013-01-15 12:43:09 -08:00
Chip Turner	c0cb289d57	Various build cleanups/improvements Summary: Specific changes: 1) Turn on -Werror so all warnings are errors 2) Fix some warnings the above now complains about 3) Add proper dependency support so changing a .h file forces a .c file to rebuild 4) Automatically use fbcode gcc on any internal machine rather than whatever system compiler is lying around 5) Fix jemalloc to once again be used in the builds (seemed like it wasn't being?) 6) Fix issue where 'git' would fail in build_detect_version because of LD_LIBRARY_PATH being set in the third-party build system Test Plan: make, make check, make clean, touch a header file, make sure rebuild is expected Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D7887	2013-01-14 18:40:22 -08:00
Mark Callaghan	2ba125faf6	fix warning for unused variable Test Plan: compile Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D7857	2013-01-11 15:00:47 -08:00
Abhishek Kona	85ad13be1a	Port fix for Leveldb manifest writing bug from Open-Source Summary: Pretty much a blind copy of the patch in open source. Hope to get this in before we make a release Test Plan: make clean check Reviewers: dhruba, heyongqiang Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D7809	2013-01-10 12:06:03 -08:00
Kosie van der Merwe	d8371ef1f6	Fixing some issues Valgrind found Summary: Found some issues running Valgrind on `db_test` (there are still some outstanding ones) and fixed them. Test Plan: make check ran `valgrind ./db_test` and saw that errors no longer occur Reviewers: dhruba, vamsi, emayanke, sheki Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D7803	2013-01-08 12:16:40 -08:00
Dhruba Borthakur	628dc2aad9	db_bench should use the default value for max_grandparent_overlap_factor. Summary: This was a performance regression caused by https://reviews.facebook.net/D6729. The default value of max_grandparent_overlap_factor was erroneously set to 0 in db_bench. This was causing compactions to create really really small files because the max_grandparent_overlap_factor was erroneously set to zero in the benchmark. Test Plan: Run --benchmarks=overwrite Reviewers: heyongqiang, emayanke, sheki, MarkCallaghan Reviewed By: sheki CC: leveldb Differential Revision: https://reviews.facebook.net/D7797	2013-01-08 11:21:11 -08:00
Kosie van der Merwe	d6e873f22f	Added clearer error message for failure to create db directory in DBImpl::Recover() Summary: Changed CreateDir() to CreateDirIfMissing() so a directory that already exists now causes and error. Fixed CreateDirIfMissing() and added Env.DirExists() Test Plan: make check to test for regressions. Ran the following to test if the error message is not about lock files not existing: ./db_bench --db=dir/testdb After creating a file "testdb", ran the following to see if it failed with a sane error message: ./db_bench --db=testdb Reviewers: dhruba, emayanke, vamsi, sheki Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D7707	2013-01-07 10:11:18 -08:00
Mark Callaghan	4069f66cc7	Add --seed, --read_range to db_bench Summary: Adds the option --seed to db_bench to specify the base for the per-thread RNG. When not set each thread uses the same value across runs of db_bench which defeats IO stress testing. Adds the option --read_range. When set to a value > 1 an iterator is created and each query done for the randomread benchmark will do a range scan for that many rows. When not set or set to 1 the existing behavior (a point lookup) is done. Fixes a bug where a printf format string was missing. Test Plan: run db_bench Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D7749	2013-01-07 09:56:10 -08:00
Kosie van der Merwe	8cd86a7be5	Fixing and adding some comments Summary: `MemTableList::Add()` neglected to mention that it took ownership of the reference held by its caller. The comment in `MemTable::Get()` was wrong in describing the format of the key. Test Plan: None Reviewers: dhruba, sheki, emayanke, vamsi Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D7755	2013-01-03 17:13:56 -08:00
Dhruba Borthakur	d7d43ae21a	ExtendOverlappingInputs too slow for large databases. Summary: There was a bug in the ExtendOverlappingInputs method so that the terminating condition for the backward search was incorrect. Test Plan: make clean check Reviewers: sheki, emayanke, MarkCallaghan Reviewed By: MarkCallaghan CC: leveldb Differential Revision: https://reviews.facebook.net/D7725	2013-01-02 13:19:06 -08:00
Dhruba Borthakur	f4c2b7cf97	Enhance ReadOnly mode to process all committed transactions. Summary: Leveldb has an api OpenForReadOnly() that opens the database in readonly mode. This call had an option to not process the transaction log. This patch removes this option and always processes all transactions that had been committed. It has been done in such a way that it does not create/write to any new files in the process. The invariant of "no-writes" to the leveldb data directory is still true. This enhancement allows multiple threads to open the same database in readonly mode and access all transactions that were committed right up to the OpenForReadOnly call. I changed the public API to match the new semantics because there are no users who are currently using this api. Test Plan: make clean check Reviewers: sheki Reviewed By: sheki CC: leveldb Differential Revision: https://reviews.facebook.net/D7479	2012-12-19 16:30:46 -08:00
Dhruba Borthakur	3d1e92b05a	Enhancements to rocksdb for better support for replication. Summary: 1. The OpenForReadOnly() call should not lock the db. This is useful so that multiple processes can open the same database concurrently for reading. 2. GetUpdatesSince should not error out if the archive directory does not exist. 3. A new constructor for WriteBatch that takes a serialized string as a parameter. Test Plan: make clean check Reviewers: sheki Reviewed By: sheki CC: leveldb Differential Revision: https://reviews.facebook.net/D7449	2012-12-17 11:40:19 -08:00
Kosie van der Merwe	62d48571de	Added meta-database support. Summary: Added kMetaDatabase for meta-databases in db/filename.h along with supporting functions. Fixed switch in DBImpl so that it also handles kMetaDatabase. Fixed DestroyDB() so that it can handle destroying meta-databases. Test Plan: make check Reviewers: sheki, emayanke, vamsi, dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D7245	2012-12-17 11:26:59 -08:00
Abhishek Kona	2f0585fb97	Fix a bug where DestroyDB deletes a non-existent archive directory. Summary: C tests would fail sometimes as DestroyDB would return a Failure Status message when deleting an archival directory which was not created (WAL_ttl_seconds = 0). Fix: Ignore the Status returned on Deleting Archival Directory. Test Plan: * make check Reviewers: dhruba, emayanke Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D7395	2012-12-17 10:25:26 -08:00
Zheng Shao	c28097538a	manifest_dump: Add --hex=1 option Summary: Without this option, manifest_dump does not print binary keys for files in a human-readable way. Test Plan: ./manifest_dump --hex=1 --verbose=0 --file=/data/users/zshao/fdb_comparison/leveldb/fbobj.apprequest-0_0_original/MANIFEST-000002 manifest_file_number 589 next_file_number 590 last_sequence 2311567 log_number 543 prev_log_number 0 --- level 0 --- version# 0 --- 532:1300357['0000455BABE20000' @ 2183973 : 1 .. 'FFFCA5D7ADE20000' @ 2184254 : 1] 536:1308170['000198C75CE30000' @ 2203313 : 1 .. 'FFFCF94A79E30000' @ 2206463 : 1] 542:1321644['0002931AA5E50000' @ 2267055 : 1 .. 'FFF77B31C5E50000' @ 2270754 : 1] 544:1286390['000410A309E60000' @ 2278592 : 1 .. 'FFFE470A73E60000' @ 2289221 : 1] 538:1298778['0006BCF4D8E30000' @ 2217050 : 1 .. 'FFFD77DAF7E30000' @ 2220489 : 1] 540:1282353['00090D5356E40000' @ 2231156 : 1 .. 'FFFFF4625CE40000' @ 2231969 : 1] --- level 1 --- version# 0 --- 510:2112325['000007F9C2D40000' @ 1782099 : 1 .. '146F5B67B8D80000' @ 1905458 : 1] 511:2121742['146F8A3023D60000' @ 1824388 : 1 .. '28BC8FBB9CD40000' @ 1777993 : 1] 512:801631['28BCD396F1DE0000' @ 2080191 : 1 .. '3082DBE9ADDB0000' @ 1989927 : 1] Reviewers: dhruba, sheki, emayanke Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D7425	2012-12-16 08:58:28 -08:00
Abhishek Kona	2ba866e0c5	GetSequence API in write batch. Summary: WriteBatch is now used by the GetUpdatesSince API. This API is external and will be used by the rocks server. Rocks Server and others will need to know about the Sequence Number in the WriteBatch. This public method will allow for that. Test Plan: make all check. Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D7293	2012-12-12 22:21:10 -08:00
Abhishek Kona	22c283625b	Fix Bug in Binary Search for files containing a seq no. and delete Archived Log Files during Destroy DB. Summary: * Fixed implementation bug in Binary_Search introduced in https://reviews.facebook.net/D7119 * Binary search is also overflow safe. * Delete archive log files and archive dir during DestroyDB Test Plan: make check Reviewers: dhruba CC: kosievdmerwe, emayanke Differential Revision: https://reviews.facebook.net/D7263	2012-12-11 16:15:02 -08:00
Dhruba Borthakur	24fc379273	A public API to fetch the latest transaction id. Summary: Implement an interface to retrieve the most current transaction id from the database. Test Plan: Added unit test. Reviewers: sheki Reviewed By: sheki CC: leveldb Differential Revision: https://reviews.facebook.net/D7269	2012-12-10 16:04:19 -08:00
Abhishek Kona	1c6742e32f	Refactor GetArchivalDirectoryName to filename.h Summary: filename.h has functions to do similar things. Moving code away from db_impl.cc Test Plan: make check Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D7251	2012-12-10 10:51:07 -08:00
Abhishek Kona	8055008909	GetUpdatesSince API to enable replication. Summary: How it works: * GetUpdatesSince takes a SequenceNumber. * The LogFile whose first SequenceNumber is nearest to, and less than, the requested SequenceNumber is found. * Seek in the LogFile till the requested SequenceNumber is found. * Return an iterator which contains logic to return records one by one. Test Plan: * Test case included to check the good code path. * Will update with more test-cases. * Feedback required on test-cases. Reviewers: dhruba, emayanke Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D7119	2012-12-07 11:42:13 -08:00
Dhruba Borthakur	c847a31727	Print compaction score for every compaction run. Summary: A compaction is picked based on its score. It is useful to print the compaction score in the LOG because it aids in debugging. If one looks at the logs, one can find out why a compaction was preferred over another. Test Plan: make clean check Differential Revision: https://reviews.facebook.net/D7137	2012-12-04 10:03:47 -08:00
sheki	d4627e6de4	Move WAL files to archive directory, instead of deleting. Summary: Create a directory "archive" in the DB directory. During DeleteObsoleteFiles move the WAL files (*.log) to the Archive directory, instead of deleting. Test Plan: Created a DB using DB_Bench. Reopened it. Checked if files move. Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D6975	2012-11-28 17:28:08 -08:00
Abhishek Kona	d29f181923	Fix all the lint errors. Summary: Scripted and removed all trailing spaces and converted all tabs to spaces. Also fixed other lint errors. All lint errors from this point of time should be taken seriously. Test Plan: make all check Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D7059	2012-11-28 17:18:41 -08:00
Dhruba Borthakur	9a357847eb	Delete non-visible keys during a compaction even in the presence of snapshots. Summary: LevelDB should delete almost-new keys when a long-open snapshot exists. The previous behavior is to keep all versions that were created after the oldest open snapshot. This can lead to database size bloat for high-update workloads when there are long-open snapshots and long-open snapshot will be used for logical backup. By "almost new" I mean that the key was updated more than once after the oldest snapshot. If there were two snapshots with seq numbers s1 and s2 (s1 < s2), and if we find two instances of the same key k1 that lie entirely within s1 and s2 (i.e. s1 < k1 < s2), then the earlier version of k1 can be safely deleted because that version is not visible in any snapshot. Test Plan: unit test attached make clean check Differential Revision: https://reviews.facebook.net/D6999	2012-11-28 15:47:40 -08:00
Dhruba Borthakur	3366eda839	Print out status at the end of a compaction run. Summary: Print out status at the end of a compaction run. This helps in debugging. Test Plan: make clean check Reviewers: sheki Reviewed By: sheki Differential Revision: https://reviews.facebook.net/D7035	2012-11-27 22:17:38 -08:00
sheki	43f5a07989	Remove unused variables that cause compiler warnings. Test Plan: make check Reviewers: dhruba Reviewed By: dhruba CC: emayanke Differential Revision: https://reviews.facebook.net/D6993	2012-11-26 20:55:24 -08:00
Dhruba Borthakur	2a39699900	Assertion failure while running unit tests with OPT=-g Summary: When we expand the range of keys for a level 0 compaction, we need to invoke ParentFilesInCompaction() only once for the entire range of keys that is being compacted. We were invoking it for each file that was being compacted, but this triggers an assertion because the files' ranges were contiguous but non-overlapping. I renamed ParentFilesInCompaction to ParentRangeInCompaction to adequately represent that it is the range-of-keys and not individual files that we compact in a single compaction run. Here is the assertion that is fixed by this patch. db_test: db/version_set.cc:585: void leveldb::Version::ExtendOverlappingInputs(int, const leveldb::Slice&, const leveldb::Slice&, std::vector<leveldb::FileMetaData, std::allocator<leveldb::FileMetaData> >*, int): Assertion `user_cmp->Compare(flimit, user_begin) >= 0' failed. Test Plan: make clean check OPT=-g Reviewers: sheki Reviewed By: sheki CC: MarkCallaghan, emayanke, leveldb Differential Revision: https://reviews.facebook.net/D6963	2012-11-26 14:00:39 -08:00
Dhruba Borthakur	e0cd6bf0e9	The c_test was sometimes failing with an assertion. Summary: On fast filesystems (e.g. /dev/shm and ext4), the flushing of memstore to disk was quick, and the background compaction thread was not getting scheduled fast enough to delete obsolete files before the db was closed. This caused the repair method to pick up those files that were not part of the db and the unit test was failing. The fix is to enhance the unit test to run a compaction before closing the database so that all files that are not part of the database are truly deleted from the filesystem. Test Plan: make c_test; ./c_test Reviewers: chip, emayanke, sheki Reviewed By: chip CC: leveldb Differential Revision: https://reviews.facebook.net/D6915	2012-11-26 11:59:51 -08:00
Dhruba Borthakur	7632fdb5cb	Support taking a configurable number of files from the same level to compact in a single compaction run. Summary: The compaction process takes some files from LevelK and merges them into LevelK+1. The number of files it picks from LevelK was capped in such a way that the total amount of data picked does not exceed the maxfilesize of that level. This essentially meant that only one file from LevelK is picked for a single compaction. For bulkloads, we would like to take many files from LevelK and compact them using a single compaction run. This patch introduces an option called the 'source_compaction_factor' (similar to expanded_compaction_factor). It is a multiplier that is multiplied by the maxfilesize of that level to arrive at the limit that is used to throttle the number of source files from LevelK. For bulk loads, set source_compaction_factor to a very high number so that multiple files from the same level are picked for compaction in a single compaction. The default value of source_compaction_factor is 1, so that we can keep backward compatibility with existing compaction semantics. Test Plan: make clean check Reviewers: emayanke, sheki Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D6867	2012-11-21 08:37:03 -08:00
Dhruba Borthakur	fbb73a4ac3	Support to disable background compactions on a database. Summary: This option is needed for fast bulk uploads. The goal is to load all the data into files in L0 without any interference from background compactions. Test Plan: make clean check Reviewers: sheki Reviewed By: sheki CC: leveldb Differential Revision: https://reviews.facebook.net/D6849	2012-11-20 21:12:06 -08:00
Dhruba Borthakur	3754f2f4ff	A major bug that was not considering the compaction score of the n-1 level. Summary: The method Finalize() recomputes the compaction score of each level and then sorts these scores from largest to smallest. The idea is that the level with the largest compaction score will be a better candidate for compaction. There are usually very few levels, and a bubble sort code was used to sort these compaction scores. There existed a bug in the sorting code that skipped looking at the score for the n-1 level. This meant that even if the compaction score of the n-1 level is large, it will not be picked for compaction. This patch fixes the bug and also introduces "asserts" in the code to detect any possible inconsistencies caused by future bugs. This bug existed in the very first code change that introduced multi-threaded compaction to the leveldb code. That version of code was committed on Oct 19th via `1ca0584345` Test Plan: make clean check OPT=-g Reviewers: emayanke, sheki, MarkCallaghan Reviewed By: sheki CC: leveldb Differential Revision: https://reviews.facebook.net/D6837	2012-11-20 15:44:21 -08:00
Dhruba Borthakur	dde70898a1	Fix asserts Summary: make check OPT=-g fails with the following assert. ==== Test DBTest.ApproximateSizes db_test: db/version_set.cc:765: void leveldb::VersionSet::Builder::CheckConsistencyForDeletes(leveldb::VersionEdit, int, int): Assertion `found' failed. The assertion claimed that file #7, which was being deleted, did not pre-exist, but it actually did, as the manifest dump below shows. The bug was that we did not check for file existence at the same level. **********************Edit[0] = VersionEdit { Comparator: leveldb.BytewiseComparator } *********************Edit[1] = VersionEdit { LogNumber: 8 PrevLogNumber: 0 NextFile: 9 LastSeq: 80 AddFile: 0 7 8005319 'key000000' @ 1 : 1 .. 'key000079' @ 80 : 1 } ***********************Edit[2] = VersionEdit { LogNumber: 8 PrevLogNumber: 0 NextFile: 13 LastSeq: 80 CompactPointer: 0 'key000079' @ 80 : 1 DeleteFile: 0 7 AddFile: 1 9 2101425 'key000000' @ 1 : 1 .. 'key000020' @ 21 : 1 AddFile: 1 10 2101425 'key000021' @ 22 : 1 .. 'key000041' @ 42 : 1 AddFile: 1 11 2101425 'key000042' @ 43 : 1 .. 'key000062' @ 63 : 1 AddFile: 1 12 1701165 'key000063' @ 64 : 1 .. 'key000079' @ 80 : 1 } Test Plan: Reviewers: CC: Task ID: # Blame Rev:	2012-11-19 14:51:22 -08:00
Dhruba Borthakur	a4b79b6e28	Merge branch 'master' into performance	2012-11-19 13:20:25 -08:00
Dhruba Borthakur	74054fa993	Fix compilation error while compiling unit tests with OPT=-g Summary: Fix compilation error while compiling with OPT=-g Test Plan: make clean check OPT=-g Reviewers: CC: Task ID: # Blame Rev:	2012-11-19 13:16:46 -08:00
Dhruba Borthakur	48dafb2c59	Fix compilation error introduced by previous commit `7889e09455` Summary: Fix compilation error introduced by previous commit `7889e09455` Test Plan: make clean check	2012-11-19 12:16:45 -08:00
Dhruba Borthakur	7889e09455	Enhance manifest_dump to print each individual edit. Summary: The manifest file contains a series of edits. If the verbose option is switched on, then print each individual edit in the manifest file. This helps in debugging. Test Plan: make clean manifest_dump Reviewers: emayanke, sheki Reviewed By: sheki CC: leveldb Differential Revision: https://reviews.facebook.net/D6807	2012-11-19 12:04:35 -08:00
amayank	65b035a47f	Fix a coding error in db_test.cc Summary: The new function MinLevelToCompress in db_test.cc was incomplete. It needs to tell the calling function-TEST whether the test has to be skipped or not Test Plan: make all;./db_test Reviewers: dhruba, heyongqiang Reviewed By: dhruba CC: sheki Differential Revision: https://reviews.facebook.net/D6771	2012-11-19 12:04:35 -08:00
Dhruba Borthakur	4b622ab0f2	Enhance manifest_dump to print each individual edit. Summary: The manifest file contains a series of edits. If the verbose option is switched on, then print each individual edit in the manifest file. This helps in debugging. Test Plan: make clean manifest_dump Reviewers: emayanke, sheki Reviewed By: sheki CC: leveldb Differential Revision: https://reviews.facebook.net/D6807	2012-11-19 12:02:27 -08:00
Dhruba Borthakur	62e7583f94	enhance dbstress to simulate hard crash Summary: dbstress has an option to reopen the database. Make it such that the previous handle is not closed before we reopen; this simulates a situation similar to a process crash. Added new api to DBImpl to remove the lock file. Test Plan: run db_stress Reviewers: emayanke Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D6777	2012-11-18 23:16:17 -08:00
amayank	de278a6de9	Fix a coding error in db_test.cc Summary: The new function MinLevelToCompress in db_test.cc was incomplete. It needs to tell the calling function-TEST whether the test has to be skipped or not Test Plan: make all;./db_test Reviewers: dhruba, heyongqiang Reviewed By: dhruba CC: sheki Differential Revision: https://reviews.facebook.net/D6771	2012-11-16 14:56:50 -08:00
Dhruba Borthakur	6c5a4d646a	Merge branch 'master' into performance Conflicts: db/db_impl.h	2012-11-14 21:39:52 -08:00
Dhruba Borthakur	e988c11f58	Enhance db_bench to be able to specify a grandparent_overlap_factor. Summary: The value specified in max_grandparent_overlap_factor is used to limit the file size in a compaction run. This patch makes it configurable when using db_bench. Test Plan: make clean db_bench Reviewers: MarkCallaghan, heyongqiang Reviewed By: heyongqiang CC: leveldb Differential Revision: https://reviews.facebook.net/D6729	2012-11-14 16:20:13 -08:00
Dhruba Borthakur	5d16e503a6	Improved CompactionFilter api: pass in an opaque argument to CompactionFilter invocation. Summary: There are applications that operate on multiple leveldb instances. These applications would like to pass in an opaque type for each leveldb instance and this type should be passed back to the application with every invocation of the CompactionFilter api. Test Plan: Enhanced unit test for opaque parameter to CompactionFilter. Reviewers: heyongqiang Reviewed By: heyongqiang CC: MarkCallaghan, sheki, emayanke Differential Revision: https://reviews.facebook.net/D6711	2012-11-13 16:22:26 -08:00
Dhruba Borthakur	43d9a8225a	Fix asserts so that "make check OPT=-g" works on performance branch Summary: Compilation used to fail with the error: db/version_set.cc:1773: error: ‘number_of_files_to_sort_’ is not a member of ‘leveldb::VersionSet’ I created a new method called CheckConsistencyForDeletes() so that all the high cost checking is done only when OPT=-g is specified. I also fixed a bug in PickCompactionBySize that was triggered when OPT=-g was switched on. The base_index in the compaction record was not set correctly. Test Plan: make check OPT=-g Differential Revision: https://reviews.facebook.net/D6687	2012-11-13 10:40:52 -08:00
Dhruba Borthakur	a785e029f7	The db_bench utility was broken in 1.5.4.fb because of a signed-unsigned comparison. Summary: The db_bench utility was broken in 1.5.4.fb because of a signed-unsigned comparison. The static variable FLAGS_min_level_to_compress was recently changed from int to 'unsigned int' but it is initialized to a negative value -1. The segfault is of this type: Program received signal SIGSEGV, Segmentation fault. Open (this=0x7fffffffdee0) at db/db_bench.cc:939 939 db/db_bench.cc: No such file or directory. (gdb) where Test Plan: run db_bench with no options. Reviewers: heyongqiang Reviewed By: heyongqiang CC: MarkCallaghan, emayanke, sheki Differential Revision: https://reviews.facebook.net/D6663	2012-11-12 13:59:35 -08:00
Dhruba Borthakur	9c6c232e47	Compilation error while compiling with OPT=-g Summary: make clean check OPT=-g fails leveldb::DBStatistics::getTickerCount(leveldb::Tickers)’: ./db/db_statistics.h:34: error: ‘MAX_NO_TICKERS’ was not declared in this scope util/ldb_cmd.cc:255: warning: left shift count >= width of type Test Plan: make clean check OPT=-g Reviewers: CC: Task ID: # Blame Rev:	2012-11-11 00:20:40 -08:00
Abhishek Kona	0f8e4721a5	Metrics: record compaction drops and bloom filter effectiveness Summary: Record BloomFilter hits and drop off reasons during compaction. Test Plan: Unit tests work. Reviewers: dhruba, heyongqiang Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D6591	2012-11-09 11:38:45 -08:00
heyongqiang	20d18a89a3	disable size compaction in ldb reduce_levels and added compression and file size parameter to it Summary: disable size compaction in ldb reduce_levels, this will avoid compactions other than the manual compaction, added --compression=none|snappy|zlib|bzip2 and --file_size= per-file size to ldb reduce_levels command Test Plan: run ldb Reviewers: dhruba, MarkCallaghan Reviewed By: dhruba CC: sheki, emayanke Differential Revision: https://reviews.facebook.net/D6597	2012-11-09 10:14:47 -08:00
Abhishek Kona	391885c4e4	stats collection in leveldb Summary: Prototype stats collection. Diff is a good estimate of what the final code will look like. A few assumptions : * Used a global static instance of the statistics object. Plan to pass it to each internal function. Static allows metrics only at app level. * The Tickers do not do any locking; they depend on the mutex at each function of LevelDB. If we ever remove the mutex, we should change here too. The other option is to use atomic objects anyway as there won't be any contention as they will be always acquired only by one thread. * The counters are dumb, increment through lifecycle. Plan to use ods etc to get last5min stat etc. Test Plan: made changes in db_bench Ran ./db_bench --statistics=1 --num=10000 --cache_size=5000 This will print the cache hit/miss stats. Reviewers: dhruba, heyongqiang Differential Revision: https://reviews.facebook.net/D6441	2012-11-08 13:55:49 -08:00
Dhruba Borthakur	95dda37858	Move filesize-based-sorting to outside the Mutex Summary: When a new version is created, we sort all the files at every level based on their size. This is necessary because we want to compact the largest file first. The sorting takes quite a bit of CPU. Moved the sorting code to be outside the mutex. Also, the earlier code was sorting files at all levels but we do not need to sort the highest-number level because those files are never the cause of any compaction. To reduce sorting costs, we sort only the first few files in each level because it is likely that those are the only files in that level that will be picked for compaction. At steady state, I have seen that this patch increases throughput from 1500 writes/sec to 1700 writes/sec at the end of a 72 hour run. The cpu saving by not sorting the last level was not distinctive in this test run because there were only 100K files in the highest numbered level. I expect the cpu saving to be significant when the number of files is much higher. This is mostly an early preview and not ready for rigorous review. With this patch, the writes/sec is now bottlenecked not by the sorting code but by GetOverlappingInputs. I am working on a patch to optimize GetOverlappingInputs. Test Plan: make check Reviewers: MarkCallaghan, heyongqiang Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D6411	2012-11-07 15:39:44 -08:00
Dhruba Borthakur	18cb6004d2	Fixed compilation error in previous merge. Summary: Fixed compilation error in previous merge. Test Plan: Reviewers: CC: Task ID: # Blame Rev:	2012-11-07 15:24:47 -08:00
Dhruba Borthakur	8143062edd	Merge branch 'master' into performance Conflicts: db/db_impl.cc db/version_set.cc util/options.cc	2012-11-07 15:11:37 -08:00
heyongqiang	3fcf533ed0	Add a readonly db Summary: as subject Test Plan: run db_bench readrandom Reviewers: dhruba Reviewed By: dhruba CC: MarkCallaghan, emayanke, sheki Differential Revision: https://reviews.facebook.net/D6495	2012-11-07 14:19:48 -08:00
Dhruba Borthakur	9b87a2bae8	Avoid doing an exhaustive search when looking for overlapping files. Summary: The Version::GetOverlappingInputs() is called multiple times in the compaction code path. Each invocation does a binary search for overlapping files in the specified key range. This patch remembers the offset of an overlapped file when GetOverlappingInputs() is called the first time within a compaction run. Succeeding calls to GetOverlappingInputs() use the remembered index to avoid the binary search. I measured that 1000 iterations of GetOverlappingInputs take around 4500 microseconds without this patch. If I use this patch with the hint on every invocation, then 1000 iterations take about 3900 microseconds. Test Plan: make check OPT=-g Reviewers: heyongqiang Reviewed By: heyongqiang CC: MarkCallaghan, emayanke, sheki Differential Revision: https://reviews.facebook.net/D6513	2012-11-07 11:47:17 -08:00
Abhishek Kona	4e413df3d0	Flush Data at object destruction if disableWal is used. Summary: Added a conditional flush in ~DBImpl to flush. There is still a chance of writes not being persisted if there is a crash (not a clean shutdown) before the DBImpl instance is destroyed. Test Plan: modified db_test to meet the new expectations. Reviewers: dhruba, heyongqiang Differential Revision: https://reviews.facebook.net/D6519	2012-11-06 15:04:42 -08:00
Dhruba Borthakur	aa42c66814	Fix all warnings generated by -Wall option to the compiler. Summary: The default compilation process now uses "-Wall" to compile. Fix all compilation errors generated by gcc. Test Plan: make all check Reviewers: heyongqiang, emayanke, sheki Reviewed By: heyongqiang CC: MarkCallaghan Differential Revision: https://reviews.facebook.net/D6525	2012-11-06 14:07:31 -08:00
Dhruba Borthakur	5f91868cee	Merge branch 'master' into performance Conflicts: db/version_set.cc util/options.cc	2012-11-05 16:51:55 -08:00

278 Commits