rocksdb

Author	SHA1	Message	Date
Igor Sugak	9fd6edf81c	rocksdb: Replace ASSERT* with EXPECT* in functions that does not return void value Summary: gtest does not use exceptions to fail a unit test by design, and `ASSERT`s are implemented using `return`. As a consequence we cannot use `ASSERT` in a function that does not return `void` value ([[ https://code.google.com/p/googletest/wiki/AdvancedGuide#Assertion_Placement \| 1]]), and have to fix our existing code. This diff does this in a generic way, with no manual changes. In order to detect all existing `ASSERT` that are used in functions that doesn't return void value, I change the code to generate compile errors for such cases. In `util/testharness.h` I defined `EXPECT` assertions, the same way as `ASSERT`, and redefined `ASSERT` to return `void`. Then executed: ```lang=bash % USE_CLANG=1 make all -j55 -k 2> build.log % perl -naF: -e 'print "-- -number=".$F[1]." ".$F[0]."\n" if /: error:/' \ build.log \| xargs -L 1 perl -spi -e 's/ASSERT/EXPECT/g if $. == $number' % make format ``` After that I reverted back change to `ASSERT` in `util/testharness.h`. But preserved introduced `EXPECT`, which is the same as `ASSERT*`. This will be deleted once switched to gtest. This diff is independent and contains manual changes only in `util/testharness.h`. Test Plan: Make sure all tests are passing. ```lang=bash % USE_CLANG=1 make check ``` Reviewers: igor, lgalanis, sdong, yufei.zhu, rven, meyering Reviewed By: meyering Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33333	2015-03-16 20:52:32 -07:00
Venkatesh Radhakrishnan	b2b3086524	Speed up rocksDB close call. Summary: On RocksDB, when there are multiple instances doing flushes/compactions in the background, the close call takes a long time because the flushes/compactions need to complete before the database can shut down. If another instance is using the background threads and the compaction for this instance is in the queue since it has been scheduled, we still cannot shutdown. We now remove the scheduled background tasks which have not yet started running, so that shutdown is speeded up. Test Plan: DB Test added. Reviewers: yhchiang, igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33741	2015-03-16 18:49:14 -07:00
Igor Sugak	95344346af	rocksdb: Small refactoring before migrating to gtest Summary: These changes are necessary to make tests look more generic, and avoid feature conflicts with gtest. Test Plan: Make sure no build errors, and all test are passing. ``` % make check ``` Reviewers: igor, meyering Reviewed By: meyering Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35145	2015-03-16 18:08:59 -07:00
Mark Callaghan	56337faf3e	Fix compaction IO stats to handle large file counts Summary: The output did not have space for 6-digit file counts or for 3-digit counts of files being compacted. This adds space for that while preserving existing alignment. See https://gist.github.com/mdcallag/0a61c6a18dd467224c11 Task ID: # Blame Rev: Test Plan: run db_bench, look at output Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D35091	2015-03-16 11:50:23 -07:00
Igor Canadi	c6967a1a5e	Make RecordIn/RecordOut human readable Summary: I had hard time understanding these big numbers. Here's how the output looks like now: https://gist.github.com/igorcanadi/4c39c17685049584a992 Test Plan: db_bench Reviewers: sdong, MarkCallaghan Reviewed By: MarkCallaghan Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35073	2015-03-14 15:12:41 -07:00
Mark Callaghan	c8da670325	Stop printing per-level stall times. Summary: Per-level stall times are the suggested stall time, not the actual stall time so this change stops printing them both in the per-level output lines and in the summary. Also changed output for total stall time to include units in all cases. The new output looks like: Level Files Size(MB) Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) Stall(cnt) RecordIn RecordDrop ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ L0 4/1 7 0.8 0.0 0.0 0.0 0.6 0.6 0.0 0.0 0.0 12.9 50 352 0.141 882 0 0 L1 5/0 9 0.9 0.0 0.0 0.0 0.0 0.0 0.6 0.0 0.0 0.0 0 0 0.000 0 0 0 L2 54/0 99 1.0 0.0 0.0 0.0 0.0 0.0 0.6 0.0 0.0 0.0 0 0 0.000 0 0 0 L3 289/0 527 0.5 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0 0 0 0.000 0 0 0 Sum 352/1 642 0.0 0.0 0.0 0.0 0.6 0.6 1.7 1.0 0.0 12.9 50 352 0.141 882 0 0 Int 0/0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 15.5 0 3 0.118 7 0 0 Flush(GB): accumulative 0.627, interval 0.005 Stalls(count): 0 level0_slowdown, 0 level0_numfiles, 882 memtable_compaction, 0 leveln_slowdown_soft, 0 leveln_slowdown_hard Task ID: #6493861 Blame Rev: Test Plan: run db_bench, look at output Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D35085	2015-03-14 15:01:43 -07:00
Yueh-Hsuan Chiang	12134139e3	Fixed the unit-test issue in PreShutdownCompactionMiddle Summary: Fixed the unit-test issue in PreShutdownCompactionMiddle Test Plan: export ROCKSDB_TESTS=PreShutdownCompactionMiddle Reviewers: rven, sdong, igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35061	2015-03-14 08:25:27 -07:00
Yueh-Hsuan Chiang	fd1b3f385a	Fix the issue in PreShutdownMultipleCompaction Summary: Fix the issue in PreShutdownMultipleCompaction Test Plan: export ROCKSDB_TESTS=PreShutdownMultipleCompaction ./db_test Reviewers: rven, sdong, igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35055	2015-03-14 08:03:02 -07:00
Igor Canadi	417367c42d	Fix SIGSEGV when not using cache	2015-03-13 16:41:00 -07:00
Venkatesh Radhakrishnan	e25ff039c8	Prevent slowdowns and stalls in PreShutdown tests Summary: The preshutdown tests check for stopped compactions/flushes. Removing stalls on the write path. Test Plan: DBTests.PreShutdown* Reviewers: yhchiang, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35037	2015-03-13 14:51:40 -07:00
Igor Canadi	f690712652	Speed up db_bench shutdown Summary: See t6489044 Test Plan: compiles Reviewers: MarkCallaghan Reviewed By: MarkCallaghan Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34977	2015-03-13 14:45:15 -07:00
Yueh-Hsuan Chiang	c1b3cde18a	Improve the robustness of ThreadStatusSingleCompaction Summary: Improve the robustness of ThreadStatusSingleCompaction by ensuring the number of files flushed in the test. Test Plan: export ROCKSDB_TESTS=ThreadStatus ./db_test Reviewers: sdong, igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35019	2015-03-13 13:16:53 -07:00
Yueh-Hsuan Chiang	8c12426c93	Fix the deadlock issue in ThreadStatusSingleCompaction. Summary: Fix the deadlock issue in ThreadStatusSingleCompaction. In the previous version of ThreadStatusSingleCompaction, the compaction thread will wait for a SYNC_POINT while its db_mutex is held. However, if the test hasn't finished its Put cycle while a compaction is running, a deadlock will happen in the test. Test Plan: export ROCKSDB_TESTS=ThreadStatus ./db_test Reviewers: sdong, igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35001	2015-03-13 12:53:00 -07:00
sdong	b16ead531d	DBTest.DynamicLevelCompressionPerLevel should not run without snappy support Summary: The test depends on snappy to be used. Skip the test if it is not supported. Test Plan: Run the test Reviewers: meyering, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D34995	2015-03-13 11:26:17 -07:00
Yueh-Hsuan Chiang	a5e60bafc2	Fix a typo / test failure in ThreadStatusSingleCompaction Summary: Fix a typo / test failure in ThreadStatusSingleCompaction Test Plan: export ROCKSDB_TESTS=ThreadStatus ./db_test	2015-03-13 11:20:17 -07:00
Igor Canadi	cb2c91850c	Don't run some tests is snappy is not present Summary: Currently, we have `ifdef SNAPPY` around bunch of db_test code. Some tests that don't even use compression are also blocked when running system doesn't have snappy. This also causes hard-to-catch bugs, like D34983. We should dynamically figure out if compression is supported or not. Test Plan: compiles Reviewers: sdong, meyering Reviewed By: meyering Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34989	2015-03-13 11:08:50 -07:00
Yueh-Hsuan Chiang	c594b0e89d	Allow GetThreadList() to report operation stage. Summary: Allow GetThreadList() to report operation stage. Test Plan: ./thread_list_test ./db_bench --benchmarks=fillrandom --num=100000 --threads=40 \ --max_background_compactions=10 --max_background_flushes=3 \ --thread_status_per_interval=1000 --key_size=16 --value_size=1000 \ --num_column_families=10 export ROCKSDB_TESTS=ThreadStatus ./db_test Sample output ThreadID ThreadType cfName Operation OP_StartTime ElapsedTime Stage State 140116265861184 Low Pri 140116270055488 Low Pri 140116274249792 High Pri column_family_name_000005 Flush 2015/03/10-14:58:11 0 us FlushJob::WriteLevel0Table 140116400078912 Low Pri column_family_name_000004 Compaction 2015/03/10-14:58:11 0 us CompactionJob::FinishCompactionOutputFile 140116358135872 Low Pri column_family_name_000006 Compaction 2015/03/10-14:58:10 1 us CompactionJob::FinishCompactionOutputFile 140116341358656 Low Pri 140116295221312 High Pri default Flush 2015/03/10-14:58:11 0 us FlushJob::WriteLevel0Table 140116324581440 Low Pri column_family_name_000009 Compaction 2015/03/10-14:58:11 0 us CompactionJob::ProcessKeyValueCompaction 140116278444096 Low Pri 140116299415616 Low Pri column_family_name_000008 Compaction 2015/03/10-14:58:11 0 us CompactionJob::FinishCompactionOutputFile 140116291027008 High Pri column_family_name_000001 Flush 2015/03/10-14:58:11 0 us FlushJob::WriteLevel0Table 140116286832704 Low Pri column_family_name_000002 Compaction 2015/03/10-14:58:11 0 us CompactionJob::FinishCompactionOutputFile 140116282638400 Low Pri Reviewers: rven, igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34683	2015-03-13 10:45:40 -07:00
Igor Canadi	52d8347a91	EventLogger Summary: Here's my proposal for making our LOGs easier to read by machines. The idea is to dump all events as JSON objects. JSON is easy to read by humans, but more importantly, it's easy to read by machines. That way, we can parse this, load into SQLite/mongo and then query or visualize. I started with table_create and table_delete events, but if everybody agrees, I'll continue by adding more events (flush/compaction/etc etc) Test Plan: Ran db_bench. Observed: 2015/01/15-14:13:25.788019 1105ef000 EVENT_LOG_v1 {"time_micros": 1421360005788015, "event": "table_file_creation", "file_number": 12, "file_size": 1909699} 2015/01/15-14:13:25.956500 110740000 EVENT_LOG_v1 {"time_micros": 1421360005956498, "event": "table_file_deletion", "file_number": 12} Reviewers: yhchiang, rven, dhruba, MarkCallaghan, lgalanis, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31647	2015-03-13 10:15:54 -07:00
Islam AbdelRahman	9d22a1f136	Allow negative Wnew Summary: we are using uint64_t for Wnew this is not correct since this value can be negative https://github.com/facebook/rocksdb/issues/535 Test Plan: run db_bench and check what happens when Wnew is -ve Reviewers: sdong, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D34935	2015-03-12 20:53:18 -07:00
Venkatesh Radhakrishnan	b411d06031	Prevent stalls in preshutdown tests Summary: The tests using sync_point for intent to shutdown stop compaction and this results in stalls if too many rows are written. We now limit the number of rows written to prevent stalls, since the focus of the test is to cancel background work, which is being correctly tested. This fixes a Jenkins issue. Test Plan: DBTest.PreShutdown* Reviewers: sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34893	2015-03-12 10:49:06 -07:00
Islam AbdelRahman	1d43bc41fb	Fixing segmentation fault in db_bench Summary: Fixing segmentation fault when running db_bench This seg fault happens because num_created is used without being initialized Test Plan: running db_bench using these arguments bpl=10485760;overlap=10;mcz=2;del=300000000;levels=6;ctrig=4; delay=8; stop=12; wbn=3; mbc=20; mb=67108864;wbs=134217728; dds=0; sync=0; r=1000000; t=1; vs=800; bs=65536; cs=1048576; of=500000; si=1000000; ./db_bench --benchmarks=overwrite --disable_seek_compaction=1 --mmap_read=0 --statistics=1 --histogram=1 --num=$r --threads=$t --value_size=$vs --block_size=$bs --cache_size=$cs --bloom_bits=10 --cache_numshardbits=4 --open_files=$of --verify_checksum=1 --db=/home/tec/koko/ --sync=$sync --disable_wal=1 --compression_type=zlib --stats_interval=$si --compression_ratio=0.5 --disable_data_sync=$dds --write_buffer_size=$wbs --target_file_size_base=$mb --max_write_buffer_number=$wbn --max_background_compactions=$mbc --level0_file_num_compaction_trigger=$ctrig --level0_slowdown_writes_trigger=$delay --level0_stop_writes_trigger=$stop --num_levels=$levels --delete_obsolete_files_period_micros=$del --min_level_to_compress=$mcz --max_grandparent_overlap_factor=$overlap --stats_per_interval=1 --max_bytes_for_level_base=$bpl --use_existing_db=1 Reviewers: sdong, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D34881	2015-03-11 17:57:16 -07:00
sdong	e9de8b65a6	Change the way options.compression_per_level is used when options.level_compaction_dynamic_level_bytes=true Summary: Change the way options.compression_per_level is used when options.level_compaction_dynamic_level_bytes=true so that options.compression_per_level[1] determines compression for the level L0 is merged to, options.compression_per_level[2] to the level after that, etc. Test Plan: run all tests Reviewers: rven, yhchiang, kradhakrishnan, igor Reviewed By: igor Subscribers: yoshinorim, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D34431	2015-03-11 13:14:52 -07:00
Yueh-Hsuan Chiang	2b785d76b8	Fixed a bug where CompactFiles won't delete obsolete files until flush. Summary: Fixed a bug where CompactFiles won't delete obsolete files until flush. Test Plan: ./compact_files_test export ROCKSDB_TESTS=CompactFiles ./db_test Reviewers: rven, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34671	2015-03-11 13:06:59 -07:00
sdong	2884b100ba	db_bench: Better way to randomize repeated read keys in -read_random_exp_range Summary: Use a better way to map from a key with locality to a random location. Now with the same -read_random_exp_range setting, hit rate drops, which it is expected. Test Plan: ./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=<multiple_values> Reviewers: MarkCallaghan, kradhakrishnan, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D34761	2015-03-11 11:46:14 -07:00
Venkatesh Radhakrishnan	284be570c8	Provide a mechanism to inform Rocksdb that it is shutting down Summary: Provide an API which enables users to infor Rocksdb that it is shutting down. Test Plan: db_test Reviewers: sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34617	2015-03-11 10:31:02 -07:00
Igor Canadi	2ddf53b2ca	Get OptimizeFilterForHits work on Mac Summary: Got it working by some voodoo programming Test Plan: works! Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34611	2015-03-10 17:53:22 -07:00
Yueh-Hsuan Chiang	89597bb66b	Allow GetThreadList() to report the start time of the current operation. Summary: Allow GetThreadList() to report the start time of the current operation. Test Plan: ./db_bench --benchmarks=fillrandom --num=100000 --threads=40 \ --max_background_compactions=10 --max_background_flushes=3 \ --thread_status_per_interval=1000 --key_size=16 --value_size=1000 \ --num_column_families=10 Sample output: ThreadID ThreadType cfName Operation OP_StartTime State 140338840797248 High Pri column_family_name_000003 Flush 2015/03/09-17:49:59 140338844991552 High Pri column_family_name_000004 Flush 2015/03/09-17:49:59 140338849185856 Low Pri 140338983403584 Low Pri 140339008569408 Low Pri 140338861768768 Low Pri 140338924683328 Low Pri 140338899517504 Low Pri 140338853380160 Low Pri 140338882740288 Low Pri 140338865963072 High Pri column_family_name_000006 Flush 2015/03/09-17:49:59 140338954043456 Low Pri 140338857574464 Low Pri Reviewers: igor, rven, sdong Reviewed By: sdong Subscribers: lgalanis, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34689	2015-03-10 14:51:28 -07:00
sdong	37921b4997	db_bench: Add Option -read_random_exp_range to allow read skewness. Summary: Introduce parameter -read_random_exp_range in db_bench to provide some key skewness in readrandom and multireadrandom benchmarks. It will helpful to cover block cache better. Test Plan: Run benchmarks with this new parameter. I can clearly see block cache hit rate change while I increase this value (DB size is about 66MB): ./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=0.0 rocksdb.block.cache.data.miss COUNT : 958418 rocksdb.block.cache.data.hit COUNT : 41582 ./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=5.0 rocksdb.block.cache.data.miss COUNT : 819518 rocksdb.block.cache.data.hit COUNT : 180482 ./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=10.0 rocksdb.block.cache.data.miss COUNT : 450479 rocksdb.block.cache.data.hit COUNT : 549521 ./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=20.0 rocksdb.block.cache.data.miss COUNT : 223192 rocksdb.block.cache.data.hit COUNT : 776808 Reviewers: MarkCallaghan, kradhakrishnan, yhchiang, rven, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D34629	2015-03-09 11:34:52 -07:00
Igor Canadi	485ac0dbd0	Add rate_limiter to string options Summary: I want to be able to set this through mongo config. Test Plan: added unit test Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34599	2015-03-06 14:21:15 -08:00
Yueh-Hsuan Chiang	dc4532c497	Add --thread_status_per_interval to db_bench Summary: Add --thread_status_per_interval to db_bench, which allows db_bench to optionally enable print the current thread status periodically. Test Plan: ./db_bench --benchmarks=fillrandom --num=100000 --threads=40 --max_background_compactions=10 --max_background_flushes=3 --thread_status_per_interval=1000 --key_size=16 --value_size=1000 --num_column_families=10 Sample output: ThreadID ThreadType dbName cfName Operation State 140281571770432 Low Pri 140281575964736 High Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000001 Flush 140281710182464 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000008 Compaction 140281638879296 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000007 Compaction 140281592741952 Low Pri 140281580159040 High Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000002 Flush 140281676628032 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000006 Compaction 140281584353344 Low Pri 140281622102080 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000009 Compaction 140281605324864 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000004 Compaction 140281601130560 High Pri /tmp/rocksdbtest-5297/dbbench default Flush 140281596936256 Low Pri 140281588547648 Low Pri Reviewers: igor, rven, sdong Reviewed By: rven Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D34515	2015-03-06 11:22:06 -08:00
Yueh-Hsuan Chiang	694988b627	Fix a bug in stall time counter. Improve its output format. Summary: Fix a bug in stall time counter. Improve its output format. Test Plan: export ROCKSDB_TESTS=Timeout ./db_test ./db_bench --benchmarks=fillrandom --stats_interval=10000 --statistics=true --stats_per_interval=1 --num=1000000 --threads=4 --level0_stop_writes_trigger=3 --level0_slowdown_writes_trigger=2 sample output: Uptime(secs): 35.8 total, 0.0 interval Cumulative writes: 359590 writes, 359589 keys, 183047 batches, 2.0 writes per batch, 0.04 GB user ingest, stall seconds: 1786.008 ms Cumulative WAL: 359591 writes, 183046 syncs, 1.96 writes per sync, 0.04 GB written Interval writes: 253 writes, 253 keys, 128 batches, 2.0 writes per batch, 0.0 MB user ingest, stall time: 0 us Interval WAL: 253 writes, 128 syncs, 1.96 writes per sync, 0.00 MB written Reviewers: MarkCallaghan, igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34275	2015-03-03 12:48:12 -08:00
Igor Canadi	db03739340	options.level_compaction_dynamic_level_bytes to allow RocksDB to pick size bases of levels dynamically. Summary: When having fixed max_bytes_for_level_base, the ratio of size of largest level and the second one can range from 0 to the multiplier. This makes LSM tree frequently irregular and unpredictable. It can also cause poor space amplification in some cases. In this improvement (proposed by Igor Kabiljo), we introduce a parameter option.level_compaction_use_dynamic_max_bytes. When turning it on, RocksDB is free to pick a level base in the range of (options.max_bytes_for_level_base/options.max_bytes_for_level_multiplier, options.max_bytes_for_level_base] so that real level ratios are close to options.max_bytes_for_level_multiplier. Test Plan: New unit tests and pass tests suites including valgrind. Reviewers: MarkCallaghan, rven, yhchiang, igor, ikabiljo Reviewed By: ikabiljo Subscribers: yoshinorim, ikabiljo, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31437	2015-03-02 22:40:41 -08:00
Mark Callaghan	c4bd03a97e	Fix typo in log message Summary: fix typo Task ID: # Blame Rev: Test Plan: Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D34251	2015-03-02 09:35:50 -08:00
Igor Canadi	3cf7f353d9	Instrument memtable seeks Summary: As title Test Plan: compiles Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34191	2015-02-27 17:06:06 -08:00
Sameet Agarwal	e7c434c364	Add columnfamily option optimize_filters_for_hits to optimize for key hits only Summary: Summary: Added a new option to ColumnFamllyOptions - optimize_filters_for_hits. This option can be used in the case where most accesses to the store are key hits and we dont need to optimize performance for key misses. This is useful when you have a very large database and most of your lookups succeed. The option allows the store to not store and use filters in the last level (the largest level which contains data). These filters can take a large amount of space for large databases (in memory and on-disk). For the last level, these filters are only useful for key misses and not for key hits. If we are not optimizing for key misses, we can choose to not store these filters for that level. This option is only provided for BlockBasedTable. We skip the filters when we are compacting Test Plan: 1. Modified db_test toalso run tests with an additonal option (skip_filters_on_last_level) 2. Added another unit test to db_test which specifically tests that filters are being skipped Reviewers: rven, igor, sdong Reviewed By: sdong Subscribers: lgalanis, yoshinorim, MarkCallaghan, rven, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33717	2015-02-26 16:25:56 -08:00
Igor Sugak	62247ffa3b	rocksdb: Add missing override Summary: When using latest clang (3.6 or 3.7/trunck) rocksdb is failing with many errors. Almost all of them are missing override errors. This diff adds missing override keyword. No manual changes. Prerequisites: bear and clang 3.5 build with extra tools ```lang=bash % USE_CLANG=1 bear make all # generate a compilation database http://clang.llvm.org/docs/JSONCompilationDatabase.html % clang-modernize -p . -include . -add-override % make format ``` Test Plan: Make sure all tests are passing. ```lang=bash % #Use default fb code clang. % make check ``` Verify less error and no missing override errors. ```lang=bash % # Have trunk clang present in path. % ROCKSDB_NO_FBCODE=1 CC=clang CXX=clang++ make ``` Reviewers: igor, kradhakrishnan, rven, meyering, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34077	2015-02-26 11:28:41 -08:00
Mark Callaghan	182b4ceacd	Limit key range to number of keys, not number of writes Summary: An old commit (482401) changed DoWrite to use the value of --writes rather than --num to determine the range for keys. This restores the old and correct behavior which is to limit it using --num. Task ID: #6353043 Blame Rev: Test Plan: run db_bench Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D34065	2015-02-25 15:53:45 -08:00
Venkatesh Radhakrishnan	4ade89962d	Fix compile error on MacOS. Summary: In a release build, a member was not being accessed. This member was only being accessed in a debug build. We now add an accessor function for this member and the buid succeeds. Test Plan: build release/unity/debug on linux/mac Reviewers: sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34035	2015-02-24 16:24:53 -08:00
Igor Canadi	ace3d85068	Revert "Unused managed iterator" This reverts commit `bd339a9798`. Conflicts: db/managed_iterator.cc	2015-02-24 13:27:41 -08:00
Igor Canadi	7b8f348e56	Attempt at fixing travis issue	2015-02-24 12:20:43 -08:00
Igor Canadi	bd339a9798	Unused managed iterator Summary: This causes warnings on OS X Test Plan: compiles Reviewers: rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33969	2015-02-24 09:51:52 -08:00
Jinfu Leng	96d989f70d	catch config errors with L0 file count triggers Test Plan: Run "make clean && make all check" Reviewers: rven, igor, yhchiang, kradhakrishnan, MarkCallaghan, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D33627	2015-02-23 16:08:27 -08:00
Igor Sugak	62f7a1be4f	rocksdb: Fixed 'Dead assignment' and 'Dead initialization' scan-build warnings Summary: This diff contains trivial fixes for 6 scan-build warnings: db/c_test.c `db` variable is never read. Removed assignment. scan-build report: http://home.fburl.com/~sugak/latest20/report-9b77d2.html#EndPath db/db_iter.cc `skipping` local variable is assigned to false. Then in the next switch block the only "non return" case assign `skipping` to true, the rest cases don't use it and all do return. scan-build report: http://home.fburl.com/~sugak/latest20/report-13fca7.html#EndPath db/log_reader.cc In `bool Reader::SkipToInitialBlock()` `offset_in_block` local variable is assigned to 0 `if (offset_in_block > kBlockSize - 6)` and then never used. Removed the assignment and renamed it to `initial_offset_in_block` to avoid confusion. scan-build report: http://home.fburl.com/~sugak/latest20/report-a618dd.html#EndPath In `bool Reader::ReadRecord(Slice* record, std::string* scratch)` local variable `in_fragmented_record` in switch case `kFullType` block is assigned to false and then does `return` without use. In the other switch case `kFirstType` block the same `in_fragmented_record` is assigned to false, but later assigned to true without prior use. Removed assignment for both cases. scan-build reprots: http://home.fburl.com/~sugak/latest20/report-bb86b0.html#EndPath http://home.fburl.com/~sugak/latest20/report-a975be.html#EndPath table/plain_table_key_coding.cc Local variable `user_key_size` is assigned when declared. But then in both places where it is used assigned to `static_cast<uint32_t>(key.size() - 8)`. Changed to initialize the variable to the proper value in declaration. scan-build report: http://home.fburl.com/~sugak/latest20/report-9e6b86.html#EndPath tools/db_stress.cc Missing `break` in switch case block. This seems to be a bug. Added missing `break`. Test Plan: Make sure all tests are passing and scan-build does not report 'Dead assignment' and 'Dead initialization' bugs. ```lang=bash % make check % make analyze ``` Reviewers: meyering, igor, kradhakrishnan, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33795	2015-02-23 14:10:09 -08:00
Jim Meyering	a2b911b63f	inputs: restore "const" attribute removed by D33759 Summary: The "const" attribute applies to the type, and placing it before that return type retains the desired semantics, yet avoids the compiler error/warning. Test Plan: Run make Reviewers: ljin, sdong, igor.sugak, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D33789	2015-02-20 11:52:20 -08:00
Jim Meyering	c6d54b5037	fix erroneous assert: cast kBlockSize (of type unsigned int) to "int" Summary: Otherwise, we would assert that an unsigned expression is always >= 0. The intent was to form a possibly negative number, and to assert that that value is always >= 0, but since one variable in the computation was unsigned, the result was guaranteed to be unsigned, too, rendering the assertion useless. Cast that unsigned variable to "int", so that all operands are signed, and thus so that the result can be negative. Test Plan: Run "make EXTRA_CXXFLAGS='-W -Wextra'" and see fewer errors. Reviewers: ljin, sdong, igor.sugak, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D33771	2015-02-20 11:07:17 -08:00
Jim Meyering	c37937a9ce	maint: remove extraneous "const" attribute from return type Summary: The "const" attribute does not make sense on a return type, and provokes a warning/error from gcc -W -Wextra. Test Plan: Run "make EXTRA_CXXFLAGS='-W -Wextra'" and see fewer errors. Reviewers: ljin, sdong, igor.sugak, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D33759	2015-02-20 11:07:07 -08:00
Jim Meyering	9283c7afd2	build: remove always-true assertions Summary: Remove some always-true assertions. They provoke these compilation failures: table/plain_table_key_coding.cc:279:20: error: comparison of unsigned expression >= 0 is always true [-Werror=type-limits] db/version_set.cc:336:15: error: comparison of unsigned expression >= 0 is always true [-Werror=type-limits] * table/plain_table_key_coding.cc (rocksdb): Remove assertion that unsigned type variable is >= 0. * db/version_set.cc (DoGenerateLevelFilesBrief): Likewise. Test Plan: Run "make EXTRA_CXXFLAGS='-W -Wextra'" and see fewer errors. Reviewers: ljin, sdong, igor.sugak, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D33747	2015-02-20 11:07:03 -08:00
Igor Sugak	73711f956c	rocksdb: Fix scan-build bug 'Memory leak' in db/db_bench.cc Summary: The bug is detected by scan-build. In `void WriteSeqSeekSeq(ThreadState* thread)` memory is allocated in line 3118 `Slice key = AllocateKey();` but `Slice` is not responsible deleting `Slice::data()`. Added `std::unique_ptr<const char[]>*` parameter to ` AllocateKey()`, so that it requires caller to not forget about Slice::data() management. scan-build bug report: http://home.fburl.com/~sugak/latest6/report-6e9754.html#EndPath Test Plan: Make sure scan-build does not report 'Memory leak' in db/db_bench.cc and all tests are passing. ```lang=bash % make analyze % make check ``` Reviewers: lgalanis, igor, meyering, sdong Reviewed By: meyering, sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33501	2015-02-19 14:27:48 -08:00
Jim Meyering	a42324e370	build: do not relink every single binary just for a timestamp Summary: Prior to this change, "make check" would always waste a lot of time relinking 60+ binaries. With this change, it does that only when the generated file, util/build_version.cc, changes, and that happens only when the date changes or when the current git SHA changes. This change makes some other improvements: before, there was no rule to build a deleted util/build_version.cc. If it was somehow removed, any attempt to link a program would fail. There is no longer any need for the separate file, build_tools/build_detect_version. Its functionality is now in the Makefile. * Makefile (DEPFILES): Don't filter-out util/build_version.cc. No need, and besides, removing that dependency was wrong. (date, git_sha, gen_build_version): New helper variables. (util/build_version.cc): New rule, to create this file and update it only if it would contain new information. * build_tools/build_detect_platform: Remove file. * db/db_impl.cc: Now, print only date (not the time). * util/build_version.h (rocksdb_build_compile_time): Remove declaration. No longer used. Test Plan: - Run "make check" twice, and note that the second time no linking is performed. - Remove util/build_version.cc and ensure that any "make" command regenerates it before doing anything else. - Run this: strings librocksdb.a\|grep _build_. That prints output including the following: rocksdb_build_git_date:2015-02-19 rocksdb_build_git_sha:2.8.fb-1792-g3cb6cc0 Reviewers: ljin, sdong, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D33591	2015-02-19 13:11:10 -08:00
sdong	d45a6a4002	Add rocksdb.num-live-versions: number of live versions Summary: Add a DB property about live versions. It can be helpful to figure out whether there are files not live but not yet deleted, in some use cases. Test Plan: make all check Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: yoshinorim, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33327	2015-02-19 13:10:37 -08:00
Venkatesh Radhakrishnan	7d817268b9	Managed iterator Summary: This is a diff for managed iterator. A managed iterator is a wrapper around an iterator which saves the options for that iterator as well as the current key/value so that the underlying iterator and its associated memory can be released when it is aged out automatically or on the request of the user. Will provide the automatic release as a follow-up diff. Test Plan: Managed* tests in db_test and XF tests for managed iterator Reviewers: igor, yhchiang, anthony, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31401	2015-02-18 11:49:31 -08:00
Yueh-Hsuan Chiang	12753130ec	Remove ThreadStatusMultiCompaction test Summary: Remove ThreadStatusMultiCompaction test as it's currently written in a way that depends on some randomness, while the flush / compaction status of a single thread is also covered in ThreadStatusFlush and ThreadStatusSingleCompaction tests. Test Plan: ./db_test Reviewers: igor, sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33537	2015-02-17 12:04:56 -08:00
Yueh-Hsuan Chiang	e60bc99fe0	Allow GetThreadList to reflect flush activity. Summary: Allow GetThreadList to reflect flush activity. Test Plan: Developed ThreadStatusFlush test and updated ThreadStatusMultiCompaction test. ./db_test ./thread_list_test Reviewers: sdong, rven, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D32871	2015-02-17 10:13:52 -08:00
Igor Canadi	e7ea51a8e7	Introduce job_id for flush and compaction Summary: It would be good to assing background job their IDs. Two benefits: 1) makes LOGs more readable 2) I might use it in my EventLogger, which will try to make our LOG easier to read/query/visualize Test Plan: ran rocksdb, read the LOG Reviewers: sdong, rven, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31617	2015-02-12 09:54:48 -08:00
sdong	5f00af4570	DBTest.DestroyDBMetaDatabase: create DB directories if not exists Summary: DBTest.DestroyDBMetaDatabase occasionally fails on my dev host, for file not existing. Always create directories to avoid that. Test Plan: Run the test Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33321	2015-02-11 16:16:50 -08:00
sdong	68af7811ea	Remember whole key/prefix filtering on/off in SST file Summary: Remember whole key or prefix filtering on/off in SST files. If user opens the DB with a different setting that cannot be satisfied while reading the SST file, ignore the bloom filter. Test Plan: Add a unit test for it Reviewers: yhchiang, igor, rven Reviewed By: rven Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D32889	2015-02-11 11:20:04 -08:00
sdong	6d6305dd7d	Perf Context to report DB mutex waiting time Summary: Add counters in perf context to allow users to figure out how time spent on waiting for DB mutex Test Plan: Add a test and run it. Reviewers: yhchiang, rven, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D33177	2015-02-09 17:55:12 -08:00
Igor Canadi	863009b5a5	Fix deleting obsolete files #2 Summary: For description of the bug, see comment in db_test. The fix is pretty straight forward. Test Plan: added unit test. eventually we need better testing of FOF/POF process. Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33081	2015-02-09 17:38:32 -08:00
Grace Law	1851f977c2	Added RocksDB stats GET_HIT_L0 and GET_HIT_L1 Summary: - In statistics.h , added tickers. - In version_set.cc, -- Added a getter method for hit_file_level_ in the class FilePicker -- Added a line in the Get() method in case of a found, increment the corresponding counters based on the level of the file respectively. Corresponding task: https://our.intern.facebook.com/intern/tasks/?s=506100481&t=5952818 Personal fork: `0c3f2e3600` Test Plan: In terminal, ``` make -j32 db_test ROCKSDB_TESTS=L0L1L2AndUpHitCounter ./db_test ``` Or to use debugger, ``` make -j32 db_test export ROCKSDB_TESTS=L0L1L2AndUpHitCounter gdb db_test ``` Reviewers: rven, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D32205	2015-02-09 14:53:58 -08:00
sdong	91ac3b2067	Print DB pointer when opening a DB Summary: Having a pointer for DB will be helpful to debug when GDB or working on a dump. If the client process doesn't have any thread actively working on RocksDB, it can be hard to find out. Test Plan: make all check Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: yoshinorim, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33159	2015-02-09 12:52:58 -08:00
fyrz	cfe8837e43	Switch logv with loglevel to virtual	2015-02-09 20:59:29 +01:00
Igor Canadi	aaceef3638	Fix formatting	2015-02-09 09:53:30 -08:00
Igor Canadi	ee4aa9a0ee	Merge pull request #481 from mkevac/backupable Allow creating and restoring backups from C	2015-02-09 09:51:19 -08:00
Marko Kevac	9651308307	renamed backup to backup_and_restore in c_test for clarity	2015-02-09 12:11:42 +03:00
Marko Kevac	7e50ed8c24	Added some more wrappers and wrote a test for backup in C	2015-02-07 14:25:10 +03:00
Igor Canadi	2a979822b6	Fix deleting obsolete files Summary: This diff basically reverts D30249 and also adds a unit test that was failing before this patch. I have no idea how I didn't catch this terrible bug when writing a diff, sorry about that :( I think we should redesign our system of keeping track of and deleting files. This is already a second bug in this critical piece of code. I'll think of few ideas. BTW this diff is also a regression when running lots of column families. I plan to revisit this separately. Test Plan: added a unit test Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33045	2015-02-06 08:44:30 -08:00
Igor Canadi	6f10130354	Fix DestroyDB Summary: When DestroyDB() finds a wal file in the DB directory, it assumes it is actually in WAL directory. This can lead to confusion, since it reports IO error when it tries to delete wal file from DB directory. For example: https://ci-builds.fb.com/job/rocksdb_clang_build/296/console This change will fix our unit tests. Test Plan: unit tests work Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32907	2015-02-05 20:09:42 -08:00
Igor Canadi	7de4e99a8e	Revert "Fix wal_dir not getting cleaned" This reverts commit `f36d394aed`.	2015-02-05 11:44:17 -08:00
Yueh-Hsuan Chiang	181191a1e4	Add a counter for collecting the wait time on db mutex. Summary: Add a counter for collecting the wait time on db mutex. Also add MutexWrapper and CondVarWrapper for measuring wait time. Test Plan: ./db_test export ROCKSDB_TESTS=MutexWaitStats ./db_test verify stats output using db_bench make clean make release ./db_bench --statistics=1 --benchmarks=fillseq,readwhilewriting --num=10000 --threads=10 Sample output: rocksdb.db.mutex.wait.micros COUNT : 7546866 Reviewers: MarkCallaghan, rven, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32787	2015-02-04 21:39:45 -08:00
Igor Canadi	f36d394aed	Fix wal_dir not getting cleaned	2015-02-04 18:57:22 -08:00
sdong	53ae09c398	db_test: fix a data race in SpecialEnv Summary: db_test's test class SpecialEnv has a thread unsafe variable rnd_ but it can be accessed by multiple threads. It is complained by TSAN. Protect it by a mutex. Test Plan: Run the test Reviewers: yhchiang, rven, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D32895	2015-02-04 18:32:53 -08:00
Igor Canadi	3e53760fc4	Fix compaction_picker_test	2015-02-04 16:20:25 -08:00
Igor Canadi	e39f4f6cf9	Fix data race #3 Summary: Added requirement that ComputeCompactionScore() be executed in mutex, since it's accessing being_compacted bool, which can be mutated by other threads. Also added more comments about thread safety of FileMetaData, since it was a bit confusing. However, it seems that FileMetaData doesn't have data races (except being_compacted) Test Plan: Ran 100 ConvertCompactionStyle tests with thread sanitizer. On master -- some failures. With this patch -- none. Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32283	2015-02-04 16:04:51 -08:00
sdong	e63140d52b	Get() to use prefix bloom filter when filter is not block based Summary: Get() now doesn't make use of bloom filter if it is prefix based. Add the check. Didn't touch block based bloom filter. I can't fully reason whether it is correct to do that. But it's straight-forward to for full bloom filter. Test Plan: make all check Add a test case in DBTest Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: MarkCallaghan, leveldb, dhruba, yoshinorim Differential Revision: https://reviews.facebook.net/D31941	2015-02-04 15:15:41 -08:00
Venkatesh Radhakrishnan	dad98dd4ae	Changes for supporting cross functional tests for inplace_update Summary: This diff containes the changes to the code and db_test for supporting cross functional tests for inplace_update Test Plan: Run XF with inplace_test and also without Reviewers: igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32367	2015-02-03 12:19:56 -08:00
sdong	9898f63988	Divide test DBIteratorTest.DBIterator to smaller tests Summary: When building on my host, I saw warning: In file included from db/db_iter_test.cc:17:0: db/db_iter_test.cc: In member function â€˜void rocksdb::_Test_DBIterator::_Run()â€™: ./util/testharness.h:147:14: note: variable tracking size limit exceeded with -fvar-tracking-assignments, retrying without void TCONCAT(_Test_,name)::_Run() ^ ./util/testharness.h:134:23: note: in definition of macro â€˜TCONCAT1â€™ #define TCONCAT1(a,b) a##b ^ ./util/testharness.h:147:6: note: in expansion of macro â€˜TCONCATâ€™ void TCONCAT(_Test_,name)::_Run() ^ db/db_iter_test.cc:589:1: note: in expansion of macro â€˜TESTâ€™ TEST(DBIteratorTest, DBIterator) { ^ By dividing the test into small tests, it should fix the problem Test Plan: Run the test Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32679	2015-02-03 09:48:03 -08:00
Venkatesh Radhakrishnan	0b8dec7172	Cross functional test infrastructure for RocksDB. Summary: This Diff provides the implementation of the cross functional test infrastructure. This provides the ability to test a single feature with every existing regression test in order to identify issues with interoperability between features. Test Plan: Reference implementation of inplace update support cross functional test. Able to find interoperability issues with inplace support and ran all of db_test. Will add separate diff for those changes. Reviewers: igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32247	2015-02-02 14:49:22 -08:00
Marko Kevac	86e2a1eeea	Allow creating backups from C	2015-01-31 15:47:49 +03:00
sdong	5917de0bae	CappedFixTransform: return fixed length prefix, or full key if key is shorter than the fixed length Summary: Add CappedFixTransform, which is the same as fixed length prefix extractor, except that when slice is shorter than the fixed length, it will use the full key. Test Plan: Add a test case for db_test options_test and a new test Reviewers: yhchiang, rven, igor Reviewed By: igor Subscribers: MarkCallaghan, leveldb, dhruba, yoshinorim Differential Revision: https://reviews.facebook.net/D31887	2015-01-30 16:04:30 -08:00
Igor Canadi	6c6037f60c	Expose Snapshot's SequenceNumber Summary: Requested here: https://www.facebook.com/groups/rocksdb.dev/permalink/705524519546065/ It might also help with mongo. I don't see a reason why we shouldn't expose this info. Test Plan: make check Reviewers: sdong, yhchiang, rven Reviewed By: rven Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32547	2015-01-29 18:22:43 -08:00
sdong	d07fec3bdc	make DBTest.SharedWriteBuffer to pass MockEnv Summary: DBTest.SharedWriteBuffer uses an Options that doesn't pass CurrentOptions(), so that it doesn't use MockEnv. However, DBTest's constructor uses MockEnv to call DestoryDB() to clean up, causing uncleaned state before it runs. Test Plan: Run the test modified to make sure they pass default Env and SharedWriteBuffer now passes MockEnv. Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D32475	2015-01-28 16:19:27 -08:00
Igor Canadi	4bdf38b16e	Disable FlushSchedule when running TSAN Summary: There's a bug in TSAN (or libstdc++?) with std::shared_ptr<> for some reason. In db_test, only FlushSchedule is affected. See more: https://groups.google.com/forum/#!topic/thread-sanitizer/vz_s-t226Vg With this change and all other @sdong's and mine diffs, our db_test should be TSAN-clean. I'll move to other tests. Test Plan: no more flush schedule when running TSAN Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: sdong, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32469	2015-01-28 15:31:48 -08:00
sdong	10af17f3d7	fault_injection_test: add a unit test to allow parallel compactions and multiple levels Summary: Add a new test case in fault_injection_test, which covers parallel compactions and multiple levels. Use MockEnv to run the new test case to speed it up. Improve MockEnv to avoid DestoryDB(), previously failed when deleting lock files. Test Plan: Run ./fault_injection_test, including valgrind Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D32415	2015-01-28 14:07:25 -08:00
Igor Canadi	560ed402bd	[minor] fprintf to stderr instead of stdout in test	2015-01-27 21:00:33 -08:00
sdong	d2a2b058f0	fault_injection_test: to support file closed after being deleted Summary: fault_injection_test occasionally fails because file closing can happen after deletion. Improve the test to support it. Test Plan: I have a new test case I'm working on, where the issue appears almost every time. With the patch, the problem goes away. Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32373	2015-01-27 17:06:47 -08:00
Yueh-Hsuan Chiang	d6c7300ccf	Fixed a compile warning in clang in db/listener_test.cc Summary: Fixed a compile warning in clang in db/listener_test.cc Test Plan: make listener_test Reviewers: oridb Reviewed By: oridb Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D32337	2015-01-27 15:01:04 -08:00
Ori Bernstein	f9758e0129	Add compaction listener. Summary: This adds a listener for compactions, and gives some useful statistics on each compaction pass. Test Plan: Unit tests. Reviewers: sdong, igor, rven, yhchiang Reviewed By: yhchiang Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D31641	2015-01-27 14:44:02 -08:00
sdong	e919ecedfc	SuperVersion::Unref() to use sequential consistency to decrease ref counting Summary: I'm not sure the expected results of std::atomic::fetch_sub() when using memory_order_relaxed, and I suspect TSAN complains. Test Plan: make all check Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D32259	2015-01-27 14:08:08 -08:00
sdong	1b43ab58d9	fault_injection_test: add more logging and makes synchronization slightly stronger Summary: We see failure of the test in travis but I can't repro it. Add more logging in failure cases to help us figure out which failure it is. Also makes synchronization slightly stronger, though there isn't seem to be a problem without it Test Plan: Run the test Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D32319	2015-01-27 13:53:17 -08:00
sdong	be8f0b12ed	Rename DBImpl::log_dir_unsynced_ to log_dir_synced_ Summary: log_dir_unsynced_ is a confusing name. Rename it to log_dir_synced_ and flip the value. Test Plan: Run ./fault_injection_test Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D32235	2015-01-26 16:01:36 -08:00
sdong	c1de6c42a0	fault_injection_test: add a test case to drop random number of unsynced data Summary: Currently fault_injection_test has a test case to drop all the unsynced data. Add one more case to take a randomized bytes from it. Test Plan: Run the test Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D32229	2015-01-26 15:53:21 -08:00
sdong	d888c95748	Sync WAL Directory and DB Path if different from DB directory Summary: 1. If WAL directory is different from db directory. Sync the directory after creating a log file under it. 2. After creating an SST file, sync its parent directory instead of DB directory. 3. change the check of kResetDeleteUnsyncedFiles in fault_injection_test. Since we changed the behavior to sync log files' parent directory after first WAL sync, instead of creating, kResetDeleteUnsyncedFiles will not guarantee to show post sync updates. Test Plan: make all check Reviewers: yhchiang, rven, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D32067	2015-01-26 14:17:45 -08:00
Igor Canadi	f1c8862479	Fix data race #1 Summary: This is first in a series of diffs that fixes data races detected by thread sanitizer. Here the problem is that we call Ref() on a column family during a single-threaded write, without holding a mutex. Test Plan: TSAN is no longer complaining about LevelLimitReopen. Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32121	2015-01-26 11:48:07 -08:00
Igor Canadi	42189612c3	Fix data race #2 Summary: We should not be calling InternalStats methods outside of the mutex. Test Plan: COMPILE_WITH_TSAN=1 m db_test && ROCKSDB_TESTS=CompactionTrigger ./db_test failing before the diff, works now Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32127	2015-01-23 18:04:39 -08:00
Igor Canadi	f5a8398352	Fix archive WAL race conditions Summary: More race condition bugs with our archive WAL files. I do believe this caused t5988326, but can't reproduce the failure unfortunately. Test Plan: make check Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32103	2015-01-23 17:35:12 -08:00
sdong	43ec4e68ba	fault_injection_test: bring back 3 iteration runs Summary: 3 iterations were disabled by mistake by one recent commit, causing CLANG build error. Fix it Test Plan: USE_CLANG=1 make fault_injection_test and run the test Reviewers: igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D32109	2015-01-23 16:30:43 -08:00
sdong	c2e8e8c1c0	Fix two namings in fault_injection_test.cc Summary: fault_injection_test.cc has two variable names not following the convention fix it. Test Plan: run the test Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D32097	2015-01-23 16:12:48 -08:00
sdong	b4c13a868a	fault_injection_test: improvements and add new tests Summary: Wrapper classes in fault_injection_test doesn't simulate RocksDB Env behavior close enough. Improve it by: (1) when fsync, don't sync parent (2) support directory fsync (3) support multiple directories Add test cases of (1) persisting by WAL fsync, not just compact range (2) different WAL dir (3) combination of (1) and (2) (4) data directory is not the same as db name. Test Plan: Run the test and make sure it passes. Reviewers: rven, yhchiang, igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D32031	2015-01-23 15:50:15 -08:00
Yueh-Hsuan Chiang	46a7048dcd	Reduce false alarm in ThreadStatusMultipleCompaction test	2015-01-22 15:45:02 -08:00
sdong	4e48753b73	Sync manifest file when initializing it Summary: Now we don't sync manifest file when initializing it, so DB cannot be safely reopened before the first mem table flush. Fix it by syncing it. This fixes fault_injection_test. Test Plan: make all check Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32001	2015-01-22 14:32:03 -08:00
Igor Canadi	ae82849bc9	Fix build failure	2015-01-21 18:23:12 -08:00
Igor Canadi	423dee8418	Abort db_bench if Get() returns error Summary: I saw this when running readrandom benchmark with corrupted database -- benchmark worked! If a Get() returns corruption we should probably abort. Test Plan: compiles Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31701	2015-01-21 18:18:15 -08:00
sdong	206237d121	DBImpl::CheckConsistency() shouldn't create path name with double "/" Summary: GetLiveFilesMetaData() already adds a leading "/" in file name. No need to add one extra "/" in DBImpl::CheckConsistency() Test Plan: make all check Reviewers: yhchiang, rven, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D31779	2015-01-21 16:36:13 -08:00
Yueh-Hsuan Chiang	b229f970df	Remove Compaction::ReleaseInputs(). Summary: This patch remove the unnecessary Compaction::ReleaseInputs(). Compaction::ReleaseInputs() tries to unref its input_version and column_family. However, such unref is always done in ~Compaction(), and all current ReleaseInputs() calls are right before the destructor. Test Plan: ./db_test Reviewers: igor Reviewed By: igor Subscribers: igor, rven, dhruba, sdong Differential Revision: https://reviews.facebook.net/D31605	2015-01-15 12:44:19 -08:00
Thomas Dudziak	d10f1de2b4	Ported LevelDB's fault_injection_test Summary: This is a port of [[ https://github.com/google/leveldb/blob/master/db/fault_injection_test.cc \| LevelDB's fault_injection_test ]] to RocksDB. Unfortunately it fails with: ``` ==== Test FaultInjectionTest.FaultTest db/fault_injection_test.cc:491: Corruption: no meta-nextfile entry in descriptor #0 ./fault_injection_test() [0x41477a] rocksdb::FaultInjectionTest::PartialCompactTestReopenWithFault(rocksdb::FaultInjectionTest::ResetMethod, int, int) /data/users/tomdzk/rocksdb/db/fault_injection_test.cc:491 #1 ./fault_injection_test() [0x40a38a] rocksdb::_Test_FaultTest::_Run() /data/users/tomdzk/rocksdb/db/fault_injection_test.cc:517 #2 ./fault_injection_test() [0x415bea] rocksdb::_Test_FaultTest::_RunIt() /data/users/tomdzk/rocksdb/db/fault_injection_test.cc:507 #3 ./fault_injection_test() [0x584367] rocksdb::test::RunAllTests() /data/users/tomdzk/rocksdb/util/testharness.cc:70 #4 /usr/local/fbcode/gcc-4.8.1-glibc-2.17/lib/libc.so.6(__libc_start_main+0x10e) [0x7f7a40857efe] ?? ??:0 #5 ./fault_injection_test() [0x408bb8] _start ??:0 ``` so I commented out the test invocation in the source code for now (lines 514-520) so it can be merged. Test Plan: This is a new test. Reviewers: igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31587	2015-01-15 10:28:10 -08:00
Igor Canadi	9ab5adfc59	New BlockBasedTable version -- better compressed block format Summary: This diff adds BlockBasedTable format_version = 2. New format version brings better compressed block format for these compressions: 1) Zlib -- encode decompressed size in compressed block header 2) BZip2 -- encode decompressed size in compressed block header 3) LZ4 and LZ4HC -- instead of doing memcpy of size_t encode size as varint32. memcpy is very bad because the DB is not portable accross big/little endian machines or even platforms where size_t might be 8 or 4 bytes. It does not affect format for snappy. If you write a new database with format_version = 2, it will not be readable by RocksDB versions before 3.10. DB::Open() will return corruption in that case. Test Plan: Added a new test in db_test. I will also run db_bench and verify VSIZE when block_cache == 1GB Reviewers: yhchiang, rven, MarkCallaghan, dhruba, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31461	2015-01-14 16:24:24 -08:00
sdong	bb128bfec3	More accurate message for compaction applied to a different version Test Plan: Compile. Run it. Reviewers: yhchiang, dhruba, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D31479	2015-01-13 16:48:18 -08:00
Igor Canadi	53f615df6a	Fix clang build	2015-01-13 12:27:28 -08:00
Yueh-Hsuan Chiang	2159484dd6	Remove two unnecessary blank lines in db/db_test.cc	2015-01-13 01:40:11 -08:00
Yueh-Hsuan Chiang	d2c018fd5b	Make ThreadStatusMultipleCompaction more robust.	2015-01-13 01:02:10 -08:00
Yueh-Hsuan Chiang	bf9aa4dfcd	Improve GetThreadStatus to avoid false alarm in some case.	2015-01-13 00:38:09 -08:00
Yueh-Hsuan Chiang	c91cdd59c1	Allow GetThreadList() to indicate a thread is doing Compaction. Summary: Allow GetThreadList() to indicate a thread is doing Compaction. Test Plan: export ROCKSDB_TESTS=ThreadStatus ./db_test Reviewers: ljin, igor, sdong Reviewed By: sdong Subscribers: leveldb, dhruba, jonahcohen, rven Differential Revision: https://reviews.facebook.net/D30105	2015-01-13 00:04:08 -08:00
Igor Canadi	15d2abbec3	Fix build issues	2015-01-09 13:04:06 -08:00
Igor Canadi	abb9b95ffe	Move compression functions from port/ to util/ Summary: We keep checksum functions in util/, there is no reason for compression to be in port/ Test Plan: compiles Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31281	2015-01-09 12:57:11 -08:00
sdong	9132e52ea4	DB Stats Dump to print total stall time Summary: Add printing of stall time in DB Stats: Sample outputs: DB Stats Uptime(secs): 53.2 total, 1.7 interval Cumulative writes: 625940 writes, 625939 keys, 625940 batches, 1.0 writes per batch, 0.49 GB user ingest, stall micros: 50691070 Cumulative WAL: 625940 writes, 625939 syncs, 1.00 writes per sync, 0.49 GB written Interval writes: 10859 writes, 10859 keys, 10859 batches, 1.0 writes per batch, 8.7 MB user ingest, stall micros: 1692319 Interval WAL: 10859 writes, 10859 syncs, 1.00 writes per sync, 0.01 MB written Test Plan: make all check verify printing using db_bench Reviewers: igor, yhchiang, rven, MarkCallaghan Reviewed By: MarkCallaghan Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D31239	2015-01-09 11:44:19 -08:00
Ameya Gupte	242b9769c3	Memtablerep Benchmark Summary: Create a benchmark for testing memtablereps. This diff is a bit rough, but it should do the trick until other bootcampers can clean it up. Addressing comments Removed the mutexes Changed ReadWriteBenchmark to fix number of reads and count the number of writes we can perform in that time. Test Plan: Run it. Below runs pass ./memtablerep_bench --benchmarks fillrandom,readrandom --memtablerep skiplist ./memtablerep_bench --benchmarks fillseq,readseq --memtablerep skiplist ./memtablerep_bench --benchmarks readwrite,seqreadwrite --memtablerep skiplist --num_operations 200 --num_threads 5 ./memtablerep_bench --benchmarks fillrandom,readrandom --memtablerep hashskiplist ./memtablerep_bench --benchmarks fillseq,readseq --memtablerep hashskiplist --num_scans 2 ./memtablerep_bench --benchmarks fillseq,readseq --memtablerep vector Reviewers: jpaton, ikabiljo, sdong Reviewed By: sdong Subscribers: dhruba, ameyag Differential Revision: https://reviews.facebook.net/D22683	2015-01-07 15:15:30 -08:00
sdong	73ee4febab	Add comments about properties supported by DB::GetProperty() and DB::GetIntProperty() Summary: Add comments in db.h to help users discover their options. Test Plan: Compile Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: MarkCallaghan, yoshinorim, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31077	2015-01-07 15:09:35 -08:00
sdong	9ef59a09a5	VersionSet::AddLiveFiles() to assert current version is included. Summary: Add an extra assert to make sure current version is included in VersionSet::AddLiveFiles(). Test Plan: make all check Reviewers: yhchiang, rven, igor Reviewed By: igor Subscribers: dhruba, hermanlee4, leveldb Differential Revision: https://reviews.facebook.net/D30819	2015-01-07 12:03:40 -08:00
sdong	4d16a9a633	VersionBuilder to optimize for applying a later edit deleting files added by previous edits Summary: During recovery, VersionBuilder::Apply() was called multiple times. If the DB is open for long enough, most of files added earlier will be deleted by later deletes. In current solution, sorting added file happens first and then deletes are applied. In this patch, deletes are applied when possible inside Apply(), which can significantly reduce the sorting time in some cases. Test Plan: Add unit tests in version_builder valgrind_check Open a manifest of 50MB, with 9K live files. The manifest read time reduced from 1.6 seconds to 0.7 seconds. Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D30765	2015-01-07 10:36:58 -08:00
Igor Canadi	7731d51c82	Simplify column family concurrency Summary: This patch changes concurrency guarantees around ColumnFamilySet::column_families_ and ColumnFamilySet::column_families_data_. Before: * When mutating: lock DB mutex and spin lock * When reading: lock DB mutex OR spin lock After: * When mutating: lock DB mutex and be in write thread * When reading: lock DB mutex or be in write thread That way, we eliminate the spin lock that protects these hash maps and simplify concurrency. That means we don't need to lock the spin lock during writing, since writing is mutually exclusive with column family create/drop (the only operations that mutate those hash maps). With these new restrictions, I also needed to move column family create to the write thread (column family drop was already in the write thread). Even though we don't need to lock the spin lock during write, impact on performance should be minimal -- the spin lock is almost never busy, so locking it is almost free. This addresses task t5116919. Test Plan: make check Stress test with lots and lots of column family drop and create: time ./db_stress --threads=30 --ops_per_thread=5000000 --max_key=5000 --column_families=200 --clear_column_family_one_in=100000 --verify_before_write=0 --reopen=15 --max_background_compactions=10 --max_background_flushes=10 --db=/fast-rocksdb-tmp/db_stress/ Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D30651	2015-01-06 12:44:21 -08:00
Igor Canadi	07aa4e0e35	Fix compaction summary log for trivial move Summary: When trivial move commit is done, we log the summary of the input version instead of current. This is inconsistent with other log messages and confusing. Test Plan: compiles Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D30939	2015-01-05 17:32:49 -08:00
Leonidas Galanis	9d5bd411be	benchmark.sh won't run through all tests properly if one specifies wal_dir to be different than db directory. Summary: A command line like this to run all the tests: source benchmark.config.sh && nohup ./benchmark.sh 'bulkload,fillseq,overwrite,filluniquerandom,readrandom,readwhilewriting' where benchmark.config.sh is: export DB_DIR=/data/mysql/rocksdata export WAL_DIR=/txlogs/rockswal export OUTPUT_DIR=/root/rocks_benchmarking/output Will fail for the tests that need a new DB . Also 1) set disable_data_sync=0 and 2) add debug mode to run through all the tests more quickly Test Plan: run ./benchmark.sh 'debug,bulkload,fillseq,overwrite,filluniquerandom,readrandom,readwhilewriting' and verify that there are no complaints about WAL dir not being empty. Reviewers: sdong, yhchiang, rven, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D30909	2015-01-05 15:36:47 -08:00
Igor Canadi	62ad0a9b19	Deprecating skip_log_error_on_recovery Summary: Since https://reviews.facebook.net/D16119, we ignore partial tailing writes. Because of that, we no longer need skip_log_error_on_recovery. The documentation says "Skip log corruption error on recovery (If client is ok with losing most recent changes)", while the option actually ignores any corruption of the WAL (not only just the most recent changes). This is very dangerous and can lead to DB inconsistencies. This was originally set up to ignore partial tailing writes, which we now do automatically (after D16119). I have digged up old task t2416297 which confirms my findings. Test Plan: There was actually no tests that verified correct behavior of skip_log_error_on_recovery. Reviewers: yhchiang, rven, dhruba, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D30603	2015-01-05 13:35:56 -08:00
Igor Canadi	fa0b126c0c	Fix corruption_test -- if status is not OK, return status -- during recovery	2015-01-05 10:49:41 -08:00
Igor Canadi	d7b4bb62a7	Fail DB::Open() on WAL corruption Summary: This is a serious bug. If paranod_check == true and WAL is corrupted, we don't fail DB::Open(). I tried going into history and it seems we've been doing this for a long long time. I found this when investigating t5852041. Test Plan: Added unit test to verify correct behavior. Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D30597	2015-01-05 10:26:34 -08:00
sdong	e9ca358157	Fix CLANG build for db_bench Summary: CLANG was broken for a recent change in db_ench. Fix it. Test Plan: Build db_bench using CLANG. Reviewers: rven, igor, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D30801	2014-12-30 18:33:49 -08:00
Yueh-Hsuan Chiang	bf287b76e0	Add structures for exposing thread events and operations. Summary: Add structures for exposing events and operations. Event describes high-level action about a thread such as doing compaciton or doing flush, while an operation describes lower-level action of a thread such as reading / writing a SST table, waiting for mutex. Events and operations are designed to be independent. One thread would typically involve in one event and one operation. Code instrument will be in a separate diff. Test Plan: Add unit-tests in thread_list_test make dbg -j32 ./thread_list_test export ROCKSDB_TESTS=ThreadList ./db_test Reviewers: ljin, igor, sdong Reviewed By: sdong Subscribers: rven, jonahcohen, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D29781	2014-12-30 10:39:13 -08:00
sdong	a801c1fb09	db_bench --num_hot_column_families to be default off Summary: Having --num_hot_column_families default on fails some existing regression tests. By default turn it off Test Plan: Run db_bench to make sure it is default off. Reviewers: yhchiang, rven, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D30705	2014-12-24 09:00:23 -08:00
sdong	ddc81440d5	db_bench to add an option as number of hot column families to add to Summary: Add option --num_hot_column_families in db_bench. If it is set, write options will first write to that number of column families, and then move on to next set of hot column families. The working set of column families can be smaller than total number of CFs. It is to test how RocksDB can handle cold column families Test Plan: Run db_bench with --num_hot_column_families set and not set. Reviewers: yhchiang, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D30663	2014-12-23 16:52:05 -08:00
Yueh-Hsuan Chiang	a944afd356	Fixed a compile error in db/db_impl.cc on ROCKSDB_LITE	2014-12-23 16:19:40 -08:00
Chris BeHanna	d232cb156b	Fix the build with -DNDEBUG. Dike out the body of VerifyCompactionResult. With assert() compiled out, the loop index variable in the inner loop was unused, breaking the build when -Werror is enabled.	2014-12-22 17:06:18 -06:00
Yueh-Hsuan Chiang	45bab305f9	Move GetThreadList() feature under Env. Summary: GetThreadList() feature depends on the thread creation and destruction, which is currently handled under Env. This patch moves GetThreadList() feature under Env to better manage the dependency of GetThreadList() feature on thread creation and destruction. Renamed ThreadStatusImpl to ThreadStatusUpdater. Add ThreadStatusUtil, which is a static class contains utility functions for ThreadStatusUpdater. Test Plan: run db_test, thread_list_test and db_bench and verify the life cycle of Env and ThreadStatusUpdater is properly managed. Reviewers: igor, sdong Reviewed By: sdong Subscribers: ljin, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D30057	2014-12-22 12:20:17 -08:00
Igor Canadi	4fd26f287c	Only execute flush from compaction if max_background_flushes = 0 Summary: As title. We shouldn't need to execute flush from compaction if there are dedicated threads doing flushes. Test Plan: make check Reviewers: rven, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D30579	2014-12-22 12:05:14 +01:00
Igor Canadi	0acc738810	Speed up FindObsoleteFiles() Summary: There are two versions of FindObsoleteFiles(): * full scan, which is executed every 6 hours (and it's terribly slow) * no full scan, which is executed every time a background process finishes and iterator is deleted This diff is optimizing the second case (no full scan). Here's what we do before the diff: * Get the list of obsolete files (files with ref==0). Some files in obsolete_files set might actually be live. * Get the list of live files to avoid deleting files that are live. * Delete files that are in obsolete_files and not in live_files. After this diff: * The only files with ref==0 that are still live are files that have been part of move compaction. Don't include moved files in obsolete_files. * Get the list of obsolete files (which exclude moved files). * No need to get the list of live files, since all files in obsolete_files need to be deleted. I'll post the benchmark results, but you can get the feel of it here: https://reviews.facebook.net/D30123 This depends on D30123. P.S. We should do full scan only in failure scenarios, not every 6 hours. I'll do this in a follow-up diff. Test Plan: One new unit test. Made sure that unit test fails if we don't have a `if (!f->moved)` safeguard in ~Version. make check Big number of compactions and flushes: ./db_stress --threads=30 --ops_per_thread=20000000 --max_key=10000 --column_families=20 --clear_column_family_one_in=10000000 --verify_before_write=0 --reopen=15 --max_background_compactions=10 --max_background_flushes=10 --db=/fast-rocksdb-tmp/db_stress --prefixpercent=0 --iterpercent=0 --writepercent=75 --db_write_buffer_size=2000000 Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D30249	2014-12-22 12:04:45 +01:00
Igor Canadi	f8999fcf31	Fix a SIGSEGV in BackgroundFlush Summary: This one wasn't easy to find :) What happens is we go through all cfds on flush_queue_ and find no cfds to flush, but the cfd is set to the last CF we looped through and following code assumes we want it flushed. BTW @sdong do you think we should also make BackgroundFlush() only check a single cfd for flushing instead of doing this `while (!flush_queue_.empty())`? Test Plan: regression test no longer fails Reviewers: sdong, rven, yhchiang Reviewed By: yhchiang Subscribers: sdong, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D30591	2014-12-21 00:23:28 -08:00
Igor Canadi	fdb6be4e24	Rewritten system for scheduling background work Summary: When scaling to higher number of column families, the worst bottleneck was MaybeScheduleFlushOrCompaction(), which did a for loop over all column families while holding a mutex. This patch addresses the issue. The approach is similar to our earlier efforts: instead of a pull-model, where we do something for every column family, we can do a push-based model -- when we detect that column family is ready to be flushed/compacted, we add it to the flush_queue_/compaction_queue_. That way we don't need to loop over every column family in MaybeScheduleFlushOrCompaction. Here are the performance results: Command: ./db_bench --write_buffer_size=268435456 --db_write_buffer_size=268435456 --db=/fast-rocksdb-tmp/rocks_lots_of_cf --use_existing_db=0 --open_files=55000 --statistics=1 --histogram=1 --disable_data_sync=1 --max_write_buffer_number=2 --sync=0 --benchmarks=fillrandom --threads=16 --num_column_families=5000 --disable_wal=1 --max_background_flushes=16 --max_background_compactions=16 --level0_file_num_compaction_trigger=2 --level0_slowdown_writes_trigger=2 --level0_stop_writes_trigger=3 --hard_rate_limit=1 --num=33333333 --writes=33333333 Before the patch: fillrandom : 26.950 micros/op 37105 ops/sec; 4.1 MB/s After the patch: fillrandom : 17.404 micros/op 57456 ops/sec; 6.4 MB/s Next bottleneck is VersionSet::AddLiveFiles, which is painfully slow when we have a lot of files. This is coming in the next patch, but when I removed that code, here's what I got: fillrandom : 7.590 micros/op 131758 ops/sec; 14.6 MB/s Test Plan: make check two stress tests: Big number of compactions and flushes: ./db_stress --threads=30 --ops_per_thread=20000000 --max_key=10000 --column_families=20 --clear_column_family_one_in=10000000 --verify_before_write=0 --reopen=15 --max_background_compactions=10 --max_background_flushes=10 --db=/fast-rocksdb-tmp/db_stress --prefixpercent=0 --iterpercent=0 --writepercent=75 --db_write_buffer_size=2000000 max_background_flushes=0, to verify that this case also works correctly ./db_stress --threads=30 --ops_per_thread=2000000 --max_key=10000 --column_families=20 --clear_column_family_one_in=10000000 --verify_before_write=0 --reopen=3 --max_background_compactions=3 --max_background_flushes=0 --db=/fast-rocksdb-tmp/db_stress --prefixpercent=0 --iterpercent=0 --writepercent=75 --db_write_buffer_size=2000000 Reviewers: ljin, rven, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D30123	2014-12-19 20:38:12 +01:00
Yueh-Hsuan Chiang	25f70a5abb	Avoid unnecessary unlock and lock mutex when notifying events. Summary: Avoid unnecessary unlock and lock mutex when notifying events. Test Plan: ./listener_test Reviewers: igor Reviewed By: igor Subscribers: sdong, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D30267	2014-12-16 17:10:23 -08:00
Venkatesh Radhakrishnan	7661e5a76e	Move the file copy out of the mutex. Summary: We now release the mutex before copying the files in the case of the trivial move. This path does not use the compaction job. Test Plan: DBTest.LevelCompactionThirdPath Reviewers: yhchiang, igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D30381	2014-12-16 16:57:22 -08:00
Venkatesh Radhakrishnan	153f4f0719	RocksDB: Allow Level-Style Compaction to Place Files in Different Paths Summary: Allow Level-style compaction to place files in different paths This diff provides the code for task 4854591. We now support level-compaction to place files in different paths by specifying them in db_paths along with the minimum level for files to store in that path. Test Plan: ManualLevelCompactionOutputPathId in db_test.cc Reviewers: yhchiang, MarkCallaghan, dhruba, yoshinorim, sdong Reviewed By: sdong Subscribers: yoshinorim, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D29799	2014-12-15 21:48:16 -08:00
sdong	7ab1526c0e	Add an assert and avoid std::sort(autovector) to investigate an ASAN issue Summary: ASAN build fails once for this error: 14:04:52 ==== Test DBTest.CompactFilesOnLevelCompaction 14:04:52 db_test: db/version_set.cc:1062: void rocksdb::VersionStorageInfo::AddFile(int, rocksdb::FileMetaData): Assertion `level <= 0 \|\| level_files->empty() \|\| internal_comparator_->Compare( (level_files)[level_files->size() - 1]->largest, f->smallest) < 0' failed. Not abling figure out reason. We use std:vector for sorting for save and add one more assert to help figure out whether it is the sorting's problem. Test Plan: make all check Reviewers: yhchiang, rven, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D30117	2014-12-12 12:44:00 -08:00
sdong	d7a486668c	Improve scalability of DB::GetSnapshot() Summary: Now DB::GetSnapshot() doesn't scale to more column families, as it needs to go through all the column families to find whether snapshot is supported. This patch optimizes it. Test Plan: Add unit tests to cover negative cases. make all check Reviewers: yhchiang, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D30093	2014-12-11 13:27:57 -08:00
sdong	0ab0242f37	VersionBuilder to use unordered set and map to store added and deleted files Summary: Set operations in VerisonBuilder is shown as a performance bottleneck of restarting DB when there are lots of files. Make both of added_files and deleted_files use unordered set or map. Only when adding the files, sort the added files. Test Plan: make all check Reviewers: yhchiang, rven, igor Reviewed By: igor Subscribers: hermanlee4, leveldb, dhruba, ljin Differential Revision: https://reviews.facebook.net/D30051	2014-12-10 18:53:30 -08:00
sdong	046ba7d47c	Fix calculation of max_total_wal_size in db_options_.max_total_wal_size == 0 case Summary: This is a regression bug introduced by https://reviews.facebook.net/D24729 . max_total_wal_size would be off the target it should be more and more in the case that the a user holds the current super version after flush or compaction. This patch fixes it Test Plan: make all check Reviewers: yhchiang, rven, igor Reviewed By: igor Subscribers: ljin, yoshinorim, MarkCallaghan, hermanlee4, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D29961	2014-12-08 15:26:35 -08:00
Leonidas Galanis	635c61fd3b	Fix problem with create_if_missing option when wal_dir is used Summary: When wal_dir is used, DestroyDB is not passed the wal_dir option and so we get a Corruption exception. Test Plan: Verified manually that the following command line works now: ./db_bench --db=/mnt/db/rocksdb ... --disable_wal=0 --wal_dir=/data/users/rocksdb/WAL... --benchmarks=filluniquerandom --use_existing_db=0... Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D29859	2014-12-08 12:53:24 -08:00
sdong	1f04066cab	Add DBProperty to return number of snapshots and time for oldest snapshot Summary: Add a counter in SnapshotList to show number of snapshots. Also a unix timestamp in every snapshot. Add two DB Properties to return number of snapshots and timestamp of the oldest one. Test Plan: Add unit test checking Reviewers: yhchiang, rven, igor Reviewed By: igor Subscribers: leveldb, dhruba, MarkCallaghan Differential Revision: https://reviews.facebook.net/D29919	2014-12-05 17:07:49 -08:00
Yueh-Hsuan Chiang	a94d54aa47	Remove the use of exception in WriteBatch::Handler Summary: Remove the use of exception in WriteBatch::Handler. Now the default implementations of Put, Merge, and Delete in WriteBatch::Handler are no-op. Test Plan: Add three test cases in write_batch_test ./write_batch_test Reviewers: sdong, igor Reviewed By: sdong, igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D29835	2014-12-04 12:01:55 -08:00
Yueh-Hsuan Chiang	5f719d7202	Replace exception by setting valid_ = false in DBIter::MergeValuesNewToOld() Summary: Replace exception by setting valid_ = false in DBIter::MergeValuesNewToOld(). Test Plan: Not sure if I am right at this, but it seems we currently don't have a good way to test that code path as it requires dynamically set merge_operator = nullptr at the time while Merge() is calling. Reviewers: igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D29811	2014-12-04 11:11:11 -08:00
Mark Callaghan	32a0a03844	Add Moved(GB) to Compaction IO stats Summary: Adds counter for bytes moved (files pushed down a level rather than compacted) to compaction IO stats as Moved(GB). From the output removed these infrequently used columns: RW-Amp, Rn(cnt), Rnp1(cnt), Wnp1(cnt), Wnew(cnt). Example old output: Level Files Size(MB) Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) RW-Amp W-Amp Rd(MB/s) Wr(MB/s) Rn(cnt) Rnp1(cnt) Wnp1(cnt) Wnew(cnt) Comp(sec) Comp(cnt) Avg(sec) Stall(sec) Stall(cnt) Avg(ms) RecordIn RecordDrop ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ L0 0/0 0 0.0 0.0 0.0 0.0 2130.8 2130.8 0.0 0.0 0.0 109.1 0 0 0 0 20002 25068 0.798 28.75 182059 0.16 0 0 L1 142/0 509 1.0 4618.5 2036.5 2582.0 4602.1 2020.2 4.5 2.3 88.5 88.1 24220 701246 1215528 514282 53466 4229 12.643 0.00 0 0.002032745988 300688729 Example new output: Level Files Size(MB) Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) Stall(sec) Stall(cnt) Avg(ms) RecordIn RecordDrop -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- L0 7/0 13 1.8 0.0 0.0 0.0 0.6 0.6 0.0 0.0 0.0 14.7 44 353 0.124 0.03 626 0.05 0 0 L1 9/0 16 1.6 0.0 0.0 0.0 0.0 0.0 0.6 0.0 0.0 0.0 0 0 0.000 0.00 0 0.00 0 0 Task ID: # Blame Rev: Test Plan: make check, run db_bench --fillseq --stats_per_interval --stats_interval and look at output Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D29787	2014-12-03 18:28:39 -08:00
Jonah Cohen	a14b7873ee	Enforce write buffer memory limit across column families Summary: Introduces a new class for managing write buffer memory across column families. We supplement ColumnFamilyOptions::write_buffer_size with ColumnFamilyOptions::write_buffer, a shared pointer to a WriteBuffer instance that enforces memory limits before flushing out to disk. Test Plan: Added SharedWriteBuffer unit test to db_test.cc Reviewers: sdong, rven, ljin, igor Reviewed By: igor Subscribers: tnovak, yhchiang, dhruba, xjin, MarkCallaghan, yoshinorim Differential Revision: https://reviews.facebook.net/D22581	2014-12-02 12:09:20 -08:00
Igor Canadi	37d73d597e	Fix linters Summary: Two fixes: 1. if cpplint is not present on the system, don't return a confusing error in the linter 2. Add include_alpha, which means our includes should be sorted lexicographically Test Plan: Tried unsorting our includes, lint complained Reviewers: rven, ljin, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D28845	2014-12-02 13:53:39 -05:00

1 2 3 4 5 ...

1633 Commits