rocksdb

Author	SHA1	Message	Date
Igor Canadi	d59d90bb1f	db_bench periodically writes QPS to CSV file Summary: This is part of an effort to better understand and optimize RocksDB stalls under high load. I added a feature to db_bench to periodically write QPS to CSV files. That way we can nicely see how our QPS changes in time (especially when DB is stalled) and can do a better job of evaluating our stall system (i.e. we want the QPS to be as constant as possible, as opposed to having bunch of stalls) Cool part of CSV files is that we can easily graph them -- there are a bunch of tools available. Test Plan: Ran ./db_bench --report_interval_seconds=10 --benchmarks=fillrandom --num=10000000 and observed this in report.csv: secs_elapsed,interval_qps 10,2725860 20,1980480 30,1863456 40,1454359 50,1460389 Reviewers: sdong, MarkCallaghan, rven, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D40047	2015-06-12 14:31:53 -07:00
sdong	7842920be5	Slow down writes by bytes written Summary: We slow down data into the database to the rate of options.delayed_write_rate (a new option) with this patch. The thread synchronization approach I take is to still synchronize write controller by DB mutex and GetDelay() is inside DB mutex. Try to minimize the frequency of getting time in GetDelay(). I verified it through db_bench and it seems to work hard_rate_limit is deprecated. options.delayed_write_rate is still not dynamically changeable. Need to work on it as a follow-up. Test Plan: Add new unit tests in db_test Reviewers: yhchiang, rven, kradhakrishnan, anthony, MarkCallaghan, igor Reviewed By: igor Subscribers: ikabiljo, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D36351	2015-06-11 20:42:18 -07:00
sdong	e409d3d745	Make "make all" work for CYGWIN Summary: Some test and benchmark codes don't build for CYGWIN. Fix it. Test Plan: Build "make all" with TARGET_OS=Cygwin on cygwin and make sure it passes. Reviewers: rven, yhchiang, anthony, igor, kradhakrishnan Reviewed By: igor, kradhakrishnan Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D39711	2015-06-09 16:36:07 -07:00
Igor Canadi	4c181f08bc	Fix compile on darwin Summary: As title Test Plan: make check Reviewers: anthony Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39243	2015-05-30 12:25:45 -04:00
agiardullo	dc9d70de65	Optimistic Transactions Summary: Optimistic transactions supporting begin/commit/rollback semantics. Currently relies on checking the memtable to determine if there are any collisions at commit time. Not yet implemented would be a way of enuring the memtable has some minimum amount of history so that we won't fail to commit when the memtable is empty. You should probably start with transaction.h to get an overview of what is currently supported. Test Plan: Added a new test, but still need to look into stress testing. Reviewers: yhchiang, igor, rven, sdong Reviewed By: sdong Subscribers: adamretter, MarkCallaghan, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D33435	2015-05-29 14:36:35 -07:00
agiardullo	c815351038	Support saving history in memtable_list Summary: For transactions, we are using the memtables to validate that there are no write conflicts. But after flushing, we don't have any memtables, and transactions could fail to commit. So we want to someone keep around some extra history to use for conflict checking. In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit. After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure). It seems like the best place for this is abstracted inside the memtable_list. I decide to create a separate list in MemtableListVersion as using the same list complicated the flush/installalflushresults logic too much. This diff adds a new parameter to control how much memtable history to keep around after flushing. However, it sounds like people aren't too fond of adding new parameters. So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers. This should not change the maximum amount of memory used, but make it more likely we're using closer the the limit. (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached). So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit). However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions. Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests. Added testing in memtablelist_test and planning on adding more testing here. Reviewers: sdong, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37443	2015-05-28 16:34:24 -07:00
Igor Canadi	7a3577519f	Don't artificially inflate L0 score Summary: This turns out to be pretty bad because if we prioritize L0->L1 then L1 can grow artificially large, which makes L0->L1 more and more expensive. For example: 256MB @ L0 + 256MB @ L1 --> 512MB @ L1 256MB @ L0 + 512MB @ L1 --> 768MB @ L1 256MB @ L0 + 768MB @ L1 --> 1GB @ L1 .... 256MB @ L0 + 10GB @ L1 --> 10.2GB @ L1 At some point we need to start compacting L1->L2 to speed up L0->L1. Test Plan: The performance improvement is massive for heavy write workload. This is the benchmark I ran: https://phabricator.fb.com/P19842671. Before this change, the benchmark took 47 minutes to complete. After, the benchmark finished in 2minutes. You can see full results here: https://phabricator.fb.com/P19842674 Also, we ran this diff on MongoDB on RocksDB on one replicaset. Before the change, our initial sync was so slow that it couldn't keep up with primary writes. After the change, the import finished without any issues Reviewers: dynamike, MarkCallaghan, rven, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38637	2015-05-21 11:40:48 -07:00
Mark Callaghan	944043d683	Add --wal_bytes_per_sync for db_bench and more IO stats Summary: See https://gist.github.com/mdcallag/89ebb2b8cbd331854865 for the IO stats. I added "Cumulative compaction:" and "Interval compaction:" lines. The IO rates can be confusing. Rates fro per-level stats lines, Wr(MB/s) & Rd(MB/s), are computed using the duration of the compaction job. If the job reads 10MB, writes 9MB and the job (IO & merging) takes 1 second then the rates are 10MB/s for read and 9MB/s for writes. The IO rates in the Cumulative compaction line uses the total uptime. The IO rates in the Interval compaction line uses the interval uptime. So these Cumalative & Interval compaction IO rates cannot be compared to the per-level IO rates. But both forms of the rates are useful for debugging perf. Task ID: # Blame Rev: Test Plan: run db_bench Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D38667	2015-05-19 16:19:30 -07:00
sdong	bc68bd5a13	db_bench to support rate limiter Summary: Add --rate_limiter_bytes_per_sec to db_bench to allow rater limit to disk Test Plan: Run ./db_bench --benchmarks=fillseq --num=30000000 --rate_limiter_bytes_per_sec=3000000 --num_multi_db=8 -disable_wal And see io_stats to have the rate limited. Reviewers: yhchiang, rven, anthony, kradhakrishnan, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D38385	2015-05-13 10:03:41 -07:00
Yueh-Hsuan Chiang	77a5a543a5	Allow GetThreadList() to report basic compaction operation properties. Summary: Now we're able to show more details about a compaction in GetThreadList() :) This patch allows GetThreadList() to report basic compaction operation properties. Basic compaction properties include: 1. job id 2. compaction input / output level 3. compaction property flags (is_manual, is_deletion, .. etc) 4. total input bytes 5. the number of bytes has been read currently. 6. the number of bytes has been written currently. Flush operation properties will be done in a seperate diff. Test Plan: /db_bench --threads=30 --num=1000000 --benchmarks=fillrandom --thread_status_per_interval=1 Sample output of tracking same job: ThreadID ThreadType cfName Operation ElapsedTime Stage State OperationProperties 140664171987072 Low Pri default Compaction 31.357 ms CompactionJob::FinishCompactionOutputFile BaseInputLevel 1 \| BytesRead 2264663 \| BytesWritten 1934241 \| IsDeletion 0 \| IsManual 0 \| IsTrivialMove 0 \| JobID 277 \| OutputLevel 2 \| TotalInputBytes 3964158 \| ThreadID ThreadType cfName Operation ElapsedTime Stage State OperationProperties 140664171987072 Low Pri default Compaction 59.440 ms CompactionJob::FinishCompactionOutputFile BaseInputLevel 1 \| BytesRead 2264663 \| BytesWritten 1934241 \| IsDeletion 0 \| IsManual 0 \| IsTrivialMove 0 \| JobID 277 \| OutputLevel 2 \| TotalInputBytes 3964158 \| ThreadID ThreadType cfName Operation ElapsedTime Stage State OperationProperties 140664171987072 Low Pri default Compaction 226.375 ms CompactionJob::Install BaseInputLevel 1 \| BytesRead 3958013 \| BytesWritten 3621940 \| IsDeletion 0 \| IsManual 0 \| IsTrivialMove 0 \| JobID 277 \| OutputLevel 2 \| TotalInputBytes 3964158 \| Reviewers: sdong, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37653	2015-05-06 22:51:06 -07:00
Mark Callaghan	b6b100fe04	Remove iter_refresh_interval_us Summary: The default, use one iter for the whole test, isn't good. This cost me a few hours of debugging and a few days of tessting. For readonly that isn't realistic and for read-write that keeps a lot of old sst files around. I remove the option because nothing uses it and not calling gettimeofday per loop iteration adds about 3% to QPS at 20 threads. Task ID: # Blame Rev: Test Plan: run db_bench Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D37965	2015-05-01 14:17:45 -07:00
Mark Callaghan	ed229a0dee	Fixes for readcache-flashcache Summary: This fixes two problems: 1) the env should not be created twice when use_existing_db is false 2) the env dtor should run before cachedev_fd_ is closed. Task ID: # Blame Rev: Test Plan: Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D36795	2015-04-09 15:51:34 -07:00
Yoshinori Matsunobu	f12614070f	Fix TSAN build error of D36447 Summary: D36447 caused build error when using COMPILE_WITH_TSAN=1. This diff fixes the error. Test Plan: jenkins Reviewers: igor, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D36579	2015-04-06 17:37:36 -07:00
Yoshinori Matsunobu	824e646341	Adding another NewFlashcacheAwareEnv function to support pre-opened fd Summary: There are some cases when flachcache file descriptor was already allocated (i.e. fb-MySQL). Then NewFlashcacheAwareEnv returns an error at open() because fd was already assigned. This diff adds another function to instantiate FlashcacheAwareEnv, with pre-allocated fd cachedev_fd. Test Plan: Tested with MyRocks using this function, then worked Reviewers: sdong, igor Reviewed By: igor Subscribers: dhruba, MarkCallaghan, rven Differential Revision: https://reviews.facebook.net/D36447	2015-04-06 16:50:36 -07:00
Mark Callaghan	1bd70fb54a	Add --stats_interval_seconds to db_bench Summary: The --stats_interval_seconds determines interval for stats reporting and overrides --stats_interval when set. I also changed tools/benchmark.sh to report stats every 60 seconds so I can avoid trying to figure out a good value for --stats_interval per test and per storage device. Task ID: #6631621 Blame Rev: Test Plan: run tools/run_flash_bench, look at output Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D36189	2015-03-30 12:58:32 -07:00
Mark Callaghan	99ec2412e5	Make the benchmark scripts configurable and add tests Summary: This makes run_flash_bench.sh configurable. Previously it was hardwired for 1B keys and tests ran for 12 hours each. That kept me from using it. This makes it configuable, adds more tests, makes the duration per-test configurable and refactors the test scripts. Adds the seekrandomwhilemerging test to db_bench which is the same as seekrandomwhilewriting except the writer thread does Merge rather than Put. Forces the stall-time column in compaction IO stats to use a fixed format (H:M:S) which makes it easier to scrape and parse. Also adds an option to AppendHumanMicros to force a fixed format. Sometimes automation and humans want different format. Calls thread->stats.AddBytes(bytes); in db_bench for more tests to get the MB/sec summary stats in the output at test end. Adds the average ingest rate to compaction IO stats. Output now looks like: https://gist.github.com/mdcallag/2bd64d18be1b93adc494 More information on the benchmark output is at https://gist.github.com/mdcallag/db43a58bd5ac624f01e1 For benchmark.sh changes default RocksDB configuration to reduce stalls: * min_level_to_compress from 2 to 3 * hard_rate_limit from 2 to 3 * max_grandparent_overlap_factor and max_bytes_for_level_multiplier from 10 to 8 * L0 file count triggers from 4,8,12 to 4,12,20 for (start,stall,stop) Task ID: #6596829 Blame Rev: Test Plan: run tools/run_flash_bench.sh Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D36075	2015-03-30 11:28:25 -07:00
Igor Canadi	d61cb0b9de	db_bench can now disable flashcache for background threads Summary: Most of the approach is copied from WebSQL's MySQL branch. It's nice that we can do this without touching core RocksDB code. Test Plan: Compiles and runs. Didn't test flashback code, as I don't have flashback device and most if it is c/p Reviewers: MarkCallaghan, sdong Reviewed By: sdong Subscribers: rven, lgalanis, kradhakrishnan, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35391	2015-03-30 09:51:11 -07:00
Alexander.Mikhaylov	a3e4b32483	fix compilation error (same as fix #284 ) [maa@srv2-nskb-devg2 rocksdb-master]$ CXX=/usr/local/CC/gcc-4.7.4/bin/g++ EXTRA_CXXFLAGS=-std=c++11 DISABLE_WARNING_AS_ERROR=1 make db_bench CC db/db_bench.o db/db_bench.cc: In member function 'rocksdb::Slice rocksdb::Benchmark::AllocateKey(std::unique_ptr<const char []>)': db/db_bench.cc:1434:41: error: use of deleted function 'void std::unique_ptr<_Tp [], _Dp>::reset(_Up) [with _Up = char; _Tp = const char; _Dp = std::default_delete<const char []>]' In file included from /usr/local/CC/gcc-4.7.4/lib/gcc/x86_64-unknown-linux-gnu/4.7.4/../../../../include/c++/4.7.4/memory:86:0, from ./include/rocksdb/db.h:14, from ./db/dbformat.h:14, from ./db/db_impl.h:21, from db/db_bench.cc:33:	2015-03-26 14:53:42 +06:00
Yueh-Hsuan Chiang	248c063ba1	Report elapsed time in micros in ThreadStatus instead of start time. Summary: Report elapsed time of a thread operation in micros in ThreadStatus instead of start time of a thread operation in seconds since the Epoch, 1970-01-01 00:00:00 (UTC). Test Plan: ./db_bench --benchmarks=fillrandom --num=100000 --threads=40 \ --max_background_compactions=10 --max_background_flushes=3 \ --thread_status_per_interval=1000 --key_size=16 --value_size=1000 \ --num_column_families=10 Sample Output: ThreadID ThreadType cfName Operation ElapsedTime Stage State 140667724562496 High Pri column_family_name_000002 Flush 772.419 ms FlushJob::WriteLevel0Table 140667728756800 High Pri default Flush 617.845 ms FlushJob::WriteLevel0Table 140667732951104 High Pri column_family_name_000005 Flush 772.078 ms FlushJob::WriteLevel0Table 140667875557440 Low Pri column_family_name_000008 Compaction 1409.216 ms CompactionJob::Install 140667737145408 Low Pri 140667749728320 Low Pri 140667816837184 Low Pri column_family_name_000007 Compaction 1071.815 ms CompactionJob::ProcessKeyValueCompaction 140667787477056 Low Pri column_family_name_000009 Compaction 772.516 ms CompactionJob::ProcessKeyValueCompaction 140667741339712 Low Pri 140667758116928 Low Pri column_family_name_000004 Compaction 620.739 ms CompactionJob::ProcessKeyValueCompaction 140667753922624 Low Pri 140667842003008 Low Pri column_family_name_000006 Compaction 1260.079 ms CompactionJob::ProcessKeyValueCompaction 140667745534016 Low Pri Reviewers: sdong, igor, rven Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35769	2015-03-24 11:32:25 -07:00
Mark Callaghan	dfccc7b4e2	Add readwhilemerging benchmark Summary: This is like readwhilewriting but uses Merge rather than Put in the writer thread. I am using it for in-progress benchmarks. I don't think the other benchmarks for Merge cover this behavior. The purpose for this test is to measure read performance when readers might have to merge results. This will also benefit from work-in-progress to add skewed key generation. Task ID: # Blame Rev: Test Plan: Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D35115	2015-03-18 13:50:52 -07:00
Igor Canadi	c88ff4ca76	Deprecate removeScanCountLimit in NewLRUCache Summary: It is no longer used by the implementation, so we should also remove it from the public API. Test Plan: make check Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34971	2015-03-17 15:04:37 -07:00
Igor Canadi	417367c42d	Fix SIGSEGV when not using cache	2015-03-13 16:41:00 -07:00
Igor Canadi	f690712652	Speed up db_bench shutdown Summary: See t6489044 Test Plan: compiles Reviewers: MarkCallaghan Reviewed By: MarkCallaghan Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34977	2015-03-13 14:45:15 -07:00
Yueh-Hsuan Chiang	c594b0e89d	Allow GetThreadList() to report operation stage. Summary: Allow GetThreadList() to report operation stage. Test Plan: ./thread_list_test ./db_bench --benchmarks=fillrandom --num=100000 --threads=40 \ --max_background_compactions=10 --max_background_flushes=3 \ --thread_status_per_interval=1000 --key_size=16 --value_size=1000 \ --num_column_families=10 export ROCKSDB_TESTS=ThreadStatus ./db_test Sample output ThreadID ThreadType cfName Operation OP_StartTime ElapsedTime Stage State 140116265861184 Low Pri 140116270055488 Low Pri 140116274249792 High Pri column_family_name_000005 Flush 2015/03/10-14:58:11 0 us FlushJob::WriteLevel0Table 140116400078912 Low Pri column_family_name_000004 Compaction 2015/03/10-14:58:11 0 us CompactionJob::FinishCompactionOutputFile 140116358135872 Low Pri column_family_name_000006 Compaction 2015/03/10-14:58:10 1 us CompactionJob::FinishCompactionOutputFile 140116341358656 Low Pri 140116295221312 High Pri default Flush 2015/03/10-14:58:11 0 us FlushJob::WriteLevel0Table 140116324581440 Low Pri column_family_name_000009 Compaction 2015/03/10-14:58:11 0 us CompactionJob::ProcessKeyValueCompaction 140116278444096 Low Pri 140116299415616 Low Pri column_family_name_000008 Compaction 2015/03/10-14:58:11 0 us CompactionJob::FinishCompactionOutputFile 140116291027008 High Pri column_family_name_000001 Flush 2015/03/10-14:58:11 0 us FlushJob::WriteLevel0Table 140116286832704 Low Pri column_family_name_000002 Compaction 2015/03/10-14:58:11 0 us CompactionJob::FinishCompactionOutputFile 140116282638400 Low Pri Reviewers: rven, igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34683	2015-03-13 10:45:40 -07:00
Islam AbdelRahman	1d43bc41fb	Fixing segmentation fault in db_bench Summary: Fixing segmentation fault when running db_bench This seg fault happens because num_created is used without being initialized Test Plan: running db_bench using these arguments bpl=10485760;overlap=10;mcz=2;del=300000000;levels=6;ctrig=4; delay=8; stop=12; wbn=3; mbc=20; mb=67108864;wbs=134217728; dds=0; sync=0; r=1000000; t=1; vs=800; bs=65536; cs=1048576; of=500000; si=1000000; ./db_bench --benchmarks=overwrite --disable_seek_compaction=1 --mmap_read=0 --statistics=1 --histogram=1 --num=$r --threads=$t --value_size=$vs --block_size=$bs --cache_size=$cs --bloom_bits=10 --cache_numshardbits=4 --open_files=$of --verify_checksum=1 --db=/home/tec/koko/ --sync=$sync --disable_wal=1 --compression_type=zlib --stats_interval=$si --compression_ratio=0.5 --disable_data_sync=$dds --write_buffer_size=$wbs --target_file_size_base=$mb --max_write_buffer_number=$wbn --max_background_compactions=$mbc --level0_file_num_compaction_trigger=$ctrig --level0_slowdown_writes_trigger=$delay --level0_stop_writes_trigger=$stop --num_levels=$levels --delete_obsolete_files_period_micros=$del --min_level_to_compress=$mcz --max_grandparent_overlap_factor=$overlap --stats_per_interval=1 --max_bytes_for_level_base=$bpl --use_existing_db=1 Reviewers: sdong, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D34881	2015-03-11 17:57:16 -07:00
sdong	2884b100ba	db_bench: Better way to randomize repeated read keys in -read_random_exp_range Summary: Use a better way to map from a key with locality to a random location. Now with the same -read_random_exp_range setting, hit rate drops, which it is expected. Test Plan: ./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=<multiple_values> Reviewers: MarkCallaghan, kradhakrishnan, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D34761	2015-03-11 11:46:14 -07:00
Yueh-Hsuan Chiang	89597bb66b	Allow GetThreadList() to report the start time of the current operation. Summary: Allow GetThreadList() to report the start time of the current operation. Test Plan: ./db_bench --benchmarks=fillrandom --num=100000 --threads=40 \ --max_background_compactions=10 --max_background_flushes=3 \ --thread_status_per_interval=1000 --key_size=16 --value_size=1000 \ --num_column_families=10 Sample output: ThreadID ThreadType cfName Operation OP_StartTime State 140338840797248 High Pri column_family_name_000003 Flush 2015/03/09-17:49:59 140338844991552 High Pri column_family_name_000004 Flush 2015/03/09-17:49:59 140338849185856 Low Pri 140338983403584 Low Pri 140339008569408 Low Pri 140338861768768 Low Pri 140338924683328 Low Pri 140338899517504 Low Pri 140338853380160 Low Pri 140338882740288 Low Pri 140338865963072 High Pri column_family_name_000006 Flush 2015/03/09-17:49:59 140338954043456 Low Pri 140338857574464 Low Pri Reviewers: igor, rven, sdong Reviewed By: sdong Subscribers: lgalanis, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34689	2015-03-10 14:51:28 -07:00
sdong	37921b4997	db_bench: Add Option -read_random_exp_range to allow read skewness. Summary: Introduce parameter -read_random_exp_range in db_bench to provide some key skewness in readrandom and multireadrandom benchmarks. It will helpful to cover block cache better. Test Plan: Run benchmarks with this new parameter. I can clearly see block cache hit rate change while I increase this value (DB size is about 66MB): ./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=0.0 rocksdb.block.cache.data.miss COUNT : 958418 rocksdb.block.cache.data.hit COUNT : 41582 ./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=5.0 rocksdb.block.cache.data.miss COUNT : 819518 rocksdb.block.cache.data.hit COUNT : 180482 ./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=10.0 rocksdb.block.cache.data.miss COUNT : 450479 rocksdb.block.cache.data.hit COUNT : 549521 ./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=20.0 rocksdb.block.cache.data.miss COUNT : 223192 rocksdb.block.cache.data.hit COUNT : 776808 Reviewers: MarkCallaghan, kradhakrishnan, yhchiang, rven, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D34629	2015-03-09 11:34:52 -07:00
Yueh-Hsuan Chiang	dc4532c497	Add --thread_status_per_interval to db_bench Summary: Add --thread_status_per_interval to db_bench, which allows db_bench to optionally enable print the current thread status periodically. Test Plan: ./db_bench --benchmarks=fillrandom --num=100000 --threads=40 --max_background_compactions=10 --max_background_flushes=3 --thread_status_per_interval=1000 --key_size=16 --value_size=1000 --num_column_families=10 Sample output: ThreadID ThreadType dbName cfName Operation State 140281571770432 Low Pri 140281575964736 High Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000001 Flush 140281710182464 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000008 Compaction 140281638879296 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000007 Compaction 140281592741952 Low Pri 140281580159040 High Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000002 Flush 140281676628032 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000006 Compaction 140281584353344 Low Pri 140281622102080 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000009 Compaction 140281605324864 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000004 Compaction 140281601130560 High Pri /tmp/rocksdbtest-5297/dbbench default Flush 140281596936256 Low Pri 140281588547648 Low Pri Reviewers: igor, rven, sdong Reviewed By: rven Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D34515	2015-03-06 11:22:06 -08:00
Igor Canadi	db03739340	options.level_compaction_dynamic_level_bytes to allow RocksDB to pick size bases of levels dynamically. Summary: When having fixed max_bytes_for_level_base, the ratio of size of largest level and the second one can range from 0 to the multiplier. This makes LSM tree frequently irregular and unpredictable. It can also cause poor space amplification in some cases. In this improvement (proposed by Igor Kabiljo), we introduce a parameter option.level_compaction_use_dynamic_max_bytes. When turning it on, RocksDB is free to pick a level base in the range of (options.max_bytes_for_level_base/options.max_bytes_for_level_multiplier, options.max_bytes_for_level_base] so that real level ratios are close to options.max_bytes_for_level_multiplier. Test Plan: New unit tests and pass tests suites including valgrind. Reviewers: MarkCallaghan, rven, yhchiang, igor, ikabiljo Reviewed By: ikabiljo Subscribers: yoshinorim, ikabiljo, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31437	2015-03-02 22:40:41 -08:00
Igor Sugak	62247ffa3b	rocksdb: Add missing override Summary: When using latest clang (3.6 or 3.7/trunck) rocksdb is failing with many errors. Almost all of them are missing override errors. This diff adds missing override keyword. No manual changes. Prerequisites: bear and clang 3.5 build with extra tools ```lang=bash % USE_CLANG=1 bear make all # generate a compilation database http://clang.llvm.org/docs/JSONCompilationDatabase.html % clang-modernize -p . -include . -add-override % make format ``` Test Plan: Make sure all tests are passing. ```lang=bash % #Use default fb code clang. % make check ``` Verify less error and no missing override errors. ```lang=bash % # Have trunk clang present in path. % ROCKSDB_NO_FBCODE=1 CC=clang CXX=clang++ make ``` Reviewers: igor, kradhakrishnan, rven, meyering, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34077	2015-02-26 11:28:41 -08:00
Mark Callaghan	182b4ceacd	Limit key range to number of keys, not number of writes Summary: An old commit (482401) changed DoWrite to use the value of --writes rather than --num to determine the range for keys. This restores the old and correct behavior which is to limit it using --num. Task ID: #6353043 Blame Rev: Test Plan: run db_bench Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D34065	2015-02-25 15:53:45 -08:00
Igor Sugak	73711f956c	rocksdb: Fix scan-build bug 'Memory leak' in db/db_bench.cc Summary: The bug is detected by scan-build. In `void WriteSeqSeekSeq(ThreadState* thread)` memory is allocated in line 3118 `Slice key = AllocateKey();` but `Slice` is not responsible deleting `Slice::data()`. Added `std::unique_ptr<const char[]>*` parameter to ` AllocateKey()`, so that it requires caller to not forget about Slice::data() management. scan-build bug report: http://home.fburl.com/~sugak/latest6/report-6e9754.html#EndPath Test Plan: Make sure scan-build does not report 'Memory leak' in db/db_bench.cc and all tests are passing. ```lang=bash % make analyze % make check ``` Reviewers: lgalanis, igor, meyering, sdong Reviewed By: meyering, sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33501	2015-02-19 14:27:48 -08:00
Igor Canadi	ae82849bc9	Fix build failure	2015-01-21 18:23:12 -08:00
Igor Canadi	423dee8418	Abort db_bench if Get() returns error Summary: I saw this when running readrandom benchmark with corrupted database -- benchmark worked! If a Get() returns corruption we should probably abort. Test Plan: compiles Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31701	2015-01-21 18:18:15 -08:00
Igor Canadi	9ab5adfc59	New BlockBasedTable version -- better compressed block format Summary: This diff adds BlockBasedTable format_version = 2. New format version brings better compressed block format for these compressions: 1) Zlib -- encode decompressed size in compressed block header 2) BZip2 -- encode decompressed size in compressed block header 3) LZ4 and LZ4HC -- instead of doing memcpy of size_t encode size as varint32. memcpy is very bad because the DB is not portable accross big/little endian machines or even platforms where size_t might be 8 or 4 bytes. It does not affect format for snappy. If you write a new database with format_version = 2, it will not be readable by RocksDB versions before 3.10. DB::Open() will return corruption in that case. Test Plan: Added a new test in db_test. I will also run db_bench and verify VSIZE when block_cache == 1GB Reviewers: yhchiang, rven, MarkCallaghan, dhruba, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31461	2015-01-14 16:24:24 -08:00
Igor Canadi	15d2abbec3	Fix build issues	2015-01-09 13:04:06 -08:00
Leonidas Galanis	9d5bd411be	benchmark.sh won't run through all tests properly if one specifies wal_dir to be different than db directory. Summary: A command line like this to run all the tests: source benchmark.config.sh && nohup ./benchmark.sh 'bulkload,fillseq,overwrite,filluniquerandom,readrandom,readwhilewriting' where benchmark.config.sh is: export DB_DIR=/data/mysql/rocksdata export WAL_DIR=/txlogs/rockswal export OUTPUT_DIR=/root/rocks_benchmarking/output Will fail for the tests that need a new DB . Also 1) set disable_data_sync=0 and 2) add debug mode to run through all the tests more quickly Test Plan: run ./benchmark.sh 'debug,bulkload,fillseq,overwrite,filluniquerandom,readrandom,readwhilewriting' and verify that there are no complaints about WAL dir not being empty. Reviewers: sdong, yhchiang, rven, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D30909	2015-01-05 15:36:47 -08:00
sdong	e9ca358157	Fix CLANG build for db_bench Summary: CLANG was broken for a recent change in db_ench. Fix it. Test Plan: Build db_bench using CLANG. Reviewers: rven, igor, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D30801	2014-12-30 18:33:49 -08:00
sdong	a801c1fb09	db_bench --num_hot_column_families to be default off Summary: Having --num_hot_column_families default on fails some existing regression tests. By default turn it off Test Plan: Run db_bench to make sure it is default off. Reviewers: yhchiang, rven, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D30705	2014-12-24 09:00:23 -08:00
sdong	ddc81440d5	db_bench to add an option as number of hot column families to add to Summary: Add option --num_hot_column_families in db_bench. If it is set, write options will first write to that number of column families, and then move on to next set of hot column families. The working set of column families can be smaller than total number of CFs. It is to test how RocksDB can handle cold column families Test Plan: Run db_bench with --num_hot_column_families set and not set. Reviewers: yhchiang, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D30663	2014-12-23 16:52:05 -08:00
Igor Canadi	0acc738810	Speed up FindObsoleteFiles() Summary: There are two versions of FindObsoleteFiles(): * full scan, which is executed every 6 hours (and it's terribly slow) * no full scan, which is executed every time a background process finishes and iterator is deleted This diff is optimizing the second case (no full scan). Here's what we do before the diff: * Get the list of obsolete files (files with ref==0). Some files in obsolete_files set might actually be live. * Get the list of live files to avoid deleting files that are live. * Delete files that are in obsolete_files and not in live_files. After this diff: * The only files with ref==0 that are still live are files that have been part of move compaction. Don't include moved files in obsolete_files. * Get the list of obsolete files (which exclude moved files). * No need to get the list of live files, since all files in obsolete_files need to be deleted. I'll post the benchmark results, but you can get the feel of it here: https://reviews.facebook.net/D30123 This depends on D30123. P.S. We should do full scan only in failure scenarios, not every 6 hours. I'll do this in a follow-up diff. Test Plan: One new unit test. Made sure that unit test fails if we don't have a `if (!f->moved)` safeguard in ~Version. make check Big number of compactions and flushes: ./db_stress --threads=30 --ops_per_thread=20000000 --max_key=10000 --column_families=20 --clear_column_family_one_in=10000000 --verify_before_write=0 --reopen=15 --max_background_compactions=10 --max_background_flushes=10 --db=/fast-rocksdb-tmp/db_stress --prefixpercent=0 --iterpercent=0 --writepercent=75 --db_write_buffer_size=2000000 Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D30249	2014-12-22 12:04:45 +01:00
Leonidas Galanis	635c61fd3b	Fix problem with create_if_missing option when wal_dir is used Summary: When wal_dir is used, DestroyDB is not passed the wal_dir option and so we get a Corruption exception. Test Plan: Verified manually that the following command line works now: ./db_bench --db=/mnt/db/rocksdb ... --disable_wal=0 --wal_dir=/data/users/rocksdb/WAL... --benchmarks=filluniquerandom --use_existing_db=0... Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D29859	2014-12-08 12:53:24 -08:00
Jonah Cohen	a14b7873ee	Enforce write buffer memory limit across column families Summary: Introduces a new class for managing write buffer memory across column families. We supplement ColumnFamilyOptions::write_buffer_size with ColumnFamilyOptions::write_buffer, a shared pointer to a WriteBuffer instance that enforces memory limits before flushing out to disk. Test Plan: Added SharedWriteBuffer unit test to db_test.cc Reviewers: sdong, rven, ljin, igor Reviewed By: igor Subscribers: tnovak, yhchiang, dhruba, xjin, MarkCallaghan, yoshinorim Differential Revision: https://reviews.facebook.net/D22581	2014-12-02 12:09:20 -08:00
Yueh-Hsuan Chiang	13de000f07	Add rocksdb::ToString() to address cases where std::to_string is not available. Summary: In some environment such as android, the c++ library does not have std::to_string. This path adds rocksdb::ToString(), which wraps std::to_string when std::to_string is not available, and implements std::to_string in the other case. Test Plan: make dbg -j32 ./db_test make clean make dbg OPT=-DOS_ANDROID -j32 ./db_test Reviewers: ljin, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D29181	2014-11-24 20:44:49 -08:00
sdong	3a40c427b9	Fix db_bench on CLANG mode Summary: "build all" breaks in Clang mode with db_bench. Fix it. Test Plan: USE_CLANG=1 make all Reviewers: ljin, rven, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D29379	2014-11-21 11:30:22 -08:00
Igor Canadi	cd278584c9	Clean up StringSplit Summary: stringSplit is not how we name our functions. Also, we had two StringSplit's in the codebase Test Plan: make check Reviewers: yhchiang, dhruba Reviewed By: dhruba Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D29361	2014-11-21 11:05:28 -05:00
sdong	a177742a9b	Make db_stress built for ROCKSDB_LITE Summary: Make db_stress built for ROCKSDB_LITE. The test doesn't pass tough. It seg fault quickly. But I took a look and it doesn't seem to be related to lite version. Likely to be a bug inside RocksDB. Test Plan: make db_stress Reviewers: yhchiang, rven, ljin, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D28797	2014-11-14 10:20:51 -08:00
Igor Canadi	767777c2bd	Turn on -Wshorten-64-to-32 and fix all the errors Summary: We need to turn on -Wshorten-64-to-32 for mobile. See D1671432 (internal phabricator) for details. This diff turns on the warning flag and fixes all the errors. There were also some interesting errors that I might call bugs, especially in plain table. Going forward, I think it makes sense to have this flag turned on and be very very careful when converting 64-bit to 32-bit variables. Test Plan: compiles Reviewers: ljin, rven, yhchiang, sdong Reviewed By: yhchiang Subscribers: bobbaldwin, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D28689	2014-11-11 16:47:22 -05:00
Tomislav Novak	35c8c814e8	Make ForwardIterator::status() more efficient Summary: In D19581 I made `ForwardIterator::status()` check all child iterators, including immutable ones. It's, however, not necessary to do it every time -- it'll suffice to check only when they're used and their status could change. This diff: * introduces `immutable_status_` which is updated by `Seek()` and `Next()` * removes special handling of `kIncomplete` status in those methods Test Plan: * `db_test` * hacked ReadSequential in db_bench.cc to check `status()` in addition to validity: ``` $ ./db_bench -use_existing_db -benchmarks readseq -disable_auto_compactions \ -use_tailing_iterator # without this patch Keys: 16 bytes each Values: 100 bytes each (50 bytes after compression) Entries: 1000000 [...] DB path: [/dev/shm/rocksdbtest/dbbench] readseq : 0.562 micros/op 1778103 ops/sec; 98.4 MB/s $ ./db_bench -use_existing_db -benchmarks readseq -disable_auto_compactions \ -use_tailing_iterator # with the patch readseq : 0.433 micros/op 2311363 ops/sec; 127.8 MB/s ``` Reviewers: igor, sdong, ljin Reviewed By: ljin Subscribers: dhruba, leveldb, march, lovro Differential Revision: https://reviews.facebook.net/D24063	2014-11-10 15:44:20 -08:00

1 2 3 4 5 ...

272 Commits