rocksdb

Author	SHA1	Message	Date
Mark Callaghan	ed229a0dee	Fixes for readcache-flashcache Summary: This fixes two problems: 1) the env should not be created twice when use_existing_db is false 2) the env dtor should run before cachedev_fd_ is closed. Task ID: # Blame Rev: Test Plan: Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D36795	2015-04-09 15:51:34 -07:00
Yoshinori Matsunobu	f12614070f	Fix TSAN build error of D36447 Summary: D36447 caused build error when using COMPILE_WITH_TSAN=1. This diff fixes the error. Test Plan: jenkins Reviewers: igor, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D36579	2015-04-06 17:37:36 -07:00
Yoshinori Matsunobu	824e646341	Adding another NewFlashcacheAwareEnv function to support pre-opened fd Summary: There are some cases when flachcache file descriptor was already allocated (i.e. fb-MySQL). Then NewFlashcacheAwareEnv returns an error at open() because fd was already assigned. This diff adds another function to instantiate FlashcacheAwareEnv, with pre-allocated fd cachedev_fd. Test Plan: Tested with MyRocks using this function, then worked Reviewers: sdong, igor Reviewed By: igor Subscribers: dhruba, MarkCallaghan, rven Differential Revision: https://reviews.facebook.net/D36447	2015-04-06 16:50:36 -07:00
Mark Callaghan	1bd70fb54a	Add --stats_interval_seconds to db_bench Summary: The --stats_interval_seconds determines interval for stats reporting and overrides --stats_interval when set. I also changed tools/benchmark.sh to report stats every 60 seconds so I can avoid trying to figure out a good value for --stats_interval per test and per storage device. Task ID: #6631621 Blame Rev: Test Plan: run tools/run_flash_bench, look at output Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D36189	2015-03-30 12:58:32 -07:00
Mark Callaghan	99ec2412e5	Make the benchmark scripts configurable and add tests Summary: This makes run_flash_bench.sh configurable. Previously it was hardwired for 1B keys and tests ran for 12 hours each. That kept me from using it. This makes it configuable, adds more tests, makes the duration per-test configurable and refactors the test scripts. Adds the seekrandomwhilemerging test to db_bench which is the same as seekrandomwhilewriting except the writer thread does Merge rather than Put. Forces the stall-time column in compaction IO stats to use a fixed format (H:M:S) which makes it easier to scrape and parse. Also adds an option to AppendHumanMicros to force a fixed format. Sometimes automation and humans want different format. Calls thread->stats.AddBytes(bytes); in db_bench for more tests to get the MB/sec summary stats in the output at test end. Adds the average ingest rate to compaction IO stats. Output now looks like: https://gist.github.com/mdcallag/2bd64d18be1b93adc494 More information on the benchmark output is at https://gist.github.com/mdcallag/db43a58bd5ac624f01e1 For benchmark.sh changes default RocksDB configuration to reduce stalls: * min_level_to_compress from 2 to 3 * hard_rate_limit from 2 to 3 * max_grandparent_overlap_factor and max_bytes_for_level_multiplier from 10 to 8 * L0 file count triggers from 4,8,12 to 4,12,20 for (start,stall,stop) Task ID: #6596829 Blame Rev: Test Plan: run tools/run_flash_bench.sh Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D36075	2015-03-30 11:28:25 -07:00
Igor Canadi	d61cb0b9de	db_bench can now disable flashcache for background threads Summary: Most of the approach is copied from WebSQL's MySQL branch. It's nice that we can do this without touching core RocksDB code. Test Plan: Compiles and runs. Didn't test flashback code, as I don't have flashback device and most if it is c/p Reviewers: MarkCallaghan, sdong Reviewed By: sdong Subscribers: rven, lgalanis, kradhakrishnan, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35391	2015-03-30 09:51:11 -07:00
Alexander.Mikhaylov	a3e4b32483	fix compilation error (same as fix #284 ) [maa@srv2-nskb-devg2 rocksdb-master]$ CXX=/usr/local/CC/gcc-4.7.4/bin/g++ EXTRA_CXXFLAGS=-std=c++11 DISABLE_WARNING_AS_ERROR=1 make db_bench CC db/db_bench.o db/db_bench.cc: In member function 'rocksdb::Slice rocksdb::Benchmark::AllocateKey(std::unique_ptr<const char []>)': db/db_bench.cc:1434:41: error: use of deleted function 'void std::unique_ptr<_Tp [], _Dp>::reset(_Up) [with _Up = char; _Tp = const char; _Dp = std::default_delete<const char []>]' In file included from /usr/local/CC/gcc-4.7.4/lib/gcc/x86_64-unknown-linux-gnu/4.7.4/../../../../include/c++/4.7.4/memory:86:0, from ./include/rocksdb/db.h:14, from ./db/dbformat.h:14, from ./db/db_impl.h:21, from db/db_bench.cc:33:	2015-03-26 14:53:42 +06:00
Yueh-Hsuan Chiang	248c063ba1	Report elapsed time in micros in ThreadStatus instead of start time. Summary: Report elapsed time of a thread operation in micros in ThreadStatus instead of start time of a thread operation in seconds since the Epoch, 1970-01-01 00:00:00 (UTC). Test Plan: ./db_bench --benchmarks=fillrandom --num=100000 --threads=40 \ --max_background_compactions=10 --max_background_flushes=3 \ --thread_status_per_interval=1000 --key_size=16 --value_size=1000 \ --num_column_families=10 Sample Output: ThreadID ThreadType cfName Operation ElapsedTime Stage State 140667724562496 High Pri column_family_name_000002 Flush 772.419 ms FlushJob::WriteLevel0Table 140667728756800 High Pri default Flush 617.845 ms FlushJob::WriteLevel0Table 140667732951104 High Pri column_family_name_000005 Flush 772.078 ms FlushJob::WriteLevel0Table 140667875557440 Low Pri column_family_name_000008 Compaction 1409.216 ms CompactionJob::Install 140667737145408 Low Pri 140667749728320 Low Pri 140667816837184 Low Pri column_family_name_000007 Compaction 1071.815 ms CompactionJob::ProcessKeyValueCompaction 140667787477056 Low Pri column_family_name_000009 Compaction 772.516 ms CompactionJob::ProcessKeyValueCompaction 140667741339712 Low Pri 140667758116928 Low Pri column_family_name_000004 Compaction 620.739 ms CompactionJob::ProcessKeyValueCompaction 140667753922624 Low Pri 140667842003008 Low Pri column_family_name_000006 Compaction 1260.079 ms CompactionJob::ProcessKeyValueCompaction 140667745534016 Low Pri Reviewers: sdong, igor, rven Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35769	2015-03-24 11:32:25 -07:00
Mark Callaghan	dfccc7b4e2	Add readwhilemerging benchmark Summary: This is like readwhilewriting but uses Merge rather than Put in the writer thread. I am using it for in-progress benchmarks. I don't think the other benchmarks for Merge cover this behavior. The purpose for this test is to measure read performance when readers might have to merge results. This will also benefit from work-in-progress to add skewed key generation. Task ID: # Blame Rev: Test Plan: Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D35115	2015-03-18 13:50:52 -07:00
Igor Canadi	c88ff4ca76	Deprecate removeScanCountLimit in NewLRUCache Summary: It is no longer used by the implementation, so we should also remove it from the public API. Test Plan: make check Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34971	2015-03-17 15:04:37 -07:00
Igor Canadi	417367c42d	Fix SIGSEGV when not using cache	2015-03-13 16:41:00 -07:00
Igor Canadi	f690712652	Speed up db_bench shutdown Summary: See t6489044 Test Plan: compiles Reviewers: MarkCallaghan Reviewed By: MarkCallaghan Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34977	2015-03-13 14:45:15 -07:00
Yueh-Hsuan Chiang	c594b0e89d	Allow GetThreadList() to report operation stage. Summary: Allow GetThreadList() to report operation stage. Test Plan: ./thread_list_test ./db_bench --benchmarks=fillrandom --num=100000 --threads=40 \ --max_background_compactions=10 --max_background_flushes=3 \ --thread_status_per_interval=1000 --key_size=16 --value_size=1000 \ --num_column_families=10 export ROCKSDB_TESTS=ThreadStatus ./db_test Sample output ThreadID ThreadType cfName Operation OP_StartTime ElapsedTime Stage State 140116265861184 Low Pri 140116270055488 Low Pri 140116274249792 High Pri column_family_name_000005 Flush 2015/03/10-14:58:11 0 us FlushJob::WriteLevel0Table 140116400078912 Low Pri column_family_name_000004 Compaction 2015/03/10-14:58:11 0 us CompactionJob::FinishCompactionOutputFile 140116358135872 Low Pri column_family_name_000006 Compaction 2015/03/10-14:58:10 1 us CompactionJob::FinishCompactionOutputFile 140116341358656 Low Pri 140116295221312 High Pri default Flush 2015/03/10-14:58:11 0 us FlushJob::WriteLevel0Table 140116324581440 Low Pri column_family_name_000009 Compaction 2015/03/10-14:58:11 0 us CompactionJob::ProcessKeyValueCompaction 140116278444096 Low Pri 140116299415616 Low Pri column_family_name_000008 Compaction 2015/03/10-14:58:11 0 us CompactionJob::FinishCompactionOutputFile 140116291027008 High Pri column_family_name_000001 Flush 2015/03/10-14:58:11 0 us FlushJob::WriteLevel0Table 140116286832704 Low Pri column_family_name_000002 Compaction 2015/03/10-14:58:11 0 us CompactionJob::FinishCompactionOutputFile 140116282638400 Low Pri Reviewers: rven, igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34683	2015-03-13 10:45:40 -07:00
Islam AbdelRahman	1d43bc41fb	Fixing segmentation fault in db_bench Summary: Fixing segmentation fault when running db_bench This seg fault happens because num_created is used without being initialized Test Plan: running db_bench using these arguments bpl=10485760;overlap=10;mcz=2;del=300000000;levels=6;ctrig=4; delay=8; stop=12; wbn=3; mbc=20; mb=67108864;wbs=134217728; dds=0; sync=0; r=1000000; t=1; vs=800; bs=65536; cs=1048576; of=500000; si=1000000; ./db_bench --benchmarks=overwrite --disable_seek_compaction=1 --mmap_read=0 --statistics=1 --histogram=1 --num=$r --threads=$t --value_size=$vs --block_size=$bs --cache_size=$cs --bloom_bits=10 --cache_numshardbits=4 --open_files=$of --verify_checksum=1 --db=/home/tec/koko/ --sync=$sync --disable_wal=1 --compression_type=zlib --stats_interval=$si --compression_ratio=0.5 --disable_data_sync=$dds --write_buffer_size=$wbs --target_file_size_base=$mb --max_write_buffer_number=$wbn --max_background_compactions=$mbc --level0_file_num_compaction_trigger=$ctrig --level0_slowdown_writes_trigger=$delay --level0_stop_writes_trigger=$stop --num_levels=$levels --delete_obsolete_files_period_micros=$del --min_level_to_compress=$mcz --max_grandparent_overlap_factor=$overlap --stats_per_interval=1 --max_bytes_for_level_base=$bpl --use_existing_db=1 Reviewers: sdong, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D34881	2015-03-11 17:57:16 -07:00
sdong	2884b100ba	db_bench: Better way to randomize repeated read keys in -read_random_exp_range Summary: Use a better way to map from a key with locality to a random location. Now with the same -read_random_exp_range setting, hit rate drops, which it is expected. Test Plan: ./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=<multiple_values> Reviewers: MarkCallaghan, kradhakrishnan, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D34761	2015-03-11 11:46:14 -07:00
Yueh-Hsuan Chiang	89597bb66b	Allow GetThreadList() to report the start time of the current operation. Summary: Allow GetThreadList() to report the start time of the current operation. Test Plan: ./db_bench --benchmarks=fillrandom --num=100000 --threads=40 \ --max_background_compactions=10 --max_background_flushes=3 \ --thread_status_per_interval=1000 --key_size=16 --value_size=1000 \ --num_column_families=10 Sample output: ThreadID ThreadType cfName Operation OP_StartTime State 140338840797248 High Pri column_family_name_000003 Flush 2015/03/09-17:49:59 140338844991552 High Pri column_family_name_000004 Flush 2015/03/09-17:49:59 140338849185856 Low Pri 140338983403584 Low Pri 140339008569408 Low Pri 140338861768768 Low Pri 140338924683328 Low Pri 140338899517504 Low Pri 140338853380160 Low Pri 140338882740288 Low Pri 140338865963072 High Pri column_family_name_000006 Flush 2015/03/09-17:49:59 140338954043456 Low Pri 140338857574464 Low Pri Reviewers: igor, rven, sdong Reviewed By: sdong Subscribers: lgalanis, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34689	2015-03-10 14:51:28 -07:00
sdong	37921b4997	db_bench: Add Option -read_random_exp_range to allow read skewness. Summary: Introduce parameter -read_random_exp_range in db_bench to provide some key skewness in readrandom and multireadrandom benchmarks. It will helpful to cover block cache better. Test Plan: Run benchmarks with this new parameter. I can clearly see block cache hit rate change while I increase this value (DB size is about 66MB): ./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=0.0 rocksdb.block.cache.data.miss COUNT : 958418 rocksdb.block.cache.data.hit COUNT : 41582 ./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=5.0 rocksdb.block.cache.data.miss COUNT : 819518 rocksdb.block.cache.data.hit COUNT : 180482 ./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=10.0 rocksdb.block.cache.data.miss COUNT : 450479 rocksdb.block.cache.data.hit COUNT : 549521 ./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=20.0 rocksdb.block.cache.data.miss COUNT : 223192 rocksdb.block.cache.data.hit COUNT : 776808 Reviewers: MarkCallaghan, kradhakrishnan, yhchiang, rven, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D34629	2015-03-09 11:34:52 -07:00
Yueh-Hsuan Chiang	dc4532c497	Add --thread_status_per_interval to db_bench Summary: Add --thread_status_per_interval to db_bench, which allows db_bench to optionally enable print the current thread status periodically. Test Plan: ./db_bench --benchmarks=fillrandom --num=100000 --threads=40 --max_background_compactions=10 --max_background_flushes=3 --thread_status_per_interval=1000 --key_size=16 --value_size=1000 --num_column_families=10 Sample output: ThreadID ThreadType dbName cfName Operation State 140281571770432 Low Pri 140281575964736 High Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000001 Flush 140281710182464 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000008 Compaction 140281638879296 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000007 Compaction 140281592741952 Low Pri 140281580159040 High Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000002 Flush 140281676628032 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000006 Compaction 140281584353344 Low Pri 140281622102080 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000009 Compaction 140281605324864 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000004 Compaction 140281601130560 High Pri /tmp/rocksdbtest-5297/dbbench default Flush 140281596936256 Low Pri 140281588547648 Low Pri Reviewers: igor, rven, sdong Reviewed By: rven Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D34515	2015-03-06 11:22:06 -08:00
Igor Canadi	db03739340	options.level_compaction_dynamic_level_bytes to allow RocksDB to pick size bases of levels dynamically. Summary: When having fixed max_bytes_for_level_base, the ratio of size of largest level and the second one can range from 0 to the multiplier. This makes LSM tree frequently irregular and unpredictable. It can also cause poor space amplification in some cases. In this improvement (proposed by Igor Kabiljo), we introduce a parameter option.level_compaction_use_dynamic_max_bytes. When turning it on, RocksDB is free to pick a level base in the range of (options.max_bytes_for_level_base/options.max_bytes_for_level_multiplier, options.max_bytes_for_level_base] so that real level ratios are close to options.max_bytes_for_level_multiplier. Test Plan: New unit tests and pass tests suites including valgrind. Reviewers: MarkCallaghan, rven, yhchiang, igor, ikabiljo Reviewed By: ikabiljo Subscribers: yoshinorim, ikabiljo, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31437	2015-03-02 22:40:41 -08:00
Igor Sugak	62247ffa3b	rocksdb: Add missing override Summary: When using latest clang (3.6 or 3.7/trunck) rocksdb is failing with many errors. Almost all of them are missing override errors. This diff adds missing override keyword. No manual changes. Prerequisites: bear and clang 3.5 build with extra tools ```lang=bash % USE_CLANG=1 bear make all # generate a compilation database http://clang.llvm.org/docs/JSONCompilationDatabase.html % clang-modernize -p . -include . -add-override % make format ``` Test Plan: Make sure all tests are passing. ```lang=bash % #Use default fb code clang. % make check ``` Verify less error and no missing override errors. ```lang=bash % # Have trunk clang present in path. % ROCKSDB_NO_FBCODE=1 CC=clang CXX=clang++ make ``` Reviewers: igor, kradhakrishnan, rven, meyering, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34077	2015-02-26 11:28:41 -08:00
Mark Callaghan	182b4ceacd	Limit key range to number of keys, not number of writes Summary: An old commit (482401) changed DoWrite to use the value of --writes rather than --num to determine the range for keys. This restores the old and correct behavior which is to limit it using --num. Task ID: #6353043 Blame Rev: Test Plan: run db_bench Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D34065	2015-02-25 15:53:45 -08:00
Igor Sugak	73711f956c	rocksdb: Fix scan-build bug 'Memory leak' in db/db_bench.cc Summary: The bug is detected by scan-build. In `void WriteSeqSeekSeq(ThreadState* thread)` memory is allocated in line 3118 `Slice key = AllocateKey();` but `Slice` is not responsible deleting `Slice::data()`. Added `std::unique_ptr<const char[]>*` parameter to ` AllocateKey()`, so that it requires caller to not forget about Slice::data() management. scan-build bug report: http://home.fburl.com/~sugak/latest6/report-6e9754.html#EndPath Test Plan: Make sure scan-build does not report 'Memory leak' in db/db_bench.cc and all tests are passing. ```lang=bash % make analyze % make check ``` Reviewers: lgalanis, igor, meyering, sdong Reviewed By: meyering, sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33501	2015-02-19 14:27:48 -08:00
Igor Canadi	ae82849bc9	Fix build failure	2015-01-21 18:23:12 -08:00
Igor Canadi	423dee8418	Abort db_bench if Get() returns error Summary: I saw this when running readrandom benchmark with corrupted database -- benchmark worked! If a Get() returns corruption we should probably abort. Test Plan: compiles Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31701	2015-01-21 18:18:15 -08:00
Igor Canadi	9ab5adfc59	New BlockBasedTable version -- better compressed block format Summary: This diff adds BlockBasedTable format_version = 2. New format version brings better compressed block format for these compressions: 1) Zlib -- encode decompressed size in compressed block header 2) BZip2 -- encode decompressed size in compressed block header 3) LZ4 and LZ4HC -- instead of doing memcpy of size_t encode size as varint32. memcpy is very bad because the DB is not portable accross big/little endian machines or even platforms where size_t might be 8 or 4 bytes. It does not affect format for snappy. If you write a new database with format_version = 2, it will not be readable by RocksDB versions before 3.10. DB::Open() will return corruption in that case. Test Plan: Added a new test in db_test. I will also run db_bench and verify VSIZE when block_cache == 1GB Reviewers: yhchiang, rven, MarkCallaghan, dhruba, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31461	2015-01-14 16:24:24 -08:00
Igor Canadi	15d2abbec3	Fix build issues	2015-01-09 13:04:06 -08:00
Leonidas Galanis	9d5bd411be	benchmark.sh won't run through all tests properly if one specifies wal_dir to be different than db directory. Summary: A command line like this to run all the tests: source benchmark.config.sh && nohup ./benchmark.sh 'bulkload,fillseq,overwrite,filluniquerandom,readrandom,readwhilewriting' where benchmark.config.sh is: export DB_DIR=/data/mysql/rocksdata export WAL_DIR=/txlogs/rockswal export OUTPUT_DIR=/root/rocks_benchmarking/output Will fail for the tests that need a new DB . Also 1) set disable_data_sync=0 and 2) add debug mode to run through all the tests more quickly Test Plan: run ./benchmark.sh 'debug,bulkload,fillseq,overwrite,filluniquerandom,readrandom,readwhilewriting' and verify that there are no complaints about WAL dir not being empty. Reviewers: sdong, yhchiang, rven, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D30909	2015-01-05 15:36:47 -08:00
sdong	e9ca358157	Fix CLANG build for db_bench Summary: CLANG was broken for a recent change in db_ench. Fix it. Test Plan: Build db_bench using CLANG. Reviewers: rven, igor, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D30801	2014-12-30 18:33:49 -08:00
sdong	a801c1fb09	db_bench --num_hot_column_families to be default off Summary: Having --num_hot_column_families default on fails some existing regression tests. By default turn it off Test Plan: Run db_bench to make sure it is default off. Reviewers: yhchiang, rven, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D30705	2014-12-24 09:00:23 -08:00
sdong	ddc81440d5	db_bench to add an option as number of hot column families to add to Summary: Add option --num_hot_column_families in db_bench. If it is set, write options will first write to that number of column families, and then move on to next set of hot column families. The working set of column families can be smaller than total number of CFs. It is to test how RocksDB can handle cold column families Test Plan: Run db_bench with --num_hot_column_families set and not set. Reviewers: yhchiang, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D30663	2014-12-23 16:52:05 -08:00
Igor Canadi	0acc738810	Speed up FindObsoleteFiles() Summary: There are two versions of FindObsoleteFiles(): * full scan, which is executed every 6 hours (and it's terribly slow) * no full scan, which is executed every time a background process finishes and iterator is deleted This diff is optimizing the second case (no full scan). Here's what we do before the diff: * Get the list of obsolete files (files with ref==0). Some files in obsolete_files set might actually be live. * Get the list of live files to avoid deleting files that are live. * Delete files that are in obsolete_files and not in live_files. After this diff: * The only files with ref==0 that are still live are files that have been part of move compaction. Don't include moved files in obsolete_files. * Get the list of obsolete files (which exclude moved files). * No need to get the list of live files, since all files in obsolete_files need to be deleted. I'll post the benchmark results, but you can get the feel of it here: https://reviews.facebook.net/D30123 This depends on D30123. P.S. We should do full scan only in failure scenarios, not every 6 hours. I'll do this in a follow-up diff. Test Plan: One new unit test. Made sure that unit test fails if we don't have a `if (!f->moved)` safeguard in ~Version. make check Big number of compactions and flushes: ./db_stress --threads=30 --ops_per_thread=20000000 --max_key=10000 --column_families=20 --clear_column_family_one_in=10000000 --verify_before_write=0 --reopen=15 --max_background_compactions=10 --max_background_flushes=10 --db=/fast-rocksdb-tmp/db_stress --prefixpercent=0 --iterpercent=0 --writepercent=75 --db_write_buffer_size=2000000 Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D30249	2014-12-22 12:04:45 +01:00
Leonidas Galanis	635c61fd3b	Fix problem with create_if_missing option when wal_dir is used Summary: When wal_dir is used, DestroyDB is not passed the wal_dir option and so we get a Corruption exception. Test Plan: Verified manually that the following command line works now: ./db_bench --db=/mnt/db/rocksdb ... --disable_wal=0 --wal_dir=/data/users/rocksdb/WAL... --benchmarks=filluniquerandom --use_existing_db=0... Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D29859	2014-12-08 12:53:24 -08:00
Jonah Cohen	a14b7873ee	Enforce write buffer memory limit across column families Summary: Introduces a new class for managing write buffer memory across column families. We supplement ColumnFamilyOptions::write_buffer_size with ColumnFamilyOptions::write_buffer, a shared pointer to a WriteBuffer instance that enforces memory limits before flushing out to disk. Test Plan: Added SharedWriteBuffer unit test to db_test.cc Reviewers: sdong, rven, ljin, igor Reviewed By: igor Subscribers: tnovak, yhchiang, dhruba, xjin, MarkCallaghan, yoshinorim Differential Revision: https://reviews.facebook.net/D22581	2014-12-02 12:09:20 -08:00
Yueh-Hsuan Chiang	13de000f07	Add rocksdb::ToString() to address cases where std::to_string is not available. Summary: In some environment such as android, the c++ library does not have std::to_string. This path adds rocksdb::ToString(), which wraps std::to_string when std::to_string is not available, and implements std::to_string in the other case. Test Plan: make dbg -j32 ./db_test make clean make dbg OPT=-DOS_ANDROID -j32 ./db_test Reviewers: ljin, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D29181	2014-11-24 20:44:49 -08:00
sdong	3a40c427b9	Fix db_bench on CLANG mode Summary: "build all" breaks in Clang mode with db_bench. Fix it. Test Plan: USE_CLANG=1 make all Reviewers: ljin, rven, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D29379	2014-11-21 11:30:22 -08:00
Igor Canadi	cd278584c9	Clean up StringSplit Summary: stringSplit is not how we name our functions. Also, we had two StringSplit's in the codebase Test Plan: make check Reviewers: yhchiang, dhruba Reviewed By: dhruba Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D29361	2014-11-21 11:05:28 -05:00
sdong	a177742a9b	Make db_stress built for ROCKSDB_LITE Summary: Make db_stress built for ROCKSDB_LITE. The test doesn't pass tough. It seg fault quickly. But I took a look and it doesn't seem to be related to lite version. Likely to be a bug inside RocksDB. Test Plan: make db_stress Reviewers: yhchiang, rven, ljin, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D28797	2014-11-14 10:20:51 -08:00
Igor Canadi	767777c2bd	Turn on -Wshorten-64-to-32 and fix all the errors Summary: We need to turn on -Wshorten-64-to-32 for mobile. See D1671432 (internal phabricator) for details. This diff turns on the warning flag and fixes all the errors. There were also some interesting errors that I might call bugs, especially in plain table. Going forward, I think it makes sense to have this flag turned on and be very very careful when converting 64-bit to 32-bit variables. Test Plan: compiles Reviewers: ljin, rven, yhchiang, sdong Reviewed By: yhchiang Subscribers: bobbaldwin, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D28689	2014-11-11 16:47:22 -05:00
Tomislav Novak	35c8c814e8	Make ForwardIterator::status() more efficient Summary: In D19581 I made `ForwardIterator::status()` check all child iterators, including immutable ones. It's, however, not necessary to do it every time -- it'll suffice to check only when they're used and their status could change. This diff: * introduces `immutable_status_` which is updated by `Seek()` and `Next()` * removes special handling of `kIncomplete` status in those methods Test Plan: * `db_test` * hacked ReadSequential in db_bench.cc to check `status()` in addition to validity: ``` $ ./db_bench -use_existing_db -benchmarks readseq -disable_auto_compactions \ -use_tailing_iterator # without this patch Keys: 16 bytes each Values: 100 bytes each (50 bytes after compression) Entries: 1000000 [...] DB path: [/dev/shm/rocksdbtest/dbbench] readseq : 0.562 micros/op 1778103 ops/sec; 98.4 MB/s $ ./db_bench -use_existing_db -benchmarks readseq -disable_auto_compactions \ -use_tailing_iterator # with the patch readseq : 0.433 micros/op 2311363 ops/sec; 127.8 MB/s ``` Reviewers: igor, sdong, ljin Reviewed By: ljin Subscribers: dhruba, leveldb, march, lovro Differential Revision: https://reviews.facebook.net/D24063	2014-11-10 15:44:20 -08:00
Shi Feng	ea18b944a7	Add db_bench option --report_file_operations Summary: Add db_bench option --report_file_operations Test Plan: ./db_bench --report_file_operations Observe outputs on # of file operations Reviewers: ljin, MarkCallaghan, sdong Reviewed By: sdong Subscribers: yhchiang, rven, igor, dhruba Differential Revision: https://reviews.facebook.net/D27945	2014-11-05 18:40:18 -08:00
Igor Canadi	b680033e63	Include atomic in env_test	2014-10-27 15:41:05 -07:00
Igor Canadi	48842ab316	Deprecate AtomicPointer Summary: RocksDB already depends on C++11, so we might as well all the goodness that C++11 provides. This means that we don't need AtomicPointer anymore. The less things in port/, the easier it will be to port to other platforms. Test Plan: make check + careful visual review verifying that NoBarried got memory_order_relaxed, while Acquire/Release methods got memory_order_acquire and memory_order_release Reviewers: rven, yhchiang, ljin, sdong Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D27543	2014-10-27 14:50:21 -07:00
sdong	2a8e5203d8	db_bench: --batch_size used for write benchmarks too Summary: Now --bench_size is only used in multireadrandom tests, although the codes allow it to run in all write tests. I don't see a reason why we can't enable it. Test Plan: Run ./db_bench -benchmarks multirandomwrite --threads=5 -batch_size=16 and see the stats printed out in LOG to make sure batching really happened. Reviewers: ljin, yhchiang, rven, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D25509	2014-10-22 18:54:12 -07:00
sdong	5bfb7f5d0b	db_bench: seekrandom can specify --seek_nexts to read specific keys after seek. Summary: Add a function as tittle. Also use the same parameter to fillseekseq too. Test Plan: Run seekrandom using the new parameter Reviewers: ljin, MarkCallaghan Reviewed By: MarkCallaghan Subscribers: rven, igor, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D25035	2014-10-20 11:55:33 -07:00
sdong	b7d3d6ebc5	db_bench: set thread pool size according to max_background_flushes Summary: option max_background_flushes doesn't make sense if thread pool size is not set accordingly. Set the thread pool size as what we do for max_background_compactions. Test Plan: Run db_bench with max_background_flushes > 1 Reviewers: yhchiang, igor, rven, ljin Reviewed By: ljin Subscribers: MarkCallaghan, leveldb Differential Revision: https://reviews.facebook.net/D24717	2014-10-09 20:38:15 -07:00
Tomislav Novak	88edfd90ae	SkipListRep::LookaheadIterator Summary: This diff introduces the `lookahead` argument to `SkipListFactory()`. This is an optimization for the tailing use case which includes many seeks. E.g. consider the following operations on a skip list iterator: Seek(x), Next(), Next(), Seek(x+2), Next(), Seek(x+3), Next(), Next(), ... If `lookahead` is positive, `SkipListRep` will return an iterator which also keeps track of the previously visited node. Seek() then first does a linear search starting from that node (up to `lookahead` steps). As in the tailing example above, this may require fewer than ~log(n) comparisons as with regular skip list search. Test Plan: Added a new benchmark (`fillseekseq`) which simulates the usage pattern. It first writes N records (with consecutive keys), then measures how much time it takes to read them by calling `Seek()` and `Next()`. $ time ./db_bench -num 10000000 -benchmarks fillseekseq -prefix_size 1 \ -key_size 8 -write_buffer_size $[102410241024] -value_size 50 \ -seekseq_next 2 -skip_list_lookahead=0 [...] DB path: [/dev/shm/rocksdbtest/dbbench] fillseekseq : 0.389 micros/op 2569047 ops/sec; real 0m21.806s user 0m12.106s sys 0m9.672s $ time ./db_bench [...] -skip_list_lookahead=2 [...] DB path: [/dev/shm/rocksdbtest/dbbench] fillseekseq : 0.153 micros/op 6540684 ops/sec; real 0m19.469s user 0m10.192s sys 0m9.252s Reviewers: ljin, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb, march, lovro Differential Revision: https://reviews.facebook.net/D23997	2014-10-07 11:48:23 -07:00
mike@arpaia.co	f0f7955497	Fixing comile errors on OS X Summary: Building master on OS X has some compile errors due to implicit type conversions which generate warnings which RocksDB's build settings raise as errors. Test Plan: It compiles! Reviewers: ljin, igor Reviewed By: ljin Differential Revision: https://reviews.facebook.net/D24135	2014-09-29 16:05:25 -07:00
Mark Callaghan	747523d241	Print per column family metrics in db_bench Summary: see above Test Plan: make check, ran db_bench and looked at output - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Differential Revision: https://reviews.facebook.net/D24189	2014-09-29 15:47:05 -07:00
Lei Jin	fbd2dafc9f	CompactedDBImpl::MultiGet() for better CuckooTable performance Summary: Add the MultiGet API to allow prefetching. With file size of 1.5G, I configured it to have 0.9 hash ratio that can fill With 115M keys and result in 2 hash functions, the lookup QPS is ~4.9M/s vs. 3M/s for Get(). It is tricky to set the parameters right. Since files size is determined by power-of-two factor, that means # of keys is fixed in each file. With big file size (thus smaller # of files), we will have more chance to waste lot of space in the last file - lower space utilization as a result. Using smaller file size can improve the situation, but that harms lookup speed. Test Plan: db_bench Reviewers: yhchiang, sdong, igor Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23673	2014-09-25 13:34:51 -07:00
Lei Jin	3c68006109	CompactedDBImpl Summary: Add a CompactedDBImpl that will enabled when calling OpenForReadOnly() and the DB only has one level (>0) of files. As a performan comparison, CuckooTable performs 2.1M/s with CompactedDBImpl vs. 1.78M/s with ReadOnlyDBImpl. Test Plan: db_bench Reviewers: yhchiang, igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23553	2014-09-25 11:14:01 -07:00
Lei Jin	57a32f147f	change target_file_size_base to uint64_t Summary: It contrains the file size to be 4G max with int Test Plan: tried to grep instance and made sure other related variables are also uint64 Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23697	2014-09-22 11:15:03 -07:00
Lei Jin	51af7c326c	CuckooTable: add one option to allow identity function for the first hash function Summary: MurmurHash becomes expensive when we do millions Get() a second in one thread. Add this option to allow the first hash function to use identity function as hash function. It results in QPS increase from 3.7M/s to ~4.3M/s. I did not observe improvement for end to end RocksDB performance. This may be caused by other bottlenecks that I will address in a separate diff. Test Plan: ``` [ljin@dev1964 rocksdb] ./cuckoo_table_reader_test --enable_perf --file_dir=/dev/shm --write --identity_as_first_hash=0 ==== Test CuckooReaderTest.WhenKeyExists ==== Test CuckooReaderTest.WhenKeyExistsWithUint64Comparator ==== Test CuckooReaderTest.CheckIterator ==== Test CuckooReaderTest.CheckIteratorUint64 ==== Test CuckooReaderTest.WhenKeyNotFound ==== Test CuckooReaderTest.TestReadPerformance With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.272us (3.7 Mqps) with batch size of 0, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.138us (7.2 Mqps) with batch size of 10, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.142us (7.1 Mqps) with batch size of 25, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.142us (7.0 Mqps) with batch size of 50, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.144us (6.9 Mqps) with batch size of 100, # of found keys 125829120 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.201us (5.0 Mqps) with batch size of 0, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.121us (8.3 Mqps) with batch size of 10, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.123us (8.1 Mqps) with batch size of 25, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.121us (8.3 Mqps) with batch size of 50, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.112us (8.9 Mqps) with batch size of 100, # of found keys 104857600 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.251us (4.0 Mqps) with batch size of 0, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.107us (9.4 Mqps) with batch size of 10, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.099us (10.1 Mqps) with batch size of 25, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.100us (10.0 Mqps) with batch size of 50, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.116us (8.6 Mqps) with batch size of 100, # of found keys 83886080 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.189us (5.3 Mqps) with batch size of 0, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.095us (10.5 Mqps) with batch size of 10, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.096us (10.4 Mqps) with batch size of 25, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.098us (10.2 Mqps) with batch size of 50, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.105us (9.5 Mqps) with batch size of 100, # of found keys 73400320 [ljin@dev1964 rocksdb] ./cuckoo_table_reader_test --enable_perf --file_dir=/dev/shm --write --identity_as_first_hash=1 ==== Test CuckooReaderTest.WhenKeyExists ==== Test CuckooReaderTest.WhenKeyExistsWithUint64Comparator ==== Test CuckooReaderTest.CheckIterator ==== Test CuckooReaderTest.CheckIteratorUint64 ==== Test CuckooReaderTest.WhenKeyNotFound ==== Test CuckooReaderTest.TestReadPerformance With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.230us (4.3 Mqps) with batch size of 0, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.086us (11.7 Mqps) with batch size of 10, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.088us (11.3 Mqps) with batch size of 25, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.083us (12.1 Mqps) with batch size of 50, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.083us (12.1 Mqps) with batch size of 100, # of found keys 125829120 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.159us (6.3 Mqps) with batch size of 0, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.078us (12.8 Mqps) with batch size of 10, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.080us (12.6 Mqps) with batch size of 25, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.080us (12.5 Mqps) with batch size of 50, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.082us (12.2 Mqps) with batch size of 100, # of found keys 104857600 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.154us (6.5 Mqps) with batch size of 0, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.077us (13.0 Mqps) with batch size of 10, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.077us (12.9 Mqps) with batch size of 25, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.078us (12.8 Mqps) with batch size of 50, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.079us (12.6 Mqps) with batch size of 100, # of found keys 83886080 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.218us (4.6 Mqps) with batch size of 0, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.083us (12.0 Mqps) with batch size of 10, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.085us (11.7 Mqps) with batch size of 25, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.086us (11.6 Mqps) with batch size of 50, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.078us (12.8 Mqps) with batch size of 100, # of found keys 73400320 ``` Reviewers: sdong, igor, yhchiang Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23451	2014-09-18 11:00:48 -07:00
Feng Zhu	0af157f9bf	Implement full filter for block based table. Summary: 1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The previous one is the traditional filter block. The full_filter_block is newly added. It would generate a filter block that contain all the keys in SST file. 2. When querying a key, table would first check if full_filter is available. If not, it would go to the exact data block and check using block_based filter. 3. User could choose to use full_filter or tradional(block_based_filter). They would be stored in SST file with different meta index name. "filter.filter_policy" or "full_filter.filter_policy". Then, Table reader is able to know the fllter block type. 4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h. 5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc Benchmark: base commit `1d23b5c470` Command: db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1 Read QPS increase for about 30% from 2230002 to 2991411. Test Plan: make all check valgrind db_test db_stress --use_block_based_filter = 0 ./auto_sanity_test.sh Reviewers: igor, yhchiang, ljin, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D20979	2014-09-08 10:37:05 -07:00
Igor Canadi	8de151bb99	Add db_bench with lots of column families to regression tests Summary: That way we can see when this graph goes up and be happy. Couple of changes: 1. title 2. fix db_bench to delete column families before deleting the DB. this was asserting when compiled in debug mode 3. don't sync manifest when disableDataSync. We discussed this offline. I can move it to separate diff if you'd like Test Plan: ran it Reviewers: sdong, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22815	2014-09-05 14:20:18 -07:00
liuhuahang	bb6ae0f80c	fix more compile warnings N/A Change-Id: I5b6f9c70aea7d3f3489328834fed323d41106d9f Signed-off-by: liuhuahang <liuhuahang@zerus.co>	2014-09-05 14:14:37 +08:00
Radheshyam Balasundaram	4142a3e783	Adding a user comparator for comparing Uint64 slices. Summary: - New Uint64 comparator - Modify Reader and Builder to take custom user comparators instead of bytewise comparator - Modify logic for choosing unused user key in builder - Modify iterator logic in reader - test changes Test Plan: cuckoo_table_{builder,reader,db}_test make check all Reviewers: ljin, sdong Reviewed By: ljin Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D22377	2014-08-27 10:39:31 -07:00
Lei Jin	384400128f	move block based table related options BlockBasedTableOptions Summary: I will move compression related options in a separate diff since this diff is already pretty lengthy. I guess I will also need to change JNI accordingly :( Test Plan: make all check Reviewers: yhchiang, igor, sdong Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21915	2014-08-25 14:22:05 -07:00
Radheshyam Balasundaram	162b8151f1	Adding Column Family support in db_bench. Summary: Adding num_column_families flag. Adding support for column families in DoWrite and ReadRandom methods. [Igor, please let me know if this approach sounds good. I shall add it to other methods too.] Test Plan: Ran fillseq on 1M keys and 10 Column families and ran readrandom. Reviewers: sdong, yhchiang, igor, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21387	2014-08-18 18:15:01 -07:00
Radheshyam Balasundaram	36e759d199	Adding Cuckoo Table SST option to db_bench Summary: Adding flags to use cuckoo table SST in db_bench.cc Test Plan: Ran benchmark with fillseq and readrandom Reviewers: sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21729	2014-08-18 11:59:38 -07:00
Lei Jin	58c49466d2	Allow env_posix to lower background thread IO priority Summary: This is a linux-specific system call. Test Plan: ran db_bench Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: haobo, leveldb Differential Revision: https://reviews.facebook.net/D21183	2014-08-13 20:49:58 -07:00
Feng Zhu	d3f2ec694f	check prefix_size when using hash search in db_bench Summary: 1. Check prefix_size when enable use_hash_search in db_bench 2. Remove include/statistics.h in db_bench Test Plan: ./db_bench --use_hash_search=1 Reviewers: ljin, yhchiang, igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21375	2014-08-11 10:47:52 -07:00
Feng Zhu	50c2dcb78f	add options.block_restart_interval in db_bench Summary: Add block_restart_interval in db_bench, default value 16 Test Plan: make Reviewers: sdong Reviewed By: sdong Differential Revision: https://reviews.facebook.net/D20331	2014-07-21 12:01:40 -07:00
Stanislau Hlebik	92d73cbe78	Add PlainTableOptions Summary: Since we have a lot of options for PlainTable, add a struct PlainTableOptions to manage them Test Plan: make all check Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20175	2014-07-18 00:08:38 -07:00
Feng Zhu	bc6b2ab401	enable kHashSearch for blocktable in db_bench Summary: add a flag called use_hash_search in db_bench Test Plan: make all check ./db_bench --use_hash_search=1 Reviewers: ljin, haobo, yhchiang, sdong Reviewed By: sdong Subscribers: igor, dhruba Differential Revision: https://reviews.facebook.net/D20067	2014-07-16 17:32:30 -07:00
Radheshyam Balasundaram	f0660d5253	Adding NUMA support to db_bench tests Summary: Changes: - Adding numa_aware flag to db_bench.cc - Using numa.h library to bind memory and cpu of threads to a fixed NUMA node Result: There seems to be no significant change in the micros/op time with numa_aware enabled. I also tried this with other implementations, including a combination of pthread_setaffinity_np, sched_setaffinity and set_mempolicy methods. It'd be great if someone could point out where I'm going wrong and if we can achieve a better micors/op. Test Plan: Ran db_bench tests using following command: ./db_bench --db=/mnt/tmp --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --block_size=4096 --cache_size=17179869184 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=134217728 --max_bytes_for_level_base=1073741824 --disable_wal=0 --wal_dir=/mnt/tmp --sync=0 --disable_data_sync=1 --verify_checksum=1 --delete_obsolete_files_period_micros=314572800 --max_grandparent_overlap_factor=10 --max_background_compactions=4 --max_background_flushes=0 --level0_slowdown_writes_trigger=16 --level0_stop_writes_trigger=24 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --mmap_read=1 --mmap_write=0 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --perf_level=0 --duration=300 --benchmarks=readwhilewriting --use_existing_db=1 --num=157286400 --threads=24 --writes_per_second=10240 --numa_aware=[False/True] The tests were run in private devserver with 24 cores and the db was prepopulated using filluniquerandom test. The tests resulted in 0.145 us/op with numa_aware=False and 0.161 us/op with numa_aware=True. Reviewers: sdong, yhchiang, ljin, igor Reviewed By: ljin, igor Subscribers: igor, leveldb Differential Revision: https://reviews.facebook.net/D19353	2014-07-07 10:53:31 -07:00
Yueh-Hsuan Chiang	faa8d21922	Improve an assertion in RandomGenerator::Generate() in db_bench. Summary: RandomGenerator::Generate() currently has an assertion len < data_.size(). However, it is actually fine to have len == data_.size(). This diff change the assertion to len <= data_.size(). Test Plan: make db_bench ./db_bench Reviewers: haobo, sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19269	2014-06-24 15:29:28 -06:00
Lei Jin	3b0dc76699	db_bench: measure the real latency of write/delete Summary: as title Test Plan: make release Reviewers: haobo, sdong, yhchiang Reviewed By: yhchiang Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19227	2014-06-23 13:23:02 -07:00
Lei Jin	a1b5650a75	db_bench: sanity check on compression ratio Summary: as requested by mark Test Plan: make release Reviewers: sdong, haobo Reviewed By: haobo Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19221	2014-06-23 10:46:16 -07:00
Igor Canadi	d4a8423334	Remove seek compaction Summary: As discussed in our internal group, we don't get much use of seek compaction at the moment, while it's making code more complicated and slower in some cases. This diff removes seek compaction and (hopefully) all code that was introduced to support seek compaction. There is one test case that relied on didIO information. I'll try to find another way to implement it. Test Plan: make check Reviewers: sdong, haobo, yhchiang, ljin, dhruba Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19161	2014-06-20 10:23:02 +02:00
Lei Jin	388d2054c7	forward iterator Summary: Forward iterator puts everything together in a flat structure instead of a hierarchy of nested iterators. this should simplify the code and provide better performance. It also enables more optimization since all information are accessiable in one place. Init evaluation shows about 6% improvement Test Plan: db_test and db_bench Reviewers: dhruba, igor, tnovak, sdong, haobo Reviewed By: haobo Subscribers: sdong, leveldb Differential Revision: https://reviews.facebook.net/D18795	2014-05-30 14:31:55 -07:00
Lei Jin	f29c62fc6f	add an iterator refresh option for SeekRandom Summary: One more option to allow iterator refreshing when using normal iterator Test Plan: ran db_bench Reviewers: haobo, sdong, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18849	2014-05-30 14:09:22 -07:00
sdong	4e0602f941	Remove maximum key_size check in db_bench Summary: Key size limit doesn't seem to be applicable anymore. Remove it. Test Plan: run a couple of tests in db_bench Reviewers: haobo, igor, yhchiang, dhruba Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D18723	2014-05-15 11:06:37 -07:00
Igor Canadi	a1068c91a1	Make RocksDB work with newer gflags Summary: Newer gflags switched from `google` namespace to `gflags` namespace. See: https://github.com/facebook/rocksdb/issues/139 and https://github.com/facebook/rocksdb/issues/102 Unfortunately, they don't define any macro with their namespace, so we need to actually try to compile gflags with two different namespace to figure out which one is the correct one. Test Plan: works in fbcode environemnt. I'll also try in ubutnu with newer gflags Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D18537	2014-05-08 17:25:13 -07:00
Igor Canadi	0afc8bc29a	xxHash Summary: Originally: https://github.com/facebook/rocksdb/pull/87/files I'm taking over to apply some finishing touches Test Plan: will add tests Reviewers: dhruba, haobo, sdong, yhchiang, ljin Reviewed By: yhchiang CC: leveldb Differential Revision: https://reviews.facebook.net/D18315	2014-05-01 14:09:32 -04:00
Yueh-Hsuan Chiang	9d9d2965cb	Add a new mem-table representation based on cuckoo hash. Summary: = Major Changes = * Add a new mem-table representation, HashCuckooRep, which is based cuckoo hash. Cuckoo hash uses multiple hash functions. This allows each key to have multiple possible locations in the mem-table. - Put: When insert a key, it will try to find whether one of its possible locations is vacant and store the key. If none of its possible locations are available, then it will kick out a victim key and store at that location. The kicked-out victim key will then be stored at a vacant space of its possible locations or kick-out another victim. In this diff, the kick-out path (known as cuckoo-path) is found using BFS, which guarantees to be the shortest. - Get: Simply tries all possible locations of a key --- this guarantees worst-case constant time complexity. - Time complexity: O(1) for Get, and average O(1) for Put if the fullness of the mem-table is below 80%. - Default using two hash functions, the number of hash functions used by the cuckoo-hash may dynamically increase if it fails to find a short-enough kick-out path. - Currently, HashCuckooRep does not support iteration and snapshots, as our current main purpose of this is to optimize point access. = Minor Changes = * Add IsSnapshotSupported() to DB to indicate whether the current DB supports snapshots. If it returns false, then DB::GetSnapshot() will always return nullptr. Test Plan: Run existing tests. Will develop a test specifically for cuckoo hash in the next diff. Reviewers: sdong, haobo Reviewed By: sdong CC: leveldb, dhruba, igor Differential Revision: https://reviews.facebook.net/D16155	2014-04-29 17:13:46 -07:00
Igor Canadi	38693d99c4	Fix more signed/unsigned comparsions	2014-04-29 12:40:18 -07:00
Lei Jin	3995e801ab	kill ReadOptions.prefix and .prefix_seek Summary: also add an override option total_order_iteration if you want to use full iterator with prefix_extractor Test Plan: make all check Reviewers: igor, haobo, sdong, yhchiang Reviewed By: haobo CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D17805	2014-04-25 12:21:34 -07:00
Igor Canadi	f9f8965e96	Print out stack trace in mac, too Summary: While debugging Mac-only issue with ThreadLocalPtr, this was very useful. Let's print out stack trace in MAC OS, too. Test Plan: Verified that somewhat useful stack trace was generated on mac. Will run PrintStack() on linux, too. Reviewers: ljin, haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D18189	2014-04-23 09:11:35 -04:00
Igor Canadi	8dc34364d2	Rename "benchmark" back to "bench". Also, make `benchharness.cc` not compiled into rocksdb library.	2014-04-21 13:12:15 -07:00
Pratyush Seth	ff1b5df4c6	Added benchmark functionality on the lines of folly/Benchmark.h Summary: Added benchmark functionality on the lines of folly/Benchmark.h Test Plan: Added unit tests Reviewers: igor, haobo, sdong, ljin, yhchiang, dhruba Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17973	2014-04-21 12:29:55 -07:00
sdong	c87ed0942c	Fix db_bench's multireadrandom Summary: multireadrandom is broken. Fix it Test Plan: run it and see segfault has gone. Reviewers: ljin Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17781	2014-04-14 15:43:34 -07:00
sdong	d5e087b6df	db_bench: add a mode to operate multiple DBs Summary: This patch introduces a new parameter num_multi_db in db_bench. When this parameter is larger than 1, multiple DBs will be created. In all benchmarks, any operation applies to a random DB among them. This is to benchmark the performance of similar applications. Test Plan: run db_bench on both of num_multi_db=0 and more. Reviewers: haobo, ljin, igor Reviewed By: igor CC: igor, yhchiang, dhruba, nkg-, leveldb Differential Revision: https://reviews.facebook.net/D17769	2014-04-11 16:59:08 -07:00
Lei Jin	0af36d6aa6	SeekRandomWhileWriting Summary: as title Test Plan: ran it Reviewers: igor, haobo, yhchiang Reviewed By: yhchiang CC: leveldb Differential Revision: https://reviews.facebook.net/D17751	2014-04-11 09:47:20 -07:00
Lei Jin	7a92537fc4	db_bench: add IteratorCreationWhileWriting mode and allow prefix_seek Summary: as title Test Plan: ran it Reviewers: igor, haobo, yhchiang Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17655	2014-04-10 10:15:59 -07:00
Igor Canadi	4daea66343	Turn on -Wmissing-prototypes Summary: Compiling for iOS has by default turned on -Wmissing-prototypes, which causes rocksdb to fail compiling. This diff turns on -Wmissing-prototypes in our compile options and cleans up all functions with missing prototypes. Test Plan: compiles Reviewers: dhruba, haobo, ljin, sdong Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17649	2014-04-09 21:17:14 -07:00
Lei Jin	4824014e3b	speed up db_bench filluniquerandom mode Summary: filluniquerandom is painfully slow due to the naive bitmap check to find out if a key has been seen before. Majority of time is spent on searching the last few keys. Split a giant BitSet to smaller ones so that we can quickly check if a BitSet is full and thus can skip quickly. It used to take over one hour to filluniquerandom for 100M keys, now it takes about 10 mins. Test Plan: unit test also verified correctness in db_bench and make sure all keys are generated Reviewers: igor, haobo, yhchiang Reviewed By: igor CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D17607	2014-04-09 11:25:21 -07:00
Igor Canadi	34455deb06	Fix Mac OS compile issues	2014-04-08 14:05:53 -07:00
Lei Jin	0c1126d4cf	db_bench cleanup Summary: clean up the db_bench a little bit. also avoid allocating memory for key in the loop Test Plan: I verified a run with filluniquerandom & readrandom. Iterator seek will be used lot to measure performance. Will fix whatever comes up Reviewers: haobo, igor, yhchiang Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17559	2014-04-08 11:21:09 -07:00
Lei Jin	c90d446ee7	make hash_link_list Node's key space consecutively followed at the end Summary: per sdong's request, this will help processor prefetch on n->key case. Test Plan: make all check Reviewers: sdong, haobo, igor Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D17415	2014-04-04 15:37:28 -07:00
Lei Jin	0d755fff14	cache friendly blocked bloomfilter Summary: By constraining the probes within cache line(s), we can improve the cache miss rate thus performance. This probably only makes sense for in-memory workload so defaults the option to off. Numbers and comparision can be found in wiki: https://our.intern.facebook.com/intern/wiki/index.php/Ljin/rocksdb_perf/2014_03_17#Bloom_Filter_Study Test Plan: benchmarked this change substantially. Will run make all check as well Reviewers: haobo, igor, dhruba, sdong, yhchiang Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17133	2014-03-28 09:21:20 -07:00
Lei Jin	8d007b4aaf	Consolidate SliceTransform object ownership Summary: (1) Fix SanitizeOptions() to also check HashLinkList. The current dynamic case just happens to work because the 2 classes have the same layout. (2) Do not delete SliceTransform object in HashSkipListFactory and HashLinkListFactory destructor. Reason: SanitizeOptions() enforces prefix_extractor and SliceTransform to be the same object when HashFactory is used. This makes the behavior strange: when HashFactory is used, prefix_extractor will be released by RocksDB. If other memtable factory is used, prefix_extractor should be released by user. Test Plan: db_bench && make asan_check Reviewers: haobo, igor, sdong Reviewed By: igor CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D16587	2014-03-10 12:56:46 -07:00
Lei Jin	04298f8c33	output perf_context in db_bench readrandom Summary: Add helper function to print perf context data in db_bench if enabled. I didn't find any code that actually exports perf context data. Not sure if I missed anything Test Plan: ran db_bench Reviewers: haobo, sdong, igor Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D16575	2014-03-05 10:32:54 -08:00
Lei Jin	64138b5d9c	fix db_bench to use HashSkipList for real Summary: For HashSkipList case, DBImpl has sanity check to see if prefix_extractor in options is the same as the one in memtable factory. If not, it falls back to SkipList. As result, I was experimenting with SkipList performance. No wonder it is much worse than LinkedList Test Plan: ran benchmark Reviewers: haobo, sdong, igor Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D16569	2014-03-05 10:28:53 -08:00
Lei Jin	51560ba755	config max_background_flushes in db_bench Summary: as title Test Plan: make release Reviewers: haobo, sdong, igor Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16437	2014-03-05 10:27:17 -08:00
Lei Jin	a5b1d2f146	make key evenly distributed between 0 and FLAGS_num Summary: The issue is that when FLAGS_num is small, the leading bytes of the key are padded with 0s. This makes all keys have the same prefix 00000000 Most of the changes are just to make lint happy Test Plan: ran db_bench Reviewers: sdong, haobo, igor Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D16317	2014-03-04 17:08:05 -08:00
Lei Jin	dea894ef8d	expose wal_dir in db_bench Summary: as title Test Plan: ran db_bench Reviewers: dhruba, haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16269	2014-02-25 10:43:46 -08:00
Lei Jin	28b7f7faa8	enable plain table in db_bench Summary: as title Test Plan: ran db_bench to gather stats Reviewers: haobo, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16059	2014-02-12 10:41:55 -08:00
Albert Strasheim	df2f92214a	Support for LZ4 compression.	2014-02-08 14:15:51 -08:00
Igor Canadi	1560bb913e	Readrandom with tailing iterator Summary: Added an option for readrandom benchmark to run with tailing iterator instead of Get. Benefit of tailing iterator is that it doesn't require locking DB mutex on access. I also have some results when running on my machine. The results highly depend on number of cache shards. With our current benchmark setting of 4 table cache shards and 6 block cache shards, I don't see much improvements of using tailing iterator. In that case, we're probably seeing cache mutex contention. Here are the results for different number of shards cache shards tailing iterator get 6 1.38M 1.16M 10 1.58M 1.15M As soon as we get rid of cache mutex contention, we're seeing big improvements in using tailing iterator vs. ordinary get. Test Plan: ran regression test Reviewers: dhruba, haobo, ljin, kailiu, sding Reviewed By: haobo CC: tnovak Differential Revision: https://reviews.facebook.net/D15867	2014-02-07 09:47:47 -08:00
kailiu	84f8185fc0	Merge branch 'master' into performance Conflicts: HISTORY.md db/db_impl.cc db/memtable.cc	2014-02-05 21:21:00 -08:00

1 2 3 4 5 ...

311 Commits