rocksdb

Author	SHA1	Message	Date
Lei Jin	d650612c4c	expose RateLimiter definition Summary: User gets undefinied error since the definition is not exposed. Also re-enable the db test with only upper bound check Test Plan: db_test, rate_limit_test Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20403	2014-07-25 15:17:06 -07:00
Yueh-Hsuan Chiang	6480717a26	Fixed compaction-related errors where number of input levels are hard-coded. Summary: Fixed compaction-related errors where number of input levels are hard-coded. It's a bug found in compaction branch. This diff will be pushed into master. Test Plan: export ROCKSDB_TESTS=Compact make db_test -j32 ./db_test also passed the tests in compaction branch Reviewers: igor, sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20577	2014-07-24 17:06:00 -07:00
Igor Canadi	41a697256f	NewIterators in read-only mode Summary: As title. Test Plan: Added test to column_family_test Reviewers: ljin, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20523	2014-07-23 16:52:11 -04:00
Feng Zhu	da9274574f	Use IterKey instead of string in Block::Iter to reduce malloc Summary: Modify a functioin TrimAppend in dbformat.h: IterKey. Write a test for it in dbformat_test Use IterKey in block::Iter to replace std::string to reduce malloc. Evaluate it using perf record. malloc: 4.26% -> 2.91% free: 3.61% -> 3.08% Test Plan: make all check ./valgrind db_test dbformat_test Reviewers: ljin, haobo, yhchiang, dhruba, igor, sdong Reviewed By: sdong Differential Revision: https://reviews.facebook.net/D20433	2014-07-23 12:31:11 -07:00
Yueh-Hsuan Chiang	0e1b4787ed	Fixed a bug in Compaction.cc where input_levels_ was not properly resized. Summary: Fixed a bug in Compaction.cc where input_levels_ was not properly resized. Without this fix, there would be invalid access in input_levels_ when more than two levels are involved in one compaction run. This fix will go to master instead of compaction branch. Test Plan: tested in compaction branch. Reviewers: ljin, sdong, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20481	2014-07-23 10:22:21 -07:00
Igor Canadi	0ff183a0d9	Move include/utilities/.h to include/rocksdb/utilities/.h Summary: All public headers need to be under `include/rocksdb` directory. Otherwise, clients include our header files like this: #include <rocksdb/db.h> #include <utilities/backupable_db.h> // still our public header! Also, internally, we include: #include "utilities/backupable/backupable_db.h" // internal header #include "utilities/backupable_db.h" // public header which is confusing. This way, when we install rocksdb as a system library, we can just copy `include/rocksdb` directory to system's header files. We can't really copy `utilities` directory to system's header files. Test Plan: compiles Reviewers: dhruba, ljin, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20409	2014-07-23 10:21:38 -04:00
sdong	f6b7e1ed1a	Allow user to specify DB path of output file of manual compaction Summary: Add a parameter path_id to DB::CompactRange(), to indicate where the output file should be placed to. Test Plan: add a unit test Reviewers: yhchiang, ljin Reviewed By: ljin Subscribers: xjin, igor, dhruba, MarkCallaghan, leveldb Differential Revision: https://reviews.facebook.net/D20085	2014-07-21 19:06:00 -07:00
Lei Jin	f6f1533c6f	make internal stats independent of statistics Summary: also make it aware of column family output from db_bench ``` Compaction Stats [default] Level Files Size(MB) Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) RW-Amp W-Amp Rd(MB/s) Wr(MB/s) Rn(cnt) Rnp1(cnt) Wnp1(cnt) Wnew(cnt) Comp(sec) Comp(cnt) Avg(sec) Stall(sec) Stall(cnt) Avg(ms) ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ L0 14 956 0.9 0.0 0.0 0.0 2.7 2.7 0.0 0.0 0.0 111.6 0 0 0 0 24 40 0.612 75.20 492387 0.15 L1 21 2001 2.0 5.7 2.0 3.7 5.3 1.6 5.4 2.6 71.2 65.7 31 43 55 12 82 2 41.242 43.72 41183 1.06 L2 217 18974 1.9 16.5 2.0 14.4 15.1 0.7 15.6 7.4 70.1 64.3 17 182 185 3 241 16 15.052 0.00 0 0.00 L3 1641 188245 1.8 9.1 1.1 8.0 8.5 0.5 15.4 7.4 61.3 57.2 9 75 76 1 152 9 16.887 0.00 0 0.00 L4 4447 449025 0.4 13.4 4.8 8.6 9.1 0.5 4.7 1.9 77.8 52.7 38 79 100 21 176 38 4.639 0.00 0 0.00 Sum 6340 659201 0.0 44.7 10.0 34.7 40.6 6.0 32.0 15.2 67.7 61.6 95 379 416 37 676 105 6.439 118.91 533570 0.22 Int 0 0 0.0 1.2 0.4 0.8 1.3 0.5 5.2 2.7 59.1 65.6 3 7 9 2 20 10 2.003 0.00 0 0.00 Stalls(secs): 75.197 level0_slowdown, 0.000 level0_numfiles, 0.000 memtable_compaction, 43.717 leveln_slowdown Stalls(count): 492387 level0_slowdown, 0 level0_numfiles, 0 memtable_compaction, 41183 leveln_slowdown DB Stats Uptime(secs): 202.1 total, 13.5 interval Cumulative writes: 6291456 writes, 6291456 batches, 1.0 writes per batch, 4.90 ingest GB Cumulative WAL: 6291456 writes, 6291456 syncs, 1.00 writes per sync, 4.90 GB written Interval writes: 1048576 writes, 1048576 batches, 1.0 writes per batch, 836.0 ingest MB Interval WAL: 1048576 writes, 1048576 syncs, 1.00 writes per sync, 0.82 MB written Test Plan: ran it Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19917	2014-07-21 12:57:29 -07:00
Feng Zhu	50c2dcb78f	add options.block_restart_interval in db_bench Summary: Add block_restart_interval in db_bench, default value 16 Test Plan: make Reviewers: sdong Reviewed By: sdong Differential Revision: https://reviews.facebook.net/D20331	2014-07-21 12:01:40 -07:00
Chilledheart	54f4e2f188	Fix clang compiler warnings	2014-07-20 22:57:20 +08:00
Stanislau Hlebik	9d70cce047	Adding option to save PlainTable index and bloom filter in SST file. Summary: Adding option to save PlainTable index and bloom filter in SST file. If there is no bloom block and/or index block, PlainTableReader builds new ones. Otherwise PlainTableReader just use these blocks. Test Plan: make all check Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19527	2014-07-18 16:58:13 -07:00
Stanislau Hlebik	92d73cbe78	Add PlainTableOptions Summary: Since we have a lot of options for PlainTable, add a struct PlainTableOptions to manage them Test Plan: make all check Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20175	2014-07-18 00:08:38 -07:00
Yueh-Hsuan Chiang	052ddbe0e2	Add MaxInputLevel() to CompactionPicker Summary: Having if-then branch for different compaction strategies is considered hacky and make CompactionPicker less pluggable. This diff removes two of such if-then branches in version_set.cc by adding MaxInputLevel() to CompactionPicker. // Given the current number of levels, returns the lowest allowed level // for compaction input. virtual int MaxInputLevel(int current_num_levels) const; Test Plan: make db_test export ROCKSDB_TESTS=Compaction ./db_test Reviewers: igor, sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19971	2014-07-17 18:01:04 -07:00
Yueh-Hsuan Chiang	aac941b3f0	Fixed a signed and unsigned comparison in Compaction Summary: Fixed a signed and unsigned comparison in Compaction Test Plan: make db_test export ROCKSDB_TESTS=Compaction ./db_test	2014-07-17 16:37:25 -07:00
Radheshyam Balasundaram	0d57e3ad7d	Guarding files_ attribute with #ifndef NDEBUG guard in FilePicker class. Summary: Adding guards to files_ attribute of FilePicker class. This attribute is used only in DEBUG mode. This fixes build of static_lib in mac. Test Plan: make static_lib in mac make check all in devserver Reviewers: ljin, igor, sdong Reviewed By: sdong Differential Revision: https://reviews.facebook.net/D20163	2014-07-17 15:07:05 -07:00
Yueh-Hsuan Chiang	3178510153	Allow class Compaction to handle input files from multiple levels. Summary: Allow class Compaction to handle input files from multiple levels. This diff is a subset of https://reviews.facebook.net/D19263 where only db/compaction.cc and db/compaction.h are changed. Test Plan: make db_test export ROCKSDB_TESTS=Compaction ./db_test Reviewers: igor, sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19923	2014-07-17 14:36:41 -07:00
Yueh-Hsuan Chiang	296e340753	Add struct CompactionInputFiles to manage compaction input files. Summary: Add struct CompactionInputFiles to manage compaction input files. Test Plan: export ROCKSDB_TESTS=Compact make db_test ./db_test Reviewers: ljin, igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20061	2014-07-16 18:12:17 -07:00
Feng Zhu	bc6b2ab401	enable kHashSearch for blocktable in db_bench Summary: add a flag called use_hash_search in db_bench Test Plan: make all check ./db_bench --use_hash_search=1 Reviewers: ljin, haobo, yhchiang, sdong Reviewed By: sdong Subscribers: igor, dhruba Differential Revision: https://reviews.facebook.net/D20067	2014-07-16 17:32:30 -07:00
Feng Zhu	87895c62db	fix bug in LOG for flush memtable Summary: One line change to fix a bug in the LOG when flush memtable Test Plan: NONE Reviewers: sdong Reviewed By: sdong Differential Revision: https://reviews.facebook.net/D20049	2014-07-16 16:56:49 -07:00
Stanislau Hlebik	1c9f190ae3	Fix db_test Summary: Added deletion of DBIterators in DBIterator's tests Test Plan: make valgrind_check Reviewers: igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20043	2014-07-16 14:51:43 -07:00
Radheshyam Balasundaram	0418e66e2a	Refactoring Version::Get() Summary: Refactoring Version::Get() method to move file picker logic to a separate class. Test Plan: make check all Reviewers: igor, sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19713	2014-07-16 13:33:02 -07:00
Feng Zhu	c11d604ab3	store file_indexer info in sequential memory Summary: use arena to allocate space for next_level_index_ and level_rb_ Thus increasing data locality and make Version::Get faster. Benchmark detail Base version: commit `d2a727c182` command used: ./db_bench --db=/mnt/db/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --block_size=4096 --cache_size=17179869184 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=2097152 --max_bytes_for_level_base=1073741824 --disable_wal=0 --sync=0 --disable_data_sync=1 --verify_checksum=1 --delete_obsolete_files_period_micros=314572800 --max_grandparent_overlap_factor=10 --max_background_compactions=4 --max_background_flushes=0 --level0_slowdown_writes_trigger=16 --level0_stop_writes_trigger=24 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --mmap_read=1 --mmap_write=0 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --perf_level=0 --benchmarks=fillseq, readrandom,readrandom,readrandom --use_existing_db=0 --num=52428800 --threads=1 Result: cpu running percentage: Version::Get, improved from 7.98% to 7.42% FileIndexer::GetNextLevelIndex, improved from 1.18% to 0.68%. Test Plan: make all check Reviewers: ljin, haobo, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, igor Differential Revision: https://reviews.facebook.net/D19845	2014-07-16 11:21:30 -07:00
Stanislau Hlebik	d916593ead	Add Prev() for merge operator Summary: Implement Prev() with merge operator for DBIterator. Request from mongoDB. Task 4673663. Test Plan: make all check Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19743	2014-07-15 16:10:18 -07:00
sdong	0abaed2e08	Support multiple DB directories in universal compaction style Summary: This patch adds a target size parameter in options.db_paths and universal compaction will base it to determine which DB path to place a new file. Level-style stays the same. Test Plan: Add new unit tests Reviewers: ljin, yhchiang Reviewed By: yhchiang Subscribers: MarkCallaghan, dhruba, igor, leveldb Differential Revision: https://reviews.facebook.net/D19869	2014-07-15 12:06:28 -07:00
Igor Canadi	20c056306b	Remove stats logger Summary: Browsing through the code, looks like StatsLogger is not used at all! Test Plan: compiles Reviewers: ljin, sdong, yhchiang, dhruba Reviewed By: dhruba Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D19827	2014-07-15 09:16:32 -04:00
Igor Canadi	d2a727c182	BG -> GB	2014-07-14 09:06:38 -07:00
Igor Canadi	ee6b35e55a	Fix mac compile Summary: We should use PRIu64 instead of "%lu" for portability Test Plan: compiles now Reviewers: ljin, dhruba Reviewed By: dhruba Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19809	2014-07-14 09:56:52 -04:00
Lei Jin	46f0f6ddd5	improve InternalStats output Summary: as title Test Plan: sampe output: Level Files Size(MB) Score Read(GB) Rn(GB) Rnp1(GB) Write(BG) Wnew(GB) RW-Amp W-Amp Rd(MB/s) Wr(MB/s) Rn(cnt) Rnp1(cnt) Wnp1(cnt) Wnew(cnt) Comp(sec) Comp(cnt) Avg(sec) Stall(sec) Stall(cnt) Avg(ms) ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ L0 15 1024 1.0 0.0 0.0 0.0 8.2 8.2 0.0 0.0 0.0 111.4 0 0 1 1 75 123 0.612 295.94 1939238 0.15 L1 23 2118 2.1 20.9 8.3 12.7 20.0 7.3 5.0 2.4 73.2 69.9 124 141 208 67 293 8 36.582 17.05 16100 1.06 L2 162 15333 1.5 47.0 7.1 40.0 42.6 2.6 12.7 6.0 67.9 61.5 62 457 482 25 709 55 12.898 0.00 0 0.00 L3 985 108065 1.1 37.8 4.0 33.9 36.9 3.0 18.8 9.3 60.1 58.5 41 338 363 25 645 31 20.812 0.00 0 0.00 L4 2788 356033 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 0.000 0.00 0 0.00 Sum 3973 482572 0.0 105.8 19.3 86.5 107.7 21.2 11.1 5.6 62.9 64.0 227 936 1054 118 1723 217 7.938 312.99 1955338 0.16 Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19707	2014-07-11 15:03:30 -07:00
Feng Zhu	178fd6f9db	use FileLevel in LevelFileNumIterator Summary: Use FileLevel in LevelFileNumIterator, thus use new version of findFile. Old version of findFile function is deleted. Write a function in version_set.cc to generate FileLevel from files_. Add GenerateFileLevelTest in version_set_test.cc Test Plan: make all check Reviewers: ljin, haobo, yhchiang, sdong Reviewed By: sdong Subscribers: igor, dhruba Differential Revision: https://reviews.facebook.net/D19659	2014-07-11 12:52:41 -07:00
Tomislav Novak	3b97ee96c4	ForwardIterator seek bugfix Summary: If `NeedToSeekImmutable()` returns false, `SeekInternal()` won't reset the contents of `immutable_min_heap_`. However, since it calls `UpdateCurrent()` unconditionally, if `current_` is one of immutable iterators (previously popped from `immutable_min_heap_`), `UpdateCurrent()` will overwrite it. As a result, if old `current_` in fact pointed to the smallest entry, forward iterator will skip some records. Fix implemented in this diff pushes `current_` back to `immutable_min_heap_` before calling `UpdateCurrent()`. Test Plan: New unit test (courtesy of @lovro): $ ROCKSDB_TESTS=TailingIteratorSeekToSame ./db_test Reviewers: igor, dhruba, haobo, ljin Reviewed By: ljin Subscribers: lovro, leveldb Differential Revision: https://reviews.facebook.net/D19653	2014-07-10 16:46:13 -07:00
Reed Allman	1fc71a4b16	C API: create missing cf's, cleanup	2014-07-10 12:55:53 -07:00
Tomislav Novak	105c1e099b	ForwardIterator::status() checks all child iterators Summary: Forward iterator only checked `status_` and `mutable_iter_->status()`, which is not sufficient. For example, when reading exclusively from cache (kBlockCacheTier), `mutable_iter_->status()` may return kOk (e.g. there's nothing in the memtable), but one of immutable iterators could be in kIncomplete. In this case, `ForwardIterator::status()` ought to return that status instead of kOk. This diff changes `status()` to also check `imm_iters_`, `l0_iters_`, and `level_iters_`. Test Plan: ROCKSDB_TESTS=TailingIteratorIncomplete ./db_test Reviewers: ljin, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D19581	2014-07-10 12:43:12 -07:00
sdong	36de0e5359	Add a function to return current perf level Summary: Add a function to return the perf level. It is to allow a wrapper of DB to increase the perf level and restore the original perf level after finishing the function call. Test Plan: Add a verification in db_test Reviewers: yhchiang, igor, ljin Reviewed By: ljin Subscribers: xjin, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D19551	2014-07-10 11:35:48 -07:00
Stanislau Hlebik	30c81e7717	Removing NewTotalOrderPlainTableFactory Summary: Seems like NewTotalOrderPlainTableFactory is useless and is semantically incorrect. Total order mode indicator is prefix_extractor == nullptr, but NewTotalOrderPlainTableFactory doesn't set it to be nullptr. That's why some tests in plain_table_db_tests is incorrect. Test Plan: make all check Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19587	2014-07-10 11:32:04 -07:00
Igor Canadi	f0a8be253e	JSON (Document) API sketch Summary: This is a rough sketch of our new document API. Would like to get some thoughts and comments about the high-level architecture and API. I didn't optimize for performance at all. Leaving some low-hanging fruit so that we can be happy when we fix them! :) Currently, bunch of features are not supported at all. Indexes can be only specified when creating database. There is no query planner whatsoever. This will all be added in due time. Test Plan: Added a simple unit test Reviewers: haobo, yhchiang, dhruba, sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18747	2014-07-10 09:31:42 -07:00
Feng Zhu	222cf2555a	change the init parameter for FileDescriptor Summary: fix a bug in improve_file_key_search, change the parameter for FileDescriptor Test Plan: make all check Reviewers: sdong Reviewed By: sdong Differential Revision: https://reviews.facebook.net/D19611	2014-07-09 23:40:03 -07:00
Lei Jin	8a7d1fe616	disable rate limiter test Summary: The test is not stable because it relies on disk and only runs for a short period of time. So misisng a compaction/flush would greatly affect the rate. I am disabling it for now. What do you guys think? Test Plan: make Reviewers: yhchiang, igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19599	2014-07-09 22:46:15 -07:00
Feng Zhu	f697cad15c	create compressed_levels_ in Version, allocate its space using arena. Make Version::Get, Version::FindFile faster Summary: Define CompressedFileMetaData that just contains fd, smallest_slice, largest_slice. Create compressed_levels_ in Version, the space is allocated using arena Thus increase the file meta data locality, speed up "Get" and "FindFile" benchmark with in-memory tmpfs, could have 4% improvement under "random read" and 2% improvement under "read while writing" benchmark command: ./db_bench --db=/mnt/db/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --block_size=4096 --cache_size=17179869184 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --disable_wal=0 --sync=0 --disable_data_sync=1 --verify_checksum=1 --delete_obsolete_files_period_micros=314572800 --max_grandparent_overlap_factor=10 --max_background_compactions=4 --max_background_flushes=0 --level0_slowdown_writes_trigger=16 --level0_stop_writes_trigger=24 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --mmap_read=1 --mmap_write=0 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --perf_level=0 --benchmarks=readwhilewriting,readwhilewriting,readwhilewriting --use_existing_db=1 --num=52428800 --threads=1 —writes_per_second=81920 Read Random: From 1.8363 ms/op, improve to 1.7587 ms/op. Read while writing: From 2.985 ms/op, improve to 2.924 ms/op. Test Plan: make all check Reviewers: ljin, haobo, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, igor Differential Revision: https://reviews.facebook.net/D19419	2014-07-09 22:14:39 -07:00
Yueh-Hsuan Chiang	70828557ef	Some fixes on size compensation logic for deletion entry in compaction Summary: This patch include two fixes: 1. newly created Version will now takes the aggregated stats for average-value-size from the latest Version. 2. compensated size of a file is now computed only for newly created / loaded file, this addresses the issue where files are already sorted by their compensated file size but might sometimes observe some out-of-order due to later update on compensated file size. Test Plan: export ROCKSDB_TESTS=CompactionDele ./db_test Reviewers: ljin, igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19557	2014-07-09 12:46:08 -07:00
Lei Jin	ef1aad97f9	fix one more internal_stats issue Summary: stall count is wrong Test Plan: make release Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19539	2014-07-08 15:29:13 -07:00
Lei Jin	73d7147096	make rate limiter test more reliable Summary: Randomize keys so that compaction actually happens. Change the config so that compaction happens more aggressively. The test takes longer time, but the results are more stable shown by iostat Test Plan: ran it Reviewers: igor, yhchiang Reviewed By: yhchiang Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19533	2014-07-08 15:15:00 -07:00
Lei Jin	8a9cc7885c	report correct interval amplification Summary: as title Test Plan: make release Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19515	2014-07-08 12:48:10 -07:00
Lei Jin	534357ca3a	integrate rate limiter into rocksdb Summary: Add option and plugin rate limiter for PosixWritableFile. The rate limiter only applies to flush and compaction. WAL and MANIFEST are excluded from this enforcement. Test Plan: db_test Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19425	2014-07-08 12:31:49 -07:00
Lei Jin	b278ae8e50	Apply fractional cascading in ForwardIterator::Seek() Summary: Use search hint to reduce FindFile range thus avoid comparison For a small DB with 50M keys, perf_context counter shows it reduces comparison from 2B to 1.3B for a 15-minute run. No perf change was observed for 1 seek thread, but quite good improvement was seen for 32 seek threads, when CPU was busy. will post detail results when ready Test Plan: db_bench and db_test Reviewers: haobo, sdong, dhruba, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18879	2014-07-08 11:40:42 -07:00
Igor Canadi	1b95bf734d	Merge pull request #197 from rdallman/update-options C API: update options w/ convenience funcs & fifo compaction	2014-07-08 11:40:11 -07:00
Reed Allman	fd3fb4b0bf	C API: update options w/ convenience funcs & fifo compaction	2014-07-08 10:57:45 -07:00
Reed Allman	e9b18b6b89	C API: bugfix column_family_comact_range	2014-07-07 21:48:49 -07:00
Igor Canadi	4adf64e068	Fix compile issue	2014-07-07 14:54:11 -07:00
Igor Canadi	8a03935f8c	Fix valgrind error in c_test Summary: External contribution caused some valgrind errors: `1a34aaaef0` This diff fixes them Test Plan: ran valgrind Reviewers: sdong, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19485	2014-07-07 14:41:54 -07:00
Evan Shaw	13a130cc00	C API: Add test for compaction filter factories Also refactored the compaction filter tests to share some code and ensure that options were getting reset so future test results aren't confused.	2014-07-08 08:12:36 +12:00
Evan Shaw	3f7104d7c5	C API: Allow setting compaction filter factory	2014-07-08 08:12:36 +12:00
Evan Shaw	91bede79cc	C API: Add support for compaction filter factories (v1)	2014-07-08 08:12:36 +12:00
Radheshyam Balasundaram	f0660d5253	Adding NUMA support to db_bench tests Summary: Changes: - Adding numa_aware flag to db_bench.cc - Using numa.h library to bind memory and cpu of threads to a fixed NUMA node Result: There seems to be no significant change in the micros/op time with numa_aware enabled. I also tried this with other implementations, including a combination of pthread_setaffinity_np, sched_setaffinity and set_mempolicy methods. It'd be great if someone could point out where I'm going wrong and if we can achieve a better micors/op. Test Plan: Ran db_bench tests using following command: ./db_bench --db=/mnt/tmp --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --block_size=4096 --cache_size=17179869184 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=134217728 --max_bytes_for_level_base=1073741824 --disable_wal=0 --wal_dir=/mnt/tmp --sync=0 --disable_data_sync=1 --verify_checksum=1 --delete_obsolete_files_period_micros=314572800 --max_grandparent_overlap_factor=10 --max_background_compactions=4 --max_background_flushes=0 --level0_slowdown_writes_trigger=16 --level0_stop_writes_trigger=24 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --mmap_read=1 --mmap_write=0 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --perf_level=0 --duration=300 --benchmarks=readwhilewriting --use_existing_db=1 --num=157286400 --threads=24 --writes_per_second=10240 --numa_aware=[False/True] The tests were run in private devserver with 24 cores and the db was prepopulated using filluniquerandom test. The tests resulted in 0.145 us/op with numa_aware=False and 0.161 us/op with numa_aware=True. Reviewers: sdong, yhchiang, ljin, igor Reviewed By: ljin, igor Subscribers: igor, leveldb Differential Revision: https://reviews.facebook.net/D19353	2014-07-07 10:53:31 -07:00
Igor Canadi	0bc5fa9f40	Merge pull request #190 from edsrzf/c-api-writebatch-serialized C API: support constructing write batch from serialized representation	2014-07-07 10:17:43 -07:00
Reed Allman	1a34aaaef0	C API: column family support	2014-07-07 01:41:01 -07:00
Evan Shaw	9fc23d0c56	C API: support constructing write batch from serialized representation	2014-07-06 10:36:33 +12:00
Yueh-Hsuan Chiang	7b85c1e900	Improve SimpleWriteTimeoutTest to avoid false alarm. Summary: SimpleWriteTimeoutTest has two parts: 1) insert two large key/values to make memtable full and expect both of them are successful; 2) insert another key / value and expect it to be timed-out. Previously we also set a timeout in the first step, but this might sometimes cause false alarm. This diff makes the first two writes run without timeout setting. Test Plan: export ROCKSDB_TESTS=Time make db_test Reviewers: sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19461	2014-07-04 00:02:12 -07:00
Yueh-Hsuan Chiang	d33657a4a5	Fixed a warning in release mode. Summary: Removed a variable that is only used in assertion check. Test Plan: make release Reviewers: ljin, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19455	2014-07-03 17:19:17 -07:00
Yueh-Hsuan Chiang	90a6aca48e	Finer report I/O stats about Flush and Compaction. Summary: This diff allows the I/O stats about Flush and Compaction to be reported in a more accurate way. Instead of measuring the size of a file, it measure I/O cost in per read / write basis. Test Plan: make all check Reviewers: sdong, igor, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19383	2014-07-03 16:28:03 -07:00
Yueh-Hsuan Chiang	d4d338de33	Add timeout_hint_us to WriteOptions and introduce Status::TimeOut. Summary: This diff adds timeout_hint_us to WriteOptions. If it's non-zero, then 1) writes associated with this options MAY be aborted when it has been waiting for longer than the specified time. If an abortion happens, associated writes will return Status::TimeOut. 2) the stall time of the associated write caused by flush or compaction will be limited by timeout_hint_us. The default value of timeout_hint_us is 0 (i.e., OFF.) The statistics of timeout writes will be recorded in WRITE_TIMEDOUT. Test Plan: export ROCKSDB_TESTS=WriteTimeoutAndDelayTest make db_test ./db_test Reviewers: igor, ljin, haobo, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18837	2014-07-03 15:47:02 -07:00
Igor Canadi	4203431e71	Fix mac os compile error	2014-07-03 23:03:24 +02:00
sdong	2459f7ec4e	Support Multiple DB paths (without having an interface to expose to users) Summary: In this patch, we allow RocksDB to support multiple DB paths internally. No user interface is supported yet so this patch is silent to users. Test Plan: make all check Reviewers: igor, haobo, ljin, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18921	2014-07-02 21:14:44 -07:00
Igor Canadi	f146cab261	Centralize compression decision to compaction picker Summary: Before this diff, we're deciding enable_compression in CompactionPicker and then we're deciding final compression type in DBImpl. This is kind of confusing. After the diff, the final compression type will be decided in CompactionPicker. The reason for this is that I want CompactFiles() to specify output compression type, so that people can mix and match compression styles in their compaction algorithms. This diff makes it much easier to do that. Test Plan: make check Reviewers: dhruba, haobo, sdong, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19137	2014-07-02 20:40:57 +02:00
sdong	1d05006740	Re-commit the correct part (WalDir) of the revision: Commit `6634844dba` by sdong Two small fixes in db_test Summary: Two fixes: (1) WalDir to pick a directory under TmpDir to allow two tests running in parallel without impacting each other (2) kBlockBasedTableWithWholeKeyHashIndex is disabled by mistake (I assume). Enable it. Test Plan: ./db_test Reviewers: yhchiang, ljin Reviewed By: ljin Subscribers: nkg-, igor, dhruba, haobo, leveldb Differential Revision: https://reviews.facebook.net/D19389	2014-07-01 18:54:50 -07:00
sdong	30b20604db	Revert "Two small fixes in db_test" This reverts commit `6634844dba`.	2014-07-01 17:41:38 -07:00
sdong	9c332aa11a	HashLinkList memtable switches a bucket to a skip list to reduce performance outliers Summary: In this patch, we enhance HashLinkList memtable to reduce performance outliers when a bucket contains too many entries. We switch to skip list for this case to enable binary search. Add threshold_use_skiplist parameter to determine when a bucket needs to switch to skip list. The new data structure is documented in comments in the codes. Test Plan: make all check set threshold_use_skiplist in several tests Reviewers: yhchiang, haobo, ljin Reviewed By: yhchiang, ljin Subscribers: nkg-, xjin, dhruba, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D19299	2014-07-01 17:14:15 -07:00
sdong	6634844dba	Two small fixes in db_test Summary: Two fixes: (1) WalDir to pick a directory under TmpDir to allow two tests running in parallel without impacting each other (2) kBlockBasedTableWithWholeKeyHashIndex is disabled by mistake (I assume). Enable it. Test Plan: ./db_test Reviewers: yhchiang, ljin Reviewed By: ljin Subscribers: nkg-, igor, dhruba, haobo, leveldb Differential Revision: https://reviews.facebook.net/D19389	2014-07-01 16:58:03 -07:00
Igor Canadi	f5d4df1c02	Fix compile error	2014-07-01 10:55:03 +02:00
Igor Canadi	a2e0d890ed	No need for files_by_size_ in universal compaction Summary: files_by_size_ is sorted by time in case of universal compaction. However, Version::files_ is also sorted by time. So no need for files_by_size_ Test Plan: 1) make check with the change 2) make check with `assert(last_index == c->input_version_->files_[level].size() - 1);` in compaction picker Reviewers: dhruba, haobo, yhchiang, sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19125	2014-07-01 08:55:04 +02:00
Feng Zhu	5656367416	use arena to allocate memtable's bloomfilter and hashskiplist's buckets_ Summary: Bloomfilter and hashskiplist's buckets_ allocated by memtable's arena DynamicBloom: pass arena via constructor, allocate space in SetTotalBits HashSkipListRep: allocate space of buckets_ using arena. do not delete it in deconstructor because arena would take care of it. Several test files are changed. Test Plan: make all check Reviewers: ljin, haobo, yhchiang, sdong Reviewed By: sdong Subscribers: igor, dhruba Differential Revision: https://reviews.facebook.net/D19335	2014-06-30 15:54:31 -07:00
sdong	dd337bc0b2	In logging format, use PRIu64 instead of casting Summary: Code cleaning up, since we are already using __STDC_FORMAT_MACROS in printing uint64_t, change other places. Only logging is changed. Test Plan: make all check Reviewers: ljin Reviewed By: ljin Subscribers: dhruba, yhchiang, haobo, leveldb Differential Revision: https://reviews.facebook.net/D19113	2014-06-27 16:34:15 -07:00
Stanislau Hlebik	a3594867ba	Cache some conditions for DBImpl::MakeRoomForWrite Summary: Task 4580155. Some conditions in DBImpl::MakeRoomForWrite can be cached in ColumnFamilyData, because theirs value can be changed only during compaction, adding new memtable and/or add recalculation of compaction score. These conditions are: cfd->imm()->size() == cfd->options()->max_write_buffer_number - 1 cfd->current()->NumLevelFiles(0) >= cfd->options()->level0_stop_writes_trigger cfd->options()->soft_rate_limit > 0.0 && (score = cfd->current()->MaxCompactionScore()) > cfd->options()->soft_rate_limit cfd->options()->hard_rate_limit > 1.0 && (score = cfd->current()->MaxCompactionScore()) > cfd->options()->hard_rate_limit P.S. As it's my first diff, Siying suggested to add everybody as a reviewers for this diff. Sorry, if I forgot someone or add someone by mistake. Test Plan: make all check Reviewers: haobo, xjin, dhruba, yhchiang, zagfox, ljin, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19311	2014-06-26 16:45:27 -07:00
sdong	19de6a7aad	Remove MemTableRep::GetIterator(const Slice& slice) Summary: It seems to me that when ever function MemTableRep::GetIterator(const Slice& slice) is used, we can use MemTableRep::GetDynamicPrefixIterator() instead. Just delete it to simplify the codes. Test Plan: make all check Reviewers: yhchiang, ljin Reviewed By: ljin Subscribers: xjin, dhruba, haobo, leveldb Differential Revision: https://reviews.facebook.net/D19281	2014-06-25 14:09:29 -07:00
Yueh-Hsuan Chiang	8898a0a0d1	Reorder the member variables of FileMetaData to improve cache locality. Summary: Move stats related member variables of FileMetaData to the bottom to improve cache locality of normal DB operations. Test Plan: make Reviewers: haobo, ljin, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19287	2014-06-24 19:22:11 -06:00
Yueh-Hsuan Chiang	e813f5b6d9	Allow compaction to reclaim storage more effectively. Summary: This diff allows compaction to reclaim storage more effectively. In the current design, compactions are mainly triggered based on the file sizes. However, since deletion entries does not have value, files which have many deletion entries are less likely to be compacted. As a result, it may took a while to make deletion entries to be compacted. This diff address issue by compensating the size of deletion entries during compaction process: the size of each deletion entry in the compaction process is augmented by 2x average value size. The diff applies to both leveled and universal compacitons. Test Plan: develop CompactionDeletionTrigger make db_test ./db_test Reviewers: haobo, igor, ljin, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19029	2014-06-24 16:37:06 -06:00
Yueh-Hsuan Chiang	faa8d21922	Improve an assertion in RandomGenerator::Generate() in db_bench. Summary: RandomGenerator::Generate() currently has an assertion len < data_.size(). However, it is actually fine to have len == data_.size(). This diff change the assertion to len <= data_.size(). Test Plan: make db_bench ./db_bench Reviewers: haobo, sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19269	2014-06-24 15:29:28 -06:00
Lei Jin	3b0dc76699	db_bench: measure the real latency of write/delete Summary: as title Test Plan: make release Reviewers: haobo, sdong, yhchiang Reviewed By: yhchiang Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19227	2014-06-23 13:23:02 -07:00
Lei Jin	a1b5650a75	db_bench: sanity check on compression ratio Summary: as requested by mark Test Plan: make release Reviewers: sdong, haobo Reviewed By: haobo Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19221	2014-06-23 10:46:16 -07:00
Igor Canadi	d4a8423334	Remove seek compaction Summary: As discussed in our internal group, we don't get much use of seek compaction at the moment, while it's making code more complicated and slower in some cases. This diff removes seek compaction and (hopefully) all code that was introduced to support seek compaction. There is one test case that relied on didIO information. I'll try to find another way to implement it. Test Plan: make check Reviewers: sdong, haobo, yhchiang, ljin, dhruba Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19161	2014-06-20 10:23:02 +02:00
Igor Canadi	107e08baa7	Use same sorting for all level 0 files Summary: We decided that one of the long term goals is to unify level and universal compaction. As a small first step, I'm unifying level 0 sorting methods. Previously, we used to sort level 0 files in level compaction by file number and in universal compaction by sequence number. But it turns out that in level compaction, sorting by file number is exactly the same as sorting by sequence number. Test Plan: Ran make check with bunch of asserts to verify the sorting order is exactly the same. Also, make check with this patch Reviewers: haobo, yhchiang, ljin, dhruba, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19131	2014-06-20 09:12:14 +02:00
Haobo Xu	7a9dd5f214	[RocksDB] Make block based table hash index more adaptive Summary: Currently, RocksDB returns error if a db written with prefix hash index, is later opened without providing a prefix extractor. This is uncessarily harsh. Without a prefix extractor, we could always fallback to the normal binary index. Test Plan: unit test, also manually veried LOG that fallback did occur. Reviewers: sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19191	2014-06-19 16:40:32 -07:00
Yueh-Hsuan Chiang	4f5ccfd179	Fixed a potential write hang Summary: Currently, when something badly happen in the DB::Write() while the write-queue contains more than one element, the current design seems to forget to clean up the queue as well as wake-up all the writers, this potentially makes rocksdb hang on writes. Test Plan: make all check Reviewers: sdong, ljin, igor, haobo Reviewed By: haobo Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19167	2014-06-19 14:53:03 -07:00
Igor Canadi	bae495740d	Merge pull request #179 from edsrzf/c-api-compaction-filter Support for compaction filters in the C API	2014-06-19 21:22:46 +02:00
Lei Jin	c4e90c79ed	bug fix: iteration over ColumnFamilySet needs to be under mutex Summary: asan_crash_test is failing on segfault Test Plan: running asan_crash_test Reviewers: sdong, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19149	2014-06-19 09:31:14 -07:00
Evan Shaw	5363eb8ad4	Add a test for using compaction filters via the C API	2014-06-19 21:46:58 +12:00
Evan Shaw	d72313a7fa	Add a way to set compaction filter in the C API	2014-06-19 16:31:24 +12:00
Evan Shaw	df2701373d	Support for compaction filters in the C API	2014-06-19 16:31:17 +12:00
sdong	edd47c5104	PlainTable to encode to avoid to rewrite prefix when it is the same as the previous key Summary: Add a encoding feature of PlainTable to encode PlainTable's keys to save some bytes for the same prefixes. The data format is documented in table/plain_table_factory.h Test Plan: Add unit test coverage in plain_table_db_test Reviewers: yhchiang, igor, dhruba, ljin, haobo Reviewed By: haobo Subscribers: nkg-, leveldb Differential Revision: https://reviews.facebook.net/D18735	2014-06-18 20:41:52 -07:00
Igor Canadi	3525aac9e5	Change order of parameters in adaptive table factory Summary: This is minor, but if we put the writing talbe factory as the third parameter, when we add a new table format, we'll have a situation: 1) block based factory 2) plain table factory 3) output factory 4) new format factory I think it makes more sense to have output as the first parameter. Also, fixed a NewAdaptiveTableFactory() call in unit test Test Plan: unit test Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19119	2014-06-18 07:04:37 +02:00
sdong	8c265c08f1	HashLinkList to log distribution of number of entries aross buckets Summary: Add two parameters of hash linked list to log distribution of number of entries across all buckets, and a sample row when there are too many entries in one single bucket. Test Plan: Turn it on in plain_table_db_test and see the logs. Reviewers: haobo, ljin Reviewed By: ljin Subscribers: leveldb, nkg-, dhruba, yhchiang Differential Revision: https://reviews.facebook.net/D19095	2014-06-17 17:55:36 -07:00
sdong	200e4b4a72	Add a table factory that can read DB with both of PlainTable and BlockBasedTable in it Summary: The new table factory is used if users want to convert a DB from one table format to the other. A user can use this table to open a DB written using one table format and write new files to another table format. Test Plan: add a unit test Reviewers: haobo, igor Reviewed By: igor Subscribers: dhruba, ljin, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D19017	2014-06-17 11:49:22 -07:00
Igor Canadi	4f18bfe376	Merge pull request #176 from bgrainger/mutexrw-unlock Add separate Read/WriteUnlock methods in MutexRW.	2014-06-17 20:38:06 +02:00
Yueh-Hsuan Chiang	e6e259b8ab	Include max_write_buffer_number >= 2 to SanitizeOptions. Summary: Include max_write_buffer_number >= 2 to SanitizeOptions. Test Plan: make all check Reviewers: haobo, sdong, igor, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19077	2014-06-16 16:26:46 -07:00
sdong	cadc1adffa	Refactor: group metadata needed to open an SST file to a separate copyable struct Summary: We added multiple fields to FileMetaData recently and are planning to add more. This refactoring separate the minimum information for accessing the file. This object is copyable (FileMetaData is not copyable since the ref counter). I hope this refactoring can enable further improvements: (1) use it to design a more efficient data structure to speed up read queries. (2) in the future, when we add information of storage level, we can easily do the encoding, instead of enlarge this structure, which might expand memory work set for file meta data. The definition is same as current EncodedFileMetaData used in two level iterator, so now the logic in two level iterator is easier to understand. Test Plan: make all check Reviewers: haobo, igor, ljin Reviewed By: ljin Subscribers: leveldb, dhruba, yhchiang Differential Revision: https://reviews.facebook.net/D18933	2014-06-16 16:10:52 -07:00
Bradley Grainger	2d02ec6533	Add separate Read/WriteUnlock methods in MutexRW. Some platforms, particularly Windows, do not have a single method that can release both a held reader lock and a held writer lock; instead, a separate method (ReleaseSRWLockShared or ReleaseSRWLockExclusive) must be called in each case. This may also be necessary to back MutexRW with a shared_mutex in C++14; the current language proposal includes both an unlock() and a shared_unlock() method.	2014-06-16 15:41:46 -07:00
sdong	983c93d731	VersionSet::Get(): Bring back the logic of skipping key range check when there are <=3 level 0 files Summary: https://reviews.facebook.net/D17205 removed the logic of skipping file key range check when there are less than 3 level 0 files. This patch brings it back. Other than that, add another small optimization to avoid to check all the levels if most higher levels don't have any file. Test Plan: make all check Reviewers: ljin Reviewed By: ljin Subscribers: yhchiang, igor, haobo, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D19035	2014-06-13 15:51:44 -07:00
Lei Jin	c83b085770	prefetch bloom filter data block for L0 files Summary: as title Test Plan: db_bench the initial result is very promising. I will post results of complete runs Reviewers: dhruba, haobo, sdong, igor Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18867	2014-06-12 10:06:18 -07:00
Lei Jin	77db08f27b	fix forward iterator bug Summary: obvious Test Plan: db_test Reviewers: sdong, haobo, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18987	2014-06-10 09:57:26 -07:00
sdong	80f409ea37	Clean PlainTableReader's variables for better data locality Summary: Clean PlainTableReader's data structures: (1) inline bloom_ (in order to do this, change DynamicBloom to allow lazy initialization) (2) remove some variables only used when initialization from the class (3) put variables not used in normal read code paths to the end of the class and reference prefix_extractor directly (4) make Options a reference. Test Plan: make all check Reviewers: haobo, ljin Reviewed By: ljin Subscribers: igor, yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18891	2014-06-09 13:53:39 -07:00
Igor Canadi	0365eaf12e	remove unnecessary printf	2014-06-06 18:27:44 -07:00
Igor Canadi	a0191c9dfe	Create Missing Column Families Summary: Provide an convenience option to create column families if they are missing from the DB. Task #4460490 Test Plan: added unit test. also, stress test for some time Reviewers: sdong, haobo, dhruba, ljin, yhchiang Reviewed By: yhchiang Subscribers: yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D18951	2014-06-06 18:04:56 -07:00
Igor Canadi	99d3eed2fd	Write Fast-path for single column family Summary: We have a perf regression of Write() even with one column family. Make fast path for single column family to avoid the perf regression. See task #4455480 Test Plan: make check Reviewers: sdong, ljin Reviewed By: sdong, ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18963	2014-06-06 17:26:23 -07:00
sdong	b92a19a431	sst_dump: Set dummy prefix extractor for binary search index in block based table Summary: Now sst_dump fails in block based tables if binary search index is used, as it requires a prefix extractor. Add it. Test Plan: Run it against such a file to make sure it fixes the problem. Reviewers: yhchiang, kailiu Reviewed By: kailiu Subscribers: ljin, igor, dhruba, haobo, leveldb Differential Revision: https://reviews.facebook.net/D18927	2014-06-05 15:37:23 -07:00
Igor Canadi	5d870717ae	Correctly preallocate files in universal compaction Summary: In universal compaction, MaxFileSizeForLevel is ULLONG_MAX. We've been preallocation files to UULONG_MAX size all these time :) Test Plan: make check Reviewers: dhruba, haobo, ljin, sdong, yhchiang Reviewed By: yhchiang Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18915	2014-06-05 13:19:35 -07:00
Igor Canadi	fd27001072	Fix compile errors on Mac Summary: https://phabricator.fb.com/P11372644 Test Plan: compiles Reviewers: sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18873	2014-06-03 12:28:58 -07:00
sdong	df9069d23f	In DB::NewIterator(), try to allocate the whole iterator tree in an arena Summary: In this patch, try to allocate the whole iterator tree starting from DBIter from an arena 1. ArenaWrappedDBIter is created when serves as the entry point of an iterator tree, with an arena in it. 2. Add an option to create iterator from arena for following iterators: DBIter, MergingIterator, MemtableIterator, all mem table's iterators, all table reader's iterators and two level iterator. 3. MergeIteratorBuilder is created to incrementally build the tree of internal iterators. It is passed to mem table list and version set and add iterators to it. Limitations: (1) Only DB::NewIterator() without tailing uses the arena. Other cases, including readonly DB and compactions are still from malloc (2) Two level iterator itself is allocated in arena, but not iterators inside it. Test Plan: make all check Reviewers: ljin, haobo Reviewed By: haobo Subscribers: leveldb, dhruba, yhchiang, igor Differential Revision: https://reviews.facebook.net/D18513	2014-06-02 17:44:57 -07:00
Igor Canadi	91ddd587cc	Only signal cond variable if need to Summary: At the end of BackgroundCallCompaction(), we call SignalAll(), even though we don't need to. If compaction hasn't done anything and there's another compaction running, there is no need to signal on the condition variable. Doing so creates a tight feedback loop which results in log files like: wait for memtable flush compaction nothing to do wait for memtable flush compaction nothing to do This change eliminates that Test Plan: make check Also: icanadi@dev1440 ~ $ grep "nothing to do" /fast-rocksdb-tmp/rocksdb_test/column_family_test/LOG \| wc -l 7435 icanadi@dev1440 ~ $ grep "nothing to do" /fast-rocksdb-tmp/rocksdb_test/column_family_test/LOG \| wc -l 372 First version is before the change, second version is after the change. Reviewers: dhruba, ljin, haobo, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18855	2014-06-02 17:23:55 -07:00
Igor Canadi	8cb7ad83c3	Flush stale column families less aggressively Summary: We've seen some production issues where column family is detected as stale, although there is only one column family in the system. This is a quick fix that: 1) doesn't flush stale column families if there's only one of them 2) Use 4 as a coefficient instead of 2 for determening when a column family is stale. This will make flushing less aggressive, while still keep a nice dynamic flushing of very stale CFs. Test Plan: make check Reviewers: dhruba, haobo, ljin, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18861	2014-06-02 15:33:54 -07:00
Lei Jin	388d2054c7	forward iterator Summary: Forward iterator puts everything together in a flat structure instead of a hierarchy of nested iterators. this should simplify the code and provide better performance. It also enables more optimization since all information are accessiable in one place. Init evaluation shows about 6% improvement Test Plan: db_test and db_bench Reviewers: dhruba, igor, tnovak, sdong, haobo Reviewed By: haobo Subscribers: sdong, leveldb Differential Revision: https://reviews.facebook.net/D18795	2014-05-30 14:31:55 -07:00
Lei Jin	f29c62fc6f	add an iterator refresh option for SeekRandom Summary: One more option to allow iterator refreshing when using normal iterator Test Plan: ran db_bench Reviewers: haobo, sdong, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18849	2014-05-30 14:09:22 -07:00
Igor Canadi	6de6a06631	FIFO compaction style Summary: Introducing new compaction style -- FIFO. FIFO compaction style has write amplification of 1 (+1 for WAL) and it deletes the oldest files when the total DB size exceeds pre-configured values. FIFO compaction style is suited for storing high-frequency event logs. Test Plan: Added a unit test Reviewers: dhruba, haobo, sdong Reviewed By: dhruba Subscribers: alberts, leveldb Differential Revision: https://reviews.facebook.net/D18765	2014-05-21 11:43:35 -07:00
Igor Canadi	b2cf95fe38	Call EnableFileDeletions with false as argument	2014-05-20 14:28:51 -07:00
sdong	4e0602f941	Remove maximum key_size check in db_bench Summary: Key size limit doesn't seem to be applicable anymore. Remove it. Test Plan: run a couple of tests in db_bench Reviewers: haobo, igor, yhchiang, dhruba Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D18723	2014-05-15 11:06:37 -07:00
Igor Canadi	f4574449e9	Clean up compaction logging Summary: Cleaned up compaction logging a little bit. Now file sizes are easier to read. Also, removed the trailing space. Test Plan: verified that i'm happy with logging output: files_size[#33(seq=101,sz=98KB,0) #31(seq=81,sz=159KB,0) #26(seq=0,sz=637KB,0)] Reviewers: sdong, haobo, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D18549	2014-05-14 12:13:50 -07:00
sdong	3e4a9ec241	Arena to inline 2KB of data in it. Summary: In order to use arena to a use case that the total allocation size might be small (LogBuffer is already such a case), inline 1KB of data in it, so that it can be mostly in stack or inline in another class. If always inlining 2KB is a concern, I could make it a template to determine what to inline. However, dependents need to changes. Doesn't go with it for now Test Plan: make all check. Reviewers: haobo, igor, yhchiang, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D18609	2014-05-14 11:49:01 -07:00
Igor Canadi	26f5dd9a5a	TablePropertiesCollectorFactory Summary: This diff addresses task #4296714 and rethinks how users provide us with TablePropertiesCollectors as part of Options. Here's description of task #4296714: I'm debugging #4295529 and noticed that our count of user properties kDeletedKeys is wrong. We're sharing one single InternalKeyPropertiesCollector with all Table Builders. In LOG Files, we're outputting number of kDeletedKeys as connected with a single table, while it's actually the total count of deleted keys since creation of the DB. For example, this table has 3155 entries and 1391828 deleted keys. The problem with current approach that we call methods on a single TablePropertiesCollector for all the tables we create. Even worse, we could do it from multiple threads at the same time and TablePropertiesCollector has no way of knowing which table we're calling it for. Good part: Looks like nobody inside Facebook is using Options::table_properties_collectors. This means we should be able to painfully change the API. In this change, I introduce TablePropertiesCollectorFactory. For every table we create, we call `CreateTablePropertiesCollector`, which creates a TablePropertiesCollector for a single table. We then use it sequentially from a single thread, which means it doesn't have to be thread-safe. Test Plan: Added a test in table_properties_collector_test that fails on master (build two tables, assert that kDeletedKeys count is correct for the second one). Also, all other tests Reviewers: sdong, dhruba, haobo, kailiu Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D18579	2014-05-13 12:30:55 -07:00
Yueh-Hsuan Chiang	1c7799d8aa	Fixed a file-not-found issue when a log file is moved to archive. Summary: Fixed a file-not-found issue when a log file is moved to archive by doing a missing retry. Test Plan: make db_test export ROCKSDB_TEST=TransactionLogIteratorRace ./db_test Reviewers: sdong, haobo Reviewed By: sdong CC: igor, leveldb Differential Revision: https://reviews.facebook.net/D18669	2014-05-12 17:50:21 -07:00
sdong	acd17fd002	Remove unused variable in DBIter Summary: as title Test Plan: Still compile Reviewers: haobo, igor, yhchiang Reviewed By: igor CC: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18603	2014-05-09 13:20:35 -07:00
Igor Canadi	a1068c91a1	Make RocksDB work with newer gflags Summary: Newer gflags switched from `google` namespace to `gflags` namespace. See: https://github.com/facebook/rocksdb/issues/139 and https://github.com/facebook/rocksdb/issues/102 Unfortunately, they don't define any macro with their namespace, so we need to actually try to compile gflags with two different namespace to figure out which one is the correct one. Test Plan: works in fbcode environemnt. I'll also try in ubutnu with newer gflags Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D18537	2014-05-08 17:25:13 -07:00
Igor Canadi	8e37a29bfb	Compaction with zero outputs Summary: We had a hypothesis in https://reviews.facebook.net/D18507 that empty-string internal keys might have been caused by compaction filter deleting all the entries. I added a unit test for that case. Unforutnately, everything works as expected. Test Plan: this is a test Reviewers: dhruba, haobo, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D18519	2014-05-08 13:48:39 -07:00
Igor Canadi	768d424dd9	[fix] SIGSEGV when VersionEdit in MANIFEST is corrupted Summary: This was reported by our customers in task #4295529. Cause: * MANIFEST file contains a VersionEdit, which contains file entries whose 'smallest' and 'largest' internal keys are empty. String with zero characters. Root cause of corruption was not investigated. We should report corruption when this happens. However, we currently SIGSEGV. Here's what happens: * VersionEdit encodes zero-strings happily and stores them in smallest and largest InternalKeys. InternalKey::Encode() does assert when `rep_.empty()`, but we don't assert in production environemnts. Also, we should never assert as a result of DB corruption. * As part of our ConsistencyCheck, we call GetLiveFilesMetaData() * GetLiveFilesMetadata() calls `file->largest.user_key().ToString()` * user_key() function does: 1. assert(size > 8) (ooops, no assert), 2. returns `Slice(internal_key.data(), internal_key.size() - 8)` * since `internal_key.size()` is unsigned int, this call translates to `Slice(whatever, 1298471928561892576182756)`. Bazinga. Fix: * VersionEdit checks if InternalKey is valid in `VersionEdit::GetInternalKey()`. If it's invalid, returns corruption. Lessons learned: * Always keep in mind that even if you `assert()`, production code will continue execution even if assert fails. * Never `assert` based on DB corruption. Assert only if the code should guarantee that assert can't fail. Test Plan: dumped offending manifest. Before: assert. Now: corruption Reviewers: dhruba, haobo, sdong Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D18507	2014-05-07 16:52:12 -07:00
sdong	9efbd85ac9	fsync directory after creating current file in NewDB() Summary: One of our users reported current file corruption. The machine was rebooted during the time. This is the only think I can think of which could cause current file corruption. Just add this paranoid check. Test Plan: make all check Reviewers: haobo, igor Reviewed By: haobo CC: yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18495	2014-05-06 17:51:33 -07:00
sdong	3a171dcb51	Pass logger to memtable rep and TLB page allocation error logged to info logs Summary: TLB page allocation errors are now logged to info logs, instead of stderr. In order to do that, mem table rep's factory functions take a info logger now. Test Plan: make all check Reviewers: haobo, igor, yhchiang Reviewed By: yhchiang CC: leveldb, yhchiang, dhruba Differential Revision: https://reviews.facebook.net/D18471	2014-05-05 16:43:37 -07:00
Igor Canadi	15c3991933	Add comment about ValueType	2014-05-05 12:57:47 -07:00
Igor Canadi	d2569fea47	log_and_apply_bench on a new benchmark framework Summary: db_test includes Benchmark for LogAndApply. This diff removes it from db_test and puts it into a separate log_and_apply bench. I just wanted to play around with our new benchmark framework and figure out how it works. I would also like to show you how great it is! I believe right set of microbenchmarks can speed up our productivity a lot and help catch early regressions. Test Plan: no Reviewers: dhruba, haobo, sdong, ljin, yhchiang Reviewed By: yhchiang CC: leveldb Differential Revision: https://reviews.facebook.net/D18261	2014-05-05 11:11:48 -07:00
sdong	4a7c747064	Revert "Revert "Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB"" And make the default 0 for hash linked list memtable This reverts commit `d69dc64be7`.	2014-05-04 13:56:29 -07:00
Igor Canadi	d69dc64be7	Revert "Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB" This reverts commit `7dafa3a1d7`.	2014-05-04 08:37:09 -07:00
Igor Canadi	0afc8bc29a	xxHash Summary: Originally: https://github.com/facebook/rocksdb/pull/87/files I'm taking over to apply some finishing touches Test Plan: will add tests Reviewers: dhruba, haobo, sdong, yhchiang, ljin Reviewed By: yhchiang CC: leveldb Differential Revision: https://reviews.facebook.net/D18315	2014-05-01 14:09:32 -04:00
Igor Canadi	096f5be0ed	Put column family information in LiveFileMetaData Summary: As summary Test Plan: compiles :) Reviewers: dhruba, haobo, sdong, yhchiang Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D18405	2014-04-30 16:24:52 -04:00
Igor Canadi	16f1aa7b2d	Fix signed/unsigned compare	2014-04-30 14:38:01 -04:00
Igor Canadi	df70047669	Flush stale column families Summary: Added a new option `max_total_wal_size`. Once the total WAL size goes over that, we make an attempt to flush all column families that still have data in the earliest WAL file. By default, I calculate `max_total_wal_size` dynamically, that should be good-enough for non-advanced customers. Test Plan: Added a test Reviewers: dhruba, haobo, sdong, ljin, yhchiang Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D18345	2014-04-30 14:33:40 -04:00
sdong	7dafa3a1d7	Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB Summary: Add an option to allocate a piece of memory from huge page TLB. Add options to trigger it in dynamic bloom, plain table indexes andhash linked list hash table. Test Plan: make all check Reviewers: haobo, ljin Reviewed By: haobo CC: nkg-, dhruba, leveldb, igor, yhchiang Differential Revision: https://reviews.facebook.net/D18357	2014-04-30 11:02:26 -07:00
Igor Canadi	66f88c43a5	Some fixes as preparation for release	2014-04-30 09:03:24 -07:00
Igor Canadi	d6d67c0efe	More s/us fixes	2014-04-30 07:04:36 -07:00
Yueh-Hsuan Chiang	9d9d2965cb	Add a new mem-table representation based on cuckoo hash. Summary: = Major Changes = * Add a new mem-table representation, HashCuckooRep, which is based cuckoo hash. Cuckoo hash uses multiple hash functions. This allows each key to have multiple possible locations in the mem-table. - Put: When insert a key, it will try to find whether one of its possible locations is vacant and store the key. If none of its possible locations are available, then it will kick out a victim key and store at that location. The kicked-out victim key will then be stored at a vacant space of its possible locations or kick-out another victim. In this diff, the kick-out path (known as cuckoo-path) is found using BFS, which guarantees to be the shortest. - Get: Simply tries all possible locations of a key --- this guarantees worst-case constant time complexity. - Time complexity: O(1) for Get, and average O(1) for Put if the fullness of the mem-table is below 80%. - Default using two hash functions, the number of hash functions used by the cuckoo-hash may dynamically increase if it fails to find a short-enough kick-out path. - Currently, HashCuckooRep does not support iteration and snapshots, as our current main purpose of this is to optimize point access. = Minor Changes = * Add IsSnapshotSupported() to DB to indicate whether the current DB supports snapshots. If it returns false, then DB::GetSnapshot() will always return nullptr. Test Plan: Run existing tests. Will develop a test specifically for cuckoo hash in the next diff. Reviewers: sdong, haobo Reviewed By: sdong CC: leveldb, dhruba, igor Differential Revision: https://reviews.facebook.net/D16155	2014-04-29 17:13:46 -07:00
Igor Canadi	f1c9aa6ebe	More unsigned/signed compare fixes	2014-04-29 13:01:06 -07:00
Igor Canadi	38693d99c4	Fix more signed/unsigned comparsions	2014-04-29 12:40:18 -07:00
Igor Canadi	dd9eb7a7d5	Cache result of ReadFirstRecord() Summary: ReadFirstRecord() reads the actual log file from disk on every call. This diff introduces a cache layer on top of ReadFirstRecord(), which should significantly speed up repeated calls to GetUpdatesSince(). I also cleaned up some stuff, but the whole TransactionLogIterator could use some refactoring, especially if we see increased usage. Test Plan: make check Reviewers: haobo, sdong, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D18387	2014-04-29 13:27:58 -04:00
Igor Canadi	91ef2eae23	Use new DBWithTTL API in tests	2014-04-28 23:46:24 -04:00
Igor Canadi	72ff275e3c	Fix TransactionLogIterator EOF caching Summary: When TransactionLogIterator comes to EOF, it calls UnmarkEOF and continues reading. However, if glibc cached the EOF status of the file, it will get EOF again, even though the new data might have been written to it. This has been causing errors in Mac OS. Test Plan: test passes, was failing before Reviewers: dhruba, haobo, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D18381	2014-04-28 23:30:27 -04:00
Donovan Hide	4f9fae9bb7	Add rocksdb_open_for_read_only to C API	2014-04-27 20:57:10 +01:00
Igor Canadi	c489499a2b	Fix OSX compile	2014-04-26 17:15:43 -04:00
Lei Jin	ccaca59bee	avoid calling FindFile twice in TwoLevelIterator for PlainTable Summary: this is to reclaim the regression introduced in https://reviews.facebook.net/D17853 Test Plan: make all check Reviewers: igor, haobo, sdong, dhruba, yhchiang Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17985	2014-04-25 12:23:07 -07:00
Lei Jin	d642c60bdc	Check PrefixMayMatch on Seek() Summary: As a follow-up diff for https://reviews.facebook.net/D17805, add optimization to check PrefixMayMatch on Seek() Test Plan: make all check Reviewers: igor, haobo, sdong, yhchiang, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17853	2014-04-25 12:22:23 -07:00
Lei Jin	3995e801ab	kill ReadOptions.prefix and .prefix_seek Summary: also add an override option total_order_iteration if you want to use full iterator with prefix_extractor Test Plan: make all check Reviewers: igor, haobo, sdong, yhchiang Reviewed By: haobo CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D17805	2014-04-25 12:21:34 -07:00
Igor Canadi	8ce5492623	Delete superversion and log outside of mutex Summary: As summary. Add two autovectors that get filled up in MakeRoomForWrite and they get deleted outside of mutex Test Plan: make check Reviewers: dhruba, haobo, ljin, sdong Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D18249	2014-04-25 14:58:02 -04:00
Igor Canadi	ad3cd39ccd	Column family logging Summary: Now that we have column families involved, we need to add extra context to every log message. They now start with "[column family name] log message" Also added some logging that I think would be useful, like level summary after every flush (I often needed that when going through the logs). Test Plan: make check + ran db_bench to confirm I'm happy with log output Reviewers: dhruba, haobo, ljin, yhchiang, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D18303	2014-04-25 09:51:16 -04:00
Igor Canadi	4cd9f58c04	Fix corruption test	2014-04-24 14:56:41 -04:00
Igor Canadi	478990c81b	Make CompactionInputErrorParanoid less flakey Summary: I'm getting lots of e-mails with CompactionInputErrorParanoid failing. Most recent example early morning today was: http://ci-builds.fb.com/job/rocksdb_valgrind/562/consoleFull I'm putting a stop to these e-mails. I investigated why the test is flakey and it turns out it's because of non-determinsim of compaction scheduling. If there is a compaction after the last flush, CorruptFile will corrupt the compacted file instead of file at level 0 (as it assumes). That makes `Check(9, 9)` fail big time. I also saw some errors with table file getting outputed to >= 1 levels instead of 0. Also fixed that. Test Plan: Ran corruption_test 100 times without a failure. Previously it usually failed at 10th occurrence. Reviewers: dhruba, haobo, ljin Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D18285	2014-04-24 11:13:28 -07:00
sdong	4de5b84ee0	Fix a bug in IterKey Summary: IterKey set buffer_size_ to a wrong initial value, causing it to always allocate values from heap instead of stack if the key size is smaller. Fix it. Test Plan: make all check Reviewers: haobo, ljin Reviewed By: haobo CC: igor, dhruba, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D18279	2014-04-23 19:45:22 -07:00
Igor Canadi	f9f8965e96	Print out stack trace in mac, too Summary: While debugging Mac-only issue with ThreadLocalPtr, this was very useful. Let's print out stack trace in MAC OS, too. Test Plan: Verified that somewhat useful stack trace was generated on mac. Will run PrintStack() on linux, too. Reviewers: ljin, haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D18189	2014-04-23 09:11:35 -04:00
sdong	a570740727	Expose number of entries in mem tables to users Summary: In this patch, two new DB properties are defined: rocksdb.num-immutable-mem-table and rocksdb.num-entries-imm-mem-tables, from where number of entries in mem tables can be exposed to users Test Plan: Cover the codes in db_test make all check Reviewers: haobo, ljin, igor Reviewed By: igor CC: nkg-, igor, yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18207	2014-04-22 22:13:21 -07:00
Lei Jin	5f1daf7ae3	get rid of shared_ptr in memtable.cc Summary: Get rid of the devil. Probably won't impact anything on the perf side. Test Plan: make all check Reviewers: igor, haobo, sdong, yhchiang Reviewed By: haobo CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D18153	2014-04-22 21:14:25 -07:00
sdong	86a0133d05	PlainTableReader to expose index size to users Summary: This is a temp solution to expose index sizes to users from PlainTableReader before we persistent them to files. In this patch, the memory consumption of indexes used by PlainTableReader will be reported as two user defined properties, so that users can monitor them. Test Plan: Add a unit test. make all check` Reviewers: haobo, ljin Reviewed By: haobo CC: nkg-, yhchiang, igor, ljin, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18195	2014-04-22 19:29:05 -07:00
Igor Canadi	1068d2fa60	Revert "Better port::Mutex::AssertHeld() and AssertNotHeld()" This reverts commit `ddafceb6c2`.	2014-04-22 18:38:10 -07:00
Igor Canadi	ddafceb6c2	Better port::Mutex::AssertHeld() and AssertNotHeld() Summary: Using ThreadLocalPtr as a flag to determine if a mutex is locked or not enables us to implement AssertNotHeld(). It also makes AssertHeld() actually correct. I had to remove port::Mutex as a dependency for util/thread_local.h, but that's fine since we can just use std::mutex :) Test Plan: make check Reviewers: ljin, dhruba, haobo, sdong, yhchiang Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D18171	2014-04-22 17:26:21 -07:00
Igor Canadi	3992aec8fa	Support for column families in TTL DB Summary: This will enable people using TTL DB to do so with multiple column families. They can also specify different TTLs for each one. TODO: Implement CreateColumnFamily() in TTL world. Test Plan: Added a very simple sanity test. Reviewers: dhruba, haobo, ljin, sdong, yhchiang Reviewed By: haobo CC: leveldb, alberts Differential Revision: https://reviews.facebook.net/D17859	2014-04-22 11:27:33 -07:00
Igor Canadi	8dc34364d2	Rename "benchmark" back to "bench". Also, make `benchharness.cc` not compiled into rocksdb library.	2014-04-21 13:12:15 -07:00
Pratyush Seth	ff1b5df4c6	Added benchmark functionality on the lines of folly/Benchmark.h Summary: Added benchmark functionality on the lines of folly/Benchmark.h Test Plan: Added unit tests Reviewers: igor, haobo, sdong, ljin, yhchiang, dhruba Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17973	2014-04-21 12:29:55 -07:00
Igor Canadi	f813279da5	Remove TransactionLogIteratorRace when -DNDEBUG	2014-04-21 11:08:30 -07:00
Lei Jin	0f2d768191	hints for narrowing down FindFile range and avoiding checking unrelevant L0 files Summary: The file tree structure in Version is prebuilt and the range of each file is known. On the Get() code path, we do binary search in FindFile() by comparing target key with each file's largest key and also check the range for each L0 file. With some pre-calculated knowledge, each key comparision that has been done can serve as a hint to narrow down further searches: (1) If a key falls within a L0 file's range, we can safely skip the next file if its range does not overlap with the current one. (2) If a key falls within a file's range in level L0 - Ln-1, we should only need to binary search in the next level for files that overlap with the current one. (1) will be able to skip some files depending one the key distribution. (2) can greatly reduce the range of binary search, especially for bottom levels, given that one file most likely only overlaps with N files from the level below (where N is max_bytes_for_level_multiplier). So on level L, we will only look at ~N files instead of N^L files. Some inital results: measured with 500M key DB, when write is light (10k/s = 1.2M/s), this improves QPS ~7% on top of blocked bloom. When write is heavier (80k/s = 9.6M/s), it gives us ~13% improvement. Test Plan: make all check Reviewers: haobo, igor, dhruba, sdong, yhchiang Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17205	2014-04-21 09:10:12 -07:00
sdong	651792251a	Fix bugs introduced by D17961 Summary: D17961 has two bugs: (1) two level iterator fails to populate FileMetaData.table_reader, causing performance regression. (2) table cache handle the !status.ok() case in the wrong place, causing seg fault which shouldn't happen. Test Plan: make all check Reviewers: ljin, igor, haobo Reviewed By: ljin CC: yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D17991	2014-04-17 17:25:28 -07:00
sdong	fa430bfd04	Minimize accessing multiple objects in Version::Get() Summary: One of our profilings shows that Version::Get() sometimes is slow when getting pointer of user comparators or other global objects. In this patch: (1) we keep pointers of immutable objects in Version to avoid accesses them though option objects or cfd objects (2) table_reader is directly cached in FileMetaData so that table cache don't have to go through handle first to fetch it (3) If level 0 has less than 3 files, skip the filtering logic based on SST tables' key range. Smallest and largest key are stored in separated memory locations, which has potential cache misses Test Plan: make all check Reviewers: haobo, ljin Reviewed By: haobo CC: igor, yhchiang, nkg-, leveldb Differential Revision: https://reviews.facebook.net/D17739	2014-04-17 14:14:00 -07:00
Igor Canadi	161d9e586b	Don't overflow size_t in mac	2014-04-16 15:15:22 -07:00
Igor Canadi	5c12f27791	Remove tautological assert	2014-04-16 09:09:28 -07:00
Igor Canadi	faf7691358	Close DB at the end of DontRollEmptyLogs test	2014-04-15 17:20:56 -07:00
Igor Canadi	1803ed2ccb	Fix Mac OS compile	2014-04-15 16:31:49 -07:00
Igor Canadi	7d838856cf	Fix compile issues when doing make release	2014-04-15 16:00:10 -07:00
sdong	0f40fe4bc7	When creating a new DB, fail it when wal_dir contains existing log files Summary: Current behavior of creating new DB is, if there is existing log files, we will go ahead and replay them on top of empty DB. This is a behavior that no user would expect. With this patch, we will fail the creation if a user creates a DB with existing log files. Test Plan: make all check Reviewers: haobo, igor, ljin Reviewed By: haobo CC: nkg-, yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D17817	2014-04-15 14:01:57 -07:00
Igor Canadi	c166615850	Fix compile issues introduced by RocksDBLite	2014-04-15 13:51:07 -07:00
Igor Canadi	588bca2020	RocksDBLite Summary: Introducing RocksDBLite! Removes all the non-essential features and reduces the binary size. This effort should help our adoption on mobile. Binary size when compiling for IOS (`TARGET_OS=IOS m static_lib`) is down to 9MB from 15MB (without stripping) Test Plan: compiles :) Reviewers: dhruba, haobo, ljin, sdong, yhchiang Reviewed By: yhchiang CC: leveldb Differential Revision: https://reviews.facebook.net/D17835	2014-04-15 13:39:26 -07:00
Igor Canadi	dbe0f327ca	Set log_empty to false even when options.sync is off [fix tests]	2014-04-15 10:28:34 -07:00
Igor Canadi	e6acb874cd	Don't roll empty logs Summary: With multiple column families, especially when manual Flush is executed, we might roll the log file, although the current log file is empty (no data has been written to the log). After the diff, we won't create new log file if current is empty. Next, I will write an algorithm that will flush column families that reference old log files (i.e., that weren't flushed in a while) Test Plan: Added an unit test. Confirmed that unit test failes in master Reviewers: dhruba, haobo, ljin, sdong Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17631	2014-04-15 09:57:25 -07:00
sdong	c87ed0942c	Fix db_bench's multireadrandom Summary: multireadrandom is broken. Fix it Test Plan: run it and see segfault has gone. Reviewers: ljin Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17781	2014-04-14 15:43:34 -07:00
Yueh-Hsuan Chiang	118f88d25d	Fix compile error in tailing_iter.h Summary: Fix the following compile error ./db/tailing_iter.h:17:1: error: class 'SuperVersion' was previously declared as a struct [-Werror,-Wmismatched-tags] class SuperVersion; ^ ./db/column_family.h:77:8: note: previous use is here struct SuperVersion { ^ ./db/tailing_iter.h:17:1: note: did you mean struct here? class SuperVersion; ^~~~~ struct 1 error generated. Test Plan: make Reviewers: ljin, igor, haobo, sdong Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17799	2014-04-14 14:05:15 -07:00
Yueh-Hsuan Chiang	327102efa5	Fix merge_test failure due to incorrect assert behavior in the release mode.	2014-04-14 12:06:49 -07:00
Lei Jin	82b37a18bd	thread local for tailing iterator Summary: replace the super version acquisision in tailing itrator with thread local Test Plan: will post results Reviewers: igor, haobo, sdong, yhchiang, dhruba Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17757	2014-04-14 10:48:01 -07:00
Lei Jin	539dd207df	using thread local SuperVersion for NewIterator Summary: Similar to GetImp(), use SuperVersion from thread local instead of acquriing mutex. I don't expect this change will make a dent on NewIterator() performance because the bottleneck seems to be on the rest part of the API Test Plan: make asan_check will post perf numbers Reviewers: haobo, igor, sdong, dhruba, yhchiang Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D17643	2014-04-14 09:34:59 -07:00
sdong	d5e087b6df	db_bench: add a mode to operate multiple DBs Summary: This patch introduces a new parameter num_multi_db in db_bench. When this parameter is larger than 1, multiple DBs will be created. In all benchmarks, any operation applies to a random DB among them. This is to benchmark the performance of similar applications. Test Plan: run db_bench on both of num_multi_db=0 and more. Reviewers: haobo, ljin, igor Reviewed By: igor CC: igor, yhchiang, dhruba, nkg-, leveldb Differential Revision: https://reviews.facebook.net/D17769	2014-04-11 16:59:08 -07:00
Lei Jin	eba3fc644a	make corruption_test:CompactionInputErrorParanoid deterministic Summary: it writes ~10M data, default L0 compaction trigger is 4, plus 2 writer buffer, so that can accommodate ~6M data before compaction happens for sure. I guess encoding is doing a good job to shrink the data so that sometime, compaction does not get triggered. I get test failure quite often. Test Plan: ran it multiple times and all got pass Reviewers: igor, sdong Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D17775	2014-04-11 12:48:38 -07:00
Igor Canadi	de41357a18	Don't dump rocksdb version on IOS	2014-04-11 10:19:58 -07:00
Lei Jin	0af36d6aa6	SeekRandomWhileWriting Summary: as title Test Plan: ran it Reviewers: igor, haobo, yhchiang Reviewed By: yhchiang CC: leveldb Differential Revision: https://reviews.facebook.net/D17751	2014-04-11 09:47:20 -07:00
Kai Liu	1405232b6d	Temporarily disable a test case in db_test Summary: Root cause is still under investigation. Just Disable the troubling use case for now.	2014-04-10 17:17:39 -07:00
Igor Canadi	ddef6841b3	Renamed InfoLogLevel::DEBUG to InfoLogLevel::DEBUG_LEVEL Summary: XCode for some reason injects `#define DEBUG 1` into our code, which makes compile fail because we use `DEBUG` keyword for other stuff. This diff fixes the issue by renaming `DEBUG` to `DEBUG_LEVEL`. Test Plan: compiles Reviewers: dhruba, haobo, sdong, yhchiang, ljin Reviewed By: yhchiang CC: leveldb Differential Revision: https://reviews.facebook.net/D17709	2014-04-10 15:27:42 -07:00
Kai Liu	75b59d5146	Enable hash index for block-based table Summary: Based on previous patches, this diff eventually provides the end-to-end mechanism for users to specify the hash-index. Test Plan: Wrote several new unit tests. Reviewers: sdong, haobo, dhruba Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D16539	2014-04-10 14:19:43 -07:00
Lei Jin	7a92537fc4	db_bench: add IteratorCreationWhileWriting mode and allow prefix_seek Summary: as title Test Plan: ran it Reviewers: igor, haobo, yhchiang Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17655	2014-04-10 10:15:59 -07:00
Igor Canadi	4daea66343	Turn on -Wmissing-prototypes Summary: Compiling for iOS has by default turned on -Wmissing-prototypes, which causes rocksdb to fail compiling. This diff turns on -Wmissing-prototypes in our compile options and cleans up all functions with missing prototypes. Test Plan: compiles Reviewers: dhruba, haobo, ljin, sdong Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17649	2014-04-09 21:17:14 -07:00
sdong	df2a8b6a1a	Polish IterKey and use it in DBImpl::ProcessKeyValueCompaction() Summary: 1. Polish IterKey a little bit. 2. Turn to use it in local parameter of current_user_key in DBImpl::ProcessKeyValueCompaction(). Our profile showing that DBImpl::ProcessKeyValueCompaction() has about 14% costs in std::string (the base including reading and writing data but excluding compaction filtering), which is higher than it should be. There are two std::string used in DBImpl::ProcessKeyValueCompaction(), compaction_filter_value and current_user_key and it's hard to distinguish the two. Test Plan: make all check Reviewers: haobo, ljin Reviewed By: haobo CC: igor, yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D17613	2014-04-09 20:50:58 -07:00
Igor Canadi	dc55903293	Improved CompressedCache Summary: This is testing behavior that was reported in https://github.com/facebook/rocksdb/issues/111 No issue was found, but it still good to commit this and make CompressedCache more robust. Test Plan: this is a plan Reviewers: ljin, dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D17625	2014-04-09 11:43:14 -07:00
Lei Jin	4824014e3b	speed up db_bench filluniquerandom mode Summary: filluniquerandom is painfully slow due to the naive bitmap check to find out if a key has been seen before. Majority of time is spent on searching the last few keys. Split a giant BitSet to smaller ones so that we can quickly check if a BitSet is full and thus can skip quickly. It used to take over one hour to filluniquerandom for 100M keys, now it takes about 10 mins. Test Plan: unit test also verified correctness in db_bench and make sure all keys are generated Reviewers: igor, haobo, yhchiang Reviewed By: igor CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D17607	2014-04-09 11:25:21 -07:00
Igor Canadi	2014915d32	Fix ASAN issue	2014-04-09 10:38:05 -07:00
Igor Canadi	b947fdc89d	Column family support for DB::OpenForReadOnly() Summary: When opening DB in read-only mode, client can choose to only specify a subset of column families ("default" column family can't be omitted, though) Test Plan: added a unit test in column_family_test Reviewers: haobo, sdong, ljin, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17565	2014-04-09 09:56:17 -07:00
Igor Canadi	731e55c01c	Fix GetProperty() test Summary: GetProperty test is flakey. Before this diff: P8635927 After: P8635945 We need to make sure the thread is done before we destruct sleeping tasks. Otherwise, bad things happen. Test Plan: See summary Reviewers: ljin, sdong, haobo, dhruba Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17595	2014-04-08 14:57:00 -07:00
Igor Canadi	34455deb06	Fix Mac OS compile issues	2014-04-08 14:05:53 -07:00
Igor Canadi	5b345b76cb	Remove env_ from MergingIterator Summary: env_ is not used. Compiling for iOS complains. Test Plan: compiles now Reviewers: ljin, haobo, sdong, dhruba Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17589	2014-04-08 13:40:42 -07:00
Lei Jin	0c1126d4cf	db_bench cleanup Summary: clean up the db_bench a little bit. also avoid allocating memory for key in the loop Test Plan: I verified a run with filluniquerandom & readrandom. Iterator seek will be used lot to measure performance. Will fix whatever comes up Reviewers: haobo, igor, yhchiang Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17559	2014-04-08 11:21:09 -07:00
Igor Canadi	beeee9dccc	Small speedup of CompactionFilterV2 Summary: ToString() is expensive. Profiling shows that most compaction threads are stuck in jemalloc, allocating a new string. This will help out a litte. Test Plan: make check Reviewers: haobo, danguo Reviewed By: danguo CC: leveldb Differential Revision: https://reviews.facebook.net/D17583	2014-04-08 11:06:39 -07:00
Lei Jin	92c1eb0291	macros for perf_context Summary: This will allow us to disable them completely for iOS or for better performance Test Plan: will run make all check Reviewers: igor, haobo, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17511	2014-04-08 10:58:07 -07:00
sdong	5e2db3b434	PlainTableIterator not to store copied key in std::string Summary: Move PlainTableIterator's copied key from std::string local buffer to avoid paying the extra costs in std::string related to sharing. Reuse the same buffer class in DbIter. Move the class to dbformat.h. This patch improves iterator performance significantly. Running this benchmark: ./table_reader_bench --num_keys2=17 --iterator --plain_table --time_unit=nanosecond The average latency is improved to about 750 nanoseconds from 1100 nanoseconds. Test Plan: Add a unit test. make all check Reviewers: haobo, ljin Reviewed By: haobo CC: igor, yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D17547	2014-04-07 19:06:09 -07:00
Igor Canadi	664559fe2d	Small final fixes before merge	2014-04-07 15:38:53 -07:00
Igor Canadi	d1e2bce42d	CallFlushDuringCompaction	2014-04-07 15:03:15 -07:00
Igor Canadi	b42ceb9598	Simplify cleanup of dead (refcount == 0) column families	2014-04-07 14:31:02 -07:00
Igor Canadi	e48348d196	Make flush part of compaction process This will enable user to use only 1 background thread.	2014-04-07 13:53:08 -07:00
Igor Canadi	2a0917b28e	Merge branch 'master' into columnfamilies	2014-04-07 13:04:25 -07:00
Igor Canadi	751e4b1a35	Fix wal_dir sanitizing	2014-04-07 11:36:03 -07:00
Igor Canadi	3d2fe844ab	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_impl.h db/memtable_list.cc db/version_set.cc	2014-04-07 11:31:11 -07:00
Igor Canadi	7efdd9ef4d	Options::wal_dir shouldn't end in '/' Summary: If a client specifies wal_dir with trailing '/', we will fail in deleting obsolete log files. See task #4083746 Test Plan: make check Reviewers: haobo, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17535	2014-04-07 10:25:38 -07:00
sdong	ea0198fe9a	Create log::Writer out of DB Mutex Summary: Our measurement shows that sometimes new log::Write's constructor can take hundreds of milliseconds. It's unclear why but just simply move it out of DB mutex. Test Plan: make all check Reviewers: haobo, ljin, igor Reviewed By: haobo CC: nkg-, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D17487	2014-04-04 15:46:28 -07:00
Lei Jin	c90d446ee7	make hash_link_list Node's key space consecutively followed at the end Summary: per sdong's request, this will help processor prefetch on n->key case. Test Plan: make all check Reviewers: sdong, haobo, igor Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D17415	2014-04-04 15:37:28 -07:00
sdong	99c756f0fe	Flush Buffered Info Logs Before Doing Compaction (one line change) Summary: Flushing log buffer earlier to avoid confusion of time holding the locks. Test Plan: Should be safe as long as several related db test passes Reviewers: haobo, igor, ljin Reviewed By: igor CC: nkg-, leveldb Differential Revision: https://reviews.facebook.net/D17493	2014-04-04 10:58:30 -07:00
Igor Canadi	040657aec9	Fix MacOS errors	2014-04-03 16:04:34 -07:00
Igor Canadi	f76e4027ca	initialize candidate count	2014-04-03 11:45:44 -07:00
sdong	b9767d0e09	Move several more logging inside DB mutex to log buffer Summary: Move several some common logging still in DB mutex to log buffer. Test Plan: make all check Reviewers: haobo, igor, ljin, nkg- Reviewed By: nkg- CC: nkg-, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D17439	2014-04-03 10:47:18 -07:00
Thomas Adam	98422cba77	[C-API] implemented more options	2014-04-03 10:47:37 +02:00
Thomas Adam	3a30b5b0be	[C-API] added "rocksdb_options_set_plain_table_factory" to make it possible to use plain table factory	2014-04-03 10:47:37 +02:00
Haobo Xu	48bc0c6ad3	[RocksDB] Fix a race condition in GetSortedWalFiles Summary: This patch fixed a race condition where a log file is moved to archived dir in the middle of GetSortedWalFiles. Without the fix, the log file would be missed in the result, which leads to transaction log iterator gap. A test utility SyncPoint is added to help reproducing the race condition. Test Plan: TransactionLogIteratorRace; make check Reviewers: dhruba, ljin Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D17121	2014-04-02 22:12:29 -07:00
Igor Canadi	d1d19f5db3	Fix valgrind error in c_test	2014-04-02 17:24:30 -07:00
sdong	158845ba9a	Move a info logging out of DB Mutex Summary: As we know, logging can be slow, or even hang for some file systems. Move one more logging out of DB mutex. Test Plan: make all check Reviewers: haobo, igor, ljin Reviewed By: igor CC: yhchiang, nkg-, leveldb Differential Revision: https://reviews.facebook.net/D17427	2014-04-02 16:48:32 -07:00
sdong	4af1954fd6	Compaction Filter V1 to use old context struct to keep backward compatible Summary: The previous change D15087 changed existing compaction filter, which makes the commonly used class not backward compatible. Revert the older interface. Use a new interface for V2 instead. Test Plan: make all check Reviewers: haobo, yhchiang, igor CC: danguo, dhruba, ljin, igor, leveldb Differential Revision: https://reviews.facebook.net/D17223	2014-04-02 14:57:51 -07:00
sdong	284c365b77	Fix valgrind error caused by FileMetaData as two level iterator's index block handle Summary: It is a regression valgrind bug caused by using FileMetaData as index block handle. One of the fields of FileMetaData is not initialized after being contructed and copied, but I'm not able to find which one. Also, I realized that it's not a good idea to use FileMetaData as in TwoLevelIterator::InitDataBlock(), a copied FileMetaData can be compared with the one in version set byte by byte, but the refs can be changed. Also comparing such a large structure is slightly more expensive. Use a simpler structure instead Test Plan: Run the failing valgrind test (Harness.RandomizedLongDB) make all check Reviewers: igor, haobo, ljin Reviewed By: igor CC: yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D17409	2014-04-02 14:38:28 -07:00
Igor Canadi	8555ce2dec	Merge branch 'master' into columnfamilies	2014-04-02 10:48:05 -07:00
sdong	e0a87c4cf1	DBIter to use static allocated char array for saved_key_ (if it is not too long) Summary: DBIter now uses a std::string for saved_key. Based on some profiling, it could be more expensive than we though. Optimize it with the same technique as LookupKey -- if it is short, we copy it to a static allocated char. Otherwise, dynamically allocate memory for it. Test Plan: make all check Reviewers: haobo, ljin Reviewed By: haobo CC: dhruba, igor, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D17289	2014-04-01 16:43:11 -07:00
sdong	d50619a559	PlainTableIterator::Seek() shouldn't check bloom filter in total order mode Summary: In total order mode, iterator's seek() shouldn't check total order. Also some cleaning up about checking null for shared pointers. I don't know the behavior before it. This bug was reported by @igor. Test Plan: test plain_table_db_test Reviewers: ljin, haobo, igor Reviewed By: igor CC: yhchiang, dhruba, igor, leveldb Differential Revision: https://reviews.facebook.net/D17391	2014-04-01 15:05:16 -07:00
Thomas Adam	38dc5ef45f	[C-API] added the possiblity to create a HashSkipList or HashLinkedList to support prefix seeks	2014-04-01 12:44:27 +02:00
Igor Canadi	ddbd1ece88	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_test.cc db/internal_stats.cc db/internal_stats.h db/version_edit.cc db/version_edit.h db/version_set.cc include/rocksdb/options.h util/options.cc	2014-03-31 13:39:24 -07:00
Igor Canadi	577556d5f9	Don't store version number in MANIFEST Summary: Talked to <insert internal project name> folks and they found it really scary that they won't be able to roll back once they upgrade to 2.8. We should fix this. Test Plan: make check Reviewers: haobo, ljin Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17343	2014-03-31 11:33:09 -07:00
Igor Canadi	8a139a054c	More valgrind issues! Summary: Fix some more CompactionFilterV2 valgrind issues. Maybe it would make sense for CompactionFilterV2 to delete its prefix_extractor? Test Plan: ran CompactionFilterV2* tests with valgrind. issues before patch -> no issues after Reviewers: haobo, sdong, ljin, dhruba Reviewed By: dhruba CC: leveldb, danguo Differential Revision: https://reviews.facebook.net/D17337	2014-03-29 10:34:47 -07:00
sdong	43a593a6d9	Change default value of some Options Summary: Since we are optimizing for server workloads, some default values are not optimized any more. We change some of those values that I feel it's less prone to regression bugs. Test Plan: make all check Reviewers: dhruba, haobo, ljin, igor, yhchiang Reviewed By: igor CC: leveldb, MarkCallaghan Differential Revision: https://reviews.facebook.net/D16995	2014-03-28 17:09:28 -07:00
sdong	2d3468c293	MemTableIterator not to reference Memtable Summary: In one of the perf, I shows 10%-25% CPU costs of MemTableIterator.Seek(), when doing dynamic prefix seek, are spent on checking whether we need to do bloom filter check or finding out the prefix extractor. Seems that more level of pointer checking makes CPU cache miss more likely. This patch makes things slightly simpler by copying pointer of bloom of prefix extractor into the iterator. Test Plan: make all check Reviewers: haobo, ljin Reviewed By: ljin CC: igor, dhruba, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D17247	2014-03-28 16:46:25 -07:00
Lei Jin	0d755fff14	cache friendly blocked bloomfilter Summary: By constraining the probes within cache line(s), we can improve the cache miss rate thus performance. This probably only makes sense for in-memory workload so defaults the option to off. Numbers and comparision can be found in wiki: https://our.intern.facebook.com/intern/wiki/index.php/Ljin/rocksdb_perf/2014_03_17#Bloom_Filter_Study Test Plan: benchmarked this change substantially. Will run make all check as well Reviewers: haobo, igor, dhruba, sdong, yhchiang Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17133	2014-03-28 09:21:20 -07:00
Yueh-Hsuan Chiang	10cebec79e	Fix the bug in MergeUtil which causes mixing values of different keys. Summary: Fix the bug in MergeUtil which causes mixing values of different keys. Test Plan: stringappend_test make all check Reviewers: haobo, igor Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17235	2014-03-27 16:15:25 -07:00
Haobo Xu	a92194e5b2	[RocksDB] Add db property "rocksdb.cur-size-active-mem-table" Summary: as title Test Plan: db_test Reviewers: sdong Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D17217	2014-03-27 15:14:04 -07:00
Igor Canadi	1c9f8f0884	Fix valgrind issues Summary: NewFixedPrefixTransform is leaked in default options. Broken by `b47812fba6` Also included in the diff some code cleanup Test Plan: valgrind env_test also make check Reviewers: haobo, danguo, yhchiang Reviewed By: danguo CC: leveldb Differential Revision: https://reviews.facebook.net/D17211	2014-03-27 08:22:59 -07:00
sdong	d556200264	Some small cleaning up to make some compiling environment happy Summary: Compiler complains some errors when building using our internal build settings. Fix them. Test Plan: rebuild Reviewers: haobo, dhruba, igor, yhchiang, ljin Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17199	2014-03-26 18:11:41 -07:00
Igor Canadi	6a08bc042a	Fix no return warning in FileComparator	2014-03-26 14:46:07 -07:00
Igor Canadi	1e9621d4e5	Sort files correctly in Builder::SaveTo Summary: Previously, we used to sort all files by BySmallestFirst comparator and then re-sort level0 files in the Finalize() (recently moved to end of SaveTo). In this diff, I chose the correct comparator at the beginning and sort the files correctly in Builder::SaveTo. I also added a verification that all files are sorted correctly in CheckConsistency() NOTE: This diff depends on D17037 Test Plan: make check. Will also run db_stress Reviewers: dhruba, haobo, sdong, ljin Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17049	2014-03-26 13:30:14 -07:00
Igor Canadi	954679bb0f	AssertHeld() should do things Summary: AssertHeld() was a no-op before. Now it does things. Also, this change caught a bad bug in SuperVersion::Init(). The method is calling db->mutex.AssertHeld(), but db variable is not initialized yet! I also fixed that issue. Test Plan: make check Reviewers: dhruba, haobo, ljin, sdong, yhchiang Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17193	2014-03-26 11:24:52 -07:00
Igor Canadi	ad9a39c9b4	[RocksDB] Preallocate new MANIFEST files Summary: We don't preallocate MANIFEST file, even though we have an option for that. This diff preallocates manifest file every time we create it Test Plan: make check Reviewers: dhruba, haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17163	2014-03-26 09:37:53 -07:00
sdong	6b2e7a2a01	When Options.max_num_files=-1, non level0 files also by pass table cache Summary: This is the part that was not finished when doing the Options.max_num_files=-1 feature. For iterating non level0 SST files (which was done using two level iterator), table cache is not bypassed. With this patch, the leftover feature is done. Test Plan: make all check; change Options.max_num_files=-1 in one of the tests to cover the codes. Reviewers: haobo, igor, dhruba, ljin, yhchiang Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17001	2014-03-25 18:40:52 -07:00
Igor Canadi	e86d7dffd7	Merge branch 'master' into columnfamilies	2014-03-25 15:24:02 -07:00
Yueh-Hsuan Chiang	b9ce156e38	Add assert to MergeOperator::PartialMergeMulti to check # of operands. Summary: Add assert(operands_list.size() >= 2) in MergeOperator::PartialMergeMulti to ensure it's only be called when we have at least two merge operands. Test Plan: run merge_test and stringappend_test. Reviewers: haobo, igor Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17169	2014-03-25 13:39:17 -07:00
Danny Guo	d9ca83df28	[rocksdb] make init prefix more robust Summary: Currently if client uses kNULLString as the prefix, it will confuse compaction filter v2. This diff added a bool to indicate if the prefix has been intialized. I also added a unit test to cover this case and make sure the new code path is hit. Test Plan: db_test Reviewers: igor, haobo Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17151	2014-03-25 11:59:40 -07:00
Yueh-Hsuan Chiang	34f9da1cef	Fix the failure of stringappend_test caused by PartialMergeMulti. Summary: Fix a bug that PartialMergeMulti will try to merge the first operand with an empty slice. Test Plan: run stringappend_test and merge_test. Reviewers: haobo, igor Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17157	2014-03-25 11:50:09 -07:00
Igor Canadi	e8168382c4	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc include/rocksdb/options.h util/options.cc	2014-03-25 11:09:40 -07:00
Danny Guo	b47812fba6	[rocksdb] new CompactionFilterV2 API Summary: This diff adds a new CompactionFilterV2 API that roll up the decisions of kv pairs during compactions. These kv pairs must share the same key prefix. They are buffered inside the db. typedef std::vector<Slice> SliceVector; virtual std::vector<bool> Filter(int level, const SliceVector& keys, const SliceVector& existing_values, std::vector<std::string>* new_values, std::vector<bool>* values_changed ) const = 0; Application can override the Filter() function to operate on the buffered kv pairs. More details in the inline documentation. Test Plan: make check. Added unit tests to make sure Keep, Delete, Change all works. Reviewers: haobo CCs: leveldb Differential Revision: https://reviews.facebook.net/D15087	2014-03-24 20:47:53 -07:00
Yueh-Hsuan Chiang	cda4006e87	Enhance partial merge to support multiple arguments Summary: * PartialMerge api now takes a list of operands instead of two operands. * Add min_pertial_merge_operands to Options, indicating the minimum number of operands to trigger partial merge. * This diff is based on Schalk's previous diff (D14601), but it also includes necessary changes such as updating the pure C api for partial merge. Test Plan: * make check all * develop tests for cases where partial merge takes more than two operands. TODOs (from Schalk): * Add test with min_partial_merge_operands > 2. * Perform benchmarks to measure the performance improvements (can probably use results of task #2837810.) * Add description of problem to doc/index.html. * Change wiki pages to reflect the interface changes. Reviewers: haobo, igor, vamsi Reviewed By: haobo CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D16815	2014-03-24 17:57:13 -07:00
Igor Canadi	ac328a86b9	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_test.cc	2014-03-20 14:41:37 -07:00
Igor Canadi	c21ce14fa5	Fix double-free in corruption_test	2014-03-20 14:37:30 -07:00
Igor Canadi	e67241f0b9	Sanity check on Open Summary: Everytime a client opens a DB, we do a sanity check that: * checks the existance of all the necessary files * verifies that file sizes are correct Some of the code was stolen from https://reviews.facebook.net/D16935 Test Plan: added a unit test Reviewers: dhruba, haobo, sdong Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D17097	2014-03-20 14:18:29 -07:00
Yiting Li	7981a43274	Consistency Check Function Summary: Added a function/command to check the consistency of live files' meta data Test Plan: Manual test (size mismatch, file not exist). Command test script. Reviewers: haobo Reviewed By: haobo CC: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D16935	2014-03-20 13:42:45 -07:00

... 3 4 5 6 7 ...

1270 Commits