rocksdb

Author	SHA1	Message	Date
Yoshinori Matsunobu	2cf0f4f471	Adding wal_recovery_mode log message Summary: wal_recovery_mode setting was not written to LOG. This diff adds the log message Test Plan: manually checked Reviewers: kradhakrishnan, sdong, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D43953	2015-08-10 09:43:30 -07:00
Islam AbdelRahman	22dcaaff30	More accurate time measurement for delete_scheduler_test Summary: Start measuring time spent before BackgroundEmptyTrash starts Test Plan: delete_scheduler_test Reviewers: yhchiang, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D43857	2015-08-07 15:37:56 -07:00
Nate Rosenblum	ac04a6cfb8	Fix OSX + Windows build Commit `257ee89` added a static destruction helper to avoid notional "leaks" of TLS on main thread exit. This helper fails to compile on OS X (and presumably Windows, though I haven't checked), which lacks the __thread storage class StaticMeta::tls_ member. This patch fixes the builds. Do note that the static cleanup mechanism may be somewhat brittle and atexit(3) may be a more suitable approach to releasing the main thread's TLS if it's highly desirable for this memory to not be reported "reachable" by Valgrind at exit.	2015-08-07 10:47:05 -07:00
Alexey Maykov	257ee895f9	Fixed memory leaks Summary: MyRocks valgrind run was showing memory leaks. The fixes are mostly self-explaining. There is only a single usage of ThreadLocalPtr. Potentially, we may think about replacing this use with thread_local, but it will be a bigger change. Another option to consider is using thread_local instead of __thread in ThreadLocalPtr implementation. This way, tls_ can be stored using std::unique_ptr and no destructor would be required. Test Plan: - make check - MyRocks valgrind run doesn't report leaks Reviewers: rven, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D43677	2015-08-06 15:39:12 -07:00
Islam AbdelRahman	40f893f4a9	Fix delete_scheduler_test valgrind error Summary: Use shared_ptr instead of deleting in destructor Test Plan: DISABLE_JEMALLOC=1 make delete_scheduler_test -j64 && valgrind --error-exitcode=2 --leak-check=full ./delete_scheduler_test Reviewers: yhchiang, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D43659	2015-08-06 10:56:00 -07:00
sdong	c7742452eb	Add Statistics.getHistogramString() to print more detailed outputs of a histogram Summary: Provide a way for users to know more detailed distribution of a histogram metric. Example outputs: Manually add statement fprintf(stdout, "%s\n", dbstats->getHistogramString(SST_READ_MICROS).c_str()); Will print out something like: Count: 989151 Average: 1.7659 StdDev: 1.52 Min: 0.0000 Median: 1.2071 Max: 860.0000 Percentiles: P50: 1.21 P75: 1.70 P99: 5.12 P99.9: 13.67 P99.99: 21.70 ------------------------------------------------------ [ 0, 1 ) 390839 39.513% 39.513% ######## [ 1, 2 ) 500918 50.641% 90.154% ########## [ 2, 3 ) 79358 8.023% 98.177% ## [ 3, 4 ) 6297 0.637% 98.813% [ 4, 5 ) 1712 0.173% 98.986% [ 5, 6 ) 1134 0.115% 99.101% [ 6, 7 ) 1222 0.124% 99.224% [ 7, 8 ) 1529 0.155% 99.379% [ 8, 9 ) 1264 0.128% 99.507% [ 9, 10 ) 988 0.100% 99.607% [ 10, 12 ) 1378 0.139% 99.746% [ 12, 14 ) 1828 0.185% 99.931% [ 14, 16 ) 410 0.041% 99.972% [ 16, 18 ) 72 0.007% 99.980% [ 18, 20 ) 67 0.007% 99.986% [ 20, 25 ) 106 0.011% 99.997% [ 25, 30 ) 24 0.002% 99.999% [ 30, 35 ) 1 0.000% 100.000% [ 250, 300 ) 2 0.000% 100.000% [ 300, 350 ) 1 0.000% 100.000% [ 800, 900 ) 1 0.000% 100.000% Test Plan: Manually add a print in db_bench and make sure it prints out as expected. Will add some code to cover the function Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D43611	2015-08-05 20:05:56 -07:00
Islam AbdelRahman	29b028b0ed	Make DeleteScheduler tests more reliable Summary: Update DeleteScheduler tests so that they verify the used penalties for waiting instead of measuring the time spent which is not reliable Test Plan: make -j64 delete_scheduler_test && ./delete_scheduler_test COMPILE_WITH_TSAN=1 make -j64 delete_scheduler_test && ./delete_scheduler_test COMPILE_WITH_ASAN=1 make -j64 delete_scheduler_test && ./delete_scheduler_test make -j64 db_test && ./db_test --gtest_filter="DBTest.RateLimitedDelete:DBTest.DeleteSchedulerMultipleDBPaths" COMPILE_WITH_TSAN=1 make -j64 db_test && ./db_test --gtest_filter="DBTest.RateLimitedDelete:DBTest.DeleteSchedulerMultipleDBPaths" COMPILE_WITH_ASAN=1 make -j64 db_test && ./db_test --gtest_filter="DBTest.RateLimitedDelete:DBTest.DeleteSchedulerMultipleDBPaths" Reviewers: yhchiang, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D43635	2015-08-05 19:16:52 -07:00
sdong	7ccd1c80a7	Add two unit tests for SyncWAL() Summary: Add two unit tests for SyncWAL(). One makes sure SyncWAL() doesn't block writes in the other thread. The other makes sure SyncWAL() doesn't wait for ongoing writes to finish before being executed. Create a new test file db_wal_test and move two WAL related tests from db_test to here. Test Plan: Run the new tests Reviewers: IslamAbdelRahman, rven, kradhakrishnan, kolmike, tnovak, yhchiang Reviewed By: yhchiang Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D43605	2015-08-05 14:27:02 -07:00
sdong	3ae386eafe	Add statistic histogram "rocksdb.sst.read.micros" Summary: Measure read latency histogram and put in statistics. Compaction inputs are excluded from it when possible (unfortunately this is usually not possible, as we usually take the table reader from the table cache). Test Plan: Run db_bench and it shows the stats, like: rocksdb.sst.read.micros statistics Percentiles :=> 50 : 1.238522 95 : 2.529740 99 : 3.912180 Reviewers: kradhakrishnan, rven, anthony, IslamAbdelRahman, MarkCallaghan, yhchiang Reviewed By: yhchiang Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D43275	2015-08-05 13:02:33 -07:00
Islam AbdelRahman	bd2fc5f5fb	Fix TSAN for delete_scheduler_test Summary: Fixing TSAN false positive and relaxing the conditions when we are running under TSAN Test Plan: COMPILE_WITH_TSAN=1 make -j64 delete_scheduler_test && ./delete_scheduler_test Reviewers: yhchiang, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D43593	2015-08-05 11:45:31 -07:00
Andres Notzli	c465071029	Removing duplicate code Summary: While working on https://reviews.facebook.net/D43179 , I found duplicate code in the tests. This patch removes it. Test Plan: make clean all check Reviewers: igor, sdong, rven, anthony, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43263	2015-08-05 07:33:27 -07:00
Mike Kolupaev	e06cf1a098	[wal changes 3/3] method in DB to sync WAL without blocking writers Summary: Subj. We really need this feature. Previous diff D40899 has most of the changes to make this possible, this diff just adds the method. Test Plan: `make check`, the new test fails without this diff; ran with ASAN, TSAN and valgrind. Reviewers: igor, rven, IslamAbdelRahman, anthony, kradhakrishnan, tnovak, yhchiang, sdong Reviewed By: sdong Subscribers: MarkCallaghan, maykov, hermanlee4, yoshinorim, tnovak, dhruba Differential Revision: https://reviews.facebook.net/D40905	2015-08-05 06:06:39 -07:00
Ari Ekmekji	5dc3e6881a	Update Tests To Enable Subcompactions Summary: Updated DBTest, DBCompactionTest, and CompactionJobStatsTest to run compaction-related tests once with subcompactions enabled and once disabled using the TEST_P test type in the Google Test suite. Test Plan: ./db_test ./db_compaction_test ./compaction_job_stats_test Reviewers: sdong, igor, anthony, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43443	2015-08-04 22:19:07 -07:00
Islam AbdelRahman	c45a57b41e	Support delete rate limiting Summary: Introduce DeleteScheduler, which allows enforcing a rate limit on file deletion. Instead of deleting files immediately, files are moved to a trash directory and deleted in a background thread that applies a sleep penalty between deletes if needed. I have updated PurgeObsoleteFiles and PurgeObsoleteWALFiles to use the delete_scheduler instead of env_->DeleteFile Test Plan: added delete_scheduler_test existing unit tests Reviewers: kradhakrishnan, anthony, rven, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D43221	2015-08-04 20:45:27 -07:00
Yueh-Hsuan Chiang	14d0bfa429	Add DBOptions::skip_stats_update_on_db_open Summary: UpdateAccumulatedStats() is used to optimize compaction decisions, especially when the number of deletion entries is high, but this function can slow down DB::Open(), especially in disk environments. This patch adds DBOptions::skip_stats_update_on_db_open, which skips UpdateAccumulatedStats() at DB::Open() time when it's set to true. Test Plan: Add DBCompactionTest.SkipStatsUpdateTest Reviewers: igor, anthony, IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: tnovak, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42843	2015-08-04 13:48:16 -07:00
Boyang Zhang	d5c0a6da6c	Merge branch 'master' of github.com:facebook/rocksdb Fixed memory leak error	2015-08-04 10:56:49 -07:00
Boyang Zhang	2d41403f45	Made change to fix the memory leak Summary: So I took a look and I used a pointer to TableBuilder. Changed it to a unique_ptr. I think this should work, but I cannot run valgrind correctly on my local machine to test it. Test Plan: Run valgrind, but it's not working locally. It says I'm executing an unrecognized instruction. Reviewers: yhchiang Subscribers: dhruba, sdong Differential Revision: https://reviews.facebook.net/D43485	2015-08-04 10:55:42 -07:00
sdong	92f7039eec	fix memory corruption issue in sst_dump --show_compression_sizes Summary: In "sst_dump --show_compression_sizes", a reference of CompressionOptions is kept in TableBuilderOptions, which is destroyed later, causing a memory issue. Test Plan: Run valgrind against SSTDumpToolTest.CompressedSizes and make sure it is fixed Reviewers: IslamAbdelRahman, yhchiang, kradhakrishnan, rven Reviewed By: rven Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D43497	2015-08-04 10:48:44 -07:00
Ari Ekmekji	40c64434d4	Parallelize L0-L1 Compaction: Restructure Compaction Job Summary: As of now compactions involving files from Level 0 and Level 1 are single threaded because the files in L0, although sorted, are not range partitioned like the other levels. This means that during L0-L1 compaction each file from L1 needs to be merged with potentially all the files from L0. This attempt to parallelize the L0-L1 compaction assigns a thread and a corresponding iterator to each L1 file that then considers only the key range found in that L1 file and only the L0 files that have those keys (and only the specific portion of those L0 files in which those keys are found). In this way the overlap is minimized and potentially eliminated between different iterators focusing on the same files. The first step is to restructure the compaction logic to break L0-L1 compactions into multiple, smaller, sequential compactions. Eventually each of these smaller jobs will be run simultaneously. Areas to pay extra attention to are # Correct aggregation of compaction job statistics across multiple threads # Proper opening/closing of output files (make sure each thread's is unique) # Keys that span multiple L1 files # Skewed distributions of keys within L0 files Test Plan: Make and run db_test (newer version has separate compaction tests) and compaction_job_stats_test Reviewers: igor, noetzli, anthony, sdong, yhchiang Reviewed By: yhchiang Subscribers: MarkCallaghan, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42699	2015-08-03 11:32:14 -07:00
sdong	47316c2d08	dump_manifest supports DB with more number of levels Summary: Now ldb dump_manifest refuses to work if there are 20 levels. Extend the limit to 64. Test Plan: Run the tool with 20 number of levels Reviewers: kradhakrishnan, anthony, IslamAbdelRahman, yhchiang Reviewed By: yhchiang Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D42879	2015-08-03 11:02:09 -07:00
Andres Noetzli	544be638ab	Fixing fprintf of non string literal Summary: sst_dump_tool contains two instances of `fprintf`s where the `format` argument is not a string literal. This prevents the code from compiling with some compilers/compiler options because of the potential security risks associated with printing non-literals. Test Plan: make all Reviewers: rven, igor, yhchiang, sdong, anthony Reviewed By: anthony Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43305	2015-07-30 17:46:47 -07:00
Boyang Zhang	05d4265a28	Merge branch 'master' of github.com:facebook/rocksdb	2015-07-29 17:48:52 -07:00
Boyang Zhang	4be6d44167	Compression sizes option for sst_dump_tool Summary: Added a new feature to sst_dump_tool.cc to allow a user to see the sizes of the different compression algorithms on an .sst file. Usage: ./sst_dump --file=<filename> --show_compression_sizes ./sst_dump --file=<filename> --show_compression_sizes --set_block_size=<block_size> Note: If you do not set a block size, it will default to 16KB Test Plan: manual test and then write a unit test Reviewers: IslamAbdelRahman, anthony, yhchiang, rven, kradhakrishnan, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D42963	2015-07-29 17:42:13 -07:00
Andres Notzli	e95c59cd2f	Count number of corrupt keys during compaction Summary: For task #7771355, we would like to log the number of corrupt keys during a compaction. This patch implements and tests the count as part of CompactionJobStats. Test Plan: make && make check Reviewers: rven, igor, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42921	2015-07-28 16:41:40 -07:00
Poornima Chozhiyath Raman	1bdfcef7bf	Fix when output level is 0 of universal compaction with trivial move Summary: Fix for universal compaction with trivial move, when the output level is 0. The tests were failing. Fixed by allowing normal compaction when output level is 0. Test Plan: modified test cases run successfully. Reviewers: sdong, yhchiang, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: anthony, kradhakrishnan, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D42933	2015-07-27 14:25:57 -07:00
Siying Dong	8279d41972	Merge pull request #667 from yuslepukhin/fix_now_microsec_win Fix WinEnv::NowMicros	2015-07-23 11:00:43 -07:00
Dmitri Smirnov	555ca3e7b7	Fix WinEnv::NowMicrosec * std::chrono does not provide enough granularity for microsecs and periodically emits duplicates * the bug is manifested in log rotation logic where we get duplicate log file names and lose previous log content * msvc does not implement COW on std::strings; adjusted the test to use refs in the loops as auto does not retain ref info * adjust auto_log rotation test with Windows specific command to remove a folder. The test previously worked because we have unix utils installed in house but this may not be the case for everyone.	2015-07-22 14:36:43 -07:00
Mike Kolupaev	3bf9f9a832	cleaned up PosixMmapFile a little Summary: https://reviews.facebook.net/D42321 has left PosixMmapFile in some weird state. This diff removes pending_sync_ that was now unused, fixes indentation and prevents Fsync() from calling both fsync() and fdatasync(). Test Plan: `make -j check` Reviewers: sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D42885	2015-07-22 12:27:39 -07:00
sdong	85ac65536b	Tests to avoid to use TMPDIR directly Summary: Directly using TMPDIR can cause problems when running tests using parallel option. Fix them. Test Plan: Run all tests in parallel Reviewers: kradhakrishnan, yhchiang, IslamAbdelRahman, anthony Reviewed By: anthony Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D42807	2015-07-21 19:07:34 -07:00
Siying Dong	3dbf4ba220	RangeSync not to sync last 1MB of the file Summary: From other ones' investigation: "sync_file_range() behavior highly depends on kernel version and filesystem. xfs does neighbor page flushing outside of the specified ranges. For example, sync_file_range(fd, 8192, 16384) does not only trigger flushing page #3 to #4, but also flushing many more dirty pages (i.e. up to page#16)... Ranges of the sync_file_range() should be far enough from write() offset (at least 1MB)." Test Plan: make all check Reviewers: igor, rven, kradhakrishnan, yhchiang, IslamAbdelRahman, anthony Reviewed By: anthony Subscribers: yoshinorim, MarkCallaghan, sumeet, domas, dhruba, leveldb, ljin Differential Revision: https://reviews.facebook.net/D15807	2015-07-21 16:22:40 -07:00
agiardullo	064294081b	Improved FileExists API Summary: Add new CheckFileExists method. Considered changing the FileExists api but didn't want to break anyone's builds. Test Plan: unit tests Reviewers: yhchiang, igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42003	2015-07-20 17:20:40 -07:00
Islam AbdelRahman	59eca2cc99	Make memenv_test runnable in ROCKSDB_LITE Summary: Make memenv_test runnable in ROCKSDB_LITE Test Plan: memenv_test Reviewers: sdong, igor, yhchiang Reviewed By: yhchiang Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D42123	2015-07-20 11:35:10 -07:00
Islam AbdelRahman	aa8ac6445b	Skip unsupported tests in ROCKSDB_LITE Summary: Skipping these tests in ROCKSDB_LITE since they are not supported json_document_test wal_manager_test ttl_test sst_dump_test deletefile_test compact_files_test prefix_test checkpoint_test Test Plan: json_document_test wal_manager_test ttl_test sst_dump_test deletefile_test compact_files_test prefix_test checkpoint_test Reviewers: igor, sdong, yhchiang, kradhakrishnan, anthony Reviewed By: anthony Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D42573	2015-07-20 11:24:54 -07:00
Islam AbdelRahman	ce9712d340	Make mock_env_test runnable in ROCKSDB_LITE Summary: Make mock_env_test runnable in ROCKSDB_LITE Test Plan: mock_env_test Reviewers: igor, sdong, yhchiang, kradhakrishnan, anthony Reviewed By: anthony Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D42585	2015-07-20 11:19:51 -07:00
sdong	6e9fbeb27c	Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env Summary: We want to keep Env a thin layer for better portability. Less platform-dependent code should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future. Test Plan: Run all existing unit tests. Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D42321	2015-07-17 16:58:18 -07:00
sdong	5ec829bc4f	Cleaning up CYGWIN define of fread_unlocked to port Summary: CYGWIN avoided fread_unlocked in a wrong way. Fix it to the standard way. Test Plan: Run tests Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D42549	2015-07-17 13:24:07 -07:00
Igor Canadi	35ca59364c	Don't let flushes preempt compactions Summary: When we first started, max_background_flushes was 0 by default and compaction thread was executing flushes (since there was no flush thread). Then, we switched the default max_background_flushes to 1. However, we still support the case where there is no flush thread and flushes are done in compaction. This is making our code a bit more complicated. By not supporting this use-case we can make our code simpler. We have a special case that when you set max_background_flushes to 0, we schedule the flush to execute on the compaction thread. Test Plan: make check (there might be some unit tests that depend on this behavior) Reviewers: IslamAbdelRahman, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D41931	2015-07-17 12:02:52 -07:00
agiardullo	79373c372d	Fix ROCKSDB_WARNING Summary: ROCKSDB_WARNING is only defined if either ROCKSDB_PLATFORM_POSIX or OS_WIN is defined. This works well for building rocksdb with its own build scripts. But this won't work when an outside project (like mongodb) doesn't define ROCKSDB_PLATFORM_POSIX. This fix defines ROCKSDB_WARNING for all platforms. No idea if it's defined correctly on non-posix, non-windows platforms but this is no worse than the current situation where this macro is missing on unexpected platforms. This fix should hopefully fix anyone whose build broke now that we've switched from using #warning to Pragma (to support windows). Unfortunately, while mongo-rocks compiles, it ignores the Pragma and doesn't print a warning. I have not been able to figure out a way to implement this portably on all platforms. Of course, an alternate solution would be to just get rid of ROCKSDB_WARNING and live with include file redirects indefinitely. Thoughts? Test Plan: build rocks, build mongorocks Reviewers: igor, kradhakrishnan, IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42477	2015-07-17 11:04:55 -07:00
Ari Ekmekji	74c755c552	Added JSON manifest dump option to ldb command Summary: Added a new flag --json to the ldb manifest_dump command that prints out the version edits as JSON objects for easier reading and parsing of information. Test Plan: Sample usage: ``` ./ldb manifest_dump --json --path=path/to/manifest/file ``` Sample output: ``` {"EditNumber": 0, "Comparator": "leveldb.BytewiseComparator", "ColumnFamily": 0} {"EditNumber": 1, "LogNumber": 0, "ColumnFamily": 0} {"EditNumber": 2, "LogNumber": 4, "PrevLogNumber": 0, "NextFileNumber": 7, "LastSeq": 35356, "AddedFiles": [{"Level": 0, "FileNumber": 5, "FileSize": 1949284, "SmallestIKey": "'", "LargestIKey": "'"}], "ColumnFamily": 0} ... {"EditNumber": 13, "PrevLogNumber": 0, "NextFileNumber": 36, "LastSeq": 290994, "DeletedFiles": [{"Level": 0, "FileNumber": 17}, {"Level": 0, "FileNumber": 20}, {"Level": 0, "FileNumber": 22}, {"Level": 0, "FileNumber": 24}, {"Level": 1, "FileNumber": 13}, {"Level": 1, "FileNumber": 14}, {"Level": 1, "FileNumber": 15}, {"Level": 1, "FileNumber": 18}], "AddedFiles": [{"Level": 1, "FileNumber": 25, "FileSize": 2114340, "SmallestIKey": "'", "LargestIKey": "'"}, {"Level": 1, "FileNumber": 26, "FileSize": 2115213, "SmallestIKey": "'", "LargestIKey": "'"}, {"Level": 1, "FileNumber": 27, "FileSize": 2114807, "SmallestIKey": "'", "LargestIKey": "'"}, {"Level": 1, "FileNumber": 30, "FileSize": 2115271, "SmallestIKey": "'", "LargestIKey": "'"}, {"Level": 1, "FileNumber": 31, "FileSize": 2115165, "SmallestIKey": "'", "LargestIKey": "'"}, {"Level": 1, "FileNumber": 32, "FileSize": 2114683, "SmallestIKey": "'", "LargestIKey": "'"}, {"Level": 1, "FileNumber": 35, "FileSize": 1757512, "SmallestIKey": "'", "LargestIKey": "'"}], "ColumnFamily": 0} ... ``` Reviewers: sdong, anthony, yhchiang, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D41727	2015-07-17 10:07:40 -07:00
Igor Canadi	a96fcd09b7	Deprecate CompactionFilterV2 Summary: It has been around for a while and it looks like it never found any uses in the wild. It's also complicating our compaction_job code quite a bit. We're deprecating it in 3.13, but will put it back in 3.14 if we actually find users that need this feature. Test Plan: make check Reviewers: noetzli, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42405	2015-07-17 18:59:11 +02:00
Andres Notzli	1d20fa9d0f	Fixed and simplified merge_helper Summary: MergeUntil was not reporting a success when merging an operand with a Value/Deletion despite the comments in MergeHelper and CompactionJob indicating otherwise. This led to operands being written to the compaction output unnecessarily: M1 M2 M3 P M4 M5 --> (P+M1+M2+M3) M2 M3 M4 M5 (before the diff) M1 M2 M3 P M4 M5 --> (P+M1+M2+M3) M4 M5 (after the diff) In addition, the code handling Values/Deletion was basically identical. This patch unifies the code. Finally, this patch also adds testing for merge_helper. Test Plan: make && make check Reviewers: sdong, rven, yhchiang, tnovak, igor Reviewed By: igor Subscribers: tnovak, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42351	2015-07-17 09:27:24 -07:00
Siying Dong	aede5cd8e6	Merge pull request #656 from qinzuoyan/fb-master fix append bug in DumpDBFileSummary()	2015-07-16 18:06:11 -07:00
Siying Dong	d730c36777	Merge pull request #657 from yuslepukhin/ensure_clean_public_headers Ensure Windows build w/o port/port.h in public headers	2015-07-16 17:47:30 -07:00
Dmitri Smirnov	d1a457181d	Ensure Windows build w/o port/port.h in public headers - Remove make file defines from public headers and use _WIN32 because it is compiler defined - use __GNUC__ and __clang__ to guard non-portable attributes - add #include "port/port.h" to some new .cc files. - minor changes in CMakeLists to reflect recent changes	2015-07-16 12:10:16 -07:00
Igor Canadi	c5bca53198	Fix compile on Mac Summary: as title Test Plan: compiles Reviewers: lovro Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42411	2015-07-16 11:22:21 +02:00
qinzuoyan	487bba4348	extend temp str buffer size	2015-07-16 13:56:17 +08:00
qinzuoyan	84c3577af9	fix append bug in DumpDBFileSummary()	2015-07-16 12:02:03 +08:00
agiardullo	81d072623c	move convenience.h out of utilities Summary: Moved convenience.h out of utilities to remove a dependency on utilities in db. Test Plan: unit tests. Also compiled a link to the old location to verify the _Pragma works. Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42201	2015-07-15 14:51:51 -07:00
Poornima Chozhiyath Raman	beb19ad0dd	Fixing delete files in Trivial move of universal compaction Summary: Trivial move in universal compaction was failing when trying to move files from levels other than 0. This was because the DeleteFile call made during a trivial move was only deleting files of level 0, which caused duplication of the same file in different levels. This is fixed by passing the right level as argument in the call of DeleteFile while doing trivial move. Test Plan: ./db_test ran successfully with the new test cases. Reviewers: sdong Reviewed By: sdong Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D42135	2015-07-15 12:28:22 -07:00
lovro	e1c99e10c1	Replace std::priority_queue in MergingIterator with custom heap, take 2 Summary: Repeat of `b6655a679d` (reverted in `b7a2369fb2`) with a proper fix for the issue that `57d216ea65` was trying to fix. Test Plan: make check for i in $(seq 100); do ./db_stress --test_batches_snapshots=1 --threads=32 --write_buffer_size=4194304 --destroy_db_initially=0 --reopen=20 --readpercent=45 --prefixpercent=5 --writepercent=35 --delpercent=5 --iterpercent=10 --db=/tmp/rocksdb_crashtest_KdCI5F --max_key=100000000 --mmap_read=0 --block_size=16384 --cache_size=1048576 --open_files=500000 --verify_checksum=1 --sync=0 --progress_reports=0 --disable_wal=0 --disable_data_sync=1 --target_file_size_base=2097152 --target_file_size_multiplier=2 --max_write_buffer_number=3 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --filter_deletes=0 --memtablerep=prefix_hash --prefix_size=7 --ops_per_thread=200 || break; done Reviewers: anthony, sdong, igor, yhchiang Reviewed By: igor, yhchiang Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D41391	2015-07-15 03:34:40 -07:00
Yueh-Hsuan Chiang	ce829c77e3	Move TransactionLogIterator related tests from db_test.cc to db_log_iter_test.cc Summary: Move TransactionLogIterator related tests from db_test.cc to db_log_iter_test.cc Test Plan: db_test db_log_iter_test Reviewers: sdong, IslamAbdelRahman, igor, anthony Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42045	2015-07-14 16:08:21 -07:00
Yueh-Hsuan Chiang	0936362a70	Block SyncPoint in util/db_test_util.h in released Windows mode. Summary: Block SyncPoint in util/db_test_util.h in released Windows mode. Test Plan: db_test Reviewers: igor, anthony, sdong, IslamAbdelRahman Reviewed By: sdong, IslamAbdelRahman Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42213	2015-07-14 16:02:31 -07:00
Igor Canadi	a9c5109515	Deprecate purge_redundant_kvs_while_flush Summary: This option is guarding the feature implemented 2 and a half years ago: D8991. The feature was enabled by default back then and has been running without issues. There is no reason why any client would turn this feature off. I found no reference in fbcode. Test Plan: none Reviewers: sdong, yhchiang, anthony, dhruba Reviewed By: dhruba Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42063	2015-07-14 13:07:02 +02:00
Igor Canadi	5aea98ddd8	Deprecate WriteOptions::timeout_hint_us Summary: In one of our recent meetings, we discussed deprecating features that are not being actively used. One of those features, at least within Facebook, is timeout_hint. The feature is really nicely implemented, but if nobody needs it, we should remove it from our code-base (until we get a valid use-case). Some arguments: * Less code == better icache hit rate, smaller builds, simpler code * The motivation for adding timeout_hint_us was to work-around RocksDB's stall issue. However, we're currently addressing the stall issue itself (see @sdong's recent work on stall write_rate), so we should never see sharp lock-ups in the future. * Nobody is using the feature within Facebook's code-base. Googling for `timeout_hint_us` also doesn't yield any users. Test Plan: make check Reviewers: anthony, kradhakrishnan, sdong, yhchiang Reviewed By: yhchiang Subscribers: sdong, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D41937	2015-07-14 09:35:48 +02:00
Yueh-Hsuan Chiang	49f42ad032	Move global static functions in db_test_util to DBTestBase Summary: Move global static functions in db_test_util to DBTestBase. This is to prevent unused function warning when decoupling db_test.cc into multiple files. Test Plan: db_test Reviewers: igor, sdong, anthony, IslamAbdelRahman, kradhakrishnan Reviewed By: kradhakrishnan Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D42009	2015-07-13 17:41:41 -07:00
Yueh-Hsuan Chiang	625467a08a	Move reusable part of db_test.cc to util/db_test_util.h Summary: Move reusable part of db_test.cc to util/db_test_util.h. This makes it more possible to partition db_test.cc into multiple smaller test files. Also, fixed many old lint errors in db_test. Test Plan: db_test Reviewers: igor, anthony, IslamAbdelRahman, sdong, kradhakrishnan Reviewed By: sdong, kradhakrishnan Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D41973	2015-07-13 16:53:38 -07:00
Ari Ekmekji	8bca83e5dd	Add tombstone information in CompactionJobStats Summary: Added new statistics in CompactionJobStats to keep track of deletion entries and the expiration of those entries. Updated these fields in compaction_job.cc as compaction took place and wrote a new test in compaction_job_stats_test.cc to verify accuracy. Test Plan: Wrote new test DeletionStatsTest in compaction_job_stats_test.cc to verify Reviewers: sdong, igor, yhchiang Reviewed By: yhchiang Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D41355	2015-07-13 15:51:38 -07:00
sdong	f9728640f3	"make format" against last 10 commits Summary: This helps Windows port to format their changes, as discussed. Might have formatted some other code too because the last 10 commits include more. Test Plan: Build it. Reviewers: anthony, IslamAbdelRahman, kradhakrishnan, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D41961	2015-07-13 13:50:18 -07:00
sdong	76d3cd3286	Fix public API dependency on internal codes and dependency on MAX_INT32 Summary: Public API depends on port/port.h which is wrong. Fix it. Also with gcc 4.8.1 build was broken as MAX_INT32 was not recognized. Fix it by using ::max in linux. Test Plan: Build it and try to build an external project on top of it. Reviewers: anthony, yhchiang, kradhakrishnan, igor Reviewed By: igor Subscribers: yoshinorim, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D41745	2015-07-11 10:32:11 -07:00
sdong	5fd11853cb	Print Fast CRC32 support information in DB LOG Summary: Print whether fast CRC32 is supported in DB info LOG Test Plan: Run db_bench and see it prints out correctly. Reviewers: yhchiang, anthony, kradhakrishnan, igor Reviewed By: igor Subscribers: MarkCallaghan, yoshinorim, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D41733	2015-07-10 17:59:36 -07:00
sdong	a6e38fd170	Fix a uncleaned counter in PerfContext::Reset() Summary: new_table_iterator_nanos is not cleaned in PerfContext::Reset() while new_table_block_iter_nanos is cleaned twice. Fix it. Also fix a comment. Test Plan: Build and db_bench with --perf_context to see the value shown. Reviewers: kradhakrishnan, anthony, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D41721	2015-07-10 16:27:32 -07:00
Siying Dong	e41cbd9c2f	Merge pull request #646 from yuslepukhin/ms_win_port Windows Port from Microsoft	2015-07-10 15:53:39 -07:00
sdong	041b6f95a2	perf_context: report time spent on reading index and bloom blocks Summary: Add a perf context counter to help users figure out time spent on reading indexes and bloom filter blocks. Test Plan: Will write a unit test Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D41433	2015-07-10 14:45:42 -07:00
Dmitri Smirnov	c903ccc4c2	Merge from github/master	2015-07-09 18:01:08 -07:00
unknown	5c79132335	Revert the changes related to Options, as requested to separate them into a different patch.	2015-07-09 11:31:42 -07:00
Dmitri Smirnov	ef4b87f1b2	Commit both PR and internal code review changes	2015-07-07 16:58:20 -07:00
Yueh-Hsuan Chiang	b7a2369fb2	Revert "Replace std::priority_queue in MergingIterator with custom heap" Summary: This patch reverts "Replace std::priority_queue in MergingIterator with custom heap" (commit `b6655a679d`) as it causes db_stress failure. Test Plan: ./db_stress --test_batches_snapshots=1 --threads=32 --write_buffer_size=4194304 --destroy_db_initially=0 --reopen=20 --readpercent=45 --prefixpercent=5 --writepercent=35 --delpercent=5 --iterpercent=10 --db=/tmp/rocksdb_crashtest_KdCI5F --max_key=100000000 --mmap_read=0 --block_size=16384 --cache_size=1048576 --open_files=500000 --verify_checksum=1 --sync=0 --progress_reports=0 --disable_wal=0 --disable_data_sync=1 --target_file_size_base=2097152 --target_file_size_multiplier=2 --max_write_buffer_number=3 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --filter_deletes=0 --memtablerep=prefix_hash --prefix_size=7 --ops_per_thread=200 --kill_random_test=97 Reviewers: igor, anthony, lovro, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D41343	2015-07-07 14:45:20 -07:00
lovro	b6655a679d	Replace std::priority_queue in MergingIterator with custom heap Summary: While profiling compaction in our service I noticed a lot of CPU (~15% of compaction) being spent in MergingIterator and key comparison. Looking at the code I found MergingIterator was (understandably) using std::priority_queue for the multiway merge. Keys in our dataset include sequence numbers that increase with time. Adjacent keys in an L0 file are very likely to be adjacent in the full database. Consequently, compaction will often pick a chunk of rows from the same L0 file before switching to another one. It would be great to avoid the O(log K) operation per row while compacting. This diff replaces std::priority_queue with a custom binary heap implementation. It has a "replace top" operation that is cheap when the new top is the same as the old one (i.e. the priority of the top entry is decreased but it still stays on top). Test Plan: make check To test the effect on performance, I generated databases with data patterns that mimic what I describe in the summary (rows have a mostly increasing sequence number). I see a 10-15% CPU decrease for compaction (and a matching throughput improvement on tmpfs). The exact improvement depends on the number of L0 files and the amount of locality. Performance on randomly distributed keys seems on par with the old code. Reviewers: kailiu, sdong, igor Reviewed By: igor Subscribers: yoshinorim, dhruba, tnovak Differential Revision: https://reviews.facebook.net/D29133	2015-07-06 04:24:09 -07:00
Dmitri Smirnov	e25ee32e3d	Arena needs mman header for mmap	2015-07-02 17:41:05 -07:00
Dmitri Smirnov	d2f0912bd3	Merge the latest changes from github/master	2015-07-02 17:23:41 -07:00
Ari Ekmekji	35cd75c379	Introduce InfoLogLevel::HEADER_LEVEL Summary: Introduced a new category in the enum InfoLogLevel in env.h. Modified Log() in env.cc to use the Header() when the InfoLogLevel == HEADER_LEVEL. Updated tests in auto_roll_logger_test to ensure the header is handled properly in these cases. Test Plan: Augment existing tests in auto_roll_logger_test Reviewers: igor, sdong, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D41067	2015-07-02 17:14:39 -07:00
Dmitri Smirnov	feb99c31a4	Merge remote-tracking branch 'origin' into ms_win_port	2015-07-02 13:14:18 -07:00
Aaron Feldman	a69bc91e37	Multithreaded backup and restore in BackupEngineImpl Summary: Add a new field: BackupableDBOptions.max_background_copies. CreateNewBackup() and RestoreDBFromBackup() will use this number of threads to perform copies. If there is a backup rate limit, then max_background_copies must be 1. Update backupable_db_test.cc to test multi-threaded backup and restore. Update backupable_db_test.cc to test backups when the backup environment is not the same as the database environment. Test Plan: Run ./backupable_db_test Run valgrind ./backupable_db_test Run with TSAN and ASAN Reviewers: yhchiang, rven, anthony, sdong, igor Reviewed By: igor Subscribers: yhchiang, anthony, sdong, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D40725	2015-07-02 11:35:51 -07:00
Dmitri Smirnov	326da912de	Add string.h to Histogram as we init the array out of curly braces	2015-07-01 17:21:38 -07:00
Dmitri Smirnov	ca2fe2c1b6	Address GCC compilation issues invalid suffix on literal no return statement in function returning non-void CuckooStep::operator= extra qualification ‘rocksdb::spatial::Variant:: dereferencing type-punned pointer will break strict-aliasing rules	2015-07-01 17:04:08 -07:00
Dmitri Smirnov	18285c1e2f	Windows Port from Microsoft Summary: Make RocksDb build and run on Windows to be functionally complete and performant. All existing test cases run with no regressions. Performance numbers are in the pull-request. Test plan: make all of the existing unit tests pass, obtain perf numbers. Co-authored-by: Praveen Rao praveensinghrao@outlook.com Co-authored-by: Sherlock Huang baihan.huang@gmail.com Co-authored-by: Alex Zinoviev alexander.zinoviev@me.com Co-authored-by: Dmitri Smirnov dmitrism@microsoft.com	2015-07-01 16:13:56 -07:00
Giuseppe Ottaviano	782a1590f9	Implement a table-level row cache Summary: Implementation of a table-level row cache. It only caches point queries done through the `DB::Get` interface, queries done through the `Iterator` interface will completely skip the cache. Supports snapshots and merge operations. Test Plan: Ran `make valgrind_check commit-prereq` Reviewers: igor, philipp, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D39849	2015-06-23 10:25:45 -07:00
krad	de85e4cadf	Introduce WAL recovery consistency levels Summary: The "one size fits all" approach with WAL recovery will only introduce inconvenience for our varied clients as we go forward. The current recovery is a bit heuristic. We introduce the following levels of consistency while replaying the WAL. 1. RecoverAfterRestart (kTolerateCorruptedTailRecords) This mocks the current recovery mode. 2. RecoverAfterCleanShutdown (kAbsoluteConsistency) This is ideal for unit test and cases where the store is shut down cleanly. We tolerate no corruption or incomplete writes. 3. RecoverPointInTime (kPointInTimeRecovery) This is ideal when using devices with controller cache or file systems which can lose data on restart. We recover up to the point where there is no corruption or incomplete write. 4. RecoverAfterDisaster (kSkipAnyCorruptRecord) This is ideal mode to recover data. We tolerate corruption and incomplete writes, and we hop over those sections that we cannot make sense of, salvaging as many records as possible. Test Plan: (1) Run added unit test to cover all levels. (2) Run make check. Reviewers: leveldb, sdong, igor Subscribers: yoshinorim, dhruba Differential Revision: https://reviews.facebook.net/D38487	2015-06-22 15:28:12 -07:00
krad	7015fd81c4	Add read_nanos to IOStatsContext. Summary: MyRocks need a mechanism to track read outliers. We need to expose this stat. Test Plan: None Reviewers: sdong CC: leveldb Task ID: #7152512 Blame Rev:	2015-06-22 11:09:35 -07:00
Aaron Feldman	18cc5018b7	Fix memory leaks in PinnedUsageTest Summary: See title Test Plan: Run valgrind ./cache_test Reviewers: igor Reviewed By: igor Subscribers: anthony, dhruba Differential Revision: https://reviews.facebook.net/D40419	2015-06-19 09:43:08 -07:00
Yueh-Hsuan Chiang	df719d4964	Make autovector_test runnable in ROCKSDB_LITE Summary: Make autovector_test runnable in ROCKSDB_LITE Test Plan: autovector_test Reviewers: sdong, rven, anthony, kradhakrishnan, IslamAbdelRahman, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D40245	2015-06-18 15:58:00 -07:00
Igor Canadi	760e9a94de	Fail DB::Open() when the requested compression is not available Summary: Currently RocksDB silently ignores this issue and doesn't compress the data. Based on discussion, we agree that this is pretty bad because it can cause confusion for our users. This patch fails DB::Open() if we don't support the compression that is specified in the options. Test Plan: make check with LZ4 not present. If Snappy is not present all tests will just fail because Snappy is our default library. We should make Snappy the requirement, since without it our default DB::Open() fails. Reviewers: sdong, MarkCallaghan, rven, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39687	2015-06-18 14:55:05 -07:00
Aaron Feldman	69bb210d58	Add Cache.GetPinnedUsageUsage() Summary: Add the function Cache.GetPinnedUsage() to return the memory size of entries that are in use by the system (that is, all the entries not in the LRU list). Test Plan: Run ./cache_test and examine PinnedUsageTest. Reviewers: tnovak, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D40305	2015-06-18 13:56:31 -07:00
Igor Canadi	4b8bb62f0a	Don't dump DBOptions for each column family Summary: Currently we dump DBOptions for each column family options we dump. This leads to duplicate lines in our LOG file. This diff fixes that. Test Plan: Check out the LOG Reviewers: sdong, rven, yhchiang Reviewed By: yhchiang Subscribers: IslamAbdelRahman, yoshinorim, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39729	2015-06-18 10:15:54 -07:00
Islam AbdelRahman	12e030a992	Use CompactRangeOptions for CompactRange Summary: This diff updates DB::CompactRange to use RangeCompactionOptions instead of using multiple parameters. The old CompactRange is still available but deprecated. Test Plan: make all check make rocksdbjava USE_CLANG=1 make all OPT=-DROCKSDB_LITE make release Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D40209	2015-06-17 14:36:14 -07:00
Yueh-Hsuan Chiang	1369f015ee	Only initialize the ThreadStatusData when necessary. Summary: Before this patch, any function call to ThreadStatusUtil might automatically initialize and register the thread status data. However, if it is the user-thread making this call, the allocated thread-status-data will never be released as such threads are not managed by rocksdb. In this patch, I remove the automatic-initialization part. Thread-status data is only initialized and uninitialized in Env during the thread creation and destruction. Test Plan: db_test thread_list_test listener_test Reviewers: igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D40017	2015-06-17 11:21:18 -07:00
sdong	40f562e747	Allow GetApproximateSize() to include mem table size if it is skip list memtable Summary: Add an option in GetApproximateSize() so that the result will include estimated sizes in mem tables. To implement it, implement an estimated count from the beginning to a key in skip list. The approach is to count to find the entry, how many Next() is issued from each level, and sum them with a weight that is <branching factor> ^ <level>. Test Plan: Add a test case Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D40119	2015-06-16 18:13:23 -07:00
Yueh-Hsuan Chiang	bee8d033f4	Removed two unused macros in iostats_context Summary: Removed two unused macros in iostats_context Test Plan: make all check Reviewers: sdong, rven, IslamAbdelRahman, kradhakrishnan, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D40005	2015-06-12 10:45:02 -07:00
sdong	7842920be5	Slow down writes by bytes written Summary: We slow down data into the database to the rate of options.delayed_write_rate (a new option) with this patch. The thread synchronization approach I take is to still synchronize write controller by DB mutex and GetDelay() is inside DB mutex. Try to minimize the frequency of getting time in GetDelay(). I verified it through db_bench and it seems to work. hard_rate_limit is deprecated. options.delayed_write_rate is still not dynamically changeable. Need to work on it as a follow-up. Test Plan: Add new unit tests in db_test Reviewers: yhchiang, rven, kradhakrishnan, anthony, MarkCallaghan, igor Reviewed By: igor Subscribers: ikabiljo, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D36351	2015-06-11 20:42:18 -07:00
Islam AbdelRahman	ab455ce495	fix clang build	2015-06-11 14:32:10 -07:00
Yueh-Hsuan Chiang	3eddd1abe9	Add Env::GetThreadID(), which returns the ID of the current thread. Summary: Add Env::GetThreadID(), which returns the ID of the current thread. In addition, make GetThreadList() and InfoLog use same unique ID for the same thread. Test Plan: db_test listener_test Reviewers: igor, rven, IslamAbdelRahman, kradhakrishnan, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D39735	2015-06-11 14:18:02 -07:00
Islam AbdelRahman	643bbbf081	Use nullptr for default compaction_filter_factory Summary: Replacing the default value for compaction_filter_factory and compaction_filter_factory_v2 to be nullptr instead of DefaultCompactionFilterFactory / DefaultCompactionFilterFactoryV2 The reason for this is to be able to determine easily if we have compaction filter factory or not without depending on RTTI Test Plan: make check Reviewers: yoshinorim, ott, igor, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D39693	2015-06-08 16:34:26 -07:00
Yueh-Hsuan Chiang	7647df8f9e	Fixed the tsan failure in util/compaction_job_stats_impl.cc Summary: The type of smallest_output_key_prefix and largest_output_key_prefix has been changed to std::string in https://reviews.facebook.net/D39537. As a result, we shouldn't do smallest_output_key_prefix[0] = 0 in the initialization. Test Plan: compile db_test with tsan enabled and repeat DBTest.CompactionDeletionTrigger test to verify the tsan issue is gone. Reviewers: igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39645	2015-06-05 11:05:35 -07:00
Yueh-Hsuan Chiang	fe5c6321cb	Allow EventListener::OnCompactionCompleted to return CompactionJobStats. Summary: Allow EventListener::OnCompactionCompleted to return CompactionJobStats, which contains useful information about a compaction. Example CompactionJobStats returned by OnCompactionCompleted(): smallest_output_key_prefix 05000000 largest_output_key_prefix 06990000 elapsed_time 42419 num_input_records 300 num_input_files 3 num_input_files_at_output_level 2 num_output_records 200 num_output_files 1 actual_bytes_input 167200 actual_bytes_output 110688 total_input_raw_key_bytes 5400 total_input_raw_value_bytes 300000 num_records_replaced 100 is_manual_compaction 1 Test Plan: Developed a mega test in db_test which covers 20 variables in CompactionJobStats. Reviewers: rven, igor, anthony, sdong Reviewed By: sdong Subscribers: tnovak, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38463	2015-06-02 17:07:16 -07:00
Mike Kolupaev	ec7a944360	more times in perf_context and iostats_context Summary: We occasionally get write stalls (>1s Write() calls) on HDD under read load. The following timers explain almost all of the stalls: - perf_context.db_mutex_lock_nanos - perf_context.db_condition_wait_nanos - iostats_context.open_time - iostats_context.allocate_time - iostats_context.write_time - iostats_context.range_sync_time - iostats_context.logger_time In my experiments each of these occasionally takes >1s on write path under some workload. There are rare cases when Write() takes long but none of these takes long. Test Plan: Added code to our application to write the listed timings to log for slow writes. They usually add up to almost exactly the time Write() call took. Reviewers: rven, yhchiang, sdong Reviewed By: sdong Subscribers: march, dhruba, tnovak Differential Revision: https://reviews.facebook.net/D39177	2015-06-02 02:07:58 -07:00
Mike Kolupaev	2ecac9f96d	add rocksdb::WritableFileWrapper similar to rocksdb::EnvWrapper Summary: There used to be no good (known to me) non-intrusive way to wrap WritableFile - you can't call protected virtual methods of the wrapped pointer to WritableFile. This diff adds a convenience class WritableFileWrapper that makes wrapping WritableFile both possible and easy. Test Plan: `make clean; make -j release`, `make clean; OPT=-DROCKSDB_LITE make release`, `make clean; USE_CLANG=1 make -j all`. Reviewers: sdong, yhchiang, rven Reviewed By: rven Subscribers: dhruba, tnovak, march Differential Revision: https://reviews.facebook.net/D39147	2015-06-01 11:22:36 -07:00
agiardullo	dc9d70de65	Optimistic Transactions Summary: Optimistic transactions supporting begin/commit/rollback semantics. Currently relies on checking the memtable to determine if there are any collisions at commit time. Not yet implemented would be a way of ensuring the memtable has some minimum amount of history so that we won't fail to commit when the memtable is empty. You should probably start with transaction.h to get an overview of what is currently supported. Test Plan: Added a new test, but still need to look into stress testing. Reviewers: yhchiang, igor, rven, sdong Reviewed By: sdong Subscribers: adamretter, MarkCallaghan, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D33435	2015-05-29 14:36:35 -07:00
Yueh-Hsuan Chiang	9ffc8ba024	Include EventListener in stress test. Summary: Include EventListener in stress test. Test Plan: make blackbox_crash_test whitebox_crash_test Reviewers: anthony, igor, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39105	2015-05-29 13:17:49 -07:00
agiardullo	c815351038	Support saving history in memtable_list Summary: For transactions, we are using the memtables to validate that there are no write conflicts. But after flushing, we don't have any memtables, and transactions could fail to commit. So we want to somehow keep around some extra history to use for conflict checking. In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit. After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure). It seems like the best place for this is abstracted inside the memtable_list. I decided to create a separate list in MemtableListVersion as using the same list complicated the flush/installalflushresults logic too much. This diff adds a new parameter to control how much memtable history to keep around after flushing. However, it sounds like people aren't too fond of adding new parameters. So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers. This should not change the maximum amount of memory used, but make it more likely we're using closer to the limit. (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached). So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit). However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions. Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests. Added testing in memtablelist_test and planning on adding more testing here. Reviewers: sdong, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37443	2015-05-28 16:34:24 -07:00
Yueh-Hsuan Chiang	672dda9b3b	[API Change] Move listeners from ColumnFamilyOptions to DBOptions Summary: Move listeners from ColumnFamilyOptions to DBOptions Test Plan: listener_test compact_files_test Reviewers: rven, anthony, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39087	2015-05-28 13:21:39 -07:00
Reed Allman	328ad902ab	update an import path to fit in with the rest of the kids	2015-05-22 22:56:32 -07:00
Yueh-Hsuan Chiang	dc81efe415	Change the log-level of DB summary and options from INFO_LEVEL to WARN_LEVEL Summary: Change the log-level of DB summary and options from INFO_LEVEL to WARN_LEVEL Test Plan: Use db_bench to verify the log level. Sample output: 2015/05/22-00:20:39.778064 7fff75b41300 [WARN] RocksDB version: 3.11.0 2015/05/22-00:20:39.778095 7fff75b41300 [WARN] Git sha rocksdb_build_git_sha:7fee8775a459134c4cb04baae5bd1687e268f2a0 2015/05/22-00:20:39.778099 7fff75b41300 [WARN] Compile date May 22 2015 2015/05/22-00:20:39.778101 7fff75b41300 [WARN] DB SUMMARY 2015/05/22-00:20:39.778145 7fff75b41300 [WARN] SST files in /tmp/rocksdbtest-691931916/dbbench dir, Total Num: 0, files: 2015/05/22-00:20:39.778148 7fff75b41300 [WARN] Write Ahead Log file in /tmp/rocksdbtest-691931916/dbbench: 2015/05/22-00:20:39.778150 7fff75b41300 [WARN] Options.error_if_exists: 0 2015/05/22-00:20:39.778152 7fff75b41300 [WARN] Options.create_if_missing: 1 2015/05/22-00:20:39.778153 7fff75b41300 [WARN] Options.paranoid_checks: 1 Reviewers: MarkCallaghan, igor, kradhakrishnan Reviewed By: igor Subscribers: sdong, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38835	2015-05-22 11:54:59 -07:00
Yueh-Hsuan Chiang	7fee8775a4	Allow EventLogger to directly log from a JSONWriter. Summary: Allow EventLogger to directly log from a JSONWriter. This allows the JSONWriter to be shared by EventLogger and potentially EventListener, which is an important step to integrate EventLogger and EventListener. This patch also rewrites EventLoggerHelpers::LogTableFileCreation(), which uses the new API to generate identical log. Test Plan: Run db_bench in debug mode and make sure the log is correct and no assertions fail. Reviewers: sdong, anthony, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38709	2015-05-21 15:39:30 -07:00
Igor Canadi	4cb4d546cd	Set stats_dump_period_sec to 600 by default Summary: Having stats in our LOG more often will help a lot with perf debugging. Test Plan: none Reviewers: sdong, MarkCallaghan Reviewed By: MarkCallaghan Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38781	2015-05-21 14:22:16 -04:00
Yueh-Hsuan Chiang	d1a978ae3d	Rename JSONWritter to JSONWriter Summary: Rename JSONWritter to JSONWriter Test Plan: make Reviewers: igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38733	2015-05-20 12:11:57 -07:00
Igor Canadi	4a855c0799	Add an option wal_bytes_per_sync to control sync_file_range for WAL files Summary: sync_file_range is not always asynchronous and thus can block writes if we do this for WAL in the foreground thread. See more here: http://yoshinorimatsunobu.blogspot.com/2014/03/how-syncfilerange-really-works.html Some users don't want us to call sync_file_range on WALs. Some others do. Thus, I'm adding a separate option wal_bytes_per_sync to control calling sync_file_range on WAL files. bytes_per_sync will apply only to table files now. Test Plan: no more sync_file_range for WAL as evidenced by strace Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38253	2015-05-18 17:03:59 -07:00
Yueh-Hsuan Chiang	74f3832d85	Fixed compile errors due to some gcc does not have std::map::emplace Summary: Fixed the following compile errors due to some gcc does not have std::map::emplace util/thread_status_impl.cc: In static member function ‘static std::map<std::basic_string<char>, long unsigned int> rocksdb::ThreadStatus::InterpretOperationProperties(rocksdb::ThreadStatus::OperationType, const uint64_t)’: util/thread_status_impl.cc:88:20: error: ‘class std::map<std::basic_string<char>, long unsigned int>’ has no member named ‘emplace’ util/thread_status_impl.cc:90:20: error: ‘class std::map<std::basic_string<char>, long unsigned int>’ has no member named ‘emplace’ util/thread_status_impl.cc:94:20: error: ‘class std::map<std::basic_string<char>, long unsigned int>’ has no member named ‘emplace’ util/thread_status_impl.cc:96:20: error: ‘class std::map<std::basic_string<char>, long unsigned int>’ has no member named ‘emplace’ util/thread_status_impl.cc:98:20: error: ‘class std::map<std::basic_string<char>, long unsigned int>’ has no member named ‘emplace’ util/thread_status_impl.cc:101:20: error: ‘class std::map<std::basic_string<char>, long unsigned int>’ has no member named ‘emplace’ make: ** [util/thread_status_impl.o] Error 1 Test Plan: make db_bench Reviewers: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38643	2015-05-18 13:48:56 -07:00
stash93	0c8017dbae	Remove duplicated code Summary: Call Flush() function instead Test Plan: make all check Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D38583	2015-05-18 23:44:52 +03:00
Yueh-Hsuan Chiang	3f0867c0fe	Allow GetThreadList to report Flush properties. Summary: Allow GetThreadList to report Flush properties, which includes: * job id * number of bytes that has been written since flush started. * total size of input mem-tables Test Plan: ./db_bench --threads=30 --num=1000000 --benchmarks=fillrandom --thread_status_per_interval=100 --value_size=1000 Sample output from db_bench which tracks same flush job ThreadID ThreadType cfName Operation ElapsedTime Stage State OperationProperties 140213879898240 High Pri default Flush 5789 us FlushJob::WriteLevel0Table BytesMemtables 4112835 \| BytesWritten 577104 \| JobID 8 \| ThreadID ThreadType cfName Operation ElapsedTime Stage State OperationProperties 140213879898240 High Pri default Flush 30.634 ms FlushJob::WriteLevel0Table BytesMemtables 4112835 \| BytesWritten 1734865 \| JobID 8 \| Reviewers: rven, igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38505	2015-05-15 23:22:22 -07:00
Yueh-Hsuan Chiang	714fcc067d	Make ThreadStatus::InterpretOperationProperties take const uint64_t* Summary: Make ThreadStatus::InterpretOperationProperties take const uint64_t* Test Plan: make make OPT=-DROCKSDB_LITE shared_lib Reviewers: igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38445	2015-05-13 12:26:07 -07:00
Igor Canadi	dbd95b7532	Add more table properties to EventLogger Summary: Example output: {"time_micros": 1431463794310521, "job": 353, "event": "table_file_creation", "file_number": 387, "file_size": 86937, "table_info": {"data_size": "81801", "index_size": "9751", "filter_size": "0", "raw_key_size": "23448", "raw_average_key_size": "24.000000", "raw_value_size": "990571", "raw_average_value_size": "1013.890481", "num_data_blocks": "245", "num_entries": "977", "filter_policy_name": "", "kDeletedKeys": "0"}} Also fixed a bug where BuildTable() in recovery was passing Env::IOHigh argument into paranoid_checks_file parameter. Test Plan: make check + check out the output in the log Reviewers: sdong, rven, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38343	2015-05-12 15:53:55 -07:00
Yueh-Hsuan Chiang	77a5a543a5	Allow GetThreadList() to report basic compaction operation properties. Summary: Now we're able to show more details about a compaction in GetThreadList() :) This patch allows GetThreadList() to report basic compaction operation properties. Basic compaction properties include: 1. job id 2. compaction input / output level 3. compaction property flags (is_manual, is_deletion, .. etc) 4. total input bytes 5. the number of bytes read so far. 6. the number of bytes written so far. Flush operation properties will be done in a separate diff. Test Plan: /db_bench --threads=30 --num=1000000 --benchmarks=fillrandom --thread_status_per_interval=1 Sample output of tracking same job: ThreadID ThreadType cfName Operation ElapsedTime Stage State OperationProperties 140664171987072 Low Pri default Compaction 31.357 ms CompactionJob::FinishCompactionOutputFile BaseInputLevel 1 \| BytesRead 2264663 \| BytesWritten 1934241 \| IsDeletion 0 \| IsManual 0 \| IsTrivialMove 0 \| JobID 277 \| OutputLevel 2 \| TotalInputBytes 3964158 \| ThreadID ThreadType cfName Operation ElapsedTime Stage State OperationProperties 140664171987072 Low Pri default Compaction 59.440 ms CompactionJob::FinishCompactionOutputFile BaseInputLevel 1 \| BytesRead 2264663 \| BytesWritten 1934241 \| IsDeletion 0 \| IsManual 0 \| IsTrivialMove 0 \| JobID 277 \| OutputLevel 2 \| TotalInputBytes 3964158 \| ThreadID ThreadType cfName Operation ElapsedTime Stage State OperationProperties 140664171987072 Low Pri default Compaction 226.375 ms CompactionJob::Install BaseInputLevel 1 \| BytesRead 3958013 \| BytesWritten 3621940 \| IsDeletion 0 \| IsManual 0 \| IsTrivialMove 0 \| JobID 277 \| OutputLevel 2 \| TotalInputBytes 3964158 \| Reviewers: sdong, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37653	2015-05-06 22:51:06 -07:00
Igor Canadi	7f47ba0e26	Fix possible SIGSEGV in CompactRange (github issue #596 ) Summary: For very detailed explanation of what's happening read this: https://github.com/facebook/rocksdb/issues/596 Test Plan: make check + new unit test Reviewers: yhchiang, anthony, rven Reviewed By: rven Subscribers: adamretter, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37779	2015-04-29 10:52:31 -07:00
Igor Canadi	1bb4928da9	Include bunch of more events into EventLogger Summary: Added these events: * Recovery start, finish and also when recovery creates a file * Trivial move * Compaction start, finish and when compaction creates a file * Flush start, finish Also includes small fix to EventLogger Also added option ROCKSDB_PRINT_EVENTS_TO_STDOUT which is useful when we debug things. I've spent far too much time chasing LOG files. Still didn't get sst table properties in JSON. They are written very deeply into the stack. I'll address in separate diff. TODO: * Write specification. Let's first use this for a while and figure out what's good data to put here, too. After that we'll write spec * Write tools that parse and analyze LOGs. This can be in python or go. Good intern task. Test Plan: Ran db_bench with ROCKSDB_PRINT_EVENTS_TO_STDOUT. Here's the output: https://phabricator.fb.com/P19811976 Reviewers: sdong, yhchiang, rven, MarkCallaghan, kradhakrishnan, anthony Reviewed By: anthony Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37521	2015-04-27 15:20:02 -07:00
Aashish Pant	3db81d535a	Fix memory leak in cache_test introduced in the previous commit Test Plan: Verified that valgrind build passes for cache_test Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D37665	2015-04-26 21:47:30 -07:00
clark.kang	6ede020dc4	fix typos	2015-04-25 18:14:27 +09:00
Aashish Pant	242f9b4c26	Fix CLANG build issue introduced in previous commit Summary: Added keyword override for SetCapacity() Test Plan: Fixes build Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D37647	2015-04-24 14:45:12 -07:00
Aashish Pant	794ccfde89	Task 6532943: Rocksdb - SetCapacity() can dynamically change cache capacity if feasible Summary: When new capacity is larger than existing capacity, simply update the capacity to the new value. When new capacity is less than existing capacity, but more than the usage, simply update the capacity to the new value. When new capacity is less than both the existing capacity and the existing usage, try to purge entries in LRU if feasible to make usage < capacity. Test Plan: Created unit tests in cache_test.cc Reviewers: sdong, rven, yhchiang, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D37527	2015-04-24 14:12:58 -07:00
sdong	98a44559d5	Build for CYGWIN Summary: Make it build for CYGWIN. Need to define "-std=gnu++11" instead of "-std=c++11" and use some replacement functions. Test Plan: Build it and run some unit tests in CYGWIN Reviewers: yhchiang, rven, anthony, kradhakrishnan, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D37605	2015-04-23 21:33:44 -07:00
Igor Canadi	e003d3864c	Abstract out SetMaxPossibleForUserKey() and SetMinPossibleForUserKey Summary: Based on feedback from D37083. Are all of these correct? In some spaces it seems like we're doing SetMaxPossibleForUserKey() although we want the smallest possible internal key for user key. Test Plan: make check Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37341	2015-04-23 18:08:37 -07:00
sdong	397b6588bd	options.paranoid_file_checks to read all rows after writing to a file. Summary: To further distinguish the corruption cases were caused by storage media or in memory states when writing it, add a paranoid check after writing the file to iterate all the rows. Test Plan: Add a new unit test for it Reviewers: rven, igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D37335	2015-04-23 11:34:35 -07:00
Jim Meyering	0a91bca5db	test: avoid vuln-inducing use of temporary directory Summary: Without this change, someone on the machine on which I run "make check" could cause me to overwrite arbitrary files owned by me, via a symlink attack. Instead of using a predictable temporary directory and accepting to use a preexisting one, always create a new one using mkdtemp. If $TEST_IOCTL_FRIENDLY_TMPDIR is set and usable, attempt first to find a usable temporary directory therein. If not, or if unusable, then try /var/tmp and /tmp. If none of those is usable abort with a diagnostic. To do that, I added a new class. Its constructor finds a suitable directory or aborts, the sole member prints that directory's name, and the destructor unlinks what should be an empty directory. Note that while the code before this did not remove its temporary directory, there was only one per $UID. Now, there would be at least one per run or one per test, depending on implementation, so it is important to remove them. Test Plan: Run this on a fedora rawhide system, where /tmp is a tmpfs file system, and /var/tmp is ext4. # This gives a diagnostic that /dev/shm is not suitable # and ends up using /var/tmp. TEST_IOCTL_FRIENDLY_TMPDIR=/dev/shm ./env_test # Uses /var/tmp; same as when envvar not set. TEST_IOCTL_FRIENDLY_TMPDIR=/var/tmp ./env_test # Uses /tmp unless it's tmpfs, in which case it gives # a diagnostic and uses /var/tmp. TEST_IOCTL_FRIENDLY_TMPDIR=/tmp ./env_test Reviewers: ljin, rven, igor.sugak, yhchiang, sdong, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D37287	2015-04-23 08:00:56 -07:00
Jim Meyering	79c21ec0c4	skip ioctl-using tests when not supported Summary: [NB: this is a prerequisite for the /tmp-abuse-fixing patch] This avoids spurious test failure on Linux systems like Fedora for which /tmp is a tmpfs file system. On a devtmpfs file system, ioctl(fd, FS_IOC_GETVERSION, &version) returns -1 with errno == ENOTTY, indicating that that ioctl is not supported on such a file system. Do not let this cause test failures, e.g., where env_test would assert that file->GetUniqueId(...) > 0. Before this change, ./env_test would fail these three tests on a fedora rawhide system: [ FAILED ] 3 tests, listed below: [ FAILED ] EnvPosixTest.RandomAccessUniqueID [ FAILED ] EnvPosixTest.RandomAccessUniqueIDConcurrent [ FAILED ] EnvPosixTest.RandomAccessUniqueIDDeletes 3 FAILED TESTS The fix: When support for that ioctl is lacking, skip each affected test. Could be improved by noting which sub-tests are being skipped. Test Plan: run these on F21 and note that they now pass. TEST_TMPDIR=/dev/shm/rdb ./env_test ./env_test Reviewers: ljin, rven, igor.sugak, yhchiang, sdong, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D37323	2015-04-17 20:39:02 -07:00
Igor Canadi	48b0a045da	Speed up reduce_levels_test Summary: For some reason reduce_levels is opening the database with 65.000 levels. This makes the ComputeCompactionScore() function terribly slow, and the test is also very slow (20 seconds). Test Plan: mr reduce_levels_test now takes 20ms Reviewers: sdong, rven, kradhakrishnan, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37059	2015-04-16 19:31:34 -07:00
sdong	fcb206b667	SyncPoint to allow a callback with an argument and use it to get DBTest.DynamicLevelCompressionPerLevel2 more straight-forward Summary: Allow users to give a callback function with parameter using sync point, so more complicated verification can be done in tests. Use it in DBTest.DynamicLevelCompressionPerLevel2 so that failures will be more easy to debug. Test Plan: Run all tests. Run DBTest.DynamicLevelCompressionPerLevel2 with valgrind check. Reviewers: rven, yhchiang, anthony, kradhakrishnan, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D36999	2015-04-14 16:18:50 -07:00
Igor Canadi	1983fadcbc	assert(sorted) in vector rep Summary: based on discussion on https://reviews.facebook.net/D36969 Test Plan: will let jenkins do its job Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36975	2015-04-13 17:33:24 -07:00
Igor Canadi	9b983befa8	Fix flakiness of WalManagerTest Summary: We should use mocked-out env for these tests to make them more reliable. An added benefit is that instead of actually sleeping for 3 seconds, we can pretend to sleep and just increase time counters. Test Plan: for i in `seq 100`; do ./wal_manager_test --gtest_filter=WalManagerTest.WALArchivalTtl ;done Reviewers: rven, meyering Reviewed By: meyering Subscribers: meyering, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36951	2015-04-13 16:15:05 -07:00
Igor Canadi	d41a565a4a	Don't do O(N^2) operations in debug mode for vector memtable Summary: As title. For every operation we're asserting Valid(), which sorts the data. That's pretty terrible. We have to be careful to have decent performance even with DEBUG builds. Test Plan: make check Reviewers: sdong, rven, yhchiang, MarkCallaghan Reviewed By: MarkCallaghan Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36969	2015-04-13 16:11:47 -07:00
Igor Canadi	08be1803ee	Fix bad performance in debug mode Summary: See github issue 574: https://github.com/facebook/rocksdb/issues/574 Basically when we're running in DEBUG mode we're calling `usleep(0)` on every mutex lock. I bisected the issue to https://reviews.facebook.net/D36963. Instead of calling sleep(0), this diff just avoids calling SleepForMicroseconds() when delay is not set. Test Plan: bpl=10485760;overlap=10;mcz=2;del=300000000;levels=2;ctrig=10000000; delay=10000000; stop=10000000; wbn=30; mbc=20; mb=1073741824;wbs=268435456; dds=1; sync=0; r=100000; t=1; vs=800; bs=65536; cs=1048576; of=500000; si=1000000; ./db_bench --benchmarks=fillrandom --disable_seek_compaction=1 --mmap_read=0 --statistics=1 --histogram=1 --num=$r --threads=$t --value_size=$vs --block_size=$bs --cache_size=$cs --bloom_bits=10 --cache_numshardbits=4 --open_files=$of --verify_checksum=1 --db=/tmp/rdb10test --sync=$sync --disable_wal=1 --compression_type=snappy --stats_interval=$si --compression_ratio=0.5 --disable_data_sync=$dds --write_buffer_size=$wbs --target_file_size_base=$mb --max_write_buffer_number=$wbn --max_background_compactions=$mbc --level0_file_num_compaction_trigger=$ctrig --level0_slowdown_writes_trigger=$delay --level0_stop_writes_trigger=$stop --num_levels=$levels --delete_obsolete_files_period_micros=$del --min_level_to_compress=$mcz --max_grandparent_overlap_factor=$overlap --stats_per_interval=1 --max_bytes_for_level_base=$bpl --memtablerep=vector --use_existing_db=0 --disable_auto_compactions=1 --source_compaction_factor=10000000 \| grep ops Before: fillrandom : 117.525 micros/op 8508 ops/sec; 6.6 MB/s After: fillrandom : 1.283 micros/op 779502 ops/sec; 606.6 MB/s Reviewers: rven, yhchiang, sdong Reviewed By: sdong Subscribers: meyering, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36963	2015-04-13 15:58:45 -07:00
Igor Canadi	abb4052278	Kill benchharness Summary: 1. it doesn't work 2. we're not using it In the future, if we need general benchmark framework, we should probably use https://github.com/google/benchmark Test Plan: make all Reviewers: yhchiang, rven, anthony, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36777	2015-04-13 10:17:42 -07:00
Yueh-Hsuan Chiang	7b9581bc3b	Fixed xfunc related compile errors in ROCKSDB_LITE Summary: Fixed xfunc related compile errors in ROCKSDB_LITE Now make OPT=-DROCKSDB_LITE shared_lib -j32 would work Test Plan: make clean make OPT=-DROCKSDB_LITE shared_lib -j32 make clean make OPT=-DROCKSDB_LITE static_lib -j32 Reviewers: sdong, igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36825	2015-04-09 21:05:18 -07:00
Igor Canadi	5e067a7b19	Clean up compression logging Summary: Now we add warnings when user configures compression and the compression is not supported. Test Plan: Configured compression to non-supported values. Observed messages in my log: 2015/03/26-12:17:57.586341 7ffb8a496840 [WARN] Compression type chosen for level 2 is not supported: LZ4. RocksDB will not compress data on level 2. 2015/03/26-12:19:10.768045 7f36f15c5840 [WARN] Compression type chosen is not supported: LZ4. RocksDB will not compress data. Reviewers: rven, sdong, yhchiang Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35979	2015-04-06 12:50:44 -07:00
sdong	953a885ebf	A new call back to TablePropertiesCollector to let users know whether the entry is an add, delete or merge Summary: Currently users have no way to tell from the TablePropertiesCollector call back whether a key is an add, delete or merge. Add a new function to expose it. Also refactor the code so that (1) table property collector and internal table property collector are two separate data structures, with the latter one now exposed (2) table builders only receive internal table properties Test Plan: Add cases in table_properties_collector_test to cover both of old and new ways of using TablePropertiesCollector. Reviewers: yhchiang, igor.sugak, rven, igor Reviewed By: rven, igor Subscribers: meyering, yoshinorim, maykov, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D35373	2015-04-06 10:27:21 -07:00
sdong	b23bbaa82a	Universal Compactions with Small Files Summary: With this change, we use L1 and up to store compaction outputs in universal compaction. The compaction pick logic stays the same. Outputs are stored in the largest "level" possible. If options.num_levels=1, it behaves all the same as now. Test Plan: 1) convert most of existing unit tests for universal compaction to include the option of one level and multiple levels. 2) add a unit test to cover parallel compaction in universal compaction and run it in one level and multiple levels 3) add unit test to migrate from multiple level setting back to one level setting 4) add a unit test to insert keys to trigger multiple rounds of compactions and verify results. Reviewers: rven, kradhakrishnan, yhchiang, igor Reviewed By: igor Subscribers: meyering, leveldb, MarkCallaghan, dhruba Differential Revision: https://reviews.facebook.net/D34539	2015-03-30 15:12:02 -07:00
Igor Canadi	2511b7d947	Makefile minor cleanup Summary: Just couple of small changes: 1. removed signal_test, since it doesn't seem useful and we don't even run it as part of `make check` 2. moved perf_context_test to TESTS instead of PROGRAMS 3. `make release` probably shouldn't compile benchmarks. We currently rely on `make release` building db_bench (via Jenkins), so I left db_bench there. This is just a minor cleanup. We need to rethink our targets since they are a bit messy right now. We can do this during our tech debt week. Test Plan: make release Reviewers: anthony, rven, yhchiang, sdong, meyering Reviewed By: meyering Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36171	2015-03-30 16:05:35 -04:00
Mark Callaghan	99ec2412e5	Make the benchmark scripts configurable and add tests Summary: This makes run_flash_bench.sh configurable. Previously it was hardwired for 1B keys and tests ran for 12 hours each. That kept me from using it. This makes it configurable, adds more tests, makes the duration per-test configurable and refactors the test scripts. Adds the seekrandomwhilemerging test to db_bench which is the same as seekrandomwhilewriting except the writer thread does Merge rather than Put. Forces the stall-time column in compaction IO stats to use a fixed format (H:M:S) which makes it easier to scrape and parse. Also adds an option to AppendHumanMicros to force a fixed format. Sometimes automation and humans want different formats. Calls thread->stats.AddBytes(bytes); in db_bench for more tests to get the MB/sec summary stats in the output at test end. Adds the average ingest rate to compaction IO stats. Output now looks like: https://gist.github.com/mdcallag/2bd64d18be1b93adc494 More information on the benchmark output is at https://gist.github.com/mdcallag/db43a58bd5ac624f01e1 For benchmark.sh changes default RocksDB configuration to reduce stalls: * min_level_to_compress from 2 to 3 * hard_rate_limit from 2 to 3 * max_grandparent_overlap_factor and max_bytes_for_level_multiplier from 10 to 8 * L0 file count triggers from 4,8,12 to 4,12,20 for (start,stall,stop) Task ID: #6596829 Blame Rev: Test Plan: run tools/run_flash_bench.sh Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D36075	2015-03-30 11:28:25 -07:00
xiaoxichen	bcd8a71a28	Fix integer overflow on i386 arch The error was: util/logging.cc: In function 'int rocksdb::AppendHumanMicros(uint64_t, char*, int)': error: util/logging.cc:41:39: integer overflow in expression [-Werror=overflow] } else if (micros < 1000000l * 60 * 60) { ^ error: util/logging.cc:41:39: comparison between signed and unsigned integer expressions [-Werror=sign-compare]	2015-03-27 08:38:53 +08:00
Yueh-Hsuan Chiang	cd987c383a	Fix compile error when NROCKSDB_THREAD_STATUS is not used. Summary: Fix compile error when NROCKSDB_THREAD_STATUS is not used. Test Plan: make dbg OPT=-DNROCKSDB_THREAD_STATUS -j32 Reviewers: sdong, igor, rven Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35847	2015-03-24 14:52:59 -07:00
Anurag Indu	3d1a924ff3	Adding stats for the merge and filter operation Summary: We have added new stats and perf_context for measuring the merge and filter operation time consumption. We have bounded all the merge operations within the GUARD statement and collected the total time for these operations in the DB. Test Plan: WIP Reviewers: rven, yhchiang, kradhakrishnan, igor, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D34377	2015-03-24 14:42:04 -07:00
Yueh-Hsuan Chiang	248c063ba1	Report elapsed time in micros in ThreadStatus instead of start time. Summary: Report elapsed time of a thread operation in micros in ThreadStatus instead of start time of a thread operation in seconds since the Epoch, 1970-01-01 00:00:00 (UTC). Test Plan: ./db_bench --benchmarks=fillrandom --num=100000 --threads=40 \ --max_background_compactions=10 --max_background_flushes=3 \ --thread_status_per_interval=1000 --key_size=16 --value_size=1000 \ --num_column_families=10 Sample Output: ThreadID ThreadType cfName Operation ElapsedTime Stage State 140667724562496 High Pri column_family_name_000002 Flush 772.419 ms FlushJob::WriteLevel0Table 140667728756800 High Pri default Flush 617.845 ms FlushJob::WriteLevel0Table 140667732951104 High Pri column_family_name_000005 Flush 772.078 ms FlushJob::WriteLevel0Table 140667875557440 Low Pri column_family_name_000008 Compaction 1409.216 ms CompactionJob::Install 140667737145408 Low Pri 140667749728320 Low Pri 140667816837184 Low Pri column_family_name_000007 Compaction 1071.815 ms CompactionJob::ProcessKeyValueCompaction 140667787477056 Low Pri column_family_name_000009 Compaction 772.516 ms CompactionJob::ProcessKeyValueCompaction 140667741339712 Low Pri 140667758116928 Low Pri column_family_name_000004 Compaction 620.739 ms CompactionJob::ProcessKeyValueCompaction 140667753922624 Low Pri 140667842003008 Low Pri column_family_name_000006 Compaction 1260.079 ms CompactionJob::ProcessKeyValueCompaction 140667745534016 Low Pri Reviewers: sdong, igor, rven Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35769	2015-03-24 11:32:25 -07:00
krad	689391406a	Make SSTDumpTest.GetProperties less noisy Summary: Limiting verbose printing to "command=scan" Test Plan: Run make check and manual testing of sst_dump_test Reviewers: sdong CC: leveldb Task ID: #6575982 Blame Rev:	2015-03-23 14:30:11 -07:00
Igor Sugak	28bc6de989	rocksdb: print status error message when (ASSERT\|EXPECT)_OK fails Summary: Modified rocksdb status assertions ASSERT_OK and EXPECT_OK to print error message from Status::ToString() when failed. Test Plan: Modify a test to fail status assertions ASSERT_OK and EXPECT_OK and notice an error message that came from Status::ToString() Reviewers: meyering, sdong, yhchiang, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35469	2015-03-19 17:32:43 -07:00
Igor Sugak	9405b5ef8f	rocksdb: Remove #include "util/string_util.h" from util/testharness.h Summary: 1. Manually deleted #include "util/string_util.h" from util/testharness.h 2. ``` % USE_CLANG=1 make all -j55 -k 2> build.log % perl -naF: -E 'say $F[0] if /: error:/' build.log \| sort -u \| xargs sed -i '/#include "util\/testharness.h"/i #include "util\/string_util.h"' ``` Test Plan: Make sure make all completes with no errors. ``` % make all -j55 ``` Reviewers: meyering, igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35493	2015-03-19 17:29:37 -07:00
Igor Sugak	220d0dff7c	rocksdb: Remove #include "util/random.h" from util/testharness.h Summary: Cleaning util/testharness.h Test Plan: Make completes with no errors. ``` % make all ``` Reviewers: meyering, igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35487	2015-03-19 17:06:02 -07:00
Igor Sugak	17ae3fcbca	rocksdb: initial util/testharness clean up Summary: Deleted some redundant code. More coming. Test Plan: ```lang=bash % USE_CLANG=1 make check ``` Reviewers: meyering, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35463	2015-03-19 16:52:59 -07:00
Igor Canadi	51301b869f	Enable dynamic changing of rate limiter's bytes_per_second Summary: This feature is going to be useful for mongodb+rocksdb. I'll expose it through mongo's API. Test Plan: added new unit test. also will run TSAN on the new unit test Reviewers: meyering, sdong Reviewed By: meyering, sdong Subscribers: meyering, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35307	2015-03-18 15:35:55 -07:00
Venkatesh Radhakrishnan	230e68727a	Fix TSAN failure in env_test Summary: Check for state of task before deleting it. Test Plan: Run env_test with TSAN Reviewers: igor, sdong Reviewed By: sdong Subscribers: meyering, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35283	2015-03-18 11:40:46 -07:00
Islam AbdelRahman	155d468c56	Using chrono as a fallback Summary: Right now, if the system we are compiling on is not Linux and not Mac, we get a compilation error. This diff uses chrono as a fallback when we are compiling on something other than Linux/FreeBSD/Mac. Test Plan: compile on CentOS/FreeBSD ./db_test (still running) Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D35277	2015-03-18 11:26:10 -07:00
Igor Canadi	c88ff4ca76	Deprecate removeScanCountLimit in NewLRUCache Summary: It is no longer used by the implementation, so we should also remove it from the public API. Test Plan: make check Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34971	2015-03-17 15:04:37 -07:00
Igor Sugak	b4b69e4f77	rocksdb: switch to gtest Summary: Our existing test notation is very similar to what is used in gtest. It makes it easy to adopt what is different. In this diff I modify existing [[ https://code.google.com/p/googletest/wiki/Primer#Test_Fixtures:_Using_the_Same_Data_Configuration_for_Multiple_Te \| test fixture ]] classes to inherit from `testing::Test`. Also for unit tests that use fixture class, `TEST` is replaced with `TEST_F` as required in gtest. There are several custom `main` functions in our existing tests. To make this transition easier, I modify all `main` functions to follow gtest notation. But eventually we can remove them and use implementation of `main` that gtest provides. ```lang=bash % cat ~/transform #!/bin/sh files=$(git ls-files 'test\.cc') for file in $files do if grep -q "rocksdb::test::RunAllTests()" $file then if grep -Eq '^class \w+Test {' $file then perl -pi -e 's/^(class \w+Test) {/${1}: public testing::Test {/g' $file perl -pi -e 's/^(TEST)/${1}_F/g' $file fi perl -pi -e 's/(int main.\{)/${1}::testing::InitGoogleTest(&argc, argv);/g' $file perl -pi -e 's/rocksdb::test::RunAllTests/RUN_ALL_TESTS/g' $file fi done % sh ~/transform % make format ``` Second iteration of this diff contains only scripted changes. Third iteration contains manual changes to fix last errors and make it compilable. Test Plan: Build and notice no errors. ```lang=bash % USE_CLANG=1 make check -j55 ``` Tests are still testing. Reviewers: meyering, sdong, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35157	2015-03-17 14:08:00 -07:00
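
Note on commit 794ccfde89 (SetCapacity): the three cases in that summary reduce to "store the new limit, then evict from the cold end while usage exceeds it". The following is a minimal sketch of that idea only. `ToyLruCache` and all of its members are hypothetical names for illustration, not RocksDB's LRUCache. It also ignores pinned entries, which is presumably why the commit says "if feasible", and it stops at usage <= capacity rather than the strict "usage < capacity" wording of the summary.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <utility>

// Hypothetical toy cache, NOT RocksDB's LRUCache. Entries are (key, charge)
// pairs kept in LRU order with the least recently inserted at the front.
class ToyLruCache {
 public:
  explicit ToyLruCache(std::size_t capacity) : capacity_(capacity) {}

  // Adds an entry at the hot end and evicts cold entries if over capacity.
  void Insert(const std::string& key, std::size_t charge) {
    lru_.emplace_back(key, charge);
    usage_ += charge;
    EvictUntilWithinCapacity();
  }

  // Cases 1 and 2 (new capacity >= usage): only the limit changes.
  // Case 3 (new capacity < usage): cold entries are purged until usage fits.
  void SetCapacity(std::size_t capacity) {
    capacity_ = capacity;
    EvictUntilWithinCapacity();
  }

  std::size_t usage() const { return usage_; }
  std::size_t capacity() const { return capacity_; }
  std::size_t size() const { return lru_.size(); }

 private:
  // Drops entries from the cold end while the total charge exceeds capacity.
  void EvictUntilWithinCapacity() {
    while (usage_ > capacity_ && !lru_.empty()) {
      usage_ -= lru_.front().second;
      lru_.pop_front();
    }
  }

  std::size_t capacity_;
  std::size_t usage_ = 0;
  std::list<std::pair<std::string, std::size_t>> lru_;
};
```

A real cache would also have to skip entries that are still referenced by callers, so the purge in the third case may stop before usage drops below the new capacity.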


1055 Commits