Commit Graph

872 Commits

Author SHA1 Message Date
sdong
b23bbaa82a Universal Compactions with Small Files
Summary:
With this change, we use L1 and up to store compaction outputs in universal compaction.
The compaction pick logic stays the same. Outputs are stored in the largest "level" as possible.

If options.num_levels=1, it behaves all the same as now.

Test Plan:
1) convert most of existing unit tests for universal comapaction to include the option of one level and multiple levels.
2) add a unit test to cover parallel compaction in universal compaction and run it in one level and multiple levels
3) add unit test to migrate from multiple level setting back to one level setting
4) add a unit test to insert keys to trigger multiple rounds of compactions and verify results.

Reviewers: rven, kradhakrishnan, yhchiang, igor

Reviewed By: igor

Subscribers: meyering, leveldb, MarkCallaghan, dhruba

Differential Revision: https://reviews.facebook.net/D34539
2015-03-30 15:12:02 -07:00
Igor Canadi
2511b7d947 Makefile minor cleanup
Summary:
Just couple of small changes:
1. removed signal_test, since it doesn't seem useful and we don't even run it as part of `make check`
2. moved perf_context_test to TESTS instead of PROGRAMS
3. `make release` probably shouldn't compile benchmarks. We currently rely on `make release` building db_bench (via Jenkins), so I left db_bench there.

This is just a minor cleanup. We need to rethink our targets since they are a bit messy right now. We can do this during our tech debt week.

Test Plan: make release

Reviewers: anthony, rven, yhchiang, sdong, meyering

Reviewed By: meyering

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36171
2015-03-30 16:05:35 -04:00
Mark Callaghan
99ec2412e5 Make the benchmark scripts configurable and add tests
Summary:
This makes run_flash_bench.sh configurable. Previously it was hardwired for 1B keys and tests
ran for 12 hours each. That kept me from using it. This makes it configuable, adds more tests,
makes the duration per-test configurable and refactors the test scripts.

Adds the seekrandomwhilemerging test to db_bench which is the same as seekrandomwhilewriting except
the writer thread does Merge rather than Put.

Forces the stall-time column in compaction IO stats to use a fixed format (H:M:S) which makes
it easier to scrape and parse. Also adds an option to AppendHumanMicros to force a fixed format.
Sometimes automation and humans want different format.

Calls thread->stats.AddBytes(bytes); in db_bench for more tests to get the MB/sec summary
stats in the output at test end.

Adds the average ingest rate to compaction IO stats. Output now looks like:
https://gist.github.com/mdcallag/2bd64d18be1b93adc494

More information on the benchmark output is at https://gist.github.com/mdcallag/db43a58bd5ac624f01e1

For benchmark.sh changes default RocksDB configuration to reduce stalls:
* min_level_to_compress from 2 to 3
* hard_rate_limit from 2 to 3
* max_grandparent_overlap_factor and max_bytes_for_level_multiplier from 10 to 8
* L0 file count triggers from 4,8,12 to 4,12,20 for (start,stall,stop)

Task ID: #6596829

Blame Rev:

Test Plan:
run tools/run_flash_bench.sh

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D36075
2015-03-30 11:28:25 -07:00
xiaoxichen
bcd8a71a28 Fix interger overflow on i386 arch
The error was:

util/logging.cc: In function 'int rocksdb::AppendHumanMicros(uint64_t, char*, int)':
error: util/logging.cc:41:39: integer overflow in expression [-Werror=overflow]
} else if (micros < 1000000l * 60 * 60) {
^
error: util/logging.cc:41:39: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
2015-03-27 08:38:53 +08:00
Yueh-Hsuan Chiang
cd987c383a Fix compile error when NROCKSDB_THREAD_STATUS is not used.
Summary: Fix compile error when NROCKSDB_THREAD_STATUS is not used.

Test Plan: make dbg OPT=-DNROCKSDB_THREAD_STATUS -j32

Reviewers: sdong, igor, rven

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35847
2015-03-24 14:52:59 -07:00
Anurag Indu
3d1a924ff3 Adding stats for the merge and filter operation
Summary:
We have addded new stats and perf_context for measuring the merge and filter operation time consumption.
We have bounded all the merge operations within the GUARD statment and collected the total time for these operations in the DB.

Test Plan: WIP

Reviewers: rven, yhchiang, kradhakrishnan, igor, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D34377
2015-03-24 14:42:04 -07:00
Yueh-Hsuan Chiang
248c063ba1 Report elapsed time in micros in ThreadStatus instead of start time.
Summary:
Report elapsed time of a thread operation in micros in ThreadStatus
instead of start time of a thread operation in seconds since the
Epoch, 1970-01-01 00:00:00 (UTC).

Test Plan:
./db_bench --benchmarks=fillrandom --num=100000 --threads=40 \
--max_background_compactions=10 --max_background_flushes=3 \
--thread_status_per_interval=1000 --key_size=16 --value_size=1000 \
--num_column_families=10

Sample Output:
            ThreadID ThreadType                    cfName    Operation  ElapsedTime                                         Stage        State
     140667724562496   High Pri column_family_name_000002        Flush   772.419 ms                    FlushJob::WriteLevel0Table
     140667728756800   High Pri                   default        Flush   617.845 ms                    FlushJob::WriteLevel0Table
     140667732951104   High Pri column_family_name_000005        Flush   772.078 ms                    FlushJob::WriteLevel0Table
     140667875557440    Low Pri column_family_name_000008   Compaction  1409.216 ms                        CompactionJob::Install
     140667737145408    Low Pri
     140667749728320    Low Pri
     140667816837184    Low Pri column_family_name_000007   Compaction  1071.815 ms      CompactionJob::ProcessKeyValueCompaction
     140667787477056    Low Pri column_family_name_000009   Compaction   772.516 ms      CompactionJob::ProcessKeyValueCompaction
     140667741339712    Low Pri
     140667758116928    Low Pri column_family_name_000004   Compaction   620.739 ms      CompactionJob::ProcessKeyValueCompaction
     140667753922624    Low Pri
     140667842003008    Low Pri column_family_name_000006   Compaction  1260.079 ms      CompactionJob::ProcessKeyValueCompaction
     140667745534016    Low Pri

Reviewers: sdong, igor, rven

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35769
2015-03-24 11:32:25 -07:00
krad
689391406a Make SSTDumpTest.GetProperties less noisy
Summary:
Limiting verbose printing to "command=scan"

Test Plan:
Run make check and manual testing of sst_dump_test

Reviewers: sdong

CC: leveldb

Task ID: #6575982

Blame Rev:
2015-03-23 14:30:11 -07:00
Igor Sugak
28bc6de989 rocksdb: print status error message when (ASSERT|EXPECT)_OK fails
Summary: Modified rocksdb status assertions ASSERT_OK and EXPECT_OK to print error message from Status::ToString() when failed.

Test Plan: Modify a test to fail status assertions ASSERT_OK and EXPECT_OK and notice an error message that came from Status::ToString()

Reviewers: meyering, sdong, yhchiang, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35469
2015-03-19 17:32:43 -07:00
Igor Sugak
9405b5ef8f rocksdb: Remove #include "util/string_util.h" from util/testharness.h
Summary:
1. Manually deleted #include "util/string_util.h" from util/testharness.h
2.
```
% USE_CLANG=1 make all -j55 -k 2> build.log
% perl -naF: -E 'say $F[0] if /: error:/' build.log | sort -u | xargs sed -i '/#include "util\/testharness.h"/i #include "util\/string_util.h"'
```

Test Plan:
Make sure make all completes with no errors.
```
% make all -j55
```

Reviewers: meyering, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35493
2015-03-19 17:29:37 -07:00
Igor Sugak
220d0dff7c rocksdb: Remove #include "util/random.h" from util/testharness.h
Summary: Cleaning util/testharness.h

Test Plan:
Make completes with no errors.
```
% make all
```

Reviewers: meyering, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35487
2015-03-19 17:06:02 -07:00
Igor Sugak
17ae3fcbca rocksdb: initial util/testharness clean up
Summary: Deleted some redundant code. More comming.

Test Plan:
```lang=bash
% USE_CLANG=1 make check
```

Reviewers: meyering, sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35463
2015-03-19 16:52:59 -07:00
Igor Canadi
51301b869f Enable dynamic changing of rate limiter's bytes_per_second
Summary: This feature is going to be useful for mongodb+rocksdb. I'll expose it through mongo's API.

Test Plan: added new unit test. also will run TSAN on the new unit test

Reviewers: meyering, sdong

Reviewed By: meyering, sdong

Subscribers: meyering, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35307
2015-03-18 15:35:55 -07:00
Venkatesh Radhakrishnan
230e68727a Fix TSAN failue in env_test
Summary: Check for state of task before deleting it.

Test Plan: Run env_test with TSAN

Reviewers: igor, sdong

Reviewed By: sdong

Subscribers: meyering, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35283
2015-03-18 11:40:46 -07:00
Islam AbdelRahman
155d468c56 Using chrono as a fallback
Summary:
Right now if they system we are compiling on is not Linux and not Mac we will get a compilation error
this diff use chrono as a fallback when we are compiling on something other than Linux/FreeBSD/Mac

Test Plan:
compile on CentOS/FreeBSD
./db_test (still running)

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D35277
2015-03-18 11:26:10 -07:00
Igor Canadi
c88ff4ca76 Deprecate removeScanCountLimit in NewLRUCache
Summary: It is no longer used by the implementation, so we should also remove it from the public API.

Test Plan: make check

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34971
2015-03-17 15:04:37 -07:00
Igor Sugak
b4b69e4f77 rocksdb: switch to gtest
Summary:
Our existing test notation is very similar to what is used in gtest. It makes it easy to adopt what is different.
In this diff I modify existing [[ https://code.google.com/p/googletest/wiki/Primer#Test_Fixtures:_Using_the_Same_Data_Configuration_for_Multiple_Te | test fixture ]] classes to inherit from `testing::Test`. Also for unit tests that use fixture class, `TEST` is replaced with `TEST_F` as required in gtest.

There are several custom `main` functions in our existing tests. To make this transition easier, I modify all `main` functions to fallow gtest notation. But eventually we can remove them and use implementation of `main` that gtest provides.

```lang=bash
% cat ~/transform
#!/bin/sh
files=$(git ls-files '*test\.cc')
for file in $files
do
  if grep -q "rocksdb::test::RunAllTests()" $file
  then
    if grep -Eq '^class \w+Test {' $file
    then
      perl -pi -e 's/^(class \w+Test) {/${1}: public testing::Test {/g' $file
      perl -pi -e 's/^(TEST)/${1}_F/g' $file
    fi
    perl -pi -e 's/(int main.*\{)/${1}::testing::InitGoogleTest(&argc, argv);/g' $file
    perl -pi -e 's/rocksdb::test::RunAllTests/RUN_ALL_TESTS/g' $file
  fi
done
% sh ~/transform
% make format
```

Second iteration of this diff contains only scripted changes.

Third iteration contains manual changes to fix last errors and make it compilable.

Test Plan:
Build and notice no errors.
```lang=bash
% USE_CLANG=1 make check -j55
```
Tests are still testing.

Reviewers: meyering, sdong, rven, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35157
2015-03-17 14:08:00 -07:00
Igor Canadi
413e35273e Merge pull request #540 from dalgaaf/wip-da-fix-elif
Fix '#elif with no expression'
2015-03-17 10:33:21 -07:00
Danny Al-Gaaf
969aa806b7 util/xfunc.h: fix #elif check for NDEBUG
Fix '#elif with no expression', add defined() to check.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2015-03-17 12:04:05 +01:00
Danny Al-Gaaf
87c7d49d67 util/env_posix.cc: fix #elif check for __MACH__
Fix '#elif with no expression' add defined() to check.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
2015-03-17 12:03:11 +01:00
Igor Sugak
9fd6edf81c rocksdb: Replace ASSERT* with EXPECT* in functions that does not return void value
Summary:
gtest does not use exceptions to fail a unit test by design, and `ASSERT*`s are implemented using `return`. As a consequence we cannot use `ASSERT*` in a function that does not return `void` value ([[ https://code.google.com/p/googletest/wiki/AdvancedGuide#Assertion_Placement | 1]]), and have to fix our existing code. This diff does this in a generic way, with no manual changes.

In order to detect all existing `ASSERT*` that are used in functions that doesn't return void value, I change the code to generate compile errors for such cases.

In `util/testharness.h` I defined `EXPECT*` assertions, the same way as `ASSERT*`, and redefined `ASSERT*` to return `void`. Then executed:

```lang=bash
% USE_CLANG=1 make all -j55 -k 2> build.log
% perl -naF: -e 'print "-- -number=".$F[1]." ".$F[0]."\n" if  /: error:/' \
build.log | xargs -L 1 perl -spi -e 's/ASSERT/EXPECT/g if $. == $number'
% make format
```
After that I reverted back change to `ASSERT*` in `util/testharness.h`. But preserved introduced `EXPECT*`, which is the same as `ASSERT*`. This will be deleted once switched to gtest.

This diff is independent and contains manual changes only in `util/testharness.h`.

Test Plan:
Make sure all tests are passing.
```lang=bash
% USE_CLANG=1 make check
```

Reviewers: igor, lgalanis, sdong, yufei.zhu, rven, meyering

Reviewed By: meyering

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D33333
2015-03-16 20:52:32 -07:00
Venkatesh Radhakrishnan
b2b3086524 Speed up rocksDB close call.
Summary:
On RocksDB, when there are multiple instances doing
flushes/compactions in the background, the close call takes a long time
because the flushes/compactions need to complete before the database can
shut down. If another instance is using the background threads and the compaction for this instance is in the queue since it has been scheduled, we still cannot shutdown. We now remove the scheduled background tasks which have not yet started running, so that shutdown is speeded up.

Test Plan: DB Test added.

Reviewers: yhchiang, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D33741
2015-03-16 18:49:14 -07:00
Igor Sugak
95344346af rocksdb: Small refactoring before migrating to gtest
Summary: These changes are necessary to make tests look more generic, and avoid feature conflicts with gtest.

Test Plan:
Make sure no build errors, and all test are passing.
```
% make check
```

Reviewers: igor, meyering

Reviewed By: meyering

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35145
2015-03-16 18:08:59 -07:00
Igor Canadi
c6967a1a5e Make RecordIn/RecordOut human readable
Summary: I had hard time understanding these big numbers. Here's how the output looks like now: https://gist.github.com/igorcanadi/4c39c17685049584a992

Test Plan: db_bench

Reviewers: sdong, MarkCallaghan

Reviewed By: MarkCallaghan

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35073
2015-03-14 15:12:41 -07:00
Mark Callaghan
c8da670325 Stop printing per-level stall times.
Summary:
Per-level stall times are the suggested stall time, not the actual stall time so this change stops printing them
both in the per-level output lines and in the summary. Also changed output for total stall time to include units
in all cases. The new output looks like:
Level   Files   Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) Stall(cnt)    RecordIn   RecordDrop
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0     4/1          7   0.8      0.0     0.0      0.0       0.6      0.6       0.0   0.0      0.0     12.9        50       352    0.141        882            0            0
  L1     5/0          9   0.9      0.0     0.0      0.0       0.0      0.0       0.6   0.0      0.0      0.0         0         0    0.000          0            0            0
  L2    54/0         99   1.0      0.0     0.0      0.0       0.0      0.0       0.6   0.0      0.0      0.0         0         0    0.000          0            0            0
  L3   289/0        527   0.5      0.0     0.0      0.0       0.0      0.0       0.5   0.0      0.0      0.0         0         0    0.000          0            0            0
 Sum   352/1        642   0.0      0.0     0.0      0.0       0.6      0.6       1.7   1.0      0.0     12.9        50       352    0.141        882            0            0
 Int     0/0          0   0.0      0.0     0.0      0.0       0.0      0.0       0.0   1.0      0.0     15.5         0         3    0.118          7            0            0
Flush(GB): accumulative 0.627, interval 0.005
Stalls(count): 0 level0_slowdown, 0 level0_numfiles, 882 memtable_compaction, 0 leveln_slowdown_soft, 0 leveln_slowdown_hard

Task ID: #6493861

Blame Rev:

Test Plan:
run db_bench, look at output

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D35085
2015-03-14 15:01:43 -07:00
Yueh-Hsuan Chiang
56c4a9c760 Fix compile warning in thread_status_util.h on Mac
Summary:
Fix compile warning in thread_status_util.h on Mac

Test Plan:
make dbg -j32
2015-03-13 18:09:01 -07:00
Yueh-Hsuan Chiang
c594b0e89d Allow GetThreadList() to report operation stage.
Summary: Allow GetThreadList() to report operation stage.

Test Plan:
  ./thread_list_test
  ./db_bench --benchmarks=fillrandom --num=100000 --threads=40 \
    --max_background_compactions=10 --max_background_flushes=3 \
    --thread_status_per_interval=1000 --key_size=16 --value_size=1000 \
    --num_column_families=10

  export ROCKSDB_TESTS=ThreadStatus
  ./db_test

Sample output
          ThreadID ThreadType                    cfName    Operation        OP_StartTime    ElapsedTime                                         Stage        State
   140116265861184    Low Pri
   140116270055488    Low Pri
   140116274249792   High Pri column_family_name_000005        Flush 2015/03/10-14:58:11           0 us                    FlushJob::WriteLevel0Table
   140116400078912    Low Pri column_family_name_000004   Compaction 2015/03/10-14:58:11           0 us     CompactionJob::FinishCompactionOutputFile
   140116358135872    Low Pri column_family_name_000006   Compaction 2015/03/10-14:58:10           1 us     CompactionJob::FinishCompactionOutputFile
   140116341358656    Low Pri
   140116295221312   High Pri                   default        Flush 2015/03/10-14:58:11           0 us                    FlushJob::WriteLevel0Table
   140116324581440    Low Pri column_family_name_000009   Compaction 2015/03/10-14:58:11           0 us      CompactionJob::ProcessKeyValueCompaction
   140116278444096    Low Pri
   140116299415616    Low Pri column_family_name_000008   Compaction 2015/03/10-14:58:11           0 us     CompactionJob::FinishCompactionOutputFile
   140116291027008   High Pri column_family_name_000001        Flush 2015/03/10-14:58:11           0 us                    FlushJob::WriteLevel0Table
   140116286832704    Low Pri column_family_name_000002   Compaction 2015/03/10-14:58:11           0 us     CompactionJob::FinishCompactionOutputFile
   140116282638400    Low Pri

Reviewers: rven, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34683
2015-03-13 10:45:40 -07:00
Igor Canadi
2623b2cf6d Include chrono 2015-03-13 10:29:32 -07:00
Igor Canadi
52d8347a91 EventLogger
Summary:
Here's my proposal for making our LOGs easier to read by machines.

The idea is to dump all events as JSON objects. JSON is easy to read by humans, but more importantly, it's easy to read by machines. That way, we can parse this, load into SQLite/mongo and then query or visualize.

I started with table_create and table_delete events, but if everybody agrees, I'll continue by adding more events (flush/compaction/etc etc)

Test Plan:
Ran db_bench. Observed:
2015/01/15-14:13:25.788019 1105ef000 EVENT_LOG_v1 {"time_micros": 1421360005788015, "event": "table_file_creation", "file_number": 12, "file_size": 1909699}
2015/01/15-14:13:25.956500 110740000 EVENT_LOG_v1 {"time_micros": 1421360005956498, "event": "table_file_deletion", "file_number": 12}

Reviewers: yhchiang, rven, dhruba, MarkCallaghan, lgalanis, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D31647
2015-03-13 10:15:54 -07:00
Sameet Agarwal
3ebebfccd8 Prevent xxhash symbols from polluting global namespace
Summary:
The functions and global symbols in xxhash.h and xxhash.cc were not in any namespace.
This caused issues when rocksdb library was being used along with other uses of libraries
with the same name

Test Plan:
unit tests

Reviewers:

CC:

Task ID: #

Blame Rev:
2015-03-12 12:07:10 -07:00
stash93
53996149d4 Removing unnecessary kInlineSize
Summary: Remove unnecessary rocksdb::kInlineSize, since it's not used and there is rocksdb::Arena::kInlineSize.

Test Plan: make all check

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D34905
2015-03-12 21:13:53 +03:00
Yueh-Hsuan Chiang
6f55798683 Fixed a compile error in db_bench in mac.
Summary:
Fixed a compile error in db_bench in mac.

Test Plan:
make db_bench
2015-03-11 13:02:46 -07:00
Yueh-Hsuan Chiang
89597bb66b Allow GetThreadList() to report the start time of the current operation.
Summary: Allow GetThreadList() to report the start time of the current operation.

Test Plan:
./db_bench --benchmarks=fillrandom --num=100000 --threads=40 \
  --max_background_compactions=10 --max_background_flushes=3 \
  --thread_status_per_interval=1000 --key_size=16 --value_size=1000 \
  --num_column_families=10

Sample output:
          ThreadID ThreadType                    cfName    Operation        OP_StartTime         State
   140338840797248   High Pri column_family_name_000003        Flush 2015/03/09-17:49:59
   140338844991552   High Pri column_family_name_000004        Flush 2015/03/09-17:49:59
   140338849185856    Low Pri
   140338983403584    Low Pri
   140339008569408    Low Pri
   140338861768768    Low Pri
   140338924683328    Low Pri
   140338899517504    Low Pri
   140338853380160    Low Pri
   140338882740288    Low Pri
   140338865963072   High Pri column_family_name_000006        Flush 2015/03/09-17:49:59
   140338954043456    Low Pri
   140338857574464    Low Pri

Reviewers: igor, rven, sdong

Reviewed By: sdong

Subscribers: lgalanis, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34689
2015-03-10 14:51:28 -07:00
Igor Canadi
485ac0dbd0 Add rate_limiter to string options
Summary: I want to be able to set this through mongo config.

Test Plan: added unit test

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34599
2015-03-06 14:21:15 -08:00
Yueh-Hsuan Chiang
dc4532c497 Add --thread_status_per_interval to db_bench
Summary:
Add --thread_status_per_interval to db_bench, which allows
db_bench to optionally enable print the current thread status
periodically.

Test Plan:
./db_bench --benchmarks=fillrandom --num=100000 --threads=40 --max_background_compactions=10 --max_background_flushes=3 --thread_status_per_interval=1000 --key_size=16 --value_size=1000 --num_column_families=10

Sample output:
          ThreadID ThreadType                         dbName                     cfName       Operation           State
   140281571770432    Low Pri
   140281575964736   High Pri  /tmp/rocksdbtest-5297/dbbench  column_family_name_000001           Flush
   140281710182464    Low Pri  /tmp/rocksdbtest-5297/dbbench  column_family_name_000008      Compaction
   140281638879296    Low Pri  /tmp/rocksdbtest-5297/dbbench  column_family_name_000007      Compaction
   140281592741952    Low Pri
   140281580159040   High Pri  /tmp/rocksdbtest-5297/dbbench  column_family_name_000002           Flush
   140281676628032    Low Pri  /tmp/rocksdbtest-5297/dbbench  column_family_name_000006      Compaction
   140281584353344    Low Pri
   140281622102080    Low Pri  /tmp/rocksdbtest-5297/dbbench  column_family_name_000009      Compaction
   140281605324864    Low Pri  /tmp/rocksdbtest-5297/dbbench  column_family_name_000004      Compaction
   140281601130560   High Pri  /tmp/rocksdbtest-5297/dbbench                    default           Flush
   140281596936256    Low Pri
   140281588547648    Low Pri

Reviewers: igor, rven, sdong

Reviewed By: rven

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D34515
2015-03-06 11:22:06 -08:00
Yueh-Hsuan Chiang
694988b627 Fix a bug in stall time counter. Improve its output format.
Summary: Fix a bug in stall time counter.  Improve its output format.

Test Plan:
export ROCKSDB_TESTS=Timeout
./db_test

./db_bench --benchmarks=fillrandom --stats_interval=10000 --statistics=true --stats_per_interval=1 --num=1000000 --threads=4 --level0_stop_writes_trigger=3 --level0_slowdown_writes_trigger=2

sample output:
    Uptime(secs): 35.8 total, 0.0 interval
    Cumulative writes: 359590 writes, 359589 keys, 183047 batches, 2.0 writes per batch, 0.04 GB user ingest, stall seconds: 1786.008 ms
    Cumulative WAL: 359591 writes, 183046 syncs, 1.96 writes per sync, 0.04 GB written
    Interval writes: 253 writes, 253 keys, 128 batches, 2.0 writes per batch, 0.0 MB user ingest, stall time: 0 us
    Interval WAL: 253 writes, 128 syncs, 1.96 writes per sync, 0.00 MB written

Reviewers: MarkCallaghan, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34275
2015-03-03 12:48:12 -08:00
Igor Canadi
b8d23cdcb8 Revert chrono use
Summary:
For some reason, libstdc++ implements steady_clock::now() using syscall instead of VDSO optimized clock_gettime() when using glibc 2.16 and earlier. This leads to significant performance degradation for users with older glibcs. See bug reported here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59177

We observed this behavior when testing mongo on AWS hosts. Facebook hosts are unaffected since we use glibc2.17 and 2.20.

Revert "Fix timing"
This reverts commit 965d9d50b8.

Revert "Use chrono for timing"
This reverts commit 001ce64dc7.

Test Plan: make check

Reviewers: MarkCallaghan, yhchiang, rven, meyering, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34371
2015-03-03 11:29:31 -08:00
Igor Canadi
db03739340 options.level_compaction_dynamic_level_bytes to allow RocksDB to pick size bases of levels dynamically.
Summary:
When having fixed max_bytes_for_level_base, the ratio of size of largest level and the second one can range from 0 to the multiplier. This makes LSM tree frequently irregular and unpredictable. It can also cause poor space amplification in some cases.

In this improvement (proposed by Igor Kabiljo), we introduce a parameter option.level_compaction_use_dynamic_max_bytes. When turning it on, RocksDB is free to pick a level base in the range of (options.max_bytes_for_level_base/options.max_bytes_for_level_multiplier, options.max_bytes_for_level_base] so that real level ratios are close to options.max_bytes_for_level_multiplier.

Test Plan: New unit tests and pass tests suites including valgrind.

Reviewers: MarkCallaghan, rven, yhchiang, igor, ikabiljo

Reviewed By: ikabiljo

Subscribers: yoshinorim, ikabiljo, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D31437
2015-03-02 22:40:41 -08:00
Igor Canadi
3cf7f353d9 Instrument memtable seeks
Summary: As title

Test Plan: compiles

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34191
2015-02-27 17:06:06 -08:00
Sameet Agarwal
e7c434c364 Add columnfamily option optimize_filters_for_hits to optimize for key hits only
Summary:
    Summary:
    Added a new option to ColumnFamllyOptions  - optimize_filters_for_hits. This option can be used in the case where most
    accesses to the store are key hits and we dont need to optimize performance for key misses.
    This is useful when you have a very large database and most of your lookups succeed.  The option allows the store to
     not store and use filters in the last level (the largest level which contains data). These filters can take a large amount of
     space for large databases (in memory and on-disk). For the last level, these filters are only useful for key misses and not
     for key hits. If we are not optimizing for key misses, we can choose to not store these filters for that level.

    This option is only provided for BlockBasedTable. We skip the filters when we are compacting

Test Plan:
1. Modified db_test toalso run tests with an additonal option (skip_filters_on_last_level)
 2. Added another unit test to db_test which specifically tests that filters are being skipped

Reviewers: rven, igor, sdong

Reviewed By: sdong

Subscribers: lgalanis, yoshinorim, MarkCallaghan, rven, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D33717
2015-02-26 16:25:56 -08:00
Islam AbdelRahman
ba9d1737a8 RocksDB on FreeBSD support
Summary:
This patch will update the Makefile and source code so that we can build RocksDB successfully on FreeBSD 10 and 11 (64-bit and 32-bit)
I have also encountered some problems when running tests on FreeBSD, I will try to fix them individually in different diffs

Notes:

  - FreeBSD uses clang as it's default compiler (http://lists.freebsd.org/pipermail/freebsd-current/2012-September/036480.html)
  - GNU C++ compiler have C++ 11 problems on FreeBSD (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193528)
  - make is not gmake on FreeBSD (http://www.khmere.com/freebsd_book/html/ch01.html)

Test Plan:
Using VMWare Fusion Create 4 VM machines (FreeBSD 11 64-bit, FreeBSD 11 32-bit, FreeBSD 10 64-bit, FreeBSD 10 32-bit)

  - pkg install git gmake gflags archivers/snappy
  - git clone https://github.com/facebook/rocksdb.git
  - apply this patch
  - setenv CXX c++
  - setenv CPATH /usr/local/include/
  - setenv LIBRARY_PATH  /usr/local/lib/
  - gmake db_bench
  - make sure compilation is successful and db_bench is running
  - gmake all
  - make sure compilation is successful

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D33891
2015-02-26 15:19:17 -08:00
Venkatesh Radhakrishnan
8984e5f848 Fix race in sync point.
Summary:
The LoadDependency function does not take a lock when it runs
and it could be modifying data structures while other threads are
accessing it.

Test Plan: Run TSAN.

Reviewers: igor, sdong

Reviewed By: igor, sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D34095
2015-02-26 15:11:50 -08:00
Igor Sugak
03b432d4b8 rocksdb: Fix uninitialized use error
Summary:
When using latest clang (3.6 or 3.7/trunck) rocksdb is failing with many errors. Some errors are uninitialized use errors.

```
...
  CC       db/log_test.o
util/ldb_cmd.cc:394:16: error: base class 'rocksdb::LDBCommand' is uninitialized when used here to access 'rocksdb::LDBCommand::BuildCmdLineOptions' [-Werror,-Wuninitialized]
               BuildCmdLineOptions({ARG_FROM, ARG_TO, ARG_HEX, ARG_KEY_HEX,
               ^
...
```

```lang=c++
CompactorCommand::CompactorCommand(const vector<string>& params,
      const map<string, string>& options, const vector<string>& flags) :
    LDBCommand(options, flags, false,
               BuildCmdLineOptions({ARG_FROM, ARG_TO, ARG_HEX, ARG_KEY_HEX,
                                    ARG_VALUE_HEX, ARG_TTL})),
    null_from_(true), null_to_(true) {
. . .
}
```
For the fourth parameter of the base constructor (`LDBCommand`) we call `BuildCmdLineOptions`, which is a private non-static method of `LDBCommand` base class.

This diff adds missing `static` keyword for `LDBCommand::BuildCmdLineOptions` method.

Test Plan:
Build with trunk clang and make sure all tests are passing.
```lang=bash
% # Have trunk clang present in path.
% ROCKSDB_NO_FBCODE=1 CC=clang CXX=clang++ make check
``

Reviewers: meyering, sdong, rven, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34083
2015-02-26 14:19:51 -08:00
Igor Sugak
62247ffa3b rocksdb: Add missing override
Summary:
When using latest clang (3.6 or 3.7/trunck) rocksdb is failing with many errors. Almost all of them are missing override errors. This diff adds missing override keyword. No manual changes.

Prerequisites: bear and clang 3.5 build with extra tools

```lang=bash
% USE_CLANG=1 bear make all # generate a compilation database http://clang.llvm.org/docs/JSONCompilationDatabase.html
% clang-modernize -p . -include . -add-override
% make format
```

Test Plan:
Make sure all tests are passing.
```lang=bash
% #Use default fb code clang.
% make check
```
Verify less error and no missing override errors.
```lang=bash
% # Have trunk clang present in path.
% ROCKSDB_NO_FBCODE=1 CC=clang CXX=clang++ make
```

Reviewers: igor, kradhakrishnan, rven, meyering, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34077
2015-02-26 11:28:41 -08:00
krad
d9f4875e52 Disable pre-fetching of index and filter blocks for sst_dump_tool.
Summary:
BlockBasedTable pre-fetches the filter and index blocks on Open call.
This is an optimistic optimization targeted for runtime scenario. The
optimization is unnecessary for sst_dump_tool

- Added a provision to disable pre-fetching of index and filter blocks
  in BlockBasedTable
- Disabled pre-fetching for the sst_dump tool

Stack for reference :

#01  0x00000000005ed944 in snappy::InternalUncompress<snappy::SnappyArrayWriter> () from /home/engshare/third-party2/snappy/1.0.3/src/snappy-1.0.3/snappy.cc:148
#02  0x00000000005edeee in snappy::RawUncompress () from /home/engshare/third-party2/snappy/1.0.3/src/snappy-1.0.3/snappy.cc:947
#03  0x00000000004e0b4d in rocksdb::UncompressBlockContents () from /data/users/paultuckfield/rocksdb/./util/compression.h:69
#04  0x00000000004e145c in rocksdb::ReadBlockContents () from /data/users/paultuckfield/rocksdb/table/format.cc:334
#05  0x00000000004ca424 in rocksdb::(anonymous namespace)::ReadBlockFromFile () from /data/users/paultuckfield/rocksdb/table/block_based_table_reader.cc:70
#06  0x00000000004cccad in rocksdb::BlockBasedTable::CreateIndexReader () from /data/users/paultuckfield/rocksdb/table/block_based_table_reader.cc:173
#07  0x00000000004d17e5 in rocksdb::BlockBasedTable::Open () from /data/users/paultuckfield/rocksdb/table/block_based_table_reader.cc:553
#08  0x00000000004c8184 in rocksdb::BlockBasedTableFactory::NewTableReader () from /data/users/paultuckfield/rocksdb/table/block_based_table_factory.cc:51
#09  0x0000000000598463 in rocksdb::SstFileReader::NewTableReader () from /data/users/paultuckfield/rocksdb/util/sst_dump_tool.cc:69
#10  0x00000000005986c2 in rocksdb::SstFileReader::SstFileReader () from /data/users/paultuckfield/rocksdb/util/sst_dump_tool.cc:26
#11  0x0000000000599047 in rocksdb::SSTDumpTool::Run () from /data/users/paultuckfield/rocksdb/util/sst_dump_tool.cc:332
#12  0x0000000000409b06 in main () from /data/users/paultuckfield/rocksdb/tools/sst_dump.cc:12

Test Plan:
- Added a unit test to trigger the code.
- Also did some manual verification.
- Passed all unit tests

task #6296048

Reviewers: igor, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34041
2015-02-25 16:34:26 -08:00
krad
a047409ae9 Fixed a bug in the test case
Summary:
The unit test was supposed to check that the old file and the new file contains
the header message.

Test Plan: Run the unit test.

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D33705
2015-02-23 11:59:39 -08:00
Jim Meyering
1b4082581c mark as unused some variables with cpp-derived names
Summary: as above

Test Plan:
  Run "make EXTRA_CXXFLAGS='-W -Wextra'" and see fewer errors.

Reviewers: ljin, sdong, igor.sugak, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D33753
2015-02-20 11:07:21 -08:00
Islam AbdelRahman
06a766de56 Adding Flush to AutoRollLogger
Summary:
During running AutoRollLoggerTest on FreeBSD we have found out that AutoRollLogger is not flushing correctly (test fails on FreeBSD)
This diff add Flush to AutoRollLogger to fix this problem

Test Plan:
[My machine] make all check ( all tests pass)
[FreeBSD VM] running AutoRollLoggerTest ( all tests pass )

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D33633
2015-02-20 10:58:29 -08:00
Igor Canadi
92416fa7f2 Fix mac build 2015-02-19 19:26:38 -08:00
Igor Canadi
96ab15d306 GetOptionsFromString + fixes to block_based_table_options
Summary:
In mongo, we currently have a single column family and I'd like to support setting rocksdb::Options from string. This diff provides an option to GetOptionsFromString()

There's one more problem. Currently GetColumnFamilyOptionsFromString() overwrites block_based options. In mongo I set default values for block_cache and some other values of BlockBasedTableOptions and I don't want them reset to default with GetOptionsFromString().

Test Plan: added unit test

Reviewers: yhchiang, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D33729
2015-02-19 19:11:14 -08:00
Jim Meyering
a42324e370 build: do not relink every single binary just for a timestamp
Summary:
Prior to this change, "make check" would always waste a lot of
time relinking 60+ binaries. With this change, it does that
only when the generated file, util/build_version.cc, changes,
and that happens only when the date changes or when the
current git SHA changes.

This change makes some other improvements: before, there was no
rule to build a deleted util/build_version.cc. If it was somehow
removed, any attempt to link a program would fail.
There is no longer any need for the separate file,
build_tools/build_detect_version.  Its functionality is
now in the Makefile.

* Makefile (DEPFILES): Don't filter-out util/build_version.cc.
No need, and besides, removing that dependency was wrong.
(date, git_sha, gen_build_version): New helper variables.
(util/build_version.cc): New rule, to create this file
and update it only if it would contain new information.
* build_tools/build_detect_platform: Remove file.
* db/db_impl.cc: Now, print only date (not the time).
* util/build_version.h (rocksdb_build_compile_time): Remove
declaration.  No longer used.

Test Plan:
- Run "make check" twice, and note that the second time no linking is performed.
- Remove util/build_version.cc and ensure that any "make"
command regenerates it before doing anything else.
- Run this: strings librocksdb.a|grep _build_.
That prints output including the following:

  rocksdb_build_git_date:2015-02-19
  rocksdb_build_git_sha:2.8.fb-1792-g3cb6cc0

Reviewers: ljin, sdong, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D33591
2015-02-19 13:11:10 -08:00
Venkatesh Radhakrishnan
7d817268b9 Managed iterator
Summary:
This is a diff for managed iterator. A managed iterator
is a wrapper around an iterator which saves the options for that
iterator as well as the current key/value so that the underlying iterator
and its associated memory can be released when it is aged out
automatically or on the request of the user. Will provide the automatic release as a follow-up diff.

Test Plan: Managed* tests in db_test and XF tests for managed iterator

Reviewers: igor, yhchiang, anthony, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D31401
2015-02-18 11:49:31 -08:00
Igor Sugak
3ad6b794cf rocksdb: Fix 'Division by zero' scan-build warning
Summary:
scan-build complains with division by zero warning in a test. Added an assertion to prevent this.

scan-build report: http://home.fburl.com/~sugak/latest6/report-c61be9.html#EndPath

Test Plan:
Make sure scan-build does not report 'Division by zero' and all tests are passing.
```lang=bash
% make analyze
% make check
```

Reviewers: igor, meyering

Reviewed By: meyering

Subscribers: sdong, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D33495
2015-02-17 17:54:02 -08:00
Yueh-Hsuan Chiang
f0c36da6ee Add thread_status_util_debug.cc back
Summary:
Add thread_status_util_debug.cc back as InstrumentedMutex related tests
are using it to produce wait that can be reflected in the counter.

Test Plan:
./perf_context_test
export ROCKSDB_TESTS=MutexWaitStats
./db_test

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D33525
2015-02-17 11:29:36 -08:00
Yueh-Hsuan Chiang
e60bc99fe0 Allow GetThreadList to reflect flush activity.
Summary: Allow GetThreadList to reflect flush activity.

Test Plan:
Developed ThreadStatusFlush test and updated ThreadStatusMultiCompaction test.

./db_test  ./thread_list_test

Reviewers: sdong, rven, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D32871
2015-02-17 10:13:52 -08:00
Igor Sugak
4e4b857841 rocksdb: Fix scan-build 'Called C++ object pointer is null' and 'Dereference of null pointer' bugs
Summary:
In the existing implementation of `ASSERT*`, test termination happens in `~Tester`, which is called when instance of `Tester` goes out of scope. This is the cause of many scan-build bugs.

This diff changes `ASSERT*` to terminate the test immediately. Also added one suppression in `util/signal_test.cc`

scan-build bugs
before: http://home.fburl.com/~sugak/latest/index.html
after: http://home.fburl.com/~sugak/latest2/index.html

Test Plan:
Modify some test to fail an assertion and make sure that `ASSERT*` terminated the test.

Run `make analyze` and make sure no 'Called C++ object pointer is null' and 'Dereference of null pointer' bugs reported.

Run tests and make sure no failing tests:
```lang=bash
% make check
% USE_CLANG=1 make check
```

Reviewers: meyering, lgalanis, sdong, rven, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D33381
2015-02-13 15:10:47 -08:00
sdong
6d6305dd7d Perf Context to report DB mutex waiting time
Summary: Add counters in perf context to allow users to figure out how time spent on waiting for DB mutex

Test Plan: Add a test and run it.

Reviewers: yhchiang, rven, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D33177
2015-02-09 17:55:12 -08:00
fyrz
cfe8837e43 Switch logv with loglevel to virtual 2015-02-09 20:59:29 +01:00
Igor Canadi
218c3ecea3 Fix std::cout data race
Summary: std::cout is not thread safe. tsan complains. Eliminate it.

Test Plan: env_test with TSAN

Reviewers: yhchiang, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D33087
2015-02-06 13:01:59 -08:00
Karthikeyan Radhakrishnan
8f679c2901 Merge branch 'master' of github.com:facebook/rocksdb 2015-02-06 11:02:25 -08:00
Karthikeyan Radhakrishnan
da9cbce731 Add Header to logging to capture application level information
Summary:
This change adds LogHeader provision to the logger. For the rolling logger
implementation, the headers are copied over to the new log file every time
there is a log roll over.

Test Plan: Added a unit test to test the rolling log case.

Reviewers: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D32817
2015-02-06 10:37:45 -08:00
Igor Canadi
2a979822b6 Fix deleting obsolete files
Summary:
This diff basically reverts D30249 and also adds a unit test that was failing before this patch.

I have no idea how I didn't catch this terrible bug when writing a diff, sorry about that :(

I think we should redesign our system of keeping track of and deleting files. This is already a second bug in this critical piece of code. I'll think of few ideas.

BTW this diff is also a regression when running lots of column families. I plan to revisit this separately.

Test Plan: added a unit test

Reviewers: yhchiang, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D33045
2015-02-06 08:44:30 -08:00
Yueh-Hsuan Chiang
8e83a9d315 Add a missing field for STATE_MUTEX_WAIT to global_state_table
Summary:
Add a missing field for STATE_MUTEX_WAIT to global_state_table.
This will fix the failure of thread_list_test.

Test Plan:
thread_list_test
2015-02-06 02:38:14 -08:00
Yueh-Hsuan Chiang
181191a1e4 Add a counter for collecting the wait time on db mutex.
Summary:
Add a counter for collecting the wait time on db mutex.
Also add MutexWrapper and CondVarWrapper for measuring wait time.

Test Plan:
./db_test
export ROCKSDB_TESTS=MutexWaitStats
./db_test

verify stats output using db_bench
make clean
make release
./db_bench --statistics=1 --benchmarks=fillseq,readwhilewriting --num=10000 --threads=10

Sample output:
    rocksdb.db.mutex.wait.micros COUNT : 7546866

Reviewers: MarkCallaghan, rven, sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D32787
2015-02-04 21:39:45 -08:00
sdong
fe9f691194 Fix fault_injestion_test
Summary: A bug in MockEnv causes fault_injestion_test to fail. I don't know why it doesn't fail every time but it doesn't seem to be right.

Test Plan:
Run fault_injestion_test
Also run db_test with MEM_ENV=1 until the first failure.

Reviewers: yhchiang, rven, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D32877
2015-02-04 17:49:07 -08:00
Yueh-Hsuan Chiang
678503ebcf Add utility functions for interpreting ThreadStatus
Summary:
Add ThreadStatus::GetOperationName() and ThreadStatus::GetStateName(),
two utility functions that help interpreting ThreadStatus.

Test Plan: ./thread_list_test

Reviewers: sdong, rven, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D32793
2015-02-04 01:47:32 -08:00
Yueh-Hsuan Chiang
756e1f151e Remove unused util/thread_event_info.h
Summary:
Remove unused util/thread_event_info.h, which is replaced by
util/thread_operation.h

Test Plan:
make dbg -j32
make release -j32
2015-02-03 17:53:05 -08:00
sdong
829363b449 Options::PrepareForBulkLoad() to increase parallelism of flushes
Summary: Increasing parallelism of flushes will help bulk load throughput.

Test Plan: Compile it.

Reviewers: MarkCallaghan, yhchiang, rven, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D32685
2015-02-03 09:46:04 -08:00
Igor Canadi
b04408c47b Fix unity build
Summary: I broke it with 2fd8f750ab

Test Plan: make unity

Reviewers: yhchiang, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D32577
2015-02-03 00:32:11 -06:00
Yueh-Hsuan Chiang
2c2d5ab7e8 Fix compile warning in util/xfunc.h
Summary:
./util/xfunc.h:31:1: error: class 'Options' was previously declared as a struct [-Werror,-Wmismatched-tags]
class Options;
^

Test Plan:
make dbg -j32
2015-02-02 15:20:19 -08:00
Venkatesh Radhakrishnan
0b8dec7172 Cross functional test infrastructure for RocksDB.
Summary:
This Diff provides the implementation of the cross functional
test infrastructure. This provides the ability to test a single feature
with every existing regression test in order to identify issues with
interoperability between features.

Test Plan:
Reference implementation of inplace update support cross
functional test. Able to find interoperability issues with inplace
support and ran all of db_test. Will add separate diff for those changes.

Reviewers: igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D32247
2015-02-02 14:49:22 -08:00
Erik Garrison
e6eaf938c3 remove old debugging message (#487)
It doesn't seem this is needed.
2015-02-01 20:34:24 +00:00
sdong
5917de0bae CappedFixTransform: return fixed length prefix, or full key if key is shorter than the fixed length
Summary: Add CappedFixTransform, which is the same as fixed length prefix extractor, except that when slice is shorter than the fixed length, it will use the full key.

Test Plan:
Add a test case for
db_test
options_test
and a new test

Reviewers: yhchiang, rven, igor

Reviewed By: igor

Subscribers: MarkCallaghan, leveldb, dhruba, yoshinorim

Differential Revision: https://reviews.facebook.net/D31887
2015-01-30 16:04:30 -08:00
Igor Canadi
2fd8f750ab Compile MemEnv with standard RocksDB library
Summary: This was a feature request by osquery. See task t5617758

Test Plan: compiles and memenv_test runs

Reviewers: yhchiang, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D32115
2015-01-29 16:33:11 -08:00
sdong
e84299c769 Fix bug recently introduced in MemFile::Lock()
Summary: This bug fails DBTest.CheckLock

Test Plan: DBTest.CheckLock now passes with MEM_ENV=1.

Reviewers: rven, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D32451
2015-01-28 15:21:39 -08:00
sdong
e5aab4c2b2 Fix data race in HashLinkList
Summary:
1) need to do acquire load when read the first entry in the bucket.
2) Make num_entries atomic

Test Plan: Ran DBTest.MultiThreaded with TSAN

Reviewers: yhchiang, rven, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D32361
2015-01-28 15:16:52 -08:00
sdong
10af17f3d7 fault_injection_test: add a unit test to allow parallel compactions and multiple levels
Summary: Add a new test case in fault_injection_test, which covers parallel compactions and multiple levels. Use MockEnv to run the new test case to speed it up. Improve MockEnv to avoid DestoryDB(), previously failed when deleting lock files.

Test Plan: Run ./fault_injection_test, including valgrind

Reviewers: rven, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D32415
2015-01-28 14:07:25 -08:00
Igor Canadi
0c4d1053df Fix data race #5
Summary: TSAN complained that these are non-atomic reads and writes from different threads.

Test Plan: TSAN no longer complains

Reviewers: yhchiang, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D32409
2015-01-28 13:42:40 -08:00
Igor Canadi
e8bf2310a0 Remove blob store from the codebase
Summary: We don't have plans to work on this in the short term. If we ever resurrect the project, we can find the code in the history. No need for it to linger around

Test Plan: no test

Reviewers: yhchiang, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D32349
2015-01-27 16:55:33 -08:00
sdong
4c49fedaf1 Use ustricter consistency in thread local operations
Summary:
ThreadSanitizer complains data race of super version and version's destructor with Get(). This patch will fix those warning.

The warning is likely from ColumnFamilyData::ReturnThreadLocalSuperVersion(). With relaxed consistency of CAS, reading the data of the super version can technically happen after swapping it in, enabling the background thread to clean it up.

Test Plan: make all check

Reviewers: rven, igor, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D32265
2015-01-27 13:56:03 -08:00
Igor Canadi
26b50783d3 Fix assert in histogramData 2015-01-23 18:10:52 -08:00
Yueh-Hsuan Chiang
3b494a6103 Make options_test runnable on ROCKSDB_LITE
Summary:
Make options_test runnable on ROCKSDB_LITE by blocking
those tests that require non-ROCKSDB_LITE feature.

Test Plan:
make options_test OPT=-DROCKSDB_LITE -j32
./options_test
make clean
make options_test -j32
./options_test

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D32025
2015-01-22 16:57:16 -08:00
Yoshinori Matsunobu
ceaea2b72d Adding prefix_extractor string config parameter
Summary:
This diff enables to configure prefix_extractor
string parameter as a CF option.

Test Plan: make all check, ./options_test

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D31653
2015-01-16 13:44:42 -08:00
Igor Canadi
9ab5adfc59 New BlockBasedTable version -- better compressed block format
Summary:
This diff adds BlockBasedTable format_version = 2. New format version brings better compressed block format for these compressions:
1) Zlib -- encode decompressed size in compressed block header
2) BZip2 -- encode decompressed size in compressed block header
3) LZ4 and LZ4HC -- instead of doing memcpy of size_t encode size as varint32. memcpy is very bad because the DB is not portable accross big/little endian machines or even platforms where size_t might be 8 or 4 bytes.

It does not affect format for snappy.

If you write a new database with format_version = 2, it will not be readable by RocksDB versions before 3.10. DB::Open() will return corruption in that case.

Test Plan:
Added a new test in db_test.
I will also run db_bench and verify VSIZE when block_cache == 1GB

Reviewers: yhchiang, rven, MarkCallaghan, dhruba, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D31461
2015-01-14 16:24:24 -08:00
Igor Canadi
02b30202cd Merge pull request #455 from Andersbakken/stdlib_fix
Build with clang 3.5 on Linux.
2015-01-13 09:08:32 -08:00
Yueh-Hsuan Chiang
c91cdd59c1 Allow GetThreadList() to indicate a thread is doing Compaction.
Summary: Allow GetThreadList() to indicate a thread is doing Compaction.

Test Plan:
export ROCKSDB_TESTS=ThreadStatus
./db_test

Reviewers: ljin, igor, sdong

Reviewed By: sdong

Subscribers: leveldb, dhruba, jonahcohen, rven

Differential Revision: https://reviews.facebook.net/D30105
2015-01-13 00:04:08 -08:00
Anders Bakken
a9ea65d652 Build with clang 3.5 on Linux. 2015-01-12 09:59:36 -08:00
Igor Canadi
abb9b95ffe Move compression functions from port/ to util/
Summary: We keep checksum functions in util/, there is no reason for compression to be in port/

Test Plan: compiles

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D31281
2015-01-09 12:57:11 -08:00
Igor Canadi
62ad0a9b19 Deprecating skip_log_error_on_recovery
Summary:
Since https://reviews.facebook.net/D16119, we ignore partial tailing writes. Because of that, we no longer need skip_log_error_on_recovery.

The documentation says "Skip log corruption error on recovery (If client is ok with losing most recent changes)", while the option actually ignores any corruption of the WAL (not only just the most recent changes). This is very dangerous and can lead to DB inconsistencies. This was originally set up to ignore partial tailing writes, which we now do automatically (after D16119). I have digged up old task t2416297 which confirms my findings.

Test Plan: There was actually no tests that verified correct behavior of skip_log_error_on_recovery.

Reviewers: yhchiang, rven, dhruba, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D30603
2015-01-05 13:35:56 -08:00
Yueh-Hsuan Chiang
bf287b76e0 Add structures for exposing thread events and operations.
Summary:
Add structures for exposing events and operations.  Event describes
high-level action about a thread such as doing compaciton or
doing flush, while an operation describes lower-level action
of a thread such as reading / writing a SST table, waiting for
mutex.  Events and operations are designed to be independent.
One thread would typically involve in one event and one operation.

Code instrument will be in a separate diff.

Test Plan:
Add unit-tests in thread_list_test
make dbg -j32
./thread_list_test
export ROCKSDB_TESTS=ThreadList
./db_test

Reviewers: ljin, igor, sdong

Reviewed By: sdong

Subscribers: rven, jonahcohen, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D29781
2014-12-30 10:39:13 -08:00
Manish Patil
2067058a60 Dump routine to BlockBasedTableReader (valgrind)
Summary: Fixed valgrind issue

Test Plan: valgrind check done

Reviewers: rven, sdong

Reviewed By: sdong

Subscribers: sdong, dhruba

Differential Revision: https://reviews.facebook.net/D30699
2014-12-23 18:01:29 -08:00
Manish Patil
7ea7bdf04d Dump routine to BlockBasedTableReader
Summary: Added necessary routines for dumping block based SST with block filter

Test Plan: Added "raw" mode to utility sst_dump

Reviewers: sdong, rven

Reviewed By: rven

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D29679
2014-12-23 13:24:07 -08:00
Lei Jin
5045c43944 add support for nested BlockBasedTableOptions in config string
Summary:
Add support to allow nested config for block-based table factory. The format looks like this:

"write_buffer_size=1024;block_based_table_factory={block_size=4k};max_write_buffer_num=2"

Test Plan: unit test

Reviewers: yhchiang, rven, igor, ljin, jonahcohen

Reviewed By: jonahcohen

Subscribers: jonahcohen, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D29223
2014-12-22 16:34:21 -08:00
Yueh-Hsuan Chiang
45bab305f9 Move GetThreadList() feature under Env.
Summary:
GetThreadList() feature depends on the thread creation and destruction, which is currently handled under Env.
This patch moves GetThreadList() feature under Env to better manage the dependency of GetThreadList() feature
on thread creation and destruction.

Renamed ThreadStatusImpl to ThreadStatusUpdater.  Add ThreadStatusUtil, which is a static class contains
utility functions for ThreadStatusUpdater.

Test Plan: run db_test, thread_list_test and db_bench and verify the life cycle of Env and ThreadStatusUpdater is properly managed.

Reviewers: igor, sdong

Reviewed By: sdong

Subscribers: ljin, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D30057
2014-12-22 12:20:17 -08:00
Venkatesh Radhakrishnan
7198ed5a2e Handle errors during pthread calls
Summary: Release locks before calling exit.

Test Plan: Force errors in debugger and verify correctness

Reviewers: igor, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D30423
2014-12-17 16:25:09 -08:00
Qiao Yang
cef6f84393 Added 'dump_live_files' command to ldb tool.
Summary:
Priliminary diff to solicit comments.
Given DB path, dump all SST files (key/value and properties), WAL file and manifest
files. What command options do we need to support for this command? Maybe
output_hex for keys?

Test Plan: Create additional ldb unit tests.

Reviewers: sdong, rven

Reviewed By: rven

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D29547
2014-12-12 17:50:36 -08:00
Yueh-Hsuan Chiang
74b3fb6d97 Fix Mac compile errors on util/cache_test.cc
Summary:
Fix Mac compile errors on util/cache_test.cc

Test Plan:
make dbg -j32
./cache_test
2014-12-11 14:15:13 -08:00
Alexey Maykov
ee95cae9a4 Modifed the LRU cache eviction code so that it doesn't evict blocks which have exteranl references
Summary:
Currently, blocks which have more than one reference (ie referenced by something other than cache itself) are evicted from cache. This doesn't make much sense:
- blocks are still in RAM, so the RAM usage reported by the cache is incorrect
- if the same block is needed by another iterator, it will be loaded and decompressed again

This diff changes the reference counting scheme a bit. Previously, if the cache contained the block, this was accounted for in its refcount. After this change, the refcount is only used to track external references. There is a boolean flag which indicates whether or not the block is contained in the cache.
This diff also changes how LRU list is used. Previously, both hashtable and the LRU list contained all blocks. After this change, the LRU list contains blocks with the refcount==0, ie those which can be evicted from the cache.

Note that this change still allows for cache to grow beyond its capacity. This happens when all blocks are pinned (ie refcount>0). This is consistent with the current behavior. The cache's insert function never fails. I spent lots of time trying to make table_reader and other places work with the insert which might failed. It turned out to be pretty hard. It might really destabilize some customers, so finally, I decided against doing this.

table_cache_remove_scan_count_limit option will be unneeded after this change, but I will remove it in the following diff, if this one gets approved

Test Plan: Ran tests, made sure they pass

Reviewers: sdong, ljin

Differential Revision: https://reviews.facebook.net/D25503
2014-12-10 22:28:53 -08:00
Igor Canadi
cb82d7b081 Fix #434
Summary: Why do we assert here? This doesn't seem like user friendly thing to do :)

Test Plan: none

Reviewers: sdong, yhchiang, rven

Reviewed By: rven

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D30027
2014-12-09 10:22:07 -08:00
Yueh-Hsuan Chiang
8f4e1c1c9a Remove the compability check on log2 OS_ANDROID as it's already blocked by ROCKSDB_LITE
Summary:
Remove the compability check on log2 OS_ANDROID as it's already blocked by ROCKSDB_LITE

Test Plan:
make OPT="-DROCKSDB_LITE -DOS_ANDROID" shared_lib -j32
make shared_lib -j32
2014-12-04 13:56:14 -08:00