Commit Graph

1185 Commits

Author SHA1 Message Date
kailiu
74939a9e13 Make the block-based table's index pluggable
Summary:
This patch introduces a new table option that allows us to make the
block-based table's index pluggable.

To support the new feature:

* Code has been refactored to be more flexible and to support this option well.
* More documentation is added for the existing obscure functionality.
* Major surgery on DataBlockReader(), where the logic was really convoluted.
* Other small code cleanups.

The pluggability will mostly affect development of internal modules
and won't change frequently. As a result, I intentionally avoided
heavy-weight patterns (like factory) and tried to keep it simple.
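
A minimal sketch of what a pluggable index selected by a table option could look like, without a factory hierarchy. Every name here (`IndexType`, `IndexReader`, `CreateIndexReader`, the two index classes) is hypothetical and chosen for illustration; it is not the code from this patch.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical table option: which index implementation the reader uses.
enum class IndexType { kBinarySearch, kLinearScan };

// Minimal index interface: return the position of the first block whose
// last key is >= the lookup key (or the block count if there is none).
class IndexReader {
 public:
  virtual ~IndexReader() = default;
  virtual size_t FindBlock(const std::string& key) const = 0;
};

class BinarySearchIndex : public IndexReader {
 public:
  explicit BinarySearchIndex(std::vector<std::string> last_keys)
      : last_keys_(std::move(last_keys)) {}
  size_t FindBlock(const std::string& key) const override {
    return std::lower_bound(last_keys_.begin(), last_keys_.end(), key) -
           last_keys_.begin();
  }

 private:
  std::vector<std::string> last_keys_;  // assumed to be sorted
};

class LinearScanIndex : public IndexReader {
 public:
  explicit LinearScanIndex(std::vector<std::string> last_keys)
      : last_keys_(std::move(last_keys)) {}
  size_t FindBlock(const std::string& key) const override {
    for (size_t i = 0; i < last_keys_.size(); ++i) {
      if (last_keys_[i] >= key) return i;
    }
    return last_keys_.size();
  }

 private:
  std::vector<std::string> last_keys_;
};

// A plain switch on the option value keeps the selection logic in one place
// and avoids a registered-factory mechanism.
std::unique_ptr<IndexReader> CreateIndexReader(
    IndexType type, std::vector<std::string> last_keys) {
  switch (type) {
    case IndexType::kBinarySearch:
      return std::unique_ptr<IndexReader>(
          new BinarySearchIndex(std::move(last_keys)));
    case IndexType::kLinearScan:
      return std::unique_ptr<IndexReader>(
          new LinearScanIndex(std::move(last_keys)));
  }
  return nullptr;
}
```

Adding another index kind in this sketch means one new enum value, one new class, and one new `case`.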

Test Plan: make all check

Reviewers: haobo, sdong

Reviewed By: sdong

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16395
2014-02-28 18:19:07 -08:00
kailiu
bf86af5174 Remove the terrible hack for flush_block_policy_factory
Summary:
The previous code was too convoluted, and I must have been drunk to let
such code be written without a second thought.

Thanks to the discussion with @sdong, I added the `Options` when
generating the flusher, thus avoiding the tricks.

Just FYI: I resisted adding Options in flush_block_policy.h since I
wanted to avoid cyclic dependencies: FlushBlockPolicy depends on Options
and Options also depends on FlushBlockPolicy... While I appreciate my
effort to prevent it, the old design turned out to create more trouble than
it tried to avoid.

Test Plan: ran ./table_test

Reviewers: sdong

Reviewed By: sdong

CC: sdong, leveldb

Differential Revision: https://reviews.facebook.net/D16503
2014-02-28 16:39:27 -08:00
Igor Canadi
58ca641d53 Make Log::Reader more robust
Summary:
This diff does two things:
(1) Log::Reader does not report a corruption when the last record in a log or manifest file is truncated (meaning that log writer died in the middle of the write). Inherited the code from LevelDB: https://code.google.com/p/leveldb/source/detail?r=269fc6ca9416129248db5ca57050cd5d39d177c8#
(2) Turn off mmap writes for all writes to log and manifest files

(2) is necessary because if we use mmap writes, the last record is not truncated, but is actually filled with zeros, making checksum fail. It is hard to recover from checksum failing.
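
The behavior in (1) can be sketched as follows. This is a simplified illustration, not the real log format: it assumes each record is a 4-byte little-endian length followed by the payload, with no checksum, record type, or block structure. `ReadAllRecords`, `EncodeRecord`, and `ReadResult` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct ReadResult {
  std::vector<std::string> records;
  bool truncated_tail = false;  // informational only, not a corruption
};

// Hypothetical framing used by this sketch: 4-byte length, then payload.
std::string EncodeRecord(const std::string& payload) {
  std::string out;
  const uint32_t len = static_cast<uint32_t>(payload.size());
  for (int i = 0; i < 4; ++i) {
    out.push_back(static_cast<char>((len >> (8 * i)) & 0xff));
  }
  out += payload;
  return out;
}

// Reads every complete record. If the file ends in the middle of a header
// or payload, the writer is assumed to have died mid-write, so the partial
// record is dropped and reading stops as if end of file had been reached.
ReadResult ReadAllRecords(const std::string& file) {
  ReadResult result;
  const size_t kHeaderSize = 4;
  size_t pos = 0;
  while (pos < file.size()) {
    if (file.size() - pos < kHeaderSize) {
      result.truncated_tail = true;
      break;
    }
    uint32_t len = 0;
    for (size_t i = 0; i < kHeaderSize; ++i) {
      len |= static_cast<uint32_t>(static_cast<unsigned char>(file[pos + i]))
             << (8 * i);
    }
    pos += kHeaderSize;
    if (file.size() - pos < len) {
      result.truncated_tail = true;
      break;
    }
    result.records.push_back(file.substr(pos, len));
    pos += len;
  }
  return result;
}
```

A zero-filled tail, as produced by mmap writes, would not look truncated to a reader like this, which is why (2) is needed alongside (1).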

Test Plan:
Added unit tests from LevelDB
Actually recovered a "corrupted" MANIFEST file.

Reviewers: dhruba, haobo

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16119
2014-02-28 13:19:47 -08:00
Yueh-Hsuan Chiang
a77527f2af Add ReadOptions to TransactionLogIterator.
Summary:
Add an optional input parameter ReadOptions to DB::GetUpdatesSince(),
which allows the verification of checksums to be disabled by setting
ReadOptions::verify_checksums to false.

Test Plan: Tests are done off-line and will not be included in the regular unit test.

Reviewers: igor

Reviewed By: igor

CC: leveldb, xjin, dhruba

Differential Revision: https://reviews.facebook.net/D16305
2014-02-28 11:50:36 -08:00
Kai Liu
6ba1084f24 Fix some compilation bugs in different platforms
Summary:

Detected some problems when testing my 3rd party release tool.
2014-02-27 22:20:17 -08:00
Kai Liu
99e4b40a55 Fix the [-Werror=sign-compare] issues
2014-02-27 22:18:33 -08:00
Yueh-Hsuan Chiang
9a7b74954f Refine the checks in InfoLogLevel test.
Summary:
InfoLogLevel test now checks the number of lines of the output log file
instead of the number of bytes in the log file.

This diff fixes the issue that the previous InfoLogLevel test in
auto_roll_logger_test passed in make check but fails when valgrind
is used.

Test Plan: run with make check and valgrind.

Reviewers: kailiu

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16407
2014-02-27 14:00:10 -08:00
Lei Jin
ad0c3747cb cache SuperVersion in thread local storage to avoid mutex lock
Summary: as title

Test Plan:
asan_check
will post results later

Reviewers: haobo, igor, dhruba, sdong

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16257
2014-02-27 11:38:55 -08:00
kailiu
e41c060a06 Make sure logger is safely released in InfoLogLevel
Summary: Fix the memory leak that was caught by the Jenkins build.

Test Plan: ran the valgrind test locally

Reviewers: yhchiang

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16389
2014-02-26 19:07:57 -08:00
kailiu
444cafc28c Fix inconsistent code format
Summary:
Found some functions that follow the wrong naming style. When naming functions, we have two styles:

Trivially expose internal data in readonly mode: `all_lower_case()`
Regular function: `CapitalizeFirstLetter()`

I renamed these functions.
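
A short illustration of the two styles side by side. The class `BlockList` and its members are hypothetical and exist only to show the naming convention.

```cpp
#include <cstddef>
#include <string>
#include <vector>

class BlockList {
 public:
  // Trivial read-only accessor that exposes internal data: all_lower_case().
  size_t num_blocks() const { return blocks_.size(); }

  // Regular function that does real work: CapitalizeFirstLetter().
  void AddBlock(const std::string& block) { blocks_.push_back(block); }

 private:
  std::vector<std::string> blocks_;
};
```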

Test Plan: make -j32

Reviewers: haobo, sdong, dhruba, igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16383
2014-02-26 18:56:39 -08:00
sdong
a04dbf6e49 PlainTable::Next() should pass the error message from ReadKey()
Summary:
PlainTable::Next() should pass the error message from ReadKey(). Currently it returns a wrong error message.
Also improve the status messages returned when a read fails.

Test Plan: make all check

Reviewers: ljin, kailiu, haobo

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16365
2014-02-26 15:12:44 -08:00
Yueh-Hsuan Chiang
ccaedd16d4 Enable log info with different levels.
Summary:
* Now each Log related function has a variant that takes an additional
  argument indicating its log level, which is one of the following:
 - DEBUG, INFO, WARN, ERROR, FATAL.

* To ensure backward-compatibility, old version Log functions are kept
  unchanged.

* Logger now has a member variable indicating its log level.  Any incoming
  Log request whose log level is lower than the Logger's log level will not
  be output.

* The output of the newer version Log will be prefixed by its log level.
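
The filtering and prefixing described above can be sketched as below. This is an in-memory stand-in written for illustration; `SketchLogger` and `LogLevel` are hypothetical names, and the backward-compatible old-style call is not modeled.

```cpp
#include <string>
#include <vector>

// The five levels named in the summary, in increasing severity.
enum class LogLevel { kDebug = 0, kInfo, kWarn, kError, kFatal };

class SketchLogger {
 public:
  explicit SketchLogger(LogLevel min_level) : min_level_(min_level) {}

  // Drops requests below the logger's level; prefixes the rest with
  // the level name in brackets.
  void Log(LogLevel level, const std::string& message) {
    if (level < min_level_) return;
    lines_.push_back("[" + std::string(Name(level)) + "] " + message);
  }

  const std::vector<std::string>& lines() const { return lines_; }

 private:
  static const char* Name(LogLevel level) {
    switch (level) {
      case LogLevel::kDebug: return "DEBUG";
      case LogLevel::kInfo:  return "INFO";
      case LogLevel::kWarn:  return "WARN";
      case LogLevel::kError: return "ERROR";
      case LogLevel::kFatal: return "FATAL";
    }
    return "UNKNOWN";
  }

  LogLevel min_level_;
  std::vector<std::string> lines_;  // stands in for the log file
};
```

With a logger constructed at `LogLevel::kWarn`, an INFO request is dropped while an ERROR request is stored with an `[ERROR]` prefix.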

Test Plan:
Add a LogType test in auto_roll_logger_test.cc

 = Sample log output =
    2014/02/11-00:03:07.683895 7feded179840 [DEBUG] this is the message to be written to the log file!!
    2014/02/11-00:03:07.683898 7feded179840 [INFO] this is the message to be written to the log file!!
    2014/02/11-00:03:07.683900 7feded179840 [WARN] this is the message to be written to the log file!!
    2014/02/11-00:03:07.683903 7feded179840 [ERROR] this is the message to be written to the log file!!
    2014/02/11-00:03:07.683906 7feded179840 [FATAL] this is the message to be written to the log file!!

Reviewers: dhruba, xjin, kailiu

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16071
2014-02-26 14:41:28 -08:00
Lei Jin
b2795b799e thread local pointer storage
Summary:
This is not a generic thread local implementation in the sense that it
only takes pointers. But it does support multiple instances per thread
and lets the user plug in a function to perform cleanup when a thread exits
or an instance gets destroyed.
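
A rough sketch of the idea, assuming C++11 `thread_local` is available. It is not the implementation from this diff: `ThreadLocalPtrSketch` is a hypothetical name, and only cleanup on thread exit is modeled, not cleanup when an instance is destroyed.

```cpp
#include <atomic>
#include <cstdint>
#include <unordered_map>

class ThreadLocalPtrSketch {
 public:
  using CleanupFn = void (*)(void*);

  // Each instance gets its own id, so one thread can hold many instances.
  explicit ThreadLocalPtrSketch(CleanupFn cleanup = nullptr)
      : id_(NextId()), cleanup_(cleanup) {}

  // Returns the calling thread's pointer for this instance, or nullptr.
  void* Get() const {
    auto& slots = Slots().entries;
    auto it = slots.find(id_);
    return it == slots.end() ? nullptr : it->second.ptr;
  }

  // Stores a pointer for the calling thread only.
  void Reset(void* ptr) { Slots().entries[id_] = Entry{ptr, cleanup_}; }

 private:
  struct Entry {
    void* ptr;
    CleanupFn cleanup;
  };

  // One table per thread; its destructor runs at thread exit and invokes
  // the user-supplied cleanup function on every non-null pointer.
  struct PerThread {
    std::unordered_map<uint32_t, Entry> entries;
    ~PerThread() {
      for (auto& kv : entries) {
        if (kv.second.ptr != nullptr && kv.second.cleanup != nullptr) {
          kv.second.cleanup(kv.second.ptr);
        }
      }
    }
  };

  static PerThread& Slots() {
    thread_local PerThread per_thread;
    return per_thread;
  }

  static uint32_t NextId() {
    static std::atomic<uint32_t> next{0};
    return next.fetch_add(1);
  }

  const uint32_t id_;
  const CleanupFn cleanup_;
};
```

Two instances used from the same thread keep separate values, and a value set on one thread is not visible from another.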

Test Plan: unit test for now

Reviewers: haobo, igor, sdong, dhruba

Reviewed By: igor

CC: leveldb, kailiu

Differential Revision: https://reviews.facebook.net/D16131
2014-02-25 17:47:37 -08:00
Igor Canadi
4209516359 Schedule flush when waiting on flush
Summary:
This will also help with avoiding the deadlock. If a flush failed and we're waiting for a memtable to be flushed, we should schedule a new flush and hope the new one succeeds.

If paranoid_checks = false, Wait() will still hang on ENOSPC, but at least it will automatically continue when the space frees up. Current behavior both hangs and deadlocks.

Also, I renamed some 'compaction' to 'flush'. 'compaction' was leveldb way of saying things.

Test Plan: make check

Reviewers: dhruba, haobo, ljin

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16281
2014-02-25 12:04:14 -08:00
Lei Jin
dea894ef8d expose wal_dir in db_bench
Summary: as title

Test Plan: ran db_bench

Reviewers: dhruba, haobo

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16269
2014-02-25 10:43:46 -08:00
Igor Canadi
a8c1f2970d Merge pull request #90 from alberts/morecapi
A few more C API functions.
2014-02-25 10:42:53 -08:00
Albert Strasheim
72aacf6b96 A few more C API functions. 2014-02-25 10:32:28 -08:00
Igor Canadi
6ed450a58c DeleteFile should schedule Flush or Compaction
Summary:
More info here: https://github.com/facebook/rocksdb/issues/89
If flush fails because of ENOSPC, we have a deadlock problem. This is a quick fix that will continue the normal operation when user deletes the file and frees up the space on the device.

We need to address the issue more broadly with bg_error_ cleanup.

Test Plan: make check

Reviewers: dhruba, haobo, ljin

Reviewed By: ljin

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16275
2014-02-24 16:00:13 -08:00
Igor Canadi
2bf1151a25 Fix C API 2014-02-24 15:15:34 -08:00
sdong
01c27be5fb A simple benchmark to measure WAL append latency
Summary: A simple benchmark that simulates WAL append. It can be used to test different platform/file system's performance on WAL.

Test Plan: run it.

Reviewers: haobo, kailiu

Reviewed By: haobo

CC: igor, dhruba, i.am.jin.lei, yhchiang, leveldb, nkg-

Differential Revision: https://reviews.facebook.net/D16239
2014-02-24 14:39:32 -08:00
Igor Canadi
18a7cdfba0 Merge pull request #82 from tecbot/api-enhancements
Enhancements to the API
2014-02-24 14:20:13 -08:00
Kai Liu
c9244dcba6 Update the instruction to build shared library 2014-02-24 12:29:26 -08:00
Thomas Adam
ce2b1f7b44 added a test case for custom merge operator 2014-02-23 17:58:38 +01:00
Thomas Adam
68248a2ac5 added a delete method for custom filter policy and merge operator to make it possible to override the cleanup behaviour of the return value 2014-02-23 17:58:11 +01:00
Lei Jin
d45d17b2a3 allow lambda function syntax in cpplint
Summary: as title

Test Plan: arc lint

Reviewers: kailiu

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16251
2014-02-20 12:47:05 -08:00
Igor Canadi
15ac5ad1f6 Update to CONTRIBUTING.md 2014-02-20 10:55:54 -08:00
sdong
b2d29675c8 Add a test in prefix_test to verify correctness of results
Summary:
Add a test to verify that HashLinkList and HashSkipList (mainly the former) return the correct results when inserting into the same bucket in different orders.

Some other changes:
(1) add the test to test list
(2) fix compile error
(3) add header

Test Plan: ./prefix_test

Reviewers: haobo, kailiu

Reviewed By: haobo

CC: igor, yhchiang, i.am.jin.lei, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D16143
2014-02-19 17:00:34 -08:00
Kai Liu
2b205b35d8 Disable putting filter block to block cache
Summary: This bug caused server crashes because the filter block is too big and kept being purged from the cache.

Test Plan: Wrote a new unit test to make sure it works.

Reviewers: dhruba, haobo, igor, sdong

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16221
2014-02-19 15:38:57 -08:00
Thomas Adam
d74c9b79ea Enhancements to the API 2014-02-19 23:59:54 +01:00
sdong
e90d3f7752 First Transaction Logs Should Not Skip Storage Options Given
Summary: Currently, the first transaction log file ignores bytes_per_sync and other storage-related options, which is inconsistent with later log files. Fix it.

Test Plan: make all check. See the options set in GDB.

Reviewers: haobo, kailiu

Reviewed By: haobo

CC: igor, ljin, yhchiang, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D16215
2014-02-19 10:58:39 -08:00
kailiu
83e7842f80 Improve the check for header guard
Summary:
cpplint.py only recognizes `#ifdef HEADER_GUARD` as a header guard.
This patch enables the check for `#pragma once`.

Test Plan: The new arc lint no longer raises the false alarm for `#pragma once`.

Reviewers: dhruba, sdong, igor, haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16227
2014-02-19 01:02:34 -08:00
Kai Liu
78ce24a709 Fix the lint issues in dev box
Summary:
Owing to the difference between platforms (my macbook and dev server), arc lint throws fatal error in dev box.
To fix the problem (quickly), I removed all incompatible function calls.

Test Plan: ran `arc lint` in dev box and passed.

Reviewers: igor, yhchiang

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16173
2014-02-14 22:25:48 -08:00
Igor Canadi
be7e273d83 fix u/s comparison #83 2014-02-14 16:18:55 -08:00
kailiu
46812f68c3 Improve/fix bugs for the cpp linter
Summary:
Previously, our new `arc lint` had two annoying bugs:

* It kept sending a false alarm that we should put C++ system files first, even though we had already done that.
  - This problem is caused by our linter, which doesn't give the underlying cpplint.py the right file path (it gives "-" as the file name), making cpplint.py work incorrectly.
* It only works in rocksdb's root dir; otherwise it throws an exception saying "cannot find cpplint.py".

I copied the open source ArcanistCpplintLinter and modified it for our use.

Test Plan: Ran arc lint and made sure the above-mentioned problem won't occur.

Reviewers: haobo, sdong, igor, ljin, yhchiang, dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16137
2014-02-13 17:48:11 -08:00
kailiu
63690625cd Expose the table properties to application
Summary: Provide a public API for users to access the table properties for each SSTable.

Test Plan: Added a unit test to check the function's correctness under different conditions.

Reviewers: haobo, dhruba, sdong

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16083
2014-02-13 16:28:21 -08:00
Kai Liu
b2e7ee8b41 Followup code refactor on plain table
Summary:
Fixed most comments in https://reviews.facebook.net/D15429.
Still have some remaining comments left.

Test Plan: make all check

Reviewers: sdong, haobo

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15885
2014-02-13 15:27:59 -08:00
Kai Liu
85c0545fac Put *.out to the ignore list (for MacOS) 2014-02-13 14:15:02 -08:00
Kai Liu
59cffe02c4 Benchmark table reader with nanoseconds
Summary: Nanoseconds give us a better view of the performance, especially when some operations are so fast that microseconds reveal little.
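
A small sketch of why the unit matters, using `std::chrono` rather than the benchmark's own timing code. `ElapsedNanos` and `NanosToMicros` are hypothetical helpers written for this illustration.

```cpp
#include <chrono>
#include <cstdint>

// Times one call to fn and returns the elapsed wall time in nanoseconds.
template <typename Fn>
uint64_t ElapsedNanos(Fn&& fn) {
  const auto start = std::chrono::steady_clock::now();
  fn();
  const auto end = std::chrono::steady_clock::now();
  return static_cast<uint64_t>(
      std::chrono::duration_cast<std::chrono::nanoseconds>(end - start)
          .count());
}

// Integer conversion to microseconds discards everything below 1000 ns.
inline uint64_t NanosToMicros(uint64_t nanos) { return nanos / 1000; }
```

An operation that takes around 400 ns becomes 0 after `NanosToMicros`, so a microsecond histogram would put most sub-microsecond lookups into a single bucket.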

Test Plan:
sample output:

    ./table_reader_bench --plain_table --time_unit=nanosecond
    =======================================================================================================
    InMemoryTableSimpleBenchmark:           PlainTable   num_key1:   4096   num_key2:   512   non_empty
    =======================================================================================================
    Histogram (unit: nanosecond):
    Count: 6291456  Average: 475.3867  StdDev: 556.05
    Min: 135.0000  Median: 400.1817  Max: 33370.0000
    Percentiles: P50: 400.18 P75: 530.02 P99: 887.73 P99.9: 8843.26 P99.99: 9941.21
    ------------------------------------------------------
    [     120,     140 )        2   0.000%   0.000%
    [     140,     160 )      452   0.007%   0.007%
    [     160,     180 )    13683   0.217%   0.225%
    [     180,     200 )    54353   0.864%   1.089%
    [     200,     250 )   101004   1.605%   2.694%
    [     250,     300 )   729791  11.600%  14.294% ##
    [     300,     350 )   616070   9.792%  24.086% ##
    [     350,     400 )  1628021  25.877%  49.963% #####
    [     400,     450 )   647220  10.287%  60.250% ##
    [     450,     500 )   577206   9.174%  69.424% ##
    [     500,     600 )  1168585  18.574%  87.999% ####
    [     600,     700 )   506875   8.057%  96.055% ##
    [     700,     800 )   147878   2.350%  98.406%
    [     800,     900 )    42633   0.678%  99.083%
    [     900,    1000 )    16304   0.259%  99.342%
    [    1000,    1200 )     7811   0.124%  99.466%
    [    1200,    1400 )     1453   0.023%  99.490%
    [    1400,    1600 )      307   0.005%  99.494%
    [    1600,    1800 )       81   0.001%  99.496%
    [    1800,    2000 )       18   0.000%  99.496%
    [    2000,    2500 )        8   0.000%  99.496%
    [    2500,    3000 )        6   0.000%  99.496%
    [    3500,    4000 )        3   0.000%  99.496%
    [    4000,    4500 )      116   0.002%  99.498%
    [    4500,    5000 )     1144   0.018%  99.516%
    [    5000,    6000 )     1087   0.017%  99.534%
    [    6000,    7000 )     2403   0.038%  99.572%
    [    7000,    8000 )     9840   0.156%  99.728%
    [    8000,    9000 )    12820   0.204%  99.932%
    [    9000,   10000 )     3881   0.062%  99.994%
    [   10000,   12000 )      135   0.002%  99.996%
    [   12000,   14000 )      159   0.003%  99.998%
    [   14000,   16000 )       58   0.001%  99.999%
    [   16000,   18000 )       30   0.000% 100.000%
    [   18000,   20000 )       14   0.000% 100.000%
    [   20000,   25000 )        2   0.000% 100.000%
    [   25000,   30000 )        2   0.000% 100.000%
    [   30000,   35000 )        1   0.000% 100.000%

Reviewers: haobo, dhruba, sdong

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16113
2014-02-13 13:57:36 -08:00
sdong
b5140a0361 Fix table_reader_bench and add it to "make"
Summary: Fix table_reader_bench after some interface changes. Add it to make to avoid future breakage.

Test Plan: make table_reader_bench and run it with different options.

Reviewers: kailiu, haobo

Reviewed By: haobo

CC: igor, leveldb

Differential Revision: https://reviews.facebook.net/D16107
2014-02-12 18:31:02 -08:00
Siying Dong
f3ae3d07cc Add more black-box tests for PlainTable and explicitly support total order mode
Summary:
1. Add some more implementation-aware tests for PlainTable
2. Move from a hard-coded one index per 16 rows in one prefix to a configurable number. Also, make hash table ratio = 0 mean binary search only, and fix some divide-by-zero risks.
3. Explicitly support total order (only use binary search)
4. Some code cleanup.

Test Plan: make all check

Reviewers: haobo, kailiu

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16023
2014-02-12 17:37:22 -08:00
kailiu
e6b3e3b4db Support prefix seek in UserCollectedProperties
Summary: We'll need the prefix seek support for property aggregation.

Test Plan: make all check

Reviewers: haobo, sdong, dhruba

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15963
2014-02-12 13:14:59 -08:00
Igor Canadi
ca5f1a225a CompactionContext to include is_manual_compaction
Summary: Added a bit more information to compaction context, requested by internal team at FB.

Test Plan: Modified CompactionFilter test to make sure is_manual_compaction is properly set.

Reviewers: haobo

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16095
2014-02-12 12:24:18 -08:00
Lei Jin
994c327b86 IOError cleanup
Summary: Clean up IOErrors so that it only indicates errors talking to device.

Test Plan: make all check

Reviewers: igor, haobo, dhruba, emayanke

Reviewed By: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15831
2014-02-12 11:42:54 -08:00
Lei Jin
5fbf2ef42d preload table handle on Recover() when max_open_files == -1
Summary: This covers existing table files before DB open happens and avoids contention on table cache

Test Plan: db_test

Reviewers: haobo, sdong, igor, dhruba

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16089
2014-02-12 10:43:27 -08:00
Lei Jin
28b7f7faa8 enable plain table in db_bench
Summary: as title

Test Plan: ran db_bench to gather stats

Reviewers: haobo, sdong

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16059
2014-02-12 10:41:55 -08:00
kailiu
265150cb49 Fix problem 3 for issue #80 2014-02-11 17:52:18 -08:00
kailiu
aa734ce9ab Fix a member variables initialization order issue
Summary:
On MacOS, I hit an issue with `Footer`'s default constructor, which initialized the magic number with a random value instead of 0.
After investigating, I found that we forgot to make kInvalidTableMagicNumber static. As a result, kInvalidTableMagicNumber was assigned to `table_magic_number_` before it was itself initialized, so `table_magic_number_` was populated with a random number.
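
The fixed shape can be sketched as below. `FooterSketch` is a hypothetical reduction of the class, and the description of the buggy layout in the comment is an assumption based on the summary above, not a copy of the original code.

```cpp
#include <cstdint>

class FooterSketch {
 public:
  // Assumed buggy shape (not compiled here): kInvalidTableMagicNumber was a
  // non-static const member initialized after table_magic_number_. Non-static
  // members are initialized in declaration order, so the constructor read
  // the constant before it had a value.
  //
  // Fixed shape: a static constant has no per-object initialization step,
  // so it is always ready before any member initializer runs.
  static constexpr uint64_t kInvalidTableMagicNumber = 0;

  FooterSketch() : table_magic_number_(kInvalidTableMagicNumber) {}

  uint64_t table_magic_number() const { return table_magic_number_; }

 private:
  uint64_t table_magic_number_;
};
```

Compilers can flag the buggy pattern with warnings such as `-Wreorder` or `-Wuninitialized`, though whether a given platform warns or silently yields garbage varies.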

Test Plan: passed current unit tests; also passed the unit tests for the incoming diff which used the default footer.

Reviewers: yhchiang

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16077
2014-02-11 14:16:46 -08:00
Siying Dong
33042669f6 Reduce malloc of iterators in Get() code paths
Summary:
This patch optimized Get() code paths by avoiding malloc of iterators. Iterator creation is moved to mem table rep implementations, where a callback is called when any key is found. This is the same practice as what we do in (SST) table readers.
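
A simplified sketch of a callback-style lookup that never allocates an iterator object on the heap. `SortedVectorRep` is a hypothetical stand-in for a mem table rep, and the templated callback is a simplification chosen for this example rather than the signature used in the patch.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

class SortedVectorRep {
 public:
  using Entry = std::pair<std::string, std::string>;

  // Keeps entries sorted by key so lookups can binary search.
  void Insert(std::string key, std::string value) {
    auto it = std::upper_bound(
        entries_.begin(), entries_.end(), key,
        [](const std::string& k, const Entry& e) { return k < e.first; });
    entries_.insert(it, Entry(std::move(key), std::move(value)));
  }

  // Visits entries whose key is >= `key`, in order, passing each one to
  // `callback` until it returns false. The position is tracked with a stack
  // iterator internal to the rep, so the caller allocates nothing.
  template <typename Callback>
  void Get(const std::string& key, Callback&& callback) const {
    auto it = std::lower_bound(
        entries_.begin(), entries_.end(), key,
        [](const Entry& e, const std::string& k) { return e.first < k; });
    for (; it != entries_.end(); ++it) {
      if (!callback(it->first, it->second)) break;
    }
  }

 private:
  std::vector<Entry> entries_;
};
```

The caller decides when to stop by returning false from the callback, for example after the first entry whose key matches.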

In a db_bench run of readrandom following a writeseq, with no compression, a single thread, and tmpfs, throughput improved from 139027 to 144958, about 4%.

Test Plan: make all check

Reviewers: dhruba, haobo, igor

Reviewed By: haobo

CC: leveldb, yhchiang

Differential Revision: https://reviews.facebook.net/D14685
2014-02-11 10:32:51 -08:00
Kai Liu
d4b789fdee Add LIBRARY back to make dbg 2014-02-10 20:15:09 -08:00
kailiu
745c181e20 Quick fix for table_test failure
Summary:
* Fixed the compression state array size bug.
* Temporarily disable running `DoCompressionTest()` against bzip, which will fail the test.

Test Plan: make && ./table_test

Reviewers: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16065
2014-02-10 17:05:14 -08:00