Commit Graph

51 Commits

Author SHA1 Message Date
Ramkumar Vadivelu
9a690a74e1 In ParseInternalKey(), include corrupt key info in Status (#7515)
Summary:
Fixes Issue https://github.com/facebook/rocksdb/issues/7497

When allow_data_in_errors db_options is set, log error key details in `ParseInternalKey()`

Have fixed most of the calls. Have few TODOs still pending - because have to make more deeper changes to pass in the allow_data_in_errors flag. Will do those in a separate PR later.

Tests:
- make check
- some of the existing tests that exercise the "internal key too small" condition are: dbformat_test, cuckoo_table_builder_test
- some of the existing tests that exercise the corrupted key path are: corruption_test, merge_helper_test, compaction_iterator_test

Example of new status returns:
- Key too small - `Corrupted Key: Internal Key too small. Size=5`
- Corrupt key with allow_data_in_errors option set to false: `Corrupted Key: '<redacted>' seq:3, type:3`
- Corrupt key with allow_data_in_errors option set to true: `Corrupted Key: '61' seq:3, type:3`

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7515

Reviewed By: ajkr

Differential Revision: D24240264

Pulled By: ramvadiv

fbshipit-source-id: bc48f5d4475ac19d7713e16df37505b31aac42e7
2020-10-28 10:12:58 -07:00
sdong
7508175558 Introduce options.check_flush_compaction_key_order (#7467)
Summary:
Introduce an new option options.check_flush_compaction_key_order, by default set to true, which checks key order of flush and compaction, and fail the operation if the order is violated.
Also did minor refactor hash checking code, which consolidates the hashing logic to a vlidation class, where the key ordering logic is added.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7467

Test Plan: Add unit tests to validate the check can catch reordering in flush and compaction, and can be properly disabled.

Reviewed By: riversand963

Differential Revision: D24010683

fbshipit-source-id: 8dd6292d2cda8006054e9ded7cfa4bf405f0527c
2020-10-01 10:10:26 -07:00
Ramkumar Vadivelu
e04a50923d Change ParseInternalKey() to return Status instead of bool (#7457)
Summary:
Fixes https://github.com/facebook/rocksdb/issues/7430

Change ParseInternalKey() to return Status instead of bool.

db_bench (seekrandom) based before/after results with value size of 100 bytes and 16 bytes can be found at (tests ran on an udb server):
https://www.dropbox.com/s/47bwamdy5ozngph/PIK_ret_Status_results.xlsx?dl=0

![db_bench_results](https://user-images.githubusercontent.com/62277872/94642825-2a21a800-029a-11eb-88f2-124136c83fd3.png)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7457

Reviewed By: ajkr

Differential Revision: D24002433

Pulled By: ramvadiv

fbshipit-source-id: ac253ecf577a29044c47c3fe254a01e71404c44c
2020-09-30 19:16:47 -07:00
mrambacher
d44cbc5314 Add hash of key/value checks when paranoid_file_checks=true (#7134)
Summary:
When paraoid_files_checks=true, a rolling key-value hash is generated and compared to what is written to the file.  If the values do not match, the SST file is rejected.

Code put in place for the check for both flush and compaction jobs.  Corresponding test added to corruption_test.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7134

Reviewed By: cheng-chang

Differential Revision: D22646149

fbshipit-source-id: 8fde1984a1a11edd3bd82a413acffc5ea7aa683f
2020-07-22 11:04:40 -07:00
Anand Ananthabhotla
9a5886bd8c Extend Get/MultiGet deadline support to table open (#6982)
Summary:
Current implementation of the ```read_options.deadline``` option only checks the deadline for random file reads during point lookups. This PR extends the checks to file opens, prefetches and preloads as part of table open.

The main changes are in the ```BlockBasedTable```, partitioned index and filter readers, and ```TableCache``` to take ReadOptions as an additional parameter. In ```BlockBasedTable::Open```, in order to retain existing behavior w.r.t checksum verification and block cache usage, we filter out most of the options in ```ReadOptions``` except ```deadline```. However, having the ```ReadOptions``` gives us more flexibility to honor other options like verify_checksums, fill_cache etc. in the future.

Additional changes in callsites due to function signature changes in ```NewTableReader()``` and ```FilePrefetchBuffer```.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6982

Test Plan: Add new unit tests in db_basic_test

Reviewed By: riversand963

Differential Revision: D22219515

Pulled By: anand1976

fbshipit-source-id: 8a3b92f4a889808013838603aa3ca35229cd501b
2020-06-29 14:53:17 -07:00
anand76
ab13d43e1d Pass a timeout to FileSystem for random reads (#6751)
Summary:
Calculate ```IOOptions::timeout``` using ```ReadOptions::deadline``` and pass it to ```FileSystem::Read/FileSystem::MultiRead```. This allows us to impose a tighter bound on the time taken by Get/MultiGet on FileSystem/Envs that support IO timeouts. Even on those that don't support, check in ```RandomAccessFileReader::Read``` and ```MultiRead``` and return ```Status::TimedOut()``` if the deadline is exceeded.

For now, TableReader creation, which might do file opens and reads, are not covered. It will be implemented in another PR.

Tests:
Update existing unit tests to verify the correct timeout value is being passed
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6751

Reviewed By: riversand963

Differential Revision: D21285631

Pulled By: anand1976

fbshipit-source-id: d89af843e5a91ece866e87aa29438b52a65a8567
2020-04-30 14:50:39 -07:00
Mike Kolupaev
e45673dece Properly report IO errors when IndexType::kBinarySearchWithFirstKey is used (#6621)
Summary:
Context: Index type `kBinarySearchWithFirstKey` added the ability for sst file iterator to sometimes report a key from index without reading the corresponding data block. This is useful when sst blocks are cut at some meaningful boundaries (e.g. one block per key prefix), and many seeks land between blocks (e.g. for each prefix, the ranges of keys in different sst files are nearly disjoint, so a typical seek needs to read a data block from only one file even if all files have the prefix). But this added a new error condition, which rocksdb code was really not equipped to deal with: `InternalIterator::value()` may fail with an IO error or Status::Incomplete, but it's just a method returning a Slice, with no way to report error instead. Before this PR, this type of error wasn't handled at all (an empty slice was returned), and kBinarySearchWithFirstKey implementation was considered a prototype.

Now that we (LogDevice) have experimented with kBinarySearchWithFirstKey for a while and confirmed that it's really useful, this PR is adding the missing error handling.

It's a pretty inconvenient situation implementation-wise. The error needs to be reported from InternalIterator when trying to access value. But there are ~700 call sites of `InternalIterator::value()`, most of which either can't hit the error condition (because the iterator is reading from memtable or from index or something) or wouldn't benefit from the deferred loading of the value (e.g. compaction iterator that reads all values anyway). Adding error handling to all these call sites would needlessly bloat the code. So instead I made the deferred value loading optional: only the call sites that may use deferred loading have to call the new method `PrepareValue()` before calling `value()`. The feature is enabled with a new bool argument `allow_unprepared_value` to a bunch of methods that create iterators (it wouldn't make sense to put it in ReadOptions because it's completely internal to iterators, with virtually no user-visible effect). Lmk if you have better ideas.

Note that the deferred value loading only happens for *internal* iterators. The user-visible iterator (DBIter) always prepares the value before returning from Seek/Next/etc. We could go further and add an API to defer that value loading too, but that's most likely not useful for LogDevice, so it doesn't seem worth the complexity for now.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6621

Test Plan: make -j5 check . Will also deploy to some logdevice test clusters and look at stats.

Reviewed By: siying

Differential Revision: D20786930

Pulled By: al13n321

fbshipit-source-id: 6da77d918bad3780522e918f17f4d5513d3e99ee
2020-04-15 17:40:44 -07:00
Cheng Chang
0a0151fb99 Remove memcpy from RandomAccessFileReader::Read in direct IO mode (#6455)
Summary:
In direct IO mode, RandomAccessFileReader::Read allocates an internal aligned buffer, and then copies the result into the scratch buffer. If the result is only temporarily used inside a function, there is no need to do the memcpy and just let the result Slice refer to the internally allocated buffer.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6455

Test Plan: make check

Differential Revision: D20106753

Pulled By: cheng-chang

fbshipit-source-id: 44f505843837bba47a56e3fa2c4dd3bd76486b58
2020-03-06 14:05:12 -08:00
sdong
fdf882ded2 Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433)
Summary:
When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To provide a tool for user to solve the problem, the RocksDB namespace is changed to a flag which can be overridden in build time.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433

Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag.

Differential Revision: D19977691

fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e
2020-02-20 12:09:57 -08:00
anand76
afa2420c2b Introduce a new storage specific Env API (#5761)
Summary:
The current Env API encompasses both storage/file operations, as well as OS related operations. Most of the APIs return a Status, which does not have enough metadata about an error, such as whether its retry-able or not, scope (i.e fault domain) of the error etc., that may be required in order to properly handle a storage error. The file APIs also do not provide enough control over the IO SLA, such as timeout, prioritization, hinting about placement and redundancy etc.

This PR separates out the file/storage APIs from Env into a new FileSystem class. The APIs are updated to return an IOStatus with metadata about the error, as well as to take an IOOptions structure as input in order to allow more control over the IO.

The user can set both ```options.env``` and ```options.file_system``` to specify that RocksDB should use the former for OS related operations and the latter for storage operations. Internally, a ```CompositeEnvWrapper``` has been introduced that inherits from ```Env``` and redirects individual methods to either an ```Env``` implementation or the ```FileSystem``` as appropriate. When options are sanitized during ```DB::Open```, ```options.env``` is replaced with a newly allocated ```CompositeEnvWrapper``` instance if both env and file_system have been specified. This way, the rest of the RocksDB code can continue to function as before.

This PR also ports PosixEnv to the new API by splitting it into two - PosixEnv and PosixFileSystem. PosixEnv is defined as a sub-class of CompositeEnvWrapper, and threading/time functions are overridden with Posix specific implementations in order to avoid an extra level of indirection.

The ```CompositeEnvWrapper``` translates ```IOStatus``` return code to ```Status```, and sets the severity to ```kSoftError``` if the io_status is retryable. The error handling code in RocksDB can then recover the DB automatically.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5761

Differential Revision: D18868376

Pulled By: anand1976

fbshipit-source-id: 39efe18a162ea746fabac6360ff529baba48486f
2019-12-13 14:48:41 -08:00
sdong
e8263dbdaa Apply formatter to recent 200+ commits. (#5830)
Summary:
Further apply formatter to more recent commits.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5830

Test Plan: Run all existing tests.

Differential Revision: D17488031

fbshipit-source-id: 137458fd94d56dd271b8b40c522b03036943a2ab
2019-09-20 12:04:26 -07:00
Maysam Yabandeh
6ec6a4a9a4 Remove snap_refresh_nanos option (#5826)
Summary:
The snap_refresh_nanos option didn't bring much benefit. Remove the feature to simplify the code.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5826

Differential Revision: D17467147

Pulled By: maysamyabandeh

fbshipit-source-id: 4f950b046990d0d1292d7fc04c2ccafaf751c7f0
2019-09-18 20:26:04 -07:00
sdong
b931f84e56 Divide file_reader_writer.h and .cc (#5803)
Summary:
file_reader_writer.h and .cc contain several files and helper function, and it's hard to navigate. Separate it to multiple files and put them under file/
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5803

Test Plan: Build whole project using make and cmake.

Differential Revision: D17374550

fbshipit-source-id: 10efca907721e7a78ed25bbf74dc5410dea05987
2019-09-16 10:33:51 -07:00
haoyuhuang
705b8eecb4 Add more callers for table reader. (#5454)
Summary:
This PR adds more callers for table readers. These information are only used for block cache analysis so that we can know which caller accesses a block.
1. It renames the BlockCacheLookupCaller to TableReaderCaller as passing the caller from upstream requires changes to table_reader.h and TableReaderCaller is a more appropriate name.
2. It adds more table reader callers in table/table_reader_caller.h, e.g., kCompactionRefill, kExternalSSTIngestion, and kBuildTable.

This PR is long as it requires modification of interfaces in table_reader.h, e.g., NewIterator.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5454

Test Plan: make clean && COMPILE_WITH_ASAN=1 make check -j32.

Differential Revision: D15819451

Pulled By: HaoyuHuang

fbshipit-source-id: b6caa704c8fb96ddd15b9a934b7e7ea87f88092d
2019-06-20 14:31:48 -07:00
Vijay Nadimpalli
24b118ad98 Combine the read-ahead logic for user reads and compaction reads (#5431)
Summary:
Currently the read-ahead logic for user reads and compaction reads go through different code paths where compaction reads create new table readers and use `ReadaheadRandomAccessFile`. This change is to unify read-ahead logic to use read-ahead in BlockBasedTableReader::InitDataBlock(). As a result of the change  `ReadAheadRandomAccessFile` class and `new_table_reader_for_compaction_inputs` option will no longer be used.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5431

Test Plan:
make check

Here is the benchmarking - https://gist.github.com/vjnadimpalli/083cf423f7b6aa12dcdb14c858bc18a5

Differential Revision: D15772533

Pulled By: vjnadimpalli

fbshipit-source-id: b71dca710590471ede6fb37553388654e2e479b9
2019-06-19 14:10:46 -07:00
Maysam Yabandeh
6a40ee5eb1 Refresh snapshot list during long compactions (2nd attempt) (#5278)
Summary:
Part of compaction cpu goes to processing snapshot list, the larger the list the bigger the overhead. Although the lifetime of most of the snapshots is much shorter than the lifetime of compactions, the compaction conservatively operates on the list of snapshots that it initially obtained. This patch allows the snapshot list to be updated via a callback if the compaction is taking long. This should let the compaction to continue more efficiently with much smaller snapshot list.
For simplicity, to avoid the feature is disabled in two cases: i) When more than one sub-compaction are sharing the same snapshot list, ii) when Range Delete is used in which the range delete aggregator has its own copy of snapshot list.
This fixes the reverted https://github.com/facebook/rocksdb/pull/5099 issue with range deletes.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5278

Differential Revision: D15203291

Pulled By: maysamyabandeh

fbshipit-source-id: fa645611e606aa222c7ce53176dc5bb6f259c258
2019-05-03 17:30:22 -07:00
Maysam Yabandeh
521d234bda Revert snap_refresh_nanos feature (#5269)
Summary:
Our daily stress tests are failing after this feature. Reverting temporarily until we figure the reason for test failures.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5269

Differential Revision: D15151285

Pulled By: maysamyabandeh

fbshipit-source-id: e4002b99690a97df30d4b4b58bf0f61e9591bc6e
2019-05-01 10:07:30 -07:00
Maysam Yabandeh
506e8448be Refresh snapshot list during long compactions (#5099)
Summary:
Part of compaction cpu goes to processing snapshot list, the larger the list the bigger the overhead. Although the lifetime of most of the snapshots is much shorter than the lifetime of compactions, the compaction conservatively operates on the list of snapshots that it initially obtained. This patch allows the snapshot list to be updated via a callback if the compaction is taking long. This should let the compaction to continue more efficiently with much smaller snapshot list.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5099

Differential Revision: D15086710

Pulled By: maysamyabandeh

fbshipit-source-id: 7649f56c3b6b2fb334962048150142a3bf9c1a12
2019-04-25 18:17:22 -07:00
Sagar Vemuri
dc3528077a Update all unique/shared_ptr instances to be qualified with namespace std (#4638)
Summary:
Ran the following commands to recursively change all the files under RocksDB:
```
find . -type f -name "*.cc" -exec sed -i 's/ unique_ptr/ std::unique_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/<unique_ptr/<std::unique_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/ shared_ptr/ std::shared_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/<shared_ptr/<std::shared_ptr/g' {} +
```
Running `make format` updated some formatting on the files touched.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4638

Differential Revision: D12934992

Pulled By: sagar0

fbshipit-source-id: 45a15d23c230cdd64c08f9c0243e5183934338a8
2018-11-09 11:19:58 -08:00
Yanqin Jin
bb5dcea98e Add path to WritableFileWriter. (#4039)
Summary:
We want to sample the file I/O issued by RocksDB and report the function calls. This requires us to include the file paths otherwise it's hard to tell what has been going on.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4039

Differential Revision: D8670178

Pulled By: riversand963

fbshipit-source-id: 97ee806d1c583a2983e28e213ee764dc6ac28f7a
2018-08-23 10:12:58 -07:00
Sagar Vemuri
189f0c27aa Make BlockBasedTableIterator compaction-aware (#4048)
Summary:
Pass in `for_compaction` to `BlockBasedTableIterator` via `BlockBasedTableReader::NewIterator`.

In 7103559f49, `for_compaction` was set in `BlockBasedTable::Rep` via `BlockBasedTable::SetupForCompaction`. In hindsight it was not the right decision; it also caused TSAN to complain.
Closes https://github.com/facebook/rocksdb/pull/4048

Differential Revision: D8601056

Pulled By: sagar0

fbshipit-source-id: 30127e898c15c38c1080d57710b8c5a6d64a0ab3
2018-06-25 13:19:27 -07:00
Zhongyi Xie
c3ebc75843 Move prefix_extractor to MutableCFOptions
Summary:
Currently it is not possible to change bloom filter config without restart the db, which is causing a lot of operational complexity for users.
This PR aims to make it possible to dynamically change bloom filter config.
Closes https://github.com/facebook/rocksdb/pull/3601

Differential Revision: D7253114

Pulled By: miasantreble

fbshipit-source-id: f22595437d3e0b86c95918c484502de2ceca120c
2018-05-21 14:43:11 -07:00
Maysam Yabandeh
67182678a5 Stats for false positive rate of full filtesr
Summary:
Adds two stats to allow us measuring the false positive rate of full filters:
- The total count of positives: rocksdb.bloom.filter.full.positive
- The total count of true positives: rocksdb.bloom.filter.full.true.positive
Not the term "full" in the stat name to indicate that they are meaningful in full filters. block-based filters are to be deprecated soon and supporting it is not worth the the additional cost of if-then-else branches.

Closes #3680

Tested by:
$ ./db_bench -benchmarks=fillrandom  -db /dev/shm/rocksdb-tmpdb --num=1000000 -bloom_bits=10
$ ./db_bench -benchmarks="readwhilewriting"  -db /dev/shm/rocksdb-tmpdb --statistics -bloom_bits=10 --duration=60 --num=2000000 --use_existing_db 2>&1 > /tmp/full.log
$ grep filter.full /tmp/full.log
rocksdb.bloom.filter.full.positive COUNT : 3628593
rocksdb.bloom.filter.full.true.positive COUNT : 3536026
which gives the false positive rate of 2.5%
Closes https://github.com/facebook/rocksdb/pull/3681

Differential Revision: D7517570

Pulled By: maysamyabandeh

fbshipit-source-id: 630ab1a473afdce404916d297035b6318de4c052
2018-04-05 15:58:48 -07:00
Andrew Kryczka
5d68243e61 Comment out unused variables
Summary:
Submitting on behalf of another employee.
Closes https://github.com/facebook/rocksdb/pull/3557

Differential Revision: D7146025

Pulled By: ajkr

fbshipit-source-id: 495ca5db5beec3789e671e26f78170957704e77e
2018-03-05 13:13:41 -08:00
Igor Sugak
aba3409740 Back out "[codemod] - comment out unused parameters"
Reviewed By: igorsugak

fbshipit-source-id: 4a93675cc1931089ddd574cacdb15d228b1e5f37
2018-02-22 12:43:17 -08:00
David Lai
f4a030ce81 - comment out unused parameters
Reviewed By: everiq, igorsugak

Differential Revision: D7046710

fbshipit-source-id: 8e10b1f1e2aecebbfb229c742e214db887e5a461
2018-02-22 09:44:23 -08:00
Aaron Gao
8f553d3c52 remove unnecessary internal_comparator param in newIterator
Summary:
solved https://github.com/facebook/rocksdb/issues/2604
Closes https://github.com/facebook/rocksdb/pull/2648

Differential Revision: D5504875

Pulled By: lightmark

fbshipit-source-id: c14bb62ccbdc9e7bda9cd914cae4ea0765d882ee
2017-07-27 14:30:42 -07:00
Sagar Vemuri
72502cf227 Revert "comment out unused parameters"
Summary:
This reverts the previous commit 1d7048c598, which broke the build.

Did a `git revert 1d7048c`.
Closes https://github.com/facebook/rocksdb/pull/2627

Differential Revision: D5476473

Pulled By: sagar0

fbshipit-source-id: 4756ff5c0dfc88c17eceb00e02c36176de728d06
2017-07-21 18:26:26 -07:00
Victor Gao
1d7048c598 comment out unused parameters
Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually.

Reviewed By: igorsugak

Differential Revision: D5454343

fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2
2017-07-21 14:57:44 -07:00
Siying Dong
ae28634e9f Remove some left-over BSD headers
Summary: Closes https://github.com/facebook/rocksdb/pull/2608

Differential Revision: D5444797

Pulled By: siying

fbshipit-source-id: 690581d03f37822e059a16085088e8e2d8a45016
2017-07-18 11:56:57 -07:00
Yedidya Feldblum
f1a056e005 CodeMod: Prefer ADD_FAILURE() over EXPECT_TRUE(false), et cetera
Summary:
CodeMod: Prefer `ADD_FAILURE()` over `EXPECT_TRUE(false)`, et cetera.

The tautologically-conditioned and tautologically-contradicted boolean expectations/assertions have better alternatives: unconditional passes and failures.

Reviewed By: Orvid

Differential Revision:
D5432398

Tags: codemod, codemod-opensource

fbshipit-source-id: d16b447e8696a6feaa94b41199f5052226ef6914
2017-07-16 21:26:02 -07:00
Siying Dong
3c327ac2d0 Change RocksDB License
Summary: Closes https://github.com/facebook/rocksdb/pull/2589

Differential Revision: D5431502

Pulled By: siying

fbshipit-source-id: 8ebf8c87883daa9daa54b2303d11ce01ab1f6f75
2017-07-15 16:11:23 -07:00
Aaron Gao
a30a696034 do not read next datablock if upperbound is reached
Summary:
Now if we have iterate_upper_bound set, we continue read until get a key >= upper_bound. For a lot of cases that neighboring data blocks have a user key gap between them, our index key will be a user key in the middle to get a shorter size. For example, if we have blocks:
[a b c d][f g h]
Then the index key for the first block will be 'e'.
then if upper bound is any key between 'd' and 'e', for example, d1, d2, ..., d99999999999, we don't have to read the second block and also know that we have done our iteration by reaching the last key that smaller the upper bound already.

This diff can reduce RA in most cases.
Closes https://github.com/facebook/rocksdb/pull/2239

Differential Revision: D4990693

Pulled By: lightmark

fbshipit-source-id: ab30ea2e3c6edf3fddd5efed3c34fcf7739827ff
2017-05-05 23:20:01 -07:00
Siying Dong
d616ebea23 Add GPLv2 as an alternative license.
Summary: Closes https://github.com/facebook/rocksdb/pull/2226

Differential Revision: D4967547

Pulled By: siying

fbshipit-source-id: dd3b58ae1e7a106ab6bb6f37ab5c88575b125ab4
2017-04-27 18:06:12 -07:00
Andrew Kryczka
40a2e406f8 DeleteRange flush support
Summary:
Changed BuildTable() (used for flush) to (1) add range
tombstones to the aggregator, which is used by CompactionIterator to
determine which keys can be removed; and (2) add aggregator's range
tombstones to the table that is output for the flush.
Closes https://github.com/facebook/rocksdb/pull/1438

Differential Revision: D4100025

Pulled By: ajkr

fbshipit-source-id: cb01a70
2016-10-31 20:54:18 -07:00
omegaga
e70020e4f6 Only cache level 0 indexes and filter when opening table reader
Summary: In T8216281 we decided to disable prefetching the index and filter during opening table handlers during startup (max_open_files = -1).

Test Plan: Rely on `IndexAndFilterBlocksOfNewTableAddedToCache` to guarantee L0 indexes and filters are still cached and change `PinL0IndexAndFilterBlocksTest` to make sure other levels are not cached (maybe add one more test to test we don't cache other levels?)

Reviewers: sdong, andrewkr

Reviewed By: andrewkr

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59913
2016-07-20 11:23:31 -07:00
Baraa Hamodi
21e95811d1 Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
Andrew Kryczka
e089db40f9 Skip bottom-level filter block caching when hit-optimized
Summary:
When Get() or NewIterator() trigger file loads, skip caching the filter block if
(1) optimize_filters_for_hits is set and (2) the file is on the bottommost
level. Also skip checking filters under the same conditions, which means that
for a preloaded file or a file that was trivially-moved to the bottom level, its
filter block will eventually expire from the cache.

- added parameters/instance variables in various places in order to propagate the config ("skip_filters") from version_set to block_based_table_reader
- in BlockBasedTable::Rep, this optimization prevents filter from being loaded when the file is opened simply by setting filter_policy = nullptr
- in BlockBasedTable::Get/BlockBasedTable::NewIterator, this optimization prevents filter from being used (even if it was loaded already) by setting filter = nullptr

Test Plan:
updated unit test:

  $ ./db_test --gtest_filter=DBTest.OptimizeFiltersForHits

will also run 'make check'

Reviewers: sdong, igor, paultuckfield, anthony, rven, kradhakrishnan, IslamAbdelRahman, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D51633
2015-12-23 10:15:07 -08:00
Dmitri Smirnov
5c8f2ee786 Fix MockTable ID storage
On Windows two tests fail that use MockTable:
  flush_job_test and compaction_job_test with the following message:
  compaction_job_test_je.exe : Assertion failed: result.size() == 4,
  file c:\dev\rocksdb\rocksdb\table\mock_table.cc, line 110

  Investigation reveals that this failure occurs when a 4 byte
  ID written to a beginning of the physically open file (main
  contents remains in a in-memory map) can not be read back.

  The reason for the failure is that the ID is written directly
  to a WritableFile bypassing WritableFileWriter. The side effect of that
  is that pending_sync_ never becomes true so the file is never flushed,
  however, the direct cause of the failure is that the filesize_ member
  of the WritableFileWriter remains zero. At Close() the file is truncated
  to that size and the file becomes empty so the ID can not be read back.
2015-10-28 10:53:14 -07:00
sdong
35ad531be3 Seperate InternalIterator from Iterator
Summary:
Separate a new class InternalIterator from class Iterator, when the look-up is done internally, which also means they operate on key with sequence ID and type.

This change will enable potential future optimizations but for now InternalIterator's functions are still the same as Iterator's.
At the same time, separate the cleanup function to a separate class and let both of InternalIterator and Iterator inherit from it.

Test Plan: Run all existing tests.

Reviewers: igor, yhchiang, anthony, kradhakrishnan, IslamAbdelRahman, rven

Reviewed By: rven

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D48549
2015-10-13 15:32:13 -07:00
sdong
776bd8d5eb Pass column family ID to table property collector
Summary: Pass column family ID through TablePropertiesCollectorFactory::CreateTablePropertiesCollector() so that users can identify which column family this file is for and handle it differently.

Test Plan: Add unit test scenarios in tests related to table properties collectors to verify the information passed in is correct.

Reviewers: rven, yhchiang, anthony, kradhakrishnan, igor, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: yoshinorim, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D48411
2015-10-09 14:36:51 -07:00
Islam AbdelRahman
45e9e4f0bb Refactor NewTableReader to accept TableReaderOptions
Summary:
Refactoring NewTableReader to accept TableReaderOptions
This will make it easier to add new options in the future, for example in this diff https://reviews.facebook.net/D46071

Test Plan: run existing tests

Reviewers: igor, yhchiang, anthony, rven, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D46179
2015-09-11 11:36:33 -07:00
Andres Noetzli
3c9cef1eed Unified maps with Comparator for sorting, other cleanup
Summary:
This diff is a collection of cleanups that were initially part of D43179.
Additionally it adds a unified way of defining key-value maps that use a
Comparator for sorting (this was previously implemented in four different
places).

Test Plan: make clean check all

Reviewers: rven, anthony, yhchiang, sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D45993
2015-09-02 13:58:22 -07:00
Andres Notzli
68f934355a Better CompactionJob testing
Summary:
Changed compaction_job_test to support better/more thorough
tests and added two tests. Also changed MockFileContents
to order using InternalKeyComparator.

Test Plan: make compaction_job_test && ./compaction_job_test; make all && make check

Reviewers: sdong, rven, igor, yhchiang, anthony

Reviewed By: anthony

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D42837
2015-08-07 21:59:51 -07:00
sdong
6e9fbeb27c Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.

Test Plan: Run all existing unit tests.

Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D42321
2015-07-17 16:58:18 -07:00
sdong
fcb206b667 SyncPoint to allow a callback with an argument and use it to get DBTest.DynamicLevelCompressionPerLevel2 more straight-forward
Summary:
Allow users to give a callback function with parameter using sync point, so more complicated verification can be done in tests.
Use it in DBTest.DynamicLevelCompressionPerLevel2 so that failures will be more easy to debug.

Test Plan: Run all tests. Run DBTest.DynamicLevelCompressionPerLevel2 with valgrind check.

Reviewers: rven, yhchiang, anthony, kradhakrishnan, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D36999
2015-04-14 16:18:50 -07:00
sdong
953a885ebf A new call back to TablePropertiesCollector to allow users know the entry is add, delete or merge
Summary:
Currently users have no idea a key is add, delete or merge from TablePropertiesCollector call back. Add a new function to add it.

Also refactor the codes so that
(1) make table property collector and internal table property collector two separate data structures with the later one now exposed
(2) table builders only receive internal table properties

Test Plan: Add cases in table_properties_collector_test to cover both of old and new ways of using TablePropertiesCollector.

Reviewers: yhchiang, igor.sugak, rven, igor

Reviewed By: rven, igor

Subscribers: meyering, yoshinorim, maykov, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D35373
2015-04-06 10:27:21 -07:00
sdong
e9de8b65a6 Change the way options.compression_per_level is used when options.level_compaction_dynamic_level_bytes=true
Summary:
Change the way options.compression_per_level is used when options.level_compaction_dynamic_level_bytes=true so that options.compression_per_level[1] determines compression for the level L0 is merged to, options.compression_per_level[2] to the level after that, etc.

Test Plan: run all tests

Reviewers: rven, yhchiang, kradhakrishnan, igor

Reviewed By: igor

Subscribers: yoshinorim, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D34431
2015-03-11 13:14:52 -07:00
Sameet Agarwal
e7c434c364 Add columnfamily option optimize_filters_for_hits to optimize for key hits only
Summary:
    Summary:
    Added a new option to ColumnFamllyOptions  - optimize_filters_for_hits. This option can be used in the case where most
    accesses to the store are key hits and we dont need to optimize performance for key misses.
    This is useful when you have a very large database and most of your lookups succeed.  The option allows the store to
     not store and use filters in the last level (the largest level which contains data). These filters can take a large amount of
     space for large databases (in memory and on-disk). For the last level, these filters are only useful for key misses and not
     for key hits. If we are not optimizing for key misses, we can choose to not store these filters for that level.

    This option is only provided for BlockBasedTable. We skip the filters when we are compacting

Test Plan:
1. Modified db_test toalso run tests with an additonal option (skip_filters_on_last_level)
 2. Added another unit test to db_test which specifically tests that filters are being skipped

Reviewers: rven, igor, sdong

Reviewed By: sdong

Subscribers: lgalanis, yoshinorim, MarkCallaghan, rven, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D33717
2015-02-26 16:25:56 -08:00
Igor Canadi
9be338cf9d CompactionJobTest
Summary:
This is just a simple test that passes two files though a compaction. It shows the framework so that people can continue building new compaction *unit* tests.
In the future we might want to move some Compaction* tests from DBTest here. For example, CompactBetweenSnapshot seems a good candidate.

Hopefully this test can be simpler when we mock out VersionSet.

Test Plan: this is a test

Reviewers: ljin, rven, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D28449
2014-11-14 11:35:48 -08:00