
311 Commits

Author SHA1 Message Date
Maysam Yabandeh
8a04ee4fd1 WritePrepared Txn: use TransactionDBWriteOptimizations (2nd attempt)
Summary:
TransactionDB::Write can receive some optimization hints from the user. One is to skip the concurrency control mechanism. WritePreparedTxnDB is currently ignoring such hints. This patch optimizes WritePreparedTxnDB::Write for skip_concurrency_control and skip_duplicate_key_check hints.
Closes https://github.com/facebook/rocksdb/pull/3496

Differential Revision: D6971784

Pulled By: maysamyabandeh

fbshipit-source-id: cbab10ad538fa2b8bcb47e37c77724afe6e30f03
2018-02-12 16:43:40 -08:00
Maysam Yabandeh
8feee28020 Add skip_cc option to TransactionDB::Write
Summary:
Compared to DB::Write, TransactionDB::Write has the additional overhead of creating and initializing an internal transaction object, as well as the overhead of locking/unlocking the keys. This patch extends TransactionDB::Write with a skip_cc option that lets users indicate that the write batch does not conflict with others, so that concurrency control and its overhead can be skipped. When skip_cc is set, TransactionDB::Write by default calls DB::Write, which works for the WriteCommitted WritePolicy. Any other flavor of TransactionDB that is not compatible with this default behavior (such as WritePreparedTxnDB) can override ::Write and implement its own approach to taking the skip_cc optimization into account.
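The dispatch described above can be pictured with a toy C++ model. Everything here (ToyTxnDB, ToyBatch, RawWrite) is hypothetical and not RocksDB API; the toy lock table never blocks, it only counts lock acquisitions to show the overhead that skip_cc avoids.

```cpp
#include <cassert>
#include <mutex>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not RocksDB classes.
struct ToyBatch {
  std::vector<std::string> keys;
};

struct ToyTxnDB {
  std::mutex lock_table_mu;      // guards the toy lock table below
  std::set<std::string> locked;  // keys currently locked by a writer
  int keys_locked = 0;           // total lock acquisitions, to show the overhead
  int writes = 0;                // number of batches handed to the raw write path

  // Models the plain DB::Write path: no transaction object, no key locks.
  void RawWrite(const ToyBatch&) { ++writes; }

  // Models TransactionDB::Write with a skip_cc hint.
  void Write(const ToyBatch& batch, bool skip_cc) {
    if (skip_cc) {  // caller promises the batch cannot conflict with others
      RawWrite(batch);
      return;
    }
    {
      std::lock_guard<std::mutex> guard(lock_table_mu);
      for (const auto& k : batch.keys) {
        locked.insert(k);
        ++keys_locked;
      }
    }
    RawWrite(batch);
    std::lock_guard<std::mutex> guard(lock_table_mu);
    for (const auto& k : batch.keys) locked.erase(k);
  }
};
```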
Closes https://github.com/facebook/rocksdb/pull/3457

Differential Revision: D6877318

Pulled By: maysamyabandeh

fbshipit-source-id: 56f4e21db87ff71492db4e376fb7c2b03dfeab6b
2018-02-06 15:28:24 -08:00
Maysam Yabandeh
88d8b2a2f5 WritePrepared Txn: Duplicate Keys, Txn Part
Summary:
This patch takes advantage of the memtable's ability to detect a duplicate <key,seq> and return TryAgain in order to handle duplicate keys in WritePrepared Txns. Using the index of WriteBatchWithIndex, it detects whether the write batch contains at least one duplicate key. If a duplicate key is reported, it then pays the cost of counting the number of sub-batches by iterating over the write batch, and passes that count to DBImpl::Write. The DB uses the provided batch_count to assign proper sequence numbers before sending them to the WAL. When later inserting the batch into the memtable, it increases the seq each time the memtable reports a duplicate (a sub-batch in our counting) and tries again.
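A minimal sketch of how sub-batches could be counted and sequence numbers assigned, assuming all keys belong to one column family. AssignSeqs and CountSubBatches are hypothetical helpers, not RocksDB functions; a new sub-batch starts exactly when a key repeats within the current one, which is the point where the memtable would report a duplicate <key,seq>.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Assigns a sequence number to each entry of a batch, starting a new
// sub-batch (and bumping the sequence number) whenever a key repeats within
// the current sub-batch.
std::vector<uint64_t> AssignSeqs(const std::vector<std::string>& keys, uint64_t base_seq) {
  std::vector<uint64_t> seqs;
  std::set<std::string> seen;  // keys in the current sub-batch
  uint64_t seq = base_seq;
  for (const auto& k : keys) {
    if (!seen.insert(k).second) {  // duplicate <key,seq>: memtable would say TryAgain
      ++seq;                       // move on to the next sub-batch
      seen.clear();
      seen.insert(k);
    }
    seqs.push_back(seq);
  }
  return seqs;
}

// The batch count handed to the write path is the number of sub-batches.
size_t CountSubBatches(const std::vector<std::string>& keys) {
  if (keys.empty()) return 0;
  std::vector<uint64_t> seqs = AssignSeqs(keys, 0);
  return static_cast<size_t>(seqs.back()) + 1;
}
```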
Closes https://github.com/facebook/rocksdb/pull/3455

Differential Revision: D6873699

Pulled By: maysamyabandeh

fbshipit-source-id: db8487526c3a5dc1ddda0ea49f0f979b26ae648d
2018-02-05 18:43:24 -08:00
Yi Wu
439855a774 StackableDB optionally take shared ownership of the underlying DB
Summary:
Allow StackableDB to optionally take a shared_ptr on construction and thus hold shared ownership of the underlying DB.
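A sketch of the ownership pattern, using hypothetical ToyBaseDB/ToyStackableDB classes rather than the real StackableDB: the raw-pointer constructor takes exclusive ownership and deletes the DB, while the shared_ptr constructor keeps the DB alive only as long as some owner holds it.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct ToyBaseDB {
  virtual ~ToyBaseDB() = default;
  virtual int Get() const { return 42; }
};

class ToyStackableDB {
 public:
  // Exclusive ownership: the wrapper deletes the DB when it is destroyed.
  explicit ToyStackableDB(ToyBaseDB* db) : db_(db) {}
  // Shared ownership: the DB lives as long as any shared_ptr owner does.
  explicit ToyStackableDB(std::shared_ptr<ToyBaseDB> db)
      : db_(db.get()), shared_db_(std::move(db)) {}
  ~ToyStackableDB() {
    if (!shared_db_) delete db_;  // only delete what we exclusively own
  }
  ToyStackableDB(const ToyStackableDB&) = delete;
  ToyStackableDB& operator=(const ToyStackableDB&) = delete;

  int Get() const { return db_->Get(); }  // forwards to the wrapped DB

 private:
  ToyBaseDB* db_;                         // declared first: initialized before shared_db_
  std::shared_ptr<ToyBaseDB> shared_db_;  // empty in the exclusive-ownership case
};
```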
Closes https://github.com/facebook/rocksdb/pull/3423

Differential Revision: D6824163

Pulled By: yiwu-arbug

fbshipit-source-id: dbdc30c42e007533a987ef413785e192340f03eb
2018-01-26 15:28:44 -08:00
jonasf
4decff6fa8 Add possibility to change ttl on open DB
Summary:
We have seen cases where it could be good to change TTL on already open DB.
Change ttl in TtlCompactionFilterFactory on open db.
The next time a filter is created, it will filter according to the new TTL.

Is this something that could be useful for others?
Any downsides?
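One way to let an open DB pick up a new TTL is to store it atomically in the factory and have each newly created filter capture the current value. The classes below are hypothetical stand-ins, not TtlCompactionFilterFactory itself; the rule that a TTL of 0 or less means "never expire" is an assumption of the sketch.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>

struct ToyTtlFilter {
  int32_t ttl;  // seconds; ttl <= 0 means "never expire" (assumed)
  bool ShouldDrop(int64_t write_time, int64_t now) const {
    return ttl > 0 && write_time + ttl < now;
  }
};

class ToyTtlFilterFactory {
 public:
  explicit ToyTtlFilterFactory(int32_t ttl) : ttl_(ttl) {}

  // May be called while the DB is open; only filters created later see it.
  void SetTtl(int32_t ttl) { ttl_.store(ttl); }

  // Each new filter snapshots the TTL that is current at creation time.
  std::unique_ptr<ToyTtlFilter> CreateFilter() const {
    return std::unique_ptr<ToyTtlFilter>(new ToyTtlFilter{ttl_.load()});
  }

 private:
  std::atomic<int32_t> ttl_;
};
```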
Closes https://github.com/facebook/rocksdb/pull/3292

Differential Revision: D6731993

Pulled By: miasantreble

fbshipit-source-id: 73b94d69237b11e8730734389052429d621a6b1e
2018-01-18 10:42:15 -08:00
Anand Ananthabhotla
d0f1b49ab6 Add a Close() method to DB to return status when closing a db
Summary:
Currently, the only way to close an open DB is to destroy the DB
object. There is no way for the caller to know the status. In one
instance, the destructor encountered an error due to failure to
close a log file on HDFS. In order to prevent silent failures, we add
DB::Close() that calls CloseImpl() which must be implemented by its
descendants.
The main failure point in the destructor is closing the log file. This
patch also adds a Close() entry point to Logger in order to get status.
When DBOptions::info_log is allocated and owned by the DBImpl, it is
explicitly closed by DBImpl::CloseImpl().
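The Close()/CloseImpl() split is a template-method pattern: the base class owns the public entry point and descendants supply the teardown. A minimal sketch with hypothetical ToyDB/ToyStatus types; making Close() idempotent is a choice of this sketch, not something stated above.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Minimal hypothetical status type for the sketch.
struct ToyStatus {
  bool ok_flag;
  std::string message;
  explicit ToyStatus(bool ok = true, std::string msg = "")
      : ok_flag(ok), message(std::move(msg)) {}
  bool ok() const { return ok_flag; }
};

class ToyDB {
 public:
  virtual ~ToyDB() = default;
  // Public entry point that reports errors a destructor would swallow.
  ToyStatus Close() {
    if (closed_) return ToyStatus();  // idempotent in this sketch
    closed_ = true;
    return CloseImpl();
  }

 protected:
  virtual ToyStatus CloseImpl() = 0;  // each descendant supplies its own teardown

 private:
  bool closed_ = false;
};

class ToyDBImpl : public ToyDB {
 public:
  explicit ToyDBImpl(bool log_close_fails) : log_close_fails_(log_close_fails) {}
  ~ToyDBImpl() override { Close(); }  // destructor still closes, but drops the status

 protected:
  ToyStatus CloseImpl() override {
    // Models closing the owned info log, the main failure point described above.
    if (log_close_fails_) return ToyStatus(false, "IO error: failed to close log file");
    return ToyStatus();
  }

 private:
  bool log_close_fails_;
};
```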
Closes https://github.com/facebook/rocksdb/pull/3348

Differential Revision: D6698158

Pulled By: anand1976

fbshipit-source-id: 9468e2892553eb09c4c41b8723f590c0dbd8ab7d
2018-01-16 11:08:57 -08:00
Maysam Yabandeh
2515266725 WritePrepared Txn: Refactoring TrackKeys
Summary:
This patch clarifies and refactors the logic around tracked keys in transactions.
Closes https://github.com/facebook/rocksdb/pull/3140

Differential Revision: D6290258

Pulled By: maysamyabandeh

fbshipit-source-id: 03b50646264cbcc550813c060b180fc7451a55c1
2017-11-11 13:14:20 -08:00
Prashant D
602fe9454c Fix coverity issues in include/rocksdb
Summary:
include/rocksdb/metadata.h:
struct ColumnFamilyMetaData {

CID 1322804 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member file_count is not initialized in this constructor nor in any functions that it calls.

struct SstFileMetaData {
        2. uninit_member: Non-static class member size is not initialized in this constructor nor in any functions that it calls.
        4. uninit_member: Non-static class member smallest_seqno is not initialized in this constructor nor in any functions that it calls.
        6. uninit_member: Non-static class member largest_seqno is not initialized in this constructor nor in any functions that it calls.
        8. uninit_member: Non-static class member num_reads_sampled is not initialized in this constructor nor in any functions that it calls.

CID 1322807 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
10. uninit_member: Non-static class member being_compacted is not initialized in this constructor nor in any functions that it calls.

include/rocksdb/sst_file_writer.h:
struct ExternalSstFileInfo {
        2. uninit_member: Non-static class member sequence_number is not initialized in this constructor nor in any functions that it calls.
        4. uninit_member: Non-static class member file_size is not initialized in this constructor nor in any functions that it calls.
        6. uninit_member: Non-static class member num_entries is not initialized in this constructor nor in any functions that it calls.

CID 1351697 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
8. uninit_member: Non-static class member version is not initialized in this constructor nor in any functions that it calls.
ExternalSstFileInfo() {}

include/rocksdb/utilities/transaction.h:
explicit Transaction(const TransactionDB* db) {}
        2. uninit_member: Non-static class member log_number_ is not initialized in this constructor nor in any functions that it calls.

CID 1396133 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
4. uninit_member: Non-static class member field txn_state_._M_i is not initialized in this constructor nor in any functions that it calls.
Transaction() {}
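A common way to silence UNINIT_CTOR reports like these is to give every scalar member a default member initializer, so the empty constructor still leaves them defined. The struct below is a simplified, hypothetical stand-in reusing some field names from the report; it is not necessarily the fix applied in include/rocksdb.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct ToyExternalSstFileInfo {
  ToyExternalSstFileInfo() {}  // empty constructor, as in the report
  std::string file_path;       // class type: already default-constructed
  // Default member initializers apply even with the user-provided constructor.
  uint64_t sequence_number = 0;
  uint64_t file_size = 0;
  uint64_t num_entries = 0;
  int32_t version = 0;
};
```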
Closes https://github.com/facebook/rocksdb/pull/3100

Differential Revision: D6227651

Pulled By: sagar0

fbshipit-source-id: 5caa4a2cf9471d1f9c3c073f81473636e1f0aa14
2017-11-02 17:56:48 -07:00
Mikhail Antonov
7fe3b32896 Added support for differential snapshots
Summary:
The motivation for this PR is to add to RocksDB support for differential (incremental) snapshots, i.e. a snapshot of the DB changes between two points in time (one can think of it as the diff between two sequence numbers, or a diff D, which can be thought of as an SST file or just a set of KVs, that can be applied at sequence number S1 to get the database to the state at sequence number S2).

This feature would be useful for various distributed storage layers built on top of RocksDB, as it should help reduce the resources (time and network bandwidth) needed to recover and rebuild DB instances as replicas in the context of distributed storage.

From the API standpoint, this would look like a client app requesting an iterator between (start seqnum) and the current DB state, and reading the "diff".

This is a very early draft PR for initial review and discussion of the approach; I'm going to rework some parts and keep updating the PR.

For now, what's done here according to initial discussions:

Preserving deletes:
 - We want to be able to optionally preserve recent deletes for some defined period of time, so that if a delete came in recently and might need to be included in the next incremental snapshot, it wouldn't get dropped by a compaction. This is done by adding a new param to Options (preserve deletes flag) and a new variable to DBImpl where we keep track of the sequence number after which we don't want to drop tombstones, even if they are otherwise eligible for deletion.
 - I also added a new API call for clients to be able to advance this cutoff seqnum after which we drop deletes; I assume it's more flexible to let clients control this, since otherwise we'd need to keep some kind of timestamp <--> seqnum mapping inside the DB, which sounds messy and painful to support. Clients could make use of it by periodically calling GetLatestSequenceNumber(), noting the timestamp, doing some calculation and figuring out by how much we need to advance the cutoff seqnum.
 - The compaction codepath in compaction_iterator.cc has been modified to avoid dropping tombstones with seqnum > cutoff seqnum.
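The tombstone rule above can be sketched as a single predicate. ToyPreserveDeletes, AdvanceCutoff and CanDropTombstone are hypothetical names, not the RocksDB API; refusing to move the cutoff backwards is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

struct ToyPreserveDeletes {
  bool preserve_deletes = false;         // models the new Options flag
  uint64_t preserve_deletes_seqnum = 0;  // cutoff, advanced by the client

  // Clients advance the cutoff; this sketch refuses to move it backwards.
  bool AdvanceCutoff(uint64_t seqnum) {
    if (seqnum < preserve_deletes_seqnum) return false;
    preserve_deletes_seqnum = seqnum;
    return true;
  }

  // Compaction check: keep tombstones newer than the cutoff even when they
  // would otherwise be eligible for removal.
  bool CanDropTombstone(uint64_t tombstone_seqnum, bool otherwise_droppable) const {
    if (!otherwise_droppable) return false;
    if (preserve_deletes && tombstone_seqnum > preserve_deletes_seqnum) return false;
    return true;
  }
};
```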

Iterator changes:
 - A couple of params were added to ReadOptions to optionally allow the client to request internal keys instead of user keys (so that the client can get the latest value of a key, be it a delete marker or a put), as well as a min timestamp and min seqnum.

TableCache changes:
 - I modified the table_cache code to be able to quickly exclude SST files from the iterator heap if the file's creation_time is less than iter_start_ts as passed in ReadOptions. That would help a lot in some DB settings (like reading very recent data only or using FIFO compaction), but not so much for universal compaction with a more or less long iterator time span.

What's left:

 - Still looking at how to best plug that inside DBIter codepath. So far it seems that FindNextUserKeyInternal only parses values as UserKeys, and iter->key() call generally returns user key. Can we add new API to DBIter as internal_key(), and modify this internal method to optionally set saved_key_ to point to the full internal key? I don't need to store actual seqnum there, but I do need to store type.
Closes https://github.com/facebook/rocksdb/pull/2999

Differential Revision: D6175602

Pulled By: mikhail-antonov

fbshipit-source-id: c779a6696ee2d574d86c69cec866a3ae095aa900
2017-11-01 18:56:43 -07:00
Maysam Yabandeh
17731a43a6 WritePrepared Txn: Optimize for recoverable state
Summary:
GetCommitTimeWriteBatch is currently used to store some state as part of commit in 2PC. In MyRocks it is specifically used to store some data that would be needed only during recovery, so it does not need to be stored in the memtable right after each commit.
This patch enables an optimization to write the GetCommitTimeWriteBatch only to the WAL. The batch will be written to memtable during recovery when the WAL is replayed. To cover the case when WAL is deleted after memtable flush, the batch is also buffered and written to memtable right before each memtable flush.
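A toy model of the WAL-only write with deferred memtable insertion, using hypothetical names and string vectors in place of real WAL, memtable and SST structures. It shows the two paths by which the commit-time batch eventually reaches the memtable: WAL replay on recovery, or insertion right before a flush.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyWalOnlyDB {
  std::vector<std::string> wal;       // durable log
  std::vector<std::string> memtable;  // in-memory write buffer
  std::vector<std::string> sst;       // flushed data
  std::vector<std::string> wal_only;  // commit-time batches not yet in the memtable

  void CommitWithRecoverableState(const std::string& commit_time_batch) {
    wal.push_back(commit_time_batch);       // durable right away
    wal_only.push_back(commit_time_batch);  // memtable insert is deferred
  }

  void Flush() {
    // The WAL is about to become obsolete, so materialize the deferred batches first.
    for (const auto& b : wal_only) memtable.push_back(b);
    wal_only.clear();
    sst.insert(sst.end(), memtable.begin(), memtable.end());
    memtable.clear();
    wal.clear();
  }

  // Models WAL replay after a crash: everything in the WAL reaches the memtable.
  void Recover() {
    memtable = wal;
    wal_only.clear();
  }
};
```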
Closes https://github.com/facebook/rocksdb/pull/3071

Differential Revision: D6148023

Pulled By: maysamyabandeh

fbshipit-source-id: 2d09bae5565abe2017c0327421010d5c0d55eaa7
2017-11-01 17:26:46 -07:00
Yi Wu
5a2a6483dc Blob DB: Inline small values in base DB
Summary:
Adding the `min_blob_size` option to allow storing small values in base db (in LSM tree) together with the key. The goal is to improve performance for small values, while taking advantage of blob db's low write amplification for large values.

Also adding expiration timestamp to blob index. It will be useful to evict stale blob indexes in base db by adding a compaction filter. I'll work on the compaction filter in future patches.

See blob_index.h for the new blob index format. There are 4 cases when writing a new key:
* small value w/o TTL: put in base db as normal value (i.e. ValueType::kTypeValue)
* small value w/ TTL: put (type, expiration, value) to base db.
* large value w/o TTL: write value to blob log and put (type, file, offset, size, compression) to base db.
* large value w/TTL: write value to blob log and put (type, expiration, file, offset, size, compression) to base db.
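The four cases reduce to two independent choices, value size and TTL. A hypothetical helper illustrating the decision; treating a value as small when its size is strictly below min_blob_size is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

enum class ToyPlacement {
  kInlinePlain,    // small, no TTL: plain value in the base DB
  kInlineWithTtl,  // small, TTL: (type, expiration, value) in the base DB
  kBlobNoTtl,      // large, no TTL: value in blob log, index in base DB
  kBlobWithTtl,    // large, TTL: value in blob log, index with expiration in base DB
};

ToyPlacement ChooseBlobPlacement(uint64_t value_size, uint64_t min_blob_size, bool has_ttl) {
  const bool is_small = value_size < min_blob_size;
  if (is_small) return has_ttl ? ToyPlacement::kInlineWithTtl : ToyPlacement::kInlinePlain;
  return has_ttl ? ToyPlacement::kBlobWithTtl : ToyPlacement::kBlobNoTtl;
}
```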
Closes https://github.com/facebook/rocksdb/pull/3066

Differential Revision: D6142115

Pulled By: yiwu-arbug

fbshipit-source-id: 9526e76e19f0839310a3f5f2a43772a4ad182cd0
2017-10-26 12:30:54 -07:00
Dmitri Smirnov
ebab2e2d42 Enable MSVC W4 with a few exceptions. Fix warnings and bugs
Summary: Closes https://github.com/facebook/rocksdb/pull/3018

Differential Revision: D6079011

Pulled By: yiwu-arbug

fbshipit-source-id: 988a721e7e7617967859dba71d660fc69f4dff57
2017-10-19 10:57:12 -07:00
Maysam Yabandeh
4e3c3d8c6a WritePrepared Txn: duplicate keys
Summary:
With WriteCommitted, when the write batch has duplicate keys, the txn db simply inserts them into the db with different seq numbers and lets the db ignore/merge the duplicate values at read time. With WritePrepared, all the entries of the batch are inserted with the same seq number, which prevents us from benefiting from this simple solution.

This patch applies a hackish solution to unblock end-to-end testing; the hack is to be replaced with a proper solution soon. The patch simply detects duplicate key insertions and marks the previous one as obsolete. Then, before writing to the db, it rewrites the batch, eliminating the obsolete keys. This incurs a memcpy cost. Furthermore, handling duplicate merges would require doing a FullMerge instead of simply ignoring the previous value, which is not handled by this patch.
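A minimal sketch of the rewrite step, keeping only the last write to each key. ToyEntry and RemoveObsoleteEntries are hypothetical; like the patch, it simply drops earlier values and would be wrong for Merge operands, which need a FullMerge instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using ToyEntry = std::pair<std::string, std::string>;  // (key, value)

// Marks every write that is followed by a later write to the same key as
// obsolete, then copies the surviving entries into a new batch.
std::vector<ToyEntry> RemoveObsoleteEntries(const std::vector<ToyEntry>& batch) {
  std::unordered_map<std::string, size_t> last_index;
  for (size_t i = 0; i < batch.size(); ++i) last_index[batch[i].first] = i;
  std::vector<ToyEntry> rewritten;
  for (size_t i = 0; i < batch.size(); ++i) {
    if (last_index[batch[i].first] == i) rewritten.push_back(batch[i]);  // the copy cost
  }
  return rewritten;
}
```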
Closes https://github.com/facebook/rocksdb/pull/2969

Differential Revision: D5976337

Pulled By: maysamyabandeh

fbshipit-source-id: 114e65b66f137d8454ff2d1d782b8c05da95f989
2017-10-05 07:41:02 -07:00
Maysam Yabandeh
385049baf2 WritePrepared Txn: Recovery
Summary:
Recover txns from the WAL. Also added some unit tests.
Closes https://github.com/facebook/rocksdb/pull/2901

Differential Revision: D5859596

Pulled By: maysamyabandeh

fbshipit-source-id: 6424967b231388093b4effffe0a3b1b7ec8caeb0
2017-09-28 16:56:45 -07:00
Yi Wu
ec48e5c77f Add TransactionDB::SingleDelete()
Summary:
Looks like the API is simply missing. Adding it.
Closes https://github.com/facebook/rocksdb/pull/2937

Differential Revision: D5919955

Pulled By: yiwu-arbug

fbshipit-source-id: 6e2e9c96c29882b0bb4113d1f8efb72bffc57878
2017-09-27 10:27:26 -07:00
Andrew Kryczka
f5148ade10 support opening zero backups during engine init
Summary:
There are internal users who open BackupEngine for writing new backups only, and they don't care whether old backups can be read or not. The condition `BackupableDBOptions::max_valid_backups_to_open == 0` should be supported (previously in df74b775e6 I made the mistake of choosing 0 as a special value to disable the limit).
Closes https://github.com/facebook/rocksdb/pull/2819

Differential Revision: D5751599

Pulled By: ajkr

fbshipit-source-id: e73ac19eb5d756d6b68601eae8e43407ee4f2752
2017-09-12 13:26:34 -07:00
Maysam Yabandeh
f46464d383 write-prepared txn: call IsInSnapshot
Summary:
This patch instruments the read path to verify each read value against an optional ReadCallback class. If the value is rejected, the reader moves on to the next value. The WritePreparedTxn makes use of this feature to skip sequence numbers that are not in the read snapshot.
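The read-path change amounts to "scan versions newest first and return the first one the callback accepts". A sketch with a hypothetical ToyVersion type and a std::function standing in for ReadCallback; a visibility predicate such as IsInSnapshot would be passed as is_visible.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

struct ToyVersion {
  uint64_t seq;
  std::string value;
};

// versions are ordered newest first, as the reader encounters them.
bool ReadVisible(const std::vector<ToyVersion>& versions,
                 const std::function<bool(uint64_t)>& is_visible,
                 std::string* value) {
  for (const auto& v : versions) {
    if (is_visible(v.seq)) {  // accepted by the callback: this is the answer
      *value = v.value;
      return true;
    }
    // rejected (e.g. not in the read snapshot): move on to the next, older version
  }
  return false;  // no visible version
}
```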
Closes https://github.com/facebook/rocksdb/pull/2850

Differential Revision: D5787375

Pulled By: maysamyabandeh

fbshipit-source-id: 49d808b3062ab35e7ae98ad388f659757794184c
2017-09-11 09:14:48 -07:00
Artem Danilov
8a6708f5f2 Extend property map with compaction stats
Summary:
This branch extends the existing property map, which keeps values as doubles, so that it can keep values as strings and therefore provide a wider range of properties. The immediate need for this is to provide IO stall stats in an easily parseable way to MyRocks, which is also part of this branch.
Closes https://github.com/facebook/rocksdb/pull/2794

Differential Revision: D5717676

Pulled By: Tema

fbshipit-source-id: e34ba5b79ba774697f7b97ce1138d8fd55471b8a
2017-08-30 15:26:55 -07:00
Maysam Yabandeh
fbfa3e7a43 WriteAtPrepare: Efficient read from snapshot list
Summary:
Divide the old snapshots into two lists: a few that fit into a cached array, and the rest in a vector, which is expected to be empty in normal cases. The former optimizes concurrent reads from snapshots without requiring locks. It is done with an array of std::atomic, from which std::memory_order_acquire reads are compiled to simple read instructions on most x86_64 architectures.
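A simplified, hypothetical sketch of the split: a fixed-size array of std::atomic slots scanned without a lock, plus a mutex-protected overflow vector consulted only when it is non-empty. It deliberately ignores readers racing with Update, which a real implementation has to handle.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical cache size; kept tiny so the overflow path is exercised.
constexpr size_t kToySnapshotCacheSize = 4;

class ToySnapshotList {
 public:
  ToySnapshotList() {
    for (auto& slot : cache_) slot.store(0, std::memory_order_relaxed);
  }

  // Writer: refresh the list of old snapshots.
  void Update(const std::vector<uint64_t>& snapshots) {
    std::lock_guard<std::mutex> guard(mu_);
    const size_t n = snapshots.size() < kToySnapshotCacheSize ? snapshots.size()
                                                              : kToySnapshotCacheSize;
    for (size_t i = 0; i < n; ++i) cache_[i].store(snapshots[i], std::memory_order_release);
    cache_count_.store(n, std::memory_order_release);
    overflow_.assign(snapshots.begin() + static_cast<std::ptrdiff_t>(n), snapshots.end());
    has_overflow_.store(!overflow_.empty(), std::memory_order_release);
  }

  // Reader: lock-free scan of the atomic array; the mutex is taken only if
  // the (normally empty) overflow vector holds entries.
  bool Contains(uint64_t snapshot) const {
    const size_t n = cache_count_.load(std::memory_order_acquire);
    for (size_t i = 0; i < n; ++i) {
      if (cache_[i].load(std::memory_order_acquire) == snapshot) return true;
    }
    if (!has_overflow_.load(std::memory_order_acquire)) return false;
    std::lock_guard<std::mutex> guard(mu_);
    for (uint64_t s : overflow_) {
      if (s == snapshot) return true;
    }
    return false;
  }

 private:
  std::array<std::atomic<uint64_t>, kToySnapshotCacheSize> cache_;
  std::atomic<size_t> cache_count_{0};
  std::atomic<bool> has_overflow_{false};
  mutable std::mutex mu_;
  std::vector<uint64_t> overflow_;
};
```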
Closes https://github.com/facebook/rocksdb/pull/2758

Differential Revision: D5660504

Pulled By: maysamyabandeh

fbshipit-source-id: 524fcf9a8e7f90a92324536456912a99aaa6740c
2017-08-26 01:00:38 -07:00
Maysam Yabandeh
ccf7f833e3 Use PinnableSlice in Transactions
Summary:
DB::Get has been augmented with an overload that takes a PinnableSlice instead of a string. Transactions, however, have not yet been upgraded to use the new API. As a result, transaction users such as MyRocks cannot benefit from it. This patch updates the transactional API with a PinnableSlice overload.
Closes https://github.com/facebook/rocksdb/pull/2736

Differential Revision: D5645770

Pulled By: maysamyabandeh

fbshipit-source-id: f6af520df902f842de1bcf99bed3e8dfc43ad96d
2017-08-23 10:11:45 -07:00
Archit Mishra
bddd5d3630 Added mechanism to track deadlock chain
Summary:
Changes:
* extended the wait_txn_map to track additional information
* designed circular buffer to store n latest deadlocks' information
* added test coverage to verify the additional information tracked is accurately stored in the buffer
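A fixed-capacity circular buffer that keeps only the n most recent deadlocks could look like the following. ToyDeadlockInfo and ToyDeadlockBuffer are hypothetical; the fields are placeholders for the extra information tracked in wait_txn_map.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct ToyDeadlockInfo {
  std::vector<uint64_t> txn_ids;  // transactions forming the cycle
  std::string waiting_key;        // key the last transaction was waiting on
};

class ToyDeadlockBuffer {
 public:
  explicit ToyDeadlockBuffer(size_t capacity) : capacity_(capacity) {}

  // Overwrites the oldest entry once the buffer is full.
  void Add(ToyDeadlockInfo info) {
    if (capacity_ == 0) return;
    if (buf_.size() < capacity_) {
      buf_.push_back(std::move(info));
    } else {
      buf_[next_] = std::move(info);
    }
    next_ = (next_ + 1) % capacity_;
  }

  // Returns the stored deadlocks, newest first.
  std::vector<ToyDeadlockInfo> NewestFirst() const {
    std::vector<ToyDeadlockInfo> out;
    for (size_t i = 1; i <= buf_.size(); ++i) {
      out.push_back(buf_[(next_ + capacity_ - i) % capacity_]);
    }
    return out;
  }

 private:
  size_t capacity_;
  size_t next_ = 0;  // slot that the next Add will write
  std::vector<ToyDeadlockInfo> buf_;
};
```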
Closes https://github.com/facebook/rocksdb/pull/2630

Differential Revision: D5478025

Pulled By: armishra

fbshipit-source-id: 2b138de7b5a73f5ca554fc3ff8220a3be49f39e7
2017-08-17 18:56:21 -07:00
Aaron G
7848f0b24c add VerifyChecksum() to db.h
Summary:
We need a tool to check any sst file corruption in the db.
It checks all the sst files in the current version and reads all the blocks (data, meta, index) with checksum verification. If any verification fails, the function returns a non-OK status.
Closes https://github.com/facebook/rocksdb/pull/2498

Differential Revision: D5324269

Pulled By: lightmark

fbshipit-source-id: 6f8a272008b722402a772acfc804524c9d1a483b
2017-08-09 15:58:13 -07:00
Maysam Yabandeh
a9a4e89c38 Fix valgrind complaint about initialization
Summary: Closes https://github.com/facebook/rocksdb/pull/2697

Differential Revision: D5573894

Pulled By: maysamyabandeh

fbshipit-source-id: 8fc03ea8ea6f3f3bc0f68b64cf90243a70562dc4
2017-08-07 08:49:52 -07:00
Maysam Yabandeh
c9804e007a Refactor TransactionDBImpl
Summary:
This makes room for new implementations of TransactionDBImpl, such as WritePreparedTxnDBImpl, that have a different policy for how to write to the DB.
Closes https://github.com/facebook/rocksdb/pull/2689

Differential Revision: D5568918

Pulled By: maysamyabandeh

fbshipit-source-id: f7eac866e175daf3793ae79da108f65cc7dc7b25
2017-08-05 17:26:15 -07:00
Siying Dong
21696ba502 Replace dynamic_cast<>
Summary:
Replace dynamic_cast<> so that users can choose to build with RTTI off, which saves several bytes per object and makes slightly more memory available.
Some nontrivial changes:
1. Add Comparator::GetRootComparator() to get around the internal comparator hack
2. Add the two experimental functions to DB
3. Add TableFactory::GetOptionString() to avoid unnecessary casting to get the option string
4. Since 3 is done, move the parsing option functions for table factory to table factory files too, to be symmetric.
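The GetRootComparator() idea from item 1 replaces a dynamic_cast with a virtual method that wrappers override to expose what they wrap. A sketch with hypothetical Toy* classes, not RocksDB's Comparator hierarchy.

```cpp
#include <cassert>
#include <string>

class ToyComparator {
 public:
  virtual ~ToyComparator() = default;
  virtual const char* Name() const = 0;
  // Default: a comparator is its own root. Wrappers override this.
  virtual const ToyComparator* GetRootComparator() const { return this; }
};

class ToyBytewise : public ToyComparator {
 public:
  const char* Name() const override { return "bytewise"; }
};

// Wraps a user comparator, like an internal-key comparator would.
class ToyInternalComparator : public ToyComparator {
 public:
  explicit ToyInternalComparator(const ToyComparator* user) : user_(user) {}
  const char* Name() const override { return "internal"; }
  const ToyComparator* GetRootComparator() const override {
    return user_->GetRootComparator();  // no dynamic_cast needed to reach the user comparator
  }

 private:
  const ToyComparator* user_;
};
```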
Closes https://github.com/facebook/rocksdb/pull/2645

Differential Revision: D5502723

Pulled By: siying

fbshipit-source-id: fd13cec5601cf68a554d87bfcf056f2ffa5fbf7c
2017-07-28 16:27:16 -07:00
Sagar Vemuri
aace46516b Fix license headers in Cassandra related files
Summary:
I might have missed these while doing some recent cassandra code reviews.
Closes https://github.com/facebook/rocksdb/pull/2663

Differential Revision: D5520138

Pulled By: sagar0

fbshipit-source-id: 340930afe9efe03c75f535a1da1f89bd3e53c1f9
2017-07-28 13:56:56 -07:00
Islam AbdelRahman
50a969131f CacheActivityLogger, component to log cache activity into a file
Summary:
Simple component that will add a new entry in a log file every time we lookup/insert a key in SimCache.
API:
```
SimCache::StartActivityLogging(<file_name>, <env>, <optional_max_size>)
SimCache::StopActivityLogging()
```

Sending for review; still need to add more comments.

I was thinking about a better approach, but I ended up deciding I will use a mutex to sync the writes to the file, since this feature should not be heavily used and only used to collect info that will be analyzed offline. I think it's okay to hold the mutex every time we lookup/add to the SimCache.
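A sketch of the mutex-serialized logging described above, writing into a std::string instead of a file. ToyActivityLogger is hypothetical, and stopping once the optional max size would be exceeded is an assumed reading of <optional_max_size>.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>

class ToyActivityLogger {
 public:
  // max_size == 0 means no size limit (assumed).
  void Start(size_t max_size) {
    std::lock_guard<std::mutex> guard(mu_);
    active_ = true;
    max_size_ = max_size;
  }
  void Stop() {
    std::lock_guard<std::mutex> guard(mu_);
    active_ = false;
  }
  // Called on every lookup/insert; the mutex serializes all writers.
  void Record(const std::string& op, const std::string& key) {
    std::lock_guard<std::mutex> guard(mu_);
    if (!active_) return;
    const std::string line = op + "," + key + "\n";
    if (max_size_ != 0 && log_.size() + line.size() > max_size_) {
      active_ = false;  // stop logging instead of exceeding the limit
      return;
    }
    log_ += line;
  }
  std::string Contents() const {
    std::lock_guard<std::mutex> guard(mu_);
    return log_;
  }

 private:
  mutable std::mutex mu_;
  bool active_ = false;
  size_t max_size_ = 0;
  std::string log_;  // stands in for the log file
};
```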
Closes https://github.com/facebook/rocksdb/pull/2295

Differential Revision: D5063826

Pulled By: IslamAbdelRahman

fbshipit-source-id: f3b5daed8b201987c9a071146ddd5c5740a2dd8c
2017-07-28 12:36:48 -07:00
Sagar Vemuri
72502cf227 Revert "comment out unused parameters"
Summary:
This reverts the previous commit 1d7048c598, which broke the build.

Did a `git revert 1d7048c`.
Closes https://github.com/facebook/rocksdb/pull/2627

Differential Revision: D5476473

Pulled By: sagar0

fbshipit-source-id: 4756ff5c0dfc88c17eceb00e02c36176de728d06
2017-07-21 18:26:26 -07:00
Victor Gao
1d7048c598 comment out unused parameters
Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually.

Reviewed By: igorsugak

Differential Revision: D5454343

fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2
2017-07-21 14:57:44 -07:00
Siying Dong
3c327ac2d0 Change RocksDB License
Summary: Closes https://github.com/facebook/rocksdb/pull/2589

Differential Revision: D5431502

Pulled By: siying

fbshipit-source-id: 8ebf8c87883daa9daa54b2303d11ce01ab1f6f75
2017-07-15 16:11:23 -07:00
Siying Dong
e517bfa2c2 CLANG Tidy
Summary: Closes https://github.com/facebook/rocksdb/pull/2502

Differential Revision: D5326498

Pulled By: siying

fbshipit-source-id: 2f0ac6dc6ca5ddb23cecf67a278c086e52646714
2017-06-27 11:00:59 -07:00
Maysam Yabandeh
499ebb3ab5 Optimize for serial commits in 2PC
Summary:
Throughput: 46k tps in our sysbench settings (filling the details later)

The idea is to have the simplest change that gives us a reasonable boost
in 2PC throughput.

Major design changes:
1. The WAL file internal buffer is not flushed after each write. Instead
it is flushed before critical operations (WAL copy via fs) or when
FlushWAL is called by MySQL. Flushing the WAL buffer is also protected
via mutex_.
2. Use two sequence numbers: last seq, and last seq for write. Last seq
is the last visible sequence number for reads. Last seq for write is the
next sequence number that should be used to write to WAL/memtable. This
allows memtable writes to proceed in parallel with WAL writes.
3. BatchGroup is not used for writes. This means that we can have
parallel writers, which changes a major assumption in the code base. To
accommodate that, i) allow only 1 WriteImpl that intends to write to
memtable via mem_mutex_--which is fine since in 2PC almost all of the memtable writes
come via group commit phase which is serial anyway, ii) make all the
parts in the code base that assumed to be the only writer (via
EnterUnbatched) to also acquire mem_mutex_, iii) stat updates are
protected via a stat_mutex_.
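The two counters in item 2 can be sketched with atomics: writers reserve sequence numbers from one counter, and readers only see what has been published through the other. ToySeqState is hypothetical; it tracks the last allocated number (one less than "next to use"), and Publish() assumes callers publish in sequence order, for example via the serial commit path.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

class ToySeqState {
 public:
  // Writers reserve `count` numbers up front; returns the first reserved one.
  uint64_t Allocate(uint64_t count) { return last_allocated_.fetch_add(count) + 1; }
  // After the memtable write finishes, make everything up to `seq` readable.
  // Assumes publications happen in sequence order.
  void Publish(uint64_t seq) { last_visible_.store(seq, std::memory_order_release); }
  uint64_t LastVisible() const { return last_visible_.load(std::memory_order_acquire); }
  uint64_t LastAllocated() const { return last_allocated_.load(); }

 private:
  std::atomic<uint64_t> last_allocated_{0};  // plays the role of "last seq for write"
  std::atomic<uint64_t> last_visible_{0};    // plays the role of "last seq"
};
```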

Note: the first commit has the approach figured out but is not clean.
Submitting the PR anyway to get the early feedback on the approach. If
we are ok with the approach I will go ahead with these updates:
0) Rebase with Yi's pipelining changes
1) Currently batching is disabled by default to make sure that it will be
consistent with all unit tests. Will make this optional via a config.
2) A couple of unit tests are disabled. They need to be updated with the
serial commit of 2PC taken into account.
3) Replacing BatchGroup with mem_mutex_ got a bit ugly as it requires
releasing mutex_ beforehand (the same way EnterUnbatched does). This
needs to be cleaned up.
Closes https://github.com/facebook/rocksdb/pull/2345

Differential Revision: D5210732

Pulled By: maysamyabandeh

fbshipit-source-id: 78653bd95a35cd1e831e555e0e57bdfd695355a4
2017-06-24 14:11:29 -07:00
Sagar Vemuri
89ad9f3adb Allow ignoring unknown options when loading options from a file
Summary:
Added a flag, `ignore_unknown_options`, to skip unknown options when loading an options file (using `LoadLatestOptions`/`LoadOptionsFromFile`) or while verifying options (using `CheckOptionsCompatibility`). This will help in downgrading the db to an older version.

Also added `--ignore_unknown_options` flag to ldb
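Conceptually, the flag turns an "unrecognized option" error into a skip. A hypothetical helper illustrating that, not the real options parser; its error text only loosely mimics the ldb output shown below.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Copies recognized options into *applied. Unknown names are either skipped
// (ignore_unknown_options) or reported as an error.
bool ApplyOptionsSketch(const std::map<std::string, std::string>& from_file,
                        const std::set<std::string>& known,
                        bool ignore_unknown_options,
                        std::map<std::string, std::string>* applied,
                        std::string* error) {
  for (const auto& kv : from_file) {
    if (known.count(kv.first) == 0) {
      if (ignore_unknown_options) continue;  // e.g. an option added in a newer release
      *error = "Unrecognized option " + kv.first;
      return false;
    }
    (*applied)[kv.first] = kv.second;
  }
  return true;
}
```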

**Example Use case:**
In MyRocks, when copying from a newer version to an older version, it is often impossible to start because of new RocksDB options that don't exist in the older version, even though the data format is compatible.
MyRocks uses these load and verify functions in [ha_rocksdb.cc::check_rocksdb_options_compatibility](e004fd9f41/storage/rocksdb/ha_rocksdb.cc (L3348-L3401)).

**Test Plan:**
Updated the unit tests.
`make check`

ldb:
$ ./ldb --db=/tmp/test_db --create_if_missing put a1 b1
OK

Now edit /tmp/test_db/<OPTIONS-file> and add an unknown option.

Try loading the options now, and it fails:
$ ./ldb --db=/tmp/test_db --try_load_options get a1
Failed: Invalid argument: Unrecognized option DBOptions:: abcd

Passes with the new --ignore_unknown_options flag
$ ./ldb --db=/tmp/test_db --try_load_options --ignore_unknown_options get a1
b1
Closes https://github.com/facebook/rocksdb/pull/2423

Differential Revision: D5212091

Pulled By: sagar0

fbshipit-source-id: 2ec17636feb47dc0351b53a77e5f15ef7cbf2ca7
2017-06-13 16:58:01 -07:00
hyunwoo
0ebdd70579 fixed typo
Summary:
fixed typo
Closes https://github.com/facebook/rocksdb/pull/2312

Differential Revision: D5079631

Pulled By: sagar0

fbshipit-source-id: e4c8d1d89b244ee69e9dea1dd013227cc5241026
2017-05-17 16:41:49 -07:00
Andrew Kryczka
3fa9a39c68 Add GetAllKeyVersions API
Summary:
- Introduced an include/ file dedicated to db-related debug functions to avoid making db.h more complex
- Added debugging function, `GetAllKeyVersions()`, to return a listing of internal data for a range of user keys. The new `struct KeyVersion` exposes data similar to internal key without exposing any internal type.
- Migrated the "ldb idump" subcommand to use this function
- The API takes an inclusive-exclusive range to match behavior of "ldb idump". This will be quite annoying for users who want to query a single user key's versions :(.
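Because the range is inclusive-exclusive, querying one user key k needs an end bound just past k; under bytewise ordering the smallest such string is k followed by a single '\0' byte. A sketch over a hypothetical in-memory list of versions, not the real GetAllKeyVersions() signature.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct ToyKeyVersion {
  std::string user_key;
  uint64_t sequence;
  std::string value;
};

// Returns every version whose user key lies in [begin, end).
std::vector<ToyKeyVersion> VersionsInRange(const std::vector<ToyKeyVersion>& entries,
                                           const std::string& begin,
                                           const std::string& end) {
  std::vector<ToyKeyVersion> out;
  for (const auto& e : entries) {
    if (e.user_key >= begin && e.user_key < end) out.push_back(e);
  }
  return out;
}

// Versions of exactly one key: the exclusive end bound is key + '\0',
// the smallest string that sorts after key.
std::vector<ToyKeyVersion> VersionsOfKey(const std::vector<ToyKeyVersion>& entries,
                                         const std::string& key) {
  return VersionsInRange(entries, key, key + std::string(1, '\0'));
}
```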
Closes https://github.com/facebook/rocksdb/pull/2232

Differential Revision: D4976007

Pulled By: ajkr

fbshipit-source-id: cab375da53a7595d6575af2b7e3b776aa3ad793e
2017-05-12 15:54:06 -07:00
Yi Wu
2cd00773c7 Add bulk create/drop column family API
Summary:
Adding DB::CreateColumnFamilies() and DB::DropColumnFamilies() to bulk create/drop column families. This addresses the problem that creating/dropping 1k column families takes minutes. The bottleneck is that we persist the options file for every single column family create/drop, and parse the persisted options file for verification, which takes a lot of CPU time.

The new APIs simply create/drop column families individually, and persist the options file once at the end. This brings the time to create 1k column families down to ~0.1s. A further improvement would be to merge the manifest writes into one IO.
Closes https://github.com/facebook/rocksdb/pull/2248

Differential Revision: D5001578

Pulled By: yiwu-arbug

fbshipit-source-id: d4e00bda671451e0b314c13e12ad194b1704aa03
2017-05-07 23:20:46 -07:00
siddontang
b551104e04 support PopSavePoint for WriteBatch
Summary:
Try to fix https://github.com/facebook/rocksdb/issues/1969
Closes https://github.com/facebook/rocksdb/pull/2170

Differential Revision: D4907333

Pulled By: yiwu-arbug

fbshipit-source-id: 417b420ff668e6c2fd0dad42a94c57385012edc5
2017-05-03 10:57:45 -07:00
Siying Dong
d616ebea23 Add GPLv2 as an alternative license.
Summary: Closes https://github.com/facebook/rocksdb/pull/2226

Differential Revision: D4967547

Pulled By: siying

fbshipit-source-id: dd3b58ae1e7a106ab6bb6f37ab5c88575b125ab4
2017-04-27 18:06:12 -07:00
Andrew Kryczka
e5e545a021 Reunite checkpoint and backup core logic
Summary:
These code paths forked when checkpoint was introduced by copy/pasting the core backup logic. Over time they diverged and bug fixes were sometimes applied to one but not the other (like fix to include all relevant WALs for 2PC), or it required extra effort to fix both (like fix to forge CURRENT file). This diff reunites the code paths by extracting the core logic into a function, CreateCustomCheckpoint(), that is customizable via callbacks to implement both checkpoint and backup.

Related changes:

- flush_before_backup is now forcibly enabled when 2PC is enabled
- Extracted CheckpointImpl class definition into a header file. This is so the function, CreateCustomCheckpoint(), can be called by internal rocksdb code but not exposed to users.
- Implemented more functions in DummyDB/DummyLogFile (in backupable_db_test.cc) that are used by CreateCustomCheckpoint().
Closes https://github.com/facebook/rocksdb/pull/1932

Differential Revision: D4622986

Pulled By: ajkr

fbshipit-source-id: 157723884236ee3999a682673b64f7457a7a0d87
2017-04-24 15:06:46 -07:00
Siying Dong
7534ba7bde StackableDB should pass ResetStats()
Summary: Closes https://github.com/facebook/rocksdb/pull/2190

Differential Revision: D4922688

Pulled By: siying

fbshipit-source-id: eaa3d122f8d389ae0508ec8b61f7780fd8b0a7ef
2017-04-20 16:11:56 -07:00
Siying Dong
97005dbd5d tools/check_format_compatible.sh to cover option file loading too
Summary:
tools/check_format_compatible.sh will check that a newer version of RocksDB can open option files generated by older releases. To achieve that, a new parameter "--try_load_options" is added to ldb. With this parameter set, if an option file exists, we load it and use it to open the DB. With this option set, we can validate the option loading logic.
Closes https://github.com/facebook/rocksdb/pull/2178

Differential Revision: D4914989

Pulled By: siying

fbshipit-source-id: db114f7724fcb41e5e9483116d84d7c4b8389ca4
2017-04-20 10:26:37 -07:00
Andrew Kryczka
df74b775e6 Limit backups opened
Summary:
This was requested by a customer who wants to proactively monitor whether any valid backups are available. The existing performance was poor because Open() serially reads every small meta-file (one per backup), which was slow on HDFS.

Now we only read the minimum number of meta-files to find `max_valid_backups_to_open` valid backups. The customer mentioned above can just set it to one.
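The early-exit behavior can be sketched as follows, with a callback standing in for reading and validating one meta-file. OpenBackupsSketch and its parameters are hypothetical; the counter shows how few meta-files are read when the limit is small.

```cpp
#include <cassert>
#include <functional>
#include <vector>

struct ToyOpenResult {
  std::vector<int> valid_ids;
  int meta_files_read;
};

// Walks backups newest first and stops as soon as enough valid ones were found.
ToyOpenResult OpenBackupsSketch(const std::vector<int>& ids_newest_first,
                                const std::function<bool(int)>& read_meta_file_is_valid,
                                int max_valid_backups_to_open) {
  ToyOpenResult result;
  result.meta_files_read = 0;
  for (int id : ids_newest_first) {
    if (static_cast<int>(result.valid_ids.size()) >= max_valid_backups_to_open) break;
    ++result.meta_files_read;  // one (possibly slow, e.g. on HDFS) read per backup
    if (read_meta_file_is_valid(id)) result.valid_ids.push_back(id);
  }
  return result;
}
```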
Closes https://github.com/facebook/rocksdb/pull/2151

Differential Revision: D4882564

Pulled By: ajkr

fbshipit-source-id: cb0edf9e8ac693e4d5f24902e725a011ed8c0c2f
2017-04-19 13:26:47 -07:00
Manuel Ung
1f8b119ed6 Limit maximum memory used in the WriteBatch representation
Summary:
Extend TransactionOptions to include max_write_batch_size which determines the maximum size of the writebatch representation. If memory limit is exceeded, the operation will abort with subcode kMemoryLimit.
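A sketch of a size-capped batch. ToyLimitedBatch is hypothetical: it counts only key and value bytes (a real write batch representation also carries headers and length prefixes), and treating 0 as "no limit" is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

class ToyLimitedBatch {
 public:
  explicit ToyLimitedBatch(size_t max_bytes) : max_bytes_(max_bytes) {}

  // Returns false, leaving the batch unchanged, if the write would exceed the
  // cap (standing in for a Status with subcode kMemoryLimit).
  bool Put(const std::string& key, const std::string& value) {
    const size_t added = key.size() + value.size();
    if (max_bytes_ != 0 && rep_.size() + added > max_bytes_) return false;
    rep_ += key;
    rep_ += value;
    return true;
  }
  size_t size() const { return rep_.size(); }

 private:
  size_t max_bytes_;  // 0 means unlimited in this sketch
  std::string rep_;   // simplified representation: raw key and value bytes only
};
```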
Closes https://github.com/facebook/rocksdb/pull/2124

Differential Revision: D4861842

Pulled By: lth

fbshipit-source-id: 46fd172ea67cc90bbba829bf0d70cfab2261c161
2017-04-10 15:42:26 -07:00
Siying Dong
9ef3627fd3 Allow checkpointing without flushing
Summary:
Add a parameter to Checkpoint::CreateCheckpoint() so that flush can be skipped if total log file size is within a threshold.
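The flush decision boils down to comparing the total live WAL size against a threshold. A hypothetical helper; the threshold name log_size_for_flush and the rule that 0 keeps the always-flush behavior are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns true if the memtables should be flushed before taking the checkpoint.
// A threshold of 0 always flushes.
bool ShouldFlushBeforeCheckpoint(const std::vector<uint64_t>& live_wal_sizes,
                                 uint64_t log_size_for_flush) {
  uint64_t total = 0;
  for (uint64_t size : live_wal_sizes) total += size;
  return total >= log_size_for_flush;  // small WALs are cheaper to copy than to flush
}
```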
Closes https://github.com/facebook/rocksdb/pull/1993

Differential Revision: D4719842

Pulled By: siying

fbshipit-source-id: 4f9d9e1
2017-03-21 18:09:13 -07:00
Siying Dong
17866ecc3a Allow Users to change customized ldb tools' header in help printing
Summary: Closes https://github.com/facebook/rocksdb/pull/2018

Differential Revision: D4748448

Pulled By: siying

fbshipit-source-id: a54c2f9
2017-03-21 17:39:12 -07:00
Maysam Yabandeh
11526252cc Pinnableslice (2nd attempt)
Summary:
PinnableSlice

    Summary:
    Currently the point lookup values are copied to a string provided by the
    user. This incurs an extra memcpy cost. This patch allows doing point lookup
    via a PinnableSlice which pins the source memory location (instead of
    copying their content) and releases them after the content is consumed
    by the user. The old API of Get(string) is translated to the new API
    underneath.

    Here is the summary for improvements:

    value 100 byte: 1.8% regular, 1.2% merge values
    value 1k byte: 11.5% regular, 7.5% merge values
    value 10k byte: 26% regular, 29.9% merge values
    The improvement for merge could be greater if we extend this approach to
    pin the merge output and delay the full merge operation until the user
    actually needs it. We have left that for future work.

    PS:
    Sometimes we observe a small decrease in performance when switching from
    t5452014 to this patch but with the old Get(string) API. The d
Closes https://github.com/facebook/rocksdb/pull/1756

Differential Revision: D4391738

Pulled By: maysamyabandeh

fbshipit-source-id: 6f3edd3
2017-03-13 11:54:10 -07:00
Andrew Kryczka
0ad5af42d0 Clarify VerifyBackup behavior
Summary:
It's non-obvious to users that using the same backup engine for creating/verifying provides better results than using separate backup engines, so add a comment in the header.
Closes https://github.com/facebook/rocksdb/pull/1942

Differential Revision: D4637865

Pulled By: ajkr

fbshipit-source-id: e6efe24
2017-03-02 17:24:11 -08:00
Giuseppe Ottaviano
4d7c06cedf Make WriteBatchWithIndex movable
Summary:
`WriteBatchWithIndex` has an incorrect implicitly-generated move constructor (it will copy the pointer causing a double-free on destruction). Just switch to `unique_ptr` so we get correct move semantics for free.
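A reduced illustration of the bug class, using a hypothetical `Batch`/`Index` pair rather than WriteBatchWithIndex's real internals. With a raw owning pointer, the compiler-generated special members copy only the pointer (and with a user-declared destructor there is no implicit move constructor at all, so a "move" silently falls back to the copy constructor), so two objects delete the same memory. Holding the member in `std::unique_ptr` gives a correct move and disables copying.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

struct Index {  // stand-in for the internal index structure
  std::vector<int> entries;
};

// BROKEN (for contrast only, not compiled here):
//   class BadBatch {
//     Index* rep_ = new Index;
//    public:
//     ~BadBatch() { delete rep_; }  // copies share rep_ -> double free
//   };

// FIXED: unique_ptr transfers ownership on move and forbids copies.
class Batch {
 public:
  Batch() : rep_(std::make_unique<Index>()) {}
  Batch(Batch&&) = default;
  Batch& operator=(Batch&&) = default;

  void Add(int v) { rep_->entries.push_back(v); }
  bool valid() const { return rep_ != nullptr; }
  std::size_t count() const { return rep_->entries.size(); }

 private:
  std::unique_ptr<Index> rep_;
};
```

After a move, the source object holds a null pointer, so its destructor frees nothing.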
Closes https://github.com/facebook/rocksdb/pull/1899

Differential Revision: D4598896

Pulled By: ajkr

fbshipit-source-id: 2373d47
2017-02-22 17:54:11 -08:00
Vitaliy Liptchinsky
1aaa898cf1 Adding GetApproximateMemTableStats method
Summary:
Added a method that returns the approximate number of entries, as well as the approximate size, for memtables.
Closes https://github.com/facebook/rocksdb/pull/1841

Differential Revision: D4511990

Pulled By: VitaliyLi

fbshipit-source-id: 9a4576e
2017-02-06 14:54:16 -08:00
Andrew Kryczka
17c1180603 Generalize Env registration framework
Summary:
The Env registration framework supports registering client Envs and selecting which one to instantiate according to a text field. This enabled things like adding the -env_uri argument to db_bench, so the same binary could be reused with different Envs just by changing CLI config.

Now this problem has come up again in a non-Env context, as I want to instantiate a client Statistics implementation from db_bench, which is configured entirely via text parameters. Also, in the future we may wish to use it for deserializing client objects when loading OPTIONS file.

This diff generalizes the Env registration logic to work with arbitrary types.

- Generalized registration and instantiation code by templating them
- The entire implementation is in a header file, as that's the Google style guide's recommendation for template definitions
- Pattern match with std::regex_match rather than checking prefix, which was the previous behavior
- Rename functions/files to be non-Env-specific
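A minimal sketch of a type-generic registry in the spirit of the list above: factories are registered per type under a regex, and the whole text field is matched with std::regex_match (so a pattern no longer matches on a mere prefix). `Registry<T>`, `Register`, `NewObject`, `Stats` and `CountingStats` are hypothetical names, not the RocksDB registry API.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// A per-type registry of factories keyed by a regex pattern.
template <typename T>
class Registry {
 public:
  using Factory = std::function<std::unique_ptr<T>(const std::string& uri)>;

  static Registry& Instance() {
    static Registry registry;  // one registry per type T
    return registry;
  }

  void Register(const std::string& pattern, Factory factory) {
    entries_.emplace_back(std::regex(pattern), std::move(factory));
  }

  // Returns an object from the first factory whose pattern matches the
  // entire URI, or nullptr if no pattern matches.
  std::unique_ptr<T> NewObject(const std::string& uri) const {
    for (const auto& entry : entries_) {
      if (std::regex_match(uri, entry.first)) {
        return entry.second(uri);
      }
    }
    return nullptr;
  }

 private:
  std::vector<std::pair<std::regex, Factory>> entries_;
};

// Example client type (hypothetical).
struct Stats {
  virtual ~Stats() = default;
  virtual std::string Name() const = 0;
};
struct CountingStats : Stats {
  std::string Name() const override { return "counting"; }
};
```

Templating on `T` lets the same code serve Env, Statistics or any other client type without duplication.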
Closes https://github.com/facebook/rocksdb/pull/1776

Differential Revision: D4421933

Pulled By: ajkr

fbshipit-source-id: 34647d1
2017-01-25 16:09:14 -08:00
Vitaliy Liptchinsky
e840213d6e Change DB::GetApproximateSizes for more flexibility needed for MyRocks
Summary:
Added an option to GetApproximateSizes to exclude file stats, as MyRocks has those counted exactly and we need only stats from memtables.
Closes https://github.com/facebook/rocksdb/pull/1787

Differential Revision: D4441111

Pulled By: IslamAbdelRahman

fbshipit-source-id: c11f4c3
2017-01-20 09:39:11 -08:00
Maysam Yabandeh
d0ba8ec8f9 Revert "PinnableSlice"
Summary:
This reverts commit 54d94e9c2c.

The pull request was landed by mistake.
Closes https://github.com/facebook/rocksdb/pull/1755

Differential Revision: D4391678

Pulled By: maysamyabandeh

fbshipit-source-id: 36d5149
2017-01-08 14:24:12 -08:00
Maysam Yabandeh
54d94e9c2c PinnableSlice
Summary:
Currently the point lookup values are copied to a string provided by the user.
This incurs an extra memcpy cost. This patch allows doing point lookup
via a PinnableSlice which pins the source memory location (instead of
copying their content) and releases them after the content is consumed
by the user. The old API of Get(string) is translated to the new API
underneath.
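A toy model of the pinning idea, assuming the value lives inside a reference-counted block (a std::shared_ptr here). `PinnedValue`, `Pin` and `Release` are hypothetical and only loosely echo PinnableSlice: the caller receives a view into the block plus a pin that keeps the block alive, so no memcpy into a user string is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>
#include <utility>

// A block of storage (e.g. a cached data block) shared by readers.
using Block = std::shared_ptr<const std::string>;

// A value that points into a block without copying it. Holding `pin_`
// keeps the block alive until the caller releases the value.
class PinnedValue {
 public:
  void Pin(Block block, std::size_t offset, std::size_t len) {
    pin_ = std::move(block);
    view_ = std::string_view(*pin_).substr(offset, len);
  }
  void Release() {  // drop the pin once the caller has consumed the data
    view_ = {};
    pin_.reset();
  }
  std::string_view data() const { return view_; }
  bool pinned() const { return pin_ != nullptr; }

 private:
  Block pin_;
  std::string_view view_;
};
```

The returned view aliases the block's bytes directly; the only cost is a reference-count increment.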

Here is a summary of the improvements:
1. value 100 byte: 1.8% regular, 1.2% merge values
2. value 1k byte: 11.5% regular, 7.5% merge values
3. value 10k byte: 26% regular, 29.9% merge values

The improvement for merge could be greater if we extend this approach to
pin the merge output and delay the full merge operation until the user
actually needs it. We have left that for future work.

PS:
Sometimes we observe a small decrease in performance when switching from
t5452014 to this patch but with the old Get(string) API. The difference
is small and could be noise. More importantly, it is safely
cancelled
Closes https://github.com/facebook/rocksdb/pull/1732

Differential Revision: D4374613

Pulled By: maysamyabandeh

fbshipit-source-id: a077f1a
2017-01-08 13:54:13 -08:00
Sage Weil
4e07b08eff include/rocksdb/utilities/env_librados: fix typo
Summary:
Broken by 972f96b3fb

Signed-off-by: Sage Weil <sage@redhat.com>
Closes https://github.com/facebook/rocksdb/pull/1719

Differential Revision: D4366123

Pulled By: IslamAbdelRahman

fbshipit-source-id: a11e535
2016-12-23 19:09:14 -08:00
Aaron Gao
972f96b3fb direct io write support
Summary:
rocksdb direct io support

```
[gzh@dev11575.prn2 ~/rocksdb] ./db_bench -benchmarks=fillseq --num=1000000
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
RocksDB:    version 5.0
Date:       Wed Nov 23 13:17:43 2016
CPU:        40 * Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
CPUCache:   25600 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
Write rate: 0 bytes/second
Compression: Snappy
Memtablerep: skip_list
Perf Level: 1
WARNING: Assertions are enabled; benchmarks unnecessarily slow
------------------------------------------------
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
DB path: [/tmp/rocksdbtest-112628/dbbench]
fillseq      :       4.393 micros/op 227639 ops/sec;   25.2 MB/s

[gzh@dev11575.prn2 ~/roc
```
Closes https://github.com/facebook/rocksdb/pull/1564

Differential Revision: D4241093

Pulled By: lightmark

fbshipit-source-id: 98c29e3
2016-12-22 13:09:19 -08:00
Manuel Ung
2005c88a75 Implement non-exclusive locks
Summary:
This is an implementation of non-exclusive locks for pessimistic transactions. It is relatively simple and does not prevent starvation (i.e., a request for exclusive access may never be granted if there are always threads holding shared access). It is done by changing `KeyLockInfo` to hold a set of transaction ids instead of just one, and adding a flag specifying whether the lock is currently held with exclusive access.

Some implementation notes:
- Some lock diagnostic functions had to be updated to return a set of transaction ids for a given lock, e.g. `GetWaitingTxn` and `GetLockStatusData`.
- Deadlock detection is a bit more complicated since a transaction can now wait on multiple other transactions. A BFS is done in this case, and deadlock detection depth is now just a limit on the number of transactions we visit.
- Expirable transactions do not work efficiently with shared locks at the moment, but that's okay for now.
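A simplified sketch of the lock-compatibility rule described above, with a hypothetical `KeyLock` struct (not the real `KeyLockInfo` layout): a lock holds a set of transaction ids plus an exclusive flag. Shared requests succeed while the lock is held in shared mode, and exclusive requests succeed only when no other transaction holds it. As the description says, nothing here prevents starvation.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

using TxnId = uint64_t;

struct KeyLock {
  std::set<TxnId> holders;  // all transactions currently holding the lock
  bool exclusive = false;   // true if held in exclusive mode
};

// Tries to acquire `lock` for `txn`; returns false if the caller must wait.
bool TryAcquire(KeyLock& lock, TxnId txn, bool want_exclusive) {
  if (lock.holders.empty()) {
    lock.holders.insert(txn);
    lock.exclusive = want_exclusive;
    return true;
  }
  bool only_self = lock.holders.size() == 1 && lock.holders.count(txn) == 1;
  if (only_self) {
    // Re-entrant request; upgrading to exclusive is allowed for a sole holder.
    lock.exclusive = lock.exclusive || want_exclusive;
    return true;
  }
  if (!lock.exclusive && !want_exclusive) {
    lock.holders.insert(txn);  // shared + shared is compatible
    return true;
  }
  return false;  // conflicts with other holders
}

void Release(KeyLock& lock, TxnId txn) {
  lock.holders.erase(txn);
  if (lock.holders.empty()) lock.exclusive = false;
}
```

Because a waiter may now be blocked by several holders at once, the wait-for graph has multiple outgoing edges per transaction, which is why deadlock detection needs a graph search.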
Closes https://github.com/facebook/rocksdb/pull/1573

Differential Revision: D4239097

Pulled By: lth

fbshipit-source-id: da7c074
2016-12-05 17:39:17 -08:00
Igor Canadi
3f407b065c Kill flashcache code in RocksDB
Summary:
Now that we have userspace persisted cache, we don't need flashcache anymore.
Closes https://github.com/facebook/rocksdb/pull/1588

Differential Revision: D4245114

Pulled By: igorcanadi

fbshipit-source-id: e2c1c72
2016-12-01 10:09:22 -08:00
Islam AbdelRahman
f39452e81f Fix heap use after free ASAN/Valgrind
Summary:
Don't use c_str() of a temporary std::string in RocksLuaCompactionFilter::Name()
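A reduced illustration of the bug pattern, with `LuaFilter` and `BuildName` as hypothetical stand-ins for the real filter. Returning `c_str()` of a temporary std::string hands out a pointer into memory that is freed at the end of the full expression; storing the string in a member keeps it alive as long as the object.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Stand-in for a function that builds a name dynamically.
std::string BuildName(const std::string& script) { return "Lua:" + script; }

class LuaFilter {
 public:
  explicit LuaFilter(const std::string& script)
      : name_(BuildName(script)) {}  // compute once and own the storage

  // BUG (do not do this): the temporary returned by BuildName() dies at the
  // end of the return statement, so the returned pointer dangles.
  //   const char* Name() const { return BuildName(script_).c_str(); }

  // FIX: return a pointer into a member that outlives the call.
  const char* Name() const { return name_.c_str(); }

 private:
  std::string name_;
};
```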
Closes https://github.com/facebook/rocksdb/pull/1535

Differential Revision: D4199094

Pulled By: IslamAbdelRahman

fbshipit-source-id: e56ce62
2016-11-17 12:24:12 -08:00
Yueh-Hsuan Chiang
647eafdc21 Introduce Lua Extension: RocksLuaCompactionFilter
Summary:
This diff includes an implementation of CompactionFilter that allows
users to write CompactionFilter in Lua.  With this ability, users can
dynamically change compaction filter logic without rebuilding
the rocksdb binary or restarting the database.

To compile, WITH_LUA_PATH must be set to the base directory
of the Lua installation.
Closes https://github.com/facebook/rocksdb/pull/1478

Differential Revision: D4150138

Pulled By: yhchiang

fbshipit-source-id: ed84222
2016-11-16 15:39:12 -08:00
Maysam Yabandeh
361010d447 Exporting compaction stats in the form of a map
Summary:
Currently the compaction stats are printed to stdout. We want to export the compaction stats in a map format so that upper-layer apps (e.g., MySQL) can present
the stats in any format they require.
Closes https://github.com/facebook/rocksdb/pull/1477

Differential Revision: D4149836

Pulled By: maysamyabandeh

fbshipit-source-id: b3df19f
2016-11-11 20:54:14 -08:00
Reid Horuff
4dfaa6610a Make IsDeadlockDetect() virtual member of Transaction
Summary: Make `IsDeadlockDetect()` virtual member of base class `Transaction` for ease of use in MyRocks

Test Plan: compiles; also compiles at the MyRocks call site.

Reviewers: mung

Reviewed By: mung

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D65385
2016-10-21 14:47:59 -07:00
Islam AbdelRahman
869ae5d786 Support IngestExternalFile (remove AddFile restrictions)
Summary:
Changes in the diff

API changes:
- Introduce IngestExternalFile to replace AddFile (I think this make the API more clear)
- Introduce IngestExternalFileOptions (This struct will encapsulate the options for ingesting the external file)
- Deprecate AddFile() API

Logic changes:
- If our file overlaps with the memtable, we flush the memtable
- We find the first level in the LSM tree whose keys overlap our file's key range
- We find the lowest level above the level found in the previous step that our file can fit in, and ingest our file there
- We assign a global sequence number to our new file
- Remove AddFile restrictions by using global sequence numbers

Other changes:
- Refactor all AddFile logic to be encapsulated in ExternalSstFileIngestionJob
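A simplified sketch of the level-selection logic, assuming each level is summarized by a list of inclusive [smallest, largest] key ranges and that "fits" only means "does not overlap". The memtable flush and sequence number assignment steps are not modeled, and real ingestion checks more conditions than this. `KeyRange` and `PickIngestionLevel` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct KeyRange {
  std::string smallest;
  std::string largest;  // inclusive
};

bool Overlaps(const KeyRange& a, const KeyRange& b) {
  return !(a.largest < b.smallest || b.largest < a.smallest);
}

// levels[i] holds the key ranges of the files in level i (level 0 first).
// Returns the lowest level above the first overlapping level, or the
// bottommost level if nothing overlaps. Level 0 is the fallback because
// files there may overlap each other.
int PickIngestionLevel(const std::vector<std::vector<KeyRange>>& levels,
                       const KeyRange& file) {
  int target = 0;
  for (int lvl = 0; lvl < static_cast<int>(levels.size()); ++lvl) {
    for (const KeyRange& r : levels[lvl]) {
      if (Overlaps(r, file)) {
        return target;  // first overlapping level: stay above it
      }
    }
    target = lvl;  // no overlap here, so the file can go at least this deep
  }
  return target;
}
```

Placing the file as deep as possible keeps it out of the way of later compactions, and the global sequence number makes its entries order correctly against data in the levels above.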

Test Plan:
unit tests (still need to add more)
addfile_stress (https://reviews.facebook.net/D65037)

Reviewers: yiwu, andrewkr, lightmark, yhchiang, sdong

Reviewed By: sdong

Subscribers: jkedgar, hcz, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D65061
2016-10-20 17:05:32 -07:00
Manuel Ung
4edd39fda2 Implement deadlock detection
Summary: Implement deadlock detection. This is done by maintaining a TxnID -> TxnID map which represents the edges in the wait-for graph (this is named `wait_txn_map_`).
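A minimal sketch of cycle detection over such a map, assuming each waiting transaction waits on exactly one other, as in this single-edge version (the non-exclusive locks commit in this log later switches to a BFS). `WaitForGraph`, `WouldDeadlock` and the `max_depth` cutoff are hypothetical choices for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

using TxnId = uint64_t;

// Edges of the wait-for graph: waiter -> transaction it is waiting on.
using WaitForGraph = std::unordered_map<TxnId, TxnId>;

// Before `waiter` blocks on `holder`, follow the chain of waits starting at
// `holder`. Reaching `waiter` again means the new edge would close a cycle.
bool WouldDeadlock(const WaitForGraph& waits, TxnId waiter, TxnId holder,
                   int max_depth) {
  TxnId cur = holder;
  for (int depth = 0; depth < max_depth; ++depth) {
    if (cur == waiter) return true;
    auto it = waits.find(cur);
    if (it == waits.end()) return false;  // chain ends: no cycle
    cur = it->second;
  }
  return false;  // gave up after max_depth hops (treated as no deadlock)
}
```

For example, if txn 2 waits on 3 and 3 waits on 1, then txn 1 blocking on 2 would close the cycle 1 -> 2 -> 3 -> 1.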

Test Plan: transaction_test

Reviewers: IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D64491
2016-10-19 19:45:57 -07:00
Yi Wu
e29d3b67c2 Make max_background_compactions and base_background_compactions dynamic changeable
Summary:
Add DB::SetDBOptions to dynamically change max_background_compactions and base_background_compactions.
I'll add more dynamically changeable options soon.

Test Plan: unit test.

Reviewers: yhchiang, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D64749
2016-10-14 12:25:39 -07:00
Reid Horuff
02b3e3985c Make txn->GetState() const
Summary: makes Transaction::GetState() a const function.

Test Plan: compiles.

Reviewers: mung

Reviewed By: mung

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D64929
2016-10-11 15:48:50 -07:00
Reid Horuff
37737c3a6b Expose Transaction State Publicly
Summary:
This exposes a transaction's state through a public API rather than through a public member variable. I also do some name refactoring:
ExecutionStatus => TransactionState
exec_status_ => trx_state_

Test Plan: It compiles and transaction_test passes.

Reviewers: IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, mung, dhruba, sdong

Differential Revision: https://reviews.facebook.net/D64689
2016-10-07 11:58:53 -07:00
Sage Weil
4985f60fc8 env_mirror: fix a few leaks (#1363)
* env_mirror: fix leak from LockFile

Signed-off-by: Sage Weil <sage@redhat.com>

* env_mirror: instruct EnvMirror whether mirrored Envs should be destroyed

The lifecycle rules for Env are frustrating and undocumented.  Notably,
Env::Default() should *not* be freed, but any Env instances we created
should be.

Explicitly instruct EnvMirror whether to clean up child Env instances.
Default to false so that we do not affect existing callers.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-10-06 10:43:05 -07:00
Manuel Ung
be1f1092c9 Expose transaction id, lock state information and transaction wait information
Summary:
This diff does 3 things:

Expose TransactionID so that we can identify transactions when we retrieve locking and lock wait information. This is exposed as `Transaction::GetID`.

Expose lock state information by locking all stripes in all column families and copying their contents to a data structure. This is exposed as `TransactionDB::GetLockStatusData`.

Adds support for tracking the transaction and the key being waited on, and exposes this as `Transaction::GetWaitingTxn`.

Test Plan: unit tests

Reviewers: horuff, sdong

Reviewed By: sdong

Subscribers: vasilep, hermanlee4, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D64413
2016-09-30 11:41:21 -07:00
Aaron Gao
f517d9dd09 Add SeekForPrev() to Iterator
Summary:
Add a new Iterator API, `SeekForPrev`, which finds the last key that is <= the target key. It:
- supports prefix_extractor
- supports prefix_same_as_start
- supports upper_bound
- is not supported in iterators without Prev()
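A sketch of the SeekForPrev semantics over a sorted std::map, as an illustration only and not the RocksDB iterator implementation: land on the last key <= target, or report nothing when every key is greater.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Returns the last key <= target, or std::nullopt if no such key exists.
std::optional<std::string> SeekForPrev(
    const std::map<std::string, std::string>& kv, const std::string& target) {
  auto it = kv.upper_bound(target);  // first key strictly greater than target
  if (it == kv.begin()) {
    return std::nullopt;  // every key is > target
  }
  --it;  // step back to the last key <= target
  return it->first;
}
```

The step back after `upper_bound` is why the operation needs an iterator that supports Prev().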

Also add tests in db_iter_test and db_iterator_test

Pass all tests
Cheers!

Test Plan: make all check -j64

Reviewers: andrewkr, yiwu, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D64149
2016-09-27 18:20:57 -07:00
Yi Wu
9ed928e7a9 Split DBOptions into ImmutableDBOptions and MutableDBOptions
Summary: Use ImmutableDBOptions/MutableDBOptions internally and DBOptions only for user-facing APIs. MutableDBOptions is merely a placeholder for now. I'll start moving options to MutableDBOptions in follow-up diffs.

Test Plan:
  make all check

Reviewers: yhchiang, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D64065
2016-09-23 16:34:04 -07:00
Yi Wu
17f76fc564 DB::GetOptions() reflect dynamic changed options
Summary: Make DB::GetOptions() reflect dynamically changed options.

Test Plan: See the new unit test.

Reviewers: yhchiang, sdong, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D63903
2016-09-14 22:10:28 -07:00
Aaron Gao
4590b53a4b add stats to Cache::LookUp()
Summary: mainly for SimCache stats. In most cases it is hard to pass a Statistics* to the SimCache constructor.

Test Plan: make all check

Reviewers: andrewkr, sdong, yiwu

Reviewed By: yiwu

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D62193
2016-09-01 13:50:39 -07:00
Aaron Gao
4ad928e170 add comment to SimCache to estimate actual capacity
Summary: as title

Test Plan: make all check

Reviewers: yiwu

Reviewed By: yiwu

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D62493
2016-08-26 11:36:14 -07:00
Andrew Kryczka
3771e37970 WriteBatch support for range deletion
Summary:
Add API to WriteBatch to store range deletions in its buffer
which are later added to memtable. In the WriteBatch buffer, a range
deletion is encoded as "<optype><CF ID (optional)><begin key><end key>".

With this diff, the range tombstones are stored inline with the data in
the memtable. It's useful for now because the test cases rely on the
data being accessible via memtable. My next step is to store range
tombstones in a separate area in the memtable.
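A sketch of the record layout quoted above, assuming varint32 length-prefixed keys, an optional varint32 column family ID selected by a distinct tag, and that ID 0 (the default column family) is written without an ID. The tag values are placeholders, not RocksDB's actual record-type constants, and `AppendRangeDeletion` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Placeholder tags (hypothetical values) for the two record variants.
constexpr char kTagRangeDeletion = 0x01;
constexpr char kTagCFRangeDeletion = 0x02;

void PutVarint32(std::string* dst, uint32_t v) {
  while (v >= 0x80) {
    dst->push_back(static_cast<char>((v & 0x7F) | 0x80));  // 7 bits + more
    v >>= 7;
  }
  dst->push_back(static_cast<char>(v));
}

void PutLengthPrefixed(std::string* dst, const std::string& s) {
  PutVarint32(dst, static_cast<uint32_t>(s.size()));
  dst->append(s);
}

// Appends "<optype><CF ID (optional)><begin key><end key>" to `rep`.
void AppendRangeDeletion(std::string* rep, uint32_t cf_id,
                         const std::string& begin, const std::string& end) {
  if (cf_id == 0) {
    rep->push_back(kTagRangeDeletion);  // default CF: no ID written
  } else {
    rep->push_back(kTagCFRangeDeletion);
    PutVarint32(rep, cf_id);
  }
  PutLengthPrefixed(rep, begin);
  PutLengthPrefixed(rep, end);
}
```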

Test Plan: unit tests

Reviewers: IslamAbdelRahman, sdong, wanning

Reviewed By: wanning

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D61401
2016-08-16 08:16:04 -07:00
Aaron Gao
e408e98c8c add Name() to Cache
Summary: preparation for detecting the Cache type. If it is a SimCache, we may then trigger a command such as "setSimCapacity()" via setOptions()

Test Plan: make all check

Reviewers: yiwu, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D61953
2016-08-12 14:16:57 -07:00
Aaron Gao
76a67cf741 support stackableDB as the baseDB of transactionDB
Summary: make TransactionDB work with StackableDB

Test Plan: make all check -j64

Reviewers: andrewkr, yiwu, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D60705
2016-08-11 14:19:33 -07:00
Aaron Gao
2914de64e8 add sim_cache stats to Statistics
Summary:
Add SIM_BLOCK_CACHE_HIT and SIM_BLOCK_CACHE_MISS tickers.
These could be combined with histograms such as DB_GET to evaluate the current block cache size setting.

Test Plan: make all check

Reviewers: sdong, andrewkr, IslamAbdelRahman, yiwu

Reviewed By: yiwu

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D61803
2016-08-10 17:42:24 -07:00
Zongzhi Chen
98d0b78eac Added check_snapshot option in the DB's AddFile function (#1261)
* Added check_snapshot option in the DB's AddFile function

* change check_snapshot to skip_snapshot_check

* add unit test for skip_snapshot_check

* Add skip_snapshot_check comment
2016-08-09 18:14:13 -07:00
omegaga
44f5cc57a5 Add time series database (resubmitted)
Summary: Implement a time series database that supports DateTieredCompactionStrategy. It wraps a db object and separates SST files into different column families (time windows).
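A toy illustration of routing keys to time windows, assuming fixed-size windows and a hypothetical "window_<start>" naming scheme for the per-window column family. It is not the DateTieredDB code; `WindowStart` and `ColumnFamilyFor` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Returns the start of the fixed-size window containing `ts` (seconds),
// rounding down for negative timestamps as well.
int64_t WindowStart(int64_t ts, int64_t window_size) {
  int64_t start = ts - ts % window_size;
  if (ts % window_size < 0) start -= window_size;  // floor, not truncate
  return start;
}

// Hypothetical name of the column family that stores keys for `ts`.
std::string ColumnFamilyFor(int64_t ts, int64_t window_size) {
  return "window_" + std::to_string(WindowStart(ts, window_size));
}
```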

Test Plan: Add `date_tiered_test`.

Reviewers: dhruba, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D61653
2016-08-05 15:56:22 -07:00
sdong
7c4615cf1f A utility function to help users migrate DB after options change
Summary: Add a utility function that triggers the necessary full compaction and puts the output in the correct level by comparing the new options with the old options.

Test Plan: Add unit tests for it.

Reviewers: andrewkr, igor, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: muthu, sumeet, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D60783
2016-08-05 15:39:55 -07:00
ryneli
663afef884 Add EnvLibrados - RocksDB Env of RADOS (#1222)
EnvLibrados is a customized RocksDB Env that uses RADOS as the backend file system of RocksDB. It overrides all file-system-related APIs of the default Env. The easiest way to use it is as follows:

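	std::string db_name = "test_db";
	std::string config_path = "path/to/ceph/config";
	DB* db;
	Options options;
	// options.env takes an Env*; the Env must outlive the DB
	options.env = new EnvLibrados(db_name, config_path);
	// kDBPath is the DB path, defined elsewhere
	Status s = DB::Open(options, kDBPath, &db);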

EnvLibrados will then forward all file read/write operations to the RADOS cluster specified by config_path. The default pool is db_name+"_pool".

Users can set several options for EnvLibrados:
- write_buffer_size. This variable is the max buffer size for WritableFile. Once the buffer reaches write_buffer_size, EnvLibrados syncs the buffer content to RADOS and then clears the buffer.
- db_pool. Rather than using the default pool, users can set their own db pool name.
- wal_dir. The dir for WAL files. Because RocksDB only has a 2-level structure (dir_name/file_name), the format of wal_dir is "/dir_name" (it CAN'T be "/dir1/dir2"). The default wal_dir is "/wal".
- wal_pool. The corresponding pool name for WAL files. The default value is db_name+"_wal_pool".

An example of setting these options:

	std::string db_name = "test_db";
	std::string config_path = "path/to/ceph/config";
	std::string db_pool = db_name + "_pool";
	std::string wal_dir = "/wal";
	std::string wal_pool = db_name + "_wal_pool";
	uint64_t write_buffer_size = 1 << 20;  // 1 MiB
	Env* env_ = new EnvLibrados(db_name, config_path, db_pool, wal_dir, wal_pool, write_buffer_size);

	DB* db;
	Options options;
	options.env = env_;
	// The last level dir name should match the dir name in prefix_pool_map
	options.wal_dir = "/tmp/wal";

	// open DB (kDBPath is the DB path, defined elsewhere)
	Status s = DB::Open(options, kDBPath, &db);

Librados is required to compile EnvLibrados. Then use "$ make LIBRADOS=1" to compile RocksDB. If you only want to compile the EnvLibrados test, run "$ make env_librados_test LIBRADOS=1". To run env_librados_test, you need a running RADOS cluster with the configuration file located at "../ceph/src/ceph.conf" relative to "rocksdb/".
2016-07-21 11:16:34 -07:00
Yi Wu
4b95253587 Refactor cache.cc
Summary: Refactor cache.cc so that I can plug in the clock cache (D55581). Mainly, move `ShardedCache` to a separate file, move `LRUHandle` back to cache.cc, and rename it lru_cache.cc.

Test Plan:
    make check -j64

Reviewers: lightmark, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D59655
2016-07-15 10:41:36 -07:00
Aaron Gao
dda6c72ac8 Add DestroyColumnFamilyHandle(ColumnFamilyHandle**) to db.h
Summary:
Add DestroyColumnFamilyHandle(ColumnFamilyHandle**) to close a column family handle instead of deleting cfh* directly.
Users should call this to close a column family handle, so that the deletion can be detected in this function.

Test Plan: make all check -j64

Reviewers: andrewkr, yiwu, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D60765
2016-07-13 17:59:25 -07:00
Aaron Gao
8e6b38d895 update DB::AddFile to ingest list of sst files
Summary:
DB::AddFile(std::string file_path) is an API that allows users to ingest an SST file created using SstFileWriter.
We want to update this interface to accept a list of files to be ingested: DB::AddFile(std::vector<std::string> file_path_list).

Test Plan:
Add test case `AddExternalSstFileList` in `DBSSTTest` to make sure that:
1. the files' key ranges do not overlap with each other
2. each file's key range doesn't overlap with the DB key range
3. no snapshots are held
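A sketch of check 1 (the files in the list must not overlap each other), assuming each file is summarized by its smallest and largest user key. `FileRange` and `FilesOverlap` are hypothetical names. After sorting by smallest key, comparing neighbours is enough: if file i overlaps some earlier file, it also overlaps the file immediately before it, or an earlier adjacent pair already overlapped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct FileRange {
  std::string smallest;
  std::string largest;  // inclusive
};

// Returns true if any two files in `files` have overlapping key ranges.
bool FilesOverlap(std::vector<FileRange> files) {
  std::sort(files.begin(), files.end(),
            [](const FileRange& a, const FileRange& b) {
              return a.smallest < b.smallest;
            });
  for (std::size_t i = 1; i < files.size(); ++i) {
    if (files[i].smallest <= files[i - 1].largest) {
      return true;  // neighbour ranges intersect
    }
  }
  return false;
}
```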

Reviewers: andrewkr, sdong, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D58587
2016-07-11 10:43:12 -07:00
Reid Horuff
892e9d3047 make transaction WriteOptions modifiable
2016-06-27 12:53:30 -07:00
sdong
0babce57f7 Move away from enum char value -1
Summary: char is not signed in some platforms. Having negative values confuse those compilers.

Test Plan: Run all existing tests.

Reviewers: andrewkr, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59619
2016-06-14 17:07:34 -07:00
Uddipta Maity
1147e5b05a Adding support for sharing throttler between multiple backup and restores
Summary:
Rocksdb backup and restore rate limiting is currently done per backup/restore.
So, it is difficult to control rate across multiple backup/restores. With this
change, a throttler can be provided. If a throttler is provided, it is used.
Otherwise, a new throttler is created based on the actual rate limits specified
in the options.
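A sketch of the selection rule, with a hypothetical `Throttler` type standing in for the rate limiter and hypothetical option field names. A caller-supplied throttler takes precedence; otherwise a private one is created from the per-operation rate. Returning nullptr for a zero rate is an assumption of this sketch. Because several operations can hold the same std::shared_ptr, their combined rate is governed by one budget.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical throttler: only stores its configured rate here.
struct Throttler {
  explicit Throttler(uint64_t bytes_per_sec) : rate(bytes_per_sec) {}
  uint64_t rate;
};

// Options for one backup or restore (hypothetical field names).
struct TransferOptions {
  uint64_t rate_limit = 0;               // used only if no throttler is given
  std::shared_ptr<Throttler> throttler;  // optional, may be shared
};

std::shared_ptr<Throttler> ChooseThrottler(const TransferOptions& opts) {
  if (opts.throttler) {
    return opts.throttler;  // shared by every operation that passes it
  }
  if (opts.rate_limit > 0) {
    return std::make_shared<Throttler>(opts.rate_limit);  // private one
  }
  return nullptr;  // no rate limiting (assumption for rate_limit == 0)
}
```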

Test Plan: Added unit tests

Reviewers: ldemailly, andrewkr, sdong

Reviewed By: andrewkr

Subscribers: igor, yiwu, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D56265
2016-06-03 17:02:07 -07:00
Andrew Kryczka
af0c9ac01d Env registry for URI-based Env selection [pluggable Env part 1]
Summary:
This enables configurable Envs without recompiling. For example, my
next diff will make env_test test an Env created by NewEnvFromUri(). Then,
users can determine which Env is tested simply by providing the URI for
NewEnvFromUri() (e.g., through a CLI argument or environment variable).

The registration process allows us to register any Env that is linked with the
RocksDB library, so we can register our internal Envs as well.

The registration code is inspired by our internal InitRegistry.

Test Plan: new unit test

Reviewers: IslamAbdelRahman, lightmark, ldemailly, sdong

Reviewed By: sdong

Subscribers: leveldb, dhruba, andrewkr

Differential Revision: https://reviews.facebook.net/D58449
2016-06-03 08:15:16 -07:00
sdong
c40c4cae14 LDBCommand::SelectCommand to use a struct as the parameter
Summary: The decorated name of the function wrapper for LDBCommand::SelectCommand is too long, so the Windows build fails with the warning "decorated name length exceeded, name was truncated". Shrink the length by using a struct.

Test Plan: Build on both of Linux and Windows and make sure the warning doesn't show in either platform.

Reviewers: andrewkr, adsharma, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D58965
2016-05-31 10:26:53 -07:00
sdong
0e20000171 LDBCommand::InitFromCmdLineArgs() to move from template to function wrapper
Summary:
The build fails under some compiler settings with

tools/reduce_levels_test.cc:97: undefined reference to `rocksdb::LDBCommand* rocksdb::LDBCommand::InitFromCmdLineArgs<rocksdb::LDBCommand* (*)(std::string const&, std::vector<std::string, std::allocator<std::string> > const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, std::vector<std::string, std::allocator<std::string> > const&)>(std::vector<std::string, std::allocator<std::string> > const&, rocksdb::Options const&, rocksdb::LDBOptions const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const*, rocksdb::LDBCommand* (*)(std::string const&, std::vector<std::string, std::allocator<std::string> > const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, std::vector<std::string, std::allocator<std::string> > const&))'

Fix it by using a function pointer instead of a template.

Test Plan: Run all existing tests

Reviewers: andrewkr, kradhakrishnan, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: adsharma, lightmark, yiwu, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D58905
2016-05-27 11:07:48 -07:00
Aaron Gao
5d660258e7 add simulator Cache as class SimCache/SimLRUCache(with test)
Summary: add class SimCache (base class with an instrumentation API) and SimLRUCache (derived class with the detailed implementation), which is used as an instrumented block cache that can predict the hit rate for different cache sizes
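A minimal model of a simulated cache, assuming plain LRU with capacity counted in charged bytes. It records only keys and their charges (no values), which is what makes tracking a large simulated capacity cheap. `SimLru` is a hypothetical name, not the SimCache/SimLRUCache classes.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

class SimLru {
 public:
  explicit SimLru(uint64_t capacity) : capacity_(capacity) {}

  // Records a lookup; on a miss the key is inserted as if it were cached.
  bool Lookup(const std::string& key, uint64_t charge) {
    ++lookups_;
    auto it = index_.find(key);
    if (it != index_.end()) {
      ++hits_;
      lru_.splice(lru_.begin(), lru_, it->second);  // move to MRU position
      return true;
    }
    lru_.emplace_front(key, charge);
    index_[key] = lru_.begin();
    usage_ += charge;
    while (usage_ > capacity_ && !lru_.empty()) {  // evict from the LRU end
      usage_ -= lru_.back().second;
      index_.erase(lru_.back().first);
      lru_.pop_back();
    }
    return false;
  }

  double HitRate() const {
    return lookups_ == 0 ? 0.0 : static_cast<double>(hits_) / lookups_;
  }

 private:
  using Entry = std::pair<std::string, uint64_t>;  // key, charge
  uint64_t capacity_;
  uint64_t usage_ = 0;
  uint64_t lookups_ = 0;
  uint64_t hits_ = 0;
  std::list<Entry> lru_;  // front = most recently used
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

Feeding the same key stream to caches of different capacities shows how the hit rate would change with cache size, which is the kind of comparison the db_bench runs below make.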

Test Plan:
Add a test case in `db_block_cache_test.cc` called `SimCacheTest` to test basic logic of SimCache.
Also add option `-simcache_size` in db_bench. If it is set to a value other than -1, the benchmark uses this value as the size of the simulator cache and outputs the simulation result at the end.
```
[gzh@dev9927.prn1 ~/local/rocksdb] ./db_bench -benchmarks "fillseq,readrandom" -cache_size 1000000 -simcache_size 1000000
RocksDB:    version 4.8
Date:       Tue May 17 16:56:16 2016
CPU:        32 * Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
CPUCache:   20480 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
Write rate: 0 bytes/second
Compression: Snappy
Memtablerep: skip_list
Perf Level: 0
WARNING: Assertions are enabled; benchmarks unnecessarily slow
------------------------------------------------
DB path: [/tmp/rocksdbtest-112628/dbbench]
fillseq      :       6.809 micros/op 146874 ops/sec;   16.2 MB/s
DB path: [/tmp/rocksdbtest-112628/dbbench]
readrandom   :       6.343 micros/op 157665 ops/sec;   17.4 MB/s (1000000 of 1000000 found)

SIMULATOR CACHE STATISTICS:
SimCache LOOKUPs: 986559
SimCache HITs:    264760
SimCache HITRATE: 26.84%

[gzh@dev9927.prn1 ~/local/rocksdb] ./db_bench -benchmarks "fillseq,readrandom" -cache_size 1000000 -simcache_size 10000000
RocksDB:    version 4.8
Date:       Tue May 17 16:57:10 2016
CPU:        32 * Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
CPUCache:   20480 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
Write rate: 0 bytes/second
Compression: Snappy
Memtablerep: skip_list
Perf Level: 0
WARNING: Assertions are enabled; benchmarks unnecessarily slow
------------------------------------------------
DB path: [/tmp/rocksdbtest-112628/dbbench]
fillseq      :       5.066 micros/op 197394 ops/sec;   21.8 MB/s
DB path: [/tmp/rocksdbtest-112628/dbbench]
readrandom   :       6.457 micros/op 154870 ops/sec;   17.1 MB/s (1000000 of 1000000 found)

SIMULATOR CACHE STATISTICS:
SimCache LOOKUPs: 1059764
SimCache HITs:    374501
SimCache HITRATE: 35.34%

[gzh@dev9927.prn1 ~/local/rocksdb] ./db_bench -benchmarks "fillseq,readrandom" -cache_size 1000000 -simcache_size 100000000
RocksDB:    version 4.8
Date:       Tue May 17 16:57:32 2016
CPU:        32 * Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
CPUCache:   20480 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
Write rate: 0 bytes/second
Compression: Snappy
Memtablerep: skip_list
Perf Level: 0
WARNING: Assertions are enabled; benchmarks unnecessarily slow
------------------------------------------------
DB path: [/tmp/rocksdbtest-112628/dbbench]
fillseq      :       5.632 micros/op 177572 ops/sec;   19.6 MB/s
DB path: [/tmp/rocksdbtest-112628/dbbench]
readrandom   :       6.892 micros/op 145094 ops/sec;   16.1 MB/s (1000000 of 1000000 found)

SIMULATOR CACHE STATISTICS:
SimCache LOOKUPs: 1150767
SimCache HITs:    1034535
SimCache HITRATE: 89.90%
```

Reviewers: IslamAbdelRahman, andrewkr, sdong

Reviewed By: sdong

Subscribers: MarkCallaghan, andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57999
2016-05-23 23:35:23 -07:00
Arun Sharma
5c06e0814c [ldb] Templatize the Selector
Summary:
So a customized ldb tool can pass its own Selector.
Such a selector is expected to call LDBCommand::SelectCommand
and then add some of its own customized commands
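A toy model of the pattern described: a customized selector first delegates to the built-in selector and then handles its own extra commands. `Command`, `DefaultSelect`, `CustomSelect`, `RunSelector` and the command names are hypothetical and do not mirror the LDBCommand signatures; a plain function pointer is used as the selector type.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct Command {
  virtual ~Command() = default;
  virtual std::string Name() const = 0;
};
struct GetCommand : Command {
  std::string Name() const override { return "get"; }
};
struct MyCustomCommand : Command {
  std::string Name() const override { return "my_custom"; }
};

// Selector type: maps a command name to a command object (or nullptr).
using Selector = std::unique_ptr<Command> (*)(const std::string& name);

// Stand-in for the built-in selector.
std::unique_ptr<Command> DefaultSelect(const std::string& name) {
  if (name == "get") return std::make_unique<GetCommand>();
  return nullptr;
}

// Customized selector: try the built-in commands first, then our own.
std::unique_ptr<Command> CustomSelect(const std::string& name) {
  if (auto cmd = DefaultSelect(name)) return cmd;
  if (name == "my_custom") return std::make_unique<MyCustomCommand>();
  return nullptr;
}

// A runner parameterized by the selector it should use.
std::unique_ptr<Command> RunSelector(Selector select, const std::string& name) {
  return select(name);
}
```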

Test Plan: make ldb

Reviewers: sdong, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57249
2016-05-13 12:12:39 -07:00
Arun Sharma
49815e3841 [ldb] Export LDBCommandRunner
Summary:
The implementation remains where it is. Only the
header is exported, so that a customized
ldb tool can print help along with its own
extra commands.

Test Plan: make ldb

Reviewers: sdong, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57255
2016-05-11 13:08:45 -07:00
Andrew Kryczka
5c1c904877 ldb option for compression dictionary size
Summary:
Expose the option so it's easy to run offline tests of the compression
dictionary feature.

Test Plan:
Verified that the compression dictionary is loaded into lz4 for the command below:

  $ ./ldb compact --compression_type=lz4 --compression_max_dict_bytes=16384 --db=/tmp/feed-compression-test/

Reviewers: IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57441
2016-05-10 16:33:47 -07:00
Reid Horuff
8a66c85e90 [rocksdb] Two Phase Transaction
Summary:
Adds two-phase commit (2PC) support to RocksDB.

See wiki: https://github.com/facebook/rocksdb/wiki/Two-Phase-Commit-Implementation
Quip: https://fb.quip.com/pxZrAyrx53r3

Depends on:
WriteBatch modification: https://reviews.facebook.net/D54093
Memtable Log Referencing and Prepared Batch Recovery: https://reviews.facebook.net/D56919

Test Plan:
- SimpleTwoPhaseTransactionTest
- PersistentTwoPhaseTransactionTest
- TwoPhaseRollbackTest
- TwoPhaseMultiThreadTest
- TwoPhaseLogRollingTest
- TwoPhaseEmptyWriteTest
- TwoPhaseExpirationTest

Reviewers: IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: leveldb, hermanlee4, andrewkr, vasilep, dhruba, santoshb

Differential Revision: https://reviews.facebook.net/D56925
2016-05-10 14:06:07 -07:00
Reid Horuff
0460e9dcce Modification of WriteBatch to support two phase commit
Summary: Adds three new WriteBatch data types: Prepare(xid), Commit(xid), Rollback(xid). Prepare(xid) should precede the (single) operation to which it applies. There can be multiple Prepare(xid) markers. There should be only one Rollback(xid) or Commit(xid) marker, but not both. None of this logic is currently enforced; it will most likely be implemented further up, such as in the MemTableInserter. All three markers are similar to PutLogData in that they are WriteBatch metadata, i.e., stored but not counted. All three markers differ from PutLogData in that they will actually be written to disk. In WriteBatchWithIndex, Prepare, Commit, and Rollback are all implemented the same way as PutLogData, and, like PutLogData, none of them are tested.
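
The marker rule described above (any number of Prepare(xid) markers, at most one Commit(xid) or Rollback(xid), never both) is not enforced by WriteBatch itself. As a minimal sketch of the kind of check a higher layer could run, assuming the markers are available as a flat list, the C++ below validates that rule. MarkerType, Marker and ValidateTwoPhaseMarkers are hypothetical names, not RocksDB APIs.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical marker kinds standing in for the three new WriteBatch record types.
enum class MarkerType { kPrepare, kCommit, kRollback };

struct Marker {
  MarkerType type;
  std::string xid;
};

// Checks the intended rule: any number of Prepare(xid) markers, but at most
// one terminal marker (Commit or Rollback, never both) per xid.
bool ValidateTwoPhaseMarkers(const std::vector<Marker>& markers) {
  std::unordered_set<std::string> finished;  // xids that already have a terminal marker
  for (const Marker& m : markers) {
    if (m.type == MarkerType::kPrepare) continue;  // repeated Prepare(xid) is allowed
    // insert() reports false in .second when the xid was already finished.
    if (!finished.insert(m.xid).second) return false;
  }
  return true;
}
```

A set of finished xids keeps the check to a single pass over the markers.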

Test Plan: single unit test in write_batch_test.

Reviewers: hermanlee4, sdong, anthony

Subscribers: leveldb, dhruba, vasilep, andrewkr

Differential Revision: https://reviews.facebook.net/D57867
2016-05-10 14:06:07 -07:00
Andrew Kryczka
269f6b2e2d Revert "Modification of WriteBatch to support two phase commit"
Summary: Revert D54093 and D57453

Test Plan: running make check

Reviewers: horuff, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57819
2016-05-06 16:58:24 -07:00
Arun Sharma
04dec2a359 [ldb] Export ldb_cmd*.h
Summary:
This is needed so that RocksDB users can extend
the included ldb tool with their own custom
commands.

Test Plan: make -j ldb

Reviewers: sdong, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57243
2016-05-06 16:09:09 -07:00
Reid Horuff
6e56a114be Modification of WriteBatch to support two phase commit
Summary: Adds three new WriteBatch data types: Prepare(xid), Commit(xid), Rollback(xid). Prepare(xid) should precede the (single) operation to which it applies. There can be multiple Prepare(xid) markers. There should be only one Rollback(xid) or Commit(xid) marker, but not both. None of this logic is currently enforced; it will most likely be implemented further up, such as in the MemTableInserter. All three markers are similar to PutLogData in that they are WriteBatch metadata, i.e., stored but not counted. All three markers differ from PutLogData in that they will actually be written to disk. In WriteBatchWithIndex, Prepare, Commit, and Rollback are all implemented the same way as PutLogData, and, like PutLogData, none of them are tested.

Test Plan: single unit test in write_batch_test.

Reviewers: hermanlee4, sdong, anthony

Subscribers: andrewkr, vasilep, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D54093
2016-04-29 11:50:30 -07:00
Li Peng
6d4832a998 Merge pull request #1101 from flyd1005/wip-fix-typo
fix typos and remove duplicated words
2016-04-28 02:30:44 -07:00
Andrew Kryczka
40b840f294 Delete deprecated *BackupableDB interface for backups
Summary:
This interface is redundant and has been deprecated for a while.
It's also unused internally. Let's delete it.

I moved the comments to the corresponding functions in BackupEngine/
BackupEngineReadOnly. This caused the diff tool to not work cleanly.

Test Plan:
unit tests

  $ ./backupable_db_test

Reviewers: yhchiang, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D56331
2016-04-18 09:04:14 -07:00
Uddipta Maity
b55e2165be Rocksdb backup can store optional application specific metadata
Summary:
The RocksDB backup engine maintains metadata about backups in separate files,
but there was no way to add extra application-specific data to it. This diff
adds support for that.
In some use cases, applications decide whether to restore a backup based on
some metadata. This lets those applications make that decision cheaply.

Test Plan:
Added a unit test. Existing ones are passing

Sample meta file for the BinaryMetadata test:

```

1459454043
0
metadata 6162630A64656600676869
2
private/1/MANIFEST-000001 crc32 1184723444
private/1/CURRENT crc32 3505765120

```
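
In the sample above, the `metadata` line appears to store the application's bytes hex-encoded: 6162630A64656600676869 decodes to "abc\ndef\0ghi", which includes a newline and a NUL that could not be written raw into a line-oriented file. A minimal C++ sketch of such an encoding, assuming uppercase hex as in the sample; HexEncode and HexDecode are illustrative helpers, not the backup engine's code.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Encodes arbitrary bytes (including '\n' and '\0') as uppercase hex.
std::string HexEncode(const std::string& bytes) {
  static const char kDigits[] = "0123456789ABCDEF";
  std::string out;
  out.reserve(bytes.size() * 2);
  for (unsigned char c : bytes) {
    out.push_back(kDigits[c >> 4]);    // high nibble
    out.push_back(kDigits[c & 0x0F]);  // low nibble
  }
  return out;
}

// Decodes a hex string; accepts upper- or lowercase digits.
std::string HexDecode(const std::string& hex) {
  if (hex.size() % 2 != 0) throw std::invalid_argument("odd-length hex string");
  auto nibble = [](char h) -> int {
    if (h >= '0' && h <= '9') return h - '0';
    if (h >= 'A' && h <= 'F') return h - 'A' + 10;
    if (h >= 'a' && h <= 'f') return h - 'a' + 10;
    throw std::invalid_argument("invalid hex digit");
  };
  std::string out;
  out.reserve(hex.size() / 2);
  for (std::size_t i = 0; i < hex.size(); i += 2) {
    out.push_back(static_cast<char>(nibble(hex[i]) * 16 + nibble(hex[i + 1])));
  }
  return out;
}
```

Hex doubles the size of the metadata but keeps every byte value, including '\0', safe inside a text file.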

Reviewers: sdong, ldemailly, andrewkr

Reviewed By: andrewkr

Subscribers: andrewkr, dhruba, ldemailly

Differential Revision: https://reviews.facebook.net/D56007
2016-04-01 10:56:52 -07:00
agiardullo
2200295ee1 optimistic transactions support for reinitialization
Summary: Extend the optimization in D53835 to optimistic transactions for completeness.

Test Plan: added test

Reviewers: sdong, IslamAbdelRahman, horuff, jkedgar

Reviewed By: horuff

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D55059
2016-03-07 19:03:09 -08:00
Andrew Kryczka
501927ffc4 [backupable db] Remove file size embedded in name workaround
Summary:
Now that we get sizes efficiently, we no longer need the workaround to
embed file size in filename.

Test Plan:
  $ ./backupable_db_test

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D55035
2016-03-03 13:32:20 -08:00
agiardullo
5ea9aa3c14 TransactionDB:ReinitializeTransaction
Summary: Add function to reinitialize a transaction object so that it can be reused.  This is an optimization so users can potentially avoid reallocating transaction objects.

Test Plan: added tests

Reviewers: yhchiang, kradhakrishnan, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: jkedgar, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D53835
2016-02-29 16:27:32 -08:00
agiardullo
cd3fe675a9 Remove stale TODO
Summary: This was fixed by 0c2bd5cb

Test Plan: n/a

Reviewers: gabijs

Reviewed By: gabijs

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D54753
2016-02-25 17:44:35 -08:00
Yueh-Hsuan Chiang
730a422c3a Improve the documentation of LoadLatestOptions
Summary: Improve the documentation of LoadLatestOptions

Test Plan: No code change

Reviewers: anthony, IslamAbdelRahman, kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D54087
2016-02-16 14:55:24 -08:00
Baraa Hamodi
21e95811d1 Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
agiardullo
fe93bf9b5d Transaction::UndoGetForUpdate
Summary: MyRocks wants to be able to unlock a key that was just locked by GetForUpdate().  To do this safely, I am now keeping track of the number of reads (for update) and writes for each key in a transaction.  UndoGetForUpdate() will only unlock a key if it hasn't been written and the read count reaches 0.
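
The bookkeeping described above (count reads-for-update and writes per key, and release the lock only when there are no writes and the read count drops to 0) can be sketched as follows. This is a minimal single-transaction C++ model; KeyLockTracker and its methods are hypothetical names, not RocksDB's internal classes, and "locked" here just means the key is tracked.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Toy per-transaction lock tracker (hypothetical, not RocksDB's implementation).
class KeyLockTracker {
 public:
  // Called on GetForUpdate(): lock the key (if needed) and count the read.
  void OnGetForUpdate(const std::string& key) { ++info_[key].num_reads; }

  // Called on Put/Merge/Delete: a written key must stay locked until commit.
  void OnWrite(const std::string& key) { ++info_[key].num_writes; }

  // Undoes one GetForUpdate(); returns true if the key was actually unlocked.
  bool UndoGetForUpdate(const std::string& key) {
    auto it = info_.find(key);
    if (it == info_.end()) return false;  // key not locked by this transaction
    KeyInfo& ki = it->second;
    if (ki.num_reads > 0) --ki.num_reads;
    if (ki.num_writes == 0 && ki.num_reads == 0) {
      info_.erase(it);  // release the lock
      return true;
    }
    return false;
  }

  bool IsLocked(const std::string& key) const { return info_.count(key) > 0; }

 private:
  struct KeyInfo {
    int num_reads = 0;
    int num_writes = 0;
  };
  std::unordered_map<std::string, KeyInfo> info_;
};
```

Counting reads rather than using a boolean is what makes nested GetForUpdate()/UndoGetForUpdate() pairs safe: only the last undo releases the key.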

Test Plan: more unit tests

Reviewers: igor, rven, yhchiang, spetrunia, sdong

Reviewed By: spetrunia, sdong

Subscribers: spetrunia, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47043
2016-02-09 10:46:11 -08:00
agiardullo
eff309867e Do not use timed_mutex in TransactionDB
Summary: Stopped using std::timed_mutex as it has known issues in older versions of gcc.  Ran into these problems when testing MongoRocks.

Test Plan: unit tests.  Manual mongo testing on gcc 4.8.

Reviewers: igor, yhchiang, rven, IslamAbdelRahman, kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D52197
2015-12-18 17:26:02 -08:00
agiardullo
84f98792d6 Transaction::SetWriteOptions()
Summary: Add support to change write options after creating a transaction.  This is needed for MongoRocks.

Test Plan: added test

Reviewers: sdong, rven, kradhakrishnan, IslamAbdelRahman, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D51867
2015-12-11 16:08:25 -08:00
Igor Canadi
64fa43843b Merge pull request #862 from ceph/wip-env
implement EnvMirror
2015-12-10 18:45:07 -08:00
Sage Weil
2074ddd625 env: add EnvMirror
This is an Env implementation that mirrors all storage-related methods on
two different backend Env's and verifies that they return the same
results (return status and read results).  This is useful for implementing
a new Env and verifying its correctness.
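
The mirroring idea above can be sketched on a much smaller interface: forward every call to two backends and fail loudly when their statuses or read results differ. Storage, MemStorage and MirrorStorage are hypothetical stand-ins (RocksDB's Env has many more methods), and this sketch throws on a mismatch, whereas the real EnvMirror may report it differently.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical minimal storage interface standing in for Env.
class Storage {
 public:
  virtual ~Storage() = default;
  virtual bool Write(const std::string& name, const std::string& data) = 0;
  virtual bool Read(const std::string& name, std::string* data) = 0;
};

// Simple in-memory backend, usable as the reference implementation.
class MemStorage : public Storage {
 public:
  bool Write(const std::string& name, const std::string& data) override {
    files_[name] = data;
    return true;
  }
  bool Read(const std::string& name, std::string* data) override {
    auto it = files_.find(name);
    if (it == files_.end()) return false;
    *data = it->second;
    return true;
  }

 private:
  std::map<std::string, std::string> files_;
};

// Forwards each call to both backends and throws if they disagree
// on the return status or on the bytes read.
class MirrorStorage : public Storage {
 public:
  MirrorStorage(Storage* a, Storage* b) : a_(a), b_(b) {}
  bool Write(const std::string& name, const std::string& data) override {
    bool ra = a_->Write(name, data);
    bool rb = b_->Write(name, data);
    if (ra != rb) throw std::logic_error("Write status mismatch: " + name);
    return ra;
  }
  bool Read(const std::string& name, std::string* data) override {
    std::string da, db;
    bool ra = a_->Read(name, &da);
    bool rb = b_->Read(name, &db);
    if (ra != rb || da != db) throw std::logic_error("Read mismatch: " + name);
    if (ra) *data = da;
    return ra;
  }

 private:
  Storage* a_;
  Storage* b_;
};
```

In practice one backend would be a trusted implementation and the other the new Env under test.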

Signed-off-by: Sage Weil <sage@redhat.com>
2015-12-10 21:32:45 -05:00
charsyam
c30b499541 fix typos in comments 2015-12-11 01:54:48 +09:00
Jay Edgar
b28b7c6dd9 Added callback notification when a snapshot is created
Summary: When SetSnapshot() is used the caller immediately knows a snapshot has been created, but when SetSnapshotOnNextOperation() is used the caller needs a way to get notified when that snapshot has been generated.  This creates an interface that the client can implement that will be called at the time the snapshot is created.
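
A minimal C++ sketch of the deferred-snapshot-with-callback pattern described above: the snapshot is only taken at the transaction's next operation, and the notifier is invoked at that moment. Snapshot, SnapshotNotifier, ToyTxn and RecordingNotifier are hypothetical stand-ins, not RocksDB's actual types or signatures; the sequence number is modeled as a plain counter shared with the "DB".

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-ins; RocksDB's actual types differ.
struct Snapshot {
  unsigned long long seq;
};

class SnapshotNotifier {
 public:
  virtual ~SnapshotNotifier() = default;
  virtual void SnapshotCreated(const Snapshot* snapshot) = 0;
};

class ToyTxn {
 public:
  explicit ToyTxn(unsigned long long* db_seq) : db_seq_(db_seq) {}

  // Immediate: the caller knows the snapshot as soon as this returns.
  void SetSnapshot() { TakeSnapshot(); }

  // Deferred: the snapshot is taken at the next operation, and the
  // notifier (if any) is told at that moment.
  void SetSnapshotOnNextOperation(std::shared_ptr<SnapshotNotifier> notifier) {
    pending_ = true;
    notifier_ = std::move(notifier);
  }

  void Put() {
    MaybeTakePendingSnapshot();
    ++*db_seq_;  // each write advances the DB sequence number
  }

  const Snapshot* snapshot() const { return snapshot_.get(); }

 private:
  void MaybeTakePendingSnapshot() {
    if (!pending_) return;
    pending_ = false;
    TakeSnapshot();
    if (notifier_) notifier_->SnapshotCreated(snapshot_.get());
  }
  void TakeSnapshot() { snapshot_ = std::make_unique<Snapshot>(Snapshot{*db_seq_}); }

  unsigned long long* db_seq_;
  bool pending_ = false;
  std::shared_ptr<SnapshotNotifier> notifier_;
  std::unique_ptr<Snapshot> snapshot_;
};

// Example notifier that records what it was told.
class RecordingNotifier : public SnapshotNotifier {
 public:
  void SnapshotCreated(const Snapshot* s) override {
    seen_seq = s->seq;
    ++calls;
  }
  unsigned long long seen_seq = 0;
  int calls = 0;
};
```

The snapshot's sequence number reflects the DB state at the first operation, not at the time SetSnapshotOnNextOperation() was called, which is why the caller needs the callback.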

Test Plan: Added a new SetSnapshotOnNextOperationWithNotification test into the transaction_test.

Reviewers: sdong, anthony

Reviewed By: anthony

Subscribers: yoshinorim, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D51177
2015-12-04 10:20:36 -08:00
Alex Yang
e8180f9901 added public api to schedule flush/compaction, code to prevent race with db::open
Summary:
Fixes T8781168.

Added a new function EnableAutoCompactions in db.h to be publicly
avialable.  This allows compaction to be re-enabled after disabling it via
SetOptions

Refactored code to set the dbptr earlier on in TransactionDB::Open and DB::Open
Temporarily disable auto_compaction in TransactionDB::Open until dbptr is set to
prevent race condition.

Test Plan:
Ran make all check

Verified the fix on the MyRocks side:
I was able to reproduce the segfault with
../tools/mysqltest.sh --mem --force rocksdb.drop_table

My method was to manually sleep the thread after DB::Open but before the TransactionDB ptr was
assigned in transaction_db_impl.cc:
  DB::Open(db_options, dbname, column_families_copy, handles, &db);
  clock_t goal = (60000 * 10) + clock();
  while (goal > clock());
  ...dbptr(aka rdb) gets assigned below

Verified that my changes fixed the issue.

Also added unit test 'ToggleAutoCompaction' in transaction_test.cc

Reviewers: hermanlee4, anthony

Reviewed By: anthony

Subscribers: alex, dhruba

Differential Revision: https://reviews.facebook.net/D51147
2015-12-03 22:59:44 -08:00
sdong
6bbfa1874b BackupDB to have a mode to use file size in file name
Summary: Getting the file size of every backup file can take a long time. In some cases, the sizes are available in the file names. We add a mode that reads those sizes from the file names.

Test Plan:
Make some unit tests in backupable_db_test to run in such a mode.
Make sure RocksDB Lite builds too.

Reviewers: IslamAbdelRahman, rven, yhchiang, kradhakrishnan, anthony, igor

Reviewed By: igor

Subscribers: muthu, asameet, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D51243
2015-11-25 11:55:37 -08:00
Yueh-Hsuan Chiang
d781da8164 Add CheckOptionsCompatibility() API to options_util
Summary:
Add CheckOptionsCompatibility() API to options_util that returns
Status::OK if the input DBOptions and ColumnFamilyDescriptors
are compatible with the latest options stored in the specified DB path.

Test Plan: Added tests in options_util_test

Reviewers: igor, anthony, IslamAbdelRahman, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D50649
2015-11-12 16:52:51 -08:00
Yueh-Hsuan Chiang
e11f676e34 Add OptionsUtil::LoadOptionsFromFile() API
Summary:
This patch adds OptionsUtil::LoadOptionsFromFile() and
OptionsUtil::LoadLatestOptionsFromDB(), which allow developers
to construct DBOptions and ColumnFamilyOptions from a RocksDB
options file.  Note that most pointer-typed options such as
merge_operator will not be constructed.

With this API, developers no longer need to remember all the
options in order to reopen an existing rocksdb instance like
the following:

  DBOptions db_options;
  std::vector<std::string> cf_names;
  std::vector<ColumnFamilyOptions> cf_opts;

  // Load primitive-typed options from an existing DB
  OptionsUtil::LoadLatestOptionsFromDB(
      dbname, &db_options, &cf_names, &cf_opts);

  // Initialize necessary pointer-typed options
  cf_opts[0].merge_operator.reset(new MyMergeOperator());
  ...

  // Construct the vector of ColumnFamilyDescriptor
  std::vector<ColumnFamilyDescriptor> cf_descs;
  for (size_t i = 0; i < cf_opts.size(); ++i) {
    cf_descs.emplace_back(cf_names[i], cf_opts[i]);
  }

  // Open the DB
  DB* db = nullptr;
  std::vector<ColumnFamilyHandle*> cf_handles;
  auto s = DB::Open(db_options, dbname, cf_descs,
                    &cf_handles, &db);

Test Plan:
Augment existing tests in column_family_test
options_test
db_test

Reviewers: igor, IslamAbdelRahman, sdong, anthony

Reviewed By: anthony

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D49095
2015-11-12 06:52:43 -08:00
Yueh-Hsuan Chiang
f3ca28ab03 Correct the comment of GetApproximateMemoryUsageByType
Summary: Correct the comment of GetApproximateMemoryUsageByType.

Test Plan: No code change.

Reviewers: igor, sdong, anthony, IslamAbdelRahman

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D50409
2015-11-08 09:02:35 -08:00
Yueh-Hsuan Chiang
7d7ee2b654 Add Memory Insight support to utilities
Summary:
This patch introduces utilities/memory, which currently includes
GetApproximateMemoryUsageByType that reports different types of
rocksdb memory usage given a list of input DBs.

The API also takes care of the case where a Cache is shared
across multiple column families / multiple DB instances.

Currently, it reports memory usage of memtable, table-readers
and cache.
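
The shared-cache case mentioned above matters because summing each DB's cache usage would double-count a cache that several DBs or column families point to. A minimal C++ sketch of the deduplication, assuming each DB exposes plain byte counts and pointers to its caches; MemType, ToyCache, ToyDB and ApproximateMemoryUsageByType are hypothetical names, not the real utilities/memory API.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <unordered_set>
#include <vector>

// Hypothetical usage categories mirroring the three reported above.
enum class MemType { kMemTableTotal, kTableReadersTotal, kCacheTotal };

struct ToyCache {
  std::uint64_t usage;
};

// Hypothetical per-DB view; the real API inspects DB and Cache objects.
struct ToyDB {
  std::uint64_t memtable_bytes;
  std::uint64_t table_reader_bytes;
  std::vector<const ToyCache*> caches;  // may be shared across DBs/CFs
};

std::map<MemType, std::uint64_t> ApproximateMemoryUsageByType(const std::vector<ToyDB>& dbs) {
  std::map<MemType, std::uint64_t> usage{{MemType::kMemTableTotal, 0},
                                         {MemType::kTableReadersTotal, 0},
                                         {MemType::kCacheTotal, 0}};
  std::unordered_set<const ToyCache*> seen;  // count each distinct cache once
  for (const ToyDB& db : dbs) {
    usage[MemType::kMemTableTotal] += db.memtable_bytes;
    usage[MemType::kTableReadersTotal] += db.table_reader_bytes;
    for (const ToyCache* c : db.caches) {
      if (seen.insert(c).second) usage[MemType::kCacheTotal] += c->usage;
    }
  }
  return usage;
}
```

Identifying caches by pointer is the key design choice: two handles to the same cache object contribute its usage only once.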

Test Plan: utilities/memory/memory_test.cc

Reviewers: igor, anthony, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D49257
2015-11-03 17:52:17 -08:00
Yueh-Hsuan Chiang
3ecbab0040 Add GetAggregatedIntProperty(): returns the aggregated value from all CFs
Summary:
This patch adds GetAggregatedIntProperty() that returns the aggregated
value from all CFs

Test Plan: Added a test in db_test

Reviewers: igor, sdong, anthony, IslamAbdelRahman, rven

Reviewed By: rven

Subscribers: rven, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D49497
2015-11-03 15:54:18 -08:00
Satnam Singh
c9aef3c41c Add RocksDb/GeoDb Iterator interface
Summary:
This diff is a first step towards an iterator-based interface for the
SearchRadial method, replacing a vector of GeoObjects with an
iterator over GeoObjects. This diff works by simply wrapping the iterator
of the encapsulated vector of GeoObjects. A future diff could extend
this approach by defining an iterator in terms of the underlying
iteration in SearchRadial, which would then remove the need to have
an in-memory representation of all the matching GeoObjects.
Fixes T8421387

Test Plan:
The existing tests have been modified to work with the new
interface.

Reviewers: IslamAbdelRahman, kradhakrishnan, dhruba, igor

Reviewed By: igor

Subscribers: igor, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D50031
2015-11-03 15:20:58 -08:00
Alexey Maykov
980a82ee2f Fix a bug in GetApproximateSizes
Summary: Need to pass through the memtable parameter.

Test Plan: built, tested through myrocks

Reviewers: igor, sdong, rven

Reviewed By: rven

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D49167
2015-10-21 18:34:39 -07:00
agiardullo
cfaa33f9a5 Update transaction iterator documentation
Summary: Remove warning about an issue that was resolved.  It turned out the issue was a false alarm.

Test Plan: n/a

Reviewers: igor, yhchiang, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D49011
2015-10-19 13:08:18 -07:00
Alexey Maykov
f18acd8875 Fixed the clang compilation failure
Summary: As above.

Test Plan: USE_CLANG=1 make check -j

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D48981
2015-10-19 10:38:50 -07:00
Alexey Maykov
e1a09a7703 Implementation for GetPropertiesOfTablesInRange
Summary: In MyRocks, it is sometimes important to get properties only for a subset of the database. This diff implements the API in RocksDB.

Test Plan: ran the GetPropertiesOfTablesInRange

Reviewers: rven, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D48651
2015-10-17 13:34:43 -07:00
Jay Edgar
8f143e03fb Add ClearSnapshot()
Summary:
MyRocks needs the ability to clear a snapshot for Read Committed support

Test Plan: transaction_test

Reviewers: anthony

Reviewed By: anthony

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D48861
2015-10-16 11:53:30 -07:00
agiardullo
def74f8763 Deferred snapshot creation in transactions
Summary: Support for Transaction::CreateSnapshotOnNextOperation().  This is to fix a write-conflict race-condition that Yoshinori was running into when testing MyRocks with LinkBench.

Test Plan: New tests

Reviewers: yhchiang, spetrunia, rven, igor, yoshinorim, sdong

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D48099
2015-10-09 15:46:16 -07:00
agiardullo
c5f3707d42 DisableIndexing() for Transactions
Summary:
MyRocks reported some performance issues when inserting many keys into a transaction due to the cost of inserting new keys into WriteBatchWithIndex.  Frequently, they don't even need the keys to be indexed as they don't need to read them back.  DisableIndexing() can be used to avoid the cost of indexing.

I also plan on eventually investigating whether we can improve WriteBatchWithIndex performance.  But even if we improved the perf here, it is still beneficial to be able to disable the indexing altogether for large transactions.

Test Plan: unit test

Reviewers: igor, rven, yoshinorim, spetrunia, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D48471
2015-10-09 15:36:09 -07:00
Igor Canadi
115427ef63 Add APIs PauseBackgroundWork() and ContinueBackgroundWork()
Summary:
To support a new MongoDB capability, we need to make sure that we don't do any IO for a short period of time. For background, see:
* https://jira.mongodb.org/browse/SERVER-20704
* https://jira.mongodb.org/browse/SERVER-18899

To implement that, I added new API calls, PauseBackgroundWork() and ContinueBackgroundWork(), which reuse the capability we already have in place for the RefitLevel() function.

Test Plan: Added a new test in db_test. Made sure that test fails when PauseBackgroundWork() is commented out.

Reviewers: IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47901
2015-10-02 13:17:34 -07:00
agiardullo
03b08ba9a9 Return MergeInProgress when fetching from transactions or WBWI with overwrite_key
Summary:
WriteBatchWithIndex::GetFromBatchAndDB only works correctly for overwrite_key=false.  Transactions use overwrite_key=true (since WriteBatchWithIndex::GetIteratorWithBase only works when overwrite_key=true).  So currently, Transactions could return incorrectly merged results when calling Get/GetForUpdate().

Until a permanent fix can be put in place, Transaction::Get[ForUpdate] and WriteBatchWithIndex::GetFromBatch[AndDB] will now return MergeInProgress if the most recent write to a key in the batch is a Merge.
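
The interim rule above (look at the most recent write to a key in the batch, and return MergeInProgress if it is a Merge instead of attempting a possibly wrong merge) can be sketched over a plain vector of operations. OpType, Code, BatchEntry and GetFromBatch are hypothetical names for illustration; this is not the WriteBatchWithIndex implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class OpType { kPut, kDelete, kMerge };
enum class Code { kOk, kNotFound, kMergeInProgress };

struct BatchEntry {
  OpType type;
  std::string key;
  std::string value;
};

// Scans the batch newest-first and decides based only on the most recent
// write to `key`; refuses to answer when that write is a Merge.
Code GetFromBatch(const std::vector<BatchEntry>& batch, const std::string& key,
                  std::string* value) {
  for (auto it = batch.rbegin(); it != batch.rend(); ++it) {
    if (it->key != key) continue;
    switch (it->type) {
      case OpType::kPut:
        *value = it->value;
        return Code::kOk;
      case OpType::kDelete:
        return Code::kNotFound;
      case OpType::kMerge:
        return Code::kMergeInProgress;
    }
  }
  return Code::kNotFound;  // not in batch; a real GetFromBatchAndDB would consult the DB
}
```

Returning a distinct status is conservative: callers get an explicit "cannot answer" rather than an incorrectly merged value.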

Test Plan: more tests

Reviewers: sdong, yhchiang, rven, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47817
2015-09-30 11:14:42 -07:00
agiardullo
afe0dc539b SingleDelete support for Transactions
Summary: Transactional SingleDelete is needed for MyRocks.  Note: This diff requires D47529.

Test Plan: Added some new tests in this diff as well as more tests added in D47529

Reviewers: rven, sdong, igor, yhchiang

Reviewed By: yhchiang

Subscribers: yoshinorim, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47535
2015-09-28 12:14:26 -07:00
agiardullo
25fd743d75 Fix SingleDelete support in WriteBatchWithIndex
Summary: Fixed some bugs in using SingleDelete on a WriteBatchWithIndex and added some tests.

Test Plan: new tests

Reviewers: sdong, yhchiang, rven, kradhakrishnan, IslamAbdelRahman, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47529
2015-09-25 12:23:07 -07:00
Islam AbdelRahman
f03b5c987b Add experimental DB::AddFile() to plug sst files into empty DB
Summary:
This is an initial version of bulk load feature

This diff allows us to create sst files and then bulk load them later. Right now, the restrictions for loading an sst file are:
(1) Memtables are empty
(2) Added sst files have sequence number = 0, and existing values in database have sequence number = 0
(3) Values in the added sst files do not overlap
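
The three restrictions above can be expressed as a pre-check, sketched below in C++. Two assumptions are made and should be read as such: restriction (2) on existing data is modeled as "the DB's latest sequence number is 0", and restriction (3) is read as "no added file's key range overlaps another added file or existing data", with keys compared bytewise. KeyRange, SstFileInfo and CanAddFiles are hypothetical names, not the DB::AddFile implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct KeyRange {
  std::string smallest;
  std::string largest;
};

struct SstFileInfo {
  KeyRange range;
  std::uint64_t sequence_number;
};

// Inclusive ranges overlap unless one ends before the other starts (bytewise order).
bool Overlaps(const KeyRange& a, const KeyRange& b) {
  return !(a.largest < b.smallest || b.largest < a.smallest);
}

bool CanAddFiles(bool memtables_empty, std::uint64_t db_max_seqno,
                 const std::vector<KeyRange>& existing,
                 const std::vector<SstFileInfo>& files) {
  if (!memtables_empty) return false;   // (1) memtables must be empty
  if (db_max_seqno != 0) return false;  // (2) existing data must have seqno 0
  for (std::size_t i = 0; i < files.size(); ++i) {
    if (files[i].sequence_number != 0) return false;  // (2) added files have seqno 0
    for (const KeyRange& e : existing) {
      if (Overlaps(files[i].range, e)) return false;  // (3) vs existing data
    }
    for (std::size_t j = i + 1; j < files.size(); ++j) {
      if (Overlaps(files[i].range, files[j].range)) return false;  // (3) vs each other
    }
  }
  return true;
}
```

The pairwise loops are quadratic, which is acceptable for a sketch with a handful of files; sorting by smallest key would make the check linear after the sort.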

Test Plan: unit testing

Reviewers: igor, ott, sdong

Reviewed By: sdong

Subscribers: leveldb, ott, dhruba

Differential Revision: https://reviews.facebook.net/D39081
2015-09-23 12:42:43 -07:00
jsteemann
669b892f97 add missing header required for std::function
Otherwise, Visual Studio has trouble compiling this file.
2015-09-18 22:18:40 +02:00
Andres Noetzli
014fd55adc Support for SingleDelete()
Summary:
This patch fixes #7460559. It introduces SingleDelete as a new database
operation. This operation can be used to delete keys that were never
overwritten (no put following another put of the same key). If an overwritten
key is single-deleted, the behavior is undefined. Single deletion of a
non-existent key has no effect, but multiple consecutive single deletions are
not allowed (see limitations).

In contrast to the conventional Delete() operation, the deletion entry is
removed along with the value when the two are lined up in a compaction. Note:
the semantics are similar to @igor's prototype, which allowed this behavior at
the granularity of a column family
(https://reviews.facebook.net/D42093). This new patch, however, is more
aggressive when it comes to removing tombstones: it removes the SingleDelete
together with the value whenever there is no snapshot between them, while the
older patch only did this when the sequence number of the deletion was older
than the earliest snapshot.
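
The compaction rule above (drop a SingleDelete together with the Put it lines up with, unless some snapshot can see the Put but not the SingleDelete) can be sketched on a sorted list of entries. This is a simplified C++ model, not RocksDB's CompactionIterator: it handles only Put and SingleDelete, assumes internal-key order (key ascending, sequence number descending), and treats a snapshot with sequence number s as seeing entries with seq <= s. Entry, EntryType, SnapshotBetween and CompactSingleDeletes are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

enum class EntryType { kPut, kSingleDelete };

struct Entry {
  std::string key;
  std::uint64_t seq;
  EntryType type;
};

// True if some snapshot sees `older_seq` but not `newer_seq`
// (older_seq <= snap < newer_seq), so the older value must be kept.
bool SnapshotBetween(const std::vector<std::uint64_t>& snapshots,
                     std::uint64_t older_seq, std::uint64_t newer_seq) {
  return std::any_of(snapshots.begin(), snapshots.end(), [&](std::uint64_t s) {
    return older_seq <= s && s < newer_seq;
  });
}

// Input: sorted by key ascending, then seq descending (newest first).
// Simplified: ignores Delete/Merge and leaves unmatched SingleDeletes in place.
std::vector<Entry> CompactSingleDeletes(const std::vector<Entry>& in,
                                        const std::vector<std::uint64_t>& snapshots) {
  std::vector<Entry> out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    const Entry& cur = in[i];
    if (cur.type == EntryType::kSingleDelete && i + 1 < in.size()) {
      const Entry& next = in[i + 1];
      if (next.key == cur.key && next.type == EntryType::kPut &&
          !SnapshotBetween(snapshots, next.seq, cur.seq)) {
        ++i;  // drop both the SingleDelete and the Put it cancels
        continue;
      }
    }
    out.push_back(cur);
  }
  return out;
}
```

This also shows why overwritten keys are unsupported: if a second, older Put sat behind the cancelled pair, it would resurface once both newer entries were dropped.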

Most of the complex additions are in the Compaction Iterator, all other changes
should be relatively straightforward. The patch also includes basic support for
single deletions in db_stress and db_bench.

Limitations:
- Not compatible with cuckoo hash tables
- Single deletions cannot be used in combination with merges and normal
  deletions on the same key (other keys are not affected by this)
- Consecutive single deletions are currently not allowed (an older version of
  this patch supported this, so it could be resurrected if needed)

Test Plan: make all check

Reviewers: yhchiang, sdong, rven, anthony, yoshinorim, igor

Reviewed By: igor

Subscribers: maykov, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43179
2015-09-17 11:42:56 -07:00
Dmytro Okhonko
31a27a3606 Callback for informing backup downloading added
Summary:
For a huge DB backup, information about download progress would help.
A new callback parameter in the CreateNewBackup() function is triggered whenever a certain amount of data has been downloaded.
Task: 8057631

Test Plan:
Added a ProgressCallbackDuringBackup test covering the new functionality to the BackupableDBTest tests.
Other tests succeed as well.

Reviewers: Guenena, benj, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D46575
2015-09-15 17:08:30 -07:00
agiardullo
aa6eed0c1e Transaction stats
Summary: Added functions to fetch the number of locked keys in a transaction, the number of pending puts/merges/deletes, and the elapsed time.

Test Plan: unit tests

Reviewers: yoshinorim, jkedgar, rven, sdong, yhchiang, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D45417
2015-09-09 13:35:53 -07:00
agiardullo
5e94f68f35 TransactionDB Custom Locking API
Summary:
Prototype of API to allow MyRocks to override default Mutex/CondVar used by transactions with their own implementations.  They would simply need to pass their own implementations of Mutex/CondVar to the templated TransactionDB::Open().

Default implementation of TransactionDBMutex/TransactionDBCondVar provided (but the code is not currently changed to use this).

Let me know if this API makes sense or if it should be changed

Test Plan: n/a

Reviewers: yhchiang, rven, igor, sdong, spetrunia

Reviewed By: spetrunia

Subscribers: maykov, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43761
2015-09-08 17:03:57 -07:00
Mayank Pundir
d9f42aa60d Adding a verifyBackup method to BackupEngine
Summary: This diff adds a verifyBackup method to BackupEngine. The method verifies the name and size of each file in the backup.

Test Plan: Unit test cases created and passing.

Reviewers: igor, benj

Subscribers: zelaine.fong, yhchiang, sdong, lgalanis, dhruba, AaronFeldman

Differential Revision: https://reviews.facebook.net/D46029
2015-09-03 17:27:21 -07:00
agiardullo
0f1aab6c12 Add SetLockTimeout for Transactions
Summary: MyRocks wants to be able to change the lock timeout of a transaction that has already started.  Expose existing SetLockTimeout function to users.

Test Plan: unit test

Reviewers: spetrunia, rven, sdong, yhchiang, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D45987
2015-09-02 20:07:19 -07:00
Andres Noetzli
3c9cef1eed Unified maps with Comparator for sorting, other cleanup
Summary:
This diff is a collection of cleanups that were initially part of D43179.
Additionally it adds a unified way of defining key-value maps that use a
Comparator for sorting (this was previously implemented in four different
places).

Test Plan: make clean check all

Reviewers: rven, anthony, yhchiang, sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D45993
2015-09-02 13:58:22 -07:00
agiardullo
c3466eab07 Have Transactions use WriteBatch::RollbackToSavePoint
Summary:
Clean up transactions to use the new RollbackToSavePoint api in WriteBatchWithIndex.

Note, this diff depends on Pessimistic Transactions diff and ManagedSnapshot diff (D40869 and D43293).

Test Plan: unit tests

Reviewers: rven, yhchiang, kradhakrishnan, spetrunia, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43371
2015-08-11 17:53:30 -07:00
agiardullo
0db807ec28 Transaction error statuses
Summary:
Based on feedback from spetrunia, we should better differentiate error statuses for transaction failures.

https://github.com/MySQLOnRocksDB/mysql-5.6/issues/86#issuecomment-124605954

Test Plan: unit tests

Reviewers: rven, kradhakrishnan, spetrunia, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43323
2015-08-11 17:52:56 -07:00
agiardullo
c2f2cb0214 Pessimistic Transactions
Summary:
Initial implementation of Pessimistic Transactions.  This diff contains the api changes discussed in D38913.  This diff is pretty large, so let me know if people would prefer to meet up to discuss it.

MyRocks folks:  please take a look at the API in include/rocksdb/utilities/transaction[_db].h and let me know if you have any issues.

Also, you'll notice a couple of TODOs in the implementation of RollbackToSavePoint().  After chatting with Siying, I'm going to send out a separate diff for an alternate implementation of this feature that implements the rollback inside of WriteBatch/WriteBatchWithIndex.  We can then decide which route is preferable.

Next, I'm planning on doing some perf testing and then integrating this diff into MongoRocks for further testing.

Test Plan: Unit tests, db_bench parallel testing.

Reviewers: igor, rven, sdong, yhchiang, yoshinorim

Reviewed By: sdong

Subscribers: hermanlee4, maykov, spetrunia, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D40869
2015-08-11 17:52:23 -07:00
Poornima Chozhiyath Raman
960d936e83 Add function 'GetInfoLogList()'
Summary: The list of info log files of a db can be obtained using the new function.

Test Plan: New test in db_test.cc passed.

Reviewers: yhchiang, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: IslamAbdelRahman, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D41715
2015-08-05 16:16:46 -07:00
Mike Kolupaev
e06cf1a098 [wal changes 3/3] method in DB to sync WAL without blocking writers
Summary:
Subj. We really need this feature.

Previous diff D40899 has most of the changes to make this possible, this diff just adds the method.

Test Plan: `make check`, the new test fails without this diff; ran with ASAN, TSAN and valgrind.

Reviewers: igor, rven, IslamAbdelRahman, anthony, kradhakrishnan, tnovak, yhchiang, sdong

Reviewed By: sdong

Subscribers: MarkCallaghan, maykov, hermanlee4, yoshinorim, tnovak, dhruba

Differential Revision: https://reviews.facebook.net/D40905
2015-08-05 06:06:39 -07:00
Yueh-Hsuan Chiang
24daff6d7a Fix a typo and update HISTORY.md for NewCompactOnDeletionCollectorFactory().
Summary: Fix a typo and update HISTORY.md for NewCompactOnDeletionCollectorFactory().

Test Plan: no code change.

Reviewers: igor, anthony, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43521
2015-08-04 13:51:05 -07:00
Yueh-Hsuan Chiang
26894303c1 Add CompactOnDeletionCollector in utilities/table_properties_collectors.
Summary:
This diff adds CompactOnDeletionCollector in utilities/table_properties_collectors,
which applies a sliding window to an sst file and marks the file as needing
compaction when it observes enough deletion entries within the consecutive keys
covered by the sliding window.
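
The sliding-window trigger described above can be sketched as follows: feed entries in key order, keep the last N entries in a window, and flag the file once the window holds at least D deletions. This is a simplified C++ model that keeps one flag per entry in a deque; the real collector is likely more compact (for example, bucketing the window), and DeletionWindowTracker is a hypothetical name. The two constructor parameters correspond in spirit to the window size and deletion trigger passed to NewCompactOnDeletionCollectorFactory().

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Simplified sliding-window deletion counter (hypothetical, not RocksDB's class).
class DeletionWindowTracker {
 public:
  DeletionWindowTracker(std::size_t sliding_window_size, std::size_t deletion_trigger)
      : window_size_(sliding_window_size), trigger_(deletion_trigger) {}

  // Feed entries in key order as the sst file is built.
  void AddEntry(bool is_deletion) {
    window_.push_back(is_deletion);
    if (is_deletion) ++deletions_in_window_;
    if (window_.size() > window_size_) {  // slide: evict the oldest entry
      if (window_.front()) --deletions_in_window_;
      window_.pop_front();
    }
    if (deletions_in_window_ >= trigger_) need_compaction_ = true;
  }

  bool NeedCompaction() const { return need_compaction_; }

 private:
  std::size_t window_size_;
  std::size_t trigger_;
  std::deque<bool> window_;
  std::size_t deletions_in_window_ = 0;
  bool need_compaction_ = false;
};
```

Once set, the flag stays set: a single dense cluster of deletions anywhere in the file is enough to mark it for compaction.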

Test Plan: compact_on_deletion_collector_test

Reviewers: igor, anthony, IslamAbdelRahman, kradhakrishnan, yoshinorim, sdong

Reviewed By: sdong

Subscribers: maykov, dhruba

Differential Revision: https://reviews.facebook.net/D41175
2015-08-03 20:42:55 -07:00