rocksdb

Author	SHA1	Message	Date
Andrew Kryczka	ed0a4c93ef	perf_context measure user bytes read Summary: With this PR, we can measure read-amp for queries where perf_context is enabled as follows: ``` SetPerfLevel(kEnableCount); Get(1, "foo"); double read_amp = static_cast<double>(get_perf_context()->block_read_byte / get_perf_context()->get_read_bytes); SetPerfLevel(kDisable); ``` Our internal infra enables perf_context for a sampling of queries. So we'll be able to compute the read-amp for the sample set, which can give us a good estimate of read-amp. Closes https://github.com/facebook/rocksdb/pull/2749 Differential Revision: D5647240 Pulled By: ajkr fbshipit-source-id: ad73550b06990cf040cc4528fa885360f308ec12	2017-08-18 11:43:33 -07:00
Maysam Yabandeh	1efc600ddf	Preload l0 index partitions Summary: This fixes the existing logic for pinning l0 index partitions. The patch preloads the partitions into block cache and pin them if they belong to level 0 and pin_l0 is set. The drawback is that it does many small IOs when preloading all the partitions into the cache is direct io is enabled. Working for a solution for that. Closes https://github.com/facebook/rocksdb/pull/2661 Differential Revision: D5554010 Pulled By: maysamyabandeh fbshipit-source-id: 1e6f32a3524d71355c77d4138516dcfb601ca7b2	2017-08-18 10:56:20 -07:00
Archit Mishra	bddd5d3630	Added mechanism to track deadlock chain Summary: Changes: * extended the wait_txn_map to track additional information * designed circular buffer to store n latest deadlocks' information * added test coverage to verify the additional information tracked is accurately stored in the buffer Closes https://github.com/facebook/rocksdb/pull/2630 Differential Revision: D5478025 Pulled By: armishra fbshipit-source-id: 2b138de7b5a73f5ca554fc3ff8220a3be49f39e7	2017-08-17 18:56:21 -07:00
Sagar Vemuri	9a44b4c32c	Allow merge operator to be called even with a single operand Summary: Added a function `MergeOperator::DoesAllowSingleMergeOperand()` to allow invoking a merge operator even with a single merge operand, if overriden. This is needed for Cassandra-on-RocksDB work. All Cassandra writes are through merges and this will allow a single merge-value to be updated in the merge-operator invoked via a compaction, if needed, due to an expired TTL. Closes https://github.com/facebook/rocksdb/pull/2721 Differential Revision: D5608706 Pulled By: sagar0 fbshipit-source-id: f299f9f91c4d1ac26e48bd5906e122c1c5e5f3fc	2017-08-16 23:42:00 -07:00
lxcode	3204a4f64b	Fix missing stdlib include required for abort() Summary: If ROCKSDB_LITE is defined, a call to abort() is introduced. This call requires stdlib.h. Build log of unpatched 5.7.1: http://beefy9.nyi.freebsd.org/data/110amd64-default/447974/logs/rocksdb-lite-5.7.1.log Closes https://github.com/facebook/rocksdb/pull/2744 Reviewed By: yiwu-arbug Differential Revision: D5632372 Pulled By: lxcode fbshipit-source-id: b2a8e692bf14ccf1f875f3a00463e87bba310a2b	2017-08-15 12:32:11 -07:00
Kent767	ac098a4626	expose set_skip_stats_update_on_db_open to C bindings Summary: It would be super helpful to not have to recompile rocksdb to get this performance tweak for mechanical disks. I have signed the CLA. Closes https://github.com/facebook/rocksdb/pull/2718 Differential Revision: D5606994 Pulled By: yiwu-arbug fbshipit-source-id: c05e92bad0d03bd38211af1e1ced0d0d1e02f634	2017-08-11 12:16:45 -07:00
Stanislav Tkach	25df24254b	Add column families related functions (C API) Summary: (#2564) Closes https://github.com/facebook/rocksdb/pull/2669 Differential Revision: D5594151 Pulled By: yiwu-arbug fbshipit-source-id: 67ae9446342f3323d6ecad8e811f4158da194270	2017-08-10 13:43:54 -07:00
Oleksandr Anyshchenko	0cecf8155b	Write batch for `TransactionDB` in C API Summary: Closes https://github.com/facebook/rocksdb/pull/2655 Differential Revision: D5600858 Pulled By: yiwu-arbug fbshipit-source-id: cf52f9104e348438bf168dc6bf7af3837faf12ef	2017-08-10 11:58:53 -07:00
FireMail	6a9de43477	Windows.h macro call fix Summary: - moved the max call for numeric limits into paranthesis so that max wont be called as macro when including <Windows.h> Closes https://github.com/facebook/rocksdb/pull/2709 Differential Revision: D5600773 Pulled By: yiwu-arbug fbshipit-source-id: fd28b6f7c10ddce21bad4030f2db06f965bb08da	2017-08-10 11:39:29 -07:00
Aaron G	7848f0b24c	add VerifyChecksum() to db.h Summary: We need a tool to check any sst file corruption in the db. It will check all the sst files in current version and read all the blocks (data, meta, index) with checksum verification. If any verification fails, the function will return non-OK status. Closes https://github.com/facebook/rocksdb/pull/2498 Differential Revision: D5324269 Pulled By: lightmark fbshipit-source-id: 6f8a272008b722402a772acfc804524c9d1a483b	2017-08-09 15:58:13 -07:00
Maysam Yabandeh	a9a4e89c38	Fix valgrind complaint about initialization Summary: Closes https://github.com/facebook/rocksdb/pull/2697 Differential Revision: D5573894 Pulled By: maysamyabandeh fbshipit-source-id: 8fc03ea8ea6f3f3bc0f68b64cf90243a70562dc4	2017-08-07 08:49:52 -07:00
Maysam Yabandeh	c9804e007a	Refactor TransactionDBImpl Summary: This opens space for the new implementations of TransactionDBImpl such as WritePreparedTxnDBImpl that has a different policy of how to write to DB. Closes https://github.com/facebook/rocksdb/pull/2689 Differential Revision: D5568918 Pulled By: maysamyabandeh fbshipit-source-id: f7eac866e175daf3793ae79da108f65cc7dc7b25	2017-08-05 17:26:15 -07:00
Andrew Kryczka	cc01985db0	Introduce bottom-pri thread pool for large universal compactions Summary: When we had a single thread pool for compactions, a thread could be busy for a long time (minutes) executing a compaction involving the bottom level. In multi-instance setups, the entire thread pool could be consumed by such bottom-level compactions. Then, top-level compactions (e.g., a few L0 files) would be blocked for a long time ("head-of-line blocking"). Such top-level compactions are critical to prevent compaction stalls as they can quickly reduce number of L0 files / sorted runs. This diff introduces a bottom-priority queue for universal compactions including the bottom level. This alleviates the head-of-line blocking situation for fast, top-level compactions. - Added `Env::Priority::BOTTOM` thread pool. This feature is only enabled if user explicitly configures it to have a positive number of threads. - Changed `ThreadPoolImpl`'s default thread limit from one to zero. This change is invisible to users as we call `IncBackgroundThreadsIfNeeded` on the low-pri/high-pri pools during `DB::Open` with values of at least one. It is necessary, though, for bottom-pri to start with zero threads so the feature is disabled by default. - Separated `ManualCompaction` into two parts in `PrepickedCompaction`. `PrepickedCompaction` is used for any compaction that's picked outside of its execution thread, either manual or automatic. - Forward universal compactions involving last level to the bottom pool (worker thread's entry point is `BGWorkBottomCompaction`). - Track `bg_bottom_compaction_scheduled_` so we can wait for bottom-level compactions to finish. We don't count them against the background jobs limits. So users of this feature will get an extra compaction for free. Closes https://github.com/facebook/rocksdb/pull/2580 Differential Revision: D5422916 Pulled By: ajkr fbshipit-source-id: a74bd11f1ea4933df3739b16808bb21fcd512333	2017-08-03 15:43:29 -07:00
Siying Dong	21696ba502	Replace dynamic_cast<> Summary: Replace dynamic_cast<> so that users can choose to build with RTTI off, so that they can save several bytes per object, and get tiny more memory available. Some nontrivial changes: 1. Add Comparator::GetRootComparator() to get around the internal comparator hack 2. Add the two experiemental functions to DB 3. Add TableFactory::GetOptionString() to avoid unnecessary casting to get the option string 4. Since 3 is done, move the parsing option functions for table factory to table factory files too, to be symmetric. Closes https://github.com/facebook/rocksdb/pull/2645 Differential Revision: D5502723 Pulled By: siying fbshipit-source-id: fd13cec5601cf68a554d87bfcf056f2ffa5fbf7c	2017-07-28 16:27:16 -07:00
Sagar Vemuri	aace46516b	Fix license headers in Cassandra related files Summary: I might have missed these while doing some recent cassandra code reviews. Closes https://github.com/facebook/rocksdb/pull/2663 Differential Revision: D5520138 Pulled By: sagar0 fbshipit-source-id: 340930afe9efe03c75f535a1da1f89bd3e53c1f9	2017-07-28 13:56:56 -07:00
Islam AbdelRahman	50a969131f	CacheActivityLogger, component to log cache activity into a file Summary: Simple component that will add a new entry in a log file every time we lookup/insert a key in SimCache. API: ``` SimCache::StartActivityLogging(<file_name>, <env>, <optional_max_size>) SimCache::StopActivityLogging() ``` Sending for review, Still need to add more comments. I was thinking about a better approach, but I ended up deciding I will use a mutex to sync the writes to the file, since this feature should not be heavily used and only used to collect info that will be analyzed offline. I think it's okay to hold the mutex every time we lookup/add to the SimCache. Closes https://github.com/facebook/rocksdb/pull/2295 Differential Revision: D5063826 Pulled By: IslamAbdelRahman fbshipit-source-id: f3b5daed8b201987c9a071146ddd5c5740a2dd8c	2017-07-28 12:36:48 -07:00
kapitan-k	34112aeffd	Added db paths to c Summary: Closes https://github.com/facebook/rocksdb/pull/2613 Differential Revision: D5476064 Pulled By: sagar0 fbshipit-source-id: 6b30a9eacb93a945bbe499eafb90565fa9f1798b	2017-07-24 11:58:02 -07:00
Islam AbdelRahman	a4c42e8007	Fix UBSAN issue of passing nullptr to memcmp Summary: As explained in the comments, Sometimes we create Slice(nullptr, 0) in our code base which cause us to do calls like ``` memcmp(nullptr, "abc", 0); ``` That's fine since the len is equal 0, but UBSAN is not happy about it so disable UBSAN for this function and add an assert instead Closes https://github.com/facebook/rocksdb/pull/2616 Differential Revision: D5458326 Pulled By: IslamAbdelRahman fbshipit-source-id: cfca32abe30f7d8f760c9f77ecd9543dfb1170dd	2017-07-24 10:54:37 -07:00
Siying Dong	e67b35c076	Add Iterator::Refresh() Summary: Add and implement Iterator::Refresh(). When this function is called, if the super version doesn't change, update the sequence number of the iterator to the latest one and invalidate the iterator. If the super version changed, recreated the whole iterator. This can help users reuse the iterator more easily. Closes https://github.com/facebook/rocksdb/pull/2621 Differential Revision: D5464500 Pulled By: siying fbshipit-source-id: f548bd35e85c1efca2ea69273802f6704eba6ba9	2017-07-24 10:54:37 -07:00
Sagar Vemuri	72502cf227	Revert "comment out unused parameters" Summary: This reverts the previous commit `1d7048c598`, which broke the build. Did a `git revert 1d7048c`. Closes https://github.com/facebook/rocksdb/pull/2627 Differential Revision: D5476473 Pulled By: sagar0 fbshipit-source-id: 4756ff5c0dfc88c17eceb00e02c36176de728d06	2017-07-21 18:26:26 -07:00
Victor Gao	1d7048c598	comment out unused parameters Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually. Reviewed By: igorsugak Differential Revision: D5454343 fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2	2017-07-21 14:57:44 -07:00
Sushma Devendrappa	0655b58582	enable PinnableSlice for RowCache Summary: This patch enables using PinnableSlice for RowCache, changes include not releasing the cache handle immediately after lookup in TableCache::Get, instead pass a Cleanble function which does Cache::RleaseHandle. Closes https://github.com/facebook/rocksdb/pull/2492 Differential Revision: D5316216 Pulled By: maysamyabandeh fbshipit-source-id: d2a684bd7e4ba73772f762e58a82b5f4fbd5d362	2017-07-17 15:08:30 -07:00
Siying Dong	3c327ac2d0	Change RocksDB License Summary: Closes https://github.com/facebook/rocksdb/pull/2589 Differential Revision: D5431502 Pulled By: siying fbshipit-source-id: 8ebf8c87883daa9daa54b2303d11ce01ab1f6f75	2017-07-15 16:11:23 -07:00
奏之章	70440f7a63	Add virtual func IsDeleteRangeSupported Summary: this modify allows third-party tables able to support delete range Closes https://github.com/facebook/rocksdb/pull/2035 Differential Revision: D5407973 Pulled By: ajkr fbshipit-source-id: 82e364b7dd5a198660788d59543f15b8f95cc418	2017-07-12 16:58:45 -07:00
Islam AbdelRahman	269d383d5d	Bump version to 5.7 Summary: Bump version to 5.7 Closes https://github.com/facebook/rocksdb/pull/2566 Differential Revision: D5400043 Pulled By: IslamAbdelRahman fbshipit-source-id: 74aae4ff143d370d7d89807e5be08a6ab827da40	2017-07-11 14:29:13 -07:00
Andrew Kryczka	33042573db	Fix GetCurrentTime() initialization for valgrind Summary: Valgrind had false positive complaints about the initialization pattern for `GetCurrentTime()`'s argument in #2480. We can instead have the client initialize the time variable before calling `GetCurrentTime()`, and have `GetCurrentTime()` promise to only overwrite it in success case. Closes https://github.com/facebook/rocksdb/pull/2526 Differential Revision: D5358689 Pulled By: ajkr fbshipit-source-id: 857b189f24c19196f6bb299216f3e23e7bc4be42	2017-07-05 12:12:00 -07:00
Maysam Yabandeh	45b9bb0331	Cut filter partition based on metadata_block_size Summary: Currently metadata_block_size controls only index partition size. With this patch a partition is cut after any of index or filter partitions reaches metadata_block_size. Closes https://github.com/facebook/rocksdb/pull/2452 Differential Revision: D5275651 Pulled By: maysamyabandeh fbshipit-source-id: 5057e4424b4c8902043782e6bf8c38f0c4f25160	2017-07-02 10:42:12 -07:00
Sagar Vemuri	1cd45cd1b3	FIFO Compaction with TTL Summary: Introducing FIFO compactions with TTL. FIFO compaction is based on size only which makes it tricky to enable in production as use cases can have organic growth. A user requested an option to drop files based on the time of their creation instead of the total size. To address that request: - Added a new TTL option to FIFO compaction options. - Updated FIFO compaction score to take TTL into consideration. - Added a new table property, creation_time, to keep track of when the SST file is created. - Creation_time is set as below: - On Flush: Set to the time of flush. - On Compaction: Set to the max creation_time of all the files involved in the compaction. - On Repair and Recovery: Set to the time of repair/recovery. - Old files created prior to this code change will have a creation_time of 0. - FIFO compaction with TTL is enabled when ttl > 0. All files older than ttl will be deleted during compaction. i.e. `if (file.creation_time < (current_time - ttl)) then delete(file)`. This will enable cases where you might want to delete all files older than, say, 1 day. - FIFO compaction will fall back to the prior way of deleting files based on size if: - the creation_time of all files involved in compaction is 0. - the total size (of all SST files combined) does not drop below `compaction_options_fifo.max_table_files_size` even if the files older than ttl are deleted. This feature is not supported if max_open_files != -1 or with table formats other than Block-based. Test Plan: Added tests. Benchmark results: Base: FIFO with max size: 100MB :: ``` svemuri@dev15905 ~/rocksdb (fifo-compaction) $ TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=readwhilewriting --num=5000000 --threads=16 --compaction_style=2 --fifo_compaction_max_table_files_size_mb=100 readwhilewriting : 1.924 micros/op 519858 ops/sec; 13.6 MB/s (1176277 of 5000000 found) ``` With TTL (a low one for testing) :: ``` svemuri@dev15905 ~/rocksdb (fifo-compaction) $ TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=readwhilewriting --num=5000000 --threads=16 --compaction_style=2 --fifo_compaction_max_table_files_size_mb=100 --fifo_compaction_ttl=20 readwhilewriting : 1.902 micros/op 525817 ops/sec; 13.7 MB/s (1185057 of 5000000 found) ``` Example Log lines: ``` 2017/06/26-15:17:24.609249 7fd5a45ff700 (Original Log Time 2017/06/26-15:17:24.609177) [db/compaction_picker.cc:1471] [default] FIFO compaction: picking file 40 with creation time 1498515423 for deletion 2017/06/26-15:17:24.609255 7fd5a45ff700 (Original Log Time 2017/06/26-15:17:24.609234) [db/db_impl_compaction_flush.cc:1541] [default] Deleted 1 files ... 2017/06/26-15:17:25.553185 7fd5a61a5800 [DEBUG] [db/db_impl_files.cc:309] [JOB 0] Delete /dev/shm/dbbench/000040.sst type=2 #40 -- OK 2017/06/26-15:17:25.553205 7fd5a61a5800 EVENT_LOG_v1 {"time_micros": 1498515445553199, "job": 0, "event": "table_file_deletion", "file_number": 40} ``` SST Files remaining in the dbbench dir, after db_bench execution completed: ``` svemuri@dev15905 ~/rocksdb (fifo-compaction) $ ls -l /dev/shm//dbbench/*.sst -rw-r--r--. 1 svemuri users 30749887 Jun 26 15:17 /dev/shm//dbbench/000042.sst -rw-r--r--. 1 svemuri users 30768779 Jun 26 15:17 /dev/shm//dbbench/000044.sst -rw-r--r--. 1 svemuri users 30757481 Jun 26 15:17 /dev/shm//dbbench/000046.sst ``` Closes https://github.com/facebook/rocksdb/pull/2480 Differential Revision: D5305116 Pulled By: sagar0 fbshipit-source-id: 3e5cfcf5dd07ed2211b5b37492eb235b45139174	2017-06-27 17:11:48 -07:00
Siying Dong	e517bfa2c2	CLANG Tidy Summary: Closes https://github.com/facebook/rocksdb/pull/2502 Differential Revision: D5326498 Pulled By: siying fbshipit-source-id: 2f0ac6dc6ca5ddb23cecf67a278c086e52646714	2017-06-27 11:00:59 -07:00
Ewout Prangsma	51778612c9	Encryption at rest support Summary: This PR adds support for encrypting data stored by RocksDB when written to disk. It adds an `EncryptedEnv` override of the `Env` class with matching overrides for sequential&random access files. The encryption itself is done through a configurable `EncryptionProvider`. This class creates is asked to create `BlockAccessCipherStream` for a file. This is where the actual encryption/decryption is being done. Currently there is a Counter mode implementation of `BlockAccessCipherStream` with a `ROT13` block cipher (NOTE the `ROT13` is for demo purposes only!!). The Counter operation mode uses an initial counter & random initialization vector (IV). Both are created randomly for each file and stored in a 4K (default size) block that is prefixed to that file. The `EncryptedEnv` implementation is such that clients of the `Env` class do not see this prefix (nor data, nor in filesize). The largest part of the prefix block is also encrypted, and there is room left for implementation specific settings/values/keys in there. To test the encryption, the `DBTestBase` class has been extended to consider a new environment variable called `ENCRYPTED_ENV`. If set, the test will setup a encrypted instance of the `Env` class to use for all tests. Typically you would run it like this: ``` ENCRYPTED_ENV=1 make check_some ``` There is also an added test that checks that some data inserted into the database is or is not "visible" on disk. With `ENCRYPTED_ENV` active it must not find plain text strings, with `ENCRYPTED_ENV` unset, it must find the plain text strings. Closes https://github.com/facebook/rocksdb/pull/2424 Differential Revision: D5322178 Pulled By: sdwilsh fbshipit-source-id: 253b0a9c2c498cc98f580df7f2623cbf7678a27f	2017-06-26 16:56:24 -07:00
Aaron Gao	0025a36409	revert perf_context and io_stats to __thread Summary: https://github.com/facebook/rocksdb/pull/2380 introduces a regression by replacing __thread with ThreadLocalPtr. Revert the thread local implementation back. Closes https://github.com/facebook/rocksdb/pull/2485 Differential Revision: D5308050 Pulled By: lightmark fbshipit-source-id: 2676e9c22edf76e8133d3f4c50e2711e11a95480	2017-06-26 15:27:17 -07:00
Maysam Yabandeh	499ebb3ab5	Optimize for serial commits in 2PC Summary: Throughput: 46k tps in our sysbench settings (filling the details later) The idea is to have the simplest change that gives us a reasonable boost in 2PC throughput. Major design changes: 1. The WAL file internal buffer is not flushed after each write. Instead it is flushed before critical operations (WAL copy via fs) or when FlushWAL is called by MySQL. Flushing the WAL buffer is also protected via mutex_. 2. Use two sequence numbers: last seq, and last seq for write. Last seq is the last visible sequence number for reads. Last seq for write is the next sequence number that should be used to write to WAL/memtable. This allows to have a memtable write be in parallel to WAL writes. 3. BatchGroup is not used for writes. This means that we can have parallel writers which changes a major assumption in the code base. To accommodate for that i) allow only 1 WriteImpl that intends to write to memtable via mem_mutex_--which is fine since in 2PC almost all of the memtable writes come via group commit phase which is serial anyway, ii) make all the parts in the code base that assumed to be the only writer (via EnterUnbatched) to also acquire mem_mutex_, iii) stat updates are protected via a stat_mutex_. Note: the first commit has the approach figured out but is not clean. Submitting the PR anyway to get the early feedback on the approach. If we are ok with the approach I will go ahead with this updates: 0) Rebase with Yi's pipelining changes 1) Currently batching is disabled by default to make sure that it will be consistent with all unit tests. Will make this optional via a config. 2) A couple of unit tests are disabled. They need to be updated with the serial commit of 2PC taken into account. 3) Replacing BatchGroup with mem_mutex_ got a bit ugly as it requires releasing mutex_ beforehand (the same way EnterUnbatched does). This needs to be cleaned up. Closes https://github.com/facebook/rocksdb/pull/2345 Differential Revision: D5210732 Pulled By: maysamyabandeh fbshipit-source-id: 78653bd95a35cd1e831e555e0e57bdfd695355a4	2017-06-24 14:11:29 -07:00
jsteemann	521724ba82	fixed wrong type for "allow_compaction" parameter Summary: should be boolean, not uint64_t MSVC complains about it during compilation with error `include\rocksdb\advanced_options.h(77): warning C4800: 'uint64_t': forcing value to bool 'true' or 'false' (performance warning)` Closes https://github.com/facebook/rocksdb/pull/2487 Differential Revision: D5310685 Pulled By: siying fbshipit-source-id: 719a33b3dba4f711aa72e3f229013c188015dc86	2017-06-23 09:41:19 -07:00
Andrew Kryczka	71f5bcb730	Introduce OnBackgroundError callback Summary: Some users want to prevent rocksdb from entering read-only mode in certain error cases. This diff gives them a callback, `OnBackgroundError`, that they can use to achieve it. - call `OnBackgroundError` every time we consider setting `bg_error_`. Use its result to assign `bg_error_` but not to change the function's return status. - classified calls using `BackgroundErrorReason` to give the callback some info about where the error happened - renamed `ParanoidCheck` to something more specific so we can provide a clear `BackgroundErrorReason` - unit tests for the most common cases: flush or compaction errors Closes https://github.com/facebook/rocksdb/pull/2477 Differential Revision: D5300190 Pulled By: ajkr fbshipit-source-id: a0ea4564249719b83428e3f4c6ca2c49e366e9b3	2017-06-22 19:41:50 -07:00
Siying Dong	af1746751f	WriteBufferManager will not trigger flush if much data is already being flushed Summary: Even if hard limit hits, flushing more memtable may not help cap the memory usage if already more than half data is scheduled for flush. Not triggering flush instead. Closes https://github.com/facebook/rocksdb/pull/2469 Differential Revision: D5284249 Pulled By: siying fbshipit-source-id: 8ab7ba1aba56a634dbe72b318fcab2093063972e	2017-06-21 10:41:37 -07:00
Dmitri Smirnov	a21db161c9	Implement ReopenWritibaleFile on Windows and other fixes Summary: Make default impl return NoSupported so the db_blob tests exist in a meaningful manner. Replace std::thread to port::Thread Closes https://github.com/facebook/rocksdb/pull/2465 Differential Revision: D5275563 Pulled By: yiwu-arbug fbshipit-source-id: cedf1a18a2c05e20d768c1308b3f3224dbd70ab6	2017-06-20 10:31:13 -07:00
Andrew Kryczka	0d278456c9	default implementation for InRange Summary: it's confusing to implementors of prefix extractor to implement an unused function Closes https://github.com/facebook/rocksdb/pull/2460 Differential Revision: D5267408 Pulled By: ajkr fbshipit-source-id: 2f1fe3131efc978f6098ae7a80e52bc7a0b13571	2017-06-18 12:42:42 -07:00
Ben Torfs	6a3377f454	Synchronize statistic enumeration values between statistics.h and java API Summary: Closes https://github.com/facebook/rocksdb/pull/2209 Differential Revision: D5251951 Pulled By: sagar0 fbshipit-source-id: 03a73d025a7b4a322bb8d8d86f5d249fcd7dd00e	2017-06-14 16:59:42 -07:00
Sagar Vemuri	89ad9f3adb	Allow ignoring unknown options when loading options from a file Summary: Added a flag, `ignore_unknown_options`, to skip unknown options when loading an options file (using `LoadLatestOptions`/`LoadOptionsFromFile`) or while verifying options (using `CheckOptionsCompatibility`). This will help in downgrading the db to an older version. Also added `--ignore_unknown_options` flag to ldb Example Use case: In MyRocks, if copying from newer version to older version, it is often impossible to start because of new RocksDB options that don't exist in older version, even though data format is compatible. MyRocks uses these load and verify functions in [ha_rocksdb.cc::check_rocksdb_options_compatibility](`e004fd9f41/storage/rocksdb/ha_rocksdb.cc (L3348-L3401)`). Test Plan: Updated the unit tests. `make check` ldb: $ ./ldb --db=/tmp/test_db --create_if_missing put a1 b1 OK Now edit /tmp/test_db/<OPTIONS-file> and add an unknown option. Try loading the options now, and it fails: $ ./ldb --db=/tmp/test_db --try_load_options get a1 Failed: Invalid argument: Unrecognized option DBOptions:: abcd Passes with the new --ignore_unknown_options flag $ ./ldb --db=/tmp/test_db --try_load_options --ignore_unknown_options get a1 b1 Closes https://github.com/facebook/rocksdb/pull/2423 Differential Revision: D5212091 Pulled By: sagar0 fbshipit-source-id: 2ec17636feb47dc0351b53a77e5f15ef7cbf2ca7	2017-06-13 16:58:01 -07:00
hyunwoo	6b5a5dc5d8	fixed typo Summary: fixed typo Closes https://github.com/facebook/rocksdb/pull/2430 Differential Revision: D5242471 Pulled By: IslamAbdelRahman fbshipit-source-id: 832eb3a4c70221444ccd2ae63217823fec56c748	2017-06-13 16:58:01 -07:00
Andrew Kryczka	c217e0b9c7	Call RateLimiter for compaction reads Summary: Allow users to rate limit background work based on read bytes, written bytes, or sum of read and written bytes. Support these by changing the RateLimiter API, so no additional options were needed. Closes https://github.com/facebook/rocksdb/pull/2433 Differential Revision: D5216946 Pulled By: ajkr fbshipit-source-id: aec57a8357dbb4bfde2003261094d786d94f724e	2017-06-13 14:56:46 -07:00
Maysam Yabandeh	6f4154d693	record index partition properties Summary: When Partitioning index/filter is enabled the user might need to check the index block size as well as the top-level index size via sst_dump. This patch records i) number of partitions, ii) top-level index size and make it accessible through sst_dump. The number of partitions for filters is the same as that of indexes. The top-level index for filters has a similar size to top-level index for indexes, so it is not repeated. Closes https://github.com/facebook/rocksdb/pull/2437 Differential Revision: D5224225 Pulled By: maysamyabandeh fbshipit-source-id: 5324598c75793523aef1bb7ee225a5475e95a9cb	2017-06-13 11:21:32 -07:00
Siying Dong	5582123dee	Sample number of reads per SST file Summary: We estimate number of reads per SST files, by updating the counter per file in sampled read requests. This information can later be used to trigger compactions to improve read performacne. Closes https://github.com/facebook/rocksdb/pull/2417 Differential Revision: D5193528 Pulled By: siying fbshipit-source-id: b4241c5ad0eaf444b61afb53f8e6290d9f5da2df	2017-06-12 07:12:08 -07:00
Aaron Gao	2e64f450dc	bump version to 5.6 Summary: Bump version to 5.6 beforehand in master Closes https://github.com/facebook/rocksdb/pull/2411 Differential Revision: D5186896 Pulled By: lightmark fbshipit-source-id: 079538e621b1a959c2dc99dada894e9cdb99ef95	2017-06-05 16:15:21 -07:00
Siying Dong	52a7f38b19	WriteOptions.low_pri which can throttle low pri writes if needed Summary: If ReadOptions.low_pri=true and compaction is behind, the write will either return immediate or be slowed down based on ReadOptions.no_slowdown. Closes https://github.com/facebook/rocksdb/pull/2369 Differential Revision: D5127619 Pulled By: siying fbshipit-source-id: d30e1cff515890af0eff32dfb869d2e4c9545eb0	2017-06-05 15:02:35 -07:00
hyunwoo	c7662a44a4	fixed typo Summary: fixed typo Closes https://github.com/facebook/rocksdb/pull/2376 Differential Revision: D5183630 Pulled By: ajkr fbshipit-source-id: 133cfd0445959e70aa2cd1a12151bf3c0c5c3ac5	2017-06-05 11:27:34 -07:00
Aaron Gao	7f6c02dda1	using ThreadLocalPtr to hide ROCKSDB_SUPPORT_THREAD_LOCAL from public… Summary: … headers https://github.com/facebook/rocksdb/pull/2199 should not reference RocksDB-specific macros (like ROCKSDB_SUPPORT_THREAD_LOCAL in this case) to public headers, `iostats_context.h` and `perf_context.h`. We shouldn't do that because users have to provide these compiler flags when building their binary with RocksDB. We should hide the thread local global variable inside our implementation and just expose a function api to retrieve these variables. It may break some users for now but good for long term. make check -j64 Closes https://github.com/facebook/rocksdb/pull/2380 Differential Revision: D5177896 Pulled By: lightmark fbshipit-source-id: 6fcdfac57f2e2dcfe60992b7385c5403f6dcb390	2017-06-02 17:26:19 -07:00
Mike Kolupaev	138b87eae4	Fix interaction between CompactionFilter::Decision::kRemoveAndSkipUnt… Summary: Fixes the following scenario: 1. Set prefix extractor. Enable bloom filters, with `whole_key_filtering = false`. Use compaction filter that sometimes returns `kRemoveAndSkipUntil`. 2. Do a compaction. 3. Compaction creates an iterator with `total_order_seek = false`, calls `SeekToFirst()` on it, then repeatedly calls `Next()`. 4. At some point compaction filter returns `kRemoveAndSkipUntil`. 5. Compaction calls `Seek(skip_until)` on the iterator. The key that it seeks to happens to have prefix that doesn't match the bloom filter. Since `total_order_seek = false`, iterator becomes invalid, and compaction thinks that it has reached the end. The rest of the compaction input is silently discarded. The fix is to make compaction iterator use `total_order_seek = true`. The implementation for PlainTable is quite awkward. I've made `kRemoveAndSkipUntil` officially incompatible with PlainTable. If you try to use them together, compaction will fail, and DB will enter read-only mode (`bg_error_`). That's not a very graceful way to communicate a misconfiguration, but the alternatives don't seem worth the implementation time and complexity. To be able to check in advance that `kRemoveAndSkipUntil` is not going to be used with PlainTable, we'd need to extend the interface of either `CompactionFilter` or `InternalIterator`. It seems unlikely that anyone will ever want to use `kRemoveAndSkipUntil` with PlainTable: PlainTable probably has very few users, and `kRemoveAndSkipUntil` has only one user so far: us (logdevice). Closes https://github.com/facebook/rocksdb/pull/2349 Differential Revision: D5110388 Pulled By: lightmark fbshipit-source-id: ec29101a99d9dcd97db33923b87f72bce56cc17a	2017-06-02 15:11:38 -07:00
Siying Dong	95b0e89b5d	Improve write buffer manager (and allow the size to be tracked in block cache) Summary: Improve write buffer manager in several ways: 1. Size is tracked when arena block is allocated, rather than every allocation, so that it can better track actual memory usage and the tracking overhead is slightly lower. 2. We start to trigger memtable flush when 7/8 of the memory cap hits, instead of 100%, and make 100% much harder to hit. 3. Allow a cache object to be passed into buffer manager and the size allocated by memtable can be costed there. This can help users have one single memory cap across block cache and memtable. Closes https://github.com/facebook/rocksdb/pull/2350 Differential Revision: D5110648 Pulled By: siying fbshipit-source-id: b4238113094bf22574001e446b5d88523ba00017	2017-06-02 14:26:56 -07:00
Andrew Kryczka	a4d9c02511	Pass CF ID to MemTableRepFactory Summary: Some users want to monitor column family activity in their custom memtable implementations. Previously there was no way to figure out with which column family a memtable is associated. This diff: - adds an overload to MemTableRepFactory::CreateMemTableRep() that provides the CF ID. For compatibility, its default implementation calls the old overload. - updates MemTable to create MemTableRep's using the new overload. Closes https://github.com/facebook/rocksdb/pull/2346 Differential Revision: D5108061 Pulled By: ajkr fbshipit-source-id: 3a1921214a348dd8ea0f54e1cab3b71c3d46d616	2017-06-02 12:12:06 -07:00

1 2 3 4 5 ...

1361 Commits