Summary:
add direct_io and compaction_readahead_size in db_stress
test direct_io under db_stress with compaction_readahead_size enabled to capture bugs found in production.
`./db_stress --allow_concurrent_memtable_write=0 --use_direct_reads --use_direct_writes --compaction_readahead_size=4096`
Closes https://github.com/facebook/rocksdb/pull/1906
Differential Revision: D4604514
Pulled By: IslamAbdelRahman
fbshipit-source-id: ebbf0ee
Summary:
querying logical sector size from the device instead of hardcoding it for linux platform.
Closes https://github.com/facebook/rocksdb/pull/1875
Differential Revision: D4591946
Pulled By: ajkr
fbshipit-source-id: 4e9805c
Summary:
InsertPathnameToSizeBytes() is called on shared/ and shared_checksum/ directories, which only exist for certain configurations. If we try to list a non-existent directory's contents, some Envs will dump an error message. Let's avoid this by checking whether the directory exists before listing its contents.
Closes https://github.com/facebook/rocksdb/pull/1895
Differential Revision: D4596301
Pulled By: ajkr
fbshipit-source-id: c809679
Summary:
`WriteBatchWithIndex` has an incorrect implicitly-generated move constructor (it will copy the pointer causing a double-free on destruction). Just switch to `unique_ptr` so we get correct move semantics for free.
Closes https://github.com/facebook/rocksdb/pull/1899
Differential Revision: D4598896
Pulled By: ajkr
fbshipit-source-id: 2373d47
Summary:
As the last step in backup creation, the .tmp directory is renamed omitting the .tmp suffix. In case the process terminates before this, the .tmp directory will be left behind. Even if this happens, we want future backups to succeed, so I added some checks/cleanup for this case.
Closes https://github.com/facebook/rocksdb/pull/1896
Differential Revision: D4597323
Pulled By: ajkr
fbshipit-source-id: 48900d8
Summary:
Missing this function will cause RandomAccessFileReader not doing alignment in Direct IO mode, which introduce an IOError: invalid argument.
Closes https://github.com/facebook/rocksdb/pull/1900
Differential Revision: D4601261
Pulled By: lightmark
fbshipit-source-id: c3eadf1
Summary:
we occasionally missing this call so the file size will be wrong
Closes https://github.com/facebook/rocksdb/pull/1894
Differential Revision: D4598446
Pulled By: lightmark
fbshipit-source-id: 42b6ef5
Summary:
Some places autodetected. These are the two places that didn't.
closes#1498
Still unsure if the following instances of 4 * 1024 need fixing in:
util/io_posix.h
include/rocksdb/table.h (appears to be blocksize and different)
utilities/persistent_cache/block_cache_tier.cc
utilities/persistent_cache/persistent_cache_test.h
include/rocksdb/env.h
util/env_posix.cc
db/column_family.cc
Closes https://github.com/facebook/rocksdb/pull/1499
Differential Revision: D4593640
Pulled By: yiwu-arbug
fbshipit-source-id: efc48de
Summary:
This is a trivial fix for OOMs we've seen a few days ago in logdevice.
RocksDB get into the following state:
(1) Write throughput is too high for flushes to keep up. Compactions are out of the picture - automatic compactions are disabled, and for manual compactions we don't care that much if they fall behind. We write to many CFs, with only a few L0 sst files in each, so compactions are not needed most of the time.
(2) total_log_size_ is consistently greater than GetMaxTotalWalSize(). It doesn't get smaller since flushes are falling ever further behind.
(3) Total size of memtables is way above db_write_buffer_size and keeps growing. But the write_buffer_manager_->ShouldFlush() is not checked because (2) prevents it (for no good reason, afaict; this is what this commit fixes).
(4) Every call to WriteImpl() hits the MaybeFlushColumnFamilies() path. This keeps flushing the memtables one by one in order of increasing log file number.
(5) No write stalling trigger is hit. We rely on max_write_buffer_number
Closes https://github.com/facebook/rocksdb/pull/1893
Differential Revision: D4593590
Pulled By: yiwu-arbug
fbshipit-source-id: af79c5f
Summary:
MongoRocks is still using some deprecated functions, return them temporarily
Closes https://github.com/facebook/rocksdb/pull/1892
Differential Revision: D4592451
Pulled By: IslamAbdelRahman
fbshipit-source-id: 5e6be3e
Summary:
reimplement the compaction expansion on lower level.
Considering such a case:
input level file: 1[B E] 2[F G] 3[H I] 4 [J M]
output level file: 5[A C] 6[D K] 7[L O]
If we initially pick file 2, now we will compact file 2 and 6. But we can safely compact 2, 3 and 6 without expanding the output level.
The previous code is messy and wrong.
In this diff, I first determine the input range [a, b], and output range [c, d],
then we get the range [e,f] = [min(a, c), max(b, d] and put all eligible clean-cut files within [e, f] into this compaction.
**Note: clean-cut means the files don't have the same user key on the boundaries of some files that are not chosen in this compaction**.
Closes https://github.com/facebook/rocksdb/pull/1760
Differential Revision: D4395564
Pulled By: lightmark
fbshipit-source-id: 2dc2c5c
Summary:
The option has been deprecated for two years and has no effect. Removing.
Closes https://github.com/facebook/rocksdb/pull/1866
Differential Revision: D4555203
Pulled By: yiwu-arbug
fbshipit-source-id: c48f627
Summary:
I'd like to propose a patch to expose a new IOError type with subcode kStaleFile to allow to detect when ESTALE error is returned. This allows the rocksdb consumers to handle this error separately from other IOErrors.
I've also added a missing string representation for the kDeadlock subcode, I believe calling ToString() on Status object with that subcode would result in an out of band access in the msgs array,
Please let me know if you have any questions or would like me to make any changes to this pull request.
Closes https://github.com/facebook/rocksdb/pull/1748
Differential Revision: D4387675
Pulled By: IslamAbdelRahman
fbshipit-source-id: 67feb13
Summary:
Remove functions that we deprecated long time ago in db.h
Closes https://github.com/facebook/rocksdb/pull/1878
Differential Revision: D4576521
Pulled By: IslamAbdelRahman
fbshipit-source-id: dfddad1
Summary:
The Makefile in the examples directory contained an empty line contain a tab character. This made my Emacs ask on every save `Suspicious line 10. Save anyway? (y or n)`
Closes https://github.com/facebook/rocksdb/pull/1872
Differential Revision: D4573881
Pulled By: siying
fbshipit-source-id: fb3b4ee
Summary:
fix lite bugs
disable direct io in lite mode
Closes https://github.com/facebook/rocksdb/pull/1870
Differential Revision: D4559866
Pulled By: yiwu-arbug
fbshipit-source-id: 3761c51
Summary:
RepairDB isn't included in rocksdb lite, so don't test it.
Closes https://github.com/facebook/rocksdb/pull/1873
Differential Revision: D4565094
Pulled By: ajkr
fbshipit-source-id: 8cc0898
Summary:
NowMicros() provides non-monotonic time. When wall clock is
synchronized or changed, the non-monotonicity time points will affect write rate
controllers. This patch changes write_controller.cc and rate_limiter.cc to use
monotonic time points.
Closes https://github.com/facebook/rocksdb/pull/1865
Differential Revision: D4561732
Pulled By: siying
fbshipit-source-id: 95ece62
Summary:
Seems to me `has_unpersisted_data_` is read from read thread and write
from write thread concurrently without synchronization. Making it an
atomic.
I update the logic not because seeing any problem with it, but it just
feel confusing.
Closes https://github.com/facebook/rocksdb/pull/1869
Differential Revision: D4555837
Pulled By: yiwu-arbug
fbshipit-source-id: eff2ab8
Summary:
Remove disableDataSync, and another similarly named disable_data_sync options.
This is being done to simplify options, and also because the performance gains of this feature can be achieved by other methods.
Closes https://github.com/facebook/rocksdb/pull/1859
Differential Revision: D4541292
Pulled By: sagar0
fbshipit-source-id: 5b3a6ca
Summary:
The previous version of zlib is no longer available. I have also updated the versions of the other static libraries and added checkum checks for the downloads; This is related to https://github.com/facebook/rocksdb/issues/1769
Closes https://github.com/facebook/rocksdb/pull/1863
Differential Revision: D4550742
Pulled By: yiwu-arbug
fbshipit-source-id: 4414150
Summary:
Record the first parsed sequence number as the minimum
so we can find the true minimum otherwise everything is larger than zero.
Fix the comparator name comparision.
Closes https://github.com/facebook/rocksdb/pull/1858
Differential Revision: D4544365
Pulled By: ajkr
fbshipit-source-id: 439cbc2
Summary:
It was really annoying to have two places (top and bottom of compaction loop) where we cut output files. I had bugs in both DeleteRange and dictionary compression due to updating only one of the two. This diff consolidates the file-cutting logic to the bottom of the compaction loop.
Keep in mind that my goal with input_status is to be consistent with the past behavior, even though I'm not sure it's ideal.
Closes https://github.com/facebook/rocksdb/pull/1832
Differential Revision: D4503038
Pulled By: ajkr
fbshipit-source-id: 7da5213
Summary:
Announce the experimetnal two-level index feature in HISTORY.md. Also updated the default for index_per_partition to 1024.
Closes https://github.com/facebook/rocksdb/pull/1855
Differential Revision: D4530102
Pulled By: maysamyabandeh
fbshipit-source-id: b0fc6ff
Summary:
move the argument description to the right section, make it clearer that the '=' sign is required, and use it in one subcommand where it seemed forgotten before.
Closes https://github.com/facebook/rocksdb/pull/1840
Differential Revision: D4515096
Pulled By: ajkr
fbshipit-source-id: 7b5b1c1
Summary:
I want to be able to, e.g., DECLARE_string(statistics_string); in my application such that I can override the default value of statistics_string. For this to work, we need to remove the unnamed namespace containing all the flags, and make sure all variables/functions covered by that namespace are static.
Replaces #1828 due to internal tool issues.
Closes https://github.com/facebook/rocksdb/pull/1844
Differential Revision: D4515124
Pulled By: ajkr
fbshipit-source-id: 23b695e
Summary:
statistics are always concurrently updated regardless of whether a single instance or multiple instances are used. remove the warning since it's unnecessarily preventing people from sharing stats objects across rocksdb instances.
replace #1819 which was having some problem with our internal tools.
Closes https://github.com/facebook/rocksdb/pull/1826
Differential Revision: D4495970
Pulled By: ajkr
fbshipit-source-id: 6601f61
Summary:
Partition Index blocks and use a Partition-index as a 2nd level index.
The two-level index can be used by setting
BlockBasedTableOptions::kTwoLevelIndexSearch as the index type and
configuring BlockBasedTableOptions::index_per_partition
t15539501
Closes https://github.com/facebook/rocksdb/pull/1814
Differential Revision: D4473535
Pulled By: maysamyabandeh
fbshipit-source-id: bffb87e
Summary:
introduce new methods into a public threadpool interface,
- allow submission of std::functions as they allow greater flexibility.
- add Joining methods to the implementation to join scheduled and submitted jobs with
an option to cancel jobs that did not start executing.
- Remove ugly `#ifdefs` between pthread and std implementation, make it uniform.
- introduce pimpl for a drop in replacement of the implementation
- Introduce rocksdb::port::Thread typedef which is a replacement for std::thread. On Posix Thread defaults as before std::thread.
- Implement WindowsThread that allocates memory in a more controllable manner than windows std::thread with a replaceable implementation.
- should be no functionality changes.
Closes https://github.com/facebook/rocksdb/pull/1823
Differential Revision: D4492902
Pulled By: siying
fbshipit-source-id: c74cb11
Summary:
Added method that returns approx num of entries as well as size for memtables.
Closes https://github.com/facebook/rocksdb/pull/1841
Differential Revision: D4511990
Pulled By: VitaliyLi
fbshipit-source-id: 9a4576e
Summary:
The gcc-4.8.1-glibc-2.17 platform is deprecated and will be removed
soon.
Closes https://github.com/facebook/rocksdb/pull/1839
Differential Revision: D4509684
Pulled By: siying
fbshipit-source-id: 3efe296
Summary:
In theory, Get() can get a wrong result, if it races in a special with with flush. The bug can be reproduced in DBTest2.GetRaceFlush. Fix this bug by getting snapshot after referencing the super version.
Closes https://github.com/facebook/rocksdb/pull/1816
Differential Revision: D4475958
Pulled By: siying
fbshipit-source-id: bd9e67a
Summary:
merger.h was always a confusing name for me, simply give the file a better name
Closes https://github.com/facebook/rocksdb/pull/1836
Differential Revision: D4505357
Pulled By: IslamAbdelRahman
fbshipit-source-id: 07b28d8
Summary:
Change the default of delayed slowdown value to 16MB/s and further increase the L0 stop condition to 36 files.
Closes https://github.com/facebook/rocksdb/pull/1821
Differential Revision: D4489229
Pulled By: siying
fbshipit-source-id: 1003981