Commit Graph

251 Commits

Author SHA1 Message Date
Zhongyi Xie
f1592a06c2 run make format for PR 3838 (#3954)
Summary:
PR https://github.com/facebook/rocksdb/pull/3838 made some changes that triggers lint warnings.
Run `make format` to fix formatting as suggested by siying .
Also piggyback two changes:
1) fix singleton destruction order for windows and posix env
2) fix two clang warnings
Closes https://github.com/facebook/rocksdb/pull/3954

Differential Revision: D8272041

Pulled By: miasantreble

fbshipit-source-id: 7c4fd12bd17aac13534520de0c733328aa3c6c9f
2018-06-05 12:58:02 -07:00
Dmitri Smirnov
f4b72d7056 Provide a way to override windows memory allocator with jemalloc for ZSTD
Summary:
Windows does not have LD_PRELOAD mechanism to override all memory allocation functions and ZSTD makes use of C-tuntime calloc. During flushes and compactions default system allocator fragments and the system slows down considerably.

For builds with jemalloc we employ an advanced ZSTD context creation API that re-directs memory allocation to jemalloc. To reduce the cost of context creation on each block we cache ZSTD context within the block based table builder while a new SST file is being built, this will help all platform builds including those w/o jemalloc. This avoids system allocator fragmentation and improves the performance.

The change does not address random reads and currently on Windows reads with ZSTD regress as compared with SNAPPY compression.
Closes https://github.com/facebook/rocksdb/pull/3838

Differential Revision: D8229794

Pulled By: miasantreble

fbshipit-source-id: 719b622ab7bf4109819bc44f45ec66f0dd3ee80d
2018-06-04 12:12:48 -07:00
Dmitri Smirnov
3db8504cde Catchup with posix features
Summary:
Catch up with Posix features
  NewWritableRWFile must fail when file does not exists
  Implement Env::Truncate()
  Adjust Env options optimization functions
  Implement MemoryMappedBuffer on Windows.
Closes https://github.com/facebook/rocksdb/pull/3857

Differential Revision: D8053610

Pulled By: ajkr

fbshipit-source-id: ccd0d46c29648a9f6f496873bc1c9d6c5547487e
2018-05-24 15:13:04 -07:00
Kefu Chai
c465509379 port_posix: use posix_memalign() for aligned_alloc
Summary:
to workaround issue of http://tracker.ceph.com/issues/21422 .
and in tcmalloc aligned_alloc and posix_memalign() are basically the
same thing. the same applies to GNU glibc.

fixes #3175

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Closes https://github.com/facebook/rocksdb/pull/3862

Differential Revision: D8147930

Pulled By: yiwu-arbug

fbshipit-source-id: 355afe93c4dd0a96a0d711ef190e8b86fbe8d11d
2018-05-24 12:13:16 -07:00
Dmitri Smirnov
934f96de27 Better destroydb
Summary:
Delete archive directory before WAL folder
  since archive may be contained as a subfolder.
  Also improve loop readability.
Closes https://github.com/facebook/rocksdb/pull/3797

Differential Revision: D7866378

Pulled By: riversand963

fbshipit-source-id: 0c45d97677ce6fbefa3f8d602ef5e2a2a925e6f5
2018-05-03 16:13:09 -07:00
Dmitri Smirnov
acb61b7a52 Adjust pread/pwrite to return Status
Summary:
Returning bytes_read causes the caller to call GetLastError()
  to report failure but the lasterror may be overwritten by then
  so we lose the error code.
  Fix up CMake file to include xpress source code only when needed.
  Fix warning for the uninitialized var.
Closes https://github.com/facebook/rocksdb/pull/3795

Differential Revision: D7832935

Pulled By: anand1976

fbshipit-source-id: 4be21affb9b85d361b96244f4ef459f492b7cb2b
2018-05-01 13:42:46 -07:00
Dmitri Smirnov
61785c73ed CloseHandle docs says that the return is non-zero, does not say TRUE(1)
Summary:
say it is TRUE(1). Add assert.
Closes https://github.com/facebook/rocksdb/pull/3630

Differential Revision: D7346895

Pulled By: ajkr

fbshipit-source-id: a46075aa4dd89f32520230606adecccecc874cdf
2018-03-20 18:43:02 -07:00
Bruce Mitchener
0de710f5b8 Use nullptr instead of NULL / 0 more consistently.
Summary: Closes https://github.com/facebook/rocksdb/pull/3569

Differential Revision: D7170968

Pulled By: yiwu-arbug

fbshipit-source-id: 308a6b7dd358a04fd9a7de3d927bfd8abd57d348
2018-03-07 12:42:12 -08:00
Dmitri Smirnov
c364eb42b5 Windows cumulative patch
Summary:
This patch addressed several issues.
  Portability including db_test std::thread -> port::Thread Cc: @
  and %z to ROCKSDB portable macro. Cc: maysamyabandeh

  Implement Env::AreFilesSame

  Make the implementation of file unique number more robust

  Get rid of C-runtime and go directly to Windows API when dealing
  with file primitives.

  Implement GetSectorSize() and aling unbuffered read on the value if
  available.

  Adjust Windows Logger for the new interface, implement CloseImpl() Cc: anand1976

  Fix test running script issue where $status var was of incorrect scope
  so the failures were swallowed and not reported.

  DestroyDB() creates a logger and opens a LOG file in the directory
  being cleaned up. This holds a lock on the folder and the cleanup is
  prevented. This fails one of the checkpoin tests. We observe the same in production.
  We close the log file in this change.

 Fix DBTest2.ReadAmpBitmapLiveInCacheAfterDBClose failure where the test
 attempts to open a directory with NewRandomAccessFile which does not
 work on Windows.
  Fix DBTest.SoftLimit as it is dependent on thread timing. CC: yiwu-arbug
Closes https://github.com/facebook/rocksdb/pull/3552

Differential Revision: D7156304

Pulled By: siying

fbshipit-source-id: 43db0a757f1dfceffeb2b7988043156639173f5b
2018-03-06 11:57:43 -08:00
Andrew Kryczka
5d68243e61 Comment out unused variables
Summary:
Submitting on behalf of another employee.
Closes https://github.com/facebook/rocksdb/pull/3557

Differential Revision: D7146025

Pulled By: ajkr

fbshipit-source-id: 495ca5db5beec3789e671e26f78170957704e77e
2018-03-05 13:13:41 -08:00
Dmitri Smirnov
7eb292da14 Fix a memory leak in WindowsThread
Summary:
_endthreadex does not return and thus objects
  for stack destructors do not run. This creates a memory leak.
  We remove the calls since _enthreadex called automatically after the
  threadproc returns i.e. thread exits.
Closes https://github.com/facebook/rocksdb/pull/3542

Differential Revision: D7088713

Pulled By: ajkr

fbshipit-source-id: 749ecafc6a9572f587f76e516547e07734349a54
2018-02-26 13:46:12 -08:00
Igor Sugak
aba3409740 Back out "[codemod] - comment out unused parameters"
Reviewed By: igorsugak

fbshipit-source-id: 4a93675cc1931089ddd574cacdb15d228b1e5f37
2018-02-22 12:43:17 -08:00
David Lai
f4a030ce81 - comment out unused parameters
Reviewed By: everiq, igorsugak

Differential Revision: D7046710

fbshipit-source-id: 8e10b1f1e2aecebbfb229c742e214db887e5a461
2018-02-22 09:44:23 -08:00
Siying Dong
ef29d2a234 Explictly fail writes if key or value is not smaller than 4GB
Summary:
Right now, users will encounter unexpected bahavior if they use key or value larger than 4GB. We should explicitly fail the queriers.
Closes https://github.com/facebook/rocksdb/pull/3484

Differential Revision: D6953895

Pulled By: siying

fbshipit-source-id: b60491e1af064fc5d52971956661f6c18ceac24f
2018-02-09 14:57:54 -08:00
Tamir Duberstein
cd5092e168 Suppress unused warnings
Summary:
- Use `__unused__` everywhere
- Suppress unused warnings in Release mode
    + This currently affects non-MSVC builds (e.g. mingw64).
Closes https://github.com/facebook/rocksdb/pull/3448

Differential Revision: D6885496

Pulled By: miasantreble

fbshipit-source-id: f2f6adacec940cc3851a9eee328fafbf61aad211
2018-02-02 12:27:07 -08:00
Adam Retter
a53c571d2d FreeBSD build support for RocksDB and RocksJava
Summary:
Tested on a clean FreeBSD 11.01 x64.

Closes https://github.com/facebook/rocksdb/pull/1423
Closes https://github.com/facebook/rocksdb/pull/3357

Differential Revision: D6705868

Pulled By: sagar0

fbshipit-source-id: cbccbbdafd4f42922512ca03619a5d5583a425fd
2018-01-11 13:29:55 -08:00
Siying Dong
ccc095a016 Speed up BlockTest.BlockReadAmpBitmap
Summary:
BlockTest.BlockReadAmpBitmap is too slow and times out in some environments. Speed it up by:
(1) improve the way the verification is done. With this it is 5 times faster
(2) run fewer tests for large blocks. This cut it down by another 10 times.
Now it can finish in similar time as other tests.
Closes https://github.com/facebook/rocksdb/pull/3313

Differential Revision: D6643711

Pulled By: siying

fbshipit-source-id: c2397d666eab5421a78ca87e1e45491e0f832a6d
2018-01-02 10:41:28 -08:00
burtonli
b5c99cc908 Disable onboard cache for compaction output
Summary:
FILE_FLAG_WRITE_THROUGH is for disabling device on-board cache in windows API, which should be disabled if user doesn't need system cache.
There was a perf issue related with this, we found during memtable flush, the high percentile latency jumps significantly. During profiling, we found those high latency (P99.9) read requests got queue-jumped by write requests from memtable flush and takes 80ms or even more time to wait, even when SSD overall IO throughput is relatively low.

After enabling FILE_FLAG_WRITE_THROUGH, we rerun the test found high percentile latency drops a lot without observable impact on writes.

Scenario 1: 40MB/s + 40MB/s  R/W compaction throughput

 Original | FILE_FLAG_WRITE_THROUGH | Percentage reduction
---------------------------------------------------------------
P99.9 | 56.897 ms | 35.593 ms | -37.4%
P99 | 3.905 ms | 3.896 ms | -2.8%

Scenario 2:  14MB/s + 14MB/s R/W compaction throughput, cohosted with 100+ other rocksdb instances have manually triggered memtable flush operations (memtable is tiny), creating a lot of randomized the small file writes operations during test.

Original | FILE_FLAG_WRITE_THROUGH | Percentage reduction
---------------------------------------------------------------
P99.9 | 86.227   ms | 50.436 ms | -41.5%
P99 | 8.415   ms | 3.356 ms | -60.1%
Closes https://github.com/facebook/rocksdb/pull/3225

Differential Revision: D6624174

Pulled By: miasantreble

fbshipit-source-id: 321b86aee9d74470840c70e5d0d4fa9880660a91
2017-12-21 18:41:34 -08:00
Dmitri Smirnov
fe608e32ab Fix a race condition in WindowsThread (port::Thread)
Summary:
Fix a race condition when we create a thread and immediately destroy
 This case should be supported.
  What happens is that the thread function needs the Data instance
  to actually run but has no shared ownership and must rely on the
  WindowsThread instance to continue existing.
  To address this we change unique_ptr to shared_ptr and then
  acquire an additional refcount for the threadproc which destroys it
  just before the thread exit.
  We choose to allocate shared_ptr instance on the heap as this allows
  the original thread to continue w/o waiting for the new thread to start
  running.
Closes https://github.com/facebook/rocksdb/pull/3240

Differential Revision: D6511324

Pulled By: yiwu-arbug

fbshipit-source-id: 4633ff7996daf4d287a9fe34f60c1dd28cf4ff36
2017-12-07 13:42:53 -08:00
Shaohua Li
33c7d4ccd9 Make writable_file_max_buffer_size dynamic
Summary:
The DBOptions::writable_file_max_buffer_size can be changed dynamically.
Closes https://github.com/facebook/rocksdb/pull/3053

Differential Revision: D6152720

Pulled By: shligit

fbshipit-source-id: aa0c0cfcfae6a54eb17faadb148d904797c68681
2017-10-31 13:56:35 -07:00
Dmitri Smirnov
682db81385 Enable cacheline_aligned_alloc() to allocate from jemalloc if enabled.
Summary:
Reuse WITH_JEMALLOC option in preparation for module search unification.
  Move jemalloc overrides into a separate .cc
  Remote obsolete JEMALLOC_NOINIT option.
Closes https://github.com/facebook/rocksdb/pull/3078

Differential Revision: D6174826

Pulled By: yiwu-arbug

fbshipit-source-id: 9970a0289b4490272d15853920d9d7531af91140
2017-10-27 13:27:12 -07:00
Dmitri Smirnov
d2a65c59e1 Fix unused var warnings in Release mode
Summary:
MSVC does not support unused attribute at this time. A separate assignment line fixes the issue probably by being counted as usage for MSVC and it no longer complains about unused var.
Closes https://github.com/facebook/rocksdb/pull/3048

Differential Revision: D6126272

Pulled By: maysamyabandeh

fbshipit-source-id: 4907865db45fd75a39a15725c0695aaa17509c1f
2017-10-23 14:27:04 -07:00
Dmitri Smirnov
ebab2e2d42 Enable MSVC W4 with a few exceptions. Fix warnings and bugs
Summary: Closes https://github.com/facebook/rocksdb/pull/3018

Differential Revision: D6079011

Pulled By: yiwu-arbug

fbshipit-source-id: 988a721e7e7617967859dba71d660fc69f4dff57
2017-10-19 10:57:12 -07:00
Orgad Shaneh
34ebadf930 Fix MinGW build
Summary:
snprintf is defined as _snprintf, which doesn't exist in the std
namespace.
Closes https://github.com/facebook/rocksdb/pull/2298

Differential Revision: D5070457

Pulled By: yiwu-arbug

fbshipit-source-id: 6e1659ac3e86170653b174578da5a8ed16812cbb
2017-09-19 10:28:26 -07:00
Dmitri Smirnov
0ec90a7cc2 Add -DPORTABLE=1 to MSVC CI build
Summary:
Add -DPORTABLE=1
  port::cacheline_aligned_alloc() has arguments swapped which prevents every single test from running.
Closes https://github.com/facebook/rocksdb/pull/2815

Differential Revision: D5751661

Pulled By: siying

fbshipit-source-id: e0857d6e138ec46035b3c23d7c3c751901a0a4a0
2017-08-31 16:42:48 -07:00
Yi Wu
e83d6a02e3 Not using aligned_alloc with gcc4 + asan
Summary:
GCC < 5 + ASAN does not instrument aligned_alloc, which can make ASAN
report false-positive with "free on address which was not malloc" error.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61693

Also suppress leak warning with LRUCache::DisownData().
Closes https://github.com/facebook/rocksdb/pull/2783

Differential Revision: D5696465

Pulled By: yiwu-arbug

fbshipit-source-id: 87c607c002511fa089b18cc35e24909bee0e74b4
2017-08-29 21:56:02 -07:00
Andrew Kryczka
47ed3bfc3b fix WinEnv assertions
Summary: Closes https://github.com/facebook/rocksdb/pull/2702

Differential Revision: D5585389

Pulled By: ajkr

fbshipit-source-id: cb54041eb481d0d759c440f82a8a2c5b34534173
2017-08-08 17:20:52 -07:00
Daniel Black
16e0388205 LRUCacheShard cache line size alignment
Summary:
combining #2568 and #2612.
Closes https://github.com/facebook/rocksdb/pull/2620

Differential Revision: D5464394

Pulled By: IslamAbdelRahman

fbshipit-source-id: 9f71d3058dd6adaf02ce3b2de3a81a1228009778
2017-07-24 10:54:37 -07:00
Sagar Vemuri
72502cf227 Revert "comment out unused parameters"
Summary:
This reverts the previous commit 1d7048c598, which broke the build.

Did a `git revert 1d7048c`.
Closes https://github.com/facebook/rocksdb/pull/2627

Differential Revision: D5476473

Pulled By: sagar0

fbshipit-source-id: 4756ff5c0dfc88c17eceb00e02c36176de728d06
2017-07-21 18:26:26 -07:00
Victor Gao
1d7048c598 comment out unused parameters
Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually.

Reviewed By: igorsugak

Differential Revision: D5454343

fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2
2017-07-21 14:57:44 -07:00
Siying Dong
3c327ac2d0 Change RocksDB License
Summary: Closes https://github.com/facebook/rocksdb/pull/2589

Differential Revision: D5431502

Pulled By: siying

fbshipit-source-id: 8ebf8c87883daa9daa54b2303d11ce01ab1f6f75
2017-07-15 16:11:23 -07:00
Daniel Black
ccf5f08f88 Set CACHE_LINE_SIZE for s390, PPC, ARM64
Summary: Closes https://github.com/facebook/rocksdb/pull/2579

Differential Revision: D5427667

Pulled By: maysamyabandeh

fbshipit-source-id: cd0b076aa0cd38d3554516f01723c548713ece61
2017-07-14 15:13:46 -07:00
Dmitri Smirnov
a21db161c9 Implement ReopenWritibaleFile on Windows and other fixes
Summary:
Make default impl return NoSupported so the db_blob
  tests exist in a meaningful manner.
  Replace std::thread to port::Thread
Closes https://github.com/facebook/rocksdb/pull/2465

Differential Revision: D5275563

Pulled By: yiwu-arbug

fbshipit-source-id: cedf1a18a2c05e20d768c1308b3f3224dbd70ab6
2017-06-20 10:31:13 -07:00
Orgad Shaneh
9bb91e9328 Dedup release
Summary:
cc tamird sagar0
Closes https://github.com/facebook/rocksdb/pull/2325

Differential Revision: D5098302

Pulled By: sagar0

fbshipit-source-id: 297c5506b5d9b2ed1d7719c8caf0b96cffe503b8
2017-06-12 13:13:06 -07:00
Andrew Kryczka
6cc9aef162 New API for background work in single thread pool
Summary:
Previously users could set `max_background_flushes=0` to force rocksdb to use a single thread pool for both background flushes and compactions. That'll no longer be possible since I'm going to deprecate `max_background_flushes` and `max_background_compactions` in favor of a single option. This diff introduces a new way to force a single thread pool: when high-pri pool has zero threads, all background jobs will be submitted to low-pri pool.

Note the majority of the code change is adding `Env::GetBackgroundThreads()`, which is necessary to check whether the user has provided a zero-sized thread pool.
Closes https://github.com/facebook/rocksdb/pull/2204

Differential Revision: D4936256

Pulled By: ajkr

fbshipit-source-id: 929a07a0c0705f7766f5339cd013ff74e90d6e01
2017-05-23 11:12:27 -07:00
Tamir Duberstein
146b7718f0 Fix mingw compilation with -DNDEBUG
Summary:
This was exposed by a48a62d, which made NDEBUG the default for cmake
builds.
Closes https://github.com/facebook/rocksdb/pull/2315

Differential Revision: D5079583

Pulled By: sagar0

fbshipit-source-id: c614e96a40df016a834a62b6236852265e7ee4db
2017-05-17 22:56:48 -07:00
Andrew Kryczka
be421b0b16 portable sched_getcpu calls
Summary:
- added a feature test in build_detect_platform to check whether sched_getcpu() is available. glibc offers it only on some platforms (e.g., linux but not mac); this way should be easier than maintaining a list of platforms on which it's available.
- refactored PhysicalCoreID() to be simpler / less repetitive. ordered the conditional compilation clauses from most-to-least preferred
Closes https://github.com/facebook/rocksdb/pull/2272

Differential Revision: D5038093

Pulled By: ajkr

fbshipit-source-id: 81d7db3cc620250de220bdeb3194b2b3d7673de7
2017-05-10 12:29:23 -07:00
Jos Collin
a620966969 port: updated PhysicalCoreID()
Summary:
Updated PhysicalCoreID() to use sched_getcpu() on x86_64 for glibc >= 2.22.  Added a new
function named GetCPUID() that calls sched_getcpu(), to avoid repeated code. This change is done as per the comments of PR: https://github.com/facebook/rocksdb/pull/2230

Signed-off-by: Jos Collin <jcollin@redhat.com>
Closes https://github.com/facebook/rocksdb/pull/2260

Differential Revision: D5025734

Pulled By: ajkr

fbshipit-source-id: f4cca68c12573cafcf8531e7411a1e733bbf8eef
2017-05-09 19:06:39 -07:00
Gunnar Kudrjavets
0b69e50791 Define CACHE_LINE_SIZE only when it's not defined
Summary:
RocksDB is compiled as part of MyRocks (MySQL storage engine) build.
MySQL already defines `CACHE_LINE_SIZE` and therefore we're getting a
conflict. Change RocksDB definition to be more cognizant of this.
Closes https://github.com/facebook/rocksdb/pull/2257

Differential Revision: D5013188

Pulled By: gunnarku

fbshipit-source-id: cfa76fe99f90dcd82aa09204e2f1f35e07a82b41
2017-05-08 16:12:28 -07:00
Tamir Duberstein
fdaefa0309 travis: add Windows cross-compilation
Summary:
- downcase includes for case-sensitive filesystems
- give targets the same name (librocksdb) on all platforms

With this patch it is possible to cross-compile RocksDB for Windows
from a Linux host using mingw.

cc yuslepukhin orgads
Closes https://github.com/facebook/rocksdb/pull/2107

Differential Revision: D4849784

Pulled By: siying

fbshipit-source-id: ad26ed6b4d393851aa6551e6aa4201faba82ef60
2017-05-05 23:20:01 -07:00
Jos Collin
60847a3b08 port: updated PhysicalCoreID()
Summary: Checked the return value of __get_cpuid(). Implemented the else case where the arch is different from i386 and x86_64.

Pulled By: ajkr

Differential Revision: D4973496

fbshipit-source-id: c40fdef5840364c2a79b1d11df0db5d4ec3d6a4a
2017-05-03 13:08:55 -07:00
Siying Dong
d616ebea23 Add GPLv2 as an alternative license.
Summary: Closes https://github.com/facebook/rocksdb/pull/2226

Differential Revision: D4967547

Pulled By: siying

fbshipit-source-id: dd3b58ae1e7a106ab6bb6f37ab5c88575b125ab4
2017-04-27 18:06:12 -07:00
Dmitri Smirnov
cdad04b051 Remove double buffering on RandomRead on Windows.
Summary:
Remove double buffering on RandomRead on Windows.
  With more logic appear in file reader/write Read no longer
  obeys forwarding calls to Windows implementation.
  Previously direct_io (unbuffered) was only available on Windows
  but now is supported as generic.
  We remove intermediate buffering on Windows.
  Remove random_access_max_buffer_size option which was windows specific.
  Non-zero values for that opton introduced unnecessary lock contention.
  Remove Env::EnableReadAhead(), Env::ShouldForwardRawRequest() that are
  no longer necessary.
  Add aligned buffer reads for cases when requested reads exceed read ahead size.
Closes https://github.com/facebook/rocksdb/pull/2105

Differential Revision: D4847770

Pulled By: siying

fbshipit-source-id: 8ab48f8e854ab498a4fd398a6934859792a2788f
2017-04-27 12:30:05 -07:00
Tomas Kolda
04d58970cb AIX and Solaris Sparc Support
Summary:
Replacement of #2147

The change was squashed due to a lot of conflicts.
Closes https://github.com/facebook/rocksdb/pull/2194

Differential Revision: D4929799

Pulled By: siying

fbshipit-source-id: 5cd49c254737a1d5ac13f3c035f128e86524c581
2017-04-21 20:48:04 -07:00
Siying Dong
d2dce5611a Move some files under util/ to separate dirs
Summary:
Move some files under util/ to new directories env/, monitoring/ options/ and cache/
Closes https://github.com/facebook/rocksdb/pull/2090

Differential Revision: D4833681

Pulled By: siying

fbshipit-source-id: 2fd8bef
2017-04-05 19:09:16 -07:00
Orgad Shaneh
6401a8b76b Fix build with MinGW
Summary:
There still are many warnings (most of them about invalid printf format
for long long), but it builds if FAIL_ON_WARNINGS is disabled.
Closes https://github.com/facebook/rocksdb/pull/2052

Differential Revision: D4807355

Pulled By: siying

fbshipit-source-id: ef03786
2017-03-30 16:54:52 -07:00
Dmitri Smirnov
c9df05d1e4 Fix random access alignment
Summary:
This fixes an issue when the most recent readers assume that alignment is always set even if direct io is off.
Also adjust slightly appveyor script to run db_basic_test cases concurrently.
Closes https://github.com/facebook/rocksdb/pull/1959

Differential Revision: D4671972

Pulled By: IslamAbdelRahman

fbshipit-source-id: 1886620
2017-03-08 17:09:11 -08:00
Dmitri Smirnov
0a4cdde50a Windows thread
Summary:
introduce new methods into a public threadpool interface,
- allow submission of std::functions as they allow greater flexibility.
- add Joining methods to the implementation to join scheduled and submitted jobs with
  an option to cancel jobs that did not start executing.
- Remove ugly `#ifdefs` between pthread and std implementation, make it uniform.
- introduce pimpl for a drop in replacement of the implementation
- Introduce rocksdb::port::Thread typedef which is a replacement for std::thread.  On Posix Thread defaults as before std::thread.
- Implement WindowsThread that allocates memory in a more controllable manner than windows std::thread with a replaceable implementation.
- should be no functionality changes.
Closes https://github.com/facebook/rocksdb/pull/1823

Differential Revision: D4492902

Pulled By: siying

fbshipit-source-id: c74cb11
2017-02-06 14:54:18 -08:00
Dmitri Smirnov
324a0f988e Follow up for DirectIO refactor
Summary: Windows follow up for  dc2584eea0

Differential Revision: D4420337

Pulled By: IslamAbdelRahman

fbshipit-source-id: fedc5b5
2017-01-15 13:24:16 -08:00
Aaron Gao
3e6899d116 change UseDirectIO() to use_direct_io()
Summary:
also change variable name `direct_io_` to `use_direct_io_` in WritableFile to make it consistent with read path.
Closes https://github.com/facebook/rocksdb/pull/1770

Differential Revision: D4416435

Pulled By: lightmark

fbshipit-source-id: 4143c53
2017-01-13 12:09:15 -08:00
Dmitri Smirnov
3c233ca4ea Fix Windows environment issues
Summary:
Enable directIO on WritableFileImpl::Append
     with offset being current length of the file.
     Enable UniqueID tests on Windows, disable others but
     leeting them to compile. Unique tests are valuable to
     detect failures on different filesystems and upcoming
     ReFS.
     Clear output in WinEnv Getchildren.This is different from
     previous strategy, do not touch output on failure.
     Make sure DBTest.OpenWhenOpen works with windows error message
Closes https://github.com/facebook/rocksdb/pull/1746

Differential Revision: D4385681

Pulled By: IslamAbdelRahman

fbshipit-source-id: c07b702
2017-01-09 15:54:12 -08:00
Gunnar Kudrjavets
548b628054 Enable conditionally using adaptive mutexes
Summary:
To support scenarios where we want all instances of `Mutex` be adaptive
we're adding a conditional `#define` so that the desired behavior can be
easily enabled.
Closes https://github.com/facebook/rocksdb/pull/1710

Differential Revision: D4359863

Pulled By: gunnarku

fbshipit-source-id: 2f1e2f8
2016-12-27 16:09:12 -08:00
Aaron Gao
972f96b3fb direct io write support
Summary:
rocksdb direct io support

```
[gzh@dev11575.prn2 ~/rocksdb] ./db_bench -benchmarks=fillseq --num=1000000
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
RocksDB:    version 5.0
Date:       Wed Nov 23 13:17:43 2016
CPU:        40 * Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
CPUCache:   25600 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
Write rate: 0 bytes/second
Compression: Snappy
Memtablerep: skip_list
Perf Level: 1
WARNING: Assertions are enabled; benchmarks unnecessarily slow
------------------------------------------------
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
DB path: [/tmp/rocksdbtest-112628/dbbench]
fillseq      :       4.393 micros/op 227639 ops/sec;   25.2 MB/s

[gzh@dev11575.prn2 ~/roc
Closes https://github.com/facebook/rocksdb/pull/1564

Differential Revision: D4241093

Pulled By: lightmark

fbshipit-source-id: 98c29e3
2016-12-22 13:09:19 -08:00
ivan
046099c9b5 The array is malloced by backtrace_symbols(), and must be freed
Summary:
The address of the array of string pointers is returned as the function result of backtrace_symbols().  This array is malloced by backtrace_symbols(), and must be freed by the caller.
Closes https://github.com/facebook/rocksdb/pull/1692

Differential Revision: D4355737

Pulled By: IslamAbdelRahman

fbshipit-source-id: 5742035
2016-12-20 17:24:12 -08:00
Andrew Kryczka
f0c509e2c8 Return finer-granularity status from Env::GetChildren*
Summary:
It'd be nice to use the error status type to distinguish
between user error and system error. For example, GetChildren can fail
listing a backup directory's contents either because a bad path was provided
(user error) or because an operation failed, e.g., a remote storage service
call failed (system error). In the former case, we want to continue and treat
the backup directory as empty; in the latter case, we want to immediately
propagate the error to the caller.

This diff uses NotFound to indicate user error and IOError to indicate
system error. Previously IOError indicated both.
Closes https://github.com/facebook/rocksdb/pull/1644

Differential Revision: D4312157

Pulled By: ajkr

fbshipit-source-id: 51b4f24
2016-12-12 12:54:13 -08:00
Edouard A
99c052a34f Fix integer overflow in GetL0ThresholdSpeedupCompaction (#1378) 2016-10-23 18:43:29 -07:00
Dmitri Smirnov
b9311aa65c Implement WinRandomRW file and improve code reuse (#1388) 2016-10-13 16:36:34 -07:00
Edouard A
66a91e2607 Add NoSpace subcode to IOError (#1320)
Add a sub code to distinguish "out of space" errors from regular I/O errors
2016-09-07 12:37:45 -07:00
Islam AbdelRahman
e9b2af87f8 Expose ThreadPool under include/rocksdb/threadpool.h
Summary:
This diff split ThreadPool to
-ThreadPool (abstract interface exposed in include/rocksdb/threadpool.h)
-ThreadPoolImpl (actual implementation in util/threadpool_imp.h)

This allow us to expose ThreadPool to the user so we can use it as an option later

Test Plan: existing unit tests

Reviewers: andrewkr, yiwu, yhchiang, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D62085
2016-08-26 10:41:35 -07:00
Willem Jan Withagen
5647fa427c stack_trace,cc: The current Stacktrace code does not compile for FreeBSD (#1153)
* stack_trace,cc: The current Stacktrace code does not compile for FreeBSD

So set it to generate empty routines

* stack_trace,cc: The current Stacktrace code does not compile for FreeBSD

Use the definition also used in other commits
2016-06-05 17:40:43 -07:00
sdong
f62fbd2c85 Handle overflow case of rate limiter's paramters
Summary: When rate_bytes_per_sec * refill_period_us_ overflows, the actual limited rate is very low. Handle this case so the rate will be large.

Test Plan: Add a unit test for it.

Reviewers: IslamAbdelRahman, andrewkr

Reviewed By: andrewkr

Subscribers: yiwu, lightmark, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D58929
2016-05-27 16:15:28 -07:00
Aaron Orenstein
2073cf3775 Eliminate use of 'using namespace std'. Also remove a number of ADL references to std functions.
Summary: Reduce use of argument-dependent name lookup in RocksDB.

Test Plan: 'make check' passed.

Reviewers: andrewkr

Reviewed By: andrewkr

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D58203
2016-05-20 07:42:18 -07:00
Dmitri Smirnov
26adaad438 Split WinEnv into separate classes. (#1128)
For ease of reuse and customization as a library
  without wrapping.
  WinEnvThreads is a class for replacement.
  WintEnvIO is a class for reuse and behavior override.
  Added private virtual functions for custom override
  of fallocate pread for io classes.
2016-05-19 16:40:54 -07:00
Dmitri Smirnov
bac3be7c46 Fix build issue. (#1123)
Implement GetUniqueIdFromFile to support new tests and the feature.
2016-05-16 17:01:00 -07:00
Dmitri Smirnov
aab91b8d8f Use generic threadpool for Windows environment (#1120)
Conditionally retrofit thread_posix for use with std::thread
  and reuse the same logic. Posix users continue using Posix interfaces.
  Enable XPRESS compression in test runs.
  Fix master introduced signed/unsigned mismatch.
2016-05-12 18:34:04 -07:00
Dmitri Smirnov
4ea6e051ee Fix multiple issues with WinMmapFile fo sequential writing (#1108)
make preallocation inline with other writable files
  make sure that we map no more than pre-allocated size.
2016-04-29 16:43:13 -07:00
PraveenSinghRao
e8115cea45 Revert "Use async file handle for better parallelism (#1049)" (#1105)
This reverts commit b54c347424.

Revert async file handle change as it causes failures with appveyor
2016-04-28 22:50:26 -07:00
Li Peng
6d4832a998 Merge pull request #1101 from flyd1005/wip-fix-typo
fix typos and remove duplicated words
2016-04-28 02:30:44 -07:00
dx9
b71c4e613f Alpine Linux Build (#990)
* Musl libc does not provide adaptive mutex. Added feature test for PTHREAD_MUTEX_ADAPTIVE_NP.

* Musl libc does not provide backtrace(3). Added a feature check for backtrace(3).

* Fixed compiler error.

* Musl libc does not implement backtrace(3). Added platform check for libexecinfo.

* Alpine does not appear to support gcc -pg option. By default (gcc has PIE option enabled) it fails with:

gcc: error: -pie and -pg|p|profile are incompatible when linking

When -fno-PIE and -nopie are used it fails with:

/usr/lib/gcc/x86_64-alpine-linux-musl/5.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find gcrt1.o: No such file or directory

Added gcc -pg platform test and output PROFILING_FLAGS accordingly. Replaced pg var in Makefile with PROFILING_FLAGS.

* fix segfault when TEST_IOCTL_FRIENDLY_TMPDIR is undefined and default candidates are not suitable

* use ASSERT_DOUBLE_EQ instead of ASSERT_EQ

* When compiled with ROCKSDB_MALLOC_USABLE_SIZE UniversalCompactionFourPaths and UniversalCompactionSecondPathRatio tests fail due to premature memtable flushes on systems with 16-byte alignment. Arena runs out of block space before GenerateNewFile() completes.

Increased options.write_buffer_size.
2016-04-22 16:49:12 -07:00
PraveenSinghRao
b54c347424 Use async file handle for better parallelism (#1049) 2016-04-22 13:27:33 -07:00
Dmitri Smirnov
ee221d2de0 Introduce XPRESS compresssion on Windows. (#1081)
Comparable with Snappy on comp ratio.
  Implemented using Windows API, does not require external package.
  Avaiable since Windows 8 and server 2012.
  Use -DXPRESS=1 with CMake to enable.
2016-04-19 22:54:24 -07:00
Yueh-Hsuan Chiang
a558830f8f Fixed compile warnings in posix_logger.h and coding.h
Summary:
Fixed the following compile warnings:

/Users/yhchiang/rocksdb/util/posix_logger.h:32:11: error: unused variable 'kDebugLogChunkSize' [-Werror,-Wunused-const-variable]
const int kDebugLogChunkSize = 128 * 1024;
          ^
/Users/yhchiang/rocksdb/util/coding.h:24:20: error: unused variable 'kMaxVarint32Length' [-Werror,-Wunused-const-variable]
const unsigned int kMaxVarint32Length = 5;
                   ^
2 errors generated.

Test Plan: make clean rocksdb

Reviewers: igor, sdong, anthony, IslamAbdelRahman, rven, kradhakrishnan, adamretter

Reviewed By: adamretter

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D56223
2016-03-31 16:01:47 -07:00
Dmitri Smirnov
2ca0994cf7 Latest versions of Jemalloc library do not require je_init()/je_unint()
calls. #ifdef in the source code and make this a default build option.
2016-03-17 11:25:20 -07:00
Dmitri Smirnov
9ea2968d26 Implement ConsistentChildrenAttribute
by using default implementation for now as it works.
2016-02-19 14:20:34 -08:00
Andrew Kryczka
d733dd5728 [build] Fix env_win.cc compiler errors
Summary: I broke it in D53781.

Test Plan: tried the same code in util/env_posix.cc and it compiled successfully

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D54303
2016-02-17 11:57:04 -08:00
Baraa Hamodi
21e95811d1 Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
Andrew Kryczka
59b3ee658f Env function for bulk metadata retrieval
Summary:
Added this new function, which returns filename, size, and modified
timestamp for each file in the provided directory. The default implementation
retrieves the metadata sequentially using existing functions. In the next diff
I'll make HdfsEnv override this function to use libhdfs's bulk get function.

This won't work on windows due to the path separator.

Test Plan:
new unit test

  $ ./env_test --gtest_filter=EnvPosixTest.ConsistentChildrenMetadata

Reviewers: yhchiang, sdong

Reviewed By: sdong

Subscribers: IslamAbdelRahman, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D53781
2016-02-09 14:54:32 -08:00
Tomas Kolda
57a95a7001 Making use of GetSystemTimePreciseAsFileTime dynamic - code review fixes 2016-02-02 10:23:56 +01:00
Tomas Kolda
502d41f150 Making use of GetSystemTimePreciseAsFileTime dynamic to not
break compatibility with Windows 7. The issue with rotated logs
was fixed other way.
2016-02-02 10:23:56 +01:00
Dmitri Smirnov
36300fbbe3 Enable per-request buffer allocation in RandomAccessFile
This change impacts only non-buffered I/O on Windows.
 Currently, there is a buffer per RandomAccessFile
 instance that is protected by a lock. The reason we
 maintain the buffer is non-buffered I/O requires an aligned
 buffer to work.
 XPerf traces demonstrate that we accumulate a considerable
 wait time while waiting for that lock.
 This change enables to set random access buffer size to zero
 which would indicate a per request allocation.
 We are expecting that allocation expense would be much less than
 I/O costs plus wait time due to the fact that the memory heap
 would tend to re-use page aligned allocations especially with the
 use of Jemalloc.
 This change does not affect buffer use as a read_ahead_buffer for
 compaction purposes.
2016-02-01 13:14:37 -08:00
Dmitri Smirnov
ac50fd3a71 Align statistics
Use Yield macro to make it a little more portable between platforms.
2016-01-13 14:53:23 -08:00
Marek Kurdej
92d0850f1c Fix failing assertion in logger on Windows when the disk is full. 2016-01-05 13:35:14 +01:00
Nathan Bronson
7d87f02799 support for concurrent adds to memtable
Summary:
This diff adds support for concurrent adds to the skiplist memtable
implementations.  Memory allocation is made thread-safe by the addition of
a spinlock, with small per-core buffers to avoid contention.  Concurrent
memtable writes are made via an additional method and don't impose a
performance overhead on the non-concurrent case, so parallelism can be
selected on a per-batch basis.

Write thread synchronization is an increasing bottleneck for higher levels
of concurrency, so this diff adds --enable_write_thread_adaptive_yield
(default off).  This feature causes threads joining a write batch
group to spin for a short time (default 100 usec) using sched_yield,
rather than going to sleep on a mutex.  If the timing of the yield calls
indicates that another thread has actually run during the yield then
spinning is avoided.  This option improves performance for concurrent
situations even without parallel adds, although it has the potential to
increase CPU usage (and the heuristic adaptation is not yet mature).

Parallel writes are not currently compatible with
inplace updates, update callbacks, or delete filtering.
Enable it with --allow_concurrent_memtable_write (and
--enable_write_thread_adaptive_yield).  Parallel memtable writes
are performance neutral when there is no actual parallelism, and in
my experiments (SSD server-class Linux and varying contention and key
sizes for fillrandom) they are always a performance win when there is
more than one thread.

Statistics are updated earlier in the write path, dropping the number
of DB mutex acquisitions from 2 to 1 for almost all cases.

This diff was motivated and inspired by Yahoo's cLSM work.  It is more
conservative than cLSM: RocksDB's write batch group leader role is
preserved (along with all of the existing flush and write throttling
logic) and concurrent writers are blocked until all memtable insertions
have completed and the sequence number has been advanced, to preserve
linearizability.

My test config is "db_bench -benchmarks=fillrandom -threads=$T
-batch_size=1 -memtablerep=skip_list -value_size=100 --num=1000000/$T
-level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999
-disable_auto_compactions --max_write_buffer_number=8
-max_background_flushes=8 --disable_wal --write_buffer_size=160000000
--block_size=16384 --allow_concurrent_memtable_write" on a two-socket
Xeon E5-2660 @ 2.2Ghz with lots of memory and an SSD hard drive.  With 1
thread I get ~440Kops/sec.  Peak performance for 1 socket (numactl
-N1) is slightly more than 1Mops/sec, at 16 threads.  Peak performance
across both sockets happens at 30 threads, and is ~900Kops/sec, although
with fewer threads there is less performance loss when the system has
background work.

Test Plan:
1. concurrent stress tests for InlineSkipList and DynamicBloom
2. make clean; make check
3. make clean; DISABLE_JEMALLOC=1 make valgrind_check; valgrind db_bench
4. make clean; COMPILE_WITH_TSAN=1 make all check; db_bench
5. make clean; COMPILE_WITH_ASAN=1 make all check; db_bench
6. make clean; OPT=-DROCKSDB_LITE make check
7. verify no perf regressions when disabled

Reviewers: igor, sdong

Reviewed By: sdong

Subscribers: MarkCallaghan, IslamAbdelRahman, anthony, yhchiang, rven, sdong, guyg8, kradhakrishnan, dhruba

Differential Revision: https://reviews.facebook.net/D50589
2015-12-25 11:03:40 -08:00
Siying Dong
298ba27ae2 Merge pull request #846 from yuslepukhin/enble_c4244_lossofdata
Enable MS compiler warning c4244.
2015-12-23 22:59:42 -08:00
Venkatesh Radhakrishnan
030215bf01 Running manual compactions in parallel with other automatic or manual compactions in restricted cases
Summary:
This diff provides a framework for doing manual
compactions in parallel with other compactions. We now have a deque of manual compactions. We also pass manual compactions as an argument from RunManualCompactions down to
BackgroundCompactions, so that RunManualCompactions can be reentrant.
Parallelism is controlled by the two routines
ConflictingManualCompaction to allow/disallow new parallel/manual
compactions based on already existing ManualCompactions. In this diff, by default manual compactions still have to run exclusive of other compactions. However, by setting the compaction option, exclusive_manual_compaction to false, it is possible to run other compactions in parallel with a manual compaction. However, we are still restricted to one manual compaction per column family at a time. All of these restrictions will be relaxed in future diffs.
I will be adding more tests later.

Test Plan: Rocksdb regression + new tests + valgrind

Reviewers: igor, anthony, IslamAbdelRahman, kradhakrishnan, yhchiang, sdong

Reviewed By: sdong

Subscribers: yoshinorim, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47973
2015-12-14 11:20:34 -08:00
Dmitri Smirnov
236fe21c92 Enable MS compiler warning c4244.
Mostly due to the fact that there are differences in sizes of int,long
  on 64 bit systems vs GNU.
2015-12-11 16:47:34 -08:00
charsyam
c30b499541 fix typos in comments 2015-12-11 01:54:48 +09:00
Siying Dong
fa3dbf203f Merge pull request #853 from Vaisman/enable_C4267_warning
Enable C4267 warning
2015-12-08 17:59:24 -08:00
yuslepukhin
78de0c9222 Fix up VS 15 build.
Fix warnings
 Take advantage of native snprintf on VS 15
2015-12-08 08:38:21 -08:00
Vasili Svirski
41b32c6059 Enable C4267 warning
* conversion from 'size_t' to 'type', by add static_cast

Tested:
* by build solution on Windows, Linux locally,
* run tests
* build CI system successful
2015-11-24 16:33:09 +03:00
yuslepukhin
047bd22aae Build on Visual Studio 2015 Update 1 2015-11-20 15:31:47 -08:00
Dmitri Smirnov
314f62194a Remove headers from the cc since they are in the module's header. 2015-11-16 15:08:11 -08:00
Dmitri Smirnov
472c74006f Add necessary headers after cpplint rearranged includes 2015-11-16 14:41:11 -08:00
Islam AbdelRahman
a163cc2d5a Lint everything
Summary:
```
arc2 lint --everything
```

run the linter on the whole code repo to fix exisitng lint issues

Test Plan: make check -j64

Reviewers: sdong, rven, anthony, kradhakrishnan, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D50769
2015-11-16 12:56:21 -08:00
Dmitri Smirnov
5270b33bd3 Make use of portable uint64_t type to make possible file access
in 64-bit.

  Currently, a signed off_t type is being used for the following
  interfaces for both offset and the length in bytes:
  * `Allocate`
  * `RangeSync`

  On Linux `off_t` is automatically either 32 or 64-bit depending on
  the platform. On Windows it is always a 32-bit signed long which
  limits file access and in particular space pre-allocation
  to effectively 2 Gb.

  Proposal is to replace off_t with uint64_t as a portable type
  always access files with 64-bit interfaces.

  May need to modify posix code but lack resources to test it.
2015-11-10 17:03:42 -08:00
sdong
296c3a1f94 "make format" in some recent commits
Summary: Run "make format" for some recent commits.

Test Plan: Build and run tests

Reviewers: IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D49707
2015-10-29 17:11:14 -07:00
Dmitri Smirnov
6fbc4f9f3e Implement smart buffer management.
introduce a new DBOption random_access_max_buffer_size to limit
  the size of the random access buffer used for unbuffered access.
  Implement read ahead buffering when enabled.
  To that effect propagate compaction_readahead_size and the new option
  to the env options to make it available for the implementation.
  Add Hint() override so SetupForCompaction() call would call Hint()
  readahead can now be setup from both Hint() and EnableReadAhead()
  Add new option random_access_max_buffer_size support
  db_bench, options_helper to make it string parsable
  and the unit test.
2015-10-27 14:44:16 -07:00
sdong
e1a5ff857b Allow users to disable some kill points in db_stress
Summary:
Give a name for every kill point, and allow users to disable some kill points based on prefixes. The kill points can be passed by db_stress through a command line paramter. This provides a way for users to boost the chance of triggering low frequency kill points
This allow follow up changes in crash test scripts to improve crash test coverage.

Test Plan:
Manually run db_stress with variable values of --kill_random_test and --kill_prefix_blacklist. Like this:
 --kill_random_test=2 --kill_prefix_blacklist=Posix,WritableFileWriter::Append,WritableFileWriter::WriteBuffered,WritableFileWriter::Sync

Reviewers: igor, kradhakrishnan, rven, IslamAbdelRahman, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D48735
2015-10-15 14:33:13 -07:00
Siying Dong
d662b8dab5 Merge pull request #766 from PraveenSinghRao/lockfix
move debug variable under ifndef NDEBUG
2015-10-14 10:07:17 -07:00
Praveen Rao
91c041e578 move debug variable under ifndef 2015-10-13 14:28:11 -07:00