Summary:
Many link commands were identical.
Factor that out into a variable, AM_LINK, and use
it in place of all of those open-coded commands.
Test Plan: run "make check"
Reviewers: ljin, igor, sdong
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33585
Summary:
With this change, make now prints a summary line for each
compiler and linker invocation, e.g.,:
CC db/builder.o
CC db/c.o
CC db/column_family.o
To see full commands, insert "V=1" into your make command.
E.g., run "make V=1 all" if you want it to print each command
in its full glory.
$^ is GNU make's abbreviation for the prerequisites of the current target.
These AM_V_... variables expand to some very short string like "CC" or
"LD", by default, so that the output of "make" is readable. If/when you
want more details, just build with "make V=1 ...", and make will print
each full command as it is executed. If you prefer to see the noise
all the time, and only want to optionally see the abbreviated output,
set AM_DEFAULT_VERBOSITY=1 in your environment, and then build with
V=0 to see the abbreviated command indicators.
Test Plan:
invoke make a few different ways and observe:
make clean; make # abbreviated
make clean; make V=0 # also abbreviated
make clean; make V=1 # full detail
Reviewers: sdong, ljin, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33579
Summary:
This is in preparation for some factorization.
* Makefile (deletefile_test): Add $(COVERAGEFLAGS) to link command.
(options_test): Remove explicit (redundant) dependency on
options_helper.o: that is already a dependent, via $(LIBOBJECTS)
Test Plan: run make
Reviewers: sdong, ljin, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33573
Summary:
There were Makefile rules to build those two targets,
but neither rule has worked for a long time, due to missing
dependent source files. Remove those rules.
Test Plan: run "make"
Reviewers: sdong, ljin, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33567
Summary:
Before this change, running "make" with no arguments would
silently run the rules for the "uninstall" target(!). Don't do that.
* Makefile (default): New, first target; depend on "all".
(uninstall, install): Do not hide the commands we run.
Test Plan:
Run "make" and verify that the rules for "uninstall" are no longer run.
Instead, note that many files are compiled and linked. Before, none were.
Reviewers: sdong, ljin, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33531
Summary:
* Makefile (dummy): Prefix this statement with "dummy := ",
so that it no longer triggers a syntax error from GNU make 3.80
and earlier. Reported by nielsl in
https://github.com/facebook/rocksdb/issues/509
Test Plan: run make
Reviewers: sdong, ljin, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33429
Summary:
The code being removed would invoke sed differently to work
around a portability difference in how sed -i works (different
on MacOS). Yet performing a host-type-based ifdef fails when
the tools installed do not match. That sed use was solely to
post-process the .d file. Instead, generate the desired output
directly, by using the compiler's -MT<FILE> option.
* Makefile (%.d: %.cc): With the prior use of Makefile-ifdef'd
sed, when building on MacOS with gnu sed, every run of this rule
would fail with a sed usage error. Also list each .d file as a
dependent.
Test Plan:
Ensure that a selected .d file is the same as before both with
g++ and with clang++. However, note that the new .d files each
contain a new reference to the .d file itself.
Reviewers: sdong, ljin, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33369
Summary:
Added new target ##make analyze## into Makefile. This command runs clang static analyzer and builds the sources as ##make all##. The result report is put into ##$(RocksDbSourceRoot)/can_build_report/##
If the development environment is a Facebook devserver and ##ROCKSDB_NO_FBCODE## is not set, then scan-build is used from fbcode. If it is run not on a Facebook devserver, scan-build should be available in ##$PATH##. I'll add details to wiki how to install scan-build on a non Facebook devserver environment.
Test Plan:
Run the fallowing commands on a Facebook devserver and Mac OS, and ensure no build or test errors.
```
% make all check -j32
% make clean
% USE_CLANG=1 make all -j32
% make analyze
% USE_CLANG=1 make analyze
```
Reviewers: sdong, lgalanis, leveldb, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D32799
Summary: Add CappedFixTransform, which is the same as fixed length prefix extractor, except that when slice is shorter than the fixed length, it will use the full key.
Test Plan:
Add a test case for
db_test
options_test
and a new test
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: MarkCallaghan, leveldb, dhruba, yoshinorim
Differential Revision: https://reviews.facebook.net/D31887
Summary: This was a feature request by osquery. See task t5617758
Test Plan: compiles and memenv_test runs
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32115
Summary:
There's a bug in TSAN (or libstdc++?) with std::shared_ptr<> for some reason. In db_test, only FlushSchedule is affected.
See more: https://groups.google.com/forum/#!topic/thread-sanitizer/vz_s-t226Vg
With this change and all other @sdong's and mine diffs, our db_test should be TSAN-clean. I'll move to other tests.
Test Plan: no more flush schedule when running TSAN
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: sdong, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32469
Summary: We don't have plans to work on this in the short term. If we ever resurrect the project, we can find the code in the history. No need for it to linger around
Test Plan: no test
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32349
Summary:
1. If WAL directory is different from db directory. Sync the directory after creating a log file under it.
2. After creating an SST file, sync its parent directory instead of DB directory.
3. change the check of kResetDeleteUnsyncedFiles in fault_injection_test. Since we changed the behavior to sync log files' parent directory after first WAL sync, instead of creating, kResetDeleteUnsyncedFiles will not guarantee to show post sync updates.
Test Plan: make all check
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32067
Summary: When you compile with COMPILE_WITH_TSAN=1, we will compile the code with -fsanitize=thread. This will resolve bunch of data race issues we might have.
Test Plan: COMPILE_WITH_TSAN=1 m db_test
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32019
Summary:
This is a port of [[ https://github.com/google/leveldb/blob/master/db/fault_injection_test.cc | LevelDB's fault_injection_test ]] to RocksDB. Unfortunately it fails with:
```
==== Test FaultInjectionTest.FaultTest
db/fault_injection_test.cc:491: Corruption: no meta-nextfile entry in descriptor
#0 ./fault_injection_test() [0x41477a] rocksdb::FaultInjectionTest::PartialCompactTestReopenWithFault(rocksdb::FaultInjectionTest::ResetMethod, int, int) /data/users/tomdzk/rocksdb/db/fault_injection_test.cc:491
#1 ./fault_injection_test() [0x40a38a] rocksdb::_Test_FaultTest::_Run() /data/users/tomdzk/rocksdb/db/fault_injection_test.cc:517
#2 ./fault_injection_test() [0x415bea] rocksdb::_Test_FaultTest::_RunIt() /data/users/tomdzk/rocksdb/db/fault_injection_test.cc:507
#3 ./fault_injection_test() [0x584367] rocksdb::test::RunAllTests() /data/users/tomdzk/rocksdb/util/testharness.cc:70
#4 /usr/local/fbcode/gcc-4.8.1-glibc-2.17/lib/libc.so.6(__libc_start_main+0x10e) [0x7f7a40857efe] ?? ??:0
#5 ./fault_injection_test() [0x408bb8] _start ??:0
```
so I commented out the test invocation in the source code for now (lines 514-520) so it can be merged.
Test Plan: This is a new test.
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31587
Summary: I'm tired of double-tab when opening build_tools/<something>. This change will make bu<tab> fully complete my path :)
Test Plan: `vi bu<tab>` gives me `vi build_tools/` yay!
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30639
Summary:
Create a benchmark for testing memtablereps. This diff is a bit rough, but it should do the trick until other bootcampers can clean it up.
Addressing comments
Removed the mutexes
Changed ReadWriteBenchmark to fix number of reads and count the number of writes we can perform in that time.
Test Plan:
Run it.
Below runs pass
./memtablerep_bench --benchmarks fillrandom,readrandom --memtablerep skiplist
./memtablerep_bench --benchmarks fillseq,readseq --memtablerep skiplist
./memtablerep_bench --benchmarks readwrite,seqreadwrite --memtablerep skiplist --num_operations 200 --num_threads 5
./memtablerep_bench --benchmarks fillrandom,readrandom --memtablerep hashskiplist
./memtablerep_bench --benchmarks fillseq,readseq --memtablerep hashskiplist
--num_scans 2
./memtablerep_bench --benchmarks fillseq,readseq --memtablerep vector
Reviewers: jpaton, ikabiljo, sdong
Reviewed By: sdong
Subscribers: dhruba, ameyag
Differential Revision: https://reviews.facebook.net/D22683
Summary:
Add support to allow nested config for block-based table factory. The format looks like this:
"write_buffer_size=1024;block_based_table_factory={block_size=4k};max_write_buffer_num=2"
Test Plan: unit test
Reviewers: yhchiang, rven, igor, ljin, jonahcohen
Reviewed By: jonahcohen
Subscribers: jonahcohen, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29223
Summary: Add -fno-exceptions flag to ROCKSDB_LITE.
Test Plan: make OPT=-DROCKSDB_LITE shared_lib -j32
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29901
Summary:
While running rocksdb tests, we sometimes encounter errors and
the test run stops. We now provide a new make target call check_some
which restarts the test run from a specific test and continues from
there depending on the value of the environment variable ROCKSDBTESTS_START
Test Plan:
Run make check_some with different values of
ROCKSDBTESTS_START.
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29913
Summary:
Add GetThreadList API, which allows developer to track the
status of each process. Currently, calling GetThreadList will
only get the list of background threads in RocksDB with their
thread-id and thread-type (priority) set. Will add more support
on this in the later diffs.
ThreadStatus currently has the following properties:
// An unique ID for the thread.
const uint64_t thread_id;
// The type of the thread, it could be ROCKSDB_HIGH_PRIORITY,
// ROCKSDB_LOW_PRIORITY, and USER_THREAD
const ThreadType thread_type;
// The name of the DB instance where the thread is currently
// involved with. It would be set to empty string if the thread
// does not involve in any DB operation.
const std::string db_name;
// The name of the column family where the thread is currently
// It would be set to empty string if the thread does not involve
// in any column family.
const std::string cf_name;
// The event that the current thread is involved.
// It would be set to empty string if the information about event
// is not currently available.
Test Plan:
./thread_list_test
export ROCKSDB_TESTS=GetThreadList
./db_test
Reviewers: rven, igor, sdong, ljin
Reviewed By: ljin
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D25047
Currently maven publishing uses the library with debug symbols. What
leads to unnecessary big library sizes. Included strip to remove
unnecessary stuff. 40M -> 2.7M
Previous to this commit too much targets got dependencies
on javadocs target.
Introduced one additional target "javalib" which resolves
that situation. JavaDoc will now be generated once while
executing a task with prefix "rocksdbjava".
Summary:
This is just a simple test that passes two files though a compaction. It shows the framework so that people can continue building new compaction *unit* tests.
In the future we might want to move some Compaction* tests from DBTest here. For example, CompactBetweenSnapshot seems a good candidate.
Hopefully this test can be simpler when we mock out VersionSet.
Test Plan: this is a test
Reviewers: ljin, rven, yhchiang, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28449
Summary:
We need to turn on -Wshorten-64-to-32 for mobile. See D1671432 (internal phabricator) for details.
This diff turns on the warning flag and fixes all the errors. There were also some interesting errors that I might call bugs, especially in plain table. Going forward, I think it makes sense to have this flag turned on and be very very careful when converting 64-bit to 32-bit variables.
Test Plan: compiles
Reviewers: ljin, rven, yhchiang, sdong
Reviewed By: yhchiang
Subscribers: bobbaldwin, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28689
Summary:
This diff adds three sets of APIs to RocksDB.
= GetColumnFamilyMetaData =
* This APIs allow users to obtain the current state of a RocksDB instance on one column family.
* See GetColumnFamilyMetaData in include/rocksdb/db.h
= EventListener =
* A virtual class that allows users to implement a set of
call-back functions which will be called when specific
events of a RocksDB instance happens.
* To register EventListener, simply insert an EventListener to ColumnFamilyOptions::listeners
= CompactFiles =
* CompactFiles API inputs a set of file numbers and an output level, and RocksDB
will try to compact those files into the specified level.
= Example =
* Example code can be found in example/compact_files_example.cc, which implements
a simple external compactor using EventListener, GetColumnFamilyMetaData, and
CompactFiles API.
Test Plan:
listener_test
compactor_test
example/compact_files_example
export ROCKSDB_TESTS=CompactFiles
db_test
export ROCKSDB_TESTS=MetaData
db_test
Reviewers: ljin, igor, rven, sdong
Reviewed By: sdong
Subscribers: MarkCallaghan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D24705
Summary:
Only one more try, I promise.
I talked to Jim and he mentioned that if we include our system includes with -isystem rather than with -I, that signals to the compile that those are system includes and thus no warnings are issued. So I turned our glibc includes into system includes and now we no longer get the warning from there, making us shadow-warning-free!
Test Plan: compiles with both clang and gcc
Reviewers: sdong, yhchiang, rven, ljin
Reviewed By: ljin
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28479
Summary:
So glibc is not -Wshadow-safe, so we need to turn it off :(
error: ‘int sigaction(int, const sigaction*, sigaction*)’ hides
constructor for ‘struct sigaction’
The rest of the changes in this diff is that we include .h files under rocksdb namespace, which is a no-no.
Test Plan: compiles now
Reviewers: ljin, yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28413
Summary: It turns out that -Wshadow has different rules for gcc than clang. Previous commit fixed clang. This commits fixes the rest of the warnings for gcc.
Test Plan: compiles
Reviewers: ljin, yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28131
Summary:
TestMemEnv simulates all Env APIs using in-memory data structures.
We can use it to speed up db_test run, which is now reduced ~7mins when it is
enabled.
We can also add features to simulate power/disk failures in the next
step
TestMemEnv is derived from helper/mem_env
mem_env can not be used for rocksdb since some of its APIs do not give
the same results as env_posix. And its file read/write is not thread safe
Test Plan:
make all -j32
./db_test
./env_mem_test
Reviewers: sdong, yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28035
Summary: as title
Test Plan: ./c_test
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28119
Summary:
...and fix all the errors :)
Jim suggested turning on -Wshadow because it helped him fix number of critical bugs in fbcode. I think it's a good idea to be -Wshadow clean.
Test Plan: compiles
Reviewers: yhchiang, rven, sdong, ljin
Reviewed By: ljin
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27711
Summary:
Rename Version::Builder to VersionBuilder and expose its definition to a header.
Make VerisonBuilder not reference Version or ColumnFamilyData, only working with VersionStorageInfo.
Add version_builder_test which has a simple test.
Test Plan: make all check
Reviewers: rven, yhchiang, igor, ljin
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D27969
Summary: Decoupling code that deals with archived log files outside of DBImpl. That will make this code easier to reason about and test. It will also make the code easier to improve, because an improver doesn't have to understand DBImpl code in entirety.
Test Plan: added test
Reviewers: ljin, yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27873
Summary:
Add some helper functions to make sure DB works well for non-default comparators.
Add a test for SimpleSuffixReverseComparator.
Test Plan: Run the new test
Reviewers: ljin, rven, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D27831
Summary:
Make compaction picker easier to test.
The basic idea is to separate a minimum subcomponent of Version to VersionStorageInfo, which just responsible to LSM tree. A stub VersionStorageInfo can then be easily created and passed into compaction picker so that we can check the outputs.
It now passes most tests. Still two things need to be done:
(1) deal with the FIFO compaction's file size.
(2) write an example test to make sure the interface can do the job.
Add a compaction_picker_test to make sure compaction picker codes can be easily unit tested.
Test Plan:
Pass all unit tests and compaction_picker_test
Reviewers: yhchiang, rven, igor, ljin
Reviewed By: ljin
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D27639
Summary:
This diff replaces BlockBasedTable in flush_job_test with TableMock, making it depend on less things and making it closer to an unit test than integration test.
It also introduces a framework to compile mock classes -- Any file named *mock.cc will not be compiled into the build. It will only get compiled into the tests. What way we can mock out most other classes, Version, VersionSet, DBImpl, etc.
Test Plan: flush_job_test
Reviewers: ljin, rven, yhchiang, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27681
Summary:
Abstract out FlushProcess and take it out of DBImpl.
This also includes taking DeletionState outside of DBImpl.
Currently this diff is only doing the refactoring. Future work includes:
1. Decoupling flush_process.cc, make it depend on less state
2. Write flush_process_test, which will mock out everything that FlushProcess depends on and test it in isolation
Test Plan: make check
Reviewers: rven, yhchiang, sdong, ljin
Reviewed By: ljin
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27561
Summary: perf_context.get_from_output_files_time is now only set writable DB's DB::Get(). Extend it to MultiGet() and read only DB.
Test Plan:
make all check
Fix perf_context_test and extend it to cover MultiGet(), as long as read-only DB. Run it and watch the results
Reviewers: ljin, yhchiang, igor
Reviewed By: igor
Subscribers: rven, leveldb
Differential Revision: https://reviews.facebook.net/D24207
Summary:
Before this diff, there are two places with rocksdb versions. After the diff:
1. we only have one source of truth for rocksdb version
2. we have a script that we can use to get the version that we can use in other compilations (java, go, etc).
Test Plan: make
Reviewers: yhchiang, sdong, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24333
Summary:
I put together a script to assist in the generation of deb's and
rpm's. I've tested that this works on ubuntu via vagrant. I've included the
Vagrantfile here, but I can remove it if it's not useful. The package.sh
script should work on any ubuntu or centos machine, I just added a bit of
logic in there to allow a base Ubuntu or Centos machine to be able to build
RocksDB from scratch.
Example output on Ubuntu 14.04:
```
root@vagrant-ubuntu-trusty-64:/vagrant# ./tools/package.sh
[+] g++-4.7 is already installed. skipping.
[+] libgflags-dev is already installed. skipping.
[+] ruby-all-dev is already installed. skipping.
[+] fpm is already installed. skipping.
Created package {:path=>"rocksdb_3.5_amd64.deb"}
root@vagrant-ubuntu-trusty-64:/vagrant# dpkg --info rocksdb_3.5_amd64.deb
new debian package, version 2.0.
size 17392022 bytes: control archive=1518 bytes.
275 bytes, 11 lines control
2911 bytes, 38 lines md5sums
Package: rocksdb
Version: 3.5
License: BSD
Vendor: Facebook
Architecture: amd64
Maintainer: rocksdb@fb.com
Installed-Size: 83358
Section: default
Priority: extra
Homepage: http://rocksdb.org/
Description: RocksDB is an embeddable persistent key-value store for fast storage.
```
Example output on CentOS 6.5:
```
[root@localhost vagrant]# rpm -qip rocksdb-3.5-1.x86_64.rpm
Name : rocksdb Relocations: /usr
Version : 3.5 Vendor: Facebook
Release : 1 Build Date: Mon 29 Sep 2014 01:26:11 AM UTC
Install Date: (not installed) Build Host: localhost
Group : default Source RPM: rocksdb-3.5-1.src.rpm
Size : 96231106 License: BSD
Signature : (none)
Packager : rocksdb@fb.com
URL : http://rocksdb.org/
Summary : RocksDB is an embeddable persistent key-value store for fast storage.
Description :
RocksDB is an embeddable persistent key-value store for fast storage.
```
Test Plan:
How this gets used is really up to the RocksDB core team. If you
want to actually get this into mainline, you might have to change `make
install` such that it install the RocksDB shared object file as well, which
would require you to link against gflags (maybe?) and that would require some
potential modifications to the script here (basically add a depends on that
package).
Currently, this will install the headers and a pre-compiled statically linked
object file. If that's what you want out of life, than this requires no
modifications.
Reviewers: ljin, yhchiang, igor
Reviewed By: igor
Differential Revision: https://reviews.facebook.net/D24141
Summary:
Intead of passing callback function pointer and its arg on Table::Get()
interface, passing GetContext. This makes the interface cleaner and
possible better perf. Also adding a fast pass for SaveValue()
Test Plan: make all check
Reviewers: igor, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24057
Summary:
Add make install. If INSTALL_PATH is not set, then rocksdb will be
installed under "/usr/local" directory (/usr/local/include for headers
and /usr/local/lib for library file(s).)
Test Plan:
Develop a simple rocksdb app, called test.cc, and do the followings.
make clean
make static_lib -j32
sudo make install
g++ -std=c++11 test.cc -lrocksdb -lbz2 -lz -o test
./test
sudo make uninstall
make clean
make shared_lib -j32
sudo make install
g++ -std=c++11 test.cc -lrocksdb -lbz2 -lz -o test
./test
make INSTALL_PATH=/tmp/path install
make INSTALL_PATH=/tmp/path uninstall
and make sure things are installed / uninstalled in the specified path.
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23211
Summary:
Introducing WriteController, which is a source of truth about per-DB write delays. Let's define an DB epoch as a period where there are no flushes and compactions (i.e. new epoch is started when flush or compaction finishes). Each epoch can either:
* proceed with all writes without delay
* delay all writes by fixed time
* stop all writes
The three modes are recomputed at each epoch change (flush, compaction), rather than on every write (which is currently the case).
When we have a lot of column families, our current pull behavior adds a big overhead, since we need to loop over every column family for every write. With new push model, overhead on Write code-path is minimal.
This is just the start. Next step is to also take care of stalls introduced by slow memtable flushes. The final goal is to eliminate function MakeRoomForWrite(), which currently needs to be called for every column family by every write.
Test Plan: make check for now. I'll add some unit tests later. Also, perf test.
Reviewers: dhruba, yhchiang, MarkCallaghan, sdong, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22791
Summary:
1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The previous one is the traditional filter block. The full_filter_block is newly added. It would generate a filter block that contain all the keys in SST file.
2. When querying a key, table would first check if full_filter is available. If not, it would go to the exact data block and check using block_based filter.
3. User could choose to use full_filter or tradional(block_based_filter). They would be stored in SST file with different meta index name. "filter.filter_policy" or "full_filter.filter_policy". Then, Table reader is able to know the fllter block type.
4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h.
5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc
Benchmark: base commit 1d23b5c470
Command:
db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1
Read QPS increase for about 30% from 2230002 to 2991411.
Test Plan:
make all check
valgrind db_test
db_stress --use_block_based_filter = 0
./auto_sanity_test.sh
Reviewers: igor, yhchiang, ljin, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D20979
Summary: In DBImpl::Recover method, while loading memtables, also check if memtables are empty. Use this in DBImplReadonly to determine whether to lookup memtable or not.
Test Plan:
db_test
make check all
Reviewers: sdong, yhchiang, ljin, igor
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22281
Summary:
Add WriteBatchWithIndex so that a user can query data out of a WriteBatch, to support MongoDB's read-its-own-write.
WriteBatchWithIndex uses a skiplist to store the binary index. The index stores the offset of the entry in the write batch. When searching for a key, the key for the entry is read by read the entry from the write batch from the offset.
Define a new iterator class for querying data out of WriteBatchWithIndex. A user can create an iterator of the write batch for one column family, seek to a key and keep calling Next() to see next entries.
I will add more unit tests if people are OK about this API.
Test Plan:
make all check
Add unit tests.
Reviewers: yhchiang, igor, MarkCallaghan, ljin
Reviewed By: ljin
Subscribers: dhruba, leveldb, xjin
Differential Revision: https://reviews.facebook.net/D21381
Summary: as title
Test Plan:
make all check
ran db_bench and saw seek stats at the end
Reviewers: yhchiang, igor, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D21651
Summary:
Contains the following changes:
- Implementation of cuckoo_table_factory
- Adding cuckoo table into AdaptiveTableFactory
- Adding cuckoo_table_db_test, similar to lines of plain_table_db_test
- Minor fixes to Reader: When a key is found in the table, return the key found instead of the search key.
- Minor fixes to Builder: Add table properties that are required by Version::UpdateTemporaryStats() during Get operation. Don't define curr_node as a reference variable as the memory locations may get reassigned during tree.push_back operation, leading to invalid memory access.
Test Plan:
cuckoo_table_reader_test --enable_perf
cuckoo_table_builder_test
cuckoo_table_db_test
make check all
make valgrind_check
make asan_check
Reviewers: sdong, igor, yhchiang, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D21219
* Script for building the unity.cc file via Makefile
* Unity executable Makefile target for testing builds
* Source code changes to fix compilation of unity build
Summary:
Contains:
- Implementation of TableReader based on Cuckoo Hashing
- Unittests for CuckooTableReader
- Performance test for TableReader
Test Plan:
make cuckoo_table_reader_test
./cuckoo_table_reader_test
make valgrind_check
make asan_check
Reviewers: yhchiang, sdong, igor, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D20511
Summary:
This diff is adding spatial index support to RocksDB.
When creating the DB user specifies a list of spatial indexes. Spatial indexes can cover different areas and have different resolution (i.e. number of tiles). This is useful for supporting different zoom levels.
Each element inserted into SpatialDB has:
* a bounding box, which determines how will the element be indexed
* string blob, which will usually be WKB representation of the polygon (http://en.wikipedia.org/wiki/Well-known_text)
* feature set, which is a map of key-value pairs, where value can be int, double, bool, null or a string. FeatureSet will be a set of tags associated with geo elements (for example, 'road': 'highway' and similar)
* a list of indexes to insert the element in. For example, small river element will be inserted in index for high zoom level, while country border will be inserted in all indexes (including the index for low zoom level).
Each query is executed on single spatial index. Query guarantees that it will return all elements intersecting the specified bounding box, but it might also return some extra non-intersecting elements.
Test Plan: Added bunch of unit tests in spatial_db_test
Reviewers: dhruba, yinwang
Reviewed By: yinwang
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D20361
Summary:
Add the missing ROCKSDB_JAR variable in Makefile, which is mistakenly
removed in https://reviews.facebook.net/D20289.
Test Plan:
export ROCKSDB_JAR=
make rocksdbjava
Summary:
Add a function GetOptions(), where based on four parameters users give: read/write amplification threshold, memory budget for mem tables and target DB size, it picks up a compaction style and parameters for them. Background threads are not touched yet.
One limit of this algorithm: since compression rate and key/value size are hard to predict, it's hard to predict level 0 file size from write buffer size. Simply make 1:1 ratio here.
Sample results: https://reviews.facebook.net/P477
Test Plan: Will add some a unit test where some sample scenarios are given and see they pick the results that make sense
Reviewers: yhchiang, dhruba, haobo, igor, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D18741
Summary:
Fixed some make and linking issues of RocksDBJava. Specifically:
* Add JAVA_LDFLAGS, which does not include gflags
* rocksdbjava library now uses JAVA_LDFLAGS instead of LDFLAGS
* java/Makefile now includes build_config.mk
* rearrange make rocksdbjava workflow to ensure the library file is correctly
included in the jar file.
Test Plan:
make rocksdbjava
make jdb_bench
java/jdb_bench.sh
Reviewers: dhruba, swapnilghike, zzbennett, rsumbaly, ankgup87
Reviewed By: ankgup87
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D20289
Summary:
Add -Wsign-compare to WARNING_FLAGS in Makefile as not all g++ compiler
include -Wsign-compare in -Wall when compiling '.h' file.
Test Plan: make -j32
Reviewers: ljin, igor, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D20169
Summary:
This is a rough sketch of our new document API. Would like to get some thoughts and comments about the high-level architecture and API.
I didn't optimize for performance at all. Leaving some low-hanging fruit so that we can be happy when we fix them! :)
Currently, bunch of features are not supported at all. Indexes can be only specified when creating database. There is no query planner whatsoever. This will all be added in due time.
Test Plan: Added a simple unit test
Reviewers: haobo, yhchiang, dhruba, sdong, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D18747
Summary:
A generic rate limiter that can be shared by threads and rocksdb
instances. Will use this to smooth out write traffic generated by
compaction and flush. This will help us get better p99 behavior on flash
storage.
Test Plan:
unit test output
==== Test RateLimiterTest.Rate
request size [1 - 1023], limit 10 KB/sec, actual rate: 10.374969 KB/sec, elapsed 2002265
request size [1 - 2047], limit 20 KB/sec, actual rate: 20.771242 KB/sec, elapsed 2002139
request size [1 - 4095], limit 40 KB/sec, actual rate: 41.285299 KB/sec, elapsed 2202424
request size [1 - 8191], limit 80 KB/sec, actual rate: 81.371605 KB/sec, elapsed 2402558
request size [1 - 16383], limit 160 KB/sec, actual rate: 162.541268 KB/sec, elapsed 3303500
Reviewers: yhchiang, igor, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19359
Summary:
After evaluating options for JSON storage, I decided to implement our own. The reason is that we'll be able to optimize it better and we get to reduce unnecessary dependencies (which is what we'd get with folly).
I also plan to write a serializer/deserializer for JSONDocument with our own binary format similar to BSON. That way we'll store binary JSON format in RocksDB instead of the plain-text JSON. This means less storage and faster deserialization.
There are still some inefficiencies left here. I plan to optimize them after we develop a functioning DocumentDB. That way we can move and iterate faster.
Test Plan: added a unit test
Reviewers: dhruba, haobo, sdong, ljin, yhchiang
Reviewed By: haobo
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D18831