Summary:
This is a rough sketch of our new document API. Would like to get some thoughts and comments about the high-level architecture and API.
I didn't optimize for performance at all. Leaving some low-hanging fruit so that we can be happy when we fix them! :)
Currently, bunch of features are not supported at all. Indexes can be only specified when creating database. There is no query planner whatsoever. This will all be added in due time.
Test Plan: Added a simple unit test
Reviewers: haobo, yhchiang, dhruba, sdong, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D18747
Summary:
A generic rate limiter that can be shared by threads and rocksdb
instances. Will use this to smooth out write traffic generated by
compaction and flush. This will help us get better p99 behavior on flash
storage.
Test Plan:
unit test output
==== Test RateLimiterTest.Rate
request size [1 - 1023], limit 10 KB/sec, actual rate: 10.374969 KB/sec, elapsed 2002265
request size [1 - 2047], limit 20 KB/sec, actual rate: 20.771242 KB/sec, elapsed 2002139
request size [1 - 4095], limit 40 KB/sec, actual rate: 41.285299 KB/sec, elapsed 2202424
request size [1 - 8191], limit 80 KB/sec, actual rate: 81.371605 KB/sec, elapsed 2402558
request size [1 - 16383], limit 160 KB/sec, actual rate: 162.541268 KB/sec, elapsed 3303500
Reviewers: yhchiang, igor, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19359
Summary:
After evaluating options for JSON storage, I decided to implement our own. The reason is that we'll be able to optimize it better and we get to reduce unnecessary dependencies (which is what we'd get with folly).
I also plan to write a serializer/deserializer for JSONDocument with our own binary format similar to BSON. That way we'll store binary JSON format in RocksDB instead of the plain-text JSON. This means less storage and faster deserialization.
There are still some inefficiencies left here. I plan to optimize them after we develop a functioning DocumentDB. That way we can move and iterate faster.
Test Plan: added a unit test
Reviewers: dhruba, haobo, sdong, ljin, yhchiang
Reviewed By: haobo
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D18831
Summary: We have a lot of problems with gflags. However, when compiling rocksdb static library, we don't need gflags dependency. Reorganize INSTALL.md such that first-time customers don't need any dependency installed to actually build rocksdb static library.
Test Plan: none
Reviewers: dhruba, haobo
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D18501
Summary:
db_test includes Benchmark for LogAndApply. This diff removes it from db_test and puts it into a separate log_and_apply bench. I just wanted to play around with our new benchmark framework and figure out how it works.
I would also like to show you how great it is! I believe right set of microbenchmarks can speed up our productivity a lot and help catch early regressions.
Test Plan: no
Reviewers: dhruba, haobo, sdong, ljin, yhchiang
Reviewed By: yhchiang
CC: leveldb
Differential Revision: https://reviews.facebook.net/D18261
Summary: Added benchmark functionality on the lines of folly/Benchmark.h
Test Plan: Added unit tests
Reviewers: igor, haobo, sdong, ljin, yhchiang, dhruba
Reviewed By: igor
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17973
Summary:
The file tree structure in Version is prebuilt and the range of each file is known.
On the Get() code path, we do binary search in FindFile() by comparing
target key with each file's largest key and also check the range for each L0 file.
With some pre-calculated knowledge, each key comparision that has been done can serve
as a hint to narrow down further searches:
(1) If a key falls within a L0 file's range, we can safely skip the next
file if its range does not overlap with the current one.
(2) If a key falls within a file's range in level L0 - Ln-1, we should only
need to binary search in the next level for files that overlap with the current one.
(1) will be able to skip some files depending one the key distribution.
(2) can greatly reduce the range of binary search, especially for bottom
levels, given that one file most likely only overlaps with N files from
the level below (where N is max_bytes_for_level_multiplier). So on level
L, we will only look at ~N files instead of N^L files.
Some inital results: measured with 500M key DB, when write is light (10k/s = 1.2M/s), this
improves QPS ~7% on top of blocked bloom. When write is heavier (80k/s =
9.6M/s), it gives us ~13% improvement.
Test Plan: make all check
Reviewers: haobo, igor, dhruba, sdong, yhchiang
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17205
Summary:
Add a skeleton binding and test for BackupableDB which shows that BackupableDB
and RocksDB can share the same JNI calls.
Test Plan:
make rocksdbjava
make jtest
Reviewers: haobo, ankgup87, sdong, dhruba
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17793
Summary:
We don't really need sync_point.o if we're compiling with NDEBUG.
This diff depends on D17823
Test Plan: compiles
Reviewers: haobo, ljin, sdong
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17829
Summary:
Introducing RocksDBLite! Removes all the non-essential features and reduces the binary size. This effort should help our adoption on mobile.
Binary size when compiling for IOS (`TARGET_OS=IOS m static_lib`) is down to 9MB from 15MB (without stripping)
Test Plan: compiles :)
Reviewers: dhruba, haobo, ljin, sdong, yhchiang
Reviewed By: yhchiang
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17835
Summary:
This is first step of my effort to reduce size of librocksdb.a for use in mobile.
ldb object files are huge and are ment to be used as a command line tool. I moved them to `tools/` directory and include them only when compiling `ldb`
This diff reduced librocksdb.a from 42MB to 39MB on my mac (not stripped).
Test Plan: ran ldb
Reviewers: dhruba, haobo, sdong, ljin, yhchiang
Reviewed By: yhchiang
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17823
Summary:
* Add a benchmark for java binding for rocksdb. The java benchmark
is a complete rewrite based on the c++ db/db_bench.cc and the
DbBenchmark in dain's java leveldb.
* Support multithreading.
* 'readseq' is currently not supported as it requires RocksDB Iterator.
* usage:
--benchmarks
Comma-separated list of operations to run in the specified order
Actual benchmarks:
fillseq -- write N values in sequential key order in async mode
fillrandom -- write N values in random key order in async mode
fillbatch -- write N/1000 batch where each batch has 1000 values
in random key order in sync mode
fillsync -- write N/100 values in random key order in sync mode
fill100K -- write N/1000 100K values in random order in async mode
readseq -- read N times sequentially
readrandom -- read N times in random order
readhot -- read N times in random order from 1% section of DB
Meta Operations:
delete -- delete DB
DEFAULT: [fillseq, readrandom, fillrandom]
--compression_ratio
Arrange to generate values that shrink to this fraction of
their original size after compression
DEFAULT: 0.5
--use_existing_db
If true, do not destroy the existing database. If you set this
flag and also specify a benchmark that wants a fresh database, that benchmark will fail.
DEFAULT: false
--num
Number of key/values to place in database.
DEFAULT: 1000000
--threads
Number of concurrent threads to run.
DEFAULT: 1
--reads
Number of read operations to do. If negative, do --nums reads.
--key_size
The size of each key in bytes.
DEFAULT: 16
--value_size
The size of each value in bytes.
DEFAULT: 100
--write_buffer_size
Number of bytes to buffer in memtable before compacting
(initialized to default value by 'main'.)
DEFAULT: 4194304
--cache_size
Number of bytes to use as a cache of uncompressed data.
Negative means use default settings.
DEFAULT: -1
--seed
Seed base for random number generators.
DEFAULT: 0
--db
Use the db with the following name.
DEFAULT: /tmp/rocksdbjni-bench
* Add RocksDB.write().
Test Plan: make jbench
Reviewers: haobo, sdong, dhruba, ankgup87
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17433
Summary: This will allow us to disable them completely for iOS or for better performance
Test Plan: will run make all check
Reviewers: igor, haobo, dhruba
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17511
Summary:
I had to make number of changes to the code and Makefile:
* Add `make lib`, that will create static library without debug info. We need this to avoid growing binary too much. Currently it's 14MB.
* Remove cpuinfo() function and use __SSE4_2__ macro. We actually used the macro as part of Fast_CRC32() function.
As a result, I also accidentally fixed this issue: https://www.facebook.com/groups/rocksdb.dev/permalink/549700778461774/?stream_ref=2
* Remove __thread locals in OS_MACOSX
Test Plan: `make lib PLATFORM=IOS`
Reviewers: ljin, haobo, dhruba, sdong
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17475
Summary:
* Add java api for rocksdb::WriteBatch and rocksdb::WriteOptions, which are necessary components
for running benchmark.
* Add java test for org.rocksdb.WriteBatch and org.rocksdb.WriteOptions.
* Add remove() to org.rocksdb.RocksDB, and add put() and remove() to RocksDB which take
org.rocksdb.WriteOptions.
Test Plan: make jtest
Reviewers: haobo, sdong, dhruba
Reviewed By: sdong
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17373
Summary:
* [java] Add a java api for rocksdb::Options, currently only supports create_if_missing.
* [java] Add a test for RocksDBException in RocksDBSample.
Test Plan: make jtest
Reviewers: haobo, sdong
Reviewed By: haobo
CC: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D17385
Summary:
This patch stores gps locations in rocksdb.
Each object is uniquely identified by an id. Each object has
a gps (latitude, longitude) associated with it. The geodb
supports looking up an object either by its gps location
or by its id. There is a method to retrieve all objects
within a circular radius centered at a specified gps location.
Test Plan: Simple unit-test attached.
Reviewers: leveldb, haobo
Reviewed By: haobo
CC: leveldb, tecbot, haobo
Differential Revision: https://reviews.facebook.net/D15567
Summary:
This diff contains a simple jni library for rocksdb which supports open, get,
put and closeusing default options (including Options, ReadOptions, and
WriteOptions.) In the usual case, Java developers can use the c++ rocksdb
library in the way similar to the following:
RocksDB db = RocksDB.open(path_to_db);
...
db.put("hello".getBytes(), "world".getBytes();
byte[] value = db.get("hello".getBytes());
...
db.close();
Specifically, this diff has the following major classes:
* RocksDB: a Java wrapper class which forwards the operations
from the java side to c++ rocksdb library.
* RocksDBException: ncapsulates the error of an operation.
This exception type is used to describe an internal error from
the c++ rocksdb library.
This diff also include a simple java sample code calling c++ rocksdb library.
To build the rocksdb jni library, simply run make jni, and make jtest will try to
build and run the sample code.
Note that if the rocksdb is not built with the default glibc that Java uses,
java will try to load the wrong glibc during the run time. As a result,
the sample code might not work properly during the run time.
Test Plan:
* make jni
* make jtest
Reviewers: haobo, dhruba, sdong, igor, ljin
Reviewed By: dhruba
CC: leveldb, xjin
Differential Revision: https://reviews.facebook.net/D17109
Summary:
@kailiu mentioned on meeting yesterday that we sometimes have trouble opening DB created by old version with the new version. This will be very important to test for column families, since I'm changing disk format for the MANIFEST.
I added a tool that can help us test that. Usage:
./db_sanity_test <path> create
will create a bunch of DBs under <path>
<change RocksDB version>
./db_sanity_test <path> verify
will verify consistency of DBs created under <path>
Test Plan: ran the db_sanity_test
Reviewers: kailiu, dhruba, haobo
Reviewed By: kailiu
CC: leveldb, kailiu, xjin
Differential Revision: https://reviews.facebook.net/D16605
Summary:
this is the key component extracted from diff: https://reviews.facebook.net/D14271
I separate it to a dedicated patch to make the review easier.
Test Plan: added a unit test and passed it.
Reviewers: haobo, sdong, dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16245
Summary:
This is not a generic thread local implementation in the sense that it
only takes pointer. But it does support multiple instances per thread
and lets user plugin function to perform cleanup when thread exits or an
instance gets destroyed.
Test Plan: unit test for now
Reviewers: haobo, igor, sdong, dhruba
Reviewed By: igor
CC: leveldb, kailiu
Differential Revision: https://reviews.facebook.net/D16131
Summary: A simple benchmark that simulates WAL append. It can be used to test different platform/file system's performance on WAL.
Test Plan: run it.
Reviewers: haobo, kailiu
Reviewed By: haobo
CC: igor, dhruba, i.am.jin.lei, yhchiang, leveldb, nkg-
Differential Revision: https://reviews.facebook.net/D16239
Summary:
Add a test to verify HashLinkList and HashSkipList (mainly for the former one) returns the correct results when inserting the same bucket in the different orders.
Some other changes:
(1) add the test to test list
(2) fix compile error
(3) add header
Test Plan: ./prefix_test
Reviewers: haobo, kailiu
Reviewed By: haobo
CC: igor, yhchiang, i.am.jin.lei, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D16143
Summary: Fix table_reader_bench after some interface changes. Add it to make to avoid future breaking
Test Plan: make table_reader_bench and run it with different options.
Reviewers: kailiu, haobo
Reviewed By: haobo
CC: igor, leveldb
Differential Revision: https://reviews.facebook.net/D16107
Summary: To speed up the compilation while allowing us to compile in debug mode.
Test Plan:
make: see -O2 enabled
make dbg: didn't see -O2
Reviewers: igor
Reviewed By: igor
CC: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D15969
Summary:
In new third-party release tool, `LIBNAME=<customized_library> make`
will not really change the LIBNAME.
However it's very odd that the same approach works with old third-party
release tools. I checked previous rocksdb version and both librocksdb.a
and librocksdb_debug.a were correctly generated and copied to the
right place.
Test Plan:
`LIBNAME=hello make -j32` generates hello.a
`make -j32` generates librocksdb.a
Reviewers: igor, sdong, haobo, dhruba
Reviewed By: igor
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15555
Summary:
Previous we made `make release` also compile shared library. However it takes a long time to complete.
To make our development process more efficient. I added a new make target shared_lib.
User can of course run `make <library_name>` for direct compilation. However the <library_name> changed under certain condition. Thus we need `make shared_lib` to get rid of the memorization from users' side.
Test Plan: make shared_lib
Reviewers: igor, sdong, haobo, dhruba
Reviewed By: igor
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15309
Compiling the shared libraries took a long time. Thus to speed up the development speed, it still makes sense to be separated from regular compilation.
Summary:
Added a script that reformat only the affected lines in a given diff.
I planned to make that file as pre-commit hook but looks it's a little bit more difficult than I thought. Since I don't want to spend too much time on this task right now, I eventually added a "make command" to achieve this with a few additional key strokes.
Also make the clang-format solely inherited from Google's style -- there are still debates on some of the style issues, but we can address them later once we reach a consensus.
Test Plan: Did some ugly format change and ran "make format", all affected lines are formatted as expected.
Reviewers: igor, sdong, haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15147
Summary:
Per request, some users need to use dynamic rocksdb library instead of static one.
However currently the dynamic libraries have to be manually compiled by default, which is inconvenient. I made dymamic libraries to be compiled by default.
Test Plan: make clean; make; make clean;
Reviewers: haobo, sdong, dhruba, igor
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15117
Summary:
This diff provides basic implementations of CreateColumnFamily(), DropColumnFamily() and ListColumnFamilies(). It builds on top of https://reviews.facebook.net/D14733
It also includes a bug fix for DBImplReadOnly, where Get implementation would be redirected to DBImpl instead of DBImplReadOnly.
Test Plan: Added unit test
Reviewers: dhruba, haobo, kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15021
Summary:
A vector that leverages pre-allocated stack-based array to achieve better
performance for array with small amount of items.
Test Plan:
Added tests for both correctness and performance
Here is the performance benchmark between vector and autovector
Please note that in the test "Creation and Insertion Test", the test case were designed with the motivation described below:
* no element inserted: internal array of std::vector may not really get
initialize.
* one element inserted: internal array of std::vector must have
initialized.
* kSize elements inserted. This shows the most time we'll spend if we
keep everything in stack.
* 2 * kSize elements inserted. The internal vector of
autovector must have been initialized.
Note: kSize is the capacity of autovector
=====================================================
Creation and Insertion Test
=====================================================
created 100000 vectors:
each was inserted with 0 elements
total time elapsed: 128000 (ns)
created 100000 autovectors:
each was inserted with 0 elements
total time elapsed: 3641000 (ns)
created 100000 VectorWithReserveSizes:
each was inserted with 0 elements
total time elapsed: 9896000 (ns)
-----------------------------------
created 100000 vectors:
each was inserted with 1 elements
total time elapsed: 11089000 (ns)
created 100000 autovectors:
each was inserted with 1 elements
total time elapsed: 5008000 (ns)
created 100000 VectorWithReserveSizes:
each was inserted with 1 elements
total time elapsed: 24271000 (ns)
-----------------------------------
created 100000 vectors:
each was inserted with 4 elements
total time elapsed: 39369000 (ns)
created 100000 autovectors:
each was inserted with 4 elements
total time elapsed: 10121000 (ns)
created 100000 VectorWithReserveSizes:
each was inserted with 4 elements
total time elapsed: 28473000 (ns)
-----------------------------------
created 100000 vectors:
each was inserted with 8 elements
total time elapsed: 75013000 (ns)
created 100000 autovectors:
each was inserted with 8 elements
total time elapsed: 18237000 (ns)
created 100000 VectorWithReserveSizes:
each was inserted with 8 elements
total time elapsed: 42464000 (ns)
-----------------------------------
created 100000 vectors:
each was inserted with 16 elements
total time elapsed: 102319000 (ns)
created 100000 autovectors:
each was inserted with 16 elements
total time elapsed: 76724000 (ns)
created 100000 VectorWithReserveSizes:
each was inserted with 16 elements
total time elapsed: 68285000 (ns)
-----------------------------------
=====================================================
Sequence Access Test
=====================================================
performed 100000 sequence access against vector
size: 4
total time elapsed: 198000 (ns)
performed 100000 sequence access against autovector
size: 4
total time elapsed: 306000 (ns)
-----------------------------------
performed 100000 sequence access against vector
size: 8
total time elapsed: 565000 (ns)
performed 100000 sequence access against autovector
size: 8
total time elapsed: 512000 (ns)
-----------------------------------
performed 100000 sequence access against vector
size: 16
total time elapsed: 1076000 (ns)
performed 100000 sequence access against autovector
size: 16
total time elapsed: 1070000 (ns)
-----------------------------------
Reviewers: dhruba, haobo, sdong, chip
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D14655
Summary:
db_test should be the first to execute because it finds the most bugs.
Also, when third parties report issues, we don't want ldb error message, we prefer to have db_test error message. For example, see thread: https://github.com/facebook/rocksdb/issues/25
Test Plan: make check
Reviewers: dhruba, haobo, kailiu
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D14715
Summary: as title
Test Plan: dynamic_bloom_test
Reviewers: dhruba, sdong, kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D14385
Summary:
In this diff I present you BackupableDB v1. You can easily use it to backup your DB and it will do incremental snapshots for you.
Let's first describe how you would use BackupableDB. It's inheriting StackableDB interface so you can easily construct it with your DB object -- it will add a method RollTheSnapshot() to the DB object. When you call RollTheSnapshot(), current snapshot of the DB will be stored in the backup dir. To restore, you can just call RestoreDBFromBackup() on a BackupableDB (which is a static method) and it will restore all files from the backup dir. In the next version, it will even support automatic backuping every X minutes.
There are multiple things you can configure:
1. backup_env and db_env can be different, which is awesome because then you can easily backup to HDFS or wherever you feel like.
2. sync - if true, it *guarantees* backup consistency on machine reboot
3. number of snapshots to keep - this will keep last N snapshots around if you want, for some reason, be able to restore from an earlier snapshot. All the backuping is done in incremental fashion - if we already have 00010.sst, we will not copy it again. *IMPORTANT* -- This is based on assumption that 00010.sst never changes - two files named 00010.sst from the same DB will always be exactly the same. Is this true? I always copy manifest, current and log files.
4. You can decide if you want to flush the memtables before you backup, or you're fine with backing up the log files -- either way, you get a complete and consistent view of the database at a time of backup.
5. More things you can find in BackupableDBOptions
Here is the directory structure I use:
backup_dir/CURRENT_SNAPSHOT - just 4 bytes holding the latest snapshot
0, 1, 2, ... - files containing serialized version of each snapshot - containing a list of files
files/*.sst - sst files shared between snapshots - if one snapshot references 00010.sst and another one needs to backup it from the DB, it will just reference the same file
files/ 0/, 1/, 2/, ... - snapshot directories containing private snapshot files - current, manifest and log files
All the files are ref counted and deleted immediatelly when they get out of scope.
Some other stuff in this diff:
1. Added GetEnv() method to the DB. Discussed with @haobo and we agreed that it seems right thing to do.
2. Fixed StackableDB interface. The way it was set up before, I was not able to implement BackupableDB.
Test Plan:
I have a unittest, but please don't look at this yet. I just hacked it up to help me with debugging. I will write a lot of good tests and update the diff.
Also, `make asan_check`
Reviewers: dhruba, haobo, emayanke
Reviewed By: dhruba
CC: leveldb, haobo
Differential Revision: https://reviews.facebook.net/D14295
Summary:
A Simple plain table format. No block structure. When creating the table reader, scanning the full table to create indexes.
Test Plan:Add unit test
Reviewers:haobo,dhruba,kailiu
CC:
Task ID: #
Blame Rev:
Summary: Add asan_check rule to Makefile. After we add this, we will create Jenkins run that will check for asan errors!
Test Plan: make asan_check
Reviewers: dhruba, kailiu, haobo
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D14205
Summary:
The primary motivation of the changes is to make it easier to figure out the inside of the tables.
* rename "table stats" to "table properties" since now we have more than "integers" to store in the property block.
* Add filter block size to the basic table properties.
* Whenever a table is built, we'll log the table properties (the sample output is in Test Plan).
* Make an api to expose deleted keys.
Test Plan:
Passed all existing test. and the sample output of table stats:
==================================================================
Basic Properties
------------------------------------------------------------------
# data blocks: 1
# entries: 1
raw key size: 9
raw average key size: 9
raw value size: 9
raw average value size: 0
data block size: 25
index block size: 27
filter block size: 18
(estimated) table size: 70
filter policy: rocksdb.BuiltinBloomFilter
==================================================================
User collected properties: InternalKeyPropertiesCollector
------------------------------------------------------------------
kDeletedKeys: 1
==================================================================
Reviewers: dhruba, haobo
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D14187
Summary:
1. Moved the compiler back to 4.8.1 and uses Centos 5.2 binaries if OS is Centos 5.2.
2. Fixes this issue: https://github.com/facebook/rocksdb/issues/7
3. We use lot of c++11 features, so we can't pretend we can compile without them. Makes it a first class dependency.
4. Fix blob_store_test, which failes on Ubuntu with "too many files opened" error
5. Removed dependency on port/port_chromium.h, which does not even exist on our system
Test Plan: make clean; make check
Reviewers: dhruba, kailiu
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D14145
Summary: This diff invoves some more complicated issues in the posix environment.
Test Plan: works under mac os. will need to verify dev box.
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D14061
Summary: The work to make sure mac os compiles rocksdb is not completed yet. But at least we can start cleaning some warnings captured only by g++ from mac os..
Test Plan: ran make in mac os
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D14049
Summary: as title, half baked test for prefixhash memtable. Also contains deadlock test option
Test Plan: run it
Reviewers: igor, dhruba
Reviewed By: igor
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13887
Summary: It is a very simple benchmark to measure a Table implementation's Get() and iterator performance if all the data is in memory.
Test Plan: N/A
Reviewers: dhruba, haobo, kailiu
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13743
Summary: This patch makes Table and TableBuilder a abstract class and make all the implementation of the current table into BlockedBasedTable and BlockedBasedTable Builder.
Test Plan: Make db_test.cc to work with block based table. Add a new test simple_table_db_test.cc where a different simple table format is implemented.
Reviewers: dhruba, haobo, kailiu, emayanke, vamsi
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13521
Summary:
1. Added a new option that support user-defined table stats collection.
2. Added a deleted key stats collector in `utilities`
Test Plan:
Added a unit test for newly added code.
Also ran make check to make sure other tests are not broken.
Reviewers: dhruba, haobo
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13491
Summary:
Finally, arc diff works again! This has been sitting in my repo for a while.
I would like some comments on my BlobStore benchmark. We don't have to check this in.
Also, I don't do any fsync in the BlobStore, so this is all extremely fast. I'm not sure what durability guarantees we need from the BlobStore.
Test Plan: Nope
Reviewers: dhruba, haobo, kailiu, emayanke
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13527
Summary:
Developing a capability for storing values on external backing file(s).
This is just a highly unoptimized first pass - supports:
1) Allocating some portion of external file to be used to store value
2) Freeing the range, enabling it to be reused by other values
As next steps, I plan to:
1) Create some kind of stress testing. Once I can measure stuff, I can focus on optimizing.
2) Optimize locking.
3) Optimize freelist data structure. Currently we have O(n) for both freeing and allocation.
4) Figure out how to do recovery.
Test Plan: Created a unit test.
Reviewers: dhruba, haobo, kailiu
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13389
Summary:
Jenkin reports errors that:
* Linking error on some machines. The error message shows it cannot find some gcov related symbols.
* lcov error due to the version issues.
Test Plan:
run make in different platforms
Reviewers:
CC:
Task ID: #
Blame Rev:
Summary: An api to query the level, key ranges, size etc for each SST file and an api to delete a specific file from the db and all associated state in the bookkeeping datastructures.
Notes: Editing the manifest version does not release the obsolete files right away. However deleting the file directly will mess up the iterator. We may need a more aggressive/timely file deletion api.
I have used std::unique_ptr - will switch to boost:: since this is external. thoughts?
Unit test is fragile right now as it expects the compaction at certain levels.
Test Plan: unittest
Reviewers: dhruba, vamsi, emayanke
CC: zshao, leveldb, haobo
Task ID: #
Blame Rev:
Summary: Previously, RocksDB's build scripts used relative pathnames like ./build_detect_platform. This can cause problems if the user uses CDPATH. Also, it just doesn't seem right to me.
Test Plan:
make clean
make -j32 check
Reviewers: MarkCallaghan, dhruba, kailiu
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12459
Summary:
-Added null checks and revisions to DBIter::MergeValuesNewToOld()
-Added DBIter test to stringappend_test
-Major fix with Merge and TTL
More plans for fixes later.
Test Plan:
-make clean; make stringappend_test -j 32; ./stringappend_test
-make all check;
Reviewers: haobo, emayanke, vamsi, dhruba
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12315
Summary:
The flags is accidentally introduced, and resulted in problems in 3rd party release.
Test Plan:
run make in all platforms (when doing the 3rd party release)
Summary:
* Added LIBNAME to enable configurable library name.
* remove/check fPIC in linux platform from build_detect_platform
Test Plan: make
Reviewers: emayanke
Differential Revision: https://reviews.facebook.net/D12321
Summary: As Aaron suggested, there are quite some problems with our Makefile and scripts. So in this diff I did some cleanup for them and revise some part of the scripts/makefile to help people better understand some mysterious parts.
Test Plan:
Ran make in several modes;
Ran the updated scripts.
Reviewers: dhruba, emayanke, akushner
Differential Revision: https://reviews.facebook.net/D12285
Summary:
This patch adds the ability for the user to add sequences of arbitrary data (blobs) to write batches. These blobs are saved to the log along with everything else in the write batch. You can add multiple blobs per WriteBatch and the ordering of blobs, puts, merges, and deletes are preserved.
Blobs are not saves to SST files. RocksDB ignores blobs in every way except for writing them to the log.
Before committing this patch, I need to add some test code. But I'm submitting it now so people can comment on the API.
Test Plan: make -j32 check
Reviewers: dhruba, haobo, vamsi
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12195
Summary:
As title. No locking/atomic is needed due to thread local. There is also no need to modify the existing client interface, in order to expose related counters.
perf_context_test shows a simple example of retrieving the number of user key comparison done for each put and get call. More counters could be added later.
Sample output
./perf_context_test 1000000
==== Test PerfContextTest.KeyComparisonCount
Inserting 1000000 key/value pairs
...
total user key comparison get: 43446523
total user key comparison put: 8017877
max user key comparison get: 88939
avg user key comparison get:43
Basically, the current skiplist does well on average, but could perform poorly in extreme cases.
Test Plan: run perf_context_test <total number of entries to put/get>
Reviewers: dhruba
Differential Revision: https://reviews.facebook.net/D12225
Summary:
Ultimate goals of the coverage report are:
* Report the coverage for all files (done in this diff)
* Report the coverage for recently updated files (not fully finished)
* Report is available in html form (done in this diff, but need some extra work to integrate it in Jenkin)
Task link: https://our.intern.facebook.com/intern/tasks/?s=1154818042&t=2604914
Test Plan:
Ran: coverage/coverage_test.sh
The sample output can be found here: https://phabricator.fb.com/P2433892
Reviewers: dhruba, emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D11943
Summary: did a find-replace
Test Plan: make
Reviewers: dhruba, haobo
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D11979
Summary: Currently, merge_test uses /tmp/testdb for the test database. It should really use something more specific to merge_test. Most of the other tests use test::TmpDir() + "/<test name>db". This patch implements such behavior for merge_test; it makes merge_test use test::TmpDir() + "/merge_testdb"
Test Plan:
make clean
make -j32 merge_test
./merge_test
Reviewers: dhruba, haobo
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D11877
Summary: rocksdb-2.0 released to third party
Test Plan: visual inspection
Reviewers: dhruba, haobo, sheki
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D11559
Summary:
With the Makefile now updated to correctly update all .o files, this
should fix the issues recompiling stringappend_test. This should also fix the
"segmentation-fault" that we were getting earlier. Now, stringappend_test should
be clean, and I have added it back to the unit-tests. Also made some minor updates
to the tests themselves.
Test Plan:
1. make clean; make stringappend_test -j 32 (will test it by itself)
2. make clean; make all check -j 32 (to run all unit tests)
3. make clean; make release (test in release mode)
4. valgrind ./stringappend_test (valgrind tests)
Reviewers: haobo, jpaton, dhruba
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D11505
Summary:
The old Makefile did not remove ALL .o and .d files, but rather only
those that happened to be in the root folder and one-level deep. This was causing
issues when recompiling files in deeper folders. This fix now causes make clean
to find ALL .o and .d files via a unix "find" command, and then remove them.
Test Plan:
make clean;
make all -j 32;
Reviewers: haobo, jpaton, dhruba
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D11493
Summary:
I'm concerned about a random seg-fault that sometimes occurs when
running stringappend_test. I will investigate further. First, I am removing
stringappend_test from the regular release tests, and making some clean-ups
to the code.
Test Plan:
1. make stringappend_test
2. ./stringappend_test
Reviewers: haobo, dhruba
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D11313
Summary:
Completed the implementation for the Redis API for Lists.
The Redis API uses rocksdb as a backend to persistently
store maps from key->list. It supports basic operations
for appending, inserting, pushing, popping, and accessing
a list, given its key.
Test Plan:
- Compile with: make redis_test
- Test with: ./redis_test
- Run all unit tests (for all rocksdb) with: make all check
- To use an interactive REDIS client use: ./redis_test -m
- To clean the database before use: ./redis_test -m -d
Reviewers: haobo, dhruba, zshao
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D10833
Summary:
I think the check for "error" that I added had caused
false alarm. Fixed that.
Test Plan:
Revert Plan: OK
Task ID: #
Reviewers: emayanke, dhruba
Reviewed By: emayanke
Differential Revision: https://reviews.facebook.net/D11139
Summary: Added a target to Makefile called 'tags' that runs ctags and cscope on all *.cc and *.h file
Test Plan:
Run 'make tags'. Then start vim and do
:set tags=./tags
:cs add cscope.out
These commands should give you no error messages. You should then be able to access cscope db and ctags as normal in vim.
Reviewers: dhruba
Differential Revision: https://reviews.facebook.net/D11103
Summary:
Implemented the StringAppendOperator class (subclass of MergeOperator).
Found in utilities/merge_operators/string_append/stringappend.{h,cc}
It is a rocksdb Merge Operator that supports string/list concatenation
with a configurable delimiter.
The tests are found in .../stringappend_test.cc. It implements a
map : key -> (list of strings), with core operations Append(list_key,val)
and Get(list_key).
Test Plan:
1. Navigate to your rocksdb repository
2. Execute: make stringappend_test (to compile)
3. Execute: ./stringappend_test (to run the tests)
4. Execute: make all check (to test the ENTIRE rocksdb codebase / regression)
Reviewers: haobo, dhruba, zshao
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D10737
Summary: Since we are keeping 'leveldb' instead of 'rocksdb' in third-party, this is only logical.
Test Plan: make clean;make
Reviewers: sheki, dhruba, haobo
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D10719
Summary:
This diff introduces a new Merge operation into rocksdb.
The purpose of this review is mostly getting feedback from the team (everyone please) on the design.
Please focus on the four files under include/leveldb/, as they spell the client visible interface change.
include/leveldb/db.h
include/leveldb/merge_operator.h
include/leveldb/options.h
include/leveldb/write_batch.h
Please go over local/my_test.cc carefully, as it is a concerete use case.
Please also review the impelmentation files to see if the straw man implementation makes sense.
Note that, the diff does pass all make check and truly supports forward iterator over db and a version
of Get that's based on iterator.
Future work:
- Integration with compaction
- A raw Get implementation
I am working on a wiki that explains the design and implementation choices, but coding comes
just naturally and I think it might be a good idea to share the code earlier. The code is
heavily commented.
Test Plan: run all local tests
Reviewers: dhruba, heyongqiang
Reviewed By: dhruba
CC: leveldb, zshao, sheki, emayanke, MarkCallaghan
Differential Revision: https://reviews.facebook.net/D9651
Summary:
When opened with DBTimestamp::Open call, timestamps are prepended to and stripped from the value during subsequent Put and Get calls respectively. The Timestamp is used to discard values in Get and custom compaction filter which have exceeded their TTL which is specified during Open.
Have made a temporary change to Makefile to let us test with the temporary file TestTime.cc. Have also changed the private members of db_impl.h to protected to let them be inherited by the new class DBTimestamp
Test Plan: make db_timestamp; TestTime.cc(will not check it in) shows how to use the apis currently, but I will write unit-tests shortly
Reviewers: dhruba, vamsi, haobo, sheki, heyongqiang, vkrest
Reviewed By: vamsi
CC: zshao, xjin, vkrest, MarkCallaghan
Differential Revision: https://reviews.facebook.net/D10311
Summary:
- don't see a point exposing table.h to the public.
- fixed make clean to remove also *.d files.
Test Plan: make check; db_stress
Reviewers: dhruba, heyongqiang
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D10479
Summary:
This diff provides the ability to print out a stacktrace when the process receives certain signals.
Currently, we enable this for the following signals (program error related):
SIGILL SIGSEGV SIGBUS SIGABRT
Application simply #include "util/stack_trace.h" and call leveldb::InstallStackTraceHandler() during initialization, if signal handler is needed. It's not done automatically when openning db, because it's the application(process)'s responsibility to install signal handler and some applications might already have their own (like fbcode).
Sample output:
Received signal 11 (Segmentation fault)
#0 0x408ff0 ./signal_test() [0x408ff0] /home/haobo/rocksdb/util/signal_test.cc:4
#1 0x40827d ./signal_test() [0x40827d] /home/haobo/rocksdb/util/signal_test.cc:24
#2 0x7f8bb183172e /usr/local/fbcode/gcc-4.7.1-glibc-2.14.1/lib/libc.so.6(__libc_start_main+0x10e) [0x7f8bb183172e] ??:0
#3 0x408ebc ./signal_test() [0x408ebc] /home/engshare/third-party/src/glibc/glibc-2.14.1/glibc-2.14.1/csu/../sysdeps/x86_64/elf/start.S:113
Segmentation fault (core dumped)
For each frame, we print the raw pointer, the symbol provided by backtrace_symbols (still not good enough), and the source file/line. Note that address translation is done by directly shell out to addr2line. ??:0 means addr2line fails to do the translation. Hacky, but I think it's good for now.
Test Plan: signal_test.cc
Reviewers: dhruba, MarkCallaghan
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D10173
Summary: To know which options the crashtest was run with. Also changed print to sys.stdout.write which is more standard.
Test Plan: python tools/db_crashtest.py
Reviewers: vamsi, akushner, dhruba
Reviewed By: akushner
Differential Revision: https://reviews.facebook.net/D10119
Summary: make crash_test will now invoke the crash_test. Also some cleanup in the db_crashtest.py file
Test Plan: make crash_test
Reviewers: akushner, vamsi, sheki, dhruba
Reviewed By: vamsi
Differential Revision: https://reviews.facebook.net/D9987
Summary: Renames in the Makefile. It will be used like this in third-party.
Test Plan: make
Reviewers: dhruba, sheki, heyongqiang, haobo
Reviewed By: heyongqiang
CC: leveldb
Differential Revision: https://reviews.facebook.net/D9633
Summary: Makefile had options to ignore sign-comparisons and unused-parameters, which should be there. Also fixed the specific errors in the code-base
Test Plan: make
Reviewers: chip, dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D9531
Summary: This option is needed for compilation and the open-sourced rocksdb version wiull need to get it from Makefile
Test Plan: make clean;make
Reviewers: MarkCallaghan, dhruba, sheki, chip
CC: leveldb
Differential Revision: https://reviews.facebook.net/D9243
Summary:
the valgrind version being used is in facebook specific path and should be moved to the fbcode.gcc471.sh file instead of the makefile.
The execution takes the environment's default valgrind version if the fbcode.gcc471.sh's valgrind_version is not available.
Test Plan: make valgrind_check
Reviewers: dhruba, sheki, akushner
Reviewed By: dhruba
Differential Revision: https://reviews.facebook.net/D9213
Summary:
valgrind 3.7.0 used currently has a bug that needs LD_PRELOAD being set as a workaround. This caused problems when run on jenkins. 3.8.1 has fixed this issue and we should use it from third party
Also, have done away with log files. The whole output will be there on the terminal and the failed tests will be listed at the end. This is done because jenkins only lets us download the different files and not view them in the browser which is undesirable.
Test Plan: make valgrind_check
Reviewers: akushner, dhruba, vamsi, sheki, heyongqiang
Reviewed By: sheki
CC: leveldb
Differential Revision: https://reviews.facebook.net/D9171
Summary:
When we use -O3, the gcc 4.7.1 compiler generates 'pinsrd' which is
not supported on machines with "vendor_id : AuthenticAMD".
Previous release of rocksdb used -O2.
Optimization -O2 was introduced at
772f75b3fb
Test Plan: make check
Reviewers: chip, heyongqiang, sheki
Reviewed By: sheki
CC: leveldb
Differential Revision: https://reviews.facebook.net/D9093
Summary:
The script valgrind_test.sh runs Valgrind for all tests in the makefile
including leak-checks and outputs the logs for every test in a separate file
with the name "valgrind_log_<testname>". It prints the failed tests in the file
"valgrind_failed_tests". All these files are created in the directory
"VALGRIND_LOGS" which can be changed in the Makefile.
Finally it checks the line-count for the file "valgrind_failed_tests"
and returns 0 if no tests failed and 1 otherwise.
Test Plan: ./valgrind_test.sh; Changed the tests to incorporte leaks and verified correctness
Reviewers: dhruba, sheki, MarkCallaghan
Reviewed By: sheki
CC: zshao
Differential Revision: https://reviews.facebook.net/D8877
Summary: Added automated valgrind testing for rocksdb by adding valgrind_check in the Makefile
Test Plan: make clean; make all check
Reviewers: dhruba, sheki, MarkCallaghan, zshao
Reviewed By: sheki
CC: leveldb
Differential Revision: https://reviews.facebook.net/D8787
Summary:
I missed InitTestDb() in one of my tess. InitTestDb() initializes the test directory, without which the test will throw IO error.
This problem didn't occur before because I've already run the tests before so the test directory is already there.
Test Plan:
Reviewers: dhruba
CC:
Task ID: #
Blame Rev:
Summary:
auto_roll_logger_test is failing because it cannot create the test dir, leading to IO error: /tmp/leveldbtest-6108/db_log_test/LOG: No such file or directory.
I'll temporary remove the unit test and will revert the test this problem is solved.
Test Plan:
make all check
Reviewers: dhruba
CC: leveldb
Task ID: #
Blame Rev:
Summary:
* Add a SplitByTTLLogger to enable this feature. In this diff I implemented generalized AutoSplitLoggerBase class to simplify the
development of such classes.
* Refactor the existing AutoSplitLogger and fix several bugs.
Test Plan:
* Added a unit tests for different types of "auto splitable" loggers individually.
* Tested the composited logger which allows the log files to be splitted by both TTL and log size.
Reviewers: heyongqiang, dhruba
Reviewed By: heyongqiang
CC: zshao, leveldb
Differential Revision: https://reviews.facebook.net/D8037
Summary:
Earlier way to record in histogram=>
Linear search BucketLimit array to find the bucket and increment the
counter
Current way to record in histogram=>
Store a HistMap statically which points the buckets of each value in the
range [kFirstValue, kLastValue);
In the proccess use vectors instead of array's and refactor some code to
HistogramHelper class.
Test Plan:
run db_bench with histogram=1 and see a histogram being
printed.
Reviewers: dhruba, chip, heyongqiang
Reviewed By: chip
CC: leveldb
Differential Revision: https://reviews.facebook.net/D8265
Summary:
We continually rebuilt build_version.c because we put the
current date into it, but that's what __DATE__ already is. This makes
builds faster.
This also fixes an issue with 'make clean FOO' not working properly.
Also tweak the build rules to be more consistent, always have warnings,
and add a 'make release' rule to handle flags for release builds.
Test Plan: make, make clean
Reviewers: dhruba
Reviewed By: dhruba
Differential Revision: https://reviews.facebook.net/D8139
Summary:
Specific changes:
1) Turn on -Werror so all warnings are errors
2) Fix some warnings the above now complains about
3) Add proper dependency support so changing a .h file forces a .c file
to rebuild
4) Automatically use fbcode gcc on any internal machine rather than
whatever system compiler is laying around
5) Fix jemalloc to once again be used in the builds (seemed like it
wasn't being?)
6) Fix issue where 'git' would fail in build_detect_version because of
LD_LIBRARY_PATH being set in the third-party build system
Test Plan:
make, make check, make clean, touch a header file, make sure
rebuild is expected
Reviewers: dhruba
Reviewed By: dhruba
Differential Revision: https://reviews.facebook.net/D7887