Commit Graph

828 Commits

Author SHA1 Message Date
Abhishek Kona
8e9c781ae5 [Rocksdb] Fix Crash on finding a db with no log files. Error out instead
Summary:
If the vector returned by GetUpdatesSince is empty, it is still returned to the
user. This causes it throw an std::range error.
The probable file list is checked and it returns an IOError status instead of OK now.

Test Plan: added a unit test.

Reviewers: dhruba, heyongqiang

Reviewed By: heyongqiang

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9771
2013-03-28 13:19:07 -07:00
Abhishek Kona
7fdd5f5b33 Use non-mmapd files for Write-Ahead Files
Summary:
Use non mmapd files for Write-Ahead log.
Earlier use of MMaped files. made the log iterator read ahead and miss records.
Now the reader and writer will point to the same physical location.

There is no perf regression :
./db_bench --benchmarks=fillseq --db=/dev/shm/mmap_test --num=$(million 20) --use_existing_db=0 --threads=2
with This diff :
fillseq      :      10.756 micros/op 185281 ops/sec;   20.5 MB/s
without this dif :
fillseq      :      11.085 micros/op 179676 ops/sec;   19.9 MB/s

Test Plan: unit test included

Reviewers: dhruba, heyongqiang

Reviewed By: heyongqiang

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9741
2013-03-28 13:13:35 -07:00
Abhishek Kona
63f216ee0a memory manage statistics
Summary:
Earlier Statistics object was a raw pointer. This meant the user had to clear up
the Statistics object after creating the database. In most use cases the database is created in a function and the statistics pointer is out of scope. Hence the statistics object would never be deleted.
Now Using a shared_ptr to manage this.

Want this in before the next release.

Test Plan: make all check.

Reviewers: dhruba, emayanke

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9735
2013-03-27 11:27:39 -07:00
Haobo Xu
ecd8db0200 [RocksDB] Minimize Mutex protected code section in the critical path
Summary: rocksdb uses a single global lock to protect in memory metadata. We should minimize the mutex protected code section to increase the effective parallelism of the program. See https://our.intern.facebook.com/intern/tasks/?t=2218928

Test Plan:
make check
db_bench

Reviewers: dhruba, heyongqiang

CC: zshao, leveldb

Differential Revision: https://reviews.facebook.net/D9705
2013-03-26 22:42:26 -07:00
Simon Marlow
a8bf8fe504 Integrate the manifest_dump command with ldb
Summary:
Syntax:

   manifest_dump [--verbose] --num=<manifest_num>

e.g.

$ ./ldb --db=/home/smarlow/tmp/testdb manifest_dump --num=12
manifest_file_number 13 next_file_number 14 last_sequence 3 log_number
11  prev_log_number 0
--- level 0 --- version# 0 ---
 6:116['a1' @ 1 : 1 .. 'a1' @ 1 : 1]
 10:130['a3' @ 2 : 1 .. 'a4' @ 3 : 1]
--- level 1 --- version# 0 ---
--- level 2 --- version# 0 ---
--- level 3 --- version# 0 ---
--- level 4 --- version# 0 ---
--- level 5 --- version# 0 ---
--- level 6 --- version# 0 ---

Test Plan: - Tested on an example DB (see output in summary)

Reviewers: sheki, dhruba

Reviewed By: sheki

CC: leveldb, heyongqiang

Differential Revision: https://reviews.facebook.net/D9609
2013-03-22 09:17:30 -07:00
Abhishek Kona
9b70529c86 Disable Unit Test for TransactionLogIteratorStall
Summary:
The unit test fails as our solution does not work with MMap'd files.
Disable the failing unit test. Put it back with the next diff which should fix the problem.

Test Plan: db_test

Reviewers: heyongqiang

CC: dhruba

Differential Revision: https://reviews.facebook.net/D9645
2013-03-21 15:51:18 -07:00
Abhishek Kona
27c15fb67e TransactionLogIter should stall at the last record. Currently it errors out
Summary:
* Add a method to check if the log reader is at EOF.
* If we know a record has been flushed force the log_reader to believe it is not at EOF, using a new method UnMarkEof().

This does not work with MMpaed files.

Test Plan: added a unit test.

Reviewers: dhruba, heyongqiang

Reviewed By: heyongqiang

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9567
2013-03-21 15:12:35 -07:00
Mayank Agarwal
38d54832f7 Initialize variable in constructor for PosixEnv::checkedDiskForMmap_
Summary: This caused compilation problems on some gcc platforms during the third-partyrelease

Test Plan: make

Reviewers: sheki

Reviewed By: sheki

Differential Revision: https://reviews.facebook.net/D9627
2013-03-21 11:26:50 -07:00
Abhishek Kona
91660e048f [Rocksdb] codemod NULL to nullptr in tools/*.cc
Summary: simple sed command to replace NULL in tools directory. Was missed by the previous codemod.

Test Plan: it compiles

Reviewers: emayanke

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9621
2013-03-21 10:45:57 -07:00
Dhruba Borthakur
d0798f67f4 Run compactions even if workload is readonly or read-mostly.
Summary:
The events that trigger compaction:
* opening the database
* Get -> only if seek compaction is not disabled and other checks are true
* MakeRoomForWrite -> when memtable is full
* BackgroundCall ->
  If the background thread is about to do a compaction run, it schedules
  a new background task to trigger a possible compaction. This will cause
  additional background threads to find and process other compactions that
  can run concurrently.

Test Plan: ran db_bench with overwrite and readonly alternatively.

Reviewers: sheki, MarkCallaghan

Reviewed By: sheki

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9579
2013-03-20 23:43:29 -07:00
Dhruba Borthakur
ad96563b79 Ability to configure bufferedio-reads, filesystem-readaheads and mmap-read-write per database.
Summary:
This patch allows an application to specify whether to use bufferedio,
reads-via-mmaps and writes-via-mmaps per database. Earlier, there
was a global static variable that was used to configure this functionality.

The default setting remains the same (and is backward compatible):
 1. use bufferedio
 2. do not use mmaps for reads
 3. use mmap for writes
 4. use readaheads for reads needed for compaction

I also added a parameter to db_bench to be able to explicitly specify
whether to do readaheads for compactions or not.

Test Plan: make check

Reviewers: sheki, heyongqiang, MarkCallaghan

Reviewed By: sheki

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9429
2013-03-20 23:14:03 -07:00
Dhruba Borthakur
2adddeefcb 1.5.8.1.fb release.
Summary:

Test Plan:

Reviewers:

CC:

Task ID: #

Blame Rev:
2013-03-20 14:16:53 -07:00
Mayank Agarwal
a6f4275403 Removing boost from ldb_cmd.cc
Summary: Getting rid of boost in our github codebase which caused problems on third-party

Test Plan: make ldb; python tools/ldb_test.py

Reviewers: sheki, dhruba

Reviewed By: sheki

Differential Revision: https://reviews.facebook.net/D9543
2013-03-20 11:19:12 -07:00
Mayank Agarwal
48abc06049 Using return value of fwrite in posix_logger.h
Summary: Was causing error(warning) in third-party saying unused result

Test Plan: make

Reviewers: sheki, dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D9447
2013-03-19 21:33:01 -07:00
Mayank Agarwal
b1bea58457 Fix more signed-unsigned comparisons
Summary: Some comparisons left in log_test.cc and db_test.cc complained by make

Test Plan: make

Reviewers: dhruba, sheki

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D9537
2013-03-19 17:21:36 -07:00
Mayank Agarwal
487168cdcf Fixed sign-comparison in rocksdb code-base and fixed Makefile
Summary: Makefile had options to ignore sign-comparisons and unused-parameters, which should be there. Also fixed the specific errors in the code-base

Test Plan: make

Reviewers: chip, dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9531
2013-03-19 14:35:23 -07:00
Mark Callaghan
72d14eafd3 add --benchmarks=levelstats option to db_bench, prevent "nan" in stats output
Summary:
Add --benchmarks=levelstats option to report per-level stats (#files, #bytes)
Change readwhilewriting test to report response time for writes but exclude
them from the stats merged by all threads.
Prevent "NaN" in stats output by preventing division by 0.
Remove "o" file I committed by mistake.

Task ID: #

Blame Rev:

Test Plan:
make check

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D9513
2013-03-19 13:14:44 -07:00
Abhishek Kona
02c459805b Ignore a zero-sized file while looking for a seq-no in GetUpdatesSince
Summary:
Rocksdb can create 0 sized log files when it is opened and closed without any operations.
The GetUpdatesSince fails currently if there is a log file of size zero.

This diff fixes this. If there is a log file is 0, it is removed form the probable_file_list

Test Plan: unit test

Reviewers: dhruba, heyongqiang

Reviewed By: heyongqiang

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9507
2013-03-19 11:00:09 -07:00
Abhishek Kona
7b9db9c98e DO not report level size as zero when there are no files in L0
Summary:
Instead of checking for number of files in L0. Check for number of files in the requested level.

Bug introduced in D4929 (diff trying to do too many things).

Test Plan: db_test.

Reviewers: dhruba, MarkCallaghan

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D9483
2013-03-18 12:04:38 -07:00
Mayank Agarwal
f04cc368f7 Fixing a careless mistake in ldb
Summary: negation of the condition checked currently had to be checkd actually

Test Plan: make ldb; python ldb_test.py

Reviewers: sheki, dhruba

Reviewed By: sheki

Differential Revision: https://reviews.facebook.net/D9459
2013-03-15 13:59:11 -07:00
Mayank Agarwal
a78fb5e8bc Doing away with boost in ldb_cmd.h
Summary: boost functions cause complications while deploying to third-party

Test Plan: make

Reviewers: sheki, dhruba

Reviewed By: sheki

Differential Revision: https://reviews.facebook.net/D9441
2013-03-14 18:16:46 -07:00
Mark Callaghan
5a8c8845a9 Enhance db_bench
Summary:
Add --benchmarks=updaterandom for read-modify-write workloads. This is different
from --benchmarks=readrandomwriterandom in a few ways. First, an "operation" is the
combined time to do the read & write rather than treating them as two ops. Second,
the same key is used for the read & write.

Change RandomGenerator to support rows larger than 1M. That was using "assert"
to fail and assert is compiled-away when -DNDEBUG is used.

Add more options to db_bench
--duration - sets the number of seconds for tests to run. When not set the
operation count continues to be the limit. This is used by random operation
tests.

--use_snapshot - when set GetSnapshot() is called prior to each random read.
This is to measure the overhead from using snapshots.

--get_approx - when set GetApproximateSizes() is called prior to each random
read. This is to measure the overhead for a query optimizer.

Task ID: #

Blame Rev:

Test Plan:
run db_bench

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D9267
2013-03-14 16:00:23 -07:00
Mayank Agarwal
e93dc3c0b8 Updating fbcode.gcc471.sh to use jemalloc 3.3.1
Summary: Updated TOOL_CHAIN_LIB_BASE to use the third-party version for jemalloc-3.3.1 which contains a bug fix in quarantine.cc. This was detected while debugging valgrind issues with the rocksdb table_test

Test Plan: make table_test;valgrind --leak-check=full ./table_test

Reviewers: dhruba, sheki, vamsi

Reviewed By: sheki

Differential Revision: https://reviews.facebook.net/D9387
2013-03-13 15:34:50 -07:00
Abhishek Kona
1ba5abca97 Use posix_fallocate as default.
Summary:
Ftruncate does not throw an error on disk-full. This causes Sig-bus in
the case where the database tries to issue a Put call on a full-disk.

Use posix_fallocate for allocation instead of truncate.
Add a check to use MMaped files only on ext4, xfs and tempfs, as
posix_fallocate is very slow on ext3 and older.

Test Plan: make all check

Reviewers: dhruba, chip

Reviewed By: dhruba

CC: adsharma, leveldb

Differential Revision: https://reviews.facebook.net/D9291
2013-03-13 13:50:26 -07:00
Mayank Agarwal
4e581c6ab4 Fix ldb_test.py to hide garbage from std output
Summary: ldb_test.py did a lot of assertFalse checks and displayed all the failed messages on the std output making it confusing to tell a successful from a failed run. Also many empty lines used to be needlessly printed. Also added some progression-"feel-good" lines in the tests

Test Plan: python ldb_test.py

Reviewers: dhruba, sheki, dilipj, chip

Reviewed By: dilipj

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9297
2013-03-12 21:07:07 -07:00
Mayank Agarwal
5b278b53ae Fix valgrind errors in rocksdb tests: auto_roll_logger_test, reduce_levels_test
Summary: Fix for memory leaks in rocksdb tests. Also modified the variable NUM_FAILED_TESTS to print the actual number of failed tests.

Test Plan: make <test>; valgrind --leak-check=full ./<test>

Reviewers: sheki, dhruba

Reviewed By: sheki

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9333
2013-03-12 16:03:16 -07:00
Dhruba Borthakur
ebf16f57c9 Prevent segfault because SizeUnderCompaction was called without any locks.
Summary:
SizeBeingCompacted was called without any lock protection. This causes
crashes, especially when running db_bench with value_size=128K.
The fix is to compute SizeUnderCompaction while holding the mutex and
passing in these values into the call to Finalize.

(gdb) where
#4  leveldb::VersionSet::SizeBeingCompacted (this=this@entry=0x7f0b490931c0, level=level@entry=4) at db/version_set.cc:1827
#5  0x000000000043a3c8 in leveldb::VersionSet::Finalize (this=this@entry=0x7f0b490931c0, v=v@entry=0x7f0b3b86b480) at db/version_set.cc:1420
#6  0x00000000004418d1 in leveldb::VersionSet::LogAndApply (this=0x7f0b490931c0, edit=0x7f0b3dc8c200, mu=0x7f0b490835b0, new_descriptor_log=<optimized out>) at db/version_set.cc:1016
#7  0x00000000004222b2 in leveldb::DBImpl::InstallCompactionResults (this=this@entry=0x7f0b49083400, compact=compact@entry=0x7f0b2b8330f0) at db/db_impl.cc:1473
#8  0x0000000000426027 in leveldb::DBImpl::DoCompactionWork (this=this@entry=0x7f0b49083400, compact=compact@entry=0x7f0b2b8330f0) at db/db_impl.cc:1757
#9  0x0000000000426690 in leveldb::DBImpl::BackgroundCompaction (this=this@entry=0x7f0b49083400, madeProgress=madeProgress@entry=0x7f0b41bf2d1e, deletion_state=...) at db/db_impl.cc:1268
#10 0x0000000000428f42 in leveldb::DBImpl::BackgroundCall (this=0x7f0b49083400) at db/db_impl.cc:1170
#11 0x000000000045348e in BGThread (this=0x7f0b49023100) at util/env_posix.cc:941
#12 leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper (arg=0x7f0b49023100) at util/env_posix.cc:874
#13 0x00007f0b4a7cf10d in start_thread (arg=0x7f0b41bf3700) at pthread_create.c:301
#14 0x00007f0b49b4b11d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Test Plan:
make check

I am running db_bench with a value size of 128K to see if the segfault is fixed.

Reviewers: MarkCallaghan, sheki, emayanke

Reviewed By: sheki

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9279
2013-03-11 14:09:01 -07:00
Dhruba Borthakur
c04c956b66 Make the build-time show up in the leveldb library.
Summary:
This is a regression caused by
772f75b3fb

If you do "strings libleveldb.a | grep leveldb_build_git_datetime" it will
show you the time when the binary was built.

Test Plan: make check

Reviewers: emayanke

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9273
2013-03-11 10:33:15 -07:00
Vamsi Ponnekanti
8ade935971 [Report the #gets and #founds in db_stress]
Summary:
Also added some comments and fixed some bugs in
stats reporting. Now the stats seem to match what is expected.

Test Plan:
[nponnekanti@dev902 /data/users/nponnekanti/rocksdb] ./db_stress --test_batches_snapshots=1 --ops_per_thread=1000 --threads=1 --max_key=320
LevelDB version     : 1.5
Number of threads   : 1
Ops per thread      : 1000
Read percentage     : 10
Delete percentage   : 30
Max key             : 320
Ratio #ops/#keys    : 3
Num times DB reopens: 10
Batches/snapshots   : 1
Num keys per lock   : 4
Compression         : snappy
------------------------------------------------
No lock creation because test_batches_snapshots set
2013/03/04-15:58:56  Starting database operations
2013/03/04-15:58:56  Reopening database for the 1th time
2013/03/04-15:58:56  Reopening database for the 2th time
2013/03/04-15:58:56  Reopening database for the 3th time
2013/03/04-15:58:56  Reopening database for the 4th time
Created bg thread 0x7f4542bff700
2013/03/04-15:58:56  Reopening database for the 5th time
2013/03/04-15:58:56  Reopening database for the 6th time
2013/03/04-15:58:56  Reopening database for the 7th time
2013/03/04-15:58:57  Reopening database for the 8th time
2013/03/04-15:58:57  Reopening database for the 9th time
2013/03/04-15:58:57  Reopening database for the 10th time
2013/03/04-15:58:57  Reopening database for the 11th time
2013/03/04-15:58:57  Limited verification already done during gets
Stress Test : 1811.551 micros/op 552 ops/sec
            : Wrote 0.10 MB (0.05 MB/sec) (598% of 1011 ops)
            : Wrote 6050 times
            : Deleted 3050 times
            : 500/900 gets found the key
            : Got errors 0 times

[nponnekanti@dev902 /data/users/nponnekanti/rocksdb] ./db_stress --ops_per_thread=1000 --threads=1 --max_key=320
LevelDB version     : 1.5
Number of threads   : 1
Ops per thread      : 1000
Read percentage     : 10
Delete percentage   : 30
Max key             : 320
Ratio #ops/#keys    : 3
Num times DB reopens: 10
Batches/snapshots   : 0
Num keys per lock   : 4
Compression         : snappy
------------------------------------------------
Creating 80 locks
2013/03/04-15:58:17  Starting database operations
2013/03/04-15:58:17  Reopening database for the 1th time
2013/03/04-15:58:17  Reopening database for the 2th time
2013/03/04-15:58:17  Reopening database for the 3th time
2013/03/04-15:58:17  Reopening database for the 4th time
Created bg thread 0x7fc0f5bff700
2013/03/04-15:58:17  Reopening database for the 5th time
2013/03/04-15:58:17  Reopening database for the 6th time
2013/03/04-15:58:18  Reopening database for the 7th time
2013/03/04-15:58:18  Reopening database for the 8th time
2013/03/04-15:58:18  Reopening database for the 9th time
2013/03/04-15:58:18  Reopening database for the 10th time
2013/03/04-15:58:18  Reopening database for the 11th time
2013/03/04-15:58:18  Starting verification
Stress Test : 1836.258 micros/op 544 ops/sec
            : Wrote 0.01 MB (0.01 MB/sec) (59% of 1011 ops)
            : Wrote 605 times
            : Deleted 305 times
            : 50/90 gets found the key
            : Got errors 0 times
2013/03/04-15:58:18  Verification successful

Revert Plan: OK

Task ID: #

Reviewers: emayanke, dhruba

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9081
2013-03-10 21:57:00 -07:00
amayank
3eed5c9c01 Putting option -std=gnu++0x in Makefile rather than fbcode.gcc471.sh
Summary: This option is needed for compilation and the open-sourced rocksdb version wiull need to get it from Makefile

Test Plan: make clean;make

Reviewers: MarkCallaghan, dhruba, sheki, chip

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9243
2013-03-08 11:56:18 -08:00
amayank
9e1c89cde8 Moving VALGRIND_VER which takes the valgrind version from third party to fbcode.gcc471.sh file
Summary:
the valgrind version being used is in facebook specific path and should be moved to the fbcode.gcc471.sh file instead of the makefile.
The execution takes the environment's default valgrind version if the fbcode.gcc471.sh's valgrind_version is not available.

Test Plan: make valgrind_check

Reviewers: dhruba, sheki, akushner

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D9213
2013-03-08 11:51:26 -08:00
Dhruba Borthakur
469724be7f Add appropriate parameters to make bulk-load go faster.
Summary:
1. Create only 2 levels so that manual compactions are fast.
2. Set target file size to a large value

Test Plan: make clean check

Reviewers: kailiu, zshao

Reviewed By: zshao

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9231
2013-03-08 10:52:16 -08:00
amayank
3b6653b1f8 Make db_stress Not purge redundant keys on some opens
Summary: In light of the new option introduced by commit 806e264350 where the database has an option to compact before flushing to disk, we want the stress test to test both sides of the option. Have made it to 'deterministically' and configurably change that option for reopens.

Test Plan: make db_stress; ./db_stress with some differnet options

Reviewers: dhruba, vamsi

Reviewed By: dhruba

CC: leveldb, sheki

Differential Revision: https://reviews.facebook.net/D9165
2013-03-08 04:55:07 -08:00
Dhruba Borthakur
6d812b6afb A mechanism to detect manifest file write errors and put db in readonly mode.
Summary:
If there is an error while writing an edit to the manifest file, the manifest
file is closed and reopened to check if the edit made it in. However, if the
re-opening of the manifest is unsuccessful and options.paranoid_checks is set
t true, then the db refuses to accept new puts, effectively putting the db
in readonly mode.

In a future diff, I would like to make the default value of paranoid_check
to true.

Test Plan: make check

Reviewers: sheki

Reviewed By: sheki

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9201
2013-03-07 09:45:49 -08:00
amayank
3b87e2bd2a Use version 3.8.1 for valgrind in third_party and do away with log files
Summary:
valgrind 3.7.0 used currently has a bug that needs LD_PRELOAD being set as a workaround. This caused problems when run on jenkins. 3.8.1 has fixed this issue and we should use it from third party
Also, have done away with log files. The whole output will be there on the terminal and the failed tests will be listed at the end. This is done because jenkins only lets us download the different files and not view them in the browser which is undesirable.

Test Plan: make valgrind_check

Reviewers: akushner, dhruba, vamsi, sheki, heyongqiang

Reviewed By: sheki

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9171
2013-03-06 17:47:31 -08:00
Abhishek Kona
d68880a1b9 Do not allow Transaction Log Iterator to fall ahead when writer is writing the same file
Summary:
Store the last flushed, seq no. in db_impl. Check against it in
transaction Log iterator. Do not attempt to read ahead if we do not know
if the data is flushed completely.
Does not work if flush is disabled. Any ideas on fixing that?
* Minor change, iter->Next is called the first time automatically for
* the first time.

Test Plan:
existing test pass.
More ideas on testing this?
Planning to run some stress test.

Reviewers: dhruba, heyongqiang

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9087
2013-03-06 14:05:53 -08:00
Dhruba Borthakur
afed60938f Fox db_stress crash by copying keys before changing sequencenum to zero.
Summary:
The compaction process zeros out sequence numbers if the output is
part of the bottommost level.
The Slice is supposed to refer to an immutable data buffer. The
merger that implements the priority queue while reading kvs as
the input of a compaction run reies on this fact. The bug was that
were updating the sequence number of a record in-place and that was
causing suceeding invocations of the merger to return kvs in
arbitrary order of sequence numbers.
The fix is to copy the key to a local memory buffer before setting
its seqno to 0.

Test Plan:
Set Options.purge_redundant_kvs_while_flush = false and then run
db_stress --ops_per_thread=1000 --max_key=320

Reviewers: emayanke, sheki

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9147
2013-03-06 10:52:08 -08:00
Abhishek Kona
2fb47d661a add -j nproc in valgrind check script 2013-03-05 17:38:13 -08:00
Abhishek Kona
760d511ece Fix Bash Script to run valgrind.
Test Plan: bash -n

Reviewers: emayanke

Differential Revision: https://reviews.facebook.net/D9129
2013-03-05 16:50:40 -08:00
Dhruba Borthakur
e7b726da08 Downgrade optimization level from -O3 to -O2.
Summary:
When we use -O3, the gcc 4.7.1 compiler generates 'pinsrd' which is
not supported on machines with "vendor_id       : AuthenticAMD".

Previous release of rocksdb used -O2.
Optimization -O2 was introduced at
772f75b3fb

Test Plan: make check

Reviewers: chip, heyongqiang, sheki

Reviewed By: sheki

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9093
2013-03-05 10:54:02 -08:00
Zheng Shao
7b43500794 [RocksDB] Add bulk_load option to Options and ldb
Summary:
Add a shortcut function to make it easier for people
to efficiently bulk_load data into RocksDB.

Test Plan:
Tried ldb with "--bulk_load" and "--bulk_load --compact" and verified the outcome.
Needs to consult the team on how to test this automatically.

Reviewers: sheki, dhruba, emayanke, heyongqiang

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8907
2013-03-05 00:34:53 -08:00
Dhruba Borthakur
f5896681b4 Removed unnecesary file object in table_cache.
Summary:
TableCache->file is not used. remove it.
I kept the TableAndFile structure and will clean it up in a future patch.

Test Plan: make clean check

Reviewers: sheki, chip

Reviewed By: chip

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9075
2013-03-04 13:56:23 -08:00
Mark Callaghan
993543d1be Add rate_delay_limit_milliseconds
Summary:
This adds the rate_delay_limit_milliseconds option to make the delay
configurable in MakeRoomForWrite when the max compaction score is too high.
This delay is called the Ln slowdown. This change also counts the Ln slowdown
per level to make it possible to see where the stalls occur.

From IO-bound performance testing, the Level N stalls occur:
* with compression -> at the largest uncompressed level. This makes sense
                      because compaction for compressed levels is much
                      slower. When Lx is uncompressed and Lx+1 is compressed
                      then files pile up at Lx because the (Lx,Lx+1)->Lx+1
                      compaction process is the first to be slowed by
                      compression.
* without compression -> at level 1

Task ID: #1832108

Blame Rev:

Test Plan:
run with real data, added test

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D9045
2013-03-04 07:41:15 -08:00
Dhruba Borthakur
806e264350 Ability for rocksdb to compact when flushing the in-memory memtable to a file in L0.
Summary:
Rocks accumulates recent writes and deletes in the in-memory memtable.
When the memtable is full, it writes the contents on the memtable to
a file in L0.

This patch removes redundant records at the time of the flush. If there
are multiple versions of the same key in the memtable, then only the
most recent one is dumped into the output file. The purging of
redundant records occur only if the most recent snapshot is earlier
than the earliest record in the memtable.

Should we switch on this feature by default or should we keep this feature
turned off in the default settings?

Test Plan: Added test case to db_test.cc

Reviewers: sheki, vamsi, emayanke, heyongqiang

Reviewed By: sheki

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8991
2013-03-04 00:01:47 -08:00
bil
4992633751 enable the ability to set key size in db_bench in rocksdb
Summary:
1. the default value for key size is still 16
2. enable the ability to set the key size via command line --key_size=

Test Plan:
build & run db_banch and pass some value via command line.
verify it works correctly.

Reviewers: sheki

Reviewed By: sheki

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8943
2013-03-01 14:10:09 -08:00
amayank
ec96ad5405 Automating valgrind to run with jenkins
Summary:
The script valgrind_test.sh runs Valgrind for all tests in the makefile
including leak-checks and outputs the logs for every test in a separate file
with the name "valgrind_log_<testname>". It prints the failed tests in the file
"valgrind_failed_tests". All these files are created in the directory
"VALGRIND_LOGS" which can be changed in the Makefile.
Finally it checks the line-count for the file "valgrind_failed_tests"
and returns 0 if no tests failed and 1 otherwise.

Test Plan: ./valgrind_test.sh; Changed the tests to incorporte leaks and verified correctness

Reviewers: dhruba, sheki, MarkCallaghan

Reviewed By: sheki

CC: zshao

Differential Revision: https://reviews.facebook.net/D8877
2013-03-01 11:44:40 -08:00
Abhishek Kona
c41f1e995c Codemod NULL to nullptr
Summary:
scripted NULL to nullptr in
* include/leveldb/
* db/
* table/
* util/

Test Plan: make all check

Reviewers: dhruba, emayanke

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D9003
2013-02-28 18:04:58 -08:00
Dhruba Borthakur
e45c7a8444 Abilty to support upto a million .sst files in the database
Summary:
There was an artifical limit of 50K files per database. This is
insifficient if the database is 1 TB in size and each file is 2 MB.

Test Plan: make check

Reviewers: sheki, emayanke

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8919
2013-02-26 16:27:51 -08:00
Abhishek Kona
a9866b721b Refactor statistics. Remove individual functions like incNumFileOpens
Summary:
Use only the counter mechanism. Do away with
incNumFileOpens, incNumFileClose, incNumFileErrors
s/NULL/nullptr/g in db/table_cache.cc

Test Plan: make clean check

Reviewers: dhruba, heyongqiang, emayanke

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8841
2013-02-25 13:58:34 -08:00
Vamsi Ponnekanti
465b9103f8 [Add a second kind of verification to db_stress
Summary:
Currently the test tracks all writes in memory and
uses it for verification at the end. This has 4 problems:
(a) It needs mutex for each write to ensure in-memory update
and leveldb update are done atomically. This slows down the
benchmark.
(b) Verification phase at the end is time consuming as well
(c) Does not test batch writes or snapshots
(d) We cannot kill the test and restart multiple times in a
loop because in-memory state will be lost.

I am adding a FLAGS_multi that does MultiGet/MultiPut/MultiDelete
instead of get/put/delete to get/put/delete a group of related
keys with same values atomically. Every get retrieves the group
of keys and checks that their values are same. This does not have
the above problems but the downside is that it does less amount
of validation than the other approach.

Test Plan:
This whole this is a test! Here is a small run. I am doing larger run now.

[nponnekanti@dev902 /data/users/nponnekanti/rocksdb] ./db_stress --ops_per_thread=10000 --multi=1 --ops_per_key=25
LevelDB version     : 1.5
Number of threads   : 32
Ops per thread      : 10000
Read percentage     : 10
Delete percentage   : 30
Max key             : 2147483648
Num times DB reopens: 10
Num keys per lock   : 4
Compression         : snappy
------------------------------------------------
Creating 536870912 locks
2013/02/20-16:59:32  Starting database operations
Created bg thread 0x7f9ebcfff700
2013/02/20-16:59:37  Reopening database for the 1th time
2013/02/20-16:59:46  Reopening database for the 2th time
2013/02/20-16:59:57  Reopening database for the 3th time
2013/02/20-17:00:11  Reopening database for the 4th time
2013/02/20-17:00:25  Reopening database for the 5th time
2013/02/20-17:00:36  Reopening database for the 6th time
2013/02/20-17:00:47  Reopening database for the 7th time
2013/02/20-17:00:59  Reopening database for the 8th time
2013/02/20-17:01:10  Reopening database for the 9th time
2013/02/20-17:01:20  Reopening database for the 10th time
2013/02/20-17:01:31  Reopening database for the 11th time
2013/02/20-17:01:31  Starting verification
Stress Test : 109.125 micros/op 22191 ops/sec
            : Wrote 0.00 MB (0.23 MB/sec) (59% of 32 ops)
            : Deleted 10 times
2013/02/20-17:01:31  Verification successful

Revert Plan: OK

Task ID: #

Reviewers: dhruba, emayanke

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8733
2013-02-22 12:20:11 -08:00