Commit Graph

1157 Commits

Dhruba Borthakur
30a700657d Fix corruption_test failure caused by auto-enablement of checksum verification.
Summary:
Patch https://reviews.facebook.net/D15591 enabled checksum
verification by default. This caused the unit test to fail.

Test Plan: ./corruption_test

Reviewers: igor, kailiu

Reviewed By: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15795
2014-01-31 17:16:38 -08:00
Igor Canadi
dbbffbd772 Mark the log_number file number used
Summary:
VersionSet::next_file_number_ is always assumed to be strictly greater than VersionSet::log_number_. In our new recovery code, we artificially set log_number_ to be (log_number + 1), so that once we flush, we don't recover from the same log file again (this is important because of merge operator non-idempotence).

When we set VersionSet::log_number_ to (log_number + 1), we also have to mark that file number used, such that next_file_number_ is increased to a legal level. Otherwise, VersionSet might assert.

This has not been a problem so far because of what happens today:
1. Assume next_file_number is 5 and we're recovering log_number 10.
2. In DBImpl::Recover() we call MarkFileNumberUsed with 10. This sets VersionSet::next_file_number_ to 11.
3. If there are some updates, we call WriteLevel0TableForRecovery(), which uses file number 11 as a new table file and advances VersionSet::next_file_number_ to 12.
4. When we LogAndApply() with log_number 11, the assertion holds: assert(11 <= 12);

However, this was a lucky occurrence. Even though the current behavior doesn't cause a bug, I think the issue is important to fix.
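
A minimal sketch of the idea as a toy model (illustrative names apart from MarkFileNumberUsed(), which the commit text names):

    #include <cassert>
    #include <cstdint>

    // Toy model of the bookkeeping described above (not the real VersionSet):
    // marking a file number used bumps next_file_number_ past it, which is what
    // the recovery path needs after setting log_number_ = log_number + 1.
    struct VersionSetSketch {
      uint64_t next_file_number_ = 2;
      void MarkFileNumberUsed(uint64_t number) {
        if (next_file_number_ <= number) next_file_number_ = number + 1;
      }
    };

    int main() {
      VersionSetSketch vs;
      uint64_t log_number = 10;
      vs.MarkFileNumberUsed(log_number + 1);  // the fix: reserve log_number + 1
      // Models the LogAndApply() precondition: the recorded log number must
      // stay below the next file number handed out.
      assert(log_number + 1 < vs.next_file_number_);
      return 0;
    }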

Test Plan: In the column families branch I have different recovery logic, and this code path asserted. After adding MarkFileNumberUsed(log_number + 1), the assert is gone.

Reviewers: dhruba, kailiu

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15783
2014-01-31 14:43:16 -08:00
Siying Dong
56bea9f80d When using Universal Compaction, Zero out seqID in the last file too
Summary: I couldn't figure out why zeroing out earlier sequence IDs is disabled in universal compaction. I do see bottommost_level set correctly, so it should simply work if we remove the universal-compaction constraint.

Test Plan: make all check

Reviewers: haobo, dhruba, kailiu, igor

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15423
2014-01-31 09:59:52 -08:00
kailiu
4e0298f23c Clean up arena API
Summary:
The easy part goes first. This patch moves arena to the internal directory; the
coming patch will deal with memtable_rep on top of it.

Test Plan: make check

Reviewers: haobo, sdong, dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15615
2014-01-30 22:10:10 -08:00
Dhruba Borthakur
abd70ecc2b The default settings enable checksum verification on every read.
Summary: The default settings enable checksum verification on every read.

Test Plan: make check

Reviewers: haobo

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15591
2014-01-30 19:14:03 -08:00
Igor Canadi
ac92420fc5 Merge branch 'master' into performance
Conflicts:
	db/db_impl.h
2014-01-30 10:09:23 -08:00
Yueh-Hsuan Chiang
a46ac92138 Allow command line tool sst-dump to display table properties.
Summary:
Add option '--show_properties' to sst_dump tool to allow displaying
property block of the specified files.

Test Plan:
Run sst_dump with the following arguments, which covers cases affected by
this diff:

  1. with only --file
  2. with both --file and --show_properties
  3. with --file, --show_properties, and --from

Reviewers: kailiu, xjin

Differential Revision: https://reviews.facebook.net/D15453
2014-01-30 01:50:34 -08:00
Igor Canadi
3c0dcf0e25 InternalStatistics
Summary:
In DBImpl we keep track of some statistics internally and expose them via GetProperty(). This diff encapsulates all the internal statistics into a class InternalStatistics. Most of it is copy/paste.

Apart from cleaning up db_impl.cc, this diff is also necessary for column families, since every column family should have its own CompactionStats, MakeRoomForWrite-stall stats, etc. It's much easier to keep track of these per column family if they're nicely encapsulated in their own class.
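
As a rough illustration (hypothetical names, not the actual InternalStatistics interface), the encapsulation groups the per-column-family counters into one object instead of loose members on DBImpl:

    #include <cstdint>
    #include <vector>

    // Hypothetical sketch: one stats object per column family, owning the
    // per-level compaction timings and stall timings that used to live
    // directly in DBImpl.
    class InternalStatsSketch {
     public:
      explicit InternalStatsSketch(int num_levels)
          : compaction_micros_(num_levels, 0) {}

      void AddCompactionTime(int level, uint64_t micros) {
        compaction_micros_[level] += micros;
      }
      void AddStallTime(uint64_t micros) { stall_micros_ += micros; }

     private:
      std::vector<uint64_t> compaction_micros_;  // indexed by level
      uint64_t stall_micros_ = 0;
    };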

Test Plan: make check

Reviewers: dhruba, kailiu, haobo, sdong, emayanke

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15273
2014-01-29 20:40:41 -08:00
kailiu
3170abd297 Remove unused classes
Summary: This is a follow-up diff for https://reviews.facebook.net/D15447, which tackles the simplest task: delete some unused memtable reps.

Test Plan: make

Reviewers: haobo, dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15585
2014-01-29 16:40:36 -08:00
Lei Jin
d118707f8d set bg_error_ when background flush goes wrong
Summary: as title

Test Plan: unit test

Reviewers: haobo, igor, sdong, kailiu, dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15435
2014-01-29 15:55:58 -08:00
Lei Jin
fb84c49a36 convert Tickers back to array with padding and alignment
Summary:
Pad each Ticker structure to be 64 bytes and make them align on 64 bytes
boundary to avoid cache line false sharing issue.
Please refer to task 3615553 for more details
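
A small sketch of the idea (not the exact RocksDB layout): each ticker occupies its own 64-byte cache line, so threads bumping different tickers never share a line.

    #include <atomic>
    #include <cstdint>

    // Pad and align each counter to a 64-byte cache line to avoid false
    // sharing between concurrently updated tickers.
    struct alignas(64) PaddedTicker {
      std::atomic<uint64_t> value{0};
      char padding[64 - sizeof(std::atomic<uint64_t>)];
    };
    static_assert(sizeof(PaddedTicker) == 64, "one ticker per cache line");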

Test Plan:
db_bench

LevelDB:    version 2.0s
Date:       Wed Jan 29 12:23:17 2014
CPU:        32 * Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
CPUCache:   20480 KB
rocksdb.build.overwrite.qps 49638
rocksdb.build.overwrite.p50_micros 58.73
rocksdb.build.overwrite.p75_micros 210.56
rocksdb.build.overwrite.p99_micros 733.28
rocksdb.build.fillseq.qps 366729
rocksdb.build.fillseq.p50_micros 1.00
rocksdb.build.fillseq.p75_micros 1.00
rocksdb.build.fillseq.p99_micros 2.65
rocksdb.build.readrandom.qps 1152995
rocksdb.build.readrandom.p50_micros 11.27
rocksdb.build.readrandom.p75_micros 15.69
rocksdb.build.readrandom.p99_micros 33.59
rocksdb.build.readrandom_smallblockcache.qps 956047
rocksdb.build.readrandom_smallblockcache.p50_micros 15.23
rocksdb.build.readrandom_smallblockcache.p75_micros 17.31
rocksdb.build.readrandom_smallblockcache.p99_micros 31.49
rocksdb.build.readrandom_memtable_sst.qps 1105183
rocksdb.build.readrandom_memtable_sst.p50_micros 12.04
rocksdb.build.readrandom_memtable_sst.p75_micros 15.78
rocksdb.build.readrandom_memtable_sst.p99_micros 32.49
rocksdb.build.readrandom_fillunique_random.qps 487856
rocksdb.build.readrandom_fillunique_random.p50_micros 29.65
rocksdb.build.readrandom_fillunique_random.p75_micros 40.93
rocksdb.build.readrandom_fillunique_random.p99_micros 78.68
rocksdb.build.memtablefillrandom.qps 91304
rocksdb.build.memtablefillrandom.p50_micros 171.05
rocksdb.build.memtablefillrandom.p75_micros 196.12
rocksdb.build.memtablefillrandom.p99_micros 291.73
rocksdb.build.memtablereadrandom.qps 1340411
rocksdb.build.memtablereadrandom.p50_micros 9.48
rocksdb.build.memtablereadrandom.p75_micros 13.95
rocksdb.build.memtablereadrandom.p99_micros 30.36
rocksdb.build.readwhilewriting.qps 491004
rocksdb.build.readwhilewriting.p50_micros 29.58
rocksdb.build.readwhilewriting.p75_micros 40.34
rocksdb.build.readwhilewriting.p99_micros 76.78

Reviewers: igor, haobo

Reviewed By: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15573
2014-01-29 15:08:41 -08:00
Kai Liu
b7db241118 LIBNAME in Makefile is not really configurable
Summary:
With the new third-party release tool, `LIBNAME=<customized_library> make`
does not really change the LIBNAME.

Oddly, the same approach works with the old third-party release tools:
I checked a previous rocksdb version and both librocksdb.a
and librocksdb_debug.a were correctly generated and copied to the
right place.

Test Plan:
`LIBNAME=hello make -j32` generates hello.a
`make -j32` generates librocksdb.a

Reviewers: igor, sdong, haobo, dhruba

Reviewed By: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15555
2014-01-29 11:35:05 -08:00
kailiu
b1874af8bc Canonicalize "RocksDB" in make_new_version.sh
Summary: Change all occurrences of "rocksdb" to its canonical form "RocksDB".

Test Plan: N/A

Reviewers: igor

Reviewed By: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15549
2014-01-29 10:19:34 -08:00
kailiu
c9eef784b7 Improve make_new_version.sh 2014-01-29 10:04:53 -08:00
Igor Canadi
9a597dc696 Installation instructions for CentOS 2014-01-29 08:43:11 -08:00
Igor Canadi
e57f0cc1a1 add include <atomic> to version_set.h 2014-01-29 08:17:43 -08:00
kailiu
9fe60d50ff Add history log and revise script
Summary:
* Add a change log for rocksdb releases.
* Remove the hacky parts of make_new_version.sh, which are either
  no longer useful or will be done in our dedicated 3rd-party release
  tool.

Test Plan: N/A

Reviewers: igor, haobo, sdong, dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15543
2014-01-28 20:24:41 -08:00
Lei Jin
9a126ba3b3 only corrupt private file checksum in backupable_db_test
Summary:
If the test happens (randomly) to corrupt a shared file, the checksum
    becomes inconsistent between the meta files of different backups.
    BackupEngine then detects this issue and fails. In reality this does not
    happen, since the checksum is checked on every backup. So here we only
    corrupt the checksum of a private file, letting BackupEngine construct
    properly (but fail during restore).

Test Plan: run test with valgrind

Reviewers: igor

Reviewed By: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15531
2014-01-28 16:03:55 -08:00
Igor Canadi
5d2c62822e Only get the manifest file size if there is no error
Summary:
I came across this while working on column families. CorruptionTest::RecoverWriteError threw a SIGSEGV because descriptor_log_->file() was nullptr. I'm not sure why it doesn't happen in master, but better safe than sorry.

@kailiu, can we get this in release, too?

Test Plan: make check

Reviewers: kailiu, dhruba, haobo

Reviewed By: haobo

CC: leveldb, kailiu

Differential Revision: https://reviews.facebook.net/D15513
2014-01-28 16:02:51 -08:00
Igor Canadi
e5ec7384a0 Better interface to create BackupEngine
Summary: I think it looks nicer. In RocksDB we have both styles, but I think the static method is the more common one.

Test Plan: backupable_db_test

Reviewers: ljin, benj, swk

Reviewed By: ljin

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15519
2014-01-28 16:01:53 -08:00
Igor Canadi
ec2fa4a690 Export BackupEngine
Summary:
Lots of clients have problems with using the StackableDB interface. It's nice to have BackupableDB as a layer on top of DB, but it's not necessary.

This diff exports BackupEngine, which can be used to create backups without forcing clients to use the StackableDB interface.
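
A hedged usage sketch of the exported engine (the exact factory name and header path have varied across RocksDB versions, so treat those identifiers as assumptions; CreateNewBackup() takes the DB directly, with no StackableDB wrapper):

    #include <memory>

    #include "rocksdb/db.h"
    #include "rocksdb/env.h"
    #include "utilities/backupable_db.h"  // header location assumed for this era

    void BackupOnce(rocksdb::DB* db) {
      rocksdb::BackupableDBOptions options("/tmp/rocksdb_backup");
      // Static factory in the style these diffs introduce; exact name may differ.
      std::unique_ptr<rocksdb::BackupEngine> engine(
          rocksdb::BackupEngine::NewBackupEngine(rocksdb::Env::Default(), options));
      engine->CreateNewBackup(db);
    }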

Test Plan: backupable_db_test

Reviewers: dhruba, ljin, swk

Reviewed By: ljin

CC: leveldb, benj

Differential Revision: https://reviews.facebook.net/D15477
2014-01-28 11:26:07 -08:00
kailiu
a5e220f5ef Merge branch 'master' into performance
Conflicts:
	Makefile
	db/db_impl.cc
	db/db_test.cc
	db/memtable_list.cc
	db/memtable_list.h
	table/block_based_table_reader.cc
	table/table_test.cc
	util/cache.cc
	util/coding.cc
2014-01-28 10:35:55 -08:00
Lei Jin
9dc29414e3 add checksum for backup files
Summary: Keep the checksum of each backed-up file in the meta file. When restoring these files, compute their checksums on the fly and compare against what is stored in the meta file. Fail the restore process on a checksum mismatch.

Test Plan: unit test

Reviewers: haobo, igor, sdong, kailiu

Reviewed By: igor

CC: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D15381
2014-01-28 09:43:36 -08:00
Mark Callaghan
90f29ccbef Update monitoring to include average time per compaction and stall
Summary:
The new columns are msComp and msStall, which report the average time per compaction and per stall for that level in milliseconds.
Level  Files Size(MB) Score Time(sec)  Read(MB) Write(MB)    Rn(MB)  Rnp1(MB)  Wnew(MB) RW-Amplify Read(MB/s) Write(MB/s)      Rn     Rnp1     Wnp1     NewW    Count   msComp   msStall  Ln-stall Stall-cnt
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  0        8       15   1.5         2         0        30         0         0        30        0.0       0.0        15.5        0        0        0        0       16      112       0.2       1.3      7568
  1        8       16   1.6         1        26        26        15        11        16        3.5      17.6        18.1        8        6       13        7        3      362       0.0       0.0         0
  2        1        2   0.0         0         0         2         0         0         2        0.0       0.0        18.4        0        0        0        0        1       50       0.0       0.0         0

Test Plan:
run db_bench

Reviewers: haobo

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15345
2014-01-27 15:06:04 -08:00
Schalk-Willem Kruger
3d33da75ef Fix UnmarkEOF for partial blocks
Summary:
Blocks in the transaction log are a fixed size, but the last block in the transaction log file is usually a partial block. When a new record is added after the reader hit the end of the file, a new physical record will be appended to the last block. ReadPhysicalRecord can only read full blocks and assumes that the file position indicator is aligned to the start of a block. If the reader is forced to read further by simply clearing the EOF flag, ReadPhysicalRecord will read a full block starting from somewhere in the middle of a real block, causing it to lose alignment and to have a partial physical record at the end of the read buffer. This will result in length mismatches and checksum failures. When the log file is tailed for replication this will cause the log iterator to become invalid, necessitating the creation of a new iterator which will have to read the log file from scratch.

This diff fixes this issue by reading the remaining portion of the last block we read from. This is done when the reader is forced to read further (UnmarkEOF is called).

Test Plan:
- Added unit tests
- Stress test (with replication). Check dbdir/LOG file for corruptions.
- Test on test tier

Reviewers: emayanke, haobo, dhruba

Reviewed By: haobo

CC: vamsi, sheki, dhruba, kailiu, igor

Differential Revision: https://reviews.facebook.net/D15249
2014-01-27 14:49:10 -08:00
Igor Canadi
832158e7f7 Fsync directory after we create a new file
Summary:
@dhruba, I'm not sure where we need to sync the directory. I implemented the function in Env and added the directory sync just after we close the newly created file in the builder.

Should I also add FsyncDir() to new files that get created by a compaction?
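
For reference, a minimal POSIX sketch of the directory-sync step itself (illustrative; not the actual Env implementation added here):

    #include <fcntl.h>
    #include <unistd.h>

    // Fsync a directory so a newly created file's directory entry is durable.
    // Returns 0 on success, -1 on error (errno set by open/fsync).
    int FsyncDirSketch(const char* dir_path) {
      int fd = open(dir_path, O_RDONLY);
      if (fd < 0) return -1;
      int rc = fsync(fd);
      close(fd);
      return rc;
    }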

Test Plan: Confirmed that FsyncDir is returning Status::OK()

Reviewers: dhruba, haobo

Reviewed By: dhruba

CC: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D14751
2014-01-27 11:02:21 -08:00
Siying Dong
b20486f294 [Performance Branch] HashLinkList to avoid to convert length prefixed string back to internal keys
Summary: Converting from a length-prefixed buffer back to an internal key costs some CPU, but it is not necessary. In this patch, internal keys are passed through the functions so that we don't need to convert back.

Test Plan: make all check

Reviewers: haobo, kailiu

Reviewed By: kailiu

CC: igor, leveldb

Differential Revision: https://reviews.facebook.net/D15393
2014-01-27 10:26:14 -08:00
Igor Canadi
6c2ca1d3e6 Move NeedsCompaction() from VersionSet to Version
Summary: There is no reason to have functions NeedCompaction(), MaxCompactionScore() and MaxCompactionScoreLevel() in VersionSet, since they don't access any data in VersionSet.

Test Plan: make check

Reviewers: kailiu, haobo, sdong

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15333
2014-01-27 09:59:00 -08:00
Igor Canadi
e55b3c040c Fixing ref-counting memtables 2014-01-26 18:03:38 -08:00
Igor Canadi
983fafa56c Fix memory leak 2014-01-25 13:57:11 -08:00
Kai Liu
4b51dffcf8 Some refactorings on plain table
Summary:
Plain table has been working well, and this is just a nit-picking patch
that came out of my code reading. No real functional changes,
only some changes regarding:

* Improve some comments from the perspective of a "new" code reader.
* Change some magic numbers to constants, which can help us parameterize them
  in the future.
* Make some style, naming, and C++ convention changes.
* Fix warnings from the new "arc lint".

Test Plan: make check

Reviewers: sdong, haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15429
2014-01-24 21:28:10 -08:00
Igor Canadi
04afa32134 Fix reduce levels
ReduceNumberOfLevels hit a segmentation fault in WriteSnapshot() since we
didn't change the number of levels in VersionSet (we consider it
immutable from now on). This fixes the problem.
2014-01-24 18:32:38 -08:00
Siying Dong
8477255da3 Moving Some includes from options.h to forward declaration
Summary: By removing some includes from options.h and relying on forward declarations, we can more easily reason about the dependencies.
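
A tiny illustration of the pattern (hypothetical snippet, not the actual options.h change): the header only needs the type's name, so a forward declaration replaces the #include, which moves to the .cc files that dereference the pointer.

    // In the header: no #include needed, just the declaration.
    class CompactionFilter;

    struct OptionsSketch {
      const CompactionFilter* compaction_filter = nullptr;  // pointer only
    };

    // In the .cc file that actually calls into the filter:
    // #include "rocksdb/compaction_filter.h"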

Test Plan: make all check

Reviewers: kailiu, haobo, igor, dhruba

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15411
2014-01-24 17:16:22 -08:00
Igor Canadi
f653fdcf5a Fixing iterator cleanup for Tailing iterator
Immutable tailing iterator doesn't set CleanupState::mem, so we don't
have to unref it.
2014-01-24 15:51:06 -08:00
Igor Canadi
b13bdfa500 Add a call DisownData() to Cache, which should speed up shutdown
Summary: On a shutdown, freeing memory takes a long time. If we're shutting down, we don't really care about memory leaks. I added a call to Cache that will avoid freeing all objects in cache.
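
A hedged usage sketch (DisownData() is the call this diff adds; the surrounding shutdown code and ownership details are illustrative):

    #include <memory>

    #include "rocksdb/cache.h"
    #include "rocksdb/db.h"

    // On process exit we don't care about "leaking" cached blocks, so skip
    // freeing each entry individually.
    void FastShutdown(rocksdb::DB* db,
                      const std::shared_ptr<rocksdb::Cache>& block_cache) {
      delete db;                   // close the DB first
      block_cache->DisownData();   // cache entries are intentionally not freed
    }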

Test Plan:
I created a script to test the speedup and demonstrate how to use the call: https://phabricator.fb.com/P3864368

Clean shutdown took 7.2 seconds, while the fast-and-dirty one took 6.3 seconds. Unfortunately, the speedup is not that big, but it should be bigger with a bigger block_cache. I set the capacity to 80GB, but the script filled up only ~7GB.

Reviewers: dhruba, haobo, MarkCallaghan, xjin

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15069
2014-01-24 14:57:52 -08:00
Igor Canadi
677fee27c6 Make VersionSet::ReduceNumberOfLevels() static
Summary:
A lot of our code implicitly assumes number_levels to be static. ReduceNumberOfLevels() breaks that assumption. For example, after calling ReduceNumberOfLevels(), DBImpl::NumberLevels() will be different from VersionSet::NumberLevels(). This is dangerous. Thankfully, it's not in public headers and is only used from the LDB cmd tool. The LDB tool only uses it statically, i.e. it never calls it with a running DB instance. With this diff, we make it explicitly static. This way, we can assume number_levels to be immutable and not break the assumption that a lot of our code relies upon. The LDB tool can still use the method.

Also, I removed the method from a separate file since it breaks filename completion. version_se<TAB> now completes to "version_set." instead of "version_set" (without the dot). I don't see a big reason that the function should be in a different file.

Test Plan: reduce_levels_test

Reviewers: dhruba, haobo, kailiu, sdong

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15303
2014-01-24 14:57:04 -08:00
Igor Canadi
c583157d49 MemTableListVersion
Summary:
MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.

This diff avoids the std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100,000 iterators in a single thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterators is very important to the LogDevice team.

I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())

Test Plan: `make check` works. I will also do `make valgrind_check` before commit

Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15255
2014-01-24 14:52:08 -08:00
Kai Liu
0ab766132b Re-org the table tests
Summary:
We'll divide the table tests into 3 buckets: plain table tests, block-based table tests, and general table feature tests.
This diff makes no real change; it only does the rename and reorg.

Test Plan: run table_test

Reviewers: sdong, haobo, igor, dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15417
2014-01-24 12:46:17 -08:00
kailiu
f131d4c280 Add a make target for shared library
Summary:
Previously, `make release` also compiled the shared library. However, that takes a long time to complete.

To make our development process more efficient, I added a new make target, shared_lib.

Users can of course run `make <library_name>` for direct compilation, but <library_name> changes under certain conditions. `make shared_lib` saves users from having to memorize the library name.

Test Plan: make shared_lib

Reviewers: igor, sdong, haobo, dhruba

Reviewed By: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15309
2014-01-24 11:56:01 -08:00
Igor Canadi
e832e72b31 Revert "Moving to glibc-fb"
This reverts commit d24961b65e.

For some reason, glibc2.17-fb breaks gflags. Reverting for now
2014-01-24 11:50:38 -08:00
Kai Liu
7d991be400 Some small refactorings on table_test
Summary:

Just revise some hard-to-read or unnecessarily verbose code.

Test Plan:

make check
2014-01-24 11:10:32 -08:00
kailiu
66dc033af3 Temporarily disable caching index/filter blocks
Summary:
Mixing index/filter blocks with data blocks resulted in some known
issues. To make sure our users won't be affected in the next release,
we added a new option in BlockBasedTableFactory::TableOption to
conceal this functionality for now.

This patch also introduced a BlockBasedTableReader::OpenOptions,
which avoids the "infinite" growth of parameters in
BlockBasedTableReader::Open().

Test Plan: make check

Reviewers: haobo, sdong, igor, dhruba

Reviewed By: igor

CC: leveldb, tnovak

Differential Revision: https://reviews.facebook.net/D15327
2014-01-24 10:57:15 -08:00
Igor Canadi
d24961b65e Moving to glibc-fb
Summary:
It looks like we might have some trouble when building the new release with 4.8, since fbcode is using glibc2.17-fb by default and we are using glibc2.17. It was reported by Benjamin Renard in our internal group.

This diff moves our fbcode build to use glibc2.17-fb by default. I got some linker errors when compiling, complaining that `google::SetUsageMessage()` was undefined. After deleting all offending lines, the compile was successful and everything works.

Test Plan:
Compiled
Ran ./db_bench ./db_stress ./db_repl_stress

Reviewers: kailiu

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15405
2014-01-24 10:24:08 -08:00
Siying Dong
4605e20c58 If User setting of compaction multipliers overflow, use default value 1 instead
Summary: Currently, compaction multipliers can overflow and cause unexpected behaviors. In this patch, we detect those overflows and use multiplier 1 for them.
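
An illustrative overflow guard in the spirit of the fix (not the exact options-sanitization code):

    #include <cstdint>
    #include <limits>

    // If applying the user-supplied multiplier would overflow the level size,
    // fall back to a multiplier of 1, as the commit describes.
    uint64_t ApplyMultiplierSketch(uint64_t base_size, uint64_t multiplier) {
      if (multiplier != 0 &&
          base_size > std::numeric_limits<uint64_t>::max() / multiplier) {
        multiplier = 1;  // overflow detected, use the default
      }
      return base_size * multiplier;
    }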

Test Plan: make all check

Reviewers: dhruba, haobo, igor, kailiu

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15321
2014-01-24 10:14:23 -08:00
kailiu
eda924a03a Remove an unused GetLengthPrefixedSlice
Summary: We have 3 versions of GetLengthPrefixedSlice() and one of them is no longer in use.

Test Plan: make

Reviewers: sdong, igor, haobo, dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15399
2014-01-23 23:06:52 -08:00
Lei Jin
aba2acb5ec CompactRange() to return status
Summary: as title

Test Plan:
make all check
What other tests should I cover?

Reviewers: igor, haobo

CC:

Differential Revision: https://reviews.facebook.net/D15339
2014-01-23 16:41:46 -08:00
Kai Liu
054c5dda8c Merge branch 'master' into performance
Conflicts:
	db/db_impl.cc
	db/db_test.cc
	db/memtable.cc
	db/version_set.cc
	include/rocksdb/statistics.h
	util/statistics_imp.h
2014-01-23 16:32:49 -08:00
Tomislav Novak
81c9cc9b3b Tailing iterator
Summary:
This diff implements a special type of iterator that doesn't create a snapshot
(can be used to read newly inserted data) and is optimized for doing sequential
reads.

TailingIterator uses current superversion number to determine whether to
invalidate its internal iterators. If the version hasn't changed, it can often
avoid doing expensive seeks over immutable structures (sst files and immutable
memtables).
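
Typical usage looks like the sketch below, assuming the tailing flag exposed on ReadOptions (error handling omitted):

    #include <memory>

    #include "rocksdb/db.h"

    // A tailing iterator takes no snapshot, so re-seeking it picks up records
    // written after it was created.
    void TailNewWrites(rocksdb::DB* db) {
      rocksdb::ReadOptions read_options;
      read_options.tailing = true;
      std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(read_options));
      for (it->SeekToFirst(); it->Valid(); it->Next()) {
        // consume it->key() / it->value()
      }
    }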

Test Plan:
* new unit tests
* running LD with this patch

Reviewers: igor, dhruba, haobo, sdong, kailiu

Reviewed By: sdong

CC: leveldb, lovro, march

Differential Revision: https://reviews.facebook.net/D15285
2014-01-23 16:26:08 -08:00
Igor Canadi
4e91f27c3a Fix performance regression in statistics
Summary:
For some reason, D15099 caused a big performance regression: https://fburl.com/16059000

After digging a bit, I figured out that the reason was that std::atomic_uint_fast64_t was allocated in an array. When I switched from an array to a vector, the QPS returned to the previous level. I'm not sure why this is happening, but this diff seems to fix the performance regression.

Test Plan: I ran the regression script, observed the performance going back to normal

Reviewers: tnovak, kailiu, haobo

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15375
2014-01-23 16:11:55 -08:00
Kai Liu
bb19b530ca Aggressively inlining the short functions in coding.cc
Summary:
This diff takes an even more aggressive approach to inlining the functions. A decent rule I followed is: don't inline a function if it is more than 10 lines long.

Normally optimizing code by inlining is ugly and hard to control, but since one of our use cases spends a significant amount of CPU in functions from coding.cc, I'd like to try this diff out.
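
As an illustration of the kind of function this targets (a sketch modeled on the fixed-width decoders in coding.cc, not the exact code), the body is only a few instructions, so the call overhead is comparable to the work and header inlining pays off:

    #include <cstdint>

    // Little-endian 32-bit decode; small enough that inlining removes most of
    // the per-call cost in tight loops.
    inline uint32_t DecodeFixed32Sketch(const char* ptr) {
      return static_cast<uint32_t>(static_cast<uint8_t>(ptr[0])) |
             (static_cast<uint32_t>(static_cast<uint8_t>(ptr[1])) << 8) |
             (static_cast<uint32_t>(static_cast<uint8_t>(ptr[2])) << 16) |
             (static_cast<uint32_t>(static_cast<uint8_t>(ptr[3])) << 24);
    }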

Test Plan:
1. The size of some .o files increased a little, but mostly by less than 1%, so I think the negative impact of inlining is negligible.
2. As the regression test shows (run 10 times; numbers are averages):

    Metrics                                         Before   After
    ========================================================================
    rocksdb.build.fillseq.qps                       426595   444515    (+4.6%)
    rocksdb.build.memtablefillrandom.qps            121739   123110
    rocksdb.build.memtablereadrandom.qps            1285103  1280520
    rocksdb.build.overwrite.qps                     125816   135570    (+9%)
    rocksdb.build.readrandom_fillunique_random.qps  285995   296863
    rocksdb.build.readrandom_memtable_sst.qps       1027132  1027279
    rocksdb.build.readrandom.qps                    1041427  1054665
    rocksdb.build.readrandom_smallblockcache.qps    1028631  1038433
    rocksdb.build.readwhilewriting.qps              918352   914629

Reviewers: haobo, sdong, igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15291
2014-01-23 16:03:34 -08:00