Commit Graph

1309 Commits

Author SHA1 Message Date
Dhruba Borthakur
5b0fe6c73b Greedy algorithm for picking files to compact.
Summary:
It is best if we pick the largest file to compact in a level.
This reduces the write amplification factor for compactions.
Each level has an auxiliary data structure called files_by_size_
that sorts all files by their size. This data structure is
updated when a new version is created.

Test Plan: make check

Differential Revision: https://reviews.facebook.net/D6195
2012-10-25 18:27:53 -07:00
Dhruba Borthakur
8fb5f40468 firstIndex fix for multi-threaded compaction code.
Summary:
Prior to multi-threaded compaction, wrap-around would be done by using
current_->files_[level[0]. With this change we should be
using the first file for which f->being_compacted is not true.

1ca0584345 (commitcomment-2041516)

Test Plan: make check

Differential Revision: https://reviews.facebook.net/D6165
2012-10-25 08:44:47 -07:00
Mark Callaghan
e7206f43ee Improve statistics
Summary:
This adds more statistics to be reported by GetProperty("leveldb.stats").
The new stats include time spent waiting on stalls in MakeRoomForWrite.
This also includes the total amplification rate where that is:
    (#bytes of sequential IO during compaction) / (#bytes from Put)
This also includes a lot more data for the per-level compaction report.
* Rn(MB) - MB read from level N during compaction between levels N and N+1
* Rnp1(MB) - MB read from level N+1 during compaction between levels N and N+1
* Wnew(MB) - new data written to the level during compaction
* Amplify - ( Write(MB) + Rnp1(MB) ) / Rn(MB)
* Rn - files read from level N during compaction between levels N and N+1
* Rnp1 - files read from level N+1 during compaction between levels N and N+1
* Wnp1 - files written to level N+1 during compaction between levels N and N+1
* NewW - new files written to level N+1 during compaction
* Count - number of compactions done for this level

This is the new output from DB::GetProperty("leveldb.stats"). The old output stopped at Write(MB)

                               Compactions
Level  Files Size(MB) Time(sec) Read(MB) Write(MB)  Rn(MB) Rnp1(MB) Wnew(MB) Amplify Read(MB/s) Write(MB/s)   Rn Rnp1 Wnp1 NewW Count
-------------------------------------------------------------------------------------------------------------------------------------
  0        3        6        33        0       576       0        0      576    -1.0       0.0         1.3     0    0    0    0   290
  1      127      242       351     5316      5314     570     4747      567    17.0      12.1        12.1   287 2399 2685  286    32
  2      161      328        54      822       824     326      496      328     4.0       1.9         1.9   160  251  411  160   161
Amplification: 22.3 rate, 0.56 GB in, 12.55 GB out
Uptime(secs): 439.8
Stalls(secs): 206.938 level0_slowdown, 0.000 level0_numfiles, 24.129 memtable_compaction

Task ID: #

Blame Rev:

Test Plan:
run db_bench

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
(cherry picked from commit ecdeead38f86cc02e754d0032600742c4f02fec8)

Reviewers: dhruba

Differential Revision: https://reviews.facebook.net/D6153
2012-10-24 14:21:38 -07:00
Dhruba Borthakur
47bce26aca Merge branch 'master' into performance 2012-10-23 22:32:54 -07:00
Dhruba Borthakur
3b06f94fa2 Merge branch 'master' into performance
Conflicts:
	db/db_impl.cc
	db/db_impl.h
	db/version_set.cc
2012-10-23 22:30:07 -07:00
Mark Callaghan
51d2adfbeb Fix broken build. Add stdint.h to get uint64_t
Summary:
I still get failures from this. Not sure whether there was a fix in progress.

Task ID: #

Blame Rev:

Test Plan:
compile

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D6147
2012-10-23 14:58:53 -07:00
Dhruba Borthakur
4c107587ed Delete files outside the mutex.
Summary:
The compaction process deletes a large number of files. This takes
quite a bit of time and is best done outside the mutex lock.

Test Plan: make check

Differential Revision: https://reviews.facebook.net/D6123
2012-10-22 11:53:23 -07:00
heyongqiang
5010daa7a8 add "seek_compaction" to log for better debug Summary:
Summary: as subject

Test Plan: compile

Reviewers: dhruba

Reviewed By: dhruba

CC: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D6117
2012-10-22 10:00:25 -07:00
Dhruba Borthakur
3489cd615c Merge branch 'master' into performance
Conflicts:
	db/db_impl.cc
	db/db_impl.h
2012-10-21 02:15:19 -07:00
Dhruba Borthakur
f95219fb32 Delete files outside the mutex.
Summary:
The compaction process deletes a large number of files. This takes
quite a bit of time and is best done outside the mutex lock.

Test Plan: make check

Differential Revision: https://reviews.facebook.net/D6123
2012-10-21 02:03:00 -07:00
Dhruba Borthakur
98f23cf04a Merge branch 'master' into performance
Conflicts:
	db/db_impl.cc
	db/db_impl.h
2012-10-21 01:55:19 -07:00
Dhruba Borthakur
64c4b9f0e2 Delete files outside the mutex.
Summary:
The compaction process deletes a large number of files. This takes
quite a bit of time and is best done outside the mutex lock.

Test Plan:

Reviewers:

CC:

Task ID: #

Blame Rev:
2012-10-21 01:49:48 -07:00
Dhruba Borthakur
50166999ef Merge branch 'master' into performance 2012-10-19 16:08:04 -07:00
Dhruba Borthakur
507f5aac73 Do not enable checksums for zlib compression.
Summary:
Leveldb code already calculates checksums for each block. There
is no need to generate checksums inside zlib. This patch switches-off checksum generation/checking in zlib library.

(The Inno support for zlib uses windowsBits=14 as well.)

pfabricator marks this file as binary. But here is the diff

diff --git a/port/port_posix.h b/port/port_posix.h
index 86a0927..db4e0b8 100644
--- a/port/port_posix.h
+++ b/port/port_posix.h
@@ -163,7 +163,7 @@ inline bool Snappy_Uncompress(const char* input, size_t length,
 }

 inline bool Zlib_Compress(const char* input, size_t length,
-    ::std::string* output, int windowBits = 15, int level = -1,
+    ::std::string* output, int windowBits = -14, int level = -1,
      int strategy = 0) {
 #ifdef ZLIB
   // The memLevel parameter specifies how much memory should be allocated for
@@ -223,7 +223,7 @@ inline bool Zlib_Compress(const char* input, size_t length,
 }

 inline char* Zlib_Uncompress(const char* input_data, size_t input_length,
-    int* decompress_size, int windowBits = 15) {
+    int* decompress_size, int windowBits = -14) {
 #ifdef ZLIB
   z_stream _stream;
   memset(&_stream, 0, sizeof(z_stream));

Test Plan: run db_bench with zlib compression.

Reviewers: heyongqiang, MarkCallaghan

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D6105
2012-10-19 16:06:33 -07:00
Dhruba Borthakur
e982f5a1d2 Merge branch 'master' into performance
Conflicts:
	util/options.cc
2012-10-19 15:16:42 -07:00
Dhruba Borthakur
cf5adc8016 db_bench was not correctly initializing the value for delete_obsolete_files_period_micros option.
Summary:
The parameter delete_obsolete_files_period_micros controls the
periodicity of deleting obsolete files. db_bench was reading in
this parameter intoa local variable called 'l' but was incorrectly
using another local variable called 'n' while setting it in the
db.options data structure.
This patch also logs the value of delete_obsolete_files_period_micros
in the LOG file at db startup time.

I am hoping that this will improve the overall write throughput drastically.

Test Plan: run db_bench

Reviewers: MarkCallaghan, heyongqiang

Reviewed By: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D6099
2012-10-19 15:10:12 -07:00
Dhruba Borthakur
1ca0584345 This is the mega-patch multi-threaded compaction
published in https://reviews.facebook.net/D5997.

Summary:
This patch allows compaction to occur in multiple background threads
concurrently.

If a manual compaction is issued, the system falls back to a
single-compaction-thread model. This is done to ensure correctess
and simplicity of code. When the manual compaction is finished,
the system resumes its concurrent-compaction mode automatically.

The updates to the manifest are done via group-commit approach.

Test Plan: run db_bench
2012-10-19 14:00:53 -07:00
Dhruba Borthakur
cd93e82845 Enable SSE when building with fbcode support.
Summary:
fbcode build now support SSE instructions.
Delete older version of the compile-helper fbcode.sh. This is
subsumed by fbcode.gcc471.sh.

Test Plan: run make check

Reviewers: heyongqiang, MarkCallaghan

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D6057
2012-10-18 08:43:25 -07:00
Dhruba Borthakur
aa73538f2a The deletion of obsolete files should not occur very frequently.
Summary:
The method DeleteObsolete files is a very costly methind, especially
when the number of files in a system is large. It makes a list of
all live-files and then scans the directory to compute the diff.
By default, this method is executed after every compaction run.

This patch makes it such that DeleteObsolete files is never
invoked twice within a configured period.

Test Plan: run all unit tests

Reviewers: heyongqiang, MarkCallaghan

Reviewed By: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D6045
2012-10-16 10:26:10 -07:00
Dhruba Borthakur
0230866791 Enhance db_bench to allow setting the number of levels in a database.
Summary: Enhance db_bench to allow setting the number of levels in a database.

Test Plan: run db_bench and look at LOG

Reviewers: heyongqiang, MarkCallaghan

Reviewed By: MarkCallaghan

CC: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D6027
2012-10-15 10:18:49 -07:00
Dhruba Borthakur
5dc784c233 Fix compilation problem with db_stress when using C11 compiler.
Summary:

Test Plan:

Reviewers:

CC:

Task ID: #

Blame Rev:
2012-10-12 17:00:25 -07:00
Asad K Awan
24f7983b1f [tools] Add a tool to stress test concurrent writing to levelDB
Summary:
Created a tool that runs multiple threads that concurrently read and write to levelDB.
All writes to the DB are stored in an in-memory hashtable and verified at the end of the
test. All writes for a given key are serialzied.

Test Plan:
 - Verified by writing only a few keys and logging all writes and verifying that values read and written are correct.
 - Verified correctness of value generator.
 - Ran with various parameters of number of keys, locks, and threads.

Reviewers: dhruba, MarkCallaghan, heyongqiang

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D5829
2012-10-10 12:12:55 -07:00
Thawan Kooburat
696b290821 Add LevelDb's JNI wrapper
Summary: This implement the Java interface by using JNI

Test Plan: compile test

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D5925
2012-10-05 13:13:49 -07:00
Thawan Kooburat
fc23714f27 Add LevelDb's Java interface
Summary:
See the wiki below
https://our.intern.facebook.com/intern/wiki/index.php/Database/leveldb/Java

Test Plan: compile test

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D5919
2012-10-05 13:11:31 -07:00
Dhruba Borthakur
f7975ac733 Implement RowLocks for assoc schema
Summary:
Each assoc is identified by (id1, assocType). This is the rowkey.
Each row has a read/write rowlock. There is statically allocated array
of 2000 read/write locks. A rowkey is murmur-hashed to one of the
read/write locks.

assocPut and assocDelete acquires the rowlock in Write mode.
The key-updates are done within the rowlock with a atomic nosync
batch write to leveldb. Then the rowlock is released and
a write-with-sync is done to sync leveldb transaction log.

Test Plan: added unit test

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5859
2012-10-03 23:19:01 -07:00
Dhruba Borthakur
c1006d4276 An configurable option to write data using write instead of mmap.
Summary:
We have seen that reading data via the pread call (instead of
mmap) is much faster on Linux 2.6.x kernels. This patch makes
an equivalent option to switch off mmaps for the write path
as well.

db_bench --mmap_write=0 will use write() instead of mmap() to
write data to a file.

This change is backward compatible, the default
option is to continue using mmap for writing to a file.

Test Plan: "make check all"

Differential Revision: https://reviews.facebook.net/D5781
2012-10-03 17:08:13 -07:00
Mark Callaghan
e678a5947a Add --stats_interval option to db_bench
Summary:
The option is zero by default and in that case reporting is unchanged.
By unchanged, the interval at which stats are reported is scaled after each
report and newline is not issued after each report so one line is rewritten.
When non-zero it specifies the constant interval (in operations) at which
statistics are reported and the stats include the rate per interval. This
makes it easier to determine whether QPS changes over the duration of the test.

Task ID: #

Blame Rev:

Test Plan:
run db_bench

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: dhruba

Reviewed By: dhruba

CC: heyongqiang

Differential Revision: https://reviews.facebook.net/D5817
2012-10-03 09:54:33 -07:00
Mark Callaghan
d8763abecd Fix the bounds check for the --readwritepercent option
Summary:
see above

Task ID: #

Blame Rev:

Test Plan:
run db_bench with invalid value for option

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: dhruba

Reviewed By: dhruba

CC: heyongqiang

Differential Revision: https://reviews.facebook.net/D5823
2012-10-03 09:52:26 -07:00
Mark Callaghan
98804f914f Fix compiler warnings and errors in ldb.c
Summary:
stdlib.h is needed for exit()
--readhead --> --readahead

Task ID: #

Blame Rev:

Test Plan:
compile

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
fix compiler warnings & errors

Reviewers: dhruba

Reviewed By: dhruba

CC: heyongqiang

Differential Revision: https://reviews.facebook.net/D5805
2012-10-03 06:46:59 -07:00
Dhruba Borthakur
a58d48de79 Implement ReadWrite locks for leveldb
Summary:
Implement ReadWrite locks for leveldb. These will be helpful
to implement a read-modify-write operation (e.g. atomic increments).

Test Plan: does not modify any existing code

Reviewers: heyongqiang

Reviewed By: heyongqiang

CC: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D5787
2012-10-01 22:37:39 -07:00
Abhishek Kona
fec81318b0 Commandline tool to compace LevelDB databases.
Summary:
A simple CLI which calles DB->CompactRange()
Can take String key's as range.

Test Plan:
Inserted data into a table.
Waited for a minute, used compact tool on it. File modification time's
changed so Compact did something on the files.

Existing unit tests work.

Reviewers: heyongqiang, dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D5697
2012-10-01 10:49:19 -07:00
Dhruba Borthakur
a321d5be9e Implement assocDelete.
Summary: Implement assocDelete.

Test Plan: unit test attached

Reviewers: heyongqiang

Reviewed By: heyongqiang

CC: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D5721
2012-10-01 09:58:26 -07:00
Dhruba Borthakur
72c45c66c6 Print the block cache size in the LOG.
Summary: Print the block cache size in the LOG.

Test Plan: run db_bench and look at LOG. This is helpful while I was debugging one use-case.

Reviewers: heyongqiang, MarkCallaghan

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5739
2012-09-29 21:39:19 -07:00
Dhruba Borthakur
c1bb32e1ba Trigger read compaction only if seeks to storage are incurred.
Summary:
In the current code, a Get() call can trigger compaction if it has to look at more than one file. This causes unnecessary compaction because looking at more than one file is a penalty only if the file is not yet in the cache. Also, th current code counts these files before the bloom filter check is applied.

This patch counts a 'seek' only if the file fails the bloom filter
check and has to read in data block(s) from the storage.

This patch also counts a 'seek' if a file is not present in the file-cache, because opening a file means that its index blocks need to be read into cache.

Test Plan: unit test attached. I will probably add one more unti tests.

Reviewers: heyongqiang

Reviewed By: heyongqiang

CC: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D5709
2012-09-28 11:10:52 -07:00
gjain
92368ab8a2 Add db_dump tool to dump DB keys
Summary:
Create a tool to iterate through keys and dump values. Current options
as follows:

db_dump --start=[START_KEY] --end=[END_KEY] --max_keys=[NUM] --stats
[PATH]

START_KEY: First key to start at
END_KEY: Key to end at (not inclusive)
NUM: Maximum number of keys to dump
PATH: Path to leveldb DB

The --stats command line argument prints out the DB stats before dumping
the keys.

Test Plan:
- Tested with invalid args
- Tested with invalid path
- Used empty DB
- Used filled DB
- Tried various permutations of command line options

Reviewers: dhruba, heyongqiang

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D5643
2012-09-27 09:53:58 -07:00
Dhruba Borthakur
eace74deac Add -fPIC to the shared library builds. Needed by libleveldbjni.
Summary: Add -fPIC to the shared library builds. Needed by libleveldbjni.

Test Plan: build

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5667
2012-09-25 11:07:35 -07:00
Dhruba Borthakur
24eea931ef If ReadCompaction is switched off, then it is better to not even submit background compaction jobs.
Summary:
If ReadCompaction is switched off, then it is better to not even
submit background compaction jobs. I see about 3% increase in
read-throughput on a pure memory database.

Test Plan: run db_bench

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5673
2012-09-25 11:07:01 -07:00
Dhruba Borthakur
26e0ecbd98 Release 1.5.3.fb.
Summary:

Test Plan:

Reviewers:

CC:

Task ID: #

Blame Rev:
2012-09-25 08:30:46 -07:00
Dhruba Borthakur
ae36e509f8 The BackupAPI should also list the length of the manifest file.
Summary:
The GetLiveFiles() api lists the set of sst files and the current
MANIFEST file. But the database continues to append new data to the
MANIFEST file even when the application is backing it up to the
backup location. This means that the database-version that is
stored in the MANIFEST FILE in the backup location
does not correspond to the sst files returned by GetLiveFiles.

This API adds a new parameter to GetLiveFiles. This new parmeter
returns the current size of the MANIFEST file.

Test Plan: Unit test attached.

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5631
2012-09-25 03:13:25 -07:00
Dhruba Borthakur
dd45b8cd8c Keep symbols even for production release.
Summary:
Keeping symbols in the binary increases the size of the library but makes
it easier to debug. The optimization level is still -O2, so this should
have no impact on performance.

Test Plan: make all

Reviewers: heyongqiang, MarkCallaghan

Reviewed By: MarkCallaghan

CC: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D5601
2012-09-21 15:57:47 -07:00
Dhruba Borthakur
653add3c66 Release 1.5.2.fb
Summary:

Test Plan:

Reviewers:

CC:

Task ID: #

Blame Rev:
2012-09-21 11:01:33 -07:00
Dhruba Borthakur
bb2dcd2457 Segfault in DoCompactionWork caused by buffer overflow
Summary:
The code was allocating 200 bytes on the stack but it
writes 256 bytes into the array.

x8a8ea5 std::_Rb_tree<>::erase()
    @     0x7f134bee7eb0 (unknown)
    @           0x8a8ea5 std::_Rb_tree<>::erase()
    @           0x8a35d6 leveldb::DBImpl::CleanupCompaction()
    @           0x8a7810 leveldb::DBImpl::BackgroundCompaction()
    @           0x8a804d leveldb::DBImpl::BackgroundCall()
    @           0x8c4eff leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper()
    @     0x7f134b3c010d start_thread
    @     0x7f134bf9f10d clone

Test Plan: run db_bench with overwrite option

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5595
2012-09-21 10:55:38 -07:00
Dhruba Borthakur
9e84834eb4 Allow a configurable number of background threads.
Summary:
The background threads are necessary for compaction.
For slower storage, it might be necessary to have more than
one compaction thread per DB. This patch allows creating
a configurable number of worker threads.
The default reamins at 1 (to maintain backward compatibility).

Test Plan:
run all unit tests. changes to db-bench coming in
a separate patch.

Reviewers: heyongqiang

Reviewed By: heyongqiang

CC: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D5559
2012-09-19 15:51:08 -07:00
Dhruba Borthakur
fb4b381a0c Print out the compile version in the LOG.
Summary: Print out the compile version in the LOG.

Test Plan: run dbbench and verify LOG

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5529
2012-09-18 13:24:32 -07:00
heyongqiang
3662c2976a improve comments about target_file_size_base, target_file_size_multiplier, max_bytes_for_level_base, max_bytes_for_level_multiplier Summary:
Summary: as subject

Test Plan: compile

Reviewers: MarkCallaghan, dhruba

Differential Revision: https://reviews.facebook.net/D5499
2012-09-17 15:56:11 -07:00
Dhruba Borthakur
aa0426f124 Use correct version of jemalloc.
Summary: Use correct version of jemalloc.

Test Plan: run unit tests

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5487
2012-09-17 15:00:19 -07:00
heyongqiang
a8464ed820 add an option to disable seek compaction
Summary:
as subject. This diff should be good for benchmarking.

will send another diff to make it better in the case the seek compaction is enable.
In that coming diff, will not count a seek if the bloomfilter filters.

Test Plan: build

Reviewers: dhruba, MarkCallaghan

Reviewed By: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D5481
2012-09-17 13:59:57 -07:00
Dhruba Borthakur
906f2ee1f1 New release 1.5.1.fb
Summary:

Test Plan:

Reviewers:

CC:

Task ID: #

Blame Rev:
2012-09-17 11:35:06 -07:00
Dhruba Borthakur
1f7850cf4c Build with gcc-4.7.1-glibc-2.14.1.
Summary:

Test Plan:

Reviewers:

CC:

Task ID: #

Blame Rev:
2012-09-17 10:56:26 -07:00
heyongqiang
b5263428ab use 20d3328ac30f633840ce819ad03019f415267a86 as builder Summary:
Summary: as subject

Test Plan: build

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D5475
2012-09-17 10:53:52 -07:00