Commit Graph

121 Commits

Author SHA1 Message Date
sdong
d72b31774e Slowdown when writing to the last write buffer
Summary: Now if inserting to mem table is much faster than writing to files, there is no mechanism users can rely on to avoid stopping for reaching options.max_write_buffer_number. With the commit, if there are more than four maximum write buffers configured, we slow down to the rate of options.delayed_write_rate while we reach the last one.

Test Plan:
1. Add a new unit test.
2. Run db_bench with

./db_bench --benchmarks=fillrandom --num=10000000 --max_background_flushes=6 --batch_size=32 -max_write_buffer_number=4 --delayed_write_rate=500000 --statistics

based on hard drive and see stopping is avoided with the commit.

Reviewers: yhchiang, IslamAbdelRahman, anthony, rven, kradhakrishnan, igor

Reviewed By: igor

Subscribers: MarkCallaghan, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D52047
2015-12-17 10:49:08 -08:00
Dmitri Smirnov
a6fbdd64e0 Fix rebase issues and new code warnings. 2015-12-11 16:56:24 -08:00
sdong
56e77f0967 Deprecate options.soft_rate_limit and add options.soft_pending_compaction_bytes_limit
Summary: Deprecate options.soft_rate_limit, which is hard to tune, with options.soft_pending_compaction_bytes_limit, which would trigger the slowdown if estimated pending compaction bytes exceeds the threshold. The hope is to make it more striaght-forward to tune.

Test Plan: Modify DBTest.SoftLimit to cover options.soft_pending_compaction_bytes_limit instead; run all unit tests.

Reviewers: IslamAbdelRahman, yhchiang, rven, kradhakrishnan, igor, anthony

Reviewed By: anthony

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D51117
2015-12-09 18:22:45 -08:00
Yueh-Hsuan Chiang
ad471453e8 Allow GetProperty to report the number of currently running flushes / compactions.
Summary:
Add rocksdb.num-running-compactions and rocksdb.num-running-flushes
to GetIntProperty() that reports the number of currently running
compactions / flushes.

Test Plan: augmented existing tests in db_test

Reviewers: igor, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D48693
2015-10-17 00:16:36 -07:00
Andres Noetzli
014fd55adc Support for SingleDelete()
Summary:
This patch fixes #7460559. It introduces SingleDelete as a new database
operation. This operation can be used to delete keys that were never
overwritten (no put following another put of the same key). If an overwritten
key is single deleted the behavior is undefined. Single deletion of a
non-existent key has no effect but multiple consecutive single deletions are
not allowed (see limitations).

In contrast to the conventional Delete() operation, the deletion entry is
removed along with the value when the two are lined up in a compaction. Note:
The semantics are similar to @igor's prototype that allowed to have this
behavior on the granularity of a column family (
https://reviews.facebook.net/D42093 ). This new patch, however, is more
aggressive when it comes to removing tombstones: It removes the SingleDelete
together with the value whenever there is no snapshot between them while the
older patch only did this when the sequence number of the deletion was older
than the earliest snapshot.

Most of the complex additions are in the Compaction Iterator, all other changes
should be relatively straightforward. The patch also includes basic support for
single deletions in db_stress and db_bench.

Limitations:
- Not compatible with cuckoo hash tables
- Single deletions cannot be used in combination with merges and normal
  deletions on the same key (other keys are not affected by this)
- Consecutive single deletions are currently not allowed (and older version of
  this patch supported this so it could be resurrected if needed)

Test Plan: make all check

Reviewers: yhchiang, sdong, rven, anthony, yoshinorim, igor

Reviewed By: igor

Subscribers: maykov, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43179
2015-09-17 11:42:56 -07:00
sdong
5de807ac16 Add options.hard_pending_compaction_bytes_limit to stop writes if compaction lagging behind
Summary: Add an option to stop writes if compaction lefts behind. If estimated pending compaction bytes is more than threshold specified by options.hard_pending_compaction_bytes_liimt, writes will stop until compactions are cleared to under the threshold.

Test Plan: Add unit test DBTest.HardLimit

Reviewers: rven, kradhakrishnan, anthony, IslamAbdelRahman, yhchiang, igor

Reviewed By: igor

Subscribers: MarkCallaghan, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D45999
2015-09-14 12:51:16 -07:00
Ari Ekmekji
03ddce9a01 Add counters for L0 stall while L0-L1 compaction is taking place
Summary:
Although there are currently counters to keep track of the
stall caused by having too many L0 files, there is no distinction as
to whether when that stall occurs either (A) L0-L1 compaction is taking
place to try and mitigate it, or (B) no L0-L1 compaction has been scheduled
at the moment. This diff adds a counter for (A) so that the nature of L0
stalls can be better understood.

Test Plan: make all && make check

Reviewers: sdong, igor, anthony, noetzli, yhchiang

Reviewed By: yhchiang

Subscribers: MarkCallaghan, dhruba

Differential Revision: https://reviews.facebook.net/D46749
2015-09-14 11:03:37 -07:00
Andres Noetzli
3c9cef1eed Unified maps with Comparator for sorting, other cleanup
Summary:
This diff is a collection of cleanups that were initially part of D43179.
Additionally it adds a unified way of defining key-value maps that use a
Comparator for sorting (this was previously implemented in four different
places).

Test Plan: make clean check all

Reviewers: rven, anthony, yhchiang, sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D45993
2015-09-02 13:58:22 -07:00
Yueh-Hsuan Chiang
6996de87af Expose per-level aggregated table properties via GetProperty()
Summary:
This patch adds "rocksdb.aggregated-table-properties"
and "rocksdb.aggregated-table-properties-at-levelN", the former
returns the aggreated table properties of a column family,
while the later returns the aggregated table properties
of the specified level N.

Test Plan: Added tests in db_test

Reviewers: igor, sdong, IslamAbdelRahman, anthony

Reviewed By: anthony

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D45087
2015-08-25 12:03:54 -07:00
Igor Canadi
e46bcc08b9 Remove an extra 's' from cur-size-all-mem-tabless
Summary: As title

Test Plan: make check

Reviewers: yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D45447
2015-08-24 16:43:18 -07:00
sdong
07d2d34160 Add a counter about estimated pending compaction bytes
Summary:
Add a counter of estimated bytes the DB needs to compact for all the compactions to finish. Expose it as a DB Property.
In the future, we can use threshold of this counter to replace soft rate limit and hard rate limit. A single threshold of estimated compaction debt in bytes will be easier for users to reason about when should slow down and stopping than more abstract soft and hard rate limits.

Test Plan: Add unit tests

Reviewers: IslamAbdelRahman, yhchiang, rven, kradhakrishnan, anthony, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D44205
2015-08-20 22:17:10 -07:00
Islam AbdelRahman
027ca5b2cd Total SST files size DB Property
Summary: Add a new DB property that calculate the total size of files used by all RocksDB Versions

Test Plan: Unittests for the new property

Reviewers: igor, yhchiang, anthony, rven, kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D44799
2015-08-20 11:47:19 -07:00
Yueh-Hsuan Chiang
df79eafcb3 Introduce GetIntProperty("rocksdb.size-all-mem-tables")
Summary:
Currently, GetIntProperty("rocksdb.cur-size-all-mem-tables") only returns
the memory usage by those memtables which have not yet been flushed.

This patch introduces GetIntProperty("rocksdb.size-all-mem-tables"),
which includes the memory usage by all the memtables, includes those
have been flushed but pinned by iterators.

Test Plan: Added a test in db_test

Reviewers: igor, anthony, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D44229
2015-08-19 13:32:09 -07:00
sdong
72613657f0 Measure file read latency histogram per level
Summary: In internal stats, remember read latency histogram, if statistics is enabled. It can be retrieved from DB::GetProperty() with "rocksdb.dbstats" property, if it is enabled.

Test Plan: Manually run db_bench and prints out "rocksdb.dbstats" by hand and make sure it prints out as expected

Reviewers: igor, IslamAbdelRahman, rven, kradhakrishnan, anthony, yhchiang

Reviewed By: yhchiang

Subscribers: MarkCallaghan, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D44193
2015-08-14 17:32:42 -07:00
Andres Notzli
4249f159d5 Removing duplicate code in db_bench/db_stress, fixing typos
Summary:
While working on single delete support for db_bench, I realized that
db_bench/db_stress contain a bunch of duplicate code related to
copmression and found some typos. This patch removes duplicate code,
typos and a redundant #ifndef in internal_stats.cc.

Test Plan: make db_stress && make db_bench && ./db_bench --benchmarks=compress,uncompress

Reviewers: yhchiang, sdong, rven, anthony, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43965
2015-08-11 11:46:15 -07:00
Andres Notzli
06aebca592 Report live data size estimate
Summary:
Fixes T6548822. Added a new function for estimating the size of the live data
as proposed in the task. The value can be accessed through the property
rocksdb.estimate-live-data-size.

Test Plan:
There are two unit tests in version_set_test and a simple test in db_test.
make version_set_test && ./version_set_test;
make db_test && ./db_test gtest_filter=GetProperty

Reviewers: rven, igor, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D41493
2015-07-21 21:33:20 -07:00
Yueh-Hsuan Chiang
bb1c74ce18 Fixed a bug of CompactionStats in multi-level universal compaction case
Summary:
Universal compaction can involves in multiple levels.  However,
the current implementation of bytes_readn and bytes_readnp1
(and some other stats with postfix `n` and `np1`) assumes compaction
can only have two levels.

This patch fixes this bug and redefines bytes_readn and bytes_readnp1:
* bytes_readnp1: the number of bytes read in the compaction output level.
* bytes_readn: the total number of bytes read minus bytes_readnp1

Test Plan: Add a test in compaction_job_stats_test

Reviewers: igor, sdong, rven, anthony, kradhakrishnan, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D40239
2015-06-17 23:40:34 -07:00
agiardullo
c815351038 Support saving history in memtable_list
Summary:
For transactions, we are using the memtables to validate that there are no write conflicts.  But after flushing, we don't have any memtables, and transactions could fail to commit.  So we want to someone keep around some extra history to use for conflict checking.  In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit.

After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure).  It seems like the best place for this is abstracted inside the memtable_list.  I decide to create a separate list in MemtableListVersion as using the same list complicated the flush/installalflushresults logic too much.

This diff adds a new parameter to control how much memtable history to keep around after flushing.  However, it sounds like people aren't too fond of adding new parameters.  So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers.  This should not change the maximum amount of memory used, but make it more likely we're using closer the the limit.  (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached).  So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit).

However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions.

Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests.  Added testing in memtablelist_test and planning on adding more testing here.

Reviewers: sdong, rven, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D37443
2015-05-28 16:34:24 -07:00
Igor Canadi
ea6d3a8ac0 Don't skip last level when calculating compaction stats
Summary: We have a bug where we don't report the last level's files as being compacted. This fixes it.

Test Plan: See the fix in action here: https://phabricator.fb.com/P19845738

Reviewers: MarkCallaghan, sdong

Reviewed By: sdong

Subscribers: yhchiang, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38727
2015-05-22 15:30:43 -04:00
Mark Callaghan
944043d683 Add --wal_bytes_per_sync for db_bench and more IO stats
Summary:
See https://gist.github.com/mdcallag/89ebb2b8cbd331854865 for the IO stats.
I added "Cumulative compaction:" and "Interval compaction:" lines. The IO rates
can be confusing. Rates fro per-level stats lines, Wr(MB/s) & Rd(MB/s), are computed
using the duration of the compaction job. If the job reads 10MB, writes 9MB and the job
(IO & merging) takes 1 second then the rates are 10MB/s for read and 9MB/s for writes.
The IO rates in the Cumulative compaction line uses the total uptime. The IO rates in the
Interval compaction line uses the interval uptime. So these Cumalative & Interval
compaction IO rates cannot be compared to the per-level IO rates. But both forms of
the rates are useful for debugging perf.

Task ID: #

Blame Rev:

Test Plan:
run db_bench

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D38667
2015-05-19 16:19:30 -07:00
Herman Lee
51c8133a72 Fix make unity build compiler warning about "stats" shadowing global variable
Summary:
Fix the make unity build. The local stats variable name was shadowing a
global stats variable.

Test Plan:
Run the build
OPT=-DTRAVIS V=1 make unity

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D36285
2015-04-01 10:48:42 -07:00
Mark Callaghan
99ec2412e5 Make the benchmark scripts configurable and add tests
Summary:
This makes run_flash_bench.sh configurable. Previously it was hardwired for 1B keys and tests
ran for 12 hours each. That kept me from using it. This makes it configuable, adds more tests,
makes the duration per-test configurable and refactors the test scripts.

Adds the seekrandomwhilemerging test to db_bench which is the same as seekrandomwhilewriting except
the writer thread does Merge rather than Put.

Forces the stall-time column in compaction IO stats to use a fixed format (H:M:S) which makes
it easier to scrape and parse. Also adds an option to AppendHumanMicros to force a fixed format.
Sometimes automation and humans want different format.

Calls thread->stats.AddBytes(bytes); in db_bench for more tests to get the MB/sec summary
stats in the output at test end.

Adds the average ingest rate to compaction IO stats. Output now looks like:
https://gist.github.com/mdcallag/2bd64d18be1b93adc494

More information on the benchmark output is at https://gist.github.com/mdcallag/db43a58bd5ac624f01e1

For benchmark.sh changes default RocksDB configuration to reduce stalls:
* min_level_to_compress from 2 to 3
* hard_rate_limit from 2 to 3
* max_grandparent_overlap_factor and max_bytes_for_level_multiplier from 10 to 8
* L0 file count triggers from 4,8,12 to 4,12,20 for (start,stall,stop)

Task ID: #6596829

Blame Rev:

Test Plan:
run tools/run_flash_bench.sh

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D36075
2015-03-30 11:28:25 -07:00
Herman Lee
e018892bb6 Formalize the DB properties string definitions.
Summary:
Assign the string properties to const string variables under the
DB::Properties namespace. This helps catch typos during compilation and
also consolidates the property definition in one place.

Test Plan: Run rocksdb unit tests

Reviewers: sdong, yoshinorim, igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D35991
2015-03-27 14:50:20 -07:00
sdong
0831a35994 Add a DB Property For Number of Deletions in Memtables
Summary: Add a DB property for number of deletions in memtables. It can sometimes help people debug slowness because of too many deletes.

Test Plan: Add test cases.

Reviewers: rven, yhchiang, kradhakrishnan, igor

Reviewed By: igor

Subscribers: leveldb, dhruba, yoshinorim

Differential Revision: https://reviews.facebook.net/D35247
2015-03-18 17:03:59 -07:00
Mark Callaghan
56337faf3e Fix compaction IO stats to handle large file counts
Summary:
The output did not have space for 6-digit file counts or for 3-digit
counts of files being compacted. This adds space for that while preserving
existing alignment. See https://gist.github.com/mdcallag/0a61c6a18dd467224c11

Task ID: #

Blame Rev:

Test Plan:
run db_bench, look at output

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D35091
2015-03-16 11:50:23 -07:00
Igor Canadi
c6967a1a5e Make RecordIn/RecordOut human readable
Summary: I had hard time understanding these big numbers. Here's how the output looks like now: https://gist.github.com/igorcanadi/4c39c17685049584a992

Test Plan: db_bench

Reviewers: sdong, MarkCallaghan

Reviewed By: MarkCallaghan

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35073
2015-03-14 15:12:41 -07:00
Mark Callaghan
c8da670325 Stop printing per-level stall times.
Summary:
Per-level stall times are the suggested stall time, not the actual stall time so this change stops printing them
both in the per-level output lines and in the summary. Also changed output for total stall time to include units
in all cases. The new output looks like:
Level   Files   Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) Stall(cnt)    RecordIn   RecordDrop
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0     4/1          7   0.8      0.0     0.0      0.0       0.6      0.6       0.0   0.0      0.0     12.9        50       352    0.141        882            0            0
  L1     5/0          9   0.9      0.0     0.0      0.0       0.0      0.0       0.6   0.0      0.0      0.0         0         0    0.000          0            0            0
  L2    54/0         99   1.0      0.0     0.0      0.0       0.0      0.0       0.6   0.0      0.0      0.0         0         0    0.000          0            0            0
  L3   289/0        527   0.5      0.0     0.0      0.0       0.0      0.0       0.5   0.0      0.0      0.0         0         0    0.000          0            0            0
 Sum   352/1        642   0.0      0.0     0.0      0.0       0.6      0.6       1.7   1.0      0.0     12.9        50       352    0.141        882            0            0
 Int     0/0          0   0.0      0.0     0.0      0.0       0.0      0.0       0.0   1.0      0.0     15.5         0         3    0.118          7            0            0
Flush(GB): accumulative 0.627, interval 0.005
Stalls(count): 0 level0_slowdown, 0 level0_numfiles, 882 memtable_compaction, 0 leveln_slowdown_soft, 0 leveln_slowdown_hard

Task ID: #6493861

Blame Rev:

Test Plan:
run db_bench, look at output

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D35085
2015-03-14 15:01:43 -07:00
Islam AbdelRahman
9d22a1f136 Allow negative Wnew
Summary:
we are using uint64_t for Wnew this is not correct since this value can be negative
https://github.com/facebook/rocksdb/issues/535

Test Plan: run db_bench and check what happens when Wnew is -ve

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D34935
2015-03-12 20:53:18 -07:00
Yueh-Hsuan Chiang
694988b627 Fix a bug in stall time counter. Improve its output format.
Summary: Fix a bug in stall time counter.  Improve its output format.

Test Plan:
export ROCKSDB_TESTS=Timeout
./db_test

./db_bench --benchmarks=fillrandom --stats_interval=10000 --statistics=true --stats_per_interval=1 --num=1000000 --threads=4 --level0_stop_writes_trigger=3 --level0_slowdown_writes_trigger=2

sample output:
    Uptime(secs): 35.8 total, 0.0 interval
    Cumulative writes: 359590 writes, 359589 keys, 183047 batches, 2.0 writes per batch, 0.04 GB user ingest, stall seconds: 1786.008 ms
    Cumulative WAL: 359591 writes, 183046 syncs, 1.96 writes per sync, 0.04 GB written
    Interval writes: 253 writes, 253 keys, 128 batches, 2.0 writes per batch, 0.0 MB user ingest, stall time: 0 us
    Interval WAL: 253 writes, 128 syncs, 1.96 writes per sync, 0.00 MB written

Reviewers: MarkCallaghan, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34275
2015-03-03 12:48:12 -08:00
Igor Canadi
db03739340 options.level_compaction_dynamic_level_bytes to allow RocksDB to pick size bases of levels dynamically.
Summary:
When having fixed max_bytes_for_level_base, the ratio of size of largest level and the second one can range from 0 to the multiplier. This makes LSM tree frequently irregular and unpredictable. It can also cause poor space amplification in some cases.

In this improvement (proposed by Igor Kabiljo), we introduce a parameter option.level_compaction_use_dynamic_max_bytes. When turning it on, RocksDB is free to pick a level base in the range of (options.max_bytes_for_level_base/options.max_bytes_for_level_multiplier, options.max_bytes_for_level_base] so that real level ratios are close to options.max_bytes_for_level_multiplier.

Test Plan: New unit tests and pass tests suites including valgrind.

Reviewers: MarkCallaghan, rven, yhchiang, igor, ikabiljo

Reviewed By: ikabiljo

Subscribers: yoshinorim, ikabiljo, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D31437
2015-03-02 22:40:41 -08:00
sdong
d45a6a4002 Add rocksdb.num-live-versions: number of live versions
Summary: Add a DB property about live versions. It can be helpful to figure out whether there are files not live but not yet deleted, in some use cases.

Test Plan: make all check

Reviewers: rven, yhchiang, igor

Reviewed By: igor

Subscribers: yoshinorim, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D33327
2015-02-19 13:10:37 -08:00
sdong
9132e52ea4 DB Stats Dump to print total stall time
Summary:
Add printing of stall time in DB Stats:

Sample outputs:

** DB Stats **
Uptime(secs): 53.2 total, 1.7 interval
Cumulative writes: 625940 writes, 625939 keys, 625940 batches, 1.0 writes per batch, 0.49 GB user ingest, stall micros: 50691070
Cumulative WAL: 625940 writes, 625939 syncs, 1.00 writes per sync, 0.49 GB written
Interval writes: 10859 writes, 10859 keys, 10859 batches, 1.0 writes per batch, 8.7 MB user ingest, stall micros: 1692319
Interval WAL: 10859 writes, 10859 syncs, 1.00 writes per sync, 0.01 MB written

Test Plan:
make all check
verify printing using db_bench

Reviewers: igor, yhchiang, rven, MarkCallaghan

Reviewed By: MarkCallaghan

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D31239
2015-01-09 11:44:19 -08:00
sdong
1f04066cab Add DBProperty to return number of snapshots and time for oldest snapshot
Summary:
Add a counter in SnapshotList to show number of snapshots. Also a unix timestamp in every snapshot.
Add two DB Properties to return number of snapshots and timestamp of the oldest one.

Test Plan: Add unit test checking

Reviewers: yhchiang, rven, igor

Reviewed By: igor

Subscribers: leveldb, dhruba, MarkCallaghan

Differential Revision: https://reviews.facebook.net/D29919
2014-12-05 17:07:49 -08:00
Mark Callaghan
32a0a03844 Add Moved(GB) to Compaction IO stats
Summary:
Adds counter for bytes moved (files pushed down a level rather than compacted) to compaction
IO stats as Moved(GB). From the output removed these infrequently used columns: RW-Amp, Rn(cnt), Rnp1(cnt),
Wnp1(cnt), Wnew(cnt).
Example old output:
Level   Files   Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) RW-Amp W-Amp Rd(MB/s) Wr(MB/s)  Rn(cnt) Rnp1(cnt) Wnp1(cnt) Wnew(cnt)  Comp(sec) Comp(cnt) Avg(sec) Stall(sec) Stall(cnt) Avg(ms) RecordIn RecordDrop
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0     0/0          0   0.0      0.0     0.0      0.0    2130.8   2130.8    0.0   0.0      0.0    109.1        0         0         0         0      20002     25068    0.798      28.75     182059    0.16       0          0
  L1   142/0        509   1.0   4618.5  2036.5   2582.0    4602.1   2020.2    4.5   2.3     88.5     88.1    24220    701246   1215528    514282      53466      4229   12.643       0.00          0    0.002032745988  300688729

Example new output:
Level   Files   Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) Stall(sec) Stall(cnt) Avg(ms)     RecordIn   RecordDrop
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0     7/0         13   1.8      0.0     0.0      0.0       0.6      0.6       0.0   0.0      0.0     14.7        44       353    0.124       0.03        626    0.05            0            0
  L1     9/0         16   1.6      0.0     0.0      0.0       0.0      0.0       0.6   0.0      0.0      0.0         0         0    0.000       0.00          0    0.00            0            0

Task ID: #

Blame Rev:

Test Plan:
make check, run db_bench --fillseq --stats_per_interval --stats_interval and look at output

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D29787
2014-12-03 18:28:39 -08:00
Yueh-Hsuan Chiang
274ba62707 Block internal_stats in ROCKSDB_LITE
Summary: Block internal_stats in ROCKSDB_LITE.

Test Plan: make OPT=-DROCKSDB_LITE shared_lib

Reviewers: ljin, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D29541
2014-11-25 12:01:27 -08:00
Yueh-Hsuan Chiang
13de000f07 Add rocksdb::ToString() to address cases where std::to_string is not available.
Summary:
In some environment such as android, the c++ library does not have
std::to_string.  This path adds rocksdb::ToString(), which wraps std::to_string
when std::to_string is not available, and implements std::to_string
in the other case.

Test Plan:
make dbg -j32
./db_test
make clean
make dbg OPT=-DOS_ANDROID -j32
./db_test

Reviewers: ljin, sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D29181
2014-11-24 20:44:49 -08:00
Lei Jin
8d3f8f9696 remove all remaining references to cfd->options()
Summary:
The very last reference happens in DBImpl::GetOptions()
I built with both DBImpl::GetOptions() and ColumnFamilyData::options() commented out

Test Plan: make all check

Reviewers: sdong, yhchiang, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D29073
2014-11-18 10:20:10 -08:00
Yueh-Hsuan Chiang
4161de92a3 Fix SIGSEGV
Summary: As a short-term fix, let's go back to previous way of calculating NeedsCompaction(). SIGSEGV happens because NeedsCompaction() can happen before super_version (and thus MutableCFOptions) is initialized.

Test Plan: make check

Reviewers: ljin, sdong, rven, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D28875
2014-11-13 15:21:04 -08:00
Yueh-Hsuan Chiang
1d1a64f58a Move NeedsCompaction() from VersionStorageInfo to CompactionPicker
Summary:
Move NeedsCompaction() from VersionStorageInfo to CompactionPicker
to allow different compaction strategy to have their own way to
determine whether doing compaction is necessary.

When compaction style is set to kCompactionStyleNone, then
NeedsCompaction() will always return false.

Test Plan:
export ROCKSDB_TESTS=Compact
./db_test

Reviewers: ljin, sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D28719
2014-11-13 13:41:43 -08:00
sdong
2ea1219eb6 Fix RecordIn and RecordDrop stats
Summary:
1. fix possible overflow of the two stats by using uint64_t
2. use a similar source of data to calculate RecordDrop. Previous one is not correct.

Test Plan: See outputs of db_bench settings, and the results look reasonable

Reviewers: MarkCallaghan, ljin, igor

Reviewed By: igor

Subscribers: rven, leveldb, yhchiang, dhruba

Differential Revision: https://reviews.facebook.net/D28155
2014-11-05 11:03:34 -08:00
sdong
ac6afaf9ef Enforce naming convention of getters in version_set.h
Summary: Enforce the accessier naming convention in functions in version_set.h

Test Plan: make all check

Reviewers: ljin, yhchiang, rven, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D28143
2014-11-04 09:59:05 -08:00
Lei Jin
5594d446ff unfriend DBImpl and InternalStats from VersionStorageInfo
Summary: as title

Test Plan: make release

Reviewers: sdong, yhchiang, rven, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D28053
2014-10-31 15:04:39 -07:00
sdong
c645250ee0 CompactionStats to support larger value of RecordIn and RecordDrop
Summary: now we use %8d for RecordIn and %10d for RecordDrop, which is far too small for some use cases. Extend both of them to %12d.

Test Plan: run one test in db_test and see the LOG file.

Reviewers: igor, MarkCallaghan, ljin

Reviewed By: ljin

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D28041
2014-10-31 13:51:48 -07:00
sdong
4d2ba38b65 Make VersionBuilder unit testable
Summary:
Rename Version::Builder to VersionBuilder and expose its definition to a header.
Make VerisonBuilder not reference Version or ColumnFamilyData, only working with VersionStorageInfo.
Add version_builder_test which has a simple test.

Test Plan: make all check

Reviewers: rven, yhchiang, igor, ljin

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D27969
2014-10-31 10:44:06 -07:00
sdong
76d1c28e82 Make CompactionPicker more easily tested
Summary:
Make compaction picker easier to test.
The basic idea is to separate a minimum subcomponent of Version to VersionStorageInfo, which just responsible to LSM tree. A stub VersionStorageInfo can then be easily created and passed into compaction picker so that we can check the outputs.

It now passes most tests. Still two things need to be done:
(1) deal with the FIFO compaction's file size.
(2) write an example test to make sure the interface can do the job.

Add a compaction_picker_test to make sure compaction picker codes can be easily unit tested.

Test Plan:
Pass all unit tests and compaction_picker_test

Reviewers: yhchiang, rven, igor, ljin

Reviewed By: ljin

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D27639
2014-10-29 15:16:53 -07:00
sdong
d755e53b87 Printing number of keys in DB Stats
Summary: It is useful to print out number of keys in DB Stats

Test Plan:
./db_bench --benchmarks fillrandom --num 1000000 -threads 16 -batch_size=16

and watch the outputs in LOG files

Reviewers: MarkCallaghan, ljin, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D24513
2014-10-22 18:41:33 -07:00
Igor Canadi
ee80fb4b4a Total memtables size counter
Summary: Added one new counter for GetProperty

Test Plan: Not sure if needs a test case. compiles

Reviewers: sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D25023
2014-10-17 09:26:27 -07:00
sdong
8ea232b9e3 Add number of records dropped in compaction summary
Summary:
Add two stats to compaction summary:
1. Total input records from previous level
2. Total number of records dropped after compaction

Test Plan: See outputs of printing when runnning locally

Reviewers: ljin, igor, MarkCallaghan

Reviewed By: MarkCallaghan

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D24411
2014-10-02 17:54:25 -07:00
liuhuahang
bb6ae0f80c fix more compile warnings
N/A

Change-Id: I5b6f9c70aea7d3f3489328834fed323d41106d9f
Signed-off-by: liuhuahang <liuhuahang@zerus.co>
2014-09-05 14:14:37 +08:00
Igor Canadi
990df99a61 Fix ios compile
Summary: We need to set contbuild for this :)

Test Plan: compiles

Reviewers: sdong, yhchiang, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22701
2014-09-02 10:50:15 -07:00
Stanislau Hlebik
9dcb75b6d9 Add is-file-deletions-enabled property
Summary:
Add property 'rocksdb.is-file-deletions-enable'
	 which equals disable_delete_obsole_file_

Test Plan: make all check

Reviewers: sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22119
2014-08-26 16:26:29 -07:00
sdong
76dcf7eefd Minor: fix a format
Summary: A format fixing

Test Plan: N/A

Reviewers: ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D21255
2014-08-06 18:11:33 -07:00
sdong
7abe9655d3 Fix valgrind failure caused by recent checked-in.
Summary: Initialize un-initialized parameters

Test Plan: run the failed test (c_test)

Reviewers: yhchiang, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D21249
2014-08-06 17:45:47 -07:00
sdong
1242bfcad7 Add DB property "rocksdb.estimate-table-readers-mem"
Summary:
Add a DB Property "rocksdb.estimate-table-readers-mem" to return estimated memory usage by all loaded table readers, other than allocated from block cache.

Refactor the property codes to allow getting property from a version, with DB mutex not acquired.

Test Plan: Add several checks of this new property in existing codes for various cases.

Reviewers: yhchiang, ljin

Reviewed By: ljin

Subscribers: xjin, igor, leveldb

Differential Revision: https://reviews.facebook.net/D20733
2014-08-06 11:39:46 -07:00
sdong
f04356e660 Add DB::GetIntProperty() to return integer properties to be returned as integers
Summary: We have quite some properties that are integers and we are adding more. Add a function to directly return them as an integer, instead of a string

Test Plan: Add several unit test checks

Reviewers: yhchiang, igor, dhruba, haobo, ljin

Reviewed By: ljin

Subscribers: yoshinorim, leveldb

Differential Revision: https://reviews.facebook.net/D20637
2014-07-28 16:55:57 -07:00
sdong
f6784766db Add DB property estimated number of keys
Summary: Add a DB property of estimated number of live keys, by adding number of entries of all mem tables and all files, subtracted by all deletions in all files.

Test Plan: Add the case in unit tests

Reviewers: hobbymanyp, ljin

Reviewed By: ljin

Subscribers: MarkCallaghan, yoshinorim, leveldb, igor, dhruba

Differential Revision: https://reviews.facebook.net/D20631
2014-07-28 15:27:58 -07:00
Lei Jin
7e8bb71dd0 InternalStats to take cfd on constructor
Summary:
It has one-to-one relationship with CFD. Take a pointer to CFD on
constructor to avoid passing cfd through member functions.

Test Plan: make

Reviewers: sdong, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D20565
2014-07-28 12:27:08 -07:00
Lei Jin
f6f1533c6f make internal stats independent of statistics
Summary:
also make it aware of column family
output from db_bench

```
** Compaction Stats [default] **
Level Files Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) RW-Amp W-Amp Rd(MB/s) Wr(MB/s)  Rn(cnt) Rnp1(cnt) Wnp1(cnt) Wnew(cnt)  Comp(sec) Comp(cnt) Avg(sec) Stall(sec) Stall(cnt) Avg(ms)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0    14      956   0.9      0.0     0.0      0.0       2.7      2.7    0.0   0.0      0.0    111.6        0         0         0         0         24        40    0.612      75.20     492387    0.15
  L1    21     2001   2.0      5.7     2.0      3.7       5.3      1.6    5.4   2.6     71.2     65.7       31        43        55        12         82         2   41.242      43.72      41183    1.06
  L2   217    18974   1.9     16.5     2.0     14.4      15.1      0.7   15.6   7.4     70.1     64.3       17       182       185         3        241        16   15.052       0.00          0    0.00
  L3  1641   188245   1.8      9.1     1.1      8.0       8.5      0.5   15.4   7.4     61.3     57.2        9        75        76         1        152         9   16.887       0.00          0    0.00
  L4  4447   449025   0.4     13.4     4.8      8.6       9.1      0.5    4.7   1.9     77.8     52.7       38        79       100        21        176        38    4.639       0.00          0    0.00
 Sum  6340   659201   0.0     44.7    10.0     34.7      40.6      6.0   32.0  15.2     67.7     61.6       95       379       416        37        676       105    6.439     118.91     533570    0.22
 Int     0        0   0.0      1.2     0.4      0.8       1.3      0.5    5.2   2.7     59.1     65.6        3         7         9         2         20        10    2.003       0.00          0    0.00
Stalls(secs): 75.197 level0_slowdown, 0.000 level0_numfiles, 0.000 memtable_compaction, 43.717 leveln_slowdown
Stalls(count): 492387 level0_slowdown, 0 level0_numfiles, 0 memtable_compaction, 41183 leveln_slowdown

** DB Stats **
Uptime(secs): 202.1 total, 13.5 interval
Cumulative writes: 6291456 writes, 6291456 batches, 1.0 writes per batch, 4.90 ingest GB
Cumulative WAL: 6291456 writes, 6291456 syncs, 1.00 writes per sync, 4.90 GB written
Interval writes: 1048576 writes, 1048576 batches, 1.0 writes per batch, 836.0 ingest MB
Interval WAL: 1048576 writes, 1048576 syncs, 1.00 writes per sync, 0.82 MB written

Test Plan: ran it

Reviewers: sdong, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19917
2014-07-21 12:57:29 -07:00
Igor Canadi
d2a727c182 BG -> GB 2014-07-14 09:06:38 -07:00
Igor Canadi
ee6b35e55a Fix mac compile
Summary: We should use PRIu64 instead of "%lu" for portability

Test Plan: compiles now

Reviewers: ljin, dhruba

Reviewed By: dhruba

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19809
2014-07-14 09:56:52 -04:00
Lei Jin
46f0f6ddd5 improve InternalStats output
Summary: as title

Test Plan:
sampe output:
Level Files Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(BG) Wnew(GB) RW-Amp W-Amp Rd(MB/s) Wr(MB/s)  Rn(cnt) Rnp1(cnt) Wnp1(cnt) Wnew(cnt)  Comp(sec) Comp(cnt) Avg(sec) Stall(sec) Stall(cnt) Avg(ms)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0    15     1024   1.0      0.0     0.0      0.0       8.2      8.2    0.0   0.0      0.0    111.4        0         0         1         1         75       123    0.612     295.94    1939238    0.15
  L1    23     2118   2.1     20.9     8.3     12.7      20.0      7.3    5.0   2.4     73.2     69.9      124       141       208        67        293         8   36.582      17.05      16100    1.06
  L2   162    15333   1.5     47.0     7.1     40.0      42.6      2.6   12.7   6.0     67.9     61.5       62       457       482        25        709        55   12.898       0.00          0    0.00
  L3   985   108065   1.1     37.8     4.0     33.9      36.9      3.0   18.8   9.3     60.1     58.5       41       338       363        25        645        31   20.812       0.00          0    0.00
  L4  2788   356033   0.3      0.0     0.0      0.0       0.0      0.0    0.0   0.0      0.0      0.0        0         0         0         0          0         0    0.000       0.00          0    0.00
 Sum  3973   482572   0.0    105.8    19.3     86.5     107.7     21.2   11.1   5.6     62.9     64.0      227       936      1054       118       1723       217    7.938     312.99    1955338    0.16

Reviewers: sdong, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19707
2014-07-11 15:03:30 -07:00
Lei Jin
8a9cc7885c report correct interval amplification
Summary: as title

Test Plan: make release

Reviewers: sdong, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19515
2014-07-08 12:48:10 -07:00
sdong
a570740727 Expose number of entries in mem tables to users
Summary: In this patch, two new DB properties are defined: rocksdb.num-immutable-mem-table and rocksdb.num-entries-imm-mem-tables, from where number of entries in mem tables can be exposed to users

Test Plan:
Cover the codes in db_test
make all check

Reviewers: haobo, ljin, igor

Reviewed By: igor

CC: nkg-, igor, yhchiang, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D18207
2014-04-22 22:13:21 -07:00
Igor Canadi
ddbd1ece88 Merge branch 'master' into columnfamilies
Conflicts:
	db/db_impl.cc
	db/db_test.cc
	db/internal_stats.cc
	db/internal_stats.h
	db/version_edit.cc
	db/version_edit.h
	db/version_set.cc
	include/rocksdb/options.h
	util/options.cc
2014-03-31 13:39:24 -07:00
Haobo Xu
a92194e5b2 [RocksDB] Add db property "rocksdb.cur-size-active-mem-table"
Summary: as title

Test Plan: db_test

Reviewers: sdong

Reviewed By: sdong

CC: leveldb

Differential Revision: https://reviews.facebook.net/D17217
2014-03-27 15:14:04 -07:00
Igor Canadi
e20fa3f8a4 Merge branch 'master' into columnfamilies
Conflicts:
	db/db_impl.cc
	db/internal_stats.cc
	db/internal_stats.h
	db/version_set.cc
2014-03-19 17:22:20 -07:00
sdong
71e6a34271 Add a DB property to indicate number of background errors encountered
Summary: Add a property to calculate number of background errors encountered to help users build their monitoring

Test Plan: Add a unit test. make all check

Reviewers: haobo, igor, dhruba

Reviewed By: igor

CC: ljin, nkg-, yhchiang, leveldb

Differential Revision: https://reviews.facebook.net/D16959
2014-03-18 14:28:30 -07:00
Kai Liu
1ec72b37b1 Several easy-to-add properties related to compaction and flushes
Summary: To partly address the request @nkg- raised, add three easy-to-add properties to compactions and flushes.

Test Plan: run unit tests and add a new unit test to cover new properties.

Reviewers: haobo, dhruba

Reviewed By: dhruba

CC: nkg-, leveldb

Differential Revision: https://reviews.facebook.net/D13677
2014-03-18 14:00:09 -07:00
Igor Canadi
f7489123e2 Move compaction picker and internal key comparator to ColumnFamilyData
Summary: Compaction picker and internal key comparator are different for each column family (not global), so they should live in ColumnFamilyData

Test Plan: make check

Reviewers: dhruba, haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15801
2014-01-31 16:06:55 -08:00
Igor Canadi
c37e7de669 Merge branch 'master' into columnfamilies
Conflicts:
	db/db_impl.cc
	db/db_impl.h
2014-01-30 11:47:23 -08:00
Igor Canadi
3c0dcf0e25 InternalStatistics
Summary:
In DBImpl we keep track of some statistics internally and expose them via GetProperty(). This diff encapsulates all the internal statistics into a class InternalStatisics. Most of it is copy/paste.

Apart from cleaning up db_impl.cc, this diff is also necessary for Column families, since every column family should have its own CompactionStats, MakeRoomForWrite-stall stats, etc. It's much easier to keep track of it in every column family if it's nicely encapsulated in its own class.

Test Plan: make check

Reviewers: dhruba, kailiu, haobo, sdong, emayanke

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15273
2014-01-29 20:40:41 -08:00