Summary: Due to how the code handled compactions in Level 0 in `PickCompaction()` it could be the case that two compactions on level 0 ran that produced tables in level 1 that overlap. However, this case seems like it would only occur on a seek compaction which is unlikely on level 0. Furthermore, level 0 and level 1 had to have a certain arrangement of files.
Test Plan:
make check
Reviewers: dhruba, vamsi
Reviewed By: dhruba
CC: leveldb, sheki
Differential Revision: https://reviews.facebook.net/D7923
Summary:
Specific changes:
1) Turn on -Werror so all warnings are errors
2) Fix some warnings the above now complains about
3) Add proper dependency support so changing a .h file forces a .c file
to rebuild
4) Automatically use fbcode gcc on any internal machine rather than
whatever system compiler is laying around
5) Fix jemalloc to once again be used in the builds (seemed like it
wasn't being?)
6) Fix issue where 'git' would fail in build_detect_version because of
LD_LIBRARY_PATH being set in the third-party build system
Test Plan:
make, make check, make clean, touch a header file, make sure
rebuild is expected
Reviewers: dhruba
Reviewed By: dhruba
Differential Revision: https://reviews.facebook.net/D7887
Summary:
Pretty much a blind copy of the patch in open source.
Hope to get this in before we make a release
Test Plan: make clean check
Reviewers: dhruba, heyongqiang
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D7809
Summary: Found some issues running Valgrind on `db_test` (there are still some outstanding ones) and fixed them.
Test Plan:
make check
ran `valgrind ./db_test` and saw that errors no longer occur
Reviewers: dhruba, vamsi, emayanke, sheki
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D7803
Summary:
This was a peformance regression caused by https://reviews.facebook.net/D6729.
The default value of max_grandparent_overlap_factor was erroneously
set to 0 in db_bench.
This was causing compactions to create really really small files because the max_grandparent_overlap_factor was erroneously set to zero in the benchmark.
Test Plan: Run --benchmarks=overwrite
Reviewers: heyongqiang, emayanke, sheki, MarkCallaghan
Reviewed By: sheki
CC: leveldb
Differential Revision: https://reviews.facebook.net/D7797
Summary:
Changed CreateDir() to CreateDirIfMissing() so a directory that already exists now causes and error.
Fixed CreateDirIfMissing() and added Env.DirExists()
Test Plan:
make check to test for regessions
Ran the following to test if the error message is not about lock files not existing
./db_bench --db=dir/testdb
After creating a file "testdb", ran the following to see if it failed with sane error message:
./db_bench --db=testdb
Reviewers: dhruba, emayanke, vamsi, sheki
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D7707
Summary:
Adds the option --seed to db_bench to specify the base for the per-thread RNG.
When not set each thread uses the same value across runs of db_bench which defeats
IO stress testing.
Adds the option --read_range. When set to a value > 1 an iterator is created and
each query done for the randomread benchmark will do a range scan for that many
rows. When not set or set to 1 the existing behavior (a point lookup) is done.
Fixes a bug where a printf format string was missing.
Test Plan: run db_bench
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D7749
Summary:
`MemTableList::Add()` neglected to mention that it took ownership of the reference held by its caller.
The comment in `MemTable::Get()` was wrong in describing the format of the key.
Test Plan: None
Reviewers: dhruba, sheki, emayanke, vamsi
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D7755
Summary:
There was a bug in the ExtendOverlappingInputs method so that
the terminating condition for the backward search was incorrect.
Test Plan: make clean check
Reviewers: sheki, emayanke, MarkCallaghan
Reviewed By: MarkCallaghan
CC: leveldb
Differential Revision: https://reviews.facebook.net/D7725
Summary:
Leveldb has an api OpenForReadOnly() that opens the database
in readonly mode. This call had an option to not process the
transaction log. This patch removes this option and always
processes all transactions that had been committed. It has
been done in such a way that it does not create/write to
any new files in the process. The invariant of "no-writes"
to the leveldb data directory is still true.
This enhancement allows multiple threads to open the same database
in readonly mode and access all trancations that were committed right
upto the OpenForReadOnly call.
I changed the public API to match the new semantics because
there are no users who are currently using this api.
Test Plan: make clean check
Reviewers: sheki
Reviewed By: sheki
CC: leveldb
Differential Revision: https://reviews.facebook.net/D7479
Summary:
1. The OpenForReadOnly() call should not lock the db. This is useful
so that multiple processes can open the same database concurrently
for reading.
2. GetUpdatesSince should not error out if the archive directory
does not exist.
3. A new constructor for WriteBatch that can takes a serialized
string as a parameter of the constructor.
Test Plan: make clean check
Reviewers: sheki
Reviewed By: sheki
CC: leveldb
Differential Revision: https://reviews.facebook.net/D7449
Summary:
Added kMetaDatabase for meta-databases in db/filename.h along with supporting
fuctions.
Fixed switch in DBImpl so that it also handles kMetaDatabase.
Fixed DestroyDB() that it can handle destroying meta-databases.
Test Plan: make check
Reviewers: sheki, emayanke, vamsi, dhruba
Reviewed By: dhruba
Differential Revision: https://reviews.facebook.net/D7245
Summary:
C tests would fail sometimes as DestroyDB would return a Failure Status
message when deleting an archival directory which was not created
(WAL_ttl_seconds = 0).
Fix: Ignore the Status returned on Deleting Archival Directory.
Test Plan: * make check
Reviewers: dhruba, emayanke
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D7395
Summary:
WriteBatch is now used by the GetUpdatesSinceAPI. This API is external
and will be used by the rocks server. Rocks Server and others will need
to know about the Sequence Number in the WriteBatch. This public method
will allow for that.
Test Plan: make all check.
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D7293
Summary:
* Fixed implementation bug in Binary_Searvch introduced in https://reviews.facebook.net/D7119
* Binary search is also overflow safe.
* Delete archive log files and archive dir during DestroyDB
Test Plan: make check
Reviewers: dhruba
CC: kosievdmerwe, emayanke
Differential Revision: https://reviews.facebook.net/D7263
Summary:
Implement a interface to retrieve the most current transaction
id from the database.
Test Plan: Added unit test.
Reviewers: sheki
Reviewed By: sheki
CC: leveldb
Differential Revision: https://reviews.facebook.net/D7269
Summary:
filename.h has functions to do similar things.
Moving code away from db_impl.cc
Test Plan: make check
Reviewers: dhruba
Reviewed By: dhruba
Differential Revision: https://reviews.facebook.net/D7251
Summary:
How it works:
* GetUpdatesSince takes a SequenceNumber.
* A LogFile with the first SequenceNumber nearest and lesser than the requested Sequence Number is found.
* Seek in the logFile till the requested SeqNumber is found.
* Return an iterator which contains logic to return record's one by one.
Test Plan:
* Test case included to check the good code path.
* Will update with more test-cases.
* Feedback required on test-cases.
Reviewers: dhruba, emayanke
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D7119
Summary:
A compaction is picked based on its score. It is useful to
print the compaction score in the LOG because it aids in
debugging. If one looks at the logs, one can find out why
a compaction was preferred over another.
Test Plan: make clean check
Differential Revision: https://reviews.facebook.net/D7137
Summary:
Create a directory "archive" in the DB directory.
During DeleteObsolteFiles move the WAL files (*.log) to the Archive directory,
instead of deleting.
Test Plan: Created a DB using DB_Bench. Reopened it. Checked if files move.
Reviewers: dhruba
Reviewed By: dhruba
Differential Revision: https://reviews.facebook.net/D6975
Summary:
Scripted and removed all trailing spaces and converted all tabs to
spaces.
Also fixed other lint errors.
All lint errors from this point of time should be taken seriously.
Test Plan: make all check
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D7059
Summary:
LevelDB should delete almost-new keys when a long-open snapshot exists.
The previous behavior is to keep all versions that were created after the
oldest open snapshot. This can lead to database size bloat for
high-update workloads when there are long-open snapshots and long-open
snapshot will be used for logical backup. By "almost new" I mean that the
key was updated more than once after the oldest snapshot.
If there were two snapshots with seq numbers s1 and s2 (s1 < s2), and if
we find two instances of the same key k1 that lie entirely within s1 and
s2 (i.e. s1 < k1 < s2), then the earlier version
of k1 can be safely deleted because that version is not visible in any snapshot.
Test Plan:
unit test attached
make clean check
Differential Revision: https://reviews.facebook.net/D6999
Summary:
Print out status at the end of a compaction run. This helps in
debugging.
Test Plan: make clean check
Reviewers: sheki
Reviewed By: sheki
Differential Revision: https://reviews.facebook.net/D7035
Summary:
When we expand the range of keys for a level 0 compaction, we
need to invoke ParentFilesInCompaction() only once for the
entire range of keys that is being compacted. We were invoking
it for each file that was being compacted, but this triggers
an assertion because each file's range were contiguous but
non-overlapping.
I renamed ParentFilesInCompaction to ParentRangeInCompaction
to adequately represent that it is the range-of-keys and
not individual files that we compact in a single compaction run.
Here is the assertion that is fixed by this patch.
db_test: db/version_set.cc:585: void leveldb::Version::ExtendOverlappingInputs(int, const leveldb::Slice&, const leveldb::Slice&, std::vector<leveldb::FileMetaData*, std::allocator<leveldb::FileMetaData*> >*, int): Assertion `user_cmp->Compare(flimit, user_begin) >= 0' failed.
Test Plan: make clean check OPT=-g
Reviewers: sheki
Reviewed By: sheki
CC: MarkCallaghan, emayanke, leveldb
Differential Revision: https://reviews.facebook.net/D6963
Summary:
On fast filesystems (e.g. /dev/shm and ext4), the flushing
of memstore to disk was fast and quick, and the background compaction
thread was not getting scheduled fast enough to delete obsolete
files before the db was closed. This caused the repair method
to pick up those files that were not part of the db and the unit
test was failing.
The fix is to enhance the unti test to run a compaction before
closing the database so that all files that are not part of the
database are truly deleted from the filesystem.
Test Plan: make c_test; ./c_test
Reviewers: chip, emayanke, sheki
Reviewed By: chip
CC: leveldb
Differential Revision: https://reviews.facebook.net/D6915
Summary:
The compaction process takes some files from LevelK and
merges it into LevelK+1. The number of files it picks from
LevelK was capped such a way that the total amount of
data picked does not exceed the maxfilesize of that level.
This essentially meant that only one file from LevelK
is picked for a single compaction.
For bulkloads, we would like to take many many file from
LevelK and compact them using a single compaction run.
This patch introduces a option called the 'source_compaction_factor'
(similar to expanded_compaction_factor). It is a multiplier
that is multiplied by the maxfilesize of that level to arrive
at the limit that is used to throttle the number of source
files from LevelK. For bulk loads, set source_compaction_factor
to a very high number so that multiple files from the same
level are picked for compaction in a single compaction.
The default value of source_compaction_factor is 1, so that
we can keep backward compatibilty with existing compaction semantics.
Test Plan: make clean check
Reviewers: emayanke, sheki
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D6867
Summary:
This option is needed for fast bulk uploads. The goal is to load
all the data into files in L0 without any interference from
background compactions.
Test Plan: make clean check
Reviewers: sheki
Reviewed By: sheki
CC: leveldb
Differential Revision: https://reviews.facebook.net/D6849
Summary:
The method Finalize() recomputes the compaction score of each
level and then sorts these score from largest to smallest. The
idea is that the level with the largest compaction score will
be a better candidate for compaction. There are usually very
few levels, and a bubble sort code was used to sort these
compaction scores. There existed a bug in the sorting code that
skipped looking at the score for the n-1 level. This meant that
even if the compaction score of the n-1 level is large, it will
not be picked for compaction.
This patch fixes the bug and also introduces "asserts" in the
code to detect any possible inconsistencies caused by future bugs.
This bug existed in the very first code change that introduced
multi-threaded compaction to the leveldb code. That version of
code was committed on Oct 19th via
1ca0584345
Test Plan: make clean check OPT=-g
Reviewers: emayanke, sheki, MarkCallaghan
Reviewed By: sheki
CC: leveldb
Differential Revision: https://reviews.facebook.net/D6837
Summary:
The manifest file contains a series of edits. If the verbose
option is switched on, then print each individual edit in the
manifest file. This helps in debugging.
Test Plan: make clean manifest_dump
Reviewers: emayanke, sheki
Reviewed By: sheki
CC: leveldb
Differential Revision: https://reviews.facebook.net/D6807
Summary: The new function MinLevelToCompress in db_test.cc was incomplete. It needs to tell the calling function-TEST whether the test has to be skipped or not
Test Plan: make all;./db_test
Reviewers: dhruba, heyongqiang
Reviewed By: dhruba
CC: sheki
Differential Revision: https://reviews.facebook.net/D6771
Summary:
The manifest file contains a series of edits. If the verbose
option is switched on, then print each individual edit in the
manifest file. This helps in debugging.
Test Plan: make clean manifest_dump
Reviewers: emayanke, sheki
Reviewed By: sheki
CC: leveldb
Differential Revision: https://reviews.facebook.net/D6807
Summary:
dbstress has an option to reopen the database. Make it such that the
previous handle is not closed before we reopen, this simulates a
situation similar to a process crash.
Added new api to DMImpl to remove the lock file.
Test Plan: run db_stress
Reviewers: emayanke
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D6777
Summary: The new function MinLevelToCompress in db_test.cc was incomplete. It needs to tell the calling function-TEST whether the test has to be skipped or not
Test Plan: make all;./db_test
Reviewers: dhruba, heyongqiang
Reviewed By: dhruba
CC: sheki
Differential Revision: https://reviews.facebook.net/D6771
Summary:
The value specified in max_grandparent_overlap_factor is used to
limit the file size in a compaction run. This patch makes it
configurable when using db_bench.
Test Plan: make clean db_bench
Reviewers: MarkCallaghan, heyongqiang
Reviewed By: heyongqiang
CC: leveldb
Differential Revision: https://reviews.facebook.net/D6729
Summary:
There are applications that operate on multiple leveldb instances.
These applications will like to pass in an opaque type for each
leveldb instance and this type should be passed back to the application
with every invocation of the CompactionFilter api.
Test Plan: Enehanced unit test for opaque parameter to CompactionFilter.
Reviewers: heyongqiang
Reviewed By: heyongqiang
CC: MarkCallaghan, sheki, emayanke
Differential Revision: https://reviews.facebook.net/D6711
Summary:
Compilation used to fail with the error:
db/version_set.cc:1773: error: ‘number_of_files_to_sort_’ is not a member of ‘leveldb::VersionSet’
I created a new method called CheckConsistencyForDeletes() so that
all the high cost checking is done only when OPT=-g is specified.
I also fixed a bug in PickCompactionBySize that was triggered when
OPT=-g was switched on. The base_index in the compaction record
was not set correctly.
Test Plan: make check OPT=-g
Differential Revision: https://reviews.facebook.net/D6687
Summary:
The db_bench utility was broken in 1.5.4.fb because of a
signed-unsigned comparision.
The static variable FLAGS_min_level_to_compress was recently
changed from int to 'unsigned in' but it is initilized to a
nagative value -1.
The segfault is of this type:
Program received signal SIGSEGV, Segmentation fault.
Open (this=0x7fffffffdee0) at db/db_bench.cc:939
939 db/db_bench.cc: No such file or directory.
(gdb) where
Test Plan: run db_bench with no options.
Reviewers: heyongqiang
Reviewed By: heyongqiang
CC: MarkCallaghan, emayanke, sheki
Differential Revision: https://reviews.facebook.net/D6663
Summary:
make clean check OPT=-g fails
leveldb::DBStatistics::getTickerCount(leveldb::Tickers)’:
./db/db_statistics.h:34: error: ‘MAX_NO_TICKERS’ was not declared in this scope
util/ldb_cmd.cc:255: warning: left shift count >= width of type
Test Plan:
make clean check OPT=-g
Reviewers:
CC:
Task ID: #
Blame Rev:
Summary: Record BloomFliter hits and drop off reasons during compaction.
Test Plan: Unit tests work.
Reviewers: dhruba, heyongqiang
Reviewed By: dhruba
Differential Revision: https://reviews.facebook.net/D6591
Summary:
disable size compaction in ldb reduce_levels, this will avoid compactions rather than the manual comapction,
added --compression=none|snappy|zlib|bzip2 and --file_size= per-file size to ldb reduce_levels command
Test Plan: run ldb
Reviewers: dhruba, MarkCallaghan
Reviewed By: dhruba
CC: sheki, emayanke
Differential Revision: https://reviews.facebook.net/D6597
Summary:
Prototype stat's collection. Diff is a good estimate of what
the final code will look like.
A few assumptions :
* Used a global static instance of the statistics object. Plan to pass
it to each internal function. Static allows metrics only at app
level.
* In the Ticker's do not do any locking. Depend on the mutex at each
function of LevelDB. If we ever remove the mutex, we should change
here too. The other option is use atomic objects anyways as there
won't be any contention as they will be always acquired only by one
thread.
* The counters are dumb, increment through lifecycle. Plan to use ods
etc to get last5min stat etc.
Test Plan:
made changes in db_bench
Ran ./db_bench --statistics=1 --num=10000 --cache_size=5000
This will print the cache hit/miss stats.
Reviewers: dhruba, heyongqiang
Differential Revision: https://reviews.facebook.net/D6441
Summary:
When a new version is created, we sort all the files at every
level based on their size. This is necessary because we want
to compact the largest file first. The sorting takes quite a
bit of CPU.
Moved the sorting code to be outside the mutex. Also, the
earlier code was sorting files at all levels but we do not
need to sort the highest-number level because those files
are never the cause of any compaction. To reduce sorting
costs, we sort only the first few files in each level
because it is likely that those are the only files in that
level that will be picked for compaction.
At steady state, I have seen that this patch increase
throughout from 1500 writes/sec to 1700 writes/sec at the
end of a 72 hour run. The cpu saving by not sorting the
last level was not distinctive in this test run because
there were only 100K files in the highest numbered level.
I expect the cpu saving to be significant when the number of
files is much higher.
This is mostly an early preview and not ready for rigorous review.
With this patch, the writs/sec is now bottlenecked not by the sorting code but by GetOverlappingInputs. I am working on a patch to optimize GetOverlappingInputs.
Test Plan: make check
Reviewers: MarkCallaghan, heyongqiang
Reviewed By: heyongqiang
Differential Revision: https://reviews.facebook.net/D6411