Summary:
skiplist doesn't cache the location of the last insert and becomes
CPU bound when the input data has sequential keys.
Notes on thread safety: ::Insert() already requires external
synchronization. So this change is not making it any worse.
Test Plan: skiplist_test
Reviewers: dhruba
Reviewed By: dhruba
Differential Revision: https://reviews.facebook.net/D3129
In particular, we add a new FilterPolicy class. An instance
of this class can be supplied in Options when opening a
database. If supplied, the instance is used to generate
summaries of keys (e.g., a bloom filter) which are placed in
sstables. These summaries are consulted by DB::Get() so we
can avoid reading sstable blocks that are guaranteed to not
contain the key we are looking for.
This change provides one implementation of FilterPolicy
based on bloom filters.
Other changes:
- Updated version number to 1.4.
- Some build tweaks.
- C binding for CompactRange.
- A few more benchmarks: deleteseq, deleterandom, readmissing, seekrandom.
- Minor .gitignore update.
- Pass system's values of CFLAGS,LDFLAGS.
Don't override OPT if it's already set.
Original patch by Alessio Treglia <alessio@debian.org>:
http://code.google.com/p/leveldb/issues/detail?id=27#c6
- Remove 1 exit time destructor from leveldb.
See http://crbug.com/101600
- Fix problem where sstable building code would pass an
internal key to the user comparator.
(Sync with uptream at 25436817.)
- Replace raw slice comparison with a call to user comparator.
Added test for custom comparators.
- Fix end of namespace comments.
- Fixed bug in picking inputs for a level-0 compaction.
When finding overlapping files, the covered range may expand
as files are added to the input set. We now correctly expand
the range when this happens instead of continuing to use the
old range. For example, suppose L0 contains files with the
following ranges:
F1: a .. d
F2: c .. g
F3: f .. j
and the initial compaction target is F3. We used to search
for range f..j which yielded {F2,F3}. However we now expand
the range as soon as another file is added. In this case,
when F2 is added, we expand the range to c..j and restart the
search. That picks up file F1 as well.
This change fixes a bug related to deleted keys showing up
incorrectly after a compaction as described in Issue 44.
(Sync with upstream @25072954)
- Added DB::CompactRange() method.
Changed manual compaction code so it breaks up compactions of
big ranges into smaller compactions.
Changed the code that pushes the output of memtable compactions
to higher levels to obey the grandparent constraint: i.e., we
must never have a single file in level L that overlaps too
much data in level L+1 (to avoid very expensive L-1 compactions).
Added code to pretty-print internal keys.
- Fixed bug where we would not detect overlap with files in
level-0 because we were incorrectly using binary search
on an array of files with overlapping ranges.
Added "leveldb.sstables" property that can be used to dump
all of the sstables and ranges that make up the db state.
- Removing post_write_snapshot support. Email to leveldb mailing
list brought up no users, just confusion from one person about
what it meant.
- Fixing static_cast char to unsigned on BIG_ENDIAN platforms.
Fixes Issue 35 and Issue 36.
- Comment clarification to address leveldb Issue 37.
- Change license in posix_logger.h to match other files.
- A build problem where uint32 was used instead of uint32_t.
Sync with upstream @24408625
- Fix bug in Get: when it triggers a compaction, it could sometimes
mark the compaction with the wrong level (if there was a gap
in the set of levels examined for the Get).
- Do not hold mutex while writing to the log file or to the
MANIFEST file.
Added a new benchmark that runs a writer thread concurrently with
reader threads.
Percentiles
------------------------------
micros/op: avg median 99 99.9 99.99 99.999 max
------------------------------------------------------
before: 42 38 110 225 32000 42000 48000
after: 24 20 55 65 130 1100 7000
- Fixed race in optimized Get. It should have been using the
pinned memtables, not the current memtables.
git-svn-id: https://leveldb.googlecode.com/svn/trunk@50 62dab493-f737-651d-591e-8d6aee1b9529
- Fix for issue 33 (non-null-terminated result from
leveldb_property_value())
- Support for running multiple instances of a benchmark in parallel.
- Reduce lock contention on Get():
(1) Do not hold the lock while searching memtables.
(2) Shard block and table caches 16-ways.
Benchmark for evaluating this change:
$ db_bench --benchmarks=fillseq1,readrandom --threads=$n
(fillseq1 is a small hack to make sure fillseq runs once regardless
of number of threads specified on the command line).
git-svn-id: https://leveldb.googlecode.com/svn/trunk@49 62dab493-f737-651d-591e-8d6aee1b9529
- Fix bug in Iterator::Prev where it would return the wrong key.
Fixes issues 29 and 30.
- Added a tweak to testharness to allow running just some tests.
- Fixing two minor documentation errors based on issues 28 and 25.
- Cleanup; fix namespaces of export-to-C code.
Also fix one "const char*" vs "char*" mismatch.
git-svn-id: https://leveldb.googlecode.com/svn/trunk@48 62dab493-f737-651d-591e-8d6aee1b9529
- Added a C binding for LevelDB.
May be useful as a stable ABI that can be used by
programs that keep leveldb in a shared library,
or for JNI API.
- Replaced SQLite's readseq benchmark to a more efficient version.
SQLite readseq speeds increased by about a factor of 2x
from the previous version. Also updated benchmark page to
reflect readseq speed up.
git-svn-id: https://leveldb.googlecode.com/svn/trunk@46 62dab493-f737-651d-591e-8d6aee1b9529
- Removed one copy of an uncompressed block contents changing
the signature of Snappy_Uncompress() so it uncompresses into a
flat array instead of a std::string.
Speeds up readrandom ~10%.
- Instead of a combination of Env/WritableFile, we now have a
Logger interface that can be easily overridden applications
that want to supply their own logging.
- Separated out the gcc and Sun Studio parts of atomic_pointer.h
so we can use 'asm', 'volatile' keywords for Sun Studio.
git-svn-id: https://leveldb.googlecode.com/svn/trunk@39 62dab493-f737-651d-591e-8d6aee1b9529
- LevelDB patch for Sun Studio
Based on a patch submitted by Theo Schlossnagle - thanks!
This fixes Issue 17.
- Fix a couple of test related memory leaks.
git-svn-id: https://leveldb.googlecode.com/svn/trunk@38 62dab493-f737-651d-591e-8d6aee1b9529
Slight tweak to the no-overlap optimization: only push to
level 2 to reduce the amount of wasted space when the same
small key range is being repeatedly overwritten.
Fix for Issue 18: Avoid failure on Windows by avoiding
deletion of lock file until the end of DestroyDB().
Fix for Issue 19: Disregard sequence numbers when checking for
overlap in sstable ranges. This fixes issue 19: when writing
the same key over and over again, we would generate a sequence
of sstables that were never merged together since their sequence
numbers were disjoint.
Don't ignore map/unmap error checks.
Miscellaneous fixes for small problems Sanjay found while diagnosing
issue/9 and issue/16 (corruption_testr failures).
- log::Reader reports the record type when it finds an unexpected type.
- log::Reader no longer reports an error when it encounters an expected
zero record regardless of the setting of the "checksum" flag.
- Added a missing forward declaration.
- Documented a side-effects of larger write buffer sizes
(longer recovery time).
git-svn-id: https://leveldb.googlecode.com/svn/trunk@37 62dab493-f737-651d-591e-8d6aee1b9529
- Implemented Get() directly instead of building on top of a full
merging iterator stack. This speeds up the "readrandom" benchmark
by up to 15-30%.
- Fixed an opensource compilation problem.
Added --db=<name> flag to control where the database is placed.
- Automatically compact a file when we have done enough
overlapping seeks to that file.
- Fixed a performance bug where we would read from at least one
file in a level even if none of the files overlapped the key
being read.
- Makefile fix for Mac OSX installations that have XCode 4 without XCode 3.
- Unified the two occurrences of binary search in a file-list
into one routine.
- Found and fixed a bug where we would unnecessarily search the
last file when looking for a key larger than all data in the
level.
- A fix to avoid the need for trivial move compactions and
therefore gets rid of two out of five syncs in "fillseq".
- Removed the MANIFEST file write when switching to a new
memtable/log-file for a 10-20% improvement on fill speed on ext4.
- Adding a SNAPPY setting in the Makefile for folks who have
Snappy installed. Snappy compresses values and speeds up writes.
git-svn-id: https://leveldb.googlecode.com/svn/trunk@32 62dab493-f737-651d-591e-8d6aee1b9529
Fixed race condition reported by Dave Smit (dizzyd@dizzyd,com)
on the leveldb mailing list. We were not signalling
waiters after a trivial move from level-0. The result was
that in some cases (hard to reproduce), a write would get
stuck forever waiting for the number of level-0 files to drop
below its hard limit.
The new code is simpler: there is just one condition variable
instead of two, and the condition variable is signalled after
every piece of background work finishes. Also, all compaction
work (including for manual compactions) is done in the
background thread, and therefore we can remove the
"compacting_" variable.
git-svn-id: https://leveldb.googlecode.com/svn/trunk@31 62dab493-f737-651d-591e-8d6aee1b9529
* Patch LevelDB to build for OSX and iOS
* Fix race condition in memtable iterator deletion.
* Other small fixes.
git-svn-id: https://leveldb.googlecode.com/svn/trunk@29 62dab493-f737-651d-591e-8d6aee1b9529
* env_chromium.cc should not export symbols.
* Fix MSVC warnings.
* Removed large value support.
* Fix broken reference to documentation file
git-svn-id: https://leveldb.googlecode.com/svn/trunk@24 62dab493-f737-651d-591e-8d6aee1b9529