Commit Graph

6726 Commits

Author SHA1 Message Date
Dhruba Borthakur
f50ece60c7 Fix table-cache size bug, gather table-cache statistics and prevent readahead done by fs. Summary:
Summary:
The db_bench test was not using the specified value for the max-file-open. Fixed.

The fs readhead is switched off.

Gather statistics about the table cache and print it out at the end of the tets run.

Test Plan: Revert Plan:

Reviewers: adsharma, sc

Reviewed By: adsharma

Differential Revision: https://reviews.facebook.net/D3441
2012-05-30 16:42:45 -07:00
Dhruba Borthakur
8f293b68a9 Support --bufferedio=[0,1] from db_bench. If bufferedio = 0, then the read code path clears the OS page cache after the IO is completed. The default remains as bufferedio=1
Summary:
Task ID: #

Blame Rev:

Test Plan: Revert Plan:

Differential Revision: https://reviews.facebook.net/D3429
2012-05-29 13:29:44 -07:00
Dhruba Borthakur
33a3c6ff6c Ability to make the benchmark issue a large number of IOs. This is helpful to populate many gigabytes of data for benchmarking at scale.
Summary:
Task ID: #

Blame Rev:

Test Plan: Revert Plan:

Differential Revision: https://reviews.facebook.net/D3333
2012-05-22 12:20:09 -07:00
Dhruba Borthakur
3b86a51cb1 Ability to switch on checksum verification from benchmark.
Summary:
Task ID: #

Blame Rev:

Test Plan: Revert Plan:

Differential Revision: https://reviews.facebook.net/D3309
2012-05-19 00:13:50 -07:00
Dhruba Borthakur
a2a0e358cb Add support to specify the number of shards for the Block cache. By default, the block cache is sharded into 16 parts.
Summary:
Task ID: #

Blame Rev:

Test Plan: Revert Plan:

Differential Revision: https://reviews.facebook.net/D3273
2012-05-16 17:23:49 -07:00
Arun Sharma
95af128225 SSE4 optimization
Summary:
This speeds up CRC computation significantly on
hardware that supports it. Enabled via -msse4.

Note: the binary won't be usable on older CPUs
that don't support the instruction.

Test Plan: crc32c_test

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D3201
2012-05-15 10:10:01 -07:00
Dhruba Borthakur
8d41351666 Ability to switch to gcc 4.1.6 and also use jemalloc. Thanks for Chip for large amounts of help.
Summary:
Task ID: #

Blame Rev:

Test Plan: Revert Plan:

Reviewers: chip

Reviewed By: chip

CC: sc, adsharma

Differential Revision: https://reviews.facebook.net/D3225
2012-05-14 22:15:09 -07:00
Arun Sharma
921a48428e Optimize for lp64
Summary:
Some code reorganization in-preparation for replacing with a hardware
instruction.

* Use u64 for some of the key types
* Use an ALIGN macro so code is easier to read

Test Plan: crc32c_test

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D3135
2012-05-14 15:40:11 -07:00
Dhruba Borthakur
37d0dcb9b1 Use the elapsed time (instead of the per-thread time) to compute ops/sec.
Summary:
Task ID: #

Blame Rev:

Test Plan: Revert Plan:

Differential Revision: https://reviews.facebook.net/D3147
2012-05-11 12:43:31 -07:00
Arun Sharma
90b2924fb2 skiplist: optimize for sequential insert pattern
Summary:
skiplist doesn't cache the location of the last insert and becomes
CPU bound when the input data has sequential keys.

Notes on thread safety: ::Insert() already requires external
synchronization. So this change is not making it any worse.

Test Plan: skiplist_test

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D3129
2012-05-11 09:57:40 -07:00
Dhruba Borthakur
cc6c32535a Support arcdiff.
Summary:
Task ID: #

Blame Rev:

Test Plan: Revert Plan:

Differential Revision: https://reviews.facebook.net/D3105
2012-05-09 23:35:05 -07:00
Sanjay Ghemawat
85584d497e Added bloom filter support.
In particular, we add a new FilterPolicy class.  An instance
of this class can be supplied in Options when opening a
database.  If supplied, the instance is used to generate
summaries of keys (e.g., a bloom filter) which are placed in
sstables.  These summaries are consulted by DB::Get() so we
can avoid reading sstable blocks that are guaranteed to not
contain the key we are looking for.

This change provides one implementation of FilterPolicy
based on bloom filters.

Other changes:
- Updated version number to 1.4.
- Some build tweaks.
- C binding for CompactRange.
- A few more benchmarks: deleteseq, deleterandom, readmissing, seekrandom.
- Minor .gitignore update.
2012-04-17 08:36:46 -07:00
Sanjay Ghemawat
bc1ee4d25e build shared libraries; updated version to 1.3; add Status accessors 2012-03-30 13:15:49 -07:00
Sanjay Ghemawat
a1ad4d1995 Build fixes and cleanups:
(1) Separate out C++ and CC flags (fixes c_test compilation)
(2) Move snappy/perftools detection to script
(3) Fix db_bench_sqlite3 and db_bench_tree_db build rules
2012-03-21 10:28:03 -07:00
Sanjay Ghemawat
9013f13b15 use mmap on 64-bit machines to speed-up reads; small build fixes 2012-03-15 09:14:00 -07:00
Sanjay Ghemawat
583f1499c0 fix LOCK file deletion to prevent crash on windows 2012-03-09 07:51:04 -08:00
Sanjay Ghemawat
d79762e273 added group commit; drastically speeds up mult-threaded synchronous write workloads 2012-03-08 16:23:21 -08:00
Sanjay Ghemawat
015d26f8be add .gitignore; support for building on a few BSD variants 2012-03-05 10:35:46 -08:00
Sanjay Ghemawat
239ac9d2de avoid very large compactions; fix build on Linux 2012-02-02 09:34:14 -08:00
Sanjay Ghemawat
3c8be108bf fixed issues 66 (leaking files on disk error) and 68 (no sync of CURRENT file) 2012-01-25 14:56:52 -08:00
Hans Wennborg
c8c5866a86 Makefile fixes for systems with $CXX other than g++.
- Makefile: Use $(CXX) for compiling C++ files,
  don't override the environment's value of $CXX

- build_detect_platform: use $CXX instead of g++.

Based on bug report from Theo Schlossnagle:
http://code.google.com/p/leveldb/issues/detail?id=46

(Sync with uptream at 25807040.)
2011-11-30 10:59:40 +00:00
Hans Wennborg
42fb47f6ed Pass system's CFLAGS, remove exit time destructor, sstable bug fix.
- Pass system's values of CFLAGS,LDFLAGS.
  Don't override OPT if it's already set.
  Original patch by Alessio Treglia <alessio@debian.org>:
  http://code.google.com/p/leveldb/issues/detail?id=27#c6

- Remove 1 exit time destructor from leveldb.
  See http://crbug.com/101600

- Fix problem where sstable building code would pass an
  internal key to the user comparator.

(Sync with uptream at 25436817.)
2011-11-14 17:06:16 +00:00
Hans Wennborg
36a5f8ed7f A number of fixes:
- Replace raw slice comparison with a call to user comparator.
  Added test for custom comparators.

- Fix end of namespace comments.

- Fixed bug in picking inputs for a level-0 compaction.

  When finding overlapping files, the covered range may expand
  as files are added to the input set.  We now correctly expand
  the range when this happens instead of continuing to use the
  old range.  For example, suppose L0 contains files with the
  following ranges:

      F1: a .. d
      F2:    c .. g
      F3:       f .. j

  and the initial compaction target is F3.  We used to search
  for range f..j which yielded {F2,F3}.  However we now expand
  the range as soon as another file is added.  In this case,
  when F2 is added, we expand the range to c..j and restart the
  search.  That picks up file F1 as well.

  This change fixes a bug related to deleted keys showing up
  incorrectly after a compaction as described in Issue 44.

(Sync with upstream @25072954)
2011-10-31 17:22:06 +00:00
Gabor Cselle
299ccedfec A number of bugfixes:
- Added DB::CompactRange() method.

  Changed manual compaction code so it breaks up compactions of
  big ranges into smaller compactions.

  Changed the code that pushes the output of memtable compactions
  to higher levels to obey the grandparent constraint: i.e., we
  must never have a single file in level L that overlaps too
  much data in level L+1 (to avoid very expensive L-1 compactions).

  Added code to pretty-print internal keys.

- Fixed bug where we would not detect overlap with files in
  level-0 because we were incorrectly using binary search
  on an array of files with overlapping ranges.

  Added "leveldb.sstables" property that can be used to dump
  all of the sstables and ranges that make up the db state.

- Removing post_write_snapshot support.  Email to leveldb mailing
  list brought up no users, just confusion from one person about
  what it meant.

- Fixing static_cast char to unsigned on BIG_ENDIAN platforms.

  Fixes	Issue 35 and Issue 36.

- Comment clarification to address leveldb Issue 37.

- Change license in posix_logger.h to match other files.

- A build problem where uint32 was used instead of uint32_t.

Sync with upstream @24408625
2011-10-05 16:30:28 -07:00
Hans Wennborg
26db4d971a Sync with upstream @24213649.
Adding GNU/kFreeBSD support. As requested here:
http://code.google.com/p/leveldb/issues/detail?id=38

Use uint64_t instead of size_t in MemEnvTest. As pointed out at
http://code.google.com/p/leveldb/issues/detail?id=41
2011-09-26 17:37:09 +01:00
Hans Wennborg
213a68eb68 Sync with upstream @23860137.
Fix GCC -Wshadow warnings in LevelDB's public header files,
reported by Dustin.

Add in-memory Env implementation (helpers/memenv/*).
This enables users to create LevelDB databases in-memory.

Initialize ShardedLRUCache::last_id_ to zero.
This fixes a Valgrind warning.

(Also delete port/sha1_* which were removed upstream some time ago.)
2011-09-12 10:21:10 +01:00
gabor@google.com
7263023651 Bugfixes: for Get(), don't hold mutex while writing log.
- Fix bug in Get: when it triggers a compaction, it could sometimes
  mark the compaction with the wrong level (if there was a gap
  in the set of levels examined for the Get).

- Do not hold mutex while writing to the log file or to the
  MANIFEST file.

  Added a new benchmark that runs a writer thread concurrently with
  reader threads.

  Percentiles
  ------------------------------
  micros/op: avg  median 99   99.9  99.99  99.999 max
  ------------------------------------------------------
  before:    42   38     110  225   32000  42000  48000
  after:     24   20     55   65    130    1100   7000

- Fixed race in optimized Get.  It should have been using the
  pinned memtables, not the current memtables.



git-svn-id: https://leveldb.googlecode.com/svn/trunk@50 62dab493-f737-651d-591e-8d6aee1b9529
2011-09-01 19:08:02 +00:00
gabor@google.com
e3584f9c28 Bugfix for issue 33; reduce lock contention in Get(), parallel benchmarks.
- Fix for issue 33 (non-null-terminated result from
  leveldb_property_value())

- Support for running multiple instances of a benchmark in parallel.

- Reduce lock contention on Get():
  (1) Do not hold the lock while searching memtables.
  (2) Shard block and table caches 16-ways.

  Benchmark for evaluating this change:
  $ db_bench --benchmarks=fillseq1,readrandom --threads=$n
  (fillseq1 is a small hack to make sure fillseq runs once regardless
  of number of threads specified on the command line).



git-svn-id: https://leveldb.googlecode.com/svn/trunk@49 62dab493-f737-651d-591e-8d6aee1b9529
2011-08-22 21:08:51 +00:00
gabor@google.com
ab323f7e1e Bugfixes for iterator and documentation.
- Fix bug in Iterator::Prev where it would return the wrong key.
  Fixes issues 29 and 30.

- Added a tweak to testharness to allow running just some tests.

- Fixing two minor documentation errors based on issues 28 and 25.

- Cleanup; fix namespaces of export-to-C code.
  Also fix one "const char*" vs "char*" mismatch.



git-svn-id: https://leveldb.googlecode.com/svn/trunk@48 62dab493-f737-651d-591e-8d6aee1b9529
2011-08-16 01:21:01 +00:00
dgrogan@chromium.org
a05525d13b @23023120
git-svn-id: https://leveldb.googlecode.com/svn/trunk@47 62dab493-f737-651d-591e-8d6aee1b9529
2011-08-06 00:19:37 +00:00
gabor@google.com
021ee9c32b C binding for leveldb, better readseq benchmark for SQLite.
- Added a C binding for LevelDB.
  May be useful as a stable ABI that can be used by 
  programs that keep leveldb in a shared library, 
  or for JNI API.

- Replaced SQLite's readseq benchmark to a more efficient version. 
  SQLite readseq speeds increased by about a factor of 2x 
  from the previous version. Also updated benchmark page to
  reflect readseq speed up.



git-svn-id: https://leveldb.googlecode.com/svn/trunk@46 62dab493-f737-651d-591e-8d6aee1b9529
2011-08-05 20:40:49 +00:00
gabor@google.com
1bfbe76b4e Improved benchmark, fixed bugs and SQLite parameters.
- Based on suggestions on the sqlite-users mailing list,
  we removed the superfluous index on the primary key 
  for SQLite's benchmarks, and turned write-ahead logging 
  ("WAL") on. This led to performance improvements for SQLite.

- Based on a suggestion by Florian Weimer on the leveldb
  mailing list, we disabled hard drive write-caching via
  hdparm when testing synchronous writes. This led to
  performance losses for LevelDB and Kyoto TreeDB.

- Fixed a mistake in 2.A.->Random where the bar sizes
  were switched for Kyoto TreeDB and SQLite.



git-svn-id: https://leveldb.googlecode.com/svn/trunk@45 62dab493-f737-651d-591e-8d6aee1b9529
2011-07-29 21:35:05 +00:00
gabor@google.com
b9ef9141ba Minor typos in benchmark page.
git-svn-id: https://leveldb.googlecode.com/svn/trunk@44 62dab493-f737-651d-591e-8d6aee1b9529
2011-07-27 14:29:59 +00:00
gabor@google.com
e8dee348b6 Minor edit in benchmark page.
(Baseline comparison does not make sense for large values.)


git-svn-id: https://leveldb.googlecode.com/svn/trunk@43 62dab493-f737-651d-591e-8d6aee1b9529
2011-07-27 04:39:46 +00:00
gabor@google.com
3cc27381f7 Setting SVN mime-type for benchmark page.
git-svn-id: https://leveldb.googlecode.com/svn/trunk@42 62dab493-f737-651d-591e-8d6aee1b9529
2011-07-27 01:56:52 +00:00
gabor@google.com
e301f17c2f Adding doctype to benchmark page so Google Code displays it as HTML.
git-svn-id: https://leveldb.googlecode.com/svn/trunk@41 62dab493-f737-651d-591e-8d6aee1b9529
2011-07-27 01:49:08 +00:00
gabor@google.com
f122c6dfbb Adding FreeBSD support, removing Chromium files, adding benchmark.
- LevelDB patch for FreeBSD. This resolves Issue 22.
  Contributed by dforsythe (thanks!).

- Removing Chromium-specific files.
  They are now going to live in the Chromium repository.

- Adding a benchmark page comparing LevelDB performance
  to SQLite and Kyoto Cabinet's TreeDB, along with
  code to generate the benchmarks.
  Thanks to Kevin Tseng for compiling the benchmarks,
  and Scott Hess and Mikio Hirabayashi for their
  help and advice.



git-svn-id: https://leveldb.googlecode.com/svn/trunk@40 62dab493-f737-651d-591e-8d6aee1b9529
2011-07-27 01:46:25 +00:00
gabor@google.com
60bd8015f2 Speed up Snappy uncompression, new Logger interface.
- Removed one copy of an uncompressed block contents changing
  the signature of Snappy_Uncompress() so it uncompresses into a
  flat array instead of a std::string.
        
  Speeds up readrandom ~10%.

- Instead of a combination of Env/WritableFile, we now have a
  Logger interface that can be easily overridden applications
  that want to supply their own logging.

- Separated out the gcc and Sun Studio parts of atomic_pointer.h
  so we can use 'asm', 'volatile' keywords for Sun Studio.




git-svn-id: https://leveldb.googlecode.com/svn/trunk@39 62dab493-f737-651d-591e-8d6aee1b9529
2011-07-21 02:40:18 +00:00
gabor@google.com
6872ace901 Sun Studio support, and fix for test related memory fixes.
- LevelDB patch for Sun Studio
  Based on a patch submitted by Theo Schlossnagle - thanks!
  This fixes Issue 17.

- Fix a couple of test related memory leaks.



git-svn-id: https://leveldb.googlecode.com/svn/trunk@38 62dab493-f737-651d-591e-8d6aee1b9529
2011-07-19 23:36:47 +00:00
gabor@google.com
6699c7ebe6 Small tweaks and bugfixes for Issue 18 and 19.
Slight tweak to the no-overlap optimization: only push to
level 2 to reduce the amount of wasted space when the same
small key range is being repeatedly overwritten.

Fix for Issue 18: Avoid failure on Windows by avoiding
deletion of lock file until the end of DestroyDB().

Fix for Issue 19: Disregard sequence numbers when checking for 
overlap in sstable ranges. This fixes issue 19: when writing 
the same key over and over again, we would generate a sequence 
of sstables that were never merged together since their sequence
numbers were disjoint.

Don't ignore map/unmap error checks.

Miscellaneous fixes for small problems Sanjay found while diagnosing
issue/9 and issue/16 (corruption_testr failures).
- log::Reader reports the record type when it finds an unexpected type.
- log::Reader no longer reports an error when it encounters an expected
  zero record regardless of the setting of the "checksum" flag.
- Added a missing forward declaration.
- Documented a side-effects of larger write buffer sizes
  (longer recovery time).



git-svn-id: https://leveldb.googlecode.com/svn/trunk@37 62dab493-f737-651d-591e-8d6aee1b9529
2011-07-15 00:20:57 +00:00
gabor@google.com
ed154f6dc4 Fixed a snappy compression wrapper bug (passing wrong variable).
Change atomic_pointer.h to prefer a memory barrier based
implementation over a <cstdatomic> based implementation for
the following reasons:
(1) On a x86-32-bit gcc-4.4 build, <ctdatomic> was corrupting
    the AtomicPointer.
(2) On a x86-64-bit gcc build, a <ctstdatomic> based acquire-load
    takes ~15ns as opposed to the ~1ns for a memory-barrier
    based implementation.

Fixes issue 9 (corruption_test fails)
http://code.google.com/p/leveldb/issues/detail?id=9

Fixes issue 16 (CorruptionTest.MissingDescriptor fails)
http://code.google.com/p/leveldb/issues/detail?id=16



git-svn-id: https://leveldb.googlecode.com/svn/trunk@36 62dab493-f737-651d-591e-8d6aee1b9529
2011-06-30 23:17:03 +00:00
gabor@google.com
85f0ab1975 Fixing Makefile issue reported in Issue 15 (misspelled flag)
git-svn-id: https://leveldb.googlecode.com/svn/trunk@35 62dab493-f737-651d-591e-8d6aee1b9529
2011-06-29 22:53:17 +00:00
gabor@google.com
f57e23351f Platform detection during build, plus compatibility patches for machines without <cstdatomic>.
This revision adds two major changes:
1. build_detect_platform which generates build_config.mk
   with platform-dependent flags for the build process
2. /port/atomic_pointer.h with anAtomicPointerimplementation
   for platforms without <cstdatomic>

Some of this code is loosely based on patches submitted to the 
LevelDB mailing list at https://groups.google.com/forum/#!forum/leveldb
Tip of the hat to Dave Smith and Edouard A, who both sent patches.

The presence of Snappy (http://code.google.com/p/snappy/) and
cstdatomic are now both detected in the build_detect_platform
script (1.) which gets executing during make.

For (2.), instead of broadly importing atomicops_* from Chromium or
the Google performance tools, we chose to just implement AtomicPointer 
and the limited atomic load and store operations it needs. 
This resulted in much less code and fewer files - everything is 
contained in atomic_pointer.h.



git-svn-id: https://leveldb.googlecode.com/svn/trunk@34 62dab493-f737-651d-591e-8d6aee1b9529
2011-06-29 00:30:50 +00:00
gabor@google.com
e0cbd242cb Fixing issue 11: version_set_test.cc was missing
git-svn-id: https://leveldb.googlecode.com/svn/trunk@33 62dab493-f737-651d-591e-8d6aee1b9529
2011-06-22 18:45:39 +00:00
gabor@google.com
ccf0fcd5c2 A number of smaller fixes and performance improvements:
- Implemented Get() directly instead of building on top of a full
  merging iterator stack.  This speeds up the "readrandom" benchmark
  by up to 15-30%.

- Fixed an opensource compilation problem.
  Added --db=<name> flag to control where the database is placed.

- Automatically compact a file when we have done enough
  overlapping seeks to that file.

- Fixed a performance bug where we would read from at least one
  file in a level even if none of the files overlapped the key
  being read.

- Makefile fix for Mac OSX installations that have XCode 4 without XCode 3.

- Unified the two occurrences of binary search in a file-list
  into one routine.

- Found and fixed a bug where we would unnecessarily search the
  last file when looking for a key larger than all data in the
  level.

- A fix to avoid the need for trivial move compactions and
  therefore gets rid of two out of five syncs in "fillseq".

- Removed the MANIFEST file write when switching to a new
  memtable/log-file for a 10-20% improvement on fill speed on ext4.

- Adding a SNAPPY setting in the Makefile for folks who have
  Snappy installed. Snappy compresses values and speeds up writes.



git-svn-id: https://leveldb.googlecode.com/svn/trunk@32 62dab493-f737-651d-591e-8d6aee1b9529
2011-06-22 02:36:45 +00:00
hans@chromium.org
80e5b0d944 sync with upstream @21706995
Fixed race condition reported by Dave Smit (dizzyd@dizzyd,com)
on the leveldb mailing list.  We were not signalling
waiters after a trivial move from level-0.  The result was
that in some cases (hard to reproduce), a write would get
stuck forever waiting for the number of level-0 files to drop
below its hard limit.

The new code is simpler: there is just one condition variable
instead of two, and the condition variable is signalled after
every piece of background work finishes.  Also, all compaction
work (including for manual compactions) is done in the
background thread, and therefore we can remove the
"compacting_" variable.



git-svn-id: https://leveldb.googlecode.com/svn/trunk@31 62dab493-f737-651d-591e-8d6aee1b9529
2011-06-07 14:40:26 +00:00
dgrogan@chromium.org
c4f5514948 sync with upstream @21627589
Minor changes:
* Reformat the bodies of the iterator interface routines in IteratorWrapper to
  make them a bit easier to read
* Switched the default in the leveldb makefile to be optimized mode, rather
  than debug mode
* Fix build problem in chromium port

git-svn-id: https://leveldb.googlecode.com/svn/trunk@30 62dab493-f737-651d-591e-8d6aee1b9529
2011-06-02 00:00:37 +00:00
dgrogan@chromium.org
740d8b3d00 Update from upstream @21551990
* Patch LevelDB to build for OSX and iOS
* Fix race condition in memtable iterator deletion.
* Other small fixes.

git-svn-id: https://leveldb.googlecode.com/svn/trunk@29 62dab493-f737-651d-591e-8d6aee1b9529
2011-05-28 00:53:58 +00:00
dgrogan@chromium.org
da79909507 sync with upstream @ 21409451
Check the NEWS file for details of what changed.

git-svn-id: https://leveldb.googlecode.com/svn/trunk@28 62dab493-f737-651d-591e-8d6aee1b9529
2011-05-21 02:17:43 +00:00
dgrogan@chromium.org
3c111335a7 make windows include /Iport\win in dependent projects
git-svn-id: https://leveldb.googlecode.com/svn/trunk@27 62dab493-f737-651d-591e-8d6aee1b9529
2011-05-03 03:10:59 +00:00