rocksdb

Author	SHA1	Message	Date
Dhruba Borthakur	1ca0584345	This is the mega-patch multi-threaded compaction published in https://reviews.facebook.net/D5997. Summary: This patch allows compaction to occur in multiple background threads concurrently. If a manual compaction is issued, the system falls back to a single-compaction-thread model. This is done to ensure correctness and simplicity of code. When the manual compaction is finished, the system resumes its concurrent-compaction mode automatically. The updates to the manifest are done via a group-commit approach. Test Plan: run db_bench	2012-10-19 14:00:53 -07:00
Dhruba Borthakur	aa73538f2a	The deletion of obsolete files should not occur very frequently. Summary: The method DeleteObsoleteFiles is very costly, especially when the number of files in a system is large. It makes a list of all live files and then scans the directory to compute the diff. By default, this method is executed after every compaction run. This patch makes it such that DeleteObsoleteFiles is never invoked twice within a configured period. Test Plan: run all unit tests Reviewers: heyongqiang, MarkCallaghan Reviewed By: MarkCallaghan Differential Revision: https://reviews.facebook.net/D6045	2012-10-16 10:26:10 -07:00
Dhruba Borthakur	f7975ac733	Implement RowLocks for assoc schema Summary: Each assoc is identified by (id1, assocType). This is the rowkey. Each row has a read/write rowlock. There is a statically allocated array of 2000 read/write locks. A rowkey is murmur-hashed to one of the read/write locks. assocPut and assocDelete acquire the rowlock in Write mode. The key-updates are done within the rowlock with an atomic nosync batch write to leveldb. Then the rowlock is released and a write-with-sync is done to sync the leveldb transaction log. Test Plan: added unit test Reviewers: heyongqiang Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D5859	2012-10-03 23:19:01 -07:00
Dhruba Borthakur	c1006d4276	A configurable option to write data using write instead of mmap. Summary: We have seen that reading data via the pread call (instead of mmap) is much faster on Linux 2.6.x kernels. This patch makes an equivalent option to switch off mmaps for the write path as well. db_bench --mmap_write=0 will use write() instead of mmap() to write data to a file. This change is backward compatible: the default option is to continue using mmap for writing to a file. Test Plan: "make check all" Differential Revision: https://reviews.facebook.net/D5781	2012-10-03 17:08:13 -07:00
Dhruba Borthakur	a58d48de79	Implement ReadWrite locks for leveldb Summary: Implement ReadWrite locks for leveldb. These will be helpful to implement a read-modify-write operation (e.g. atomic increments). Test Plan: does not modify any existing code Reviewers: heyongqiang Reviewed By: heyongqiang CC: MarkCallaghan Differential Revision: https://reviews.facebook.net/D5787	2012-10-01 22:37:39 -07:00
Dhruba Borthakur	72c45c66c6	Print the block cache size in the LOG. Summary: Print the block cache size in the LOG. Test Plan: run db_bench and look at LOG. This is helpful while I was debugging one use-case. Reviewers: heyongqiang, MarkCallaghan Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D5739	2012-09-29 21:39:19 -07:00
Dhruba Borthakur	ae36e509f8	The BackupAPI should also list the length of the manifest file. Summary: The GetLiveFiles() api lists the set of sst files and the current MANIFEST file. But the database continues to append new data to the MANIFEST file even when the application is backing it up to the backup location. This means that the database-version that is stored in the MANIFEST FILE in the backup location does not correspond to the sst files returned by GetLiveFiles. This API adds a new parameter to GetLiveFiles. This new parameter returns the current size of the MANIFEST file. Test Plan: Unit test attached. Reviewers: heyongqiang Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D5631	2012-09-25 03:13:25 -07:00
Dhruba Borthakur	9e84834eb4	Allow a configurable number of background threads. Summary: The background threads are necessary for compaction. For slower storage, it might be necessary to have more than one compaction thread per DB. This patch allows creating a configurable number of worker threads. The default remains at 1 (to maintain backward compatibility). Test Plan: run all unit tests. changes to db-bench coming in a separate patch. Reviewers: heyongqiang Reviewed By: heyongqiang CC: MarkCallaghan Differential Revision: https://reviews.facebook.net/D5559	2012-09-19 15:51:08 -07:00
heyongqiang	a8464ed820	add an option to disable seek compaction Summary: as subject. This diff should be good for benchmarking. will send another diff to make it better in the case the seek compaction is enable. In that coming diff, will not count a seek if the bloomfilter filters. Test Plan: build Reviewers: dhruba, MarkCallaghan Reviewed By: MarkCallaghan Differential Revision: https://reviews.facebook.net/D5481	2012-09-17 13:59:57 -07:00
heyongqiang	b85cdca690	add a global var leveldb::useMmapRead to enable mmap Summary: as subject. this can be used for benchmarking. If we want it for some cases, we can do more changes to make this part of the option. Test Plan: db_test Reviewers: dhruba CC: MarkCallaghan Differential Revision: https://reviews.facebook.net/D5451	2012-09-16 22:07:35 -07:00
Mark Callaghan	33323f2111	Remove use of mmap for random reads Summary: Reads via mmap on concurrent workloads are much slower than pread. For example on a 24-core server with storage that can do 100k IOPS or more I can get no more than 10k IOPS with mmap reads and 32+ threads. Test Plan: db_bench benchmarks Reviewers: dhruba, heyongqiang Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D5433	2012-09-14 16:43:50 -07:00
Dhruba Borthakur	93f4952089	Ability to switch off filesystem read-aheads Summary: Ability to switch off filesystem read-aheads. This change is backward-compatible: the default setting is to allow file system read-aheads. Test Plan: run benchmarks Reviewers: heyongqiang, adsharma Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D5391	2012-09-13 12:09:56 -07:00
Dhruba Borthakur	4028ae7d31	Do not cache readahead-pages in the OS cache. Summary: When posix_fadvise(offset, offset) is used, it frees up only those pages in that specified range. But the filesystem could have done some read-aheads and those get cached in the OS cache. Do not cache readahead-pages in the OS cache. Test Plan: run db_bench benchmark. Reviewers: vamsi, heyongqiang Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D5379	2012-09-13 10:56:02 -07:00
Dhruba Borthakur	407727b75f	Fix compiler warnings. Use uint64_t instead of uint. Summary: Fix compiler warnings. Use uint64_t instead of uint. Test Plan: build using -Wall Reviewers: heyongqiang Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D5355	2012-09-12 14:42:36 -07:00
heyongqiang	0f43aa474e	put log in a separate dir Summary: added a new option db_log_dir, which points to the log dir. Inside that dir, in order to make log names unique, the log file name is prefixed with the leveldb data dir absolute path. Test Plan: db_test Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D5205	2012-09-06 17:52:08 -07:00
Dhruba Borthakur	fe93631678	Clean up compiler warnings generated by -Wall option. Summary: Clean up compiler warnings generated by -Wall option. make clean all OPT=-Wall This is a pre-requisite before making a new release. Test Plan: compile and run unit tests Reviewers: heyongqiang Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D5019	2012-08-29 14:24:51 -07:00
Dhruba Borthakur	e5fe80e4e3	The sharding of the block cache is limited to 2^20 pieces. Summary: The number of shards that the block cache is divided into is configurable. However, if the user specifies that he/she wants the block cache to be divided into more than 2^20 pieces, then the system will try to allocate a huge array of that size, which could fail. It is better to limit the sharding of the block cache to an upper bound. The default sharding is 16 shards (i.e. 2^4) and the maximum is now about 1 million shards (i.e. 2^20). Also, fixed a bug with the LRUCache where the numShardBits should be a private member of the LRUCache object rather than a static variable. Test Plan: run db_bench with --cache_numshardbits=64. Reviewers: heyongqiang Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D5013	2012-08-29 12:17:59 -07:00
heyongqiang	a4f9b8b49e	merge 1.5 Summary: as subject Test Plan: db_test table_test Reviewers: dhruba	2012-08-28 11:43:33 -07:00
Dhruba Borthakur	fc20273e73	Introduce a new method Env->Fsync() that issues fsync (instead of fdatasync). Summary: Introduce a new method Env->Fsync() that issues fsync (instead of fdatasync). This is needed for data durability when running on ext3 filesystems. Added options to the benchmark db_bench to generate performance numbers with either fsync or fdatasync enabled. Cleaned up Makefile to build leveldb_shell only when building the thrift leveldb server. Test Plan: build and run benchmark Reviewers: heyongqiang Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D4911	2012-08-27 21:24:17 -07:00
Dhruba Borthakur	e5a7c8e580	Log the open-options to the LOG. Summary: Log the open-options to the LOG. Use options_ instead of options because SanitizeOptions could modify the max_file_open limit. Test Plan: num db_bench Reviewers: heyongqiang Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D4833	2012-08-22 12:22:12 -07:00
heyongqiang	21082fa13c	regression for trigger compaction logic Summary: as subject Test Plan: manually run db_bench confirmed Reviewers: dhruba Differential Revision: https://reviews.facebook.net/D4809	2012-08-21 18:11:21 -07:00
Dhruba Borthakur	f4e7febf22	Record the version of the source repository that was used to build the leveldb library. Summary: Record the version of the source that we are compiling. We keep a record of the git revision in util/version.cc. This source file is then built as a regular source file as part of the compilation process. One can run "strings executable_filename | grep _build_" to find the version of the source that we used to build the executable file. Test Plan: none Differential Revision: https://reviews.facebook.net/D4785	2012-08-21 14:47:15 -07:00
heyongqiang	6ba1f17789	adding a scribe logger in leveldb to log leveldb deploy stats Summary: as subject. A new log is written to scribe via thrift client when a new db is opened and when there is a compaction. a new option var scribe_log_db_stats is added. Test Plan: manually checked using command "ptail -time 0 leveldb_deploy_stats" Reviewers: dhruba Differential Revision: https://reviews.facebook.net/D4659	2012-08-21 11:43:22 -07:00
Dhruba Borthakur	e56b2c5a31	Prevent concurrent multiple opens of leveldb database. Summary: The fcntl call cannot detect lock conflicts when invoked multiple times from the same thread. Use a static lockedFiles Set to record the paths that are locked. A lockfile request checks to see if this filename already exists in lockedFiles; if so, it triggers an error. Otherwise, it inserts the filename in the lockedFiles Set. An unlock file request verifies that the filename is in the lockedFiles set and removes it from the lockedFiles set. Test Plan: unit test attached Reviewers: heyongqiang Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D4755	2012-08-20 23:55:04 -07:00
Dhruba Borthakur	c3096afd61	Introduce a new option disableDataSync for opening the database. If this is set to true, then the data written to newly created data files is not synced to disk; instead, the database depends on the OS to flush dirty data to stable storage. This option is good for bulk Test Plan: manual tests Task ID: # Blame Rev: Differential Revision: https://reviews.facebook.net/D4515	2012-08-03 15:23:53 -07:00
Dhruba Borthakur	d11b637f34	bits_per_key is already configurable. It defines how many bloom bits will be used for every key in the database. My change in this patch is to make the Hash code that is used for blooms configurable. In fact, one can specify a modified HashCode that inspects only parts of the Key to generate the Hash (used by blooms). Test Plan: none Differential Revision: https://reviews.facebook.net/D4059	2012-07-09 23:06:07 -07:00
Dhruba Borthakur	80c663882a	Create leveldb server via Thrift. Summary: First draft. Unit tests pass. Test Plan: unit tests attached Reviewers: heyongqiang Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D3969	2012-07-07 09:42:39 -07:00
heyongqiang	7600228072	fix compile warning Summary: as subject Test Plan: compile Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D3957	2012-07-02 17:37:45 -07:00
heyongqiang	4e4b6812ff	Make some variables configurable for each db instance Summary: Make configurable 'targetFileSize', 'targetFileSizeMultiplier', 'maxBytesForLevelBase', 'maxBytesForLevelMultiplier', 'expandedCompactionFactor', 'maxGrandParentOverlapFactor' Test Plan: N/A Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D3801	2012-06-27 14:36:31 -07:00
Dhruba Borthakur	a35e574344	Make Leveldb save data into HDFS files. You have to set USE_HDFS in your environment variable to compile leveldb with HDFS support. Test Plan: Run benchmark. Differential Revision: https://reviews.facebook.net/D3549	2012-06-14 00:29:01 -07:00
Dhruba Borthakur	8f293b68a9	Support --bufferedio=[0,1] from db_bench. If bufferedio = 0, then the read code path clears the OS page cache after the IO is completed. The default remains as bufferedio=1 Summary: Task ID: # Blame Rev: Test Plan: Revert Plan: Differential Revision: https://reviews.facebook.net/D3429	2012-05-29 13:29:44 -07:00
Dhruba Borthakur	a2a0e358cb	Add support to specify the number of shards for the Block cache. By default, the block cache is sharded into 16 parts. Summary: Task ID: # Blame Rev: Test Plan: Revert Plan: Differential Revision: https://reviews.facebook.net/D3273	2012-05-16 17:23:49 -07:00
Arun Sharma	95af128225	SSE4 optimization Summary: This speeds up CRC computation significantly on hardware that supports it. Enabled via -msse4. Note: the binary won't be usable on older CPUs that don't support the instruction. Test Plan: crc32c_test Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D3201	2012-05-15 10:10:01 -07:00
Arun Sharma	921a48428e	Optimize for lp64 Summary: Some code reorganization in-preparation for replacing with a hardware instruction. * Use u64 for some of the key types * Use an ALIGN macro so code is easier to read Test Plan: crc32c_test Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D3135	2012-05-14 15:40:11 -07:00
Sanjay Ghemawat	85584d497e	Added bloom filter support. In particular, we add a new FilterPolicy class. An instance of this class can be supplied in Options when opening a database. If supplied, the instance is used to generate summaries of keys (e.g., a bloom filter) which are placed in sstables. These summaries are consulted by DB::Get() so we can avoid reading sstable blocks that are guaranteed to not contain the key we are looking for. This change provides one implementation of FilterPolicy based on bloom filters. Other changes: - Updated version number to 1.4. - Some build tweaks. - C binding for CompactRange. - A few more benchmarks: deleteseq, deleterandom, readmissing, seekrandom. - Minor .gitignore update.	2012-04-17 08:36:46 -07:00
Sanjay Ghemawat	9013f13b15	use mmap on 64-bit machines to speed-up reads; small build fixes	2012-03-15 09:14:00 -07:00
Sanjay Ghemawat	3c8be108bf	fixed issues 66 (leaking files on disk error) and 68 (no sync of CURRENT file)	2012-01-25 14:56:52 -08:00
Hans Wennborg	42fb47f6ed	Pass system's CFLAGS, remove exit time destructor, sstable bug fix. - Pass system's values of CFLAGS,LDFLAGS. Don't override OPT if it's already set. Original patch by Alessio Treglia <alessio@debian.org>: http://code.google.com/p/leveldb/issues/detail?id=27#c6 - Remove 1 exit time destructor from leveldb. See http://crbug.com/101600 - Fix problem where sstable building code would pass an internal key to the user comparator. (Sync with upstream at 25436817.)	2011-11-14 17:06:16 +00:00
Hans Wennborg	36a5f8ed7f	A number of fixes: - Replace raw slice comparison with a call to user comparator. Added test for custom comparators. - Fix end of namespace comments. - Fixed bug in picking inputs for a level-0 compaction. When finding overlapping files, the covered range may expand as files are added to the input set. We now correctly expand the range when this happens instead of continuing to use the old range. For example, suppose L0 contains files with the following ranges: F1: a .. d F2: c .. g F3: f .. j and the initial compaction target is F3. We used to search for range f..j which yielded {F2,F3}. However we now expand the range as soon as another file is added. In this case, when F2 is added, we expand the range to c..j and restart the search. That picks up file F1 as well. This change fixes a bug related to deleted keys showing up incorrectly after a compaction as described in Issue 44. (Sync with upstream @25072954)	2011-10-31 17:22:06 +00:00
Gabor Cselle	299ccedfec	A number of bugfixes: - Added DB::CompactRange() method. Changed manual compaction code so it breaks up compactions of big ranges into smaller compactions. Changed the code that pushes the output of memtable compactions to higher levels to obey the grandparent constraint: i.e., we must never have a single file in level L that overlaps too much data in level L+1 (to avoid very expensive L-1 compactions). Added code to pretty-print internal keys. - Fixed bug where we would not detect overlap with files in level-0 because we were incorrectly using binary search on an array of files with overlapping ranges. Added "leveldb.sstables" property that can be used to dump all of the sstables and ranges that make up the db state. - Removing post_write_snapshot support. Email to leveldb mailing list brought up no users, just confusion from one person about what it meant. - Fixing static_cast char to unsigned on BIG_ENDIAN platforms. Fixes Issue 35 and Issue 36. - Comment clarification to address leveldb Issue 37. - Change license in posix_logger.h to match other files. - A build problem where uint32 was used instead of uint32_t. Sync with upstream @24408625	2011-10-05 16:30:28 -07:00
Hans Wennborg	213a68eb68	Sync with upstream @23860137. Fix GCC -Wshadow warnings in LevelDB's public header files, reported by Dustin. Add in-memory Env implementation (helpers/memenv/). This enables users to create LevelDB databases in-memory. Initialize ShardedLRUCache::last_id_ to zero. This fixes a Valgrind warning. (Also delete port/sha1_ which were removed upstream some time ago.)	2011-09-12 10:21:10 +01:00
gabor@google.com	e3584f9c28	Bugfix for issue 33; reduce lock contention in Get(), parallel benchmarks. - Fix for issue 33 (non-null-terminated result from leveldb_property_value()) - Support for running multiple instances of a benchmark in parallel. - Reduce lock contention on Get(): (1) Do not hold the lock while searching memtables. (2) Shard block and table caches 16-ways. Benchmark for evaluating this change: $ db_bench --benchmarks=fillseq1,readrandom --threads=$n (fillseq1 is a small hack to make sure fillseq runs once regardless of number of threads specified on the command line). git-svn-id: https://leveldb.googlecode.com/svn/trunk@49 62dab493-f737-651d-591e-8d6aee1b9529	2011-08-22 21:08:51 +00:00
gabor@google.com	ab323f7e1e	Bugfixes for iterator and documentation. - Fix bug in Iterator::Prev where it would return the wrong key. Fixes issues 29 and 30. - Added a tweak to testharness to allow running just some tests. - Fixing two minor documentation errors based on issues 28 and 25. - Cleanup; fix namespaces of export-to-C code. Also fix one "const char" vs "char" mismatch. git-svn-id: https://leveldb.googlecode.com/svn/trunk@48 62dab493-f737-651d-591e-8d6aee1b9529	2011-08-16 01:21:01 +00:00
dgrogan@chromium.org	a05525d13b	@23023120 git-svn-id: https://leveldb.googlecode.com/svn/trunk@47 62dab493-f737-651d-591e-8d6aee1b9529	2011-08-06 00:19:37 +00:00
gabor@google.com	f122c6dfbb	Adding FreeBSD support, removing Chromium files, adding benchmark. - LevelDB patch for FreeBSD. This resolves Issue 22. Contributed by dforsythe (thanks!). - Removing Chromium-specific files. They are now going to live in the Chromium repository. - Adding a benchmark page comparing LevelDB performance to SQLite and Kyoto Cabinet's TreeDB, along with code to generate the benchmarks. Thanks to Kevin Tseng for compiling the benchmarks, and Scott Hess and Mikio Hirabayashi for their help and advice. git-svn-id: https://leveldb.googlecode.com/svn/trunk@40 62dab493-f737-651d-591e-8d6aee1b9529	2011-07-27 01:46:25 +00:00
gabor@google.com	60bd8015f2	Speed up Snappy uncompression, new Logger interface. - Removed one copy of an uncompressed block's contents by changing the signature of Snappy_Uncompress() so it uncompresses into a flat array instead of a std::string. Speeds up readrandom ~10%. - Instead of a combination of Env/WritableFile, we now have a Logger interface that can be easily overridden by applications that want to supply their own logging. - Separated out the gcc and Sun Studio parts of atomic_pointer.h so we can use 'asm', 'volatile' keywords for Sun Studio. git-svn-id: https://leveldb.googlecode.com/svn/trunk@39 62dab493-f737-651d-591e-8d6aee1b9529	2011-07-21 02:40:18 +00:00
gabor@google.com	6872ace901	Sun Studio support, and fix for test related memory fixes. - LevelDB patch for Sun Studio Based on a patch submitted by Theo Schlossnagle - thanks! This fixes Issue 17. - Fix a couple of test related memory leaks. git-svn-id: https://leveldb.googlecode.com/svn/trunk@38 62dab493-f737-651d-591e-8d6aee1b9529	2011-07-19 23:36:47 +00:00
gabor@google.com	6699c7ebe6	Small tweaks and bugfixes for Issue 18 and 19. Slight tweak to the no-overlap optimization: only push to level 2 to reduce the amount of wasted space when the same small key range is being repeatedly overwritten. Fix for Issue 18: Avoid failure on Windows by avoiding deletion of lock file until the end of DestroyDB(). Fix for Issue 19: Disregard sequence numbers when checking for overlap in sstable ranges. This fixes issue 19: when writing the same key over and over again, we would generate a sequence of sstables that were never merged together since their sequence numbers were disjoint. Don't ignore map/unmap error checks. Miscellaneous fixes for small problems Sanjay found while diagnosing issue/9 and issue/16 (corruption_test failures). - log::Reader reports the record type when it finds an unexpected type. - log::Reader no longer reports an error when it encounters an expected zero record regardless of the setting of the "checksum" flag. - Added a missing forward declaration. - Documented a side effect of larger write buffer sizes (longer recovery time). git-svn-id: https://leveldb.googlecode.com/svn/trunk@37 62dab493-f737-651d-591e-8d6aee1b9529	2011-07-15 00:20:57 +00:00
gabor@google.com	f57e23351f	Platform detection during build, plus compatibility patches for machines without <cstdatomic>. This revision adds two major changes: 1. build_detect_platform which generates build_config.mk with platform-dependent flags for the build process 2. /port/atomic_pointer.h with an AtomicPointer implementation for platforms without <cstdatomic> Some of this code is loosely based on patches submitted to the LevelDB mailing list at https://groups.google.com/forum/#!forum/leveldb Tip of the hat to Dave Smith and Edouard A, who both sent patches. The presence of Snappy (http://code.google.com/p/snappy/) and cstdatomic are now both detected in the build_detect_platform script (1.) which gets executed during make. For (2.), instead of broadly importing atomicops_* from Chromium or the Google performance tools, we chose to just implement AtomicPointer and the limited atomic load and store operations it needs. This resulted in much less code and fewer files - everything is contained in atomic_pointer.h. git-svn-id: https://leveldb.googlecode.com/svn/trunk@34 62dab493-f737-651d-591e-8d6aee1b9529	2011-06-29 00:30:50 +00:00
dgrogan@chromium.org	740d8b3d00	Update from upstream @21551990 * Patch LevelDB to build for OSX and iOS * Fix race condition in memtable iterator deletion. * Other small fixes. git-svn-id: https://leveldb.googlecode.com/svn/trunk@29 62dab493-f737-651d-591e-8d6aee1b9529	2011-05-28 00:53:58 +00:00
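
Note on commit 36a5f8ed7f (level-0 input selection): the range-expansion fix described in that message can be illustrated with a small sketch. This is not the LevelDB code; it is a minimal Python model written under stated assumptions. The function names (overlaps, pick_with_fixed_range, pick_with_expanding_range) and the dictionary layout are hypothetical, and key ranges are modelled as closed intervals over single-letter keys. The file names and ranges are the ones given in the commit message.

```python
def overlaps(a, b):
    """True when the closed key ranges a=(lo, hi) and b=(lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def pick_with_fixed_range(files, target):
    """Old behaviour as described in the commit: search once with the
    target file's own range and never widen it."""
    fixed = files[target]
    return sorted(name for name, rng in files.items() if overlaps(fixed, rng))


def pick_with_expanding_range(files, target):
    """Fixed behaviour as described in the commit: whenever a file is added,
    widen the covered range and restart the search."""
    lo, hi = files[target]
    chosen = {target}
    changed = True
    while changed:
        changed = False
        for name, (f_lo, f_hi) in files.items():
            if name in chosen or not overlaps((lo, hi), (f_lo, f_hi)):
                continue
            chosen.add(name)
            # Expand the covered range to include the newly added file.
            lo, hi = min(lo, f_lo), max(hi, f_hi)
            changed = True
    return sorted(chosen), (lo, hi)


# Ranges taken from the commit message: F1: a..d, F2: c..g, F3: f..j.
level0 = {"F1": ("a", "d"), "F2": ("c", "g"), "F3": ("f", "j")}

old_inputs = pick_with_fixed_range(level0, "F3")
inputs, key_range = pick_with_expanding_range(level0, "F3")
print(old_inputs)          # files found with the fixed range f..j
print(inputs, key_range)   # files found once the range is allowed to grow
```

With the fixed range f..j only F2 and F3 are selected; once F2 widens the range to c..j, the restarted search also picks up F1, matching the example in the commit message.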


66 Commits