// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).
//
// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file. See the AUTHORS file for names of contributors.

#pragma once
#include <memory>
#include "rocksdb/slice_transform.h"
#include "table/internal_iterator.h"

namespace rocksdb {

class Iterator;
struct ParsedInternalKey;
class Slice;
class Arena;
struct ReadOptions;
struct TableProperties;
class GetContext;
class InternalIterator;

// A Table is a sorted map from strings to strings. Tables are
// immutable and persistent. A Table may be safely accessed from
// multiple threads without external synchronization.
class TableReader {
 public:
  virtual ~TableReader() {}

  // Returns a new iterator over the table contents.
  // The result of NewIterator() is initially invalid (caller must
  // call one of the Seek methods on the iterator before using it).
  // arena: If not null, the arena needs to be used to allocate the Iterator.
  //        When destroying the iterator, the caller will not call "delete"
  //        but Iterator::~Iterator() directly. The destructor needs to destroy
  //        all the states but those allocated in arena.
  // skip_filters: disables checking the bloom filters even if they exist. This
  //               option is effective only for block-based table format.
  virtual InternalIterator* NewIterator(const ReadOptions&,
                                        const SliceTransform* prefix_extractor,
                                        Arena* arena = nullptr,
                                        bool skip_filters = false,
                                        bool for_compaction = false) = 0;

  virtual InternalIterator* NewRangeTombstoneIterator(
      const ReadOptions& /*read_options*/) {
    return nullptr;
  }

  // Given a key, return an approximate byte offset in the file where
  // the data for that key begins (or would begin if the key were
  // present in the file). The returned value is in terms of file
  // bytes, and so includes effects like compression of the underlying data.
  // E.g., the approximate offset of the last key in the table will
  // be close to the file length.
  virtual uint64_t ApproximateOffsetOf(const Slice& key) = 0;

  // Set up the table for Compaction. Might change some parameters with
  // posix_fadvise
  virtual void SetupForCompaction() = 0;

  virtual std::shared_ptr<const TableProperties> GetTableProperties() const = 0;

  // Prepare work that can be done before the real Get()
  virtual void Prepare(const Slice& /*target*/) {}

  // Report an approximation of how much memory has been used.
  virtual size_t ApproximateMemoryUsage() const = 0;

  // Calls get_context->SaveValue() repeatedly, starting with
  // the entry found after a call to Seek(key), until it returns false.
  // May not make such a call if filter policy says that key is not present.
  //
  // get_context->MarkKeyMayExist needs to be called when it is configured to be
  // memory only and the key is not found in the block cache.
  //
  // readOptions is the options for the read
  // key is the key to search for
  // skip_filters: disables checking the bloom filters even if they exist. This
  //               option is effective only for block-based table format.
  virtual Status Get(const ReadOptions& readOptions, const Slice& key,
                     GetContext* get_context,
                     const SliceTransform* prefix_extractor,
                     bool skip_filters = false) = 0;

  // Prefetch data corresponding to a given range of keys.
  // Typically this functionality is required for table implementations that
  // persist the data on a non-volatile storage medium like disk/SSD.
  virtual Status Prefetch(const Slice* begin = nullptr,
                          const Slice* end = nullptr) {
    (void) begin;
    (void) end;
    // Default implementation is NOOP.
    // The child class should implement functionality when applicable
    return Status::OK();
  }

  // convert db file to a human readable form
  virtual Status DumpTable(WritableFile* /*out_file*/,
                           const SliceTransform* /*prefix_extractor*/) {
    return Status::NotSupported("DumpTable() not supported");
  }

  // check whether there is corruption in this db file
  virtual Status VerifyChecksum() {
    return Status::NotSupported("VerifyChecksum() not supported");
  }

  virtual void Close() {}
};
} // namespace rocksdb
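
// Illustrative sketch only, not part of the RocksDB API. It shows the
// ownership contract described for the "arena" parameter of NewIterator():
// an object placed in an arena is torn down by calling its destructor
// directly, never by "delete", and the arena releases the memory afterwards.
// ToyArena, ToyIterator, NewToyIterator and DemoArenaContract are
// hypothetical names invented for this example; the real Arena and
// InternalIterator classes are not used here.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical stand-in for an arena: hands out raw blocks and frees them
// all when the arena itself is destroyed.
class ToyArena {
 public:
  ToyArena() = default;
  ToyArena(const ToyArena&) = delete;
  ToyArena& operator=(const ToyArena&) = delete;
  ~ToyArena() {
    for (void* block : blocks_) ::operator delete(block);
  }

  void* Allocate(std::size_t bytes) {
    void* block = ::operator new(bytes);
    blocks_.push_back(block);
    return block;
  }

 private:
  std::vector<void*> blocks_;
};

// Hypothetical iterator that tracks how many instances are alive.
class ToyIterator {
 public:
  explicit ToyIterator(int* live_count) : live_count_(live_count) {
    ++*live_count_;
  }
  ~ToyIterator() { --*live_count_; }

 private:
  int* live_count_;
};

// Mirrors the shape of the "arena" parameter: with a null arena the object
// comes from the heap, otherwise it is constructed in arena memory.
ToyIterator* NewToyIterator(int* live_count, ToyArena* arena = nullptr) {
  if (arena == nullptr) {
    return new ToyIterator(live_count);
  }
  void* mem = arena->Allocate(sizeof(ToyIterator));
  return new (mem) ToyIterator(live_count);  // placement new
}

// Creates one iterator of each kind, cleans both up the way their
// allocation requires, and returns the number still alive.
int DemoArenaContract() {
  int live = 0;
  {
    ToyArena arena;
    ToyIterator* in_arena = NewToyIterator(&live, &arena);
    ToyIterator* on_heap = NewToyIterator(&live);
    in_arena->~ToyIterator();  // arena-backed: destructor only, no delete
    delete on_heap;            // heap-backed: ordinary delete
  }  // the arena frees its blocks here
  return live;
}
```

// Calling "delete" on the arena-backed pointer would hand memory to the
// global deallocator a second time once the arena is destroyed, which is
// why the contract asks callers to invoke the destructor directly.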