Commit Graph

185 Commits

Author SHA1 Message Date
Dhruba Borthakur
18cb6004d2 Fixed compilation error in previous merge.
Summary:
Fixed compilation error in previous merge.

2012-11-07 15:24:47 -08:00
Dhruba Borthakur
8143062edd Merge branch 'master' into performance
Conflicts:
	db/db_impl.cc
	db/version_set.cc
	util/options.cc
2012-11-07 15:11:37 -08:00
heyongqiang
3fcf533ed0 Add a readonly db
Summary: as subject

Test Plan: run db_bench readrandom

Reviewers: dhruba

Reviewed By: dhruba

CC: MarkCallaghan, emayanke, sheki

Differential Revision: https://reviews.facebook.net/D6495
2012-11-07 14:19:48 -08:00
Dhruba Borthakur
9b87a2bae8 Avoid doing an exhaustive search when looking for overlapping files.
Summary:
Version::GetOverlappingInputs() is called multiple times in
the compaction code path. Each invocation does a binary search
for overlapping files in the specified key range.
This patch remembers the index of an overlapped file when
GetOverlappingInputs() is called the first time within
a compaction run. Succeeding calls to GetOverlappingInputs()
use the remembered index to avoid the binary search.

I measured that 1000 iterations of GetOverlappingInputs
take around 4500 microseconds without this patch. If I use
this patch with the hint on every invocation, then 1000
iterations take about 3900 microseconds.

Test Plan: make check OPT=-g

Reviewers: heyongqiang

Reviewed By: heyongqiang

CC: MarkCallaghan, emayanke, sheki

Differential Revision: https://reviews.facebook.net/D6513
2012-11-07 11:47:17 -08:00
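
The hint described in commit 9b87a2bae8 could look roughly like the sketch below. It is written in Python for brevity and is not the actual C++ code: the function name, the `hint` parameter, and the `(smallest_key, largest_key)` tuple layout are assumptions made for illustration.

```python
def find_overlap_index(files, begin, end, hint=None):
    """Return the index of one file overlapping [begin, end], or -1.

    `files` is a list of (smallest_key, largest_key) tuples sorted by key,
    with no overlap between files (hypothetical layout).
    """
    # Fast path: reuse the remembered index if that file still overlaps.
    if hint is not None and 0 <= hint < len(files):
        smallest, largest = files[hint]
        if smallest <= end and largest >= begin:
            return hint
    # Slow path: binary search for the first file whose largest key >= begin.
    lo, hi = 0, len(files)
    while lo < hi:
        mid = (lo + hi) // 2
        if files[mid][1] < begin:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(files) and files[lo][0] <= end:
        return lo
    return -1


files = [("a", "c"), ("d", "f"), ("g", "i"), ("j", "l")]
first = find_overlap_index(files, "e", "h")              # binary search
again = find_overlap_index(files, "e", "h", hint=first)  # hint reused
```

A stale hint simply falls through to the binary search, so passing one is never less correct than omitting it.
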
Abhishek Kona
4e413df3d0 Flush Data at object destruction if disableWal is used.
Summary:
Added a conditional flush in ~DBImpl.
There is still a chance of writes not being persisted if there is a
crash (not a clean shutdown) before the DBImpl instance is destroyed.

Test Plan: modified db_test to meet the new expectations.

Reviewers: dhruba, heyongqiang

Differential Revision: https://reviews.facebook.net/D6519
2012-11-06 15:04:42 -08:00
Dhruba Borthakur
aa42c66814 Fix all warnings generated by -Wall option to the compiler.
Summary:
The default compilation process now uses "-Wall" to compile.
Fix all warnings generated by gcc.

Test Plan: make all check

Reviewers: heyongqiang, emayanke, sheki

Reviewed By: heyongqiang

CC: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D6525
2012-11-06 14:07:31 -08:00
Dhruba Borthakur
5f91868cee Merge branch 'master' into performance
Conflicts:
	db/version_set.cc
	util/options.cc
2012-11-05 16:51:55 -08:00
Dhruba Borthakur
cb7a00227f The method GetOverlappingInputs should use binary search.
Summary:
The method Version::GetOverlappingInputs used a sequential search
to map a key-range to a set of files. But the files are arranged
in ascending order of key, so a binary search is more effective.

This patch implements Version::GetOverlappingInputsBinarySearch
that finds one file that corresponds to the specified key range
and then iterates backwards and forwards to find all overlapping
files.

This patch is critical for making compactions efficient, especially
when there are thousands of files in a single level.

I measured that 1000 iterations of TEST_MaxNextLevelOverlappingBytes
takes 16000 microseconds without this patch. With this patch, the
same method takes about 4600 microseconds.

Test Plan: Almost all unit tests in db_test use this method to look up keys.

Reviewers: heyongqiang

Reviewed By: heyongqiang

CC: MarkCallaghan, emayanke, sheki

Differential Revision: https://reviews.facebook.net/D6465
2012-11-05 16:08:01 -08:00
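
The approach in commit cb7a00227f (binary search for one overlapping file, then walk backwards and forwards) can be sketched as follows. This is an illustrative Python sketch under the assumption that files are non-overlapping `(smallest_key, largest_key)` tuples sorted by key; the function name is hypothetical and is not the real `Version::GetOverlappingInputsBinarySearch`.

```python
def overlapping_files(files, begin, end):
    """Return (first, last) indices of files overlapping [begin, end], or None."""
    lo, hi = 0, len(files) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        smallest, largest = files[mid]
        if largest < begin:
            lo = mid + 1          # file lies entirely before the range
        elif smallest > end:
            hi = mid - 1          # file lies entirely after the range
        else:
            # Found one overlapping file; expand in both directions.
            first = last = mid
            while first > 0 and files[first - 1][1] >= begin:
                first -= 1
            while last + 1 < len(files) and files[last + 1][0] <= end:
                last += 1
            return first, last
    return None


files = [("a", "c"), ("d", "f"), ("g", "i"), ("j", "l")]
span = overlapping_files(files, "e", "k")
```
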
Dhruba Borthakur
5273c81483 Ability to invoke application hook for every key during compaction.
Summary:
There are certain use-cases where the application intends to
delete older keys after they have exceeded a certain time period.
One option for those applications is to periodically scan the
entire database and delete appropriate keys.

A better way is to allow the application to hook into the
compaction process. This patch allows the application to set
a method callback for every key that is being compacted. If
this method returns true, then the key is not preserved in
the output of the compaction.

Test Plan:
This is mostly to preview the proposed new public api.
Since it is a public api, please do due diligence on reviewing it.

I will be writing test cases for this api in my next version of
this patch.

Reviewers: MarkCallaghan, heyongqiang

Reviewed By: heyongqiang

CC: sheki, adsharma

Differential Revision: https://reviews.facebook.net/D6285
2012-11-05 16:02:13 -08:00
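
The callback contract in commit 5273c81483 (return true to drop the key from the compaction output) can be illustrated with a small Python sketch. The names `compact` and `should_drop`, and the use of a write timestamp as the value, are assumptions for illustration only; the real public API is the C++ one under review in D6285.

```python
def compact(entries, should_drop=None):
    """Copy (key, value) pairs to the output, skipping entries the hook rejects."""
    output = []
    for key, value in entries:
        # The application hook sees every key that is being compacted.
        if should_drop is not None and should_drop(key, value):
            continue  # hook returned True: key is not preserved
        output.append((key, value))
    return output


now = 1000
ttl = 100
entries = [("k1", 850), ("k2", 950), ("k3", 990)]  # value = write time (hypothetical)
kept = compact(entries, lambda key, written_at: now - written_at > ttl)
```

Expiring keys this way avoids the separate full-database scan that the summary describes as the alternative.
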
heyongqiang
f1a7c735b5 fix compile error
Summary: as subject

Test Plan: n/a
2012-11-05 10:30:19 -08:00
heyongqiang
d55c2ba305 Add a tool to change number of levels
Summary: as subject.

Test Plan: manually test it, will add a testcase

Reviewers: dhruba, MarkCallaghan

Differential Revision: https://reviews.facebook.net/D6345
2012-11-05 10:17:39 -08:00
Dhruba Borthakur
81f735d97c Merge branch 'master' into performance
Conflicts:
	db/db_impl.cc
	util/options.cc
2012-11-05 09:41:38 -08:00
Dhruba Borthakur
a1bd5b7752 Compilation problem introduced by previous
commit 854c66b089.

Summary:
Compilation problem introduced by previous
commit 854c66b089.

Test Plan:  make check
2012-11-04 22:04:14 -08:00
amayank
854c66b089 Make compression options configurable. These include window-bits, level and strategy for ZlibCompression
Summary: Leveldb currently uses windowBits=-14 while using zlib compression. (It was 15 earlier.) This makes the setting configurable. Related changes here: https://reviews.facebook.net/D6105

Test Plan: make all check

Reviewers: dhruba, MarkCallaghan, sheki, heyongqiang

Differential Revision: https://reviews.facebook.net/D6393
2012-11-02 11:26:39 -07:00
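
To show what the three zlib knobs named in commit 854c66b089 control, here is a small sketch using Python's standard `zlib` module rather than the LevelDB option struct. The helper names `deflate` and `inflate` are hypothetical; a negative window-bits value selects a raw deflate stream with a window of 2**14 bytes.

```python
import zlib


def deflate(data, window_bits=-14, level=zlib.Z_DEFAULT_COMPRESSION,
            strategy=zlib.Z_DEFAULT_STRATEGY):
    """Compress `data` with configurable window bits, level and strategy."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, window_bits, 8, strategy)
    return compressor.compress(data) + compressor.flush()


def inflate(blob, window_bits=-14):
    """Decompress a blob produced by deflate() with the same window bits."""
    decompressor = zlib.decompressobj(window_bits)
    return decompressor.decompress(blob) + decompressor.flush()


data = b"leveldb " * 1000
default_blob = deflate(data)
tuned_blob = deflate(data, window_bits=-15, level=9, strategy=zlib.Z_FILTERED)
```

The reader must use the same window bits as the writer, which is one reason to make the value an explicit option instead of a hard-coded constant.
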
heyongqiang
3096fa7534 Add two more options: disable block cache and make table cache shard number configurable
Summary:

as subject

Test Plan:

run db_bench and db_test

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D6111
2012-11-01 13:23:21 -07:00
Mark Callaghan
3e7e269292 Use timer to measure sleep rather than assume it is 1000 usecs
Summary:
This makes the stall timers in MakeRoomForWrite more accurate by timing
the sleeps. From looking at the logs the real sleep times are usually
about 2000 usecs each when SleepForMicros(1000) is called. The modified LOG messages are:
2012/10/29-12:06:33.271984 2b3cc872f700 delaying write 13 usecs for level0_slowdown_writes_trigger
2012/10/29-12:06:34.688939 2b3cc872f700 delaying write 1728 usecs for rate limits with max score 3.83

Test Plan:
run db_bench, look at DB/LOG

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D6297
2012-10-30 07:21:37 -07:00
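
The idea in commit 3e7e269292 is to charge the stall counter with the time the sleep actually took instead of the time that was requested. A minimal Python sketch, with a hypothetical helper name and a plain integer standing in for the stall statistic:

```python
import time


def timed_sleep_micros(requested_micros):
    """Sleep for the requested time and return the measured duration in usecs."""
    start = time.perf_counter()
    time.sleep(requested_micros / 1_000_000)
    return int((time.perf_counter() - start) * 1_000_000)


stall_micros = 0
# Add what was measured, not the 1000 usecs that were asked for.
stall_micros += timed_sleep_micros(1000)
```
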
heyongqiang
fb8d437325 fix test failure
Summary: as subject

Test Plan: db_test

Reviewers: dhruba, MarkCallaghan

Reviewed By: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D6309
2012-10-29 18:55:52 -07:00
heyongqiang
925f60d39d add a test case to make sure changing num_levels will fail
Summary: as subject

Test Plan: db_test

Reviewers: dhruba, MarkCallaghan

Reviewed By: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D6303
2012-10-29 15:27:07 -07:00
Dhruba Borthakur
53e04311b1 Merge branch 'master' into performance
Conflicts:
	db/db_bench.cc
	util/options.cc
2012-10-29 14:18:00 -07:00
Dhruba Borthakur
321dfdc3ae Allow having different compression algorithms on different levels.
Summary:
The leveldb API is enhanced to support different compression algorithms at
different levels.

This adds the option min_level_to_compress to db_bench that specifies
the minimum level for which compression should be done when
compression is enabled. This can be used to disable compression for levels
0 and 1 which are likely to suffer from stalls because of the CPU load
for memtable flushes and (L0,L1) compaction.  Level 0 is special as it
gets frequent memtable flushes. Level 1 is special as it frequently
gets all:all file compactions between it and level 0. But all other levels
could be the same. For any level N where N > 1, the rate of sequential
IO for that level should be the same. The last level is the
exception because it might not be full and because files from it are
not read to compact with the next larger level.

The same amount of time will be spent doing compaction at any
level N excluding N=0, 1 or the last level. By this standard all
of those levels should use the same compression. The difference is that
the loss (using more disk space) from a faster compression algorithm
is less significant for N=2 than for N=3. So we might be willing to
trade disk space for faster write rates with no compression
for L0 and L1, snappy for L2, zlib for L3. Using a faster compression
algorithm for the mid levels also allows us to reclaim some CPU
without giving up much disk space.

Also note that little is to be gained by compressing levels 0 and 1. For
a 4-level tree they account for 10% of the data. For a 5-level tree they
account for 1% of the data.

With compression enabled:
* memtable flush rate is ~18MB/second
* (L0,L1) compaction rate is ~30MB/second

With compression enabled but min_level_to_compress=2
* memtable flush rate is ~320MB/second
* (L0,L1) compaction rate is ~560MB/second

This practically takes the same code from https://reviews.facebook.net/D6225
but makes the leveldb api more general purpose with a few additional
lines of code.

Test Plan: make check

Differential Revision: https://reviews.facebook.net/D6261
2012-10-29 11:48:09 -07:00
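
One way to picture the `min_level_to_compress` option from commit 321dfdc3ae is as a rule that expands into one compression choice per level. The sketch below is an assumption about that mapping, written in Python with a hypothetical function name; it is not the db_bench code.

```python
def compression_per_level(num_levels, min_level_to_compress, compression="snappy"):
    """Return one compression name per level; levels below the minimum get none."""
    return ["none" if level < min_level_to_compress else compression
            for level in range(num_levels)]


per_level = compression_per_level(num_levels=4, min_level_to_compress=2)
# The more general per-level API also permits a mix such as the one the
# summary suggests: no compression for L0 and L1, snappy for L2, zlib for L3.
mixed = ["none", "none", "snappy", "zlib"]
```
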
Mark Callaghan
acc8567b24 Add more rates to db_bench output
Summary:
Adds the "MB/sec in" and "MB/sec out" to this line:
Amplification: 1.7 rate, 0.01 GB in, 0.02 GB out, 8.24 MB/sec in, 13.75 MB/sec out

Changes all values to be reported per interval and since test start for this line:
... thread 0: (10000,60000) ops and (19155.6,27307.5) ops/second in (0.522041,2.197198) seconds

Test Plan:
run db_bench

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D6291
2012-10-29 11:30:07 -07:00
Dhruba Borthakur
de7689b1d7 Fix unit test failure caused by delaying deleting obsolete files.
Summary:
A previous commit 4c107587ed introduced
the idea that some version updates might not delete obsolete files.
This means that if a unit test blindly counts the number of files
in the db directory it might not represent the true state of the database.

Use GetLiveFiles() instead to count the number of live files in the database.

Test Plan:
make check
2012-10-29 11:12:24 -07:00
Mark Callaghan
70c42bf05f Adds DB::GetNextCompaction and then uses that for rate limiting db_bench
Summary:
Adds a method that returns the score for the next level that most
needs compaction. That method is then used by db_bench to rate limit threads.
Threads are put to sleep at the end of each stats interval until the score
is less than the limit. The limit is set via the --rate_limit=$double option.
The specified value must be > 1.0. Also adds the option --stats_per_interval
to enable additional metrics reported every stats interval.

Test Plan:
run db_bench

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D6243
2012-10-29 10:17:43 -07:00
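
The rate limiting described in commit 70c42bf05f (sleep at the end of a stats interval until the compaction score drops below the limit) can be sketched as below. This is a Python illustration, not the db_bench code: `get_score` stands in for `DB::GetNextCompaction`, and the function name and injectable `sleep` are assumptions.

```python
import time


def wait_for_compaction(get_score, rate_limit, sleep_seconds=0.001, sleep=time.sleep):
    """Block until get_score() is below rate_limit; return how many sleeps it took."""
    if rate_limit <= 1.0:
        raise ValueError("rate_limit must be > 1.0")
    waits = 0
    while get_score() >= rate_limit:
        sleep(sleep_seconds)  # give background compaction time to catch up
        waits += 1
    return waits


# Simulated scores observed on successive checks (hypothetical values).
scores = iter([3.8, 2.5, 1.2])
waits = wait_for_compaction(lambda: next(scores), rate_limit=2.0,
                            sleep=lambda seconds: None)
```
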
Kai Liu
d50f8eb603 Enable LevelDb to create a new log file if current log file is too large.
Summary: Enable LevelDb to create a new log file if current log file is too large.

Test Plan:
Write a script and manually check the generated info LOG.

Task ID: 1803577

Reviewers: dhruba, heyongqiang

Reviewed By: heyongqiang

CC: zshao

Differential Revision: https://reviews.facebook.net/D6003
2012-10-26 14:55:02 -07:00
Mark Callaghan
65855dd8d4 Normalize compaction stats by time in compaction
Summary:
I used server uptime to compute per-level IO throughput rates. I
intended to use time spent doing compaction at that level. This fixes that.

Test Plan:
run db_bench, look at results

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D6237
2012-10-26 14:19:13 -07:00
Dhruba Borthakur
ea9e087851 Merge branch 'master' into performance
Conflicts:
	db/db_bench.cc
	db/db_impl.cc
	db/db_test.cc
2012-10-26 08:57:56 -07:00
Dhruba Borthakur
8eedf13a82 Fix unit test failure caused by delaying deleting obsolete files.
Summary:
A previous commit 4c107587ed introduced
the idea that some version updates might not delete obsolete files.
This means that if a unit test blindly counts the number of files
in the db directory it might not represent the true state of the database.

Use GetLiveFiles() instead to count the number of live files in the database.

Test Plan: make check

Reviewers: heyongqiang, MarkCallaghan

Reviewed By: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D6207
2012-10-26 08:42:05 -07:00
Dhruba Borthakur
5b0fe6c73b Greedy algorithm for picking files to compact.
Summary:
It is best if we pick the largest file to compact in a level.
This reduces the write amplification factor for compactions.
Each level has an auxiliary data structure called files_by_size_
that sorts all files by their size. This data structure is
updated when a new version is created.

Test Plan: make check

Differential Revision: https://reviews.facebook.net/D6195
2012-10-25 18:27:53 -07:00
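
The auxiliary structure from commit 5b0fe6c73b can be pictured as a list of file indices ordered by size, rebuilt whenever a new version is created. The Python sketch below is illustrative only; the dictionary layout and function names are assumptions, and skipping files that are already being compacted is borrowed from the `being_compacted` flag mentioned in a later commit in this log.

```python
def build_files_by_size(files):
    """Return indices into `files`, largest file first."""
    return sorted(range(len(files)), key=lambda i: files[i]["size"], reverse=True)


def pick_file_to_compact(files, files_by_size):
    """Greedy choice: the largest file that is not already being compacted."""
    for index in files_by_size:
        if not files[index].get("being_compacted", False):
            return index
    return None


files = [
    {"name": "000010.sst", "size": 10},
    {"name": "000011.sst", "size": 40, "being_compacted": True},
    {"name": "000012.sst", "size": 25},
]
order = build_files_by_size(files)
choice = pick_file_to_compact(files, order)
```
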
Dhruba Borthakur
8fb5f40468 firstIndex fix for multi-threaded compaction code.
Summary:
Prior to multi-threaded compaction, wrap-around would be done by using
current_->files_[level][0]. With this change we should be
using the first file for which f->being_compacted is not true.

1ca0584345 (commitcomment-2041516)

Test Plan: make check

Differential Revision: https://reviews.facebook.net/D6165
2012-10-25 08:44:47 -07:00
Mark Callaghan
e7206f43ee Improve statistics
Summary:
This adds more statistics to be reported by GetProperty("leveldb.stats").
The new stats include time spent waiting on stalls in MakeRoomForWrite.
This also includes the total amplification rate where that is:
    (#bytes of sequential IO during compaction) / (#bytes from Put)
This also includes a lot more data for the per-level compaction report.
* Rn(MB) - MB read from level N during compaction between levels N and N+1
* Rnp1(MB) - MB read from level N+1 during compaction between levels N and N+1
* Wnew(MB) - new data written to the level during compaction
* Amplify - ( Write(MB) + Rnp1(MB) ) / Rn(MB)
* Rn - files read from level N during compaction between levels N and N+1
* Rnp1 - files read from level N+1 during compaction between levels N and N+1
* Wnp1 - files written to level N+1 during compaction between levels N and N+1
* NewW - new files written to level N+1 during compaction
* Count - number of compactions done for this level

This is the new output from DB::GetProperty("leveldb.stats"). The old output stopped at Write(MB)

                               Compactions
Level  Files Size(MB) Time(sec) Read(MB) Write(MB)  Rn(MB) Rnp1(MB) Wnew(MB) Amplify Read(MB/s) Write(MB/s)   Rn Rnp1 Wnp1 NewW Count
-------------------------------------------------------------------------------------------------------------------------------------
  0        3        6        33        0       576       0        0      576    -1.0       0.0         1.3     0    0    0    0   290
  1      127      242       351     5316      5314     570     4747      567    17.0      12.1        12.1   287 2399 2685  286    32
  2      161      328        54      822       824     326      496      328     4.0       1.9         1.9   160  251  411  160   161
Amplification: 22.3 rate, 0.56 GB in, 12.55 GB out
Uptime(secs): 439.8
Stalls(secs): 206.938 level0_slowdown, 0.000 level0_numfiles, 24.129 memtable_compaction

Test Plan:
run db_bench

(cherry picked from commit ecdeead38f86cc02e754d0032600742c4f02fec8)

Reviewers: dhruba

Differential Revision: https://reviews.facebook.net/D6153
2012-10-24 14:21:38 -07:00
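
As a worked check of the per-level Amplify formula listed in commit e7206f43ee, the sketch below applies `( Write(MB) + Rnp1(MB) ) / Rn(MB)` to the level 2 row of the sample output. The function name is hypothetical, and returning -1.0 when nothing was read from level N is an assumption inferred from the level 0 row. The level 1 row works out to about 17.65 with this formula rather than the printed 17.0, so the real code may round or compute the value differently.

```python
def level_amplify(write_mb, rnp1_mb, rn_mb):
    """Per-level amplification: (Write + Rnp1) / Rn, or -1.0 if Rn is zero."""
    if rn_mb == 0:
        return -1.0  # assumed placeholder, matching the level 0 row
    return (write_mb + rnp1_mb) / rn_mb


# Level 2 row of the sample report: Write(MB)=824, Rnp1(MB)=496, Rn(MB)=326.
level2 = level_amplify(824, 496, 326)
level0 = level_amplify(576, 0, 0)
```
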
Dhruba Borthakur
3b06f94fa2 Merge branch 'master' into performance
Conflicts:
	db/db_impl.cc
	db/db_impl.h
	db/version_set.cc
2012-10-23 22:30:07 -07:00
Dhruba Borthakur
4c107587ed Delete files outside the mutex.
Summary:
The compaction process deletes a large number of files. This takes
quite a bit of time and is best done outside the mutex lock.

Test Plan: make check

Differential Revision: https://reviews.facebook.net/D6123
2012-10-22 11:53:23 -07:00
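
The pattern behind commit 4c107587ed is to decide what to delete while holding the mutex, then perform the slow filesystem calls after releasing it. A minimal Python sketch of that pattern, with a hypothetical `FileTracker` class standing in for the relevant parts of DBImpl:

```python
import os
import tempfile
import threading


class FileTracker:
    def __init__(self):
        self.mutex = threading.Lock()
        self.obsolete = []

    def mark_obsolete(self, path):
        with self.mutex:
            self.obsolete.append(path)

    def delete_obsolete_files(self):
        # Under the mutex: only take ownership of the list (cheap).
        with self.mutex:
            to_delete, self.obsolete = self.obsolete, []
        # Outside the mutex: the slow unlink calls no longer block writers.
        deleted = 0
        for path in to_delete:
            try:
                os.remove(path)
                deleted += 1
            except FileNotFoundError:
                pass
        return deleted


with tempfile.TemporaryDirectory() as db_dir:
    path = os.path.join(db_dir, "000005.sst")
    open(path, "w").close()
    tracker = FileTracker()
    tracker.mark_obsolete(path)
    deleted = tracker.delete_obsolete_files()
    still_there = os.path.exists(path)
```
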
heyongqiang
5010daa7a8 add "seek_compaction" to log for better debugging
Summary: as subject

Test Plan: compile

Reviewers: dhruba

Reviewed By: dhruba

CC: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D6117
2012-10-22 10:00:25 -07:00
Dhruba Borthakur
3489cd615c Merge branch 'master' into performance
Conflicts:
	db/db_impl.cc
	db/db_impl.h
2012-10-21 02:15:19 -07:00
Dhruba Borthakur
f95219fb32 Delete files outside the mutex.
Summary:
The compaction process deletes a large number of files. This takes
quite a bit of time and is best done outside the mutex lock.

Test Plan: make check

Differential Revision: https://reviews.facebook.net/D6123
2012-10-21 02:03:00 -07:00
Dhruba Borthakur
98f23cf04a Merge branch 'master' into performance
Conflicts:
	db/db_impl.cc
	db/db_impl.h
2012-10-21 01:55:19 -07:00
Dhruba Borthakur
64c4b9f0e2 Delete files outside the mutex.
Summary:
The compaction process deletes a large number of files. This takes
quite a bit of time and is best done outside the mutex lock.

2012-10-21 01:49:48 -07:00
Dhruba Borthakur
e982f5a1d2 Merge branch 'master' into performance
Conflicts:
	util/options.cc
2012-10-19 15:16:42 -07:00
Dhruba Borthakur
cf5adc8016 db_bench was not correctly initializing the value for delete_obsolete_files_period_micros option.
Summary:
The parameter delete_obsolete_files_period_micros controls the
periodicity of deleting obsolete files. db_bench was reading in
this parameter into a local variable called 'l' but was incorrectly
using another local variable called 'n' while setting it in the
db.options data structure.
This patch also logs the value of delete_obsolete_files_period_micros
in the LOG file at db startup time.

I am hoping that this will improve the overall write throughput drastically.

Test Plan: run db_bench

Reviewers: MarkCallaghan, heyongqiang

Reviewed By: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D6099
2012-10-19 15:10:12 -07:00
Dhruba Borthakur
1ca0584345 This is the mega-patch multi-threaded compaction
published in https://reviews.facebook.net/D5997.

Summary:
This patch allows compaction to occur in multiple background threads
concurrently.

If a manual compaction is issued, the system falls back to a
single-compaction-thread model. This is done to ensure correctness
and simplicity of code. When the manual compaction is finished,
the system resumes its concurrent-compaction mode automatically.

The updates to the manifest are done via a group-commit approach.

Test Plan: run db_bench
2012-10-19 14:00:53 -07:00
Dhruba Borthakur
aa73538f2a The deletion of obsolete files should not occur very frequently.
Summary:
The method DeleteObsoleteFiles is a very costly method, especially
when the number of files in a system is large. It makes a list of
all live-files and then scans the directory to compute the diff.
By default, this method is executed after every compaction run.

This patch makes it such that DeleteObsoleteFiles is never
invoked twice within a configured period.

Test Plan: run all unit tests

Reviewers: heyongqiang, MarkCallaghan

Reviewed By: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D6045
2012-10-16 10:26:10 -07:00
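
The throttling in commit aa73538f2a amounts to remembering when the expensive scan last ran and skipping it if the configured period has not yet elapsed. A small Python sketch, assuming a hypothetical `ObsoleteFileCleaner` class and an injected clock so the behaviour is easy to see:

```python
class ObsoleteFileCleaner:
    def __init__(self, period_micros, now_micros):
        self.period_micros = period_micros
        self.now_micros = now_micros  # injected clock returning microseconds
        self.last_run = None
        self.runs = 0

    def maybe_delete_obsolete_files(self):
        """Run the scan unless it already ran within the configured period."""
        now = self.now_micros()
        if self.last_run is not None and now - self.last_run < self.period_micros:
            return False
        self.last_run = now
        self.runs += 1  # the costly live-file diff would happen here
        return True


clock = iter([0, 100, 700])
cleaner = ObsoleteFileCleaner(period_micros=500, now_micros=lambda: next(clock))
results = [cleaner.maybe_delete_obsolete_files() for _ in range(3)]
```
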
Dhruba Borthakur
0230866791 Enhance db_bench to allow setting the number of levels in a database.
Summary: Enhance db_bench to allow setting the number of levels in a database.

Test Plan: run db_bench and look at LOG

Reviewers: heyongqiang, MarkCallaghan

Reviewed By: MarkCallaghan

CC: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D6027
2012-10-15 10:18:49 -07:00
Dhruba Borthakur
c1006d4276 A configurable option to write data using write instead of mmap.
Summary:
We have seen that reading data via the pread call (instead of
mmap) is much faster on Linux 2.6.x kernels. This patch makes
an equivalent option to switch off mmaps for the write path
as well.

db_bench --mmap_write=0 will use write() instead of mmap() to
write data to a file.

This change is backward compatible, the default
option is to continue using mmap for writing to a file.

Test Plan: "make check all"

Differential Revision: https://reviews.facebook.net/D5781
2012-10-03 17:08:13 -07:00
Mark Callaghan
e678a5947a Add --stats_interval option to db_bench
Summary:
The option is zero by default and in that case reporting is unchanged.
Unchanged means that the interval at which stats are reported is scaled after each
report and a newline is not issued after each report, so one line is rewritten.
When non-zero it specifies the constant interval (in operations) at which
statistics are reported and the stats include the rate per interval. This
makes it easier to determine whether QPS changes over the duration of the test.

Test Plan:
run db_bench

Reviewers: dhruba

Reviewed By: dhruba

CC: heyongqiang

Differential Revision: https://reviews.facebook.net/D5817
2012-10-03 09:54:33 -07:00
Mark Callaghan
d8763abecd Fix the bounds check for the --readwritepercent option
Summary:
see above

Test Plan:
run db_bench with invalid value for option

Reviewers: dhruba

Reviewed By: dhruba

CC: heyongqiang

Differential Revision: https://reviews.facebook.net/D5823
2012-10-03 09:52:26 -07:00
Mark Callaghan
98804f914f Fix compiler warnings and errors in ldb.c
Summary:
stdlib.h is needed for exit()
--readhead --> --readahead

Test Plan:
compile

Reviewers: dhruba

Reviewed By: dhruba

CC: heyongqiang

Differential Revision: https://reviews.facebook.net/D5805
2012-10-03 06:46:59 -07:00
Abhishek Kona
fec81318b0 Commandline tool to compact LevelDB databases.
Summary:
A simple CLI which calls DB->CompactRange().
Can take string keys as the range.

Test Plan:
Inserted data into a table.
Waited for a minute, used compact tool on it. File modification times
changed so Compact did something on the files.

Existing unit tests work.

Reviewers: heyongqiang, dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D5697
2012-10-01 10:49:19 -07:00
Dhruba Borthakur
c1bb32e1ba Trigger read compaction only if seeks to storage are incurred.
Summary:
In the current code, a Get() call can trigger compaction if it has to look at more than one file. This causes unnecessary compaction because looking at more than one file is a penalty only if the file is not yet in the cache. Also, the current code counts these files before the bloom filter check is applied.

This patch counts a 'seek' only if the file fails the bloom filter
check and has to read in data block(s) from the storage.

This patch also counts a 'seek' if a file is not present in the file-cache, because opening a file means that its index blocks need to be read into cache.

Test Plan: unit test attached. I will probably add one more unit test.

Reviewers: heyongqiang

Reviewed By: heyongqiang

CC: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D5709
2012-09-28 11:10:52 -07:00
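
One reading of the rule in commit c1bb32e1ba is sketched below: a lookup is charged a seek when the file had to be opened, or when the bloom filter could not rule the file out and a data block had to come from storage. This is an interpretation in Python with hypothetical parameter names, not the actual Get() code path.

```python
def charge_seek(file_in_table_cache, bloom_may_contain, block_in_block_cache):
    """Decide whether one file lookup counts as a seek toward read compaction."""
    if not file_in_table_cache:
        return True   # opening the file reads its index blocks from storage
    if not bloom_may_contain:
        return False  # filter rules the file out, so no data block is read
    return not block_in_block_cache  # a seek only if the block comes from storage


cold_file = charge_seek(False, False, True)
filtered_out = charge_seek(True, False, False)
storage_read = charge_seek(True, True, False)
cached_read = charge_seek(True, True, True)
```
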
Dhruba Borthakur
24eea931ef If ReadCompaction is switched off, then it is better to not even submit background compaction jobs.
Summary:
If ReadCompaction is switched off, then it is better to not even
submit background compaction jobs. I see about 3% increase in
read-throughput on a pure memory database.

Test Plan: run db_bench

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5673
2012-09-25 11:07:01 -07:00
Dhruba Borthakur
ae36e509f8 The BackupAPI should also list the length of the manifest file.
Summary:
The GetLiveFiles() api lists the set of sst files and the current
MANIFEST file. But the database continues to append new data to the
MANIFEST file even when the application is backing it up to the
backup location. This means that the database-version that is
stored in the MANIFEST FILE in the backup location
does not correspond to the sst files returned by GetLiveFiles.

This API adds a new parameter to GetLiveFiles. This new parameter
returns the current size of the MANIFEST file.

Test Plan: Unit test attached.

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5631
2012-09-25 03:13:25 -07:00
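
The reason commit ae36e509f8 reports the MANIFEST length is that a backup should copy only the bytes that existed when the file list was taken, even if the database keeps appending afterwards. A Python sketch of that copy step, with a hypothetical `copy_prefix` helper and made-up file contents:

```python
import os
import tempfile


def copy_prefix(src, dst, length):
    """Copy only the first `length` bytes of src to dst."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.write(fin.read(length))


with tempfile.TemporaryDirectory() as tmp:
    manifest = os.path.join(tmp, "MANIFEST-000002")
    backup = os.path.join(tmp, "MANIFEST-000002.bak")
    with open(manifest, "wb") as f:
        f.write(b"version-1;")
    manifest_size = os.path.getsize(manifest)  # size reported with the file list
    with open(manifest, "ab") as f:
        f.write(b"version-2;")                 # database keeps appending
    copy_prefix(manifest, backup, manifest_size)
    with open(backup, "rb") as f:
        backed_up = f.read()
```
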
Dhruba Borthakur
bb2dcd2457 Segfault in DoCompactionWork caused by buffer overflow
Summary:
The code was allocating 200 bytes on the stack but it
writes 256 bytes into the array.

x8a8ea5 std::_Rb_tree<>::erase()
    @     0x7f134bee7eb0 (unknown)
    @           0x8a8ea5 std::_Rb_tree<>::erase()
    @           0x8a35d6 leveldb::DBImpl::CleanupCompaction()
    @           0x8a7810 leveldb::DBImpl::BackgroundCompaction()
    @           0x8a804d leveldb::DBImpl::BackgroundCall()
    @           0x8c4eff leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper()
    @     0x7f134b3c010d start_thread
    @     0x7f134bf9f10d clone

Test Plan: run db_bench with overwrite option

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5595
2012-09-21 10:55:38 -07:00
Dhruba Borthakur
fb4b381a0c Print out the compile version in the LOG.
Summary: Print out the compile version in the LOG.

Test Plan: run db_bench and verify LOG

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5529
2012-09-18 13:24:32 -07:00
heyongqiang
a8464ed820 add an option to disable seek compaction
Summary:
as subject. This diff should be good for benchmarking.

will send another diff to make it better in the case where seek compaction is enabled.
In that coming diff, will not count a seek if the bloom filter filters the key out.

Test Plan: build

Reviewers: dhruba, MarkCallaghan

Reviewed By: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D5481
2012-09-17 13:59:57 -07:00
Dhruba Borthakur
ba55d77b5d Ability to take a file-level snapshot from leveldb.
Summary:
A set of apis that allows an application to backup data from the
leveldb database based on a set of files.

Test Plan: unit test attached. more coming soon.

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5439
2012-09-17 09:14:50 -07:00
heyongqiang
b85cdca690 add a global var leveldb::useMmapRead to enable mmap
Summary:
as subject. this can be used for benchmarking.
If we want it for some cases, we can do more changes to make this part of the option.

Test Plan: db_test

Reviewers: dhruba

CC: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D5451
2012-09-16 22:07:35 -07:00
heyongqiang
dcbd6be340 remove boost
Summary: as subject

Test Plan: build

Reviewers: dhruba

Differential Revision: https://reviews.facebook.net/D5469
2012-09-16 19:33:43 -07:00
Mark Callaghan
fa29f82548 scan a long for FLAGS_cache_size to fix a compiler warning
Summary:
FLAGS_cache_size is a long, no need to scan %lld into a size_t
for it (which generates a compiler warning)

Test Plan: run db_bench

Reviewers: dhruba, heyongqiang

Reviewed By: heyongqiang

CC: heyongqiang

Differential Revision: https://reviews.facebook.net/D5427
2012-09-14 12:45:42 -07:00
Mark Callaghan
837113908c Add --compression_type=X option with valid values: snappy (default) none bzip2 zlib
Summary:
This adds an option to db_bench to specify the compression algorithm to
use for LevelDB

Test Plan: ran db_bench

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D5421
2012-09-14 12:28:21 -07:00
Dhruba Borthakur
93f4952089 Ability to switch off filesystem read-aheads
Summary:
Ability to switch off filesystem read-aheads. This change is
backward-compatible: the default setting is to allow file
system read-aheads.

Test Plan: run benchmarks

Reviewers: heyongqiang, adsharma

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5391
2012-09-13 12:09:56 -07:00
Dhruba Borthakur
7ecc5d4ad5 Enable db_bench to specify block size.
Summary: Enable db_bench to specify block size.

Test Plan: compile and run

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5373
2012-09-13 10:22:43 -07:00
Dhruba Borthakur
407727b75f Fix compiler warnings. Use uint64_t instead of uint.
Summary: Fix compiler warnings. Use uint64_t instead of uint.

Test Plan: build using -Wall

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5355
2012-09-12 14:42:36 -07:00
heyongqiang
0f43aa474e put log in a separate dir
Summary: added a new option db_log_dir, which points to the log dir. Inside that dir, in order to make log names unique, the log file name is prefixed with the leveldb data dir absolute path.

Test Plan: db_test

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D5205
2012-09-06 17:52:08 -07:00
Dhruba Borthakur
536ca698ba The ReadRandomWriteRandom was always looping FLAGS_num of times.
Summary: If neither reads nor writes are specified by the user, then pick FLAGS_num as the number of iterations in the ReadRandomWriteRandom test. If either reads or writes are defined, then use their maximum.

Test Plan: run benchmark

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5217
2012-09-06 09:13:24 -07:00
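
The iteration rule from commit 536ca698ba can be written down in a few lines. In this Python sketch `None` stands for "not specified by the user", which is an assumption; how db_bench actually represents an unset flag is not stated here.

```python
def num_iterations(num, reads=None, writes=None):
    """Use num if neither reads nor writes is given, else the larger of the two."""
    if reads is None and writes is None:
        return num
    return max(reads or 0, writes or 0)


default_count = num_iterations(1000)
reads_only = num_iterations(1000, reads=50)
both = num_iterations(1000, reads=50, writes=200)
```
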
Dhruba Borthakur
94208a7881 Benchmark with both reads and writes at the same time.
Summary:
This patch enables the db_bench benchmark to issue both random reads and random writes at the same time. This option can be triggered via
./db_bench --benchmarks=readrandomwriterandom

The default percentage of reads is 90.

One can change the percentage of reads by specifying --readwritepercent.
./db_bench --benchmarks=readrandomwriterandom --readwritepercent=50

This is a feature request from Jeffro asking for leveldb performance with a 90:10 read:write ratio.

Test Plan: run on test machine.

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5067
2012-09-04 12:06:26 -07:00
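
A mixed read/write benchmark loop like the one in commit 94208a7881 can be sketched by picking the operation per iteration from a read percentage. The Python below is illustrative: the function name and the seeded generator are assumptions, and it only counts operations instead of touching a database.

```python
import random


def run_mixed(iterations, read_percent=90, seed=0):
    """Return (reads, writes) issued over `iterations` operations."""
    rng = random.Random(seed)
    reads = writes = 0
    for _ in range(iterations):
        if rng.randrange(100) < read_percent:
            reads += 1   # a random read would be issued here
        else:
            writes += 1  # a random write would be issued here
    return reads, writes


reads, writes = run_mixed(1000)
```
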
Dhruba Borthakur
fe93631678 Clean up compiler warnings generated by -Wall option.
Summary:
Clean up compiler warnings generated by -Wall option.
make clean all OPT=-Wall

This is a pre-requisite before making a new release.

Test Plan: compile and run unit tests

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5019
2012-08-29 14:24:51 -07:00
Dhruba Borthakur
e5fe80e4e3 The sharding of the block cache is limited to 2**20 pieces.
Summary:
The number of shards that the block cache is divided into is
configurable. However, if the user asks for the block cache to be
divided into more than 2**20 pieces, then the system will try to
allocate a huge array of that size, which could fail.

It is better to limit the sharding of the block cache to an
upper bound. The default sharding is 16 shards (i.e. 2**4)
and the maximum is now about 1 million shards (i.e. 2**20).

Also, fixed a bug with the LRUCache where the numShardBits
should be a private member of the LRUCache object rather than
a static variable.
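
A minimal sketch of both points in this summary: the shard-bit count is capped at 20, and it is stored per object instead of in a static variable. The class name ShardedCacheSketch is hypothetical, and silently clamping an oversized request (rather than rejecting it) is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kMaxShardBits = 20;  // upper bound named in the summary

// Hypothetical stand-in for the sharded LRU cache; shards are placeholders.
class ShardedCacheSketch {
 public:
  explicit ShardedCacheSketch(int requested_bits)
      : num_shard_bits_(std::min(std::max(requested_bits, 0), kMaxShardBits)),
        shards_(std::size_t{1} << num_shard_bits_) {}

  std::size_t NumShards() const { return shards_.size(); }

  // The top num_shard_bits_ bits of a 32-bit hash select the shard.
  uint32_t ShardFor(uint32_t hash) const {
    return num_shard_bits_ > 0 ? hash >> (32 - num_shard_bits_) : 0;
  }

 private:
  int num_shard_bits_;       // per-object member, so caches do not interfere
  std::vector<int> shards_;  // placeholder for the real per-shard state
};
```

Because the bit count is an instance member, two caches created with different shard counts each keep their own value.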

Test Plan:
run db_bench with --cache_numshardbits=64.

Task ID: #

Blame Rev:

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D5013
2012-08-29 12:17:59 -07:00
heyongqiang
a4f9b8b49e merge 1.5
Summary:

as subject

Test Plan:

db_test table_test

Reviewers: dhruba
2012-08-28 11:43:33 -07:00
heyongqiang
6fee5a74f5 Do not spin in a tight loop attempting compactions if there is a compaction error
Summary: as subject. ported the change from google code leveldb 1.5

Test Plan: run db_test

Reviewers: dhruba

Differential Revision: https://reviews.facebook.net/D4839
2012-08-28 11:43:33 -07:00
heyongqiang
935fdd030b fix filename_test
Summary: as subject

Test Plan: run filename_test

Reviewers: dhruba

Differential Revision: https://reviews.facebook.net/D4965
2012-08-28 11:42:42 -07:00
heyongqiang
690bf88682 in db_stats_logger.cc, hold mutex_ while accessing versions_
Summary:

as subject

Test Plan: db_test

Reviewers: dhruba
2012-08-28 11:29:30 -07:00
heyongqiang
d3759ca121 fix db_test error with scribe logger turned on
Summary: as subject

Test Plan: db_test

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D4929
2012-08-28 11:22:58 -07:00
Dhruba Borthakur
fc20273e73 Introduce a new method Env->Fsync() that issues fsync (instead of fdatasync).
Summary:
Introduce a new method Env->Fsync() that issues fsync (instead of fdatasync).
This is needed for data durability when running on ext3 filesystems.
Added options to the benchmark db_bench to generate performance numbers
with either fsync or fdatasync enabled.
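
For illustration, the choice between the two calls can be sketched with plain POSIX. fsync flushes file data and all metadata, while fdatasync may skip metadata that is not needed to read the data back. The helper WriteAndSync is hypothetical and does not reflect the actual Env interface; it assumes a platform such as Linux where fdatasync is available.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <string>

// Hypothetical helper: writes data to path and makes it durable with either
// fsync or fdatasync. Returns true only if every step succeeded.
bool WriteAndSync(const std::string& path, const std::string& data,
                  bool use_fsync) {
  int fd = open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) return false;
  bool ok = write(fd, data.data(), data.size()) ==
            static_cast<ssize_t>(data.size());
  if (ok) {
    // The only difference between the two modes is which syscall is issued.
    ok = (use_fsync ? fsync(fd) : fdatasync(fd)) == 0;
  }
  return close(fd) == 0 && ok;
}
```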

Cleaned up Makefile to build leveldb_shell only when building the thrift
leveldb server.

Test Plan: build and run benchmark

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D4911
2012-08-27 21:24:17 -07:00
heyongqiang
1de83cc2ac add more logs
Summary:
as subject

add a tool to read sst file

as subject.

./sst_reader --command=check --file=
./sst_reader --command=scan --file=

Test Plan:
db_test

run this command

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D4881
2012-08-24 15:20:49 -07:00
heyongqiang
1c99b0a6b3 add more logs
Summary:

as subject

Test Plan: db_test

Reviewers: dhruba
2012-08-24 15:18:43 -07:00
Dhruba Borthakur
f3ee54526f Utility to dump manifest contents.
Summary:
./manifest_dump --file=/tmp/dbbench/MANIFEST-000002

Output looks like

manifest_file_number 30 next_file_number 31 last_sequence 388082 log_number 28  prev_log_number 0
--- level 0 ---
--- level 1 ---
--- level 2 ---
 5:3244155['0000000000000000' @ 1 : 1 .. '0000000000028220' @ 28221 : 1]
 7:3244177['0000000000028221' @ 28222 : 1 .. '0000000000056441' @ 56442 : 1]
 9:3244156['0000000000056442' @ 56443 : 1 .. '0000000000084662' @ 84663 : 1]
 11:3244178['0000000000084663' @ 84664 : 1 .. '0000000000112883' @ 112884 : 1]
 13:3244158['0000000000112884' @ 112885 : 1 .. '0000000000141104' @ 141105 : 1]
 15:3244176['0000000000141105' @ 141106 : 1 .. '0000000000169325' @ 169326 : 1]
 17:3244156['0000000000169326' @ 169327 : 1 .. '0000000000197546' @ 197547 : 1]
 19:3244178['0000000000197547' @ 197548 : 1 .. '0000000000225767' @ 225768 : 1]
 21:3244155['0000000000225768' @ 225769 : 1 .. '0000000000253988' @ 253989 : 1]
 23:3244179['0000000000253989' @ 253990 : 1 .. '0000000000282209' @ 282210 : 1]
 25:3244157['0000000000282210' @ 282211 : 1 .. '0000000000310430' @ 310431 : 1]
 27:3244176['0000000000310431' @ 310432 : 1 .. '0000000000338651' @ 338652 : 1]
 29:3244156['0000000000338652' @ 338653 : 1 .. '0000000000366872' @ 366873 : 1]
--- level 3 ---
--- level 4 ---
--- level 5 ---
--- level 6 ---

Test Plan: run on test directory created by dbbench

Reviewers: heyongqiang

Reviewed By: heyongqiang

CC: hustliubo

Differential Revision: https://reviews.facebook.net/D4743
2012-08-24 15:17:09 -07:00
Dhruba Borthakur
e5a7c8e580 Log the open-options to the LOG.
Summary: Log the open-options to the LOG. Use options_ instead of options because SanitizeOptions could modify the max_file_open limit.

Test Plan: run db_bench

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D4833
2012-08-22 12:22:12 -07:00
heyongqiang
21082fa13c regression for trigger compaction logic
Summary: as subject

Test Plan: manually run db_bench confirmed

Reviewers: dhruba

Differential Revision: https://reviews.facebook.net/D4809
2012-08-21 18:11:21 -07:00
Dhruba Borthakur
a098207c95 Fixed unit test c_test by initializing logger=NULL.
Summary:
Fixed unit test c_test by initializing logger=NULL.

Removed "atomic" from last_log_ts so that unit tests do not require C11 compiler.
Anyway, last_log_ts is mostly used for logging, so it is ok if it is loosely
accurate.

Test Plan: run c_test

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D4803
2012-08-21 17:10:29 -07:00
Dhruba Borthakur
f4e7febf22 Record the version of the source repository that was used to build the leveldb library.
Summary: Record the version of the source that we are compiling. We keep a record of the git revision in util/version.cc. This source file is then built as a regular source file as part of the compilation process. One can run "strings executable_filename | grep _build_" to find the version of the source that we used to build the executable file.

Test Plan: none

Differential Revision: https://reviews.facebook.net/D4785
2012-08-21 14:47:15 -07:00
heyongqiang
6ba1f17789 adding a scribe logger in leveldb to log leveldb deploy stats
Summary:
as subject.

A new log is written to scribe via thrift client when a new db is opened and when there is
a compaction.

a new option var scribe_log_db_stats is added.

Test Plan: manually checked using command "ptail -time 0 leveldb_deploy_stats"

Reviewers: dhruba

Differential Revision: https://reviews.facebook.net/D4659
2012-08-21 11:43:22 -07:00
Dhruba Borthakur
e56b2c5a31 Prevent concurrent multiple opens of leveldb database.
Summary:
The fcntl call cannot detect lock conflicts when invoked multiple times
from the same thread.
Use a static lockedFiles set to record the paths that are locked.
A lock-file request checks whether this filename already exists in
lockedFiles; if so, it triggers an error. Otherwise, it inserts
the filename into the lockedFiles set.
An unlock-file request verifies that the filename is in the lockedFiles
set and removes it from the set.
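
The bookkeeping described in this summary can be sketched as a mutex-protected set of paths. POSIX record locks taken with fcntl are owned by the process, so a second request from the same process does not conflict, which is why the extra table is needed. The class name LockTableSketch is hypothetical, and an instance is used here instead of the static set mentioned above to keep the example self-contained.

```cpp
#include <mutex>
#include <set>
#include <string>

// Hypothetical in-process table of locked file names.
class LockTableSketch {
 public:
  // Returns false if fname is already locked by this process.
  bool Lock(const std::string& fname) {
    std::lock_guard<std::mutex> guard(mu_);
    return locked_files_.insert(fname).second;
  }

  // Returns false if fname was not locked in the first place.
  bool Unlock(const std::string& fname) {
    std::lock_guard<std::mutex> guard(mu_);
    return locked_files_.erase(fname) == 1;
  }

 private:
  std::mutex mu_;                       // guards locked_files_
  std::set<std::string> locked_files_;  // paths currently locked
};
```

A caller would consult the table first and take the fcntl lock only when Lock() succeeds, then release both in the reverse order.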

Test Plan: unit test attached

Reviewers: heyongqiang

Reviewed By: heyongqiang

Differential Revision: https://reviews.facebook.net/D4755
2012-08-20 23:55:04 -07:00
heyongqiang
deb1a1fa9b add disable wal to db_bench
Summary:
as subject.

./db_bench --benchmarks=fillrandom --num=1000000 --disable_data_sync=1 --write_buffer_size=50000000 --target_file_size_base=100000000 --disable_wal=1

LevelDB:    version 1.4
Date:       Sun Aug 19 16:01:59 2012
CPU:        8 * Intel(R) Xeon(R) CPU           L5630  @ 2.13GHz
CPUCache:   12288 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
------------------------------------------------
fillrandom   :       4.591 micros/op 217797 ops/sec;   24.1 MB/s

./db_bench --benchmarks=fillrandom --num=1000000 --disable_data_sync=1 --write_buffer_size=50000000 --target_file_size_base=100000000

LevelDB:    version 1.4
Date:       Sun Aug 19 16:02:54 2012
CPU:        8 * Intel(R) Xeon(R) CPU           L5630  @ 2.13GHz
CPUCache:   12288 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
------------------------------------------------
fillrandom   :       3.696 micros/op 270530 ops/sec;   29.9 MB/s

Test Plan: db_bench

Reviewers: dhruba

Differential Revision: https://reviews.facebook.net/D4767
2012-08-19 22:37:51 -07:00
Dhruba Borthakur
2aa514ec8c Utility to dump manifest contents.
Summary:
./manifest_dump --file=/tmp/dbbench/MANIFEST-000002

Output looks like

manifest_file_number 30 next_file_number 31 last_sequence 388082 log_number 28  prev_log_number 0
--- level 0 ---
--- level 1 ---
--- level 2 ---
 5:3244155['0000000000000000' @ 1 : 1 .. '0000000000028220' @ 28221 : 1]
 7:3244177['0000000000028221' @ 28222 : 1 .. '0000000000056441' @ 56442 : 1]
 9:3244156['0000000000056442' @ 56443 : 1 .. '0000000000084662' @ 84663 : 1]
 11:3244178['0000000000084663' @ 84664 : 1 .. '0000000000112883' @ 112884 : 1]
 13:3244158['0000000000112884' @ 112885 : 1 .. '0000000000141104' @ 141105 : 1]
 15:3244176['0000000000141105' @ 141106 : 1 .. '0000000000169325' @ 169326 : 1]
 17:3244156['0000000000169326' @ 169327 : 1 .. '0000000000197546' @ 197547 : 1]
 19:3244178['0000000000197547' @ 197548 : 1 .. '0000000000225767' @ 225768 : 1]
 21:3244155['0000000000225768' @ 225769 : 1 .. '0000000000253988' @ 253989 : 1]
 23:3244179['0000000000253989' @ 253990 : 1 .. '0000000000282209' @ 282210 : 1]
 25:3244157['0000000000282210' @ 282211 : 1 .. '0000000000310430' @ 310431 : 1]
 27:3244176['0000000000310431' @ 310432 : 1 .. '0000000000338651' @ 338652 : 1]
 29:3244156['0000000000338652' @ 338653 : 1 .. '0000000000366872' @ 366873 : 1]
--- level 3 ---
--- level 4 ---
--- level 5 ---
--- level 6 ---

Test Plan: run on test directory created by dbbench

Reviewers: heyongqiang

Reviewed By: heyongqiang

CC: hustliubo

Differential Revision: https://reviews.facebook.net/D4743
2012-08-17 22:36:59 -07:00
heyongqiang
680e571c4c add compaction log
Summary:
add compaction summary to log

log looks like:

2012/08/17-18:18:32.557334 7fdcaa2bb700 Compaction summary: Base level 0, input file:[11 9 7 ],[]

Test Plan: tested via db_test

Reviewers: dhruba

Differential Revision: https://reviews.facebook.net/D4749
2012-08-17 19:29:39 -07:00
heyongqiang
20ee76bd34 use ts as suffix for LOG.old files
Summary: as subject and only maintain 10 log files.
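
A small sketch of the retention rule, kept free of file system calls so it is easy to follow. The names OldLogName and LogsToDelete, the "LOG.old.<ts>" format, and the use of a plain integer timestamp are assumptions for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical naming: an archived log gets its timestamp as a suffix.
std::string OldLogName(uint64_t ts) {
  return "LOG.old." + std::to_string(ts);
}

// Given the timestamps of all archived logs, returns the timestamps of the
// ones to delete so that only the `keep` newest remain.
std::vector<uint64_t> LogsToDelete(std::vector<uint64_t> timestamps,
                                   std::size_t keep = 10) {
  if (timestamps.size() <= keep) return {};
  std::sort(timestamps.begin(), timestamps.end());  // oldest first
  timestamps.resize(timestamps.size() - keep);      // drop the newest `keep`
  return timestamps;
}
```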

Test Plan: new test in db_test

Reviewers: dhruba

Differential Revision: https://reviews.facebook.net/D4731
2012-08-17 16:22:04 -07:00
heyongqiang
f16e393658 add more options to db_ben
Summary: as subject

Test Plan: run db_bench with new options

Reviewers: dhruba

Differential Revision: https://reviews.facebook.net/D4677
2012-08-15 17:42:33 -07:00
heyongqiang
fcb2ea4715 disable data sync option needs to be checked when doing level-0 dump
Summary: as subject

Test Plan: use db_bench

Reviewers: dhruba

Differential Revision: https://reviews.facebook.net/D4671
2012-08-15 16:39:02 -07:00
Dhruba Borthakur
c3096afd61 Introduce a new option disableDataSync for opening the database. If this is set to true, then the data written to newly created data files is not synced to disk, instead depending on the OS to flush dirty data to stable storage. This option is good for bulk
Test Plan:
manual tests

Task ID: #

Blame Rev:

Differential Revision: https://reviews.facebook.net/D4515
2012-08-03 15:23:53 -07:00
heyongqiang
22ee777f68 add flush interface to DB
Summary: as subject. The flush will flush everything in the db.

Test Plan: new test in db_test.cc

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D4029
2012-07-06 12:11:19 -07:00
heyongqiang
a347d4ac0d add disable WAL option
Summary: add disable WAL option

Test Plan: new testcase in db_test.cc

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D4011
2012-07-05 15:06:56 -07:00
heyongqiang
4e4b6812ff Make some variables configurable for each db instance
Summary:
Make configurable 'targetFileSize', 'targetFileSizeMultiplier',
'maxBytesForLevelBase', 'maxBytesForLevelMultiplier',
'expandedCompactionFactor', 'maxGrandParentOverlapFactor'

Test Plan: N/A

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D3801
2012-06-27 14:36:31 -07:00
Dhruba Borthakur
a35e574344 Make Leveldb save data into HDFS files. You have to set USE_HDFS in your environment variable to compile leveldb with HDFS support.
Test Plan: Run benchmark.

Differential Revision: https://reviews.facebook.net/D3549
2012-06-14 00:29:01 -07:00
Dhruba Borthakur
338939e5c1 Print log message when we are throttling writes.
Summary:
Added option --writes=xxx to specify the number of keys that we want to overwrite in the benchmark.

Task ID: #

Blame Rev:

Test Plan: Revert Plan:

Reviewers: adsharma

CC: sc

Differential Revision: https://reviews.facebook.net/D3465
2012-06-01 14:03:37 -07:00
Dhruba Borthakur
f50ece60c7 Fix table-cache size bug, gather table-cache statistics and prevent readahead done by fs.
Summary:
The db_bench test was not using the specified value for max-file-open. Fixed.

The fs readahead is switched off.

Gather statistics about the table cache and print them out at the end of the test run.

Test Plan: Revert Plan:

Reviewers: adsharma, sc

Reviewed By: adsharma

Differential Revision: https://reviews.facebook.net/D3441
2012-05-30 16:42:45 -07:00
Dhruba Borthakur
8f293b68a9 Support --bufferedio=[0,1] from db_bench. If bufferedio = 0, then the read code path clears the OS page cache after the IO is completed. The default remains as bufferedio=1
Summary:
Task ID: #

Blame Rev:

Test Plan: Revert Plan:

Differential Revision: https://reviews.facebook.net/D3429
2012-05-29 13:29:44 -07:00
Dhruba Borthakur
33a3c6ff6c Ability to make the benchmark issue a large number of IOs. This is helpful to populate many gigabytes of data for benchmarking at scale.
Summary:
Task ID: #

Blame Rev:

Test Plan: Revert Plan:

Differential Revision: https://reviews.facebook.net/D3333
2012-05-22 12:20:09 -07:00
Dhruba Borthakur
3b86a51cb1 Ability to switch on checksum verification from benchmark.
Summary:
Task ID: #

Blame Rev:

Test Plan: Revert Plan:

Differential Revision: https://reviews.facebook.net/D3309
2012-05-19 00:13:50 -07:00
Dhruba Borthakur
a2a0e358cb Add support to specify the number of shards for the Block cache. By default, the block cache is sharded into 16 parts.
Summary:
Task ID: #

Blame Rev:

Test Plan: Revert Plan:

Differential Revision: https://reviews.facebook.net/D3273
2012-05-16 17:23:49 -07:00
Dhruba Borthakur
37d0dcb9b1 Use the elapsed time (instead of the per-thread time) to compute ops/sec.
Summary:
Task ID: #

Blame Rev:

Test Plan: Revert Plan:

Differential Revision: https://reviews.facebook.net/D3147
2012-05-11 12:43:31 -07:00
Arun Sharma
90b2924fb2 skiplist: optimize for sequential insert pattern
Summary:
skiplist doesn't cache the location of the last insert and becomes
CPU bound when the input data has sequential keys.

Notes on thread safety: ::Insert() already requires external
synchronization. So this change is not making it any worse.
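
The idea can be illustrated on a much simpler structure. The sketch below uses a sorted singly linked list rather than a real skiplist, and the class name SortedListSketch is hypothetical; it only shows how caching the position of the last insert removes the search when keys arrive in increasing order.

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in for the skiplist: a sorted list that remembers where
// the previous insert landed. Like the skiplist's Insert(), it requires
// external synchronization.
class SortedListSketch {
 public:
  SortedListSketch() = default;
  SortedListSketch(const SortedListSketch&) = delete;
  SortedListSketch& operator=(const SortedListSketch&) = delete;
  ~SortedListSketch() {
    for (Node* n = head_.next; n != nullptr;) {
      Node* next = n->next;
      delete n;
      n = next;
    }
  }

  void Insert(int key) {
    // Start from the cached node when the new key sorts after it;
    // otherwise fall back to a scan from the head.
    Node* prev = (last_ != nullptr && last_->key < key) ? last_ : &head_;
    while (prev->next != nullptr && prev->next->key < key) {
      prev = prev->next;
      ++steps_;  // counts how many nodes the search had to walk
    }
    prev->next = new Node{key, prev->next};
    last_ = prev->next;
  }

  std::size_t steps() const { return steps_; }

  std::vector<int> Keys() const {
    std::vector<int> out;
    for (Node* n = head_.next; n != nullptr; n = n->next) out.push_back(n->key);
    return out;
  }

 private:
  struct Node {
    int key;
    Node* next;
  };
  Node head_{0, nullptr};  // sentinel; its key is never compared
  Node* last_ = nullptr;   // node created by the most recent insert
  std::size_t steps_ = 0;
};
```

With strictly increasing keys the search loop never advances, so each insert is constant time; out-of-order keys still work but pay for a scan from the head.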

Test Plan: skiplist_test

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D3129
2012-05-11 09:57:40 -07:00