Repairer documentation improvement.

Summary: Adding verbosity to existing comments.
Test Plan: None
Reviewers: sdong
CC: leveldb
Task ID: #6718960
Blame Rev:
2015-04-09 16:11:15 -07:00 · 2015-04-09 16:11:15 -07:00 · 697380f3d7
commit 697380f3d7
parent 2f66d7f925
1 changed file with 41 additions and 12 deletions
--- a/db/repair.cc
+++ b/db/repair.cc
@@ -7,24 +7,53 @@
 // Use of this source code is governed by a BSD-style license that can be
 // found in the LICENSE file. See the AUTHORS file for names of contributors.
 //
-// We recover the contents of the descriptor from the other files we find.
-// (1) Any log files are first converted to tables
-// (2) We scan every table to compute
-//     (a) smallest/largest for the table
-//     (b) largest sequence number in the table
-// (3) We generate descriptor contents:
-//      - log number is set to zero
-//      - next-file-number is set to 1 + largest file number we found
-//      - last-sequence-number is set to largest sequence# found across
-//        all tables (see 2c)
-//      - compaction pointers are cleared
-//      - every table file is added at level 0
+// Repairer does best effort recovery to recover as much data as possible after
+// a disaster without compromising consistency. It does not guarantee bringing
+// the database to a time consistent state.
+//
+// Repair process is broken into 4 phases:
+// (a) Find files
+// (b) Convert logs to tables
+// (c) Extract metadata
+// (d) Write Descriptor
+//
+// (a) Find files
+//
+// The repairer goes through all the files in the directory, and classifies them
+// based on their file name. Any file that cannot be identified by name will be
+// ignored.
+//
+// (b) Convert logs to tables
+//
+// Every log file that is active is replayed. All sections of the file where the
+// checksum does not match are skipped over. We intentionally give preference to
+// data consistency.
+//
+// (c) Extract metadata
+//
+// We scan every table to compute
+// (1) smallest/largest for the table
+// (2) largest sequence number in the table
+//
+// If we are unable to scan the file, then we ignore the table.
+//
+// (d) Write Descriptor
+//
+// We generate descriptor contents:
+//  - log number is set to zero
+//  - next-file-number is set to 1 + largest file number we found
+//  - last-sequence-number is set to largest sequence# found across
+//    all tables (see (c)(2))
+//  - compaction pointers are cleared
+//  - every table file is added at level 0
 //
 // Possible optimization 1:
 //   (a) Compute total size and use to pick appropriate max-level M
 //   (b) Sort tables by largest sequence# in the table
 //   (c) For each table: if it overlaps earlier table, place in level-0,
 //       else place in level-M.
+//   (d) We can provide options for time consistent recovery and unsafe recovery
+//       (ignore checksum failure when applicable)
 // Possible optimization 2:
 //   Store per-table metadata (smallest, largest, largest-seq#, ...)
 //   in the table's meta section to speed up ScanTable.
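
Illustrative sketch (not part of this commit): the following C++ fragment shows one way phases (a) and (d) described in the comment could look in isolation. It is a minimal sketch under stated assumptions, not the code in db/repair.cc. The names FileKind, ParsedName, ClassifyFileName, TableMeta, DescriptorFields and BuildDescriptorFields are hypothetical, and the file name patterns ("<digits>.log", "<digits>.sst", "MANIFEST-<digits>") are a simplified assumption about the naming scheme.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified set of file categories for phase (a).
enum class FileKind { kLog, kTable, kDescriptor, kUnknown };

struct ParsedName {
  FileKind kind;
  uint64_t number;
};

// Parses a non-empty, all-digit string. At most 19 digits are accepted,
// which always fits in uint64_t.
bool ParseNumber(const std::string& s, uint64_t* out) {
  if (s.empty() || s.size() > 19) return false;
  uint64_t value = 0;
  for (char c : s) {
    if (c < '0' || c > '9') return false;
    value = value * 10 + static_cast<uint64_t>(c - '0');
  }
  *out = value;
  return true;
}

// Phase (a): classify a file purely by its name. Anything that does not
// match an assumed pattern is reported as kUnknown so the caller can ignore it.
ParsedName ClassifyFileName(const std::string& name) {
  const std::string kManifestPrefix = "MANIFEST-";
  uint64_t number = 0;
  if (name.compare(0, kManifestPrefix.size(), kManifestPrefix) == 0) {
    if (ParseNumber(name.substr(kManifestPrefix.size()), &number)) {
      return {FileKind::kDescriptor, number};
    }
    return {FileKind::kUnknown, 0};
  }
  const std::size_t dot = name.rfind('.');
  if (dot == std::string::npos || !ParseNumber(name.substr(0, dot), &number)) {
    return {FileKind::kUnknown, 0};
  }
  const std::string suffix = name.substr(dot);
  if (suffix == ".log") return {FileKind::kLog, number};
  if (suffix == ".sst") return {FileKind::kTable, number};
  return {FileKind::kUnknown, 0};
}

// Hypothetical per-table result of phase (c).
struct TableMeta {
  uint64_t file_number;
  uint64_t max_sequence;
};

// Hypothetical subset of the descriptor contents written in phase (d).
struct DescriptorFields {
  uint64_t log_number;
  uint64_t next_file_number;
  uint64_t last_sequence;
};

// Phase (d): log number is zero, next-file-number is 1 + the largest file
// number seen, last-sequence is the largest sequence number across tables.
DescriptorFields BuildDescriptorFields(
    const std::vector<uint64_t>& all_file_numbers,
    const std::vector<TableMeta>& tables) {
  uint64_t largest_file = 0;
  for (uint64_t n : all_file_numbers) largest_file = std::max(largest_file, n);
  uint64_t largest_seq = 0;
  for (const TableMeta& t : tables) {
    largest_seq = std::max(largest_seq, t.max_sequence);
  }
  return {0, largest_file + 1, largest_seq};
}
```

In this sketch a caller would run ClassifyFileName over every directory entry, skip kUnknown results, and pass the collected file numbers and scanned table metadata to BuildDescriptorFields. Clearing compaction pointers and adding every table at level 0 are left out for brevity.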