80c663882a
Summary: First draft. Unit tests pass. Test Plan: unit tests attached Reviewers: heyongqiang Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D3969
68 lines
2.8 KiB
Plaintext
68 lines
2.8 KiB
Plaintext
Neutronium: a very dense encoding of Thrift objects
|
|
|
|
Neutronium is a Thrift encoding format optimized for space at the expense of
|
|
speed. It achieves high efficiency in a few ways:
|
|
|
|
1. It does not encode type and tag information. This is stored out of line in
|
|
a Schema object, which must be provided at both encoding and decoding time.
|
|
If encoding many Thrift objects, you can transmit / store the Schema only
|
|
once. The Schema (in thrift/lib/thrift/reflection.thrift) is itself a
|
|
Thrift object, and can be serialized / deserialized in the usual way.
|
|
|
|
2. It encodes data very compactly. Bytes use one byte each; larger numbers
|
|
(i16, i32, i64, double) use variable-length encoding (GroupVarint). Booleans
|
|
use one bit each. We also encode one bit for every optional field that exists
|
|
in the structure definition (indicating whether the field is set or not).
|
|
Strings can be encoded in a variety of formats, see below.
|
|
|
|
3. Aggregates (lists, maps, sets) are encoded efficiently -- they are encoded
|
|
like a structure with a variable number of fields. So list<i32> takes
|
|
advantage of GroupVarint encoding among consecutive values.
|
|
|
|
4. Strings can be interned: when encoding multiple strings, we can detect
|
|
duplicates and store only an ID. This requires using an InternTable and
|
|
passing the same InternTable to the encoder and decoder (the InternTable
|
|
can be easily serialized and deserialized).
|
|
|
|
Neutronium is backwards compatible as long as the schema is identical from
|
|
encoding to decoding, and the changes you made to the Thrift definition are
|
|
backwards compatible (that is, fields were added, removed, or renamed, but
|
|
field ids remained the same)
|
|
|
|
Configuration:
|
|
|
|
Neutronium can be configured by using field attributes in your Thrift
|
|
definition:
|
|
|
|
struct Foo {
|
|
1: i32 a (neutronium.fixed = 1),
|
|
}
|
|
|
|
Attributes for number fields:
|
|
neutronium.fixed = 1
|
|
Encode the number as a fixed-length value (i16 takes 2 bytes, i32
|
|
takes 4 bytes, i64 takes 8 bytes) instead of using Varint encoding
|
|
|
|
Attributes for string fields:
|
|
neutronium.fixed = <length>
|
|
neutronium.pad = 'X'
|
|
Do not encode the string length, assume that all strings have length
|
|
<length>. Strings longer than <length> are truncated; strings
|
|
shorter than <length> are padded with 'X' (default: the null byte, '\0').
|
|
Use this when you expect that all / most strings have a fixed length.
|
|
|
|
neutronium.terminator = 'X'
|
|
Do not encode the string length; store strings terminated with a
|
|
terminator ('\0' will likely be a popular choice). Encoding strings that
|
|
contain the terminator is an error.
|
|
|
|
neutronium.intern = 1
|
|
Intern strings; requires a non-NULL InternTable.
|
|
|
|
Attributes for enum fields:
|
|
neutronium.strict = 1
|
|
Encode the enum using as few bits as necessary to encode all possible
|
|
values; note that it becomes an error to encode an enum value that is not
|
|
specified in the Thrift definition.
|
|
|