68 lines
2.8 KiB
Plaintext
Raw Normal View History

Neutronium: a very dense encoding of Thrift objects
Neutronium is a Thrift encoding format optimized for space at the expense of
speed. It achieves high efficiency in a few ways:
1. It does not encode type and tag information. This is stored out of line in
a Schema object, which must be provided at both encoding and decoding time.
If encoding many Thrift objects, you can transmit / store the Schema only
once. The Schema (in thrift/lib/thrift/reflection.thrift) is itself a
Thrift object, and can be serialized / deserialized in the usual way.
2. It encodes data very compactly. Bytes use one byte each; larger numbers
(i16, i32, i64, double) use variable-length encoding (GroupVarint). Booleans
use one bit each. We also encode one bit for every optional field that exists
in the structure definition (indicating whether the field is set or not).
Strings can be encoded in a variety of formats, see below.
3. Aggregates (lists, maps, sets) are encoded efficiently -- they are encoded
like a structure with a variable number of fields. So list<i32> takes
advantage of GroupVarint encoding among consecutive values.
4. Strings can be interned: when encoding multiple strings, we can detect
duplicates and store only an ID. This requires using an InternTable and
passing the same InternTable to the encoder and decoder (the InternTable
can be easily serialized and deserialized).
Neutronium is backwards compatible as long as the schema is identical from
encoding to decoding, and the changes you made to the Thrift definition are
backwards compatible (that is, fields were added, removed, or renamed, but
field ids remained the same)
Configuration:
Neutronium can be configured by using field attributes in your Thrift
definition:
struct Foo {
1: i32 a (neutronium.fixed = 1),
}
Attributes for number fields:
neutronium.fixed = 1
Encode the number as a fixed-length value (i16 takes 2 bytes, i32
takes 4 bytes, i64 takes 8 bytes) instead of using Varint encoding
Attributes for string fields:
neutronium.fixed = <length>
neutronium.pad = 'X'
Do not encode the string length, assume that all strings have length
<length>. Strings longer than <length> are truncated; strings
shorter than <length> are padded with 'X' (default: the null byte, '\0').
Use this when you expect that all / most strings have a fixed length.
neutronium.terminator = 'X'
Do not encode the string length; store strings terminated with a
terminator ('\0' will likely be a popular choice). Encoding strings that
contain the terminator is an error.
neutronium.intern = 1
Intern strings; requires a non-NULL InternTable.
Attributes for enum fields:
neutronium.strict = 1
Encode the enum using as few bits as necessary to encode all possible
values; note that it becomes an error to encode an enum value that is not
specified in the Thrift definition.