80c663882a
Summary: First draft. Unit tests pass. Test Plan: unit tests attached Reviewers: heyongqiang Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D3969 |
||
---|---|---|
.. | ||
test | ||
Decoder-inl.h | ||
Decoder.h | ||
Encoder-inl.h | ||
Encoder.h | ||
intern_table.thrift | ||
InternTable.h | ||
README | ||
Schema-inl.h | ||
Schema.h | ||
TARGETS | ||
Utils.h |
Neutronium: a very dense encoding of Thrift objects Neutronium is a Thrift encoding format optimized for space at the expense of speed. It achieves high efficiency in a few ways: 1. It does not encode type and tag information. This is stored out of line in a Schema object, which must be provided at both encoding and decoding time. If encoding many Thrift objects, you can transmit / store the Schema only once. The Schema (in thrift/lib/thrift/reflection.thrift) is itself a Thrift object, and can be serialized / deserialized in the usual way. 2. It encodes data very compactly. Bytes use one byte each; larger numbers (i16, i32, i64, double) use variable-length encoding (GroupVarint). Booleans use one bit each. We also encode one bit for every optional field that exists in the structure definition (indicating whether the field is set or not). Strings can be encoded in a variety of formats, see below. 3. Aggregates (lists, maps, sets) are encoded efficiently -- they are encoded like a structure with a variable number of fields. So list<i32> takes advantage of GroupVarint encoding among consecutive values. 4. Strings can be interned: when encoding multiple strings, we can detect duplicates and store only an ID. This requires using an InternTable and passing the same InternTable to the encoder and decoder (the InternTable can be easily serialized and deserialized). Neutronium is backwards compatible as long as the schema is identical from encoding to decoding, and the changes you made to the Thrift definition are backwards compatible (that is, fields were added, removed, or renamed, but field ids remained the same) Configuration: Neutronium can be configured by using field attributes in your Thrift definition: struct Foo { 1: i32 a (neutronium.fixed = 1), } Attributes for number fields: neutronium.fixed = 1 Encode the number as a fixed-length value (i16 takes 2 bytes, i32 takes 4 bytes, i64 takes 8 bytes) instead of using Varint encoding Attributes for string fields: neutronium.fixed = <length> neutronium.pad = 'X' Do not encode the string length, assume that all strings have length <length>. Strings longer than <length> are truncated; strings shorter than <length> are padded with 'X' (default: the null byte, '\0'). Use this when you expect that all / most strings have a fixed length. neutronium.terminator = 'X' Do not encode the string length; store strings terminated with a terminator ('\0' will likely be a popular choice). Encoding strings that contain the terminator is an error. neutronium.intern = 1 Intern strings; requires a non-NULL InternTable. Attributes for enum fields: neutronium.strict = 1 Encode the enum using as few bits as necessary to encode all possible values; note that it becomes an error to encode an enum value that is not specified in the Thrift definition.