Motivation:
According to the spec:
All pseudo-header fields MUST appear in the header block before regular
header fields. Any request or response that contains a pseudo-header
field that appears in a header block after
a regular header field MUST be treated as malformed (Section 8.1.2.6).
Pseudo-header fields are only valid in the context in which they are defined.
Pseudo-header fields defined for requests MUST NOT appear in responses;
pseudo-header fields defined for responses MUST NOT appear in requests.
Pseudo-header fields MUST NOT appear in trailers.
Endpoints MUST treat a request or response that contains undefined or
invalid pseudo-header fields as malformed (Section 8.1.2.6).
Clients MUST NOT accept a malformed response. Note that these requirements
are intended to protect against several types of common attacks against HTTP;
they are deliberately strict because being permissive can expose
implementations to these vulnerabilities.
Modifications:
- Introduce validation in HPackDecoder
Result:
- Requests with unknown pseudo-field headers are rejected
- Requests with containing response specific pseudo-headers are rejected
- Requests where pseudo-header appear after regular header are rejected
- h2spec 8.1.2.1 pass
Motivation:
We used Recycler for the CodecOutputList which is not optimized for the use-case of access only from the same Thread all the time.
Modifications:
- Use FastThreadLocal for CodecOutputList
- Add benchmark
Result:
Less overhead in our codecs.
Motivation:
HttpMethod#valueOf shows up on profiler results in the top set of
results. Since it is a relatively simple operation it can be improved in
isolation.
Modifications:
- Introduce a special case map which assigns each HttpMethod to a unique
index in an array and provides constant time lookup from a hash code
algorithm. When the bucket is matched we can then directly do equality
comparison instead of potentially following a linked structure when
HashMap has hash collisions.
Result:
~10% improvement in benchmark results for HttpMethod#valueOf
Benchmark Mode Cnt Score Error Units
HttpMethodMapBenchmark.newMapKnownMethods thrpt 16 31.831 ± 0.928 ops/us
HttpMethodMapBenchmark.newMapMixMethods thrpt 16 25.568 ± 0.400 ops/us
HttpMethodMapBenchmark.newMapUnknownMethods thrpt 16 51.413 ± 1.824 ops/us
HttpMethodMapBenchmark.oldMapKnownMethods thrpt 16 29.226 ± 0.330 ops/us
HttpMethodMapBenchmark.oldMapMixMethods thrpt 16 21.073 ± 0.247 ops/us
HttpMethodMapBenchmark.oldMapUnknownMethods thrpt 16 49.081 ± 0.577 ops/us
Motivation:
DefaultHttp2FrameWriter#writeData allocates a DataFrameHeader for each write operation. DataFrameHeader maintains internal state and allocates multiple slices of a buffer which is a maximum of 30 bytes. This 30 byte buffer may not always be necessary and the additional slice operations can utilize retainedSlice to take advantage of pooled objects. We can also save computation and object allocations if there is no padding which is a common case in practice.
Modifications:
- Remove DataFrameHeader
- Add a fast path for padding == 0
Result:
Less object allocation in DefaultHttp2FrameWriter
Motivation:
1. Some encoders used a `ByteBuf#writeBytes` to write short constant byte array (2-3 bytes). This can be replaced with more faster `ByteBuf#writeShort` or `ByteBuf#writeMedium` which do not access the memory.
2. Two chained calls of the `ByteBuf#setByte` with constants can be replaced with one `ByteBuf#setShort` to reduce index checks.
3. The signature of method `HttpHeadersEncoder#encoderHeader` has an unnecessary `throws`.
Modifications:
1. Use `ByteBuf#writeShort` or `ByteBuf#writeMedium` instead of `ByteBuf#writeBytes` for the constants.
2. Use `ByteBuf#setShort` instead of chained call of the `ByteBuf#setByte` with constants.
3. Remove an unnecessary `throws` from `HttpHeadersEncoder#encoderHeader`.
Result:
A bit faster writes constants into buffers.
Motivation:
Right now HttpRequestEncoder does insertion of slash for url like http://localhost?pararm=1 before the question mark. It is done not effectively.
Modification:
Code:
new StringBuilder(len + 1)
.append(uri, 0, index)
.append(SLASH)
.append(uri, index, len)
.toString();
Replaced with:
new StringBuilder(uri)
.insert(index, SLASH)
.toString();
Result:
Faster HttpRequestEncoder. Additional small test. Attached benchmark in PR.
Benchmark Mode Cnt Score Error Units
HttpRequestEncoderInsertBenchmark.newEncoder thrpt 40 3704843.303 ± 98950.919 ops/s
HttpRequestEncoderInsertBenchmark.oldEncoder thrpt 40 3284236.960 ± 134433.217 ops/s
Motivation:
The internal.hpack classes are no longer exposed in our public APIs and can be made package private in the http2 package.
Modifications:
- Make the hpack classes package private in the http2 package
Result:
Less APIs exposed as public.
Motivation:
* DefaultHeaders from netty-codec has some duplicated logic for header date parsing
* Several classes keep on using deprecated HttpHeaderDateFormat
Modifications:
* Move HttpHeaderDateFormatter to netty-codec and rename it into HeaderDateFormatter
* Make DefaultHeaders use HeaderDateFormatter
* Replace HttpHeaderDateFormat usage with HeaderDateFormatter
Result:
Faster and more consistent code
Motivation:
99dfc9ea799348430a1c25776ce30a95bc10a1ff introduced some code that will more frequently try to forward messages out of the list of decoded messages to reduce latency and memory footprint. Unfortunally this has the side-effect that RecycleableArrayList.clear() will be called more often and so introduce some overhead as ArrayList will null out the array on each call.
Modifications:
- Introduce a CodecOutputList which allows to not null out the array until we recycle it and also allows to access internal array with extra range checks.
- Add benchmark that add elements to different List implementations and clear them
Result:
Less overhead when decode / encode messages.
Benchmark (elements) Mode Cnt Score Error Units
CodecOutputListBenchmark.arrayList 1 thrpt 20 24853764.609 ± 161582.376 ops/s
CodecOutputListBenchmark.arrayList 4 thrpt 20 17310636.508 ± 930517.403 ops/s
CodecOutputListBenchmark.codecOutList 1 thrpt 20 26670751.661 ± 587812.655 ops/s
CodecOutputListBenchmark.codecOutList 4 thrpt 20 25166421.089 ± 166945.599 ops/s
CodecOutputListBenchmark.recyclableArrayList 1 thrpt 20 24565992.626 ± 210017.290 ops/s
CodecOutputListBenchmark.recyclableArrayList 4 thrpt 20 18477881.775 ± 157003.777 ops/s
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 246.748 sec - in io.netty.handler.codec.CodecOutputListBenchmark