Motivation:
1. Some encoders used a `ByteBuf#writeBytes` to write short constant byte array (2-3 bytes). This can be replaced with more faster `ByteBuf#writeShort` or `ByteBuf#writeMedium` which do not access the memory.
2. Two chained calls of the `ByteBuf#setByte` with constants can be replaced with one `ByteBuf#setShort` to reduce index checks.
3. The signature of method `HttpHeadersEncoder#encoderHeader` has an unnecessary `throws`.
Modifications:
1. Use `ByteBuf#writeShort` or `ByteBuf#writeMedium` instead of `ByteBuf#writeBytes` for the constants.
2. Use `ByteBuf#setShort` instead of chained call of the `ByteBuf#setByte` with constants.
3. Remove an unnecessary `throws` from `HttpHeadersEncoder#encoderHeader`.
Result:
A bit faster writes constants into buffers.
Motivation:
Right now HttpRequestEncoder does insertion of slash for url like http://localhost?pararm=1 before the question mark. It is done not effectively.
Modification:
Code:
new StringBuilder(len + 1)
.append(uri, 0, index)
.append(SLASH)
.append(uri, index, len)
.toString();
Replaced with:
new StringBuilder(uri)
.insert(index, SLASH)
.toString();
Result:
Faster HttpRequestEncoder. Additional small test. Attached benchmark in PR.
Benchmark Mode Cnt Score Error Units
HttpRequestEncoderInsertBenchmark.newEncoder thrpt 40 3704843.303 ± 98950.919 ops/s
HttpRequestEncoderInsertBenchmark.oldEncoder thrpt 40 3284236.960 ± 134433.217 ops/s
Motivation:
A `StringUtil#escapeCsv` creates new `StringBuilder` on each value even if the same string is returned in the end.
Modifications:
Create new `StringBuilder` only if it really needed. Otherwise, return the original string (or just trimmed substring).
Result:
Less GC load. Up to 4x faster work for not changed strings.
Motivation:
When I run Netty micro benchmarks I get many warnings like:
WARNING: -Dio.netty.noResourceLeakDetection is deprecated. Use '-Dio.netty.leakDetection.level=simple' instead.
Modification:
-Dio.netty.noResourceLeakDetection replaced with -Dio.netty.leakDetection.level=disabled.
Result:
No warnings.
Motivation:
IPv4/6 validation methods use allocations, which can be avoided.
IPv4 parse method use StringTokenizer.
Modifications:
Rewriting IPv4/6 validation methods to avoid allocations.
Rewriting IPv4 parse method without use StringTokenizer.
Result:
IPv4/6 validation and IPv4 parsing faster up to 2-10x.
Motivation:
We currently don't have a native transport which supports kqueue https://www.freebsd.org/cgi/man.cgi?query=kqueue&sektion=2. This can be useful for BSD systems such as MacOS to take advantage of native features, and provide feature parity with the Linux native transport.
Modifications:
- Make a new transport-native-unix-common module with all the java classes and JNI code for generic unix items. This module will build a static library for each unix platform, and included in the dynamic libraries used for JNI (e.g. transport-native-epoll, and eventually kqueue).
- Make a new transport-native-unix-common-tests module where the tests for the transport-native-unix-common module will live. This is so each unix platform can inherit from these test and ensure they pass.
- Add a new transport-native-kqueue module which uses JNI to directly interact with kqueue
Result:
JNI support for kqueue.
Fixes https://github.com/netty/netty/issues/2448
Fixes https://github.com/netty/netty/issues/4231
Motivation:
Java9 added a new method to Unsafe which allows to allocate a byte[] without memset it. This can have a massive impact in allocation times when the byte[] is big. This change allows to enable this when using Java9 with the io.netty.tryAllocateUninitializedArray property when running Java9+. Please note that you will need to open up the jdk.internal.misc package via '--add-opens java.base/jdk.internal.misc=ALL-UNNAMED' as well.
Modifications:
Allow to allocate byte[] without memset on Java9+
Result:
Better performance when allocate big heap buffers and using java9.
Motivation:
There are two files that still use `system.out.println` to log their status
Modification:
Replace `system.out.println` with a `debug` function inside an instance of `InternalLoggerFactory`
Result:
Introduce an instance of `InternalLoggerFactory` in class `AbstractMicrobenchmark.java` and `AbstractSharedExecutorMicrobenchmark.java`
Motivation:
We currently don't have a benchmark which includes SslHandler. The SslEngine benchmarks also always include a single TLS packet when encoding/decoding. In practice when reading data from the network there may be multiple TLS packets present and we should expand the benchmarks to understand this use case.
Modifications:
- SslEngine benchmarks should include wrapping/unwrapping of multiple TLS packets
- Introduce SslHandler benchmarks which can also account for wrapping/unwrapping of multiple TLS packets
Result:
SslHandler and SslEngine benchmarks are more comprehensive.
Motivation:
The internal.hpack classes are no longer exposed in our public APIs and can be made package private in the http2 package.
Modifications:
- Make the hpack classes package private in the http2 package
Result:
Less APIs exposed as public.
Motivation:
Often its useful for the user to be able to get some stats about the memory allocated via an allocator.
Modifications:
- Allow to obtain the used heap and direct memory for an allocator
- Add test case
Result:
Fixes [#6341]
Motivation:
Issue [#6349] brought up the idea to not use UnpooledUnsafeNoCleanerDirectByteBuf by default. To decide what to do a benchmark is needed.
Modifications:
Add benchmarks for UnpooledUnsafeNoCleanerDirectByteBuf vs UnpooledUnsafeDirectB
yteBuf
Result:
Better idea about impact of using UnpooledUnsafeNoCleanerDirectByteBuf.
Motivation:
As we provide our own SSLEngine implementation we should have benchmarks to compare it against JDK impl.
Modifications:
Add benchmarks for wrap / unwrap and handshake performance.
Result:
Benchmarks FTW.
Motivation:
Allmost all our benchmarks are in src/main/java but a few are in src/test/java. We should make it consistent.
Modifications:
Move everything to src/main/java
Result:
Consistent code base.
Motivation:
64-byte alignment is recommended by the Intel performance guide (https://software.intel.com/en-us/articles/practical-intel-avx-optimization-on-2nd-generation-intel-core-processors) for data-structures over 64 bytes.
Requiring padding to a multiple of 64 bytes allows for using SIMD instructions consistently in loops without additional conditional checks. This should allow for simpler and more efficient code.
Modification:
At the moment cache alignment must be setup manually. But probably it might be taken from the system. The original code was introduced by @normanmaurer https://github.com/netty/netty/pull/4726/files
Result:
Buffer alignment works better than miss-align cache.
Motivation:
codec-http2 couples the dependency tree state with the remainder of the stream state (Http2Stream). This makes implementing constraints where stream state and dependency tree state diverge in the RFC challenging. For example the RFC recommends retaining dependency tree state after a stream transitions to closed [1]. Dependency tree state can be exchanged on streams in IDLE. In practice clients may use stream IDs for the purpose of establishing QoS classes and therefore retaining this dependency tree state can be important to client perceived performance. It is difficult to limit the total amount of state we retain when stream state and dependency tree state is combined.
Modifications:
- Remove dependency tree, priority, and weight related items from public facing Http2Connection and Http2Stream APIs. This information is optional to track and depends on the flow controller implementation.
- Move all dependency tree, priority, and weight related code from DefaultHttp2Connection to WeightedFairQueueByteDistributor. This is currently the only place which cares about priority. We can pull out the dependency tree related code in the future if it is generally useful to expose for other implementations.
- DefaultHttp2Connection should explicitly limit the number of reserved streams now that IDLE streams are no longer created.
Result:
More compliant with the HTTP/2 RFC.
Fixes https://github.com/netty/netty/issues/6206.
[1] https://tools.ietf.org/html/rfc7540#section-5.3.4
Motivation:
2fd42cfc6b fixed a bug related to encoding headers but it also introduced a throws statement onto the Http2FrameWriter methods which write headers. This throws statement makes the API more verbose and is not necessary because we can communicate the failure in the ChannelFuture that is returned by these methods.
Modifications:
- Remove throws from all Http2FrameWriter methods.
Result:
Http2FrameWriter APIs do not propagate checked exceptions.
Motivation:
Currently Netty does not wrap socket connect, bind, or accept
operations in doPrivileged blocks. Nor does it wrap cases where a dns
lookup might happen.
This prevents an application utilizing the SecurityManager from
isolating SocketPermissions to Netty.
Modifications:
I have introduced a class (SocketUtils) that wraps operations
requiring SocketPermissions in doPrivileged blocks.
Result:
A user of Netty can grant SocketPermissions explicitly to the Netty
jar, without granting it to the rest of their application.
Motivation:
If the HPACK Decoder detects that SETTINGS_MAX_HEADER_LIST_SIZE has been violated it aborts immediately and sends a RST_STREAM frame for what ever stream caused the issue. Because HPACK is stateful this means that the HPACK state may become out of sync between peers, and the issue won't be detected until the next headers frame. We should make a best effort to keep processing to keep the HPACK state in sync with our peer, or completely close the connection.
If the HPACK Encoder is configured to verify SETTINGS_MAX_HEADER_LIST_SIZE it checks the limit and encodes at the same time. This may result in modifying the HPACK local state but not sending the headers to the peer if SETTINGS_MAX_HEADER_LIST_SIZE is violated. This will also lead to an inconsistency in HPACK state that will be flagged at some later time.
Modifications:
- HPACK Decoder now has 2 levels of limits related to SETTINGS_MAX_HEADER_LIST_SIZE. The first will attempt to keep processing data and send a RST_STREAM after all data is processed. The second will send a GO_AWAY and close the entire connection.
- When the HPACK Encoder enforces SETTINGS_MAX_HEADER_LIST_SIZE it should not modify the HPACK state until the size has been checked.
- https://tools.ietf.org/html/rfc7540#section-6.5.2 states that the initial value of SETTINGS_MAX_HEADER_LIST_SIZE is "unlimited". We currently use 8k as a limit. We should honor the specifications default value so we don't unintentionally close a connection before the remote peer is aware of the local settings.
- Remove unnecessary object allocation in DefaultHttp2HeadersDecoder and DefaultHttp2HeadersEncoder.
Result:
Fixes https://github.com/netty/netty/issues/6209.
Motivation:
- Decoder#decodeULE128 has a bounds bug and cannot decode Integer.MAX_VALUE
- Decoder#decodeULE128 doesn't support values greater than can be represented with Java's int data type. This is a problem because there are cases that require at least unsigned 32 bits (max header table size).
- Decoder#decodeULE128 treats overflowing the data type and invalid input the same. This can be misleading when inspecting the error that is thrown.
- Encoder#encodeInteger doesn't support values greater than can be represented with Java's int data type. This is a problem because there are cases that require at least unsigned 32 bits (max header table size).
Modifications:
- Correct the above issues and add unit tests.
Result:
Fixes https://github.com/netty/netty/issues/6210.
Motivation:
* DefaultHeaders from netty-codec has some duplicated logic for header date parsing
* Several classes keep on using deprecated HttpHeaderDateFormat
Modifications:
* Move HttpHeaderDateFormatter to netty-codec and rename it into HeaderDateFormatter
* Make DefaultHeaders use HeaderDateFormatter
* Replace HttpHeaderDateFormat usage with HeaderDateFormatter
Result:
Faster and more consistent code
Motivation:
* RFC6265 defines its own parser which is different from RFC1123 (it accepts RFC1123 format but also other ones). Basically, it's very lax on delimiters, ignores day of week and timezone. Currently, ClientCookieDecoder uses HttpHeaderDateFormat underneath, and can't parse valid cookies such as Github ones whose expires attribute looks like "Sun, 27 Nov 2016 19:37:15 -0000"
* ServerSideCookieEncoder currently uses HttpHeaderDateFormat underneath for formatting expires field, and it's slow.
Modifications:
* Introduce HttpHeaderDateFormatter that correctly implement RFC6265
* Use HttpHeaderDateFormatter in ClientCookieDecoder and ServerCookieEncoder
* Deprecate HttpHeaderDateFormat
Result:
* Proper RFC6265 dates support
* Faster ServerCookieEncoder and ClientCookieDecoder
* Faster tool for handling headers such as "Expires" and "Date"
Motivation:
The SETTINGS_MAX_HEADER_LIST_SIZE limit, as enforced by the HPACK Encoder, should be a stream error and not apply to the whole connection.
Modifications:
Made the necessary changes for the exception to be of type StreamException.
Result:
A HEADERS frame exceeding the limit, only affects a specific stream.
Motivation:
The responsibility for retaining the settings values and enforcing the settings constraints is spread out in different areas of the code and may be initialized with different values than the default specified in the RFC. This should not be allowed by default and interfaces which are responsible for maintaining/enforcing settings state should clearly indicate the restrictions that they should only be set by the codec upon receipt of a SETTINGS ACK frame.
Modifications:
- Encoder, Decoder, and the Headers Encoder/Decoder no longer expose public constructors that allow the default settings to be changed.
- Http2HeadersDecoder#maxHeaderSize() exists to provide some bound when headers/continuation frames are being aggregated. However this is roughly the same as SETTINGS_MAX_HEADER_LIST_SIZE (besides the 32 byte octet for each header field) and can be used instead of attempting to keep the two independent values in sync.
- Encoding headers now enforces SETTINGS_MAX_HEADER_LIST_SIZE at the octect level. Previously the header encoder compared the number of header key/value pairs against SETTINGS_MAX_HEADER_LIST_SIZE instead of the number of octets (plus 32 bytes overhead).
- DefaultHttp2ConnectionDecoder#onData calls shouldIgnoreHeadersOrDataFrame but may swallow exceptions from this method. This means a STREAM_RST frame may not be sent when it should for an unknown stream and thus violate the RFC. The exception is no longer swallowed.
Result:
Default settings state is enforced and interfaces related to settings state are clarified.
Motivation:
the build doesnt seem to enforce this, so they piled up
Modifications:
removed unused import lines
Result:
less unused imports
Signed-off-by: radai-rosenblatt <radai.rosenblatt@gmail.com>
Motivation:
It'd be usually good to use the latest library version.
Modification:
Bumped JMH to the latest version as of today.
Result:
Now we use JMH version 1.14.1 for our benchmark.
Motivation:
The HTTP/2 HPACK Encoder class has some code which is only used for test purposes. This code can be removed to reduce complexity and member variable count.
Modifications:
- Remove test code and update unit tests
- Other minor cleanup
Result:
Test code is removed from operational code.
Motivation:
The HPACK decoder keeps state so that the decode method can be called multiple times with successive header fragments. This decoder also requires that a method is called to signify the decoding is complete. At this point status is returned to indicate if the max header size has been violated. Netty always accumulates the header fragments into a single buffer before attempting to HPACK decode process and so keeping state and delaying notification that bounds have been exceeded is not necessary.
Modifications:
- HPACK Decoder#decode(..) now must be called with a complete header block
- HPACK will terminate immediately if the maximum header length, or maximum number of headers is exceeded
- Reduce member variables in the HPACK Decoder class because they can now live in the decode(..) method
Result:
HPACK bounds checking is done earlier and less class state is needed.
Motivation:
retainSlice() currently does not unwrap the ByteBuf when creating the ByteBuf wrapper. This effectivley forms a linked list of ByteBuf when it is only necessary to maintain a reference to the unwrapped ByteBuf.
Modifications:
- retainSlice() and retainDuplicate() variants should only maintain a reference to the unwrapped ByteBuf
- create new unit tests which generally verify the retainSlice() behavior
- Remove unecessary generic arguments from AbstractPooledDerivedByteBuf
- Remove unecessary int length member variable from the unpooled sliced ByteBuf implementation
- Rename the unpooled sliced/derived ByteBuf to include Unpooled in their name to be more consistent with the Pooled variants
Result:
Fixes https://github.com/netty/netty/issues/5582
Motivation:
HPACK Encoder has a data structure which is similar to a previous version of DefaultHeaders. Some of the same improvements can be made.
Motivation:
- Enforce the restriction that the Encoder's headerFields length must be a power of two so we can use masking instead of modulo
- Use AsciiString.hashCode which already has optimizations instead of having yet another hash code algorithm in Encoder
Result:
Fixes https://github.com/netty/netty/issues/5357
Motivation:
PlatformDependent attempts to use reflection to get the underlying char[] (or byte[]) from String objects. This is fragile as if the String implementation does not utilize the full array, and instead uses a subset of the array, this optimization is invalid. OpenJDK6 and some earlier versions of OpenJDK7 String have the capability to use a subsection of the underlying char[].
Modifications:
- PlatformDependent should not attempt to use the underlying array from String (or other data types) via reflection
Result:
PlatformDependent hash code generation for CharSequence does not depend upon specific JDK implementation details.
Motivation:
PR #5355 modified interfaces to reduce GC related to the HPACK code. However this came with an anticipated performance regression related to HpackUtil.equals due to AsciiString's increase cost of charAt(..). We should mitigate this performance regression.
Modifications:
- Introduce an equals method in PlatformDependent which doesn't leak timing information and use this in HpcakUtil.equals
Result:
Fixes https://github.com/netty/netty/issues/5436
Motivation:
It is good to have used dependencies and plugins up-to-date to fix any undiscovered bug fixed by the authors.
Modification:
Scanned dependencies and plugins and carefully updated one by one.
Result:
Dependencies and plugins are up-to-date.
Motivations:
The HPACK code was not really optimized and written with Netty types in mind. Because of this a lot of garbage was created due heavy object creation.
This was first reported in [#3597] and https://github.com/grpc/grpc-java/issues/1872 .
Modifications:
- Directly use ByteBuf as input and output
- Make use of ByteProcessor where possible
- Use AsciiString as this is the only thing we need for our http2 usage
Result:
Less garbage and better usage of Netty apis.
Motivation:
99dfc9ea79 introduced some code that will more frequently try to forward messages out of the list of decoded messages to reduce latency and memory footprint. Unfortunally this has the side-effect that RecycleableArrayList.clear() will be called more often and so introduce some overhead as ArrayList will null out the array on each call.
Modifications:
- Introduce a CodecOutputList which allows to not null out the array until we recycle it and also allows to access internal array with extra range checks.
- Add benchmark that add elements to different List implementations and clear them
Result:
Less overhead when decode / encode messages.
Benchmark (elements) Mode Cnt Score Error Units
CodecOutputListBenchmark.arrayList 1 thrpt 20 24853764.609 ± 161582.376 ops/s
CodecOutputListBenchmark.arrayList 4 thrpt 20 17310636.508 ± 930517.403 ops/s
CodecOutputListBenchmark.codecOutList 1 thrpt 20 26670751.661 ± 587812.655 ops/s
CodecOutputListBenchmark.codecOutList 4 thrpt 20 25166421.089 ± 166945.599 ops/s
CodecOutputListBenchmark.recyclableArrayList 1 thrpt 20 24565992.626 ± 210017.290 ops/s
CodecOutputListBenchmark.recyclableArrayList 4 thrpt 20 18477881.775 ± 157003.777 ops/s
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 246.748 sec - in io.netty.handler.codec.CodecOutputListBenchmark
Motivation:
We tried to provide the ability for the user to change the semantics of the threading-model by delegate the invoking of the ChannelHandler to the ChannelHandlerInvoker. Unfortunually this not really worked out quite well and resulted in just more complexity and splitting of code that belongs together. We should remove the ChannelHandlerInvoker again and just do the same as in 4.0
Modifications:
Remove ChannelHandlerInvoker again and replace its usage in Http2MultiplexCodec
Result:
Easier code and less bad abstractions.
Motivation:
KObjectHashMap.probeNext(..) usually involves 2 conditional statements and 2 aritmatic operations. This can be improved to have 0 conditional statements.
Modifications:
- Use bit masking to avoid conditional statements
Result:
Improved performance for KObjecthashMap.probeNext(..)
Motivation:
The DefaultHttp2Conneciton.close method accounts for active streams being iterated and attempts to avoid reentrant modifications of the underlying stream map by using iterators to remove from the stream map. However there are a few issues:
- While iterating over the stream map we don't prevent iterations over the active stream collection
- Removing a single stream may actually remove > 1 streams due to closed non-leaf streams being preserved in the priority tree which may result in NPE
Preserving closed non-leaf streams in the priority tree is no longer necessary with our current allocation algorithms, and so this feature (and related complexity) can be removed.
Modifications:
- DefaultHttp2Connection.close should prevent others from iterating over the active streams and reentrant modification scenarios which may result from this
- DefaultHttp2Connection should not keep closed stream in the priority tree
- Remove all associated code in DefaultHttp2RemoteFlowController which accounts for this case including the ReducedState object
- This includes fixing writability changes which depended on ReducedState
- Update unit tests
Result:
Fixes https://github.com/netty/netty/issues/5198
Motivation:
Some codecs should be considered unstable as these are relative new. For this purpose we should introduce an annotation which these codecs should us to be marked as unstable in terms of API.
Modifications:
- Add UnstableApi annotation and use it on codecs that are not stable
- Move http2.hpack to http2.internal.hpack as it is internal.
Result:
Better document unstable APIs.
Motivation:
Before release 4.1.0.Final we should update all our dependencies.
Modifications:
Update dependencies.
Result:
Up-to-date dependencies used.
Motivation:
The current slow path of FastThreadLocal is much slower than JDK ThreadLocal. See #4418
Modifications:
- Add FastThreadLocalSlowPathBenchmark for the flow path of FastThreadLocal
- Add final to speed up the slow path of FastThreadLocal
Result:
The slow path of FastThreadLocal is improved.
Motivation:
See https://github.com/netty/netty-build/issues/5
Modifications:
Add xml-maven-plugin to check indentation and fix violations
Result:
pom.xml will be checked in the PR build
Motivation:
Currently the initial headers for every stream is queued in the flow controller. Since the initial header frame may create streams the peer must receive these frames in the order in which they were created, or else this will be a protocol error and the connection will be closed. Tolerating the initial headers being queued would increase the complexity of the WeightedFairQueueByteDistributor and there is benefit of doing so is not clear.
Modifications:
- The initial headers will no longer be queued in the flow controllers
Result:
Fixes https://github.com/netty/netty/issues/4758
Motivation:
Being able to access the invoker() is useful when adding additional
handlers that should be running in the same thread. Since an application
may be using a threading model unsupported by the default invoker, they
can specify their own. Because of that, in a handler that auto-adds
other handlers:
// This is a good pattern
ctx.pipeline().addBefore(ctx.invoker(), ctx.name(), null, newHandler);
// This will generally work, but prevents using custom invoker.
ctx.pipeline().addBefore(ctx.executor(), ctx.name(), null, newHandler);
That's why I believe in commit 110745b0, for the now-defunct 5.0 branch,
when ChannelHandlerAppender was added the invoker() method was also
necessary.
There is a side-benefit to exposing the invoker: in certain advanced
use-cases using the invoker for a particular handler is useful. Using
the invoker you are able to invoke a _particular_ handler, from possibly
a different thread yet still using standard exception processing.
ChannelHandlerContext does part of that, but is unwieldy when trying to
invoke a particular handler because it invokes the prev or next handler,
not the one the context is for. A workaround is to use the next or prev
context (respectively), but this breaks when the pipeline changes.
This came up during writing the Http2MultiplexCodec which uses a
separate child channel for each http/2 stream and wants to send messages
from the child channel directly to the Http2MultiplexCodec handler that
created it.
Modifications:
Add the invoker() method to ChannelHandlerContext. It was already being
implemented by AbstractChannelHandlerContext. The two other
implementations of ChannelHandlerContext needed minor tweaks.
Result:
Access to the invoker used for a particular handler, for either reusing
for other handlers or for advanced use-cases. Fixes#4738
Motivation:
Javadoc reports errors about invalid docs.
Modifications:
Fix some errors reported by javadoc.
Result:
A lot of javadoc errors are fixed by this patch.
Motivation:
PriorityStreamByteDistributor has been removed but NoPriorityByteDistributionBenchmark in microbench still need it and causes compile error
Modifications:
Remove PriorityStreamByteDistributor from NoPriorityByteDistributionBenchmark
Result:
The compile error has been fixed
Motivation:
The `NoPriorityByteDistibbutionBenchmark` was broken with a recent commit.
Modifications:
Fixed the benchmark to use the new HTTP2 handler builder.
Result:
It builds.