Motivation:
We should just add `executor()` to the `ChannelOutboundInvoker` interface and override this method in `Channel` to return `EventLoop`.
Modifications:
- Add `executor()` method to `ChannelOutboundInvoker`
- Let `Channel` override this method and return `EventLoop`.
- Adjust all usages of `eventLoop()`
- Add some default implementations (see the sketch below)
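A minimal sketch of the resulting shape (illustrative; the interface names match Netty but the bodies are simplified):
```
import io.netty.channel.EventLoop;
import io.netty.util.concurrent.EventExecutor;

// Simplified sketch -- not the real Netty interfaces.
interface ChannelOutboundInvoker {
    // New: every invoker exposes the EventExecutor its operations run on.
    EventExecutor executor();
}

interface Channel extends ChannelOutboundInvoker {
    EventLoop eventLoop();

    // Channel narrows the return type: its executor is always its EventLoop.
    @Override
    default EventLoop executor() {
        return eventLoop();
    }
}
```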
Result:
API cleanup
Motivation:
At the moment the outbound operations of ChannelHandler take a Promise as argument. This Promise needs to be carried forward to the next handler in the pipeline until it hits the transport. This API choice has a few quirks which we should aim to remove:
- There is a difference in listener execution order between adding a FutureListener to the Promise and adding it to the Future returned by the outbound method. Sometimes we add the listener to the promise, while in reality we almost always want to add it to the future to ensure the listeners are executed in the "correct order".
- It is quite easy to "lose" a promise by forgetting to use the right method overload that also takes a promise.
- We have no idea which EventExecutor is used for the passed-in Promise, which may invalidate our assumptions about threading.
While changing the method signature of the outbound operations of the ChannelHandler is a good step forward, we should also remove all the methods from ChannelOutboundInvoker (and its sub-types) that take a Promise, and always use the methods that return a Future only.
Modifications:
- Change the signatures of the methods that took a Promise so they no longer take one and just return a Future
- Remove all operations from ChannelOutboundInvoker that take a Promise.
- Adjust all code to cope with the API changes (see the sketch below)
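An illustrative before/after of a call site, assuming the post-change Netty 5 style API where outbound operations return `Future<Void>`:
```
import io.netty.channel.Channel;
import io.netty.util.concurrent.Future;

final class WriteExample {
    // Before: ctx.write(msg, promise) -- the caller-supplied promise had to be
    // carried through every handler until it reached the transport.
    // After: the transport creates the promise internally; only a Future
    // escapes, so listeners can only ever be attached to the returned future.
    static Future<Void> send(Channel ch, Object msg) {
        return ch.writeAndFlush(msg).addListener(f -> {
            if (!f.isSuccess()) {
                f.cause().printStackTrace();
            }
        });
    }
}
```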
Result:
Cleaner API which is easier to reason about and easier to use.
Motivation:
The generics for the existing futures, promises, and listeners are too complicated.
This complication comes from the existence of `ChannelPromise` and `ChannelFuture`, which forces listeners to care about the particular _type_ of future being listened on.
Modification:
* Add a `FutureContextListener` which can take a context object as an additional argument. This allows our listeners to have the channel piped through to them, so they don't need to rely on the `ChannelFuture.channel()` method (usage sketched after this list).
* Make the `FutureListener`, along with the `FutureContextListener` sibling, the default listener API, retiring the `GenericFutureListener` since we no longer need to abstract over the type of the future.
* Change all uses of `ChannelPromise` to `Promise<Void>`.
* Change all uses of `ChannelFuture` to `Future<Void>`.
* Change all uses of `GenericFutureListener` to either `FutureListener` or `FutureContextListener` as needed.
* Remove `ChannelFutureListener` and `GenericFutureListener`.
* Introduce a `ChannelFutureListeners` enum to house the constants that previously lived in `ChannelFutureListener`. These constants now implement `FutureContextListener` and take the `Channel` as a context.
* Remove `ChannelPromise` and `ChannelFuture` — all usages now rely on the plain `Future` and `Promise` APIs.
* Add static factory methods to `DefaultPromise` that allow us to create promises that are initialised as successful or failed.
* Remove `CompleteFuture`, `SucceededFuture`, `FailedFuture`, `CompleteChannelFuture`, `SucceededChannelFuture`, and `FailedChannelFuture`.
* Remove `ChannelPromiseNotifier`.
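For example, a close-on-failure listener no longer needs `ChannelFuture.channel()`; the channel arrives as the listener's context (illustrative sketch against the post-change API):
```
import io.netty.channel.Channel;
import io.netty.util.concurrent.Future;

final class CloseOnFailure {
    // The Channel is passed as a context argument when the listener is added,
    // so a plain Future<Void> is enough -- no ChannelFuture subtype required.
    static void closeOnFailure(Channel channel, Future<Void> future) {
        future.addListener(channel, (ch, f) -> {
            if (!f.isSuccess()) {
                ch.close();
            }
        });
    }
}
```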
Result:
Cleaner generics and more straightforward code.
Motivation:
A number of classes and APIs around the ResourceLeakDetector have been deprecated for removal in Netty 5.x, because better alternatives exist.
Modification:
Remove everything in and around ResourceLeakDetector that is deprecated, and fix the few usages that were found.
Result:
Less deprecated code.
Motivation:
There are lots of redundant variable declarations which should be inlined to make the code read better.
Modification:
Inlined the variables (a trivial example follows).
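A trivial before/after to illustrate the pattern (hypothetical example):
```
final class InlineExample {
    static String greeting(String name) {
        // Before:
        //     String result = "Hello, " + name;
        //     return result;
        // After -- the single-use local is inlined:
        return "Hello, " + name;
    }
}
```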
Result:
Fewer redundant variables and more readable code.
Motivation:
We should make variables `final` when they are never reassigned, to match the code style and make the code look better.
Modification:
Made a couple of variables `final`.
Result:
Variables marked as `final`.
Motivation:
There are lots of unused imports. We should get rid of them to make the code look better.
Modification:
Removed unused imports.
Result:
No unused imports.
Bootstrap methods now return Future<Channel> instead of ChannelFuture
Motivation:
In #8516 it was proposed to at some point remove the specialised ChannelFuture and ChannelPromise.
Or at least make them not extend Future and Promise, respectively.
One pain point encountered in this discussion is the need to get access to the channel object after it has been initialised, but without waiting for the channel registration to propagate through the pipeline.
Modification:
Add a Bootstrap.createUnregistered method, which will return a Channel directly.
All other Bootstrap methods that previously returned ChannelFuture now return Future<Channel> (usage sketched below).
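Illustrative usage, assuming the post-change API:
```
import io.netty.bootstrap.Bootstrap;
import io.netty.channel.Channel;
import io.netty.util.concurrent.Future;

final class ConnectExample {
    // The channel is only published through the future's result, so there is
    // no window in which callers can observe a half-initialised channel.
    static void connect(Bootstrap bootstrap) {
        Future<Channel> f = bootstrap.connect("example.com", 8080);
        f.addListener(done -> {
            if (done.isSuccess()) {
                Channel ch = done.getNow();
                ch.writeAndFlush("hello");
            }
        });
    }
}
```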
Result:
It's now possible to obtain an initialised but unregistered channel from a bootstrap, without blocking.
And the other bootstrap methods now only release their channels through the result of their futures, preventing racy access to the channels.
Motivation:
We migrated all these modules to JUnit 5 before, but missed a few usages of JUnit 4.
Modifications:
Replace all JUnit 4 imports with JUnit 5 APIs
Result:
Part of https://github.com/netty/netty/issues/10757
Motivation:
At the moment all methods in `ChannelHandler` declare `throws Exception` as part of their method signature. While this is fine for methods that handle inbound events, it is quite confusing for methods that handle outbound events. This is due to the fact that these methods also take a `ChannelPromise`, which actually needs to be fulfilled to signal back either success or failure. Declaring `throws ...` for these methods is confusing at best. We should just always require the implementation to use the passed-in promise to signal back success or failure. Doing so also clears up the semantics in general. Because we can't "forbid" throwing `RuntimeException`, we still need to handle it in some way though. In this case we should just consider it a "bug", log it, and close the `Channel` in question. The user should never let an exception "escape" their implementation and should just use the promise. This also clears up the ownership of the passed-in message etc.
As `flush(ChannelHandlerContext)` and `read(ChannelHandlerContext)` don't take a `ChannelPromise` as argument, this also means that these methods can never produce an error. This makes sense, as these really are just "signals" for the underlying transports to do something. For `RuntimeException` the same rule is used as for the other outbound event handling methods: log it and close the `Channel`.
Modifications:
- Remove `throws Exception` from signature
- Adjust code to not throw and just notify the promise directly (see the sketch below)
- Adjust unit tests
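A sketch of the resulting pattern (simplified against the Netty 5 snapshot API of that era, where `ChannelHandler` carries default methods; details may differ):
```
import io.netty.channel.ChannelHandler;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelPromise;
import io.netty.util.ReferenceCountUtil;

final class NoThrowOutboundHandler implements ChannelHandler {
    // Note: no `throws Exception` on the signature. Failures are reported via
    // the promise; a RuntimeException escaping here is treated as a bug and
    // results in the Channel being logged and closed.
    public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise) {
        if (!(msg instanceof CharSequence)) {
            ReferenceCountUtil.release(msg); // ownership is clear: we consume msg
            promise.setFailure(new IllegalArgumentException("expected CharSequence"));
            return;
        }
        ctx.write(msg, promise);
    }
}
```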
Result:
Much cleaner API and semantics.
Motivation:
This special-case implementation of Promise / Future requires the implementations responsible for completing the promise to have knowledge of this class to provide value. It also requires that the implementations are able to provide intermediate status while the work is being done. Even throughout the core of Netty it is not really supported most of the time, and so it just brings more complexity without real gain.
Let's remove it completely, which is better than supporting it only sometimes.
Modifications:
Remove Progressive* API
Result:
Code cleanup. Fixes https://github.com/netty/netty/issues/8519
Motivation:
Some time in the past we introduced the concept of Void*Promise. As it turned out, this was not a good idea at all, as basically each handler in the pipeline needs to be very careful to handle it correctly. We should just remove this "optimization".
Modifications:
- Remove Void*Promise and all the related APIs
- Remove tests which were related to Void*Promise
Result:
Less error-prone API
Motivation:
Null checks resulting in a NullPointerException or IllegalArgumentException, numeric range checks (>0, >=0), and not-empty string/array checks must never be anonymous, but must name the parameter or variable being checked. They must be specific and should not be combined with "OR logic" (e.g. `if (a == null || b == null) throw new NullPointerException()`).
Modifications:
* import static relevant checks
* Replace manual checks with ObjectUtil methods (example below)
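For example (real `ObjectUtil` methods; the surrounding class is hypothetical):
```
import static io.netty.util.internal.ObjectUtil.checkNotNull;
import static io.netty.util.internal.ObjectUtil.checkPositive;

final class Connection {
    private final String host;
    private final int port;

    Connection(String host, int port) {
        // Each argument is validated on its own, so the thrown exception names
        // the offending parameter instead of an anonymous combined check.
        this.host = checkNotNull(host, "host");
        this.port = checkPositive(port, "port");
    }
}
```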
Result:
All checks needed are done with ObjectUtil, some exception texts are improved in microbench and resolver-dns
Fixes #11170
Motivation:
- Underlying buffer usage might be erroneous when the buffers are released internally
in HttpPostMultipartRequestDecoder.
Two bugs occur:
1) The final file upload seems not to be of the right size.
2) Memory, even in Disk mode, increases continuously, while it shouldn't.
- The method `getByte(position)` is called too often within the current implementation
of the HttpPostMultipartRequestDecoder.
This implies too much activity, which is visible when PARANOID mode is active.
This is also true in standard mode.
Apply to HttpPostStandardRequestDecoder the same buffer fix made previously in
HttpPostMultipartRequestDecoder.
Finally, in order to ensure we do not rewrite already-decoded HttpData when decoding
the next ones within a multipart body, we must ensure the buffers are copied and are not a retained slice.
Modification:
- Add some tests to check consistency of HttpPostMultipartRequestDecoder.
Add a package-protected method for testing purposes only.
- Use the `bytesBefore(...)` method instead of `getByte(pos)` in order to limit external
access to the underlying buffer, by iteratively retrieving the beginning of a correct start
position.
It is used to find both LF/CRLF and the delimiter.
Two methods in HttpPostBodyUtil were created for that (see the sketch below).
- The undecodedChunk is copied when a chunk is added to a DataMultipart being loaded.
The same buffer is also rewritten in order to release the copied memory part.
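A minimal sketch of the `bytesBefore(...)` approach (illustrative, not the actual decoder code):
```
import io.netty.buffer.ByteBuf;

final class DelimiterScan {
    // Let the buffer scan for the delimiter internally instead of calling
    // getByte(pos) once per byte from the outside.
    static int findLineFeed(ByteBuf undecodedChunk) {
        int offset = undecodedChunk.bytesBefore((byte) '\n');
        return offset < 0 ? -1 : undecodedChunk.readerIndex() + offset;
    }
}
```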
Result:
Just for note, for Memory, Disk, or Mixed mode factories, the release has to be done as follows:
    for (InterfaceHttpData httpData : decoder.getBodyHttpDatas()) {
        httpData.release();
        factory.removeHttpDataFromClean(request, httpData);
    }
    factory.cleanAllHttpData();
    decoder.destroy();
The memory used is minimal in Disk or Mixed mode. In Memory mode, a big file is still
held in memory, but no longer in the undecodedChunk; it lives in its own (copied) buffer.
In terms of benchmarking, the results are:
Original code Benchmark Mode Cnt Score Error Units
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigAdvancedLevel thrpt 6 0,152 ± 0,100 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigDisabledLevel thrpt 6 0,543 ± 0,218 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigParanoidLevel thrpt 6 0,001 ± 0,001 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigSimpleLevel thrpt 6 0,615 ± 0,070 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighAdvancedLevel thrpt 6 0,114 ± 0,063 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighDisabledLevel thrpt 6 0,664 ± 0,034 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighParanoidLevel thrpt 6 0,001 ± 0,001 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighSimpleLevel thrpt 6 0,620 ± 0,140 ops/ms
New code Benchmark Mode Cnt Score Error Units
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigAdvancedLevel thrpt 6 4,037 ± 0,358 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigDisabledLevel thrpt 6 4,226 ± 0,471 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigParanoidLevel thrpt 6 0,875 ± 0,029 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigSimpleLevel thrpt 6 4,346 ± 0,275 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighAdvancedLevel thrpt 6 2,044 ± 0,020 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighDisabledLevel thrpt 6 2,278 ± 0,159 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighParanoidLevel thrpt 6 0,174 ± 0,004 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighSimpleLevel thrpt 6 2,370 ± 0,065 ops/ms
In short, for big file transfers this is about 7 times faster with the new code, while
for a high number of HttpData it is about 4 times faster with the new code when using the Simple level.
At the Paranoid level, big file transfers are about 800 times faster with the new code, while
a high number of HttpData is about 170 times faster with the new code.
Motivation:
DecodeHexBenchmark needs to be less branch-predictor friendly
to mimic the "real" behaviour while decoding
Modifications:
DecodeHexBenchmark uses a larger set of inputs, picking them at
random on each iteration, and the benchmarked method is made non-inlineable (see the sketch below)
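One common way to keep a method from being inlined under JMH is the `CompilerControl` annotation (a sketch; the actual benchmark may use a different mechanism):
```
import org.openjdk.jmh.annotations.CompilerControl;

final class NonInlineable {
    // DONT_INLINE forces a real call on every invocation, so the benchmark
    // measures the method across varied inputs rather than an inlined,
    // branch-predicted fast path.
    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    static int decodeNibble(char c) {
        return Character.digit(c, 16);
    }
}
```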
Result:
DecodeHexBenchmark is more trustworthy when showing the performance
difference between different decoding methods
Motivation:
The HPACK static table is organized in a way that fields with the same
name are sequential, which means that a sequential scan can
short-circuit on a name mismatch.
Modifications:
* `HpackStaticTable.getIndexInsensitive` returns -1 on a name mismatch
rather than continuing to scan (sketched below).
* `HpackStaticTable` statically defines the max position in the array
where name duplication is possible (after the given index there's
no need to check for other fields with the same name).
* Benchmark for different lookup patterns
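A simplified sketch of the short-circuit (not the actual HpackStaticTable code):
```
final class StaticTableLookup {
    // names/values are parallel arrays; fields with equal names are adjacent.
    static int indexOf(String[] names, String[] values, String name, String value) {
        int nameIndex = -1;
        for (int i = 0; i < names.length; i++) {
            boolean nameMatches = names[i].equals(name);
            if (nameMatches && values[i].equals(value)) {
                return i;          // exact name+value match
            }
            if (nameMatches && nameIndex == -1) {
                nameIndex = i;     // remember the first name-only match
            }
            if (!nameMatches && nameIndex != -1) {
                break;             // left the block of equal names: stop scanning
            }
        }
        return nameIndex;          // -1 when the name was never seen
    }
}
```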
Result:
Better HPACK static table lookup performance.
Co-authored-by: Norman Maurer <norman_maurer@apple.com>
Motivation:
https://github.com/netty/netty/pull/10267 introduced a change that reduced fragmentation. Unfortunately it also introduced a regression when it comes to caching of normal allocations. This can have a negative performance impact depending on the allocation sizes.
Modifications:
- Fix algorithm to calculate the array size for normal allocation caches
- Correctly calculate the index for normal caches
- Add unit test
Result:
Fixes https://github.com/netty/netty/issues/10805
Fix issue #10508 where PARANOID mode slows down about 1000 times compared to ADVANCED.
Also fix a rare issue where the internal buffer was growing over a limit and was partially discarded
using `discardReadBytes()`, which causes bad changes within previously discovered HttpData.
Reasons were:
Too many `readByte()` method calls while other ways exist (such as keeping in memory the last scan position when trying to find a delimiter, or using `bytesBefore(firstByte)` instead of looping externally).
Changes done:
- Major change to the way buffers are parsed: instead of reading byte per byte until the delimiter is found, try to find the delimiter using `bytesBefore()` and keep the last not-found position to skip already-parsed parts (the algorithms are the same, but the scan implementation differs)
- Change the condition to discard read bytes only when refCnt is at most 1.
Observations using Async-Profiler:
==================================
1) Without optimizations, most of the time (more than 95%) is spent in the `readByte()` method within the `loadDataMultipartStandard` method.
2) Using `bytesBefore(byte)` instead of `readByte()` to find the various delimiters, the `loadDataMultipartStandard` method goes down to 19 to 33% depending on the test used, and the `readByte()` method (or the equivalent `getByte(pos)` method) goes down to 15% (from 95%).
Times confirm this profiling:
- With the optimizations, SIMPLE mode is about 82% better, ADVANCED mode about 79% better, and PARANOID mode about 99% better (most of the duplicate read accesses are removed or made internal through the `bytesBefore(byte)` method)
A benchmark is added to show the behavior of the various cases (one big item, such as a file upload, and many items) and the various levels of detection (Disabled, Simple, Advanced, Paranoid). This benchmark is intended to alert if new implementations introduce too large differences (such as the previous version, where PARANOID was about 1000 times slower than the other levels, while it is now at most about 10 times slower).
Extract of Benchmark run:
=========================
Run complete. Total time: 00:13:27
Benchmark Mode Cnt Score Error Units
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigAdvancedLevel thrpt 6 2,248 ± 0,198 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigDisabledLevel thrpt 6 2,067 ± 1,219 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigParanoidLevel thrpt 6 1,109 ± 0,038 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigSimpleLevel thrpt 6 2,326 ± 0,314 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighAdvancedLevel thrpt 6 1,444 ± 0,226 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighDisabledLevel thrpt 6 1,462 ± 0,642 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighParanoidLevel thrpt 6 0,159 ± 0,003 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighSimpleLevel thrpt 6 1,522 ± 0,049 ops/ms
Motivation:
https in xmlns URIs does not work and makes the maven release plugin fail:
```
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.779 s
[INFO] Finished at: 2020-11-10T07:45:21Z
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-release-plugin:2.5.3:prepare (default-cli) on project netty-parent: Execution default-cli of goal org.apache.maven.plugins:maven-release-plugin:2.5.3:prepare failed: The namespace xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" could not be added as a namespace to "project": The namespace prefix "xsi" collides with an additional namespace declared by the element -> [Help 1]
[ERROR]
```
See also https://issues.apache.org/jira/browse/HBASE-24014.
Modifications:
Use http for xmlns
Result:
Be able to use maven release plugin
Motivation:
JUnit 5 is the new hotness. It's more expressive, extensible, and composable in many ways, and it's better able to run tests in parallel. But most importantly, it's able to directly run JUnit 4 tests.
This means we can update and start using JUnit 5 without touching any of our existing tests.
I'm also introducing a dependency on assertj-core, which is like hamcrest, but arguably has a nicer and more discoverable API.
Modification:
Add the JUnit 5 and assertj-core dependencies, without converting any tests at this time.
Result:
All our tests are now executed through the JUnit 5 Vintage Engine.
Also, the JUnit 5 test APIs are available, and any JUnit 5 tests that are added from now on will also be executed.
Motivation:
HTTP is a plaintext protocol, which means that someone may be able
to eavesdrop on the data. To prevent this, HTTPS should be used whenever
possible. However, maintaining https:// in all URLs may be
difficult. The nohttp tool can help here. The tool scans all the files
in a repository and reports where http:// is used.
Modifications:
- Added nohttp (via checkstyle) into the build process.
- Suppressed findings for the websites
that don't support HTTPS or that are not reachable
Result:
- Prevent using HTTP in the future.
- Encourage users to use HTTPS when they follow the links they found in
the code.
Motivation:
DefaultAttributeMap::attr has a blocking behaviour on lookup of an existing attribute:
it can be made non-blocking.
Modification:
Replace the existing fixed bucket table, which used a locked intrusive linked list,
with a hand-rolled copy-on-write ordered single array (sketched below)
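A compact sketch of the idea (illustrative, not the actual DefaultAttributeMap code):
```
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReference;

final class CowSet {
    // A single sorted array, replaced wholesale on mutation: lookups are a
    // lock-free binary search over an immutable snapshot.
    private final AtomicReference<String[]> keys = new AtomicReference<>(new String[0]);

    boolean contains(String key) {
        return Arrays.binarySearch(keys.get(), key) >= 0; // happy path: no locks
    }

    void add(String key) {
        for (;;) {
            String[] cur = keys.get();
            int pos = Arrays.binarySearch(cur, key);
            if (pos >= 0) {
                return; // already present
            }
            int insert = -(pos + 1);
            String[] next = new String[cur.length + 1];
            System.arraycopy(cur, 0, next, 0, insert);
            next[insert] = key;
            System.arraycopy(cur, insert, next, insert + 1, cur.length - insert);
            if (keys.compareAndSet(cur, next)) {
                return; // snapshot published
            }
            // lost a race with a concurrent writer: retry on the fresh array
        }
    }
}
```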
Result:
Non-blocking behaviour for the lookup happy path
Motivation: Code failed to compile because ByteBuf index marking has been removed.
Modification: Index marking wasn't really used anyway, so just set the relevant index to zero.
Result: Code compiles again.
Motivation:
writeUtf8 can suffer from inlining issues and/or megamorphic call-sites on the hot path due to ByteBuf hierarchy
Modifications:
Duplicate and specialize the code paths to reduce the need for polymorphic calls
Result:
Performance is more stable in user code
Reduce garbage on MQTT encoding
Motivation:
MQTT encoding and decoding does unnecessary object allocation in a number of places:
- MqttEncoder creates many byte[] to encode Strings into UTF-8 bytes
- MqttProperties uses Integer keys instead of int
- Some enums' valueOf implementations create unnecessary arrays on the hot paths
- MqttDecoder was using an unnecessary Result<T>
Modification:
- ByteBufUtil::utf8Bytes and ByteBufUtil::reserveAndWriteUtf8 allow the same operation to be performed GC-free
- MqttProperties uses a primitive-key map
- Implemented GC-free constant-table lookup/switch valueOf
- Use some bit tricks to pack 2 ints into a single primitive long to store both the result and numberOfBytesConsumed, and use byte[].length to compute numberOfBytesConsumed on the fly. These changes allowed us to avoid creating Result<T> (see the sketch below).
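A sketch of the bit trick (hypothetical helper, not the actual decoder code):
```
final class PackedResult {
    // The decoded value and numberOfBytesConsumed share one long, so no
    // Result<T> object is allocated per decoded field.
    static long pack(int value, int numberOfBytesConsumed) {
        return ((long) value << 32) | (numberOfBytesConsumed & 0xFFFFFFFFL);
    }

    static int value(long packed) {
        return (int) (packed >>> 32);
    }

    static int numberOfBytesConsumed(long packed) {
        return (int) packed;
    }
}
```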
Result:
Significantly less garbage produced in MQTT encoding/decoding
Motivation:
We have found that ByteBufUtil.indexOf can be inefficient for substring search on
ByteBuf, both in algorithmic complexity (worst case O(needle.readableBytes *
haystack.readableBytes)) and in constant factor (especially on composite buffers).
With more performant search algorithms we have seen improvements of an
order of magnitude.
Modifications:
This change introduces three search algorithms:
1. Knuth Morris Pratt - classical textbook algorithm, a good default choice.
2. Bit mask based algorithm - stable performance on any input, but limited to maximum
search substring (the needle) length of 64 bytes.
3. Aho–Corasick - worse performance and higher memory consumption than [1] and [2], but
it supports multiple substring (the needles) search simultaneously, by inspecting every
byte of the haystack only once.
Each algorithm processes every byte of the underlying buffer only once; they are implemented
as ByteProcessor (see the sketch below).
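As an illustration, KMP in ByteProcessor form looks roughly like this (a sketch, not the Netty implementation); `buf.forEachByte(processor)` then returns the index of the needle's last byte, or -1:
```
import io.netty.util.ByteProcessor;

final class KmpSearchProcessor implements ByteProcessor {
    private final byte[] needle;
    private final int[] lps; // longest proper prefix that is also a suffix
    private int matched;

    KmpSearchProcessor(byte[] needle) {
        this.needle = needle.clone();
        this.lps = new int[needle.length];
        for (int i = 1, len = 0; i < needle.length; ) {
            if (needle[i] == needle[len]) {
                lps[i++] = ++len;
            } else if (len > 0) {
                len = lps[len - 1];
            } else {
                lps[i++] = 0;
            }
        }
    }

    @Override
    public boolean process(byte value) {
        while (matched > 0 && needle[matched] != value) {
            matched = lps[matched - 1]; // fall back without re-reading input
        }
        if (needle[matched] == value && ++matched == needle.length) {
            return false; // full match found: stop the forEachByte iteration
        }
        return true;
    }
}
```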
Result:
Efficient search algorithms with linear time complexity available in Netty (I will share
benchmark results in a comment on a PR).
Motivation:
The PoolChunk.usage() method has non-trivial computations. It is currently used in hot-path methods invoked whenever an allocation or de-allocation happens.
The idea is to replace the comparison of usage() output against percentage thresholds with a plain comparison of PoolChunk.freeBytes against absolute thresholds. That way, the majority of the computation moves from the threshold conditions to init logic.
Modifications:
Replace the PoolChunk.usage() conditions in PoolChunkList with equivalent conditions on PoolChunk.freeBytes() (sketched below)
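The transformation, sketched (illustrative names, not the actual PoolChunkList code; rounding details omitted):
```
final class ChunkListThreshold {
    private final int freeMinThreshold;

    // Convert the percentage threshold into an absolute byte count once at
    // init time, so the hot path is a plain int comparison.
    ChunkListThreshold(int chunkSize, int maxUsagePercent) {
        this.freeMinThreshold = (int) ((long) chunkSize * (100 - maxUsagePercent) / 100);
    }

    // Equivalent to: chunk.usage() >= maxUsagePercent
    boolean shouldMoveToNextList(int freeBytes) {
        return freeBytes <= freeMinThreshold;
    }
}
```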
Result:
Improve performance of allocation and de-allocation of ByteBuf from normal size cache pool
Motivation:
In the next major version of Netty users should use ChannelHandler everywhere. We should ensure we do the same.
Modifications:
Replace usage of deprecated classes / interfaces with ChannelHandler
Result:
Use non-deprecated code
Motivation:
decodeHexNibble can be a lot faster using a lookup table
Modifications:
decodeHexNibble is made faster by using a lookup table (see the sketch below)
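A sketch of the technique (close to, but not exactly, the Netty code):
```
import java.util.Arrays;

final class HexDecode {
    // 256-entry table: -1 marks invalid characters, otherwise the nibble value.
    private static final byte[] HEX2B = new byte[256];
    static {
        Arrays.fill(HEX2B, (byte) -1);
        for (char c = '0'; c <= '9'; c++) { HEX2B[c] = (byte) (c - '0'); }
        for (char c = 'a'; c <= 'f'; c++) { HEX2B[c] = (byte) (c - 'a' + 10); }
        for (char c = 'A'; c <= 'F'; c++) { HEX2B[c] = (byte) (c - 'A' + 10); }
    }

    // A single array load replaces the old chain of range comparisons.
    static int decodeHexNibble(char c) {
        return c < 256 ? HEX2B[c] : -1;
    }
}
```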
Result:
decodeHexNibble is faster
Motivation:
Currently, characters are appended to the encoded string char by char even when no encoding is needed. We can instead separate out a code path that appends the entire string in one go, for better `StringBuilder` allocation performance.
Modification:
Only go into the char-by-char loop when finding a character that requires encoding (sketched below).
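A sketch of the fast path (simplified; the real encoder percent-encodes UTF-8 bytes):
```
final class FastPathAppend {
    // Scan for the first character that needs encoding; if none is found,
    // append the whole string in one call instead of char-by-char.
    static void append(StringBuilder out, String s) {
        for (int i = 0; i < s.length(); i++) {
            if (needsEncoding(s.charAt(i))) {
                out.append(s, 0, i);       // bulk-append the clean prefix
                encodeSlowPath(out, s, i); // char-by-char from the first offender
                return;
            }
        }
        out.append(s); // fast path: nothing to encode
    }

    private static boolean needsEncoding(char c) {
        return !(c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9');
    }

    private static void encodeSlowPath(StringBuilder out, String s, int from) {
        for (int i = from; i < s.length(); i++) {
            char c = s.charAt(i);
            if (needsEncoding(c)) {
                out.append('%').append(Integer.toHexString(c)); // simplified
            } else {
                out.append(c);
            }
        }
    }
}
```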
Result:
The results aren't so clear with noise on my hot laptop - the biggest impact is on long strings, both to reduce resizes of the buffer and also to reduce complexity of the loop. I don't think there's a significant downside though for the cases that hit the slow path.
After
```
Benchmark Mode Cnt Score Error Units
QueryStringEncoderBenchmark.longAscii thrpt 6 1.406 ± 0.069 ops/us
QueryStringEncoderBenchmark.longAsciiFirst thrpt 6 0.046 ± 0.001 ops/us
QueryStringEncoderBenchmark.longUtf8 thrpt 6 0.046 ± 0.001 ops/us
QueryStringEncoderBenchmark.shortAscii thrpt 6 15.781 ± 0.949 ops/us
QueryStringEncoderBenchmark.shortAsciiFirst thrpt 6 3.171 ± 0.232 ops/us
QueryStringEncoderBenchmark.shortUtf8 thrpt 6 3.900 ± 0.667 ops/us
```
Before
```
Benchmark Mode Cnt Score Error Units
QueryStringEncoderBenchmark.longAscii thrpt 6 0.444 ± 0.072 ops/us
QueryStringEncoderBenchmark.longAsciiFirst thrpt 6 0.043 ± 0.002 ops/us
QueryStringEncoderBenchmark.longUtf8 thrpt 6 0.047 ± 0.001 ops/us
QueryStringEncoderBenchmark.shortAscii thrpt 6 16.503 ± 1.015 ops/us
QueryStringEncoderBenchmark.shortAsciiFirst thrpt 6 3.316 ± 0.154 ops/us
QueryStringEncoderBenchmark.shortUtf8 thrpt 6 3.776 ± 0.956 ops/us
```
Motivation:
In Java, without pooling or I/O, using `ByteBuffer` is almost always slower than `byte[]`. `QueryStringDecoder` can use `byte[]` with arguably simpler code.
Modification:
Replace `ByteBuffer` / `CharsetDecoder` with `byte[]` and `new String` (sketched below)
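A simplified sketch of the decode loop (illustrative, not the exact QueryStringDecoder code):
```
import java.nio.charset.StandardCharsets;

final class PercentDecode {
    // Decode into a byte[] and build the String in one shot -- no ByteBuffer
    // or CharsetDecoder involved.
    static String decodeComponent(String s) {
        byte[] buf = new byte[s.length()];
        int pos = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                buf[pos++] = (byte) Integer.parseInt(s.substring(i + 1, i + 3), 16);
                i += 2;
            } else {
                buf[pos++] = (byte) (c == '+' ? ' ' : c);
            }
        }
        return new String(buf, 0, pos, StandardCharsets.UTF_8);
    }
}
```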
Result:
After
```
Benchmark Mode Cnt Score Error Units
QueryStringDecoderBenchmark.noDecoding thrpt 6 5.612 ± 2.639 ops/us
QueryStringDecoderBenchmark.onlyDecoding thrpt 6 1.393 ± 0.067 ops/us
QueryStringDecoderBenchmark.mixedDecoding thrpt 6 1.223 ± 0.048 ops/us
```
Before
```
Benchmark Mode Cnt Score Error Units
QueryStringDecoderBenchmark.noDecoding thrpt 6 6.123 ± 0.250 ops/us
QueryStringDecoderBenchmark.onlyDecoding thrpt 6 0.922 ± 0.159 ops/us
QueryStringDecoderBenchmark.mixedDecoding thrpt 6 1.032 ± 0.178 ops/us
```
I notice #6781 switched from an array to `ByteBuffer` but I can't find any motivation for that in the PR. Unit tests pass fine with an array and we get a reasonable speed bump.
Motivation
JMH 1.22 was released recently; we might as well use the latest when
running benchmarks.
Summary of changes:
https://mail.openjdk.java.net/pipermail/jmh-dev/2019-November/002879.html
Modifications
Update jmh dependencies in microbench module from version 1.21 to 1.22.
Result
Benchmarks run using latest JMH
Motivation:
The Netty homepage (netty.io) serves both "http" and "https".
It's recommended to use https rather than http.
Modification:
I changed from "http://netty.io" to "https://netty.io"
Result:
No effects.
Motivation:
The previously used maxHeaderListSize was too low, which resulted in exceptions during the benchmark run:
```
io.netty.handler.codec.http2.Http2Exception: Header size exceeded max allowed size (8192)
at io.netty.handler.codec.http2.Http2Exception.connectionError(Http2Exception.java:103)
at io.netty.handler.codec.http2.Http2Exception.headerListSizeError(Http2Exception.java:188)
at io.netty.handler.codec.http2.Http2CodecUtil.headerListSizeExceeded(Http2CodecUtil.java:231)
at io.netty.handler.codec.http2.HpackDecoder$Http2HeadersSink.finish(HpackDecoder.java:545)
at io.netty.handler.codec.http2.HpackDecoder.decode(HpackDecoder.java:132)
at io.netty.handler.codec.http2.HpackDecoderBenchmark.decode(HpackDecoderBenchmark.java:85)
at io.netty.handler.codec.http2.generated.HpackDecoderBenchmark_decode_jmhTest.decode_thrpt_jmhStub(HpackDecoderBenchmark_decode_jmhTest.java:120)
at io.netty.handler.codec.http2.generated.HpackDecoderBenchmark_decode_jmhTest.decode_Throughput(HpackDecoderBenchmark_decode_jmhTest.java:83)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:453)
at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:437)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
```
Also, we should ensure we only use ASCII for header names.
Modifications:
Just use Integer.MAX_VALUE as limit
Result:
Be able to run benchmark without exceptions
Motivation:
Some methods that either override others or implement an interface were missing the `@Override` annotation
Modifications:
Add missing `@Override`s
Result:
Code cleanup
Motivation:
SpotJMHBugs reports that accumulating a value as a way of defeating dead-code
elimination may be inadvisable, as discussed in
`JMHSample_34_SafeLooping::measureWrong_2`. Change the test so that it consumes
the response with `Blackhole::consume` instead.
Modifications:
- Replace the addition of results with an explicit `blackhole.consume()` call (see the sketch below)
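The pattern, sketched (hypothetical benchmark; `Blackhole` is the real JMH class):
```
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.infra.Blackhole;

public class ConsumeBenchmark {
    // Handing each result to the Blackhole defeats dead-code elimination
    // without the accumulator pattern that SpotJMHBugs flags.
    @Benchmark
    public void measure(Blackhole bh) {
        bh.consume(compute());
    }

    private int compute() {
        return Integer.bitCount(0xCAFEBABE); // stand-in for the real work
    }
}
```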
Result:
Tests work as before, but with different benchmark numbers.
Motivation:
Some JMH benchmarks need additional explanations to motivate
specific code choices.
Modifications:
Introduced a comment to explain why calling Blackhole::consume
in a loop is not always the right choice for some benchmarks.
Result:
The relevant method carries a comment that warns against changing
the code to introduce Blackhole::consume in the loop.
Motivation:
Resolve the issue highlighted by SpotJMHBugs that the creation of the RecyclableArrayList may be elided by the JIT since the result isn't consumed or returned.
Modifications:
Return the result of `list.recycle()` so that the list isn't elided.
Result:
The JMH benchmark shows a change in performance, indicating that the prior results may have been unsound.
Motivation:
The wakeup logic in EpollEventLoop is overly complex
Modification:
* Simplify the race to wake up the loop
* Don't let the event loop wake up itself (it's already awake!)
* Make the event loop check if there are any more tasks after preparing to
sleep. There is a small window where non-event-loop writers can issue
eventfd writes here, but that is okay.
Result:
Cleaner wakeup logic.
Benchmarks:
```
BEFORE
Benchmark Mode Cnt Score Error Units
EpollSocketChannelBenchmark.executeMulti thrpt 20 408381.411 ± 2857.498 ops/s
EpollSocketChannelBenchmark.executeSingle thrpt 20 157022.360 ± 1240.573 ops/s
EpollSocketChannelBenchmark.pingPong thrpt 20 60571.704 ± 331.125 ops/s
AFTER
Benchmark Mode Cnt Score Error Units
EpollSocketChannelBenchmark.executeMulti thrpt 20 440546.953 ± 1652.823 ops/s
EpollSocketChannelBenchmark.executeSingle thrpt 20 168114.751 ± 1176.609 ops/s
EpollSocketChannelBenchmark.pingPong thrpt 20 61231.878 ± 520.108 ops/s
```
Motivation
Pipeline handlers are free to "take control" of input buffers if they have a singular refcount - in particular to mutate their raw data (if not read-only) via discarding of read bytes, etc.
However there are various places (primarily unit tests) where a wrapped byte-array buffer is passed in and the wrapped array is assumed not to change (used after the wrapped buffer is passed to EmbeddedChannel.writeInbound()). This invalid assumption could result in unexpected errors, such as those exposed by #8931.
Modifications
Anywhere that the data passed to writeInbound() might be used again, ensure that either:
- A copy is used rather than wrapping a shared byte array, or
- The buffer is otherwise protected from modification by making it read-only
For the tests, copying is preferred since it still allows the "mutating" optimizations to be exercised (both options are sketched below).
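Both options, sketched with real Netty APIs:
```
import io.netty.buffer.Unpooled;
import io.netty.channel.embedded.EmbeddedChannel;

final class WriteInboundSafety {
    static void feed(EmbeddedChannel channel, byte[] data) {
        // Option 1 -- copy: the pipeline may mutate its buffer freely and
        // `data` stays intact for reuse by the test.
        channel.writeInbound(Unpooled.copiedBuffer(data));

        // Option 2 -- share the array but forbid mutation.
        channel.writeInbound(Unpooled.wrappedBuffer(data).asReadOnly());
    }
}
```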
Results
Avoid possible errors when the pipeline assumes it has full control of the input buffer.
Motivation:
Results are just wrong for small delays.
Modifications:
Switching to AverageTime avoids relying on OS nanoTime granularity.
Result:
Uncontended low-delay results are now reliable.
Motivation:
In 42742e233f we already added default methods to Channel*Handler and deprecated the Adapter classes to simplify the class hierarchy. With this change we go even further and merge everything into just ChannelHandler. This simplifies things even more in terms of class-hierarchy.
Modifications:
- Merge ChannelInboundHandler | ChannelOutboundHandler into ChannelHandler
- Adjust code to just use ChannelHandler
- Deprecate old interfaces.
Result:
Cleaner and simpler code in terms of class-hierarchy.