Motivation:
UDP_GRO can improve performance when reading UDP datagrams. This patch adds support for it.
See https://lwn.net/Articles/768995/
Modifications:
- Add recvmsg(...)
- Add support for UDP_GRO in recvmsg(...) and recvmmsg(...)
- Remove usage of recvfrom(...) and just always use recvmsg(...) or recvmmsg(...) to simplify things
- Refactor some code for sharing
- Add EpollChannelOption.UDP_GRO and the getter / setter in EpollDatagramConfig
Result:
UDP_GRO is supported when the underlying system supports it.
Motivation:
We had a bug in out DefaulThreadFactory as it always retrieved the ThreadGroup to used during creation time when now explicit ThreadGroup was given. This is problematic as the Thread may die and so the ThreadGroup is destroyed even tho the DefaultThreadFactory is still used.
This could produce exceptions like:
java.lang.IllegalThreadStateException
at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:867)
at java.lang.Thread.init(Thread.java:405)
at java.lang.Thread.init(Thread.java:349)
at java.lang.Thread.<init>(Thread.java:599)
at io.netty.util.concurrent.FastThreadLocalThread.<init>(FastThreadLocalThread.java:60)
at io.netty.util.concurrent.DefaultThreadFactory.newThread(DefaultThreadFactory.java:122)
at io.netty.util.concurrent.DefaultThreadFactory.newThread(DefaultThreadFactory.java:106)
at io.netty.util.concurrent.ThreadPerTaskExecutor.execute(ThreadPerTaskExecutor.java:32)
at io.netty.util.internal.ThreadExecutorMap$1.execute(ThreadExecutorMap.java:57)
at io.netty.util.concurrent.SingleThreadEventExecutor.doStartThread(SingleThreadEventExecutor.java:978)
at io.netty.util.concurrent.SingleThreadEventExecutor.startThread(SingleThreadEventExecutor.java:947)
at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:830)
at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:818)
at io.netty.channel.AbstractChannel$AbstractUnsafe.register(AbstractChannel.java:471)
at io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:87)
at io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:81)
at io.netty.channel.MultithreadEventLoopGroup.register(MultithreadEventLoopGroup.java:86)
at io.netty.bootstrap.AbstractBootstrap.initAndRegister(AbstractBootstrap.java:323)
at io.netty.bootstrap.AbstractBootstrap.doBind(AbstractBootstrap.java:272)
at io.netty.bootstrap.AbstractBootstrap.bind(AbstractBootstrap.java:239)
at io.netty.incubator.codec.quic.QuicTestUtils.newServer(QuicTestUtils.java:138)
at io.netty.incubator.codec.quic.QuicTestUtils.newServer(QuicTestUtils.java:143)
at io.netty.incubator.codec.quic.QuicTestUtils.newServer(QuicTestUtils.java:147)
at io.netty.incubator.codec.quic.QuicStreamFrameTest.testCloseHalfClosure(QuicStreamFrameTest.java:48)
at io.netty.incubator.codec.quic.QuicStreamFrameTest.testCloseHalfClosureUnidirectional(QuicStreamFrameTest.java:35)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
Modifications:
- If the user dont specify a ThreadGroup we will just pass null to the constructor of FastThreadLocalThread and so let it retrieve on creation time
- Adjust tests
Result:
Don't risk to see IllegalThreadStateExceptions.
netty-jni-util 0.0.2.Final is incompatible with static linking. Before
the netty-jni-util dependency was introduced netty-tcnative supported
static linking via NETTY_BUILD_STATIC. netty-jni-util 0.0.3.Final adds
static linking compatibility.
Modifications:
Bump netty-jni-util to version 0.0.3.Final and update to its new API
which requires the caller to manage packagePrefix.
Result:
Using latest version of netty-jni-util and restored static linking
compatibility.
Motivation:
LSE (https://mysqlonarm.github.io/ARM-LSE-and-MySQL/) can have a huge performance difference. Let's ensure we use a compiler that can support it.
Modifications:
Update to gc10 when cross-compiling as it supports LSE and enables it by default
Result:
More optimized builds for aarch64
Motivation:
CONNECT requests have no path defined as stated in the HTTP/2 spec, at the moment we will throw an exception if we try to convert such a request to HTTP/1.1
Modifications:
- Don't throw an exception if we try to convert a HTTP/2 CONNECT request that has no path
- Add unit test
Result:
Related to https://github.com/netty/netty-incubator-codec-http3/pull/112.
Motivation:
We need to ensure we are still be able to correctly map errors to streams in all cases. The problem was that we sometimes called closeStreamRemote(...) in a finally block and so closed the underyling stream before the actual exception was propagated. This was only true in some cases and not in all. Generally speaking we should only call closeStreamRemote(...) if there was no error as in a case of error we should generate a RST frame.
Modifications:
- Only call closeStreamRemote(...) if no exeption was thrown and so let the Http2ConnectionHandler handle the exception correctly
- Add unit tests
Result:
Correctly handle errors even when endStream is set to true
Motivation:
Alignment handling was broken, and basically turned into a fixed offset into each allocation address regardless of its initial value, instead of ensuring that the allocated address is either aligned or bumped to the nearest alignment offset.
The brokenness of the alignment handling extended so far, that overlapping ByteBuf instances could even be created, as was seen in #11101.
Modification:
Instead of fixing the per-allocation pointer bump, we now ensure that 1) the minimum page size is a whole multiple of the alignment, and 2) the reference memory for each chunk is bumped to the nearest aligned address, and finally 3) ensured that the reservations are whole multiples of the alignment, thus ensuring that the next allocation automatically occurs from an aligned address.
Incidentally, (3) above comes for free because the reservations are in whole pages, and in (1) we ensured that pages are sized in whole multiples of the alignment.
In order to ensure that the memory for a chunk is aligned, we introduce some new PlatformDependent infrastructure.
The PlatformDependent.alignDirectBuffer will produce a slice of the given buffer, and the slice will have an address that is aligned.
This method is plainly available on ByteBuffer in Java 9 onwards, but for pre-9 we have to use Unsafe, which means it can fail and might not be available on all platforms.
Attempts to create a PooledByteBufAllocator that uses alignment, when this is not supported, will throw an exception.
Luckily, I think use of aligned allocations are rare.
Result:
Aligned pooled byte bufs now work correctly, and never have any overlap.
Fixes#11101
Motivation
A GOAWAY frame (or any other HTTP/2 frame) should not be sent before the
connection preface. Clients that immediately close the channel may
currently attempt to send a GOAWAY frame before the connection preface,
resulting in servers receiving a seemingly-corrupt connection preface.
Modifications
* Ensure that the preface has been sent before attempting to
automatically send a GOAWAY frame as part of channel shutdown logic
* Add unit test that only passes with new behavior
Result
Fixes https://github.com/netty/netty/issues/11026
Co-authored-by: Bennett Lynch <Bennett-Lynch@users.noreply.github.com>
Motivation:
If compression is enabled and the HttpResponse also
implements HttpContent (but not LastHttpContent) then
the buffer will be freed to eagerly.
Modification:
I retain the buffer the same way that is done for the LastHttpContent case.
Note that there is another suspicious looking call a few lines above (if beginEncode returns null). I am not sure if this should also be retained.
Result:
Fixes#11092
Motivation:
Components in a composite buffer can "go missing" if the composite is a slice of another composite and the parent has changed its layout.
Modification:
Where we would previously have thrown a NullPointerException, we now have a null-check for the component, and we instead throw an IllegalStateException with a more descriptive message.
Result:
It's now a bit easier to understand what is going on in these situations.
Fixes#10908
Motivation:
Newer versions of dawidd6/action-download-artifact changed the default
workflow_conclusion from "completed" to "completed, success" which can
result in download failures if the associated job fails, which is
expected when tests fail.
Modifications:
- Explicitly set the workflow_conclusion to "completed"
Result:
Test failures which result in build failures will still download test
data and generate reports after updating
dawidd6/action-download-artifact.
... number of bytes when using DatagramChannels
Motivation:
In our FixedRecvByteBufAllocator we dont continue to read if the number of bytes is less then what was configured. This is correct when using it for TCP but not when using it for UDP. When using UDP the number of bytes is the maximum of what we want to support but we often end up processing smaller datagrams in general. Because of this we should use contineReading(UncheckedBooleanSupplier) to determite if we should continue reading
Modifications:
- use contineReading(UncheckedBooleanSupplier) for DatagramChannels
Result:
Read more then once in the general case for DatagramChannels with the default config
Motivation:
Allow to configure the maximum number of messages to write per eventloop run. This can be useful to ensure we read data in a timely manner and not let writes dominate the CPU time. This is especially useful in protocols like QUIC where you need to read "fast enough" as otherwise you may not read the ACKs fast enough.
Modifications:
- Add new ChannelOption / config that allows to limit the number of messages to write per eventloop run.
- Respect this setting for DatagramChannels
Result:
Reduce the risk of having WRITES block the processing of other events in a timely manner
Co-authored-by: terrarier2111 <58695553+terrarier2111@users.noreply.github.com>
Motivation:
SslHandler owns the responsibility to flush non-application data
(e.g. handshake, renegotiation, etc.) to the socket. However when
TCP Fast Open is supported but the client_hello cannot be written
in the SYN the client_hello may not always be flushed. SslHandler
may not wrap/flush previously written/flushed data in the event
it was not able to be wrapped due to NEED_UNWRAP state being
encountered in wrap (e.g. peer initiated renegotiation).
Modifications:
- SslHandler to flush in channelActive() if TFO is enabled and
the client_hello cannot be written in the SYN.
- SslHandler to wrap application data after non-application data
wrap and handshake status is FINISHED.
- SocketSslEchoTest only flushes when writes are done, and waits
for the handshake to complete before writing.
Result:
SslHandler flushes handshake data for TFO, and previously flushed
application data after peer initiated renegotiation finishes.
Motivation:
The EpollSocketConnectTest was not correctly configuring TCP Fast Open on the server socket.
It's an option, not a child option.
Modification:
EpollSocketConnectTest now correctly enables TCP Fast Open on the server side, when available, for the test that needs it.
Result:
Test covers what it was intended to.
Motivation:
There are several overloads of streamError(), with one receiving the
Throwable to be made the cause of the new exception. However, the wrong
overload was being called and instead the IllegalArgumentException was
being passed as a message format argument which was summarily thrown
away as the message format didn't reference it.
Modifications:
Move IllegalArgumentException to proper argument position.
Result:
A useful exception, with the underlying cause available.
Motivation:
c22c6b845d introduced support for
UDP_SEGMENT but did restrict it to continous buffers. This is not needed
as it is also fine to use CompositeByteBuf
Modifications:
- Allow to use CompositeByteBuf as well
- Add unit test
Result:
More flexible usage of segmented datagrams possible
Motivation
The HttpObjectDecoder accepts input parameters for maxInitialLineLength
and maxHeaderSize. These are important variables since both message
components must be buffered in memory. As such, many decoders (like
Netty and others) introduce constraints. Due to their importance, many
users may wish to add instrumentation on the values of successful
decoder results, or otherwise be able to access these values to enforce
their own supplemental constraints.
While users can perhaps estimate the sizes today, they will not be
exact, due to the decoder being responsible for consuming optional
whitespace and the like.
Modifications
* Add HttpMessageDecoderResult class. This class extends DecoderResult
and is intended for HttpMessage objects successfully decoded by the
HttpObjectDecoder. It exposes attributes for the decoded
initialLineLength and headerSize.
* Modify HttpObjectDecoder to produce HttpMessageDecoderResults upon
successfully decoding the last HTTP header.
* Add corresponding tests to HttpRequestDecoderTest &
HttpResponseDecoderTest.
Co-authored-by: Bennett Lynch <Bennett-Lynch@users.noreply.github.com>
Motivation:
Due a regression in fd8c1874b4 we did not correctly set the result for the returned Future if it was build for a Callable.
Modifications:
- Adjust code to call get() to retrive the correct result for notification of the future.
- Add unit test
Result:
Fixes https://github.com/netty/netty/issues/11072
Motivation:
At the moment we use http to download rpms, let's switch to https
Modifications:
Use https for rpms
Result:
Hopefully more stable docker image builds
Motivation:
As stated by https://tools.ietf.org/html/rfc7540#section-8.1.2.6 we should report a stream error if the content-length does not match the sum of all data frames.
Modifications:
- Verify that the sum of data frames match if a content-length header was send.
- Handle multiple content-length headers and also handle negative values
- Add io.netty.http2.validateContentLength system property which allows to disable the more strict validation
- Add unit tests
Result:
Correctly handle the case when the content-length header was included but not match what is send and also when content-length header is invalid
Motivation:
HttpObjectDecoder may throw an IllegalArgumentException if it encounters
a character that Character.isWhitespace() returns true for, but is not
one of the two valid OWS (optional whitespace) values. Such values may
not have friendly or readable toString() values. We should include the
hex value so that the illegal character can always be determined.
Modifications:
Add hex value as well
Result:
Easier to debug
Co-authored-by: Bennett Lynch <Bennett-Lynch@users.noreply.github.com>
Motivation:
These days we always include the OS in the library name. This means we also can simplify things
Modifications:
Adjust build configuration to address for libray name change
Result:
Simplify build
Motivation:
The DnsResolver default start address listen to "0.0.0.0", which may be not what the user wants.
Modification:
Add localAddress as a param of DnsNameResolver and its builder
Result:
The DnsNameResolver's bind address can be configured.
Motivation:
At the moment we don't support session caching on the client side at all when using the native SSL implementation. We should at least allow to enable it.
Modification:
Allow to enable session cache for client side but disable ti by default due a JDK bug atm.
Result:
Be able to cache sessions on the client side when using native SSL implementation .
Motivation:
In WriteTimeoutHandler we did make the assumption that the executor which is used to schedule the timeout is the same that is backing the write promise. This may not be true which will cause concurrency issues
Modifications:
Ensure we are on the right thread when try to modify the doubly-linked-list and if not schedule it on the right thread.
Result:
Fixes https://github.com/netty/netty/issues/11053
Motivation:
At the moment its only possible to create a PendingWriteQueue via a ChannelHandlerContext.
Modifications:
Add another constructor
Result:
More flexible usage of PendingWriteQueue
Motivation:
We need to ensure that we call queue.remove() before we cal writeAndFlush() as this operation may cause an event that also touches the queue and remove from it. If we miss to do so we may see NoSuchElementExceptions.
Modifications:
- Call queue.remove() before calling writeAndFlush(...)
- Add unit test
Result:
Fixes https://github.com/netty/netty/issues/11046
Motivation:
Channels need to have their configurations applied before we can call out to user-code via handlerAdded and initChannel.
Modification:
This adds tests for this behaviour, which already works correctly.
Result:
Better test coverage.
Motivation:
For protocols like QUIC using UDP_SEGMENT (GSO) can help to reduce the
overhead quite a bit. We should support it.
Modifications:
- Add a SegmentedDatagramPacket which can be used to use UDP_SEGMENT
- Add unit test
Result:
Be able to make use of UDP_SEGMENT
Motivation:
- Underlying buffer usages might be erroneous when releasing them internaly
in HttpPostMultipartRequestDecoder.
2 bugs occurs:
1) Final File upload seems not to be of the right size.
2) Memory, even in Disk mode, is increasing continuously, while it shouldn't.
- Method `getByte(position)` is too often called within the current implementation
of the HttpPostMultipartRequestDecoder.
This implies too much activities which is visible when PARANOID mode is active.
This is also true in standard mode.
Apply the same fix on buffer from HttpPostMultipartRequestDecoder to HttpPostStandardRequestDecoder
made previously.
Finally in order to ensure we do not rewrite already decoded HttpData when decoding
next ones within multipart, we must ensure the buffers are copied and not a retained slice.
Modification:
- Add some tests to check consistency for HttpPostMultipartRequestDecoder.
Add a package protected method for testing purpose only.
- Use the `bytesBefore(...)` method instead of `getByte(pos)` in order to limit the external
access to the underlying buffer by retrieving iteratively the beginning of a correct start
position.
It is used to find both LF/CRLF and delimiter.
2 methods in HttpPostBodyUtil were created for that.
The undecodedChunk is copied when adding a chunk to a DataMultipart is loaded.
The same buffer is also rewritten in order to release the copied memory part.
Result:
Just for note, for both Memory or Disk or Mixed mode factories, the release has to be done as:
for (InterfaceHttpData httpData: decoder.getBodyHttpDatas()) {
httpData.release();
factory.removeHttpDataFromClean(request, httpData);
}
factory.cleanAllHttpData();
decoder.destroy();
The memory used is minimal in Disk or Mixed mode. In Memory mode, a big file is still
in memory but not more in the undecodedChunk but its own buffer (copied).
In terms of benchmarking, the results are:
Original code Benchmark Mode Cnt Score Error Units
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigAdvancedLevel thrpt 6 0,152 ± 0,100 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigDisabledLevel thrpt 6 0,543 ± 0,218 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigParanoidLevel thrpt 6 0,001 ± 0,001 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigSimpleLevel thrpt 6 0,615 ± 0,070 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighAdvancedLevel thrpt 6 0,114 ± 0,063 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighDisabledLevel thrpt 6 0,664 ± 0,034 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighParanoidLevel thrpt 6 0,001 ± 0,001 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighSimpleLevel thrpt 6 0,620 ± 0,140 ops/ms
New code Benchmark Mode Cnt Score Error Units
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigAdvancedLevel thrpt 6 4,037 ± 0,358 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigDisabledLevel thrpt 6 4,226 ± 0,471 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigParanoidLevel thrpt 6 0,875 ± 0,029 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigSimpleLevel thrpt 6 4,346 ± 0,275 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighAdvancedLevel thrpt 6 2,044 ± 0,020 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighDisabledLevel thrpt 6 2,278 ± 0,159 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighParanoidLevel thrpt 6 0,174 ± 0,004 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighSimpleLevel thrpt 6 2,370 ± 0,065 ops/ms
In short, using big file transfers, this is about 7 times faster with new code, while
using high number of HttpData, this is about 4 times faster with new code when using Simple Level.
When using Paranoid Level, using big file transfers, this is about 800 times faster with new code, while
using high number of HttpData, this is about 170 times faster with new code.
Motivation:
It is possible for two separate threads to race on recycling an object.
If this happens, the object might be added to a WeakOrderQueue when it shouldn't be.
The end result of this is that an object could be acquired multiple times, without a recycle in between.
Effectively, it ends up in circulation twice.
Modification:
We fix this by making the update to the lastRecycledId field of the handle, an atomic state transition.
Only the thread that "wins" the race and succeeds in their state transition will be allowed to recycle the object.
The others will bail out on their recycling.
We use weakCompareAndSet because we only need the atomicity guarantee, and the program order within each thread is sufficient.
Also, spurious failures just means we won't recycle that particular object, which is fine.
Result:
Objects no longer risk circulating twice due to a recycle race.
This fixes#10986
Motivation:
It is not uncommon to run Netty on OS X without the specific
`MacOSDnsServerAddressStreamProvider`. The current log message is too
verbose because it prints a full stack trace on the console while a
simple logging message would have been enough.
Modifications:
- Print a `WARN` message when `MacOSDnsServerAddressStreamProvider`
class is not found;
- Print a `ERROR` message with a stack trace when the class was found
but could not be loaded due to some other reasons;
Result:
Less noise in logs.
Motivation:
We should use a higher timeout as sometimes the verification process in oss.sonatype.org is very slow.
Modifications:
Bump up timeout to 10 minutes
Result:
Less likely to see timeouts
Motivation:
When finish the release process we need to give the id of the staged release. Let's add a script for that
Modifications:
Add script which allows to show all staged releases
Result:
No need to login into sonatype anymore
Motivation:
If two different headers end up in the same hash bucket, and you are iterating the header that is not the first in the bucket, and you use the iterator to remove the first element returned from the iterator, then you would get a NullPointerException.
Modification:
Change the DefaultHeaders iterator remove method, to re-iterate the hash bucket and unlink the entry once found, if we don't have any existing iteration starting point.
Also made DefaultHeaders.remove0 package private to avoid a synthetic method indirection.
Result:
Removing from iterators from DefaultHeaders is now robust towards hash collisions.
Motivation:
When TLSv1.3 is used (or TLS_FALSE_START) together with mTLS the handshake is considered successful before the server actually did verify the key material that was provided by the client. If the verification fails we currently will just close the stream without any extra information which makes it very hard to debug on the client side.
Modifications:
- Propagate SSLExceptions to the active streams
- Add unit test
Result:
Better visibility into why a stream was closed