Motivation:
EPOLL supports decoupling the timed wakeup mechanism from the selector call. The EPOLL transport takes advantage of this in order to offer more fine grained timer resolution. However we are current calling timerfd_settime on each call to epoll_wait and this is expensive. We don't have to re-arm the timer on every call to epoll_wait and instead only have to arm the timer when a task is scheduled with an earlier expiration than any other existing scheduled task.
Modifications:
- Before scheduled tasks are added to the task queue, we determine if the new
duration is the soonest to expire, and if so update with timerfd_settime. We
also drain all the tasks at the end of the event loop to make sure we service
any expired tasks and get an accurate next time delay.
- EpollEventLoop maintains a volatile variable which represents the next deadline to expire. This variable is modified inside the event loop thread (before calling epoll_wait) and out side the event loop thread (immediately to ensure proper wakeup time).
- Execute the task queue before the schedule task priority queue. This means we
may delay the processing of scheduled tasks but it ensures we transfer all
pending tasks from the task queue to the scheduled priority queue to run the
soonest to expire scheduled task first.
- Deprecate IORatio on EpollEventLoop, and drain the executor and scheduled queue on each event loop wakeup. Coupling the amount of time we are allowed to drain the executor queue to a proportion of time we process inbound IO may lead to unbounded queue sizes and unpredictable latency.
Result:
Fixes https://github.com/netty/netty/issues/7829
- In most cases this results in less calls to timerfd_settime
- Less event loop wakeups just to check for scheduled tasks executed outside the event loop
- More predictable executor queue and scheduled task queue draining
- More accurate and responsive scheduled task execution
Motivation:
It is possible for a remote peer to flood the server / client with empty DATA frames (without end_of_stream flag) set and so cause high CPU usage without the possibility to ever hit a limit. We need to guard against this.
See CVE-2019-9518
Modifications:
- Add a new config option to AbstractHttp2ConnectionBuilder and sub-classes which allows to set the max number of consecutive empty DATA frames (without end_of_stream flag). After this limit is hit we will close the connection. A limit of 10 is used by default.
- Add unit tests
Result:
Guards against CVE-2019-9518
Motivation:
Due how http2 spec is defined it is possible by a remote peer to flood us with frames that will trigger control frames as response, the problem here is that the remote peer can also just stop reading these (while still produce more of these) and so may drive us to the pointer where we either run out of memory or burn all CPU. To protect against this we need to implement some kind of limit that will tear down connections that cause the above mentioned situation.
See CVE-2019-9512 / CVE-2019-9514 / CVE-2019-9515
Modifications:
- Add Http2ControlFrameLimitEncoder which limits the number of queued control frames that were caused because of the remote peer.
- Allow to insert ths Http2ControlFrameLimitEncoder by setting AbstractHttp2ConnectionBuilder.encoderEnforceMaxQueuedControlFrames(...) to a number higher then 0. The default is 10000 which provides some protection by default but will hopefully not cause too many false-positives.
- Add unit tests
Result:
Protect against DDOS due control frames. Fixes CVE-2019-9512 / CVE-2019-9514 / CVE-2019-9515 .
Motivation
Underlying array allocations in UnpooledHeapByteBuf are intended be done
via the protected allocateArray(int) method, so that they can be tracked
and/or overridden by subclasses, for example
UnpooledByteBufAllocator$InstrumentedUnpooledHeapByteBuf or #8015. But
it looks like an explicit allocation was missed in the copy(int,int)
method.
Modification
Just use alloc().heapBuffer(...) for the allocation
Result
No possibility of "missing" array allocations when ByteBuf#copy is used.
Motivation:
We should delay the firing of the Http2ConnectionPrefaceAndSettingsFrameWrittenEvent by one EventLoop tick when using the Http2FrameCodec to ensure all handlers are added to the pipeline before the event is passed through it.
This is needed to workaround a race that could happen when the preface is send in handlerAdded(...) but a later handler wants to act on the event.
Modifications:
Offload firing of the event to the EventExecutor.
Result:
Fixes https://github.com/netty/netty/issues/9432.
Motivation:
As we use the docker files for the CI we should use the delegated mount option to speed up builds.
See https://docs.docker.com/docker-for-mac/osxfs-caching/#delegated
Modifications:
Use delegated mount option
Result:
Faster builds when using docker
Motivation:
When using OpenSSL and JDK < 11 is used we need to wrap the user provided X509ExtendedTrustManager to be able to support TLS1.3. We had a check in place that first tried to see if wrapping is needed at all which could lead to missleading calls of the user provided trustmanager. We should remove these calls and just always wrap if needed.
Modifications:
Always wrap if OpenSSL + JDK < 11 and TLS1.3 is supported
Result:
Less missleading calls to user provided trustmanager
Motivation:
Users' runtime systems may have incompatible dynamic libraries to the ones our
tcnative wrappers link to. Unfortunately, we cannot determine and catch these
scenarios (in which the JVM crashes) but we can make a more educated guess on
what library to load and try to find one that works better before crashing.
Modifications:
1) Build dynamically linked openSSL builds for more OSs (netty-tcnative)
2) Load native linux libraries with matching classifier (first)
Result:
More developers / users can use the dynamically-linked native libraries.
Motivation:
EpollDatagramChannel#localAddress returns wrong information when
EpollDatagramChannel is created with InternetProtocolFamily,
and EpollDatagramChannel#localAddress is invoked BEFORE the actual binding.
This is a regression caused by change
e17ce934da
Modifications:
EpollDatagramChannel() and EpollDatagramChannel(InternetProtocolFamily family)
do not cache local/remote address
Result:
Rebinding on the same address without "reuse port" works
EpollDatagramChannel#localAddress returns correct address
Motivation:
Allow to set the ORIGIN header value from custom headers in WebSocketClientHandshaker
Modification:
Only override header if not present already
Result:
More flexible handshaker usage
Motivation:
#9224 introduced overrides of ByteBufUtil#writeUtf8(...) and related
methods to operate on a sub-CharSequence directly to save having to
allocate substrings, but it missed an edge case where the subsequence
does not extend to the end of the CharSequence and the last char in the
sequence is a high surrogate.
Due to the catch-IndexOutOfBoundsException optimization that avoids an
additional bounds check, it would be possible to read past the specified
end char index and successfully decode a surrogate pair which would
otherwise result in a '?' byte being written.
Modifications:
- Check for end-of-subsequence before reading next char after a high
surrogate is encountered in the
writeUtf8(AbstractByteBuf,int,CharSequence,int,int) and
utf8BytesNonAscii methods
- Add unit test for this edge case
Result:
Bug is fixed.
This removes the bounds-check-avoidance optimization but it does not
appear to have a measurable impact on benchmark results, including when
the char sequence contains many surrogate pairs (which should be rare in
any case).
Motivation:
If the HttpUtil.getCharset method is called with an illegal charset like
"charset=!illegal!" it throws an IllegalCharsetNameException. But the javadoc
states, that defaultCharset is returned if incorrect header value. Since the
client sending the request sets the header value this should not crash.
Modification:
HttpUtil.getCharset catches the IllegalCharsetNameException and returns the
defualt value.
Result:
HttpUtil.getCharset does not throw IllegalCharsetNameException any more.
Motivation:
On openSUSE (probably more), 64 bit builds use lib64, e.g. /usr/lib64, and
configure picks this up and builds the native library in
native-build/target/lib64 where maven is not looking.
Modifications:
Explicitly specify --libdir=${project.build.directory}/native-build/target/lib
during configuration.
Result:
Maven uses the correct lib directory.
Motivation:
Since both projects (to some extend) rely on classifier parsing via the
os-maven-plugin, they should ideally use the same version in case the parsing
changed.
Modifications:
Upgrade os-maven-plugin from 1.6.0 to 1.6.2
Result:
Same os-maven-plugin with same parsing logic.
…ryWebSocketFrames
Motivation:
`Utf8FrameValidator` is always created and added to the pipeline in `WebSocketServerProtocolHandler.handlerAdded` method. However, for websocket connection with only `BinaryWebSocketFrame`'s UTF8 validator is unnecessary overhead. Adding of `Utf8FrameValidator` could be easily avoided by extending of `WebSocketDecoderConfig` with additional property.
Specification requires UTF-8 validation only for `TextWebSocketFrame`.
Modification:
Added `boolean WebSocketDecoderConfig.withUTF8Validator` that allows to avoid adding of `Utf8FrameValidator` during pipeline initialization.
Result:
Less overhead when using only `BinaryWebSocketFrame`within web socket.
Motivation:
Look like `EmbeddedChannelPipeline` should also override `onUnhandledInboundMessage(ChannelHandlerContext ctx, Object msg)` in order to do not print "Discarded message pipeline" because in case of `EmbeddedChannelPipeline` discarding actually not happens.
This fixes next warning in the latest netty version with websocket and `WebSocketServerCompressionHandler`:
```
13:36:36.231 DEBUG- Decoding WebSocket Frame opCode=2
13:36:36.231 DEBUG- Decoding WebSocket Frame length=5
13:36:36.231 DEBUG- Discarded message pipeline : [JdkZlibDecoder#0, DefaultChannelPipeline$TailContext#0]. Channel : [id: 0xembedded, L:embedded - R:embedded].
```
Modification:
Override correct method
Result:
Follow up fix after https://github.com/netty/netty/pull/9286
Motivation:
Our QA servers are spammed with this messages:
13:57:51.560 DEBUG- Decoding WebSocket Frame opCode=1
13:57:51.560 DEBUG- Decoding WebSocket Frame length=4
I think this is too much info for debug level. It is better to move it to trace level.
Modification:
logger.debug changed to logger.trace for WebSocketFrameEncoder/Decoder
Result:
Less messages in Debug mode.
Motivation:
306299323cd47fed6d15767291a3d52e48d16786 introduced some code change to move the responsibility of creating the stream for the upgrade to Http2FrameCodec. Unfortunaly this lead to the situation of having newStream().setStreamAndProperty(...) be called twice. Because of this we only ever saw the channelActive(...) on Http2StreamChannel but no other events as the mapping was replaced on the second newStream().setStreamAndProperty(...) call.
Beside this we also did not use the correct handler for the upgrade stream in some cases
Modifications:
- Just remove the Http2FrameCodec.onHttpClientUpgrade() method and so let the base class handle all of it. The stream is created correctly as part of the ConnectionListener implementation of Http2FrameCodec already.
- Consolidate logic of creating stream channels
- Adjust unit test to capture the bug
Result:
Fixes https://github.com/netty/netty/issues/9395
Motivation:
We did miss to call reclaimSpace(...) in one case which can lead to the situation of having the Recycler to not correctly reclaim space and so just create new objects when not needed.
Modifications:
Correctly call reclaimSpace(...)
Result:
Recycler correctly reclaims space in all situations.
Motivation:
When using the HTTP/2 multiplex implementation we need to ensure we correctly drain the buffered inbound data even if the RecvByteBufallocator.Handle tells us to stop reading in between.
Modifications:
Correctly loop through the buffered inbound data until the user does stop to request from it.
Result:
Fixes https://github.com/netty/netty/issues/9387.
Co-authored-by: Bryce Anderson <banderson@twitter.com>
Motivation:
We can easily reuse the Http2FrameStreamEvent instances and so reduce GC pressure as there may be multiple events per streams over the life-time.
Modifications:
Reuse instances
Result:
Less allocations
Motivation:
In many places Netty uses Unpooled.buffer(0) while should use EMPTY_BUFFER. We can't change this due to back compatibility in the constructors but can use Unpooled.EMPTY_BUFFER in some cases to ensure we not allocate at all. In others we can directly use the allocator either from the Channel / ChannelHandlerContext or the request / response.
Modification:
- Use Unpooled.EMPTY_BUFFER where possible
- Use allocator where possible
Result:
Fixes#9345 for websockets and http package
Motivation:
At the moment we lookup the ChannelHandlerContext used in Http2StreamChannelBootstrap each time the open(...) method is invoked. This is not needed and we can just cache it for later usage.
Modifications:
Cache ChannelHandlerContext in volatile field.
Result:
Speed up open(...) method implementation when called multiple times
Motivation:
We need to ensure we place the encoder before the decoder when doing the websockets upgrade as the decoder may produce a close frame when protocol violations are detected.
Modifications:
- Correctly place encoder before decoder
- Add unit test
Result:
Fixes https://github.com/netty/netty/issues/9300
Motivation:
If a read triggers a AbstractHttp2StreamChannel to close we can
get an NPE in the read loop.
Modifications:
Make sure that the inboundBuffer isn't null before attempting to
continue the loop.
Result:
No NPE.
Fixes#9337
Motivation:
We recently made a change to use ET for the eventfd and not trigger a read each time. This testcase proves everything works as expected.
Modifications:
Add testcase that verifies thqat the wakeups happen correctly
Result:
More tests
Motivation:
If the encoded value of a form element happens to exactly hit
the chunk limit (8096 bytes), the post request encoder will
throw a NullPointerException.
Modifications:
Catch the null case and return.
Result:
No NPE.
Motivation:
The Http2FrameCodec should be responsible to create the upgrade stream.
Modifications:
Move code to create stream to Http2FrameCodec
Result:
More correct responsibility
Motivation
The AbstractEpollStreamChannel::spliceTo(FileDescriptor, ...) methods
take an offset parameter but this was effectively ignored due to what
looks like a typo in the corresponding JNI function impl. Instead it
would always use the file's own native offset.
Modification
- Fix typo in netty_epoll_native_splice0() and offset accounting in
AbstractEpollStreamChannel::SpliceFdTask.
- Modify unit test to include an invocation of the public spliceTo
method using non-zero offset.
Result
spliceTo FD methods work as expected when an offset is provided.
Motivation:
While fixing #9359 found few places that could be patched / improved separately.
Modification:
On handshake response generation - throw exception before allocating response objects if request is invalid.
Result:
No more leaks when exception is thrown.
Motivation:
Mark Http2StreamChannelBootstrap.open0(...) as deprecated as the user should not use it. It was marked as public by mistake.
Modifications:
Add deprecation warning.
Result:
User will be aware the method should not be used directly.
Motivation:
There are situations where the user may want to be more flexible when to send the PING acks for various reasons or be able to attach a listener to the future that is used for the ping ack. To be able to do so we should allow to manage the acks manually.
Modifications:
- Add constructor to DefaultHttp2ConnectionDecoder that allows to disable the automatically sending of ping acks (default is to send automatically to not break users)
- Add methods to AbstractHttp2ConnectionHandlerBuilder (and sub-classes) to either enable ot disable auto acks for pings
- Make DefaultHttp2PingFrame constructor public that allows to write acks.
- Add unit test
Result:
More flexible way of handling acks.
Motivation
I noticed this while looking at something else.
AbstractEpollStreamChannel::spliceQueue is an MPSC queue but only
accessed from the event loop. So it could be just changed to e.g. an
ArrayDeque. This PR instead reverts to using is as an MPSC queue to
avoid dispatching a task to the EL, as appears was the original
intention.
Modification
Change AbstractEpollStreamChannel::spliceQueue to be volatile and lazily
initialized via double-checked locking. Add tasks directly to the queue
from the public methods rather than possibly waking the EL just to
enqueue.
An alternative is just to change PlatformDependent.newMpscQueue() to new
ArrayDeque() and be done with it :)
Result
Less disruptive channel/fd-splicing.
Motivation:
Based on https://tools.ietf.org/html/rfc6455#section-1.3 - for non-browser
clients, Origin header field may be sent if it makes sense in the context of those clients.
Modification:
Replace Sec-WebSocket-Origin to Origin
Result:
Fixes#9134 .
Motivation:
Recently I'm going to build MQTT broker and client based on Netty. I had MQTT encoder and decoder founded, while no basic examples. So I'm going to share my simple heartBeat MQTT broker and client as an example.
Modification:
New MQTT heartBeat example under io.netty.example/mqtt/heartBeat/.
Result:
Client would send CONNECT and PINGREQ(heartBeat message).
- CONNECT: once channel active
- PINGREQ: once IdleStateEvent triggered, which is 20 seconds in this example
Client would discard all messages it received.
MQTT broker could handle CONNECT, PINGREQ and DISCONNECT messages.
- CONNECT: send CONNACK back
- PINGREQ: send PINGRESP back
- DISCONNECT: close the channel
Broker would close the channel if 2 heartBeat lost, which set to 45 seconds in this example.
Motivation
Debugging SSL/TLS connections through wireshark is a pain -- if the cipher used involves Diffie-Hellman then it is essentially impossible unless you can have the client dump out the master key [1]
This is a work-in-progress change (tests & comments to come!) that introduces a new handler you can set on the SslContext to receive the master key & session id. I'm hoping to get feedback if a change in this vein would be welcomed.
An implementation that conforms to Wireshark's NSS key log[2] file is also included.
Depending on feedback on the PR going forward I am planning to "clean it up" by adding documentation, example server & tests. Implementation will need to be finished as well for retrieving the master key from the OpenSSL context.
[1] https://jimshaver.net/2015/02/11/decrypting-tls-browser-traffic-with-wireshark-the-easy-way/
[2] https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/Key_Log_Format
Modification
- Added SslMasterKeyHandler
- An implementation of the handler that conforms to Wireshark's key log format is included.
Result:
Be able to debug SSL / TLS connections more easily.
Signed-off-by: Farid Zakaria <farid.m.zakaria@gmail.com>
Motivation:
Netty homepage(netty.io) serves both "http" and "https".
It's recommended to use https than http.
Modification:
I changed from "http://netty.io" to "https://netty.io"
Result:
No effects.
Motivation:
Http2ConnectionHandler (and sub-classes) allow to configure a graceful shutdown timeout but only apply it if there is at least one active stream. We should always apply the timeout. This is also true when we try to send a GO_AWAY and close the connection because of an connection error.
Modifications:
- Always apply the timeout if one is configured
- Add unit test
Result:
Always respect gracefulShutdownTimeoutMillis
Motivation:
b3dba317d797e21cc253bb6ad6776307297f612e introduced the concept of Http2SettingsReceivedConsumer but did not correctly inplement DecoratingHttp2ConnectionEncoder.consumeRemoteSettings(...).
Modifications:
- Add missing `else` around the throws
- Add unit tests
Result:
Correctly implement DecoratingHttp2ConnectionEncoder.consumeRemoteSettings(...)
Motivation
The nice change made by @carl-mastrangelo in #9307 for lookup-table
based HPACK Huffman decoding can be simplified a little to remove the
separate flags field and eliminate some intermediate operations.
Modification
Simplify HpackHuffmanDecoder::decode logic including de-dup of the
per-nibble part.
Result
Less code, possibly better performance though not noticeable in a quick
benchmark.