Motivation
The epoll transport was updated in #7834 to decouple setting of the
timerFd from the event loop, so that scheduling delayed tasks does not
require waking up epoll_wait. To achieve this, new overridable hooks
were added in the AbstractScheduledEventExecutor and
SingleThreadEventExecutor superclasses.
However, the minimumDelayScheduledTaskRemoved hook has no current
purpose and I can't envisage a _practical_ need for it. Removing
it would reduce complexity and avoid supporting this specific
API indefinitely. We can add something similar later if needed
but the opposite is not true.
There also isn't a _nice_ way to use the abstractions for
wakeup-avoidance optimizations in other EventLoops that don't have a
decoupled timer.
This PR replaces executeScheduledRunnable and
wakesUpForScheduledRunnable
with two new methods before/afterFutureTaskScheduled that have slightly
different semantics:
- They only apply to additions; given the current internals there's no
practical use for removals
- They allow per-submission wakeup decisions via a boolean return val,
which makes them easier to exploit from other existing EL impls (e.g.
NIO/KQueue)
- They are subjectively "cleaner", taking just the deadline parameter
and not exposing Runnables
- For current EL/queue impls, only the "after" hook is really needed,
but specialized blocking queue impls can conditionally wake on task
submission (I have one lined up)
Also included are further optimization/simplification/fixes to the
timerFd manipulation logic.
Modifications
- Remove AbstractScheduledEventExecutor#minimumDelayScheduledTaskRemoved()
and supporting methods
- Uplift NonWakeupRunnable and corresponding default wakesUpForTask()
impl from SingleThreadEventLoop to SingleThreadEventExecutor
- Change executeScheduledRunnable() to be package-private, and have a
final impl in SingleThreadEventExecutor which triggers new overridable
hooks before/afterFutureTaskScheduled()
- Remove unnecessary use of bookend tasks while draining the task queue
- Use new hooks to add simpler wake-up avoidance optimization to
NioEventLoop (primarily to demonstrate utility/simplicity)
- Reinstate removed EpollTest class
In EpollEventLoop:
- Refactor to use only the new afterFutureTaskScheduled() hook for
updating timerFd
- Fix setTimerFd race condition using a monitor
- Set nextDeadlineNanos to a negative value while the EL is awake and
use this to block timer changes from outside the EL. Restore the
known-set value prior to sleeping, updating timerFd first if necessary
- Don't read from timerFd when processing expiry event
Result
- Cleaner API for integrating with different EL/queue timing impls
- Fixed race condition to avoid missing scheduled wakeups
- Eliminate unnecessary timerFd updates while EL is awake, and
unnecessary expired timerFd reads
- Avoid unnecessary scheduled-task wakeups when using NIO transport
I did not yet further explore the suggestion of using
TFD_TIMER_ABSTIME for the timerFd.
Motivation:
Amazon lately released Amazon Corretto Crypto Provider, so we should include it in our testsuite
Modifications:
Add tests related to Amazon Corretto Crypto Provider
Result:
Test netty with Amazon Corretto Crypto Provider
Motivation:
Http post request may be encoded as 'multipart/form-data' without any files and consist mixed attributes only.
Modifications:
- Do not double release attributes
- Add unit test
Result:
Code does not throw an IllegalReferenceCountException.
Motivation
Currently an epoll_ctl syscall is made every time there is a change to
the event interest flags (EPOLLIN, EPOLLOUT, etc) of a channel. These
are only done in the event loop so can be aggregated into 0 or 1 such
calls per channel prior to the next call to epoll_wait.
Modifications
I think further streamlining/simplification is possible but for now I've
tried to minimize structural changes and added the aggregation beneath
the existing flag manipulation logic.
A new AbstractChannel#activeFlags field records the flags last set on
the epoll fd for that channel. Calls to setFlag/clearFlag update the
flags field as before but instead of calling epoll_ctl immediately, just
set or clear a bit for the channel in a new bitset in the associated
EpollEventLoop to reflect whether there's any change to the last set
value.
Prior to calling epoll_wait the event loop makes the appropriate
epoll_ctl(EPOLL_CTL_MOD) call once for each channel who's bit is set.
Result
Fewer syscalls, particularly in some auto-read=false cases. Simplified
error handling from centralization of these calls.
Motivation:
We need to ensure we replace WebSocketServerProtocolHandshakeHandler before doing the actual handshake as the handshake itself may complete directly and so forward pending bytes through the pipeline.
Modifications:
Replace the handler before doing the actual handshake.
Result:
Fixes https://github.com/netty/netty/issues/9471.
Motivation:
It is possible that the user uses a too big EDNS0 setting for the MTU and so we may receive a truncated datagram packet. In this case we should try to detect this and retry via TCP if possible
Modifications:
- Fix detecting of incomplete records
- Mark response as truncated if we did not consume the whole packet
- Add unit test
Result:
Fixes https://github.com/netty/netty/issues/9365
Motivation:
AsciiString.contentEqualsIgnoreCase may return true for non-matching strings of equal length when offset is non zero.
Modifications:
- Correctly take offset into account
- Add unit test
Result:
Fixes#9475
Motivation:
If all we need is the FileChannel we should better use RandomAccessFile as FileInputStream and FileOutputStream use a finalizer.
Modifications:
Replace FileInputStream and FileOutputStream with RandomAccessFile when possible.
Result:
Fixes https://github.com/netty/netty/issues/8078.
Motivation:
In AbstractBoostrap, options and attrs are LinkedHashMap that are synchronized on for every read, copy/clone, write operation.
When a lot of connections are triggered concurrently on the same bootstrap instance, the synchronized blocks lead to contention, Netty IO threads get blocked, and performance may be severely degraded.
Modifications:
Use ConcurrentHashMap
Result:
Less contention. Fixes https://github.com/netty/netty/issues/9426
Motivation:
We should use the same java versions whenever we use CentOS 6 or 7 and also use the latest Java12 version
Modifications:
- Use same Java versions
- Use latest Java 12 version
- Remove old configs which are not used anymore
Result:
Docker cleanup
Motivation:
We should better update the flow-controller on Channel.read() to reduce overhead and memory overhead.
See https://github.com/netty/netty/pull/9390#issuecomment-513008269
Modifications:
Move updateLocalWindowIfNeeded() to doBeginRead()
Result:
Reduce memory overhead
Motivation
#1802 fixed ByteBuf implementations to ensure that the whole buffer
region is preserved when capacity is increased, not just the readable
part. The behaviour is still different however when the capacity is
_decreased_ - data outside the currently-readable region is zeroed.
Modifications
Update ByteBuf capacity(int) implementations to also copy the whole
buffer region when the new capacity is less than the current capacity.
Result
Consistent behaviour of ByteBuf#capacity(int) regardless of whether the
new capacity is greater than or less than the current capacity.
Motivation:
It was possible to produce a NPE when we for examples received more responses as requests as we did not check if the queue did not contain a method before trying to compare method names.
Modifications:
- Add extra null check
- Add unit tet
Result:
Fixes https://github.com/netty/netty/issues/9459
Motivation:
As we decorate the Http2FrameListener under the covers we should ensure the user can still access the original Http2FrameListener.
Modifications:
- Unwrap the Http2FrameListener in frameListener()
- Add unit test
Result:
Less suprises for users.
Motivation:
We should only ever close the underlying tcp socket once we received the envelope to ensure we never race in the test.
Modifications:
- Only close socket once we received the envelope
- Set REUSE_ADDR
Result:
More robust test
Motivation:
We did not correctly pass the mask parameters in all cases.
Modifications:
Correctly pass on parameters
Result:
Fixes https://github.com/netty/netty/issues/9463.
Motivation:
We recently introduced Http2ControlFrameLimitEncoderTest which did not correctly notify the goAway promises and so leaked buffers.
Modifications:
Correctly notify all promises and so release the debug data.
Result:
Fixes leak in HTTP2 test
Motivation:
EPOLL supports decoupling the timed wakeup mechanism from the selector call. The EPOLL transport takes advantage of this in order to offer more fine grained timer resolution. However we are current calling timerfd_settime on each call to epoll_wait and this is expensive. We don't have to re-arm the timer on every call to epoll_wait and instead only have to arm the timer when a task is scheduled with an earlier expiration than any other existing scheduled task.
Modifications:
- Before scheduled tasks are added to the task queue, we determine if the new
duration is the soonest to expire, and if so update with timerfd_settime. We
also drain all the tasks at the end of the event loop to make sure we service
any expired tasks and get an accurate next time delay.
- EpollEventLoop maintains a volatile variable which represents the next deadline to expire. This variable is modified inside the event loop thread (before calling epoll_wait) and out side the event loop thread (immediately to ensure proper wakeup time).
- Execute the task queue before the schedule task priority queue. This means we
may delay the processing of scheduled tasks but it ensures we transfer all
pending tasks from the task queue to the scheduled priority queue to run the
soonest to expire scheduled task first.
- Deprecate IORatio on EpollEventLoop, and drain the executor and scheduled queue on each event loop wakeup. Coupling the amount of time we are allowed to drain the executor queue to a proportion of time we process inbound IO may lead to unbounded queue sizes and unpredictable latency.
Result:
Fixes https://github.com/netty/netty/issues/7829
- In most cases this results in less calls to timerfd_settime
- Less event loop wakeups just to check for scheduled tasks executed outside the event loop
- More predictable executor queue and scheduled task queue draining
- More accurate and responsive scheduled task execution
Motivation:
It is possible for a remote peer to flood the server / client with empty DATA frames (without end_of_stream flag) set and so cause high CPU usage without the possibility to ever hit a limit. We need to guard against this.
See CVE-2019-9518
Modifications:
- Add a new config option to AbstractHttp2ConnectionBuilder and sub-classes which allows to set the max number of consecutive empty DATA frames (without end_of_stream flag). After this limit is hit we will close the connection. A limit of 10 is used by default.
- Add unit tests
Result:
Guards against CVE-2019-9518
Motivation:
Due how http2 spec is defined it is possible by a remote peer to flood us with frames that will trigger control frames as response, the problem here is that the remote peer can also just stop reading these (while still produce more of these) and so may drive us to the pointer where we either run out of memory or burn all CPU. To protect against this we need to implement some kind of limit that will tear down connections that cause the above mentioned situation.
See CVE-2019-9512 / CVE-2019-9514 / CVE-2019-9515
Modifications:
- Add Http2ControlFrameLimitEncoder which limits the number of queued control frames that were caused because of the remote peer.
- Allow to insert ths Http2ControlFrameLimitEncoder by setting AbstractHttp2ConnectionBuilder.encoderEnforceMaxQueuedControlFrames(...) to a number higher then 0. The default is 10000 which provides some protection by default but will hopefully not cause too many false-positives.
- Add unit tests
Result:
Protect against DDOS due control frames. Fixes CVE-2019-9512 / CVE-2019-9514 / CVE-2019-9515 .
Motivation
Underlying array allocations in UnpooledHeapByteBuf are intended be done
via the protected allocateArray(int) method, so that they can be tracked
and/or overridden by subclasses, for example
UnpooledByteBufAllocator$InstrumentedUnpooledHeapByteBuf or #8015. But
it looks like an explicit allocation was missed in the copy(int,int)
method.
Modification
Just use alloc().heapBuffer(...) for the allocation
Result
No possibility of "missing" array allocations when ByteBuf#copy is used.
Motivation:
We should delay the firing of the Http2ConnectionPrefaceAndSettingsFrameWrittenEvent by one EventLoop tick when using the Http2FrameCodec to ensure all handlers are added to the pipeline before the event is passed through it.
This is needed to workaround a race that could happen when the preface is send in handlerAdded(...) but a later handler wants to act on the event.
Modifications:
Offload firing of the event to the EventExecutor.
Result:
Fixes https://github.com/netty/netty/issues/9432.
Motivation:
As we use the docker files for the CI we should use the delegated mount option to speed up builds.
See https://docs.docker.com/docker-for-mac/osxfs-caching/#delegated
Modifications:
Use delegated mount option
Result:
Faster builds when using docker
Motivation:
When using OpenSSL and JDK < 11 is used we need to wrap the user provided X509ExtendedTrustManager to be able to support TLS1.3. We had a check in place that first tried to see if wrapping is needed at all which could lead to missleading calls of the user provided trustmanager. We should remove these calls and just always wrap if needed.
Modifications:
Always wrap if OpenSSL + JDK < 11 and TLS1.3 is supported
Result:
Less missleading calls to user provided trustmanager
Motivation:
Users' runtime systems may have incompatible dynamic libraries to the ones our
tcnative wrappers link to. Unfortunately, we cannot determine and catch these
scenarios (in which the JVM crashes) but we can make a more educated guess on
what library to load and try to find one that works better before crashing.
Modifications:
1) Build dynamically linked openSSL builds for more OSs (netty-tcnative)
2) Load native linux libraries with matching classifier (first)
Result:
More developers / users can use the dynamically-linked native libraries.
Motivation:
EpollDatagramChannel#localAddress returns wrong information when
EpollDatagramChannel is created with InternetProtocolFamily,
and EpollDatagramChannel#localAddress is invoked BEFORE the actual binding.
This is a regression caused by change
e17ce934da
Modifications:
EpollDatagramChannel() and EpollDatagramChannel(InternetProtocolFamily family)
do not cache local/remote address
Result:
Rebinding on the same address without "reuse port" works
EpollDatagramChannel#localAddress returns correct address
Motivation:
Allow to set the ORIGIN header value from custom headers in WebSocketClientHandshaker
Modification:
Only override header if not present already
Result:
More flexible handshaker usage
Motivation:
#9224 introduced overrides of ByteBufUtil#writeUtf8(...) and related
methods to operate on a sub-CharSequence directly to save having to
allocate substrings, but it missed an edge case where the subsequence
does not extend to the end of the CharSequence and the last char in the
sequence is a high surrogate.
Due to the catch-IndexOutOfBoundsException optimization that avoids an
additional bounds check, it would be possible to read past the specified
end char index and successfully decode a surrogate pair which would
otherwise result in a '?' byte being written.
Modifications:
- Check for end-of-subsequence before reading next char after a high
surrogate is encountered in the
writeUtf8(AbstractByteBuf,int,CharSequence,int,int) and
utf8BytesNonAscii methods
- Add unit test for this edge case
Result:
Bug is fixed.
This removes the bounds-check-avoidance optimization but it does not
appear to have a measurable impact on benchmark results, including when
the char sequence contains many surrogate pairs (which should be rare in
any case).
Motivation:
If the HttpUtil.getCharset method is called with an illegal charset like
"charset=!illegal!" it throws an IllegalCharsetNameException. But the javadoc
states, that defaultCharset is returned if incorrect header value. Since the
client sending the request sets the header value this should not crash.
Modification:
HttpUtil.getCharset catches the IllegalCharsetNameException and returns the
defualt value.
Result:
HttpUtil.getCharset does not throw IllegalCharsetNameException any more.
Motivation:
On openSUSE (probably more), 64 bit builds use lib64, e.g. /usr/lib64, and
configure picks this up and builds the native library in
native-build/target/lib64 where maven is not looking.
Modifications:
Explicitly specify --libdir=${project.build.directory}/native-build/target/lib
during configuration.
Result:
Maven uses the correct lib directory.
Motivation:
Since both projects (to some extend) rely on classifier parsing via the
os-maven-plugin, they should ideally use the same version in case the parsing
changed.
Modifications:
Upgrade os-maven-plugin from 1.6.0 to 1.6.2
Result:
Same os-maven-plugin with same parsing logic.
…ryWebSocketFrames
Motivation:
`Utf8FrameValidator` is always created and added to the pipeline in `WebSocketServerProtocolHandler.handlerAdded` method. However, for websocket connection with only `BinaryWebSocketFrame`'s UTF8 validator is unnecessary overhead. Adding of `Utf8FrameValidator` could be easily avoided by extending of `WebSocketDecoderConfig` with additional property.
Specification requires UTF-8 validation only for `TextWebSocketFrame`.
Modification:
Added `boolean WebSocketDecoderConfig.withUTF8Validator` that allows to avoid adding of `Utf8FrameValidator` during pipeline initialization.
Result:
Less overhead when using only `BinaryWebSocketFrame`within web socket.
Motivation:
Look like `EmbeddedChannelPipeline` should also override `onUnhandledInboundMessage(ChannelHandlerContext ctx, Object msg)` in order to do not print "Discarded message pipeline" because in case of `EmbeddedChannelPipeline` discarding actually not happens.
This fixes next warning in the latest netty version with websocket and `WebSocketServerCompressionHandler`:
```
13:36:36.231 DEBUG- Decoding WebSocket Frame opCode=2
13:36:36.231 DEBUG- Decoding WebSocket Frame length=5
13:36:36.231 DEBUG- Discarded message pipeline : [JdkZlibDecoder#0, DefaultChannelPipeline$TailContext#0]. Channel : [id: 0xembedded, L:embedded - R:embedded].
```
Modification:
Override correct method
Result:
Follow up fix after https://github.com/netty/netty/pull/9286
Motivation:
Our QA servers are spammed with this messages:
13:57:51.560 DEBUG- Decoding WebSocket Frame opCode=1
13:57:51.560 DEBUG- Decoding WebSocket Frame length=4
I think this is too much info for debug level. It is better to move it to trace level.
Modification:
logger.debug changed to logger.trace for WebSocketFrameEncoder/Decoder
Result:
Less messages in Debug mode.
Motivation:
306299323cd47fed6d15767291a3d52e48d16786 introduced some code change to move the responsibility of creating the stream for the upgrade to Http2FrameCodec. Unfortunaly this lead to the situation of having newStream().setStreamAndProperty(...) be called twice. Because of this we only ever saw the channelActive(...) on Http2StreamChannel but no other events as the mapping was replaced on the second newStream().setStreamAndProperty(...) call.
Beside this we also did not use the correct handler for the upgrade stream in some cases
Modifications:
- Just remove the Http2FrameCodec.onHttpClientUpgrade() method and so let the base class handle all of it. The stream is created correctly as part of the ConnectionListener implementation of Http2FrameCodec already.
- Consolidate logic of creating stream channels
- Adjust unit test to capture the bug
Result:
Fixes https://github.com/netty/netty/issues/9395
Motivation:
We did miss to call reclaimSpace(...) in one case which can lead to the situation of having the Recycler to not correctly reclaim space and so just create new objects when not needed.
Modifications:
Correctly call reclaimSpace(...)
Result:
Recycler correctly reclaims space in all situations.
Motivation:
When using the HTTP/2 multiplex implementation we need to ensure we correctly drain the buffered inbound data even if the RecvByteBufallocator.Handle tells us to stop reading in between.
Modifications:
Correctly loop through the buffered inbound data until the user does stop to request from it.
Result:
Fixes https://github.com/netty/netty/issues/9387.
Co-authored-by: Bryce Anderson <banderson@twitter.com>
Motivation:
We can easily reuse the Http2FrameStreamEvent instances and so reduce GC pressure as there may be multiple events per streams over the life-time.
Modifications:
Reuse instances
Result:
Less allocations
Motivation:
In many places Netty uses Unpooled.buffer(0) while should use EMPTY_BUFFER. We can't change this due to back compatibility in the constructors but can use Unpooled.EMPTY_BUFFER in some cases to ensure we not allocate at all. In others we can directly use the allocator either from the Channel / ChannelHandlerContext or the request / response.
Modification:
- Use Unpooled.EMPTY_BUFFER where possible
- Use allocator where possible
Result:
Fixes#9345 for websockets and http package
Motivation:
At the moment we lookup the ChannelHandlerContext used in Http2StreamChannelBootstrap each time the open(...) method is invoked. This is not needed and we can just cache it for later usage.
Modifications:
Cache ChannelHandlerContext in volatile field.
Result:
Speed up open(...) method implementation when called multiple times
Motivation:
We need to ensure we place the encoder before the decoder when doing the websockets upgrade as the decoder may produce a close frame when protocol violations are detected.
Modifications:
- Correctly place encoder before decoder
- Add unit test
Result:
Fixes https://github.com/netty/netty/issues/9300
Motivation:
If a read triggers a AbstractHttp2StreamChannel to close we can
get an NPE in the read loop.
Modifications:
Make sure that the inboundBuffer isn't null before attempting to
continue the loop.
Result:
No NPE.
Fixes#9337