Motivation:
PlatformDependent#getSystemClassLoader may throw a wide variety of exceptions based upon the environment. We should handle all exceptions and continue initializing the slow path if an exception occurs.
Modifications:
- Catch Throwable in cases where PlatformDependent#getSystemClassLoader is used
Result:
Fixes https://github.com/netty/netty/issues/6038
Motivation:
If an exception is thrown while processing the ready channels in the EventLoop we should still run all tasks as this may allow to recover. For example a OutOfMemoryError may be thrown and runAllTasks() will free up memory again. Beside this we should also ensure we always allow to shutdown even if an exception was thrown.
Modifications:
- Call runAllTasks() in a finally block
- Ensure shutdown is always handles.
Result:
More robust EventLoop implementations for NIO and Epoll.
Motivation:
We should better first process OP_WRITE before OP_READ as this may allow us to free memory in a faster fashion for previous queued writes.
Modifications:
Process OP_WRITE before OP_READ
Result:
Free memory faster for queued writes.
Motivation:
The JDK implementation of SocketChannel has an internal state that is tracked for its operations. Because of this we need to ensure we call finishConnect() before try to call read(...) / write(...) as otherwise it may produce a NotYetConnectedException.
Modifications:
First process OP_CONNECT flag.
Result:
No more possibility of NotYetConnectedException because OP_CONNECT is handled not early enough when processing interestedOps for a Channel.
Motivation:
When attempting to set the selectedKeys fields on the selector
implementation, JDK 9 can throw an inaccessible object exception.
Modications:
Catch and log this exception as an possible course of action if the
sun.nio.ch package is not exported from java.base.
Result:
The selector replacement will fail gracefully as an expected course of
action if the sun.nio.ch package is not exported from java.base.
Motivation:
Instrumenting the NIO selector implementation requires special
permissions. Yet, the code for performing this instrumentation is
executed in a manner that would require all code leading up to the
initialization to have the requisite permissions. In a restrictive
environment (e.g., under a security policy that only grants the
requisite permissions the Netty transport jar but not to application
code triggering the Netty initialization), then instrumeting the
selector will not succeed even if the security policy would otherwise
permit it.
Modifications:
This commit marks the necessary blocks as privileged. This enables
access to the necessary resources for instrumenting the selector. The
idea is that we are saying the Netty code is trusted, and as long as the
Netty code has been granted the necessary permissions, then we will
allow the caller access to these resources even though the caller itself
might not have the requisite permissions.
Result:
The selector can be instrumented in a restrictive security environment.
Motivation:
Writing to a system property requires permissions. Yet the code for
setting sun.nio.ch.bugLevel is not marked as privileged. In a
restrictive environment (e.g., under a security policy that only grants
the requisite permissions the Netty transport jar but not to application
code triggering the Netty initialization), writing to this system
property will not succeed even if the security policy would otherwise
permit it.
Modifications:
This commt marks the necessary code block as privileged. This enables
writing to this system property. The idea is that we are saying the
Netty code is trusted, and as long as the Netty code has been granted
the necessary permissions, then we will allow the caller access to these
resources even though the caller itself might not have the requisite
permissions.
Result:
The system property sun.nio.ch.bugLevel can be written to in a
restrictive security environment.
Motiviation:
Sometimes it is useful to allow to specify a custom strategy to handle rejected tasks. For example if someone tries to add tasks from outside the eventloop it may make sense to try to backoff and retries and so give the executor time to recover.
Modification:
Add RejectedEventExecutor interface and implementations and allow to inject it.
Result:
More flexible handling of executor overload.
Motivation:
To restrict the memory usage of a system it is sometimes needed to adjust the number of max pending tasks in the tasks queue.
Modifications:
- Add new constructors to modify the number of allowed pending tasks.
- Add system properties to configure the default values.
Result:
More flexible configuration.
Motivation:
If a user writes an own nio based transport which uses a special SelectorProvider it is useful to be able to get the SelectorProvider that is used by a NioEventLoop. This way this can be used when implement AbstractChannel.isCompatible(...) and check that the SelectorProvider is the correct one.
Modifications:
Expose the SelectorProvider.
Result:
Be able to get the SelectorProvider used by a NioEventLoop.
Motivation:
To better debug why a Selector need to be rebuild it is useful to also log the instance of the Selector.
Modifications:
Add logger instance to the log message.
Result:
More useful log message.
Motivation:
SingleThreadEventExecutor.pendingTasks() will call taskQueue.size() to get the number of pending tasks in the queue. This is not safe when using MpscLinkedQueue as size() is only allowed to be called by a single consumer.
Modifications:
Ensure size() is only called from the EventLoop.
Result:
No more livelock possible when call pendingTasks, no matter from which thread it is done.
Motivation:
If a task was submitted when wakenUp value was true, the task didn't get a chance to call Selector#wakeup.
So we need to check task queue again before executing select operation. If we don't, the task might be pended until select operation was timed out.
It might be pended until idle timeout if IdleStateHandler existed in pipeline.
Modifications:
Execute Selector#select in a non-blocking manner if there's a task submitted when wakenUp value was true.
Result:
Every tasks in NioEventLoop will not be pended.
Motivation:
If a channel is deregistered from an NioEventLoop the associated SelectionKey is cancelled. If the NioEventLoop has yet to process a pending event as a result of that SelectionKey then the NioEventLoop will see the SelecitonKey is invalid and close the channel. The NioEventLoop should not close a channel if it is not registered with that NioEventLoop.
Modifications:
- NioEventLoop.processSelectedKeys should check that the channel is still registered to itself before closing the channel
Result:
NioEventLoop doesn't close a channel that is no longer registered to it when the SelectionKey is invalid
Fixes https://github.com/netty/netty/issues/5125
Motivation:
NIO now supports a pluggable select strategy, but EPOLL currently doesn't support this. We should strive for feature parity for EPOLL.
Modifications:
- Add SelectStrategy to EPOLL transport.
Result:
EPOLL transport supports SelectStategy.
Motivation:
Under high throughput/low latency workloads, selector wakeups are
degrading performance when the incoming operations are triggered
from outside of the event loop. This is a common scenario for
"client" applications where the originating input is coming from
application threads rather from the socket attached inside the
event loops.
As a result, it can be desirable to defer the blocking select
so that incoming tasks (write/flush) do not need to wakeup
the selector.
Modifications:
This changeset adds the notion of a generic SelectStrategy which,
based on its contract, allows the implementation to optionally
defer the blocking select based on some custom criteria.
The default implementation resembles the original behaviour, that
is if tasks are in the queue `selectNow()` and move on, and if no
tasks need to be processed go into the blocking select and wait
for wakeup.
The strategy can be customized per `NioEventLoopGroup` in the
constructor.
Result:
High performance client applications are now given the chance to
customize for how long the actual selector blocking should be
deferred by employing a custom select strategy.
Motivation:
The prefix fix of #2363 did not correctly handle the case when selectAgain is true and so missed to null out entries.
Modifications:
Move the i++ from end of loop to beginning of loop
Result:
Entries in the array will be null out so allow to have these GC'ed once the Channel close
Motivation:
We use the nanoTime of the scheduledTasks to calculate the milli-seconds to wait for a select operation to select something. Once these elapsed we check if there was something selected or some task is ready for processing. Unfortunally we not take into account scheduled tasks here so the selection loop will continue if only scheduled tasks are ready for processing. This will delay the execution of these tasks.
Modification:
- Check if a scheduled task is ready after selecting
- also make a tiny change in NioEventLoop to not trigger a rebuild if nothing was selected because the timeout was reached a few times in a row.
Result:
Execute scheduled tasks on time.
Motivation:
When a select rebuild was triggered the reference to the SelectionKey is not updated in AbstractNioChannel. This will cause a CancelledKeyException later.
Modification:
Correctly update SelectionKey reference after rebuild
Result:
Fix exception
Motivation:
Because Thread.currentThread().interrupt() will unblock Selector.select() we need to take special care when check if we need to rebuild the Selector. If the unblock was caused by the interrupt() we will clear it and move on as this is most likely a bug in a custom ChannelHandler or a library the user makes use of.
Modification:
Clear the interrupt state of the Thread if the Selector was unblock because of an interrupt and the returned keys was 0.
Result:
No more busy loop caused by Thread.currentThread().interrupt()
Motivation:
When using System.getProperty(...) and various methods to get a ClassLoader it will fail when a SecurityManager is in place.
Modifications:
Use a priveled block if needed. This work is based in the PR #2353 done by @anilsaldhana .
Result:
Code works also when SecurityManager is present
Motivation:
Because we not null out the array entry in the SelectionKey[] which is produced by SelectedSelectionKeySet.flip() we may end up with a few SelectionKeyreferences still hanging around here even after the Channel was closed. As these entries may be present at the end of the SelectionKey[] which is never updated for a long time as not enough SelectionKeys are ready.
Modifications:
Once we access the SelectionKey out of the SelectionKey[] we directly null it out.
Result:
Reference can be GC'ed right away once the Channel was closed.
shutdownGracefully() provides two optional parameters that give more
control over when an executor has to be shut down.
- Related issue: #1307
- Add shutdownGracefully(..) and isShuttingDown()
- Deprecate shutdown() / shutdownNow()
- Replace lastAccessTime with lastExecutionTime and update it after task
execution for accurate quiet period check
- runAllTasks() and runShutdownTasks() update it automatically.
- Add updateLastExecutionTime() so that subclasses can update it
- Add a constructor parameter that tells not to add an unncessary wakeup
task in execute() if addTask() wakes up the executor thread
automatically. Previously, execute() always called wakeup() after
addTask(), which often caused an extra dummy task in the task queue.
- Use shutdownGracefully() wherever possible / Deprecation javadoc
- Reduce the running time of SingleThreadEventLoopTest from 40s to 15s
using custom graceful shutdown parameters
- Other changes made along with this commit:
- takeTask() does not throw InterruptedException anymore.
- Returns null on interruption or wakeup
- Make sure runShutdownTasks() return true even if an exception was
raised while running the shutdown tasks
- Remove unnecessary isShutdown() checks
- Consistent use of SingleThreadEventExecutor.nanoTime()
Replace isWakeupOverridden with a constructor parameter
- Count the number of select() calls made to wait until reaching at the expected dead line, and rebuild selectors if too many select() calls were made.