Motivation:
Minor performance optimisation that prevents thread from blocking due to task not having been added to queue. Discussed #7815.
Modification:
add task to the queue before starting the thread.
Result:
No additional tests.
Motivation:
DefaultPromise's internal state depends upon specific Signal objects. These Signal objects can be used externally which causes the DefaultPromise object API to not function correct and state to become corrupted.
Modifications:
- DefaultPromise shouldn't depend upon Signal for its internal state
Result:
Fixes https://github.com/netty/netty/issues/7707
Motivation:
Currently if user call set/remove/set/remove many times, it will create multiple cleaner task for same index. It may cause OOM due to long live thread will have more and more task in LIVE_SET.
Modification:
Add flag to avoid recreating tasks.
Result:
Only create 1 clean task. But use more space of indexedVariables.
Motivation:
The methods implement io.netty.util.concurrent.Future#cancel(boolean mayInterruptIfRunning) which actually ignored the param mayInterruptIfRunning.We need to add comments for the `mayInterruptIfRunning` param.
Modifications:
Add comments for the `mayInterruptIfRunning` param.
Result:
People who call the `cancel` method will be more clear about the effect of `mayInterruptIfRunning` param.
Motivation:
In a few classes, Netty starts a thread and then sets the context classloader of these threads
to prevent classloader leaks. The Thread#setContextClassLoader method is a privileged method in
that it requires permissions to be executed when there is a security manager in place. Unless
these calls are wrapped in a doPrivileged block, they will fail in an environment with a security
manager and restrictive policy in place.
Modifications:
Wrap the calls to Thread#setContextClassLoader in a AccessController#doPrivileged block.
Result:
After this change, the threads can set the context classloader without any errors in an
environment with a security manager and restrictive policy in place.
Motivation:
FastThreadLocal#set calls isIndexedVariableSet to determine if we need to register with the cleaner, but the set(InternalThreadLocalMap, V) method will also internally do this check so we can share code and only do the check a single time.
Modifications:
- extract code from set(InternalThreadLocalMap, V) so it can be called externally to determine if a new item was created
Result:
Less code duplication in FastThreadLocal#set.
Motivation:
e329ca1 introduced the user of ObjectCleaner in FastThreadLocal but we missed the case to register our cleaner task if FastThreadLocal.set was called only.
Modifications:
- Use ObjectCleaner also when FastThreadLocal.set is used.
- Add test case.
Result:
ObjectCleaner is always used.
Motivation:
There is no guarantee that FastThreadLocal.onRemoval(...) is called if the FastThreadLocal is used by "non" FastThreacLocalThreads. This can lead to all sort of problems, like for example memory leaks as direct memory is not correctly cleaned up etc.
Beside this we use ThreadDeathWatcher to check if we need to release buffers back to the pool when thread local caches are collected. In the past ThreadDeathWatcher was used which will need to "wakeup" every second to check if the registered Threads are still alive. If we can ensure FastThreadLocal.onRemoval(...) is called we do not need this anymore.
Modifications:
- Introduce ObjectCleaner and use it to ensure FastThreadLocal.onRemoval(...) is always called when a Thread is collected.
- Deprecate ThreadDeathWatcher
- Add unit tests.
Result:
Consistent way of cleanup FastThreadLocals when a Thread is collected.
Motivation:
When doStartThread throws an exception, e.g. due to the actual executor being depleted of threads and throwing in its rejected execution handler, the STEE ends up in started state anyway. If we try to execute another task in this executor, it will be queued but the thread won't be started anymore and the task will linger forever.
Modifications:
- Ensure we not update the internal state if the startThread() method throws.
- Add testcase
Result:
Fixes [#7483]
Motivation:
ThreadDeathWatcher and GlobalEventExecutor may create and start a new thread from various other threads and so inherit the classloader. We need to ensure we not inherit to allow recycling the classloader.
Modifications:
Use Thread.setContextClassLoader(null) to ensure we not hold a strong reference to the classloader and so not leak it.
Result:
Fixes [#7290].
Motivation:
We dont need to use the ThreadDeathWatcher if we use a FastThreadLocalThread for which we wrap the Runnable and ensure we call FastThreadLocal.removeAll() once the Runnable completes.
Modifications:
- Dont use a ThreadDeathWatcher if we are sure we will call FastThreadLocal.removeAll()
- Add unit test.
Result:
Less overhead / running theads if you only allocate / deallocate from FastThreadLocalThreads.
Motivation:
`AbstractScheduledEventExecutor` uses a standard `java.util.PriorityQueue` to keep track of task deadlines. `ScheduledFuture.cancel` removes tasks from this `PriorityQueue`. Unfortunately, `PriorityQueue.remove` has `O(n)` performance since it must search for the item in the entire queue before removing it. This is fast when the future is at the front of the queue (e.g., already triggered) but not when it's randomly located in the queue.
Many servers will use `ScheduledFuture.cancel` on all requests, e.g., to manage a request timeout. As these cancellations will be happen in arbitrary order, when there are many scheduled futures, `PriorityQueue.remove` is a bottleneck and greatly hurts performance with many concurrent requests (>10K).
Modification:
Use netty's `DefaultPriorityQueue` for scheduling futures instead of the JDK. `DefaultPriorityQueue` is almost identical to the JDK version except it is able to remove futures without searching for them in the queue. This means `DefaultPriorityQueue.remove` has `O(log n)` performance.
Result:
Before - cancelling futures has varying performance, capped at `O(n)`
After - cancelling futures has stable performance, capped at `O(log n)`
Benchmark results
After - cancelling in order and in reverse order have similar performance within `O(log n)` bounds
```
Benchmark (num) Mode Cnt Score Error Units
ScheduledFutureTaskBenchmark.cancelInOrder 100 thrpt 20 137779.616 ± 7709.751 ops/s
ScheduledFutureTaskBenchmark.cancelInOrder 1000 thrpt 20 11049.448 ± 385.832 ops/s
ScheduledFutureTaskBenchmark.cancelInOrder 10000 thrpt 20 943.294 ± 12.391 ops/s
ScheduledFutureTaskBenchmark.cancelInOrder 100000 thrpt 20 64.210 ± 1.824 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 100 thrpt 20 167531.096 ± 9187.865 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 1000 thrpt 20 33019.786 ± 4737.770 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 10000 thrpt 20 2976.955 ± 248.555 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 100000 thrpt 20 362.654 ± 45.716 ops/s
```
Before - cancelling in order and in reverse order have significantly different performance at higher queue size, orders of magnitude worse than the new implementation.
```
Benchmark (num) Mode Cnt Score Error Units
ScheduledFutureTaskBenchmark.cancelInOrder 100 thrpt 20 139968.586 ± 12951.333 ops/s
ScheduledFutureTaskBenchmark.cancelInOrder 1000 thrpt 20 12274.420 ± 337.800 ops/s
ScheduledFutureTaskBenchmark.cancelInOrder 10000 thrpt 20 958.168 ± 15.350 ops/s
ScheduledFutureTaskBenchmark.cancelInOrder 100000 thrpt 20 53.381 ± 13.981 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 100 thrpt 20 123918.829 ± 3642.517 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 1000 thrpt 20 5099.810 ± 206.992 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 10000 thrpt 20 72.335 ± 0.443 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 100000 thrpt 20 0.743 ± 0.003 ops/s
```
Motivation:
AtomicIntegerFieldUpdater#get is unnecessary, I think use simple volatile read is cleaner
Modifications:
Replace code STATE_UPDATER.get(this) to state in SingleThreadEventExecutor
Result:
Cleaner code
Motivation:
As the javadoc of ScheduledExecutorService state:
Zero and negative delays (but not periods) are also allowed in schedule methods,and are treated as requests for immediate execution.
Modifications:
- Correctly handle delay <= 0.
- Add unit tests.
Result:
Fixes [#6627].
Motivation:
When UnorderedThreadPoolEventExecutor.execute / submit etc is called it will consume up to 100 % CPU even after the task was executed.
Modifications:
Add a special wrapper which we will be used in execute(...) to wrap the submitted Runnable. This is needed as ScheduledThreadPoolExecutor.execute(...) will delegate to submit(...) which will then use decorateTask(...). The problem with this is that decorateTask(...) needs to ensure we only do our own decoration if we not call from execute(...) as otherwise we may end up creating an endless loop because DefaultPromise will call EventExecutor.execute(...) when notify the listeners of the promise.
Result:
Fixes [#6507].
Motivation:
Calling a static method is faster then dynamic
Modifications:
Add 'static' keyword for methods where it missed
Result:
A bit faster method calls
Motivation:
A testing goof in 7c630fe introduced a binary incompatibility when the old Promise-specific `add` and `addAll` methods in PromiseCombiner were generalized to accept `Futures`.
Modification:
- Restore (but mark as `@Deprecated`) old PromiseCombiner methods.
- Fixed a couple minor documentation typos because sure why not.
Result:
`PromiseCombiner` is binary-compatible with previous versions of Netty.
Motivation:
`PromiseCombiner` is really handy, but it's not obvious how to use it from its existing documentation/method signatures.
Modification:
- Added javadoc comments to explain the theory of operation of `PromiseCombiner`.
- Generalized `PromiseCombiner` to work with `Futures` so it's clearer that the things for which it's listening won't be modified.
Result:
`PromiseCombiner` is easier to understand.
Motivation:
In later Java8 versions our Atomic*FieldUpdater are slower then the JDK implementations so we should not use ours anymore. Even worse the JDK implementations provide for example an optimized version of addAndGet(...) using intrinsics which makes it a lot faster for this use-case.
Modifications:
- Remove methods that return our own Atomic*FieldUpdaters.
- Use the JDK implementations everywhere.
Result:
Faster code.
Motivation:
the build doesnt seem to enforce this, so they piled up
Modifications:
removed unused import lines
Result:
less unused imports
Signed-off-by: radai-rosenblatt <radai.rosenblatt@gmail.com>
Motivation:
To make it easier to debug why notification of a promise failed we should log extra info and make it consistent.
Modifications:
- Create a new PromiseNotificationUtil that has static methods that can be used to try notify a promise and log.
- Reuse this in AbstractChannelHandlerContext, ChannelOutboundBuffer and PromiseNotifier
Result:
Easier to debug why a promise could not be notified.
Motivation:
If the user uses 0 as quiet period we should shutdown without any delay if possible.
Modifications:
Ensure we not introduce extra delay when a shutdown quit period of 0 is used.
Result:
EventLoop shutdown as fast as expected.
Motivation:
We offer DefaultEventExecutorGroup as an EventExecutorGroup which return OrderedEventExecutor and so provide strict ordering of event execution. One limitations of this implementation is that each contained DefaultEventExecutor will always be tied to a single thread, which can lead to a very unbalanced execution as one thread may be super busy while others are idling.
Modifications:
- Add NonStickyEventExecutorGroup which can be used to wrap another EventExecutorGroup (like UnorderedThreadPoolEventExecutor) and expose ordering while not be sticky with the thread that is used for a given EventExecutor. This basically means that Threads may change between execution of tasks for an EventExecutor but ordering is still guaranteed.
Result:
Better utalization of threads in some use-cases.
Motivation:
When we try to close the Channel due a timeout we need to ensure we not log if the notification of the promise fails as it may be completed in the meantime.
Modifications:
Add another constructor to ChannelPromiseNotifier and PromiseNotifier which allows to log on notification failure.
Result:
No more miss-leading logs.
Motivation:
A recent change to DefaultThreadFactory modified it so that it is sticky
with respect to thread groups. In particular, this change made it so
that DefaultThreadFactory would hold on to the thread group that created
it, and then use that thread group to create threads.
This can have problematic semantics since it can lead to violations of a
tenet of thread groups that a thread can only modify threads in its own
thread group and descendant thread groups. With a sticky thread group, a
thread triggering the creation of a new thread via
DefaultThreadFactory#newThread will be modifying a thread from the
sticky thread group which will not necessarily be its own nor a
descendant thread group. When a security manager is in place that
enforces this requirement, these modifications are now impossible. This
is especially problematic in the context of Netty because certain global
singletons like GlobalEventExecutor will create a
DefaultThreadFactory. If all DefaultThreadFactory instances are sticky
about their thread groups, it means that submitting tasks to the
GlobalEventExecutor singleton can cause a thread to be created from the
DefaultThreadFactory sticky thread group, exactly the problem with
DefaultThreadFactory being sticky about thread groups. A similar problem
arises from the ThreadDeathWatcher.
Modifications:
This commit modifies DefaultThreadFactory so that a null thread group
can be set with the behavior that all threads created by such an
instance will inherit the default thread group (the thread group
provided by the security manager if there is one, otherwise the thread
group of the creating thread). The construction of the instances of
DefaultThreadFactory used by the GlobalEventExecutor singleton and
ThreadDeathWatcher are modified to use this behavior. Additionally, we
also modify the chained constructor invocations of the
DefaultThreadFactory that do not have a parameter to specify a thread
group to use the thread group from the security manager is available,
otherwise the creating thread's thread group. We also add unit tests
ensuring that all of this behavior is maintained.
Result:
It will be possible to have DefaultThreadFactory instances that are not
sticky about the thread group that led to their creation. Instead,
threads created by such a DefaultThreadFactory will inherit the default
thread group which will either be the thread group from the security
manager or the current thread's thread group.
Motivation:
This change is part of the change done in PR #5395 to provide an `AUTO_FLUSH` capability.
Splitting this change will enable to try other ways of implementing `AUTO_FLUSH`.
Modifications:


Two methods:
```java
void executeAfterEventLoopIteration(Runnable task);


boolean removeAfterEventLoopIterationTask(Runnable task);
```
are added to `SingleThreadEventLoop` class for adding/removing a task to be executed at the end of current/next iteration of this `eventloop`.
In order to support the above, a few methods are added to `SingleThreadEventExecutor`
```java
protected void afterRunningAllTasks() { }
```
This is invoked after all tasks are run for this executor OR if the passed timeout value for `runAllTasks(long timeoutNanos)` is expired.
Added a queue of `tailTasks` to `SingleThreadEventLoop` to hold all tasks to be executed at the end of every iteration.


Result:


`SingleThreadEventLoop` now has the ability to execute tasks at the end of an eventloop iteration.
Motivation:
Today when awaiting uninterruptibly on a default promise, a race
condition can lead to a missed signal. Quite simply, the check for
whether the condition holds is not made inside a lock before
waiting. This means that the waiting thread can enter the wait after the
promise has completed and will thus not be notified, thus missing the
signal. This leads to the waiting thread to enter a timed wait that will
only trip with the timeout elapses leading to unnecessarily long waits
(imagine a connection timeout, and the waiting thread missed the signal
that the connection is ready).
Modification:
This commit fixes this missed signal by checking the condition inside a
lock. We also add a test that reliably fails without the non-racy
condition check.
Result:
Timed uninterruptible waits on default promise will not race against the
condition and possibly wait longer than necessary.
Motivation:
ExecutorService.invoke*(...) methods may block by API definition. This can lead to deadlocks if called from inside the EventLoop in SingleThreadEventExecutor as it only has one Thread that does all the work.
Modifications:
Throw a RejectedExectionException if someone tries to call SingleThreadEventExecutor.invoke*(...) while in the EventLoop.
Result:
No more deadlock possible.
Motivation:
DefaultPromise has a listeners member variable which is volatile to allow for an optimization which makes notification of listeners less expensive when there are no listeners to notify. However this change makes all other operations involving the listeners member variable more costly. This optimization which requires listeners to be volatile can be removed to avoid volatile writes/reads for every access on the listeners member variable.
Modifications:
- DefaultPromise listeners is made non-volatile and the null check optimization is removed
Result:
DefaultPromise.listeners is no longer volatile.
Motivation:
In commit f984870ccca133d6056e8b0df0b2352f8f90b0fe I made a change which operated under invalide assumption that tasks executed by an EventExecutor will always be processed in a serial fashion. This is true for SingleThreadEventExecutor sub-classes but not part of the EventExecutor interface contract.
Because of this change implementations of EventExecutor which not strictly execute tasks in a serial fashion may miss events before handlerAdded(...) is called. This is strictly speaking not correct as there is not guarantee in this case that handlerAdded(...) will be called as first task (as there is no ordering guarentee).
Cassandra itself ships such an EventExecutor implementation which has no strict ordering to spread load across multiple threads.
Modifications:
- Add new OrderedEventExecutor interface and let SingleThreadEventExecutor / EventLoop implement / extend it.
- Only expose "restriction" of skipping events until handlerAdded(...) is called for OrderedEventExecutor implementations
- Add ThreadPoolEventExecutor implementation which executes tasks in an unordered fashion. This is used in added unit test but can also be used for protocols which not expose an strict ordering.
- Add unit test.
Result:
Resurrect the possibility to implement an EventExecutor which does not enforce serial execution of events and be able to use it with the DefaultChannelPipeline.
Motivation:
DefaultPromise has a listeners member variable which is volatile to allow for an optimization which makes notification of listeners less expensive when there are no listeners to notify. However this change makes all other operations involving the listeners member variable more costly. This optimization which requires listeners to be volatile can be removed to avoid volatile writes/reads for every access on the listeners member variable.
Modifications:
- DefaultPromise listeners is made non-volatile and the null check optimization is removed
Result:
DefaultPromise.listeners is no longer volatile.
Motivation:
The logging statements in i.n.u.c.DefaultPromise do not emit the
caught Throwable when a Throwable is thrown while a listener is being
notified of completed or progressed operations.
Modifications:
This issue arises because the logging message has a single placeholder
but is passing two additional arguments, the second one being the
caught Throwable that is thus quietly not logged. We address this by
modifying the logging statements to ensure the caught Throwable is
logged. In this case, the preferred approach is to use the logger
override that accepts a message and a Throwable parameter since logger
implementations might have special handling for this case.
Result:
Log messages from i.n.u.c.DefaultPromise when a Throwable is thrown
while notifying a listener of completed or progressed operations will
contain the caught Throwable.
Motivation:
A race detector found that DefaultPromise.listeners is improperly synchronized [1].
Worst case a listener will not be executed when the promise is completed.
Modifications:
Make DefaultPromise.listeners a volatile.
Result:
Hopefully, DefaultPromise is more correct under concurrent execution.
[1] https://github.com/grpc/grpc-java/issues/2015
Motivation:
Currently in the single threaded and global event executors when the scheduled task queue is drained, there is a call to hasScheduledTasks(). If there are scheduled tasks then the the code polls the queue for tasks. The poll method duplicates the exact logic of hasScheduledTasks(). This involves two calls to nanoTime when one seems sufficient.
Modifications:
Directly poll the queue for tasks and break if the task returned is null.
Result:
Should be no noticeable impact on functionality. Two calls to nanoTime have been coarsened into a single call.
Motivation:
Some Netty use cases may want to configure the max allowed stack depth for promise listener notification.
Modifications:
- Add a system property so that this value can be configured.
Result:
DefaultPromise's max stack depth is configurable.
Motiviation:
Sometimes it is useful to allow to specify a custom strategy to handle rejected tasks. For example if someone tries to add tasks from outside the eventloop it may make sense to try to backoff and retries and so give the executor time to recover.
Modification:
Add RejectedEventExecutor interface and implementations and allow to inject it.
Result:
More flexible handling of executor overload.
Motivation:
To restrict the memory usage of a system it is sometimes needed to adjust the number of max pending tasks in the tasks queue.
Modifications:
- Add new constructors to modify the number of allowed pending tasks.
- Add system properties to configure the default values.
Result:
More flexible configuration.
Motivation:
We use pre-instantiated exceptions in various places for performance reasons. These exceptions don't include a stacktrace which makes it hard to know where the exception was thrown. This is especially true as we use the same exception type (for example ChannelClosedException) in different places. Setting some StackTraceElements will provide more context as to where these exceptions original and make debugging easier.
Modifications:
Set a generated StackTraceElement on these pre-instantiated exceptions which at least contains the origin class and method name. The filename and linenumber are specified as unkown (as stated in the javadocs of StackTraceElement).
Result:
Easier to find the origin of a pre-instantiated exception.
Motivation:
DefaultThreadFactory allows to override the newThread(...) method and so should have access to all fields that are set via the constructor.
Modifications:
Change threadGroup from private to protected visibility.
Result:
Easier to extend DefaultThreadFactory.
Motivation:
JCTools supports both non-unsafe, unsafe versions of queues and JDK6 which allows us to shade the library in netty-common allowing it to stay "zero dependency".
Modifications:
- Remove copy paste JCTools code and shade the library (dependencies that are shaded should be removed from the <dependencies> section of the generated POM).
- Remove usage of OneTimeTask and remove it all together.
Result:
Less code to maintain and easier to update JCTools and less GC pressure as the queue implementation nt creates so much garbage
Motivation:
The javaDocs for Future.removeListener do not clarify that only the first occurrence of the listener is guaranteed to be removed.
Modifications:
- Clarify the javaDocs for Future.removeListener[s] so it is known that the only the first occurrence of the listener will be removed.
Result:
Fixes https://github.com/netty/netty/issues/5351
Motivation:
Sometimes it may be benefitially for an user to specify a custom algorithm when choose the next EventExecutor/EventLoop.
Modifications:
Allow to specify a custom EventExecutorChooseFactory that allows to customize algorithm.
Result:
More flexible api.
Modifications:
DefaultPromise provides a ThreadLocal queue to protect against StackOverflowError because of executors which may immediately execute runnables instead of queue them (i.e. ImmediateEventExecutor). However this may be better addressed by fixing these executors to protect against StackOverflowError instead of just fixing for a single use case. Also the most commonly used executors already provide the desired behavior and don't need the additional overhead of a ThreadLocal queue in DefaultPromise.
Modifications:
- Remove ThreadLocal queue from DefaultPromise
- Change ImmediateEventExecutor so it maintains a queue of runnables if reentrant condition occurs
Result:
DefaultPromise StackOverflowError code is simpler, and ImmediateEventExecutor protects against StackOverflowError.