Related issue: #2250
Motivation:
Prior to this commit, Netty's non blocking EventLoops
were each assigned a fixed thread by which all of the
EventLoop's I/O and handler logic would be performed.
While this is a fine approach for most users of Netty,
some advanced users require more flexibility in
scheduling the EventLoops.
Modifications:
Remove all direct usages of threads in MultithreadEventExecutorGroup,
SingleThreadEventExecutor et al., and introduce an Executor
abstraction instead.
The way to think about this change is, that each
iteration of an eventloop is now a task that gets scheduled
in a ForkJoinPool.
While the ForkJoinPool is the default, one also has the
ability to plug in his/her own Executor (aka thread pool)
into a EventLoop(Group).
Result:
Netty hands off thread management to a ForkJoinPool by default.
Users can also provide their own thread pool implementation and
get some control over scheduling Netty's EventLoops
Motivation:
The calculation of the max wait time for HashedWheelTimerTest.testExecutionOnTime() was wrong and so the test sometimes failed.
Modifications:
Fix the max wait time.
Result:
No more test-failures
- Rewrite with linear probing, no state array, compaction at cleanup
- Optimize keys() and values() to not use reflection
- Optimize hashCode() and equals() for efficient iteration
- Fixed equals() to not return true for equals(null)
- Optimize iterator to not allocate new Entry at each next()
- Added toString()
- Added some new unit tests
Motivations:
In our new version of HWT we used some kind of lazy cancelation of timeouts by put them back in the queue and let them pick up on the next tick. This multiple problems:
- we may corrupt the MpscLinkedQueue if the task is used as tombstone
- this sometimes lead to an uncessary delay especially when someone did executed some "heavy" logic in the TimeTask
Modifications:
Use a Lock per HashedWheelBucket for save and fast removal.
Modifications:
Cancellation of tasks can be done fast and so stuff can be GC'ed and no more infinite-loop possible
Motivation:
Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used.
Modification:
- Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance.
- Recycling of the same object multiple times without acquire it will fail.
- Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler
These changes are based on @belliottsmith 's work that was part of #2504.
Result:
Huge increase in performance.
4.0 branch without this commit:
Benchmark (size) Mode Samples Score Score error Units
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
4.0 branch with this commit:
Benchmark (size) Mode Samples Score Score error Units
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
Motivation:
When Netty runs in a managed environment such as web application server,
Netty needs to provide an explicit way to remove the thread-local
variables it created to prevent class loader leaks.
FastThreadLocal uses different execution paths for storing a
thread-local variable depending on the type of the current thread.
It increases the complexity of thread-local removal.
Modifications:
- Moved FastThreadLocal and FastThreadLocalThread out of the internal
package so that a user can use it.
- FastThreadLocal now keeps track of all thread local variables it has
initialized, and calling FastThreadLocal.removeAll() will remove all
thread-local variables of the caller thread.
- Added FastThreadLocal.size() for diagnostics and tests
- Introduce InternalThreadLocalMap which is a mixture of hard-wired
thread local variable fields and extensible indexed variables
- FastThreadLocal now uses InternalThreadLocalMap to implement a
thread-local variable.
- Added ThreadDeathWatcher.unwatch() so that PooledByteBufAllocator
tells it to stop watching when its thread-local cache has been freed
by FastThreadLocal.removeAll().
- Added FastThreadLocalTest to ensure that removeAll() works
- Added microbenchmark for FastThreadLocal and JDK ThreadLocal
- Upgraded to JMH 0.9
Result:
- A user can remove all thread-local variables Netty created, as long as
he or she did not exit from the current thread. (Note that there's no
way to remove a thread-local variable from outside of the thread.)
- FastThreadLocal exposes more useful operations such as isSet() because
we always implement a thread local variable via InternalThreadLocalMap
instead of falling back to JDK ThreadLocal.
- FastThreadLocalBenchmark shows that this change improves the
performance of FastThreadLocal even more.
Motivation:
Maps with integer keys are used in several places (HTTP/2 code, for
example). To reduce the memory footprint of these structures, we need a
specialized map class that uses ints as keys.
Modifications:
Added IntObjectHashMap, which is uses open addressing and double hashing
for collision resolution.
Result:
A new int-based map class that can be shared across Netty.
Motivation:
When running Netty on a container environment, the container will often
complain about the lingering threads such as the worker threads of
ThreadDeathWatcher and GlobalEventExecutor. We should provide an
operation that allows a use to wait until such threads are terminated.
Modifications:
- Add awaitInactivity()
- (misc) Fix typo in GlobalEventExecutorTest
- (misc) Port ThreadDeathWatch's CAS-based thread life cycle management
to GlobalEventExecutor
Result:
- Fixes#2084
- Less overhead on task submission of GlobalEventExecutor
Motivation:
PooledByteBufAllocator's thread local cache and
ReferenceCountUtil.releaseLater() are in need of a way to run an
arbitrary logic when a certain thread is terminated.
Modifications:
- Add ThreadDeathWatcher, which spawns a low-priority daemon thread
that watches a list of threads periodically (every second) and
invokes the specified tasks when the associated threads are not alive
anymore
- Start-stop logic based on CAS operation proposed by @tea-dragon
- Add debug-level log messages to see if ThreadDeathWatcher works
Result:
- Fixes#2519 because we don't use GlobalEventExecutor anymore
- Cleaner code
Motivation:
The current DefaultAttributeMap cause an infinite-loop when the user removes an attribute and create the same attribute again. This regression was introduced by c3bd7a8ff1.
Modification:
Correctly break out loop
Result:
No infinite-loop anymore.
.. which occurs when a user adds a listener from different threads after the promise is done and the notifications for the listeners, that were added before the promise is done, is in progress. For instance:
Thread-1: p.addListener(listenerA);
Thread-1: p.setSuccess(null);
Thread-2: p.addListener(listenerB);
Thread-2: p.executor.execute(taskNotifyListenerB);
Thread-1: p.executor.execute(taskNotifyListenerA);
taskNotifyListenerB should not really notify listenerB until taskNotifyListenerA is finished.
To fix this issue:
- Change the semantic of (listeners == null) to determine if the early
listeners [1] were notified
- If a late listener is added before the early listeners are notified,
the notification of the late listener is deferred until the early
listeners are notified (i.e. until listeners == null)
- The late listeners with deferred notifications are stored in a lazily
instantiated queue to preserve ordering, and then are notified once
the early listeners are notified.
[1] the listeners that were added before the promise is done
[2] the listeners that were added after the promise is done
- Proposed fix for #1824
UniqueName and its subtypes do not allow getting the previously registered instance. For example, let's assume that a user is running his/her application in an OSGi container with Netty bundles and his server bundle. Whenever the server bundle is reloaded, the server will try to create a new AttributeKey instance with the same name. However, Netty bundles were not reloaded at all, so AttributeKey will complain that the name is taken already (by the previously loaded bundle.)
To fix this problem:
- Replaced UniqueName with Constant, AbstractConstant, and ConstantPool. Better name and better design.
- Sctp/Udt/RxtxChannelOption is not a ChannelOption anymore. They are just constant providers and ChannelOption is final now. It's because caching anything that's from outside of netty-transport will lead to ClassCastException on reload, because ChannelOption's constant pool will keep all option objects for reuse.
- Signal implements Constant because we can't ensure its uniqueness anymore by relying on the exception raised by UniqueName's constructor.