Motivation:
We did various changes related to the ChannelOutboundBuffer in 4.0 branch. This commit port all of them over and so make sure our branches are synced in terms of these changes.
Related to [#2734], [#2709], [#2729], [#2710] and [#2693] .
Modification:
Port all changes that was done on the ChannelOutboundBuffer.
This includes the port of the following commits:
- 73dfd7c01b
- 997d8c32d2
- e282e504f1
- 5e5d1a58fd
- 8ee3575e72
- d6f0d12a86
- 16e50765d1
- 3f3e66c31a
Result:
- Less memory usage by ChannelOutboundBuffer
- Same code as in 4.0 branch
- Make it possible to use ChannelOutboundBuffer with Channel implementation that not extends AbstractChannel
Motivation:
The PID_MAX_LIMIT on 64bit linux systems is 4194304 and on osx it is 99998. At the moment we use 65535 as an upper-limit which is too small.
Modifications:
Use 4194304 as max possible value
Result:
No more false-positives when try to detect current pid.
Motivation:
We have some inconsistency when handling writes. Sometimes we call ChannelOutboundBuffer.progress(...) also for complete writes and sometimes not. We should call it always.
Modifications:
Correctly call ChannelOuboundBuffer.progress(...) for complete and incomplete writes.
Result:
Consistent behavior
Motivation:
While benchmarking the native transport with gathering writes I noticed that it is quite slow. This is due the fact that we need to do a lot of array copies to get the buffers into the iov array.
Modification:
Introduce a new class calles IovArray which allows to fill buffers directly in a iov array that can be passed over to JNI without any array copies. This gives a nice optimization in terms of speed when doing gathering writes.
Result:
Big performance improvement when doing gathering writes. See the included benchmark...
Before:
[nmaurer@xxx]~% wrk/wrk -H 'Host: localhost' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Connection: keep-alive' -d 120 -c 256 -t 16 --pipeline 256 http://xxx:8080/plaintext
Running 2m test @ http://xxx:8080/plaintext
16 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 23.44ms 16.37ms 259.57ms 91.77%
Req/Sec 181.99k 31.69k 304.60k 78.12%
346544071 requests in 2.00m, 46.48GB read
Requests/sec: 2887885.09
Transfer/sec: 396.59MB
With this change:
[nmaurer@xxx]~% wrk/wrk -H 'Host: localhost' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Connection: keep-alive' -d 120 -c 256 -t 16 --pipeline 256 http://xxx:8080/plaintext
Running 2m test @ http://xxx:8080/plaintext
16 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 21.93ms 16.33ms 305.73ms 92.34%
Req/Sec 194.56k 33.75k 309.33k 77.04%
369617503 requests in 2.00m, 49.57GB read
Requests/sec: 3080169.65
Transfer/sec: 423.00MB
Motivation:
ChannelOutboundBuffer is basically a circular array queue of its entry
objects. Once an entry is created in the array, it is never nulled out
to reduce the allocation cost.
However, because it is a circular queue, the array almost always ends up
with as many entry instances as the size of the array, regardless of the
number of pending writes.
At worst case, a channel might have only 1 pending writes at maximum
while creating 32 entry objects, where 32 is the initial capacity of the
array.
Modifications:
- Reduce the initial capacity of the circular array queue to 4.
- Make the initial capacity of the circular array queue configurable
Result:
We spend 4 times less memory for entry objects under certain
circumstances.
Motivation:
At the moment NioSocketChannelOutboundBuffer.nioBuffers() / EpollSocketChannelOutboundBuffer.memoryAddresses() returns null if something is contained in the ChannelOutboundBuffer which is not a ByteBuf. This is a problem for two reasons:
1 - In the javadocs we state that it will never return null
2 - We may do a not optimal write as there may be things that could be written via gathering writes
Modifications:
Change NioSocketChannelOutboundBuffer.nioBuffers() / EpollSocketChannelOutboundBuffer.memoryAddresses() to never return null but have it contain all ByteBuffer that were found before the non ByteBuf. This way we can do a gathering write and also conform to the javadocs.
Result:
Better speed and also correct implementation in terms of the api.
Motivation:
Now Netty has a few problems with null values.
Modifications:
- Check HAProxyProxiedProtocol in HAProxyMessage constructor and throw NPE if it is null.
If HAProxyProxiedProtocol is null we will set AddressFamily as null. So we will get NPE inside checkAddress(String, AddressFamily) and it won't be easy to understand why addrFamily is null.
- Check File in DiskFileUpload.toString().
If File is null we will get NPE when calling toString() method.
- Check Result<String> in MqttDecoder.decodeConnectionPayload(...).
If !mqttConnectVariableHeader.isWillFlag() || !mqttConnectVariableHeader.hasUserName() || !mqttConnectVariableHeader.hasPassword() we will get NPE when we will try to create new instance of MqttConnectPayload.
- Check Unsafe before calling unsafe.getClass() in PlatformDependent0 static block.
- Removed unnecessary null check in WebSocket08FrameEncoder.encode(...).
Because msg.content() can not return null.
- Removed unnecessary null check in DefaultStompFrame(StompCommand) constructor.
Because we have this check in the super class.
- Removed unnecessary null checks in ConcurrentHashMapV8.removeTreeNode(TreeNode<K,V>).
- Removed unnecessary null check in OioDatagramChannel.doReadMessages(List<Object>).
Because tmpPacket.getSocketAddress() always returns new SocketAddress instance.
- Removed unnecessary null check in OioServerSocketChannel.doReadMessages(List<Object>).
Because socket.accept() always returns new Socket instance.
- Pass Unpooled.buffer(0) instead of null inside CloseWebSocketFrame(boolean, int) constructor.
If we will pass null we will get NPE in super class constructor.
- Added throw new IllegalStateException in GlobalEventExecutor.awaitInactivity(long, TimeUnit) if it will be called before GlobalEventExecutor.execute(Runnable).
Because now we will get NPE. IllegalStateException will be better in this case.
- Fixed null check in OpenSslServerContext.setTicketKeys(byte[]).
Now we throw new NPE if byte[] is not null.
Result:
Added new null checks when it is necessary, removed unnecessary null checks and fixed some NPE problems.
Motivation:
Fix some typos in Netty.
Modifications:
- Fix potentially dangerous use of non-short-circuit logic in Recycler.transfer(Stack<?>).
- Removed double 'the the' in javadoc of EmbeddedChannel.
- Write to log an exception message if we can not get SOMAXCONN in the NetUtil's static block.
Motivation:
As a DatagramChannel supports to write to multiple remote peers we must not close the Channel once a IOException accours as this error may be only valid for one remote peer.
Modification:
Continue writing on IOException.
Result:
DatagramChannel can be used even after an IOException accours during writing.
Motivation:
Because of a missing return statement we may produce a NPE when try to fullfill the connect ChannelPromise when it was fullfilled before.
Modification:
Add missing return statement.
Result:
No more NPE.
Motivation:
When system is in short of entrophy, the initialization of
ThreadLocalRandom can take at most 3 seconds. The initialization occurs
when ThreadLocalRandom.current() is invoked first time, which might be
much later than the moment when the application has started. If we
start the initialization of ThreadLocalRandom as early as possible, we
can reduce the perceived time taken for the retrieval.
Modification:
Begin the initialization of ThreadLocalRandom in InternalLoggerFactory,
potentially one of the firstly initialized class in a Netty application.
Make DefaultChannelId retrieve the current process ID before retrieving
the current machine ID, because retrieval of a machine ID is more likely
to use ThreadLocalRandom.current().
Use a dummy channel ID for EmbeddedChannel, which prevents many unit
tests from creating a ThreadLocalRandom instance.
Result:
We gain extra 100ms at minimum for initialSeedUniquifier generation. If
an application has its own initialization that takes long enough time
and generates good amount of entrophy, it is very likely that we will
gain a lot more.
Motivation:
When a bind fails AbstractBootstrap will use the GlobalEventExecutor to notify the ChannelPromise. We should use the EventLoop of the Channel if possible.
Modification:
Use EventLoop of the Channel if possible to use the correct Thread to notify and so guaranteer the right order of events.
Result:
Use the correct EventLoop for notification
Motivation:
There is no way for a ChannelHandler to check if the passed in ChannelPromise for a write(...) call is a VoidChannelPromise. This is a problem as some handlers need to add listeners to the ChannelPromise which is not possible in the case of a VoidChannelPromise.
Modification:
- Introduce ChannelFuture.isVoid() which will return true if it is not possible to add listeners or wait on the result.
- Add ChannelPromise.unvoid() which allows to create a ChannelFuture out of a void ChannelFuture which supports all the operations.
Result:
It's now easy to write ChannelHandler implementations which also works when a void ChannelPromise is used.
Motivation:
We use the nanoTime of the scheduledTasks to calculate the milli-seconds to wait for a select operation to select something. Once these elapsed we check if there was something selected or some task is ready for processing. Unfortunally we not take into account scheduled tasks here so the selection loop will continue if only scheduled tasks are ready for processing. This will delay the execution of these tasks.
Modification:
- Check if a scheduled task is ready after selecting
- also make a tiny change in NioEventLoop to not trigger a rebuild if nothing was selected because the timeout was reached a few times in a row.
Result:
Execute scheduled tasks on time.
Motivation:
When a select rebuild was triggered the reference to the SelectionKey is not updated in AbstractNioChannel. This will cause a CancelledKeyException later.
Modification:
Correctly update SelectionKey reference after rebuild
Result:
Fix exception
Motivation:
Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used.
Modification:
- Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance.
- Recycling of the same object multiple times without acquire it will fail.
- Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler
These changes are based on @belliottsmith 's work that was part of #2504.
Result:
Huge increase in performance.
4.0 branch without this commit:
Benchmark (size) Mode Samples Score Score error Units
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
4.0 branch with this commit:
Benchmark (size) Mode Samples Score Score error Units
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
Motivation:
LocalServerChannel.doClose() calls LocalChannelRegistry.unregister(localAddress); without check if localAddress is null and so produce a NPE when pass null the used ConcurrentHashMapV8
Modification:
Check for localAddress != null before try to remove it from Map. Also added a unit test which showed the stacktrace of the error.
Result:
No more NPE during doClose().
Motivation:
At the moment AbstractBoostrap.bind(...) will always use the GlobalEventExecutor to notify the returned ChannelFuture if the registration is not done yet. This should only be done if the registration fails later. If it completes successful we should just notify with the EventLoop of the Channel.
Modification:
Use EventLoop of the Channel if possible to use the correct Thread to notify and so guaranteer the right order of events.
Result:
Use the correct EventLoop for notification
Motivation:
When Netty runs in a managed environment such as web application server,
Netty needs to provide an explicit way to remove the thread-local
variables it created to prevent class loader leaks.
FastThreadLocal uses different execution paths for storing a
thread-local variable depending on the type of the current thread.
It increases the complexity of thread-local removal.
Modifications:
- Moved FastThreadLocal and FastThreadLocalThread out of the internal
package so that a user can use it.
- FastThreadLocal now keeps track of all thread local variables it has
initialized, and calling FastThreadLocal.removeAll() will remove all
thread-local variables of the caller thread.
- Added FastThreadLocal.size() for diagnostics and tests
- Introduce InternalThreadLocalMap which is a mixture of hard-wired
thread local variable fields and extensible indexed variables
- FastThreadLocal now uses InternalThreadLocalMap to implement a
thread-local variable.
- Added ThreadDeathWatcher.unwatch() so that PooledByteBufAllocator
tells it to stop watching when its thread-local cache has been freed
by FastThreadLocal.removeAll().
- Added FastThreadLocalTest to ensure that removeAll() works
- Added microbenchmark for FastThreadLocal and JDK ThreadLocal
- Upgraded to JMH 0.9
Result:
- A user can remove all thread-local variables Netty created, as long as
he or she did not exit from the current thread. (Note that there's no
way to remove a thread-local variable from outside of the thread.)
- FastThreadLocal exposes more useful operations such as isSet() because
we always implement a thread local variable via InternalThreadLocalMap
instead of falling back to JDK ThreadLocal.
- FastThreadLocalBenchmark shows that this change improves the
performance of FastThreadLocal even more.
Motivation:
The code in ChannelOutboundBuffer can be simplified by using AtomicLongFieldUpdater.addAndGet(...)
Modification:
Replace our manual looping with AtomicLongFieldUpdater.addAndGet(...)
Result:
Cleaner code
Motivation:
If ChannelOutboundBuffer.addFlush() is called multiple times and flushed != unflushed it will still loop through all entries that are not flushed yet even if it is not needed anymore as these were marked uncancellable before.
Modifications:
Check if new messages were added since addFlush() was called and only if this was the case loop through all entries and try to mark the uncancellable.
Result:
Less overhead when ChannelOuboundBuffer.addFlush() is called multiple times without new messages been added.
Motivation:
Provide a faster ThreadLocal implementation
Modification:
Add a "FastThreadLocal" which uses an EnumMap and a predefined fixed set of possible thread locals (all of the static instances created by netty) that is around 10-20% faster than standard ThreadLocal in my benchmarks (and can be seen having an effect in the direct PooledByteBufAllocator benchmark that uses the DEFAULT ByteBufAllocator which uses this FastThreadLocal, as opposed to normal instantiations that do not, and in the new RecyclableArrayList benchmark);
Result:
Improved performance
Motivation:
Each of DefaultChannelPipeline instance creates an head and tail that wraps a handler. These are used to chain together other DefaultChannelHandlerContext that are created once a new ChannelHandler is added. There are a few things here that can be improved in terms of memory usage and initialization time.
Modification:
- Only generate the name for the tail and head one time as it will never change anyway
- Rename DefaultChannelHandlerContext to AbstractChannelHandlerContext and make it abstract
- Create a new DefaultChannelHandlerContext that is used when a ChannelHandler is added to the DefaultChannelPipeline
- Rename TailHandler to TailContext and HeadHandler to HeadContext and let them extend AbstractChannelHandlerContext. This way we can save 2 object creations per DefaultChannelPipeline
Result:
- Less memory usage because we have 2 less objects per DefaultChannelPipeline
- Faster creation of DefaultChannelPipeline as we not need to generate the name for the head and tail
Motivation:
At the moment ChannelFlushPromiseNotifier.add(....) takes an int value for pendingDataSize, which may be too small as a user may need to use a long. This can for example be useful when a user writes a FileRegion etc. Beside this the notify* method names are kind of missleading as these should not contain *Future* because it is about ChannelPromises.
Modification:
Add a new add(...) method that takes a long for pendingDataSize and @deprecated the old method. Beside this also @deprecated all *Future* methods and add methods that have *Promise* in the method name to better reflect usage.
Result:
ChannelFlushPromiseNotifier can be used with bigger data.
Motivation:
The default StringBuilder size is too small (data.length + 4) while it will be 2*data.length (byte to Hex) + 5 "-" char (since 5 peaces appended).
Modification:
Changing initial size to the correct one
Result:
Allocation of the correct final size from the beginning for this StringBuilder.
Motivation:
On some ill-configured systems, InetAddress.getLocalHost() fails. NioSocketChannelTest calls java.net.Socket.connect() and it internally invoked InetAddress.getLocalHost(), which causes the test failures in NioSocketChannelTes on such an ill-configured system.
Modifications:
Use NetUtil.LOCALHOST explicitly.
Result:
NioSocketChannelTest should not fail anymore.
Motivation:
The old DefaultAttributeMap impl did more synchronization then needed and also did not expose a efficient way to check if an attribute exists with a specific key.
Modifications:
* Rewrite DefaultAttributeMap to not use IdentityHashMap and synchronization on the map directly. The new impl uses a combination of AtomicReferenceArray and synchronization per chain (linked-list). Also access the first Attribute per bucket can be done without any synchronization at all and just uses atomic operations. This should fit for most use-cases pretty weel.
* Add hasAttr(...) implementation
Result:
It's now possible to check for the existence of a attribute without create one. Synchronization is per linked-list and the first entry can even be added via atomic operation.
Motivation:
At the moment we sometimes use only RecvByteBufAllocator.guess() to guess the next size and the use the ByteBufAllocator.* directly to allocate the buffer. We should always use RecvByteBufAllocator.allocate(...) all the time as this makes the behavior easier to adjust.
Modifications:
Change the read() implementations to make use of RecvByteBufAllocator.
Result:
Behavior is more consistent.
Motivation:
DefaultChannelPipeline.firstContext() should return null when the ipeline is empty. This is not the case atm.
Modification:
Fix incorrect check in DefaultChannelPipeline.firstContext() and add unit tests.
Result:
Correctly return null when DefaultChannelPipeline.firstContext() is called on empty pipeline.
Motivation:
Because Thread.currentThread().interrupt() will unblock Selector.select() we need to take special care when check if we need to rebuild the Selector. If the unblock was caused by the interrupt() we will clear it and move on as this is most likely a bug in a custom ChannelHandler or a library the user makes use of.
Modification:
Clear the interrupt state of the Thread if the Selector was unblock because of an interrupt and the returned keys was 0.
Result:
No more busy loop caused by Thread.currentThread().interrupt()
Motivation:
At the moment whenever we add/remove a ChannelHandler with an EventExecutorGroup we have two synchronization points in the execution path. One to find the childInvoker and one for add/remove itself. We can eliminate the former by call findIInvoker in the synchronization block, as we need to synchronize anyway.
Modification:
Remove the usage of AtomicFieldUpdater and the extra synchronization in findInvoker by moving the call of the method in the synchronized(this) block.
Result:
Less synchronization points and volatile reads/writes
Motivation:
As discussed in #2250, it will become much less complicated to implement
deregistration and reregistration of a channel once #2250 is resolved.
Therefore, there's no need to deprecate deregister() and
channelUnregistered().
Modification:
- Undeprecate deregister() and channelUnregistered()
- Remove SuppressWarnings annotations where applicable
Result:
We (including @jakobbuchgraber) are now ready to play with #2250 at
master
Motivation:
4 and 5 were diverged long time ago and we recently reverted some of the
early commits in master. We must make sure 4.1 and master are not very
different now.
Modification:
Fix found differences
Result:
4.1 and master got closer.
Motivation:
4 and 5 were diverged long time ago and we recently reverted some of the
early commits in master. We must make sure 4.1 and master are not very
different now.
Modification:
Remove ChannelHandlerInvoker.writeAndFlush(...) and the related
implementations.
Result:
4.1 and master got closer.
Motivation:
4 and 5 were diverged long time ago and we recently reverted some of the
early commits in master. We must make sure 4.1 and master are not very
different now.
Modification:
Remove ChannelHandlerInvoker.writeAndFlush(...) and the related implementations.
Result:
4.1 and master got closer.
Motivation:
4 and 5 were diverged long time ago and we recently reverted some of the
early commits in master. We must make sure 4.1 and master are not very
different now.
Modification:
Small adjustments to match up branches
Result:
4.1 and master got closer.
Motivation:
4 and 5 were diverged long time ago and we recently reverted some of the
early commits in master. We must make sure 4.1 and master are not very
different now.
Modification:
Fix found differences
Result:
4.1 and master got closer.
Motivation:
I had the NioSocketChannelTest.testFlushCloseReentrance() fail sometimes on one of my linux installation. This change let it pass all the time.
Modification:
Set the SO_SNDBUF to a small value to force split writes
Result:
Test is passing all the time where it was sometimes fail before.
Motivation:
At the moment it is not possible to deregister a LocalChannel from its EventLoop and register it to another one as the LocalChannel is closed during the deregister.
Modification:
Not close the LocalChannel during dergister
Result:
It is now possible to deregister a LocalChannel and register it to another EventLoop
Motivation:
Once a user implement a custom ChannelHandlerInvoker it is needed to validate the ChannelPromise. We should expose a utility method for this.
Modifications:
Move validatePromise(...) from DefaultChannelHandlerInvoker to ChannelHandlerInvokerUtil and make it public.
Result:
User is able to reuse code
Motivation:
At the moment it is possible to see a NPE when the LocalSocketChannels doRegister() method is called and the LocalSocketChannels doClose() method is called before the registration was completed.
Modifications:
Make sure we delay the actual close until the registration task was executed.
Result:
No more NPE
Motivation:
At the moment ChanneConfig.setAutoRead(false) only is guaranteer to not have an extra channelRead(...) triggered when used from within the channelRead(...) or channelReadComplete(...) method. This is not the correct behaviour as it should also work from other methods that are triggered from within the EventLoop. For example a valid use case is to have it called from within a ChannelFutureListener, which currently not work as expected.
Beside this there is another bug which is kind of related. Currently Channel.read() will not work as expected for OIO as we will stop try to read even if nothing could be read there after one read operation on the socket (when the SO_TIMEOUT kicks in).
Modifications:
Implement the logic the right way for the NIO/OIO/SCTP and native transport, specific to the transport implementation. Also correctly handle Channel.read() for OIO transport by trigger a new read if SO_TIMEOUT was catched.
Result:
It is now also possible to use ChannelConfig.setAutoRead(false) from other methods that are called from within the EventLoop and have direct effect.
Conflicts:
transport-sctp/src/main/java/io/netty/channel/sctp/nio/NioSctpChannel.java
transport/src/main/java/io/netty/channel/socket/nio/NioDatagramChannel.java
transport/src/main/java/io/netty/channel/socket/nio/NioSocketChannel.java
Motivation:
At the moment we create a HashMap that holds the MembershipKeys for multicast with every NioDatagramChannel even when most people not need it at al
Modifications:
Lazy create the HashMap when needed.
Result:
Less memory usage and less object creation
Motivation:
When using System.getProperty(...) and various methods to get a ClassLoader it will fail when a SecurityManager is in place.
Modifications:
Use a priveled block if needed. This work is based in the PR #2353 done by @anilsaldhana .
Result:
Code works also when SecurityManager is present
Motivation:
Because we not null out the array entry in the SelectionKey[] which is produced by SelectedSelectionKeySet.flip() we may end up with a few SelectionKeyreferences still hanging around here even after the Channel was closed. As these entries may be present at the end of the SelectionKey[] which is never updated for a long time as not enough SelectionKeys are ready.
Modifications:
Once we access the SelectionKey out of the SelectionKey[] we directly null it out.
Result:
Reference can be GC'ed right away once the Channel was closed.
Motivation:
At the moment we do a Channel.isActive() check in every AbstractChannel.AbstractUnsafe.write(...) call which gives quite some overhead as shown in the profiler when you write fast enough. We can eliminate the check and do something more smart here.
Modifications:
Remove the isActive() check and just check if the ChannelOutboundBuffer was set to null before, which means the Channel was closed. The rest will be handled in flush0() anyway.
Result:
Less overhead when doing many write calls
Motivation:
At the moment an IllegalArgumentException will be thrown if a ChannelPromise is cancelled while propagate through the ChannelPipeline. This is not correct, we should just stop to propagate it as it is valid to cancel at any time.
Modifications:
Stop propagate the operation through the ChannelPipeline once a ChannelPromise is cancelled.
Result:
No more IllegalArgumentException when cancel a ChannelPromise while moving through the ChannelPipeline.
Motivation:
While the default thread model provided by Netty is reasonable enough for most applications, some users might have a special requirement for the thread model. Here are a few examples:
- A user might want to invoke handlers from the caller thread directly, assuming that his or her application is completely asynchronous and does not make any invocation from non-I/O thread. In this case, the default invoker implementation will only add the overhead of checking if the current thread is an I/O thread or not.
- A user might want to invoke handlers from different threads depending on the type of events flexibly.
Modifications:
- Backport 132af3a485 which is a fix for #1912
- Add a new interface called 'ChannelHandlerInvoker' that performs the invocation of event handler methods.
- Add pipeline manipulation methods that accept ChannelHandlerInvoker
- The differences from the original commit:
- Separated the irrelevant changes out
- Channel.eventLoop is null until the registration is complete in this branch, so Channel.Unsafe.invoker() doesn't work before registration.
- Deregistration is not gone in this branch, so the methods related with deregistration were added to ChannelHandlerInvoker
Motivation:
MultithreadEventLoopGroup.newChild() does not override MultithreadEventExecutorGroup.newChild() which returns EventExecutor. MultithreadEventLoopGroup.newChild() should never return an EventExecutor, so this is incorrect.
Modifications:
Override MultithreadEventLoopGroup.newChild() so that it returns EventLoop
Result:
Correct API
Motivation:
EventExecutor.iterator() is fixed to return Iterator<EventExecutor> and there's no way to change that as long as we don't extend Iterable. However, a user should have a way to cast the returned set of executors painlessly. Currently, it is only possible with an explicit cast like (Iterator<NioEventLoop>).
Modifications:
Instead, I added a new method called 'children()' which returns an immutable collection of child executors whose method signature looks like the following:
<E extends EventExecutor> Set<E> children();
Result:
A user can now do this:
Set<NioEventLoop> loops = group.children();
for (NioEventLoop l: loops) { ... }
Unfortunately, this is not possible:
for (NioEventLoop l: group.children()) { ... }
However, it's still a gain that a user doesn't need to down-cast explicitly and to add the '@SuppressWarnings` annotation.
Motivation:
LocalEventLoopGroup and LocalEventLoop are not really special for LocalChannels. It can be used for other channel implementations as long as they don't require special handling.
Modifications:
- Add DefaultEventLoopGroup and DefaultEventLoop
- Deprecate LocalEventLoopGroup and make it extend DefaultEventLoopGroup
- Add DefaultEventLoop and remove LocalEventLoop
- Fix inspector warnings
Result:
- Better class names.
Motivation:
EventExecutor.parent() and EventLoop.parent() almost always return a constant parent executor. There's not much reason to let it implemented in subclasses.
Modifications:
- Implement AbstractEventExecutor.parent() with an additional contructor
- Add AbstractEventLoop so that subclasses extend AbstractEventLoop, which implements parent() appropriately
- Remove redundant parent() implementations in the subclasses
- Fix inspector warnings
Result:
Less duplication.
Motivation:
At the moment we use the system-wide default selector provider for this invocation of the Java virtual machine when constructing a new NIO channel, which makes using an alternative SelectorProvider practically useless.
This change allows user specify his/her preferred SelectorProvider.
Modifications:
Add SelectorProvider as a param for current `private static *Channel newSocket` method of NioSocketChannel, NioServerSocketChannel and NioDatagramChannel.
Change default constructors of NioSocketChannel, NioServerSocketChannel and NioDatagramChannel to use DEFAULT_SELECTOR_PROVIDER when calling newSocket(SelectorProvider).
Add new constructors for NioSocketChannel, NioServerSocketChannel and NioDatagramChannel which allow user specify his/her preferred SelectorProvider.
Result:
Now users can specify his/her preferred SelectorProvider when constructing an NIO channel.
Motivation:
Some operating systems like Windows 7 uses a valid globally unique EUI-64 MAC address for a virtual device (e.g. 00:00:00:00:00:00:00:E0), and because it's usually longer than the legit MAC-48 address, we should not use the length of MAC address when two MAC addresses are of the same quality. Instead, we should compare the INET address of the NICs before comparing the length of the MAC addresses.
Modification:
Compare the length of MAC addresses as a last resort.
Result:
Correct MAC address detection in Windows with IPv6 enabled.
Motivation:
When there are two MAC addresses which are good enough, we can choose the one with better IP address rather than just choosing the first appeared one.
Modification:
Replace isBetterAddress() with compareAddresses() to make it return if both addresses are in the same preference level.
Add compareAddresses() which compare InetAddresses and use it when compareAddress(byte[], byte[]) returns 0 (same preference)
Result:
More correct primary MAC address detection
Motivation:
As reported in #2331, some query operations in NetworkInterface takes much longer time than we expected. For example, specifying -Djava.net.preferIPv4Stack=true option in Window increases the execution time by more than 4 times. Some Windows systems have more than 20 network interfaces, and this problem gets bigger as the number of unused (virtual) NICs increases.
Modification:
Use NetworkInterface.getInetAddresses() wherever possible.
Before iterating over all NICs reported by NetworkInterface, filter the NICs without proper InetAddresses. This reduces the number of candidates quite a lot.
NetUtil does not query hardware address of NIC in the first place but uses InetAddress.isLoopbackAddress().
Do not call unnecessary query operations on NetworkInterface. Just get hardware address and compare.
Result:
Significantly reduced class initialization time, which should fix#2331.
Motivation:
Allow the user to create a NioServerSocketChannel from an existing ServerSocketChannel.
Modifications:
Add an extra constructor
Result:
Now the user is be able to create a NioServerSocketChannel from an existing ServerSocketChannel, like he can do with all the other Nio*Channel implemntations.
Motivation:
Ensure the user know the Channel must be closed to release resources like filehandles.
Modifications:
Add some extra javadoc.
Result:
More clear documentation
Motivation:
At the moment we use SocketChannel.open(), ServerSocketChannel.open() and DatagramSocketChannel.open(...) within the constructor of our
NIO channels. This introduces a bottleneck if you create a lot of connections as these calls delegate to SelectorProvider.provider() which
uses synchronized internal. This change removed the bottleneck.
Modifications:
Obtain a static instance of the SelectorProvider and use SelectorProvider.openSocketChannel(), SelectorProvider.openServerSocketChannel() and
SelectorProvider.openDatagramChannel(). This eliminates the bottleneck as SelectorProvider.provider() is not called on every channel creation.
Result:
Less conditions when create new channels.
Motivation:
Remove the synchronization bottleneck and so speed up things
Modifications:
Introduce a ThreadLocal cache that holds mappings between classes of ChannelHandlerAdapater implementations and the result of checking if the @Sharable annotation is present.
This way we only will need to do the real check one time and server the other calls via the cache. A ThreadLocal and WeakHashMap combo is used to implement the cache
as this way we can minimize the conditions while still be sure we not leak class instances in containers.
Result:
Less conditions during adding ChannelHandlerAdapter to the ChannelPipeline
This also does factor out some logic of ChannelOutboundBuffer. Mainly we not need nioBuffers() for many
transports and also not need to copy from heap to direct buffer. So this functionality was moved to
NioSocketChannelOutboundBuffer. Also introduce a EpollChannelOutboundBuffer which makes use of
memory addresses for all the writes to reduce GC pressure
- Allocating and deallocating a direct buffer for I/O is an expensive
operation, so we have to at least have a pool of direct buffers if the
current allocator is not pooled
- Related: #2163
- Add ResourceLeakHint to allow a user to provide a meaningful information about the leak when touching it
- DefaultChannelHandlerContext now implements ResourceLeakHint to tell where the message is going.
- Cleaner resource leak report by excluding noisy stack trace elements
- Fixes#1810
- Add a new interface ChannelId and its default implementation which generates globally unique channel ID.
- Replace AbstractChannel.hashCode with ChannelId.hashCode() and ChannelId.shortValue()
- Add variants of ByteBuf.hexDump() which accept byte[] instead of ByteBuf.
- Proposed fix for #1824
UniqueName and its subtypes do not allow getting the previously registered instance. For example, let's assume that a user is running his/her application in an OSGi container with Netty bundles and his server bundle. Whenever the server bundle is reloaded, the server will try to create a new AttributeKey instance with the same name. However, Netty bundles were not reloaded at all, so AttributeKey will complain that the name is taken already (by the previously loaded bundle.)
To fix this problem:
- Replaced UniqueName with Constant, AbstractConstant, and ConstantPool. Better name and better design.
- Sctp/Udt/RxtxChannelOption is not a ChannelOption anymore. They are just constant providers and ChannelOption is final now. It's because caching anything that's from outside of netty-transport will lead to ClassCastException on reload, because ChannelOption's constant pool will keep all option objects for reuse.
- Signal implements Constant because we can't ensure its uniqueness anymore by relying on the exception raised by UniqueName's constructor.
- Inspired by #2214 by @normanmaurer
- Call setUncancellable() before performing an outbound operation
- Add safeSetSuccess/Failure() and use them wherever
- Fixes#2060
- Ensure to return a future/promise implementation that does not fail with 'not registered to an event loop' error for registration operations
- If there is no usable event loop available, GlobalEventExecutor.INSTANCE is used as a fallback.
The problem with the old way was that we always set the OP_WRITE when the buffer could not be written
until the write-spin-count was reached. This means that in some cases the channel was still be writable
but we just was not able to write out the data quick enough. For this cases we should better break out the
write loop and schedule a write to be picked up later in the EventLoop, when other tasks was executed.
The OP_WRITE will only be set if a write actual returned 0 which means there is no more room for writing data
and this we need to wait for the os to notify us.
Beside this it also helps to reduce CPU usage as nioBufferCount() is quite expensive when used on CompositeByteBuf which are
nested and contains a lot of components
This ChannelOption allows to tell the DatagramChannel implementation to be active as soon as they are registrated to their EventLoop. This can be used to make it possible to write to a not bound DatagramChannel.
The ChannelOption is marked as @deprecated as I'm looking for a better solution in master which breaks default behaviour with 4.0 branch.
This move less common method patterns to extra methods and so make the nioBuffers() method with most common pattern (backed by one ByteBuffer) small enough for inlining.
This is needed because of otherwise the JDK itself will do an extra ByteBuffer copy with it's own pool implementation. Even worth it will be done
multiple times if the ByteBuffer is always only partial written. With this change the copy is done inside of netty using it's own allocator and
only be done one time in all cases.
Introduce a new interface called MessageSizeEstimator. This can be specific per Channel (via ChannelConfig). The MessageSizeEstimator will be used to estimate for a message that should be written. The default implementation handles ByteBuf, ByteBufHolder and FileRegion. A user is free to plug-in his/her own implementation for different behaviour.
This fixes#1664 and revert also the original commit which was meant to fix it 3b1881b523 . The problem with the original commit was that it could delay handlerRemove(..) calls and so mess up the order or forward bytes to late.
- Remove unnecessary ascending traversal of pipeline in DefaultChannelHandlerContext.freeInbound()
- Move DefaultChannelHandlerContext.teardownAll() to DefaultChannelPipeline
- Previously, failUnflushed() did not run when inFail is true, which made unflushed writes are not released on reentrance. This has been fixed by this commit.
- Also, AbstractUnsafe.outboundBuffer is set to null as early as possible to remove the chance of any write attempts made after the closure.
- Fix a bug in DefaultProgressivePromise.tryProgress() where the notification is dropped
- Fix a bug in AbstractChannel.calculateMessageSize() where FileRegion is not counted
- HttpStaticFileServer example now uses zero copy file transfer if possible.
- Merge MessageList into ChannelOutboundBuffer
- Make ChannelOutboundBuffer a queue-like data structure so that it is nearly impossible to leak a message
- Make ChannelOutboundBuffer public so that AbstractChannel can expose it to its subclasses.
- TODO: Re-enable gathering write in NioSocketChannel
This is often useful if you for example use a ChannelGroup to hold all connected Channels and want to broadcast a message too all of them
except one Channel.
- write() now accepts a ChannelPromise and returns ChannelFuture as most
users expected. It makes the user's life much easier because it is
now much easier to get notified when a specific message has been
written.
- flush() does not create a ChannelPromise nor returns ChannelFuture.
It is now similar to what read() looks like.
DefaultChannelHandlerContext does not trigger exceptionCaught() immediately when ChannelOutboundHandler.write() raises an exception. It just records the exception until flush() is triggered. On invokeFlush(), if there's any exception recorded, DefaultChannelHandlerContext will fail the promise without calling ChannelOutboundHandler.flush(). If more than one exception were raised, only the first exception is used as the cause of the failure and the others will be logged at warn level.
- Remove channelReadSuspended because it's actually same with messageReceivedLast
- Rename messageReceived to channelRead
- Rename messageReceivedLast to channelReadComplete
We renamed messageReceivedLast to channelReadComplete because it
reflects what it really is for. Also, we renamed messageReceived to
channelRead for consistency in method names.
I must admit MesageList was pain in the ass. Instead of forcing a
handler always loop over the list of messages, this commit splits
messageReceived(ctx, list) into two event handlers:
- messageReceived(ctx, msg)
- mmessageReceivedLast(ctx)
When Netty reads one or more messages, messageReceived(ctx, msg) event
is triggered for each message. Once the current read operation is
finished, messageReceivedLast() is triggered to tell the handler that
the last messageReceived() was the last message in the current batch.
Similarly, for outbound, write(ctx, list) has been split into two:
- write(ctx, msg)
- flush(ctx, promise)
Instead of writing a list of message with a promise, a user is now
supposed to call write(msg) multiple times and then call flush() to
actually flush the buffered messages.
Please note that write() doesn't have a promise with it. You must call
flush() to get notified on completion. (or you can use writeAndFlush())
Other changes:
- Because MessageList is completely hidden, codec framework uses
List<Object> instead of MessageList as an output parameter.
- Fixes#1528
It's not really easy to provide a general-purpose abstraction for fast-yet-safe iteration. Instead of making forEachByte() less optimal, let's make it do what it does really well, and allow a user to implement potentially unsafe-yet-fast loop using unsafe operations.
* The problem with the release(..) calls here was that it would have called release on an unsupported message and then throw an exception. This exception will trigger ChannelOutboundBuffer.fail(..), which will also try to release the message again.
* Also use the same exception type for unsupported messages as in other channel impls.
- MessageList.array() should give better performance + concise code
- MessageList.add(T[], int, int) iterated over the source array 3 times at worst case. This commit reduces that to 1 time.
- Related: #1378
- They now accept only one argument.
- A user who wants to use a buffer for more complex use cases, he or she can always access the buffer directly via memoryAddress() and array()
- Fixes#1426
- We already allow a user instantiate an EventLoopGroup with the default number of threads via the default constructor, so I think it's OK although it's not always optimal.
- Fixes#1486
- Decreased the default from 16 to 1 because unnecessary extra read on req-res protocols results in lower throughput due to extra syscalls.
- No need to have fine-grained lookup table because the buffer pool has
much more coarse capacities available
- No need to use a loop to normalize a buffer capacity
- SimpleChannelInboundHandler now has a constructor parameter to let a
user decide to enable automatic message release. (the default is to
enable), which makes ChannelInboundConsumingHandler of less value.
This reverts commit a1a86b9de4 because the
semantic of ctx.isRemoved() is confusing to a user - why is
ctx.isRemoved() false when handlerRemoved() is invoked? A better
solution would be check if the connection is inactive and mark the
promise as failure before attempting to write anything.
- Related issue: #1432
- Add Future.isCancellable()
- Add Promise.setUncancellable() which is meant to be used for the party that runs the task uncancellable once started
- Implement Future.isCancelled() and Promise.cancel(boolean) properly
The AIO transport was added in the past as we hoped it would have better latency as the NIO transport. But in reality this was never the case.
So there is no reason to use the AIO transport at all. It just put more burden on us as we need to also support it and fix bugs.
Because of this we dedicided to remove it for now. It will stay in the master_with_aio_transport branch so we can pick it up later again if it is ever needed.
The API changes made so far turned out to increase the memory footprint
and consumption while our intention was actually decreasing them.
Memory consumption issue:
When there are many connections which does not exchange data frequently,
the old Netty 4 API spent a lot more memory than 3 because it always
allocates per-handler buffer for each connection unless otherwise
explicitly stated by a user. In a usual real world load, a client
doesn't always send requests without pausing, so the idea of having a
buffer whose life cycle if bound to the life cycle of a connection
didn't work as expected.
Memory footprint issue:
The old Netty 4 API decreased overall memory footprint by a great deal
in many cases. It was mainly because the old Netty 4 API did not
allocate a new buffer and event object for each read. Instead, it
created a new buffer for each handler in a pipeline. This works pretty
well as long as the number of handlers in a pipeline is only a few.
However, for a highly modular application with many handlers which
handles connections which lasts for relatively short period, it actually
makes the memory footprint issue much worse.
Changes:
All in all, this is about retaining all the good changes we made in 4 so
far such as better thread model and going back to the way how we dealt
with message events in 3.
To fix the memory consumption/footprint issue mentioned above, we made a
hard decision to break the backward compatibility again with the
following changes:
- Remove MessageBuf
- Merge Buf into ByteBuf
- Merge ChannelInboundByte/MessageHandler and ChannelStateHandler into ChannelInboundHandler
- Similar changes were made to the adapter classes
- Merge ChannelOutboundByte/MessageHandler and ChannelOperationHandler into ChannelOutboundHandler
- Similar changes were made to the adapter classes
- Introduce MessageList which is similar to `MessageEvent` in Netty 3
- Replace inboundBufferUpdated(ctx) with messageReceived(ctx, MessageList)
- Replace flush(ctx, promise) with write(ctx, MessageList, promise)
- Remove ByteToByteEncoder/Decoder/Codec
- Replaced by MessageToByteEncoder<ByteBuf>, ByteToMessageDecoder<ByteBuf>, and ByteMessageCodec<ByteBuf>
- Merge EmbeddedByteChannel and EmbeddedMessageChannel into EmbeddedChannel
- Add SimpleChannelInboundHandler which is sometimes more useful than
ChannelInboundHandlerAdapter
- Bring back Channel.isWritable() from Netty 3
- Add ChannelInboundHandler.channelWritabilityChanges() event
- Add RecvByteBufAllocator configuration property
- Similar to ReceiveBufferSizePredictor in Netty 3
- Some existing configuration properties such as
DatagramChannelConfig.receivePacketSize is gone now.
- Remove suspend/resumeIntermediaryDeallocation() in ByteBuf
This change would have been impossible without @normanmaurer's help. He
fixed, ported, and improved many parts of the changes.
- Use the local transport in a correct way (i.e. no need to trigger channelActive et al by ourselves)
- Use Promise/Future instead of CountDownLatch where they simplifies
- Fixes#1366: No elegant way to free non-in/outbound buffers held by a handler
- handlerRemoved() is now also invoked when a channel is deregistered, as well as when a handler is removed from a pipeline.
- A little bit of clean-up for readability
- Fix a bug in forwardBufferContentAndRemove() where the handler buffers are not freed (mainly because we were relying on channel.isRegistered() to determine if the handler has been removed from inside the handler.
- ChunkedWriteHandler.handlerRemoved() is unnecessary anymore because ChannelPipeline now always forwards the content of the buffer.
This is done by stop accept() new sockets for 1 seconds
Beside this this commit also makes sure accept() exceptions of OioServerSocketChannel trigger
the fireExceptionCaught(...). The same is true fo the AioServerSocketChannel.
- Fixes#1282 (not perfectly, but to the extent it's possible with the current API)
- Add AddressedEnvelope and DefaultAddressedEnvelope
- Make DatagramPacket extend DefaultAddressedEnvelope<ByteBuf, InetSocketAddress>
- Rename ByteBufHolder.data() to content() so that a message can implement both AddressedEnvelope and ByteBufHolder (DatagramPacket does) without introducing two getter methods for the content
- Datagram channel implementations now understand ByteBuf and ByteBufHolder as a message with unspecified remote address.
shutdownGracefully() provides two optional parameters that give more
control over when an executor has to be shut down.
- Related issue: #1307
- Add shutdownGracefully(..) and isShuttingDown()
- Deprecate shutdown() / shutdownNow()
- Replace lastAccessTime with lastExecutionTime and update it after task
execution for accurate quiet period check
- runAllTasks() and runShutdownTasks() update it automatically.
- Add updateLastExecutionTime() so that subclasses can update it
- Add a constructor parameter that tells not to add an unncessary wakeup
task in execute() if addTask() wakes up the executor thread
automatically. Previously, execute() always called wakeup() after
addTask(), which often caused an extra dummy task in the task queue.
- Use shutdownGracefully() wherever possible / Deprecation javadoc
- Reduce the running time of SingleThreadEventLoopTest from 40s to 15s
using custom graceful shutdown parameters
- Other changes made along with this commit:
- takeTask() does not throw InterruptedException anymore.
- Returns null on interruption or wakeup
- Make sure runShutdownTasks() return true even if an exception was
raised while running the shutdown tasks
- Remove unnecessary isShutdown() checks
- Consistent use of SingleThreadEventExecutor.nanoTime()
Replace isWakeupOverridden with a constructor parameter
- Fixes#1308
freeInboundBuffer() and freeOutboundBuffer() were introduced in the early days of the new API when we did not have reference counting mechanism in the buffer. A user did not want Netty to free the handler buffers had to override these methods.
However, now that we have reference counting mechanism built into the buffer, a user who wants to retain the buffers beyond handler's life cycle can simply return the buffer whose reference count is greater than 1 in newInbound/OutboundBuffer().
- Added a test case that reproduces the problem in ReplayingDecoderTest
- Call newHandler.handlerAdded() *before* oldHandler.handlerRemoved() to ensure newHandlerAdded() is called before forwarding the buffer content of the old handler in replace0().
- Fixes#1292
- Replace DefaultChannelPipeline.inbound/outboundShutdown flag with per-context flags
- Update the flags in free() / freeInbound() / freeOutbound() for clarity
- Replace ugly 'prev != null' check with explicit event scheduling
- Fix an incorrect flag operation in freeHandlerBuffersAfterRemoval()
- Fix a bug in AbstractEmbeddedChannel.doRegister where it makes pending tasks immediately, where the pending tasks actually triggers inbound events
- Remove unnecessary suppression of inboundBufferUpdated() event in DefaultChannelPipeline, which potentially hides an event ordering bug. Unfortunately, I don't remember why I added it in cca35454d2.
This change also introduce a few other changes which was needed:
* ChannelHandler.beforeAdd(...) and ChannelHandler.beforeRemove(...) were removed
* ChannelHandler.afterAdd(...) -> handlerAdded(...)
* ChannelHandler.afterRemoved(...) -> handlerRemoved(...)
* SslHandler.handshake() -> SslHandler.hanshakeFuture() as the handshake is triggered automatically after
the Channel becomes active
- Now works without the transport package
- Renamed TransferFuture to ProgressiveFuture and ChannelProgressiveFuture / same for promises
- ProgressiveFutureListener now extends GenericProgressiveFutureListener and GenericFutureListener (add/removeTransferListener*() were removed)
- Renamed DefaultEventListeners to DefaultFutureListeners and only accept GenericFutureListeners
- Various clean-up
This commit splits bridge into two parts. One is NextBridgeFeeder,
which provides ByteBuf and MessageBuf that are local to the context
whose next*Buffer() has been invoked on. The other is a thread-safe
queue that stores the data fed by NextBridgeFeeder.feed().
By splitting the bridge into the two parts, the data pushed by a handler
is not lost anymore when the next handler who provided the next buffer
is removed from the pipeline.
- Fixes#1272
- Fixes#1229
- Primarily written by @normanmaurer and revised by @trustin
This commit removes the notion of unfolding from the codec framework
completely. Unfolding was introduced in Netty 3.x to work around the
shortcoming of the codec framework where encode() and decode() did not
allow generating multiple messages.
Such a shortcoming can be fixed by changing the signature of encode()
and decode() instead of introducing an obscure workaround like
unfolding. Therefore, we changed the signature of them in 4.0.
The change is simple, but backward-incompatible. encode() and decode()
do not return anything. Instead, the codec framework will pass a
MessageBuf<Object> so encode() and decode() can add the generated
messages into the MessageBuf.
- Count the number of select() calls made to wait until reaching at the expected dead line, and rebuild selectors if too many select() calls were made.
- Similar to @normanmaurer's fix in that this commit also makes Bootstrap.init(Channel) asynchronous, but it is simpler and less invasive.
- Also made sure a connection attempt failure in the local transport does not trigger an exceptionCaught event
- The offending test case is annotated with `@Ignore`
- Also fixed a bug where channel initialization failure swallows the original cause of initialization failure
- Add ChannelHandlerUtil and move the core logic of ChannelInbound/OutboundMessageHandler to ChannelHandlerUtil
- Add ChannelHandlerUtil.SingleInbound/OutboundMessageHandler and make ChannelInbound/OutboundMessageHandlerAdapter implement them. This is a backward incompatible change because it forces all handler methods to be public (was protected previously)
- Fixes: #1119
- Merge waiters and fluchCheckpoint into a single field
- This limits the number of waiter threads to 2^24 - 1, which is still very large. Can you imagine an application with 16 million threads?
- Rename inbound/outboundBufferFreed to inbound/OutboundShutdown which makes more sense
- Move DefaultChannelHandlerContext.isInbound/OutboundBufferFreed() to DefaultChannelPipeline
- Fix a problem where invokeFreeInbound/OutboundBuffer() sets inbound/outboundShutdown too early (this was the direct cause of #1064)
- Remove the volatile modifier - DCHC.prev/next are volatile and that's just enough