Motivation:
The previous fix did disable the caching of ByteBuffers completely which can cause performance regressions. This fix makes sure we use nioBuffers() for all writes in NioSocketChannel and so prevent data-corruptions. This is still kind of a workaround which will be replaced by a more fundamental fix later.
Modifications:
- Revert 4059c9f354
- Use nioBuffers() for all writes to prevent data-corruption
Result:
No more data-corruption but still retain the original speed.
Motivation:
At the moment we expand the ByteBuffer[] when we have more then 1024 ByteBuffer to write and replace the stored instance in its FastThreadLocal. This is not needed and may even harm performance on linux as IOV_MAX is 1024 and so this may cause the JVM to do an array copy.
Modifications:
Just exit the nioBuffers() method if we can not fit more ByteBuffer in the array. This way we will pick them up on the next call.
Result:
Remove uncessary array copy and simplify the code.
Motivation:
We cache the ByteBuffers in ChannelOutboundBuffer.nioBuffers() for the Entries in the ChannelOutboundBuffer to reduce some overhead. The problem is this can lead to data-corruption if an incomplete write happens and next time we try to do a non-gathering write.
To fix this we should remove the caching which does not help a lot anyway and just make the code buggy.
Modifications:
Remove the caching of ByteBuffers.
Result:
No more data-corruption.
Motivation:
At the moment it's only possible for a user to set the RecvByteBufAllocator for a Channel but not access the Handle once it is assigned. This makes it hard to write more flexible implementations.
Modifications:
Add a new method to the Channel.Unsafe to allow access the the used Handle for the Channel. The RecvByteBufAllocator.Handle is created lazily.
Result:
It's possible to write more flexible implementatons that allow to adjust stuff on the fly for a Handle that is used by a Channel
Motivation:
Sometimes ChannelHandler need to queue writes to some point and then process these. We currently have no datastructure for this so the user will use an Queue or something like this. The problem is with this Channel.isWritable() will not work as expected and so the user risk to write to fast. That's exactly what happened in our SslHandler. For this purpose we need to add a special datastructure which will also take care of update the Channel and so be sure that Channel.isWritable() works as expected.
Modifications:
- Add PendingWriteQueue which can be used for this purpose
- Make use of PendingWriteQueue in SslHandler
Result:
It is now possible to queue writes in a ChannelHandler and still have Channel.isWritable() working as expected. This also fixes#2752.
Motivation:
We did various changes related to the ChannelOutboundBuffer in 4.0 branch. This commit port all of them over and so make sure our branches are synced in terms of these changes.
Related to [#2734], [#2709], [#2729], [#2710] and [#2693] .
Modification:
Port all changes that was done on the ChannelOutboundBuffer.
This includes the port of the following commits:
- 73dfd7c01b
- 997d8c32d2
- e282e504f1
- 5e5d1a58fd
- 8ee3575e72
- d6f0d12a86
- 16e50765d1
- 3f3e66c31a
Result:
- Less memory usage by ChannelOutboundBuffer
- Same code as in 4.0 branch
- Make it possible to use ChannelOutboundBuffer with Channel implementation that not extends AbstractChannel
Motivation:
The PID_MAX_LIMIT on 64bit linux systems is 4194304 and on osx it is 99998. At the moment we use 65535 as an upper-limit which is too small.
Modifications:
Use 4194304 as max possible value
Result:
No more false-positives when try to detect current pid.
Motivation:
We have some inconsistency when handling writes. Sometimes we call ChannelOutboundBuffer.progress(...) also for complete writes and sometimes not. We should call it always.
Modifications:
Correctly call ChannelOuboundBuffer.progress(...) for complete and incomplete writes.
Result:
Consistent behavior
Motivation:
While benchmarking the native transport with gathering writes I noticed that it is quite slow. This is due the fact that we need to do a lot of array copies to get the buffers into the iov array.
Modification:
Introduce a new class calles IovArray which allows to fill buffers directly in a iov array that can be passed over to JNI without any array copies. This gives a nice optimization in terms of speed when doing gathering writes.
Result:
Big performance improvement when doing gathering writes. See the included benchmark...
Before:
[nmaurer@xxx]~% wrk/wrk -H 'Host: localhost' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Connection: keep-alive' -d 120 -c 256 -t 16 --pipeline 256 http://xxx:8080/plaintext
Running 2m test @ http://xxx:8080/plaintext
16 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 23.44ms 16.37ms 259.57ms 91.77%
Req/Sec 181.99k 31.69k 304.60k 78.12%
346544071 requests in 2.00m, 46.48GB read
Requests/sec: 2887885.09
Transfer/sec: 396.59MB
With this change:
[nmaurer@xxx]~% wrk/wrk -H 'Host: localhost' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Connection: keep-alive' -d 120 -c 256 -t 16 --pipeline 256 http://xxx:8080/plaintext
Running 2m test @ http://xxx:8080/plaintext
16 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 21.93ms 16.33ms 305.73ms 92.34%
Req/Sec 194.56k 33.75k 309.33k 77.04%
369617503 requests in 2.00m, 49.57GB read
Requests/sec: 3080169.65
Transfer/sec: 423.00MB
Motivation:
ChannelOutboundBuffer is basically a circular array queue of its entry
objects. Once an entry is created in the array, it is never nulled out
to reduce the allocation cost.
However, because it is a circular queue, the array almost always ends up
with as many entry instances as the size of the array, regardless of the
number of pending writes.
At worst case, a channel might have only 1 pending writes at maximum
while creating 32 entry objects, where 32 is the initial capacity of the
array.
Modifications:
- Reduce the initial capacity of the circular array queue to 4.
- Make the initial capacity of the circular array queue configurable
Result:
We spend 4 times less memory for entry objects under certain
circumstances.
Motivation:
At the moment NioSocketChannelOutboundBuffer.nioBuffers() / EpollSocketChannelOutboundBuffer.memoryAddresses() returns null if something is contained in the ChannelOutboundBuffer which is not a ByteBuf. This is a problem for two reasons:
1 - In the javadocs we state that it will never return null
2 - We may do a not optimal write as there may be things that could be written via gathering writes
Modifications:
Change NioSocketChannelOutboundBuffer.nioBuffers() / EpollSocketChannelOutboundBuffer.memoryAddresses() to never return null but have it contain all ByteBuffer that were found before the non ByteBuf. This way we can do a gathering write and also conform to the javadocs.
Result:
Better speed and also correct implementation in terms of the api.
Motivation:
Now Netty has a few problems with null values.
Modifications:
- Check HAProxyProxiedProtocol in HAProxyMessage constructor and throw NPE if it is null.
If HAProxyProxiedProtocol is null we will set AddressFamily as null. So we will get NPE inside checkAddress(String, AddressFamily) and it won't be easy to understand why addrFamily is null.
- Check File in DiskFileUpload.toString().
If File is null we will get NPE when calling toString() method.
- Check Result<String> in MqttDecoder.decodeConnectionPayload(...).
If !mqttConnectVariableHeader.isWillFlag() || !mqttConnectVariableHeader.hasUserName() || !mqttConnectVariableHeader.hasPassword() we will get NPE when we will try to create new instance of MqttConnectPayload.
- Check Unsafe before calling unsafe.getClass() in PlatformDependent0 static block.
- Removed unnecessary null check in WebSocket08FrameEncoder.encode(...).
Because msg.content() can not return null.
- Removed unnecessary null check in DefaultStompFrame(StompCommand) constructor.
Because we have this check in the super class.
- Removed unnecessary null checks in ConcurrentHashMapV8.removeTreeNode(TreeNode<K,V>).
- Removed unnecessary null check in OioDatagramChannel.doReadMessages(List<Object>).
Because tmpPacket.getSocketAddress() always returns new SocketAddress instance.
- Removed unnecessary null check in OioServerSocketChannel.doReadMessages(List<Object>).
Because socket.accept() always returns new Socket instance.
- Pass Unpooled.buffer(0) instead of null inside CloseWebSocketFrame(boolean, int) constructor.
If we will pass null we will get NPE in super class constructor.
- Added throw new IllegalStateException in GlobalEventExecutor.awaitInactivity(long, TimeUnit) if it will be called before GlobalEventExecutor.execute(Runnable).
Because now we will get NPE. IllegalStateException will be better in this case.
- Fixed null check in OpenSslServerContext.setTicketKeys(byte[]).
Now we throw new NPE if byte[] is not null.
Result:
Added new null checks when it is necessary, removed unnecessary null checks and fixed some NPE problems.
Motivation:
Fix some typos in Netty.
Modifications:
- Fix potentially dangerous use of non-short-circuit logic in Recycler.transfer(Stack<?>).
- Removed double 'the the' in javadoc of EmbeddedChannel.
- Write to log an exception message if we can not get SOMAXCONN in the NetUtil's static block.
Motivation:
As a DatagramChannel supports to write to multiple remote peers we must not close the Channel once a IOException accours as this error may be only valid for one remote peer.
Modification:
Continue writing on IOException.
Result:
DatagramChannel can be used even after an IOException accours during writing.
Motivation:
Because of a missing return statement we may produce a NPE when try to fullfill the connect ChannelPromise when it was fullfilled before.
Modification:
Add missing return statement.
Result:
No more NPE.
Motivation:
When system is in short of entrophy, the initialization of
ThreadLocalRandom can take at most 3 seconds. The initialization occurs
when ThreadLocalRandom.current() is invoked first time, which might be
much later than the moment when the application has started. If we
start the initialization of ThreadLocalRandom as early as possible, we
can reduce the perceived time taken for the retrieval.
Modification:
Begin the initialization of ThreadLocalRandom in InternalLoggerFactory,
potentially one of the firstly initialized class in a Netty application.
Make DefaultChannelId retrieve the current process ID before retrieving
the current machine ID, because retrieval of a machine ID is more likely
to use ThreadLocalRandom.current().
Use a dummy channel ID for EmbeddedChannel, which prevents many unit
tests from creating a ThreadLocalRandom instance.
Result:
We gain extra 100ms at minimum for initialSeedUniquifier generation. If
an application has its own initialization that takes long enough time
and generates good amount of entrophy, it is very likely that we will
gain a lot more.
Motivation:
When a bind fails AbstractBootstrap will use the GlobalEventExecutor to notify the ChannelPromise. We should use the EventLoop of the Channel if possible.
Modification:
Use EventLoop of the Channel if possible to use the correct Thread to notify and so guaranteer the right order of events.
Result:
Use the correct EventLoop for notification
Motivation:
There is no way for a ChannelHandler to check if the passed in ChannelPromise for a write(...) call is a VoidChannelPromise. This is a problem as some handlers need to add listeners to the ChannelPromise which is not possible in the case of a VoidChannelPromise.
Modification:
- Introduce ChannelFuture.isVoid() which will return true if it is not possible to add listeners or wait on the result.
- Add ChannelPromise.unvoid() which allows to create a ChannelFuture out of a void ChannelFuture which supports all the operations.
Result:
It's now easy to write ChannelHandler implementations which also works when a void ChannelPromise is used.
Motivation:
We use the nanoTime of the scheduledTasks to calculate the milli-seconds to wait for a select operation to select something. Once these elapsed we check if there was something selected or some task is ready for processing. Unfortunally we not take into account scheduled tasks here so the selection loop will continue if only scheduled tasks are ready for processing. This will delay the execution of these tasks.
Modification:
- Check if a scheduled task is ready after selecting
- also make a tiny change in NioEventLoop to not trigger a rebuild if nothing was selected because the timeout was reached a few times in a row.
Result:
Execute scheduled tasks on time.
Motivation:
When a select rebuild was triggered the reference to the SelectionKey is not updated in AbstractNioChannel. This will cause a CancelledKeyException later.
Modification:
Correctly update SelectionKey reference after rebuild
Result:
Fix exception
Motivation:
Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used.
Modification:
- Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance.
- Recycling of the same object multiple times without acquire it will fail.
- Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler
These changes are based on @belliottsmith 's work that was part of #2504.
Result:
Huge increase in performance.
4.0 branch without this commit:
Benchmark (size) Mode Samples Score Score error Units
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
4.0 branch with this commit:
Benchmark (size) Mode Samples Score Score error Units
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
Motivation:
LocalServerChannel.doClose() calls LocalChannelRegistry.unregister(localAddress); without check if localAddress is null and so produce a NPE when pass null the used ConcurrentHashMapV8
Modification:
Check for localAddress != null before try to remove it from Map. Also added a unit test which showed the stacktrace of the error.
Result:
No more NPE during doClose().
Motivation:
At the moment AbstractBoostrap.bind(...) will always use the GlobalEventExecutor to notify the returned ChannelFuture if the registration is not done yet. This should only be done if the registration fails later. If it completes successful we should just notify with the EventLoop of the Channel.
Modification:
Use EventLoop of the Channel if possible to use the correct Thread to notify and so guaranteer the right order of events.
Result:
Use the correct EventLoop for notification
Motivation:
When Netty runs in a managed environment such as web application server,
Netty needs to provide an explicit way to remove the thread-local
variables it created to prevent class loader leaks.
FastThreadLocal uses different execution paths for storing a
thread-local variable depending on the type of the current thread.
It increases the complexity of thread-local removal.
Modifications:
- Moved FastThreadLocal and FastThreadLocalThread out of the internal
package so that a user can use it.
- FastThreadLocal now keeps track of all thread local variables it has
initialized, and calling FastThreadLocal.removeAll() will remove all
thread-local variables of the caller thread.
- Added FastThreadLocal.size() for diagnostics and tests
- Introduce InternalThreadLocalMap which is a mixture of hard-wired
thread local variable fields and extensible indexed variables
- FastThreadLocal now uses InternalThreadLocalMap to implement a
thread-local variable.
- Added ThreadDeathWatcher.unwatch() so that PooledByteBufAllocator
tells it to stop watching when its thread-local cache has been freed
by FastThreadLocal.removeAll().
- Added FastThreadLocalTest to ensure that removeAll() works
- Added microbenchmark for FastThreadLocal and JDK ThreadLocal
- Upgraded to JMH 0.9
Result:
- A user can remove all thread-local variables Netty created, as long as
he or she did not exit from the current thread. (Note that there's no
way to remove a thread-local variable from outside of the thread.)
- FastThreadLocal exposes more useful operations such as isSet() because
we always implement a thread local variable via InternalThreadLocalMap
instead of falling back to JDK ThreadLocal.
- FastThreadLocalBenchmark shows that this change improves the
performance of FastThreadLocal even more.
Motivation:
The code in ChannelOutboundBuffer can be simplified by using AtomicLongFieldUpdater.addAndGet(...)
Modification:
Replace our manual looping with AtomicLongFieldUpdater.addAndGet(...)
Result:
Cleaner code
Motivation:
If ChannelOutboundBuffer.addFlush() is called multiple times and flushed != unflushed it will still loop through all entries that are not flushed yet even if it is not needed anymore as these were marked uncancellable before.
Modifications:
Check if new messages were added since addFlush() was called and only if this was the case loop through all entries and try to mark the uncancellable.
Result:
Less overhead when ChannelOuboundBuffer.addFlush() is called multiple times without new messages been added.
Motivation:
Provide a faster ThreadLocal implementation
Modification:
Add a "FastThreadLocal" which uses an EnumMap and a predefined fixed set of possible thread locals (all of the static instances created by netty) that is around 10-20% faster than standard ThreadLocal in my benchmarks (and can be seen having an effect in the direct PooledByteBufAllocator benchmark that uses the DEFAULT ByteBufAllocator which uses this FastThreadLocal, as opposed to normal instantiations that do not, and in the new RecyclableArrayList benchmark);
Result:
Improved performance
Motivation:
Each of DefaultChannelPipeline instance creates an head and tail that wraps a handler. These are used to chain together other DefaultChannelHandlerContext that are created once a new ChannelHandler is added. There are a few things here that can be improved in terms of memory usage and initialization time.
Modification:
- Only generate the name for the tail and head one time as it will never change anyway
- Rename DefaultChannelHandlerContext to AbstractChannelHandlerContext and make it abstract
- Create a new DefaultChannelHandlerContext that is used when a ChannelHandler is added to the DefaultChannelPipeline
- Rename TailHandler to TailContext and HeadHandler to HeadContext and let them extend AbstractChannelHandlerContext. This way we can save 2 object creations per DefaultChannelPipeline
Result:
- Less memory usage because we have 2 less objects per DefaultChannelPipeline
- Faster creation of DefaultChannelPipeline as we not need to generate the name for the head and tail
Motivation:
At the moment ChannelFlushPromiseNotifier.add(....) takes an int value for pendingDataSize, which may be too small as a user may need to use a long. This can for example be useful when a user writes a FileRegion etc. Beside this the notify* method names are kind of missleading as these should not contain *Future* because it is about ChannelPromises.
Modification:
Add a new add(...) method that takes a long for pendingDataSize and @deprecated the old method. Beside this also @deprecated all *Future* methods and add methods that have *Promise* in the method name to better reflect usage.
Result:
ChannelFlushPromiseNotifier can be used with bigger data.
Motivation:
The default StringBuilder size is too small (data.length + 4) while it will be 2*data.length (byte to Hex) + 5 "-" char (since 5 peaces appended).
Modification:
Changing initial size to the correct one
Result:
Allocation of the correct final size from the beginning for this StringBuilder.
Motivation:
On some ill-configured systems, InetAddress.getLocalHost() fails. NioSocketChannelTest calls java.net.Socket.connect() and it internally invoked InetAddress.getLocalHost(), which causes the test failures in NioSocketChannelTes on such an ill-configured system.
Modifications:
Use NetUtil.LOCALHOST explicitly.
Result:
NioSocketChannelTest should not fail anymore.
Motivation:
The old DefaultAttributeMap impl did more synchronization then needed and also did not expose a efficient way to check if an attribute exists with a specific key.
Modifications:
* Rewrite DefaultAttributeMap to not use IdentityHashMap and synchronization on the map directly. The new impl uses a combination of AtomicReferenceArray and synchronization per chain (linked-list). Also access the first Attribute per bucket can be done without any synchronization at all and just uses atomic operations. This should fit for most use-cases pretty weel.
* Add hasAttr(...) implementation
Result:
It's now possible to check for the existence of a attribute without create one. Synchronization is per linked-list and the first entry can even be added via atomic operation.
Motivation:
At the moment we sometimes use only RecvByteBufAllocator.guess() to guess the next size and the use the ByteBufAllocator.* directly to allocate the buffer. We should always use RecvByteBufAllocator.allocate(...) all the time as this makes the behavior easier to adjust.
Modifications:
Change the read() implementations to make use of RecvByteBufAllocator.
Result:
Behavior is more consistent.
Motivation:
DefaultChannelPipeline.firstContext() should return null when the ipeline is empty. This is not the case atm.
Modification:
Fix incorrect check in DefaultChannelPipeline.firstContext() and add unit tests.
Result:
Correctly return null when DefaultChannelPipeline.firstContext() is called on empty pipeline.
Motivation:
Because Thread.currentThread().interrupt() will unblock Selector.select() we need to take special care when check if we need to rebuild the Selector. If the unblock was caused by the interrupt() we will clear it and move on as this is most likely a bug in a custom ChannelHandler or a library the user makes use of.
Modification:
Clear the interrupt state of the Thread if the Selector was unblock because of an interrupt and the returned keys was 0.
Result:
No more busy loop caused by Thread.currentThread().interrupt()
Motivation:
At the moment whenever we add/remove a ChannelHandler with an EventExecutorGroup we have two synchronization points in the execution path. One to find the childInvoker and one for add/remove itself. We can eliminate the former by call findIInvoker in the synchronization block, as we need to synchronize anyway.
Modification:
Remove the usage of AtomicFieldUpdater and the extra synchronization in findInvoker by moving the call of the method in the synchronized(this) block.
Result:
Less synchronization points and volatile reads/writes
Motivation:
As discussed in #2250, it will become much less complicated to implement
deregistration and reregistration of a channel once #2250 is resolved.
Therefore, there's no need to deprecate deregister() and
channelUnregistered().
Modification:
- Undeprecate deregister() and channelUnregistered()
- Remove SuppressWarnings annotations where applicable
Result:
We (including @jakobbuchgraber) are now ready to play with #2250 at
master
Motivation:
4 and 5 were diverged long time ago and we recently reverted some of the
early commits in master. We must make sure 4.1 and master are not very
different now.
Modification:
Fix found differences
Result:
4.1 and master got closer.
Motivation:
4 and 5 were diverged long time ago and we recently reverted some of the
early commits in master. We must make sure 4.1 and master are not very
different now.
Modification:
Remove ChannelHandlerInvoker.writeAndFlush(...) and the related
implementations.
Result:
4.1 and master got closer.
Motivation:
4 and 5 were diverged long time ago and we recently reverted some of the
early commits in master. We must make sure 4.1 and master are not very
different now.
Modification:
Remove ChannelHandlerInvoker.writeAndFlush(...) and the related implementations.
Result:
4.1 and master got closer.
Motivation:
4 and 5 were diverged long time ago and we recently reverted some of the
early commits in master. We must make sure 4.1 and master are not very
different now.
Modification:
Small adjustments to match up branches
Result:
4.1 and master got closer.
Motivation:
4 and 5 were diverged long time ago and we recently reverted some of the
early commits in master. We must make sure 4.1 and master are not very
different now.
Modification:
Fix found differences
Result:
4.1 and master got closer.
Motivation:
I had the NioSocketChannelTest.testFlushCloseReentrance() fail sometimes on one of my linux installation. This change let it pass all the time.
Modification:
Set the SO_SNDBUF to a small value to force split writes
Result:
Test is passing all the time where it was sometimes fail before.
Motivation:
At the moment it is not possible to deregister a LocalChannel from its EventLoop and register it to another one as the LocalChannel is closed during the deregister.
Modification:
Not close the LocalChannel during dergister
Result:
It is now possible to deregister a LocalChannel and register it to another EventLoop
Motivation:
Once a user implement a custom ChannelHandlerInvoker it is needed to validate the ChannelPromise. We should expose a utility method for this.
Modifications:
Move validatePromise(...) from DefaultChannelHandlerInvoker to ChannelHandlerInvokerUtil and make it public.
Result:
User is able to reuse code
Motivation:
At the moment it is possible to see a NPE when the LocalSocketChannels doRegister() method is called and the LocalSocketChannels doClose() method is called before the registration was completed.
Modifications:
Make sure we delay the actual close until the registration task was executed.
Result:
No more NPE
Motivation:
At the moment ChanneConfig.setAutoRead(false) only is guaranteer to not have an extra channelRead(...) triggered when used from within the channelRead(...) or channelReadComplete(...) method. This is not the correct behaviour as it should also work from other methods that are triggered from within the EventLoop. For example a valid use case is to have it called from within a ChannelFutureListener, which currently not work as expected.
Beside this there is another bug which is kind of related. Currently Channel.read() will not work as expected for OIO as we will stop try to read even if nothing could be read there after one read operation on the socket (when the SO_TIMEOUT kicks in).
Modifications:
Implement the logic the right way for the NIO/OIO/SCTP and native transport, specific to the transport implementation. Also correctly handle Channel.read() for OIO transport by trigger a new read if SO_TIMEOUT was catched.
Result:
It is now also possible to use ChannelConfig.setAutoRead(false) from other methods that are called from within the EventLoop and have direct effect.
Conflicts:
transport-sctp/src/main/java/io/netty/channel/sctp/nio/NioSctpChannel.java
transport/src/main/java/io/netty/channel/socket/nio/NioDatagramChannel.java
transport/src/main/java/io/netty/channel/socket/nio/NioSocketChannel.java
Motivation:
At the moment we create a HashMap that holds the MembershipKeys for multicast with every NioDatagramChannel even when most people not need it at al
Modifications:
Lazy create the HashMap when needed.
Result:
Less memory usage and less object creation
Motivation:
When using System.getProperty(...) and various methods to get a ClassLoader it will fail when a SecurityManager is in place.
Modifications:
Use a priveled block if needed. This work is based in the PR #2353 done by @anilsaldhana .
Result:
Code works also when SecurityManager is present
Motivation:
Because we not null out the array entry in the SelectionKey[] which is produced by SelectedSelectionKeySet.flip() we may end up with a few SelectionKeyreferences still hanging around here even after the Channel was closed. As these entries may be present at the end of the SelectionKey[] which is never updated for a long time as not enough SelectionKeys are ready.
Modifications:
Once we access the SelectionKey out of the SelectionKey[] we directly null it out.
Result:
Reference can be GC'ed right away once the Channel was closed.
Motivation:
At the moment we do a Channel.isActive() check in every AbstractChannel.AbstractUnsafe.write(...) call which gives quite some overhead as shown in the profiler when you write fast enough. We can eliminate the check and do something more smart here.
Modifications:
Remove the isActive() check and just check if the ChannelOutboundBuffer was set to null before, which means the Channel was closed. The rest will be handled in flush0() anyway.
Result:
Less overhead when doing many write calls
Motivation:
At the moment an IllegalArgumentException will be thrown if a ChannelPromise is cancelled while propagate through the ChannelPipeline. This is not correct, we should just stop to propagate it as it is valid to cancel at any time.
Modifications:
Stop propagate the operation through the ChannelPipeline once a ChannelPromise is cancelled.
Result:
No more IllegalArgumentException when cancel a ChannelPromise while moving through the ChannelPipeline.
Motivation:
While the default thread model provided by Netty is reasonable enough for most applications, some users might have a special requirement for the thread model. Here are a few examples:
- A user might want to invoke handlers from the caller thread directly, assuming that his or her application is completely asynchronous and does not make any invocation from non-I/O thread. In this case, the default invoker implementation will only add the overhead of checking if the current thread is an I/O thread or not.
- A user might want to invoke handlers from different threads depending on the type of events flexibly.
Modifications:
- Backport 132af3a485 which is a fix for #1912
- Add a new interface called 'ChannelHandlerInvoker' that performs the invocation of event handler methods.
- Add pipeline manipulation methods that accept ChannelHandlerInvoker
- The differences from the original commit:
- Separated the irrelevant changes out
- Channel.eventLoop is null until the registration is complete in this branch, so Channel.Unsafe.invoker() doesn't work before registration.
- Deregistration is not gone in this branch, so the methods related with deregistration were added to ChannelHandlerInvoker
Motivation:
MultithreadEventLoopGroup.newChild() does not override MultithreadEventExecutorGroup.newChild() which returns EventExecutor. MultithreadEventLoopGroup.newChild() should never return an EventExecutor, so this is incorrect.
Modifications:
Override MultithreadEventLoopGroup.newChild() so that it returns EventLoop
Result:
Correct API
Motivation:
EventExecutor.iterator() is fixed to return Iterator<EventExecutor> and there's no way to change that as long as we don't extend Iterable. However, a user should have a way to cast the returned set of executors painlessly. Currently, it is only possible with an explicit cast like (Iterator<NioEventLoop>).
Modifications:
Instead, I added a new method called 'children()' which returns an immutable collection of child executors whose method signature looks like the following:
<E extends EventExecutor> Set<E> children();
Result:
A user can now do this:
Set<NioEventLoop> loops = group.children();
for (NioEventLoop l: loops) { ... }
Unfortunately, this is not possible:
for (NioEventLoop l: group.children()) { ... }
However, it's still a gain that a user doesn't need to down-cast explicitly and to add the '@SuppressWarnings` annotation.
Motivation:
LocalEventLoopGroup and LocalEventLoop are not really special for LocalChannels. It can be used for other channel implementations as long as they don't require special handling.
Modifications:
- Add DefaultEventLoopGroup and DefaultEventLoop
- Deprecate LocalEventLoopGroup and make it extend DefaultEventLoopGroup
- Add DefaultEventLoop and remove LocalEventLoop
- Fix inspector warnings
Result:
- Better class names.
Motivation:
EventExecutor.parent() and EventLoop.parent() almost always return a constant parent executor. There's not much reason to let it implemented in subclasses.
Modifications:
- Implement AbstractEventExecutor.parent() with an additional contructor
- Add AbstractEventLoop so that subclasses extend AbstractEventLoop, which implements parent() appropriately
- Remove redundant parent() implementations in the subclasses
- Fix inspector warnings
Result:
Less duplication.
Motivation:
At the moment we use the system-wide default selector provider for this invocation of the Java virtual machine when constructing a new NIO channel, which makes using an alternative SelectorProvider practically useless.
This change allows user specify his/her preferred SelectorProvider.
Modifications:
Add SelectorProvider as a param for current `private static *Channel newSocket` method of NioSocketChannel, NioServerSocketChannel and NioDatagramChannel.
Change default constructors of NioSocketChannel, NioServerSocketChannel and NioDatagramChannel to use DEFAULT_SELECTOR_PROVIDER when calling newSocket(SelectorProvider).
Add new constructors for NioSocketChannel, NioServerSocketChannel and NioDatagramChannel which allow user specify his/her preferred SelectorProvider.
Result:
Now users can specify his/her preferred SelectorProvider when constructing an NIO channel.
Motivation:
Some operating systems like Windows 7 uses a valid globally unique EUI-64 MAC address for a virtual device (e.g. 00:00:00:00:00:00:00:E0), and because it's usually longer than the legit MAC-48 address, we should not use the length of MAC address when two MAC addresses are of the same quality. Instead, we should compare the INET address of the NICs before comparing the length of the MAC addresses.
Modification:
Compare the length of MAC addresses as a last resort.
Result:
Correct MAC address detection in Windows with IPv6 enabled.
Motivation:
When there are two MAC addresses which are good enough, we can choose the one with better IP address rather than just choosing the first appeared one.
Modification:
Replace isBetterAddress() with compareAddresses() to make it return if both addresses are in the same preference level.
Add compareAddresses() which compare InetAddresses and use it when compareAddress(byte[], byte[]) returns 0 (same preference)
Result:
More correct primary MAC address detection
Motivation:
As reported in #2331, some query operations in NetworkInterface takes much longer time than we expected. For example, specifying -Djava.net.preferIPv4Stack=true option in Window increases the execution time by more than 4 times. Some Windows systems have more than 20 network interfaces, and this problem gets bigger as the number of unused (virtual) NICs increases.
Modification:
Use NetworkInterface.getInetAddresses() wherever possible.
Before iterating over all NICs reported by NetworkInterface, filter the NICs without proper InetAddresses. This reduces the number of candidates quite a lot.
NetUtil does not query hardware address of NIC in the first place but uses InetAddress.isLoopbackAddress().
Do not call unnecessary query operations on NetworkInterface. Just get hardware address and compare.
Result:
Significantly reduced class initialization time, which should fix#2331.
Motivation:
Allow the user to create a NioServerSocketChannel from an existing ServerSocketChannel.
Modifications:
Add an extra constructor
Result:
Now the user is be able to create a NioServerSocketChannel from an existing ServerSocketChannel, like he can do with all the other Nio*Channel implemntations.
Motivation:
Ensure the user know the Channel must be closed to release resources like filehandles.
Modifications:
Add some extra javadoc.
Result:
More clear documentation
Motivation:
At the moment we use SocketChannel.open(), ServerSocketChannel.open() and DatagramSocketChannel.open(...) within the constructor of our
NIO channels. This introduces a bottleneck if you create a lot of connections as these calls delegate to SelectorProvider.provider() which
uses synchronized internal. This change removed the bottleneck.
Modifications:
Obtain a static instance of the SelectorProvider and use SelectorProvider.openSocketChannel(), SelectorProvider.openServerSocketChannel() and
SelectorProvider.openDatagramChannel(). This eliminates the bottleneck as SelectorProvider.provider() is not called on every channel creation.
Result:
Less conditions when create new channels.
Motivation:
Remove the synchronization bottleneck and so speed up things
Modifications:
Introduce a ThreadLocal cache that holds mappings between classes of ChannelHandlerAdapater implementations and the result of checking if the @Sharable annotation is present.
This way we only will need to do the real check one time and server the other calls via the cache. A ThreadLocal and WeakHashMap combo is used to implement the cache
as this way we can minimize the conditions while still be sure we not leak class instances in containers.
Result:
Less conditions during adding ChannelHandlerAdapter to the ChannelPipeline
This also does factor out some logic of ChannelOutboundBuffer. Mainly we not need nioBuffers() for many
transports and also not need to copy from heap to direct buffer. So this functionality was moved to
NioSocketChannelOutboundBuffer. Also introduce a EpollChannelOutboundBuffer which makes use of
memory addresses for all the writes to reduce GC pressure
- Allocating and deallocating a direct buffer for I/O is an expensive
operation, so we have to at least have a pool of direct buffers if the
current allocator is not pooled
- Related: #2163
- Add ResourceLeakHint to allow a user to provide a meaningful information about the leak when touching it
- DefaultChannelHandlerContext now implements ResourceLeakHint to tell where the message is going.
- Cleaner resource leak report by excluding noisy stack trace elements
- Fixes#1810
- Add a new interface ChannelId and its default implementation which generates globally unique channel ID.
- Replace AbstractChannel.hashCode with ChannelId.hashCode() and ChannelId.shortValue()
- Add variants of ByteBuf.hexDump() which accept byte[] instead of ByteBuf.
- Proposed fix for #1824
UniqueName and its subtypes do not allow getting the previously registered instance. For example, let's assume that a user is running his/her application in an OSGi container with Netty bundles and his server bundle. Whenever the server bundle is reloaded, the server will try to create a new AttributeKey instance with the same name. However, Netty bundles were not reloaded at all, so AttributeKey will complain that the name is taken already (by the previously loaded bundle.)
To fix this problem:
- Replaced UniqueName with Constant, AbstractConstant, and ConstantPool. Better name and better design.
- Sctp/Udt/RxtxChannelOption is not a ChannelOption anymore. They are just constant providers and ChannelOption is final now. It's because caching anything that's from outside of netty-transport will lead to ClassCastException on reload, because ChannelOption's constant pool will keep all option objects for reuse.
- Signal implements Constant because we can't ensure its uniqueness anymore by relying on the exception raised by UniqueName's constructor.
- Inspired by #2214 by @normanmaurer
- Call setUncancellable() before performing an outbound operation
- Add safeSetSuccess/Failure() and use them wherever
- Fixes#2060
- Ensure to return a future/promise implementation that does not fail with 'not registered to an event loop' error for registration operations
- If there is no usable event loop available, GlobalEventExecutor.INSTANCE is used as a fallback.
- Fixes#2003 properly
- Instead of using 'bundle' packaging, use 'jar' packaging. This is
more robust because some strict build tools fail to retrieve the
artifacts from a Maven repository unless their packaging is not 'jar'.
- All artifacts now contain META-INF/io.netty.version.properties, which
provides the detailed information about the build and repository.
- Removed OSGi testsuite temporarily because it gives false errors
during split package test and examination.
- Add io.netty.util.Version for easy retrieval of version information
The problem with the old way was that we always set the OP_WRITE when the buffer could not be written
until the write-spin-count was reached. This means that in some cases the channel was still be writable
but we just was not able to write out the data quick enough. For this cases we should better break out the
write loop and schedule a write to be picked up later in the EventLoop, when other tasks was executed.
The OP_WRITE will only be set if a write actual returned 0 which means there is no more room for writing data
and this we need to wait for the os to notify us.
Beside this it also helps to reduce CPU usage as nioBufferCount() is quite expensive when used on CompositeByteBuf which are
nested and contains a lot of components
This ChannelOption allows to tell the DatagramChannel implementation to be active as soon as they are registrated to their EventLoop. This can be used to make it possible to write to a not bound DatagramChannel.
The ChannelOption is marked as @deprecated as I'm looking for a better solution in master which breaks default behaviour with 4.0 branch.
This move less common method patterns to extra methods and so make the nioBuffers() method with most common pattern (backed by one ByteBuffer) small enough for inlining.
This is needed because of otherwise the JDK itself will do an extra ByteBuffer copy with it's own pool implementation. Even worth it will be done
multiple times if the ByteBuffer is always only partial written. With this change the copy is done inside of netty using it's own allocator and
only be done one time in all cases.
Introduce a new interface called MessageSizeEstimator. This can be specific per Channel (via ChannelConfig). The MessageSizeEstimator will be used to estimate for a message that should be written. The default implementation handles ByteBuf, ByteBufHolder and FileRegion. A user is free to plug-in his/her own implementation for different behaviour.