Motivation:
To guard against the case that a user enqueues a lot of empty or small buffers and so triggers an OOME, we need to also take the overhead of the ChannelOutboundBuffer / PendingWriteQueue into account when detecting if a Channel is writable or not. This is related to #5856.
Modifications:
When calculating the memory for a message that is enqueued, also add some extra bytes depending on the implementation.
Result:
Better guard against OOME.
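A minimal sketch of the accounting idea, assuming a hypothetical ENTRY_OVERHEAD constant (the real name and default value are implementation details of ChannelOutboundBuffer / PendingWriteQueue):

    // Hedged sketch: count per-entry bookkeeping overhead, not just the payload.
    final class PendingBytesSketch {
        // Rough cost of one queue entry (object header, fields, node links); illustrative value.
        private static final int ENTRY_OVERHEAD = 96;

        private long totalPendingSize;

        void incrementPendingOutboundBytes(long messageSize) {
            // Even a 0-byte message contributes ENTRY_OVERHEAD, so enqueueing
            // millions of empty buffers still pushes the channel towards unwritable.
            totalPendingSize += messageSize + ENTRY_OVERHEAD;
        }

        void decrementPendingOutboundBytes(long messageSize) {
            totalPendingSize -= messageSize + ENTRY_OVERHEAD;
        }

        long totalPendingSize() {
            return totalPendingSize;
        }
    }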
Motivation:
To make it easier to debug why notification of a promise failed we should log extra info and make it consistent.
Modifications:
- Create a new PromiseNotificationUtil that has static methods that can be used to try to notify a promise and log on failure.
- Reuse this in AbstractChannelHandlerContext, ChannelOutboundBuffer and PromiseNotifier.
Result:
Easier to debug why a promise could not be notified.
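A rough sketch of what such a utility can look like (method names and logging are illustrative, not necessarily the exact PromiseNotificationUtil API):

    import io.netty.util.concurrent.Promise;

    import java.util.logging.Level;
    import java.util.logging.Logger;

    // Hedged sketch of a "try to notify and log on failure" helper.
    final class PromiseNotificationSketch {
        private PromiseNotificationSketch() { }

        static <V> void trySuccess(Promise<V> p, V result, Logger logger) {
            if (!p.trySuccess(result)) {
                // The promise was completed elsewhere already; log enough context to find out where.
                logger.log(Level.WARNING, "Failed to mark a promise as success because it is done already: " + p);
            }
        }

        static void tryFailure(Promise<?> p, Throwable cause, Logger logger) {
            if (!p.tryFailure(cause)) {
                logger.log(Level.WARNING, "Failed to mark a promise as failure because it is done already: " + p, cause);
            }
        }
    }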
Motivation:
The API documentation in ChannelConfig states that a channel is writable,
if the number of pending bytes is below the low watermark and a
channel is not writable, if the number of pending bytes exceeds the high
watermark.
Therefore, we should use < operators instead of <= as well as > instead of >=.
Using <= and >= is also problematic, if the low watermark is equal to the high watermark,
as then a channel could be both writable and unwritable with the same number of pending
bytes (depending on whether remove() or addMessage() is called first).
The use of <= and >= was introduced in PR https://github.com/netty/netty/pull/3036, but
I don't understand why, as there doesn't seem to have been any discussion around that.
Modifications:
Use < and > operators instead of <= and >=.
Result:
High and low watermarks are treated as stated in the API docs.
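In other words, the writability update now behaves roughly like this simplified sketch (not the actual ChannelOutboundBuffer code):

    // Hedged sketch of the strict-comparison behaviour described above.
    final class WatermarkSketch {
        private final long low;   // low watermark
        private final long high;  // high watermark
        private boolean writable = true;

        WatermarkSketch(long low, long high) {
            this.low = low;
            this.high = high;
        }

        void onPendingBytesChanged(long pendingBytes) {
            if (pendingBytes > high) {          // strictly above high -> unwritable
                writable = false;
            } else if (pendingBytes < low) {    // strictly below low -> writable
                writable = true;
            }
            // If low == high and pendingBytes equals it, neither branch fires,
            // so the channel cannot flip between writable and unwritable for the
            // same number of pending bytes.
        }

        boolean isWritable() {
            return writable;
        }
    }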
Motivation:
We should merge ThrowableUtils into ThrowableUtil as this name is more consistent with the naming of utility classes in netty.
Modifications:
Merge classes.
Result:
More consistent naming
Motivation:
In case of an exception in invokeExceptionCaught(), only the original exception passed to invokeExceptionCaught() will be logged, on any log level.
In addition, AbstractChannelHandlerContext and CombinedChannelDuplexHandler log different exceptions.
Modifications:
Fix inconsistent logging code and add ability to see both stacktraces on DEBUG level.
Result:
Both handlers now log both the original exception and the one thrown from invokeExceptionCaught(). To see the full stacktrace of the exception thrown from invokeExceptionCaught(), the DEBUG log level must be enabled.
Motivation:
JCTools supports both unsafe and non-unsafe versions of its queues and runs on JDK6, which allows us to shade the library into netty-common, allowing it to stay "zero dependency".
Modifications:
- Remove the copy-pasted JCTools code and shade the library (dependencies that are shaded should be removed from the <dependencies> section of the generated POM).
- Remove usage of OneTimeTask and remove it altogether.
Result:
Less code to maintain, easier updates of JCTools, and less GC pressure as the queue implementation does not create as much garbage.
Motivation:
Recycler.recycle(...) should not be used anymore and should be replaced by Handle.recycle().
Modifications:
Mark it as deprecated and update usage.
Result:
Correctly document deprecated api.
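Usage then roughly follows this pattern (a sketch; the exact Recycler generics and Handle signature depend on the Netty version):

    import io.netty.util.Recycler;

    // Hedged sketch: recycle through the Handle instead of Recycler.recycle(...).
    final class PooledThing {
        private static final Recycler<PooledThing> RECYCLER = new Recycler<PooledThing>() {
            @Override
            protected PooledThing newObject(Handle<PooledThing> handle) {
                return new PooledThing(handle);
            }
        };

        private final Recycler.Handle<PooledThing> handle;

        private PooledThing(Recycler.Handle<PooledThing> handle) {
            this.handle = handle;
        }

        static PooledThing newInstance() {
            return RECYCLER.get();
        }

        void recycle() {
            // Preferred: let the Handle return this instance to the pool.
            handle.recycle(this);
        }
    }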
Motivation:
Revert d0943dcd30. Delaying the notification of a writability change may lead to the notification being missed. This is an ABA type of concurrency problem.
Modifications:
- Revert d0943dcd30.
Result:
channelWritabilityChanged() will be called on every change and will not be suppressed due to the ABA scenario.
Motivation:
We always triggered fireChannelWritabilityChanged() directly when the update of the pending bytes in the ChannelOutboundBuffer was made from within the EventLoop. This is problematic as it can cause re-entrance issues if the user has a custom ChannelOutboundHandler that does multiple writes from within the write(...) method and also has a handler that intercepts the channelWritabilityChanged event and triggers another write when the Channel is writable. This can also easily happen if the user just uses a MessageToMessageEncoder subclass and triggers a write from channelWritabilityChanged().
Besides this, we also triggered fireChannelWritabilityChanged() too often when a user did a write from outside the EventLoop. In this case we increased the pending bytes of the outbound buffer before scheduling the actual write and decreased them again before the write took place. Both of these may trigger a fireChannelWritabilityChanged() event, which may then be re-triggered once the actual write completes in the ChannelOutboundBuffer.
The third gotcha was that a user could get multiple events even if the writability of the channel did not change.
Modification:
- Always invoke the fireChannelWritabilityChanged() later on the EventLoop.
- Only trigger the fireChannelWritabilityChanged() if the channel is still active and if the writability of the channel changed. No need to cause events that were already triggered without a real writability change.
- When write(...) is called from outside the EventLoop, we only increase the pending bytes in the outbound buffer (so that Channel.isWritable() is updated directly) but do not trigger a fireChannelWritabilityChanged(). The fireChannelWritabilityChanged() is then triggered once the task is picked up by the EventLoop as usual.
Result:
No more re-entrance possible because of writes from within channelWritabilityChanged(...) method and no events without a real writability change.
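For reference, this is the kind of user handler that could previously cause the re-entrance (an illustrative sketch, not code from the tree):

    import io.netty.buffer.Unpooled;
    import io.netty.channel.ChannelHandlerContext;
    import io.netty.channel.ChannelInboundHandlerAdapter;

    // Hedged sketch: a handler that writes from channelWritabilityChanged().
    // With the old behaviour this write could fire synchronously from inside
    // the pending-bytes update of an earlier write.
    final class WriteOnWritableHandler extends ChannelInboundHandlerAdapter {
        @Override
        public void channelWritabilityChanged(ChannelHandlerContext ctx) {
            if (ctx.channel().isWritable()) {
                ctx.writeAndFlush(Unpooled.wrappedBuffer(new byte[] { 1 }));
            }
            ctx.fireChannelWritabilityChanged();
        }
    }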
Motivation:
When a promise that was added to the ChannelOutboundBuffer has already been notified and we then try to notify it later on, we only see a warning that it was notified before. This is often not very useful, as we have no idea where it was notified at all. We can do better in case it was failed before (which is most often the case) and also log the cause that was used to fail it.
Modifications:
Add the cause that was used to notify the promise when we fail to notify it as part of the ChannelOutboundBuffer.
Result:
Easier to debug user errors.
Motivation:
We should use OneTimeTask where possible to reduce object creation.
Modifications:
Replace Runnable with OneTimeTask
Result:
Less object creation
Motivation:
To avoid buffering too much it would be useful to get an estimate of how many bytes can be written to a Channel before it becomes unwritable.
Modifications:
- Update the Channel interface to support 2 new methods: one that gives how many bytes can be written before the channel becomes unwritable, and one that gives how many bytes must be drained before it becomes writable.
- Update the AbstractChannel implementation to delegate to the ChannelOutboundBuffer.
Result:
The Channel interface supports 2 new methods which provide more visibility into writability.
Motivation:
There are currently no accessors which provide visibility into how many bytes must be written in order for a writability change to occur. This feature would be useful for codecs which intend to control how many bytes are queued at any given time.
Modifications:
- add bytesBeforeUnwritable(), which gives the number of bytes before the buffer (and associated channel) transitions to not writable
- add bytesBeforeWritable(), which gives the number of bytes that must be drained from the queue until the channel becomes writable.
Result:
More visibility into writability for the ChannelOutboundBuffer.
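A possible way to use the new accessors from application code (a sketch; the queue handling is simplified):

    import io.netty.buffer.ByteBuf;
    import io.netty.channel.Channel;

    // Hedged sketch: size a write batch with bytesBeforeUnwritable() so the
    // channel stays writable instead of buffering arbitrarily.
    final class WriteBudget {
        static void writeUpToBudget(Channel ch, Iterable<ByteBuf> queued) {
            long budget = ch.bytesBeforeUnwritable();
            for (ByteBuf buf : queued) {
                int size = buf.readableBytes();
                if (size > budget) {
                    break; // the next write would cross the high watermark, stop here
                }
                budget -= size;
                ch.write(buf);
            }
            ch.flush();
        }
    }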
Motivation:
The ByteBuffer[] that we keep in the ThreadLocal is never nulled out, which can lead to ByteBuffer instances sitting there forever.
This is an even bigger problem if nioBuffer() of a ByteBuf returns a new ByteBuffer that cannot be freed by ByteBuf.release().
Modifications:
Null out ByteBuffer array after processing.
Result:
No more dangling references once we are done.
Motivation:
When trying to write more than Integer.MAX_VALUE / SSIZE_MAX via writev(...) the OS may return EINVAL, depending on the kernel or the actual OS (BSD / OSX always return EINVAL). This will trigger an IOException.
Modifications:
Never try to write more than Integer.MAX_VALUE / SSIZE_MAX when using writev.
Result:
No more IOException when writing more data than Integer.MAX_VALUE / SSIZE_MAX via writev.
Motivation:
We should not trigger channelWritabilityChanged while failing messages when we are about to close the Channel, as otherwise the user may try to write again even though the Channel is about to be closed.
Modifications:
Add a new boolean param to ChannelOutboundBuffer.failFlushed(...) which allows specifying whether we should notify or not.
Result:
channelWritabilityChanged is not triggered anymore if we close the Channel because of an IOException during a write.
Related: #3212
Motivation:
When SslHandler and ChunkedWriteHandler exists in a pipeline together,
it is possible that ChunkedWriteHandler.channelWritabilityChanged()
invokes SslHandler.flush() and vice versa. Because they can feed each
other (i.e. ChunkedWriteHandler.channelWritabilityChanged() ->
SslHandler.flush() -> ChunkedWriteHandler.channelWritabilityChanged() ->
..), they can fall into an inconsistent state due to reentrance (e.g.
bad MAC record at the remote peer due to incorrect ordering.)
Modifications:
- Trigger channelWritabilityChanged() using EventLoop.execute() when
there's a chance where channelWritabilityChanged() can cause a
reentrance issue
- Fix test failures caused by the modification
Result:
Fix the handler reentrance issues related with a
channelWritabilityChanged() event
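The deferral pattern boils down to something like this simplified sketch (not the exact code in the tree):

    import io.netty.channel.ChannelHandlerContext;

    // Hedged sketch: instead of firing the event inline (which can re-enter the
    // handler that triggered it), schedule it on the channel's EventLoop.
    final class WritabilityEventSketch {
        static void fireLater(final ChannelHandlerContext ctx) {
            ctx.channel().eventLoop().execute(new Runnable() {
                @Override
                public void run() {
                    // Runs after the current call stack has unwound, so
                    // SslHandler.flush() and ChunkedWriteHandler cannot feed
                    // each other recursively.
                    ctx.fireChannelWritabilityChanged();
                }
            });
        }
    }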
Related: #2945
Motivation:
Some special handlers such as TrafficShapingHandler need to override the
writability of a Channel to throttle the outbound traffic.
Modifications:
Add a new indexed property called 'user-defined writability flag' to
ChannelOutboundBuffer so that a handler can override the writability of
a Channel easily.
Result:
A handler can override the writability of a Channel using an unsafe API.
For example:
Channel ch = ...;
ch.unsafe().outboundBuffer().setUserDefinedWritability(1, false);
When a ChannelOutboundBuffer contains ByteBufs followed by a FileRegion,
removeBytes() will fail with a ClassCastException. It should break the
loop instead.
Motivation:
At the moment we expand the ByteBuffer[] when we have more than 1024 ByteBuffers to write and replace the stored instance in its FastThreadLocal. This is not needed and may even harm performance on Linux, as IOV_MAX is 1024 and so this may cause the JVM to do an array copy.
Modifications:
Just exit the nioBuffers() method if we cannot fit more ByteBuffers into the array. This way we will pick them up on the next call.
Result:
Remove the unnecessary array copy and simplify the code.
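Conceptually the loop now just stops when the array is full (an illustrative sketch; field and method names are not the real ones):

    import java.nio.ByteBuffer;
    import java.util.Queue;

    // Hedged sketch of the "stop instead of expand" behaviour: once the cached
    // array is full we return what we have; the remaining buffers are picked
    // up by the next nioBuffers() call.
    final class NioBuffersSketch {
        static int fill(Queue<ByteBuffer> pending, ByteBuffer[] array) {
            int count = 0;
            for (ByteBuffer buf : pending) {
                if (count == array.length) {
                    break; // no expansion past the array size (IOV_MAX on Linux is 1024)
                }
                array[count++] = buf;
            }
            return count;
        }
    }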
Motivation:
We cache the ByteBuffers in ChannelOutboundBuffer.nioBuffers() for the Entries in the ChannelOutboundBuffer to reduce some overhead. The problem is that this can lead to data corruption if an incomplete write happens and the next time we try to do a non-gathering write.
To fix this we should remove the caching, which does not help a lot anyway and just makes the code buggy.
Modifications:
Remove the caching of ByteBuffers.
Result:
No more data-corruption.
Motivation:
Sometimes a ChannelHandler needs to queue writes up to some point and then process them. We currently have no data structure for this, so the user will use a Queue or something similar. The problem is that with this, Channel.isWritable() will not work as expected and so the user risks writing too fast. That's exactly what happened in our SslHandler. For this purpose we need to add a special data structure which also takes care of updating the Channel, so that Channel.isWritable() works as expected.
Modifications:
- Add PendingWriteQueue which can be used for this purpose
- Make use of PendingWriteQueue in SslHandler
Result:
It is now possible to queue writes in a ChannelHandler and still have Channel.isWritable() work as expected. This also fixes #2752.
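A handler can then queue writes roughly like this (a sketch; error handling and failing of pending promises are omitted):

    import io.netty.channel.ChannelDuplexHandler;
    import io.netty.channel.ChannelHandlerContext;
    import io.netty.channel.ChannelPromise;
    import io.netty.channel.PendingWriteQueue;

    // Hedged sketch: buffer writes in a PendingWriteQueue so that
    // Channel.isWritable() still reflects the queued data.
    final class QueueingHandler extends ChannelDuplexHandler {
        private PendingWriteQueue pendingWrites;

        @Override
        public void handlerAdded(ChannelHandlerContext ctx) {
            pendingWrites = new PendingWriteQueue(ctx);
        }

        @Override
        public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise) {
            // Queue instead of writing through; the pending bytes are tracked for us.
            pendingWrites.add(msg, promise);
        }

        @Override
        public void flush(ChannelHandlerContext ctx) {
            // Drain everything that was queued and let the writes proceed.
            pendingWrites.removeAndWriteAll();
            ctx.flush();
        }
    }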
Motivation:
We did various changes related to the ChannelOutboundBuffer in the 4.0 branch. This commit ports all of them over and so makes sure our branches are in sync in terms of these changes.
Related to [#2734], [#2709], [#2729], [#2710] and [#2693].
Modification:
Port all changes that were done on the ChannelOutboundBuffer.
This includes the port of the following commits:
- 73dfd7c01b
- 997d8c32d2
- e282e504f1
- 5e5d1a58fd
- 8ee3575e72
- d6f0d12a86
- 16e50765d1
- 3f3e66c31a
Result:
- Less memory usage by ChannelOutboundBuffer
- Same code as in 4.0 branch
- Make it possible to use ChannelOutboundBuffer with Channel implementations that do not extend AbstractChannel
Motivation:
While benchmarking the native transport with gathering writes I noticed that it is quite slow. This is due to the fact that we need to do a lot of array copies to get the buffers into the iov array.
Modification:
Introduce a new class called IovArray which allows filling buffers directly into an iov array that can be passed over to JNI without any array copies. This gives a nice optimization in terms of speed when doing gathering writes.
Result:
Big performance improvement when doing gathering writes. See the included benchmark...
Before:
[nmaurer@xxx]~% wrk/wrk -H 'Host: localhost' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Connection: keep-alive' -d 120 -c 256 -t 16 --pipeline 256 http://xxx:8080/plaintext
Running 2m test @ http://xxx:8080/plaintext
16 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 23.44ms 16.37ms 259.57ms 91.77%
Req/Sec 181.99k 31.69k 304.60k 78.12%
346544071 requests in 2.00m, 46.48GB read
Requests/sec: 2887885.09
Transfer/sec: 396.59MB
With this change:
[nmaurer@xxx]~% wrk/wrk -H 'Host: localhost' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Connection: keep-alive' -d 120 -c 256 -t 16 --pipeline 256 http://xxx:8080/plaintext
Running 2m test @ http://xxx:8080/plaintext
16 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 21.93ms 16.33ms 305.73ms 92.34%
Req/Sec 194.56k 33.75k 309.33k 77.04%
369617503 requests in 2.00m, 49.57GB read
Requests/sec: 3080169.65
Transfer/sec: 423.00MB
Motivation:
ChannelOutboundBuffer is basically a circular array queue of its entry
objects. Once an entry is created in the array, it is never nulled out
to reduce the allocation cost.
However, because it is a circular queue, the array almost always ends up
with as many entry instances as the size of the array, regardless of the
number of pending writes.
In the worst case, a channel might have only 1 pending write at most
while creating 32 entry objects, where 32 is the initial capacity of the
array.
Modifications:
- Reduce the initial capacity of the circular array queue to 4.
- Make the initial capacity of the circular array queue configurable
Result:
We spend 4 times less memory for entry objects under certain
circumstances.
Motivation:
The code in ChannelOutboundBuffer can be simplified by using AtomicLongFieldUpdater.addAndGet(...)
Modification:
Replace our manual looping with AtomicLongFieldUpdater.addAndGet(...)
Result:
Cleaner code
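For illustration, the simplification is essentially this (a sketch with a hypothetical field, not the actual ChannelOutboundBuffer member):

    import java.util.concurrent.atomic.AtomicLongFieldUpdater;

    // Hedged sketch: addAndGet(...) replaces the manual compareAndSet loop.
    final class PendingSizeSketch {
        private static final AtomicLongFieldUpdater<PendingSizeSketch> UPDATER =
                AtomicLongFieldUpdater.newUpdater(PendingSizeSketch.class, "totalPendingSize");

        private volatile long totalPendingSize;

        long addPendingBytes(long size) {
            // Before: loop { read current value } until compareAndSet(current, current + size) succeeds.
            // After: a single atomic call.
            return UPDATER.addAndGet(this, size);
        }
    }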
Motivation:
If ChannelOutboundBuffer.addFlush() is called multiple times while flushed != unflushed, it will still loop through all entries that are not flushed yet, even though this is not needed anymore as these were already marked uncancellable before.
Modifications:
Check if new messages were added since addFlush() was called, and only if this was the case loop through all entries and try to mark them uncancellable.
Result:
Less overhead when ChannelOutboundBuffer.addFlush() is called multiple times without new messages being added.
This also factors out some logic of ChannelOutboundBuffer. Mainly, we do not need nioBuffers() for many transports and also do not need to copy from heap to direct buffers. So this functionality was moved to NioSocketChannelOutboundBuffer. Also introduce an EpollChannelOutboundBuffer which makes use of memory addresses for all writes to reduce GC pressure.
- Allocating and deallocating a direct buffer for I/O is an expensive
operation, so we have to at least have a pool of direct buffers if the
current allocator is not pooled
- Inspired by #2214 by @normanmaurer
- Call setUncancellable() before performing an outbound operation
- Add safeSetSuccess/Failure() and use them wherever possible
Besides this, it also helps to reduce CPU usage, as nioBufferCount() is quite expensive when used on CompositeByteBufs which are nested and contain a lot of components.
This moves less common method patterns to extra methods and so makes the nioBuffers() method for the most common pattern (backed by one ByteBuffer) small enough for inlining.
This is needed because otherwise the JDK itself will do an extra ByteBuffer copy with its own pool implementation. Even worse, this will be done multiple times if the ByteBuffer is only ever partially written. With this change the copy is done inside of netty using its own allocator and is only done once in all cases.