Motivation:
On linux with glibc >= 2.14 it is possible to send multiple DatagramPackets with one syscall. This can be a huge performance win and so we should support it in our native transport.
Modification:
- Add support for sendmmsg by reuse IovArray
- Factor out ThreadLocal support of IovArray to IovArrayThreadLocal for better separation as we use IovArray also without ThreadLocal in NativeDatagramPacketArray now
- Introduce NativeDatagramPacketArray which is used for sendmmsg(...)
- Implement sendmmsg(...) via jni
- Expand DatagramUnicastTest to test also sendmmsg(...)
Result:
Netty now automatically use sendmmsg(...) if it is supported and we have more then 1 DatagramPacket in the ChannelOutboundBuffer and flush() is called.
Motivation:
On linux it is possible to use the sendMsg(...) system call to write multiple buffers with one system call when using datagram/udp.
Modifications:
- Implement the needed changes and make use of sendMsg(...) if possible for max performance
- Add tests that test sending datagram packets with all kind of different ByteBuf implementations.
Result:
Performance improvement when using CompoisteByteBuf and EpollDatagramChannel.
Motivation:
InetAddress.getByName(...) uses exceptions for control flow when try to parse IPv4-mapped-on-IPv6 addresses. This is quite expensive.
Modifications:
Detect IPv4-mapped-on-IPv6 addresses in the JNI level and convert to IPv4 addresses before pass to InetAddress.getByName(...) (via InetSocketAddress constructor).
Result:
Eliminate performance problem causes by exception creation when parsing IPv4-mapped-on-IPv6 addresses.
Motivation:
We received a bug-report that the ByteBuf.refCnt() does sometimes not show the correct value when release() and refCnt() is called from different Threads.
Modifications:
Add test-case which shows that all is working like expected
Result:
Test-case added which shows everything is ok.
Motivation:
This fixes bug #2848 which caused Recycler to become unbounded and cache infinite number of objects with maxCapacity that's not a power of two. This can result in general sluggishness of the application and OutOfMemoryError.
Modifications:
The test for maxCapacity has been moved out of test to check if the buffer has filled. The buffer is now also capped at maxCapacity and cannot grow over it as it jumps from one power of two to the other.
Additionally, a unit test was added to verify maxCapacity is honored even when it's not a power of two.
Result:
With these changes the user is able to use a custom maxCapacity number and not have it ignored. The unit test assures this bug will not repeat itself.
Motivation:
In EpollSocketchannel.doWriteFileRegion(...) we need to make sure we write until sendFile(...) returns either 0 or all is written. Otherwise we may not get notified once the Channel is writable again.
This is the case as we use EPOLL_ET.
Modifications:
Always write until either sendFile returns 0 or all is written.
Result:
No more hangs when writing DefaultFileRegion can happen.
Related issue: #2821
Motivation:
There's no way for a user to change the default ZlibEncoder
implementation.
It is already possible to change the default ZlibDecoder implementation.
Modification:
Add a new system property 'io.netty.noJdkZlibEncoder'.
Result:
A user can disable JDK ZlibEncoder, just like he or she can disable JDK
ZlibDecoder.
Motivation:
We have some duplicated code that can be reused.
Modifications:
Create package private class called CodecUtil that now contains the shared code / helper method.
Result:
Less code-duplication
Motivation:
ByteToMessageCodec miss to check for @Sharable annotation in one of its constructors.
Modifications:
Ensure we call checkForSharableAnnotation in all constructors.
Result:
After your change, what will change.
Motivation:
Currently we do more memory copies then needed.
Modification:
- Directly use heap buffers to reduce memory copy
- Correctly release buffers to fix buffer leak
Result:
Less memory copies and no leaks
Motivation:
There were no way to efficient write a CompositeByteBuf as we always did a memory copy to a direct buffer in this case. This is not needed as we can just write a CompositeByteBuf as long as all the components are buffers with a memory address.
Modifications:
- Write CompositeByteBuf which contains only direct buffers without memory copy
- Also handle CompositeByteBuf that have more components then 1024.
Result:
More efficient writing of CompositeByteBuf.
Motivation:
There is not need todo redunant reads of head in peakNode as we can just spin on next() until it becomes visible.
Modifications:
Remove redundant reads of head in peakNode. This is based on @nitsanw's patch for akka.
See https://github.com/akka/akka/pull/15596
Result:
Less volatile access.
Motivation:
We used the wrong EventExecutor to notify for bind failures if a late registration was done.
Modifications:
Use the correct EventExecutor to notify and only use the GlobelEventExecutor if the registration fails itself.
Result:
The correct Thread will do the notification.
Motivation:
The example mis handle two elements:
1) Last message is a LastHttpContent and is not taken into account by
the server handler
2) The client makes a sync on last write (chunked) but there is no flush
before, therefore the sync is waiting forever.
Modifications:
1) Take into account the message LastHttpContent in simple Get.
2) Removes sync but add flush for each post and multipost parts
Results:
Example is no more blocked after get test.
Should be done also in 4.0 and Master (similar changes)
Motivation:
Recently we changed the default value of SOMAXCONN that is used when we can not determine it by reading /proc/sys/net/core/somaxconn. While doing this we missed to update the javadocs to reflect the new default value that is used.
Modifications:
List correct default value in the javadocs of SOMAXCONN.
Result:
Correct javadocs.
Motivation:
The test procedure is unstable when testing quick time (factor less or equal to 1). Changing to default 10ms in this case will force time to be correct and time to be checked only when factor is >= 2.
Modifications:
When factor is <= 1, minimalWaitBetween is 10ms
Result:
Hoping this version is finally stable.
Motivation:
It seems that in certain conditions, the write back from the server is so quick that the handler has no time to compute traffic shaping. So 10ms of wait before acknowledging is added in server side.
Modifications:
Add 10ms waiting before server ackonwledge the client.
Result:
The timing is now suppsed to be stable.
Motivation:
The test procedure is unstable due to not enough precise timestamping
during the check.
Modifications:
Reducing the test cases and cibling "stable" test ("timestamp-able")
bring more stability to the tests.
Result:
Tests for TrafficShapingHandler seem more stable (whatever using JVM 6,
7 or 8).
When a ChannelOutboundBuffer contains ByteBufs followed by a FileRegion,
removeBytes() will fail with a ClassCastException. It should break the
loop instead.
f31c630c8c was causing
SocketGatheringWriteTest to fail because it does not take the case where
an empty buffer exists in a gathering write.
When there is an empty buffer in a gathering write, the number of
buffers returned by ChannelOutboundBuffer.nioBuffer() and the actual
number of write attemps can differ.
To remove the write requests correctly, a byte transport must use
ChannelOutboundBuffer.removeBytes()
Motivation:
Because of an incorrect logic in teh EmbeddedChannel constructor it is not possible to use EmbeddedChannel with a ChannelInitializer as constructor argument. This is because it adds the internal LastInboundHandler to its ChannelPipeline before it register itself to the EventLoop.
Modifications:
First register self to EventLoop before add LastInboundHandler to the ChannelPipeline.
Result:
It's now possible to use EmbeddedChannel with ChannelInitializer.
Motivation:
Due a regression NioSocketChannel.doWrite(...) will throw a ClassCastException if you do something like:
channel.write(bytebuf);
channel.write(fileregion);
channel.flush();
Modifications:
Correctly handle writing of different message types by using the correct message count while loop over them.
Result:
No more ClassCastException
Related issue: #2407
Motivation:
The current fallback SOMAXCONN value is 3072. It is way too large
comparing to the default SOMAXCONN value of popular OSes.
Modifications:
Decrease the fallback SOMAXCONN value to 128 or 200 depending on the
current OS
Result:
Saner fallback value
Motivation:
The _0XFF_0X00 buffer is not duplicated and empty after the first usage preventing the connection close to happen on subsequent close frames.
Modifications:
Correctly duplicate the buffer.
Result:
Multiple CloseWebSocketFrames are handled correctly.
Motivation:
ByteToMessageDecoder and ReplayingDecoder have incorrect javadocs in some places.
Modifications:
Fix incorrect javadocs for both classes.
Result:
Correct javadocs for both classes
Related issue: #2764
Motivation:
EpollSocketChannel.writeFileRegion() does not handle the case where the
position of a FileRegion is non-zero properly.
Modifications:
- Improve SocketFileRegionTest so that it tests the cases where the file
transfer begins from the middle of the file
- Add another jlong parameter named 'base_off' so that we can take the
position of a FileRegion into account
Result:
Improved test passes. Corruption is gone.
Related issue: #2508
Motivation:
The '<exec/>' task takes unnecessarily long time due to a known issue:
- https://issues.apache.org/bugzilla/show_bug.cgi?id=54128
Modifications:
- Reduce the number of '<exec/>' tasks for faster build
- Use '<propertyregex/>' to extract the output
Result:
Slightly faster build
Related issue: #2028
Motivation:
Some copiedBuffer() methods in Unpooled allocated a direct buffer. An
allocation of a direct buffer is an expensive operation, and thus should
be avoided for unpooled buffers.
Modifications:
- Use heap buffers in all copiedBuffer() methods
Result:
Unpooled.copiedBuffers() are less expensive now.
Motivation:
The previous fix did disable the caching of ByteBuffers completely which can cause performance regressions. This fix makes sure we use nioBuffers() for all writes in NioSocketChannel and so prevent data-corruptions. This is still kind of a workaround which will be replaced by a more fundamental fix later.
Modifications:
- Revert 4059c9f354
- Use nioBuffers() for all writes to prevent data-corruption
Result:
No more data-corruption but still retain the original speed.
Motivation:
At the moment we expand the ByteBuffer[] when we have more then 1024 ByteBuffer to write and replace the stored instance in its FastThreadLocal. This is not needed and may even harm performance on linux as IOV_MAX is 1024 and so this may cause the JVM to do an array copy.
Modifications:
Just exit the nioBuffers() method if we can not fit more ByteBuffer in the array. This way we will pick them up on the next call.
Result:
Remove uncessary array copy and simplify the code.
Motivation:
We cache the ByteBuffers in ChannelOutboundBuffer.nioBuffers() for the Entries in the ChannelOutboundBuffer to reduce some overhead. The problem is this can lead to data-corruption if an incomplete write happens and next time we try to do a non-gathering write.
To fix this we should remove the caching which does not help a lot anyway and just make the code buggy.
Modifications:
Remove the caching of ByteBuffers.
Result:
No more data-corruption.
Motivation:
Currently Traffic Shaping is using 1 timer only and could lead to
"partial" wrong bandwidth computation when "short" time occurs between
adding used bytes and when the TrafficCounter updates itself and finally
when the traffic is computed.
Indeed, the TrafficCounter is updated every x delay and it is at the
same time saved into "lastXxxxBytes" and set to 0. Therefore, when one
request the counter, it first updates the TrafficCounter with the added
used bytes. If this value is set just before the TrafficCounter is
updated, then the bandwidth computation will use the TrafficCounter with
a "0" value (this value being reset once the delay occurs). Therefore,
the traffic shaping computation is wrong in rare cases.
Secondly the traffic shapping should avoid if possible the "Timeout"
effect by not stopping reading or writing more than a maxTime, this
maxTime being less than the TimeOut limit.
Thirdly the traffic shapping in read had an issue since the readOp was
not set but should, turning in no read blocking from socket point of
view. (see #2696)
Take into account setAutoRead(boolean) setting directly
by the user in the program external to this handler.
Modifications:
The TrafficCounter has 2 new methods that compute the time to wait
according to read or write) using in priority the currentXxxxBytes (as
before), but could used (if current is at 0) the lastXxxxxBytes, and
therefore having more chance to take into account the real traffic.
Moreover the Handler could change the default "max time to wait", which
is by default set to half of "standard" Time Out (30s:2 = 15s).
Finally we add the setAutoRead(boolean) accordingly to the situation, as
proposed in #2696 (the original pull request is in error for unknown
reason so this merge).
Result:
The Traffic Shaping is better take into account (no 0 value when it
shouldn't) and it tries to not block traffic more than Time Out event.
Moreover the read is really stopped from socket point of view.
This version is similar to #2388 and #2450.
This version is for V4.0, and includes the #2696 pull request to ease
the merge process.
The test minimizes time check by reducing to 66ms steps (50s total).
Motivation:
Sometimes ChannelHandler need to queue writes to some point and then process these. We currently have no datastructure for this so the user will use an Queue or something like this. The problem is with this Channel.isWritable() will not work as expected and so the user risk to write to fast. That's exactly what happened in our SslHandler. For this purpose we need to add a special datastructure which will also take care of update the Channel and so be sure that Channel.isWritable() works as expected.
Modifications:
- Add PendingWriteQueue which can be used for this purpose
- Make use of PendingWriteQueue in SslHandler
Result:
It is now possible to queue writes in a ChannelHandler and still have Channel.isWritable() working as expected. This also fixes#2752.
In Netty 3, downstream writes of SPDY data frames and upstream reads of
SPDY window udpate frames occur on different threads.
When receiving a window update frame, we synchronize on a java object
(SpdySessionHandler::flowControlLock) while sending any pending writes
that are now able to complete.
When writing a data frame, we check the send window size to see if we
are allowed to write it to the socket, or if we have to enqueue it as a
pending write. To prevent races with the window update frame, this is
also synchronized on the same SpdySessionHandler::flowControlLock.
In Netty 4, upstream and downstream operations on any given channel now
occur on the same thread. Since java locks are re-entrant, this now
allows downstream writes to occur while processing window update frames.
In particular, when we receive a window update frame that unblocks a
pending write, this write completes which triggers an event notification
on the response, which in turn triggers a write of a data frame. Since
this is on the same thread it re-enters the lock and modifies the send
window. When the write completes, we continue processing pending writes
without knowledge that the window size has been decremented.
Motivation:
The calculation of the max wait time for HashedWheelTimerTest.testExecutionOnTime() was wrong and so the test sometimes failed.
Modifications:
Fix the max wait time.
Result:
No more test-failures
Related issue: #2743
Motivation:
When there are more than one stream with the same priority, the set
returned by SpdySession.getActiveStream() will not include all of them,
because it uses TreeSet and only compares the priority of streams. If
two different streams have the same priority, one of them will be
discarded by TreeSet.
Modification:
- Rename getActiveStreams() to activeStreams()
- Replace PriorityComparator with StreamComparator
Result:
Two different streams with the same priority are compared correctly.
Motivation:
We forgot to do a null check on the cause parameter of
ChannelFuture.setFailure(cause)
Modifications:
Add a null check
Result:
Fixed issue: #2728
Motivation:
If the requests contains uri parameters but not path the HttpRequestEncoder does produce an invalid uri while try to add the missing path.
Modifications:
Correctly handle the case of uri with paramaters but no path.
Result:
HttpRequestEncoder produce correct uri in all cases.
Motivation:
While trying to merge our ChannelOutboundBuffer changes we've made last
week, I realized that we have quite a bit of conflicting changes at 4.1
and master. It was primarily because we added
ChannelOutboundBuffer.beforeAdd() and moved some logic there, such as
direct buffer conversion.
However, this is not possible with the changes we've made for 4.0. We
made ChannelOutboundBuffer final for example.
Maintaining multiple branch is already getting painful and having
different core will make it even worse, so I think we should keep the
differences between 4.0 and other branches minimal.
Modifications:
- Move ChannelOutboundBuffer.safeRelease() to ReferenceCountUtil
- Add ByteBufUtil.threadLocalBuffer()
- Backported from ThreadLocalPooledDirectByteBuf
- Make most methods in AbstractUnsafe final
- Add AbstractChannel.filterOutboundMessage() so that a transport can
convert a message to another (e.g. heap -> off-heap), and also
reject unsupported messages
- Move all direct buffer conversions to filterOutboundMessage()
- Move all type checks to filterOutboundMessage()
- Move AbstractChannel.checkEOF() to OioByteStreamChannel, because it's
the only place it is used at all
- Remove ChannelOutboundBuffer.current(Object), because it's not used
anymore
- Add protected direct buffer conversion methods to AbstractNioChannel
and AbstractEpollChannel so that they can be used by their subtypes
- Update all transport implementations according to the changes above
Result:
- The missing extension point in 4.0 has been added.
- AbstractChannel.filterOutboundMessage()
- Thanks to the new extension point, we moved all transport-specific
logic from ChannelOutboundBuffer to each transport implementation
- We can copy most of the transport implementations in 4.0 to 4.1 and
master now, so that we have much less merge conflict when we modify
the core.
Related issue: #2733
Motivation:
Unlike OpenSsl, Epoll lacks a couple useful availability checker
methods:
- ensureAvailability()
- unavailabilityCause()
Modifications:
Add missing methods
Result:
More ways to check the availability and to get the cause of
unavailability programatically.