Motivation:
We introduced a PoolThreadCache which is used in our PooledByteBufAllocator to reduce the synchronization overhead on PoolArenas when allocate / deallocate PooledByteBuf instances. This cache is used for both the allocation path and deallocation path by:
- Look for cached memory in the PoolThreadCache for the Thread that tries to allocate a new PooledByteBuf and if one is found return it.
- Add the memory that is used by a PooledByteBuf to the PoolThreadCache of the Thread that release the PooledByteBuf
This works out very well when all allocation / deallocation is done in the EventLoop as the EventLoop will be used for read and write. On the otherside this can lead to surprising side-effects if the user allocate from outside the EventLoop and and pass the ByteBuf over for writing. The problem here is that the memory will be added to the PoolThreadCache that did the actual write on the underlying transport and not on the Thread that previously allocated the buffer.
Modifications:
Don't cache if different Threads are used for allocating/deallocating
Result:
Less confusing behavior for users that allocate PooledByteBufs from outside the EventLoop.
Motivation:
When MemoryRegionCache.trim() is called, some unused cache entries will be freed (started from head). However, in MeoryRegionCache.trim() the head is not updated, which make entry list's head point to an entry whose chunk is null now and following allocate of MeoryRegionCache will return false immediately.
In other word, cache is no longer usable once trim happen.
Modifications:
Update head to correct idx after free entries in trim().
Result:
MemoryRegionCache behaves correctly even after calling trim().
Motivation:
handlerAdded and handlerRemoved were overriden but super was never
called, while it should.
Also add one missing information in the toString method.
Modifications:
Add the super corresponding call, and add checkInterval to the
toString() method
Result;
super method calls are correctly passed to the super implementation
part.
Motivation:
Sometimes it is useful to be able to access the uri that was used to initialize the QueryStringDecoder.
Modifications:
Add method which allows to retrieve the uri.
Result:
Allow to retrieve the uri that was used to create the QueryStringDecoder.
Motivation:
When constructing a FingerprintTrustManagerFactory from an Iterable of Strings, the fingerprints were correctly parsed but never added to the result array. The constructed FingerprintTrustManagerFactory consequently fails to validate any certificate.
Modifications:
I added a line to add each converted SHA-1 certificate fingerprint to the result array which then gets passed on to the next constructor.
Result:
Certificate fingerprints passed to the constructor are now correctly added to the array of valid fingerprints. The resulting FingerprintTrustManagerFactory object correctly validates certificates against the list of specified fingerprints.
Motiviation:
If sendmmsg is already defined then the native epoll module failed to build because of conflicting definitions.
The mmsghdr type was also redefined on systems that already supported this structure.
Modifications:
Provide a way so that systems which already define sendmmsg and mmsghdr can build
Provide a way so that systems which don't define sendmmsg and mmsghdr can build
Result:
The native EPOLL module can build in more environments
Motivation:
ExtensionRegistry is a subclass of ExtensionRegistryLite. The ProtobufDecoder
doesn't use the registry directly, it simply passes it through to the Protobuf
API. The Protobuf calls in question are themselves written in terms
ExtensionRegistryLite not ExtensionRegistry.
Modifications:
Require ExtensionRegistryLite instead of ExtensionRegistry in ProtobufDecoder.
Result:
Consumers can use ExtensionRegistryLite with ProtobufDecoder.
Motiviation:
The HTTP content decoder's cleanup method is not cleaning up the decoder correctly.
The cleanup method is currently doing a readOutbound on the EmbeddedChannel but
for decoding the call should be readInbound.
Modifications:
-Change readOutbound to readInbound in the cleanup method
Result:
The cleanup method should be correctly releaseing unused resources
Motivation:
In linux it is possible to write more then one buffer withone syscall when sending datagram messages.
Modifications:
Not copy CompositeByteBuf if it only contains direct buffers.
Result:
More performance due less overhead for copy.
Motivation:
Due incorrect usage of CompositeByteBuf a buffer leak was introduced.
Modifications:
Correctly handle tests with CompositeByteBuf.
Result:
No more buffer leaks
Motivation:
On linux with glibc >= 2.14 it is possible to send multiple DatagramPackets with one syscall. This can be a huge performance win and so we should support it in our native transport.
Modification:
- Add support for sendmmsg by reuse IovArray
- Factor out ThreadLocal support of IovArray to IovArrayThreadLocal for better separation as we use IovArray also without ThreadLocal in NativeDatagramPacketArray now
- Introduce NativeDatagramPacketArray which is used for sendmmsg(...)
- Implement sendmmsg(...) via jni
- Expand DatagramUnicastTest to test also sendmmsg(...)
Result:
Netty now automatically use sendmmsg(...) if it is supported and we have more then 1 DatagramPacket in the ChannelOutboundBuffer and flush() is called.
Motivation:
On linux it is possible to use the sendMsg(...) system call to write multiple buffers with one system call when using datagram/udp.
Modifications:
- Implement the needed changes and make use of sendMsg(...) if possible for max performance
- Add tests that test sending datagram packets with all kind of different ByteBuf implementations.
Result:
Performance improvement when using CompoisteByteBuf and EpollDatagramChannel.
Motivation:
InetAddress.getByName(...) uses exceptions for control flow when try to parse IPv4-mapped-on-IPv6 addresses. This is quite expensive.
Modifications:
Detect IPv4-mapped-on-IPv6 addresses in the JNI level and convert to IPv4 addresses before pass to InetAddress.getByName(...) (via InetSocketAddress constructor).
Result:
Eliminate performance problem causes by exception creation when parsing IPv4-mapped-on-IPv6 addresses.
Motivation:
We received a bug-report that the ByteBuf.refCnt() does sometimes not show the correct value when release() and refCnt() is called from different Threads.
Modifications:
Add test-case which shows that all is working like expected
Result:
Test-case added which shows everything is ok.
Motivation:
This fixes bug #2848 which caused Recycler to become unbounded and cache infinite number of objects with maxCapacity that's not a power of two. This can result in general sluggishness of the application and OutOfMemoryError.
Modifications:
The test for maxCapacity has been moved out of test to check if the buffer has filled. The buffer is now also capped at maxCapacity and cannot grow over it as it jumps from one power of two to the other.
Additionally, a unit test was added to verify maxCapacity is honored even when it's not a power of two.
Result:
With these changes the user is able to use a custom maxCapacity number and not have it ignored. The unit test assures this bug will not repeat itself.
Motivation:
In EpollSocketchannel.doWriteFileRegion(...) we need to make sure we write until sendFile(...) returns either 0 or all is written. Otherwise we may not get notified once the Channel is writable again.
This is the case as we use EPOLL_ET.
Modifications:
Always write until either sendFile returns 0 or all is written.
Result:
No more hangs when writing DefaultFileRegion can happen.
Related issue: #2821
Motivation:
There's no way for a user to change the default ZlibEncoder
implementation.
It is already possible to change the default ZlibDecoder implementation.
Modification:
Add a new system property 'io.netty.noJdkZlibEncoder'.
Result:
A user can disable JDK ZlibEncoder, just like he or she can disable JDK
ZlibDecoder.
Motivation:
We have some duplicated code that can be reused.
Modifications:
Create package private class called CodecUtil that now contains the shared code / helper method.
Result:
Less code-duplication
Motivation:
ByteToMessageCodec miss to check for @Sharable annotation in one of its constructors.
Modifications:
Ensure we call checkForSharableAnnotation in all constructors.
Result:
After your change, what will change.
Motivation:
Currently we do more memory copies then needed.
Modification:
- Directly use heap buffers to reduce memory copy
- Correctly release buffers to fix buffer leak
Result:
Less memory copies and no leaks
Motivation:
There were no way to efficient write a CompositeByteBuf as we always did a memory copy to a direct buffer in this case. This is not needed as we can just write a CompositeByteBuf as long as all the components are buffers with a memory address.
Modifications:
- Write CompositeByteBuf which contains only direct buffers without memory copy
- Also handle CompositeByteBuf that have more components then 1024.
Result:
More efficient writing of CompositeByteBuf.
Motivation:
There is not need todo redunant reads of head in peakNode as we can just spin on next() until it becomes visible.
Modifications:
Remove redundant reads of head in peakNode. This is based on @nitsanw's patch for akka.
See https://github.com/akka/akka/pull/15596
Result:
Less volatile access.
Motivation:
We used the wrong EventExecutor to notify for bind failures if a late registration was done.
Modifications:
Use the correct EventExecutor to notify and only use the GlobelEventExecutor if the registration fails itself.
Result:
The correct Thread will do the notification.
Motivation:
The example mis handle two elements:
1) Last message is a LastHttpContent and is not taken into account by
the server handler
2) The client makes a sync on last write (chunked) but there is no flush
before, therefore the sync is waiting forever.
Modifications:
1) Take into account the message LastHttpContent in simple Get.
2) Removes sync but add flush for each post and multipost parts
Results:
Example is no more blocked after get test.
Should be done also in 4.0 and Master (similar changes)
Motivation:
Recently we changed the default value of SOMAXCONN that is used when we can not determine it by reading /proc/sys/net/core/somaxconn. While doing this we missed to update the javadocs to reflect the new default value that is used.
Modifications:
List correct default value in the javadocs of SOMAXCONN.
Result:
Correct javadocs.
Motivation:
The test procedure is unstable when testing quick time (factor less or equal to 1). Changing to default 10ms in this case will force time to be correct and time to be checked only when factor is >= 2.
Modifications:
When factor is <= 1, minimalWaitBetween is 10ms
Result:
Hoping this version is finally stable.
Motivation:
It seems that in certain conditions, the write back from the server is so quick that the handler has no time to compute traffic shaping. So 10ms of wait before acknowledging is added in server side.
Modifications:
Add 10ms waiting before server ackonwledge the client.
Result:
The timing is now suppsed to be stable.
Motivation:
The test procedure is unstable due to not enough precise timestamping
during the check.
Modifications:
Reducing the test cases and cibling "stable" test ("timestamp-able")
bring more stability to the tests.
Result:
Tests for TrafficShapingHandler seem more stable (whatever using JVM 6,
7 or 8).
When a ChannelOutboundBuffer contains ByteBufs followed by a FileRegion,
removeBytes() will fail with a ClassCastException. It should break the
loop instead.
f31c630c8c was causing
SocketGatheringWriteTest to fail because it does not take the case where
an empty buffer exists in a gathering write.
When there is an empty buffer in a gathering write, the number of
buffers returned by ChannelOutboundBuffer.nioBuffer() and the actual
number of write attemps can differ.
To remove the write requests correctly, a byte transport must use
ChannelOutboundBuffer.removeBytes()
Motivation:
Because of an incorrect logic in teh EmbeddedChannel constructor it is not possible to use EmbeddedChannel with a ChannelInitializer as constructor argument. This is because it adds the internal LastInboundHandler to its ChannelPipeline before it register itself to the EventLoop.
Modifications:
First register self to EventLoop before add LastInboundHandler to the ChannelPipeline.
Result:
It's now possible to use EmbeddedChannel with ChannelInitializer.
Motivation:
Due a regression NioSocketChannel.doWrite(...) will throw a ClassCastException if you do something like:
channel.write(bytebuf);
channel.write(fileregion);
channel.flush();
Modifications:
Correctly handle writing of different message types by using the correct message count while loop over them.
Result:
No more ClassCastException
Related issue: #2407
Motivation:
The current fallback SOMAXCONN value is 3072. It is way too large
comparing to the default SOMAXCONN value of popular OSes.
Modifications:
Decrease the fallback SOMAXCONN value to 128 or 200 depending on the
current OS
Result:
Saner fallback value
Motivation:
The _0XFF_0X00 buffer is not duplicated and empty after the first usage preventing the connection close to happen on subsequent close frames.
Modifications:
Correctly duplicate the buffer.
Result:
Multiple CloseWebSocketFrames are handled correctly.
Motivation:
ByteToMessageDecoder and ReplayingDecoder have incorrect javadocs in some places.
Modifications:
Fix incorrect javadocs for both classes.
Result:
Correct javadocs for both classes
Related issue: #2764
Motivation:
EpollSocketChannel.writeFileRegion() does not handle the case where the
position of a FileRegion is non-zero properly.
Modifications:
- Improve SocketFileRegionTest so that it tests the cases where the file
transfer begins from the middle of the file
- Add another jlong parameter named 'base_off' so that we can take the
position of a FileRegion into account
Result:
Improved test passes. Corruption is gone.
Related issue: #2508
Motivation:
The '<exec/>' task takes unnecessarily long time due to a known issue:
- https://issues.apache.org/bugzilla/show_bug.cgi?id=54128
Modifications:
- Reduce the number of '<exec/>' tasks for faster build
- Use '<propertyregex/>' to extract the output
Result:
Slightly faster build
Related issue: #2028
Motivation:
Some copiedBuffer() methods in Unpooled allocated a direct buffer. An
allocation of a direct buffer is an expensive operation, and thus should
be avoided for unpooled buffers.
Modifications:
- Use heap buffers in all copiedBuffer() methods
Result:
Unpooled.copiedBuffers() are less expensive now.
Motivation:
The previous fix did disable the caching of ByteBuffers completely which can cause performance regressions. This fix makes sure we use nioBuffers() for all writes in NioSocketChannel and so prevent data-corruptions. This is still kind of a workaround which will be replaced by a more fundamental fix later.
Modifications:
- Revert 4059c9f354
- Use nioBuffers() for all writes to prevent data-corruption
Result:
No more data-corruption but still retain the original speed.
Motivation:
At the moment we expand the ByteBuffer[] when we have more then 1024 ByteBuffer to write and replace the stored instance in its FastThreadLocal. This is not needed and may even harm performance on linux as IOV_MAX is 1024 and so this may cause the JVM to do an array copy.
Modifications:
Just exit the nioBuffers() method if we can not fit more ByteBuffer in the array. This way we will pick them up on the next call.
Result:
Remove uncessary array copy and simplify the code.