Motivation:
At the moment ChanneConfig.setAutoRead(false) only is guaranteer to not have an extra channelRead(...) triggered when used from within the channelRead(...) or channelReadComplete(...) method. This is not the correct behaviour as it should also work from other methods that are triggered from within the EventLoop. For example a valid use case is to have it called from within a ChannelFutureListener, which currently not work as expected.
Beside this there is another bug which is kind of related. Currently Channel.read() will not work as expected for OIO as we will stop try to read even if nothing could be read there after one read operation on the socket (when the SO_TIMEOUT kicks in).
Modifications:
Implement the logic the right way for the NIO/OIO/SCTP and native transport, specific to the transport implementation. Also correctly handle Channel.read() for OIO transport by trigger a new read if SO_TIMEOUT was catched.
Result:
It is now also possible to use ChannelConfig.setAutoRead(false) from other methods that are called from within the EventLoop and have direct effect.
Motivation:
There is currently no epoll based DatagramChannel. We should add one to make the set of provided channels complete and also to be able to offer better performance compared to the NioDatagramChannel once SO_REUSEPORT is implemented.
Modifications:
Add implementation of DatagramChannel which uses epoll. This implementation does currently not support multicast yet which will me implemented later on. As most users will not use multicast anyway I think it is fair to just add the EpollDatagramChannel without the support for now. We shipped NioDatagramChannel without support earlier too ...
Result:
Be able to use EpollDatagramChannel for max. performance on linux
Motivation:
In linux kernel 3.9 a new featured named SO_REUSEPORT was introduced which allows to have multiple sockets bind to the same port and so handle the accept() of new connections with multiple threads. This can greatly improve the performance when you not to accept a lot of connections.
Modifications:
Implement SO_REUSEPORT via JNI
Result:
Be able to use the SO_REUSEPORT feature when using the EpollServerSocketChannel
Motivation:
Fix leaks reported during running SpdyFrameDecoderTest
Modifications:
Make sure the produced buffers of SpdyFrameDecoder and SpdyFrameDecoderTest are released
Result:
No more leak reports during run the tests.
Motivation:
Fix leaks reported during running SpdyFrameDecoderTest
Modifications:
Make sure the produced buffer of SpdyFrameDecoder is released
Result:
No more leak reports during run the tests.
Motivation:
Fix leaks reported during SPDY test.
Modifications:
Use ReferenceCountUtil.releaseLater(...) to make sure everything is released once the tests are done.
Result:
No more leak reports during run the tests.
Motivation:
Currently, the SPDY frame encoding and decoding code is based upon
the ChannelHandler abstraction. This requires maintaining multiple
versions for 3.x and 4.x (and possibly 5.x moving forward).
Modifications:
The SPDY frame encoding and decoding code is separated from the
ChannelHandler and SpdyFrame abstractions. Also test coverage is
improved.
Result:
SpdyFrameCodec now implements the ChannelHandler abstraction and is
responsible for creating and handling SpdyFrame objects.
Conflicts:
codec-http/src/main/java/io/netty/handler/codec/spdy/SpdyFrameCodec.java
Motivation:
PoolArena's 'normalizeCapacity' function was micro-optimized some
time ago to remove a while loop. However, there was a change of
behavior in the function as a result. Capacities passed into it
that are already powers of 2 (and >= 512) are doubled in size. So
if I ask for a buffer with a capacity of 1024, I will get back one
that actually uses 2048 bytes (stored in maxLength).
Aligning to powers of two for book keeping ease is reasonable,
and if someone tries to expand a buffer, you might as well use some
of the previously wasted space. However, since this distinction
between 'easily expanded' and 'costly to expand' space is not
supported at all by the APIs, I cannot imagine this change to
doubling is desirable or intentional.
This is especially costly when using composite buffers. They
frequently allocate components with a capacity that is a power of
2, and they never attempt to expand components themselves. The end
result is that heavy use of pool-backed composite buffers wastes
almost half of the memory pool (the smaller / initial components are
<512 and so are not affected by the off-by-one bug).
Modifications:
Although I find it difficult to believe that such an optimization
is really helpful, I left it in and fixed the off-by-one issue by
decrementing the value at the start.
I also added a simple test to both attempt to verify that the
decrement fixes the issue without introducing any other change, and
to make it easy for a reviewer to test the existing behavior. PoolArena
does not seem to have much testing or testability support though so
the test is kind of a hack and will break for unrelated changes. I
suggest either removing it or factoring out the single non-static
portion of normalizeCapacity so that the fragile dummy PoolArena is
not required.
Result:
Pooled allocators will allocate less resources to the highly
inefficient and undocumented buffer section between length and
maxLength.
Composite buffers of non-trivial size that are backed by pooled
allocators will use about half as much memory.
Motivation:
At the moment we create a HashMap that holds the MembershipKeys for multicast with every NioDatagramChannel even when most people not need it at al
Modifications:
Lazy create the HashMap when needed.
Result:
Less memory usage and less object creation
Motivation:
We sometimes see data corruption when writing to the EpollSocketChannel.
Modifications:
The problem was caused as we mixed writing via memory address and via ByteBuffer. This not works out pretty well because of how the position of the buffer is updated etc. To fix the problem we only write via ByteBuffer (this is true for normal and gathering writes). Before normal writes may write via the memory address and gathering writes always used the ByteBuffer.
Result:
Fix data-corruption which could happen on partial writes
Motivation:
We sometimes see data corruption when writing to the EpollSocketChannel.
Modifications:
Correctly update the position of the ByteBuffer after something was written.
Result:
Fix data-corruption which could happen on partial writes
Motivation:
At the moment we create new ThreadPoolCache whenever a Thread tries either allocate or release something on the PooledByteBufAllocator. When something is released we put it then in its ThreadPoolCache. The problem is we never check if a Thread is not alive anymore and so we may end up with memory that is never freed again if a user create many short living Threads that use the PooledByteBufAllocator.
Modifications:
Periodically check if the Thread is still alive that has a ThreadPoolCache assinged and if not free it.
Result:
Memory is freed up correctly even for short living Threads.
Motivation:
When using System.getProperty(...) and various methods to get a ClassLoader it will fail when a SecurityManager is in place.
Modifications:
Use a priveled block if needed. This work is based in the PR #2353 done by @anilsaldhana .
Result:
Code works also when SecurityManager is present
Motivation:
At the moment a user can not safetly call slice().retain() or duplicate.retain()in the ByteToMessageDecoder.decode(...) implementation without the risk to see coruption because we may call discardSomeReadBytes() to make room on the buffer once the handling is done.
Modifications:
Check for the refCnt() before call discardSomeReadBytes() and also check before call decode(...) to create a copy if needed.
Result:
The user can safetly call slice().retain() or duplicate.retain() in his/her ByteToMessageDecoder.decode(...) method.
Motivation:
Because we not null out the array entry in the SelectionKey[] which is produced by SelectedSelectionKeySet.flip() we may end up with a few SelectionKeyreferences still hanging around here even after the Channel was closed. As these entries may be present at the end of the SelectionKey[] which is never updated for a long time as not enough SelectionKeys are ready.
Modifications:
Once we access the SelectionKey out of the SelectionKey[] we directly null it out.
Result:
Reference can be GC'ed right away once the Channel was closed.
Motivation:
EpollSocketChannel.remoteAddress0() is always null on accepted EpollSocketChannels as we not set it excplicit.
Modifications:
Correctly retrieve the local and remote address when accept new channel and store it
Result:
EpollSocketchannel.remoteAddress0() and EpollSocketChannel.localAddress0() return correct addresses
Motivation:
At the moment we do a Channel.isActive() check in every AbstractChannel.AbstractUnsafe.write(...) call which gives quite some overhead as shown in the profiler when you write fast enough. We can eliminate the check and do something more smart here.
Modifications:
Remove the isActive() check and just check if the ChannelOutboundBuffer was set to null before, which means the Channel was closed. The rest will be handled in flush0() anyway.
Result:
Less overhead when doing many write calls
Motivation:
Native.epollCreate(...) fails on systems using a kernel < 2.6.27 / glibc < 2.9 because it uses epoll_create1(...) without checking if it is present
Modifications:
Check if epoll_create1(...) exists abd if not fall back to use epoll_create(...)
Result:
Works even on systems with kernel < 2.6.27 / glibc < 2.9
Motivation:
In SslHandler.safeClose(...) we attach a ChannelFutureListener to the flushFuture and will notify the ChannelPromise which was used for close(...) in it. The problem here is that we only call ChannelHandlerContext.close(ChannelPromise) if Channel.isActive() is true and otherwise not notify it at all. We should just call ChannelHandlerContext.close(ChannelPromise) in all cases.
Modifications:
Always call ChannelHandlerContext.close(ChannelPromise) in the ChannelFutureListeiner
Result:
ChannelPromise used for close the Channel is notified in all cases
Motivation:
At the moment an IllegalArgumentException will be thrown if a ChannelPromise is cancelled while propagate through the ChannelPipeline. This is not correct, we should just stop to propagate it as it is valid to cancel at any time.
Modifications:
Stop propagate the operation through the ChannelPipeline once a ChannelPromise is cancelled.
Result:
No more IllegalArgumentException when cancel a ChannelPromise while moving through the ChannelPipeline.
Motivation:
Currently the CORS support only handles a single origin, or a wildcard
origin. This task should enhance Netty's CORS support to allow multiple
origins to be specified. Just being allowed to specify one origin is
particulary limiting when a site support both http and https for
example.
Modifications:
- Updated CorsConfig and its Builder to accept multiple origins.
Result:
Users are now able to configure multiple origins for CORS.
[https://github.com/netty/netty/issues/2346]
Motivation:
In ChunkedWriteHandler, there is a redundant variable that servers
no purpose. It implies that under some conditions you might not want
to flush.
Modifications:
Removed the variable and the if condition that read it. The boolean
was always true so just removing the if statement was fine.
Result:
Slightly less misleading code.
Motivation:
Reduce memory usage in ProtobufVarint32LengthFieldPrepender.
Modifications:
Explicit set the buffer size that is needed for the header (between 1 and 5 bytes).
Result:
Less memory usage in ProtobufVarint32LengthFieldPrepender.
Motivation:
At the moment we use the system-wide default selector provider for this invocation of the Java virtual machine when constructing a new NIO channel, which makes using an alternative SelectorProvider practically useless.
This change allows user specify his/her preferred SelectorProvider.
Modifications:
Add SelectorProvider as a param for current `private static *Channel newSocket` method of NioSocketChannel, NioServerSocketChannel and NioDatagramChannel.
Change default constructors of NioSocketChannel, NioServerSocketChannel and NioDatagramChannel to use DEFAULT_SELECTOR_PROVIDER when calling newSocket(SelectorProvider).
Add new constructors for NioSocketChannel, NioServerSocketChannel and NioDatagramChannel which allow user specify his/her preferred SelectorProvider.
Result:
Now users can specify his/her preferred SelectorProvider when constructing an NIO channel.
Motivation:
Make sure the remote/local InetSocketAddress can be obtained correctly
Modifications:
Set the remote/local InetSocketAddress after a bind/connect operation was performed
Result:
It is possible to still access the informations even after the fd became invalid. This mirror the behaviour of NIO.
Motivation:
Previously, we used URLDecoder.decode(...) to decode url-encoded data. This generates a lot of garbage and takes a considerable amount of time.
Modifications:
Replace URLDecoder.decode(...) with QueryStringDecoder.decodeComponent(...)
Result:
Less garbage to GC and faster decode processing.
Motivation:
When running the build with Java 8 the following error occurred:
java: reference to preflightResponseHeader is ambiguous
both method
<T>preflightResponseHeader(java.lang.CharSequence,java.lang.Iterable<T>)
in io.netty.handler.codec.http.cors.CorsConfig.Builder and method
<T>preflightResponseHeader(java.lang.String,java.util.concurrent.Callable<T>)
in io.netty.handler.codec.http.cors.CorsConfig.Builder match
The offending class was CorsConfigTest and its shouldThrowIfValueIsNull
which contained the following line:
withOrigin("*").preflightResponseHeader("HeaderName", null).build();
Modifications:
Updated the offending method with to supply a type, and object array, to
avoid the error.
Result:
After this I was able to build with Java 7 and Java 8
Motivation:
An intermediary like a load balancer might require that a Cross Origin
Resource Sharing (CORS) preflight request have certain headers set.
As a concrete example the Elastic Load Balancer (ELB) requires the
'Date' and 'Content-Length' header to be set or it will fail with a 502
error code.
This works is an enhancement of https://github.com/netty/netty/pull/2290
Modifications:
CorsConfig has been extended to make additional HTTP response headers
configurable for preflight responses. Since some headers, like the
'Date' header need to be generated each time, m0wfo suggested using a
Callable.
Result:
By default, the 'Date' and 'Content-Lenght' headers will be sent in a
preflight response. This can be overriden and users can specify
any headers that might be required by different intermediaries.
Motivation:
Previously, we used SecureRandom.nextLong() to generate the initialSeedUniquifier. This required more entrophy than necessary because it has to 1) generate the seed of SecureRandom first and then 2) generate a random long integer. Instead, we can use generateSeed() to skip the step (2)
Modifications:
Use generateSeed() instead of nextLong()
Result:
ThreadLocalRandom requires less amount of entrphy to start up
Motivation:
As reported in #2331, some query operations in NetworkInterface takes much longer time than we expected. For example, specifying -Djava.net.preferIPv4Stack=true option in Window increases the execution time by more than 4 times. Some Windows systems have more than 20 network interfaces, and this problem gets bigger as the number of unused (virtual) NICs increases.
Modification:
Use NetworkInterface.getInetAddresses() wherever possible.
Before iterating over all NICs reported by NetworkInterface, filter the NICs without proper InetAddresses. This reduces the number of candidates quite a lot.
NetUtil does not query hardware address of NIC in the first place but uses InetAddress.isLoopbackAddress().
Do not call unnecessary query operations on NetworkInterface. Just get hardware address and compare.
Result:
Significantly reduced class initialization time
Motivation:
Remove the synchronization bottleneck in PoolArena and so speed up things
Modifications:
This implementation uses kind of the same technics as outlined in the jemalloc paper and jemalloc
blogpost https://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919.
At the moment we only cache for "known" Threads (that powers EventExecutors) and not for others to keep the overhead
minimal when need to free up unused buffers in the cache and free up cached buffers once the Thread completes. Here
we use multi-level caches for tiny, small and normal allocations. Huge allocations are not cached at all to keep the
memory usage at a sane level. All the different cache configurations can be adjusted via system properties or the constructor
directly where it makes sense.
Result:
Less conditions as most allocations can be served by the cache itself
Motivation:
6e8ba291cf introduced a regression in Android because Android does not have sun.nio.ch.DirectBuffer (see #2330.) I also found PlatformDependent0.freeDirectBuffer() and freeDirectBufferUnsafe() are pretty much same after the commit and the unsafe version should be removed.
Modifications:
- Merge PlatformDependent0.freeDirectBuffer() and freeDirectBufferUnsafe() into one method.
- Make the Unsafe unavailable when sun.nio.ch.DirectBuffer is unavailable. We could keep the Unsafe available and handle the sun.nio.ch.DirectBuffer case separately, but I don't want to complicate our code just because of that. All supported JDK versions have sun.nio.ch.DirectBuffer if the Unsafe is available.
Result:
Simpler code. Fixes Android support (#2330)
Motivation:
Currently we use System.currentTimeMillis() in our timeout handlers this is bad
for various reasons like when the clock adjusts etc.
Modifications:
Replace System.currentTimeMillis() with System.nanoTime()
Result:
More robust timeout handling
Motivation:
I was studying the code and thought this was simpler and easier to
understand.
Modifications:
Replaced the for loop and if conditions, with a simple implementation.
Result:
Code is easier to understand.
Motivation:
While investigating the recent CI machine crashes, I observed that the
JVM processes spawned by surefire sometimes take up to 1 GiB RAM.
Consuming large amount of memory isn't really a problem, but we need to
make sure no GC trashing is occuring during the tests.
Modifications:
Add -verbose:gc option to the test JVM arguments
Result:
We can determine if there is any GC anomalies going on in our CI
machine.
Motivation:
Testing the OIO transport takes longer time than other transports because it has to wait for SO_TIMEOUT if there is nothing to read. In production, it's not a good idea to decrease this value (1000ms) because it will result in so many SocketTimeoutExceptions internally, but doing so in the testsuite should be fine.
Modifications:
Reduce the default SO_TIMEOUT of OIO channels to 10 ms.
Result:
Our testsuite finishes sooner.
Motivation:
The epoll testsuite tests the epoll transport only against itself (i.e. epoll x epoll only). We should test the epoll transport also against the well-tested NIO transport, too.
Modifications:
- Make SocketTestPermutation extensible and reusable so that the epoll testsuite can take advantage of it.
- Rename EpollTestUtils to EpollSocketTestPermutation and make it extend SocketTestPermutation.
- Overall clean-up of SocketTestPermutation
- Use Arrays.asList() for simplicity
- Add combo() method to remove code duplication
Result:
The epoll transport is now also tested against the NIO transport. SocketTestPermutation got cleaner.
Motivation:
Previous commit (2de65e25e9) introduced a regression that makes the epoll testsuite fail with an 'incompatible event loop' error.
Modifications:
Use the correct event loop type.
Result:
Build doesn't fail anymore.
Motivation:
We are seeing EpollSocketSslEchoTest does not finish itself while its I/O thread is busy. Jenkins should have terminated them when the global build timeout reaches, but Jenkins seems to fail to do so. What's more interesting is that Jenkins will start another job before the EpollSocketSslEchoTest is terminated, and Linux starts to oom-kill them, impacting the uptime of the CI service.
Modifications:
- Set timeout for all test cases in SocketSslEchoTest so that all SSL tests terminate themselves when they take too long.
- Fix a bug where the epoll testsuite uses non-daemon threads which can potentially prevent JVM from quitting.
- (Cleanup) Separate boss group and worker group just like we do for NIO/OIO transport testsuite.
Result:
Potentially more stable CI machine.
Motivation:
We better use UnresolveableAddressException as NIO does the same.
Modifications:
Replace usage of UnknownHostException with UnresolveableAddressException
Result:
epoll transport and nio transport behave the same way
Motivation:
Allow the user to create a NioServerSocketChannel from an existing ServerSocketChannel.
Modifications:
Add an extra constructor
Result:
Now the user is be able to create a NioServerSocketChannel from an existing ServerSocketChannel, like he can do with all the other Nio*Channel implemntations.
Motivation:
Ensure the user know the Channel must be closed to release resources like filehandles.
Modifications:
Add some extra javadoc.
Result:
More clear documentation
Motivation:
At the moment when an unresolvable InetSocketAddress is passed into the epoll transport a NPE is thrown
Modifications:
Add check in place which will throw an UnknownHostException if an InetSocketAddress could not been resolved.
Result:
Proper handling of unresolvable InetSocketAddresses
Motivation:
If the last item analyzed in a previous received HttpChunk/HttpContent was a part of an attribute's name, the read index was not set to the new right place and therefore raizing an exception in some case (since the "new" name analyzed is empty, which is not allowed so the exception).
What appears there is that the read index should be reset to the last valid position encountered whatever the case. Currently it was set when only when there is an attribute not already finished (name is ok, but content is possibly not).
Therefore the issue is that elements could be rescanned multiple times (including completed elements) and moreover some bad decoding can occur such as when in a middle of an attribute's name.
Modifications:
To fix this issue, since "firstpos" contains the last "valid" read index of the decoding (when finding a '&', '=', 'CR/LF'), we should add the setting of the read index for the following cases:
'lastchunk' encountered, therefore finishing the current buffer
any other cases than current attribute is not finished (name not found yet in particular)
So adding for this 2 cases:
undecodedChunk.readerIndex(firstpos);
Result:
Now the decoding is done once, content is added from chunk/content to chunk/content, name is decoded correctly even if in the middle of 2 chunks/contents.
A Junit test code was added: testChunkCorrect that should not raized any exception.
Motivation:
When starting with a read-only NIO buffer, wrapping it in a ByteBuf,
and then later retrieving a re-wrapped NIO buffer the limit was getting
too short.
Modifications:
Changed ReadOnlyByteBufferBuf.nioBuffer(int,int) to compute the
limit in the same manner as the internalNioBuffer method.
Result:
Round-trip conversion from NIO to ByteBuf to NIO will work reliably.
Motivation:
Remove the synchronization bottleneck in startThread() which is called by each execute(..) call from outside the EventLoop.
Modifications:
Replace the synchronized block with the use of AtomicInteger and compareAndSet loops.
Result:
Less conditions during SingleThreadEventExecutor.execute(...)
Motivation:
Cleanup pom.xml file.
Modifications:
Remove sniffer whitelist entries for NIO.2 as we not include a NIO.2 bases transport anymore.
Result:
Less entries in pom.xml
Motivation:
At the moment we use SocketChannel.open(), ServerSocketChannel.open() and DatagramSocketChannel.open(...) within the constructor of our
NIO channels. This introduces a bottleneck if you create a lot of connections as these calls delegate to SelectorProvider.provider() which
uses synchronized internal. This change removed the bottleneck.
Modifications:
Obtain a static instance of the SelectorProvider and use SelectorProvider.openSocketChannel(), SelectorProvider.openServerSocketChannel() and
SelectorProvider.openDatagramChannel(). This eliminates the bottleneck as SelectorProvider.provider() is not called on every channel creation.
Result:
Less conditions when create new channels.