Motivation:
Use new / non-deprecated APIs for creating SSL Context
in tests, in order to be able to implement OpenSsl
tests with maximum code reuse.
Modifications:
Use `SslContextBuilder.(forServer|forClient)` instead
of deprecated `JdkSslServerContext` constructor.
Use `ApplicationProtocolConfig` instead of Protocol
Negotiator.
Use custom exception type for skipping tests to avoid
swallowing exceptions arising from tests.
Result:
Exceptions from tests aren't swallowed.
Using new APIs allows reusing same test code for
OpenSsl tests.
Motivation:
Fix a race-condition when closing NioSocketChannel or EpollSocketChannel while try to detect if a close executor should be used and the underlying socket was already closed. This could lead to an exception that then leave the channel / in an invalid state and so could lead to side-effects like heavy CPU usage.
Modifications:
Catch possible socket exception while try to get the SO_LINGER options from the underlying socket.
Result:
No more race-condition when closing the channel is possible with bad side-effects.
Motivation:
- On the client, cookies should be sorted in decreasing order of path
length. From RFC 6265:
5.4.2. The user agent SHOULD sort the cookie-list in the following
order:
* Cookies with longer paths are listed before cookies with
shorter paths.
* Among cookies that have equal-length path fields, cookies with
earlier creation-times are listed before cookies with later
creation-times.
NOTE: Not all user agents sort the cookie-list in this order, but
this order reflects common practice when this document was
written, and, historically, there have been servers that
(erroneously) depended on this order.
Note that the RFC does not define the path length of cookies without a
path. We sort pathless cookies before cookies with the longest path,
since pathless cookies inherit the request path (and setting a path
that is longer than the request path is of limited use, since it cannot
be read from the context in which it is written).
- On the server, if there are multiple cookies of the same name, only one
of them should be encoded. RFC 6265 says:
Servers SHOULD NOT include more than one Set-Cookie header field in
the same response with the same cookie-name.
Note that the RFC does not define which cookie should be set in the case
of multiple cookies with the same name; we arbitrarily pick the last one.
Modifications:
- Changed the visibility of the 'strict' field to 'protected' in
CookieEncoder.
- Modified ClientCookieEncoder to sort cookies in decreasing order of path
length when in strict mode.
- Modified ServerCookieEncoder to return only the last cookie of a given
name when in strict mode.
- Added a fast path for both strict mode in both client and server code
for cases with only one cookie, in order avoid the overhead of sorting
and memory allocation.
- Added unit tests for the new cases.
Result:
- Cookie generation on client and server is now more conformant to RFC 6265.
Motivation:
The javadocs for ChannelOption.AUTO_CLOSE say the default is false, but the default is currently true.
Modifications:
- Make javadocs consistent with code
Result:
Less confusing docs.
Motiviation:
According to jetty docs the alpn-api should use the provided scope.
Modificaitons:
- change scope to provided for alpn-api
- update for new jdk
Result:
Users of Netty don't run into alpn version conflicts.
Fixes https://github.com/netty/netty/issues/4480
Motivation:
exceptionCaught(...) will only handle inbound exceptions which means it makes not much sense to have it also on ChannelOutboundHandler. Because of this we should move it to ChannelInboundHandler.
Modifications:
Add @deprecated annotation to ChannelHandler.exceptionCaught(...).
Result:
Preapre to cleanup the API in later release.
Motivation:
6.7 is the latest stable release in RHEL/CentOS 6 line. Given that most
RHEL/CentOS users have upgraded to 6.7 via yum upgrade, we should bump
our requirement.
Modification:
s/6.6/6.7/g
Result:
'mvn release:*' must be run on RHEL/CentOS 6.7 instead of 6.6.
Motivation:
AbstractEpollStreamChannel has a queue which collects splice events. Splice is assumed not to be the most common use case of this class and thus the splice queue could be initialized in a lazy fashion to save memory. This becomes more significant when the number of connections grows.
Modifications:
- AbstractEpollStreamChannel.spliceQueue will be initialized in a lazy fashion
Result:
Less memory consumption for most use cases
Motivation:
We should use OneTimeTask where possible to reduce object creation.
Modifications:
Replace Runnable with OneTimeTask
Result:
Less object creation
Motivation:
There is a notification ordering issue in DefaultPromise when the lateListener collection is in use. The ordering issue can be observed in situations where a late listener is added to a Future returned from a write operation. It is possible that this future will run after a read operation scheduled on the I/O thread, even if the late listener is added on the I/O thread. This can lead to unexpected ordering where a listener for a write operation which must complete in order for the read operation to happen is notified after the read operation is done.
Modifications:
- If the lateListener collection becomes empty, it should be treated as though it was null when checking if lateListeners can be notified immediatley (instead of executing a task on the executor)
Result:
Ordering is more natural and will not be perceived as being out of order relative to other tasks on the same executor.
Motivation:
If you need to handle a lot of concurrent connections (1M+) the memory footprint can be problem.
Modifications:
- Lazy create the IdentityHashMap that holds the EventExecutor mappings as this is not needed by most users anyway
- Use a sane initial capacity when creating the IdentityHashMap
Result:
Smaller memory footprint of DefaultChannelPipeline
Motivation:
We not need to store another reference to AbstractChannel as we can access it through DefaultChannelHandlerContext.
Modifications:
Remove reference.
Result:
Cleaner code.
Motivation:
If you start to have 1M+ concurrent connections memory footprint can be come a big issue. We should try to reduce it as much as possible in the core of netty.
Modifications:
- Remove HashMap that was used to store name to ctx mapping. This was only used for validation and access a handler by name. As a pipeline is not expected to be very long (like 100+ handlers) we can just walk the linked list structure to find the ctx with a given name.
Result:
Less memory footprint of the DefaultChannelPipeline.
Motivation:
If we have a lot of writes going on we currently need to lookup the IovArray for each Channel that does writes. This can have quite some perf overhead. We should not need to do this and just store a reference of the IovArray on the EpollEventLoop itself.
Modifications:
- Remove IoArrayThreadLocal
- Store the IoArray in the EventLoop itself
Result:
Less FastThreadLocal lookups
Motivation:
If ChannelOption.ALLOW_HALF_CLOSURE is true and the shutdown input operation fails we should not propagate this exception, and instead consider this socket's read as half closed.
Modifications:
- AbstractEpollChannel.shutdownInput should not propagate exceptions when attempting to shutdown the input, but instead should just close the socket
Result:
Users expecting a ChannelInputShutdownEvent will get this event even if the socket is already shutdown, and the shutdown operation fails.
Motivation:
The method setBytes did not work correctly because read-only ByteBuffer
does not allow access to its underlying array.
Modifications:
New case was added for ByteBuffer's that are not direct and do not have an array.
These must be handled by copying the data into a temporary array. Unit test was
added to test this case.
Result:
It is now possible to use read-only ByteBuffer as the source
for the setBytes method.
Motivation:
A new version of ALPN boot has been released.
Modifications:
- Update the pom to pull in this new version
Result:
New JDK get new ALPN boot.
Motivation:
Child classes of ApplicationProtocolNegotiationHandler may want to override the behavior when a handshake failure is detected.
Modifications:
- Provide a method which can be overriden when a handshake failure is detected.
Result:
Child classes can override ApplicationProtocolNegotiationHandler handshake failure behavior.
Motivation:
The previous DefaultChannelPipeline#destroy() implementation, introduced in #3156, is suboptimal as it can cause the for loop to continuously spin if the executor used by a given handler is unable to "recognize" the event loop.
It could be objected that it's the custom executor responsibility to properly implement the inEventLoop() method, but some implementetaions might not be able to do that for performance reasons, and even so, it's always better to be safe against API misuse, in particular when it is not possible to fail fast and the alternative is rather some sutle behaviour.
Modifications:
The patch simply avoids the recursive spin by explicitly passing the "in event loop" condition as a boolean parameter, preserving the same guarantees offered by #3156. A unit test has also been added.
Result:
All channel events are correctly called and no high CPU usage is seen anymore.
Motivation:
If netty used as part of application, should be a way to prefix service thread name to easy distinguish such threads (for example, used in IntelliJ Platform)
Modifications:
Introduce system property io.netty.serviceThreadPrefix
Result:
ThreadDeathWatcher thread has a readable name "Netty threadDeathWatcher-2-1" if io.netty.serviceThreadPrefix set to "Netty"
Motivation:
Changing the chache of generated names to use a cache per thread. This will remove the bottleneck when many eventloops are used and names need to generate.
Modifications:
Use a FastThreadLocal to store the cached names.
Result:
Less locking between threads.
Motivation:
We should allow our custom Executor to shutdown quickly.
Modifications:
Call super constructor which correct arguments.
Result:
Custom Executor can be shutdown quickly.
Motivation:
The EPOLL module was not completly respecting the half closed state. It may have missed events, or procssed events when it should not have due to checking isOpen instead of the appropriate shutdown state.
Modifications:
- use FileDescriptor's isShutdown* methods instead of isOpen to check for processing events.
Result:
Half closed code in EPOLL module is more correct.
Motivation:
transport-native-epoll is designed to be specific to Linux. However there is native code that can be extracted out and made to work on more Unix like distributions. There are a few steps to be completely decoupled but the first step is to extract out code that can run in a more general Unix environment from the Linux specific code base.
Modifications:
- Move all non-Linux specific stuff from Native.java into the io.netty.channel.unix package.
- io.netty.channel.unix.FileDescriptor will inherit all the native methods that are specific to file descriptors.
- io_netty_channel_epoll_Native.[c|h] will only have code that is specific to Linux.
Result:
Code is decoupled and design is streamlined in FileDescriptor.
Motivation:
In 4.1 and master the isValid utility has been moved to MathUtil. We should stay consistent for internal APIs.
Modifications:
- Move isValid to MathUtil
Result:
More consistent internal structure across branches.
Motivation:
DefaultPromise.toString() returns 'DefaultPromise(incomplete)' when it's
actually complete with non-null result.
Modifications:
Handle the case where the promise is done and its result is non-null in
toString()
Result:
The String returned by DefaultPromise.toString() is not confusing
anymore.
Motivation:
As reported in #4402, the FastThreadLocalBenchmark shows that the JDK ThreadLocal
is actually faster than Netty's custom thread local implementation.
I was looking forward to doing some deep digging, but got disappointed :(.
Modifications:
The microbenchmark was not using FastThreadLocalThreads and would thus always hit the slow path.
I updated the JMH command line flags, so that FastThreadLocalThreads would be used.
Result:
FastThreadLocalBenchmark shows FastThreadLocal to be faster than JDK's ThreadLocal implementation,
by about 56% in this particular benchmark. Run on OSX El Capitan with OpenJDK 1.8u60.
Benchmark Mode Cnt Score Error Units
FastThreadLocalBenchmark.fastThreadLocal thrpt 20 55452.027 ± 725.713 ops/s
FastThreadLocalBenchmark.jdkThreadLocalGet thrpt 20 35481.888 ± 1471.647 ops/s
Motivation:
To prove one implementation is faster as the other we should have a benchmark.
Modifications:
Add benchmark which benchmarks the unsafe and non-unsafe implementation of HeapByteBuf.
Result:
Able to compare speed of implementations easily.
Motivation:
Modulo operations are slow, we can use bitwise operation to detect if resource leak detection must be done while sampling.
Modifications:
- Ensure the interval is a power of two
- Use bitwise operation for sampling
- Add benchmark.
Result:
Faster sampling.
Motivation:
When the ImmediateEventExecutor is in use it is possible to get a StackOverFlowException if when a promise completes a new listener is added to that promise.
Modifications:
- Protect against the case where LateListeners.run() smashes the stack.
Result:
Fixes https://github.com/netty/netty/issues/4395
Motiviation:
If a user writes from outside the EventLoop we increase the pending bytes of the outbound buffer before submitting the write request. This is done so the user can stop writing asap once the channel turns unwritable. Unfortunally this doesn't take the overhead of adding the task into the account and so it is very easy for an user to full up the task queue. Beside this we use a value of 0 for an unown message by default which is not ideal.
Modifications:
- port the message calculation we used in netty 3.x into AbstractChannelHandlerContext and so better calculate the overhead of a message that is submitted from outside the EventLoop
- change the default estimated size for an unknown message to 8.
Result:
Better behaviour when submiting writes from outside the EventLoop.
Keep RTSPRequestEncoder, RTSPRequestDecoder, RTSPResponseEncoder and
RTSPResponseDecoder for backwards compatibility but they now just extends
the generic encoder/decoder and are markes as deprecated.
Renamed the decoder test, because the decoder is now generic. Added
testcase for when ANNOUNCE request is received from server.
Created testcases for encoder.
Mark abstract base classes RTSPObjectEncoder and RTSPObjectDecoder as
deprecated, that functionality is now in RTSPEncoder and RTSPDecoder.
Added annotation in RtspHeaders to suppress warnings about deprecation, no need when
whole class is deprecated.
Motivation:
Fix a race condition that was introduced by f18990a8a5 that could lead to a NPE when allocate from the PooledByteBufAllocator concurrently by many threads.
Modifications:
Correctly synchronize on the PoolSubPage head.
Result:
No more race.
Motivation:
OpenSslServerContext should not reinitialize the provided TrustManagerFactory with the key cert chain as the user should be able to pass a fully initialized TrustManagerFactory. This is also in line with how JdkSslServerContext works.
Modifications:
Not reinitialize the provided TrustManagerFactory with the key cert chain.
Result:
Correct and consistent behavior.
Motivation:
Once a FixedChannelPool was closed we must not allow to acquire or release Channels to prevent assert errors.
Modifications:
Fail release and acquire calls when FixedChannelPool is closed.
Result:
No more assert errors.1
Motiviation:
We have a lot of duplicated code which makes it hard to maintain.
Modification:
Move shared code to UnsafeByteBufUtil and use it in the implementations.
Result:
Less duplicated code and so easier to maintain.
Motiviation:
We have a lot of duplicated code which makes it hard to maintain.
Modification:
Move shared code to HeapByteBufUtil and use it in the implementations.
Result:
Less duplicated code and so easier to maintain.
Motivation:
sun.misc.Unsafe allows us to handle heap ByteBuf in a more efficient matter. We should use special ByteBuf implementation when sun.misc.Unsafe can be used to increase performance.
Modifications:
- Add PooledUnsafeHeapByteBuf and UnpooledUnsafeHeapByteBuf that are used when sun.misc.Unsafe is ready to use.
- Add UnsafeHeapSwappedByteBuf
Result:
Better performance when using heap buffers and sun.misc.Unsafe is ready to use.
Motivation:
We had a bug in our implemention which double "reversed" bytes on systems which not support unaligned access.
Modifications:
- Correctly only reverse bytes if needed.
- Share code between unsafe implementations.
Result:
No more data-corruption on sytems without unaligned access.
Motivation:
When moving bytes between a PooledUnsafeDirectByteBuf or an UnpooledUnsafeDirectByteBuf
and a ByteBuffer, a temp ByteBuffer is allocated and will need to be GCed. This is a
common case since a ByteBuffer is always needed when reading/writing on a file,
for example.
Modifications:
Use PlatformDependent.copyMemory() to avoid the need for the temp ByteBuffer
Result:
No temp ByteBuffer allocated and GCed.
Motivation:
SlicedByteBuf did double reference count checking for various bulk operations, which affects performance.
Modifications:
- Add package private method to AbstractByteBuf that can be used to check indexes without check the reference count
- Use this new method in the bulk operation os SlicedByteBuf as the reference count checks take place on the wrapped buffer anyway
- Fix test-case to not try to read data that is out of the bounds of the buffer.
Result:
Better performance on bulk operations when using SlicedByteBuf (and sub-classes)
Motivation:
We started the thread before store it in a field which could lead to an assert error when the thread is executed before we actually store it.
Modifications:
Store thread before start it.
Result:
No more assert error possible.
Motivation:
Some of the tests in the buffer module contained unused code. Some of the tests also used unnecessary inheritance which could be avoided to simplify code.
Modifications:
Cleanup the test cases.
Result:
Cleaner code, less cruft.
Motivation:
We need to always return a real slice even when the requested length is 0. This is needed as otherwise we not correctly share the reference count and so may leak a buffer if the user call release() on the returned slice and expect it to decrement the reference count of the "parent" buffer.
Modifications:
- Always return a real slice
- Add unit test for the bug.
Result:
No more leak possible when a user requests a slice of length 0 of a SlicedByteBuf.