Motivation:
In linux kernel 3.9 a new featured named SO_REUSEPORT was introduced which allows to have multiple sockets bind to the same port and so handle the accept() of new connections with multiple threads. This can greatly improve the performance when you not to accept a lot of connections.
Modifications:
Implement SO_REUSEPORT via JNI
Result:
Be able to use the SO_REUSEPORT feature when using the EpollServerSocketChannel
Motivation:
Fix leaks reported during running SpdyFrameDecoderTest
Modifications:
Make sure the produced buffers of SpdyFrameDecoder and SpdyFrameDecoderTest are released
Result:
No more leak reports during run the tests.
Motivation:
Fix leaks reported during running SpdyFrameDecoderTest
Modifications:
Make sure the produced buffer of SpdyFrameDecoder is released
Result:
No more leak reports during run the tests.
Motivation:
Fix leaks reported during SPDY test.
Modifications:
Use ReferenceCountUtil.releaseLater(...) to make sure everything is released once the tests are done.
Result:
No more leak reports during run the tests.
Motivation:
The HTTP2 example client logs, and it's useful to show what's
going on. It'd be sweet if the server did too.
Modifications:
Added Http2FrameLogger to example server pipeline.
Result:
HTTP2 example server will log frames.
and initial settings frame were sent.
Motivation:
The current logic for marking the connection header and initial settings
frame is a bit complicated.
Modifications:
- Http2FrameEncoder, Http2ConnectionHandler: removed sent listeners and
set the sent flag immediately.
Result:
Just a code cleanup ... behavior is the same.
Motivation:
Currently, the SPDY frame encoding and decoding code is based upon
the ChannelHandler abstraction. This requires maintaining multiple
versions for 3.x and 4.x (and possibly 5.x moving forward).
Modifications:
The SPDY frame encoding and decoding code is separated from the
ChannelHandler and SpdyFrame abstractions. Also test coverage is
improved.
Result:
SpdyFrameCodec now implements the ChannelHandler abstraction and is
responsible for creating and handling SpdyFrame objects.
Motivation:
PoolArena's 'normalizeCapacity' function was micro-optimized some
time ago to remove a while loop. However, there was a change of
behavior in the function as a result. Capacities passed into it
that are already powers of 2 (and >= 512) are doubled in size. So
if I ask for a buffer with a capacity of 1024, I will get back one
that actually uses 2048 bytes (stored in maxLength).
Aligning to powers of two for book keeping ease is reasonable,
and if someone tries to expand a buffer, you might as well use some
of the previously wasted space. However, since this distinction
between 'easily expanded' and 'costly to expand' space is not
supported at all by the APIs, I cannot imagine this change to
doubling is desirable or intentional.
This is especially costly when using composite buffers. They
frequently allocate components with a capacity that is a power of
2, and they never attempt to expand components themselves. The end
result is that heavy use of pool-backed composite buffers wastes
almost half of the memory pool (the smaller / initial components are
<512 and so are not affected by the off-by-one bug).
Modifications:
Although I find it difficult to believe that such an optimization
is really helpful, I left it in and fixed the off-by-one issue by
decrementing the value at the start.
I also added a simple test to both attempt to verify that the
decrement fixes the issue without introducing any other change, and
to make it easy for a reviewer to test the existing behavior. PoolArena
does not seem to have much testing or testability support though so
the test is kind of a hack and will break for unrelated changes. I
suggest either removing it or factoring out the single non-static
portion of normalizeCapacity so that the fragile dummy PoolArena is
not required.
Result:
Pooled allocators will allocate less resources to the highly
inefficient and undocumented buffer section between length and
maxLength.
Composite buffers of non-trivial size that are backed by pooled
allocators will use about half as much memory.
Motivation:
At the moment we create a HashMap that holds the MembershipKeys for multicast with every NioDatagramChannel even when most people not need it at al
Modifications:
Lazy create the HashMap when needed.
Result:
Less memory usage and less object creation
Motivation:
Currently, there exists no example which shows how to use the memcache binary
protocol.
Modifications:
Add an example client and client handler to show how to utilize the binary
protocol in a memcache client with a simple interactive shell.
Result:
Users looking for an example can now start off with the provided one.
Motivation:
The current HTTP/2 support does not properly comply with the HTTP/2 spec
wrt startup.
Modifications:
Changed the frame codec as well as the connection handler to support
exchange of the connection preface, followed immediately by an initial
settings frame.
Result:
The HTTP/2 initialization handshake will be in compliance with the spec.
Will need more work to support the upgrade protocols, however :)
Motivation:
We sometimes see data corruption when writing to the EpollSocketChannel.
Modifications:
Correctly update the position of the ByteBuffer after something was written.
Result:
Fix data-corruption which could happen on partial writes
Motivation:
At the moment we create new ThreadPoolCache whenever a Thread tries either allocate or release something on the PooledByteBufAllocator. When something is released we put it then in its ThreadPoolCache. The problem is we never check if a Thread is not alive anymore and so we may end up with memory that is never freed again if a user create many short living Threads that use the PooledByteBufAllocator.
Modifications:
Periodically check if the Thread is still alive that has a ThreadPoolCache assinged and if not free it.
Result:
Memory is freed up correctly even for short living Threads.
Motivation:
When using System.getProperty(...) and various methods to get a ClassLoader it will fail when a SecurityManager is in place.
Modifications:
Use a priveled block if needed. This work is based in the PR #2353 done by @anilsaldhana .
Result:
Code works also when SecurityManager is present
Motivation:
Provide some example code to show how to bootstrap client and server for
use with HTTP/2 framing.
Modifications:
- Fixed Http2ConnectionHandler to allow headers after stream creation.
Needed for response headers.
- Added toString() to all frame classes to help with debugging/logging
- Added example classes for HTTP/2
Result:
HTTP/2 connections now properly support response headers. Examples for
HTTP/2 provided with the distribution of examples module.
After your change, what will change.
Motivation:
At the moment a user can not safetly call slice().retain() or duplicate.retain()in the ByteToMessageDecoder.decode(...) implementation without the risk to see coruption because we may call discardSomeReadBytes() to make room on the buffer once the handling is done.
Modifications:
Check for the refCnt() before call discardSomeReadBytes() and also check before call decode(...) to create a copy if needed.
Result:
The user can safetly call slice().retain() or duplicate.retain() in his/her ByteToMessageDecoder.decode(...) method.
Motivation:
Bring the http2 codec more inline with the rest of Netty's code-base.
Modifications:
- Add null checks in constructors
- Move private static methods into inner-classes where they are used
- Use getBytes(CharsetUtil.UTf8)
Result:
More consistent code-base
Motivation:
The use of the guava library library is fairly superficial and can
easily be removed to reduce the overall dependency list of the module.
Cleaning up the HTTP2 code to use common Netty utilities, etc.
Modifications:
- Various changes to remove use of guava preconditions and collections.
- Changed default charset to use CharsetUtil.UTF_8.
- Changed StreamState to use ArrayDeque instead of LinkedList.
- Changed precondition checks to throw NPE
- Changed implementation of Http2Headers to more closely mirror the SPDY
implementation.
Result:
The behavior of the HTTP2 module will remain unchanged.
Motivation:
Because we not null out the array entry in the SelectionKey[] which is produced by SelectedSelectionKeySet.flip() we may end up with a few SelectionKeyreferences still hanging around here even after the Channel was closed. As these entries may be present at the end of the SelectionKey[] which is never updated for a long time as not enough SelectionKeys are ready.
Modifications:
Once we access the SelectionKey out of the SelectionKey[] we directly null it out.
Result:
Reference can be GC'ed right away once the Channel was closed.
Motivation:
EpollSocketChannel.remoteAddress0() is always null on accepted EpollSocketChannels as we not set it excplicit.
Modifications:
Correctly retrieve the local and remote address when accept new channel and store it
Result:
EpollSocketchannel.remoteAddress0() and EpollSocketChannel.localAddress0() return correct addresses
Motivation:
At the moment we do a Channel.isActive() check in every AbstractChannel.AbstractUnsafe.write(...) call which gives quite some overhead as shown in the profiler when you write fast enough. We can eliminate the check and do something more smart here.
Modifications:
Remove the isActive() check and just check if the ChannelOutboundBuffer was set to null before, which means the Channel was closed. The rest will be handled in flush0() anyway.
Result:
Less overhead when doing many write calls
Motivation:
Native.epollCreate(...) fails on systems using a kernel < 2.6.27 / glibc < 2.9 because it uses epoll_create1(...) without checking if it is present
Modifications:
Check if epoll_create1(...) exists abd if not fall back to use epoll_create(...)
Result:
Works even on systems with kernel < 2.6.27 / glibc < 2.9
Motivation:
In SslHandler.safeClose(...) we attach a ChannelFutureListener to the flushFuture and will notify the ChannelPromise which was used for close(...) in it. The problem here is that we only call ChannelHandlerContext.close(ChannelPromise) if Channel.isActive() is true and otherwise not notify it at all. We should just call ChannelHandlerContext.close(ChannelPromise) in all cases.
Modifications:
Always call ChannelHandlerContext.close(ChannelPromise) in the ChannelFutureListeiner
Result:
ChannelPromise used for close the Channel is notified in all cases
Motivation:
At the moment an IllegalArgumentException will be thrown if a ChannelPromise is cancelled while propagate through the ChannelPipeline. This is not correct, we should just stop to propagate it as it is valid to cancel at any time.
Modifications:
Stop propagate the operation through the ChannelPipeline once a ChannelPromise is cancelled.
Result:
No more IllegalArgumentException when cancel a ChannelPromise while moving through the ChannelPipeline.
Motivation:
Currently the CORS support only handles a single origin, or a wildcard
origin. This task should enhance Netty's CORS support to allow multiple
origins to be specified. Just being allowed to specify one origin is
particulary limiting when a site support both http and https for
example.
Modifications:
- Updated CorsConfig and its Builder to accept multiple origins.
Result:
Users are now able to configure multiple origins for CORS.
[https://github.com/netty/netty/issues/2346]
Motivation:
In ChunkedWriteHandler, there is a redundant variable that servers
no purpose. It implies that under some conditions you might not want
to flush.
Modifications:
Removed the variable and the if condition that read it. The boolean
was always true so just removing the if statement was fine.
Result:
Slightly less misleading code.
Motivation:
Reduce memory usage in ProtobufVarint32LengthFieldPrepender.
Modifications:
Explicit set the buffer size that is needed for the header (between 1 and 5 bytes).
Result:
Less memory usage in ProtobufVarint32LengthFieldPrepender.
Motivation:
Needed a rough performance comparison between SPDY and HTTP 2.0 framing.
Expected performance gains were seen in HTTP 2.0 due to header
compression.
Modifications:
Added a new codec-http2 module containing all of the new source and unit
tests. Updated the top-level pom.xml to add this as a child module.
Result:
Netty will have basic support for HTTP2.
Motivation:
EventLoop overrides the next() method with the same signature of its
super class EventLoopGroup.
Modifications:
Remove the duplicate method declaration from the EventLoop interface.
Result:
Perhaps makes the API slightly more readable as there is one less
method.
Motivation:
Minor spelling/typo correction for EventExecutor's newSucceededFuture and
newFailedFuture javadocs.
Modifications:
Just correcting the spelling/typo.
Result:
Improve the javadocs.
Motivation:
At the moment we use the system-wide default selector provider for this invocation of the Java virtual machine when constructing a new NIO channel, which makes using an alternative SelectorProvider practically useless.
This change allows user specify his/her preferred SelectorProvider.
Modifications:
Add SelectorProvider as a param for current `private static *Channel newSocket` method of NioSocketChannel, NioServerSocketChannel and NioDatagramChannel.
Change default constructors of NioSocketChannel, NioServerSocketChannel and NioDatagramChannel to use DEFAULT_SELECTOR_PROVIDER when calling newSocket(SelectorProvider).
Add new constructors for NioSocketChannel, NioServerSocketChannel and NioDatagramChannel which allow user specify his/her preferred SelectorProvider.
Result:
Now users can specify his/her preferred SelectorProvider when constructing an NIO channel.
Motivation:
Make sure the remote/local InetSocketAddress can be obtained correctly
Modifications:
Set the remote/local InetSocketAddress after a bind/connect operation was performed
Result:
It is possible to still access the informations even after the fd became invalid. This mirror the behaviour of NIO.
Motivation:
Previously, we used URLDecoder.decode(...) to decode url-encoded data. This generates a lot of garbage and takes a considerable amount of time.
Modifications:
Replace URLDecoder.decode(...) with QueryStringDecoder.decodeComponent(...)
Result:
Less garbage to GC and faster decode processing.
Motivation:
When running the build with Java 8 the following error occurred:
java: reference to preflightResponseHeader is ambiguous
both method
<T>preflightResponseHeader(java.lang.CharSequence,java.lang.Iterable<T>)
in io.netty.handler.codec.http.cors.CorsConfig.Builder and method
<T>preflightResponseHeader(java.lang.String,java.util.concurrent.Callable<T>)
in io.netty.handler.codec.http.cors.CorsConfig.Builder match
The offending class was CorsConfigTest and its shouldThrowIfValueIsNull
which contained the following line:
withOrigin("*").preflightResponseHeader("HeaderName", null).build();
Modifications:
Updated the offending method with to supply a type, and object array, to
avoid the error.
Result:
After this I was able to build with Java 7 and Java 8
Motivation:
An intermediary like a load balancer might require that a Cross Origin
Resource Sharing (CORS) preflight request have certain headers set.
As a concrete example the Elastic Load Balancer (ELB) requires the
'Date' and 'Content-Length' header to be set or it will fail with a 502
error code.
This works is an enhancement of https://github.com/netty/netty/pull/2290
Modifications:
CorsConfig has been extended to make additional HTTP response headers
configurable for preflight responses. Since some headers, like the
'Date' header need to be generated each time, m0wfo suggested using a
Callable.
Result:
By default, the 'Date' and 'Content-Lenght' headers will be sent in a
preflight response. This can be overriden and users can specify
any headers that might be required by different intermediaries.
Motivation:
Previously, we used SecureRandom.nextLong() to generate the initialSeedUniquifier. This required more entrophy than necessary because it has to 1) generate the seed of SecureRandom first and then 2) generate a random long integer. Instead, we can use generateSeed() to skip the step (2)
Modifications:
Use generateSeed() instead of nextLong()
Result:
ThreadLocalRandom requires less amount of entrphy to start up
Motivation:
Some operating systems like Windows 7 uses a valid globally unique EUI-64 MAC address for a virtual device (e.g. 00:00:00:00:00:00:00:E0), and because it's usually longer than the legit MAC-48 address, we should not use the length of MAC address when two MAC addresses are of the same quality. Instead, we should compare the INET address of the NICs before comparing the length of the MAC addresses.
Modification:
Compare the length of MAC addresses as a last resort.
Result:
Correct MAC address detection in Windows with IPv6 enabled.
Motivation:
When there are two MAC addresses which are good enough, we can choose the one with better IP address rather than just choosing the first appeared one.
Modification:
Replace isBetterAddress() with compareAddresses() to make it return if both addresses are in the same preference level.
Add compareAddresses() which compare InetAddresses and use it when compareAddress(byte[], byte[]) returns 0 (same preference)
Result:
More correct primary MAC address detection
Motivation:
As reported in #2331, some query operations in NetworkInterface takes much longer time than we expected. For example, specifying -Djava.net.preferIPv4Stack=true option in Window increases the execution time by more than 4 times. Some Windows systems have more than 20 network interfaces, and this problem gets bigger as the number of unused (virtual) NICs increases.
Modification:
Use NetworkInterface.getInetAddresses() wherever possible.
Before iterating over all NICs reported by NetworkInterface, filter the NICs without proper InetAddresses. This reduces the number of candidates quite a lot.
NetUtil does not query hardware address of NIC in the first place but uses InetAddress.isLoopbackAddress().
Do not call unnecessary query operations on NetworkInterface. Just get hardware address and compare.
Result:
Significantly reduced class initialization time, which should fix#2331.
Motivation:
Remove the synchronization bottleneck in PoolArena and so speed up things
Modifications:
This implementation uses kind of the same technics as outlined in the jemalloc paper and jemalloc
blogpost https://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919.
At the moment we only cache for "known" Threads (that powers EventExecutors) and not for others to keep the overhead
minimal when need to free up unused buffers in the cache and free up cached buffers once the Thread completes. Here
we use multi-level caches for tiny, small and normal allocations. Huge allocations are not cached at all to keep the
memory usage at a sane level. All the different cache configurations can be adjusted via system properties or the constructor
directly where it makes sense.
Result:
Less conditions as most allocations can be served by the cache itself
Motivation:
The current ascii table [1] showing the various states that a ChannelFuture
can be in, but the table is slightly "off" for the 'cause'.
Modifications:
- Updated the ascii table to display nicely.
Result:
Easier to read the states of a ChannelFuture.
[1] http://netty.io/5.0/api/io/netty/channel/ChannelFuture.html
Motivation:
6e8ba291cf introduced a regression in Android because Android does not have sun.nio.ch.DirectBuffer (see #2330.) I also found PlatformDependent0.freeDirectBuffer() and freeDirectBufferUnsafe() are pretty much same after the commit and the unsafe version should be removed.
Modifications:
- Do not use the pooled allocator in Android because it's too resource hungry for Androids.
- Merge PlatformDependent0.freeDirectBuffer() and freeDirectBufferUnsafe() into one method.
- Make the Unsafe unavailable when sun.nio.ch.DirectBuffer is unavailable. We could keep the Unsafe available and handle the sun.nio.ch.DirectBuffer case separately, but I don't want to complicate our code just because of that. All supported JDK versions have sun.nio.ch.DirectBuffer if the Unsafe is available.
Result:
Simpler code. Fixes Android support (#2330)
Motivation:
Currently we use System.currentTimeMillis() in our timeout handlers this is bad
for various reasons like when the clock adjusts etc.
Modifications:
Replace System.currentTimeMillis() with System.nanoTime()
Result:
More robust timeout handling
Motivation:
I was studying the code and thought this was simpler and easier to
understand.
Modifications:
Replaced the for loop and if conditions, with a simple implementation.
Result:
Code is easier to understand.
Motivation:
While investigating the recent CI machine crashes, I observed that the
JVM processes spawned by surefire sometimes take up to 1 GiB RAM.
Consuming large amount of memory isn't really a problem, but we need to
make sure no GC trashing is occuring during the tests.
Modifications:
Add -verbose:gc option to the test JVM arguments
Result:
We can determine if there is any GC anomalies going on in our CI
machine.
Motivation:
The epoll testsuite tests the epoll transport only against itself (i.e. epoll x epoll only). We should test the epoll transport also against the well-tested NIO transport, too.
Modifications:
- Make SocketTestPermutation extensible and reusable so that the epoll testsuite can take advantage of it.
- Rename EpollTestUtils to EpollSocketTestPermutation and make it extend SocketTestPermutation.
- Overall clean-up of SocketTestPermutation
- Use Arrays.asList() for simplicity
- Add combo() method to remove code duplication
Result:
The epoll transport is now also tested against the NIO transport. SocketTestPermutation got cleaner.