Motivation:
Currently our epoll native transport requires sun.misc.Unsafe and so we need to take this into account for Epoll.isAvailable().
Modifications:
Take into account if sun.misc.Unsafe is present.
Result:
Only return true for Epoll.isAvailable() if sun.misc.Unsafe is present.
Motivation:
If Netty's class files are renamed and the type references are updated (shaded) the native libraries will not function. The native epoll module uses implicit JNI bindings which requires the fully qualified java type names to match the method signatures of the native methods. This means EPOLL cannot be used with a shaded Netty.
Modifications:
- Make the JNI method registration dynamic
- support a system property io.netty.packagePrefix which must be prepended to the name of the native library (to ensure the correct library is loaded) and all class names (to allow classes to be correctly referenced)
- remove system property io.netty.native.epoll.nettyPackagePrefix which was recently added and the code to support it was incomplete
Result:
transport-native-epoll can be used when Netty has been shaded.
Fixes https://github.com/netty/netty/issues/4800
Motivation:
If an user will close a Socket / FileDescriptor multiple times we should handle the extra close operations as NOOP.
Modifications:
Only do the actual closing one time
Result:
No exception if close is called multiple times.
Motivation:
We need to remove all registered events for a Channel from the EventLoop before doing the actual close to ensure we not produce a cpu spin when the actual close operation is delayed or executed outside of the EventLoop.
Modifications:
Deregister for events for NIO and EPOLL socket implementations when SO_LINGER is used.
Result:
No more cpu spin.
Motivation:
We should retain the original hostname when connect to a remote peer so the user can still query the origin hostname if getHostString() is used.
Modifications:
Compute a InetSocketAddress from the original remote address and the one returned by the Os.
Result:
Same behavior when using epoll transport and nio transport.
Motivation:
Fix a race-condition when closing NioSocketChannel or EpollSocketChannel while try to detect if a close executor should be used and the underlying socket was already closed. This could lead to an exception that then leave the channel / in an invalid state and so could lead to side-effects like heavy CPU usage.
Modifications:
Catch possible socket exception while try to get the SO_LINGER options from the underlying socket.
Result:
No more race-condition when closing the channel is possible with bad side-effects.
Motivation:
AbstractEpollStreamChannel has a queue which collects splice events. Splice is assumed not to be the most common use case of this class and thus the splice queue could be initialized in a lazy fashion to save memory. This becomes more significant when the number of connections grows.
Modifications:
- AbstractEpollStreamChannel.spliceQueue will be initialized in a lazy fashion
Result:
Less memory consumption for most use cases
Motivation:
We should use OneTimeTask where possible to reduce object creation.
Modifications:
Replace Runnable with OneTimeTask
Result:
Less object creation
Motivation:
If we have a lot of writes going on we currently need to lookup the IovArray for each Channel that does writes. This can have quite some perf overhead. We should not need to do this and just store a reference of the IovArray on the EpollEventLoop itself.
Modifications:
- Remove IoArrayThreadLocal
- Store the IoArray in the EventLoop itself
Result:
Less FastThreadLocal lookups
Motivation:
If ChannelOption.ALLOW_HALF_CLOSURE is true and the shutdown input operation fails we should not propagate this exception, and instead consider this socket's read as half closed.
Modifications:
- AbstractEpollChannel.shutdownInput should not propagate exceptions when attempting to shutdown the input, but instead should just close the socket
Result:
Users expecting a ChannelInputShutdownEvent will get this event even if the socket is already shutdown, and the shutdown operation fails.
Motivation:
The EPOLL module was not completly respecting the half closed state. It may have missed events, or procssed events when it should not have due to checking isOpen instead of the appropriate shutdown state.
Modifications:
- use FileDescriptor's isShutdown* methods instead of isOpen to check for processing events.
Result:
Half closed code in EPOLL module is more correct.
Motivation:
transport-native-epoll is designed to be specific to Linux. However there is native code that can be extracted out and made to work on more Unix like distributions. There are a few steps to be completely decoupled but the first step is to extract out code that can run in a more general Unix environment from the Linux specific code base.
Modifications:
- Move all non-Linux specific stuff from Native.java into the io.netty.channel.unix package.
- io.netty.channel.unix.FileDescriptor will inherit all the native methods that are specific to file descriptors.
- io_netty_channel_epoll_Native.[c|h] will only have code that is specific to Linux.
Result:
Code is decoupled and design is streamlined in FileDescriptor.
Motivation:
If a RDHUP and IN event occurred at the same time it is possible we may not read all pending data on the channel. We should ensure we read data before processing the RDHUP event.
Modifications:
- Process the RDHUP event before the IN event.
Result:
Data will not be dropped.
Fixes https://github.com/netty/netty/issues/4317
Motivation:
EPOLL attempts to support half closed socket, but fails to call shutdown to close the read portion of the file descriptor.
Motivation:
- If half closed is supported shutting down the input should call underlying Native.shutdown(...) to make sure the peer is notified of the half closed state.
Result:
EPOLL half closed is more correct.
Motivation:
We should fail the build on warnings in the JNI/c code.
Modifications:
- Add GCC flag to fail build on warnings.
- Fix warnings (which also fixed a bug when using splice with offsets).
Result:
Better code quality.
Motivation:
We should call shutdown(...) on the socket before closing the filedescriptor to ensure it is closed gracefully.
Modifications:
Call shutdown(...) before close.
Result:
Sockets are gracefully shutdown when using native transport.
Motivation:
writeBytes(...) missed to set EPOLLOUT flag when not all bytes were written. This could lead to have the EpollEventLoop not try to flush the remaining bytes once the socket becomes writable again.
Modifications:
- Move setting EPOLLOUT flag logic to one point so we are sure we always do it.
- Move OP_WRITE flag logic to one point as well.
Result:
Correctly try to write pending data if socket becomes writable again.
Motivation:
TCP Fast Open allows data to be carried in the SYN and SYN-ACK packets and consumed by the receiving end during the initial connection handshake, and saves up to one full round-trip time (RTT) compared to the standard TCP, which requires a three-way handshake (3WHS) to complete before data can be exchanged. This commit enables support for TFO on server sockets.
Modifications:
Added new Integer Option TCP_FASTOPEN in EpollChannelOption.
Added getters/setters in EpollServerChannelConfig for TCP_FASTOPEN.
Added way to check if TCP_FASTOPEN is supported on server in Native.
Added setting on socket opt TCP_FASTOPEN if value is set on channel options in doBind in EpollServerSocketChannel.
Enhanced EpollSocketTestPermutation to contain a permutation for server socket containing fast open.
Result:
Users of native-epoll can set TCP_FASTOPEN on server sockets and thus leverage fast connect features of RFC7413 if client is capable of it.
Conflicts:
transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollChannelOption.java
Motivation:
There are protocols (BGP, SXP), which are typically deployed with TCP
MD5 authentication to protect sessions from being hijacked/torn down by
third parties. This facility is not available on most operating systems,
but is typically present on Linux.
Modifications:
- add a new EpollChannelOption, which is write-only
- teach Epoll(Server)SocketChannel to track which addresses have keys
associated
- teach Native how to set the MD5 signature keys for a socket
Result:
Users of the native-epoll transport can set MD5 signature keys and thus
leverage RFC-2385 protection on TCP connections.
Motivation:
See #4174.
Modifications:
Modify transport-native-epoll to allow setting TCP_USER_TIMEOUT.
Result:
Hanging connections that are written into will get timeouted.
Conflicts:
transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollChannelOption.java
Motivation:
When try to get SO_LINGER from a fd that is closed an Exception is thrown. We should only try to get SO_LINGER if the fd is still open otherwise an Exception is thrown that can be ignored anyway.
Modifications:
First check if the fd is still open before try to obtain SO_LINGER setting when get the closeExecutor. This is also the same that we do in the NIO transport.
Result:
No more exception when calling unsafe.close() on a channel that has a closed file descriptor.
Motivation:
Commit cf171ff52555b9e984a3b9103287f6b897dc8626 changed the way read operations were done. This change introduced a feedback loop between fireException and epollInReady.
Modifications:
- All EPOLL*Channel* classes should not call fireException and also continue to read. Instead a read operation should be executed on the eventloop (if the channel's input is not closed, and other conditions are satisfied)
Result:
Exception processing and channelRead will not be in a feedback loop.
Fixes https://github.com/netty/netty/issues/4091
Motivation:
Because of java custom UTF encoding, it was previously impossible to use
nul-bytes in domain socket names, which is required for abstract domain
sockets.
Modifications:
- Pass the encoded string byte array to the native code
- Modify native code accordingly to work with nul-bytes in the the
array.
- Move the string encoding to UTF-8 in java code.
Result:
Unix domain socket addresses will work properly if they contain nul-
bytes. Address encoding for these addresses changes from UTF-8-like to
real UTF-8.
Motivation:
If is enabled and a channel is half closed it is possible for the EPOLL event loop to get into an infinite loop by continuously being woken up on the EPOLLRDHUP event.
Modifications:
- Ensure that the EPOLLRDHUP event is unregistered for to prevent infinite loop.
Result:
1 less infinite loop.
Motiviation:
The current read loops don't fascilitate reading a maximum amount of bytes. This capability is useful to have more fine grain control over how much data is injested.
Modifications:
- Add a setMaxBytesPerRead(int) and getMaxBytesPerRead() to ChannelConfig
- Add a setMaxBytesPerIndividualRead(int) and getMaxBytesPerIndividualRead to ChannelConfig
- Add methods to RecvByteBufAllocator so that a pluggable scheme can be used to control the behavior of the read loop.
- Modify read loop for all transport types to respect the new RecvByteBufAllocator API
Result:
The ability to control how many bytes are read for each read operation/loop, and a more extensible read loop.
Motivation:
IP_FREEBIND allows to bind to addresses without the address up yet or even the interface configured yet.
Modifications:
Add support for IP_FREEBIND.
Result:
It's now possible to use IP_FREEBIND when using the native epoll transport.
Motivation:
It would be useful to support the Java `Map` interface in our primitive maps.
Modifications:
Renamed current methods to "pXXX", where p is short for "primitive". Made the template for all primitive maps extend the appropriate Map interface.
Result:
Fixes#3970
Motivation:
We missed to register for EPOLLRDHUP events when construct the EpollSocketChannel from an existing FileDescriptor. This could cause to miss connection-resets.
Modifications:
Add Native.EPOLLRDHUP to the events we are interested in.
Result:
Connection-resets are detected correctly.
Motivation:
Some glibc/kernel versions will trigger an EPOLLERR event to notify
about failed connect and not an EPOLLOUT. Also EPOLLERR may be triggered
when a connection is broke.
Modification:
React on EPOLLERR like if an EPOLLOUT / EPOLLIN was received, this will work in
all cases as we handle errors in EPOLLOUT / EPOLLIN anyway.
Result:
Correctly detect errors.
Motiviation:
Linux provides the TCP_NOTSENT_LOWAT socket option. This can be used to control how much unsent data is queued in the tcp kernel buffers. This can be important when application level protocols (SPDY, HTTP/2) have their own priority mechanism and don't want data queued in the kernel.
Modifications:
- The epoll module will have an additional socket option TCP_NOTSENT_LOWAT
- There will be JNI methods to control the underlying linux socket option mechanism
Result:
Linux EPOLL module exposes the TCP_NOTSENT_LOWAT socket option.
Motivation:
Due a bug we not correctly handled connection refused errors and so failed the connect promise with the wrong exception.
Beside this we some times even triggered fireChannelActive() which is not correct.
Modifications:
- Add testcase
- correctly detect connect errors
Result:
Correct and consistent handling.
Motivation:
When trying to write more then Integer.MAX_VALUE / SSIZE_MAX via writev(...) the OS may return EINVAL depending on the kernel or the actual OS (bsd / osx always return EINVAL). This will trigger an IOException.
Modifications:
Never try to write more then Integer.MAX_VALUE / SSIZE_MAX when using writev.
Result:
No more IOException when write more data then Integer.MAX_VALUE / SSIZE_MAX via writev.
Motivation:
When EPOLLRDHUP is received we need to try to read at least one time to ensure
that we read all pending data from the socket. Otherwise we may loose data.
Modifications:
- Ensure we read all data from socket
- Ensure file descriptor is closed on doClose() even if doDeregister() throws an Exception.
- Only handle either EPOLLRDHUP or EPOLLIN as only one is needed to detect connection reset.
Result:
No more data loss on connection reset.
Motivation:
When using epoll_ctl we should respect the return value and do the right thing depending on it.
Modifications:
Adjust java and native code to respect epoll_ctl return values.
Result:
Correct and cleaner code.
Motivation:
Linux supports splice(...) to transfer data from one filedescriptor to another without
pass data through the user-space. This allows to write high-performant proxy code or to stream
stuff from the socket directly the the filesystem.
Modification:
Add AbstractEpollStreamChannel.spliceTo(...) method to support splice(...) system call
Result:
Splice is now supported when using the native linux transport.
Conflicts:
transport-native-epoll/src/main/java/io/netty/channel/epoll/AbstractEpollStreamChannel.java
Motivation:
Because of a bug we missed to fail the connect future when doClose() is called. This can lead to a future which is never notified and so may lead to deadlocks in user-programs.
Modifications:
Correctly fail the connect future when doClose() is called and the connection was not established yet.
Result:
Connect future is always notified.
Motivation:
Each different *ChannelOption did extend ChannelOption in 4.0, which we changed in 4.1. This is a breaking change in terms of the API so we need to ensure we keep the old hierarchy.
Modifications:
- Let all *ChannelOption extend ChannelOption
- Add back constructor and mark it as @deprecated
Result:
No API breakage between 4.0 and 4.1
Motivation:
As we missed to correctly handle EPOLLRDHUP we produce an IOException which is unnessary. This leads
to have exceptionCaught(...) methods called.
Modifications:
When EPOLLRDHUP was received just close the socket and fail all pending writes.
Result:
Correctly handle of EPOLLRDHUP and so not miss-leading exceptions.
Motivation:
Due a a regression that was introduced by b898bdd we failed to set the localAddress if the connect did not success directly.
Modifications:
Correct set localAddress in doConnect(...)
Result:
Be able to get the localAddress in all cases.
Motivation:
During 6b941e9bdbc1b1a9090c280bc6c44903ff7c7b67 I introduced a regression that could cause an IllegalStateException.
A non-proper fix was commited as part of #3443. This commit add a proper fix.
Modifications:
Remove FileDescriptor.INVALID and add FileDescriptor.isOpen() as replacement. Once FileDescriptor.close() is called isOpen() will return false.
Result:
No more IllegalStateException caused by a close channel.
Motivation:
EpollDragramChannel never calls fireChannelActive after connect() which is a bug.
Modifications:
Correctly call fireChannelActive if needed
Result:
Correct behaviour
Motivation:
Because of a regression sometimes accept could produce an IllegalArgumentException
Modifications:
Correctly respect offset when decode port and scope id.
Result:
No more IllegalArgumentException
Motivation:
This is a regression that was introduced as part of 6b941e9bdbc1b1a9090c280bc6c44903ff7c7b67. The regression could produce an "infinity" triggering of IllegalStateException if a channel goes inactive while process the events for it.
Modifications:
Correctly check if the channel is still active before trigger the callbacks.
Result:
No more IllegalStateException
Motivation:
There is a small race in the native transport where an accept(...) may success but a later try to obtain the remote address from the fd may fail is the fd is already closed.
Modifications:
Let accept(...) directly set the remote address.
Result:
No more race possible.
Motivation:
When epoll LT is used and autoRead == false when entering epollIn() we need to return without reading any data.
Modifications:
Correctly respect autoRead == false if using epoll LT.
Result:
Consistent and correct behaviour.
Motivation:
In the native transport we should throw a pre-instanced IOException on connection reset while reading.
Modifications:
Correctly throw pre-instanced IOException when ECONNRESET is received
Result:
Less overhead on connection reset
Motivation:
As we plan to have other native transports soon (like a kqueue transport) we should move unix classes/interfaces out of the epoll package so we
introduce other implementations without breaking stuff before the next stable release.
Modifications:
Create a new io.netty.channel.unix package and move stuff over there.
Result:
Possible to introduce other native impls beside epoll.