Motivation:
At the moment we throw a ChannelException if netty_epoll_linuxsocket_setTcpMd5Sig fails. This is inconsistent with other methods which throw a IOException.
Modifications:
Throw IOException
Result:
More correct and consistent exception usage in epoll transport
Motivation:
When netty_epoll_linuxsocket_setTcpMd5Sig fails to init the sockaddr we should throw an exception and not silently return.
Modifications:
Throw exception if init of sockaddr fails.
Result:
Correctly report back error to user.
…nterface, block or loopback-mode-disabled operations).
Motivation:
Provide epoll/native multicast to support high load multicast users (we are using it for a high load telecomm app at my day job).
Modification:
Added support for (ipv4 only) source specific and any source multicast for epoll transport. Some caveats (beyond no ipv6 support initially - there’s a bit of work to add in join and leave group specifically around SSM, as ipv6 uses different data structures for this): no support for disabling loop back mode, retrieval of interface and block operation, all of which tend to be less frequently used.
Result:
Provides epoll transport multicast for IPv4 for common use cases. Understand if you’d prefer to hold off until ipv6 is included but not sure when I’ll be able to get to that.
Motivation:
Add an option (through a SelectStrategy return code) to have the Netty event loop thread to do busy-wait on the epoll.
The reason for this change is to avoid the context switch cost that comes when the event loop thread is blocked on the epoll_wait() call.
On average, the context switch has a penalty of ~13usec.
This benefits both:
The latency when reading from a socket
Scheduling tasks to be executed on the event loop thread.
The tradeoff, when enabling this feature, is that the event loop thread will be using 100% cpu, even when inactive.
Modification:
Added SelectStrategy option to return BUSY_WAIT
Epoll loop will do a epoll_wait() with no timeout
Use pause instruction to hint to processor that we're in a busy loop
Result:
When enabled, minimizes impact of context switch in the critical path
Motivation:
The Epoll transport checks to see if there are any scheduled tasks
before entering epoll_wait, and resets the timerfd just before.
This causes an extra syscall to timerfd_settime before doing any
actual work. When scheduled tasks aren't added frequently, or
tasks are added with later deadlines, this is unnecessary.
Modification:
Check the *deadline* of the peeked task in EpollEventLoop, rather
than the *delay*. If it hasn't changed since last time, don't
re-arm the timer
Result:
About 2us faster on gRPC RTT 50pct latency benchmarks.
Before (2 runs for 5 minutes, 1 minute of warmup):
```
50.0%ile Latency (in nanos): 64267
90.0%ile Latency (in nanos): 72851
95.0%ile Latency (in nanos): 78903
99.0%ile Latency (in nanos): 92327
99.9%ile Latency (in nanos): 119691
100.0%ile Latency (in nanos): 13347327
QPS: 14933
50.0%ile Latency (in nanos): 63907
90.0%ile Latency (in nanos): 73055
95.0%ile Latency (in nanos): 79443
99.0%ile Latency (in nanos): 93739
99.9%ile Latency (in nanos): 123583
100.0%ile Latency (in nanos): 14028287
QPS: 14936
```
After:
```
50.0%ile Latency (in nanos): 62123
90.0%ile Latency (in nanos): 70795
95.0%ile Latency (in nanos): 76895
99.0%ile Latency (in nanos): 90887
99.9%ile Latency (in nanos): 117819
100.0%ile Latency (in nanos): 14126591
QPS: 15387
50.0%ile Latency (in nanos): 61021
90.0%ile Latency (in nanos): 70311
95.0%ile Latency (in nanos): 76687
99.0%ile Latency (in nanos): 90887
99.9%ile Latency (in nanos): 119527
100.0%ile Latency (in nanos): 6351615
QPS: 15571
```
Motivation:
When using Epoll based transport, allow applications to configure SO_BUSY_POLL socket option:
SO_BUSY_POLL (since Linux 3.11)
Sets the approximate time in microseconds to busy poll on a
blocking receive when there is no data. Increasing this value
requires CAP_NET_ADMIN. The default for this option is con‐
trolled by the /proc/sys/net/core/busy_read file.
The value in the /proc/sys/net/core/busy_poll file determines
how long select(2) and poll(2) will busy poll when they oper‐
ate on sockets with SO_BUSY_POLL set and no events to report
are found.
In both cases, busy polling will only be done when the socket
last received data from a network device that supports this
option.
While busy polling may improve latency of some applications,
care must be taken when using it since this will increase both
CPU utilization and power usage.
Modification:
Added SO_BUSY_POLL socket option
Result:
Able to configure SO_BUSY_POLL from Netty
Motivation:
We should ensure we call *UnLoad when we detect an error during calling *OnLoad and previous *OnLoad calls were succesfull.
Modifications:
Correctly call *UnLoad when needed.
Result:
More correct code and no leaks when an error happens during loading the native lib.
* Allow to use native transports when sun.misc.Unsafe is not present on the system
Motivation:
We should be able to use the native transports (epoll / kqueue) even when sun.misc.Unsafe is not present on the system. This is especially important as Java11 will be released soon and does not allow access to it by default.
Modifications:
- Correctly disable usage of sun.misc.Unsafe when -PnoUnsafe is used while running the build
- Correctly increment metric when UnpooledDirectByteBuf is allocated. This was uncovered once -PnoUnsafe usage was fixed.
- Implement fallbacks in all our native transport code for when sun.misc.Unsafe is not present.
Result:
Fixes https://github.com/netty/netty/issues/8229.
Motivation:
We should support to load multiple shaded versions of the same netty artifact as netty is often used in multiple dependencies.
This is related to https://github.com/netty/netty/issues/7272.
Modifications:
- Use -fvisibility=hidden when compiling and use JNIEXPORT for things we really want to have exported
- Ensure fields are declared as static so these are not exported
- Adjust testsuite-shading to use install_name_tool on MacOS to change the id of the lib. Otherwise the wrong may be used.
Result:
Be able to use multiple shaded versions of the same netty artifact.
Motivation:
DatagramPacket.recipient() doesn't return the actual destination IP, but the IP the app is bound to.
Modification:
- IP_RECVORIGDSTADDR option is enabled for UDP sockets, which allows retrieval of ancillary information containing the original recipient.
- _recvFrom(...) function from transport-native-unix-common/src/main/c/netty_unix_socket.c is modified such that if IP_RECVORIGDSTADDR is set, recvmsg is used instead of recvfrom; enabling the retrieval of the original recipient.
- DatagramSocketAddress also contains a 'local' address, representing the recipient.
- EpollDatagramChannel is updated to return the retrieved recipient address instead of the address the channel is bound to.
Result:
Fixes#4950.
Motivation:
The writeSpinCount currently loops over the same buffer, gathering
write, file write, or other write operation multiple times but will
continue writing until there is nothing left or the OS doesn't accept
any data for that specific write. However if the OS keeps accepting
writes there is no way to limit how much time we spend on a specific
socket. This can lead to unfair consumption of resources dedicated to a
single socket.
We currently don't limit the amount of bytes we attempt to write per
gathering write. If there are many more bytes pending relative to the
SO_SNDBUF size we will end up building iov arrays with more elements
than can be written, which results in extra iteration, conditionals,
and book keeping.
Modifications:
- writeSpinCount should limit the number of system calls we make to
write data, instead of applying to individual write operations
- IovArray should support a maximum number of bytes
- IovArray should support composite buffers of greater than size 1024
- We should auto-scale the amount of data that we attempt to write per
gathering write operation relative to SO_SNDBUF and how much data is
successfully written
- The non-unsafe path should also support a maximum number of bytes,
and respect the IOV_MAX limit
Result:
Write resource consumption can be bounded and gathering writes have
a limit relative to the amount of data which can actually be accepted
by the socket.
Motivation:
To better isolate OS system calls we should not call getsockopt directly but use our netty_unix_socket_getOption0 function. See is a followup of f115bf5.
Modifications:
Export netty_unix_socket_getOption0 by declaring it in the header file and use it
Result:
Better isolation of system calls.
Motivation:
If a user calls EpollSocketChannelConfig.getOptions() and TCP_FASTOPEN_CONNECT is not supported we throw an exception.
Modifications:
- Just return 0 if ENOPROTOOPT is set.
- Add testcase
Result:
getOptions() works as epxected.
Motivation:
Linux kernel 4.11 introduced a new socket option,
TCP_FASTOPEN_CONNECT, that greatly simplifies making TCP Fast Open
connections on client side. Usually simply setting the flag before
connect() call is enough, no more changes are required.
Details can be found in kernel commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=19f6d3f3
Modifications:
TCP_FASTOPEN_CONNECT socket option was added to EpollChannelOption
class.
Result:
Netty clients can easily make TCP Fast Open connections. Simply
calling option(EpollChannelOption.TCP_FASTOPEN_CONNECT, true) in
client bootstrap is enough (given recent enough kernel).
Motivation:
As noticed in https://stackoverflow.com/questions/45700277/
compilation can fail if the definition of a method doesn't
match the declaration. It's easy enough to add this in, and make
it easy to compile.
Modifications:
Add JNIEXPORT to the entry points.
* On Windows this adds: `__declspec(dllexport)`
* On Mac this adds: `__attribute__((visibility("default")))`
* On Linux (GCC 4.2+) this adds: ` __attribute__((visibility("default")))`
* On other it doesn't add anything.
Result:
Easier compilation
Motivation:
KQueueEventLoop and EpollEventLoop implement different approaches to applying a timeout of their respective poll calls. Epoll attempts to ensure the desired timeout is satisfied at the java layer and at the JNI layer, but it should be sufficient to account for spurious wakups at the JNI layer. Epoll timeout granularity is also limited to milliseconds which may be too large for some latency sensitive applications.
Modifications:
- Make EpollEventLoop wait method look like KQueueEventLoop
- Epoll should support a finer timeout granularity via timerfd_create. We can hide most of these details behind the epollWait0 JNI call to avoid crossing additional JNI boundaries.
Result:
More consistent timeout approach between KQueue and Epoll.
Motivation:
Due to an oversight (by myself), linking two JNI modules with
duplicate symbols fails in linking. This only seems to happen
some of the time (the behavior seems to be different between GCC
and Clang toolchains). For instance, including both netty tcnative
and netty epoll fails to link because of duplicate JNI_OnLoad
symobols.
Modification:
Do not define the JNI_OnLoad and JNI_OnUnload symbols when
compiling for static linkage, as indicated by the NETTY_BUILD_STATIC
preprocessor define. They are never directly called when
statically linked.
Result:
Able to statically compile epoll and tcnative code into a single
binary.
Motivation:
At the moment we try to load the library using multiple names which includes names using - but also _ . We should just use _ all the time.
Modifications:
Replace - with _
Result:
Fixes [#7069]
Motivation:
We used an int[] to store all values that are returned in the struct for TCP_INFO which is not good enough as it uses usigned int values.
Modifications:
- Change int[] to long[] and correctly cast values.
Result:
No more truncated values.
Motivation:
Enable static linking for Java 8. These commits are the same as those introduced to netty tcnative. The goal is to allow lots of JNI libraries to be statically linked together without having conflict `JNI_OnLoad` methods.
Modification:
* add JNI_OnLoad suffixes to enable static linking
* Add static names to the list of libraries that try to be loaded
* Enable compiling with JNI 1.8
* Sort includes
Result:
Enable statically linked JNI code.
Motivation:
Google requires stricter compilation by adding -Werror and enabling many other warnings.
Modification:
* fix warning caused by -Wmissing-braces
* Use the address of `sendmmsg` rather than the function itself when
checking for presence. This resovles the warning caused by
`-Wpointer-bool-conversion`.
More detail:
When compiling on Linux, `sendmmsg` is always present, so the
function is always nonnull. When compiling elsewhere, the
function is defined as `__attribute__((weak))` which means it
may be absent at link time. This is controlled by
`IO_NETTY_SENDMMSG_NOT_FOUND`, which is off by default.
The reason for the error is due to the risk of accidentally not
calling the function. By adding `&` before the function, there
is no ambiguity. (the result of the fn call cannot have its
address taken.)
* use != to check for sendmmsg
Result:
Easier compilation.
Motivation:
This allows netty to operate in 'transparent proxy' mode, intercepting connections
to other addresses by means of Linux firewalling rules, as per
https://www.kernel.org/doc/Documentation/networking/tproxy.txt
The original destination address can be obtained by referencing
ch.localAddress().
Modification:
Add methods similar to those for ipFreeBind, to set the IP_TRANSPARENT option.
Result:
Allows setting and getting of the IP_TRANSPARENT option, which allows retrieval of the ultimate socket address originally requested.
Motivation:
We currently don't have a native transport which supports kqueue https://www.freebsd.org/cgi/man.cgi?query=kqueue&sektion=2. This can be useful for BSD systems such as MacOS to take advantage of native features, and provide feature parity with the Linux native transport.
Modifications:
- Make a new transport-native-unix-common module with all the java classes and JNI code for generic unix items. This module will build a static library for each unix platform, and included in the dynamic libraries used for JNI (e.g. transport-native-epoll, and eventually kqueue).
- Make a new transport-native-unix-common-tests module where the tests for the transport-native-unix-common module will live. This is so each unix platform can inherit from these test and ensure they pass.
- Add a new transport-native-kqueue module which uses JNI to directly interact with kqueue
Result:
JNI support for kqueue.
Fixes https://github.com/netty/netty/issues/2448
Fixes https://github.com/netty/netty/issues/4231
Motivation:
I had a need to know the user credentials of a connected unix domain socket.
Modifications:
Added a class to encapsulate user credentials (UID, GID, and the PID).
Augemented the Socket class to provide the JNI native interface to return this new class
Augemented the c code to call getSockOpts passing <a href=http://man7.org/linux/man-pages/man7/socket.7.html>SO_PEERCRED</a>
Then surfaced the ability to get user credentials in the EpollDomainSocketChannel
Result:
The EpollDomainSocketChannel now has a the following function signature:
public PeerCredentials peerCredentials() throws IOException allowing a caller to get the UID, GID, and PID of the linux process
connected to the unix domain socket.
Motivation:
The NIO transport used an IllegalStateException if a user tried to issue another connect(...) while the connect was still in process. For this case the JDK specified a ConnectPendingException which we should use. The same issues exists in the EPOLL transport. Beside this the EPOLL transport also does not throw the right exceptions for ENETUNREACH and EISCONN errno codes.
Modifications:
- Replace IllegalStateException with ConnectPendingException in NIO and EPOLL transport
- throw correct exceptions for ENETUNREACH and EISCONN in EPOLL transport
- Add test case
Result:
More correct error handling for connect attempts when using NIO and EPOLL transport
Motivation:
ECONNREFUSED can be a common type of exception when attempting to finish the connection process. Generating a new exception each time can be costly and quickly bloat memory usage.
Modifications:
- Expose ECONNREFUSED from JNI and cache this exception in Socket.finishConnect
Result:
ECONNREFUSED during finish connect doesn't create a new exception each time.
Motivation:
Unused methods create warnings on some C compilers. It may not be feasible to selectively turn them off.
Modifications:
Remove createInetSocketAddress as it is unused.
Result:
Less noisy compilation
Motivation:
epoll_wait accepts a timeout argument which will specify the maximum amount of time the epoll_wait will wait for an event to occur. If the epoll_wait method returns for any reason that is not fatal (e.g. EINTR) the original timeout value is re-used. This does not honor the timeout interface contract and can lead to unbounded time in epoll_wait.
Modifications:
- The time taken by epoll_wait should be decremented before calling epoll_wait again, and if the remaining time is exhausted we should return 0 according to the epoll_wait interface docs http://man7.org/linux/man-pages/man2/epoll_wait.2.html
- link librt which is needed for some platforms to use clock_gettime
Result:
epoll_wait will wait for at most timeout ms according to the epoll_wait interface contract.
Motivation:
We used transfered in native code which is not correct spelling. It should be transferred.
Modifications:
Fix typo.
Result:
Less typos in source code.
Motivation:
When epoll datagram channel invokes sendmmsg0, _all_ of the messages go
on the wire with the address of the _last_ packet in the list.
Modifications:
An array of addresses equal to the length of the messages is allocated
on the stack to hold the address for each msg_hdr.msg_name.
Result:
Each message goes on the wire with the correct address.
Motivation:
To be consistent with the JDK we should ensure our native methods throw a ClosedChannelException if the Channel was previously closed. This will then be wrapped in a ChannelException as usual. For all other errors we continue to just throw a ChannelException directly.
Modifications:
Ensure getsockopt and setsockopt will throw a ClosedChannelException if the channel was closed before, on other errors we throw a ChannelException as before diretly.
Result:
Consistent with the NIO Channel implementations.
Motivation:
EpollServerSocketConfig.isFreebind() throws an exception when called.
Modifications:
Use the correct getsockopt arguments.
Result:
No more exception when call EpollServerSocketConfig.isFreebind()
Motivation:
When using the native transport have support for TCP_DEFER_ACCEPT or / and TCP_QUICKACK can be useful.
Modifications:
- Add support for TCP_DEFER_ACCEPT and TCP_QUICKACK
- Ad unit tests
Result:
TCP_DEFER_ACCEPT and TCP_QUICKACK are supported now.
Motivation:
JNI_OnUnload(...) does not return anything (has void in its signature) so we should not try to return something.
Modifications:
Remove return.
Result:
Fix incorrect but harmless code.
Motivation:
netty_epoll_native.c uses dladdr in attempt to get the name of the library that the code is running in. However the address passed to this funciton (JNI_OnLoad) may not be unique in the context of the application which loaded it. For example if another JNI library is loaded this address may first resolve to the other JNI library and cause the path name parsing to fail, which will cause the library to fail.
Modifications:
- Pass an addresses which is local to the current library to dladdr
Result:
EPOLL JNI library can be loaded in an environment where multiple JNI libraries are loaded.
Fixes https://github.com/netty/netty/issues/4840
Motivation:
If Netty's class files are renamed and the type references are updated (shaded) the native libraries will not function. The native epoll module uses implicit JNI bindings which requires the fully qualified java type names to match the method signatures of the native methods. This means EPOLL cannot be used with a shaded Netty.
Modifications:
- Make the JNI method registration dynamic
- support a system property io.netty.packagePrefix which must be prepended to the name of the native library (to ensure the correct library is loaded) and all class names (to allow classes to be correctly referenced)
- remove system property io.netty.native.epoll.nettyPackagePrefix which was recently added and the code to support it was incomplete
Result:
transport-native-epoll can be used when Netty has been shaded.
Fixes https://github.com/netty/netty/issues/4800
Motivation:
When a wildcard address is used to bind a socket and ipv4 and ipv6 are usable we should accept both (just like JDK IO/NIO does).
Modifications:
Detect wildcard address and if so use in6addr_any
Result:
Correctly accept ipv4 and ipv6
Motivation:
transport-native-epoll finds java classes from JNI using fully qualified class names. If a shaded version of Netty is used then these lookups will fail.
Modifications:
- Allow a prefix to be appended to Netty class names in JNI code.
Result:
JNI code can be used with shaded version of Netty.
Motivation:
We should also be able to compile the native transport on 32bit systems.
Modifications:
Add cast to intptr_t for pointers
Result:
It's possible now to also compile on 32bit.
Motivation:
Linux uses different socket options to set the traffic class (DSCP) on IPv6
Modifications:
Also set IPV6_TCLASS for IPv6 sockets
Result:
TrafficClass will work on IPv4 and IPv6 correctly
Motivation:
We missed to define the actual c function for isKeepAlive(...) and so throw UnsatisfieldLinkError.
Modifications:
- Add function
- Add unit test for Socket class
Result:
Correctly work isKeepAlive(...) when using native transport
Motivation:
transport-native-epoll is designed to be specific to Linux. However there is native code that can be extracted out and made to work on more Unix like distributions. There are a few steps to be completely decoupled but the first step is to extract out code that can run in a more general Unix environment from the Linux specific code base.
Modifications:
- Move all non-Linux specific stuff from Native.java into the io.netty.channel.unix package.
- io.netty.channel.unix.FileDescriptor will inherit all the native methods that are specific to file descriptors.
- io_netty_channel_epoll_Native.[c|h] will only have code that is specific to Linux.
Result:
Code is decoupled and design is streamlined in FileDescriptor.
Motivation:
Java_io_netty_channel_epoll_Native_getSoError incorrectly returns the value from the get socket option function.
Modifications:
- return the value from the result of the get socket option call
Result:
Java_io_netty_channel_epoll_Native_getSoError returns the correct value.
Motivation:
If a RDHUP and IN event occurred at the same time it is possible we may not read all pending data on the channel. We should ensure we read data before processing the RDHUP event.
Modifications:
- Process the RDHUP event before the IN event.
Result:
Data will not be dropped.
Fixes https://github.com/netty/netty/issues/4317
Motivation:
We should fail the build on warnings in the JNI/c code.
Modifications:
- Add GCC flag to fail build on warnings.
- Fix warnings (which also fixed a bug when using splice with offsets).
Result:
Better code quality.
Motivation:
TCP Fast Open allows data to be carried in the SYN and SYN-ACK packets and consumed by the receiving end during the initial connection handshake, and saves up to one full round-trip time (RTT) compared to the standard TCP, which requires a three-way handshake (3WHS) to complete before data can be exchanged. This commit enables support for TFO on server sockets.
Modifications:
Added new Integer Option TCP_FASTOPEN in EpollChannelOption.
Added getters/setters in EpollServerChannelConfig for TCP_FASTOPEN.
Added way to check if TCP_FASTOPEN is supported on server in Native.
Added setting on socket opt TCP_FASTOPEN if value is set on channel options in doBind in EpollServerSocketChannel.
Enhanced EpollSocketTestPermutation to contain a permutation for server socket containing fast open.
Result:
Users of native-epoll can set TCP_FASTOPEN on server sockets and thus leverage fast connect features of RFC7413 if client is capable of it.
Conflicts:
transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollChannelOption.java
Motivation:
There are protocols (BGP, SXP), which are typically deployed with TCP
MD5 authentication to protect sessions from being hijacked/torn down by
third parties. This facility is not available on most operating systems,
but is typically present on Linux.
Modifications:
- add a new EpollChannelOption, which is write-only
- teach Epoll(Server)SocketChannel to track which addresses have keys
associated
- teach Native how to set the MD5 signature keys for a socket
Result:
Users of the native-epoll transport can set MD5 signature keys and thus
leverage RFC-2385 protection on TCP connections.