Motivation
The AbstractEpollStreamChannel::spliceTo(FileDescriptor, ...) methods
take an offset parameter but this was effectively ignored due to what
looks like a typo in the corresponding JNI function impl. Instead it
would always use the file's own native offset.
Modification
- Fix typo in netty_epoll_native_splice0() and offset accounting in
AbstractEpollStreamChannel::SpliceFdTask.
- Modify unit test to include an invocation of the public spliceTo
method using non-zero offset.
Result
spliceTo FD methods work as expected when an offset is provided.
Motivation:
We do not need to issue a read on timerfd and eventfd when the EventLoop wakes up if we register these as Edge-Triggered. This removes the overhead of 2 syscalls and so helps to reduce latency.
Modifications:
- Ensure we register the timerfd and eventfd with EPOLLET flag
- If eventfd_write fails with EAGAIN, call eventfd_read and try eventfd_write again, as we only use it as a wake-up mechanism.
Result:
Fewer syscalls and therefore lower overhead.
Co-authored-by: Carl Mastrangelo <carl@carlmastrangelo.com>
Motivation:
Provide epoll/native multicast to support high load multicast users (we are using it for a high load telecomm app at my day job).
Modification:
Added support for source-specific and any-source multicast for the epoll transport. Some caveats: there is no support for disabling loopback mode, retrieving the interface, or block operations, all of which tend to be less frequently used.
Result:
Provides epoll transport multicast for common use cases.
Co-authored-by: Norman Maurer <norman_maurer@apple.com>
Motivation:
Add an option (through a SelectStrategy return code) to have the Netty event loop thread busy-wait on epoll.
The reason for this change is to avoid the context switch cost that comes when the event loop thread is blocked on the epoll_wait() call.
On average, the context switch has a penalty of ~13usec.
This benefits both:
- The latency when reading from a socket
- Scheduling tasks to be executed on the event loop thread
The tradeoff, when enabling this feature, is that the event loop thread will be using 100% cpu, even when inactive.
Modification:
- Added SelectStrategy option to return BUSY_WAIT
- Epoll loop will do an epoll_wait() with a zero timeout, so the call never blocks
- Use pause instruction to hint to the processor that we're in a busy loop
Result:
When enabled, minimizes impact of context switch in the critical path
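The spin-with-hint pattern described above can be sketched in plain Java as follows. This is only an illustration, not the code in this change: BusyWaitSketch and spinUntil are hypothetical names, the BooleanSupplier stands in for a non-blocking epoll_wait() call, and Thread.onSpinWait() (JDK 9+) is assumed here as the Java-level analogue of the pause instruction.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper for illustration; not part of Netty. */
final class BusyWaitSketch {
    /**
     * Polls {@code ready} without blocking until it returns true or
     * {@code maxSpins} polls have been made.
     *
     * @return true if {@code ready} reported true within the spin budget
     */
    static boolean spinUntil(BooleanSupplier ready, long maxSpins) {
        for (long i = 0; i < maxSpins; i++) {
            if (ready.getAsBoolean()) {
                return true;
            }
            // Spin hint to the runtime/CPU that we are in a busy loop.
            Thread.onSpinWait();
        }
        return false;
    }
}
```

The spin budget exists only to keep the sketch bounded; a real busy-wait loop would spin until work arrives, which is why the thread sits at 100% CPU.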
Motivation:
The Epoll transport checks to see if there are any scheduled tasks
before entering epoll_wait, and resets the timerfd just before.
This causes an extra syscall to timerfd_settime before doing any
actual work. When scheduled tasks aren't added frequently, or
tasks are added with later deadlines, this is unnecessary.
Modification:
Check the *deadline* of the peeked task in EpollEventLoop, rather
than the *delay*. If it hasn't changed since last time, don't
re-arm the timer
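A minimal sketch of the deadline comparison, assuming a single armed timer. TimerRearmSketch and its members are hypothetical names, and the counter merely stands in for the timerfd_settime syscall. The delay (deadline minus now) changes on every loop iteration, whereas the deadline of an unchanged head task does not, which is what makes the comparison useful.

```java
/** Hypothetical sketch for illustration; names are not Netty's. */
final class TimerRearmSketch {
    private boolean armed;
    private long armedDeadlineNanos;
    private int settimeCalls; // stands in for timerfd_settime syscalls

    /**
     * Called before blocking with the absolute deadline of the next task.
     *
     * @return true if the timer had to be re-armed
     */
    boolean prepareToWait(long nextTaskDeadlineNanos) {
        if (armed && nextTaskDeadlineNanos == armedDeadlineNanos) {
            // Same deadline as last time: the timer is already set correctly.
            return false;
        }
        armed = true;
        armedDeadlineNanos = nextTaskDeadlineNanos;
        settimeCalls++;
        return true;
    }

    int settimeCalls() {
        return settimeCalls;
    }
}
```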
Result:
About 2us faster at the 50th percentile on gRPC RTT latency benchmarks.
Before (2 runs for 5 minutes, 1 minute of warmup):
```
50.0%ile Latency (in nanos): 64267
90.0%ile Latency (in nanos): 72851
95.0%ile Latency (in nanos): 78903
99.0%ile Latency (in nanos): 92327
99.9%ile Latency (in nanos): 119691
100.0%ile Latency (in nanos): 13347327
QPS: 14933
50.0%ile Latency (in nanos): 63907
90.0%ile Latency (in nanos): 73055
95.0%ile Latency (in nanos): 79443
99.0%ile Latency (in nanos): 93739
99.9%ile Latency (in nanos): 123583
100.0%ile Latency (in nanos): 14028287
QPS: 14936
```
After:
```
50.0%ile Latency (in nanos): 62123
90.0%ile Latency (in nanos): 70795
95.0%ile Latency (in nanos): 76895
99.0%ile Latency (in nanos): 90887
99.9%ile Latency (in nanos): 117819
100.0%ile Latency (in nanos): 14126591
QPS: 15387
50.0%ile Latency (in nanos): 61021
90.0%ile Latency (in nanos): 70311
95.0%ile Latency (in nanos): 76687
99.0%ile Latency (in nanos): 90887
99.9%ile Latency (in nanos): 119527
100.0%ile Latency (in nanos): 6351615
QPS: 15571
```
Motivation:
We should ensure we call *UnLoad when we detect an error while calling *OnLoad and previous *OnLoad calls were successful.
Modifications:
Correctly call *UnLoad when needed.
Result:
More correct code and no leaks when an error happens during loading the native lib.
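The rollback idea can be sketched in Java as follows. This is not the JNI code from this change: LoadRollbackSketch, Module, loadAll and recording are hypothetical names used only to show that modules loaded before a failure are unloaded again in reverse order.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch for illustration; not the JNI code itself. */
final class LoadRollbackSketch {
    interface Module {
        boolean onLoad();  // returns false on failure
        void onUnload();
    }

    /** Loads modules in order; on failure unloads the already loaded ones. */
    static boolean loadAll(List<Module> modules) {
        Deque<Module> loaded = new ArrayDeque<>();
        for (Module m : modules) {
            if (!m.onLoad()) {
                // Undo earlier successful loads, most recent first.
                while (!loaded.isEmpty()) {
                    loaded.pop().onUnload();
                }
                return false;
            }
            loaded.push(m);
        }
        return true;
    }

    /** Test double that records load/unload calls into {@code log}. */
    static Module recording(final String name, final boolean succeeds, final StringBuilder log) {
        return new Module() {
            @Override public boolean onLoad() {
                log.append(succeeds ? "+" : "!").append(name);
                return succeeds;
            }
            @Override public void onUnload() {
                log.append("-").append(name);
            }
        };
    }
}
```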
* Allow using native transports when sun.misc.Unsafe is not present on the system
Motivation:
We should be able to use the native transports (epoll / kqueue) even when sun.misc.Unsafe is not present on the system. This is especially important as Java 11 will be released soon and does not allow access to it by default.
Modifications:
- Correctly disable usage of sun.misc.Unsafe when -PnoUnsafe is used while running the build
- Correctly increment metric when UnpooledDirectByteBuf is allocated. This was uncovered once -PnoUnsafe usage was fixed.
- Implement fallbacks in all our native transport code for when sun.misc.Unsafe is not present.
Result:
Fixes https://github.com/netty/netty/issues/8229.
Motivation:
We should support loading multiple shaded versions of the same netty artifact, as netty is often used in multiple dependencies.
This is related to https://github.com/netty/netty/issues/7272.
Modifications:
- Use -fvisibility=hidden when compiling and use JNIEXPORT for things we really want to have exported
- Ensure fields are declared as static so these are not exported
- Adjust testsuite-shading to use install_name_tool on MacOS to change the id of the lib. Otherwise the wrong lib may be used.
Result:
Be able to use multiple shaded versions of the same netty artifact.
Motivation:
The writeSpinCount currently loops over the same buffer, gathering
write, file write, or other write operation multiple times but will
continue writing until there is nothing left or the OS doesn't accept
any data for that specific write. However if the OS keeps accepting
writes there is no way to limit how much time we spend on a specific
socket. This can lead to unfair consumption of resources dedicated to a
single socket.
We currently don't limit the number of bytes we attempt to write per
gathering write. If there are many more bytes pending relative to the
SO_SNDBUF size we will end up building iov arrays with more elements
than can be written, which results in extra iteration, conditionals,
and bookkeeping.
Modifications:
- writeSpinCount should limit the number of system calls we make to
write data, instead of applying to individual write operations
- IovArray should support a maximum number of bytes
- IovArray should support composite buffers of greater than size 1024
- We should auto-scale the amount of data that we attempt to write per
gathering write operation relative to SO_SNDBUF and how much data is
successfully written
- The non-unsafe path should also support a maximum number of bytes,
and respect the IOV_MAX limit
Result:
Write resource consumption can be bounded and gathering writes have
a limit relative to the amount of data which can actually be accepted
by the socket.
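A rough sketch of a write loop where the spin count bounds the number of write calls and each call is capped at a maximum number of bytes. WriteSpinSketch and writeWithBudget are hypothetical names, and the LongUnaryOperator stands in for the socket write syscall (it returns how many bytes the OS accepted). The auto-scaling of the per-write cap is not shown.

```java
import java.util.function.LongUnaryOperator;

/** Hypothetical sketch for illustration; not Netty's implementation. */
final class WriteSpinSketch {
    /**
     * @param pendingBytes     bytes waiting to be written
     * @param writeSpinCount   maximum number of write calls to issue
     * @param maxBytesPerWrite cap on bytes offered to a single write call
     * @param socket           stand-in for the syscall; returns bytes accepted
     * @return total bytes written within the budget
     */
    static long writeWithBudget(long pendingBytes, int writeSpinCount,
                                long maxBytesPerWrite, LongUnaryOperator socket) {
        long written = 0;
        int spins = writeSpinCount;
        while (pendingBytes > 0 && spins > 0) {
            long attempt = Math.min(pendingBytes, maxBytesPerWrite);
            long accepted = socket.applyAsLong(attempt);
            if (accepted <= 0) {
                break; // the OS accepted nothing; stop and wait for writability
            }
            written += accepted;
            pendingBytes -= accepted;
            spins--; // every call counts, even if the OS keeps accepting data
        }
        return written;
    }
}
```

With a socket that accepts 10 bytes per call, 100 pending bytes and a spin count of 3, the loop stops after 3 calls even though the socket would keep accepting data.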
Motivation:
As noticed in https://stackoverflow.com/questions/45700277/
compilation can fail if the definition of a method doesn't
match the declaration. It's easy enough to add JNIEXPORT to the
definitions, which makes the code easier to compile.
Modifications:
Add JNIEXPORT to the entry points.
* On Windows this adds: `__declspec(dllexport)`
* On Mac this adds: `__attribute__((visibility("default")))`
* On Linux (GCC 4.2+) this adds: `__attribute__((visibility("default")))`
* On other platforms it doesn't add anything.
Result:
Easier compilation
Motivation:
KQueueEventLoop and EpollEventLoop implement different approaches to applying a timeout to their respective poll calls. Epoll attempts to ensure the desired timeout is satisfied at the Java layer and at the JNI layer, but it should be sufficient to account for spurious wakeups at the JNI layer. Epoll timeout granularity is also limited to milliseconds, which may be too coarse for some latency-sensitive applications.
Modifications:
- Make EpollEventLoop wait method look like KQueueEventLoop
- Epoll should support a finer timeout granularity via timerfd_create. We can hide most of these details behind the epollWait0 JNI call to avoid crossing additional JNI boundaries.
Result:
More consistent timeout approach between KQueue and Epoll.
Motivation:
Due to an oversight (by myself), linking two JNI modules with
duplicate symbols fails. This only seems to happen
some of the time (the behavior seems to differ between GCC
and Clang toolchains). For instance, including both netty tcnative
and netty epoll fails to link because of duplicate JNI_OnLoad
symbols.
Modification:
Do not define the JNI_OnLoad and JNI_OnUnload symbols when
compiling for static linkage, as indicated by the NETTY_BUILD_STATIC
preprocessor define. They are never directly called when
statically linked.
Result:
Able to statically compile epoll and tcnative code into a single
binary.
Motivation:
At the moment we try to load the library using multiple names, which include names using - but also _ . We should just use _ all the time.
Modifications:
Replace - with _
Result:
Fixes [#7069]
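As a small illustration of the normalization, assuming a plain character replacement is all that is needed (LibraryNameSketch and normalize are hypothetical names, not the loader's actual API):

```java
/** Hypothetical sketch for illustration; not the loader's actual API. */
final class LibraryNameSketch {
    /** Replaces every '-' in a library name with '_'. */
    static String normalize(String libraryName) {
        return libraryName.replace('-', '_');
    }
}
```

For example, `normalize("netty-transport-native-epoll")` yields `netty_transport_native_epoll`.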
Motivation:
Enable static linking for Java 8. These commits are the same as those introduced to netty tcnative. The goal is to allow lots of JNI libraries to be statically linked together without having conflicting `JNI_OnLoad` methods.
Modification:
* add JNI_OnLoad suffixes to enable static linking
* Add static names to the list of libraries that try to be loaded
* Enable compiling with JNI 1.8
* Sort includes
Result:
Enable statically linked JNI code.
Motivation:
Google requires stricter compilation by adding -Werror and enabling many other warnings.
Modification:
* fix warning caused by -Wmissing-braces
* Use the address of `sendmmsg` rather than the function itself when
checking for presence. This resolves the warning caused by
`-Wpointer-bool-conversion`.
More detail:
When compiling on Linux, `sendmmsg` is always present, so the
function is always nonnull. When compiling elsewhere, the
function is defined as `__attribute__((weak))` which means it
may be absent at link time. This is controlled by
`IO_NETTY_SENDMMSG_NOT_FOUND`, which is off by default.
The warning exists because of the risk of accidentally not
calling the function. By adding `&` before the function, there
is no ambiguity (the result of the fn call cannot have its
address taken).
* use != to check for sendmmsg
Result:
Easier compilation.
Motivation:
We currently don't have a native transport which supports kqueue https://www.freebsd.org/cgi/man.cgi?query=kqueue&sektion=2. This can be useful for BSD systems such as MacOS to take advantage of native features, and provide feature parity with the Linux native transport.
Modifications:
- Make a new transport-native-unix-common module with all the java classes and JNI code for generic unix items. This module will build a static library for each unix platform, which is included in the dynamic libraries used for JNI (e.g. transport-native-epoll, and eventually kqueue).
- Make a new transport-native-unix-common-tests module where the tests for the transport-native-unix-common module will live. This is so each unix platform can inherit from these tests and ensure they pass.
- Add a new transport-native-kqueue module which uses JNI to directly interact with kqueue
Result:
JNI support for kqueue.
Fixes https://github.com/netty/netty/issues/2448
Fixes https://github.com/netty/netty/issues/4231
Motivation:
epoll_wait accepts a timeout argument which will specify the maximum amount of time the epoll_wait will wait for an event to occur. If the epoll_wait method returns for any reason that is not fatal (e.g. EINTR) the original timeout value is re-used. This does not honor the timeout interface contract and can lead to unbounded time in epoll_wait.
Modifications:
- The time taken by epoll_wait should be decremented before calling epoll_wait again, and if the remaining time is exhausted we should return 0 according to the epoll_wait interface docs http://man7.org/linux/man-pages/man2/epoll_wait.2.html
- link librt which is needed for some platforms to use clock_gettime
Result:
epoll_wait will wait for at most timeout ms according to the epoll_wait interface contract.
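The timeout accounting can be sketched in Java as follows. This is a simulation, not the JNI code: EpollTimeoutSketch, waitWithDeadline and INTERRUPTED are hypothetical names, the clock is injected so the logic can be exercised without sleeping, and the poll function stands in for epoll_wait (returning INTERRUPTED where the real call would fail with EINTR).

```java
import java.util.function.LongSupplier;
import java.util.function.LongToIntFunction;

/** Hypothetical simulation for illustration; not the JNI implementation. */
final class EpollTimeoutSketch {
    /** Stand-in for a non-fatal early return such as EINTR. */
    static final int INTERRUPTED = -1;

    /**
     * Retries {@code poll} after interruptions, passing only the time that
     * is still left, and returns 0 once the original timeout is used up.
     */
    static int waitWithDeadline(long timeoutMillis, LongSupplier clockMillis,
                                LongToIntFunction poll) {
        long deadline = clockMillis.getAsLong() + timeoutMillis;
        long remaining = timeoutMillis;
        while (true) {
            int events = poll.applyAsInt(remaining);
            if (events != INTERRUPTED) {
                return events;
            }
            // Subtract the time already spent before waiting again.
            remaining = deadline - clockMillis.getAsLong();
            if (remaining <= 0) {
                return 0; // timed out: report no events
            }
        }
    }
}
```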
Motivation:
We used "transfered" in native code, which is misspelled. It should be "transferred".
Modifications:
Fix typo.
Result:
Fewer typos in source code.
Motivation:
When epoll datagram channel invokes sendmmsg0, _all_ of the messages go
on the wire with the address of the _last_ packet in the list.
Modifications:
An array of addresses equal to the length of the messages is allocated
on the stack to hold the address for each msg_hdr.msg_name.
Result:
Each message goes on the wire with the correct address.
Motivation:
EpollServerSocketConfig.isFreebind() throws an exception when called.
Modifications:
Use the correct getsockopt arguments.
Result:
No more exception when calling EpollServerSocketConfig.isFreebind()
Motivation:
JNI_OnUnload(...) does not return anything (has void in its signature) so we should not try to return something.
Modifications:
Remove return.
Result:
Fix incorrect but harmless code.
Motivation:
netty_epoll_native.c uses dladdr in an attempt to get the name of the library that the code is running in. However, the address passed to this function (JNI_OnLoad) may not be unique in the context of the application which loaded it. For example, if another JNI library is loaded, this address may first resolve to the other JNI library and cause the path name parsing to fail, which will cause the library to fail.
Modifications:
- Pass an address which is local to the current library to dladdr
Result:
EPOLL JNI library can be loaded in an environment where multiple JNI libraries are loaded.
Fixes https://github.com/netty/netty/issues/4840
Motivation:
If Netty's class files are renamed and the type references are updated (shaded) the native libraries will not function. The native epoll module uses implicit JNI bindings which requires the fully qualified java type names to match the method signatures of the native methods. This means EPOLL cannot be used with a shaded Netty.
Modifications:
- Make the JNI method registration dynamic
- Support a system property io.netty.packagePrefix which must be prepended to the name of the native library (to ensure the correct library is loaded) and all class names (to allow classes to be correctly referenced)
- Remove system property io.netty.native.epoll.nettyPackagePrefix, which was recently added but whose supporting code was incomplete
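To illustrate how one prefix can be applied to both the library name and the JNI class names, here is a sketch under stated assumptions: the prefix is given in Java package form with a trailing dot (e.g. "shaded."), dots become underscores for the library name and slashes for JNI class names. PackagePrefixSketch and its methods are hypothetical names, and the exact mangling used by the real loader may differ.

```java
/** Hypothetical sketch for illustration; the real mangling rules may differ. */
final class PackagePrefixSketch {
    /** Library name with the prefix applied, e.g. "shaded_" + baseName. */
    static String libraryName(String packagePrefix, String baseName) {
        return packagePrefix.replace('.', '_') + baseName;
    }

    /** JNI-style class name with the prefix applied, e.g. "shaded/" + className. */
    static String jniClassName(String packagePrefix, String className) {
        return packagePrefix.replace('.', '/') + className;
    }
}
```

With the assumed prefix "shaded.", the native code would look up `shaded/io/netty/channel/epoll/Native` when registering its methods dynamically, instead of relying on the implicit JNI symbol names.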
Result:
transport-native-epoll can be used when Netty has been shaded.
Fixes https://github.com/netty/netty/issues/4800