Motivation:
There are 2 motivations, the first depends on the second:
Loading Netty Epoll statically stopped working in 4.1.16, due to
`Native` always loading the arch specific shared object. In a
static binary, there is no arch specific SO.
Second, there are a ton of exceptions that can happen when loading
a native library. When loading native code, Netty tries a bunch of
different paths but a failure in any given may not be fatal.
Additionally: turning on debug logging is not always feasible so
exceptions get silently swallowed.
Modifications:
* Change Epoll and Kqueue to try the static load second
* Modify NativeLibraryLoader to record all the locations where
exceptions occur.
* Attempt to use `addSuppressed` from Java 7 if available.
Alternatives Considered:
An alternative would be to record log messages at each failure. If
all load attempts fail, the log messages are printed as warning,
else as debug. The problem with this is there is no `LogRecord` to
create like in java.util.logging. Buffering the args to
logger.log() at the end of the method loses the call site, and
changes the order of events to be confusing.
Another alternative is to teach NativeLibraryLoader about loading
the SO first, and then the static version. This would consolidate
the code fore Epoll, Kqueue, and TCNative. I think this is the
long term better option, but this PR is changing a lot already.
Someone else can take a crack at it later
Results:
Epoll Still Loads and easier debugging.
Motivation:
When SO_LINGER is used we run doClose() on the GlobalEventExecutor by default so we need to ensure we schedule all code that needs to be run on the EventLoop on the EventLoop in doClose. Beside this there are also threading issues when calling shutdownOutput(...)
Modifications:
- Schedule removal from EventLoop to the EventLoop
- Correctly handle shutdownOutput and shutdown in respect with threading-model
- Add unit tests
Result:
Fixes [#7159].
Motivation:
We need to ensure we use the correct osname in the Bundle-NativeCode declaration as declared in:
https://www.osgi.org/developer/specifications/reference/
Modifications:
Update osname to match the spec.
Result:
Correct Bundle-NativeCode entry in the MANIFEST
Motivation:
We should only try to load the native artifacts if the architecture we are currently running on is the same as the one the native libraries were compiled for.
Modifications:
Include architecture in native lib name and append the current arch when trying to load these. This will fail then if its not the same as the arch of the compiled arch.
Result:
Fixes [#7150].
Motivation:
PD and PD0 Both try to find and use Unsafe. If unavailable, they
try to log why and continue on. However, it is not always east to
enable this logging. Chaining exceptions together is much easier
to reach, and the original exception is relevant when Unsafe is
needed.
Modifications:
* Make PD log why PD0 could not be loaded with a trace level log
* Make PD0 remember why Unsafe wasn't available
* Expose unavailability cause through PD for higher level use.
* Make Epoll and KQueue include the reason when failing
Result:
Easier debugging in hard to reconfigure environments
Motivation:
If AutoClose is false and there is a IoException then AbstractChannel will not close the channel but instead just fail flushed element in the ChannelOutboundBuffer. AbstractChannel also notifies of writability changes, which may lead to an infinite loop if the peer has closed its read side of the socket because we will keep accepting more data but continuously fail because the peer isn't accepting writes.
Modifications:
- If the transport throws on a write we should acknowledge that the output side of the channel has been shutdown and cleanup. If the channel can't accept more data because it is full, and still healthy it is not expected to throw. However if the channel is not healthy it will throw and is not expected to accept any more writes. In this case we should shutdown the output for Channels that support this feature and otherwise just close.
- Connection-less protocols like UDP can remain the same because the channel may disconnected temporarily.
- Make sure AbstractUnsafe#shutdownOutput is called because the shutdown on the socket may throw an exception.
Result:
More correct handling of write failure when AutoClose is false.
Motivation:
As noticed in https://stackoverflow.com/questions/45700277/
compilation can fail if the definition of a method doesn't
match the declaration. It's easy enough to add this in, and make
it easy to compile.
Modifications:
Add JNIEXPORT to the entry points.
* On Windows this adds: `__declspec(dllexport)`
* On Mac this adds: `__attribute__((visibility("default")))`
* On Linux (GCC 4.2+) this adds: ` __attribute__((visibility("default")))`
* On other it doesn't add anything.
Result:
Easier compilation
Motivation:
KQueueEventLoop and EpollEventLoop implement different approaches to applying a timeout of their respective poll calls. Epoll attempts to ensure the desired timeout is satisfied at the java layer and at the JNI layer, but it should be sufficient to account for spurious wakups at the JNI layer. Epoll timeout granularity is also limited to milliseconds which may be too large for some latency sensitive applications.
Modifications:
- Make EpollEventLoop wait method look like KQueueEventLoop
- Epoll should support a finer timeout granularity via timerfd_create. We can hide most of these details behind the epollWait0 JNI call to avoid crossing additional JNI boundaries.
Result:
More consistent timeout approach between KQueue and Epoll.
Motivation:
The EPOLL transport uses EPOLLRDHUP to detect when the peer closes the write side of the socket. Currently KQueue is not able to mimic this behavior and the only way to detect if the peer has closed is to read. It may not always be appropriate to read for backpressure and other reasons at the application level.
Modifications:
- Support EVFILT_SOCK filter which provides notification when the peer closes the socket
Result:
KQueue transport has more consistent behavior with Epoll transport for detecting peer closure.
Motivation:
Due to an oversight (by myself), linking two JNI modules with
duplicate symbols fails in linking. This only seems to happen
some of the time (the behavior seems to be different between GCC
and Clang toolchains). For instance, including both netty tcnative
and netty epoll fails to link because of duplicate JNI_OnLoad
symobols.
Modification:
Do not define the JNI_OnLoad and JNI_OnUnload symbols when
compiling for static linkage, as indicated by the NETTY_BUILD_STATIC
preprocessor define. They are never directly called when
statically linked.
Result:
Able to statically compile epoll and tcnative code into a single
binary.
Motivation:
At the moment we try to load the library using multiple names which includes names using - but also _ . We should just use _ all the time.
Modifications:
Replace - with _
Result:
Fixes [#7069]
Motivation:
Implementations of DuplexChannel delegate the shutdownOutput to the underlying transport, but do not take any action on the ChannelOutboundBuffer. In the event of a write failure due to the underlying transport failing and application may attempt to shutdown the output and allow the read side the transport to finish and detect the close. However this may result in an issue where writes are failed, this generates a writability change, we continue to write more data, and this may lead to another writability change, and this loop may continue. Shutting down the output should fail all pending writes and not allow any future writes to avoid this scenario.
Modifications:
- Implementations of DuplexChannel should null out the ChannelOutboundBuffer and fail all pending writes
Result:
More controlled sequencing for shutting down the output side of a channel.
Motivation:
We did not correctly handle connect() and disconnect() in EpollDatagramChannel / KQueueDatagramChannel and so the behavior was different compared to NioDatagramChannel.
Modifications:
- Correct implement connect and disconnect methods
- Share connect and related code
- Add tests
Result:
EpollDatagramChannel / KQueueDatagramChannel also supports correctly connect() and disconnect() methods.
Motivation:
`Epoll.ensureAvailability()` is called multiple times, once in
static initialization and in a couple of the constructors. This is
redundant and confusing to read.
Modifications:
Move `Epoll.ensureAvailability()` call into an instance initializer
and remove all other references. This ensures that every EELG
checks availability, while still delaying the check until
construction. This pattern is used when there are multiple ctors,
as in this class.
Result:
Easier to read code.
Motivation:
We should not try to detect a free port in tests put just use 0 when bind so there is no race in which the system my bind something to the port we choosen before.
Modifications:
- Remove the usage of TestUtils.getFreePort() in the testsuite
- Remove hack to workaround bind errors which will not happen anymore now
Result:
Less flacky tests.
Motivation:
JCTools 2.0.2 provides an unbounded MPSC linked queue. Before we shaded JCTools we had our own unbounded MPSC linked queue and used it in various places but gave this up because there was no public equivalent available in JCTools at the time.
Modifications:
- Use JCTool's MPSC linked queue when no upper bound is specified
Result:
Fixes https://github.com/netty/netty/issues/5951
Motivation:
We used an intermediate collection to store the read DatagramPackets and only fired these through the pipeline once wewere done with the reading loop. This is not needed and can also increase memory usage.
Modifications:
Remove intermediate collection
Result:
Less overhead and possible less memory usage during read loop.
Motivation:
The kqueue documentation states that 'Calling close() on a file descriptor will remove any kevents that reference the descriptor.' [1], but doesn't mention if this cleanup will be done synchronously. Under some circumstances it has been observed that cleanup was not done immediately and when KQueueEventLoop attempted to access the channel associated with the event the JVM would crash, a ClassCastException, or generally undefined behavior would occur because of invalid pointer references.
[1] https://www.freebsd.org/cgi/man.cgi?query=kqueue&sektion=2
Modifications:
- AbstractKqueueChannel#doClose should not rely upon this assumption and instead should call doDeregister() to ensure cleanup is done synchronously.
- Deleting a kevent should also set the jniSelfPtr stored in the udata of that kevent to NULL, to ensure we will not dereference it later.
Result:
No more kqueue crash due to close/cleanup sequencing.
Motivation:
We rely upon the linker being non-lazy to test compatibility the native library compatibility for kqueue, but the default mode of operation is to lazy link.
Modifications:
- We should modify the build scripts to inform the linker that this library should not be lazy linked
- Error messages changes
dyld: lazy symbol binding failed: Symbol not found: _clock_gettime
java.lang.UnsatisfiedLinkError: unsupported JNI version 0xFFFFFFFF required by .../libnetty-transport-native-kqueue.dylib
Result:
Link errors are detected upon library load time.
Motivation:
Enable static linking for Java 8. These commits are the same as those introduced to netty tcnative. The goal is to allow lots of JNI libraries to be statically linked together without having conflict `JNI_OnLoad` methods.
Modification:
* add JNI_OnLoad suffixes to enable static linking
* Add static names to the list of libraries that try to be loaded
* Enable compiling with JNI 1.8
* Sort includes
Result:
Enable statically linked JNI code.
Motivation:
We should not force autoconf and compile as this will result in multiple executions and so slow down the build.
Modifications:
Remove force declarations
Result:
Faster build of native modules
Motivation:
1. special handling of ByteBuf with multi nioBuffer rather than type of CompositeByteBuf (eg. DuplicatedByteBuf with CompositeByteBuf)
2. EpollDatagramUnicastTest and KQueueDatagramUnicastTest passed because CompositeByteBuf is converted to DuplicatedByteBuf before write to channel
3. uninitalized struct msghdr will raise error
Modifications:
1. isBufferCopyNeededForWrite(like isSingleDirectBuffer in NioDatgramChannel) checks wether a new direct buffer is needed
2. special handling of ByteBuf with multi nioBuffer in EpollDatagramChannel, AbstractEpollStreamChannel, KQueueDatagramChannel, AbstractKQueueStreamChannel and IovArray
3. initalize struct msghdr
Result:
handle of ByteBuf with multi nioBuffer in EpollDatagramChannel and KQueueDatagramChannel are ok
Motivation:
We only can call eventLoop() if we are registered on an EventLoop yet. As we just did this without checking we spammed the log with an error that was harmless.
Modifications:
Check if registered on eventLoop before try to deregister on close.
Result:
Fixes [#6770]
Motivation:
The native epoll transport allows to wrap an existing filedescriptor, we should support the same in the native kqueue transport.
Modifications:
Add constructors that allow to wrap and existing filedescriptor.
Result:
Featureset of native transports more on par.
Motivation:
For our native libraries in netty we support shading, to have this work on runtime the user needs to set a system property. This code should shared.
Modifications:
Move logic to NativeLbiraryLoader and so share for all native libs.
Result:
Less code duplication and also will work for netty-tcnative out of the box once it support shading
Motivation:
MacOS will throw an error when attempting to set the IP_TOS socket option if IPv6 is available, and also when getting the value for IP_TOS.
Modifications:
- Socket#setTrafficClass and Socket#getTrafficClass should try to use IPv6 first, and check if the error code indicates the protocol is not supported before trying IPv4
Result:
Fixes https://github.com/netty/netty/issues/6741.
Motivation:
To ensure the release plugin works correctly we need to ensure all modules are included during build.
Modification:
- Include all modules
- Skip compilation and tests for native code when not supported but still include the module and build the jar
Result:
Build and release works again
Motivation:
We currently don't have a native transport which supports kqueue https://www.freebsd.org/cgi/man.cgi?query=kqueue&sektion=2. This can be useful for BSD systems such as MacOS to take advantage of native features, and provide feature parity with the Linux native transport.
Modifications:
- Make a new transport-native-unix-common module with all the java classes and JNI code for generic unix items. This module will build a static library for each unix platform, and included in the dynamic libraries used for JNI (e.g. transport-native-epoll, and eventually kqueue).
- Make a new transport-native-unix-common-tests module where the tests for the transport-native-unix-common module will live. This is so each unix platform can inherit from these test and ensure they pass.
- Add a new transport-native-kqueue module which uses JNI to directly interact with kqueue
Result:
JNI support for kqueue.
Fixes https://github.com/netty/netty/issues/2448
Fixes https://github.com/netty/netty/issues/4231