Motivation:
While using Load Balancers or HA support is needed there are cases when UDP channel need to bind to IP Address which is not available on network interfaces locally.
Modification:
Modified EpollDatagramChannelConfig to allow IP_FREEBIND option
Result:
Fixes ##8727.
Motivation:
Most of the maven modules do not explicitly declare their
dependencies and rely on transitivity, which is not always correct.
Modifications:
For all maven modules, add all of their dependencies to pom.xml
Result:
All of the (essentially non-transitive) depepdencies of the modules are explicitly declared in pom.xml
Motivation:
https://github.com/netty/netty/issues/8444 reports that there is some issue with negative values passed to timerfd_settime. This test verifies that everything is working as expected.
Modifications:
Add testcase.
Result:
Test to verify expected behaviour.
Motivation:
epoll_wait should work in 4.1.30 like it did in 4.1.29.
Modifications:
Revert Integer.MAX_VALUE back to MAX_SCHEDULED_TIMERFD_NS (999,999,999).
Add unit test.
Result:
epoll_wait will no longer throw EINVAL.
Motivation:
Add an option (through a SelectStrategy return code) to have the Netty event loop thread to do busy-wait on the epoll.
The reason for this change is to avoid the context switch cost that comes when the event loop thread is blocked on the epoll_wait() call.
On average, the context switch has a penalty of ~13usec.
This benefits both:
The latency when reading from a socket
Scheduling tasks to be executed on the event loop thread.
The tradeoff, when enabling this feature, is that the event loop thread will be using 100% cpu, even when inactive.
Modification:
Added SelectStrategy option to return BUSY_WAIT
Epoll loop will do a epoll_wait() with no timeout
Use pause instruction to hint to processor that we're in a busy loop
Result:
When enabled, minimizes impact of context switch in the critical path
Motivation
The EpollChannelConfig (same for KQueues) and its subclasses repeatetly declare their own channel field which leads to a 3x repetition for each config instance. Given the fields are protected or package-private it's exposing the code code to "field hiding" bugs.
Modifications
Use the the existing protected channel field from the DefaultChannelConfig class and simply cast it when needed.
Result
Fixes#8331
Motivation:
The Epoll transport checks to see if there are any scheduled tasks
before entering epoll_wait, and resets the timerfd just before.
This causes an extra syscall to timerfd_settime before doing any
actual work. When scheduled tasks aren't added frequently, or
tasks are added with later deadlines, this is unnecessary.
Modification:
Check the *deadline* of the peeked task in EpollEventLoop, rather
than the *delay*. If it hasn't changed since last time, don't
re-arm the timer
Result:
About 2us faster on gRPC RTT 50pct latency benchmarks.
Before (2 runs for 5 minutes, 1 minute of warmup):
```
50.0%ile Latency (in nanos): 64267
90.0%ile Latency (in nanos): 72851
95.0%ile Latency (in nanos): 78903
99.0%ile Latency (in nanos): 92327
99.9%ile Latency (in nanos): 119691
100.0%ile Latency (in nanos): 13347327
QPS: 14933
50.0%ile Latency (in nanos): 63907
90.0%ile Latency (in nanos): 73055
95.0%ile Latency (in nanos): 79443
99.0%ile Latency (in nanos): 93739
99.9%ile Latency (in nanos): 123583
100.0%ile Latency (in nanos): 14028287
QPS: 14936
```
After:
```
50.0%ile Latency (in nanos): 62123
90.0%ile Latency (in nanos): 70795
95.0%ile Latency (in nanos): 76895
99.0%ile Latency (in nanos): 90887
99.9%ile Latency (in nanos): 117819
100.0%ile Latency (in nanos): 14126591
QPS: 15387
50.0%ile Latency (in nanos): 61021
90.0%ile Latency (in nanos): 70311
95.0%ile Latency (in nanos): 76687
99.0%ile Latency (in nanos): 90887
99.9%ile Latency (in nanos): 119527
100.0%ile Latency (in nanos): 6351615
QPS: 15571
```
Motivation:
When using Epoll based transport, allow applications to configure SO_BUSY_POLL socket option:
SO_BUSY_POLL (since Linux 3.11)
Sets the approximate time in microseconds to busy poll on a
blocking receive when there is no data. Increasing this value
requires CAP_NET_ADMIN. The default for this option is con‐
trolled by the /proc/sys/net/core/busy_read file.
The value in the /proc/sys/net/core/busy_poll file determines
how long select(2) and poll(2) will busy poll when they oper‐
ate on sockets with SO_BUSY_POLL set and no events to report
are found.
In both cases, busy polling will only be done when the socket
last received data from a network device that supports this
option.
While busy polling may improve latency of some applications,
care must be taken when using it since this will increase both
CPU utilization and power usage.
Modification:
Added SO_BUSY_POLL socket option
Result:
Able to configure SO_BUSY_POLL from Netty
Motivation:
We should ensure we call *UnLoad when we detect an error during calling *OnLoad and previous *OnLoad calls were succesfull.
Modifications:
Correctly call *UnLoad when needed.
Result:
More correct code and no leaks when an error happens during loading the native lib.
* Allow to use native transports when sun.misc.Unsafe is not present on the system
Motivation:
We should be able to use the native transports (epoll / kqueue) even when sun.misc.Unsafe is not present on the system. This is especially important as Java11 will be released soon and does not allow access to it by default.
Modifications:
- Correctly disable usage of sun.misc.Unsafe when -PnoUnsafe is used while running the build
- Correctly increment metric when UnpooledDirectByteBuf is allocated. This was uncovered once -PnoUnsafe usage was fixed.
- Implement fallbacks in all our native transport code for when sun.misc.Unsafe is not present.
Result:
Fixes https://github.com/netty/netty/issues/8229.
Motivation:
We should support to load multiple shaded versions of the same netty artifact as netty is often used in multiple dependencies.
This is related to https://github.com/netty/netty/issues/7272.
Modifications:
- Use -fvisibility=hidden when compiling and use JNIEXPORT for things we really want to have exported
- Ensure fields are declared as static so these are not exported
- Adjust testsuite-shading to use install_name_tool on MacOS to change the id of the lib. Otherwise the wrong may be used.
Result:
Be able to use multiple shaded versions of the same netty artifact.
Motivation:
Avoid unnecessary native memory allocation if UDP / TCP isn't being
used.
Modifications:
Create the reused NativeDatagramPacketArray and IovArray upon first use
instead of EpollEventLoop construction.
Also correct related comment in NativeDatagramPacketArray.
Result:
Reduced native memory use when using epoll in many cases
Motivation:
We can store the NativeDatagramPacketArray directly in the EpollEventLoop. This removes the need of using FastThreadLocal.
Modifications:
- Store NativeDatagramPacketArray directly in the EpollEventLoop (just as we do with IovArray as well).
Result:
Less FastThreadLocal usage and more consistent code.
Motivation:
Epoll and Kqueue channels have internal state which forces
a single read operation after channel construction. This
violates the Channel#read() interface which indicates that
data shouldn't be delivered until this method is called.
The behavior is also inconsistent with the NIO transport.
Modifications:
- Epoll and Kqueue shouldn't unconditionally read upon
initialization, and instead should rely upon Channel#read()
or auto_read.
Result:
Epoll and Kqueue are more consistent with NIO.
Motivation:
We should allow to schedule tasks with a delay up to Long.MAX_VALUE as we did pre 4.1.25.Final.
Modifications:
Just ensure we not overflow and put the correct max limits in place when schedule a timer. At worse we will get a wakeup to early and then schedule a new timeout.
Result:
Fixes https://github.com/netty/netty/issues/7970.
* Read until all data is consumed when EOF is detected even if readPending is false and auto-read is disabled.
Motivation:
We should better always notify the user of EOF even if the user did not request any data as otherwise we may never be notified when the remote peer closes the connection. This should be ok as the amount of extra data we may read and so fire through the pipeline is limited by SO_RECVBUF.
Modifications:
- Always drain the socket when EOF is detected.
- Add testcase
Result:
No risk for the user to be not notified of EOF.
Motivation:
Sometimes it's useful to disable native transports / native ssl to debug a problem. We should allow to do so with a system property so people not need to adjust code for this.
Modifications:
Add system properties which allow to disable native transport and native ssl.
Result:
Easier to disable native code usage without code changes.
Motivation:
DatagramPacket.recipient() doesn't return the actual destination IP, but the IP the app is bound to.
Modification:
- IP_RECVORIGDSTADDR option is enabled for UDP sockets, which allows retrieval of ancillary information containing the original recipient.
- _recvFrom(...) function from transport-native-unix-common/src/main/c/netty_unix_socket.c is modified such that if IP_RECVORIGDSTADDR is set, recvmsg is used instead of recvfrom; enabling the retrieval of the original recipient.
- DatagramSocketAddress also contains a 'local' address, representing the recipient.
- EpollDatagramChannel is updated to return the retrieved recipient address instead of the address the channel is bound to.
Result:
Fixes#4950.
Motivation:
Using a very huge delay when calling schedule(...) may cause an Selector error when calling select(...) later on. We should gaurd against such a big value.
Modifications:
- Add guard against a very huge value.
- Added tests.
Result:
Fixes [#7365]
Motivation:
This allows netty to operate in 'transparent proxy' mode for UDP, intercepting connections
to other addresses by means of Linux firewalling rules, as per
https://www.kernel.org/doc/Documentation/networking/tproxy.txt
Modification:
Add IP_TRANSPARENT option.
Result:
Allows setting and getting of the IP_TRANSPARENT option, which allows retrieval of the ultimate socket address originally requested.
Motivation:
AbstractNioByteChannel will detect that the remote end of the socket has
been closed and propagate a user event through the pipeline. However if
the user has auto read on, or calls read again, we may propagate the
same user events again. If the underlying transport continuously
notifies us that there is read activity this will happen in a spin loop
which consumes unnecessary CPU.
Modifications:
- AbstractNioByteChannel's unsafe read() should check if the input side
of the socket has been shutdown before processing the event. This is
consistent with EPOLL and KQUEUE transports.
- add unit test with @normanmaurer's help, and make transports consistent with respect to user events
Result:
No more read spin loop in NIO when the channel is half closed.
Motivation:
The flush task is currently using flush() which will have the affect of have the flush traverse the whole ChannelPipeline and also flush messages that were written since we gave up flushing. This is not really correct as we should only continue to flush messages that were flushed at the point in time when the flush task was submitted for execution if the user not explicit call flush() by him/herself.
Modification:
Call *Unsafe.flush0() via the flush task which will only continue flushing messages that were marked as flushed before.
Result:
More correct behaviour when the flush task is used.
Motivation:
b215794de3 recently introduced a change in behavior where writeSpinCount provided a limit for how many write operations were attempted per flush operation. However when the write quantum was meet the selector write flag was not cleared, and the channel unsafe flush0 method has an optimization which prematurely exits if the write flag is set. This may lead to no write progress being made under the following scenario:
- flush is called, but the socket can't accept all data, we set the write flag
- the selector wakes us up because the socket is writable, we write data and use the writeSpinCount quantum
- we then schedule a flush() on the EventLoop to execute later, however it the flush0 optimization prematurely exits because the write flag is still set
In this scenario the socket is still writable so the EventLoop may never notify us that the socket is writable, and therefore we may never attempt to flush data to the OS.
Modifications:
- When the writeSpinCount quantum is exceeded we should clear the selector write flag
Result:
Fixes https://github.com/netty/netty/issues/7729
Motivation:
IovArray implements MessageProcessor, and the processMessage method will continue to be called during iteration until it returns true. A recent commit b215794de3 changed the return value to only return true if any component of a CompositeByteBuf was added as a result of the method call. However this results in the iteration continuing, and potentially subsequent smaller buffers maybe added, which will result in out of order writes and generally corrupts data.
Modifications:
- IovArray#add should return false so that the MessageProcessor#processMessage will stop iterating.
Result:
Native transports which use IovArray will not corrupt data during gathering writes of CompositeByteBuf objects.
Motivation:
FileDescriptor#writev calls JNI code, and that JNI code dereferences a NULL pointer which crashes the application. This occurs when writing a single CompositeByteBuf object with more than one component.
Modifications:
- Initialize the iovec iterator properly to avoid the core dump
- Fix the array length calculation if we aren't able to fit all the ByteBuffer objects in the iovec array
Result:
No more core dump.
Motivation:
The writeSpinCount currently loops over the same buffer, gathering
write, file write, or other write operation multiple times but will
continue writing until there is nothing left or the OS doesn't accept
any data for that specific write. However if the OS keeps accepting
writes there is no way to limit how much time we spend on a specific
socket. This can lead to unfair consumption of resources dedicated to a
single socket.
We currently don't limit the amount of bytes we attempt to write per
gathering write. If there are many more bytes pending relative to the
SO_SNDBUF size we will end up building iov arrays with more elements
than can be written, which results in extra iteration, conditionals,
and book keeping.
Modifications:
- writeSpinCount should limit the number of system calls we make to
write data, instead of applying to individual write operations
- IovArray should support a maximum number of bytes
- IovArray should support composite buffers of greater than size 1024
- We should auto-scale the amount of data that we attempt to write per
gathering write operation relative to SO_SNDBUF and how much data is
successfully written
- The non-unsafe path should also support a maximum number of bytes,
and respect the IOV_MAX limit
Result:
Write resource consumption can be bounded and gathering writes have
a limit relative to the amount of data which can actually be accepted
by the socket.
Motivation:
We used NetUtil.isIpV4StackPreferred() when loading JNI code which tries to load NetworkInterface in its static initializer. Unfortunally a lock on the NetworkInterface class init may be already hold somewhere else which may cause a loader deadlock.
Modifications:
Add a new Socket.initialize() method that will be called when init the library and pass everything needed to the JNI level so we not need to call back to java.
Result:
Fixes [#7458].
Motivation:
AbstractChannel attempts to "filter" messages which are written [1]. A goal of this process is to copy from heap to direct if necessary. However implementations of this method [2][3] may translate a buffer with 0 readable bytes to EMPTY_BUFFER. This may mask a user error where an empty buffer is written but already released.
Modifications:
Replace safeRelease(...) with release(...) to ensure we propagate reference count issues.
Result:
Fixes [#7383]
Automatic-Module-Name entry provides a stable JDK9 module name, when Netty is used in a modular JDK9 applications. More info: http://blog.joda.org/2017/05/java-se-9-jpms-automatic-modules.html
When Netty migrates to JDK9 in the future, the entry can be replaced by actual module-info descriptor.
Modification:
The POM-s are configured to put the correct module names to the manifest.
Result:
Fixes#7218.
Motivation:
To better isolate OS system calls we should not call getsockopt directly but use our netty_unix_socket_getOption0 function. See is a followup of f115bf5.
Modifications:
Export netty_unix_socket_getOption0 by declaring it in the header file and use it
Result:
Better isolation of system calls.
Motivation:
If a user calls EpollSocketChannelConfig.getOptions() and TCP_FASTOPEN_CONNECT is not supported we throw an exception.
Modifications:
- Just return 0 if ENOPROTOOPT is set.
- Add testcase
Result:
getOptions() works as epxected.
Motivation:
We need to set readPending to false when we detect a EOF while issue a read as otherwise we may not unregister from the Selector / Epoll / KQueue and so keep on receving wakeups.
The important bit is that we may even get a wakeup for a read event but will still will only be able to read 0 bytes from the socket, so we need to be very careful when we clear the readPending. This can happen because we generally using edge-triggered mode for our native transports and because of the nature of edge-triggered we may schedule an read event just to find out there is nothing left to read atm (because we completely drained the socket on the previous read).
Modifications:
Set readPending to false when EOF is detected.
Result:
Fixes [#7255].
This reverts commit 413c7c2cd8 as it introduced an regression when edge-triggered mode is used which is true for our native transports by default. With 413c7c2cd8 included it was possible that we set readPending to false by mistake even if we would be interested in read more.
Motivation:
Linux kernel 4.11 introduced a new socket option,
TCP_FASTOPEN_CONNECT, that greatly simplifies making TCP Fast Open
connections on client side. Usually simply setting the flag before
connect() call is enough, no more changes are required.
Details can be found in kernel commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=19f6d3f3
Modifications:
TCP_FASTOPEN_CONNECT socket option was added to EpollChannelOption
class.
Result:
Netty clients can easily make TCP Fast Open connections. Simply
calling option(EpollChannelOption.TCP_FASTOPEN_CONNECT, true) in
client bootstrap is enough (given recent enough kernel).
Motivation:
readPending is currently only set to false if data is delivered to the application, however this may result in duplicate events being received from the selector in the event that the socket was closed.
Modifications:
- We should set readPending to false before each read attempt for all
transports besides NIO.
- Based upon the Javadocs it is possible that NIO may have spurious
wakeups [1]. In this case we should be more cautious and only set
readPending to false if data was actually read.
[1] https://docs.oracle.com/javase/7/docs/api/java/nio/channels/SelectionKey.html
That a selection key's ready set indicates that its channel is ready for some operation category is a hint, but not a guarantee, that an operation in such a category may be performed by a thread without causing the thread to block.
Result:
Notification from the selector (or simulated events from kqueue/epoll ET) in the event of socket closure.
Fixes https://github.com/netty/netty/issues/7255
Motivation:
Even if it's a super micro-optimization (most JVM could optimize such
cases in runtime), in theory (and according to some perf tests) it
may help a bit. It also makes a code more clear and allows you to
access such methods in the test scope directly, without instance of
the class.
Modifications:
Add 'static' modifier for all methods, where it possible. Mostly in
test scope.
Result:
Cleaner code with proper 'static' modifiers.
Motivation:
There are 2 motivations, the first depends on the second:
Loading Netty Epoll statically stopped working in 4.1.16, due to
`Native` always loading the arch specific shared object. In a
static binary, there is no arch specific SO.
Second, there are a ton of exceptions that can happen when loading
a native library. When loading native code, Netty tries a bunch of
different paths but a failure in any given may not be fatal.
Additionally: turning on debug logging is not always feasible so
exceptions get silently swallowed.
Modifications:
* Change Epoll and Kqueue to try the static load second
* Modify NativeLibraryLoader to record all the locations where
exceptions occur.
* Attempt to use `addSuppressed` from Java 7 if available.
Alternatives Considered:
An alternative would be to record log messages at each failure. If
all load attempts fail, the log messages are printed as warning,
else as debug. The problem with this is there is no `LogRecord` to
create like in java.util.logging. Buffering the args to
logger.log() at the end of the method loses the call site, and
changes the order of events to be confusing.
Another alternative is to teach NativeLibraryLoader about loading
the SO first, and then the static version. This would consolidate
the code fore Epoll, Kqueue, and TCNative. I think this is the
long term better option, but this PR is changing a lot already.
Someone else can take a crack at it later
Results:
Epoll Still Loads and easier debugging.
Motivation:
We used to check for version 6.8 but the latest is 6.9
Modifications:
Update version to 6.9 in the check.
Result:
Be able to cut a release on latest centos version
Motivation:
When SO_LINGER is used we run doClose() on the GlobalEventExecutor by default so we need to ensure we schedule all code that needs to be run on the EventLoop on the EventLoop in doClose. Beside this there are also threading issues when calling shutdownOutput(...)
Modifications:
- Schedule removal from EventLoop to the EventLoop
- Correctly handle shutdownOutput and shutdown in respect with threading-model
- Add unit tests
Result:
Fixes [#7159].
Motivation:
We need to ensure we use the correct osname in the Bundle-NativeCode declaration as declared in:
https://www.osgi.org/developer/specifications/reference/
Modifications:
Update osname to match the spec.
Result:
Correct Bundle-NativeCode entry in the MANIFEST
Motivation:
We should only try to load the native artifacts if the architecture we are currently running on is the same as the one the native libraries were compiled for.
Modifications:
Include architecture in native lib name and append the current arch when trying to load these. This will fail then if its not the same as the arch of the compiled arch.
Result:
Fixes [#7150].
Motivation:
PD and PD0 Both try to find and use Unsafe. If unavailable, they
try to log why and continue on. However, it is not always east to
enable this logging. Chaining exceptions together is much easier
to reach, and the original exception is relevant when Unsafe is
needed.
Modifications:
* Make PD log why PD0 could not be loaded with a trace level log
* Make PD0 remember why Unsafe wasn't available
* Expose unavailability cause through PD for higher level use.
* Make Epoll and KQueue include the reason when failing
Result:
Easier debugging in hard to reconfigure environments
Motivation:
If AutoClose is false and there is a IoException then AbstractChannel will not close the channel but instead just fail flushed element in the ChannelOutboundBuffer. AbstractChannel also notifies of writability changes, which may lead to an infinite loop if the peer has closed its read side of the socket because we will keep accepting more data but continuously fail because the peer isn't accepting writes.
Modifications:
- If the transport throws on a write we should acknowledge that the output side of the channel has been shutdown and cleanup. If the channel can't accept more data because it is full, and still healthy it is not expected to throw. However if the channel is not healthy it will throw and is not expected to accept any more writes. In this case we should shutdown the output for Channels that support this feature and otherwise just close.
- Connection-less protocols like UDP can remain the same because the channel may disconnected temporarily.
- Make sure AbstractUnsafe#shutdownOutput is called because the shutdown on the socket may throw an exception.
Result:
More correct handling of write failure when AutoClose is false.
Motivation:
As noticed in https://stackoverflow.com/questions/45700277/
compilation can fail if the definition of a method doesn't
match the declaration. It's easy enough to add this in, and make
it easy to compile.
Modifications:
Add JNIEXPORT to the entry points.
* On Windows this adds: `__declspec(dllexport)`
* On Mac this adds: `__attribute__((visibility("default")))`
* On Linux (GCC 4.2+) this adds: ` __attribute__((visibility("default")))`
* On other it doesn't add anything.
Result:
Easier compilation
Motivation:
KQueueEventLoop and EpollEventLoop implement different approaches to applying a timeout of their respective poll calls. Epoll attempts to ensure the desired timeout is satisfied at the java layer and at the JNI layer, but it should be sufficient to account for spurious wakups at the JNI layer. Epoll timeout granularity is also limited to milliseconds which may be too large for some latency sensitive applications.
Modifications:
- Make EpollEventLoop wait method look like KQueueEventLoop
- Epoll should support a finer timeout granularity via timerfd_create. We can hide most of these details behind the epollWait0 JNI call to avoid crossing additional JNI boundaries.
Result:
More consistent timeout approach between KQueue and Epoll.
Motivation:
The EPOLL transport uses EPOLLRDHUP to detect when the peer closes the write side of the socket. Currently KQueue is not able to mimic this behavior and the only way to detect if the peer has closed is to read. It may not always be appropriate to read for backpressure and other reasons at the application level.
Modifications:
- Support EVFILT_SOCK filter which provides notification when the peer closes the socket
Result:
KQueue transport has more consistent behavior with Epoll transport for detecting peer closure.
Motivation:
Due to an oversight (by myself), linking two JNI modules with
duplicate symbols fails in linking. This only seems to happen
some of the time (the behavior seems to be different between GCC
and Clang toolchains). For instance, including both netty tcnative
and netty epoll fails to link because of duplicate JNI_OnLoad
symobols.
Modification:
Do not define the JNI_OnLoad and JNI_OnUnload symbols when
compiling for static linkage, as indicated by the NETTY_BUILD_STATIC
preprocessor define. They are never directly called when
statically linked.
Result:
Able to statically compile epoll and tcnative code into a single
binary.
Motivation:
At the moment we try to load the library using multiple names which includes names using - but also _ . We should just use _ all the time.
Modifications:
Replace - with _
Result:
Fixes [#7069]
Motivation:
Implementations of DuplexChannel delegate the shutdownOutput to the underlying transport, but do not take any action on the ChannelOutboundBuffer. In the event of a write failure due to the underlying transport failing and application may attempt to shutdown the output and allow the read side the transport to finish and detect the close. However this may result in an issue where writes are failed, this generates a writability change, we continue to write more data, and this may lead to another writability change, and this loop may continue. Shutting down the output should fail all pending writes and not allow any future writes to avoid this scenario.
Modifications:
- Implementations of DuplexChannel should null out the ChannelOutboundBuffer and fail all pending writes
Result:
More controlled sequencing for shutting down the output side of a channel.
Motivation:
We did not correctly handle connect() and disconnect() in EpollDatagramChannel / KQueueDatagramChannel and so the behavior was different compared to NioDatagramChannel.
Modifications:
- Correct implement connect and disconnect methods
- Share connect and related code
- Add tests
Result:
EpollDatagramChannel / KQueueDatagramChannel also supports correctly connect() and disconnect() methods.
Motivation:
IP_TRANSPARENT support is not complete, the option can currently only be set on EpollServerSocket. Setting the option on an EpollSocket is also requires so as to be able to bind a socket to a non-local address as described in ip(7)
http://man7.org/linux/man-pages/man7/ip.7.html
"TProxy redirection with the iptables TPROXY target also
requires that this option be set on the redirected socket."
Modifications:
Added IP_TRANSPARENT socket option to EpollSocketChannelConfig
Result:
A redirecting socket can be created with a non-local IP address as required for TPROXY
Motivation:
We used an int[] to store all values that are returned in the struct for TCP_INFO which is not good enough as it uses usigned int values.
Modifications:
- Change int[] to long[] and correctly cast values.
Result:
No more truncated values.
Motivation:
`Epoll.ensureAvailability()` is called multiple times, once in
static initialization and in a couple of the constructors. This is
redundant and confusing to read.
Modifications:
Move `Epoll.ensureAvailability()` call into an instance initializer
and remove all other references. This ensures that every EELG
checks availability, while still delaying the check until
construction. This pattern is used when there are multiple ctors,
as in this class.
Result:
Easier to read code.
Motivation:
We should not try to detect a free port in tests put just use 0 when bind so there is no race in which the system my bind something to the port we choosen before.
Modifications:
- Remove the usage of TestUtils.getFreePort() in the testsuite
- Remove hack to workaround bind errors which will not happen anymore now
Result:
Less flacky tests.
Motivation:
When run the current testsuite on docker (mac) it will fail a few tests with:
io.netty.channel.AbstractChannel$AnnotatedConnectException: connect(..) failed: Cannot assign requested address: /0:0:0:0:0:0:0:0%0:46607
Caused by: java.net.ConnectException: connect(..) failed: Cannot assign requested address
Modifications:
Specify host explicit as done in other tests to only use ipv6 when really supported.
Result:
Build pass on docker as well
Motivation:
JCTools 2.0.2 provides an unbounded MPSC linked queue. Before we shaded JCTools we had our own unbounded MPSC linked queue and used it in various places but gave this up because there was no public equivalent available in JCTools at the time.
Modifications:
- Use JCTool's MPSC linked queue when no upper bound is specified
Result:
Fixes https://github.com/netty/netty/issues/5951
Motivation:
An intermediate list is creating in the `EpollEventLoop#closeAll` to prevent ConcurrentModificationException. But this is not the obvious purpose has no comment.
Modifications:
Add comment to clarify the appointment of the intermediate collection.
Result:
More clear code.
Motivation:
We used an intermediate collection to store the read DatagramPackets and only fired these through the pipeline once wewere done with the reading loop. This is not needed and can also increase memory usage.
Modifications:
Remove intermediate collection
Result:
Less overhead and possible less memory usage during read loop.
Motivation:
We had a typo in the method name of the EpollSocketChannelConfig.
Modifications:
Deprecate old method and introduce a new one.
Result:
Fixes [#6909]
Motivation:
Enable static linking for Java 8. These commits are the same as those introduced to netty tcnative. The goal is to allow lots of JNI libraries to be statically linked together without having conflict `JNI_OnLoad` methods.
Modification:
* add JNI_OnLoad suffixes to enable static linking
* Add static names to the list of libraries that try to be loaded
* Enable compiling with JNI 1.8
* Sort includes
Result:
Enable statically linked JNI code.
The code in question has this comment, but it is *after* the fall
so the static analysis flags it.
This is described in http://errorprone.info/bugpattern/FallThrough
Modifications:
Move fall through comment to where the fall actually occurs
Result:
More compatible with Error Prone tools
Motivation:
Google requires stricter compilation by adding -Werror and enabling many other warnings.
Modification:
* fix warning caused by -Wmissing-braces
* Use the address of `sendmmsg` rather than the function itself when
checking for presence. This resovles the warning caused by
`-Wpointer-bool-conversion`.
More detail:
When compiling on Linux, `sendmmsg` is always present, so the
function is always nonnull. When compiling elsewhere, the
function is defined as `__attribute__((weak))` which means it
may be absent at link time. This is controlled by
`IO_NETTY_SENDMMSG_NOT_FOUND`, which is off by default.
The reason for the error is due to the risk of accidentally not
calling the function. By adding `&` before the function, there
is no ambiguity. (the result of the fn call cannot have its
address taken.)
* use != to check for sendmmsg
Result:
Easier compilation.
Motivation:
This allows netty to operate in 'transparent proxy' mode, intercepting connections
to other addresses by means of Linux firewalling rules, as per
https://www.kernel.org/doc/Documentation/networking/tproxy.txt
The original destination address can be obtained by referencing
ch.localAddress().
Modification:
Add methods similar to those for ipFreeBind, to set the IP_TRANSPARENT option.
Result:
Allows setting and getting of the IP_TRANSPARENT option, which allows retrieval of the ultimate socket address originally requested.
Motivation:
We should not force autoconf and compile as this will result in multiple executions and so slow down the build.
Modifications:
Remove force declarations
Result:
Faster build of native modules
Motivation:
1. special handling of ByteBuf with multi nioBuffer rather than type of CompositeByteBuf (eg. DuplicatedByteBuf with CompositeByteBuf)
2. EpollDatagramUnicastTest and KQueueDatagramUnicastTest passed because CompositeByteBuf is converted to DuplicatedByteBuf before write to channel
3. uninitalized struct msghdr will raise error
Modifications:
1. isBufferCopyNeededForWrite(like isSingleDirectBuffer in NioDatgramChannel) checks wether a new direct buffer is needed
2. special handling of ByteBuf with multi nioBuffer in EpollDatagramChannel, AbstractEpollStreamChannel, KQueueDatagramChannel, AbstractKQueueStreamChannel and IovArray
3. initalize struct msghdr
Result:
handle of ByteBuf with multi nioBuffer in EpollDatagramChannel and KQueueDatagramChannel are ok
Motivation:
We only can call eventLoop() if we are registered on an EventLoop yet. As we just did this without checking we spammed the log with an error that was harmless.
Modifications:
Check if registered on eventLoop before try to deregister on close.
Result:
Fixes [#6770]
Motivation:
For our native libraries in netty we support shading, to have this work on runtime the user needs to set a system property. This code should shared.
Modifications:
Move logic to NativeLbiraryLoader and so share for all native libs.
Result:
Less code duplication and also will work for netty-tcnative out of the box once it support shading
Motivation:
MacOS will throw an error when attempting to set the IP_TOS socket option if IPv6 is available, and also when getting the value for IP_TOS.
Modifications:
- Socket#setTrafficClass and Socket#getTrafficClass should try to use IPv6 first, and check if the error code indicates the protocol is not supported before trying IPv4
Result:
Fixes https://github.com/netty/netty/issues/6741.
Motivation:
To ensure the release plugin works correctly we need to ensure all modules are included during build.
Modification:
- Include all modules
- Skip compilation and tests for native code when not supported but still include the module and build the jar
Result:
Build and release works again
Motivation:
We currently don't have a native transport which supports kqueue https://www.freebsd.org/cgi/man.cgi?query=kqueue&sektion=2. This can be useful for BSD systems such as MacOS to take advantage of native features, and provide feature parity with the Linux native transport.
Modifications:
- Make a new transport-native-unix-common module with all the java classes and JNI code for generic unix items. This module will build a static library for each unix platform, and included in the dynamic libraries used for JNI (e.g. transport-native-epoll, and eventually kqueue).
- Make a new transport-native-unix-common-tests module where the tests for the transport-native-unix-common module will live. This is so each unix platform can inherit from these test and ensure they pass.
- Add a new transport-native-kqueue module which uses JNI to directly interact with kqueue
Result:
JNI support for kqueue.
Fixes https://github.com/netty/netty/issues/2448
Fixes https://github.com/netty/netty/issues/4231
Make the FileRegion comments about which transports are supported more accurate.
Also, eleminate any outstanding references to FileRegion.transfered as the method was renamed for spelling.
Modifications:
Class-level comment on FileRegion, can call renamed method.
Result:
More accurate documentation and less calls to deprecated methods.
Motivation:
Exceptions generated from transport-native-epoll may include duplicate error string description or inconsistent usage of the method name in the string description.
Modifications:
- Ensure the method name from static exceptions and dynamic exceptions is of the same format
- Remove duplicate string rational from the exception messages
Result:
More consistent error messages with no duplicate error description.
Motivation:
We missed some stuff in 5728e0eb2c and so the build failed on java9
Modifications:
- Add extra cmdline args when needed
- skip the autobahntestsuite as jython not works with java9
- skip the osgi testsuite as the maven plugin not works with java9
Result:
Build finally passed on java9
Motivation:
EpollDatagramChannel uses getOption in the isActive method. getOption is backed by a relatively large conditional if/else if block and this conditional checking can be avoided in the epoll transport.
Modifications:
- Add EpollDatagramChannelConfig#getActiveOnOpen and use this in EpollDatagramChannel
Result:
Conditional checking due to getOption is removed from EpollDatagramChannel.
Motivation:
When the EPOLLRDHUP event is received we assume that the read side of the FD is no longer functional and force the input state to be shutdown. However if the channel is still active we should rely upon EPOLLIN and read to indicate there is no more data before we update the shutdown state. If we do not do this we may not read all pending data in the FD if the RecvByteBufAllocator doesn't want to consume it all in a single read operation.
Modifications:
- AbstractEpollChannel#epollRdHupReady() shouldn't force shutdown the input if the channel is active
Result:
All data can be read even if the RecvByteBufAllocator doesn't read it all in the current read loop.
Fixes https://github.com/netty/netty/issues/6303
Motivation:
We need to pass special arguments to run with jdk9 as otherwise some tests will not be able to run.
Modifications:
Allow to define extra arguments when running with jdk9
Result:
Tests pass with jdk9
Motivation:
EPOLL annotates some exceptions to provide the remote address, but the original exception is not preserved. This may make determining a root cause more difficult. The static EPOLL exceptions references the native method that failed, but does not provide a description of the actual error number. Without the description users have to know intimate details about the native calls and how they may fail to debug issues.
Modifications:
- annotated exceptions should preserve the original exception
- static exceptions should include the string description of the expected errno
Result:
EPOLL exceptions provide more context and are more useful to end users.
Motivation:
EpollRecvByteAllocatorHandle intends to override the meaning of "maybe more data to read" which is a concept also used in all existing implementations of RecvByteBufAllocator$Handle but the interface doesn't support overriding. Because the interfaces lack the ability to propagate this computation EpollRecvByteAllocatorHandle attempts to implement a heuristic on top of the delegate which may lead to reading when we shouldn't or not reading data.
Modifications:
- Create a new interface ExtendedRecvByteBufAllocator and ExtendedHandle which allows the "maybe more data to read" between interfaces
- Deprecate RecvByteBufAllocator and change all existing implementations to extend ExtendedRecvByteBufAllocator
- transport-native-epoll should require ExtendedRecvByteBufAllocator so the "maybe more data to read" can be propagated to the ExtendedHandle
Result:
Fixes https://github.com/netty/netty/issues/6303.
Motivation:
We should call Epoll.ensureAvailability() when init EpollEventLoopGroup to fail fast and with a proper exception.
Modifications:
Call Epoll.ensureAvailability() during EpollEventLoopGroup init.
Result:
Fail fast if epoll is not availability (for whatever reason).
Motivation:
EpollRecvByteAllocatorHandle will read unconditionally if EPOLLRDHUP has been received. However we can just treat this the same was we do as data maybe pending in ET mode, and let LT mode notify us if we haven't read all data.
Modifications:
- EpollRecvByteAllocatorHandle should not always force a read just because EPOLLRDHUP has been received, but just treated as an indicator that there maybe more data to read in ET mode
Result:
Fixes https://github.com/netty/netty/issues/6173.
Motivation:
In later Java8 versions our Atomic*FieldUpdater are slower then the JDK implementations so we should not use ours anymore. Even worse the JDK implementations provide for example an optimized version of addAndGet(...) using intrinsics which makes it a lot faster for this use-case.
Modifications:
- Remove methods that return our own Atomic*FieldUpdaters.
- Use the JDK implementations everywhere.
Result:
Faster code.
Motivation:
To keep our code clean we should fail the build when unused c code is detected.
Modifications:
- Add '-Wunused-variable' to build flags
Result:
Cleaner code.
Motivation:
When attempting to flamegraph netty w/ epoll it was noticed the stacks are lost going from
java to epoll lib.
Modifications:
added the -fno-omit-framepointer flag to compiler flags to ensure the fp is kept intact
Result:
Flamegraphs will now show native code in the same stack as java code using perf-java-flames
Motivation:
We missed to set active = true in EpollServerDomainSocketChannel.doBind(...) which also means that channelActive(...) was never triggered.
Modifications:
Correct set active = true in doBind(...)
Result:
EpollServerDomainSocketChannel is correctly set to active when bound.
Motivation:
I had a need to know the user credentials of a connected unix domain socket.
Modifications:
Added a class to encapsulate user credentials (UID, GID, and the PID).
Augemented the Socket class to provide the JNI native interface to return this new class
Augemented the c code to call getSockOpts passing <a href=http://man7.org/linux/man-pages/man7/socket.7.html>SO_PEERCRED</a>
Then surfaced the ability to get user credentials in the EpollDomainSocketChannel
Result:
The EpollDomainSocketChannel now has a the following function signature:
public PeerCredentials peerCredentials() throws IOException allowing a caller to get the UID, GID, and PID of the linux process
connected to the unix domain socket.
Motivation:
The previously generated manifest causes a parse exception when loaded into an Apache Felix OSGI container.
Modifications:
Fix parameter delimiter and unbalanced quotes in manifest entry. Suffixed with asterisk so the bundle is resolved on other architectures as well even if native libs won't be loaded.
Result:
Bundle will load properly in OSGI containers.
Motivation:
If an exception is thrown while processing the ready channels in the EventLoop we should still run all tasks as this may allow to recover. For example a OutOfMemoryError may be thrown and runAllTasks() will free up memory again. Beside this we should also ensure we always allow to shutdown even if an exception was thrown.
Modifications:
- Call runAllTasks() in a finally block
- Ensure shutdown is always handles.
Result:
More robust EventLoop implementations for NIO and Epoll.
Motivation:
At the moment only DefaultFileRegion is supported when using the native epoll transport.
Modification:
- Add support for any FileRegion implementation
- Add test case
Result:
Also custom FileRegion implementation are supported when using the epoll transport.
Motivition:
NIO throws ClosedChannelException when a user tries to call shutdown*() on a closed Channel. We should do the same for the EPOLL transport
Modification:
Throw ClosedChannelException when a user tries to call shutdown*() on a closed channel.
Result:
Consistent behavior.
Motivation:
Commit 2c1f17faa2 introduced a regression which could cause NotYetConnectedExceptions when handling RDHUP events.
Modifications:
Correct ignore NotYetConnectedException when handling RDHUP events.
Result:
No more regression.
Motivation:
The NIO transport used an IllegalStateException if a user tried to issue another connect(...) while the connect was still in process. For this case the JDK specified a ConnectPendingException which we should use. The same issues exists in the EPOLL transport. Beside this the EPOLL transport also does not throw the right exceptions for ENETUNREACH and EISCONN errno codes.
Modifications:
- Replace IllegalStateException with ConnectPendingException in NIO and EPOLL transport
- throw correct exceptions for ENETUNREACH and EISCONN in EPOLL transport
- Add test case
Result:
More correct error handling for connect attempts when using NIO and EPOLL transport
Motivation:
We need to ensure we also call fireChannelActive() if the Channel is directly closed in a ChannelFutureListener that is belongs to the promise for the connect. Otherwise we will see missing active events.
Modifications:
Ensure we always call fireChannelActive() if the Channel was active.
Result:
No missing events.
Motivation:
We should throw a NotYetConnectedException when ENOTCONN errno is set. This is also consistent with NIO.
Modification:
Throw correct exception and add test case
Result:
More correct and consistent behavior.
Motivation:
In 4.0 AbstractNioByteChannel has a default of 16 max messages per read. However in 4.1 that constraint was applied at the NioSocketChannel which is not equivalent. In 4.1 AbstractEpollStreamChannel also did not have the default of 16 max messages per read applied.
Modifications:
- Make Nio consistent with 4.0
- Make Epoll consistent with Nio
Result:
Nio and Epoll both have consistent ChannelMetadata and are consistent with 4.0.
Motivation:
The build generates a OSGi bundle with missing Bundle-NativeCode manifest entry.
Modifications:
Add missing manifest entry.
Result:
Be able to use transport-native-epoll in osgi container.
Motivation:
ECONNREFUSED can be a common type of exception when attempting to finish the connection process. Generating a new exception each time can be costly and quickly bloat memory usage.
Modifications:
- Expose ECONNREFUSED from JNI and cache this exception in Socket.finishConnect
Result:
ECONNREFUSED during finish connect doesn't create a new exception each time.
Motiviation:
Sometimes it is useful to allow to specify a custom strategy to handle rejected tasks. For example if someone tries to add tasks from outside the eventloop it may make sense to try to backoff and retries and so give the executor time to recover.
Modification:
Add RejectedEventExecutor interface and implementations and allow to inject it.
Result:
More flexible handling of executor overload.
Motivation:
To restrict the memory usage of a system it is sometimes needed to adjust the number of max pending tasks in the tasks queue.
Modifications:
- Add new constructors to modify the number of allowed pending tasks.
- Add system properties to configure the default values.
Result:
More flexible configuration.
Motivation:
We use pre-instantiated exceptions in various places for performance reasons. These exceptions don't include a stacktrace which makes it hard to know where the exception was thrown. This is especially true as we use the same exception type (for example ChannelClosedException) in different places. Setting some StackTraceElements will provide more context as to where these exceptions original and make debugging easier.
Modifications:
Set a generated StackTraceElement on these pre-instantiated exceptions which at least contains the origin class and method name. The filename and linenumber are specified as unkown (as stated in the javadocs of StackTraceElement).
Result:
Easier to find the origin of a pre-instantiated exception.
Motivation:
Unused methods create warnings on some C compilers. It may not be feasible to selectively turn them off.
Modifications:
Remove createInetSocketAddress as it is unused.
Result:
Less noisy compilation
Motivation:
JCTools supports both non-unsafe, unsafe versions of queues and JDK6 which allows us to shade the library in netty-common allowing it to stay "zero dependency".
Modifications:
- Remove copy paste JCTools code and shade the library (dependencies that are shaded should be removed from the <dependencies> section of the generated POM).
- Remove usage of OneTimeTask and remove it all together.
Result:
Less code to maintain and easier to update JCTools and less GC pressure as the queue implementation nt creates so much garbage
Motivation:
epoll_wait accepts a timeout argument which will specify the maximum amount of time the epoll_wait will wait for an event to occur. If the epoll_wait method returns for any reason that is not fatal (e.g. EINTR) the original timeout value is re-used. This does not honor the timeout interface contract and can lead to unbounded time in epoll_wait.
Modifications:
- The time taken by epoll_wait should be decremented before calling epoll_wait again, and if the remaining time is exhausted we should return 0 according to the epoll_wait interface docs http://man7.org/linux/man-pages/man2/epoll_wait.2.html
- link librt which is needed for some platforms to use clock_gettime
Result:
epoll_wait will wait for at most timeout ms according to the epoll_wait interface contract.
Motivation:
Sometimes it may be benefitially for an user to specify a custom algorithm when choose the next EventExecutor/EventLoop.
Modifications:
Allow to specify a custom EventExecutorChooseFactory that allows to customize algorithm.
Result:
More flexible api.
Motivation:
We used transfered in native code which is not correct spelling. It should be transferred.
Modifications:
Fix typo.
Result:
Less typos in source code.
Motivation:
Currenlty, netty-transport-native-epoll-*-linux-x86_64.jar is not packed as OSGi bundle
and thus not working in OSGi environment.
Modifications:
In netty-transport-native-epoll's pom.xml added configuration
to attach manifest to the jar with a native library.
In netty-common's pom.xml added configuration instruction (DynamicImport-Package)
to maven bnd plugin to make sure the native code is loaded from
netty-transport-native-epoll bundle.
Result:
The netty-transport-native-epoll-*-linux-x86_64.jar is a bundle (MANIFEST.MF attached)
and the inluced native library can be successfuly loaded in OSGi environment.
Fixing #5119
Motivation:
SingleThreadEventExecutor.pendingTasks() will call taskQueue.size() to get the number of pending tasks in the queue. This is not safe when using MpscLinkedQueue as size() is only allowed to be called by a single consumer.
Modifications:
Ensure size() is only called from the EventLoop.
Result:
No more livelock possible when call pendingTasks, no matter from which thread it is done.
Motivation:
The DuplexChannel is currently incomplete and only supports shutting down the output side of a channel. This interface should also support shutting down the input side of the channel.
Modifications:
- Add shutdownInput and shutdown methods to the DuplexChannel interface
- Remove state in NIO and OIO for tracking input being shutdown independent of the underlying transport's socket type. Tracking the state independently may lead to inconsistent state.
Result:
DuplexChannel supports shutting down the input side of the channel
Fixes https://github.com/netty/netty/issues/5175
Motivation:
If a task was submitted when wakenUp value was 1, the task didn't get a chance to produce wakeup event. So we need to check task queue again before calling epoll_wait. If we don't, the task might be pended until epoll_wait was timed out. It might be pended until idle timeout if IdleStateHandler existed in pipeline.
Modifications:
Execute epoll_wait in a non-blocking manner if there's a task submitted when wakenUp value was 1.
Result:
Every tasks in EpollEventLoop will not be pended.
Motivation:
EventExecutor.children uses generics in such a way that an entire colleciton must be cast to a specific type of object. This interface is not very flexible and is impossible to implement if the EventExecutor type must be wrapped. The current usage of this method also does not have any clear need within Netty. The Iterator interface allows for EventExecutor to be wrapped and forces the caller to make assumptions about types instead of building the assumptions into the interface.
Motivation:
- Remove EventExecutor.children and undeprecate the iterator() interface
Result:
EventExecutor interface has one less method and is easier to wrap.
Motivation:
When epoll datagram channel invokes sendmmsg0, _all_ of the messages go
on the wire with the address of the _last_ packet in the list.
Modifications:
An array of addresses equal to the length of the messages is allocated
on the stack to hold the address for each msg_hdr.msg_name.
Result:
Each message goes on the wire with the correct address.
Motivation:
NioEventLoopGroup supports constructors which take an executor but EpollEventLoopGroup does not. EPOLL should be consistent with NIO where ever possible.
Modifications:
- Add constructors to EpollEventLoopGroup which accept an Executor as a parameter
Result:
EpollEventLoopGroup is more consistent with NioEventLoopGroup
Fixes https://github.com/netty/netty/issues/5161
Motivation:
Before release 4.1.0.Final we should update all our dependencies.
Modifications:
Update dependencies.
Result:
Up-to-date dependencies used.
Motivation:
Some applications may use alternative methods of loading the epoll JNI symbols. We should support this use case.
Modifications:
Attempt to use a side effect free JNI method. If that fails, load the library.
Result:
Fixes#5122
Motivation:
We missed to correctly retrieve the localAddress() after we called Socket.connect(..) and so the user would always see an incorrect address when calling EpollSocketChannel.localAddress().
Modifications:
- Ensure we always retrieve the localAddress() after we called Socket.connect(...) as only after this we will be able to receive the correct address.
- Add unit test
Result:
Correct and consistent behaviour across different transports (NIO/OIO/EPOLL).
Motivation:
441aa4c575 conditionally set the readFlag based upon if maybeMoreDataToRead is set. It is possible that the read flag will not be set, and nothing will be read by executeEpollInReadyRunnable and no actual data will be read even though the user requested it.
Modifications:
- Always set the readFlag in doBeginRead
- Make it so only a single epollInReadyRunnable can execute for a channel at a time
Result:
Less chance of missing read events in EPOLL transport.
Motivation:
OIO/NIO use a volatile variable to track if a read is pending. EPOLL does not use a volatile an executes a Runnable on the event loop thread to set readPending to false. These mechansims should be consistent, and not using a volatile variable is preferable because the variable is written to frequently in the event loop thread.
OIO also does not set readPending to false before each fireChannelRead operation and may result in reading more data than the user desires.
Modifications:
- OIO/NIO should not use a volatile variable for readPending
- OIO should set readPending to false before each fireChannelRead
Result:
OIO/NIO/EPOLL are more consistent w.r.t. readPending and volatile variable operations are reduced
Fixes https://github.com/netty/netty/issues/5069
Motivation:
441aa4c575 introduced a bug in transport-native-epoll where readPending is set to false before a read is attempted, but this should happen before fireChannelRead is called. The NIO transport also only sets the readPending variable to false on the first read in the event loop. This means that if the user only calls read() on the first channelRead(..) the select loop will still listen for read events even if the user does not call read() on subsequent channelRead() or channelReadComplete() in the same event loop run. If the user only needs 2 channelRead() calls then by default they will may get 14 more channelRead() calls in the current event loop, and then 16 more when the event loop is woken up for a read event. This will also read data off the TCP stack and allow the peer to queue more data in the local RECV buffers.
Modifications:
- readPending should be set to false before each call to channelRead()
- make NIO readPending set to false consistent with EPOLL
Result:
NIO and EPOLL transport set readPending to false at correct times which don't read more data than intended by the user.
Fixes https://github.com/netty/netty/issues/5082
Motivation:
There is a spelling error in FileRegion.transfered() as it should be transferred().
Modifications:
Deprecate old method and add a new one.
Result:
Fix typo and can remove the old method later.
Motivation:
bfbef036a8 made EPOLL respect autoRead while in ET mode. However it is possible that we may miss data pending on the RECV queue if autoRead is off. This is because maybeMoreDataToRead is updated after fireChannelRead and if a user calls read() from here maybeMoreDataToRead will be false because it is updated after the fireChannelRead call. The way maybeMoreDataToRead was updated also causes a single channel to continuously read on the event loop and not relinquish and give other channels to try reading.
Modifications:
- Ensure maybeMoreDataToRead is always set after all user events, and is evaluated with readPending to execute a epollInReady on the EventLoop
- Combine the checkResetEpollIn and maybeMoreDataToRead logic to invoke a epollInReady later into the epollInFinally method due to similar responsibilities
- Update unit tests to reflect the user calling read() on the event loop from channelRead()
Result:
EPOLL ET with autoRead set to false will not leave data on the RECV queue.
Motivation:
Setting the WRITE_BUFFER_LOW_WATER_MARK before WRITE_BUFFER_HIGH_WATER_MARK results in an internal Exception (appears only in the logs) if the value is larger than the default high water mark value. The WRITE_BUFFER_HIGH_WATER_MARK call appears to have no effect in this context.
Setting the values in the reverse order works.
Modifications:
- deprecated ChannelOption.WRITE_BUFFER_HIGH_WATER_MARK and
ChannelOption.WRITE_BUFFER_LOW_WATER_MARK.
- add one new option called ChannelOption.WRITE_BUFFER_WATER_MARK.
Result:
The high/low water mark values limits caused by default values are removed.
Setting the WRITE_BUFFER_LOW_WATER_MARK before WRITE_BUFFER_HIGH_WATER_MARK results in an internal Exception (appears only in the logs) if the value is larger than the default high water mark value. The WRITE_BUFFER_HIGH_WATER_MARK call appears to have no effect in this context.
Setting the values in the reverse order works.
Motivation:
NIO now supports a pluggable select strategy, but EPOLL currently doesn't support this. We should strive for feature parity for EPOLL.
Modifications:
- Add SelectStrategy to EPOLL transport.
Result:
EPOLL transport supports SelectStategy.
Motivation:
We need to break out of the read loop for two reasons:
- If the input was shutdown in between (which may be the case when the user did it in the
fireChannelRead(...) method we should not try to read again to not produce any
miss-leading exceptions.
- If the user closes the channel we need to ensure we not try to read from it again as
the filedescriptor may be re-used already by the OS if the system is handling a lot of
concurrent connections and so needs a lot of filedescriptors. If not do this we risk
reading data from a filedescriptor that belongs to another socket then the socket that
was "wrapped" by this Channel implementation.
Modification:
Break the reading loop if the input was shutdown from within the channelRead(...) method.
Result:
No more meaningless exceptions and no risk to read data from wrong socket after the original was closed.
Motivation:
8dbf5d02e5 modified the shutdown code for Socket but did not correctly calculate the change in shutdown state and only applying this change. This is significant because if sockets are being opening and closed quickly and the underlying FD happens to be reused we need to take care that we don't unintentionally change the state of the new FD by acting on an object which represents the old incarnation of that FD.
Modifications:
- Calculate the shutdown change, and only apply what has changed, or exit if no change.
Result:
Socket.shutdown can not inadvertently affect the state of another logical FD.
Motivation:
cf171ff525 introduced a change in behavior when dealing with closing channel in the read loop. This changed behavior may use stale state to determine if a channel should be shutdown and may be incorrect.
Modifications:
- Revert the usage of potentially stale state
Result:
Closing a channel in the read loop is based upon current state instead of potentially stale state.
Motivation:
The code of transport-native-epoll missed some things in terms of static keywords, @deprecated annotations and other minor things.
Modifications:
- Add missing @deprecated annotation
- Not using FQCN in javadocs
- Add static keyword where possible
- Use final fields when possible
- Remove throws IOException from method where it is not needed.
Result:
Cleaner code.
Motivation:
In commit acbca192bd we changed to have our native operations which either gall getsockopt or setsockopt throw IOExceptions (to be more specific we throw a ClosedChannelException in some cases). Unfortunally I missed to also do the same for getSoError() and missed to add throws IOException to the native methods.
Modifications:
- Correctly throw IOException from getSoError()
- Add throws IOException to native methods where it was missed.
Result:
Correct declaration of getSoError() and other native methods.
Motivation:
If SO_LINGER is set to 0 the EPOLL transport will send a FIN followed by a RST. This is not consistent with the behavior of the NIO transport. This variation in behavior can cause protocol violations in streaming protocols (e.g. HTTP) where a FIN may be interpreted as a valid end to a data stream, but RST may be treated as the data is corrupted and should be discarded.
https://github.com/netty/netty/issues/4170 Claims the behavior of NIO always issues a shutdown when close occurs. I could not find any evidence of this in Netty's NIO transport nor in the JDK's SocketChannel.close() implementation.
Modifications:
- AbstractEpollChannel should be consistent with the NIO transport and not force a shutdown on every close
- FileDescriptor to keep state in a consistent manner with the JDK and not allow a shutdown after a close
- Unit tests for NIO and EPOLL to ensure consistent behavior
Result:
EPOLL is capable of sending just a RST to terminate a connection.
Motivation:
To be consistent with the JDK we should ensure our native methods throw a ClosedChannelException if the Channel was previously closed. This will then be wrapped in a ChannelException as usual. For all other errors we continue to just throw a ChannelException directly.
Modifications:
Ensure getsockopt and setsockopt will throw a ClosedChannelException if the channel was closed before, on other errors we throw a ChannelException as before diretly.
Result:
Consistent with the NIO Channel implementations.
Motivation:
We should always first notify the promise before trigger an event through the pipeline to be consistent.
Modifications:
Ensure we notify the promise before fire event.
Result:
Consistent behavior
Motivation:
EpollServerSocketConfig.isFreebind() throws an exception when called.
Modifications:
Use the correct getsockopt arguments.
Result:
No more exception when call EpollServerSocketConfig.isFreebind()
Motivation:
TCP_MD5 is only supported by SocketChannels so remove it from EpollServerChannelConfig which is generic.
Modifications:
Remove invalid code.
Result:
Remove invalid / dead code.
Motivation:
EPOLL does not support autoread when in ET mode.
Modifications:
- EpollRecvByteAllocatorHandle should not unconditionally force reading just because ET is enabled
- AbstractEpollChannel and all derived classes which implement epollInReady must support a variable which indicates
there may be more data to read. The variable will be used when read is called to simulate a EPOLL wakeup and call epollInReady if necessary. This will ensure that if we don't read until EAGAIN that we will try to read again and not rely on EPOLL to notify us.
Result:
EPOLL ET supports auto read.
Motivation:
When using the native transport have support for TCP_DEFER_ACCEPT or / and TCP_QUICKACK can be useful.
Modifications:
- Add support for TCP_DEFER_ACCEPT and TCP_QUICKACK
- Ad unit tests
Result:
TCP_DEFER_ACCEPT and TCP_QUICKACK are supported now.
Motivation:
For on tests we expected a ConnectTimeoutException but used the default timeout of 10 seconds. This slows down testing.
Modifications:
Use connect timeout of 1 second in unit test.
Result:
Faster execution of unit test.
Motivation:
JNI_OnUnload(...) does not return anything (has void in its signature) so we should not try to return something.
Modifications:
Remove return.
Result:
Fix incorrect but harmless code.
Motivation:
netty_epoll_native.c uses dladdr in attempt to get the name of the library that the code is running in. However the address passed to this funciton (JNI_OnLoad) may not be unique in the context of the application which loaded it. For example if another JNI library is loaded this address may first resolve to the other JNI library and cause the path name parsing to fail, which will cause the library to fail.
Modifications:
- Pass an addresses which is local to the current library to dladdr
Result:
EPOLL JNI library can be loaded in an environment where multiple JNI libraries are loaded.
Fixes https://github.com/netty/netty/issues/4840
Motivation:
Currently our epoll native transport requires sun.misc.Unsafe and so we need to take this into account for Epoll.isAvailable().
Modifications:
Take into account if sun.misc.Unsafe is present.
Result:
Only return true for Epoll.isAvailable() if sun.misc.Unsafe is present.
Motivation:
If Netty's class files are renamed and the type references are updated (shaded) the native libraries will not function. The native epoll module uses implicit JNI bindings which requires the fully qualified java type names to match the method signatures of the native methods. This means EPOLL cannot be used with a shaded Netty.
Modifications:
- Make the JNI method registration dynamic
- support a system property io.netty.packagePrefix which must be prepended to the name of the native library (to ensure the correct library is loaded) and all class names (to allow classes to be correctly referenced)
- remove system property io.netty.native.epoll.nettyPackagePrefix which was recently added and the code to support it was incomplete
Result:
transport-native-epoll can be used when Netty has been shaded.
Fixes https://github.com/netty/netty/issues/4800
Motivation:
As we now can easily build static linked versions of tcnative it makes sense to run our netty build against all of them.
This helps to ensure our code works with libressl, openssl and boringssl.
Modifications:
Allow to specify -Dtcnative.artifactId= and -Dtcnative.version=
Result:
Easy to run netty build against different tcnative flavors.
Motivation:
When a wildcard address is used to bind a socket and ipv4 and ipv6 are usable we should accept both (just like JDK IO/NIO does).
Modifications:
Detect wildcard address and if so use in6addr_any
Result:
Correctly accept ipv4 and ipv6
Motivation:
transport-native-epoll finds java classes from JNI using fully qualified class names. If a shaded version of Netty is used then these lookups will fail.
Modifications:
- Allow a prefix to be appended to Netty class names in JNI code.
Result:
JNI code can be used with shaded version of Netty.
Motivation:
We should also be able to compile the native transport on 32bit systems.
Modifications:
Add cast to intptr_t for pointers
Result:
It's possible now to also compile on 32bit.
Motivation:
transport-native-epoll has its pom.xml encoding attribute set to ISO-8859-15. Because
of this gradle, and other dependency management systems, can't correctly resolve this
library from wherever it happens to be published.
Modifications:
netty/transport-native-epoll/pom.xml had its xml encoding changed to UTF-9
Result:
Gradle, and other dependency management systems, will now be able to correctly resolve this module.
Motivation:
Linux uses different socket options to set the traffic class (DSCP) on IPv6
Modifications:
Also set IPV6_TCLASS for IPv6 sockets
Result:
TrafficClass will work on IPv4 and IPv6 correctly
Motivation:
If an user will close a Socket / FileDescriptor multiple times we should handle the extra close operations as NOOP.
Modifications:
Only do the actual closing one time
Result:
No exception if close is called multiple times.
Motivation:
We missed to define the actual c function for isKeepAlive(...) and so throw UnsatisfieldLinkError.
Modifications:
- Add function
- Add unit test for Socket class
Result:
Correctly work isKeepAlive(...) when using native transport
Motivation:
We need to remove all registered events for a Channel from the EventLoop before doing the actual close to ensure we not produce a cpu spin when the actual close operation is delayed or executed outside of the EventLoop.
Modifications:
Deregister for events for NIO and EPOLL socket implementations when SO_LINGER is used.
Result:
No more cpu spin.
Motivation:
We should retain the original hostname when connect to a remote peer so the user can still query the origin hostname if getHostString() is used.
Modifications:
Compute a InetSocketAddress from the original remote address and the one returned by the Os.
Result:
Same behavior when using epoll transport and nio transport.
Motivation:
Fix a race-condition when closing NioSocketChannel or EpollSocketChannel while try to detect if a close executor should be used and the underlying socket was already closed. This could lead to an exception that then leave the channel / in an invalid state and so could lead to side-effects like heavy CPU usage.
Modifications:
Catch possible socket exception while try to get the SO_LINGER options from the underlying socket.
Result:
No more race-condition when closing the channel is possible with bad side-effects.
Motivation:
AbstractEpollStreamChannel has a queue which collects splice events. Splice is assumed not to be the most common use case of this class and thus the splice queue could be initialized in a lazy fashion to save memory. This becomes more significant when the number of connections grows.
Modifications:
- AbstractEpollStreamChannel.spliceQueue will be initialized in a lazy fashion
Result:
Less memory consumption for most use cases
Motivation:
We should use OneTimeTask where possible to reduce object creation.
Modifications:
Replace Runnable with OneTimeTask
Result:
Less object creation
Motivation:
If we have a lot of writes going on we currently need to lookup the IovArray for each Channel that does writes. This can have quite some perf overhead. We should not need to do this and just store a reference of the IovArray on the EpollEventLoop itself.
Modifications:
- Remove IoArrayThreadLocal
- Store the IoArray in the EventLoop itself
Result:
Less FastThreadLocal lookups
Motivation:
If ChannelOption.ALLOW_HALF_CLOSURE is true and the shutdown input operation fails we should not propagate this exception, and instead consider this socket's read as half closed.
Modifications:
- AbstractEpollChannel.shutdownInput should not propagate exceptions when attempting to shutdown the input, but instead should just close the socket
Result:
Users expecting a ChannelInputShutdownEvent will get this event even if the socket is already shutdown, and the shutdown operation fails.
Motivation:
The EPOLL module was not completly respecting the half closed state. It may have missed events, or procssed events when it should not have due to checking isOpen instead of the appropriate shutdown state.
Modifications:
- use FileDescriptor's isShutdown* methods instead of isOpen to check for processing events.
Result:
Half closed code in EPOLL module is more correct.