Motivation:
Creating exceptions is expensive so we should only do so if really needed.
Modifications:
Only create the ConnectTimeoutException if we really need it.
Result:
Less overhead
Motivation:
When we try to parse the kernel version we need to be careful what to
expect. Especially when a custom kernel is used we may get extra chars
in the version numbers. For example I had this one fail because of my
custom kernel that I built for io_uring:
5.8.7ioring-fixes+
Modifications:
- Try to be a bit more lenient when parsing
- If we cant parse the kernel version just use 0.0.0
Result:
Tests are more robust
only
Motivation:
4b7dba1 introduced a change which was not 100 % complete and so
introduce a regression when a user specified to use
InetProtocolFamily.IPv4 and trying to bind to a port (without specify
the ip).
Modifications:
- Fix regression by respect the InetProtocolFamily
- Add unit test
Result:
Fix regression when binding to port explicit
Motivation:
When we were using the netty http protocol, OOM occurred, this problem has been in 4.1.51.Final Fix [# 10424](https://github.com/netty/netty/issues/10424), even if OOM is up, the service will still receive new connection events, will occur again OOM and eventually cause the connection not to be released.
code `byteBuf = allocHandle.allocate(allocator);`
Modification:
I fail to create buffer when I try to receive new data, i determine if it is OOM then the close read event releases the connection.
```java
if (close || cause instanceof OutOfMemoryError || cause instanceof IOException) {
closeOnRead(pipeline);
}
```
Result:
Fixes # [10434](https://github.com/netty/netty/issues/10434).
Motivation:
Even if the system does not support ipv6 we should try to use it if the user explicit pass an Inet6Address. This way we ensure we fail and not try to convert this to an ipv4 address internally.
This incorrect behavior was introduced by 70731bfa7e
Modifications:
If the user explicit passed an Inet6Address we force the usage of ipv6
Result:
Fixes https://github.com/netty/netty/issues/10402
Motivation:
netty_epoll_linuxsocket_JNI_OnLoad(...) may produce a deadlock with another thread that will load IOUtil in a static block. This seems to be a JDK bug which is not yet fixed. To workaround this we force IOUtil to be loaded from without java code before init the JNI code
Modifications:
Use Selector.open() as a workaround to load IOUtil
Result:
Fixes https://github.com/netty/netty/issues/10187
Motivation:
8dc6ad5 introduced IPV6-mapped-IPV4 address support but
copied the addresses incorrectly. It copied the first
4 bytes of the ipv6 address to the address byte array
at offset 12, instead of the other way around.
7a547aa implemented this correctly in netty_unix_socket.c
but it seems the change should've been applied to
netty_epoll_native.c as well.
The current behaviour will always set the address to
`0.0.0.0`.
Modifications:
Copy the correct bytes from the ipv6 mapped ipv4 address.
I.e. copy 4 bytes at offset 12 from the native address
to the byte array `addr` at offset 0.
Result:
When using recvmmsg with IPV6-mapped-IPV4 addresses,
the address will be correctly copied to the byte array
`addr` in the NativeDatagramPacket instance.
Motivation:
https://github.com/netty/netty/pull/9797 changed the code for recvmmsg and sendmmsg to use the syscalls directly to remvove the dependency on newer GLIBC versions. Unfortunally it made the assumption that the syscall numbers are the same for different architectures, which is not the case.
Thanks to @jayv for pointing it out
Modifications:
Add #if, #elif and #else declarations to ensure we pick the correct syscall number (or not support if if the architecture is not supported atm).
Result:
Pick the correct syscall number depending on the architecture.
Motivation:
Due a bug we did not correctly set the writerIndex of the ByteBuf when a
user specified EpollChannelOption.MAX_DATAGRAM_PAYLOAD_SIZE but we ended
up with a non scattering read.
Modifications:
- Set writerIndex to the correct value
- Add unit tests
Result:
Fixes https://github.com/netty/netty/issues/9788
Motivation
The event loop implementations had become somewhat tangled over time and
work was done recently to streamline EpollEventLoop. NioEventLoop would
benefit from the same treatment and it is more straighforward now that
we can follow the same structure as was done for epoll.
Modifications
Untangle NioEventLoop logic and mirror what's now done in EpollEventLoop
w.r.t. the volatile selector wake-up guard and scheduled task deadline
handling.
Some common refinements to EpollEventLoop have also been included - to
use constants for the "special" deadline/wakeup volatile values and to
avoid some unnecessary calls to System.nanoTime() on task-only
iterations.
Result
Hopefully cleaner, more efficient and less fragile NIO transport
implementation.
Motivation:
394a1b3485 introduced a hard dependency on GLIBC 2.12 which was not the case before. This had the effect of not be able to use the native epoll transports on platforms which ship with earlier versions of GLIBC.
To make things a backward compatible as possible we should not introduce such changes in a bugfix release.
Special thanks to @weissi with all the help to fix this.
Modifications:
- Use syscalls directly to remove dependency on GLIBC 2.12
- Make code consistent that needs newer GLIBC versions
- Adjust scattering read test to only run if recvmmsg syscall is supported
- Cleanup pom.xml as some stuff is not needed anymore after using syscalls.
Result:
Fixes https://github.com/netty/netty/issues/9758.
Motivation:
In most cases, we want to use MultithreadEventLoopGroup such as NioEventLoopGroup without setting thread numbers but thread name only. So we need to use followed code:
NioEventLoopGroup boss = new NioEventLoopGroup(0, new DefaultThreadFactory("boss"));
It looks a bit confuse or strange for the number 0 due to we only want to set thread name. So it will be better to add new constructor for this case.
Modifications:
add new constructor into all event loop groups, for example: public NioEventLoopGroup(ThreadFactory threadFactory)
Result:
User can only set thread factory without setting the thread number to 0:
NioEventLoopGroup boss = new NioEventLoopGroup(new DefaultThreadFactory("boss"));
Motivation
The recently-introduced event loop scheduling hooks can be exploited by
the epoll transport to avoid waking the event loop when scheduling
future tasks if there is a timer already set to wake up sooner.
There is also a "default" timeout which will wake the event
loop after 1 second if there are no pending future tasks. The
performance impact of these wakeups themselves is likely negligible but
there's significant overhead in having to re-arm the timer every time
the event loop goes to sleep (see #7816). It's not 100% clear why this
timeout was there originally but we're sure it's no longer needed.
Modification
Combine the existing volatile wakenUp and non-volatile prevDeadlineNanos
fields into a single AtomicLong that stores the next scheduled wakeup
time while the event loop is in epoll_wait, and is -1 while it is awake.
Use this as a guard to debounce wakeups from both immediate scheduled
tasks and future scheduled tasks, the latter using the new
before/afterScheduledTaskSubmitted overrides and based on whether the
new deadline occurs prior to an already-scheduled timer.
A similar optimization was already added to NioEventLoop, but it still
uses two separate volatiles. We should consider similar streamlining of
that in a future update.
Result
Fewer event loop wakeups when scheduling future tasks, greatly reduced
overhead when no future tasks are scheduled.
Motivation:
There is a goto statement above the current position of initialize dynamicMethods, and dynamicMethods is used after the goto which might cause undefined behavior.
Modifications:
Initialize dynamicMehtods at the top.
Result:
No more undefined behavior.
Motivation
This is a vestige that was removed in the original PR #9535 before it
was reverted, but we missed it when re-applying in #9586.
It means there is a possible race condition because a wakeup event could
be missed while shutting down, but the consequences aren't serious since
there's a 1 second safeguard timeout when waiting for it.
Modification
Remove call to epollWaitNow() in EpollEventLoop#closeAll()
Result
Cleanup redundant code, avoid shutdown delay race condition
Motivation
The current event loop shutdown logic is quite fragile and in the
epoll/NIO cases relies on the default 1 second wait/select timeout that
applies when there are no scheduled tasks. Without this default timeout
the shutdown would hang indefinitely.
The timeout only takes effect in this case because queued scheduled
tasks are first cancelled in
SingleThreadEventExecutor#confirmShutdown(), but I _think_ even this
isn't robust, since the main task queue is subsequently serviced which
could result in some new scheduled task being queued with much later
deadline.
It also means shutdowns are unnecessarily delayed by up to 1 second.
Modifications
- Add/extend unit tests to expose the issue
- Adjust SingleThreadEventExecutor shutdown and confirmShutdown methods
to explicitly add no-op tasks to the taskQueue so that the subsequent
event loop iteration doesn't enter blocking wait (as looks like was
originally intended)
Results
Faster and more robust shutdown of event loops, allows removal of the
default wait timeout
Motivation:
At the moment we not consistently (and also not correctly) free allocated native memory in all cases during loading the JNI library. This can lead to native memory leaks in the unlikely case of failure while trying to load the library.
Beside this we also not always correctly handle the case when a new java object can not be created in native code because of out of memory.
Modification:
- Copy some macros from netty-tcnative to be able to handle errors in a more easy fashion
- Correctly account for New* functions to return NULL
- Share code
Result:
More robust and clean JNI code
Motivation
This is another iteration of #9476.
Modifications
Instead of maintaining a count of all writes performed and then using
reads during shutdown to ensure all are accounted for, just set a flag
after each write and don't reset it until the corresponding event has
been returned from epoll_wait.
This requires that while a write is still pending we don't reset
wakenUp, i.e. continue to block writes from the wakeup() method.
Result
Race condition eliminated. Fixes#9362
Co-authored-by: Norman Maurer <norman_maurer@apple.com>
Motivation:
Running tests with a `KQueueDomainSocketChannel` showed worse performance than an `NioSocketChannel`. It turns out that the default send buffer size for Nio sockets is 64k while for KQueue sockets it's 8k. I verified that manually setting the socket's send buffer size improved perf to expected levels.
Modification:
Plumb the `SO_SNDBUF` and `SO_RCVBUF` options into the `*DomainSocketChannelConfig`.
Result:
Can now configure send and receive buffer sizes for domain sockets.
Motivation
Currently an epoll_ctl syscall is made every time there is a change to
the event interest flags (EPOLLIN, EPOLLOUT, etc) of a channel. These
are only done in the event loop so can be aggregated into 0 or 1 such
calls per channel prior to the next call to epoll_wait.
Modifications
I think further streamlining/simplification is possible but for now I've
tried to minimize structural changes and added the aggregation beneath
the existing flag manipulation logic.
A new AbstractChannel#activeFlags field records the flags last set on
the epoll fd for that channel. Calls to setFlag/clearFlag update the
flags field as before but instead of calling epoll_ctl immediately, just
set or clear a bit for the channel in a new bitset in the associated
EpollEventLoop to reflect whether there's any change to the last set
value.
Prior to calling epoll_wait the event loop makes the appropriate
epoll_ctl(EPOLL_CTL_MOD) call once for each channel who's bit is set.
Result
Fewer syscalls, particularly in some auto-read=false cases. Simplified
error handling from centralization of these calls.
Motivation:
We should correctly reset the cached local and remote address when a Channel.disconnect() is called and the channel has a notion of disconnect vs close (for example DatagramChannel implementations).
Modifications:
- Correctly reset cached kicak abd remote address
- Update testcase to cover it and so ensure all transports work in a consistent way
Result:
Correctly handle disconnect()
Motivation:
Changes that were done to the EpollEventLoop to optimize some things did break some testsuite and caused timeouts. We need to investigate to see why this is the case but for
now we should just revert so we can do a release.
Modifivations:
- Partly revert 1fa7a5e697 and a22d4ba859
Result:
Testsuites pass again.
Motivation:
291f80733a introduced a change to use a byte[] to construct the InetAddress when receiving datagram messages to reduce the overhead. Unfortunally it introduced a regression when handling IPv6-mapped-IPv4 addresses and so produced an IndexOutOfBoundsException when trying to fill the byte[] in native code.
Modifications:
- Correctly use the offset on the pointer of the address.
- Add testcase
- Make tests more robust and include more details when the test fails
Result:
No more IndexOutOfBoundsException
Motivation:
394a1b3485 added support for recvmmsg(...) for unconnected datagram channels, this change also allows to use recvmmsg(...) with connected datagram channels.
Modifications:
- Always try to use recvmmsg(...) if configured to do so
- Adjust unit test to cover it
Result:
Less syscalls when reading datagram packets
Motivation:
We should also use sendmmsg on connected channels whenever possible to reduce the overhead of syscalls.
Modifications:
No matter if the channel is connected or not try to use sendmmsg when supported to reduce the overhead of syscalls
Result:
Better performance on connected UDP channels due less syscalls
Motivation:
394a1b3485 introduced the possibility to use recvmmsg(...) but did not correctly handle ipv6 mapped ip4 addresses to make it consistent with other transports.
Modifications:
- Correctly handle ipv6 mapped ipv4 addresses by only copy over the relevant bytes
- Small improvement on how to detect ipv6 mapped ipv4 addresses by using memcmp and not byte by byte compare
- Adjust test to cover this bug
Result:
Correctly handle ipv6 mapped ipv4 addresses
Motivation
This is another iteration of #9476.
Modifications
Instead of maintaining a count of all writes performed and then using
reads during shutdown to ensure all are accounted for, just set a flag
after each write and don't reset it until the corresponding event has
been returned from epoll_wait.
This requires that while a write is still pending we don't reset
wakenUp, i.e. continue to block writes from the wakeup() method.
Result
Race condition eliminated. Fixes#9362
Motivation:
When using datagram sockets which need to handle a lot of packets it makes sense to use recvmmsg to be able to read multiple datagram packets with one syscall.
Modifications:
- Add support for recvmmsg on linux
- Add new EpollChannelOption.MAX_DATAGRAM_PACKET_SIZE
- Add tests
Result:
Fixes https://github.com/netty/netty/issues/8446.
Motivation:
There are some extra log level checks (logger.isWarnEnabled()).
Modification:
Remove log level checks (logger.isWarnEnabled()) from io.netty.channel.epoll.AbstractEpollStreamChannel, io.netty.channel.DefaultFileRegion, io.netty.channel.nio.AbstractNioChannel, io.netty.util.HashedWheelTimer, io.netty.handler.stream.ChunkedWriteHandler and io.netty.channel.udt.nio.NioUdtMessageConnectorChannel
Result:
Fixes#9456
Motivation
The epoll transport was updated in #7834 to decouple setting of the
timerFd from the event loop, so that scheduling delayed tasks does not
require waking up epoll_wait. To achieve this, new overridable hooks
were added in the AbstractScheduledEventExecutor and
SingleThreadEventExecutor superclasses.
However, the minimumDelayScheduledTaskRemoved hook has no current
purpose and I can't envisage a _practical_ need for it. Removing
it would reduce complexity and avoid supporting this specific
API indefinitely. We can add something similar later if needed
but the opposite is not true.
There also isn't a _nice_ way to use the abstractions for
wakeup-avoidance optimizations in other EventLoops that don't have a
decoupled timer.
This PR replaces executeScheduledRunnable and
wakesUpForScheduledRunnable
with two new methods before/afterFutureTaskScheduled that have slightly
different semantics:
- They only apply to additions; given the current internals there's no
practical use for removals
- They allow per-submission wakeup decisions via a boolean return val,
which makes them easier to exploit from other existing EL impls (e.g.
NIO/KQueue)
- They are subjectively "cleaner", taking just the deadline parameter
and not exposing Runnables
- For current EL/queue impls, only the "after" hook is really needed,
but specialized blocking queue impls can conditionally wake on task
submission (I have one lined up)
Also included are further optimization/simplification/fixes to the
timerFd manipulation logic.
Modifications
- Remove AbstractScheduledEventExecutor#minimumDelayScheduledTaskRemoved()
and supporting methods
- Uplift NonWakeupRunnable and corresponding default wakesUpForTask()
impl from SingleThreadEventLoop to SingleThreadEventExecutor
- Change executeScheduledRunnable() to be package-private, and have a
final impl in SingleThreadEventExecutor which triggers new overridable
hooks before/afterFutureTaskScheduled()
- Remove unnecessary use of bookend tasks while draining the task queue
- Use new hooks to add simpler wake-up avoidance optimization to
NioEventLoop (primarily to demonstrate utility/simplicity)
- Reinstate removed EpollTest class
In EpollEventLoop:
- Refactor to use only the new afterFutureTaskScheduled() hook for
updating timerFd
- Fix setTimerFd race condition using a monitor
- Set nextDeadlineNanos to a negative value while the EL is awake and
use this to block timer changes from outside the EL. Restore the
known-set value prior to sleeping, updating timerFd first if necessary
- Don't read from timerFd when processing expiry event
Result
- Cleaner API for integrating with different EL/queue timing impls
- Fixed race condition to avoid missing scheduled wakeups
- Eliminate unnecessary timerFd updates while EL is awake, and
unnecessary expired timerFd reads
- Avoid unnecessary scheduled-task wakeups when using NIO transport
I did not yet further explore the suggestion of using
TFD_TIMER_ABSTIME for the timerFd.
Motivation
Currently an epoll_ctl syscall is made every time there is a change to
the event interest flags (EPOLLIN, EPOLLOUT, etc) of a channel. These
are only done in the event loop so can be aggregated into 0 or 1 such
calls per channel prior to the next call to epoll_wait.
Modifications
I think further streamlining/simplification is possible but for now I've
tried to minimize structural changes and added the aggregation beneath
the existing flag manipulation logic.
A new AbstractChannel#activeFlags field records the flags last set on
the epoll fd for that channel. Calls to setFlag/clearFlag update the
flags field as before but instead of calling epoll_ctl immediately, just
set or clear a bit for the channel in a new bitset in the associated
EpollEventLoop to reflect whether there's any change to the last set
value.
Prior to calling epoll_wait the event loop makes the appropriate
epoll_ctl(EPOLL_CTL_MOD) call once for each channel who's bit is set.
Result
Fewer syscalls, particularly in some auto-read=false cases. Simplified
error handling from centralization of these calls.
Motivation:
EPOLL supports decoupling the timed wakeup mechanism from the selector call. The EPOLL transport takes advantage of this in order to offer more fine grained timer resolution. However we are current calling timerfd_settime on each call to epoll_wait and this is expensive. We don't have to re-arm the timer on every call to epoll_wait and instead only have to arm the timer when a task is scheduled with an earlier expiration than any other existing scheduled task.
Modifications:
- Before scheduled tasks are added to the task queue, we determine if the new
duration is the soonest to expire, and if so update with timerfd_settime. We
also drain all the tasks at the end of the event loop to make sure we service
any expired tasks and get an accurate next time delay.
- EpollEventLoop maintains a volatile variable which represents the next deadline to expire. This variable is modified inside the event loop thread (before calling epoll_wait) and out side the event loop thread (immediately to ensure proper wakeup time).
- Execute the task queue before the schedule task priority queue. This means we
may delay the processing of scheduled tasks but it ensures we transfer all
pending tasks from the task queue to the scheduled priority queue to run the
soonest to expire scheduled task first.
- Deprecate IORatio on EpollEventLoop, and drain the executor and scheduled queue on each event loop wakeup. Coupling the amount of time we are allowed to drain the executor queue to a proportion of time we process inbound IO may lead to unbounded queue sizes and unpredictable latency.
Result:
Fixes https://github.com/netty/netty/issues/7829
- In most cases this results in less calls to timerfd_settime
- Less event loop wakeups just to check for scheduled tasks executed outside the event loop
- More predictable executor queue and scheduled task queue draining
- More accurate and responsive scheduled task execution
Motivation:
EpollDatagramChannel#localAddress returns wrong information when
EpollDatagramChannel is created with InternetProtocolFamily,
and EpollDatagramChannel#localAddress is invoked BEFORE the actual binding.
This is a regression caused by change
e17ce934da
Modifications:
EpollDatagramChannel() and EpollDatagramChannel(InternetProtocolFamily family)
do not cache local/remote address
Result:
Rebinding on the same address without "reuse port" works
EpollDatagramChannel#localAddress returns correct address
Motivation:
We recently made a change to use ET for the eventfd and not trigger a read each time. This testcase proves everything works as expected.
Modifications:
Add testcase that verifies thqat the wakeups happen correctly
Result:
More tests
Motivation
The AbstractEpollStreamChannel::spliceTo(FileDescriptor, ...) methods
take an offset parameter but this was effectively ignored due to what
looks like a typo in the corresponding JNI function impl. Instead it
would always use the file's own native offset.
Modification
- Fix typo in netty_epoll_native_splice0() and offset accounting in
AbstractEpollStreamChannel::SpliceFdTask.
- Modify unit test to include an invocation of the public spliceTo
method using non-zero offset.
Result
spliceTo FD methods work as expected when an offset is provided.
Motivation
I noticed this while looking at something else.
AbstractEpollStreamChannel::spliceQueue is an MPSC queue but only
accessed from the event loop. So it could be just changed to e.g. an
ArrayDeque. This PR instead reverts to using is as an MPSC queue to
avoid dispatching a task to the EL, as appears was the original
intention.
Modification
Change AbstractEpollStreamChannel::spliceQueue to be volatile and lazily
initialized via double-checked locking. Add tasks directly to the queue
from the public methods rather than possibly waking the EL just to
enqueue.
An alternative is just to change PlatformDependent.newMpscQueue() to new
ArrayDeque() and be done with it :)
Result
Less disruptive channel/fd-splicing.
Motivation:
Netty homepage(netty.io) serves both "http" and "https".
It's recommended to use https than http.
Modification:
I changed from "http://netty.io" to "https://netty.io"
Result:
No effects.
Motivation:
Compiling with -Werror,-Wuninitialized complains about the sockaddrs being uninitialized.
I believe this is because the init function netty_unix_socket_initSockaddr is in a
separate compilation unit. Since this code isn't on the criticial path, it's easy
to just memset the variables rather than suppress the warning.
Modification:
Always clear the sockaddrs, even if they will be initialized later.
Result:
Able to compile with warnings turned on
Motivation:
Some methods that either override others or are implemented as part of implementation an interface did miss the `@Override` annotation
Modifications:
Add missing `@Override`s
Result:
Code cleanup
Motivation:
c9aaa93d83 added the ability to specify an EventLoopTaskQueueFactory but did place it under MultithreadEventLoopGroup while not really belongs there.
Modifications:
Make EventLoopTaskQueueFactory a top-level interface
Result:
More logical code layout.
Motivation:
Sometimes it is desirable to be able to use a different Queue implementation for the EventLoop of a Channel. This is currently not possible without resort to reflection.
Modifications:
- Add a new constructor to Nio|Epoll|KQueueEventLoopGroup which allows to specify a factory which is used to create the task queue. This was the user can override the default implementation.
- Add test
Result:
Be able to change Queue that is used for the EventLoop.
Motivation:
We did not have support for enable / disable loopback mode in our native epoll transport and also missed the implemention to access the configured interface.
Modifications:
Add implementation and adjust test to cover it
Result:
More complete multicast support with native epoll transport
Motivation:
The wakeup logic in EpollEventLoop is overly complex
Modification:
* Simplify the race to wakeup the loop
* Dont let the event loop wake up itself (it's already awake!)
* Make event loop check if there are any more tasks after preparing to
sleep. There is small window where the non-eventloop writers can issue
eventfd writes here, but that is okay.
Result:
Cleaner wakeup logic.
Benchmarks:
```
BEFORE
Benchmark Mode Cnt Score Error Units
EpollSocketChannelBenchmark.executeMulti thrpt 20 408381.411 ± 2857.498 ops/s
EpollSocketChannelBenchmark.executeSingle thrpt 20 157022.360 ± 1240.573 ops/s
EpollSocketChannelBenchmark.pingPong thrpt 20 60571.704 ± 331.125 ops/s
Benchmark Mode Cnt Score Error Units
EpollSocketChannelBenchmark.executeMulti thrpt 20 440546.953 ± 1652.823 ops/s
EpollSocketChannelBenchmark.executeSingle thrpt 20 168114.751 ± 1176.609 ops/s
EpollSocketChannelBenchmark.pingPong thrpt 20 61231.878 ± 520.108 ops/s
```
Motivation:
We do not need to issue a read on timerfd and eventfd when the EventLoop wakes up if we register these as Edge-Triggered. This removes the overhead of 2 syscalls and so helps to reduce latency.
Modifications:
- Ensure we register the timerfd and eventfd with EPOLLET flag
- If eventfd_write fails with EAGAIN, call eventfd_read and try eventfd_write again as we only use it as wake-up mechanism.
Result:
Less syscalls and so reducing overhead.
Co-authored-by: Carl Mastrangelo <carl@carlmastrangelo.com>
Motivation:
When EpollDatagramChannel is created with an existing FileDescriptor we should detect the correct InternetProtocolFamily.
Modifications:
Obtain the InternetProtocolFamily from the given FD
Result:
Use correct InternetProtocolFamily when EpollDatagramChannel is created via existing FileDescriptor