Motivation
The AbstractEpollStreamChannel::spliceTo(FileDescriptor, ...) methods
take an offset parameter but this was effectively ignored due to what
looks like a typo in the corresponding JNI function impl. Instead it
would always use the file's own native offset.
Modification
- Fix typo in netty_epoll_native_splice0() and offset accounting in
AbstractEpollStreamChannel::SpliceFdTask.
- Modify unit test to include an invocation of the public spliceTo
method using non-zero offset.
Result
spliceTo FD methods work as expected when an offset is provided.
Motivation
I noticed this while looking at something else.
AbstractEpollStreamChannel::spliceQueue is an MPSC queue but only
accessed from the event loop. So it could be just changed to e.g. an
ArrayDeque. This PR instead reverts to using is as an MPSC queue to
avoid dispatching a task to the EL, as appears was the original
intention.
Modification
Change AbstractEpollStreamChannel::spliceQueue to be volatile and lazily
initialized via double-checked locking. Add tasks directly to the queue
from the public methods rather than possibly waking the EL just to
enqueue.
An alternative is just to change PlatformDependent.newMpscQueue() to new
ArrayDeque() and be done with it :)
Result
Less disruptive channel/fd-splicing.
Motivation:
Netty homepage(netty.io) serves both "http" and "https".
It's recommended to use https than http.
Modification:
I changed from "http://netty.io" to "https://netty.io"
Result:
No effects.
Motivation:
Compiling with -Werror,-Wuninitialized complains about the sockaddrs being uninitialized.
I believe this is because the init function netty_unix_socket_initSockaddr is in a
separate compilation unit. Since this code isn't on the criticial path, it's easy
to just memset the variables rather than suppress the warning.
Modification:
Always clear the sockaddrs, even if they will be initialized later.
Result:
Able to compile with warnings turned on
Motivation:
Some methods that either override others or are implemented as part of implementation an interface did miss the `@Override` annotation
Modifications:
Add missing `@Override`s
Result:
Code cleanup
Motivation:
We did not have support for enable / disable loopback mode in our native epoll transport and also missed the implemention to access the configured interface.
Modifications:
Add implementation and adjust test to cover it
Result:
More complete multicast support with native epoll transport
Motivation:
The wakeup logic in EpollEventLoop is overly complex
Modification:
* Simplify the race to wakeup the loop
* Dont let the event loop wake up itself (it's already awake!)
* Make event loop check if there are any more tasks after preparing to
sleep. There is small window where the non-eventloop writers can issue
eventfd writes here, but that is okay.
Result:
Cleaner wakeup logic.
Benchmarks:
```
BEFORE
Benchmark Mode Cnt Score Error Units
EpollSocketChannelBenchmark.executeMulti thrpt 20 408381.411 ± 2857.498 ops/s
EpollSocketChannelBenchmark.executeSingle thrpt 20 157022.360 ± 1240.573 ops/s
EpollSocketChannelBenchmark.pingPong thrpt 20 60571.704 ± 331.125 ops/s
Benchmark Mode Cnt Score Error Units
EpollSocketChannelBenchmark.executeMulti thrpt 20 440546.953 ± 1652.823 ops/s
EpollSocketChannelBenchmark.executeSingle thrpt 20 168114.751 ± 1176.609 ops/s
EpollSocketChannelBenchmark.pingPong thrpt 20 61231.878 ± 520.108 ops/s
```
Motivation:
We do not need to issue a read on timerfd and eventfd when the EventLoop wakes up if we register these as Edge-Triggered. This removes the overhead of 2 syscalls and so helps to reduce latency.
Modifications:
- Ensure we register the timerfd and eventfd with EPOLLET flag
- If eventfd_write fails with EAGAIN, call eventfd_read and try eventfd_write again as we only use it as wake-up mechanism.
Result:
Less syscalls and so reducing overhead.
Co-authored-by: Carl Mastrangelo <carl@carlmastrangelo.com>
Motivation:
When EpollDatagramChannel is created with an existing FileDescriptor we should detect the correct InternetProtocolFamily.
Modifications:
Obtain the InternetProtocolFamily from the given FD
Result:
Use correct InternetProtocolFamily when EpollDatagramChannel is created via existing FileDescriptor
Motivation:
Provide epoll/native multicast to support high load multicast users (we are using it for a high load telecomm app at my day job).
Modification:
Added support for source specific and any source multicast for epoll transport. Some caveats: no support for disabling loop back mode, retrieval of interface and block operation, all of which tend to be less frequently used.
Result:
Provides epoll transport multicast for common use cases.
Co-authored-by: Norman Maurer <norman_maurer@apple.com>
Motivation:
The current KQueueEventLoop implementation does not process concurrent domain socket channel registration/unregistration in the order they actual
happen since unregistration are delated by an event loop task scheduling. When a domain socket is closed, it's file descriptor might be reused
quickly and therefore trigger a new channel registration using the same descriptor.
Consequently the KQueueEventLoop#add(AbstractKQueueChannel) method will overwrite the current inactive channels having the same descriptor
and the delayed KQueueEventLoop#remove(AbstractKQueueChannel) will remove the active channel that replaced the inactive one.
As active channels are registered, events for this file descriptor won't be processed anymore and the channels will never be closed.
The same problem can also happen in EpollEventLoop. Beside this we also may never remove the AbstractEpollChannel from the internal map
when it is unregistered which will prevent it from be GC'ed
Modifications:
- Change logic of native KQueue and Epoll implementations to ensure we correctly handle the case of FD reuse
- Only try to update kevent / epoll if the Channel is still open (as otherwise it will be handled by kqueue / epoll itself)
- Correctly remove AbstractEpollChannel from internal map in all cases
- Make implementation of closeAll() consistent for Epoll and KQueueEventLoop
Result:
KQueue and Epoll native transports correctly handle FD reuse
Co-authored-by: Norman Maurer <norman_maurer@apple.com>
Motivation:
OOME is occurred by increasing suppressedExceptions because other libraries call Throwable#addSuppressed. As we have no control over what other libraries do we need to ensure this can not lead to OOME.
Modifications:
Only use static instances of the Exceptions if we can either dissable addSuppressed or we run on java6.
Result:
Not possible to OOME because of addSuppressed. Fixes https://github.com/netty/netty/issues/9151.
Motivation
These aren't needed, only one field from each class is used. It also showed as an ambiguous identifier compilation error in my IDE even though javac is obviously fine with it.
Modifications
Static-import explicit ChannelOption fields in EpollDomainSocketChannelConfig instead of using .* wildcard.
Result
Cleaner / more consistent code.
Motivation
These implementations delegate most of their methods to an existing Handle and previously extended RecvByteBufAllocator.DelegatingHandle. This was reverted in #6322 with the introduction of ExtendedHandle but it's not clear to me why it needed to be - the code looks a lot cleaner.
Modifications
Have (Epoll|KQueue)RecvByteAllocatorHandle extend DelegatingHandle again, while still implementing ExtendedHandle.
Result
Less code.
Motivation:
At the moment we throw a ChannelException if netty_epoll_linuxsocket_setTcpMd5Sig fails. This is inconsistent with other methods which throw a IOException.
Modifications:
Throw IOException
Result:
More correct and consistent exception usage in epoll transport
Motivation:
When netty_epoll_linuxsocket_setTcpMd5Sig fails to init the sockaddr we should throw an exception and not silently return.
Modifications:
Throw exception if init of sockaddr fails.
Result:
Correctly report back error to user.
…nterface, block or loopback-mode-disabled operations).
Motivation:
Provide epoll/native multicast to support high load multicast users (we are using it for a high load telecomm app at my day job).
Modification:
Added support for (ipv4 only) source specific and any source multicast for epoll transport. Some caveats (beyond no ipv6 support initially - there’s a bit of work to add in join and leave group specifically around SSM, as ipv6 uses different data structures for this): no support for disabling loop back mode, retrieval of interface and block operation, all of which tend to be less frequently used.
Result:
Provides epoll transport multicast for IPv4 for common use cases. Understand if you’d prefer to hold off until ipv6 is included but not sure when I’ll be able to get to that.
Motivation:
com.puppycrawl.tools checkstyle < 8.18 was reported to contain a possible security flaw. We should upgrade.
Modifications:
- Upgrade netty-build and checkstyle.
- Fix checkstyle errors
Result:
Fixes https://github.com/netty/netty/issues/8968.
Motivation:
Since DomainSocketChannel is a DuplexChannel, which be able to shutdown input or output individually on demands, but ALLOW_HALF_CLOSURE channel option has not been supported yet.
I thought this could be a missing feature of Unix domain socket, so here the PR for it.
Modifications:
1. Added allHalfClosure property both in EpollDomainSocketChannelConfig and KQueueDomainSocketChannelConfig,
2. Enabled isAllowHalfClosure method of native channel to support domain channel config,
3. Created EpollDomainSocketShutdownOutputByPeerTest and KQueueDomainSocketShutdownOutputByPeerTest to verify the change.
Result:
ALLOW_HALF_CLOSURE channel option can be set with DomainSocketChannel, and no more warning of Unknown channel option 'ALLOW_HALF_CLOSURE'.
Motivation:
`DefaultFileRegion.transferTo` will return 0 all the time when we request more data then the actual file size. This may result in a busy spin while processing the fileregion during writes.
Modifications:
- If we wrote 0 bytes check if the underlying file size is smaller then the requested count and if so throw an IOException
- Add DefaultFileRegionTest
- Add a test to the testsuite
Result:
Fixes https://github.com/netty/netty/issues/8868.
Motivation:
We can just use Objects.requireNonNull(...) as a replacement for ObjectUtil.checkNotNull(....)
Modifications:
- Use Objects.requireNonNull(...)
Result:
Less code to maintain.
Motivation:
When using a linux distribution that supports sendmmsg(...) we allocated enough direct memory per EpollEventLoop to be able to write IOV_MAX number of iovecs per message that can be written per sendmmsg.
The number of messages that can be written per sendmmsg(...) call is limited by UIO_MAX_IOV.
In practice this resulted in an allocation of 16MB direct memory per EpollEventLoop instance that stayed allocated until the EpollEventLoop was shutdown which happens as part of the shutdown of the enclosing EpollEVentLoopGroup.
This resulted in quite some heavy direct memory usage in practice even when in practice we have very slim changes to ever need all of the memory.
Modification:
Adjust NativeDatagramPacketArray to share one IovArray instance across all NativeDatagramPacket instances it holds. This limits the max number of iovecs we can write across all messages to IOV_MAX per sendmmsg(...) call.
This in practice will still be enough to allow us to write multiple messages with one syscall while keep the memory overhead to a minimum.
Result:
Smaller direct memory footprint per EpollEventLoop when using EpollDatagramChannel on distributions that support sendmmsg(...).
Fixes https://github.com/netty/netty/issues/8814
Motivation:
In the native code EpollDatagramChannel / KQueueDatagramChannel creates a DatagramSocketAddress object for each received UDP datagram even when in connected mode as it uses the recvfrom(...) / recvmsg(...) method. Creating these is quite heavy in terms of allocations as internally, char[], String, Inet4Address, InetAddressHolder, InetSocketAddressHolder, InetAddress[], byte[] objects are getting generated when constructing the object. When in connected mode we can just use regular read(...) calls which do not need to allocate all of these.
Modifications:
- When in connected mode use read(...) and NOT recvfrom(..) / readmsg(...) to reduce allocations when possible.
- Adjust tests to ensure read works as expected when in connected mode.
Result:
Less allocations and GC when using native datagram channels in connected mode. Fixes https://github.com/netty/netty/issues/8770.
Motivation:
We have a utility method to check for > 0 and >0 arguments. We should use it.
Modification:
use checkPositive/checkPositiveOrZero instead of if statement.
Result:
Re-use utility method.
Motivation:
We can use lambdas now as we use Java8.
Modification:
use lambda function for all package, #8751 only migrate transport package.
Result:
Code cleanup.
Motivation:
We need to update to a new checkstyle plugin to allow the usage of lambdas.
Modifications:
- Update to new plugin version.
- Fix checkstyle problems.
Result:
Be able to use checkstyle plugin which supports new Java syntax.
* Decouble EventLoop details from the IO handling for each transport to allow easy re-use of code and customization
Motiviation:
As today extending EventLoop implementations to add custom logic / metrics / instrumentations is only possible in a very limited way if at all. This is due the fact that most implementations are final or even package-private. That said even if these would be public there are the ability to do something useful with these is very limited as the IO processing and task processing are very tightly coupled. All of the mentioned things are a big pain point in netty 4.x and need improvement.
Modifications:
This changeset decoubled the IO processing logic from the task processing logic for the main transport (NIO, Epoll, KQueue) by introducing the concept of an IoHandler. The IoHandler itself is responsible to wait for IO readiness and process these IO events. The execution of the IoHandler itself is done by the SingleThreadEventLoop as part of its EventLoop processing. This allows to use the same EventLoopGroup (MultiThreadEventLoupGroup) for all the mentioned transports by just specify a different IoHandlerFactory during construction.
Beside this core API change this changeset also allows to easily extend SingleThreadEventExecutor / SingleThreadEventLoop to add custom logic to it which then can be reused by all the transports. The ideas are very similar to what is provided by ScheduledThreadPoolExecutor (that is part of the JDK). This allows for example things like:
* Adding instrumentation / metrics:
* how many Channels are registered on an SingleThreadEventLoop
* how many Channels were handled during the IO processing in an EventLoop run
* how many task were handled during the last EventLoop / EventExecutor run
* how many outstanding tasks we have
...
...
* Implementing custom strategies for choosing the next EventExecutor / EventLoop to use based on these metrics.
* Use different Promise / Future / ScheduledFuture implementations
* decorate Runnable / Callables when submitted to the EventExecutor / EventLoop
As a lot of functionalities are folded into the MultiThreadEventLoopGroup and SingleThreadEventLoopGroup this changeset also removes:
* AbstractEventLoop
* AbstractEventLoopGroup
* EventExecutorChooser
* EventExecutorChooserFactory
* DefaultEventLoopGroup
* DefaultEventExecutor
* DefaultEventExecutorGroup
Result:
Fixes https://github.com/netty/netty/issues/8514 .
Motivation:
We can use the diamond operator these days.
Modification:
Use diamond operator whenever possible.
Result:
More modern code and less boiler-plate.
Motivation:
While using Load Balancers or HA support is needed there are cases when UDP channel need to bind to IP Address which is not available on network interfaces locally.
Modification:
Modified EpollDatagramChannelConfig to allow IP_FREEBIND option
Result:
Fixes ##8727.
Motiviation:
Because of how we implemented the registration / deregistration of an EventLoop it was not possible to wrap an EventLoop implementation and use it with a Channel.
Modification:
- Introduce EventLoop.Unsafe which is responsible for the actual registration.
- Move validation of EventLoop / Channel combo to the EventLoop
- Add unit test that verifies that wrapping works
Result:
Be able to wrap an EventLoop and so add some extra functionality.
Motivation:
At the moment it’s possible to have a Channel in Netty that is not registered / assigned to an EventLoop until register(...) is called. This is suboptimal as if the Channel is not registered it is also not possible to do anything useful with a ChannelFuture that belongs to the Channel. We should think about if we should have the EventLoop as a constructor argument of a Channel and have the register / deregister method only have the effect of add a Channel to KQueue/Epoll/... It is also currently possible to deregister a Channel from one EventLoop and register it with another EventLoop. This operation defeats the threading model assumptions that are wide spread in Netty, and requires careful user level coordination to pull off without any concurrency issues. It is not a commonly used feature in practice, may be better handled by other means (e.g. client side load balancing), and therefore we propose removing this feature.
Modifications:
- Change all Channel implementations to require an EventLoop for construction ( + an EventLoopGroup for all ServerChannel implementations)
- Remove all register(...) methods from EventLoopGroup
- Add ChannelOutboundInvoker.register(...) which now basically means we want to register on the EventLoop for IO.
- Change ChannelUnsafe.register(...) to not take an EventLoop as parameter (as the EventLoop is supplied on custruction).
- Change ChannelFactory to take an EventLoop to create new Channels and introduce ServerChannelFactory which takes an EventLoop and one EventLoopGroup to create new ServerChannel instances.
- Add ServerChannel.childEventLoopGroup()
- Ensure all operations on the accepted Channel is done in the EventLoop of the Channel in ServerBootstrap
- Change unit tests for new behaviour
Result:
A Channel always has an EventLoop assigned which will never change during its life-time. This ensures we are always be able to call any operation on the Channel once constructed (unit the EventLoop is shutdown). This also simplifies the logic in DefaultChannelPipeline a lot as we can always call handlerAdded / handlerRemoved directly without the need to wait for register() to happen.
Also note that its still possible to deregister a Channel and register it again. It's just not possible anymore to move from one EventLoop to another (which was not really safe anyway).
Fixes https://github.com/netty/netty/issues/8513.
* Handling AUTO_READ should not be the responsibility of DefaultChannelPipeline but the Channel itself.
Motivation:
At the moment we do automatically call read() in the DefaultChannelPipeline when fireChannelReadComplete() / fireChannelActive() is called and the Channel is using auto read. This is nice in terms of sharing code but imho is not the responsibility of the ChannelPipeline implementation but the responsibility of the Channel implementation.
Modifications:
Move handing of auto read from DefaultChannelPipeline to Channel implementations.
Result:
More clear responsibiliy and not depending on implemention details of the ChannelPipeline.
Motivation:
epoll_wait should work in 4.1.30 like it did in 4.1.29.
Modifications:
Revert Integer.MAX_VALUE back to MAX_SCHEDULED_TIMERFD_NS (999,999,999).
Add unit test.
Result:
epoll_wait will no longer throw EINVAL.
Motivation:
Add an option (through a SelectStrategy return code) to have the Netty event loop thread to do busy-wait on the epoll.
The reason for this change is to avoid the context switch cost that comes when the event loop thread is blocked on the epoll_wait() call.
On average, the context switch has a penalty of ~13usec.
This benefits both:
The latency when reading from a socket
Scheduling tasks to be executed on the event loop thread.
The tradeoff, when enabling this feature, is that the event loop thread will be using 100% cpu, even when inactive.
Modification:
Added SelectStrategy option to return BUSY_WAIT
Epoll loop will do a epoll_wait() with no timeout
Use pause instruction to hint to processor that we're in a busy loop
Result:
When enabled, minimizes impact of context switch in the critical path
Motivation
The EpollChannelConfig (same for KQueues) and its subclasses repeatetly declare their own channel field which leads to a 3x repetition for each config instance. Given the fields are protected or package-private it's exposing the code code to "field hiding" bugs.
Modifications
Use the the existing protected channel field from the DefaultChannelConfig class and simply cast it when needed.
Result
Fixes#8331
Motivation:
The Epoll transport checks to see if there are any scheduled tasks
before entering epoll_wait, and resets the timerfd just before.
This causes an extra syscall to timerfd_settime before doing any
actual work. When scheduled tasks aren't added frequently, or
tasks are added with later deadlines, this is unnecessary.
Modification:
Check the *deadline* of the peeked task in EpollEventLoop, rather
than the *delay*. If it hasn't changed since last time, don't
re-arm the timer
Result:
About 2us faster on gRPC RTT 50pct latency benchmarks.
Before (2 runs for 5 minutes, 1 minute of warmup):
```
50.0%ile Latency (in nanos): 64267
90.0%ile Latency (in nanos): 72851
95.0%ile Latency (in nanos): 78903
99.0%ile Latency (in nanos): 92327
99.9%ile Latency (in nanos): 119691
100.0%ile Latency (in nanos): 13347327
QPS: 14933
50.0%ile Latency (in nanos): 63907
90.0%ile Latency (in nanos): 73055
95.0%ile Latency (in nanos): 79443
99.0%ile Latency (in nanos): 93739
99.9%ile Latency (in nanos): 123583
100.0%ile Latency (in nanos): 14028287
QPS: 14936
```
After:
```
50.0%ile Latency (in nanos): 62123
90.0%ile Latency (in nanos): 70795
95.0%ile Latency (in nanos): 76895
99.0%ile Latency (in nanos): 90887
99.9%ile Latency (in nanos): 117819
100.0%ile Latency (in nanos): 14126591
QPS: 15387
50.0%ile Latency (in nanos): 61021
90.0%ile Latency (in nanos): 70311
95.0%ile Latency (in nanos): 76687
99.0%ile Latency (in nanos): 90887
99.9%ile Latency (in nanos): 119527
100.0%ile Latency (in nanos): 6351615
QPS: 15571
```
Motivation:
When using Epoll based transport, allow applications to configure SO_BUSY_POLL socket option:
SO_BUSY_POLL (since Linux 3.11)
Sets the approximate time in microseconds to busy poll on a
blocking receive when there is no data. Increasing this value
requires CAP_NET_ADMIN. The default for this option is con‐
trolled by the /proc/sys/net/core/busy_read file.
The value in the /proc/sys/net/core/busy_poll file determines
how long select(2) and poll(2) will busy poll when they oper‐
ate on sockets with SO_BUSY_POLL set and no events to report
are found.
In both cases, busy polling will only be done when the socket
last received data from a network device that supports this
option.
While busy polling may improve latency of some applications,
care must be taken when using it since this will increase both
CPU utilization and power usage.
Modification:
Added SO_BUSY_POLL socket option
Result:
Able to configure SO_BUSY_POLL from Netty
Motivation:
We should ensure we call *UnLoad when we detect an error during calling *OnLoad and previous *OnLoad calls were succesfull.
Modifications:
Correctly call *UnLoad when needed.
Result:
More correct code and no leaks when an error happens during loading the native lib.
* Allow to use native transports when sun.misc.Unsafe is not present on the system
Motivation:
We should be able to use the native transports (epoll / kqueue) even when sun.misc.Unsafe is not present on the system. This is especially important as Java11 will be released soon and does not allow access to it by default.
Modifications:
- Correctly disable usage of sun.misc.Unsafe when -PnoUnsafe is used while running the build
- Correctly increment metric when UnpooledDirectByteBuf is allocated. This was uncovered once -PnoUnsafe usage was fixed.
- Implement fallbacks in all our native transport code for when sun.misc.Unsafe is not present.
Result:
Fixes https://github.com/netty/netty/issues/8229.
Motivation:
We should support to load multiple shaded versions of the same netty artifact as netty is often used in multiple dependencies.
This is related to https://github.com/netty/netty/issues/7272.
Modifications:
- Use -fvisibility=hidden when compiling and use JNIEXPORT for things we really want to have exported
- Ensure fields are declared as static so these are not exported
- Adjust testsuite-shading to use install_name_tool on MacOS to change the id of the lib. Otherwise the wrong may be used.
Result:
Be able to use multiple shaded versions of the same netty artifact.
Motivation:
Avoid unnecessary native memory allocation if UDP / TCP isn't being
used.
Modifications:
Create the reused NativeDatagramPacketArray and IovArray upon first use
instead of EpollEventLoop construction.
Also correct related comment in NativeDatagramPacketArray.
Result:
Reduced native memory use when using epoll in many cases
Motivation:
We can store the NativeDatagramPacketArray directly in the EpollEventLoop. This removes the need of using FastThreadLocal.
Modifications:
- Store NativeDatagramPacketArray directly in the EpollEventLoop (just as we do with IovArray as well).
Result:
Less FastThreadLocal usage and more consistent code.
Motivation:
Epoll and Kqueue channels have internal state which forces
a single read operation after channel construction. This
violates the Channel#read() interface which indicates that
data shouldn't be delivered until this method is called.
The behavior is also inconsistent with the NIO transport.
Modifications:
- Epoll and Kqueue shouldn't unconditionally read upon
initialization, and instead should rely upon Channel#read()
or auto_read.
Result:
Epoll and Kqueue are more consistent with NIO.
Motivation:
We should allow to schedule tasks with a delay up to Long.MAX_VALUE as we did pre 4.1.25.Final.
Modifications:
Just ensure we not overflow and put the correct max limits in place when schedule a timer. At worse we will get a wakeup to early and then schedule a new timeout.
Result:
Fixes https://github.com/netty/netty/issues/7970.
* Read until all data is consumed when EOF is detected even if readPending is false and auto-read is disabled.
Motivation:
We should better always notify the user of EOF even if the user did not request any data as otherwise we may never be notified when the remote peer closes the connection. This should be ok as the amount of extra data we may read and so fire through the pipeline is limited by SO_RECVBUF.
Modifications:
- Always drain the socket when EOF is detected.
- Add testcase
Result:
No risk for the user to be not notified of EOF.