Motivation
There is a race condition when shutting down the event loop where the
eventFd write performed in the wakeup() method may actually hit a
different fd if the eventFd is closed and its number reassigned in the
meantime.
This was already encountered and addressed in the epoll case.
Modifications
Similar to what's done for epoll, in IOUringEventLoop:
- Reinstate pendingWakeup flag which tracks when there is a wakeup
pending (CAS of nextWakeupNanos performed by other thread in the
wakeup() method)
- Add logic to the cleanup() method to wait for corresponding READ CQE
before closing the eventFd
- Remove unused fields from IOUringCompletionQueue (cleanup)
Result
No event loop shutdown race
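The wakeup tracking described above can be illustrated with a minimal, single-class sketch. It is not the IOUringEventLoop code: the class name, the method names other than wakeup(), and the AWAKE / NONE sentinel values are assumptions made for illustration, and the real eventfd write and READ CQE are only represented by comments.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the event loop state; not the real IOUringEventLoop.
final class WakeupTrackingSketch {
    static final long AWAKE = -1L;            // assumed sentinel: loop is not blocked
    static final long NONE = Long.MAX_VALUE;  // assumed sentinel: blocked without a deadline

    private final AtomicLong nextWakeupNanos = new AtomicLong(AWAKE);
    private boolean pendingWakeup;            // only touched by the loop thread

    // Loop thread: called right before block-waiting for completions.
    void prepareToBlock() {
        nextWakeupNanos.set(NONE);
    }

    // Other thread: returns true if an eventfd write would be issued.
    boolean wakeup() {
        if (nextWakeupNanos.getAndSet(AWAKE) != AWAKE) {
            // Real code would write to the eventfd here.
            return true;
        }
        return false;
    }

    // Loop thread: called after the blocking wait returns.
    void afterBlock() {
        // If another thread already swapped in AWAKE, its eventfd write is
        // in flight (or about to be), so a READ completion must still arrive.
        if (nextWakeupNanos.get() == AWAKE || nextWakeupNanos.getAndSet(AWAKE) == AWAKE) {
            pendingWakeup = true;
        }
    }

    // Loop thread: the READ completion for the eventfd was processed.
    void onEventFdRead() {
        pendingWakeup = false;
    }

    // Loop thread: cleanup() may close the eventfd only when this is true.
    boolean safeToCloseEventFd() {
        return !pendingWakeup;
    }
}
```

In this sketch, cleanup() would keep processing completions until safeToCloseEventFd() returns true, so a late eventfd write can never land on a reassigned fd.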
Motivation:
liburing ships its own io_uring headers so it's possible to build on older
machines as well. We should do the same.
Modifications:
- Include the header files so we do not depend on the kernel version
- Fix license files and header for attribution
Result:
Easier to build
Motivation:
We can also support UDP / Datagram based on io_uring, so we should do it
for maximal performance
Modifications:
- Add IOUringDatagramChannel
- Add tests based on our transport testsuite for it
Result:
UDP / Datagram is supported via io_uring as well now
Motivation:
IOUringCompletionQueue used 2 spaces but we use 4 spaces in netty.
Besides this, there were no javadocs
Modifications:
- Use 4 spaces
- Add javadocs
- Remove public from method signatures
Result:
Code cleanup
Motivation:
We always encoded the rwflags into user data, which only makes sense for
POLL* at the moment. We should decouple this and so allow storing other
things in the user data for other ops.
Modifications:
Allow explicitly defining what to store in the user data and so be more
flexible.
Result:
More flexible usage
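As an illustration of storing an explicit value in the 64-bit user data, the following sketch packs an fd, an op code and a 16-bit payload into one long. The bit layout and all names are assumptions chosen for the example; the commit does not state the actual layout.

```java
// Hypothetical layout: bits 0-31 fd, bits 32-39 op, bits 48-63 caller-defined data.
final class UserDataSketch {
    private UserDataSketch() { }

    static long encode(int fd, byte op, short data) {
        return ((long) data << 48) | ((op & 0xFFL) << 32) | (fd & 0xFFFFFFFFL);
    }

    static int fd(long userData) {
        return (int) userData;            // low 32 bits
    }

    static byte op(long userData) {
        return (byte) (userData >>> 32);  // next 8 bits
    }

    static short data(long userData) {
        return (short) (userData >>> 48); // top 16 bits
    }
}
```

With such a layout, POLL* ops can still place their flags in the data field, while other ops are free to store something else there.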
Motivation:
Creating exceptions is expensive so we should only do so if really needed. This is basically a port of #10595 for io_uring.
Modifications:
Only create the ConnectTimeoutException if we really need it.
Result:
Less overhead
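A minimal sketch of the idea, using CompletableFuture as a stand-in for the connect promise. The class and method names are hypothetical, and the Supplier only exists to make the lazy construction explicit.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical helper; the real code works on the channel's connect promise.
final class LazyTimeoutSketch {
    private LazyTimeoutSketch() { }

    // Returns true if the future was failed by this call.
    static boolean failOnTimeout(CompletableFuture<Void> connectFuture,
                                 Supplier<? extends Throwable> causeFactory) {
        if (connectFuture.isDone()) {
            // Already connected, failed or cancelled: skip building the
            // exception (and its stack trace) entirely.
            return false;
        }
        return connectFuture.completeExceptionally(causeFactory.get());
    }
}
```

The timeout task would call failOnTimeout(...) with a factory that creates the timeout exception, so the cost is only paid when the connect attempt is still pending.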
Motivation:
While the current code works just fine, it is better to look up the
offsets of the various struct members on init and use these. This way
we are sure the code is portable and correct.
Modifications:
Look up the various offsets on init and then use them when reading from
and writing to the structs
Result:
More robust and portable code
Motivation:
We want to keep the amount of JNI as small as possible to reduce the
performance overhead, now that we no longer need it for syscalls.
Modifications:
Write / read sockaddr_in / sockaddr_in6 via PlatformDependent and so
eliminate the need for JNI
Result:
Less JNI and so less overhead for crossing the JNI boundary.
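To show what writing a sockaddr_in from Java involves, here is a sketch that uses a direct ByteBuffer in place of PlatformDependent. The fixed offsets (family at 0, port at 2, address at 4, 16 bytes total) and AF_INET = 2 match the usual Linux layout but are hard-coded assumptions here; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch; offsets are assumed rather than looked up at init.
final class SockaddrInSketch {
    static final short AF_INET = 2;
    static final int SIZE = 16;
    static final int FAMILY_OFFSET = 0;
    static final int PORT_OFFSET = 2;
    static final int ADDR_OFFSET = 4;

    private SockaddrInSketch() { }

    static ByteBuffer write(byte[] ipv4, int port) {
        if (ipv4.length != 4) {
            throw new IllegalArgumentException("expected 4 address bytes");
        }
        ByteBuffer buf = ByteBuffer.allocateDirect(SIZE); // zero-filled, covers the padding
        // sin_family is stored in host byte order.
        buf.order(ByteOrder.nativeOrder()).putShort(FAMILY_OFFSET, AF_INET);
        // sin_port is stored in network byte order (big endian).
        buf.order(ByteOrder.BIG_ENDIAN).putShort(PORT_OFFSET, (short) port);
        // sin_addr holds the address bytes in network order.
        for (int i = 0; i < 4; i++) {
            buf.put(ADDR_OFFSET + i, ipv4[i]);
        }
        return buf;
    }

    static int readPort(ByteBuffer buf) {
        return buf.duplicate().order(ByteOrder.BIG_ENDIAN).getShort(PORT_OFFSET) & 0xFFFF;
    }
}
```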
Motivation:
Just some cleanup needed in general
Modifications:
- Make methods package-private when the class is package-private
- Use spaces and not tabs everywhere
- Fix eventfd_write usage as the implementation was only needed like
this for EPOLL when used with edge-triggered
- Correctly handle EINTR
Result:
Cleaner code
Motivation:
There may be situations when the user doesn't want to use IOSQE_ASYNC so we
should allow configuring this
Modifications:
Make it configurable whether IOSQE_ASYNC should be used
Result:
More flexible configuration
Motivation:
Using classes which are not provided by the JDK itself in JNI is
problematic when shading may be used by users of the library. It also
often makes the maintenance of the code more complicated.
Modifications:
- Only use classes which are provided by the JDK in the JNI code
- Cleanup
Result:
Easier to maintain code
Motivation:
io_uring supports the same way of obtaining the remoteAddress as
accept4(...) does. We should use it
Modifications:
Obtain the remoteAddress of the accepted socket as part of the accept
operation
Result:
Ensure we always see the correct remoteAddress when accepting sockets
Motivation:
Calling methods from JNI is more expensive than accessing fields, and it is cleaner not to use the getter methods
Modifications:
- Delete getter methods
- Access these fields directly
Result:
More efficient
Motivation
IOUringEventLoop can be streamlined to further reduce io_uring_enter
calls
Modification
- Don't prepare to block-wait until all available work is exhausted
- Combine submission with GETEVENTS
Result
Hopefully faster
see any stales
Motivation:
When a POLLRDHUP was received we need to continue draining the input
until EOF is detected as otherwise we may see stales when the user never
tries to read again.
Modifications:
- Correctly handle reading when POLLRDHUP was seen
- Remove @Ignore from testcases related to POLLRDHUP handling
Result:
Correctly drain input when POLLRDHUP was received in all cases
Motivation:
We should allow specifying the ring size when constructing the
IOUringEventLoopGroup and also be consistent with the rest of the
EventLoopGroup implementations
Modifications:
- Cleanup constructors
- Make ringSize configurable
Result:
Cleaner code and more flexible in terms of configuration
Motivation:
We should throw a JVM runtime exception when io_uring creation fails to avoid a NullPointerException
Modifications:
- Add error handling for creating the ring fd and for mmap of the io_uring ring buffers
- Some cleanups
Result:
Better error handling
Motivation
SQE handling can be simplified in terms of code and operations
performed
Modifications
- Zero SQE array up front - no need to set never-used fields each time
- Fill SQ array up front with corresponding indices - no need to set
each time since they are 1-1 with SQE array entries
- Keep local head and tail vars and don't track separate sqe array
head/tail
- Allocate memory for timespec directly (no need for ByteBuffer)
- Avoid some unnecessary casts / type conversions (no need to convert
uints to longs)
Result
Fewer operations, less code
Motivation
If we make eventfd blocking then read can take the place of poll+read
Modification
Make eventfd blocking, use READ instead of POLLIN, allocating a static
64bit buffer to read into
Result
Fewer kernel roundtrips for event loop wakeups
easier to test.
Motivation:
We should move the IovArray related code to an extra class so it's easier
to test
Modifications:
- Move into extra class
- Add dedicated test
Result:
Cleanup
use it for clearing the IovArrays
Motivation:
IOUringSubmissionQueue may call submit() internally when there is no
space left in the buffer. Once this is done we can reuse, for example,
IovArrays. Because of this it's useful to be able to specify a
callback that is executed after submission
Modifications:
- Allow specifying a Runnable that is called once submission is
complete
- Use this callback to clear the IovArrays
Result:
IovArrays are automatically cleared on each submit call.
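The callback mechanism can be sketched without any native code. The class below is a hypothetical model, not IOUringSubmissionQueue: it only counts pending entries, and the place where io_uring_enter(...) would be called is marked with a comment.

```java
// Hypothetical model of a submission queue with a post-submit callback.
final class SubmissionQueueSketch {
    private final int ringSize;
    private final Runnable submissionCallback;
    private int pending;

    SubmissionQueueSketch(int ringSize, Runnable submissionCallback) {
        this.ringSize = ringSize;
        this.submissionCallback = submissionCallback;
    }

    // Queues one entry; submits first if the ring is full.
    // Returns true if an internal submit happened.
    boolean add() {
        boolean submitted = false;
        if (pending == ringSize) {
            submit();
            submitted = true;
        }
        pending++;
        return submitted;
    }

    // Hands all pending entries to the kernel and returns how many there were.
    int submit() {
        int submittedEntries = pending;
        pending = 0;
        // Real code would call io_uring_enter(...) here.
        // After submission, per-submit resources (for example IovArrays)
        // can be reused, so let the owner clear them now.
        submissionCallback.run();
        return submittedEntries;
    }

    int pending() {
        return pending;
    }
}
```

Passing something like iovArrays::clear as the Runnable (a hypothetical name) would give the behaviour described in the Result line: the arrays are cleared on every submit, including the internal ones.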
Motivation:
We should only keep on reading if the fd is still open, otherwise we
will produce a confusing exception
Modifications:
Check if the fd is still open before scheduling the read.
Result:
Don't produce a confusing exception when the fd was closed during a read
loop.
Motivation:
We need to be careful to only execute close(...) once the write
operation completes, as otherwise we may close the underlying socket too
early, while writes are still pending
Modifications:
Track whether the close needs to be delayed and, if so, execute it
once the write completes
Result:
No more test failures
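The delayed close can be expressed as a small state machine. This is an illustrative sketch with hypothetical names; the real channel code tracks more state than shown here.

```java
// Hypothetical state holder; all methods are meant to run on the event loop thread.
final class DelayedCloseSketch {
    private boolean writeInFlight;
    private boolean closePending;
    private boolean closed;

    // A write was submitted and its completion has not arrived yet.
    void writeScheduled() {
        writeInFlight = true;
    }

    // Close was requested: defer it while a write is still outstanding.
    void close() {
        if (writeInFlight) {
            closePending = true;
        } else {
            closed = true; // real code would close the socket here
        }
    }

    // The completion for the outstanding write arrived.
    void writeComplete() {
        writeInFlight = false;
        if (closePending) {
            closePending = false;
            closed = true; // the deferred close runs now
        }
    }

    boolean isClosed() {
        return closed;
    }
}
```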
Motivation:
It is possible that io_uring_enter(...) fails with EINTR. In this case
we should just retry the operation
Modifications:
Retry when EINTR was detected
Result:
More correct use of io_uring_enter(...)
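The retry itself is a short loop. In this sketch the syscall is modelled as an IntSupplier that returns a negative errno on failure, which is an assumption about the calling convention; EINTR is 4 on Linux, and the class and method names are hypothetical.

```java
import java.util.function.IntSupplier;

// Hypothetical helper; the supplier stands in for the native io_uring_enter(...) call.
final class EintrRetrySketch {
    static final int EINTR = 4; // Linux errno value for an interrupted system call

    private EintrRetrySketch() { }

    // Repeats the call until it returns something other than -EINTR.
    static int retryOnEintr(IntSupplier syscall) {
        int res;
        do {
            res = syscall.getAsInt();
        } while (res == -EINTR);
        return res;
    }
}
```

Any other negative result is returned unchanged, so real errors still reach the caller.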
Motivation:
At the moment our CI cannot build and run the native bits for the io_uring transport so we should just not compile them for now. The Java classes themselves should still be compiled though
Modifications:
Add explicit profile to compile native bits of iouring
Result:
CI passes with iouring transport
Motivation:
Due to an incorrect for loop we could end up with an AssertionError (this is
sometimes triggered during a testsuite run)
Modifications:
Fix for loop that calls IovArray.clear()
Result:
No more AssertionError
Motivation:
How we managed the memory for writev was quite wasteful and could
produce a lot of memory overhead. We can keep it simple by using
an array of IovArrays. Once these are full we can just submit and clear them, as at this
point the kernel has taken a copy and it's safe to reuse them
Modifications:
Use an array of IovArrays and submit once it is full.
Result:
Less memory overhead and less code duplication
IORING_OP_WRITEV
Motivation:
The bug related to IOSQE_ASYNC and IORING_OP_WRITEV was fixed so no need
to have the workaround
Modifications:
Remove workaround
Result:
Use IOSQE_ASYNC all the time
writes
Motivation:
We need to carefully manage state in terms of writing to guard against
reentrancy problems that could lead to corrupt state in the
ChannelOutboundBuffer
Modifications:
Only reset the flag once removeBytes(...) was called
Result:
No more reentrancy bug related to writes.
Motivation:
There is currently a bug in the kernel that causes WRITEV to sometimes fail
when IOSQE_ASYNC is enabled
Modifications:
Don't use IOSQE_ASYNC for WRITEV for now
Result:
Tests pass
Motivation:
It is better to use JNI to look up constants so we are sure we do not mess
things up
Modifications:
Use JNI calls to look up constants once
Result:
Less error prone code
Motivation:
At least in the throughput benchmarks, IOSQE_ASYNC has been shown to
give a large performance improvement. Let's enable it by default for
now and maybe make it configurable in the future
Modifications:
Use IOSQE_ASYNC
Result:
Better performance
Motivation:
We should submit multiple IO ops at once to reduce the syscall overhead.
Modifications:
- Submit multiple IO ops in batches
- Adjust default ringsize
Result:
Much better performance
Motivation:
We should only reset the RecvByteBufAllocator.Handle when a new "read
loop" starts. Otherwise the handle will not be able to correctly limit
reads.
Modifications:
- Move reset(...) call into pollIn(...)
- Remove all @Ignore
Result:
The whole testsuite passes
Motivation:
Due to a bug, SO_BACKLOG was not supported via ChannelOption when using io_uring. Be
Modification:
- Add SO_BACKLOG to the supported ChannelOptions.
- Merge IOUringServerChannelConfig into IOUringServerSocketChannelConfig
Result:
SO_BACKLOG is supported
Motivation:
We should move initAddress to the LinuxSocket JNI code as it is only used there
Modifications:
- Cleanup
- Move initAddress to the LinuxSocket JNI code
Result:
Cleaner code