Motivation:
It would be useful to support the Java `Map` interface in our primitive maps.
Modifications:
Renamed current methods to "pXXX", where p is short for "primitive". Made the template for all primitive maps extend the appropriate Map interface.
Result:
Fixes#3970
Motivation:
We missed to register for EPOLLRDHUP events when construct the EpollSocketChannel from an existing FileDescriptor. This could cause to miss connection-resets.
Modifications:
Add Native.EPOLLRDHUP to the events we are interested in.
Result:
Connection-resets are detected correctly.
Motivation:
Some glibc/kernel versions will trigger an EPOLLERR event to notify
about failed connect and not an EPOLLOUT. Also EPOLLERR may be triggered
when a connection is broke.
Modification:
React on EPOLLERR like if an EPOLLOUT / EPOLLIN was received, this will work in
all cases as we handle errors in EPOLLOUT / EPOLLIN anyway.
Result:
Correctly detect errors.
Motiviation:
Linux provides the TCP_NOTSENT_LOWAT socket option. This can be used to control how much unsent data is queued in the tcp kernel buffers. This can be important when application level protocols (SPDY, HTTP/2) have their own priority mechanism and don't want data queued in the kernel.
Modifications:
- The epoll module will have an additional socket option TCP_NOTSENT_LOWAT
- There will be JNI methods to control the underlying linux socket option mechanism
Result:
Linux EPOLL module exposes the TCP_NOTSENT_LOWAT socket option.
Motivation:
Due a bug we not correctly handled connection refused errors and so failed the connect promise with the wrong exception.
Beside this we some times even triggered fireChannelActive() which is not correct.
Modifications:
- Add testcase
- correctly detect connect errors
Result:
Correct and consistent handling.
Motivation:
When trying to write more then Integer.MAX_VALUE / SSIZE_MAX via writev(...) the OS may return EINVAL depending on the kernel or the actual OS (bsd / osx always return EINVAL). This will trigger an IOException.
Modifications:
Never try to write more then Integer.MAX_VALUE / SSIZE_MAX when using writev.
Result:
No more IOException when write more data then Integer.MAX_VALUE / SSIZE_MAX via writev.
Motivation:
When EPOLLRDHUP is received we need to try to read at least one time to ensure
that we read all pending data from the socket. Otherwise we may loose data.
Modifications:
- Ensure we read all data from socket
- Ensure file descriptor is closed on doClose() even if doDeregister() throws an Exception.
- Only handle either EPOLLRDHUP or EPOLLIN as only one is needed to detect connection reset.
Result:
No more data loss on connection reset.
Motivation:
When using epoll_ctl we should respect the return value and do the right thing depending on it.
Modifications:
Adjust java and native code to respect epoll_ctl return values.
Result:
Correct and cleaner code.
Motivation:
Linux supports splice(...) to transfer data from one filedescriptor to another without
pass data through the user-space. This allows to write high-performant proxy code or to stream
stuff from the socket directly the the filesystem.
Modification:
Add AbstractEpollStreamChannel.spliceTo(...) method to support splice(...) system call
Result:
Splice is now supported when using the native linux transport.
Conflicts:
transport-native-epoll/src/main/java/io/netty/channel/epoll/AbstractEpollStreamChannel.java
Motivation:
Because of a bug we missed to fail the connect future when doClose() is called. This can lead to a future which is never notified and so may lead to deadlocks in user-programs.
Modifications:
Correctly fail the connect future when doClose() is called and the connection was not established yet.
Result:
Connect future is always notified.
Motivation:
Each different *ChannelOption did extend ChannelOption in 4.0, which we changed in 4.1. This is a breaking change in terms of the API so we need to ensure we keep the old hierarchy.
Modifications:
- Let all *ChannelOption extend ChannelOption
- Add back constructor and mark it as @deprecated
Result:
No API breakage between 4.0 and 4.1
Motivation:
As we missed to correctly handle EPOLLRDHUP we produce an IOException which is unnessary. This leads
to have exceptionCaught(...) methods called.
Modifications:
When EPOLLRDHUP was received just close the socket and fail all pending writes.
Result:
Correctly handle of EPOLLRDHUP and so not miss-leading exceptions.
Motivation:
Due a a regression that was introduced by b898bdd we failed to set the localAddress if the connect did not success directly.
Modifications:
Correct set localAddress in doConnect(...)
Result:
Be able to get the localAddress in all cases.
Motivation:
During 6b941e9bdbc1b1a9090c280bc6c44903ff7c7b67 I introduced a regression that could cause an IllegalStateException.
A non-proper fix was commited as part of #3443. This commit add a proper fix.
Modifications:
Remove FileDescriptor.INVALID and add FileDescriptor.isOpen() as replacement. Once FileDescriptor.close() is called isOpen() will return false.
Result:
No more IllegalStateException caused by a close channel.
Motivation:
EpollDragramChannel never calls fireChannelActive after connect() which is a bug.
Modifications:
Correctly call fireChannelActive if needed
Result:
Correct behaviour
Motivation:
Because of a regression sometimes accept could produce an IllegalArgumentException
Modifications:
Correctly respect offset when decode port and scope id.
Result:
No more IllegalArgumentException
Motivation:
This is a regression that was introduced as part of 6b941e9bdbc1b1a9090c280bc6c44903ff7c7b67. The regression could produce an "infinity" triggering of IllegalStateException if a channel goes inactive while process the events for it.
Modifications:
Correctly check if the channel is still active before trigger the callbacks.
Result:
No more IllegalStateException
Motivation:
There is a small race in the native transport where an accept(...) may success but a later try to obtain the remote address from the fd may fail is the fd is already closed.
Modifications:
Let accept(...) directly set the remote address.
Result:
No more race possible.
Motivation:
When epoll LT is used and autoRead == false when entering epollIn() we need to return without reading any data.
Modifications:
Correctly respect autoRead == false if using epoll LT.
Result:
Consistent and correct behaviour.
Motivation:
In the native transport we should throw a pre-instanced IOException on connection reset while reading.
Modifications:
Correctly throw pre-instanced IOException when ECONNRESET is received
Result:
Less overhead on connection reset
Motivation:
As we plan to have other native transports soon (like a kqueue transport) we should move unix classes/interfaces out of the epoll package so we
introduce other implementations without breaking stuff before the next stable release.
Modifications:
Create a new io.netty.channel.unix package and move stuff over there.
Result:
Possible to introduce other native impls beside epoll.
Motivation:
Sometimes it's useful to be able to create a Epoll*Channel from an existing file descriptor. This is especially helpful if you integrade some c/jni code.
Modifications:
- Add extra constructor to Epoll*Channel implementations that take a FileDescriptor as an argument
- Make Rename EpollFileDescriptor to NativeFileDescriptor and make it public
- Also ensure we obtain the correct remote/local address when create a Channel from a FileDescriptor
Result:
It's now possible to create a FileDescriptor and instance a Epoll*Channel via it.
Motivation:
If SO_LINGER is used shutdownOutput() and close() syscalls will block until either all data was send or until the timeout exceed. This is a problem when we try to execute them on the EventLoop as this means the EventLoop may be blocked and so can not process any other I/O.
Modifications:
- Add AbstractUnsafe.closeExecutor() which returns null by default and use this Executor for close if not null.
- Override the closeExecutor() in NioSocketChannel and EpollSocketChannel and return GlobalEventExecutor.INSTANCE if getSoLinger() > 0
- use closeExecutor() in shutdownInput(...) in NioSocketChannel and EpollSocketChannel
Result:
No more blocking of the EventLoop if SO_LINGER is used and shutdownOutput() or close() is called.
Motivation:
The writeSpinCount was ignored in the epoll transport and it just kept on trying writing. This could cause unnessary cpu spinning if a slow remote peer was reading the data very very slow.
Modification:
- Correctly take writeSpinCount into account when writing.
Result:
Less cpu spinning when writing to a slow remote peer.
Motivation:
Fix regression introduced by 585ce1593fdccc5a8d868a96c7643e0d63b1e21b, which missed to set EPOLLRDHUP for all stream channels.
Modifications:
Correctly set EPOLLRDHUP for all stream channels in the AbstractEpollStreamChannel constructor.
Result:
No more test failures in EpollDomain*Channel tests.
Motivation:
Before we used a long[] to store the ready events, this had a few problems and limitations:
- An extra loop was needed to translate between epoll_event and our long
- JNI may need to do extra memory copy if the JVM not supports pinning
- More branches
Modifications:
- Introduce a EpollEventArray which allows to directly write in a struct epoll_event* and pass it to epoll_wait.
Result:
Better speed when using native transport, as shown in the benchmark.
Before:
[xxx@xxx wrk]$ ./wrk -H 'Connection: keep-alive' -d 120 -c 256 -t 16 -s scripts/pipeline-many.lua http://xxx:8080/plaintext
Running 2m test @ http://xxx:8080/plaintext
16 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 14.56ms 8.64ms 117.15ms 80.58%
Req/Sec 286.17k 38.71k 421.48k 68.17%
546324329 requests in 2.00m, 73.78GB read
Requests/sec: 4553438.39
Transfer/sec: 629.66MB
After:
[xxx@xxx wrk]$ ./wrk -H 'Connection: keep-alive' -d 120 -c 256 -t 16 -s scripts/pipeline-many.lua http://xxx:8080/plaintext
Running 2m test @ http://xxx:8080/plaintext
16 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 14.12ms 8.69ms 100.40ms 83.08%
Req/Sec 294.79k 40.23k 472.70k 66.75%
555997226 requests in 2.00m, 75.08GB read
Requests/sec: 4634343.40
Transfer/sec: 640.85MB
Motivation:
Netty uses edge-triggered epoll by default for performance reasons. The downside here is that a messagesPerRead limit can not be enforced correctly, as we need to consume everything from the channel when notified.
Modification:
- Allow to switch epoll modes before channel is registered
- Some refactoring to share more code
Result:
It's now possible to switch epoll mode.
Motiviation:
When using domain sockets on linux it is supported to recv and send file descriptors. This can be used to pass around for example sockets.
Modifications:
- Add support for recv and send file descriptors when using EpollDomainSocketChannel.
- Allow to obtain the file descriptor for an Epoll*Channel so it can be send via domain sockets.
Result:
recv and send of file descriptors is supported now.
Motivation:
Using Unix Domain Sockets can be very useful when communication should take place on the same host and has less overhead then using loopback. We should support this with the native epoll transport.
Modifications:
- Add support for Unix Domain Sockets.
- Adjust testsuite to be able to reuse tests.
Result:
Unix Domain Sockets are now support when using native epoll transport.
Motivation:
At the moment the max number of events that can be handled per epoll wakup was set during construction.
Modifications:
- Automatically increase the max number of events to handle
Result:
Better performance when a lot of events need to be handled without adjusting the code.
Motivation:
The current way how the guard against overflow when generating the nextId() is pretty slow once an overflow happened.
Modifications:
Once a possible overflow is detected all ids used by the EpollEventLoop are scrubed and re-assigned to the registered Channels. This way we only need to do extra work each time an overflow is detected.
Result:
More consistent performance even after the first overflow was detected.
Motivation:
On Linux, you can gather various metrics using getsockopt(..., TCP_INFO,
...).
Modifications:
Add EpollSocketChannel.tcpInfo() which returns EpollTcpInfo that exposes
all metrics exposed via getsockopt(..., TCP_INFO, ...)
Result:
TCP_INFO support implemented
Motivation:
In the native transport we use getpeername to obtain the remote address from the file descriptor. This may fail for various reasons in which case NULL is returned.
Modifications:
- Check for null when try to obtain remote / local address
Result:
No more NPE
Related: #3274
Motivation:
channelReadComplete() event is not triggered after reading successfully
in EpollDatagramChannel.
Modifications:
- Trigger exceptionCaught() event for read failure only once for less
noise
- Trigger channelReadComplete() event at the end of the read.
Result:
Fix#3274
Rebased and cleaned-up based on the work by @normanmaurer
Motivation:
Currently, IOExceptions and ClosedChannelExceptions are thrown from
inside the JNI methods. Instantiation of Java objects inside JNI code is
an expensive operation, needless to say about filling stack trace for
every instantiation of an exception.
Modifications:
Change most JNI methods to return a negative value on failure so that
the exceptions are instantiated outside the native code.
Also, pre-instantiate some commonly-thrown exceptions for better
performance.
Result:
Performance gain
Motivation:
Everytime a new connection is accepted via EpollSocketServerChannel it will create a new EpollSocketChannel that needs to get the remote and local addresses in the constructor. The current implementation uses new InetSocketAddress(String, int) to create these. This is quite slow due the implementation in oracle and openjdk.
Modifications:
Encode all needed informations into a byte array before return from jni layer and then use new InetSocketAddress(InetAddress, int) to create the socket addresses. This allows to create the InetAddress via a byte[] and so reduce the overhead, this is done either by using InetAddress.getByteAddress(byte[]) or by Inet6Address.getByteAddress(String, byte[], int).
Result:
Reduce performance overhead while accept new connections with native transport
Motivation:
We only provided a constructor in DefaultFileRegion that takes a FileChannel which means the File itself needs to get opened on construction. This has the problem that if you want to write a lot of Files very fast you may end up with may open FD's even if they are not needed yet. This can lead to hit the open FD limit of the OS.
Modifications:
Add a new constructor to DefaultFileRegion which allows to construct it from a File. The FileChannel will only be obtained when transferTo(...) is called or the DefaultFileRegion is explicit open'ed via open() (this is needed for the native epoll transport)
Result:
Less resource usage when writing a lot of DefaultFileRegion.
Motivation:
JDK's exception messages triggered by a connection attempt failure do
not contain the related remote address in its message. We currently
append the remote address to ConnectException's message, but I found
that we need to cover more exception types such as SocketException.
Modifications:
- Add AbstractUnsafe.annotateConnectException() to de-duplicate the
code that appends the remote address
Result:
- Less duplication
- A transport implementor can annotate connection attempt failure
message more easily
Motivation:
In linux it is possible to write more then one buffer withone syscall when sending datagram messages.
Modifications:
Not copy CompositeByteBuf if it only contains direct buffers.
Result:
More performance due less overhead for copy.
Motivation:
On linux with glibc >= 2.14 it is possible to send multiple DatagramPackets with one syscall. This can be a huge performance win and so we should support it in our native transport.
Modification:
- Add support for sendmmsg by reuse IovArray
- Factor out ThreadLocal support of IovArray to IovArrayThreadLocal for better separation as we use IovArray also without ThreadLocal in NativeDatagramPacketArray now
- Introduce NativeDatagramPacketArray which is used for sendmmsg(...)
- Implement sendmmsg(...) via jni
- Expand DatagramUnicastTest to test also sendmmsg(...)
Result:
Netty now automatically use sendmmsg(...) if it is supported and we have more then 1 DatagramPacket in the ChannelOutboundBuffer and flush() is called.
Motivation:
On linux it is possible to use the sendMsg(...) system call to write multiple buffers with one system call when using datagram/udp.
Modifications:
- Implement the needed changes and make use of sendMsg(...) if possible for max performance
- Add tests that test sending datagram packets with all kind of different ByteBuf implementations.
Result:
Performance improvement when using CompoisteByteBuf and EpollDatagramChannel.
Motivation:
In EpollSocketchannel.doWriteFileRegion(...) we need to make sure we write until sendFile(...) returns either 0 or all is written. Otherwise we may not get notified once the Channel is writable again.
This is the case as we use EPOLL_ET.
Modifications:
Always write until either sendFile returns 0 or all is written.
Result:
No more hangs when writing DefaultFileRegion can happen.
Motivation:
There were no way to efficient write a CompositeByteBuf as we always did a memory copy to a direct buffer in this case. This is not needed as we can just write a CompositeByteBuf as long as all the components are buffers with a memory address.
Modifications:
- Write CompositeByteBuf which contains only direct buffers without memory copy
- Also handle CompositeByteBuf that have more components then 1024.
Result:
More efficient writing of CompositeByteBuf.
Related issue: #2764
Motivation:
EpollSocketChannel.writeFileRegion() does not handle the case where the
position of a FileRegion is non-zero properly.
Modifications:
- Improve SocketFileRegionTest so that it tests the cases where the file
transfer begins from the middle of the file
- Add another jlong parameter named 'base_off' so that we can take the
position of a FileRegion into account
Result:
Improved test passes. Corruption is gone.
Motivation:
At the moment it's only possible for a user to set the RecvByteBufAllocator for a Channel but not access the Handle once it is assigned. This makes it hard to write more flexible implementations.
Modifications:
Add a new method to the Channel.Unsafe to allow access the the used Handle for the Channel. The RecvByteBufAllocator.Handle is created lazily.
Result:
It's possible to write more flexible implementatons that allow to adjust stuff on the fly for a Handle that is used by a Channel
Motivation:
We did various changes related to the ChannelOutboundBuffer in 4.0 branch. This commit port all of them over and so make sure our branches are synced in terms of these changes.
Related to [#2734], [#2709], [#2729], [#2710] and [#2693] .
Modification:
Port all changes that was done on the ChannelOutboundBuffer.
This includes the port of the following commits:
- 73dfd7c01b49aca006a34cc48197dee3fc360af1
- 997d8c32d23f2d88903b7b607360907b99101002
- e282e504f17b0874719ff606c728494e3509b1a0
- 5e5d1a58fd3159c04ac7d10edfb8ed7a83d3935e
- 8ee3575e72d6ee000a99c717d96f36695a8667a0
- d6f0d12a8692c095df43b2a4462cbc97cf5c5a2d
- 16e50765d1fb99005ad761409c28dcedf477531b
- 3f3e66c31ae3da70c36cc125ca9bcac8215390e4
Result:
- Less memory usage by ChannelOutboundBuffer
- Same code as in 4.0 branch
- Make it possible to use ChannelOutboundBuffer with Channel implementation that not extends AbstractChannel
Related issue: #2733
Motivation:
Unlike OpenSsl, Epoll lacks a couple useful availability checker
methods:
- ensureAvailability()
- unavailabilityCause()
Modifications:
Add missing methods
Result:
More ways to check the availability and to get the cause of
unavailability programatically.