Motivation:
On linux it is possible to use the sendMsg(...) system call to write multiple buffers with one system call when using datagram/udp.
Modifications:
- Implement the needed changes and make use of sendMsg(...) if possible for max performance
- Add tests that test sending datagram packets with all kind of different ByteBuf implementations.
Result:
Performance improvement when using CompoisteByteBuf and EpollDatagramChannel.
Motivation:
InetAddress.getByName(...) uses exceptions for control flow when try to parse IPv4-mapped-on-IPv6 addresses. This is quite expensive.
Modifications:
Detect IPv4-mapped-on-IPv6 addresses in the JNI level and convert to IPv4 addresses before pass to InetAddress.getByName(...) (via InetSocketAddress constructor).
Result:
Eliminate performance problem causes by exception creation when parsing IPv4-mapped-on-IPv6 addresses.
Motivation:
In EpollSocketchannel.doWriteFileRegion(...) we need to make sure we write until sendFile(...) returns either 0 or all is written. Otherwise we may not get notified once the Channel is writable again.
This is the case as we use EPOLL_ET.
Modifications:
Always write until either sendFile returns 0 or all is written.
Result:
No more hangs when writing DefaultFileRegion can happen.
Motivation:
There were no way to efficient write a CompositeByteBuf as we always did a memory copy to a direct buffer in this case. This is not needed as we can just write a CompositeByteBuf as long as all the components are buffers with a memory address.
Modifications:
- Write CompositeByteBuf which contains only direct buffers without memory copy
- Also handle CompositeByteBuf that have more components then 1024.
Result:
More efficient writing of CompositeByteBuf.
Related issue: #2764
Motivation:
EpollSocketChannel.writeFileRegion() does not handle the case where the
position of a FileRegion is non-zero properly.
Modifications:
- Improve SocketFileRegionTest so that it tests the cases where the file
transfer begins from the middle of the file
- Add another jlong parameter named 'base_off' so that we can take the
position of a FileRegion into account
Result:
Improved test passes. Corruption is gone.
Motivation:
At the moment it's only possible for a user to set the RecvByteBufAllocator for a Channel but not access the Handle once it is assigned. This makes it hard to write more flexible implementations.
Modifications:
Add a new method to the Channel.Unsafe to allow access the the used Handle for the Channel. The RecvByteBufAllocator.Handle is created lazily.
Result:
It's possible to write more flexible implementatons that allow to adjust stuff on the fly for a Handle that is used by a Channel
Motivation:
We did various changes related to the ChannelOutboundBuffer in 4.0 branch. This commit port all of them over and so make sure our branches are synced in terms of these changes.
Related to [#2734], [#2709], [#2729], [#2710] and [#2693] .
Modification:
Port all changes that was done on the ChannelOutboundBuffer.
This includes the port of the following commits:
- 73dfd7c01b
- 997d8c32d2
- e282e504f1
- 5e5d1a58fd
- 8ee3575e72
- d6f0d12a86
- 16e50765d1
- 3f3e66c31a
Result:
- Less memory usage by ChannelOutboundBuffer
- Same code as in 4.0 branch
- Make it possible to use ChannelOutboundBuffer with Channel implementation that not extends AbstractChannel
Related issue: #2733
Motivation:
Unlike OpenSsl, Epoll lacks a couple useful availability checker
methods:
- ensureAvailability()
- unavailabilityCause()
Modifications:
Add missing methods
Result:
More ways to check the availability and to get the cause of
unavailability programatically.
Motivation:
We sometimes not use the correct exception message when throw it from the native code.
Modifications:
Fixed the message.
Result:
Correct message in exception
Motivation:
We have some inconsistency when handling writes. Sometimes we call ChannelOutboundBuffer.progress(...) also for complete writes and sometimes not. We should call it always.
Modifications:
Correctly call ChannelOuboundBuffer.progress(...) for complete and incomplete writes.
Result:
Consistent behavior
Motivation:
While optimize gathering writes I introduced a bug when writing single ByteBuf that have a memoryAddress. This regression was introduced by 88bd6e7a93.
Modifications:
Correctly use the writerIndex as argument when call Native.writeAddress(...)
Result:
No more corruption while write single buffers.
Motivation:
While benchmarking the native transport with gathering writes I noticed that it is quite slow. This is due the fact that we need to do a lot of array copies to get the buffers into the iov array.
Modification:
Introduce a new class calles IovArray which allows to fill buffers directly in a iov array that can be passed over to JNI without any array copies. This gives a nice optimization in terms of speed when doing gathering writes.
Result:
Big performance improvement when doing gathering writes. See the included benchmark...
Before:
[nmaurer@xxx]~% wrk/wrk -H 'Host: localhost' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Connection: keep-alive' -d 120 -c 256 -t 16 --pipeline 256 http://xxx:8080/plaintext
Running 2m test @ http://xxx:8080/plaintext
16 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 23.44ms 16.37ms 259.57ms 91.77%
Req/Sec 181.99k 31.69k 304.60k 78.12%
346544071 requests in 2.00m, 46.48GB read
Requests/sec: 2887885.09
Transfer/sec: 396.59MB
With this change:
[nmaurer@xxx]~% wrk/wrk -H 'Host: localhost' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Connection: keep-alive' -d 120 -c 256 -t 16 --pipeline 256 http://xxx:8080/plaintext
Running 2m test @ http://xxx:8080/plaintext
16 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 21.93ms 16.33ms 305.73ms 92.34%
Req/Sec 194.56k 33.75k 309.33k 77.04%
369617503 requests in 2.00m, 49.57GB read
Requests/sec: 3080169.65
Transfer/sec: 423.00MB
Motivation:
At the moment we use Get*ArrayElement all the time in the epoll transport which may be wasteful as the JVM may do a memory copy for this. For code-path that will get executed fast (without blocking) we should better make use of GetPrimitiveArrayCritical and ReleasePrimitiveArrayCritical as this signal the JVM that we not want to do any memory copy if not really needed. It is important to only do this on non-blocking code-path as this may even suspend the GC to disallow the JVM to move the arrays around.
See also http://docs.oracle.com/javase/7/docs/technotes/guides/jni/spec/functions.html#GetPrimitiveArrayCritical
Modification:
Make use of GetPrimitiveArrayCritical / ReleasePrimitiveArrayCritical as replacement for Get*ArrayElement / Release*ArrayElement where possible.
Result:
Better performance due less memory copies.
Motivation:
In EpollSocketchannel.writeBytesMultiple(...) we loop over all buffers to see if we need to adjust the readerIndex for incomplete writes. We can skip this if we know that everything was written (a.k.a complete write).
Modification:
Use fast-path if all bytes are written and so no need to loop over buffers
Result:
Fast write path for the average use.
Motivation:
At the moment NioSocketChannelOutboundBuffer.nioBuffers() / EpollSocketChannelOutboundBuffer.memoryAddresses() returns null if something is contained in the ChannelOutboundBuffer which is not a ByteBuf. This is a problem for two reasons:
1 - In the javadocs we state that it will never return null
2 - We may do a not optimal write as there may be things that could be written via gathering writes
Modifications:
Change NioSocketChannelOutboundBuffer.nioBuffers() / EpollSocketChannelOutboundBuffer.memoryAddresses() to never return null but have it contain all ByteBuffer that were found before the non ByteBuf. This way we can do a gathering write and also conform to the javadocs.
Result:
Better speed and also correct implementation in terms of the api.
Motivation:
In the previous fix for #2667 I did introduce a bit overhead by calling setEpollOut() too often.
Modification:
Only call setEpollOut() if really needed and remove unused code.
Result:
Less overhead when saturate network.
Motivation:
As a DatagramChannel supports to write to multiple remote peers we must not close the Channel once a IOException accours as this error may be only valid for one remote peer.
Modification:
Continue writing on IOException.
Result:
DatagramChannel can be used even after an IOException accours during writing.
Motivation:
We need to continue write until we hit EAGAIN to make sure we not see an starvation
Modification:
Write until EAGAIN is returned
Result:
No starvation when using native transport with ET.
Motivation:
Because of a missing return statement we may produce a NPE when try to fullfill the connect ChannelPromise when it was fullfilled before.
Modification:
Add missing return statement.
Result:
No more NPE.
Motivation:
The handling of IOV_MAX was done in JNI code base which makes stuff really complicated to maintain etc.
Modifications:
Move handling of IOV_MAX to java code to simplify stuff
Result:
Cleaner code.
Motivation:
In our nio implementation we use write-spinning for maximize throughput, but in the native implementation this is not used.
Modification:
Respect writeSpinCount in native transport.
Result:
Better throughput
Motivation:
Currently when Native.writev(...) is used it is possible to see a JVM segfault because the offset is updated to early.
Modification:
Only update the offset once it is safe to do so.
Result:
No more segfault
Motivation:
epoll transport fails on gathering write of more then 1024 buffers. As linux supports max. 1024 iov entries when calling writev(...) the epoll transport throws an exception.
Thanks again to @blucas to provide me with a reproducer and so helped me to understand what the issue is.
Modifications:
Make sure we break down the writes if to many buffers are uses for gathering writes.
Result:
Gathering writes work with any number of buffers
Motivation:
Currently it is impossible to build netty on linux system that not define SO_REUSEPORT even if it is supported.
Modification:
Define SO_REUSEPORT if not defined.
Result:
Possible to build on more linux dists.
Motivation:
We use the nanoTime of the scheduledTasks to calculate the milli-seconds to wait for a select operation to select something. Once these elapsed we check if there was something selected or some task is ready for processing. Unfortunally we not take into account scheduled tasks here so the selection loop will continue if only scheduled tasks are ready for processing. This will delay the execution of these tasks.
Modification:
- Check if a scheduled task is ready after selecting
- also make a tiny change in NioEventLoop to not trigger a rebuild if nothing was selected because the timeout was reached a few times in a row.
Result:
Execute scheduled tasks on time.
Motivation:
When we do a (env*)->GetObjectArrayElement(...) call we may created many local references which will only be cleaned up once we exist the native method. Thus a lot of memory can be used and so a StackOverFlow may be triggered. Beside this the JNI specification only say that an implementation must cope with 16 local references.
Modification:
Call (env*)->ReleaseLocalRef(...) to release the resource once not needed anymore.
Result:
Less memory usage and guard against StackOverflow
Motivation:
At the moment there is no simple way for a user to check if the native epoll transport can be used on the running platform. Thus the user can only try to instance it and catch any exception and fallback to nio transport.
Modification:
Add Epoll.isAvailable() which allows to check if epoll can be used.
Result:
User can easily check if epoll transport can be used or not
Motivation:
When using openjdk and oracle jdk's nio (while using the nio transport) the ServerSocketChannel uses SO_REUSEADDR by default. Our native transport should do the same to make it easier to switch between the different implementations and get the expected result.
Modification:
Change EpollServerSocketChannelConfig to set SO_REUSEADDR on the created socket.
Result:
SO_REUSEADDR is used by default on servers.
Motivation:
We need to map from ints to AbstractEpollChannel in EpollEventLoop but there is no need for box to Integer.
Modification:
Replace Map with IntObjectMap.
Result:
No more auto-boxing needed.
Motivation:
Some users already use an SSLEngine implementation in finagle-native. It
wraps OpenSSL to get higher SSL performance. However, to take advantage
of it, finagle-native must be compiled manually, and it means we cannot
pull it in as a dependency and thus we cannot test our SslHandler
against the OpenSSL-based SSLEngine. For an instance, we had #2216.
Because the construction procedures of JDK SSLEngine and OpenSslEngine
are very different from each other, we also need to provide a universal
way to enable SSL in a Netty application.
Modifications:
- Pull netty-tcnative in as an optional dependency.
http://netty.io/wiki/forked-tomcat-native.html
- Backport NativeLibraryLoader from 4.0
- Move OpenSSL-based SSLEngine implementation into our code base.
- Copied from finagle-native; originally written by @jpinner et al.
- Overall cleanup by @trustin.
- Run all SslHandler tests with both default SSLEngine and OpenSslEngine
- Add a unified API for creating an SSL context
- SslContext allows you to create a new SSLEngine or a new SslHandler
with your PKCS#8 key and X.509 certificate chain.
- Add JdkSslContext and its subclasses
- Add OpenSslServerContext
- Add ApplicationProtocolSelector to ensure the future support for NPN
(NextProtoNego) and ALPN (Application Layer Protocol Negotiation) on
the client-side.
- Add SimpleTrustManagerFactory to help a user write a
TrustManagerFactory easily, which should be useful for those who need
to write an alternative verification mechanism. For example, we can
use it to implement an unsafe TrustManagerFactory that accepts
self-signed certificates for testing purposes.
- Add InsecureTrustManagerFactory and FingerprintTrustManager for quick
and dirty testing
- Add SelfSignedCertificate class which generates a self-signed X.509
certificate very easily.
- Update all our examples to use SslContext.newClient/ServerContext()
- SslHandler now logs the chosen cipher suite when handshake is
finished.
Result:
- Cleaner unified API for configuring an SSL client and an SSL server
regardless of its internal implementation.
- When native libraries are available, OpenSSL-based SSLEngine
implementation is selected automatically to take advantage of its
performance benefit.
- Examples take advantage of this modification and thus are cleaner.
Motivation:
At the moment we sometimes use only RecvByteBufAllocator.guess() to guess the next size and the use the ByteBufAllocator.* directly to allocate the buffer. We should always use RecvByteBufAllocator.allocate(...) all the time as this makes the behavior easier to adjust.
Modifications:
Change the read() implementations to make use of RecvByteBufAllocator.
Result:
Behavior is more consistent.
Motivation:
When doing a gathering write we need to update the indices after the write partial completes. In the current code-base we use the wrong value when compare the expected written bytes and the actual written bytes.
Modifications:
Use the correct value when compare.
Result:
Indices are updated correctly and so no corruption can happen when resume writing after data was only partial written before.
Motivation:
oss.sonatype.org refuses to promote an artifact if it doesn't have the
default JAR (the JAR without classifier.)
Modifications:
- Generate both the default JAR and the native JAR to make
oss.sonatype.org happy
- Rename the profile 'release' to 'restricted-release' which reflects
what it really does better
- Remove the redundant <quickbuild>true</quickbuild> in all/pom.xml
We specify the profile 'full' that triggers that property already
in maven-release-plugin configuration.
Result:
oss.sonatype.org is happy. Simpler pom.xml
Motivation:
So far, we used a very simple platform string such as linux64 and
linux32. However, this is far from perfection because it does not
include anything about the CPU architecture.
Also, the current build tries to put multiple versions of .so files into
a single JAR. This doesn't work very well when we have to ship for many
different platforms. Think about shipping .so/.dynlib files for both
Linux and Mac OS X.
Modification:
- Use os-maven-plugin as an extension to determine the current OS and
CPU architecture reliable at build time
- Use Maven classifier instead of trying to put all shared libraries
into a single JAR
- NativeLibraryLoader does not guess the OS and bit mode anymore and it
always looks for the same location regardless of platform, because the
Maven classifier does the job instead.
Result:
Better scalable native library deployment and retrieval
Motivation:
4 and 5 were diverged long time ago and we recently reverted some of the
early commits in master. We must make sure 4.1 and master are not very
different now.
Modification:
Remove ChannelHandlerInvoker.writeAndFlush(...) and the related
implementations.
Result:
4.1 and master got closer.
Motivation:
AbstractEpollChannel.clearEpollIn() throws an IllegalStateException if a user tries to change the autoRead configuration for the Channel and the Channel is not registered on an EventLoop yet. This makes it for example impossible to set AUTO_READ to false via the ServerBootstrap as the configuration is modifed before the Channel is registered.
Modification:
Check if the Channel is registered and if not just modify the flags directly so they are respected once the Channel is registered
Result:
It is possible now to configure AUTO_READ via the ServerBootstrap
Motivation:
We are currently try to modify the events via EpollEventLoop even when the channel was closed before and so the fd was set to -1. This fails with a RuntimeException in this case.
Modification:
Always check if the Channel is still open before try to modify the events.
Result:
No more RuntimeException because of a not open channel
Motivation:
Currently the generics used for TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are incorrect.
Modifications:
Use Integer as type
Result:
User can use TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT as expected
Motivation:
EpollDatagramChannel produced buffer leaks when tried to read from the channel and nothing was ready to be read.
Modifications:
Correctly release buffer if nothing was read
Result:
No buffer leak
Motivation:
Allow to set TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT in native transport to offer the user with more flexibility.
Modifications:
Expose methods to set these options and write the JNI implementation.
Result:
User can now use TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT.
Motivation:
With SO_REUSEPORT it is possible to bind multiple sockets to the same port and so handle the processing of packets via multiple threads. This allows to handle DatagramPackets with more then one thread on the same port and so gives better performance.
Modifications:
Expose EpollDatagramChannelConfig.setReusePort(..) and isReusePort()
Result:
Allow to bind multiple times to the same local address and so archive better performance.
Motivation:
At the moment ChanneConfig.setAutoRead(false) only is guaranteer to not have an extra channelRead(...) triggered when used from within the channelRead(...) or channelReadComplete(...) method. This is not the correct behaviour as it should also work from other methods that are triggered from within the EventLoop. For example a valid use case is to have it called from within a ChannelFutureListener, which currently not work as expected.
Beside this there is another bug which is kind of related. Currently Channel.read() will not work as expected for OIO as we will stop try to read even if nothing could be read there after one read operation on the socket (when the SO_TIMEOUT kicks in).
Modifications:
Implement the logic the right way for the NIO/OIO/SCTP and native transport, specific to the transport implementation. Also correctly handle Channel.read() for OIO transport by trigger a new read if SO_TIMEOUT was catched.
Result:
It is now also possible to use ChannelConfig.setAutoRead(false) from other methods that are called from within the EventLoop and have direct effect.
Conflicts:
transport-sctp/src/main/java/io/netty/channel/sctp/nio/NioSctpChannel.java
transport/src/main/java/io/netty/channel/socket/nio/NioDatagramChannel.java
transport/src/main/java/io/netty/channel/socket/nio/NioSocketChannel.java
Motivation:
There is currently no epoll based DatagramChannel. We should add one to make the set of provided channels complete and also to be able to offer better performance compared to the NioDatagramChannel once SO_REUSEPORT is implemented.
Modifications:
Add implementation of DatagramChannel which uses epoll. This implementation does currently not support multicast yet which will me implemented later on. As most users will not use multicast anyway I think it is fair to just add the EpollDatagramChannel without the support for now. We shipped NioDatagramChannel without support earlier too ...
Result:
Be able to use EpollDatagramChannel for max. performance on linux
Motivation:
In linux kernel 3.9 a new featured named SO_REUSEPORT was introduced which allows to have multiple sockets bind to the same port and so handle the accept() of new connections with multiple threads. This can greatly improve the performance when you not to accept a lot of connections.
Modifications:
Implement SO_REUSEPORT via JNI
Result:
Be able to use the SO_REUSEPORT feature when using the EpollServerSocketChannel
Motivation:
We sometimes see data corruption when writing to the EpollSocketChannel.
Modifications:
Correctly update the position of the ByteBuffer after something was written.
Result:
Fix data-corruption which could happen on partial writes
Motivation:
When using System.getProperty(...) and various methods to get a ClassLoader it will fail when a SecurityManager is in place.
Modifications:
Use a priveled block if needed. This work is based in the PR #2353 done by @anilsaldhana .
Result:
Code works also when SecurityManager is present
Motivation:
EpollSocketChannel.remoteAddress0() is always null on accepted EpollSocketChannels as we not set it excplicit.
Modifications:
Correctly retrieve the local and remote address when accept new channel and store it
Result:
EpollSocketchannel.remoteAddress0() and EpollSocketChannel.localAddress0() return correct addresses
Motivation:
Native.epollCreate(...) fails on systems using a kernel < 2.6.27 / glibc < 2.9 because it uses epoll_create1(...) without checking if it is present
Modifications:
Check if epoll_create1(...) exists abd if not fall back to use epoll_create(...)
Result:
Works even on systems with kernel < 2.6.27 / glibc < 2.9
Motivation:
While the default thread model provided by Netty is reasonable enough for most applications, some users might have a special requirement for the thread model. Here are a few examples:
- A user might want to invoke handlers from the caller thread directly, assuming that his or her application is completely asynchronous and does not make any invocation from non-I/O thread. In this case, the default invoker implementation will only add the overhead of checking if the current thread is an I/O thread or not.
- A user might want to invoke handlers from different threads depending on the type of events flexibly.
Modifications:
- Backport 132af3a485 which is a fix for #1912
- Add a new interface called 'ChannelHandlerInvoker' that performs the invocation of event handler methods.
- Add pipeline manipulation methods that accept ChannelHandlerInvoker
- The differences from the original commit:
- Separated the irrelevant changes out
- Channel.eventLoop is null until the registration is complete in this branch, so Channel.Unsafe.invoker() doesn't work before registration.
- Deregistration is not gone in this branch, so the methods related with deregistration were added to ChannelHandlerInvoker
Motivation:
Make sure the remote/local InetSocketAddress can be obtained correctly
Modifications:
Set the remote/local InetSocketAddress after a bind/connect operation was performed
Result:
It is possible to still access the informations even after the fd became invalid. This mirror the behaviour of NIO.
Motivation:
The epoll testsuite tests the epoll transport only against itself (i.e. epoll x epoll only). We should test the epoll transport also against the well-tested NIO transport, too.
Modifications:
- Make SocketTestPermutation extensible and reusable so that the epoll testsuite can take advantage of it.
- Rename EpollTestUtils to EpollSocketTestPermutation and make it extend SocketTestPermutation.
- Overall clean-up of SocketTestPermutation
- Use Arrays.asList() for simplicity
- Add combo() method to remove code duplication
Result:
The epoll transport is now also tested against the NIO transport. SocketTestPermutation got cleaner.
Motivation:
Previous commit (2de65e25e9) introduced a regression that makes the epoll testsuite fail with an 'incompatible event loop' error.
Modifications:
Use the correct event loop type.
Result:
Build doesn't fail anymore.
Motivation:
We are seeing EpollSocketSslEchoTest does not finish itself while its I/O thread is busy. Jenkins should have terminated them when the global build timeout reaches, but Jenkins seems to fail to do so. What's more interesting is that Jenkins will start another job before the EpollSocketSslEchoTest is terminated, and Linux starts to oom-kill them, impacting the uptime of the CI service.
Modifications:
- Set timeout for all test cases in SocketSslEchoTest so that all SSL tests terminate themselves when they take too long.
- Fix a bug where the epoll testsuite uses non-daemon threads which can potentially prevent JVM from quitting.
- (Cleanup) Separate boss group and worker group just like we do for NIO/OIO transport testsuite.
Result:
Potentially more stable CI machine.
Motivation:
We better use UnresolveableAddressException as NIO does the same.
Modifications:
Replace usage of UnknownHostException with UnresolveableAddressException
Result:
epoll transport and nio transport behave the same way
Motivation:
At the moment when an unresolvable InetSocketAddress is passed into the epoll transport a NPE is thrown
Modifications:
Add check in place which will throw an UnknownHostException if an InetSocketAddress could not been resolved.
Result:
Proper handling of unresolvable InetSocketAddresses
This also does factor out some logic of ChannelOutboundBuffer. Mainly we not need nioBuffers() for many
transports and also not need to copy from heap to direct buffer. So this functionality was moved to
NioSocketChannelOutboundBuffer. Also introduce a EpollChannelOutboundBuffer which makes use of
memory addresses for all the writes to reduce GC pressure
This transport use JNI (C) to directly make use of epoll in Edge-Triggered mode for maximal performance on Linux. Beside this it also support using TCP_CORK and produce less GC then the NIO transport using JDK NIO.
It only builds on linux and skip the build if linux is not used. The transport produce a jar which contains all needed .so files for 32bit and 64 bit. The user only need to include the jar as dependency as usually
to make use of it and use the correct classes.
This includes also some cleanup of @trustin