Motivation:
Our Unsafe*ByteBuf implementation always invert bytes when the native ByteOrder is LITTLE_ENDIAN (this is true on intel), even when the user calls order(ByteOrder.LITTLE_ENDIAN). This is not optimal for performance reasons as the user should be able to set the ByteOrder to LITTLE_ENDIAN and so write bytes without the extra inverting.
Modification:
- Introduce a new special SwappedByteBuf (called UnsafeDirectSwappedByteBuf) that is used by all the Unsafe*ByteBuf implementation and allows to write without inverting the bytes.
- Add benchmark
- Upgrade jmh to 0.8
Result:
The user is be able to get the max performance even on servers that have ByteOrder.LITTLE_ENDIAN as their native ByteOrder.
Motivation:
MpscLinkedQueue has various issues:
- It does not work without sun.misc.Unsafe.
- Some field names are confusing.
- Node.tail does not refer to the tail node really.
- The tail node is the starting point of iteration. I think the tail
node should be the head node and vice versa to reduce confusion.
- Some important methods are not implemented (e.g. iterator())
- Not serializable
- Potential false cache sharing problem due to lack of padding
- MpscLinkedQueue extends AtomicReference and thus exposes various
operations that mutates the internal state of the queue directly.
Modifications:
- Use AtomicReferenceFieldUpdater wherever possible so that we do not
use Unsafe directly. (e.g. use lazySet() instead of putOrderedObject)
- Extend AbstractQueue to implement most operations
- Implement serialization and iterator()
- Rename tail to head and head to tail to reduce confusion.
- Rename Node.tail to Node.next.
- Fix a leak where the references in the removed head are not cleared
properly.
- Add Node.clearMaybe() method so that the value of the new head node
is cleared if possible.
- Add some comments for my own educational purposes
- Add padding to the head node
- Add FullyPaddedReference and RightPaddedReference for future reuse
- Make MpscLinkedQueue package-local so that a user cannot access the
dangerous yet public operations exposed by the superclass.
- MpscLinkedQueue.Node becomes MpscLinkedQueueNode, a top level class
Result:
- It's more like a drop-in replacement of ConcurrentLinkedQueue for the
MPSC case.
- Works without sun.misc.Unsafe
- Code potentially easier to understand
- Fixed leak (related: #2372)
Motivation:
At the moment ChannelFlushPromiseNotifier.add(....) takes an int value for pendingDataSize, which may be too small as a user may need to use a long. This can for example be useful when a user writes a FileRegion etc. Beside this the notify* method names are kind of missleading as these should not contain *Future* because it is about ChannelPromises.
Modification:
Add a new add(...) method that takes a long for pendingDataSize and @deprecated the old method. Beside this also @deprecated all *Future* methods and add methods that have *Promise* in the method name to better reflect usage.
Result:
ChannelFlushPromiseNotifier can be used with bigger data.
Motivation:
Currently OkResponseHandler returns a DefaultHttpResponse which is not
correct and it should be returning complete http response.
Modifications:
Updated OkResponseHandler to return an instance of
DefaultFullHttpResponse.
Result:
It is not possible to add compression to the example without getting any
errors.
Motivation:
ChannelTrafficShapingHandler may corrupt inbound data stream by
scheduling the fireChannelRead event.
Modification:
Always call fireChannelRead(...) and only suspend reads after it
Result:
No more data corruption
Motivation:
When running Netty on a container environment, the container will often
complain about the lingering threads such as the worker threads of
ThreadDeathWatcher and GlobalEventExecutor. We should provide an
operation that allows a use to wait until such threads are terminated.
Modifications:
- Add awaitInactivity()
- (misc) Fix typo in GlobalEventExecutorTest
- (misc) Port ThreadDeathWatch's CAS-based thread life cycle management
to GlobalEventExecutor
Result:
- Fixes#2084
- Less overhead on task submission of GlobalEventExecutor
Motivation:
PooledByteBufAllocator's thread local cache and
ReferenceCountUtil.releaseLater() are in need of a way to run an
arbitrary logic when a certain thread is terminated.
Modifications:
- Add ThreadDeathWatcher, which spawns a low-priority daemon thread
that watches a list of threads periodically (every second) and
invokes the specified tasks when the associated threads are not alive
anymore
- Start-stop logic based on CAS operation proposed by @tea-dragon
- Add debug-level log messages to see if ThreadDeathWatcher works
Result:
- Fixes#2519 because we don't use GlobalEventExecutor anymore
- Cleaner code
Motivation:
At the moment MessageToMessageEncoder uses ctx.write(msg) when have more then one message was produced. This may produce more GC pressure then necessary as when the original ChannelPromise is a VoidChannelPromise we can safely also use one when write messages.
Modifications:
Use VoidChannelPromise when the original ChannelPromise was of this type
Result:
Less object creation and GC pressure
Motivation:
The current DefaultAttributeMap cause an infinite-loop when the user removes an attribute and create the same attribute again. This regression was introduced by c3bd7a8ff10021026b0c223d36f022dbbf4fe397.
Modification:
Correctly break out loop
Result:
No infinite-loop anymore.
Motivation:
If we make allocateRun/SubpageSimple() always try the left node first and make allocateRun/Subpage() always tries the right node first, it is more likely that allocateRun/Subpage() will find a node with ST_UNUSED sooner.
Modifications:
- Make allocateRunSimple() and allocateSubpageSimple() always try the left node first.
- Make allocateRun() and allocateSubpage() always try the right node first.
- Remove randome
Result:
We get the same performance without using random numbers.
Motivation:
We still have a room for improvement in PoolChunk.allocateRun() and
Subpage.allocate().
Modifications:
- Unroll the recursion in PoolChunk.allocateRun()
- Subpage.allocate() makes use of the 'nextAvail' value set by previous
free().
Result:
- PoolChunk.allocateRun() optimization yields 10%+ improvements in
allocation throughput for non-subpage allocations.
- Subpage.allocate() optimization makes the subpage allocations for
tiny buffers as fast as non-tiny buffers even when the pageSize is
huge (e.g. 1048576) because it doesn't need to perform a linear search
in most cases.
Motivation:
Allocating a single buffer and releasing it repetitively for a benchmark will not involve the realistic execution path of the allocators.
Modifications:
Keep the last 8192 allocations and release them randomly.
Result:
We are now getting the result close to what we got with caliper.
Motivation:
On some ill-configured systems, InetAddress.getLocalHost() fails. NioSocketChannelTest calls java.net.Socket.connect() and it internally invoked InetAddress.getLocalHost(), which causes the test failures in NioSocketChannelTes on such an ill-configured system.
Modifications:
Use NetUtil.LOCALHOST explicitly.
Result:
NioSocketChannelTest should not fail anymore.
Motivation:
maven-antrun-plugin does not redirect stdin, and thus it's impossible to
run interactive examples such as securechat-client and telnet-client.
org.codehaus.mojo:exec-maven-plugin redirects stdin, but it buffers
stdout and stderr, and thus an application output is not flushed timely.
Modifications:
Deploy a forked version of exec-maven-plugin which flushes output
buffers in a timely manner.
Result:
Interactive examples work. Launches faster than maven-antrun-plugin.
Motivation:
The examples have not been updated since long time ago, showing various
issues fixed in this commit.
Modifications:
- Overall simplification to reduce LoC
- Use system properties to get options instead of parsing args.
- Minimize option validation
- Just use System.out/err instead of Logger
- Do not pass config as parameters - just access it directly
- Move the main logic to main(String[]) instead of creating a new
instance meaninglessly
- Update netty-build-21 to make checkstyle not complain
- Remove 'throws Exception' clause if possible
- Line wrap at 120 (previously at 80)
- Add an option to enable SSL for most examples
- Use ChannelFuture.sync() instead of await()
- Use System.out for the actual result. Use System.err otherwise.
- Delete examples that are not very useful:
- applet
- websocket/html5
- websocketx/sslserver
- localecho/multithreaded
- Add run-example.sh which simplifies launching an example from command
line
- Rewrite FileServer example
Result:
Shorter and simpler examples. A user can focus more on what it actually
does than miscellaneous stuff. A user can launch an example very
easily.
Motivation:
When (listeners == null && lateListeners == null) and (stackDepth >= MAX_LISTENER_STACK_DEPTH), the listener is not notified at all. The discard client does not work.
Modification:
Make sure to submit the notification task.
Result:
The discard client works again and all listeners are notified.
Motivation:
- dependencyVersionsDir property is not resolved during the build
process. The build doesn't fail because of this, but it creates an
ugly directory.
- All-in-one JAR contains libnetty-tcnative.so, which is not part of the
all-in-one JAR.
Modifications:
- Fix an incorrect property name
(dependencyVersionDir -> dependencyVersionsDir)
- Exclude libnetty-tcnative.so
- Remove unnecessary includes in source expanding configuration
Result:
- Cleaner pom.xml
- We do not ship libnetty-tcnative.so in all-in-one JAR anymore, which
is correct, because strictly speaking the native library belongs to
org.apache.tomcat.jni package.
Motivation:
exec-maven-plugin does not flush stdout and stderr, making the console
output from the examples invisible to users
Modification:
Use maven-antrun-plugin instead
Result:
A user sees the output from the examples immediately.
Motivation:
According to TLS ALPN draft-05, a client sends the list of the supported
protocols and a server responds with the selected protocol, which is
different from NPN. Therefore, ApplicationProtocolSelector won't work
with ALPN
Modifications:
- Use Iterable<String> to list the supported protocols on the client
side, rather than using ApplicationProtocolSelector
- Remove ApplicationProtocolSelector
Result:
Future compatibility with TLS ALPN
Motivation:
- OpenSslEngine and JDK SSLEngine (+ Jetty NPN) have different APIs to
support NextProtoNego extension.
- It is impossible to configure NPN with SslContext when the provider
type is JDK.
Modification:
- Implement NextProtoNego extension by overriding the behavior of
SSLSession.getProtocol() for both OpenSSLEngine and JDK SSLEngine.
- SSLEngine.getProtocol() returns a string delimited by a colon (':')
where the first component is the transport protosol (e.g. TLSv1.2)
and the second component is the name of the application protocol
- Remove the direct reference of Jetty NPN classes from the examples
- Add SslContext.newApplicationProtocolSelector
Result:
- A user can now use both JDK SSLEngine and OpenSslEngine for NPN-based
protocols such as HTTP2 and SPDY
Motivation:
Mac OS X ships Bash 3, and it does not have an associative array
(declare -A).
Modifications:
Do not use an associative array.
Result:
Can run examples on Mac OS X using run-example.sh
Motivation:
- There's no way to pass an argument to an example.
- Assigning a Maven profile for each example is an overkill.
It makes the pom.xml crowded.
Modifications:
- Remove example profiles from example/pom.xml
- Keep the list of examples in run-example.sh
- run-example.sh passes all options to exec-maven-plugin.
For example, we can now do this:
./run-example.sh -Dssl -Dport=443 http-server
Result:
- It's much easier to add a new example and provide an easy way to
launch it.
- We can still pass an arbitrary argument to the example being launched.
(I'll update all examples to make them get their options from system
properties rather than from args[].
Motivation:
Build fails with JDK 8 because npn-boot does not work with JDK 8
Modifications:
Do not specify bootclasspath when on JDK 8
Result:
Build is green again.
Motivation:
Due to a known problem[1] of maven-compiler-plugin, our build always
compiles everything from scratch, which is waste of time.
Modifications:
Exclude package-info.java from the source list.
Result:
Much shorter build time.
[1]: https://jira.codehaus.org/browse/MCOMPILER-205
Motivation:
- example/pom.xml has quite a bit of duplication.
- We expect that we depend on npn-boot in more than one module in the
near future. (e.g. handler, codec-http, and codec-http2)
Modification:
- Deduplicate the profiles in example/pom.xml
- Move the build configuration related with npn-boot to the parent pom.
- Add run-example.sh that helps a user launch an example easily
Result:
- Cleaner build files
- Easier to add a new example
- Easier to launch an example
- Easier to run the tests that relies on npn-boot in the future
Motivation:
During a large memory copy, safepoint polling is diabled, hindering
accurate profiling.
Modifications:
Only copy up to 1 MiB per Unsafe.copyMemory()
Result:
Potentially more reliable performance
Motivation:
For an unknown reason, JVM of JDK8 crashes intermittently when
SslHandler feeds a direct buffer to SSLEngine.unwrap() *and* the current
cipher suite has GCM (Galois/Counter Mode) enabled.
Modifications:
Convert the inbound network buffer to a heap buffer when the current
cipher suite is using GCM.
Result:
JVM does not crash anymore.
Motivation:
JDK's SSLEngine.wrap() requires the output buffer to be always as large as MAX_ENCRYPTED_PACKET_LENGTH even if the input buffer contains small number of bytes. Our OpenSslEngine implementation does not have such wasteful behaviot.
Modifications:
If the current SSLEngine is OpenSslEngine, allocate as much as only needed.
Result:
Less peak memory usage.
Motivation:
It's useful to have netty-tcnative dependency in netty-example because
we can play with OpenSslEngine from our IDE.
Modifications:
Add netty-tcnative to example/pom.xml
Motivation:
Previous fix for the OpenSslEngine compatibility issue (#2216 and
18b0e95659c057b126653bad2f018a8ce5385255) was to feed SSL records one by
one to OpenSslEngine.unwrap(). It is not optimal because it will result
in more JNI calls.
Modifications:
- Do not feed SSL records one by one.
- Feed as many records as possible up to MAX_ENCRYPTED_PACKET_LENGTH
- Deduplicate MAX_ENCRYPTED_PACKET_LENGTH definitions
Result:
- No allocation of intemediary arrays
- Reduced number of calls to SSLEngine and thus its underlying JNI calls
- A tad bit increase in throughput, probably reverting the tiny drop
caused by 18b0e95659c057b126653bad2f018a8ce5385255