Motivation:
isRoot() is an expensive operation. We should avoid calling it if
possible.
Modifications:
Move the isRoot() checks to the end of the 'if' block, so that isRoot()
is evaluated only when really necessary.
Result:
isRoot() is evaluated only when SO_BROADCAST is set and the bind address
is anylocal address.
Motivation:
When a user sees an error message, sometimes he or she does not know
what exactly he or she has to do to fix the problem.
Modifications:
Log the URL of the wiki pages that might help the user troubleshoot.
Result:
We are more friendly.
Related: #3166
Motivation:
When the recyclable object created at one thread is returned at the
other thread, it is stored in a WeakOrderedQueue.
The objects stored in the WeakOrderedQueue is added back to the stack by
WeakOrderedQueue.transfer() when the owner thread ran out of recyclable
objects.
However, WeakOrderedQueue.transfer() does not have any mechanism that
prevents the stack from growing beyond its maximum capacity.
Modifications:
- Make WeakOrderedQueue.transfer() increase the capacity of the stack
only up to its maximum
- Add tests for the cases where the recyclable object is returned at the
non-owner thread
- Fix a bug where Stack.scavengeSome() does not scavenge the objects
when it's the first time it ran out of objects and thus its cursor is
null.
- Overall clean-up of scavengeSome() and transfer()
Result:
The capacity of Stack never increases beyond its maximum.
Motivation:
io.netty.util.internal.PlatformDependent.isRoot() depends on the IS_ROOT field which is filled in during class initialization. This spawns processes and consumes resources, which are not generally necessary to the complete functioning of that class.
Modifications:
This switches the class to use lazy initialization this field inside of the isRoot() method using double-checked locking (http://en.wikipedia.org/wiki/Double-checked_locking).
Result:
The first call to isRoot() will be slightly slower, at a tradeoff that class loading is faster, uses fewer resources and platform errors are avoided unless necessary.
- Parameterize DomainNameMapping to make it useful for other use cases
than just mapping to SslContext
- Move DomainNameMapping to io.netty.util
- Clean-up the API documentation
- Make SniHandler.hostname and sslContext volatile because they can be
accessed by non-I/O threads
Motivation:
When we need to host multiple server name with a single IP, it requires
the server to support Server Name Indication extension to serve clients
with proper certificate. So the SniHandler will host multiple
SslContext(s) and append SslHandler for requested hostname.
Modification:
* Added SniHandler to host multiple certifications in a single server
* Test case
Result:
User could use SniHandler to host multiple certifcates at a time.
It's server-side only.
Motivation:
8fbc513 introduced stray warnings in callsites of
PromiseAggregator#add and PromiseNotifier#(...).
Modifications:
This commit adds the @SafeVarargs annotation to PromiseAggregator#add
and PromiseNotifier#(...). As Netty is built with JDK7, this is a
recognized annotation and should not affect runtime VM versions 1.5 and
1.6.
Result:
Building Netty with JDK7 will no longer produce warnings in the
callsites mentioned above.
Motivation:
Although the new IntObjectMap.values() that returns Collection is
useful, the removed values(Class<V>) that returns an array is also
useful. It's also good for backward compatibility.
Modifications:
- Add IntObjectMap.values(Class<V>) back
- Miscellaneous improvements
- Cache the collection returned by IntObjectHashMap.values()
- Inspector warnings
- Update the IntObjectHashMapTest to test both values()
Result:
- Backward compatibility
- Potential performance improvement of values()
Motivation:
Found performance issues via FindBugs and PMD.
Modifications:
- Removed unnecessary boxing/unboxing operations in DefaultTextHeaders.convertToInt(CharSequence) and DefaultTextHeaders.convertToLong(CharSequence). A boxed primitive is created from a string, just to extract the unboxed primitive value.
- Added a static modifier for DefaultHttp2Connection.ParentChangedEvent class. This class is an inner class, but does not use its embedded reference to the object which created it. This reference makes the instances of the class larger, and may keep the reference to the creator object alive longer than necessary.
- Added a static compiled Pattern to avoid compile it each time it is used when we need to replace some part of authority.
- Improved using of StringBuilders.
Result:
Performance improvements.
Motivation:
The outbound flow controller logic does not properly reset the allocated
bytes between successive invocations of the priority algorithm.
Modifications:
Updated the priority algorithm to reset the allocated bytes for each
stream.
Result:
Each call to the priority algorithm now starts with zero allocated bytes
for each stream.
Motivation:
NetUtil.isValidIpV6Address() handles the interface name in IPv6 address
incorrectly. For example, it returns false for the following addresses:
- ::1%lo
- ::1%_%_in_name_
Modifications:
- Strip the square brackets before validation for simplicity
- Strip the part after the percent sign completely before validation for
simplicity
- Simplify and reformat NetUtilTest
Result:
- The interface names in IPv6 addresses are handled correctly.
- NetUtilTest is cleaner
Motivation:
ChannelPromiseAggregator and ChannelPromiseNotifiers only allow
consumers to work with Channels as the result type. Generic versions
of these classes allow consumers to aggregate or broadcast the results
of an asynchronous execution with other result types.
Modifications:
Add PromiseAggregator and PromiseNotifier. Add unit tests for both.
Remove code in ChannelPromiseAggregator and ChannelPromiseNotifier and
modify them to extend the new base classes.
Result:
Consumers can now aggregate or broadcast the results of an asynchronous
execution with results types other than Channel.
Motivation:
CollectionUtils has only one method and it is used only in DefaultHeaders.
Modification:
Move CollectionUtils.equals() to DefaultHeaders and make it private
Result:
One less class to expose in our public API
Motivation:
Headers within netty do not cleanly share a common class hierarchy. As a result some header types support some operations
and don't support others. The consolidation of the class hierarchy will allow for maintenance and scalability for new codec.
The existing hierarchy also has a few short comings such as it is not clear when data conversions are happening. This
could result unintentionally getting back a collection or iterator where a conversion on each entry must happen.
The current headers algorithm also prepends all elements which means to find the first element or return a collection
in insertion order often requires a complete traversal followed by a collections.reverse call.
Modifications:
-Provide a generic base class which provides all the implementation for headers in netty
-Provide an extension to this class which allows for name type conversions to happen (to accommodate legacy CharSequence to String conversions)
-Update the headers interface to clarify when conversions will happen.
-Update the headers data structure so that appends are done to avoid unnecessary iteration or collection reversal.
Result:
-More unified class hierarchy for headers in netty
-Improved headers data structure and algorithms
-headers API more clearly identify when conversions are required.
Motivation:
Using a needless local copy of keys.length.
Modifications:
Using keys.length explicitly everywhere.
Result:
Slight performance improvement of hashIndex.
Motivation:
The hashIndex method currently uses a conditional to handle negative
keys. This could be done without a conditional to slightly improve
performance.
Modifications:
Modified hashIndex() to avoid using a conditional.
Result:
Slight performance improvement to hashIndex().
Motivation:
IntObjectHashMap throws an exception when using negative values for
keys.
Modifications:
Changed hashIndex() to normalize the index if the mod operation returns
a negative number.
Result:
IntObjectHashMap supports negative key values.
Motivation
Issue #3004 shows that "=" character was not supported as it should in
the HttpPostRequestDecoder in form-data boundary.
Modifications:
Add 2 methods in StringUtil
split with maxParts: String split with a max parts only (to prevent multiple '=' to
be source of extra split while not needed)
substringAfter: String part after delimiter (since first part is not needed)
Use those methods in HttpPostRequestDecoder. Change and the
HttpPostRequestDecoderTest to check using a boundary beginning with "=".
Results:
The fix implies more stability and fix the issue.
Motivation:
An IPv6 string can have a zone index which is followed by the '%' sign.
When a user passes an IPv6 string with a zone index,
NetUtil.createByteArrayFromIpAddressString() returns an incorrect value.
Modification:
- Strip the zone index before conversion
Result:
An IPv6 string with a zone index is decoded correctly.
Motivation:
HTTP/2 codec does not properly test exception passed to
exceptionCaught() for instanceof Http2Exception (since the exception
will always be wrapped in a PipelineException), so it will never
properly handle Http2Exceptions in the pipeline.
Also if any streams are present, the connection close logic will execute
twice when a pipeline exception. This is because the exception logic
calls ctx.close() which then triggers the handleInActive() logic to
execute. This clears all of the remaining streams and then attempts to
run the closeListener logic (which has already been run).
Modifications:
Changed exceptionCaught logic to properly extract Http2Exception from
the PipelineException. Also added logic to the closeListener so that is
only run once.
Changed Http2CodecUtil.toHttp2Exception() to avoid NPE when creating
an exception with cause.getMessage().
Refactored Http2ConnectionHandler to more cleanly separate inbound and
outbound flows (Http2ConnectionDecoder/Http2ConnectionEncoder).
Added a test for verifying that a pipeline exception closes the
connection.
Result:
Exception handling logic is tidied up.
Motivation:
The java implementations for Inet6Address.getHostName() do not follow the RFC 5952 (http://tools.ietf.org/html/rfc5952#section-4) for recommended string representation. This introduces inconsistencies when integrating with other technologies that do follow the RFC.
Modifications:
-NetUtil.java to have another public static method to convert InetAddress to string. Inet4Address will use the java InetAddress.getHostAddress() implementation and there will be new code to implement the RFC 5952 IPV6 string conversion.
-New unit tests to test the new method
Result:
Netty provides a RFC 5952 compliant string conversion method for IPV6 addresses
Motivation:
Currently the Executor created by (Nio|Epoll)EventLoopGroup is not correctly shutdown.
This might lead to resource shortages, due to resources not being freed asap.
Modifications:
If (Nio|Epoll)EventLoopGroup create their internal Executor via a constructor
provided `ExecutorServiceFactory` object or via
MultithreadEventLoopGroup.newDefaultExecutorService(...) the ExecutorService.shutdown()
method will be called after (Nio|Epoll)EventLoopGroup is shutdown.
ExecutorService.shutdown() will not be called if the Executor object was passed
to the (Nio|Epoll)EventLoopGroup (that is, it was instantiated outside of Netty).
Result:
Correctly release resources on (Nio|Epoll)EventLoopGroup shutdown.
Motivation:
Outbound flow control does not properly remove the head of queue after
it's written. This will cause streams with multiple frames to get stuck
and not send all of the data.
Modifications:
Modified the DefaultHttp2OutboundFlowController to properly remove the
head of the pending write queue once a queued frame has been written.
Added an integration test that sends a large message to verify that all
DATA frames are properly collected at the other end.
Result:
Outbound flow control properly handles several queued messages.
Motivation:
This fixes bug #2848 which caused Recycler to become unbounded and cache infinite number of objects with maxCapacity that's not a power of two. This can result in general sluggishness of the application and OutOfMemoryError.
Modifications:
The test for maxCapacity has been moved out of test to check if the buffer has filled. The buffer is now also capped at maxCapacity and cannot grow over it as it jumps from one power of two to the other.
Additionally, a unit test was added to verify maxCapacity is honored even when it's not a power of two.
Result:
With these changes the user is able to use a custom maxCapacity number and not have it ignored. The unit test assures this bug will not repeat itself.
Motivation:
To make it more easy to shutdown an EventExecutorGroup / EventLoopGroup we should let both of them extend AutoCloseable.
Modifications:
Let EventExecutorGroup extend AutoCloseable and impement it.
Result:
Easier shutdown of EventExecutorGroup and EventLoopGroup
Motivation:
There is not need todo redunant reads of head in peakNode as we can just spin on next() until it becomes visible.
Modifications:
Remove redundant reads of head in peakNode. This is based on @nitsanw's patch for akka.
See https://github.com/akka/akka/pull/15596
Result:
Less volatile access.
Motivation:
The recent changes f8bee2e94c to use a FJP introduced a regression on Android that cause a NPE.
Modifications:
- Use AtomicReferenceFieldUpdater so it works also on Android
- Fix inspection warnings
Result:
Netty works again on Android too.
Motivation:
Recently we changed the default value of SOMAXCONN that is used when we can not determine it by reading /proc/sys/net/core/somaxconn. While doing this we missed to update the javadocs to reflect the new default value that is used.
Modifications:
List correct default value in the javadocs of SOMAXCONN.
Result:
Correct javadocs.
Related issue: #2407
Motivation:
The current fallback SOMAXCONN value is 3072. It is way too large
comparing to the default SOMAXCONN value of popular OSes.
Modifications:
Decrease the fallback SOMAXCONN value to 128 or 200 depending on the
current OS
Result:
Saner fallback value
Motivation:
After a channel is created it's usually assigned to an
EventLoop. During the lifetime of a Channel the
EventLoop is then responsible for processing all I/O
and compute tasks of the Channel.
For various reasons (e.g. load balancing) a user might
require the ability for a Channel to be assigned to
another EventLoop during its lifetime.
Modifications:
Introduce under the hood changes that ensure that Netty's
thread model is obeyed during and after the deregistration
of a channel.
Ensure that tasks (one time and periodic) are executed by
the right EventLoop at all times.
Result:
A Channel can be deregistered from one and re-registered with
another EventLoop.
Related issue: #2250
Motivation:
Prior to this commit, Netty's non blocking EventLoops
were each assigned a fixed thread by which all of the
EventLoop's I/O and handler logic would be performed.
While this is a fine approach for most users of Netty,
some advanced users require more flexibility in
scheduling the EventLoops.
Modifications:
Remove all direct usages of threads in MultithreadEventExecutorGroup,
SingleThreadEventExecutor et al., and introduce an Executor
abstraction instead.
The way to think about this change is, that each
iteration of an eventloop is now a task that gets scheduled
in a ForkJoinPool.
While the ForkJoinPool is the default, one also has the
ability to plug in his/her own Executor (aka thread pool)
into a EventLoop(Group).
Result:
Netty hands off thread management to a ForkJoinPool by default.
Users can also provide their own thread pool implementation and
get some control over scheduling Netty's EventLoops
Motivation:
The calculation of the max wait time for HashedWheelTimerTest.testExecutionOnTime() was wrong and so the test sometimes failed.
Modifications:
Fix the max wait time.
Result:
No more test-failures
Motivation:
We forgot to do a null check on the cause parameter of
ChannelFuture.setFailure(cause)
Modifications:
Add a null check
Result:
Fixed issue: #2728
Motivation:
We did various changes related to the ChannelOutboundBuffer in 4.0 branch. This commit port all of them over and so make sure our branches are synced in terms of these changes.
Related to [#2734], [#2709], [#2729], [#2710] and [#2693] .
Modification:
Port all changes that was done on the ChannelOutboundBuffer.
This includes the port of the following commits:
- 73dfd7c01b
- 997d8c32d2
- e282e504f1
- 5e5d1a58fd
- 8ee3575e72
- d6f0d12a86
- 16e50765d1
- 3f3e66c31a
Result:
- Less memory usage by ChannelOutboundBuffer
- Same code as in 4.0 branch
- Make it possible to use ChannelOutboundBuffer with Channel implementation that not extends AbstractChannel
Motivation:
As /proc/sys/net/core/somaxconn does not exists on non-linux platforms you see a noisy stacktrace when debug level is enabled while the static method of NetUtil is executed.
Modifications:
Check if the file exists before try to parse it.
Result:
Less noisy logging on non-linux platforms.
Related issue: #2354
Motivation:
AbstractConstant.compareTo() can return 0 even if the specified constant
object is not the same instance with 'this'.
Modifications:
- Compare the identityHashCode of constant first. If that fails,
allocate a small direct buffer and use its memory address as a unique
value. If the platform does not provide a way to get the memory
address of a direct buffer, use a thread-local random value.
- Signal cannot extend AbstractConstant. Use delegation.
Result:
It is practically impossible for AbstractConstant.compareTo() to return
0 for different constant objects.
Motivation:
While benchmarking the native transport with gathering writes I noticed that it is quite slow. This is due the fact that we need to do a lot of array copies to get the buffers into the iov array.
Modification:
Introduce a new class calles IovArray which allows to fill buffers directly in a iov array that can be passed over to JNI without any array copies. This gives a nice optimization in terms of speed when doing gathering writes.
Result:
Big performance improvement when doing gathering writes. See the included benchmark...
Before:
[nmaurer@xxx]~% wrk/wrk -H 'Host: localhost' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Connection: keep-alive' -d 120 -c 256 -t 16 --pipeline 256 http://xxx:8080/plaintext
Running 2m test @ http://xxx:8080/plaintext
16 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 23.44ms 16.37ms 259.57ms 91.77%
Req/Sec 181.99k 31.69k 304.60k 78.12%
346544071 requests in 2.00m, 46.48GB read
Requests/sec: 2887885.09
Transfer/sec: 396.59MB
With this change:
[nmaurer@xxx]~% wrk/wrk -H 'Host: localhost' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Connection: keep-alive' -d 120 -c 256 -t 16 --pipeline 256 http://xxx:8080/plaintext
Running 2m test @ http://xxx:8080/plaintext
16 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 21.93ms 16.33ms 305.73ms 92.34%
Req/Sec 194.56k 33.75k 309.33k 77.04%
369617503 requests in 2.00m, 49.57GB read
Requests/sec: 3080169.65
Transfer/sec: 423.00MB
Motivation:
Due some race-condition while handling canellation of TimerTasks it was possibleto corrupt the linked-list structure that is represent by HashedWheelBucket and so produce a NPE.
Modification:
Fix the problem by adding another MpscLinkedQueue which holds the cancellation tasks and process them on each tick. This allows to use no synchronization / locking at all while introduce a latency of max 1 tick before the TimerTask can be GC'ed.
Result:
No more NPE
- Rewrite with linear probing, no state array, compaction at cleanup
- Optimize keys() and values() to not use reflection
- Optimize hashCode() and equals() for efficient iteration
- Fixed equals() to not return true for equals(null)
- Optimize iterator to not allocate new Entry at each next()
- Added toString()
- Added some new unit tests
Motivation:
Now Netty has a few problems with null values.
Modifications:
- Check HAProxyProxiedProtocol in HAProxyMessage constructor and throw NPE if it is null.
If HAProxyProxiedProtocol is null we will set AddressFamily as null. So we will get NPE inside checkAddress(String, AddressFamily) and it won't be easy to understand why addrFamily is null.
- Check File in DiskFileUpload.toString().
If File is null we will get NPE when calling toString() method.
- Check Result<String> in MqttDecoder.decodeConnectionPayload(...).
If !mqttConnectVariableHeader.isWillFlag() || !mqttConnectVariableHeader.hasUserName() || !mqttConnectVariableHeader.hasPassword() we will get NPE when we will try to create new instance of MqttConnectPayload.
- Check Unsafe before calling unsafe.getClass() in PlatformDependent0 static block.
- Removed unnecessary null check in WebSocket08FrameEncoder.encode(...).
Because msg.content() can not return null.
- Removed unnecessary null check in DefaultStompFrame(StompCommand) constructor.
Because we have this check in the super class.
- Removed unnecessary null checks in ConcurrentHashMapV8.removeTreeNode(TreeNode<K,V>).
- Removed unnecessary null check in OioDatagramChannel.doReadMessages(List<Object>).
Because tmpPacket.getSocketAddress() always returns new SocketAddress instance.
- Removed unnecessary null check in OioServerSocketChannel.doReadMessages(List<Object>).
Because socket.accept() always returns new Socket instance.
- Pass Unpooled.buffer(0) instead of null inside CloseWebSocketFrame(boolean, int) constructor.
If we will pass null we will get NPE in super class constructor.
- Added throw new IllegalStateException in GlobalEventExecutor.awaitInactivity(long, TimeUnit) if it will be called before GlobalEventExecutor.execute(Runnable).
Because now we will get NPE. IllegalStateException will be better in this case.
- Fixed null check in OpenSslServerContext.setTicketKeys(byte[]).
Now we throw new NPE if byte[] is not null.
Result:
Added new null checks when it is necessary, removed unnecessary null checks and fixed some NPE problems.
Motivation:
Fix some typos in Netty.
Modifications:
- Fix potentially dangerous use of non-short-circuit logic in Recycler.transfer(Stack<?>).
- Removed double 'the the' in javadoc of EmbeddedChannel.
- Write to log an exception message if we can not get SOMAXCONN in the NetUtil's static block.
Modifications:
- Added a static modifier for CompositeByteBuf.Component.
This class is an inner class, but does not use its embedded reference to the object which created it. This reference makes the instances of the class larger, and may keep the reference to the creator object alive longer than necessary.
- Removed unnecessary boxing/unboxing operations in HttpResponseDecoder, RtspResponseDecoder, PerMessageDeflateClientExtensionHandshaker and PerMessageDeflateServerExtensionHandshaker
A boxed primitive is created from a String, just to extract the unboxed primitive value.
- Removed unnecessary 3 times calculations in DiskAttribute.addContent(...).
- Removed unnecessary checks if file exists before call mkdirs() in NativeLibraryLoader and PlatformDependent.
Because the method mkdirs() has this check inside.
- Removed unnecessary `instanceof AsciiString` check in StompSubframeAggregator.contentLength(StompHeadersSubframe) and StompSubframeDecoder.getContentLength(StompHeaders, long).
Because StompHeaders.get(CharSequence) always returns java.lang.String.
Motivations:
In our new version of HWT we used some kind of lazy cancelation of timeouts by put them back in the queue and let them pick up on the next tick. This multiple problems:
- we may corrupt the MpscLinkedQueue if the task is used as tombstone
- this sometimes lead to an uncessary delay especially when someone did executed some "heavy" logic in the TimeTask
Modifications:
Use a Lock per HashedWheelBucket for save and fast removal.
Modifications:
Cancellation of tasks can be done fast and so stuff can be GC'ed and no more infinite-loop possible
Motivation:
Need to upgrade HTTP/2 implementation to latest draft.
Modifications:
Various changes to support draft 13.
Result:
Support for HTTP/2 draft 13.
Motivation:
When system is in short of entrophy, the initialization of
ThreadLocalRandom can take at most 3 seconds. The initialization occurs
when ThreadLocalRandom.current() is invoked first time, which might be
much later than the moment when the application has started. If we
start the initialization of ThreadLocalRandom as early as possible, we
can reduce the perceived time taken for the retrieval.
Modification:
Begin the initialization of ThreadLocalRandom in InternalLoggerFactory,
potentially one of the firstly initialized class in a Netty application.
Make DefaultChannelId retrieve the current process ID before retrieving
the current machine ID, because retrieval of a machine ID is more likely
to use ThreadLocalRandom.current().
Use a dummy channel ID for EmbeddedChannel, which prevents many unit
tests from creating a ThreadLocalRandom instance.
Result:
We gain extra 100ms at minimum for initialSeedUniquifier generation. If
an application has its own initialization that takes long enough time
and generates good amount of entrophy, it is very likely that we will
gain a lot more.
Motivation:
We use the nanoTime of the scheduledTasks to calculate the milli-seconds to wait for a select operation to select something. Once these elapsed we check if there was something selected or some task is ready for processing. Unfortunally we not take into account scheduled tasks here so the selection loop will continue if only scheduled tasks are ready for processing. This will delay the execution of these tasks.
Modification:
- Check if a scheduled task is ready after selecting
- also make a tiny change in NioEventLoop to not trigger a rebuild if nothing was selected because the timeout was reached a few times in a row.
Result:
Execute scheduled tasks on time.
Motivation:
When a user tries to use netty on android it currently fails with "Could not find class 'sun.misc.Cleaner'"
Modification:
Encapsulate sun.misc.Cleaner usage in extra class to workaround this isssue.
Result:
Netty can be used on android again
Motivation:
During some refactoring we changed PlatformDependend0 to use sun.nio.ch.DirectBuffer for release direct buffers. This broke support for android as the class does not exist there and so an exception is thrown.
Modification:
Use again the fieldoffset to get access to Cleaner for release direct buffers.
Result:
Netty can be used on android again
Motivation:
Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used.
Modification:
- Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance.
- Recycling of the same object multiple times without acquire it will fail.
- Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler
These changes are based on @belliottsmith 's work that was part of #2504.
Result:
Huge increase in performance.
4.0 branch without this commit:
Benchmark (size) Mode Samples Score Score error Units
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
4.0 branch with this commit:
Benchmark (size) Mode Samples Score Score error Units
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
Motivation:
The tail node reference writes (by producer threads) are very likely to
invalidate the cache line holding the headRef which is read by the
consumer threads in order to access the padded reference to the head
node. This is because the resulting layout for the object is:
- header
- Object AtomicReference.value -> Tail node
- Object MpscLinkedQueue.headRef -> PaddedRef -> Head node
This is 'passive' false sharing where one thread reads and the other
writes. The current implementation suffers from further passive false
sharing potential from any and all neighbours to the queue object as no
pre/post padding is provided for the class fields.
Modifications:
Fix the memory layout by adding pre-post padding for the head node and
putting the tail node reference in the same object.
Result:
Fixed false sharing
Motivation:
An IDE's auto-completion often confuses between java.util.Collections
and io.netty.util.collection.Collections, and it's annoying to me. :-)
Modifications:
Use a different class name.
Result:
When I type Collections, it's always 'the' Collections.
Motivation:
Various small fixes/improvements to the interface to the HTTP/2 classes,
as well as some minor performance improvements.
Modifications:
- Added fix for IntObjectHashMap to ensure that capacity is always odd.
Even capacity can cause probing to fail.
- Cleaned the access to GOAWAY information in Http2Connection interface.
Endpoints now manage their own state for GOAWAY. Also added a goingAway
event handler.
- Added Endpoint methods for checking MAX_CONCURRENT_STREAMS or if the
number of streams for the endpoint have been exhausted. See
Endpoint.nextStreamId()/acceptingNewStreams().
- Changed DefaultHttp2Connection to use IntObjectHashMap. This should be
a slight memory improvement.
- Fixed check for MAX_CONCURRENT_STREAMS to correctly use the number of
active streams for the endpoint (not total active). See
DefaultHttp2Connection.checkNewStreamAllowed.
- Exposing a few methods to subclasses of AbstractHttp2ConnectionHandler
(e.g. exception handling).
- Cleaning up GOAWAY and RST_STREAM handling in
AbstractHttp2ConnectionHandler.
Result:
HTTP/2 code should provide more information to subclasses and will have
a reduced memory footprint.
Motivation:
When Netty runs in a managed environment such as web application server,
Netty needs to provide an explicit way to remove the thread-local
variables it created to prevent class loader leaks.
FastThreadLocal uses different execution paths for storing a
thread-local variable depending on the type of the current thread.
It increases the complexity of thread-local removal.
Modifications:
- Moved FastThreadLocal and FastThreadLocalThread out of the internal
package so that a user can use it.
- FastThreadLocal now keeps track of all thread local variables it has
initialized, and calling FastThreadLocal.removeAll() will remove all
thread-local variables of the caller thread.
- Added FastThreadLocal.size() for diagnostics and tests
- Introduce InternalThreadLocalMap which is a mixture of hard-wired
thread local variable fields and extensible indexed variables
- FastThreadLocal now uses InternalThreadLocalMap to implement a
thread-local variable.
- Added ThreadDeathWatcher.unwatch() so that PooledByteBufAllocator
tells it to stop watching when its thread-local cache has been freed
by FastThreadLocal.removeAll().
- Added FastThreadLocalTest to ensure that removeAll() works
- Added microbenchmark for FastThreadLocal and JDK ThreadLocal
- Upgraded to JMH 0.9
Result:
- A user can remove all thread-local variables Netty created, as long as
he or she did not exit from the current thread. (Note that there's no
way to remove a thread-local variable from outside of the thread.)
- FastThreadLocal exposes more useful operations such as isSet() because
we always implement a thread local variable via InternalThreadLocalMap
instead of falling back to JDK ThreadLocal.
- FastThreadLocalBenchmark shows that this change improves the
performance of FastThreadLocal even more.
Motivation:
Maps with integer keys are used in several places (HTTP/2 code, for
example). To reduce the memory footprint of these structures, we need a
specialized map class that uses ints as keys.
Modifications:
Added IntObjectHashMap, which is uses open addressing and double hashing
for collision resolution.
Result:
A new int-based map class that can be shared across Netty.
Motivation:
We have quite a bit of code duplication between HTTP/1, HTTP/2, SPDY,
and STOMP codec, because they all have a notion of 'headers', which is a
multimap of string names and values.
Modifications:
- Add TextHeaders and its default implementation
- Add AsciiString to replace HttpHeaderEntity
- Borrowed some portion from Apache Harmony's java.lang.String.
- Reimplement HttpHeaders, SpdyHeaders, and StompHeaders using
TextHeaders
- Add AsciiHeadersEncoder to reuse the encoding a TextHeaders
- Used a dedicated encoder for HTTP headers for better performance
though
- Remove shortcut methods in SpdyHeaders
- Remove shortcut methods in SpdyHttpHeaders
- Replace SpdyHeaders.getStatus() with HttpResponseStatus.parseLine()
Result:
- Removed quite a bit of code duplication in the header implementations.
- Slightly better performance thanks to improved header validation and
hash code calculation
Motivation:
Provide a faster ThreadLocal implementation
Modification:
Add a "FastThreadLocal" which uses an EnumMap and a predefined fixed set of possible thread locals (all of the static instances created by netty) that is around 10-20% faster than standard ThreadLocal in my benchmarks (and can be seen having an effect in the direct PooledByteBufAllocator benchmark that uses the DEFAULT ByteBufAllocator which uses this FastThreadLocal, as opposed to normal instantiations that do not, and in the new RecyclableArrayList benchmark);
Result:
Improved performance
Motivation:
At the moment the HashedWheelTimer will only remove the cancelled Timeouts once the HashedWheelBucket is processed again. Until this the instance will not be able to be GC'ed as there are still strong referenced to it even if the user not reference it by himself/herself. This can cause to waste a lot of memory even if the Timeout was cancelled before.
Modification:
Add a new queue which holds CancelTasks that will be processed on each tick to remove cancelled Timeouts. Because all of this is done only by the WorkerThread there is no need for synchronization and only one extra object creation is needed when cancel() is executed. For addTimeout(...) no new overhead is introduced.
Result:
Less memory usage for cancelled Timeouts.
Motivation:
As part of GSOC 2013 we had @mbakkar working on a DNS codec but did not integrate it yet as it needs some cleanup. This commit is based on @mbakkar's work and provide the codec for DNS.
Modifications:
Add DNS codec
Result:
Reusable DNS codec will be included in netty.
This PR also includes a AsynchronousDnsResolver which allows to resolve DNS entries in a non blocking way by make use
of the dns codec and netty transport itself.
Motivation:
MpscLinkedQueue has various issues:
- It does not work without sun.misc.Unsafe.
- Some field names are confusing.
- Node.tail does not refer to the tail node really.
- The tail node is the starting point of iteration. I think the tail
node should be the head node and vice versa to reduce confusion.
- Some important methods are not implemented (e.g. iterator())
- Not serializable
- Potential false cache sharing problem due to lack of padding
- MpscLinkedQueue extends AtomicReference and thus exposes various
operations that mutates the internal state of the queue directly.
Modifications:
- Use AtomicReferenceFieldUpdater wherever possible so that we do not
use Unsafe directly. (e.g. use lazySet() instead of putOrderedObject)
- Extend AbstractQueue to implement most operations
- Implement serialization and iterator()
- Rename tail to head and head to tail to reduce confusion.
- Rename Node.tail to Node.next.
- Fix a leak where the references in the removed head are not cleared
properly.
- Add Node.clearMaybe() method so that the value of the new head node
is cleared if possible.
- Add some comments for my own educational purposes
- Add padding to the head node
- Add FullyPaddedReference and RightPaddedReference for future reuse
- Make MpscLinkedQueue package-local so that a user cannot access the
dangerous yet public operations exposed by the superclass.
- MpscLinkedQueue.Node becomes MpscLinkedQueueNode, a top level class
Result:
- It's more like a drop-in replacement of ConcurrentLinkedQueue for the
MPSC case.
- Works without sun.misc.Unsafe
- Code potentially easier to understand
- Fixed leak (related: #2372)
Motivation:
When running Netty on a container environment, the container will often
complain about the lingering threads such as the worker threads of
ThreadDeathWatcher and GlobalEventExecutor. We should provide an
operation that allows a use to wait until such threads are terminated.
Modifications:
- Add awaitInactivity()
- (misc) Fix typo in GlobalEventExecutorTest
- (misc) Port ThreadDeathWatch's CAS-based thread life cycle management
to GlobalEventExecutor
Result:
- Fixes#2084
- Less overhead on task submission of GlobalEventExecutor
Motivation:
PooledByteBufAllocator's thread local cache and
ReferenceCountUtil.releaseLater() are in need of a way to run an
arbitrary logic when a certain thread is terminated.
Modifications:
- Add ThreadDeathWatcher, which spawns a low-priority daemon thread
that watches a list of threads periodically (every second) and
invokes the specified tasks when the associated threads are not alive
anymore
- Start-stop logic based on CAS operation proposed by @tea-dragon
- Add debug-level log messages to see if ThreadDeathWatcher works
Result:
- Fixes#2519 because we don't use GlobalEventExecutor anymore
- Cleaner code
Motivation:
The current DefaultAttributeMap cause an infinite-loop when the user removes an attribute and create the same attribute again. This regression was introduced by c3bd7a8ff1.
Modification:
Correctly break out loop
Result:
No infinite-loop anymore.
Motivation:
When (listeners == null && lateListeners == null) and (stackDepth >= MAX_LISTENER_STACK_DEPTH), the listener is not notified at all. The discard client does not work.
Modification:
Make sure to submit the notification task.
Result:
The discard client works again and all listeners are notified.
Motivation:
During a large memory copy, safepoint polling is diabled, hindering
accurate profiling.
Modifications:
Only copy up to 1 MiB per Unsafe.copyMemory()
Result:
Potentially more reliable performance
Motivation:
Some users already use an SSLEngine implementation in finagle-native. It
wraps OpenSSL to get higher SSL performance. However, to take advantage
of it, finagle-native must be compiled manually, and it means we cannot
pull it in as a dependency and thus we cannot test our SslHandler
against the OpenSSL-based SSLEngine. For an instance, we had #2216.
Because the construction procedures of JDK SSLEngine and OpenSslEngine
are very different from each other, we also need to provide a universal
way to enable SSL in a Netty application.
Modifications:
- Pull netty-tcnative in as an optional dependency.
http://netty.io/wiki/forked-tomcat-native.html
- Backport NativeLibraryLoader from 4.0
- Move OpenSSL-based SSLEngine implementation into our code base.
- Copied from finagle-native; originally written by @jpinner et al.
- Overall cleanup by @trustin.
- Run all SslHandler tests with both default SSLEngine and OpenSslEngine
- Add a unified API for creating an SSL context
- SslContext allows you to create a new SSLEngine or a new SslHandler
with your PKCS#8 key and X.509 certificate chain.
- Add JdkSslContext and its subclasses
- Add OpenSslServerContext
- Add ApplicationProtocolSelector to ensure the future support for NPN
(NextProtoNego) and ALPN (Application Layer Protocol Negotiation) on
the client-side.
- Add SimpleTrustManagerFactory to help a user write a
TrustManagerFactory easily, which should be useful for those who need
to write an alternative verification mechanism. For example, we can
use it to implement an unsafe TrustManagerFactory that accepts
self-signed certificates for testing purposes.
- Add InsecureTrustManagerFactory and FingerprintTrustManager for quick
and dirty testing
- Add SelfSignedCertificate class which generates a self-signed X.509
certificate very easily.
- Update all our examples to use SslContext.newClient/ServerContext()
- SslHandler now logs the chosen cipher suite when handshake is
finished.
Result:
- Cleaner unified API for configuring an SSL client and an SSL server
regardless of its internal implementation.
- When native libraries are available, OpenSSL-based SSLEngine
implementation is selected automatically to take advantage of its
performance benefit.
- Examples take advantage of this modification and thus are cleaner.
Motivation:
The old DefaultAttributeMap impl did more synchronization then needed and also did not expose a efficient way to check if an attribute exists with a specific key.
Modifications:
* Rewrite DefaultAttributeMap to not use IdentityHashMap and synchronization on the map directly. The new impl uses a combination of AtomicReferenceArray and synchronization per chain (linked-list). Also access the first Attribute per bucket can be done without any synchronization at all and just uses atomic operations. This should fit for most use-cases pretty weel.
* Add hasAttr(...) implementation
Result:
It's now possible to check for the existence of a attribute without create one. Synchronization is per linked-list and the first entry can even be added via atomic operation.
Motivation:
At the moment there are two issues with HashedWheelTimer:
* the memory footprint of it is pretty heavy (250kb fon an empty instance)
* the way how added Timeouts are handled is inefficient in terms of how locks etc are used and so a lot of context-switching / condition can happen.
Modification:
Rewrite HashedWheelTimer to use an optimized bucket implementation to store the submitted Timeouts and a MPSC queue to handover the timeouts. So volatile writes are reduced to a minimum and also the memory foot-print of the buckets itself is reduced a lot as the bucket uses a double-linked-list. Beside this we use Atomic*FieldUpdater where-ever possible to improve the memory foot-print and performance.
Result:
Lower memory-footprint and better performance
Motivation:
Some JDK versions of Mac OS X generates a JNI dynamic library with '.jnilib' extension rather than with '.dynlib' extension. However, System.mapLibraryName() always returns 'lib<name>.dynlib'. As a result, NativeLibraryLoader fails to load the native library whose extension is .jnilib.
Modification:
Try to find both '.jnilib' and '.dynlib' resources on OS X.
Result:
Dynamic libraries are loaded correctly in Mac OS X, and thus we can continue the OpenSslEngine work.
Motivation:
So far, we used a very simple platform string such as linux64 and
linux32. However, this is far from perfection because it does not
include anything about the CPU architecture.
Also, the current build tries to put multiple versions of .so files into
a single JAR. This doesn't work very well when we have to ship for many
different platforms. Think about shipping .so/.dynlib files for both
Linux and Mac OS X.
Modification:
- Use os-maven-plugin as an extension to determine the current OS and
CPU architecture reliable at build time
- Use Maven classifier instead of trying to put all shared libraries
into a single JAR
- NativeLibraryLoader does not guess the OS and bit mode anymore and it
always looks for the same location regardless of platform, because the
Maven classifier does the job instead.
Result:
Better scalable native library deployment and retrieval
Motivation:
It is less confusing not to spread Thread.interrupt() calls.
Modification:
- Comments
- Move generatorThread.interrupt() to where currentThread.interrupt() is
triggered
Result:
Code that is easier to read
Motivation:
4 and 5 were diverged long time ago and we recently reverted some of the
early commits in master. We must make sure 4.1 and master are not very
different now.
Modification:
Fix found differences
Result:
4.1 and master got closer.
Motivation:
4 and 5 were diverged long time ago and we recently reverted some of the
early commits in master. We must make sure 4.1 and master are not very
different now.
Modification:
Fix found differences
Result:
4.1 and master got closer.
Motivation:
ThreadLocalRandomTest reveals that ThreadLocalRandom's initial seed generation loop becomes tight if the thread is interrupted.
We currently interrupt ourselves inside the wait loop, which will raise an InterruptedException again in the next iteration, resulting in infinite (up to 3 seconds) exception construction and thread interruptions.
Modification:
- When the initial seed generator thread is interrupted, break out of the wait loop immediately.
- Log properly when the initial seed generation failed due to interruption.
- When failed to generate the initial seed, interrupt the generator thread just in case the SecureRandom implementation handles it properly.
- Make the initial seed generator thread daemon and handle potential exceptions raised due to the interruption.
Result:
No more tight loop on interruption. More robust generator thread termination. Fixes#2412
Motivation:
When using System.getProperty(...) and various methods to get a ClassLoader it will fail when a SecurityManager is in place.
Modifications:
Use a priveled block if needed. This work is based in the PR #2353 done by @anilsaldhana .
Result:
Code works also when SecurityManager is present
Motivation:
Minor spelling/typo correction for EventExecutor's newSucceededFuture and
newFailedFuture javadocs.
Modifications:
Just correcting the spelling/typo.
Result:
Improve the javadocs.
Motivation:
Previously, we used SecureRandom.nextLong() to generate the initialSeedUniquifier. This required more entrophy than necessary because it has to 1) generate the seed of SecureRandom first and then 2) generate a random long integer. Instead, we can use generateSeed() to skip the step (2)
Modifications:
Use generateSeed() instead of nextLong()
Result:
ThreadLocalRandom requires less amount of entrphy to start up