Motivation:
We recently added the ResourceLeakDetectorFactory but missed to updated HashedWheelTimer to use it.
Modifications:
- Add new abstract method to ResourceLeakDetectorFactory that allows to provide also samplingInterval and maxActive args.
- Deprecate most constructors in ResourceLeakDetector and add doc explaining that people should use ResourceLeakDetectorFactory
Result:
Custom ResourceLeakDetectorFactory will also be used in HashedWheelTimer if configured.
Motivation:
Sometimes a shared HashedWheelTimer can not easily be stopped in a good place. If the worker thread is daemon this is not a big deal and we should allow to not log a leak.
Modifications:
Add another constructor which allows to disable resource leak detection if worker thread is used.
Result:
Not log resource leak when HashedWheelTimer is not stopped and the worker thread is a deamon thread.
Motivation:
Some Netty use cases may want to configure the max allowed stack depth for promise listener notification.
Modifications:
- Add a system property so that this value can be configured.
Result:
DefaultPromise's max stack depth is configurable.
Motivation:
PR #5355 modified interfaces to reduce GC related to the HPACK code. However this came with an anticipated performance regression related to HpackUtil.equals due to AsciiString's increase cost of charAt(..). We should mitigate this performance regression.
Modifications:
- Introduce an equals method in PlatformDependent which doesn't leak timing information and use this in HpcakUtil.equals
Result:
Fixes https://github.com/netty/netty/issues/5436
Motivation:
It is good to have used dependencies and plugins up-to-date to fix any undiscovered bug fixed by the authors.
Modification:
Scanned dependencies and plugins and carefully updated one by one.
Result:
Dependencies and plugins are up-to-date.
Motiviation:
Sometimes it is useful to allow to specify a custom strategy to handle rejected tasks. For example if someone tries to add tasks from outside the eventloop it may make sense to try to backoff and retries and so give the executor time to recover.
Modification:
Add RejectedEventExecutor interface and implementations and allow to inject it.
Result:
More flexible handling of executor overload.
Motivation:
To restrict the memory usage of a system it is sometimes needed to adjust the number of max pending tasks in the tasks queue.
Modifications:
- Add new constructors to modify the number of allowed pending tasks.
- Add system properties to configure the default values.
Result:
More flexible configuration.
Motivation:
We should merge ThrowableUtils into ThrowableUtil as this name is more consistent with the naming of utility classes in netty.
Modifications:
Merge classes.
Result:
More consistent naming
Motivation:
We use pre-instantiated exceptions in various places for performance reasons. These exceptions don't include a stacktrace which makes it hard to know where the exception was thrown. This is especially true as we use the same exception type (for example ChannelClosedException) in different places. Setting some StackTraceElements will provide more context as to where these exceptions original and make debugging easier.
Modifications:
Set a generated StackTraceElement on these pre-instantiated exceptions which at least contains the origin class and method name. The filename and linenumber are specified as unkown (as stated in the javadocs of StackTraceElement).
Result:
Easier to find the origin of a pre-instantiated exception.
Allow users of Netty to plug in their own leak detector for the purpose
of instrumentation.
Motivation:
We are rolling out a large Netty deployment and want to be able to
track the amount of leaks we're seeing in production via custom
instrumentation. In order to achieve this today, I had to plug in a
custom `ByteBufAllocator` into the bootstrap and have it initialize a
custom `ResourceLeakDetector`. Due to these classes mostly being marked
`final` or having private or static methods, a lot of the code had to
be copy-pasted and it's quite ugly.
Modifications:
* I've added a static loader method for the `ResourceLeakDetector` in
`AbstractByteBuf` that tries to instantiate the class passed in via the
`-Dio.netty.customResourceLeakDetector`, otherwise falling back to the
default one.
* I've modified `ResourceLeakDetector` to be non-final and to have the
reporting broken out in to methods that can be overridden.
Result:
You can instrument leaks in your application by just adding something
like the following:
```java
public class InstrumentedResourceLeakDetector<T> extends
ResourceLeakDetector<T> {
@Monitor("InstanceLeakCounter")
private final AtomicInteger instancesLeakCounter;
@Monitor("LeakCounter")
private final AtomicInteger leakCounter;
public InstrumentedResourceLeakDetector(Class<T> resource) {
super(resource);
this.instancesLeakCounter = new AtomicInteger();
this.leakCounter = new AtomicInteger();
}
@Override
protected void reportTracedLeak(String records) {
super.reportTracedLeak(records);
leakCounter.incrementAndGet();
}
@Override
protected void reportUntracedLeak() {
super.reportUntracedLeak();
leakCounter.incrementAndGet();
}
@Override
protected void reportInstancesLeak() {
super.reportInstancesLeak();
instancesLeakCounter.incrementAndGet();
}
}
```
Motivation:
DefaultThreadFactory allows to override the newThread(...) method and so should have access to all fields that are set via the constructor.
Modifications:
Change threadGroup from private to protected visibility.
Result:
Easier to extend DefaultThreadFactory.
Motivation:
In case of exception in invokeExceptionCaught() only original exception passed to invokeExceptionCaught() will be logged on any log level.
+ AbstractChannelHandlerContext and CombinedChannelDuplexHandler log different exceptions.
Modifications:
Fix inconsistent logging code and add ability to see both stacktraces on DEBUG level.
Result:
Both handlers log now both original exception and thrown from invokeExceptionCaught. To see full stacktrace of exception thrown from invokeExceptionCaught DEBUG log level must be enabled.
Motivation:
JCTools supports both non-unsafe, unsafe versions of queues and JDK6 which allows us to shade the library in netty-common allowing it to stay "zero dependency".
Modifications:
- Remove copy paste JCTools code and shade the library (dependencies that are shaded should be removed from the <dependencies> section of the generated POM).
- Remove usage of OneTimeTask and remove it all together.
Result:
Less code to maintain and easier to update JCTools and less GC pressure as the queue implementation nt creates so much garbage
Motivation:
The javaDocs for Future.removeListener do not clarify that only the first occurrence of the listener is guaranteed to be removed.
Modifications:
- Clarify the javaDocs for Future.removeListener[s] so it is known that the only the first occurrence of the listener will be removed.
Result:
Fixes https://github.com/netty/netty/issues/5351
Motivation:
Sometimes it may be benefitially for an user to specify a custom algorithm when choose the next EventExecutor/EventLoop.
Modifications:
Allow to specify a custom EventExecutorChooseFactory that allows to customize algorithm.
Result:
More flexible api.
Motivation:
We tried to always use SecureRandom to generate the initialSeed for our ThreadLocalRandom, this can sometimes give warnings under normal usage. We should better not use SecureRandom as default (just as the implementation in jsr166y does) and only try if the user specified -Djava.util.secureRandomSeed=true .
Modifications:
Only try to use SecureRandom when -Djava.util.secureRandomSeed=true is used.
Result:
Less likely to see entropy warnings.
Motivation:
If the user uses unsafe direct buffers with no cleaner we can use Unsafe.reallocateMemory(...) as optimization when we need to expand the buffer.
Modifications:
Use Unsafe.relocateMemory(...) in UnpooledUnsafeNoCleanerDirectByteBuf.
Result:
Less expensive expanding of buffers.
Motivation:
Using the Cleaner to release the native memory has a few drawbacks:
- Cleaner.clean() uses static synchronized internally which means it can be a performance bottleneck
- It put more load on the GC
Modifications:
Add new buffer implementations that can be enabled with a system flag as optimizations. In this case no Cleaner is used at all and the user must ensure everything is always released.
Result:
Less performance impact by direct buffers when need to be allocated and released.
Modifications:
DefaultPromise provides a ThreadLocal queue to protect against StackOverflowError because of executors which may immediately execute runnables instead of queue them (i.e. ImmediateEventExecutor). However this may be better addressed by fixing these executors to protect against StackOverflowError instead of just fixing for a single use case. Also the most commonly used executors already provide the desired behavior and don't need the additional overhead of a ThreadLocal queue in DefaultPromise.
Modifications:
- Remove ThreadLocal queue from DefaultPromise
- Change ImmediateEventExecutor so it maintains a queue of runnables if reentrant condition occurs
Result:
DefaultPromise StackOverflowError code is simpler, and ImmediateEventExecutor protects against StackOverflowError.
Motivation:
Under very unlikely (however possible) circumstances, Recycler may leak
references. This happens _only_ when the object was already recycled
at least once (which means it's got written to the stack) and then
taken out again, and never returned.
The "never returned" part may be the fault of the user (forgotten
`finally` clause) or the situation when Recycler drops the possibly
youngest item itself.
Modifications:
Nullify the item taken from the stack.
Result:
Reference is cleaned up. If the object is lost, it will be a subject for
GC. The rest of Stack / Recycler functionality remains unaffected.
Motivation:
Sometimes people may want to trade GC with memory overhead. For this it can be useful to allow to change the capacity of the array that is hold in the Link that is used by the Recycler internally.
Modifications:
Introduce a new system property , io.netty.recycler.linkCapacity which allows to change the capcity.
Result:
More flexible configuration of netty.
Motivation:
Currenlty, netty-transport-native-epoll-*-linux-x86_64.jar is not packed as OSGi bundle
and thus not working in OSGi environment.
Modifications:
In netty-transport-native-epoll's pom.xml added configuration
to attach manifest to the jar with a native library.
In netty-common's pom.xml added configuration instruction (DynamicImport-Package)
to maven bnd plugin to make sure the native code is loaded from
netty-transport-native-epoll bundle.
Result:
The netty-transport-native-epoll-*-linux-x86_64.jar is a bundle (MANIFEST.MF attached)
and the inluced native library can be successfuly loaded in OSGi environment.
Fixing #5119
Motivation:
Unsafe offers a method to set memory to a specific value. This can be used to implement an optimized version of setZero(...) and writeZero(...)
Modifications:
Add implementation for all Unsafe*ByteBuf implementations.
Result:
Faster setZero(...) and writeZero(...)
Motivation:
SingleThreadEventExecutor.pendingTasks() will call taskQueue.size() to get the number of pending tasks in the queue. This is not safe when using MpscLinkedQueue as size() is only allowed to be called by a single consumer.
Modifications:
Ensure size() is only called from the EventLoop.
Result:
No more livelock possible when call pendingTasks, no matter from which thread it is done.
Motivation:
f2ed3e6ce8 removed the previous mechanism for StackOverflowError because it didn't work in all cases (i.e. ImmediateExecutor). However if a chain of listeners which complete other promises is formed there is still a possibility of a StackOverflowError.
Modifications:
- Use a ThreadLocal to save any DefaultPromises which could not be notified due to the stack being too large. After the first DefaultPromise on the stack completes notification this ThreadLocal should be used to notify any DefaultPromises which have not yet been notified.
Result:
DefaultPromise has StackOverflowError protection that works with all EventExecutor types.
Motivation:
If the executor changes while listeners are added and notification of listeners is being done then listeners can be notified out of order and concurrently. We should ensure that only one executor is used at any given time to notify listeners and ensure the listeners are notified in FIFO order.
Modifications:
- Move the notifyingListeners member variable from DefaultPromise into the synchronized block to prevent concurrent notification of listeners and preserve FIFO notification order
Result:
If the executor is changed for a DefaultPromise the listener notification order should be FIFO.
Motivation:
Recycler.recycle(...) should not be used anymore and be replaced by Handle.recycle().
Modifications:
Mark it as deprecated and update usage.
Result:
Correctly document deprecated api.
Motivation:
DomainMappingBuilder should have been named as DomainNameMappingBuilder
because it builds a DomainNameMapping.
Modifications:
- Add DomainNameMappingBuilder that does the same job with
DomainMappingBuilder
- Deprecate DomainMappingBuilder and delegate its logic to
DomainNameMappingBuilder
- Remove the references to the deprecated methods and classes related
with domain name mapping
- Miscellaneous:
- Fix Javadoc of DomainNameMapping.asMap()
- Pre-create the unmodifiable map in DomainNameMapping
Result:
- Consistent naming
- Less use of deprecated API
Motivation:
DomainNameMapping.entries() returns Set<Map.Entry<String, V>>, which
doesn't sound very natural.
Modifications:
Replace entries() with asMap() which returns a Map<String, V> instead.
Result:
- Better looking API
- User can do a lookup because it's a Map
Motivation:
As reported in #4211, when using Netty in Tomcat (or other container based deployment), ForkJoinPool leaks an instance of `Submitter` so that the class loader of `Submitter` won't be GCed. However, since `Submitter` is just a wrapper of `int`, we can replace it with `int[1]`.
Modifications:
Replace `Submitter` with `int[1]`.
Result:
No class loader leak in ForkJoinPool when using in a container.
Motivation:
See #4200.
Modifications:
Add DomainNameMapping.entries to allow retrieving the domain match lists.
Result:
People can use DomainNameMapping.entries to retrive the match list in DomainNameMapping.
Motivation:
EventExecutor.children uses generics in such a way that an entire colleciton must be cast to a specific type of object. This interface is not very flexible and is impossible to implement if the EventExecutor type must be wrapped. The current usage of this method also does not have any clear need within Netty. The Iterator interface allows for EventExecutor to be wrapped and forces the caller to make assumptions about types instead of building the assumptions into the interface.
Motivation:
- Remove EventExecutor.children and undeprecate the iterator() interface
Result:
EventExecutor interface has one less method and is easier to wrap.
Motivation:
KObjectHashMap.remove(int index) attempts to move back items which may have been displaced because their spot in the hash based array was taken by another item. If this happens the nextIndex reference in PrimitiveIterator will not be updated. At this time the PrimitiveEntry will reference the incorrect index and may result in a NPE.
Modifications:
- If KObjectHashMap.remove(int index) moves entries back then PrimitiveIterator should adjust its nextIndex
Result:
PrimitiveIterator.remove() updates its internal state to reference the new nextIndex and will not NPE.
Fixes https://github.com/netty/netty/issues/5198
Motivation:
KObjectHashMap.probeNext(..) usually involves 2 conditional statements and 2 aritmatic operations. This can be improved to have 0 conditional statements.
Modifications:
- Use bit masking to avoid conditional statements
Result:
Improved performance for KObjecthashMap.probeNext(..)
Motivation:
The DefaultHttp2Conneciton.close method accounts for active streams being iterated and attempts to avoid reentrant modifications of the underlying stream map by using iterators to remove from the stream map. However there are a few issues:
- While iterating over the stream map we don't prevent iterations over the active stream collection
- Removing a single stream may actually remove > 1 streams due to closed non-leaf streams being preserved in the priority tree which may result in NPE
Preserving closed non-leaf streams in the priority tree is no longer necessary with our current allocation algorithms, and so this feature (and related complexity) can be removed.
Modifications:
- DefaultHttp2Connection.close should prevent others from iterating over the active streams and reentrant modification scenarios which may result from this
- DefaultHttp2Connection should not keep closed stream in the priority tree
- Remove all associated code in DefaultHttp2RemoteFlowController which accounts for this case including the ReducedState object
- This includes fixing writability changes which depended on ReducedState
- Update unit tests
Result:
Fixes https://github.com/netty/netty/issues/5198
Motivation:
The LateListener logic is prone to infinite loops and relies on being processed in the EventExecutor's thread for synchronization, but this EventExecutor may not be constant. An infinite loop can occur if the EventExecutor's execute method does not introduce a context switch in LateListener.run. The EventExecutor can be changed by classes which inherit from DefaultPromise. For example the DefaultChannelPromise will return w/e EventLoop the channel is registered to, but this EventLoop can change (re-registration).
Modifications:
- Remove the LateListener concept and instead use a single Object to maintain the listeners while still preserving notification order
- Make the result member variable an atomic variable so it can be outside the synchronized(this) blocks
- Cleanup/simplify existing state management code
Result:
Fixes https://github.com/netty/netty/issues/5185
Motivation:
Currently the default log level when running tests is debug. When
running the build on the CI server it might be nice to avoid this debug
level and allow for the level to be configured.
Modifications:
Added a logback-test.xml configuration that has been added to the
common module. This allows for the logLevel to be configured.
The default level will still be debug.
Result:
The log level can now be configured from the command line:
$ mvn test -DlogLevel=error
Motivation:
Some codecs should be considered unstable as these are relative new. For this purpose we should introduce an annotation which these codecs should us to be marked as unstable in terms of API.
Modifications:
- Add UnstableApi annotation and use it on codecs that are not stable
- Move http2.hpack to http2.internal.hpack as it is internal.
Result:
Better document unstable APIs.
Motivation:
It's better to make all InternalLoggerFactory implementations be singletons according to the discussions in #5047
Modifications:
Make all InternalLoggerFactory implementations be singletons and hide the construtors.
Result:
All InternalLoggerFactory implementations be singletons.
Motivation:
Fixes#5084. We (gRPC) encountered a bug that was triggered by
grpc/grpc-java@d927180. After that commit, event loop threads are
created per task by NioEventLoopGroup, and inherits the thread group of
the caller, which in our case is an application-provided request-scope
thread. Things go south when the application tries to manipulate (e.g.,
interrupt and join) all threads of the request-scope thread group, which
unexpectedly include the event loop threads.
Modifications:
DefaultThreadFactory will save the current thread group in constructor,
and apply it to all new threads.
Result:
Threads created by DefaultThreadFactory will be in the same thread group
as the thread where the factory is created.
Related: #3449
Motivation:
When a user shut down an EventExecutor/Loop prematurely, a Promise will
fail to execute its listeners. When it happens, DefaultPromise will log
a message at ERROR level, but there's no way to get notified about it
programmatically.
Modifications:
Do not catch and log the RejectedExecutionException unconditionally,
but only catch and log for non-late listener notifications, so that a
user gets notified on submission failure at least when the listener is
late.
Result:
Remedies #3449 to some extent, although we will need fundamental fix for
that, such as #3566
Motivation:
See #3095
Modifications:
Add Log4J2LoggerFactory and Log4J2Logger which is an InternalLogger implementation based on log4j2.
Result:
The user can use log4j2 directly without a special slf4j binding.
Motivation:
Under high throughput/low latency workloads, selector wakeups are
degrading performance when the incoming operations are triggered
from outside of the event loop. This is a common scenario for
"client" applications where the originating input is coming from
application threads rather from the socket attached inside the
event loops.
As a result, it can be desirable to defer the blocking select
so that incoming tasks (write/flush) do not need to wakeup
the selector.
Modifications:
This changeset adds the notion of a generic SelectStrategy which,
based on its contract, allows the implementation to optionally
defer the blocking select based on some custom criteria.
The default implementation resembles the original behaviour, that
is if tasks are in the queue `selectNow()` and move on, and if no
tasks need to be processed go into the blocking select and wait
for wakeup.
The strategy can be customized per `NioEventLoopGroup` in the
constructor.
Result:
High performance client applications are now given the chance to
customize for how long the actual selector blocking should be
deferred by employing a custom select strategy.
Motivation:
The current slow path of FastThreadLocal is much slower than JDK ThreadLocal. See #4418
Modifications:
- Add FastThreadLocalSlowPathBenchmark for the flow path of FastThreadLocal
- Add final to speed up the slow path of FastThreadLocal
Result:
The slow path of FastThreadLocal is improved.
Motivation:
It should be possible to disable the Recycler with -Dio.netty.recycler.maxCapacity=0, but because of a typo this is not the case.
Modifications:
Replace <= with < to make it posible to disable the Recycler.
Result:
Correct behaviour when using -Dio.netty.recycler.maxCapacity=0
Motivation:
Fix a typo in the log message of the static initializer of Recycler.
Modifications:
Fix typo.
Result:
Correctly log system property io.netty.recycler.maxCapacity.
Motivation:
We want to allow the use of an uber jar that contains shared dynamic libraries for all platforms (including fedora).
Modifications:
Modified OpenSsl to try and load the fedora library if the OS is Linux and the platform specified library fails before using the default lib.
Result:
True uber support.
Motivation:
PromiseAggregator's API allows for the aggregate promise to complete before the user is done adding promises. In order to support this use case the API structure would need to change in a breaking manner.
Modifications:
- Deprecate PromiseAggregator and subclasses
- Introduce PromiseCombiner which corrects these issues
Result:
PromiseCombiner corrects the deficiencies in PromiseAggregator.
Motivation:
A custom SecurityManager may prevent calling File.exists() and so throw a SecurityException in the static init block of NetUtil.
Modifications:
Correctly catch the exception and so allow to static init NetUtil.
Result:
Allow static init method of NetUtil to work even with custom SecurityManager.
Motivation:
See #3321
Modifications:
1. Add CharsetUtil.encoder/decoder() methods
2. Deprecate CharsetUtil.getEncoder/getDecoder() methods
Result:
Users can use new CharsetUtil.encoder/decoder() to specify error actions
Motivation:
See #3746.
Modifications:
Fork SpscLinkedQueue and SpscLinkedAtomicQueue from JCTools based on 7846450e28
Result:
Add SpscLinkedQueue and SpscLinkedAtomicQueue and apply it in LocalChannel.
Motivation:
If the Future that the PromiseNotifier is listening to is cancelled, it does not propagate the cancel to all the promises it is expected to notify.
Modifications:
- If the future is cancelled then all the promises should be cancelled
- Add a UnaryPromiseNotifier if a collection of promises is not necessary
Result:
PromiseNotifier propagates cancel events to all promises
Motivation:
See #4855
Modifications:
Unfortunately, unescapeCsv cannot be used here because the input could be a CSV line like `"a,b",c`. Hence this patch adds unescapeCsvFields to parse a CSV line and split it into multiple fields and unescaped them. The unit tests should define the behavior of unescapeCsvFields.
Then this patch just uses unescapeCsvFields to implement `CombinedHttpHeaders.getAll`.
Result:
`CombinedHttpHeaders.getAll` will return the unescaped values of a header.
Motivation:
ResourceLeakDetector enforces a limit as to how large the queue is allowed to grow for stack traces in order to keep memory from growing too large. However it is not always clear when records are dropped, or how many have been dropped. This can make interpreting leak reports more difficult if you assume all information is present when it may not be. Also we should increase the limit (currently 4) when running tests on the CI servers.
Modifications:
- Increase leak detector record limit on CI servers from 4 to 32.
- Track how many records have been discarded and disclose this in the leak report.
Result:
Leak reports clarify how many records were dropped, and how to increase the limit.
Motivation:
The implementation of obtaining the best possible mac address is very good. There are many sub-par implementations proposed on stackoverflow.
While not strictly a netty concern, it would be nice to offer this util also to netty users.
Modifications:
extract DefaultChannelId#defaultMachineId code obtaining the "best" mac into a new helper called MacAddress, keep the random bytes fallback in DefaultChannelID.
Result:
New helper available.
Motivation:
See #3411. A reusable ArrayList in InternalThreadLocalMap can avoid allocations in the following pattern:
```
List<...> list = new ArrayList<...>();
add something to list but never use InternalThreadLocalMap
return list.toArray(new ...[list.size()]);
```
Modifications:
Add a reusable ArrayList to InternalThreadLocalMap and update codes to use it.
Result:
Reuse a thread local ArrayList to avoid allocations.
Motivation:
PlatformDependent0 has an optimization which grabs the char[] from a String. Since this code was introduced http://openjdk.java.net/jeps/254 has been gaining momentum in JDK 9. This JEP changes the internal storage from char[] to byte[], and thus the existing char[] only based optimizations will not work.
Modifications:
- The ASCII encoding char[] String optimizations should also work for byte[].
Result:
ASCII encoding char[] String optimizations don't break if the underlying storage in String is byte[].
Motivation:
In AsciiString.trim, last should be `arrayOffset() + length() - 1`. See #4741.
Modifications:
Fix the last value.
Result:
AsciiString.trim works correctly.
Motivation:
PlatformDependent allows some exceptions to escape during static initialization. If an exception escapes it will be translated into a java.lang.ExceptionInInitializerError and render the application unable to run.
Modifications:
- Make sure to catch Throwable during static initialization.
Result:
PlatformDependent static initialization doesn't result in java.lang.ExceptionInInitializerError.
Motivation:
Sometimes a user want to do async mappings in the SniHandler as it is not possible to populate a Mapping up front.
Modifications:
Add AsyncMapping interface and make SniHandler work with it.
Result:
It is possible to do async mappings for SNI
Motivation:
HttpConversionUtil.toHttp2Headers currently has a throws Exception as part of the signature. This comes from the signature of ByteProcessor.process, but is not necessary because the ByteProcessor used does not throw.
Modifications:
- Remove throws Exception from the signature of HttpConversionUtil.toHttp2Headers.
Result:
HttpConversionUtil.toHttp2Headers interface does not propagate a throws Exception when it is used.
Motivation:
Caching is currently nested in DnsResolver.
It should also be possible to extend DnsResolver to ba able to pass a different cache on each resolution attemp.
Modifications:
* Introduce DnsCache, NoopDnsCache and DefaultDnsCache. The latter contains all the current caching logic that was extracted.
* Introduce protected versions of doResolve and doResolveAll that can be used as extension points to build resolvers that bypass the main cache and use a different one on each resolution.
Result:
Isolated caching logic. Better extensibility.
Motivation:
* newAtomicIntegerFieldUpdater and newAtomicLongFieldUpdater take a
class<?> so they're too lax
* newAtomicReferenceFieldUpdater takes a Class<U> so it's too strict
and can only be passe a rawtype parameter when dealing w/ generic
classes
Modifications:
Take a Class<? super T> parameter instead.
Result:
Better type safety and generics support.
Motivation:
Warnings in IDE, unclean code, negligible performance impact.
Modification:
Deletion of unused imports
Result:
No more warnings in IDE, cleaner code, negligible performance improvement.
Motivation:
Javadoc reports errors about invalid docs.
Modifications:
Fix some errors reported by javadoc.
Result:
A lot of javadoc errors are fixed by this patch.
Motivation:
There are some wrong links and tags in javadoc.
Modifications:
Fix the wrong links and tags in javadoc.
Result:
These links will work correctly in javadoc.
Motivation:
DomainNameMapping.add() makes DomainNameMapping look like it's safe to call add() anytime, and this is never true. It's probably better deprecate add() and introduce DomainNameMappingBuilder.
Modifications:
Made an immutable implementation of DomainNameMapping;
Added Builder for immutable DomainNameMapping;
Replaced regex pattern with String::startsWith check;
Replaced HashMap with two arrays in ImmutableDomainNameMapping;
Deprecated mutable API;
Estimation for StringBuilder initial size in ImmutableDomainNameMapping#toString()
Added StringUtil#commonSuffixOfLength
Replaced unnecessary substrings creation in DomainNameMapping#matches with regionMatches
Result:
Clients will be able to create immutable instances of DomainNameMapping with builder API.
Motivation:
UTF-16 can not represent the full range of Unicode characters, and thus has the concept of Surrogate Pair (http://unicode.org/glossary/#surrogate_pair) where 2 16-bit code units can be used to represent the missing characters. ByteBufUtil.writeUtf8 is currently does not support this and is thus incomplete.
Modifications:
- Add support for surrogate pairs in ByteBufUtil.writeUtf8
Result:
ByteBufUtil.writeUtf8 now supports surrogate pairs and is correctly converting to UTF-8.
Motivation:
PriorityStreamByteDistributor uses a homegrown algorithm which distributes bytes to nodes in the priority tree. PriorityStreamByteDistributor has no concept of goodput which may result in poor utilization of network resources. PriorityStreamByteDistributor also has performance issues related to the tree traversal approach and number of nodes that must be visited. There also exists some more proven algorithms from the resource scheduling domain which PriorityStreamByteDistributor does not employ.
Modifications:
- Introduce a new ByteDistributor which uses elements from weighted fair queue schedulers
Result:
StreamByteDistributor which is sensitive to priority and uses a more familiar distribution concept.
Fixes https://github.com/netty/netty/issues/4462
Motivation:
On contrary to `DefaultNameResolver`, `DnsNameResolver` doesn't currently honor hosts file.
Modifications:
* Introduce `HostsFileParser` that parses `/etc/hosts` or `C:\Windows\system32\drivers\etc\hosts` depending on the platform
* Introduce `HostsFileEntriesResolver` that uses the former to resolve host names
* Make `DnsNameResolver` check his `HostsFileEntriesResolver` prior to trying to resolve names against the DNS server
* Introduce `DnsNameResolverBuilder` so we now have a builder for `DnsNameResolver`s
* Additionally introduce a `CompositeNameResolver` that takes several `NameResolver`s and tries to resolve names by delegating sequentially
* Change `DnsNameResolver.asAddressResolver` to return a composite and honor hosts file
Result:
Hosts file support when using `DnsNameResolver`.
Consistent behavior with JDK implementation.
Motivation:
The KObjectHashMapTest is in a directory called "io.netty.util.collection" rather than "io/netty/util/collection". This causes the generated tests to be created in the wrong directory as well.
Modifications:
Moved the file.
Result:
Fixes#4546
Motivation:
Related to issue #4564.
AsciiString.contentEqualsIgnoreCase fails when comparing two AsciiStrings of the same length
Modifications:
Compare the values of the first AsciiString to the second AsciiString
Result:
AsciiString.contentEqualsIgnoreCase works as expected
Motivation:
DefaultPromiseTest has dead code which was left over from a code restructure. Shared code between 2 tests was moved into a common method, but some code which was not cleaned up in each of these methods after the code was moved.
Modifications:
- Delete dead code in DefaultPromiseTest
Result:
Less dead code
Motivation:
AbstractFuture currently wraps CancellationException in a ExecutionException. However the interface of Future says that this exception should be directly thrown.
Modifications:
- Throw CancellationException from AbstractFuture.get
Result:
Interface contract for CancellationException is honored in AbstractFuture.
Motivation:
There is a notification ordering issue in DefaultPromise when the lateListener collection is in use. The ordering issue can be observed in situations where a late listener is added to a Future returned from a write operation. It is possible that this future will run after a read operation scheduled on the I/O thread, even if the late listener is added on the I/O thread. This can lead to unexpected ordering where a listener for a write operation which must complete in order for the read operation to happen is notified after the read operation is done.
Modifications:
- If the lateListener collection becomes empty, it should be treated as though it was null when checking if lateListeners can be notified immediatley (instead of executing a task on the executor)
Result:
Ordering is more natural and will not be perceived as being out of order relative to other tasks on the same executor.
Motivation:
HttpHeaders already has specific methods for such popular and simple headers like "Host", but if I need to convert POST raw body to string I need to parse complex ContentType header in my code.
Modifications:
Add getCharset and getCharsetAsString methods to parse charset from Content-Length header.
Result:
Easy to use utility method.
Motivation:
The AsciiString.hashCode() method can be optimized. This method is frequently used while to build the DefaultHeaders data structure.
Modification:
- Add a PlatformDependent hashCode algorithm which utilizes UNSAFE if available
Result:
AsciiString hashCode is faster.
Motivation:
If netty used as part of application, should be a way to prefix service thread name to easy distinguish such threads (for example, used in IntelliJ Platform)
Modifications:
Introduce system property io.netty.serviceThreadPrefix
Result:
ThreadDeathWatcher thread has a readable name "Netty threadDeathWatcher-2-1" if io.netty.serviceThreadPrefix set to "Netty"
Motivation:
The HTTP/2 RFC (https://tools.ietf.org/html/rfc7540#section-8.1.2) indicates that header names consist of ASCII characters. We currently use ByteString to represent HTTP/2 header names. The HTTP/2 RFC (https://tools.ietf.org/html/rfc7540#section-10.3) also eludes to header values inheriting the same validity characteristics as HTTP/1.x. Using AsciiString for the value type of HTTP/2 headers would allow for re-use of predefined HTTP/1.x values, and make comparisons more intuitive. The Headers<T> interface could also be expanded to allow for easier use of header types which do not have the same Key and Value type.
Motivation:
- Change Headers<T> to Headers<K, V>
- Change Http2Headers<ByteString> to Http2Headers<CharSequence, CharSequence>
- Remove ByteString. Having AsciiString extend ByteString complicates equality comparisons when the hash code algorithm is no longer shared.
Result:
Http2Header types are more representative of the HTTP/2 RFC, and relationship between HTTP/2 header name/values more directly relates to HTTP/1.x header names/values.
Motivation:
DefaultPromise.toString() returns 'DefaultPromise(incomplete)' when it's
actually complete with non-null result.
Modifications:
Handle the case where the promise is done and its result is non-null in
toString()
Result:
The String returned by DefaultPromise.toString() is not confusing
anymore.
Motivation:
Modulo operations are slow, we can use bitwise operation to detect if resource leak detection must be done while sampling.
Modifications:
- Ensure the interval is a power of two
- Use bitwise operation for sampling
- Add benchmark.
Result:
Faster sampling.
Motivation:
When the ImmediateEventExecutor is in use it is possible to get a StackOverFlowException if when a promise completes a new listener is added to that promise.
Modifications:
- Protect against the case where LateListeners.run() smashes the stack.
Result:
Fixes https://github.com/netty/netty/issues/4395
Motivation:
sun.misc.Unsafe allows us to handle heap ByteBuf in a more efficient matter. We should use special ByteBuf implementation when sun.misc.Unsafe can be used to increase performance.
Modifications:
- Add PooledUnsafeHeapByteBuf and UnpooledUnsafeHeapByteBuf that are used when sun.misc.Unsafe is ready to use.
- Add UnsafeHeapSwappedByteBuf
Result:
Better performance when using heap buffers and sun.misc.Unsafe is ready to use.
Motivation:
We had a bug in our implemention which double "reversed" bytes on systems which not support unaligned access.
Modifications:
- Correctly only reverse bytes if needed.
- Share code between unsafe implementations.
Result:
No more data-corruption on sytems without unaligned access.
Motivation:
We started the thread before store it in a field which could lead to an assert error when the thread is executed before we actually store it.
Modifications:
Store thread before start it.
Result:
No more assert error possible.
Motivation:
At the moment we only forward decoded messages that were added the out List once the full decode loop was completed. This has the affect that resources may not be released as fast as possible and as an application may incounter higher latency if the user triggeres a writeAndFlush(...) as a result of the decoded messages.
Modifications:
- forward decoded messages after each decode call
Result:
Forwarding decoded messages through the pipeline in a more eager fashion.
Motivation:
When dealing with case insensitive headers it can be useful to have a case insensitive contains method for CharSequence.
Modifications:
- Add containsCaseInsensative to AsciiString
Result:
More expressive utility method for case insensitive CharSequence.
Motivation:
Http2CodecUtils has some static variables which are defined as Strings instead of CharSequence. One of these defines is used as a header name and should be AsciiString.
Modifications:
- Change the String defines in Http2CodecUtils to CharSequence
Result:
Types are more consistently using CharSequence and adding the upgrade header will require less work.
Motivation:
Leak detector, when it detects a leak, will print the last 5 stack
traces that touched the ByteBuf. In some cases that might not be enough
to identify the root cause of the leak.
Also, sometimes users might not be interested in tracing all the
operations on the buffer, but just the ones that are affecting the
reference count.
Modifications:
Added command line properties to override default values:
* Allow to configure max number of stack traces to collect
* Allow to only record retain/release operation on buffers
Result:
Users can increase the number of stack traces to debug buffer leaks
with lot of retain/release operations.
Motivation:
for debugging and metrics reasons its sometimes useful to be able to get details of the the Thread that powers a SingleThreadEventExecutor.
Modifications:
- Expose ThreadProperties
- Add unit test.
Result:
It's now possible to get details of the Thread that powers a SingleThreadEventExecutor.
Motivation:
Sometimes it is useful to disable recycling completely if memory constraints are very tight.
Modifications:
Allow to use -Dio.netty.recycler.maxCapacity=0 to disable recycling completely.
Result:
It's possible to disable recycling now.
Motivation:
The javadocs are incorrect and so give false impressions of use-pattern.
Modifications:
- Fix javadocs of which operations are allowed from multiple threads concurrently.
- Let isEmpty() work concurrently.
Result:
Correctly document usage-patterns.
Motivation:
The StringUtil class creates a Formatter object, but does not close it. There are also a 2 utility methods which would be generally useful.
Modifications:
- Close the Formatter
- Add length and isNullOrEmpty
Result:
No more resource leaks. Additional utility methods.
Motivation:
A degradation in performance has been observed from the 4.0 branch as documented in https://github.com/netty/netty/issues/3962.
Modifications:
- Simplify Headers class hierarchy.
- Restore the DefaultHeaders to be based upon DefaultHttpHeaders from 4.0.
- Make various other modifications that are causing hot spots.
Result:
Performance is now on par with 4.0.
Motivation:
We noticed that the headers implementation in Netty for HTTP/2 uses quite a lot of memory
and that also at least the performance of randomly accessing a header is quite poor. The main
concern however was memory usage, as profiling has shown that a DefaultHttp2Headers
not only use a lot of memory it also wastes a lot due to the underlying hashmaps having
to be resized potentially several times as new headers are being inserted.
This is tracked as issue #3600.
Modifications:
We redesigned the DefaultHeaders to simply take a Map object in its constructor and
reimplemented the class using only the Map primitives. That way the implementation
is very concise and hopefully easy to understand and it allows each concrete headers
implementation to provide its own map or to even use a different headers implementation
for processing requests and writing responses i.e. incoming headers need to provide
fast random access while outgoing headers need fast insertion and fast iteration. The
new implementation can support this with hardly any code changes. It also comes
with the advantage that if the Netty project decides to add a third party collections library
as a dependency, one can simply plug in one of those very fast and memory efficient map
implementations and get faster and smaller headers for free.
For now, we are using the JDK's TreeMap for HTTP and HTTP/2 default headers.
Result:
- Significantly fewer lines of code in the implementation. While the total commit is still
roughly 400 lines less, the actual implementation is a lot less. I just added some more
tests and microbenchmarks.
- Overall performance is up. The current implementation should be significantly faster
for insertion and retrieval. However, it is slower when it comes to iteration. There is simply
no way a TreeMap can have the same iteration performance as a linked list (as used in the
current headers implementation). That's totally fine though, because when looking at the
benchmark results @ejona86 pointed out that the performance of the headers is completely
dominated by insertion, that is insertion is so significantly faster in the new implementation
that it does make up for several times the iteration speed. You can't iterate what you haven't
inserted. I am demonstrating that in this spreadsheet [1]. (Actually, iteration performance is
only down for HTTP, it's significantly improved for HTTP/2).
- Memory is down. The implementation with TreeMap uses on avg ~30% less memory. It also does not
produce any garbage while being resized. In load tests for GRPC we have seen a memory reduction
of up to 1.2KB per RPC. I summarized the memory improvements in this spreadsheet [1]. The data
was generated by [2] using JOL.
- While it was my original intend to only improve the memory usage for HTTP/2, it should be similarly
improved for HTTP, SPDY and STOMP as they all share a common implementation.
[1] https://docs.google.com/spreadsheets/d/1ck3RQklyzEcCLlyJoqDXPCWRGVUuS-ArZf0etSXLVDQ/edit#gid=0
[2] https://gist.github.com/buchgr/4458a8bdb51dd58c82b4
Motivation:
The HttpObjectDecoder is on the hot code path for the http codec. There are a few hot methods which can be modified to improve performance.
Modifications:
- Modify AppendableCharSequence to provide unsafe methods which don't need to re-check bounds for every call.
- Update HttpObjectDecoder methods to take advantage of new AppendableCharSequence methods.
Result:
Peformance boost for decoding http objects.
Motivation:
We should support XXXCollections methods for all primitive map types.
Modifications:
Removed PrimitiveCollections and added a template for XXXCollections.
Result:
Fixes#4001
Motivation:
It would be useful to support the Java `Map` interface in our primitive maps.
Modifications:
Renamed current methods to "pXXX", where p is short for "primitive". Made the template for all primitive maps extend the appropriate Map interface.
Result:
Fixes#3970
Motivation:
Prior we used a purge task that would remove previous canceled scheduled tasks from the internal queue. This could introduce some delay and so use a lot of memory even if the task itself is already canceled.
Modifications:
Schedule removal of task from queue via EventLoop if cancel operation is not done in the EventLoop Thread or just remove directly if the Thread that cancels the scheduled task is in the EventLoop.
Result:
Faster possibility to GC a canceled ScheduledFutureTask.
Motivation:
PoolThreadCache did only cache allocations if the allocation and deallocation Thread were the same. This is not optimal as often people write from differen thread then the actual EventLoop thread.
Modification:
- Add MpscArrayQueue which was forked from jctools and lightly modified.
- Use MpscArrayQueue for caches and always add buffer back to the cache that belongs to the allocation thread.
Result:
ThreadPoolCache is now also usable and so gives performance improvements when allocation and deallocation thread are different.
Performance when using same thread for allocation and deallocation is noticable worse then before.
Motivation:
The PooledByteBufAllocator is more or less a black-box atm. We need to expose some metrics to allow the user to get a better idea how to tune it.
Modifications:
- Expose different metrics via PooledByteBufAllocator
- Add *Metrics interfaces
Result:
It is now easy to gather metrics and detail about the PooledByteBufAllocator and so get a better understanding about resource-usage etc.
Motivation:
There is an error in the ByteString test logic which is resulting in test failures.
Modifications:
- Fix the loop iteration to use the loop iteration variable instead of a fixed index.
Result:
Tests are less buggy.
Motivation:
In the SslHandler we schedule a timeout at which we close the Channel if a timeout was detected during close_notify. Because this can race with notify the flushFuture we can see an IllegalStateException when the Channel is closed.
Modifications:
- Use a trySuccess() and tryFailure(...) to guard against race.
Result:
No more race.
Motivation:
All read operations should be safe to execute from multiple threads which was not the case and so could produce a livelock.
Modifications:
Modify methods so these are safe to be called from multiple threads.
Result:
No more livelock.
Motivation:
There are various known issues in netty-codec-dns:
- Message types are not interfaces, which can make it difficult for a
user to implement his/her own message implementation.
- Some class names and field names do not match with the terms in the
RFC.
- The support for decoding a DNS record was limited. A user had to
encode and decode by him/herself.
- The separation of DnsHeader from DnsMessage was unnecessary, although
it is fine conceptually.
- Buffer leak caused by DnsMessage was difficult to analyze, because the
leak detector tracks down the underlying ByteBuf rather than the
DnsMessage itself.
- DnsMessage assumes DNS-over-UDP.
- To send an EDNS message, a user have to create a new DNS record class
instance unnecessarily.
Modifications:
- Make all message types interfaces and add default implementations
- Rename some classes, properties, and constants to match the RFCs
- DnsResource -> DnsRecord
- DnsType -> DnsRecordType
- and many more
- Remove DnsClass and use an integer to support EDNS better
- Add DnsRecordEncoder/DnsRecordDecoder and their default
implementations
- DnsRecord does not require RDATA to be ByteBuf anymore.
- Add DnsRawRecord as the catch-all record type
- Merge DnsHeader into DnsMessage
- Make ResourceLeakDetector track AbstractDnsMessage
- Remove DnsMessage.sender/recipient properties
- Wrap DnsMessage with AddressedEnvelope
- Add DatagramDnsQuest and DatagramDnsResponse for ease of use
- Rename DnsQueryEncoder to DatagramDnsQueryEncoder
- Rename DnsResponseDecoder to DatagramDnsResponseDecoder
- Miscellaneous changes
- Add StringUtil.TAB
Result:
- Cleaner APi
- Can support DNS-over-TCP more easily in the future
- Reduced memory footprint in the default DnsQuery/Response
implementations
- Better leak tracking for DnsMessages
- Possibility to introduce new DnsRecord types in the future and provide
full record encoder/decoder implementation.
- No unnecessary instantiation for an EDNS pseudo resource record
Motivation:
Many projects need some kind a Channel/Connection pool implementation. While the protocols are different many things can be shared, so we should provide a generic API and implementation.
Modifications:
Add ChannelPool / ChannelPoolMap API and implementations.
Result:
Reusable / Generic pool implementation that users can use.
Motivation:
'length2 == 0' is not reachable because length1 and length2 are same at this point.
Motification:
Removed 'length2 == 0'.
Result:
Cleaner code.
Motivation:
Currently, valueOf() and newInstance() use almost same code to create new constant.
For maintainability, it's better to share duplicate code among them.
Motification:
Added new private functions.
- checkNotNullAndNotEmpty() is for checking whether the name of a constant is null and empty.
- newConstant0() is for creating a new constant.
Result:
- Compact source code
- Improvement of maintainability
Motivation:
When a SecurityManager is in place that preven reading the somaxconn file trying to bootstrap a channel later will result in a ClassNotFoundError.
Modifications:
- Reading the file in a privileged block.
Result:
No more ClassNotFoundError when a SecurityManager is in place.
Motivation:
The ByteString class currently assumes the underlying array will be a complete representation of data. This is limiting as it does not allow a subsection of another array to be used. The forces copy operations to take place to compensate for the lack of API support.
Modifications:
- add arrayOffset method to ByteString
- modify all ByteString and AsciiString methods that loop over or index into the underlying array to use this offset
- update all code that uses ByteString.array to ensure it accounts for the offset
- add unit tests to test the implementation respects the offset
Result:
ByteString and AsciiString can represent a sub region of a byte[].
Motivation:
static Package getPackage(Class<?> c) uses synchronized block internally.
Thanks to @jingene for the hint and initial report of the issue.
Modifications:
-Use simple lastIndexOf(...) and substring for a faster implementation
Result:
No more lock condition.
Motivation:
Each different *ChannelOption did extend ChannelOption in 4.0, which we changed in 4.1. This is a breaking change in terms of the API so we need to ensure we keep the old hierarchy.
Modifications:
- Let all *ChannelOption extend ChannelOption
- Add back constructor and mark it as @deprecated
Result:
No API breakage between 4.0 and 4.1
Motivation:
The current implementation does byte by byte comparison, which we have seen
can be a performance bottleneck when the AsciiString is used as the key in
a Map.
Modifications:
Use sun.misc.Unsafe (on supporting platforms) to compare up to eight bytes at a time
and get closer to the performance of String.equals(Object).
Result:
Significant improvement (2x - 6x) in performance over the current implementation.
Benchmark (size) Mode Samples Score Score error Units
i.n.m.i.PlatformDependentBenchmark.arraysBytesEqual 10 thrpt 10 118843477.518 2347259.347 ops/s
i.n.m.i.PlatformDependentBenchmark.arraysBytesEqual 50 thrpt 10 43910319.773 198376.996 ops/s
i.n.m.i.PlatformDependentBenchmark.arraysBytesEqual 100 thrpt 10 26339969.001 159599.252 ops/s
i.n.m.i.PlatformDependentBenchmark.arraysBytesEqual 1000 thrpt 10 2873119.030 20779.056 ops/s
i.n.m.i.PlatformDependentBenchmark.arraysBytesEqual 10000 thrpt 10 306370.450 1933.303 ops/s
i.n.m.i.PlatformDependentBenchmark.arraysBytesEqual 100000 thrpt 10 25750.415 108.391 ops/s
i.n.m.i.PlatformDependentBenchmark.unsafeBytesEqual 10 thrpt 10 248077563.510 635320.093 ops/s
i.n.m.i.PlatformDependentBenchmark.unsafeBytesEqual 50 thrpt 10 128198943.138 614827.548 ops/s
i.n.m.i.PlatformDependentBenchmark.unsafeBytesEqual 100 thrpt 10 86195621.349 1063959.307 ops/s
i.n.m.i.PlatformDependentBenchmark.unsafeBytesEqual 1000 thrpt 10 16920264.598 61615.365 ops/s
i.n.m.i.PlatformDependentBenchmark.unsafeBytesEqual 10000 thrpt 10 1687454.747 6367.602 ops/s
i.n.m.i.PlatformDependentBenchmark.unsafeBytesEqual 100000 thrpt 10 153717.851 586.916 ops/s
Motivation:
ByteString#hashCode() trashes its own hash code if it's being accessed concurrently
Modifications:
Pull the ByteString#hash into a local variable and calculate it locally.
Result:
ByteString#hashCode() is no longer returning a junk value.
Motivation:
The IntObjectHashMap benchmarks show the Agrona collections to be faster on put, lookup, and remove. One major difference is that we're using 2 modulus operations each time we increment the position index while iterating. Agrona uses a mask instead.
Modifications:
Modified the KObjectHashMap to use masking rather than modulus when wrapping the position index. This requires that the capacity be a power of 2.
Result:
Improved performance of IntObjectHashMap.
Motivation:
The usage and code within AsciiString has exceeded the original design scope for this class. Its usage as a binary string is confusing and on the verge of violating interface assumptions in some spots.
Modifications:
- ByteString will be created as a base class to AsciiString. All of the generic byte handling processing will live in ByteString and all the special character encoding will live in AsciiString.
Results:
The AsciiString interface will be clarified. Users of AsciiString can now be clear of the limitations the class imposes while users of the ByteString class don't have to live with those limitations.
Motivation:
Attribute.getAndRemove() will return the value but also remove the AttributeKey itself from the AttributeMap. This may not
what you want as you may want to keep an instance of it and just set it later again. Document the contract so the user know what to expect.
Modifications:
- Make it clear when to use AttributeKey.getAndRemove() / AttributeKey.remove() and when AttributeKey.getAndSet(null) / AttributeKey.set(null).
Result:
Less suprising behaviour.
Motivation:
To support HTTP2 we need APLN support. This was not provided before when using OpenSslEngine, so SSLEngine (JDK one) was the only bet.
Beside this CipherSuiteFilter was not supported
Modifications:
- Upgrade netty-tcnative and make use of new features to support ALPN and NPN in server and client mode.
- Guard against segfaults after the ssl pointer is freed
- support correctly different failure behaviours
- add support for CipherSuiteFilter
Result:
Be able to use OpenSslEngine for ALPN / NPN for server and client.
Motivation:
Currently we have IntObjectMap/HashMap, but it will be useful to support other primitive-based maps.
Modifications:
Moved the code int the current maps to template files and run Groovy code from common/pom.xml to apply the templates.
Result:
Autogeneration of int and char-based hash maps.
Motivation:
We are allocating a hash map for every HTTP2 Stream to store it's children.
Most streams are leafs in the priority tree and don't have children.
Modification:
- Only allocate children when we actually use them.
- Make EmptyIntObjectMap not throw a UnsupportedOperationException on remove, but return null instead (as is stated in it's javadoc).
Result:
Fewer unnecessary allocations.
Motivation:
PrimitiveCollections is not in the 4.1 branch. It is needed by HTTP/2.
Modifications:
Backport this class.
Result:
PrimitiveCollections is in 4.1.
Motivation:
The Http2Settings class currently disallows setting non-standard settings, which violates the spec.
Modifications:
Updated Http2Settings to permit arbitrary settings. Also adjusting the default initial capacity to allow setting all of the standard settings without reallocation.
Result:
Fixes#3560
Motivation:
On a system where ipv4 and ipv6 are supported a user may want to use -Djava.net.preferIPv4Stack=true to restrict it to use ipv4 only.
This is currently ignored with the epoll transport.
Modifications:
Respect java.net.preferIPv4Stack system property.
Result:
-Djava.net.preferIPv4Stack=true will have the effect the user is looking for.
Motivation:
When remove0() is called multiple times for an DefaultAttribute it can cause corruption of the internal linked-list structure.
Modifications:
- Ensure remove0() can not cause corruption by null out prev and next references.
Result:
No more corruption possible
Motivation:
At the moment when EmbeddedChannel is used and a ChannelHandler tries to schedule and task it will throw an UnsupportedOperationException. This makes it impossible to test these handlers or even reuse them with EmbeddedChannel.
Modifications:
- Factor out reusable scheduling code into AbstractSchedulingEventExecutor
- Let EmbeddedEventLoop and SingleThreadEventExecutor extend AbstractSchedulingEventExecutor
- add EmbbededChannel.runScheduledPendingTasks() which allows to run all scheduled tasks that are ready
Result:
Embeddedchannel is now usable even with ChannelHandler that try to schedule tasks.
Motivation:
At the moment if you want to return a HTTP header containing multiple
values you have to set/add that header once with the values wanted. If
you used set/add with an array/iterable multiple HTTP header fields will
be returned in the response.
Note, that this is indeed a suggestion and additional work and tests
should be added. This is mainly to bring up a discussion.
Modifications:
Added a flag to specify that when multiple values exist for a single
HTTP header then add them as a comma separated string.
In addition added a method to StringUtil to help escape comma separated
value charsequences.
Result:
Allows for responses to be smaller.
Motivation:
We should allow to get a ChannelOption/AttributeKey from a String. This will make it a lot easier to make use of configuration files in applications.
Modifications:
- Add exists(...), newInstance(...) method to ChannelOption and AttributeKey and alter valueOf(...) to return an existing instance for a String or create one.
- Add unit tests.
Result:
Much more flexible usage of ChannelOption and AttributeKey.
While implementing netty-handler-proxy, I realized various issues in our
current socksx package. Here's the list of the modifications and their
background:
- Split message types into interfaces and default implementations
- so that a user can implement an alternative message implementations
- Use classes instead of enums when a user might want to define a new
constant
- so that a user can extend SOCKS5 protocol, such as:
- defining a new error code
- defining a new address type
- Rename the message classes
- to avoid abbreviated class names. e.g:
- Cmd -> Command
- Init -> Initial
- so that the class names align better with the protocol
specifications. e.g:
- AuthRequest -> PasswordAuthRequest
- AuthScheme -> AuthMethod
- Rename the property names of the messages
- so that the property names align better when the field names in the
protocol specifications
- Improve the decoder implementations
- Give a user more control over when a decoder has to be removed
- Use DecoderResult and DecoderResultProvider to handle decode failure
gracefully. i.e. no more Unknown* message classes
- Add SocksPortUnifinicationServerHandler since it's useful to the users
who write a SOCKS server
- Cleaned up and moved from the socksproxy example
Motivation:
isRoot() is an expensive operation. We should avoid calling it if
possible.
Modifications:
Move the isRoot() checks to the end of the 'if' block, so that isRoot()
is evaluated only when really necessary.
Result:
isRoot() is evaluated only when SO_BROADCAST is set and the bind address
is anylocal address.
Motivation:
When a user sees an error message, sometimes he or she does not know
what exactly he or she has to do to fix the problem.
Modifications:
Log the URL of the wiki pages that might help the user troubleshoot.
Result:
We are more friendly.
Related: #3166
Motivation:
When the recyclable object created at one thread is returned at the
other thread, it is stored in a WeakOrderedQueue.
The objects stored in the WeakOrderedQueue is added back to the stack by
WeakOrderedQueue.transfer() when the owner thread ran out of recyclable
objects.
However, WeakOrderedQueue.transfer() does not have any mechanism that
prevents the stack from growing beyond its maximum capacity.
Modifications:
- Make WeakOrderedQueue.transfer() increase the capacity of the stack
only up to its maximum
- Add tests for the cases where the recyclable object is returned at the
non-owner thread
- Fix a bug where Stack.scavengeSome() does not scavenge the objects
when it's the first time it ran out of objects and thus its cursor is
null.
- Overall clean-up of scavengeSome() and transfer()
Result:
The capacity of Stack never increases beyond its maximum.
Motivation:
io.netty.util.internal.PlatformDependent.isRoot() depends on the IS_ROOT field which is filled in during class initialization. This spawns processes and consumes resources, which are not generally necessary to the complete functioning of that class.
Modifications:
This switches the class to use lazy initialization this field inside of the isRoot() method using double-checked locking (http://en.wikipedia.org/wiki/Double-checked_locking).
Result:
The first call to isRoot() will be slightly slower, at a tradeoff that class loading is faster, uses fewer resources and platform errors are avoided unless necessary.
- Parameterize DomainNameMapping to make it useful for other use cases
than just mapping to SslContext
- Move DomainNameMapping to io.netty.util
- Clean-up the API documentation
- Make SniHandler.hostname and sslContext volatile because they can be
accessed by non-I/O threads
Motivation:
When we need to host multiple server name with a single IP, it requires
the server to support Server Name Indication extension to serve clients
with proper certificate. So the SniHandler will host multiple
SslContext(s) and append SslHandler for requested hostname.
Modification:
* Added SniHandler to host multiple certifications in a single server
* Test case
Result:
User could use SniHandler to host multiple certifcates at a time.
It's server-side only.
Motivation:
8fbc513 introduced stray warnings in callsites of
PromiseAggregator#add and PromiseNotifier#(...).
Modifications:
This commit adds the @SafeVarargs annotation to PromiseAggregator#add
and PromiseNotifier#(...). As Netty is built with JDK7, this is a
recognized annotation and should not affect runtime VM versions 1.5 and
1.6.
Result:
Building Netty with JDK7 will no longer produce warnings in the
callsites mentioned above.
Motivation:
Although the new IntObjectMap.values() that returns Collection is
useful, the removed values(Class<V>) that returns an array is also
useful. It's also good for backward compatibility.
Modifications:
- Add IntObjectMap.values(Class<V>) back
- Miscellaneous improvements
- Cache the collection returned by IntObjectHashMap.values()
- Inspector warnings
- Update the IntObjectHashMapTest to test both values()
Result:
- Backward compatibility
- Potential performance improvement of values()
Motivation:
The mentioned commit contains a bug fix and an improvement in
IntObjectHashMap that requires backporting.
Modifications:
Update IntObjectMap, IntObjectHashMap, and IntObjectHashMapTest
Result:
Easier to backport HTTP/2 and other changes in master in the future
Motivation:
Found performance issues via FindBugs and PMD.
Modifications:
- Removed unnecessary boxing/unboxing operations in DefaultTextHeaders.convertToInt(CharSequence) and DefaultTextHeaders.convertToLong(CharSequence). A boxed primitive is created from a string, just to extract the unboxed primitive value.
- Added a static modifier for DefaultHttp2Connection.ParentChangedEvent class. This class is an inner class, but does not use its embedded reference to the object which created it. This reference makes the instances of the class larger, and may keep the reference to the creator object alive longer than necessary.
- Added a static compiled Pattern to avoid compile it each time it is used when we need to replace some part of authority.
- Improved using of StringBuilders.
Result:
Performance improvements.
Motivation:
NetUtil.isValidIpV6Address() handles the interface name in IPv6 address
incorrectly. For example, it returns false for the following addresses:
- ::1%lo
- ::1%_%_in_name_
Modifications:
- Strip the square brackets before validation for simplicity
- Strip the part after the percent sign completely before validation for
simplicity
- Simplify and reformat NetUtilTest
Result:
- The interface names in IPv6 addresses are handled correctly.
- NetUtilTest is cleaner
Motivation:
ChannelPromiseAggregator and ChannelPromiseNotifiers only allow
consumers to work with Channels as the result type. Generic versions
of these classes allow consumers to aggregate or broadcast the results
of an asynchronous execution with other result types.
Modifications:
Add PromiseAggregator and PromiseNotifier. Add unit tests for both.
Remove code in ChannelPromiseAggregator and ChannelPromiseNotifier and
modify them to extend the new base classes.
Result:
Consumers can now aggregate or broadcast the results of an asynchronous
execution with results types other than Channel.
Motivation:
CollectionUtils has only one method and it is used only in DefaultHeaders.
Modification:
Move CollectionUtils.equals() to DefaultHeaders and make it private
Result:
One less class to expose in our public API
Motivation:
The header class hierarchy and algorithm was improved on the master branch for versions 5.x. These improvments should be backported to the 4.1 baseline.
Modifications:
- cherry-pick the following commits from the master branch: 2374e17, 36b4157, 222d258
Result:
Header improvements in master branch are available in 4.1 branch.
Motivation:
Improvements were made on the main line to support ALPN and mutual
authentication for TLS. These should be backported.
Modifications:
- Backport commits from the master branch
- f8af84d599
- e74c8edba3
Result:
Support for ALPN and mutual authentication.
Motivation:
The java implementations for Inet6Address.getHostName() do not follow the RFC 5952 (http://tools.ietf.org/html/rfc5952#section-4) for recommended string representation. This introduces inconsistencies when integrating with other technologies that do follow the RFC.
Modifications:
-NetUtil.java to have another public static method to convert InetAddress to string. Inet4Address will use the java InetAddress.getHostAddress() implementation and there will be new code to implement the RFC 5952 IPV6 string conversion.
-New unit tests to test the new method
Result:
Netty provides a RFC 5952 compliant string conversion method for IPV6 addresses
Motivation
Issue #3004 shows that "=" character was not supported as it should in
the HttpPostRequestDecoder in form-data boundary.
Modifications:
Add 2 methods in StringUtil
- split with maxPart argument: String split with max parts only (to prevent multiple '='
to be source of extra split while not needed)
- substringAfter: String part after delimiter (since first part is not
needed)
Use those methods in HttpPostRequestDecoder.
Change and the HttpPostRequestDecoderTest to check using a boundary
beginning with "=".
Results:
The fix implies more stability and fix the issue.
Motivation:
Using a needless local copy of keys.length.
Modifications:
Using keys.length explicitly everywhere.
Result:
Slight performance improvement of hashIndex.
Motivation:
The hashIndex method currently uses a conditional to handle negative
keys. This could be done without a conditional to slightly improve
performance.
Modifications:
Modified hashIndex() to avoid using a conditional.
Result:
Slight performance improvement to hashIndex().
Motivation:
IntObjectHashMap throws an exception when using negative values for
keys.
Modifications:
Changed hashIndex() to normalize the index if the mod operation returns
a negative number.
Result:
IntObjectHashMap supports negative key values.
Motivation:
An IPv6 string can have a zone index which is followed by the '%' sign.
When a user passes an IPv6 string with a zone index,
NetUtil.createByteArrayFromIpAddressString() returns an incorrect value.
Modification:
- Strip the zone index before conversion
Result:
An IPv6 string with a zone index is decoded correctly.
Motivation:
This fixes bug #2848 which caused Recycler to become unbounded and cache infinite number of objects with maxCapacity that's not a power of two. This can result in general sluggishness of the application and OutOfMemoryError.
Modifications:
The test for maxCapacity has been moved out of test to check if the buffer has filled. The buffer is now also capped at maxCapacity and cannot grow over it as it jumps from one power of two to the other.
Additionally, a unit test was added to verify maxCapacity is honored even when it's not a power of two.
Result:
With these changes the user is able to use a custom maxCapacity number and not have it ignored. The unit test assures this bug will not repeat itself.
Motivation:
There is not need todo redunant reads of head in peakNode as we can just spin on next() until it becomes visible.
Modifications:
Remove redundant reads of head in peakNode. This is based on @nitsanw's patch for akka.
See https://github.com/akka/akka/pull/15596
Result:
Less volatile access.
Motivation:
Recently we changed the default value of SOMAXCONN that is used when we can not determine it by reading /proc/sys/net/core/somaxconn. While doing this we missed to update the javadocs to reflect the new default value that is used.
Modifications:
List correct default value in the javadocs of SOMAXCONN.
Result:
Correct javadocs.
Related issue: #2407
Motivation:
The current fallback SOMAXCONN value is 3072. It is way too large
comparing to the default SOMAXCONN value of popular OSes.
Modifications:
Decrease the fallback SOMAXCONN value to 128 or 200 depending on the
current OS
Result:
Saner fallback value
Motivation:
The calculation of the max wait time for HashedWheelTimerTest.testExecutionOnTime() was wrong and so the test sometimes failed.
Modifications:
Fix the max wait time.
Result:
No more test-failures
Motivation:
We forgot to do a null check on the cause parameter of
ChannelFuture.setFailure(cause)
Modifications:
Add a null check
Result:
Fixed issue: #2728
Motivation:
We did various changes related to the ChannelOutboundBuffer in 4.0 branch. This commit port all of them over and so make sure our branches are synced in terms of these changes.
Related to [#2734], [#2709], [#2729], [#2710] and [#2693] .
Modification:
Port all changes that was done on the ChannelOutboundBuffer.
This includes the port of the following commits:
- 73dfd7c01b
- 997d8c32d2
- e282e504f1
- 5e5d1a58fd
- 8ee3575e72
- d6f0d12a86
- 16e50765d1
- 3f3e66c31a
Result:
- Less memory usage by ChannelOutboundBuffer
- Same code as in 4.0 branch
- Make it possible to use ChannelOutboundBuffer with Channel implementation that not extends AbstractChannel
Motivation:
As /proc/sys/net/core/somaxconn does not exists on non-linux platforms you see a noisy stacktrace when debug level is enabled while the static method of NetUtil is executed.
Modifications:
Check if the file exists before try to parse it.
Result:
Less noisy logging on non-linux platforms.
Related issue: #2354
Motivation:
AbstractConstant.compareTo() can return 0 even if the specified constant
object is not the same instance with 'this'.
Modifications:
- Compare the identityHashCode of constant first. If that fails,
allocate a small direct buffer and use its memory address as a unique
value. If the platform does not provide a way to get the memory
address of a direct buffer, use a thread-local random value.
- Signal cannot extend AbstractConstant. Use delegation.
Result:
It is practically impossible for AbstractConstant.compareTo() to return
0 for different constant objects.
Motivation:
While benchmarking the native transport with gathering writes I noticed that it is quite slow. This is due the fact that we need to do a lot of array copies to get the buffers into the iov array.
Modification:
Introduce a new class calles IovArray which allows to fill buffers directly in a iov array that can be passed over to JNI without any array copies. This gives a nice optimization in terms of speed when doing gathering writes.
Result:
Big performance improvement when doing gathering writes. See the included benchmark...
Before:
[nmaurer@xxx]~% wrk/wrk -H 'Host: localhost' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Connection: keep-alive' -d 120 -c 256 -t 16 --pipeline 256 http://xxx:8080/plaintext
Running 2m test @ http://xxx:8080/plaintext
16 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 23.44ms 16.37ms 259.57ms 91.77%
Req/Sec 181.99k 31.69k 304.60k 78.12%
346544071 requests in 2.00m, 46.48GB read
Requests/sec: 2887885.09
Transfer/sec: 396.59MB
With this change:
[nmaurer@xxx]~% wrk/wrk -H 'Host: localhost' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Connection: keep-alive' -d 120 -c 256 -t 16 --pipeline 256 http://xxx:8080/plaintext
Running 2m test @ http://xxx:8080/plaintext
16 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 21.93ms 16.33ms 305.73ms 92.34%
Req/Sec 194.56k 33.75k 309.33k 77.04%
369617503 requests in 2.00m, 49.57GB read
Requests/sec: 3080169.65
Transfer/sec: 423.00MB
Motivation:
Due some race-condition while handling canellation of TimerTasks it was possibleto corrupt the linked-list structure that is represent by HashedWheelBucket and so produce a NPE.
Modification:
Fix the problem by adding another MpscLinkedQueue which holds the cancellation tasks and process them on each tick. This allows to use no synchronization / locking at all while introduce a latency of max 1 tick before the TimerTask can be GC'ed.
Result:
No more NPE
- Rewrite with linear probing, no state array, compaction at cleanup
- Optimize keys() and values() to not use reflection
- Optimize hashCode() and equals() for efficient iteration
- Fixed equals() to not return true for equals(null)
- Optimize iterator to not allocate new Entry at each next()
- Added toString()
- Added some new unit tests
Motivation:
Now Netty has a few problems with null values.
Modifications:
- Check HAProxyProxiedProtocol in HAProxyMessage constructor and throw NPE if it is null.
If HAProxyProxiedProtocol is null we will set AddressFamily as null. So we will get NPE inside checkAddress(String, AddressFamily) and it won't be easy to understand why addrFamily is null.
- Check File in DiskFileUpload.toString().
If File is null we will get NPE when calling toString() method.
- Check Result<String> in MqttDecoder.decodeConnectionPayload(...).
If !mqttConnectVariableHeader.isWillFlag() || !mqttConnectVariableHeader.hasUserName() || !mqttConnectVariableHeader.hasPassword() we will get NPE when we will try to create new instance of MqttConnectPayload.
- Check Unsafe before calling unsafe.getClass() in PlatformDependent0 static block.
- Removed unnecessary null check in WebSocket08FrameEncoder.encode(...).
Because msg.content() can not return null.
- Removed unnecessary null check in DefaultStompFrame(StompCommand) constructor.
Because we have this check in the super class.
- Removed unnecessary null checks in ConcurrentHashMapV8.removeTreeNode(TreeNode<K,V>).
- Removed unnecessary null check in OioDatagramChannel.doReadMessages(List<Object>).
Because tmpPacket.getSocketAddress() always returns new SocketAddress instance.
- Removed unnecessary null check in OioServerSocketChannel.doReadMessages(List<Object>).
Because socket.accept() always returns new Socket instance.
- Pass Unpooled.buffer(0) instead of null inside CloseWebSocketFrame(boolean, int) constructor.
If we will pass null we will get NPE in super class constructor.
- Added throw new IllegalStateException in GlobalEventExecutor.awaitInactivity(long, TimeUnit) if it will be called before GlobalEventExecutor.execute(Runnable).
Because now we will get NPE. IllegalStateException will be better in this case.
- Fixed null check in OpenSslServerContext.setTicketKeys(byte[]).
Now we throw new NPE if byte[] is not null.
Result:
Added new null checks when it is necessary, removed unnecessary null checks and fixed some NPE problems.
Motivation:
Fix some typos in Netty.
Modifications:
- Fix potentially dangerous use of non-short-circuit logic in Recycler.transfer(Stack<?>).
- Removed double 'the the' in javadoc of EmbeddedChannel.
- Write to log an exception message if we can not get SOMAXCONN in the NetUtil's static block.
Modifications:
- Added a static modifier for CompositeByteBuf.Component.
This class is an inner class, but does not use its embedded reference to the object which created it. This reference makes the instances of the class larger, and may keep the reference to the creator object alive longer than necessary.
- Removed unnecessary boxing/unboxing operations in HttpResponseDecoder, RtspResponseDecoder, PerMessageDeflateClientExtensionHandshaker and PerMessageDeflateServerExtensionHandshaker
A boxed primitive is created from a String, just to extract the unboxed primitive value.
- Removed unnecessary 3 times calculations in DiskAttribute.addContent(...).
- Removed unnecessary checks if file exists before call mkdirs() in NativeLibraryLoader and PlatformDependent.
Because the method mkdirs() has this check inside.
- Removed unnecessary `instanceof AsciiString` check in StompSubframeAggregator.contentLength(StompHeadersSubframe) and StompSubframeDecoder.getContentLength(StompHeaders, long).
Because StompHeaders.get(CharSequence) always returns java.lang.String.
Motivations:
In our new version of HWT we used some kind of lazy cancelation of timeouts by put them back in the queue and let them pick up on the next tick. This multiple problems:
- we may corrupt the MpscLinkedQueue if the task is used as tombstone
- this sometimes lead to an uncessary delay especially when someone did executed some "heavy" logic in the TimeTask
Modifications:
Use a Lock per HashedWheelBucket for save and fast removal.
Modifications:
Cancellation of tasks can be done fast and so stuff can be GC'ed and no more infinite-loop possible
Motivation:
When system is in short of entrophy, the initialization of
ThreadLocalRandom can take at most 3 seconds. The initialization occurs
when ThreadLocalRandom.current() is invoked first time, which might be
much later than the moment when the application has started. If we
start the initialization of ThreadLocalRandom as early as possible, we
can reduce the perceived time taken for the retrieval.
Modification:
Begin the initialization of ThreadLocalRandom in InternalLoggerFactory,
potentially one of the firstly initialized class in a Netty application.
Make DefaultChannelId retrieve the current process ID before retrieving
the current machine ID, because retrieval of a machine ID is more likely
to use ThreadLocalRandom.current().
Use a dummy channel ID for EmbeddedChannel, which prevents many unit
tests from creating a ThreadLocalRandom instance.
Result:
We gain extra 100ms at minimum for initialSeedUniquifier generation. If
an application has its own initialization that takes long enough time
and generates good amount of entrophy, it is very likely that we will
gain a lot more.
Motivation:
We use the nanoTime of the scheduledTasks to calculate the milli-seconds to wait for a select operation to select something. Once these elapsed we check if there was something selected or some task is ready for processing. Unfortunally we not take into account scheduled tasks here so the selection loop will continue if only scheduled tasks are ready for processing. This will delay the execution of these tasks.
Modification:
- Check if a scheduled task is ready after selecting
- also make a tiny change in NioEventLoop to not trigger a rebuild if nothing was selected because the timeout was reached a few times in a row.
Result:
Execute scheduled tasks on time.
Motivation:
When a user tries to use netty on android it currently fails with "Could not find class 'sun.misc.Cleaner'"
Modification:
Encapsulate sun.misc.Cleaner usage in extra class to workaround this isssue.
Result:
Netty can be used on android again
Motivation:
During some refactoring we changed PlatformDependend0 to use sun.nio.ch.DirectBuffer for release direct buffers. This broke support for android as the class does not exist there and so an exception is thrown.
Modification:
Use again the fieldoffset to get access to Cleaner for release direct buffers.
Result:
Netty can be used on android again
Motivation:
Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used.
Modification:
- Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance.
- Recycling of the same object multiple times without acquire it will fail.
- Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler
These changes are based on @belliottsmith 's work that was part of #2504.
Result:
Huge increase in performance.
4.0 branch without this commit:
Benchmark (size) Mode Samples Score Score error Units
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
4.0 branch with this commit:
Benchmark (size) Mode Samples Score Score error Units
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
Motivation:
The tail node reference writes (by producer threads) are very likely to
invalidate the cache line holding the headRef which is read by the
consumer threads in order to access the padded reference to the head
node. This is because the resulting layout for the object is:
- header
- Object AtomicReference.value -> Tail node
- Object MpscLinkedQueue.headRef -> PaddedRef -> Head node
This is 'passive' false sharing where one thread reads and the other
writes. The current implementation suffers from further passive false
sharing potential from any and all neighbours to the queue object as no
pre/post padding is provided for the class fields.
Modifications:
Fix the memory layout by adding pre-post padding for the head node and
putting the tail node reference in the same object.
Result:
Fixed false sharing
Motivation:
Maps with integer keys are used in several places (HTTP/2 code, for
example). To reduce the memory footprint of these structures, we need a
specialized map class that uses ints as keys.
Modifications:
Added IntObjectHashMap, which is uses open addressing and double hashing
for collision resolution.
Result:
A new int-based map class that can be shared across Netty.
Motivation:
When Netty runs in a managed environment such as web application server,
Netty needs to provide an explicit way to remove the thread-local
variables it created to prevent class loader leaks.
FastThreadLocal uses different execution paths for storing a
thread-local variable depending on the type of the current thread.
It increases the complexity of thread-local removal.
Modifications:
- Moved FastThreadLocal and FastThreadLocalThread out of the internal
package so that a user can use it.
- FastThreadLocal now keeps track of all thread local variables it has
initialized, and calling FastThreadLocal.removeAll() will remove all
thread-local variables of the caller thread.
- Added FastThreadLocal.size() for diagnostics and tests
- Introduce InternalThreadLocalMap which is a mixture of hard-wired
thread local variable fields and extensible indexed variables
- FastThreadLocal now uses InternalThreadLocalMap to implement a
thread-local variable.
- Added ThreadDeathWatcher.unwatch() so that PooledByteBufAllocator
tells it to stop watching when its thread-local cache has been freed
by FastThreadLocal.removeAll().
- Added FastThreadLocalTest to ensure that removeAll() works
- Added microbenchmark for FastThreadLocal and JDK ThreadLocal
- Upgraded to JMH 0.9
Result:
- A user can remove all thread-local variables Netty created, as long as
he or she did not exit from the current thread. (Note that there's no
way to remove a thread-local variable from outside of the thread.)
- FastThreadLocal exposes more useful operations such as isSet() because
we always implement a thread local variable via InternalThreadLocalMap
instead of falling back to JDK ThreadLocal.
- FastThreadLocalBenchmark shows that this change improves the
performance of FastThreadLocal even more.
Motivation:
We have quite a bit of code duplication between HTTP/1, HTTP/2, SPDY,
and STOMP codec, because they all have a notion of 'headers', which is a
multimap of string names and values.
Modifications:
- Add TextHeaders and its default implementation
- Add AsciiString to replace HttpHeaderEntity
- Borrowed some portion from Apache Harmony's java.lang.String.
- Reimplement HttpHeaders, SpdyHeaders, and StompHeaders using
TextHeaders
- Add AsciiHeadersEncoder to reuse the encoding a TextHeaders
- Used a dedicated encoder for HTTP headers for better performance
though
- Remove shortcut methods in SpdyHeaders
- Replace SpdyHeaders.getStatus() with HttpResponseStatus.parseLine()
Result:
- Removed quite a bit of code duplication in the header implementations.
- Slightly better performance thanks to improved header validation and
hash code calculation
Motivation:
Provide a faster ThreadLocal implementation
Modification:
Add a "FastThreadLocal" which uses an EnumMap and a predefined fixed set of possible thread locals (all of the static instances created by netty) that is around 10-20% faster than standard ThreadLocal in my benchmarks (and can be seen having an effect in the direct PooledByteBufAllocator benchmark that uses the DEFAULT ByteBufAllocator which uses this FastThreadLocal, as opposed to normal instantiations that do not, and in the new RecyclableArrayList benchmark);
Result:
Improved performance
Motivation:
At the moment the HashedWheelTimer will only remove the cancelled Timeouts once the HashedWheelBucket is processed again. Until this the instance will not be able to be GC'ed as there are still strong referenced to it even if the user not reference it by himself/herself. This can cause to waste a lot of memory even if the Timeout was cancelled before.
Modification:
Add a new queue which holds CancelTasks that will be processed on each tick to remove cancelled Timeouts. Because all of this is done only by the WorkerThread there is no need for synchronization and only one extra object creation is needed when cancel() is executed. For addTimeout(...) no new overhead is introduced.
Result:
Less memory usage for cancelled Timeouts.
Motivation:
As part of GSOC 2013 we had @mbakkar working on a DNS codec but did not integrate it yet as it needs some cleanup. This commit is based on @mbakkar's work and provide the codec for DNS.
Modifications:
Add DNS codec
Result:
Reusable DNS codec will be included in netty.
This PR also includes a AsynchronousDnsResolver which allows to resolve DNS entries in a non blocking way by make use
of the dns codec and netty transport itself.