Motivation:
On contrary to `DefaultNameResolver`, `DnsNameResolver` doesn't currently honor hosts file.
Modifications:
* Introduce `HostsFileParser` that parses `/etc/hosts` or `C:\Windows\system32\drivers\etc\hosts` depending on the platform
* Introduce `HostsFileEntriesResolver` that uses the former to resolve host names
* Make `DnsNameResolver` check his `HostsFileEntriesResolver` prior to trying to resolve names against the DNS server
* Introduce `DnsNameResolverBuilder` so we now have a builder for `DnsNameResolver`s
* Additionally introduce a `CompositeNameResolver` that takes several `NameResolver`s and tries to resolve names by delegating sequentially
* Change `DnsNameResolver.asAddressResolver` to return a composite and honor hosts file
Result:
Hosts file support when using `DnsNameResolver`.
Consistent behavior with JDK implementation.
Motivation:
The AsciiString.hashCode() method can be optimized. This method is frequently used while to build the DefaultHeaders data structure.
Modification:
- Add a PlatformDependent hashCode algorithm which utilizes UNSAFE if available
Result:
AsciiString hashCode is faster.
Motivation:
The HTTP/2 RFC (https://tools.ietf.org/html/rfc7540#section-8.1.2) indicates that header names consist of ASCII characters. We currently use ByteString to represent HTTP/2 header names. The HTTP/2 RFC (https://tools.ietf.org/html/rfc7540#section-10.3) also eludes to header values inheriting the same validity characteristics as HTTP/1.x. Using AsciiString for the value type of HTTP/2 headers would allow for re-use of predefined HTTP/1.x values, and make comparisons more intuitive. The Headers<T> interface could also be expanded to allow for easier use of header types which do not have the same Key and Value type.
Motivation:
- Change Headers<T> to Headers<K, V>
- Change Http2Headers<ByteString> to Http2Headers<CharSequence, CharSequence>
- Remove ByteString. Having AsciiString extend ByteString complicates equality comparisons when the hash code algorithm is no longer shared.
Result:
Http2Header types are more representative of the HTTP/2 RFC, and relationship between HTTP/2 header name/values more directly relates to HTTP/1.x header names/values.
Motivation:
Modulo operations are slow, we can use bitwise operation to detect if resource leak detection must be done while sampling.
Modifications:
- Ensure the interval is a power of two
- Use bitwise operation for sampling
- Add benchmark.
Result:
Faster sampling.
Motivation:
sun.misc.Unsafe allows us to handle heap ByteBuf in a more efficient matter. We should use special ByteBuf implementation when sun.misc.Unsafe can be used to increase performance.
Modifications:
- Add PooledUnsafeHeapByteBuf and UnpooledUnsafeHeapByteBuf that are used when sun.misc.Unsafe is ready to use.
- Add UnsafeHeapSwappedByteBuf
Result:
Better performance when using heap buffers and sun.misc.Unsafe is ready to use.
Motivation:
We had a bug in our implemention which double "reversed" bytes on systems which not support unaligned access.
Modifications:
- Correctly only reverse bytes if needed.
- Share code between unsafe implementations.
Result:
No more data-corruption on sytems without unaligned access.
Motivation:
At the moment we only forward decoded messages that were added the out List once the full decode loop was completed. This has the affect that resources may not be released as fast as possible and as an application may incounter higher latency if the user triggeres a writeAndFlush(...) as a result of the decoded messages.
Modifications:
- forward decoded messages after each decode call
Result:
Forwarding decoded messages through the pipeline in a more eager fashion.
Motivation:
The javadocs are incorrect and so give false impressions of use-pattern.
Modifications:
- Fix javadocs of which operations are allowed from multiple threads concurrently.
- Let isEmpty() work concurrently.
Result:
Correctly document usage-patterns.
Motivation:
The StringUtil class creates a Formatter object, but does not close it. There are also a 2 utility methods which would be generally useful.
Modifications:
- Close the Formatter
- Add length and isNullOrEmpty
Result:
No more resource leaks. Additional utility methods.
Motivation:
A degradation in performance has been observed from the 4.0 branch as documented in https://github.com/netty/netty/issues/3962.
Modifications:
- Simplify Headers class hierarchy.
- Restore the DefaultHeaders to be based upon DefaultHttpHeaders from 4.0.
- Make various other modifications that are causing hot spots.
Result:
Performance is now on par with 4.0.
Motivation:
The HttpObjectDecoder is on the hot code path for the http codec. There are a few hot methods which can be modified to improve performance.
Modifications:
- Modify AppendableCharSequence to provide unsafe methods which don't need to re-check bounds for every call.
- Update HttpObjectDecoder methods to take advantage of new AppendableCharSequence methods.
Result:
Peformance boost for decoding http objects.
Motivation:
PoolThreadCache did only cache allocations if the allocation and deallocation Thread were the same. This is not optimal as often people write from differen thread then the actual EventLoop thread.
Modification:
- Add MpscArrayQueue which was forked from jctools and lightly modified.
- Use MpscArrayQueue for caches and always add buffer back to the cache that belongs to the allocation thread.
Result:
ThreadPoolCache is now also usable and so gives performance improvements when allocation and deallocation thread are different.
Performance when using same thread for allocation and deallocation is noticable worse then before.
Motivation:
The PooledByteBufAllocator is more or less a black-box atm. We need to expose some metrics to allow the user to get a better idea how to tune it.
Modifications:
- Expose different metrics via PooledByteBufAllocator
- Add *Metrics interfaces
Result:
It is now easy to gather metrics and detail about the PooledByteBufAllocator and so get a better understanding about resource-usage etc.
Motivation:
All read operations should be safe to execute from multiple threads which was not the case and so could produce a livelock.
Modifications:
Modify methods so these are safe to be called from multiple threads.
Result:
No more livelock.
Motivation:
There are various known issues in netty-codec-dns:
- Message types are not interfaces, which can make it difficult for a
user to implement his/her own message implementation.
- Some class names and field names do not match with the terms in the
RFC.
- The support for decoding a DNS record was limited. A user had to
encode and decode by him/herself.
- The separation of DnsHeader from DnsMessage was unnecessary, although
it is fine conceptually.
- Buffer leak caused by DnsMessage was difficult to analyze, because the
leak detector tracks down the underlying ByteBuf rather than the
DnsMessage itself.
- DnsMessage assumes DNS-over-UDP.
- To send an EDNS message, a user have to create a new DNS record class
instance unnecessarily.
Modifications:
- Make all message types interfaces and add default implementations
- Rename some classes, properties, and constants to match the RFCs
- DnsResource -> DnsRecord
- DnsType -> DnsRecordType
- and many more
- Remove DnsClass and use an integer to support EDNS better
- Add DnsRecordEncoder/DnsRecordDecoder and their default
implementations
- DnsRecord does not require RDATA to be ByteBuf anymore.
- Add DnsRawRecord as the catch-all record type
- Merge DnsHeader into DnsMessage
- Make ResourceLeakDetector track AbstractDnsMessage
- Remove DnsMessage.sender/recipient properties
- Wrap DnsMessage with AddressedEnvelope
- Add DatagramDnsQuest and DatagramDnsResponse for ease of use
- Rename DnsQueryEncoder to DatagramDnsQueryEncoder
- Rename DnsResponseDecoder to DatagramDnsResponseDecoder
- Miscellaneous changes
- Add StringUtil.TAB
Result:
- Cleaner APi
- Can support DNS-over-TCP more easily in the future
- Reduced memory footprint in the default DnsQuery/Response
implementations
- Better leak tracking for DnsMessages
- Possibility to introduce new DnsRecord types in the future and provide
full record encoder/decoder implementation.
- No unnecessary instantiation for an EDNS pseudo resource record
Motivation:
Many projects need some kind a Channel/Connection pool implementation. While the protocols are different many things can be shared, so we should provide a generic API and implementation.
Modifications:
Add ChannelPool / ChannelPoolMap API and implementations.
Result:
Reusable / Generic pool implementation that users can use.
Motivation:
The ByteString class currently assumes the underlying array will be a complete representation of data. This is limiting as it does not allow a subsection of another array to be used. The forces copy operations to take place to compensate for the lack of API support.
Modifications:
- add arrayOffset method to ByteString
- modify all ByteString and AsciiString methods that loop over or index into the underlying array to use this offset
- update all code that uses ByteString.array to ensure it accounts for the offset
- add unit tests to test the implementation respects the offset
Result:
ByteString and AsciiString can represent a sub region of a byte[].
Motivation:
static Package getPackage(Class<?> c) uses synchronized block internally.
Thanks to @jingene for the hint and initial report of the issue.
Modifications:
-Use simple lastIndexOf(...) and substring for a faster implementation
Result:
No more lock condition.
Motivation:
The current implementation does byte by byte comparison, which we have seen
can be a performance bottleneck when the AsciiString is used as the key in
a Map.
Modifications:
Use sun.misc.Unsafe (on supporting platforms) to compare up to eight bytes at a time
and get closer to the performance of String.equals(Object).
Result:
Significant improvement (2x - 6x) in performance over the current implementation.
Benchmark (size) Mode Samples Score Score error Units
i.n.m.i.PlatformDependentBenchmark.arraysBytesEqual 10 thrpt 10 118843477.518 2347259.347 ops/s
i.n.m.i.PlatformDependentBenchmark.arraysBytesEqual 50 thrpt 10 43910319.773 198376.996 ops/s
i.n.m.i.PlatformDependentBenchmark.arraysBytesEqual 100 thrpt 10 26339969.001 159599.252 ops/s
i.n.m.i.PlatformDependentBenchmark.arraysBytesEqual 1000 thrpt 10 2873119.030 20779.056 ops/s
i.n.m.i.PlatformDependentBenchmark.arraysBytesEqual 10000 thrpt 10 306370.450 1933.303 ops/s
i.n.m.i.PlatformDependentBenchmark.arraysBytesEqual 100000 thrpt 10 25750.415 108.391 ops/s
i.n.m.i.PlatformDependentBenchmark.unsafeBytesEqual 10 thrpt 10 248077563.510 635320.093 ops/s
i.n.m.i.PlatformDependentBenchmark.unsafeBytesEqual 50 thrpt 10 128198943.138 614827.548 ops/s
i.n.m.i.PlatformDependentBenchmark.unsafeBytesEqual 100 thrpt 10 86195621.349 1063959.307 ops/s
i.n.m.i.PlatformDependentBenchmark.unsafeBytesEqual 1000 thrpt 10 16920264.598 61615.365 ops/s
i.n.m.i.PlatformDependentBenchmark.unsafeBytesEqual 10000 thrpt 10 1687454.747 6367.602 ops/s
i.n.m.i.PlatformDependentBenchmark.unsafeBytesEqual 100000 thrpt 10 153717.851 586.916 ops/s
Motivation:
The IntObjectHashMap benchmarks show the Agrona collections to be faster on put, lookup, and remove. One major difference is that we're using 2 modulus operations each time we increment the position index while iterating. Agrona uses a mask instead.
Modifications:
Modified the KObjectHashMap to use masking rather than modulus when wrapping the position index. This requires that the capacity be a power of 2.
Result:
Improved performance of IntObjectHashMap.
Motivation:
The usage and code within AsciiString has exceeded the original design scope for this class. Its usage as a binary string is confusing and on the verge of violating interface assumptions in some spots.
Modifications:
- ByteString will be created as a base class to AsciiString. All of the generic byte handling processing will live in ByteString and all the special character encoding will live in AsciiString.
Results:
The AsciiString interface will be clarified. Users of AsciiString can now be clear of the limitations the class imposes while users of the ByteString class don't have to live with those limitations.
Motivation:
To support HTTP2 we need APLN support. This was not provided before when using OpenSslEngine, so SSLEngine (JDK one) was the only bet.
Beside this CipherSuiteFilter was not supported
Modifications:
- Upgrade netty-tcnative and make use of new features to support ALPN and NPN in server and client mode.
- Guard against segfaults after the ssl pointer is freed
- support correctly different failure behaviours
- add support for CipherSuiteFilter
Result:
Be able to use OpenSslEngine for ALPN / NPN for server and client.
Motivation:
At the moment if you want to return a HTTP header containing multiple
values you have to set/add that header once with the values wanted. If
you used set/add with an array/iterable multiple HTTP header fields will
be returned in the response.
Note, that this is indeed a suggestion and additional work and tests
should be added. This is mainly to bring up a discussion.
Modifications:
Added a flag to specify that when multiple values exist for a single
HTTP header then add them as a comma separated string.
In addition added a method to StringUtil to help escape comma separated
value charsequences.
Result:
Allows for responses to be smaller.
Motivation:
isRoot() is an expensive operation. We should avoid calling it if
possible.
Modifications:
Move the isRoot() checks to the end of the 'if' block, so that isRoot()
is evaluated only when really necessary.
Result:
isRoot() is evaluated only when SO_BROADCAST is set and the bind address
is anylocal address.
Motivation:
io.netty.util.internal.PlatformDependent.isRoot() depends on the IS_ROOT field which is filled in during class initialization. This spawns processes and consumes resources, which are not generally necessary to the complete functioning of that class.
Modifications:
This switches the class to use lazy initialization this field inside of the isRoot() method using double-checked locking (http://en.wikipedia.org/wiki/Double-checked_locking).
Result:
The first call to isRoot() will be slightly slower, at a tradeoff that class loading is faster, uses fewer resources and platform errors are avoided unless necessary.
Motivation:
Improvements were made on the main line to support ALPN and mutual
authentication for TLS. These should be backported.
Modifications:
- Backport commits from the master branch
- f8af84d5993456426a63ad0146479147b1a4a5e5
- e74c8edba3fcbfd2e895ed6aac440efeb3aa637f
Result:
Support for ALPN and mutual authentication.
Motivation
Issue #3004 shows that "=" character was not supported as it should in
the HttpPostRequestDecoder in form-data boundary.
Modifications:
Add 2 methods in StringUtil
- split with maxPart argument: String split with max parts only (to prevent multiple '='
to be source of extra split while not needed)
- substringAfter: String part after delimiter (since first part is not
needed)
Use those methods in HttpPostRequestDecoder.
Change and the HttpPostRequestDecoderTest to check using a boundary
beginning with "=".
Results:
The fix implies more stability and fix the issue.
Motivation:
There is not need todo redunant reads of head in peakNode as we can just spin on next() until it becomes visible.
Modifications:
Remove redundant reads of head in peakNode. This is based on @nitsanw's patch for akka.
See https://github.com/akka/akka/pull/15596
Result:
Less volatile access.
Motivation:
While benchmarking the native transport with gathering writes I noticed that it is quite slow. This is due the fact that we need to do a lot of array copies to get the buffers into the iov array.
Modification:
Introduce a new class calles IovArray which allows to fill buffers directly in a iov array that can be passed over to JNI without any array copies. This gives a nice optimization in terms of speed when doing gathering writes.
Result:
Big performance improvement when doing gathering writes. See the included benchmark...
Before:
[nmaurer@xxx]~% wrk/wrk -H 'Host: localhost' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Connection: keep-alive' -d 120 -c 256 -t 16 --pipeline 256 http://xxx:8080/plaintext
Running 2m test @ http://xxx:8080/plaintext
16 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 23.44ms 16.37ms 259.57ms 91.77%
Req/Sec 181.99k 31.69k 304.60k 78.12%
346544071 requests in 2.00m, 46.48GB read
Requests/sec: 2887885.09
Transfer/sec: 396.59MB
With this change:
[nmaurer@xxx]~% wrk/wrk -H 'Host: localhost' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Connection: keep-alive' -d 120 -c 256 -t 16 --pipeline 256 http://xxx:8080/plaintext
Running 2m test @ http://xxx:8080/plaintext
16 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 21.93ms 16.33ms 305.73ms 92.34%
Req/Sec 194.56k 33.75k 309.33k 77.04%
369617503 requests in 2.00m, 49.57GB read
Requests/sec: 3080169.65
Transfer/sec: 423.00MB
Motivation:
Now Netty has a few problems with null values.
Modifications:
- Check HAProxyProxiedProtocol in HAProxyMessage constructor and throw NPE if it is null.
If HAProxyProxiedProtocol is null we will set AddressFamily as null. So we will get NPE inside checkAddress(String, AddressFamily) and it won't be easy to understand why addrFamily is null.
- Check File in DiskFileUpload.toString().
If File is null we will get NPE when calling toString() method.
- Check Result<String> in MqttDecoder.decodeConnectionPayload(...).
If !mqttConnectVariableHeader.isWillFlag() || !mqttConnectVariableHeader.hasUserName() || !mqttConnectVariableHeader.hasPassword() we will get NPE when we will try to create new instance of MqttConnectPayload.
- Check Unsafe before calling unsafe.getClass() in PlatformDependent0 static block.
- Removed unnecessary null check in WebSocket08FrameEncoder.encode(...).
Because msg.content() can not return null.
- Removed unnecessary null check in DefaultStompFrame(StompCommand) constructor.
Because we have this check in the super class.
- Removed unnecessary null checks in ConcurrentHashMapV8.removeTreeNode(TreeNode<K,V>).
- Removed unnecessary null check in OioDatagramChannel.doReadMessages(List<Object>).
Because tmpPacket.getSocketAddress() always returns new SocketAddress instance.
- Removed unnecessary null check in OioServerSocketChannel.doReadMessages(List<Object>).
Because socket.accept() always returns new Socket instance.
- Pass Unpooled.buffer(0) instead of null inside CloseWebSocketFrame(boolean, int) constructor.
If we will pass null we will get NPE in super class constructor.
- Added throw new IllegalStateException in GlobalEventExecutor.awaitInactivity(long, TimeUnit) if it will be called before GlobalEventExecutor.execute(Runnable).
Because now we will get NPE. IllegalStateException will be better in this case.
- Fixed null check in OpenSslServerContext.setTicketKeys(byte[]).
Now we throw new NPE if byte[] is not null.
Result:
Added new null checks when it is necessary, removed unnecessary null checks and fixed some NPE problems.
Modifications:
- Added a static modifier for CompositeByteBuf.Component.
This class is an inner class, but does not use its embedded reference to the object which created it. This reference makes the instances of the class larger, and may keep the reference to the creator object alive longer than necessary.
- Removed unnecessary boxing/unboxing operations in HttpResponseDecoder, RtspResponseDecoder, PerMessageDeflateClientExtensionHandshaker and PerMessageDeflateServerExtensionHandshaker
A boxed primitive is created from a String, just to extract the unboxed primitive value.
- Removed unnecessary 3 times calculations in DiskAttribute.addContent(...).
- Removed unnecessary checks if file exists before call mkdirs() in NativeLibraryLoader and PlatformDependent.
Because the method mkdirs() has this check inside.
- Removed unnecessary `instanceof AsciiString` check in StompSubframeAggregator.contentLength(StompHeadersSubframe) and StompSubframeDecoder.getContentLength(StompHeaders, long).
Because StompHeaders.get(CharSequence) always returns java.lang.String.
Motivation:
When system is in short of entrophy, the initialization of
ThreadLocalRandom can take at most 3 seconds. The initialization occurs
when ThreadLocalRandom.current() is invoked first time, which might be
much later than the moment when the application has started. If we
start the initialization of ThreadLocalRandom as early as possible, we
can reduce the perceived time taken for the retrieval.
Modification:
Begin the initialization of ThreadLocalRandom in InternalLoggerFactory,
potentially one of the firstly initialized class in a Netty application.
Make DefaultChannelId retrieve the current process ID before retrieving
the current machine ID, because retrieval of a machine ID is more likely
to use ThreadLocalRandom.current().
Use a dummy channel ID for EmbeddedChannel, which prevents many unit
tests from creating a ThreadLocalRandom instance.
Result:
We gain extra 100ms at minimum for initialSeedUniquifier generation. If
an application has its own initialization that takes long enough time
and generates good amount of entrophy, it is very likely that we will
gain a lot more.
Motivation:
When a user tries to use netty on android it currently fails with "Could not find class 'sun.misc.Cleaner'"
Modification:
Encapsulate sun.misc.Cleaner usage in extra class to workaround this isssue.
Result:
Netty can be used on android again
Motivation:
During some refactoring we changed PlatformDependend0 to use sun.nio.ch.DirectBuffer for release direct buffers. This broke support for android as the class does not exist there and so an exception is thrown.
Modification:
Use again the fieldoffset to get access to Cleaner for release direct buffers.
Result:
Netty can be used on android again
Motivation:
Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used.
Modification:
- Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance.
- Recycling of the same object multiple times without acquire it will fail.
- Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler
These changes are based on @belliottsmith 's work that was part of #2504.
Result:
Huge increase in performance.
4.0 branch without this commit:
Benchmark (size) Mode Samples Score Score error Units
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
4.0 branch with this commit:
Benchmark (size) Mode Samples Score Score error Units
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
Motivation:
The tail node reference writes (by producer threads) are very likely to
invalidate the cache line holding the headRef which is read by the
consumer threads in order to access the padded reference to the head
node. This is because the resulting layout for the object is:
- header
- Object AtomicReference.value -> Tail node
- Object MpscLinkedQueue.headRef -> PaddedRef -> Head node
This is 'passive' false sharing where one thread reads and the other
writes. The current implementation suffers from further passive false
sharing potential from any and all neighbours to the queue object as no
pre/post padding is provided for the class fields.
Modifications:
Fix the memory layout by adding pre-post padding for the head node and
putting the tail node reference in the same object.
Result:
Fixed false sharing
Motivation:
When Netty runs in a managed environment such as web application server,
Netty needs to provide an explicit way to remove the thread-local
variables it created to prevent class loader leaks.
FastThreadLocal uses different execution paths for storing a
thread-local variable depending on the type of the current thread.
It increases the complexity of thread-local removal.
Modifications:
- Moved FastThreadLocal and FastThreadLocalThread out of the internal
package so that a user can use it.
- FastThreadLocal now keeps track of all thread local variables it has
initialized, and calling FastThreadLocal.removeAll() will remove all
thread-local variables of the caller thread.
- Added FastThreadLocal.size() for diagnostics and tests
- Introduce InternalThreadLocalMap which is a mixture of hard-wired
thread local variable fields and extensible indexed variables
- FastThreadLocal now uses InternalThreadLocalMap to implement a
thread-local variable.
- Added ThreadDeathWatcher.unwatch() so that PooledByteBufAllocator
tells it to stop watching when its thread-local cache has been freed
by FastThreadLocal.removeAll().
- Added FastThreadLocalTest to ensure that removeAll() works
- Added microbenchmark for FastThreadLocal and JDK ThreadLocal
- Upgraded to JMH 0.9
Result:
- A user can remove all thread-local variables Netty created, as long as
he or she did not exit from the current thread. (Note that there's no
way to remove a thread-local variable from outside of the thread.)
- FastThreadLocal exposes more useful operations such as isSet() because
we always implement a thread local variable via InternalThreadLocalMap
instead of falling back to JDK ThreadLocal.
- FastThreadLocalBenchmark shows that this change improves the
performance of FastThreadLocal even more.