Motivation
#8563 highlighted race conditions introduced by the prior optimistic
update optimization in 83a19d5650. These
were known at the time but considered acceptable given the perf
benefit in high contention scenarios.
This PR proposes a modified approach which provides roughly half the
gains but stronger concurrency semantics. Race conditions still exist
but their scope is narrowed to much less likely cases (releases
coinciding with retain overflow), and even in those
cases certain guarantees are still assured. Once release() returns true,
all subsequent release/retains are guaranteed to throw, and in
particular deallocate will be called at most once.
Modifications
- Use even numbers internally (including -ve) for live refcounts
- "Final" release changes to odd number (equivalent to refcount 0)
- Retain still uses faster getAndAdd, release uses CAS loop
- First CAS attempt uses non-volatile read
- Thread.yield() after a failed CAS provides a net gain
Result
More (though not completely) robust concurrency semantics for ref
counting; increased latency under high contention, but still roughly
twice as fast as the original logic. Bench results to follow
Motivation:
We did miss to use MessageFormatter inside LocationAwareSlf4jLogger and so {} was not correctly replaced in log messages when using slf4j.
This regression was introduced by afe0767e9c.
Modifications:
- Make use of MessageFormatter
- Add unit test.
Result:
Fixes https://github.com/netty/netty/issues/8483.
Motivation:
We can change from using compareAndSet to addAndGet, which emits a different CPU instruction on x86 (CMPXCHG to XADD) when count direct memory usage. This instruction is cheaper in general and so produce less overhead on the "happy path". If we detect too much memory usage we just rollback the change before throwing the Error.
Modifications:
Replace compareAndSet(...) with addAndGet(...)
Result:
Less overhead when tracking direct memory.
Motivation:
We should allow adjustment of the leak detecting sampling interval when in SAMPLE mode.
Modifications:
Added new int property io.netty.leakDetection.samplingInterval
Result:
Be able to consume changes made by the user.
Motivation:
There is a racy UnsupportedOperationException instead because the task removal is delegated to MpscChunkedArrayQueue that does not support removal. This happens with SingleThreadEventExecutor that overrides the newTaskQueue to return an MPSC queue instead of the LinkedBlockingQueue returned by the base class such as NioEventLoop, EpollEventLoop and KQueueEventLoop.
Modifications:
- Catch the UnsupportedOperationException
- Add unit test.
Result:
Fix#8475
Motivation:
allLeaks is to store the DefaultResourceLeak. When we actually use it, the key is DefaultResourceLeak, and the value is actually a meaningless value.
We only care about the keys of allLeaks and don't care about the values. So Set is more in line with this scenario.
Using Set as a container is more consistent with the definition of a container than Map.
Modification:
Replace allLeaks with set. Create a thread-safe set using 'Collections.newSetFromMap(new ConcurrentHashMap<DefaultResourceLeak<?>, Boolean>()).'
Motivation:
HWT does not support anything smaller then 1ms so we should make it clear that this is the case.
Modifications:
Log a warning if < 1ms is used.
Result:
Less suprising behaviour.
Motivation:
In netty we use our own max direct memory limit that can be adjusted by io.netty.maxDirectMemory but we do not take this in acount when maxDirectMemory() is used. That will lead to non optimal configuration of PooledByteBufAllocator in some cases.
This came up on stackoverflow:
https://stackoverflow.com/questions/53097133/why-is-default-num-direct-arena-derived-from-platformdependent-maxdirectmemory
Modifications:
Correctly respect io.netty.maxDirectMemory and so configure PooledByteBufAllocator correctly by default.
Result:
Correct value for max direct memory.
Motivation:
There are currently many more places where this could be used which were
possibly not considered when the method was added.
If https://github.com/netty/netty/pull/8388 is included in its current
form, a number of these places could additionally make use of the same
BYTE_ARRAYS threadlocal.
There's also a couple of adjacent places where an optimistically-pooled
heap buffer is used for temp byte storage which could use the
threadlocal too in preference to allocating a temp heap bytebuf wrapper.
For example
https://github.com/netty/netty/blob/4.1/buffer/src/main/java/io/netty/buffer/ByteBufUtil.java#L1417.
Modifications:
Replace new byte[] with PlatformDependent.allocateUninitializedArray()
where appropriate; make use of ByteBufUtil.getBytes() in some places
which currently perform the equivalent logic, including avoiding copy of
backing array if possible (although would be rare).
Result:
Further potential speed-up with java9+ and appropriate compile flags.
Many of these places could be on latency-sensitive code paths.
Motivation:
trackedObject != null gives no guarantee that trackedObject remains reachable. This may cause problems related to premature finalization: false leak detector warnings.
Modifications:
Add private method reachabilityFence0 that works on JDK 8 and can be factored out into PlatformDependent. Later, it can be swapped for the real Reference.reachabilityFence.
Result:
No false leak detector warnings in future versions of JDK.
Motivation:
DefaultResourceLeak.toString() did include the wrong value for duplicated records.
Modifications:
Include the correct value.
Result:
Correct toString() implementation.
Motivation:
Java since version 6 has the wrapper for the ConcurrentHashMap that could be created via Collections.newSetFromMap(map). So no need to create own ConcurrentSet class. Also, since netty plans to switch to Java 8 soon there is another method for that - ConcurrentHashMap.newKeySet().
For now, marking this class @deprecated would be enough, just to warn users who use netty's ConcurrentSet. After switching to Java 8 ConcurrentSet should be removed and replaced with ConcurrentHashMap.newKeySet().
Modification:
ConcurrentSet deprecated.
Motivation:
Seems like IntegerHolder counterHashCode field is the very old legacy field that is no longer used. Should be marked as deprecated and removed in the future versions.
Modification:
IntegerHolder class, InternalThreadLocalMap.counterHashCode() and InternalThreadLocalMap.setCounterHashCode(IntegerHolder counterHashCode) are now deprecated.
Motivation:
When a X509TrustManager is used while configure the SslContext the JDK automatically does some extra checks during validation of provided certs by the remote peer. We should do the same when our native implementation is used.
Modification:
- Automatically wrap a X509TrustManager and so do the same validations as the JDK does.
- Add unit tests.
Result:
More consistent behaviour. Fixes https://github.com/netty/netty/issues/6664.
Motivation:
The Epoll transport checks to see if there are any scheduled tasks
before entering epoll_wait, and resets the timerfd just before.
This causes an extra syscall to timerfd_settime before doing any
actual work. When scheduled tasks aren't added frequently, or
tasks are added with later deadlines, this is unnecessary.
Modification:
Check the *deadline* of the peeked task in EpollEventLoop, rather
than the *delay*. If it hasn't changed since last time, don't
re-arm the timer
Result:
About 2us faster on gRPC RTT 50pct latency benchmarks.
Before (2 runs for 5 minutes, 1 minute of warmup):
```
50.0%ile Latency (in nanos): 64267
90.0%ile Latency (in nanos): 72851
95.0%ile Latency (in nanos): 78903
99.0%ile Latency (in nanos): 92327
99.9%ile Latency (in nanos): 119691
100.0%ile Latency (in nanos): 13347327
QPS: 14933
50.0%ile Latency (in nanos): 63907
90.0%ile Latency (in nanos): 73055
95.0%ile Latency (in nanos): 79443
99.0%ile Latency (in nanos): 93739
99.9%ile Latency (in nanos): 123583
100.0%ile Latency (in nanos): 14028287
QPS: 14936
```
After:
```
50.0%ile Latency (in nanos): 62123
90.0%ile Latency (in nanos): 70795
95.0%ile Latency (in nanos): 76895
99.0%ile Latency (in nanos): 90887
99.9%ile Latency (in nanos): 117819
100.0%ile Latency (in nanos): 14126591
QPS: 15387
50.0%ile Latency (in nanos): 61021
90.0%ile Latency (in nanos): 70311
95.0%ile Latency (in nanos): 76687
99.0%ile Latency (in nanos): 90887
99.9%ile Latency (in nanos): 119527
100.0%ile Latency (in nanos): 6351615
QPS: 15571
```
* Log the correct line-number when using SLF4j with netty if possible.
Motivation:
At the moment we do not log the correct line number in many cases as it will log the line number of the logger wrapper itself. Slf4j does have an extra interface that can be used to filter out these nad make it more usable with logging wrappers.
Modifications:
Detect if the returned logger implements LocationAwareLogger and if so make use of its extra methods to be able to log the correct origin of the log request.
Result:
Better logging when using slf4j.
Motivation:
In Java8 and earlier we used reflection to replace the used key set if not otherwise told. This does not work on Java9 and later without special flags as its not possible to call setAccessible(true) on the Field anymore.
Modifications:
- Use Unsafe to instrument the Selector with out special set when sun.misc.Unsafe is present and we are using Java9+.
Result:
NIO transport produce less GC on Java9 and later as well.
Motivation:
In Java8 and earlier we used reflection to detect if unaligned access is supported. This fails in Java9 and later as we would need to change the accessible level of the method.
Lucky enough we can use Unsafe directly to read the content of the static field here.
Modifications:
Add special handling for detecting if unaligned access is supported on Java9 and later which does not fail due jigsaw.
Result:
Better and more correct detection on Java9 and later.
Motivation:
At the moment we will just assume the correct version of log4j2 is used when we find it on the classpath. This may lead to an AbstractMethodError at runtime. We should not use log4j2 if the version is not correct.
Modifications:
Check on class loading if we can use Log4J2 or not.
Result:
Fixes#8217.
Motivation:
Log4J2Logger had some code-duplication with AbstractInternalLogger
Modifications:
Reuse AbstractInternaLogger.EXCEPTION_MESSAGE in Log4J2Logger and so remove some code-duplication
Result:
Less duplicated code.
* We should be able to use the ByteBuffer cleaner on java8 (and earlier versions) even if sun.misc.Unsafe is not present.
Motivation:
At the moment we have a hard dependency on sun.misc.Unsafe to use the Cleaner on Java8 and earlier. This is not really needed as we can still use pure reflection if sun.misc.Unsafe is not present.
Modifications:
Refactor Cleaner6 to fallback to pure reflection if sun.misc.Unsafe is not present on system.
Result:
More timely releasing of direct memory on Java8 and earlier when sun.misc.Unsafe is not present.
Motivation:
f77891cc17 changed slightly how we detect if we should prefer direct buffers or not but did miss to also take this into account when logging.
Modifications:
Fix branch for log message to reflect changes in f77891cc17.
Result:
Correct logging.
Motivation:
There was a race condition between the task submitter and task executor threads such that the last Runnable submitted may not get executed.
Modifications:
The bug was fixed by checking the task queue and state in the task executor thread after it saw the task queue was empty.
Result:
Fixes#8230
Motivation:
We should prefer direct buffers whenever we can use the cleaner even if sun.misc.Unsafe is not present.
Modifications:
Correctly prefer direct buffers in all cases.
Result:
More correct code.
Motivation:
CleanerJava9 currently fails whever a SecurityManager is installed. We should make use of AccessController.doPrivileged(...) so the user can give it the correct rights.
Modifications:
Use doPrivileged(...) when needed.
Result:
Fixes https://github.com/netty/netty/issues/8201.
Motivation:
Recycler may produce a NPE when the same object is recycled multiple times from different threads.
Modifications:
- Check if the id has changed or if the Stack became null and if so throw an IllegalStateException
- Add unit test
Result:
Fixes https://github.com/netty/netty/issues/8220.
* Try to monkey-patch library id when shading is used and we are on MacOS / OSX.
Motivation:
ea4c315b45 did ensure we support using multiple versions of the same shaded native library but the user still needed to run install_name_tool -id on MacOS to ensure the ID is unique.
This is kind of error prone and also means that the shading itself would need to be done on MacOS / OSX.
This is related to https://github.com/netty/netty/issues/7272.
Modifications:
- Monkey patch the shaded native lib on MacOS to ensure the id is unique while unpacking it to the tempory location.
Result:
Easier way of using shaded native libs in netty.
Motivation:
Java9 and later does the safepoint polling by itself so there is not need for us to do it.
Modifications:
Check for java version before doing manual safepoint polling.
Result:
Less custom code and less overhead when using java9 and later. Fixes https://github.com/netty/netty/issues/8122.
Motivation:
We do not correctly check for previous calles of setUncancellable() in getNow() which may result in ClassCastException as we incorrectly return the internally UNCANCELLABLE object and not null if setUncancellable() we as called before.
Modifications:
Correctly check for UNCANCELLABLE and add unit test.
Result:
Fixes https://github.com/netty/netty/issues/8135.
Motivation:
We incorrectly calculated the length that was used for our for loop in AsciiString.indexOf(...). This lead to a possible ArrayIndexOutOfBoundsException.
Modifications:
- Not include the start in the length calculation
- Add unit test.
Result:
Fixes https://github.com/netty/netty/issues/8112.
Motivation:
Users should not see a scary log message when Netty is initialized if
Netty configuration explicitly disables unsafe. The log message that
produces this warning was previously guarded but by recent refactoring
a bug was introduced inside the guard helper method.
Modifications:
This commit brings back the guard against the scary log message if
unsafe is explicitly disabled.
Result:
No log message is produced when unsafe is unavailable because Netty was
told to not look for it.
Relates https://github.com/netty/netty/pull/5624, https://github.com/netty/netty/pull/6696
Motivation:
ObjectCleaner does start a Thread to handle the cleaning of resources which leaks into the users application. We should not use it in netty itself to make things more predictable.
Modifications:
- Remove usage of ObjectCleaner and use finalize as a replacement when possible.
- Clarify javadocs for FastThreadLocal.onRemoval(...) to ensure its clear that remove() is not guaranteed to be called when the Thread completees and so this method is not enough to guarantee cleanup for this case.
Result:
Fixes https://github.com/netty/netty/issues/8017.
Motivation:
I'm not sure if trivial changes like this are interesting :-) But I
noticed that the PlatformDependent.maxDirectMemory0() method is called
twice unnecessarily during static initialization (on the default path at
least).
Modifications:
Use constant MAX_DIRECT_MEMORY already set to the same value instead of
calling maxDirectMemory0() again.
Result:
A surely imperceivable reduction in operations performed at startup.
Motivation
There is a cost to concatenating strings and calling methods that will be wasted if the Logger's level is not enabled.
Modifications
Check if Log level is enabled before producing log statement. These are just a few cases found by RegEx'ing in the code.
Result
Tiny bit more efficient code.