Motivation:
GraalVM native images are a new way to deliver java applications. Netty is one of the most popular libraries however there are a few limitations that make it impossible to use with native images out of the box. Adding a few metadata (in specific modules will allow the compilation to success and produce working binaries)
Modification:
Added properties files in `META-INF` and substitutions classes (under `internal.svm`) will solve the compilation issues. The substitutions classes are not visible and do not have a public constructor so they are not visible to end users.
Result:
Fixes#8959
This fix is very conservative as it applies the minimum config required to build:
* pure netty servers
* vert.x applications
* grpc applications
The build is having trouble due to checkstyle which does not seem to be able to find the copyright notice on property files.
Motivation:
In GlobalEventExecutorTest we used Thread.sleep(...) which can produce flaky results (as seen on the CI). We should use another alternative during tests.
Modifications:
Replace Thread.sleep(...) with join()
Result:
No more flaky GlobalEventExecutor tests.
Motivation:
IKVM.NET seems to ship a bug sun.misc.Unsafe class, for this reason we should better disable our sun.misc.Unsafe usage when we detect IKVM.NET is used.
Modifications:
Check if IKVM.NET is used and if so do not use sun.misc.Unsafe by default.
Result:
Fixes https://github.com/netty/netty/issues/9035 and https://github.com/netty/netty/issues/8916.
Motivation
AbstractReferenceCounted and AbstractReferenceCountedByteBuf contain
duplicate logic for managing the volatile refcount in an optimized and
consistent manner, which increased in complexity in #8583. It's possible
to extract this into a common helper class now that all access is via an
AtomicIntegerFieldUpdater.
Modifications
- Move duplicate logic into a shared ReferenceCountUpdater class
- Incorporate some additional simplification for the most common single
increment/decrement cases (fewer checks/operations)
Result
Less code duplication, better encapsulation of the "non-trivial"
internal volatile refcount manipulation
Motivation:
PooledByteBufAllocator uses a PoolThreadCache per Thread that allocates / deallocates to minimize the performance overhead. This PoolThreadCache is trimmed after X allocations to free up buffers that are not allocated for a long time. This works out quite well when the app continues to allocate but fails if the app stops to allocate frequently (for whatever reason) and so a lot of memory is wasted and not given back to the arena / freed.
Modifications:
- Add a ThreadExecutorMap that offers multiple methods that wrap Runnable / ThreadFactory / Executor and allow to call ThreadExecutorMap.currentEventExecutor() to get the current executing EventExecutor for the calling Thread.
- Use these methods in the constructors of our EventExecutor implementations (which also covers the EventLoop implementations)
- Add io.netty.allocator.cacheTrimIntervalMillis system property which can be used to specify a fixed rate / interval on which we should try to trim the PoolThreadCache for a EventExecutor that allocates.
- Add PooledByteBufAllocator.trimCurrentThreadCache() to allow the user to trim the cache of the calling thread manually.
- Add testcases
- Introduce FastThreadLocal.getIfExists()
Result:
Allow to better / more frequently trim PoolThreadCache and so give back memory to the area / system.
Motivation:
We dont use ObjectCleaner in our FastThreadLocal anymore so we also dont need to take special care to store it there anymore.
Modifications:
Remove code that is not needed anymore.
Result:
Code cleanup.
Motivation:
GlobalEventExecutor does already provide all guarantees of OrderedEventExecutor so it should implement it.
Modifications:
Let GlobalEventExecutor implement OrderedEventExecutor.
Result:
Make it more clear how execution order is handled in GlobalEventExecutor.
Motivation:
This counter is very useful in order to monitor Netty without having every ByteBufAllocator in the JVM
Modification:
Expose the value of DIRECT_MEMORY_COUNTER as we are already doing for DIRECT_MEMORY_LIMIT.
We are returning -1 in case that DIRECT_MEMORY_COUNTER is not available.
Result:
Be able to get the amount of direct memory used.
Motivation:
We had a typo in NativeLibraryLoader debug log message which could misslead the user.
Modifications:
Fix typo to correctly state java.library.path
Result:
Correct and less confusing log message
Motivation:
2bb9f64e16 introduced a change which made it possible to use different shaded versions of netty-tcnative on the classpath. This only partly worked as we did not correctly handled the case when os / arch is part of the library name (which is the case when netty-tcnative-boringssl-static is used with the uber jar).
Modifications:
- If patching the ID failed we retry again with the os / arch stripped
- Add unit tests to verify that patching ID now works with and without os / arch as suffix.
Result:
Using multiple shaded version of netty-tcnative-boringssl-static on MacOS works.
Motivation:
We should run a CI job using J9 to ensure netty also works when using different JVMs.
Modifications:
- Adjust PooledByteBufAllocatorTest to be able to complete faster when using a JVM which takes longer when joining Threads (this seems to be the case with J9).
- Skip UDT tests on J9 as UDT is not supported there.
Result:
Be able to run CI against J9.
Motivation:
PromiseCombiner is not thread-safe and even assumes all added Futures are using the same EventExecutor. This is kind of fragile as we do not enforce this. We need to enforce this contract to ensure it's safe to use and easy to spot concurrency problems.
Modifications:
- Add new contructor to PromiseCombiner that takes an EventExecutor and deprecate the old non-arg constructor.
- Check if methods are called from within the EventExecutor thread and if not fail
- Correctly dispatch on the right EventExecutor if the Future uses a different EventExecutor to eliminate concurrency issues.
Result:
More safe use of PromiseCombiner + enforce correct usage / contract.
Motivation:
https://github.com/netty/netty/pull/8866 added support for calling Iterator.remove() but did not add a testcase.
Modifications:
Add testcase to ensure removal works.
Result:
Better test-coverage.
Motivation:
As ActiveMQ project using netty, we want to make use of this class, unfortunately the iterator on values(), seems to not support remove method, even so the delegated iterator does. Currently we have to clone and modify this class locally albeit a one line change is needed, it would be ideal if netty could allow remove, then removing the need to maintain a clone.
Modifications:
* remove throws UnsupportedOperationException, and instead call remove method on delegated iterator
Result:
Be able to call Iterator.remove() for the values.
Motivation:
When users' /tmp is noexec, NativeLibraryLoader logs a message informing
them how to fix the problem by setting a system property. However, if
Netty has been shaded that message will tell them to set the un-shaded
system property name, which won't work.
Modifications:
Change the code to let shading tools rename the native.workdir property
name reference within user-visible log messages.
Notably, debug logs were _not_ changed, as there's many debug statements
including a variety of property names. Fixing them would be a much more
invasive change and have limited benefit.
Result:
The users will see the correctly-named system property to set if they
are using a noexec /tmp.
Motivation:
When we fail a call to PromiseCombiner.finish(...) because of a null argument we must not update the internal state before throwing.
Modifications:
- First do the null check and only after we validated that the argument is not null update the internal state
- Add test case.
Modifications:
Do not mess up internal state of PromiseCombiner when finish(...) is called with a null argument.
Result:
After your change, what will change.
Motivation:
Just was looking through code and found 1 interesting place DateFormatter.tryParseMonth that was not very effective, so I decided to optimize it a bit.
Modification:
Changed DateFormatter.tryParseMonth method. Instead of invocation regionMatch() for every month - compare chars one by one.
Result:
DateFormatter.parseHttpDate method performance improved from ~3% to ~15%.
Benchmark (DATE_STRING) Mode Cnt Score Error Units
DateFormatter2Benchmark.parseHttpHeaderDateFormatter Sun, 27 Jan 2016 19:18:46 GMT thrpt 6 4142781.221 ± 82155.002 ops/s
DateFormatter2Benchmark.parseHttpHeaderDateFormatter Sun, 27 Dec 2016 19:18:46 GMT thrpt 6 3781810.558 ± 38679.061 ops/s
DateFormatter2Benchmark.parseHttpHeaderDateFormatterNew Sun, 27 Jan 2016 19:18:46 GMT thrpt 6 4372569.705 ± 30257.537 ops/s
DateFormatter2Benchmark.parseHttpHeaderDateFormatterNew Sun, 27 Dec 2016 19:18:46 GMT thrpt 6 4339785.100 ± 57542.660 ops/s
Motivation
As pointed out by @91he in
https://github.com/netty/netty/pull/8595#issuecomment-459181794, there
is a remaining bug in LocationAwareSlf4JLogger following the updates
done in #8595. The logging methods which take a varargs message
parameter array should format using MessageFormatter.arrayFormat rather
than MessageFormatter.format.
Modifications
Change varargs param methods in LocationAwareSlf4JLogger to use
MessageFormatter.arrayFormat and extend unit test to cover these cases.
Results
Correct log output when logging messages with > 2 parameters when using
LocationAwareSlf4JLogger.
Motivation:
If there are no listeners attached to the promise when full-filling it we do not need to schedule a task to notify.
Modifications:
- Don't schedule a task if there is nothing to notify.
- Add unit tests.
Result:
Fixes https://github.com/netty/netty/issues/8795.
Motivation:
To conform to the CharSequence interface we need to return an empty CharSequence when start == end index and a subSequence is requested.
Modifications:
- Correctly handle the case where start == end
- Add unit test
Result:
Fix https://github.com/netty/netty/issues/8796.
Motivation:
We need to update to a new checkstyle plugin to allow the usage of lambdas.
Modifications:
- Update to new plugin version.
- Fix checkstyle problems.
Result:
Be able to use checkstyle plugin which supports new Java syntax.
Motivation:
Incrementing two variables in sync is not necessary when only one will do.
Modifications:
- Remove `j` from `for` loop and replace with `i`.
- Add more unit testing scenarios to cover changed code.
Results:
Unnecessary variable removed.
Motivation:
We should try removing all FastThreadLocals for the Thread before we notify the termination. future. The user may block on the future and once it unblocks the JVM may terminate and start unloading classes.
Modifications:
Remove all FastThreadLocals for the Thread before notify termination future.
Result:
Fixes https://github.com/netty/netty/issues/6596.
Motivation:
In windows if the project is in a path that contains whitespace,
resources cannot be accessed and tests fail.
Modifications:
Adds ResourcesUtil.java in netty-common. Tests use ResourcesUtil.java to access a resource.
Result:
Being able to build netty in a path containing whitespace
Motivation
#8563 highlighted race conditions introduced by the prior optimistic
update optimization in 83a19d5650. These
were known at the time but considered acceptable given the perf
benefit in high contention scenarios.
This PR proposes a modified approach which provides roughly half the
gains but stronger concurrency semantics. Race conditions still exist
but their scope is narrowed to much less likely cases (releases
coinciding with retain overflow), and even in those
cases certain guarantees are still assured. Once release() returns true,
all subsequent release/retains are guaranteed to throw, and in
particular deallocate will be called at most once.
Modifications
- Use even numbers internally (including -ve) for live refcounts
- "Final" release changes to odd number (equivalent to refcount 0)
- Retain still uses faster getAndAdd, release uses CAS loop
- First CAS attempt uses non-volatile read
- Thread.yield() after a failed CAS provides a net gain
Result
More (though not completely) robust concurrency semantics for ref
counting; increased latency under high contention, but still roughly
twice as fast as the original logic. Bench results to follow
Motivation:
We did miss to use MessageFormatter inside LocationAwareSlf4jLogger and so {} was not correctly replaced in log messages when using slf4j.
This regression was introduced by afe0767e9c.
Modifications:
- Make use of MessageFormatter
- Add unit test.
Result:
Fixes https://github.com/netty/netty/issues/8483.
Motivation:
We can change from using compareAndSet to addAndGet, which emits a different CPU instruction on x86 (CMPXCHG to XADD) when count direct memory usage. This instruction is cheaper in general and so produce less overhead on the "happy path". If we detect too much memory usage we just rollback the change before throwing the Error.
Modifications:
Replace compareAndSet(...) with addAndGet(...)
Result:
Less overhead when tracking direct memory.
Motivation:
We should allow adjustment of the leak detecting sampling interval when in SAMPLE mode.
Modifications:
Added new int property io.netty.leakDetection.samplingInterval
Result:
Be able to consume changes made by the user.
Motivation:
There is a racy UnsupportedOperationException instead because the task removal is delegated to MpscChunkedArrayQueue that does not support removal. This happens with SingleThreadEventExecutor that overrides the newTaskQueue to return an MPSC queue instead of the LinkedBlockingQueue returned by the base class such as NioEventLoop, EpollEventLoop and KQueueEventLoop.
Modifications:
- Catch the UnsupportedOperationException
- Add unit test.
Result:
Fix#8475
Motivation:
allLeaks is to store the DefaultResourceLeak. When we actually use it, the key is DefaultResourceLeak, and the value is actually a meaningless value.
We only care about the keys of allLeaks and don't care about the values. So Set is more in line with this scenario.
Using Set as a container is more consistent with the definition of a container than Map.
Modification:
Replace allLeaks with set. Create a thread-safe set using 'Collections.newSetFromMap(new ConcurrentHashMap<DefaultResourceLeak<?>, Boolean>()).'
Motivation:
HWT does not support anything smaller then 1ms so we should make it clear that this is the case.
Modifications:
Log a warning if < 1ms is used.
Result:
Less suprising behaviour.
Motivation:
In netty we use our own max direct memory limit that can be adjusted by io.netty.maxDirectMemory but we do not take this in acount when maxDirectMemory() is used. That will lead to non optimal configuration of PooledByteBufAllocator in some cases.
This came up on stackoverflow:
https://stackoverflow.com/questions/53097133/why-is-default-num-direct-arena-derived-from-platformdependent-maxdirectmemory
Modifications:
Correctly respect io.netty.maxDirectMemory and so configure PooledByteBufAllocator correctly by default.
Result:
Correct value for max direct memory.
Motivation:
There are currently many more places where this could be used which were
possibly not considered when the method was added.
If https://github.com/netty/netty/pull/8388 is included in its current
form, a number of these places could additionally make use of the same
BYTE_ARRAYS threadlocal.
There's also a couple of adjacent places where an optimistically-pooled
heap buffer is used for temp byte storage which could use the
threadlocal too in preference to allocating a temp heap bytebuf wrapper.
For example
https://github.com/netty/netty/blob/4.1/buffer/src/main/java/io/netty/buffer/ByteBufUtil.java#L1417.
Modifications:
Replace new byte[] with PlatformDependent.allocateUninitializedArray()
where appropriate; make use of ByteBufUtil.getBytes() in some places
which currently perform the equivalent logic, including avoiding copy of
backing array if possible (although would be rare).
Result:
Further potential speed-up with java9+ and appropriate compile flags.
Many of these places could be on latency-sensitive code paths.
Motivation:
trackedObject != null gives no guarantee that trackedObject remains reachable. This may cause problems related to premature finalization: false leak detector warnings.
Modifications:
Add private method reachabilityFence0 that works on JDK 8 and can be factored out into PlatformDependent. Later, it can be swapped for the real Reference.reachabilityFence.
Result:
No false leak detector warnings in future versions of JDK.
Motivation:
DefaultResourceLeak.toString() did include the wrong value for duplicated records.
Modifications:
Include the correct value.
Result:
Correct toString() implementation.
Motivation:
Java since version 6 has the wrapper for the ConcurrentHashMap that could be created via Collections.newSetFromMap(map). So no need to create own ConcurrentSet class. Also, since netty plans to switch to Java 8 soon there is another method for that - ConcurrentHashMap.newKeySet().
For now, marking this class @deprecated would be enough, just to warn users who use netty's ConcurrentSet. After switching to Java 8 ConcurrentSet should be removed and replaced with ConcurrentHashMap.newKeySet().
Modification:
ConcurrentSet deprecated.
Motivation:
Seems like IntegerHolder counterHashCode field is the very old legacy field that is no longer used. Should be marked as deprecated and removed in the future versions.
Modification:
IntegerHolder class, InternalThreadLocalMap.counterHashCode() and InternalThreadLocalMap.setCounterHashCode(IntegerHolder counterHashCode) are now deprecated.
Motivation:
When a X509TrustManager is used while configure the SslContext the JDK automatically does some extra checks during validation of provided certs by the remote peer. We should do the same when our native implementation is used.
Modification:
- Automatically wrap a X509TrustManager and so do the same validations as the JDK does.
- Add unit tests.
Result:
More consistent behaviour. Fixes https://github.com/netty/netty/issues/6664.
Motivation:
The Epoll transport checks to see if there are any scheduled tasks
before entering epoll_wait, and resets the timerfd just before.
This causes an extra syscall to timerfd_settime before doing any
actual work. When scheduled tasks aren't added frequently, or
tasks are added with later deadlines, this is unnecessary.
Modification:
Check the *deadline* of the peeked task in EpollEventLoop, rather
than the *delay*. If it hasn't changed since last time, don't
re-arm the timer
Result:
About 2us faster on gRPC RTT 50pct latency benchmarks.
Before (2 runs for 5 minutes, 1 minute of warmup):
```
50.0%ile Latency (in nanos): 64267
90.0%ile Latency (in nanos): 72851
95.0%ile Latency (in nanos): 78903
99.0%ile Latency (in nanos): 92327
99.9%ile Latency (in nanos): 119691
100.0%ile Latency (in nanos): 13347327
QPS: 14933
50.0%ile Latency (in nanos): 63907
90.0%ile Latency (in nanos): 73055
95.0%ile Latency (in nanos): 79443
99.0%ile Latency (in nanos): 93739
99.9%ile Latency (in nanos): 123583
100.0%ile Latency (in nanos): 14028287
QPS: 14936
```
After:
```
50.0%ile Latency (in nanos): 62123
90.0%ile Latency (in nanos): 70795
95.0%ile Latency (in nanos): 76895
99.0%ile Latency (in nanos): 90887
99.9%ile Latency (in nanos): 117819
100.0%ile Latency (in nanos): 14126591
QPS: 15387
50.0%ile Latency (in nanos): 61021
90.0%ile Latency (in nanos): 70311
95.0%ile Latency (in nanos): 76687
99.0%ile Latency (in nanos): 90887
99.9%ile Latency (in nanos): 119527
100.0%ile Latency (in nanos): 6351615
QPS: 15571
```
* Log the correct line-number when using SLF4j with netty if possible.
Motivation:
At the moment we do not log the correct line number in many cases as it will log the line number of the logger wrapper itself. Slf4j does have an extra interface that can be used to filter out these nad make it more usable with logging wrappers.
Modifications:
Detect if the returned logger implements LocationAwareLogger and if so make use of its extra methods to be able to log the correct origin of the log request.
Result:
Better logging when using slf4j.
Motivation:
In Java8 and earlier we used reflection to replace the used key set if not otherwise told. This does not work on Java9 and later without special flags as its not possible to call setAccessible(true) on the Field anymore.
Modifications:
- Use Unsafe to instrument the Selector with out special set when sun.misc.Unsafe is present and we are using Java9+.
Result:
NIO transport produce less GC on Java9 and later as well.
Motivation:
In Java8 and earlier we used reflection to detect if unaligned access is supported. This fails in Java9 and later as we would need to change the accessible level of the method.
Lucky enough we can use Unsafe directly to read the content of the static field here.
Modifications:
Add special handling for detecting if unaligned access is supported on Java9 and later which does not fail due jigsaw.
Result:
Better and more correct detection on Java9 and later.
Motivation:
At the moment we will just assume the correct version of log4j2 is used when we find it on the classpath. This may lead to an AbstractMethodError at runtime. We should not use log4j2 if the version is not correct.
Modifications:
Check on class loading if we can use Log4J2 or not.
Result:
Fixes#8217.
Motivation:
Log4J2Logger had some code-duplication with AbstractInternalLogger
Modifications:
Reuse AbstractInternaLogger.EXCEPTION_MESSAGE in Log4J2Logger and so remove some code-duplication
Result:
Less duplicated code.
* We should be able to use the ByteBuffer cleaner on java8 (and earlier versions) even if sun.misc.Unsafe is not present.
Motivation:
At the moment we have a hard dependency on sun.misc.Unsafe to use the Cleaner on Java8 and earlier. This is not really needed as we can still use pure reflection if sun.misc.Unsafe is not present.
Modifications:
Refactor Cleaner6 to fallback to pure reflection if sun.misc.Unsafe is not present on system.
Result:
More timely releasing of direct memory on Java8 and earlier when sun.misc.Unsafe is not present.
Motivation:
f77891cc17 changed slightly how we detect if we should prefer direct buffers or not but did miss to also take this into account when logging.
Modifications:
Fix branch for log message to reflect changes in f77891cc17.
Result:
Correct logging.
Motivation:
There was a race condition between the task submitter and task executor threads such that the last Runnable submitted may not get executed.
Modifications:
The bug was fixed by checking the task queue and state in the task executor thread after it saw the task queue was empty.
Result:
Fixes#8230
Motivation:
We should prefer direct buffers whenever we can use the cleaner even if sun.misc.Unsafe is not present.
Modifications:
Correctly prefer direct buffers in all cases.
Result:
More correct code.
Motivation:
CleanerJava9 currently fails whever a SecurityManager is installed. We should make use of AccessController.doPrivileged(...) so the user can give it the correct rights.
Modifications:
Use doPrivileged(...) when needed.
Result:
Fixes https://github.com/netty/netty/issues/8201.
Motivation:
Recycler may produce a NPE when the same object is recycled multiple times from different threads.
Modifications:
- Check if the id has changed or if the Stack became null and if so throw an IllegalStateException
- Add unit test
Result:
Fixes https://github.com/netty/netty/issues/8220.
* Try to monkey-patch library id when shading is used and we are on MacOS / OSX.
Motivation:
ea4c315b45 did ensure we support using multiple versions of the same shaded native library but the user still needed to run install_name_tool -id on MacOS to ensure the ID is unique.
This is kind of error prone and also means that the shading itself would need to be done on MacOS / OSX.
This is related to https://github.com/netty/netty/issues/7272.
Modifications:
- Monkey patch the shaded native lib on MacOS to ensure the id is unique while unpacking it to the tempory location.
Result:
Easier way of using shaded native libs in netty.
Motivation:
Java9 and later does the safepoint polling by itself so there is not need for us to do it.
Modifications:
Check for java version before doing manual safepoint polling.
Result:
Less custom code and less overhead when using java9 and later. Fixes https://github.com/netty/netty/issues/8122.
Motivation:
We do not correctly check for previous calles of setUncancellable() in getNow() which may result in ClassCastException as we incorrectly return the internally UNCANCELLABLE object and not null if setUncancellable() we as called before.
Modifications:
Correctly check for UNCANCELLABLE and add unit test.
Result:
Fixes https://github.com/netty/netty/issues/8135.
Motivation:
We incorrectly calculated the length that was used for our for loop in AsciiString.indexOf(...). This lead to a possible ArrayIndexOutOfBoundsException.
Modifications:
- Not include the start in the length calculation
- Add unit test.
Result:
Fixes https://github.com/netty/netty/issues/8112.
Motivation:
Users should not see a scary log message when Netty is initialized if
Netty configuration explicitly disables unsafe. The log message that
produces this warning was previously guarded but by recent refactoring
a bug was introduced inside the guard helper method.
Modifications:
This commit brings back the guard against the scary log message if
unsafe is explicitly disabled.
Result:
No log message is produced when unsafe is unavailable because Netty was
told to not look for it.
Relates https://github.com/netty/netty/pull/5624, https://github.com/netty/netty/pull/6696
Motivation:
ObjectCleaner does start a Thread to handle the cleaning of resources which leaks into the users application. We should not use it in netty itself to make things more predictable.
Modifications:
- Remove usage of ObjectCleaner and use finalize as a replacement when possible.
- Clarify javadocs for FastThreadLocal.onRemoval(...) to ensure its clear that remove() is not guaranteed to be called when the Thread completees and so this method is not enough to guarantee cleanup for this case.
Result:
Fixes https://github.com/netty/netty/issues/8017.
Motivation:
I'm not sure if trivial changes like this are interesting :-) But I
noticed that the PlatformDependent.maxDirectMemory0() method is called
twice unnecessarily during static initialization (on the default path at
least).
Modifications:
Use constant MAX_DIRECT_MEMORY already set to the same value instead of
calling maxDirectMemory0() again.
Result:
A surely imperceivable reduction in operations performed at startup.
Motivation
There is a cost to concatenating strings and calling methods that will be wasted if the Logger's level is not enabled.
Modifications
Check if Log level is enabled before producing log statement. These are just a few cases found by RegEx'ing in the code.
Result
Tiny bit more efficient code.
Motivation:
We should allow to schedule tasks with a delay up to Long.MAX_VALUE as we did pre 4.1.25.Final.
Modifications:
Just ensure we not overflow and put the correct max limits in place when schedule a timer. At worse we will get a wakeup to early and then schedule a new timeout.
Result:
Fixes https://github.com/netty/netty/issues/7970.
Motivation:
On J9 / OpenJ9 netty initializes this value with 64M, even the direct accessible memory is actually unbounded.
Modifications:
Skip usage of VM.maxDirectMemory() on J9 / OpenJ9
Result:
More correct direct memory limit detection. Fixes#7654.
Motivation:
We did not guard against the case of calling malloc(0) when creating a ByteBuffer without a Cleaner. The problem is that malloc(0) can have different behaviour, it either return a null-pointer or a valid pointer that you can pass to free.
The real problem arise if Unsafe.allocateMemory(0) returns 0 and we use it as the memoryAddress of the ByteBuffer. The problem here is that native libraries test for 0 and handle it as a null-ptr. This is for example true in SSL.bioSetByteBuffer(...) which would throw a NPE when 0 is used as memoryAddress and so produced errors during SSL usage.
Modifications:
- Always allocate 1 byte as minimum (even if we ask for an empty buffer).
- Add unit test.
Result:
No more errors possible because of malloc(0).
Motivation:
When a buffer is over-released, the current error message of `IllegalReferenceCountException` is `refCnt: XXX, increment: XXX`, which is confusing. The correct message should be `refCnt: XXX, decrement: XXX`.
Modifications:
Pass `-decrement` to create `IllegalReferenceCountException`.
Result:
The error message will be `refCnt: XXX, decrement: XXX` when a buffer is over-released.
Motivation:
On z/OS netty initializes this value with 64M, even the direct accessible memory is actually unbounded.
Modifications:
Skip usage of VM.maxDirectMemory() on z/OS.
Result:
More correct direct memory limit detection. Fixes https://github.com/netty/netty/issues/7654.
Motivation:
Using a very huge delay when calling schedule(...) may cause an Selector error when calling select(...) later on. We should gaurd against such a big value.
Modifications:
- Add guard against a very huge value.
- Added tests.
Result:
Fixes [#7365]
Motivation:
A FastThreadLocal instance currently occupies 2 slots of InternalThreadLocalMap of every thread. Actually for a FastThreadLocalThread, it does not need to store cleaner flags of FastThreadLocal instances. Besides, using BitSet to store cleaner flags is also better for space usage.
Modification:
Use BitSet to optimize space usage of FastThreadLocal.
Result:
Avoid unnecessary slot occupancy. Cleaner flags are now stored into a BitSet.
Motivation:
When trying to cleanup WeakOrderQueue by the ObjectCleaner we end up in an endless loop which will cause the ObjectCleaner to be not able to cleanup any other resources anymore.This bug was introduced by 6eb9674bf5.
Modifications:
Correctly update link while cleanup
Result:
Fixes https://github.com/netty/netty/issues/7877
Motivation:
The bounds checking for AsciiString#indexOf and AsciiString#lastIndexOf is not correct and may lead to ArrayIndexOutOfBoundsException.
Modifications:
- Correct the bounds checking for AsciiString#indexOf and AsciiString#lastIndexOf
Result
Fixes https://github.com/netty/netty/issues/7863
Motivation:
Finer granularity when configuring CorsHandler, enabling different policies for different origins.
Modifications:
The CorsHandler has an extra constructor that accepts a List<CorsConfig> that are evaluated sequentially when processing a Cors request
Result:
The changes don't break backwards compatibility. The extra ctor can be used to provide more than one CorsConfig object.
Motivation:
Minor performance optimisation that prevents thread from blocking due to task not having been added to queue. Discussed #7815.
Modification:
add task to the queue before starting the thread.
Result:
No additional tests.
* NetUtil valid IP methods to accept CharSequence
Motivation:
NetUtil has methods to determine if a String is a valid IP address. These methods don't rely upon String specific methods and can use CharSequence instead.
Modifications:
- Use CharSequence instead of String for the IP validator methods.
- Avoid object allocation in AsciiString#indexOf(char,int) and reduce
byte code
Result:
No more copy operation required if a CharSequence exists.
Motivation:
HttpProxyHandler uses `NetUtil#toSocketAddressString` to compute
CONNECT url and Host header.
The url is correct when the address is unresolved, as
`NetUtil#toSocketAddressString` will then use
`getHoststring`/`getHostname`. If the address is already resolved, the
url will be based on the IP instead of the hostname.
There’s an additional minor issue with the Host header: default port
443 should be omitted.
Modifications:
* Introduce NetUtil#getHostname
* Introduce HttpUtil#formatHostnameForHttp to format an
InetSocketAddress to
HTTP format
* Change url computation to favor hostname instead of IP
* Introduce HttpProxyHandler ignoreDefaultPortsInConnectHostHeader
parameter to ignore 80 and 443 ports in Host header
Result:
HttpProxyHandler performs properly when connecting to a resolved address
Motivation:
We missed to correctly record the stacktrace of the creation of an ResourceLeak record. This could either have the effect to log the wrote stacktrace for creation or not log a stacktrace at all if the object was dropped on the floor after it was created.
Modifications:
Correctly create a Record on creation of the object.
Result:
Fixes https://github.com/netty/netty/issues/7781.
Motivation:
We recently introduced ObjectCleaner which can be used to ensure some cleanup action is done once an object becomes weakable reachable. We should use this in Recycler.WeakOrderQueue to reduce the overhead of using a finalizer() (which will cause the GC to process it two times).
Modifications:
Replace finalizer() usage with ObjectCleaner
Result:
Fixes [#7343]
Motivation:
We dont protect from overflow and so the timer may fire too early if a large timeout is used.
Modifications:
Add overflow guard and a test.
Result:
Fixes https://github.com/netty/netty/issues/7760.
Motivation:
It is not clear why Unsafe is unavailable when it is explicitly
disabled, or when Netty thinks it is running on Android.
Modification:
Change the "has" fields and methods to be causes. A null cause
means Unsafe is present. This catches all possible reason why
Unsafe might not be available.
Result:
Easier to debug Netty start up when logging cannot be turned on.
Motivation:
DefaultPromise's internal state depends upon specific Signal objects. These Signal objects can be used externally which causes the DefaultPromise object API to not function correct and state to become corrupted.
Modifications:
- DefaultPromise shouldn't depend upon Signal for its internal state
Result:
Fixes https://github.com/netty/netty/issues/7707
Motivation:
The Recycler currently retains 32k objects per thread by default. The Recycler is used in more than just one place and may result in large amounts of memory bloat if spikes of traffic are observed.
Modifications:
- Reduce the Recyclers default capacity from 32k to 4k.
Result:
- Lower default capacity of the Recycler and less memory retained.
Motivation:
Some java binaries include android classes on their classpath, even
if they aren't actually android. When this is true, `Unsafe` no
longer works, disabling the Epoll functionality. A sample case is
for binaries that use the j2objc library.
Modifications:
Check the `java.vm.name` instead of the classpath. Numerous
Google-internal Android libraries / binaries check this property
rather than the class path.
It is believed this is safe and works with bother ART and Dalvik
VMs, safe for Robolectric, and j2objc.
Results:
Unusually built java server binaries can still use Netty Epoll.
Motivation:
Currently if user call set/remove/set/remove many times, it will create multiple cleaner task for same index. It may cause OOM due to long live thread will have more and more task in LIVE_SET.
Modification:
Add flag to avoid recreating tasks.
Result:
Only create 1 clean task. But use more space of indexedVariables.
Motivation:
Reflective setAccessible(true) will produce scary warnings on the console when using java9+, while netty still works. That said users may feel uncomfortable with these warnings, we should not try to do it by default when using java9+.
Modifications:
Add io.netty.tryReflectionSetAccessible system property which controls if setAccessible(...) will be used. By default it will bet set to false when using java9+.
Result:
Fixes [#7254].
Motivation:
The methods implement io.netty.util.concurrent.Future#cancel(boolean mayInterruptIfRunning) which actually ignored the param mayInterruptIfRunning.We need to add comments for the `mayInterruptIfRunning` param.
Modifications:
Add comments for the `mayInterruptIfRunning` param.
Result:
People who call the `cancel` method will be more clear about the effect of `mayInterruptIfRunning` param.
Motivation:
The ObjectCleanerThread must be a daemon thread as otherwise we may block the JVM from exit. By using a daemon thread we basically give the same garantees as the JVM when it comes to cleanup of resources (as the GC threads are also daemon threads and the CleanerImpl uses a deamon thread as well in Java9+).
Modifications:
Change ObjectCleanThread to be a daemon thread.
Result:
JVM shutdown will always be able to complete. Fixed [#7617].
Motivation:
In environments with a security manager, the reflective access to get the reference to
Throwable#addSuppressed can cause issues that result in Netty failing to load. The main
motivation in this pull request is to remove the use of reflection to prevent issues in
these environments.
Modifications:
ThrowableUtil no longer uses Class#getDeclaredMembers to get the Method that references
Throwable#addSuppressed and instead guards the call to Throwable#addSuppressed with a
Java version check.
Additionally, a annotation was added that suppresses the animal sniffer java16 signature
check on the given method. The benefit of the annotation is that it limits the exclusion
of Throwable to just the ThrowableUtil class and has string text indicating the reason
for suppressing the java16 signature check.
Result:
Netty no longer requires the use of Class#getDeclaredMethod for ThrowableUtil and will
work in environments restricted by a security manager without needing to grant reflection
permissions.
Fixes#7614
Motivation:
In a few classes, Netty starts a thread and then sets the context classloader of these threads
to prevent classloader leaks. The Thread#setContextClassLoader method is a privileged method in
that it requires permissions to be executed when there is a security manager in place. Unless
these calls are wrapped in a doPrivileged block, they will fail in an environment with a security
manager and restrictive policy in place.
Modifications:
Wrap the calls to Thread#setContextClassLoader in a AccessController#doPrivileged block.
Result:
After this change, the threads can set the context classloader without any errors in an
environment with a security manager and restrictive policy in place.
Motivation:
Usages of HttpResponseStatus may result in more object allocation then necessary due to not looking for cached objects and the AsciiString parsing method not being used due to CharSequence method being used instead.
Modifications:
- HttpResponseDecoder should attempt to get the HttpResponseStatus from cache instead of allocating a new object
- HttpResponseStatus#parseLine(CharSequence) should check if the type is AsciiString and redirect to the AsciiString parsing method which may not require an additional toString call
- HttpResponseStatus#parseLine(AsciiString) can be optimized and doesn't require and may not require object allocation
Result:
Less allocations when dealing with HttpResponseStatus.
Motivation:
ObjectCleaner inovkes a Runnable which may execute user code (FastThreadLocal#onRemoval) and therefore exceptions maybe thrown. If an exception is thrown the cleanup thread will exit prematurely and we may never finish cleaning up which will result in leaks.
Modifications:
- ObjectCleaner should suppress exceptions and continue cleaning
Result:
ObjectCleaner will reliably clean despite exceptions being thrown.
Motivation:
ObjectCleaner polls a ReferenceQueue which will block indefinitely. However it is possible there is a race condition between the live set of objects being empty due to the WeakReference being cleaned/cleared and polling the queue. If this situation occurs the cleanup thread may never unblock if no more objects are added to the live set, and may result in an application's failure to gracefully close.
Modifications:
- ReferenceQueue.remove should use a timeout to compensate for the race condition, and avoid dead lock
Result:
No more dead lock in ObjectCleaner when polling the ReferenceQueue.
Motivation:
FastThreadLocal#set calls isIndexedVariableSet to determine if we need to register with the cleaner, but the set(InternalThreadLocalMap, V) method will also internally do this check so we can share code and only do the check a single time.
Modifications:
- extract code from set(InternalThreadLocalMap, V) so it can be called externally to determine if a new item was created
Result:
Less code duplication in FastThreadLocal#set.
Motivation:
e329ca1 introduced the user of ObjectCleaner in FastThreadLocal but we missed the case to register our cleaner task if FastThreadLocal.set was called only.
Modifications:
- Use ObjectCleaner also when FastThreadLocal.set is used.
- Add test case.
Result:
ObjectCleaner is always used.
Motivation:
Allow pre-computing calculation of the constants for compiler where it could be.
Similar fix in OpenJDK: [1].
Modifications:
- Use parentheses.
- Simplify static initialization of `BYTE2HEX_*` arrays in `StringUtil`.
Result:
Less bytecode, possible faster calculations at runtime.
[1] https://bugs.openjdk.java.net/browse/JDK-4477961
Motivation:
There is no guarantee that FastThreadLocal.onRemoval(...) is called if the FastThreadLocal is used by "non" FastThreacLocalThreads. This can lead to all sort of problems, like for example memory leaks as direct memory is not correctly cleaned up etc.
Beside this we use ThreadDeathWatcher to check if we need to release buffers back to the pool when thread local caches are collected. In the past ThreadDeathWatcher was used which will need to "wakeup" every second to check if the registered Threads are still alive. If we can ensure FastThreadLocal.onRemoval(...) is called we do not need this anymore.
Modifications:
- Introduce ObjectCleaner and use it to ensure FastThreadLocal.onRemoval(...) is always called when a Thread is collected.
- Deprecate ThreadDeathWatcher
- Add unit tests.
Result:
Consistent way of cleanup FastThreadLocals when a Thread is collected.
Motivation:
We should remove the WeakOrderedQueue from the WeakHashMap directly if possible and only depend on the semantics of the WeakHashMap if there is no other way for us to cleanup it.
Modifications:
Override onRemoval(...) to remove the WeakOrderedQueue if possible.
Result:
Less overhead and quicker collection of WeakOrderedQueue for some cases.
Motivation:
When doStartThread throws an exception, e.g. due to the actual executor being depleted of threads and throwing in its rejected execution handler, the STEE ends up in started state anyway. If we try to execute another task in this executor, it will be queued but the thread won't be started anymore and the task will linger forever.
Modifications:
- Ensure we not update the internal state if the startThread() method throws.
- Add testcase
Result:
Fixes [#7483]
Motivation:
In our Recycler implementation we store a reference to the current Thread in the Stack that is stored in a FastThreadLocal. The Stack itself is referenced in the DefaultHandle itself. A problem can arise if a user stores a Reference to an Object that holds a reference to the DefaultHandle somewhere and either not remove the reference at all or remove it very late. In this case the Thread itself can not be collected as its still referenced in the Stack that is referenced by the DefaultHandle.
Modifications:
- Use a WeakReference to store the reference to the Thread in the Stack
- Add a test case
Result:
Ensure a Thread can be collected in a timely manner in all cases even if it used the Recycler.
Motivation:
ThreadDeathWatcher and GlobalEventExecutor may create and start a new thread from various other threads and so inherit the classloader. We need to ensure we not inherit to allow recycling the classloader.
Modifications:
Use Thread.setContextClassLoader(null) to ensure we not hold a strong reference to the classloader and so not leak it.
Result:
Fixes [#7290].
Motivation:
We dont need to use the ThreadDeathWatcher if we use a FastThreadLocalThread for which we wrap the Runnable and ensure we call FastThreadLocal.removeAll() once the Runnable completes.
Modifications:
- Dont use a ThreadDeathWatcher if we are sure we will call FastThreadLocal.removeAll()
- Add unit test.
Result:
Less overhead / running theads if you only allocate / deallocate from FastThreadLocalThreads.
Motivation:
OSGI and other enviroments may not allow to even load Unsafe which will lead to an NoClassDefFoundError when trying to access it. We should guard against this.
Modifications:
Catch NoClassDefFoundError when trying to load Unsafe.
Result:
Be able to use netty with a strict OSGI config.
Motivation:
When system property is empty, the default value should be used.
Modification:
- Correctly use the default value in all cases
- Add unit tests
Result:
Correct behaviour
Motivation:
In order to determine if a header contains a value we currently rely
upon getAll(..) and regular expressions. This operation is commonly used
during the encode and decode stage to determine the transfer encoding
(e.g. HttpUtil#isTransferEncodingChunked). This operation requires an
intermediate collection and possibly regular expressions for the
CombinedHttpHeaders use case which can be expensive.
Modifications:
- Add a valuesIterator to HttpHeaders and specializations of this method
for DefaultHttpHeaders, ReadOnlyHttpHeaders, and CombinedHttpHeaders.
Result:
Less intermediate collections and allocation overhead when determining
if HttpHeaders contains a name/value pair.
Motivation:
Netty could handle "connection" or "te" headers more gently when
converting from http/1.1 to http/2 headers. Http/2 headers don't
support single-hop headers, so when we convert from http/1.1 to http/2,
we should drop all single-hop headers. This includes headers like
"transfer-encoding" and "connection", but also the headers that
"connection" points to, since "connection" can be used to designate
other headers as single-hop headers. For the "te" header, we can more
permissively convert it by just dropping non-conforming headers (ie
non-"trailers" headers) which is what we do for all other headers when
we convert.
Modifications:
Add a new blacklist to the http/1.1 to http/2 conversion, which is
constructed from the values of the "connection" header, and stop
throwing an exception when a "te" header is passed with a non-"trailers"
value. Instead, drop all values except for "trailers". Add unit tests
for "connection" and "te" headers when converting from http/1.1 to http/2.
Result:
This will improve the h2c upgrade request, and also conversions from
http/1.1 to http/2. This will simplify implementing spec-compliant
http/2 servers that want to share code between their http/1.1 and http/2
implementations.
[Fixes#7355]
Motivation:
`AbstractScheduledEventExecutor` uses a standard `java.util.PriorityQueue` to keep track of task deadlines. `ScheduledFuture.cancel` removes tasks from this `PriorityQueue`. Unfortunately, `PriorityQueue.remove` has `O(n)` performance since it must search for the item in the entire queue before removing it. This is fast when the future is at the front of the queue (e.g., already triggered) but not when it's randomly located in the queue.
Many servers will use `ScheduledFuture.cancel` on all requests, e.g., to manage a request timeout. As these cancellations will be happen in arbitrary order, when there are many scheduled futures, `PriorityQueue.remove` is a bottleneck and greatly hurts performance with many concurrent requests (>10K).
Modification:
Use netty's `DefaultPriorityQueue` for scheduling futures instead of the JDK. `DefaultPriorityQueue` is almost identical to the JDK version except it is able to remove futures without searching for them in the queue. This means `DefaultPriorityQueue.remove` has `O(log n)` performance.
Result:
Before - cancelling futures has varying performance, capped at `O(n)`
After - cancelling futures has stable performance, capped at `O(log n)`
Benchmark results
After - cancelling in order and in reverse order have similar performance within `O(log n)` bounds
```
Benchmark (num) Mode Cnt Score Error Units
ScheduledFutureTaskBenchmark.cancelInOrder 100 thrpt 20 137779.616 ± 7709.751 ops/s
ScheduledFutureTaskBenchmark.cancelInOrder 1000 thrpt 20 11049.448 ± 385.832 ops/s
ScheduledFutureTaskBenchmark.cancelInOrder 10000 thrpt 20 943.294 ± 12.391 ops/s
ScheduledFutureTaskBenchmark.cancelInOrder 100000 thrpt 20 64.210 ± 1.824 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 100 thrpt 20 167531.096 ± 9187.865 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 1000 thrpt 20 33019.786 ± 4737.770 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 10000 thrpt 20 2976.955 ± 248.555 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 100000 thrpt 20 362.654 ± 45.716 ops/s
```
Before - cancelling in order and in reverse order have significantly different performance at higher queue size, orders of magnitude worse than the new implementation.
```
Benchmark (num) Mode Cnt Score Error Units
ScheduledFutureTaskBenchmark.cancelInOrder 100 thrpt 20 139968.586 ± 12951.333 ops/s
ScheduledFutureTaskBenchmark.cancelInOrder 1000 thrpt 20 12274.420 ± 337.800 ops/s
ScheduledFutureTaskBenchmark.cancelInOrder 10000 thrpt 20 958.168 ± 15.350 ops/s
ScheduledFutureTaskBenchmark.cancelInOrder 100000 thrpt 20 53.381 ± 13.981 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 100 thrpt 20 123918.829 ± 3642.517 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 1000 thrpt 20 5099.810 ± 206.992 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 10000 thrpt 20 72.335 ± 0.443 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 100000 thrpt 20 0.743 ± 0.003 ops/s
```
Motivation:
When looking for a leak, its nice to be able to request at least a
number of leaks.
Modification:
* Made all leak records up to the target amoutn recorded, and only
then enable backing off.
* Enable recording more than 32 elements. Previously the shift
amount made this impossible.
Result:
Ability to record all accesses.
Motivation:
Fix NullPointerExceptions that occur when running netty-tcnative inside the bootstrap class loader.
Modifications:
- Replace loader.getResource(...) with ClassLoader.getSystemResource(...) when loader is null.
- Replace loader.loadClass(...) with Class.forName(..., false, loader) which works when loader is both null and non-null.
Result:
Support running native libs in bootstrap class loader
Motivation:
Phantom references are for cleaning up resources that were
forgotten, which means they keep their referent alive. This
means garbage is kept around until the refqueue is drained, rather
than when the reference is unreachable.
Modification:
Use Weak References instead of Phantoms
Result:
More punctual leak detection.
Motivation:
Objects of java.util.TreeMap or java.util.TreeSet will become
non-Serializable if instantiated with Comparators, which are not also
Serializable. This can result in unexpected and difficult-to-diagnose
bugs.
Modifications:
Implements Serializable for all classes, which implements Comparator.
Result:
Proper Comparators which will not force collections to
non-Serializable mode.
Motivation:
Even if it's a super micro-optimization (most JVM could optimize such
cases in runtime), in theory (and according to some perf tests) it
may help a bit. It also makes a code more clear and allows you to
access such methods in the test scope directly, without instance of
the class.
Modifications:
Add 'static' modifier for all methods, where it possible. Mostly in
test scope.
Result:
Cleaner code with proper 'static' modifiers.
Motivation:
Without a 'serialVersionUID' field, any change to a class will make
previously serialized versions unreadable.
Modifications:
Add missed 'serialVersionUID' field for all Serializable
classes.
Result:
Proper deserialization of previously serialized objects.
Configuring this is tough because there is split between highly shared (and accessed) objects and lightly accessed objects.
Modification:
There are a number of changes here. In relative order of importance:
API / Functionality changes:
* Max records and max sample records are gone. Only "target" records, the number of records tries to retain is exposed.
* Records are sampled based on the number of already stored records. The likelihood of recording a new sample is `2^(-n)`, where `n` is the number of currently stored elements.
* Records are stored in a concurrent stack structure rather than a list. This avoids a head and tail. Since the stack is only read once, there is no need to maintain head and tail pointers
* The properties of this imply that the very first and very last access are always recorded. When deciding to sample, the top element is replaced rather than pushed.
* Samples that happen between the first and last accesses now have a chance of being recorded. Previously only the final few were kept.
* Sampling is no longer deterministic. Previously, a deterministic access pattern meant that you could conceivably always miss some access points.
* Sampling has a linear ramp for low values and and exponentially backs off roughly equal to 2^n. This means that for 1,000,000 accesses, about 20 will actually be kept. I have an elegant proof for this which is too large to fit in this commit message.
Code changes:
* All locks are gone. Because sampling rarely needs to do a write, there is almost 0 contention. The dropped records counter is slightly contentious, but this could be removed or changed to a LongAdder. This was not done because of memory concerns.
* Stack trace exclusion is done outside of RLD. Classes can opt to remove some of their methods.
* Stack trace exclusion is faster, since it uses String.equals, often getting a pointer compare due to interning. Previously it used contains()
* Leak printing is outputted fairly differently. I tried to preserve as much of the original formatting as possible, but some things didn't make sense to keep.
Result:
More useful leak reporting.
Faster:
```
Before:
Benchmark (recordTimes) Mode Cnt Score Error Units
ResourceLeakDetectorRecordBenchmark.record 8 thrpt 20 136293.404 ± 7669.454 ops/s
ResourceLeakDetectorRecordBenchmark.record 16 thrpt 20 72805.720 ± 3710.864 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 8 thrpt 20 139131.215 ± 4882.751 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 16 thrpt 20 74146.313 ± 4999.246 ops/s
After:
Benchmark (recordTimes) Mode Cnt Score Error Units
ResourceLeakDetectorRecordBenchmark.record 8 thrpt 20 155281.969 ± 5301.399 ops/s
ResourceLeakDetectorRecordBenchmark.record 16 thrpt 20 77866.239 ± 3821.054 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 8 thrpt 20 153360.036 ± 8611.353 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 16 thrpt 20 78670.804 ± 2399.149 ops/s
```
Motivation:
There are 2 motivations, the first depends on the second:
Loading Netty Epoll statically stopped working in 4.1.16, due to
`Native` always loading the arch specific shared object. In a
static binary, there is no arch specific SO.
Second, there are a ton of exceptions that can happen when loading
a native library. When loading native code, Netty tries a bunch of
different paths but a failure in any given may not be fatal.
Additionally: turning on debug logging is not always feasible so
exceptions get silently swallowed.
Modifications:
* Change Epoll and Kqueue to try the static load second
* Modify NativeLibraryLoader to record all the locations where
exceptions occur.
* Attempt to use `addSuppressed` from Java 7 if available.
Alternatives Considered:
An alternative would be to record log messages at each failure. If
all load attempts fail, the log messages are printed as warning,
else as debug. The problem with this is there is no `LogRecord` to
create like in java.util.logging. Buffering the args to
logger.log() at the end of the method loses the call site, and
changes the order of events to be confusing.
Another alternative is to teach NativeLibraryLoader about loading
the SO first, and then the static version. This would consolidate
the code fore Epoll, Kqueue, and TCNative. I think this is the
long term better option, but this PR is changing a lot already.
Someone else can take a crack at it later
Results:
Epoll Still Loads and easier debugging.
Motivation:
Highly retained and released objects have contention on their ref
count. Currently, the ref count is updated using compareAndSet
with care to make sure the count doesn't overflow, double free, or
revive the object.
Profiling has shown that a non trivial (~1%) of CPU time on gRPC
latency benchmarks is from the ref count updating.
Modification:
Rather than pessimistically assuming the ref count will be invalid,
optimistically update it assuming it will be. If the update was
wrong, then use the slow path to revert the change and throw an
execption. Most of the time, the ref counts are correct.
This changes from using compareAndSet to getAndAdd, which emits a
different CPU instruction on x86 (CMPXCHG to XADD). Because the
CPU knows it will modifiy the memory, it can avoid contention.
On a highly contended machine, this can be about 2x faster.
There is a downside to the new approach. The ref counters can
temporarily enter invalid states if over retained or over released.
The code does handle these overflow and underflow scenarios, but it
is possible that another concurrent access may push the failure to
a different location. For example:
Time 1 Thread 1: obj.retain(INT_MAX - 1)
Time 2 Thread 1: obj.retain(2)
Time 2 Thread 2: obj.retain(1)
Previously Thread 2 would always succeed and Thread 1 would always
fail on the second access. Now, thread 2 could fail while thread 1
is rolling back its change.
====
There are a few reasons why I think this is okay:
1. Buggy code is going to have bugs. An exception _is_ going to be
thrown. This just causes the other threads to notice the state
is messed up and stop early.
2. If high retention counts are a use case, then ref count should
be a long rather than an int.
3. The critical section is greatly reduced compared to the previous
version, so the likelihood of this happening is lower
4. On error, the code always rollsback the change atomically, so
there is no possibility of corruption.
Result:
Faster refcounting
```
BEFORE:
Benchmark (delay) Mode Cnt Score Error Units
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 1 sample 2901361 804.579 ± 1.835 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 10 sample 3038729 785.376 ± 16.471 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 100 sample 2899401 817.392 ± 6.668 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 1000 sample 3650566 2077.700 ± 0.600 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 10000 sample 3005467 19949.334 ± 4.243 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 1 sample 456091 48.610 ± 1.162 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 10 sample 732051 62.599 ± 0.815 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 100 sample 778925 228.629 ± 1.205 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 1000 sample 633682 2002.987 ± 2.856 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 10000 sample 506442 19735.345 ± 12.312 ns/op
AFTER:
Benchmark (delay) Mode Cnt Score Error Units
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 1 sample 3761980 383.436 ± 1.315 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 10 sample 3667304 474.429 ± 1.101 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 100 sample 3039374 479.267 ± 0.435 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 1000 sample 3709210 2044.603 ± 0.989 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 10000 sample 3011591 19904.227 ± 18.025 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 1 sample 494975 52.269 ± 8.345 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 10 sample 771094 62.290 ± 0.795 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 100 sample 763230 235.044 ± 1.552 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 1000 sample 634037 2006.578 ± 3.574 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 10000 sample 506284 19742.605 ± 13.729 ns/op
```
Motivation:
3c8c7fc7e9 introduced some changes to the ResourceLeakDetector that introduced a regression and so would always log that paranoid leak detection should be enabled even it was already.
Modifications:
Correctly not clear the recorded stacktraces when we process the reference queue so we can log these.
Result:
ResourceLeakDetector works again as expected.
Motivation:
NativeLibraryLoader uses ClassLoader#getResource method that can return nulls when the resource cannot be found. The returned url variable should be checked for nullity and fail in a more usable manner than a NullPointerException
Modifications:
Fail with a FileNotFoundException
Result:
Fixes [#7222].
Motiviation:
The ResourceLeakDetector helps to detect and troubleshoot resource leaks and is often used even in production enviroments with a low level. Because of this its import that we try to keep the overhead as low as overhead. Most of the times no leak is detected (as all is correctly handled) so we should keep the overhead for this case as low as possible.
Modifications:
- Only call getStackTrace() if a leak is reported as it is a very expensive native call. Also handle the filtering and creating of the String in a lazy fashion
- Remove the need to mantain a Queue to store the last access records
- Add benchmark
Result:
Huge decrease of performance overhead.
Before the patch:
Benchmark (recordTimes) Mode Cnt Score Error Units
ResourceLeakDetectorRecordBenchmark.record 8 thrpt 20 4358.367 ± 116.419 ops/s
ResourceLeakDetectorRecordBenchmark.record 16 thrpt 20 2306.027 ± 55.044 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 8 thrpt 20 4220.979 ± 114.046 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 16 thrpt 20 2250.734 ± 55.352 ops/s
With this patch:
Benchmark (recordTimes) Mode Cnt Score Error Units
ResourceLeakDetectorRecordBenchmark.record 8 thrpt 20 71398.957 ± 2695.925 ops/s
ResourceLeakDetectorRecordBenchmark.record 16 thrpt 20 38643.963 ± 1446.694 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 8 thrpt 20 71677.882 ± 2923.622 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 16 thrpt 20 38660.176 ± 1467.732 ops/s
Motivation:
RLD allocates an ArrayDeque in anticipation of recording access
points. If the leak detection level is less than ADVANCED though,
the dequeue is never used. Since SIMPLE is the default level,
there is a minor perf win to not preemptively allocate it.
This showed up in garbage profiling when creation a high number of
buffers.
Modifications:
Only allocate the dequeue if it will be used.
Result:
Less garbage created.
Motivation:
NativeLibraryLoader may only log a debug statement if the library is successfully loaded from java.library.path, but will log failure statements the if load for java.library.path fails which can mislead users to believe the load actually failed when it may have succeeded.
Modifications:
- Always load a debug statement when a library was successfully loaded
Result:
NativeLibraryLoader log statements more clear.
Motivation:
When Log4j2Logger is used with PatternLayout (%F:%L)%c.%M, the log message incorrect shows:
(Log4J2Logger.java:73)io.netty.util.internal.PlatformDependent0.debug ....
Modification:
Extend AbstractLogger
Result:
Fixes [#7186].
Motivation:
We should only try to load the native artifacts if the architecture we are currently running on is the same as the one the native libraries were compiled for.
Modifications:
Include architecture in native lib name and append the current arch when trying to load these. This will fail then if its not the same as the arch of the compiled arch.
Result:
Fixes [#7150].
Motivation:
When the user want to have the direct memory explicitly managed by the GC (just as java.nio does) it is useful to be able to construct an UnpooledByteBufAllocator that allows this without the chances to see any memory leak.
Modifications:
Allow to explicitly disable the usage of reflection to construct direct ByteBufs and so be sure these will be collected by GC.
Result:
More flexible way to use the UnpooledByteBufAllocator.
Motivation:
The documentation for field updates says:
> Note that the guarantees of the {@code compareAndSet}
> method in this class are weaker than in other atomic classes.
> Because this class cannot ensure that all uses of the field
> are appropriate for purposes of atomic access, it can
> guarantee atomicity only with respect to other invocations of
> {@code compareAndSet} and {@code set} on the same updater.
This implies that volatiles shouldn't use normal assignment; the
updater should set them.
Modifications:
Use setter for field updaters that make use of compareAndSet.
Result:
Concurrency compliant code
Motivation:
PD and PD0 Both try to find and use Unsafe. If unavailable, they
try to log why and continue on. However, it is not always east to
enable this logging. Chaining exceptions together is much easier
to reach, and the original exception is relevant when Unsafe is
needed.
Modifications:
* Make PD log why PD0 could not be loaded with a trace level log
* Make PD0 remember why Unsafe wasn't available
* Expose unavailability cause through PD for higher level use.
* Make Epoll and KQueue include the reason when failing
Result:
Easier debugging in hard to reconfigure environments
Motivation:
Continuing to make netty happy when compiling through errorprone.
Modification:
Mostly comments, some minor switch statement changes.
Result:
No more compiler errors!
Motivation:
On restricted systems (e.g. grsecurity), it might not be possible to write a .so on disk and load it afterwards. On those system Netty should check java.library.path for libraries to load.
Modifications:
Changed NativeLibraryLoader.java to first try to load libs from java.library.path before exporting the .so to disk.
Result:
Libraries load fine on restricted systems.
Motivation:
When compiling this code and running it through errorprone[1], this message appears:
```
StringUtil.java:493: error: [FallThrough] Switch case may fall through; add a `// fall through` comment if it was deliberate
case LINE_FEED:
^
(see http://errorprone.info/bugpattern/FallThrough)
```
By adding that comment, it silences the error and also makes clear the intention of that statement.
[1]http://errorprone.info/index
Modification:
Add simple comment.
Result:
Errorprone is happier with the code.
Motivation:
The `AsciiString#toString` method calculate string value and cache it into field. If an `AsciiString` created from the `String` value, we can avoid rebuilding strings if we cache them immediately when creating `AsciiString`. It would be useful for constants strings, which already stored in the JVMs string table, or in cases where an unavoidable `#toString `method call is assumed.
Modifications:
- Add new static method `AsciiString#cache(String)` which save string value into cache field.
- Apply a "benign" data race in the `#hashCode` and `#toString` methods.
Result:
Less memory usage in some `AsciiString` use cases.
Motivation:
At the moment we try to load the library using multiple names which includes names using - but also _ . We should just use _ all the time.
Modifications:
Replace - with _
Result:
Fixes [#7069]
Motivation:
In some cases of using an `InternalThreadLocalMap#stringBuilder`, the `StringBuilder`s size can often exceed the exist limit (1024 bytes). This can lead to permanent memory reallocation.
Modifications:
Add custom properties for the initial capacity and maximum size (after which the `StringBuilder`s capacity will be reduced to the initial capacity).
Result:
An `InternalThreadLocalMap#stringBuilder`s initial and max size is configurable. Fixes [#7092].
Motivation:
NativeLibraryLoader has some code-duplication that can be removed.
Modifications:
Remove duplicated code and just use provided methods of PlatformDependent.
Result:
Less code duplication, fixes [#3756].
Motivation:
Now that the NativeLibraryLoader implicitly detects the shaded package prefix we no longer need the io.netty.packagePrefix system property.
Modifications:
- Remove io.netty.packagePrefix processing from NativeLibraryLoader
Result:
Code is cleaner.
Motivation:
Shading requires renaming binary components (.so, .dll; for tcnative,
epoll, etc). But the rename then requires setting the
io.netty.packagePrefix system property on the command line or runtime,
which is either a burden or not feasible.
If you don't rename the binary components everything appears to
work, until a dependency on a second version of the binary component is
added. At that point, only one version of the binary will be loaded...
which is what shading is supposed to prevent. So for valid shading, the
binaries must be renamed.
Modifications:
Automatically detect the package prefix by comparing the actual class
name to the non-shaded expected class name. The expected class name must
be obfuscated to prevent shading utilities from changing it.
Result:
When shading and using binary components, runtime configuration is no
longer necessary.
Pre-existing shading users that were not renaming the binary components
will break, because the packagePrefix previously defaulted to "". Since
these pre-existing users had broken configurations that only _appeared_
to work, this breakage is considered a Good Thing. Users may workaround
this breakage temporarily by setting -Dio.netty.packagePrefix= to
restore packagePrefix to "".
Fixes#6963
Motivation:
ResourceLeakDetector records at most MAX_RECORDS+1 records
Modifications:
Make room before add to lastRecords
Result:
ResourceLeakDetector will record at most MAX_RECORDS records
Motivation:
JCTools 2.0.2 provides an unbounded MPSC linked queue. Before we shaded JCTools we had our own unbounded MPSC linked queue and used it in various places but gave this up because there was no public equivalent available in JCTools at the time.
Modifications:
- Use JCTool's MPSC linked queue when no upper bound is specified
Result:
Fixes https://github.com/netty/netty/issues/5951
Motivation:
PR #6811 introduced a public utility methods to decode hex dump and its parts, but they are not visible from netty-common.
Modifications:
1. Move the `decodeHexByte`, `decodeHexDump` and `decodeHexNibble` methods into `StringUtils`.
2. Apply these methods where applicable.
3. Remove similar methods from other locations (e.g. `HpackHex` test class).
Result:
Less code duplication.
Motivation:
A `StringUtil#escapeCsv` creates new `StringBuilder` on each value even if the same string is returned in the end.
Modifications:
Create new `StringBuilder` only if it really needed. Otherwise, return the original string (or just trimmed substring).
Result:
Less GC load. Up to 4x faster work for not changed strings.
Motivation:
AppendableCharSequence depends upon IndexOutOfBoundsException to trigger a resize operation under the assumption that the resize operation will be rare if the initial size guess is good. However if the initial size guess is not good then the performance will be more unpredictable and likely suffer.
Modifications:
- Check the position in AppendableCharSequence#append to determine if a resize is necessary
Result:
More predictable performance in AppendableCharSequence#append.
Motivation:
We need to use FQCN to prevent classloader issues for classes that are > Java6. This is a cleanup of ed5fcbb773.
Modifications:
Just remove the imports and use FQCN.
Result:
No classloader issues with java6
Motivation:
Docker's `--tmpfs` flag mounts the temp volume with `noexec` by default,
resulting in an UnsatisfiedLinkError. While this is good security
practice, it is a surprising failure from a seemingly innocuous flag.
Modifications:
Add a best-effort attempt in `NativeLibraryLoader` to detect when temp
files beng loaded cannot be executed even when execution permissions
are set, often because the `noexec` flag is set on the volume.
Requires numerous additional exclusions to the Animal Sniffer config
for Java7 POSIX permissions manipulation.
Result:
Fixes [#6678].