Motivation:
When a buffer is over-released, the current error message of `IllegalReferenceCountException` is `refCnt: XXX, increment: XXX`, which is confusing. The correct message should be `refCnt: XXX, decrement: XXX`.
Modifications:
Pass `-decrement` to create `IllegalReferenceCountException`.
Result:
The error message will be `refCnt: XXX, decrement: XXX` when a buffer is over-released.
Motivation:
On z/OS netty initializes this value with 64M, even the direct accessible memory is actually unbounded.
Modifications:
Skip usage of VM.maxDirectMemory() on z/OS.
Result:
More correct direct memory limit detection. Fixes https://github.com/netty/netty/issues/7654.
Motivation:
Using a very huge delay when calling schedule(...) may cause an Selector error when calling select(...) later on. We should gaurd against such a big value.
Modifications:
- Add guard against a very huge value.
- Added tests.
Result:
Fixes [#7365]
Motivation:
A FastThreadLocal instance currently occupies 2 slots of InternalThreadLocalMap of every thread. Actually for a FastThreadLocalThread, it does not need to store cleaner flags of FastThreadLocal instances. Besides, using BitSet to store cleaner flags is also better for space usage.
Modification:
Use BitSet to optimize space usage of FastThreadLocal.
Result:
Avoid unnecessary slot occupancy. Cleaner flags are now stored into a BitSet.
Motivation:
When trying to cleanup WeakOrderQueue by the ObjectCleaner we end up in an endless loop which will cause the ObjectCleaner to be not able to cleanup any other resources anymore.This bug was introduced by 6eb9674bf5.
Modifications:
Correctly update link while cleanup
Result:
Fixes https://github.com/netty/netty/issues/7877
Motivation:
The bounds checking for AsciiString#indexOf and AsciiString#lastIndexOf is not correct and may lead to ArrayIndexOutOfBoundsException.
Modifications:
- Correct the bounds checking for AsciiString#indexOf and AsciiString#lastIndexOf
Result
Fixes https://github.com/netty/netty/issues/7863
Motivation:
Finer granularity when configuring CorsHandler, enabling different policies for different origins.
Modifications:
The CorsHandler has an extra constructor that accepts a List<CorsConfig> that are evaluated sequentially when processing a Cors request
Result:
The changes don't break backwards compatibility. The extra ctor can be used to provide more than one CorsConfig object.
Motivation:
We need to add a dev-tools dependecy for commons as otherwise we may fail to fetch it before we try to use it.
Modifications:
Add dependency.
Result:
Fixes https://github.com/netty/netty/issues/7842
Motivation:
Minor performance optimisation that prevents thread from blocking due to task not having been added to queue. Discussed #7815.
Modification:
add task to the queue before starting the thread.
Result:
No additional tests.
* NetUtil valid IP methods to accept CharSequence
Motivation:
NetUtil has methods to determine if a String is a valid IP address. These methods don't rely upon String specific methods and can use CharSequence instead.
Modifications:
- Use CharSequence instead of String for the IP validator methods.
- Avoid object allocation in AsciiString#indexOf(char,int) and reduce
byte code
Result:
No more copy operation required if a CharSequence exists.
Motivation:
HttpProxyHandler uses `NetUtil#toSocketAddressString` to compute
CONNECT url and Host header.
The url is correct when the address is unresolved, as
`NetUtil#toSocketAddressString` will then use
`getHoststring`/`getHostname`. If the address is already resolved, the
url will be based on the IP instead of the hostname.
There’s an additional minor issue with the Host header: default port
443 should be omitted.
Modifications:
* Introduce NetUtil#getHostname
* Introduce HttpUtil#formatHostnameForHttp to format an
InetSocketAddress to
HTTP format
* Change url computation to favor hostname instead of IP
* Introduce HttpProxyHandler ignoreDefaultPortsInConnectHostHeader
parameter to ignore 80 and 443 ports in Host header
Result:
HttpProxyHandler performs properly when connecting to a resolved address
Motivation:
We missed to correctly record the stacktrace of the creation of an ResourceLeak record. This could either have the effect to log the wrote stacktrace for creation or not log a stacktrace at all if the object was dropped on the floor after it was created.
Modifications:
Correctly create a Record on creation of the object.
Result:
Fixes https://github.com/netty/netty/issues/7781.
Motivation:
We recently introduced ObjectCleaner which can be used to ensure some cleanup action is done once an object becomes weakable reachable. We should use this in Recycler.WeakOrderQueue to reduce the overhead of using a finalizer() (which will cause the GC to process it two times).
Modifications:
Replace finalizer() usage with ObjectCleaner
Result:
Fixes [#7343]
Motivation:
We dont protect from overflow and so the timer may fire too early if a large timeout is used.
Modifications:
Add overflow guard and a test.
Result:
Fixes https://github.com/netty/netty/issues/7760.
Motivation:
It is not clear why Unsafe is unavailable when it is explicitly
disabled, or when Netty thinks it is running on Android.
Modification:
Change the "has" fields and methods to be causes. A null cause
means Unsafe is present. This catches all possible reason why
Unsafe might not be available.
Result:
Easier to debug Netty start up when logging cannot be turned on.
Motivation:
DefaultPromise's internal state depends upon specific Signal objects. These Signal objects can be used externally which causes the DefaultPromise object API to not function correct and state to become corrupted.
Modifications:
- DefaultPromise shouldn't depend upon Signal for its internal state
Result:
Fixes https://github.com/netty/netty/issues/7707
Motivation:
The Recycler currently retains 32k objects per thread by default. The Recycler is used in more than just one place and may result in large amounts of memory bloat if spikes of traffic are observed.
Modifications:
- Reduce the Recyclers default capacity from 32k to 4k.
Result:
- Lower default capacity of the Recycler and less memory retained.
Motivation:
Some java binaries include android classes on their classpath, even
if they aren't actually android. When this is true, `Unsafe` no
longer works, disabling the Epoll functionality. A sample case is
for binaries that use the j2objc library.
Modifications:
Check the `java.vm.name` instead of the classpath. Numerous
Google-internal Android libraries / binaries check this property
rather than the class path.
It is believed this is safe and works with bother ART and Dalvik
VMs, safe for Robolectric, and j2objc.
Results:
Unusually built java server binaries can still use Netty Epoll.
Motivation:
Currently if user call set/remove/set/remove many times, it will create multiple cleaner task for same index. It may cause OOM due to long live thread will have more and more task in LIVE_SET.
Modification:
Add flag to avoid recreating tasks.
Result:
Only create 1 clean task. But use more space of indexedVariables.
Motivation:
Reflective setAccessible(true) will produce scary warnings on the console when using java9+, while netty still works. That said users may feel uncomfortable with these warnings, we should not try to do it by default when using java9+.
Modifications:
Add io.netty.tryReflectionSetAccessible system property which controls if setAccessible(...) will be used. By default it will bet set to false when using java9+.
Result:
Fixes [#7254].
Motivation:
The methods implement io.netty.util.concurrent.Future#cancel(boolean mayInterruptIfRunning) which actually ignored the param mayInterruptIfRunning.We need to add comments for the `mayInterruptIfRunning` param.
Modifications:
Add comments for the `mayInterruptIfRunning` param.
Result:
People who call the `cancel` method will be more clear about the effect of `mayInterruptIfRunning` param.
Motivation:
The ObjectCleanerThread must be a daemon thread as otherwise we may block the JVM from exit. By using a daemon thread we basically give the same garantees as the JVM when it comes to cleanup of resources (as the GC threads are also daemon threads and the CleanerImpl uses a deamon thread as well in Java9+).
Modifications:
Change ObjectCleanThread to be a daemon thread.
Result:
JVM shutdown will always be able to complete. Fixed [#7617].
Motivation:
In environments with a security manager, the reflective access to get the reference to
Throwable#addSuppressed can cause issues that result in Netty failing to load. The main
motivation in this pull request is to remove the use of reflection to prevent issues in
these environments.
Modifications:
ThrowableUtil no longer uses Class#getDeclaredMembers to get the Method that references
Throwable#addSuppressed and instead guards the call to Throwable#addSuppressed with a
Java version check.
Additionally, a annotation was added that suppresses the animal sniffer java16 signature
check on the given method. The benefit of the annotation is that it limits the exclusion
of Throwable to just the ThrowableUtil class and has string text indicating the reason
for suppressing the java16 signature check.
Result:
Netty no longer requires the use of Class#getDeclaredMethod for ThrowableUtil and will
work in environments restricted by a security manager without needing to grant reflection
permissions.
Fixes#7614
Motivation:
In a few classes, Netty starts a thread and then sets the context classloader of these threads
to prevent classloader leaks. The Thread#setContextClassLoader method is a privileged method in
that it requires permissions to be executed when there is a security manager in place. Unless
these calls are wrapped in a doPrivileged block, they will fail in an environment with a security
manager and restrictive policy in place.
Modifications:
Wrap the calls to Thread#setContextClassLoader in a AccessController#doPrivileged block.
Result:
After this change, the threads can set the context classloader without any errors in an
environment with a security manager and restrictive policy in place.
Motivation:
Usages of HttpResponseStatus may result in more object allocation then necessary due to not looking for cached objects and the AsciiString parsing method not being used due to CharSequence method being used instead.
Modifications:
- HttpResponseDecoder should attempt to get the HttpResponseStatus from cache instead of allocating a new object
- HttpResponseStatus#parseLine(CharSequence) should check if the type is AsciiString and redirect to the AsciiString parsing method which may not require an additional toString call
- HttpResponseStatus#parseLine(AsciiString) can be optimized and doesn't require and may not require object allocation
Result:
Less allocations when dealing with HttpResponseStatus.
Motivation:
ObjectCleaner inovkes a Runnable which may execute user code (FastThreadLocal#onRemoval) and therefore exceptions maybe thrown. If an exception is thrown the cleanup thread will exit prematurely and we may never finish cleaning up which will result in leaks.
Modifications:
- ObjectCleaner should suppress exceptions and continue cleaning
Result:
ObjectCleaner will reliably clean despite exceptions being thrown.
Motivation:
ObjectCleaner polls a ReferenceQueue which will block indefinitely. However it is possible there is a race condition between the live set of objects being empty due to the WeakReference being cleaned/cleared and polling the queue. If this situation occurs the cleanup thread may never unblock if no more objects are added to the live set, and may result in an application's failure to gracefully close.
Modifications:
- ReferenceQueue.remove should use a timeout to compensate for the race condition, and avoid dead lock
Result:
No more dead lock in ObjectCleaner when polling the ReferenceQueue.
Motivation:
FastThreadLocal#set calls isIndexedVariableSet to determine if we need to register with the cleaner, but the set(InternalThreadLocalMap, V) method will also internally do this check so we can share code and only do the check a single time.
Modifications:
- extract code from set(InternalThreadLocalMap, V) so it can be called externally to determine if a new item was created
Result:
Less code duplication in FastThreadLocal#set.
Motivation:
e329ca1 introduced the user of ObjectCleaner in FastThreadLocal but we missed the case to register our cleaner task if FastThreadLocal.set was called only.
Modifications:
- Use ObjectCleaner also when FastThreadLocal.set is used.
- Add test case.
Result:
ObjectCleaner is always used.
Motivation:
Allow pre-computing calculation of the constants for compiler where it could be.
Similar fix in OpenJDK: [1].
Modifications:
- Use parentheses.
- Simplify static initialization of `BYTE2HEX_*` arrays in `StringUtil`.
Result:
Less bytecode, possible faster calculations at runtime.
[1] https://bugs.openjdk.java.net/browse/JDK-4477961
Motivation:
There is no guarantee that FastThreadLocal.onRemoval(...) is called if the FastThreadLocal is used by "non" FastThreacLocalThreads. This can lead to all sort of problems, like for example memory leaks as direct memory is not correctly cleaned up etc.
Beside this we use ThreadDeathWatcher to check if we need to release buffers back to the pool when thread local caches are collected. In the past ThreadDeathWatcher was used which will need to "wakeup" every second to check if the registered Threads are still alive. If we can ensure FastThreadLocal.onRemoval(...) is called we do not need this anymore.
Modifications:
- Introduce ObjectCleaner and use it to ensure FastThreadLocal.onRemoval(...) is always called when a Thread is collected.
- Deprecate ThreadDeathWatcher
- Add unit tests.
Result:
Consistent way of cleanup FastThreadLocals when a Thread is collected.
Motivation:
We should remove the WeakOrderedQueue from the WeakHashMap directly if possible and only depend on the semantics of the WeakHashMap if there is no other way for us to cleanup it.
Modifications:
Override onRemoval(...) to remove the WeakOrderedQueue if possible.
Result:
Less overhead and quicker collection of WeakOrderedQueue for some cases.
Motivation:
When doStartThread throws an exception, e.g. due to the actual executor being depleted of threads and throwing in its rejected execution handler, the STEE ends up in started state anyway. If we try to execute another task in this executor, it will be queued but the thread won't be started anymore and the task will linger forever.
Modifications:
- Ensure we not update the internal state if the startThread() method throws.
- Add testcase
Result:
Fixes [#7483]
Motivation:
In our Recycler implementation we store a reference to the current Thread in the Stack that is stored in a FastThreadLocal. The Stack itself is referenced in the DefaultHandle itself. A problem can arise if a user stores a Reference to an Object that holds a reference to the DefaultHandle somewhere and either not remove the reference at all or remove it very late. In this case the Thread itself can not be collected as its still referenced in the Stack that is referenced by the DefaultHandle.
Modifications:
- Use a WeakReference to store the reference to the Thread in the Stack
- Add a test case
Result:
Ensure a Thread can be collected in a timely manner in all cases even if it used the Recycler.
Motivation:
ThreadDeathWatcher and GlobalEventExecutor may create and start a new thread from various other threads and so inherit the classloader. We need to ensure we not inherit to allow recycling the classloader.
Modifications:
Use Thread.setContextClassLoader(null) to ensure we not hold a strong reference to the classloader and so not leak it.
Result:
Fixes [#7290].
Automatic-Module-Name entry provides a stable JDK9 module name, when Netty is used in a modular JDK9 applications. More info: http://blog.joda.org/2017/05/java-se-9-jpms-automatic-modules.html
When Netty migrates to JDK9 in the future, the entry can be replaced by actual module-info descriptor.
Modification:
The POM-s are configured to put the correct module names to the manifest.
Result:
Fixes#7218.
Motivation:
We dont need to use the ThreadDeathWatcher if we use a FastThreadLocalThread for which we wrap the Runnable and ensure we call FastThreadLocal.removeAll() once the Runnable completes.
Modifications:
- Dont use a ThreadDeathWatcher if we are sure we will call FastThreadLocal.removeAll()
- Add unit test.
Result:
Less overhead / running theads if you only allocate / deallocate from FastThreadLocalThreads.
Motivation:
OSGI and other enviroments may not allow to even load Unsafe which will lead to an NoClassDefFoundError when trying to access it. We should guard against this.
Modifications:
Catch NoClassDefFoundError when trying to load Unsafe.
Result:
Be able to use netty with a strict OSGI config.
Motivation:
When system property is empty, the default value should be used.
Modification:
- Correctly use the default value in all cases
- Add unit tests
Result:
Correct behaviour
Motivation:
In order to determine if a header contains a value we currently rely
upon getAll(..) and regular expressions. This operation is commonly used
during the encode and decode stage to determine the transfer encoding
(e.g. HttpUtil#isTransferEncodingChunked). This operation requires an
intermediate collection and possibly regular expressions for the
CombinedHttpHeaders use case which can be expensive.
Modifications:
- Add a valuesIterator to HttpHeaders and specializations of this method
for DefaultHttpHeaders, ReadOnlyHttpHeaders, and CombinedHttpHeaders.
Result:
Less intermediate collections and allocation overhead when determining
if HttpHeaders contains a name/value pair.
Motivation:
Netty could handle "connection" or "te" headers more gently when
converting from http/1.1 to http/2 headers. Http/2 headers don't
support single-hop headers, so when we convert from http/1.1 to http/2,
we should drop all single-hop headers. This includes headers like
"transfer-encoding" and "connection", but also the headers that
"connection" points to, since "connection" can be used to designate
other headers as single-hop headers. For the "te" header, we can more
permissively convert it by just dropping non-conforming headers (ie
non-"trailers" headers) which is what we do for all other headers when
we convert.
Modifications:
Add a new blacklist to the http/1.1 to http/2 conversion, which is
constructed from the values of the "connection" header, and stop
throwing an exception when a "te" header is passed with a non-"trailers"
value. Instead, drop all values except for "trailers". Add unit tests
for "connection" and "te" headers when converting from http/1.1 to http/2.
Result:
This will improve the h2c upgrade request, and also conversions from
http/1.1 to http/2. This will simplify implementing spec-compliant
http/2 servers that want to share code between their http/1.1 and http/2
implementations.
[Fixes#7355]
Motivation:
`AbstractScheduledEventExecutor` uses a standard `java.util.PriorityQueue` to keep track of task deadlines. `ScheduledFuture.cancel` removes tasks from this `PriorityQueue`. Unfortunately, `PriorityQueue.remove` has `O(n)` performance since it must search for the item in the entire queue before removing it. This is fast when the future is at the front of the queue (e.g., already triggered) but not when it's randomly located in the queue.
Many servers will use `ScheduledFuture.cancel` on all requests, e.g., to manage a request timeout. As these cancellations will be happen in arbitrary order, when there are many scheduled futures, `PriorityQueue.remove` is a bottleneck and greatly hurts performance with many concurrent requests (>10K).
Modification:
Use netty's `DefaultPriorityQueue` for scheduling futures instead of the JDK. `DefaultPriorityQueue` is almost identical to the JDK version except it is able to remove futures without searching for them in the queue. This means `DefaultPriorityQueue.remove` has `O(log n)` performance.
Result:
Before - cancelling futures has varying performance, capped at `O(n)`
After - cancelling futures has stable performance, capped at `O(log n)`
Benchmark results
After - cancelling in order and in reverse order have similar performance within `O(log n)` bounds
```
Benchmark (num) Mode Cnt Score Error Units
ScheduledFutureTaskBenchmark.cancelInOrder 100 thrpt 20 137779.616 ± 7709.751 ops/s
ScheduledFutureTaskBenchmark.cancelInOrder 1000 thrpt 20 11049.448 ± 385.832 ops/s
ScheduledFutureTaskBenchmark.cancelInOrder 10000 thrpt 20 943.294 ± 12.391 ops/s
ScheduledFutureTaskBenchmark.cancelInOrder 100000 thrpt 20 64.210 ± 1.824 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 100 thrpt 20 167531.096 ± 9187.865 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 1000 thrpt 20 33019.786 ± 4737.770 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 10000 thrpt 20 2976.955 ± 248.555 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 100000 thrpt 20 362.654 ± 45.716 ops/s
```
Before - cancelling in order and in reverse order have significantly different performance at higher queue size, orders of magnitude worse than the new implementation.
```
Benchmark (num) Mode Cnt Score Error Units
ScheduledFutureTaskBenchmark.cancelInOrder 100 thrpt 20 139968.586 ± 12951.333 ops/s
ScheduledFutureTaskBenchmark.cancelInOrder 1000 thrpt 20 12274.420 ± 337.800 ops/s
ScheduledFutureTaskBenchmark.cancelInOrder 10000 thrpt 20 958.168 ± 15.350 ops/s
ScheduledFutureTaskBenchmark.cancelInOrder 100000 thrpt 20 53.381 ± 13.981 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 100 thrpt 20 123918.829 ± 3642.517 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 1000 thrpt 20 5099.810 ± 206.992 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 10000 thrpt 20 72.335 ± 0.443 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 100000 thrpt 20 0.743 ± 0.003 ops/s
```
Motivation:
When looking for a leak, its nice to be able to request at least a
number of leaks.
Modification:
* Made all leak records up to the target amoutn recorded, and only
then enable backing off.
* Enable recording more than 32 elements. Previously the shift
amount made this impossible.
Result:
Ability to record all accesses.
Motivation:
Fix NullPointerExceptions that occur when running netty-tcnative inside the bootstrap class loader.
Modifications:
- Replace loader.getResource(...) with ClassLoader.getSystemResource(...) when loader is null.
- Replace loader.loadClass(...) with Class.forName(..., false, loader) which works when loader is both null and non-null.
Result:
Support running native libs in bootstrap class loader
Motivation:
Phantom references are for cleaning up resources that were
forgotten, which means they keep their referent alive. This
means garbage is kept around until the refqueue is drained, rather
than when the reference is unreachable.
Modification:
Use Weak References instead of Phantoms
Result:
More punctual leak detection.
Motivation:
Objects of java.util.TreeMap or java.util.TreeSet will become
non-Serializable if instantiated with Comparators, which are not also
Serializable. This can result in unexpected and difficult-to-diagnose
bugs.
Modifications:
Implements Serializable for all classes, which implements Comparator.
Result:
Proper Comparators which will not force collections to
non-Serializable mode.
Motivation:
Even if it's a super micro-optimization (most JVM could optimize such
cases in runtime), in theory (and according to some perf tests) it
may help a bit. It also makes a code more clear and allows you to
access such methods in the test scope directly, without instance of
the class.
Modifications:
Add 'static' modifier for all methods, where it possible. Mostly in
test scope.
Result:
Cleaner code with proper 'static' modifiers.
Motivation:
Without a 'serialVersionUID' field, any change to a class will make
previously serialized versions unreadable.
Modifications:
Add missed 'serialVersionUID' field for all Serializable
classes.
Result:
Proper deserialization of previously serialized objects.
Configuring this is tough because there is split between highly shared (and accessed) objects and lightly accessed objects.
Modification:
There are a number of changes here. In relative order of importance:
API / Functionality changes:
* Max records and max sample records are gone. Only "target" records, the number of records tries to retain is exposed.
* Records are sampled based on the number of already stored records. The likelihood of recording a new sample is `2^(-n)`, where `n` is the number of currently stored elements.
* Records are stored in a concurrent stack structure rather than a list. This avoids a head and tail. Since the stack is only read once, there is no need to maintain head and tail pointers
* The properties of this imply that the very first and very last access are always recorded. When deciding to sample, the top element is replaced rather than pushed.
* Samples that happen between the first and last accesses now have a chance of being recorded. Previously only the final few were kept.
* Sampling is no longer deterministic. Previously, a deterministic access pattern meant that you could conceivably always miss some access points.
* Sampling has a linear ramp for low values and and exponentially backs off roughly equal to 2^n. This means that for 1,000,000 accesses, about 20 will actually be kept. I have an elegant proof for this which is too large to fit in this commit message.
Code changes:
* All locks are gone. Because sampling rarely needs to do a write, there is almost 0 contention. The dropped records counter is slightly contentious, but this could be removed or changed to a LongAdder. This was not done because of memory concerns.
* Stack trace exclusion is done outside of RLD. Classes can opt to remove some of their methods.
* Stack trace exclusion is faster, since it uses String.equals, often getting a pointer compare due to interning. Previously it used contains()
* Leak printing is outputted fairly differently. I tried to preserve as much of the original formatting as possible, but some things didn't make sense to keep.
Result:
More useful leak reporting.
Faster:
```
Before:
Benchmark (recordTimes) Mode Cnt Score Error Units
ResourceLeakDetectorRecordBenchmark.record 8 thrpt 20 136293.404 ± 7669.454 ops/s
ResourceLeakDetectorRecordBenchmark.record 16 thrpt 20 72805.720 ± 3710.864 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 8 thrpt 20 139131.215 ± 4882.751 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 16 thrpt 20 74146.313 ± 4999.246 ops/s
After:
Benchmark (recordTimes) Mode Cnt Score Error Units
ResourceLeakDetectorRecordBenchmark.record 8 thrpt 20 155281.969 ± 5301.399 ops/s
ResourceLeakDetectorRecordBenchmark.record 16 thrpt 20 77866.239 ± 3821.054 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 8 thrpt 20 153360.036 ± 8611.353 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 16 thrpt 20 78670.804 ± 2399.149 ops/s
```
Motivation:
We need to upgrade our dependencies to versions which use ASM 6.0.0+ to support compiling on java9.
Modifications:
Update animal-sniffer-maven-plugin and maven-shade-plugin.
Result:
Fixes https://github.com/netty/netty/issues/6100
Motivation:
There are 2 motivations, the first depends on the second:
Loading Netty Epoll statically stopped working in 4.1.16, due to
`Native` always loading the arch specific shared object. In a
static binary, there is no arch specific SO.
Second, there are a ton of exceptions that can happen when loading
a native library. When loading native code, Netty tries a bunch of
different paths but a failure in any given may not be fatal.
Additionally: turning on debug logging is not always feasible so
exceptions get silently swallowed.
Modifications:
* Change Epoll and Kqueue to try the static load second
* Modify NativeLibraryLoader to record all the locations where
exceptions occur.
* Attempt to use `addSuppressed` from Java 7 if available.
Alternatives Considered:
An alternative would be to record log messages at each failure. If
all load attempts fail, the log messages are printed as warning,
else as debug. The problem with this is there is no `LogRecord` to
create like in java.util.logging. Buffering the args to
logger.log() at the end of the method loses the call site, and
changes the order of events to be confusing.
Another alternative is to teach NativeLibraryLoader about loading
the SO first, and then the static version. This would consolidate
the code fore Epoll, Kqueue, and TCNative. I think this is the
long term better option, but this PR is changing a lot already.
Someone else can take a crack at it later
Results:
Epoll Still Loads and easier debugging.
Motivation:
Highly retained and released objects have contention on their ref
count. Currently, the ref count is updated using compareAndSet
with care to make sure the count doesn't overflow, double free, or
revive the object.
Profiling has shown that a non trivial (~1%) of CPU time on gRPC
latency benchmarks is from the ref count updating.
Modification:
Rather than pessimistically assuming the ref count will be invalid,
optimistically update it assuming it will be. If the update was
wrong, then use the slow path to revert the change and throw an
execption. Most of the time, the ref counts are correct.
This changes from using compareAndSet to getAndAdd, which emits a
different CPU instruction on x86 (CMPXCHG to XADD). Because the
CPU knows it will modifiy the memory, it can avoid contention.
On a highly contended machine, this can be about 2x faster.
There is a downside to the new approach. The ref counters can
temporarily enter invalid states if over retained or over released.
The code does handle these overflow and underflow scenarios, but it
is possible that another concurrent access may push the failure to
a different location. For example:
Time 1 Thread 1: obj.retain(INT_MAX - 1)
Time 2 Thread 1: obj.retain(2)
Time 2 Thread 2: obj.retain(1)
Previously Thread 2 would always succeed and Thread 1 would always
fail on the second access. Now, thread 2 could fail while thread 1
is rolling back its change.
====
There are a few reasons why I think this is okay:
1. Buggy code is going to have bugs. An exception _is_ going to be
thrown. This just causes the other threads to notice the state
is messed up and stop early.
2. If high retention counts are a use case, then ref count should
be a long rather than an int.
3. The critical section is greatly reduced compared to the previous
version, so the likelihood of this happening is lower
4. On error, the code always rollsback the change atomically, so
there is no possibility of corruption.
Result:
Faster refcounting
```
BEFORE:
Benchmark (delay) Mode Cnt Score Error Units
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 1 sample 2901361 804.579 ± 1.835 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 10 sample 3038729 785.376 ± 16.471 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 100 sample 2899401 817.392 ± 6.668 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 1000 sample 3650566 2077.700 ± 0.600 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 10000 sample 3005467 19949.334 ± 4.243 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 1 sample 456091 48.610 ± 1.162 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 10 sample 732051 62.599 ± 0.815 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 100 sample 778925 228.629 ± 1.205 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 1000 sample 633682 2002.987 ± 2.856 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 10000 sample 506442 19735.345 ± 12.312 ns/op
AFTER:
Benchmark (delay) Mode Cnt Score Error Units
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 1 sample 3761980 383.436 ± 1.315 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 10 sample 3667304 474.429 ± 1.101 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 100 sample 3039374 479.267 ± 0.435 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 1000 sample 3709210 2044.603 ± 0.989 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 10000 sample 3011591 19904.227 ± 18.025 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 1 sample 494975 52.269 ± 8.345 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 10 sample 771094 62.290 ± 0.795 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 100 sample 763230 235.044 ± 1.552 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 1000 sample 634037 2006.578 ± 3.574 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 10000 sample 506284 19742.605 ± 13.729 ns/op
```
Motivation:
3c8c7fc7e9 introduced some changes to the ResourceLeakDetector that introduced a regression and so would always log that paranoid leak detection should be enabled even it was already.
Modifications:
Correctly not clear the recorded stacktraces when we process the reference queue so we can log these.
Result:
ResourceLeakDetector works again as expected.
Motivation:
NativeLibraryLoader uses ClassLoader#getResource method that can return nulls when the resource cannot be found. The returned url variable should be checked for nullity and fail in a more usable manner than a NullPointerException
Modifications:
Fail with a FileNotFoundException
Result:
Fixes [#7222].
Motiviation:
The ResourceLeakDetector helps to detect and troubleshoot resource leaks and is often used even in production enviroments with a low level. Because of this its import that we try to keep the overhead as low as overhead. Most of the times no leak is detected (as all is correctly handled) so we should keep the overhead for this case as low as possible.
Modifications:
- Only call getStackTrace() if a leak is reported as it is a very expensive native call. Also handle the filtering and creating of the String in a lazy fashion
- Remove the need to mantain a Queue to store the last access records
- Add benchmark
Result:
Huge decrease of performance overhead.
Before the patch:
Benchmark (recordTimes) Mode Cnt Score Error Units
ResourceLeakDetectorRecordBenchmark.record 8 thrpt 20 4358.367 ± 116.419 ops/s
ResourceLeakDetectorRecordBenchmark.record 16 thrpt 20 2306.027 ± 55.044 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 8 thrpt 20 4220.979 ± 114.046 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 16 thrpt 20 2250.734 ± 55.352 ops/s
With this patch:
Benchmark (recordTimes) Mode Cnt Score Error Units
ResourceLeakDetectorRecordBenchmark.record 8 thrpt 20 71398.957 ± 2695.925 ops/s
ResourceLeakDetectorRecordBenchmark.record 16 thrpt 20 38643.963 ± 1446.694 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 8 thrpt 20 71677.882 ± 2923.622 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 16 thrpt 20 38660.176 ± 1467.732 ops/s
Motivation:
RLD allocates an ArrayDeque in anticipation of recording access
points. If the leak detection level is less than ADVANCED though,
the dequeue is never used. Since SIMPLE is the default level,
there is a minor perf win to not preemptively allocate it.
This showed up in garbage profiling when creation a high number of
buffers.
Modifications:
Only allocate the dequeue if it will be used.
Result:
Less garbage created.
Motivation:
NativeLibraryLoader may only log a debug statement if the library is successfully loaded from java.library.path, but will log failure statements the if load for java.library.path fails which can mislead users to believe the load actually failed when it may have succeeded.
Modifications:
- Always load a debug statement when a library was successfully loaded
Result:
NativeLibraryLoader log statements more clear.
Motivation:
When Log4j2Logger is used with PatternLayout (%F:%L)%c.%M, the log message incorrect shows:
(Log4J2Logger.java:73)io.netty.util.internal.PlatformDependent0.debug ....
Modification:
Extend AbstractLogger
Result:
Fixes [#7186].
Motivation:
We should only try to load the native artifacts if the architecture we are currently running on is the same as the one the native libraries were compiled for.
Modifications:
Include architecture in native lib name and append the current arch when trying to load these. This will fail then if its not the same as the arch of the compiled arch.
Result:
Fixes [#7150].
Motivation:
When the user want to have the direct memory explicitly managed by the GC (just as java.nio does) it is useful to be able to construct an UnpooledByteBufAllocator that allows this without the chances to see any memory leak.
Modifications:
Allow to explicitly disable the usage of reflection to construct direct ByteBufs and so be sure these will be collected by GC.
Result:
More flexible way to use the UnpooledByteBufAllocator.
Motivation:
The documentation for field updates says:
> Note that the guarantees of the {@code compareAndSet}
> method in this class are weaker than in other atomic classes.
> Because this class cannot ensure that all uses of the field
> are appropriate for purposes of atomic access, it can
> guarantee atomicity only with respect to other invocations of
> {@code compareAndSet} and {@code set} on the same updater.
This implies that volatiles shouldn't use normal assignment; the
updater should set them.
Modifications:
Use setter for field updaters that make use of compareAndSet.
Result:
Concurrency compliant code
Motivation:
PD and PD0 Both try to find and use Unsafe. If unavailable, they
try to log why and continue on. However, it is not always east to
enable this logging. Chaining exceptions together is much easier
to reach, and the original exception is relevant when Unsafe is
needed.
Modifications:
* Make PD log why PD0 could not be loaded with a trace level log
* Make PD0 remember why Unsafe wasn't available
* Expose unavailability cause through PD for higher level use.
* Make Epoll and KQueue include the reason when failing
Result:
Easier debugging in hard to reconfigure environments
Motivation:
Continuing to make netty happy when compiling through errorprone.
Modification:
Mostly comments, some minor switch statement changes.
Result:
No more compiler errors!
Motivation:
On restricted systems (e.g. grsecurity), it might not be possible to write a .so on disk and load it afterwards. On those system Netty should check java.library.path for libraries to load.
Modifications:
Changed NativeLibraryLoader.java to first try to load libs from java.library.path before exporting the .so to disk.
Result:
Libraries load fine on restricted systems.
Motivation:
When compiling this code and running it through errorprone[1], this message appears:
```
StringUtil.java:493: error: [FallThrough] Switch case may fall through; add a `// fall through` comment if it was deliberate
case LINE_FEED:
^
(see http://errorprone.info/bugpattern/FallThrough)
```
By adding that comment, it silences the error and also makes clear the intention of that statement.
[1]http://errorprone.info/index
Modification:
Add simple comment.
Result:
Errorprone is happier with the code.
Motivation:
The `AsciiString#toString` method calculate string value and cache it into field. If an `AsciiString` created from the `String` value, we can avoid rebuilding strings if we cache them immediately when creating `AsciiString`. It would be useful for constants strings, which already stored in the JVMs string table, or in cases where an unavoidable `#toString `method call is assumed.
Modifications:
- Add new static method `AsciiString#cache(String)` which save string value into cache field.
- Apply a "benign" data race in the `#hashCode` and `#toString` methods.
Result:
Less memory usage in some `AsciiString` use cases.
Motivation:
At the moment we try to load the library using multiple names which includes names using - but also _ . We should just use _ all the time.
Modifications:
Replace - with _
Result:
Fixes [#7069]
Motivation:
In some cases of using an `InternalThreadLocalMap#stringBuilder`, the `StringBuilder`s size can often exceed the exist limit (1024 bytes). This can lead to permanent memory reallocation.
Modifications:
Add custom properties for the initial capacity and maximum size (after which the `StringBuilder`s capacity will be reduced to the initial capacity).
Result:
An `InternalThreadLocalMap#stringBuilder`s initial and max size is configurable. Fixes [#7092].
Motivation:
NativeLibraryLoader has some code-duplication that can be removed.
Modifications:
Remove duplicated code and just use provided methods of PlatformDependent.
Result:
Less code duplication, fixes [#3756].
Motivation:
Now that the NativeLibraryLoader implicitly detects the shaded package prefix we no longer need the io.netty.packagePrefix system property.
Modifications:
- Remove io.netty.packagePrefix processing from NativeLibraryLoader
Result:
Code is cleaner.