Motivation:
There is no guarantee that FastThreadLocal.onRemoval(...) is called if the FastThreadLocal is used by "non" FastThreacLocalThreads. This can lead to all sort of problems, like for example memory leaks as direct memory is not correctly cleaned up etc.
Beside this we use ThreadDeathWatcher to check if we need to release buffers back to the pool when thread local caches are collected. In the past ThreadDeathWatcher was used which will need to "wakeup" every second to check if the registered Threads are still alive. If we can ensure FastThreadLocal.onRemoval(...) is called we do not need this anymore.
Modifications:
- Introduce ObjectCleaner and use it to ensure FastThreadLocal.onRemoval(...) is always called when a Thread is collected.
- Deprecate ThreadDeathWatcher
- Add unit tests.
Result:
Consistent way of cleanup FastThreadLocals when a Thread is collected.
Motivation:
We should remove the WeakOrderedQueue from the WeakHashMap directly if possible and only depend on the semantics of the WeakHashMap if there is no other way for us to cleanup it.
Modifications:
Override onRemoval(...) to remove the WeakOrderedQueue if possible.
Result:
Less overhead and quicker collection of WeakOrderedQueue for some cases.
Motivation:
When doStartThread throws an exception, e.g. due to the actual executor being depleted of threads and throwing in its rejected execution handler, the STEE ends up in started state anyway. If we try to execute another task in this executor, it will be queued but the thread won't be started anymore and the task will linger forever.
Modifications:
- Ensure we not update the internal state if the startThread() method throws.
- Add testcase
Result:
Fixes [#7483]
Motivation:
In our Recycler implementation we store a reference to the current Thread in the Stack that is stored in a FastThreadLocal. The Stack itself is referenced in the DefaultHandle itself. A problem can arise if a user stores a Reference to an Object that holds a reference to the DefaultHandle somewhere and either not remove the reference at all or remove it very late. In this case the Thread itself can not be collected as its still referenced in the Stack that is referenced by the DefaultHandle.
Modifications:
- Use a WeakReference to store the reference to the Thread in the Stack
- Add a test case
Result:
Ensure a Thread can be collected in a timely manner in all cases even if it used the Recycler.
Motivation:
ThreadDeathWatcher and GlobalEventExecutor may create and start a new thread from various other threads and so inherit the classloader. We need to ensure we not inherit to allow recycling the classloader.
Modifications:
Use Thread.setContextClassLoader(null) to ensure we not hold a strong reference to the classloader and so not leak it.
Result:
Fixes [#7290].
Motivation:
We dont need to use the ThreadDeathWatcher if we use a FastThreadLocalThread for which we wrap the Runnable and ensure we call FastThreadLocal.removeAll() once the Runnable completes.
Modifications:
- Dont use a ThreadDeathWatcher if we are sure we will call FastThreadLocal.removeAll()
- Add unit test.
Result:
Less overhead / running theads if you only allocate / deallocate from FastThreadLocalThreads.
Motivation:
OSGI and other enviroments may not allow to even load Unsafe which will lead to an NoClassDefFoundError when trying to access it. We should guard against this.
Modifications:
Catch NoClassDefFoundError when trying to load Unsafe.
Result:
Be able to use netty with a strict OSGI config.
Motivation:
When system property is empty, the default value should be used.
Modification:
- Correctly use the default value in all cases
- Add unit tests
Result:
Correct behaviour
Motivation:
In order to determine if a header contains a value we currently rely
upon getAll(..) and regular expressions. This operation is commonly used
during the encode and decode stage to determine the transfer encoding
(e.g. HttpUtil#isTransferEncodingChunked). This operation requires an
intermediate collection and possibly regular expressions for the
CombinedHttpHeaders use case which can be expensive.
Modifications:
- Add a valuesIterator to HttpHeaders and specializations of this method
for DefaultHttpHeaders, ReadOnlyHttpHeaders, and CombinedHttpHeaders.
Result:
Less intermediate collections and allocation overhead when determining
if HttpHeaders contains a name/value pair.
Motivation:
Netty could handle "connection" or "te" headers more gently when
converting from http/1.1 to http/2 headers. Http/2 headers don't
support single-hop headers, so when we convert from http/1.1 to http/2,
we should drop all single-hop headers. This includes headers like
"transfer-encoding" and "connection", but also the headers that
"connection" points to, since "connection" can be used to designate
other headers as single-hop headers. For the "te" header, we can more
permissively convert it by just dropping non-conforming headers (ie
non-"trailers" headers) which is what we do for all other headers when
we convert.
Modifications:
Add a new blacklist to the http/1.1 to http/2 conversion, which is
constructed from the values of the "connection" header, and stop
throwing an exception when a "te" header is passed with a non-"trailers"
value. Instead, drop all values except for "trailers". Add unit tests
for "connection" and "te" headers when converting from http/1.1 to http/2.
Result:
This will improve the h2c upgrade request, and also conversions from
http/1.1 to http/2. This will simplify implementing spec-compliant
http/2 servers that want to share code between their http/1.1 and http/2
implementations.
[Fixes#7355]
Motivation:
`AbstractScheduledEventExecutor` uses a standard `java.util.PriorityQueue` to keep track of task deadlines. `ScheduledFuture.cancel` removes tasks from this `PriorityQueue`. Unfortunately, `PriorityQueue.remove` has `O(n)` performance since it must search for the item in the entire queue before removing it. This is fast when the future is at the front of the queue (e.g., already triggered) but not when it's randomly located in the queue.
Many servers will use `ScheduledFuture.cancel` on all requests, e.g., to manage a request timeout. As these cancellations will be happen in arbitrary order, when there are many scheduled futures, `PriorityQueue.remove` is a bottleneck and greatly hurts performance with many concurrent requests (>10K).
Modification:
Use netty's `DefaultPriorityQueue` for scheduling futures instead of the JDK. `DefaultPriorityQueue` is almost identical to the JDK version except it is able to remove futures without searching for them in the queue. This means `DefaultPriorityQueue.remove` has `O(log n)` performance.
Result:
Before - cancelling futures has varying performance, capped at `O(n)`
After - cancelling futures has stable performance, capped at `O(log n)`
Benchmark results
After - cancelling in order and in reverse order have similar performance within `O(log n)` bounds
```
Benchmark (num) Mode Cnt Score Error Units
ScheduledFutureTaskBenchmark.cancelInOrder 100 thrpt 20 137779.616 ± 7709.751 ops/s
ScheduledFutureTaskBenchmark.cancelInOrder 1000 thrpt 20 11049.448 ± 385.832 ops/s
ScheduledFutureTaskBenchmark.cancelInOrder 10000 thrpt 20 943.294 ± 12.391 ops/s
ScheduledFutureTaskBenchmark.cancelInOrder 100000 thrpt 20 64.210 ± 1.824 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 100 thrpt 20 167531.096 ± 9187.865 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 1000 thrpt 20 33019.786 ± 4737.770 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 10000 thrpt 20 2976.955 ± 248.555 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 100000 thrpt 20 362.654 ± 45.716 ops/s
```
Before - cancelling in order and in reverse order have significantly different performance at higher queue size, orders of magnitude worse than the new implementation.
```
Benchmark (num) Mode Cnt Score Error Units
ScheduledFutureTaskBenchmark.cancelInOrder 100 thrpt 20 139968.586 ± 12951.333 ops/s
ScheduledFutureTaskBenchmark.cancelInOrder 1000 thrpt 20 12274.420 ± 337.800 ops/s
ScheduledFutureTaskBenchmark.cancelInOrder 10000 thrpt 20 958.168 ± 15.350 ops/s
ScheduledFutureTaskBenchmark.cancelInOrder 100000 thrpt 20 53.381 ± 13.981 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 100 thrpt 20 123918.829 ± 3642.517 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 1000 thrpt 20 5099.810 ± 206.992 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 10000 thrpt 20 72.335 ± 0.443 ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder 100000 thrpt 20 0.743 ± 0.003 ops/s
```
Motivation:
When looking for a leak, its nice to be able to request at least a
number of leaks.
Modification:
* Made all leak records up to the target amoutn recorded, and only
then enable backing off.
* Enable recording more than 32 elements. Previously the shift
amount made this impossible.
Result:
Ability to record all accesses.
Motivation:
Fix NullPointerExceptions that occur when running netty-tcnative inside the bootstrap class loader.
Modifications:
- Replace loader.getResource(...) with ClassLoader.getSystemResource(...) when loader is null.
- Replace loader.loadClass(...) with Class.forName(..., false, loader) which works when loader is both null and non-null.
Result:
Support running native libs in bootstrap class loader
Motivation:
Phantom references are for cleaning up resources that were
forgotten, which means they keep their referent alive. This
means garbage is kept around until the refqueue is drained, rather
than when the reference is unreachable.
Modification:
Use Weak References instead of Phantoms
Result:
More punctual leak detection.
Motivation:
Without a 'serialVersionUID' field, any change to a class will make
previously serialized versions unreadable.
Modifications:
Add missed 'serialVersionUID' field for all Serializable
classes.
Result:
Proper deserialization of previously serialized objects.
Configuring this is tough because there is split between highly shared (and accessed) objects and lightly accessed objects.
Modification:
There are a number of changes here. In relative order of importance:
API / Functionality changes:
* Max records and max sample records are gone. Only "target" records, the number of records tries to retain is exposed.
* Records are sampled based on the number of already stored records. The likelihood of recording a new sample is `2^(-n)`, where `n` is the number of currently stored elements.
* Records are stored in a concurrent stack structure rather than a list. This avoids a head and tail. Since the stack is only read once, there is no need to maintain head and tail pointers
* The properties of this imply that the very first and very last access are always recorded. When deciding to sample, the top element is replaced rather than pushed.
* Samples that happen between the first and last accesses now have a chance of being recorded. Previously only the final few were kept.
* Sampling is no longer deterministic. Previously, a deterministic access pattern meant that you could conceivably always miss some access points.
* Sampling has a linear ramp for low values and and exponentially backs off roughly equal to 2^n. This means that for 1,000,000 accesses, about 20 will actually be kept. I have an elegant proof for this which is too large to fit in this commit message.
Code changes:
* All locks are gone. Because sampling rarely needs to do a write, there is almost 0 contention. The dropped records counter is slightly contentious, but this could be removed or changed to a LongAdder. This was not done because of memory concerns.
* Stack trace exclusion is done outside of RLD. Classes can opt to remove some of their methods.
* Stack trace exclusion is faster, since it uses String.equals, often getting a pointer compare due to interning. Previously it used contains()
* Leak printing is outputted fairly differently. I tried to preserve as much of the original formatting as possible, but some things didn't make sense to keep.
Result:
More useful leak reporting.
Faster:
```
Before:
Benchmark (recordTimes) Mode Cnt Score Error Units
ResourceLeakDetectorRecordBenchmark.record 8 thrpt 20 136293.404 ± 7669.454 ops/s
ResourceLeakDetectorRecordBenchmark.record 16 thrpt 20 72805.720 ± 3710.864 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 8 thrpt 20 139131.215 ± 4882.751 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 16 thrpt 20 74146.313 ± 4999.246 ops/s
After:
Benchmark (recordTimes) Mode Cnt Score Error Units
ResourceLeakDetectorRecordBenchmark.record 8 thrpt 20 155281.969 ± 5301.399 ops/s
ResourceLeakDetectorRecordBenchmark.record 16 thrpt 20 77866.239 ± 3821.054 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 8 thrpt 20 153360.036 ± 8611.353 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 16 thrpt 20 78670.804 ± 2399.149 ops/s
```
Motivation:
There are 2 motivations, the first depends on the second:
Loading Netty Epoll statically stopped working in 4.1.16, due to
`Native` always loading the arch specific shared object. In a
static binary, there is no arch specific SO.
Second, there are a ton of exceptions that can happen when loading
a native library. When loading native code, Netty tries a bunch of
different paths but a failure in any given may not be fatal.
Additionally: turning on debug logging is not always feasible so
exceptions get silently swallowed.
Modifications:
* Change Epoll and Kqueue to try the static load second
* Modify NativeLibraryLoader to record all the locations where
exceptions occur.
* Attempt to use `addSuppressed` from Java 7 if available.
Alternatives Considered:
An alternative would be to record log messages at each failure. If
all load attempts fail, the log messages are printed as warning,
else as debug. The problem with this is there is no `LogRecord` to
create like in java.util.logging. Buffering the args to
logger.log() at the end of the method loses the call site, and
changes the order of events to be confusing.
Another alternative is to teach NativeLibraryLoader about loading
the SO first, and then the static version. This would consolidate
the code fore Epoll, Kqueue, and TCNative. I think this is the
long term better option, but this PR is changing a lot already.
Someone else can take a crack at it later
Results:
Epoll Still Loads and easier debugging.
Motivation:
Highly retained and released objects have contention on their ref
count. Currently, the ref count is updated using compareAndSet
with care to make sure the count doesn't overflow, double free, or
revive the object.
Profiling has shown that a non trivial (~1%) of CPU time on gRPC
latency benchmarks is from the ref count updating.
Modification:
Rather than pessimistically assuming the ref count will be invalid,
optimistically update it assuming it will be. If the update was
wrong, then use the slow path to revert the change and throw an
execption. Most of the time, the ref counts are correct.
This changes from using compareAndSet to getAndAdd, which emits a
different CPU instruction on x86 (CMPXCHG to XADD). Because the
CPU knows it will modifiy the memory, it can avoid contention.
On a highly contended machine, this can be about 2x faster.
There is a downside to the new approach. The ref counters can
temporarily enter invalid states if over retained or over released.
The code does handle these overflow and underflow scenarios, but it
is possible that another concurrent access may push the failure to
a different location. For example:
Time 1 Thread 1: obj.retain(INT_MAX - 1)
Time 2 Thread 1: obj.retain(2)
Time 2 Thread 2: obj.retain(1)
Previously Thread 2 would always succeed and Thread 1 would always
fail on the second access. Now, thread 2 could fail while thread 1
is rolling back its change.
====
There are a few reasons why I think this is okay:
1. Buggy code is going to have bugs. An exception _is_ going to be
thrown. This just causes the other threads to notice the state
is messed up and stop early.
2. If high retention counts are a use case, then ref count should
be a long rather than an int.
3. The critical section is greatly reduced compared to the previous
version, so the likelihood of this happening is lower
4. On error, the code always rollsback the change atomically, so
there is no possibility of corruption.
Result:
Faster refcounting
```
BEFORE:
Benchmark (delay) Mode Cnt Score Error Units
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 1 sample 2901361 804.579 ± 1.835 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 10 sample 3038729 785.376 ± 16.471 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 100 sample 2899401 817.392 ± 6.668 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 1000 sample 3650566 2077.700 ± 0.600 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 10000 sample 3005467 19949.334 ± 4.243 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 1 sample 456091 48.610 ± 1.162 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 10 sample 732051 62.599 ± 0.815 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 100 sample 778925 228.629 ± 1.205 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 1000 sample 633682 2002.987 ± 2.856 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 10000 sample 506442 19735.345 ± 12.312 ns/op
AFTER:
Benchmark (delay) Mode Cnt Score Error Units
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 1 sample 3761980 383.436 ± 1.315 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 10 sample 3667304 474.429 ± 1.101 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 100 sample 3039374 479.267 ± 0.435 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 1000 sample 3709210 2044.603 ± 0.989 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 10000 sample 3011591 19904.227 ± 18.025 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 1 sample 494975 52.269 ± 8.345 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 10 sample 771094 62.290 ± 0.795 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 100 sample 763230 235.044 ± 1.552 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 1000 sample 634037 2006.578 ± 3.574 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 10000 sample 506284 19742.605 ± 13.729 ns/op
```
Motivation:
3c8c7fc7e9 introduced some changes to the ResourceLeakDetector that introduced a regression and so would always log that paranoid leak detection should be enabled even it was already.
Modifications:
Correctly not clear the recorded stacktraces when we process the reference queue so we can log these.
Result:
ResourceLeakDetector works again as expected.
Motivation:
NativeLibraryLoader uses ClassLoader#getResource method that can return nulls when the resource cannot be found. The returned url variable should be checked for nullity and fail in a more usable manner than a NullPointerException
Modifications:
Fail with a FileNotFoundException
Result:
Fixes [#7222].
Motiviation:
The ResourceLeakDetector helps to detect and troubleshoot resource leaks and is often used even in production enviroments with a low level. Because of this its import that we try to keep the overhead as low as overhead. Most of the times no leak is detected (as all is correctly handled) so we should keep the overhead for this case as low as possible.
Modifications:
- Only call getStackTrace() if a leak is reported as it is a very expensive native call. Also handle the filtering and creating of the String in a lazy fashion
- Remove the need to mantain a Queue to store the last access records
- Add benchmark
Result:
Huge decrease of performance overhead.
Before the patch:
Benchmark (recordTimes) Mode Cnt Score Error Units
ResourceLeakDetectorRecordBenchmark.record 8 thrpt 20 4358.367 ± 116.419 ops/s
ResourceLeakDetectorRecordBenchmark.record 16 thrpt 20 2306.027 ± 55.044 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 8 thrpt 20 4220.979 ± 114.046 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 16 thrpt 20 2250.734 ± 55.352 ops/s
With this patch:
Benchmark (recordTimes) Mode Cnt Score Error Units
ResourceLeakDetectorRecordBenchmark.record 8 thrpt 20 71398.957 ± 2695.925 ops/s
ResourceLeakDetectorRecordBenchmark.record 16 thrpt 20 38643.963 ± 1446.694 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 8 thrpt 20 71677.882 ± 2923.622 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 16 thrpt 20 38660.176 ± 1467.732 ops/s
Motivation:
RLD allocates an ArrayDeque in anticipation of recording access
points. If the leak detection level is less than ADVANCED though,
the dequeue is never used. Since SIMPLE is the default level,
there is a minor perf win to not preemptively allocate it.
This showed up in garbage profiling when creation a high number of
buffers.
Modifications:
Only allocate the dequeue if it will be used.
Result:
Less garbage created.
Motivation:
NativeLibraryLoader may only log a debug statement if the library is successfully loaded from java.library.path, but will log failure statements the if load for java.library.path fails which can mislead users to believe the load actually failed when it may have succeeded.
Modifications:
- Always load a debug statement when a library was successfully loaded
Result:
NativeLibraryLoader log statements more clear.
Motivation:
When Log4j2Logger is used with PatternLayout (%F:%L)%c.%M, the log message incorrect shows:
(Log4J2Logger.java:73)io.netty.util.internal.PlatformDependent0.debug ....
Modification:
Extend AbstractLogger
Result:
Fixes [#7186].
Motivation:
We should only try to load the native artifacts if the architecture we are currently running on is the same as the one the native libraries were compiled for.
Modifications:
Include architecture in native lib name and append the current arch when trying to load these. This will fail then if its not the same as the arch of the compiled arch.
Result:
Fixes [#7150].
Motivation:
When the user want to have the direct memory explicitly managed by the GC (just as java.nio does) it is useful to be able to construct an UnpooledByteBufAllocator that allows this without the chances to see any memory leak.
Modifications:
Allow to explicitly disable the usage of reflection to construct direct ByteBufs and so be sure these will be collected by GC.
Result:
More flexible way to use the UnpooledByteBufAllocator.
Motivation:
The documentation for field updates says:
> Note that the guarantees of the {@code compareAndSet}
> method in this class are weaker than in other atomic classes.
> Because this class cannot ensure that all uses of the field
> are appropriate for purposes of atomic access, it can
> guarantee atomicity only with respect to other invocations of
> {@code compareAndSet} and {@code set} on the same updater.
This implies that volatiles shouldn't use normal assignment; the
updater should set them.
Modifications:
Use setter for field updaters that make use of compareAndSet.
Result:
Concurrency compliant code
Motivation:
PD and PD0 Both try to find and use Unsafe. If unavailable, they
try to log why and continue on. However, it is not always east to
enable this logging. Chaining exceptions together is much easier
to reach, and the original exception is relevant when Unsafe is
needed.
Modifications:
* Make PD log why PD0 could not be loaded with a trace level log
* Make PD0 remember why Unsafe wasn't available
* Expose unavailability cause through PD for higher level use.
* Make Epoll and KQueue include the reason when failing
Result:
Easier debugging in hard to reconfigure environments
Motivation:
Continuing to make netty happy when compiling through errorprone.
Modification:
Mostly comments, some minor switch statement changes.
Result:
No more compiler errors!
Motivation:
On restricted systems (e.g. grsecurity), it might not be possible to write a .so on disk and load it afterwards. On those system Netty should check java.library.path for libraries to load.
Modifications:
Changed NativeLibraryLoader.java to first try to load libs from java.library.path before exporting the .so to disk.
Result:
Libraries load fine on restricted systems.
Motivation:
When compiling this code and running it through errorprone[1], this message appears:
```
StringUtil.java:493: error: [FallThrough] Switch case may fall through; add a `// fall through` comment if it was deliberate
case LINE_FEED:
^
(see http://errorprone.info/bugpattern/FallThrough)
```
By adding that comment, it silences the error and also makes clear the intention of that statement.
[1]http://errorprone.info/index
Modification:
Add simple comment.
Result:
Errorprone is happier with the code.
Motivation:
The `AsciiString#toString` method calculate string value and cache it into field. If an `AsciiString` created from the `String` value, we can avoid rebuilding strings if we cache them immediately when creating `AsciiString`. It would be useful for constants strings, which already stored in the JVMs string table, or in cases where an unavoidable `#toString `method call is assumed.
Modifications:
- Add new static method `AsciiString#cache(String)` which save string value into cache field.
- Apply a "benign" data race in the `#hashCode` and `#toString` methods.
Result:
Less memory usage in some `AsciiString` use cases.
Motivation:
At the moment we try to load the library using multiple names which includes names using - but also _ . We should just use _ all the time.
Modifications:
Replace - with _
Result:
Fixes [#7069]
Motivation:
In some cases of using an `InternalThreadLocalMap#stringBuilder`, the `StringBuilder`s size can often exceed the exist limit (1024 bytes). This can lead to permanent memory reallocation.
Modifications:
Add custom properties for the initial capacity and maximum size (after which the `StringBuilder`s capacity will be reduced to the initial capacity).
Result:
An `InternalThreadLocalMap#stringBuilder`s initial and max size is configurable. Fixes [#7092].
Motivation:
NativeLibraryLoader has some code-duplication that can be removed.
Modifications:
Remove duplicated code and just use provided methods of PlatformDependent.
Result:
Less code duplication, fixes [#3756].
Motivation:
Now that the NativeLibraryLoader implicitly detects the shaded package prefix we no longer need the io.netty.packagePrefix system property.
Modifications:
- Remove io.netty.packagePrefix processing from NativeLibraryLoader
Result:
Code is cleaner.
Motivation:
Shading requires renaming binary components (.so, .dll; for tcnative,
epoll, etc). But the rename then requires setting the
io.netty.packagePrefix system property on the command line or runtime,
which is either a burden or not feasible.
If you don't rename the binary components everything appears to
work, until a dependency on a second version of the binary component is
added. At that point, only one version of the binary will be loaded...
which is what shading is supposed to prevent. So for valid shading, the
binaries must be renamed.
Modifications:
Automatically detect the package prefix by comparing the actual class
name to the non-shaded expected class name. The expected class name must
be obfuscated to prevent shading utilities from changing it.
Result:
When shading and using binary components, runtime configuration is no
longer necessary.
Pre-existing shading users that were not renaming the binary components
will break, because the packagePrefix previously defaulted to "". Since
these pre-existing users had broken configurations that only _appeared_
to work, this breakage is considered a Good Thing. Users may workaround
this breakage temporarily by setting -Dio.netty.packagePrefix= to
restore packagePrefix to "".
Fixes#6963
Motivation:
ResourceLeakDetector records at most MAX_RECORDS+1 records
Modifications:
Make room before add to lastRecords
Result:
ResourceLeakDetector will record at most MAX_RECORDS records
Motivation:
JCTools 2.0.2 provides an unbounded MPSC linked queue. Before we shaded JCTools we had our own unbounded MPSC linked queue and used it in various places but gave this up because there was no public equivalent available in JCTools at the time.
Modifications:
- Use JCTool's MPSC linked queue when no upper bound is specified
Result:
Fixes https://github.com/netty/netty/issues/5951
Motivation:
PR #6811 introduced a public utility methods to decode hex dump and its parts, but they are not visible from netty-common.
Modifications:
1. Move the `decodeHexByte`, `decodeHexDump` and `decodeHexNibble` methods into `StringUtils`.
2. Apply these methods where applicable.
3. Remove similar methods from other locations (e.g. `HpackHex` test class).
Result:
Less code duplication.
Motivation:
A `StringUtil#escapeCsv` creates new `StringBuilder` on each value even if the same string is returned in the end.
Modifications:
Create new `StringBuilder` only if it really needed. Otherwise, return the original string (or just trimmed substring).
Result:
Less GC load. Up to 4x faster work for not changed strings.
Motivation:
AppendableCharSequence depends upon IndexOutOfBoundsException to trigger a resize operation under the assumption that the resize operation will be rare if the initial size guess is good. However if the initial size guess is not good then the performance will be more unpredictable and likely suffer.
Modifications:
- Check the position in AppendableCharSequence#append to determine if a resize is necessary
Result:
More predictable performance in AppendableCharSequence#append.
Motivation:
We need to use FQCN to prevent classloader issues for classes that are > Java6. This is a cleanup of ed5fcbb773.
Modifications:
Just remove the imports and use FQCN.
Result:
No classloader issues with java6
Motivation:
Docker's `--tmpfs` flag mounts the temp volume with `noexec` by default,
resulting in an UnsatisfiedLinkError. While this is good security
practice, it is a surprising failure from a seemingly innocuous flag.
Modifications:
Add a best-effort attempt in `NativeLibraryLoader` to detect when temp
files beng loaded cannot be executed even when execution permissions
are set, often because the `noexec` flag is set on the volume.
Requires numerous additional exclusions to the Animal Sniffer config
for Java7 POSIX permissions manipulation.
Result:
Fixes [#6678].
Motivation:
For our native libraries in netty we support shading, to have this work on runtime the user needs to set a system property. This code should shared.
Modifications:
Move logic to NativeLbiraryLoader and so share for all native libs.
Result:
Less code duplication and also will work for netty-tcnative out of the box once it support shading
Motivation:
`FormattingTuple.getArgArray()` is never used.
In the `MessageFormatter` it is possible to make
some improvements, e.g. replace `StringBuffer`
with `StringBuilder`, avoid redundant allocations, etc.
Modifications:
- Remove `argArray` field from the `FormattingTuple`.
- In `MessageFormatter`:
- replace `StringBuffer` with `StringBuilder`,
- replace `HashMap` with `HashSet` and make it lazy initialized.
- avoid redundant allocations (`substring()`, etc.)
- use appropriate StringBuilder's methods for the some `Number` values.
- Porting unit tests from `slf4j`.
Result:
Less GC load on logging with internal `MessageFormatter`.
Motivation:
IPv4/6 validation methods use allocations, which can be avoided.
IPv4 parse method use StringTokenizer.
Modifications:
Rewriting IPv4/6 validation methods to avoid allocations.
Rewriting IPv4 parse method without use StringTokenizer.
Result:
IPv4/6 validation and IPv4 parsing faster up to 2-10x.
Motivation:
We should only try to load jdk.internal.misc.Unsafe if we run on Java9+ to eliminate noise in the log.
Modifications:
- Move javaVersion() and related methods to PlatformDependent0 to be able to use these in the static initializer without creating a cycle.
- Only try to load jdk.internal.misc.Unsafe when running in Java9+
Result:
Less noise in the log when running pre java9.
Motivation:
A previous change allocated a new thread local string builder if it
was getting too large. This is a good change, these string builders
can accidentally get too large and then never shrunk and that is sort
of a memory leak. However, the change allocates an entirely new string
builder which is more allocations than necessary. Instead, we can trim
the string builder if its too large, this only allocates an extra
backing array instead of a whole new object.
Modifications:
If the string builder is above a threshold, we trim the string builder
and then ensure its capacity is reasonable to we do not allocate too
much as we start using the string builder.
Result:
The thread local string builder do not serve as a memory yet we do not
allocate too many new objects.
Motivation:
`NetUtil`'s methods `isValidIpV6Address` and `getIPv6ByName` incorrectly validate some IPv6 addresses.
Modifications:
- `getIPv6ByName`: add checks for single colon at the start or end.
- `isValidIpV6Address`: fix checks for the count of colons and use `endOffset` instead of `ipAddress.length()` for the cases with the brackets or '%'.
Result:
More correct implementation of `NetUtil#isValidIpV6Address` and `NetUtil#getIPv6ByName`.
Motivation:
AtomicIntegerFieldUpdater#get is unnecessary, I think use simple volatile read is cleaner
Modifications:
Replace code STATE_UPDATER.get(this) to state in SingleThreadEventExecutor
Result:
Cleaner code
Motivation:
InternalThreadLocalMap#stringBuilder: ensure memory overhead
Modification:
If the capacity of StringBuilder is greater than 65536 then release it on the next time you get StringBuilder and re-create a StringBuilder.
Result:
Possible less memory usage.
Motivation:
If unsafe is unavailable, we can not use the cleaner anyway. If we try
to set it up, we get an annoying log message about unsafe being
unavailable (when debug logging is enabled). We know this will fail, so
we should not even bother and avoid the log message.
Modifications:
This commit adds a guard against setting up the cleaner if it is not
going to be available because unsafe is unavailable.
Result:
We do not try to set up the cleaner if unsafe is unavailable, and we do
not get an annoying log message.
Motivation:
Users should not see a scary log message when Netty is initialized if
Netty configuration explicitly disables unsafe. The log message that
produces this warning was previously guarded but the guard was
lost.
Modifications:
This commit brings back the guard against the scary log message if
unsafe is explicitly disabled.
Result:
No log message is produced when unsafe is unavailable because Netty was
told to not look for it.
Motivation:
We currently don't have a native transport which supports kqueue https://www.freebsd.org/cgi/man.cgi?query=kqueue&sektion=2. This can be useful for BSD systems such as MacOS to take advantage of native features, and provide feature parity with the Linux native transport.
Modifications:
- Make a new transport-native-unix-common module with all the java classes and JNI code for generic unix items. This module will build a static library for each unix platform, and included in the dynamic libraries used for JNI (e.g. transport-native-epoll, and eventually kqueue).
- Make a new transport-native-unix-common-tests module where the tests for the transport-native-unix-common module will live. This is so each unix platform can inherit from these test and ensure they pass.
- Add a new transport-native-kqueue module which uses JNI to directly interact with kqueue
Result:
JNI support for kqueue.
Fixes https://github.com/netty/netty/issues/2448
Fixes https://github.com/netty/netty/issues/4231
Motivation:
Fixes#6681.
Modification:
For the sake of better timer observability, expose the number of pending timeouts through the new HashedWheelTimer.pendingTimeouts method .
Result:
It's now ridiculously easy to observe Netty timer's very basic and yet important metric, the number of pending tasks/timeouts.
Motivation:
NetUtil#isValidIpV6Address and NetUtil#getIPv6ByName allowed an invalid form of mapped IPv4 addresses which lead to accepting invalid IPv6 addresses as valid.
Modifications:
- NetUtil#isValidIpV6Address and NetUtil#getIPv6ByName should only allow 7 colons for an IPv4 address if they are the first 2 characters.
Result:
More correct implementation of NetUtil#isValidIpV6Address and NetUtil#getIPv6ByName
Motivation:
NetUtil#getByName and NetUtil#isValidIpV6Address do not strictly enforce the format of IPv4 addresses that are allowed to be embedded in IPv6 addresses as specified in https://tools.ietf.org/html/rfc4291#section-2.5.5. This may lead to invalid addresses being parsed, or invalid addresses being considered valid. Compression of a single IPv6 word was also not handled correctly if there are 7 : characters.
Modifications:
- NetUtil#isValidIpV6Address should enforce the IPv4-Compatible and IPv4-Mapped are the only valid formats for including IPv4 addresses as specified in https://tools.ietf.org/html/rfc4291#section-2.5.5
- NetUtil#getByName should more stritcly parse IPv6 addresses which contain IPv4 addresses as specified in https://tools.ietf.org/html/rfc4291#section-2.5.5
- NetUtil should allow compression even if the number of : characters is 7.
- NetUtil#createByteArrayFromIpAddressString should use the same IP string to byte[] translation which is used in NetUtil#getByName
Result:
NetUtil#getByName and NetUtil#isValidIpV6Address respect the IPv6 RFC which defines the valid formats for embedding IPv4 addresses.
Motivation:
In cases when an application is running in a container or is otherwise
constrained to the number of processors that it is using, the JVM
invocation Runtime#availableProcessors will not return the constrained
value but rather the number of processors available to the virtual
machine. Netty uses this number in sizing various resources.
Additionally, some applications will constrain the number of threads
that they are using independenly of the number of processors available
on the system. Thus, applications should have a way to globally
configure the number of processors.
Modifications:
Rather than invoking Runtime#availableProcessors, Netty should rely on a
method that enables configuration when the JVM is started or by the
application. This commit exposes a new class NettyRuntime for enabling
such configuraiton. This value can only be set once. Its default value
is Runtime#availableProcessors so that there is no visible change to
existing applications, but enables configuring either a system property
or configuring during application startup (e.g., based on settings used
to configure the application).
Additionally, we introduce the usage of forbidden-apis to prevent future
uses of Runtime#availableProcessors from creeping. Future work should
enable the bundled signatures and clean up uses of deprecated and
other forbidden methods.
Result:
Netty can be configured to not use the underlying number of processors,
but rather the constrained number of processors.
Motivation:
Java9 added a new method to Unsafe which allows to allocate a byte[] without memset it. This can have a massive impact in allocation times when the byte[] is big. This change allows to enable this when using Java9 with the io.netty.tryAllocateUninitializedArray property when running Java9+. Please note that you will need to open up the jdk.internal.misc package via '--add-opens java.base/jdk.internal.misc=ALL-UNNAMED' as well.
Modifications:
Allow to allocate byte[] without memset on Java9+
Result:
Better performance when allocate big heap buffers and using java9.
Motivation:
As the javadoc of ScheduledExecutorService state:
Zero and negative delays (but not periods) are also allowed in schedule methods,and are treated as requests for immediate execution.
Modifications:
- Correctly handle delay <= 0.
- Add unit tests.
Result:
Fixes [#6627].
Motivation:
When debugging netty memory leaks, it's sometimes helpful to
print the object's reference count.
Modifications:
Add `refCnt` methods to set of already exitsting helpers for ref coutned
objects.
Result:
Users will have utility to print object's ref count without much of a
boilerplate.
Motivation:
Java9 adds a new method to Unsafe which allows to free direct ByteBuffer via the cleaner without the need to use an commandline arguments.
Modifications:
- Add Cleaner interface
- Add CleanerJava9 which will be used when using Java9+ and take care of release direct ByteBuffer
- Let Cleaner0 implement Cleaner
Result:
Be able to free direct ByteBuffer on Java9+ again without any commandline arguments.
Motivation:
When UNSAFE.allocateMemory is returning an address whose high bit is set we currently throw an IllegalArgumentException. This is not correct as it may return a negative number on at least sparc.
Modifications:
- Allow to pass in negative memoryAddress
- Add unit tests
Result:
Correctly validate the memoryAddress and so also work on sparc as expected. Fixes [#6574].
Motivation:
The updated HTTP/1.x RFC allows for header values to be CSV and separated by OWS [1]. CombinedHttpHeaders should remove this OWS on insertion.
[1] https://tools.ietf.org/html/rfc7230#section-7
Modification:
CombinedHttpHeaders doesn't account for the OWS and returns it back to the user as part of the value.
Result:
Fixes#6452
Motivation:
We should use SystemPropertyUtil to access system properties and so always handle SecurityExceptions.
Modifications:
Use SystemPropertyUtil everywhere.
Result:
Better and consist handling of SecurityException.
Motivation:
We forked a new process to detect if the program is run by root. We should better just use user.name system property
Modifications:
- Change PlatformDependent.isRoot0() to read the user.name system property to detect if root runs the program and rename it to maybeSuperUser0().
- Rename PlatformDependent.isRoot() to maybeSuperUser() and let it init directly in the static block
Result:
Less heavy way to detect if the program is run by root.
Motivation:
When UnorderedThreadPoolEventExecutor.execute / submit etc is called it will consume up to 100 % CPU even after the task was executed.
Modifications:
Add a special wrapper which we will be used in execute(...) to wrap the submitted Runnable. This is needed as ScheduledThreadPoolExecutor.execute(...) will delegate to submit(...) which will then use decorateTask(...). The problem with this is that decorateTask(...) needs to ensure we only do our own decoration if we not call from execute(...) as otherwise we may end up creating an endless loop because DefaultPromise will call EventExecutor.execute(...) when notify the listeners of the promise.
Result:
Fixes [#6507].
Motivation:
PlatformDependent0 makes assumptions that the array index scale for byte[] is always 1. If this is not the case the results from methods which make this assumption will be undefined.
Modifications:
- PlatformDependent0 should check if unsafe.arrayIndexScale(byte[].class) is not 1, and if so not use unsafe
Result:
Assumptions made by optimizations in PlatformDependent0 which use byte[] are explicitly enforced.
Motivation:
We only need to add the port to the HOST header value if its not a standard port.
Modifications:
- Only add port if needed.
- Fix parsing of ipv6 address which is enclosed by [].
Result:
Fixes [#6426].
Motivation:
Calling a static method is faster then dynamic
Modifications:
Add 'static' keyword for methods where it missed
Result:
A bit faster method calls
Motivation:
We shipped a javassist based implementation for typematching and logged a confusing debug message about missing javassist. We never were able to prove it really gives any perf improvements so we should just remove it.
Modifications:
- Remove javassist dependency and impl
- Fix possible classloader deadlock as reported by intellij
Result:
Less code to maintain and less confusing log message.
Motivation:
We should log why we can not use ByteBuffer.cleaner and so maybe allow the user to fix it.
Modifications:
- Use Unsafe to access the field
- Log the exception when we can not use ByteBuffer.cleaner
Result:
Easier to debug why using cleaner is not possible.
Motivation:
We have our own ThreadLocalRandom implementation to support older JDKs . That said we should prefer the JDK provided when running on JDK >= 7
Modification:
Using ThreadLocalRandom implementation of the JDK when possible.
Result:
Make use of JDK implementations when possible.
Motivation:
Java9 does not allow changing access level via reflection by default. This lead to the situation that netty disabled Unsafe completely as ByteBuffer.address could not be read.
Modification:
Use Unsafe to read the address field as this works on all Java versions.
Result:
Again be able to use Unsafe optimisations when using Netty with Java9
Motivation:
NetworkInterface.getNetworkInterfaces() may return null if no network interfaces are found. We should guard against it.
Modifications:
Check for null return value.
Result:
Fixes [#6384]
Motiviation:
Simplify implementation of compareTo/equals/hashCode for ChannelIds.
Modifications:
We simplfy the hashCode implementation for DefaultChannelId by not
making it random, but making it based on the underlying data. We fix the
compareTo implementation for DefaultChannelId by using lexicographic
comparison of the underlying data array. We fix the compareTo
implementation for CustomChannelId to avoid the possibility of overflow.
Result:
Cleaner code that is easier to maintain.
Motivation:
Java8 is out now for some time and JDK7 is no longer supported officially. We should remove all our backports and just use what the JDK provides us. This also will allow us to use intrinsics that are offered by the JDK implementations.
Modifications:
Remove all backports of jdk8 classes.
Result:
Use what the JDK offers us. This also fixes [#5458]
Motivation:
Initialization of PlatformDependent0 fails on Java 9 in static initializer when calling setAccessible(true).
Modifications:
Add RefelectionUtil which can be used to safely try if setAccessible(true) can be used or not and if not fail back to non reflection.
Result:
Fixed [#6345]
Motivation:
EpollRecvByteAllocatorHandle intends to override the meaning of "maybe more data to read" which is a concept also used in all existing implementations of RecvByteBufAllocator$Handle but the interface doesn't support overriding. Because the interfaces lack the ability to propagate this computation EpollRecvByteAllocatorHandle attempts to implement a heuristic on top of the delegate which may lead to reading when we shouldn't or not reading data.
Modifications:
- Create a new interface ExtendedRecvByteBufAllocator and ExtendedHandle which allows the "maybe more data to read" between interfaces
- Deprecate RecvByteBufAllocator and change all existing implementations to extend ExtendedRecvByteBufAllocator
- transport-native-epoll should require ExtendedRecvByteBufAllocator so the "maybe more data to read" can be propagated to the ExtendedHandle
Result:
Fixes https://github.com/netty/netty/issues/6303.
Motivation:
The JDK uses gethostbyname for blocking hostname resoltuion. gethostbyname can be configured on Unix systems according to [1][2]. This may impact the name server that is used to resolve particular domains or just override the default fall-back resolver. DnsNameResolver currently ignores these configuration files which means the default resolution behavior is different than the JDK. This may lead to unexpected resolution failures which succeed when using the JDK's resolver.
Modifications:
- Add an interface which can override what DnsServerAddressStream to use for a given hostname
- Provide a Unix specific implementation of this interface and implement [1][2]. Some elements may be ignored sortlist, timeout, etc...
Result:
DnsNameResolver behaves more like the JDK resolver by default.
[1] https://linux.die.net/man/5/resolver
[2] https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man5/resolver.5.html
Motivation:
ResourceLeakDetector supports a parameter called maxActive. This parameter is used in attempt to limit the amount of objects which are being tracked for leaks at any given time, and generates an error log message if this limit is exceeded. This assumes that there is a relationship between leak sample rate and object lifetime for objects which are already being tracked. This relationship may appear to work in cases were there are a single leak record per object and those leak records live for the lifetime of the application but in general this relationship doesn't exist. The original motivation was to provide a limit for cases such as HashedWheelTimer to limit the number of instances which exist at any given time. This limit is not enforced in all circumstances in HashedWheelTimer (e.g. if the thread is a daemon) and can be implemented outside ResourceLeakDetector.
Modifications:
- Deprecate all methods which interact with maxActive in ResourceLeakDetectorFactory and ResourceLeakDetector
- Remove all logic related to maxActive in ResourceLeakDetector
- HashedWheelTimer implements its own logic to impose a limit and warn users if too many instances exists at any given time.
Result:
Fixes https://github.com/netty/netty/issues/6225.
Motivation:
When comparing MAC addresses searching for the best MAC address, if
locally-administered address (e.g., from a Docker container) is compared
against an empty MAC address, the empty MAC address will be marked as
preferred. In cases this is the only available MAC address, this leaves
Netty using a random machine ID instead of using a perfectly valid
machine ID from the locally-adminstered address.
Modifications:
This commit modifies the MAC address logic so that the empty MAC address
is not preferred over a locally-administered address. This commit also
simplifies the comparison logic here.
Result:
Empty MAC addresses will not be preferred over locally-administered
addresses thus permitting the default machine ID to be the
locally-adminstered MAC address if it is the only available MAC address.
Motivation:
codec-http2 couples the dependency tree state with the remainder of the stream state (Http2Stream). This makes implementing constraints where stream state and dependency tree state diverge in the RFC challenging. For example the RFC recommends retaining dependency tree state after a stream transitions to closed [1]. Dependency tree state can be exchanged on streams in IDLE. In practice clients may use stream IDs for the purpose of establishing QoS classes and therefore retaining this dependency tree state can be important to client perceived performance. It is difficult to limit the total amount of state we retain when stream state and dependency tree state is combined.
Modifications:
- Remove dependency tree, priority, and weight related items from public facing Http2Connection and Http2Stream APIs. This information is optional to track and depends on the flow controller implementation.
- Move all dependency tree, priority, and weight related code from DefaultHttp2Connection to WeightedFairQueueByteDistributor. This is currently the only place which cares about priority. We can pull out the dependency tree related code in the future if it is generally useful to expose for other implementations.
- DefaultHttp2Connection should explicitly limit the number of reserved streams now that IDLE streams are no longer created.
Result:
More compliant with the HTTP/2 RFC.
Fixes https://github.com/netty/netty/issues/6206.
[1] https://tools.ietf.org/html/rfc7540#section-5.3.4
Motivation:
A testing goof in 7c630fe introduced a binary incompatibility when the old Promise-specific `add` and `addAll` methods in PromiseCombiner were generalized to accept `Futures`.
Modification:
- Restore (but mark as `@Deprecated`) old PromiseCombiner methods.
- Fixed a couple minor documentation typos because sure why not.
Result:
`PromiseCombiner` is binary-compatible with previous versions of Netty.