Motivation:
PooledByteBufAllocator uses a PoolThreadCache per Thread that allocates / deallocates to minimize the performance overhead. This PoolThreadCache is trimmed after X allocations to free up buffers that are not allocated for a long time. This works out quite well when the app continues to allocate but fails if the app stops to allocate frequently (for whatever reason) and so a lot of memory is wasted and not given back to the arena / freed.
Modifications:
- Add a ThreadExecutorMap that offers multiple methods that wrap Runnable / ThreadFactory / Executor and allow to call ThreadExecutorMap.currentEventExecutor() to get the current executing EventExecutor for the calling Thread.
- Use these methods in the constructors of our EventExecutor implementations (which also covers the EventLoop implementations)
- Add io.netty.allocator.cacheTrimIntervalMillis system property which can be used to specify a fixed rate / interval on which we should try to trim the PoolThreadCache for a EventExecutor that allocates.
- Add PooledByteBufAllocator.trimCurrentThreadCache() to allow the user to trim the cache of the calling thread manually.
- Add testcases
- Introduce FastThreadLocal.getIfExists()
Result:
Allow to better / more frequently trim PoolThreadCache and so give back memory to the area / system.
Motivation:
We dont use ObjectCleaner in our FastThreadLocal anymore so we also dont need to take special care to store it there anymore.
Modifications:
Remove code that is not needed anymore.
Result:
Code cleanup.
Motivation:
GlobalEventExecutor does already provide all guarantees of OrderedEventExecutor so it should implement it.
Modifications:
Let GlobalEventExecutor implement OrderedEventExecutor.
Result:
Make it more clear how execution order is handled in GlobalEventExecutor.
Motivation:
This counter is very useful in order to monitor Netty without having every ByteBufAllocator in the JVM
Modification:
Expose the value of DIRECT_MEMORY_COUNTER as we are already doing for DIRECT_MEMORY_LIMIT.
We are returning -1 in case that DIRECT_MEMORY_COUNTER is not available.
Result:
Be able to get the amount of direct memory used.
Motivation:
We had a typo in NativeLibraryLoader debug log message which could misslead the user.
Modifications:
Fix typo to correctly state java.library.path
Result:
Correct and less confusing log message
Motivation:
2bb9f64e16 introduced a change which made it possible to use different shaded versions of netty-tcnative on the classpath. This only partly worked as we did not correctly handled the case when os / arch is part of the library name (which is the case when netty-tcnative-boringssl-static is used with the uber jar).
Modifications:
- If patching the ID failed we retry again with the os / arch stripped
- Add unit tests to verify that patching ID now works with and without os / arch as suffix.
Result:
Using multiple shaded version of netty-tcnative-boringssl-static on MacOS works.
Motivation:
We should run a CI job using J9 to ensure netty also works when using different JVMs.
Modifications:
- Adjust PooledByteBufAllocatorTest to be able to complete faster when using a JVM which takes longer when joining Threads (this seems to be the case with J9).
- Skip UDT tests on J9 as UDT is not supported there.
Result:
Be able to run CI against J9.
Motivation:
PromiseCombiner is not thread-safe and even assumes all added Futures are using the same EventExecutor. This is kind of fragile as we do not enforce this. We need to enforce this contract to ensure it's safe to use and easy to spot concurrency problems.
Modifications:
- Add new contructor to PromiseCombiner that takes an EventExecutor and deprecate the old non-arg constructor.
- Check if methods are called from within the EventExecutor thread and if not fail
- Correctly dispatch on the right EventExecutor if the Future uses a different EventExecutor to eliminate concurrency issues.
Result:
More safe use of PromiseCombiner + enforce correct usage / contract.
Motivation:
https://github.com/netty/netty/pull/8866 added support for calling Iterator.remove() but did not add a testcase.
Modifications:
Add testcase to ensure removal works.
Result:
Better test-coverage.
Motivation:
As ActiveMQ project using netty, we want to make use of this class, unfortunately the iterator on values(), seems to not support remove method, even so the delegated iterator does. Currently we have to clone and modify this class locally albeit a one line change is needed, it would be ideal if netty could allow remove, then removing the need to maintain a clone.
Modifications:
* remove throws UnsupportedOperationException, and instead call remove method on delegated iterator
Result:
Be able to call Iterator.remove() for the values.
Motivation:
When users' /tmp is noexec, NativeLibraryLoader logs a message informing
them how to fix the problem by setting a system property. However, if
Netty has been shaded that message will tell them to set the un-shaded
system property name, which won't work.
Modifications:
Change the code to let shading tools rename the native.workdir property
name reference within user-visible log messages.
Notably, debug logs were _not_ changed, as there's many debug statements
including a variety of property names. Fixing them would be a much more
invasive change and have limited benefit.
Result:
The users will see the correctly-named system property to set if they
are using a noexec /tmp.
Motivation:
When we fail a call to PromiseCombiner.finish(...) because of a null argument we must not update the internal state before throwing.
Modifications:
- First do the null check and only after we validated that the argument is not null update the internal state
- Add test case.
Modifications:
Do not mess up internal state of PromiseCombiner when finish(...) is called with a null argument.
Result:
After your change, what will change.
Motivation:
Just was looking through code and found 1 interesting place DateFormatter.tryParseMonth that was not very effective, so I decided to optimize it a bit.
Modification:
Changed DateFormatter.tryParseMonth method. Instead of invocation regionMatch() for every month - compare chars one by one.
Result:
DateFormatter.parseHttpDate method performance improved from ~3% to ~15%.
Benchmark (DATE_STRING) Mode Cnt Score Error Units
DateFormatter2Benchmark.parseHttpHeaderDateFormatter Sun, 27 Jan 2016 19:18:46 GMT thrpt 6 4142781.221 ± 82155.002 ops/s
DateFormatter2Benchmark.parseHttpHeaderDateFormatter Sun, 27 Dec 2016 19:18:46 GMT thrpt 6 3781810.558 ± 38679.061 ops/s
DateFormatter2Benchmark.parseHttpHeaderDateFormatterNew Sun, 27 Jan 2016 19:18:46 GMT thrpt 6 4372569.705 ± 30257.537 ops/s
DateFormatter2Benchmark.parseHttpHeaderDateFormatterNew Sun, 27 Dec 2016 19:18:46 GMT thrpt 6 4339785.100 ± 57542.660 ops/s
Motivation
As pointed out by @91he in
https://github.com/netty/netty/pull/8595#issuecomment-459181794, there
is a remaining bug in LocationAwareSlf4JLogger following the updates
done in #8595. The logging methods which take a varargs message
parameter array should format using MessageFormatter.arrayFormat rather
than MessageFormatter.format.
Modifications
Change varargs param methods in LocationAwareSlf4JLogger to use
MessageFormatter.arrayFormat and extend unit test to cover these cases.
Results
Correct log output when logging messages with > 2 parameters when using
LocationAwareSlf4JLogger.
Motivation:
If there are no listeners attached to the promise when full-filling it we do not need to schedule a task to notify.
Modifications:
- Don't schedule a task if there is nothing to notify.
- Add unit tests.
Result:
Fixes https://github.com/netty/netty/issues/8795.
Motivation:
To conform to the CharSequence interface we need to return an empty CharSequence when start == end index and a subSequence is requested.
Modifications:
- Correctly handle the case where start == end
- Add unit test
Result:
Fix https://github.com/netty/netty/issues/8796.
Motivation:
We need to update to a new checkstyle plugin to allow the usage of lambdas.
Modifications:
- Update to new plugin version.
- Fix checkstyle problems.
Result:
Be able to use checkstyle plugin which supports new Java syntax.
Motivation:
Incrementing two variables in sync is not necessary when only one will do.
Modifications:
- Remove `j` from `for` loop and replace with `i`.
- Add more unit testing scenarios to cover changed code.
Results:
Unnecessary variable removed.
Motivation:
We should try removing all FastThreadLocals for the Thread before we notify the termination. future. The user may block on the future and once it unblocks the JVM may terminate and start unloading classes.
Modifications:
Remove all FastThreadLocals for the Thread before notify termination future.
Result:
Fixes https://github.com/netty/netty/issues/6596.
Motivation:
In windows if the project is in a path that contains whitespace,
resources cannot be accessed and tests fail.
Modifications:
Adds ResourcesUtil.java in netty-common. Tests use ResourcesUtil.java to access a resource.
Result:
Being able to build netty in a path containing whitespace
Motivation
#8563 highlighted race conditions introduced by the prior optimistic
update optimization in 83a19d5650. These
were known at the time but considered acceptable given the perf
benefit in high contention scenarios.
This PR proposes a modified approach which provides roughly half the
gains but stronger concurrency semantics. Race conditions still exist
but their scope is narrowed to much less likely cases (releases
coinciding with retain overflow), and even in those
cases certain guarantees are still assured. Once release() returns true,
all subsequent release/retains are guaranteed to throw, and in
particular deallocate will be called at most once.
Modifications
- Use even numbers internally (including -ve) for live refcounts
- "Final" release changes to odd number (equivalent to refcount 0)
- Retain still uses faster getAndAdd, release uses CAS loop
- First CAS attempt uses non-volatile read
- Thread.yield() after a failed CAS provides a net gain
Result
More (though not completely) robust concurrency semantics for ref
counting; increased latency under high contention, but still roughly
twice as fast as the original logic. Bench results to follow
Motivation:
We did miss to use MessageFormatter inside LocationAwareSlf4jLogger and so {} was not correctly replaced in log messages when using slf4j.
This regression was introduced by afe0767e9c.
Modifications:
- Make use of MessageFormatter
- Add unit test.
Result:
Fixes https://github.com/netty/netty/issues/8483.
Motivation:
We can change from using compareAndSet to addAndGet, which emits a different CPU instruction on x86 (CMPXCHG to XADD) when count direct memory usage. This instruction is cheaper in general and so produce less overhead on the "happy path". If we detect too much memory usage we just rollback the change before throwing the Error.
Modifications:
Replace compareAndSet(...) with addAndGet(...)
Result:
Less overhead when tracking direct memory.
Motivation:
We should allow adjustment of the leak detecting sampling interval when in SAMPLE mode.
Modifications:
Added new int property io.netty.leakDetection.samplingInterval
Result:
Be able to consume changes made by the user.
Motivation:
There is a racy UnsupportedOperationException instead because the task removal is delegated to MpscChunkedArrayQueue that does not support removal. This happens with SingleThreadEventExecutor that overrides the newTaskQueue to return an MPSC queue instead of the LinkedBlockingQueue returned by the base class such as NioEventLoop, EpollEventLoop and KQueueEventLoop.
Modifications:
- Catch the UnsupportedOperationException
- Add unit test.
Result:
Fix#8475
Motivation:
allLeaks is to store the DefaultResourceLeak. When we actually use it, the key is DefaultResourceLeak, and the value is actually a meaningless value.
We only care about the keys of allLeaks and don't care about the values. So Set is more in line with this scenario.
Using Set as a container is more consistent with the definition of a container than Map.
Modification:
Replace allLeaks with set. Create a thread-safe set using 'Collections.newSetFromMap(new ConcurrentHashMap<DefaultResourceLeak<?>, Boolean>()).'
Motivation:
HWT does not support anything smaller then 1ms so we should make it clear that this is the case.
Modifications:
Log a warning if < 1ms is used.
Result:
Less suprising behaviour.
Motivation:
In netty we use our own max direct memory limit that can be adjusted by io.netty.maxDirectMemory but we do not take this in acount when maxDirectMemory() is used. That will lead to non optimal configuration of PooledByteBufAllocator in some cases.
This came up on stackoverflow:
https://stackoverflow.com/questions/53097133/why-is-default-num-direct-arena-derived-from-platformdependent-maxdirectmemory
Modifications:
Correctly respect io.netty.maxDirectMemory and so configure PooledByteBufAllocator correctly by default.
Result:
Correct value for max direct memory.
Motivation:
There are currently many more places where this could be used which were
possibly not considered when the method was added.
If https://github.com/netty/netty/pull/8388 is included in its current
form, a number of these places could additionally make use of the same
BYTE_ARRAYS threadlocal.
There's also a couple of adjacent places where an optimistically-pooled
heap buffer is used for temp byte storage which could use the
threadlocal too in preference to allocating a temp heap bytebuf wrapper.
For example
https://github.com/netty/netty/blob/4.1/buffer/src/main/java/io/netty/buffer/ByteBufUtil.java#L1417.
Modifications:
Replace new byte[] with PlatformDependent.allocateUninitializedArray()
where appropriate; make use of ByteBufUtil.getBytes() in some places
which currently perform the equivalent logic, including avoiding copy of
backing array if possible (although would be rare).
Result:
Further potential speed-up with java9+ and appropriate compile flags.
Many of these places could be on latency-sensitive code paths.
Motivation:
trackedObject != null gives no guarantee that trackedObject remains reachable. This may cause problems related to premature finalization: false leak detector warnings.
Modifications:
Add private method reachabilityFence0 that works on JDK 8 and can be factored out into PlatformDependent. Later, it can be swapped for the real Reference.reachabilityFence.
Result:
No false leak detector warnings in future versions of JDK.
Motivation:
DefaultResourceLeak.toString() did include the wrong value for duplicated records.
Modifications:
Include the correct value.
Result:
Correct toString() implementation.
Motivation:
Java since version 6 has the wrapper for the ConcurrentHashMap that could be created via Collections.newSetFromMap(map). So no need to create own ConcurrentSet class. Also, since netty plans to switch to Java 8 soon there is another method for that - ConcurrentHashMap.newKeySet().
For now, marking this class @deprecated would be enough, just to warn users who use netty's ConcurrentSet. After switching to Java 8 ConcurrentSet should be removed and replaced with ConcurrentHashMap.newKeySet().
Modification:
ConcurrentSet deprecated.
Motivation:
Seems like IntegerHolder counterHashCode field is the very old legacy field that is no longer used. Should be marked as deprecated and removed in the future versions.
Modification:
IntegerHolder class, InternalThreadLocalMap.counterHashCode() and InternalThreadLocalMap.setCounterHashCode(IntegerHolder counterHashCode) are now deprecated.
Motivation:
When a X509TrustManager is used while configure the SslContext the JDK automatically does some extra checks during validation of provided certs by the remote peer. We should do the same when our native implementation is used.
Modification:
- Automatically wrap a X509TrustManager and so do the same validations as the JDK does.
- Add unit tests.
Result:
More consistent behaviour. Fixes https://github.com/netty/netty/issues/6664.
Motivation:
The Epoll transport checks to see if there are any scheduled tasks
before entering epoll_wait, and resets the timerfd just before.
This causes an extra syscall to timerfd_settime before doing any
actual work. When scheduled tasks aren't added frequently, or
tasks are added with later deadlines, this is unnecessary.
Modification:
Check the *deadline* of the peeked task in EpollEventLoop, rather
than the *delay*. If it hasn't changed since last time, don't
re-arm the timer
Result:
About 2us faster on gRPC RTT 50pct latency benchmarks.
Before (2 runs for 5 minutes, 1 minute of warmup):
```
50.0%ile Latency (in nanos): 64267
90.0%ile Latency (in nanos): 72851
95.0%ile Latency (in nanos): 78903
99.0%ile Latency (in nanos): 92327
99.9%ile Latency (in nanos): 119691
100.0%ile Latency (in nanos): 13347327
QPS: 14933
50.0%ile Latency (in nanos): 63907
90.0%ile Latency (in nanos): 73055
95.0%ile Latency (in nanos): 79443
99.0%ile Latency (in nanos): 93739
99.9%ile Latency (in nanos): 123583
100.0%ile Latency (in nanos): 14028287
QPS: 14936
```
After:
```
50.0%ile Latency (in nanos): 62123
90.0%ile Latency (in nanos): 70795
95.0%ile Latency (in nanos): 76895
99.0%ile Latency (in nanos): 90887
99.9%ile Latency (in nanos): 117819
100.0%ile Latency (in nanos): 14126591
QPS: 15387
50.0%ile Latency (in nanos): 61021
90.0%ile Latency (in nanos): 70311
95.0%ile Latency (in nanos): 76687
99.0%ile Latency (in nanos): 90887
99.9%ile Latency (in nanos): 119527
100.0%ile Latency (in nanos): 6351615
QPS: 15571
```
* Log the correct line-number when using SLF4j with netty if possible.
Motivation:
At the moment we do not log the correct line number in many cases as it will log the line number of the logger wrapper itself. Slf4j does have an extra interface that can be used to filter out these nad make it more usable with logging wrappers.
Modifications:
Detect if the returned logger implements LocationAwareLogger and if so make use of its extra methods to be able to log the correct origin of the log request.
Result:
Better logging when using slf4j.