Motivation:
In #9603 the executor hung on shutdown because of an abandoned task
on another executor the first was waiting for.
Modifications:
This commit modifies the executor shutdown sequence to include
switching to SHUTDOWN state and then running all remaining tasks.
This ensures that no more tasks are scheduled after SHUTDOWN and
the last pass of running remaining tasks will take it all.
Any tasks scheduled after SHUTDOWN will be rejected.
This change preserves the functionality of graceful shutdown with
quiet period and only adds one more pass of task execution after
the default shutdown process has finished and the executor is
ready for termination.
Result:
After this change tasks that succeed to be added to the executor will
be always executed. Tasks which come late will be rejected instead of
abandoned.
Motivation:
We should include the shaded sources for JCTools in our sources jar to make it easier to debug.
Modifications:
- Adjust plugin configuration to execute plugins in correct order
- Update source plugin
- Add configuration for shade plugin to generate source jar content
Result:
Fixes https://github.com/netty/netty/issues/6640.
Motivation:
Replace Map with Set. `reportedLeaks` has better semantics as a Set, and if it is a Map, it seems that the value of this Map has no meaning to us.
Modifications:
Use Set.
Result:
Cleaner code
Motivation
The event loop implementations had become somewhat tangled over time and
work was done recently to streamline EpollEventLoop. NioEventLoop would
benefit from the same treatment and it is more straighforward now that
we can follow the same structure as was done for epoll.
Modifications
Untangle NioEventLoop logic and mirror what's now done in EpollEventLoop
w.r.t. the volatile selector wake-up guard and scheduled task deadline
handling.
Some common refinements to EpollEventLoop have also been included - to
use constants for the "special" deadline/wakeup volatile values and to
avoid some unnecessary calls to System.nanoTime() on task-only
iterations.
Result
Hopefully cleaner, more efficient and less fragile NIO transport
implementation.
Motivation:
We can move some methods etc to make encapsulation better in Recycler
Modifications:
Move / rename methods to make usage more clear
Result:
Code cleanup
Motivation
Currently when future tasks are scheduled via EventExecutors from a
different thread, at least two allocations are performed - the
ScheduledFutureTask wrapping the to-be-run task, and a Runnable wrapping
the action to add to the scheduled task priority queue. The latter can
be avoided by incorporating this logic into the former.
Modification
- When scheduling or cancelling a future task from outside the event
loop, enqueue the task itself rather than wrapping in a Runnable
- Have ScheduledFutureTask#run first verify the task's deadline has
passed and if not add or remove it from the scheduledTaskQueue depending
on its cancellation state
- Add new outside-event-loop benchmarks to ScheduleFutureTaskBenchmark
Result
Fewer allocations when scheduling/cancelling future tasks
Motivation
The recycling ratio is currently implemented by comparing with a masked
count. The mask operation is not free and also not necessary.
Modification
Change the count(s) to just iterate over the corresponding interval,
which requires only a comparison and no mask.
Also make "first time recycle" behaviour consistent and revert change to
RecyclerTest made in #9727.
Result
Less recycling overhead
Motivation:
At the moment we only enfore ratioMask for the Stack which means that we only guard against recycle burts when recycled from the same Thread. We should also enforce the ratioMask in the WeakOrderQueue so we also guard against the bursts when recycle from other threads.
Modifications:
- Keep counter in WeakOrderQueue to enforce ratioMask as well
- Adjust unit test
Result:
Better guard against recycle bursts which could pollute the heap unnecessary.
Motivation
Currently the visibility of the various Recycler inner classes and their
fields isn't optimal. Some private members are accessed by other classes
resulting in synthetic methods, and other non-private classes/members
are only accessed privately and so can be made private.
Modifications
- Increase/reduce visibility of various fields/methods/classes within
Recycler
- Have WeakOrderQueue extend WeakReference<Thread> to eliminate the
owner field
- Change local DefaultHandle var to DefaultHandle<?> to avoid raw type
compiler warning
Result
Tidier code, fewer implicit methods on hot paths (reducing inlining
depths)
Motivation
This is already done internally for various reasons but it would make
sense i.m.o. as a top level concept: submitting a task to be run on the
event loop which doesn't need to run immediately but must still be
executed in FIFO order relative all other submitted tasks (be those
"lazy" or otherwise).
It's nice to separate this abstract "relaxed" semantic from concrete
implementations - the simplest is to just delegate to existing execute,
but for the main EL impls translates to whether a wakeup is required
after enqueuing.
Having a "global" abstraction also allows for simplification of our
internal use - for example encapsulating more of the common scheduled
future logic within AbstractScheduledEventExecutor.
Modifications
- Introduce public LazyRunnable interface and
AbstractEventExecutor#lazyExecute method (would be nice for this to be
added to EventExecutor interface in netty 5)
- Tweak existing SingleThreadEventExecutor mechanics to support these
- Replace internal use of NonWakeupRunnable (such as for pre-flush
channel writes)
- Uplift scheduling-related hooks into AbstractScheduledEventExecutor,
eliminating intermediate executeScheduledRunnable method
Result
Simpler code, cleaner and more useful/flexible abstractions - cleaner in
that they fully communicate the intent in a more general way, without
implying/exposing/restricting implementation details
Motivation:
We currently use a finalizer to ensure we correctly return the reserved back to the Stack but this is not really needed as we can ensure we return it when needed before dropping the WeakOrderQueue
Modifications:
Use explicit method call to ensure we return the reserved space back before dropping the object
Result:
Less finalizer usage and so less work for the GC
Motivation:
We null out the element in the array after we decrement the current size of the Stack but not directly write back the updated size to the stored field. This is problematic as we do some validation before we write it back and so may never do so if the validation fails. This then later can lead to have null objects returned where not expected
Modifications:
Update size directly after null out object
Result:
No more unexpected null value possible
##Motivation
The InternalLoggerFactory attempts to instantiate different logger
implementations to discover what is available on the class path,
accepting the first implementation that does not throw an exception.
Currently, the default ordering will attempt to instantiate a Log4j1
logger before Log4j2. For environments where both Log4j1 and Log4j2 are
available, this will result in using the older version. It seems that it
would be more intuitive to prefer the newer version, when possible.
##Modifications
Change the default ordering to attempt to use the Log4J2LoggerFactory
before the Log4JLoggerFactory.
##Result
For environments where both Log4j1 and Log4j2 are available on the class
path (but Slf4J is not available), Netty will now use Log4j2 instead of
Log4j1.
Motivation:
If maxDelayedQueues == 0 we should never put any WeakHashMap into the FastThreadLocal for a Thread.
Modifications:
Check if maxDelayedQueues == 0 and if so return directly. This will ensure we never call FastThreadLocal.initialValue() in this case
Result:
Less overhead / memory usage when maxDelayedQueues == 0
Motivation:
At the moment we directly extend the Recycler base class in our code which makes it hard to experiment with different Object pool implementation. It would be nice to be able to switch from one to another by using a system property in the future. This would also allow to more easily test things like https://github.com/netty/netty/pull/8052.
Modifications:
- Introduce ObjectPool class with static method that we now use internally to obtain an ObjectPool implementation.
- Wrap the Recycler into an ObjectPool and return it for now
Result:
Preparation for different ObjectPool implementations
Motivation:
We do not correct guard against the gact that when applying our workaround for windows we may end up with a 0 sleep period. In this case we should just sleep for 1 ms.
Modifications:
Guard agains the case when our calculation will produce 0 as sleep time on windows
Result:
Fixes https://github.com/netty/netty/issues/9710.
Motivation:
Netty is an asynchronous framework.
If somebody uses a blocking call inside Netty's event loops,
it may lead to a severe performance degradation.
BlockHound is a tool that helps detecting such calls.
Modifications:
This change adds a BlockHound's SPI integration that marks
threads created by Netty (`FastThreadLocalThread`s) as non-blocking.
It also marks some of Netty's internal methods as whitelisted
as they are required to run the event loops.
Result:
When BlockHound is installed, any blocking call inside event loops
is intercepted and reported (by default an error will be thrown).
Motivation
Currently when future tasks are scheduled via schedule(Runnable, ...)
methods, the supplied Runnable is wrapped in a newly allocated Callable
adapter prior to being wrapped in a ScheduledFutureTask.
This can be avoided which saves an object allocation per scheduled task.
Modifications
Change the Callable task field of ScheduledFutureTask to be of type
Object so that it can hold/run Runnables directly in addition to
Callables.
An "adapter" is still used in the case a Runnable is scheduled with an
explicit constant non-null completion value, assumed to be rare.
Result
Less garbage
Motivation:
We can use the `@SuppressJava6Requirement` annotation to be more precise about when we use Java6+ APIs. This helps us to ensure we always protect these places.
Modifications:
Make use of `@SuppressJava6Requirement` explicit
Result:
Fixes https://github.com/netty/netty/issues/2509.
Motivation
The current event loop shutdown logic is quite fragile and in the
epoll/NIO cases relies on the default 1 second wait/select timeout that
applies when there are no scheduled tasks. Without this default timeout
the shutdown would hang indefinitely.
The timeout only takes effect in this case because queued scheduled
tasks are first cancelled in
SingleThreadEventExecutor#confirmShutdown(), but I _think_ even this
isn't robust, since the main task queue is subsequently serviced which
could result in some new scheduled task being queued with much later
deadline.
It also means shutdowns are unnecessarily delayed by up to 1 second.
Modifications
- Add/extend unit tests to expose the issue
- Adjust SingleThreadEventExecutor shutdown and confirmShutdown methods
to explicitly add no-op tasks to the taskQueue so that the subsequent
event loop iteration doesn't enter blocking wait (as looks like was
originally intended)
Results
Faster and more robust shutdown of event loops, allows removal of the
default wait timeout
Motivation
When ScheduledFutureTasks complete, there's no need to retain a ref to
the wrapped task. Clearing it could help in particular with the case
where many scheduled tasks have been cancelled but their queue removal
delayed (since it is done lazily).
Modifications
This comprises just the PromiseTask changes from #9580. Upon completion,
replace the task reference with a static sentinel depending on the type
of completion (so that it will be reflected by toString).
Result
More expedient collection of cancelled task objects
Motivation:
Recycler$Stack.pop will occurs `ArrayIndexOutOfBoundsException` in some race cases, we should double check `size` even after `scavenge` called.
Modifications:
Double check `size` after `scavenge`
Result:
avoid ArrayIndexOutOfBoundsException in `pop`
Motivation
Currently a static AtomicLong is used to allocate a unique id whenever a
task is scheduled to any event loop. This could be a source of
contention if delayed tasks are scheduled at a high frequency and can be
easily avoided by having a non-volatile id counter per queue.
Modifications
- Replace static AtomicLong ScheduledFutureTask#nextTaskId with a long
field in AbstractScheduledExecutorService
- Set ScheduledFutureTask#id based on this when adding the task to the
queue (in event loop) instead of at construction time
- Add simple benchmark
Result
Less contention / cache-miss possibility when scheduling future tasks
Before:
Benchmark (num) Mode Cnt Score Error Units
scheduleLots 100000 thrpt 20 346.008 ± 21.931 ops/s
Benchmark (num) Mode Cnt Score Error Units
scheduleLots 100000 thrpt 20 654.824 ± 22.064 ops/s
Motivation:
peek() is implemented in a similar way to poll() for the mpsc queue, thus it is more like a consumer call.
It is possible that we could have multiple thread call peek() and possibly one thread calls poll() at at the same time.
This lead to multiple consumer scenario, which violates the multiple producer single consumer condition and could lead to spin in an infinite loop in peek()
Modification:
Use isEmpty() instead of peek() to check if task queue is empty
Result:
Dont violate the mpsc semantics.
Motivation:
SystemPropertyUtil already uses the AccessController internally so not need to wrap its usage with AccessController as well.
Modifications:
Remove explicit AccessController usage when SystemPropertyUtil is used.
Result:
Code cleanup
Motivation:
At the current moment HttpContentEncoder handle only first value of multiple accept-encoding headers.
Modification:
Join multiple accept-encoding headers to one separated by comma.
Result:
Fixes#9553
Motivation
Currently every call to get() on a promise results in two reads of the
volatile result field when one would suffice. Maybe this is optimized
away but it seems sensible not to rely on that.
Modification
Reimplement get() and get(...) in DefaultPromise to reduce volatile access.
Result
Fewer volatile reads.
Motivation
#9152 reverted some static exception reuse optimizations due to the
problem with Throwable#addSuppressed() raised in #9151. This introduced
a performance issue when promises are cancelled at a high frequency due
to the construction cost of CancellationException at the time that
DefaultPromise#cancel() is called.
Modifications
- Reinstate the prior static CANCELLATION_CAUSE_HOLDER but use it just
as a sentinel to indicate cancellation, constructing a new
CancellationException only if/when one needs to be explicitly
returned/thrown
- Subclass CancellationException, overriding fillInStackTrace() to
minimize the construction cost in these cases
Result
Promises are much cheaper to cancel. Fixes#9522.
Motivation:
There are some extra log level checks (logger.isWarnEnabled()).
Modification:
Remove log level checks (logger.isWarnEnabled()) from io.netty.channel.epoll.AbstractEpollStreamChannel, io.netty.channel.DefaultFileRegion, io.netty.channel.nio.AbstractNioChannel, io.netty.util.HashedWheelTimer, io.netty.handler.stream.ChunkedWriteHandler and io.netty.channel.udt.nio.NioUdtMessageConnectorChannel
Result:
Fixes#9456
Motivation:
The Netty classes are initialized at build time by default for GraalVM Native Image compilation. This is configured via the `--initialize-at-build-time=io.netty` option. While this reduces start-up time it can lead to some problems:
- The class initializer of `io.netty.buffer.PooledByteBufAllocator` looks at the maximum memory size to compute the size of internal buffers. If the class initializer runs during image generation, then the buffers are sized according to the very large heap size that the image generator uses, and Netty allocates several arrays that are 16 MByte. The fix is to initialize the following 3 classes at run time: `io.netty.buffer.PooledByteBufAllocator,io.netty.buffer.ByteBufAllocator,io.netty.buffer.ByteBufUtil`. This fix was dependent on a GraalVM Native Image fix that was included in 19.2.0.
- The class initializer of `io.netty.handler.ssl.util.ThreadLocalInsecureRandom` needs to be initialized at runtime to ensure that the generated values are trully random and not fixed for each generated image.
- The class initializers of `io.netty.buffer.AbstractReferenceCountedByteBuf` and `io.netty.util.AbstractReferenceCounted` compute field offsets. While the field offset recomputation is necessary for correct execution as a native image these initializers also have logic that depends on the presence/absence of `sun.misc.Unsafe`, e.g., via the `-Dio.netty.noUnsafe=true` flag. The fix is to push these initializers to runtime so that the field offset lookups (and the logic depending on them) run at run time. This way no manual substitutions are necessary either.
Modifications:
Add `META-INF/native-image` configuration files that correctly trigger the inialization of the above classes at run time via `--initialize-at-run-time=...` flags.
Result:
Fixes the initialisation issues described above for Netty executables built with GraalVM.
Motivation
The epoll transport was updated in #7834 to decouple setting of the
timerFd from the event loop, so that scheduling delayed tasks does not
require waking up epoll_wait. To achieve this, new overridable hooks
were added in the AbstractScheduledEventExecutor and
SingleThreadEventExecutor superclasses.
However, the minimumDelayScheduledTaskRemoved hook has no current
purpose and I can't envisage a _practical_ need for it. Removing
it would reduce complexity and avoid supporting this specific
API indefinitely. We can add something similar later if needed
but the opposite is not true.
There also isn't a _nice_ way to use the abstractions for
wakeup-avoidance optimizations in other EventLoops that don't have a
decoupled timer.
This PR replaces executeScheduledRunnable and
wakesUpForScheduledRunnable
with two new methods before/afterFutureTaskScheduled that have slightly
different semantics:
- They only apply to additions; given the current internals there's no
practical use for removals
- They allow per-submission wakeup decisions via a boolean return val,
which makes them easier to exploit from other existing EL impls (e.g.
NIO/KQueue)
- They are subjectively "cleaner", taking just the deadline parameter
and not exposing Runnables
- For current EL/queue impls, only the "after" hook is really needed,
but specialized blocking queue impls can conditionally wake on task
submission (I have one lined up)
Also included are further optimization/simplification/fixes to the
timerFd manipulation logic.
Modifications
- Remove AbstractScheduledEventExecutor#minimumDelayScheduledTaskRemoved()
and supporting methods
- Uplift NonWakeupRunnable and corresponding default wakesUpForTask()
impl from SingleThreadEventLoop to SingleThreadEventExecutor
- Change executeScheduledRunnable() to be package-private, and have a
final impl in SingleThreadEventExecutor which triggers new overridable
hooks before/afterFutureTaskScheduled()
- Remove unnecessary use of bookend tasks while draining the task queue
- Use new hooks to add simpler wake-up avoidance optimization to
NioEventLoop (primarily to demonstrate utility/simplicity)
- Reinstate removed EpollTest class
In EpollEventLoop:
- Refactor to use only the new afterFutureTaskScheduled() hook for
updating timerFd
- Fix setTimerFd race condition using a monitor
- Set nextDeadlineNanos to a negative value while the EL is awake and
use this to block timer changes from outside the EL. Restore the
known-set value prior to sleeping, updating timerFd first if necessary
- Don't read from timerFd when processing expiry event
Result
- Cleaner API for integrating with different EL/queue timing impls
- Fixed race condition to avoid missing scheduled wakeups
- Eliminate unnecessary timerFd updates while EL is awake, and
unnecessary expired timerFd reads
- Avoid unnecessary scheduled-task wakeups when using NIO transport
I did not yet further explore the suggestion of using
TFD_TIMER_ABSTIME for the timerFd.
Motivation:
AsciiString.contentEqualsIgnoreCase may return true for non-matching strings of equal length when offset is non zero.
Modifications:
- Correctly take offset into account
- Add unit test
Result:
Fixes#9475
Motivation:
EPOLL supports decoupling the timed wakeup mechanism from the selector call. The EPOLL transport takes advantage of this in order to offer more fine grained timer resolution. However we are current calling timerfd_settime on each call to epoll_wait and this is expensive. We don't have to re-arm the timer on every call to epoll_wait and instead only have to arm the timer when a task is scheduled with an earlier expiration than any other existing scheduled task.
Modifications:
- Before scheduled tasks are added to the task queue, we determine if the new
duration is the soonest to expire, and if so update with timerfd_settime. We
also drain all the tasks at the end of the event loop to make sure we service
any expired tasks and get an accurate next time delay.
- EpollEventLoop maintains a volatile variable which represents the next deadline to expire. This variable is modified inside the event loop thread (before calling epoll_wait) and out side the event loop thread (immediately to ensure proper wakeup time).
- Execute the task queue before the schedule task priority queue. This means we
may delay the processing of scheduled tasks but it ensures we transfer all
pending tasks from the task queue to the scheduled priority queue to run the
soonest to expire scheduled task first.
- Deprecate IORatio on EpollEventLoop, and drain the executor and scheduled queue on each event loop wakeup. Coupling the amount of time we are allowed to drain the executor queue to a proportion of time we process inbound IO may lead to unbounded queue sizes and unpredictable latency.
Result:
Fixes https://github.com/netty/netty/issues/7829
- In most cases this results in less calls to timerfd_settime
- Less event loop wakeups just to check for scheduled tasks executed outside the event loop
- More predictable executor queue and scheduled task queue draining
- More accurate and responsive scheduled task execution
Motivation:
Users' runtime systems may have incompatible dynamic libraries to the ones our
tcnative wrappers link to. Unfortunately, we cannot determine and catch these
scenarios (in which the JVM crashes) but we can make a more educated guess on
what library to load and try to find one that works better before crashing.
Modifications:
1) Build dynamically linked openSSL builds for more OSs (netty-tcnative)
2) Load native linux libraries with matching classifier (first)
Result:
More developers / users can use the dynamically-linked native libraries.
Motivation:
We did miss to call reclaimSpace(...) in one case which can lead to the situation of having the Recycler to not correctly reclaim space and so just create new objects when not needed.
Modifications:
Correctly call reclaimSpace(...)
Result:
Recycler correctly reclaims space in all situations.
Motivation:
Netty homepage(netty.io) serves both "http" and "https".
It's recommended to use https than http.
Modification:
I changed from "http://netty.io" to "https://netty.io"
Result:
No effects.
Motivation:
There are is some unnecessary code (like toString() calls) which can be cleaned up.
Modifications:
- Remove not needed toString() calls
- Simplify subString(...) calls
- Remove some explicit casts when not needed.
Result:
Cleaner code
Motivation:
There is some manual coping of elements of Collections which can be replaced by Collections.addAll(...) and also some unnecessary semicolons.
Modifications:
- Simplify branches
- Use Collections.addAll
- Code cleanup
Result:
Code cleanup
Motivation:
Some methods that either override others or are implemented as part of implementation an interface did miss the `@Override` annotation
Modifications:
Add missing `@Override`s
Result:
Code cleanup
Motivation:
asList should only be used if there are multiple elements.
Modification:
Call to asList with only one argument could be replaced with singletonList
Result:
Cleaner code and a bit of memory savings
Motivation:
Sometimes it is desirable to be able to use a different Queue implementation for the EventLoop of a Channel. This is currently not possible without resort to reflection.
Modifications:
- Add a new constructor to Nio|Epoll|KQueueEventLoopGroup which allows to specify a factory which is used to create the task queue. This was the user can override the default implementation.
- Add test
Result:
Be able to change Queue that is used for the EventLoop.
Motivation
A Semaphore is currently dedicated to this purpose but a simple
CountDownLatch will do.
Modification
Remove private threadLock Semaphore from SingleThreadEventExecutor and just use a CountDownLatch.
Also eliminate use of PlatformDependent.throwException() in startThread
method, and combine some nested if clauses.
Result
Cleaner EventLoop termination notification.
Motivation:
OOME is occurred by increasing suppressedExceptions because other libraries call Throwable#addSuppressed. As we have no control over what other libraries do we need to ensure this can not lead to OOME.
Modifications:
Only use static instances of the Exceptions if we can either dissable addSuppressed or we run on java6.
Result:
Not possible to OOME because of addSuppressed. Fixes https://github.com/netty/netty/issues/9151.
Motivation
@xiaoheng1 reported incorrect behaviour of AsciiString.lastIndexOf in
#9099. Upon closer inspection it appears that it was never implemented
correctly and searches between the provided index and the end of the
string similar to indexOf(...), rather than between the provided index
and the beginning of the string as the javadoc states (and in line with
java.lang.String).
Modifications
Fix AsciiString.lastIndexOf implementation and corresponding unit tests
to behave the same as the equivalent String methods.
Result
Fixes#9099
Motivation:
An instance is always equal to itself. It makes sense to skip processing for this case, which isn't uncommon since `AsciiString` is often memoized within an application when used as HTTP header names.
Modification:
`contentEquals` methods first check for instance equality before doing processing.
Result:
`contentEquals` will be faster when comparing an instance with itself.
I couldn't find any unit tests for these methods, only the static version. Let me know if I should add something to `AsciiStringCharacterTest`.
Came up here:
https://github.com/line/armeria/pull/1731#discussion_r280396280
Motivation:
GraalVM native images are a new way to deliver java applications. Netty is one of the most popular libraries however there are a few limitations that make it impossible to use with native images out of the box. Adding a few metadata (in specific modules will allow the compilation to success and produce working binaries)
Modification:
Added properties files in `META-INF` and substitutions classes (under `internal.svm`) will solve the compilation issues. The substitutions classes are not visible and do not have a public constructor so they are not visible to end users.
Result:
Fixes#8959
This fix is very conservative as it applies the minimum config required to build:
* pure netty servers
* vert.x applications
* grpc applications
The build is having trouble due to checkstyle which does not seem to be able to find the copyright notice on property files.
Motivation:
In GlobalEventExecutorTest we used Thread.sleep(...) which can produce flaky results (as seen on the CI). We should use another alternative during tests.
Modifications:
Replace Thread.sleep(...) with join()
Result:
No more flaky GlobalEventExecutor tests.
Motivation:
IKVM.NET seems to ship a bug sun.misc.Unsafe class, for this reason we should better disable our sun.misc.Unsafe usage when we detect IKVM.NET is used.
Modifications:
Check if IKVM.NET is used and if so do not use sun.misc.Unsafe by default.
Result:
Fixes https://github.com/netty/netty/issues/9035 and https://github.com/netty/netty/issues/8916.
Motivation
AbstractReferenceCounted and AbstractReferenceCountedByteBuf contain
duplicate logic for managing the volatile refcount in an optimized and
consistent manner, which increased in complexity in #8583. It's possible
to extract this into a common helper class now that all access is via an
AtomicIntegerFieldUpdater.
Modifications
- Move duplicate logic into a shared ReferenceCountUpdater class
- Incorporate some additional simplification for the most common single
increment/decrement cases (fewer checks/operations)
Result
Less code duplication, better encapsulation of the "non-trivial"
internal volatile refcount manipulation
Motivation:
PooledByteBufAllocator uses a PoolThreadCache per Thread that allocates / deallocates to minimize the performance overhead. This PoolThreadCache is trimmed after X allocations to free up buffers that are not allocated for a long time. This works out quite well when the app continues to allocate but fails if the app stops to allocate frequently (for whatever reason) and so a lot of memory is wasted and not given back to the arena / freed.
Modifications:
- Add a ThreadExecutorMap that offers multiple methods that wrap Runnable / ThreadFactory / Executor and allow to call ThreadExecutorMap.currentEventExecutor() to get the current executing EventExecutor for the calling Thread.
- Use these methods in the constructors of our EventExecutor implementations (which also covers the EventLoop implementations)
- Add io.netty.allocator.cacheTrimIntervalMillis system property which can be used to specify a fixed rate / interval on which we should try to trim the PoolThreadCache for a EventExecutor that allocates.
- Add PooledByteBufAllocator.trimCurrentThreadCache() to allow the user to trim the cache of the calling thread manually.
- Add testcases
- Introduce FastThreadLocal.getIfExists()
Result:
Allow to better / more frequently trim PoolThreadCache and so give back memory to the area / system.
Motivation:
We dont use ObjectCleaner in our FastThreadLocal anymore so we also dont need to take special care to store it there anymore.
Modifications:
Remove code that is not needed anymore.
Result:
Code cleanup.
Motivation:
GlobalEventExecutor does already provide all guarantees of OrderedEventExecutor so it should implement it.
Modifications:
Let GlobalEventExecutor implement OrderedEventExecutor.
Result:
Make it more clear how execution order is handled in GlobalEventExecutor.
Motivation:
This counter is very useful in order to monitor Netty without having every ByteBufAllocator in the JVM
Modification:
Expose the value of DIRECT_MEMORY_COUNTER as we are already doing for DIRECT_MEMORY_LIMIT.
We are returning -1 in case that DIRECT_MEMORY_COUNTER is not available.
Result:
Be able to get the amount of direct memory used.
Motivation:
We had a typo in NativeLibraryLoader debug log message which could misslead the user.
Modifications:
Fix typo to correctly state java.library.path
Result:
Correct and less confusing log message
Motivation:
2bb9f64e16 introduced a change which made it possible to use different shaded versions of netty-tcnative on the classpath. This only partly worked as we did not correctly handled the case when os / arch is part of the library name (which is the case when netty-tcnative-boringssl-static is used with the uber jar).
Modifications:
- If patching the ID failed we retry again with the os / arch stripped
- Add unit tests to verify that patching ID now works with and without os / arch as suffix.
Result:
Using multiple shaded version of netty-tcnative-boringssl-static on MacOS works.
Motivation:
We should run a CI job using J9 to ensure netty also works when using different JVMs.
Modifications:
- Adjust PooledByteBufAllocatorTest to be able to complete faster when using a JVM which takes longer when joining Threads (this seems to be the case with J9).
- Skip UDT tests on J9 as UDT is not supported there.
Result:
Be able to run CI against J9.
Motivation:
PromiseCombiner is not thread-safe and even assumes all added Futures are using the same EventExecutor. This is kind of fragile as we do not enforce this. We need to enforce this contract to ensure it's safe to use and easy to spot concurrency problems.
Modifications:
- Add new contructor to PromiseCombiner that takes an EventExecutor and deprecate the old non-arg constructor.
- Check if methods are called from within the EventExecutor thread and if not fail
- Correctly dispatch on the right EventExecutor if the Future uses a different EventExecutor to eliminate concurrency issues.
Result:
More safe use of PromiseCombiner + enforce correct usage / contract.
Motivation:
https://github.com/netty/netty/pull/8866 added support for calling Iterator.remove() but did not add a testcase.
Modifications:
Add testcase to ensure removal works.
Result:
Better test-coverage.
Motivation:
As ActiveMQ project using netty, we want to make use of this class, unfortunately the iterator on values(), seems to not support remove method, even so the delegated iterator does. Currently we have to clone and modify this class locally albeit a one line change is needed, it would be ideal if netty could allow remove, then removing the need to maintain a clone.
Modifications:
* remove throws UnsupportedOperationException, and instead call remove method on delegated iterator
Result:
Be able to call Iterator.remove() for the values.
Motivation:
When users' /tmp is noexec, NativeLibraryLoader logs a message informing
them how to fix the problem by setting a system property. However, if
Netty has been shaded that message will tell them to set the un-shaded
system property name, which won't work.
Modifications:
Change the code to let shading tools rename the native.workdir property
name reference within user-visible log messages.
Notably, debug logs were _not_ changed, as there's many debug statements
including a variety of property names. Fixing them would be a much more
invasive change and have limited benefit.
Result:
The users will see the correctly-named system property to set if they
are using a noexec /tmp.
Motivation:
When we fail a call to PromiseCombiner.finish(...) because of a null argument we must not update the internal state before throwing.
Modifications:
- First do the null check and only after we validated that the argument is not null update the internal state
- Add test case.
Modifications:
Do not mess up internal state of PromiseCombiner when finish(...) is called with a null argument.
Result:
After your change, what will change.
Motivation:
Just was looking through code and found 1 interesting place DateFormatter.tryParseMonth that was not very effective, so I decided to optimize it a bit.
Modification:
Changed DateFormatter.tryParseMonth method. Instead of invocation regionMatch() for every month - compare chars one by one.
Result:
DateFormatter.parseHttpDate method performance improved from ~3% to ~15%.
Benchmark (DATE_STRING) Mode Cnt Score Error Units
DateFormatter2Benchmark.parseHttpHeaderDateFormatter Sun, 27 Jan 2016 19:18:46 GMT thrpt 6 4142781.221 ± 82155.002 ops/s
DateFormatter2Benchmark.parseHttpHeaderDateFormatter Sun, 27 Dec 2016 19:18:46 GMT thrpt 6 3781810.558 ± 38679.061 ops/s
DateFormatter2Benchmark.parseHttpHeaderDateFormatterNew Sun, 27 Jan 2016 19:18:46 GMT thrpt 6 4372569.705 ± 30257.537 ops/s
DateFormatter2Benchmark.parseHttpHeaderDateFormatterNew Sun, 27 Dec 2016 19:18:46 GMT thrpt 6 4339785.100 ± 57542.660 ops/s
Motivation
As pointed out by @91he in
https://github.com/netty/netty/pull/8595#issuecomment-459181794, there
is a remaining bug in LocationAwareSlf4JLogger following the updates
done in #8595. The logging methods which take a varargs message
parameter array should format using MessageFormatter.arrayFormat rather
than MessageFormatter.format.
Modifications
Change varargs param methods in LocationAwareSlf4JLogger to use
MessageFormatter.arrayFormat and extend unit test to cover these cases.
Results
Correct log output when logging messages with > 2 parameters when using
LocationAwareSlf4JLogger.
Motivation:
If there are no listeners attached to the promise when full-filling it we do not need to schedule a task to notify.
Modifications:
- Don't schedule a task if there is nothing to notify.
- Add unit tests.
Result:
Fixes https://github.com/netty/netty/issues/8795.
Motivation:
To conform to the CharSequence interface we need to return an empty CharSequence when start == end index and a subSequence is requested.
Modifications:
- Correctly handle the case where start == end
- Add unit test
Result:
Fix https://github.com/netty/netty/issues/8796.
Motivation:
We need to update to a new checkstyle plugin to allow the usage of lambdas.
Modifications:
- Update to new plugin version.
- Fix checkstyle problems.
Result:
Be able to use checkstyle plugin which supports new Java syntax.
Motivation:
Incrementing two variables in sync is not necessary when only one will do.
Modifications:
- Remove `j` from `for` loop and replace with `i`.
- Add more unit testing scenarios to cover changed code.
Results:
Unnecessary variable removed.
Motivation:
We should try removing all FastThreadLocals for the Thread before we notify the termination. future. The user may block on the future and once it unblocks the JVM may terminate and start unloading classes.
Modifications:
Remove all FastThreadLocals for the Thread before notify termination future.
Result:
Fixes https://github.com/netty/netty/issues/6596.
Motivation:
In windows if the project is in a path that contains whitespace,
resources cannot be accessed and tests fail.
Modifications:
Adds ResourcesUtil.java in netty-common. Tests use ResourcesUtil.java to access a resource.
Result:
Being able to build netty in a path containing whitespace
Motivation
#8563 highlighted race conditions introduced by the prior optimistic
update optimization in 83a19d5650. These
were known at the time but considered acceptable given the perf
benefit in high contention scenarios.
This PR proposes a modified approach which provides roughly half the
gains but stronger concurrency semantics. Race conditions still exist
but their scope is narrowed to much less likely cases (releases
coinciding with retain overflow), and even in those
cases certain guarantees are still assured. Once release() returns true,
all subsequent release/retains are guaranteed to throw, and in
particular deallocate will be called at most once.
Modifications
- Use even numbers internally (including -ve) for live refcounts
- "Final" release changes to odd number (equivalent to refcount 0)
- Retain still uses faster getAndAdd, release uses CAS loop
- First CAS attempt uses non-volatile read
- Thread.yield() after a failed CAS provides a net gain
Result
More (though not completely) robust concurrency semantics for ref
counting; increased latency under high contention, but still roughly
twice as fast as the original logic. Bench results to follow
Motivation:
We did miss to use MessageFormatter inside LocationAwareSlf4jLogger and so {} was not correctly replaced in log messages when using slf4j.
This regression was introduced by afe0767e9c.
Modifications:
- Make use of MessageFormatter
- Add unit test.
Result:
Fixes https://github.com/netty/netty/issues/8483.
Motivation:
We can change from using compareAndSet to addAndGet, which emits a different CPU instruction on x86 (CMPXCHG to XADD) when count direct memory usage. This instruction is cheaper in general and so produce less overhead on the "happy path". If we detect too much memory usage we just rollback the change before throwing the Error.
Modifications:
Replace compareAndSet(...) with addAndGet(...)
Result:
Less overhead when tracking direct memory.
Motivation:
We should allow adjustment of the leak detecting sampling interval when in SAMPLE mode.
Modifications:
Added new int property io.netty.leakDetection.samplingInterval
Result:
Be able to consume changes made by the user.
Motivation:
There is a racy UnsupportedOperationException instead because the task removal is delegated to MpscChunkedArrayQueue that does not support removal. This happens with SingleThreadEventExecutor that overrides the newTaskQueue to return an MPSC queue instead of the LinkedBlockingQueue returned by the base class such as NioEventLoop, EpollEventLoop and KQueueEventLoop.
Modifications:
- Catch the UnsupportedOperationException
- Add unit test.
Result:
Fix#8475
Motivation:
allLeaks is to store the DefaultResourceLeak. When we actually use it, the key is DefaultResourceLeak, and the value is actually a meaningless value.
We only care about the keys of allLeaks and don't care about the values. So Set is more in line with this scenario.
Using Set as a container is more consistent with the definition of a container than Map.
Modification:
Replace allLeaks with set. Create a thread-safe set using 'Collections.newSetFromMap(new ConcurrentHashMap<DefaultResourceLeak<?>, Boolean>()).'
Motivation:
HWT does not support anything smaller then 1ms so we should make it clear that this is the case.
Modifications:
Log a warning if < 1ms is used.
Result:
Less suprising behaviour.
Motivation:
In netty we use our own max direct memory limit that can be adjusted by io.netty.maxDirectMemory but we do not take this in acount when maxDirectMemory() is used. That will lead to non optimal configuration of PooledByteBufAllocator in some cases.
This came up on stackoverflow:
https://stackoverflow.com/questions/53097133/why-is-default-num-direct-arena-derived-from-platformdependent-maxdirectmemory
Modifications:
Correctly respect io.netty.maxDirectMemory and so configure PooledByteBufAllocator correctly by default.
Result:
Correct value for max direct memory.
Motivation:
There are currently many more places where this could be used which were
possibly not considered when the method was added.
If https://github.com/netty/netty/pull/8388 is included in its current
form, a number of these places could additionally make use of the same
BYTE_ARRAYS threadlocal.
There's also a couple of adjacent places where an optimistically-pooled
heap buffer is used for temp byte storage which could use the
threadlocal too in preference to allocating a temp heap bytebuf wrapper.
For example
https://github.com/netty/netty/blob/4.1/buffer/src/main/java/io/netty/buffer/ByteBufUtil.java#L1417.
Modifications:
Replace new byte[] with PlatformDependent.allocateUninitializedArray()
where appropriate; make use of ByteBufUtil.getBytes() in some places
which currently perform the equivalent logic, including avoiding copy of
backing array if possible (although would be rare).
Result:
Further potential speed-up with java9+ and appropriate compile flags.
Many of these places could be on latency-sensitive code paths.
Motivation:
trackedObject != null gives no guarantee that trackedObject remains reachable. This may cause problems related to premature finalization: false leak detector warnings.
Modifications:
Add private method reachabilityFence0 that works on JDK 8 and can be factored out into PlatformDependent. Later, it can be swapped for the real Reference.reachabilityFence.
Result:
No false leak detector warnings in future versions of JDK.
Motivation:
DefaultResourceLeak.toString() did include the wrong value for duplicated records.
Modifications:
Include the correct value.
Result:
Correct toString() implementation.
Motivation:
Java since version 6 has the wrapper for the ConcurrentHashMap that could be created via Collections.newSetFromMap(map). So no need to create own ConcurrentSet class. Also, since netty plans to switch to Java 8 soon there is another method for that - ConcurrentHashMap.newKeySet().
For now, marking this class @deprecated would be enough, just to warn users who use netty's ConcurrentSet. After switching to Java 8 ConcurrentSet should be removed and replaced with ConcurrentHashMap.newKeySet().
Modification:
ConcurrentSet deprecated.
Motivation:
Seems like IntegerHolder counterHashCode field is the very old legacy field that is no longer used. Should be marked as deprecated and removed in the future versions.
Modification:
IntegerHolder class, InternalThreadLocalMap.counterHashCode() and InternalThreadLocalMap.setCounterHashCode(IntegerHolder counterHashCode) are now deprecated.
Motivation:
When a X509TrustManager is used while configure the SslContext the JDK automatically does some extra checks during validation of provided certs by the remote peer. We should do the same when our native implementation is used.
Modification:
- Automatically wrap a X509TrustManager and so do the same validations as the JDK does.
- Add unit tests.
Result:
More consistent behaviour. Fixes https://github.com/netty/netty/issues/6664.
Motivation:
The Epoll transport checks to see if there are any scheduled tasks
before entering epoll_wait, and resets the timerfd just before.
This causes an extra syscall to timerfd_settime before doing any
actual work. When scheduled tasks aren't added frequently, or
tasks are added with later deadlines, this is unnecessary.
Modification:
Check the *deadline* of the peeked task in EpollEventLoop, rather
than the *delay*. If it hasn't changed since last time, don't
re-arm the timer
Result:
About 2us faster on gRPC RTT 50pct latency benchmarks.
Before (2 runs for 5 minutes, 1 minute of warmup):
```
50.0%ile Latency (in nanos): 64267
90.0%ile Latency (in nanos): 72851
95.0%ile Latency (in nanos): 78903
99.0%ile Latency (in nanos): 92327
99.9%ile Latency (in nanos): 119691
100.0%ile Latency (in nanos): 13347327
QPS: 14933
50.0%ile Latency (in nanos): 63907
90.0%ile Latency (in nanos): 73055
95.0%ile Latency (in nanos): 79443
99.0%ile Latency (in nanos): 93739
99.9%ile Latency (in nanos): 123583
100.0%ile Latency (in nanos): 14028287
QPS: 14936
```
After:
```
50.0%ile Latency (in nanos): 62123
90.0%ile Latency (in nanos): 70795
95.0%ile Latency (in nanos): 76895
99.0%ile Latency (in nanos): 90887
99.9%ile Latency (in nanos): 117819
100.0%ile Latency (in nanos): 14126591
QPS: 15387
50.0%ile Latency (in nanos): 61021
90.0%ile Latency (in nanos): 70311
95.0%ile Latency (in nanos): 76687
99.0%ile Latency (in nanos): 90887
99.9%ile Latency (in nanos): 119527
100.0%ile Latency (in nanos): 6351615
QPS: 15571
```
* Log the correct line-number when using SLF4j with netty if possible.
Motivation:
At the moment we do not log the correct line number in many cases as it will log the line number of the logger wrapper itself. Slf4j does have an extra interface that can be used to filter out these nad make it more usable with logging wrappers.
Modifications:
Detect if the returned logger implements LocationAwareLogger and if so make use of its extra methods to be able to log the correct origin of the log request.
Result:
Better logging when using slf4j.
Motivation:
In Java8 and earlier we used reflection to replace the used key set if not otherwise told. This does not work on Java9 and later without special flags as its not possible to call setAccessible(true) on the Field anymore.
Modifications:
- Use Unsafe to instrument the Selector with out special set when sun.misc.Unsafe is present and we are using Java9+.
Result:
NIO transport produce less GC on Java9 and later as well.
Motivation:
In Java8 and earlier we used reflection to detect if unaligned access is supported. This fails in Java9 and later as we would need to change the accessible level of the method.
Lucky enough we can use Unsafe directly to read the content of the static field here.
Modifications:
Add special handling for detecting if unaligned access is supported on Java9 and later which does not fail due jigsaw.
Result:
Better and more correct detection on Java9 and later.
Motivation:
At the moment we will just assume the correct version of log4j2 is used when we find it on the classpath. This may lead to an AbstractMethodError at runtime. We should not use log4j2 if the version is not correct.
Modifications:
Check on class loading if we can use Log4J2 or not.
Result:
Fixes#8217.
Motivation:
Log4J2Logger had some code-duplication with AbstractInternalLogger
Modifications:
Reuse AbstractInternaLogger.EXCEPTION_MESSAGE in Log4J2Logger and so remove some code-duplication
Result:
Less duplicated code.
* We should be able to use the ByteBuffer cleaner on java8 (and earlier versions) even if sun.misc.Unsafe is not present.
Motivation:
At the moment we have a hard dependency on sun.misc.Unsafe to use the Cleaner on Java8 and earlier. This is not really needed as we can still use pure reflection if sun.misc.Unsafe is not present.
Modifications:
Refactor Cleaner6 to fallback to pure reflection if sun.misc.Unsafe is not present on system.
Result:
More timely releasing of direct memory on Java8 and earlier when sun.misc.Unsafe is not present.
Motivation:
f77891cc17 changed slightly how we detect if we should prefer direct buffers or not but did miss to also take this into account when logging.
Modifications:
Fix branch for log message to reflect changes in f77891cc17.
Result:
Correct logging.
Motivation:
There was a race condition between the task submitter and task executor threads such that the last Runnable submitted may not get executed.
Modifications:
The bug was fixed by checking the task queue and state in the task executor thread after it saw the task queue was empty.
Result:
Fixes#8230
Motivation:
We should prefer direct buffers whenever we can use the cleaner even if sun.misc.Unsafe is not present.
Modifications:
Correctly prefer direct buffers in all cases.
Result:
More correct code.
Motivation:
CleanerJava9 currently fails whever a SecurityManager is installed. We should make use of AccessController.doPrivileged(...) so the user can give it the correct rights.
Modifications:
Use doPrivileged(...) when needed.
Result:
Fixes https://github.com/netty/netty/issues/8201.
Motivation:
Recycler may produce a NPE when the same object is recycled multiple times from different threads.
Modifications:
- Check if the id has changed or if the Stack became null and if so throw an IllegalStateException
- Add unit test
Result:
Fixes https://github.com/netty/netty/issues/8220.
* Try to monkey-patch library id when shading is used and we are on MacOS / OSX.
Motivation:
ea4c315b45 did ensure we support using multiple versions of the same shaded native library but the user still needed to run install_name_tool -id on MacOS to ensure the ID is unique.
This is kind of error prone and also means that the shading itself would need to be done on MacOS / OSX.
This is related to https://github.com/netty/netty/issues/7272.
Modifications:
- Monkey patch the shaded native lib on MacOS to ensure the id is unique while unpacking it to the tempory location.
Result:
Easier way of using shaded native libs in netty.
Motivation:
Java9 and later does the safepoint polling by itself so there is not need for us to do it.
Modifications:
Check for java version before doing manual safepoint polling.
Result:
Less custom code and less overhead when using java9 and later. Fixes https://github.com/netty/netty/issues/8122.
Motivation:
We do not correctly check for previous calles of setUncancellable() in getNow() which may result in ClassCastException as we incorrectly return the internally UNCANCELLABLE object and not null if setUncancellable() we as called before.
Modifications:
Correctly check for UNCANCELLABLE and add unit test.
Result:
Fixes https://github.com/netty/netty/issues/8135.
Motivation:
We incorrectly calculated the length that was used for our for loop in AsciiString.indexOf(...). This lead to a possible ArrayIndexOutOfBoundsException.
Modifications:
- Not include the start in the length calculation
- Add unit test.
Result:
Fixes https://github.com/netty/netty/issues/8112.
Motivation:
Users should not see a scary log message when Netty is initialized if
Netty configuration explicitly disables unsafe. The log message that
produces this warning was previously guarded but by recent refactoring
a bug was introduced inside the guard helper method.
Modifications:
This commit brings back the guard against the scary log message if
unsafe is explicitly disabled.
Result:
No log message is produced when unsafe is unavailable because Netty was
told to not look for it.
Relates https://github.com/netty/netty/pull/5624, https://github.com/netty/netty/pull/6696
Motivation:
ObjectCleaner does start a Thread to handle the cleaning of resources which leaks into the users application. We should not use it in netty itself to make things more predictable.
Modifications:
- Remove usage of ObjectCleaner and use finalize as a replacement when possible.
- Clarify javadocs for FastThreadLocal.onRemoval(...) to ensure its clear that remove() is not guaranteed to be called when the Thread completees and so this method is not enough to guarantee cleanup for this case.
Result:
Fixes https://github.com/netty/netty/issues/8017.
Motivation:
I'm not sure if trivial changes like this are interesting :-) But I
noticed that the PlatformDependent.maxDirectMemory0() method is called
twice unnecessarily during static initialization (on the default path at
least).
Modifications:
Use constant MAX_DIRECT_MEMORY already set to the same value instead of
calling maxDirectMemory0() again.
Result:
A surely imperceivable reduction in operations performed at startup.