Motivation
Underlying array allocations in UnpooledHeapByteBuf are intended be done
via the protected allocateArray(int) method, so that they can be tracked
and/or overridden by subclasses, for example
UnpooledByteBufAllocator$InstrumentedUnpooledHeapByteBuf or #8015. But
it looks like an explicit allocation was missed in the copy(int,int)
method.
Modification
Just use alloc().heapBuffer(...) for the allocation
Result
No possibility of "missing" array allocations when ByteBuf#copy is used.
Motivation:
#9224 introduced overrides of ByteBufUtil#writeUtf8(...) and related
methods to operate on a sub-CharSequence directly to save having to
allocate substrings, but it missed an edge case where the subsequence
does not extend to the end of the CharSequence and the last char in the
sequence is a high surrogate.
Due to the catch-IndexOutOfBoundsException optimization that avoids an
additional bounds check, it would be possible to read past the specified
end char index and successfully decode a surrogate pair which would
otherwise result in a '?' byte being written.
Modifications:
- Check for end-of-subsequence before reading next char after a high
surrogate is encountered in the
writeUtf8(AbstractByteBuf,int,CharSequence,int,int) and
utf8BytesNonAscii methods
- Add unit test for this edge case
Result:
Bug is fixed.
This removes the bounds-check-avoidance optimization but it does not
appear to have a measurable impact on benchmark results, including when
the char sequence contains many surrogate pairs (which should be rare in
any case).
* Correctly take length of ByteBufInputStream into account for readLine() / readByte()
Motivation:
ByteBufInputStream did not correctly take the length into account when validate bounds for readLine() / readByte() which could lead to read more then allowed.
Modifications:
- Correctly take length into account
- Add unit tests
- Fix existing unit test
Result:
Correctly take length of ByteBufInputStream into account.
Related to https://github.com/netty/netty/pull/9306.
Motivation:
buffer.isReadable() should not be used to limit the amount of data that can be read as the amount may be less then was is readable.
Modification:
- Use available() which takes the length into account
- Add unit test
Result:
Fixes https://github.com/netty/netty/issues/9305
Motivation:
Fix the issue of incorrectly calculating the number of dump rows when using prettyHexDumpmethod in ByteBufUtil. The way to find the remainder is either length % 16 or length & 15
Modification:
Fixed the way to calculate the remainder
Result:
Fixed#9301
Motivation:
Some methods that either override others or are implemented as part of implementation an interface did miss the `@Override` annotation
Modifications:
Add missing `@Override`s
Result:
Code cleanup
Motivation:
At the moment EmptyByteBuf.getCharSequence(0,...) will return null while it must return a "".
Modifications:
- Let EmptyByteBuf.getCharSequence(0,...) return ""
- Add unit test
Result:
Fixes https://github.com/netty/netty/issues/9271.
Motivation
It would be useful to be able to write UTF-8 encoded subsequence of
CharSequence characters to a ByteBuf without needing to create a
temporary object via CharSequence#subSequence().
Modification
Add overloads of ByteBufUtil writeUtf8, reserveAndWriteUtf8 and
utf8Bytes methods which take explicit subsequence bounds.
Result
More efficient writing of substrings to byte buffers possible
Motivation
There's quite a lot of duplicate/equivalent logic across the various
concrete ByteBuf implementations. We could take this even further but
for now I've focused on the PooledByteBuf sub-hierarchy.
Modifications
- Move common logic/methods into existing PooledByteBuf abstract
superclass
- Shorten PooledByteBuf.capacity(int) method implementation
Result
Less code to maintain
Motivation
While digging around looking at something else I noticed that these
share a lot of logic and it would be nice to reduce that duplication.
Modifications
Have UnpooledUnsafeDirectByteBuf extend UnpooledDirectByteBuf and make
adjustments to ensure existing behaviour remains unchanged.
The most significant addition needed to UnpooledUnsafeDirectByteBuf was
re-overriding the getPrimitive/setPrimitive methods to revert back to
the AbstractByteBuf versions which include bounds checks
(UnpooledDirectByteBuf excludes these as an optimization, relying on
those done by underlying ByteBuffer).
Result
~200 fewer lines, less duplicate logic.
Motivation:
1. Users will be able to use an optimized version of
`UnpooledHeapByteBuf` and override behavior of methods if required.
2. Consistency with `UnpooledDirectByteBuf`, `UnpooledHeapByteBuf`, and
`UnpooledUnsafeDirectByteBuf`.
Modifications:
- Add `public` access modifier to `UnpooledUnsafeHeapByteBuf` class and
ctor;
Result:
Public access for optimized version of `UnpooledHeapByteBuf`.
Motivation
There are a few minor inconsistencies / redundant operations in the
ByteBuf implementations which would be good to fix.
Modifications
- Unnecessary ByteBuffer.duplicate() performed in
CompositeByteBuf.nioBuffer(int,int)
- Add missing checkIndex(...) check to
ReadOnlyByteBufferBuf.nioBuffer(int,int)
- Remove duplicate bounds check in
ReadOnlyByteBufferBuf.getBytes(int,byte[],int,int)
- Omit redundant bounds check in
UnpooledHeapByteBuf.getBytes(int,ByteBuffer)
Result
More consistency and slightly less overhead
Motivation
ByteBuf capacity is automatically increased as needed up to maxCapacity
when writing beyond the buffer's current capacity. However there's no
way to tell in general whether such an increase will result in a
relatively costly internal buffer re-allocation.
For unpooled buffers it always does, in pooled cases it depends on the
size of the associated chunk of allocated memory, which I don't think is
currently exposed in any way.
It would sometimes be useful to know where this limit is when making
external decisions about whether to reuse or preemptively reallocate.
It would also be advantageous to take this limit into account when
auto-increasing the capacity during writes, to defer such reallocation
until really necessary.
Modifications
Introduce new AbstractByteBuf.maxFastWritableBytes() method which will
return a value >= writableBytes() and <= maxWritableBytes().
Make use of the new method in the sizing decision made by the
AbstractByteBuf.ensureWritable(...) methods.
Result
Less reallocation/copying.
Motivation
A small thread-safety bug was introduced during the internal
optimizations of ComponentByteBuf made a while back in #8437. When there
is a single component which was added as a slice,
internalNioBuffer(int,int) will currently return the unwrapped slice's
un-duplicated internal NIO buffer. This is not safe since it could be
modified concurrently with other usage of that parent buffer.
Modifications
Delegate internalNioBuffer to nioBuffer in this case, which returns a
duplicate. This matches what's done in derived buffers in general
(for the same reason). Add unit test.
Result
Fixed possible thread-safety bug
Motivation
Direct buffers are normally preferred when interfacing with raw
sockets. Currently netty will only return direct io buffers (for reading
from a channel) when a platform has unsafe. However, this is
inconsistent with the write-side (filterOutboundMessage) where a direct
byte buffer will be returned if pooling is enabled. This means that
environments without unsafe (and no manual netty configurations) end up
with many pooled heap byte buffers for reading, many pooled direct byte
buffers for writing, and jdk pooled byte buffers (for reading).
Modifications
This commit modifies the AbstractByteBufAllocator to return a direct
byte buffer for io handling when the platform has unsafe or direct byte
buffers are pooled.
Result:
Use direct buffers when direct buffers are pooled for IO.
* Fix incorrect behavior of ReadOnlyByteBufferBuf.getBytes(int,ByteBuffer)
Motivation
It currently will succeed when the destination is larger than the source
range, but the ByteBuf javadoc states this should be a failure, as is
the case with all the other implementations.
Modifications
- Fix logic to fail the bounds check in this case
- Remove explicit null check which isn't done in any equivalent method
- Add unit test
Result
More correct/consistent behaviour
Motivation:
1f93bd3 introduced a regression that could lead to not have the lastAccessed field correctly null'ed out when the endOffset of the internal Component == CompositeByteBuf.readerIndex()
Modifications:
- Correctly null out the lastAccessed field in any case
- Add unit tests
Result:
Fixes regression in CompositeByteBuf.discard*ReadBytes()
Motivation
AbstractReferenceCounted and AbstractReferenceCountedByteBuf contain
duplicate logic for managing the volatile refcount in an optimized and
consistent manner, which increased in complexity in #8583. It's possible
to extract this into a common helper class now that all access is via an
AtomicIntegerFieldUpdater.
Modifications
- Move duplicate logic into a shared ReferenceCountUpdater class
- Incorporate some additional simplification for the most common single
increment/decrement cases (fewer checks/operations)
Result
Less code duplication, better encapsulation of the "non-trivial"
internal volatile refcount manipulation
Motivation:
The CompositeByteBuf discardReadBytes / discardReadComponents methods are currently quite inefficient, including when there are no read components to discard. We would like to call the latter more frequently in ByteToMessageDecoder#COMPOSITE_CUMULATOR.
In the same context it would be beneficial to perform a "shallow copy" of a composite buffer (for example when it has a refcount > 1) to avoid having to allocate and copy the contained bytes just to obtain an "independent" cumulation.
Modifications:
- Optimize discardReadBytes() and discardReadComponents() implementations (start at first comp rather than performing a binary search for the readerIndex).
- New addFlattenedComponents(boolean,ByteBuf) method which performs a shallow copy if the provided buffer is also composite and avoids adding any empty buffers, plus unit test.
- Other minor optimizations to avoid unnecessary checks.
Results:
discardReadXX methods are faster, composite buffers can be easily appended without deepening the buffer "tree" or retaining unused components.
Motivation:
We synchronize on the chunk.arena when produce the String returned by PoolSubpage.toString() which may raise a NPE when chunk == null. Chunk == null for the head of the linked-list and so a NPE may raised by a debugger. This NPE can never happen in real code tho as we never access toString() of the head.
Modifications:
Add null checks and so fix the possible NPE
Result:
No NPE when using a debugger and inspect the PooledByteBufAllocator.
Motivation:
We currently use a thread local cache for all threads which often is suprising to users as it may result in a lot of memory usage if they allocate buffers from outside the EventLoop in different threads. We should better not do this by default to keep suprises to a minimum. Users that need the performance and know what they are doing can still change this.
Modifications:
Change io.netty.allocator.useCacheForAllThreads to false by default
Result:
Related to https://github.com/netty/netty/issues/8536.
Motivation:
In MemoryRegionCache.Entry we use the Recycler to reduce GC pressure and churn. The problem is that these will also be recycled when the PoolThreadCache is collected and finalize() is called. This then can have the effect that we try to load class but the WebApp is already stoped.
This will produce an stacktrace like this on Tomcat:
```
19-Mar-2019 15:53:21.351 INFO [Finalizer] org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading Illegal access: this web application instance has been stopped already. Could not load [java.util.WeakHashMap]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
java.lang.IllegalStateException: Illegal access: this web application instance has been stopped already. Could not load [java.util.WeakHashMap]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading(WebappClassLoaderBase.java:1383)
at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForClassLoading(WebappClassLoaderBase.java:1371)
at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1224)
at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1186)
at io.netty.util.Recycler$3.initialValue(Recycler.java:233)
at io.netty.util.Recycler$3.initialValue(Recycler.java:230)
at io.netty.util.concurrent.FastThreadLocal.initialize(FastThreadLocal.java:188)
at io.netty.util.concurrent.FastThreadLocal.get(FastThreadLocal.java:142)
at io.netty.util.Recycler$Stack.pushLater(Recycler.java:624)
at io.netty.util.Recycler$Stack.push(Recycler.java:597)
at io.netty.util.Recycler$DefaultHandle.recycle(Recycler.java:225)
at io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry.recycle(PoolThreadCache.java:478)
at io.netty.buffer.PoolThreadCache$MemoryRegionCache.freeEntry(PoolThreadCache.java:459)
at io.netty.buffer.PoolThreadCache$MemoryRegionCache.free(PoolThreadCache.java:430)
at io.netty.buffer.PoolThreadCache$MemoryRegionCache.free(PoolThreadCache.java:422)
at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:279)
at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:270)
at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:241)
at io.netty.buffer.PoolThreadCache.finalize(PoolThreadCache.java:230)
at java.lang.System$2.invokeFinalize(System.java:1270)
at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:102)
at java.lang.ref.Finalizer.access$100(Finalizer.java:34)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:217)
```
Beside this we also need to ensure we not try to lazy load SizeClass when the finalizer is used as it may not be present anymore if the ClassLoader is already destroyed.
This would produce an error like:
```
20-Mar-2019 11:26:35.254 INFO [Finalizer] org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading Illegal access: this web application instance has been stopped already. Could not load [io.netty.buffer.PoolArena$1]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
java.lang.IllegalStateException: Illegal access: this web application instance has been stopped already. Could not load [io.netty.buffer.PoolArena$1]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading(WebappClassLoaderBase.java:1383)
at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForClassLoading(WebappClassLoaderBase.java:1371)
at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1224)
at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1186)
at io.netty.buffer.PoolArena.freeChunk(PoolArena.java:287)
at io.netty.buffer.PoolThreadCache$MemoryRegionCache.freeEntry(PoolThreadCache.java:464)
at io.netty.buffer.PoolThreadCache$MemoryRegionCache.free(PoolThreadCache.java:429)
at io.netty.buffer.PoolThreadCache$MemoryRegionCache.free(PoolThreadCache.java:421)
at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:278)
at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:269)
at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:240)
at io.netty.buffer.PoolThreadCache.finalize(PoolThreadCache.java:229)
at java.lang.System$2.invokeFinalize(System.java:1270)
at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:102)
at java.lang.ref.Finalizer.access$100(Finalizer.java:34)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:217)
```
Modifications:
- Only try to put the Entry back into the Recycler if the PoolThredCache is not destroyed because of the finalizer.
- Only try to access SizeClass if not triggered by finalizer.
Result:
No IllegalStateException anymoe when a webapp is reloaded in Tomcat that uses netty and uses the PooledByteBufAllocator.
Motivation:
The special case fixed in #8497 also requires that we keep a derived slice when trimming components in place, as done by the capacity(int) and discardReadBytes() methods.
Modifications:
Ensure that we keep a ref to trimmed components' original retained slice in capacity(int) and discardReadBytes() methods, so that it is released properly when the they are later freed. Add unit test which fails prior to the fix.
Result:
Edge case leak is eliminated.
Motivation:
PooledByteBufAllocator uses a PoolThreadCache per Thread that allocates / deallocates to minimize the performance overhead. This PoolThreadCache is trimmed after X allocations to free up buffers that are not allocated for a long time. This works out quite well when the app continues to allocate but fails if the app stops to allocate frequently (for whatever reason) and so a lot of memory is wasted and not given back to the arena / freed.
Modifications:
- Add a ThreadExecutorMap that offers multiple methods that wrap Runnable / ThreadFactory / Executor and allow to call ThreadExecutorMap.currentEventExecutor() to get the current executing EventExecutor for the calling Thread.
- Use these methods in the constructors of our EventExecutor implementations (which also covers the EventLoop implementations)
- Add io.netty.allocator.cacheTrimIntervalMillis system property which can be used to specify a fixed rate / interval on which we should try to trim the PoolThreadCache for a EventExecutor that allocates.
- Add PooledByteBufAllocator.trimCurrentThreadCache() to allow the user to trim the cache of the calling thread manually.
- Add testcases
- Introduce FastThreadLocal.getIfExists()
Result:
Allow to better / more frequently trim PoolThreadCache and so give back memory to the area / system.
Motivation:
We should run a CI job using J9 to ensure netty also works when using different JVMs.
Modifications:
- Adjust PooledByteBufAllocatorTest to be able to complete faster when using a JVM which takes longer when joining Threads (this seems to be the case with J9).
Result:
Be able to run CI against J9.
Motivation:
We can remove some properties for which we introduced replacements.
Modifications:
io.netty.buffer.bytebuf.checkAccessible, io.netty.leakDetectionLevel, org.jboss.netty.tryUnsafe properties removed
Result:
Code cleanup
Motivation:
We can just use Objects.requireNonNull(...) as a replacement for ObjectUtil.checkNotNull(....)
Modifications:
- Use Objects.requireNonNull(...)
Result:
Less code to maintain.
Motivation
There's some miscellaneous cleanup/simplification of CompositeByteBuf
which would help make the code a bit clearer.
Modifications
- Simplify web of constructors and addComponents methods, reducing
duplication of logic
- Rename `Component.freeIfNecessary()` method to just `free()`, which is
less confusing (see #8641)
- Make loop in addComponents0(...) method more verbose/readable (see
https://github.com/netty/netty/pull/8437#discussion_r232124414)
- Simplify addition/subtraction in setBytes(...) methods
Result
Smaller/clearer code
Motivation:
We have a utility method to check for > 0 and >0 arguments. We should use it.
Modification:
use checkPositive/checkPositiveOrZero instead of if statement.
Result:
Re-use utility method.
Motivation:
We can use lambdas now as we use Java8.
Modification:
use lambda function for all package, #8751 only migrate transport package.
Result:
Code cleanup.
Motivation:
As netty 4.x supported Java 6 we had various if statements to check for java versions < 8. We can remove these now.
Modification:
Remove unnecessary if statements that check for java versions < 8.
Result:
Cleanup code.
Motivation:
We need to update to a new checkstyle plugin to allow the usage of lambdas.
Modifications:
- Update to new plugin version.
- Fix checkstyle problems.
Result:
Be able to use checkstyle plugin which supports new Java syntax.
Motivation
In #8758, @doom369 reported an infinite loop bug in CompositeByteBuf
which was introduced in #8437.
This is the same small fix for that, along with fixes for two other bugs
found while re-inspecting the changes and adding unit tests.
Modification
- Replace recursive call to toComponentIndex with toComponentIndex0 as
intended
- Add missed "lastAccessed" racy cache invalidation in capacity(int)
method
- Fix incorrect determination of initial offset in non-zero cIndex case
of updateComponentOffsets method
- New unit tests for previously uncovered methods
Results
Fewer bugs.
Motivation:
Custom Netty ThreadLocalRandom and ThreadLocalRandomProvider classes are no longer needed and can be removed.
Modification:
Remove own ThreadLocalRandom
Result:
Less code to maintain
Motivation:
We can use the diamond operator these days.
Modification:
Use diamond operator whenever possible.
Result:
More modern code and less boiler-plate.
Motivation:
After switching to Java 8 we no longer need LongCounter and LongAdderCounter. We can directly replace them with LongAdder.
Modification:
Removed LongCounter and LongAdderCounter classes.
Result:
Less code to maintain
Motivation:
The javadocs stating `IndexOutOfBoundsException` is thrown were
different from what `ByteBuf` actually did. We want to ensure the
Javadocs represent reality.
Modifications:
Updated javadocs on `write*`, `ensureWriteable`, `capacity`, and
`maxCapacity` methods.
Results:
Javadocs more closely match actual behaviour.
Motivation:
ByteBuf supports “marker indexes”. The intended use case for these is if a speculative operation (e.g. decode) is in process the user can “mark” and interface and refer to it later if the operation isn’t successful (e.g. not enough data). However this is rarely used in practice,
requires extra memory to maintain, and introduces complexity in the state management for derived/pooled buffer initialization, resizing, and other operations which may modify reader/writer indexes.
Modifications:
Remove support for marking and adjust testcases / code.
Result:
Fixes https://github.com/netty/netty/issues/8535.
Motivation:
In versions of Netty prior to 4.1.31.Final, a CompositeByteBuf could be
created with any size (including potentially nonsensical negative
values). This behavior changed in e7737b993, which introduced a bounds
check to only allow for a component size greater than one. This broke
some existing use cases that attempted to create a byte buf with a
single component.
Modifications:
Lower the bounds check on numComponents to include the single component
case, but still throw an exception for anything less than one.
Add unit tests for the case of numComponents being less than, equal to,
and greater than this lower bound.
Result:
Return to the behavior of 4.1.30.Final, allowing one component, but
still include an explicit check against a lower bound.
Note that while creating a CompositeByteBuf with a single component is
in some ways a contradiction of the term "composite", this patch caters
for existing uses while excluding the clearly nonsensical case of asking
for a CompositeByteBuf with zero or fewer components.
Fixes#8613.
Motivation:
Often a temporary ByteBuffer is used which can be cached to reduce the GC pressure.
Modifications:
Cache the ByteBuffer in the PoolThreadCache as well.
Result:
Less GC.
Motivation
#8563 highlighted race conditions introduced by the prior optimistic
update optimization in 83a19d5650. These
were known at the time but considered acceptable given the perf
benefit in high contention scenarios.
This PR proposes a modified approach which provides roughly half the
gains but stronger concurrency semantics. Race conditions still exist
but their scope is narrowed to much less likely cases (releases
coinciding with retain overflow), and even in those
cases certain guarantees are still assured. Once release() returns true,
all subsequent release/retains are guaranteed to throw, and in
particular deallocate will be called at most once.
Modifications
- Use even numbers internally (including -ve) for live refcounts
- "Final" release changes to odd number (equivalent to refcount 0)
- Retain still uses faster getAndAdd, release uses CAS loop
- First CAS attempt uses non-volatile read
- Thread.yield() after a failed CAS provides a net gain
Result
More (though not completely) robust concurrency semantics for ref
counting; increased latency under high contention, but still roughly
twice as fast as the original logic. Bench results to follow
Motivation:
ByteBuf is used everywhere so we should try hard to be able to make things inlinable. During benchmarks it showed that writeCharSequence(...) fails to inline writeUtf8 because it is too big even if its hots.
Modifications:
Move less common code-path to extra method to allow inlining.
Result:
Be able to inline writeUtf8 in most cases.
Motivation:
Often a temporary ByteBuffer is used which can be cached to reduce the GC pressure.
Modifications:
Add a Deque per PoolChunk which will be used for caching.
Result:
Less GC.
Motivation:
When we create new chunk with memory aligned, the offset of direct memory should be
'alignment - address & (alignment - 1)', not just 'address & (alignment - 1)'.
Modification:
Change offset calculating formula to offset = alignment - address & (alignment - 1) in PoolArena.DirectArena#offsetCacheLine and add a unit test to assert that.
Result:
Correctly calculate offset.