Motivation:
Found an invalid comment in UnpooledDirectByteBuf.
Modification:
Fixed a comment in UnpooledDirectByteBuf.
Result:
Fixed a comment in UnpooledDirectByteBuf.
Motivation:
We rely on this functionality in PoolChunk, and a bug was caught by a non-deterministic test failure
Modification:
Went back to the Algorithms book, and reimplemented remove() the way it was meant to.
Result:
No test failures after 200.000 runs, so we have some confidence the code is correct now.
Motivation:
The uncached access to PoolChunk can be made faster, and avoid allocating boxed Longs, if we have a primitive hash map and priority queue implementation for it.
Modification:
Add bespoke primitive implementations of a hash map and a priority queue for PoolChunk.
Remove all the long-boxing caused by the previous implementation.
The hashmap is a linear probing map with a fairly short probe that keeps the search within a couple of cache lines.
The priority queue is the same binary heap algorithm that's described in Algorithms by Sedgewick and Wayne.
The implementation avoids the Long boxing by relying on a long[] array.
This makes the internal-remove method faster, which is an important operation in PoolChunk.
Result:
Roughly 13% performance uplift in buffer allocations that miss cache.
Motivation:
https://github.com/netty/netty/pull/10267 introduced a change that reduced the fragmentation. Unfortunally it also introduced a regression when it comes to caching of normal allocations. This can have a negative performance impact depending on the allocation sizes.
Modifications:
- Fix algorithm to calculate the array size for normal allocation caches
- Correctly calculate indeox for normal caches
- Add unit test
Result:
Fixes https://github.com/netty/netty/issues/10805
Motivation:
Passing a null value of byte[] to the `Unsafe.copyMemory(xxx)` would cause the JVM crash
Modification:
Add null checking before calling `PlatformDependent.copyMemory(src, xxx)`
Result:
Fixes#10791 .
Motivation:
PoolChunk maintains multiple PriorityQueue<Long> collections. The usage
of PoolChunk#removeAvailRun unboxes the Long values to long, and then
this method uses queue.remove(..) which will auto box the value back to
Long. This creates unnecessary allocations via Long.valueOf(long).
Modifications:
- Adjust method signature and usage of PoolChunk#removeAvailRun to avoid
boxing
Result:
Less allocations as a result of PoolChunk#removeAvailRun.
Motivation:
There is no need for ByteProcessor to throw a checked exception.
The declared checked exception causes unnecessary code complications just to propagate it.
This can be cleaned up.
Modification:
ByteProcessor.process no longer declares to throw a checked exception, and all the places that were trying to cope with the checked exception have been simplified.
Result:
Simpler code.
Motivation:
There is no need for ByteProcessor to throw a checked exception.
The declared checked exception causes unnecessary code complications just to propagate it.
This can be cleaned up.
Modification:
ByteProcessor.process no longer declares to throw a checked exception, and all the places that were trying to cope with the checked exception have been simplified.
Result:
Simpler code.
Motivation:
Some buffers implement ByteBuf#order(order) by wrapping themselves in a SwappedByteBuf.
The SwappedByteBuf is then responsible for swapping the byte order on accesses.
The explicitly little-endian accessor methods, however, should not be swapped to big-endian, but instead remain explicitly little-endian.
Modification:
The SwappedByteBuf was passing through calls to e.g. writeIntLE, to the big-endian equivalent, e.g. writeInt.
This has been changed so that these calls delegate to their explicitly little-endian counterpart.
Result:
This makes all buffers that make use of SwappedByteBuf for their endian-ness configuration, consistent with all the buffers that use other implementation strategies.
In the end, all buffers now behave exactly the same, when using their explicitly little-endian accessor methods.
Motivation:
HTTP is a plaintext protocol which means that someone may be able
to eavesdrop the data. To prevent this, HTTPS should be used whenever
possible. However, maintaining using https:// in all URLs may be
difficult. The nohttp tool can help here. The tool scans all the files
in a repository and reports where http:// is used.
Modifications:
- Added nohttp (via checkstyle) into the build process.
- Suppressed findings for the websites
that don't support HTTPS or that are not reachable
Result:
- Prevent using HTTP in the future.
- Encourage users to use HTTPS when they follow the links they found in
the code.
Motivation:
LGTM reports multiple issues. They need to be triaged,
and real ones should be fixed.
Modifications:
- Fixed multiple issues reported by LGTM, such as redundant conditions,
resource leaks, typos, possible integer overflows.
- Suppressed false-positives.
- Added a few testcases.
Result:
Fixed several possible issues, get rid of false alarms in the LGTM report.
Motivation:
As the PooledByteBufAllocator is a critical part of netty we should ensure it works as expected.
Modifications:
- Add a few more asserts to ensure we not see any corrupted state
- Null out slot in the subpage array once the subpage was freed and removed from the pool
- Merge methods into constructor as it was only called from the constructor anyway.
Result:
Code cleanup
Motivation:
- To make ensureWritable throw IOOBE when maxCapacity is exceeded, even if
the requested new capacity would overflow Integer.MAX_VALUE
Modification:
- AbstractByteBuf.ensureWritable0 is modified to detect when
targetCapacity has wrapped around
- Test added for correct behaviour in AbstractByteBufTest
Result:
- Calls to ensureWritable will always throw IOOBE when maxCapacity is
exceeded (and bounds checking is enabled)
Motivation:
writeUtf8 can suffer from inlining issues and/or megamorphic call-sites on the hot path due to ByteBuf hierarchy
Modifications:
Duplicate and specialize the code paths to reduce the need of polymorphic calls
Result:
Performance are more stable in user code
Motivation:
If ByteBufUtil.getBytes() is called with copy=false, it does not
correctly check that the underlying array can be shared in some cases.
In particular:
* It does not check that the arrayOffset() is zero. This causes it to
incorrectly return the underlying array if the other conditions are
met. The returned array will be longer than requested, with additional
unwanted bytes at its start.
* It assumes that the capacity() of the ByteBuf is equal to the backing
array length. This is not true for some types of ByteBuf, such as
PooledHeapByteBuf. This causes it to incorrectly return the underlying
array if the other conditions are met. The returned array will be
longer than requested, with additional unwanted bytes at its end.
Modifications:
This commit fixes the two bugs by:
* Checking that the arrayOffset() is zero before returning the
underlying array.
* Comparing the requested length to the underlying array's length,
rather than the ByteBuf's capacity, before returning the underlying
array.
This commit also adds a series of test cases for ByteBufUtil.getBytes().
Result:
ByteBufUtil.getBytes() now correctly checks whether the underlying array
can be shared or not.
The test cases will ensure the bug is not reintroduced in the future.
Motivation
This is used solely for the DataOutput#writeUTF8() method, which may
often not be used.
Modifications
Lazily construct the contained DataOutputStream in ByteBufOutputStream.
Result
Saves an allocation in some common cases
Motivation
ByteBuf has an isAccessible method which was introduced as part of ref
counting optimizations but there are some places still doing
accessibility checks by accessing the volatile refCnt() directly.
Modifications
- Have PooledNonRetained(Duplicate|Sliced)ByteBuf#isAccessible() use
their refcount delegate's isAccessible() method
- Add static isAccessible(buf) and ensureAccessible(buf) methods to
ByteBufUtil
(since ByteBuf#isAccessible() is package-private)
- Adjust DefaultByteBufHolder and similar classes to use these methods
rather than access refCnt() directly
Result
- More efficient accessibility checks in more places
Motivation:
We shouldn't call incSmallAllocation() in a synchronized block as its backed by a concurrent datastructure
Modifications:
Move call of incSmallAllocation() out of synchronized block
Result:
Minimize scope of synchronized block
Motivation:
For size from 512 bytes to chunkSize, we use a buddy algorithm. The
drawback is that it has a large internal fragmentation.
Modifications:
1. add SizeClassesMetric and SizeClasses
2. remove tiny size, now we have small, normal and huge size
3. rewrite the structure of PoolChunk
4. rewrite pooled allocate algorithm in PoolChunk
5. when allocate subpage, using lowest common multiple of pageSize and
elemSize instead of pageSize.
6. add more tests in PooledByteBufAllocatorTest and PoolArenaTest
Result:
Reduce internal fragmentation and closes#3910
Motivation:
We should include as much details as possible when throwing an IllegalArgumentException because of overflow in CompositeByteBuf
Modifications:
Add more details and factor out check into a static method to share code
Result:
Make it more clear why an operations failed
Motivation
An NPE was reported in #10245, caused by a regression introduced in
#8939. This in particular affects ByteToMessageDecoders that use the
COMPOSITE_CUMULATOR.
Modification
- Unwrap WrappedCompositeByteBufs passed to
CompositeByteBuf#addFlattenedComponents(...) method before accessing
internal components field
- Extend unit test to cover this case and ensure more of the
CompositeByteBuf tests are also run on the wrapped variant
Results
Fixes#10245
Motivation:
We can make use of our optimized implementations for UTF-8 and US-ASCII if the user request a copy of a sequence for these charsets
Modifications:
- Add fastpath implementation
- Add unit tests
Result:
Fixes https://github.com/netty/netty/issues/10205
Motivation:
We need to ensure we not overflow when adding buffers to a CompositeByteBuf
Modifications:
- Correctly validate overflow before adding to the internal storage
- Add testcase
Result:
Fixes https://github.com/netty/netty/issues/10194
Motivation:
We have found out that ByteBufUtil.indexOf can be inefficient for substring search on
ByteBuf, both in terms of algorithm complexity (worst case O(needle.readableBytes *
haystack.readableBytes)), and in constant factor (esp. on Composite buffers).
With implementation of more performant search algorithms we have seen improvements on
the order of magnitude.
Modifications:
This change introduces three search algorithms:
1. Knuth Morris Pratt - classical textbook algorithm, a good default choice.
2. Bit mask based algorithm - stable performance on any input, but limited to maximum
search substring (the needle) length of 64 bytes.
3. Aho–Corasick - worse performance and higher memory consumption than [1] and [2], but
it supports multiple substring (the needles) search simultaneously, by inspecting every
byte of the haystack only once.
Each algorithm processes every byte of underlying buffer only once, they are implemented
as ByteProcessor.
Result:
Efficient search algorithms with linear time complexity available in Netty (I will share
benchmark results in a comment on a PR).
Motivation:
PoolChunk requires a link to a PoolThreadCache to init ByteBuf. Currently the link is retrieved from a thread local: arena.parent.threadCache().
It has some performance cost. At the beginning of the allocation call the PoolThreadCache is already retrieved from the thread local. The reference can be propagated through the calls and used.
Modifications:
Replace second lookup of PoolThreadCache during ByteBuf init by propagation of a reference to PoolThreadCache down in the allocation stack explicitly
Result:
Improve performance of ByteBuf allocation
--Before--
Benchmark (size) (tokens) (useThreadCache) Mode Cnt Score Error Units
SimpleByteBufPooledAllocatorBenchmark.getAndRelease 123 0 true avgt 20 57.112 ± 1.004 ns/op
SimpleByteBufPooledAllocatorBenchmark.getAndRelease 123 100 true avgt 20 222.827 ± 1.307 ns/op
--After--
Benchmark (size) (tokens) (useThreadCache) Mode Cnt Score Error Units
SimpleByteBufPooledAllocatorBenchmark.getAndRelease 123 0 true avgt 20 50.732 ± 1.321 ns/op
SimpleByteBufPooledAllocatorBenchmark.getAndRelease 123 100 true avgt 20 216.892 ± 3.806 ns/op
Motivation:
PoolChunk.usage() method has non-trivial computations. It is used currently in hot path methods invoked when an allocation and de-allocation are happened.
The idea is to replace usage() output comparison against percent thresholds by Chunk.freeBytes plain comparison against absolute thresholds. In such way the majority of computations from the threshold conditions are moved to init logic.
Modifications:
Replace PoolChunk.usage() conditions in PoolChunkList with equivalent conditions for PoolChunk.freeBytes()
Result:
Improve performance of allocation and de-allocation of ByteBuf from normal size cache pool
Motivation:
The current implementation of log2 in PoolThreadCache uses a loop and less efficient than an version based on Integer.numberOfLeadingZeros (intrinsic).
Modifications:
Replace the current log2 implementation in PoolThreadCache with a version based on Integer.numberOfLeadingZeros
Result:
It can improve performance slightly during allocation and de-allocation of ByteBuf using pooled allocator.
Motivation:
FixedCompositeByteBuf.isDirect() should return the same value as EMPTY_BUFFER.isDirect() when constructed via an empty array
Modifications:
- Return correct value when constructed via empty array.
- Add unit test
Result:
FixedCompositeByteBuf.isDirect() returns correct value
Motivation:
We hat a typo in the property name that is used in PooledByteBufAllocator.
Modifications:
Change from "...allocation.." to "...allocator.."
Result:
More consistent property naming
Motivation
Recent optimization #9765 introduced a bug where the native indices of
the internal reused duplicate nio buffer are not properly reset prior to
using it to copy data during a reallocation operation. This can result
in BufferOverflowExceptions thrown during ByteBuf capacity changes.
The code path in question applies only to pooled direct buffers when
Unsafe is disabled or not available.
Modification
Ensure ByteBuffer#clear() is always called on the reused internal nio
buffer prior to returning it from PooledByteBuf#internalNioBuffer()
(protected method); add unit test that exposes the bug.
Result
Fixes#9911
# Motivation:
`DefaultByteBufHolder.equals()` considers another object equal if it's an instance of `ByteBufferHolder` and if the contents of two objects are equal. However, the behavior of `equals` method is not a part of the `ByteBufHolder` contract so `DefaultByteBufHolder`'s version may be causing violation of the symmetric property if other classes have different logic.
There are already a few classes that are affected by this: `DefaultHttp2GoAwayFrame`, `DefaultHttp2UnknownFrame`, and `SctpMessage` are all overriding `equals` method breaking the symmetric property.
Another effect of this behavior is that all instances with empty data are considered equal. That may not be desireable in the situations when instances are created for predefined constants, e.g. `FullBulkStringRedisMessage.NULL_INSTANCE` and `FullBulkStringRedisMessage.EMPTY_INSTANCE` in `codec-redis`.
# Modification:
Make `DefaultByteBufHolder.equals()` implementation only work for the objects of the same class.
# Result:
- The symmetric property of the `equals` method is restored for the classes in question.
- Instances of different classes are not considered equal even if the content of the data they hold are the same.
Motivation
While working on other changes I noticed some opportunities to
streamline a few things in AbstractByteBuf.
Modifications
- Avoid duplicate ensureAccessible() checks in discard(Some)ReadBytes()
and ensureWritable0(int) methods
- Simplify ensureWritable0(int) logic
- Make some conditional checks more concise
Result
Cleaner, possibly faster code
Motivation:
Currently when use of Unsafe is disabled and an internal reallocation is
performed for a direct PooledByteBuf, a one-off temporary duplicate is
made of the source and destination backing nio buffers so that the
copy can be done in a threadsafe manner.
The need for this can be reduced by sharing the temporary duplicate
buffer that is already stored in the corresponding destination
PooledByteBuf instance.
Modifications:
Have PoolArena#memoryCopy(...) take the destination PooledByteBuf
instead of the underlying mem reference and offset, and use
internalNioBuffer() to obtain/initialize a reusable duplicate of the
backing nio buffer.
Result:
Fewer temporary allocations when resizing direct pooled ByteBufs in the
non-Unsafe case
Motivation
There's currently no way to determine whether an arbitrary ByteBuf
behaves internally like a "singluar" buffer or a composite one, and this
can be important to know when making decisions about how to manipulate
it in an efficient way.
An example of this is the ByteBuf#discardReadBytes() method which
increases the writable bytes for a contiguous buffer (by readerIndex)
but does not for a composite one.
Unfortunately !(buf instanceof CompositeByteBuf) is not reliable, since
for example this will be true in the case of a sliced CompositeByteBuf
or some third-party composite implementation.
isContiguous was chosen over isComposite since we want to assume "not
contiguous" in the unknown/default case - the doc will it clear that
false does not imply composite.
Modifications
- Add ByteBuf#isContiguous() which returns true by default
- Override the "concrete" ByteBuf impls to return true and ensure
wrapped/derived impls delegate it appropriately
- Include some basic unit tests
Result
Better assumptions/decisions possible when manipulating arbitrary
ByteBufs, for example when combining/cumulating them.
Motivation:
At the moment we directly extend the Recycler base class in our code which makes it hard to experiment with different Object pool implementation. It would be nice to be able to switch from one to another by using a system property in the future. This would also allow to more easily test things like https://github.com/netty/netty/pull/8052.
Modifications:
- Introduce ObjectPool class with static method that we now use internally to obtain an ObjectPool implementation.
- Wrap the Recycler into an ObjectPool and return it for now
Result:
Preparation for different ObjectPool implementations
Motivation
Currently doc != code and so one needs to change. Though behaviour as
currently documented might be more intuitive, we don't want to break
anyone so will adjust the doc instead. See #9503 for discussion.
Modifications
Correct the javadoc of indexOf(...) method in ByteBuf abstract class.
Results
Correct javadoc
Motivation
This is a "simpler" alternative to #9416 which fixes the same
CompositeByteBuf bugs described there, originally reported by @jingene
in #9398.
Modifications
- Add fields to Component class for the original buffer along with its
adjustment, which may be different to the already-stored unwrapped
buffer. Use it in appropriate places to ensure correctness and
equivalent behaviour to that prior to the earlier optimizations
- Add comments explaining purpose of each of the Component fields
- Unwrap more kinds of buffers in newComponent method to extend scope of
the existing indirection-reduction optimization
- De-duplicate common buffer consolidation logic
- Unit test for the original bug provided by @jingene
Result
- Correct behaviour / fixed bugs
- Some code deduplication / simplification
- Unwrapping optimization applied to more types of buffers
The downside is increased mem footprint from the two new fields, and
additional allocations in some specific cases, though those should be
rare.
Co-authored-by: jingene <jingene0206@gmail.com>
Motivation:
The Netty classes are initialized at build time by default for GraalVM Native Image compilation. This is configured via the `--initialize-at-build-time=io.netty` option. While this reduces start-up time it can lead to some problems:
- The class initializer of `io.netty.buffer.PooledByteBufAllocator` looks at the maximum memory size to compute the size of internal buffers. If the class initializer runs during image generation, then the buffers are sized according to the very large heap size that the image generator uses, and Netty allocates several arrays that are 16 MByte. The fix is to initialize the following 3 classes at run time: `io.netty.buffer.PooledByteBufAllocator,io.netty.buffer.ByteBufAllocator,io.netty.buffer.ByteBufUtil`. This fix was dependent on a GraalVM Native Image fix that was included in 19.2.0.
- The class initializer of `io.netty.handler.ssl.util.ThreadLocalInsecureRandom` needs to be initialized at runtime to ensure that the generated values are trully random and not fixed for each generated image.
- The class initializers of `io.netty.buffer.AbstractReferenceCountedByteBuf` and `io.netty.util.AbstractReferenceCounted` compute field offsets. While the field offset recomputation is necessary for correct execution as a native image these initializers also have logic that depends on the presence/absence of `sun.misc.Unsafe`, e.g., via the `-Dio.netty.noUnsafe=true` flag. The fix is to push these initializers to runtime so that the field offset lookups (and the logic depending on them) run at run time. This way no manual substitutions are necessary either.
Modifications:
Add `META-INF/native-image` configuration files that correctly trigger the inialization of the above classes at run time via `--initialize-at-run-time=...` flags.
Result:
Fixes the initialisation issues described above for Netty executables built with GraalVM.
Motivation:
AbstractByteBuf.indexOf(...) currently delegates to ByteBufUtils.indexOf(...) which will create a new ByteBufProcessor on each call. This is done to reduce overhead of bounds-checks. Unfortunally while this reduces bounds checks it produces a lot of GC. We can just implement our own version in AbstractByteBuf which makes use of _getByte(...) and so does no bound checks as well but also not need to create any garbage.
Modifications:
Write optimized implementation of indexOf(...) for AbstractByteBuf
Result:
Fixes https://github.com/netty/netty/issues/9499.
Motivation
region is preserved when capacity is increased, not just the readable
part. The behaviour is still different however when the capacity is
_decreased_ - data outside the currently-readable region is zeroed.
Modifications
Update ByteBuf capacity(int) implementations to also copy the whole
buffer region when the new capacity is less than the current capacity.
Result
Consistent behaviour of ByteBuf#capacity(int) regardless of whether the
new capacity is greater than or less than the current capacity.
Motivation
Underlying array allocations in UnpooledHeapByteBuf are intended be done
via the protected allocateArray(int) method, so that they can be tracked
and/or overridden by subclasses, for example
UnpooledByteBufAllocator$InstrumentedUnpooledHeapByteBuf or #8015. But
it looks like an explicit allocation was missed in the copy(int,int)
method.
Modification
Just use alloc().heapBuffer(...) for the allocation
Result
No possibility of "missing" array allocations when ByteBuf#copy is used.
Motivation:
#9224 introduced overrides of ByteBufUtil#writeUtf8(...) and related
methods to operate on a sub-CharSequence directly to save having to
allocate substrings, but it missed an edge case where the subsequence
does not extend to the end of the CharSequence and the last char in the
sequence is a high surrogate.
Due to the catch-IndexOutOfBoundsException optimization that avoids an
additional bounds check, it would be possible to read past the specified
end char index and successfully decode a surrogate pair which would
otherwise result in a '?' byte being written.
Modifications:
- Check for end-of-subsequence before reading next char after a high
surrogate is encountered in the
writeUtf8(AbstractByteBuf,int,CharSequence,int,int) and
utf8BytesNonAscii methods
- Add unit test for this edge case
Result:
Bug is fixed.
This removes the bounds-check-avoidance optimization but it does not
appear to have a measurable impact on benchmark results, including when
the char sequence contains many surrogate pairs (which should be rare in
any case).
* Correctly take length of ByteBufInputStream into account for readLine() / readByte()
Motivation:
ByteBufInputStream did not correctly take the length into account when validate bounds for readLine() / readByte() which could lead to read more then allowed.
Modifications:
- Correctly take length into account
- Add unit tests
- Fix existing unit test
Result:
Correctly take length of ByteBufInputStream into account.
Related to https://github.com/netty/netty/pull/9306.
Motivation:
buffer.isReadable() should not be used to limit the amount of data that can be read as the amount may be less then was is readable.
Modification:
- Use available() which takes the length into account
- Add unit test
Result:
Fixes https://github.com/netty/netty/issues/9305