Motivation
It would be useful to be able to write UTF-8 encoded subsequence of
CharSequence characters to a ByteBuf without needing to create a
temporary object via CharSequence#subSequence().
Modification
Add overloads of ByteBufUtil writeUtf8, reserveAndWriteUtf8 and
utf8Bytes methods which take explicit subsequence bounds.
Result
More efficient writing of substrings to byte buffers possible
Motivation
ByteBuf capacity is automatically increased as needed up to maxCapacity
when writing beyond the buffer's current capacity. However there's no
way to tell in general whether such an increase will result in a
relatively costly internal buffer re-allocation.
For unpooled buffers it always does, in pooled cases it depends on the
size of the associated chunk of allocated memory, which I don't think is
currently exposed in any way.
It would sometimes be useful to know where this limit is when making
external decisions about whether to reuse or preemptively reallocate.
It would also be advantageous to take this limit into account when
auto-increasing the capacity during writes, to defer such reallocation
until really necessary.
Modifications
Introduce new AbstractByteBuf.maxFastWritableBytes() method which will
return a value >= writableBytes() and <= maxWritableBytes().
Make use of the new method in the sizing decision made by the
AbstractByteBuf.ensureWritable(...) methods.
Result
Less reallocation/copying.
Motivation
A small thread-safety bug was introduced during the internal
optimizations of ComponentByteBuf made a while back in #8437. When there
is a single component which was added as a slice,
internalNioBuffer(int,int) will currently return the unwrapped slice's
un-duplicated internal NIO buffer. This is not safe since it could be
modified concurrently with other usage of that parent buffer.
Modifications
Delegate internalNioBuffer to nioBuffer in this case, which returns a
duplicate. This matches what's done in derived buffers in general
(for the same reason). Add unit test.
Result
Fixed possible thread-safety bug
Motivation
Direct buffers are normally preferred when interfacing with raw
sockets. Currently netty will only return direct io buffers (for reading
from a channel) when a platform has unsafe. However, this is
inconsistent with the write-side (filterOutboundMessage) where a direct
byte buffer will be returned if pooling is enabled. This means that
environments without unsafe (and no manual netty configurations) end up
with many pooled heap byte buffers for reading, many pooled direct byte
buffers for writing, and jdk pooled byte buffers (for reading).
Modifications
This commit modifies the AbstractByteBufAllocator to return a direct
byte buffer for io handling when the platform has unsafe or direct byte
buffers are pooled.
Result:
Use direct buffers when direct buffers are pooled for IO.
* Fix incorrect behavior of ReadOnlyByteBufferBuf.getBytes(int,ByteBuffer)
Motivation
It currently will succeed when the destination is larger than the source
range, but the ByteBuf javadoc states this should be a failure, as is
the case with all the other implementations.
Modifications
- Fix logic to fail the bounds check in this case
- Remove explicit null check which isn't done in any equivalent method
- Add unit test
Result
More correct/consistent behaviour
Motivation:
1f93bd3 introduced a regression that could lead to not have the lastAccessed field correctly null'ed out when the endOffset of the internal Component == CompositeByteBuf.readerIndex()
Modifications:
- Correctly null out the lastAccessed field in any case
- Add unit tests
Result:
Fixes regression in CompositeByteBuf.discard*ReadBytes()
Motivation:
The CompositeByteBuf discardReadBytes / discardReadComponents methods are currently quite inefficient, including when there are no read components to discard. We would like to call the latter more frequently in ByteToMessageDecoder#COMPOSITE_CUMULATOR.
In the same context it would be beneficial to perform a "shallow copy" of a composite buffer (for example when it has a refcount > 1) to avoid having to allocate and copy the contained bytes just to obtain an "independent" cumulation.
Modifications:
- Optimize discardReadBytes() and discardReadComponents() implementations (start at first comp rather than performing a binary search for the readerIndex).
- New addFlattenedComponents(boolean,ByteBuf) method which performs a shallow copy if the provided buffer is also composite and avoids adding any empty buffers, plus unit test.
- Other minor optimizations to avoid unnecessary checks.
Results:
discardReadXX methods are faster, composite buffers can be easily appended without deepening the buffer "tree" or retaining unused components.
Motivation:
The special case fixed in #8497 also requires that we keep a derived slice when trimming components in place, as done by the capacity(int) and discardReadBytes() methods.
Modifications:
Ensure that we keep a ref to trimmed components' original retained slice in capacity(int) and discardReadBytes() methods, so that it is released properly when the they are later freed. Add unit test which fails prior to the fix.
Result:
Edge case leak is eliminated.
Motivation:
PooledByteBufAllocator uses a PoolThreadCache per Thread that allocates / deallocates to minimize the performance overhead. This PoolThreadCache is trimmed after X allocations to free up buffers that are not allocated for a long time. This works out quite well when the app continues to allocate but fails if the app stops to allocate frequently (for whatever reason) and so a lot of memory is wasted and not given back to the arena / freed.
Modifications:
- Add a ThreadExecutorMap that offers multiple methods that wrap Runnable / ThreadFactory / Executor and allow to call ThreadExecutorMap.currentEventExecutor() to get the current executing EventExecutor for the calling Thread.
- Use these methods in the constructors of our EventExecutor implementations (which also covers the EventLoop implementations)
- Add io.netty.allocator.cacheTrimIntervalMillis system property which can be used to specify a fixed rate / interval on which we should try to trim the PoolThreadCache for a EventExecutor that allocates.
- Add PooledByteBufAllocator.trimCurrentThreadCache() to allow the user to trim the cache of the calling thread manually.
- Add testcases
- Introduce FastThreadLocal.getIfExists()
Result:
Allow to better / more frequently trim PoolThreadCache and so give back memory to the area / system.
Motivation:
We should run a CI job using J9 to ensure netty also works when using different JVMs.
Modifications:
- Adjust PooledByteBufAllocatorTest to be able to complete faster when using a JVM which takes longer when joining Threads (this seems to be the case with J9).
- Skip UDT tests on J9 as UDT is not supported there.
Result:
Be able to run CI against J9.
Motivation:
We need to update to a new checkstyle plugin to allow the usage of lambdas.
Modifications:
- Update to new plugin version.
- Fix checkstyle problems.
Result:
Be able to use checkstyle plugin which supports new Java syntax.
Motivation
In #8758, @doom369 reported an infinite loop bug in CompositeByteBuf
which was introduced in #8437.
This is the same small fix for that, along with fixes for two other bugs
found while re-inspecting the changes and adding unit tests.
Modification
- Replace recursive call to toComponentIndex with toComponentIndex0 as
intended
- Add missed "lastAccessed" racy cache invalidation in capacity(int)
method
- Fix incorrect determination of initial offset in non-zero cIndex case
of updateComponentOffsets method
- New unit tests for previously uncovered methods
Results
Fewer bugs.
Motivation:
In versions of Netty prior to 4.1.31.Final, a CompositeByteBuf could be
created with any size (including potentially nonsensical negative
values). This behavior changed in e7737b993, which introduced a bounds
check to only allow for a component size greater than one. This broke
some existing use cases that attempted to create a byte buf with a
single component.
Modifications:
Lower the bounds check on numComponents to include the single component
case, but still throw an exception for anything less than one.
Add unit tests for the case of numComponents being less than, equal to,
and greater than this lower bound.
Result:
Return to the behavior of 4.1.30.Final, allowing one component, but
still include an explicit check against a lower bound.
Note that while creating a CompositeByteBuf with a single component is
in some ways a contradiction of the term "composite", this patch caters
for existing uses while excluding the clearly nonsensical case of asking
for a CompositeByteBuf with zero or fewer components.
Fixes#8613.
Motivation:
When we create new chunk with memory aligned, the offset of direct memory should be
'alignment - address & (alignment - 1)', not just 'address & (alignment - 1)'.
Modification:
Change offset calculating formula to offset = alignment - address & (alignment - 1) in PoolArena.DirectArena#offsetCacheLine and add a unit test to assert that.
Result:
Correctly calculate offset.
Motivation:
ByteBuf.retainedSlice() and similar methods produce sliced buffers with
an independent refcount to the buffer that they wrap.
One of the optimizations in 10539f4dc7 was
to use the ref to the unwrapped buffer object for added slices, but this
did not take into account the above special case when later releasing.
Thanks to @rkapsi for discovering this via #8495.
Modifications:
Since a reference to the slice is still kept in the Component class,
just changed Component.freeIfNecessary() to release the slice in
preference to the unwrapped buf.
Also added a unit test which reproduces the bug.
Result:
Fixes#8495
Motivation:
I came across two bugs:
- Components removed due to capacity reduction aren't released
- Offsets aren't set correctly on empty components that are added
between existing components
Modifications:
Add unit tests which expose these bugs, fix them.
Result:
Bugs are fixed
* Optimize AbstractByteBuf.getCharSequence() in US_ASCII case
Motivation:
Inspired by https://github.com/netty/netty/pull/8388, I noticed this
simple optimization to avoid char[] allocation (also suggested in a TODO
here).
Modifications:
Return an AsciiString from AbstractByteBuf.getCharSequence() if
requested charset is US_ASCII or ISO_8859_1 (latter thanks to
@Scottmitch's suggestion). Also tweak unit tests not to require Strings
and include a new benchmark to demonstrate the speedup.
Result:
Speed-up of AbstractByteBuf.getCharSequence() in ascii and iso 8859/1
cases
Motivation:
CompositeByteBuf.decompose(...) did not correctly slice the content and so produced an incorrect representation of the data.
Modifications:
- Rewrote implementation to fix bug and also improved it to reduce GC
- Add unit tests.
Result:
Fixes https://github.com/netty/netty/issues/8400.
Motivation:
Avoid creating any StringBuilder instance if
ByteBufInputStream::readLine isn't used
Modifications:
The StringBuilder instance is lazy allocated on demand and
are added new test case branches to address the increased
complexity of ByteBufInputStream::readLine
Result:
Reduced GC activity if ByteBufInputStream::readLine isn't used
* Allow to use native transports when sun.misc.Unsafe is not present on the system
Motivation:
We should be able to use the native transports (epoll / kqueue) even when sun.misc.Unsafe is not present on the system. This is especially important as Java11 will be released soon and does not allow access to it by default.
Modifications:
- Correctly disable usage of sun.misc.Unsafe when -PnoUnsafe is used while running the build
- Correctly increment metric when UnpooledDirectByteBuf is allocated. This was uncovered once -PnoUnsafe usage was fixed.
- Implement fallbacks in all our native transport code for when sun.misc.Unsafe is not present.
Result:
Fixes https://github.com/netty/netty/issues/8229.
Motivation:
We need to add special handling for WrappedCompositeByteBuf as these also extend AbstractByteBuf, otherwise we will not correctly adjust / read the writerIndex during processing.
Modifications:
- Add instanceof checks for WrappedCompositeByteBuf as well.
- Add testcases
Result:
Fixes https://github.com/netty/netty/issues/8152.
Motivation:
When a buffer is over-released, the current error message of `IllegalReferenceCountException` is `refCnt: XXX, increment: XXX`, which is confusing. The correct message should be `refCnt: XXX, decrement: XXX`.
Modifications:
Pass `-decrement` to create `IllegalReferenceCountException`.
Result:
The error message will be `refCnt: XXX, decrement: XXX` when a buffer is over-released.
Motivation:
It should be possible to write a ReadOnlyByteBufferBuf to a channel without errors. However, ReadOnlyByteBufferBuf does not override isWritable and ensureWritable, which can cause some handlers to mistakenly assume they can write to the ReadOnlyByteBufferBuf, resulting in ReadOnlyBufferException.
Modification:
Added isWritable and ensureWritable method overrides on ReadOnlyByteBufferBuf to indicate that it is never writable. Added tests for these methods.
Result:
Can successfully write ReadOnlyByteBufferBuf to a channel with an SslHandler (or any other handler which may attempt to write to the ByteBuf it receives).
Motivation:
The `AbstractByteBuf#equals` method doesn't take into account the
class of buffer instance. So the two buffers with different classes
must have the same `hashCode` values if `equals` method returns `true`.
But `EmptyByteBuf#hashCode` is not consistent with `#hashCode`
of the empty `AbstractByteBuf`, that is violates the contract and
can lead to errors.
Modifications:
Return `1` in `EmptyByteBuf#hashCode`.
Result:
Consistent behavior of `EmptyByteBuf#hashCode` and `AbstractByteBuf#hashCode`.
Motivation:
The `ByteBuf#slice` and `ByteBuf#duplicate` methods should check
an accessibility to prevent creation slice or duplicate
of released buffer. At now this works not in the all scenarios.
Modifications:
Add missed checks.
Result:
More correct and consistent behavior of `ByteBuf` methods.
Motivation:
We should allow to access the memoryAddress / array of the FixedCompositeByteBuf when it only wraps a single ByteBuf. We do the same for CompositeByteBuf.
Modifications:
- Check how many buffers FixedCompositeByteBuf wraps and depending on it delegate the access to the memoryAddress / array
- Add unit tests.
Result:
Fixes [#7752].
Motivation:
If someone invoke writeByte(), markWriterIndex(), readByte() in order first, and then invoke resetWriterIndex() should be throw a IndexOutOfBoundsException to obey the rule that the buffer declared "0 <= readerIndex <= writerIndex <= capacity".
Modification:
Changed the code writerIndex = markedWriterIndex; into writerIndex(markedWriterIndex); to make the check affect
Result:
Throw IndexOutOfBoundsException if any invalid happened in resetWriterIndex.
Motivation:
To avoid eager allocation of the destination and to perform length prefixed encoding of UTF-8 string with forward only access pattern
Modifications:
The original writeUtf8 is modified by allowing customization of the reserved bytes on the destination buffer and is introduced an exact UTF-8 length estimator.
Result:
Is now possible to perform length first encoding with UTF-8 well-formed char sequences following a forward only write access pattern on the destination buffer.
Motivation:
There is some cleanup that can be done.
Modifications:
- Use intializer list expression where possible
- Remove unused imports.
Result:
Cleaner code.
Motivation:
We need the memoryAddress of a direct buffer when using our native transports. For this reason ReadOnlyUnsafeDirectByteBuf.memoryAddress() should not throw.
Modifications:
- Correctly override ReadOnlyUnsafeDirectByteBuf.memoryAddress() and hasMemoryAddress()
- Add test case
Result:
Fixes [#7672].
Motivation:
We saw some timeouts on the CI when the leak detection is enabled.
Modifications:
- Use smaller number of operations in test
- Increase timeout
Result:
CI not times out.
Motivation:
ByteBufUtil.isText(...) may produce unexpected results if called concurrently on the same ByteBuffer.
Modifications:
- Don't use internalNioBuffer where it is not safe.
- Add unit test.
Result:
ByteBufUtil.isText is thread-safe.
Motivation:
Usages of HttpResponseStatus may result in more object allocation then necessary due to not looking for cached objects and the AsciiString parsing method not being used due to CharSequence method being used instead.
Modifications:
- HttpResponseDecoder should attempt to get the HttpResponseStatus from cache instead of allocating a new object
- HttpResponseStatus#parseLine(CharSequence) should check if the type is AsciiString and redirect to the AsciiString parsing method which may not require an additional toString call
- HttpResponseStatus#parseLine(AsciiString) can be optimized and doesn't require and may not require object allocation
Result:
Less allocations when dealing with HttpResponseStatus.
Motivation:
Calling ByteBuf.toString(Charset) on the same buffer from multiple threads at the same time produces unexpected results, such as various exceptions and/or corrupted output. This is because ByteBufUtil.decodeString(...) is taking the source ByteBuffer for CharsetDecoder.decode() from ByteBuf.internalNioBuffer(int, int), which is not thread-safe.
Modification:
Call ByteBuf.nioBuffer() instead of ByteBuf.internalNioBuffer() to get the source buffer to pass to CharsetDecoder.decode().
Result:
Fixes the possible race condition.
Motivation:
We did not correctly take the position into account when wrapping a ByteBuffer via ReadOnlyUnsafeDirectByteBuf as we obtained the memory address from the original ByteBuffer and not the slice we take.
Modifications:
- Correctly use the slice to obtain memory address.
- Add test case.
Result:
Fixes [#7565].
Motivation:
There is no guarantee that FastThreadLocal.onRemoval(...) is called if the FastThreadLocal is used by "non" FastThreacLocalThreads. This can lead to all sort of problems, like for example memory leaks as direct memory is not correctly cleaned up etc.
Beside this we use ThreadDeathWatcher to check if we need to release buffers back to the pool when thread local caches are collected. In the past ThreadDeathWatcher was used which will need to "wakeup" every second to check if the registered Threads are still alive. If we can ensure FastThreadLocal.onRemoval(...) is called we do not need this anymore.
Modifications:
- Introduce ObjectCleaner and use it to ensure FastThreadLocal.onRemoval(...) is always called when a Thread is collected.
- Deprecate ThreadDeathWatcher
- Add unit tests.
Result:
Consistent way of cleanup FastThreadLocals when a Thread is collected.
Motivation:
We dont need to use the ThreadDeathWatcher if we use a FastThreadLocalThread for which we wrap the Runnable and ensure we call FastThreadLocal.removeAll() once the Runnable completes.
Modifications:
- Dont use a ThreadDeathWatcher if we are sure we will call FastThreadLocal.removeAll()
- Add unit test.
Result:
Less overhead / running theads if you only allocate / deallocate from FastThreadLocalThreads.
Motivation:
AbstractByteBuf#readSlice relied upon the bounds checking of the slice operation in order to detect index out of bounds conditions. However the slice bounds checking operation allows for the slice to go beyond the writer index, and this is out of bounds for a read operation.
Modifications:
- AbstractByteBuf#readSlice and AbstractByteBuf#readRetainedSlice should ensure the desired amount of bytes are readable before taking a slice
Result:
No reading of undefined data in AbstractByteBuf#readSlice and AbstractByteBuf#readRetainedSlice.
Motivation:
When calling CompositeBytebuf.copy() and copy(...) we currently use Unpooled to allocate the buffer. This is not really correct and may produce more GC then needed. We should use the allocator that was used when creating the CompositeByteBuf to allocate the new buffer which may be for example the PooledByteBufAllocator.
Modifications:
- Use alloc() to allocate the new buffer.
- Add tests
- Fix tests that depend on the copy to be backed by an byte-array without checking hasArray() first.
Result:
Fixes [#7393].
Motivation:
Even if it's a super micro-optimization (most JVM could optimize such
cases in runtime), in theory (and according to some perf tests) it
may help a bit. It also makes a code more clear and allows you to
access such methods in the test scope directly, without instance of
the class.
Modifications:
Add 'static' modifier for all methods, where it possible. Mostly in
test scope.
Result:
Cleaner code with proper 'static' modifiers.
Motivation:
`useCacheForAllThreads` may be false which disables memory caching
on non netty threads. Setting this argument or the system property
makes it impossible to use `PooledByteBufAllocator`.
Modifications:
Delayed the check of `freeSweepAllocationThreshold` in
`PoolThreadCache` to after it knows there will be any caches in
use. Additionally, check if the caches will have any data in them
(rather than allocating a 0-length array).
A test case is also added that fails without this change.
Results:
Fixes#7194
Motivation:
In ReadOnlyByteBufferBuf.copy(...) we just allocated a ByteBuffer directly and wrapped it. This way it was not possible for us to free the direct memory that was used by the copy without the GC.
Modifications:
- Ensure we use the allocator when create the copy and so be able to release direct memory in a timely manner
- Add unit test
- Depending on if the to be copied buffer is direct or heap based we also allocate the same type on copy.
Result:
Fixes [#7103].
Motivation:
`ByteBuf` does not have the little endian variant of float/double access methods.
Modifications:
Add support for little endian floats and doubles into `ByteBuf`.
Result:
`ByteBuf` has get/read/set/writeFloatLE() and get/read/set/writeDoubleLE() methods. Fixes [#6576].
Motivation:
Missing return in ByteBufUtil#writeAscii causes endless loop
Modifications:
Add return after write finished
Result:
ByteBufUtil#writeAscii is ok
Motivation:
ByteBuf#ensureWritable(int,boolean) returns an int indicating the status of the resize operation. For buffers that are unmodifiable or cannot be resized this method shouldn't throw but just return 1.
ByteBuf#ensureWriteable(int) should throw unmodifiable buffers.
Modifications:
- ReadOnlyByteBuf should be updated as described above.
- Add a unit test to SslHandler which verifies the read only buffer can be tolerated in the aggregation algorithm.
Result:
Fixes https://github.com/netty/netty/issues/7002.
Motivation:
We need to ensure we not allow calling set/writeCharsequence on an released ByteBuf.
Modifications:
Add test-cases
Result:
Proves fix of [#6951].
Motivation:
AbstractByteBuf.setCharSequence(...) must not expand the buffer if not enough writable space is present in the buffer to be consistent with all the other set operations.
Modifications:
- Ensure we only exand the buffer on writeCharSequence(...) but not on setCharSequence(...)
- Add unit tests.
Result:
Consistent and correct behavior.
Motivation:
AbstractByteBuf.ensureWritable(...) should check if buffer was released and if so throw an IllegalReferenceCountException
Modifications:
Ensure we throw in all cases.
Result:
More consistent and correct behaviour
Motivation:
1. Some encoders used a `ByteBuf#writeBytes` to write short constant byte array (2-3 bytes). This can be replaced with more faster `ByteBuf#writeShort` or `ByteBuf#writeMedium` which do not access the memory.
2. Two chained calls of the `ByteBuf#setByte` with constants can be replaced with one `ByteBuf#setShort` to reduce index checks.
3. The signature of method `HttpHeadersEncoder#encoderHeader` has an unnecessary `throws`.
Modifications:
1. Use `ByteBuf#writeShort` or `ByteBuf#writeMedium` instead of `ByteBuf#writeBytes` for the constants.
2. Use `ByteBuf#setShort` instead of chained call of the `ByteBuf#setByte` with constants.
3. Remove an unnecessary `throws` from `HttpHeadersEncoder#encoderHeader`.
Result:
A bit faster writes constants into buffers.