Motivation:
The CompositeByteBuf discardReadBytes / discardReadComponents methods are currently quite inefficient, including when there are no read components to discard. We would like to call the latter more frequently in ByteToMessageDecoder#COMPOSITE_CUMULATOR.
In the same context it would be beneficial to perform a "shallow copy" of a composite buffer (for example when it has a refcount > 1) to avoid having to allocate and copy the contained bytes just to obtain an "independent" cumulation.
Modifications:
- Optimize discardReadBytes() and discardReadComponents() implementations (start at first comp rather than performing a binary search for the readerIndex).
- New addFlattenedComponents(boolean,ByteBuf) method which performs a shallow copy if the provided buffer is also composite and avoids adding any empty buffers, plus unit test.
- Other minor optimizations to avoid unnecessary checks.
Results:
discardReadXX methods are faster, composite buffers can be easily appended without deepening the buffer "tree" or retaining unused components.
Motivation:
The special case fixed in #8497 also requires that we keep a derived slice when trimming components in place, as done by the capacity(int) and discardReadBytes() methods.
Modifications:
Ensure that we keep a ref to trimmed components' original retained slice in capacity(int) and discardReadBytes() methods, so that it is released properly when the they are later freed. Add unit test which fails prior to the fix.
Result:
Edge case leak is eliminated.
Motivation
There's some miscellaneous cleanup/simplification of CompositeByteBuf
which would help make the code a bit clearer.
Modifications
- Simplify web of constructors and addComponents methods, reducing
duplication of logic
- Rename `Component.freeIfNecessary()` method to just `free()`, which is
less confusing (see #8641)
- Make loop in addComponents0(...) method more verbose/readable (see
https://github.com/netty/netty/pull/8437#discussion_r232124414)
- Simplify addition/subtraction in setBytes(...) methods
Result
Smaller/clearer code
Motivation
In #8758, @doom369 reported an infinite loop bug in CompositeByteBuf
which was introduced in #8437.
This is the same small fix for that, along with fixes for two other bugs
found while re-inspecting the changes and adding unit tests.
Modification
- Replace recursive call to toComponentIndex with toComponentIndex0 as
intended
- Add missed "lastAccessed" racy cache invalidation in capacity(int)
method
- Fix incorrect determination of initial offset in non-zero cIndex case
of updateComponentOffsets method
- New unit tests for previously uncovered methods
Results
Fewer bugs.
Motivation:
In versions of Netty prior to 4.1.31.Final, a CompositeByteBuf could be
created with any size (including potentially nonsensical negative
values). This behavior changed in e7737b993, which introduced a bounds
check to only allow for a component size greater than one. This broke
some existing use cases that attempted to create a byte buf with a
single component.
Modifications:
Lower the bounds check on numComponents to include the single component
case, but still throw an exception for anything less than one.
Add unit tests for the case of numComponents being less than, equal to,
and greater than this lower bound.
Result:
Return to the behavior of 4.1.30.Final, allowing one component, but
still include an explicit check against a lower bound.
Note that while creating a CompositeByteBuf with a single component is
in some ways a contradiction of the term "composite", this patch caters
for existing uses while excluding the clearly nonsensical case of asking
for a CompositeByteBuf with zero or fewer components.
Fixes#8613.
Motivation:
ByteBuf.retainedSlice() and similar methods produce sliced buffers with
an independent refcount to the buffer that they wrap.
One of the optimizations in 10539f4dc7 was
to use the ref to the unwrapped buffer object for added slices, but this
did not take into account the above special case when later releasing.
Thanks to @rkapsi for discovering this via #8495.
Modifications:
Since a reference to the slice is still kept in the Component class,
just changed Component.freeIfNecessary() to release the slice in
preference to the unwrapped buf.
Also added a unit test which reproduces the bug.
Result:
Fixes#8495
Motivation:
CompositeByteBuf is a powerful and versatile abstraction, allowing for
manipulation of large data without copying bytes. There is still a
non-negligible cost to reading/writing however relative to "singular"
ByteBufs, and this can be mostly eliminated with some rework of the
internals.
My use case is message modification/transformation while zero-copy
proxying. For example replacing a string within a large message with one
of a different length
Modifications:
- No longer slice added buffers and unwrap added slices
- Components store target buf offset relative to position in
composite buf
- Less allocations, object footprint, pointer indirection, offset
arithmetic
- Use Component[] rather than ArrayList<Component>
- Avoid pointer indirection and duplicate bounds check, more
efficient backing array growth
- Facilitates optimization when doing bulk-inserts - inserting n
ByteBufs behind m is now O(m + n) instead of O(mn)
- Avoid unnecessary casting and method call indirection via superclass
- Eliminate some duplicate range/ref checks via non-checking versions of
toComponentIndex and findComponent
- Add simple fast-path for toComponentIndex(0); add racy cache of
last-accessed Component to findComponent(int)
- Override forEachByte0(...) and forEachByteDesc0(...) methods
- Make use of RecyclableArrayList in nioBuffers(int, int) (in line with
FasterCompositeByteBuf impl)
- Modify addComponents0(boolean,int,Iterable) to use the Iterable
directly rather than copy to an array first (and possibly to an
ArrayList before that)
- Optimize addComponents0(boolean,int,ByteBuf[],int) to not perform
repeated array insertions and avoid second loop for offset updates
- Simplify other logic in various places, in particular the general
pattern used where a sub-range is iterated over
- Add benchmarks to demonstrate some improvements
While refactoring I also came across a couple of clear bugs. They are
fixed in these changes but I will open another PR with unit tests and
fixes to the current version.
Result:
Much faster creation, manipulation, and access; many fewer allocations
and smaller footprint. Benchmark results to follow.
Motivation:
Unpooled.wrap(byte[]...) and Unpooled.wrap(ByteBuffer...) currently
allocate/copy an intermediate ByteBuf ArrayList and array, which can be
avoided.
Modifications:
- Define new internal ByteWrapper interface and add a CompositeByteBuf
constructor which takes a ByteWrapper with an array of the type that it
wraps, and modify the appropriate Unpooled.wrap(...) methods to take
advantage of it
- Tidy up other constructors in CompositeByteBuf to remove duplication
and misleading len arg (which is really an end offset into provided
array)
Result:
Less allocation/copying when wrapping byte[] and ByteBuffer arrays,
tidier code.
Motivation:
I came across two bugs:
- Components removed due to capacity reduction aren't released
- Offsets aren't set correctly on empty components that are added
between existing components
Modifications:
Add unit tests which expose these bugs, fix them.
Result:
Bugs are fixed
Motivation:
CompositeByteBuf.decompose(...) did not correctly slice the content and so produced an incorrect representation of the data.
Modifications:
- Rewrote implementation to fix bug and also improved it to reduce GC
- Add unit tests.
Result:
Fixes https://github.com/netty/netty/issues/8400.
In nioBuffer(int,int) in CompositeByteBuf , we create a sub-array of nioBuffers for the components that are in range, then concatenate all the components in range into a single bigger buffer.
However, if the call to nioBuffers() returned only one sub-buffer, then we are copying it to a newly-allocated buffer "merged" for no reason.
Motivation:
Profiler for Spark shows a lot of time spent in put() method inside nioBuffer(), while usually no copy of data is required.
Modification:
This change skips this last step and just returns a duplicate of the single buffer returned by the call to nioBuffers(), which will in most implementation not copy the data
Result:
No copy when the source is only 1 buffer
Motivation:
Eliminate avoidable backing array reallocations when constructing
composite ByteBufs from existing buffer arrays/Iterables. This also
applies to the Unpooled.wrappedBuffer(...) methods.
Modifications:
Ensure the initial components ComponentList is sized at least as large
as the provided buffer array/Iterable in the CompositeByteBuffer
constructors.
In single-arg Unpooled.wrappedBuffer(...) methods, set maxNumComponents
to the count of provided buffers, rather than a fixed default of 16. It
seems likely that most usage of these involves wrapping a list without
subsequent modification, particularly since they return a ByteBuf rather
than CompositeByteBuf. If a different/larger max is required there are
already the wrappedBuffer(int, ...) variants.
In fact the current behaviour could be considered inconsistent - if you
call Unpooled.wrappedBuffer(int, ByteBuf) with a single buffer, you
might expect to subsequently be able to add buffers to it (since you
specified a max related to consolidation), but it will in fact return
just a slice of the provided ByteBuf.
Result:
Fewer and smaller allocations in some cases when using CompositeByteBufs
or Unpooled.wrappedBuffer(...).
Motivation:
We used subList in CompositeByteBuf to remove ranges of elements from the internal storage. Beside this we also used an foreach loop in a few cases which will crate an Iterator.
Modifications:
- Use our own sub-class of ArrayList which exposes removeRange(...). This allows to remove a range of elements without an extra allocation.
- Use an old style for loop to iterate over the elements to reduce object allocations.
Result:
Less allocations.
Motivation:
When calling CompositeBytebuf.copy() and copy(...) we currently use Unpooled to allocate the buffer. This is not really correct and may produce more GC then needed. We should use the allocator that was used when creating the CompositeByteBuf to allocate the new buffer which may be for example the PooledByteBufAllocator.
Modifications:
- Use alloc() to allocate the new buffer.
- Add tests
- Fix tests that depend on the copy to be backed by an byte-array without checking hasArray() first.
Result:
Fixes [#7393].
Motivation:
We not had tests for ByteBufAllocator implementations in general.
Modifications:
Added ByteBufAllocatorTest, AbstractByteBufAllocatorTest and UnpooledByteBufAllocatorTest
Result:
More tests for allocator implementations.
Motivation:
PooledByteBuf.capacity(...) miss to enforce maxCapacity() and so its possible to increase the capacity of the buffer even if it will be bigger then maxCapacity().
Modifications:
- Correctly enforce maxCapacity()
- Add unit tests for capacity(...) calls.
Result:
Correctly enforce maxCapacity().
Motivation:
When Unpooled.wrappedBuffer(...) is called with an array of ByteBuf with length >= 2 and the first ByteBuf is not readable it will result in double releasing of these empty buffers when release() is called on the returned buffer.
Modifications:
- Ensure we only wrap readable buffers.
- Add unit test
Result:
No double release of buffers.
Motivation:
At the moment the user is responsible to increase the writer index of the composite buffer when a new component is added. We should add some methods that handle this for the user as this is the most popular usage of the composite buffer.
Modifications:
Add new methods that autoamtically increase the writerIndex when buffers are added.
Result:
Easier usage of CompositeByteBuf.
Motivation
See ##3229
Modifications:
Add methods with position independent FileChannel calls to ByteBuf and its subclasses.
Results:
The user can use these new methods to read/write ByteBuff without updating FileChannel's position.
Motivation:
The current interface for CompositeByteBuf.addComponent is not clear under what conditions ownership is transferred when addComponent is called. There should be a well defined behavior so that users can ensure that no leaks occur.
Modifications:
- CompositeByteBuf.addComponent should always assume reference count ownership
Result:
Users that call CompositeByteBuf.addComponent do not have to independently check if the buffer's ownership has been transferred and if not independently release the buffer.
Fixes https://github.com/netty/netty/issues/4760
Motivation:
CompositeByteBuf only implemented simple resource leak detection and how it was implemented was completly different to the way it was for ByteBuf. The other problem was that slice(), duplicate() and others would not return a resource leak enabled buffer.
Modifications:
- Proper implementation for all level of resource leak detection for CompositeByteBuf
Result:
Proper resource leak detection for CompositeByteBuf.
As discussed in #3209, this PR adds Little Endian accessors
to ByteBuf and descendants.
Corresponding accessors were added to UnsafeByteBufUtil,
HeapByteBufferUtil to avoid calling `reverseBytes`.
Deprecate `order()`, `order(buf)` and `SwappedByteBuf`.
Motivation:
CompositeByteBuf.iterator() currently creates a new ArrayList and fill it with the ByteBufs, which is more expensive then it needs to be.
Modifications:
- Use special Iterator implementation
Result:
Less overhead when calling iterator()
Motivation:
CompositeByteBuf has an iterator() method but fails to implement Iterable
Modifications:
Let CompositeByteBuf implement Iterable<ByteBuf>
Result:
Easier usage
Motivation:
At the moment we have two problems:
- CompositeByteBuf.addComponent(...) will not add the supplied buffer to the CompositeByteBuf if its empty, which means it will not be released on CompositeByteBuf.release() call. This is a problem as a user will expect everything added will be released (the user not know we not added it).
- CompositeByteBuf.addComponents(...) will either add no buffers if none is readable and so has the same problem as addComponent(...) or directly release the ByteBuf if at least one ByteBuf is readable. Again this gives inconsistent handling and may lead to memory leaks.
Modifications:
- Always add the buffer to the CompositeByteBuf and so release it on release call.
Result:
Consistent handling and no buffer leaks.
Motivation:
When a CompositeByteBuf is empty (i.e. has no component), its internal
memory access operations do not always behave as expected.
Modifications:
Check if the nunmber of components is zero. If so, return an empty
array or an empty NIO buffer, etc.
Result:
More robustness
Motivation:
CompositeByteBuf.nioBuffers(...) returns an empty ByteBuffer array if the specified length is 0. This is not consistent with other ByteBuf implementations which return an ByteBuffer array of size 1 with an empty ByteBuffer included.
Modifications:
Make CompositeByteBuf.nioBuffers(...) consistent with other ByteBuf implementations.
Result:
Consistent and correct behaviour of nioBufffers(...)
Modifications:
- Added a static modifier for CompositeByteBuf.Component.
This class is an inner class, but does not use its embedded reference to the object which created it. This reference makes the instances of the class larger, and may keep the reference to the creator object alive longer than necessary.
- Removed unnecessary boxing/unboxing operations in HttpResponseDecoder, RtspResponseDecoder, PerMessageDeflateClientExtensionHandshaker and PerMessageDeflateServerExtensionHandshaker
A boxed primitive is created from a String, just to extract the unboxed primitive value.
- Removed unnecessary 3 times calculations in DiskAttribute.addContent(...).
- Removed unnecessary checks if file exists before call mkdirs() in NativeLibraryLoader and PlatformDependent.
Because the method mkdirs() has this check inside.
- Removed unnecessary `instanceof AsciiString` check in StompSubframeAggregator.contentLength(StompHeadersSubframe) and StompSubframeDecoder.getContentLength(StompHeaders, long).
Because StompHeaders.get(CharSequence) always returns java.lang.String.
Motivation:
CompositeByteBuf.deallocate generates unnecessary GC pressure when using the 'foreach' loop, as a 'foreach' loop creates an iterator when looping.
Modification:
Convert 'foreach' loop into regular 'for' loop.
Result:
Less GC pressure (and possibly more throughput) as the 'for' loop does not create an iterator
Motivation:
Because of how we use reference counting we need to check for the reference count before each operation that touches the underlying memory. This is especially true as we use sun.misc.Cleaner.clean() to release the memory ASAP when possible. Because of this the user may cause a SEGFAULT if an operation is called that tries to access the backing memory after it was released.
Modification:
Correctly check the reference count on all methods that access the underlying memory or expose it via a ByteBuffer.
Result:
Safer usage of ByteBuf
- Related: #2163
- Add ResourceLeakHint to allow a user to provide a meaningful information about the leak when touching it
- DefaultChannelHandlerContext now implements ResourceLeakHint to tell where the message is going.
- Cleaner resource leak report by excluding noisy stack trace elements
- 5% improvement in throughput (HelloWorldServer example)
- Made CompositeByteBuf a concrete class (renamed from DefaultCompositeByteBuf) because there's no multiple inheritance in Java
Fixes#1536
- Fixes#1528
It's not really easy to provide a general-purpose abstraction for fast-yet-safe iteration. Instead of making forEachByte() less optimal, let's make it do what it does really well, and allow a user to implement potentially unsafe-yet-fast loop using unsafe operations.
The API changes made so far turned out to increase the memory footprint
and consumption while our intention was actually decreasing them.
Memory consumption issue:
When there are many connections which does not exchange data frequently,
the old Netty 4 API spent a lot more memory than 3 because it always
allocates per-handler buffer for each connection unless otherwise
explicitly stated by a user. In a usual real world load, a client
doesn't always send requests without pausing, so the idea of having a
buffer whose life cycle if bound to the life cycle of a connection
didn't work as expected.
Memory footprint issue:
The old Netty 4 API decreased overall memory footprint by a great deal
in many cases. It was mainly because the old Netty 4 API did not
allocate a new buffer and event object for each read. Instead, it
created a new buffer for each handler in a pipeline. This works pretty
well as long as the number of handlers in a pipeline is only a few.
However, for a highly modular application with many handlers which
handles connections which lasts for relatively short period, it actually
makes the memory footprint issue much worse.
Changes:
All in all, this is about retaining all the good changes we made in 4 so
far such as better thread model and going back to the way how we dealt
with message events in 3.
To fix the memory consumption/footprint issue mentioned above, we made a
hard decision to break the backward compatibility again with the
following changes:
- Remove MessageBuf
- Merge Buf into ByteBuf
- Merge ChannelInboundByte/MessageHandler and ChannelStateHandler into ChannelInboundHandler
- Similar changes were made to the adapter classes
- Merge ChannelOutboundByte/MessageHandler and ChannelOperationHandler into ChannelOutboundHandler
- Similar changes were made to the adapter classes
- Introduce MessageList which is similar to `MessageEvent` in Netty 3
- Replace inboundBufferUpdated(ctx) with messageReceived(ctx, MessageList)
- Replace flush(ctx, promise) with write(ctx, MessageList, promise)
- Remove ByteToByteEncoder/Decoder/Codec
- Replaced by MessageToByteEncoder<ByteBuf>, ByteToMessageDecoder<ByteBuf>, and ByteMessageCodec<ByteBuf>
- Merge EmbeddedByteChannel and EmbeddedMessageChannel into EmbeddedChannel
- Add SimpleChannelInboundHandler which is sometimes more useful than
ChannelInboundHandlerAdapter
- Bring back Channel.isWritable() from Netty 3
- Add ChannelInboundHandler.channelWritabilityChanges() event
- Add RecvByteBufAllocator configuration property
- Similar to ReceiveBufferSizePredictor in Netty 3
- Some existing configuration properties such as
DatagramChannelConfig.receivePacketSize is gone now.
- Remove suspend/resumeIntermediaryDeallocation() in ByteBuf
This change would have been impossible without @normanmaurer's help. He
fixed, ported, and improved many parts of the changes.