Motivation:
There are a few buffer leaks related to how Unpooled.wrapped and Base64.encode is used.
Modifications:
- Fix usages of Bas64.encode to correct leaks
- Clarify interface of Unpooled.wrapped* to ensure reference count ownership is clearly defined.
Result:
Reference count code is more clearly defined and less leaks are possible.
Motivation:
UTF-16 can not represent the full range of Unicode characters, and thus has the concept of Surrogate Pair (http://unicode.org/glossary/#surrogate_pair) where 2 16-bit code units can be used to represent the missing characters. ByteBufUtil.writeUtf8 is currently does not support this and is thus incomplete.
Modifications:
- Add support for surrogate pairs in ByteBufUtil.writeUtf8
Result:
ByteBufUtil.writeUtf8 now supports surrogate pairs and is correctly converting to UTF-8.
Motivation:
We missed to check if the dst is ready only before using unsafe to copy data into it which lead to data-corruption. We need to ensure we respect ready only ByteBuffer.
Modifications:
- Correctly check if the dst is ready only before copy data into it in UnsafeByteBufUtil
- Also make it work for buffers that are not direct and not have an array
Result:
No more data corruption possible if the dst buffer is readonly and unsafe buffer implementation is used.
Motivation:
Initialisation of the ByteBufUtil class, a class frequently used is
delayed because a significant number of String operations is performed to
fill a HEXDUMP_ROWPREFIXES array. This array also sticks to the Strings
forever.
It is quite likely that applications never use the hexdump facility.
Modification:
Moved the static initialisation and references to a static inner class.
This delays initialisation (and memory usage) until actually needed.
The API is kept as is.
Result:
Faster startup time, less memory usage for most netty using applications.
Motivation:
The method setBytes did not work correctly because read-only ByteBuffer
does not allow access to its underlying array.
Modifications:
New case was added for ByteBuffer's that are not direct and do not have an array.
These must be handled by copying the data into a temporary array. Unit test was
added to test this case.
Result:
It is now possible to use read-only ByteBuffer as the source
for the setBytes method.
Motivation:
In 4.1 and master the isValid utility has been moved to MathUtil. We should stay consistent for internal APIs.
Modifications:
- Move isValid to MathUtil
Result:
More consistent internal structure across branches.
Motivation:
Modulo operations are slow, we can use bitwise operation to detect if resource leak detection must be done while sampling.
Modifications:
- Ensure the interval is a power of two
- Use bitwise operation for sampling
- Add benchmark.
Result:
Faster sampling.
Motivation:
Fix a race condition that was introduced by f18990a8a5 that could lead to a NPE when allocate from the PooledByteBufAllocator concurrently by many threads.
Modifications:
Correctly synchronize on the PoolSubPage head.
Result:
No more race.
Motiviation:
We have a lot of duplicated code which makes it hard to maintain.
Modification:
Move shared code to UnsafeByteBufUtil and use it in the implementations.
Result:
Less duplicated code and so easier to maintain.
Motiviation:
We have a lot of duplicated code which makes it hard to maintain.
Modification:
Move shared code to HeapByteBufUtil and use it in the implementations.
Result:
Less duplicated code and so easier to maintain.
Motivation:
sun.misc.Unsafe allows us to handle heap ByteBuf in a more efficient matter. We should use special ByteBuf implementation when sun.misc.Unsafe can be used to increase performance.
Modifications:
- Add PooledUnsafeHeapByteBuf and UnpooledUnsafeHeapByteBuf that are used when sun.misc.Unsafe is ready to use.
- Add UnsafeHeapSwappedByteBuf
Result:
Better performance when using heap buffers and sun.misc.Unsafe is ready to use.
Motivation:
We had a bug in our implemention which double "reversed" bytes on systems which not support unaligned access.
Modifications:
- Correctly only reverse bytes if needed.
- Share code between unsafe implementations.
Result:
No more data-corruption on sytems without unaligned access.
Motivation:
When moving bytes between a PooledUnsafeDirectByteBuf or an UnpooledUnsafeDirectByteBuf
and a ByteBuffer, a temp ByteBuffer is allocated and will need to be GCed. This is a
common case since a ByteBuffer is always needed when reading/writing on a file,
for example.
Modifications:
Use PlatformDependent.copyMemory() to avoid the need for the temp ByteBuffer
Result:
No temp ByteBuffer allocated and GCed.
Motivation:
SlicedByteBuf did double reference count checking for various bulk operations, which affects performance.
Modifications:
- Add package private method to AbstractByteBuf that can be used to check indexes without check the reference count
- Use this new method in the bulk operation os SlicedByteBuf as the reference count checks take place on the wrapped buffer anyway
- Fix test-case to not try to read data that is out of the bounds of the buffer.
Result:
Better performance on bulk operations when using SlicedByteBuf (and sub-classes)
Motivation:
Some of the tests in the buffer module contained unused code. Some of the tests also used unnecessary inheritance which could be avoided to simplify code.
Modifications:
Cleanup the test cases.
Result:
Cleaner code, less cruft.
Motivation:
We need to always return a real slice even when the requested length is 0. This is needed as otherwise we not correctly share the reference count and so may leak a buffer if the user call release() on the returned slice and expect it to decrement the reference count of the "parent" buffer.
Modifications:
- Always return a real slice
- Add unit test for the bug.
Result:
No more leak possible when a user requests a slice of length 0 of a SlicedByteBuf.
Motivation:
SlicedByteBuf can be used for any ByteBuf implementations and so can not do any optimizations that could be done
when AbstractByteBuf is sliced.
Modifications:
- Add SlicedAbstractByteBuf that can eliminate range and reference count checks for _get* and _set* methods.
Result:
Faster SlicedByteBuf implementations for AbstractByteBuf sub-classes.
Motivation:
DuplicatedByteBuf can be used for any ByteBuf implementations and so can not do any optimizations that could be done
when AbstractByteBuf is duplicated.
Modifications:
- Add DuplicatedAbstractByteBuf that can eliminate range and reference count checks for _get* and _set* methods.
Result:
Faster DuplicatedByteBuf implementations for AbstractByteBuf sub-classes.
Motivation:
Calling AbstractByteBuf.toString(..., Charset) is used quite frequently by users but produce a lot of GC.
Modification:
- Use a FastThreadLocal to store the CharBuffer that are needed for decoding.
- Use internalNioBuffer(...) when possible
Result:
Less object creation / Less GC
Motiviation:
Checking reference count on every access on a ByteBuf can have some big performance overhead depending on how the access pattern is. If the user is sure that there are no reference count errors on his side it should be possible to disable the check and so gain the max performance.
Modification:
- Add io.netty.buffer.bytebuf.checkAccessible system property which allows to disable the checks. Enabled by default.
- Add microbenchmark
Result:
Increased performance for operations on the ByteBuf.
Motivation:
We should minimize and optimize bound checks as much as possible to get the most out of performance.
Modifications:
- Use bitwise operations to remove branching
- Remove branches when possible
Result:
Better performance for various operations.
Motivation:
ByteBufUtil.writeUtf8(...) / writeUsAscii(...) can use a fast-path when writing into AbstractByteBuf. We should try to unwrap WrappedByteBuf implementations so
we are able to do the same on wrapped AbstractByteBuf instances.
Modifications:
- Try to unwrap WrappedByteBuf to use the fast-path
Result:
Faster writing of utf8 and usascii for WrappedByteBuf instances.
Motivation:
As toString() is often used while logging we need to ensure this produces no exception.
Modifications:
Ensure we never throw an IllegalReferenceCountException.
Result:
Be able to log without produce exceptions.
Motivation:
We need to ensure all markers are reset when doing an allocation via the PooledByteBufAllocator. This was not the always the case.
Modifications:
Move all logic that needs to get executed when reuse a PooledByteBuf into one place and call it.
Result:
Correct behavior
Motivation:
The configurable property value recently added was not logged like others properties.
Modifications:
Added debug log with effective value applied.
Result:
Consistent with other properties
Motivation:
Leak detector, when it detects a leak, will print the last 5 stack
traces that touched the ByteBuf. In some cases that might not be enough
to identify the root cause of the leak.
Also, sometimes users might not be interested in tracing all the
operations on the buffer, but just the ones that are affecting the
reference count.
Modifications:
Added command line properties to override default values:
* Allow to configure max number of stack traces to collect
* Allow to only record retain/release operation on buffers
Result:
Users can increase the number of stack traces to debug buffer leaks
with lot of retain/release operations.
Motivation:
Currently the "derived" buffer will only ever be recycled if the release call is made on the "derived" object, and the "wrapped" buffer ends up being "fully released" (aka refcount goes to 0). From my experience this is not the common use case and thus the "derived" buffers will not be recycled.
Modifications:
- revert https://github.com/netty/netty/pull/3788
Result:
Less complexity, and less code to create new objects in majority of cases.
Motivation:
Even though MemoryRegionCache$Entry instances are allocated through a recycler they are not properly recycled,
leaving a lot of instances to be GCed along with Recycler$DefaultHandle objects.
Fixes#4071
Modification:
Recycle Entry when done using it.
Result:
Less GCed objects.
Motivation:
As we modify the position of the passed in ByteBuffer's this methods are not thread-safe.
Modifications:
Duplicate the input ByteBuffers before copy the content to byte[].
Result:
Unpooled.copiedBuffer(ByteBuffer) and copiedBuffer(ByteBuffer...) is now thread-safe.
Motivation:
The javadoc of ByteBuf contained some out-dated code.
Modifications:
Update code example in javadoc to use netty 4+ API
Result:
Correct javadocs
Motivation:
FixedCompositeByteBuf does not properly implement a number of methods for
copying its content to direct buffers and output streams
Modifications:
Replace improper use of capacity() with readableBytes() when computing offesets during writes
Result:
Copying works correctly
This implementation does not produce as much GC pressure as CompositeByteBuf and so is prefered,
for writing an array of ByteBufs. Be aware that FixedCompositeByteBuf is readonly.
When using this in a project that make heavy use of CompositeByteBuf for writes we was able to cut
down allocation to a half.
Motivation:
At the moment we use 1 * cores as default mimimum for pool arenas. This can easily lead to conditions as we use 2 * cores as default for EventLoop's when using NIO or EPOLL. If we choose a smaller number we will run into hotspots as allocation and deallocation needs to be synchronized on the PoolArena.
Modifications:
Change the default number of arenas to 2 * cores.
Result:
Less conditions when using the default settings.
Motivation:
Dumping the content of a ByteBuf in a hex format is very useful.
Modifications:
Move code into ByteBufUtil so its easy to reuse.
Result:
Easy to reuse dumping code.
Motivation:
PoolThreadCache did only cache allocations if the allocation and deallocation Thread were the same. This is not optimal as often people write from differen thread then the actual EventLoop thread.
Modification:
- Add MpscArrayQueue which was forked from jctools and lightly modified.
- Use MpscArrayQueue for caches and always add buffer back to the cache that belongs to the allocation thread.
Result:
ThreadPoolCache is now also usable and so gives performance improvements when allocation and deallocation thread are different.
Performance when using same thread for allocation and deallocation is noticable worse then before.