Motiviation:
Sometimes it is useful to dump the status of the PooledByteBufAllocator and log it. Doing this is currently a bit cumbersome as the user needs to basically iterate through all the metrics and compose the String. we would better provide an easy way to do this.
Modification:
Add dumpStats() method.
Result:
Easier to get a view into the status of the allocator.
Motivation:
PoolChunkList.allocate(...) should return false without the need to walk all the contained PoolChunks when the requested capacity is larger then the capacity that can be allocated out of the PoolChunks in respect to the minUsage() and maxUsage() of the PoolChunkList.
Modifications:
Precompute the maximal capacity that can be allocated out of the PoolChunks that are contained in the PoolChunkList and use this to fast return from the allocate(...) method if an allocation capacity larger then that is requested.
Result:
Faster detection of allocations that can not be handled by the PoolChunkList and so faster allocations in general via the PoolArena.
Motivation:
To better understand how much memory is used by Netty for ByteBufs it is useful to understand how many bytes are currently active (allocated) per PoolArena.
Modifications:
- Add PoolArenaMetric.numActiveBytes()
Result:
The user is able to get better insight into the PooledByteBufAllocator.
Motivation:
To make it easier to understand PoolChunk and PoolArena we should cleanup duplicated code.
Modifications:
- Move reused code into methods
- Use Math.max(...)
Result:
Cleaner code and easier to understand.
Motivation:
When doing a normal allocation in PoolArena we also tried to allocate out of the PoolChunkList that only contains completely full PoolChunks. This makes no sense as these are full anyway so an allocation will never work here and just gives a perf hit as we need to walk the whole list of PoolChunks in the list.
Modifications:
Not try to allocate from PoolChunkList that only contains full PoolChunks
Result:
Faster allocation times when a new PoolChunk must be created.
Motivation:
We should better use Math utilities as these are intrinsics. This is a cleanup for ea3ffb8536.
Modifications:
Use Math utilities.
Result:
Cleaner code and use of intrinsics.
Motivation:
When a PoolChunk needs to get moved to the previous PoolChunkList because of the minUsage / maxUsage constraints we always just moved it one level which is incorrect and so could lead to have PoolChunks in the wrong PoolChunkList (in respect to their minUsage / maxUsage settings). This then could have the effect that PoolChunks are not released / freed in a timely fashion and so.
Modifications:
- Correctly move PoolChunks between PoolChunkLists, which includes moving it multiple "levels".
- Add unit test
Result:
Correctlty move the PoolChunk to PoolChunkList when it is freed, even if its multiple layers.
Motivation:
The PoolChunkList.minUsage() and maxUsage() needs to take special action to translate Integer.MIN_VALUE / MAX_VALUE as these are used internal for tail and head of the linked-list structure.
Modifications:
- Correct the minUsage() and maxUsage() methods.
- Add unit test.
Result:
Correct metrics
Motivation:
Sometimes it is useful to allow to disable the leak detection of buffers if the UnpooledByteBufAllocator is used. This is for example true if the app wants to leak buffers into user code but not want to put the burden on the user to always release the buffer.
Modifications:
Add another constructor to UnpooledByteBufAllocator that allows to completely disable leak-detection for all buffers that are allocator out of the UnpooledByteBufAllocator.
Result:
It's possible to disable leak-detection when the UnpooledByteBufAllocator is used.
Motivation:
We should only increment the metric for the huge / normal allocation after it is done. Also we should only decrement once deallocate.
Modifications:
- Move increment after the allocation.
- Fix deallocation metric and move it after deallocation
Result:
More correct metrics.
Motivation:
PoolThreadCache includes the wrong value when throwing a IllegalArgumentException because of freeSweepAllocationThreshold.
Modifications:
Use the correct value.
Result:
Correct exception message.
Motivation:
The method setBytes creates temporary heap buffer when source buffer is read-only.
But this temporary buffer is not used correctly and may lead to data corruption.
This problem occurs when target buffer is pooled and temporary buffer
arrayOffset() is not zero.
Modifications:
Use correct arrayOffset when calling PlatformDependent.copyMemory.
Unit test was added to test this case.
Result:
Setting buffer content works correctly when target is pooled buffer and source
is read-only ByteBuffer.
Motivation:
We also need to add synchronization when access fields to ensure we see the latest updates.
Modifications:
Add synchronization when read fields that are written concurrently.
Result:
Ensure correct visibility of updated.
Motivation:
We had some double spacing in the methods which should be removed to keep things consistent.
Modifications:
Remove redundant spaces.
Result:
Cleaner / consistent coding style.
Motivation:
My previous commit b88a980482 introduced a flawed unit test,
that executes an assertion in a different thread than the test thread.
If this assertion fails, the test doesn't fail.
Modifications:
Replace the assertion by a proper workaround.
Result:
More correct unit test
Motivation:
Circular assignment of arenas to thread caches can lead to less than optimal
mappings in cases where threads are (frequently) shutdown and started.
Example Scenario:
There are a total of 2 arenas. The first two threads performing an allocation
would lead to the following mapping:
Thread 0 -> Arena 0
Thread 1 -> Arena 1
Now, assume Thread 1 is shut down and another Thread 2 is started. The current
circular assignment algorithm would lead to the following mapping:
Thread 0 -> Arena 0
Thread 2 -> Arena 0
Ideally, we want Thread 2 to use Arena 1 though.
Presumably, this is not much of an issue for most Netty applications that do all
the allocations inside the eventloop, as eventloop threads are seldomly shut down
and restarted. However, applications that only use the netty-buffer package
or implement their own threading model outside the eventloop might suffer from
increased contention. For example, gRPC Java when using the blocking stub
performs some allocations outside the eventloop and within its own thread pool
that is dynamically sized depending on system load.
Modifications:
Implement a linear scan algorithm that assigns a new thread cache to the arena
that currently backs the fewest thread caches.
Result:
Closer to ideal mappings between thread caches and arenas. In order to always
get an ideal mapping, we would have to re-balance the mapping whenever a thread
dies. However, that's difficult because of deallocation.
Motivation:
The statistic counters PoolArena.(allocationsTiny|allocationsSmall) are
not protected by a per arena lock, but by a per size class lock. Thus,
two concurrent allocations of different size (class) could lead to a
race and ultimately to wrong statistics.
Modifications:
Use a thread-safe LongCounter instead of a plain long data type.
Result:
Fewer data races.
Motivation:
See #3321
Modifications:
1. Add CharsetUtil.encoder/decoder() methods
2. Deprecate CharsetUtil.getEncoder/getDecoder() methods
Result:
Users can use new CharsetUtil.encoder/decoder() to specify error actions
Motivation:
Utility methods in ByteBufUtil to writeUtf8 and writeAscii expect a buffer to already be allocated. If the user does not have a buffer allocated they have to know details of the encoding in order to know the size of the buffer to allocate.
Modifications:
- Add writeUtf8 and writeAscii which take a ByteBufAllocator and allocate a ByteBuf of the correct size for the user
Result:
ByteBufUtil methods which are easier to use if the user doesn't already have a ByteBuf.
Motivation:
f750d6e36c added support for surrogates in the writeUtf8 conversion. However exceptions are thrown if invalid input is detected, but the JDK (and slow path of writeUtf8) uses a replacement character and does not throw. We should behave the same way.
Modificiations:
- Don't throw in ByteBufUtil.writeUtf8, and instead use a replacement character consistent with the JDK
Result:
ByteBufUtil.writeUtf8 behavior is consistent with the JDK UTF_8 conversion.
Motivation:
We missed to take the byte[] into account when try to access the bytes and so produce a segfault.
Modifications:
Correctly pass the byte[] in.
Result:
No more segfault.
Motivation:
The current interface for CompositeByteBuf.addComponent is not clear under what conditions ownership is transferred when addComponent is called. There should be a well defined behavior so that users can ensure that no leaks occur.
Modifications:
- CompositeByteBuf.addComponent should always assume reference count ownership
Result:
Users that call CompositeByteBuf.addComponent do not have to independently check if the buffer's ownership has been transferred and if not independently release the buffer.
Fixes https://github.com/netty/netty/issues/4760
Motivation:
CompositeByteBuf only implemented simple resource leak detection and how it was implemented was completly different to the way it was for ByteBuf. The other problem was that slice(), duplicate() and others would not return a resource leak enabled buffer.
Modifications:
- Proper implementation for all level of resource leak detection for CompositeByteBuf
Result:
Proper resource leak detection for CompositeByteBuf.
Motivation:
There are a few buffer leaks related to how Unpooled.wrapped and Base64.encode is used.
Modifications:
- Fix usages of Bas64.encode to correct leaks
- Clarify interface of Unpooled.wrapped* to ensure reference count ownership is clearly defined.
Result:
Reference count code is more clearly defined and less leaks are possible.
Motivation:
UTF-16 can not represent the full range of Unicode characters, and thus has the concept of Surrogate Pair (http://unicode.org/glossary/#surrogate_pair) where 2 16-bit code units can be used to represent the missing characters. ByteBufUtil.writeUtf8 is currently does not support this and is thus incomplete.
Modifications:
- Add support for surrogate pairs in ByteBufUtil.writeUtf8
Result:
ByteBufUtil.writeUtf8 now supports surrogate pairs and is correctly converting to UTF-8.
Motivation:
We missed to check if the dst is ready only before using unsafe to copy data into it which lead to data-corruption. We need to ensure we respect ready only ByteBuffer.
Modifications:
- Correctly check if the dst is ready only before copy data into it in UnsafeByteBufUtil
- Also make it work for buffers that are not direct and not have an array
Result:
No more data corruption possible if the dst buffer is readonly and unsafe buffer implementation is used.
Motivation:
Initialisation of the ByteBufUtil class, a class frequently used is
delayed because a significant number of String operations is performed to
fill a HEXDUMP_ROWPREFIXES array. This array also sticks to the Strings
forever.
It is quite likely that applications never use the hexdump facility.
Modification:
Moved the static initialisation and references to a static inner class.
This delays initialisation (and memory usage) until actually needed.
The API is kept as is.
Result:
Faster startup time, less memory usage for most netty using applications.
Motivation:
The method setBytes did not work correctly because read-only ByteBuffer
does not allow access to its underlying array.
Modifications:
New case was added for ByteBuffer's that are not direct and do not have an array.
These must be handled by copying the data into a temporary array. Unit test was
added to test this case.
Result:
It is now possible to use read-only ByteBuffer as the source
for the setBytes method.
Motivation:
In 4.1 and master the isValid utility has been moved to MathUtil. We should stay consistent for internal APIs.
Modifications:
- Move isValid to MathUtil
Result:
More consistent internal structure across branches.
Motivation:
Modulo operations are slow, we can use bitwise operation to detect if resource leak detection must be done while sampling.
Modifications:
- Ensure the interval is a power of two
- Use bitwise operation for sampling
- Add benchmark.
Result:
Faster sampling.
Motivation:
Fix a race condition that was introduced by f18990a8a5 that could lead to a NPE when allocate from the PooledByteBufAllocator concurrently by many threads.
Modifications:
Correctly synchronize on the PoolSubPage head.
Result:
No more race.
Motiviation:
We have a lot of duplicated code which makes it hard to maintain.
Modification:
Move shared code to UnsafeByteBufUtil and use it in the implementations.
Result:
Less duplicated code and so easier to maintain.
Motiviation:
We have a lot of duplicated code which makes it hard to maintain.
Modification:
Move shared code to HeapByteBufUtil and use it in the implementations.
Result:
Less duplicated code and so easier to maintain.
Motivation:
sun.misc.Unsafe allows us to handle heap ByteBuf in a more efficient matter. We should use special ByteBuf implementation when sun.misc.Unsafe can be used to increase performance.
Modifications:
- Add PooledUnsafeHeapByteBuf and UnpooledUnsafeHeapByteBuf that are used when sun.misc.Unsafe is ready to use.
- Add UnsafeHeapSwappedByteBuf
Result:
Better performance when using heap buffers and sun.misc.Unsafe is ready to use.
Motivation:
We had a bug in our implemention which double "reversed" bytes on systems which not support unaligned access.
Modifications:
- Correctly only reverse bytes if needed.
- Share code between unsafe implementations.
Result:
No more data-corruption on sytems without unaligned access.