Motivation:
Often its useful for the user to be able to get some stats about the memory allocated via an allocator.
Modifications:
- Allow to obtain the used heap and direct memory for an allocator
- Add test case
Result:
Fixes [#6341]
Motivation:
Commit 8dda984afe introduced a regression which lead to the situation that the allocator is not set when PooledByteBuf.initUnpooled(...) is called. Thus it was possible that PooledByteBuf.alloc() returns null or the wrong allocator if multiple PooledByteBufAllocator are used in an application.
Modifications:
- Correctly set the allocator
- Add test-case
Result:
Fixes [#6436].
Motivation:
We have our own ThreadLocalRandom implementation to support older JDKs . That said we should prefer the JDK provided when running on JDK >= 7
Modification:
Using ThreadLocalRandom implementation of the JDK when possible.
Result:
Make use of JDK implementations when possible.
Motivation:
Java9 does not allow changing access level via reflection by default. This lead to the situation that netty disabled Unsafe completely as ByteBuffer.address could not be read.
Modification:
Use Unsafe to read the address field as this works on all Java versions.
Result:
Again be able to use Unsafe optimisations when using Netty with Java9
Motivation:
When sun.misc.Unsafe is present we want to use *Unsafe*ByteBuf implementations. We missed to do so in PooledByteBufAllocator when the heapArena is null.
Modifications:
- Correctly use UnpooledUnsafeHeapByteBuf
- Add unit tests
Result:
Use most optimal ByteBuf implementation.
Motivation:
We can eliminate unnessary wrapping when call ByteBuf.asReadOnly() in some cases to reduce indirection.
Modifications:
- Check if asReadOnly() needs to create a new instance or not
- Add test cases
Result:
Less object creation / wrapping.
Motivation:
We need to ensure we pass all tests when sun.misc.Unsafe is not present.
Modifications:
- Make *ByteBufAllocatorTest work whenever sun.misc.Unsafe is present or not
- Let Lz4FrameEncoderTest not depend on AbstractByteBufAllocator implementation details which take into account if sun.misc.Unsafe is present or not
Result:
Tests pass even without sun.misc.Unsafe.
Motivation:
We should only try to calculate the direct memory offset when sun.misc.Unsafe is present as otherwise it will fail with an NPE as PlatformDependent.directBufferAddress(...) will throw it.
This problem was introduced by 66b9be3a46.
Modifications:
Use offset of 0 if no sun.misc.Unsafe is present.
Result:
PooledByteBufAllocator also works again when no sun.misc.Unsafe is present.
Motivation:
ReadOnlyByteBufTest contains two tests which are missing the `@Test` annotation and so will never run.
Modifications:
Add missing annotation.
Result:
Tests run as expected.
Motivation:
We used various mocking frameworks. We should only use one...
Modifications:
Make usage of mocking framework consistent by only using Mockito.
Result:
Less dependencies and more consistent mocking usage.
Motivation:
64-byte alignment is recommended by the Intel performance guide (https://software.intel.com/en-us/articles/practical-intel-avx-optimization-on-2nd-generation-intel-core-processors) for data-structures over 64 bytes.
Requiring padding to a multiple of 64 bytes allows for using SIMD instructions consistently in loops without additional conditional checks. This should allow for simpler and more efficient code.
Modification:
At the moment cache alignment must be setup manually. But probably it might be taken from the system. The original code was introduced by @normanmaurer https://github.com/netty/netty/pull/4726/files
Result:
Buffer alignment works better than miss-align cache.
Motivation:
We not had tests for ByteBufAllocator implementations in general.
Modifications:
Added ByteBufAllocatorTest, AbstractByteBufAllocatorTest and UnpooledByteBufAllocatorTest
Result:
More tests for allocator implementations.
Motivation:
PooledByteBuf.capacity(...) miss to enforce maxCapacity() and so its possible to increase the capacity of the buffer even if it will be bigger then maxCapacity().
Modifications:
- Correctly enforce maxCapacity()
- Add unit tests for capacity(...) calls.
Result:
Correctly enforce maxCapacity().
Motivation:
Disable ThreadLocal Cache, then allocate Pooled ByteBuf and release all these buffers, PoolArena's tiny/small/normal allocation count is incorrect.
Modifications:
- Calculate PoolArena's tiny/small/normal allocation one time
- Add testAllocationCounter TestCase
Result:
Fixes#6282 .
Motivation:
ByteBufUtil.compare uses long arithmetic but doesn't check for underflow on when converting from long to int to satisfy the Comparable interface. This will result in incorrect comparisons and violate the Comparable interface contract.
Modifications:
- ByteBufUtil.compare should protect against int underflow
Result:
Fixes https://github.com/netty/netty/issues/6169
Motivation:
We should assert that the leak aware buffers correctly close the ResourceLeakTracker in the unit tests.
Modifications:
- Keep track of NoopResourceLeakTrackers and check if these were closed once the test completes
- Fix bugs in tests so the buffers are all released.
Result:
Better tests for leak aware buffers
Motivation:
We need to ensure the tracked object can not be GC'ed before ResourceLeak.close() is called as otherwise we may get false-positives reported by the ResourceLeakDetector. This can happen as the JIT / GC may be able to figure out that we do not need the tracked object anymore and so already enqueue it for collection before we actually get a chance to close the enclosing ResourceLeak.
Modifications:
- Add ResourceLeakTracker and deprecate the old ResourceLeak
- Fix some javadocs to correctly release buffers.
- Add a unit test for ResourceLeakDetector that shows that ResourceLeakTracker has not the problems.
Result:
No more false-positives reported by ResourceLeakDetector when ResourceLeakDetector.track(...) is used.
Motivation:
PooledByteBufAllocatorTest uses an ArrayQueue but access it from multiple threads (not concurrently but still from different threads). This may leak to memory visibility issues.
Modifications:
- Use a concurrent queue
- Some cleanup
Result:
Non racy test code.
Motivation:
We support using Netty without sun.misc.Unsafe, so we should also support building it without it. This way we can also run all tests without sun.misc.Unsafe and so see if it works as expected.
Modifications:
Correctly skip tests that depend on sun.misc.Unsafe if its not present or -Dio.netty.noUnsafe=true is used.
Result:
Be able to build netty without sun.misc.Unsafe
Motivation:
SwappedByteBuf.unwrap() not returned the wrapped buffer but the buffer that was wrapped by the original buffer. This is not correct.
Modifications:
Correctly return wrapped buffer and fix test.
Result:
SwappedByteBuf.unwrap() works as expected.
Motivation:
We had a few tests PooledByteBufAllocatorTests which used parkNanos(...) to give a resource enough time to get destroyed. This is race and may not be good enough.
Modifications:
Ensure the ThreadCache is really destroyed.
Result:
No more racy tests that depend on ThreadCaches.
Motivation:
4bba7526e2 introduced changes which made pooled and unpooled derived buffers inconsistent in a few ways:
- Pooled derived buffers always generated a duplicate buffer when duplicate() was called and always generated a sliced buffer when slice() was called. Unpooled derived buffers some times generated a sliced buffer when duplicate() was called.
- The indexes that were set for duplicate buffers generated from slices were not always consistent.
There were also some various bugs in the derived pooled buffer implementation.
Modifications:
- Make pooled/unpooled consistently generate duplicate buffers when duplicate() is called and sliced buffers when slice() is called.
- Fix bugs in the derived pooled buffer
Result:
More consistent behavior from the derived pooled/unpooled buffers.
Motiviation:
We used ReferenceCountUtil.releaseLater(...) in our tests which simplifies a bit the releasing of ReferenceCounted objects. The problem with this is that while it simplifies stuff it increase memory usage a lot as memory may not be freed up in a timely manner.
Modifications:
- Deprecate releaseLater(...)
- Remove usage of releaseLater(...) in tests.
Result:
Less memory needed to build netty while running the tests.
Motivation:
Currently the ByteBuf created as a result of retained[Slice|Duplicate] maintains its own reference count, and when this reference count is depleated it will release the ByteBuf returned from unwrap(). The unwrap() buffer is designed to be the 'root parent' and will skip all intermediate layers of buffers. If the intermediate layers of buffers contain a retained[Slice|Duplicate] then these reference counts will be ignored during deallocation. This may lead to deallocating the 'root parent' before all derived pooled buffers are actually released. This same issue holds if a retained[Slice|Duplicate] is in the heirachy and a 'regular' slice() or duplicate() buffer is created.
Modifications:
- AbstractPooledDerivedByteBuf must maintain a reference to the direct parent (the buffer which retained[Slice|Duplicate] was called on) and release on this buffer instead of the 'root parent' returned by unwrap()
- slice() and duplicate() buffers created from AbstractPooledDerivedByteBuf must also delegate reference count operations to their immediate parent (or first ancestor which maintains an independent reference count).
Result:
Fixes https://github.com/netty/netty/issues/5999
Motivation:
Netty provides a adaptor from ByteBuf to Java's InputStream interface. The JDK Stream interfaces have an explicit lifetime because they implement the Closable interface. This lifetime may be differnt than the ByteBuf which is wrapped, and controlled by the interface which accepts the JDK Stream. However Netty's ByteBufInputStream currently does not take reference count ownership of the underlying ByteBuf. There may be no way for existing classes which only accept the InputStream interface to communicate when they are done with the stream, other than calling close(). This means that when the stream is closed it may be appropriate to release the underlying ByteBuf, as the ownership of the underlying ByteBuf resource may be transferred to the Java Stream.
Motivation:
- ByteBufInputStream.close() supports taking reference count ownership of the underyling ByteBuf
Result:
ByteBufInputStream can assume reference count ownership so the underlying ByteBuf can be cleaned up when the stream is closed.
Motivation:
In some ByteBuf implementations we not correctly implement getBytes(index, ByteBuffer).
Modifications:
Correct code to do what is defined in the javadocs and adding test.
Result:
Implementation works as described.
Motivation:
the build doesnt seem to enforce this, so they piled up
Modifications:
removed unused import lines
Result:
less unused imports
Signed-off-by: radai-rosenblatt <radai.rosenblatt@gmail.com>
Motivation:
We introduced a regression in 1abdbe6f67 which let the iteration start from the wrong index.
Modifications:
Fix start index and add tests.
Result:
Fix regression.
Motivation:
Result of ByteBufUtil.compare(ByteBuf a, ByteBuf b) is dependent on ByteOrder of supplied ByteBufs which should not be the case (as stated in the javadocs).
Modifications:
Ensure we get a consistent behavior when calling ByteBufUtil.compare(ByteBuf a, ByteBuf b) and not depend on ByteOrder.
Result:
ByteBufUtil.compare(ByteBuf a, ByteBuf b) and so AbstractByteBuf.compare(...) works correctly as stated in the javadocs.
Motivation:
Sometimes it is useful to be able to wrap an existing memory address (a.k.a pointer) and create a ByteBuf from it. This way its easier to interopt with other libraries.
Modifications:
Add a new Unpooled.wrappedBuffer(....) method that takes a memory address.
Result:
Be able to wrap an existing memory address into a ByteBuf.
Motivation:
We used incorrect assumeTrue(...) checks.
Modifications:
Fix check.
Result:
Be able to run tests also if java.nio.DirectByteBuffer.<init>(long, int) could not be accessed.
Motivation:
We not need to do an extra conditional check in retain(...) as we can just check for overflow after we did the increment.
Modifications:
- Remove extra conditional check
- Add test code.
Result:
One conditional check less.
Motivation:
When Unpooled.wrappedBuffer(...) is called with an array of ByteBuf with length >= 2 and the first ByteBuf is not readable it will result in double releasing of these empty buffers when release() is called on the returned buffer.
Modifications:
- Ensure we only wrap readable buffers.
- Add unit test
Result:
No double release of buffers.
Motivation:
retainSlice() currently does not unwrap the ByteBuf when creating the ByteBuf wrapper. This effectivley forms a linked list of ByteBuf when it is only necessary to maintain a reference to the unwrapped ByteBuf.
Modifications:
- retainSlice() and retainDuplicate() variants should only maintain a reference to the unwrapped ByteBuf
- create new unit tests which generally verify the retainSlice() behavior
- Remove unecessary generic arguments from AbstractPooledDerivedByteBuf
- Remove unecessary int length member variable from the unpooled sliced ByteBuf implementation
- Rename the unpooled sliced/derived ByteBuf to include Unpooled in their name to be more consistent with the Pooled variants
Result:
Fixes https://github.com/netty/netty/issues/5582
Motivation:
At the moment the Recyler is very sensitive to allocation bursts which means that if there is a need for X objects for only one time these will most likely end up in the Recycler and sit there forever as the normal workload only need a subset of this number.
Modifications:
Add a ratio which sets how many objects should be pooled for each new allocation. This allows to slowly increase the number of objects in the Recycler while not be to sensitive for bursts.
Result:
Less unused objects in the Recycler if allocation rate sometimes bursts.
Motivation:
SwappedByteBuf.retainedSlice(...) did not return a retained buffer.
Modifications:
Correctly delegate to retainedSlice(..) calls.
Result:
Correctly return retained slice.
Motivation:
Because of a bug we missed to include the first PoolSubpage when collection metrics.
Modifications:
- Correctly include all subpages
- Add unit test
Result:
Correctly include all subpages
Motivation:
In order to prevent a regression, add test case for a bug that caused a CompositeByteBuf to not release its components.
Modifications:
Add a test case that asserts a CompositeByteBuf's component buffers have indeed been released.
Result:
AbstractCompositeByteBuf gains a test case that will prevent future regressions.
Motivation:
See #82.
Modifications:
- Added `isText` to validate if the given ByteBuf is compliant with the specified charset.
- Optimized for UTF-8 and ASCII. For other cases, `CharsetDecoder.decoder` is used.
Result:
Users can validate ByteBuf with given charset.
Motivation:
Using the Cleaner to release the native memory has a few drawbacks:
- Cleaner.clean() uses static synchronized internally which means it can be a performance bottleneck
- It put more load on the GC
Modifications:
Add new buffer implementations that can be enabled with a system flag as optimizations. In this case no Cleaner is used at all and the user must ensure everything is always released.
Result:
Less performance impact by direct buffers when need to be allocated and released.
Motivation:
At the moment the user is responsible to increase the writer index of the composite buffer when a new component is added. We should add some methods that handle this for the user as this is the most popular usage of the composite buffer.
Modifications:
Add new methods that autoamtically increase the writerIndex when buffers are added.
Result:
Easier usage of CompositeByteBuf.
Motivation:
We missed to override a few methods and so some actions on the ByteBuf failed.
Modifications:
- Override all methods
- Add unit tests to ensure all is fixed.
Result:
All *LeakAware*ByteBuf have correct implementations
Motivation:
Some tests in PooledByteBufAllocatorTest are blocking on a CountDownLatch. We should use a timeout on these tests so these will not block forever on a failure.
Modifications:
Add timeout param to @Test annotation
Result:
Have sane timeouts on tests.
Motivation:
DefaultByteBufHolder.equals(...) and hashCode() should be implemented so it works correctly with instances that share the same content.
Modifications:
Add implementations and a unit-test.
Result:
Have correctly working equals(...) and hashCode() method
Related: #4333#4421#5128
Motivation:
slice(), duplicate() and readSlice() currently create a non-recyclable
derived buffer instance. Under heavy load, an application that creates a
lot of derived buffers can put the garbage collector under pressure.
Modifications:
- Add the following methods which creates a non-recyclable derived buffer
- retainedSlice()
- retainedDuplicate()
- readRetainedSlice()
- Add the new recyclable derived buffer implementations, which has its
own reference count value
- Add ByteBufHolder.retainedDuplicate()
- Add ByteBufHolder.replace(ByteBuf) so that..
- a user can replace the content of the holder in a consistent way
- copy/duplicate/retainedDuplicate() can delegate the holder
construction to replace(ByteBuf)
- Use retainedDuplicate() and retainedSlice() wherever possible
- Miscellaneous:
- Rename DuplicateByteBufTest to DuplicatedByteBufTest (missing 'D')
- Make ReplayingDecoderByteBuf.reject() return an exception instead of
throwing it so that its callers don't need to add dummy return
statement
Result:
Derived buffers are now recycled when created via retainedSlice() and
retainedDuplicate() and derived from a pooled buffer
Motivation:
PooledByteBufAllocatorTest.testNumThreadCachesWithNoDirrectArenas() had a race as it just used LockSupport.parkNanos(). We should better use a CountdownLatch and so be sure we really have init everything.
Modifications:
Replace LockSupport.parkNanos(...) with CountdownLatch usage
Result:
No more race in test.
Motivation:
When FixedCompositeByteBuf was constructed with new ByteBuf[0] and IndexOutOfboundsException was thrown.
Modifications:
Fix constructor
Result:
No more exception
Motivation:
We should not cache the SwappedByteBuf in AbstractByteBuf to reduce the memory footprint.
Modifications:
Not cache the SwappedByteBuf.
Result:
Less memory footprint.
Motivation:
ByteBuf.readBytes(...) uses Unpooled.buffer(...) internally which will use a heap ByteBuf and also not able to make use of the allocator which may be pooled. We should better make use of the allocator.
Modifications:
Use the allocator for thenew buffer.
Result:
Take allocator into account when copy bytes.
Motivation:
When a PoolChunk needs to get moved to the previous PoolChunkList because of the minUsage / maxUsage constraints we always just moved it one level which is incorrect and so could lead to have PoolChunks in the wrong PoolChunkList (in respect to their minUsage / maxUsage settings). This then could have the effect that PoolChunks are not released / freed in a timely fashion and so.
Modifications:
- Correctly move PoolChunks between PoolChunkLists, which includes moving it multiple "levels".
- Add unit test
Result:
Correctlty move the PoolChunk to PoolChunkList when it is freed, even if its multiple layers.
Motivation:
The PoolChunkList.minUsage() and maxUsage() needs to take special action to translate Integer.MIN_VALUE / MAX_VALUE as these are used internal for tail and head of the linked-list structure.
Modifications:
- Correct the minUsage() and maxUsage() methods.
- Add unit test.
Result:
Correct metrics
Motivation:
The method setBytes creates temporary heap buffer when source buffer is read-only.
But this temporary buffer is not used correctly and may lead to data corruption.
This problem occurs when target buffer is pooled and temporary buffer
arrayOffset() is not zero.
Modifications:
Use correct arrayOffset when calling PlatformDependent.copyMemory.
Unit test was added to test this case.
Result:
Setting buffer content works correctly when target is pooled buffer and source
is read-only ByteBuffer.
Motivation:
My previous commit b88a980482 introduced a flawed unit test,
that executes an assertion in a different thread than the test thread.
If this assertion fails, the test doesn't fail.
Modifications:
Replace the assertion by a proper workaround.
Result:
More correct unit test
Motivation:
Circular assignment of arenas to thread caches can lead to less than optimal
mappings in cases where threads are (frequently) shutdown and started.
Example Scenario:
There are a total of 2 arenas. The first two threads performing an allocation
would lead to the following mapping:
Thread 0 -> Arena 0
Thread 1 -> Arena 1
Now, assume Thread 1 is shut down and another Thread 2 is started. The current
circular assignment algorithm would lead to the following mapping:
Thread 0 -> Arena 0
Thread 2 -> Arena 0
Ideally, we want Thread 2 to use Arena 1 though.
Presumably, this is not much of an issue for most Netty applications that do all
the allocations inside the eventloop, as eventloop threads are seldomly shut down
and restarted. However, applications that only use the netty-buffer package
or implement their own threading model outside the eventloop might suffer from
increased contention. For example, gRPC Java when using the blocking stub
performs some allocations outside the eventloop and within its own thread pool
that is dynamically sized depending on system load.
Modifications:
Implement a linear scan algorithm that assigns a new thread cache to the arena
that currently backs the fewest thread caches.
Result:
Closer to ideal mappings between thread caches and arenas. In order to always
get an ideal mapping, we would have to re-balance the mapping whenever a thread
dies. However, that's difficult because of deallocation.
Motivation
See ##3229
Modifications:
Add methods with position independent FileChannel calls to ByteBuf and its subclasses.
Results:
The user can use these new methods to read/write ByteBuff without updating FileChannel's position.
Motivation:
f750d6e36c added support for surrogates in the writeUtf8 conversion. However exceptions are thrown if invalid input is detected, but the JDK (and slow path of writeUtf8) uses a replacement character and does not throw. We should behave the same way.
Modificiations:
- Don't throw in ByteBufUtil.writeUtf8, and instead use a replacement character consistent with the JDK
Result:
ByteBufUtil.writeUtf8 behavior is consistent with the JDK UTF_8 conversion.
Motivation:
There are a few buffer leaks related to how Unpooled.wrapped and Base64.encode is used.
Modifications:
- Fix usages of Bas64.encode to correct leaks
- Clarify interface of Unpooled.wrapped* to ensure reference count ownership is clearly defined.
Result:
Reference count code is more clearly defined and less leaks are possible.
Motivation:
UTF-16 can not represent the full range of Unicode characters, and thus has the concept of Surrogate Pair (http://unicode.org/glossary/#surrogate_pair) where 2 16-bit code units can be used to represent the missing characters. ByteBufUtil.writeUtf8 is currently does not support this and is thus incomplete.
Modifications:
- Add support for surrogate pairs in ByteBufUtil.writeUtf8
Result:
ByteBufUtil.writeUtf8 now supports surrogate pairs and is correctly converting to UTF-8.
Motivation:
We missed to check if the dst is ready only before using unsafe to copy data into it which lead to data-corruption. We need to ensure we respect ready only ByteBuffer.
Modifications:
- Correctly check if the dst is ready only before copy data into it in UnsafeByteBufUtil
- Also make it work for buffers that are not direct and not have an array
Result:
No more data corruption possible if the dst buffer is readonly and unsafe buffer implementation is used.
As discussed in #3209, this PR adds Little Endian accessors
to ByteBuf and descendants.
Corresponding accessors were added to UnsafeByteBufUtil,
HeapByteBufferUtil to avoid calling `reverseBytes`.
Deprecate `order()`, `order(buf)` and `SwappedByteBuf`.
Motivation:
The method setBytes did not work correctly because read-only ByteBuffer
does not allow access to its underlying array.
Modifications:
New case was added for ByteBuffer's that are not direct and do not have an array.
These must be handled by copying the data into a temporary array. Unit test was
added to test this case.
Result:
It is now possible to use read-only ByteBuffer as the source
for the setBytes method.
Motivation:
Fix a race condition that was introduced by f18990a8a5 that could lead to a NPE when allocate from the PooledByteBufAllocator concurrently by many threads.
Modifications:
Correctly synchronize on the PoolSubPage head.
Result:
No more race.
Motivation:
SlicedByteBuf did double reference count checking for various bulk operations, which affects performance.
Modifications:
- Add package private method to AbstractByteBuf that can be used to check indexes without check the reference count
- Use this new method in the bulk operation os SlicedByteBuf as the reference count checks take place on the wrapped buffer anyway
- Fix test-case to not try to read data that is out of the bounds of the buffer.
Result:
Better performance on bulk operations when using SlicedByteBuf (and sub-classes)
Motivation:
Some of the tests in the buffer module contained unused code. Some of the tests also used unnecessary inheritance which could be avoided to simplify code.
Modifications:
Cleanup the test cases.
Result:
Cleaner code, less cruft.
Motivation:
We need to always return a real slice even when the requested length is 0. This is needed as otherwise we not correctly share the reference count and so may leak a buffer if the user call release() on the returned slice and expect it to decrement the reference count of the "parent" buffer.
Modifications:
- Always return a real slice
- Add unit test for the bug.
Result:
No more leak possible when a user requests a slice of length 0 of a SlicedByteBuf.
Motivation:
Calling AbstractByteBuf.toString(..., Charset) is used quite frequently by users but produce a lot of GC.
Modification:
- Use a FastThreadLocal to store the CharBuffer that are needed for decoding.
- Use internalNioBuffer(...) when possible
Result:
Less object creation / Less GC
Motivation:
ByteBufUtil.writeUtf8(...) / writeUsAscii(...) can use a fast-path when writing into AbstractByteBuf. We should try to unwrap WrappedByteBuf implementations so
we are able to do the same on wrapped AbstractByteBuf instances.
Modifications:
- Try to unwrap WrappedByteBuf to use the fast-path
Result:
Faster writing of utf8 and usascii for WrappedByteBuf instances.
Motivation:
As toString() is often used while logging we need to ensure this produces no exception.
Modifications:
Ensure we never throw an IllegalReferenceCountException.
Result:
Be able to log without produce exceptions.
Motivation:
The logic in ByteBufUtilTest.ByteBufUtilTest is wrong. It is attempting to ensure at least 1 byte is different in the ranges that will be subsequently compared, but does so before the copy operation.
Modifications:
- Move the code which ensures there is a difference to after the copy
- Simplify the logic which ensures there is a difference
Result:
Unit test now operates as designed.
Motivation:
ByteBufUtilTest.notEqualsBufferSubsections is testing non-equality but just uses random numbers to assume they will not be equal. Even after the random bytes are generated we should check they are infact not equal so the test has no chance of failing when it should not.
Modifications:
- Loop through bytes in notEqualsBufferSubsections after they are randomly generated to ensure there is atleast 1 difference.
Result:
More reliable unit tests.
Motivation:
We need to ensure all markers are reset when doing an allocation via the PooledByteBufAllocator. This was not the always the case.
Modifications:
Move all logic that needs to get executed when reuse a PooledByteBuf into one place and call it.
Result:
Correct behavior
Motivation:
When AsciiString is used we can optimize the write operation done by ByteBufUtil.writeUsAscii(...)
Modifications:
Sepcial handle AsciiString.
Result:
Faster writing of AsciiString.
Motiviation:
The current read loops don't fascilitate reading a maximum amount of bytes. This capability is useful to have more fine grain control over how much data is injested.
Modifications:
- Add a setMaxBytesPerRead(int) and getMaxBytesPerRead() to ChannelConfig
- Add a setMaxBytesPerIndividualRead(int) and getMaxBytesPerIndividualRead to ChannelConfig
- Add methods to RecvByteBufAllocator so that a pluggable scheme can be used to control the behavior of the read loop.
- Modify read loop for all transport types to respect the new RecvByteBufAllocator API
Result:
The ability to control how many bytes are read for each read operation/loop, and a more extensible read loop.
Motivation:
FixedCompositeByteBuf does not properly implement a number of methods for
copying its content to direct buffers and output streams
Modifications:
Replace improper use of capacity() with readableBytes() when computing offesets during writes
Result:
Copying works correctly
Motivation:
When allocate a PooledByteBuf we need to ensure to also reset the markers for the readerIndex and writerIndex.
Modifications:
- Correct reset the markers
- Add test-case for it
Result:
Correctly reset markers.
Motivation:
CompositeByteBuf.iterator() currently creates a new ArrayList and fill it with the ByteBufs, which is more expensive then it needs to be.
Modifications:
- Use special Iterator implementation
Result:
Less overhead when calling iterator()
Motivation:
The usage and code within AsciiString has exceeded the original design scope for this class. Its usage as a binary string is confusing and on the verge of violating interface assumptions in some spots.
Modifications:
- ByteString will be created as a base class to AsciiString. All of the generic byte handling processing will live in ByteString and all the special character encoding will live in AsciiString.
Results:
The AsciiString interface will be clarified. Users of AsciiString can now be clear of the limitations the class imposes while users of the ByteString class don't have to live with those limitations.
Motivation:
The DefaultHttp2ConnectionDecoder class is calling verifyPrefaceReceived() for almost every frame event at all times.
The Http2ConnectionHandler class is calling readClientPrefaceString() on every decode event.
Modifications:
- DefaultHttp2ConnectionDecoder should not have to continuously call verifyPrefaceReceived() because it transitions boolean state 1 time for each connection.
- Http2ConnectionHandler should not have to continuously call readClientPrefaceString() because it transitions boolean state 1 time for each connection.
Result:
- Less conditional checks for the mainstream usage of the connection.
Motivation:
At the moment we have two problems:
- CompositeByteBuf.addComponent(...) will not add the supplied buffer to the CompositeByteBuf if its empty, which means it will not be released on CompositeByteBuf.release() call. This is a problem as a user will expect everything added will be released (the user not know we not added it).
- CompositeByteBuf.addComponents(...) will either add no buffers if none is readable and so has the same problem as addComponent(...) or directly release the ByteBuf if at least one ByteBuf is readable. Again this gives inconsistent handling and may lead to memory leaks.
Modifications:
- Always add the buffer to the CompositeByteBuf and so release it on release call.
Result:
Consistent handling and no buffer leaks.
- Ensure an EmptyByteBuf has an array, an NIO buffer, and a memory
address at the same time
- Add an assertion that checks if EMPTY_BUFFER is an EmptyByteBuf,
just in case we make a mistake in the future
Motivation:
We expose no methods in ByteBuf to directly write a CharSequence into it. This leads to have the user either convert the CharSequence first to a byte array or use CharsetEncoder. Both cases have some overheads and we can do a lot better for well known Charsets like UTF-8 and ASCII.
Modifications:
Add ByteBufUtil.writeAscii(...) and ByteBufUtil.writeUtf8(...) which can do the task in an optimized way. This is especially true if the passed in ByteBuf extends AbstractByteBuf which is true for all of our implementations which not wrap another ByteBuf.
Result:
Writing an ASCII and UTF-8 CharSequence into a AbstractByteBuf is a lot faster then what the user could do by himself as we can make use of some package private methods and so eliminate reference and range checks. When the Charseq is not ASCII or UTF-8 we can still do a very good job and are on par in most of the cases with what the user would do.
The following benchmark shows the improvements:
Result: 2456866.966 ?(99.9%) 59066.370 ops/s [Average]
Statistics: (min, avg, max) = (2297025.189, 2456866.966, 2586003.225), stdev = 78851.914
Confidence interval (99.9%): [2397800.596, 2515933.336]
Benchmark Mode Samples Score Score error Units
i.n.m.b.ByteBufUtilBenchmark.writeAscii thrpt 50 9398165.238 131503.098 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiString thrpt 50 9695177.968 176684.821 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiStringViaArray thrpt 50 4788597.415 83181.549 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiStringViaArrayWrapped thrpt 50 4722297.435 98984.491 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiStringWrapped thrpt 50 4028689.762 66192.505 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiViaArray thrpt 50 3234841.565 91308.009 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiViaArrayWrapped thrpt 50 3311387.474 39018.933 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiWrapped thrpt 50 3379764.250 66735.415 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8 thrpt 50 5671116.821 101760.081 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8String thrpt 50 5682733.440 111874.084 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8StringViaArray thrpt 50 3564548.995 55709.512 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8StringViaArrayWrapped thrpt 50 3621053.671 47632.820 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8StringWrapped thrpt 50 2634029.071 52304.876 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8ViaArray thrpt 50 3397049.332 57784.119 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8ViaArrayWrapped thrpt 50 3318685.262 35869.562 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8Wrapped thrpt 50 2473791.249 46423.114 ops/s
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1,387.417 sec - in io.netty.microbench.buffer.ByteBufUtilBenchmark
Results :
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
Results :
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
The *ViaArray* benchmarks are basically doing a toString().getBytes(Charset) which the others are using ByteBufUtil.write*(...).
Motivation:
CompositeByteBuf.nioBuffers(...) returns an empty ByteBuffer array if the specified length is 0. This is not consistent with other ByteBuf implementations which return an ByteBuffer array of size 1 with an empty ByteBuffer included.
Modifications:
Make CompositeByteBuf.nioBuffers(...) consistent with other ByteBuf implementations.
Result:
Consistent and correct behaviour of nioBufffers(...)
Motivation:
When calling slice(...) on a ByteBuf the returned ByteBuf should be the slice of a ByteBuf and shares it's reference count. This is important as it is perfect legal to use buf.slice(...).release() and have both, the slice and the original ByteBuf released. At the moment this is only the case if the requested slice size is > 0. This makes the behavior inconsistent and so may lead to a memory leak.
Modifications:
- Never return Unpooled.EMPTY_BUFFER when calling slice(...).
- Adding test case for buffer.slice(...).release() and buffer.duplicate(...).release()
Result:
Consistent behaviour and so no more leaks possible.
Motivation:
Before we missed to check if a buffer was released before we return the backing byte array or memoryaddress. This could lead to JVM crashes when someone tried various bulk operations on the Unsafe*ByteBuf implementations.
Modifications:
Always check if the buffer is released before all to return the byte array and memoryaddress.
Result:
No more JVM crashes because of released buffers when doing bulk operations on Unsafe*ByteBuf implementations.
Motivation:
We received a bug-report that the ByteBuf.refCnt() does sometimes not show the correct value when release() and refCnt() is called from different Threads.
Modifications:
Add test-case which shows that all is working like expected
Result:
Test-case added which shows everything is ok.
Motivation:
I introduced ensureAccessible() class as part of 6c47cc9711 in some places. Unfortunally I also added some where these are not needed and so caused a performance regression.
Modification:
Remove calls where not needed.
Result:
Fixed performance regression.
Motivation:
AbstractByteBufTest.testInternalBuffer() uses writeByte() operations to
populate the sample data. Usually, this isn't a problem, but it starts
to take a lot of time when the resource leak detection level gets
higher.
In our CI machine, testInternalBuffer() takes more than 30 minutes,
causing the build timeout when the 'leak' profile is active (paranoid
level resource detection.)
Modification:
Populate the sample data using ThreadLocalRandom.nextBytes() instead of
using millions of writeByte() operations.
Result:
Test runs much faster when leak detection level is high.
Motivation:
Because of how we use reference counting we need to check for the reference count before each operation that touches the underlying memory. This is especially true as we use sun.misc.Cleaner.clean() to release the memory ASAP when possible. Because of this the user may cause a SEGFAULT if an operation is called that tries to access the backing memory after it was released.
Modification:
Correctly check the reference count on all methods that access the underlying memory or expose it via a ByteBuffer.
Result:
Safer usage of ByteBuf
Motivation:
Persuit for the consistency in method naming
Modifications:
- Remove the 'get' prefix from all HTTP/SPDY message classes
- Fix some inspector warnings
Result:
Consistency
Motivation:
To improve the speed of ByteBuf with order LITTLE_ENDIAN and where the native order is also LITTLE_ENDIAN (intel) we introduces a new special SwappedByteBuf before in commit 4ad3984c8b. Unfortunally the commit has a flaw which does not handle correctly the case when a ByteBuf expands. This was caused because the memoryAddress was cached and never changed again even if the underlying buffer expanded. This can lead to corrupt data or even to SEGFAULT the JVM if you are lucky enough.
Modification:
Always lookup the actual memoryAddress of the wrapped ByteBuf.
Result:
No more data-corruption for ByteBuf with order LITTLE_ENDIAN and no JVM crashes.
Motivation:
PoolArena's 'normalizeCapacity' function was micro-optimized some
time ago to remove a while loop. However, there was a change of
behavior in the function as a result. Capacities passed into it
that are already powers of 2 (and >= 512) are doubled in size. So
if I ask for a buffer with a capacity of 1024, I will get back one
that actually uses 2048 bytes (stored in maxLength).
Aligning to powers of two for book keeping ease is reasonable,
and if someone tries to expand a buffer, you might as well use some
of the previously wasted space. However, since this distinction
between 'easily expanded' and 'costly to expand' space is not
supported at all by the APIs, I cannot imagine this change to
doubling is desirable or intentional.
This is especially costly when using composite buffers. They
frequently allocate components with a capacity that is a power of
2, and they never attempt to expand components themselves. The end
result is that heavy use of pool-backed composite buffers wastes
almost half of the memory pool (the smaller / initial components are
<512 and so are not affected by the off-by-one bug).
Modifications:
Although I find it difficult to believe that such an optimization
is really helpful, I left it in and fixed the off-by-one issue by
decrementing the value at the start.
I also added a simple test to both attempt to verify that the
decrement fixes the issue without introducing any other change, and
to make it easy for a reviewer to test the existing behavior. PoolArena
does not seem to have much testing or testability support though so
the test is kind of a hack and will break for unrelated changes. I
suggest either removing it or factoring out the single non-static
portion of normalizeCapacity so that the fragile dummy PoolArena is
not required.
Result:
Pooled allocators will allocate less resources to the highly
inefficient and undocumented buffer section between length and
maxLength.
Composite buffers of non-trivial size that are backed by pooled
allocators will use about half as much memory.