Motivation:
Each call to SSL_write may introduce about ~100 bytes of overhead. The OpenSslEngine (based upon OpenSSL) is not able to do gathering writes so this means each wrap operation will incur the ~100 byte overhead. This commit attempts to increase goodput by aggregating the plaintext in chunks of <a href="https://tools.ietf.org/html/rfc5246#section-6.2">2^14</a>. If many small chunks are written this can increase goodput, decrease the amount of calls to SSL_write, and decrease overall encryption operations.
Modifications:
- Introduce SslHandlerCoalescingBufferQueue in SslHandler which will aggregate up to 2^14 chunks of plaintext by default
- Introduce SslHandler#setWrapDataSize to control how much data should be aggregated for each write. Aggregation can be disabled by setting this value to <= 0.
Result:
Better goodput when using SslHandler and the OpenSslEngine.
Motivation:
1. Some encoders used a `ByteBuf#writeBytes` to write short constant byte array (2-3 bytes). This can be replaced with more faster `ByteBuf#writeShort` or `ByteBuf#writeMedium` which do not access the memory.
2. Two chained calls of the `ByteBuf#setByte` with constants can be replaced with one `ByteBuf#setShort` to reduce index checks.
3. The signature of method `HttpHeadersEncoder#encoderHeader` has an unnecessary `throws`.
Modifications:
1. Use `ByteBuf#writeShort` or `ByteBuf#writeMedium` instead of `ByteBuf#writeBytes` for the constants.
2. Use `ByteBuf#setShort` instead of chained call of the `ByteBuf#setByte` with constants.
3. Remove an unnecessary `throws` from `HttpHeadersEncoder#encoderHeader`.
Result:
A bit faster writes constants into buffers.
Motivation:
We should also use realloc when shrink the buffer to eliminate extra allocations / memory copies when possible.
Modifications:
Use realloc for expanding and shrinking when possible.
Result:
Less memory copies and allocations
Motivation:
Methods `ByteBufUtil#writeUtf8` and `ByteBufUtil#writeAscii` contains a check `ByteBuf#ensureWritable` before the calling `ByteBuf#writeBytes`. But the `ByteBuf#writeBytes` also do a such check inside.
Modifications:
Make checks more targeted.
Result:
Less redundant method calls.
Motivation:
1. `ByteBuf` contains methods to writing `CharSequence` which optimized for UTF-8 and ASCII encodings. We can also apply optimization for ISO-8859-1.
2. In many places appropriate methods are not used.
Modifications:
1. Apply optimization for ISO-8859-1 encoding in the `ByteBuf#setCharSequence` realizations.
2. Apply appropriate methods for writing `CharSequences` into buffers.
Result:
Reduce overhead from string-to-bytes conversion.
Motivation:
PR #6811 introduced a public utility methods to decode hex dump and its parts, but they are not visible from netty-common.
Modifications:
1. Move the `decodeHexByte`, `decodeHexDump` and `decodeHexNibble` methods into `StringUtils`.
2. Apply these methods where applicable.
3. Remove similar methods from other locations (e.g. `HpackHex` test class).
Result:
Less code duplication.
Motivation:
We should allow to access the memoryAddress of the wrapped ByteBuf when using ReadOnlyByteBuf for peformance reasons. If a user act on a memoryAddress its his responsible anyway to do nothing "stupid".
Modifications:
Delegate to wrapped ByteBuf.
Result:
Less performance overhead for various operations and also when writing to a native transport (which needs the memoryAddress).
Motivations:
1. There are duplicated implementations of decoding hex strings. #6797
2. ByteBufUtil.HexUtil.decodeHexDump does not handle substring start
index properly and does not decode hex byte rigorously.
Modifications:
1. Function decodeHexByte is moved from QueryStringDecoder into ByteBufUtil.
2. ByteBufUtil.HexUtil.decodeHexDump is changed to use decodeHexByte.
3. Tests are Updated accordingly.
Result:
Fixed#6797 and made hex decoding functions more robust.
Motivation:
ByteBufUtil provides a hexDump method. For debugging purposes it is often useful to decode that hex dump to get the original content, but no such method exists.
Modifications:
- Add ByteBufUtil#decodeHexDump
Result:
ByteBufUtil#decodeHexDump is available to make debugging easier.
Motivation:
The javadocs for ByteBuf#ensureWritable(int, boolean) indicate that it should not throw, and instead the return code should indicate the result of the operation. Due to a bug in AbstractByteBuf it is possible for a resize to be attempted on a buffer that may exceed maxCapacity() and therefore throw.
Modifications:
- If there is not enough space in the buffer, and force is false, then a resize should not be attempted
Result:
AbstractByteBuf#ensureWritable(int, boolean) enforces the javadoc constraints and does not throw.
Motivation:
We not correctly released all buffers in the UnpooledTest and so showed "bad" way of handling buffers to people that inspect our code to understand when a buffer needs to be released.
Modifications:
Explicit release all buffers.
Result:
Cleaner and more correct code.
Motivation:
In cases when an application is running in a container or is otherwise
constrained to the number of processors that it is using, the JVM
invocation Runtime#availableProcessors will not return the constrained
value but rather the number of processors available to the virtual
machine. Netty uses this number in sizing various resources.
Additionally, some applications will constrain the number of threads
that they are using independenly of the number of processors available
on the system. Thus, applications should have a way to globally
configure the number of processors.
Modifications:
Rather than invoking Runtime#availableProcessors, Netty should rely on a
method that enables configuration when the JVM is started or by the
application. This commit exposes a new class NettyRuntime for enabling
such configuraiton. This value can only be set once. Its default value
is Runtime#availableProcessors so that there is no visible change to
existing applications, but enables configuring either a system property
or configuring during application startup (e.g., based on settings used
to configure the application).
Additionally, we introduce the usage of forbidden-apis to prevent future
uses of Runtime#availableProcessors from creeping. Future work should
enable the bundled signatures and clean up uses of deprecated and
other forbidden methods.
Result:
Netty can be configured to not use the underlying number of processors,
but rather the constrained number of processors.
Motivation:
Unsafe.invokeCleaner(...) checks if the passed in ByteBuffer is a slice or duplicate and if so throws an IllegalArgumentException on Java9. We need to ensure we never try to free a ByteBuffer that was provided by the user directly as we not know if its a slice / duplicate or not.
Modifications:
Never try to free a ByteBuffer that was passed into UnpooledUnsafeDirectByteBuf constructor by an user (via Unpooled.wrappedBuffer(....)).
Result:
Build passes again on Java9
Motivation:
Java9 added a new method to Unsafe which allows to allocate a byte[] without memset it. This can have a massive impact in allocation times when the byte[] is big. This change allows to enable this when using Java9 with the io.netty.tryAllocateUninitializedArray property when running Java9+. Please note that you will need to open up the jdk.internal.misc package via '--add-opens java.base/jdk.internal.misc=ALL-UNNAMED' as well.
Modifications:
Allow to allocate byte[] without memset on Java9+
Result:
Better performance when allocate big heap buffers and using java9.
Motivation:
UnreleasableByteBuf operations are designed to not modify the reference count of the underlying buffer. The Retained[Duplicate|Slice] operations violate this assumption and can cause the underlying buffer's reference count to be increased, but never allow for it to be decreased. This may lead to memory leaks.
Modifications:
- UnreleasableByteBuf's Retained[Duplicate|Slice] should leave the reference count of the parent buffer unchanged after the operation completes.
Result:
No more memory leaks due to usage of the Retained[Duplicate|Slice] on an UnreleasableByteBuf object.
Motiviation:
UnsafeByteBufUtil has some bugs related to using an incorrect index, and also omitting the array paramter when dealing with byte[] objects. There is also some simplification possible with respect to type casting, and minor formatting consistentcy issues.
Modifications:
- Ensure indexing is correct when dealing with native memory
- Fix the native access and endianness for the medium/unsigned medium methods
- Ensure array is used when dealing with heap memory
- Remove unecessary casts when using long
- Fix formating and alignment
Result:
UnsafeByteBufUtil is more correct and won't access direct memory when heap arrays are used.
Motivation:
The contract of `ByteBuf.writeBytes(ByteBuf src)` is such that it will
throw an `IndexOutOfBoundsException if `src.readableBytes()` is greater than
`this.writableBytes()`. The EmptyByteBuf class will throw the exception,
even if the source buffer has zero readable bytes, in violation of the
contract.
Modifications:
Use the helper method `checkLength(..)` to check the length and throw
the exception, if appropriate.
Result:
Conformance with the stated behavior of ByteBuf.
Motivation:
PR [#6460] added a way to access the used memory of an allocator. The used naming was not very good and how things were exposed are not consistent.
Modifications:
- Add a new ByteBufAllocatorMetric and ByteBufAllocatorMetricProvider interface
- Let the ByteBufAllocator implementations implement ByteBufAllocatorMetricProvider
- Move exposed stats / metric from PooledByteBufAllocator to PooledByteBufAllocatorMetric and mark old methods as `@Deprecated`.
Result:
More consistent way to expose metric / stats for ByteBufAllocator
Motivation:
There are numerous usages of internalNioBuffer which hard code 0 for the index when the intention was to use the readerIndex().
Modifications:
- Remove hard coded 0 for the index and use readerIndex()
Result:
We are less susceptible to using the wrong index, and don't make assumptions about the ByteBufAllocator.
Motivation:
Often its useful for the user to be able to get some stats about the memory allocated via an allocator.
Modifications:
- Allow to obtain the used heap and direct memory for an allocator
- Add test case
Result:
Fixes [#6341]
Motivation:
As we may access the metrics exposed of PooledByteBufAllocator from another thread then the allocations happen we need to ensure we synchronize on the PoolArena to ensure correct visibility.
Modifications:
Synchronize on the PoolArena to ensure correct visibility.
Result:
Fix multi-thread issues on the metrics
Motivation:
Commit 8dda984afe introduced a regression which lead to the situation that the allocator is not set when PooledByteBuf.initUnpooled(...) is called. Thus it was possible that PooledByteBuf.alloc() returns null or the wrong allocator if multiple PooledByteBufAllocator are used in an application.
Modifications:
- Correctly set the allocator
- Add test-case
Result:
Fixes [#6436].
Motivation:
We have our own ThreadLocalRandom implementation to support older JDKs . That said we should prefer the JDK provided when running on JDK >= 7
Modification:
Using ThreadLocalRandom implementation of the JDK when possible.
Result:
Make use of JDK implementations when possible.
Motivation:
Java9 does not allow changing access level via reflection by default. This lead to the situation that netty disabled Unsafe completely as ByteBuffer.address could not be read.
Modification:
Use Unsafe to read the address field as this works on all Java versions.
Result:
Again be able to use Unsafe optimisations when using Netty with Java9
Motivation:
When sun.misc.Unsafe is present we want to use *Unsafe*ByteBuf implementations. We missed to do so in PooledByteBufAllocator when the heapArena is null.
Modifications:
- Correctly use UnpooledUnsafeHeapByteBuf
- Add unit tests
Result:
Use most optimal ByteBuf implementation.
Motivation:
We can eliminate unnessary wrapping when call ByteBuf.asReadOnly() in some cases to reduce indirection.
Modifications:
- Check if asReadOnly() needs to create a new instance or not
- Add test cases
Result:
Less object creation / wrapping.
Motivation:
We need to ensure we pass all tests when sun.misc.Unsafe is not present.
Modifications:
- Make *ByteBufAllocatorTest work whenever sun.misc.Unsafe is present or not
- Let Lz4FrameEncoderTest not depend on AbstractByteBufAllocator implementation details which take into account if sun.misc.Unsafe is present or not
Result:
Tests pass even without sun.misc.Unsafe.
Motivation:
We should only try to calculate the direct memory offset when sun.misc.Unsafe is present as otherwise it will fail with an NPE as PlatformDependent.directBufferAddress(...) will throw it.
This problem was introduced by 66b9be3a46.
Modifications:
Use offset of 0 if no sun.misc.Unsafe is present.
Result:
PooledByteBufAllocator also works again when no sun.misc.Unsafe is present.
Motivation:
ReadOnlyByteBufTest contains two tests which are missing the `@Test` annotation and so will never run.
Modifications:
Add missing annotation.
Result:
Tests run as expected.
Motivation:
We used various mocking frameworks. We should only use one...
Modifications:
Make usage of mocking framework consistent by only using Mockito.
Result:
Less dependencies and more consistent mocking usage.
Motivation:
64-byte alignment is recommended by the Intel performance guide (https://software.intel.com/en-us/articles/practical-intel-avx-optimization-on-2nd-generation-intel-core-processors) for data-structures over 64 bytes.
Requiring padding to a multiple of 64 bytes allows for using SIMD instructions consistently in loops without additional conditional checks. This should allow for simpler and more efficient code.
Modification:
At the moment cache alignment must be setup manually. But probably it might be taken from the system. The original code was introduced by @normanmaurer https://github.com/netty/netty/pull/4726/files
Result:
Buffer alignment works better than miss-align cache.
Motivation:
We not had tests for ByteBufAllocator implementations in general.
Modifications:
Added ByteBufAllocatorTest, AbstractByteBufAllocatorTest and UnpooledByteBufAllocatorTest
Result:
More tests for allocator implementations.