Motivation
It would be useful to be able to write UTF-8 encoded subsequence of
CharSequence characters to a ByteBuf without needing to create a
temporary object via CharSequence#subSequence().
Modification
Add overloads of ByteBufUtil writeUtf8, reserveAndWriteUtf8 and
utf8Bytes methods which take explicit subsequence bounds.
Result
More efficient writing of substrings to byte buffers possible
Motivation
AbstractReferenceCounted and AbstractReferenceCountedByteBuf contain
duplicate logic for managing the volatile refcount in an optimized and
consistent manner, which increased in complexity in #8583. It's possible
to extract this into a common helper class now that all access is via an
AtomicIntegerFieldUpdater.
Modifications
- Move duplicate logic into a shared ReferenceCountUpdater class
- Incorporate some additional simplification for the most common single
increment/decrement cases (fewer checks/operations)
Result
Less code duplication, better encapsulation of the "non-trivial"
internal volatile refcount manipulation
Motivation:
We have a utility method to check for > 0 and >0 arguments. We should use it.
Modification:
use checkPositive/checkPositiveOrZero instead of if statement.
Result:
Re-use utility method.
Motivation:
ByteBuf is used everywhere so we should try hard to be able to make things inlinable. During benchmarks it showed that writeCharSequence(...) fails to inline writeUtf8 because it is too big even if its hots.
Modifications:
Move less common code-path to extra method to allow inlining.
Result:
Be able to inline writeUtf8 in most cases.
Motivation:
Two similar bugs were introduced by myself in separate recent PRs #8393
and #8464, while optimizing the assignment/handling of temporary arrays
in ByteBufUtil and UnsafeByteBufUtil.
The temp arrays allocated for buffering data written to an OutputStream
are incorrectly sized to the full length of the data to copy rather than
being capped at WRITE_CHUNK_SIZE.
Unfortunately one of these is in the 4.1.31.Final release, I'm really
sorry and will be more careful in future.
This kind of thing is tricky to cover in unit tests.
Modifications:
Revert the temp array allocations back to their original sizes.
Avoid making duplicate calls to ByteBuf.capacity() in a couple of places
in ByteBufUtil (unrelated thing I noticed, can remove it from this PR if
desired!)
Result:
Temporary byte arrays will be reverted to their originally intended
sizes.
Motivation:
#8388 introduced a reusable ThreadLocal<byte[]> for use in
decodeString(...). It can be used in more places in the buffer package
to avoid temporary allocations of small arrays.
Modifications:
Encapsulate use of the ThreadLocal in a static package-private
ByteBufUtil.threadLocalTempArray(int) method, and make use of it from a
handful of new places including ByteBufUtil.readBytes(...).
Result:
Fewer short-lived small byte array allocations.
Motivation:
There are currently many more places where this could be used which were
possibly not considered when the method was added.
If https://github.com/netty/netty/pull/8388 is included in its current
form, a number of these places could additionally make use of the same
BYTE_ARRAYS threadlocal.
There's also a couple of adjacent places where an optimistically-pooled
heap buffer is used for temp byte storage which could use the
threadlocal too in preference to allocating a temp heap bytebuf wrapper.
For example
https://github.com/netty/netty/blob/4.1/buffer/src/main/java/io/netty/buffer/ByteBufUtil.java#L1417.
Modifications:
Replace new byte[] with PlatformDependent.allocateUninitializedArray()
where appropriate; make use of ByteBufUtil.getBytes() in some places
which currently perform the equivalent logic, including avoiding copy of
backing array if possible (although would be rare).
Result:
Further potential speed-up with java9+ and appropriate compile flags.
Many of these places could be on latency-sensitive code paths.
Motivation:
We need to add special handling for WrappedCompositeByteBuf as these also extend AbstractByteBuf, otherwise we will not correctly adjust / read the writerIndex during processing.
Modifications:
- Add instanceof checks for WrappedCompositeByteBuf as well.
- Add testcases
Result:
Fixes https://github.com/netty/netty/issues/8152.
Motivation:
The `AbstractByteBuf#equals` method doesn't take into account the
class of buffer instance. So the two buffers with different classes
must have the same `hashCode` values if `equals` method returns `true`.
But `EmptyByteBuf#hashCode` is not consistent with `#hashCode`
of the empty `AbstractByteBuf`, that is violates the contract and
can lead to errors.
Modifications:
Return `1` in `EmptyByteBuf#hashCode`.
Result:
Consistent behavior of `EmptyByteBuf#hashCode` and `AbstractByteBuf#hashCode`.
Motivation:
Currently copying a direct ByteBuf copies it fully into the heap before writing it to an output stream.
The can result in huge memory usage on the heap.
Modification:
copy the bytebuf contents via an 8k buffer into the output stream
Result:
Fixes#7804
Motivation:
To avoid eager allocation of the destination and to perform length prefixed encoding of UTF-8 string with forward only access pattern
Modifications:
The original writeUtf8 is modified by allowing customization of the reserved bytes on the destination buffer and is introduced an exact UTF-8 length estimator.
Result:
Is now possible to perform length first encoding with UTF-8 well-formed char sequences following a forward only write access pattern on the destination buffer.
Motivation:
ByteBufUtil by default will cache DirectByteBuffer objects, and the
associated direct memory (up to 64k). In combination with the Recycler which may
cache up to 32k elements per thread may lead to a large amount of direct
memory being retained per EventLoop thread. As traffic spikes come this
may be perceived as a memory leak because the memory in the Recycler
will never be reclaimed.
Modifications:
- By default we shouldn't cache DirectByteBuffer objects.
Result:
Less direct memory consumption due to caching DirectByteBuffer objects.
Motivation:
ByteBufUtil.isText(...) may produce unexpected results if called concurrently on the same ByteBuffer.
Modifications:
- Don't use internalNioBuffer where it is not safe.
- Add unit test.
Result:
ByteBufUtil.isText is thread-safe.
Motivation:
Calling ByteBuf.toString(Charset) on the same buffer from multiple threads at the same time produces unexpected results, such as various exceptions and/or corrupted output. This is because ByteBufUtil.decodeString(...) is taking the source ByteBuffer for CharsetDecoder.decode() from ByteBuf.internalNioBuffer(int, int), which is not thread-safe.
Modification:
Call ByteBuf.nioBuffer() instead of ByteBuf.internalNioBuffer() to get the source buffer to pass to CharsetDecoder.decode().
Result:
Fixes the possible race condition.
Motivation:
Javadoc of the `ByteBufUtil#copy(AsciiString, int, ByteBuf, int, int)` is incorrect.
Modifications:
Fix it.
Result:
The description of the `#copy` method is not misleading.
Motivation:
Missing return in ByteBufUtil#writeAscii causes endless loop
Modifications:
Add return after write finished
Result:
ByteBufUtil#writeAscii is ok
Motivation:
AbstractByteBuf.ensureWritable(...) should check if buffer was released and if so throw an IllegalReferenceCountException
Modifications:
Ensure we throw in all cases.
Result:
More consistent and correct behaviour
Motivation:
Each call to SSL_write may introduce about ~100 bytes of overhead. The OpenSslEngine (based upon OpenSSL) is not able to do gathering writes so this means each wrap operation will incur the ~100 byte overhead. This commit attempts to increase goodput by aggregating the plaintext in chunks of <a href="https://tools.ietf.org/html/rfc5246#section-6.2">2^14</a>. If many small chunks are written this can increase goodput, decrease the amount of calls to SSL_write, and decrease overall encryption operations.
Modifications:
- Introduce SslHandlerCoalescingBufferQueue in SslHandler which will aggregate up to 2^14 chunks of plaintext by default
- Introduce SslHandler#setWrapDataSize to control how much data should be aggregated for each write. Aggregation can be disabled by setting this value to <= 0.
Result:
Better goodput when using SslHandler and the OpenSslEngine.
Motivation:
1. Some encoders used a `ByteBuf#writeBytes` to write short constant byte array (2-3 bytes). This can be replaced with more faster `ByteBuf#writeShort` or `ByteBuf#writeMedium` which do not access the memory.
2. Two chained calls of the `ByteBuf#setByte` with constants can be replaced with one `ByteBuf#setShort` to reduce index checks.
3. The signature of method `HttpHeadersEncoder#encoderHeader` has an unnecessary `throws`.
Modifications:
1. Use `ByteBuf#writeShort` or `ByteBuf#writeMedium` instead of `ByteBuf#writeBytes` for the constants.
2. Use `ByteBuf#setShort` instead of chained call of the `ByteBuf#setByte` with constants.
3. Remove an unnecessary `throws` from `HttpHeadersEncoder#encoderHeader`.
Result:
A bit faster writes constants into buffers.
Motivation:
Methods `ByteBufUtil#writeUtf8` and `ByteBufUtil#writeAscii` contains a check `ByteBuf#ensureWritable` before the calling `ByteBuf#writeBytes`. But the `ByteBuf#writeBytes` also do a such check inside.
Modifications:
Make checks more targeted.
Result:
Less redundant method calls.
Motivation:
1. `ByteBuf` contains methods to writing `CharSequence` which optimized for UTF-8 and ASCII encodings. We can also apply optimization for ISO-8859-1.
2. In many places appropriate methods are not used.
Modifications:
1. Apply optimization for ISO-8859-1 encoding in the `ByteBuf#setCharSequence` realizations.
2. Apply appropriate methods for writing `CharSequences` into buffers.
Result:
Reduce overhead from string-to-bytes conversion.
Motivation:
PR #6811 introduced a public utility methods to decode hex dump and its parts, but they are not visible from netty-common.
Modifications:
1. Move the `decodeHexByte`, `decodeHexDump` and `decodeHexNibble` methods into `StringUtils`.
2. Apply these methods where applicable.
3. Remove similar methods from other locations (e.g. `HpackHex` test class).
Result:
Less code duplication.
Motivations:
1. There are duplicated implementations of decoding hex strings. #6797
2. ByteBufUtil.HexUtil.decodeHexDump does not handle substring start
index properly and does not decode hex byte rigorously.
Modifications:
1. Function decodeHexByte is moved from QueryStringDecoder into ByteBufUtil.
2. ByteBufUtil.HexUtil.decodeHexDump is changed to use decodeHexByte.
3. Tests are Updated accordingly.
Result:
Fixed#6797 and made hex decoding functions more robust.
Motivation:
ByteBufUtil provides a hexDump method. For debugging purposes it is often useful to decode that hex dump to get the original content, but no such method exists.
Modifications:
- Add ByteBufUtil#decodeHexDump
Result:
ByteBufUtil#decodeHexDump is available to make debugging easier.
Motivation:
There are numerous usages of internalNioBuffer which hard code 0 for the index when the intention was to use the readerIndex().
Modifications:
- Remove hard coded 0 for the index and use readerIndex()
Result:
We are less susceptible to using the wrong index, and don't make assumptions about the ByteBufAllocator.
Motivation:
When An HTTP server is listening in plaintext mode, it doesn't have
a chance to negotiate "h2" in the tls handshake. HTTP 1 clients
that are not expecting an HTTP2 server will accidentally a request
that isn't an upgrade, which the HTTP/2 decoder will not
understand. The decoder treats the bytes as hex and adds them to
the error message.
These error messages are hard to understand by humans, and result
in extra, manual work to decode.
Modification:
If the first bytes of the request are not the preface, the decoder
will now see if they are an HTTP/1 request first. If so, the error
message will include the method and path of the original request in
the error message.
In case the path is long, the decoder will check up to the first
1024 bytes to see if it matches. This could be a DoS vector if
tons of bad requests or other garbage come in. A future optimization
would be to treat the first few bytes as an AsciiString and not do
any Charset decoding. ByteBuf.toCharSequence alludes to such an
optimization.
The code has been left simple for the time being.
Result:
Faster identification of errant HTTP requests.
Motivation:
ByteBufUtil.compare uses long arithmetic but doesn't check for underflow on when converting from long to int to satisfy the Comparable interface. This will result in incorrect comparisons and violate the Comparable interface contract.
Modifications:
- ByteBufUtil.compare should protect against int underflow
Result:
Fixes https://github.com/netty/netty/issues/6169
Motivation:
Result of ByteBufUtil.compare(ByteBuf a, ByteBuf b) is dependent on ByteOrder of supplied ByteBufs which should not be the case (as stated in the javadocs).
Modifications:
Ensure we get a consistent behavior when calling ByteBufUtil.compare(ByteBuf a, ByteBuf b) and not depend on ByteOrder.
Result:
ByteBufUtil.compare(ByteBuf a, ByteBuf b) and so AbstractByteBuf.compare(...) works correctly as stated in the javadocs.
Motivation:
See #82.
Modifications:
- Added `isText` to validate if the given ByteBuf is compliant with the specified charset.
- Optimized for UTF-8 and ASCII. For other cases, `CharsetDecoder.decoder` is used.
Result:
Users can validate ByteBuf with given charset.
Motivation:
Recycler.recycle(...) should not be used anymore and be replaced by Handle.recycle().
Modifications:
Mark it as deprecated and update usage.
Result:
Correctly document deprecated api.
Motivation:
Often users either need to read or write CharSequences to a ByteBuf. We should add methods for this to ByteBuf as we can do some optimizations for this depending on the implementation.
Modifications:
Add setCharSequence, writeCharSequence, getCharSequence and readCharSequence
Result:
Easier reading / writing of CharSequence with ByteBuf.
Motivation:
See #1811
Modifications:
Add LineEncoder and LineSeparator
Result:
The user can use LineEncoder to write a String with a line separator automatically
Motivation:
See #3321
Modifications:
1. Add CharsetUtil.encoder/decoder() methods
2. Deprecate CharsetUtil.getEncoder/getDecoder() methods
Result:
Users can use new CharsetUtil.encoder/decoder() to specify error actions
Motivation:
Utility methods in ByteBufUtil to writeUtf8 and writeAscii expect a buffer to already be allocated. If the user does not have a buffer allocated they have to know details of the encoding in order to know the size of the buffer to allocate.
Modifications:
- Add writeUtf8 and writeAscii which take a ByteBufAllocator and allocate a ByteBuf of the correct size for the user
Result:
ByteBufUtil methods which are easier to use if the user doesn't already have a ByteBuf.
Motivation:
f750d6e36c added support for surrogates in the writeUtf8 conversion. However exceptions are thrown if invalid input is detected, but the JDK (and slow path of writeUtf8) uses a replacement character and does not throw. We should behave the same way.
Modificiations:
- Don't throw in ByteBufUtil.writeUtf8, and instead use a replacement character consistent with the JDK
Result:
ByteBufUtil.writeUtf8 behavior is consistent with the JDK UTF_8 conversion.
Motivation:
There are some wrong links and tags in javadoc.
Modifications:
Fix the wrong links and tags in javadoc.
Result:
These links will work correctly in javadoc.
Motivation:
UTF-16 can not represent the full range of Unicode characters, and thus has the concept of Surrogate Pair (http://unicode.org/glossary/#surrogate_pair) where 2 16-bit code units can be used to represent the missing characters. ByteBufUtil.writeUtf8 is currently does not support this and is thus incomplete.
Modifications:
- Add support for surrogate pairs in ByteBufUtil.writeUtf8
Result:
ByteBufUtil.writeUtf8 now supports surrogate pairs and is correctly converting to UTF-8.
Motivation:
Initialisation of the ByteBufUtil class, a class frequently used is
delayed because a significant number of String operations is performed to
fill a HEXDUMP_ROWPREFIXES array. This array also sticks to the Strings
forever.
It is quite likely that applications never use the hexdump facility.
Modification:
Moved the static initialisation and references to a static inner class.
This delays initialisation (and memory usage) until actually needed.
The API is kept as is.
Result:
Faster startup time, less memory usage for most netty using applications.
Motivation:
The HTTP/2 RFC (https://tools.ietf.org/html/rfc7540#section-8.1.2) indicates that header names consist of ASCII characters. We currently use ByteString to represent HTTP/2 header names. The HTTP/2 RFC (https://tools.ietf.org/html/rfc7540#section-10.3) also eludes to header values inheriting the same validity characteristics as HTTP/1.x. Using AsciiString for the value type of HTTP/2 headers would allow for re-use of predefined HTTP/1.x values, and make comparisons more intuitive. The Headers<T> interface could also be expanded to allow for easier use of header types which do not have the same Key and Value type.
Motivation:
- Change Headers<T> to Headers<K, V>
- Change Http2Headers<ByteString> to Http2Headers<CharSequence, CharSequence>
- Remove ByteString. Having AsciiString extend ByteString complicates equality comparisons when the hash code algorithm is no longer shared.
Result:
Http2Header types are more representative of the HTTP/2 RFC, and relationship between HTTP/2 header name/values more directly relates to HTTP/1.x header names/values.
Motivation:
Calling AbstractByteBuf.toString(..., Charset) is used quite frequently by users but produce a lot of GC.
Modification:
- Use a FastThreadLocal to store the CharBuffer that are needed for decoding.
- Use internalNioBuffer(...) when possible
Result:
Less object creation / Less GC
Motivation:
ByteBufUtil.writeUtf8(...) / writeUsAscii(...) can use a fast-path when writing into AbstractByteBuf. We should try to unwrap WrappedByteBuf implementations so
we are able to do the same on wrapped AbstractByteBuf instances.
Modifications:
- Try to unwrap WrappedByteBuf to use the fast-path
Result:
Faster writing of utf8 and usascii for WrappedByteBuf instances.
Motivation:
When AsciiString is used we can optimize the write operation done by ByteBufUtil.writeUsAscii(...)
Modifications:
Sepcial handle AsciiString.
Result:
Faster writing of AsciiString.
Motivation:
Dumping the content of a ByteBuf in a hex format is very useful.
Modifications:
Move code into ByteBufUtil so its easy to reuse.
Result:
Easy to reuse dumping code.
Motivation:
The way of firstIndexOf and lastIndexOf iterating the ByteBuf is similar to forEachByte and forEachByteDesc, but have many range checks.
Modifications:
Use forEachByte and a IndexOfProcessor to find occurrence.
Result:
eliminate range checks
Motivation:
The ByteString class currently assumes the underlying array will be a complete representation of data. This is limiting as it does not allow a subsection of another array to be used. The forces copy operations to take place to compensate for the lack of API support.
Modifications:
- add arrayOffset method to ByteString
- modify all ByteString and AsciiString methods that loop over or index into the underlying array to use this offset
- update all code that uses ByteString.array to ensure it accounts for the offset
- add unit tests to test the implementation respects the offset
Result:
ByteString and AsciiString can represent a sub region of a byte[].
Motivation:
The usage and code within AsciiString has exceeded the original design scope for this class. Its usage as a binary string is confusing and on the verge of violating interface assumptions in some spots.
Modifications:
- ByteString will be created as a base class to AsciiString. All of the generic byte handling processing will live in ByteString and all the special character encoding will live in AsciiString.
Results:
The AsciiString interface will be clarified. Users of AsciiString can now be clear of the limitations the class imposes while users of the ByteString class don't have to live with those limitations.
Motivation:
The DefaultHttp2ConnectionDecoder class is calling verifyPrefaceReceived() for almost every frame event at all times.
The Http2ConnectionHandler class is calling readClientPrefaceString() on every decode event.
Modifications:
- DefaultHttp2ConnectionDecoder should not have to continuously call verifyPrefaceReceived() because it transitions boolean state 1 time for each connection.
- Http2ConnectionHandler should not have to continuously call readClientPrefaceString() because it transitions boolean state 1 time for each connection.
Result:
- Less conditional checks for the mainstream usage of the connection.