Motivation:
We need to ensure we pass all tests when sun.misc.Unsafe is not present.
Modifications:
- Make *ByteBufAllocatorTest work whenever sun.misc.Unsafe is present or not
- Let Lz4FrameEncoderTest not depend on AbstractByteBufAllocator implementation details which take into account if sun.misc.Unsafe is present or not
Result:
Tests pass even without sun.misc.Unsafe.
Motivation:
We used various mocking frameworks. We should only use one...
Modifications:
Make usage of mocking framework consistent by only using Mockito.
Result:
Less dependencies and more consistent mocking usage.
Motivation:
Currently Netty does not wrap socket connect, bind, or accept
operations in doPrivileged blocks. Nor does it wrap cases where a dns
lookup might happen.
This prevents an application utilizing the SecurityManager from
isolating SocketPermissions to Netty.
Modifications:
I have introduced a class (SocketUtils) that wraps operations
requiring SocketPermissions in doPrivileged blocks.
Result:
A user of Netty can grant SocketPermissions explicitly to the Netty
jar, without granting it to the rest of their application.
Motivation:
LZ4FrameEncoder maintains an internal buffer of incoming data compress, and only writes out compressed data when a size threshold is reached. LZ4FrameEncoder does not override the flush() method, and thus the only way to flush data down the pipeline is via more data or close the channel.
Modifications:
Override the flush() function to flush on demand. Also overrode the allocateBuffer() function so we can more accurately size the output buffer (instead of needing to potatntially realloc via buffer.ensureWritable()).
Result:
Implementation works as described.
Motivation:
Thought there may be a bug so added a testcase to verify everything works as expected.
Modifications:
Added testcase
Result:
More test-coverage.
Motivation:
IntelliJ issues several warnings.
Modifications:
* `ClientCookieDecoder` and `ServerCookieDecoder`:
* `nameEnd`, `valueBegin` and `valueEnd` don't need to be initialized
* `keyValLoop` loop doesn't been to be labelled, as it's the most inner one (same thing for labelled breaks)
* Remove `if (i != headerLen)` as condition is always true
* `ClientCookieEncoder` javadoc still mention old logic
* `DefaultCookie`, `ServerCookieEncoder` and `DefaultHttpHeaders` use ternary ops that can be turned into simple boolean ones
* `DefaultHeaders` uses a for(int) loop over an array. It can be turned into a foreach one as javac doesn't allocate an iterator to iterate over arrays
* `DefaultHttp2Headers` and `AbstractByteBuf` `equal` can be turned into a single boolean statement
Result:
Cleaner code
Motivation:
* DefaultHeaders from netty-codec has some duplicated logic for header date parsing
* Several classes keep on using deprecated HttpHeaderDateFormat
Modifications:
* Move HttpHeaderDateFormatter to netty-codec and rename it into HeaderDateFormatter
* Make DefaultHeaders use HeaderDateFormatter
* Replace HttpHeaderDateFormat usage with HeaderDateFormatter
Result:
Faster and more consistent code
Motiviation:
We used ReferenceCountUtil.releaseLater(...) in our tests which simplifies a bit the releasing of ReferenceCounted objects. The problem with this is that while it simplifies stuff it increase memory usage a lot as memory may not be freed up in a timely manner.
Modifications:
- Deprecate releaseLater(...)
- Remove usage of releaseLater(...) in tests.
Result:
Less memory needed to build netty while running the tests.
Motivation:
2c78902ebc ensured buffers were released in the general case but didn't clean up an extra release in LzmaFrameEncoderTest#testCompressionOfBatchedFlowOfData which lead to a double release.
Modifications:
LzmaFrameEncoderTest#testCompressionOfBatchedFlowOfData should not explicitly release the buffer because decompress will release the buffer
Result:
No more reference count exception and failed test.
Motivation:
c1932a8537 made an assumption that the LzmaInputStream which wraps a ByteBufInputStream would delegate the close operation to the wrapped stream. This assumption is not true and thus we still had a leak. An issue has been logged with our LZMA dependency https://github.com/jponge/lzma-java/issues/14.
Modifications:
- Force a close on the wrapped stream
Result:
No more leak.
Motivation:
Netty provides a adaptor from ByteBuf to Java's InputStream interface. The JDK Stream interfaces have an explicit lifetime because they implement the Closable interface. This lifetime may be differnt than the ByteBuf which is wrapped, and controlled by the interface which accepts the JDK Stream. However Netty's ByteBufInputStream currently does not take reference count ownership of the underlying ByteBuf. There may be no way for existing classes which only accept the InputStream interface to communicate when they are done with the stream, other than calling close(). This means that when the stream is closed it may be appropriate to release the underlying ByteBuf, as the ownership of the underlying ByteBuf resource may be transferred to the Java Stream.
Motivation:
- ByteBufInputStream.close() supports taking reference count ownership of the underyling ByteBuf
Result:
ByteBufInputStream can assume reference count ownership so the underlying ByteBuf can be cleaned up when the stream is closed.
Motivation:
The unit tests for the compression encoders/decoders may write buffers to an EmbeddedChannel but then may not release buffer or close the channel after the test. This may result in buffer leaks.
Modifications:
- Call channel.finishAndReleaseAll() after each test
Result:
Fixes https://github.com/netty/netty/issues/6007
Motivation:
ObjectOutputStream uses a Channel Attribute to cache a ObjectOutputStream which is backed by a ByteBuf that may be released after an object is encoded and the underlying buffer is written to the channel. On subsequent encode operations the cached ObjectOutputStream will be invalid and lead to a reference count exception.
Modifications:
- CompatibleObjectEncoder should not cache a ObjectOutputStream.
Result:
CompatibleObjectEncoder doesn't use a cached object backed by a released ByteBuf.
Motivation:
the build doesnt seem to enforce this, so they piled up
Modifications:
removed unused import lines
Result:
less unused imports
Signed-off-by: radai-rosenblatt <radai.rosenblatt@gmail.com>
Motivation:
MessageAggregator has a potential to leak if a new message is received before the existing message has completed, and if a HttpContent is received but maxContentLength has been exceeded, or the content length is too long.
Modifications:
- Make the HttpObjectAggregator more robust to leaks
- Reduce dependance on handlingOversizedMessage but instead rely on the more general check of a null currentMessage
Result:
More robust MessageAggregator with less chance of leaks
Motivation :
Unboxing operations allocate unnecessary objects when it could be avoided.
Modifications:
Replaced Float.valueOf with Number.parseFloat where possible.
Result:
Less unnecessary objects allocations.
Motivation:
We are currently doing a memory copy to verify the snapy version. This is not needed.
Modifications:
Remove memory copy and just compare byte per byte.
Result:
Less memory copies and allocations
Motivation:
We need to ensure the uncompressed ByteBuf is released if an exception happens while calling decode(...). If we miss to do so we leak buffers.
Modifications:
Correctly release buffer on exception.
Result:
No more memory leak.
Motivation:
We not need to do any memory copies when doing CRC32 processing.
Modifications:
Use ByteBufChecksum to eliminate memory copies.
Result:
Less memory copies.
Motivation:
We should try to minimize memory copies whenever possible.
Modifications:
- Refactor ByteBufChecksum to work with heap and direct ByteBuf always
- Remove memory copy in Snappy by let Crc32c extend ByteBufChecksum
Result:
Less memory copies when using Snappy
Motivation:
We did an unessary memory copy when doing bzip2 encoding.
Modifications:
Remove memory copy and just use a ByteProcessor.
Result:
Less memory copies and so faster.
Motivation:
We should prefer direct buffers for the output of Lz4FrameEncoder as this is what is needed for writing to the socket.
Modification:
Use direct buffers for the output
Result:
Less memory copies needed.
Motivation:
When the user constructs Lz4FrameDecoder with a Checksum implementation like CRC32 or Adler32 and uses Java8 we can directly use a ByteBuffer to do the checksum work. This way we can eliminate memory copies.
Modifications:
Detect if ByteBuffer can be used for checksum work and if so reduce memory copies.
Result:
Less memory copies when using JDK8
Motivation:
HPACK Encoder has a data structure which is similar to a previous version of DefaultHeaders. Some of the same improvements can be made.
Motivation:
- Enforce the restriction that the Encoder's headerFields length must be a power of two so we can use masking instead of modulo
- Use AsciiString.hashCode which already has optimizations instead of having yet another hash code algorithm in Encoder
Result:
Fixes https://github.com/netty/netty/issues/5357
Motivation:
PlatformDependent attempts to use reflection to get the underlying char[] (or byte[]) from String objects. This is fragile as if the String implementation does not utilize the full array, and instead uses a subset of the array, this optimization is invalid. OpenJDK6 and some earlier versions of OpenJDK7 String have the capability to use a subsection of the underlying char[].
Modifications:
- PlatformDependent should not attempt to use the underlying array from String (or other data types) via reflection
Result:
PlatformDependent hash code generation for CharSequence does not depend upon specific JDK implementation details.
Motivation:
It is good to have used dependencies and plugins up-to-date to fix any undiscovered bug fixed by the authors.
Modification:
Scanned dependencies and plugins and carefully updated one by one.
Result:
Dependencies and plugins are up-to-date.
Motivation:
JCTools supports both non-unsafe, unsafe versions of queues and JDK6 which allows us to shade the library in netty-common allowing it to stay "zero dependency".
Modifications:
- Remove copy paste JCTools code and shade the library (dependencies that are shaded should be removed from the <dependencies> section of the generated POM).
- Remove usage of OneTimeTask and remove it all together.
Result:
Less code to maintain and easier to update JCTools and less GC pressure as the queue implementation nt creates so much garbage
Motivation:
We should ensure we null out the cumulation buffer before we fire it through the pipleine in handlerRemoved(...) as in theory it could be possible that another method is triggered as result of the fireChannelRead(...) or fireChannelReadComplete() that will try to access the cumulation.
Modifications:
Null out cumulation buffer early in handlerRemoved(...)
Result:
No possible to access the cumulation buffer that was already handed over.
Motivation:
For example,
DefaultHttp2Headers headers = new DefaultHttp2Headers();
headers.add("key1", "value1");
headers.add("key1", "value2");
headers.add("key1", "value3");
headers.add("key2", "value4");
produces:
DefaultHttp2Headers[key1: value1key1: value2key1: value3, key2: value4]
while correctly it should be
DefaultHttp2Headers[key1: value1, key1: value2, key1: value3, key2: value4]
Modifications:
Change the toString() method to produce the beforementioned output.
Result:
toString() format is correct also for keys with multiple values.
Motivation:
At the moment the user is responsible to increase the writer index of the composite buffer when a new component is added. We should add some methods that handle this for the user as this is the most popular usage of the composite buffer.
Modifications:
Add new methods that autoamtically increase the writerIndex when buffers are added.
Result:
Easier usage of CompositeByteBuf.
Motivation:
99dfc9ea79 introduced some code that will more frequently try to forward messages out of the list of decoded messages to reduce latency and memory footprint. Unfortunally this has the side-effect that RecycleableArrayList.clear() will be called more often and so introduce some overhead as ArrayList will null out the array on each call.
Modifications:
- Introduce a CodecOutputList which allows to not null out the array until we recycle it and also allows to access internal array with extra range checks.
- Add benchmark that add elements to different List implementations and clear them
Result:
Less overhead when decode / encode messages.
Benchmark (elements) Mode Cnt Score Error Units
CodecOutputListBenchmark.arrayList 1 thrpt 20 24853764.609 ± 161582.376 ops/s
CodecOutputListBenchmark.arrayList 4 thrpt 20 17310636.508 ± 930517.403 ops/s
CodecOutputListBenchmark.codecOutList 1 thrpt 20 26670751.661 ± 587812.655 ops/s
CodecOutputListBenchmark.codecOutList 4 thrpt 20 25166421.089 ± 166945.599 ops/s
CodecOutputListBenchmark.recyclableArrayList 1 thrpt 20 24565992.626 ± 210017.290 ops/s
CodecOutputListBenchmark.recyclableArrayList 4 thrpt 20 18477881.775 ± 157003.777 ops/s
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 246.748 sec - in io.netty.handler.codec.CodecOutputListBenchmark
Related: #4333#4421#5128
Motivation:
slice(), duplicate() and readSlice() currently create a non-recyclable
derived buffer instance. Under heavy load, an application that creates a
lot of derived buffers can put the garbage collector under pressure.
Modifications:
- Add the following methods which creates a non-recyclable derived buffer
- retainedSlice()
- retainedDuplicate()
- readRetainedSlice()
- Add the new recyclable derived buffer implementations, which has its
own reference count value
- Add ByteBufHolder.retainedDuplicate()
- Add ByteBufHolder.replace(ByteBuf) so that..
- a user can replace the content of the holder in a consistent way
- copy/duplicate/retainedDuplicate() can delegate the holder
construction to replace(ByteBuf)
- Use retainedDuplicate() and retainedSlice() wherever possible
- Miscellaneous:
- Rename DuplicateByteBufTest to DuplicatedByteBufTest (missing 'D')
- Make ReplayingDecoderByteBuf.reject() return an exception instead of
throwing it so that its callers don't need to add dummy return
statement
Result:
Derived buffers are now recycled when created via retainedSlice() and
retainedDuplicate() and derived from a pooled buffer
Motivation:
Often users either need to read or write CharSequences to a ByteBuf. We should add methods for this to ByteBuf as we can do some optimizations for this depending on the implementation.
Modifications:
Add setCharSequence, writeCharSequence, getCharSequence and readCharSequence
Result:
Easier reading / writing of CharSequence with ByteBuf.
Motivation:
The double quote may be escaped in a JSON string, but JsonObjectDecoder doesn't handle it. Resolves#5157.
Modifications:
Don't end a JSON string when processing an escaped double quote.
Result:
JsonObjectDecoder can handle backslash and double quote in a JSON string correctly.
Motivation:
We lately added ByteBuf.isReadOnly() which allows to detect if a buffer is read-only or not. We should add ByteBuf.asReadOnly() to allow easily access a read-only version of a buffer.
Modifications:
- Add ByteBuf.asReadOnly()
- Deprecate Unpooled.unmodifiableBuffer(Bytebuf)
Result:
More consistent api.
Motivation:
b112673554 added ChannelInputShutdownEvent support to ByteToMessageDecoder but missed updating the code for ReplayingDecoder. This has the effect:
- If a ChannelInputShutdownEvent is fired ByteToMessageDecoder (the super-class of ReplayingDecoder) will call the channelInputClosed(...) method which will pass the incorrect buffer to the decode method of ReplayingDecoder.
Modifications:
Share more code between ByteToMessageDEcoder and ReplayingDecoder and so also support ChannelInputShutdownEvent correctly in ReplayingDecoder
Result:
ChannelInputShutdownEvent is corrrectly handle in ReplayingDecoder as well.
We need to check if this handler was removed before continuing with decoding.
If it was removed, it is not safe to continue to operate on the buffer. This was already fixed for ByteToMessageDecoder in 4cdbe39284 but missed for ReplayingDecoder.
Modifications:
Check if decoder was removed after fire messages through the pipeline.
Result:
No illegal buffer access when decoder was removed.
Motivation:
We use ByteBuf.readBytes(int) in various places where we could either remove it completely or use readSlice(int).retain().
Modifications:
- Remove ByteBuf.readBytes(int) when possible or replace by readSlice(int).retain().
Result:
Faster code.
Motivation:
See #1811
Modifications:
Add LineEncoder and LineSeparator
Result:
The user can use LineEncoder to write a String with a line separator automatically
Motivation:
UDP-oriented codec reusing the existing encoders and decoders would be helpful. See #1350
Modifications:
Add DatagramPacketEncoder and DatagramPacketDecoder to reuse the existing encoders and decoders.
Result:
People can use DatagramPacketEncoder and DatagramPacketDecoder to wrap existing encoders and decoders to create UDP-oriented codec.
Motivation:
If the input buffer is empty we should not have decodeLast(...) call decode(...) as the user may not expect this.
Modifications:
- Not call decode(...) in decodeLast(...) if the input buffer is empty.
- Add testcases.
Result:
decodeLast(...) will not call decode(...) if input buffer is empty.
Motivation:
Each call of ByteBuf.getByte(int) method does boundary checking. This can be eliminated by using ByteBuf.forEachByte(ByteProcessor) method and ByteProcessor.FIND_LF processor.
Modifications:
Find end of line with ByteProcessor.FIND_LF
Result:
A little better performance of LineBasedFrameDecoder.
Motivation:
Some people may want to use the Snappy class directly to encode / decode ByteBufs.
Modifications:
Make the Snappy class public and final.
Result:
Easier for people to reuse parts of Netty.
Motivation:
ByteToMessageDecoder must ensure that read components of the CompositeByteBuf can be discard by default when discardSomeReadBytes() is called. This may not be the case before as because of the default maxNumComponents that will cause consolidation.
Modifications:
Ensure we not do any consolidation to actually be abel to discard read components
Result:
Less memory usage and allocations.
Motivation
See ##3229
Modifications:
Add methods with position independent FileChannel calls to ByteBuf and its subclasses.
Results:
The user can use these new methods to read/write ByteBuff without updating FileChannel's position.
Motivation:
b714297a44 introduced ChannelInputShutdownEvent support for HttpObjectDecoder. However this should have been added to the super class ByteToMessageDecoder, and ByteToMessageDecoder should not propegate a channelInactive event through the pipeline in this case.
Modifications:
- Move the ChannelInputShutdownEvent handling from HttpObjectDecoder to ByteToMessageDecoder
- ByteToMessageDecoder doesn't call ctx.fireChannelInactive() on ChannelInputShutdownEvent
Result:
Half closed events are treated more generically, and don't get translated into a channelInactive pipeline event.
Motivation:
We not correctly added newlines if the src data needed to be padded. This regression was introduced by '63426fc3ed083513c07a58b45381f5c10dd47061'
Modifications:
- Correctly handling newlines
- Add unit test that proves the fix.
Result:
No more invalid base64 encoded data.
Motivation:
If the ZlibCodecFactory can support using a custom window size we should support it by default in the websocket extensions as well.
Modifications:
Detect if a custom window size can be handled by the ZlibCodecFactory and if so enable it by default for PerMessageDeflate*ExtensionHandshaker.
Result:
Support window size flag by default in most installations.
Motivation:
According to https://github.com/google/snappy/blob/master/format_description.txt#L55 , Snappy.decodeLiteral should handle the cases of 60, 61, 62 and 63. However right now it processes 64 instead of 63. I believe it's a typo since `tag >> 2 & 0x3F` must be less than 64.
Modifications:
Use the correct value 63.
Result:
Snappy.decodeLiteral handles the correct case.
Motivation:
Netty was missing support for Protobuf nano runtime targeted at
weaker systems such as Android devices.
Modifications:
Added ProtobufDecoderNano and ProtobufDecoderNano
in order to provide support for Nano runtime.
modified ProtobufVarint32FrameDecoder and
ProtobufLengthFieldPrepender in order to remove any
on either Nano or Lite runtime by copying the code
for handling Protobuf varint32 in from Protobuf
library.
modified Licenses and NOTICE in order to reflect the
changes i made.
added Protobuf Nano runtime as optional dependency
Result:
Netty now supports Protobuf Nano runtime.
Motivation:
`JdkZlibDecoder` is available since Netty 4.0.8 and works with JDK7+.
However, `io.netty.noJdkZlibDecoder` System prop evaluation always defaults to
true, causing Netty to always use JZLib when decompressing on the
client side when the property insn't explictly set to `false`.
Modifications:
Default to `false` instead of `true` when JDK7+.
Result:
JZLib optional as expected on JDK7+.
Motivation:
We need to check if this handler was removed before continuing with decoding.
If it was removed, it is not safe to continue to operate on the buffer.
Modifications:
Check if decoder was removed after fire messages through the pipeline.
Result:
No illegal buffer access when decoder was removed.
Motivation:
There are some wrong links and tags in javadoc.
Modifications:
Fix the wrong links and tags in javadoc.
Result:
These links will work correctly in javadoc.
Motivation:
We should not use Unpooled to allocate buffers for performance reasons.
Modifications:
Allow to pass in ByteBufAllocate which is used to allocate buffers or use the allocate of the src buffer.
Result:
Better performance if the PooledByteBufAllocator is used.
Motivation:
We need to ensure we not add a newline if the Base64 encoded buffer ends directly on the MAX_LINE_LENGTH. If we miss to do so this produce invalid data.
Because of this bug OpenSslServerContext and OpenSslClientContext may fail to load a cert.
Modifications:
- Only add NEW_LINE if we not are on the end of the dst buffer.
- Add unit test
Result:
Correct result in all cases
Motivation:
We have websocket extension support (with compression) in old master. We should port this to 4.1
Modifications:
Backport relevant code.
Result:
websocket extension support (with compression) is now in 4.1.
Motivation:
We recently added methods to ByteBuf to directly write and read LE values. We should use these in the Snappy implementation and so reduce duplication.
Modifications:
Replace manually swapping of values with LE write and read methods of ByteBuf.
Result:
Cleaner code with less duplication.
As discussed in #3209, this PR adds Little Endian accessors
to ByteBuf and descendants.
Corresponding accessors were added to UnsafeByteBufUtil,
HeapByteBufferUtil to avoid calling `reverseBytes`.
Deprecate `order()`, `order(buf)` and `SwappedByteBuf`.
Motivation:
DefaultHeaders creates an array of size 16 for all headers. This may waste a good deal of memory if applications only have a small number of headers. This memory may be critical when the number of connections grows large.
Modifications:
- Make the size of the array for DefaultHeaders configurable
Result:
Applications can control the size of the DefaultHeaders array and save memory.
Motivation:
We should use OneTimeTask where possible to reduce object creation.
Modifications:
Replace Runnable with OneTimeTask
Result:
Less object creation
Motivation:
Headers and groups of headers are frequently copied and the current mechanism is slower than it needs to be.
Modifications:
Skip name validation and hash computation when they are not necessary.
Fix emergent bug in CombinedHttpHeaders identified with better testing
Fix memory leak in DefaultHttp2Headers when clearing
Added benchmarks
Result:
Faster header copying and some collateral bug fixes
Motivation:
Makes the API contract of headers more consistent and simpler.
Modifications:
If self is passed to set then simply return
Result:
set and setAll will be consistent
Motivation:
The HTTP/2 RFC (https://tools.ietf.org/html/rfc7540#section-8.1.2) indicates that header names consist of ASCII characters. We currently use ByteString to represent HTTP/2 header names. The HTTP/2 RFC (https://tools.ietf.org/html/rfc7540#section-10.3) also eludes to header values inheriting the same validity characteristics as HTTP/1.x. Using AsciiString for the value type of HTTP/2 headers would allow for re-use of predefined HTTP/1.x values, and make comparisons more intuitive. The Headers<T> interface could also be expanded to allow for easier use of header types which do not have the same Key and Value type.
Motivation:
- Change Headers<T> to Headers<K, V>
- Change Http2Headers<ByteString> to Http2Headers<CharSequence, CharSequence>
- Remove ByteString. Having AsciiString extend ByteString complicates equality comparisons when the hash code algorithm is no longer shared.
Result:
Http2Header types are more representative of the HTTP/2 RFC, and relationship between HTTP/2 header name/values more directly relates to HTTP/1.x header names/values.
Motivation:
At the moment we only forward decoded messages that were added the out List once the full decode loop was completed. This has the affect that resources may not be released as fast as possible and as an application may incounter higher latency if the user triggeres a writeAndFlush(...) as a result of the decoded messages.
Modifications:
- forward decoded messages after each decode call
Result:
Forwarding decoded messages through the pipeline in a more eager fashion.
Motivation:
We should prevent to add/set DefaultHttpHeaders to itself to prevent unexpected side-effects.
Modifications:
Throw IllegalArgumentException if user tries to pass the same instance to set/add.
Result:
No surprising side-effects.
Motivation:
If a remote peer writes fast enough it may take a long time to have fireChannelReadComplete(...) triggered. Because of this we need to take special care and ensure we try to discard some bytes if channelRead(...) is called to often in ByteToMessageDecoder.
Modifications:
- Add ByteToMessageDecoder.setDiscardAfterReads(...) which allows to set the number of reads after which we try to discard the read bytes
- Use default value of 16 for max reads.
Result:
No risk of OOME.
Motivation:
The HashingStrategy for DefaultStompHeaders was using the java .equals() method which would fail to compare String, AsciiString, and other CharSequence objects as equal.
Modification:
- Use AsciiString.CASE_SENSITIVE_HASHER for DefaultStompHeaders
Result:
DefaultStompHeaders work with all CharSequence objects.
Fixes https://github.com/netty/netty/issues/4247
Motivation:
The HTTP/2 header name validation was removed, and does not currently exist.
Modifications:
- Header name validation for HTTP/2 should be restored and set to the default mode of operation.
Result:
HTTP/2 header names are validated according to https://tools.ietf.org/html/rfc7540
Motivation:
We missed to correctly implement the handlerRemoved(...) / channelInactive(...) and channelReadComplete(...) method, this leaded to multiple problems:
- Missed to forward bytes when the codec is removed from the pipeline
- Missed to call decodeLast(...) once the Channel goes in active
- No correct handling of channelReadComplete that could lead to grow of cumulation buffer.
Modifications:
- Correctly implement methods and forward to the internal ByteToMessageDecoder
- Add unit test.
Result:
Correct behaviour
Motivation:
The HttpObjectAggregator always responds with a 100-continue response. It should check the Content-Length header to see if the content length is OK, and if not responds with a 417.
Modifications:
- HttpObjectAggregator checks the Content-Length header in the case of a 100-continue.
Result:
HttpObjectAggregator responds with 417 if content is known to be too big.
Motivation:
A degradation in performance has been observed from the 4.0 branch as documented in https://github.com/netty/netty/issues/3962.
Modifications:
- Simplify Headers class hierarchy.
- Restore the DefaultHeaders to be based upon DefaultHttpHeaders from 4.0.
- Make various other modifications that are causing hot spots.
Result:
Performance is now on par with 4.0.
Motivation:
We noticed that the headers implementation in Netty for HTTP/2 uses quite a lot of memory
and that also at least the performance of randomly accessing a header is quite poor. The main
concern however was memory usage, as profiling has shown that a DefaultHttp2Headers
not only use a lot of memory it also wastes a lot due to the underlying hashmaps having
to be resized potentially several times as new headers are being inserted.
This is tracked as issue #3600.
Modifications:
We redesigned the DefaultHeaders to simply take a Map object in its constructor and
reimplemented the class using only the Map primitives. That way the implementation
is very concise and hopefully easy to understand and it allows each concrete headers
implementation to provide its own map or to even use a different headers implementation
for processing requests and writing responses i.e. incoming headers need to provide
fast random access while outgoing headers need fast insertion and fast iteration. The
new implementation can support this with hardly any code changes. It also comes
with the advantage that if the Netty project decides to add a third party collections library
as a dependency, one can simply plug in one of those very fast and memory efficient map
implementations and get faster and smaller headers for free.
For now, we are using the JDK's TreeMap for HTTP and HTTP/2 default headers.
Result:
- Significantly fewer lines of code in the implementation. While the total commit is still
roughly 400 lines less, the actual implementation is a lot less. I just added some more
tests and microbenchmarks.
- Overall performance is up. The current implementation should be significantly faster
for insertion and retrieval. However, it is slower when it comes to iteration. There is simply
no way a TreeMap can have the same iteration performance as a linked list (as used in the
current headers implementation). That's totally fine though, because when looking at the
benchmark results @ejona86 pointed out that the performance of the headers is completely
dominated by insertion, that is insertion is so significantly faster in the new implementation
that it does make up for several times the iteration speed. You can't iterate what you haven't
inserted. I am demonstrating that in this spreadsheet [1]. (Actually, iteration performance is
only down for HTTP, it's significantly improved for HTTP/2).
- Memory is down. The implementation with TreeMap uses on avg ~30% less memory. It also does not
produce any garbage while being resized. In load tests for GRPC we have seen a memory reduction
of up to 1.2KB per RPC. I summarized the memory improvements in this spreadsheet [1]. The data
was generated by [2] using JOL.
- While it was my original intend to only improve the memory usage for HTTP/2, it should be similarly
improved for HTTP, SPDY and STOMP as they all share a common implementation.
[1] https://docs.google.com/spreadsheets/d/1ck3RQklyzEcCLlyJoqDXPCWRGVUuS-ArZf0etSXLVDQ/edit#gid=0
[2] https://gist.github.com/buchgr/4458a8bdb51dd58c82b4
Motivation:
Two problems:
1. Decoder assumption that as soon as it finds </ element it can decrement opened xml brackets counter. It can lead to bugs when closing bracket is not in byteBuf yet.
2. Not proper handling of more than two root elements in XML document. First element will be processed properly, second one not. It is caused by assumption that byteBuf readerIndex is 0 at the begging of decoding.
Modifications:
Both problems were resolved by fixes:
1. decrement opened brackets count only if </ > enclosing bracket is found
2. consider readerIndex higher than 0 when counting output frame length
Result:
Both problems were resolved
Motivation:
Sometimes it is useful to detect if a ByteBuf contains a HAProxy header, for example if you want to write something like the PortUnification example.
Modifications:
- Add ProtocolDetectionResult which can be used as a return type for detecting different protocol.
- Add new method which allows to detect HA Proxy messages.
Result:
Easier to detect protocol.
Motivation:
A user sometimes just want the aggregated message has no content at
all. (e.g. A user only wants HTTP GET requests.)
Modifications:
- Do not raise IllegalArgumentException even if a user specified
the maxContentLength of 0
Result:
A user can disallow a message with non-empty content.
Motivation:
We are currently doing a memory cop to extract the frame in LengthFieldBasedFrameDecoder which can be eliminated.
Modifications:
Use buffer.slice(...).retain() to eliminate the memory copy.
Result:
Better performance.
Motivation:
The LineBasedFrameDecoder discardedBytes counting different compare to
DelimiterBasedFrameDecoder.
Modifications:
Add plus sign
Result:
DiscardedBytes counting correctly
Motivation:
Our automatically handling of non-auto-read failed because it not detected the need of calling read again by itself if nothing was decoded. Beside this handling of non-auto-read never worked for SslHandler as it always triggered a read even if it decoded a message and auto-read was false.
This fixes [#3529] and [#3587].
Modifications:
- Implement handling of calling read when nothing was decoded (with non-auto-read) to ByteToMessageDecoder again
- Correctly respect non-auto-read by SslHandler
Result:
No more stales and correctly respecting of non-auto-read by SslHandler.
Motivation:
The usage and code within AsciiString has exceeded the original design scope for this class. Its usage as a binary string is confusing and on the verge of violating interface assumptions in some spots.
Modifications:
- ByteString will be created as a base class to AsciiString. All of the generic byte handling processing will live in ByteString and all the special character encoding will live in AsciiString.
Results:
The AsciiString interface will be clarified. Users of AsciiString can now be clear of the limitations the class imposes while users of the ByteString class don't have to live with those limitations.
Motivation:
The ReplayingDecoderBuffer does not match the naming scheme we use for ByteBuf types.
Modifications:
Rename to ReplayingDecoderByteBuf to match naming scheme
Result:
Consistent naming
Motivation:
Too many duplicated code of tests for different compression codecs.
Modifications:
- Added abstract classes AbstractCompressionTest, AbstractDecoderTest and AbstractEncoderTest which contains common variables and tests for any compression codec.
- Removed common tests which are implemented in AbstractDecoderTest and AbstractEncoderTest from current tests for compression codecs.
- Implemented abstract methods of AbstractDecoderTest and AbstractEncoderTest in current tests for compression codecs.
- Added additional checks for current tests.
- Renamed abstract class IntegrationTest to AbstractIntegrationTest.
- Used Theories to run tests with head and direct buffers.
- Removed code duplicates.
Result:
Removed duplicated code of tests for compression codecs and simplified an addition of tests for new compression codecs.
Motivation:
While the LengthFieldBasedFrameDecoder supports a byte order the LengthFieldPrepender does not.
That means that I can simply add a LengthFieldBasedFrameDecoder with ByteOrder.LITTLE_ENDIAN to my pipeline
but have to write my own Encoder to write length fields in little endian byte order.
Modifications:
Added a constructor that takes a byte order and all other parameters.
All other constructors delegate to this one with ByteOrder.BIG_ENDIAN.
LengthFieldPrepender.encode() uses this byte order to write the length field.
Result:
LengthFieldPrepender will write the length field in the defined byte order.
Motivation:
Not knowing which unit is returned by the maxContentLength() of the Messageggregator when reading the Javadoc is annoying and can be a source of bugs.
Modifications:
Added the mention "in bytes"
Result:
Javadoc is clear.
Motivation:
At the moment if you want to return a HTTP header containing multiple
values you have to set/add that header once with the values wanted. If
you used set/add with an array/iterable multiple HTTP header fields will
be returned in the response.
Note, that this is indeed a suggestion and additional work and tests
should be added. This is mainly to bring up a discussion.
Modifications:
Added a flag to specify that when multiple values exist for a single
HTTP header then add them as a comma separated string.
In addition added a method to StringUtil to help escape comma separated
value charsequences.
Result:
Allows for responses to be smaller.
Motivation:
This will avoid one unncessary method invokation which will slightly improve performance.
Modifications:
Instead of calling isReadable we just check for the value of readableBytes()
Result:
Nothing functionally speaking change.
While implementing netty-handler-proxy, I realized various issues in our
current socksx package. Here's the list of the modifications and their
background:
- Split message types into interfaces and default implementations
- so that a user can implement an alternative message implementations
- Use classes instead of enums when a user might want to define a new
constant
- so that a user can extend SOCKS5 protocol, such as:
- defining a new error code
- defining a new address type
- Rename the message classes
- to avoid abbreviated class names. e.g:
- Cmd -> Command
- Init -> Initial
- so that the class names align better with the protocol
specifications. e.g:
- AuthRequest -> PasswordAuthRequest
- AuthScheme -> AuthMethod
- Rename the property names of the messages
- so that the property names align better when the field names in the
protocol specifications
- Improve the decoder implementations
- Give a user more control over when a decoder has to be removed
- Use DecoderResult and DecoderResultProvider to handle decode failure
gracefully. i.e. no more Unknown* message classes
- Add SocksPortUnifinicationServerHandler since it's useful to the users
who write a SOCKS server
- Cleaned up and moved from the socksproxy example
Motivation:
Currently, using a MessageAggregator in the pipeline always results in the creation of an unpooled heap CompositeByteBuf. By using the ByteBufAllocator the CompositeByteBuf will use the implementation specified by the ByteBufAllocator.
Modifications:
Use the ChannelHandlerContext's ByteBufAllocator to create the CompositeByteBuf for message aggregation
Result:
The CompositeByteBuf is now configured based on the ByteBufAllocator's settings.
Related:
- 27a25e29f7
Motivation:
The commit mentioned above introduced a regression where
channelReadComplete() event is swallowed by a handler which was added
dynamically.
Modifications:
Do not suppress channelReadComplete() if the current handler's
channelRead() method was not invoked at all, so that a just-added
handler does not suppress channelReadComplete().
Result:
Regression is gone, and channelReadComplete() is invoked when necessary.
Motivation:
Even if a handler called ctx.fireChannelReadComplete(), the next handler
should not get its channelReadComplete() invoked if fireChannelRead()
was not invoked before.
Modifications:
- Ensure channelReadComplete() is invoked only when the handler of the
current context actually produced a message, because otherwise there's
no point of triggering channelReadComplete().
i.e. channelReadComplete() must follow channelRead().
- Fix a bug where ctx.read() was not called if the handler of the
current context did not produce any message, making the connection
stall. Read the new comment for more information.
Result:
- channelReadComplete() is invoked only when it makes sense.
- No stale connection
Motivation:
The JdkZlibDecoder and JZlibDecoder call isReadable and readableBytes in the same method. There is an opportunity to reduce the number of methods calls to just use readableBytes. JdkZlibDecoder reads from a ByteBuf with an absolute index instead of using readerIndex()
Modifications:
- Use readableBytes where isReadable was used
- Correct absolute ByteBuf index to be relative to readerIndex()
Result:
Less method calls duplicating work and preventing an index out of bounds exception.
Motivation:
There are two member variables (addAllVisitor, setAllVisitor) which are likely not to be used in the majority of use cases.
Modifications:
Remove these member variables and rely on a method to return a new object when needed.
Result:
Two less member variables for each DefaultHeaders instance.
Motivation:
The Headers interface had two member variables (addAllVisitor, setAllVisitor) which are not necessarily always needed but are always instantiated. This may result in excess memory being used.
Modifications:
- addAllVisitor will be accessed via a method addAllVisitor() which will use lazy initialization.
- setAllVisitor will be accessed via a method addAllVisitor() which will use lazy initialization.
Result:
Potential memory savings by using lazy initialization.
Motivation:
Decompression handlers contain heavy use of switch-case statements. We
use compact indentation style for 'case' so that we utilize our screen
real-estate more efficiently.
Also, the following decompression handlers do not need to run a loop,
because ByteToMessageDecoder already runs a loop for them:
- FastLzFrameDecoder
- Lz4FrameDecoder
- LzfDecoder
Modifications:
- Fix indentations
- Do not wrap the decoding logic with a for loop when unnecessary
- Handle the case where a FastLz/Lzf frame contains no data properly so
that the buffer does not leak and less garbage is produced.
Result:
- Efficiency
- Compact source code
- No buffer leak
Motivation:
Currently when there are bytes left in the cumulation buffer we do a byte copy to produce the input buffer for the decode method. This can put quite some overhead on the impl.
Modification:
- Use a CompositeByteBuf to eliminate the byte copy.
- Allow to specify if a CompositeBytebug should be used or not as some handlers can only act on one ByteBuffer in an efficient way (like SslHandler :( ).
Result:
Performance improvement as shown in the following benchmark.
Without this patch:
[xxx@xxx ~]$ ./wrk-benchmark
Running 5m test @ http://xxx:8080/plaintext
16 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 20.19ms 38.34ms 1.02s 98.70%
Req/Sec 241.10k 26.50k 303.45k 93.46%
1153994119 requests in 5.00m, 155.84GB read
Requests/sec: 3846702.44
Transfer/sec: 531.93MB
With the patch:
[xxx@xxx ~]$ ./wrk-benchmark
Running 5m test @ http://xxx:8080/plaintext
16 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 17.34ms 27.14ms 877.62ms 98.26%
Req/Sec 252.55k 23.77k 329.50k 87.71%
1209772221 requests in 5.00m, 163.37GB read
Requests/sec: 4032584.22
Transfer/sec: 557.64MB
Modifications:
Converted AsciiString into a String by calling toString() method before comparing with equals(). Also added a unit-test to show that it works.
Result:
Major violation is gone. Code is correct.
Motivation:
The new Headers interface contains methods to getTimeMillis but no add/set/contains variants. These should be added for consistency.
Modifications:
- Add three new methods: addTimeMillis, setTimeMillis, containsTimeMillis to the Headers interface.
- Add a new method to the Headers.ValueConverter interface: T convertTimeMillis(long)
- Bring these new interfaces up the class hierarchy
Result:
All Headers classes have setters/getters for timeMillis.