Motivation:
The HTTP/2 RFC (https://tools.ietf.org/html/rfc7540#section-8.1.2) indicates that header names consist of ASCII characters. We currently use ByteString to represent HTTP/2 header names. The HTTP/2 RFC (https://tools.ietf.org/html/rfc7540#section-10.3) also eludes to header values inheriting the same validity characteristics as HTTP/1.x. Using AsciiString for the value type of HTTP/2 headers would allow for re-use of predefined HTTP/1.x values, and make comparisons more intuitive. The Headers<T> interface could also be expanded to allow for easier use of header types which do not have the same Key and Value type.
Motivation:
- Change Headers<T> to Headers<K, V>
- Change Http2Headers<ByteString> to Http2Headers<CharSequence, CharSequence>
- Remove ByteString. Having AsciiString extend ByteString complicates equality comparisons when the hash code algorithm is no longer shared.
Result:
Http2Header types are more representative of the HTTP/2 RFC, and relationship between HTTP/2 header name/values more directly relates to HTTP/1.x header names/values.
Motivation:
At the moment we only forward decoded messages that were added the out List once the full decode loop was completed. This has the affect that resources may not be released as fast as possible and as an application may incounter higher latency if the user triggeres a writeAndFlush(...) as a result of the decoded messages.
Modifications:
- forward decoded messages after each decode call
Result:
Forwarding decoded messages through the pipeline in a more eager fashion.
Motivation:
We should prevent to add/set DefaultHttpHeaders to itself to prevent unexpected side-effects.
Modifications:
Throw IllegalArgumentException if user tries to pass the same instance to set/add.
Result:
No surprising side-effects.
Motivation:
If a remote peer writes fast enough it may take a long time to have fireChannelReadComplete(...) triggered. Because of this we need to take special care and ensure we try to discard some bytes if channelRead(...) is called to often in ByteToMessageDecoder.
Modifications:
- Add ByteToMessageDecoder.setDiscardAfterReads(...) which allows to set the number of reads after which we try to discard the read bytes
- Use default value of 16 for max reads.
Result:
No risk of OOME.
Motivation:
The HashingStrategy for DefaultStompHeaders was using the java .equals() method which would fail to compare String, AsciiString, and other CharSequence objects as equal.
Modification:
- Use AsciiString.CASE_SENSITIVE_HASHER for DefaultStompHeaders
Result:
DefaultStompHeaders work with all CharSequence objects.
Fixes https://github.com/netty/netty/issues/4247
Motivation:
The HTTP/2 header name validation was removed, and does not currently exist.
Modifications:
- Header name validation for HTTP/2 should be restored and set to the default mode of operation.
Result:
HTTP/2 header names are validated according to https://tools.ietf.org/html/rfc7540
Motivation:
We missed to correctly implement the handlerRemoved(...) / channelInactive(...) and channelReadComplete(...) method, this leaded to multiple problems:
- Missed to forward bytes when the codec is removed from the pipeline
- Missed to call decodeLast(...) once the Channel goes in active
- No correct handling of channelReadComplete that could lead to grow of cumulation buffer.
Modifications:
- Correctly implement methods and forward to the internal ByteToMessageDecoder
- Add unit test.
Result:
Correct behaviour
Motivation:
The HttpObjectAggregator always responds with a 100-continue response. It should check the Content-Length header to see if the content length is OK, and if not responds with a 417.
Modifications:
- HttpObjectAggregator checks the Content-Length header in the case of a 100-continue.
Result:
HttpObjectAggregator responds with 417 if content is known to be too big.
Motivation:
A degradation in performance has been observed from the 4.0 branch as documented in https://github.com/netty/netty/issues/3962.
Modifications:
- Simplify Headers class hierarchy.
- Restore the DefaultHeaders to be based upon DefaultHttpHeaders from 4.0.
- Make various other modifications that are causing hot spots.
Result:
Performance is now on par with 4.0.
Motivation:
We noticed that the headers implementation in Netty for HTTP/2 uses quite a lot of memory
and that also at least the performance of randomly accessing a header is quite poor. The main
concern however was memory usage, as profiling has shown that a DefaultHttp2Headers
not only use a lot of memory it also wastes a lot due to the underlying hashmaps having
to be resized potentially several times as new headers are being inserted.
This is tracked as issue #3600.
Modifications:
We redesigned the DefaultHeaders to simply take a Map object in its constructor and
reimplemented the class using only the Map primitives. That way the implementation
is very concise and hopefully easy to understand and it allows each concrete headers
implementation to provide its own map or to even use a different headers implementation
for processing requests and writing responses i.e. incoming headers need to provide
fast random access while outgoing headers need fast insertion and fast iteration. The
new implementation can support this with hardly any code changes. It also comes
with the advantage that if the Netty project decides to add a third party collections library
as a dependency, one can simply plug in one of those very fast and memory efficient map
implementations and get faster and smaller headers for free.
For now, we are using the JDK's TreeMap for HTTP and HTTP/2 default headers.
Result:
- Significantly fewer lines of code in the implementation. While the total commit is still
roughly 400 lines less, the actual implementation is a lot less. I just added some more
tests and microbenchmarks.
- Overall performance is up. The current implementation should be significantly faster
for insertion and retrieval. However, it is slower when it comes to iteration. There is simply
no way a TreeMap can have the same iteration performance as a linked list (as used in the
current headers implementation). That's totally fine though, because when looking at the
benchmark results @ejona86 pointed out that the performance of the headers is completely
dominated by insertion, that is insertion is so significantly faster in the new implementation
that it does make up for several times the iteration speed. You can't iterate what you haven't
inserted. I am demonstrating that in this spreadsheet [1]. (Actually, iteration performance is
only down for HTTP, it's significantly improved for HTTP/2).
- Memory is down. The implementation with TreeMap uses on avg ~30% less memory. It also does not
produce any garbage while being resized. In load tests for GRPC we have seen a memory reduction
of up to 1.2KB per RPC. I summarized the memory improvements in this spreadsheet [1]. The data
was generated by [2] using JOL.
- While it was my original intend to only improve the memory usage for HTTP/2, it should be similarly
improved for HTTP, SPDY and STOMP as they all share a common implementation.
[1] https://docs.google.com/spreadsheets/d/1ck3RQklyzEcCLlyJoqDXPCWRGVUuS-ArZf0etSXLVDQ/edit#gid=0
[2] https://gist.github.com/buchgr/4458a8bdb51dd58c82b4
Motivation:
Two problems:
1. Decoder assumption that as soon as it finds </ element it can decrement opened xml brackets counter. It can lead to bugs when closing bracket is not in byteBuf yet.
2. Not proper handling of more than two root elements in XML document. First element will be processed properly, second one not. It is caused by assumption that byteBuf readerIndex is 0 at the begging of decoding.
Modifications:
Both problems were resolved by fixes:
1. decrement opened brackets count only if </ > enclosing bracket is found
2. consider readerIndex higher than 0 when counting output frame length
Result:
Both problems were resolved
Motivation:
Sometimes it is useful to detect if a ByteBuf contains a HAProxy header, for example if you want to write something like the PortUnification example.
Modifications:
- Add ProtocolDetectionResult which can be used as a return type for detecting different protocol.
- Add new method which allows to detect HA Proxy messages.
Result:
Easier to detect protocol.
Motivation:
A user sometimes just want the aggregated message has no content at
all. (e.g. A user only wants HTTP GET requests.)
Modifications:
- Do not raise IllegalArgumentException even if a user specified
the maxContentLength of 0
Result:
A user can disallow a message with non-empty content.
Motivation:
We are currently doing a memory cop to extract the frame in LengthFieldBasedFrameDecoder which can be eliminated.
Modifications:
Use buffer.slice(...).retain() to eliminate the memory copy.
Result:
Better performance.
Motivation:
The LineBasedFrameDecoder discardedBytes counting different compare to
DelimiterBasedFrameDecoder.
Modifications:
Add plus sign
Result:
DiscardedBytes counting correctly
Motivation:
Our automatically handling of non-auto-read failed because it not detected the need of calling read again by itself if nothing was decoded. Beside this handling of non-auto-read never worked for SslHandler as it always triggered a read even if it decoded a message and auto-read was false.
This fixes [#3529] and [#3587].
Modifications:
- Implement handling of calling read when nothing was decoded (with non-auto-read) to ByteToMessageDecoder again
- Correctly respect non-auto-read by SslHandler
Result:
No more stales and correctly respecting of non-auto-read by SslHandler.
Motivation:
The usage and code within AsciiString has exceeded the original design scope for this class. Its usage as a binary string is confusing and on the verge of violating interface assumptions in some spots.
Modifications:
- ByteString will be created as a base class to AsciiString. All of the generic byte handling processing will live in ByteString and all the special character encoding will live in AsciiString.
Results:
The AsciiString interface will be clarified. Users of AsciiString can now be clear of the limitations the class imposes while users of the ByteString class don't have to live with those limitations.
Motivation:
The ReplayingDecoderBuffer does not match the naming scheme we use for ByteBuf types.
Modifications:
Rename to ReplayingDecoderByteBuf to match naming scheme
Result:
Consistent naming
Motivation:
Too many duplicated code of tests for different compression codecs.
Modifications:
- Added abstract classes AbstractCompressionTest, AbstractDecoderTest and AbstractEncoderTest which contains common variables and tests for any compression codec.
- Removed common tests which are implemented in AbstractDecoderTest and AbstractEncoderTest from current tests for compression codecs.
- Implemented abstract methods of AbstractDecoderTest and AbstractEncoderTest in current tests for compression codecs.
- Added additional checks for current tests.
- Renamed abstract class IntegrationTest to AbstractIntegrationTest.
- Used Theories to run tests with head and direct buffers.
- Removed code duplicates.
Result:
Removed duplicated code of tests for compression codecs and simplified an addition of tests for new compression codecs.
Motivation:
While the LengthFieldBasedFrameDecoder supports a byte order the LengthFieldPrepender does not.
That means that I can simply add a LengthFieldBasedFrameDecoder with ByteOrder.LITTLE_ENDIAN to my pipeline
but have to write my own Encoder to write length fields in little endian byte order.
Modifications:
Added a constructor that takes a byte order and all other parameters.
All other constructors delegate to this one with ByteOrder.BIG_ENDIAN.
LengthFieldPrepender.encode() uses this byte order to write the length field.
Result:
LengthFieldPrepender will write the length field in the defined byte order.
Motivation:
Not knowing which unit is returned by the maxContentLength() of the Messageggregator when reading the Javadoc is annoying and can be a source of bugs.
Modifications:
Added the mention "in bytes"
Result:
Javadoc is clear.
Motivation:
At the moment if you want to return a HTTP header containing multiple
values you have to set/add that header once with the values wanted. If
you used set/add with an array/iterable multiple HTTP header fields will
be returned in the response.
Note, that this is indeed a suggestion and additional work and tests
should be added. This is mainly to bring up a discussion.
Modifications:
Added a flag to specify that when multiple values exist for a single
HTTP header then add them as a comma separated string.
In addition added a method to StringUtil to help escape comma separated
value charsequences.
Result:
Allows for responses to be smaller.
Motivation:
This will avoid one unncessary method invokation which will slightly improve performance.
Modifications:
Instead of calling isReadable we just check for the value of readableBytes()
Result:
Nothing functionally speaking change.
While implementing netty-handler-proxy, I realized various issues in our
current socksx package. Here's the list of the modifications and their
background:
- Split message types into interfaces and default implementations
- so that a user can implement an alternative message implementations
- Use classes instead of enums when a user might want to define a new
constant
- so that a user can extend SOCKS5 protocol, such as:
- defining a new error code
- defining a new address type
- Rename the message classes
- to avoid abbreviated class names. e.g:
- Cmd -> Command
- Init -> Initial
- so that the class names align better with the protocol
specifications. e.g:
- AuthRequest -> PasswordAuthRequest
- AuthScheme -> AuthMethod
- Rename the property names of the messages
- so that the property names align better when the field names in the
protocol specifications
- Improve the decoder implementations
- Give a user more control over when a decoder has to be removed
- Use DecoderResult and DecoderResultProvider to handle decode failure
gracefully. i.e. no more Unknown* message classes
- Add SocksPortUnifinicationServerHandler since it's useful to the users
who write a SOCKS server
- Cleaned up and moved from the socksproxy example
Motivation:
Currently, using a MessageAggregator in the pipeline always results in the creation of an unpooled heap CompositeByteBuf. By using the ByteBufAllocator the CompositeByteBuf will use the implementation specified by the ByteBufAllocator.
Modifications:
Use the ChannelHandlerContext's ByteBufAllocator to create the CompositeByteBuf for message aggregation
Result:
The CompositeByteBuf is now configured based on the ByteBufAllocator's settings.
Related:
- 27a25e29f7
Motivation:
The commit mentioned above introduced a regression where
channelReadComplete() event is swallowed by a handler which was added
dynamically.
Modifications:
Do not suppress channelReadComplete() if the current handler's
channelRead() method was not invoked at all, so that a just-added
handler does not suppress channelReadComplete().
Result:
Regression is gone, and channelReadComplete() is invoked when necessary.
Motivation:
Even if a handler called ctx.fireChannelReadComplete(), the next handler
should not get its channelReadComplete() invoked if fireChannelRead()
was not invoked before.
Modifications:
- Ensure channelReadComplete() is invoked only when the handler of the
current context actually produced a message, because otherwise there's
no point of triggering channelReadComplete().
i.e. channelReadComplete() must follow channelRead().
- Fix a bug where ctx.read() was not called if the handler of the
current context did not produce any message, making the connection
stall. Read the new comment for more information.
Result:
- channelReadComplete() is invoked only when it makes sense.
- No stale connection
Motivation:
The JdkZlibDecoder and JZlibDecoder call isReadable and readableBytes in the same method. There is an opportunity to reduce the number of methods calls to just use readableBytes. JdkZlibDecoder reads from a ByteBuf with an absolute index instead of using readerIndex()
Modifications:
- Use readableBytes where isReadable was used
- Correct absolute ByteBuf index to be relative to readerIndex()
Result:
Less method calls duplicating work and preventing an index out of bounds exception.
Motivation:
There are two member variables (addAllVisitor, setAllVisitor) which are likely not to be used in the majority of use cases.
Modifications:
Remove these member variables and rely on a method to return a new object when needed.
Result:
Two less member variables for each DefaultHeaders instance.
Motivation:
The Headers interface had two member variables (addAllVisitor, setAllVisitor) which are not necessarily always needed but are always instantiated. This may result in excess memory being used.
Modifications:
- addAllVisitor will be accessed via a method addAllVisitor() which will use lazy initialization.
- setAllVisitor will be accessed via a method addAllVisitor() which will use lazy initialization.
Result:
Potential memory savings by using lazy initialization.
Motivation:
Decompression handlers contain heavy use of switch-case statements. We
use compact indentation style for 'case' so that we utilize our screen
real-estate more efficiently.
Also, the following decompression handlers do not need to run a loop,
because ByteToMessageDecoder already runs a loop for them:
- FastLzFrameDecoder
- Lz4FrameDecoder
- LzfDecoder
Modifications:
- Fix indentations
- Do not wrap the decoding logic with a for loop when unnecessary
- Handle the case where a FastLz/Lzf frame contains no data properly so
that the buffer does not leak and less garbage is produced.
Result:
- Efficiency
- Compact source code
- No buffer leak
Motivation:
Currently when there are bytes left in the cumulation buffer we do a byte copy to produce the input buffer for the decode method. This can put quite some overhead on the impl.
Modification:
- Use a CompositeByteBuf to eliminate the byte copy.
- Allow to specify if a CompositeBytebug should be used or not as some handlers can only act on one ByteBuffer in an efficient way (like SslHandler :( ).
Result:
Performance improvement as shown in the following benchmark.
Without this patch:
[xxx@xxx ~]$ ./wrk-benchmark
Running 5m test @ http://xxx:8080/plaintext
16 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 20.19ms 38.34ms 1.02s 98.70%
Req/Sec 241.10k 26.50k 303.45k 93.46%
1153994119 requests in 5.00m, 155.84GB read
Requests/sec: 3846702.44
Transfer/sec: 531.93MB
With the patch:
[xxx@xxx ~]$ ./wrk-benchmark
Running 5m test @ http://xxx:8080/plaintext
16 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 17.34ms 27.14ms 877.62ms 98.26%
Req/Sec 252.55k 23.77k 329.50k 87.71%
1209772221 requests in 5.00m, 163.37GB read
Requests/sec: 4032584.22
Transfer/sec: 557.64MB
Modifications:
Converted AsciiString into a String by calling toString() method before comparing with equals(). Also added a unit-test to show that it works.
Result:
Major violation is gone. Code is correct.
Motivation:
The new Headers interface contains methods to getTimeMillis but no add/set/contains variants. These should be added for consistency.
Modifications:
- Add three new methods: addTimeMillis, setTimeMillis, containsTimeMillis to the Headers interface.
- Add a new method to the Headers.ValueConverter interface: T convertTimeMillis(long)
- Bring these new interfaces up the class hierarchy
Result:
All Headers classes have setters/getters for timeMillis.
Motivation:
Found performance issues via FindBugs and PMD.
Modifications:
- Removed unnecessary boxing/unboxing operations in DefaultTextHeaders.convertToInt(CharSequence) and DefaultTextHeaders.convertToLong(CharSequence). A boxed primitive is created from a string, just to extract the unboxed primitive value.
- Added a static modifier for DefaultHttp2Connection.ParentChangedEvent class. This class is an inner class, but does not use its embedded reference to the object which created it. This reference makes the instances of the class larger, and may keep the reference to the creator object alive longer than necessary.
- Added a static compiled Pattern to avoid compile it each time it is used when we need to replace some part of authority.
- Improved using of StringBuilders.
Result:
Performance improvements.
Motivation:
Currently, we only test our ZlibEncoders against our ZlibDecoders. It is
convenient to write such tests, but it does not necessarily guarantee
their correctness. For example, both encoder and decoder might be faulty
even if the tests pass.
Modifications:
Add another test that makes sure that our GZIP encoder generates the
GZIP trailer, using the fact that GZIPInputStream raises an EOFException
when GZIP trailer is missing.
Result:
More coverage for GZIP compression
Motiviation:
The HttpContentEncoder does not account for a EmptyLastHttpContent being provided as input. This is useful in situations where the client is unable to determine if the current content chunk is the last content chunk (i.e. a proxy forwarding content when transfer encoding is chunked).
Modifications:
- HttpContentEncoder should not attempt to compress empty HttpContent objects
Result:
HttpContentEncoder supports a EmptyLastHttpContent to terminate the response.
Motivation:
CollectionUtils has only one method and it is used only in DefaultHeaders.
Modification:
Move CollectionUtils.equals() to DefaultHeaders and make it private
Result:
One less class to expose in our public API
Motivation:
The header class hierarchy and algorithm was improved on the master branch for versions 5.x. These improvments should be backported to the 4.1 baseline.
Modifications:
- cherry-pick the following commits from the master branch: 2374e17, 36b4157, 222d258
Result:
Header improvements in master branch are available in 4.1 branch.
Motivation:
Make it much more readable code.
Modifications:
- Added states of decompression.
- Refactored decode(...) method to use this states.
Result:
Much more readable decoder which looks like other compression decoders.
Motivation:
A discovered typo in LzmaFrameEncoder constructor when we check `lc + lp` for better compatibility.
Modifications:
Changed `lc + pb` to `lc + lp`.
Result:
Correct check of `lc + lp` value.
Motivation:
LZMA compression algorithm has a very good compression ratio.
Modifications:
- Added `lzma-java` library which implements LZMA algorithm.
- Implemented LzmaFrameEncoder which extends MessageToByteEncoder and provides compression of outgoing messages.
- Added tests to verify the LzmaFrameEncoder and how it can compress data for the next uncompression using the original library.
Result:
LZMA encoder which can compress data using LZMA algorithm.
Motivation:
ExtensionRegistry is a subclass of ExtensionRegistryLite. The ProtobufDecoder
doesn't use the registry directly, it simply passes it through to the Protobuf
API. The Protobuf calls in question are themselves written in terms
ExtensionRegistryLite not ExtensionRegistry.
Modifications:
Require ExtensionRegistryLite instead of ExtensionRegistry in ProtobufDecoder.
Result:
Consumers can use ExtensionRegistryLite with ProtobufDecoder.
Related issue: #2821
Motivation:
There's no way for a user to change the default ZlibEncoder
implementation.
It is already possible to change the default ZlibDecoder implementation.
Modification:
Add a new system property 'io.netty.noJdkZlibEncoder'.
Result:
A user can disable JDK ZlibEncoder, just like he or she can disable JDK
ZlibDecoder.
Motivation:
We have some duplicated code that can be reused.
Modifications:
Create package private class called CodecUtil that now contains the shared code / helper method.
Result:
Less code-duplication
Motivation:
ByteToMessageCodec miss to check for @Sharable annotation in one of its constructors.
Modifications:
Ensure we call checkForSharableAnnotation in all constructors.
Result:
After your change, what will change.
Related issue: #2766
Motivation:
Forgot to rename them before the final release by mistake.
Modifications:
Rename and then re-introduce the deprecated version that extends the
renamed class.
Result:
Better naming
Motivation:
LZ4 compression codec provides sending and receiving data encoded by very fast LZ4 algorithm.
Modifications:
- Added `lz4` library which implements LZ4 algorithm.
- Implemented Lz4FramedEncoder which extends MessageToByteEncoder and provides compression of outgoing messages.
- Added tests to verify the Lz4FramedEncoder and how it can compress data for the next uncompression using the original library.
- Implemented Lz4FramedDecoder which extends ByteToMessageDecoder and provides uncompression of incoming messages.
- Added tests to verify the Lz4FramedDecoder and how it can uncompress data after compression using the original library.
- Added integration tests for Lz4FramedEncoder/Decoder.
Result:
Full LZ4 compression codec which can compress/uncompress data using LZ4 algorithm.
Motivation:
ByteToMessageDecoder and ReplayingDecoder have incorrect javadocs in some places.
Modifications:
Fix incorrect javadocs for both classes.
Result:
Correct javadocs for both classes
Motivation:
FastLZ compression codec provides sending and receiving data encoded by fast FastLZ algorithm using block mode.
Modifications:
- Added part of `jfastlz` library which implements FastLZ algorithm. See FastLz class.
- Implemented FastLzFramedEncoder which extends MessageToByteEncoder and provides compression of outgoing messages.
- Implemented FastLzFramedDecoder which extends ByteToMessageDecoder and provides uncompression of incoming messages.
- Added integration tests for `FastLzFramedEncoder/Decoder`.
Result:
Full FastLZ compression codec which can compress/uncompress data using FastLZ algorithm.
Motivation:
It is often very expensive to instantiate an exception. TextHeader
should not raise an exception when it failed to find a header or when
its header value is not valid.
Modification:
- Change the return type of the getter methods to Integer and Long so
that null is returned when no header is found or its value is invalid
- Update Javadoc
Result:
- Fixes#2758
- No unnecessary instantiation of exceptions
Motivation:
DefaultTextHeaders.getAll*() methods create an ArrayList whose initial
capacity is 4. However, it is more likely that the actual number of
values is smaller than that.
Modifications:
Reduce the initial capacity of the value list from 4 to 2
Result:
Slightly reduced memory footprint
Related issue: #2649 and #2745
Motivation:
At the moment there is no way to get and remove a header with one call.
This means you need to search the headers two times. We should add
getAndRemove(...) to allow doing so with one call.
Modifications:
Add getAndRemove(...) and getUnconvertedAndRemove(...) and their
variants
Result:
More efficient API
Motivation:
Before this changes Bzip2BitReader and Bzip2BitWriter accessed to ByteBuf byte by byte. So tests for Bzip2 compression codec takes a lot of time if we ran them with paranoid level of resource leak detection. For more information see comments to #2681 and #2689.
Modifications:
- Increased size of bit buffers from 8 to 64 bits.
- Improved reading and writing operations.
- Save link to incoming ByteBuf inside Bzip2BitReader.
- Added methods to check possible readable bits and bytes in Bzip2BitReader.
- Updated Bzip2 classes to use new API of Bzip2BitReader.
- Added new constants to Bzip2Constants.
Result:
Increased size of bit buffers and improved performance of Bzip2 compression codec (for general work by 13% and for tests with paranoid level of resource leak detection by 55%).
Motivation:
In ReplayingDecoder / ByteToMessageDecoder channelInactive(...) method we try to decode a last time and fire all decoded messages throw the pipeline before call ctx.fireChannelInactive(...). To keep the correct order of events we also need to call ctx.fireChannelReadComplete() if we read anything.
Modifications:
- Channel channelInactive(...) to call ctx.fireChannelReadComplete() if something was decoded
- Move out.recycle() to finally block
Result:
Correct order of events.
Motivation:
Complicated code of Bzip2 tests with some unnecessary actions.
Modifications:
- Reduce size of BYTES_LARGE array of random test data for Bzip2 tests.
- Removed unnecessary creations of EmbeddedChannel instances in Bzip2 tests.
- Simplified tests in Bzip2DecoderTest which expect exception.
- Removed unnecessary testStreamInitialization() from Bzip2EncoderTest.
Result:
Reduced time to test the 'codec' package by 7 percent, simplified code of Bzip2 tests.
Motivation:
Duplicated code of integration tests for different compression codecs.
Modifications:
- Added abstract class IntegrationTest which contains common tests for any compression codec.
- Removed common tests from Bzip2IntegrationTest and LzfIntegrationTest.
- Implemented abstract methods of IntegrationTest in Bzip2IntegrationTest, LzfIntegrationTest and SnappyIntegrationTest.
Result:
Removed duplicated code of integration tests for compression codecs and simplified an addition of integration tests for new compression codecs.
Motivation:
Sometimes we have a 'build time out' error because tests for bzip2 codec take a long time.
Modifications:
Removed cycles from Bzip2EncoderTest.testCompression(byte[]) and Bzip2DecoderTest.testDecompression(byte[]).
Result:
Reduced time to test the 'codec' package by 30 percent.
Motivation:
Fixed founded mistakes in compression codecs.
Modifications:
- Changed return type of ZlibUtil.inflaterException() from CompressionException to DecompressionException
- Updated @throws in javadoc of JZlibDecoder to throw DecompressionException instead of CompressionException
- Fixed JdkZlibDecoder to throw DecompressionException instead of CompressionException
- Removed unnecessary empty lines in JdkZlibEncoder and JZlibEncoder
- Removed public modifier from Snappy class
- Added MAX_UNCOMPRESSED_DATA_SIZE constant in SnappyFramedDecoder
- Used in.readableBytes() instead of (in.writerIndex() - in.readerIndex()) in SnappyFramedDecoder
- Added private modifier for enum ChunkType in SnappyFramedDecoder
- Fixed potential bug (sum overflow) in Bzip2HuffmanAllocator.first(). For more info, see http://googleresearch.blogspot.ru/2006/06/extra-extra-read-all-about-it-nearly.html
Result:
Fixed sum overflow in Bzip2HuffmanAllocator, improved exceptions in ZlibDecoder implementations, hid Snappy class
Motivation:
We create a new CompactObjectInputStream with ByteBufInputStream in ObjectDecoder.decode(...) method and don't close this InputStreams before return statement.
Modifications:
Save link to the ObjectInputStream and close it before return statement.
Result:
Close InputStreams and clean up unused resources. It will be better for GC.
Motivation:
LZF compression codec provides sending and receiving data encoded by very fast LZF algorithm.
Modifications:
- Added Compress-LZF library which implements LZF algorithm
- Implemented LzfEncoder which extends MessageToByteEncoder and provides compression of outgoing messages
- Added tests to verify the LzfEncoder and how it can compress data for the next uncompression using the original library
- Implemented LzfDecoder which extends ByteToMessageDecoder and provides uncompression of incoming messages
- Added tests to verify the LzfDecoder and how it can uncompress data after compression using the original library
- Added integration tests for LzfEncoder/Decoder
Result:
Full LZF compression codec which can compress/uncompress data using LZF algorithm.
Motivation:
It's not always the case that there is another handler in the pipeline that will intercept the exceptionCaught event because sometimes users just sub-class. In this case the exception will just hit the end of the pipeline.
Modification:
Throw the TooLongFrameException so that sub-classes can handle it in the exceptionCaught(...) method directly.
Result:
Sub-classes can correctly handle the exception,
Motivation:
Collect all bit-level read operations in one class is better. And now it's easy to use not only in Bzip2Decoder. For example, in Bzip2HuffmanStageDecoder.
Modifications:
Created a new class - Bzip2BitReader which provides bit-level reads.
Removed bit-level read operations from Bzip2Decoder.
Improved javadoc.
Result:
Bzip2BitReader allows the reading of single bit booleans, bit strings of arbitrary length (up to 24 bits), and bit aligned 32-bit integers.
Motivation:
There's no way to recover from a corrupted JSON stream. The current
implementation will raise an infinite exception storm when a peer sends
a large corrupted stream.
Modification:
Discard everything once stream corruption is detected.
Result:
Fixes a buffer leak
Fixes exception storm
Motivation:
See GitHub Issue #2536.
Modifications:
Introduce the class JsonObjectDecoder to split a JSON byte stream
into individual JSON objets/arrays.
Result:
A Netty application can now handle a byte stream where multiple JSON
documents follow eachother as opposed to only a single JSON document
per request.
Motivation:
bytesBefore(length, ...), bytesBefore(index, length, ...), and
indexOf(fromIndex, toIndex,...) in ReplayingDecoderBuffer are buggy.
They trigger 'REPLAY even when they don't need to.
Modification:
Implement the buggy methods properly so that REPLAYs are not triggered
unnecessarily.
Result:
Correct behvaior
Motivation:
At the moment we use a lot of unnecessary memory copies in JdkZlibEncoder. This is caused by either allocate a to small ByteBuf and expand it later or using a temporary byte array.
Beside this the memory footprint of JdkZlibEncoder is pretty high because of the byte[] used for compressing.
Modification:
- Override allocateBuffer(...) and calculate the estimatedsize in there, this reduce expanding of the ByteBuf later
- Not use byte[] in the instance itself but allocate a heap ByteBuf and write directly into the byte array
Result:
Less memory copies and smaller memory footprint
If decompression fails, the buffer that contains the decompressed data
is not released. Bzip2DecoderTest.testStreamCrcError() also does not
release the partial output Bzip2Decoder produces.
Motivation:
MessageToByteEncoder always starts with ByteBuf that use initalCapacity == 0 when preferDirect is used. This is really wasteful in terms of performance as every first write into the buffer will cause an expand of the buffer itself.
Modifications:
- Change ByteBufAllocator.ioBuffer() use the same default initialCapacity as heapBuffer() and directBuffer()
- Add new allocateBuffer method to MessageToByteEncoder that allow the user to do some smarter allocation based on the message that will be encoded.
Result:
Less expanding of buffer and more flexibilty when allocate the buffer for encoding.
Motivation:
The proxy protocol provides client connection information for proxied
network services. Several implementations exist (e.g. Haproxy, Stunnel,
Stud, Postfix), but the primary motivation for this implementation is to
support the proxy protocol feature of Amazon Web Services Elastic Load
Balancing.
Modifications:
This commit adds a proxy protocol decoder for proxy protocol version 1
as specified at:
http://haproxy.1wt.eu/download/1.5/doc/proxy-protocol.txt
The foundation for version 2 support is also in this commit but it is
explicitly NOT supported due to a lack of external implementations to
test against.
Result:
The proxy protocol decoder can be used to send client connection
information to inbound handlers in a channel pipeline from services
which support the proxy protocol.
Motivation:
When Netty runs in a managed environment such as web application server,
Netty needs to provide an explicit way to remove the thread-local
variables it created to prevent class loader leaks.
FastThreadLocal uses different execution paths for storing a
thread-local variable depending on the type of the current thread.
It increases the complexity of thread-local removal.
Modifications:
- Moved FastThreadLocal and FastThreadLocalThread out of the internal
package so that a user can use it.
- FastThreadLocal now keeps track of all thread local variables it has
initialized, and calling FastThreadLocal.removeAll() will remove all
thread-local variables of the caller thread.
- Added FastThreadLocal.size() for diagnostics and tests
- Introduce InternalThreadLocalMap which is a mixture of hard-wired
thread local variable fields and extensible indexed variables
- FastThreadLocal now uses InternalThreadLocalMap to implement a
thread-local variable.
- Added ThreadDeathWatcher.unwatch() so that PooledByteBufAllocator
tells it to stop watching when its thread-local cache has been freed
by FastThreadLocal.removeAll().
- Added FastThreadLocalTest to ensure that removeAll() works
- Added microbenchmark for FastThreadLocal and JDK ThreadLocal
- Upgraded to JMH 0.9
Result:
- A user can remove all thread-local variables Netty created, as long as
he or she did not exit from the current thread. (Note that there's no
way to remove a thread-local variable from outside of the thread.)
- FastThreadLocal exposes more useful operations such as isSet() because
we always implement a thread local variable via InternalThreadLocalMap
instead of falling back to JDK ThreadLocal.
- FastThreadLocalBenchmark shows that this change improves the
performance of FastThreadLocal even more.
Motivation:
JdkZlibDecoder fails to decode because the length of the output buffer is not calculated correctly.
This can cause an IndexOutOfBoundsException or data-corruption when the PooledByteBuffAllocator is used.
Modifications:
Correctly calculate the length
Result:
No more IndexOutOfBoundsException or data-corruption.
Motivation:
We have quite a bit of code duplication between HTTP/1, HTTP/2, SPDY,
and STOMP codec, because they all have a notion of 'headers', which is a
multimap of string names and values.
Modifications:
- Add TextHeaders and its default implementation
- Add AsciiString to replace HttpHeaderEntity
- Borrowed some portion from Apache Harmony's java.lang.String.
- Reimplement HttpHeaders, SpdyHeaders, and StompHeaders using
TextHeaders
- Add AsciiHeadersEncoder to reuse the encoding a TextHeaders
- Used a dedicated encoder for HTTP headers for better performance
though
- Remove shortcut methods in SpdyHeaders
- Replace SpdyHeaders.getStatus() with HttpResponseStatus.parseLine()
Result:
- Removed quite a bit of code duplication in the header implementations.
- Slightly better performance thanks to improved header validation and
hash code calculation
Motivation:
Provide a faster ThreadLocal implementation
Modification:
Add a "FastThreadLocal" which uses an EnumMap and a predefined fixed set of possible thread locals (all of the static instances created by netty) that is around 10-20% faster than standard ThreadLocal in my benchmarks (and can be seen having an effect in the direct PooledByteBufAllocator benchmark that uses the DEFAULT ByteBufAllocator which uses this FastThreadLocal, as opposed to normal instantiations that do not, and in the new RecyclableArrayList benchmark);
Result:
Improved performance
Motivation:
We have different message aggregator implementations for different
protocols, but they are very similar with each other. They all stems
from HttpObjectAggregator. If we provide an abstract class that provide
generic message aggregation functionality, we will remove their code
duplication.
Modifications:
- Add MessageAggregator which provides generic message aggregation
- Reimplement all existing aggregators using MessageAggregator
- Add DecoderResultProvider interface and extend it wherever possible so
that MessageAggregator respects the state of the decoded message
Result:
Less code duplication
Motivation:
At the moment MessageToMessageEncoder uses ctx.write(msg) when have more then one message was produced. This may produce more GC pressure then necessary as when the original ChannelPromise is a VoidChannelPromise we can safely also use one when write messages.
Modifications:
Use VoidChannelPromise when the original ChannelPromise was of this type
Result:
Less object creation and GC pressure
Motivation:
At the moment we call ByteBuf.readBytes(...) in these handlers but with optimizations done as part of 25e0d9d we can just use readSlice(...).retain() and eliminate the memory copy.
Modifications:
Replace ByteBuf.readBytes(...) usage with readSlice(...).retain().
Result:
Less memory copies.
Motivation:
4 and 5 were diverged long time ago and we recently reverted some of the
early commits in master. We must make sure 4.1 and master are not very
different now.
Modification:
Fix found differences
Result:
4.1 and master got closer.
Motivation:
The problem with the current snappy implementation is that it does
not comply with framing format definition found on
https://code.google.com/p/snappy/source/browse/trunk/framing_format.txt
The document describes that chunk type of the stream identifier is defined
as 0xff. The current implentation uses 0x80.
Modifications:
This patch replaces the first byte of the chunk type of the stream identifier
with 0xff.
Result:
After this modification the snappy implementation is compliant to the
framing format described at
https://code.google.com/p/snappy/source/browse/trunk/framing_format.txt.
This results in a better compatibility with other implementations.
Motivation:
When using System.getProperty(...) and various methods to get a ClassLoader it will fail when a SecurityManager is in place.
Modifications:
Use a priveled block if needed. This work is based in the PR #2353 done by @anilsaldhana .
Result:
Code works also when SecurityManager is present
Motivation:
At the moment a user can not safetly call slice().retain() or duplicate.retain()in the ByteToMessageDecoder.decode(...) implementation without the risk to see coruption because we may call discardSomeReadBytes() to make room on the buffer once the handling is done.
Modifications:
Check for the refCnt() before call discardSomeReadBytes() and also check before call decode(...) to create a copy if needed.
Result:
The user can safetly call slice().retain() or duplicate.retain() in his/her ByteToMessageDecoder.decode(...) method.
Motivation:
Reduce memory usage in ProtobufVarint32LengthFieldPrepender.
Modifications:
Explicit set the buffer size that is needed for the header (between 1 and 5 bytes).
Result:
Less memory usage in ProtobufVarint32LengthFieldPrepender.
Motivation:
Remove the synchronization bottleneck and so speed up things
Modifications:
Introduce a ThreadLocal cache that holds mappings between classes of ChannelHandlerAdapater implementations and the result of checking if the @Sharable annotation is present.
This way we only will need to do the real check one time and server the other calls via the cache. A ThreadLocal and WeakHashMap combo is used to implement the cache
as this way we can minimize the conditions while still be sure we not leak class instances in containers.
Result:
Less conditions during adding ChannelHandlerAdapter to the ChannelPipeline
- Related: #2163
- Add ResourceLeakHint to allow a user to provide a meaningful information about the leak when touching it
- DefaultChannelHandlerContext now implements ResourceLeakHint to tell where the message is going.
- Cleaner resource leak report by excluding noisy stack trace elements
- Fixes#2014
- Add the tests that mix JDK ZLIB codec and JZlib codecs
- Fix a bug where JdkZlibEncoder does not encode the GZIP header when nothing was written to te channel
- Fix a bug where the encoders do not consider the overhead of the wrapper format when calculating the estimated compressed output size.
- Fix a bug where the decoders do not discard the received data after the compressed stream is finished
- Fixes#2003 properly
- Instead of using 'bundle' packaging, use 'jar' packaging. This is
more robust because some strict build tools fail to retrieve the
artifacts from a Maven repository unless their packaging is not 'jar'.
- All artifacts now contain META-INF/io.netty.version.properties, which
provides the detailed information about the build and repository.
- Removed OSGi testsuite temporarily because it gives false errors
during split package test and examination.
- Add io.netty.util.Version for easy retrieval of version information
This fixes#1664 and revert also the original commit which was meant to fix it 3b1881b523 . The problem with the original commit was that it could delay handlerRemove(..) calls and so mess up the order or forward bytes to late.
- write() now accepts a ChannelPromise and returns ChannelFuture as most
users expected. It makes the user's life much easier because it is
now much easier to get notified when a specific message has been
written.
- flush() does not create a ChannelPromise nor returns ChannelFuture.
It is now similar to what read() looks like.
- Remove channelReadSuspended because it's actually same with messageReceivedLast
- Rename messageReceived to channelRead
- Rename messageReceivedLast to channelReadComplete
We renamed messageReceivedLast to channelReadComplete because it
reflects what it really is for. Also, we renamed messageReceived to
channelRead for consistency in method names.
I must admit MesageList was pain in the ass. Instead of forcing a
handler always loop over the list of messages, this commit splits
messageReceived(ctx, list) into two event handlers:
- messageReceived(ctx, msg)
- mmessageReceivedLast(ctx)
When Netty reads one or more messages, messageReceived(ctx, msg) event
is triggered for each message. Once the current read operation is
finished, messageReceivedLast() is triggered to tell the handler that
the last messageReceived() was the last message in the current batch.
Similarly, for outbound, write(ctx, list) has been split into two:
- write(ctx, msg)
- flush(ctx, promise)
Instead of writing a list of message with a promise, a user is now
supposed to call write(msg) multiple times and then call flush() to
actually flush the buffered messages.
Please note that write() doesn't have a promise with it. You must call
flush() to get notified on completion. (or you can use writeAndFlush())
Other changes:
- Because MessageList is completely hidden, codec framework uses
List<Object> instead of MessageList as an output parameter.
- 5% improvement in throughput (HelloWorldServer example)
- Made CompositeByteBuf a concrete class (renamed from DefaultCompositeByteBuf) because there's no multiple inheritance in Java
Fixes#1536
- Related: #1378
- They now accept only one argument.
- A user who wants to use a buffer for more complex use cases, he or she can always access the buffer directly via memoryAddress() and array()
The API changes made so far turned out to increase the memory footprint
and consumption while our intention was actually decreasing them.
Memory consumption issue:
When there are many connections which does not exchange data frequently,
the old Netty 4 API spent a lot more memory than 3 because it always
allocates per-handler buffer for each connection unless otherwise
explicitly stated by a user. In a usual real world load, a client
doesn't always send requests without pausing, so the idea of having a
buffer whose life cycle if bound to the life cycle of a connection
didn't work as expected.
Memory footprint issue:
The old Netty 4 API decreased overall memory footprint by a great deal
in many cases. It was mainly because the old Netty 4 API did not
allocate a new buffer and event object for each read. Instead, it
created a new buffer for each handler in a pipeline. This works pretty
well as long as the number of handlers in a pipeline is only a few.
However, for a highly modular application with many handlers which
handles connections which lasts for relatively short period, it actually
makes the memory footprint issue much worse.
Changes:
All in all, this is about retaining all the good changes we made in 4 so
far such as better thread model and going back to the way how we dealt
with message events in 3.
To fix the memory consumption/footprint issue mentioned above, we made a
hard decision to break the backward compatibility again with the
following changes:
- Remove MessageBuf
- Merge Buf into ByteBuf
- Merge ChannelInboundByte/MessageHandler and ChannelStateHandler into ChannelInboundHandler
- Similar changes were made to the adapter classes
- Merge ChannelOutboundByte/MessageHandler and ChannelOperationHandler into ChannelOutboundHandler
- Similar changes were made to the adapter classes
- Introduce MessageList which is similar to `MessageEvent` in Netty 3
- Replace inboundBufferUpdated(ctx) with messageReceived(ctx, MessageList)
- Replace flush(ctx, promise) with write(ctx, MessageList, promise)
- Remove ByteToByteEncoder/Decoder/Codec
- Replaced by MessageToByteEncoder<ByteBuf>, ByteToMessageDecoder<ByteBuf>, and ByteMessageCodec<ByteBuf>
- Merge EmbeddedByteChannel and EmbeddedMessageChannel into EmbeddedChannel
- Add SimpleChannelInboundHandler which is sometimes more useful than
ChannelInboundHandlerAdapter
- Bring back Channel.isWritable() from Netty 3
- Add ChannelInboundHandler.channelWritabilityChanges() event
- Add RecvByteBufAllocator configuration property
- Similar to ReceiveBufferSizePredictor in Netty 3
- Some existing configuration properties such as
DatagramChannelConfig.receivePacketSize is gone now.
- Remove suspend/resumeIntermediaryDeallocation() in ByteBuf
This change would have been impossible without @normanmaurer's help. He
fixed, ported, and improved many parts of the changes.
* This could cause for example corrupt WebSocketFrame's if they was written from the server
to the client directly after it send the handshake response.
- Fixes#1308
freeInboundBuffer() and freeOutboundBuffer() were introduced in the early days of the new API when we did not have reference counting mechanism in the buffer. A user did not want Netty to free the handler buffers had to override these methods.
However, now that we have reference counting mechanism built into the buffer, a user who wants to retain the buffers beyond handler's life cycle can simply return the buffer whose reference count is greater than 1 in newInbound/OutboundBuffer().
- Added a test case that reproduces the problem in ReplayingDecoderTest
- Call newHandler.handlerAdded() *before* oldHandler.handlerRemoved() to ensure newHandlerAdded() is called before forwarding the buffer content of the old handler in replace0().
This change also introduce a few other changes which was needed:
* ChannelHandler.beforeAdd(...) and ChannelHandler.beforeRemove(...) were removed
* ChannelHandler.afterAdd(...) -> handlerAdded(...)
* ChannelHandler.afterRemoved(...) -> handlerRemoved(...)
* SslHandler.handshake() -> SslHandler.hanshakeFuture() as the handshake is triggered automatically after
the Channel becomes active
- Return early when the buffer is empty
- Keep only the number of byte buffers
- Remove unnecessary null check in the loop (because we know buffer is not empty at certain point)