Motivation:
We noticed that the headers implementation in Netty for HTTP/2 uses quite a lot of memory
and that also at least the performance of randomly accessing a header is quite poor. The main
concern however was memory usage, as profiling has shown that a DefaultHttp2Headers
not only use a lot of memory it also wastes a lot due to the underlying hashmaps having
to be resized potentially several times as new headers are being inserted.
This is tracked as issue #3600.
Modifications:
We redesigned the DefaultHeaders to simply take a Map object in its constructor and
reimplemented the class using only the Map primitives. That way the implementation
is very concise and hopefully easy to understand and it allows each concrete headers
implementation to provide its own map or to even use a different headers implementation
for processing requests and writing responses i.e. incoming headers need to provide
fast random access while outgoing headers need fast insertion and fast iteration. The
new implementation can support this with hardly any code changes. It also comes
with the advantage that if the Netty project decides to add a third party collections library
as a dependency, one can simply plug in one of those very fast and memory efficient map
implementations and get faster and smaller headers for free.
For now, we are using the JDK's TreeMap for HTTP and HTTP/2 default headers.
Result:
- Significantly fewer lines of code in the implementation. While the total commit is still
roughly 400 lines less, the actual implementation is a lot less. I just added some more
tests and microbenchmarks.
- Overall performance is up. The current implementation should be significantly faster
for insertion and retrieval. However, it is slower when it comes to iteration. There is simply
no way a TreeMap can have the same iteration performance as a linked list (as used in the
current headers implementation). That's totally fine though, because when looking at the
benchmark results @ejona86 pointed out that the performance of the headers is completely
dominated by insertion, that is insertion is so significantly faster in the new implementation
that it does make up for several times the iteration speed. You can't iterate what you haven't
inserted. I am demonstrating that in this spreadsheet [1]. (Actually, iteration performance is
only down for HTTP, it's significantly improved for HTTP/2).
- Memory is down. The implementation with TreeMap uses on avg ~30% less memory. It also does not
produce any garbage while being resized. In load tests for GRPC we have seen a memory reduction
of up to 1.2KB per RPC. I summarized the memory improvements in this spreadsheet [1]. The data
was generated by [2] using JOL.
- While it was my original intend to only improve the memory usage for HTTP/2, it should be similarly
improved for HTTP, SPDY and STOMP as they all share a common implementation.
[1] https://docs.google.com/spreadsheets/d/1ck3RQklyzEcCLlyJoqDXPCWRGVUuS-ArZf0etSXLVDQ/edit#gid=0
[2] https://gist.github.com/buchgr/4458a8bdb51dd58c82b4
Motivation:
The MqttEncoder was failing to build because it was using a method that doesn't exist.
Modifications:
Change sessionPresent() to isSessionPresent().
Result:
MqttEnccoder is now able to build.
Motivation:
The HttpObjectDecoder is on the hot code path for the http codec. There are a few hot methods which can be modified to improve performance.
Modifications:
- Modify AppendableCharSequence to provide unsafe methods which don't need to re-check bounds for every call.
- Update HttpObjectDecoder methods to take advantage of new AppendableCharSequence methods.
Result:
Peformance boost for decoding http objects.
Motivation:
We don't decrease acquired channel count in FixedChannelPool when timeout occurs by AcquireTimeoutAction.NEW and eventually fails.
Modifications:
Set AcquireTask.acquired=true to call decrementAndRunTaskQueue when timeout action fails.
Result:
Acquired channel count decreases correctly.
Motivation:
Two problems:
1. Decoder assumption that as soon as it finds </ element it can decrement opened xml brackets counter. It can lead to bugs when closing bracket is not in byteBuf yet.
2. Not proper handling of more than two root elements in XML document. First element will be processed properly, second one not. It is caused by assumption that byteBuf readerIndex is 0 at the begging of decoding.
Modifications:
Both problems were resolved by fixes:
1. decrement opened brackets count only if </ > enclosing bracket is found
2. consider readerIndex higher than 0 when counting output frame length
Result:
Both problems were resolved
Motivation:
When looking through the logs for entries pertaining to a specific stream, it's difficult because header entries use the syntax "streamId:<id>" but all other entries use "streamId=<id>". We should make all of the entries consistent.
Modifications:
Changed header entries to use "streamId=<id>" to match the other entries.
Result:
Easier HTTP/2 log navigation.
Motivation:
We should support XXXCollections methods for all primitive map types.
Modifications:
Removed PrimitiveCollections and added a template for XXXCollections.
Result:
Fixes#4001
Motivation:
It would be useful to support the Java `Map` interface in our primitive maps.
Modifications:
Renamed current methods to "pXXX", where p is short for "primitive". Made the template for all primitive maps extend the appropriate Map interface.
Result:
Fixes#3970
Motivation:
We pass-through non ByteBuf when SslHandler.write(...) is called which can lead to have unencrypted data to be send (like for example if a FileRegion is written).
Modifications:
- Fail ChannelPromise with UnsupportedMessageException if a non ByteBuf is written.
Result:
Only allow ByteBuf to be written when using SslHandler.
Motivation:
Remove RC4 from default ciphers as it is not known as secure anymore.
Modifications:
Remove RC4
Result:
Not use an insecure cipher as default.
Motivation:
We missed to correctly count acquired channels in FixedChannelPool which could produce an assert error.
Modifications:
Only try to decrement acquired count if the channel was really acuired.
Result:
No more assert error possible.
Motivation:
HttpToHttp2ConnectionHandler only converts FullHttpMessage to HTTP/2 Frames. This does not support other use cases such as adding a HttpContentCompressor to the pipeline, which writes HttpMessage and HttpContent.
Additionally HttpToHttp2ConnectionHandler ignores converting and sending HTTP trailing headers, which is a bug as the HTTP/2 spec states that they should be sent.
Modifications:
Update HttpToHttp2ConnectionHandler to support converting HttpMessage and HttpContent to HTTP/2 Frames.
Additionally, include an extra call to writeHeaders if the message includes trailing headers
Result:
One can now write HttpMessage and HttpContent (http chunking) down the pipeline and they will be converted to HTTP/2 Frames. If any trailing headers exist, they will be converted and sent as well.
Motivation:
We missed to register for EPOLLRDHUP events when construct the EpollSocketChannel from an existing FileDescriptor. This could cause to miss connection-resets.
Modifications:
Add Native.EPOLLRDHUP to the events we are interested in.
Result:
Connection-resets are detected correctly.
Motivation:
The bufferingNewStreamFailsAfterGoAwayReceived method currently causes an NPE.
Modifications:
Fixed the test so that a valid ByteBuf is passed in.
Result:
The test no longer throws an NPE.
Motivation:
Prior we used a purge task that would remove previous canceled scheduled tasks from the internal queue. This could introduce some delay and so use a lot of memory even if the task itself is already canceled.
Modifications:
Schedule removal of task from queue via EventLoop if cancel operation is not done in the EventLoop Thread or just remove directly if the Thread that cancels the scheduled task is in the EventLoop.
Result:
Faster possibility to GC a canceled ScheduledFutureTask.
Motivation:
DnsResolver.resolve(...) fails when an InetSocketAddress is used that was constructed of an ipaddress string.
Modifications:
Don't try to lookup when the InetSocketAddress was constructed via an ipaddress.
Result:
DnsResolver.resolve(...) works in all cases.
Motivation:
It is currently assumed that all usages of the HTTP/2 codec will be from the same event loop context. If the methods are used outside of the assumed thread context then unexpected behavior is observed. This assumption should be more clearly communicated and enforced in key areas.
Modifications:
- The flow controller interfaces have assert statements and updated javadocs indicating the assumptions.
Result:
Interfaces more clearly indicate thread context limitations.
Motivation:
If the Channel is already closed when the PendingWriteQueue is created it will generate a NPE when add or remove is called later.
Modifications:
Add null checks to guard against NPE.
Result:
No more NPE possible.
Motivation:
The CompressorHttp2ConnectionEncoder is attempting to attach a property to streams before the exist.
Modifications:
- Allow the super class to create the streams before attempting to attach a property to the stream.
Result:
CompressorHttp2ConnectionEncoder is able to set the property and access the compressor.
Motivation:
WebSocketServerHandshakerFactory.sendUnsupportedVersionResponse does not
send a LastHttpContent, nor does it flush, and it doesn't send a content
length.
Modifications:
Changed sendUnsupportedVersionResponse to send FullHttpResponse, to
writeAndFlush, and to set a content length of 0. Also added a test for
this method.
Result:
Upstream handlers will be able to determine the end of the response, the
response will actually get written to the client, and the client will be
able to determine the end of the response.
Motivation:
See #3783
Modifications:
- The DefaultHttp2RemoteFlowController should use Channel.isWritable() before attempting to do any write operations.
- The Flow controller methods should no longer take ChannelHandlerContext. The concept of flow control is tied to a connection and we do not support 1 flow controller keeping track of multiple ChannelHandlerContext.
Result:
Writes are delayed until isWritable() is true. Flow controller interface methods are more clear as to ChannelHandlerContext restrictions.
Motivation:
Test was leaving composite buffers taken from the queue unreleased.
Modifications:
Make the test release buffers.
Result:
Nagging about leaked buffers should stop.
Motivation:
The MAX_HEADER_LIST_SIZE of SETTINGS is represented by
unsigned 32-bit value and this value isn't limited in RFC7540.
But in current implementation, its stored to int variable so
over 2^31-1 value is recognized as minus and handled as
PROTOCOL_ERROR.
Modifications:
If a value of MAX_HEADER_LIST_SIZE is larger than 2^31-1, its
handled as 2^31-1
Result:
Over 2^31-1 MAX_HEADER_LIST_SIZE is became acceptable
Motivation:
Slicing a mutable CompositeByteBuf is not the appropriate mechanism to use to track and release buffers that have been written to a channel.
In particular buffers passed over an Embedded or LocalChannel are retained after the ChannelPromise is completed and listening to the
promise to consolidate a CompositeBuffer breaks slices taken from the composite as the offset indices have changed.
In addition CoalescingBufferQueue handles taking arbitrarily sized slices of a sequence of buffers more efficiently.
Modifications:
Convert FlowControlledData to use a CoalescingBufferQueue to handle merging data writes.
Result:
HTTP2 works over LocalChannel and code is considerably simpler.
Motivation:
Sometimes people use a data frame with length 0 to end a stream(such as jetty http2-server). So it is possible that data.readableBytes and padding are all 0 for a data frame, and cause an IllegalArgumentException when calling flowController.consumeBytes.
Modifications:
Return false when numBytes == 0 instead of throwing IllegalArgumentException.
Result:
Fix IllegalArgumentException.
Motivation:
Simplifies writing code that needs to merge or slice a sequence of buffer & promise pairs into chunks of arbitrary sizes.
For example in HTTP2 we merge or split buffers across fixed-size DATA frame boundaries.
Modifications:
Add new utility class CoalescingBufferQueue
Result:
Following this change HTTP2 code will switch to use it instead of CompositeByteBuffer for DATA frame coalescing.
Motivation:
The problem is described in https://github.com/grpc/grpc-java/issues/605. Basically, when using `StreamBufferingEncoder` there is a chance of creating zombie streams that never get closed.
Modifications:
Change `Http2ConnectionHandler`'s `channelInactive` handling logic to shutdown the encoder/decoder before shutting down the active streams.
Result:
Fixes https://github.com/grpc/grpc-java/issues/605
Motivation:
While cherry-picked 11f9e9084b I changed the EmbeddedChannel implementation to not allow no ChannelHandlers when constructing it.
This was done by mistake.
Modifications:
Revert change and add unit test.
Result:
Restore old behavior.
Motivation:
When using an EmbeddedChannel often it either does inbound or outbound processing which means we only often need one queue.
Modifications:
Lazy init the inbound and outbound message queues.
Result:
Less memory usage.
Motivation:
When we detect a BUFFER_OVERFLOW we should just forward the already produced data and allocate a new buffer and NOT do any extra memory copies while trying to expand the buffer.
Modifications:
When a BUFFER_OVERFLOW is returned and some data was produced just fire this data through the pipeline and allocate a new buffer to read again.
Result:
Less memorycopies and so better performance.
Motivation:
A SSL_read is needed to ensure the bio buffer is flushed, for this we did a priming read. This can be removed in many cases. Also ensure we always fill as much as possible in the destination buffers.
Modifications:
- Only do priming read if capacity of all dsts buffers is zero
- Always produce as must data as possible in the dsts buffers.
Result:
Faster code.
Motivation:
Previous we called BIO_write until either everything was written into it or it returned an error, which meant that the buffer is full. This then needed a ERR_clear_error() call which is expensive.
Modifications:
Break out of writing loop once we detect that not everything was written and so the buffer is full.
Result:
Less overhead when writing more data then the internal buffer can take.
Motivation:
When BIO_write is called with an empty buffer it will return 0 for which we call ERR_clear_error(). This is not neccessary as we should just skip these buffers. This eliminates a lot of overhead.
Modifications:
Skip empty src buffers when call unwrap(...).
Result:
Less overhead for unwrap(...) when called with empty buffers.
Motivation:
Http2ConnectionHandler missed to forward channelReadComplete(...) events.
Modifications:
Ensure we notify the next handler in the pipeline via ctx.fireChannelReadComplete().
Result:
Correctly forwarding of event.
Motivation:
The HTTP/2 hello world example server should be expecting a FullHttpRequest when falling back to HTTP/1.x mode.
Modifications:
- HelloWorldHttp1Handler should process FullHttpRequestObjects
- Http2ServerInitializer should insert an HttpObjectAggregator into the pipeline if no upgrade was attempted
Result:
Responses from the HelloWorldHttp1Handler should only come after full HTTP requests are received.
Motivation:
The Http2FrameListener requires that the Http2FrameReader accumulate ByteBuf objects. There is currently no way to limit the amount of bytes that is accumulated.
Motiviation:
- DefaultHttp2FrameReader supports maxHeaderSize which will fail fast as soon as the maximum size is exceeded.
- DefaultHttp2HeadersDecoder will respect the return value of endHeaderBlock() and fail if the max size is exceeded.
Result:
Frames which carry header data now respect a maximum number of bytes that can be accumulated.
Motivation:
If server sends SETTINGS with ENABLE_PUSH, its handled as
PROTOCOL_ERROR in spite of the value. But the value specified to
0 may be allowed in RFC7540.
Modifications:
Check whether ENABLE_PUSH sent from a server is 0 or not.
Result:
When server specifies ENABLE_PUSH to 0 explicitly, client doesn't
handle it as PROTOCOL_ERROR.
Motivation:
If a user tries to access various informations on the OpenSslSession after the SSLEngine was closed it will not work if these were not accessed before as we lazy init most of them.
Modifications:
Directly populate the whole OpenSslSession once the handshake is complete and before the user is notified about it.
Result:
OpenSslSession informations are avaible until it is GC'ed.
Motivation:
At the moment we directly closed the Channel when an exception accoured durring initChannel(...) without giving the user any way to do extra or special handling.
Modifications:
Handle the exception in exceptionCaught(...) of the ChannelInitializer which will by default log and close the Channel. This way the user can override this.
Result:
More felixible handling of exceptions.
Motivation:
Currently in EmbeddedChannel we add the ChannelHandlers before the Channel is registered which leads to have the handlerAdded(...) callback
be called from outside the EventLoop and also prevent the user to obtain a reference to the EventLoop in the callback itself.
Modifications:
Delay adding ChannelHandlers until EmbeddedChannel is registered.
Result:
Correctly call handlerAdded(...) after EmbeddedChannel is registered.
Motivation:
If you set a ChannelHandler via ServerBootstrap.handler(...) it is added to the ChannelPipeline before the Channel is registered. This will lead to and IllegalStateException if a user tries to access the EventLoop in the ChannelHandler.handlerAdded(...) method.
Modifications:
Delay the adding of the ChannelHandler until the Channel was registered.
Result:
No more IllegalStateException.
Motivation:
As we modify the position of the passed in ByteBuffer's this methods are not thread-safe.
Modifications:
Duplicate the input ByteBuffers before copy the content to byte[].
Result:
Unpooled.copiedBuffer(ByteBuffer) and copiedBuffer(ByteBuffer...) is now thread-safe.
Motivation:
The javadoc of ByteBuf contained some out-dated code.
Modifications:
Update code example in javadoc to use netty 4+ API
Result:
Correct javadocs
Motivation:
FixedCompositeByteBuf does not properly implement a number of methods for
copying its content to direct buffers and output streams
Modifications:
Replace improper use of capacity() with readableBytes() when computing offesets during writes
Result:
Copying works correctly