Motivation:
We not set any optimization flag when compile native transport
Modification:
Add -O3 to CFLAGS to have GCC do optimizations
Result:
Ship optimized native code
Motivation:
Sometimes the user already has a PrivateKey / X509Certificate which should be used to create a new SslContext. At the moment we only allow to construct it via Files.
Modifications:
- Add new methods to the SslContextBuilder to allow creating a SslContext from PrivateKey / X509Certificate
- Mark all public constructors of *SslContext as @Deprecated, the user should use SslContextBuilder
- Update tests to us SslContextBuilder.
Result:
Creating of SslContext is possible with PrivateKay/X509Certificate
Motivation:
The acquire channel function resulted in calling itself several times in case when channel polled from the pool queue was unhealthy, which resulted FixedChannelPool to be called several times which in it's turn caused FixedChannelPool.acquire() to be called and resulted into acquireChannelCount to be unnecessary increased.
Example use case:
1) Create FixedChannelPool instance with one channel in the pool: new FixedChannelPool(cb, handler, 1)
2) Acquire channel A from the pool
3) close the channel A
4) Return it back to the pool
5) Acquire channel from the same pool again
Expected result:
new channel created and acquired, channel A that has been closed discarded and removed from the pool from being unhealthy
Actual result:
Channel A had been removed from the pool, how ever the new channel had never be acquired, instead the request to acquire had been added to the pending queue in FixedChannelPool and the acquireChannelCount is increased by one. The reason is that at the time when SimpleChannelPool figured out that the channel was unhealthy called FixedChannelPool.acquire to try to acquire new channel, how ever the request was added to the pendingTakQueue because by the time when FixedChannelPool.acquire was called, the acquireChannelCount was already "1" so new channel ould not be created cause of maxChannelsLimit=1.
Modifications:
The suggested approach modifies the SimpleChannelPool in a way so that when channel detected to be unhealthy it calls private method SimpleChannelPool.acquireHealthyFromPoolOrNew() which guarantees that SimpleChannelPool actually either finds a healthy channel in the pool and returns it or causes the promise.cause() in case when new channel was failed to be created.
Result:
The ```acquiredChannelCount``` is now calculated correctly as a result of SimpleChannelPool.acquire() of not being recursive on overridable acquire method.
Motivation:
Even though MemoryRegionCache$Entry instances are allocated through a recycler they are not properly recycled,
leaving a lot of instances to be GCed along with Recycler$DefaultHandle objects.
Fixes#4071
Modification:
Recycle Entry when done using it.
Result:
Less GCed objects.
Motiviation:
The current read loops don't fascilitate reading a maximum amount of bytes. This capability is useful to have more fine grain control over how much data is injested.
Modifications:
- Add a setMaxBytesPerRead(int) and getMaxBytesPerRead() to ChannelConfig
- Add a setMaxBytesPerIndividualRead(int) and getMaxBytesPerIndividualRead to ChannelConfig
- Add methods to RecvByteBufAllocator so that a pluggable scheme can be used to control the behavior of the read loop.
- Modify read loop for all transport types to respect the new RecvByteBufAllocator API
Result:
The ability to control how many bytes are read for each read operation/loop, and a more extensible read loop.
Motivation:
In the event an HTTP message does not include either a content-length or a transfer-encoding header [RFC 7230](https://tools.ietf.org/html/rfc7230#section-3.3.3) states the behavior must be treated differently for requests and responses. If the channel is half closed then the HttpObjectDecoder is not invoking decodeLast and thus not checking if messages should be sent up the pipeline.
Modifications:
- Add comments to clarify regular decode default case.
- Handle the ChannelInputShutdownEvent in the HttpObjectDecoder and evaluate if messages need to be generated.
Result:
Messages are generated on half closed, and comments clarify existing logic.
Motivation:
We noticed that the headers implementation in Netty for HTTP/2 uses quite a lot of memory
and that also at least the performance of randomly accessing a header is quite poor. The main
concern however was memory usage, as profiling has shown that a DefaultHttp2Headers
not only use a lot of memory it also wastes a lot due to the underlying hashmaps having
to be resized potentially several times as new headers are being inserted.
This is tracked as issue #3600.
Modifications:
We redesigned the DefaultHeaders to simply take a Map object in its constructor and
reimplemented the class using only the Map primitives. That way the implementation
is very concise and hopefully easy to understand and it allows each concrete headers
implementation to provide its own map or to even use a different headers implementation
for processing requests and writing responses i.e. incoming headers need to provide
fast random access while outgoing headers need fast insertion and fast iteration. The
new implementation can support this with hardly any code changes. It also comes
with the advantage that if the Netty project decides to add a third party collections library
as a dependency, one can simply plug in one of those very fast and memory efficient map
implementations and get faster and smaller headers for free.
For now, we are using the JDK's TreeMap for HTTP and HTTP/2 default headers.
Result:
- Significantly fewer lines of code in the implementation. While the total commit is still
roughly 400 lines less, the actual implementation is a lot less. I just added some more
tests and microbenchmarks.
- Overall performance is up. The current implementation should be significantly faster
for insertion and retrieval. However, it is slower when it comes to iteration. There is simply
no way a TreeMap can have the same iteration performance as a linked list (as used in the
current headers implementation). That's totally fine though, because when looking at the
benchmark results @ejona86 pointed out that the performance of the headers is completely
dominated by insertion, that is insertion is so significantly faster in the new implementation
that it does make up for several times the iteration speed. You can't iterate what you haven't
inserted. I am demonstrating that in this spreadsheet [1]. (Actually, iteration performance is
only down for HTTP, it's significantly improved for HTTP/2).
- Memory is down. The implementation with TreeMap uses on avg ~30% less memory. It also does not
produce any garbage while being resized. In load tests for GRPC we have seen a memory reduction
of up to 1.2KB per RPC. I summarized the memory improvements in this spreadsheet [1]. The data
was generated by [2] using JOL.
- While it was my original intend to only improve the memory usage for HTTP/2, it should be similarly
improved for HTTP, SPDY and STOMP as they all share a common implementation.
[1] https://docs.google.com/spreadsheets/d/1ck3RQklyzEcCLlyJoqDXPCWRGVUuS-ArZf0etSXLVDQ/edit#gid=0
[2] https://gist.github.com/buchgr/4458a8bdb51dd58c82b4
Motivation:
The DataCompressionHttp2Test was exiting prematurely leading to unit test failures.
Modifications:
- Fix the race condition so the test does not evaluate final conditions until all expected events occur
Result:
Unit test no longer fails
Motivation:
The SPDY spec requires that all header names be lowercase (see https://www.chromium.org/spdy/spdy-protocol/spdy-protocol-draft3-1#TOC-3.2-HTTP-Request-Response). The SPDY codec header name validator does not enforce this requirement.
Modifications:
- SpdyCodecUtil.validateHeaderName should check for upper case characters and throw an error if any are found.
Result:
SPDY codec header validation enforces specification requirement.
Motivation:
IP_FREEBIND allows to bind to addresses without the address up yet or even the interface configured yet.
Modifications:
Add support for IP_FREEBIND.
Result:
It's now possible to use IP_FREEBIND when using the native epoll transport.
Motivation:
The MqttEncoder was failing to build because it was using a method that doesn't exist.
Modifications:
Change sessionPresent() to isSessionPresent().
Result:
MqttEnccoder is now able to build.
Motivation:
The HttpObjectDecoder is on the hot code path for the http codec. There are a few hot methods which can be modified to improve performance.
Modifications:
- Modify AppendableCharSequence to provide unsafe methods which don't need to re-check bounds for every call.
- Update HttpObjectDecoder methods to take advantage of new AppendableCharSequence methods.
Result:
Peformance boost for decoding http objects.
Motivation:
We don't decrease acquired channel count in FixedChannelPool when timeout occurs by AcquireTimeoutAction.NEW and eventually fails.
Modifications:
Set AcquireTask.acquired=true to call decrementAndRunTaskQueue when timeout action fails.
Result:
Acquired channel count decreases correctly.
Motivation:
Two problems:
1. Decoder assumption that as soon as it finds </ element it can decrement opened xml brackets counter. It can lead to bugs when closing bracket is not in byteBuf yet.
2. Not proper handling of more than two root elements in XML document. First element will be processed properly, second one not. It is caused by assumption that byteBuf readerIndex is 0 at the begging of decoding.
Modifications:
Both problems were resolved by fixes:
1. decrement opened brackets count only if </ > enclosing bracket is found
2. consider readerIndex higher than 0 when counting output frame length
Result:
Both problems were resolved
Motivation:
When looking through the logs for entries pertaining to a specific stream, it's difficult because header entries use the syntax "streamId:<id>" but all other entries use "streamId=<id>". We should make all of the entries consistent.
Modifications:
Changed header entries to use "streamId=<id>" to match the other entries.
Result:
Easier HTTP/2 log navigation.
Motivation:
We should support XXXCollections methods for all primitive map types.
Modifications:
Removed PrimitiveCollections and added a template for XXXCollections.
Result:
Fixes#4001
Motivation:
It would be useful to support the Java `Map` interface in our primitive maps.
Modifications:
Renamed current methods to "pXXX", where p is short for "primitive". Made the template for all primitive maps extend the appropriate Map interface.
Result:
Fixes#3970
Motivation:
We pass-through non ByteBuf when SslHandler.write(...) is called which can lead to have unencrypted data to be send (like for example if a FileRegion is written).
Modifications:
- Fail ChannelPromise with UnsupportedMessageException if a non ByteBuf is written.
Result:
Only allow ByteBuf to be written when using SslHandler.
Motivation:
Remove RC4 from default ciphers as it is not known as secure anymore.
Modifications:
Remove RC4
Result:
Not use an insecure cipher as default.
Motivation:
We missed to correctly count acquired channels in FixedChannelPool which could produce an assert error.
Modifications:
Only try to decrement acquired count if the channel was really acuired.
Result:
No more assert error possible.
Motivation:
HttpToHttp2ConnectionHandler only converts FullHttpMessage to HTTP/2 Frames. This does not support other use cases such as adding a HttpContentCompressor to the pipeline, which writes HttpMessage and HttpContent.
Additionally HttpToHttp2ConnectionHandler ignores converting and sending HTTP trailing headers, which is a bug as the HTTP/2 spec states that they should be sent.
Modifications:
Update HttpToHttp2ConnectionHandler to support converting HttpMessage and HttpContent to HTTP/2 Frames.
Additionally, include an extra call to writeHeaders if the message includes trailing headers
Result:
One can now write HttpMessage and HttpContent (http chunking) down the pipeline and they will be converted to HTTP/2 Frames. If any trailing headers exist, they will be converted and sent as well.
Motivation:
We missed to register for EPOLLRDHUP events when construct the EpollSocketChannel from an existing FileDescriptor. This could cause to miss connection-resets.
Modifications:
Add Native.EPOLLRDHUP to the events we are interested in.
Result:
Connection-resets are detected correctly.
Motivation:
The bufferingNewStreamFailsAfterGoAwayReceived method currently causes an NPE.
Modifications:
Fixed the test so that a valid ByteBuf is passed in.
Result:
The test no longer throws an NPE.
Motivation:
Prior we used a purge task that would remove previous canceled scheduled tasks from the internal queue. This could introduce some delay and so use a lot of memory even if the task itself is already canceled.
Modifications:
Schedule removal of task from queue via EventLoop if cancel operation is not done in the EventLoop Thread or just remove directly if the Thread that cancels the scheduled task is in the EventLoop.
Result:
Faster possibility to GC a canceled ScheduledFutureTask.
Motivation:
DnsResolver.resolve(...) fails when an InetSocketAddress is used that was constructed of an ipaddress string.
Modifications:
Don't try to lookup when the InetSocketAddress was constructed via an ipaddress.
Result:
DnsResolver.resolve(...) works in all cases.
Motivation:
It is currently assumed that all usages of the HTTP/2 codec will be from the same event loop context. If the methods are used outside of the assumed thread context then unexpected behavior is observed. This assumption should be more clearly communicated and enforced in key areas.
Modifications:
- The flow controller interfaces have assert statements and updated javadocs indicating the assumptions.
Result:
Interfaces more clearly indicate thread context limitations.
Motivation:
If the Channel is already closed when the PendingWriteQueue is created it will generate a NPE when add or remove is called later.
Modifications:
Add null checks to guard against NPE.
Result:
No more NPE possible.
Motivation:
The CompressorHttp2ConnectionEncoder is attempting to attach a property to streams before the exist.
Modifications:
- Allow the super class to create the streams before attempting to attach a property to the stream.
Result:
CompressorHttp2ConnectionEncoder is able to set the property and access the compressor.
Motivation:
WebSocketServerHandshakerFactory.sendUnsupportedVersionResponse does not
send a LastHttpContent, nor does it flush, and it doesn't send a content
length.
Modifications:
Changed sendUnsupportedVersionResponse to send FullHttpResponse, to
writeAndFlush, and to set a content length of 0. Also added a test for
this method.
Result:
Upstream handlers will be able to determine the end of the response, the
response will actually get written to the client, and the client will be
able to determine the end of the response.
Motivation:
See #3783
Modifications:
- The DefaultHttp2RemoteFlowController should use Channel.isWritable() before attempting to do any write operations.
- The Flow controller methods should no longer take ChannelHandlerContext. The concept of flow control is tied to a connection and we do not support 1 flow controller keeping track of multiple ChannelHandlerContext.
Result:
Writes are delayed until isWritable() is true. Flow controller interface methods are more clear as to ChannelHandlerContext restrictions.
Motivation:
Test was leaving composite buffers taken from the queue unreleased.
Modifications:
Make the test release buffers.
Result:
Nagging about leaked buffers should stop.
Motivation:
The MAX_HEADER_LIST_SIZE of SETTINGS is represented by
unsigned 32-bit value and this value isn't limited in RFC7540.
But in current implementation, its stored to int variable so
over 2^31-1 value is recognized as minus and handled as
PROTOCOL_ERROR.
Modifications:
If a value of MAX_HEADER_LIST_SIZE is larger than 2^31-1, its
handled as 2^31-1
Result:
Over 2^31-1 MAX_HEADER_LIST_SIZE is became acceptable
Motivation:
Slicing a mutable CompositeByteBuf is not the appropriate mechanism to use to track and release buffers that have been written to a channel.
In particular buffers passed over an Embedded or LocalChannel are retained after the ChannelPromise is completed and listening to the
promise to consolidate a CompositeBuffer breaks slices taken from the composite as the offset indices have changed.
In addition CoalescingBufferQueue handles taking arbitrarily sized slices of a sequence of buffers more efficiently.
Modifications:
Convert FlowControlledData to use a CoalescingBufferQueue to handle merging data writes.
Result:
HTTP2 works over LocalChannel and code is considerably simpler.
Motivation:
Sometimes people use a data frame with length 0 to end a stream(such as jetty http2-server). So it is possible that data.readableBytes and padding are all 0 for a data frame, and cause an IllegalArgumentException when calling flowController.consumeBytes.
Modifications:
Return false when numBytes == 0 instead of throwing IllegalArgumentException.
Result:
Fix IllegalArgumentException.
Motivation:
Simplifies writing code that needs to merge or slice a sequence of buffer & promise pairs into chunks of arbitrary sizes.
For example in HTTP2 we merge or split buffers across fixed-size DATA frame boundaries.
Modifications:
Add new utility class CoalescingBufferQueue
Result:
Following this change HTTP2 code will switch to use it instead of CompositeByteBuffer for DATA frame coalescing.
Motivation:
The problem is described in https://github.com/grpc/grpc-java/issues/605. Basically, when using `StreamBufferingEncoder` there is a chance of creating zombie streams that never get closed.
Modifications:
Change `Http2ConnectionHandler`'s `channelInactive` handling logic to shutdown the encoder/decoder before shutting down the active streams.
Result:
Fixes https://github.com/grpc/grpc-java/issues/605
Motivation:
While cherry-picked 11f9e9084b I changed the EmbeddedChannel implementation to not allow no ChannelHandlers when constructing it.
This was done by mistake.
Modifications:
Revert change and add unit test.
Result:
Restore old behavior.
Motivation:
When using an EmbeddedChannel often it either does inbound or outbound processing which means we only often need one queue.
Modifications:
Lazy init the inbound and outbound message queues.
Result:
Less memory usage.
Motivation:
When we detect a BUFFER_OVERFLOW we should just forward the already produced data and allocate a new buffer and NOT do any extra memory copies while trying to expand the buffer.
Modifications:
When a BUFFER_OVERFLOW is returned and some data was produced just fire this data through the pipeline and allocate a new buffer to read again.
Result:
Less memorycopies and so better performance.
Motivation:
A SSL_read is needed to ensure the bio buffer is flushed, for this we did a priming read. This can be removed in many cases. Also ensure we always fill as much as possible in the destination buffers.
Modifications:
- Only do priming read if capacity of all dsts buffers is zero
- Always produce as must data as possible in the dsts buffers.
Result:
Faster code.
Motivation:
Previous we called BIO_write until either everything was written into it or it returned an error, which meant that the buffer is full. This then needed a ERR_clear_error() call which is expensive.
Modifications:
Break out of writing loop once we detect that not everything was written and so the buffer is full.
Result:
Less overhead when writing more data then the internal buffer can take.
Motivation:
When BIO_write is called with an empty buffer it will return 0 for which we call ERR_clear_error(). This is not neccessary as we should just skip these buffers. This eliminates a lot of overhead.
Modifications:
Skip empty src buffers when call unwrap(...).
Result:
Less overhead for unwrap(...) when called with empty buffers.
Motivation:
Http2ConnectionHandler missed to forward channelReadComplete(...) events.
Modifications:
Ensure we notify the next handler in the pipeline via ctx.fireChannelReadComplete().
Result:
Correctly forwarding of event.