Motivation
The nice change made by @carl-mastrangelo in #9307 for lookup-table
based HPACK Huffman decoding can be simplified a little to remove the
separate flags field and eliminate some intermediate operations.
Modification
Simplify HpackHuffmanDecoder::decode logic including de-dup of the
per-nibble part.
Result
Less code, possibly better performance though not noticeable in a quick
benchmark.
Motivation:
We don't need the extra ChannelPromise when writing headers anymore in Http2FrameCodec. This also means we cal re-use a ChannelFutureListener and so not need to create new instances all the time.
Modifications:
- Just pass the original ChannelPromise when writing headers
- Reuse the ChannelFutureListener
Result:
Two less objects created when writing headers for an not-yet created stream.
Motivation:
The previous used maxHeaderListSize was too low which resulted in exceptions during the benchmark run:
```
io.netty.handler.codec.http2.Http2Exception: Header size exceeded max allowed size (8192)
at io.netty.handler.codec.http2.Http2Exception.connectionError(Http2Exception.java:103)
at io.netty.handler.codec.http2.Http2Exception.headerListSizeError(Http2Exception.java:188)
at io.netty.handler.codec.http2.Http2CodecUtil.headerListSizeExceeded(Http2CodecUtil.java:231)
at io.netty.handler.codec.http2.HpackDecoder$Http2HeadersSink.finish(HpackDecoder.java:545)
at io.netty.handler.codec.http2.HpackDecoder.decode(HpackDecoder.java:132)
at io.netty.handler.codec.http2.HpackDecoderBenchmark.decode(HpackDecoderBenchmark.java:85)
at io.netty.handler.codec.http2.generated.HpackDecoderBenchmark_decode_jmhTest.decode_thrpt_jmhStub(HpackDecoderBenchmark_decode_jmhTest.java:120)
at io.netty.handler.codec.http2.generated.HpackDecoderBenchmark_decode_jmhTest.decode_Throughput(HpackDecoderBenchmark_decode_jmhTest.java:83)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:453)
at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:437)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
```
Also we should ensure we only use ascii for header names.
Modifications:
Just use Integer.MAX_VALUE as limit
Result:
Be able to run benchmark without exceptions
Motivation:
There are is some unnecessary code (like toString() calls) which can be cleaned up.
Modifications:
- Remove not needed toString() calls
- Simplify subString(...) calls
- Remove some explicit casts when not needed.
Result:
Cleaner code
Motivation:
ff0045e3e1 changed HpackHuffmanDecoder to use a lookup-table which greatly improved performance. We can squeeze out another 3% win by using an ByteProcessor which will reduce the number of bound-checks / reference-count-checks needed by processing byte-by-byte.
Modifications:
Implement logic with ByteProcessor
Result:
Another ~3% perf improvement which shows up when using h2load to simulate load.
`h2load -c 100 -m 100 --duration 60 --warm-up-time 10 http://127.0.0.1:8080`
Before:
```
finished in 70.02s, 620051.67 req/s, 20.70MB/s
requests: 37203100 total, 37203100 started, 37203100 done, 37203100 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 37203100 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 1.21GB (1302108500) total, 41.84MB (43872600) headers (space savings 90.00%), 460.24MB (482598600) data
min max mean sd +/- sd
time for request: 404us 24.52ms 15.93ms 1.45ms 87.90%
time for connect: 0us 0us 0us 0us 0.00%
time to 1st byte: 0us 0us 0us 0us 0.00%
req/s : 6186.64 6211.60 6199.00 5.18 65.00%
```
With this change:
```
finished in 70.02s, 642103.33 req/s, 21.43MB/s
requests: 38526200 total, 38526200 started, 38526200 done, 38526200 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 38526200 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 1.26GB (1348417000) total, 42.39MB (44444900) headers (space savings 90.00%), 466.25MB (488893900) data
min max mean sd +/- sd
time for request: 370us 24.89ms 15.52ms 1.35ms 88.02%
time for connect: 0us 0us 0us 0us 0.00%
time to 1st byte: 0us 0us 0us 0us 0.00%
req/s : 6407.06 6435.19 6419.74 5.62 67.00%
```
Motivation:
In the latest release we introduced Http2MultiplexHandler as a replacement of Http2MultiplexCodec. This did split the frame parsing from the multiplexing to allow a more flexible way to handle frames and to make the code cleaner. Unfortunally we did miss to special handle this in Http2ServerUpgradeCodec and so did not correctly add Http2MultiplexHandler to the pipeline before calling Http2FrameCodec.onHttpServerUpgrade(...). This did lead to the situation that we did not correctly receive the event on the Http2MultiplexHandler and so did not correctly created the Http2StreamChannel for the upgrade stream. Because of this we ended up with an NPE if a frame was dispatched to the upgrade stream later on.
Modifications:
- Correctly add Http2MultiplexHandler to the pipeline before calling Http2FrameCodec.onHttpServerUpgrade(...)
- Add unit test
Result:
Fixes https://github.com/netty/netty/issues/9314.
Motivation:
2c99fc0f12 introduced a change that eagly recycles the queue. Unfortunally it did not correct protect against re-entrance which can cause a NPE.
Modifications:
- Correctly protect against re-entrance by adding null checks
- Add unit test
Result:
Fixes https://github.com/netty/netty/issues/9319.
Motivation:
When decoding DnsRecord, if the record contains compression pointers, and not all compression pointers are decompressed, but part of the pointers are decompressed. Then when encoding the record, the compressed pointer will point to the wrong location, resulting in bad label problem.
Modification:
Pre-decompressed record RData that may contain compression pointers.
Result:
Fixes#8962
* Correctly take length of ByteBufInputStream into account for readLine() / readByte()
Motivation:
ByteBufInputStream did not correctly take the length into account when validate bounds for readLine() / readByte() which could lead to read more then allowed.
Modifications:
- Correctly take length into account
- Add unit tests
- Fix existing unit test
Result:
Correctly take length of ByteBufInputStream into account.
Related to https://github.com/netty/netty/pull/9306.
On servers with many pipelines or dynamic pipelines, it is easy for end user to make mistake during pipeline configuration. Current message:
`Discarded inbound message PooledUnsafeDirectByteBuf(ridx: 0, widx: 2, cap: 2) that reached at the tail of the pipeline. Please check your pipeline configuration.`
Is not always meaningful and doesn't allow to find the wrong pipeline quickly.
Modification:
Added additional log placeholder that identifies pipeline handlers and channel info. This will allow for the end users quickly find the problem pipeline.
Result:
Meaningful warning when the message reaches the end of the pipeline. Fixes#7285
Motivation:
buffer.isReadable() should not be used to limit the amount of data that can be read as the amount may be less then was is readable.
Modification:
- Use available() which takes the length into account
- Add unit test
Result:
Fixes https://github.com/netty/netty/issues/9305
Motivation:
The AbstractSniHandler previously was willing to tolerate up to three
non-handshake records before a ClientHello that contained an SNI
extension field. This is, so far as I can tell, completely
unnecessary: no TLS implementation will be sending alerts or change
cipher spec messages before ClientHello.
Given that it was not possible to determine why this loop is in
the code to begin with, it's probably just best to remove it.
Modifications:
Remove the for loop.
Result:
The AbstractSniHandler will more rapidly determine whether it should
pass the records on to the default SSL handler.
Co-authored-by: Norman Maurer <norman_maurer@apple.com>
Motivation:
Fix the issue of incorrectly calculating the number of dump rows when using prettyHexDumpmethod in ByteBufUtil. The way to find the remainder is either length % 16 or length & 15
Modification:
Fixed the way to calculate the remainder
Result:
Fixed#9301
Motivation:
Compiling with -Werror,-Wuninitialized complains about the sockaddrs being uninitialized.
I believe this is because the init function netty_unix_socket_initSockaddr is in a
separate compilation unit. Since this code isn't on the criticial path, it's easy
to just memset the variables rather than suppress the warning.
Modification:
Always clear the sockaddrs, even if they will be initialized later.
Result:
Able to compile with warnings turned on
Motivation:
b3dba317d7 added AbstractHttp2ConnectionBuilder.autoAckSettingsFrame(...) as protected method and made it public for Http2MultiplexCodecBuilder. Unfortunally it did miss to also make it public in Http2FrameCodecBuilder
Modifications:
Correctly override autoAckSettingsFrame in Http2FrameCodecBuilder and so make it usable when building Http2FrameCodec.
Result:
Be able to also configure autoAckSettingsFrame when Http2FrameCodec is used.
Motivation:
There is some manual coping of elements of Collections which can be replaced by Collections.addAll(...) and also some unnecessary semicolons.
Modifications:
- Simplify branches
- Use Collections.addAll
- Code cleanup
Result:
Code cleanup
Motivation:
ByteToMessageDecoder only looks at the last channelRead() in the batch
of channelRead()-s when determining whether or not it should call
ChannelHandlerContext#read() to consume more data when !isAutoRead. This
will lead to read() calls issued unnecessaily and unprompted if the very
last channelRead() didn't result in at least one decoded message, even
if there have been messages decoded from other channelRead()-s in the
current batch.
Modifications:
Track decode outcomes for the entire batch of channelRead() calls and
only issue a read in BTMD if the entire batch of channelRead() calls
yielded no complete messages.
Result:
ByteToMessageDecoder will no longer overread when the very last read
yielded no message, but the batch of reads did.
Motivation:
I've introduced netty/netty-tcnative#421 that introduced exposing OpenSSL master key & client/server
random values with the purpose of allowing someone to log them to debug the traffic via auxiliary tools like Wireshark (see also #8653)
Modification:
Augmented OpenSslEngineTest to include a test which manually decrypts the TLS ciphertext
after exposing the masterkey + client/server random. This acts as proof that the tc-native new methods work correctly!
Result:
More tests
Signed-off-by: Farid Zakaria <farid.m.zakaria@gmail.com>
Motivation:
In line base decoders, lines are split by delimiter, but the delimiter may be \r\n or \r, so in decoding, if findEndOfLine finds delimiter of a line, the length of the delimiter may be 1 or 2, instead of DELIMITER_LENGTH, where the value is fixed to 2.
The second problem is that if the data to be decoded is too long, the decoder will discard too long data, and needs to record the length of the discarded bytes. In the original implementation, the discarded bytes are not accumulated, but are assigned to the currently discarded bytes.
Modification:
Modifications:
Dynamic calculation of the length of delimiter.
In discarding mode, add up the number of characters discarded each time.
Result:
Correctly handle all delimiters and also correctly handle too long frames.
Motivation:
The toString() method should use Arrays.toString() to produce a meaningful String representation for arrays.
Modification:
Use Arrays.toString()
Result:
More useful toString() implementation
Motivation:
We should not propage Http2WindowUpdateFrames to the child channels at all as these are not really use-ful and should not be flow-controlled via `read()` anyway. In the other hand Http2ResetFrame is very useful but should be propagated via an user event so the user is aware of it directly even if the user stops reading.
Modifications:
- Dont propagate Http2WindowUpdateFrames when using Http2MultiplexHandler
- Use user event for Http2ResetFrame when using Http2MultiplexHandler
- Adjust javadoc of Http2MultiplexHandler
- Add unit tests
Result:
Fixes https://github.com/netty/netty/pull/8889 and https://github.com/netty/netty/pull/7635
Motivation:
Http2MultiplexCodec and Http2MultiplexHandler had a very strong coupling with Http2FrameCodec which we can reduce easily. The end-goal should be to have no coupling at all.
Modifications:
- Reduce coupling by move some common logic to Http2CodecUtil
- Move logic to check if a stream may have existed before to Http2FrameCodec
- Use ArrayDeque as replacement for custom double-linked-list which makes the code a lot more readable
- Use WindowUpdateFrame to signal consume bytes (just as users do when they use Http2FrameCodec directly)
Result:
Less coupling and cleaner code.
Motivation:
Some methods that either override others or are implemented as part of implementation an interface did miss the `@Override` annotation
Modifications:
Add missing `@Override`s
Result:
Code cleanup
Motivation:
asList should only be used if there are multiple elements.
Modification:
Call to asList with only one argument could be replaced with singletonList
Result:
Cleaner code and a bit of memory savings
Motivation:
SpotJMHBugs reports that accumulating a value as a way of eliding dead code
elimination may be inadvisable, as discussed in
`JMHSample_34_SafeLooping::measureWrong_2`. Change the test so that it consumes
the response with `Blackhole::consume` instead.
Modifications:
- Replace addition of results with explicit `blackhole.consume()` call
Result:
Tests work as before, but with different benchmark numbers.
Motivation:
Some JMH benchmarks need additional explanations to motivate
specific code choices.
Modifications:
Introduced comment to explai why calling BlackHole::consume
in a loop is not always the right choice for some benchmark.
Result:
The relevant method shows a comment that warn about changing
the code to introduce BlackHole::consume in the loop.
Motivation:
Currently GraalVM substrate returns null for reflective calls if the reflection access is not declared up front.
A change introduced in Netty 4.1.35 results in needing to register every Netty handler for reflection. This complicates matters as it is difficult to know all the possible handlers that need to be registered.
Modification:
This change adds a simple
null check such that Netty does not break on GraalVM substrate without the reflection information registration.
Result:
Fixes#9278
Motivation:
At the moment EmptyByteBuf.getCharSequence(0,...) will return null while it must return a "".
Modifications:
- Let EmptyByteBuf.getCharSequence(0,...) return ""
- Add unit test
Result:
Fixes https://github.com/netty/netty/issues/9271.
Motivation:
The HttpPostRequestEncoder overwrites the original filename of file uploads sharing the same name encoded in mixed mode when it rewrites the multipart body header of the previous file. The original filename should be preserved instead.
Modifications:
Change the HttpPostRequestEncoder to reuse the correct filename when the encoder switches to mixed mode. The original test is incorrect and has been modified too, in addition it tests with an extra file upload since the current test was not testing the continuation of a mixed mode.
Result:
The HttpPostRequestEncoder will preserve the original filename of the first fileupload when switching to mixed mode
Motivation:
HAProxyMessage should be released as it contains a list of TLV which hold a ByteBuf, otherwise, it may cause memory leaks.
Modification:
- Let HAProxyMessage extend AbstractReferenceCounted
- Adjust tests.
Result:
Fixes#9201
Motivation:
In the past we had the following class hierarchy:
Http2ConnectionHandler --- Http2FrameCodec -- Http2MultiplexCodec
This hierarchy makes it impossible to plug in any code that would like to act on Http2Frame and Http2StreamFrame which can be quite useful for various situations (like metrics, logging etc). Beside this it also made the implementtion very hacky. To allow easier maintainance and also allow more flexible costumizations we should split Http2MultiplexCodec and Http2FrameCode.
Modifications:
- Introduce Http2MultiplexHandler (which is a replacement for Http2MultiplexCodec when used together with Http2FrameCodec)
- Mark Http2MultiplexCodecBuilder and Http2MultiplexCodec as deprecated. People should use Http2FrameCodecBuilder / Http2FrameCodec together with Http2MultiplexHandlder in the future
- Adjust / Add tests
- Adjust examples
Result:
More flexible usage possible and less hacky / coupled implementation for http2 multiplexing
Motivation:
I need to control WebSockets inbound flow manually, when autoRead=false
Modification:
Add missed ctx.read() call into WebSocketProtocolHandler, where read request has been swallowed.
Result:
Fixes#9257
Motivation:
FlowControlHandler does use a recyclable ArrayDeque internally but only recycles it when the channel is closed. We should better recycle it once it is empty.
Modifications:
Recycle the deque as fast as possible
Result:
Less RecyclableArrayDeque instances.
Motivation:
Resolve the issue highlighted by SpotJMHBugs that the creation of the RecyclableArrayList may be elided by the JIT since the result isn't consumed or returned.
Modifications:
Return the result of `list.recycle()` so that the list isn't elided.
Result:
The JMH benchmark shows a change in performance indicating that the prior results of this may be unsound.
Motivation
It would be useful to be able to write UTF-8 encoded subsequence of
CharSequence characters to a ByteBuf without needing to create a
temporary object via CharSequence#subSequence().
Modification
Add overloads of ByteBufUtil writeUtf8, reserveAndWriteUtf8 and
utf8Bytes methods which take explicit subsequence bounds.
Result
More efficient writing of substrings to byte buffers possible
Motivation:
testTruncatedWithTcpFallback was flacky as we may end up closing the socket before we could read all data. We should only close the socket after we succesfully read all data.
Modifications:
Move socket.close() to finally block
Result:
Fix flaky test and so make the CI more stable again.
Motivation
There's quite a lot of duplicate/equivalent logic across the various
concrete ByteBuf implementations. We could take this even further but
for now I've focused on the PooledByteBuf sub-hierarchy.
Modifications
- Move common logic/methods into existing PooledByteBuf abstract
superclass
- Shorten PooledByteBuf.capacity(int) method implementation
Result
Less code to maintain
Motivation:
For HTTP/2 messages with multiple cookies HttpConversionUtil.addHttp2ToHttpHeaders spends a good portion of time creating throwaway StringBuilders.
Modification:
Handle cookies lazily by using a ThreadLocal StringBuilder and then converting it to the H1 header at the end.
Result:
Less allocations.
Motivation:
f945a071db decoupled the writability state from the flow controller but could lead to the situation of a lot of writability updates events were propagated to the child channels. This change ensure we only take into account if the parent channel becomes writable again before we try to set the child channels to writable.
Modifications:
Only listen for channel writability changes for if the parent channel becomes writable again.
Result:
Less writability updates.
Motivation:
Incorrect WebSockets closure affects our production system.
Enforced 'close socket on any protocol violation' prevents our custom termination sequence from execution.
Huge number of parameters is a nightmare both in usage and in support (decoders configuration).
Modification:
- Fix violations handling - send proper response codes.
- Fix for messages leak.
- Introduce decoder's option to disable default behavior (send close frame) on protocol violations.
- Encapsulate WebSocket response codes - WebSocketCloseStatus.
- Encapsulate decoder's configuration into a separate class - WebSocketDecoderConfig.
Result:
Fixes#8295.
Motivation:
We should decouple the writability state of the http2 child channels from the flow-controller and just tie it to its own pending bytes counter that is decremented by the parent Channel once the bytes were written.
Modifications:
- Decouple writability state of child channels from flow-contoller
- Update tests
Result:
Less coupling and more correct behavior. Fixes https://github.com/netty/netty/issues/8148.
Motivation:
Traffic shaping needs more accurate execution than scheduled one. So the
use of FixedRate instead.
Moreover the current implementation tends to create as many threads as
channels use a ChannelTrafficShapingHandlern, which is unnecessary.
Modifications:
Change the executor.schedule to executor.scheduleAtFixedRate in the
start and remove the reschedule call from run monitor thread since it
will be restarted by the Fixed rate executor.
Also fix a minor bug where restart was only doing start() without stop()
before.
Result:
Threads are more stable in number of cached and precision of traffic
shaping is enhanced.
Motivation:
Lz4FrameEncoder and Lz4FrameDecoder in their default configuration use
an extremely inefficient way to checksum direct byte buffers. In
particular, for every byte checksummed, a single-element byte array is
being allocated and a JNI cal is made, which in some internal testing
makes a 25x difference in total throughput and allocates *a lot* of
garbage.
Modifications:
Lz4XXHash32, an implementation of ByteBufChecksum specifically for use
by Lz4FrameEncoder and Lz4FrameDecoder, is introduced. It utilises
xxHash32 block API which provides a hash() method that accepts a
ByteBuffer as an argument. Lz4FrameEncoder and Lz4FrameDecoder are
modified to use this implementation by default.
Result:
Lz4FrameEncoder and Lz4FrameDecoder perform well again when operating
on direct byte buffers with default checksum configuration; a public
implementation is provided for those who need to override the seed.