Motivation:
There is no benchmark to measure the priority tree implementation performance.
Modifications:
Introduce a new benchmark which will populate the priority tree, and then shuffle parent/child links around.
Result:
A simple benchmark to get a baseline for the HTTP/2 codec's priority tree implementation.
Motivation:
Allows for running benchmarks from built jars which is useful in development environments that only take released artifacts.
Modifications:
Move benchmarks into 'main' from 'test'
Add @State annotations to benchmarks that are missing them
Fix timing issue grabbing context during channel initialization
Result:
Users can run benchmarks more easily.
Motivation:
The current implementation does byte by byte comparison, which we have seen
can be a performance bottleneck when the AsciiString is used as the key in
a Map.
Modifications:
Use sun.misc.Unsafe (on supporting platforms) to compare up to eight bytes at a time
and get closer to the performance of String.equals(Object).
Result:
Significant improvement (2x - 6x) in performance over the current implementation.
Benchmark (size) Mode Samples Score Score error Units
i.n.m.i.PlatformDependentBenchmark.arraysBytesEqual 10 thrpt 10 118843477.518 2347259.347 ops/s
i.n.m.i.PlatformDependentBenchmark.arraysBytesEqual 50 thrpt 10 43910319.773 198376.996 ops/s
i.n.m.i.PlatformDependentBenchmark.arraysBytesEqual 100 thrpt 10 26339969.001 159599.252 ops/s
i.n.m.i.PlatformDependentBenchmark.arraysBytesEqual 1000 thrpt 10 2873119.030 20779.056 ops/s
i.n.m.i.PlatformDependentBenchmark.arraysBytesEqual 10000 thrpt 10 306370.450 1933.303 ops/s
i.n.m.i.PlatformDependentBenchmark.arraysBytesEqual 100000 thrpt 10 25750.415 108.391 ops/s
i.n.m.i.PlatformDependentBenchmark.unsafeBytesEqual 10 thrpt 10 248077563.510 635320.093 ops/s
i.n.m.i.PlatformDependentBenchmark.unsafeBytesEqual 50 thrpt 10 128198943.138 614827.548 ops/s
i.n.m.i.PlatformDependentBenchmark.unsafeBytesEqual 100 thrpt 10 86195621.349 1063959.307 ops/s
i.n.m.i.PlatformDependentBenchmark.unsafeBytesEqual 1000 thrpt 10 16920264.598 61615.365 ops/s
i.n.m.i.PlatformDependentBenchmark.unsafeBytesEqual 10000 thrpt 10 1687454.747 6367.602 ops/s
i.n.m.i.PlatformDependentBenchmark.unsafeBytesEqual 100000 thrpt 10 153717.851 586.916 ops/s
Motivation:
ByteString#hashCode() trashes its own hash code if it's being accessed concurrently
Modifications:
Pull the ByteString#hash into a local variable and calculate it locally.
Result:
ByteString#hashCode() is no longer returning a junk value.
Motivation:
The IntObjectHashMap benchmarks show the Agrona collections to be faster on put, lookup, and remove. One major difference is that we're using 2 modulus operations each time we increment the position index while iterating. Agrona uses a mask instead.
Modifications:
Modified the KObjectHashMap to use masking rather than modulus when wrapping the position index. This requires that the capacity be a power of 2.
Result:
Improved performance of IntObjectHashMap.
Motivation:
Other implementations of FullHttpMessage allow .toString to be called after the Message has been released
This brings AggregatedFullHttpMessage into line with those impls.
Modifications:
- Changed AggregatedFullHttpMessage to no longer be a sub-class of DefaultByteBufHolder
- Changes AggregatedFullHttpMessage to implement ByteBufHolder
- Hold the content buffer internally to AggregatedFullHttpMessage
- Implement the required content() and release() methods that were missing
- Do not check refcnt when accessing content() (similar to DefaultFullHttpMessage)
Result:
A released AggregatedFullHttpMessage can have .toString called without throwing an exception
Motivation:
If an exclusive dependency change stream B should be an exclusive dependency of stream A is requested and stream B is already a child of stream A...then we will add B to B's own children map and create a circular link in the priority tree. This leads to an infinite recursive loop and a stack overflow exception.
Modifications:
-when removeAllChildren is called it should not remove the exclusive dependency.
-unit test to ensure this case is covered.
Result:
No more circular link in the priority tree.
Motivation:
While forward porting https://github.com/netty/netty/pull/3579 there were a few areas that had not been previously back ported.
Modifications:
Backport the missed areas to ensure consistency.
Result:
More consistent 4.1 and master branches.
Motivation:
The usage and code within AsciiString has exceeded the original design scope for this class. Its usage as a binary string is confusing and on the verge of violating interface assumptions in some spots.
Modifications:
- ByteString will be created as a base class to AsciiString. All of the generic byte handling processing will live in ByteString and all the special character encoding will live in AsciiString.
Results:
The AsciiString interface will be clarified. Users of AsciiString can now be clear of the limitations the class imposes while users of the ByteString class don't have to live with those limitations.
Motivation:
Attribute.getAndRemove() will return the value but also remove the AttributeKey itself from the AttributeMap. This may not
what you want as you may want to keep an instance of it and just set it later again. Document the contract so the user know what to expect.
Modifications:
- Make it clear when to use AttributeKey.getAndRemove() / AttributeKey.remove() and when AttributeKey.getAndSet(null) / AttributeKey.set(null).
Result:
Less suprising behaviour.
Motivation:
Sometimes it's useful to use EC keys and not DSA or RSA. We should support it.
Modifications:
Support EC keys and share the code between JDK and Openssl impl.
Result:
It's possible to use EC keys now.
Motivation:
SslContext factory methods have gotten out of control; it's past time to
swap to a builder.
Modifications:
New Builder class. The existing factory methods must be left as-is for
backward compatibility.
Result:
Fixes#3531
Motivation:
As we missed to correctly handle EPOLLRDHUP we produce an IOException which is unnessary. This leads
to have exceptionCaught(...) methods called.
Modifications:
When EPOLLRDHUP was received just close the socket and fail all pending writes.
Result:
Correctly handle of EPOLLRDHUP and so not miss-leading exceptions.
Motivation:
Due to a recent flurry of cleanup and fixes, we no longer need the stream removal policy to protect against recently removed streams. We should get rid of it.
Modifications:
Removed Http2StreamRemovalPolicy and everywhere it's used.
Result:
Fixes#3448
Motivation:
The Http2FrameWriterBenchmark JMH harness class name was not updated for the JVM arguments. The number of forks is 0 which means the JHM will share a JVM with the benchmarks. Sharing the JVM may lead to less reliable benchmarking results and as doesn't allow for the command line arguments to be applied for each benchmark.
Modifications:
- Update the JMH version from 0.9 to 1.7.1. Benchmarks wouldn't run on old version.
- Increase the number of forks from 0 to 1.
- Remove allocation of environment from static and cleanup AfterClass to using the Setup and Teardown methods. The forked JVM would not shut down correctly otherwise (and wait for 30+ seconds before timeing out).
Result:
Benchmarks that run as intended.
Motivation:
The DefaultHttp2Connection is not checking for RuntimeExceptions when invoking Http2Connection.Listener methods. This is a problem for a few reasons: 1. The state of DefaultHttp2Connection will be corrupted if a listener throws a RuntimeException. 2. If the first listener throws then no other listeners will be notified, which may further corrupt state that is updated as a result of listeners being notified.
Modifications:
- Document that RuntimeExceptions are not supported for Http2Connection.Listener methods, and will be logged as an error.
- Update DefaultHttp2Connection to handle and exception for each listener that is notified, and be sure that 1 listener throwing an exception does not prevent others from being notified.
Result:
More robust DefaultHttp2Connection.
Motivation:
When create NormalMemoryRegionCache for PoolThreadCache, we overbooked
cache array size. This means unnecessary overhead for thread local cache
as we will create multi cache enties for each element in cache array.
Modifications:
change:
int arraySize = Math.max(1, max / area.pageSize);
to:
int arraySize = Math.max(1, log2(max / area.pageSize) + 1);
Result:
Now arraySize won't introduce unnecessary overhead.
Changes to be committed:
modified: buffer/src/main/java/io/netty/buffer/PoolThreadCache.java
Motivation:
CompositeByteBuf has an iterator() method but fails to implement Iterable
Modifications:
Let CompositeByteBuf implement Iterable<ByteBuf>
Result:
Easier usage
Motivation:
The ReplayingDecoderBuffer does not match the naming scheme we use for ByteBuf types.
Modifications:
Rename to ReplayingDecoderByteBuf to match naming scheme
Result:
Consistent naming
Motivation:
We missed to dereference the chunk and tmpNioBuf when calling deallocate(). This means the GC can not collect these as we still hold a reference while have the PooledByteBuf in the recycler stack.
Modifications:
Dereference chunk and tmpNioBuf.
Result:
GC can collect things.
Motivation:
We missed to flush the channel when using HttpChunkedInput (this is done when using SSL). This will result in a stale.
Modifications:
Replace ctx.write(...) with ctx.writeAndFlush(...)
Result:
Correctly working example.
Motiviation:
At the moment we use FIFO for the PoolThreadCache which is sub-optimal as this may reduce the changes to have the cached memory actual still in the cpu-cache.
Modification:
- Change to use LIFO as this increase the chance to be able to serve buffers from the cpu-cache
Results:
Faster allocation out of the ThreadLocal cache.
Before the commit:
[xxx wrk]$ ./wrk -H 'Connection: keep-alive' -d 120 -c 256 -t 16 -s scripts/pipeline-many.lua http://xxx:8080/plaintext
Running 2m test @ http://xxx:8080/plaintext
16 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 14.69ms 10.06ms 131.43ms 80.10%
Req/Sec 283.89k 40.37k 433.69k 66.81%
533859742 requests in 2.00m, 72.09GB read
Requests/sec: 4449510.51
Transfer/sec: 615.29MB
After the commit:
[xxx wrk]$ ./wrk -H 'Connection: keep-alive' -d 120 -c 256 -t 16 -s scripts/pipeline-many.lua http://xxx:8080/plaintext
Running 2m test @ http://xxx:8080/plaintext
16 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 16.38ms 26.32ms 734.06ms 97.38%
Req/Sec 283.86k 39.31k 361.69k 83.38%
540836511 requests in 2.00m, 73.04GB read
Requests/sec: 4508150.18
Transfer/sec: 623.40MB
Motivation:
Now that we have a CharObjectHashMap, we should change Http2Settings to use it.
Modifications:
Changed Http2Settings to extend CharObjectHashMap rather than IntObjectHashMap.
Result:
Http2Settings uses less memory to store keys.
Motivation:
To support HTTP2 we need APLN support. This was not provided before when using OpenSslEngine, so SSLEngine (JDK one) was the only bet.
Beside this CipherSuiteFilter was not supported
Modifications:
- Upgrade netty-tcnative and make use of new features to support ALPN and NPN in server and client mode.
- Guard against segfaults after the ssl pointer is freed
- support correctly different failure behaviours
- add support for CipherSuiteFilter
Result:
Be able to use OpenSslEngine for ALPN / NPN for server and client.
Motivation:
We've removed access to the activeStreams collection, we should do the same for the children of a stream to provide a consistent interface.
Modifications:
Moved Http2StreamVisitor to a top-level interface. Removed unnecessary child operations from the Http2Stream interface so that we no longer require a map structure.
Result:
Cleaner and more consistent interface for iterating over child streams.
Motivation:
Currently we have IntObjectMap/HashMap, but it will be useful to support other primitive-based maps.
Modifications:
Moved the code int the current maps to template files and run Groovy code from common/pom.xml to apply the templates.
Result:
Autogeneration of int and char-based hash maps.
Motivation:
Too many duplicated code of tests for different compression codecs.
Modifications:
- Added abstract classes AbstractCompressionTest, AbstractDecoderTest and AbstractEncoderTest which contains common variables and tests for any compression codec.
- Removed common tests which are implemented in AbstractDecoderTest and AbstractEncoderTest from current tests for compression codecs.
- Implemented abstract methods of AbstractDecoderTest and AbstractEncoderTest in current tests for compression codecs.
- Added additional checks for current tests.
- Renamed abstract class IntegrationTest to AbstractIntegrationTest.
- Used Theories to run tests with head and direct buffers.
- Removed code duplicates.
Result:
Removed duplicated code of tests for compression codecs and simplified an addition of tests for new compression codecs.
Motivation:
The Http2Connection interface exposes an activeStreams() method which allows direct iteration over the underlying collection. There are a few places that make copies of this collection to avoid modification while iterating, and a few places that do not make copies. The copy operation can be expensive on hot code paths and also we are not consistently iterating over the activeStreams collection.
Modifications:
- The Http2Connection interface should reduce the exposure of the underlying collection and just expose what is necessary for the interface to function. This is just a means to iterate over the collection.
- The DefaultHttp2Connection should use this new interface and protect it's internal state while iteration is occurring.
Result:
Reduction in surface area of the Http2Connection interface. Consistent iteration of the set of active streams. Concurrent modification exceptions are handled in 1 encapsulated spot.
Motivation:
1) The current implementation doesn't allow for HEADERS, DATA, PING, PRIORITY and SETTINGS
frames to be sent after GOAWAY.
2) When receiving or sending a GOAWAY frame, all streams with ids greater than the lastStreamId
of the GOAWAY frame should be closed. That's not happening.
Modifications:
1) Allow sending of HEADERS and DATA frames after GOAWAY for streams with ids < lastStreamId.
2) Always allow sending PING, PRIORITY AND SETTINGS frames.
3) Allow sending multiple GOAWAY frames with decreasing lastStreamIds.
4) After receiving or sending a GOAWAY frame, close all streams with ids > lastStreamId.
Result:
The GOAWAY handling is more correct.
Motivation:
There are methods to manipulate the prioritzable count for streams which have the '0' postfix which are designed to be used during recursion. However these methods are calling out to an external method without the '0' during the recursive process. This is doing uneccessary conditional checks during recursion.
Modifications:
Change the decrementPrioritizableForTree to decrementPrioritizableForTree0 while in recursive method.
Change the incrementPrioritizableForTree to incrementPrioritizableForTree0 while in recursive method.
Result:
Less overhead during recursive calls.
Motiviation:
The interface provided by Http2LifecycleManager is not clear as to how the writeXXX methods should behave. The implementation of this interface from the Http2ConnectionHandler's perspecitve is unclear what writeXXX means in this context.
Modifications:
- Method names in Http2LifecycleManager and Http2ConnectionHandler should be renamed and comments should clarify the interfaces.
Results:
Http2LifecycleManager is more clear and Http2ConnectionHandler's implementation makes sense w.r.t to return values.
In TrafficCounter, a recent change makes the contract of the API (the
constructor) wrong and lead to issue with GlobalChannelTrafficCounter
where executor must be null.
Motivation:
TrafficCounter executor argument in constructor might be null, as
explained in the API, for some particular cases where no executor are
needed (relevant tasks being taken by the caller as in
GlobalChannelTrafficCounter).
A null pointer exception is raised while it should not since it is
legal.
Modifications:
Remove the 2 null checking for this particular attribute.
Note that when null, the attribute is not reached nor used (a null
checking condition later on is applied).
Result:
No more null exception raized while it should not.
This shall be made also to 4.0, 4.1 (present) and master. 3.10 is not
concerned.
Motivation:
The HTTP/2 headers code should be using binary string (currently AsciiString) objects instead of String objects. The DefaultHttp2HeadersEncoder was still using String for sensitiveHeaders.
Modifications:
- Remove the usage of String from DefaultHttp2HeadersEncoder.
- Introduce an interface to determine if a header name/value is sensitive or not to 1. prevent necessarily creating/copying sets. 2. Allow the name/value to be considered when checking if sensitive.
Result:
No more String in DefaultHttp2HeadersEncoder and less required set creation/operations.
Motivation:
The spec requires that a RST_STREAM received on an IDLE stream results in a connection error. This is not happening.
Modifications:
Check for this condition when a RST_STREAM is received in DefaultHttp2ConnectionDecoder.
Result:
More spec compliant. Fixes https://github.com/netty/netty/issues/3573.
Motivation:
The DefaultHttp2ConnectionDecoder has the setPriority call after the Http2FrameListener is notified of the change. The setPriority call has additional verification logic and may even create the dependency stream and so it must be before the Http2FrameListener is notified.
Modifications:
The DefaultHttp2ConnectionDecoder should treat the setPriority call in the same for the HEADERS and PRIORITY frame (call it before notifying the listener).
Result:
Http2FrameListener should see correct state when a HEADERS frame has a stream dependency that has not yet been created yet. Fixes https://github.com/netty/netty/issues/3572.
Motivation:
We are allocating a hash map for every HTTP2 Stream to store it's children.
Most streams are leafs in the priority tree and don't have children.
Modification:
- Only allocate children when we actually use them.
- Make EmptyIntObjectMap not throw a UnsupportedOperationException on remove, but return null instead (as is stated in it's javadoc).
Result:
Fewer unnecessary allocations.
Motivation:
PrimitiveCollections is not in the 4.1 branch. It is needed by HTTP/2.
Modifications:
Backport this class.
Result:
PrimitiveCollections is in 4.1.
Motivation:
In a simple load test that creates and closes several 10k streams per second
I have seen Iterator objects using roughly 1.6% of the total committed heap.
Modifications:
Use an ArrayList instead of a LinkedHashSet to store the connection listeners.
That way we can iterate over the list without creating an iterator every time.
Result:
Zero Iterator allocations due to notifying connection listeners.
Motivation:
The Http2Settings class currently disallows setting non-standard settings, which violates the spec.
Modifications:
Updated Http2Settings to permit arbitrary settings. Also adjusting the default initial capacity to allow setting all of the standard settings without reallocation.
Result:
Fixes#3560
Related: #3567
Motivation:
SslHandler.channelReadComplete() forgets to call
super.channelReadComplete(), which discards read bytes from the
cumulative buffer. As a result, the cumulative buffer can expand its
capacity unboundedly.
Modifications:
Call super.channelReadComplete() instead of calling
ctx.fireChannelReadComplete()
Result:
Fixes#3567
Motivation:
The HTTP/2 specification allows for closed (and streams in any state) to exist in the priority tree. The current code removes streams from the priority tree as soon as they are closed (subject to the removal policy). This may lead to undesired distribution of resources from the peer's perspective.
Modifications:
- We should only remove streams from the priority tree when they have no descendant streams in a viable state.
- We should track when tree edges change or nodes are removed if inviable nodes can then be removed.
Result:
Priority tree doesn't remove closed streams until descendant are all closed, or there are no descendants.
Motivation:
We're currently using Math.ceil which isn't necessary. We should exchange for a lighter weight operation.
Modifications:
Changing the logic to just ensure that we allocate at least one byte to the child rather than always performing a ceil.
Result:
Slight performance improvement in the priority algorithm.
Motivation:
The Connection.Listener GOAWAY event handler currently provides no additional information, requiring applications to hack in other ways to get at the error code and debug message.
Modifications:
Modified the Connection.Listener interface to pass on the error code and message that triggered the GOAWAY.
Result:
Application can now use Connection.Listener for all GOAWAY processing.
Motivation:
LoggingHandlerTest sometimes failure due to unexpected log messages
logged due to the automatic reclaimation of thread-local objects.
Expectation failure on verify:
Appender.doAppend([DEBUG] Freed 3 thread-local buffer(s) from thread: nioEventLoopGroup-23-0): expected: 1, actual: 0
Appender.doAppend([DEBUG] Freed 9 thread-local buffer(s) from thread: nioEventLoopGroup-23-1): expected: 1, actual: 0
Appender.doAppend([DEBUG] Freed 2 thread-local buffer(s) from thread: nioEventLoopGroup-23-2): expected: 1, actual: 0
Appender.doAppend([DEBUG] Freed 4 thread-local buffer(s) from thread: nioEventLoopGroup-26-0): expected: 1, actual: 0
Appender.doAppend(matchesLog(expected: ".+CLOSE$", got: "[id: 0xembedded, embedded => embedded] CLOSE")): expected: 1, actual: 0
Modifications:
Add the mock appender to the related logger only
Result:
No more intermittent test failures