Motivation:
Previously client Http2ConnectionHandler trigger a user event
immediately when the HTTP/2 connection preface is sent. Any attempt to
immediately send a new request could cause the server to terminate the
connection, as it might not have received the SETTINGS frame from the
client. Per RFC7540 Section 3.5, the preface "MUST be followed by a
SETTINGS frame (Section 6.5), which MAY be empty."
(https://tools.ietf.org/html/rfc7540#section-3.5)
This event could be made more meaningful if it also indicates that the
initial client SETTINGS frame has been sent to signal that the channel
is ready to send new requests.
Modification:
- Renamed event to Http2ConnectionPrefaceAndSettingsFrameWrittenEvent.
- Modified Http2ConnectionHandler to trigger the user event only if it
is a client and it has sent both the preface and SETTINGS frame.
Result:
It is now safe to use the event as an indicator that the HTTP/2
connection is ready to send new requests.
Motivation:
A regression was introduced in 86e653e which had the effect that the writability was not updated for a Channel while queueing data in the SslHandler.
Modifications:
- Factor out code that will increment / decrement pending bytes and use it in AbstractCoalescingBufferQueue and PendingWriteQueue
- Add test-case
Result:
Channel writability changes are triggered again.
Motivation:
We tried to set IPV6 opts on an ipv4 only system and so failed to set / get the traffic opts. This resulted in a test-error when trying to compile netty on ipv4 only systems.
Modifications:
Use the correct opts depending on if the system is ipv4 only or not.
Result:
Be able to build and use on ipv4 only systems.
Motivation:
Bug in capacity calculation: occurs auto convert to string instead of sum up.
Modifications:
Use `eventName.length()` in sum.
Result:
Less trash in logs.
Motivation:
We should also enforce the handshake timeout on the server-side to allow closing connections which will not finish the handshake in an expected amount of time.
Modifications:
- Enforce the timeout on the server and client side
- Add unit test.
Result:
Fixes [#7230].
Motviation:
DnsNameResolverContext#followCname attempts to build a query to follow a CNAME, but puts the original hostname in the DnsQuery instead of the CNAME hostname. This will result in not following CNAME redirects correctly.
Result:
- DnsNameResolverContext#followCname should use the CNAME instead of the original hostname when building the DnsQuery
Result:
More correct handling of redirect queries.
Motivation:
An `origin`/`sec-websocket-origin` header value in websocket client is filling incorrect in some cases:
- Hostname is not converting to lower-case as prescribed by RFC 6354 (see [1]).
- Selecting a `http` scheme when source URI has `wss`/`https` scheme and non-standard port.
Modifications:
- Convert uri-host to lower-case.
- Use a `https` scheme if source URI scheme is `wss`/`https`, or if source scheme is null and port == 443.
Result:
Correct filling an `origin` header for WS client.
[1] https://tools.ietf.org/html/rfc6454#section-4
Motivation:
Use actual links to new locations of Protobuf repo and documentation to
avoid problems when redirect will not work.
Modification:
Links in comments and all/pom.xml
Result:
Correct links to Protobuf resources
Motivation:
Objects of java.util.TreeMap or java.util.TreeSet will become
non-Serializable if instantiated with Comparators, which are not also
Serializable. This can result in unexpected and difficult-to-diagnose
bugs.
Modifications:
Implements Serializable for all classes, which implements Comparator.
Result:
Proper Comparators which will not force collections to
non-Serializable mode.
Motivation:
Even if it's a super micro-optimization (most JVM could optimize such
cases in runtime), in theory (and according to some perf tests) it
may help a bit. It also makes a code more clear and allows you to
access such methods in the test scope directly, without instance of
the class.
Modifications:
Add 'static' modifier for all methods, where it possible. Mostly in
test scope.
Result:
Cleaner code with proper 'static' modifiers.
Motivation:
Javadoc of the `ByteBufUtil#copy(AsciiString, int, ByteBuf, int, int)` is incorrect.
Modifications:
Fix it.
Result:
The description of the `#copy` method is not misleading.
Motivation:
In the `ByteBufOutputStream` we can use an appropriate methods of `ByteBuf`
to reduce calls of virtual methods and do not copying converting logic.
Modifications:
- Use an appropriate methods of `ByteBuf`
- Remove redundant conversions (int -> byte, int -> char).
- Use `ByteBuf#writeCharSequence` in the `writeBytes(String)'.
Result:
Less code duplication. A `writeBytes(String)` method is faster.
No unnecessary conversions. More consistent and cleaner code.
Motivation:
Without a 'serialVersionUID' field, any change to a class will make
previously serialized versions unreadable.
Modifications:
Add missed 'serialVersionUID' field for all Serializable
classes.
Result:
Proper deserialization of previously serialized objects.
Motivation:
During code read of the Netty codebase I noticed that the Netty
HttpServerUpgradeHandler unconditionally sets a Content-Length: 0
header on 101 Switching Protocols responses. This explicitly
contravenes RFC 7230 Section 3.3.2 (Content-Length), which notes
that:
A server MUST NOT send a Content-Length header field in any
response with a status code of 1xx (Informational) or 204
(No Content).
While it is unlikely that any client will ever be confused by
this behaviour, there is no reason to contravene this part of the
specification.
Modifications:
Removed the line of code setting the header field and changed the
only test that expected it to be there.
Result:
When performing the server portion of HTTP upgrade, the 101
Switching Protocols response will no longer contain a
Content-Length: 0 header field.
Configuring this is tough because there is split between highly shared (and accessed) objects and lightly accessed objects.
Modification:
There are a number of changes here. In relative order of importance:
API / Functionality changes:
* Max records and max sample records are gone. Only "target" records, the number of records tries to retain is exposed.
* Records are sampled based on the number of already stored records. The likelihood of recording a new sample is `2^(-n)`, where `n` is the number of currently stored elements.
* Records are stored in a concurrent stack structure rather than a list. This avoids a head and tail. Since the stack is only read once, there is no need to maintain head and tail pointers
* The properties of this imply that the very first and very last access are always recorded. When deciding to sample, the top element is replaced rather than pushed.
* Samples that happen between the first and last accesses now have a chance of being recorded. Previously only the final few were kept.
* Sampling is no longer deterministic. Previously, a deterministic access pattern meant that you could conceivably always miss some access points.
* Sampling has a linear ramp for low values and and exponentially backs off roughly equal to 2^n. This means that for 1,000,000 accesses, about 20 will actually be kept. I have an elegant proof for this which is too large to fit in this commit message.
Code changes:
* All locks are gone. Because sampling rarely needs to do a write, there is almost 0 contention. The dropped records counter is slightly contentious, but this could be removed or changed to a LongAdder. This was not done because of memory concerns.
* Stack trace exclusion is done outside of RLD. Classes can opt to remove some of their methods.
* Stack trace exclusion is faster, since it uses String.equals, often getting a pointer compare due to interning. Previously it used contains()
* Leak printing is outputted fairly differently. I tried to preserve as much of the original formatting as possible, but some things didn't make sense to keep.
Result:
More useful leak reporting.
Faster:
```
Before:
Benchmark (recordTimes) Mode Cnt Score Error Units
ResourceLeakDetectorRecordBenchmark.record 8 thrpt 20 136293.404 ± 7669.454 ops/s
ResourceLeakDetectorRecordBenchmark.record 16 thrpt 20 72805.720 ± 3710.864 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 8 thrpt 20 139131.215 ± 4882.751 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 16 thrpt 20 74146.313 ± 4999.246 ops/s
After:
Benchmark (recordTimes) Mode Cnt Score Error Units
ResourceLeakDetectorRecordBenchmark.record 8 thrpt 20 155281.969 ± 5301.399 ops/s
ResourceLeakDetectorRecordBenchmark.record 16 thrpt 20 77866.239 ± 3821.054 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 8 thrpt 20 153360.036 ± 8611.353 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 16 thrpt 20 78670.804 ± 2399.149 ops/s
```
Motivation:
Due a bug we happen to sometimes fail the connectPromise with a ClosedChannelException when using the kqueue transport and the remote peer refuses the connection. We need to ensure we fail it with the correct exception.
Modifications:
Call finishConnect() before calling close() to ensure we preserve the correct exception.
Result:
KQueueSocketConnectionAttemptTest.testConnectionRefused will pass always on macOS.
Motivation: Today when Netty encounters a general error while decoding
it treats this as a decoder exception. However, for fatal causes this
should not be treated as such, instead the fatal error should be carried
up the stack without the callee having to unwind causes. This was
probably done for byte to byte message decoder but is now done for all
decoders.
Modifications: Instead of translating any error to a decoder exception,
we let those unwind out the stack (note that finally blocks still
execute) except in places where an event needs to fire where we fire
with the error instead of wrapping in a decoder exception.
Result: Fatal errors will not be treated as innocent decoder exceptions.
Motivation:
We need to upgrade our dependencies to versions which use ASM 6.0.0+ to support compiling on java9.
Modifications:
Update animal-sniffer-maven-plugin and maven-shade-plugin.
Result:
Fixes https://github.com/netty/netty/issues/6100
Motivation: Today when Netty encounters a general error while decoding
it treats this as a decoder exception. However, for fatal causes this
should not be treated as such, instead the fatal error should be carried
up the stack without the callee having to unwind causes.
Modifications: Instead of translating any error to a decoder exception,
we let those unwind out the stack (note that finally blocks still
execute).
Result: Fatal errors will not be treated as innocent decoder exceptions.
Motivation:
There are 2 motivations, the first depends on the second:
Loading Netty Epoll statically stopped working in 4.1.16, due to
`Native` always loading the arch specific shared object. In a
static binary, there is no arch specific SO.
Second, there are a ton of exceptions that can happen when loading
a native library. When loading native code, Netty tries a bunch of
different paths but a failure in any given may not be fatal.
Additionally: turning on debug logging is not always feasible so
exceptions get silently swallowed.
Modifications:
* Change Epoll and Kqueue to try the static load second
* Modify NativeLibraryLoader to record all the locations where
exceptions occur.
* Attempt to use `addSuppressed` from Java 7 if available.
Alternatives Considered:
An alternative would be to record log messages at each failure. If
all load attempts fail, the log messages are printed as warning,
else as debug. The problem with this is there is no `LogRecord` to
create like in java.util.logging. Buffering the args to
logger.log() at the end of the method loses the call site, and
changes the order of events to be confusing.
Another alternative is to teach NativeLibraryLoader about loading
the SO first, and then the static version. This would consolidate
the code fore Epoll, Kqueue, and TCNative. I think this is the
long term better option, but this PR is changing a lot already.
Someone else can take a crack at it later
Results:
Epoll Still Loads and easier debugging.
Motivation:
Highly retained and released objects have contention on their ref
count. Currently, the ref count is updated using compareAndSet
with care to make sure the count doesn't overflow, double free, or
revive the object.
Profiling has shown that a non trivial (~1%) of CPU time on gRPC
latency benchmarks is from the ref count updating.
Modification:
Rather than pessimistically assuming the ref count will be invalid,
optimistically update it assuming it will be. If the update was
wrong, then use the slow path to revert the change and throw an
execption. Most of the time, the ref counts are correct.
This changes from using compareAndSet to getAndAdd, which emits a
different CPU instruction on x86 (CMPXCHG to XADD). Because the
CPU knows it will modifiy the memory, it can avoid contention.
On a highly contended machine, this can be about 2x faster.
There is a downside to the new approach. The ref counters can
temporarily enter invalid states if over retained or over released.
The code does handle these overflow and underflow scenarios, but it
is possible that another concurrent access may push the failure to
a different location. For example:
Time 1 Thread 1: obj.retain(INT_MAX - 1)
Time 2 Thread 1: obj.retain(2)
Time 2 Thread 2: obj.retain(1)
Previously Thread 2 would always succeed and Thread 1 would always
fail on the second access. Now, thread 2 could fail while thread 1
is rolling back its change.
====
There are a few reasons why I think this is okay:
1. Buggy code is going to have bugs. An exception _is_ going to be
thrown. This just causes the other threads to notice the state
is messed up and stop early.
2. If high retention counts are a use case, then ref count should
be a long rather than an int.
3. The critical section is greatly reduced compared to the previous
version, so the likelihood of this happening is lower
4. On error, the code always rollsback the change atomically, so
there is no possibility of corruption.
Result:
Faster refcounting
```
BEFORE:
Benchmark (delay) Mode Cnt Score Error Units
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 1 sample 2901361 804.579 ± 1.835 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 10 sample 3038729 785.376 ± 16.471 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 100 sample 2899401 817.392 ± 6.668 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 1000 sample 3650566 2077.700 ± 0.600 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 10000 sample 3005467 19949.334 ± 4.243 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 1 sample 456091 48.610 ± 1.162 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 10 sample 732051 62.599 ± 0.815 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 100 sample 778925 228.629 ± 1.205 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 1000 sample 633682 2002.987 ± 2.856 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 10000 sample 506442 19735.345 ± 12.312 ns/op
AFTER:
Benchmark (delay) Mode Cnt Score Error Units
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 1 sample 3761980 383.436 ± 1.315 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 10 sample 3667304 474.429 ± 1.101 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 100 sample 3039374 479.267 ± 0.435 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 1000 sample 3709210 2044.603 ± 0.989 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 10000 sample 3011591 19904.227 ± 18.025 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 1 sample 494975 52.269 ± 8.345 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 10 sample 771094 62.290 ± 0.795 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 100 sample 763230 235.044 ± 1.552 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 1000 sample 634037 2006.578 ± 3.574 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 10000 sample 506284 19742.605 ± 13.729 ns/op
```
Motivation:
#7269 removed an unnecessary instanciation for verifying WebSocket
handshake status code.
But it uses a hardcoded status code value for 101 instead of using the
intended `HttpResponseStatus#SWITCHING_PROTOCOLS` constant.
Modidication:
Compare actual `HttpResponseStatus` against predefined constant. Note
that `HttpResponseStatus#equals` is implemented in respect with the RFC
(only honor code, not text) so it’s intended to be used this way.
Result:
Cleaner code, use intended constant instead of hard coded value.
Motivation:
Java9SslEngine did not correctly implement ApplicationProtocolAccessor and so returned an empty String when no extension was used while the interface contract is to return null. This lead to the situation that ApplicationProtocolNegationHandler did not correct work.
Modifications:
- Rename ApplicationProtocolAccessor.getApplicationProtocol() to getNegotiatedApplicationProtocol() which resolves the clash with the method exposed by Java9s SSLEngine.
- Correctly implement getNegotiatedApplicationProtocol() for Java9Sslengine
- Add delegate in Java9Sslengine.getApplicationProtocol() which is provided by Java9
- Adjust tests to test the correct behaviour.
Result:
Fixes [#7251].
Motivation:
https://github.com/netty/netty/issues/7253
Modifications:
Adding `Content-Length: 0` to `CorsHandler.forbidden()` and `CorsHandler.handlePreflight()`
Result:
Contexts that are terminated by the CorsHandler will always include a Content-Length header
Motivation:
- In the `HttpResponseStatus#equals` checks only status code. No need to create new instance of `HttpResponseStatus` for comparison with response status.
- The RFC says: `the HTTP version and reason phrase aren't important` [1].
Modifications:
Use comparison by status code without creating new `HttpResponseStatus`.
Result:
Less allocations, more clear code.
[1] https://tools.ietf.org/html/draft-ietf-hybi-thewebsocketprotocol-00
Motivation:
There are many @SuppressWarnings("unchecked") in the code for the same purpose that we want to do this return:
@SuppressWarnings("unchecked")
public B someMethod() {
......
return (B) this;
}
Modification:
Add a method self() and reuse in all these return lines:
@SuppressWarnings("unchecked")
private B self() {
return (B) this;
}
Result:
Then only one @SuppressWarnings("unchecked") left in the code.
Motivation:
Getting the latest Conscrypt goodies.
Modifications:
A few API changes have occurred, specifically in the Conscrypt
class.
Result:
Netty now builds and tests against Conscrypt 1.0.0.RC11
Motivation:
Http2ServerDowngrader is specifically built for server channels where
inbound Http2StreamFrames are converted into HttpRequests, and outbound
HttpResponses are converted into Http2StreamFrames. It can be easily
made to be more generic to work with client channels where inbound
Http2StreamFrames are converted into HttpResponses, and outbound
HttpRequests are converted into Http2StreamFrames.
Modification:
- Renamed Http2ServerDowngrader to a more general
Http2StreamFrameToHttpObjectCodec
- Made it take in an "isServer" parameter to determine whether encoding
inbound Http2StreamFrames should create HttpRequests (for server) or
HttpResponses (for client)
- Norman fixed a leak in the unit test. Thanks! :-)
Result:
Now Http2StreamFrameToHttpObjectCodec can be used to translate
Http2StreamFrame to HttpObject for both server and client.
Motivation:
We used to check for version 6.8 but the latest is 6.9
Modifications:
Update version to 6.9 in the check.
Result:
Be able to cut a release on latest centos version
Motivation:
3c8c7fc7e9 introduced some changes to the ResourceLeakDetector that introduced a regression and so would always log that paranoid leak detection should be enabled even it was already.
Modifications:
Correctly not clear the recorded stacktraces when we process the reference queue so we can log these.
Result:
ResourceLeakDetector works again as expected.
Motivation:
Most, but not all defaults are statically exposed on
PooledByteBufAllocator. This makes it cumbersome to make a custom
allocator where most of the defaults remain the same.
Modification:
Expose useCacheForAllThreads, and Direct preferred. The latter is
needed because it is under the internal package, and public code
should probably not depend on it.
Result:
More customizeable allocators
Motivation:
A bunch of unit tests are failing due to certificates having expired.
This has to be replaced with a newly generated certificate that has a
longer validity.
Modifications:
- generated a certificate with validity of 100 years from now
Results:
Unit tests are passing again
Motivation:
We changed Http2ConnectionHandler to expect the upgrade method to be
called *after* we send the preface (ie add the handler to the pipeline)
but we forgot to change the Http2ClientUpgradeCodec to match the new
policy. This meant that client-side h2c upgrades failed.
Modifications:
Reverse sending the preface and calling the upgrade method to match the
new policy.
Result:
Clients can initiate h2c upgrades successfully.
Motivation:
We need to ensure we not write any body when a response with status code of 1xx, 204 or 304 is used as stated in rfc:
https://tools.ietf.org/html/rfc7230#section-3.3.3
Modifications:
- Correctly handle status codes
- Add unit tests
Result:
Correctly handle responses with 1xx, 204, 304 status codes.
Motivation:
NativeLibraryLoader uses ClassLoader#getResource method that can return nulls when the resource cannot be found. The returned url variable should be checked for nullity and fail in a more usable manner than a NullPointerException
Modifications:
Fail with a FileNotFoundException
Result:
Fixes [#7222].
Motivation:
The constrcutors a protected atm but the classes are public. We should make the constructors public as well to make it easier to write your own ByteBufAllocator.
Modifications:
Change constructors to be public and add some javadocs.
Result:
Easier to create own ByteBufAllocator.
Motiviation:
The ResourceLeakDetector helps to detect and troubleshoot resource leaks and is often used even in production enviroments with a low level. Because of this its import that we try to keep the overhead as low as overhead. Most of the times no leak is detected (as all is correctly handled) so we should keep the overhead for this case as low as possible.
Modifications:
- Only call getStackTrace() if a leak is reported as it is a very expensive native call. Also handle the filtering and creating of the String in a lazy fashion
- Remove the need to mantain a Queue to store the last access records
- Add benchmark
Result:
Huge decrease of performance overhead.
Before the patch:
Benchmark (recordTimes) Mode Cnt Score Error Units
ResourceLeakDetectorRecordBenchmark.record 8 thrpt 20 4358.367 ± 116.419 ops/s
ResourceLeakDetectorRecordBenchmark.record 16 thrpt 20 2306.027 ± 55.044 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 8 thrpt 20 4220.979 ± 114.046 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 16 thrpt 20 2250.734 ± 55.352 ops/s
With this patch:
Benchmark (recordTimes) Mode Cnt Score Error Units
ResourceLeakDetectorRecordBenchmark.record 8 thrpt 20 71398.957 ± 2695.925 ops/s
ResourceLeakDetectorRecordBenchmark.record 16 thrpt 20 38643.963 ± 1446.694 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 8 thrpt 20 71677.882 ± 2923.622 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 16 thrpt 20 38660.176 ± 1467.732 ops/s
Motivation:
When SO_LINGER is used we run doClose() on the GlobalEventExecutor by default so we need to ensure we schedule all code that needs to be run on the EventLoop on the EventLoop in doClose. Beside this there are also threading issues when calling shutdownOutput(...)
Modifications:
- Schedule removal from EventLoop to the EventLoop
- Correctly handle shutdownOutput and shutdown in respect with threading-model
- Add unit tests
Result:
Fixes [#7159].
Motivation:
c93e58c453 changed to use _ for the tcnative lib name but missed to also adjust the filtering.
Modifications:
Fix filtering to look for _
Result:
Not include native tcnative lib as expected.
Motivation:
A large frame will be componsed by many packages. Every time the package
arrived, findEndOfLine will be called from the start of the buffer. It
will cause the complexity of reading frame equal to O(n^2). This can be
eliminated by using a offset to mark the last scan position, when new
package arrived, just find the delimter from the mark. The complexity
will be O(n).
Modification:
Add a offset to mark the last scan position.
Result:
Better performance for read large frame.
Motiviation:
At the moment an NPE is thrown if someone tries to use the InboundHttp2ToHttpAdapter.
Modifications:
- Ensure the status was null in "InboundHttp2ToHttpAdapter::onPushPromiseRead" before calling "HttpConversionUtil.parseStatus" methods.
- Fix setting status to OK in "InboundHttp2ToHttpAdapter::onPushPromiseRead".
Result:
Fixes [#7214].
Motivation:
The Headers interface supports an interface to get all the headers values corresponding to a particular name. This API returns a List which requires intermediate storage and increases GC pressure.
Modifications:
- Add a method which returns an iterator over all the values for a specific name
Result:
Ability to iterator over values for a specific name with no intermediate collection.
Motivation:
RLD allocates an ArrayDeque in anticipation of recording access
points. If the leak detection level is less than ADVANCED though,
the dequeue is never used. Since SIMPLE is the default level,
there is a minor perf win to not preemptively allocate it.
This showed up in garbage profiling when creation a high number of
buffers.
Modifications:
Only allocate the dequeue if it will be used.
Result:
Less garbage created.
Motivation:
NativeLibraryLoader may only log a debug statement if the library is successfully loaded from java.library.path, but will log failure statements the if load for java.library.path fails which can mislead users to believe the load actually failed when it may have succeeded.
Modifications:
- Always load a debug statement when a library was successfully loaded
Result:
NativeLibraryLoader log statements more clear.