Configuring this is tough because there is split between highly shared (and accessed) objects and lightly accessed objects.
Modification:
There are a number of changes here. In relative order of importance:
API / Functionality changes:
* Max records and max sample records are gone. Only "target" records, the number of records tries to retain is exposed.
* Records are sampled based on the number of already stored records. The likelihood of recording a new sample is `2^(-n)`, where `n` is the number of currently stored elements.
* Records are stored in a concurrent stack structure rather than a list. This avoids a head and tail. Since the stack is only read once, there is no need to maintain head and tail pointers
* The properties of this imply that the very first and very last access are always recorded. When deciding to sample, the top element is replaced rather than pushed.
* Samples that happen between the first and last accesses now have a chance of being recorded. Previously only the final few were kept.
* Sampling is no longer deterministic. Previously, a deterministic access pattern meant that you could conceivably always miss some access points.
* Sampling has a linear ramp for low values and and exponentially backs off roughly equal to 2^n. This means that for 1,000,000 accesses, about 20 will actually be kept. I have an elegant proof for this which is too large to fit in this commit message.
Code changes:
* All locks are gone. Because sampling rarely needs to do a write, there is almost 0 contention. The dropped records counter is slightly contentious, but this could be removed or changed to a LongAdder. This was not done because of memory concerns.
* Stack trace exclusion is done outside of RLD. Classes can opt to remove some of their methods.
* Stack trace exclusion is faster, since it uses String.equals, often getting a pointer compare due to interning. Previously it used contains()
* Leak printing is outputted fairly differently. I tried to preserve as much of the original formatting as possible, but some things didn't make sense to keep.
Result:
More useful leak reporting.
Faster:
```
Before:
Benchmark (recordTimes) Mode Cnt Score Error Units
ResourceLeakDetectorRecordBenchmark.record 8 thrpt 20 136293.404 ± 7669.454 ops/s
ResourceLeakDetectorRecordBenchmark.record 16 thrpt 20 72805.720 ± 3710.864 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 8 thrpt 20 139131.215 ± 4882.751 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 16 thrpt 20 74146.313 ± 4999.246 ops/s
After:
Benchmark (recordTimes) Mode Cnt Score Error Units
ResourceLeakDetectorRecordBenchmark.record 8 thrpt 20 155281.969 ± 5301.399 ops/s
ResourceLeakDetectorRecordBenchmark.record 16 thrpt 20 77866.239 ± 3821.054 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 8 thrpt 20 153360.036 ± 8611.353 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 16 thrpt 20 78670.804 ± 2399.149 ops/s
```
Motivation:
Due a bug we happen to sometimes fail the connectPromise with a ClosedChannelException when using the kqueue transport and the remote peer refuses the connection. We need to ensure we fail it with the correct exception.
Modifications:
Call finishConnect() before calling close() to ensure we preserve the correct exception.
Result:
KQueueSocketConnectionAttemptTest.testConnectionRefused will pass always on macOS.
Motivation: Today when Netty encounters a general error while decoding
it treats this as a decoder exception. However, for fatal causes this
should not be treated as such, instead the fatal error should be carried
up the stack without the callee having to unwind causes. This was
probably done for byte to byte message decoder but is now done for all
decoders.
Modifications: Instead of translating any error to a decoder exception,
we let those unwind out the stack (note that finally blocks still
execute) except in places where an event needs to fire where we fire
with the error instead of wrapping in a decoder exception.
Result: Fatal errors will not be treated as innocent decoder exceptions.
Motivation:
We need to upgrade our dependencies to versions which use ASM 6.0.0+ to support compiling on java9.
Modifications:
Update animal-sniffer-maven-plugin and maven-shade-plugin.
Result:
Fixes https://github.com/netty/netty/issues/6100
Motivation: Today when Netty encounters a general error while decoding
it treats this as a decoder exception. However, for fatal causes this
should not be treated as such, instead the fatal error should be carried
up the stack without the callee having to unwind causes.
Modifications: Instead of translating any error to a decoder exception,
we let those unwind out the stack (note that finally blocks still
execute).
Result: Fatal errors will not be treated as innocent decoder exceptions.
Motivation:
There are 2 motivations, the first depends on the second:
Loading Netty Epoll statically stopped working in 4.1.16, due to
`Native` always loading the arch specific shared object. In a
static binary, there is no arch specific SO.
Second, there are a ton of exceptions that can happen when loading
a native library. When loading native code, Netty tries a bunch of
different paths but a failure in any given may not be fatal.
Additionally: turning on debug logging is not always feasible so
exceptions get silently swallowed.
Modifications:
* Change Epoll and Kqueue to try the static load second
* Modify NativeLibraryLoader to record all the locations where
exceptions occur.
* Attempt to use `addSuppressed` from Java 7 if available.
Alternatives Considered:
An alternative would be to record log messages at each failure. If
all load attempts fail, the log messages are printed as warning,
else as debug. The problem with this is there is no `LogRecord` to
create like in java.util.logging. Buffering the args to
logger.log() at the end of the method loses the call site, and
changes the order of events to be confusing.
Another alternative is to teach NativeLibraryLoader about loading
the SO first, and then the static version. This would consolidate
the code fore Epoll, Kqueue, and TCNative. I think this is the
long term better option, but this PR is changing a lot already.
Someone else can take a crack at it later
Results:
Epoll Still Loads and easier debugging.
Motivation:
Highly retained and released objects have contention on their ref
count. Currently, the ref count is updated using compareAndSet
with care to make sure the count doesn't overflow, double free, or
revive the object.
Profiling has shown that a non trivial (~1%) of CPU time on gRPC
latency benchmarks is from the ref count updating.
Modification:
Rather than pessimistically assuming the ref count will be invalid,
optimistically update it assuming it will be. If the update was
wrong, then use the slow path to revert the change and throw an
execption. Most of the time, the ref counts are correct.
This changes from using compareAndSet to getAndAdd, which emits a
different CPU instruction on x86 (CMPXCHG to XADD). Because the
CPU knows it will modifiy the memory, it can avoid contention.
On a highly contended machine, this can be about 2x faster.
There is a downside to the new approach. The ref counters can
temporarily enter invalid states if over retained or over released.
The code does handle these overflow and underflow scenarios, but it
is possible that another concurrent access may push the failure to
a different location. For example:
Time 1 Thread 1: obj.retain(INT_MAX - 1)
Time 2 Thread 1: obj.retain(2)
Time 2 Thread 2: obj.retain(1)
Previously Thread 2 would always succeed and Thread 1 would always
fail on the second access. Now, thread 2 could fail while thread 1
is rolling back its change.
====
There are a few reasons why I think this is okay:
1. Buggy code is going to have bugs. An exception _is_ going to be
thrown. This just causes the other threads to notice the state
is messed up and stop early.
2. If high retention counts are a use case, then ref count should
be a long rather than an int.
3. The critical section is greatly reduced compared to the previous
version, so the likelihood of this happening is lower
4. On error, the code always rollsback the change atomically, so
there is no possibility of corruption.
Result:
Faster refcounting
```
BEFORE:
Benchmark (delay) Mode Cnt Score Error Units
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 1 sample 2901361 804.579 ± 1.835 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 10 sample 3038729 785.376 ± 16.471 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 100 sample 2899401 817.392 ± 6.668 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 1000 sample 3650566 2077.700 ± 0.600 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 10000 sample 3005467 19949.334 ± 4.243 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 1 sample 456091 48.610 ± 1.162 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 10 sample 732051 62.599 ± 0.815 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 100 sample 778925 228.629 ± 1.205 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 1000 sample 633682 2002.987 ± 2.856 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 10000 sample 506442 19735.345 ± 12.312 ns/op
AFTER:
Benchmark (delay) Mode Cnt Score Error Units
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 1 sample 3761980 383.436 ± 1.315 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 10 sample 3667304 474.429 ± 1.101 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 100 sample 3039374 479.267 ± 0.435 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 1000 sample 3709210 2044.603 ± 0.989 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 10000 sample 3011591 19904.227 ± 18.025 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 1 sample 494975 52.269 ± 8.345 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 10 sample 771094 62.290 ± 0.795 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 100 sample 763230 235.044 ± 1.552 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 1000 sample 634037 2006.578 ± 3.574 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 10000 sample 506284 19742.605 ± 13.729 ns/op
```
Motivation:
#7269 removed an unnecessary instanciation for verifying WebSocket
handshake status code.
But it uses a hardcoded status code value for 101 instead of using the
intended `HttpResponseStatus#SWITCHING_PROTOCOLS` constant.
Modidication:
Compare actual `HttpResponseStatus` against predefined constant. Note
that `HttpResponseStatus#equals` is implemented in respect with the RFC
(only honor code, not text) so it’s intended to be used this way.
Result:
Cleaner code, use intended constant instead of hard coded value.
Motivation:
Java9SslEngine did not correctly implement ApplicationProtocolAccessor and so returned an empty String when no extension was used while the interface contract is to return null. This lead to the situation that ApplicationProtocolNegationHandler did not correct work.
Modifications:
- Rename ApplicationProtocolAccessor.getApplicationProtocol() to getNegotiatedApplicationProtocol() which resolves the clash with the method exposed by Java9s SSLEngine.
- Correctly implement getNegotiatedApplicationProtocol() for Java9Sslengine
- Add delegate in Java9Sslengine.getApplicationProtocol() which is provided by Java9
- Adjust tests to test the correct behaviour.
Result:
Fixes [#7251].
Motivation:
https://github.com/netty/netty/issues/7253
Modifications:
Adding `Content-Length: 0` to `CorsHandler.forbidden()` and `CorsHandler.handlePreflight()`
Result:
Contexts that are terminated by the CorsHandler will always include a Content-Length header
Motivation:
- In the `HttpResponseStatus#equals` checks only status code. No need to create new instance of `HttpResponseStatus` for comparison with response status.
- The RFC says: `the HTTP version and reason phrase aren't important` [1].
Modifications:
Use comparison by status code without creating new `HttpResponseStatus`.
Result:
Less allocations, more clear code.
[1] https://tools.ietf.org/html/draft-ietf-hybi-thewebsocketprotocol-00
Motivation:
There are many @SuppressWarnings("unchecked") in the code for the same purpose that we want to do this return:
@SuppressWarnings("unchecked")
public B someMethod() {
......
return (B) this;
}
Modification:
Add a method self() and reuse in all these return lines:
@SuppressWarnings("unchecked")
private B self() {
return (B) this;
}
Result:
Then only one @SuppressWarnings("unchecked") left in the code.
Motivation:
Getting the latest Conscrypt goodies.
Modifications:
A few API changes have occurred, specifically in the Conscrypt
class.
Result:
Netty now builds and tests against Conscrypt 1.0.0.RC11
Motivation:
Http2ServerDowngrader is specifically built for server channels where
inbound Http2StreamFrames are converted into HttpRequests, and outbound
HttpResponses are converted into Http2StreamFrames. It can be easily
made to be more generic to work with client channels where inbound
Http2StreamFrames are converted into HttpResponses, and outbound
HttpRequests are converted into Http2StreamFrames.
Modification:
- Renamed Http2ServerDowngrader to a more general
Http2StreamFrameToHttpObjectCodec
- Made it take in an "isServer" parameter to determine whether encoding
inbound Http2StreamFrames should create HttpRequests (for server) or
HttpResponses (for client)
- Norman fixed a leak in the unit test. Thanks! :-)
Result:
Now Http2StreamFrameToHttpObjectCodec can be used to translate
Http2StreamFrame to HttpObject for both server and client.
Motivation:
We used to check for version 6.8 but the latest is 6.9
Modifications:
Update version to 6.9 in the check.
Result:
Be able to cut a release on latest centos version
Motivation:
3c8c7fc7e9 introduced some changes to the ResourceLeakDetector that introduced a regression and so would always log that paranoid leak detection should be enabled even it was already.
Modifications:
Correctly not clear the recorded stacktraces when we process the reference queue so we can log these.
Result:
ResourceLeakDetector works again as expected.
Motivation:
Most, but not all defaults are statically exposed on
PooledByteBufAllocator. This makes it cumbersome to make a custom
allocator where most of the defaults remain the same.
Modification:
Expose useCacheForAllThreads, and Direct preferred. The latter is
needed because it is under the internal package, and public code
should probably not depend on it.
Result:
More customizeable allocators
Motivation:
A bunch of unit tests are failing due to certificates having expired.
This has to be replaced with a newly generated certificate that has a
longer validity.
Modifications:
- generated a certificate with validity of 100 years from now
Results:
Unit tests are passing again
Motivation:
We changed Http2ConnectionHandler to expect the upgrade method to be
called *after* we send the preface (ie add the handler to the pipeline)
but we forgot to change the Http2ClientUpgradeCodec to match the new
policy. This meant that client-side h2c upgrades failed.
Modifications:
Reverse sending the preface and calling the upgrade method to match the
new policy.
Result:
Clients can initiate h2c upgrades successfully.
Motivation:
We need to ensure we not write any body when a response with status code of 1xx, 204 or 304 is used as stated in rfc:
https://tools.ietf.org/html/rfc7230#section-3.3.3
Modifications:
- Correctly handle status codes
- Add unit tests
Result:
Correctly handle responses with 1xx, 204, 304 status codes.
Motivation:
NativeLibraryLoader uses ClassLoader#getResource method that can return nulls when the resource cannot be found. The returned url variable should be checked for nullity and fail in a more usable manner than a NullPointerException
Modifications:
Fail with a FileNotFoundException
Result:
Fixes [#7222].
Motivation:
The constrcutors a protected atm but the classes are public. We should make the constructors public as well to make it easier to write your own ByteBufAllocator.
Modifications:
Change constructors to be public and add some javadocs.
Result:
Easier to create own ByteBufAllocator.
Motiviation:
The ResourceLeakDetector helps to detect and troubleshoot resource leaks and is often used even in production enviroments with a low level. Because of this its import that we try to keep the overhead as low as overhead. Most of the times no leak is detected (as all is correctly handled) so we should keep the overhead for this case as low as possible.
Modifications:
- Only call getStackTrace() if a leak is reported as it is a very expensive native call. Also handle the filtering and creating of the String in a lazy fashion
- Remove the need to mantain a Queue to store the last access records
- Add benchmark
Result:
Huge decrease of performance overhead.
Before the patch:
Benchmark (recordTimes) Mode Cnt Score Error Units
ResourceLeakDetectorRecordBenchmark.record 8 thrpt 20 4358.367 ± 116.419 ops/s
ResourceLeakDetectorRecordBenchmark.record 16 thrpt 20 2306.027 ± 55.044 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 8 thrpt 20 4220.979 ± 114.046 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 16 thrpt 20 2250.734 ± 55.352 ops/s
With this patch:
Benchmark (recordTimes) Mode Cnt Score Error Units
ResourceLeakDetectorRecordBenchmark.record 8 thrpt 20 71398.957 ± 2695.925 ops/s
ResourceLeakDetectorRecordBenchmark.record 16 thrpt 20 38643.963 ± 1446.694 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 8 thrpt 20 71677.882 ± 2923.622 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 16 thrpt 20 38660.176 ± 1467.732 ops/s
Motivation:
When SO_LINGER is used we run doClose() on the GlobalEventExecutor by default so we need to ensure we schedule all code that needs to be run on the EventLoop on the EventLoop in doClose. Beside this there are also threading issues when calling shutdownOutput(...)
Modifications:
- Schedule removal from EventLoop to the EventLoop
- Correctly handle shutdownOutput and shutdown in respect with threading-model
- Add unit tests
Result:
Fixes [#7159].
Motivation:
c93e58c453 changed to use _ for the tcnative lib name but missed to also adjust the filtering.
Modifications:
Fix filtering to look for _
Result:
Not include native tcnative lib as expected.
Motivation:
A large frame will be componsed by many packages. Every time the package
arrived, findEndOfLine will be called from the start of the buffer. It
will cause the complexity of reading frame equal to O(n^2). This can be
eliminated by using a offset to mark the last scan position, when new
package arrived, just find the delimter from the mark. The complexity
will be O(n).
Modification:
Add a offset to mark the last scan position.
Result:
Better performance for read large frame.
Motiviation:
At the moment an NPE is thrown if someone tries to use the InboundHttp2ToHttpAdapter.
Modifications:
- Ensure the status was null in "InboundHttp2ToHttpAdapter::onPushPromiseRead" before calling "HttpConversionUtil.parseStatus" methods.
- Fix setting status to OK in "InboundHttp2ToHttpAdapter::onPushPromiseRead".
Result:
Fixes [#7214].
Motivation:
The Headers interface supports an interface to get all the headers values corresponding to a particular name. This API returns a List which requires intermediate storage and increases GC pressure.
Modifications:
- Add a method which returns an iterator over all the values for a specific name
Result:
Ability to iterator over values for a specific name with no intermediate collection.
Motivation:
RLD allocates an ArrayDeque in anticipation of recording access
points. If the leak detection level is less than ADVANCED though,
the dequeue is never used. Since SIMPLE is the default level,
there is a minor perf win to not preemptively allocate it.
This showed up in garbage profiling when creation a high number of
buffers.
Modifications:
Only allocate the dequeue if it will be used.
Result:
Less garbage created.
Motivation:
NativeLibraryLoader may only log a debug statement if the library is successfully loaded from java.library.path, but will log failure statements the if load for java.library.path fails which can mislead users to believe the load actually failed when it may have succeeded.
Modifications:
- Always load a debug statement when a library was successfully loaded
Result:
NativeLibraryLoader log statements more clear.
Motivation:
When Log4j2Logger is used with PatternLayout (%F:%L)%c.%M, the log message incorrect shows:
(Log4J2Logger.java:73)io.netty.util.internal.PlatformDependent0.debug ....
Modification:
Extend AbstractLogger
Result:
Fixes [#7186].
Motivation:
transport-rxtx has no tests and there is really no easy way to add some. Beside this this transport is not really well maintained.
Modifications:
Mark transport-rxtx as @deprecated so we can drop it in next major version.
Result:
Notify users of plan to drop the transport.
Motivation:
The MQTT decoder should raise an exception trying to build a CONNECT packet where password field is set but not the username one (as by MQTT 3.1/3.1.1 spec).
Modification:
Throw exception if password field is set but not the username
Result:
Fixes [#7205].
Motivation:
We need to ensure we use the correct osname in the Bundle-NativeCode declaration as declared in:
https://www.osgi.org/developer/specifications/reference/
Modifications:
Update osname to match the spec.
Result:
Correct Bundle-NativeCode entry in the MANIFEST
Motivation:
We missed to mark the Http2StreamChannel as writable in some cases which could lead to the situation that a Channel never becomes writable. Also when a Http2StreamChannel was created we always marked it non-writable at the beginning which means if the user will only start writing once the Channel becomes writable it will never happen as it only became writable after the first header was written.
Modifications:
- Correctly handle updates for writability in all cases
- Change unit tests to cover this.
Result:
Fixes [#7179].
Motiviation:
At the moment an NPE is thrown if someone tries to use the Http2ServerUpgradeCodec with Http2FrameCodec and Http2MultiplexCodec.
Modifications:
- Ensure the handler was added to the pipeline before calling on*Upgrade(...) methods.
- Add tests
- Fix adding of handlers after upgrade.
Result:
Fixes [#7173].
Motivation:
In `Http2StreamChannelId` a `hashCode()` is not consistent with `equals()`.
Modifications:
Make a `Http2StreamChannelId.hashCode()` consistent with `equals()`.
Result:
Faster hash map's operations where the Http2StreamChannelId as keys.
Motivation:
A `DefaultChannelId` has final `hashCode` field calculated in the constructor. We can use it in `equals` to the fast return for different objects.
Modifications:
Use `hashCode` field in `DefaultChannelId.equals()`.
Result:
Fast `equals` on negative scenarios.
Motivation:
`useCacheForAllThreads` may be false which disables memory caching
on non netty threads. Setting this argument or the system property
makes it impossible to use `PooledByteBufAllocator`.
Modifications:
Delayed the check of `freeSweepAllocationThreshold` in
`PoolThreadCache` to after it knows there will be any caches in
use. Additionally, check if the caches will have any data in them
(rather than allocating a 0-length array).
A test case is also added that fails without this change.
Results:
Fixes#7194
Motivation:
We recently saw an assertion failure when running DatagramUnicastTest.testSimpleSendWithConnect.
Modifications:
- Adding more debug infos
- Ensure we always correctly release the buffers.
Result:
More informations when tests fail.
Motivation:
We should not log by default if the promise is a VoidChannelPromise as its try* methods will always return false.
Modifications:
Do an instanceof check to determine if we should log or not by default
Result:
No more noise in the logs when using a VoidChannelPromise.
Motivation:
In `DefaultHttp2GoAwayFrame.equals()` a content compared twice: explicitly and in the `super` method.
Modifications:
Remove explicit content comparision.
Make `hashCode()` consistent with `equals()`.
Result:
A `DefaultHttp2GoAwayFrame.equals()` work faster.
Motivation:
We must not add the inboundStreamHandler for outbound streams creates by Http2MultiplexCodec as the user will specify a handler via Http2StreamChannelBootstrap.
Modifications:
- Check if the stream is for outbound and if so not add the inboundStreamHandler to the pipeline
- Update tests so this is covered.
Result:
Fixes [#7178]
Motivation:
379ac890f4 introduced the usage of the inline mock maker. This unfortunally not work on java7 and java9.
Modifications:
Just use reflection to create the event for now.
Result:
Netty tests pass again on java7 and java9 as well.
Motivation:
We should only try to load the native artifacts if the architecture we are currently running on is the same as the one the native libraries were compiled for.
Modifications:
Include architecture in native lib name and append the current arch when trying to load these. This will fail then if its not the same as the arch of the compiled arch.
Result:
Fixes [#7150].