Motivation:
Highly retained and released objects have contention on their ref
count. Currently, the ref count is updated using compareAndSet
with care to make sure the count doesn't overflow, double free, or
revive the object.
Profiling has shown that a non trivial (~1%) of CPU time on gRPC
latency benchmarks is from the ref count updating.
Modification:
Rather than pessimistically assuming the ref count will be invalid,
optimistically update it assuming it will be. If the update was
wrong, then use the slow path to revert the change and throw an
execption. Most of the time, the ref counts are correct.
This changes from using compareAndSet to getAndAdd, which emits a
different CPU instruction on x86 (CMPXCHG to XADD). Because the
CPU knows it will modifiy the memory, it can avoid contention.
On a highly contended machine, this can be about 2x faster.
There is a downside to the new approach. The ref counters can
temporarily enter invalid states if over retained or over released.
The code does handle these overflow and underflow scenarios, but it
is possible that another concurrent access may push the failure to
a different location. For example:
Time 1 Thread 1: obj.retain(INT_MAX - 1)
Time 2 Thread 1: obj.retain(2)
Time 2 Thread 2: obj.retain(1)
Previously Thread 2 would always succeed and Thread 1 would always
fail on the second access. Now, thread 2 could fail while thread 1
is rolling back its change.
====
There are a few reasons why I think this is okay:
1. Buggy code is going to have bugs. An exception _is_ going to be
thrown. This just causes the other threads to notice the state
is messed up and stop early.
2. If high retention counts are a use case, then ref count should
be a long rather than an int.
3. The critical section is greatly reduced compared to the previous
version, so the likelihood of this happening is lower
4. On error, the code always rollsback the change atomically, so
there is no possibility of corruption.
Result:
Faster refcounting
```
BEFORE:
Benchmark (delay) Mode Cnt Score Error Units
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 1 sample 2901361 804.579 ± 1.835 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 10 sample 3038729 785.376 ± 16.471 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 100 sample 2899401 817.392 ± 6.668 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 1000 sample 3650566 2077.700 ± 0.600 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 10000 sample 3005467 19949.334 ± 4.243 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 1 sample 456091 48.610 ± 1.162 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 10 sample 732051 62.599 ± 0.815 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 100 sample 778925 228.629 ± 1.205 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 1000 sample 633682 2002.987 ± 2.856 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 10000 sample 506442 19735.345 ± 12.312 ns/op
AFTER:
Benchmark (delay) Mode Cnt Score Error Units
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 1 sample 3761980 383.436 ± 1.315 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 10 sample 3667304 474.429 ± 1.101 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 100 sample 3039374 479.267 ± 0.435 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 1000 sample 3709210 2044.603 ± 0.989 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 10000 sample 3011591 19904.227 ± 18.025 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 1 sample 494975 52.269 ± 8.345 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 10 sample 771094 62.290 ± 0.795 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 100 sample 763230 235.044 ± 1.552 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 1000 sample 634037 2006.578 ± 3.574 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 10000 sample 506284 19742.605 ± 13.729 ns/op
```
Motivation:
#7269 removed an unnecessary instanciation for verifying WebSocket
handshake status code.
But it uses a hardcoded status code value for 101 instead of using the
intended `HttpResponseStatus#SWITCHING_PROTOCOLS` constant.
Modidication:
Compare actual `HttpResponseStatus` against predefined constant. Note
that `HttpResponseStatus#equals` is implemented in respect with the RFC
(only honor code, not text) so it’s intended to be used this way.
Result:
Cleaner code, use intended constant instead of hard coded value.
Motivation:
Java9SslEngine did not correctly implement ApplicationProtocolAccessor and so returned an empty String when no extension was used while the interface contract is to return null. This lead to the situation that ApplicationProtocolNegationHandler did not correct work.
Modifications:
- Rename ApplicationProtocolAccessor.getApplicationProtocol() to getNegotiatedApplicationProtocol() which resolves the clash with the method exposed by Java9s SSLEngine.
- Correctly implement getNegotiatedApplicationProtocol() for Java9Sslengine
- Add delegate in Java9Sslengine.getApplicationProtocol() which is provided by Java9
- Adjust tests to test the correct behaviour.
Result:
Fixes [#7251].
Motivation:
https://github.com/netty/netty/issues/7253
Modifications:
Adding `Content-Length: 0` to `CorsHandler.forbidden()` and `CorsHandler.handlePreflight()`
Result:
Contexts that are terminated by the CorsHandler will always include a Content-Length header
Motivation:
- In the `HttpResponseStatus#equals` checks only status code. No need to create new instance of `HttpResponseStatus` for comparison with response status.
- The RFC says: `the HTTP version and reason phrase aren't important` [1].
Modifications:
Use comparison by status code without creating new `HttpResponseStatus`.
Result:
Less allocations, more clear code.
[1] https://tools.ietf.org/html/draft-ietf-hybi-thewebsocketprotocol-00
Motivation:
There are many @SuppressWarnings("unchecked") in the code for the same purpose that we want to do this return:
@SuppressWarnings("unchecked")
public B someMethod() {
......
return (B) this;
}
Modification:
Add a method self() and reuse in all these return lines:
@SuppressWarnings("unchecked")
private B self() {
return (B) this;
}
Result:
Then only one @SuppressWarnings("unchecked") left in the code.
Motivation:
Getting the latest Conscrypt goodies.
Modifications:
A few API changes have occurred, specifically in the Conscrypt
class.
Result:
Netty now builds and tests against Conscrypt 1.0.0.RC11
Motivation:
Http2ServerDowngrader is specifically built for server channels where
inbound Http2StreamFrames are converted into HttpRequests, and outbound
HttpResponses are converted into Http2StreamFrames. It can be easily
made to be more generic to work with client channels where inbound
Http2StreamFrames are converted into HttpResponses, and outbound
HttpRequests are converted into Http2StreamFrames.
Modification:
- Renamed Http2ServerDowngrader to a more general
Http2StreamFrameToHttpObjectCodec
- Made it take in an "isServer" parameter to determine whether encoding
inbound Http2StreamFrames should create HttpRequests (for server) or
HttpResponses (for client)
- Norman fixed a leak in the unit test. Thanks! :-)
Result:
Now Http2StreamFrameToHttpObjectCodec can be used to translate
Http2StreamFrame to HttpObject for both server and client.
Motivation:
We used to check for version 6.8 but the latest is 6.9
Modifications:
Update version to 6.9 in the check.
Result:
Be able to cut a release on latest centos version
Motivation:
3c8c7fc7e9 introduced some changes to the ResourceLeakDetector that introduced a regression and so would always log that paranoid leak detection should be enabled even it was already.
Modifications:
Correctly not clear the recorded stacktraces when we process the reference queue so we can log these.
Result:
ResourceLeakDetector works again as expected.
Motivation:
Most, but not all defaults are statically exposed on
PooledByteBufAllocator. This makes it cumbersome to make a custom
allocator where most of the defaults remain the same.
Modification:
Expose useCacheForAllThreads, and Direct preferred. The latter is
needed because it is under the internal package, and public code
should probably not depend on it.
Result:
More customizeable allocators
Motivation:
A bunch of unit tests are failing due to certificates having expired.
This has to be replaced with a newly generated certificate that has a
longer validity.
Modifications:
- generated a certificate with validity of 100 years from now
Results:
Unit tests are passing again
Motivation:
We changed Http2ConnectionHandler to expect the upgrade method to be
called *after* we send the preface (ie add the handler to the pipeline)
but we forgot to change the Http2ClientUpgradeCodec to match the new
policy. This meant that client-side h2c upgrades failed.
Modifications:
Reverse sending the preface and calling the upgrade method to match the
new policy.
Result:
Clients can initiate h2c upgrades successfully.
Motivation:
We need to ensure we not write any body when a response with status code of 1xx, 204 or 304 is used as stated in rfc:
https://tools.ietf.org/html/rfc7230#section-3.3.3
Modifications:
- Correctly handle status codes
- Add unit tests
Result:
Correctly handle responses with 1xx, 204, 304 status codes.
Motivation:
NativeLibraryLoader uses ClassLoader#getResource method that can return nulls when the resource cannot be found. The returned url variable should be checked for nullity and fail in a more usable manner than a NullPointerException
Modifications:
Fail with a FileNotFoundException
Result:
Fixes [#7222].
Motivation:
The constrcutors a protected atm but the classes are public. We should make the constructors public as well to make it easier to write your own ByteBufAllocator.
Modifications:
Change constructors to be public and add some javadocs.
Result:
Easier to create own ByteBufAllocator.
Motiviation:
The ResourceLeakDetector helps to detect and troubleshoot resource leaks and is often used even in production enviroments with a low level. Because of this its import that we try to keep the overhead as low as overhead. Most of the times no leak is detected (as all is correctly handled) so we should keep the overhead for this case as low as possible.
Modifications:
- Only call getStackTrace() if a leak is reported as it is a very expensive native call. Also handle the filtering and creating of the String in a lazy fashion
- Remove the need to mantain a Queue to store the last access records
- Add benchmark
Result:
Huge decrease of performance overhead.
Before the patch:
Benchmark (recordTimes) Mode Cnt Score Error Units
ResourceLeakDetectorRecordBenchmark.record 8 thrpt 20 4358.367 ± 116.419 ops/s
ResourceLeakDetectorRecordBenchmark.record 16 thrpt 20 2306.027 ± 55.044 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 8 thrpt 20 4220.979 ± 114.046 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 16 thrpt 20 2250.734 ± 55.352 ops/s
With this patch:
Benchmark (recordTimes) Mode Cnt Score Error Units
ResourceLeakDetectorRecordBenchmark.record 8 thrpt 20 71398.957 ± 2695.925 ops/s
ResourceLeakDetectorRecordBenchmark.record 16 thrpt 20 38643.963 ± 1446.694 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 8 thrpt 20 71677.882 ± 2923.622 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 16 thrpt 20 38660.176 ± 1467.732 ops/s
Motivation:
When SO_LINGER is used we run doClose() on the GlobalEventExecutor by default so we need to ensure we schedule all code that needs to be run on the EventLoop on the EventLoop in doClose. Beside this there are also threading issues when calling shutdownOutput(...)
Modifications:
- Schedule removal from EventLoop to the EventLoop
- Correctly handle shutdownOutput and shutdown in respect with threading-model
- Add unit tests
Result:
Fixes [#7159].
Motivation:
c93e58c453 changed to use _ for the tcnative lib name but missed to also adjust the filtering.
Modifications:
Fix filtering to look for _
Result:
Not include native tcnative lib as expected.
Motivation:
A large frame will be componsed by many packages. Every time the package
arrived, findEndOfLine will be called from the start of the buffer. It
will cause the complexity of reading frame equal to O(n^2). This can be
eliminated by using a offset to mark the last scan position, when new
package arrived, just find the delimter from the mark. The complexity
will be O(n).
Modification:
Add a offset to mark the last scan position.
Result:
Better performance for read large frame.
Motiviation:
At the moment an NPE is thrown if someone tries to use the InboundHttp2ToHttpAdapter.
Modifications:
- Ensure the status was null in "InboundHttp2ToHttpAdapter::onPushPromiseRead" before calling "HttpConversionUtil.parseStatus" methods.
- Fix setting status to OK in "InboundHttp2ToHttpAdapter::onPushPromiseRead".
Result:
Fixes [#7214].
Motivation:
The Headers interface supports an interface to get all the headers values corresponding to a particular name. This API returns a List which requires intermediate storage and increases GC pressure.
Modifications:
- Add a method which returns an iterator over all the values for a specific name
Result:
Ability to iterator over values for a specific name with no intermediate collection.
Motivation:
RLD allocates an ArrayDeque in anticipation of recording access
points. If the leak detection level is less than ADVANCED though,
the dequeue is never used. Since SIMPLE is the default level,
there is a minor perf win to not preemptively allocate it.
This showed up in garbage profiling when creation a high number of
buffers.
Modifications:
Only allocate the dequeue if it will be used.
Result:
Less garbage created.
Motivation:
NativeLibraryLoader may only log a debug statement if the library is successfully loaded from java.library.path, but will log failure statements the if load for java.library.path fails which can mislead users to believe the load actually failed when it may have succeeded.
Modifications:
- Always load a debug statement when a library was successfully loaded
Result:
NativeLibraryLoader log statements more clear.
Motivation:
When Log4j2Logger is used with PatternLayout (%F:%L)%c.%M, the log message incorrect shows:
(Log4J2Logger.java:73)io.netty.util.internal.PlatformDependent0.debug ....
Modification:
Extend AbstractLogger
Result:
Fixes [#7186].
Motivation:
transport-rxtx has no tests and there is really no easy way to add some. Beside this this transport is not really well maintained.
Modifications:
Mark transport-rxtx as @deprecated so we can drop it in next major version.
Result:
Notify users of plan to drop the transport.
Motivation:
The MQTT decoder should raise an exception trying to build a CONNECT packet where password field is set but not the username one (as by MQTT 3.1/3.1.1 spec).
Modification:
Throw exception if password field is set but not the username
Result:
Fixes [#7205].
Motivation:
We need to ensure we use the correct osname in the Bundle-NativeCode declaration as declared in:
https://www.osgi.org/developer/specifications/reference/
Modifications:
Update osname to match the spec.
Result:
Correct Bundle-NativeCode entry in the MANIFEST
Motivation:
We missed to mark the Http2StreamChannel as writable in some cases which could lead to the situation that a Channel never becomes writable. Also when a Http2StreamChannel was created we always marked it non-writable at the beginning which means if the user will only start writing once the Channel becomes writable it will never happen as it only became writable after the first header was written.
Modifications:
- Correctly handle updates for writability in all cases
- Change unit tests to cover this.
Result:
Fixes [#7179].
Motiviation:
At the moment an NPE is thrown if someone tries to use the Http2ServerUpgradeCodec with Http2FrameCodec and Http2MultiplexCodec.
Modifications:
- Ensure the handler was added to the pipeline before calling on*Upgrade(...) methods.
- Add tests
- Fix adding of handlers after upgrade.
Result:
Fixes [#7173].
Motivation:
In `Http2StreamChannelId` a `hashCode()` is not consistent with `equals()`.
Modifications:
Make a `Http2StreamChannelId.hashCode()` consistent with `equals()`.
Result:
Faster hash map's operations where the Http2StreamChannelId as keys.
Motivation:
A `DefaultChannelId` has final `hashCode` field calculated in the constructor. We can use it in `equals` to the fast return for different objects.
Modifications:
Use `hashCode` field in `DefaultChannelId.equals()`.
Result:
Fast `equals` on negative scenarios.
Motivation:
`useCacheForAllThreads` may be false which disables memory caching
on non netty threads. Setting this argument or the system property
makes it impossible to use `PooledByteBufAllocator`.
Modifications:
Delayed the check of `freeSweepAllocationThreshold` in
`PoolThreadCache` to after it knows there will be any caches in
use. Additionally, check if the caches will have any data in them
(rather than allocating a 0-length array).
A test case is also added that fails without this change.
Results:
Fixes#7194
Motivation:
We recently saw an assertion failure when running DatagramUnicastTest.testSimpleSendWithConnect.
Modifications:
- Adding more debug infos
- Ensure we always correctly release the buffers.
Result:
More informations when tests fail.
Motivation:
We should not log by default if the promise is a VoidChannelPromise as its try* methods will always return false.
Modifications:
Do an instanceof check to determine if we should log or not by default
Result:
No more noise in the logs when using a VoidChannelPromise.
Motivation:
In `DefaultHttp2GoAwayFrame.equals()` a content compared twice: explicitly and in the `super` method.
Modifications:
Remove explicit content comparision.
Make `hashCode()` consistent with `equals()`.
Result:
A `DefaultHttp2GoAwayFrame.equals()` work faster.
Motivation:
We must not add the inboundStreamHandler for outbound streams creates by Http2MultiplexCodec as the user will specify a handler via Http2StreamChannelBootstrap.
Modifications:
- Check if the stream is for outbound and if so not add the inboundStreamHandler to the pipeline
- Update tests so this is covered.
Result:
Fixes [#7178]
Motivation:
379ac890f4 introduced the usage of the inline mock maker. This unfortunally not work on java7 and java9.
Modifications:
Just use reflection to create the event for now.
Result:
Netty tests pass again on java7 and java9 as well.
Motivation:
We should only try to load the native artifacts if the architecture we are currently running on is the same as the one the native libraries were compiled for.
Modifications:
Include architecture in native lib name and append the current arch when trying to load these. This will fail then if its not the same as the arch of the compiled arch.
Result:
Fixes [#7150].
Motivation:
When using Http2FrameCodec / Http2MultiplexCodec with HttpServerUpgradeHandler reference count exception will be triggered.
Modifications:
- Correctly retain before calling InboundHttpToHttp2Adapter.handle
- Add unit test
Result:
Fixes [#7172].
Motivation:
When the user want to have the direct memory explicitly managed by the GC (just as java.nio does) it is useful to be able to construct an UnpooledByteBufAllocator that allows this without the chances to see any memory leak.
Modifications:
Allow to explicitly disable the usage of reflection to construct direct ByteBufs and so be sure these will be collected by GC.
Result:
More flexible way to use the UnpooledByteBufAllocator.
Motivation:
HttpObjectEncoder allocates a new buffer when encoding the initial line and headers, and also allocates a buffer when encoding the trailers. The allocation always uses the default size of 256. This may lead to consistent under allocation and require a few resize/copy operations which can cause GC/memory pressure.
Modifications:
- Introduce a weighted average which tracks the historical size of encoded data and uses this as an estimate for future buffer allocations
Result:
Better approximation of buffer sizes.
Motivation:
The documentation for field updates says:
> Note that the guarantees of the {@code compareAndSet}
> method in this class are weaker than in other atomic classes.
> Because this class cannot ensure that all uses of the field
> are appropriate for purposes of atomic access, it can
> guarantee atomicity only with respect to other invocations of
> {@code compareAndSet} and {@code set} on the same updater.
This implies that volatiles shouldn't use normal assignment; the
updater should set them.
Modifications:
Use setter for field updaters that make use of compareAndSet.
Result:
Concurrency compliant code
Motivation:
By STOMP 1.2 specification - header name or value include any octet except CR or LF or ":".
Modification:
Add constructor argument that allows to enable / disable validation.
Result:
Fixes [#7083]
Motivation:
PD and PD0 Both try to find and use Unsafe. If unavailable, they
try to log why and continue on. However, it is not always east to
enable this logging. Chaining exceptions together is much easier
to reach, and the original exception is relevant when Unsafe is
needed.
Modifications:
* Make PD log why PD0 could not be loaded with a trace level log
* Make PD0 remember why Unsafe wasn't available
* Expose unavailability cause through PD for higher level use.
* Make Epoll and KQueue include the reason when failing
Result:
Easier debugging in hard to reconfigure environments
Motivation:
According to SOCKS 5 spec, dstPort = 0 is a valid value in case of UDP ASSOCIATE.
Modifications:
- Allow 0 as port.
- Add unit tests.
Result:
Fixes [#7156].