Motivation:
A race detector discovered a data race in GlobalEventExecutor present in
netty 4.1.51.Final:
```
Write of size 4 at 0x0000cea08774 by thread T103:
#0 io.netty.util.internal.DefaultPriorityQueue.poll()Lio/netty/util/internal/PriorityQueueNode; DefaultPriorityQueue.java:113
#1 io.netty.util.internal.DefaultPriorityQueue.poll()Ljava/lang/Object; DefaultPriorityQueue.java:31
#2 java.util.AbstractQueue.remove()Ljava/lang/Object; AbstractQueue.java:113
#3 io.netty.util.concurrent.AbstractScheduledEventExecutor.pollScheduledTask(J)Ljava/lang/Runnable; AbstractScheduledEventExecutor.java:133
#4 io.netty.util.concurrent.GlobalEventExecutor.fetchFromScheduledTaskQueue()V GlobalEventExecutor.java:119
#5 io.netty.util.concurrent.GlobalEventExecutor.takeTask()Ljava/lang/Runnable; GlobalEventExecutor.java:106
#6 io.netty.util.concurrent.GlobalEventExecutor$TaskRunner.run()V GlobalEventExecutor.java:240
#7 io.netty.util.internal.ThreadExecutorMap$2.run()V ThreadExecutorMap.java:74
#8 io.netty.util.concurrent.FastThreadLocalRunnable.run()V FastThreadLocalRunnable.java:30
#9 java.lang.Thread.run()V Thread.java:835
#10 (Generated Stub) <null>
Previous read of size 4 at 0x0000cea08774 by thread T110:
#0 io.netty.util.internal.DefaultPriorityQueue.size()I DefaultPriorityQueue.java:46
#1 io.netty.util.concurrent.GlobalEventExecutor$TaskRunner.run()V GlobalEventExecutor.java:263
#2 io.netty.util.internal.ThreadExecutorMap$2.run()V ThreadExecutorMap.java:74
#3 io.netty.util.concurrent.FastThreadLocalRunnable.run()V FastThreadLocalRunnable.java:30
#4 java.lang.Thread.run()V Thread.java:835
#5 (Generated Stub) <null>
```
The race is legit, but benign. To trigger it requires a TaskRunner to
begin exiting and set 'started' to false, more work to be scheduled
which starts a new TaskRunner, that work then needs to schedule
additional work which modifies 'scheduledTaskQueue', and then the
original TaskRunner checks 'scheduledTaskQueue'. But there is no danger
to this race as it can only produce a false negative in the condition
which causes the code to CAS 'started' which is thread-safe.
Modifications:
Delete problematic references to scheduledTaskQueue. The only way
scheduledTaskQueue could be modified since the last check is if another
TaskRunner is running, in which case the current TaskRunner doesn't
care.
Result:
Data-race free code, and a bit less code to boot.
Motivation:
When validating PRs we should also at least run one job that uses boringssl
Modifications:
- Add job that uses boringssl
- Cleanup docker compose files
- Fix buffer leak in test
Result:
Also run with boringssl when PRs are validated
Motivation:
To prevent failures to problems while downloading dependencies we shoud cache these
Modifications:
Add maven cache
Result:
No more failures due problems while downloading dependencies
Motivation:
We need to find a way to deploy SNAPSHOTS for different arch with the same timestamp. Otherwise it will cause problems.
See https://github.com/netty/netty/issues/10887
Modification:
Skip all other deploys then x86_64
Result:
Users are able to use SNAPSHOTS for x86_6
Motivation:
The CONNACK message builder `ConnAckBuilder` doesn't provide a smooth way to assign the message properties. This PR try to provide an simpler way to create them, in a lazy way.
Modification:
This PR permit to store properties in the ConnAck message, collecting them and inserting during the build phase. The syntax this PR introduces is:
```java
MqttMessageBuilders.connAck().properties(new MqttMessageBuilders.PropertiesInitializer<MqttMessageBuilders.ConnAckPropertiesBuilder>() {
@Override
public void apply(MqttMessageBuilders.ConnAckPropertiesBuilder builder) {
builder.assignedClientId("client1234");
builder.userProperty("custom_property", "value");
}
}).build()
```
The name of the properties are defined in the `ConnAckPropertiesBuilder` so that is can be easily used by autocompletion tools.
Result:
This PR adds the builder class `ConnAckPropertiesBuilder`which is used by newly introduced method `properties` inside the message builder class `ConnAckBuilder`.
Motivation:
We should also deploy snapshots for our cross compiled native jars.
Modifications:
- Add job and docker files for deploying cross compiled native jars
- Ensure we map the maven cache into our docker containers
Result:
Deploy aarch64 jars and re-use cache
Motivation:
Android seems to use a different field name so we should also try to access it with the name used by android.
Modifications:
Try first fd and if this fails try descriptor as field name
Result:
Workaround for android.
Motivation:
switch is used when we have a good amount of cases because switch is faster than if-else. However, we're using only 1 case in switch which can affect performance.
Modification:
Changed switch to if.
Result:
Good code.
Motivation:
We should just use GitHub Actions for the CI
Modifications:
- Adjust docker / docker compose files
- Add different workflows and jobs to deploy and build the project
Result:
Don't depend on external CI services
Motivation:
HPACK static table is organized in a way that fields with the same
name are sequential. Which means when doing sequential scan we can
short-circuit scan on name mismatch.
Modifications:
* `HpackStaticTable.getIndexIndensitive` returns -1 on name mismatch
rather than keep scanning.
* `HpackStaticTable` statically defined max position in the array
where name duplication is possible (after the given index there's
no need to check for other fields with the same name)
* Benchmark for different lookup patterns
Result:
Better HPACK static table lookup performance.
Co-authored-by: Norman Maurer <norman_maurer@apple.com>
Motivation:
netty-jni-util is now also hosted on maven central. Let's use it
Modifications:
Adjust plugins to just unpack netty-jni-util and use it
Result:
Be able to use what is in the maven cache for netty-jni-util
Motivation:
If the MQTT client specifies Subscribe Options parameters only available in MQTT v5 and tries to encode the message as MQTT v3 then an invalid QoS value is encoded
Modification:
Check MQTT version when encoding SUBSCRIBE message options, if it's 3.1 or 3.1.1 - only encode QoS, skip other options.
Result:
MqttEncoder produces a valid SUBSCRIBE message even if the client has specified options not available in the current MQTT version.
Motivation:
We need to ensure we only register native methods once as otherwise we may end up in an "invalid" state. The problem here was that before it was basically the responsibility the user of transport-native-unix-common to register the methods. This is error prone as there may be multiple users of these on the classpath at the same time.
Modifications:
- Provide a way to init native lib without register the native methods of the provided classes. This is needed to be able to re-use functionality which is exposed to our internal native code
- Use flatten plugin to correctly resolve classifier and so have the correct dependency
- Call Unix.* method to ensure we register the methods correctly once
- Include native lib as well in the native jars of unix-common
Result:
Be able to have multiple artifacts of the classpath that depends on the unix-common. Related to https://github.com/netty/netty-incubator-transport-io_uring/issues/15
Motivation:
I did not see any tangible advantage to the padding.
The only other field that was guarded was a rarely changed object reference to a BitSet.
Without the padding, there is also no longer any use of the inheritance hierarchy.
The padding was also using `long`, which would not necessarily prevent the JVM from fitting the aforementioned object reference in an alignment gap.
Modification:
Move all the fields into the InternalThreadLocalMap and deprecate the stuff we'd like to remove.
Result:
Simpler code.
This resolves the discussion in https://github.com/netty/netty/issues/9284
Motivation:
https://github.com/netty/netty/pull/10814 did fix a bug where we did try to call memoryAddress() even tho this is not supported. Unfortunally this fix was only applied for one method and so we missed another method which then could throw an exception when we called memoryAddress()
Modifications:
- Also fix the memoryAddress(offset) method.
_ Adjust unit test to also test this.
Result:
Fixes https://github.com/netty/netty/issues/10813 completely.
Motivation:
We shouldnt catch Throwable in InternalLoggerFactory as this may hide OOME etc.
Modifications:
Only catch LinkageError and Exception
Result:
Fixes https://github.com/netty/netty/issues/10857
Motivation:
When using the JDKs SSLEngineImpl with TLSv1.3 it sometimes returns HandshakeResult.FINISHED multiple times. This can lead to have SslHandshakeCompletionEvents to be fired multiple times.
Modifications:
- Keep track of if we notified before and if so not do so again if we use TLSv1.3
- Add unit test
Result:
Consistent usage of events
Motivation:
To fix the infinite loop parsing a multipart body.
Modifications:
Modified the loop to use the correct variable.
Result:
Multipart bodies will be parsed correctly again.
Motivation:
`DnsServerAddressStreamProviders` tries to load `MacOSDnsServerAddressStreamProvider`
on macOS. However, it doesn't warn users when `MacOSDnsServerAddressStreamProvider`
is not awailable, which may cause incorrect results for DNS resolutions.
Modifications:
- Log at warn level if `MacOSDnsServerAddressStreamProvider` is not found on macOS;
- Log at debug level when `MacOSDnsServerAddressStreamProvider` is loaded and available;
Result:
macOS users are notified when `MacOSDnsServerAddressStreamProvider` is not available.
Motivation:
Found an invalid comment in UnpooledDirectByteBuf.
Modification:
Fixed a comment in UnpooledDirectByteBuf.
Result:
Fixed a comment in UnpooledDirectByteBuf.
Motivation:
In #10630, field substitutions were introduced for NetUtil.LOCALHOST4, NetUtil.LOCALHOST6 and NetUtil.LOCALHOST fields. They were required to allow a native image be built with most of Netty (including NetUtil) initialized at build time.
The substitutions created in #10630 only define getters, so the 3 fields can only be read in a native image.
But when NetUtil is initialized at run-time (this is what happens in #10797), its static initialization block is executed, and this block writes to all 3 fields. As the substitutions do not provide any setters, field stores are not valid, and such builds fail.
Modifications:
- Add netty-testsuite-native-image-client-runtime-init Maven module that builds a native image deferring NetUtil class initialization till run-time; this module is used to demonstrate the problem and verify the problem is gone with the fix
- Add no-op setters to substitutions for NetUtil.LOCALHOST4, NetUtil.LOCALHOST6 and NetUtil.LOCALHOST
Result:
A native image initializing NetUtil at run-time builds successfully.
Fixes#10797
Motivation:
According to specification 1006 status code must not be set as a status code in a
Close control frame by the endpoint. However 1006 status code can be
used in applications to indicate that the connection was closed abnormally.
Modifications:
- Enforce status code validation in CloseWebSocketFrame
- Add WebSocketCloseStatus construction with disabled validation
- Add test
Result:
Fixes#10838
Motivation:
We should use aarch_64 in our classifier / jni libname on aarch64 as os.detected.arch uses the name. Being non consistent (especially across our different projects) already gave us a lot of trouble in the past.
Let's fix this once for all.
Modifications:
Use aarch_64
Result:
More consistent classifier usage on aarch64
Motivation:
We should add `state` in the exception message of `HttpObjectEncoder` because it makes debugging a little easier.
Modification:
Added `state` in the exception message.
Result:
Better exception message for smooth debugging.
Motivation:
Internally SSLEngineImpl.wrap(...) may call FileInputStream.read(...).
This will cause the error below when BlockHound is enabled
reactor.blockhound.BlockingOperationError: Blocking call! java.io.FileInputStream#readBytes
at java.io.FileInputStream.readBytes(FileInputStream.java)
at java.io.FileInputStream.read(FileInputStream.java:255)
Modifications:
- Add whitelist entry to BlockHound configuration
- Add test
Result:
Fixes#10837
Motivation:
If Log4J2's `Filter` creates `Recycler.Stack` somehow, `Recycler.Stack()` will see uninitialized `Recycler.INITIAL_CAPACITY`. This has been raised originally in https://github.com/micrometer-metrics/micrometer/issues/2369.
Modification:
This PR changes to initialize `Recycler.INITIAL_CAPACITY` before invoking `InternalLogger.debug()` to avoid it.
Result:
Fixes the problem described in the "Motivation" section.
Motivation:
We rely on this functionality in PoolChunk, and a bug was caught by a non-deterministic test failure
Modification:
Went back to the Algorithms book, and reimplemented remove() the way it was meant to.
Result:
No test failures after 200.000 runs, so we have some confidence the code is correct now.
Motivation:
Http2ConnectionHandler tries to addListener to the future without checking if it's void. If it is void, this will fail and generate an exception.
Modifications:
Unvoid the promise in writeData()
Result:
Fixes#10816, Writing with a voidPromise no longer generates exceptions.
Motivation:
The uncached access to PoolChunk can be made faster, and avoid allocating boxed Longs, if we have a primitive hash map and priority queue implementation for it.
Modification:
Add bespoke primitive implementations of a hash map and a priority queue for PoolChunk.
Remove all the long-boxing caused by the previous implementation.
The hashmap is a linear probing map with a fairly short probe that keeps the search within a couple of cache lines.
The priority queue is the same binary heap algorithm that's described in Algorithms by Sedgewick and Wayne.
The implementation avoids the Long boxing by relying on a long[] array.
This makes the internal-remove method faster, which is an important operation in PoolChunk.
Result:
Roughly 13% performance uplift in buffer allocations that miss cache.
Motivation:
The DnsNameResolver internally follows CNAME indirects for all records types, and supports caching for CNAME resolution and A* records. For DNS record types that are not cached (e.g. SRV records) the caching of CNAME records may result in failures at incorrect times. For example if a CNAME record has a larger TTL than the entries it resolves this may result in failures which don't occur if the CNAME cache is disabled.
Modifications:
- Don't cache CNAME and also dont use the cache for CNAME when using DnsRecordResolveContext
- Add unit test
Result:
More correct resolving and also not possible to have failures due CNAME still be in the cache while the queried record experied