Expose a LoggingDnsQueryLifeCycleObserverFactory
Motivation:
There is a use case for having logging in the DnsNameResolver, similar to the LoggingHandler.
Previously, one could set `traceEnabled` on the DnsNameResolverBuilder, but this is not very configurable.
Specifically, the log level and the logger context cannot be changed.
Modification:
Expose a LoggingDnsQueryLifeCycleObserverFactory, that permit changing the log-level
and logger context.
Result:
It is now possible to get logging in the DnsNameResolver at a custom log level and logger,
without very much effort.
Fixes#10485
Motivation:
We need limit the number of queries done to resolve unresolved nameserver as otherwise we may never fail the resolve when there is a missconfigured dns server that lead to a "resolve loop".
Modifications:
- When resolve unresolved nameservers ensure we use the allowedQueries as max upper limit for queries so it will eventually fail
- Add unit tests
Result:
No more possibility to fail in a loop while resolve. This is related to https://github.com/netty/netty/issues/10420
Motivation:
DnsAddressResolverGroup allows to be constructed with a DnsNameResolverBuilder and so should respect its configured EventLoop.
Modifications:
- Correctly respect the configured EventLoop
- Ensure there are no thread-issues by calling copy()
- Add unit tests
Result:
Fixes https://github.com/netty/netty/issues/10460
Motivation:
If we cancel the returned future of resolve query, we may get LEAK. Try to release the ByteBuf if netty can't pass the DnsRawRecord to the caller.
Modification:
Using debug mode I saw there are two places that don't handle trySuccess with release. Try to release there.
Result:
Fixes#10447.
Motivation:
There is a possibility to end up with a StackOverflowError when trying to resolve authorative nameservers because of incorrect wrapping the AuthoritativeDnsServerCache.
Modifications:
Ensure we don't end up with an overflow due wrapping
Result:
Fixes https://github.com/netty/netty/issues/10246
Motivation:
Once a CNAME loop was detected we can just fail fast and so reduce the number of queries.
Modifications:
Fail fast once a CNAME loop is detected
Result:
Fail fast
Motivation:
We did use a HashSet to detect CNAME cache loops which needs allocations. We can use an algorithm that doesnt need any allocations
Modifications:
Use algorithm that doesnt need allocations
Result:
Less allocations on the slow path
Motivation:
The nameserver that should / must be used to resolve a CNAME may be different then the nameserver that was selected for the hostname to resolve. Failing to select the correct nameserver may result in problems during resolution.
Modifications:
Use the correct DnsServerAddressStream for CNAMEs
Result:
Always use the correct DnsServerAddressStream for CNAMEs and so fix resolution failures which could accour when CNAMEs are in the mix that use a different domain then the original hostname that we try to resolve
Motivation:
We should only log with warn level if something really critical happens as otherwise we may spam logs and confuse the user.
Modifications:
- Change log level to debug for most cases
Result:
Less noisy logging
Motivation:
We need to detect CNAME loops during lookup the DnsCnameCache as otherwise we may try to follow cnames forever.
Modifications:
- Correctly detect CNAME loops in the cache
- Add unit test
Result:
Fixes https://github.com/netty/netty/issues/10220
Motivations
-----------
DnsNameResolverBuilder and DnsNameResolver do not auto-configure
themselves uing default options define in /etc/resolv.conf.
In particular, rotate, timeout and attempts options are ignored.
Modifications
-------------
- Modified UnixResolverDnsServerAddressStreamProvider to parse ndots,
attempts and timeout options all at once and use these defaults to
configure DnsNameResolver when values are not provided by the
DnsNameResolverBuilder.
- When rotate option is specified, the DnsServerAddresses returned by
UnixResolverDnsServerAddressStreamProvider is rotational.
- Amend resolv.conf options with the RES_OPTIONS environment variable
when present.
Result:
Fixes https://github.com/netty/netty/issues/10202
Motivation:
A user might want to cancel DNS resolution when it takes too long.
Currently, there's no way to cancel the internal DNS queries especially when there's a lot of search domains.
Modification:
- Stop sending a DNS query if the original `Promise`, which was passed calling `resolve()`, is canceled.
Result:
- You can now stop sending DNS queries by cancelling the `Promise`.
Motivation:
Related https://github.com/line/armeria/issues/2463
Here is an example that an NIC has only link local address for IPv6.
```
$ ipaddr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
3: eth0@if18692: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1460 qdisc noqueue
link/ether 1a:5e:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
inet 10.xxx.xxx.xxx/24 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::xxxx:xxxx:xxxx:xxxx/64 scope link
valid_lft forever preferred_lft forever
```
If the NICs have only local or link local addresses, We should not send IPv6 DNS queris.
Modification:
- Ignore link-local IPv6 addresses which may exist even on a machine without IPv6 network.
Result:
- `DnsNameResolver` does not send DNS queries for AAAA when IPv6 is not available.
Motivation:
939e928312 introduced MacOSDnsServerAddressStreamProvider which will ensure the right nameservers are selected when running on MacOS. To ensure this is done automatically on MacOS we should use it by default on these platforms.
Modifications:
Try to use MacOSDnsServerAddressStreamProvider when on MacOS via reflection and fallback if not possible
Result:
Ensure the right nameservers are used on MacOS even when a VPN (for example) is used.
Motivation:
In general, we will close the debug log in a product environment. However, logging without external level check may still affect performance as varargs will need to allocate an array.
Modification:
Add log level check simply before logging.
Result:
Improve performance slightly in a product environment.
Motivation:
The resolver API and implementations should be considered stable by now so we should not mark these with @UnstableApi
Modifications:
Remove @UnstableApi annotation from API and implementation of resolver
Result:
Make it explicit that the API is considered stable
Motivation:
The resolv.conf file may contain inline comments which should be ignored
Modifications:
- Detect if we have a comment after the ipaddress and if so skip it
- Add unit test
Result:
Fixes https://github.com/netty/netty/issues/9889
Motivation:
We should just ignore (and so skip) invalid entries in /etc/resolver.conf.
Modifications:
- Skip invalid entries
- Add unit test
Result:
Fix https://github.com/netty/netty/issues/9684
Motivation
A memory leak related to DNS resolution was reported in #9634,
specifically linked to the TCP retry fallback functionality that was
introduced relatively recently. Upon inspection it's apparent that there
are some error paths where the original UDP response might not be fully
released, and more significantly the TCP response actually leaks every
time on the fallback success path.
It turns out that a bug in the unit test meant that the intended TCP
fallback path was not actually exercised, so it did not expose the main
leak in question.
Modifications
- Fix DnsNameResolverTest#testTruncated0 dummy server fallback logic to
first read transaction id of retried query and use it in replayed
response
- Adjust semantic of internal DnsQueryContext#finish method to always
take refcount ownership of passed in envelope
- Reorder some logic in DnsResponseHandler fallback handling to verify
the context of the response is expected, and ensure that the query
response are either released or propagated in all cases. This also
reduces a number of redundant retain/release pairings
Result
Fixes#9634
Motivation:
Classes `AbstractHttp2StreamChannel.Http2StreamChannelConfig`
and `DnsNameResolver.AddressedEnvelopeAdapter` may be static:
it doesn't reference its enclosing instance.
Modification:
Add `static` modifier.
Result:
Prevents a possible memory leak and uses less memory per class instance.
Motivation:
We currently try to access the the domain search list via reflection on windows which will print a illegal access warning when using Java9 and later.
Modifications:
Add a guard against the used java version.
Result:
Fixes https://github.com/netty/netty/issues/9500.
Motivation:
It is possible that the user uses a too big EDNS0 setting for the MTU and so we may receive a truncated datagram packet. In this case we should try to detect this and retry via TCP if possible
Modifications:
- Fix detecting of incomplete records
- Mark response as truncated if we did not consume the whole packet
- Add unit test
Result:
Fixes https://github.com/netty/netty/issues/9365
Motivation:
We should only ever close the underlying tcp socket once we received the envelope to ensure we never race in the test.
Modifications:
- Only close socket once we received the envelope
- Set REUSE_ADDR
Result:
More robust test
Motivation:
testTruncatedWithTcpFallback was flacky as we may end up closing the socket before we could read all data. We should only close the socket after we succesfully read all data.
Modifications:
Move socket.close() to finally block
Result:
Fix flaky test and so make the CI more stable again.
Motivation:
We should only try to use reflection to access default nameservers when using Java8 and lower as otherwise we will produce an Illegal reflective access warning like:
WARNING: Illegal reflective access by io.netty.resolver.dns.DefaultDnsServerAddressStreamProvider
Modifications:
Add Java version check before try to use reflective access.
Result:
No more warning when Java9+ is used.
Motivation:
OOME is occurred by increasing suppressedExceptions because other libraries call Throwable#addSuppressed. As we have no control over what other libraries do we need to ensure this can not lead to OOME.
Modifications:
Only use static instances of the Exceptions if we can either dissable addSuppressed or we run on java6.
Result:
Not possible to OOME because of addSuppressed. Fixes https://github.com/netty/netty/issues/9151.
Motivation:
Sometimes DNS responses can be very large which mean they will not fit in a UDP packet. When this is happening the DNS server will set the TC flag (truncated flag) to tell the resolver that the response was truncated. When a truncated response was received we should allow to retry via TCP and use the received response (if possible) as a replacement for the truncated one.
See https://tools.ietf.org/html/rfc7766.
Modifications:
- Add support for TCP fallback by allow to specify a socketChannelFactory / socketChannelType on the DnsNameResolverBuilder. If this is set to something different then null we will try to fallback to TCP.
- Add decoder / encoder for TCP
- Add unit tests
Result:
Support for TCP fallback as defined by https://tools.ietf.org/html/rfc7766 when using DnsNameResolver.
Motivation:
https://github.com/netty/netty/pull/9021 did apply some changes to filter out duplicates InetAddress when calling resolveAll(...) to mimic JDK behaviour. Unfortunally this also introduced a regression as we should not filter duplicates when the user explicit calls resolveAll(DnsQuestion).
Modifications:
- Only filter duplicates if resolveAll(String) is used
- Add unit test
Result:
Fixes regressions introduces by https://github.com/netty/netty/pull/9021
Motivation:
075cf8c02e introduced a change to allow resolve(...) to notify as soon as the preferred record was resolved. This works great but we should also allow the user to configure that we want to do the same for resolveAll(...), which means we should be able to notify as soon as all records for a preferred record were resolved.
Modifications:
- Add a new DnsNameResolverBuilder method to allow configure this (use false as default to not change default behaviour)
- Add unit test
Result:
Be able to speed up resolving.
Motivation:
At the moment resolve(...) does just delegate to resolveAll(...) and so will only notify the future once all records were resolved. This is wasteful as we are only interested in the first record anyway. We should notify the promise as soon as one record that matches the preferred record type is resolved.
Modifications:
- Introduce DnsResolveContext.isCompleteEarly(...) to be able to detect once we should early notify the promise.
- Make use of this early detecting if resolve(...) is called
- Remove FutureListener which could lead to IllegalReferenceCountException due double releases
- add unit test
Result:
Be able to notify about resolved host more quickly.
Motivation:
We did not correctly calculate the new ttl as we did forget to add `this.`
Modifications:
Add .this and so correctly calculate the TTL
Result:
Use correct TTL for authoritative nameservers when updating these.
Motivation:
To closely mimic what the JDK does we should not try to resolve AAAA records if the system itself does not support IPv6 at all as it is impossible to connect to this addresses later on. In this case we need to use ResolvedAddressTypes.IPV4_ONLY.
Modifications:
Add static method to detect if IPv6 is supported and if not use ResolvedAddressTypes.IPV4_ONLY.
Result:
More consistent behaviour between JDK and our resolver implementation.
Motivation:
At the moment we basically drop all non prefered addresses when calling DnsNameResolver.resolveAll(...). This is just incorrect and was introduced by 4cd39cc4b3. More correct is to still retain these but sort the returned List to have the prefered addresses on the beginning of the List. This also ensures resolve(...) will return the correct return type.
Modifications:
- Introduce PreferredAddressTypeComperator which we use to sort the List so it will contain the preferred address type first.
- Add unit test to verify behaviour
Result:
Include not only preferred addresses in the List that is returned by resolveAll(...)
Motivation:
During investigating some other bug I noticed that we log with warn level if we fail to notify the promise due the fact that it is already full-filled. This is not correct and missleading as there is nothing wrong with it in general. A promise may already been fullfilled because we did multiple queries and one of these was successful.
Modifications:
- Change log level to trace
- Add unit test which before did log with warn level but now does with trace level.
Result:
Less missleading noise in the log.
Motivation:
DnsNameResolver#resolveAll(String) may return duplicate results in the event that the original hostname DNS response includes an IP address X and a CNAME that ends up resolving the same IP address X. This behavior is inconsistent with the JDK’s resolver and is unexpected to retrun a List with duplicate entries from a resolveAll(..) call.
Modifications:
- Filter out duplicates
- Add unit test
Result:
More consistent and less suprising behavior
Motivation:
We did not have any unit tests that queries for TXT records.
Modifications:
Add unit test to query TXT records.
Result:
More test-coverage.
Motivation:
When using multiple nameservers and a nameserver respond with NXDOMAIN we should only fail the query if the nameserver in question is authoritive or no nameservers are left to try.
Modifications:
- Try next nameserver if NXDOMAIN was returned but the nameserver is not authoritive
- Adjust testcase to respect correct behaviour.
Result:
Fixes https://github.com/netty/netty/issues/8261
Motivation:
We do not correctly detect loops when follow CNAMEs and so may try to follow it without any success.
Modifications:
- Correctly detect CNAME loops
- Do not cache CNAME entries which point to itself
- Add unit test.
Result:
Fixes https://github.com/netty/netty/issues/8687.
Motivation:
Andoid does not contain javax.naming.* so we should not try to use it to prevent a NoClassDefFoundError on init.
Modifications:
Only try to use javax.naming.* to retrieve nameservers when not using Android.
Result:
Fixes https://github.com/netty/netty/issues/8654.
Motivation:
Most of the maven modules do not explicitly declare their
dependencies and rely on transitivity, which is not always correct.
Modifications:
For all maven modules, add all of their dependencies to pom.xml
Result:
All of the (essentially non-transitive) depepdencies of the modules are explicitly declared in pom.xml
Motivation:
Some of transports support gathering writes when using datagrams. For example this is the case for EpollDatagramChannel. We should minimize the calls to flush() to allow making efficient usage of sendmmsg in this case.
Modifications:
- minimize flush() operations when we query for multiple address types.
- reduce GC by always directly schedule doResolveAll0(...) on the EventLoop.
Result:
Be able to use sendmmsg internally in the DnsNameResolver.
Motivation:
We should refresh the DNS configuration each 5 minutes to be able to detect changes done by the user. This is inline with what OpenJDK is doing
Modifications:
Refresh config every 5 minutes.
Result:
Be able to consume changes made by the user.
Motivation:
It should be possible to build a DnsNameResolver with a null resolvedAddressTypes, defaulting then to DEFAULT_RESOLVE_ADDRESS_TYPES (see line 309).
Sadly, `preferredAddressType` is then called on line 377 with the original parameter instead of the instance attribute, causing an NPE when it's null.
Modification:
Call preferredAddressType with instance attribuet instead of constructor parameter.
Result:
No more NPE
Motivation:
ba594bcf4a added a utility to parse searchdomains defined in /etc/resolv.conf but did not correctly handle the case when multiple are defined that are seperated by either whitespace or tab.
Modifications:
- Correctly parse multiple entries
- Add unit test.
Result:
Correctly parse multiple searchdomain entries.
* Use AuthoritativeDnsServerCache for creating the new redirect stream.
Motivation:
At the moment if a user wants to provide custom sorting of the nameservers used for redirects it needs to be implemented in two places. This is more complicated as it needs to be.
Modifications:
- Just delegate to the AuthoritativeDnsServerCache always as we fill it before we call newRedirectDnsServerStream anyway.
Result:
Easier way for the user to implement custom sorting.
* Add cache for CNAME mappings resolved during lookup of DNS entries.
Motivation:
If the CNAMEd hostname is backed by load balancing component, typically the final A or AAAA DNS records have small TTL. However, the CNAME record itself is setup with longer TTL.
For example:
* x.netty.io could be CNAMEd to y.netty.io with TTL of 5 min
* A / AAAA records for y.netty.io has a TTL of 0.5 min
In current Netty implementation, original hostname is saved in resolved cached with the TTL of final A / AAAA records. When that cache entry expires, Netty recursive resolver sends at least two queries — 1st one to be resolved as CNAME record and the 2nd one to resolve the hostname in CNAME record.
If CNAME record was cached, only the 2nd query would be needed most of the time. 1st query would be needed less frequently.
Modifications:
Add a new CnameCache that will be used to cache CNAMEs and so may reduce queries.
Result:
Less queries needed when CNAME is used.
Motivation
Applications should not depend on internal packages with Java 9 and later. This cause a warning now, but will break in future versions of Java.
Modification
This change adds methods to UnixResolverDnsServerAddressStreamProvider (following after #6844) that parse /etc/resolv.conf for domain and search entries. Then DnsNameResolver does not need to rely on sun.net.dns.ResolverConfiguration to do this.
Result
Fixes#8318. Furthermore, at least in my testing with Java 11, this also makes multiple search entries work properly (previously I was only getting the first entry).
Motivation:
We should not try to cast the Channel to a DatagramChannel as this will cause a ClassCastException.
Modifications:
- Do not cast
- rethrow from constructor if we detect the registration failed.
- Add unit test.
Result:
Propagate correct exception.
Motiviation:
We incorrectly did ignore NS servers during redirect which had no ADDITIONAL record. This could at worse have the affect that we failed the query completely as none of the NS servers had a ADDITIONAL record. Beside this using a DnsCache to cache authoritative nameservers does not work in practise as we we need different features and semantics when cache these servers (for example we also want to cache unresolved nameservers and resolve these on the fly when needed).
Modifications:
- Correctly take NS records into account that have no matching ADDITIONAL record
- Correctly handle multiple ADDITIONAL records for the same NS record
- Introduce AuthoritativeDnsServerCache as a replacement of the DnsCache when caching authoritative nameservers + adding default implementation
- Add an adapter layer to reduce API breakage as much as possible
- Replace DnsNameResolver.uncachedRedirectDnsServerStream(...) with newRedirectDnsServerStream(...)
- Add unit tests
Result:
Our DnsResolver now correctly handle redirects in all cases.
Motivation:
We are currently always remove all entries from the cache for a hostname if the lowest TTL was reached but schedule one for each of the cached entries. This is wasteful.
Modifications:
- Reimplement logic to schedule TTL to only schedule a new removal task if the requested TTL was actual lower then the one for the already scheduled task.
- Ensure we only remove from the internal map if we did not replace the Entries in the meantime.
Result:
Less overhead in terms of scheduled tasks for the DefaultDnsCache
Motivation:
We should ensure we return the same cached entries for the hostname and hostname ending with dot. Beside this we also should use it for the searchdomains as well.
Modifications:
- Internally always use hostname with a dot as a key and so ensure we correctly handle it in the cache.
- Also query the cache for each searchdomain
- Add unit tests
Result:
Use the same cached entries for hostname with and without trailing dot. Query the cache for each searchdomain query as well