Motivation:
When looking for a leak, its nice to be able to request at least a
number of leaks.
Modification:
* Made all leak records up to the target amoutn recorded, and only
then enable backing off.
* Enable recording more than 32 elements. Previously the shift
amount made this impossible.
Result:
Ability to record all accesses.
Motivation:
Phantom references are for cleaning up resources that were
forgotten, which means they keep their referent alive. This
means garbage is kept around until the refqueue is drained, rather
than when the reference is unreachable.
Modification:
Use Weak References instead of Phantoms
Result:
More punctual leak detection.
Motivation:
Without a 'serialVersionUID' field, any change to a class will make
previously serialized versions unreadable.
Modifications:
Add missed 'serialVersionUID' field for all Serializable
classes.
Result:
Proper deserialization of previously serialized objects.
Configuring this is tough because there is split between highly shared (and accessed) objects and lightly accessed objects.
Modification:
There are a number of changes here. In relative order of importance:
API / Functionality changes:
* Max records and max sample records are gone. Only "target" records, the number of records tries to retain is exposed.
* Records are sampled based on the number of already stored records. The likelihood of recording a new sample is `2^(-n)`, where `n` is the number of currently stored elements.
* Records are stored in a concurrent stack structure rather than a list. This avoids a head and tail. Since the stack is only read once, there is no need to maintain head and tail pointers
* The properties of this imply that the very first and very last access are always recorded. When deciding to sample, the top element is replaced rather than pushed.
* Samples that happen between the first and last accesses now have a chance of being recorded. Previously only the final few were kept.
* Sampling is no longer deterministic. Previously, a deterministic access pattern meant that you could conceivably always miss some access points.
* Sampling has a linear ramp for low values and and exponentially backs off roughly equal to 2^n. This means that for 1,000,000 accesses, about 20 will actually be kept. I have an elegant proof for this which is too large to fit in this commit message.
Code changes:
* All locks are gone. Because sampling rarely needs to do a write, there is almost 0 contention. The dropped records counter is slightly contentious, but this could be removed or changed to a LongAdder. This was not done because of memory concerns.
* Stack trace exclusion is done outside of RLD. Classes can opt to remove some of their methods.
* Stack trace exclusion is faster, since it uses String.equals, often getting a pointer compare due to interning. Previously it used contains()
* Leak printing is outputted fairly differently. I tried to preserve as much of the original formatting as possible, but some things didn't make sense to keep.
Result:
More useful leak reporting.
Faster:
```
Before:
Benchmark (recordTimes) Mode Cnt Score Error Units
ResourceLeakDetectorRecordBenchmark.record 8 thrpt 20 136293.404 ± 7669.454 ops/s
ResourceLeakDetectorRecordBenchmark.record 16 thrpt 20 72805.720 ± 3710.864 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 8 thrpt 20 139131.215 ± 4882.751 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 16 thrpt 20 74146.313 ± 4999.246 ops/s
After:
Benchmark (recordTimes) Mode Cnt Score Error Units
ResourceLeakDetectorRecordBenchmark.record 8 thrpt 20 155281.969 ± 5301.399 ops/s
ResourceLeakDetectorRecordBenchmark.record 16 thrpt 20 77866.239 ± 3821.054 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 8 thrpt 20 153360.036 ± 8611.353 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 16 thrpt 20 78670.804 ± 2399.149 ops/s
```
Motivation:
3c8c7fc7e9 introduced some changes to the ResourceLeakDetector that introduced a regression and so would always log that paranoid leak detection should be enabled even it was already.
Modifications:
Correctly not clear the recorded stacktraces when we process the reference queue so we can log these.
Result:
ResourceLeakDetector works again as expected.
Motiviation:
The ResourceLeakDetector helps to detect and troubleshoot resource leaks and is often used even in production enviroments with a low level. Because of this its import that we try to keep the overhead as low as overhead. Most of the times no leak is detected (as all is correctly handled) so we should keep the overhead for this case as low as possible.
Modifications:
- Only call getStackTrace() if a leak is reported as it is a very expensive native call. Also handle the filtering and creating of the String in a lazy fashion
- Remove the need to mantain a Queue to store the last access records
- Add benchmark
Result:
Huge decrease of performance overhead.
Before the patch:
Benchmark (recordTimes) Mode Cnt Score Error Units
ResourceLeakDetectorRecordBenchmark.record 8 thrpt 20 4358.367 ± 116.419 ops/s
ResourceLeakDetectorRecordBenchmark.record 16 thrpt 20 2306.027 ± 55.044 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 8 thrpt 20 4220.979 ± 114.046 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 16 thrpt 20 2250.734 ± 55.352 ops/s
With this patch:
Benchmark (recordTimes) Mode Cnt Score Error Units
ResourceLeakDetectorRecordBenchmark.record 8 thrpt 20 71398.957 ± 2695.925 ops/s
ResourceLeakDetectorRecordBenchmark.record 16 thrpt 20 38643.963 ± 1446.694 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 8 thrpt 20 71677.882 ± 2923.622 ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint 16 thrpt 20 38660.176 ± 1467.732 ops/s
Motivation:
RLD allocates an ArrayDeque in anticipation of recording access
points. If the leak detection level is less than ADVANCED though,
the dequeue is never used. Since SIMPLE is the default level,
there is a minor perf win to not preemptively allocate it.
This showed up in garbage profiling when creation a high number of
buffers.
Modifications:
Only allocate the dequeue if it will be used.
Result:
Less garbage created.
Motivation:
ResourceLeakDetector records at most MAX_RECORDS+1 records
Modifications:
Make room before add to lastRecords
Result:
ResourceLeakDetector will record at most MAX_RECORDS records
Motivation:
We have our own ThreadLocalRandom implementation to support older JDKs . That said we should prefer the JDK provided when running on JDK >= 7
Modification:
Using ThreadLocalRandom implementation of the JDK when possible.
Result:
Make use of JDK implementations when possible.
Motivation:
ResourceLeakDetector supports a parameter called maxActive. This parameter is used in attempt to limit the amount of objects which are being tracked for leaks at any given time, and generates an error log message if this limit is exceeded. This assumes that there is a relationship between leak sample rate and object lifetime for objects which are already being tracked. This relationship may appear to work in cases were there are a single leak record per object and those leak records live for the lifetime of the application but in general this relationship doesn't exist. The original motivation was to provide a limit for cases such as HashedWheelTimer to limit the number of instances which exist at any given time. This limit is not enforced in all circumstances in HashedWheelTimer (e.g. if the thread is a daemon) and can be implemented outside ResourceLeakDetector.
Modifications:
- Deprecate all methods which interact with maxActive in ResourceLeakDetectorFactory and ResourceLeakDetector
- Remove all logic related to maxActive in ResourceLeakDetector
- HashedWheelTimer implements its own logic to impose a limit and warn users if too many instances exists at any given time.
Result:
Fixes https://github.com/netty/netty/issues/6225.
Motivation:
We need to ensure the tracked object can not be GC'ed before ResourceLeak.close() is called as otherwise we may get false-positives reported by the ResourceLeakDetector. This can happen as the JIT / GC may be able to figure out that we do not need the tracked object anymore and so already enqueue it for collection before we actually get a chance to close the enclosing ResourceLeak.
Modifications:
- Add ResourceLeakTracker and deprecate the old ResourceLeak
- Fix some javadocs to correctly release buffers.
- Add a unit test for ResourceLeakDetector that shows that ResourceLeakTracker has not the problems.
Result:
No more false-positives reported by ResourceLeakDetector when ResourceLeakDetector.track(...) is used.
Motivation:
42fba015ce changed the implemention of ResourceLeakDetector to improve performance. While this was done a branch was missed that can be removed. Beside this using a Boolean as value for the ConcurrentMap is sub-optimal as when calling remove(key, value) an uncessary instanceof check and cast is needed on each removal.
Modifications:
- Remove branch which is not needed anymore
- Replace usage of Boolean as value type of the ConcurrentMap and use our own special type which only compute hash-code one time and use a == operation for equals(...) to reduce overhead present when using Boolean.
Result:
Faster and cleaner ResourceLeakDetector.
Motivation:
ResourceLeakDetector shows two main problems, racy access and heavy lock contention.
Modifications:
This PR fixes this by doing two things:
1. Replace the sampling counter with a ThreadLocalRandom. This has two benefits.
First, it makes the sampling ration no longer have to be a power of two. Second,
it de-noises the continuous races that fight over this single value. Instead,
this change uses slightly more CPU to decide if it should sample by using TLR.
2. DefaultResourceLeaks need to be kept alive in order to catch leaks. The means
by which this happens is by a singular, doubly-linked list. This creates a
large amount of contention when allocating quickly. This is noticeable when
running on a multi core machine.
Instead, this uses a concurrent hash map to keep track of active resources
which has much better contention characteristics.
Results:
Better concurrent hygiene. Running the gRPC QPS benchmark showed RLD taking about
3 CPU seconds for every 1 wall second when runnign with 12 threads.
There are some minor perks to this as well. DefaultResourceLeak accounting is
moved to a central place which probably has better caching behavior.
Motivation:
ResourceLeakDetector reports leak for first call to open(obj) as its leakCheckCnt starts with value 0 and increment subsequently. with value of leakCheckCnt =0, it always returns ResourceLeak. Our application calls ResourceLeakDetector.open(obj) to validate Leak and it fails at very first call even though there is no leak in application.
Modifications:
ResourceLeakDetector.leakCheckCnt value will not be 0 while deriving leak and it will not return incorrect value of ResourceLeak.
Result:
Fix false leak report on first call on ResourceLeakDetector.
Motivation:
Some usages of findNextPositivePowerOfTwo assume that bounds checking is taken care of by this method. However bounds checking is not taken care of by findNextPositivePowerOfTwo and instead assert statements are used to imply the caller has checked the bounds. This can lead to unexpected non power of 2 return values if the caller is not careful and thus invalidate any logic which depends upon a power of 2.
Modifications:
- Add a safeFindNextPositivePowerOfTwo method which will do runtime bounds checks and always return a power of 2
Result:
Fixes https://github.com/netty/netty/issues/5601
Motivation:
We recently added the ResourceLeakDetectorFactory but missed to updated HashedWheelTimer to use it.
Modifications:
- Add new abstract method to ResourceLeakDetectorFactory that allows to provide also samplingInterval and maxActive args.
- Deprecate most constructors in ResourceLeakDetector and add doc explaining that people should use ResourceLeakDetectorFactory
Result:
Custom ResourceLeakDetectorFactory will also be used in HashedWheelTimer if configured.
Allow users of Netty to plug in their own leak detector for the purpose
of instrumentation.
Motivation:
We are rolling out a large Netty deployment and want to be able to
track the amount of leaks we're seeing in production via custom
instrumentation. In order to achieve this today, I had to plug in a
custom `ByteBufAllocator` into the bootstrap and have it initialize a
custom `ResourceLeakDetector`. Due to these classes mostly being marked
`final` or having private or static methods, a lot of the code had to
be copy-pasted and it's quite ugly.
Modifications:
* I've added a static loader method for the `ResourceLeakDetector` in
`AbstractByteBuf` that tries to instantiate the class passed in via the
`-Dio.netty.customResourceLeakDetector`, otherwise falling back to the
default one.
* I've modified `ResourceLeakDetector` to be non-final and to have the
reporting broken out in to methods that can be overridden.
Result:
You can instrument leaks in your application by just adding something
like the following:
```java
public class InstrumentedResourceLeakDetector<T> extends
ResourceLeakDetector<T> {
@Monitor("InstanceLeakCounter")
private final AtomicInteger instancesLeakCounter;
@Monitor("LeakCounter")
private final AtomicInteger leakCounter;
public InstrumentedResourceLeakDetector(Class<T> resource) {
super(resource);
this.instancesLeakCounter = new AtomicInteger();
this.leakCounter = new AtomicInteger();
}
@Override
protected void reportTracedLeak(String records) {
super.reportTracedLeak(records);
leakCounter.incrementAndGet();
}
@Override
protected void reportUntracedLeak() {
super.reportUntracedLeak();
leakCounter.incrementAndGet();
}
@Override
protected void reportInstancesLeak() {
super.reportInstancesLeak();
instancesLeakCounter.incrementAndGet();
}
}
```
Motivation:
ResourceLeakDetector enforces a limit as to how large the queue is allowed to grow for stack traces in order to keep memory from growing too large. However it is not always clear when records are dropped, or how many have been dropped. This can make interpreting leak reports more difficult if you assume all information is present when it may not be. Also we should increase the limit (currently 4) when running tests on the CI servers.
Modifications:
- Increase leak detector record limit on CI servers from 4 to 32.
- Track how many records have been discarded and disclose this in the leak report.
Result:
Leak reports clarify how many records were dropped, and how to increase the limit.
Motivation:
Modulo operations are slow, we can use bitwise operation to detect if resource leak detection must be done while sampling.
Modifications:
- Ensure the interval is a power of two
- Use bitwise operation for sampling
- Add benchmark.
Result:
Faster sampling.
Motivation:
Leak detector, when it detects a leak, will print the last 5 stack
traces that touched the ByteBuf. In some cases that might not be enough
to identify the root cause of the leak.
Also, sometimes users might not be interested in tracing all the
operations on the buffer, but just the ones that are affecting the
reference count.
Modifications:
Added command line properties to override default values:
* Allow to configure max number of stack traces to collect
* Allow to only record retain/release operation on buffers
Result:
Users can increase the number of stack traces to debug buffer leaks
with lot of retain/release operations.
Motivation:
When a user sees an error message, sometimes he or she does not know
what exactly he or she has to do to fix the problem.
Modifications:
Log the URL of the wiki pages that might help the user troubleshoot.
Result:
We are more friendly.
Motivation:
Found performance issues via FindBugs and PMD.
Modifications:
- Removed unnecessary boxing/unboxing operations in DefaultTextHeaders.convertToInt(CharSequence) and DefaultTextHeaders.convertToLong(CharSequence). A boxed primitive is created from a string, just to extract the unboxed primitive value.
- Added a static modifier for DefaultHttp2Connection.ParentChangedEvent class. This class is an inner class, but does not use its embedded reference to the object which created it. This reference makes the instances of the class larger, and may keep the reference to the creator object alive longer than necessary.
- Added a static compiled Pattern to avoid compile it each time it is used when we need to replace some part of authority.
- Improved using of StringBuilders.
Result:
Performance improvements.
- Related: #2163
- Add ResourceLeakHint to allow a user to provide a meaningful information about the leak when touching it
- DefaultChannelHandlerContext now implements ResourceLeakHint to tell where the message is going.
- Cleaner resource leak report by excluding noisy stack trace elements
- Remove the reference to ResourceLeak from the buffer implementations
and use wrappers instead:
- SimpleLeakAwareByteBuf and AdvancedLeakAwareByteBuf
- It is now allocator's responsibility to create a leak-aware buffer.
- Added AbstractByteBufAllocator.toLeakAwareBuffer() for easier
implementation
- Add WrappedByteBuf to reduce duplication between *LeakAwareByteBuf and
UnreleasableByteBuf
- Raise the level of leak reports to ERROR - because it will break the
app eventually
- Replace enabled/disabled property with the leak detection level
- Only print stack trace when level is ADVANCED or above to avoid user
confusion
- Add the 'leak' build profile, which enables highly detailed leak
reporting during the build
- Remove ResourceLeakException which is unsed anymore
The API changes made so far turned out to increase the memory footprint
and consumption while our intention was actually decreasing them.
Memory consumption issue:
When there are many connections which does not exchange data frequently,
the old Netty 4 API spent a lot more memory than 3 because it always
allocates per-handler buffer for each connection unless otherwise
explicitly stated by a user. In a usual real world load, a client
doesn't always send requests without pausing, so the idea of having a
buffer whose life cycle if bound to the life cycle of a connection
didn't work as expected.
Memory footprint issue:
The old Netty 4 API decreased overall memory footprint by a great deal
in many cases. It was mainly because the old Netty 4 API did not
allocate a new buffer and event object for each read. Instead, it
created a new buffer for each handler in a pipeline. This works pretty
well as long as the number of handlers in a pipeline is only a few.
However, for a highly modular application with many handlers which
handles connections which lasts for relatively short period, it actually
makes the memory footprint issue much worse.
Changes:
All in all, this is about retaining all the good changes we made in 4 so
far such as better thread model and going back to the way how we dealt
with message events in 3.
To fix the memory consumption/footprint issue mentioned above, we made a
hard decision to break the backward compatibility again with the
following changes:
- Remove MessageBuf
- Merge Buf into ByteBuf
- Merge ChannelInboundByte/MessageHandler and ChannelStateHandler into ChannelInboundHandler
- Similar changes were made to the adapter classes
- Merge ChannelOutboundByte/MessageHandler and ChannelOperationHandler into ChannelOutboundHandler
- Similar changes were made to the adapter classes
- Introduce MessageList which is similar to `MessageEvent` in Netty 3
- Replace inboundBufferUpdated(ctx) with messageReceived(ctx, MessageList)
- Replace flush(ctx, promise) with write(ctx, MessageList, promise)
- Remove ByteToByteEncoder/Decoder/Codec
- Replaced by MessageToByteEncoder<ByteBuf>, ByteToMessageDecoder<ByteBuf>, and ByteMessageCodec<ByteBuf>
- Merge EmbeddedByteChannel and EmbeddedMessageChannel into EmbeddedChannel
- Add SimpleChannelInboundHandler which is sometimes more useful than
ChannelInboundHandlerAdapter
- Bring back Channel.isWritable() from Netty 3
- Add ChannelInboundHandler.channelWritabilityChanges() event
- Add RecvByteBufAllocator configuration property
- Similar to ReceiveBufferSizePredictor in Netty 3
- Some existing configuration properties such as
DatagramChannelConfig.receivePacketSize is gone now.
- Remove suspend/resumeIntermediaryDeallocation() in ByteBuf
This change would have been impossible without @normanmaurer's help. He
fixed, ported, and improved many parts of the changes.