Commit Graph

15 Commits

Author SHA1 Message Date
Nick Hill
fedf3ccecb Harden ref-counting concurrency semantics (#8583)
Motivation

#8563 highlighted race conditions introduced by the prior optimistic
update optimization in 83a19d5650. These
were known at the time but considered acceptable given the perf
benefit in high contention scenarios.

This PR proposes a modified approach which provides roughly half the
gains but stronger concurrency semantics. Race conditions still exist
but their scope is narrowed to much less likely cases (releases
coinciding with retain overflow), and even in those
cases certain guarantees are still assured. Once release() returns true,
all subsequent release/retains are guaranteed to throw, and in
particular deallocate will be called at most once.

Modifications

- Use even numbers internally (including -ve) for live refcounts
- "Final" release changes to odd number (equivalent to refcount 0)
- Retain still uses faster getAndAdd, release uses CAS loop
- First CAS attempt uses non-volatile read
- Thread.yield() after a failed CAS provides a net gain

Result

More (though not completely) robust concurrency semantics for ref
counting; increased latency under high contention, but still roughly
twice as fast as the original logic. Bench results to follow
2018-11-29 08:32:32 +01:00
Xiaoyan Lin
a0ed6ec06c Fix the error message in ReferenceCounted.release (#7921)
Motivation:

When a buffer is over-released, the current error message of `IllegalReferenceCountException` is `refCnt: XXX, increment: XXX`, which is confusing. The correct message should be `refCnt: XXX, decrement: XXX`.

Modifications:

Pass `-decrement` to create `IllegalReferenceCountException`.

Result:

The error message will be `refCnt: XXX, decrement: XXX` when a buffer is over-released.
2018-05-08 20:09:16 +02:00
Carl Mastrangelo
83a19d5650 Optimistically update ref counts
Motivation:
Highly retained and released objects have contention on their ref
count.  Currently, the ref count is updated using compareAndSet
with care to make sure the count doesn't overflow, double free, or
revive the object.

Profiling has shown that a non trivial (~1%) of CPU time on gRPC
latency benchmarks is from the ref count updating.

Modification:
Rather than pessimistically assuming the ref count will be invalid,
optimistically update it assuming it will be.  If the update was
wrong, then use the slow path to revert the change and throw an
execption.  Most of the time, the ref counts are correct.

This changes from using compareAndSet to getAndAdd, which emits a
different CPU instruction on x86 (CMPXCHG to XADD).  Because the
CPU knows it will modifiy the memory, it can avoid contention.

On a highly contended machine, this can be about 2x faster.

There is a downside to the new approach.  The ref counters can
temporarily enter invalid states if over retained or over released.
The code does handle these overflow and underflow scenarios, but it
is possible that another concurrent access may push the failure to
a different location.  For example:

Time 1 Thread 1: obj.retain(INT_MAX - 1)
Time 2 Thread 1: obj.retain(2)
Time 2 Thread 2: obj.retain(1)

Previously Thread 2 would always succeed and Thread 1 would always
fail on the second access.  Now, thread 2 could fail while thread 1
is rolling back its change.

====

There are a few reasons why I think this is okay:

1. Buggy code is going to have bugs.  An exception _is_ going to be
   thrown.  This just causes the other threads to notice the state
   is messed up and stop early.
2. If high retention counts are a use case, then ref count should
   be a long rather than an int.
3. The critical section is greatly reduced compared to the previous
   version, so the likelihood of this happening is lower
4. On error, the code always rollsback the change atomically, so
   there is no possibility of corruption.

Result:
Faster refcounting

```
BEFORE:

Benchmark                                                                                             (delay)    Mode      Cnt         Score    Error  Units
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                            1  sample  2901361       804.579 ±  1.835  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                           10  sample  3038729       785.376 ± 16.471  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                          100  sample  2899401       817.392 ±  6.668  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                         1000  sample  3650566      2077.700 ±  0.600  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                        10000  sample  3005467     19949.334 ±  4.243  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                          1  sample   456091        48.610 ±  1.162  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                         10  sample   732051        62.599 ±  0.815  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                        100  sample   778925       228.629 ±  1.205  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                       1000  sample   633682      2002.987 ±  2.856  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                      10000  sample   506442     19735.345 ± 12.312  ns/op

AFTER:
Benchmark                                                                                             (delay)    Mode      Cnt         Score    Error  Units
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                            1  sample  3761980       383.436 ±  1.315  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                           10  sample  3667304       474.429 ±  1.101  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                          100  sample  3039374       479.267 ±  0.435  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                         1000  sample  3709210      2044.603 ±  0.989  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                        10000  sample  3011591     19904.227 ± 18.025  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                          1  sample   494975        52.269 ±  8.345  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                         10  sample   771094        62.290 ±  0.795  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                        100  sample   763230       235.044 ±  1.552  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                       1000  sample   634037      2006.578 ±  3.574  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                      10000  sample   506284     19742.605 ± 13.729  ns/op

```
2017-10-04 08:42:33 +02:00
Carl Mastrangelo
7528e5a11e Use threadsafe setter on Atomic Updaters
Motivation:
The documentation for field updates says:

> Note that the guarantees of the {@code compareAndSet}
> method in this class are weaker than in other atomic classes.
> Because this class cannot ensure that all uses of the field
> are appropriate for purposes of atomic access, it can
> guarantee atomicity only with respect to other invocations of
> {@code compareAndSet} and {@code set} on the same updater.

This implies that volatiles shouldn't use normal assignment; the
updater should set them.

Modifications:
Use setter for field updaters that make use of compareAndSet.

Result:
Concurrency compliant code
2017-08-31 10:14:40 +02:00
Norman Maurer
89e93968ac Remove usage of own Atomic*FieldUpdater in favor of JDKs
Motivation:

In later Java8 versions our Atomic*FieldUpdater are slower then the JDK implementations so we should not use ours anymore. Even worse the JDK implementations provide for example an optimized version of addAndGet(...) using intrinsics which makes it a lot faster for this use-case.

Modifications:

- Remove methods that return our own Atomic*FieldUpdaters.
- Use the JDK implementations everywhere.

Result:

Faster code.
2016-12-15 08:09:06 +00:00
Norman Maurer
3103f0551c Share code between retain(...) and release(...) implementations.
Motivation:

We can share the code in retain() and retain(...) and also in release() and release(...).

Modifications:

Share code.

Result:

Less duplicated code.
2016-09-02 21:53:10 +02:00
Norman Maurer
d44017189e Remove extra conditional check in retain
Motivation:

We not need to do an extra conditional check in retain(...) as we can just check for overflow after we did the increment.

Modifications:

- Remove extra conditional check
- Add test code.

Result:

One conditional check less.
2016-08-05 13:09:26 +02:00
Scott Mitchell
01523e7835 Reduce conditionals in AbstractReferenceCounted
Motivation:
AbstractReferenceCounted as independent conditional statements to check the bounds of the retain IllegalReferenceCountException condition. One of the exceptions also uses the incorrect increment.

Modifications:
- Combined independent conditional checks into 1 where possible
- Correct IllegalReferenceCountException with incorrect increment
- Remove the subtract to check for overflow and re-use the addition and check for overflow to remove 1 arithmetic operation (see http://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html#jls-15.18.2)

Result:
AbstractReferenceCounted has less independent branch statements and more correct IllegalReferenceCountException. Compilation size of AbstractReferenceCounted.retain() is reduced from 58 bytes to 47 bytes.
2016-07-25 12:25:55 -07:00
Trustin Lee
d0912f2709 Fix most inspector warnings
Motivation:

It's good to minimize potentially broken windows.

Modifications:

Fix most inspector warnings from our profile
Update IntObjectHashMap

Result:

Cleaner code
2014-07-02 19:55:07 +09:00
Norman Maurer
e3c76ec106 DNS codec for Netty which is based on the work of [#1622].
Motivation:
As part of GSOC 2013 we had @mbakkar working on a DNS codec but did not integrate it yet as it needs some cleanup. This commit is based on @mbakkar's work and provide the codec for DNS.

Modifications:
Add DNS codec

Result:
Reusable DNS codec will be included in netty.

This PR also includes a AsynchronousDnsResolver which allows to resolve DNS entries in a non blocking way by make use
of the dns codec and netty transport itself.
2014-06-10 09:57:06 +02:00
Trustin Lee
8837afddf8 Enable a user specify an arbitrary information with ReferenceCounted.touch()
- Related: #2163
- Add ResourceLeakHint to allow a user to provide a meaningful information about the leak when touching it
- DefaultChannelHandlerContext now implements ResourceLeakHint to tell where the message is going.
- Cleaner resource leak report by excluding noisy stack trace elements
2014-02-13 18:16:25 -08:00
Norman Maurer
f23d68b42f [#2187] Always do a volatile read on the refCnt 2014-02-07 09:23:16 +01:00
Norman Maurer
9bee78f91c Provide an optimized AtomicIntegerFieldUpdater, AtomicLongFieldUpdater and AtomicReferenceFieldUpdater 2014-02-06 20:08:45 +01:00
Trustin Lee
6d1cd0d0cd ReferenceCountException -> IllegalReferenceCountException 2013-06-13 14:00:15 +09:00
Trustin Lee
175526b6bd Move ReferenceCounted and AbstractReferenceCounted to io.netty.util
- Fixes #1441
- Also move and rename IllegalBufferAccessException to ReferenceCountException
- Prettier reference count exception messages
2013-06-13 13:14:21 +09:00