Commit Graph

212 Commits

Author SHA1 Message Date
Norman Maurer
ad6e4fcb10 [maven-release-plugin] prepare for next development iteration 2018-02-05 14:31:57 +00:00
Norman Maurer
a15dd48862 [maven-release-plugin] prepare release netty-4.0.56.Final 2018-02-05 14:31:39 +00:00
Norman Maurer
3b57a73602 Use FastThreadLocal for CodecOutputList
Motivation:

We used a Recycler for the CodecOutputList, which is not optimized for the use case of always accessing it from the same Thread.

Modifications:

- Use FastThreadLocal for CodecOutputList
- Add benchmark

Result:

Less overhead in our codecs.
2018-01-23 11:34:52 +01:00
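A minimal sketch of the FastThreadLocal pattern described in the commit above (the class and field names here are illustrative, not the actual CodecOutputList implementation):

```java
import io.netty.util.concurrent.FastThreadLocal;

import java.util.ArrayList;
import java.util.List;

// Illustrative only: caches a per-thread output list so codecs that always
// run on the same event-loop thread can reuse it without Recycler overhead.
final class OutputListCache {

    private static final FastThreadLocal<List<Object>> CACHE =
            new FastThreadLocal<List<Object>>() {
                @Override
                protected List<Object> initialValue() {
                    return new ArrayList<Object>(16);
                }
            };

    static List<Object> get() {
        List<Object> list = CACHE.get();
        list.clear(); // hand out an empty list each time
        return list;
    }

    private OutputListCache() { }
}
```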
Norman Maurer
db6f9f4f26 [maven-release-plugin] prepare for next development iteration 2018-01-21 18:02:00 +00:00
Norman Maurer
8a4654ae9f [maven-release-plugin] prepare release netty-4.0.55.Final 2018-01-21 18:01:42 +00:00
Norman Maurer
692ce0c288 [maven-release-plugin] prepare for next development iteration 2017-12-08 14:30:35 +00:00
Norman Maurer
ae43640088 [maven-release-plugin] prepare release netty-4.0.54.Final 2017-12-08 14:30:20 +00:00
Norman Maurer
d23feba2fa [maven-release-plugin] prepare for next development iteration 2017-11-09 00:08:30 +00:00
Norman Maurer
f571151794 [maven-release-plugin] prepare release netty-4.0.53.Final 2017-11-09 00:05:39 +00:00
Carl Mastrangelo
45a6bd399e Optimistically update ref counts
Motivation:
Highly retained and released objects have contention on their ref
count.  Currently, the ref count is updated using compareAndSet
with care to make sure the count doesn't overflow, double free, or
revive the object.

Profiling has shown that a non-trivial amount (~1%) of CPU time on gRPC
latency benchmarks comes from updating the ref count.

Modification:
Rather than pessimistically assuming the ref count will be invalid,
optimistically update it assuming it will be valid.  If the update was
wrong, then use the slow path to revert the change and throw an
exception.  Most of the time, the ref counts are correct.

This changes from using compareAndSet to getAndAdd, which emits a
different CPU instruction on x86 (CMPXCHG to XADD).  Because the
CPU knows it will modify the memory, it can avoid contention.

On a highly contended machine, this can be about 2x faster.

There is a downside to the new approach.  The ref counters can
temporarily enter invalid states if over retained or over released.
The code does handle these overflow and underflow scenarios, but it
is possible that another concurrent access may push the failure to
a different location.  For example:

Time 1 Thread 1: obj.retain(INT_MAX - 1)
Time 2 Thread 1: obj.retain(2)
Time 2 Thread 2: obj.retain(1)

Previously Thread 2 would always succeed and Thread 1 would always
fail on the second access.  Now, Thread 2 could fail while Thread 1
is rolling back its change.

====

There are a few reasons why I think this is okay:

1. Buggy code is going to have bugs.  An exception _is_ going to be
   thrown.  This just causes the other threads to notice the state
   is messed up and stop early.
2. If high retention counts are a use case, then ref count should
   be a long rather than an int.
3. The critical section is greatly reduced compared to the previous
   version, so the likelihood of this happening is lower.
4. On error, the code always rolls back the change atomically, so
   there is no possibility of corruption.

Result:
Faster refcounting

```
BEFORE:

Benchmark                                                                                             (delay)    Mode      Cnt         Score    Error  Units
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                            1  sample  2901361       804.579 ±  1.835  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                           10  sample  3038729       785.376 ± 16.471  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                          100  sample  2899401       817.392 ±  6.668  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                         1000  sample  3650566      2077.700 ±  0.600  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                        10000  sample  3005467     19949.334 ±  4.243  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                          1  sample   456091        48.610 ±  1.162  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                         10  sample   732051        62.599 ±  0.815  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                        100  sample   778925       228.629 ±  1.205  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                       1000  sample   633682      2002.987 ±  2.856  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                      10000  sample   506442     19735.345 ± 12.312  ns/op

AFTER:
Benchmark                                                                                             (delay)    Mode      Cnt         Score    Error  Units
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                            1  sample  3761980       383.436 ±  1.315  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                           10  sample  3667304       474.429 ±  1.101  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                          100  sample  3039374       479.267 ±  0.435  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                         1000  sample  3709210      2044.603 ±  0.989  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                        10000  sample  3011591     19904.227 ± 18.025  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                          1  sample   494975        52.269 ±  8.345  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                         10  sample   771094        62.290 ±  0.795  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                        100  sample   763230       235.044 ±  1.552  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                       1000  sample   634037      2006.578 ±  3.574  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                      10000  sample   506284     19742.605 ± 13.729  ns/op

```
2017-10-04 08:31:27 +02:00
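A simplified sketch of the optimistic getAndAdd approach described in the commit above (illustrative class, field and method names, not the actual AbstractReferenceCountedByteBuf code):

```java
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

// Illustrative only: optimistic retain/release using getAndAdd (XADD on x86)
// with rollback, instead of a compareAndSet loop that validates every update.
class OptimisticRefCounted {

    private static final AtomicIntegerFieldUpdater<OptimisticRefCounted> UPDATER =
            AtomicIntegerFieldUpdater.newUpdater(OptimisticRefCounted.class, "refCnt");

    private volatile int refCnt = 1;

    OptimisticRefCounted retain(int increment) {
        // Optimistic fast path: no compare loop.
        int oldRef = UPDATER.getAndAdd(this, increment);
        if (oldRef <= 0 || oldRef + increment < oldRef) {
            // Slow path: roll the change back atomically and fail.
            UPDATER.getAndAdd(this, -increment);
            throw new IllegalStateException("refCnt overflow or already released: " + oldRef);
        }
        return this;
    }

    boolean release(int decrement) {
        int oldRef = UPDATER.getAndAdd(this, -decrement);
        if (oldRef == decrement) {
            return true; // reached zero, caller may deallocate
        }
        if (oldRef < decrement) {
            // Over-released: roll back and fail.
            UPDATER.getAndAdd(this, decrement);
            throw new IllegalStateException("refCnt underflow: " + oldRef);
        }
        return false;
    }
}
```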
Norman Maurer
b30d73e013 [maven-release-plugin] prepare for next development iteration 2017-09-21 19:47:23 +00:00
Norman Maurer
4e9a6e5ab6 [maven-release-plugin] prepare release netty-4.0.52.Final 2017-09-21 19:47:02 +00:00
Norman Maurer
4560197923 Reduce performance overhead of ResourceLeakDetector
Motivation:

The ResourceLeakDetector helps to detect and troubleshoot resource leaks and is often used even in production environments with a low leak-detection level. Because of this it is important that we keep its overhead as low as possible. Most of the time no leak is detected (as everything is handled correctly), so we should keep the overhead for this case as low as possible.

Modifications:

- Only call getStackTrace() if a leak is reported, as it is a very expensive native call. Also handle the filtering and the creation of the String lazily.
- Remove the need to maintain a Queue to store the last access records
- Add benchmark

Result:

Huge decrease in performance overhead.

Before the patch:

Benchmark                                           (recordTimes)   Mode  Cnt     Score     Error  Units
ResourceLeakDetectorRecordBenchmark.record                      8  thrpt   20  4358.367 ± 116.419  ops/s
ResourceLeakDetectorRecordBenchmark.record                     16  thrpt   20  2306.027 ±  55.044  ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint              8  thrpt   20  4220.979 ± 114.046  ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint             16  thrpt   20  2250.734 ±  55.352  ops/s

With this patch:

Benchmark                                           (recordTimes)   Mode  Cnt      Score      Error  Units
ResourceLeakDetectorRecordBenchmark.record                      8  thrpt   20  71398.957 ± 2695.925  ops/s
ResourceLeakDetectorRecordBenchmark.record                     16  thrpt   20  38643.963 ± 1446.694  ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint              8  thrpt   20  71677.882 ± 2923.622  ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint             16  thrpt   20  38660.176 ± 1467.732  ops/s
2017-09-18 16:48:27 -07:00
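A minimal sketch of the lazy-reporting idea from the commit above (illustrative, not the actual ResourceLeakDetector code): each record just captures a cheap Throwable and links it into a lock-free list; the expensive stack-trace materialization and String building only happen if a leak is actually reported.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative only: record() just links a Throwable (cheap); stack-trace
// formatting happens only if a leak is reported.
final class LazyLeakRecord extends Throwable {

    private final LazyLeakRecord next;

    LazyLeakRecord(LazyLeakRecord next) {
        this.next = next;
    }

    static void record(AtomicReference<LazyLeakRecord> head) {
        LazyLeakRecord prev;
        do {
            prev = head.get();
        } while (!head.compareAndSet(prev, new LazyLeakRecord(prev)));
    }

    // Only called when a leak was actually detected.
    static String report(LazyLeakRecord head) {
        StringBuilder sb = new StringBuilder();
        for (LazyLeakRecord r = head; r != null; r = r.next) {
            StringWriter sw = new StringWriter();
            r.printStackTrace(new PrintWriter(sw)); // expensive, but rare
            sb.append(sw.toString()).append('\n');
        }
        return sb.toString();
    }
}
```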
Norman Maurer
a27624a77b [maven-release-plugin] prepare for next development iteration 2017-08-24 12:47:31 +00:00
Norman Maurer
cf89fb78b8 [maven-release-plugin] prepare release netty-4.0.51.Final 2017-08-24 12:46:31 +00:00
Norman Maurer
d0d1105e45 [maven-release-plugin] prepare for next development iteration 2017-08-02 20:29:15 +02:00
Norman Maurer
5d304e9521 [maven-release-plugin] prepare release netty-4.0.50.Final 2017-08-02 20:28:37 +02:00
Norman Maurer
dde14d2a65 [maven-release-plugin] prepare for next development iteration 2017-07-06 07:37:47 +02:00
Norman Maurer
1e50efb615 [maven-release-plugin] prepare release netty-4.0.49.Final 2017-07-06 07:37:30 +02:00
Norman Maurer
7aa8ad1841 [maven-release-plugin] prepare for next development iteration 2017-06-09 11:23:06 +02:00
Norman Maurer
b6be3a77bc [maven-release-plugin] prepare release netty-4.0.48.Final 2017-06-09 11:22:25 +02:00
Nikolay Fedorovskikh
57948458e7 Optimizations in NetUtil
Motivation:

The IPv4/6 validation methods perform allocations that can be avoided.
The IPv4 parse method uses StringTokenizer.

Modifications:

Rewrite the IPv4/6 validation methods to avoid allocations.
Rewrite the IPv4 parse method without using StringTokenizer.

Result:

IPv4/6 validation and IPv4 parsing are up to 2-10x faster.
2017-05-18 16:51:44 -07:00
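A sketch of the allocation-free validation idea from the commit above (illustrative, not the actual NetUtil code): walk the characters directly instead of splitting the string or using a StringTokenizer.

```java
// Illustrative only: validates a dotted-quad IPv4 address without allocating
// substrings, arrays or a StringTokenizer.
final class Ipv4Validator {

    static boolean isValidIpV4Address(CharSequence ip) {
        int octets = 0; // number of '.' separators seen
        int value = 0;  // value of the current octet
        int digits = 0; // digits in the current octet
        for (int i = 0; i < ip.length(); i++) {
            char c = ip.charAt(i);
            if (c == '.') {
                if (digits == 0 || ++octets > 3) {
                    return false;
                }
                value = 0;
                digits = 0;
            } else if (c >= '0' && c <= '9') {
                value = value * 10 + (c - '0');
                if (++digits > 3 || value > 255) {
                    return false;
                }
            } else {
                return false;
            }
        }
        // Exactly four octets, and the last one must contain at least one digit.
        return octets == 3 && digits > 0;
    }

    private Ipv4Validator() { }
}
```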
Norman Maurer
c9b5415c91 [maven-release-plugin] prepare for next development iteration 2017-05-11 12:26:35 +02:00
Norman Maurer
9c432f8ae1 [maven-release-plugin] prepare release netty-4.0.47.Final 2017-05-11 12:26:15 +02:00
Norman Maurer
8d73e2637a [maven-release-plugin] prepare for next development iteration 2017-04-29 15:21:48 +02:00
Norman Maurer
cdc6671828 [maven-release-plugin] prepare release netty-4.0.46.Final 2017-04-29 15:21:21 +02:00
Nikolay Fedorovskikh
0444d4e165 fix the typos 2017-04-20 05:19:06 +02:00
Norman Maurer
ee198f9c35 Add 'io.netty.tryAllocateUninitializedArray' system property which allows allocating a byte[] without memset on Java 9+
Motivation:

Java 9 added a new method to Unsafe which allows allocating a byte[] without memsetting it. This can have a massive impact on allocation times when the byte[] is big. This change allows enabling this behaviour via the io.netty.tryAllocateUninitializedArray system property when running on Java 9+. Please note that you will also need to open up the jdk.internal.misc package via '--add-opens java.base/jdk.internal.misc=ALL-UNNAMED'.

Modifications:

Allow allocating a byte[] without memset on Java 9+

Result:

Better performance when allocating big heap buffers on Java 9.
2017-04-19 11:53:12 +02:00
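A hedged sketch of how the uninitialized allocation can be reached on Java 9+ (illustrative reflection code with an assumed helper class name; the actual Netty implementation is guarded by the system property mentioned above):

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

// Illustrative only: calls jdk.internal.misc.Unsafe.allocateUninitializedArray
// reflectively. Requires Java 9+ and
// --add-opens java.base/jdk.internal.misc=ALL-UNNAMED
final class UninitializedArrays {

    private static final Object UNSAFE;
    private static final Method ALLOCATE;

    static {
        Object unsafe = null;
        Method allocate = null;
        try {
            Class<?> unsafeClass = Class.forName("jdk.internal.misc.Unsafe");
            Field theUnsafe = unsafeClass.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            unsafe = theUnsafe.get(null);
            allocate = unsafeClass.getMethod(
                    "allocateUninitializedArray", Class.class, int.class);
            allocate.setAccessible(true);
        } catch (Throwable ignored) {
            // Not Java 9+, or the package was not opened; fall back below.
        }
        UNSAFE = unsafe;
        ALLOCATE = allocate;
    }

    static byte[] allocate(int size) {
        if (ALLOCATE == null) {
            return new byte[size]; // normal, zero-initialized allocation
        }
        try {
            return (byte[]) ALLOCATE.invoke(UNSAFE, byte.class, size);
        } catch (Throwable t) {
            return new byte[size];
        }
    }

    private UninitializedArrays() { }
}
```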
Norman Maurer
577757198b [maven-release-plugin] prepare for next development iteration 2017-03-10 09:37:31 +01:00
Norman Maurer
f994184afd [maven-release-plugin] prepare release netty-4.0.45.Final 2017-03-10 09:02:39 +01:00
Norman Maurer
57fd316a82 Allow obtaining information about used direct and heap memory for ByteBufAllocator implementations
Motivation:

It is often useful for the user to be able to get some stats about the memory allocated via an allocator.

Modifications:

- Allow obtaining the used heap and direct memory for an allocator
- Add test case

Result:

Fixes [#6341]
2017-03-01 18:56:16 +01:00
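A short usage sketch of the stats described above, assuming the metric() accessor and ByteBufAllocatorMetric type introduced for this (method names may differ slightly between versions):

```java
import io.netty.buffer.ByteBufAllocatorMetric;
import io.netty.buffer.PooledByteBufAllocator;

// Illustrative usage only, assuming the metric() accessor added by this change.
public final class AllocatorStats {

    public static void main(String[] args) {
        PooledByteBufAllocator allocator = PooledByteBufAllocator.DEFAULT;
        ByteBufAllocatorMetric metric = allocator.metric();
        System.out.println("used heap memory:   " + metric.usedHeapMemory());
        System.out.println("used direct memory: " + metric.usedDirectMemory());
    }
}
```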
Norman Maurer
b372a2f19f Add benchmarks for UnpooledUnsafeNoCleanerDirectByteBuf vs UnpooledUnsafeDirectByteBuf
Motivation:

Issue [#6349] brought up the idea of not using UnpooledUnsafeNoCleanerDirectByteBuf by default. To decide what to do, a benchmark is needed.

Modifications:

Add benchmarks for UnpooledUnsafeNoCleanerDirectByteBuf vs UnpooledUnsafeDirectByteBuf

Result:

Better idea about the impact of using UnpooledUnsafeNoCleanerDirectByteBuf.
2017-02-27 20:04:24 +01:00
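A minimal JMH sketch in the spirit of the benchmark described above (illustrative only; the real benchmark instantiates the two ByteBuf implementations directly, which are package-private in io.netty.buffer):

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

// Illustrative only: measures allocate + release of unpooled direct buffers.
// Which implementation backs Unpooled.directBuffer() (with or without a
// Cleaner) depends on the platform and Netty's configuration.
@State(Scope.Benchmark)
public class DirectBufferAllocBenchmark {

    @Benchmark
    public boolean allocateAndReleaseDirect() {
        ByteBuf buf = Unpooled.directBuffer(1024);
        return buf.release();
    }
}
```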
Norman Maurer
95b4814e3a Add benchmarks for SSLEngine implementations
Motivation:

As we provide our own SSLEngine implementation, we should have benchmarks to compare it against the JDK implementation.

Modifications:

Add benchmarks for wrap / unwrap and handshake performance.

Result:

Benchmarks FTW.
2017-02-24 08:07:38 +01:00
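A hedged sketch of the comparison set up by the commit above (illustrative helper, not the actual benchmark code): build an SSLEngine from either the JDK or the OpenSSL-based provider so the two can be measured side by side.

```java
import io.netty.buffer.ByteBufAllocator;
import io.netty.handler.ssl.SslContext;
import io.netty.handler.ssl.SslContextBuilder;
import io.netty.handler.ssl.SslProvider;
import io.netty.handler.ssl.util.InsecureTrustManagerFactory;

import javax.net.ssl.SSLEngine;

// Illustrative only: creates an SSLEngine from a chosen provider
// (SslProvider.JDK or SslProvider.OPENSSL) for benchmarking.
public final class SslEngineFactory {

    public static SSLEngine newClientEngine(SslProvider provider) throws Exception {
        SslContext context = SslContextBuilder.forClient()
                .sslProvider(provider)
                .trustManager(InsecureTrustManagerFactory.INSTANCE)
                .build();
        return context.newEngine(ByteBufAllocator.DEFAULT);
    }

    private SslEngineFactory() { }
}
```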
Norman Maurer
00dde3224c Move all the microbenchmark code into src/main/java
Motivation:

All our benchmarks are in src/main/java in 4.1.x. We should make this consistent.

Modifications:

Move everything to src/main/java

Result:

Consistent code base.
2017-02-24 08:00:32 +01:00
Norman Maurer
ab3ee48fc5 Move benchmark to the correct folder after bad cherry-pick of 2f0b07975e 2017-02-15 19:55:37 +01:00
Norman Maurer
cc848f6960 Update to latest jmh version
Motivation:

We use an outdated jmh version.

Modifications:

Update to jmh 1.17.4.

Result:

Using latest jmh version.
2017-02-14 08:41:30 +01:00
Kiril Menshikov
2f0b07975e Allow aligning allocated buffers
Motivation:

64-byte alignment is recommended by the Intel performance guide (https://software.intel.com/en-us/articles/practical-intel-avx-optimization-on-2nd-generation-intel-core-processors) for data-structures over 64 bytes.
Requiring padding to a multiple of 64 bytes allows for using SIMD instructions consistently in loops without additional conditional checks. This should allow for simpler and more efficient code.

Modification:

At the moment cache alignment must be set up manually, but it could probably be derived from the system. The original code was introduced by @normanmaurer in https://github.com/netty/netty/pull/4726/files

Result:

Aligned buffers perform better than misaligned cache accesses.
2017-02-06 09:51:06 +01:00
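A small sketch of the alignment arithmetic this relies on (illustrative, not Netty's allocator code): with a power-of-two alignment such as 64 bytes, an offset is rounded up using a mask.

```java
// Illustrative only: rounds an offset up to the next multiple of a
// power-of-two alignment, e.g. 64 bytes for cache-line alignment.
final class Alignment {

    static long align(long offset, int alignment) {
        // alignment must be a power of two, e.g. 64
        long mask = alignment - 1;
        return (offset + mask) & ~mask;
    }

    public static void main(String[] args) {
        System.out.println(align(100, 64)); // 128
        System.out.println(align(128, 64)); // 128
    }
}
```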
Norman Maurer
0fbad09535 [maven-release-plugin] prepare for next development iteration 2017-01-30 17:42:39 +01:00
Norman Maurer
452812a62d [maven-release-plugin] prepare release netty-4.0.44.Final 2017-01-30 17:42:07 +01:00
Tim Brooks
095be39826 Wrap operations requiring SocketPermission with doPrivileged blocks
Motivation:

Currently Netty does not wrap socket connect, bind, or accept
operations in doPrivileged blocks. Nor does it wrap cases where a dns
lookup might happen.

This prevents an application utilizing the SecurityManager from
isolating SocketPermissions to Netty.

Modifications:

I have introduced a class (SocketUtils) that wraps operations
requiring SocketPermissions in doPrivileged blocks.

Result:

A user of Netty can grant SocketPermissions explicitly to the Netty
jar, without granting it to the rest of their application.
2017-01-19 21:23:28 +01:00
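A minimal sketch of the doPrivileged wrapping pattern described above (illustrative class name; Netty's actual helper is SocketUtils):

```java
import java.io.IOException;
import java.net.SocketAddress;
import java.nio.channels.SocketChannel;
import java.security.AccessController;
import java.security.PrivilegedActionException;
import java.security.PrivilegedExceptionAction;

// Illustrative only: wraps a socket connect in doPrivileged so the
// SocketPermission check stops at this (privileged) code base.
final class PrivilegedConnect {

    static boolean connect(final SocketChannel channel, final SocketAddress remote)
            throws IOException {
        try {
            return AccessController.doPrivileged(new PrivilegedExceptionAction<Boolean>() {
                @Override
                public Boolean run() throws IOException {
                    return channel.connect(remote);
                }
            });
        } catch (PrivilegedActionException e) {
            throw (IOException) e.getCause();
        }
    }

    private PrivilegedConnect() { }
}
```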
Norman Maurer
a2b8646b5f [maven-release-plugin] prepare for next development iteration 2017-01-12 13:25:14 +01:00
Norman Maurer
91a0bdc17a [maven-release-plugin] prepare release netty-4.0.43.Final 2017-01-12 13:05:33 +01:00
Norman Maurer
50a11d964d [maven-release-plugin] prepare for next development iteration 2016-10-14 14:32:28 +02:00
Norman Maurer
73306e017d [maven-release-plugin] prepare release netty-4.0.42.Final 2016-10-14 14:31:27 +02:00
Masaru Nomura
368701f528 Upgrade JMH to 1.14.1
Motivation:
It's usually good to use the latest library version.

Modification:
Bumped JMH to the latest version as of today.

Result:
Now we use JMH version 1.14.1 for our benchmark.
2016-09-29 21:31:27 +02:00
Norman Maurer
3b86867992 [maven-release-plugin] prepare for next development iteration 2016-08-26 08:36:54 +02:00
Norman Maurer
8bdfc9ce39 [maven-release-plugin] prepare release netty-4.0.41.Final 2016-08-26 06:51:15 +02:00
Norman Maurer
e015dfaea2 [maven-release-plugin] prepare for next development iteration 2016-07-27 10:47:03 +02:00
Norman Maurer
837d9947ec [maven-release-plugin] prepare release netty-4.0.40.Final 2016-07-27 10:30:08 +02:00
Norman Maurer
45f9d29fc1 [maven-release-plugin] prepare for next development iteration 2016-07-15 07:10:09 +02:00