netty5

Author	SHA1	Message	Date
Carl Mastrangelo	45a6bd399e	Optimistically update ref counts Motivation: Highly retained and released objects have contention on their ref count. Currently, the ref count is updated using compareAndSet with care to make sure the count doesn't overflow, double free, or revive the object. Profiling has shown that a non-trivial share (~1%) of CPU time on gRPC latency benchmarks is from the ref count updating. Modification: Rather than pessimistically assuming the ref count will be invalid, optimistically update it assuming it will be valid. If the update was wrong, then use the slow path to revert the change and throw an exception. Most of the time, the ref counts are correct. This changes from using compareAndSet to getAndAdd, which emits a different CPU instruction on x86 (CMPXCHG to XADD). Because the CPU knows it will modify the memory, it can avoid contention. On a highly contended machine, this can be about 2x faster. There is a downside to the new approach. The ref counters can temporarily enter invalid states if over retained or over released. The code does handle these overflow and underflow scenarios, but it is possible that another concurrent access may push the failure to a different location. For example: Time 1 Thread 1: obj.retain(INT_MAX - 1) Time 2 Thread 1: obj.retain(2) Time 2 Thread 2: obj.retain(1) Previously Thread 2 would always succeed and Thread 1 would always fail on the second access. Now, Thread 2 could fail while Thread 1 is rolling back its change. ==== There are a few reasons why I think this is okay: 1. Buggy code is going to have bugs. An exception _is_ going to be thrown. This just causes the other threads to notice the state is messed up and stop early. 2. If high retention counts are a use case, then ref count should be a long rather than an int. 3. The critical section is greatly reduced compared to the previous version, so the likelihood of this happening is lower. 4.
On error, the code always rolls back the change atomically, so there is no possibility of corruption. Result: Faster refcounting ``` BEFORE: Benchmark (delay) Mode Cnt Score Error Units AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 1 sample 2901361 804.579 ± 1.835 ns/op AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 10 sample 3038729 785.376 ± 16.471 ns/op AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 100 sample 2899401 817.392 ± 6.668 ns/op AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 1000 sample 3650566 2077.700 ± 0.600 ns/op AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 10000 sample 3005467 19949.334 ± 4.243 ns/op AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 1 sample 456091 48.610 ± 1.162 ns/op AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 10 sample 732051 62.599 ± 0.815 ns/op AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 100 sample 778925 228.629 ± 1.205 ns/op AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 1000 sample 633682 2002.987 ± 2.856 ns/op AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 10000 sample 506442 19735.345 ± 12.312 ns/op AFTER: Benchmark (delay) Mode Cnt Score Error Units AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 1 sample 3761980 383.436 ± 1.315 ns/op AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 10 sample 3667304 474.429 ± 1.101 ns/op AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 100 sample 3039374 479.267 ± 0.435 ns/op AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 1000 sample 3709210 2044.603 ± 0.989 ns/op AbstractReferenceCountedByteBufBenchmark.retainRelease_contended 10000 sample 3011591 19904.227 ± 18.025 ns/op AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 1 sample 494975 52.269 ± 8.345 ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 10 sample 771094 62.290 ± 0.795 ns/op AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 100 sample 763230 235.044 ± 1.552 ns/op AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 1000 sample 634037 2006.578 ± 3.574 ns/op AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended 10000 sample 506284 19742.605 ± 13.729 ns/op ```	2017-10-04 08:31:27 +02:00
Norman Maurer	b30d73e013	[maven-release-plugin] prepare for next development iteration	2017-09-21 19:47:23 +00:00
Norman Maurer	4e9a6e5ab6	[maven-release-plugin] prepare release netty-4.0.52.Final	2017-09-21 19:47:02 +00:00
Norman Maurer	4560197923	Reduce performance overhead of ResourceLeakDetector Motivation: The ResourceLeakDetector helps to detect and troubleshoot resource leaks and is often used even in production environments with a low level. Because of this it's important that we try to keep the overhead as low as possible. Most of the time no leak is detected (as all is correctly handled) so we should keep the overhead for this case as low as possible. Modifications: - Only call getStackTrace() if a leak is reported as it is a very expensive native call. Also handle the filtering and creating of the String in a lazy fashion - Remove the need to maintain a Queue to store the last access records - Add benchmark Result: Huge decrease of performance overhead. Before the patch: Benchmark (recordTimes) Mode Cnt Score Error Units ResourceLeakDetectorRecordBenchmark.record 8 thrpt 20 4358.367 ± 116.419 ops/s ResourceLeakDetectorRecordBenchmark.record 16 thrpt 20 2306.027 ± 55.044 ops/s ResourceLeakDetectorRecordBenchmark.recordWithHint 8 thrpt 20 4220.979 ± 114.046 ops/s ResourceLeakDetectorRecordBenchmark.recordWithHint 16 thrpt 20 2250.734 ± 55.352 ops/s With this patch: Benchmark (recordTimes) Mode Cnt Score Error Units ResourceLeakDetectorRecordBenchmark.record 8 thrpt 20 71398.957 ± 2695.925 ops/s ResourceLeakDetectorRecordBenchmark.record 16 thrpt 20 38643.963 ± 1446.694 ops/s ResourceLeakDetectorRecordBenchmark.recordWithHint 8 thrpt 20 71677.882 ± 2923.622 ops/s ResourceLeakDetectorRecordBenchmark.recordWithHint 16 thrpt 20 38660.176 ± 1467.732 ops/s	2017-09-18 16:48:27 -07:00
Norman Maurer	a27624a77b	[maven-release-plugin] prepare for next development iteration	2017-08-24 12:47:31 +00:00
Norman Maurer	cf89fb78b8	[maven-release-plugin] prepare release netty-4.0.51.Final	2017-08-24 12:46:31 +00:00
Norman Maurer	d0d1105e45	[maven-release-plugin] prepare for next development iteration	2017-08-02 20:29:15 +02:00
Norman Maurer	5d304e9521	[maven-release-plugin] prepare release netty-4.0.50.Final	2017-08-02 20:28:37 +02:00
Norman Maurer	dde14d2a65	[maven-release-plugin] prepare for next development iteration	2017-07-06 07:37:47 +02:00
Norman Maurer	1e50efb615	[maven-release-plugin] prepare release netty-4.0.49.Final	2017-07-06 07:37:30 +02:00
Norman Maurer	7aa8ad1841	[maven-release-plugin] prepare for next development iteration	2017-06-09 11:23:06 +02:00
Norman Maurer	b6be3a77bc	[maven-release-plugin] prepare release netty-4.0.48.Final	2017-06-09 11:22:25 +02:00
Nikolay Fedorovskikh	57948458e7	Optimizations in NetUtil Motivation: IPv4/6 validation methods use allocations, which can be avoided. The IPv4 parse method uses StringTokenizer. Modifications: Rewrite the IPv4/6 validation methods to avoid allocations. Rewrite the IPv4 parse method without using StringTokenizer. Result: IPv4/6 validation and IPv4 parsing are up to 2-10x faster.	2017-05-18 16:51:44 -07:00
Norman Maurer	c9b5415c91	[maven-release-plugin] prepare for next development iteration	2017-05-11 12:26:35 +02:00
Norman Maurer	9c432f8ae1	[maven-release-plugin] prepare release netty-4.0.47.Final	2017-05-11 12:26:15 +02:00
Norman Maurer	8d73e2637a	[maven-release-plugin] prepare for next development iteration	2017-04-29 15:21:48 +02:00
Norman Maurer	cdc6671828	[maven-release-plugin] prepare release netty-4.0.46.Final	2017-04-29 15:21:21 +02:00
Nikolay Fedorovskikh	0444d4e165	fix the typos	2017-04-20 05:19:06 +02:00
Norman Maurer	ee198f9c35	Add 'io.netty.tryAllocateUninitializedArray' system property which allows allocating byte[] without memset in Java9+ Motivation: Java9 added a new method to Unsafe which allows allocating a byte[] without zeroing it via memset. This can have a massive impact on allocation times when the byte[] is big. This change allows enabling this via the io.netty.tryAllocateUninitializedArray property when running Java9+. Please note that you will need to open up the jdk.internal.misc package via '--add-opens java.base/jdk.internal.misc=ALL-UNNAMED' as well. Modifications: Allow allocating byte[] without memset on Java9+ Result: Better performance when allocating big heap buffers and using Java9.	2017-04-19 11:53:12 +02:00
Norman Maurer	577757198b	[maven-release-plugin] prepare for next development iteration	2017-03-10 09:37:31 +01:00
Norman Maurer	f994184afd	[maven-release-plugin] prepare release netty-4.0.45.Final	2017-03-10 09:02:39 +01:00
Norman Maurer	57fd316a82	Allow obtaining information about used direct and heap memory for ByteBufAllocator implementations Motivation: Often it's useful for the user to be able to get some stats about the memory allocated via an allocator. Modifications: - Allow obtaining the used heap and direct memory for an allocator - Add test case Result: Fixes [#6341]	2017-03-01 18:56:16 +01:00
Norman Maurer	b372a2f19f	Add benchmarks for UnpooledUnsafeNoCleanerDirectByteBuf vs UnpooledUnsafeDirectByteBuf Motivation: Issue [#6349] brought up the idea to not use UnpooledUnsafeNoCleanerDirectByteBuf by default. To decide what to do a benchmark is needed. Modifications: Add benchmarks for UnpooledUnsafeNoCleanerDirectByteBuf vs UnpooledUnsafeDirectByteBuf Result: Better idea about impact of using UnpooledUnsafeNoCleanerDirectByteBuf.	2017-02-27 20:04:24 +01:00
Norman Maurer	95b4814e3a	Add benchmarks for SSLEngine implementations Motivation: As we provide our own SSLEngine implementation we should have benchmarks to compare it against JDK impl. Modifications: Add benchmarks for wrap / unwrap and handshake performance. Result: Benchmarks FTW.	2017-02-24 08:07:38 +01:00
Norman Maurer	00dde3224c	Move all the microbenchmark code into src/main/java Motivation: All our benchmarks are in src/main/java in 4.1.x . We should make it consistent. Modifications: Move everything to src/main/java Result: Consistent code base.	2017-02-24 08:00:32 +01:00
Norman Maurer	ab3ee48fc5	Move benchmark to the correct folder after bad cherry-pick of `2f0b07975e`	2017-02-15 19:55:37 +01:00
Norman Maurer	cc848f6960	Update to latest jmh version Motivation: We use an outdated jmh version. Modifications: Update to jmh 1.17.4. Result: Using latest jmh version.	2017-02-14 08:41:30 +01:00
Kiril Menshikov	2f0b07975e	Allow aligning allocated Buffers Motivation: 64-byte alignment is recommended by the Intel performance guide (https://software.intel.com/en-us/articles/practical-intel-avx-optimization-on-2nd-generation-intel-core-processors) for data-structures over 64 bytes. Requiring padding to a multiple of 64 bytes allows for using SIMD instructions consistently in loops without additional conditional checks. This should allow for simpler and more efficient code. Modification: At the moment cache alignment must be set up manually, though it could probably be taken from the system. The original code was introduced by @normanmaurer https://github.com/netty/netty/pull/4726/files Result: Aligned buffers work better than misaligned ones.	2017-02-06 09:51:06 +01:00
Norman Maurer	0fbad09535	[maven-release-plugin] prepare for next development iteration	2017-01-30 17:42:39 +01:00
Norman Maurer	452812a62d	[maven-release-plugin] prepare release netty-4.0.44.Final	2017-01-30 17:42:07 +01:00
Tim Brooks	095be39826	Wrap operations requiring SocketPermission with doPrivileged blocks Motivation: Currently Netty does not wrap socket connect, bind, or accept operations in doPrivileged blocks. Nor does it wrap cases where a DNS lookup might happen. This prevents an application utilizing the SecurityManager from isolating SocketPermissions to Netty. Modifications: I have introduced a class (SocketUtils) that wraps operations requiring SocketPermissions in doPrivileged blocks. Result: A user of Netty can grant SocketPermissions explicitly to the Netty jar, without granting them to the rest of their application.	2017-01-19 21:23:28 +01:00
Norman Maurer	a2b8646b5f	[maven-release-plugin] prepare for next development iteration	2017-01-12 13:25:14 +01:00
Norman Maurer	91a0bdc17a	[maven-release-plugin] prepare release netty-4.0.43.Final	2017-01-12 13:05:33 +01:00
Norman Maurer	50a11d964d	[maven-release-plugin] prepare for next development iteration	2016-10-14 14:32:28 +02:00
Norman Maurer	73306e017d	[maven-release-plugin] prepare release netty-4.0.42.Final	2016-10-14 14:31:27 +02:00
Masaru Nomura	368701f528	Upgrade JMH to 1.14.1 Motivation: It is usually good to use the latest library version. Modification: Bumped JMH to the latest version as of today. Result: Now we use JMH version 1.14.1 for our benchmarks.	2016-09-29 21:31:27 +02:00
Norman Maurer	3b86867992	[maven-release-plugin] prepare for next development iteration	2016-08-26 08:36:54 +02:00
Norman Maurer	8bdfc9ce39	[maven-release-plugin] prepare release netty-4.0.41.Final	2016-08-26 06:51:15 +02:00
Norman Maurer	e015dfaea2	[maven-release-plugin] prepare for next development iteration	2016-07-27 10:47:03 +02:00
Norman Maurer	837d9947ec	[maven-release-plugin] prepare release netty-4.0.40.Final	2016-07-27 10:30:08 +02:00
Norman Maurer	45f9d29fc1	[maven-release-plugin] prepare for next development iteration	2016-07-15 07:10:09 +02:00
Norman Maurer	38bdf86ba1	[maven-release-plugin] prepare release netty-4.0.39.Final	2016-07-15 07:08:29 +02:00
Norman Maurer	4329e97455	[maven-release-plugin] prepare for next development iteration	2016-07-01 07:59:55 +02:00
Norman Maurer	8642f16f35	[maven-release-plugin] prepare release netty-4.0.38.Final	2016-07-01 07:59:37 +02:00
Norman Maurer	2919145072	[maven-release-plugin] prepare for next development iteration	2016-06-07 20:00:14 +02:00
Norman Maurer	4169779352	[maven-release-plugin] prepare release netty-4.0.37.Final	2016-06-07 19:57:15 +02:00
Norman Maurer	de751ed179	Fix compile error introduced by `0ea4597542`	2016-05-21 20:00:45 +02:00
Norman Maurer	0ea4597542	Introduce CodecOutputList to reduce overhead of encoder/decoder Motivation: `99dfc9ea79` introduced some code that will more frequently try to forward messages out of the list of decoded messages to reduce latency and memory footprint. Unfortunately this has the side-effect that RecyclableArrayList.clear() will be called more often and so introduces some overhead, as ArrayList will null out the array on each call. Modifications: - Introduce a CodecOutputList which allows not nulling out the array until we recycle it and also allows accessing the internal array without extra range checks. - Add a benchmark that adds elements to different List implementations and clears them Result: Less overhead when decoding / encoding messages. Benchmark (elements) Mode Cnt Score Error Units CodecOutputListBenchmark.arrayList 1 thrpt 20 24853764.609 ± 161582.376 ops/s CodecOutputListBenchmark.arrayList 4 thrpt 20 17310636.508 ± 930517.403 ops/s CodecOutputListBenchmark.codecOutList 1 thrpt 20 26670751.661 ± 587812.655 ops/s CodecOutputListBenchmark.codecOutList 4 thrpt 20 25166421.089 ± 166945.599 ops/s CodecOutputListBenchmark.recyclableArrayList 1 thrpt 20 24565992.626 ± 210017.290 ops/s CodecOutputListBenchmark.recyclableArrayList 4 thrpt 20 18477881.775 ± 157003.777 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 246.748 sec - in io.netty.handler.codec.CodecOutputListBenchmark	2016-05-20 09:58:12 +02:00
Norman Maurer	4b6b167839	[maven-release-plugin] prepare for next development iteration	2016-04-04 16:53:40 +02:00
Norman Maurer	e8fa848f43	[maven-release-plugin] prepare release netty-4.0.36.Final	2016-04-04 16:52:53 +02:00
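
The optimistic ref-count update described in commit 45a6bd399e can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the class name `OptimisticRefCount` is hypothetical, it uses a plain `AtomicInteger`, and it throws `IllegalStateException` as a stand-in for whatever exception the real code raises. The idea shown is the one the message states: apply `getAndAdd` first, and only if the previous value turns out to be invalid, roll the change back and throw.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of optimistic reference counting (not Netty's class).
final class OptimisticRefCount {
    // A freshly created object starts with one reference.
    private final AtomicInteger refCnt = new AtomicInteger(1);

    int refCnt() {
        return refCnt.get();
    }

    OptimisticRefCount retain(int increment) {
        if (increment <= 0) {
            throw new IllegalArgumentException("increment: " + increment);
        }
        // Fast path: add unconditionally, then inspect the previous value.
        int oldRef = refCnt.getAndAdd(increment);
        // oldRef <= 0 means the object was already released (no revival);
        // oldRef + increment < oldRef means the int counter overflowed.
        if (oldRef <= 0 || oldRef + increment < oldRef) {
            // Slow path: undo the optimistic update, then report the error.
            refCnt.getAndAdd(-increment);
            throw new IllegalStateException(
                    "refCnt: " + oldRef + ", increment: " + increment);
        }
        return this;
    }

    /** Returns true when this call dropped the count to zero. */
    boolean release(int decrement) {
        if (decrement <= 0) {
            throw new IllegalArgumentException("decrement: " + decrement);
        }
        int oldRef = refCnt.getAndAdd(-decrement);
        if (oldRef == decrement) {
            return true; // last reference gone; the caller would deallocate here
        }
        // oldRef < decrement is an over-release; the second test catches wraparound.
        if (oldRef < decrement || oldRef - decrement > oldRef) {
            refCnt.getAndAdd(decrement); // roll back
            throw new IllegalStateException(
                    "refCnt: " + oldRef + ", decrement: " + decrement);
        }
        return false;
    }
}
```

Between the optimistic add and the rollback, other threads can observe the invalid count, which is the trade-off the commit message discusses.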


203 Commits
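
The lazy approach from the ResourceLeakDetector commit (4560197923) can be illustrated with a small sketch. The class `LazyRecord` and its `hint` field are hypothetical names, not the detector's actual types. The sketch assumes, as the commit message states, that `getStackTrace()` is the expensive call, so it is deferred until the record is actually rendered for a leak report.

```java
// Hypothetical sketch: defer getStackTrace() and String building until needed.
final class LazyRecord extends Throwable {
    private static final long serialVersionUID = 1L;

    private final String hint; // optional caller-supplied context, may be null

    LazyRecord(String hint) {
        // Creating the Throwable records the call site; no String is built here.
        this.hint = hint;
    }

    @Override
    public String toString() {
        // Only reached when a leak is reported, so the cost of getStackTrace()
        // and of formatting is paid on the rare path.
        StringBuilder sb = new StringBuilder();
        if (hint != null) {
            sb.append("Hint: ").append(hint).append('\n');
        }
        for (StackTraceElement element : getStackTrace()) {
            sb.append('\t').append(element).append('\n');
        }
        return sb.toString();
    }
}
```

A caller would create a `new LazyRecord("decode")` on each access and call `toString()` only if a leak is detected.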