netty5

Author	SHA1	Message	Date
buchgr	e12613a018	Fix performance regression in FastThreadLocal microbenchmark. Fixes #4402 Motivation: As reported in #4402, the FastThreadLocalBenchmark shows that the JDK ThreadLocal is actually faster than Netty's custom thread local implementation. I was looking forward to doing some deep digging, but got disappointed :(. Modifications: The microbenchmark was not using FastThreadLocalThreads and would thus always hit the slow path. I updated the JMH command line flags, so that FastThreadLocalThreads would be used. Result: FastThreadLocalBenchmark shows FastThreadLocal to be faster than JDK's ThreadLocal implementation, by about 56% in this particular benchmark. Run on OSX El Capitan with OpenJDK 1.8u60.
Benchmark Mode Cnt Score Error Units
FastThreadLocalBenchmark.fastThreadLocal thrpt 20 55452.027 ± 725.713 ops/s
FastThreadLocalBenchmark.jdkThreadLocalGet thrpt 20 35481.888 ± 1471.647 ops/s	2015-10-29 21:46:04 +01:00
Norman Maurer	0c8fe18d3c	Add benchmark for HeapByteBuf implementations. Motivation: To prove one implementation is faster than the other we should have a benchmark. Modifications: Add benchmark which benchmarks the unsafe and non-unsafe implementation of HeapByteBuf. Result: Able to compare speed of implementations easily.	2015-10-29 19:38:33 +01:00
Norman Maurer	577931e8bc	Use bitwise operation when sampling for resource leak detection. Motivation: Modulo operations are slow; we can use a bitwise operation to decide whether resource leak detection must be done while sampling. Modifications: - Ensure the interval is a power of two - Use bitwise operation for sampling - Add benchmark. Result: Faster sampling.	2015-10-29 19:18:06 +01:00
Norman Maurer	291674262c	Added SlicedAbstractByteBuf that can provide fast-path for _get* and _set* methods Motivation: SlicedByteBuf can be used for any ByteBuf implementations and so can not do any optimizations that could be done when AbstractByteBuf is sliced. Modifications: - Add SlicedAbstractByteBuf that can eliminate range and reference count checks for _get* and _set* methods. Result: Faster SlicedByteBuf implementations for AbstractByteBuf sub-classes.	2015-10-16 08:59:58 +02:00
Norman Maurer	054af70fed	Minimize object allocation when calling AbstractByteBuf.toString(..., Charset) Motivation: AbstractByteBuf.toString(..., Charset) is called quite frequently by users but produces a lot of GC. Modification: - Use a FastThreadLocal to store the CharBuffers that are needed for decoding. - Use internalNioBuffer(...) when possible Result: Less object creation / Less GC	2015-10-15 17:49:21 +02:00
Norman Maurer	1103379e02	Allow to disable reference count checks on every access of the ByteBuf Motivation: Checking the reference count on every access of a ByteBuf can have a big performance overhead, depending on the access pattern. If the user is sure that there are no reference count errors on his side it should be possible to disable the check and so gain the max performance. Modification: - Add io.netty.buffer.bytebuf.checkAccessible system property which allows disabling the checks. Enabled by default. - Add microbenchmark Result: Increased performance for operations on the ByteBuf.	2015-10-15 10:19:49 +02:00
Norman Maurer	696a287736	[maven-release-plugin] prepare for next development iteration	2015-09-30 09:31:26 +02:00
Norman Maurer	fb2d562306	[maven-release-plugin] prepare release netty-4.0.32.Final	2015-09-30 09:28:40 +02:00
Norman Maurer	bd928eaa38	[maven-release-plugin] prepare for next development iteration	2015-09-02 08:58:54 +02:00
Norman Maurer	26bbcc38c2	[maven-release-plugin] prepare release netty-4.0.31.Final	2015-09-02 08:57:57 +02:00
Scott Mitchell	3056b80602	Microbench backport issue Motivation: The microbench code in 4.0 lives in src/test while in 4.1 and master it lives in src/main. A backport of a patch did not account for this. Modifications: - Move the benchmark to the src/test directory - Update new benchmark package info Result: 4.0 branch can now build again.	2015-07-30 10:33:10 -07:00
Scott Mitchell	1fcc72aa90	HttpObjectDecoder performance improvements Motivation: The HttpObjectDecoder is on the hot code path for the http codec. There are a few hot methods which can be modified to improve performance. Modifications: - Modify AppendableCharSequence to provide unsafe methods which don't need to re-check bounds for every call. - Update HttpObjectDecoder methods to take advantage of new AppendableCharSequence methods. Result: Performance boost for decoding http objects.	2015-07-29 23:26:49 -07:00
Norman Maurer	148692705c	[maven-release-plugin] prepare for next development iteration	2015-07-24 10:11:44 +02:00
Norman Maurer	11cc2d5197	[maven-release-plugin] prepare release netty-4.0.30.Final	2015-07-24 09:54:20 +02:00
Norman Maurer	1da998bc7c	[maven-release-plugin] prepare for next development iteration	2015-06-23 11:08:27 +02:00
Norman Maurer	4c482c1215	[maven-release-plugin] prepare release netty-4.0.29.Final	2015-06-23 11:07:56 +02:00
Norman Maurer	d1c46ca987	[maven-release-plugin] prepare for next development iteration	2015-05-07 11:33:47 -04:00
Norman Maurer	005d4a42fc	[maven-release-plugin] prepare release netty-4.0.28.Final	2015-05-07 11:33:09 -04:00
Norman Maurer	0f4d3a981e	Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit `3c10ffab5e`.	2015-05-07 11:02:03 -04:00
Norman Maurer	3c10ffab5e	[maven-release-plugin] prepare for next development iteration	2015-05-07 09:09:23 -04:00
Norman Maurer	f2fedbcdef	[maven-release-plugin] prepare for next development iteration	2015-03-31 22:06:30 -04:00
Norman Maurer	054e7c5d17	[maven-release-plugin] prepare release netty-4.0.27.Final	2015-03-31 22:05:43 -04:00
Norman Maurer	37264bb72b	[maven-release-plugin] prepare for next development iteration	2015-03-02 01:31:30 -05:00
Norman Maurer	0dbc96cffd	[maven-release-plugin] prepare release netty-4.0.26.Final	2015-03-02 01:30:58 -05:00
Norman Maurer	e99d89c04d	[maven-release-plugin] rollback the release of netty-4.0.26.Final	2015-02-28 21:28:06 +01:00
Norman Maurer	b86e2e6ac0	[maven-release-plugin] prepare release netty-4.0.26.Final	2015-02-28 13:55:01 -05:00
Trustin Lee	0e61aeb849	[maven-release-plugin] prepare for next development iteration	2014-12-31 20:58:44 +09:00
Trustin Lee	087db82e78	[maven-release-plugin] prepare release netty-4.0.25.Final	2014-12-31 20:58:33 +09:00
Michael Nitschinger	9fc95803da	Fix ByteBufUtilBenchmark on utf8 encodings. Motivation: The performance tests for utf8 also used the getBytes on ASCII, which is incorrect and also provides different performance numbers. Modifications: Use CharsetUtil.UTF_8 instead of US_ASCII for the getBytes calls. Result: Accurate and semantically correct benchmarking results on utf8 comparisons.	2014-12-31 20:26:21 +09:00
Norman Maurer	61a5e60513	Provide helper methods in ByteBufUtil to write UTF-8/ASCII CharSequences. Related to [#909] Motivation: We expose no methods in ByteBuf to directly write a CharSequence into it. This forces the user to either convert the CharSequence to a byte array first or use a CharsetEncoder. Both approaches have some overhead, and we can do a lot better for well-known Charsets like UTF-8 and ASCII. Modifications: Add ByteBufUtil.writeAscii(...) and ByteBufUtil.writeUtf8(...), which can do the task in an optimized way. This is especially true if the passed-in ByteBuf extends AbstractByteBuf, which is true for all of our implementations that do not wrap another ByteBuf. Result: Writing an ASCII or UTF-8 CharSequence into an AbstractByteBuf is a lot faster than what the user could do by himself, as we can make use of some package-private methods and so eliminate reference and range checks. When the Charseq is not ASCII or UTF-8 we can still do a very good job and are on par in most of the cases with what the user would do.
The following benchmark shows the improvements:
Result: 2456866.966 ±(99.9%) 59066.370 ops/s [Average]
Statistics: (min, avg, max) = (2297025.189, 2456866.966, 2586003.225), stdev = 78851.914
Confidence interval (99.9%): [2397800.596, 2515933.336]
Benchmark Mode Samples Score Score error Units
i.n.m.b.ByteBufUtilBenchmark.writeAscii thrpt 50 9398165.238 131503.098 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiString thrpt 50 9695177.968 176684.821 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiStringViaArray thrpt 50 4788597.415 83181.549 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiStringViaArrayWrapped thrpt 50 4722297.435 98984.491 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiStringWrapped thrpt 50 4028689.762 66192.505 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiViaArray thrpt 50 3234841.565 91308.009 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiViaArrayWrapped thrpt 50 3311387.474 39018.933 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiWrapped thrpt 50 3379764.250 66735.415 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8 thrpt 50 5671116.821 101760.081 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8String thrpt 50 5682733.440 111874.084 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8StringViaArray thrpt 50 3564548.995 55709.512 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8StringViaArrayWrapped thrpt 50 3621053.671 47632.820 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8StringWrapped thrpt 50 2634029.071 52304.876 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8ViaArray thrpt 50 3397049.332 57784.119 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8ViaArrayWrapped thrpt 50 3318685.262 35869.562 ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8Wrapped thrpt 50 2473791.249 46423.114 ops/s
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1,387.417 sec - in io.netty.microbench.buffer.ByteBufUtilBenchmark
Results : Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
The ViaArray benchmarks are basically doing a 
toString().getBytes(Charset) while the others use ByteBufUtil.write*(...).	2014-12-26 15:57:59 +09:00
Idel Pivnitskiy	9b3f536921	Benchmark for HttpRequestDecoder	2014-11-12 14:37:11 +01:00
Norman Maurer	1914b77c71	[maven-release-plugin] prepare for next development iteration	2014-10-29 11:48:40 +01:00
Norman Maurer	c170e7df3f	[maven-release-plugin] prepare release netty-4.0.24.Final	2014-10-29 11:47:19 +01:00
Trustin Lee	7710e7da44	[maven-release-plugin] prepare for next development iteration	2014-08-16 03:02:02 +09:00
Trustin Lee	208198c0cb	[maven-release-plugin] prepare release netty-4.0.23.Final	2014-08-16 03:01:57 +09:00
Trustin Lee	a2d508711d	[maven-release-plugin] prepare for next development iteration	2014-08-14 09:41:33 +09:00
Trustin Lee	3051db9d59	[maven-release-plugin] prepare release netty-4.0.22.Final	2014-08-14 09:41:28 +09:00
Norman Maurer	e8f4def2a3	[maven-release-plugin] prepare for next development iteration	2014-06-30 14:31:08 +02:00
Norman Maurer	25e3c8ce3d	[maven-release-plugin] prepare release netty-4.0.21.Final	2014-06-30 14:29:15 +02:00
Trustin Lee	cb994dd926	Fix the inconsistencies between performance tests in ByteBufAllocatorBenchmark Motivation: default() tests are performing a test in a different way, and they must be the same as the other tests. Modification: Make sure default() tests are the same as the others Result: Easier to compare default and non-default allocators	2014-06-21 13:28:11 +09:00
Trustin Lee	fb538ea532	Refactor FastThreadLocal to simplify TLV management Motivation: When Netty runs in a managed environment such as web application server, Netty needs to provide an explicit way to remove the thread-local variables it created to prevent class loader leaks. FastThreadLocal uses different execution paths for storing a thread-local variable depending on the type of the current thread. It increases the complexity of thread-local removal. Modifications: - Moved FastThreadLocal and FastThreadLocalThread out of the internal package so that a user can use it. - FastThreadLocal now keeps track of all thread local variables it has initialized, and calling FastThreadLocal.removeAll() will remove all thread-local variables of the caller thread. - Added FastThreadLocal.size() for diagnostics and tests - Introduce InternalThreadLocalMap which is a mixture of hard-wired thread local variable fields and extensible indexed variables - FastThreadLocal now uses InternalThreadLocalMap to implement a thread-local variable. - Added ThreadDeathWatcher.unwatch() so that PooledByteBufAllocator tells it to stop watching when its thread-local cache has been freed by FastThreadLocal.removeAll(). - Added FastThreadLocalTest to ensure that removeAll() works - Added microbenchmark for FastThreadLocal and JDK ThreadLocal - Upgraded to JMH 0.9 Result: - A user can remove all thread-local variables Netty created, as long as he or she did not exit from the current thread. (Note that there's no way to remove a thread-local variable from outside of the thread.) - FastThreadLocal exposes more useful operations such as isSet() because we always implement a thread local variable via InternalThreadLocalMap instead of falling back to JDK ThreadLocal. - FastThreadLocalBenchmark shows that this change improves the performance of FastThreadLocal even more.	2014-06-19 21:08:16 +09:00
Norman Maurer	b737d631f1	[maven-release-plugin] prepare for next development iteration	2014-06-12 16:20:52 +02:00
Norman Maurer	1709113a1f	[maven-release-plugin] prepare release netty-4.0.20.Final	2014-06-12 16:14:48 +02:00
belliottsmith	1ac2ff8d7b	Introduce FastThreadLocal which uses an EnumMap and a predefined fixed set of possible thread locals Motivation: Provide a faster ThreadLocal implementation Modification: Add a "FastThreadLocal" which uses an EnumMap and a predefined fixed set of possible thread locals (all of the static instances created by netty) that is around 10-20% faster than standard ThreadLocal in my benchmarks (and can be seen having an effect in the direct PooledByteBufAllocator benchmark that uses the DEFAULT ByteBufAllocator which uses this FastThreadLocal, as opposed to normal instantiations that do not, and in the new RecyclableArrayList benchmark); Result: Improved performance	2014-06-12 15:43:20 +02:00
Norman Maurer	4ad3984c8b	[#2436] UnsafeByteBuf implementation should only invert bytes if ByteOrder differs from native ByteOrder Motivation: Our UnsafeByteBuf implementation always inverts bytes when the native ByteOrder is LITTLE_ENDIAN (this is true on intel), even when the user calls order(ByteOrder.LITTLE_ENDIAN). This is not optimal for performance reasons as the user should be able to set the ByteOrder to LITTLE_ENDIAN and so write bytes without the extra inverting. Modification: - Introduce a new special SwappedByteBuf (called UnsafeDirectSwappedByteBuf) that is used by all the Unsafe*ByteBuf implementations and allows writing without inverting the bytes. - Add benchmark - Upgrade jmh to 0.8 Result: The user is able to get the max performance even on servers that have ByteOrder.LITTLE_ENDIAN as their native ByteOrder.	2014-06-05 11:09:58 +02:00
Trustin Lee	172e7f06be	More realistic ByteBuf allocation benchmark Motivation: Allocating a single buffer and releasing it repetitively for a benchmark will not involve the realistic execution path of the allocators. Modifications: Keep the last 8192 allocations and release them randomly. Result: We are now getting the result close to what we got with caliper.	2014-05-29 19:51:13 +09:00
Norman Maurer	a597087a9f	[maven-release-plugin] prepare for next development iteration	2014-04-30 15:40:54 +02:00
Norman Maurer	b562148e2d	[maven-release-plugin] prepare release netty-4.0.19.Final	2014-04-30 15:40:31 +02:00
Norman Maurer	816165c96a	[maven-release-plugin] prepare for next development iteration	2014-04-01 07:21:40 +02:00
Norman Maurer	1512a4dcca	[maven-release-plugin] prepare release netty-4.0.18.Final	2014-04-01 07:20:16 +02:00
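
Commit 577931e8bc above replaces a modulo check with a bitwise one when sampling, which only works if the interval is a power of two. The following is a minimal sketch of that general technique, not Netty's actual code: the class name `SamplingSketch`, its helper methods, and the example interval of 113 are all hypothetical and chosen purely for illustration.

```java
public final class SamplingSketch {

    // Hypothetical helper: round a positive interval up to the next power of two.
    // Assumes value <= 2^30 so the shift does not overflow.
    static int nextPowerOfTwo(int value) {
        if (value <= 1) {
            return 1;
        }
        return Integer.highestOneBit(value - 1) << 1;
    }

    // Baseline: sample whenever the counter is a multiple of the interval.
    static boolean shouldSampleModulo(int counter, int interval) {
        return counter % interval == 0;
    }

    // Bitwise variant: mask must be (interval - 1) with interval a power of two.
    static boolean shouldSampleMask(int counter, int mask) {
        return (counter & mask) == 0;
    }

    public static void main(String[] args) {
        int interval = nextPowerOfTwo(113); // rounds up to 128
        int mask = interval - 1;

        // For non-negative counters both checks must agree.
        for (int counter = 0; counter < 1_000_000; counter++) {
            if (shouldSampleModulo(counter, interval) != shouldSampleMask(counter, mask)) {
                throw new AssertionError("mismatch at " + counter);
            }
        }
        System.out.println("interval=" + interval + ", mask=" + mask);
    }
}
```

The equivalence relies on the interval being a power of two, which is why the commit first ensures that property before switching to the bitwise check.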

142 Commits