netty5

Author	SHA1	Message	Date
Norman Maurer	efeec7e390	Correctly construct Executor in microbenchmarks. Motivation: We should allow our custom Executor to shut down quickly. Modifications: Call the super constructor with the correct arguments. Result: Custom Executor can be shut down quickly.	2015-11-03 09:42:42 +01:00
buchgr	e12613a018	Fix performance regression in FastThreadLocal microbenchmark. Fixes #4402 Motivation: As reported in #4402, the FastThreadLocalBenchmark shows that the JDK ThreadLocal is actually faster than Netty's custom thread local implementation. I was looking forward to doing some deep digging, but got disappointed :(. Modifications: The microbenchmark was not using FastThreadLocalThreads and would thus always hit the slow path. I updated the JMH command line flags, so that FastThreadLocalThreads would be used. Result: FastThreadLocalBenchmark shows FastThreadLocal to be faster than JDK's ThreadLocal implementation, by about 56% in this particular benchmark. Run on OSX El Capitan with OpenJDK 1.8u60. Benchmark Mode Cnt Score Error Units FastThreadLocalBenchmark.fastThreadLocal thrpt 20 55452.027 ± 725.713 ops/s FastThreadLocalBenchmark.jdkThreadLocalGet thrpt 20 35481.888 ± 1471.647 ops/s	2015-10-29 21:46:04 +01:00
Norman Maurer	0c8fe18d3c	Add benchmark for HeapByteBuf implementations. Motivation: To prove one implementation is faster than the other, we should have a benchmark. Modifications: Add a benchmark which compares the unsafe and non-unsafe implementations of HeapByteBuf. Result: Able to compare speed of implementations easily.	2015-10-29 19:38:33 +01:00
Norman Maurer	577931e8bc	Use bitwise operation when sampling for resource leak detection. Motivation: Modulo operations are slow, we can use bitwise operation to detect if resource leak detection must be done while sampling. Modifications: - Ensure the interval is a power of two - Use bitwise operation for sampling - Add benchmark. Result: Faster sampling.	2015-10-29 19:18:06 +01:00
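The power-of-two sampling idea from the commit above can be sketched in plain Java. This is a minimal illustration, not Netty's leak-detector code: the class `SamplingSketch` and its method names are hypothetical, and the rounding helper assumes an interval of at most 2^30.

```java
final class SamplingSketch {
    // Rounds the interval up to the next power of two so that (interval - 1)
    // can serve as a bit mask. Assumes value <= 2^30 (no overflow handling).
    static int toPowerOfTwo(int value) {
        if (value <= 1) {
            return 1;
        }
        return Integer.highestOneBit(value - 1) << 1;
    }

    // Baseline: sample every interval-th call using the modulo operator.
    static boolean shouldSampleModulo(long counter, int interval) {
        return counter % interval == 0;
    }

    // Same decision using a bitwise AND; only valid when mask == interval - 1
    // and interval is a power of two.
    static boolean shouldSampleMask(long counter, int mask) {
        return (counter & mask) == 0;
    }

    public static void main(String[] args) {
        int interval = toPowerOfTwo(113);
        int mask = interval - 1;
        int sampled = 0;
        for (long counter = 0; counter < 1_000_000; counter++) {
            if (shouldSampleMask(counter, mask)) {
                sampled++;
            }
        }
        System.out.println("interval=" + interval + " sampled=" + sampled);
    }
}
```

Both predicates agree for every counter value once the interval is a power of two, which is why the commit first constrains the interval.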
Norman Maurer	291674262c	Added SlicedAbstractByteBuf that can provide fast-path for _get* and _set* methods Motivation: SlicedByteBuf can be used for any ByteBuf implementations and so can not do any optimizations that could be done when AbstractByteBuf is sliced. Modifications: - Add SlicedAbstractByteBuf that can eliminate range and reference count checks for _get* and _set* methods. Result: Faster SlicedByteBuf implementations for AbstractByteBuf sub-classes.	2015-10-16 08:59:58 +02:00
Norman Maurer	054af70fed	Minimize object allocation when calling AbstractByteBuf.toString(..., Charset) Motivation: AbstractByteBuf.toString(..., Charset) is called quite frequently by users but produces a lot of garbage. Modification: - Use a FastThreadLocal to store the CharBuffers that are needed for decoding. - Use internalNioBuffer(...) when possible Result: Less object creation / Less GC	2015-10-15 17:49:21 +02:00
Norman Maurer	1103379e02	Allow to disable reference count checks on every access of the ByteBuf Motivation: Checking the reference count on every access to a ByteBuf can add significant performance overhead, depending on the access pattern. If the user is sure that there are no reference count errors on their side, it should be possible to disable the check and so gain the max performance. Modification: - Add io.netty.buffer.bytebuf.checkAccessible system property which allows to disable the checks. Enabled by default. - Add microbenchmark Result: Increased performance for operations on the ByteBuf.	2015-10-15 10:19:49 +02:00
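A rough sketch of how such a switch can gate a per-access check, in plain Java. Only the property name `io.netty.buffer.bytebuf.checkAccessible` comes from the commit message; `AccessCheckSketch`, `CountedBuffer`, and `parseEnabled` are hypothetical names, `IllegalStateException` is a stand-in for whatever exception the real buffer throws, and the property parsing is simplified.

```java
final class AccessCheckSketch {
    // Property name taken from the commit message; everything else is illustrative.
    static final String PROPERTY = "io.netty.buffer.bytebuf.checkAccessible";

    // Simplified parsing: the check is enabled unless the property is set to
    // something other than "true".
    static boolean parseEnabled(String raw) {
        return raw == null || Boolean.parseBoolean(raw);
    }

    // A toy reference-counted buffer. A real implementation would more likely
    // keep the flag in a static final field so the JIT can drop the branch.
    static final class CountedBuffer {
        private final byte[] data;
        private final boolean checkAccessible;
        private int refCnt = 1;

        CountedBuffer(int capacity, boolean checkAccessible) {
            this.data = new byte[capacity];
            this.checkAccessible = checkAccessible;
        }

        void release() {
            if (refCnt > 0) {
                refCnt--;
            }
        }

        byte getByte(int index) {
            // The check runs on every access, which is the overhead being traded away.
            if (checkAccessible && refCnt == 0) {
                throw new IllegalStateException("refCnt: 0");
            }
            return data[index];
        }
    }

    public static void main(String[] args) {
        boolean enabled = parseEnabled(System.getProperty(PROPERTY));
        CountedBuffer buf = new CountedBuffer(16, enabled);
        System.out.println("checkAccessible=" + enabled + " first byte=" + buf.getByte(0));
    }
}
```

With the check disabled, a read after `release()` silently succeeds in this sketch, which is why the commit recommends turning it off only when reference counting is known to be correct.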
Scott Mitchell	3056b80602	Microbench backport issue Motivation: The microbench code in 4.0 lives in src/test while in 4.1 and master it lives in src/main. A backport of a patch did not account for this. Modifications: - Move the benchmark to the src/test directory - Update new benchmark package info Result: 4.0 branch can now build again.	2015-07-30 10:33:10 -07:00
Michael Nitschinger	9fc95803da	Fix ByteBufUtilBenchmark on utf8 encodings. Motivation ---------- The performance tests for utf8 also used the getBytes on ASCII, which is incorrect and also provides different performance numbers. Modifications ------------- Use CharsetUtil.UTF_8 instead of US_ASCII for the getBytes calls. Result ------ Accurate and semantically correct benchmarking results on utf8 comparisons.	2014-12-31 20:26:21 +09:00
Norman Maurer	61a5e60513	Provide helper methods in ByteBufUtil to write UTF-8/ASCII CharSequences. Related to [#909 ] Motivation: We expose no methods in ByteBuf to directly write a CharSequence into it. This forces the user to either convert the CharSequence to a byte array first or use a CharsetEncoder. Both cases have some overhead, and we can do a lot better for well known Charsets like UTF-8 and ASCII. Modifications: Add ByteBufUtil.writeAscii(...) and ByteBufUtil.writeUtf8(...) which can do the task in an optimized way. This is especially true if the passed in ByteBuf extends AbstractByteBuf, which is true for all of our implementations which do not wrap another ByteBuf. Result: Writing an ASCII or UTF-8 CharSequence into an AbstractByteBuf is a lot faster than what the user could do by himself, as we can make use of some package private methods and so eliminate reference and range checks. When the Charseq is not ASCII or UTF-8 we can still do a very good job and are on par in most of the cases with what the user would do. The following benchmark shows the improvements: Result: 2456866.966 ±(99.9%) 59066.370 ops/s [Average] Statistics: (min, avg, max) = (2297025.189, 2456866.966, 2586003.225), stdev = 78851.914 Confidence interval (99.9%): [2397800.596, 2515933.336] Benchmark Mode Samples Score Score error Units i.n.m.b.ByteBufUtilBenchmark.writeAscii thrpt 50 9398165.238 131503.098 ops/s i.n.m.b.ByteBufUtilBenchmark.writeAsciiString thrpt 50 9695177.968 176684.821 ops/s i.n.m.b.ByteBufUtilBenchmark.writeAsciiStringViaArray thrpt 50 4788597.415 83181.549 ops/s i.n.m.b.ByteBufUtilBenchmark.writeAsciiStringViaArrayWrapped thrpt 50 4722297.435 98984.491 ops/s i.n.m.b.ByteBufUtilBenchmark.writeAsciiStringWrapped thrpt 50 4028689.762 66192.505 ops/s i.n.m.b.ByteBufUtilBenchmark.writeAsciiViaArray thrpt 50 3234841.565 91308.009 ops/s i.n.m.b.ByteBufUtilBenchmark.writeAsciiViaArrayWrapped thrpt 50 3311387.474 39018.933 ops/s i.n.m.b.ByteBufUtilBenchmark.writeAsciiWrapped thrpt 50 3379764.250 66735.415 ops/s i.n.m.b.ByteBufUtilBenchmark.writeUtf8 thrpt 50 5671116.821 101760.081 ops/s i.n.m.b.ByteBufUtilBenchmark.writeUtf8String thrpt 50 5682733.440 111874.084 ops/s i.n.m.b.ByteBufUtilBenchmark.writeUtf8StringViaArray thrpt 50 3564548.995 55709.512 ops/s i.n.m.b.ByteBufUtilBenchmark.writeUtf8StringViaArrayWrapped thrpt 50 3621053.671 47632.820 ops/s i.n.m.b.ByteBufUtilBenchmark.writeUtf8StringWrapped thrpt 50 2634029.071 52304.876 ops/s i.n.m.b.ByteBufUtilBenchmark.writeUtf8ViaArray thrpt 50 3397049.332 57784.119 ops/s i.n.m.b.ByteBufUtilBenchmark.writeUtf8ViaArrayWrapped thrpt 50 3318685.262 35869.562 ops/s i.n.m.b.ByteBufUtilBenchmark.writeUtf8Wrapped thrpt 50 2473791.249 46423.114 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1,387.417 sec - in io.netty.microbench.buffer.ByteBufUtilBenchmark Results : Tests run: 1, Failures: 0, Errors: 0, Skipped: 0 The ViaArray benchmarks are basically doing a toString().getBytes(Charset) while the others are using ByteBufUtil.write*(...).	2014-12-26 15:57:59 +09:00
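The two approaches compared in that benchmark can be illustrated without Netty. The sketch below writes into a plain `byte[]` rather than a ByteBuf; `AsciiWriteSketch` and both method names are hypothetical, and the direct path is only a simplified stand-in for what `ByteBufUtil.writeAscii(...)` is described as doing.

```java
import java.nio.charset.StandardCharsets;

final class AsciiWriteSketch {
    // Direct path: copy one char at a time, with no intermediate String or
    // byte[] allocation. Characters outside ASCII become '?'. Surrogate pairs
    // would yield two '?' here, which may differ from the JDK encoder.
    static int writeAscii(byte[] dst, int offset, CharSequence seq) {
        int len = seq.length();
        for (int i = 0; i < len; i++) {
            char c = seq.charAt(i);
            dst[offset + i] = (byte) (c < 0x80 ? c : '?');
        }
        return len;
    }

    // "ViaArray" path: materialize a String and a temporary byte[], then copy.
    static int writeAsciiViaArray(byte[] dst, int offset, CharSequence seq) {
        byte[] tmp = seq.toString().getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(tmp, 0, dst, offset, tmp.length);
        return tmp.length;
    }

    public static void main(String[] args) {
        byte[] direct = new byte[32];
        byte[] viaArray = new byte[32];
        CharSequence text = new StringBuilder("GET /index.html");
        int a = writeAscii(direct, 0, text);
        int b = writeAsciiViaArray(viaArray, 0, text);
        System.out.println(a + " bytes direct, " + b + " bytes via array");
    }
}
```

The direct path avoids two allocations per call, which is one plausible reason the non-ViaArray variants score higher in the numbers quoted above.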
Idel Pivnitskiy	9b3f536921	Benchmark for HttpRequestDecoder	2014-11-12 14:37:11 +01:00
Trustin Lee	cb994dd926	Fix the inconsistencies between performance tests in ByteBufAllocatorBenchmark Motivation: default() tests perform their test in a different way, and they must be the same as the other tests. Modification: Make sure default() tests are the same as the others Result: Easier to compare default and non-default allocators	2014-06-21 13:28:11 +09:00
Trustin Lee	fb538ea532	Refactor FastThreadLocal to simplify TLV management Motivation: When Netty runs in a managed environment such as web application server, Netty needs to provide an explicit way to remove the thread-local variables it created to prevent class loader leaks. FastThreadLocal uses different execution paths for storing a thread-local variable depending on the type of the current thread. It increases the complexity of thread-local removal. Modifications: - Moved FastThreadLocal and FastThreadLocalThread out of the internal package so that a user can use it. - FastThreadLocal now keeps track of all thread local variables it has initialized, and calling FastThreadLocal.removeAll() will remove all thread-local variables of the caller thread. - Added FastThreadLocal.size() for diagnostics and tests - Introduce InternalThreadLocalMap which is a mixture of hard-wired thread local variable fields and extensible indexed variables - FastThreadLocal now uses InternalThreadLocalMap to implement a thread-local variable. - Added ThreadDeathWatcher.unwatch() so that PooledByteBufAllocator tells it to stop watching when its thread-local cache has been freed by FastThreadLocal.removeAll(). - Added FastThreadLocalTest to ensure that removeAll() works - Added microbenchmark for FastThreadLocal and JDK ThreadLocal - Upgraded to JMH 0.9 Result: - A user can remove all thread-local variables Netty created, as long as he or she did not exit from the current thread. (Note that there's no way to remove a thread-local variable from outside of the thread.) - FastThreadLocal exposes more useful operations such as isSet() because we always implement a thread local variable via InternalThreadLocalMap instead of falling back to JDK ThreadLocal. - FastThreadLocalBenchmark shows that this change improves the performance of FastThreadLocal even more.	2014-06-19 21:08:16 +09:00
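The indexed-variable idea described in that commit can be sketched in plain Java. This is an illustration under assumptions, not Netty's implementation: `IndexedThreadLocalSketch`, `Slots`, `FastThread`, and `Local` are hypothetical names, and the use of a single JDK `ThreadLocal` to hold the slot array on ordinary threads is an assumed way of modelling the slow path.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

final class IndexedThreadLocalSketch {
    // Per-thread storage: a growable array addressed by each variable's index.
    static final class Slots {
        private Object[] values = new Object[8];

        Object get(int index) {
            return index < values.length ? values[index] : null;
        }

        void set(int index, Object value) {
            if (index >= values.length) {
                values = Arrays.copyOf(values, Math.max(index + 1, values.length * 2));
            }
            values[index] = value;
        }

        void clear() {
            Arrays.fill(values, null);
        }
    }

    // A thread type that carries its slots in a plain field (fast path).
    static final class FastThread extends Thread {
        final Slots slots = new Slots();

        FastThread(Runnable task) {
            super(task);
        }
    }

    // Ordinary threads reach their slots through one JDK ThreadLocal (slow path).
    private static final ThreadLocal<Slots> SLOW_PATH = ThreadLocal.withInitial(Slots::new);

    static Slots currentSlots() {
        Thread t = Thread.currentThread();
        return t instanceof FastThread ? ((FastThread) t).slots : SLOW_PATH.get();
    }

    // Removes every variable of the calling thread only.
    static void removeAll() {
        currentSlots().clear();
    }

    // A thread-local variable is just a unique index into the slot array.
    static final class Local<T> {
        private static final AtomicInteger NEXT_INDEX = new AtomicInteger();
        private final int index = NEXT_INDEX.getAndIncrement();

        @SuppressWarnings("unchecked")
        T get() {
            return (T) currentSlots().get(index);
        }

        void set(T value) {
            currentSlots().set(index, value);
        }
    }
}
```

Because all variables of a thread live in one array, clearing them from inside that thread is a single operation, which matches the commit's note that removal only works from the owning thread.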
belliottsmith	1ac2ff8d7b	Introduce FastThreadLocal which uses an EnumMap and a predefined fixed set of possible thread locals Motivation: Provide a faster ThreadLocal implementation Modification: Add a "FastThreadLocal" which uses an EnumMap and a predefined fixed set of possible thread locals (all of the static instances created by netty) that is around 10-20% faster than standard ThreadLocal in my benchmarks (and can be seen having an effect in the direct PooledByteBufAllocator benchmark that uses the DEFAULT ByteBufAllocator which uses this FastThreadLocal, as opposed to normal instantiations that do not, and in the new RecyclableArrayList benchmark); Result: Improved performance	2014-06-12 15:43:20 +02:00
Norman Maurer	4ad3984c8b	[#2436 ] UnsafeByteBuf implementation should only invert bytes if ByteOrder differs from native ByteOrder Motivation: Our UnsafeByteBuf implementation always inverts bytes when the native ByteOrder is LITTLE_ENDIAN (this is true on Intel), even when the user calls order(ByteOrder.LITTLE_ENDIAN). This is not optimal for performance reasons, as the user should be able to set the ByteOrder to LITTLE_ENDIAN and so write bytes without the extra inverting. Modification: - Introduce a new special SwappedByteBuf (called UnsafeDirectSwappedByteBuf) that is used by all the Unsafe*ByteBuf implementations and allows writing without inverting the bytes. - Add benchmark - Upgrade jmh to 0.8 Result: The user is able to get the max performance even on servers that have ByteOrder.LITTLE_ENDIAN as their native ByteOrder.	2014-06-05 11:09:58 +02:00
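The rule in that commit, swap only when the requested order differs from the native one, can be shown with a small plain-Java sketch. `ByteOrderSketch` and `toNativeLayout` are hypothetical names, and the sketch assumes a raw memory write that always stores values in native order.

```java
import java.nio.ByteOrder;

final class ByteOrderSketch {
    // Returns the int to hand to a native-order raw write so that the bytes
    // end up in memory in the requested order. The swap is skipped when the
    // two orders already match.
    static int toNativeLayout(int value, ByteOrder requested, ByteOrder nativeOrder) {
        return requested == nativeOrder ? value : Integer.reverseBytes(value);
    }

    public static void main(String[] args) {
        ByteOrder nativeOrder = ByteOrder.nativeOrder();
        int value = 0x01020304;
        int big = toNativeLayout(value, ByteOrder.BIG_ENDIAN, nativeOrder);
        int little = toNativeLayout(value, ByteOrder.LITTLE_ENDIAN, nativeOrder);
        System.out.println("native=" + nativeOrder
                + " big=0x" + Integer.toHexString(big)
                + " little=0x" + Integer.toHexString(little));
    }
}
```

On a little-endian machine, a buffer set to `ByteOrder.LITTLE_ENDIAN` therefore needs no swap at all, which is the case the commit optimizes.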
Trustin Lee	172e7f06be	More realistic ByteBuf allocation benchmark Motivation: Allocating a single buffer and releasing it repetitively for a benchmark will not involve the realistic execution path of the allocators. Modifications: Keep the last 8192 allocations and release them randomly. Result: We are now getting the result close to what we got with caliper.	2014-05-29 19:51:13 +09:00
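The allocation pattern described above, keeping the last 8192 allocations and releasing them at random, might look roughly like this. It is a sketch under assumptions: `AllocationPatternSketch` is a hypothetical name, `java.nio.ByteBuffer` stands in for a pooled ByteBuf, and a counter stands in for an explicit `release()` call.

```java
import java.nio.ByteBuffer;
import java.util.Random;

final class AllocationPatternSketch {
    // Window size taken from the commit message.
    static final int MAX_LIVE_BUFFERS = 8192;

    private final ByteBuffer[] live = new ByteBuffer[MAX_LIVE_BUFFERS];
    private final Random rand = new Random(42); // fixed seed for repeatable runs
    long allocated;
    long released;

    // One benchmark step: pick a random slot, release whatever buffer is there,
    // and replace it with a fresh allocation.
    void allocateOne(int size) {
        int idx = rand.nextInt(live.length);
        if (live[idx] != null) {
            released++; // stand-in for oldBuffer.release()
        }
        live[idx] = ByteBuffer.allocate(size);
        allocated++;
    }

    int liveCount() {
        int count = 0;
        for (ByteBuffer buffer : live) {
            if (buffer != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        AllocationPatternSketch sketch = new AllocationPatternSketch();
        for (int i = 0; i < 100_000; i++) {
            sketch.allocateOne(256);
        }
        System.out.println("live buffers: " + sketch.liveCount());
    }
}
```

Compared with allocating and immediately releasing one buffer, this keeps many buffers of mixed age alive at once, so an allocator sees a release order that is not simply last-in, first-out.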
Michael Nitschinger	b3b73be61c	Upgrade JMH to 0.4.1 and make use of @Params.	2014-02-23 16:40:04 +01:00
Michael Nitschinger	268988378f	Update JMH to 0.3.2	2014-02-14 13:16:22 -08:00
Michael Nitschinger	ac332dfe02	Using SystemPropertyUtil for property parsing.	2014-01-15 18:53:28 +01:00
Michael Nitschinger	99f9c6dbc3	Make JMH options modifiable through the subclassed benchmark.	2014-01-15 18:53:22 +01:00
Michael Nitschinger	03b0099b63	microbench: move from Caliper to JMH	2014-01-14 14:56:20 +09:00
Trustin Lee	dba3aa2d4f	Add io.netty.noResourceLeak option to microbench	2013-06-25 11:07:14 +09:00
Prajwal Tuladhar	05850da863	enable checkstyle for test source directory and fix checkstyle errors	2013-03-30 13:18:57 +01:00
Trustin Lee	8d88acb4a7	Change ByteBufAllocator.buffer() to allocate a direct buffer only when the platform can handle a direct buffer reliably - Rename directbyDefault to preferDirect - Add a system property 'io.netty.preferDirect' to allow a user to change the preference at launch time - Merge UnpooledByteBufAllocator.DEFAULT_BY_* to DEFAULT	2013-03-05 17:55:24 +09:00
Trustin Lee	b9996908b1	Implement reference counting - Related: #1029 - Replace Freeable with ReferenceCounted - Add AbstractReferenceCounted - Add AbstractReferenceCountedByteBuf - Add AbstractDerivedByteBuf - Add EmptyByteBuf	2013-02-10 13:10:09 +09:00
Trustin Lee	03e68482bb	Remove ChannelBuf/ByteBuf.Unsafe - Fixes #826 Unsafe.isFreed(), free(), suspend/resumeIntermediaryAllocations() are not that dangerous. internalNioBuffer() and internalNioBuffers() are dangerous but it seems like nobody is using it even inside Netty. Removing those two methods also removes the necessity to keep Unsafe interface at all.	2012-12-17 17:41:21 +09:00
Trustin Lee	e37aeb38d6	Add the original copyright	2012-12-14 00:10:28 +09:00
Trustin Lee	6339feaa8f	Apply advanced JVM options to benchmarks / Fix duplicate uploads - Add common optimization options when launching a new JVM to run a benchmark - Fix a bug where a benchmark report is uploaded twice - Simplify pom.xml and move the build instruction messages to DefaultBenchmark - Print an empty line to prettify the output	2012-12-14 00:00:41 +09:00
Trustin Lee	b47fc77522	Add PooledByteBufAllocator + microbenchmark module This pull request introduces the new default ByteBufAllocator implementation based on jemalloc, with some differences: * Minimum possible buffer capacity is 16 (jemalloc: 2) * Uses binary heap with random branching (jemalloc: red-black tree) * No thread-local cache yet (jemalloc has thread-local cache) * Default page size is 8 KiB (jemalloc: 4 KiB) * Default chunk size is 16 MiB (jemalloc: 2 MiB) * Cannot allocate a buffer bigger than the chunk size (jemalloc: possible) because we don't have control over memory layout in Java. A user can work around this issue by creating a composite buffer, but it's not always a feasible option. Although 16 MiB is a pretty big default, a user's handler might need to deal with the bounded buffers when the user wants to deal with a large message. Also, to ensure the new allocator performs well enough, I wrote a microbenchmark for it and made it a dedicated Maven module. It uses Google's Caliper framework to run and publish the test result (example) Miscellaneous changes: * Made some ByteBuf implementations public so that those who implement a new allocator can make use of them. * Added ByteBufAllocator.compositeBuffer() and its variants. * ByteBufAllocator.ioBuffer() creates a buffer with 0 capacity.	2012-12-13 22:35:06 +09:00

29 Commits