netty5

Author	SHA1	Message	Date
Pavan Kumar	004ffbad90	Improve the allocation algorithm in PoolChunk Motivation: Depth-first search is not always efficient for buddy allocation. Modification: Employ a new faster search algorithm with different memoryMap layout. Result: With thread-local cache disabled, we see a large performance improvement, especially when the size of the allocation is as small as the page size, which had the largest search space previously: -- master head -- Benchmark (size) Mode Score Error Units pooledDirectAllocAndFree 8192 thrpt 215.392 1.565 ops/ms pooledDirectAllocAndFree 16384 thrpt 594.625 2.154 ops/ms pooledDirectAllocAndFree 65536 thrpt 1221.520 18.965 ops/ms pooledHeapAllocAndFree 8192 thrpt 217.175 1.653 ops/ms pooledHeapAllocAndFree 16384 thrpt 587.250 14.827 ops/ms pooledHeapAllocAndFree 65536 thrpt 1217.023 44.963 ops/ms -- changes -- Benchmark (size) Mode Score Error Units pooledDirectAllocAndFree 8192 thrpt 3656.744 94.093 ops/ms pooledDirectAllocAndFree 16384 thrpt 4087.152 22.921 ops/ms pooledDirectAllocAndFree 65536 thrpt 4058.814 29.276 ops/ms pooledHeapAllocAndFree 8192 thrpt 3640.355 44.418 ops/ms pooledHeapAllocAndFree 16384 thrpt 4030.206 24.365 ops/ms pooledHeapAllocAndFree 65536 thrpt 4103.991 70.991 ops/ms	2014-06-21 13:20:56 +09:00
Norman Maurer	19a1b603d0	Remove System.out.println(...) debug messages	2014-06-20 19:42:08 +02:00
Norman Maurer	d2b8560a76	[#2580 ] [#2587 ] Fix buffer corruption regression when ByteBuf.order(LITTLE_ENDIAN) is used Motivation: To improve the speed of ByteBuf with order LITTLE_ENDIAN where the native order is also LITTLE_ENDIAN (Intel), we introduced a new special SwappedByteBuf in commit 4ad3984c8b725ef59856d174d09d1209d65933fc. Unfortunately, that commit has a flaw: it does not correctly handle the case when a ByteBuf expands. This was caused because the memoryAddress was cached and never changed again even if the underlying buffer expanded. This can lead to corrupt data or even to SEGFAULT the JVM if you are lucky enough. Modification: Always look up the actual memoryAddress of the wrapped ByteBuf. Result: No more data corruption for ByteBuf with order LITTLE_ENDIAN and no JVM crashes.	2014-06-20 18:25:54 +02:00
Trustin Lee	fb538ea532	Refactor FastThreadLocal to simplify TLV management Motivation: When Netty runs in a managed environment such as web application server, Netty needs to provide an explicit way to remove the thread-local variables it created to prevent class loader leaks. FastThreadLocal uses different execution paths for storing a thread-local variable depending on the type of the current thread. It increases the complexity of thread-local removal. Modifications: - Moved FastThreadLocal and FastThreadLocalThread out of the internal package so that a user can use it. - FastThreadLocal now keeps track of all thread local variables it has initialized, and calling FastThreadLocal.removeAll() will remove all thread-local variables of the caller thread. - Added FastThreadLocal.size() for diagnostics and tests - Introduce InternalThreadLocalMap which is a mixture of hard-wired thread local variable fields and extensible indexed variables - FastThreadLocal now uses InternalThreadLocalMap to implement a thread-local variable. - Added ThreadDeathWatcher.unwatch() so that PooledByteBufAllocator tells it to stop watching when its thread-local cache has been freed by FastThreadLocal.removeAll(). - Added FastThreadLocalTest to ensure that removeAll() works - Added microbenchmark for FastThreadLocal and JDK ThreadLocal - Upgraded to JMH 0.9 Result: - A user can remove all thread-local variables Netty created, as long as he or she did not exit from the current thread. (Note that there's no way to remove a thread-local variable from outside of the thread.) - FastThreadLocal exposes more useful operations such as isSet() because we always implement a thread local variable via InternalThreadLocalMap instead of falling back to JDK ThreadLocal. - FastThreadLocalBenchmark shows that this change improves the performance of FastThreadLocal even more.	2014-06-19 21:08:16 +09:00
Norman Maurer	6fdf1138ca	[#2573 ] UnpooledUnsafeDirectByteBuf.setBytes(int,ByteBuf,int,int) fails to use fast-path when src has array Motivation: UnpooledUnsafeDirectByteBuf.setBytes(int,ByteBuf,int,int) fails to use the fast-path when src uses an array as backing storage. This is because the if-else uses the wrong ByteBuf for its check. Modifications: - Use the correct ByteBuf when checking for an array as backing storage - Also eliminate an unnecessary check in UnpooledDirectByteBuf which always fails anyway Result: Faster setBytes(...) when src ByteBuf is backed by an array. No more IndexOutOfBoundsException or data corruption.	2014-06-16 11:11:54 +02:00
Norman Maurer	b737d631f1	[maven-release-plugin] prepare for next development iteration	2014-06-12 16:20:52 +02:00
Norman Maurer	1709113a1f	[maven-release-plugin] prepare release netty-4.0.20.Final	2014-06-12 16:14:48 +02:00
Norman Maurer	76043bc8c8	Make use of an array to store FastThreadLocals and so allow to also use it in PooledByteBufAllocator that is instantiated by users. Motivation: Allow use of our new FastThreadLocal wherever possible Modification: Make use of an array to store FastThreadLocals so that it can also be used in PooledByteBufAllocator instances created by users. The maximum size of the array is configurable via a system property so that it can be tuned if needed. By default we use 64 entries, which should be good enough. Result: More flexible usage of FastThreadLocal	2014-06-12 15:43:20 +02:00
belliottsmith	1ac2ff8d7b	Introduce FastThreadLocal which uses an EnumMap and a predefined fixed set of possible thread locals Motivation: Provide a faster ThreadLocal implementation Modification: Add a "FastThreadLocal" which uses an EnumMap and a predefined fixed set of possible thread locals (all of the static instances created by netty) that is around 10-20% faster than standard ThreadLocal in my benchmarks (and can be seen having an effect in the direct PooledByteBufAllocator benchmark that uses the DEFAULT ByteBufAllocator which uses this FastThreadLocal, as opposed to normal instantiations that do not, and in the new RecyclableArrayList benchmark); Result: Improved performance	2014-06-12 15:43:20 +02:00
Norman Maurer	4ad3984c8b	[#2436 ] UnsafeByteBuf implementation should only invert bytes if ByteOrder differs from native ByteOrder Motivation: Our UnsafeByteBuf implementation always inverts bytes when the native ByteOrder is LITTLE_ENDIAN (this is true on Intel), even when the user calls order(ByteOrder.LITTLE_ENDIAN). This is not optimal for performance, as the user should be able to set the ByteOrder to LITTLE_ENDIAN and so write bytes without the extra inversion. Modification: - Introduce a new special SwappedByteBuf (called UnsafeDirectSwappedByteBuf) that is used by all the Unsafe*ByteBuf implementations and allows writing without inverting the bytes. - Add benchmark - Upgrade jmh to 0.8 Result: The user is able to get the maximum performance even on servers that have ByteOrder.LITTLE_ENDIAN as their native ByteOrder.	2014-06-05 11:09:58 +02:00
Trustin Lee	0928e28385	Use Java 5 foreach for arrays for brevity at no cost	2014-06-02 18:25:42 +09:00
Trustin Lee	ed6df98653	Introduce ThreadDeathWatcher Motivation: PooledByteBufAllocator's thread local cache and ReferenceCountUtil.releaseLater() are in need of a way to run an arbitrary logic when a certain thread is terminated. Modifications: - Add ThreadDeathWatcher, which spawns a low-priority daemon thread that watches a list of threads periodically (every second) and invokes the specified tasks when the associated threads are not alive anymore - Start-stop logic based on CAS operation proposed by @tea-dragon - Add debug-level log messages to see if ThreadDeathWatcher works Result: - Fixes #2519 because we don't use GlobalEventExecutor anymore - Cleaner code	2014-06-02 18:23:48 +09:00
Trustin Lee	350ac9787e	Do not use a pseudo random for tree traversal Motivation: If we make allocateRun/SubpageSimple() always try the left node first and make allocateRun/Subpage() always try the right node first, it is more likely that allocateRun/Subpage() will find a node with ST_UNUSED sooner. Modifications: - Make allocateRunSimple() and allocateSubpageSimple() always try the left node first. - Make allocateRun() and allocateSubpage() always try the right node first. - Remove random Result: We get the same performance without using random numbers.	2014-05-30 11:24:22 +09:00
Trustin Lee	3e7dbe072e	Optimize PooledByteBufAllocator Motivation: We still have room for improvement in PoolChunk.allocateRun() and Subpage.allocate(). Modifications: - Unroll the recursion in PoolChunk.allocateRun() - Subpage.allocate() makes use of the 'nextAvail' value set by previous free(). Result: - PoolChunk.allocateRun() optimization yields 10%+ improvements in allocation throughput for non-subpage allocations. - Subpage.allocate() optimization makes the subpage allocations for tiny buffers as fast as non-tiny buffers even when the pageSize is huge (e.g. 1048576) because it doesn't need to perform a linear search in most cases.	2014-05-30 10:51:27 +09:00
Jake Luciani	795507aa7b	Fix capacity check bug affecting offheap buffers	2014-05-13 07:24:56 +02:00
Norman Maurer	a597087a9f	[maven-release-plugin] prepare for next development iteration	2014-04-30 15:40:54 +02:00
Norman Maurer	b562148e2d	[maven-release-plugin] prepare release netty-4.0.19.Final	2014-04-30 15:40:31 +02:00
ian	fde13d96f9	Fix error that causes (up to) double memory usage Motivation: PoolArena's 'normalizeCapacity' function was micro-optimized some time ago to remove a while loop. However, there was a change of behavior in the function as a result. Capacities passed into it that are already powers of 2 (and >= 512) are doubled in size. So if I ask for a buffer with a capacity of 1024, I will get back one that actually uses 2048 bytes (stored in maxLength). Aligning to powers of two for bookkeeping ease is reasonable, and if someone tries to expand a buffer, you might as well use some of the previously wasted space. However, since this distinction between 'easily expanded' and 'costly to expand' space is not supported at all by the APIs, I cannot imagine this change to doubling is desirable or intentional. This is especially costly when using composite buffers. They frequently allocate components with a capacity that is a power of 2, and they never attempt to expand components themselves. The end result is that heavy use of pool-backed composite buffers wastes almost half of the memory pool (the smaller / initial components are <512 and so are not affected by the off-by-one bug). Modifications: Although I find it difficult to believe that such an optimization is really helpful, I left it in and fixed the off-by-one issue by decrementing the value at the start. I also added a simple test to both attempt to verify that the decrement fixes the issue without introducing any other change, and to make it easy for a reviewer to test the existing behavior. PoolArena does not seem to have much testing or testability support though, so the test is kind of a hack and will break for unrelated changes. I suggest either removing it or factoring out the single non-static portion of normalizeCapacity so that the fragile dummy PoolArena is not required. Result: Pooled allocators will allocate fewer resources to the highly inefficient and undocumented buffer section between length and maxLength. Composite buffers of non-trivial size that are backed by pooled allocators will use about half as much memory.	2014-04-15 07:02:49 +02:00
Norman Maurer	9c934116f2	[#2370 ] Periodically check for not alive Threads and free up their ThreadPoolCache Motivation: At the moment we create a new ThreadPoolCache whenever a Thread tries to either allocate or release something on the PooledByteBufAllocator. When something is released, we put it in that Thread's ThreadPoolCache. The problem is that we never check whether a Thread is still alive, so we may end up with memory that is never freed again if a user creates many short-lived Threads that use the PooledByteBufAllocator. Modifications: Periodically check whether the Thread that has a ThreadPoolCache assigned is still alive, and if not, free the cache. Result: Memory is freed up correctly even for short-lived Threads.	2014-04-09 11:44:51 +02:00
Norman Maurer	816165c96a	[maven-release-plugin] prepare for next development iteration	2014-04-01 07:21:40 +02:00
Norman Maurer	1512a4dcca	[maven-release-plugin] prepare release netty-4.0.18.Final	2014-04-01 07:20:16 +02:00
Norman Maurer	13fd69e871	Implement Thread caches for pooled buffers to minimize contention. This fixes [#2264 ] and [#808 ]. Motivation: Remove the synchronization bottleneck in PoolArena and so speed things up Modifications: This implementation uses roughly the same techniques as outlined in the jemalloc paper and jemalloc blogpost https://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919. At the moment we only cache for "known" Threads (those that power EventExecutors) and not for others, to keep the overhead minimal when we need to free up unused buffers in the cache and free up cached buffers once the Thread completes. Here we use multi-level caches for tiny, small and normal allocations. Huge allocations are not cached at all to keep the memory usage at a sane level. All the different cache configurations can be adjusted via system properties or the constructor directly where it makes sense. Result: Less contention, as most allocations can be served by the cache itself	2014-03-20 09:18:04 -07:00
Jakob Buchgraber	17ba35b6d0	Bit tricks to check for and calculate power of two. Motivation: I was studying the code and thought this was simpler and easier to understand. Modifications: Replaced the for loop and if conditions, with a simple implementation. Result: Code is easier to understand.	2014-03-18 15:59:58 +09:00
Bourne, Geoff	1c074eabe5	Fix limit computation of NIO ByteBuffers obtained via ReadOnlyByteBufferBuf.nioBuffer Motivation: When starting with a read-only NIO buffer, wrapping it in a ByteBuf, and then later retrieving a re-wrapped NIO buffer the limit was getting too short. Modifications: Changed ReadOnlyByteBufferBuf.nioBuffer(int,int) to compute the limit in the same manner as the internalNioBuffer method. Result: Round-trip conversion from NIO to ByteBuf to NIO will work reliably.	2014-03-14 08:07:19 +01:00
Norman Maurer	ccd135df01	[maven-release-plugin] prepare for next development iteration	2014-02-24 15:39:26 +01:00
Norman Maurer	33587eb183	[maven-release-plugin] prepare release netty-4.0.17.Final	2014-02-24 15:37:31 +01:00
Trustin Lee	0fc66a411f	The default buffer must be unpooled for backward compatibility Mistakenly set to pooled while merging the changes from 4.1 and master.	2014-02-21 14:43:07 -08:00
Norman Maurer	66e2bb1e75	[maven-release-plugin] prepare for next development iteration	2014-02-19 03:41:24 +01:00
Norman Maurer	c466bb803d	[maven-release-plugin] prepare release netty-4.0.16.Final	2014-02-19 03:36:54 +01:00
Trustin Lee	b18c8fe688	Determine the default allocator from system property - Add ByteBufAllocator.DEFAULT - The default allocator is 'unpooled'	2014-02-14 13:05:57 -08:00
Norman Maurer	f23d68b42f	[#2187 ] Always do a volatile read on the refCnt	2014-02-07 09:23:16 +01:00
Norman Maurer	9bee78f91c	Provide an optimized AtomicIntegerFieldUpdater, AtomicLongFieldUpdater and AtomicReferenceFieldUpdater	2014-02-06 20:08:45 +01:00
Norman Maurer	d67184b488	[maven-release-plugin] prepare for next development iteration	2014-01-21 08:18:32 +01:00
Norman Maurer	287515210d	[maven-release-plugin] prepare release netty-4.0.15.Final	2014-01-21 08:18:26 +01:00
Trustin Lee	e83d2e0b4e	[maven-release-plugin] prepare for next development iteration	2013-12-22 21:57:48 +09:00
Trustin Lee	cdb700c7a4	[maven-release-plugin] prepare release netty-4.0.14.Final	2013-12-22 21:57:40 +09:00
Trustin Lee	0b7aedb13b	[maven-release-plugin] rollback the release of netty-4.0.14.Final	2013-12-22 21:53:24 +09:00
Trustin Lee	4bf6ec7171	[maven-release-plugin] prepare release netty-4.0.14.Final	2013-12-22 21:52:56 +09:00
Trustin Lee	9c1a49c58e	[maven-release-plugin] rollback the release of netty-4.0.14.Final	2013-12-22 21:47:35 +09:00
Trustin Lee	008a049bf4	[maven-release-plugin] prepare for next development iteration	2013-12-22 21:43:55 +09:00
Trustin Lee	f6cb9088c6	[maven-release-plugin] prepare release netty-4.0.14.Final	2013-12-22 21:43:45 +09:00
Norman Maurer	b3d8c81557	Fix all leaks reported during tests - One notable leak is from WebSocketFrameAggregator - All other leaks are from tests	2013-12-07 00:44:56 +09:00
Trustin Lee	2102cb062b	Fix false-positive leaks - All derived buffers and swapped buffers of a leak-aware buffer must be wrapped again with the leak-aware buffer	2013-12-06 21:32:56 +09:00
Trustin Lee	e506581eb1	Add ReferenceCountUtil.releaseLater() to make writing tests easy with ReferenceCounteds	2013-12-06 15:13:00 +09:00
Trustin Lee	128c4b96b5	Checkstyle	2013-12-06 13:54:36 +09:00
Trustin Lee	5d39b1fc3d	Also record retain() and release()	2013-12-06 13:45:24 +09:00
Trustin Lee	e88172495a	Ensure backward compatibility .. by resurrecting the removed methods and system properties.	2013-12-05 01:02:38 +09:00
Trustin Lee	65b522a2a7	Better buffer leak reporting - Remove the reference to ResourceLeak from the buffer implementations and use wrappers instead: - SimpleLeakAwareByteBuf and AdvancedLeakAwareByteBuf - It is now allocator's responsibility to create a leak-aware buffer. - Added AbstractByteBufAllocator.toLeakAwareBuffer() for easier implementation - Add WrappedByteBuf to reduce duplication between *LeakAwareByteBuf and UnreleasableByteBuf - Raise the level of leak reports to ERROR - because it will break the app eventually - Replace enabled/disabled property with the leak detection level - Only print stack trace when level is ADVANCED or above to avoid user confusion - Add the 'leak' build profile, which enables highly detailed leak reporting during the build - Remove ResourceLeakException, which is no longer used	2013-12-05 00:51:39 +09:00
Norman Maurer	053c512f6d	Fix checkstyle	2013-12-02 08:23:57 +01:00
Norman Maurer	14600167d6	[#2021 ] No need to synchronize for unpooled chunks	2013-12-02 08:02:48 +01:00
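
The off-by-one described in commit fde13d96f9 above (capacities that are already powers of two get doubled) can be illustrated with a small sketch. This is not Netty's actual `normalizeCapacity`: the class and method names below are hypothetical, and the sketch covers only the round-up-to-a-power-of-two step, ignoring the separate handling of capacities below 512 that the commit message mentions.

```java
// Hypothetical illustration of the off-by-one described in fde13d96f9.
// Not Netty's code: names and structure are assumptions for this sketch.
final class NormalizeCapacitySketch {

    // Loop-free round-up WITHOUT the initial decrement: an input that is
    // already a power of two is pushed to the next power of two.
    static int roundUpBuggy(int reqCapacity) {
        int n = reqCapacity;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return n + 1;
    }

    // Same bit-smearing trick, but decrementing first, as the commit
    // message describes ("decrementing the value at the start").
    static int roundUpFixed(int reqCapacity) {
        int n = reqCapacity - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return n + 1;
    }

    public static void main(String[] args) {
        int[] samples = {1000, 1024, 1025};
        for (int c : samples) {
            System.out.println(c + " -> buggy=" + roundUpBuggy(c)
                    + ", fixed=" + roundUpFixed(c));
        }
    }
}
```

With the decrement in place, a request for 1024 stays at 1024 instead of becoming 2048, while requests that are not powers of two round up the same way in both variants.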


423 Commits