netty5

Author	SHA1	Message	Date
Norman Maurer	fe796fc8ab	Provide helper methods in ByteBufUtil to write UTF-8/ASCII CharSequences. Related to [#909]. Motivation: We expose no methods in ByteBuf to write a CharSequence into it directly. This forces the user either to convert the CharSequence to a byte array first or to use a CharsetEncoder. Both approaches have some overhead, and we can do a lot better for well-known Charsets like UTF-8 and ASCII. Modifications: Add ByteBufUtil.writeAscii(...) and ByteBufUtil.writeUtf8(...), which do the task in an optimized way. This is especially true if the passed-in ByteBuf extends AbstractByteBuf, which is the case for all of our implementations that do not wrap another ByteBuf. Result: Writing an ASCII or UTF-8 CharSequence into an AbstractByteBuf is a lot faster than what users could do themselves, as we can make use of some package-private methods and so eliminate reference and range checks. When the CharSequence is not ASCII or UTF-8 we can still do a very good job and are on par in most cases with what the user would do.
The following benchmark shows the improvements:
Result: 2456866.966 ±(99.9%) 59066.370 ops/s [Average]
Statistics: (min, avg, max) = (2297025.189, 2456866.966, 2586003.225), stdev = 78851.914
Confidence interval (99.9%): [2397800.596, 2515933.336]
Benchmark                                                     Mode  Samples        Score  Score error  Units
i.n.m.b.ByteBufUtilBenchmark.writeAscii                       thrpt       50  9398165.238   131503.098  ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiString                 thrpt       50  9695177.968   176684.821  ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiStringViaArray         thrpt       50  4788597.415    83181.549  ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiStringViaArrayWrapped  thrpt       50  4722297.435    98984.491  ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiStringWrapped          thrpt       50  4028689.762    66192.505  ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiViaArray               thrpt       50  3234841.565    91308.009  ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiViaArrayWrapped        thrpt       50  3311387.474    39018.933  ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiWrapped                thrpt       50  3379764.250    66735.415  ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8                        thrpt       50  5671116.821   101760.081  ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8String                  thrpt       50  5682733.440   111874.084  ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8StringViaArray          thrpt       50  3564548.995    55709.512  ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8StringViaArrayWrapped   thrpt       50  3621053.671    47632.820  ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8StringWrapped           thrpt       50  2634029.071    52304.876  ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8ViaArray                thrpt       50  3397049.332    57784.119  ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8ViaArrayWrapped         thrpt       50  3318685.262    35869.562  ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8Wrapped                 thrpt       50  2473791.249    46423.114  ops/s
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1,387.417 sec - in io.netty.microbench.buffer.ByteBufUtilBenchmark
Results : Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
The ViaArray benchmarks essentially do toString().getBytes(Charset), while the others use ByteBufUtil.write*(...).	2014-12-26 15:58:18 +09:00
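The speed-up comes from encoding each char straight into the destination instead of first materialising an intermediate byte[] via toString().getBytes(Charset). A minimal sketch of that idea, writing into a plain byte[] rather than a ByteBuf; the Utf8Writer class and its writeUtf8(byte[], int, CharSequence) method are hypothetical and are not Netty's ByteBufUtil code:

```java
/** Hypothetical helper: encodes a CharSequence as UTF-8 without an intermediate array. */
final class Utf8Writer {
    private Utf8Writer() { }

    /**
     * Encodes seq as UTF-8 into dst starting at offset and returns the number of bytes written.
     * Assumes dst has room for 3 bytes per char (the worst case for UTF-8 per UTF-16 unit).
     * Unpaired surrogates are written as '?', which is what String.getBytes(UTF_8) does.
     */
    static int writeUtf8(byte[] dst, int offset, CharSequence seq) {
        int pos = offset;
        int len = seq.length();
        for (int i = 0; i < len; i++) {
            char c = seq.charAt(i);
            if (c < 0x80) {
                dst[pos++] = (byte) c;                               // 1 byte: ASCII
            } else if (c < 0x800) {
                dst[pos++] = (byte) (0xC0 | (c >> 6));               // 2 bytes
                dst[pos++] = (byte) (0x80 | (c & 0x3F));
            } else if (Character.isHighSurrogate(c) && i + 1 < len
                    && Character.isLowSurrogate(seq.charAt(i + 1))) {
                int cp = Character.toCodePoint(c, seq.charAt(++i));  // 4 bytes: surrogate pair
                dst[pos++] = (byte) (0xF0 | (cp >> 18));
                dst[pos++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                dst[pos++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                dst[pos++] = (byte) (0x80 | (cp & 0x3F));
            } else if (Character.isSurrogate(c)) {
                dst[pos++] = (byte) '?';                             // unpaired surrogate
            } else {
                dst[pos++] = (byte) (0xE0 | (c >> 12));              // 3 bytes
                dst[pos++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                dst[pos++] = (byte) (0x80 | (c & 0x3F));
            }
        }
        return pos - offset;
    }
}
```

The result can be checked against s.getBytes(StandardCharsets.UTF_8); the point of the optimisation is that no temporary array or CharsetEncoder is created per call.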
Norman Maurer	66294892a0	CompositeByteBuf.nioBuffers(...) must not return an empty ByteBuffer array Motivation: CompositeByteBuf.nioBuffers(...) returns an empty ByteBuffer array if the specified length is 0. This is not consistent with other ByteBuf implementations, which return a ByteBuffer array of size 1 containing an empty ByteBuffer. Modifications: Make CompositeByteBuf.nioBuffers(...) consistent with other ByteBuf implementations. Result: Consistent and correct behaviour of nioBuffers(...)	2014-12-22 11:18:32 +01:00
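The contract described above (a zero-length request yields a one-element array holding an empty buffer, never an empty array) can be sketched with plain java.nio buffers. CompositeView is a hypothetical stand-in for CompositeByteBuf and only models the index arithmetic:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical composite of ByteBuffer components, used only to show the zero-length rule. */
final class CompositeView {
    private final List<ByteBuffer> components = new ArrayList<>();
    private int capacity;

    void addComponent(ByteBuffer buf) {
        ByteBuffer view = buf.slice(); // the component covers buf's remaining bytes
        components.add(view);
        capacity += view.remaining();
    }

    /** Returns views over [index, index + length); never returns an empty array. */
    ByteBuffer[] nioBuffers(int index, int length) {
        if (index < 0 || length < 0 || index + length > capacity) {
            throw new IndexOutOfBoundsException("index: " + index + ", length: " + length);
        }
        if (length == 0) {
            // Same contract as single-buffer implementations: one empty buffer, not an empty array.
            return new ByteBuffer[] { ByteBuffer.allocate(0) };
        }
        List<ByteBuffer> out = new ArrayList<>();
        int offset = 0; // composite index at which the current component starts
        for (ByteBuffer c : components) {
            int start = Math.max(index, offset);
            int end = Math.min(index + length, offset + c.remaining());
            if (start < end) {
                ByteBuffer part = c.duplicate();
                part.position(start - offset);
                part.limit(end - offset);
                out.add(part.slice());
            }
            offset += c.remaining();
        }
        return out.toArray(new ByteBuffer[0]);
    }
}
```

Callers that do nioBuffers(...)[0] or pass the result to a gathering write then behave the same regardless of which ByteBuf implementation they hold.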
Norman Maurer	a69a39c849	Always return SliceByteBuf on slice(...) to eliminate possible leak Motivation: When calling slice(...) on a ByteBuf, the returned ByteBuf should be a slice of that ByteBuf and share its reference count. This is important as it is perfectly legal to use buf.slice(...).release() and have both the slice and the original ByteBuf released. At the moment this is only the case if the requested slice size is > 0. This makes the behavior inconsistent and so may lead to a memory leak. Modifications: - Never return Unpooled.EMPTY_BUFFER when calling slice(...). - Add test cases for buffer.slice(...).release() and buffer.duplicate().release() Result: Consistent behaviour, so these leaks are no longer possible.	2014-12-22 11:15:50 +01:00
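Why a shared EMPTY_BUFFER leaks: it has its own reference count, so releasing it never reaches the parent. A minimal sketch in which every slice, including a zero-length one, delegates its count to the parent; RefCountedBuf and its nested Slice are hypothetical classes, not Netty's:

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical reference-counted buffer; slices share its counter. */
final class RefCountedBuf {
    private final AtomicInteger refCnt = new AtomicInteger(1);
    private final int capacity;

    RefCountedBuf(int capacity) { this.capacity = capacity; }

    int refCnt() { return refCnt.get(); }

    /** Decrements the count; returns true when it reaches zero (memory may be reclaimed). */
    boolean release() {
        for (;;) {
            int current = refCnt.get();
            if (current == 0) {
                throw new IllegalStateException("already released");
            }
            if (refCnt.compareAndSet(current, current - 1)) {
                return current == 1;
            }
        }
    }

    /** Always returns a view sharing this buffer's count, even when length == 0. */
    Slice slice(int index, int length) {
        if (index < 0 || length < 0 || index + length > capacity) {
            throw new IndexOutOfBoundsException();
        }
        return new Slice(this, index, length);
    }

    static final class Slice {
        private final RefCountedBuf parent;
        final int index;
        final int length;

        Slice(RefCountedBuf parent, int index, int length) {
            this.parent = parent;
            this.index = index;
            this.length = length;
        }

        int refCnt() { return parent.refCnt(); }

        boolean release() { return parent.release(); } // releasing the slice releases the parent
    }
}
```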
Norman Maurer	182c91f06c	Ensure buffer is not released when calling array() / memoryAddress() Motivation: Previously we failed to check whether a buffer was released before returning the backing byte array or memory address. This could lead to JVM crashes when someone tried various bulk operations on the UnsafeByteBuf implementations. Modifications: Always check whether the buffer is released before returning the byte array and memory address. Result: No more JVM crashes because of released buffers when doing bulk operations on UnsafeByteBuf implementations.	2014-12-11 11:30:31 +01:00
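The check amounts to guarding every accessor that exposes raw storage. A single-threaded sketch; CheckedHeapBuf is hypothetical and throws a plain IllegalStateException, whereas Netty itself uses its own IllegalReferenceCountException:

```java
/** Hypothetical heap buffer that refuses to expose its storage once released. */
final class CheckedHeapBuf {
    private final byte[] array;
    private int refCnt = 1; // single-threaded sketch; real code uses atomic updates

    CheckedHeapBuf(int capacity) { array = new byte[capacity]; }

    void release() {
        ensureAccessible();
        refCnt--;
    }

    private void ensureAccessible() {
        if (refCnt == 0) {
            throw new IllegalStateException("refCnt: 0");
        }
    }

    byte[] array() {
        ensureAccessible(); // never hand out memory that may already have been freed
        return array;
    }
}
```

For a heap array the worst outcome of skipping the check is silently reading recycled data; for off-heap memory reached through a raw address it can crash the JVM, which is what the commit guards against.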
Idel Pivnitskiy	35db3c6710	Small performance improvements Motivation: Found performance issues via FindBugs and PMD. Modifications: - Removed unnecessary boxing/unboxing operations in DefaultTextHeaders.convertToInt(CharSequence) and DefaultTextHeaders.convertToLong(CharSequence). A boxed primitive was created from a string just to extract the unboxed primitive value. - Added a static modifier for the DefaultHttp2Connection.ParentChangedEvent class. This class is an inner class but does not use its embedded reference to the object which created it. This reference makes the instances of the class larger and may keep the creator object alive longer than necessary. - Added a statically compiled Pattern to avoid compiling it every time we need to replace part of the authority. - Improved use of StringBuilders. Result: Performance improvements.	2014-11-20 00:10:06 -05:00
Norman Maurer	48f1398869	Disable caching of PooledByteBuf for different threads. Motivation: We introduced a PoolThreadCache which is used in our PooledByteBufAllocator to reduce the synchronization overhead on PoolArenas when allocating / deallocating PooledByteBuf instances. This cache is used for both the allocation path and the deallocation path by: - Looking for cached memory in the PoolThreadCache of the Thread that tries to allocate a new PooledByteBuf and, if any is found, returning it. - Adding the memory that is used by a PooledByteBuf to the PoolThreadCache of the Thread that releases the PooledByteBuf. This works out very well when all allocation / deallocation is done in the EventLoop, as the EventLoop will be used for both reads and writes. On the other hand, this can lead to surprising side effects if the user allocates from outside the EventLoop and passes the ByteBuf over for writing. The problem here is that the memory will be added to the PoolThreadCache of the Thread that did the actual write on the underlying transport, not to that of the Thread that previously allocated the buffer. Modifications: Don't cache if different Threads are used for allocating/deallocating Result: Less confusing behavior for users that allocate PooledByteBufs from outside the EventLoop.	2014-09-22 13:39:31 +02:00
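The rule adopted here can be sketched as: remember which thread allocated a chunk, and only put it into the freeing thread's local cache when that thread is also the allocator; otherwise hand it back to the shared arena. ThreadOwnedPool, its Chunk class, and the returnedToArena counter are hypothetical stand-ins for PoolThreadCache and PoolArena:

```java
import java.util.ArrayDeque;

/** Hypothetical pool: caches a freed chunk only on the thread that allocated it. */
final class ThreadOwnedPool {
    /** A pooled chunk remembers which thread allocated it. */
    static final class Chunk {
        final Thread owner;

        Chunk(Thread owner) { this.owner = owner; }
    }

    // Per-thread free list; each thread only ever touches its own deque.
    private final ThreadLocal<ArrayDeque<Chunk>> cache = ThreadLocal.withInitial(ArrayDeque::new);
    private int returnedToArena; // chunks handed back to the shared arena (guarded by this)

    Chunk allocate() {
        Chunk cached = cache.get().pollFirst();
        return cached != null ? cached : new Chunk(Thread.currentThread());
    }

    void free(Chunk chunk) {
        if (chunk.owner == Thread.currentThread()) {
            cache.get().addFirst(chunk);          // same thread: cache locally
        } else {
            synchronized (this) {
                returnedToArena++;                // cross-thread free: bypass the thread cache
            }
        }
    }

    synchronized int returnedToArena() { return returnedToArena; }

    int cachedOnCurrentThread() { return cache.get().size(); }
}
```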
Norman Maurer	858de5699b	[#2924] Correctly update head in MemoryRegionCache.trim() Motivation: When MemoryRegionCache.trim() is called, some unused cache entries will be freed (starting from head). However, MemoryRegionCache.trim() does not update the head, which leaves the entry list's head pointing to an entry whose chunk is now null, so subsequent allocations from MemoryRegionCache return false immediately. In other words, the cache is no longer usable once a trim happens. Modifications: Update head to the correct index after freeing entries in trim(). Result: MemoryRegionCache behaves correctly even after calling trim().	2014-09-22 11:04:21 +02:00
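A toy ring-buffer cache that shows why head must move during trim(): allocate() treats an empty slot at head as "cache empty", so freeing entries without advancing head makes every later allocation fail. RegionCache is hypothetical and stores plain objects rather than the chunk/handle entries of the real MemoryRegionCache:

```java
/** Hypothetical fixed-size ring cache; an empty slot at head means "nothing cached". */
final class RegionCache {
    private final Object[] entries;
    private int head; // next entry to hand out (and the first one trim() frees)
    private int tail; // next free slot for add()

    RegionCache(int size) { entries = new Object[size]; }

    private int next(int i) { return (i + 1) % entries.length; }

    boolean add(Object region) {
        if (entries[tail] != null) {
            return false; // full
        }
        entries[tail] = region;
        tail = next(tail);
        return true;
    }

    Object allocate() {
        Object region = entries[head];
        if (region == null) {
            return null; // empty slot at head: treated as an empty cache
        }
        entries[head] = null;
        head = next(head);
        return region;
    }

    /** Frees up to max cached entries from head and returns how many were freed. */
    int trim(int max) {
        int freed = 0;
        while (freed < max && entries[head] != null) {
            entries[head] = null; // "free" the region
            head = next(head);    // the fix: without this, head stays on a null slot forever
            freed++;
        }
        return freed;
    }
}
```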
Norman Maurer	4e62b51c6d	[#2843] Add test-case to show correct behavior of ByteBuf.refCnt() and ByteBuf.release(...) Motivation: We received a bug report that ByteBuf.refCnt() sometimes does not show the correct value when release() and refCnt() are called from different Threads. Modifications: Add a test-case which shows that everything works as expected Result: Test-case added which shows everything is ok.	2014-09-01 08:50:21 +02:00
Trustin Lee	b5f61d0de5	[maven-release-plugin] prepare for next development iteration	2014-08-16 03:27:42 +09:00
Trustin Lee	76ac3b21a5	[maven-release-plugin] prepare release netty-4.1.0.Beta3	2014-08-16 03:27:37 +09:00
Trustin Lee	b3c1904cc9	[maven-release-plugin] prepare for next development iteration	2014-08-15 09:31:03 +09:00
Trustin Lee	e013b2400f	[maven-release-plugin] prepare release netty-4.1.0.Beta2	2014-08-15 09:30:59 +09:00
Trustin Lee	0dc6a8dccf	Use heap buffers for Unpooled.copiedBuffer() Related issue: #2028 Motivation: Some copiedBuffer() methods in Unpooled allocated a direct buffer. Allocating a direct buffer is an expensive operation and thus should be avoided for unpooled buffers. Modifications: - Use heap buffers in all copiedBuffer() methods Result: Unpooled.copiedBuffer() is less expensive now.	2014-08-13 15:10:11 -07:00
Norman Maurer	ef572d859d	Change back default allocator to pooled. Motivation: While porting some changes from 4.0 to 4.1 and master branch I changed the default allocator from pooled to unpooled by mistake. This should be reverted. The guilty commit is `4a3ef90381`. Thanks to @blucas for spotting this. Modifications: Revert changes related to allocator. Result: Use the correct default allocator again.	2014-08-13 12:07:06 +02:00
Norman Maurer	869687bd71	Port ChannelOutboundBuffer and related changes from 4.0 Motivation: We made various changes related to the ChannelOutboundBuffer in the 4.0 branch. This commit ports all of them over and so makes sure our branches are in sync with respect to these changes. Related to [#2734], [#2709], [#2729], [#2710] and [#2693]. Modification: Port all changes that were made to the ChannelOutboundBuffer. This includes the port of the following commits: - `73dfd7c01b` - `997d8c32d2` - `e282e504f1` - `5e5d1a58fd` - `8ee3575e72` - `d6f0d12a86` - `16e50765d1` - `3f3e66c31a` Result: - Less memory usage by ChannelOutboundBuffer - Same code as in 4.0 branch - Make it possible to use ChannelOutboundBuffer with Channel implementations that do not extend AbstractChannel	2014-08-05 15:00:45 +02:00
Trustin Lee	9a654d8a61	Remove duplicate range check in AbstractByteBuf.skipBytes()	2014-07-29 15:58:28 -07:00
Idel Pivnitskiy	ad1389be9d	Small performance improvements Modifications: - Added a static modifier for CompositeByteBuf.Component. This class is an inner class but does not use its embedded reference to the object which created it. This reference makes the instances of the class larger and may keep the creator object alive longer than necessary. - Removed unnecessary boxing/unboxing operations in HttpResponseDecoder, RtspResponseDecoder, PerMessageDeflateClientExtensionHandshaker and PerMessageDeflateServerExtensionHandshaker. A boxed primitive was created from a String just to extract the unboxed primitive value. - Removed a calculation that was needlessly performed three times in DiskAttribute.addContent(...). - Removed unnecessary checks for whether a file exists before calling mkdirs() in NativeLibraryLoader and PlatformDependent, because mkdirs() already performs this check. - Removed an unnecessary `instanceof AsciiString` check in StompSubframeAggregator.contentLength(StompHeadersSubframe) and StompSubframeDecoder.getContentLength(StompHeaders, long), because StompHeaders.get(CharSequence) always returns java.lang.String.	2014-07-20 09:26:04 +02:00
Norman Maurer	f88dfd0430	[#2653] Remove unnecessary ensureAccessible() calls Motivation: I introduced ensureAccessible() calls as part of `6c47cc9711` in some places. Unfortunately, I also added some where they are not needed, which caused a performance regression. Modification: Remove calls where not needed. Result: Fixed performance regression.	2014-07-14 21:04:12 +02:00
Norman Maurer	93c306602a	[#2653] Remove unnecessary range checks for performance reasons Motivation: I introduced range checks as part of `6c47cc9711` in some places. Unfortunately, I also added some where they are not needed, which caused a performance regression. Modification: Remove range checks where not needed Result: Fixed performance regression.	2014-07-14 11:43:19 +02:00
Brendt Lucas	ac8ac59148	[#2642] CompositeByteBuf.deallocate memory/GC improvement Motivation: CompositeByteBuf.deallocate generates unnecessary GC pressure when using the 'foreach' loop, as a 'foreach' loop creates an iterator when looping. Modification: Convert 'foreach' loop into regular 'for' loop. Result: Less GC pressure (and possibly more throughput) as the 'for' loop does not create an iterator	2014-07-08 21:08:14 +02:00
Trustin Lee	e167b02d52	[maven-release-plugin] prepare for next development iteration	2014-07-04 17:26:02 +09:00
Trustin Lee	ba50cb829b	[maven-release-plugin] prepare release netty-4.1.0.Beta1	2014-07-04 17:25:54 +09:00
Trustin Lee	787663a644	[maven-release-plugin] rollback the release of netty-4.1.0.Beta1	2014-07-04 17:11:14 +09:00
Trustin Lee	83eae705e1	[maven-release-plugin] prepare release netty-4.1.0.Beta1	2014-07-04 17:02:17 +09:00
Trustin Lee	fbf1bdbef1	Fix the build timeout when 'leak' profile is active Motivation: AbstractByteBufTest.testInternalBuffer() uses writeByte() operations to populate the sample data. Usually, this isn't a problem, but it starts to take a lot of time when the resource leak detection level gets higher. In our CI machine, testInternalBuffer() takes more than 30 minutes, causing the build timeout when the 'leak' profile is active (paranoid level resource detection.) Modification: Populate the sample data using ThreadLocalRandom.nextBytes() instead of using millions of writeByte() operations. Result: Test runs much faster when leak detection level is high.	2014-07-03 17:55:10 +09:00
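A minimal sketch of the data-population change: fill a byte[] with one ThreadLocalRandom.nextBytes(...) call (ThreadLocalRandom is a java.util.Random subclass, so it provides nextBytes) and hand the array to the buffer in one bulk call such as ByteBuf.writeBytes(byte[]), instead of issuing one writeByte() per byte. SampleData is a hypothetical helper, and the size testInternalBuffer() actually uses is not stated in the message:

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper for generating test payloads in one bulk call. */
final class SampleData {
    private SampleData() { }

    static byte[] randomBytes(int size) {
        byte[] data = new byte[size];
        ThreadLocalRandom.current().nextBytes(data); // one call instead of size per-byte writes
        return data;
    }
}
```

With paranoid leak detection every ByteBuf operation is tracked, so collapsing millions of writeByte() calls into a single bulk write removes most of that per-call bookkeeping.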
Trustin Lee	d0912f2709	Fix most inspector warnings Motivation: It's good to minimize potentially broken windows. Modifications: Fix most inspector warnings from our profile Update IntObjectHashMap Result: Cleaner code	2014-07-02 19:55:07 +09:00
Norman Maurer	9594a81b95	[#2622] Correctly check reference count before trying to work on the underlying memory Motivation: Because of how we use reference counting, we need to check the reference count before each operation that touches the underlying memory. This is especially true as we use sun.misc.Cleaner.clean() to release the memory ASAP when possible. Because of this, the user may cause a SEGFAULT if an operation is called that tries to access the backing memory after it was released. Modification: Correctly check the reference count in all methods that access the underlying memory or expose it via a ByteBuffer. Result: Safer usage of ByteBuf	2014-06-30 07:14:25 +02:00
Trustin Lee	c0462c0c3b	Optimize PoolChunk - Using short[] for memoryMap did not improve performance. Reverting back to the original dual-byte[] structure in favor of simplicity. - Optimize allocateRun(), which yields a small performance improvement - Use local variables when member fields are accessed more than once	2014-06-26 17:06:10 +09:00
Trustin Lee	dbc011c3f4	Fix inspector warnings	2014-06-26 17:06:10 +09:00
Pavan Kumar	69a6ad940a	Improve the allocation algorithm in PoolChunk Motivation: Depth-first search is not always efficient for buddy allocation. Modification: Employ a new, faster search algorithm with a different memoryMap layout. Result: With the thread-local cache disabled, we see a large performance improvement, especially when the allocation size is as small as the page size, which previously had the largest search space.	2014-06-26 17:06:10 +09:00
Trustin Lee	41d44a8161	Remove 'get' prefix from all HTTP/SPDY messages Motivation: Pursuit of consistency in method naming Modifications: - Remove the 'get' prefix from all HTTP/SPDY message classes - Fix some inspector warnings Result: Consistency	2014-06-24 18:03:33 +09:00
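Our reading of the new memoryMap layout, sketched under assumptions: the chunk is a complete binary tree stored in an array (node 1 is the root, node i has children 2i and 2i+1), and memoryMap[id] holds the shallowest depth at which a completely free node exists in id's subtree, with maxOrder + 1 meaning "nothing free". An allocation at depth d then walks straight down, taking the left child when it can satisfy d and the right child otherwise, with no backtracking. BuddyTree is a hypothetical class; the real PoolChunk also maps node ids to byte offsets and handles subpages:

```java
/** Hypothetical buddy allocator over a complete binary tree of depth maxOrder. */
final class BuddyTree {
    private final byte[] memoryMap; // shallowest depth with a fully free node in the subtree
    private final byte[] depthMap;  // fixed depth of each node
    private final byte unusable;    // maxOrder + 1: nothing free below this node

    BuddyTree(int maxOrder) {
        unusable = (byte) (maxOrder + 1);
        int n = 1 << (maxOrder + 1);
        memoryMap = new byte[n];
        depthMap = new byte[n];
        int id = 1;
        for (int d = 0; d <= maxOrder; d++) {
            for (int i = 0; i < (1 << d); i++, id++) {
                memoryMap[id] = (byte) d; // initially every node is completely free
                depthMap[id] = (byte) d;
            }
        }
    }

    /** Allocates a free node at depth d and returns its id, or -1 if none exists. */
    int allocate(int d) {
        if (memoryMap[1] > d) {
            return -1;
        }
        int id = 1;
        for (int depth = 0; depth < d; depth++) {
            id <<= 1;                 // try the left child first
            if (memoryMap[id] > d) {
                id ^= 1;              // left cannot satisfy depth d, so the right sibling must
            }
        }
        memoryMap[id] = unusable;
        for (int child = id; child > 1; child >>>= 1) {
            memoryMap[child >>> 1] = (byte) Math.min(memoryMap[child], memoryMap[child ^ 1]);
        }
        return id;
    }

    void free(int id) {
        memoryMap[id] = depthMap[id];
        for (int child = id; child > 1; child >>>= 1) {
            int parent = child >>> 1;
            int childDepth = depthMap[child];
            if (memoryMap[child] == childDepth && memoryMap[child ^ 1] == childDepth) {
                memoryMap[parent] = depthMap[parent]; // both buddies free: merge
            } else {
                memoryMap[parent] = (byte) Math.min(memoryMap[child], memoryMap[child ^ 1]);
            }
        }
    }
}
```

Each allocation or free touches one root-to-node path, so the cost is O(maxOrder) regardless of fragmentation, which is consistent with the page-size allocations (the deepest level) benefiting most.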
Norman Maurer	12a3e23e47	MessageToByteEncoder always starts with a ByteBuf that uses initialCapacity == 0 Motivation: MessageToByteEncoder always starts with a ByteBuf that uses initialCapacity == 0 when preferDirect is used. This is really wasteful in terms of performance, as every first write into the buffer will cause the buffer to expand. Modifications: - Change ByteBufAllocator.ioBuffer() to use the same default initialCapacity as heapBuffer() and directBuffer() - Add a new allocateBuffer method to MessageToByteEncoder that allows the user to do smarter allocation based on the message that will be encoded. Result: Less expanding of buffers and more flexibility when allocating the buffer for encoding.	2014-06-24 13:55:21 +09:00
Trustin Lee	37b07a04d4	Revert "Improve the allocation algorithm in PoolChunk" This reverts commit `36305d7dce`, which seems to cause an assertion failure on our CI machine.	2014-06-21 19:19:35 +09:00
Pavan Kumar	6bd8c5d4d0	Improve the allocation algorithm in PoolChunk Motivation: Depth-first search is not always efficient for buddy allocation. Modification: Employ a new, faster search algorithm with a different memoryMap layout. Result: With the thread-local cache disabled, we see a large performance improvement, especially when the allocation size is as small as the page size, which previously had the largest search space:
-- master head --
Benchmark                 (size)   Mode     Score   Error   Units
pooledDirectAllocAndFree    8192  thrpt   215.392   1.565  ops/ms
pooledDirectAllocAndFree   16384  thrpt   594.625   2.154  ops/ms
pooledDirectAllocAndFree   65536  thrpt  1221.520  18.965  ops/ms
pooledHeapAllocAndFree      8192  thrpt   217.175   1.653  ops/ms
pooledHeapAllocAndFree     16384  thrpt   587.250  14.827  ops/ms
pooledHeapAllocAndFree     65536  thrpt  1217.023  44.963  ops/ms
-- changes --
Benchmark                 (size)   Mode     Score   Error   Units
pooledDirectAllocAndFree    8192  thrpt  3656.744  94.093  ops/ms
pooledDirectAllocAndFree   16384  thrpt  4087.152  22.921  ops/ms
pooledDirectAllocAndFree   65536  thrpt  4058.814  29.276  ops/ms
pooledHeapAllocAndFree      8192  thrpt  3640.355  44.418  ops/ms
pooledHeapAllocAndFree     16384  thrpt  4030.206  24.365  ops/ms
pooledHeapAllocAndFree     65536  thrpt  4103.991  70.991  ops/ms	2014-06-21 13:20:25 +09:00
Norman Maurer	f05510063e	Remove System.out.println(...) debug messages	2014-06-20 19:42:38 +02:00
Norman Maurer	371f8066d2	[#2580] [#2587] Fix buffer corruption regression when ByteBuf.order(LITTLE_ENDIAN) is used Motivation: To improve the speed of ByteBuf with order LITTLE_ENDIAN where the native order is also LITTLE_ENDIAN (Intel), we introduced a new special SwappedByteBuf in commit `4ad3984c8b`. Unfortunately, that commit had a flaw: it did not correctly handle the case when a ByteBuf expands. This happened because the memoryAddress was cached and never updated, even after the underlying buffer expanded. This can lead to corrupt data or, if you are unlucky, even SEGFAULT the JVM. Modification: Always look up the actual memoryAddress of the wrapped ByteBuf. Result: No more data corruption for ByteBuf with order LITTLE_ENDIAN and no JVM crashes.	2014-06-20 18:24:44 +02:00
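A heap-array analogue of the bug: a wrapper that caches the wrapped buffer's backing storage (a byte[] here, a memoryAddress in the real code) keeps writing into the old storage after the buffer grows. The fixed pattern looks the storage up on every access. GrowableBuf and LittleEndianView are hypothetical classes:

```java
import java.util.Arrays;

/** Hypothetical growable buffer whose backing array is replaced on expansion. */
final class GrowableBuf {
    private byte[] array = new byte[4];

    byte[] backing() { return array; }

    void ensureCapacity(int min) {
        if (min > array.length) {
            array = Arrays.copyOf(array, Math.max(min, array.length * 2)); // new storage
        }
    }
}

/** Hypothetical little-endian view that never caches the wrapped buffer's storage. */
final class LittleEndianView {
    private final GrowableBuf buf;

    LittleEndianView(GrowableBuf buf) { this.buf = buf; }

    void setIntLE(int index, int value) {
        buf.ensureCapacity(index + 4);
        byte[] a = buf.backing(); // looked up on every call, never stored in a field
        a[index] = (byte) value;
        a[index + 1] = (byte) (value >>> 8);
        a[index + 2] = (byte) (value >>> 16);
        a[index + 3] = (byte) (value >>> 24);
    }
}
```

Had the view captured buf.backing() in its constructor, the write below would land in the discarded 4-byte array; with off-heap memory the stale pointer may already be freed, hence the crashes.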
Trustin Lee	085a61a310	Refactor FastThreadLocal to simplify TLV management Motivation: When Netty runs in a managed environment such as a web application server, Netty needs to provide an explicit way to remove the thread-local variables it created to prevent class loader leaks. FastThreadLocal uses different execution paths for storing a thread-local variable depending on the type of the current thread. This increases the complexity of thread-local removal. Modifications: - Moved FastThreadLocal and FastThreadLocalThread out of the internal package so that users can use them. - FastThreadLocal now keeps track of all thread local variables it has initialized, and calling FastThreadLocal.removeAll() will remove all thread-local variables of the caller thread. - Added FastThreadLocal.size() for diagnostics and tests - Introduce InternalThreadLocalMap which is a mixture of hard-wired thread local variable fields and extensible indexed variables - FastThreadLocal now uses InternalThreadLocalMap to implement a thread-local variable. - Added ThreadDeathWatcher.unwatch() so that PooledByteBufAllocator tells it to stop watching when its thread-local cache has been freed by FastThreadLocal.removeAll(). - Added FastThreadLocalTest to ensure that removeAll() works - Added microbenchmark for FastThreadLocal and JDK ThreadLocal - Upgraded to JMH 0.9 Result: - A user can remove all thread-local variables Netty created, as long as he or she did not exit from the current thread. (Note that there's no way to remove a thread-local variable from outside of the thread.) - FastThreadLocal exposes more useful operations such as isSet() because we always implement a thread local variable via InternalThreadLocalMap instead of falling back to JDK ThreadLocal. - FastThreadLocalBenchmark shows that this change improves the performance of FastThreadLocal even more.	2014-06-19 21:13:55 +09:00
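The core of the indexed-variable design described above: every thread-local variable gets a fixed integer index when it is constructed, and each thread holds a single array of slots, so lookup is an array access and removeAll() can drop everything at once. A rough sketch; IndexedThreadLocal is a hypothetical name, and unlike Netty, which keeps the map in a field of FastThreadLocalThread when it can, this version always stores the slot array in a plain JDK ThreadLocal:

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical indexed thread-local: one slot array per thread, one index per variable. */
final class IndexedThreadLocal<V> {
    private static final AtomicInteger NEXT_INDEX = new AtomicInteger();
    private static final Object UNSET = new Object();
    // Fallback storage: the calling thread's slot array, held in a plain JDK ThreadLocal.
    private static final ThreadLocal<Object[]> SLOTS = new ThreadLocal<>();

    private final int index = NEXT_INDEX.getAndIncrement();

    private static Object[] slots(int minLength) {
        Object[] s = SLOTS.get();
        if (s == null) {
            s = new Object[0];
        }
        if (s.length < minLength) {
            int oldLength = s.length;
            s = Arrays.copyOf(s, Math.max(minLength, oldLength * 2));
            Arrays.fill(s, oldLength, s.length, UNSET); // new slots start out unset
            SLOTS.set(s);
        }
        return s;
    }

    @SuppressWarnings("unchecked")
    V get() {
        Object v = slots(index + 1)[index];
        return v == UNSET ? null : (V) v;
    }

    void set(V value) { slots(index + 1)[index] = value; }

    boolean isSet() {
        Object[] s = SLOTS.get();
        return s != null && index < s.length && s[index] != UNSET;
    }

    /** Drops every IndexedThreadLocal value of the calling thread in one step. */
    static void removeAll() { SLOTS.remove(); }
}
```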
Norman Maurer	ad86ec798d	Move calculateNewCapacity(...) to ByteBufAllocator Motivation: Currently the algorithm that calculates the new capacity of a ByteBuf is implemented in AbstractByteBuf. The problem with this is that users cannot change it if it does not fit their use case well. We should move it to ByteBufAllocator and so let users implement their own, either by writing their own ByteBufAllocator or by overriding the default implementation in one of our provided ByteBufAllocators. Modifications: Move calculateNewCapacity(...) to ByteBufAllocator and move the implementation (which was part of AbstractByteBuf) to AbstractByteBufAllocator. Result: Users can now override the default calculation algorithm when needed.	2014-06-17 09:35:45 +02:00
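To make the hook concrete, here is a sketch of a doubling-then-linear policy with the calculateNewCapacity(int minNewCapacity, int maxCapacity) shape named above. The 4 MiB threshold and the 64-byte starting size reflect our recollection of later Netty 4.1 releases and should be treated as assumptions; CapacityPolicy is a hypothetical class:

```java
/** Hypothetical growth policy: double below a threshold, grow linearly above it. */
final class CapacityPolicy {
    static final int THRESHOLD = 4 * 1024 * 1024; // assumed step size: 4 MiB

    private CapacityPolicy() { }

    static int calculateNewCapacity(int minNewCapacity, int maxCapacity) {
        if (minNewCapacity < 0 || minNewCapacity > maxCapacity) {
            throw new IllegalArgumentException(
                    "minNewCapacity: " + minNewCapacity + " (maxCapacity: " + maxCapacity + ')');
        }
        if (minNewCapacity == THRESHOLD) {
            return THRESHOLD;
        }
        if (minNewCapacity > THRESHOLD) {
            // Above the threshold, grow in THRESHOLD-sized steps instead of doubling.
            int newCapacity = minNewCapacity / THRESHOLD * THRESHOLD;
            return newCapacity > maxCapacity - THRESHOLD ? maxCapacity : newCapacity + THRESHOLD;
        }
        // Below the threshold, double from 64 until the request fits.
        int newCapacity = 64;
        while (newCapacity < minNewCapacity) {
            newCapacity <<= 1;
        }
        return Math.min(newCapacity, maxCapacity);
    }
}
```

Doubling keeps small buffers from reallocating on every write, while the linear phase avoids reserving up to twice a very large buffer's size; an application with different trade-offs is exactly the case this commit makes overridable.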
Norman Maurer	066f95d047	[#2573] UnpooledUnsafeDirectByteBuf.setBytes(int,ByteBuf,int,int) fails to use fast-path when src has array Motivation: UnpooledUnsafeDirectByteBuf.setBytes(int,ByteBuf,int,int) fails to use the fast path when src uses an array as backing storage. This is because the if/else checks the wrong ByteBuf. Modifications: - Check the correct ByteBuf for an array as backing storage - Also eliminate an unnecessary check in UnpooledDirectByteBuf which always fails anyway Result: Faster setBytes(...) when the src ByteBuf is backed by an array. No more IndexOutOfBoundsException or data corruption.	2014-06-16 11:11:41 +02:00
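The fix amounts to choosing the copy strategy from the source buffer's storage, not the destination's. A sketch with java.nio.ByteBuffer as the source and a heap byte[] as the destination; the real UnpooledUnsafeDirectByteBuf writes into off-heap memory, which is not modelled here, and FastPathCopy is a hypothetical class:

```java
import java.nio.ByteBuffer;

/** Hypothetical copy helper that picks its path from the source's backing storage. */
final class FastPathCopy {
    private FastPathCopy() { }

    /** Copies length bytes from src (absolute srcIndex) into dst at dstIndex. */
    static void setBytes(byte[] dst, int dstIndex, ByteBuffer src, int srcIndex, int length) {
        if (src.hasArray()) {
            // Fast path: the *source* exposes an array, so one bulk arraycopy suffices.
            System.arraycopy(src.array(), src.arrayOffset() + srcIndex, dst, dstIndex, length);
        } else {
            // Slow path: direct or read-only source, copy via absolute gets.
            for (int i = 0; i < length; i++) {
                dst[dstIndex + i] = src.get(srcIndex + i);
            }
        }
    }
}
```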
belliottsmith	2a2a21ec59	Introduce FastThreadLocal which uses an EnumMap and a predefined fixed set of possible thread locals Motivation: Provide a faster ThreadLocal implementation Modification: Add a "FastThreadLocal" which uses an EnumMap and a predefined fixed set of possible thread locals (all of the static instances created by netty) that is around 10-20% faster than standard ThreadLocal in my benchmarks (and can be seen having an effect in the direct PooledByteBufAllocator benchmark that uses the DEFAULT ByteBufAllocator which uses this FastThreadLocal, as opposed to normal instantiations that do not, and in the new RecyclableArrayList benchmark); Result: Improved performance	2014-06-13 10:56:18 +02:00
Norman Maurer	61dbc353ca	[#2436] UnsafeByteBuf implementation should only invert bytes if ByteOrder differs from native ByteOrder Motivation: Our UnsafeByteBuf implementation always inverts bytes when the native ByteOrder is LITTLE_ENDIAN (this is true on Intel), even when the user calls order(ByteOrder.LITTLE_ENDIAN). This is not optimal for performance, as the user should be able to set the ByteOrder to LITTLE_ENDIAN and so write bytes without the extra inversion. Modification: - Introduce a new special SwappedByteBuf (called UnsafeDirectSwappedByteBuf) that is used by all the Unsafe*ByteBuf implementations and allows writing without inverting the bytes. - Add benchmark - Upgrade jmh to 0.8 Result: Users are able to get maximum performance even on servers that have ByteOrder.LITTLE_ENDIAN as their native ByteOrder.	2014-06-05 10:59:22 +02:00
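The decision this commit optimises reduces to one comparison: swap with Integer.reverseBytes only when the requested order differs from ByteOrder.nativeOrder(). A sketch in which a native-order heap ByteBuffer stands in for the raw memory that the Unsafe-based buffers store into; OrderAwareWriter is a hypothetical class:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical writer that swaps bytes only when the requested order is not native. */
final class OrderAwareWriter {
    private OrderAwareWriter() { }

    /** nativeBuf must use ByteOrder.nativeOrder(), mirroring a raw native-order memory store. */
    static void setInt(ByteBuffer nativeBuf, int index, int value, ByteOrder order) {
        int raw = order == ByteOrder.nativeOrder() ? value : Integer.reverseBytes(value);
        nativeBuf.putInt(index, raw);
    }
}
```

On a little-endian CPU a LITTLE_ENDIAN write takes the no-swap branch, which is the saving the commit describes; on a big-endian CPU the roles are simply reversed.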
Trustin Lee	7d9374a582	Use Java 5 foreach for arrays for brevity at no cost	2014-06-02 18:25:25 +09:00
Trustin Lee	af4c30fa56	Remove the deprecated constructor	2014-06-02 18:24:19 +09:00
Trustin Lee	e79ca269b8	Introduce ThreadDeathWatcher Motivation: PooledByteBufAllocator's thread local cache and ReferenceCountUtil.releaseLater() are in need of a way to run an arbitrary logic when a certain thread is terminated. Modifications: - Add ThreadDeathWatcher, which spawns a low-priority daemon thread that watches a list of threads periodically (every second) and invokes the specified tasks when the associated threads are not alive anymore - Start-stop logic based on CAS operation proposed by @tea-dragon - Add debug-level log messages to see if ThreadDeathWatcher works Result: - Fixes #2519 because we don't use GlobalEventExecutor anymore - Cleaner code	2014-06-02 18:23:23 +09:00
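ThreadDeathWatcher as described polls the watched threads about once per second and runs each task once its thread is no longer alive. A minimal sketch of that polling loop; DeathWatcher is a hypothetical name, it assumes each watched thread has already been started (Thread.isAlive() is also false before start()), and it omits the CAS-based start/stop logic mentioned in the message:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical watcher that runs a task once the associated thread has terminated. */
final class DeathWatcher {
    private static final class Entry {
        final Thread thread;
        final Runnable task;

        Entry(Thread thread, Runnable task) {
            this.thread = thread;
            this.task = task;
        }
    }

    private final List<Entry> watched = new ArrayList<>();

    synchronized void watch(Thread thread, Runnable task) {
        watched.add(new Entry(thread, task));
    }

    /** One polling pass: runs, then forgets, the tasks of threads that are no longer alive. */
    void poll() {
        List<Runnable> due = new ArrayList<>();
        synchronized (this) {
            for (Iterator<Entry> it = watched.iterator(); it.hasNext();) {
                Entry e = it.next();
                if (!e.thread.isAlive()) {
                    due.add(e.task);
                    it.remove();
                }
            }
        }
        for (Runnable task : due) {
            task.run(); // run outside the lock so slow tasks don't block watch()
        }
    }

    /** Starts a low-priority daemon thread that polls once per second until interrupted. */
    Thread start() {
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                poll();
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException ex) {
                    return;
                }
            }
        }, "death-watcher");
        t.setDaemon(true);
        t.setPriority(Thread.MIN_PRIORITY);
        t.start();
        return t;
    }
}
```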
Trustin Lee	ea3dac0753	Do not use a pseudo random for tree traversal Motivation: If we make allocateRun/SubpageSimple() always try the left node first and make allocateRun/Subpage() always try the right node first, it is more likely that allocateRun/Subpage() will find a node with ST_UNUSED sooner. Modifications: - Make allocateRunSimple() and allocateSubpageSimple() always try the left node first. - Make allocateRun() and allocateSubpage() always try the right node first. - Remove random Result: We get the same performance without using random numbers.	2014-05-30 11:24:16 +09:00
Trustin Lee	e5ed69241b	Optimize PooledByteBufAllocator Motivation: We still have a room for improvement in PoolChunk.allocateRun() and Subpage.allocate(). Modifications: - Unroll the recursion in PoolChunk.allocateRun() - Subpage.allocate() makes use of the 'nextAvail' value set by previous free(). Result: - PoolChunk.allocateRun() optimization yields 10%+ improvements in allocation throughput for non-subpage allocations. - Subpage.allocate() optimization makes the subpage allocations for tiny buffers as fast as non-tiny buffers even when the pageSize is huge (e.g. 1048576) because it doesn't need to perform a linear search in most cases.	2014-05-30 10:51:21 +09:00
Jake Luciani	d547b5d51d	Fix capacity check bug affecting offheap buffers	2014-05-13 07:25:15 +02:00
Trustin Lee	db3709e652	Synchronized between 4.1 and master Motivation: 4 and 5 diverged a long time ago and we recently reverted some of the early commits in master. We must make sure 4.1 and master are not very different now. Modification: Fix found differences Result: 4.1 and master got closer.	2014-04-25 00:38:02 +09:00
ian	15d11289b0	Fix error that causes (up to) double memory usage Motivation: PoolArena's 'normalizeCapacity' function was micro-optimized some time ago to remove a while loop. However, there was a change of behavior in the function as a result. Capacities passed into it that are already powers of 2 (and >= 512) are doubled in size. So if I ask for a buffer with a capacity of 1024, I will get back one that actually uses 2048 bytes (stored in maxLength). Aligning to powers of two for book keeping ease is reasonable, and if someone tries to expand a buffer, you might as well use some of the previously wasted space. However, since this distinction between 'easily expanded' and 'costly to expand' space is not supported at all by the APIs, I cannot imagine this change to doubling is desirable or intentional. This is especially costly when using composite buffers. They frequently allocate components with a capacity that is a power of 2, and they never attempt to expand components themselves. The end result is that heavy use of pool-backed composite buffers wastes almost half of the memory pool (the smaller / initial components are <512 and so are not affected by the off-by-one bug). Modifications: Although I find it difficult to believe that such an optimization is really helpful, I left it in and fixed the off-by-one issue by decrementing the value at the start. I also added a simple test to both attempt to verify that the decrement fixes the issue without introducing any other change, and to make it easy for a reviewer to test the existing behavior. PoolArena does not seem to have much testing or testability support though so the test is kind of a hack and will break for unrelated changes. I suggest either removing it or factoring out the single non-static portion of normalizeCapacity so that the fragile dummy PoolArena is not required. 
Result: Pooled allocators will allocate less resources to the highly inefficient and undocumented buffer section between length and maxLength. Composite buffers of non-trivial size that are backed by pooled allocators will use about half as much memory.	2014-04-15 07:03:13 +02:00
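The arithmetic behind the fix: the shift-and-OR cascade copies the highest set bit into every lower bit, so adding one yields the next power of two strictly greater than its input. Decrementing first makes exact powers of two map to themselves. A sketch covering only the power-of-two branch the message discusses; NormalizeCapacity is hypothetical and assumes 1 <= reqCapacity <= 2^30:

```java
/** Hypothetical demonstration of the off-by-one in power-of-two rounding. */
final class NormalizeCapacity {
    private NormalizeCapacity() { }

    /** Sets every bit below the highest set bit of n. */
    private static int smear(int n) {
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return n;
    }

    /** Fixed version: rounds up to a power of two, leaving exact powers of two unchanged. */
    static int normalize(int reqCapacity) {
        return smear(reqCapacity - 1) + 1;
    }

    /** Pre-fix behaviour, for comparison: exact powers of two are doubled. */
    static int normalizeWithoutDecrement(int reqCapacity) {
        return smear(reqCapacity) + 1;
    }
}
```

For example, normalizeWithoutDecrement(1024) yields 2048 while normalize(1024) yields 1024, matching the doubling described in the message.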
Norman Maurer	ceffa82d0d	[#2370] Periodically check for not alive Threads and free up their ThreadPoolCache Motivation: At the moment we create a new ThreadPoolCache whenever a Thread tries to either allocate or release something on the PooledByteBufAllocator. When something is released, we then put it in its ThreadPoolCache. The problem is that we never check whether a Thread is still alive, so we may end up with memory that is never freed again if a user creates many short-lived Threads that use the PooledByteBufAllocator. Modifications: Periodically check whether the Thread that has a ThreadPoolCache assigned is still alive and, if not, free the cache. Result: Memory is freed up correctly even for short-lived Threads.	2014-04-09 11:45:11 +02:00


459 Commits