
404 Commits

Author SHA1 Message Date
Pavan Kumar
36305d7dce Improve the allocation algorithm in PoolChunk
Motivation:

Depth-first search is not always efficient for buddy allocation.

Modification:

Employ a new, faster search algorithm with a different memoryMap layout.

Result:

With the thread-local cache disabled, we see a large performance
improvement (roughly 17x for 8192-byte allocations in the benchmark
below), especially when the allocation is as small as the page size,
which previously had the largest search space:

-- master head --
Benchmark                (size) Mode    Score  Error Units
pooledDirectAllocAndFree  8192 thrpt  215.392  1.565 ops/ms
pooledDirectAllocAndFree 16384 thrpt  594.625  2.154 ops/ms
pooledDirectAllocAndFree 65536 thrpt 1221.520 18.965 ops/ms
pooledHeapAllocAndFree    8192 thrpt  217.175  1.653 ops/ms
pooledHeapAllocAndFree   16384 thrpt  587.250 14.827 ops/ms
pooledHeapAllocAndFree   65536 thrpt 1217.023 44.963 ops/ms

-- changes --
Benchmark                (size) Mode    Score  Error Units
pooledDirectAllocAndFree  8192 thrpt 3656.744 94.093 ops/ms
pooledDirectAllocAndFree 16384 thrpt 4087.152 22.921 ops/ms
pooledDirectAllocAndFree 65536 thrpt 4058.814 29.276 ops/ms
pooledHeapAllocAndFree    8192 thrpt 3640.355 44.418 ops/ms
pooledHeapAllocAndFree   16384 thrpt 4030.206 24.365 ops/ms
pooledHeapAllocAndFree   65536 thrpt 4103.991 70.991 ops/ms
2014-06-21 13:20:03 +09:00
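The change above replaces depth-first search over the buddy tree with a direct descent guided by a new memoryMap layout. The sketch below illustrates the general idea under stated assumptions: the tree is stored heap-style in an array, and each entry records the depth of the shallowest completely free node in its subtree, so a search never backtracks. BuddyTree, allocateNode() and free() are hypothetical names; this is a simplified illustration, not the PoolChunk source.

```java
// Simplified buddy allocator (hypothetical): nodes are stored heap-style
// (children of node id are 2*id and 2*id+1) and memoryMap[id] holds the depth
// of the shallowest completely free node in id's subtree. A request for
// depth d can therefore walk straight down from the root without backtracking.
final class BuddyTree {
    private final byte[] memoryMap; // shallowest free depth per subtree
    private final byte[] depthMap;  // fixed depth of every node
    private final byte unusable;    // maxDepth + 1: subtree has nothing free

    BuddyTree(int maxDepth) {
        unusable = (byte) (maxDepth + 1);
        memoryMap = new byte[1 << (maxDepth + 1)];
        depthMap = new byte[memoryMap.length];
        for (int d = 0, id = 1; d <= maxDepth; d++) {
            for (int i = 0; i < (1 << d); i++, id++) {
                memoryMap[id] = (byte) d;
                depthMap[id] = (byte) d;
            }
        }
    }

    /** Allocates a node at depth d and returns its id, or -1 if none is free. */
    int allocateNode(int d) {
        if (memoryMap[1] > d) {
            return -1; // even the root cannot offer a free node that large
        }
        int id = 1;
        int initial = -(1 << d); // (id & initial) == 0 while id is above depth d
        while ((id & initial) == 0) {
            id <<= 1;                // go to the left child
            if (memoryMap[id] > d) {
                id ^= 1;             // left subtree cannot serve d: use its buddy
            }
        }
        memoryMap[id] = unusable;
        updateParents(id);
        return id;
    }

    /** Returns a previously allocated node to the tree. */
    void free(int id) {
        memoryMap[id] = depthMap[id];
        updateParents(id);
    }

    private void updateParents(int id) {
        while (id > 1) {
            int parent = id >>> 1;
            byte left = memoryMap[parent << 1];
            byte right = memoryMap[(parent << 1) + 1];
            byte childDepth = (byte) (depthMap[parent] + 1);
            // Two fully free buddies merge back into one free parent.
            memoryMap[parent] = (left == childDepth && right == childDepth)
                    ? depthMap[parent]
                    : (byte) Math.min(left, right);
            id = parent;
        }
    }
}
```

For a chunk of 2^maxDepth pages, a run of n pages (n a power of two) corresponds to depth maxDepth - log2(n); for example, new BuddyTree(2) models four pages, where depth 1 is a half-chunk and depth 2 a single page.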
Norman Maurer
c44b5f54fb Remove System.out.println(...) debug messages 2014-06-20 19:42:47 +02:00
Norman Maurer
4e36c0ae96 [#2580] [#2587] Fix buffer corruption regression when ByteBuf.order(LITTLE_ENDIAN) is used
Motivation:

To improve the speed of ByteBuf with order LITTLE_ENDIAN where the native order is also LITTLE_ENDIAN (Intel), we introduced a new special SwappedByteBuf in commit 4ad3984c8b. Unfortunately, that commit has a flaw: it does not correctly handle the case when a ByteBuf expands. The memoryAddress was cached and never updated, even if the underlying buffer expanded. This can lead to corrupt data or even crash the JVM with a SEGFAULT if you are unlucky.

Modification:

Always look up the actual memoryAddress of the wrapped ByteBuf.

Result:

No more data-corruption for ByteBuf with order LITTLE_ENDIAN and no JVM crashes.
2014-06-20 18:19:45 +02:00
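The failure mode described above can be reproduced in miniature. In the sketch below, a hypothetical GrowableBuffer replaces its backing array when it expands, standing in for a direct buffer whose memory address changes on reallocation. A view that caches the storage reference keeps reading the old storage, while a view that looks it up on every access sees the new one. With raw memory addresses the stale access can touch freed memory, which is why the real bug could crash the JVM; with Java arrays it only yields stale data.

```java
import java.util.Arrays;

// Hypothetical growable buffer: expanding it replaces the backing array,
// much like reallocating native memory changes a direct buffer's address.
final class GrowableBuffer {
    byte[] storage = new byte[4];

    void ensureCapacity(int minCapacity) {
        if (minCapacity > storage.length) {
            storage = Arrays.copyOf(storage, Math.max(minCapacity, storage.length * 2));
        }
    }
}

// Captures the storage once, like the cached memoryAddress (the bug).
final class CachingView {
    private final byte[] cached;

    CachingView(GrowableBuffer buf) {
        this.cached = buf.storage;
    }

    byte get(int index) {
        return cached[index];
    }
}

// Re-reads the current storage on every access (the fix).
final class LookupView {
    private final GrowableBuffer buf;

    LookupView(GrowableBuffer buf) {
        this.buf = buf;
    }

    byte get(int index) {
        return buf.storage[index];
    }
}
```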
Trustin Lee
760bbc7ea6 Refactor FastThreadLocal to simplify TLV management
Motivation:

When Netty runs in a managed environment such as a web application server,
Netty needs to provide an explicit way to remove the thread-local
variables it created to prevent class loader leaks.

FastThreadLocal uses different execution paths for storing a
thread-local variable depending on the type of the current thread.
It increases the complexity of thread-local removal.

Modifications:

- Moved FastThreadLocal and FastThreadLocalThread out of the internal
  package so that a user can use them.
- FastThreadLocal now keeps track of all thread local variables it has
  initialized, and calling FastThreadLocal.removeAll() will remove all
  thread-local variables of the caller thread.
- Added FastThreadLocal.size() for diagnostics and tests
- Introduce InternalThreadLocalMap which is a mixture of hard-wired
  thread local variable fields and extensible indexed variables
- FastThreadLocal now uses InternalThreadLocalMap to implement a
  thread-local variable.
- Added ThreadDeathWatcher.unwatch() so that PooledByteBufAllocator
  tells it to stop watching when its thread-local cache has been freed
  by FastThreadLocal.removeAll().
- Added FastThreadLocalTest to ensure that removeAll() works
- Added microbenchmark for FastThreadLocal and JDK ThreadLocal
- Upgraded to JMH 0.9

Result:

- A user can remove all thread-local variables Netty created, as long as
  he or she did not exit from the current thread. (Note that there's no
  way to remove a thread-local variable from outside of the thread.)
- FastThreadLocal exposes more useful operations such as isSet() because
  we always implement a thread local variable via InternalThreadLocalMap
  instead of falling back to JDK ThreadLocal.
- FastThreadLocalBenchmark shows that this change improves the
  performance of FastThreadLocal even more.
2014-06-19 21:17:46 +09:00
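The indexed-variable design described above can be sketched as follows. It assumes each thread-local claims a unique index when constructed and each thread owns an Object[] of slots, which is what makes removeAll(), size() and isSet() cheap. The class name IndexedThreadLocal and its storage in a JDK ThreadLocal are assumptions made to keep the example self-contained; this is not how Netty's InternalThreadLocalMap is built.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical per-thread map of indexed slots. Each IndexedThreadLocal
// claims a fixed index once, so get()/set() are plain array accesses.
final class IndexedThreadLocal<V> {
    private static final AtomicInteger NEXT_INDEX = new AtomicInteger();
    private static final Object UNSET = new Object();
    // One slot array per thread. A real implementation could keep it in a
    // field of a custom Thread subclass instead of a JDK ThreadLocal.
    private static final ThreadLocal<Object[]> SLOTS = ThreadLocal.withInitial(() -> new Object[0]);

    private final int index = NEXT_INDEX.getAndIncrement();
    private final Supplier<V> initial;

    IndexedThreadLocal(Supplier<V> initial) {
        this.initial = initial;
    }

    // Grows the calling thread's slot array so that it has minLength slots.
    private static Object[] slots(int minLength) {
        Object[] s = SLOTS.get();
        if (s.length < minLength) {
            int oldLength = s.length;
            s = Arrays.copyOf(s, Math.max(minLength, oldLength * 2));
            Arrays.fill(s, oldLength, s.length, UNSET);
            SLOTS.set(s);
        }
        return s;
    }

    @SuppressWarnings("unchecked")
    V get() {
        Object[] s = slots(index + 1);
        if (s[index] == UNSET) {
            s[index] = initial.get();
        }
        return (V) s[index];
    }

    void set(V value) {
        slots(index + 1)[index] = value;
    }

    boolean isSet() {
        Object[] s = SLOTS.get();
        return index < s.length && s[index] != UNSET;
    }

    /** Number of variables initialized on the calling thread. */
    static int size() {
        int n = 0;
        for (Object o : SLOTS.get()) {
            if (o != UNSET) {
                n++;
            }
        }
        return n;
    }

    /** Drops every variable of the calling thread at once. */
    static void removeAll() {
        SLOTS.remove();
    }
}
```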
Norman Maurer
2c7ecb444d Move calculateNewCapacity(...) to ByteBufAllocator
Motivation:

Currently the algorithm that calculates the new capacity of a ByteBuf is implemented in AbstractByteBuf. The problem is that a user cannot change it if it does not fit their use case. It is better to move it to ByteBufAllocator so that users can supply their own, either by writing their own ByteBufAllocator or by overriding the default implementation in one of our provided ByteBufAllocators.

Modifications:

Move calculateNewCapacity(...) to ByteBufAllocator and move the implementation (which was part of AbstractByteBuf) to AbstractByteBufAllocator.

Result:

The user can now override the default calculation algorithm when needed.
2014-06-17 09:32:46 +02:00
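As an illustration of a growth policy a user might keep or override, the sketch below doubles small capacities starting from 64 bytes and grows large ones in 4 MiB steps, so that expanding a big buffer does not double its footprint. The class name, the two int parameters, and the constants are assumptions for this example and are not taken from the commit.

```java
// Hypothetical capacity-growth policy: double small buffers, then grow
// large ones in fixed steps so expansion does not overshoot by too much.
final class CapacityPolicy {
    static final int THRESHOLD = 4 * 1024 * 1024; // 4 MiB step size (assumed)

    static int calculateNewCapacity(int minNewCapacity, int maxCapacity) {
        if (minNewCapacity < 0 || minNewCapacity > maxCapacity) {
            throw new IllegalArgumentException("minNewCapacity: " + minNewCapacity
                    + " (expected: 0-" + maxCapacity + ')');
        }
        if (minNewCapacity == THRESHOLD) {
            return THRESHOLD;
        }
        if (minNewCapacity > THRESHOLD) {
            // Round down to a multiple of the step, then add one step.
            int newCapacity = minNewCapacity / THRESHOLD * THRESHOLD;
            return newCapacity > maxCapacity - THRESHOLD ? maxCapacity : newCapacity + THRESHOLD;
        }
        // Below the threshold: start at 64 and double until large enough.
        int newCapacity = 64;
        while (newCapacity < minNewCapacity) {
            newCapacity <<= 1;
        }
        return Math.min(newCapacity, maxCapacity);
    }
}
```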
Norman Maurer
a2ec2a1e3a [#2573] UnpooledUnsafeDirectByteBuf.setBytes(int,ByteBuf,int,int) fails to use fast-path when src has array
Motivation:

UnpooledUnsafeDirectByteBuf.setBytes(int,ByteBuf,int,int) fails to use the fast-path when src uses an array as backing storage. This is because the if/else checks the wrong ByteBuf.

Modifications:

- Use the correct ByteBuf when checking for an array as backing storage
- Also eliminate an unnecessary check in UnpooledDirectByteBuf which always fails anyway

Result:

Faster setBytes(...) when src ByteBuf is backed by an array.

No more IndexOutOfBoundsException or data-corruption.
2014-06-16 11:11:17 +02:00
belliottsmith
7d37af5dfb Introduce FastThreadLocal which uses an EnumMap and a predefined fixed set of possible thread locals
Motivation:
Provide a faster ThreadLocal implementation

Modification:
Add a "FastThreadLocal" which uses an EnumMap and a predefined fixed set of possible thread locals (all of the static instances created by netty) that is around 10-20% faster than standard ThreadLocal in my benchmarks. The effect can be seen in the direct PooledByteBufAllocator benchmark that uses the DEFAULT ByteBufAllocator (which uses this FastThreadLocal, as opposed to normal instantiations that do not) and in the new RecyclableArrayList benchmark.

Result:
Improved performance
2014-06-13 11:02:16 +02:00
Norman Maurer
405d573715 [#2436] Unsafe*ByteBuf implementation should only invert bytes if ByteOrder differ from native ByteOrder
Motivation:
Our Unsafe*ByteBuf implementation always inverts the bytes when the native ByteOrder is LITTLE_ENDIAN (this is true on Intel), even when the user calls order(ByteOrder.LITTLE_ENDIAN). This hurts performance: the user should be able to set the ByteOrder to LITTLE_ENDIAN and write bytes without the extra inversion.

Modification:
- Introduce a new special SwappedByteBuf (called UnsafeDirectSwappedByteBuf) that is used by all the Unsafe*ByteBuf implementations and allows writing without inverting the bytes.
- Add benchmark
- Upgrade jmh to 0.8

Result:
Users can get maximum performance even on servers whose native ByteOrder is ByteOrder.LITTLE_ENDIAN.
2014-06-05 10:59:03 +02:00
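The idea of inverting bytes only when the requested order differs from the native one can be shown with the JDK alone. In this hypothetical helper, the raw store always uses the platform's native order (as a raw memory write would), and Integer.reverseBytes is applied only when the caller asked for the other order, so a LITTLE_ENDIAN request on a little-endian machine needs no swap.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical writer: swaps bytes only when the requested order differs
// from the platform's native order.
final class OrderAwareWriter {
    static byte[] writeInt(int value, ByteOrder requested) {
        int stored = requested == ByteOrder.nativeOrder() ? value : Integer.reverseBytes(value);
        // The raw store always uses native order, like a raw memory write.
        return ByteBuffer.allocate(4).order(ByteOrder.nativeOrder()).putInt(stored).array();
    }
}
```

The resulting bytes are the same on any platform; only the number of swaps performed depends on the native order.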
Trustin Lee
ddb6441212 Use Java 5 foreach for arrays for brevity at no cost 2014-06-02 18:25:49 +09:00
Trustin Lee
642f4bb3b1 Introduce ThreadDeathWatcher
Motivation:

PooledByteBufAllocator's thread local cache and
ReferenceCountUtil.releaseLater() need a way to run arbitrary
logic when a certain thread is terminated.

Modifications:

- Add ThreadDeathWatcher, which spawns a low-priority daemon thread
  that watches a list of threads periodically (every second) and
  invokes the specified tasks when the associated threads are not alive
  anymore
  - Start-stop logic based on CAS operation proposed by @tea-dragon
- Add debug-level log messages to see if ThreadDeathWatcher works

Result:

- Fixes #2519 because we don't use GlobalEventExecutor anymore
- Cleaner code
2014-06-02 18:14:23 +09:00
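A polling watcher of the kind described can be sketched as below. DeathWatcher, watch(), checkOnce() and start() are hypothetical names; the sketch guards its list with a lock instead of the CAS-based start/stop logic the commit mentions, and exposes checkOnce() so a single polling pass can be run directly.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical watcher: a daemon thread polls registered threads and runs
// a cleanup task once the watched thread is no longer alive.
final class DeathWatcher {
    private static final class Entry {
        final Thread thread;
        final Runnable task;

        Entry(Thread thread, Runnable task) {
            this.thread = thread;
            this.task = task;
        }
    }

    private final List<Entry> watched = new ArrayList<>();

    synchronized void watch(Thread thread, Runnable task) {
        watched.add(new Entry(thread, task));
    }

    /** One polling pass: runs and forgets the tasks of dead threads. */
    void checkOnce() {
        List<Runnable> due = new ArrayList<>();
        synchronized (this) {
            for (Iterator<Entry> it = watched.iterator(); it.hasNext();) {
                Entry e = it.next();
                if (!e.thread.isAlive()) {
                    due.add(e.task);
                    it.remove();
                }
            }
        }
        due.forEach(Runnable::run); // run tasks outside the lock
    }

    /** Starts a low-priority daemon thread that polls once per second. */
    Thread start() {
        Thread poller = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                checkOnce();
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    return;
                }
            }
        }, "death-watcher");
        poller.setDaemon(true);
        poller.setPriority(Thread.MIN_PRIORITY);
        poller.start();
        return poller;
    }
}
```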
Trustin Lee
ea0eb4fdab Do not use a pseudo random for tree traversal
Motivation:

If we make allocateRun/SubpageSimple() always try the left node first and make allocateRun/Subpage() always try the right node first, it is more likely that allocateRun/Subpage() will find a node with ST_UNUSED sooner.

Modifications:

- Make allocateRunSimple() and allocateSubpageSimple() always try the left node first.
- Make allocateRun() and allocateSubpage() always try the right node first.
- Remove the use of random numbers

Result:

We get the same performance without using random numbers.
2014-05-30 11:23:46 +09:00
Trustin Lee
1d0a79e11e Optimize PooledByteBufAllocator
Motivation:

There is still room for improvement in PoolChunk.allocateRun() and
Subpage.allocate().

Modifications:

- Unroll the recursion in PoolChunk.allocateRun()
- Subpage.allocate() makes use of the 'nextAvail' value set by previous
  free().

Result:

- PoolChunk.allocateRun() optimization yields 10%+ improvements in
  allocation throughput for non-subpage allocations.
- Subpage.allocate() optimization makes the subpage allocations for
  tiny buffers as fast as non-tiny buffers even when the pageSize is
  huge (e.g. 1048576) because it doesn't need to perform a linear search
  in most cases.
2014-05-30 10:50:23 +09:00
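The nextAvail hint can be illustrated with a bitmap-backed subpage. In this sketch (the BitmapSubpage class is hypothetical), each bit marks one element as used; free() records the released index so the following allocate() reuses it without scanning, and allocate() falls back to a linear search only when no hint is set.

```java
// Hypothetical subpage: a page split into equal elements, tracked by a
// bitmap. free() remembers the slot it released in nextAvail, so the next
// allocate() can usually skip the linear bitmap scan entirely.
final class BitmapSubpage {
    private final long[] bitmap;
    private final int maxElems;
    private int numAvail;
    private int nextAvail = 0; // -1 when no hint is available

    BitmapSubpage(int maxElems) {
        this.maxElems = maxElems;
        this.numAvail = maxElems;
        this.bitmap = new long[(maxElems + 63) >>> 6];
    }

    /** Returns an element index, or -1 if the subpage is full. */
    int allocate() {
        if (numAvail == 0) {
            return -1;
        }
        int idx = nextAvail;
        if (idx >= 0) {
            nextAvail = -1;     // consume the hint
        } else {
            idx = findFree();   // fall back to a linear search
        }
        bitmap[idx >>> 6] |= 1L << (idx & 63);
        numAvail--;
        return idx;
    }

    void free(int idx) {
        bitmap[idx >>> 6] &= ~(1L << (idx & 63));
        numAvail++;
        nextAvail = idx;        // hint for the next allocate()
    }

    private int findFree() {
        for (int i = 0; i < bitmap.length; i++) {
            long bits = bitmap[i];
            if (~bits != 0) {
                int idx = (i << 6) + Long.numberOfTrailingZeros(~bits);
                if (idx < maxElems) {
                    return idx;
                }
            }
        }
        return -1;
    }
}
```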
Jake Luciani
856c89dd70 Fix capacity check bug affecting offheap buffers 2014-05-13 07:25:26 +02:00
Trustin Lee
d2614cfc01 Synchronized between 4.1 and master
Motivation:

4 and 5 diverged a long time ago and we recently reverted some of the
early commits in master. We must make sure 4.1 and master are not very
different now.

Modification:

Fix the differences found

Result:

4.1 and master got closer.
2014-04-25 00:36:01 +09:00
ian
db5e729853 Fix error that causes (up to) double memory usage
Motivation:

PoolArena's 'normalizeCapacity' function was micro-optimized some
time ago to remove a while loop. However, there was a change of
behavior in the function as a result. Capacities passed into it
that are already powers of 2 (and >= 512) are doubled in size. So
if I ask for a buffer with a capacity of 1024, I will get back one
that actually uses 2048 bytes (stored in maxLength).

Aligning to powers of two for bookkeeping ease is reasonable,
and if someone tries to expand a buffer, you might as well use some
of the previously wasted space. However, since this distinction
between 'easily expanded' and 'costly to expand' space is not
supported at all by the APIs, I cannot imagine this change to
doubling is desirable or intentional.

This is especially costly when using composite buffers. They
frequently allocate components with a capacity that is a power of
2, and they never attempt to expand components themselves. The end
result is that heavy use of pool-backed composite buffers wastes
almost half of the memory pool (the smaller / initial components are
<512 and so are not affected by the off-by-one bug).

Modifications:

Although I find it difficult to believe that such an optimization
is really helpful, I left it in and fixed the off-by-one issue by
decrementing the value at the start.

I also added a simple test to both attempt to verify that the
decrement fixes the issue without introducing any other change, and
to make it easy for a reviewer to test the existing behavior. PoolArena
does not seem to have much testing or testability support though so
the test is kind of a hack and will break for unrelated changes. I
suggest either removing it or factoring out the single non-static
portion of normalizeCapacity so that the fragile dummy PoolArena is
not required.

Result:

Pooled allocators will allocate less resources to the highly
inefficient and undocumented buffer section between length and
maxLength.

Composite buffers of non-trivial size that are backed by pooled
allocators will use about half as much memory.
2014-04-15 06:57:19 +02:00
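The off-by-one can be seen in the classic bit-smearing round-up. In the sketch below (hypothetical names; it ignores PoolArena's separate handling of capacities below 512), decrementing first maps an exact power of two to itself, while omitting the decrement doubles it, matching the 1024 to 2048 example in the message. Both methods assume 1 <= reqCapacity <= 2^30.

```java
// Rounds a request up to the next power of two by smearing the highest set
// bit into every lower bit, then adding one.
final class CapacityNormalizer {
    static int roundUpToPowerOfTwo(int reqCapacity) {
        int n = reqCapacity - 1; // the fix: keeps exact powers of two unchanged
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return n + 1;
    }

    /** The variant without the decrement, shown for comparison. */
    static int roundUpWithoutDecrement(int reqCapacity) {
        int n = reqCapacity; // an exact power of two gets smeared and doubled
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return n + 1;
    }
}
```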
Norman Maurer
7f99f0bb32 [#2370] Periodically check for not alive Threads and free up their ThreadPoolCache
Motivation:
At the moment we create a new ThreadPoolCache whenever a Thread tries to either allocate or release something on the PooledByteBufAllocator. When something is released, we put it in that Thread's ThreadPoolCache. The problem is that we never check whether a Thread is still alive, so we may end up with memory that is never freed again if a user creates many short-lived Threads that use the PooledByteBufAllocator.

Modifications:
Periodically check whether each Thread that has a ThreadPoolCache assigned is still alive, and if not, free its cache.

Result:
Memory is freed up correctly even for short-lived Threads.
2014-04-09 12:03:22 +02:00
Norman Maurer
241d24cbfa Implement Thread caches for pooled buffers to minimize contention. This fixes [#2264] and [#808].
Motivation:
Remove the synchronization bottleneck in PoolArena and so speed up things

Modifications:

This implementation uses roughly the same techniques as outlined in the jemalloc paper and jemalloc
blogpost https://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919.

At the moment we only cache for "known" Threads (those that power EventExecutors) and not for others, to keep the overhead
minimal when we need to free up unused buffers in the cache and free up cached buffers once the Thread completes. Here
we use multi-level caches for tiny, small and normal allocations. Huge allocations are not cached at all to keep the
memory usage at a sane level. All the different cache configurations can be adjusted via system properties or, where it
makes sense, directly via the constructor.

Result:
Less contention, as most allocations can be served by the cache itself
2014-03-20 09:31:15 -07:00
Trustin Lee
ab72dd7303 Fix and simplify freeing a direct buffer / Fix Android support
Motivation:

6e8ba291cf introduced a regression in Android because Android does not have sun.nio.ch.DirectBuffer (see #2330). I also found that PlatformDependent0.freeDirectBuffer() and freeDirectBufferUnsafe() are pretty much the same after that commit, so the unsafe version should be removed.

Modifications:

- Do not use the pooled allocator on Android because it's too resource-hungry for Android devices.
- Merge PlatformDependent0.freeDirectBuffer() and freeDirectBufferUnsafe() into one method.
- Make the Unsafe unavailable when sun.nio.ch.DirectBuffer is unavailable. We could keep the Unsafe available and handle the sun.nio.ch.DirectBuffer case separately, but I don't want to complicate our code just because of that. All supported JDK versions have sun.nio.ch.DirectBuffer if the Unsafe is available.

Result:

Simpler code. Fixes Android support (#2330)
2014-03-20 11:12:15 +09:00
Jakob Buchgraber
c2d34649ff Bit tricks to check for and calculate power of two.
Motivation:
I was studying the code and thought this was simpler and easier to
understand.

Modifications:
Replaced the for loop and if conditions with a simple implementation.

Result:
Code is easier to understand.
2014-03-18 16:00:14 +09:00
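The bit tricks referred to here are standard identities; the helpers below are hypothetical and may not match the committed code. x & (x - 1) clears the lowest set bit, so it is zero exactly when a positive x has a single bit set, and the position of that bit (its base-2 logarithm) follows from Integer.numberOfLeadingZeros without a loop.

```java
// Loop-free power-of-two helpers.
final class PowerOfTwo {
    static boolean isPowerOfTwo(int x) {
        // Clearing the lowest set bit leaves zero only if there was one bit.
        return x > 0 && (x & (x - 1)) == 0;
    }

    /** log2 of a positive power of two. */
    static int log2(int powerOfTwo) {
        return 31 - Integer.numberOfLeadingZeros(powerOfTwo);
    }
}
```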
Bourne, Geoff
8cbd53978a Fix limit computation of NIO ByteBuffers obtained via ReadOnlyByteBufferBuf.nioBuffer
Motivation:

When starting with a read-only NIO buffer, wrapping it in a ByteBuf,
and then later retrieving a re-wrapped NIO buffer the limit was getting
too short.

Modifications:

Changed ReadOnlyByteBufferBuf.nioBuffer(int,int) to compute the
limit in the same manner as the internalNioBuffer method.

Result:

Round-trip conversion from NIO to ByteBuf to NIO will work reliably.
2014-03-14 08:10:29 +01:00
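A limit computation that works for any index sets the limit to the absolute end of the requested range, index + length, rather than to a value relative to the start. The helper below is a hypothetical sketch using only java.nio; it leaves the source buffer's own position and limit untouched, and duplicate() and slice() preserve the read-only flag.

```java
import java.nio.ByteBuffer;

// Returns a view of src covering [index, index + length), independent of
// src's current position and limit.
final class NioViews {
    static ByteBuffer view(ByteBuffer src, int index, int length) {
        ByteBuffer dup = src.duplicate(); // shares content, keeps read-only status
        dup.clear();                      // position = 0, limit = capacity
        dup.position(index);
        dup.limit(index + length);        // absolute end of the range
        return dup.slice();
    }
}
```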
Trustin Lee
d5e897bba2 Determine the default allocator from system property
- Add ByteBufAllocator.DEFAULT
- The default allocator is now 'pooled'
2014-02-14 13:04:48 -08:00
Norman Maurer
e7b800eb82 [#2187] Always do a volatile read on the refCnt 2014-02-07 09:37:31 +01:00
Trustin Lee
b4e3e09b76 Fix a bug that CompositeByteBuf.touch() does nothing 2014-02-06 21:00:05 -08:00
Norman Maurer
7a1a30f0ad Provide an optimized AtomicIntegerFieldUpdater, AtomicLongFieldUpdater and AtomicReferenceFieldUpdater 2014-02-06 21:07:31 +01:00
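The volatile refCnt read and the field-updater approach can be combined in a small reference-counting sketch. RefCountedResource is a hypothetical class; it uses the JDK's AtomicIntegerFieldUpdater (not the optimized updaters this commit adds) to CAS a volatile int field, so each object pays for one int field rather than a separate AtomicInteger.

```java
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

// Hypothetical reference-counted resource backed by a volatile int field.
class RefCountedResource {
    private static final AtomicIntegerFieldUpdater<RefCountedResource> REF_CNT =
            AtomicIntegerFieldUpdater.newUpdater(RefCountedResource.class, "refCnt");

    private volatile int refCnt = 1; // every refCnt() call is a volatile read
    private volatile boolean deallocated;

    int refCnt() {
        return refCnt;
    }

    RefCountedResource retain() {
        for (;;) {
            int cnt = refCnt;
            if (cnt == 0) {
                throw new IllegalStateException("refCnt: 0");
            }
            if (cnt == Integer.MAX_VALUE) {
                throw new IllegalStateException("refCnt overflow");
            }
            if (REF_CNT.compareAndSet(this, cnt, cnt + 1)) {
                return this;
            }
        }
    }

    /** Returns true if this call dropped the count to zero and deallocated. */
    boolean release() {
        for (;;) {
            int cnt = refCnt;
            if (cnt == 0) {
                throw new IllegalStateException("refCnt: 0");
            }
            if (REF_CNT.compareAndSet(this, cnt, cnt - 1)) {
                if (cnt == 1) {
                    deallocate();
                    return true;
                }
                return false;
            }
        }
    }

    protected void deallocate() {
        deallocated = true;
    }

    boolean isDeallocated() {
        return deallocated;
    }
}
```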
Trustin Lee
0f1b1be0aa Enable a user to specify arbitrary information with ReferenceCounted.touch()
- Related: #2163
- Add ResourceLeakHint to allow a user to provide meaningful information about the leak when touching it
- DefaultChannelHandlerContext now implements ResourceLeakHint to tell where the message is going.
- Cleaner resource leak report by excluding noisy stack trace elements
2014-01-29 11:44:59 +09:00
Trustin Lee
b887e35ac2 Add ReferenceCounted.touch() / Add missing retain() overrides
- Fixes #2163
- Inspector warnings
2014-01-28 20:06:55 +09:00
Trustin Lee
f3a842ecca [maven-release-plugin] prepare for next development iteration 2013-12-22 22:06:15 +09:00
Trustin Lee
888dfba76f [maven-release-plugin] prepare release netty-5.0.0.Alpha1 2013-12-22 22:06:06 +09:00
Trustin Lee
04ec2e1330 Add Recycler.Handle.recycle() so that it's possible to recycle an object without an explicit reference to Recycler 2013-12-19 01:10:52 +09:00
Trustin Lee
065b6cf785 Fixed various buffer leaks in FixedCompositeByteBufTest 2013-12-07 11:36:18 +09:00
Norman Maurer
643ce2f8c0 Fix all leaks reported during tests
- One notable leak is from WebSocketFrameAggregator
- All other leaks are from tests
2013-12-07 00:47:30 +09:00
Trustin Lee
187a5976cc Fix false-positive leaks
- All derived buffers and swapped buffers of a leak-aware buffer must be wrapped again with the leak-aware buffer
2013-12-06 21:32:47 +09:00
Trustin Lee
0f3451c227 Add ReferenceCountUtil.releaseLater() to make writing tests easy with ReferenceCounteds 2013-12-06 15:12:46 +09:00
Trustin Lee
ea3143a1ee Checkstyle 2013-12-06 13:53:42 +09:00
Trustin Lee
d21568b962 Also record retain() and release() 2013-12-06 13:44:59 +09:00
Trustin Lee
6431be8954 Better buffer leak reporting
- Remove the reference to ResourceLeak from the buffer implementations
  and use wrappers instead:
  - SimpleLeakAwareByteBuf and AdvancedLeakAwareByteBuf
  - It is now allocator's responsibility to create a leak-aware buffer.
  - Added AbstractByteBufAllocator.toLeakAwareBuffer() for easier
    implementation
- Add WrappedByteBuf to reduce duplication between *LeakAwareByteBuf and
  UnreleasableByteBuf
- Raise the level of leak reports to ERROR - because it will break the
  app eventually
- Replace enabled/disabled property with the leak detection level
  - Only print stack trace when level is ADVANCED or above to avoid user
    confusion
- Add the 'leak' build profile, which enables highly detailed leak
  reporting during the build
- Remove ResourceLeakException which is no longer used
2013-12-05 00:49:21 +09:00
Norman Maurer
f9a77b3c83 Add FixedCompositeByteBuf which can be used to write an array of ByteBuf in an efficient way.
This implementation does not produce as much GC pressure as CompositeByteBuf and so is preferred
for writing an array of ByteBufs. Be aware that FixedCompositeByteBuf is read-only.

In a project that makes heavy use of CompositeByteBuf for writes, using it cut
allocation down by half.
2013-12-03 08:09:24 +01:00
Norman Maurer
a4e4479407 Fix checkstyle 2013-12-02 08:24:15 +01:00
Norman Maurer
d66bffe271 [#2021] No need to synchronize for unpooled chunks 2013-12-02 08:02:32 +01:00
Trustin Lee
110745b0eb Remove the distinction of inbound handlers and outbound handlers
- Fixes #1808
- Move all methods in ChannelInboundHandler and ChannelOutboundHandler up to ChannelHandler
- Remove ChannelInboundHandler and ChannelOutboundHandler
- Deprecate ChannelInboundHandlerAdapter, ChannelOutboundHandlerAdapter, and ChannelDuplexHandler
- Replace CombinedChannelDuplexHandler with ChannelHandlerAppender
  because it's not possible to combine two handlers into one easily now
- Introduce 'Skip' annotation to pass events through efficiently
- Remove all references to the deprecated types and update Javadoc
2013-11-27 17:31:28 +09:00
Trustin Lee
807d96ed6c Simplify bundle generation / Add io.netty.versions.properties to all JARs
- Fixes #2003 properly
- Instead of using 'bundle' packaging, use 'jar' packaging. This is
  more robust because some strict build tools fail to retrieve the
  artifacts from a Maven repository unless their packaging is 'jar'.
- All artifacts now contain META-INF/io.netty.versions.properties, which
  provides the detailed information about the build and repository.
- Removed OSGi testsuite temporarily because it gives false errors
  during split package test and examination.
- Add io.netty.util.Version for easy retrieval of version information
2013-11-26 22:00:14 +09:00
Trustin Lee
2235873537 Resurrect Channel.id() with global uniqueness
- Fixes #1810
- Add a new interface ChannelId and its default implementation which generates globally unique channel ID.
- Replace AbstractChannel.hashCode with ChannelId.hashCode() and ChannelId.shortValue()
- Add variants of ByteBuf.hexDump() which accept byte[] instead of ByteBuf.
2013-11-18 15:30:12 +09:00
Trustin Lee
6ba1a85c4b Remove unnecessary parenthesis 2013-11-15 23:08:25 +09:00
Norman Maurer
d11d3a6b50 Also allow to override how direct ByteBuffers are freed 2013-11-12 12:44:11 +01:00
Norman Maurer
9b7d286652 Allow to override how wrapped direct ByteBuffer are allocated to make it easier to extend 2013-11-12 12:44:00 +01:00
Norman Maurer
329bbfcd87 [#1976] Fix IndexOutOfBoundsException when calling CompositeByteBuf.discardReadComponents() 2013-11-09 20:13:43 +01:00
Alex Petrov
519c632b2e Improve docstrings for and of 2013-11-08 12:23:43 +01:00
Trustin Lee
9125977692 Simpler toString() for ByteBufAllocators 2013-11-08 17:53:57 +09:00
Norman Maurer
c26d43757e [#1800] [#1802] Correctly expand capacity of ByteBuf while preserving content 2013-11-04 15:18:48 +01:00
Trustin Lee
26415b8f4c Use StringUtil.simpleClassName(..) instead of Class.getSimpleName() where necessary
- Class.getSimpleName() doesn't render anonymous classes very well
- + some minor cleanup
2013-11-04 19:42:33 +09:00