Commit Graph

49 Commits

Author SHA1 Message Date
Norman Maurer
922e463524
Don't try to put MemoryRegionCache.Entry objects back into the Recycler when they are recycled because of a finalizer. (#8955)
Motivation:

In MemoryRegionCache.Entry we use the Recycler to reduce GC pressure and churn. The problem is that these Entry objects will also be recycled when the PoolThreadCache is collected and finalize() is called. This can have the effect that we try to load a class while the WebApp is already stopped.

This will produce a stack trace like this on Tomcat:

```
19-Mar-2019 15:53:21.351 INFO [Finalizer] org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading Illegal access: this web application instance has been stopped already. Could not load [java.util.WeakHashMap]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
 java.lang.IllegalStateException: Illegal access: this web application instance has been stopped already. Could not load [java.util.WeakHashMap]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
	at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading(WebappClassLoaderBase.java:1383)
	at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForClassLoading(WebappClassLoaderBase.java:1371)
	at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1224)
	at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1186)
	at io.netty.util.Recycler$3.initialValue(Recycler.java:233)
	at io.netty.util.Recycler$3.initialValue(Recycler.java:230)
	at io.netty.util.concurrent.FastThreadLocal.initialize(FastThreadLocal.java:188)
	at io.netty.util.concurrent.FastThreadLocal.get(FastThreadLocal.java:142)
	at io.netty.util.Recycler$Stack.pushLater(Recycler.java:624)
	at io.netty.util.Recycler$Stack.push(Recycler.java:597)
	at io.netty.util.Recycler$DefaultHandle.recycle(Recycler.java:225)
	at io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry.recycle(PoolThreadCache.java:478)
	at io.netty.buffer.PoolThreadCache$MemoryRegionCache.freeEntry(PoolThreadCache.java:459)
	at io.netty.buffer.PoolThreadCache$MemoryRegionCache.free(PoolThreadCache.java:430)
	at io.netty.buffer.PoolThreadCache$MemoryRegionCache.free(PoolThreadCache.java:422)
	at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:279)
	at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:270)
	at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:241)
	at io.netty.buffer.PoolThreadCache.finalize(PoolThreadCache.java:230)
	at java.lang.System$2.invokeFinalize(System.java:1270)
	at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:102)
	at java.lang.ref.Finalizer.access$100(Finalizer.java:34)
	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:217)
```

Besides this, we also need to ensure we do not try to lazily load SizeClass when running from the finalizer, as the class may not be loadable anymore if the ClassLoader has already been destroyed.

This would produce an error like:

```
20-Mar-2019 11:26:35.254 INFO [Finalizer] org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading Illegal access: this web application instance has been stopped already. Could not load [io.netty.buffer.PoolArena$1]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
 java.lang.IllegalStateException: Illegal access: this web application instance has been stopped already. Could not load [io.netty.buffer.PoolArena$1]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
	at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading(WebappClassLoaderBase.java:1383)
	at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForClassLoading(WebappClassLoaderBase.java:1371)
	at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1224)
	at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1186)
	at io.netty.buffer.PoolArena.freeChunk(PoolArena.java:287)
	at io.netty.buffer.PoolThreadCache$MemoryRegionCache.freeEntry(PoolThreadCache.java:464)
	at io.netty.buffer.PoolThreadCache$MemoryRegionCache.free(PoolThreadCache.java:429)
	at io.netty.buffer.PoolThreadCache$MemoryRegionCache.free(PoolThreadCache.java:421)
	at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:278)
	at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:269)
	at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:240)
	at io.netty.buffer.PoolThreadCache.finalize(PoolThreadCache.java:229)
	at java.lang.System$2.invokeFinalize(System.java:1270)
	at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:102)
	at java.lang.ref.Finalizer.access$100(Finalizer.java:34)
	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:217)
```

Modifications:

- Only try to put the Entry back into the Recycler if the PoolThreadCache is not being destroyed by the finalizer.
- Only try to access SizeClass if the free was not triggered by the finalizer.

Result:

No more IllegalStateException when a webapp that uses Netty and the PooledByteBufAllocator is reloaded in Tomcat.
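
For illustration, a minimal self-contained sketch of the guard (hypothetical class and names, not the actual Netty source):

```
import java.util.ArrayDeque;
import java.util.Deque;

final class FinalizerAwareCache<T> {
    private final Deque<T> recycler = new ArrayDeque<>();

    // Called from both the explicit free path and from finalize().
    void free(T entry, boolean calledByFinalizer) {
        if (calledByFinalizer) {
            // Drop the entry: the GC reclaims it and no class loading
            // happens on the finalizer thread.
            return;
        }
        // Normal path: keep the object around for reuse.
        recycler.push(entry);
    }
}
```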
2019-03-22 12:16:21 +01:00
Norman Maurer
c83904a12a
Allow automatically trimming the PoolThreadCache at a regular interval (#8941)
Motivation:

PooledByteBufAllocator uses a per-Thread PoolThreadCache for allocations / deallocations to minimize the performance overhead. This PoolThreadCache is trimmed after X allocations to free up buffers that have not been allocated for a long time. This works out quite well while the app keeps allocating, but fails if the app stops allocating frequently (for whatever reason), so a lot of memory is wasted and not given back to the arena / freed.

Modifications:

- Add a ThreadExecutorMap that offers multiple methods that wrap Runnable / ThreadFactory / Executor and allow calling ThreadExecutorMap.currentEventExecutor() to get the currently executing EventExecutor for the calling Thread.
- Use these methods in the constructors of our EventExecutor implementations (which also covers the EventLoop implementations).
- Add the io.netty.allocator.cacheTrimIntervalMillis system property, which can be used to specify a fixed rate / interval at which we should try to trim the PoolThreadCache for an EventExecutor that allocates.
- Add PooledByteBufAllocator.trimCurrentThreadCache() to allow the user to trim the cache of the calling thread manually.
- Add testcases
- Introduce FastThreadLocal.getIfExists()

Result:

Allow trimming the PoolThreadCache better / more frequently, and so give memory back to the arena / system.
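
A usage sketch of the two new knobs (the buffer size and property value here are arbitrary; trimCurrentThreadCache() is called on the allocator instance):

```
import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;

public final class TrimExample {
    public static void main(String[] args) {
        // Periodic trimming: -Dio.netty.allocator.cacheTrimIntervalMillis=5000
        PooledByteBufAllocator alloc = PooledByteBufAllocator.DEFAULT;
        ByteBuf buf = alloc.directBuffer(256);
        buf.release(); // the freed buffer may now sit in the thread-local cache

        // Manual trimming: give cached-but-unused buffers back to the arena.
        boolean trimmed = alloc.trimCurrentThreadCache();
        System.out.println("cache trimmed: " + trimmed);
    }
}
```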
2019-03-22 11:08:37 +01:00
田欧
a33200ca38 use checkPositive/checkPositiveOrZero (#8803)
Motivation:

We have utility methods to check for > 0 and >= 0 arguments. We should use them.

Modification:

Use checkPositive / checkPositiveOrZero instead of hand-written if statements.

Result:

Re-use the utility methods.
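
The pattern, for illustration (the helpers live in io.netty.util.internal.ObjectUtil; the CacheConfig class here is hypothetical):

```
import static io.netty.util.internal.ObjectUtil.checkPositive;
import static io.netty.util.internal.ObjectUtil.checkPositiveOrZero;

final class CacheConfig {
    final int pageSize;
    final int maxCachedEntries;

    CacheConfig(int pageSize, int maxCachedEntries) {
        // Instead of: if (pageSize <= 0) throw new IllegalArgumentException(...)
        this.pageSize = checkPositive(pageSize, "pageSize");
        this.maxCachedEntries = checkPositiveOrZero(maxCachedEntries, "maxCachedEntries");
    }
}
```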
2019-01-31 09:07:14 +01:00
Norman Maurer
2680357423
Provide a way to cache the internal nioBuffer of the PooledByteBuffer… (#8603)
Motivation:

Often a temporary ByteBuffer is used which can be cached to reduce the GC pressure.

Modifications:

Cache the ByteBuffer in the PoolThreadCache as well.

Result:

Less GC.
2018-12-04 15:26:05 +01:00
Norman Maurer
15e4fe05a8 Revert "Provide a way to cache the internal nioBuffer of the PooledByteBuffer to reduce GC. (#8593)"
This reverts commit 8cd005ba43 as it seems to produce some failures in some cases. This needs more research.
2018-11-27 20:02:34 +01:00
Norman Maurer
8cd005ba43
Provide a way to cache the internal nioBuffer of the PooledByteBuffer to reduce GC. (#8593)
Motivation:

Often a temporary ByteBuffer is used which can be cached to reduce the GC pressure.

Modifications:

Add a Deque per PoolChunk which will be used for caching.

Result:

Less GC.
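
A rough sketch of the idea (hypothetical class, not the Netty source): keep created duplicates in a Deque so a temporary ByteBuffer can be reused instead of re-created.

```
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

final class ChunkNioBufferCache {
    private final ByteBuffer memory;  // the chunk's backing memory
    private final Deque<ByteBuffer> cachedNioBuffers = new ArrayDeque<>();

    ChunkNioBufferCache(ByteBuffer memory) {
        this.memory = memory;
    }

    // Reuse a previously created duplicate if one is cached, else create one.
    ByteBuffer internalNioBuffer() {
        ByteBuffer cached = cachedNioBuffers.pollLast();
        return cached != null ? cached : memory.duplicate();
    }

    // Return the duplicate to the cache instead of letting the GC collect it.
    void release(ByteBuffer nioBuffer) {
        cachedNioBuffers.offerLast(nioBuffer);
    }
}
```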
2018-11-27 13:55:13 +01:00
Dmitriy Dumanskiy
6cebb6069b remove unnecessary vararg argument in PooledByteBufAllocator (#8338)
Motivation:

No need for varargs; the method is always called with an array.

Modification:

`...` replaced with `[]`
2018-10-05 19:06:44 +08:00
Norman Maurer
e329ca1cf3 Introduce ObjectCleaner and use it in FastThreadLocal to ensure FastThreadLocal.onRemoval(...) is called
Motivation:

There is no guarantee that FastThreadLocal.onRemoval(...) is called if the FastThreadLocal is used by non-FastThreadLocalThreads. This can lead to all sorts of problems, for example memory leaks because direct memory is not correctly cleaned up, etc.

Besides this, we use the ThreadDeathWatcher to check if we need to release buffers back to the pool when thread-local caches are collected. The ThreadDeathWatcher needs to "wake up" every second to check if the registered Threads are still alive. If we can ensure FastThreadLocal.onRemoval(...) is called, we do not need it anymore.

Modifications:

- Introduce ObjectCleaner and use it to ensure FastThreadLocal.onRemoval(...) is always called when a Thread is collected.
- Deprecate ThreadDeathWatcher
- Add unit tests.

Result:

A consistent way of cleaning up FastThreadLocals when a Thread is collected.
2017-12-21 07:34:44 +01:00
Norman Maurer
09a05b680d Don't use ThreadDeathWatcher to clean up PoolThreadCache if a FastThreadLocalThread with a wrapped Runnable is used
Motivation:

We don't need to use the ThreadDeathWatcher if we use a FastThreadLocalThread for which we wrap the Runnable and ensure we call FastThreadLocal.removeAll() once the Runnable completes.

Modifications:

- Don't use a ThreadDeathWatcher if we are sure we will call FastThreadLocal.removeAll()
- Add unit test.

Result:

Less overhead / fewer running threads if you only allocate / deallocate from FastThreadLocalThreads.
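
A minimal sketch of the wrapping described above (illustrative, not the actual factory code):

```
import io.netty.util.concurrent.FastThreadLocal;
import io.netty.util.concurrent.FastThreadLocalThread;

public final class WrappedRunnableExample {
    public static void main(String[] args) {
        Runnable task = () -> {
            // ... allocate / deallocate pooled buffers here ...
        };
        // Wrap the Runnable so that all FastThreadLocals (including the
        // PoolThreadCache) are cleaned up deterministically on completion,
        // with no ThreadDeathWatcher needed.
        new FastThreadLocalThread(() -> {
            try {
                task.run();
            } finally {
                FastThreadLocal.removeAll();
            }
        }).start();
    }
}
```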
2017-11-28 13:43:28 +01:00
Carl Mastrangelo
003c8cc7ab Expose all defaults on PooledByteBufAllocator
Motivation:
Most, but not all, defaults are statically exposed on
PooledByteBufAllocator. This makes it cumbersome to build a custom
allocator where most of the defaults remain the same.

Modification:
Expose useCacheForAllThreads and whether direct buffers are preferred.
The latter is needed because it is determined under the internal package,
and public code should probably not depend on it.

Result:
More customizable allocators
2017-09-20 21:48:53 -07:00
Jason Tedor
98beb777f8 Enable configuring available processors
Motivation:

In cases when an application is running in a container or is otherwise
constrained to the number of processors that it is using, the JVM
invocation Runtime#availableProcessors will not return the constrained
value but rather the number of processors available to the virtual
machine. Netty uses this number in sizing various resources.
Additionally, some applications will constrain the number of threads
that they are using independently of the number of processors available
on the system. Thus, applications should have a way to globally
configure the number of processors.

Modifications:

Rather than invoking Runtime#availableProcessors, Netty should rely on a
method that enables configuration when the JVM is started or by the
application. This commit exposes a new class NettyRuntime for enabling
such configuration. This value can only be set once. Its default value
is Runtime#availableProcessors so that there is no visible change to
existing applications, but enables configuring either a system property
or configuring during application startup (e.g., based on settings used
to configure the application).

Additionally, we introduce the usage of forbidden-apis to prevent future
uses of Runtime#availableProcessors from creeping in. Future work should
enable the bundled signatures and clean up uses of deprecated and
other forbidden methods.

Result:

Netty can be configured to not use the underlying number of processors,
but rather the constrained number of processors.
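
A usage sketch (the method names are from the commit; the system property name in the comment is an assumption):

```
import io.netty.util.NettyRuntime;

public final class ProcessorsExample {
    public static void main(String[] args) {
        // Must be called early, before Netty sizes its resources,
        // and may only be set once.
        NettyRuntime.setAvailableProcessors(2);
        System.out.println(NettyRuntime.availableProcessors()); // 2
        // Alternatively, on the command line (assumed property name):
        // -Dio.netty.availableProcessors=2
    }
}
```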
2017-04-23 10:31:17 +02:00
Nikolay Fedorovskikh
0692bf1b6a fix the typos 2017-04-20 04:56:09 +02:00
Norman Maurer
3ad3356892 Expose ByteBufAllocator metric in a more general way
Motivation:

PR [#6460] added a way to access the used memory of an allocator. The naming was not very good, and the way things were exposed was not consistent.

Modifications:

- Add a new ByteBufAllocatorMetric and ByteBufAllocatorMetricProvider interface
- Let the ByteBufAllocator implementations implement ByteBufAllocatorMetricProvider
- Move exposed stats / metric from PooledByteBufAllocator to PooledByteBufAllocatorMetric and mark old methods as `@Deprecated`.

Result:

A more consistent way to expose metrics / stats for ByteBufAllocator
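
A short usage sketch of the new metric API:

```
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.buffer.PooledByteBufAllocatorMetric;

public final class MetricExample {
    public static void main(String[] args) {
        PooledByteBufAllocatorMetric metric = PooledByteBufAllocator.DEFAULT.metric();
        // From the generic ByteBufAllocatorMetric interface:
        System.out.println("used heap:   " + metric.usedHeapMemory());
        System.out.println("used direct: " + metric.usedDirectMemory());
        // Pooled-allocator specifics:
        System.out.println("heap arenas: " + metric.numHeapArenas());
        System.out.println("chunk size:  " + metric.chunkSize());
    }
}
```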
2017-03-08 20:07:58 +01:00
Norman Maurer
461f9a1212 Allow obtaining information about used direct and heap memory for ByteBufAllocator implementations
Motivation:

Often it's useful for the user to be able to get some stats about the memory allocated via an allocator.

Modifications:

- Allow obtaining the used heap and direct memory for an allocator
- Add test case

Result:

Fixes [#6341]
2017-03-01 18:53:43 +01:00
Norman Maurer
8a3a3245df Ensure Unsafe buffer implementations are used when sun.misc.Unsafe is present
Motivation:

When sun.misc.Unsafe is present we want to use *Unsafe*ByteBuf implementations. We missed doing so in PooledByteBufAllocator when the heapArena is null.

Modifications:

- Correctly use UnpooledUnsafeHeapByteBuf
- Add unit tests

Result:

Use the most optimal ByteBuf implementation.
2017-02-16 07:48:33 +01:00
Norman Maurer
f09a721d7f Expose the chunkSize used by PooledByteBufAllocator.
Motivation:

Sometimes it may be useful to know the used chunkSize.

Modifications:

Add method to expose chunkSize.

Result:

More exposed details.
2017-02-14 08:37:05 +01:00
Norman Maurer
54339c08ac Only try to calculate direct memory offset when sun.misc.Unsafe is present
Motivation:

We should only try to calculate the direct memory offset when sun.misc.Unsafe is present, as otherwise it will fail with an NPE because PlatformDependent.directBufferAddress(...) throws one.
This problem was introduced by 66b9be3a46.

Modifications:

Use an offset of 0 if no sun.misc.Unsafe is present.

Result:

PooledByteBufAllocator also works again when no sun.misc.Unsafe is present.
2017-02-14 07:49:24 +01:00
Kiril Menshikov
66b9be3a46 Allow aligning allocated Buffers
Motivation:

64-byte alignment is recommended by the Intel performance guide (https://software.intel.com/en-us/articles/practical-intel-avx-optimization-on-2nd-generation-intel-core-processors) for data-structures over 64 bytes.
Requiring padding to a multiple of 64 bytes allows for using SIMD instructions consistently in loops without additional conditional checks. This should allow for simpler and more efficient code.

Modification:

At the moment cache alignment must be set up manually, but it could probably be taken from the system. The original code was introduced by @normanmaurer in https://github.com/netty/netty/pull/4726/files

Result:

Aligned buffers perform better than cache-misaligned ones.
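
The alignment arithmetic itself is a standard power-of-two round-up; a small worked example:

```
public final class AlignExample {
    public static void main(String[] args) {
        final long alignment = 64;       // assumed cache-line size
        long address = 0x7f3a12345693L;  // example mis-aligned address
        // Round up to the next multiple of 64; valid because 64 is a power of two.
        long aligned = (address + alignment - 1) & ~(alignment - 1);
        long padding = aligned - address; // bytes of padding inserted
        System.out.printf("aligned=0x%x padding=%d%n", aligned, padding);
    }
}
```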
2017-02-06 07:58:29 +01:00
Dmitriy Dumanskiy
b9abd3c9fc Cleanup: use for-each loops over arrays to make the code easier to read and remove unnecessary toLowerCase() 2017-02-06 07:47:59 +01:00
ming.ma
44add3c525 Log correct value for useCacheForAllThreads
Motivation:

The log statement for "-Dio.netty.allocator.useCacheForAllThreads" is missing a placeholder and so can't output the correct value.

Modification:

- Add placeholder

Result:

Fixes #6265.
2017-01-25 08:01:23 +01:00
Norman Maurer
2b8fd8d43b Allow disabling caching in PooledByteBufAllocator for non-FastThreadLocalThreads
Motivation:

If a user allocates a lot from outside the EventLoop we may end up creating a lot of caches in the PooledByteBufAllocator. This may be wasteful, so it may be useful for a user to be able to configure that caches should only be used from within EventLoops.

Modifications:

Add a new constructor which allows configuring the caching behaviour.

Result:

More flexible configuration of PooledByteBufAllocator possible
2016-12-02 07:40:33 +01:00
Norman Maurer
3d29bcfc8d Allow creating Unsafe ByteBuf implementations that do not use a Cleaner to free the native memory.
Motivation:

Using the Cleaner to release the native memory has a few drawbacks:

- Cleaner.clean() uses static synchronized internally which means it can be a performance bottleneck
- It puts more load on the GC

Modifications:

Add new buffer implementations that can be enabled with a system flag as optimizations. In this case no Cleaner is used at all and the user must ensure everything is always released.

Result:

Less performance impact from direct buffers when they need to be allocated and released.
2016-06-03 21:20:10 +02:00
Norman Maurer
2537880e5d Fix typo in exception message
Motivation:

Typo in exception message.

Modifications:

Fix the typo.

Result:

No more typo.
2016-04-14 08:03:35 +02:00
Norman Maurer
200ca39b5c Add PooledByteBufAllocator.dumpStats() which allows obtaining a human-readable status of the allocator.
Motivation:

Sometimes it is useful to dump the status of the PooledByteBufAllocator and log it. Doing this is currently a bit cumbersome, as the user basically needs to iterate through all the metrics and compose the String. We had better provide an easy way to do this.

Modification:

Add dumpStats() method.

Result:

Easier to get a view into the status of the allocator.
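
Usage is then a single call:

```
import io.netty.buffer.PooledByteBufAllocator;

public final class DumpStatsExample {
    public static void main(String[] args) {
        PooledByteBufAllocator alloc = PooledByteBufAllocator.DEFAULT;
        alloc.directBuffer(1024).release();
        // One call instead of iterating all arena metrics by hand:
        System.out.println(alloc.dumpStats());
    }
}
```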
2016-04-09 19:16:53 +02:00
buchgr
b88a980482 Change arena to thread cache mapping algorithm to be closer to ideal.
Motivation:
Circular assignment of arenas to thread caches can lead to less than optimal
mappings in cases where threads are (frequently) shutdown and started.

Example Scenario:
There are a total of 2 arenas. The first two threads performing an allocation
would lead to the following mapping:

Thread 0 -> Arena 0
Thread 1 -> Arena 1

Now, assume Thread 1 is shut down and another Thread 2 is started. The current
circular assignment algorithm would lead to the following mapping:

Thread 0 -> Arena 0
Thread 2 -> Arena 0

Ideally, we want Thread 2 to use Arena 1 though.

Presumably, this is not much of an issue for most Netty applications that do all
the allocations inside the eventloop, as eventloop threads are seldom shut down
and restarted. However, applications that only use the netty-buffer package
or implement their own threading model outside the eventloop might suffer from
increased contention. For example, gRPC Java when using the blocking stub
performs some allocations outside the eventloop and within its own thread pool
that is dynamically sized depending on system load.

Modifications:

Implement a linear scan algorithm that assigns a new thread cache to the arena
that currently backs the fewest thread caches.

Result:

Closer to ideal mappings between thread caches and arenas. In order to always
get an ideal mapping, we would have to re-balance the mapping whenever a thread
dies. However, that's difficult because of deallocation.
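
A simplified sketch of the linear scan (hypothetical types mirroring the description above):

```
import java.util.concurrent.atomic.AtomicInteger;

final class ArenaPicker {
    static final class Arena {
        final AtomicInteger numThreadCaches = new AtomicInteger();
    }

    // Linear scan: pick the arena currently backing the fewest thread caches.
    static Arena leastUsedArena(Arena[] arenas) {
        Arena least = arenas[0];
        for (int i = 1; i < arenas.length; i++) {
            if (arenas[i].numThreadCaches.get() < least.numThreadCaches.get()) {
                least = arenas[i];
            }
        }
        least.numThreadCaches.incrementAndGet(); // the new thread cache registers itself
        return least;
    }
}
```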
2016-03-15 14:16:34 +01:00
Sylwester Lachiewicz
a18416df60 Export defaults from PooledByteBufAllocator static fields
Motivation:

Allow external applications to tune the initialization of PooledByteBufAllocator

Modifications:

Added new static methods

Result:

Exported
DEFAULT_NUM_HEAP_ARENA
DEFAULT_NUM_DIRECT_ARENA
DEFAULT_PAGE_SIZE
DEFAULT_MAX_ORDER
DEFAULT_TINY_CACHE_SIZE
DEFAULT_SMALL_CACHE_SIZE
DEFAULT_NORMAL_CACHE_SIZE
2015-11-08 08:08:23 +01:00
Norman Maurer
e0ef01cf93 [#3888] Use 2 * cores as default minimum for pool arenas.
Motivation:

At the moment we use 1 * cores as the default minimum for pool arenas. This can easily lead to contention, as we use 2 * cores as the default for EventLoops when using NIO or EPOLL. If we choose a smaller number we will run into hotspots, as allocation and deallocation need to be synchronized on the PoolArena.

Modifications:

Change the default number of arenas to 2 * cores.

Result:

Less contention when using the default settings.
2015-06-18 07:27:30 +02:00
Norman Maurer
dce0dd9b78 [#3654] No need to hold the lock while destroying a chunk
Motivation:

At the moment we sometimes hold the lock on the PoolArena while destroying a PoolChunk. This is not needed.

Modification:

- Ensure we do not hold the lock while destroying a PoolChunk
- Move all synchronized usage into PoolArena
- Cleanup

Result:

Less contention.
2015-05-27 09:47:53 +02:00
Norman Maurer
271af7c624 Expose metrics for PooledByteBufAllocator
Motivation:

The PooledByteBufAllocator is more or less a black box at the moment. We need to expose some metrics to allow the user to get a better idea of how to tune it.

Modifications:

- Expose different metrics via PooledByteBufAllocator
- Add *Metrics interfaces

Result:

It is now easy to gather metrics and details about the PooledByteBufAllocator and so get a better understanding of resource usage etc.
2015-05-20 21:06:17 +02:00
Trustin Lee
085a61a310 Refactor FastThreadLocal to simplify TLV management
Motivation:

When Netty runs in a managed environment such as web application server,
Netty needs to provide an explicit way to remove the thread-local
variables it created to prevent class loader leaks.

FastThreadLocal uses different execution paths for storing a
thread-local variable depending on the type of the current thread.
It increases the complexity of thread-local removal.

Modifications:

- Moved FastThreadLocal and FastThreadLocalThread out of the internal
  package so that a user can use it.
- FastThreadLocal now keeps track of all thread local variables it has
  initialized, and calling FastThreadLocal.removeAll() will remove all
  thread-local variables of the caller thread.
- Added FastThreadLocal.size() for diagnostics and tests
- Introduce InternalThreadLocalMap which is a mixture of hard-wired
  thread local variable fields and extensible indexed variables
- FastThreadLocal now uses InternalThreadLocalMap to implement a
  thread-local variable.
- Added ThreadDeathWatcher.unwatch() so that PooledByteBufAllocator
  tells it to stop watching when its thread-local cache has been freed
  by FastThreadLocal.removeAll().
- Added FastThreadLocalTest to ensure that removeAll() works
- Added microbenchmark for FastThreadLocal and JDK ThreadLocal
- Upgraded to JMH 0.9

Result:

- A user can remove all thread-local variables Netty created, as long as
  he or she did not exit from the current thread. (Note that there's no
  way to remove a thread-local variable from outside of the thread.)
- FastThreadLocal exposes more useful operations such as isSet() because
  we always implement a thread local variable via InternalThreadLocalMap
  instead of falling back to JDK ThreadLocal.
- FastThreadLocalBenchmark shows that this change improves the
  performance of FastThreadLocal even more.
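
A short usage sketch of the now-public API:

```
import io.netty.util.concurrent.FastThreadLocal;
import io.netty.util.concurrent.FastThreadLocalThread;

public final class FastThreadLocalExample {
    private static final FastThreadLocal<StringBuilder> BUILDER =
            new FastThreadLocal<StringBuilder>() {
                @Override
                protected StringBuilder initialValue() {
                    return new StringBuilder(128);
                }
            };

    public static void main(String[] args) {
        new FastThreadLocalThread(() -> {
            StringBuilder sb = BUILDER.get(); // fast indexed lookup on this thread type
            sb.append("hello");
            System.out.println(sb);
            // In a managed environment, clean up before the thread / classloader goes away:
            FastThreadLocal.removeAll();
        }).start();
    }
}
```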
2014-06-19 21:13:55 +09:00
belliottsmith
2a2a21ec59 Introduce FastThreadLocal which uses an EnumMap and a predefined fixed set of possible thread locals
Motivation:
Provide a faster ThreadLocal implementation

Modification:
Add a "FastThreadLocal" which uses an EnumMap and a predefined fixed set of possible thread locals (all of the static instances created by netty) that is around 10-20% faster than standard ThreadLocal in my benchmarks (and can be seen having an effect in the direct PooledByteBufAllocator benchmark that uses the DEFAULT ByteBufAllocator which uses this FastThreadLocal, as opposed to normal instantiations that do not, and in the new RecyclableArrayList benchmark);

Result:
Improved performance
2014-06-13 10:56:18 +02:00
Trustin Lee
af4c30fa56 Remove the deprecated constructor 2014-06-02 18:24:19 +09:00
Trustin Lee
e79ca269b8 Introduce ThreadDeathWatcher
Motivation:

PooledByteBufAllocator's thread local cache and
ReferenceCountUtil.releaseLater() are in need of a way to run an
arbitrary logic when a certain thread is terminated.

Modifications:

- Add ThreadDeathWatcher, which spawns a low-priority daemon thread
  that watches a list of threads periodically (every second) and
  invokes the specified tasks when the associated threads are not alive
  anymore
  - Start-stop logic based on CAS operation proposed by @tea-dragon
- Add debug-level log messages to see if ThreadDeathWatcher works

Result:

- Fixes #2519 because we don't use GlobalEventExecutor anymore
- Cleaner code
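
A usage sketch (note that ThreadDeathWatcher was later deprecated in favor of ObjectCleaner, per the commit further up):

```
import io.netty.util.ThreadDeathWatcher;

public final class WatchExample {
    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> System.out.println("working"));
        worker.start();
        // The watcher polls roughly every second and runs the task once
        // the watched thread is no longer alive.
        ThreadDeathWatcher.watch(worker, () -> System.out.println("worker died, cleaning up"));
        worker.join();
    }
}
```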
2014-06-02 18:23:23 +09:00
Norman Maurer
ceffa82d0d [#2370] Periodically check for no-longer-alive Threads and free up their PoolThreadCache
Motivation:
At the moment we create a new PoolThreadCache whenever a Thread tries to either allocate or release something on the PooledByteBufAllocator. When something is released we then put it in its PoolThreadCache. The problem is that we never check if a Thread is still alive, and so we may end up with memory that is never freed again if a user creates many short-living Threads that use the PooledByteBufAllocator.

Modifications:
Periodically check if the Thread that has a PoolThreadCache assigned is still alive, and free the cache if it is not.

Result:
Memory is freed up correctly even for short-living Threads.
2014-04-09 11:45:11 +02:00
Norman Maurer
8429ecfcc4 Implement Thread caches for pooled buffers to minimize contention. This fixes [#2264] and [#808].
Motivation:
Remove the synchronization bottleneck in PoolArena and so speed things up

Modifications:

This implementation uses roughly the same techniques as outlined in the jemalloc paper and the jemalloc
blog post https://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919.

At the moment we only cache for "known" Threads (those that power EventExecutors) and not for others, to keep the overhead
minimal when we need to free up unused buffers in the cache and free cached buffers once the Thread completes. Here
we use multi-level caches for tiny, small and normal allocations. Huge allocations are not cached at all to keep the
memory usage at a sane level. All the different cache configurations can be adjusted via system properties or, where it
makes sense, directly via the constructor.

Result:
Less contention, as most allocations can be served by the cache itself
2014-03-20 09:30:57 -07:00
Jakob Buchgraber
1bce46dbb3 Bit tricks to check for and calculate power of two.
Motivation:
I was studying the code and thought this was simpler and easier to
understand.

Modifications:
Replaced the for loop and if conditions with a simpler implementation.

Result:
Code is easier to understand.
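
For reference, the classic bit tricks in question (a self-contained sketch; the helper names are illustrative):

```
public final class PowerOfTwo {
    // A power of two has exactly one bit set, so v & (v - 1) clears it to 0.
    static boolean isPowerOfTwo(int v) {
        return v > 0 && (v & (v - 1)) == 0;
    }

    // Round up to the next power of two by smearing the highest set bit right.
    static int nextPowerOfTwo(int v) {
        v--;
        v |= v >>> 1;
        v |= v >>> 2;
        v |= v >>> 4;
        v |= v >>> 8;
        v |= v >>> 16;
        return v + 1;
    }

    public static void main(String[] args) {
        System.out.println(isPowerOfTwo(4096));   // true
        System.out.println(nextPowerOfTwo(6000)); // 8192
    }
}
```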
2014-03-18 15:59:34 +09:00
Trustin Lee
65b522a2a7 Better buffer leak reporting
- Remove the reference to ResourceLeak from the buffer implementations
  and use wrappers instead:
  - SimpleLeakAwareByteBuf and AdvancedLeakAwareByteBuf
  - It is now allocator's responsibility to create a leak-aware buffer.
  - Added AbstractByteBufAllocator.toLeakAwareBuffer() for easier
    implementation
- Add WrappedByteBuf to reduce duplication between *LeakAwareByteBuf and
  UnreleasableByteBuf
- Raise the level of leak reports to ERROR - because it will break the
  app eventually
- Replace enabled/disabled property with the leak detection level
  - Only print stack trace when level is ADVANCED or above to avoid user
    confusion
- Add the 'leak' build profile, which enables highly detailed leak
  reporting during the build
- Remove ResourceLeakException, which is not used anymore
2013-12-05 00:51:39 +09:00
Trustin Lee
ba3bc0c020 Simpler toString() for ByteBufAllocators 2013-11-08 17:54:34 +09:00
Norman Maurer
25c226a835 Make sure only direct ByteBuffer are passed to the underlying jdk Channel.
This is needed because otherwise the JDK itself will do an extra ByteBuffer copy with its own pool implementation. It will even be done
multiple times if the ByteBuffer is only ever partially written. With this change the copy is done inside Netty using its own allocator and
is only done once in all cases.
2013-09-02 20:17:53 +02:00
Trustin Lee
4b11aff08f Less confusing log messages for system properties
- Fixes #1502
2013-07-02 09:23:29 +09:00
Trustin Lee
0da48e7e7f Determine the default number of heap/direct arenas of PooledByteBufAllocator conservatively
- Fixes #1445
- Add PlatformDependent.maxDirectMemory()
- Ensure the default number of arenas is decreased if the max memory of the VM is not large enough.
2013-06-14 12:14:45 +09:00
Norman Maurer
bc20107b68 Use correct value to disable/enable direct arenas in PooledByteBufAllocator 2013-05-30 20:24:11 +02:00
Trustin Lee
2e0dd65250 Fix a bug where the unpooled buffer returned by the pooled allocator reports an incorrect allocator 2013-05-01 11:14:21 +09:00
Trustin Lee
a218eb6f6f Allow to disable only heap or direct buffer pool
- Fixes #1315

If a user specifies the arena size of 0, the pool is now disabled
instead of raising an IllegalArgumentException. Using this, you can
disable only heap or direct buffer pool easily. Once disabled,
PooledByteBufAllocator will delegate the allocation request to
UnpooledByteBufAllocator.
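
A usage sketch (assuming this constructor shape; the pageSize / maxOrder values shown are assumed defaults):

```
import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;

public final class DirectOnlyPoolExample {
    public static void main(String[] args) {
        // 0 heap arenas disables the heap pool; heap allocations are then
        // delegated to the unpooled allocator.
        PooledByteBufAllocator directOnly = new PooledByteBufAllocator(
                true,  // preferDirect
                0,     // nHeapArena: heap pool disabled
                2,     // nDirectArena
                8192,  // pageSize (assumed default)
                11);   // maxOrder (assumed default)
        ByteBuf heap = directOnly.heapBuffer(16); // served by the unpooled delegate
        heap.release();
    }
}
```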
2013-04-27 08:55:16 +09:00
Trustin Lee
baf9ecfe7b Fix IndexOutOfBoundsException raised when numHeapArenas and numDirectArenas differ
- Fixes #1227
2013-04-03 22:15:34 +09:00
Trustin Lee
2ffa083d3c Allow overriding the default allocator properties and log them / Prettier log 2013-04-03 12:08:01 +09:00
Trustin Lee
8d88acb4a7 Change ByteBufAllocator.buffer() to allocate a direct buffer only when the platform can handle a direct buffer reliably
- Rename directbyDefault to preferDirect
 - Add a system property 'io.netty.preferDirect' to allow a user to change the preference at launch time
 - Merge UnpooledByteBufAllocator.DEFAULT_BY_* to DEFAULT
2013-03-05 17:55:24 +09:00
Trustin Lee
67da6e4bf9 Remove the notion of ByteBufAllocator.bufferMaxCapacity()
- Allocate the unpooled memory if the requested capacity is greater than the chunkSize
- Fixes #834
2012-12-19 17:35:32 +09:00
Trustin Lee
b47fc77522 Add PooledByteBufAllocator + microbenchmark module
This pull request introduces the new default ByteBufAllocator implementation based on jemalloc, with some differences:

* Minimum possible buffer capacity is 16 (jemalloc: 2)
* Uses binary heap with random branching (jemalloc: red-black tree)
* No thread-local cache yet (jemalloc has thread-local cache)
* Default page size is 8 KiB (jemalloc: 4 KiB)
* Default chunk size is 16 MiB (jemalloc: 2 MiB)
* Cannot allocate a buffer bigger than the chunk size (jemalloc: possible) because we don't have control over memory layout in Java. A user can work around this issue by creating a composite buffer, but it's not always a feasible option. Although 16 MiB is a pretty big default, a user's handler might need to deal with the bounded buffers when the user wants to deal with a large message.

Also, to ensure the new allocator performs good enough, I wrote a microbenchmark for it and made it a dedicated Maven module. It uses Google's Caliper framework to run and publish the test result (example)

Miscellaneous changes:

* Made some ByteBuf implementations public so that those who implement a new allocator can make use of them.
* Added ByteBufAllocator.compositeBuffer() and its variants.
* ByteBufAllocator.ioBuffer() creates a buffer with 0 capacity.
2012-12-13 22:35:06 +09:00