Commit Graph

27 Commits

Author SHA1 Message Date
Norman Maurer
54e41df65d Ensure people are aware recycler capacity is per thread.
Motivation:

Its not clear that the capacity is per thread.

Modifications:

Rename system property to make it more clear that the recycler capacity is per thread.

Result:

Less confusing.
2016-08-08 11:00:26 +02:00
Norman Maurer
d3dc9c9e74 Allow to limit the maximum number of WeakOrderQueue instances per Thread.
Motivation:

To better restrict resource usage we should limit the number of WeakOrderQueue instances per Thread. Once this limit is reached object that are recycled from a different Thread then the allocation Thread are dropped on the floor.

Modifications:

Add new system property io.netty.recycler.maxDelayedQueuesPerThread and constructor that allows to limit the max number of WeakOrderQueue instances per Thread for Recycler instance. The default is 2 * cores (the same as the default number of EventLoop instances per EventLoopGroup).

Result:

Better way to restrict resource / memory usage per Recycler instance.
2016-08-04 06:23:14 +02:00
Scott Mitchell
82b22d6f11 findNextPositivePowerOfTwo out of bounds
Motivation:
Some usages of findNextPositivePowerOfTwo assume that bounds checking is taken care of by this method. However bounds checking is not taken care of by findNextPositivePowerOfTwo and instead assert statements are used to imply the caller has checked the bounds. This can lead to unexpected non power of 2 return values if the caller is not careful and thus invalidate any logic which depends upon a power of 2.

Modifications:
- Add a safeFindNextPositivePowerOfTwo method which will do runtime bounds checks and always return a power of 2

Result:
Fixes https://github.com/netty/netty/issues/5601
2016-08-01 19:52:13 -07:00
Norman Maurer
d92c5f5f5b Introduce allocation / pooling ratio in Recycler
Motivation:

At the moment the Recyler is very sensitive to allocation bursts which means that if there is a need for X objects for only one time these will most likely end up in the Recycler and sit there forever as the normal workload only need a subset of this number.

Modifications:

Add a ratio which sets how many objects should be pooled for each new allocation. This allows to slowly increase the number of objects in the Recycler while not be to sensitive for bursts.

Result:

Less unused objects in the Recycler if allocation rate sometimes bursts.
2016-07-29 15:20:39 +02:00
Norman Maurer
445a547265 Set Recycler DEFAULT_INITIAL_MAX_CAPACITY to a more sane value
Motivation:

We used a very high number for DEFAULT_INITIAL_MAX_CAPACITY (over 200k) which is not very relastic and my lead to very surprising memory usage if allocations happen in bursts.

Modifications:

Use a more sane default value of 32k

Result:

Less possible memory usage by default
2016-07-27 07:59:23 +02:00
Norman Maurer
fe4af7e32c Ensure shared capacity is updated correctly when WeakOrderQueue is collected.
Motivation:

We use a shared capacity per Stack for all its WeakOrderQueue elements. These relations are stored in a WeakHashMap to allow dropping these if memory pressure arise. The problem is that we not "reclaim" previous reserved space when this happens. This can lead to a Stack which has not shared capacity left which then will lead to an AssertError when we try to allocate a new WeakOderQueue.

Modifications:

- Ensure we never throw an AssertError if we not have enough space left for a new WeakOrderQueue
- Ensure we reclaim space when WeakOrderQueue is collected.

Result:

No more AssertError possible when new WeakOrderQueue is created and also correctly reclaim space that was reserved from the shared capacity.
2016-07-24 20:56:51 +02:00
Norman Maurer
94d7557dea Ensure WeakOrderQueue can be collected fast enough
Motivation:

Commit afafadd3d7 introduced a change which stored the Stack in the WeakOrderQueue as field. This unfortunally had the effect that it was not removed from the WeakHashMap anymore as the Stack also is used as key.

Modifications:

Do not store a reference to the Stack in WeakOrderQueue.

Result:

WeakOrderQueue can be collected correctly again.
2016-07-22 20:42:05 +02:00
Norman Maurer
afafadd3d7 [#5505] Enforce Recycler limit when recycling from different threads
Motivation:

Currently, the recycler max capacity it's only enforced on the
thread-local stack which is used when the recycling happens on the
same thread that requested the object.

When the recycling happens in a different thread, then the objects
will be queued into a linked list (where each node holds N objects,
default=16). These objects are then transfered into the stack when
new objects are requested and the stack is empty.

The problem is that the queue doesn't have a max capacity and that
can lead to bad scenarios. Eg:

- Allocate 1M object from recycler
- Recycle all of them from different thread
- Recycler WeakOrderQueue will contain 1M objects
- Reference graph will be very long to traverse and GC timeseems to be negatively impacted
- Size of the queue will never shrink after this

Modifications:

Add some shared counter which is used to manage capacity limits when recycle from different thread then the allocation thread. We modify the counter whenever we allocate a new Link to reduce the overhead of increment / decrement it.

Result:

More predictable number of objects mantained in the recycler pool.
2016-07-14 07:56:03 +02:00
Alex Petrov
bbed330468 Fix the possible reference leak in Recycler
Motivation:

Under very unlikely (however possible) circumstances, Recycler may leak
references. This happens _only_ when the object was already recycled
at least once (which means it's got written to the stack) and then
taken out again, and never returned.

The "never returned" part may be the fault of the user (forgotten
`finally` clause) or the situation when Recycler drops the possibly
youngest item itself.

Modifications:

Nullify the item taken from the stack.

Result:

Reference is cleaned up. If the object is lost, it will be a subject for
GC. The rest of Stack / Recycler functionality remains unaffected.
2016-06-01 06:47:43 +02:00
Norman Maurer
1dfcfc17fa Allow to change link capacity via system property
Motivation:

Sometimes people may want to trade GC with memory overhead. For this it can be useful to allow to change the capacity of the array that is hold in the Link that is used by the Recycler internally.

Modifications:

Introduce a new system property , io.netty.recycler.linkCapacity which allows to change the capcity.

Result:

More flexible configuration of netty.
2016-05-31 14:05:23 +02:00
Norman Maurer
e10dca7601 Mark Recycler.recycle(...) deprecated and update usage.
Motivation:

Recycler.recycle(...) should not be used anymore and be replaced by Handle.recycle().

Modifications:

Mark it as deprecated and update usage.

Result:

Correctly document deprecated api.
2016-05-20 22:11:31 +02:00
Norman Maurer
4e1760c91b Allow disable Recycler via -Dio.netty.recycler.maxCapacity=0
Motivation:

It should be possible to disable the Recycler with -Dio.netty.recycler.maxCapacity=0, but because of a typo this is not the case.

Modifications:

Replace <= with < to make it posible to disable the Recycler.

Result:

Correct behaviour when using -Dio.netty.recycler.maxCapacity=0
2016-03-21 08:34:42 +01:00
Norman Maurer
b26652a934 Fix typo in log message during static init of Recycler.
Motivation:

Fix a typo in the log message of the static initializer of Recycler.

Modifications:

Fix typo.

Result:

Correctly log system property io.netty.recycler.maxCapacity.
2016-03-21 08:23:31 +01:00
Norman Maurer
1a9ea2d349 [#4147] Allow to disable recycling
Motivation:

Sometimes it is useful to disable recycling completely if memory constraints are very tight.

Modifications:

Allow to use -Dio.netty.recycler.maxCapacity=0 to disable recycling completely.

Result:

It's possible to disable recycling now.
2015-08-28 15:05:17 +02:00
Trustin Lee
7f92771496 Fix a bug where Recycler's capacity can increase beyond its maximum
Related: #3166

Motivation:

When the recyclable object created at one thread is returned at the
other thread, it is stored in a WeakOrderedQueue.

The objects stored in the WeakOrderedQueue is added back to the stack by
WeakOrderedQueue.transfer() when the owner thread ran out of recyclable
objects.

However, WeakOrderedQueue.transfer() does not have any mechanism that
prevents the stack from growing beyond its maximum capacity.

Modifications:

- Make WeakOrderedQueue.transfer() increase the capacity of the stack
  only up to its maximum
- Add tests for the cases where the recyclable object is returned at the
  non-owner thread
- Fix a bug where Stack.scavengeSome() does not scavenge the objects
  when it's the first time it ran out of objects and thus its cursor is
  null.
- Overall clean-up of scavengeSome() and transfer()

Result:

The capacity of Stack never increases beyond its maximum.
2014-12-06 17:58:31 +09:00
Amir Szekely
98a533ae44 Don't ignore maxCapacity if it's not a power of 2
Motivation:

This fixes bug #2848 which caused Recycler to become unbounded and cache infinite number of objects with maxCapacity that's not a power of two. This can result in general sluggishness of the application and OutOfMemoryError.

Modifications:

The test for maxCapacity has been moved out of test to check if the buffer has filled. The buffer is now also capped at maxCapacity and cannot grow over it as it jumps from one power of two to the other.

Additionally, a unit test was added to verify maxCapacity is honored even when it's not a power of two.

Result:

With these changes the user is able to use a custom maxCapacity number and not have it ignored. The unit test assures this bug will not repeat itself.
2014-08-31 09:06:45 +02:00
Idel Pivnitskiy
dd429b2495 Small fixes and improvements
Motivation:

Fix some typos in Netty.

Modifications:

- Fix potentially dangerous use of non-short-circuit logic in Recycler.transfer(Stack<?>).
- Removed double 'the the' in javadoc of EmbeddedChannel.
- Write to log an exception message if we can not get SOMAXCONN in the NetUtil's static block.
2014-07-20 09:37:22 +02:00
Trustin Lee
d0912f2709 Fix most inspector warnings
Motivation:

It's good to minimize potentially broken windows.

Modifications:

Fix most inspector warnings from our profile
Update IntObjectHashMap

Result:

Cleaner code
2014-07-02 19:55:07 +09:00
Norman Maurer
030bcaae81 Improve performance of Recycler
Motivation:

Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used.

Modification:

 - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance.
 - Recycling of the same object multiple times without acquire it will fail.
 - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler

These changes are based on @belliottsmith 's work that was part of #2504.

Result:

Huge increase in performance.

4.0 branch without this commit:

Benchmark                                                (size)   Mode   Samples        Score  Score error    Units
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    00000  thrpt        20 116026994.130  2763381.305    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    00256  thrpt        20 110823170.627  3007221.464    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    01024  thrpt        20 118290272.413  7143962.304    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    04096  thrpt        20 120560396.523  6483323.228    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    16384  thrpt        20 114726607.428  2960013.108    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    65536  thrpt        20 119385917.899  3172913.684    ops/s
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark

4.0 branch with this commit:

Benchmark                                                (size)   Mode   Samples        Score  Score error    Units
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    00000  thrpt        20 204158855.315  5031432.145    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    00256  thrpt        20 205179685.861  1934137.841    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    01024  thrpt        20 209906801.437  8007811.254    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    04096  thrpt        20 214288320.053  6413126.689    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    16384  thrpt        20 215940902.649  7837706.133    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    65536  thrpt        20 211141994.206  5017868.542    ops/s
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-24 10:47:38 +02:00
Trustin Lee
085a61a310 Refactor FastThreadLocal to simplify TLV management
Motivation:

When Netty runs in a managed environment such as web application server,
Netty needs to provide an explicit way to remove the thread-local
variables it created to prevent class loader leaks.

FastThreadLocal uses different execution paths for storing a
thread-local variable depending on the type of the current thread.
It increases the complexity of thread-local removal.

Modifications:

- Moved FastThreadLocal and FastThreadLocalThread out of the internal
  package so that a user can use it.
- FastThreadLocal now keeps track of all thread local variables it has
  initialized, and calling FastThreadLocal.removeAll() will remove all
  thread-local variables of the caller thread.
- Added FastThreadLocal.size() for diagnostics and tests
- Introduce InternalThreadLocalMap which is a mixture of hard-wired
  thread local variable fields and extensible indexed variables
- FastThreadLocal now uses InternalThreadLocalMap to implement a
  thread-local variable.
- Added ThreadDeathWatcher.unwatch() so that PooledByteBufAllocator
  tells it to stop watching when its thread-local cache has been freed
  by FastThreadLocal.removeAll().
- Added FastThreadLocalTest to ensure that removeAll() works
- Added microbenchmark for FastThreadLocal and JDK ThreadLocal
- Upgraded to JMH 0.9

Result:

- A user can remove all thread-local variables Netty created, as long as
  he or she did not exit from the current thread. (Note that there's no
  way to remove a thread-local variable from outside of the thread.)
- FastThreadLocal exposes more useful operations such as isSet() because
  we always implement a thread local variable via InternalThreadLocalMap
  instead of falling back to JDK ThreadLocal.
- FastThreadLocalBenchmark shows that this change improves the
  performance of FastThreadLocal even more.
2014-06-19 21:13:55 +09:00
belliottsmith
2a2a21ec59 Introduce FastThreadLocal which uses an EnumMap and a predefined fixed set of possible thread locals
Motivation:
Provide a faster ThreadLocal implementation

Modification:
Add a "FastThreadLocal" which uses an EnumMap and a predefined fixed set of possible thread locals (all of the static instances created by netty) that is around 10-20% faster than standard ThreadLocal in my benchmarks (and can be seen having an effect in the direct PooledByteBufAllocator benchmark that uses the DEFAULT ByteBufAllocator which uses this FastThreadLocal, as opposed to normal instantiations that do not, and in the new RecyclableArrayList benchmark);

Result:
Improved performance
2014-06-13 10:56:18 +02:00
Trustin Lee
9fe9710315 Rename "io.netty.recycler.maxCapacity.default" to "io.netty.recycler.maxCapacity"
Motivation:

'io.netty.recycler.maxCapacity.default' is the only property for recycler's default maximum capacity, so having the 'default' suffix only increases the length of the property name.

Modifications:

Rename "io.netty.recycler.maxCapacity.default" to "io.netty.recycler.maxCapacity"

Result:

Shorter system property name. The future addition of system properties, such as io.netty.recycler.maxCapacity.outboundBuffer, are not confusing either.
2014-03-18 16:26:16 +09:00
Norman Maurer
50e95383a3 Fix checkstyle errors introduced by f0d1bbd63e 2014-03-12 12:41:06 +01:00
Trustin Lee
e57cf9d201 Add capacity limit to Recycler / Optimize when assertion is off
Motivation:

- As reported recently [1], Recycler's thread-local object pool has unbounded capacity which is a potential problem.
- It accesses a hash table on each push and pop for debugging purposes.  We don't really need it besides debugging Netty itself.

Modifications:

- Introduced the maxCapacity constructor parameter to Recycler.  The default default maxCapacity is retrieved from the system property whose default is 256K, which should be plenty for most cases.
- Recycler.Stack.map is now created and accessed only when assertion is enabled for Recycler.

Result:

- Recycler does not grow infinitely anymore.
- If assertion is disabled, Recycler should be much faster.

[1] https://github.com/netty/netty/issues/1841
2014-03-12 18:16:53 +09:00
Trustin Lee
2b84314fdd Add Recycler.Handle.recycle() so that it's possible to recycle an object without an explicit reference to Recycler 2014-02-13 17:24:37 -08:00
Trustin Lee
9449efb9b2 Optimize Recycler.Stack
- No need to use a deque at all
- Increase the initial capacity so that there's no practical chance of capacity expansion
2013-06-10 16:38:57 +09:00
Trustin Lee
14158070bf Revamp the core API to reduce memory footprint and consumption
The API changes made so far turned out to increase the memory footprint
and consumption while our intention was actually decreasing them.

Memory consumption issue:

When there are many connections which does not exchange data frequently,
the old Netty 4 API spent a lot more memory than 3 because it always
allocates per-handler buffer for each connection unless otherwise
explicitly stated by a user.  In a usual real world load, a client
doesn't always send requests without pausing, so the idea of having a
buffer whose life cycle if bound to the life cycle of a connection
didn't work as expected.

Memory footprint issue:

The old Netty 4 API decreased overall memory footprint by a great deal
in many cases.  It was mainly because the old Netty 4 API did not
allocate a new buffer and event object for each read.  Instead, it
created a new buffer for each handler in a pipeline.  This works pretty
well as long as the number of handlers in a pipeline is only a few.
However, for a highly modular application with many handlers which
handles connections which lasts for relatively short period, it actually
makes the memory footprint issue much worse.

Changes:

All in all, this is about retaining all the good changes we made in 4 so
far such as better thread model and going back to the way how we dealt
with message events in 3.

To fix the memory consumption/footprint issue mentioned above, we made a
hard decision to break the backward compatibility again with the
following changes:

- Remove MessageBuf
- Merge Buf into ByteBuf
- Merge ChannelInboundByte/MessageHandler and ChannelStateHandler into ChannelInboundHandler
  - Similar changes were made to the adapter classes
- Merge ChannelOutboundByte/MessageHandler and ChannelOperationHandler into ChannelOutboundHandler
  - Similar changes were made to the adapter classes
- Introduce MessageList which is similar to `MessageEvent` in Netty 3
- Replace inboundBufferUpdated(ctx) with messageReceived(ctx, MessageList)
- Replace flush(ctx, promise) with write(ctx, MessageList, promise)
- Remove ByteToByteEncoder/Decoder/Codec
  - Replaced by MessageToByteEncoder<ByteBuf>, ByteToMessageDecoder<ByteBuf>, and ByteMessageCodec<ByteBuf>
- Merge EmbeddedByteChannel and EmbeddedMessageChannel into EmbeddedChannel
- Add SimpleChannelInboundHandler which is sometimes more useful than
  ChannelInboundHandlerAdapter
- Bring back Channel.isWritable() from Netty 3
- Add ChannelInboundHandler.channelWritabilityChanges() event
- Add RecvByteBufAllocator configuration property
  - Similar to ReceiveBufferSizePredictor in Netty 3
  - Some existing configuration properties such as
    DatagramChannelConfig.receivePacketSize is gone now.
- Remove suspend/resumeIntermediaryDeallocation() in ByteBuf

This change would have been impossible without @normanmaurer's help. He
fixed, ported, and improved many parts of the changes.
2013-06-10 16:10:39 +09:00