13 Commits

Author SHA1 Message Date
Trustin Lee
7f92771496 Fix a bug where Recycler's capacity can increase beyond its maximum
Related: #3166

Motivation:

When a recyclable object created on one thread is returned from another
thread, it is stored in a WeakOrderedQueue.

The objects stored in the WeakOrderedQueue are added back to the stack by
WeakOrderedQueue.transfer() when the owner thread runs out of recyclable
objects.

However, WeakOrderedQueue.transfer() does not have any mechanism that
prevents the stack from growing beyond its maximum capacity.

Modifications:

- Make WeakOrderedQueue.transfer() increase the capacity of the stack
  only up to its maximum
- Add tests for the cases where the recyclable object is returned at the
  non-owner thread
- Fix a bug where Stack.scavengeSome() does not scavenge any objects
  the first time the stack runs out of objects, when its cursor is
  still null.
- Overall clean-up of scavengeSome() and transfer()

Result:

The capacity of Stack never increases beyond its maximum.
2014-12-06 17:58:31 +09:00
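To make the fix concrete, here is a minimal sketch of the idea (with hypothetical names, not Netty's actual transfer() code): when draining returned objects back into the owner's stack, accept at most maxCapacity minus the current size.

    import java.util.ArrayDeque;
    import java.util.Queue;

    final class TransferSketch {
        /**
         * Drains returned objects into the owner's stack, but never lets the
         * stack grow beyond maxCapacity. Anything left over stays in the queue.
         */
        static <T> int transfer(Queue<T> returnedObjects, ArrayDeque<T> stack, int maxCapacity) {
            int room = maxCapacity - stack.size();   // how much the stack may still grow
            int moved = 0;
            T item;
            while (moved < room && (item = returnedObjects.poll()) != null) {
                stack.push(item);
                moved++;
            }
            return moved;
        }
    }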
Amir Szekely
98a533ae44 Don't ignore maxCapacity if it's not a power of 2
Motivation:

This fixes bug #2848, which caused Recycler to become unbounded and cache an infinite number of objects when maxCapacity is not a power of two. This can result in general sluggishness of the application and OutOfMemoryError.

Modifications:

The test for maxCapacity has been moved out of the check that determines whether the buffer has filled. The buffer is now also capped at maxCapacity and cannot grow over it as it jumps from one power of two to the next.

Additionally, a unit test was added to verify maxCapacity is honored even when it's not a power of two.

Result:

With these changes the user is able to use a custom maxCapacity value and not have it ignored. The unit test ensures this bug will not repeat itself.
2014-08-31 09:06:45 +02:00
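A hedged illustration of the class of bug being fixed: with a doubling buffer, an exact-equality capacity check only ever fires when maxCapacity happens to be a power of two, so any other value is effectively ignored. Comparing with >= and clamping the doubled size keeps an arbitrary cap honored. The names below are illustrative, not Netty's internals.

    final class CapacitySketch {
        /**
         * Doubles a capacity but never exceeds maxCapacity, even when
         * maxCapacity is not a power of two (e.g. 1000).
         */
        static int nextCapacity(int currentCapacity, int maxCapacity) {
            if (currentCapacity >= maxCapacity) {
                // An '== maxCapacity' check here would never match a
                // non-power-of-two cap, and the buffer would grow without bound.
                return currentCapacity;
            }
            return Math.min(currentCapacity << 1, maxCapacity);
        }
    }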
Idel Pivnitskiy
dd429b2495 Small fixes and improvements
Motivation:

Fix some typos in Netty.

Modifications:

- Fix potentially dangerous use of non-short-circuit logic in Recycler.transfer(Stack<?>).
- Remove a doubled 'the the' in the javadoc of EmbeddedChannel.
- Write the exception message to the log if we cannot get SOMAXCONN in NetUtil's static block.
2014-07-20 09:37:22 +02:00
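For context, the "non-short-circuit logic" item refers to using & where && is intended on boolean operands. A generic, hypothetical illustration (not the actual Recycler code):

    import java.util.Queue;

    final class ShortCircuitSketch {
        static boolean hasWork(Queue<?> queue) {
            // With the non-short-circuiting '&', both operands are always
            // evaluated, so '!queue.isEmpty()' would throw a NullPointerException
            // when queue == null. '&&' stops as soon as the result is known.
            return queue != null && !queue.isEmpty();
        }
    }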
Trustin Lee
d0912f2709 Fix most inspector warnings
Motivation:

It's good to minimize potentially broken windows.

Modifications:

Fix most inspector warnings from our profile
Update IntObjectHashMap

Result:

Cleaner code
2014-07-02 19:55:07 +09:00
Norman Maurer
030bcaae81 Improve performance of Recycler
Motivation:

Recycler is used in many places to reduce GC pressure, but it is still not as fast as possible because of the internal data structures used.

Modification:

 - Rewrite Recycler to use a WeakOrderQueue, which makes minimal guarantees about ordering and visibility for maximum performance.
 - Recycling the same object multiple times without acquiring it will fail.
 - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler

These changes are based on @belliottsmith 's work that was part of #2504.

Result:

Huge increase in performance.

4.0 branch without this commit:

Benchmark                                                (size)   Mode   Samples        Score  Score error    Units
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    00000  thrpt        20 116026994.130  2763381.305    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    00256  thrpt        20 110823170.627  3007221.464    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    01024  thrpt        20 118290272.413  7143962.304    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    04096  thrpt        20 120560396.523  6483323.228    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    16384  thrpt        20 114726607.428  2960013.108    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    65536  thrpt        20 119385917.899  3172913.684    ops/s
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark

4.0 branch with this commit:

Benchmark                                                (size)   Mode   Samples        Score  Score error    Units
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    00000  thrpt        20 204158855.315  5031432.145    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    00256  thrpt        20 205179685.861  1934137.841    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    01024  thrpt        20 209906801.437  8007811.254    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    04096  thrpt        20 214288320.053  6413126.689    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    16384  thrpt        20 215940902.649  7837706.133    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    65536  thrpt        20 211141994.206  5017868.542    ops/s
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-24 10:47:38 +02:00
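A rough, simplified model of the structure this commit describes (hypothetical names, and stronger ordering guarantees than the real WeakOrderQueue): the owner thread works against its own stack, objects released on other threads land in a concurrent queue, and that queue is drained only when the stack runs dry.

    import java.util.ArrayDeque;
    import java.util.concurrent.ConcurrentLinkedQueue;

    final class SimplePoolSketch<T> {
        private final ArrayDeque<T> ownerStack = new ArrayDeque<T>();   // touched only by the owner thread
        private final ConcurrentLinkedQueue<T> foreignReturns = new ConcurrentLinkedQueue<T>();

        T acquire() {
            T item = ownerStack.poll();
            if (item == null) {
                // Owner ran dry: drain what other threads have returned so far.
                while ((item = foreignReturns.poll()) != null) {
                    ownerStack.push(item);
                }
                item = ownerStack.poll();
            }
            return item;
        }

        void release(T item, boolean calledFromOwnerThread) {
            if (calledFromOwnerThread) {
                ownerStack.push(item);        // cheap, single-threaded path
            } else {
                foreignReturns.offer(item);   // cross-thread return
            }
        }
    }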
Trustin Lee
085a61a310 Refactor FastThreadLocal to simplify TLV management
Motivation:

When Netty runs in a managed environment such as a web application
server, Netty needs to provide an explicit way to remove the
thread-local variables it created to prevent class loader leaks.

FastThreadLocal uses different execution paths for storing a
thread-local variable depending on the type of the current thread.
This increases the complexity of thread-local removal.

Modifications:

- Moved FastThreadLocal and FastThreadLocalThread out of the internal
  package so that a user can use it.
- FastThreadLocal now keeps track of all thread local variables it has
  initialized, and calling FastThreadLocal.removeAll() will remove all
  thread-local variables of the caller thread.
- Added FastThreadLocal.size() for diagnostics and tests
- Introduce InternalThreadLocalMap which is a mixture of hard-wired
  thread local variable fields and extensible indexed variables
- FastThreadLocal now uses InternalThreadLocalMap to implement a
  thread-local variable.
- Added ThreadDeathWatcher.unwatch() so that PooledByteBufAllocator
  tells it to stop watching when its thread-local cache has been freed
  by FastThreadLocal.removeAll().
- Added FastThreadLocalTest to ensure that removeAll() works
- Added microbenchmark for FastThreadLocal and JDK ThreadLocal
- Upgraded to JMH 0.9

Result:

- A user can remove all thread-local variables Netty created, as long as
  he or she did not exit from the current thread. (Note that there's no
  way to remove a thread-local variable from outside of the thread.)
- FastThreadLocal exposes more useful operations such as isSet() because
  we always implement a thread local variable via InternalThreadLocalMap
  instead of falling back to JDK ThreadLocal.
- FastThreadLocalBenchmark shows that this change improves the
  performance of FastThreadLocal even more.
2014-06-19 21:13:55 +09:00
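A usage sketch based on the description above; the exact package and method shapes are assumptions that may differ between Netty versions.

    import io.netty.util.concurrent.FastThreadLocal;

    public final class RequestBufferHolder {
        private static final FastThreadLocal<StringBuilder> BUFFER =
                new FastThreadLocal<StringBuilder>() {
                    @Override
                    protected StringBuilder initialValue() {
                        return new StringBuilder(256);
                    }
                };

        static StringBuilder current() {
            return BUFFER.get();   // fastest when called from a FastThreadLocalThread
        }

        static void cleanUp() {
            // In a managed environment (e.g. on web application undeployment),
            // remove every thread-local variable Netty initialized for the
            // calling thread to avoid class loader leaks.
            FastThreadLocal.removeAll();
        }
    }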
belliottsmith
2a2a21ec59 Introduce FastThreadLocal which uses an EnumMap and a predefined fixed set of possible thread locals
Motivation:
Provide a faster ThreadLocal implementation

Modification:
Add a "FastThreadLocal" which uses an EnumMap and a predefined fixed set of possible thread locals (all of the static instances created by netty) that is around 10-20% faster than standard ThreadLocal in my benchmarks (and can be seen having an effect in the direct PooledByteBufAllocator benchmark that uses the DEFAULT ByteBufAllocator which uses this FastThreadLocal, as opposed to normal instantiations that do not, and in the new RecyclableArrayList benchmark);

Result:
Improved performance
2014-06-13 10:56:18 +02:00
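A rough sketch of the EnumMap idea: a predefined, enumerated set of thread-local slots, stored in one EnumMap per thread so that a single lookup reaches all of the slots. Everything here is hypothetical and simplified, not the original implementation.

    import java.util.EnumMap;

    final class EnumThreadLocalsSketch {
        /** Predefined, fixed set of possible thread-local slots. */
        enum Slot { BYTE_BUF_CACHE, RECYCLER_STACK, CHAR_BUFFER }

        // One EnumMap per thread; EnumMap lookups are simple array accesses.
        private static final ThreadLocal<EnumMap<Slot, Object>> MAP =
                new ThreadLocal<EnumMap<Slot, Object>>() {
                    @Override
                    protected EnumMap<Slot, Object> initialValue() {
                        return new EnumMap<Slot, Object>(Slot.class);
                    }
                };

        @SuppressWarnings("unchecked")
        static <T> T get(Slot slot) {
            return (T) MAP.get().get(slot);
        }

        static void set(Slot slot, Object value) {
            MAP.get().put(slot, value);
        }
    }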
Trustin Lee
9fe9710315 Rename "io.netty.recycler.maxCapacity.default" to "io.netty.recycler.maxCapacity"
Motivation:

'io.netty.recycler.maxCapacity.default' is the only property for recycler's default maximum capacity, so having the 'default' suffix only increases the length of the property name.

Modifications:

Rename "io.netty.recycler.maxCapacity.default" to "io.netty.recycler.maxCapacity"

Result:

Shorter system property name. Future additions of system properties, such as io.netty.recycler.maxCapacity.outboundBuffer, will not be confusing either.
2014-03-18 16:26:16 +09:00
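For reference, the renamed property is an ordinary JVM system property; the value below is only an example.

    public final class RecyclerConfigExample {
        public static void main(String[] args) {
            // Equivalent to passing -Dio.netty.recycler.maxCapacity=1024 on the
            // command line. Must be set before the Recycler class is initialized.
            // 1024 is an arbitrary example value.
            System.setProperty("io.netty.recycler.maxCapacity", "1024");
        }
    }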
Norman Maurer
50e95383a3 Fix checkstyle errors introduced by f0d1bbd63ec910b9c5bccc925bdf0b0f55db1f9c 2014-03-12 12:41:06 +01:00
Trustin Lee
e57cf9d201 Add capacity limit to Recycler / Optimize when assertion is off
Motivation:

- As reported recently [1], Recycler's thread-local object pool has unbounded capacity, which is a potential problem.
- It accesses a hash table on each push and pop for debugging purposes.  We don't really need it except when debugging Netty itself.

Modifications:

- Introduced the maxCapacity constructor parameter to Recycler.  The default maxCapacity is retrieved from a system property and defaults to 256K, which should be plenty for most cases.
- Recycler.Stack.map is now created and accessed only when assertion is enabled for Recycler.

Result:

- Recycler does not grow infinitely anymore.
- If assertion is disabled, Recycler should be much faster.

[1] https://github.com/netty/netty/issues/1841
2014-03-12 18:16:53 +09:00
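A hedged sketch of the "only when assertions are enabled" technique mentioned in the modifications; the field and method names are hypothetical, not Recycler.Stack's actual code.

    import java.util.IdentityHashMap;
    import java.util.Map;

    final class DebugTrackingSketch {
        private static final boolean ASSERTIONS_ENABLED;
        static {
            boolean enabled = false;
            // The assignment inside the assert only runs when the JVM is
            // started with -ea, so 'enabled' stays false in production.
            assert enabled = true;
            ASSERTIONS_ENABLED = enabled;
        }

        // null when assertions are off, so push/pop skip the hash table entirely.
        private final Map<Object, Boolean> pushed =
                ASSERTIONS_ENABLED ? new IdentityHashMap<Object, Boolean>() : null;

        void onPush(Object o) {
            // Evaluated only with -ea, in which case 'pushed' is non-null.
            assert pushed.put(o, Boolean.TRUE) == null : "recycled the same object twice";
        }

        void onPop(Object o) {
            if (pushed != null) {
                pushed.remove(o);
            }
        }
    }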
Trustin Lee
2b84314fdd Add Recycler.Handle.recycle() so that it's possible to recycle an object without an explicit reference to Recycler 2014-02-13 17:24:37 -08:00
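A usage sketch of the self-recycling pattern this enables; the generic Handle shape shown below matches later Netty versions and is an assumption here, since the signatures changed over time.

    import io.netty.util.Recycler;

    public final class PooledRequest {
        private static final Recycler<PooledRequest> RECYCLER = new Recycler<PooledRequest>() {
            @Override
            protected PooledRequest newObject(Handle<PooledRequest> handle) {
                return new PooledRequest(handle);
            }
        };

        private final Recycler.Handle<PooledRequest> handle;

        private PooledRequest(Recycler.Handle<PooledRequest> handle) {
            this.handle = handle;
        }

        public static PooledRequest newInstance() {
            return RECYCLER.get();
        }

        public void recycle() {
            handle.recycle(this);   // no reference to the Recycler needed here
        }
    }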
Trustin Lee
9449efb9b2 Optimize Recycler.Stack
- No need to use a deque at all
- Increase the initial capacity so that there's no practical chance of capacity expansion
2013-06-10 16:38:57 +09:00
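A small hypothetical sketch of the "no deque needed" idea: a plain object array used as a LIFO stack, sized generously up front so growth is unlikely in practice.

    import java.util.Arrays;

    final class ArrayStackSketch<T> {
        private Object[] elements;
        private int size;

        ArrayStackSketch(int initialCapacity) {
            // Sized generously up front so that copyOf() is rarely, if ever, needed.
            elements = new Object[initialCapacity];
        }

        void push(T item) {
            if (size == elements.length) {
                elements = Arrays.copyOf(elements, size << 1);
            }
            elements[size++] = item;
        }

        @SuppressWarnings("unchecked")
        T pop() {
            if (size == 0) {
                return null;
            }
            T item = (T) elements[--size];
            elements[size] = null;   // drop the reference so it can be collected
            return item;
        }
    }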
Trustin Lee
14158070bf Revamp the core API to reduce memory footprint and consumption
The API changes made so far turned out to increase the memory footprint
and consumption, while our intention was actually to decrease them.

Memory consumption issue:

When there are many connections which do not exchange data frequently,
the old Netty 4 API spent a lot more memory than Netty 3 because it
always allocates a per-handler buffer for each connection unless
otherwise explicitly stated by a user.  In a usual real-world load, a
client doesn't always send requests without pausing, so the idea of
having a buffer whose life cycle is bound to the life cycle of a
connection didn't work as expected.

Memory footprint issue:

The old Netty 4 API decreased overall memory footprint by a great deal
in many cases.  This was mainly because the old Netty 4 API did not
allocate a new buffer and event object for each read.  Instead, it
created a new buffer for each handler in a pipeline.  This works pretty
well as long as the number of handlers in a pipeline is only a few.
However, for a highly modular application with many handlers which
handle connections that last for a relatively short period, it actually
makes the memory footprint issue much worse.

Changes:

All in all, this is about retaining all the good changes we made in 4 so
far, such as the better thread model, while going back to the way we
dealt with message events in 3.

To fix the memory consumption/footprint issue mentioned above, we made a
hard decision to break the backward compatibility again with the
following changes:

- Remove MessageBuf
- Merge Buf into ByteBuf
- Merge ChannelInboundByte/MessageHandler and ChannelStateHandler into ChannelInboundHandler
  - Similar changes were made to the adapter classes
- Merge ChannelOutboundByte/MessageHandler and ChannelOperationHandler into ChannelOutboundHandler
  - Similar changes were made to the adapter classes
- Introduce MessageList which is similar to `MessageEvent` in Netty 3
- Replace inboundBufferUpdated(ctx) with messageReceived(ctx, MessageList)
- Replace flush(ctx, promise) with write(ctx, MessageList, promise)
- Remove ByteToByteEncoder/Decoder/Codec
  - Replaced by MessageToByteEncoder<ByteBuf>, ByteToMessageDecoder<ByteBuf>, and ByteMessageCodec<ByteBuf>
- Merge EmbeddedByteChannel and EmbeddedMessageChannel into EmbeddedChannel
- Add SimpleChannelInboundHandler which is sometimes more useful than
  ChannelInboundHandlerAdapter
- Bring back Channel.isWritable() from Netty 3
- Add ChannelInboundHandler.channelWritabilityChanged() event
- Add RecvByteBufAllocator configuration property
  - Similar to ReceiveBufferSizePredictor in Netty 3
  - Some existing configuration properties, such as
    DatagramChannelConfig.receivePacketSize, are gone now.
- Remove suspend/resumeIntermediaryDeallocation() in ByteBuf

This change would have been impossible without @normanmaurer's help. He
fixed, ported, and improved many parts of the changes.
2013-06-10 16:10:39 +09:00