netty5/common/src/main/java/io/netty/util/Recycler.java

662 lines
25 KiB
Java
Raw Normal View History

Revamp the core API to reduce memory footprint and consumption The API changes made so far turned out to increase the memory footprint and consumption while our intention was actually decreasing them. Memory consumption issue: When there are many connections which does not exchange data frequently, the old Netty 4 API spent a lot more memory than 3 because it always allocates per-handler buffer for each connection unless otherwise explicitly stated by a user. In a usual real world load, a client doesn't always send requests without pausing, so the idea of having a buffer whose life cycle if bound to the life cycle of a connection didn't work as expected. Memory footprint issue: The old Netty 4 API decreased overall memory footprint by a great deal in many cases. It was mainly because the old Netty 4 API did not allocate a new buffer and event object for each read. Instead, it created a new buffer for each handler in a pipeline. This works pretty well as long as the number of handlers in a pipeline is only a few. However, for a highly modular application with many handlers which handles connections which lasts for relatively short period, it actually makes the memory footprint issue much worse. Changes: All in all, this is about retaining all the good changes we made in 4 so far such as better thread model and going back to the way how we dealt with message events in 3. To fix the memory consumption/footprint issue mentioned above, we made a hard decision to break the backward compatibility again with the following changes: - Remove MessageBuf - Merge Buf into ByteBuf - Merge ChannelInboundByte/MessageHandler and ChannelStateHandler into ChannelInboundHandler - Similar changes were made to the adapter classes - Merge ChannelOutboundByte/MessageHandler and ChannelOperationHandler into ChannelOutboundHandler - Similar changes were made to the adapter classes - Introduce MessageList which is similar to `MessageEvent` in Netty 3 - Replace inboundBufferUpdated(ctx) with messageReceived(ctx, MessageList) - Replace flush(ctx, promise) with write(ctx, MessageList, promise) - Remove ByteToByteEncoder/Decoder/Codec - Replaced by MessageToByteEncoder<ByteBuf>, ByteToMessageDecoder<ByteBuf>, and ByteMessageCodec<ByteBuf> - Merge EmbeddedByteChannel and EmbeddedMessageChannel into EmbeddedChannel - Add SimpleChannelInboundHandler which is sometimes more useful than ChannelInboundHandlerAdapter - Bring back Channel.isWritable() from Netty 3 - Add ChannelInboundHandler.channelWritabilityChanges() event - Add RecvByteBufAllocator configuration property - Similar to ReceiveBufferSizePredictor in Netty 3 - Some existing configuration properties such as DatagramChannelConfig.receivePacketSize is gone now. - Remove suspend/resumeIntermediaryDeallocation() in ByteBuf This change would have been impossible without @normanmaurer's help. He fixed, ported, and improved many parts of the changes.
2013-05-28 13:40:19 +02:00
/*
* Copyright 2013 The Netty Project
*
* The Netty Project licenses this file to you under the Apache License,
* version 2.0 (the "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at:
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
* WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
* License for the specific language governing permissions and limitations
* under the License.
*/
package io.netty.util;
Refactor FastThreadLocal to simplify TLV management Motivation: When Netty runs in a managed environment such as web application server, Netty needs to provide an explicit way to remove the thread-local variables it created to prevent class loader leaks. FastThreadLocal uses different execution paths for storing a thread-local variable depending on the type of the current thread. It increases the complexity of thread-local removal. Modifications: - Moved FastThreadLocal and FastThreadLocalThread out of the internal package so that a user can use it. - FastThreadLocal now keeps track of all thread local variables it has initialized, and calling FastThreadLocal.removeAll() will remove all thread-local variables of the caller thread. - Added FastThreadLocal.size() for diagnostics and tests - Introduce InternalThreadLocalMap which is a mixture of hard-wired thread local variable fields and extensible indexed variables - FastThreadLocal now uses InternalThreadLocalMap to implement a thread-local variable. - Added ThreadDeathWatcher.unwatch() so that PooledByteBufAllocator tells it to stop watching when its thread-local cache has been freed by FastThreadLocal.removeAll(). - Added FastThreadLocalTest to ensure that removeAll() works - Added microbenchmark for FastThreadLocal and JDK ThreadLocal - Upgraded to JMH 0.9 Result: - A user can remove all thread-local variables Netty created, as long as he or she did not exit from the current thread. (Note that there's no way to remove a thread-local variable from outside of the thread.) - FastThreadLocal exposes more useful operations such as isSet() because we always implement a thread local variable via InternalThreadLocalMap instead of falling back to JDK ThreadLocal. - FastThreadLocalBenchmark shows that this change improves the performance of FastThreadLocal even more.
2014-06-17 11:37:58 +02:00
import io.netty.util.concurrent.FastThreadLocal;
import io.netty.util.internal.SystemPropertyUtil;
import io.netty.util.internal.logging.InternalLogger;
import io.netty.util.internal.logging.InternalLoggerFactory;
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
import java.lang.ref.WeakReference;
import java.util.Arrays;
Revamp the core API to reduce memory footprint and consumption The API changes made so far turned out to increase the memory footprint and consumption while our intention was actually decreasing them. Memory consumption issue: When there are many connections which does not exchange data frequently, the old Netty 4 API spent a lot more memory than 3 because it always allocates per-handler buffer for each connection unless otherwise explicitly stated by a user. In a usual real world load, a client doesn't always send requests without pausing, so the idea of having a buffer whose life cycle if bound to the life cycle of a connection didn't work as expected. Memory footprint issue: The old Netty 4 API decreased overall memory footprint by a great deal in many cases. It was mainly because the old Netty 4 API did not allocate a new buffer and event object for each read. Instead, it created a new buffer for each handler in a pipeline. This works pretty well as long as the number of handlers in a pipeline is only a few. However, for a highly modular application with many handlers which handles connections which lasts for relatively short period, it actually makes the memory footprint issue much worse. Changes: All in all, this is about retaining all the good changes we made in 4 so far such as better thread model and going back to the way how we dealt with message events in 3. To fix the memory consumption/footprint issue mentioned above, we made a hard decision to break the backward compatibility again with the following changes: - Remove MessageBuf - Merge Buf into ByteBuf - Merge ChannelInboundByte/MessageHandler and ChannelStateHandler into ChannelInboundHandler - Similar changes were made to the adapter classes - Merge ChannelOutboundByte/MessageHandler and ChannelOperationHandler into ChannelOutboundHandler - Similar changes were made to the adapter classes - Introduce MessageList which is similar to `MessageEvent` in Netty 3 - Replace inboundBufferUpdated(ctx) with messageReceived(ctx, MessageList) - Replace flush(ctx, promise) with write(ctx, MessageList, promise) - Remove ByteToByteEncoder/Decoder/Codec - Replaced by MessageToByteEncoder<ByteBuf>, ByteToMessageDecoder<ByteBuf>, and ByteMessageCodec<ByteBuf> - Merge EmbeddedByteChannel and EmbeddedMessageChannel into EmbeddedChannel - Add SimpleChannelInboundHandler which is sometimes more useful than ChannelInboundHandlerAdapter - Bring back Channel.isWritable() from Netty 3 - Add ChannelInboundHandler.channelWritabilityChanges() event - Add RecvByteBufAllocator configuration property - Similar to ReceiveBufferSizePredictor in Netty 3 - Some existing configuration properties such as DatagramChannelConfig.receivePacketSize is gone now. - Remove suspend/resumeIntermediaryDeallocation() in ByteBuf This change would have been impossible without @normanmaurer's help. He fixed, ported, and improved many parts of the changes.
2013-05-28 13:40:19 +02:00
import java.util.Map;
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
import java.util.WeakHashMap;
import java.util.concurrent.atomic.AtomicInteger;
Revamp the core API to reduce memory footprint and consumption The API changes made so far turned out to increase the memory footprint and consumption while our intention was actually decreasing them. Memory consumption issue: When there are many connections which does not exchange data frequently, the old Netty 4 API spent a lot more memory than 3 because it always allocates per-handler buffer for each connection unless otherwise explicitly stated by a user. In a usual real world load, a client doesn't always send requests without pausing, so the idea of having a buffer whose life cycle if bound to the life cycle of a connection didn't work as expected. Memory footprint issue: The old Netty 4 API decreased overall memory footprint by a great deal in many cases. It was mainly because the old Netty 4 API did not allocate a new buffer and event object for each read. Instead, it created a new buffer for each handler in a pipeline. This works pretty well as long as the number of handlers in a pipeline is only a few. However, for a highly modular application with many handlers which handles connections which lasts for relatively short period, it actually makes the memory footprint issue much worse. Changes: All in all, this is about retaining all the good changes we made in 4 so far such as better thread model and going back to the way how we dealt with message events in 3. To fix the memory consumption/footprint issue mentioned above, we made a hard decision to break the backward compatibility again with the following changes: - Remove MessageBuf - Merge Buf into ByteBuf - Merge ChannelInboundByte/MessageHandler and ChannelStateHandler into ChannelInboundHandler - Similar changes were made to the adapter classes - Merge ChannelOutboundByte/MessageHandler and ChannelOperationHandler into ChannelOutboundHandler - Similar changes were made to the adapter classes - Introduce MessageList which is similar to `MessageEvent` in Netty 3 - Replace inboundBufferUpdated(ctx) with messageReceived(ctx, MessageList) - Replace flush(ctx, promise) with write(ctx, MessageList, promise) - Remove ByteToByteEncoder/Decoder/Codec - Replaced by MessageToByteEncoder<ByteBuf>, ByteToMessageDecoder<ByteBuf>, and ByteMessageCodec<ByteBuf> - Merge EmbeddedByteChannel and EmbeddedMessageChannel into EmbeddedChannel - Add SimpleChannelInboundHandler which is sometimes more useful than ChannelInboundHandlerAdapter - Bring back Channel.isWritable() from Netty 3 - Add ChannelInboundHandler.channelWritabilityChanges() event - Add RecvByteBufAllocator configuration property - Similar to ReceiveBufferSizePredictor in Netty 3 - Some existing configuration properties such as DatagramChannelConfig.receivePacketSize is gone now. - Remove suspend/resumeIntermediaryDeallocation() in ByteBuf This change would have been impossible without @normanmaurer's help. He fixed, ported, and improved many parts of the changes.
2013-05-28 13:40:19 +02:00
import static io.netty.util.internal.MathUtil.safeFindNextPositivePowerOfTwo;
import static java.lang.Math.max;
import static java.lang.Math.min;
Revamp the core API to reduce memory footprint and consumption The API changes made so far turned out to increase the memory footprint and consumption while our intention was actually decreasing them. Memory consumption issue: When there are many connections which does not exchange data frequently, the old Netty 4 API spent a lot more memory than 3 because it always allocates per-handler buffer for each connection unless otherwise explicitly stated by a user. In a usual real world load, a client doesn't always send requests without pausing, so the idea of having a buffer whose life cycle if bound to the life cycle of a connection didn't work as expected. Memory footprint issue: The old Netty 4 API decreased overall memory footprint by a great deal in many cases. It was mainly because the old Netty 4 API did not allocate a new buffer and event object for each read. Instead, it created a new buffer for each handler in a pipeline. This works pretty well as long as the number of handlers in a pipeline is only a few. However, for a highly modular application with many handlers which handles connections which lasts for relatively short period, it actually makes the memory footprint issue much worse. Changes: All in all, this is about retaining all the good changes we made in 4 so far such as better thread model and going back to the way how we dealt with message events in 3. To fix the memory consumption/footprint issue mentioned above, we made a hard decision to break the backward compatibility again with the following changes: - Remove MessageBuf - Merge Buf into ByteBuf - Merge ChannelInboundByte/MessageHandler and ChannelStateHandler into ChannelInboundHandler - Similar changes were made to the adapter classes - Merge ChannelOutboundByte/MessageHandler and ChannelOperationHandler into ChannelOutboundHandler - Similar changes were made to the adapter classes - Introduce MessageList which is similar to `MessageEvent` in Netty 3 - Replace inboundBufferUpdated(ctx) with messageReceived(ctx, MessageList) - Replace flush(ctx, promise) with write(ctx, MessageList, promise) - Remove ByteToByteEncoder/Decoder/Codec - Replaced by MessageToByteEncoder<ByteBuf>, ByteToMessageDecoder<ByteBuf>, and ByteMessageCodec<ByteBuf> - Merge EmbeddedByteChannel and EmbeddedMessageChannel into EmbeddedChannel - Add SimpleChannelInboundHandler which is sometimes more useful than ChannelInboundHandlerAdapter - Bring back Channel.isWritable() from Netty 3 - Add ChannelInboundHandler.channelWritabilityChanges() event - Add RecvByteBufAllocator configuration property - Similar to ReceiveBufferSizePredictor in Netty 3 - Some existing configuration properties such as DatagramChannelConfig.receivePacketSize is gone now. - Remove suspend/resumeIntermediaryDeallocation() in ByteBuf This change would have been impossible without @normanmaurer's help. He fixed, ported, and improved many parts of the changes.
2013-05-28 13:40:19 +02:00
/**
* Light-weight object pool based on a thread-local stack.
*
* @param <T> the type of the pooled object
*/
public abstract class Recycler<T> {
private static final InternalLogger logger = InternalLoggerFactory.getInstance(Recycler.class);
@SuppressWarnings("rawtypes")
private static final Handle NOOP_HANDLE = new Handle() {
@Override
public void recycle(Object object) {
// NOOP
}
};
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
private static final AtomicInteger ID_GENERATOR = new AtomicInteger(Integer.MIN_VALUE);
private static final int OWN_THREAD_ID = ID_GENERATOR.getAndIncrement();
private static final int DEFAULT_INITIAL_MAX_CAPACITY_PER_THREAD = 4 * 1024; // Use 4k instances as default.
private static final int DEFAULT_MAX_CAPACITY_PER_THREAD;
private static final int INITIAL_CAPACITY;
private static final int MAX_SHARED_CAPACITY_FACTOR;
private static final int MAX_DELAYED_QUEUES_PER_THREAD;
private static final int LINK_CAPACITY;
private static final int RATIO;
static {
// In the future, we might have different maxCapacity for different object types.
// e.g. io.netty.recycler.maxCapacity.writeTask
// io.netty.recycler.maxCapacity.outboundBuffer
int maxCapacityPerThread = SystemPropertyUtil.getInt("io.netty.recycler.maxCapacityPerThread",
SystemPropertyUtil.getInt("io.netty.recycler.maxCapacity", DEFAULT_INITIAL_MAX_CAPACITY_PER_THREAD));
if (maxCapacityPerThread < 0) {
maxCapacityPerThread = DEFAULT_INITIAL_MAX_CAPACITY_PER_THREAD;
}
DEFAULT_MAX_CAPACITY_PER_THREAD = maxCapacityPerThread;
MAX_SHARED_CAPACITY_FACTOR = max(2,
SystemPropertyUtil.getInt("io.netty.recycler.maxSharedCapacityFactor",
2));
MAX_DELAYED_QUEUES_PER_THREAD = max(0,
SystemPropertyUtil.getInt("io.netty.recycler.maxDelayedQueuesPerThread",
// We use the same value as default EventLoop number
Enable configuring available processors Motivation: In cases when an application is running in a container or is otherwise constrained to the number of processors that it is using, the JVM invocation Runtime#availableProcessors will not return the constrained value but rather the number of processors available to the virtual machine. Netty uses this number in sizing various resources. Additionally, some applications will constrain the number of threads that they are using independenly of the number of processors available on the system. Thus, applications should have a way to globally configure the number of processors. Modifications: Rather than invoking Runtime#availableProcessors, Netty should rely on a method that enables configuration when the JVM is started or by the application. This commit exposes a new class NettyRuntime for enabling such configuraiton. This value can only be set once. Its default value is Runtime#availableProcessors so that there is no visible change to existing applications, but enables configuring either a system property or configuring during application startup (e.g., based on settings used to configure the application). Additionally, we introduce the usage of forbidden-apis to prevent future uses of Runtime#availableProcessors from creeping. Future work should enable the bundled signatures and clean up uses of deprecated and other forbidden methods. Result: Netty can be configured to not use the underlying number of processors, but rather the constrained number of processors.
2017-01-16 18:36:32 +01:00
NettyRuntime.availableProcessors() * 2));
LINK_CAPACITY = safeFindNextPositivePowerOfTwo(
max(SystemPropertyUtil.getInt("io.netty.recycler.linkCapacity", 16), 16));
// By default we allow one push to a Recycler for each 8th try on handles that were never recycled before.
// This should help to slowly increase the capacity of the recycler while not be too sensitive to allocation
// bursts.
RATIO = safeFindNextPositivePowerOfTwo(SystemPropertyUtil.getInt("io.netty.recycler.ratio", 8));
if (logger.isDebugEnabled()) {
if (DEFAULT_MAX_CAPACITY_PER_THREAD == 0) {
logger.debug("-Dio.netty.recycler.maxCapacityPerThread: disabled");
logger.debug("-Dio.netty.recycler.maxSharedCapacityFactor: disabled");
logger.debug("-Dio.netty.recycler.linkCapacity: disabled");
logger.debug("-Dio.netty.recycler.ratio: disabled");
} else {
logger.debug("-Dio.netty.recycler.maxCapacityPerThread: {}", DEFAULT_MAX_CAPACITY_PER_THREAD);
logger.debug("-Dio.netty.recycler.maxSharedCapacityFactor: {}", MAX_SHARED_CAPACITY_FACTOR);
logger.debug("-Dio.netty.recycler.linkCapacity: {}", LINK_CAPACITY);
logger.debug("-Dio.netty.recycler.ratio: {}", RATIO);
}
}
INITIAL_CAPACITY = min(DEFAULT_MAX_CAPACITY_PER_THREAD, 256);
}
private final int maxCapacityPerThread;
private final int maxSharedCapacityFactor;
private final int ratioMask;
private final int maxDelayedQueuesPerThread;
Refactor FastThreadLocal to simplify TLV management Motivation: When Netty runs in a managed environment such as web application server, Netty needs to provide an explicit way to remove the thread-local variables it created to prevent class loader leaks. FastThreadLocal uses different execution paths for storing a thread-local variable depending on the type of the current thread. It increases the complexity of thread-local removal. Modifications: - Moved FastThreadLocal and FastThreadLocalThread out of the internal package so that a user can use it. - FastThreadLocal now keeps track of all thread local variables it has initialized, and calling FastThreadLocal.removeAll() will remove all thread-local variables of the caller thread. - Added FastThreadLocal.size() for diagnostics and tests - Introduce InternalThreadLocalMap which is a mixture of hard-wired thread local variable fields and extensible indexed variables - FastThreadLocal now uses InternalThreadLocalMap to implement a thread-local variable. - Added ThreadDeathWatcher.unwatch() so that PooledByteBufAllocator tells it to stop watching when its thread-local cache has been freed by FastThreadLocal.removeAll(). - Added FastThreadLocalTest to ensure that removeAll() works - Added microbenchmark for FastThreadLocal and JDK ThreadLocal - Upgraded to JMH 0.9 Result: - A user can remove all thread-local variables Netty created, as long as he or she did not exit from the current thread. (Note that there's no way to remove a thread-local variable from outside of the thread.) - FastThreadLocal exposes more useful operations such as isSet() because we always implement a thread local variable via InternalThreadLocalMap instead of falling back to JDK ThreadLocal. - FastThreadLocalBenchmark shows that this change improves the performance of FastThreadLocal even more.
2014-06-17 11:37:58 +02:00
private final FastThreadLocal<Stack<T>> threadLocal = new FastThreadLocal<Stack<T>>() {
Revamp the core API to reduce memory footprint and consumption The API changes made so far turned out to increase the memory footprint and consumption while our intention was actually decreasing them. Memory consumption issue: When there are many connections which does not exchange data frequently, the old Netty 4 API spent a lot more memory than 3 because it always allocates per-handler buffer for each connection unless otherwise explicitly stated by a user. In a usual real world load, a client doesn't always send requests without pausing, so the idea of having a buffer whose life cycle if bound to the life cycle of a connection didn't work as expected. Memory footprint issue: The old Netty 4 API decreased overall memory footprint by a great deal in many cases. It was mainly because the old Netty 4 API did not allocate a new buffer and event object for each read. Instead, it created a new buffer for each handler in a pipeline. This works pretty well as long as the number of handlers in a pipeline is only a few. However, for a highly modular application with many handlers which handles connections which lasts for relatively short period, it actually makes the memory footprint issue much worse. Changes: All in all, this is about retaining all the good changes we made in 4 so far such as better thread model and going back to the way how we dealt with message events in 3. To fix the memory consumption/footprint issue mentioned above, we made a hard decision to break the backward compatibility again with the following changes: - Remove MessageBuf - Merge Buf into ByteBuf - Merge ChannelInboundByte/MessageHandler and ChannelStateHandler into ChannelInboundHandler - Similar changes were made to the adapter classes - Merge ChannelOutboundByte/MessageHandler and ChannelOperationHandler into ChannelOutboundHandler - Similar changes were made to the adapter classes - Introduce MessageList which is similar to `MessageEvent` in Netty 3 - Replace inboundBufferUpdated(ctx) with messageReceived(ctx, MessageList) - Replace flush(ctx, promise) with write(ctx, MessageList, promise) - Remove ByteToByteEncoder/Decoder/Codec - Replaced by MessageToByteEncoder<ByteBuf>, ByteToMessageDecoder<ByteBuf>, and ByteMessageCodec<ByteBuf> - Merge EmbeddedByteChannel and EmbeddedMessageChannel into EmbeddedChannel - Add SimpleChannelInboundHandler which is sometimes more useful than ChannelInboundHandlerAdapter - Bring back Channel.isWritable() from Netty 3 - Add ChannelInboundHandler.channelWritabilityChanges() event - Add RecvByteBufAllocator configuration property - Similar to ReceiveBufferSizePredictor in Netty 3 - Some existing configuration properties such as DatagramChannelConfig.receivePacketSize is gone now. - Remove suspend/resumeIntermediaryDeallocation() in ByteBuf This change would have been impossible without @normanmaurer's help. He fixed, ported, and improved many parts of the changes.
2013-05-28 13:40:19 +02:00
@Override
protected Stack<T> initialValue() {
return new Stack<T>(Recycler.this, Thread.currentThread(), maxCapacityPerThread, maxSharedCapacityFactor,
ratioMask, maxDelayedQueuesPerThread);
Revamp the core API to reduce memory footprint and consumption The API changes made so far turned out to increase the memory footprint and consumption while our intention was actually decreasing them. Memory consumption issue: When there are many connections which does not exchange data frequently, the old Netty 4 API spent a lot more memory than 3 because it always allocates per-handler buffer for each connection unless otherwise explicitly stated by a user. In a usual real world load, a client doesn't always send requests without pausing, so the idea of having a buffer whose life cycle if bound to the life cycle of a connection didn't work as expected. Memory footprint issue: The old Netty 4 API decreased overall memory footprint by a great deal in many cases. It was mainly because the old Netty 4 API did not allocate a new buffer and event object for each read. Instead, it created a new buffer for each handler in a pipeline. This works pretty well as long as the number of handlers in a pipeline is only a few. However, for a highly modular application with many handlers which handles connections which lasts for relatively short period, it actually makes the memory footprint issue much worse. Changes: All in all, this is about retaining all the good changes we made in 4 so far such as better thread model and going back to the way how we dealt with message events in 3. To fix the memory consumption/footprint issue mentioned above, we made a hard decision to break the backward compatibility again with the following changes: - Remove MessageBuf - Merge Buf into ByteBuf - Merge ChannelInboundByte/MessageHandler and ChannelStateHandler into ChannelInboundHandler - Similar changes were made to the adapter classes - Merge ChannelOutboundByte/MessageHandler and ChannelOperationHandler into ChannelOutboundHandler - Similar changes were made to the adapter classes - Introduce MessageList which is similar to `MessageEvent` in Netty 3 - Replace inboundBufferUpdated(ctx) with messageReceived(ctx, MessageList) - Replace flush(ctx, promise) with write(ctx, MessageList, promise) - Remove ByteToByteEncoder/Decoder/Codec - Replaced by MessageToByteEncoder<ByteBuf>, ByteToMessageDecoder<ByteBuf>, and ByteMessageCodec<ByteBuf> - Merge EmbeddedByteChannel and EmbeddedMessageChannel into EmbeddedChannel - Add SimpleChannelInboundHandler which is sometimes more useful than ChannelInboundHandlerAdapter - Bring back Channel.isWritable() from Netty 3 - Add ChannelInboundHandler.channelWritabilityChanges() event - Add RecvByteBufAllocator configuration property - Similar to ReceiveBufferSizePredictor in Netty 3 - Some existing configuration properties such as DatagramChannelConfig.receivePacketSize is gone now. - Remove suspend/resumeIntermediaryDeallocation() in ByteBuf This change would have been impossible without @normanmaurer's help. He fixed, ported, and improved many parts of the changes.
2013-05-28 13:40:19 +02:00
}
@Override
protected void onRemoval(Stack<T> value) {
// Let us remove the WeakOrderQueue from the WeakHashMap directly if its safe to remove some overhead
if (value.threadRef.get() == Thread.currentThread()) {
if (DELAYED_RECYCLED.isSet()) {
DELAYED_RECYCLED.get().remove(value);
}
}
}
Revamp the core API to reduce memory footprint and consumption The API changes made so far turned out to increase the memory footprint and consumption while our intention was actually decreasing them. Memory consumption issue: When there are many connections which does not exchange data frequently, the old Netty 4 API spent a lot more memory than 3 because it always allocates per-handler buffer for each connection unless otherwise explicitly stated by a user. In a usual real world load, a client doesn't always send requests without pausing, so the idea of having a buffer whose life cycle if bound to the life cycle of a connection didn't work as expected. Memory footprint issue: The old Netty 4 API decreased overall memory footprint by a great deal in many cases. It was mainly because the old Netty 4 API did not allocate a new buffer and event object for each read. Instead, it created a new buffer for each handler in a pipeline. This works pretty well as long as the number of handlers in a pipeline is only a few. However, for a highly modular application with many handlers which handles connections which lasts for relatively short period, it actually makes the memory footprint issue much worse. Changes: All in all, this is about retaining all the good changes we made in 4 so far such as better thread model and going back to the way how we dealt with message events in 3. To fix the memory consumption/footprint issue mentioned above, we made a hard decision to break the backward compatibility again with the following changes: - Remove MessageBuf - Merge Buf into ByteBuf - Merge ChannelInboundByte/MessageHandler and ChannelStateHandler into ChannelInboundHandler - Similar changes were made to the adapter classes - Merge ChannelOutboundByte/MessageHandler and ChannelOperationHandler into ChannelOutboundHandler - Similar changes were made to the adapter classes - Introduce MessageList which is similar to `MessageEvent` in Netty 3 - Replace inboundBufferUpdated(ctx) with messageReceived(ctx, MessageList) - Replace flush(ctx, promise) with write(ctx, MessageList, promise) - Remove ByteToByteEncoder/Decoder/Codec - Replaced by MessageToByteEncoder<ByteBuf>, ByteToMessageDecoder<ByteBuf>, and ByteMessageCodec<ByteBuf> - Merge EmbeddedByteChannel and EmbeddedMessageChannel into EmbeddedChannel - Add SimpleChannelInboundHandler which is sometimes more useful than ChannelInboundHandlerAdapter - Bring back Channel.isWritable() from Netty 3 - Add ChannelInboundHandler.channelWritabilityChanges() event - Add RecvByteBufAllocator configuration property - Similar to ReceiveBufferSizePredictor in Netty 3 - Some existing configuration properties such as DatagramChannelConfig.receivePacketSize is gone now. - Remove suspend/resumeIntermediaryDeallocation() in ByteBuf This change would have been impossible without @normanmaurer's help. He fixed, ported, and improved many parts of the changes.
2013-05-28 13:40:19 +02:00
};
protected Recycler() {
this(DEFAULT_MAX_CAPACITY_PER_THREAD);
}
protected Recycler(int maxCapacityPerThread) {
this(maxCapacityPerThread, MAX_SHARED_CAPACITY_FACTOR);
}
protected Recycler(int maxCapacityPerThread, int maxSharedCapacityFactor) {
this(maxCapacityPerThread, maxSharedCapacityFactor, RATIO, MAX_DELAYED_QUEUES_PER_THREAD);
}
protected Recycler(int maxCapacityPerThread, int maxSharedCapacityFactor,
int ratio, int maxDelayedQueuesPerThread) {
ratioMask = safeFindNextPositivePowerOfTwo(ratio) - 1;
if (maxCapacityPerThread <= 0) {
this.maxCapacityPerThread = 0;
this.maxSharedCapacityFactor = 1;
this.maxDelayedQueuesPerThread = 0;
} else {
this.maxCapacityPerThread = maxCapacityPerThread;
this.maxSharedCapacityFactor = max(1, maxSharedCapacityFactor);
this.maxDelayedQueuesPerThread = max(0, maxDelayedQueuesPerThread);
}
}
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
@SuppressWarnings("unchecked")
Revamp the core API to reduce memory footprint and consumption The API changes made so far turned out to increase the memory footprint and consumption while our intention was actually decreasing them. Memory consumption issue: When there are many connections which does not exchange data frequently, the old Netty 4 API spent a lot more memory than 3 because it always allocates per-handler buffer for each connection unless otherwise explicitly stated by a user. In a usual real world load, a client doesn't always send requests without pausing, so the idea of having a buffer whose life cycle if bound to the life cycle of a connection didn't work as expected. Memory footprint issue: The old Netty 4 API decreased overall memory footprint by a great deal in many cases. It was mainly because the old Netty 4 API did not allocate a new buffer and event object for each read. Instead, it created a new buffer for each handler in a pipeline. This works pretty well as long as the number of handlers in a pipeline is only a few. However, for a highly modular application with many handlers which handles connections which lasts for relatively short period, it actually makes the memory footprint issue much worse. Changes: All in all, this is about retaining all the good changes we made in 4 so far such as better thread model and going back to the way how we dealt with message events in 3. To fix the memory consumption/footprint issue mentioned above, we made a hard decision to break the backward compatibility again with the following changes: - Remove MessageBuf - Merge Buf into ByteBuf - Merge ChannelInboundByte/MessageHandler and ChannelStateHandler into ChannelInboundHandler - Similar changes were made to the adapter classes - Merge ChannelOutboundByte/MessageHandler and ChannelOperationHandler into ChannelOutboundHandler - Similar changes were made to the adapter classes - Introduce MessageList which is similar to `MessageEvent` in Netty 3 - Replace inboundBufferUpdated(ctx) with messageReceived(ctx, MessageList) - Replace flush(ctx, promise) with write(ctx, MessageList, promise) - Remove ByteToByteEncoder/Decoder/Codec - Replaced by MessageToByteEncoder<ByteBuf>, ByteToMessageDecoder<ByteBuf>, and ByteMessageCodec<ByteBuf> - Merge EmbeddedByteChannel and EmbeddedMessageChannel into EmbeddedChannel - Add SimpleChannelInboundHandler which is sometimes more useful than ChannelInboundHandlerAdapter - Bring back Channel.isWritable() from Netty 3 - Add ChannelInboundHandler.channelWritabilityChanges() event - Add RecvByteBufAllocator configuration property - Similar to ReceiveBufferSizePredictor in Netty 3 - Some existing configuration properties such as DatagramChannelConfig.receivePacketSize is gone now. - Remove suspend/resumeIntermediaryDeallocation() in ByteBuf This change would have been impossible without @normanmaurer's help. He fixed, ported, and improved many parts of the changes.
2013-05-28 13:40:19 +02:00
public final T get() {
if (maxCapacityPerThread == 0) {
return newObject((Handle<T>) NOOP_HANDLE);
}
Revamp the core API to reduce memory footprint and consumption The API changes made so far turned out to increase the memory footprint and consumption while our intention was actually decreasing them. Memory consumption issue: When there are many connections which does not exchange data frequently, the old Netty 4 API spent a lot more memory than 3 because it always allocates per-handler buffer for each connection unless otherwise explicitly stated by a user. In a usual real world load, a client doesn't always send requests without pausing, so the idea of having a buffer whose life cycle if bound to the life cycle of a connection didn't work as expected. Memory footprint issue: The old Netty 4 API decreased overall memory footprint by a great deal in many cases. It was mainly because the old Netty 4 API did not allocate a new buffer and event object for each read. Instead, it created a new buffer for each handler in a pipeline. This works pretty well as long as the number of handlers in a pipeline is only a few. However, for a highly modular application with many handlers which handles connections which lasts for relatively short period, it actually makes the memory footprint issue much worse. Changes: All in all, this is about retaining all the good changes we made in 4 so far such as better thread model and going back to the way how we dealt with message events in 3. To fix the memory consumption/footprint issue mentioned above, we made a hard decision to break the backward compatibility again with the following changes: - Remove MessageBuf - Merge Buf into ByteBuf - Merge ChannelInboundByte/MessageHandler and ChannelStateHandler into ChannelInboundHandler - Similar changes were made to the adapter classes - Merge ChannelOutboundByte/MessageHandler and ChannelOperationHandler into ChannelOutboundHandler - Similar changes were made to the adapter classes - Introduce MessageList which is similar to `MessageEvent` in Netty 3 - Replace inboundBufferUpdated(ctx) with messageReceived(ctx, MessageList) - Replace flush(ctx, promise) with write(ctx, MessageList, promise) - Remove ByteToByteEncoder/Decoder/Codec - Replaced by MessageToByteEncoder<ByteBuf>, ByteToMessageDecoder<ByteBuf>, and ByteMessageCodec<ByteBuf> - Merge EmbeddedByteChannel and EmbeddedMessageChannel into EmbeddedChannel - Add SimpleChannelInboundHandler which is sometimes more useful than ChannelInboundHandlerAdapter - Bring back Channel.isWritable() from Netty 3 - Add ChannelInboundHandler.channelWritabilityChanges() event - Add RecvByteBufAllocator configuration property - Similar to ReceiveBufferSizePredictor in Netty 3 - Some existing configuration properties such as DatagramChannelConfig.receivePacketSize is gone now. - Remove suspend/resumeIntermediaryDeallocation() in ByteBuf This change would have been impossible without @normanmaurer's help. He fixed, ported, and improved many parts of the changes.
2013-05-28 13:40:19 +02:00
Stack<T> stack = threadLocal.get();
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
DefaultHandle<T> handle = stack.pop();
if (handle == null) {
handle = stack.newHandle();
handle.value = newObject(handle);
Revamp the core API to reduce memory footprint and consumption The API changes made so far turned out to increase the memory footprint and consumption while our intention was actually decreasing them. Memory consumption issue: When there are many connections which does not exchange data frequently, the old Netty 4 API spent a lot more memory than 3 because it always allocates per-handler buffer for each connection unless otherwise explicitly stated by a user. In a usual real world load, a client doesn't always send requests without pausing, so the idea of having a buffer whose life cycle if bound to the life cycle of a connection didn't work as expected. Memory footprint issue: The old Netty 4 API decreased overall memory footprint by a great deal in many cases. It was mainly because the old Netty 4 API did not allocate a new buffer and event object for each read. Instead, it created a new buffer for each handler in a pipeline. This works pretty well as long as the number of handlers in a pipeline is only a few. However, for a highly modular application with many handlers which handles connections which lasts for relatively short period, it actually makes the memory footprint issue much worse. Changes: All in all, this is about retaining all the good changes we made in 4 so far such as better thread model and going back to the way how we dealt with message events in 3. To fix the memory consumption/footprint issue mentioned above, we made a hard decision to break the backward compatibility again with the following changes: - Remove MessageBuf - Merge Buf into ByteBuf - Merge ChannelInboundByte/MessageHandler and ChannelStateHandler into ChannelInboundHandler - Similar changes were made to the adapter classes - Merge ChannelOutboundByte/MessageHandler and ChannelOperationHandler into ChannelOutboundHandler - Similar changes were made to the adapter classes - Introduce MessageList which is similar to `MessageEvent` in Netty 3 - Replace inboundBufferUpdated(ctx) with messageReceived(ctx, MessageList) - Replace flush(ctx, promise) with write(ctx, MessageList, promise) - Remove ByteToByteEncoder/Decoder/Codec - Replaced by MessageToByteEncoder<ByteBuf>, ByteToMessageDecoder<ByteBuf>, and ByteMessageCodec<ByteBuf> - Merge EmbeddedByteChannel and EmbeddedMessageChannel into EmbeddedChannel - Add SimpleChannelInboundHandler which is sometimes more useful than ChannelInboundHandlerAdapter - Bring back Channel.isWritable() from Netty 3 - Add ChannelInboundHandler.channelWritabilityChanges() event - Add RecvByteBufAllocator configuration property - Similar to ReceiveBufferSizePredictor in Netty 3 - Some existing configuration properties such as DatagramChannelConfig.receivePacketSize is gone now. - Remove suspend/resumeIntermediaryDeallocation() in ByteBuf This change would have been impossible without @normanmaurer's help. He fixed, ported, and improved many parts of the changes.
2013-05-28 13:40:19 +02:00
}
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
return (T) handle.value;
Revamp the core API to reduce memory footprint and consumption The API changes made so far turned out to increase the memory footprint and consumption while our intention was actually decreasing them. Memory consumption issue: When there are many connections which does not exchange data frequently, the old Netty 4 API spent a lot more memory than 3 because it always allocates per-handler buffer for each connection unless otherwise explicitly stated by a user. In a usual real world load, a client doesn't always send requests without pausing, so the idea of having a buffer whose life cycle if bound to the life cycle of a connection didn't work as expected. Memory footprint issue: The old Netty 4 API decreased overall memory footprint by a great deal in many cases. It was mainly because the old Netty 4 API did not allocate a new buffer and event object for each read. Instead, it created a new buffer for each handler in a pipeline. This works pretty well as long as the number of handlers in a pipeline is only a few. However, for a highly modular application with many handlers which handles connections which lasts for relatively short period, it actually makes the memory footprint issue much worse. Changes: All in all, this is about retaining all the good changes we made in 4 so far such as better thread model and going back to the way how we dealt with message events in 3. To fix the memory consumption/footprint issue mentioned above, we made a hard decision to break the backward compatibility again with the following changes: - Remove MessageBuf - Merge Buf into ByteBuf - Merge ChannelInboundByte/MessageHandler and ChannelStateHandler into ChannelInboundHandler - Similar changes were made to the adapter classes - Merge ChannelOutboundByte/MessageHandler and ChannelOperationHandler into ChannelOutboundHandler - Similar changes were made to the adapter classes - Introduce MessageList which is similar to `MessageEvent` in Netty 3 - Replace inboundBufferUpdated(ctx) with messageReceived(ctx, MessageList) - Replace flush(ctx, promise) with write(ctx, MessageList, promise) - Remove ByteToByteEncoder/Decoder/Codec - Replaced by MessageToByteEncoder<ByteBuf>, ByteToMessageDecoder<ByteBuf>, and ByteMessageCodec<ByteBuf> - Merge EmbeddedByteChannel and EmbeddedMessageChannel into EmbeddedChannel - Add SimpleChannelInboundHandler which is sometimes more useful than ChannelInboundHandlerAdapter - Bring back Channel.isWritable() from Netty 3 - Add ChannelInboundHandler.channelWritabilityChanges() event - Add RecvByteBufAllocator configuration property - Similar to ReceiveBufferSizePredictor in Netty 3 - Some existing configuration properties such as DatagramChannelConfig.receivePacketSize is gone now. - Remove suspend/resumeIntermediaryDeallocation() in ByteBuf This change would have been impossible without @normanmaurer's help. He fixed, ported, and improved many parts of the changes.
2013-05-28 13:40:19 +02:00
}
/**
* @deprecated use {@link Handle#recycle(Object)}.
*/
@Deprecated
public final boolean recycle(T o, Handle<T> handle) {
if (handle == NOOP_HANDLE) {
return false;
}
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
DefaultHandle<T> h = (DefaultHandle<T>) handle;
if (h.stack.parent != this) {
Revamp the core API to reduce memory footprint and consumption The API changes made so far turned out to increase the memory footprint and consumption while our intention was actually decreasing them. Memory consumption issue: When there are many connections which does not exchange data frequently, the old Netty 4 API spent a lot more memory than 3 because it always allocates per-handler buffer for each connection unless otherwise explicitly stated by a user. In a usual real world load, a client doesn't always send requests without pausing, so the idea of having a buffer whose life cycle if bound to the life cycle of a connection didn't work as expected. Memory footprint issue: The old Netty 4 API decreased overall memory footprint by a great deal in many cases. It was mainly because the old Netty 4 API did not allocate a new buffer and event object for each read. Instead, it created a new buffer for each handler in a pipeline. This works pretty well as long as the number of handlers in a pipeline is only a few. However, for a highly modular application with many handlers which handles connections which lasts for relatively short period, it actually makes the memory footprint issue much worse. Changes: All in all, this is about retaining all the good changes we made in 4 so far such as better thread model and going back to the way how we dealt with message events in 3. To fix the memory consumption/footprint issue mentioned above, we made a hard decision to break the backward compatibility again with the following changes: - Remove MessageBuf - Merge Buf into ByteBuf - Merge ChannelInboundByte/MessageHandler and ChannelStateHandler into ChannelInboundHandler - Similar changes were made to the adapter classes - Merge ChannelOutboundByte/MessageHandler and ChannelOperationHandler into ChannelOutboundHandler - Similar changes were made to the adapter classes - Introduce MessageList which is similar to `MessageEvent` in Netty 3 - Replace inboundBufferUpdated(ctx) with messageReceived(ctx, MessageList) - Replace flush(ctx, promise) with write(ctx, MessageList, promise) - Remove ByteToByteEncoder/Decoder/Codec - Replaced by MessageToByteEncoder<ByteBuf>, ByteToMessageDecoder<ByteBuf>, and ByteMessageCodec<ByteBuf> - Merge EmbeddedByteChannel and EmbeddedMessageChannel into EmbeddedChannel - Add SimpleChannelInboundHandler which is sometimes more useful than ChannelInboundHandlerAdapter - Bring back Channel.isWritable() from Netty 3 - Add ChannelInboundHandler.channelWritabilityChanges() event - Add RecvByteBufAllocator configuration property - Similar to ReceiveBufferSizePredictor in Netty 3 - Some existing configuration properties such as DatagramChannelConfig.receivePacketSize is gone now. - Remove suspend/resumeIntermediaryDeallocation() in ByteBuf This change would have been impossible without @normanmaurer's help. He fixed, ported, and improved many parts of the changes.
2013-05-28 13:40:19 +02:00
return false;
}
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
h.recycle(o);
Revamp the core API to reduce memory footprint and consumption The API changes made so far turned out to increase the memory footprint and consumption while our intention was actually decreasing them. Memory consumption issue: When there are many connections which does not exchange data frequently, the old Netty 4 API spent a lot more memory than 3 because it always allocates per-handler buffer for each connection unless otherwise explicitly stated by a user. In a usual real world load, a client doesn't always send requests without pausing, so the idea of having a buffer whose life cycle if bound to the life cycle of a connection didn't work as expected. Memory footprint issue: The old Netty 4 API decreased overall memory footprint by a great deal in many cases. It was mainly because the old Netty 4 API did not allocate a new buffer and event object for each read. Instead, it created a new buffer for each handler in a pipeline. This works pretty well as long as the number of handlers in a pipeline is only a few. However, for a highly modular application with many handlers which handles connections which lasts for relatively short period, it actually makes the memory footprint issue much worse. Changes: All in all, this is about retaining all the good changes we made in 4 so far such as better thread model and going back to the way how we dealt with message events in 3. To fix the memory consumption/footprint issue mentioned above, we made a hard decision to break the backward compatibility again with the following changes: - Remove MessageBuf - Merge Buf into ByteBuf - Merge ChannelInboundByte/MessageHandler and ChannelStateHandler into ChannelInboundHandler - Similar changes were made to the adapter classes - Merge ChannelOutboundByte/MessageHandler and ChannelOperationHandler into ChannelOutboundHandler - Similar changes were made to the adapter classes - Introduce MessageList which is similar to `MessageEvent` in Netty 3 - Replace inboundBufferUpdated(ctx) with messageReceived(ctx, MessageList) - Replace flush(ctx, promise) with write(ctx, MessageList, promise) - Remove ByteToByteEncoder/Decoder/Codec - Replaced by MessageToByteEncoder<ByteBuf>, ByteToMessageDecoder<ByteBuf>, and ByteMessageCodec<ByteBuf> - Merge EmbeddedByteChannel and EmbeddedMessageChannel into EmbeddedChannel - Add SimpleChannelInboundHandler which is sometimes more useful than ChannelInboundHandlerAdapter - Bring back Channel.isWritable() from Netty 3 - Add ChannelInboundHandler.channelWritabilityChanges() event - Add RecvByteBufAllocator configuration property - Similar to ReceiveBufferSizePredictor in Netty 3 - Some existing configuration properties such as DatagramChannelConfig.receivePacketSize is gone now. - Remove suspend/resumeIntermediaryDeallocation() in ByteBuf This change would have been impossible without @normanmaurer's help. He fixed, ported, and improved many parts of the changes.
2013-05-28 13:40:19 +02:00
return true;
}
final int threadLocalCapacity() {
return threadLocal.get().elements.length;
}
final int threadLocalSize() {
return threadLocal.get().size;
}
protected abstract T newObject(Handle<T> handle);
Revamp the core API to reduce memory footprint and consumption The API changes made so far turned out to increase the memory footprint and consumption while our intention was actually decreasing them. Memory consumption issue: When there are many connections which does not exchange data frequently, the old Netty 4 API spent a lot more memory than 3 because it always allocates per-handler buffer for each connection unless otherwise explicitly stated by a user. In a usual real world load, a client doesn't always send requests without pausing, so the idea of having a buffer whose life cycle if bound to the life cycle of a connection didn't work as expected. Memory footprint issue: The old Netty 4 API decreased overall memory footprint by a great deal in many cases. It was mainly because the old Netty 4 API did not allocate a new buffer and event object for each read. Instead, it created a new buffer for each handler in a pipeline. This works pretty well as long as the number of handlers in a pipeline is only a few. However, for a highly modular application with many handlers which handles connections which lasts for relatively short period, it actually makes the memory footprint issue much worse. Changes: All in all, this is about retaining all the good changes we made in 4 so far such as better thread model and going back to the way how we dealt with message events in 3. To fix the memory consumption/footprint issue mentioned above, we made a hard decision to break the backward compatibility again with the following changes: - Remove MessageBuf - Merge Buf into ByteBuf - Merge ChannelInboundByte/MessageHandler and ChannelStateHandler into ChannelInboundHandler - Similar changes were made to the adapter classes - Merge ChannelOutboundByte/MessageHandler and ChannelOperationHandler into ChannelOutboundHandler - Similar changes were made to the adapter classes - Introduce MessageList which is similar to `MessageEvent` in Netty 3 - Replace inboundBufferUpdated(ctx) with messageReceived(ctx, MessageList) - Replace flush(ctx, promise) with write(ctx, MessageList, promise) - Remove ByteToByteEncoder/Decoder/Codec - Replaced by MessageToByteEncoder<ByteBuf>, ByteToMessageDecoder<ByteBuf>, and ByteMessageCodec<ByteBuf> - Merge EmbeddedByteChannel and EmbeddedMessageChannel into EmbeddedChannel - Add SimpleChannelInboundHandler which is sometimes more useful than ChannelInboundHandlerAdapter - Bring back Channel.isWritable() from Netty 3 - Add ChannelInboundHandler.channelWritabilityChanges() event - Add RecvByteBufAllocator configuration property - Similar to ReceiveBufferSizePredictor in Netty 3 - Some existing configuration properties such as DatagramChannelConfig.receivePacketSize is gone now. - Remove suspend/resumeIntermediaryDeallocation() in ByteBuf This change would have been impossible without @normanmaurer's help. He fixed, ported, and improved many parts of the changes.
2013-05-28 13:40:19 +02:00
public interface Handle<T> {
void recycle(T object);
}
Revamp the core API to reduce memory footprint and consumption The API changes made so far turned out to increase the memory footprint and consumption while our intention was actually decreasing them. Memory consumption issue: When there are many connections which does not exchange data frequently, the old Netty 4 API spent a lot more memory than 3 because it always allocates per-handler buffer for each connection unless otherwise explicitly stated by a user. In a usual real world load, a client doesn't always send requests without pausing, so the idea of having a buffer whose life cycle if bound to the life cycle of a connection didn't work as expected. Memory footprint issue: The old Netty 4 API decreased overall memory footprint by a great deal in many cases. It was mainly because the old Netty 4 API did not allocate a new buffer and event object for each read. Instead, it created a new buffer for each handler in a pipeline. This works pretty well as long as the number of handlers in a pipeline is only a few. However, for a highly modular application with many handlers which handles connections which lasts for relatively short period, it actually makes the memory footprint issue much worse. Changes: All in all, this is about retaining all the good changes we made in 4 so far such as better thread model and going back to the way how we dealt with message events in 3. To fix the memory consumption/footprint issue mentioned above, we made a hard decision to break the backward compatibility again with the following changes: - Remove MessageBuf - Merge Buf into ByteBuf - Merge ChannelInboundByte/MessageHandler and ChannelStateHandler into ChannelInboundHandler - Similar changes were made to the adapter classes - Merge ChannelOutboundByte/MessageHandler and ChannelOperationHandler into ChannelOutboundHandler - Similar changes were made to the adapter classes - Introduce MessageList which is similar to `MessageEvent` in Netty 3 - Replace inboundBufferUpdated(ctx) with messageReceived(ctx, MessageList) - Replace flush(ctx, promise) with write(ctx, MessageList, promise) - Remove ByteToByteEncoder/Decoder/Codec - Replaced by MessageToByteEncoder<ByteBuf>, ByteToMessageDecoder<ByteBuf>, and ByteMessageCodec<ByteBuf> - Merge EmbeddedByteChannel and EmbeddedMessageChannel into EmbeddedChannel - Add SimpleChannelInboundHandler which is sometimes more useful than ChannelInboundHandlerAdapter - Bring back Channel.isWritable() from Netty 3 - Add ChannelInboundHandler.channelWritabilityChanges() event - Add RecvByteBufAllocator configuration property - Similar to ReceiveBufferSizePredictor in Netty 3 - Some existing configuration properties such as DatagramChannelConfig.receivePacketSize is gone now. - Remove suspend/resumeIntermediaryDeallocation() in ByteBuf This change would have been impossible without @normanmaurer's help. He fixed, ported, and improved many parts of the changes.
2013-05-28 13:40:19 +02:00
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
static final class DefaultHandle<T> implements Handle<T> {
private int lastRecycledId;
private int recycleId;
boolean hasBeenRecycled;
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
private Stack<?> stack;
private Object value;
DefaultHandle(Stack<?> stack) {
this.stack = stack;
}
@Override
public void recycle(Object object) {
if (object != value) {
throw new IllegalArgumentException("object does not belong to handle");
}
Stack<?> stack = this.stack;
if (lastRecycledId != recycleId || stack == null) {
throw new IllegalStateException("recycled already");
}
stack.push(this);
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
}
}
private static final FastThreadLocal<Map<Stack<?>, WeakOrderQueue>> DELAYED_RECYCLED =
new FastThreadLocal<Map<Stack<?>, WeakOrderQueue>>() {
@Override
protected Map<Stack<?>, WeakOrderQueue> initialValue() {
return new WeakHashMap<Stack<?>, WeakOrderQueue>();
}
};
// a queue that makes only moderate guarantees about visibility: items are seen in the correct order,
// but we aren't absolutely guaranteed to ever see anything at all, thereby keeping the queue cheap to maintain
private static final class WeakOrderQueue {
static final WeakOrderQueue DUMMY = new WeakOrderQueue();
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
// Let Link extend AtomicInteger for intrinsics. The Link itself will be used as writerIndex.
@SuppressWarnings("serial")
static final class Link extends AtomicInteger {
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
private final DefaultHandle<?>[] elements = new DefaultHandle[LINK_CAPACITY];
private int readIndex;
Link next;
}
// This act as a place holder for the head Link but also will reclaim space once finalized.
// Its important this does not hold any reference to either Stack or WeakOrderQueue.
static final class Head {
private final AtomicInteger availableSharedCapacity;
Link link;
Head(AtomicInteger availableSharedCapacity) {
this.availableSharedCapacity = availableSharedCapacity;
}
/// TODO: In the future when we move to Java9+ we should use java.lang.ref.Cleaner.
@Override
protected void finalize() throws Throwable {
try {
super.finalize();
} finally {
Link head = link;
link = null;
while (head != null) {
reclaimSpace(LINK_CAPACITY);
Link next = head.next;
// Unlink to help GC and guard against GC nepotism.
head.next = null;
head = next;
}
}
}
void reclaimSpace(int space) {
assert space >= 0;
availableSharedCapacity.addAndGet(space);
}
boolean reserveSpace(int space) {
return reserveSpace(availableSharedCapacity, space);
}
static boolean reserveSpace(AtomicInteger availableSharedCapacity, int space) {
assert space >= 0;
for (;;) {
int available = availableSharedCapacity.get();
if (available < space) {
return false;
}
if (availableSharedCapacity.compareAndSet(available, available - space)) {
return true;
}
}
}
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
}
// chain of data items
private final Head head;
private Link tail;
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
// pointer to another queue of delayed items for the same stack
private WeakOrderQueue next;
private final WeakReference<Thread> owner;
private final int id = ID_GENERATOR.getAndIncrement();
private WeakOrderQueue() {
owner = null;
head = new Head(null);
}
private WeakOrderQueue(Stack<?> stack, Thread thread) {
tail = new Link();
// Its important that we not store the Stack itself in the WeakOrderQueue as the Stack also is used in
// the WeakHashMap as key. So just store the enclosed AtomicInteger which should allow to have the
// Stack itself GCed.
head = new Head(stack.availableSharedCapacity);
head.link = tail;
owner = new WeakReference<Thread>(thread);
}
static WeakOrderQueue newQueue(Stack<?> stack, Thread thread) {
final WeakOrderQueue queue = new WeakOrderQueue(stack, thread);
// Done outside of the constructor to ensure WeakOrderQueue.this does not escape the constructor and so
// may be accessed while its still constructed.
stack.setHead(queue);
return queue;
}
private void setNext(WeakOrderQueue next) {
assert next != this;
this.next = next;
}
/**
* Allocate a new {@link WeakOrderQueue} or return {@code null} if not possible.
*/
static WeakOrderQueue allocate(Stack<?> stack, Thread thread) {
// We allocated a Link so reserve the space
return Head.reserveSpace(stack.availableSharedCapacity, LINK_CAPACITY)
? newQueue(stack, thread) : null;
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
}
void add(DefaultHandle<?> handle) {
handle.lastRecycledId = id;
Link tail = this.tail;
int writeIndex;
if ((writeIndex = tail.get()) == LINK_CAPACITY) {
if (!head.reserveSpace(LINK_CAPACITY)) {
// Drop it.
return;
}
// We allocate a Link so reserve the space
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
this.tail = tail = tail.next = new Link();
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
writeIndex = tail.get();
}
tail.elements[writeIndex] = handle;
handle.stack = null;
// we lazy set to ensure that setting stack to null appears before we unnull it in the owning thread;
// this also means we guarantee visibility of an element in the queue if we see the index updated
tail.lazySet(writeIndex + 1);
}
boolean hasFinalData() {
return tail.readIndex != tail.get();
}
// transfer as many items as we can from this queue to the stack, returning true if any were transferred
@SuppressWarnings("rawtypes")
boolean transfer(Stack<?> dst) {
Link head = this.head.link;
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
if (head == null) {
return false;
}
if (head.readIndex == LINK_CAPACITY) {
if (head.next == null) {
return false;
}
this.head.link = head = head.next;
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
}
final int srcStart = head.readIndex;
int srcEnd = head.get();
final int srcSize = srcEnd - srcStart;
if (srcSize == 0) {
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
return false;
}
final int dstSize = dst.size;
final int expectedCapacity = dstSize + srcSize;
if (expectedCapacity > dst.elements.length) {
final int actualCapacity = dst.increaseCapacity(expectedCapacity);
srcEnd = min(srcStart + actualCapacity - dstSize, srcEnd);
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
}
if (srcStart != srcEnd) {
final DefaultHandle[] srcElems = head.elements;
final DefaultHandle[] dstElems = dst.elements;
int newDstSize = dstSize;
for (int i = srcStart; i < srcEnd; i++) {
DefaultHandle element = srcElems[i];
if (element.recycleId == 0) {
element.recycleId = element.lastRecycledId;
} else if (element.recycleId != element.lastRecycledId) {
throw new IllegalStateException("recycled already");
}
srcElems[i] = null;
if (dst.dropHandle(element)) {
// Drop the object.
continue;
}
element.stack = dst;
dstElems[newDstSize ++] = element;
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
}
if (srcEnd == LINK_CAPACITY && head.next != null) {
// Add capacity back as the Link is GCed.
this.head.reclaimSpace(LINK_CAPACITY);
this.head.link = head.next;
}
head.readIndex = srcEnd;
if (dst.size == newDstSize) {
return false;
}
dst.size = newDstSize;
return true;
} else {
// The destination stack is full already.
return false;
}
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
}
}
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
static final class Stack<T> {
// we keep a queue of per-thread queues, which is appended to once only, each time a new thread other
// than the stack owner recycles: when we run out of items in our stack we iterate this collection
// to scavenge those that can be reused. this permits us to incur minimal thread synchronisation whilst
// still recycling all items.
Revamp the core API to reduce memory footprint and consumption The API changes made so far turned out to increase the memory footprint and consumption while our intention was actually decreasing them. Memory consumption issue: When there are many connections which does not exchange data frequently, the old Netty 4 API spent a lot more memory than 3 because it always allocates per-handler buffer for each connection unless otherwise explicitly stated by a user. In a usual real world load, a client doesn't always send requests without pausing, so the idea of having a buffer whose life cycle if bound to the life cycle of a connection didn't work as expected. Memory footprint issue: The old Netty 4 API decreased overall memory footprint by a great deal in many cases. It was mainly because the old Netty 4 API did not allocate a new buffer and event object for each read. Instead, it created a new buffer for each handler in a pipeline. This works pretty well as long as the number of handlers in a pipeline is only a few. However, for a highly modular application with many handlers which handles connections which lasts for relatively short period, it actually makes the memory footprint issue much worse. Changes: All in all, this is about retaining all the good changes we made in 4 so far such as better thread model and going back to the way how we dealt with message events in 3. To fix the memory consumption/footprint issue mentioned above, we made a hard decision to break the backward compatibility again with the following changes: - Remove MessageBuf - Merge Buf into ByteBuf - Merge ChannelInboundByte/MessageHandler and ChannelStateHandler into ChannelInboundHandler - Similar changes were made to the adapter classes - Merge ChannelOutboundByte/MessageHandler and ChannelOperationHandler into ChannelOutboundHandler - Similar changes were made to the adapter classes - Introduce MessageList which is similar to `MessageEvent` in Netty 3 - Replace inboundBufferUpdated(ctx) with messageReceived(ctx, MessageList) - Replace flush(ctx, promise) with write(ctx, MessageList, promise) - Remove ByteToByteEncoder/Decoder/Codec - Replaced by MessageToByteEncoder<ByteBuf>, ByteToMessageDecoder<ByteBuf>, and ByteMessageCodec<ByteBuf> - Merge EmbeddedByteChannel and EmbeddedMessageChannel into EmbeddedChannel - Add SimpleChannelInboundHandler which is sometimes more useful than ChannelInboundHandlerAdapter - Bring back Channel.isWritable() from Netty 3 - Add ChannelInboundHandler.channelWritabilityChanges() event - Add RecvByteBufAllocator configuration property - Similar to ReceiveBufferSizePredictor in Netty 3 - Some existing configuration properties such as DatagramChannelConfig.receivePacketSize is gone now. - Remove suspend/resumeIntermediaryDeallocation() in ByteBuf This change would have been impossible without @normanmaurer's help. He fixed, ported, and improved many parts of the changes.
2013-05-28 13:40:19 +02:00
final Recycler<T> parent;
// We store the Thread in a WeakReference as otherwise we may be the only ones that still hold a strong
// Reference to the Thread itself after it died because DefaultHandle will hold a reference to the Stack.
//
// The biggest issue is if we do not use a WeakReference the Thread may not be able to be collected at all if
// the user will store a reference to the DefaultHandle somewhere and never clear this reference (or not clear
// it in a timely manner).
final WeakReference<Thread> threadRef;
final AtomicInteger availableSharedCapacity;
final int maxDelayedQueues;
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
private final int maxCapacity;
private final int ratioMask;
private DefaultHandle<?>[] elements;
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
private int size;
private int handleRecycleCount = -1; // Start with -1 so the first one will be recycled.
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
private WeakOrderQueue cursor, prev;
private volatile WeakOrderQueue head;
Revamp the core API to reduce memory footprint and consumption The API changes made so far turned out to increase the memory footprint and consumption while our intention was actually decreasing them. Memory consumption issue: When there are many connections which does not exchange data frequently, the old Netty 4 API spent a lot more memory than 3 because it always allocates per-handler buffer for each connection unless otherwise explicitly stated by a user. In a usual real world load, a client doesn't always send requests without pausing, so the idea of having a buffer whose life cycle if bound to the life cycle of a connection didn't work as expected. Memory footprint issue: The old Netty 4 API decreased overall memory footprint by a great deal in many cases. It was mainly because the old Netty 4 API did not allocate a new buffer and event object for each read. Instead, it created a new buffer for each handler in a pipeline. This works pretty well as long as the number of handlers in a pipeline is only a few. However, for a highly modular application with many handlers which handles connections which lasts for relatively short period, it actually makes the memory footprint issue much worse. Changes: All in all, this is about retaining all the good changes we made in 4 so far such as better thread model and going back to the way how we dealt with message events in 3. To fix the memory consumption/footprint issue mentioned above, we made a hard decision to break the backward compatibility again with the following changes: - Remove MessageBuf - Merge Buf into ByteBuf - Merge ChannelInboundByte/MessageHandler and ChannelStateHandler into ChannelInboundHandler - Similar changes were made to the adapter classes - Merge ChannelOutboundByte/MessageHandler and ChannelOperationHandler into ChannelOutboundHandler - Similar changes were made to the adapter classes - Introduce MessageList which is similar to `MessageEvent` in Netty 3 - Replace inboundBufferUpdated(ctx) with messageReceived(ctx, MessageList) - Replace flush(ctx, promise) with write(ctx, MessageList, promise) - Remove ByteToByteEncoder/Decoder/Codec - Replaced by MessageToByteEncoder<ByteBuf>, ByteToMessageDecoder<ByteBuf>, and ByteMessageCodec<ByteBuf> - Merge EmbeddedByteChannel and EmbeddedMessageChannel into EmbeddedChannel - Add SimpleChannelInboundHandler which is sometimes more useful than ChannelInboundHandlerAdapter - Bring back Channel.isWritable() from Netty 3 - Add ChannelInboundHandler.channelWritabilityChanges() event - Add RecvByteBufAllocator configuration property - Similar to ReceiveBufferSizePredictor in Netty 3 - Some existing configuration properties such as DatagramChannelConfig.receivePacketSize is gone now. - Remove suspend/resumeIntermediaryDeallocation() in ByteBuf This change would have been impossible without @normanmaurer's help. He fixed, ported, and improved many parts of the changes.
2013-05-28 13:40:19 +02:00
Stack(Recycler<T> parent, Thread thread, int maxCapacity, int maxSharedCapacityFactor,
int ratioMask, int maxDelayedQueues) {
Revamp the core API to reduce memory footprint and consumption The API changes made so far turned out to increase the memory footprint and consumption while our intention was actually decreasing them. Memory consumption issue: When there are many connections which does not exchange data frequently, the old Netty 4 API spent a lot more memory than 3 because it always allocates per-handler buffer for each connection unless otherwise explicitly stated by a user. In a usual real world load, a client doesn't always send requests without pausing, so the idea of having a buffer whose life cycle if bound to the life cycle of a connection didn't work as expected. Memory footprint issue: The old Netty 4 API decreased overall memory footprint by a great deal in many cases. It was mainly because the old Netty 4 API did not allocate a new buffer and event object for each read. Instead, it created a new buffer for each handler in a pipeline. This works pretty well as long as the number of handlers in a pipeline is only a few. However, for a highly modular application with many handlers which handles connections which lasts for relatively short period, it actually makes the memory footprint issue much worse. Changes: All in all, this is about retaining all the good changes we made in 4 so far such as better thread model and going back to the way how we dealt with message events in 3. To fix the memory consumption/footprint issue mentioned above, we made a hard decision to break the backward compatibility again with the following changes: - Remove MessageBuf - Merge Buf into ByteBuf - Merge ChannelInboundByte/MessageHandler and ChannelStateHandler into ChannelInboundHandler - Similar changes were made to the adapter classes - Merge ChannelOutboundByte/MessageHandler and ChannelOperationHandler into ChannelOutboundHandler - Similar changes were made to the adapter classes - Introduce MessageList which is similar to `MessageEvent` in Netty 3 - Replace inboundBufferUpdated(ctx) with messageReceived(ctx, MessageList) - Replace flush(ctx, promise) with write(ctx, MessageList, promise) - Remove ByteToByteEncoder/Decoder/Codec - Replaced by MessageToByteEncoder<ByteBuf>, ByteToMessageDecoder<ByteBuf>, and ByteMessageCodec<ByteBuf> - Merge EmbeddedByteChannel and EmbeddedMessageChannel into EmbeddedChannel - Add SimpleChannelInboundHandler which is sometimes more useful than ChannelInboundHandlerAdapter - Bring back Channel.isWritable() from Netty 3 - Add ChannelInboundHandler.channelWritabilityChanges() event - Add RecvByteBufAllocator configuration property - Similar to ReceiveBufferSizePredictor in Netty 3 - Some existing configuration properties such as DatagramChannelConfig.receivePacketSize is gone now. - Remove suspend/resumeIntermediaryDeallocation() in ByteBuf This change would have been impossible without @normanmaurer's help. He fixed, ported, and improved many parts of the changes.
2013-05-28 13:40:19 +02:00
this.parent = parent;
threadRef = new WeakReference<Thread>(thread);
this.maxCapacity = maxCapacity;
availableSharedCapacity = new AtomicInteger(max(maxCapacity / maxSharedCapacityFactor, LINK_CAPACITY));
elements = new DefaultHandle[min(INITIAL_CAPACITY, maxCapacity)];
this.ratioMask = ratioMask;
this.maxDelayedQueues = maxDelayedQueues;
}
// Marked as synchronized to ensure this is serialized.
synchronized void setHead(WeakOrderQueue queue) {
queue.setNext(head);
head = queue;
}
int increaseCapacity(int expectedCapacity) {
int newCapacity = elements.length;
int maxCapacity = this.maxCapacity;
do {
newCapacity <<= 1;
} while (newCapacity < expectedCapacity && newCapacity < maxCapacity);
newCapacity = min(newCapacity, maxCapacity);
if (newCapacity != elements.length) {
elements = Arrays.copyOf(elements, newCapacity);
}
return newCapacity;
Revamp the core API to reduce memory footprint and consumption The API changes made so far turned out to increase the memory footprint and consumption while our intention was actually decreasing them. Memory consumption issue: When there are many connections which does not exchange data frequently, the old Netty 4 API spent a lot more memory than 3 because it always allocates per-handler buffer for each connection unless otherwise explicitly stated by a user. In a usual real world load, a client doesn't always send requests without pausing, so the idea of having a buffer whose life cycle if bound to the life cycle of a connection didn't work as expected. Memory footprint issue: The old Netty 4 API decreased overall memory footprint by a great deal in many cases. It was mainly because the old Netty 4 API did not allocate a new buffer and event object for each read. Instead, it created a new buffer for each handler in a pipeline. This works pretty well as long as the number of handlers in a pipeline is only a few. However, for a highly modular application with many handlers which handles connections which lasts for relatively short period, it actually makes the memory footprint issue much worse. Changes: All in all, this is about retaining all the good changes we made in 4 so far such as better thread model and going back to the way how we dealt with message events in 3. To fix the memory consumption/footprint issue mentioned above, we made a hard decision to break the backward compatibility again with the following changes: - Remove MessageBuf - Merge Buf into ByteBuf - Merge ChannelInboundByte/MessageHandler and ChannelStateHandler into ChannelInboundHandler - Similar changes were made to the adapter classes - Merge ChannelOutboundByte/MessageHandler and ChannelOperationHandler into ChannelOutboundHandler - Similar changes were made to the adapter classes - Introduce MessageList which is similar to `MessageEvent` in Netty 3 - Replace inboundBufferUpdated(ctx) with messageReceived(ctx, MessageList) - Replace flush(ctx, promise) with write(ctx, MessageList, promise) - Remove ByteToByteEncoder/Decoder/Codec - Replaced by MessageToByteEncoder<ByteBuf>, ByteToMessageDecoder<ByteBuf>, and ByteMessageCodec<ByteBuf> - Merge EmbeddedByteChannel and EmbeddedMessageChannel into EmbeddedChannel - Add SimpleChannelInboundHandler which is sometimes more useful than ChannelInboundHandlerAdapter - Bring back Channel.isWritable() from Netty 3 - Add ChannelInboundHandler.channelWritabilityChanges() event - Add RecvByteBufAllocator configuration property - Similar to ReceiveBufferSizePredictor in Netty 3 - Some existing configuration properties such as DatagramChannelConfig.receivePacketSize is gone now. - Remove suspend/resumeIntermediaryDeallocation() in ByteBuf This change would have been impossible without @normanmaurer's help. He fixed, ported, and improved many parts of the changes.
2013-05-28 13:40:19 +02:00
}
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
@SuppressWarnings({ "unchecked", "rawtypes" })
DefaultHandle<T> pop() {
int size = this.size;
if (size == 0) {
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
if (!scavenge()) {
return null;
}
size = this.size;
}
size --;
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
DefaultHandle ret = elements[size];
elements[size] = null;
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
if (ret.lastRecycledId != ret.recycleId) {
throw new IllegalStateException("recycled multiple times");
}
ret.recycleId = 0;
ret.lastRecycledId = 0;
this.size = size;
Revamp the core API to reduce memory footprint and consumption The API changes made so far turned out to increase the memory footprint and consumption while our intention was actually decreasing them. Memory consumption issue: When there are many connections which does not exchange data frequently, the old Netty 4 API spent a lot more memory than 3 because it always allocates per-handler buffer for each connection unless otherwise explicitly stated by a user. In a usual real world load, a client doesn't always send requests without pausing, so the idea of having a buffer whose life cycle if bound to the life cycle of a connection didn't work as expected. Memory footprint issue: The old Netty 4 API decreased overall memory footprint by a great deal in many cases. It was mainly because the old Netty 4 API did not allocate a new buffer and event object for each read. Instead, it created a new buffer for each handler in a pipeline. This works pretty well as long as the number of handlers in a pipeline is only a few. However, for a highly modular application with many handlers which handles connections which lasts for relatively short period, it actually makes the memory footprint issue much worse. Changes: All in all, this is about retaining all the good changes we made in 4 so far such as better thread model and going back to the way how we dealt with message events in 3. To fix the memory consumption/footprint issue mentioned above, we made a hard decision to break the backward compatibility again with the following changes: - Remove MessageBuf - Merge Buf into ByteBuf - Merge ChannelInboundByte/MessageHandler and ChannelStateHandler into ChannelInboundHandler - Similar changes were made to the adapter classes - Merge ChannelOutboundByte/MessageHandler and ChannelOperationHandler into ChannelOutboundHandler - Similar changes were made to the adapter classes - Introduce MessageList which is similar to `MessageEvent` in Netty 3 - Replace inboundBufferUpdated(ctx) with messageReceived(ctx, MessageList) - Replace flush(ctx, promise) with write(ctx, MessageList, promise) - Remove ByteToByteEncoder/Decoder/Codec - Replaced by MessageToByteEncoder<ByteBuf>, ByteToMessageDecoder<ByteBuf>, and ByteMessageCodec<ByteBuf> - Merge EmbeddedByteChannel and EmbeddedMessageChannel into EmbeddedChannel - Add SimpleChannelInboundHandler which is sometimes more useful than ChannelInboundHandlerAdapter - Bring back Channel.isWritable() from Netty 3 - Add ChannelInboundHandler.channelWritabilityChanges() event - Add RecvByteBufAllocator configuration property - Similar to ReceiveBufferSizePredictor in Netty 3 - Some existing configuration properties such as DatagramChannelConfig.receivePacketSize is gone now. - Remove suspend/resumeIntermediaryDeallocation() in ByteBuf This change would have been impossible without @normanmaurer's help. He fixed, ported, and improved many parts of the changes.
2013-05-28 13:40:19 +02:00
return ret;
}
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
boolean scavenge() {
// continue an existing scavenge, if any
if (scavengeSome()) {
return true;
}
// reset our scavenge cursor
prev = null;
cursor = head;
return false;
}
boolean scavengeSome() {
WeakOrderQueue prev;
WeakOrderQueue cursor = this.cursor;
if (cursor == null) {
prev = null;
cursor = head;
if (cursor == null) {
return false;
}
} else {
prev = this.prev;
}
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
boolean success = false;
do {
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
if (cursor.transfer(this)) {
success = true;
break;
}
WeakOrderQueue next = cursor.next;
if (cursor.owner.get() == null) {
// If the thread associated with the queue is gone, unlink it, after
// performing a volatile read to confirm there is no data left to collect.
// We never unlink the first queue, as we don't want to synchronize on updating the head.
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
if (cursor.hasFinalData()) {
for (;;) {
if (cursor.transfer(this)) {
success = true;
} else {
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
break;
}
}
}
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
if (prev != null) {
prev.setNext(next);
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
}
} else {
prev = cursor;
}
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
cursor = next;
} while (cursor != null && !success);
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
this.prev = prev;
this.cursor = cursor;
return success;
}
void push(DefaultHandle<?> item) {
Thread currentThread = Thread.currentThread();
if (threadRef.get() == currentThread) {
// The current Thread is the thread that belongs to the Stack, we can try to push the object now.
pushNow(item);
} else {
// The current Thread is not the one that belongs to the Stack
// (or the Thread that belonged to the Stack was collected already), we need to signal that the push
// happens later.
pushLater(item, currentThread);
}
}
private void pushNow(DefaultHandle<?> item) {
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
if ((item.recycleId | item.lastRecycledId) != 0) {
throw new IllegalStateException("recycled already");
}
item.recycleId = item.lastRecycledId = OWN_THREAD_ID;
int size = this.size;
if (size >= maxCapacity || dropHandle(item)) {
// Hit the maximum capacity or should drop - drop the possibly youngest object.
return;
}
if (size == elements.length) {
elements = Arrays.copyOf(elements, min(size << 1, maxCapacity));
}
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
elements[size] = item;
this.size = size + 1;
}
private void pushLater(DefaultHandle<?> item, Thread thread) {
// we don't want to have a ref to the queue as the value in our weak map
// so we null it out; to ensure there are no races with restoring it later
// we impose a memory ordering here (no-op on x86)
Map<Stack<?>, WeakOrderQueue> delayedRecycled = DELAYED_RECYCLED.get();
WeakOrderQueue queue = delayedRecycled.get(this);
if (queue == null) {
if (delayedRecycled.size() >= maxDelayedQueues) {
// Add a dummy queue so we know we should drop the object
delayedRecycled.put(this, WeakOrderQueue.DUMMY);
return;
}
// Check if we already reached the maximum number of delayed queues and if we can allocate at all.
if ((queue = WeakOrderQueue.allocate(this, thread)) == null) {
// drop object
return;
}
delayedRecycled.put(this, queue);
} else if (queue == WeakOrderQueue.DUMMY) {
// drop object
return;
}
queue.add(item);
}
boolean dropHandle(DefaultHandle<?> handle) {
if (!handle.hasBeenRecycled) {
if ((++handleRecycleCount & ratioMask) != 0) {
// Drop the object.
return true;
}
handle.hasBeenRecycled = true;
}
return false;
}
Improve performance of Recycler Motivation: Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used. Modification: - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance. - Recycling of the same object multiple times without acquire it will fail. - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler These changes are based on @belliottsmith 's work that was part of #2504. Result: Huge increase in performance. 4.0 branch without this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 116026994.130 2763381.305 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 110823170.627 3007221.464 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 118290272.413 7143962.304 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 120560396.523 6483323.228 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 114726607.428 2960013.108 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 119385917.899 3172913.684 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark 4.0 branch with this commit: Benchmark (size) Mode Samples Score Score error Units i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00000 thrpt 20 204158855.315 5031432.145 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 00256 thrpt 20 205179685.861 1934137.841 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 01024 thrpt 20 209906801.437 8007811.254 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 04096 thrpt 20 214288320.053 6413126.689 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 16384 thrpt 20 215940902.649 7837706.133 ops/s i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread 65536 thrpt 20 211141994.206 5017868.542 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-13 11:56:35 +02:00
DefaultHandle<T> newHandle() {
return new DefaultHandle<T>(this);
Revamp the core API to reduce memory footprint and consumption The API changes made so far turned out to increase the memory footprint and consumption while our intention was actually decreasing them. Memory consumption issue: When there are many connections which does not exchange data frequently, the old Netty 4 API spent a lot more memory than 3 because it always allocates per-handler buffer for each connection unless otherwise explicitly stated by a user. In a usual real world load, a client doesn't always send requests without pausing, so the idea of having a buffer whose life cycle if bound to the life cycle of a connection didn't work as expected. Memory footprint issue: The old Netty 4 API decreased overall memory footprint by a great deal in many cases. It was mainly because the old Netty 4 API did not allocate a new buffer and event object for each read. Instead, it created a new buffer for each handler in a pipeline. This works pretty well as long as the number of handlers in a pipeline is only a few. However, for a highly modular application with many handlers which handles connections which lasts for relatively short period, it actually makes the memory footprint issue much worse. Changes: All in all, this is about retaining all the good changes we made in 4 so far such as better thread model and going back to the way how we dealt with message events in 3. To fix the memory consumption/footprint issue mentioned above, we made a hard decision to break the backward compatibility again with the following changes: - Remove MessageBuf - Merge Buf into ByteBuf - Merge ChannelInboundByte/MessageHandler and ChannelStateHandler into ChannelInboundHandler - Similar changes were made to the adapter classes - Merge ChannelOutboundByte/MessageHandler and ChannelOperationHandler into ChannelOutboundHandler - Similar changes were made to the adapter classes - Introduce MessageList which is similar to `MessageEvent` in Netty 3 - Replace inboundBufferUpdated(ctx) with messageReceived(ctx, MessageList) - Replace flush(ctx, promise) with write(ctx, MessageList, promise) - Remove ByteToByteEncoder/Decoder/Codec - Replaced by MessageToByteEncoder<ByteBuf>, ByteToMessageDecoder<ByteBuf>, and ByteMessageCodec<ByteBuf> - Merge EmbeddedByteChannel and EmbeddedMessageChannel into EmbeddedChannel - Add SimpleChannelInboundHandler which is sometimes more useful than ChannelInboundHandlerAdapter - Bring back Channel.isWritable() from Netty 3 - Add ChannelInboundHandler.channelWritabilityChanges() event - Add RecvByteBufAllocator configuration property - Similar to ReceiveBufferSizePredictor in Netty 3 - Some existing configuration properties such as DatagramChannelConfig.receivePacketSize is gone now. - Remove suspend/resumeIntermediaryDeallocation() in ByteBuf This change would have been impossible without @normanmaurer's help. He fixed, ported, and improved many parts of the changes.
2013-05-28 13:40:19 +02:00
}
}
}