Fix performance regression in FastThreadLocal microbenchmark. Fixes #4402

Motivation:

As reported in #4402, the FastThreadLocalBenchmark shows that the JDK ThreadLocal
is actually faster than Netty's custom thread local implementation.

I was looking forward to doing some deep digging, but got disappointed :(.

Modifications:

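The microbenchmark was not running on FastThreadLocalThreads, so every FastThreadLocal access went through the slow path.
I updated the JMH command-line flags (harness.executor became jmh.executor, and the custom executor class is now given by its fully qualified name) so that the benchmark runs on FastThreadLocalThreads.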
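The following is a minimal, self-contained sketch of why the thread type matters. It does not use Netty. IndexedThread and IndexedThreadLocal are hypothetical stand-ins for FastThreadLocalThread and FastThreadLocal. The layout shown here is an assumption about the general idea, not a copy of Netty's code: each thread carries a per-thread Object[] indexed by a per-variable id, and ordinary threads fall back to a JDK ThreadLocal.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

public class FastPathSketch {

    // Hypothetical stand-in for FastThreadLocalThread: the thread object
    // itself owns the array holding its thread-local values.
    static class IndexedThread extends Thread {
        Object[] slots = new Object[4];

        IndexedThread(Runnable task) {
            super(task);
        }
    }

    // Hypothetical stand-in for FastThreadLocal: each instance gets a fixed
    // index once, and its value lives at that index in a per-thread array.
    static class IndexedThreadLocal<T> {
        private static final AtomicInteger NEXT_INDEX = new AtomicInteger();

        // Slow-path storage: ordinary threads keep their array in a JDK
        // ThreadLocal, so each access costs a ThreadLocal lookup plus an array access.
        private static final ThreadLocal<Object[]> SLOW_SLOTS =
                ThreadLocal.withInitial(() -> new Object[4]);

        private final int index = NEXT_INDEX.getAndIncrement();

        static boolean onFastPath() {
            return Thread.currentThread() instanceof IndexedThread;
        }

        // Returns an array large enough to hold the given index.
        private static Object[] grow(Object[] slots, int index) {
            return index < slots.length
                    ? slots
                    : Arrays.copyOf(slots, Math.max(index + 1, slots.length * 2));
        }

        @SuppressWarnings("unchecked")
        T get() {
            Thread current = Thread.currentThread();
            Object[] slots = current instanceof IndexedThread
                    ? ((IndexedThread) current).slots // fast path: plain field read
                    : SLOW_SLOTS.get();               // slow path: JDK ThreadLocal first
            return index < slots.length ? (T) slots[index] : null;
        }

        void set(T value) {
            Thread current = Thread.currentThread();
            if (current instanceof IndexedThread) {
                IndexedThread t = (IndexedThread) current;
                t.slots = grow(t.slots, index);
                t.slots[index] = value;
            } else {
                Object[] slots = grow(SLOW_SLOTS.get(), index);
                SLOW_SLOTS.set(slots);
                slots[index] = value;
            }
        }
    }

    static final IndexedThreadLocal<String> NAME = new IndexedThreadLocal<>();

    // Runs a set/get round trip on a fresh thread and reports which path it took.
    static String roundTrip(boolean indexedThread) throws InterruptedException {
        String[] result = new String[1];
        Runnable task = () -> {
            NAME.set("value");
            result[0] = (IndexedThreadLocal.onFastPath() ? "fast" : "slow") + ":" + NAME.get();
        };
        Thread t = indexedThread ? new IndexedThread(task) : new Thread(task);
        t.start();
        t.join(); // join() makes the write to result[0] visible here
        return result[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(roundTrip(false)); // slow:value
        System.out.println(roundTrip(true));  // fast:value
    }
}
```

In this sketch, every access on an ordinary Thread pays for a ThreadLocal lookup before the array access. That fits the observation that the JDK ThreadLocal came out ahead while the benchmark ran on plain threads.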

Result:

FastThreadLocalBenchmark now shows FastThreadLocal outperforming the JDK's ThreadLocal, with about 56% higher throughput
in this particular benchmark. Measured on OS X El Capitan with OpenJDK 1.8u60.

Benchmark                                    Mode  Cnt      Score      Error  Units
FastThreadLocalBenchmark.fastThreadLocal    thrpt   20  55452.027 ±  725.713  ops/s
FastThreadLocalBenchmark.jdkThreadLocalGet  thrpt   20  35481.888 ± 1471.647  ops/s
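The 56% figure follows from the two scores: 55452.027 / 35481.888 is roughly 1.56. The short check below computes that ratio. It also pushes both scores to the edges of their ± error columns, which shows that even the least favorable combination stays well above 1. The class name SpeedupCheck is only for illustration.

```java
import java.util.Locale;

public class SpeedupCheck {
    // Throughput ratio a / b.
    static double ratio(double a, double b) {
        return a / b;
    }

    public static void main(String[] args) {
        // Scores and errors copied from the JMH table (ops/s).
        double fast = 55452.027, fastErr = 725.713;
        double jdk = 35481.888, jdkErr = 1471.647;

        double mean = ratio(fast, jdk);
        // Least and most favorable ends: both scores moved to the
        // edges of their reported error intervals.
        double low = ratio(fast - fastErr, jdk + jdkErr);
        double high = ratio(fast + fastErr, jdk - jdkErr);

        System.out.printf(Locale.ROOT, "speedup %.2fx (range %.2fx to %.2fx)%n", mean, low, high);
    }
}
```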
buchgr 2015-10-29 21:27:23 +01:00 committed by Norman Maurer
parent 2e36ac4594
commit c9364616c8


@@ -36,8 +36,8 @@ public class AbstractMicrobenchmark extends AbstractMicrobenchmarkBase {
     static {
         final String[] customArgs = {
-        "-Xms768m", "-Xmx768m", "-XX:MaxDirectMemorySize=768m", "-Dharness.executor=CUSTOM",
-        "-Dharness.executor.class=AbstractMicrobenchmark$HarnessExecutor" };
+        "-Xms768m", "-Xmx768m", "-XX:MaxDirectMemorySize=768m", "-Djmh.executor=CUSTOM",
+        "-Djmh.executor.class=io.netty.microbench.util.AbstractMicrobenchmark$HarnessExecutor" };
         JVM_ARGS = new String[BASE_JVM_ARGS.length + customArgs.length];
         System.arraycopy(BASE_JVM_ARGS, 0, JVM_ARGS, 0, BASE_JVM_ARGS.length);
@@ -59,7 +59,6 @@ public class AbstractMicrobenchmark extends AbstractMicrobenchmarkBase {
     protected ChainedOptionsBuilder newOptionsBuilder() throws Exception {
         ChainedOptionsBuilder runnerOptions = super.newOptionsBuilder();
         if (getForks() > 0) {
             runnerOptions.forks(getForks());
         }
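The new flags select a custom executor for JMH's benchmark threads. The sketch below rests on our understanding of JMH's CUSTOM executor mode, which is an assumption: JMH loads the class named by jmh.executor.class with Class.forName and calls a public (int maxThreads, String prefix) constructor. Under that assumption, the value must be a binary class name, meaning package-qualified with $ separating a nested class. SketchHarnessExecutor and MarkedThread are hypothetical stand-ins for Netty's HarnessExecutor and FastThreadLocalThread. load() only mimics the reflective lookup.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CustomExecutorSketch {

    // Hypothetical marker subclass standing in for FastThreadLocalThread.
    static class MarkedThread extends Thread {
        MarkedThread(Runnable task, String name) {
            super(task, name);
        }
    }

    // Executor exposing the (int maxThreads, String prefix) constructor that
    // we assume JMH's CUSTOM mode looks up; every worker is a MarkedThread.
    public static final class SketchHarnessExecutor extends ThreadPoolExecutor {
        public SketchHarnessExecutor(int maxThreads, String prefix) {
            super(maxThreads, maxThreads, 0L, TimeUnit.MILLISECONDS,
                  new LinkedBlockingQueue<Runnable>(), markedThreads(prefix));
        }

        private static ThreadFactory markedThreads(String prefix) {
            AtomicInteger counter = new AtomicInteger();
            return task -> new MarkedThread(task, prefix + "-" + counter.incrementAndGet());
        }
    }

    // Mimics the assumed reflective lookup: binary class name plus (int, String) constructor.
    static ExecutorService load(String binaryName, int maxThreads, String prefix) throws Exception {
        return (ExecutorService) Class.forName(binaryName)
                .getConstructor(int.class, String.class)
                .newInstance(maxThreads, prefix);
    }

    public static void main(String[] args) throws Exception {
        // Nested classes are addressed as Outer$Inner; inside a package the
        // name would also need the package prefix.
        ExecutorService executor = load("CustomExecutorSketch$SketchHarnessExecutor", 2, "bench");
        try {
            Future<Boolean> marked = executor.submit(
                    () -> Thread.currentThread() instanceof MarkedThread);
            System.out.println(marked.get()); // true
        } finally {
            executor.shutdown();
        }
    }
}
```

Run from the default package, this prints true. Inside a package, the string passed to load() would need the package prefix, just as the new -Djmh.executor.class value carries io.netty.microbench.util.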