Commit Graph

276 Commits

Author SHA1 Message Date
Norman Maurer
96c7132dee [maven-release-plugin] prepare release netty-4.1.20.Final 2018-01-21 12:53:34 +00:00
Francesco Nigro
1cf2687244 Fixed JMH ByteBuf benchmark to avoid dead code elimination
Motivation:

The JMH doc suggests to use BlackHoles to avoid dead code elimination hence would be better to follow this best practice.

Modifications:

Each benchmark method is returning the ByteBuf/ByteBuffer to avoid the JVM to perform any dead code elimination.

Result:

The results are more reliable and comparable to the others provided by other ByteBuf benchmarks (eg HeapByteBufBenchmark)
2017-12-19 14:09:18 +01:00
Scott Mitchell
55ef09f191 Add HttpObjectEncoderBenchmark
Motivation:
Benchmark to measure HttpObjectEncoder performance.

Modifications:
- Create new benchmark HttpObjectEncoderBenchmark

Result:
JMH Microbenchmark for HttpObjectEncoder.
2017-12-16 13:47:34 +01:00
Scott Mitchell
5f0342ebe0 Add RedisEncoderBenchmark
Motivation:
Add a benchmark to measure RedisEncoder's performance

Modifications:
- Add RedisEncoderBenchmark

Result:
JMH benchmark exists to measure RedisEncoder's performance.
2017-12-16 13:42:50 +01:00
Norman Maurer
264a5daa41 [maven-release-plugin] prepare for next development iteration 2017-12-15 13:10:54 +00:00
Norman Maurer
0786c4c8d9 [maven-release-plugin] prepare release netty-4.1.19.Final 2017-12-15 13:09:30 +00:00
Norman Maurer
b2bc6407ab [maven-release-plugin] prepare for next development iteration 2017-12-08 09:26:15 +00:00
Norman Maurer
96732f47d8 [maven-release-plugin] prepare release netty-4.1.18.Final 2017-12-08 09:25:56 +00:00
Scott Mitchell
93b144b7b4 HttpMethod#valueOf improvement
Motivation:
HttpMethod#valueOf shows up on profiler results in the top set of
results. Since it is a relatively simple operation it can be improved in
isolation.

Modifications:
- Introduce a special case map which assigns each HttpMethod to a unique
index in an array and provides constant time lookup from a hash code
algorithm. When the bucket is matched we can then directly do equality
comparison instead of potentially following a linked structure when
HashMap has hash collisions.

Result:
~10% improvement in benchmark results for HttpMethod#valueOf

Benchmark                                     Mode  Cnt   Score   Error   Units
HttpMethodMapBenchmark.newMapKnownMethods    thrpt   16  31.831 ± 0.928  ops/us
HttpMethodMapBenchmark.newMapMixMethods      thrpt   16  25.568 ± 0.400  ops/us
HttpMethodMapBenchmark.newMapUnknownMethods  thrpt   16  51.413 ± 1.824  ops/us
HttpMethodMapBenchmark.oldMapKnownMethods    thrpt   16  29.226 ± 0.330  ops/us
HttpMethodMapBenchmark.oldMapMixMethods      thrpt   16  21.073 ± 0.247  ops/us
HttpMethodMapBenchmark.oldMapUnknownMethods  thrpt   16  49.081 ± 0.577  ops/us
2017-11-20 11:07:50 -08:00
Scott Mitchell
e6126215e0 DefaultHttp2FrameWriter reduce object allocation
Motivation:
DefaultHttp2FrameWriter#writeData allocates a DataFrameHeader for each write operation. DataFrameHeader maintains internal state and allocates multiple slices of a buffer which is a maximum of 30 bytes. This 30 byte buffer may not always be necessary and the additional slice operations can utilize retainedSlice to take advantage of pooled objects. We can also save computation and object allocations if there is no padding which is a common case in practice.

Modifications:
- Remove DataFrameHeader
- Add a fast path for padding == 0

Result:
Less object allocation in DefaultHttp2FrameWriter
2017-11-20 08:10:59 -08:00
Anuraag Agrawal
1f1a60ae7d Use Netty's DefaultPriorityQueue instead of JDK's PriorityQueue for scheduled tasks
Motivation:

`AbstractScheduledEventExecutor` uses a standard `java.util.PriorityQueue` to keep track of task deadlines. `ScheduledFuture.cancel` removes tasks from this `PriorityQueue`. Unfortunately, `PriorityQueue.remove` has `O(n)` performance since it must search for the item in the entire queue before removing it. This is fast when the future is at the front of the queue (e.g., already triggered) but not when it's randomly located in the queue.

Many servers will use `ScheduledFuture.cancel` on all requests, e.g., to manage a request timeout. As these cancellations will be happen in arbitrary order, when there are many scheduled futures, `PriorityQueue.remove` is a bottleneck and greatly hurts performance with many concurrent requests (>10K).

Modification:

Use netty's `DefaultPriorityQueue` for scheduling futures instead of the JDK. `DefaultPriorityQueue` is almost identical to the JDK version except it is able to remove futures without searching for them in the queue. This means `DefaultPriorityQueue.remove` has `O(log n)` performance.

Result:

Before - cancelling futures has varying performance, capped at `O(n)`
After - cancelling futures has stable performance, capped at `O(log n)`

Benchmark results

After - cancelling in order and in reverse order have similar performance within `O(log n)` bounds
```
Benchmark                                           (num)   Mode  Cnt       Score      Error  Units
ScheduledFutureTaskBenchmark.cancelInOrder            100  thrpt   20  137779.616 ± 7709.751  ops/s
ScheduledFutureTaskBenchmark.cancelInOrder           1000  thrpt   20   11049.448 ±  385.832  ops/s
ScheduledFutureTaskBenchmark.cancelInOrder          10000  thrpt   20     943.294 ±   12.391  ops/s
ScheduledFutureTaskBenchmark.cancelInOrder         100000  thrpt   20      64.210 ±    1.824  ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder     100  thrpt   20  167531.096 ± 9187.865  ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder    1000  thrpt   20   33019.786 ± 4737.770  ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder   10000  thrpt   20    2976.955 ±  248.555  ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder  100000  thrpt   20     362.654 ±   45.716  ops/s
```

Before - cancelling in order and in reverse order have significantly different performance at higher queue size, orders of magnitude worse than the new implementation.
```
Benchmark                                           (num)   Mode  Cnt       Score       Error  Units
ScheduledFutureTaskBenchmark.cancelInOrder            100  thrpt   20  139968.586 ± 12951.333  ops/s
ScheduledFutureTaskBenchmark.cancelInOrder           1000  thrpt   20   12274.420 ±   337.800  ops/s
ScheduledFutureTaskBenchmark.cancelInOrder          10000  thrpt   20     958.168 ±    15.350  ops/s
ScheduledFutureTaskBenchmark.cancelInOrder         100000  thrpt   20      53.381 ±    13.981  ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder     100  thrpt   20  123918.829 ±  3642.517  ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder    1000  thrpt   20    5099.810 ±   206.992  ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder   10000  thrpt   20      72.335 ±     0.443  ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder  100000  thrpt   20       0.743 ±     0.003  ops/s
```
2017-11-10 23:09:32 -08:00
Norman Maurer
188ea59c9d [maven-release-plugin] prepare for next development iteration 2017-11-08 22:36:53 +00:00
Norman Maurer
812354cf1f [maven-release-plugin] prepare release netty-4.1.17.Final 2017-11-08 22:36:33 +00:00
Carl Mastrangelo
83a19d5650 Optimistically update ref counts
Motivation:
Highly retained and released objects have contention on their ref
count.  Currently, the ref count is updated using compareAndSet
with care to make sure the count doesn't overflow, double free, or
revive the object.

Profiling has shown that a non trivial (~1%) of CPU time on gRPC
latency benchmarks is from the ref count updating.

Modification:
Rather than pessimistically assuming the ref count will be invalid,
optimistically update it assuming it will be.  If the update was
wrong, then use the slow path to revert the change and throw an
execption.  Most of the time, the ref counts are correct.

This changes from using compareAndSet to getAndAdd, which emits a
different CPU instruction on x86 (CMPXCHG to XADD).  Because the
CPU knows it will modifiy the memory, it can avoid contention.

On a highly contended machine, this can be about 2x faster.

There is a downside to the new approach.  The ref counters can
temporarily enter invalid states if over retained or over released.
The code does handle these overflow and underflow scenarios, but it
is possible that another concurrent access may push the failure to
a different location.  For example:

Time 1 Thread 1: obj.retain(INT_MAX - 1)
Time 2 Thread 1: obj.retain(2)
Time 2 Thread 2: obj.retain(1)

Previously Thread 2 would always succeed and Thread 1 would always
fail on the second access.  Now, thread 2 could fail while thread 1
is rolling back its change.

====

There are a few reasons why I think this is okay:

1. Buggy code is going to have bugs.  An exception _is_ going to be
   thrown.  This just causes the other threads to notice the state
   is messed up and stop early.
2. If high retention counts are a use case, then ref count should
   be a long rather than an int.
3. The critical section is greatly reduced compared to the previous
   version, so the likelihood of this happening is lower
4. On error, the code always rollsback the change atomically, so
   there is no possibility of corruption.

Result:
Faster refcounting

```
BEFORE:

Benchmark                                                                                             (delay)    Mode      Cnt         Score    Error  Units
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                            1  sample  2901361       804.579 ±  1.835  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                           10  sample  3038729       785.376 ± 16.471  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                          100  sample  2899401       817.392 ±  6.668  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                         1000  sample  3650566      2077.700 ±  0.600  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                        10000  sample  3005467     19949.334 ±  4.243  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                          1  sample   456091        48.610 ±  1.162  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                         10  sample   732051        62.599 ±  0.815  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                        100  sample   778925       228.629 ±  1.205  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                       1000  sample   633682      2002.987 ±  2.856  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                      10000  sample   506442     19735.345 ± 12.312  ns/op

AFTER:
Benchmark                                                                                             (delay)    Mode      Cnt         Score    Error  Units
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                            1  sample  3761980       383.436 ±  1.315  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                           10  sample  3667304       474.429 ±  1.101  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                          100  sample  3039374       479.267 ±  0.435  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                         1000  sample  3709210      2044.603 ±  0.989  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                        10000  sample  3011591     19904.227 ± 18.025  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                          1  sample   494975        52.269 ±  8.345  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                         10  sample   771094        62.290 ±  0.795  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                        100  sample   763230       235.044 ±  1.552  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                       1000  sample   634037      2006.578 ±  3.574  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                      10000  sample   506284     19742.605 ± 13.729  ns/op

```
2017-10-04 08:42:33 +02:00
Norman Maurer
625a7426cd [maven-release-plugin] prepare for next development iteration 2017-09-25 06:12:32 +02:00
Norman Maurer
f57d8f00e1 [maven-release-plugin] prepare release netty-4.1.16.Final 2017-09-25 06:12:16 +02:00
Norman Maurer
3c8c7fc7e9 Reduce performance overhead of ResourceLeakDetector
Motiviation:

The ResourceLeakDetector helps to detect and troubleshoot resource leaks and is often used even in production enviroments with a low level. Because of this its import that we try to keep the overhead as low as overhead. Most of the times no leak is detected (as all is correctly handled) so we should keep the overhead for this case as low as possible.

Modifications:

- Only call getStackTrace() if a leak is reported as it is a very expensive native call. Also handle the filtering and creating of the String in a lazy fashion
- Remove the need to mantain a Queue to store the last access records
- Add benchmark

Result:

Huge decrease of performance overhead.

Before the patch:

Benchmark                                           (recordTimes)   Mode  Cnt     Score     Error  Units
ResourceLeakDetectorRecordBenchmark.record                      8  thrpt   20  4358.367 ± 116.419  ops/s
ResourceLeakDetectorRecordBenchmark.record                     16  thrpt   20  2306.027 ±  55.044  ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint              8  thrpt   20  4220.979 ± 114.046  ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint             16  thrpt   20  2250.734 ±  55.352  ops/s

With this patch:

Benchmark                                           (recordTimes)   Mode  Cnt      Score      Error  Units
ResourceLeakDetectorRecordBenchmark.record                      8  thrpt   20  71398.957 ± 2695.925  ops/s
ResourceLeakDetectorRecordBenchmark.record                     16  thrpt   20  38643.963 ± 1446.694  ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint              8  thrpt   20  71677.882 ± 2923.622  ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint             16  thrpt   20  38660.176 ± 1467.732  ops/s
2017-09-18 16:36:19 -07:00
Norman Maurer
b967805f32 [maven-release-plugin] prepare for next development iteration 2017-08-24 15:38:22 +02:00
Norman Maurer
da8e010a42 [maven-release-plugin] prepare release netty-4.1.15.Final 2017-08-24 15:37:59 +02:00
Norman Maurer
52f384b37f [maven-release-plugin] prepare for next development iteration 2017-08-02 12:55:10 +00:00
Norman Maurer
8cc1071881 [maven-release-plugin] prepare release netty-4.1.14.Final 2017-08-02 12:54:51 +00:00
Nikolay Fedorovskikh
df568c739e Use ByteBuf#writeShort/writeMedium instead of writeBytes
Motivation:

1. Some encoders used a `ByteBuf#writeBytes` to write short constant byte array (2-3 bytes). This can be replaced with more faster `ByteBuf#writeShort` or `ByteBuf#writeMedium` which do not access the memory.
2. Two chained calls of the `ByteBuf#setByte` with constants can be replaced with one `ByteBuf#setShort` to reduce index checks.
3. The signature of method `HttpHeadersEncoder#encoderHeader` has an unnecessary `throws`.

Modifications:

1. Use `ByteBuf#writeShort` or `ByteBuf#writeMedium` instead of `ByteBuf#writeBytes` for the constants.
2. Use `ByteBuf#setShort` instead of chained call of the `ByteBuf#setByte` with constants.
3. Remove an unnecessary `throws` from `HttpHeadersEncoder#encoderHeader`.

Result:

A bit faster writes constants into buffers.
2017-07-10 14:37:41 +02:00
Norman Maurer
2a376eeb1b [maven-release-plugin] prepare for next development iteration 2017-07-06 13:24:06 +02:00
Norman Maurer
c7f8168324 [maven-release-plugin] prepare release netty-4.1.13.Final 2017-07-06 13:23:51 +02:00
Dmitriy Dumanskiy
dd69a813d4 Performance improvement for HttpRequestEncoder. Insert char into the string optimized.
Motivation:

Right now HttpRequestEncoder does insertion of slash for url like http://localhost?pararm=1 before the question mark. It is done not effectively.

Modification:

Code:

new StringBuilder(len + 1)
                .append(uri, 0, index)
                .append(SLASH)
                .append(uri, index, len)
                .toString();
Replaced with:

new StringBuilder(uri)
                .insert(index, SLASH)
                .toString();
Result:

Faster HttpRequestEncoder. Additional small test. Attached benchmark in PR.

Benchmark                                      Mode  Cnt        Score        Error  Units
HttpRequestEncoderInsertBenchmark.newEncoder  thrpt   40  3704843.303 ±  98950.919  ops/s
HttpRequestEncoderInsertBenchmark.oldEncoder  thrpt   40  3284236.960 ± 134433.217  ops/s
2017-06-27 10:53:43 +02:00
Nikolay Fedorovskikh
aa38b6a769 Prevent unnecessary allocations in the StringUtil#escapeCsv
Motivation:

A `StringUtil#escapeCsv` creates new `StringBuilder` on each value even if the same string is returned in the end.

Modifications:

Create new `StringBuilder` only if it really needed. Otherwise, return the original string (or just trimmed substring).

Result:

Less GC load. Up to 4x faster work for not changed strings.
2017-06-13 14:57:38 -07:00
Dmitriy Dumanskiy
acc07fac32 disabling leak detection micro benchmark
Motivation:

When I run Netty micro benchmarks I get many warnings like:

WARNING: -Dio.netty.noResourceLeakDetection is deprecated. Use '-Dio.netty.leakDetection.level=simple' instead.

Modification:

-Dio.netty.noResourceLeakDetection replaced with -Dio.netty.leakDetection.level=disabled.

Result:

No warnings.
2017-06-09 18:03:54 +02:00
Norman Maurer
fd67a2354d [maven-release-plugin] prepare for next development iteration 2017-06-08 21:06:24 +02:00
Norman Maurer
3acd5c68ea [maven-release-plugin] prepare release netty-4.1.12.Final 2017-06-08 21:06:01 +02:00
Nikolay Fedorovskikh
e4531918a3 Optimizations in NetUtil
Motivation:

IPv4/6 validation methods use allocations, which can be avoided.
IPv4 parse method use StringTokenizer.

Modifications:

Rewriting IPv4/6 validation methods to avoid allocations.
Rewriting IPv4 parse method without use StringTokenizer.

Result:

IPv4/6 validation and IPv4 parsing faster up to 2-10x.
2017-05-18 16:42:22 -07:00
Norman Maurer
0db2901f4d [maven-release-plugin] prepare for next development iteration 2017-05-11 16:00:55 +02:00
Norman Maurer
f7a19d330c [maven-release-plugin] prepare release netty-4.1.11.Final 2017-05-11 16:00:16 +02:00
Scott Mitchell
3cc4052963 New native transport for kqueue
Motivation:
We currently don't have a native transport which supports kqueue https://www.freebsd.org/cgi/man.cgi?query=kqueue&sektion=2. This can be useful for BSD systems such as MacOS to take advantage of native features, and provide feature parity with the Linux native transport.

Modifications:
- Make a new transport-native-unix-common module with all the java classes and JNI code for generic unix items. This module will build a static library for each unix platform, and included in the dynamic libraries used for JNI (e.g. transport-native-epoll, and eventually kqueue).
- Make a new transport-native-unix-common-tests module where the tests for the transport-native-unix-common module will live. This is so each unix platform can inherit from these test and ensure they pass.
- Add a new transport-native-kqueue module which uses JNI to directly interact with kqueue

Result:
JNI support for kqueue.
Fixes https://github.com/netty/netty/issues/2448
Fixes https://github.com/netty/netty/issues/4231
2017-05-03 09:53:22 -07:00
Norman Maurer
6915ec3bb9 [maven-release-plugin] prepare for next development iteration 2017-04-29 14:10:00 +02:00
Norman Maurer
f30f242fee [maven-release-plugin] prepare release netty-4.1.10.Final 2017-04-29 14:09:32 +02:00
Nikolay Fedorovskikh
0692bf1b6a fix the typos 2017-04-20 04:56:09 +02:00
Norman Maurer
e482d933f7 Add 'io.netty.tryAllocateUninitializedArray' system property which allows to allocate byte[] without memset in Java9+
Motivation:

Java9 added a new method to Unsafe which allows to allocate a byte[] without memset it. This can have a massive impact in allocation times when the byte[] is big. This change allows to enable this when using Java9 with the io.netty.tryAllocateUninitializedArray property when running Java9+. Please note that you will need to open up the jdk.internal.misc package via '--add-opens java.base/jdk.internal.misc=ALL-UNNAMED' as well.

Modifications:

Allow to allocate byte[] without memset on Java9+

Result:

Better performance when allocate big heap buffers and using java9.
2017-04-19 11:45:39 +02:00
Ade Setyawan Sajim
016629fe3b Replace system.out.println with InternalLoggerFactory
Motivation:

There are two files that still use `system.out.println` to log their status

Modification:

Replace `system.out.println` with a `debug` function inside an instance of `InternalLoggerFactory`

Result:

Introduce an instance of `InternalLoggerFactory` in class `AbstractMicrobenchmark.java` and `AbstractSharedExecutorMicrobenchmark.java`
2017-03-28 14:51:59 +02:00
Norman Maurer
2b8c8e0805 [maven-release-plugin] prepare for next development iteration 2017-03-10 07:46:17 +01:00
Norman Maurer
1db58ea980 [maven-release-plugin] prepare release netty-4.1.9.Final 2017-03-10 07:45:28 +01:00
Scott Mitchell
743d2d374c SslHandler benchmark and SslEngine multiple packets benchmark
Motivation:
We currently don't have a benchmark which includes SslHandler. The SslEngine benchmarks also always include a single TLS packet when encoding/decoding. In practice when reading data from the network there may be multiple TLS packets present and we should expand the benchmarks to understand this use case.

Modifications:
- SslEngine benchmarks should include wrapping/unwrapping of multiple TLS packets
- Introduce SslHandler benchmarks which can also account for wrapping/unwrapping of multiple TLS packets

Result:
SslHandler and SslEngine benchmarks are more comprehensive.
2017-03-06 08:42:39 -08:00
Scott Mitchell
f9001b9fc0 HTTP/2 move internal HPACK classes to the http2 package
Motivation:
The internal.hpack classes are no longer exposed in our public APIs and can be made package private in the http2 package.

Modifications:
- Make the hpack classes package private in the http2 package

Result:
Less APIs exposed as public.
2017-03-02 07:42:41 -08:00
Norman Maurer
461f9a1212 Allow to obtain informations of used direct and heap memory for ByteBufAllocator implementations
Motivation:

Often its useful for the user to be able to get some stats about the memory allocated via an allocator.

Modifications:

- Allow to obtain the used heap and direct memory for an allocator
- Add test case

Result:

Fixes [#6341]
2017-03-01 18:53:43 +01:00
Norman Maurer
90a61046c7 Add benchmarks for UnpooledUnsafeNoCleanerDirectByteBuf vs UnpooledUnsafeDirectByteBuf
Motivation:

Issue [#6349] brought up the idea to not use UnpooledUnsafeNoCleanerDirectByteBuf by default. To decide what to do a benchmark is needed.

Modifications:

Add benchmarks for UnpooledUnsafeNoCleanerDirectByteBuf vs UnpooledUnsafeDirectB
yteBuf

Result:

Better idea about impact of using UnpooledUnsafeNoCleanerDirectByteBuf.
2017-02-27 20:04:09 +01:00
Norman Maurer
d73477c7bd Add benchmarks for SSLEngine implementations
Motivation:

As we provide our own SSLEngine implementation we should have benchmarks to compare it against JDK impl.

Modifications:

Add benchmarks for wrap / unwrap and handshake performance.

Result:

Benchmarks FTW.
2017-02-24 08:02:10 +01:00
Norman Maurer
a80d3411ee Move all the microbenchmark code into one directory.
Motivation:

Allmost all our benchmarks are in src/main/java but a few are in src/test/java. We should make it consistent.

Modifications:

Move everything to src/main/java

Result:

Consistent code base.
2017-02-23 19:59:09 +01:00
Nikolay Fedorovskikh
0623c6c533 Fix javadoc issues
Motivation:

Invalid javadoc in project

Modifications:

Fix it

Result:

More correct javadoc
2017-02-22 07:31:07 +01:00
Nikolay Fedorovskikh
634a8afa53 Fix some warnings at generics usage
Motivation:

Existing warnings from java compiler

Modifications:

Add/fix type parameters

Result:

Less warnings
2017-02-22 07:29:59 +01:00
Norman Maurer
fd2e142e74 Update to latest jmh version
Motivation:

We use an outdated jmh version.

Modifications:

Update to jmh 1.17.4.

Result:

Using latest jmh version.
2017-02-14 08:40:12 +01:00
Kiril Menshikov
66b9be3a46 Allow to allign allocated Buffers
Motivation:

64-byte alignment is recommended by the Intel performance guide (https://software.intel.com/en-us/articles/practical-intel-avx-optimization-on-2nd-generation-intel-core-processors) for data-structures over 64 bytes.
Requiring padding to a multiple of 64 bytes allows for using SIMD instructions consistently in loops without additional conditional checks. This should allow for simpler and more efficient code.

Modification:

At the moment cache alignment must be setup manually. But probably it might be taken from the system. The original code was introduced by @normanmaurer https://github.com/netty/netty/pull/4726/files

Result:

Buffer alignment works better than miss-align cache.
2017-02-06 07:58:29 +01:00