669 Commits

Author SHA1 Message Date
Nick Hill
d7fa7be67f Exploit PlatformDependent.allocateUninitializedArray() in more places (#8393)
Motivation:

There are currently many more places where this could be used which were
possibly not considered when the method was added.

If https://github.com/netty/netty/pull/8388 is included in its current
form, a number of these places could additionally make use of the same
BYTE_ARRAYS threadlocal.

There's also a couple of adjacent places where an optimistically-pooled
heap buffer is used for temp byte storage which could use the
threadlocal too in preference to allocating a temp heap bytebuf wrapper.
For example
https://github.com/netty/netty/blob/4.1/buffer/src/main/java/io/netty/buffer/ByteBufUtil.java#L1417.

Modifications:

Replace new byte[] with PlatformDependent.allocateUninitializedArray()
where appropriate; make use of ByteBufUtil.getBytes() in some places
which currently perform the equivalent logic, including avoiding copy of
backing array if possible (although would be rare).

Result:

Further potential speed-up with java9+ and appropriate compile flags.
Many of these places could be on latency-sensitive code paths.
2018-10-27 10:43:28 -05:00
Nick Hill
583d838f7c Optimize AbstractByteBuf.getCharSequence() in US_ASCII case (#8392)
* Optimize AbstractByteBuf.getCharSequence() in US_ASCII case

Motivation:

Inspired by https://github.com/netty/netty/pull/8388, I noticed this
simple optimization to avoid char[] allocation (also suggested in a TODO
here).

Modifications:

Return an AsciiString from AbstractByteBuf.getCharSequence() if
requested charset is US_ASCII or ISO_8859_1 (latter thanks to
@Scottmitch's suggestion). Also tweak unit tests not to require Strings
and include a new benchmark to demonstrate the speedup.

Result:

Speed-up of AbstractByteBuf.getCharSequence() in ascii and iso 8859/1
cases
2018-10-26 15:32:38 -07:00
Norman Maurer
87ec2f882a
Reduce overhead by ByteBufUtil.decodeString(...) which is used by AbstractByteBuf.toString(...) and AbstractByteBuf.getCharSequence(...) (#8388)
Motivation:

Our current implementation that is used for toString(Charset) operations on AbstractByteBuf implementation is quite slow as it does a lot of uncessary memory copies. We should just use new String(...) as it has a lot of optimizations to handle these cases.

Modifications:

Rewrite ByteBufUtil.decodeString(...) to use new String(...)

Result:

Less overhead for toString(Charset) operations.

Benchmark                                         (charsetName)  (direct)  (size)   Mode  Cnt         Score         Error  Units
ByteBufUtilDecodeStringBenchmark.decodeString          US-ASCII     false       8  thrpt   20  22401645.093 ? 4671452.479  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString          US-ASCII     false      64  thrpt   20  23678483.384 ? 3749164.446  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString          US-ASCII      true       8  thrpt   20  15731142.651 ? 3782931.591  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString          US-ASCII      true      64  thrpt   20  16244232.229 ? 1886259.658  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString             UTF-8     false       8  thrpt   20  25983680.959 ? 5045782.289  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString             UTF-8     false      64  thrpt   20  26235589.339 ? 2867004.950  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString             UTF-8      true       8  thrpt   20  18499027.808 ? 4784684.268  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString             UTF-8      true      64  thrpt   20  16825286.141 ? 1008712.342  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString            UTF-16     false       8  thrpt   20   5789879.092 ? 1201786.359  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString            UTF-16     false      64  thrpt   20   2173243.225 ?  417809.341  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString            UTF-16      true       8  thrpt   20   5035583.011 ? 1001978.854  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString            UTF-16      true      64  thrpt   20   2162345.301 ?  402410.408  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString        ISO-8859-1     false       8  thrpt   20  30039052.376 ? 6539111.622  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString        ISO-8859-1     false      64  thrpt   20  31414163.515 ? 2096710.526  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString        ISO-8859-1      true       8  thrpt   20  19538587.855 ? 4639115.572  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString        ISO-8859-1      true      64  thrpt   20  19467839.722 ? 1672687.213  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld       US-ASCII     false       8  thrpt   20  10787326.745 ? 1034197.864  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld       US-ASCII     false      64  thrpt   20   7129801.930 ? 1363019.209  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld       US-ASCII      true       8  thrpt   20   9002529.605 ? 2017642.445  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld       US-ASCII      true      64  thrpt   20   3860192.352 ?  826218.738  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld          UTF-8     false       8  thrpt   20  10532838.027 ? 2151743.968  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld          UTF-8     false      64  thrpt   20   7185554.597 ? 1387685.785  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld          UTF-8      true       8  thrpt   20   7352253.316 ? 1333823.850  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld          UTF-8      true      64  thrpt   20   2825578.707 ?  349701.156  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld         UTF-16     false       8  thrpt   20   7277446.665 ? 1447034.346  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld         UTF-16     false      64  thrpt   20   2445929.579 ?  562816.641  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld         UTF-16      true       8  thrpt   20   6201174.401 ? 1236137.786  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld         UTF-16      true      64  thrpt   20   2310674.973 ?  525587.959  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld     ISO-8859-1     false       8  thrpt   20  11142625.392 ? 1680556.468  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld     ISO-8859-1     false      64  thrpt   20   8127116.405 ? 1128513.860  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld     ISO-8859-1      true       8  thrpt   20   9405751.952 ? 2193324.806  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld     ISO-8859-1      true      64  thrpt   20   3943282.076 ?  737798.070  ops/s

Benchmark result is saved to /home/norman/mainframer/netty/microbench/target/reports/performance/ByteBufUtilDecodeStringBenchmark.json
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1,030.173 sec - in io.netty.buffer.ByteBufUtilDecodeStringBenchmark
[1030.460s][info   ][gc,heap,exit ] Heap
[1030.460s][info   ][gc,heap,exit ]  garbage-first heap   total 516096K, used 257918K [0x0000000609a00000, 0x0000000800000000)
[1030.460s][info   ][gc,heap,exit ]   region size 2048K, 127 young (260096K), 2 survivors (4096K)
[1030.460s][info   ][gc,heap,exit ]  Metaspace       used 17123K, capacity 17438K, committed 17792K, reserved 1064960K
[1030.460s][info   ][gc,heap,exit ]   class space    used 1709K, capacity 1827K, committed 1920K, reserved 1048576K
2018-10-19 14:00:13 +02:00
Norman Maurer
69545aedc4
CompositeByteBuf.decompose(...) does not correctly slice content. (#8403)
Motivation:

CompositeByteBuf.decompose(...) did not correctly slice the content and so produced an incorrect representation of the data.

Modifications:

- Rewrote implementation to fix bug and also improved it to reduce GC
- Add unit tests.

Result:

Fixes https://github.com/netty/netty/issues/8400.
2018-10-19 08:05:22 +02:00
Nick Hill
7062ceedb0 Simplify ByteBufInputStream.readLine() logic (#8380)
Motivation:

While looking at the nice optimization done in
https://github.com/netty/netty/pull/8347 I couldn't help noticing the
logic could be simplified further. Apologies if this is just my OCD and
inappropriate!

Modifications:

Reduce amount of code used for ByteBufInputStream.readLine()

Result:

Slightly smaller and simpler code
2018-10-13 06:24:40 +02:00
Francesco Nigro
83dc3b503e ByteBufInputStream is always allocating a StringBuilder instance (#8347)
Motivation:

Avoid creating any StringBuilder instance if
ByteBufInputStream::readLine isn't used

Modifications:

The StringBuilder instance is lazy allocated on demand and
are added new test case branches to address the increased
complexity of ByteBufInputStream::readLine

Result:

Reduced GC activity if ByteBufInputStream::readLine isn't used
2018-10-11 14:56:29 +08:00
Dmitriy Dumanskiy
6cebb6069b remove unnecessary vararg argument in PooledByteBufAllocator (#8338)
Motivation:

No need in varargs, the method always accepts array.

Modification:

... replaced with []
2018-10-05 19:06:44 +08:00
Norman Maurer
c14efd952d
Directly init refCnt to 1 (#8274)
Motivation:

We should just directly init the refCnt to 1 and not use the AtomicIntegerFieldUpdater.

Modifications:

Just assing directly to 1.

Result:

Cleaner code and possible a bit faster as the JVM / JIT may be able to optimize the first store easily.
2018-09-07 19:04:19 +02:00
Norman Maurer
e542a2cf26
Use a non-volatile read for ensureAccessible() whenever possible to reduce overhead and allow better inlining. (#8266)
Motiviation:

At the moment whenever ensureAccessible() is called in our ByteBuf implementations (which is basically on each operation) we will do a volatile read. That per-se is not such a bad thing but the problem here is that it will also reduce the the optimizations that the compiler / jit can do. For example as these are volatile it can not eliminate multiple loads of it when inline the methods of ByteBuf which happens quite frequently because most of them a quite small and very hot. That is especially true for all the methods that act on primitives.

It gets even worse as people often call a lot of these after each other in the same method or even use method chaining here.

The idea of the change is basically just ue a non-volatile read for the ensureAccessible() check as its a best-effort implementation to detect acting on already released buffers anyway as even with a volatile read it could happen that the user will release it in another thread before we actual access the buffer after the reference check.

Modifications:

- Try to do a non-volatile read using sun.misc.Unsafe if we can use it.
- Add a benchmark

Result:

Big performance win when multiple ByteBuf methods are called from a method.

With the change:
UnsafeByteBufBenchmark.setGetLongUnsafeByteBuf  thrpt   20  281395842,128 ± 5050792,296  ops/s

Before the change:
UnsafeByteBufBenchmark.setGetLongUnsafeByteBuf  thrpt   20  217419832,801 ± 5080579,030  ops/s
2018-09-07 07:47:02 +02:00
Francesco Nigro
c78be33443 Added configurable ByteBuf bounds checking (#7521)
Motivation:

The JVM isn't always able to hoist out/reduce bounds checking (due to ref counting operations etc etc) hence making it configurable could improve performances for most CPU intensive use cases.

Modifications:

Each AbstractByteBuf bounds check has been tested against a new static final configuration property similar to checkAccessible ie io.netty.buffer.bytebuf.checkBounds.

Result:

Any user could disable ByteBuf bounds checking in order to get extra performances.
2018-09-03 20:33:47 +02:00
Norman Maurer
54f565ac67
Allow to use native transports when sun.misc.Unsafe is not present on… (#8231)
* Allow to use native transports when sun.misc.Unsafe is not present on the system

Motivation:

We should be able to use the native transports (epoll / kqueue) even when sun.misc.Unsafe is not present on the system. This is especially important as Java11 will be released soon and does not allow access to it by default.

Modifications:

- Correctly disable usage of sun.misc.Unsafe when -PnoUnsafe is used while running the build
- Correctly increment metric when UnpooledDirectByteBuf is allocated. This was uncovered once -PnoUnsafe usage was fixed.
- Implement fallbacks in all our native transport code for when sun.misc.Unsafe is not present.

Result:

Fixes https://github.com/netty/netty/issues/8229.
2018-08-29 19:36:33 +02:00
vincent-grosbois
0bea8ecf5d CompositeByteBuf nioBuffer doesn't always alloc (#8176)
In nioBuffer(int,int) in CompositeByteBuf , we create a sub-array of nioBuffers for the components that are in range, then concatenate all the components in range into a single bigger buffer.
However, if the call to nioBuffers() returned only one sub-buffer, then we are copying it to a newly-allocated buffer "merged" for no reason.

Motivation:

Profiler for Spark shows a lot of time spent in put() method inside nioBuffer(), while usually no copy of data is required.

Modification:
This change skips this last step and just returns a duplicate of the single buffer returned by the call to nioBuffers(), which will in most implementation not copy the data

Result:
No copy when the source is only 1 buffer
2018-08-07 11:31:24 +02:00
Norman Maurer
9b08dbca00
Leak detection combined with composite buffers results in incorrectly handled writerIndex when calling ByteBufUtil.writeAscii/writeUtf8 (#8153)
Motivation:

We need to add special handling for WrappedCompositeByteBuf as these also extend AbstractByteBuf, otherwise we will not correctly adjust / read the writerIndex during processing.

Modifications:

- Add instanceof checks for WrappedCompositeByteBuf as well.
- Add testcases

Result:

Fixes https://github.com/netty/netty/issues/8152.
2018-07-27 01:56:09 +08:00
Norman Maurer
6afab517b0
Guard against calling PoolThreadCache.free() multiple times. (#8108)
Motivation:

5b1fe611a637c362a60b391079fff73b1a4ef912 introduced the usage of a finalizer as last resort for PoolThreadCache. As we may call free() from the FastThreadLocal.onRemoval(...) and finalize() we need to guard against multiple calls as otherwise we will corrupt internal state (that is used for metrics).

Modifications:

Use AtomicBoolean to guard against multiple calls of PoolThreadCache.free().

Result:

No more corruption of internal state caused by calling PoolThreadCache.free() multuple times.
2018-07-09 15:58:12 -04:00
Nick Hill
fef462c043 Deprecate Unpooled.unmodifiableBuffer(ByteBuf...) (#8096)
Motivation:

Recent PR https://github.com/netty/netty/pull/8040 introduced
Unpooled.wrappedUnmodifiableBuffer(ByteBuf...) which has the same
behaviour but wraps the provided array directly. This is preferred for
most uses (including varargs-based use) and if there are any unusual
cases of an explicit array which is re-used before the ByteBuf is
finished with, it can just be copied first.

Modifications:

Added @Deprecated annotation and javadoc to
Unpooled.unmodifiableBuffer(ByteBuf...).

Result:

Unpooled.unmodifiableBuffer(ByteBuf...) will be deprecated.
2018-07-07 14:45:27 -04:00
Norman Maurer
83710cb2e1
Replace toArray(new T[size]) with toArray(new T[0]) to eliminate zero-out and allow the VM to optimize. (#8075)
Motivation:

Using toArray(new T[0]) is usually the faster aproach these days. We should use it.

See also https://shipilev.net/blog/2016/arrays-wisdom-ancients/#_conclusion.

Modifications:

Replace toArray(new T[size]) with toArray(new T[0]).

Result:

Faster code.
2018-06-29 07:56:04 +02:00
Norman Maurer
5b1fe611a6
Remove usage of ObjectCleaner (#8064)
Motivation:

ObjectCleaner does start a Thread to handle the cleaning of resources which leaks into the users application. We should not use it in netty itself to make things more predictable.

Modifications:

- Remove usage of ObjectCleaner and use finalize as a replacement when possible.
- Clarify javadocs for FastThreadLocal.onRemoval(...) to ensure its clear that remove() is not guaranteed to be called when the Thread completees and so this method is not enough to guarantee cleanup for this case.

Result:

Fixes https://github.com/netty/netty/issues/8017.
2018-06-28 08:15:27 +02:00
nickhill
f164759ea3 Support composite buffer creation without array alloc and copy
Motivation:

Unpooled.unmodifiableBuffer() is currently used to efficiently write
arrays of ByteBufs via FixedCompositeByteBuf, but involves an allocation
and content-copy of the provided ByteBuf array which in many (most?)
cases shouldn't be necessary.

Modifications:

Modify the internal FixedCompositeByteBuf class to support wrapping the
provided ByteBuf array directly. Control this behaviour with a
constructor flag and expose the "unsafe" version via a new
Unpooled.wrappedUnmodifiableBuffer(ByteBuf...) method.

Result:

Less garbage on IO paths. I would guess pretty much all existing usage
of unmodifiableBuffer() could use the copy-free version but assume it's
not safe to change its default behaviour.
2018-06-27 07:40:14 +02:00
nickhill
9b95b8ee62 Reduce array allocations during CompositeByteBuf construction
Motivation:

Eliminate avoidable backing array reallocations when constructing
composite ByteBufs from existing buffer arrays/Iterables. This also
applies to the Unpooled.wrappedBuffer(...) methods.

Modifications:

Ensure the initial components ComponentList is sized at least as large
as the provided buffer array/Iterable in the CompositeByteBuffer
constructors.

In single-arg Unpooled.wrappedBuffer(...) methods, set maxNumComponents
to the count of provided buffers, rather than a fixed default of 16. It
seems likely that most usage of these involves wrapping a list without
subsequent modification, particularly since they return a ByteBuf rather
than CompositeByteBuf. If a different/larger max is required there are
already the wrappedBuffer(int, ...) variants.

In fact the current behaviour could be considered inconsistent - if you
call Unpooled.wrappedBuffer(int, ByteBuf) with a single buffer, you
might expect to subsequently be able to add buffers to it (since you
specified a max related to consolidation), but it will in fact return
just a slice of the provided ByteBuf.

Result:

Fewer and smaller allocations in some cases when using CompositeByteBufs
or Unpooled.wrappedBuffer(...).
2018-06-20 16:09:23 +02:00
Tim Brooks
35215309b9 Make UnpooledHeapByteBuf array methods protected (#8015)
Motivation:

Currently there is not a clear way to provide a byte array to a netty
ByteBuf and be informed when it is released. This is a would be a
valuable addition for projects that integrate with netty but also pool
their own byte arrays.

Modification:

Modified the UnpooledHeapByteBuf class so that the freeArray method is
protected visibility instead of default. This will allow a user to
subclass the UnpooledHeapByteBuf, provide a byte array, and override
freeArray to return the byte array to a pool when it is called.
Additionally this makes this implementation equivalent to
UnpooledDirectByteBuf (freeDirect is protected).

Additionally allocateArray is also made protect to provide another override
option for subclasses.

Result:

Users can override UnpooledHeapByteBuf#freeArray and
UnpooledHeapByteBuf#allocateArray.
2018-06-13 11:43:31 -07:00
zekaryu
09d9daf1c4 Update the comment of io.netty.buffer.PoolChunk.java (#7934)
Motivation:

When I read the source code, I found that the comment of PoolChunk is out of date, it may confuses readers with the description about memoryMap.

Modifications:

update the last passage of the comment of the PoolChunk class.

Result:
No change to any source code , just update comment.
2018-05-14 15:44:32 +02:00
Xiaoyan Lin
a0ed6ec06c Fix the error message in ReferenceCounted.release (#7921)
Motivation:

When a buffer is over-released, the current error message of `IllegalReferenceCountException` is `refCnt: XXX, increment: XXX`, which is confusing. The correct message should be `refCnt: XXX, decrement: XXX`.

Modifications:

Pass `-decrement` to create `IllegalReferenceCountException`.

Result:

The error message will be `refCnt: XXX, decrement: XXX` when a buffer is over-released.
2018-05-08 20:09:16 +02:00
Rikki Gibson
1b1f7677ac Implement isWritable and ensureWritable on ReadOnlyByteBufferBuf (#7883)
Motivation:

It should be possible to write a ReadOnlyByteBufferBuf to a channel without errors. However, ReadOnlyByteBufferBuf does not override isWritable and ensureWritable, which can cause some handlers to mistakenly assume they can write to the ReadOnlyByteBufferBuf, resulting in ReadOnlyBufferException.

Modification:

Added isWritable and ensureWritable method overrides on ReadOnlyByteBufferBuf to indicate that it is never writable. Added tests for these methods.

Result:

Can successfully write ReadOnlyByteBufferBuf to a channel with an SslHandler (or any other handler which may attempt to write to the ByteBuf it receives).
2018-04-23 08:31:30 +02:00
Nikolay Fedorovskikh
81a7d1413b Makes EmptyByteBuf#hashCode and AbstractByteBuf#hashCode consistent (#7870)
Motivation:
The `AbstractByteBuf#equals` method doesn't take into account the
class of buffer instance. So the two buffers with different classes
must have the same `hashCode` values if `equals` method returns `true`.
But `EmptyByteBuf#hashCode` is not consistent with `#hashCode`
of the empty `AbstractByteBuf`, that is violates the contract and
can lead to errors.

Modifications:
Return `1` in `EmptyByteBuf#hashCode`.

Result:
Consistent behavior of `EmptyByteBuf#hashCode` and `AbstractByteBuf#hashCode`.
2018-04-16 12:11:42 +02:00
Nikolay Fedorovskikh
f8ff834f03 Checks accessibility in the #slice and #duplicate methods of ByteBuf (#7846)
Motivation:
The `ByteBuf#slice` and `ByteBuf#duplicate` methods should check
an accessibility to prevent creation slice or duplicate
of released buffer. At now this works not in the all scenarios.

Modifications:
Add missed checks.

Result:
More correct and consistent behavior of `ByteBuf` methods.
2018-04-10 10:41:50 +02:00
Nikolay Fedorovskikh
a95fd91bc6 Don't check accessible in the #capacity method (#7830)
Motivation:
The `#ensureAccessible` method in `UnpooledHeapByteBuf#capacity` used
to prevent NPE if buffer is released and `array` is `null`. In all
other implementations of `ByteBuf` the accessible is not checked by
`capacity` method. We can assign an empty array to `array`
in the `deallocate` and don't worry about NPE in the `#capacity`.
This will help reduce the number of repeated calls of the
`#ensureAccessible` in many operations with `UnpooledHeapByteBuf`.

Modifications:
1. Remove `#ensureAccessible` call from `UnpooledHeapByteBuf#capacity`.
Use the `EmptyArrays#EMPTY_BYTES` instead of `null` in `#deallocate`.

2. Fix access checks in `AbstractUnsafeSwappedByteBuf` and
`AbstractByteBuf#slice` that relied on `#ensureAccessible`
in `UnpooledHeapByteBuf#capacity`. This was found by unit tests.

Result:
Less double calls of `#ensureAccessible` for `UnpooledHeapByteBuf`.
2018-04-03 21:35:02 +02:00
Norman Maurer
965734a1eb
Limit the number of bytes to use to copy the content of a direct buffer to an Outputstream (#7813)
Motivation:

Currently copying a direct ByteBuf copies it fully into the heap before writing it to an output stream.
The can result in huge memory usage on the heap.

Modification:

copy the bytebuf contents via an 8k buffer into the output stream

Result:

Fixes #7804
2018-03-29 12:49:27 +02:00
Norman Maurer
bd772d127e FixedCompositeByteBuf should allow to access memoryAddress / array when wrap a single buffer.
Motivation:

We should allow to access the memoryAddress / array of the FixedCompositeByteBuf when it only wraps a single ByteBuf. We do the same for CompositeByteBuf.

Modifications:

- Check how many buffers FixedCompositeByteBuf wraps and depending on it delegate the access to the memoryAddress / array
- Add unit tests.

Result:

Fixes [#7752].
2018-03-13 08:50:42 +01:00
kakashiio
12ccd40c5a Correctly throw IndexOutOfBoundsException when writerIndex < readerIndex
Motivation:

If someone invoke writeByte(), markWriterIndex(), readByte() in order first, and then invoke resetWriterIndex() should be throw a IndexOutOfBoundsException to obey the rule that the buffer declared "0 <= readerIndex <= writerIndex <= capacity".

Modification:

Changed the code writerIndex = markedWriterIndex; into writerIndex(markedWriterIndex); to make the check affect

Result:
Throw IndexOutOfBoundsException if any invalid happened in resetWriterIndex.
2018-03-02 10:05:33 +09:00
Francesco Nigro
ed46c4ed00 Copies from read-only heap ByteBuffer to direct ByteBuf can avoid stealth ByteBuf allocation and additional copies
Motivation:

Read-only heap ByteBuffer doesn't expose array: the existent method to perform copies to direct ByteBuf involves the creation of a (maybe pooled) additional heap ByteBuf instance and copy

Modifications:

To avoid stressing the allocator with additional (and stealth) heap ByteBuf allocations is provided a method to perform copies using the (pooled) internal NIO buffer

Result:

Copies from read-only heap ByteBuffer to direct ByteBuf won't create any intermediate ByteBuf
2018-02-27 09:54:21 +09:00
Francesco Nigro
bc8e022601 Added exact utf8 length estimator and exposed writeUtf8 with custom space reservation on destination buffer
Motivation:

To avoid eager allocation of the destination and to perform length prefixed encoding of UTF-8 string with forward only access pattern

Modifications:

The original writeUtf8 is modified by allowing customization of the reserved bytes on the destination buffer and is introduced an exact UTF-8 length estimator.

Result:

Is now possible to perform length first encoding with UTF-8 well-formed char sequences following a forward only write access pattern on the destination buffer.
2018-02-16 11:52:35 +01:00
Scott Mitchell
108fbe5282
ByteBufUtil to not pool direct memory by default
Motivation:
ByteBufUtil by default will cache DirectByteBuffer objects, and the
associated direct memory (up to 64k). In combination with the Recycler which may
cache up to 32k elements per thread may lead to a large amount of direct
memory being retained per EventLoop thread. As traffic spikes come this
may be perceived as a memory leak because the memory in the Recycler
will never be reclaimed.

Modifications:
- By default we shouldn't cache DirectByteBuffer objects.

Result:
Less direct memory consumption due to caching DirectByteBuffer objects.
2018-02-12 10:49:17 -08:00
Norman Maurer
fbbaf2bd7e Cleanup buffer tests.
Motivation:

There is some cleanup that can be done.

Modifications:

- Use intializer list expression where possible
- Remove unused imports.

Result:

Cleaner code.
2018-02-02 07:33:46 +01:00
Norman Maurer
011841e454 ReadOnlyUnsafeDirectByteBuf.memoryAddress() should not throw
Motivation:

We need the memoryAddress of a direct buffer when using our native transports. For this reason ReadOnlyUnsafeDirectByteBuf.memoryAddress() should not throw.

Modifications:

- Correctly override ReadOnlyUnsafeDirectByteBuf.memoryAddress() and hasMemoryAddress()
- Add test case

Result:

Fixes [#7672].
2018-02-02 07:27:26 +01:00
Norman Maurer
95b9b0af5c Increase timeout and decrement number of operations in AbstractByteBufTest.testToStringMultipleThreads
Motivation:

We saw some timeouts on the CI when the leak detection is enabled.

Modifications:

- Use smaller number of operations in test
- Increase timeout

Result:

CI not times out.
2018-01-31 14:57:38 +01:00
Norman Maurer
d2bd36fc4c ByteBufUtil.isText method should be safe to be called concurrently
Motivation:

ByteBufUtil.isText(...) may produce unexpected results if called concurrently on the same ByteBuffer.

Modifications:

- Don't use internalNioBuffer where it is not safe.
- Add unit test.

Result:

ByteBufUtil.isText is thread-safe.
2018-01-31 13:47:49 +01:00
Scott Mitchell
4921f62c8a
HttpResponseStatus object allocation reduction
Motivation:
Usages of HttpResponseStatus may result in more object allocation then necessary due to not looking for cached objects and the AsciiString parsing method not being used due to CharSequence method being used instead.

Modifications:
- HttpResponseDecoder should attempt to get the HttpResponseStatus from cache instead of allocating a new object
- HttpResponseStatus#parseLine(CharSequence) should check if the type is AsciiString and redirect to the AsciiString parsing method which may not require an additional toString call
- HttpResponseStatus#parseLine(AsciiString) can be optimized and doesn't require and may not require object allocation

Result:
Less allocations when dealing with HttpResponseStatus.
2018-01-24 22:01:52 -08:00
Norman Maurer
336bea9dc5 Fix ByteBuf.nioBuffer(...) and nioBuffers(...) docs to reflect reality.
Motivation:

Depending on the implementation of ByteBuf nioBuffer(...) and nioBuffers(...) may either share the content or return a ByteBuffer that contains a copy of the content.

Modifications:

Fix javadocs.

Result:

Correct docs.
2018-01-22 19:50:08 +01:00
Thomas Devanneaux
3ae57cf302 ByteBuf.toString(Charset) is not thread-safe
Motivation:

Calling ByteBuf.toString(Charset) on the same buffer from multiple threads at the same time produces unexpected results, such as various exceptions and/or corrupted output. This is because ByteBufUtil.decodeString(...) is taking the source ByteBuffer for CharsetDecoder.decode() from ByteBuf.internalNioBuffer(int, int), which is not thread-safe.

Modification:

Call ByteBuf.nioBuffer() instead of ByteBuf.internalNioBuffer() to get the source buffer to pass to CharsetDecoder.decode().

Result:

Fixes the possible race condition.
2018-01-21 09:02:42 +01:00
Norman Maurer
819b870b8a Correctly take position into account when wrap a ByteBuffer via ReadOnlyUnsafeDirectByteBuf
Motivation:

We did not correctly take the position into account when wrapping a ByteBuffer via ReadOnlyUnsafeDirectByteBuf as we obtained the memory address from the original ByteBuffer and not the slice we take.

Modifications:

- Correctly use the slice to obtain memory address.
- Add test case.

Result:

Fixes [#7565].
2018-01-16 19:18:59 +01:00
Norman Maurer
e329ca1cf3 Introduce ObjectCleaner and use it in FastThreadLocal to ensure FastThreadLocal.onRemoval(...) is called
Motivation:

There is no guarantee that FastThreadLocal.onRemoval(...) is called if the FastThreadLocal is used by "non" FastThreacLocalThreads. This can lead to all sort of problems, like for example memory leaks as direct memory is not correctly cleaned up etc.

Beside this we use ThreadDeathWatcher to check if we need to release buffers back to the pool when thread local caches are collected. In the past ThreadDeathWatcher was used which will need to "wakeup" every second to check if the registered Threads are still alive. If we can ensure FastThreadLocal.onRemoval(...) is called we do not need this anymore.

Modifications:

- Introduce ObjectCleaner and use it to ensure FastThreadLocal.onRemoval(...) is always called when a Thread is collected.
- Deprecate ThreadDeathWatcher
- Add unit tests.

Result:

Consistent way of cleanup FastThreadLocals when a Thread is collected.
2017-12-21 07:34:44 +01:00
Norman Maurer
1988cd041d Reduce Object allocations in CompositeByteBuf.
Motivation:

We used subList in CompositeByteBuf to remove ranges of elements from the internal storage. Beside this we also used an foreach loop in a few cases which will crate an Iterator.

Modifications:

- Use our own sub-class of ArrayList which exposes removeRange(...). This allows to remove a range of elements without an extra allocation.
- Use an old style for loop to iterate over the elements to reduce object allocations.

Result:

Less allocations.
2017-12-12 09:08:58 +01:00
Norman Maurer
09a05b680d Dont use ThreadDeathWatcher to cleanup PoolThreadCache if FastThreadLocalThread with wrapped Runnable is used
Motivation:

We dont need to use the ThreadDeathWatcher if we use a FastThreadLocalThread for which we wrap the Runnable and ensure we call FastThreadLocal.removeAll() once the Runnable completes.

Modifications:

- Dont use a ThreadDeathWatcher if we are sure we will call FastThreadLocal.removeAll()
- Add unit test.

Result:

Less overhead / running theads if you only allocate / deallocate from FastThreadLocalThreads.
2017-11-28 13:43:28 +01:00
Scott Mitchell
a8bb9dc180 AbstractByteBuf readSlice bound check bug
Motivation:
AbstractByteBuf#readSlice relied upon the bounds checking of the slice operation in order to detect index out of bounds conditions. However the slice bounds checking operation allows for the slice to go beyond the writer index, and this is out of bounds for a read operation.

Modifications:
- AbstractByteBuf#readSlice and AbstractByteBuf#readRetainedSlice should ensure the desired amount of bytes are readable before taking a slice

Result:
No reading of undefined data in AbstractByteBuf#readSlice and AbstractByteBuf#readRetainedSlice.
2017-11-18 09:03:42 +01:00
Norman Maurer
cc069722a2 CompositeBytebuf.copy() and copy(...) should respect the allocator
Motivation:

When calling CompositeBytebuf.copy() and copy(...) we currently use Unpooled to allocate the buffer. This is not really correct and may produce more GC then needed. We should use the allocator that was used when creating the CompositeByteBuf to allocate the new buffer which may be for example the PooledByteBufAllocator.

Modifications:

- Use alloc() to allocate the new buffer.
- Add tests
- Fix tests that depend on the copy to be backed by an byte-array without checking hasArray() first.

Result:

Fixes [#7393].
2017-11-10 07:17:16 -08:00
Idel Pivnitskiy
50a067a8f7 Make methods 'static' where it possible
Motivation:

Even if it's a super micro-optimization (most JVM could optimize such
 cases in runtime), in theory (and according to some perf tests) it
 may help a bit. It also makes a code more clear and allows you to
 access such methods in the test scope directly, without instance of
 the class.

Modifications:

Add 'static' modifier for all methods, where it possible. Mostly in
test scope.

Result:

Cleaner code with proper 'static' modifiers.
2017-10-21 14:59:26 +02:00
Nikolay Fedorovskikh
17e1a26d64 Fixes a javadoc for ByteBufUtil#copy method
Motivation:
Javadoc of the `ByteBufUtil#copy(AsciiString, int, ByteBuf, int, int)` is incorrect.

Modifications:
Fix it.

Result:
The description of the `#copy` method is not misleading.
2017-10-21 14:51:20 +02:00
Nikolay Fedorovskikh
09dd6a5d4d Minor improvements in ByteBufOutputStream
Motivation:
In the `ByteBufOutputStream` we can use an appropriate methods of `ByteBuf`
to reduce calls of virtual methods and do not copying converting logic.

Modifications:
- Use an appropriate methods of `ByteBuf`
- Remove redundant conversions (int -> byte, int -> char).
- Use `ByteBuf#writeCharSequence` in the `writeBytes(String)'.

Result:
Less code duplication. A `writeBytes(String)` method is faster.
No unnecessary conversions. More consistent and cleaner code.
2017-10-21 14:44:13 +02:00
Carl Mastrangelo
16b1dbdf92 Motivation: Resource Leak Detector (RLD) tries to helpfully indicate where an object was last accessed and report the accesses in the case the object was not cleaned up. It handles lightly used objects well, but drops all but the last few accesses.
Configuring this is tough because there is split between highly shared (and accessed) objects and lightly accessed objects.

Modification:
There are a number of changes here.  In relative order of importance:

API / Functionality changes:
* Max records and max sample records are gone.  Only "target" records, the number of records tries to retain is exposed.
* Records are sampled based on the number of already stored records.  The likelihood of recording a new sample is `2^(-n)`, where `n` is the number of currently stored elements.
* Records are stored in a concurrent stack structure rather than a list.  This avoids a head and tail.  Since the stack is only read once, there is no need to maintain head and tail pointers
* The properties of this imply that the very first and very last access are always recorded.  When deciding to sample, the top element is replaced rather than pushed.
* Samples that happen between the first and last accesses now have a chance of being recorded.  Previously only the final few were kept.
* Sampling is no longer deterministic.  Previously, a deterministic access pattern meant that you could conceivably always miss some access points.
* Sampling has a linear ramp for low values and and exponentially backs off roughly equal to 2^n.  This means that for 1,000,000 accesses, about 20 will actually be kept.  I have an elegant proof for this which is too large to fit in this commit message.

Code changes:
* All locks are gone.  Because sampling rarely needs to do a write, there is almost 0 contention.  The dropped records counter is slightly contentious, but this could be removed or changed to a LongAdder.  This was not done because of memory concerns.
* Stack trace exclusion is done outside of RLD.  Classes can opt to remove some of their methods.
* Stack trace exclusion is faster, since it uses String.equals, often getting a pointer compare due to interning.  Previously it used contains()
* Leak printing is outputted fairly differently.  I tried to preserve as much of the original formatting as possible, but some things didn't make sense to keep.

Result:
More useful leak reporting.

Faster:
```
Before:
Benchmark                                           (recordTimes)   Mode  Cnt       Score      Error  Units
ResourceLeakDetectorRecordBenchmark.record                      8  thrpt   20  136293.404 ± 7669.454  ops/s
ResourceLeakDetectorRecordBenchmark.record                     16  thrpt   20   72805.720 ± 3710.864  ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint              8  thrpt   20  139131.215 ± 4882.751  ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint             16  thrpt   20   74146.313 ± 4999.246  ops/s

After:
Benchmark                                           (recordTimes)   Mode  Cnt       Score      Error  Units
ResourceLeakDetectorRecordBenchmark.record                      8  thrpt   20  155281.969 ± 5301.399  ops/s
ResourceLeakDetectorRecordBenchmark.record                     16  thrpt   20   77866.239 ± 3821.054  ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint              8  thrpt   20  153360.036 ± 8611.353  ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint             16  thrpt   20   78670.804 ± 2399.149  ops/s
```
2017-10-19 12:21:21 -07:00
Carl Mastrangelo
83a19d5650 Optimistically update ref counts
Motivation:
Highly retained and released objects have contention on their ref
count.  Currently, the ref count is updated using compareAndSet
with care to make sure the count doesn't overflow, double free, or
revive the object.

Profiling has shown that a non trivial (~1%) of CPU time on gRPC
latency benchmarks is from the ref count updating.

Modification:
Rather than pessimistically assuming the ref count will be invalid,
optimistically update it assuming it will be.  If the update was
wrong, then use the slow path to revert the change and throw an
execption.  Most of the time, the ref counts are correct.

This changes from using compareAndSet to getAndAdd, which emits a
different CPU instruction on x86 (CMPXCHG to XADD).  Because the
CPU knows it will modifiy the memory, it can avoid contention.

On a highly contended machine, this can be about 2x faster.

There is a downside to the new approach.  The ref counters can
temporarily enter invalid states if over retained or over released.
The code does handle these overflow and underflow scenarios, but it
is possible that another concurrent access may push the failure to
a different location.  For example:

Time 1 Thread 1: obj.retain(INT_MAX - 1)
Time 2 Thread 1: obj.retain(2)
Time 2 Thread 2: obj.retain(1)

Previously Thread 2 would always succeed and Thread 1 would always
fail on the second access.  Now, thread 2 could fail while thread 1
is rolling back its change.

====

There are a few reasons why I think this is okay:

1. Buggy code is going to have bugs.  An exception _is_ going to be
   thrown.  This just causes the other threads to notice the state
   is messed up and stop early.
2. If high retention counts are a use case, then ref count should
   be a long rather than an int.
3. The critical section is greatly reduced compared to the previous
   version, so the likelihood of this happening is lower
4. On error, the code always rollsback the change atomically, so
   there is no possibility of corruption.

Result:
Faster refcounting

```
BEFORE:

Benchmark                                                                                             (delay)    Mode      Cnt         Score    Error  Units
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                            1  sample  2901361       804.579 ±  1.835  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                           10  sample  3038729       785.376 ± 16.471  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                          100  sample  2899401       817.392 ±  6.668  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                         1000  sample  3650566      2077.700 ±  0.600  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                        10000  sample  3005467     19949.334 ±  4.243  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                          1  sample   456091        48.610 ±  1.162  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                         10  sample   732051        62.599 ±  0.815  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                        100  sample   778925       228.629 ±  1.205  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                       1000  sample   633682      2002.987 ±  2.856  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                      10000  sample   506442     19735.345 ± 12.312  ns/op

AFTER:
Benchmark                                                                                             (delay)    Mode      Cnt         Score    Error  Units
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                            1  sample  3761980       383.436 ±  1.315  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                           10  sample  3667304       474.429 ±  1.101  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                          100  sample  3039374       479.267 ±  0.435  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                         1000  sample  3709210      2044.603 ±  0.989  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                        10000  sample  3011591     19904.227 ± 18.025  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                          1  sample   494975        52.269 ±  8.345  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                         10  sample   771094        62.290 ±  0.795  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                        100  sample   763230       235.044 ±  1.552  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                       1000  sample   634037      2006.578 ±  3.574  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                      10000  sample   506284     19742.605 ± 13.729  ns/op

```
2017-10-04 08:42:33 +02:00