Commit Graph

759 Commits

Author SHA1 Message Date
Nick Hill 5e786e549e Fix incorrect behavior of ReadOnlyByteBufferBuf.getBytes(int,ByteBuffer) (#9125)
* Fix incorrect behavior of ReadOnlyByteBufferBuf.getBytes(int,ByteBuffer)

Motivation

It currently will succeed when the destination is larger than the source
range, but the ByteBuf javadoc states this should be a failure, as is
the case with all the other implementations.

Modifications

- Fix logic to fail the bounds check in this case
- Remove explicit null check which isn't done in any equivalent method
- Add unit test

Result

More correct/consistent behaviour
2019-05-13 07:03:10 +02:00
Norman Maurer 7e6d1bbcfd Fix regression in CompositeByteBuf.discard*ReadBytes() (#9068)
Motivation:

1f93bd3 introduced a regression that could lead to not have the lastAccessed field correctly null'ed out when the endOffset of the internal Component == CompositeByteBuf.readerIndex()

Modifications:

- Correctly null out the lastAccessed field in any case
- Add unit tests

Result:

Fixes regression in CompositeByteBuf.discard*ReadBytes()
2019-04-17 18:03:29 +02:00
Nick Hill e75b7edaa3 Centralize internal reference counting logic (#8614)
Motivation

AbstractReferenceCounted and AbstractReferenceCountedByteBuf contain
duplicate logic for managing the volatile refcount in an optimized and
consistent manner, which increased in complexity in #8583. It's possible
to extract this into a common helper class now that all access is via an
AtomicIntegerFieldUpdater.

Modifications

- Move duplicate logic into a shared ReferenceCountUpdater class
- Incorporate some additional simplification for the most common single
increment/decrement cases (fewer checks/operations)

Result

Less code duplication, better encapsulation of the "non-trivial"
internal volatile refcount manipulation
2019-04-09 16:23:15 +02:00
Nick Hill 1f93bd36b6 CompositeByteBuf optimizations and new addFlattenedComponents method (#8939)
Motivation:

The CompositeByteBuf discardReadBytes / discardReadComponents methods are currently quite inefficient, including when there are no read components to discard. We would like to call the latter more frequently in ByteToMessageDecoder#COMPOSITE_CUMULATOR.

In the same context it would be beneficial to perform a "shallow copy" of a composite buffer (for example when it has a refcount > 1) to avoid having to allocate and copy the contained bytes just to obtain an "independent" cumulation.

Modifications:

- Optimize discardReadBytes() and discardReadComponents() implementations (start at first comp rather than performing a binary search for the readerIndex).
- New addFlattenedComponents(boolean,ByteBuf) method which performs a shallow copy if the provided buffer is also composite and avoids adding any empty buffers, plus unit test.
- Other minor optimizations to avoid unnecessary checks.

Results:

discardReadXX methods are faster, composite buffers can be easily appended without deepening the buffer "tree" or retaining unused components.
2019-04-08 20:55:19 +02:00
Norman Maurer 1eabb752c0 Fix NPE that was encounter by debugger (will never happen in real code). (#8992)
Motivation:

We synchronize on the chunk.arena when produce the String returned by PoolSubpage.toString() which may raise a NPE when chunk == null. Chunk == null for the head of the linked-list and so a NPE may raised by a debugger. This NPE can never happen in real code tho as we never access toString() of the head.

Modifications:

Add null checks and so fix the possible NPE

Result:

No NPE when using a debugger and inspect the PooledByteBufAllocator.
2019-04-01 19:44:45 +02:00
Norman Maurer 78c4abbfb7
Change default of io.netty.allocator.useCacheForAllThreads to false (#8991)
Motivation:

We currently use a thread local cache for all threads which often is suprising to users as it may result in a lot of memory usage if they allocate buffers from outside the EventLoop in different threads. We should better not do this by default to keep suprises to a minimum. Users that need the performance and know what they are doing can still change this.

Modifications:

Change io.netty.allocator.useCacheForAllThreads to false by default

Result:

Related to https://github.com/netty/netty/issues/8536.
2019-04-01 19:43:19 +02:00
Norman Maurer ab7accb2de Don't try to put back MemoryRegionCache.Entry objects into the Recycler when recycled because of a finalizer. (#8955)
Motivation:

In MemoryRegionCache.Entry we use the Recycler to reduce GC pressure and churn. The problem is that these will also be recycled when the PoolThreadCache is collected and finalize() is called. This then can have the effect that we try to load class but the WebApp is already stoped.

This will produce an stacktrace like this on Tomcat:

```
19-Mar-2019 15:53:21.351 INFO [Finalizer] org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading Illegal access: this web application instance has been stopped already. Could not load [java.util.WeakHashMap]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
 java.lang.IllegalStateException: Illegal access: this web application instance has been stopped already. Could not load [java.util.WeakHashMap]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
	at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading(WebappClassLoaderBase.java:1383)
	at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForClassLoading(WebappClassLoaderBase.java:1371)
	at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1224)
	at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1186)
	at io.netty.util.Recycler$3.initialValue(Recycler.java:233)
	at io.netty.util.Recycler$3.initialValue(Recycler.java:230)
	at io.netty.util.concurrent.FastThreadLocal.initialize(FastThreadLocal.java:188)
	at io.netty.util.concurrent.FastThreadLocal.get(FastThreadLocal.java:142)
	at io.netty.util.Recycler$Stack.pushLater(Recycler.java:624)
	at io.netty.util.Recycler$Stack.push(Recycler.java:597)
	at io.netty.util.Recycler$DefaultHandle.recycle(Recycler.java:225)
	at io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry.recycle(PoolThreadCache.java:478)
	at io.netty.buffer.PoolThreadCache$MemoryRegionCache.freeEntry(PoolThreadCache.java:459)
	at io.netty.buffer.PoolThreadCache$MemoryRegionCache.free(PoolThreadCache.java:430)
	at io.netty.buffer.PoolThreadCache$MemoryRegionCache.free(PoolThreadCache.java:422)
	at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:279)
	at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:270)
	at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:241)
	at io.netty.buffer.PoolThreadCache.finalize(PoolThreadCache.java:230)
	at java.lang.System$2.invokeFinalize(System.java:1270)
	at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:102)
	at java.lang.ref.Finalizer.access$100(Finalizer.java:34)
	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:217)
```

Beside this we also need to ensure we not try to lazy load SizeClass when the finalizer is used as it may not be present anymore if the ClassLoader is already destroyed.

This would produce an error like:

```
20-Mar-2019 11:26:35.254 INFO [Finalizer] org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading Illegal access: this web application instance has been stopped already. Could not load [io.netty.buffer.PoolArena$1]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
 java.lang.IllegalStateException: Illegal access: this web application instance has been stopped already. Could not load [io.netty.buffer.PoolArena$1]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
	at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading(WebappClassLoaderBase.java:1383)
	at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForClassLoading(WebappClassLoaderBase.java:1371)
	at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1224)
	at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1186)
	at io.netty.buffer.PoolArena.freeChunk(PoolArena.java:287)
	at io.netty.buffer.PoolThreadCache$MemoryRegionCache.freeEntry(PoolThreadCache.java:464)
	at io.netty.buffer.PoolThreadCache$MemoryRegionCache.free(PoolThreadCache.java:429)
	at io.netty.buffer.PoolThreadCache$MemoryRegionCache.free(PoolThreadCache.java:421)
	at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:278)
	at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:269)
	at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:240)
	at io.netty.buffer.PoolThreadCache.finalize(PoolThreadCache.java:229)
	at java.lang.System$2.invokeFinalize(System.java:1270)
	at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:102)
	at java.lang.ref.Finalizer.access$100(Finalizer.java:34)
	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:217)
```

Modifications:

- Only try to put the Entry back into the Recycler if the PoolThredCache is not destroyed because of the finalizer.
- Only try to access SizeClass if not triggered by finalizer.

Result:

No IllegalStateException anymoe when a webapp is reloaded in Tomcat that uses netty and uses the PooledByteBufAllocator.
2019-03-22 12:16:53 +01:00
Nick Hill a71e33ae35 Fix possible ByteBuf leak when CompositeByteBuf is resized (#8946)
Motivation:

The special case fixed in #8497 also requires that we keep a derived slice when trimming components in place, as done by the capacity(int) and discardReadBytes() methods.

Modifications:

Ensure that we keep a ref to trimmed components' original retained slice in capacity(int) and discardReadBytes() methods, so that it is released properly when the they are later freed. Add unit test which fails prior to the fix.

Result:

Edge case leak is eliminated.
2019-03-22 11:18:28 +01:00
Norman Maurer 9681842d54 Allow to automatically trim the PoolThreadCache in a timely interval (#8941)
Motivation:

PooledByteBufAllocator uses a PoolThreadCache per Thread that allocates / deallocates to minimize the performance overhead. This PoolThreadCache is trimmed after X allocations to free up buffers that are not allocated for a long time. This works out quite well when the app continues to allocate but fails if the app stops to allocate frequently (for whatever reason) and so a lot of memory is wasted and not given back to the arena / freed.

Modifications:

- Add a ThreadExecutorMap that offers multiple methods that wrap Runnable / ThreadFactory / Executor and allow to call ThreadExecutorMap.currentEventExecutor() to get the current executing EventExecutor for the calling Thread.
- Use these methods in the constructors of our EventExecutor implementations (which also covers the EventLoop implementations)
- Add io.netty.allocator.cacheTrimIntervalMillis system property which can be used to specify a fixed rate / interval on which we should try to trim the PoolThreadCache for a EventExecutor that allocates.
- Add PooledByteBufAllocator.trimCurrentThreadCache() to allow the user to trim the cache of the calling thread manually.
- Add testcases
- Introduce FastThreadLocal.getIfExists()

Result:

Allow to better / more frequently trim PoolThreadCache and so give back memory to the area / system.
2019-03-22 11:17:40 +01:00
Norman Maurer 795f9600ba Adjust tests to be able to build / test when using IBM J9 / OpenJ9 (#8900)
Motivation:

We should run a CI job using J9 to ensure netty also works when using different JVMs.

Modifications:

- Adjust PooledByteBufAllocatorTest to be able to complete faster when using a JVM which takes longer when joining Threads (this seems to be the case with J9).

Result:

Be able to run CI against J9.
2019-03-01 06:48:35 +01:00
Nick Hill 35161ad174 Further reduce ensureAccessible() overhead (#8895)
Motivation:

This PR fixes some non-negligible overhead discovered in the ByteBuf
accessibility (non-zero refcount) checking. The cause turned out to be
mostly twofold:
- Unnecessary operations used to calculate the refcount from the "raw"
encoded int field value
- Call stack depths exceeding the default limit for inlining, in some
places (CompositeByteBuf in particular)

It's a follow-on from #8882 which uses the maxCapacity field for a
simpler non-negative check. The performance gap between these two
variants appears to be _mostly_ closed, but there's one exception which
may warrant further analysis.

Modifications:

- Replace ABB.internalRefCount() with ByteBuf.isAccessible(), the
default still checks for non-zero refCnt()
- Just test for parity of raw refCnt instead of converting to "real",
with fast-path for specific small values
- Make sure isAccessible() is delegated by derived/wrapper ByteBufs
- Use existing freed flag in CompositeByteBuf for faster isAccessible()
- Manually inline some calls in methods like CompositeByteBuf.setLong()
and AbstractReferenceCountedByteBuf.isAccessible() to reduce stack
depths (to ensure default inlining limit isn't hit)
- Add ByteBufAccessBenchmark which is an extension of
UnsafeByteBufBenchmark (maybe latter could now be removed)

Results:

Before:

Benchmark   (bufferType)  (checkAccessible)  (checkBounds)   Mode  Cnt
Score          Error  Units
readBatch         UNSAFE               true           true  thrpt   30
84524972.863 ±   518338.811  ops/s
readBatch   UNSAFE_SLICE               true           true  thrpt   30
38608795.037 ±   298176.974  ops/s
readBatch           HEAP               true           true  thrpt   30
80003697.649 ±   974674.119  ops/s
readBatch      COMPOSITE               true           true  thrpt   30
18495554.788 ±   108075.023  ops/s
setGetLong        UNSAFE               true           true  thrpt   30
247069881.578 ± 10839162.593  ops/s
setGetLong  UNSAFE_SLICE               true           true  thrpt   30
196355905.206 ±  1802420.990  ops/s
setGetLong          HEAP               true           true  thrpt   30
245686644.713 ± 11769311.527  ops/s
setGetLong     COMPOSITE               true           true  thrpt   30
83170940.687 ±   657524.123  ops/s
setLong           UNSAFE               true           true  thrpt   30
278940253.918 ±  1807265.259  ops/s
setLong     UNSAFE_SLICE               true           true  thrpt   30
202556738.764 ± 11887973.563  ops/s
setLong             HEAP               true           true  thrpt   30
280045958.053 ±  2719583.400  ops/s
setLong        COMPOSITE               true           true  thrpt   30
121299806.002 ±  2155084.707  ops/s


After:

Benchmark   (bufferType)  (checkAccessible)  (checkBounds)   Mode  Cnt
Score          Error  Units
readBatch         UNSAFE               true           true  thrpt   30
101641801.035 ±  3950050.059  ops/s
readBatch   UNSAFE_SLICE               true           true  thrpt   30
84395902.846 ±  4339579.057  ops/s
readBatch           HEAP               true           true  thrpt   30
100179060.207 ±  3222487.287  ops/s
readBatch      COMPOSITE               true           true  thrpt   30
42288494.472 ±   294919.633  ops/s
setGetLong        UNSAFE               true           true  thrpt   30
304530755.027 ±  6574163.899  ops/s
setGetLong  UNSAFE_SLICE               true           true  thrpt   30
212028547.645 ± 14277828.768  ops/s
setGetLong          HEAP               true           true  thrpt   30
309335422.609 ±  2272150.415  ops/s
setGetLong     COMPOSITE               true           true  thrpt   30
160383609.236 ±   966484.033  ops/s
setLong           UNSAFE               true           true  thrpt   30
298055969.747 ±  7437449.627  ops/s
setLong     UNSAFE_SLICE               true           true  thrpt   30
223784178.650 ±  9869750.095  ops/s
setLong             HEAP               true           true  thrpt   30
302543263.328 ±  8140104.706  ops/s
setLong        COMPOSITE               true           true  thrpt   30
157083673.285 ±  3528779.522  ops/s

There's also a similar knock-on improvement to other benchmarks (e.g.
HPACK encoding/decoding) as shown in #8882.

For sanity I did a final comparison of the "fast path" tweak using one
of the HPACK benchmarks:

(rawCnt & 1) == 0:

Benchmark                     (limitToAscii)  (sensitive)  (size)   Mode
Cnt      Score     Error  Units
HpackDecoderBenchmark.decode            true         true  MEDIUM  thrpt
30  50914.479 ± 940.114  ops/s


rawCnt == 2 || rawCnt == 4 || rawCnt == 6 || rawCnt == 8 ||  (rawCnt &
1) == 0:

Benchmark                     (limitToAscii)  (sensitive)  (size)   Mode
Cnt      Score      Error  Units
HpackDecoderBenchmark.decode            true         true  MEDIUM  thrpt
30  60036.425 ± 1478.196  ops/s
2019-02-28 20:41:16 +01:00
Dmitriy Dumanskiy 116f72db8d Legacy properties removed (#8839)
Motivation:

We can remove some properties for which we introduced replacements.

Modifications:

io.netty.buffer.bytebuf.checkAccessible, io.netty.leakDetectionLevel, org.jboss.netty.tryUnsafe properties removed

Result:

Code cleanup
2019-02-04 13:56:15 +01:00
田欧 e8efcd82a8 migrate java8: use requireNonNull (#8840)
Motivation:

We can just use Objects.requireNonNull(...) as a replacement for ObjectUtil.checkNotNull(....)

Modifications:

- Use Objects.requireNonNull(...)

Result:

Less code to maintain.
2019-02-04 10:32:25 +01:00
Nick Hill 0ac8db0307 CompositeByteBuf tidy-up (#8784)
Motivation

There's some miscellaneous cleanup/simplification of CompositeByteBuf
which would help make the code a bit clearer.

Modifications

- Simplify web of constructors and addComponents methods, reducing
duplication of logic
- Rename `Component.freeIfNecessary()` method to just `free()`, which is
less confusing (see #8641)
- Make loop in addComponents0(...) method more verbose/readable (see
https://github.com/netty/netty/pull/8437#discussion_r232124414)
- Simplify addition/subtraction in setBytes(...) methods

Result

Smaller/clearer code
2019-02-01 10:32:02 +01:00
田欧 d7648f1d93 use checkPositive/checkPositiveOrZero (#8803)
Motivation:

We have a utility method to check for > 0 and >0 arguments. We should use it.

Modification:

use checkPositive/checkPositiveOrZero instead of if statement.

Result:

Re-use utility method.
2019-01-31 09:06:59 +01:00
田欧 6222101924 migrate java8: use lambda and method reference (#8781)
Motivation:

We can use lambdas now as we use Java8.

Modification:

use lambda function for all package, #8751 only migrate transport package.

Result:

Code cleanup.
2019-01-29 14:06:05 +01:00
田欧 e941cbe27a remove unused import statement (#8792)
Motivation:
The code contained some unused import statements.

Modification:
Remove unused import statements.

Result:
Code cleanup
2019-01-28 16:50:15 +01:00
kezhenxu94 7b6336f1fd Java 8 Migration: remove uneccessary if statement (#8755)
Motivation:

As netty 4.x supported Java 6 we had various if statements to check for java versions < 8. We can remove these now.

Modification:

Remove unnecessary if statements that check for java versions < 8.

Result:

Cleanup code.
2019-01-25 08:57:11 +01:00
Norman Maurer 310f31b392
Update to new checkstyle plugin (#8777)
Motivation:

We need to update to a new checkstyle plugin to allow the usage of lambdas.

Modifications:

- Update to new plugin version.
- Fix checkstyle problems.

Result:

Be able to use checkstyle plugin which supports new Java syntax.
2019-01-24 16:24:19 +01:00
Nick Hill 9e7385ddba Fix three bugs in CompositeByteBuf (#8773)
Motivation

In #8758, @doom369 reported an infinite loop bug in CompositeByteBuf
which was introduced in #8437.

This is the same small fix for that, along with fixes for two other bugs
found while re-inspecting the changes and adding unit tests.

Modification

- Replace recursive call to toComponentIndex with toComponentIndex0 as
intended
- Add missed "lastAccessed" racy cache invalidation in capacity(int)
method
- Fix incorrect determination of initial offset in non-zero cIndex case
of updateComponentOffsets method
- New unit tests for previously uncovered methods

Results

Fewer bugs.
2019-01-24 14:59:33 +01:00
Dmitriy Dumanskiy 4a10357fd8 IDE warnings cleanup (#8768)
Motivation:

IDE shows some warnings

Modification:

Small IDE warnings cleanup in different places.

Result:

Less warnings
2019-01-23 14:01:48 +01:00
Dmitriy Dumanskiy 7b92ff2500 Java 8 migration. Remove ThreadLocalProvider and inline java.util.concurrent.ThreadLocalRandom.current() where necessary. (#8762)
Motivation:

Custom Netty ThreadLocalRandom and ThreadLocalRandomProvider classes are no longer needed and can be removed.

Modification:

Remove own ThreadLocalRandom

Result:

Less code to maintain
2019-01-22 20:14:28 +01:00
田欧 9d62deeb6f Java 8 migration: Use diamond operator (#8749)
Motivation:

We can use the diamond operator these days.

Modification:

Use diamond operator whenever possible.

Result:

More modern code and less boiler-plate.
2019-01-22 16:07:26 +01:00
Dmitriy Dumanskiy b2a9a8e9fe Java 8 migration. Removed custom LongCounter and LongAdderCounter classes and replaced with direct reference to LongAdder (#8750)
Motivation:

After switching to Java 8 we no longer need LongCounter and LongAdderCounter. We can directly replace them with LongAdder.

Modification:

Removed LongCounter and LongAdderCounter classes.

Result:

Less code to maintain
2019-01-22 13:53:28 +01:00
Derek Lewis 13f640cbba Updating `ByteBuf` Javadocs to represent actual behaviour. (#8709)
Motivation:

The javadocs stating `IndexOutOfBoundsException` is thrown were
different from what `ByteBuf` actually did. We want to ensure the
Javadocs represent reality.

Modifications:

Updated javadocs on `write*`, `ensureWriteable`, `capacity`, and
`maxCapacity` methods.

Results:

Javadocs more closely match actual behaviour.
2019-01-14 20:09:18 +01:00
kashike c0aa1ea5c7 Fix minor spelling issues in javadocs (#8701)
Motivation:

Javadocs contained some spelling errors, we should fix these.

Modification:

Fix spelling

Result:

Javadoc cleanup.
2019-01-14 07:25:13 +01:00
Norman Maurer d9a6cf341c
Remove support for marking reader and writerIndex in ByteBuf to reduce overhead and complexity. (#8636)
Motivation:

ByteBuf supports “marker indexes”. The intended use case for these is if a speculative operation (e.g. decode) is in process the user can “mark” and interface and refer to it later if the operation isn’t successful (e.g. not enough data). However this is rarely used in practice,
requires extra memory to maintain, and introduces complexity in the state management for derived/pooled buffer initialization, resizing, and other operations which may modify reader/writer indexes.

Modifications:

Remove support for marking and adjust testcases / code.

Result:

Fixes https://github.com/netty/netty/issues/8535.
2018-12-11 14:00:49 +01:00
Nick Travers 0a8c93a6fa Loosen bounds check on CompositeByteBuf's maxNumComponents (#8621)
Motivation:

In versions of Netty prior to 4.1.31.Final, a CompositeByteBuf could be
created with any size (including potentially nonsensical negative
values). This behavior changed in e7737b993, which introduced a bounds
check to only allow for a component size greater than one. This broke
some existing use cases that attempted to create a byte buf with a
single component.

Modifications:

Lower the bounds check on numComponents to include the single component
case, but still throw an exception for anything less than one.

Add unit tests for the case of numComponents being less than, equal to,
and greater than this lower bound.

Result:

Return to the behavior of 4.1.30.Final, allowing one component, but
still include an explicit check against a lower bound.

Note that while creating a CompositeByteBuf with a single component is
in some ways a contradiction of the term "composite", this patch caters
for existing uses while excluding the clearly nonsensical case of asking
for a CompositeByteBuf with zero or fewer components.

Fixes #8613.
2018-12-05 08:43:53 +01:00
Norman Maurer 85c1590a90 Provide a way to cache the internal nioBuffer of the PooledByteBuffer… (#8603)
Motivation:

Often a temporary ByteBuffer is used which can be cached to reduce the GC pressure.

Modifications:

Cache the ByteBuffer in the PoolThreadCache as well.

Result:

Less GC.
2018-12-04 15:39:55 +01:00
Nick Hill fc0a4e15ab Harden ref-counting concurrency semantics (#8583)
Motivation

#8563 highlighted race conditions introduced by the prior optimistic
update optimization in 83a19d5650. These
were known at the time but considered acceptable given the perf
benefit in high contention scenarios.

This PR proposes a modified approach which provides roughly half the
gains but stronger concurrency semantics. Race conditions still exist
but their scope is narrowed to much less likely cases (releases
coinciding with retain overflow), and even in those
cases certain guarantees are still assured. Once release() returns true,
all subsequent release/retains are guaranteed to throw, and in
particular deallocate will be called at most once.

Modifications

- Use even numbers internally (including -ve) for live refcounts
- "Final" release changes to odd number (equivalent to refcount 0)
- Retain still uses faster getAndAdd, release uses CAS loop
- First CAS attempt uses non-volatile read
- Thread.yield() after a failed CAS provides a net gain

Result

More (though not completely) robust concurrency semantics for ref
counting; increased latency under high contention, but still roughly
twice as fast as the original logic. Bench results to follow
2018-11-29 08:32:50 +01:00
Norman Maurer 42e0ce6f7a Move less common code-path to extra method to allow inlining of writeUtf8. (#8600)
Motivation:

ByteBuf is used everywhere so we should try hard to be able to make things inlinable. During benchmarks it showed that writeCharSequence(...) fails to inline writeUtf8 because it is too big even if its hots.

Modifications:

Move less common code-path to extra method to allow inlining.

Result:

Be able to inline writeUtf8 in most cases.
2018-11-27 21:04:13 +01:00
Norman Maurer 45e9fd4802 Revert "Provide a way to cache the internal nioBuffer of the PooledByteBuffer to reduce GC. (#8593)"
This reverts commit 945f123f8b as it seems to produce some failures in some cases. This needs more research.
2018-11-27 20:03:33 +01:00
Norman Maurer 945f123f8b Provide a way to cache the internal nioBuffer of the PooledByteBuffer to reduce GC. (#8593)
Motivation:

Often a temporary ByteBuffer is used which can be cached to reduce the GC pressure.

Modifications:

Add a Deque per PoolChunk which will be used for caching.

Result:

Less GC.
2018-11-27 13:55:24 +01:00
Rolandz 836c39b82f Fix offset calculation in PooledByteBufAllocator when used
Motivation:

When we create new chunk with memory aligned, the offset of direct memory should be
'alignment - address & (alignment - 1)', not just 'address & (alignment - 1)'.

Modification:

Change offset calculating formula to offset = alignment - address & (alignment - 1) in PoolArena.DirectArena#offsetCacheLine and add a unit test to assert that.

Result:

Correctly calculate offset.
2018-11-27 11:47:40 +01:00
Nick Hill 804e1fa9cc Fix ref-counting when CompositeByteBuf is used with retainedSlice() (#8497)
Motivation:

ByteBuf.retainedSlice() and similar methods produce sliced buffers with
an independent refcount to the buffer that they wrap.


One of the optimizations in 10539f4dc7 was
to use the ref to the unwrapped buffer object for added slices, but this
did not take into account the above special case when later releasing.

Thanks to @rkapsi for discovering this via #8495.

Modifications:

Since a reference to the slice is still kept in the Component class,
just changed Component.freeIfNecessary() to release the slice in
preference to the unwrapped buf.

Also added a unit test which reproduces the bug.

Result:

Fixes #8495
2018-11-13 20:56:09 +01:00
Nick Hill 0f8ce1b284 Fix incorrect sizing of temp byte arrays in (Unsafe)ByteBufUtil (#8484)
Motivation:

Two similar bugs were introduced by myself in separate recent PRs #8393
and #8464, while optimizing the assignment/handling of temporary arrays
in ByteBufUtil and UnsafeByteBufUtil.

The temp arrays allocated for buffering data written to an OutputStream
are incorrectly sized to the full length of the data to copy rather than
being capped at WRITE_CHUNK_SIZE.

Unfortunately one of these is in the 4.1.31.Final release, I'm really
sorry and will be more careful in future.

This kind of thing is tricky to cover in unit tests.

Modifications:

Revert the temp array allocations back to their original sizes.

Avoid making duplicate calls to ByteBuf.capacity() in a couple of places
in ByteBufUtil (unrelated thing I noticed, can remove it from this PR if
desired!)

Result:

Temporary byte arrays will be reverted to their originally intended
sizes.
2018-11-09 18:24:38 +01:00
Nick Hill 5954110b9a Use ByteBufUtil.BYTE_ARRAYS ThreadLocal temporary arrays in more places (#8464)
Motivation:

#8388 introduced a reusable ThreadLocal<byte[]> for use in
decodeString(...). It can be used in more places in the buffer package
to avoid temporary allocations of small arrays.

Modifications:

Encapsulate use of the ThreadLocal in a static package-private
ByteBufUtil.threadLocalTempArray(int) method, and make use of it from a
handful of new places including ByteBufUtil.readBytes(...).

Result:

Fewer short-lived small byte array allocations.
2018-11-05 21:11:28 +01:00
Nick Hill 10539f4dc7 Streamline CompositeByteBuf internals (#8437)
Motivation:

CompositeByteBuf is a powerful and versatile abstraction, allowing for
manipulation of large data without copying bytes. There is still a
non-negligible cost to reading/writing however relative to "singular"
ByteBufs, and this can be mostly eliminated with some rework of the
internals.

My use case is message modification/transformation while zero-copy
proxying. For example replacing a string within a large message with one
of a different length

Modifications:

- No longer slice added buffers and unwrap added slices
   - Components store target buf offset relative to position in
composite buf
   - Less allocations, object footprint, pointer indirection, offset
arithmetic
- Use Component[] rather than ArrayList<Component>
   - Avoid pointer indirection and duplicate bounds check, more
efficient backing array growth
   - Facilitates optimization when doing bulk-inserts - inserting n
ByteBufs behind m is now O(m + n) instead of O(mn)
- Avoid unnecessary casting and method call indirection via superclass
- Eliminate some duplicate range/ref checks via non-checking versions of
toComponentIndex and findComponent
- Add simple fast-path for toComponentIndex(0); add racy cache of
last-accessed Component to findComponent(int)
- Override forEachByte0(...) and forEachByteDesc0(...) methods
- Make use of RecyclableArrayList in nioBuffers(int, int) (in line with
FasterCompositeByteBuf impl)
- Modify addComponents0(boolean,int,Iterable) to use the Iterable
directly rather than copy to an array first (and possibly to an
ArrayList before that)
- Optimize addComponents0(boolean,int,ByteBuf[],int) to not perform
repeated array insertions and avoid second loop for offset updates
- Simplify other logic in various places, in particular the general
pattern used where a sub-range is iterated over
- Add benchmarks to demonstrate some improvements

While refactoring I also came across a couple of clear bugs. They are
fixed in these changes but I will open another PR with unit tests and
fixes to the current version.

Result:

Much faster creation, manipulation, and access; many fewer allocations
and smaller footprint. Benchmark results to follow.
2018-11-03 10:37:07 +01:00
Nick Hill 44cca1a26f Avoid allocations when wrapping byte[] and ByteBuffer arrays as ByteBuf (#8420)
Motivation:

Unpooled.wrap(byte[]...) and Unpooled.wrap(ByteBuffer...) currently
allocate/copy an intermediate ByteBuf ArrayList and array, which can be
avoided.

Modifications:

- Define new internal ByteWrapper interface and add a CompositeByteBuf
constructor which takes a ByteWrapper with an array of the type that it
wraps, and modify the appropriate Unpooled.wrap(...) methods to take
advantage of it
- Tidy up other constructors in CompositeByteBuf to remove duplication
and misleading len arg (which is really an end offset into provided
array)

Result:

Less allocation/copying when wrapping byte[] and ByteBuffer arrays,
tidier code.
2018-10-30 19:35:39 +01:00
Nick Hill 48c45cf4ac Fix leak and corruption bugs in CompositeByteBuf (#8438)
Motivation:

I came across two bugs:
- Components removed due to capacity reduction aren't released
- Offsets aren't set correctly on empty components that are added
between existing components

Modifications:

Add unit tests which expose these bugs, fix them.

Result:

Bugs are fixed
2018-10-28 10:28:18 +01:00
Nick Hill d7fa7be67f Exploit PlatformDependent.allocateUninitializedArray() in more places (#8393)
Motivation:

There are currently many more places where this could be used which were
possibly not considered when the method was added.

If https://github.com/netty/netty/pull/8388 is included in its current
form, a number of these places could additionally make use of the same
BYTE_ARRAYS threadlocal.

There's also a couple of adjacent places where an optimistically-pooled
heap buffer is used for temp byte storage which could use the
threadlocal too in preference to allocating a temp heap bytebuf wrapper.
For example
https://github.com/netty/netty/blob/4.1/buffer/src/main/java/io/netty/buffer/ByteBufUtil.java#L1417.

Modifications:

Replace new byte[] with PlatformDependent.allocateUninitializedArray()
where appropriate; make use of ByteBufUtil.getBytes() in some places
which currently perform the equivalent logic, including avoiding copy of
backing array if possible (although would be rare).

Result:

Further potential speed-up with java9+ and appropriate compile flags.
Many of these places could be on latency-sensitive code paths.
2018-10-27 10:43:28 -05:00
Nick Hill 583d838f7c Optimize AbstractByteBuf.getCharSequence() in US_ASCII case (#8392)
* Optimize AbstractByteBuf.getCharSequence() in US_ASCII case

Motivation:

Inspired by https://github.com/netty/netty/pull/8388, I noticed this
simple optimization to avoid char[] allocation (also suggested in a TODO
here).

Modifications:

Return an AsciiString from AbstractByteBuf.getCharSequence() if
requested charset is US_ASCII or ISO_8859_1 (latter thanks to
@Scottmitch's suggestion). Also tweak unit tests not to require Strings
and include a new benchmark to demonstrate the speedup.

Result:

Speed-up of AbstractByteBuf.getCharSequence() in ascii and iso 8859/1
cases
2018-10-26 15:32:38 -07:00
Norman Maurer 87ec2f882a
Reduce overhead by ByteBufUtil.decodeString(...) which is used by `AbstractByteBuf.toString(...)` and `AbstractByteBuf.getCharSequence(...)` (#8388)
Motivation:

Our current implementation that is used for toString(Charset) operations on AbstractByteBuf implementation is quite slow as it does a lot of uncessary memory copies. We should just use new String(...) as it has a lot of optimizations to handle these cases.

Modifications:

Rewrite ByteBufUtil.decodeString(...) to use new String(...)

Result:

Less overhead for toString(Charset) operations.

Benchmark                                         (charsetName)  (direct)  (size)   Mode  Cnt         Score         Error  Units
ByteBufUtilDecodeStringBenchmark.decodeString          US-ASCII     false       8  thrpt   20  22401645.093 ? 4671452.479  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString          US-ASCII     false      64  thrpt   20  23678483.384 ? 3749164.446  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString          US-ASCII      true       8  thrpt   20  15731142.651 ? 3782931.591  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString          US-ASCII      true      64  thrpt   20  16244232.229 ? 1886259.658  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString             UTF-8     false       8  thrpt   20  25983680.959 ? 5045782.289  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString             UTF-8     false      64  thrpt   20  26235589.339 ? 2867004.950  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString             UTF-8      true       8  thrpt   20  18499027.808 ? 4784684.268  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString             UTF-8      true      64  thrpt   20  16825286.141 ? 1008712.342  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString            UTF-16     false       8  thrpt   20   5789879.092 ? 1201786.359  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString            UTF-16     false      64  thrpt   20   2173243.225 ?  417809.341  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString            UTF-16      true       8  thrpt   20   5035583.011 ? 1001978.854  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString            UTF-16      true      64  thrpt   20   2162345.301 ?  402410.408  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString        ISO-8859-1     false       8  thrpt   20  30039052.376 ? 6539111.622  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString        ISO-8859-1     false      64  thrpt   20  31414163.515 ? 2096710.526  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString        ISO-8859-1      true       8  thrpt   20  19538587.855 ? 4639115.572  ops/s
ByteBufUtilDecodeStringBenchmark.decodeString        ISO-8859-1      true      64  thrpt   20  19467839.722 ? 1672687.213  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld       US-ASCII     false       8  thrpt   20  10787326.745 ? 1034197.864  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld       US-ASCII     false      64  thrpt   20   7129801.930 ? 1363019.209  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld       US-ASCII      true       8  thrpt   20   9002529.605 ? 2017642.445  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld       US-ASCII      true      64  thrpt   20   3860192.352 ?  826218.738  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld          UTF-8     false       8  thrpt   20  10532838.027 ? 2151743.968  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld          UTF-8     false      64  thrpt   20   7185554.597 ? 1387685.785  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld          UTF-8      true       8  thrpt   20   7352253.316 ? 1333823.850  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld          UTF-8      true      64  thrpt   20   2825578.707 ?  349701.156  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld         UTF-16     false       8  thrpt   20   7277446.665 ? 1447034.346  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld         UTF-16     false      64  thrpt   20   2445929.579 ?  562816.641  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld         UTF-16      true       8  thrpt   20   6201174.401 ? 1236137.786  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld         UTF-16      true      64  thrpt   20   2310674.973 ?  525587.959  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld     ISO-8859-1     false       8  thrpt   20  11142625.392 ? 1680556.468  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld     ISO-8859-1     false      64  thrpt   20   8127116.405 ? 1128513.860  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld     ISO-8859-1      true       8  thrpt   20   9405751.952 ? 2193324.806  ops/s
ByteBufUtilDecodeStringBenchmark.decodeStringOld     ISO-8859-1      true      64  thrpt   20   3943282.076 ?  737798.070  ops/s

Benchmark result is saved to /home/norman/mainframer/netty/microbench/target/reports/performance/ByteBufUtilDecodeStringBenchmark.json
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1,030.173 sec - in io.netty.buffer.ByteBufUtilDecodeStringBenchmark
[1030.460s][info   ][gc,heap,exit ] Heap
[1030.460s][info   ][gc,heap,exit ]  garbage-first heap   total 516096K, used 257918K [0x0000000609a00000, 0x0000000800000000)
[1030.460s][info   ][gc,heap,exit ]   region size 2048K, 127 young (260096K), 2 survivors (4096K)
[1030.460s][info   ][gc,heap,exit ]  Metaspace       used 17123K, capacity 17438K, committed 17792K, reserved 1064960K
[1030.460s][info   ][gc,heap,exit ]   class space    used 1709K, capacity 1827K, committed 1920K, reserved 1048576K
2018-10-19 14:00:13 +02:00
Norman Maurer 69545aedc4
CompositeByteBuf.decompose(...) does not correctly slice content. (#8403)
Motivation:

CompositeByteBuf.decompose(...) did not correctly slice the content and so produced an incorrect representation of the data.

Modifications:

- Rewrote implementation to fix bug and also improved it to reduce GC
- Add unit tests.

Result:

Fixes https://github.com/netty/netty/issues/8400.
2018-10-19 08:05:22 +02:00
Nick Hill 7062ceedb0 Simplify ByteBufInputStream.readLine() logic (#8380)
Motivation:

While looking at the nice optimization done in
https://github.com/netty/netty/pull/8347 I couldn't help noticing the
logic could be simplified further. Apologies if this is just my OCD and
inappropriate!

Modifications:

Reduce amount of code used for ByteBufInputStream.readLine()

Result:

Slightly smaller and simpler code
2018-10-13 06:24:40 +02:00
Francesco Nigro 83dc3b503e ByteBufInputStream is always allocating a StringBuilder instance (#8347)
Motivation:

Avoid creating any StringBuilder instance if
ByteBufInputStream::readLine isn't used

Modifications:

The StringBuilder instance is lazy allocated on demand and
are added new test case branches to address the increased
complexity of ByteBufInputStream::readLine

Result:

Reduced GC activity if ByteBufInputStream::readLine isn't used
2018-10-11 14:56:29 +08:00
Dmitriy Dumanskiy 6cebb6069b remove unnecessary vararg argument in PooledByteBufAllocator (#8338)
Motivation:

No need in varargs, the method always accepts array.

Modification:

... replaced with []
2018-10-05 19:06:44 +08:00
Norman Maurer c14efd952d
Directly init refCnt to 1 (#8274)
Motivation:

We should just directly init the refCnt to 1 and not use the AtomicIntegerFieldUpdater.

Modifications:

Just assing directly to 1.

Result:

Cleaner code and possible a bit faster as the JVM / JIT may be able to optimize the first store easily.
2018-09-07 19:04:19 +02:00
Norman Maurer e542a2cf26
Use a non-volatile read for ensureAccessible() whenever possible to reduce overhead and allow better inlining. (#8266)
Motiviation:

At the moment whenever ensureAccessible() is called in our ByteBuf implementations (which is basically on each operation) we will do a volatile read. That per-se is not such a bad thing but the problem here is that it will also reduce the the optimizations that the compiler / jit can do. For example as these are volatile it can not eliminate multiple loads of it when inline the methods of ByteBuf which happens quite frequently because most of them a quite small and very hot. That is especially true for all the methods that act on primitives.

It gets even worse as people often call a lot of these after each other in the same method or even use method chaining here.

The idea of the change is basically just ue a non-volatile read for the ensureAccessible() check as its a best-effort implementation to detect acting on already released buffers anyway as even with a volatile read it could happen that the user will release it in another thread before we actual access the buffer after the reference check.

Modifications:

- Try to do a non-volatile read using sun.misc.Unsafe if we can use it.
- Add a benchmark

Result:

Big performance win when multiple ByteBuf methods are called from a method.

With the change:
UnsafeByteBufBenchmark.setGetLongUnsafeByteBuf  thrpt   20  281395842,128 ± 5050792,296  ops/s

Before the change:
UnsafeByteBufBenchmark.setGetLongUnsafeByteBuf  thrpt   20  217419832,801 ± 5080579,030  ops/s
2018-09-07 07:47:02 +02:00
Francesco Nigro c78be33443 Added configurable ByteBuf bounds checking (#7521)
Motivation:

The JVM isn't always able to hoist out/reduce bounds checking (due to ref counting operations etc etc) hence making it configurable could improve performances for most CPU intensive use cases.

Modifications:

Each AbstractByteBuf bounds check has been tested against a new static final configuration property similar to checkAccessible ie io.netty.buffer.bytebuf.checkBounds.

Result:

Any user could disable ByteBuf bounds checking in order to get extra performances.
2018-09-03 20:33:47 +02:00
Norman Maurer 54f565ac67
Allow to use native transports when sun.misc.Unsafe is not present on… (#8231)
* Allow to use native transports when sun.misc.Unsafe is not present on the system

Motivation:

We should be able to use the native transports (epoll / kqueue) even when sun.misc.Unsafe is not present on the system. This is especially important as Java11 will be released soon and does not allow access to it by default.

Modifications:

- Correctly disable usage of sun.misc.Unsafe when -PnoUnsafe is used while running the build
- Correctly increment metric when UnpooledDirectByteBuf is allocated. This was uncovered once -PnoUnsafe usage was fixed.
- Implement fallbacks in all our native transport code for when sun.misc.Unsafe is not present.

Result:

Fixes https://github.com/netty/netty/issues/8229.
2018-08-29 19:36:33 +02:00
vincent-grosbois 0bea8ecf5d CompositeByteBuf nioBuffer doesn't always alloc (#8176)
In nioBuffer(int,int) in CompositeByteBuf , we create a sub-array of nioBuffers for the components that are in range, then concatenate all the components in range into a single bigger buffer.
However, if the call to nioBuffers() returned only one sub-buffer, then we are copying it to a newly-allocated buffer "merged" for no reason.

Motivation:

Profiler for Spark shows a lot of time spent in put() method inside nioBuffer(), while usually no copy of data is required.

Modification:
This change skips this last step and just returns a duplicate of the single buffer returned by the call to nioBuffers(), which will in most implementation not copy the data

Result:
No copy when the source is only 1 buffer
2018-08-07 11:31:24 +02:00
Norman Maurer 9b08dbca00
Leak detection combined with composite buffers results in incorrectly handled writerIndex when calling ByteBufUtil.writeAscii/writeUtf8 (#8153)
Motivation:

We need to add special handling for WrappedCompositeByteBuf as these also extend AbstractByteBuf, otherwise we will not correctly adjust / read the writerIndex during processing.

Modifications:

- Add instanceof checks for WrappedCompositeByteBuf as well.
- Add testcases

Result:

Fixes https://github.com/netty/netty/issues/8152.
2018-07-27 01:56:09 +08:00
Norman Maurer 6afab517b0
Guard against calling PoolThreadCache.free() multiple times. (#8108)
Motivation:

5b1fe611a6 introduced the usage of a finalizer as last resort for PoolThreadCache. As we may call free() from the FastThreadLocal.onRemoval(...) and finalize() we need to guard against multiple calls as otherwise we will corrupt internal state (that is used for metrics).

Modifications:

Use AtomicBoolean to guard against multiple calls of PoolThreadCache.free().

Result:

No more corruption of internal state caused by calling PoolThreadCache.free() multuple times.
2018-07-09 15:58:12 -04:00
Nick Hill fef462c043 Deprecate Unpooled.unmodifiableBuffer(ByteBuf...) (#8096)
Motivation:

Recent PR https://github.com/netty/netty/pull/8040 introduced
Unpooled.wrappedUnmodifiableBuffer(ByteBuf...) which has the same
behaviour but wraps the provided array directly. This is preferred for
most uses (including varargs-based use) and if there are any unusual
cases of an explicit array which is re-used before the ByteBuf is
finished with, it can just be copied first.

Modifications:

Added @Deprecated annotation and javadoc to
Unpooled.unmodifiableBuffer(ByteBuf...).

Result:

Unpooled.unmodifiableBuffer(ByteBuf...) will be deprecated.
2018-07-07 14:45:27 -04:00
Norman Maurer 83710cb2e1
Replace toArray(new T[size]) with toArray(new T[0]) to eliminate zero-out and allow the VM to optimize. (#8075)
Motivation:

Using toArray(new T[0]) is usually the faster aproach these days. We should use it.

See also https://shipilev.net/blog/2016/arrays-wisdom-ancients/#_conclusion.

Modifications:

Replace toArray(new T[size]) with toArray(new T[0]).

Result:

Faster code.
2018-06-29 07:56:04 +02:00
Norman Maurer 5b1fe611a6
Remove usage of ObjectCleaner (#8064)
Motivation:

ObjectCleaner does start a Thread to handle the cleaning of resources which leaks into the users application. We should not use it in netty itself to make things more predictable.

Modifications:

- Remove usage of ObjectCleaner and use finalize as a replacement when possible.
- Clarify javadocs for FastThreadLocal.onRemoval(...) to ensure its clear that remove() is not guaranteed to be called when the Thread completees and so this method is not enough to guarantee cleanup for this case.

Result:

Fixes https://github.com/netty/netty/issues/8017.
2018-06-28 08:15:27 +02:00
nickhill f164759ea3 Support composite buffer creation without array alloc and copy
Motivation:

Unpooled.unmodifiableBuffer() is currently used to efficiently write
arrays of ByteBufs via FixedCompositeByteBuf, but involves an allocation
and content-copy of the provided ByteBuf array which in many (most?)
cases shouldn't be necessary.

Modifications:

Modify the internal FixedCompositeByteBuf class to support wrapping the
provided ByteBuf array directly. Control this behaviour with a
constructor flag and expose the "unsafe" version via a new
Unpooled.wrappedUnmodifiableBuffer(ByteBuf...) method.

Result:

Less garbage on IO paths. I would guess pretty much all existing usage
of unmodifiableBuffer() could use the copy-free version but assume it's
not safe to change its default behaviour.
2018-06-27 07:40:14 +02:00
nickhill 9b95b8ee62 Reduce array allocations during CompositeByteBuf construction
Motivation:

Eliminate avoidable backing array reallocations when constructing
composite ByteBufs from existing buffer arrays/Iterables. This also
applies to the Unpooled.wrappedBuffer(...) methods.

Modifications:

Ensure the initial components ComponentList is sized at least as large
as the provided buffer array/Iterable in the CompositeByteBuffer
constructors.

In single-arg Unpooled.wrappedBuffer(...) methods, set maxNumComponents
to the count of provided buffers, rather than a fixed default of 16. It
seems likely that most usage of these involves wrapping a list without
subsequent modification, particularly since they return a ByteBuf rather
than CompositeByteBuf. If a different/larger max is required there are
already the wrappedBuffer(int, ...) variants.

In fact the current behaviour could be considered inconsistent - if you
call Unpooled.wrappedBuffer(int, ByteBuf) with a single buffer, you
might expect to subsequently be able to add buffers to it (since you
specified a max related to consolidation), but it will in fact return
just a slice of the provided ByteBuf.

Result:

Fewer and smaller allocations in some cases when using CompositeByteBufs
or Unpooled.wrappedBuffer(...).
2018-06-20 16:09:23 +02:00
Tim Brooks 35215309b9 Make UnpooledHeapByteBuf array methods protected (#8015)
Motivation:

Currently there is not a clear way to provide a byte array to a netty
ByteBuf and be informed when it is released. This is a would be a
valuable addition for projects that integrate with netty but also pool
their own byte arrays.

Modification:

Modified the UnpooledHeapByteBuf class so that the freeArray method is
protected visibility instead of default. This will allow a user to
subclass the UnpooledHeapByteBuf, provide a byte array, and override
freeArray to return the byte array to a pool when it is called.
Additionally this makes this implementation equivalent to
UnpooledDirectByteBuf (freeDirect is protected).

Additionally allocateArray is also made protect to provide another override
option for subclasses.

Result:

Users can override UnpooledHeapByteBuf#freeArray and
UnpooledHeapByteBuf#allocateArray.
2018-06-13 11:43:31 -07:00
zekaryu 09d9daf1c4 Update the comment of io.netty.buffer.PoolChunk.java (#7934)
Motivation:

When I read the source code, I found that the comment of PoolChunk is out of date, it may confuses readers with the description about memoryMap.

Modifications:

update the last passage of the comment of the PoolChunk class.

Result:
No change to any source code , just update comment.
2018-05-14 15:44:32 +02:00
Xiaoyan Lin a0ed6ec06c Fix the error message in ReferenceCounted.release (#7921)
Motivation:

When a buffer is over-released, the current error message of `IllegalReferenceCountException` is `refCnt: XXX, increment: XXX`, which is confusing. The correct message should be `refCnt: XXX, decrement: XXX`.

Modifications:

Pass `-decrement` to create `IllegalReferenceCountException`.

Result:

The error message will be `refCnt: XXX, decrement: XXX` when a buffer is over-released.
2018-05-08 20:09:16 +02:00
Rikki Gibson 1b1f7677ac Implement isWritable and ensureWritable on ReadOnlyByteBufferBuf (#7883)
Motivation:

It should be possible to write a ReadOnlyByteBufferBuf to a channel without errors. However, ReadOnlyByteBufferBuf does not override isWritable and ensureWritable, which can cause some handlers to mistakenly assume they can write to the ReadOnlyByteBufferBuf, resulting in ReadOnlyBufferException.

Modification:

Added isWritable and ensureWritable method overrides on ReadOnlyByteBufferBuf to indicate that it is never writable. Added tests for these methods.

Result:

Can successfully write ReadOnlyByteBufferBuf to a channel with an SslHandler (or any other handler which may attempt to write to the ByteBuf it receives).
2018-04-23 08:31:30 +02:00
Nikolay Fedorovskikh 81a7d1413b Makes `EmptyByteBuf#hashCode` and `AbstractByteBuf#hashCode` consistent (#7870)
Motivation:
The `AbstractByteBuf#equals` method doesn't take into account the
class of buffer instance. So the two buffers with different classes
must have the same `hashCode` values if `equals` method returns `true`.
But `EmptyByteBuf#hashCode` is not consistent with `#hashCode`
of the empty `AbstractByteBuf`, that is violates the contract and
can lead to errors.

Modifications:
Return `1` in `EmptyByteBuf#hashCode`.

Result:
Consistent behavior of `EmptyByteBuf#hashCode` and `AbstractByteBuf#hashCode`.
2018-04-16 12:11:42 +02:00
Nikolay Fedorovskikh f8ff834f03 Checks accessibility in the #slice and #duplicate methods of ByteBuf (#7846)
Motivation:
The `ByteBuf#slice` and `ByteBuf#duplicate` methods should check
an accessibility to prevent creation slice or duplicate
of released buffer. At now this works not in the all scenarios.

Modifications:
Add missed checks.

Result:
More correct and consistent behavior of `ByteBuf` methods.
2018-04-10 10:41:50 +02:00
Nikolay Fedorovskikh a95fd91bc6 Don't check accessible in the #capacity method (#7830)
Motivation:
The `#ensureAccessible` method in `UnpooledHeapByteBuf#capacity` used
to prevent NPE if buffer is released and `array` is `null`. In all
other implementations of `ByteBuf` the accessible is not checked by
`capacity` method. We can assign an empty array to `array`
in the `deallocate` and don't worry about NPE in the `#capacity`.
This will help reduce the number of repeated calls of the
`#ensureAccessible` in many operations with `UnpooledHeapByteBuf`.

Modifications:
1. Remove `#ensureAccessible` call from `UnpooledHeapByteBuf#capacity`.
Use the `EmptyArrays#EMPTY_BYTES` instead of `null` in `#deallocate`.

2. Fix access checks in `AbstractUnsafeSwappedByteBuf` and
`AbstractByteBuf#slice` that relied on `#ensureAccessible`
in `UnpooledHeapByteBuf#capacity`. This was found by unit tests.

Result:
Less double calls of `#ensureAccessible` for `UnpooledHeapByteBuf`.
2018-04-03 21:35:02 +02:00
Norman Maurer 965734a1eb
Limit the number of bytes to use to copy the content of a direct buffer to an Outputstream (#7813)
Motivation:

Currently copying a direct ByteBuf copies it fully into the heap before writing it to an output stream.
The can result in huge memory usage on the heap.

Modification:

copy the bytebuf contents via an 8k buffer into the output stream

Result:

Fixes #7804
2018-03-29 12:49:27 +02:00
Norman Maurer bd772d127e FixedCompositeByteBuf should allow to access memoryAddress / array when wrap a single buffer.
Motivation:

We should allow to access the memoryAddress / array of the FixedCompositeByteBuf when it only wraps a single ByteBuf. We do the same for CompositeByteBuf.

Modifications:

- Check how many buffers FixedCompositeByteBuf wraps and depending on it delegate the access to the memoryAddress / array
- Add unit tests.

Result:

Fixes [#7752].
2018-03-13 08:50:42 +01:00
kakashiio 12ccd40c5a Correctly throw IndexOutOfBoundsException when writerIndex < readerIndex
Motivation:

If someone invoke writeByte(), markWriterIndex(), readByte() in order first, and then invoke resetWriterIndex() should be throw a IndexOutOfBoundsException to obey the rule that the buffer declared "0 <= readerIndex <= writerIndex <= capacity".

Modification:

Changed the code writerIndex = markedWriterIndex; into writerIndex(markedWriterIndex); to make the check affect

Result:
Throw IndexOutOfBoundsException if any invalid happened in resetWriterIndex.
2018-03-02 10:05:33 +09:00
Francesco Nigro ed46c4ed00 Copies from read-only heap ByteBuffer to direct ByteBuf can avoid stealth ByteBuf allocation and additional copies
Motivation:

Read-only heap ByteBuffer doesn't expose array: the existent method to perform copies to direct ByteBuf involves the creation of a (maybe pooled) additional heap ByteBuf instance and copy

Modifications:

To avoid stressing the allocator with additional (and stealth) heap ByteBuf allocations is provided a method to perform copies using the (pooled) internal NIO buffer

Result:

Copies from read-only heap ByteBuffer to direct ByteBuf won't create any intermediate ByteBuf
2018-02-27 09:54:21 +09:00
Francesco Nigro bc8e022601 Added exact utf8 length estimator and exposed writeUtf8 with custom space reservation on destination buffer
Motivation:

To avoid eager allocation of the destination and to perform length prefixed encoding of UTF-8 string with forward only access pattern

Modifications:

The original writeUtf8 is modified by allowing customization of the reserved bytes on the destination buffer and is introduced an exact UTF-8 length estimator.

Result:

Is now possible to perform length first encoding with UTF-8 well-formed char sequences following a forward only write access pattern on the destination buffer.
2018-02-16 11:52:35 +01:00
Scott Mitchell 108fbe5282
ByteBufUtil to not pool direct memory by default
Motivation:
ByteBufUtil by default will cache DirectByteBuffer objects, and the
associated direct memory (up to 64k). In combination with the Recycler which may
cache up to 32k elements per thread may lead to a large amount of direct
memory being retained per EventLoop thread. As traffic spikes come this
may be perceived as a memory leak because the memory in the Recycler
will never be reclaimed.

Modifications:
- By default we shouldn't cache DirectByteBuffer objects.

Result:
Less direct memory consumption due to caching DirectByteBuffer objects.
2018-02-12 10:49:17 -08:00
Norman Maurer fbbaf2bd7e Cleanup buffer tests.
Motivation:

There is some cleanup that can be done.

Modifications:

- Use intializer list expression where possible
- Remove unused imports.

Result:

Cleaner code.
2018-02-02 07:33:46 +01:00
Norman Maurer 011841e454 ReadOnlyUnsafeDirectByteBuf.memoryAddress() should not throw
Motivation:

We need the memoryAddress of a direct buffer when using our native transports. For this reason ReadOnlyUnsafeDirectByteBuf.memoryAddress() should not throw.

Modifications:

- Correctly override ReadOnlyUnsafeDirectByteBuf.memoryAddress() and hasMemoryAddress()
- Add test case

Result:

Fixes [#7672].
2018-02-02 07:27:26 +01:00
Norman Maurer 95b9b0af5c Increase timeout and decrement number of operations in AbstractByteBufTest.testToStringMultipleThreads
Motivation:

We saw some timeouts on the CI when the leak detection is enabled.

Modifications:

- Use smaller number of operations in test
- Increase timeout

Result:

CI not times out.
2018-01-31 14:57:38 +01:00
Norman Maurer d2bd36fc4c ByteBufUtil.isText method should be safe to be called concurrently
Motivation:

ByteBufUtil.isText(...) may produce unexpected results if called concurrently on the same ByteBuffer.

Modifications:

- Don't use internalNioBuffer where it is not safe.
- Add unit test.

Result:

ByteBufUtil.isText is thread-safe.
2018-01-31 13:47:49 +01:00
Scott Mitchell 4921f62c8a
HttpResponseStatus object allocation reduction
Motivation:
Usages of HttpResponseStatus may result in more object allocation then necessary due to not looking for cached objects and the AsciiString parsing method not being used due to CharSequence method being used instead.

Modifications:
- HttpResponseDecoder should attempt to get the HttpResponseStatus from cache instead of allocating a new object
- HttpResponseStatus#parseLine(CharSequence) should check if the type is AsciiString and redirect to the AsciiString parsing method which may not require an additional toString call
- HttpResponseStatus#parseLine(AsciiString) can be optimized and doesn't require and may not require object allocation

Result:
Less allocations when dealing with HttpResponseStatus.
2018-01-24 22:01:52 -08:00
Norman Maurer 336bea9dc5 Fix ByteBuf.nioBuffer(...) and nioBuffers(...) docs to reflect reality.
Motivation:

Depending on the implementation of ByteBuf nioBuffer(...) and nioBuffers(...) may either share the content or return a ByteBuffer that contains a copy of the content.

Modifications:

Fix javadocs.

Result:

Correct docs.
2018-01-22 19:50:08 +01:00
Thomas Devanneaux 3ae57cf302 ByteBuf.toString(Charset) is not thread-safe
Motivation:

Calling ByteBuf.toString(Charset) on the same buffer from multiple threads at the same time produces unexpected results, such as various exceptions and/or corrupted output. This is because ByteBufUtil.decodeString(...) is taking the source ByteBuffer for CharsetDecoder.decode() from ByteBuf.internalNioBuffer(int, int), which is not thread-safe.

Modification:

Call ByteBuf.nioBuffer() instead of ByteBuf.internalNioBuffer() to get the source buffer to pass to CharsetDecoder.decode().

Result:

Fixes the possible race condition.
2018-01-21 09:02:42 +01:00
Norman Maurer 819b870b8a Correctly take position into account when wrap a ByteBuffer via ReadOnlyUnsafeDirectByteBuf
Motivation:

We did not correctly take the position into account when wrapping a ByteBuffer via ReadOnlyUnsafeDirectByteBuf as we obtained the memory address from the original ByteBuffer and not the slice we take.

Modifications:

- Correctly use the slice to obtain memory address.
- Add test case.

Result:

Fixes [#7565].
2018-01-16 19:18:59 +01:00
Norman Maurer e329ca1cf3 Introduce ObjectCleaner and use it in FastThreadLocal to ensure FastThreadLocal.onRemoval(...) is called
Motivation:

There is no guarantee that FastThreadLocal.onRemoval(...) is called if the FastThreadLocal is used by "non" FastThreacLocalThreads. This can lead to all sort of problems, like for example memory leaks as direct memory is not correctly cleaned up etc.

Beside this we use ThreadDeathWatcher to check if we need to release buffers back to the pool when thread local caches are collected. In the past ThreadDeathWatcher was used which will need to "wakeup" every second to check if the registered Threads are still alive. If we can ensure FastThreadLocal.onRemoval(...) is called we do not need this anymore.

Modifications:

- Introduce ObjectCleaner and use it to ensure FastThreadLocal.onRemoval(...) is always called when a Thread is collected.
- Deprecate ThreadDeathWatcher
- Add unit tests.

Result:

Consistent way of cleanup FastThreadLocals when a Thread is collected.
2017-12-21 07:34:44 +01:00
Norman Maurer 1988cd041d Reduce Object allocations in CompositeByteBuf.
Motivation:

We used subList in CompositeByteBuf to remove ranges of elements from the internal storage. Beside this we also used an foreach loop in a few cases which will crate an Iterator.

Modifications:

- Use our own sub-class of ArrayList which exposes removeRange(...). This allows to remove a range of elements without an extra allocation.
- Use an old style for loop to iterate over the elements to reduce object allocations.

Result:

Less allocations.
2017-12-12 09:08:58 +01:00
Norman Maurer 09a05b680d Dont use ThreadDeathWatcher to cleanup PoolThreadCache if FastThreadLocalThread with wrapped Runnable is used
Motivation:

We dont need to use the ThreadDeathWatcher if we use a FastThreadLocalThread for which we wrap the Runnable and ensure we call FastThreadLocal.removeAll() once the Runnable completes.

Modifications:

- Dont use a ThreadDeathWatcher if we are sure we will call FastThreadLocal.removeAll()
- Add unit test.

Result:

Less overhead / running theads if you only allocate / deallocate from FastThreadLocalThreads.
2017-11-28 13:43:28 +01:00
Scott Mitchell a8bb9dc180 AbstractByteBuf readSlice bound check bug
Motivation:
AbstractByteBuf#readSlice relied upon the bounds checking of the slice operation in order to detect index out of bounds conditions. However the slice bounds checking operation allows for the slice to go beyond the writer index, and this is out of bounds for a read operation.

Modifications:
- AbstractByteBuf#readSlice and AbstractByteBuf#readRetainedSlice should ensure the desired amount of bytes are readable before taking a slice

Result:
No reading of undefined data in AbstractByteBuf#readSlice and AbstractByteBuf#readRetainedSlice.
2017-11-18 09:03:42 +01:00
Norman Maurer cc069722a2 CompositeBytebuf.copy() and copy(...) should respect the allocator
Motivation:

When calling CompositeBytebuf.copy() and copy(...) we currently use Unpooled to allocate the buffer. This is not really correct and may produce more GC then needed. We should use the allocator that was used when creating the CompositeByteBuf to allocate the new buffer which may be for example the PooledByteBufAllocator.

Modifications:

- Use alloc() to allocate the new buffer.
- Add tests
- Fix tests that depend on the copy to be backed by an byte-array without checking hasArray() first.

Result:

Fixes [#7393].
2017-11-10 07:17:16 -08:00
Idel Pivnitskiy 50a067a8f7 Make methods 'static' where it possible
Motivation:

Even if it's a super micro-optimization (most JVM could optimize such
 cases in runtime), in theory (and according to some perf tests) it
 may help a bit. It also makes a code more clear and allows you to
 access such methods in the test scope directly, without instance of
 the class.

Modifications:

Add 'static' modifier for all methods, where it possible. Mostly in
test scope.

Result:

Cleaner code with proper 'static' modifiers.
2017-10-21 14:59:26 +02:00
Nikolay Fedorovskikh 17e1a26d64 Fixes a javadoc for ByteBufUtil#copy method
Motivation:
Javadoc of the `ByteBufUtil#copy(AsciiString, int, ByteBuf, int, int)` is incorrect.

Modifications:
Fix it.

Result:
The description of the `#copy` method is not misleading.
2017-10-21 14:51:20 +02:00
Nikolay Fedorovskikh 09dd6a5d4d Minor improvements in ByteBufOutputStream
Motivation:
In the `ByteBufOutputStream` we can use an appropriate methods of `ByteBuf`
to reduce calls of virtual methods and do not copying converting logic.

Modifications:
- Use an appropriate methods of `ByteBuf`
- Remove redundant conversions (int -> byte, int -> char).
- Use `ByteBuf#writeCharSequence` in the `writeBytes(String)'.

Result:
Less code duplication. A `writeBytes(String)` method is faster.
No unnecessary conversions. More consistent and cleaner code.
2017-10-21 14:44:13 +02:00
Carl Mastrangelo 16b1dbdf92 Motivation: Resource Leak Detector (RLD) tries to helpfully indicate where an object was last accessed and report the accesses in the case the object was not cleaned up. It handles lightly used objects well, but drops all but the last few accesses.
Configuring this is tough because there is split between highly shared (and accessed) objects and lightly accessed objects.

Modification:
There are a number of changes here.  In relative order of importance:

API / Functionality changes:
* Max records and max sample records are gone.  Only "target" records, the number of records tries to retain is exposed.
* Records are sampled based on the number of already stored records.  The likelihood of recording a new sample is `2^(-n)`, where `n` is the number of currently stored elements.
* Records are stored in a concurrent stack structure rather than a list.  This avoids a head and tail.  Since the stack is only read once, there is no need to maintain head and tail pointers
* The properties of this imply that the very first and very last access are always recorded.  When deciding to sample, the top element is replaced rather than pushed.
* Samples that happen between the first and last accesses now have a chance of being recorded.  Previously only the final few were kept.
* Sampling is no longer deterministic.  Previously, a deterministic access pattern meant that you could conceivably always miss some access points.
* Sampling has a linear ramp for low values and and exponentially backs off roughly equal to 2^n.  This means that for 1,000,000 accesses, about 20 will actually be kept.  I have an elegant proof for this which is too large to fit in this commit message.

Code changes:
* All locks are gone.  Because sampling rarely needs to do a write, there is almost 0 contention.  The dropped records counter is slightly contentious, but this could be removed or changed to a LongAdder.  This was not done because of memory concerns.
* Stack trace exclusion is done outside of RLD.  Classes can opt to remove some of their methods.
* Stack trace exclusion is faster, since it uses String.equals, often getting a pointer compare due to interning.  Previously it used contains()
* Leak printing is outputted fairly differently.  I tried to preserve as much of the original formatting as possible, but some things didn't make sense to keep.

Result:
More useful leak reporting.

Faster:
```
Before:
Benchmark                                           (recordTimes)   Mode  Cnt       Score      Error  Units
ResourceLeakDetectorRecordBenchmark.record                      8  thrpt   20  136293.404 ± 7669.454  ops/s
ResourceLeakDetectorRecordBenchmark.record                     16  thrpt   20   72805.720 ± 3710.864  ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint              8  thrpt   20  139131.215 ± 4882.751  ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint             16  thrpt   20   74146.313 ± 4999.246  ops/s

After:
Benchmark                                           (recordTimes)   Mode  Cnt       Score      Error  Units
ResourceLeakDetectorRecordBenchmark.record                      8  thrpt   20  155281.969 ± 5301.399  ops/s
ResourceLeakDetectorRecordBenchmark.record                     16  thrpt   20   77866.239 ± 3821.054  ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint              8  thrpt   20  153360.036 ± 8611.353  ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint             16  thrpt   20   78670.804 ± 2399.149  ops/s
```
2017-10-19 12:21:21 -07:00
Carl Mastrangelo 83a19d5650 Optimistically update ref counts
Motivation:
Highly retained and released objects have contention on their ref
count.  Currently, the ref count is updated using compareAndSet
with care to make sure the count doesn't overflow, double free, or
revive the object.

Profiling has shown that a non trivial (~1%) of CPU time on gRPC
latency benchmarks is from the ref count updating.

Modification:
Rather than pessimistically assuming the ref count will be invalid,
optimistically update it assuming it will be.  If the update was
wrong, then use the slow path to revert the change and throw an
execption.  Most of the time, the ref counts are correct.

This changes from using compareAndSet to getAndAdd, which emits a
different CPU instruction on x86 (CMPXCHG to XADD).  Because the
CPU knows it will modifiy the memory, it can avoid contention.

On a highly contended machine, this can be about 2x faster.

There is a downside to the new approach.  The ref counters can
temporarily enter invalid states if over retained or over released.
The code does handle these overflow and underflow scenarios, but it
is possible that another concurrent access may push the failure to
a different location.  For example:

Time 1 Thread 1: obj.retain(INT_MAX - 1)
Time 2 Thread 1: obj.retain(2)
Time 2 Thread 2: obj.retain(1)

Previously Thread 2 would always succeed and Thread 1 would always
fail on the second access.  Now, thread 2 could fail while thread 1
is rolling back its change.

====

There are a few reasons why I think this is okay:

1. Buggy code is going to have bugs.  An exception _is_ going to be
   thrown.  This just causes the other threads to notice the state
   is messed up and stop early.
2. If high retention counts are a use case, then ref count should
   be a long rather than an int.
3. The critical section is greatly reduced compared to the previous
   version, so the likelihood of this happening is lower
4. On error, the code always rollsback the change atomically, so
   there is no possibility of corruption.

Result:
Faster refcounting

```
BEFORE:

Benchmark                                                                                             (delay)    Mode      Cnt         Score    Error  Units
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                            1  sample  2901361       804.579 ±  1.835  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                           10  sample  3038729       785.376 ± 16.471  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                          100  sample  2899401       817.392 ±  6.668  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                         1000  sample  3650566      2077.700 ±  0.600  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                        10000  sample  3005467     19949.334 ±  4.243  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                          1  sample   456091        48.610 ±  1.162  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                         10  sample   732051        62.599 ±  0.815  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                        100  sample   778925       228.629 ±  1.205  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                       1000  sample   633682      2002.987 ±  2.856  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                      10000  sample   506442     19735.345 ± 12.312  ns/op

AFTER:
Benchmark                                                                                             (delay)    Mode      Cnt         Score    Error  Units
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                            1  sample  3761980       383.436 ±  1.315  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                           10  sample  3667304       474.429 ±  1.101  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                          100  sample  3039374       479.267 ±  0.435  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                         1000  sample  3709210      2044.603 ±  0.989  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended                                        10000  sample  3011591     19904.227 ± 18.025  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                          1  sample   494975        52.269 ±  8.345  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                         10  sample   771094        62.290 ±  0.795  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                        100  sample   763230       235.044 ±  1.552  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                       1000  sample   634037      2006.578 ±  3.574  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended                                      10000  sample   506284     19742.605 ± 13.729  ns/op

```
2017-10-04 08:42:33 +02:00
回眸,境界 06da0ceb64 Fix typo in comment. 2017-10-02 08:16:22 +02:00
Carl Mastrangelo 003c8cc7ab Expose all defaults on PooledByteBufAllocator
Motivation:
Most, but not all defaults are statically exposed on
PooledByteBufAllocator.  This makes it cumbersome to make a custom
allocator where most of the defaults remain the same.

Modification:
Expose useCacheForAllThreads, and Direct preferred.  The latter is
needed because it is under the internal package, and public code
should probably not depend on it.

Result:
More customizeable allocators
2017-09-20 21:48:53 -07:00
Norman Maurer 9d56439aa1 Make UnpooledDirectByteBuf, UnpooledHeapByteBuf and UnpooledUnsafeDirectByteBuf constructors public.
Motivation:

The constrcutors a protected atm but the classes are public. We should make the constructors public as well to make it easier to write your own ByteBufAllocator.

Modifications:

Change constructors to be public and add some javadocs.

Result:

Easier to create own ByteBufAllocator.
2017-09-18 21:42:46 -07:00
Carl Mastrangelo d2cb51bc2e Enable PooledByteBufAllocator to work, event without a cache
Motivation:
`useCacheForAllThreads` may be false which disables memory caching
on non netty threads.  Setting this argument or the system property
makes it impossible to use `PooledByteBufAllocator`.

Modifications:

Delayed the check of `freeSweepAllocationThreshold` in
`PoolThreadCache` to after it knows there will be any caches in
use.  Additionally, check if the caches will have any data in them
(rather than allocating a 0-length array).

A test case is also added that fails without this change.

Results:

Fixes #7194
2017-09-08 10:20:45 +02:00
Norman Maurer bca35b0449 Allow to construct UnpooledByteBufAllocator that explictly always use sun.misc.Cleaner
Motivation:

When the user want to have the direct memory explicitly managed by the GC (just as java.nio does) it is useful to be able to construct an UnpooledByteBufAllocator that allows this without the chances to see any memory leak.

Modifications:

Allow to explicitly disable the usage of reflection to construct direct ByteBufs and so be sure these will be collected by GC.

Result:

More flexible way to use the UnpooledByteBufAllocator.
2017-08-31 12:57:09 +02:00
Carl Mastrangelo 7528e5a11e Use threadsafe setter on Atomic Updaters
Motivation:
The documentation for field updates says:

> Note that the guarantees of the {@code compareAndSet}
> method in this class are weaker than in other atomic classes.
> Because this class cannot ensure that all uses of the field
> are appropriate for purposes of atomic access, it can
> guarantee atomicity only with respect to other invocations of
> {@code compareAndSet} and {@code set} on the same updater.

This implies that volatiles shouldn't use normal assignment; the
updater should set them.

Modifications:
Use setter for field updaters that make use of compareAndSet.

Result:
Concurrency compliant code
2017-08-31 10:14:40 +02:00
Norman Maurer e487db7836 Use the ByteBufAllocator when copy a ReadOnlyByteBufferBuf and so also be able to release it without the GC when the Cleaner is present.
Motivation:

In ReadOnlyByteBufferBuf.copy(...) we just allocated a ByteBuffer directly and wrapped it. This way it was not possible for us to free the direct memory that was used by the copy without the GC.

Modifications:

- Ensure we use the allocator when create the copy and so be able to release direct memory in a timely manner
- Add unit test
- Depending on if the to be copied buffer is direct or heap based we also allocate the same type on copy.

Result:

Fixes [#7103].
2017-08-16 07:33:10 +02:00
Nikolay Fedorovskikh 0774c91456 Support the little endian floats and doubles by ByteBuf
Motivation:
`ByteBuf` does not have the little endian variant of float/double access methods.

Modifications:
Add support for little endian floats and doubles into `ByteBuf`.

Result:
`ByteBuf` has get/read/set/writeFloatLE() and get/read/set/writeDoubleLE() methods. Fixes [#6576].
2017-08-15 06:24:28 +02:00
louyl 8be9a63c1c FIX endless loop in ByteBufUtil#writeAscii
Motivation:

Missing return in ByteBufUtil#writeAscii causes endless loop

Modifications:

Add return after write finished

Result:

ByteBufUtil#writeAscii is ok
2017-08-12 12:50:00 +02:00
Scott Mitchell 452fd36240 ByteBufs which are not resizable should not throw in ensureWritable(int,boolean)
Motivation:
ByteBuf#ensureWritable(int,boolean) returns an int indicating the status of the resize operation. For buffers that are unmodifiable or cannot be resized this method shouldn't throw but just return 1.
ByteBuf#ensureWriteable(int) should throw unmodifiable buffers.

Modifications:
- ReadOnlyByteBuf should be updated as described above.
- Add a unit test to SslHandler which verifies the read only buffer can be tolerated in the aggregation algorithm.

Result:
Fixes https://github.com/netty/netty/issues/7002.
2017-07-22 08:44:48 -07:00
Norman Maurer 06f64948d5 Add tests to ensure an IllegalReferenceCountException is thrown if set/writeCharSequence is called on a released buffer
Motivation:

We need to ensure we not allow calling set/writeCharsequence on an released ByteBuf.

Modifications:

Add test-cases

Result:

Proves fix of [#6951].
2017-07-21 07:39:32 +02:00
Norman Maurer 4af47f0ced AbstractByteBuf.setCharSequence(...) must not expand buffer
Motivation:

AbstractByteBuf.setCharSequence(...) must not expand the buffer if not enough writable space is present in the buffer to be consistent with all the other set operations.

Modifications:

- Ensure we only exand the buffer on writeCharSequence(...) but not on setCharSequence(...)
- Add unit tests.

Result:

Consistent and correct behavior.
2017-07-19 19:44:53 +02:00
Norman Maurer d125adec38 AbstractByteBuf.ensureWritable(...) should check if buffer was released
Motivation:

AbstractByteBuf.ensureWritable(...) should check if buffer was released and if so throw an IllegalReferenceCountException

Modifications:

Ensure we throw in all cases.

Result:

More consistent and correct behaviour
2017-07-19 07:34:08 +02:00
louxiu 0ad99310f5 Record release when enable detailed leak detection
Motivation:
It would be easier to find where is missing release call in several retain release calls on a ByteBuf

Modifications:
Remove final modifier on SimpleLeakAwareByteBuf and SimpleLeakAwareByteBuf release function and override it to record release in AdvancedLeakAwareByteBuf and AdvancedLeakAwareCompositeByteBuf

Result:
Release will be recorded when enable detailed leak detection
2017-07-18 09:28:56 +02:00
Scott Mitchell 86e653e04f SslHandler aggregation of plaintext data on write
Motivation:
Each call to SSL_write may introduce about ~100 bytes of overhead. The OpenSslEngine (based upon OpenSSL) is not able to do gathering writes so this means each wrap operation will incur the ~100 byte overhead. This commit attempts to increase goodput by aggregating the plaintext in chunks of <a href="https://tools.ietf.org/html/rfc5246#section-6.2">2^14</a>. If many small chunks are written this can increase goodput, decrease the amount of calls to SSL_write, and decrease overall encryption operations.

Modifications:
- Introduce SslHandlerCoalescingBufferQueue in SslHandler which will aggregate up to 2^14 chunks of plaintext by default
- Introduce SslHandler#setWrapDataSize to control how much data should be aggregated for each write. Aggregation can be disabled by setting this value to <= 0.

Result:
Better goodput when using SslHandler and the OpenSslEngine.
2017-07-10 12:22:08 -07:00
Nikolay Fedorovskikh df568c739e Use ByteBuf#writeShort/writeMedium instead of writeBytes
Motivation:

1. Some encoders used a `ByteBuf#writeBytes` to write short constant byte array (2-3 bytes). This can be replaced with more faster `ByteBuf#writeShort` or `ByteBuf#writeMedium` which do not access the memory.
2. Two chained calls of the `ByteBuf#setByte` with constants can be replaced with one `ByteBuf#setShort` to reduce index checks.
3. The signature of method `HttpHeadersEncoder#encoderHeader` has an unnecessary `throws`.

Modifications:

1. Use `ByteBuf#writeShort` or `ByteBuf#writeMedium` instead of `ByteBuf#writeBytes` for the constants.
2. Use `ByteBuf#setShort` instead of chained call of the `ByteBuf#setByte` with constants.
3. Remove an unnecessary `throws` from `HttpHeadersEncoder#encoderHeader`.

Result:

A bit faster writes constants into buffers.
2017-07-10 14:37:41 +02:00
Norman Maurer 83db2b07b4 Also use realloc when shrink the buffer.
Motivation:

We should also use realloc when shrink the buffer to eliminate extra allocations / memory copies when possible.

Modifications:

Use realloc for expanding and shrinking when possible.

Result:

Less memory copies and allocations
2017-07-06 20:03:15 +02:00
Nikolay Fedorovskikh f35047765f Avoid a double check ByteBuf#ensureWritable in ByteBufUtil
Motivation:

Methods `ByteBufUtil#writeUtf8` and `ByteBufUtil#writeAscii` contains a check `ByteBuf#ensureWritable` before the calling `ByteBuf#writeBytes`. But the `ByteBuf#writeBytes` also do a such check inside.

Modifications:

Make checks more targeted.

Result:

Less redundant method calls.
2017-06-28 19:00:01 +02:00
Nikolay Fedorovskikh ba3616da3e Apply appropriate methods for writing CharSequence into ByteBuf
Motivation:

1. `ByteBuf` contains methods to writing `CharSequence` which optimized for UTF-8 and ASCII encodings. We can also apply optimization for ISO-8859-1.
2. In many places appropriate methods are not used.

Modifications:

1. Apply optimization for ISO-8859-1 encoding in the `ByteBuf#setCharSequence` realizations.
2. Apply appropriate methods for writing `CharSequences` into buffers.

Result:

Reduce overhead from string-to-bytes conversion.
2017-06-27 07:58:39 +02:00
Nikolay Fedorovskikh 01eb428b39 Move methods for decode hex dump into StringUtil
Motivation:

PR #6811 introduced a public utility methods to decode hex dump and its parts, but they are not visible from netty-common.

Modifications:

1. Move the `decodeHexByte`, `decodeHexDump` and `decodeHexNibble` methods into `StringUtils`.
2. Apply these methods where applicable.
3. Remove similar methods from other locations (e.g. `HpackHex` test class).

Result:

Less code duplication.
2017-06-23 18:52:42 +02:00
Norman Maurer 7922757575 Allow to access memoryAddress of wrapped ByteBuf for ReadOnlyByteBuf
Motivation:

We should allow to access the memoryAddress of the wrapped ByteBuf when using ReadOnlyByteBuf for peformance reasons. If a user act on a memoryAddress its his responsible anyway to do nothing "stupid".

Modifications:

Delegate to wrapped ByteBuf.

Result:

Less performance overhead for various operations and also when writing to a native transport (which needs the memoryAddress).
2017-06-07 18:45:26 +02:00
Renjie Sun 629b83e0a5 Move QueryStringDecoder.decodeHexByte into ByteBufUtil
Motivations:
1. There are duplicated implementations of decoding hex strings. #6797
2. ByteBufUtil.HexUtil.decodeHexDump does not handle substring start
index properly and does not decode hex byte rigorously.

Modifications:
1. Function decodeHexByte is moved from QueryStringDecoder into ByteBufUtil.
2. ByteBufUtil.HexUtil.decodeHexDump is changed to use decodeHexByte.
3. Tests are Updated accordingly.

Result:
Fixed #6797 and made hex decoding functions more robust.
2017-06-07 09:27:36 -07:00
Scott Mitchell b71abcedd1 ByteBufUtil#decodeHexDump
Motivation:
ByteBufUtil provides a hexDump method. For debugging purposes it is often useful to decode that hex dump to get the original content, but no such method exists.

Modifications:
- Add ByteBufUtil#decodeHexDump

Result:
ByteBufUtil#decodeHexDump is available to make debugging easier.
2017-05-30 15:20:54 -07:00
Scott Mitchell 63f5cdb0d5 ByteBuf#ensureWritable(int, boolean) should not throw
Motivation:
The javadocs for ByteBuf#ensureWritable(int, boolean) indicate that it should not throw, and instead the return code should indicate the result of the operation. Due to a bug in AbstractByteBuf it is possible for a resize to be attempted on a buffer that may exceed maxCapacity() and therefore throw.

Modifications:
- If there is not enough space in the buffer, and force is false, then a resize should not be attempted

Result:
AbstractByteBuf#ensureWritable(int, boolean) enforces the javadoc constraints and does not throw.
2017-05-09 00:12:25 -07:00
Norman Maurer b662afeece Correctly release all buffers in UnpooledTest
Motivation:

We not correctly released all buffers in the UnpooledTest and so showed "bad" way of handling buffers to people that inspect our code to understand when a buffer needs to be released.

Modifications:

Explicit release all buffers.

Result:

Cleaner and more correct code.
2017-04-27 19:29:45 +02:00
Jason Tedor 98beb777f8 Enable configuring available processors
Motivation:

In cases when an application is running in a container or is otherwise
constrained to the number of processors that it is using, the JVM
invocation Runtime#availableProcessors will not return the constrained
value but rather the number of processors available to the virtual
machine. Netty uses this number in sizing various resources.
Additionally, some applications will constrain the number of threads
that they are using independenly of the number of processors available
on the system. Thus, applications should have a way to globally
configure the number of processors.

Modifications:

Rather than invoking Runtime#availableProcessors, Netty should rely on a
method that enables configuration when the JVM is started or by the
application. This commit exposes a new class NettyRuntime for enabling
such configuraiton. This value can only be set once. Its default value
is Runtime#availableProcessors so that there is no visible change to
existing applications, but enables configuring either a system property
or configuring during application startup (e.g., based on settings used
to configure the application).

Additionally, we introduce the usage of forbidden-apis to prevent future
uses of Runtime#availableProcessors from creeping. Future work should
enable the bundled signatures and clean up uses of deprecated and
other forbidden methods.

Result:

Netty can be configured to not use the underlying number of processors,
but rather the constrained number of processors.
2017-04-23 10:31:17 +02:00
Norman Maurer bf0beb772c Fix IllegalArgumentException when release a wrapped ByteBuffer on Java9
Motivation:

Unsafe.invokeCleaner(...) checks if the passed in ByteBuffer is a slice or duplicate and if so throws an IllegalArgumentException on Java9. We need to ensure we never try to free a ByteBuffer that was provided by the user directly as we not know if its a slice / duplicate or not.

Modifications:

Never try to free a ByteBuffer that was passed into UnpooledUnsafeDirectByteBuf constructor by an user (via Unpooled.wrappedBuffer(....)).

Result:

Build passes again on Java9
2017-04-20 19:19:11 +02:00
Nikolay Fedorovskikh 0692bf1b6a fix the typos 2017-04-20 04:56:09 +02:00
Norman Maurer e482d933f7 Add 'io.netty.tryAllocateUninitializedArray' system property which allows to allocate byte[] without memset in Java9+
Motivation:

Java9 added a new method to Unsafe which allows to allocate a byte[] without memset it. This can have a massive impact in allocation times when the byte[] is big. This change allows to enable this when using Java9 with the io.netty.tryAllocateUninitializedArray property when running Java9+. Please note that you will need to open up the jdk.internal.misc package via '--add-opens java.base/jdk.internal.misc=ALL-UNNAMED' as well.

Modifications:

Allow to allocate byte[] without memset on Java9+

Result:

Better performance when allocate big heap buffers and using java9.
2017-04-19 11:45:39 +02:00
Scott Mitchell 21562d8808 Retained[Duplicate|Slice] operations should not increase the reference count for UnreleasableByteBuf
Motivation:
UnreleasableByteBuf operations are designed to not modify the reference count of the underlying buffer. The Retained[Duplicate|Slice] operations violate this assumption and can cause the underlying buffer's reference count to be increased, but never allow for it to be decreased. This may lead to memory leaks.

Modifications:
- UnreleasableByteBuf's Retained[Duplicate|Slice] should leave the reference count of the parent buffer unchanged after the operation completes.

Result:
No more memory leaks due to usage of the Retained[Duplicate|Slice] on an UnreleasableByteBuf object.
2017-03-31 17:45:29 -07:00
Scott Mitchell ef21d5f4ca UnsafeByteBufUtil errors and simplification
Motiviation:
UnsafeByteBufUtil has some bugs related to using an incorrect index, and also omitting the array paramter when dealing with byte[] objects. There is also some simplification possible with respect to type casting, and minor formatting consistentcy issues.

Modifications:
- Ensure indexing is correct when dealing with native memory
- Fix the native access and endianness for the medium/unsigned medium methods
- Ensure array is used when dealing with heap memory
- Remove unecessary casts when using long
- Fix formating and alignment

Result:
UnsafeByteBufUtil is more correct and won't access direct memory when heap arrays are used.
2017-03-30 11:52:03 -07:00
Norman Maurer 6036b3f6ea Fix buffer leak in EmptyByteBufTest introduced by aa2f16f314 2017-03-27 05:20:02 +02:00
Bryce Anderson aa2f16f314 EmptyByteBuf allows writing ByteBufs with 0 readable bytes
Motivation:

The contract of `ByteBuf.writeBytes(ByteBuf src)` is such that it will
throw an `IndexOutOfBoundsException if `src.readableBytes()` is greater than
`this.writableBytes()`. The EmptyByteBuf class will throw the exception,
even if the source buffer has zero readable bytes, in violation of the
contract.

Modifications:

Use the helper method `checkLength(..)` to check the length and throw
the exception, if appropriate.

Result:

Conformance with the stated behavior of ByteBuf.
2017-03-21 22:00:54 -07:00
Nikolay Fedorovskikh 2993760e92 Fix misordered 'assertEquals' arguments in tests
Motivation:

Wrong argument order in some 'assertEquals' applying.

Modifications:

Flip compared arguments.

Result:

Correct `assertEquals` usage.
2017-03-08 22:48:37 -08:00
Norman Maurer 3ad3356892 Expose ByteBufAllocator metric in a more general way
Motivation:

PR [#6460] added a way to access the used memory of an allocator. The used naming was not very good and how things were exposed are not consistent.

Modifications:

- Add a new ByteBufAllocatorMetric and ByteBufAllocatorMetricProvider interface
- Let the ByteBufAllocator implementations implement ByteBufAllocatorMetricProvider
- Move exposed stats / metric from PooledByteBufAllocator to PooledByteBufAllocatorMetric and mark old methods as `@Deprecated`.

Result:

More consistent way to expose metric / stats for ByteBufAllocator
2017-03-08 20:07:58 +01:00
Scott Mitchell 2cff918044 Correct usages of internalNioBuffer
Motivation:
There are numerous usages of internalNioBuffer which hard code 0 for the index when the intention was to use the readerIndex().

Modifications:
- Remove hard coded 0 for the index and use readerIndex()

Result:
We are less susceptible to using the wrong index, and don't make assumptions about the ByteBufAllocator.
2017-03-02 12:51:22 -08:00
Norman Maurer 461f9a1212 Allow to obtain informations of used direct and heap memory for ByteBufAllocator implementations
Motivation:

Often its useful for the user to be able to get some stats about the memory allocated via an allocator.

Modifications:

- Allow to obtain the used heap and direct memory for an allocator
- Add test case

Result:

Fixes [#6341]
2017-03-01 18:53:43 +01:00
Norman Maurer a7fe6c0153 Metrics exposed by PooledByteBufAllocator needs to be correctly synchronized
Motivation:

As we may access the metrics exposed of PooledByteBufAllocator from another thread then the allocations happen we need to ensure we synchronize on the PoolArena to ensure correct visibility.

Modifications:

Synchronize on the PoolArena to ensure correct visibility.

Result:

Fix multi-thread issues on the metrics
2017-03-01 06:26:08 +01:00
Norman Maurer deb90923a2 Ensure PooledByteBuf.initUnpooled(...) correctly set the allocator
Motivation:

Commit 8dda984afe introduced a regression which lead to the situation that the allocator is not set when PooledByteBuf.initUnpooled(...) is called. Thus it was possible that PooledByteBuf.alloc() returns null or the wrong allocator if multiple PooledByteBufAllocator are used in an application.

Modifications:

- Correctly set the allocator
- Add test-case

Result:

Fixes [#6436].
2017-02-23 19:53:17 +01:00
Nikolay Fedorovskikh 0623c6c533 Fix javadoc issues
Motivation:

Invalid javadoc in project

Modifications:

Fix it

Result:

More correct javadoc
2017-02-22 07:31:07 +01:00
Norman Maurer fbf0e5f4dd Prefer JDK ThreadLocalRandom implementation over ours.
Motivation:

We have our own ThreadLocalRandom implementation to support older JDKs . That said we should prefer the JDK provided when running on JDK >= 7

Modification:

Using ThreadLocalRandom implementation of the JDK when possible.

Result:

Make use of JDK implementations when possible.
2017-02-16 15:44:00 -08:00
Norman Maurer 6ac5f35077 Use Unsafe to read ByteBuffer.address field to make it work on Java9 as well.
Motivation:

Java9 does not allow changing access level via reflection by default. This lead to the situation that netty disabled Unsafe completely as ByteBuffer.address could not be read.

Modification:

Use Unsafe to read the address field as this works on all Java versions.

Result:

Again be able to use Unsafe optimisations when using Netty with Java9
2017-02-16 20:40:59 +01:00
Norman Maurer 8a3a3245df Ensure Unsafe buffer implementations are used when sun.misc.Unsafe is present
Motivation:

When sun.misc.Unsafe is present we want to use *Unsafe*ByteBuf implementations. We missed to do so in PooledByteBufAllocator when the heapArena is null.

Modifications:

- Correctly use UnpooledUnsafeHeapByteBuf
- Add unit tests

Result:

Use most optimal ByteBuf implementation.
2017-02-16 07:48:33 +01:00
Norman Maurer f09a721d7f Expose the chunkSize used by PooledByteBufAllocator.
Motivation:

Sometimes it may be useful to know the used chunkSize.

Modifications:

Add method to expose chunkSize.

Result:

More exposed details.
2017-02-14 08:37:05 +01:00
Norman Maurer 371c0ca0f8 Eliminate unnessary wrapping when call ByteBuf.asReadOnly() in some cases
Motivation:

We can eliminate unnessary wrapping when call ByteBuf.asReadOnly() in some cases to reduce indirection.

Modifications:

- Check if asReadOnly() needs to create a new instance or not
- Add test cases

Result:

Less object creation / wrapping.
2017-02-14 08:35:16 +01:00
fenik17 0cf3f54a8d Adding 'final' keyword for private fields where possible
Motivation

Missing 'final' keyword for fields

Modifications

Add 'final' for fields where possible

Result

More safe and consistent code
2017-02-14 08:29:15 +01:00
Norman Maurer 34ea09e552 Add missing assumeTrue(...) that were not added in 54339c08ac 2017-02-14 08:17:33 +01:00
Norman Maurer 9b2b3e2512 Ensure tests pass when sun.misc.Unsafe is not present
Motivation:

We need to ensure we pass all tests when sun.misc.Unsafe is not present.

Modifications:

- Make *ByteBufAllocatorTest work whenever sun.misc.Unsafe is present or not
- Let Lz4FrameEncoderTest not depend on AbstractByteBufAllocator implementation details which take into account if sun.misc.Unsafe is present or not

Result:

Tests pass even without sun.misc.Unsafe.
2017-02-14 07:52:07 +01:00
Norman Maurer 54339c08ac Only try to calculate direct memory offset when sun.misc.Unsafe is present
Motivation:

We should only try to calculate the direct memory offset when sun.misc.Unsafe is present as otherwise it will fail with an NPE as PlatformDependent.directBufferAddress(...) will throw it.
This problem was introduced by 66b9be3a46.

Modifications:

Use offset of 0 if no sun.misc.Unsafe is present.

Result:

PooledByteBufAllocator also works again when no sun.misc.Unsafe is present.
2017-02-14 07:49:24 +01:00
Norman Maurer d8596d2d90 Two tests are missing `@Test` annotations
Motivation:

ReadOnlyByteBufTest contains two tests which are missing the `@Test` annotation and so will never run.

Modifications:

Add missing annotation.

Result:

Tests run as expected.
2017-02-14 07:48:37 +01:00
Norman Maurer a7c0ff665c Only use Mockito for mocking.
Motivation:

We used various mocking frameworks. We should only use one...

Modifications:

Make usage of mocking framework consistent by only using Mockito.

Result:

Less dependencies and more consistent mocking usage.
2017-02-07 08:47:22 +01:00
Kiril Menshikov 66b9be3a46 Allow to allign allocated Buffers
Motivation:

64-byte alignment is recommended by the Intel performance guide (https://software.intel.com/en-us/articles/practical-intel-avx-optimization-on-2nd-generation-intel-core-processors) for data-structures over 64 bytes.
Requiring padding to a multiple of 64 bytes allows for using SIMD instructions consistently in loops without additional conditional checks. This should allow for simpler and more efficient code.

Modification:

At the moment cache alignment must be setup manually. But probably it might be taken from the system. The original code was introduced by @normanmaurer https://github.com/netty/netty/pull/4726/files

Result:

Buffer alignment works better than miss-align cache.
2017-02-06 07:58:29 +01:00
Norman Maurer 756b78b7df Add common tests for ByteBufAllocator / AbstractByteBufAllocator implementations.
Motivation:

We not had tests for ByteBufAllocator implementations in general.

Modifications:

Added ByteBufAllocatorTest, AbstractByteBufAllocatorTest and UnpooledByteBufAllocatorTest

Result:

More tests for allocator implementations.
2017-02-06 07:51:10 +01:00
Dmitriy Dumanskiy b9abd3c9fc Cleanup : for loops for arrays to make code easier to read and removed unnecessary toLowerCase() 2017-02-06 07:47:59 +01:00
Norman Maurer 66b1731041 PooledByteBuf.capacity(...) not enforces maxCapacity()
Motivation:

PooledByteBuf.capacity(...) miss to enforce maxCapacity() and so its possible to increase the capacity of the buffer even if it will be bigger then maxCapacity().

Modifications:

- Correctly enforce maxCapacity()
- Add unit tests for capacity(...) calls.

Result:

Correctly enforce maxCapacity().
2017-02-01 18:45:54 +01:00
Carl Mastrangelo ead9938980 Include Http 1 request in error message
Motivation:

When An HTTP server is listening in plaintext mode, it doesn't have
a chance to negotiate "h2" in the tls handshake.  HTTP 1 clients
that are not expecting an HTTP2 server will accidentally a request
that isn't an upgrade, which the HTTP/2 decoder will not
understand.  The decoder treats the bytes as hex and adds them to
the error message.

These error messages are hard to understand by humans, and result
in extra, manual work to decode.

Modification:

If the first bytes of the request are not the preface, the decoder
will now see if they are an HTTP/1 request first.  If so, the error
message will include the method and path of the original request in
the error message.

In case the path is long, the decoder will check up to the first
1024 bytes to see if it matches.  This could be a DoS vector if
tons of bad requests or other garbage come in.  A future optimization
would be to treat the first few bytes as an AsciiString and not do
any Charset decoding.  ByteBuf.toCharSequence alludes to such an
optimization.

The code has been left simple for the time being.

Result:

Faster identification of errant HTTP requests.
2017-01-30 09:46:38 -08:00
ming.ma f10f8a3131 Calculate correct count for tiny/small/normal allocation
Motivation:

Disable ThreadLocal Cache, then allocate Pooled ByteBuf and release all these buffers, PoolArena's tiny/small/normal allocation count is incorrect.

Modifications:

- Calculate PoolArena's tiny/small/normal allocation one time
- Add testAllocationCounter TestCase

Result:

Fixes #6282 .
2017-01-30 10:32:58 +01:00
Norman Maurer 8dda984afe Null out references to tmpNioBuf and chunk to allow quicker collecting
Motivation:

In PooledByteBuf we missed to null out the chunk and tmpNioBuf fields before recycle it to the Recycler. This could lead to keep objects longer alive then necessary which may hold a lot of memory.

Modifications:

Null out tmpNioBuf and chunk before recycle.

Result:

Possible to earlier GC objects.
2017-01-26 22:06:47 +01:00
ming.ma 44add3c525 Log correct value for useCacheForAllThreads
Motivation:

Log about "-Dio.netty.allocator.useCacheForAllThreads" is missing log placeholder, and so can't output correct value.

Modification:

- Add placeholder

Result:

Fixes #6265 .
2017-01-25 08:01:23 +01:00
Scott Mitchell 583a59abb1 ByteBufUtil.compare int underflow
Motivation:
ByteBufUtil.compare uses long arithmetic but doesn't check for underflow on when converting from long to int to satisfy the Comparable interface. This will result in incorrect comparisons and violate the Comparable interface contract.

Modifications:
- ByteBufUtil.compare should protect against int underflow

Result:
Fixes https://github.com/netty/netty/issues/6169
2017-01-10 11:43:59 -08:00