524 Commits

Author SHA1 Message Date
Norman Maurer
c0fd74858a [#4198] Fix race-condition when allocate from multiple-thread.
Motivation:

Fix a race condition that was introduced by f18990a8a507d52fc40416d169db340105b10ec0 that could lead to a NPE when allocate from the PooledByteBufAllocator concurrently by many threads.

Modifications:

Correctly synchronize on the PoolSubPage head.

Result:

No more race.
2015-10-27 07:37:33 +01:00
Norman Maurer
40e0fbfcb6 Share code between Unsafe ByteBuf implementations
Motiviation:

We have a lot of duplicated code which makes it hard to maintain.

Modification:

Move shared code to UnsafeByteBufUtil and use it in the implementations.

Result:

Less duplicated code and so easier to maintain.
2015-10-23 12:05:15 +02:00
Norman Maurer
1055c8787e Share code between Heap ByteBuf implementations
Motiviation:

We have a lot of duplicated code which makes it hard to maintain.

Modification:

Move shared code to HeapByteBufUtil and use it in the implementations.

Result:

Less duplicated code and so easier to maintain.
2015-10-23 11:53:25 +02:00
Norman Maurer
164f6f1731 Add *UnsafeHeapByteBuf for improve performance on systems with sun.misc.Unsafe
Motivation:

sun.misc.Unsafe allows us to handle heap ByteBuf in a more efficient matter. We should use special ByteBuf implementation when sun.misc.Unsafe can be used to increase performance.

Modifications:

- Add PooledUnsafeHeapByteBuf and UnpooledUnsafeHeapByteBuf that are used when sun.misc.Unsafe is ready to use.
- Add UnsafeHeapSwappedByteBuf

Result:

Better performance when using heap buffers and sun.misc.Unsafe is ready to use.
2015-10-21 08:29:56 +02:00
Norman Maurer
eadf1bfc3f Correctly handle byte shifting if system does not support unaligned access.
Motivation:

We had a bug in our implemention which double "reversed" bytes on systems which not support unaligned access.

Modifications:

- Correctly only reverse bytes if needed.
- Share code between unsafe implementations.

Result:

No more data-corruption on sytems without unaligned access.
2015-10-20 17:31:54 +02:00
Matteo Merli
a6324758b8 In (Pooled|Unpooled)UnsafeDirectByteBuf copy memory directly to and from ByteBuffer
Motivation:

When moving bytes between a PooledUnsafeDirectByteBuf or an UnpooledUnsafeDirectByteBuf
and a ByteBuffer, a temp ByteBuffer is allocated and will need to be GCed. This is a
common case since a ByteBuffer is always needed when reading/writing on a file,
for example.

Modifications:

Use PlatformDependent.copyMemory() to avoid the need for the temp ByteBuffer

Result:

No temp ByteBuffer allocated and GCed.
2015-10-19 22:23:45 +02:00
Norman Maurer
3bc4d2330a Minimize reference count checks in SlicedByteBuf
Motivation:

SlicedByteBuf did double reference count checking for various bulk operations, which affects performance.

Modifications:

- Add package private method to AbstractByteBuf that can be used to check indexes without check the reference count
- Use this new method in the bulk operation os SlicedByteBuf as the reference count checks take place on the wrapped buffer anyway
- Fix test-case to not try to read data that is out of the bounds of the buffer.

Result:

Better performance on bulk operations when using SlicedByteBuf (and sub-classes)
2015-10-16 21:05:13 +02:00
Norman Maurer
6faef55aef Cleanup buffer tests.
Motivation:

Some of the tests in the buffer module contained unused code. Some of the tests also used unnecessary inheritance which could be avoided to simplify code.

Modifications:

Cleanup the test cases.

Result:

Cleaner code, less cruft.
2015-10-16 20:47:32 +02:00
Norman Maurer
31ef237085 Always return a real slice even when the length is 0
Motivation:

We need to always return a real slice even when the requested length is 0. This is needed as otherwise we not correctly share the reference count and so may leak a buffer if the user call release() on the returned slice and expect it to decrement the reference count of the "parent" buffer.

Modifications:

- Always return a real slice
- Add unit test for the bug.

Result:

No more leak possible when a user requests a slice of length 0 of a SlicedByteBuf.
2015-10-16 20:45:54 +02:00
Norman Maurer
291674262c Added SlicedAbstractByteBuf that can provide fast-path for _get* and _set* methods
Motivation:

SlicedByteBuf can be used for any ByteBuf implementations and so can not do any optimizations that could be done
when AbstractByteBuf is sliced.

Modifications:

- Add SlicedAbstractByteBuf that can eliminate range and reference count checks for _get* and _set* methods.

Result:

Faster SlicedByteBuf implementations for AbstractByteBuf sub-classes.
2015-10-16 08:59:58 +02:00
Norman Maurer
b8c73a4806 Added DuplicatedAbstractByteBuf that can provide fast-path for _get* and _set* methods
Motivation:

DuplicatedByteBuf can be used for any ByteBuf implementations and so can not do any optimizations that could be done
when AbstractByteBuf is duplicated.

Modifications:

- Add DuplicatedAbstractByteBuf that can eliminate range and reference count checks for _get* and _set* methods.

Result:

Faster DuplicatedByteBuf implementations for AbstractByteBuf sub-classes.
2015-10-16 08:43:53 +02:00
Norman Maurer
054af70fed Minimize object allocation when calling AbstractByteBuf.toString(..., Charset)
Motivation:

Calling AbstractByteBuf.toString(..., Charset) is used quite frequently by users but produce a lot of GC.

Modification:

- Use a FastThreadLocal to store the CharBuffer that are needed for decoding.
- Use internalNioBuffer(...) when possible

Result:

Less object creation / Less GC
2015-10-15 17:49:21 +02:00
Norman Maurer
1103379e02 Allow to disable reference count checks on every access of the ByteBuf
Motiviation:

Checking reference count on every access on a ByteBuf can have some big performance overhead depending on how the access pattern is. If the user is sure that there are no reference count errors on his side it should be possible to disable the check and so gain the max performance.

Modification:

- Add io.netty.buffer.bytebuf.checkAccessible system property which allows to disable the checks. Enabled by default.
- Add microbenchmark

Result:

Increased performance for operations on the ByteBuf.
2015-10-15 10:19:49 +02:00
Norman Maurer
af9dc2c6a6 Optimize and minimize bound checks
Motivation:

We should minimize and optimize bound checks as much as possible to get the most out of performance.

Modifications:

- Use bitwise operations to remove branching
- Remove branches when possible

Result:

Better performance for various operations.
2015-10-15 10:18:20 +02:00
Norman Maurer
d6a00d0642 [#4313] ByteBufUtil.writeUtf8 should use fast-path for WrappedByteBuf
Motivation:

ByteBufUtil.writeUtf8(...) / writeUsAscii(...) can use a fast-path when writing into AbstractByteBuf. We should try to unwrap WrappedByteBuf implementations so
we are able to do the same on wrapped AbstractByteBuf instances.

Modifications:

- Try to unwrap WrappedByteBuf to use the fast-path

Result:

Faster writing of utf8 and usascii for WrappedByteBuf instances.
2015-10-13 11:53:31 +02:00
Norman Maurer
99b4aec46d [#4327] Ensure toString() will not throw IllegalReferenceCountException
Motivation:

As toString() is often used while logging we need to ensure this produces no exception.

Modifications:

Ensure we never throw an IllegalReferenceCountException.

Result:

Be able to log without produce exceptions.
2015-10-10 20:12:19 +02:00
Norman Maurer
696a287736 [maven-release-plugin] prepare for next development iteration 2015-09-30 09:31:26 +02:00
Norman Maurer
fb2d562306 [maven-release-plugin] prepare release netty-4.0.32.Final 2015-09-30 09:28:40 +02:00
Norman Maurer
3de8768601 [#3789] Correctly reset markers for all allocations when using PooledByteBufAllocator
Motivation:

We need to ensure all markers are reset when doing an allocation via the PooledByteBufAllocator. This was not the always the case.

Modifications:

Move all logic that needs to get executed when reuse a PooledByteBuf into one place and call it.

Result:

Correct behavior
2015-09-25 19:57:17 +02:00
Norman Maurer
bd928eaa38 [maven-release-plugin] prepare for next development iteration 2015-09-02 08:58:54 +02:00
Norman Maurer
26bbcc38c2 [maven-release-plugin] prepare release netty-4.0.31.Final 2015-09-02 08:57:57 +02:00
Matteo Merli
fd70dd658e Added debug logging with effective value for io.netty.leakDetection.acquireAndReleaseOnly property
Motivation:
The configurable property value recently added was not logged like others properties.

Modifications:
Added debug log with effective value applied.

Result:
Consistent with other properties
2015-09-01 09:10:14 +02:00
Matteo Merli
2d4a8a75bb Additional configuration for leak detection
Motivation:

Leak detector, when it detects a leak, will print the last 5 stack
traces that touched the ByteBuf. In some cases that might not be enough
to identify the root cause of the leak.
Also, sometimes users might not be interested in tracing all the
operations on the buffer, but just the ones that are affecting the
reference count.

Modifications:

Added command line properties to override default values:
 * Allow to configure max number of stack traces to collect
 * Allow to only record retain/release operation on buffers

Result:
Users can increase the number of stack traces to debug buffer leaks
with lot of retain/release operations.
2015-08-30 20:38:35 +02:00
Scott Mitchell
09dff41343 Restore derived buffer index/mark updates
Motivation:
As part of the revert process in https://github.com/netty/netty/pull/4138 some index and mark updates were lost.

Modifications:
- Restore the index / mark updates made in https://github.com/netty/netty/pull/3788

Result:
Slice and Duplicate buffers index / marks are correctly initialized.
2015-08-27 10:24:48 -07:00
Scott Mitchell
33001e84ff Revert "Add PooledSlicedByteBuf and PooledDuplicatedByteBuf"
Motivation:
Currently the "derived" buffer will only ever be recycled if the release call is made on the "derived" object, and the "wrapped" buffer ends up being "fully released" (aka refcount goes to 0). From my experience this is not the common use case and thus the "derived" buffers will not be recycled.

Modifications:
- revert https://github.com/netty/netty/pull/3788

Result:
Less complexity, and less code to create new objects in majority of cases.
2015-08-26 13:23:42 -07:00
Matteo Merli
53f9438aec MemoryRegionCache$Entry objects are not recycled
Motivation:

Even though MemoryRegionCache$Entry instances are allocated through a recycler they are not properly recycled,
leaving a lot of instances to be GCed along with Recycler$DefaultHandle objects.
Fixes #4071

Modification:

Recycle Entry when done using it.

Result:

Less GCed objects.
2015-08-10 21:28:53 +02:00
Norman Maurer
148692705c [maven-release-plugin] prepare for next development iteration 2015-07-24 10:11:44 +02:00
Norman Maurer
11cc2d5197 [maven-release-plugin] prepare release netty-4.0.30.Final 2015-07-24 09:54:20 +02:00
Norman Maurer
0a0292476e [#3896] Unpooled.copiedBuffer(ByteBuffer) and copiedBuffer(ByteBuffer...) is not thread-safe.
Motivation:

As we modify the position of the passed in ByteBuffer's this methods are not thread-safe.

Modifications:

Duplicate the input ByteBuffers before copy the content to  byte[].

Result:

Unpooled.copiedBuffer(ByteBuffer) and copiedBuffer(ByteBuffer...) is now thread-safe.
2015-07-07 08:38:07 +02:00
Norman Maurer
75bb7882bf [#3899] Fix javadoc to use netty 4 API.
Motivation:

The javadoc of ByteBuf contained some out-dated code.

Modifications:

Update code example in javadoc to use netty 4+ API

Result:

Correct javadocs
2015-07-03 14:18:27 +02:00
Louis Ryan
bb0b86ce50 Fix FixedCompositeByteBuf handling when copying to direct buffers and streams
Motivation:

FixedCompositeByteBuf does not properly implement a number of methods for
copying its content to direct buffers and output streams

Modifications:

Replace improper use of capacity() with readableBytes() when computing offesets during writes

Result:

Copying works correctly
2015-06-27 21:21:36 +02:00
Norman Maurer
24de49cb82 Add FixedCompositeByteBuf which can be used to write an array of ByteBuf in an efficient way.
This implementation does not produce as much GC pressure as CompositeByteBuf and so is prefered,
for writing an array of ByteBufs. Be aware that FixedCompositeByteBuf is readonly.

When using this in a project that make heavy use of CompositeByteBuf for writes we was able to cut
down allocation to a half.
2015-06-27 20:57:24 +02:00
Norman Maurer
1da998bc7c [maven-release-plugin] prepare for next development iteration 2015-06-23 11:08:27 +02:00
Norman Maurer
4c482c1215 [maven-release-plugin] prepare release netty-4.0.29.Final 2015-06-23 11:07:56 +02:00
Norman Maurer
1796cfc419 [#3888] Use 2 * cores as default minimum for pool arenas.
Motivation:

At the moment we use 1 * cores as default mimimum for pool arenas. This can easily lead to conditions as we use 2 * cores as default for EventLoop's when using NIO or EPOLL. If we choose a smaller number we will run into hotspots as allocation and deallocation needs to be synchronized on the PoolArena.

Modifications:

Change the default number of arenas to 2 * cores.

Result:

Less conditions when using the default settings.
2015-06-18 07:19:15 +02:00
Norman Maurer
9a30b6860f [#3798] Extract dump method to ByteBufUtil
Motivation:

Dumping the content of a ByteBuf in a hex format is very useful.

Modifications:

Move code into ByteBufUtil so its easy to reuse.

Result:

Easy to reuse dumping code.
2015-06-12 14:05:32 +02:00
Norman Maurer
528415bc29 Update javadocs to highlight that derived buffers will not increment the reference count.
Motivation:

We not explain the derived buffers will not retain the parent buffer.

Modifications:

Add docs.

Result:

Correctly document behaviour
2015-06-06 19:54:03 +02:00
Norman Maurer
7e328fd1c3 Fix regression introduced by f765053ae740e300a6b696840d7dfe5de32afeb3 by use Entry after it is recycled 2015-05-27 16:56:05 +02:00
Norman Maurer
f765053ae7 Let PoolThreadCache work even if allocation and deallocation Thread are different
Motivation:

PoolThreadCache did only cache allocations if the allocation and deallocation Thread were the same. This is not optimal as often people write from differen thread then the actual EventLoop thread.

Modification:

- Add MpscArrayQueue which was forked from jctools and lightly modified.
- Use MpscArrayQueue for caches and always add buffer back to the cache that belongs to the allocation thread.

Result:

ThreadPoolCache is now also usable and so gives performance improvements when allocation and deallocation thread are different.

Performance when using same thread for allocation and deallocation is noticable worse then before.
2015-05-27 14:35:22 +02:00
Norman Maurer
f18990a8a5 [#3654] Synchronize on PoolSubpage head when allocate / free PoolSubpages
Motivation:

Currently we hold a lock on the PoolArena when we allocate / free PoolSubpages, which is wasteful as this also affects "normal" allocations. The same is true vice-verse.

Modifications:

Ensure we synchronize on the head of the PoolSubPages pool. This is done per size and so it is possible to concurrently allocate / deallocate PoolSubPages with different sizes, and also normal allocations.

Result:

Less condition and so faster allocation/deallocation.

Before this commit:
xxx:~/wrk $ ./wrk -H 'Connection: keep-alive' -d 120 -c 256 -t 16 -s scripts/pipeline-many.lua  http://xxx:8080/plaintext
Running 2m test @ http://xxx:8080/plaintext
  16 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    17.61ms   29.52ms 689.73ms   97.27%
    Req/Sec   278.93k    41.97k  351.04k    84.83%
  530527460 requests in 2.00m, 71.64GB read
Requests/sec: 4422226.13
Transfer/sec:    611.52MB

After this commit:
xxx:~/wrk $ ./wrk -H 'Connection: keep-alive' -d 120 -c 256 -t 16 -s scripts/pipeline-many.lua  http://xxx:8080/plaintext
Running 2m test @ http://xxx:8080/plaintext
  16 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    15.85ms   24.50ms 681.61ms   97.42%
    Req/Sec   287.14k    38.39k  360.33k    85.88%
  547902773 requests in 2.00m, 73.99GB read
Requests/sec: 4567066.11
Transfer/sec:    631.55MB

This is reproducable every time.
2015-05-27 10:32:52 +02:00
Norman Maurer
7e80e1bf97 [#3654] No need to hold lock while destroy a chunk
Motiviation:

At the moment we sometimes hold the lock on the PoolArena during destroy a PoolChunk. This is not needed.

Modification:

- Ensure we not hold the lock during destroy a PoolChunk
- Move all synchronized usage in PoolArena
- Cleanup

Result:

Less condition.
2015-05-27 09:47:34 +02:00
Norman Maurer
8fed94eee3 Expose metrics for PooledByteBufAllocator
Motivation:

The PooledByteBufAllocator is more or less a black-box atm. We need to expose some metrics to allow the user to get a better idea how to tune it.

Modifications:

- Expose different metrics via PooledByteBufAllocator
- Add *Metrics interfaces

Result:

It is now easy to gather metrics and detail about the PooledByteBufAllocator and so get a better understanding about resource-usage etc.
2015-05-20 21:01:37 +02:00
Norman Maurer
8a1a0b2c1e Add PooledSlicedByteBuf and PooledDuplicatedByteBuf
Motivation:

At the moment when calling slice(...) or duplicate(...) on a Pooled*ByteBuf a new SlicedByteBuf or DuplicatedByteBuf. This can create a lot of GC.

Modifications:

Add PooledSlicedByteBuf and PooledDuplicatedByteBuf which will be used when a PooledByteBuf is used.

Result:

Less GC.
2015-05-20 11:37:41 +02:00
Norman Maurer
981292ffc0 Clarify ByteBuf.duplicate() semantics.
Motivation:

From the javadocs of ByteBuf.duplicate() it is not clear if the reader and writer marks will be duplicated.

Modifications:

Add sentence to clarify that marks will not be duplicated.

Result:

Clear semantics.
2015-05-20 08:32:43 +02:00
Norman Maurer
979fdfd3d3 Reset markers when obtain PooledByteBuf.
Motivation:

When allocate a PooledByteBuf we need to ensure to also reset the markers for the readerIndex and writerIndex.

Modifications:

- Correct reset the markers
- Add test-case for it

Result:

Correctly reset markers.
2015-05-20 07:29:21 +02:00
Norman Maurer
dc73c15df3 No need to release lock and acquire again when allocate normal size.
Motiviation:

When tried to allocate tiny and small sized and failed to serve these out of the PoolSubPage we exit the synchronization
block just to enter it again when call allocateNormal(...).

Modification:

Not exit the synchronized block until allocateNormal(...) is done.

Result:

Better performance.
2015-05-18 10:23:29 +02:00
Norman Maurer
d1c46ca987 [maven-release-plugin] prepare for next development iteration 2015-05-07 11:33:47 -04:00
Norman Maurer
005d4a42fc [maven-release-plugin] prepare release netty-4.0.28.Final 2015-05-07 11:33:09 -04:00
Norman Maurer
0f4d3a981e Revert "[maven-release-plugin] prepare for next development iteration"
This reverts commit 3c10ffab5ee6b607d45f73123acb59c5c6602b6f.
2015-05-07 11:02:03 -04:00
Norman Maurer
3c10ffab5e [maven-release-plugin] prepare for next development iteration 2015-05-07 09:09:23 -04:00