Commit Graph

494 Commits

Author SHA1 Message Date
Louis Ryan
bb0b86ce50 Fix FixedCompositeByteBuf handling when copying to direct buffers and streams
Motivation:

FixedCompositeByteBuf does not properly implement a number of methods for
copying its content to direct buffers and output streams

Modifications:

Replace improper use of capacity() with readableBytes() when computing offesets during writes

Result:

Copying works correctly
2015-06-27 21:21:36 +02:00
Norman Maurer
24de49cb82 Add FixedCompositeByteBuf which can be used to write an array of ByteBuf in an efficient way.
This implementation does not produce as much GC pressure as CompositeByteBuf and so is prefered,
for writing an array of ByteBufs. Be aware that FixedCompositeByteBuf is readonly.

When using this in a project that make heavy use of CompositeByteBuf for writes we was able to cut
down allocation to a half.
2015-06-27 20:57:24 +02:00
Norman Maurer
1da998bc7c [maven-release-plugin] prepare for next development iteration 2015-06-23 11:08:27 +02:00
Norman Maurer
4c482c1215 [maven-release-plugin] prepare release netty-4.0.29.Final 2015-06-23 11:07:56 +02:00
Norman Maurer
1796cfc419 [#3888] Use 2 * cores as default minimum for pool arenas.
Motivation:

At the moment we use 1 * cores as default mimimum for pool arenas. This can easily lead to conditions as we use 2 * cores as default for EventLoop's when using NIO or EPOLL. If we choose a smaller number we will run into hotspots as allocation and deallocation needs to be synchronized on the PoolArena.

Modifications:

Change the default number of arenas to 2 * cores.

Result:

Less conditions when using the default settings.
2015-06-18 07:19:15 +02:00
Norman Maurer
9a30b6860f [#3798] Extract dump method to ByteBufUtil
Motivation:

Dumping the content of a ByteBuf in a hex format is very useful.

Modifications:

Move code into ByteBufUtil so its easy to reuse.

Result:

Easy to reuse dumping code.
2015-06-12 14:05:32 +02:00
Norman Maurer
528415bc29 Update javadocs to highlight that derived buffers will not increment the reference count.
Motivation:

We not explain the derived buffers will not retain the parent buffer.

Modifications:

Add docs.

Result:

Correctly document behaviour
2015-06-06 19:54:03 +02:00
Norman Maurer
7e328fd1c3 Fix regression introduced by f765053ae7 by use Entry after it is recycled 2015-05-27 16:56:05 +02:00
Norman Maurer
f765053ae7 Let PoolThreadCache work even if allocation and deallocation Thread are different
Motivation:

PoolThreadCache did only cache allocations if the allocation and deallocation Thread were the same. This is not optimal as often people write from differen thread then the actual EventLoop thread.

Modification:

- Add MpscArrayQueue which was forked from jctools and lightly modified.
- Use MpscArrayQueue for caches and always add buffer back to the cache that belongs to the allocation thread.

Result:

ThreadPoolCache is now also usable and so gives performance improvements when allocation and deallocation thread are different.

Performance when using same thread for allocation and deallocation is noticable worse then before.
2015-05-27 14:35:22 +02:00
Norman Maurer
f18990a8a5 [#3654] Synchronize on PoolSubpage head when allocate / free PoolSubpages
Motivation:

Currently we hold a lock on the PoolArena when we allocate / free PoolSubpages, which is wasteful as this also affects "normal" allocations. The same is true vice-verse.

Modifications:

Ensure we synchronize on the head of the PoolSubPages pool. This is done per size and so it is possible to concurrently allocate / deallocate PoolSubPages with different sizes, and also normal allocations.

Result:

Less condition and so faster allocation/deallocation.

Before this commit:
xxx:~/wrk $ ./wrk -H 'Connection: keep-alive' -d 120 -c 256 -t 16 -s scripts/pipeline-many.lua  http://xxx:8080/plaintext
Running 2m test @ http://xxx:8080/plaintext
  16 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    17.61ms   29.52ms 689.73ms   97.27%
    Req/Sec   278.93k    41.97k  351.04k    84.83%
  530527460 requests in 2.00m, 71.64GB read
Requests/sec: 4422226.13
Transfer/sec:    611.52MB

After this commit:
xxx:~/wrk $ ./wrk -H 'Connection: keep-alive' -d 120 -c 256 -t 16 -s scripts/pipeline-many.lua  http://xxx:8080/plaintext
Running 2m test @ http://xxx:8080/plaintext
  16 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    15.85ms   24.50ms 681.61ms   97.42%
    Req/Sec   287.14k    38.39k  360.33k    85.88%
  547902773 requests in 2.00m, 73.99GB read
Requests/sec: 4567066.11
Transfer/sec:    631.55MB

This is reproducable every time.
2015-05-27 10:32:52 +02:00
Norman Maurer
7e80e1bf97 [#3654] No need to hold lock while destroy a chunk
Motiviation:

At the moment we sometimes hold the lock on the PoolArena during destroy a PoolChunk. This is not needed.

Modification:

- Ensure we not hold the lock during destroy a PoolChunk
- Move all synchronized usage in PoolArena
- Cleanup

Result:

Less condition.
2015-05-27 09:47:34 +02:00
Norman Maurer
8fed94eee3 Expose metrics for PooledByteBufAllocator
Motivation:

The PooledByteBufAllocator is more or less a black-box atm. We need to expose some metrics to allow the user to get a better idea how to tune it.

Modifications:

- Expose different metrics via PooledByteBufAllocator
- Add *Metrics interfaces

Result:

It is now easy to gather metrics and detail about the PooledByteBufAllocator and so get a better understanding about resource-usage etc.
2015-05-20 21:01:37 +02:00
Norman Maurer
8a1a0b2c1e Add PooledSlicedByteBuf and PooledDuplicatedByteBuf
Motivation:

At the moment when calling slice(...) or duplicate(...) on a Pooled*ByteBuf a new SlicedByteBuf or DuplicatedByteBuf. This can create a lot of GC.

Modifications:

Add PooledSlicedByteBuf and PooledDuplicatedByteBuf which will be used when a PooledByteBuf is used.

Result:

Less GC.
2015-05-20 11:37:41 +02:00
Norman Maurer
981292ffc0 Clarify ByteBuf.duplicate() semantics.
Motivation:

From the javadocs of ByteBuf.duplicate() it is not clear if the reader and writer marks will be duplicated.

Modifications:

Add sentence to clarify that marks will not be duplicated.

Result:

Clear semantics.
2015-05-20 08:32:43 +02:00
Norman Maurer
979fdfd3d3 Reset markers when obtain PooledByteBuf.
Motivation:

When allocate a PooledByteBuf we need to ensure to also reset the markers for the readerIndex and writerIndex.

Modifications:

- Correct reset the markers
- Add test-case for it

Result:

Correctly reset markers.
2015-05-20 07:29:21 +02:00
Norman Maurer
dc73c15df3 No need to release lock and acquire again when allocate normal size.
Motiviation:

When tried to allocate tiny and small sized and failed to serve these out of the PoolSubPage we exit the synchronization
block just to enter it again when call allocateNormal(...).

Modification:

Not exit the synchronized block until allocateNormal(...) is done.

Result:

Better performance.
2015-05-18 10:23:29 +02:00
Norman Maurer
d1c46ca987 [maven-release-plugin] prepare for next development iteration 2015-05-07 11:33:47 -04:00
Norman Maurer
005d4a42fc [maven-release-plugin] prepare release netty-4.0.28.Final 2015-05-07 11:33:09 -04:00
Norman Maurer
0f4d3a981e Revert "[maven-release-plugin] prepare for next development iteration"
This reverts commit 3c10ffab5e.
2015-05-07 11:02:03 -04:00
Norman Maurer
3c10ffab5e [maven-release-plugin] prepare for next development iteration 2015-05-07 09:09:23 -04:00
Alwayswithme
c727c16707 ByteBufUtil use IndexOfProcessor to find occurrence.
Motivation:
The way of firstIndexOf and lastIndexOf iterating the ByteBuf is similar to forEachByte and forEachByteDesc, but have many range checks.
Modifications:
Use forEachByte and a IndexOfProcessor to find occurrence.
Result:
eliminate range checks
2015-05-07 09:50:07 +02:00
Norman Maurer
f242bc5a1a [#3623] CompositeByteBuf.iterator() should return optimized Iterable
Motivation:

CompositeByteBuf.iterator() currently creates a new ArrayList and fill it with the ByteBufs, which is more expensive then it needs to be.

Modifications:

- Use special Iterator implementation

Result:

Less overhead when calling iterator()
2015-04-20 10:45:28 +02:00
garywu
8843d2b1a1 [#2925] Bug fix for NormalMemoryRegionCache overbooked for PoolThreadCache
Motivation:

When create NormalMemoryRegionCache for PoolThreadCache, we overbooked
cache array size. This means unnecessary overhead for thread local cache
as we will create multi cache enties for each element in cache array.

Modifications:

change:
int arraySize = Math.max(1, max / area.pageSize);
to:
int arraySize = Math.max(1, log2(max / area.pageSize) + 1);

Result:

Now arraySize won't introduce unnecessary overhead.

 Changes to be committed:
	modified:   buffer/src/main/java/io/netty/buffer/PoolThreadCache.java
2015-04-13 09:00:51 +02:00
Norman Maurer
bfb6189f77 Let CompositeByteBuf implement Iterable
Motivation:

CompositeByteBuf has an iterator() method but fails to implement Iterable

Modifications:

Let CompositeByteBuf implement Iterable<ByteBuf>

Result:

Easier usage
2015-04-12 13:38:39 +02:00
Norman Maurer
e4900cbdd3 Revert "Dereference when calling PooledByteBuf.deallocate()"
This reverts commit ddd19f4f21.
2015-04-11 06:44:10 +02:00
Norman Maurer
ddd19f4f21 Dereference when calling PooledByteBuf.deallocate()
Motivation:

We missed to dereference the chunk and tmpNioBuf when calling deallocate(). This means the GC can not collect these as we still hold a reference while have the PooledByteBuf in the recycler stack.

Modifications:

Dereference chunk and tmpNioBuf.

Result:

GC can collect things.
2015-04-10 21:46:43 +02:00
Norman Maurer
30f60ac68a Change PoolThreadCache to use LIFO for better cache performance
Motiviation:

At the moment we use FIFO for the PoolThreadCache which is sub-optimal as this may reduce the changes to have the cached memory actual still in the cpu-cache.

Modification:

- Change to use LIFO as this increase the chance to be able to serve buffers from the cpu-cache

Results:

Faster allocation out of the ThreadLocal cache.

Before the commit:
[xxx wrk]$ ./wrk -H 'Connection: keep-alive' -d 120 -c 256 -t 16 -s scripts/pipeline-many.lua  http://xxx:8080/plaintext
Running 2m test @ http://xxx:8080/plaintext
  16 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    14.69ms   10.06ms 131.43ms   80.10%
    Req/Sec   283.89k    40.37k  433.69k    66.81%
  533859742 requests in 2.00m, 72.09GB read
Requests/sec: 4449510.51
Transfer/sec:    615.29MB

After the commit:
[xxx wrk]$ ./wrk -H 'Connection: keep-alive' -d 120 -c 256 -t 16 -s scripts/pipeline-many.lua  http://xxx:8080/plaintext
Running 2m test @ http://xxx:8080/plaintext
  16 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    16.38ms   26.32ms 734.06ms   97.38%
    Req/Sec   283.86k    39.31k  361.69k    83.38%
  540836511 requests in 2.00m, 73.04GB read
Requests/sec: 4508150.18
Transfer/sec:    623.40MB
2015-04-10 20:57:31 +02:00
Norman Maurer
f2fedbcdef [maven-release-plugin] prepare for next development iteration 2015-03-31 22:06:30 -04:00
Norman Maurer
054e7c5d17 [maven-release-plugin] prepare release netty-4.0.27.Final 2015-03-31 22:05:43 -04:00
Leo Gomes
2cc7466266 Updates the javadoc of Unpooled to remove mention to methods it does not provide
Motivation:

`Unpooled` javadoc's mentioned the generation of hex dump and swapping an integer's byte order,
which are actually provided by `ByteBufUtil`.

Modifications:

 Sentence moved to `ByteBufUtil` javadoc.

Result:

`Unpooled` javadoc is correct.
2015-03-04 12:03:50 +09:00
Norman Maurer
37264bb72b [maven-release-plugin] prepare for next development iteration 2015-03-02 01:31:30 -05:00
Norman Maurer
0dbc96cffd [maven-release-plugin] prepare release netty-4.0.26.Final 2015-03-02 01:30:58 -05:00
Norman Maurer
e99d89c04d [maven-release-plugin] rollback the release of netty-4.0.26.Final 2015-02-28 21:28:06 +01:00
Norman Maurer
b86e2e6ac0 [maven-release-plugin] prepare release netty-4.0.26.Final 2015-02-28 13:55:01 -05:00
Norman Maurer
5f71b6dc54 Ensure CompositeByteBuf.addComponent* handles buffer in consistent way and not causes leaks
Motivation:

At the moment we have two problems:
 - CompositeByteBuf.addComponent(...) will not add the supplied buffer to the CompositeByteBuf if its empty, which means it will not be released on CompositeByteBuf.release() call. This is a problem as a user will expect everything added will be released (the user not know we not added it).
 - CompositeByteBuf.addComponents(...) will either add no buffers if none is readable and so has the same problem as addComponent(...) or directly release the ByteBuf if at least one ByteBuf is readable. Again this gives inconsistent handling and may lead to memory leaks.

Modifications:

 - Always add the buffer to the CompositeByteBuf and so release it on release call.

Result:

Consistent handling and no buffer leaks.
2015-02-12 16:09:24 +01:00
Trustin Lee
0e61aeb849 [maven-release-plugin] prepare for next development iteration 2014-12-31 20:58:44 +09:00
Trustin Lee
087db82e78 [maven-release-plugin] prepare release netty-4.0.25.Final 2014-12-31 20:58:33 +09:00
Trustin Lee
0b8f47da04 Implement internal memory access methods of CompositeByteBuf correctly
Motivation:

When a CompositeByteBuf is empty (i.e. has no component), its internal
memory access operations do not always behave as expected.

Modifications:

Check if the nunmber of components is zero. If so, return an empty
array or an empty NIO buffer, etc.

Result:

More robustness
2014-12-30 15:52:57 +09:00
Trustin Lee
38fae09f23 Add more tests to EmptyByteBufTest
- Ensure an EmptyByteBuf has an array, an NIO buffer, and a memory
  address at the same time
- Add an assertion that checks if EMPTY_BUFFER is an EmptyByteBuf,
  just in case we make a mistake in the future
2014-12-30 15:50:49 +09:00
Norman Maurer
61a5e60513 Provide helper methods in ByteBufUtil to write UTF-8/ASCII CharSequences. Related to [#909]
Motivation:

We expose no methods in ByteBuf to directly write a CharSequence into it. This leads to have the user either convert the CharSequence first to a byte array or use CharsetEncoder. Both cases have some overheads and we can do a lot better for well known Charsets like UTF-8 and ASCII.

Modifications:

Add ByteBufUtil.writeAscii(...) and ByteBufUtil.writeUtf8(...) which can do the task in an optimized way. This is especially true if the passed in ByteBuf extends AbstractByteBuf which is true for all of our implementations which not wrap another ByteBuf.

Result:

Writing an ASCII and UTF-8 CharSequence into a AbstractByteBuf is a lot faster then what the user could do by himself as we can make use of some package private methods and so eliminate reference and range checks. When the Charseq is not ASCII or UTF-8 we can still do a very good job and are on par in most of the cases with what the user would do.

The following benchmark shows the improvements:

Result: 2456866.966 ?(99.9%) 59066.370 ops/s [Average]
  Statistics: (min, avg, max) = (2297025.189, 2456866.966, 2586003.225), stdev = 78851.914
  Confidence interval (99.9%): [2397800.596, 2515933.336]

Benchmark                                                        Mode   Samples        Score  Score error    Units
i.n.m.b.ByteBufUtilBenchmark.writeAscii                         thrpt        50  9398165.238   131503.098    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiString                   thrpt        50  9695177.968   176684.821    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiStringViaArray           thrpt        50  4788597.415    83181.549    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiStringViaArrayWrapped    thrpt        50  4722297.435    98984.491    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiStringWrapped            thrpt        50  4028689.762    66192.505    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiViaArray                 thrpt        50  3234841.565    91308.009    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiViaArrayWrapped          thrpt        50  3311387.474    39018.933    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiWrapped                  thrpt        50  3379764.250    66735.415    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8                          thrpt        50  5671116.821   101760.081    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8String                    thrpt        50  5682733.440   111874.084    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8StringViaArray            thrpt        50  3564548.995    55709.512    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8StringViaArrayWrapped     thrpt        50  3621053.671    47632.820    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8StringWrapped             thrpt        50  2634029.071    52304.876    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8ViaArray                  thrpt        50  3397049.332    57784.119    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8ViaArrayWrapped           thrpt        50  3318685.262    35869.562    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8Wrapped                   thrpt        50  2473791.249    46423.114    ops/s
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1,387.417 sec - in io.netty.microbench.buffer.ByteBufUtilBenchmark

Results :

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

Results :

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

The *ViaArray* benchmarks are basically doing a toString().getBytes(Charset) which the others are using ByteBufUtil.write*(...).
2014-12-26 15:57:59 +09:00
Norman Maurer
93237d435f CompositeByteBuf.nioBuffers(...) must not return an empty ByteBuffer array
Motivation:

CompositeByteBuf.nioBuffers(...) returns an empty ByteBuffer array if the specified length is 0. This is not consistent with other ByteBuf implementations which return an ByteBuffer array of size 1 with an empty ByteBuffer included.

Modifications:

Make CompositeByteBuf.nioBuffers(...) consistent with other ByteBuf implementations.

Result:

Consistent and correct behaviour of nioBufffers(...)
2014-12-22 11:17:53 +01:00
Norman Maurer
fa079baf29 Always return SliceByteBuf on slice(...) to eliminate possible leak
Motivation:

When calling slice(...) on a ByteBuf the returned ByteBuf should be the slice of a ByteBuf and shares it's reference count. This is important as it is perfect legal to use buf.slice(...).release() and have both, the slice and the original ByteBuf released. At the moment this is only the case if the requested slice size is > 0. This makes the behavior inconsistent and so may lead to a memory leak.

Modifications:

- Never return Unpooled.EMPTY_BUFFER when calling slice(...).
- Adding test case for buffer.slice(...).release() and buffer.duplicate(...).release()

Result:

Consistent behaviour and so no more leaks possible.
2014-12-22 11:15:28 +01:00
Norman Maurer
d62932c7de Ensure buffer is not released when call array() / memoryAddress()
Motivation:

Before we missed to check if a buffer was released before we return the backing byte array or memoryaddress. This could lead to JVM crashes when someone tried various bulk operations on the Unsafe*ByteBuf implementations.

Modifications:

Always check if the buffer is released before all to return the byte array and memoryaddress.

Result:

No more JVM crashes because of released buffers when doing bulk operations on Unsafe*ByteBuf implementations.
2014-12-11 11:30:10 +01:00
Idel Pivnitskiy
3d200085a4 Small performance improvements
Motivation:

Found performance issues via FindBugs and PMD.

Modifications:

- Removed unnecessary boxing/unboxing operations in DefaultTextHeaders.convertToInt(CharSequence) and DefaultTextHeaders.convertToLong(CharSequence). A boxed primitive is created from a string, just to extract the unboxed primitive value.
- Added a static modifier for DefaultHttp2Connection.ParentChangedEvent class. This class is an inner class, but does not use its embedded reference to the object which created it. This reference makes the instances of the class larger, and may keep the reference to the creator object alive longer than necessary.
- Added a static compiled Pattern to avoid compile it each time it is used when we need to replace some part of authority.
- Improved using of StringBuilders.

Result:

Performance improvements.
2014-11-20 00:58:35 -05:00
Norman Maurer
1914b77c71 [maven-release-plugin] prepare for next development iteration 2014-10-29 11:48:40 +01:00
Norman Maurer
c170e7df3f [maven-release-plugin] prepare release netty-4.0.24.Final 2014-10-29 11:47:19 +01:00
Norman Maurer
cb85ed9d66 Disable caching of PooledByteBuf for different threads.
Motivation:

We introduced a PoolThreadCache which is used in our PooledByteBufAllocator to reduce the synchronization overhead on PoolArenas when allocate / deallocate PooledByteBuf instances. This cache is used for both the allocation path and deallocation path by:
  - Look for cached memory in the PoolThreadCache for the Thread that tries to allocate a new PooledByteBuf and if one is found return it.
  - Add the memory that is used by a PooledByteBuf to the PoolThreadCache of the Thread that release the PooledByteBuf

This works out very well when all allocation / deallocation is done in the EventLoop as the EventLoop will be used for read and write. On the otherside this can lead to surprising side-effects if the user allocate from outside the EventLoop and and pass the ByteBuf over for writing. The problem here is that the memory will be added to the PoolThreadCache that did the actual write on the underlying transport and not on the Thread that previously allocated the buffer.

Modifications:

Don't cache if different Threads are used for allocating/deallocating

Result:

Less confusing behavior for users that allocate PooledByteBufs from outside the EventLoop.
2014-09-22 13:38:35 +02:00
Norman Maurer
687d3d3b5c [#2924] Correctly update head in MemoryRegionCache.trim()
Motivation:
When MemoryRegionCache.trim() is called, some unused cache entries will be freed (started from head). However, in MeoryRegionCache.trim() the head is not updated, which make entry list's head point to an entry whose chunk is null now and following allocate of MeoryRegionCache will return false immediately.

In other word, cache is no longer usable once trim happen.

Modifications:

Update head to correct idx after free entries in trim().

Result:

MemoryRegionCache behaves correctly even after calling trim().
2014-09-22 10:56:17 +02:00
Norman Maurer
0bab6116f3 [#2843] Add test-case to show correct behavior of ByteBuf.refCnt() and ByteBuf.release(...)
Motivation:

We received a bug-report that the ByteBuf.refCnt() does sometimes not show the correct value when release() and refCnt() is called from different Threads.

Modifications:

Add test-case which shows that all is working like expected

Result:

Test-case added which shows everything is ok.
2014-09-01 08:48:15 +02:00
Trustin Lee
7710e7da44 [maven-release-plugin] prepare for next development iteration 2014-08-16 03:02:02 +09:00