Commit Graph

366 Commits

Author SHA1 Message Date
Norman Maurer
7d4c077492 Add *UnsafeHeapByteBuf for improve performance on systems with sun.misc.Unsafe
Motivation:

sun.misc.Unsafe allows us to handle heap ByteBuf in a more efficient matter. We should use special ByteBuf implementation when sun.misc.Unsafe can be used to increase performance.

Modifications:

- Add PooledUnsafeHeapByteBuf and UnpooledUnsafeHeapByteBuf that are used when sun.misc.Unsafe is ready to use.
- Add UnsafeHeapSwappedByteBuf

Result:

Better performance when using heap buffers and sun.misc.Unsafe is ready to use.
2015-10-21 09:04:13 +02:00
Norman Maurer
f30a51b905 Correctly handle byte shifting if system does not support unaligned access.
Motivation:

We had a bug in our implemention which double "reversed" bytes on systems which not support unaligned access.

Modifications:

- Correctly only reverse bytes if needed.
- Share code between unsafe implementations.

Result:

No more data-corruption on sytems without unaligned access.
2015-10-20 17:32:13 +02:00
Matteo Merli
27c68647df In (Pooled|Unpooled)UnsafeDirectByteBuf copy memory directly to and from ByteBuffer
Motivation:

When moving bytes between a PooledUnsafeDirectByteBuf or an UnpooledUnsafeDirectByteBuf
and a ByteBuffer, a temp ByteBuffer is allocated and will need to be GCed. This is a
common case since a ByteBuffer is always needed when reading/writing on a file,
for example.

Modifications:

Use PlatformDependent.copyMemory() to avoid the need for the temp ByteBuffer

Result:

No temp ByteBuffer allocated and GCed.
2015-10-19 22:24:07 +02:00
Norman Maurer
5a6238ed4c Minimize reference count checks in SlicedByteBuf
Motivation:

SlicedByteBuf did double reference count checking for various bulk operations, which affects performance.

Modifications:

- Add package private method to AbstractByteBuf that can be used to check indexes without check the reference count
- Use this new method in the bulk operation os SlicedByteBuf as the reference count checks take place on the wrapped buffer anyway
- Fix test-case to not try to read data that is out of the bounds of the buffer.

Result:

Better performance on bulk operations when using SlicedByteBuf (and sub-classes)
2015-10-16 21:09:03 +02:00
Norman Maurer
8f13e333dd Always return a real slice even when the length is 0
Motivation:

We need to always return a real slice even when the requested length is 0. This is needed as otherwise we not correctly share the reference count and so may leak a buffer if the user call release() on the returned slice and expect it to decrement the reference count of the "parent" buffer.

Modifications:

- Always return a real slice
- Add unit test for the bug.

Result:

No more leak possible when a user requests a slice of length 0 of a SlicedByteBuf.
2015-10-16 20:46:05 +02:00
Norman Maurer
4c287d4e27 Added SlicedAbstractByteBuf that can provide fast-path for _get* and _set* methods
Motivation:

SlicedByteBuf can be used for any ByteBuf implementations and so can not do any optimizations that could be done
when AbstractByteBuf is sliced.

Modifications:

- Add SlicedAbstractByteBuf that can eliminate range and reference count checks for _get* and _set* methods.

Result:

Faster SlicedByteBuf implementations for AbstractByteBuf sub-classes.
2015-10-16 09:12:20 +02:00
Norman Maurer
8c93f4b1ef Added DuplicatedAbstractByteBuf that can provide fast-path for _get* and _set* methods
Motivation:

DuplicatedByteBuf can be used for any ByteBuf implementations and so can not do any optimizations that could be done
when AbstractByteBuf is duplicated.

Modifications:

- Add DuplicatedAbstractByteBuf that can eliminate range and reference count checks for _get* and _set* methods.

Result:

Faster DuplicatedByteBuf implementations for AbstractByteBuf sub-classes.
2015-10-16 08:56:35 +02:00
Norman Maurer
2aef4a504f Minimize object allocation when calling AbstractByteBuf.toString(..., Charset)
Motivation:

Calling AbstractByteBuf.toString(..., Charset) is used quite frequently by users but produce a lot of GC.

Modification:

- Use a FastThreadLocal to store the CharBuffer that are needed for decoding.
- Use internalNioBuffer(...) when possible

Result:

Less object creation / Less GC
2015-10-15 17:51:57 +02:00
Norman Maurer
9697afc106 Allow to disable reference count checks on every access of the ByteBuf
Motiviation:

Checking reference count on every access on a ByteBuf can have some big performance overhead depending on how the access pattern is. If the user is sure that there are no reference count errors on his side it should be possible to disable the check and so gain the max performance.

Modification:

- Add io.netty.buffer.bytebuf.checkAccessible system property which allows to disable the checks. Enabled by default.
- Add microbenchmark

Result:

Increased performance for operations on the ByteBuf.
2015-10-15 10:21:16 +02:00
Norman Maurer
ffe7aafd82 Optimize and minimize bound checks
Motivation:

We should minimize and optimize bound checks as much as possible to get the most out of performance.

Modifications:

- Use bitwise operations to remove branching
- Remove branches when possible

Result:

Better performance for various operations.
2015-10-15 10:18:13 +02:00
Norman Maurer
30dc1c1fa4 [#4313] ByteBufUtil.writeUtf8 should use fast-path for WrappedByteBuf
Motivation:

ByteBufUtil.writeUtf8(...) / writeUsAscii(...) can use a fast-path when writing into AbstractByteBuf. We should try to unwrap WrappedByteBuf implementations so
we are able to do the same on wrapped AbstractByteBuf instances.

Modifications:

- Try to unwrap WrappedByteBuf to use the fast-path

Result:

Faster writing of utf8 and usascii for WrappedByteBuf instances.
2015-10-13 12:00:37 +02:00
Norman Maurer
99b11c95b4 [#4327] Ensure toString() will not throw IllegalReferenceCountException
Motivation:

As toString() is often used while logging we need to ensure this produces no exception.

Modifications:

Ensure we never throw an IllegalReferenceCountException.

Result:

Be able to log without produce exceptions.
2015-10-10 20:12:43 +02:00
Norman Maurer
956a757d37 [#3789] Correctly reset markers for all allocations when using PooledByteBufAllocator
Motivation:

We need to ensure all markers are reset when doing an allocation via the PooledByteBufAllocator. This was not the always the case.

Modifications:

Move all logic that needs to get executed when reuse a PooledByteBuf into one place and call it.

Result:

Correct behavior
2015-09-25 19:57:33 +02:00
Norman Maurer
cac51ab8d6 Optimize ByteBufUtil.writeUsAscii(...) when AsciiString is used.
Motivation:

When AsciiString is used we can optimize the write operation done by ByteBufUtil.writeUsAscii(...)

Modifications:

Sepcial handle AsciiString.

Result:

Faster writing of AsciiString.
2015-09-15 12:26:58 +02:00
Matteo Merli
7a9a3159f9 Added debug logging with effective value for io.netty.leakDetection.acquireAndReleaseOnly property
Motivation:
The configurable property value recently added was not logged like others properties.

Modifications:
Added debug log with effective value applied.

Result:
Consistent with other properties
2015-09-01 09:10:38 +02:00
Matteo Merli
9b45e9d015 Additional configuration for leak detection
Motivation:

Leak detector, when it detects a leak, will print the last 5 stack
traces that touched the ByteBuf. In some cases that might not be enough
to identify the root cause of the leak.
Also, sometimes users might not be interested in tracing all the
operations on the buffer, but just the ones that are affecting the
reference count.

Modifications:

Added command line properties to override default values:
 * Allow to configure max number of stack traces to collect
 * Allow to only record retain/release operation on buffers

Result:
Users can increase the number of stack traces to debug buffer leaks
with lot of retain/release operations.
2015-08-30 20:50:10 +02:00
Scott Mitchell
4bdd8dacb9 Restore derived buffer index/mark updates
Motivation:
As part of the revert process in https://github.com/netty/netty/pull/4138 some index and mark updates were lost.

Modifications:
- Restore the index / mark updates made in https://github.com/netty/netty/pull/3788

Result:
Slice and Duplicate buffers index / marks are correctly initialized.
2015-08-27 10:25:15 -07:00
Scott Mitchell
e280251b15 Revert "Add PooledSlicedByteBuf and PooledDuplicatedByteBuf"
Motivation:
Currently the "derived" buffer will only ever be recycled if the release call is made on the "derived" object, and the "wrapped" buffer ends up being "fully released" (aka refcount goes to 0). From my experience this is not the common use case and thus the "derived" buffers will not be recycled.

Modifications:
- revert https://github.com/netty/netty/pull/3788

Result:
Less complexity, and less code to create new objects in majority of cases.
2015-08-26 13:24:44 -07:00
Matteo Merli
0be53f296f MemoryRegionCache$Entry objects are not recycled
Motivation:

Even though MemoryRegionCache$Entry instances are allocated through a recycler they are not properly recycled,
leaving a lot of instances to be GCed along with Recycler$DefaultHandle objects.
Fixes #4071

Modification:

Recycle Entry when done using it.

Result:

Less GCed objects.
2015-08-10 21:29:25 +02:00
Scott Mitchell
cf171ff525 maxBytesPerRead channel configuration
Motiviation:
The current read loops don't fascilitate reading a maximum amount of bytes. This capability is useful to have more fine grain control over how much data is injested.

Modifications:
- Add a setMaxBytesPerRead(int) and getMaxBytesPerRead() to ChannelConfig
- Add a setMaxBytesPerIndividualRead(int) and getMaxBytesPerIndividualRead to ChannelConfig
- Add methods to RecvByteBufAllocator so that a pluggable scheme can be used to control the behavior of the read loop.
- Modify read loop for all transport types to respect the new RecvByteBufAllocator API

Result:
The ability to control how many bytes are read for each read operation/loop, and a more extensible read loop.
2015-08-05 23:59:54 -07:00
Norman Maurer
ae163d687d [#3896] Unpooled.copiedBuffer(ByteBuffer) and copiedBuffer(ByteBuffer...) is not thread-safe.
Motivation:

As we modify the position of the passed in ByteBuffer's this methods are not thread-safe.

Modifications:

Duplicate the input ByteBuffers before copy the content to  byte[].

Result:

Unpooled.copiedBuffer(ByteBuffer) and copiedBuffer(ByteBuffer...) is now thread-safe.
2015-07-07 08:38:37 +02:00
Norman Maurer
8c90d602d7 [#3899] Fix javadoc to use netty 4 API.
Motivation:

The javadoc of ByteBuf contained some out-dated code.

Modifications:

Update code example in javadoc to use netty 4+ API

Result:

Correct javadocs
2015-07-03 14:18:53 +02:00
Louis Ryan
ba6319eb6c Fix FixedCompositeByteBuf handling when copying to direct buffers and streams
Motivation:

FixedCompositeByteBuf does not properly implement a number of methods for
copying its content to direct buffers and output streams

Modifications:

Replace improper use of capacity() with readableBytes() when computing offesets during writes

Result:

Copying works correctly
2015-06-27 20:39:31 +02:00
Norman Maurer
e0ef01cf93 [#3888] Use 2 * cores as default minimum for pool arenas.
Motivation:

At the moment we use 1 * cores as default mimimum for pool arenas. This can easily lead to conditions as we use 2 * cores as default for EventLoop's when using NIO or EPOLL. If we choose a smaller number we will run into hotspots as allocation and deallocation needs to be synchronized on the PoolArena.

Modifications:

Change the default number of arenas to 2 * cores.

Result:

Less conditions when using the default settings.
2015-06-18 07:27:30 +02:00
Norman Maurer
4570f30dd9 [#3798] Extract dump method to ByteBufUtil
Motivation:

Dumping the content of a ByteBuf in a hex format is very useful.

Modifications:

Move code into ByteBufUtil so its easy to reuse.

Result:

Easy to reuse dumping code.
2015-06-09 06:21:09 +02:00
Norman Maurer
6f9eb2cd34 Update javadocs to highlight that derived buffers will not increment the reference count.
Motivation:

We not explain the derived buffers will not retain the parent buffer.

Modifications:

Add docs.

Result:

Correctly document behaviour
2015-06-06 19:54:38 +02:00
Norman Maurer
9f5a3e553c Fix regression introduced by f765053ae7 by use Entry after it is recycled 2015-05-27 16:56:20 +02:00
Norman Maurer
81fee66c78 Let PoolThreadCache work even if allocation and deallocation Thread are different
Motivation:

PoolThreadCache did only cache allocations if the allocation and deallocation Thread were the same. This is not optimal as often people write from differen thread then the actual EventLoop thread.

Modification:

- Add MpscArrayQueue which was forked from jctools and lightly modified.
- Use MpscArrayQueue for caches and always add buffer back to the cache that belongs to the allocation thread.

Result:

ThreadPoolCache is now also usable and so gives performance improvements when allocation and deallocation thread are different.

Performance when using same thread for allocation and deallocation is noticable worse then before.
2015-05-27 14:38:11 +02:00
Norman Maurer
ab1bb9136b [#3654] Synchronize on PoolSubpage head when allocate / free PoolSubpages
Motivation:

Currently we hold a lock on the PoolArena when we allocate / free PoolSubpages, which is wasteful as this also affects "normal" allocations. The same is true vice-verse.

Modifications:

Ensure we synchronize on the head of the PoolSubPages pool. This is done per size and so it is possible to concurrently allocate / deallocate PoolSubPages with different sizes, and also normal allocations.

Result:

Less condition and so faster allocation/deallocation.

Before this commit:
xxx:~/wrk $ ./wrk -H 'Connection: keep-alive' -d 120 -c 256 -t 16 -s scripts/pipeline-many.lua  http://xxx:8080/plaintext
Running 2m test @ http://xxx:8080/plaintext
  16 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    17.61ms   29.52ms 689.73ms   97.27%
    Req/Sec   278.93k    41.97k  351.04k    84.83%
  530527460 requests in 2.00m, 71.64GB read
Requests/sec: 4422226.13
Transfer/sec:    611.52MB

After this commit:
xxx:~/wrk $ ./wrk -H 'Connection: keep-alive' -d 120 -c 256 -t 16 -s scripts/pipeline-many.lua  http://xxx:8080/plaintext
Running 2m test @ http://xxx:8080/plaintext
  16 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    15.85ms   24.50ms 681.61ms   97.42%
    Req/Sec   287.14k    38.39k  360.33k    85.88%
  547902773 requests in 2.00m, 73.99GB read
Requests/sec: 4567066.11
Transfer/sec:    631.55MB

This is reproducable every time.
2015-05-27 10:33:12 +02:00
Norman Maurer
dce0dd9b78 [#3654] No need to hold lock while destroy a chunk
Motiviation:

At the moment we sometimes hold the lock on the PoolArena during destroy a PoolChunk. This is not needed.

Modification:

- Ensure we not hold the lock during destroy a PoolChunk
- Move all synchronized usage in PoolArena
- Cleanup

Result:

Less condition.
2015-05-27 09:47:53 +02:00
Norman Maurer
271af7c624 Expose metrics for PooledByteBufAllocator
Motivation:

The PooledByteBufAllocator is more or less a black-box atm. We need to expose some metrics to allow the user to get a better idea how to tune it.

Modifications:

- Expose different metrics via PooledByteBufAllocator
- Add *Metrics interfaces

Result:

It is now easy to gather metrics and detail about the PooledByteBufAllocator and so get a better understanding about resource-usage etc.
2015-05-20 21:06:17 +02:00
Norman Maurer
88b8558ec8 Add PooledSlicedByteBuf and PooledDuplicatedByteBuf
Motivation:

At the moment when calling slice(...) or duplicate(...) on a Pooled*ByteBuf a new SlicedByteBuf or DuplicatedByteBuf. This can create a lot of GC.

Modifications:

Add PooledSlicedByteBuf and PooledDuplicatedByteBuf which will be used when a PooledByteBuf is used.

Result:

Less GC.
2015-05-20 11:37:50 +02:00
Norman Maurer
33e443e71a Clarify ByteBuf.duplicate() semantics.
Motivation:

From the javadocs of ByteBuf.duplicate() it is not clear if the reader and writer marks will be duplicated.

Modifications:

Add sentence to clarify that marks will not be duplicated.

Result:

Clear semantics.
2015-05-20 08:32:52 +02:00
Norman Maurer
9d568586db Reset markers when obtain PooledByteBuf.
Motivation:

When allocate a PooledByteBuf we need to ensure to also reset the markers for the readerIndex and writerIndex.

Modifications:

- Correct reset the markers
- Add test-case for it

Result:

Correctly reset markers.
2015-05-20 07:29:32 +02:00
Norman Maurer
92bfeeca1b No need to release lock and acquire again when allocate normal size.
Motiviation:

When tried to allocate tiny and small sized and failed to serve these out of the PoolSubPage we exit the synchronization
block just to enter it again when call allocateNormal(...).

Modification:

Not exit the synchronized block until allocateNormal(...) is done.

Result:

Better performance.
2015-05-20 07:29:32 +02:00
Alwayswithme
79c17cf1fd ByteBufUtil use IndexOfProcessor to find occurrence.
Motivation:
The way of firstIndexOf and lastIndexOf iterating the ByteBuf is similar to forEachByte and forEachByteDesc, but have many range checks.
Modifications:
Use forEachByte and a IndexOfProcessor to find occurrence.
Result:
eliminate range checks
2015-05-07 09:36:47 +02:00
Scott Mitchell
f812180c2d ByteString arrayOffset method
Motivation:
The ByteString class currently assumes the underlying array will be a complete representation of data. This is limiting as it does not allow a subsection of another array to be used. The forces copy operations to take place to compensate for the lack of API support.

Modifications:
- add arrayOffset method to ByteString
- modify all ByteString and AsciiString methods that loop over or index into the underlying array to use this offset
- update all code that uses ByteString.array to ensure it accounts for the offset
- add unit tests to test the implementation respects the offset

Result:
ByteString and AsciiString can represent a sub region of a byte[].
2015-04-24 18:54:01 -07:00
Norman Maurer
8b1f247a1a [#3623] CompositeByteBuf.iterator() should return optimized Iterable
Motivation:

CompositeByteBuf.iterator() currently creates a new ArrayList and fill it with the ByteBufs, which is more expensive then it needs to be.

Modifications:

- Use special Iterator implementation

Result:

Less overhead when calling iterator()
2015-04-20 10:45:37 +02:00
Scott Mitchell
9a7a85dbe5 ByteString introduced as AsciiString super class
Motivation:
The usage and code within AsciiString has exceeded the original design scope for this class. Its usage as a binary string is confusing and on the verge of violating interface assumptions in some spots.

Modifications:
- ByteString will be created as a base class to AsciiString. All of the generic byte handling processing will live in ByteString and all the special character encoding will live in AsciiString.

Results:
The AsciiString interface will be clarified. Users of AsciiString can now be clear of the limitations the class imposes while users of the ByteString class don't have to live with those limitations.
2015-04-14 16:35:17 -07:00
garywu
4d02c3a040 [#2925] Bug fix for NormalMemoryRegionCache overbooked for PoolThreadCache
Motivation:

When create NormalMemoryRegionCache for PoolThreadCache, we overbooked
cache array size. This means unnecessary overhead for thread local cache
as we will create multi cache enties for each element in cache array.

Modifications:

change:
int arraySize = Math.max(1, max / area.pageSize);
to:
int arraySize = Math.max(1, log2(max / area.pageSize) + 1);

Result:

Now arraySize won't introduce unnecessary overhead.

 Changes to be committed:
	modified:   buffer/src/main/java/io/netty/buffer/PoolThreadCache.java
2015-04-13 09:02:10 +02:00
Norman Maurer
18627749a9 Let CompositeByteBuf implement Iterable
Motivation:

CompositeByteBuf has an iterator() method but fails to implement Iterable

Modifications:

Let CompositeByteBuf implement Iterable<ByteBuf>

Result:

Easier usage
2015-04-12 13:38:27 +02:00
Norman Maurer
d8e5d421e1 Revert "Dereference when calling PooledByteBuf.deallocate()"
This reverts commit 7094c7b797.
2015-04-11 06:44:32 +02:00
Norman Maurer
7094c7b797 Dereference when calling PooledByteBuf.deallocate()
Motivation:

We missed to dereference the chunk and tmpNioBuf when calling deallocate(). This means the GC can not collect these as we still hold a reference while have the PooledByteBuf in the recycler stack.

Modifications:

Dereference chunk and tmpNioBuf.

Result:

GC can collect things.
2015-04-10 21:47:14 +02:00
Norman Maurer
3e42292d8b Change PoolThreadCache to use LIFO for better cache performance
Motiviation:

At the moment we use FIFO for the PoolThreadCache which is sub-optimal as this may reduce the changes to have the cached memory actual still in the cpu-cache.

Modification:

- Change to use LIFO as this increase the chance to be able to serve buffers from the cpu-cache

Results:

Faster allocation out of the ThreadLocal cache.

Before the commit:
[xxx wrk]$ ./wrk -H 'Connection: keep-alive' -d 120 -c 256 -t 16 -s scripts/pipeline-many.lua  http://xxx:8080/plaintext
Running 2m test @ http://xxx:8080/plaintext
  16 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    14.69ms   10.06ms 131.43ms   80.10%
    Req/Sec   283.89k    40.37k  433.69k    66.81%
  533859742 requests in 2.00m, 72.09GB read
Requests/sec: 4449510.51
Transfer/sec:    615.29MB

After the commit:
[xxx wrk]$ ./wrk -H 'Connection: keep-alive' -d 120 -c 256 -t 16 -s scripts/pipeline-many.lua  http://xxx:8080/plaintext
Running 2m test @ http://xxx:8080/plaintext
  16 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    16.38ms   26.32ms 734.06ms   97.38%
    Req/Sec   283.86k    39.31k  361.69k    83.38%
  540836511 requests in 2.00m, 73.04GB read
Requests/sec: 4508150.18
Transfer/sec:    623.40MB
2015-04-10 20:57:54 +02:00
Scott Mitchell
0d3a6e0511 HTTP/2 Decoder reduce preface conditional checks
Motivation:
The DefaultHttp2ConnectionDecoder class is calling verifyPrefaceReceived() for almost every frame event at all times.
The Http2ConnectionHandler class is calling readClientPrefaceString() on every decode event.

Modifications:
- DefaultHttp2ConnectionDecoder should not have to continuously call verifyPrefaceReceived() because it transitions boolean state 1 time for each connection.
- Http2ConnectionHandler should not have to continuously call readClientPrefaceString() because it transitions boolean state 1 time for each connection.

Result:
- Less conditional checks for the mainstream usage of the connection.
2015-03-28 18:52:35 -07:00
Leo Gomes
4500adb6e0 Updates the javadoc of Unpooled to remove mention to methods it does not provide
Motivation:

`Unpooled` javadoc's mentioned the generation of hex dump and swapping an integer's byte order,
which are actually provided by `ByteBufUtil`.

Modifications:

 Sentence moved to `ByteBufUtil` javadoc.

Result:

`Unpooled` javadoc is correct.
2015-03-04 12:04:14 +09:00
Norman Maurer
41fd857a7c Ensure CompositeByteBuf.addComponent* handles buffer in consistent way and not causes leaks
Motivation:

At the moment we have two problems:
 - CompositeByteBuf.addComponent(...) will not add the supplied buffer to the CompositeByteBuf if its empty, which means it will not be released on CompositeByteBuf.release() call. This is a problem as a user will expect everything added will be released (the user not know we not added it).
 - CompositeByteBuf.addComponents(...) will either add no buffers if none is readable and so has the same problem as addComponent(...) or directly release the ByteBuf if at least one ByteBuf is readable. Again this gives inconsistent handling and may lead to memory leaks.

Modifications:

 - Always add the buffer to the CompositeByteBuf and so release it on release call.

Result:

Consistent handling and no buffer leaks.
2015-02-12 16:09:41 +01:00
Trustin Lee
155c0e2f36 Implement internal memory access methods of CompositeByteBuf correctly
Motivation:

When a CompositeByteBuf is empty (i.e. has no component), its internal
memory access operations do not always behave as expected.

Modifications:

Check if the nunmber of components is zero. If so, return an empty
array or an empty NIO buffer, etc.

Result:

More robustness
2014-12-30 15:56:53 +09:00
Trustin Lee
a666acce6d Add more tests to EmptyByteBufTest
- Ensure an EmptyByteBuf has an array, an NIO buffer, and a memory
  address at the same time
- Add an assertion that checks if EMPTY_BUFFER is an EmptyByteBuf,
  just in case we make a mistake in the future
2014-12-30 15:51:45 +09:00
Norman Maurer
fe796fc8ab Provide helper methods in ByteBufUtil to write UTF-8/ASCII CharSequences. Related to [#909]
Motivation:

We expose no methods in ByteBuf to directly write a CharSequence into it. This leads to have the user either convert the CharSequence first to a byte array or use CharsetEncoder. Both cases have some overheads and we can do a lot better for well known Charsets like UTF-8 and ASCII.

Modifications:

Add ByteBufUtil.writeAscii(...) and ByteBufUtil.writeUtf8(...) which can do the task in an optimized way. This is especially true if the passed in ByteBuf extends AbstractByteBuf which is true for all of our implementations which not wrap another ByteBuf.

Result:

Writing an ASCII and UTF-8 CharSequence into a AbstractByteBuf is a lot faster then what the user could do by himself as we can make use of some package private methods and so eliminate reference and range checks. When the Charseq is not ASCII or UTF-8 we can still do a very good job and are on par in most of the cases with what the user would do.

The following benchmark shows the improvements:

Result: 2456866.966 ?(99.9%) 59066.370 ops/s [Average]
  Statistics: (min, avg, max) = (2297025.189, 2456866.966, 2586003.225), stdev = 78851.914
  Confidence interval (99.9%): [2397800.596, 2515933.336]

Benchmark                                                        Mode   Samples        Score  Score error    Units
i.n.m.b.ByteBufUtilBenchmark.writeAscii                         thrpt        50  9398165.238   131503.098    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiString                   thrpt        50  9695177.968   176684.821    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiStringViaArray           thrpt        50  4788597.415    83181.549    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiStringViaArrayWrapped    thrpt        50  4722297.435    98984.491    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiStringWrapped            thrpt        50  4028689.762    66192.505    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiViaArray                 thrpt        50  3234841.565    91308.009    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiViaArrayWrapped          thrpt        50  3311387.474    39018.933    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeAsciiWrapped                  thrpt        50  3379764.250    66735.415    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8                          thrpt        50  5671116.821   101760.081    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8String                    thrpt        50  5682733.440   111874.084    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8StringViaArray            thrpt        50  3564548.995    55709.512    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8StringViaArrayWrapped     thrpt        50  3621053.671    47632.820    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8StringWrapped             thrpt        50  2634029.071    52304.876    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8ViaArray                  thrpt        50  3397049.332    57784.119    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8ViaArrayWrapped           thrpt        50  3318685.262    35869.562    ops/s
i.n.m.b.ByteBufUtilBenchmark.writeUtf8Wrapped                   thrpt        50  2473791.249    46423.114    ops/s
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1,387.417 sec - in io.netty.microbench.buffer.ByteBufUtilBenchmark

Results :

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

Results :

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

The *ViaArray* benchmarks are basically doing a toString().getBytes(Charset) which the others are using ByteBufUtil.write*(...).
2014-12-26 15:58:18 +09:00