Commit Graph

6269 Commits

Author SHA1 Message Date
Norman Maurer
4d2b78ca3c Reduce the memory copies in JdkZlibEncoder
Motivation:

At the moment we use a lot of unnecessary memory copies in JdkZlibEncoder. This is caused by either allocate a to small ByteBuf and expand it later or using a temporary byte array.
Beside this the memory footprint of JdkZlibEncoder is pretty high because of the byte[] used for compressing.

Modification:

- Override allocateBuffer(...) and calculate the estimatedsize in there, this reduce expanding of the ByteBuf later
- Not use byte[] in the instance itself but allocate a heap ByteBuf and write directly into the byte array

Result:

Less memory copies and smaller memory footprint
2014-06-26 11:12:19 +02:00
Trustin Lee
937f790f70 Checkstyle 2014-06-26 17:48:32 +09:00
Trustin Lee
5f889d92a1 Fix buffer leaks in Bzip2Decoder(Test)
If decompression fails, the buffer that contains the decompressed data
is not released.  Bzip2DecoderTest.testStreamCrcError() also does not
release the partial output Bzip2Decoder produces.
2014-06-26 17:48:32 +09:00
Trustin Lee
c0462c0c3b Optimize PoolChunk
- Using short[] for memoryMap did not improve performance. Reverting
  back to the original dual-byte[] structure in favor of simplicity.
- Optimize allocateRun() which yields small performence improvement
- Use local variable when member fields are accessed more than once
2014-06-26 17:06:10 +09:00
Trustin Lee
dbc011c3f4 Fix inspector warnings 2014-06-26 17:06:10 +09:00
Pavan Kumar
69a6ad940a Improve the allocation algorithm in PoolChunk
Motivation:

Depth-first search is not always efficient for buddy allocation.

Modification:

Employ a new faster search algorithm with different memoryMap layout.

Result:

With thread-local cache disabled, we see a lot of performance
improvment, especially when the size of the allocation is as small as
the page size, which had the largest search space previously.
2014-06-26 17:06:10 +09:00
Norman Maurer
56d732d439 Fix buffer leaks in Bzip2DecoderTest 2014-06-26 09:21:44 +02:00
Norman Maurer
31211487e9 Use IntObjectMap to replace Map in EpollEventLoop.
Motivation:

We need to map from ints to AbstractEpollChannel in EpollEventLoop but there is no need for box to Integer.

Modification:

Replace Map with IntObjectMap.

Result:

No more auto-boxing needed.
2014-06-25 20:22:58 +02:00
Norman Maurer
8a75ba35ef [#2599] Not use sun.nio.ch.DirectBuffer as it not exists on android
Motivation:

During some refactoring we changed PlatformDependend0 to use sun.nio.ch.DirectBuffer for release direct buffers. This broke support for android as the class does not exist there and so an exception is thrown.

Modification:

Use again the fieldoffset to get access to Cleaner for release direct buffers.

Result:
Netty can be used on android again
2014-06-25 15:07:02 +02:00
Alexey Parfenov
4b4f4e619e Fix integer overflow in HttpObjectEncoder when handling chunked encoding and FileRegion > Integer.MAX_VALUE
Motivation:

Due to integer overflow bug, writes of FileRegions to http server pipeline (eg like one from HttpStaticFileServer example) with length greater than Integer.MAX_VALUE are ignored in 1/2 of cases (ie no data gets sent to client)

Modification:

Correctly handle chunk sized > Integer.MAX_VALUE

Result:

Be able to use FileRegion > Integer.MAX_VALUE when using chunked encoding.
2014-06-24 12:03:48 +02:00
Trustin Lee
41d44a8161 Remove 'get' prefix from all HTTP/SPDY messages
Motivation:

Persuit for the consistency in method naming

Modifications:

- Remove the 'get' prefix from all HTTP/SPDY message classes
- Fix some inspector warnings

Result:

Consistency
2014-06-24 18:03:33 +09:00
onlychoice
5275f37ffb Fix a typo in comment 2014-06-24 11:01:35 +02:00
Norman Maurer
030bcaae81 Improve performance of Recycler
Motivation:

Recycler is used in many places to reduce GC-pressure but is still not as fast as possible because of the internal datastructures used.

Modification:

 - Rewrite Recycler to use a WeakOrderQueue which makes minimal guaranteer about order and visibility for max performance.
 - Recycling of the same object multiple times without acquire it will fail.
 - Introduce a RecyclableMpscLinkedQueueNode which can be used for MpscLinkedQueueNodes that use Recycler

These changes are based on @belliottsmith 's work that was part of #2504.

Result:

Huge increase in performance.

4.0 branch without this commit:

Benchmark                                                (size)   Mode   Samples        Score  Score error    Units
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    00000  thrpt        20 116026994.130  2763381.305    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    00256  thrpt        20 110823170.627  3007221.464    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    01024  thrpt        20 118290272.413  7143962.304    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    04096  thrpt        20 120560396.523  6483323.228    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    16384  thrpt        20 114726607.428  2960013.108    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    65536  thrpt        20 119385917.899  3172913.684    ops/s
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.617 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark

4.0 branch with this commit:

Benchmark                                                (size)   Mode   Samples        Score  Score error    Units
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    00000  thrpt        20 204158855.315  5031432.145    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    00256  thrpt        20 205179685.861  1934137.841    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    01024  thrpt        20 209906801.437  8007811.254    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    04096  thrpt        20 214288320.053  6413126.689    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    16384  thrpt        20 215940902.649  7837706.133    ops/s
i.n.m.i.RecyclableArrayListBenchmark.recycleSameThread    65536  thrpt        20 211141994.206  5017868.542    ops/s
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 297.648 sec - in io.netty.microbench.internal.RecyclableArrayListBenchmark
2014-06-24 10:47:38 +02:00
Trustin Lee
bf85af5743 Fix buffer leaks in Bzip2DecoderTest 2014-06-24 16:47:14 +09:00
Trustin Lee
448b0105b4 Deprecate SocksMessage.encodeAsByteBuf()
It was an internal use only method which became public by a mistake
during the review process.
2014-06-24 16:40:44 +09:00
Trustin Lee
45fde9abba Rename fromByte() to valueOf()
Motivation:

Persuit the consistency in method naming

Modifications:

Rename fromByte(byte) to valueOf(byte)

Result:

Consistency
2014-06-24 16:34:52 +09:00
Trustin Lee
33b9bc02ed Fix potential buffer leak in AbstractBinaryMemcacheDecoder
If a connection is closed unexpectedly while
AbstractBinaryMemcacheDecoder decodes a message, the half-constructed
message's content might not be released.
2014-06-24 16:25:40 +09:00
Trustin Lee
cde319dabd Adhere to our getter/setter naming rules
Motivation:

Persuit for consistent method naming across all classes

Modifications:

Remove 'get' prefix for the getter methods in codec-memcache

Result:

More consistent method naming
2014-06-24 16:10:32 +09:00
Trustin Lee
3c21b1cc43 Fix buffer leak in DefaultFullBinaryMemcacheRequest/Response
Motivation:

DefaultFullBinaryMemcacheRequest/Response overrides release(), retain(),
and touch() methods without calling its super, resulting in a leak of
the extras.

Modifications:

When overriding release(), retain(), and touch(), ensure to call super.

Result:

Fixes #2533 by fixing the buffer leak
2014-06-24 15:07:33 +09:00
Idel Pivnitskiy
f9021a6061 Implement a Bzip2Decoder
Motivation:

Bzip2Decoder provides receiving data compressed in bzip2 format.

Modifications:

Added classes:
- Bzip2Decoder
- Bzip2Constants
- Bzip2BlockDecompressor
- Bzip2HuffmanStageDecoder
- Bzip2MoveToFrontTable
- Bzip2Rand
- Crc32
- Bzip2DecoderTest

Result:

Implemented and tested new decoder which can uncompress incoming data in bzip2 format.
2014-06-24 14:50:09 +09:00
Norman Maurer
12a3e23e47 MessageToByteEncoder always starts with ByteBuf that use initalCapacity == 0
Motivation:

MessageToByteEncoder always starts with ByteBuf that use initalCapacity == 0 when preferDirect is used. This is really wasteful in terms of performance as every first write into the buffer will cause an expand of the buffer itself.

Modifications:

 - Change ByteBufAllocator.ioBuffer() use the same default initialCapacity as heapBuffer() and directBuffer()
 - Add new allocateBuffer method to MessageToByteEncoder that allow the user to do some smarter allocation based on the message that will be encoded.

Result:

Less expanding of buffer and more flexibilty when allocate the buffer for encoding.
2014-06-24 13:55:21 +09:00
Norman Maurer
4036eda048 Make use of HttpChunkedInput as this will also work when compression is used 2014-06-23 09:38:48 +02:00
Trustin Lee
37b07a04d4 Revert "Improve the allocation algorithm in PoolChunk"
This reverts commit 36305d7dce, which
seems to cause an assertion failure on our CI machine.
2014-06-21 19:19:35 +09:00
Trustin Lee
e99f26fe09 Make sure OpenSslEngine is tested against transport-native-epoll 2014-06-21 18:28:54 +09:00
Trustin Lee
cdaeb54fb9 Remove padding utility classes
- It's not used anywhere
2014-06-21 17:59:49 +09:00
Trustin Lee
f44720850c Add missing last padding / Comment 2014-06-21 17:57:06 +09:00
Trustin Lee
a368f9d12a Checkstyle / Overall clean-up / Fix serialization 2014-06-21 17:57:06 +09:00
nitsanw
32aab3b0b3 Fix false sharing between head and tail reference in MpscLinkedQueue
Motivation:

The tail node reference writes (by producer threads) are very likely to
invalidate the cache line holding the headRef which is read by the
consumer threads in order to access the padded reference to the head
node. This is because the resulting layout for the object is:

- header
- Object AtomicReference.value -> Tail node
- Object MpscLinkedQueue.headRef -> PaddedRef -> Head node

This is 'passive' false sharing where one thread reads and the other
writes.  The current implementation suffers from further passive false
sharing potential from any and all neighbours to the queue object as no
pre/post padding is provided for the class fields.

Modifications:

Fix the memory layout by adding pre-post padding for the head node and
putting the tail node reference in the same object.

Result:

Fixed false sharing
2014-06-21 17:57:06 +09:00
Trustin Lee
c65d5b170d Mqtt -> MQTT 2014-06-21 17:14:41 +09:00
Trustin Lee
cd8e35e95a Fix test failures due to incorrect validation 2014-06-21 17:11:35 +09:00
Trustin Lee
00d2cea8ba Overall clean-up on codec-mqtt
- Use simple string concatenation instead of String.format()
- Rewrite exception messages so that it follows our style
- Merge MqttCommonUtil and MqttValidationUtil into MqttCodecUtil
- Hide MqttCodecUtil from users
- Rename MqttConnectReturnCode.value to byteValue
- Rename MqttMessageFactory.create*() to new*()
- Rename QoS to MqttQoS
- Make MqttSubAckPayload.grantedQoSLevels immutable and add more useful
  constructor
2014-06-21 16:52:28 +09:00
Mousom Dhar Gupta
1ba5fa4b4b Add MQTT protocol codec
MQTT is a open source protocol on top of TCP which is widely used in
mobile communication and also for IoT (Internet of Things) today. This
will add an open source implementation of MQTT so that it becomes easier
for Netty users to implement an MQTT application.

For more information about the MQTT protocol, read this:

http://public.dhe.ibm.com/software/dw/webservices/ws-mqtt/mqtt-v3r1.html
2014-06-21 16:52:10 +09:00
Trustin Lee
e274a6549c Checkstyle 2014-06-21 16:04:13 +09:00
Trustin Lee
3900d1c665 Overall refactoring of the haproxy codec
- Convert constant classes to enum
- Rename HAProxyProtocolMessage to HAProxyMessage for simpilicity
- Rename HAProxyProtocolDecoder to HAProxyMessageDecoder
- Rename HAProxyProtocolCommand to HAProxyCommand
- Merge ProxiedProtocolAndFamity, ProxiedAddressFamily, and
  ProxiedTransportProtocol into HAProxiProxiedProtocol and its inner
  enums
- Overall clean-up
2014-06-21 16:00:27 +09:00
Trustin Lee
8c25830b0b Move haproxy codec to a separate module 2014-06-21 15:59:21 +09:00
Jon Keys
d7b2affe32 Add HAProxy protocol decoder
Motivation:

The proxy protocol provides client connection information for proxied
network services. Several implementations exist (e.g. Haproxy, Stunnel,
Stud, Postfix), but the primary motivation for this implementation is to
support the proxy protocol feature of Amazon Web Services Elastic Load
Balancing.

Modifications:

This commit adds a proxy protocol decoder for proxy protocol version 1
as specified at:

  http://haproxy.1wt.eu/download/1.5/doc/proxy-protocol.txt

The foundation for version 2 support is also in this commit but it is
explicitly NOT supported due to a lack of external implementations to
test against.

Result:

The proxy protocol decoder can be used to send client connection
information to inbound handlers in a channel pipeline from services
which support the proxy protocol.
2014-06-21 15:59:21 +09:00
nmittler
02a6dc8ba7 Adding int-to-object map implementation.
Motivation:

Maps with integer keys are used in several places (HTTP/2 code, for
example). To reduce the memory footprint of these structures, we need a
specialized map class that uses ints as keys.

Modifications:

Added IntObjectHashMap, which is uses open addressing and double hashing
for collision resolution.

Result:

A new int-based map class that can be shared across Netty.
2014-06-21 08:37:59 +02:00
Trustin Lee
f67ac5e46d Fix the inconsistencies between performance tests in ByteBufAllocatorBenchmark
Motivation:

default*() tests are performing a test in a different way, and they must be same with other tests.

Modification:

Make sure default*() tests are same with the others

Result:

Easier to compare default and non-default allocators
2014-06-21 13:28:02 +09:00
Pavan Kumar
6bd8c5d4d0 Improve the allocation algorithm in PoolChunk
Motivation:

Depth-first search is not always efficient for buddy allocation.

Modification:

Employ a new faster search algorithm with different memoryMap layout.

Result:

With thread-local cache disabled, we see a lot of performance
improvment, especially when the size of the allocation is as small as
the page size, which had the largest search space previously:

-- master head --
Benchmark                (size) Mode    Score  Error Units
pooledDirectAllocAndFree  8192 thrpt  215.392  1.565 ops/ms
pooledDirectAllocAndFree 16384 thrpt  594.625  2.154 ops/ms
pooledDirectAllocAndFree 65536 thrpt 1221.520 18.965 ops/ms
pooledHeapAllocAndFree    8192 thrpt  217.175  1.653 ops/ms
pooledHeapAllocAndFree   16384 thrpt  587.250 14.827 ops/ms
pooledHeapAllocAndFree   65536 thrpt 1217.023 44.963 ops/ms

-- changes --
Benchmark                (size) Mode    Score  Error Units
pooledDirectAllocAndFree  8192 thrpt 3656.744 94.093 ops/ms
pooledDirectAllocAndFree 16384 thrpt 4087.152 22.921 ops/ms
pooledDirectAllocAndFree 65536 thrpt 4058.814 29.276 ops/ms
pooledHeapAllocAndFree    8192 thrpt 3640.355 44.418 ops/ms
pooledHeapAllocAndFree   16384 thrpt 4030.206 24.365 ops/ms
pooledHeapAllocAndFree   65536 thrpt 4103.991 70.991 ops/ms
2014-06-21 13:20:25 +09:00
Norman Maurer
81e5f1ad46 [#2589] LocalServerChannel.doClose() throws NPE when localAddress == null
Motivation:

LocalServerChannel.doClose() calls LocalChannelRegistry.unregister(localAddress); without check if localAddress is null and so produce a NPE when pass null the used ConcurrentHashMapV8

Modification:
Check for localAddress != null before try to remove it from Map. Also added a unit test which showed the stacktrace of the error.

Result:

No more NPE during doClose().
2014-06-20 20:13:23 +02:00
Norman Maurer
f05510063e Remove System.out.println(...) debug messages 2014-06-20 19:42:38 +02:00
Norman Maurer
371f8066d2 [#2580] [#2587] Fix buffer corruption regression when ByteBuf.order(LITTLE_ENDIAN) is used
Motivation:

To improve the speed of ByteBuf with order LITTLE_ENDIAN and where the native order is also LITTLE_ENDIAN (intel) we introduces a new special SwappedByteBuf before in commit 4ad3984c8b. Unfortunally the commit has a flaw which does not handle correctly the case when a ByteBuf expands. This was caused because the memoryAddress was cached and never changed again even if the underlying buffer expanded. This can lead to corrupt data or even to SEGFAULT the JVM if you are lucky enough.

Modification:

Always lookup the actual memoryAddress of the wrapped ByteBuf.

Result:

No more data-corruption for ByteBuf with order LITTLE_ENDIAN and no JVM crashes.
2014-06-20 18:24:44 +02:00
Norman Maurer
3d3ec4753d [#2586] Use correct EventLoop to notify delayed successful registration
Motivation:

At the moment AbstractBoostrap.bind(...) will always use the GlobalEventExecutor to notify the returned ChannelFuture if the registration is not done yet. This should only be done if the registration fails later. If it completes successful we should just notify with the EventLoop of the Channel.

Modification:

Use EventLoop of the Channel if possible to use the correct Thread to notify and so guaranteer the right order of events.

Result:

Use the correct EventLoop for notification
2014-06-20 16:59:13 +02:00
Trustin Lee
085a61a310 Refactor FastThreadLocal to simplify TLV management
Motivation:

When Netty runs in a managed environment such as web application server,
Netty needs to provide an explicit way to remove the thread-local
variables it created to prevent class loader leaks.

FastThreadLocal uses different execution paths for storing a
thread-local variable depending on the type of the current thread.
It increases the complexity of thread-local removal.

Modifications:

- Moved FastThreadLocal and FastThreadLocalThread out of the internal
  package so that a user can use it.
- FastThreadLocal now keeps track of all thread local variables it has
  initialized, and calling FastThreadLocal.removeAll() will remove all
  thread-local variables of the caller thread.
- Added FastThreadLocal.size() for diagnostics and tests
- Introduce InternalThreadLocalMap which is a mixture of hard-wired
  thread local variable fields and extensible indexed variables
- FastThreadLocal now uses InternalThreadLocalMap to implement a
  thread-local variable.
- Added ThreadDeathWatcher.unwatch() so that PooledByteBufAllocator
  tells it to stop watching when its thread-local cache has been freed
  by FastThreadLocal.removeAll().
- Added FastThreadLocalTest to ensure that removeAll() works
- Added microbenchmark for FastThreadLocal and JDK ThreadLocal
- Upgraded to JMH 0.9

Result:

- A user can remove all thread-local variables Netty created, as long as
  he or she did not exit from the current thread. (Note that there's no
  way to remove a thread-local variable from outside of the thread.)
- FastThreadLocal exposes more useful operations such as isSet() because
  we always implement a thread local variable via InternalThreadLocalMap
  instead of falling back to JDK ThreadLocal.
- FastThreadLocalBenchmark shows that this change improves the
  performance of FastThreadLocal even more.
2014-06-19 21:13:55 +09:00
Norman Maurer
7ee18e92f9 Small improvement in SimpleChannelInboundHandlerAdapter javadoc 2014-06-18 14:49:36 +02:00
Norman Maurer
061cb21689 Make use of AtomicLongFieldUpdater.addAndGet(...) for cleaner code
Motivation:

The code in ChannelOutboundBuffer can be simplified by using AtomicLongFieldUpdater.addAndGet(...)

Modification:

Replace our manual looping with AtomicLongFieldUpdater.addAndGet(...)

Result:

Cleaner code
2014-06-17 20:18:18 +02:00
Norman Maurer
ad86ec798d Move calculateNewCapacity(...) to ByteBufAllocator
Motivation:

Currently we have the algorithm of calculate the new capacity of a ByteBuf implemented in AbstractByteBuf. The problem with this is that it is impossible for a user to change it if it not fits well it's use-case. We should better move it to ByteBufAllocator and so let the user implement it's own by either write his/her own ByteBufAllocator or just override the default implementation in one of our provided ByteBufAllocators.

Modifications:

Move calculateNewCapacity(...) to ByteBufAllocator and move the implementation (which was part of AbstractByteBuf) to AbstractByteBufAllocator.

Result:

The user can now override the default calculation algorithm when needed.
2014-06-17 09:35:45 +02:00
Norman Maurer
375da788e7 [#2577] ChannelOutboundBuffer.addFlush() unnecessary loop through all entries on multiple calls
Motivation:

If ChannelOutboundBuffer.addFlush() is called multiple times and flushed != unflushed it will still loop through all entries that are not flushed yet even if it is not needed anymore as these were marked uncancellable before.

Modifications:

Check if new messages were added since addFlush() was called and only if this was the case loop through all entries and try to mark the uncancellable.

Result:

Less overhead when ChannelOuboundBuffer.addFlush() is called multiple times without new messages been added.
2014-06-17 09:31:53 +02:00
Trustin Lee
4d60ea2aeb Fix incorrect method signature of awaitInactivity()
- Related: #2084
2014-06-17 16:00:54 +09:00
Norman Maurer
066f95d047 [#2573] UnpooledUnsafeDirectByteBuf.setBytes(int,ByteBuf,int,int) fails to use fast-path when src has array
Motivation:

UnpooledUnsafeDirectByteBuf.setBytes(int,ByteBuf,int,int) fails to use fast-path when src uses an array as backing storage. This is because the if else uses the wrong ByteBuf for its check.

Modifications:

- Use correct ByteBuf when check for array as backing storage
- Also eliminate unecessary check in UnpooledDirectByteBuf which always fails anyway

Result:

Faster setBytes(...) when src ByteBuf is backed by an array.

No more IndexOutOfBoundsException or data-corruption.
2014-06-16 11:11:41 +02:00