Commit Graph

5833 Commits

Author SHA1 Message Date
Trustin Lee
7162d96ed5 Revert "Improve the allocation algorithm in PoolChunk"
This reverts commit 36305d7dce, which
seems to cause an assertion failure on our CI machine.
2014-06-21 19:19:49 +09:00
Trustin Lee
a1b87411fb Make sure OpenSslEngine is tested against transport-native-epoll 2014-06-21 18:29:00 +09:00
Trustin Lee
3775a6124e Remove padding utility classes
- It's not used anywhere
2014-06-21 17:59:20 +09:00
Trustin Lee
6834c2da51 Add missing last padding / Comment 2014-06-21 17:57:29 +09:00
Trustin Lee
b1a5ced729 Checkstyle / Overall clean-up / Fix serialization 2014-06-21 17:57:29 +09:00
nitsanw
a5d585a8ef Fix false sharing between head and tail reference in MpscLinkedQueue
Motivation:

The tail node reference writes (by producer threads) are very likely to
invalidate the cache line holding the headRef which is read by the
consumer threads in order to access the padded reference to the head
node. This is because the resulting layout for the object is:

- header
- Object AtomicReference.value -> Tail node
- Object MpscLinkedQueue.headRef -> PaddedRef -> Head node

This is 'passive' false sharing where one thread reads and the other
writes.  The current implementation suffers from further passive false
sharing potential from any and all neighbours to the queue object as no
pre/post padding is provided for the class fields.

Modifications:

Fix the memory layout by adding pre-post padding for the head node and
putting the tail node reference in the same object.

Result:

Fixed false sharing
2014-06-21 17:57:29 +09:00
nmittler
fd895e53f4 Adding int-to-object map implementation.
Motivation:

Maps with integer keys are used in several places (HTTP/2 code, for
example). To reduce the memory footprint of these structures, we need a
specialized map class that uses ints as keys.

Modifications:

Added IntObjectHashMap, which is uses open addressing and double hashing
for collision resolution.

Result:

A new int-based map class that can be shared across Netty.
2014-06-21 08:36:06 +02:00
Trustin Lee
cb994dd926 Fix the inconsistencies between performance tests in ByteBufAllocatorBenchmark
Motivation:

default*() tests are performing a test in a different way, and they must be same with other tests.

Modification:

Make sure default*() tests are same with the others

Result:

Easier to compare default and non-default allocators
2014-06-21 13:28:11 +09:00
Pavan Kumar
004ffbad90 Improve the allocation algorithm in PoolChunk
Motivation:

Depth-first search is not always efficient for buddy allocation.

Modification:

Employ a new faster search algorithm with different memoryMap layout.

Result:

With thread-local cache disabled, we see a lot of performance
improvment, especially when the size of the allocation is as small as
the page size, which had the largest search space previously:

-- master head --
Benchmark                (size) Mode    Score  Error Units
pooledDirectAllocAndFree  8192 thrpt  215.392  1.565 ops/ms
pooledDirectAllocAndFree 16384 thrpt  594.625  2.154 ops/ms
pooledDirectAllocAndFree 65536 thrpt 1221.520 18.965 ops/ms
pooledHeapAllocAndFree    8192 thrpt  217.175  1.653 ops/ms
pooledHeapAllocAndFree   16384 thrpt  587.250 14.827 ops/ms
pooledHeapAllocAndFree   65536 thrpt 1217.023 44.963 ops/ms

-- changes --
Benchmark                (size) Mode    Score  Error Units
pooledDirectAllocAndFree  8192 thrpt 3656.744 94.093 ops/ms
pooledDirectAllocAndFree 16384 thrpt 4087.152 22.921 ops/ms
pooledDirectAllocAndFree 65536 thrpt 4058.814 29.276 ops/ms
pooledHeapAllocAndFree    8192 thrpt 3640.355 44.418 ops/ms
pooledHeapAllocAndFree   16384 thrpt 4030.206 24.365 ops/ms
pooledHeapAllocAndFree   65536 thrpt 4103.991 70.991 ops/ms
2014-06-21 13:20:56 +09:00
Norman Maurer
58f4b4b7d9 [#2589] LocalServerChannel.doClose() throws NPE when localAddress == null
Motivation:

LocalServerChannel.doClose() calls LocalChannelRegistry.unregister(localAddress); without check if localAddress is null and so produce a NPE when pass null the used ConcurrentHashMapV8

Modification:
Check for localAddress != null before try to remove it from Map. Also added a unit test which showed the stacktrace of the error.

Result:

No more NPE during doClose().
2014-06-20 20:07:00 +02:00
Norman Maurer
19a1b603d0 Remove System.out.println(...) debug messages 2014-06-20 19:42:08 +02:00
Norman Maurer
d2b8560a76 [#2580] [#2587] Fix buffer corruption regression when ByteBuf.order(LITTLE_ENDIAN) is used
Motivation:

To improve the speed of ByteBuf with order LITTLE_ENDIAN and where the native order is also LITTLE_ENDIAN (intel) we introduces a new special SwappedByteBuf before in commit 4ad3984c8b. Unfortunally the commit has a flaw which does not handle correctly the case when a ByteBuf expands. This was caused because the memoryAddress was cached and never changed again even if the underlying buffer expanded. This can lead to corrupt data or even to SEGFAULT the JVM if you are lucky enough.

Modification:

Always lookup the actual memoryAddress of the wrapped ByteBuf.

Result:

No more data-corruption for ByteBuf with order LITTLE_ENDIAN and no JVM crashes.
2014-06-20 18:25:54 +02:00
Norman Maurer
1278467fec [#2586] Use correct EventLoop to notify delayed successful registration
Motivation:

At the moment AbstractBoostrap.bind(...) will always use the GlobalEventExecutor to notify the returned ChannelFuture if the registration is not done yet. This should only be done if the registration fails later. If it completes successful we should just notify with the EventLoop of the Channel.

Modification:

Use EventLoop of the Channel if possible to use the correct Thread to notify and so guaranteer the right order of events.

Result:

Use the correct EventLoop for notification
2014-06-20 16:51:28 +02:00
Trustin Lee
fb538ea532 Refactor FastThreadLocal to simplify TLV management
Motivation:

When Netty runs in a managed environment such as web application server,
Netty needs to provide an explicit way to remove the thread-local
variables it created to prevent class loader leaks.

FastThreadLocal uses different execution paths for storing a
thread-local variable depending on the type of the current thread.
It increases the complexity of thread-local removal.

Modifications:

- Moved FastThreadLocal and FastThreadLocalThread out of the internal
  package so that a user can use it.
- FastThreadLocal now keeps track of all thread local variables it has
  initialized, and calling FastThreadLocal.removeAll() will remove all
  thread-local variables of the caller thread.
- Added FastThreadLocal.size() for diagnostics and tests
- Introduce InternalThreadLocalMap which is a mixture of hard-wired
  thread local variable fields and extensible indexed variables
- FastThreadLocal now uses InternalThreadLocalMap to implement a
  thread-local variable.
- Added ThreadDeathWatcher.unwatch() so that PooledByteBufAllocator
  tells it to stop watching when its thread-local cache has been freed
  by FastThreadLocal.removeAll().
- Added FastThreadLocalTest to ensure that removeAll() works
- Added microbenchmark for FastThreadLocal and JDK ThreadLocal
- Upgraded to JMH 0.9

Result:

- A user can remove all thread-local variables Netty created, as long as
  he or she did not exit from the current thread. (Note that there's no
  way to remove a thread-local variable from outside of the thread.)
- FastThreadLocal exposes more useful operations such as isSet() because
  we always implement a thread local variable via InternalThreadLocalMap
  instead of falling back to JDK ThreadLocal.
- FastThreadLocalBenchmark shows that this change improves the
  performance of FastThreadLocal even more.
2014-06-19 21:08:16 +09:00
Norman Maurer
7279e48bef Small improvement in SimpleChannelInboundHandlerAdapter javadoc 2014-06-18 14:49:11 +02:00
Norman Maurer
917132e28d Make use of AtomicLongFieldUpdater.addAndGet(...) for cleaner code
Motivation:

The code in ChannelOutboundBuffer can be simplified by using AtomicLongFieldUpdater.addAndGet(...)

Modification:

Replace our manual looping with AtomicLongFieldUpdater.addAndGet(...)

Result:

Cleaner code
2014-06-17 19:50:14 +02:00
Norman Maurer
b627824b18 [#2577] ChannelOutboundBuffer.addFlush() unnecessary loop through all entries on multiple calls
Motivation:

If ChannelOutboundBuffer.addFlush() is called multiple times and flushed != unflushed it will still loop through all entries that are not flushed yet even if it is not needed anymore as these were marked uncancellable before.

Modifications:

Check if new messages were added since addFlush() was called and only if this was the case loop through all entries and try to mark the uncancellable.

Result:

Less overhead when ChannelOuboundBuffer.addFlush() is called multiple times without new messages been added.
2014-06-17 09:29:16 +02:00
Trustin Lee
b8a7881588 Fix incorrect method signature of awaitInactivity()
- Related: #2084
2014-06-17 16:01:25 +09:00
Norman Maurer
6fdf1138ca [#2573] UnpooledUnsafeDirectByteBuf.setBytes(int,ByteBuf,int,int) fails to use fast-path when src has array
Motivation:

UnpooledUnsafeDirectByteBuf.setBytes(int,ByteBuf,int,int) fails to use fast-path when src uses an array as backing storage. This is because the if else uses the wrong ByteBuf for its check.

Modifications:

- Use correct ByteBuf when check for array as backing storage
- Also eliminate unecessary check in UnpooledDirectByteBuf which always fails anyway

Result:

Faster setBytes(...) when src ByteBuf is backed by an array.

No more IndexOutOfBoundsException or data-corruption.
2014-06-16 11:11:54 +02:00
Norman Maurer
a4bd566cef [#2572] Correctly calculate length of output buffer before inflate to fix IndexOutOfBoundException
Motivation:

JdkZlibDecoder fails to decode because the length of the output buffer is not calculated correctly.
This can cause an IndexOutOfBoundsException or data-corruption when the PooledByteBuffAllocator is used.

Modifications:

Correctly calculate the length

Result:

No more IndexOutOfBoundsException or data-corruption.
2014-06-16 10:22:02 +02:00
Phil.Baxter
101b9ded33 export sun security packages as optional 2014-06-15 20:59:55 +02:00
Trustin Lee
e88262861a Use FastThreadLocal in more places 2014-06-14 17:46:36 +09:00
Norman Maurer
b737d631f1 [maven-release-plugin] prepare for next development iteration 2014-06-12 16:20:52 +02:00
Norman Maurer
1709113a1f [maven-release-plugin] prepare release netty-4.0.20.Final 2014-06-12 16:14:48 +02:00
Norman Maurer
76043bc8c8 Make use of an array to store FastThreadLocals and so allow to also use it in PooledByteBufAllocator that is instanced by users.
Motivation:
Allow to make use of our new FastThreadLocal whereever possible

Modification:
Make use of an array to store FastThreadLocals and so allow to also use it in PooledByteBufAllocator that is instanced by users.
The maximal size of the array is configurable per system property to allow to tune it if needed. As default we use 64 entries which should be good enough.

Result:
More flexible usage of FastThreadLocal
2014-06-12 15:43:20 +02:00
belliottsmith
1ac2ff8d7b Introduce FastThreadLocal which uses an EnumMap and a predefined fixed set of possible thread locals
Motivation:
Provide a faster ThreadLocal implementation

Modification:
Add a "FastThreadLocal" which uses an EnumMap and a predefined fixed set of possible thread locals (all of the static instances created by netty) that is around 10-20% faster than standard ThreadLocal in my benchmarks (and can be seen having an effect in the direct PooledByteBufAllocator benchmark that uses the DEFAULT ByteBufAllocator which uses this FastThreadLocal, as opposed to normal instantiations that do not, and in the new RecyclableArrayList benchmark);

Result:
Improved performance
2014-06-12 15:43:20 +02:00
Norman Maurer
cf1d9823a0 Make sure cancelled Timeouts are able to be GC'ed fast.
Motivation:
At the moment the HashedWheelTimer will only remove the cancelled Timeouts once the HashedWheelBucket is processed again. Until this the instance will not be able to be GC'ed as there are still strong referenced to it even if the user not reference it by himself/herself. This can cause to waste a lot of memory even if the Timeout was cancelled before.

Modification:
Add a new queue which holds CancelTasks that will be processed on each tick to remove cancelled Timeouts. Because all of this is done only by the WorkerThread there is no need for synchronization and only one extra object creation is needed when cancel() is executed. For addTimeout(...) no new overhead is introduced.

Result:
Less memory usage for cancelled Timeouts.
2014-06-10 12:47:13 +02:00
Norman Maurer
9b468bc275 Optimize DefaultChannelPipeline in terms of memory usage and initialization time
Motivation:
Each of DefaultChannelPipeline instance creates an head and tail that wraps a handler. These are used to chain together other DefaultChannelHandlerContext that are created once a new ChannelHandler is added. There are a few things here that can be improved in terms of memory usage and initialization time.

Modification:
- Only generate the name for the tail and head one time as it will never change anyway
- Rename DefaultChannelHandlerContext to AbstractChannelHandlerContext and make it abstract
- Create a new DefaultChannelHandlerContext that is used when a ChannelHandler is added to the DefaultChannelPipeline
- Rename TailHandler to TailContext and HeadHandler to HeadContext and let them extend AbstractChannelHandlerContext. This way we can save 2 object creations per DefaultChannelPipeline

Result:
- Less memory usage because we have 2 less objects per DefaultChannelPipeline
- Faster creation of DefaultChannelPipeline as we not need to generate the name for the head and tail
2014-06-10 12:45:37 +02:00
Frederic Bregier
6b69ccb585 [#2542] HTTP post request decoder does not support quoted boundaries
Motivation:
According to RFC2616 section 19, boundary string could be quoted, but
currently the PostRequestDecoder does not support it while it should.

Modifications:
Once the boundary is found, one check is made to verify if the boundary
is "quoted", and if so, it is "unqoted".

Note: in following usage of this boundary (as delimiter), quote seems no
more allowed according to the same RFC, so the reason that only the
boundary definition is corrected.

Result:
Now the boundary could be whatever quoted or not. A Junit test case
checks it.
2014-06-08 21:57:43 +02:00
Norman Maurer
a0a8f1032b [#2544] Correctly parse Multipart-mixed POST HTTP request in case of entity ends with odd number of 0x0D. Port of @fredericBregier 's work.
Motivation:
When an attribute is ending with an odd number of CR (0x0D), the decoder
add an extra CR in the decoded attribute and should not.

Modifications:
Each time a CR is detected, the next byte was tested to be LF or not. If
not, in a number of places, the CR byte was lost while it should not be.
When a CR is detected, if the next byte is not LF, the CR byte should be
saved as the position point to the next byte (not LF). When a CR is
detected, if there is not yet other available bytes, the position is
reset to the position of CR (since a LF could follow).

A new Junit test case is added, using DECODER and variable number of CR
in the final attribute (testMultipartCodecWithCRasEndOfAttribute).

Result:
The attribute is now correctly decoded with the right number of CR
ending bytes.
2014-06-08 11:50:58 +02:00
Norman Maurer
4ad3984c8b [#2436] Unsafe*ByteBuf implementation should only invert bytes if ByteOrder differ from native ByteOrder
Motivation:
Our Unsafe*ByteBuf implementation always invert bytes when the native ByteOrder is LITTLE_ENDIAN (this is true on intel), even when the user calls order(ByteOrder.LITTLE_ENDIAN). This is not optimal for performance reasons as the user should be able to set the ByteOrder to LITTLE_ENDIAN and so write bytes without the extra inverting.

Modification:
- Introduce a new special SwappedByteBuf (called UnsafeDirectSwappedByteBuf) that is used by all the Unsafe*ByteBuf implementation and allows to write without inverting the bytes.
- Add benchmark
- Upgrade jmh to 0.8

Result:
The user is be able to get the max performance even on servers that have ByteOrder.LITTLE_ENDIAN as their native ByteOrder.
2014-06-05 11:09:58 +02:00
Trustin Lee
7ce8dca97c Clean up MpscLinkedQueue, fix its leak, and make it work without Unsafe
Motivation:

MpscLinkedQueue has various issues:
- It does not work without sun.misc.Unsafe.
- Some field names are confusing.
  - Node.tail does not refer to the tail node really.
  - The tail node is the starting point of iteration. I think the tail
    node should be the head node and vice versa to reduce confusion.
- Some important methods are not implemented (e.g. iterator())
- Not serializable
- Potential false cache sharing problem due to lack of padding
- MpscLinkedQueue extends AtomicReference and thus exposes various
  operations that mutates the internal state of the queue directly.

Modifications:

- Use AtomicReferenceFieldUpdater wherever possible so that we do not
  use Unsafe directly. (e.g. use lazySet() instead of putOrderedObject)
- Extend AbstractQueue to implement most operations
- Implement serialization and iterator()
- Rename tail to head and head to tail to reduce confusion.
- Rename Node.tail to Node.next.
- Fix a leak where the references in the removed head are not cleared
  properly.
- Add Node.clearMaybe() method so that the value of the new head node
  is cleared if possible.
- Add some comments for my own educational purposes
- Add padding to the head node
  - Add FullyPaddedReference and RightPaddedReference for future reuse
- Make MpscLinkedQueue package-local so that a user cannot access the
  dangerous yet public operations exposed by the superclass.
  - MpscLinkedQueue.Node becomes MpscLinkedQueueNode, a top level class

Result:

- It's more like a drop-in replacement of ConcurrentLinkedQueue for the
  MPSC case.
- Works without sun.misc.Unsafe
- Code potentially easier to understand
- Fixed leak (related: #2372)
2014-06-04 03:24:05 +09:00
Trustin Lee
72a4e8c178 Require Maven 3.1.1 or above
.. because the build fails with an older Maven version due to Eclipse
Aether issues
2014-06-04 03:15:38 +09:00
Norman Maurer
a5b230c585 ChannelFlushPromiseNotifier should allow long value for pendingDataSize
Motivation:
At the moment ChannelFlushPromiseNotifier.add(....) takes an int value for pendingDataSize, which may be too small as a user may need to use a long. This can for example be useful when a user writes a FileRegion etc. Beside this the notify* method names are kind of missleading as these should not contain *Future* because it is about ChannelPromises.

Modification:
Add a new add(...) method that takes a long for pendingDataSize and @deprecated the old method. Beside this also @deprecated all *Future* methods and add methods that have *Promise* in the method name to better reflect usage.

Result:
ChannelFlushPromiseNotifier can be used with bigger data.
2014-06-03 17:34:04 +02:00
Daniel Bevenius
95b6ee80ca OkResponseHandler should return a FullHttpResponse.
Motivation:
Currently OkResponseHandler returns a DefaultHttpResponse which is not
correct and it should be returning complete http response.

Modifications:
Updated OkResponseHandler to return an instance of
DefaultFullHttpResponse.

Result:
It is not possible to add compression to the example without getting any
errors.
2014-06-03 09:44:04 +02:00
DhanaRaj Durairaj
f3b1816db8 [#2494] Fix data curruption by ChannelTrafficShapingHandler
Motivation:
ChannelTrafficShapingHandler may corrupt inbound data stream by
scheduling the fireChannelRead event.

Modification:
Always call fireChannelRead(...) and only suspend reads after it

Result:
No more data corruption
2014-06-03 08:38:23 +02:00
Trustin Lee
e07e37a924 Add awaitInactivity() to GlobalEventExecutor and ThreadDeathWatcher
Motivation:

When running Netty on a container environment, the container will often
complain about the lingering threads such as the worker threads of
ThreadDeathWatcher and GlobalEventExecutor.  We should provide an
operation that allows a use to wait until such threads are terminated.

Modifications:

- Add awaitInactivity()
- (misc) Fix typo in GlobalEventExecutorTest
- (misc) Port ThreadDeathWatch's CAS-based thread life cycle management
  to GlobalEventExecutor

Result:

- Fixes #2084
- Less overhead on task submission of GlobalEventExecutor
2014-06-02 19:28:17 +09:00
Trustin Lee
6c6d211652 Fix checkstyle 2014-06-02 18:27:18 +09:00
Trustin Lee
0928e28385 Use Java 5 foreach for arrays for brevity at no cost 2014-06-02 18:25:42 +09:00
Trustin Lee
ed6df98653 Introduce ThreadDeathWatcher
Motivation:

PooledByteBufAllocator's thread local cache and
ReferenceCountUtil.releaseLater() are in need of a way to run an
arbitrary logic when a certain thread is terminated.

Modifications:

- Add ThreadDeathWatcher, which spawns a low-priority daemon thread
  that watches a list of threads periodically (every second) and
  invokes the specified tasks when the associated threads are not alive
  anymore
  - Start-stop logic based on CAS operation proposed by @tea-dragon
- Add debug-level log messages to see if ThreadDeathWatcher works

Result:

- Fixes #2519 because we don't use GlobalEventExecutor anymore
- Cleaner code
2014-06-02 18:23:48 +09:00
Norman Maurer
0ea796ad55 [#2525] Use VoidChannelPromise in MessageToMessageEncoder when possible
Motivation:
At the moment MessageToMessageEncoder uses ctx.write(msg) when have more then one message was produced. This may produce more GC pressure then necessary as when the original ChannelPromise is a VoidChannelPromise we can safely also use one when write messages.

Modifications:
Use VoidChannelPromise when the original ChannelPromise was of this type

Result:
Less object creation and GC pressure
2014-06-01 19:26:09 +02:00
Korotaev Boris
303dfd5138 Fix broken CompositeMatcher
Motivation:

ChannelMatchers#CompositeMatcher inverts matches result.

Modifications:

Switched return values.

Result:

ChannelMatchers#CompositeMatcher will return correct results.
2014-06-01 13:13:32 +02:00
Norman Maurer
476c79c2e7 [#2523] Fix infinite-loop when remove attribute and create the same attribute again
Motivation:
The current DefaultAttributeMap cause an infinite-loop when the user removes an attribute and create the same attribute again. This regression was introduced by c3bd7a8ff1.

Modification:
Correctly break out loop

Result:
No infinite-loop anymore.
2014-06-01 13:11:03 +02:00
Josh Hoyt
0a4cade36a codec-http: Document the semantics of HttpResponseStatus equality and comparison 2014-05-30 07:52:42 +02:00
Trustin Lee
350ac9787e Do not use a pseudo random for tree traversal
Motivation:

If we make allocateRun/SubpageSimple() always try the left node first and make allocateRun/Subpage() always tries the right node first,  it is more likely that allocateRun/Subpage() will find a node with ST_UNUSED sooner.

Modifications:

- Make allocateRunSimple() and allocateSubpageSimple() always try the left node first.
- Make allocateRun() and allocateSubpage() always try the right node first.
- Remove randome

Result:

We get the same performance without using random numbers.
2014-05-30 11:24:22 +09:00
Trustin Lee
3e7dbe072e Optimize PooledByteBufAllocator
Motivation:

We still have a room for improvement in PoolChunk.allocateRun() and
Subpage.allocate().

Modifications:

- Unroll the recursion in PoolChunk.allocateRun()
- Subpage.allocate() makes use of the 'nextAvail' value set by previous
  free().

Result:

- PoolChunk.allocateRun() optimization yields 10%+ improvements in
  allocation throughput for non-subpage allocations.
- Subpage.allocate() optimization makes the subpage allocations for
  tiny buffers as fast as non-tiny buffers even when the pageSize is
  huge (e.g. 1048576) because it doesn't need to perform a linear search
  in most cases.
2014-05-30 10:51:27 +09:00
Trustin Lee
172e7f06be More realistic ByteBuf allocation benchmark
Motivation:

Allocating a single buffer and releasing it repetitively for a benchmark will not involve the realistic execution path of the allocators.

Modifications:

Keep the last 8192 allocations and release them randomly.

Result:

We are now getting the result close to what we got with caliper.
2014-05-29 19:51:13 +09:00
Trustin Lee
5177f963e7 Upgrade to netty-tcnative 1.1.30.Fork2 to support Windows 2014-05-28 10:56:23 +09:00
Trustin Lee
ba92cd4c3e Work around the system configuration issue that causes NioSocketChannelTest to fail
Motivation:

On some ill-configured systems, InetAddress.getLocalHost() fails.  NioSocketChannelTest calls java.net.Socket.connect() and it internally invoked InetAddress.getLocalHost(), which causes the test failures in NioSocketChannelTes on such an ill-configured system.

Modifications:

Use NetUtil.LOCALHOST explicitly.

Result:

NioSocketChannelTest should not fail anymore.
2014-05-28 09:40:59 +09:00
Trustin Lee
0b5e1ced97 Upgrade os-maven-plugin to fix an issue with IntelliJ IDEA on Windows 2014-05-27 04:42:27 +09:00