Motivation:
We create a new CompactObjectInputStream with ByteBufInputStream in ObjectDecoder.decode(...) method and don't close this InputStreams before return statement.
Modifications:
Save link to the ObjectInputStream and close it before return statement.
Result:
Close InputStreams and clean up unused resources. It will be better for GC.
Motivation:
In the previous fix for #2667 I did introduce a bit overhead by calling setEpollOut() too often.
Modification:
Only call setEpollOut() if really needed and remove unused code.
Result:
Less overhead when saturate network.
Motivation:
As a DatagramChannel supports to write to multiple remote peers we must not close the Channel once a IOException accours as this error may be only valid for one remote peer.
Modification:
Continue writing on IOException.
Result:
DatagramChannel can be used even after an IOException accours during writing.
Motivation:
We need to continue write until we hit EAGAIN to make sure we not see an starvation
Modification:
Write until EAGAIN is returned
Result:
No starvation when using native transport with ET.
Motivation:
Because of a missing return statement we may produce a NPE when try to fullfill the connect ChannelPromise when it was fullfilled before.
Modification:
Add missing return statement.
Result:
No more NPE.
Motivation:
The handling of IOV_MAX was done in JNI code base which makes stuff really complicated to maintain etc.
Modifications:
Move handling of IOV_MAX to java code to simplify stuff
Result:
Cleaner code.
Motivation:
In our nio implementation we use write-spinning for maximize throughput, but in the native implementation this is not used.
Modification:
Respect writeSpinCount in native transport.
Result:
Better throughput
Motivation:
When we receive an incomplete WebSocketFrame we need to make sure to wait for more data. Because we not did this we could produce a NPE.
Modification:
Make sure we not try to add null into the RecyclableArrayList
Result:
no more NPE on incomplete frames.
Motivation:
Currently we do 4 ByteBuf.writeBytes(...) calls per header line. This is can be improved.
Modification:
Introduce two new HttpHeaders methods to allow create HttpHeaderEntity which contains the separator. With this we can minimize it to 2 ByteBuf.writeBytes(...) calls per header line
Result:
Performance improvement.
Motivation:
I introduced ensureAccessible() class as part of 6c47cc9711 in some places. Unfortunally I also added some where these are not needed and so caused a performance regression.
Modification:
Remove calls where not needed.
Result:
Fixed performance regression.
Motivation:
I introduced range checks as part of 6c47cc9711 in some places. Unfortunally I also added some where these are not needed and so caused a performance regression.
Modification:
Remove range checks where not needed
Result:
Fixed performance regression.
Motivations:
In our new version of HWT we used some kind of lazy cancelation of timeouts by put them back in the queue and let them pick up on the next tick. This multiple problems:
- we may corrupt the MpscLinkedQueue if the task is used as tombstone
- this sometimes lead to an uncessary delay especially when someone did executed some "heavy" logic in the TimeTask
Modifications:
Use a Lock per HashedWheelBucket for save and fast removal.
Modifications:
Cancellation of tasks can be done fast and so stuff can be GC'ed and no more infinite-loop possible
Motivation:
HTTP header validation can be expensive so we should allow to disable it like we do in HttpObjectDecoder.
Modification:
Add constructor argument to disable validation.
Result:
Performance improvement
Motivation:
HttpObjectAggregator currently creates a new FullHttpResponse / FullHttpRequest for each message it needs to aggregate. While doing so it also creates 2 DefaultHttpHeader instances (one for the headers and one for the trailing headers). This is bad for two reasons:
- More objects are created then needed and also populate the headers is not for free
- Headers may get validated even if the validation was disabled in the decoder
Modification:
- Wrap the previous created HttpResponse / HttpRequest and so reuse the original HttpHeaders
- Reuse the previous created trailing HttpHeader.
- Fix a bug where the trailing HttpHeader was incorrectly mixed in the headers.
Result:
- Less GC
- Faster HttpObjectAggregator implementation
Motivation:
It's not always the case that there is another handler in the pipeline that will intercept the exceptionCaught event because sometimes users just sub-class. In this case the exception will just hit the end of the pipeline.
Modification:
Throw the TooLongFrameException so that sub-classes can handle it in the exceptionCaught(...) method directly.
Result:
Sub-classes can correctly handle the exception,
Motivation:
epoll transport fails on gathering write of more then 1024 buffers. As linux supports max. 1024 iov entries when calling writev(...) the epoll transport throws an exception.
Thanks again to @blucas to provide me with a reproducer and so helped me to understand what the issue is.
Modifications:
Make sure we break down the writes if to many buffers are uses for gathering writes.
Result:
Gathering writes work with any number of buffers
Motivation:
CompositeByteBuf.deallocate generates unnecessary GC pressure when using the 'foreach' loop, as a 'foreach' loop creates an iterator when looping.
Modification:
Convert 'foreach' loop into regular 'for' loop.
Result:
Less GC pressure (and possibly more throughput) as the 'for' loop does not create an iterator
Motivation:
When an exception is thrown during try to send DatagramPacket or SctpMessage a buffer may leak.
Modification:
Correctly handle allocated buffers in case of exception
Result:
No more leaks
Motivation:
HttpOrSpdyChooser can be simplified so the user not need to implement getProtocol(...) method.
Modification:
Add implementation for the method. The user can override it if necessary.
Result:
Easier usage of HttpOrSpdyChooser.
Motivation:
When a bind fails AbstractBootstrap will use the GlobalEventExecutor to notify the ChannelPromise. We should use the EventLoop of the Channel if possible.
Modification:
Use EventLoop of the Channel if possible to use the correct Thread to notify and so guaranteer the right order of events.
Result:
Use the correct EventLoop for notification
Motivation:
OkResponseHandler is the last handler in the pipeline of the HTTP CORS
example. It is responsible for releasing all messages it handled.
Modification:
Extend SimpleChannelInboundHandler instead of
ChannelInboundHandlerAdapter
Result:
Fixed a leak
Motivation:
AbstractByteBufTest.testInternalBuffer() uses writeByte() operations to
populate the sample data. Usually, this isn't a problem, but it starts
to take a lot of time when the resource leak detection level gets
higher.
In our CI machine, testInternalBuffer() takes more than 30 minutes,
causing the build timeout when the 'leak' profile is active (paranoid
level resource detection.)
Modification:
Populate the sample data using ThreadLocalRandom.nextBytes() instead of
using millions of writeByte() operations.
Result:
Test runs much faster when leak detection level is high.
Motivation:
Currently it is impossible to build netty on linux system that not define SO_REUSEPORT even if it is supported.
Modification:
Define SO_REUSEPORT if not defined.
Result:
Possible to build on more linux dists.
Motivation:
We use the nanoTime of the scheduledTasks to calculate the milli-seconds to wait for a select operation to select something. Once these elapsed we check if there was something selected or some task is ready for processing. Unfortunally we not take into account scheduled tasks here so the selection loop will continue if only scheduled tasks are ready for processing. This will delay the execution of these tasks.
Modification:
- Check if a scheduled task is ready after selecting
- also make a tiny change in NioEventLoop to not trigger a rebuild if nothing was selected because the timeout was reached a few times in a row.
Result:
Execute scheduled tasks on time.
Motivation:
When we do a (env*)->GetObjectArrayElement(...) call we may created many local references which will only be cleaned up once we exist the native method. Thus a lot of memory can be used and so a StackOverFlow may be triggered. Beside this the JNI specification only say that an implementation must cope with 16 local references.
Modification:
Call (env*)->ReleaseLocalRef(...) to release the resource once not needed anymore.
Result:
Less memory usage and guard against StackOverflow
Motivation:
Because of how we use reference counting we need to check for the reference count before each operation that touches the underlying memory. This is especially true as we use sun.misc.Cleaner.clean() to release the memory ASAP when possible. Because of this the user may cause a SEGFAULT if an operation is called that tries to access the backing memory after it was released.
Modification:
Correctly check the reference count on all methods that access the underlying memory or expose it via a ByteBuffer.
Result:
Safer usage of ByteBuf
Motivation:
DecodeResult is dropped when aggregate HTTP messages.
Modification:
Make sure we not drop the DecodeResult while aggregate HTTP messages.
Result:
Correctly include the DecodeResult for later processing.
Motivation:
When a select rebuild was triggered the reference to the SelectionKey is not updated in AbstractNioChannel. This will cause a CancelledKeyException later.
Modification:
Correctly update SelectionKey reference after rebuild
Result:
Fix exception
Revert the removal of 'get' prefix from HTTP classes to ensure ABI
compatibility. Note that this commit does not revert the changes in
SPDY, which is considered experimental.
Motivation:
When a user tries to use netty on android it currently fails with "Could not find class 'sun.misc.Cleaner'"
Modification:
Encapsulate sun.misc.Cleaner usage in extra class to workaround this isssue.
Result:
Netty can be used on android again
Motivation:
At the moment there is no simple way for a user to check if the native epoll transport can be used on the running platform. Thus the user can only try to instance it and catch any exception and fallback to nio transport.
Modification:
Add Epoll.isAvailable() which allows to check if epoll can be used.
Result:
User can easily check if epoll transport can be used or not
Motivation:
bytesBefore(length, ...), bytesBefore(index, length, ...), and
indexOf(fromIndex, toIndex,...) in ReplayingDecoderBuffer are buggy.
They trigger 'REPLAY even when they don't need to.
Modification:
Implement the buggy methods properly so that REPLAYs are not triggered
unnecessarily.
Result:
Correct behvaior
Motivation:
When using openjdk and oracle jdk's nio (while using the nio transport) the ServerSocketChannel uses SO_REUSEADDR by default. Our native transport should do the same to make it easier to switch between the different implementations and get the expected result.
Modification:
Change EpollServerSocketChannelConfig to set SO_REUSEADDR on the created socket.
Result:
SO_REUSEADDR is used by default on servers.
Motivation:
At the moment we use a lot of unnecessary memory copies in JdkZlibEncoder. This is caused by either allocate a to small ByteBuf and expand it later or using a temporary byte array.
Beside this the memory footprint of JdkZlibEncoder is pretty high because of the byte[] used for compressing.
Modification:
- Override allocateBuffer(...) and calculate the estimatedsize in there, this reduce expanding of the ByteBuf later
- Not use byte[] in the instance itself but allocate a heap ByteBuf and write directly into the byte array
Result:
Less memory copies and smaller memory footprint
- Using short[] for memoryMap did not improve performance. Reverting
back to the original dual-byte[] structure in favor of simplicity.
- Optimize allocateRun() which yields small performence improvement
- Use local variable when member fields are accessed more than once
Motivation:
Depth-first search is not always efficient for buddy allocation.
Modification:
Employ a new faster search algorithm with different memoryMap layout.
Result:
With thread-local cache disabled, we see a lot of performance
improvment, especially when the size of the allocation is as small as
the page size, which had the largest search space previously.