Motivation:
Base64#decode4to3 generally calculates an int value where the contents of the decodabet straddle bytes, and then uses a byte shifting or a full byte swapping operation to get the resulting contents. We can directly calculate the contents and avoid any intermediate int values and full byte swap operations. This will reduce the number of operations required during the decode operation.
Modifications:
- remove the intermediate int in the Base64#decond4to3 method.
- manually do the byte shifting since we are already doing bit/byte manipulations here anyways.
Result:
Base64#decode4to3 requires less operations to compute the end result.
Motivation:
The decode and encode method uses getByte(...) and setByte(...) in loops which can be very expensive because of bounds / reference-count checking. Beside this it also slows-down a lot when paranoid leak-detection is enabled as it will track each access.
Modifications:
- Pack bytes into int / short and so reduce operations on the ByteBuf
- Use ByteBufProcessor to reduce getByte calls.
Result:
Better performance in general. Also when you run the build with -Pleak the handler module will build in 1/4 of the time it took before.
Motivation:
We have our own ThreadLocalRandom implementation to support older JDKs . That said we should prefer the JDK provided when running on JDK >= 7
Modification:
Using ThreadLocalRandom implementation of the JDK when possible.
Result:
Make use of JDK implementations when possible.
Motivation:
To use jboss-marshalling extra command-line arguments are needed on JDK9+ as it makes use of reflection internally.
Modifications:
Skip jboss-marshalling tests when running on JDK9+ and init of MarshallingFactory fails.
Result:
Be able to build on latest JDK9 release.
Motivation:
We used various mocking frameworks. We should only use one...
Modifications:
Make usage of mocking framework consistent by only using Mockito.
Result:
Less dependencies and more consistent mocking usage.
Motivation:
Thought there may be a bug so added a testcase to verify everything works as expected.
Modifications:
Added testcase
Result:
More test-coverage.
Motiviation:
We used ReferenceCountUtil.releaseLater(...) in our tests which simplifies a bit the releasing of ReferenceCounted objects. The problem with this is that while it simplifies stuff it increase memory usage a lot as memory may not be freed up in a timely manner.
Modifications:
- Deprecate releaseLater(...)
- Remove usage of releaseLater(...) in tests.
Result:
Less memory needed to build netty while running the tests.
Motivation:
Netty provides a adaptor from ByteBuf to Java's InputStream interface. The JDK Stream interfaces have an explicit lifetime because they implement the Closable interface. This lifetime may be differnt than the ByteBuf which is wrapped, and controlled by the interface which accepts the JDK Stream. However Netty's ByteBufInputStream currently does not take reference count ownership of the underlying ByteBuf. There may be no way for existing classes which only accept the InputStream interface to communicate when they are done with the stream, other than calling close(). This means that when the stream is closed it may be appropriate to release the underlying ByteBuf, as the ownership of the underlying ByteBuf resource may be transferred to the Java Stream.
Motivation:
- ByteBufInputStream.close() supports taking reference count ownership of the underyling ByteBuf
Result:
ByteBufInputStream can assume reference count ownership so the underlying ByteBuf can be cleaned up when the stream is closed.
Motivation:
ObjectOutputStream uses a Channel Attribute to cache a ObjectOutputStream which is backed by a ByteBuf that may be released after an object is encoded and the underlying buffer is written to the channel. On subsequent encode operations the cached ObjectOutputStream will be invalid and lead to a reference count exception.
Modifications:
- CompatibleObjectEncoder should not cache a ObjectOutputStream.
Result:
CompatibleObjectEncoder doesn't use a cached object backed by a released ByteBuf.
Motivation:
JCTools supports both non-unsafe, unsafe versions of queues and JDK6 which allows us to shade the library in netty-common allowing it to stay "zero dependency".
Modifications:
- Remove copy paste JCTools code and shade the library (dependencies that are shaded should be removed from the <dependencies> section of the generated POM).
- Remove usage of OneTimeTask and remove it all together.
Result:
Less code to maintain and easier to update JCTools and less GC pressure as the queue implementation nt creates so much garbage
Motivation:
We should ensure we null out the cumulation buffer before we fire it through the pipleine in handlerRemoved(...) as in theory it could be possible that another method is triggered as result of the fireChannelRead(...) or fireChannelReadComplete() that will try to access the cumulation.
Modifications:
Null out cumulation buffer early in handlerRemoved(...)
Result:
No possible to access the cumulation buffer that was already handed over.
Motivation:
At the moment the user is responsible to increase the writer index of the composite buffer when a new component is added. We should add some methods that handle this for the user as this is the most popular usage of the composite buffer.
Modifications:
Add new methods that autoamtically increase the writerIndex when buffers are added.
Result:
Easier usage of CompositeByteBuf.
Motivation:
99dfc9ea79 introduced some code that will more frequently try to forward messages out of the list of decoded messages to reduce latency and memory footprint. Unfortunally this has the side-effect that RecycleableArrayList.clear() will be called more often and so introduce some overhead as ArrayList will null out the array on each call.
Modifications:
- Introduce a CodecOutputList which allows to not null out the array until we recycle it and also allows to access internal array with extra range checks.
- Add benchmark that add elements to different List implementations and clear them
Result:
Less overhead when decode / encode messages.
Benchmark (elements) Mode Cnt Score Error Units
CodecOutputListBenchmark.arrayList 1 thrpt 20 24853764.609 ± 161582.376 ops/s
CodecOutputListBenchmark.arrayList 4 thrpt 20 17310636.508 ± 930517.403 ops/s
CodecOutputListBenchmark.codecOutList 1 thrpt 20 26670751.661 ± 587812.655 ops/s
CodecOutputListBenchmark.codecOutList 4 thrpt 20 25166421.089 ± 166945.599 ops/s
CodecOutputListBenchmark.recyclableArrayList 1 thrpt 20 24565992.626 ± 210017.290 ops/s
CodecOutputListBenchmark.recyclableArrayList 4 thrpt 20 18477881.775 ± 157003.777 ops/s
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 246.748 sec - in io.netty.handler.codec.CodecOutputListBenchmark
Motivation:
b112673554 added ChannelInputShutdownEvent support to ByteToMessageDecoder but missed updating the code for ReplayingDecoder. This has the effect:
- If a ChannelInputShutdownEvent is fired ByteToMessageDecoder (the super-class of ReplayingDecoder) will call the channelInputClosed(...) method which will pass the incorrect buffer to the decode method of ReplayingDecoder.
Modifications:
Share more code between ByteToMessageDEcoder and ReplayingDecoder and so also support ChannelInputShutdownEvent correctly in ReplayingDecoder
Result:
ChannelInputShutdownEvent is corrrectly handle in ReplayingDecoder as well.
We need to check if this handler was removed before continuing with decoding.
If it was removed, it is not safe to continue to operate on the buffer. This was already fixed for ByteToMessageDecoder in 4cdbe39284 but missed for ReplayingDecoder.
Modifications:
Check if decoder was removed after fire messages through the pipeline.
Result:
No illegal buffer access when decoder was removed.
Motivation:
We use ByteBuf.readBytes(int) in various places where we could either remove it completely or use readSlice(int).retain().
Modifications:
- Remove ByteBuf.readBytes(int) when possible or replace by readSlice(int).retain().
Result:
Faster code.
Motivation:
If the input buffer is empty we should not have decodeLast(...) call decode(...) as the user may not expect this.
Modifications:
- Not call decode(...) in decodeLast(...) if the input buffer is empty.
- Add testcases.
Result:
decodeLast(...) will not call decode(...) if input buffer is empty.
Motivation:
Each call of ByteBuf.getByte(int) method does boundary checking. This can be eliminated by using ByteBuf.forEachByte(ByteProcessor) method and ByteProcessor.FIND_LF processor.
Modifications:
Find end of line with ByteProcessor.FIND_LF
Result:
A little better performance of LineBasedFrameDecoder.
Motivation:
ByteToMessageDecoder must ensure that read components of the CompositeByteBuf can be discard by default when discardSomeReadBytes() is called. This may not be the case before as because of the default maxNumComponents that will cause consolidation.
Modifications:
Ensure we not do any consolidation to actually be abel to discard read components
Result:
Less memory usage and allocations.
Motivation:
b714297a44 introduced ChannelInputShutdownEvent support for HttpObjectDecoder. However this should have been added to the super class ByteToMessageDecoder, and ByteToMessageDecoder should not propegate a channelInactive event through the pipeline in this case.
Modifications:
- Move the ChannelInputShutdownEvent handling from HttpObjectDecoder to ByteToMessageDecoder
- ByteToMessageDecoder doesn't call ctx.fireChannelInactive() on ChannelInputShutdownEvent
Result:
Half closed events are treated more generically, and don't get translated into a channelInactive pipeline event.