netty5

Author	SHA1	Message	Date
Chris Vest	a3d5617d45	Remove deprecated stuff around ResourceLeakDetector (#11572 ) Motivation: A number of classes and APIs around the ResourceLeakDetector have been deprecated for removal in Netty 5.x, because better alternatives exist. Modification: Remove everything in and around ResourceLeakDetector that is deprecated, and fix the few usages that were found. Result: Less deprecated code.	2021-08-11 21:41:49 +02:00
Aayush Atharva	99bd5895dc	Inline variables to make code more readable (#11565 ) Motivation: There are lots of redundant variable declarations which should be inlined to make the code look better. Modification: Inlined the variables. Result: Fewer redundant variables and more readable code.	2021-08-11 17:07:32 +02:00
Chris Vest	6a92a3354e	Use the standard `japicmp.skip` instead of the custom `skipJapicmp` (#11558 )	2021-08-10 09:07:04 +02:00
Aayush Atharva	25a0a6d425	Make variables final (#11548 ) Motivation: We should make variables that are never reassigned `final`, to match the code style and make the code look better. Modification: Made a couple of variables `final`. Result: Variables marked as `final`.	2021-08-06 09:28:12 +02:00
Aayush Atharva	b700793951	Remove Unused Imports (#11546 ) Motivation: There are lots of imports which are unused. We should get rid of them to make the code look better. Modification: Removed unused imports. Result: No unused imports.	2021-08-05 14:08:07 +02:00
Chris Vest	6b11f7fbc2	All *Bootstrap methods that used to return ChannelFuture now return Future<Channel> (#11517 ) Bootstrap methods now return Future<Channel> instead of ChannelFuture Motivation: In #8516 it was proposed to at some point remove the specialised ChannelFuture and ChannelPromise. Or at least make them not extend Future and Promise, respectively. One pain point encountered in this discussion is the need to get access to the channel object after it has been initialised, but without waiting for the channel registration to propagate through the pipeline. Modification: Add a Bootstrap.createUnregistered method, which will return a Channel directly. All other Bootstrap methods that previously returned ChannelFuture now return Future<Channel> Result: It's now possible to obtain an initialised but unregistered channel from a bootstrap, without blocking. And the other bootstrap methods now only release their channels through the result of their futures, preventing racy access to the channels.	2021-08-03 19:43:38 +02:00
Norman Maurer	1f6577ee92	Remove rest of junit4 usage (#11484 ) Motivation: We did migrate all these modules to junit5 before but missed a few usages of junit4 Modifications: Replace all junit4 imports by junit5 apis Result: Part of https://github.com/netty/netty/issues/10757	2021-07-13 21:00:53 +02:00
Norman Maurer	6ac8ef54f7	Remove `throws Exception` from `ChannelHandler` methods that handle o… (#11417 ) Motivation: At the moment all methods in `ChannelHandler` declare `throws Exception` as part of their method signature. While this is fine for methods that handle inbound events, it is quite confusing for methods that handle outbound events. This is due to the fact that these methods also take a `ChannelPromise`, which actually needs to be fulfilled to signal back either success or failure. Defining `throws...` for these methods is confusing at best. We should just always require the implementation to use the passed-in promise to signal back success or failure. Doing so also clears up semantics in general. Because we can't "forbid" throwing `RuntimeException`, we still need to handle this in some way though. In this case we should just consider it a "bug" and so log it and close the `Channel` in question. The user should never have an exception "escape" their implementation and should just use the promise. This also clears up the ownership of the passed-in message etc. As `flush(ChannelHandlerContext)` and `read(ChannelHandlerContext)` don't take a `ChannelPromise` as argument, this also means that these methods can never produce an error. This makes sense, as these really are just "signals" for the underlying transports to do something. For `RuntimeException` the same rule is used as for other outbound event handling methods, which is logging and closing the `Channel`. Modifications: - Remove `throws Exception` from signature - Adjust code to not throw and just notify the promise directly - Adjust unit tests Result: Much cleaner API and semantics.	2021-07-08 10:16:00 +02:00
Norman Maurer	dbdf9f16c2	Migrate microbenchmark to junit5 (#11440 ) (#11443 ) Motivation: We should update to use junit5 in all modules. Modifications: Adjust microbenchmark to use junit5 Result: Part of https://github.com/netty/netty/issues/10757	2021-07-02 08:05:18 +02:00
Norman Maurer	07baabaac5	Remove ProgressivePromise / ProgressiveFuture (#11374 ) Motivation: This special case implementation of Promise / Future requires the implementations responsible for completing the promise to have knowledge of this class to provide value. It also requires that the implementations are able to provide intermediate status while the work is being done. Even throughout the core of Netty it is not really supported most of the time, and so it just brings more complexity without real gain. Let's remove it completely, which is better than only supporting it sometimes. Modifications: Remove Progressive* API Result: Code cleanup.... Fixes https://github.com/netty/netty/issues/8519	2021-06-09 08:32:38 +02:00
Norman Maurer	abdaa769de	Remove VoidPromise (#11348 ) Motivation: Sometime in the past we introduced the concept of VoidPromise. As it turned out this was not a good idea at all, as basically each handler in the pipeline needs to be very careful to correctly handle this. We should better just remove this "optimization". Modifications: - Remove VoidPromise and all the related APIs - Remove tests which were related to VoidPromise Result: Less error-prone API	2021-06-08 14:22:16 +02:00
Boris Unckel	0e8f5c5f7c	Utilize i.n.u.internal.ObjectUtil to assert Preconditions (misc) (#11170 ) (#11186 ) Motivation: NullChecks resulting in a NullPointerException or IllegalArgumentException, numeric ranges (>0, >=0) checks, not empty strings/arrays checks must never be anonymous but with the parameter or variable name which is checked. They must be specific and should not be done with an "OR-Logic" (if a == null || b == null) throw new NullPointerEx. Modifications: * import static relevant checks * Replace manual checks with ObjectUtil methods Result: All checks needed are done with ObjectUtil, some exception texts are improved in microbench and resolver-dns Fixes #11170	2021-04-22 17:50:36 +02:00
Frédéric Brégier	d421ae10d7	Minimize get byte multipart and fix buffer reuse (#11001 ) Motivation: - Underlying buffer usages might be erroneous when releasing them internally in HttpPostMultipartRequestDecoder. Two bugs occur: 1) Final File upload seems not to be of the right size. 2) Memory, even in Disk mode, is increasing continuously, while it shouldn't. - Method `getByte(position)` is called too often within the current implementation of the HttpPostMultipartRequestDecoder. This implies too much activity, which is visible when PARANOID mode is active. This is also true in standard mode. Apply the buffer fix previously made in HttpPostMultipartRequestDecoder to HttpPostStandardRequestDecoder as well. Finally, in order to ensure we do not rewrite already decoded HttpData when decoding the next ones within multipart, we must ensure the buffers are copied and not a retained slice. Modification: - Add some tests to check consistency for HttpPostMultipartRequestDecoder. Add a package protected method for testing purposes only. - Use the `bytesBefore(...)` method instead of `getByte(pos)` in order to limit the external access to the underlying buffer by retrieving iteratively the beginning of a correct start position. It is used to find both LF/CRLF and delimiter. Two methods in HttpPostBodyUtil were created for that. The undecodedChunk is copied when a chunk is added to a DataMultipart. The same buffer is also rewritten in order to release the copied memory part. Result: Just for note, for both Memory or Disk or Mixed mode factories, the release has to be done as: for (InterfaceHttpData httpData: decoder.getBodyHttpDatas()) { httpData.release(); factory.removeHttpDataFromClean(request, httpData); } factory.cleanAllHttpData(); decoder.destroy(); The memory used is minimal in Disk or Mixed mode. In Memory mode, a big file is still in memory, no longer in the undecodedChunk but in its own buffer (copied). 
In terms of benchmarking, the results are: Original code Benchmark Mode Cnt Score Error Units HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigAdvancedLevel thrpt 6 0,152 ± 0,100 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigDisabledLevel thrpt 6 0,543 ± 0,218 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigParanoidLevel thrpt 6 0,001 ± 0,001 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigSimpleLevel thrpt 6 0,615 ± 0,070 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighAdvancedLevel thrpt 6 0,114 ± 0,063 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighDisabledLevel thrpt 6 0,664 ± 0,034 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighParanoidLevel thrpt 6 0,001 ± 0,001 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighSimpleLevel thrpt 6 0,620 ± 0,140 ops/ms New code Benchmark Mode Cnt Score Error Units HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigAdvancedLevel thrpt 6 4,037 ± 0,358 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigDisabledLevel thrpt 6 4,226 ± 0,471 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigParanoidLevel thrpt 6 0,875 ± 0,029 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigSimpleLevel thrpt 6 4,346 ± 0,275 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighAdvancedLevel thrpt 6 2,044 ± 0,020 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighDisabledLevel thrpt 6 2,278 ± 0,159 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighParanoidLevel thrpt 6 0,174 ± 0,004 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighSimpleLevel thrpt 6 2,370 ± 0,065 ops/ms In short, using big file transfers, this is about 7 times faster with new code, while using high 
number of HttpData, this is about 4 times faster with new code when using Simple Level. When using Paranoid Level, using big file transfers, this is about 800 times faster with new code, while using high number of HttpData, this is about 170 times faster with new code.	2021-02-26 14:26:24 +01:00
Francesco Nigro	517da28740	DecodeHexBenchmark is too branch-predictor friendly (#9942 ) Motivation: DecodeHexBenchmark needs to be less branch-predictor friendly to mimic the "real" behaviour while decoding Modifications: DecodeHexBenchmark uses a larger set of inputs, picking them at random on each iteration, and the benchmarked method is made non-inlineable Result: DecodeHexBenchmark is more trustworthy while showing the performance difference between different decoding methods	2021-02-05 15:28:25 +01:00
Francesco Nigro	5337d3eeb4	Implement SWAR indexOf byte search (#10737 ) Motivation: Faster indexOf Modification: Create generic SWAR indexOf that any ByteBuf implementation can use Result: Fixes #10731	2021-01-15 15:09:50 +01:00
Oleksii Kachaiev	85ec20ecbd	Improve performance of HPACK static table lookup (#10840 ) Motivation: HPACK static table is organized in a way that fields with the same name are sequential. This means that when doing a sequential scan we can short-circuit the scan on a name mismatch. Modifications: * `HpackStaticTable.getIndexInsensitive` returns -1 on name mismatch rather than keep scanning. * `HpackStaticTable` statically defines the max position in the array where name duplication is possible (after the given index there's no need to check for other fields with the same name) * Benchmark for different lookup patterns Result: Better HPACK static table lookup performance. Co-authored-by: Norman Maurer <norman_maurer@apple.com>	2020-12-21 15:34:31 +01:00
Andrey Mizurov	2877eef5d5	Provide ability to extend StompSubframeEncoder and improve full stomp frame encoding (allocate one buffer for full frame considering the size of the headers) (#10778 ) Motivation: At the moment `StompSubframeEncoder` encodes a frame only to `ByteBuf`, which is inconvenient if we later need to convert it to another type of message, e.g. `WebSocketFrame`. Also, if we send a full frame, it is split into two parts, the headers and the content, which makes it difficult to convert in the next handler. Modification: Introduce additional converter methods (e.g. `protected Object convertFullFrame(StompFrame original, ByteBuf encoded)`, ...) for extending encoder functionality and allocate only one `ByteBuf` for a full stomp frame. Change the headers size calculation; previously only 256 bytes were used, which forced reallocation of a new buffer each time the headers size exceeded this threshold. Add `StompEncoderBenchmark`. Result: Improved `StompSubframeEncoder` for extensions. Previous version benchmark ``` Benchmark (contentLength) (headersType) (pooledAllocator) Mode Cnt Score Error Units StompEncoderBenchmark.writeStompFrame 0 ONE true thrpt 10 4432132.884 ± 178923.436 ops/s StompEncoderBenchmark.writeStompFrame 0 ONE false thrpt 10 1281122.756 ± 52484.174 ops/s StompEncoderBenchmark.writeStompFrame 0 THREE true thrpt 10 2980897.937 ± 130253.049 ops/s StompEncoderBenchmark.writeStompFrame 0 THREE false thrpt 10 1116883.574 ± 35471.482 ops/s StompEncoderBenchmark.writeStompFrame 0 SEVEN true thrpt 10 1988012.159 ± 74352.450 ops/s StompEncoderBenchmark.writeStompFrame 0 SEVEN false thrpt 10 881772.343 ± 94633.870 ops/s StompEncoderBenchmark.writeStompFrame 0 ELEVEN true thrpt 10 1048125.919 ± 151053.902 ops/s StompEncoderBenchmark.writeStompFrame 0 ELEVEN false thrpt 10 429900.066 ± 47956.661 ops/s StompEncoderBenchmark.writeStompFrame 0 TWENTY true thrpt 10 660584.122 ± 104973.439 ops/s StompEncoderBenchmark.writeStompFrame 0 TWENTY false thrpt 10 278255.488 ± 20143.708 ops/s 
StompEncoderBenchmark.writeStompFrame 10 ONE true thrpt 10 4251498.549 ± 625050.979 ops/s StompEncoderBenchmark.writeStompFrame 10 ONE false thrpt 10 1214006.861 ± 60421.601 ops/s StompEncoderBenchmark.writeStompFrame 10 THREE true thrpt 10 3117736.486 ± 173613.974 ops/s StompEncoderBenchmark.writeStompFrame 10 THREE false thrpt 10 1046605.891 ± 94428.064 ops/s StompEncoderBenchmark.writeStompFrame 10 SEVEN true thrpt 10 2006986.881 ± 108456.748 ops/s StompEncoderBenchmark.writeStompFrame 10 SEVEN false thrpt 10 877983.112 ± 82919.387 ops/s StompEncoderBenchmark.writeStompFrame 10 ELEVEN true thrpt 10 1132844.437 ± 84578.571 ops/s StompEncoderBenchmark.writeStompFrame 10 ELEVEN false thrpt 10 429334.649 ± 35403.161 ops/s StompEncoderBenchmark.writeStompFrame 10 TWENTY true thrpt 10 657093.390 ± 48092.947 ops/s StompEncoderBenchmark.writeStompFrame 10 TWENTY false thrpt 10 252140.876 ± 37337.255 ops/s StompEncoderBenchmark.writeStompFrame 100 ONE true thrpt 10 4720507.067 ± 100993.908 ops/s StompEncoderBenchmark.writeStompFrame 100 ONE false thrpt 10 1266182.925 ± 85888.413 ops/s StompEncoderBenchmark.writeStompFrame 100 THREE true thrpt 10 2898746.621 ± 452579.753 ops/s StompEncoderBenchmark.writeStompFrame 100 THREE false thrpt 10 1019555.288 ± 65640.507 ops/s StompEncoderBenchmark.writeStompFrame 100 SEVEN true thrpt 10 2259187.459 ± 20025.989 ops/s StompEncoderBenchmark.writeStompFrame 100 SEVEN false thrpt 10 896405.412 ± 53750.148 ops/s StompEncoderBenchmark.writeStompFrame 100 ELEVEN true thrpt 10 1110670.772 ± 107650.327 ops/s StompEncoderBenchmark.writeStompFrame 100 ELEVEN false thrpt 10 445187.398 ± 28845.959 ops/s StompEncoderBenchmark.writeStompFrame 100 TWENTY true thrpt 10 611506.846 ± 25304.240 ops/s StompEncoderBenchmark.writeStompFrame 100 TWENTY false thrpt 10 247687.007 ± 43471.578 ops/s StompEncoderBenchmark.writeStompFrame 1000 ONE true thrpt 10 4140949.576 ± 270274.087 ops/s StompEncoderBenchmark.writeStompFrame 1000 ONE false thrpt 10 
1154515.598 ± 134413.876 ops/s StompEncoderBenchmark.writeStompFrame 1000 THREE true thrpt 10 3349996.875 ± 162309.889 ops/s StompEncoderBenchmark.writeStompFrame 1000 THREE false thrpt 10 1141040.562 ± 5895.693 ops/s StompEncoderBenchmark.writeStompFrame 1000 SEVEN true thrpt 10 2184632.248 ± 8957.833 ops/s StompEncoderBenchmark.writeStompFrame 1000 SEVEN false thrpt 10 959545.704 ± 5835.161 ops/s StompEncoderBenchmark.writeStompFrame 1000 ELEVEN true thrpt 10 1081113.327 ± 3957.527 ops/s StompEncoderBenchmark.writeStompFrame 1000 ELEVEN false thrpt 10 467524.660 ± 1383.236 ops/s StompEncoderBenchmark.writeStompFrame 1000 TWENTY true thrpt 10 568411.797 ± 108712.493 ops/s StompEncoderBenchmark.writeStompFrame 1000 TWENTY false thrpt 10 260764.231 ± 43149.129 ops/s StompEncoderBenchmark.writeStompFrame 10000 ONE true thrpt 10 4369787.147 ± 619367.939 ops/s StompEncoderBenchmark.writeStompFrame 10000 ONE false thrpt 10 1246782.845 ± 47468.764 ops/s StompEncoderBenchmark.writeStompFrame 10000 THREE true thrpt 10 3333328.810 ± 253061.481 ops/s StompEncoderBenchmark.writeStompFrame 10000 THREE false thrpt 10 1108278.988 ± 81905.149 ops/s StompEncoderBenchmark.writeStompFrame 10000 SEVEN true thrpt 10 2062961.266 ± 247096.284 ops/s StompEncoderBenchmark.writeStompFrame 10000 SEVEN false thrpt 10 925199.985 ± 36734.594 ops/s StompEncoderBenchmark.writeStompFrame 10000 ELEVEN true thrpt 10 1223240.034 ± 58833.801 ops/s StompEncoderBenchmark.writeStompFrame 10000 ELEVEN false thrpt 10 460864.117 ± 2361.459 ops/s StompEncoderBenchmark.writeStompFrame 10000 TWENTY true thrpt 10 655864.762 ± 35237.335 ops/s StompEncoderBenchmark.writeStompFrame 10000 TWENTY false thrpt 10 286388.865 ± 1002.460 ops/s ``` A new version benchmark ``` Benchmark (contentLength) (headersType) (pooledAllocator) Mode Cnt Score Error Units StompEncoderBenchmark.writeStompFrame 0 ONE true thrpt 10 4366110.018 ± 420377.867 ops/s StompEncoderBenchmark.writeStompFrame 0 ONE false thrpt 10 1289437.153 ± 
215271.656 ops/s StompEncoderBenchmark.writeStompFrame 0 THREE true thrpt 10 2818791.355 ± 218894.471 ops/s StompEncoderBenchmark.writeStompFrame 0 THREE false thrpt 10 1040151.615 ± 75352.695 ops/s StompEncoderBenchmark.writeStompFrame 0 SEVEN true thrpt 10 1842144.001 ± 94668.864 ops/s StompEncoderBenchmark.writeStompFrame 0 SEVEN false thrpt 10 916742.825 ± 65467.820 ops/s StompEncoderBenchmark.writeStompFrame 0 ELEVEN true thrpt 10 1310454.012 ± 100747.490 ops/s StompEncoderBenchmark.writeStompFrame 0 ELEVEN false thrpt 10 679934.001 ± 82168.249 ops/s StompEncoderBenchmark.writeStompFrame 0 TWENTY true thrpt 10 746867.549 ± 68373.269 ops/s StompEncoderBenchmark.writeStompFrame 0 TWENTY false thrpt 10 483316.314 ± 50978.009 ops/s StompEncoderBenchmark.writeStompFrame 10 ONE true thrpt 10 4791698.722 ± 263890.510 ops/s StompEncoderBenchmark.writeStompFrame 10 ONE false thrpt 10 1289877.116 ± 128677.185 ops/s StompEncoderBenchmark.writeStompFrame 10 THREE true thrpt 10 2984662.187 ± 395567.524 ops/s StompEncoderBenchmark.writeStompFrame 10 THREE false thrpt 10 1079028.782 ± 43548.555 ops/s StompEncoderBenchmark.writeStompFrame 10 SEVEN true thrpt 10 1806763.709 ± 59162.209 ops/s StompEncoderBenchmark.writeStompFrame 10 SEVEN false thrpt 10 935274.980 ± 22064.148 ops/s StompEncoderBenchmark.writeStompFrame 10 ELEVEN true thrpt 10 1284172.151 ± 119068.047 ops/s StompEncoderBenchmark.writeStompFrame 10 ELEVEN false thrpt 10 687174.498 ± 30270.916 ops/s StompEncoderBenchmark.writeStompFrame 10 TWENTY true thrpt 10 803843.483 ± 29106.133 ops/s StompEncoderBenchmark.writeStompFrame 10 TWENTY false thrpt 10 502134.552 ± 23653.215 ops/s StompEncoderBenchmark.writeStompFrame 100 ONE true thrpt 10 4337438.694 ± 378524.452 ops/s StompEncoderBenchmark.writeStompFrame 100 ONE false thrpt 10 1289174.213 ± 50640.853 ops/s StompEncoderBenchmark.writeStompFrame 100 THREE true thrpt 10 3232767.156 ± 311934.194 ops/s StompEncoderBenchmark.writeStompFrame 100 THREE false thrpt 10 
1115247.028 ± 15683.477 ops/s StompEncoderBenchmark.writeStompFrame 100 SEVEN true thrpt 10 2213147.232 ± 86326.187 ops/s StompEncoderBenchmark.writeStompFrame 100 SEVEN false thrpt 10 901120.188 ± 71344.491 ops/s StompEncoderBenchmark.writeStompFrame 100 ELEVEN true thrpt 10 1238317.714 ± 68148.477 ops/s StompEncoderBenchmark.writeStompFrame 100 ELEVEN false thrpt 10 671336.339 ± 72735.337 ops/s StompEncoderBenchmark.writeStompFrame 100 TWENTY true thrpt 10 754565.791 ± 28574.382 ops/s StompEncoderBenchmark.writeStompFrame 100 TWENTY false thrpt 10 498939.383 ± 38146.118 ops/s StompEncoderBenchmark.writeStompFrame 1000 ONE true thrpt 10 3722594.471 ± 515861.000 ops/s StompEncoderBenchmark.writeStompFrame 1000 ONE false thrpt 10 1265629.633 ± 84113.347 ops/s StompEncoderBenchmark.writeStompFrame 1000 THREE true thrpt 10 2829696.349 ± 172520.267 ops/s StompEncoderBenchmark.writeStompFrame 1000 THREE false thrpt 10 1111454.609 ± 26275.913 ops/s StompEncoderBenchmark.writeStompFrame 1000 SEVEN true thrpt 10 1901506.449 ± 37701.353 ops/s StompEncoderBenchmark.writeStompFrame 1000 SEVEN false thrpt 10 912528.888 ± 46221.215 ops/s StompEncoderBenchmark.writeStompFrame 1000 ELEVEN true thrpt 10 1299674.123 ± 21889.002 ops/s StompEncoderBenchmark.writeStompFrame 1000 ELEVEN false thrpt 10 724527.644 ± 2757.370 ops/s StompEncoderBenchmark.writeStompFrame 1000 TWENTY true thrpt 10 811389.799 ± 2606.626 ops/s StompEncoderBenchmark.writeStompFrame 1000 TWENTY false thrpt 10 504955.449 ± 6737.804 ops/s StompEncoderBenchmark.writeStompFrame 10000 ONE true thrpt 10 3837912.649 ± 380742.919 ops/s StompEncoderBenchmark.writeStompFrame 10000 ONE false thrpt 10 1375544.306 ± 3157.068 ops/s StompEncoderBenchmark.writeStompFrame 10000 THREE true thrpt 10 3224743.448 ± 297369.719 ops/s StompEncoderBenchmark.writeStompFrame 10000 THREE false thrpt 10 1125772.007 ± 4051.498 ops/s StompEncoderBenchmark.writeStompFrame 10000 SEVEN true thrpt 10 2127352.136 ± 106787.777 ops/s 
StompEncoderBenchmark.writeStompFrame 10000 SEVEN false thrpt 10 934848.418 ± 4564.147 ops/s StompEncoderBenchmark.writeStompFrame 10000 ELEVEN true thrpt 10 1379672.772 ± 8778.640 ops/s StompEncoderBenchmark.writeStompFrame 10000 ELEVEN false thrpt 10 723169.459 ± 2317.767 ops/s StompEncoderBenchmark.writeStompFrame 10000 TWENTY true thrpt 10 802275.113 ± 4155.137 ops/s StompEncoderBenchmark.writeStompFrame 10000 TWENTY false thrpt 10 517604.265 ± 3398.384 ops/s ``` For headers over 256 bytes we get a speedup.	2020-12-07 09:59:17 +01:00
Norman Maurer	2dae6665f4	Fix caching for normal allocations (#10825 ) Motivation: https://github.com/netty/netty/pull/10267 introduced a change that reduced the fragmentation. Unfortunately it also introduced a regression when it comes to caching of normal allocations. This can have a negative performance impact depending on the allocation sizes. Modifications: - Fix algorithm to calculate the array size for normal allocation caches - Correctly calculate index for normal caches - Add unit test Result: Fixes https://github.com/netty/netty/issues/10805	2020-11-25 15:09:39 +01:00
Frédéric Brégier	3a58063fe7	Fix for performance regression on HttpPost RequestDecoder (#10623 ) Fix issue #10508 where PARANOID mode slowed down about 1000 times compared to ADVANCED. Also fix a rare issue where, when the internal buffer grew over a limit, it was partially discarded using `discardReadBytes()`, which caused bad changes within previously discovered HttpData. Reasons were: Too many `readByte()` method calls while other ways exist (such as keeping in memory the last scan position when trying to find a delimiter, or using `bytesBefore(firstByte)` instead of looping externally). Changes done: - Major change in the way buffers are parsed: instead of reading byte by byte until the delimiter is found, try to find the delimiter using `bytesBefore()` and keep the last unfound position to skip already parsed parts (the algorithms are the same but the implementation of the scan is different) - Change the condition to discard read bytes when refCnt is at most 1. Observations using Async-Profiler: 1) Without optimizations, most of the time (more than 95%) is spent in the `readByte()` method within the `loadDataMultipartStandard` method. 2) When using `bytesBefore(byte)` instead of `readByte()` to find the various delimiters, the `loadDataMultipartStandard` method goes down to 19 to 33% depending on the test used. The `readByte()` method or the equivalent `getByte(pos)` method goes down to 15% (from 95%). Timings confirm the profiling: - With optimizations, in SIMPLE mode about 82% better, in ADVANCED mode about 79% better and in PARANOID mode about 99% better (most of the duplicate read accesses are removed or made internally through the `bytesBefore(byte)` method) A benchmark is added to show the behavior of the various cases (one big item, such as File upload, and many items) and various levels of detection (Disabled, Simple, Advanced, Paranoid). 
This benchmark is intended to raise an alert if new implementations introduce too large a difference between levels (as in the previous version, where PARANOID was about 1000 times slower than the other levels, while it is now at most about 10 times slower). Extract of Benchmark run: Run complete. Total time: 00:13:27 Benchmark Mode Cnt Score Error Units HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigAdvancedLevel thrpt 6 2,248 ± 0,198 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigDisabledLevel thrpt 6 2,067 ± 1,219 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigParanoidLevel thrpt 6 1,109 ± 0,038 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigSimpleLevel thrpt 6 2,326 ± 0,314 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighAdvancedLevel thrpt 6 1,444 ± 0,226 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighDisabledLevel thrpt 6 1,462 ± 0,642 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighParanoidLevel thrpt 6 0,159 ± 0,003 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighSimpleLevel thrpt 6 1,522 ± 0,049 ops/ms	2020-11-19 08:01:05 +01:00
Norman Maurer	eeece4cfa5	Use http in xmlns URIs to make maven release plugin happy again (#10788 ) Motivation: https in xmlns URIs does not work and will let the maven release plugin fail: ``` [INFO] ------------------------------------------------------------------------ [INFO] BUILD FAILURE [INFO] ------------------------------------------------------------------------ [INFO] Total time: 1.779 s [INFO] Finished at: 2020-11-10T07:45:21Z [INFO] ------------------------------------------------------------------------ [ERROR] Failed to execute goal org.apache.maven.plugins:maven-release-plugin:2.5.3:prepare (default-cli) on project netty-parent: Execution default-cli of goal org.apache.maven.plugins:maven-release-plugin:2.5.3:prepare failed: The namespace xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" could not be added as a namespace to "project": The namespace prefix "xsi" collides with an additional namespace declared by the element -> [Help 1] [ERROR] ``` See also https://issues.apache.org/jira/browse/HBASE-24014. Modifications: Use http for xmlns Result: Be able to use maven release plugin	2020-11-10 10:51:05 +01:00
Chris Vest	a6b749843f	Use JUnit 5 for running all tests (#10764 ) Motivation: JUnit 5 is the new hotness. It's more expressive, extensible, and composable in many ways, and it's better able to run tests in parallel. But most importantly, it's able to directly run JUnit 4 tests. This means we can update and start using JUnit 5 without touching any of our existing tests. I'm also introducing a dependency on assertj-core, which is like hamcrest, but arguably has a nicer and more discoverable API. Modification: Add the JUnit 5 and assertj-core dependencies, without converting any tests at this time. Result: All our tests are now executed through the JUnit 5 Vintage Engine. Also, the JUnit 5 test APIs are available, and any JUnit 5 tests that are added from now on will also be executed.	2020-11-04 10:21:03 +01:00
Artem Smotrakov	b8ae2a2af4	Enable nohttp check during the build (#10708 ) Motivation: HTTP is a plaintext protocol which means that someone may be able to eavesdrop on the data. To prevent this, HTTPS should be used whenever possible. However, consistently using https:// in all URLs may be difficult. The nohttp tool can help here. The tool scans all the files in a repository and reports where http:// is used. Modifications: - Added nohttp (via checkstyle) into the build process. - Suppressed findings for the websites that don't support HTTPS or that are not reachable Result: - Prevent using HTTP in the future. - Encourage users to use HTTPS when they follow the links they found in the code.	2020-10-23 15:26:25 +02:00
Francesco Nigro	4624b6309d	Reduce DefaultAttributeMap lookup cost (#10530 ) Motivation: DefaultAttributeMap::attr has a blocking behaviour on lookup of an existing attribute: it can be made non-blocking. Modification: Replace the existing fixed bucket table using a locked intrusive linked list with a hand-rolled copy-on-write ordered single array Result: Non blocking behaviour for the lookup happy path	2020-10-02 21:19:03 +02:00
Chris Vest	1d7efbddd9	Fix compilation after forward port of #10368 Motivation: Code failed to compile because ByteBuf index marking has been removed. Modification: Index marking wasn't really used anyway, so just set the relevant index to zero. Result: Code compiles again.	2020-09-09 16:27:52 +02:00
Francesco Nigro	7f86f90646	Improve predictability of writeUtf8/writeAscii performance (#10368 ) Motivation: writeUtf8 can suffer from inlining issues and/or megamorphic call-sites on the hot path due to ByteBuf hierarchy Modifications: Duplicate and specialize the code paths to reduce the need of polymorphic calls Result: Performance is more stable in user code	2020-09-09 16:15:22 +02:00
Francesco Nigro	319a4bc3ba	Reduce garbage on MQTT (#10509 ) Reduce garbage on MQTT encoding Motivation: MQTT encoding and decoding is doing unnecessary object allocation in a number of places: - MqttEncoder creates many byte[] to encode Strings into UTF-8 bytes - MqttProperties uses Integer keys instead of int - Some enums' valueOf create unnecessary arrays on the hot paths - MqttDecoder was using an unnecessary Result<T> Modification: - ByteBufUtil::utf8Bytes and ByteBufUtil::reserveAndWriteUtf8 allow performing the same operation GC-free - MqttProperties uses a primitive key map - Implemented GC free const table lookup/switch valueOf - Use some bit-tricks to pack 2 ints into a single primitive long to store both result and numberOfBytesConsumed and use byte[].length to compute numberOfBytesConsumed on the fly. These changes made it possible to avoid creating Result<T>. Result: Significantly less garbage produced in MQTT encoding/decoding	2020-09-04 18:31:53 +02:00
Francesco Nigro	0a8c9192e5	Improve MqttMessageType::valueOf cost (#10400 ) Motivation: MqttMessageType::valueOf has O(N) cost Modifications: MqttMessageType::valueOf uses a const lookup table Result: MqttMessageType::valueOf has O(1) cost	2020-08-31 10:32:48 +02:00
Linas Medžiūnas	abdcf102da	Efficient BytBuf search algorithms (#9914 ) (#9955 ) Motivation: We have found out that ByteBufUtil.indexOf can be inefficient for substring search on ByteBuf, both in terms of algorithm complexity (worst case O(needle.readableBytes * haystack.readableBytes)), and in constant factor (esp. on Composite buffers). With implementation of more performant search algorithms we have seen improvements on the order of magnitude. Modifications: This change introduces three search algorithms: 1. Knuth Morris Pratt - classical textbook algorithm, a good default choice. 2. Bit mask based algorithm - stable performance on any input, but limited to maximum search substring (the needle) length of 64 bytes. 3. Aho–Corasick - worse performance and higher memory consumption than [1] and [2], but it supports multiple substring (the needles) search simultaneously, by inspecting every byte of the haystack only once. Each algorithm processes every byte of underlying buffer only once, they are implemented as ByteProcessor. Result: Efficient search algorithms with linear time complexity available in Netty (I will share benchmark results in a comment on a PR).	2020-04-15 10:26:53 +02:00
Dmitry Konstantinov	dc69c04434	Replace usage() with freeBytes() in thresholds within hot paths of PoolChunkList (#10141 ) Motivation: PoolChunk.usage() method has non-trivial computations. It is used currently in hot path methods invoked when allocation and de-allocation happen. The idea is to replace usage() output comparison against percent thresholds by Chunk.freeBytes plain comparison against absolute thresholds. In this way the majority of computations from the threshold conditions are moved to init logic. Modifications: Replace PoolChunk.usage() conditions in PoolChunkList with equivalent conditions for PoolChunk.freeBytes() Result: Improve performance of allocation and de-allocation of ByteBuf from normal size cache pool	2020-03-31 22:11:42 +02:00
Norman Maurer	6a43807843	Use lambdas whenever possible (#9979 ) Motivation: We should update our code to use lambdas whenever possible Modifications: Use lambdas when possible Result: Cleanup code for Java8	2020-01-30 09:28:24 +01:00
Norman Maurer	9e29c39daa	Cleanup usage of Channel*Handler (#9959 ) Motivation: In next major version of netty users should use ChannelHandler everywhere. We should ensure we do the same Modifications: Replace usage of deprecated classes / interfaces with ChannelHandler Result: Use non-deprecated code	2020-01-20 17:47:17 -08:00
Francesco Nigro	1e4f0e6a09	Faster decodeHexNibble (#9896 ) Motivation: decodeHexNibble can be a lot faster using a lookup table Modifications: decodeHexNibble is made faster by using a lookup table Result: decodeHexNibble is faster	2019-12-23 21:16:44 +01:00
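A table-driven hex nibble decoder could be sketched as below. The class name `HexNibble`, the 128-entry table size, and the choice to return `-1` for non-hex input are assumptions of this example rather than details taken from the commit.

```java
import java.util.Arrays;

// Hypothetical table-driven decoder: one bounds check and one array read per character.
final class HexNibble {
    // Covers the ASCII range; every entry that is not a hex digit holds -1.
    private static final byte[] TABLE = new byte[128];

    static {
        Arrays.fill(TABLE, (byte) -1);
        for (int i = 0; i < 10; i++) {
            TABLE['0' + i] = (byte) i;
        }
        for (int i = 0; i < 6; i++) {
            TABLE['a' + i] = (byte) (10 + i);
            TABLE['A' + i] = (byte) (10 + i);
        }
    }

    private HexNibble() { }

    // Returns 0..15 for a hex digit, or -1 (an assumption of this sketch) otherwise.
    static int decodeHexNibble(char c) {
        return c < TABLE.length ? TABLE[c] : -1;
    }

    public static void main(String[] args) {
        System.out.println(decodeHexNibble('f')); // prints 15
    }
}
```

The table replaces a chain of range comparisons with a single indexed load, which is the kind of saving the commit message refers to.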
Anuraag Agrawal	ee206b6ba8	Separate out query string encoding for non-encoded strings. (#9887 ) Motivation: Currently, characters are appended to the encoded string char-by-char even when no encoding is needed. We can instead separate out codepath that appends the entire string in one go for better `StringBuilder` allocation performance. Modification: Only go into char-by-char loop when finding a character that requires encoding. Result: The results aren't so clear with noise on my hot laptop - the biggest impact is on long strings, both to reduce resizes of the buffer and also to reduce complexity of the loop. I don't think there's a significant downside though for the cases that hit the slow path. After ``` Benchmark Mode Cnt Score Error Units QueryStringEncoderBenchmark.longAscii thrpt 6 1.406 ± 0.069 ops/us QueryStringEncoderBenchmark.longAsciiFirst thrpt 6 0.046 ± 0.001 ops/us QueryStringEncoderBenchmark.longUtf8 thrpt 6 0.046 ± 0.001 ops/us QueryStringEncoderBenchmark.shortAscii thrpt 6 15.781 ± 0.949 ops/us QueryStringEncoderBenchmark.shortAsciiFirst thrpt 6 3.171 ± 0.232 ops/us QueryStringEncoderBenchmark.shortUtf8 thrpt 6 3.900 ± 0.667 ops/us ``` Before ``` Benchmark Mode Cnt Score Error Units QueryStringEncoderBenchmark.longAscii thrpt 6 0.444 ± 0.072 ops/us QueryStringEncoderBenchmark.longAsciiFirst thrpt 6 0.043 ± 0.002 ops/us QueryStringEncoderBenchmark.longUtf8 thrpt 6 0.047 ± 0.001 ops/us QueryStringEncoderBenchmark.shortAscii thrpt 6 16.503 ± 1.015 ops/us QueryStringEncoderBenchmark.shortAsciiFirst thrpt 6 3.316 ± 0.154 ops/us QueryStringEncoderBenchmark.shortUtf8 thrpt 6 3.776 ± 0.956 ops/us ```	2019-12-20 08:51:26 +01:00
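The idea of entering the per-character loop only when encoding is actually needed can be sketched as follows. This is an illustration under stated assumptions, not the `QueryStringEncoder` implementation: the class `QueryComponentEncoder`, the set of characters treated as safe, and the UTF-8 percent-encoding of everything else are choices made for this example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical encoder: strings that need no escaping are returned without any copying.
final class QueryComponentEncoder {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    private QueryComponentEncoder() { }

    // Assumption of this sketch: only these characters pass through unescaped.
    private static boolean isUnreserved(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
                || c == '-' || c == '_' || c == '.' || c == '~';
    }

    static String encode(String s) {
        // Scan for the first character that needs encoding.
        int firstUnsafe = 0;
        while (firstUnsafe < s.length() && isUnreserved(s.charAt(firstUnsafe))) {
            firstUnsafe++;
        }
        if (firstUnsafe == s.length()) {
            return s; // fast path: nothing to encode
        }
        StringBuilder out = new StringBuilder(s.length() + 16);
        out.append(s, 0, firstUnsafe); // copy the safe prefix in one call
        // Slow path: percent-encode the remainder byte by byte.
        for (byte b : s.substring(firstUnsafe).getBytes(StandardCharsets.UTF_8)) {
            if (b >= 0 && isUnreserved((char) b)) {
                out.append((char) b);
            } else {
                out.append('%').append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
            }
        }
        return out.toString();
    }
}
```

Appending the safe prefix with a single `append(CharSequence, int, int)` call avoids repeated buffer growth, which matches the benchmark above where the gain shows up mainly for long ASCII strings.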
Anuraag Agrawal	0f42eb1ceb	Use array to buffer decoded query instead of ByteBuffer. (#9886 ) Motivation: In Java, it is almost always at least slower to use `ByteBuffer` than `byte[]` without pooling or I/O. `QueryStringDecoder` can use `byte[]` with arguably simpler code. Modification: Replace `ByteBuffer` / `CharsetDecoder` with `byte[]` and `new String` Result: After ``` Benchmark Mode Cnt Score Error Units QueryStringDecoderBenchmark.noDecoding thrpt 6 5.612 ± 2.639 ops/us QueryStringDecoderBenchmark.onlyDecoding thrpt 6 1.393 ± 0.067 ops/us QueryStringDecoderBenchmark.mixedDecoding thrpt 6 1.223 ± 0.048 ops/us ``` Before ``` Benchmark Mode Cnt Score Error Units QueryStringDecoderBenchmark.noDecoding thrpt 6 6.123 ± 0.250 ops/us QueryStringDecoderBenchmark.onlyDecoding thrpt 6 0.922 ± 0.159 ops/us QueryStringDecoderBenchmark.mixedDecoding thrpt 6 1.032 ± 0.178 ops/us ``` I notice #6781 switched from an array to `ByteBuffer` but I can't find any motivation for that in the PR. Unit tests pass fine with an array and we get a reasonable speed bump.	2019-12-18 21:15:44 +01:00
Nick Hill	d370d48d4a	Update to latest JMH version (#9787 ) Motivation JMH 1.22 was released recently, we might as well use the latest when running benchmarks. Summary of changes: https://mail.openjdk.java.net/pipermail/jmh-dev/2019-November/002879.html Modifications Update jmh dependencies in microbench module from version 1.21 to 1.22. Result Benchmarks run using latest JMH	2019-11-19 11:28:36 +01:00
康智冬	1c69448e2e	Fix typos in javadocs (#9527 ) Motivation: We should have correct docs without typos Modification: Fix typos and spelling Result: More correct docs	2019-10-09 15:25:41 +02:00
jingene	af614e4d6e	Change the netty.io homepage scheme(http -> https) (#9344 ) Motivation: Netty homepage(netty.io) serves both "http" and "https". It's recommended to use https rather than http. Modification: I changed from "http://netty.io" to "https://netty.io" Result: No effects.	2019-07-09 21:10:14 +02:00
Norman Maurer	c5a602b272	Increase maxHeaderListSize for HpackDecoderBenchmark to be able to be… (#9321 ) Motivation: The previous used maxHeaderListSize was too low which resulted in exceptions during the benchmark run: ``` io.netty.handler.codec.http2.Http2Exception: Header size exceeded max allowed size (8192) at io.netty.handler.codec.http2.Http2Exception.connectionError(Http2Exception.java:103) at io.netty.handler.codec.http2.Http2Exception.headerListSizeError(Http2Exception.java:188) at io.netty.handler.codec.http2.Http2CodecUtil.headerListSizeExceeded(Http2CodecUtil.java:231) at io.netty.handler.codec.http2.HpackDecoder$Http2HeadersSink.finish(HpackDecoder.java:545) at io.netty.handler.codec.http2.HpackDecoder.decode(HpackDecoder.java:132) at io.netty.handler.codec.http2.HpackDecoderBenchmark.decode(HpackDecoderBenchmark.java:85) at io.netty.handler.codec.http2.generated.HpackDecoderBenchmark_decode_jmhTest.decode_thrpt_jmhStub(HpackDecoderBenchmark_decode_jmhTest.java:120) at io.netty.handler.codec.http2.generated.HpackDecoderBenchmark_decode_jmhTest.decode_Throughput(HpackDecoderBenchmark_decode_jmhTest.java:83) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:453) at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:437) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:748) ``` Also we should ensure we only use ascii for header names. Modifications: Just use Integer.MAX_VALUE as limit Result: Be able to run benchmark without exceptions	2019-07-04 11:24:37 +02:00
Carl Mastrangelo	65d8ecc3a0	Use Table lookup for HPACK decoder (#9307 ) Motivation: Table based decoding is fast. Modification: Use table based decoding in HPACK decoder, inspired by https://github.com/python-hyper/hpack/blob/master/hpack/huffman_table.py This modifies the table to be based on integers, rather than 3-tuples of bytes. This is for two reasons: 1. It's faster 2. Using bytes makes the static initializer too big, and doesn't compile. Result: Faster Huffman decoding. This only seems to help the ascii case, the other decoding is about the same. Benchmarks: ``` Before: Benchmark (limitToAscii) (sensitive) (size) Mode Cnt Score Error Units HpackDecoderBenchmark.decode true true SMALL thrpt 20 426293.636 ± 1444.843 ops/s HpackDecoderBenchmark.decode true true MEDIUM thrpt 20 57843.738 ± 725.704 ops/s HpackDecoderBenchmark.decode true true LARGE thrpt 20 3002.412 ± 16.998 ops/s HpackDecoderBenchmark.decode true false SMALL thrpt 20 412339.400 ± 1128.394 ops/s HpackDecoderBenchmark.decode true false MEDIUM thrpt 20 58226.870 ± 199.591 ops/s HpackDecoderBenchmark.decode true false LARGE thrpt 20 3044.256 ± 10.675 ops/s HpackDecoderBenchmark.decode false true SMALL thrpt 20 2082615.030 ± 5929.726 ops/s HpackDecoderBenchmark.decode false true MEDIUM thrpt 10 571640.454 ± 26499.229 ops/s HpackDecoderBenchmark.decode false true LARGE thrpt 20 92714.555 ± 2292.222 ops/s HpackDecoderBenchmark.decode false false SMALL thrpt 20 1745872.421 ± 6788.840 ops/s HpackDecoderBenchmark.decode false false MEDIUM thrpt 20 490420.323 ± 2455.431 ops/s HpackDecoderBenchmark.decode false false LARGE thrpt 20 84536.200 ± 398.714 ops/s After(bytes): Benchmark (limitToAscii) (sensitive) (size) Mode Cnt Score Error Units HpackDecoderBenchmark.decode true true SMALL thrpt 20 472649.148 ± 7122.461 ops/s HpackDecoderBenchmark.decode true true MEDIUM thrpt 20 66739.638 ± 341.607 ops/s HpackDecoderBenchmark.decode true true LARGE thrpt 20 3139.773 ± 24.491 ops/s HpackDecoderBenchmark.decode 
true false SMALL thrpt 20 466933.833 ± 4514.971 ops/s HpackDecoderBenchmark.decode true false MEDIUM thrpt 20 66111.778 ± 568.326 ops/s HpackDecoderBenchmark.decode true false LARGE thrpt 20 3143.619 ± 3.332 ops/s HpackDecoderBenchmark.decode false true SMALL thrpt 20 2109995.177 ± 6203.143 ops/s HpackDecoderBenchmark.decode false true MEDIUM thrpt 20 586026.055 ± 1578.550 ops/s HpackDecoderBenchmark.decode false false SMALL thrpt 20 1775723.270 ± 4932.057 ops/s HpackDecoderBenchmark.decode false false MEDIUM thrpt 20 493316.467 ± 1453.037 ops/s HpackDecoderBenchmark.decode false false LARGE thrpt 10 85726.219 ± 402.573 ops/s After(ints): Benchmark (limitToAscii) (sensitive) (size) Mode Cnt Score Error Units HpackDecoderBenchmark.decode true true SMALL thrpt 20 615549.006 ± 5282.283 ops/s HpackDecoderBenchmark.decode true true MEDIUM thrpt 20 86714.630 ± 654.489 ops/s HpackDecoderBenchmark.decode true true LARGE thrpt 20 3984.439 ± 61.612 ops/s HpackDecoderBenchmark.decode true false SMALL thrpt 20 602489.337 ± 5397.024 ops/s HpackDecoderBenchmark.decode true false MEDIUM thrpt 20 88399.109 ± 241.115 ops/s HpackDecoderBenchmark.decode true false LARGE thrpt 20 3875.729 ± 103.057 ops/s HpackDecoderBenchmark.decode false true SMALL thrpt 20 2092165.454 ± 11918.859 ops/s HpackDecoderBenchmark.decode false true MEDIUM thrpt 20 583465.437 ± 5452.115 ops/s HpackDecoderBenchmark.decode false true LARGE thrpt 20 93290.061 ± 665.904 ops/s HpackDecoderBenchmark.decode false false SMALL thrpt 20 1758402.495 ± 14677.438 ops/s HpackDecoderBenchmark.decode false false MEDIUM thrpt 10 491598.099 ± 5029.698 ops/s HpackDecoderBenchmark.decode false false LARGE thrpt 20 85834.290 ± 554.915 ops/s ```	2019-07-02 20:13:19 +02:00
jimin	78adeb5408	All override methods must be annotated with @Override (#9285 ) Motivation: Some methods that either override others or are implemented as part of implementing an interface were missing the `@Override` annotation Modifications: Add missing `@Override`s Result: Code cleanup	2019-06-27 13:52:06 +02:00
Alex Blewitt	e233407e01	Replace accumulation with blackhole.consume (#9275 ) Motivation: SpotJMHBugs reports that accumulating a value as a way of eliding dead code elimination may be inadvisable, as discussed in `JMHSample_34_SafeLooping::measureWrong_2`. Change the test so that it consumes the response with `Blackhole::consume` instead. Modifications: - Replace addition of results with explicit `blackhole.consume()` call Result: Tests work as before, but with different benchmark numbers.	2019-06-25 21:46:26 +02:00
Francesco Nigro	c6114786ab	Documented non-usage of Blackhole::consume on ByteBufAccessBenchmark (#9279 ) Motivation: Some JMH benchmarks need additional explanations to motivate specific code choices. Modifications: Introduced a comment to explain why calling Blackhole::consume in a loop is not always the right choice for some benchmarks. Result: The relevant method shows a comment that warns about changing the code to introduce Blackhole::consume in the loop.	2019-06-25 14:53:12 +02:00
Alex Blewitt	99034a15b5	Return the result of the list.recycle() call (#9264 ) Motivation: Resolve the issue highlighted by SpotJMHBugs that the creation of the RecyclableArrayList may be elided by the JIT since the result isn't consumed or returned. Modifications: Return the result of `list.recycle()` so that the list isn't elided. Result: The JMH benchmark shows a change in performance indicating that the prior results of this may be unsound.	2019-06-22 07:21:51 +02:00
Carl Mastrangelo	f01278616a	Properly debounce wakeups (#9191 ) Motivation: The wakeup logic in EpollEventLoop is overly complex Modification: * Simplify the race to wakeup the loop * Don't let the event loop wake up itself (it's already awake!) * Make event loop check if there are any more tasks after preparing to sleep. There is a small window where the non-eventloop writers can issue eventfd writes here, but that is okay. Result: Cleaner wakeup logic. Benchmarks: ``` BEFORE Benchmark Mode Cnt Score Error Units EpollSocketChannelBenchmark.executeMulti thrpt 20 408381.411 ± 2857.498 ops/s EpollSocketChannelBenchmark.executeSingle thrpt 20 157022.360 ± 1240.573 ops/s EpollSocketChannelBenchmark.pingPong thrpt 20 60571.704 ± 331.125 ops/s Benchmark Mode Cnt Score Error Units EpollSocketChannelBenchmark.executeMulti thrpt 20 440546.953 ± 1652.823 ops/s EpollSocketChannelBenchmark.executeSingle thrpt 20 168114.751 ± 1176.609 ops/s EpollSocketChannelBenchmark.pingPong thrpt 20 61231.878 ± 520.108 ops/s ```	2019-06-04 05:27:15 -07:00
Nick Hill	23554e6997	Ensure "full" ownership of msgs passed to EmbeddedChannel.writeInbound() (#9058 ) Motivation Pipeline handlers are free to "take control" of input buffers if they have singular refcount - in particular to mutate their raw data if non-readonly via discarding of read bytes, etc. However there are various places (primarily unit tests) where a wrapped byte-array buffer is passed in and the wrapped array is assumed not to change (used after the wrapped buffer is passed to EmbeddedChannel.writeInbound()). This invalid assumption could result in unexpected errors, such as those exposed by #8931. Modifications Anywhere that the data passed to writeInbound() might be used again, ensure that either: - A copy is used rather than wrapping a shared byte array, or - The buffer is otherwise protected from modification by making it read-only For the tests, copying is preferred since it still allows the "mutating" optimizations to be exercised. Results Avoid possible errors when pipeline assumes it has full control of input buffer.	2019-05-22 12:35:03 +02:00
Francesco Nigro	635fc9eae0	The benchmark is not taking into account nanoTime granularity (#9033 ) Motivation: Results are just wrong for small delays. Modifications: Switching to AverageTime avoids relying on OS nanoTime granularity. Result: Uncontended low delay results are not reliable	2019-04-15 15:15:08 +02:00
Norman Maurer	0f34345347	Merge ChannelInboundHandler and ChannelOutboundHandler into ChannelHa… (#8957 ) Motivation: In `42742e233f` we already added default methods to Channel*Handler and deprecated the Adapter classes to simplify the class hierarchy. With this change we go even further and merge everything into just ChannelHandler. This simplifies things even more in terms of class-hierarchy. Modifications: - Merge ChannelInboundHandler | ChannelOutboundHandler into ChannelHandler - Adjust code to just use ChannelHandler - Deprecate old interfaces. Result: Cleaner and simpler code in terms of class-hierarchy.	2019-03-28 09:28:27 +00:00
Norman Maurer	42742e233f	Deprecate ChannelInboundHandlerAdapter and ChannelOutboundHandlerAdapter (#8929 ) Motivation: As we now use java8 as minimum java version we can deprecate ChannelInboundHandlerAdapter / ChannelOutboundHandlerAdapter and just move the default implementations into the interfaces. This makes things a bit more flexible for the end-user and also simplifies the class-hierarchy. Modifications: - Mark ChannelInboundHandlerAdapter and ChannelOutboundHandlerAdapter as deprecated - Add default implementations to ChannelInboundHandler / ChannelOutboundHandler - Refactor our code to not use ChannelInboundHandlerAdapter / ChannelOutboundHandlerAdapter anymore Result: Cleanup class-hierarchy and make things a bit more flexible.	2019-03-13 09:46:10 +01:00
Norman Maurer	c6b372f517	Use maven plugin to prevent API/ABI breakage as part of build process (#8904 ) Motivation: Netty is very widely used which can lead to a lot of pain when we break API / ABI. We should make use of japicmp-maven-plugin during the build to verify we do not introduce breakage by mistake. Modifications: - Add japicmp-maven-plugin to the build process - Fix a method signature change in HttpProxyHandler that was flagged as a possible problem. Result: Ensure no API/ABI breakage occurs between releases.	2019-03-01 19:48:29 +01:00
Nick Hill	35161ad174	Further reduce ensureAccessible() overhead (#8895 ) Motivation: This PR fixes some non-negligible overhead discovered in the ByteBuf accessibility (non-zero refcount) checking. The cause turned out to be mostly twofold: - Unnecessary operations used to calculate the refcount from the "raw" encoded int field value - Call stack depths exceeding the default limit for inlining, in some places (CompositeByteBuf in particular) It's a follow-on from #8882 which uses the maxCapacity field for a simpler non-negative check. The performance gap between these two variants appears to be _mostly_ closed, but there's one exception which may warrant further analysis. Modifications: - Replace ABB.internalRefCount() with ByteBuf.isAccessible(), the default still checks for non-zero refCnt() - Just test for parity of raw refCnt instead of converting to "real", with fast-path for specific small values - Make sure isAccessible() is delegated by derived/wrapper ByteBufs - Use existing freed flag in CompositeByteBuf for faster isAccessible() - Manually inline some calls in methods like CompositeByteBuf.setLong() and AbstractReferenceCountedByteBuf.isAccessible() to reduce stack depths (to ensure default inlining limit isn't hit) - Add ByteBufAccessBenchmark which is an extension of UnsafeByteBufBenchmark (maybe latter could now be removed) Results: Before: Benchmark (bufferType) (checkAccessible) (checkBounds) Mode Cnt Score Error Units readBatch UNSAFE true true thrpt 30 84524972.863 ± 518338.811 ops/s readBatch UNSAFE_SLICE true true thrpt 30 38608795.037 ± 298176.974 ops/s readBatch HEAP true true thrpt 30 80003697.649 ± 974674.119 ops/s readBatch COMPOSITE true true thrpt 30 18495554.788 ± 108075.023 ops/s setGetLong UNSAFE true true thrpt 30 247069881.578 ± 10839162.593 ops/s setGetLong UNSAFE_SLICE true true thrpt 30 196355905.206 ± 1802420.990 ops/s setGetLong HEAP true true thrpt 30 245686644.713 ± 11769311.527 ops/s setGetLong COMPOSITE true true thrpt 30 
83170940.687 ± 657524.123 ops/s setLong UNSAFE true true thrpt 30 278940253.918 ± 1807265.259 ops/s setLong UNSAFE_SLICE true true thrpt 30 202556738.764 ± 11887973.563 ops/s setLong HEAP true true thrpt 30 280045958.053 ± 2719583.400 ops/s setLong COMPOSITE true true thrpt 30 121299806.002 ± 2155084.707 ops/s After: Benchmark (bufferType) (checkAccessible) (checkBounds) Mode Cnt Score Error Units readBatch UNSAFE true true thrpt 30 101641801.035 ± 3950050.059 ops/s readBatch UNSAFE_SLICE true true thrpt 30 84395902.846 ± 4339579.057 ops/s readBatch HEAP true true thrpt 30 100179060.207 ± 3222487.287 ops/s readBatch COMPOSITE true true thrpt 30 42288494.472 ± 294919.633 ops/s setGetLong UNSAFE true true thrpt 30 304530755.027 ± 6574163.899 ops/s setGetLong UNSAFE_SLICE true true thrpt 30 212028547.645 ± 14277828.768 ops/s setGetLong HEAP true true thrpt 30 309335422.609 ± 2272150.415 ops/s setGetLong COMPOSITE true true thrpt 30 160383609.236 ± 966484.033 ops/s setLong UNSAFE true true thrpt 30 298055969.747 ± 7437449.627 ops/s setLong UNSAFE_SLICE true true thrpt 30 223784178.650 ± 9869750.095 ops/s setLong HEAP true true thrpt 30 302543263.328 ± 8140104.706 ops/s setLong COMPOSITE true true thrpt 30 157083673.285 ± 3528779.522 ops/s There's also a similar knock-on improvement to other benchmarks (e.g. HPACK encoding/decoding) as shown in #8882. For sanity I did a final comparison of the "fast path" tweak using one of the HPACK benchmarks: (rawCnt & 1) == 0: Benchmark (limitToAscii) (sensitive) (size) Mode Cnt Score Error Units HpackDecoderBenchmark.decode true true MEDIUM thrpt 30 50914.479 ± 940.114 ops/s rawCnt == 2 \|\| rawCnt == 4 \|\| rawCnt == 6 \|\| rawCnt == 8 \|\| (rawCnt & 1) == 0: Benchmark (limitToAscii) (sensitive) (size) Mode Cnt Score Error Units HpackDecoderBenchmark.decode true true MEDIUM thrpt 30 60036.425 ± 1478.196 ops/s	2019-02-28 20:41:16 +01:00
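The parity test compared at the end of the commit above can be sketched as follows. The class `RawRefCnt` is hypothetical, and the encoding it assumes (an even raw value means the object is live with a real count of `raw >>> 1`, an odd raw value means it has been released) is an assumption of this example; only the two boolean expressions are taken from the commit message.

```java
// Hypothetical illustration of checking accessibility on the raw counter
// without first converting it to the "real" reference count.
final class RawRefCnt {
    private RawRefCnt() { }

    // Assumed encoding for this sketch: odd raw value == released, even == live.
    static int realRefCnt(int rawCnt) {
        return (rawCnt & 1) != 0 ? 0 : rawCnt >>> 1;
    }

    // Plain parity test, the first variant benchmarked in the commit message.
    static boolean isAccessibleParityOnly(int rawCnt) {
        return (rawCnt & 1) == 0;
    }

    // Second variant from the commit message: equality fast path for small values
    // before falling back to the parity test. The result is the same as above.
    static boolean isAccessible(int rawCnt) {
        return rawCnt == 2 || rawCnt == 4 || rawCnt == 6 || rawCnt == 8 || (rawCnt & 1) == 0;
    }

    public static void main(String[] args) {
        System.out.println(isAccessible(2) + " " + isAccessible(1)); // prints "true false"
    }
}
```

Both variants return the same answer for every input; the benchmark figures quoted in the commit are for the two forms of the check, and the difference between them is purely one of speed.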


381 Commits