netty5

Author	SHA1	Message	Date
Andrey Mizurov	2877eef5d5	Provide ability to extend StompSubframeEncoder and improve full stomp frame encoding (allocate one buffer for full frame considering the size of the headers) (#10778 ) Motivation: At the moment `StompSubframeEncoder` encodes a frame only to `ByteBuf`, which is inconvenient if we later need to convert it to another type of message, e.g. `WebSocketFrame`. Also, if we send a full frame, it is split into two parts, headers and content, which makes it difficult to convert in the next handler. Modification: Introduce additional converter methods (e.g. `protected Object convertFullFrame(StompFrame original, ByteBuf encoded)`) for extending encoder functionality, and allocate only one `ByteBuf` for a full stomp frame. Change the headers size calculation: previously a fixed 256 bytes was used, which forced a new buffer to be reallocated each time the headers size exceeded this threshold. Add `StompEncoderBenchmark`. Result: Improved `StompSubframeEncoder` for extensions. Previous version benchmark ``` Benchmark (contentLength) (headersType) (pooledAllocator) Mode Cnt Score Error Units StompEncoderBenchmark.writeStompFrame 0 ONE true thrpt 10 4432132.884 ± 178923.436 ops/s StompEncoderBenchmark.writeStompFrame 0 ONE false thrpt 10 1281122.756 ± 52484.174 ops/s StompEncoderBenchmark.writeStompFrame 0 THREE true thrpt 10 2980897.937 ± 130253.049 ops/s StompEncoderBenchmark.writeStompFrame 0 THREE false thrpt 10 1116883.574 ± 35471.482 ops/s StompEncoderBenchmark.writeStompFrame 0 SEVEN true thrpt 10 1988012.159 ± 74352.450 ops/s StompEncoderBenchmark.writeStompFrame 0 SEVEN false thrpt 10 881772.343 ± 94633.870 ops/s StompEncoderBenchmark.writeStompFrame 0 ELEVEN true thrpt 10 1048125.919 ± 151053.902 ops/s StompEncoderBenchmark.writeStompFrame 0 ELEVEN false thrpt 10 429900.066 ± 47956.661 ops/s StompEncoderBenchmark.writeStompFrame 0 TWENTY true thrpt 10 660584.122 ± 104973.439 ops/s StompEncoderBenchmark.writeStompFrame 0 TWENTY false thrpt 10 278255.488 ± 20143.708 ops/s 
StompEncoderBenchmark.writeStompFrame 10 ONE true thrpt 10 4251498.549 ± 625050.979 ops/s StompEncoderBenchmark.writeStompFrame 10 ONE false thrpt 10 1214006.861 ± 60421.601 ops/s StompEncoderBenchmark.writeStompFrame 10 THREE true thrpt 10 3117736.486 ± 173613.974 ops/s StompEncoderBenchmark.writeStompFrame 10 THREE false thrpt 10 1046605.891 ± 94428.064 ops/s StompEncoderBenchmark.writeStompFrame 10 SEVEN true thrpt 10 2006986.881 ± 108456.748 ops/s StompEncoderBenchmark.writeStompFrame 10 SEVEN false thrpt 10 877983.112 ± 82919.387 ops/s StompEncoderBenchmark.writeStompFrame 10 ELEVEN true thrpt 10 1132844.437 ± 84578.571 ops/s StompEncoderBenchmark.writeStompFrame 10 ELEVEN false thrpt 10 429334.649 ± 35403.161 ops/s StompEncoderBenchmark.writeStompFrame 10 TWENTY true thrpt 10 657093.390 ± 48092.947 ops/s StompEncoderBenchmark.writeStompFrame 10 TWENTY false thrpt 10 252140.876 ± 37337.255 ops/s StompEncoderBenchmark.writeStompFrame 100 ONE true thrpt 10 4720507.067 ± 100993.908 ops/s StompEncoderBenchmark.writeStompFrame 100 ONE false thrpt 10 1266182.925 ± 85888.413 ops/s StompEncoderBenchmark.writeStompFrame 100 THREE true thrpt 10 2898746.621 ± 452579.753 ops/s StompEncoderBenchmark.writeStompFrame 100 THREE false thrpt 10 1019555.288 ± 65640.507 ops/s StompEncoderBenchmark.writeStompFrame 100 SEVEN true thrpt 10 2259187.459 ± 20025.989 ops/s StompEncoderBenchmark.writeStompFrame 100 SEVEN false thrpt 10 896405.412 ± 53750.148 ops/s StompEncoderBenchmark.writeStompFrame 100 ELEVEN true thrpt 10 1110670.772 ± 107650.327 ops/s StompEncoderBenchmark.writeStompFrame 100 ELEVEN false thrpt 10 445187.398 ± 28845.959 ops/s StompEncoderBenchmark.writeStompFrame 100 TWENTY true thrpt 10 611506.846 ± 25304.240 ops/s StompEncoderBenchmark.writeStompFrame 100 TWENTY false thrpt 10 247687.007 ± 43471.578 ops/s StompEncoderBenchmark.writeStompFrame 1000 ONE true thrpt 10 4140949.576 ± 270274.087 ops/s StompEncoderBenchmark.writeStompFrame 1000 ONE false thrpt 10 
1154515.598 ± 134413.876 ops/s StompEncoderBenchmark.writeStompFrame 1000 THREE true thrpt 10 3349996.875 ± 162309.889 ops/s StompEncoderBenchmark.writeStompFrame 1000 THREE false thrpt 10 1141040.562 ± 5895.693 ops/s StompEncoderBenchmark.writeStompFrame 1000 SEVEN true thrpt 10 2184632.248 ± 8957.833 ops/s StompEncoderBenchmark.writeStompFrame 1000 SEVEN false thrpt 10 959545.704 ± 5835.161 ops/s StompEncoderBenchmark.writeStompFrame 1000 ELEVEN true thrpt 10 1081113.327 ± 3957.527 ops/s StompEncoderBenchmark.writeStompFrame 1000 ELEVEN false thrpt 10 467524.660 ± 1383.236 ops/s StompEncoderBenchmark.writeStompFrame 1000 TWENTY true thrpt 10 568411.797 ± 108712.493 ops/s StompEncoderBenchmark.writeStompFrame 1000 TWENTY false thrpt 10 260764.231 ± 43149.129 ops/s StompEncoderBenchmark.writeStompFrame 10000 ONE true thrpt 10 4369787.147 ± 619367.939 ops/s StompEncoderBenchmark.writeStompFrame 10000 ONE false thrpt 10 1246782.845 ± 47468.764 ops/s StompEncoderBenchmark.writeStompFrame 10000 THREE true thrpt 10 3333328.810 ± 253061.481 ops/s StompEncoderBenchmark.writeStompFrame 10000 THREE false thrpt 10 1108278.988 ± 81905.149 ops/s StompEncoderBenchmark.writeStompFrame 10000 SEVEN true thrpt 10 2062961.266 ± 247096.284 ops/s StompEncoderBenchmark.writeStompFrame 10000 SEVEN false thrpt 10 925199.985 ± 36734.594 ops/s StompEncoderBenchmark.writeStompFrame 10000 ELEVEN true thrpt 10 1223240.034 ± 58833.801 ops/s StompEncoderBenchmark.writeStompFrame 10000 ELEVEN false thrpt 10 460864.117 ± 2361.459 ops/s StompEncoderBenchmark.writeStompFrame 10000 TWENTY true thrpt 10 655864.762 ± 35237.335 ops/s StompEncoderBenchmark.writeStompFrame 10000 TWENTY false thrpt 10 286388.865 ± 1002.460 ops/s ``` A new version benchmark ``` Benchmark (contentLength) (headersType) (pooledAllocator) Mode Cnt Score Error Units StompEncoderBenchmark.writeStompFrame 0 ONE true thrpt 10 4366110.018 ± 420377.867 ops/s StompEncoderBenchmark.writeStompFrame 0 ONE false thrpt 10 1289437.153 ± 
215271.656 ops/s StompEncoderBenchmark.writeStompFrame 0 THREE true thrpt 10 2818791.355 ± 218894.471 ops/s StompEncoderBenchmark.writeStompFrame 0 THREE false thrpt 10 1040151.615 ± 75352.695 ops/s StompEncoderBenchmark.writeStompFrame 0 SEVEN true thrpt 10 1842144.001 ± 94668.864 ops/s StompEncoderBenchmark.writeStompFrame 0 SEVEN false thrpt 10 916742.825 ± 65467.820 ops/s StompEncoderBenchmark.writeStompFrame 0 ELEVEN true thrpt 10 1310454.012 ± 100747.490 ops/s StompEncoderBenchmark.writeStompFrame 0 ELEVEN false thrpt 10 679934.001 ± 82168.249 ops/s StompEncoderBenchmark.writeStompFrame 0 TWENTY true thrpt 10 746867.549 ± 68373.269 ops/s StompEncoderBenchmark.writeStompFrame 0 TWENTY false thrpt 10 483316.314 ± 50978.009 ops/s StompEncoderBenchmark.writeStompFrame 10 ONE true thrpt 10 4791698.722 ± 263890.510 ops/s StompEncoderBenchmark.writeStompFrame 10 ONE false thrpt 10 1289877.116 ± 128677.185 ops/s StompEncoderBenchmark.writeStompFrame 10 THREE true thrpt 10 2984662.187 ± 395567.524 ops/s StompEncoderBenchmark.writeStompFrame 10 THREE false thrpt 10 1079028.782 ± 43548.555 ops/s StompEncoderBenchmark.writeStompFrame 10 SEVEN true thrpt 10 1806763.709 ± 59162.209 ops/s StompEncoderBenchmark.writeStompFrame 10 SEVEN false thrpt 10 935274.980 ± 22064.148 ops/s StompEncoderBenchmark.writeStompFrame 10 ELEVEN true thrpt 10 1284172.151 ± 119068.047 ops/s StompEncoderBenchmark.writeStompFrame 10 ELEVEN false thrpt 10 687174.498 ± 30270.916 ops/s StompEncoderBenchmark.writeStompFrame 10 TWENTY true thrpt 10 803843.483 ± 29106.133 ops/s StompEncoderBenchmark.writeStompFrame 10 TWENTY false thrpt 10 502134.552 ± 23653.215 ops/s StompEncoderBenchmark.writeStompFrame 100 ONE true thrpt 10 4337438.694 ± 378524.452 ops/s StompEncoderBenchmark.writeStompFrame 100 ONE false thrpt 10 1289174.213 ± 50640.853 ops/s StompEncoderBenchmark.writeStompFrame 100 THREE true thrpt 10 3232767.156 ± 311934.194 ops/s StompEncoderBenchmark.writeStompFrame 100 THREE false thrpt 10 
1115247.028 ± 15683.477 ops/s StompEncoderBenchmark.writeStompFrame 100 SEVEN true thrpt 10 2213147.232 ± 86326.187 ops/s StompEncoderBenchmark.writeStompFrame 100 SEVEN false thrpt 10 901120.188 ± 71344.491 ops/s StompEncoderBenchmark.writeStompFrame 100 ELEVEN true thrpt 10 1238317.714 ± 68148.477 ops/s StompEncoderBenchmark.writeStompFrame 100 ELEVEN false thrpt 10 671336.339 ± 72735.337 ops/s StompEncoderBenchmark.writeStompFrame 100 TWENTY true thrpt 10 754565.791 ± 28574.382 ops/s StompEncoderBenchmark.writeStompFrame 100 TWENTY false thrpt 10 498939.383 ± 38146.118 ops/s StompEncoderBenchmark.writeStompFrame 1000 ONE true thrpt 10 3722594.471 ± 515861.000 ops/s StompEncoderBenchmark.writeStompFrame 1000 ONE false thrpt 10 1265629.633 ± 84113.347 ops/s StompEncoderBenchmark.writeStompFrame 1000 THREE true thrpt 10 2829696.349 ± 172520.267 ops/s StompEncoderBenchmark.writeStompFrame 1000 THREE false thrpt 10 1111454.609 ± 26275.913 ops/s StompEncoderBenchmark.writeStompFrame 1000 SEVEN true thrpt 10 1901506.449 ± 37701.353 ops/s StompEncoderBenchmark.writeStompFrame 1000 SEVEN false thrpt 10 912528.888 ± 46221.215 ops/s StompEncoderBenchmark.writeStompFrame 1000 ELEVEN true thrpt 10 1299674.123 ± 21889.002 ops/s StompEncoderBenchmark.writeStompFrame 1000 ELEVEN false thrpt 10 724527.644 ± 2757.370 ops/s StompEncoderBenchmark.writeStompFrame 1000 TWENTY true thrpt 10 811389.799 ± 2606.626 ops/s StompEncoderBenchmark.writeStompFrame 1000 TWENTY false thrpt 10 504955.449 ± 6737.804 ops/s StompEncoderBenchmark.writeStompFrame 10000 ONE true thrpt 10 3837912.649 ± 380742.919 ops/s StompEncoderBenchmark.writeStompFrame 10000 ONE false thrpt 10 1375544.306 ± 3157.068 ops/s StompEncoderBenchmark.writeStompFrame 10000 THREE true thrpt 10 3224743.448 ± 297369.719 ops/s StompEncoderBenchmark.writeStompFrame 10000 THREE false thrpt 10 1125772.007 ± 4051.498 ops/s StompEncoderBenchmark.writeStompFrame 10000 SEVEN true thrpt 10 2127352.136 ± 106787.777 ops/s 
StompEncoderBenchmark.writeStompFrame 10000 SEVEN false thrpt 10 934848.418 ± 4564.147 ops/s StompEncoderBenchmark.writeStompFrame 10000 ELEVEN true thrpt 10 1379672.772 ± 8778.640 ops/s StompEncoderBenchmark.writeStompFrame 10000 ELEVEN false thrpt 10 723169.459 ± 2317.767 ops/s StompEncoderBenchmark.writeStompFrame 10000 TWENTY true thrpt 10 802275.113 ± 4155.137 ops/s StompEncoderBenchmark.writeStompFrame 10000 TWENTY false thrpt 10 517604.265 ± 3398.384 ops/s ``` For headers over 256 bytes we get a speedup.	2020-12-07 09:59:17 +01:00
Norman Maurer	2dae6665f4	Fix caching for normal allocations (#10825 ) Motivation: https://github.com/netty/netty/pull/10267 introduced a change that reduced the fragmentation. Unfortunately it also introduced a regression when it comes to caching of normal allocations. This can have a negative performance impact depending on the allocation sizes. Modifications: - Fix algorithm to calculate the array size for normal allocation caches - Correctly calculate index for normal caches - Add unit test Result: Fixes https://github.com/netty/netty/issues/10805	2020-11-25 15:09:39 +01:00
Frédéric Brégier	3a58063fe7	Fix for performance regression on HttpPost RequestDecoder (#10623 ) Fix issue #10508 where PARANOID mode was about 1000 times slower than ADVANCED. Also fix a rare issue where, when the internal buffer grew over a limit, it was partially discarded using `discardReadBytes()`, which caused unwanted changes within previously discovered HttpData. Reasons were: Too many `readByte()` method calls while other ways exist (such as keeping in memory the last scan position when trying to find a delimiter, or using `bytesBefore(firstByte)` instead of looping externally). Changes done: - Major change in the way buffers are parsed: instead of reading byte by byte until the delimiter is found, try to find the delimiter using `bytesBefore()` and keep the last position where it was not found, in order to skip already parsed parts (the algorithms are the same but the scan implementations differ) - Change the condition to discard read bytes when refCnt is at most 1. Observations using Async-Profiler: ================================== 1) Without optimizations, most of the time (more than 95%) is spent in the `readByte()` method within the `loadDataMultipartStandard` method. 2) Using `bytesBefore(byte)` instead of `readByte()` to find the various delimiters, the `loadDataMultipartStandard` method goes down to 19 to 33% depending on the test used. The `readByte()` method or the equivalent `getByte(pos)` method goes down to 15% (from 95%). Timings confirm the profiling: - With optimizations, SIMPLE mode is about 82% better, ADVANCED mode about 79% better and PARANOID mode about 99% better (most of the duplicate read accesses are removed or made internally through the `bytesBefore(byte)` method). A benchmark is added to show the behavior of the various cases (one big item, such as File upload, and many items) and various levels of detection (Disabled, Simple, Advanced, Paranoid). 
This benchmark is intended to alert if new implementations introduce large differences between levels (such as in the previous version, where PARANOID was about 1000 times slower than the other levels, while it is now at most about 10 times slower). Extract of Benchmark run: ========================= Run complete. Total time: 00:13:27 Benchmark Mode Cnt Score Error Units HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigAdvancedLevel thrpt 6 2,248 ± 0,198 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigDisabledLevel thrpt 6 2,067 ± 1,219 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigParanoidLevel thrpt 6 1,109 ± 0,038 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigSimpleLevel thrpt 6 2,326 ± 0,314 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighAdvancedLevel thrpt 6 1,444 ± 0,226 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighDisabledLevel thrpt 6 1,462 ± 0,642 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighParanoidLevel thrpt 6 0,159 ± 0,003 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighSimpleLevel thrpt 6 1,522 ± 0,049 ops/ms	2020-11-19 08:01:05 +01:00
Norman Maurer	eeece4cfa5	Use http in xmlns URIs to make maven release plugin happy again (#10788 ) Motivation: https in xmlns URIs does not work and causes the maven release plugin to fail: ``` [INFO] ------------------------------------------------------------------------ [INFO] BUILD FAILURE [INFO] ------------------------------------------------------------------------ [INFO] Total time: 1.779 s [INFO] Finished at: 2020-11-10T07:45:21Z [INFO] ------------------------------------------------------------------------ [ERROR] Failed to execute goal org.apache.maven.plugins:maven-release-plugin:2.5.3:prepare (default-cli) on project netty-parent: Execution default-cli of goal org.apache.maven.plugins:maven-release-plugin:2.5.3:prepare failed: The namespace xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" could not be added as a namespace to "project": The namespace prefix "xsi" collides with an additional namespace declared by the element -> [Help 1] [ERROR] ``` See also https://issues.apache.org/jira/browse/HBASE-24014. Modifications: Use http for xmlns Result: Be able to use maven release plugin	2020-11-10 10:51:05 +01:00
Chris Vest	a6b749843f	Use JUnit 5 for running all tests (#10764 ) Motivation: JUnit 5 is the new hotness. It's more expressive, extensible, and composable in many ways, and it's better able to run tests in parallel. But most importantly, it's able to directly run JUnit 4 tests. This means we can update and start using JUnit 5 without touching any of our existing tests. I'm also introducing a dependency on assertj-core, which is like hamcrest, but arguably has a nicer and more discoverable API. Modification: Add the JUnit 5 and assertj-core dependencies, without converting any tests at this time. Result: All our tests are now executed through the JUnit 5 Vintage Engine. Also, the JUnit 5 test APIs are available, and any JUnit 5 tests that are added from now on will also be executed.	2020-11-04 10:21:03 +01:00
Artem Smotrakov	b8ae2a2af4	Enable nohttp check during the build (#10708 ) Motivation: HTTP is a plaintext protocol, which means that someone may be able to eavesdrop on the data. To prevent this, HTTPS should be used whenever possible. However, consistently using https:// in all URLs may be difficult. The nohttp tool can help here. The tool scans all the files in a repository and reports where http:// is used. Modifications: - Added nohttp (via checkstyle) into the build process. - Suppressed findings for the websites that don't support HTTPS or that are not reachable Result: - Prevent using HTTP in the future. - Encourage users to use HTTPS when they follow the links they found in the code.	2020-10-23 15:26:25 +02:00
Francesco Nigro	4624b6309d	Reduce DefaultAttributeMap lookup cost (#10530 ) Motivation: DefaultAttributeMap::attr has a blocking behaviour on lookup of an existing attribute: it can be made non-blocking. Modification: Replace the existing fixed bucket table using a locked intrusive linked list with a hand-rolled copy-on-write ordered single array Result: Non-blocking behaviour for the lookup happy path	2020-10-02 21:19:03 +02:00
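The copy-on-write ordered array idea from the commit above can be sketched roughly as follows. This is a minimal illustration, not the actual `DefaultAttributeMap` code: the class name `CowSortedMapSketch`, the integer `id` key and the `getOrAdd` method are all hypothetical. Readers binary-search an immutable snapshot without taking a lock, and writers publish a new sorted copy with a compare-and-set.

```java
// Hypothetical sketch: a copy-on-write array kept sorted by an int id.
final class CowSortedMapSketch<V> {
    private static final class Entry<V> {
        final int id;
        final V value;
        Entry(int id, V value) { this.id = id; this.value = value; }
    }

    private final java.util.concurrent.atomic.AtomicReference<Entry<V>[]> entries;

    @SuppressWarnings({"unchecked", "rawtypes"})
    CowSortedMapSketch() {
        entries = new java.util.concurrent.atomic.AtomicReference<>((Entry<V>[]) new Entry[0]);
    }

    // Lock-free read: search the current snapshot only.
    V get(int id) {
        Entry<V>[] snapshot = entries.get();
        int i = search(snapshot, id);
        return i >= 0 ? snapshot[i].value : null;
    }

    // Returns the existing value for id, or stores and returns the given one.
    V getOrAdd(int id, V value) {
        for (;;) {
            Entry<V>[] old = entries.get();
            int i = search(old, id);
            if (i >= 0) {
                return old[i].value;
            }
            int at = -(i + 1); // insertion point that keeps the array sorted
            Entry<V>[] copy = java.util.Arrays.copyOf(old, old.length + 1);
            System.arraycopy(old, at, copy, at + 1, old.length - at);
            copy[at] = new Entry<>(id, value);
            if (entries.compareAndSet(old, copy)) {
                return value;
            }
            // Another writer won the race; retry against the new snapshot.
        }
    }

    // Binary search; returns the index, or -(insertionPoint + 1) if absent.
    private static int search(Entry<?>[] sorted, int id) {
        int lo = 0;
        int hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int midId = sorted[mid].id;
            if (midId < id) {
                lo = mid + 1;
            } else if (midId > id) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -(lo + 1);
    }
}
```

The trade-off in this sketch is that every insertion copies the array, which is only reasonable when lookups heavily outnumber insertions.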
Chris Vest	1d7efbddd9	Fix compilation after forward port of #10368 Motivation: Code failed to compile because ByteBuf index marking has been removed. Modification: Index marking wasn't really used anyway, so just set the relevant index to zero. Result: Code compiles again.	2020-09-09 16:27:52 +02:00
Francesco Nigro	7f86f90646	Improve predictability of writeUtf8/writeAscii performance (#10368 ) Motivation: writeUtf8 can suffer from inlining issues and/or megamorphic call-sites on the hot path due to ByteBuf hierarchy Modifications: Duplicate and specialize the code paths to reduce the need of polymorphic calls Result: Performance is more stable in user code	2020-09-09 16:15:22 +02:00
Francesco Nigro	319a4bc3ba	Reduce garbage on MQTT (#10509 ) Reduce garbage on MQTT encoding Motivation: MQTT encoding and decoding is doing unnecessary object allocation in a number of places: - MqttEncoder creates many byte[] to encode Strings into UTF-8 bytes - MqttProperties uses Integer keys instead of int - Some enums valueOf create unnecessary arrays on the hot paths - MqttDecoder was using unnecessary Result<T> Modification: - ByteBufUtil::utf8Bytes and ByteBufUtil::reserveAndWriteUtf8 allow performing the same operation GC-free - MqttProperties uses a primitive key map - Implemented GC free const table lookup/switch valueOf - Use some bit-tricks to pack 2 ints into a single primitive long to store both result and numberOfBytesConsumed and use byte[].length to compute numberOfBytesConsumed on the fly. These changes made it possible to avoid creating Result<T>. Result: Significantly less garbage produced in MQTT encoding/decoding	2020-09-04 18:31:53 +02:00
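The bit trick mentioned above, packing a decoded value and the number of bytes consumed into one primitive `long` instead of allocating a result object, might look like the following sketch. The class and method names are hypothetical and are not taken from the MQTT codec.

```java
// Hypothetical sketch: two ints packed into one long, avoiding a result object.
final class PackedIntPair {
    private PackedIntPair() { }

    // High 32 bits hold the value, low 32 bits hold the byte count.
    static long pack(int value, int bytesConsumed) {
        return ((long) value << 32) | (bytesConsumed & 0xFFFFFFFFL);
    }

    static int value(long packed) {
        return (int) (packed >> 32);
    }

    static int bytesConsumed(long packed) {
        return (int) packed;
    }
}
```

Masking the low half with `0xFFFFFFFFL` keeps a negative second int from sign-extending into the high half.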
Francesco Nigro	0a8c9192e5	Improve MqttMessageType::valueOf cost (#10400 ) Motivation: MqttMessageType::valueOf has O(N) cost Modifications: MqttMessageType::valueOf uses a const lookup table Result: MqttMessageType::valueOf has O(1) cost	2020-08-31 10:32:48 +02:00
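A constant lookup table for an enum `valueOf`-style method can be sketched as below. The enum `PacketKind`, its four constants and the `fromCode` method are hypothetical stand-ins for illustration, not the real `MqttMessageType`. The table is built once in a static initializer, so each lookup is a bounds check plus an array read rather than a scan over all constants.

```java
// Hypothetical enum for illustration; not Netty's MqttMessageType.
enum PacketKind {
    CONNECT(1), CONNACK(2), PUBLISH(3), PUBACK(4);

    // Index = wire code, element = matching constant (null where unused).
    private static final PacketKind[] BY_CODE;

    static {
        PacketKind[] kinds = values();
        int max = 0;
        for (PacketKind k : kinds) {
            max = Math.max(max, k.code);
        }
        BY_CODE = new PacketKind[max + 1];
        for (PacketKind k : kinds) {
            BY_CODE[k.code] = k;
        }
    }

    private final int code;

    PacketKind(int code) {
        this.code = code;
    }

    int code() {
        return code;
    }

    // O(1) lookup: one bounds check and one array read.
    static PacketKind fromCode(int code) {
        if (code < 0 || code >= BY_CODE.length || BY_CODE[code] == null) {
            throw new IllegalArgumentException("unknown code: " + code);
        }
        return BY_CODE[code];
    }
}
```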
Linas Medžiūnas	abdcf102da	Efficient ByteBuf search algorithms (#9914 ) (#9955 ) Motivation: We have found out that ByteBufUtil.indexOf can be inefficient for substring search on ByteBuf, both in terms of algorithm complexity (worst case O(needle.readableBytes * haystack.readableBytes)), and in constant factor (esp. on Composite buffers). With implementation of more performant search algorithms we have seen improvements of an order of magnitude. Modifications: This change introduces three search algorithms: 1. Knuth Morris Pratt - classical textbook algorithm, a good default choice. 2. Bit mask based algorithm - stable performance on any input, but limited to maximum search substring (the needle) length of 64 bytes. 3. Aho–Corasick - worse performance and higher memory consumption than [1] and [2], but it supports multiple substring (the needles) search simultaneously, by inspecting every byte of the haystack only once. Each algorithm processes every byte of the underlying buffer only once; they are implemented as ByteProcessor. Result: Efficient search algorithms with linear time complexity available in Netty (I will share benchmark results in a comment on a PR).	2020-04-15 10:26:53 +02:00
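As an illustration of the first algorithm listed above, here is a minimal Knuth-Morris-Pratt sketch over plain `byte[]` arrays. It does not use Netty types: the class `KmpSearchSketch` and its `process` method are hypothetical, and only loosely mirror the idea of a stateful processor that sees each haystack byte once and returns `false` when a match has just ended.

```java
// Hypothetical KMP sketch over byte[]; not the Netty implementation.
final class KmpSearchSketch {
    private final byte[] needle;
    // failure[n] = length of the longest proper border of the first n needle bytes.
    private final int[] failure;
    private int matched; // number of needle bytes currently matched

    KmpSearchSketch(byte[] needle) {
        if (needle.length == 0) {
            throw new IllegalArgumentException("needle must not be empty");
        }
        this.needle = needle.clone();
        this.failure = new int[needle.length + 1];
        int k = 0;
        for (int i = 1; i < needle.length; i++) {
            while (k > 0 && needle[i] != needle[k]) {
                k = failure[k];
            }
            if (needle[i] == needle[k]) {
                k++;
            }
            failure[i + 1] = k;
        }
    }

    // Feeds one haystack byte; returns false when a full match ends at this byte.
    boolean process(byte b) {
        while (matched > 0 && needle[matched] != b) {
            matched = failure[matched];
        }
        if (needle[matched] == b) {
            matched++;
        }
        if (matched == needle.length) {
            matched = failure[matched]; // stay ready for overlapping matches
            return false;
        }
        return true;
    }

    // Convenience wrapper: index of the first match, or -1 if there is none.
    static int indexOf(byte[] haystack, byte[] needle) {
        KmpSearchSketch search = new KmpSearchSketch(needle);
        for (int i = 0; i < haystack.length; i++) {
            if (!search.process(haystack[i])) {
                return i - needle.length + 1;
            }
        }
        return -1;
    }
}
```

Each haystack byte is passed to `process` exactly once, which is what gives the linear running time in the haystack length.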
Dmitry Konstantinov	dc69c04434	Replace usage() with freeBytes() in thresholds within hot paths of PoolChunkList (#10141 ) Motivation: PoolChunk.usage() method has non-trivial computations. It is currently used in hot path methods invoked when allocation and de-allocation happen. The idea is to replace the comparison of usage() output against percent thresholds with a plain comparison of Chunk.freeBytes against absolute thresholds. In this way the majority of the computations from the threshold conditions are moved to the init logic. Modifications: Replace PoolChunk.usage() conditions in PoolChunkList with equivalent conditions for PoolChunk.freeBytes() Result: Improve performance of allocation and de-allocation of ByteBuf from normal size cache pool	2020-03-31 22:11:42 +02:00
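The idea of moving the percentage arithmetic into initialization can be sketched as follows. This is a simplified model with hypothetical names, and its rounding rule is an assumption chosen for the example; it is not claimed to match `PoolChunk.usage()` exactly. The percent-based check multiplies on every call, while the precomputed form is a single integer comparison.

```java
// Hypothetical sketch: a percent threshold converted once into a free-bytes threshold.
final class FreeBytesThresholdSketch {
    private final int chunkSize;
    private final int maxUsagePercent;
    private final int freeMaxThreshold; // computed once at construction

    FreeBytesThresholdSketch(int chunkSize, int maxUsagePercent) {
        this.chunkSize = chunkSize;
        this.maxUsagePercent = maxUsagePercent;
        // Smallest used-byte count that reaches the percentage (ceiling division).
        long minUsed = ((long) maxUsagePercent * chunkSize + 99) / 100;
        this.freeMaxThreshold = (int) (chunkSize - minUsed);
    }

    // Hot-path check in the old style: arithmetic on every call.
    boolean reachedByPercent(int freeBytes) {
        long used = (long) chunkSize - freeBytes;
        return used * 100 >= (long) maxUsagePercent * chunkSize;
    }

    // Hot-path check in the new style: one comparison against a constant.
    boolean reachedByFreeBytes(int freeBytes) {
        return freeBytes <= freeMaxThreshold;
    }
}
```

In this model the two checks agree for every `freeBytes` value between 0 and `chunkSize`, because `used * 100 >= percent * chunkSize` holds exactly when `used` is at least the ceiling computed in the constructor.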
Norman Maurer	6a43807843	Use lambdas whenever possible (#9979 ) Motivation: We should update our code to use lambdas whenever possible Modifications: Use lambdas when possible Result: Cleanup code for Java8	2020-01-30 09:28:24 +01:00
Norman Maurer	9e29c39daa	Cleanup usage of Channel*Handler (#9959 ) Motivation: In the next major version of netty, users should use ChannelHandler everywhere. We should ensure we do the same Modifications: Replace usage of deprecated classes / interfaces with ChannelHandler Result: Use non-deprecated code	2020-01-20 17:47:17 -08:00
Francesco Nigro	1e4f0e6a09	Faster decodeHexNibble (#9896 ) Motivation: decodeHexNibble can be a lot faster using a lookup table Modifications: decodeHexNibble is made faster by using a lookup table Result: decodeHexNibble is faster	2019-12-23 21:16:44 +01:00
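A table-driven hex nibble decoder can be sketched like this. The class name `HexNibbleTable` and the choice of returning -1 for invalid input are assumptions made for this example, not a description of the actual Netty method. The table is filled once, so decoding is a bounds check and an array read instead of a chain of range comparisons.

```java
// Hypothetical sketch of a lookup-table hex nibble decoder.
final class HexNibbleTable {
    // One entry per ASCII code point; -1 marks characters that are not hex digits.
    private static final byte[] TABLE = new byte[128];

    static {
        java.util.Arrays.fill(TABLE, (byte) -1);
        for (char c = '0'; c <= '9'; c++) {
            TABLE[c] = (byte) (c - '0');
        }
        for (char c = 'a'; c <= 'f'; c++) {
            TABLE[c] = (byte) (c - 'a' + 10);
        }
        for (char c = 'A'; c <= 'F'; c++) {
            TABLE[c] = (byte) (c - 'A' + 10);
        }
    }

    private HexNibbleTable() { }

    // Returns 0..15 for a hex digit, or -1 for any other character.
    static int decode(char c) {
        return c < TABLE.length ? TABLE[c] : -1;
    }
}
```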
Anuraag Agrawal	ee206b6ba8	Separate out query string encoding for non-encoded strings. (#9887 ) Motivation: Currently, characters are appended to the encoded string char-by-char even when no encoding is needed. We can instead separate out a code path that appends the entire string in one go for better `StringBuilder` allocation performance. Modification: Only go into the char-by-char loop when finding a character that requires encoding. Result: The results aren't so clear with noise on my hot laptop - the biggest impact is on long strings, both to reduce resizes of the buffer and also to reduce complexity of the loop. I don't think there's a significant downside though for the cases that hit the slow path. After ``` Benchmark Mode Cnt Score Error Units QueryStringEncoderBenchmark.longAscii thrpt 6 1.406 ± 0.069 ops/us QueryStringEncoderBenchmark.longAsciiFirst thrpt 6 0.046 ± 0.001 ops/us QueryStringEncoderBenchmark.longUtf8 thrpt 6 0.046 ± 0.001 ops/us QueryStringEncoderBenchmark.shortAscii thrpt 6 15.781 ± 0.949 ops/us QueryStringEncoderBenchmark.shortAsciiFirst thrpt 6 3.171 ± 0.232 ops/us QueryStringEncoderBenchmark.shortUtf8 thrpt 6 3.900 ± 0.667 ops/us ``` Before ``` Benchmark Mode Cnt Score Error Units QueryStringEncoderBenchmark.longAscii thrpt 6 0.444 ± 0.072 ops/us QueryStringEncoderBenchmark.longAsciiFirst thrpt 6 0.043 ± 0.002 ops/us QueryStringEncoderBenchmark.longUtf8 thrpt 6 0.047 ± 0.001 ops/us QueryStringEncoderBenchmark.shortAscii thrpt 6 16.503 ± 1.015 ops/us QueryStringEncoderBenchmark.shortAsciiFirst thrpt 6 3.316 ± 0.154 ops/us QueryStringEncoderBenchmark.shortUtf8 thrpt 6 3.776 ± 0.956 ops/us ```	2019-12-20 08:51:26 +01:00
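The fast path described above can be sketched as follows: scan for the first character that needs encoding, append the whole string in one call if there is none, and only otherwise fall back to a slower loop. Everything here is an assumption for illustration (the class name, the set of unreserved characters, and a slow path that goes through `getBytes`); it is not the `QueryStringEncoder` code.

```java
// Hypothetical sketch of a query-component encoder with a bulk-append fast path.
final class QueryEncodeSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    private QueryEncodeSketch() { }

    static void appendEncoded(StringBuilder out, String s) {
        int len = s.length();
        int i = 0;
        // Find the first character that would need percent-encoding.
        while (i < len && isUnreserved(s.charAt(i))) {
            i++;
        }
        if (i == len) {
            out.append(s); // fast path: nothing to encode, append in one go
            return;
        }
        out.append(s, 0, i); // bulk-append the clean prefix
        // Slow path (simplified): encode the remainder through its UTF-8 bytes.
        byte[] rest = s.substring(i).getBytes(java.nio.charset.StandardCharsets.UTF_8);
        for (byte b : rest) {
            char c = (char) (b & 0xFF);
            if (isUnreserved(c)) {
                out.append(c);
            } else {
                out.append('%').append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
            }
        }
    }

    // Assumed unreserved set: ASCII letters, digits, '-', '_', '.', '~'.
    static boolean isUnreserved(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
                || c == '-' || c == '_' || c == '.' || c == '~';
    }
}
```

For example, appending `"a b"` to an empty `StringBuilder` yields `a%20b`, while `"abc"` takes the fast path and is appended unchanged.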
Anuraag Agrawal	0f42eb1ceb	Use array to buffer decoded query instead of ByteBuffer. (#9886 ) Motivation: In Java, it is almost always at least slightly slower to use `ByteBuffer` than `byte[]` without pooling or I/O. `QueryStringDecoder` can use `byte[]` with arguably simpler code. Modification: Replace `ByteBuffer` / `CharsetDecoder` with `byte[]` and `new String` Result: After ``` Benchmark Mode Cnt Score Error Units QueryStringDecoderBenchmark.noDecoding thrpt 6 5.612 ± 2.639 ops/us QueryStringDecoderBenchmark.onlyDecoding thrpt 6 1.393 ± 0.067 ops/us QueryStringDecoderBenchmark.mixedDecoding thrpt 6 1.223 ± 0.048 ops/us ``` Before ``` Benchmark Mode Cnt Score Error Units QueryStringDecoderBenchmark.noDecoding thrpt 6 6.123 ± 0.250 ops/us QueryStringDecoderBenchmark.onlyDecoding thrpt 6 0.922 ± 0.159 ops/us QueryStringDecoderBenchmark.mixedDecoding thrpt 6 1.032 ± 0.178 ops/us ``` I notice #6781 switched from an array to `ByteBuffer` but I can't find any motivation for that in the PR. Unit tests pass fine with an array and we get a reasonable speed bump.	2019-12-18 21:15:44 +01:00
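Decoding percent-escapes through a plain `byte[]` and `new String`, as the commit above describes, might look like the sketch below. The class name, the `+` to space handling and the error handling are assumptions for this example and are not copied from `QueryStringDecoder`.

```java
// Hypothetical sketch: percent-decoding via a reusable byte[] and new String.
final class QueryDecodeSketch {
    private QueryDecodeSketch() { }

    static String decode(String s) {
        int len = s.length();
        StringBuilder out = new StringBuilder(len);
        byte[] buf = null; // allocated lazily, only if an escape is present
        int i = 0;
        while (i < len) {
            char c = s.charAt(i);
            if (c != '%') {
                out.append(c == '+' ? ' ' : c);
                i++;
                continue;
            }
            if (buf == null) {
                buf = new byte[(len - i) / 3]; // upper bound on escapes from here on
            }
            int n = 0;
            // Collect a whole run of %XX escapes so multi-byte UTF-8 decodes together.
            while (i < len && s.charAt(i) == '%') {
                if (i + 3 > len) {
                    throw new IllegalArgumentException("truncated escape at index " + i);
                }
                int hi = hex(s.charAt(i + 1));
                int lo = hex(s.charAt(i + 2));
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("invalid escape at index " + i);
                }
                buf[n++] = (byte) ((hi << 4) | lo);
                i += 3;
            }
            out.append(new String(buf, 0, n, java.nio.charset.StandardCharsets.UTF_8));
        }
        return out.toString();
    }

    // Returns 0..15 for an ASCII hex digit, or -1 otherwise.
    private static int hex(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }
}
```

Strings without any `%` never allocate the byte array, which corresponds to the no-decoding case in the benchmark names above.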
Nick Hill	d370d48d4a	Update to latest JMH version (#9787 ) Motivation JMH 1.22 was released recently, we might as well use the latest when running benchmarks. Summary of changes: https://mail.openjdk.java.net/pipermail/jmh-dev/2019-November/002879.html Modifications Update jmh dependencies in microbench module from version 1.21 to 1.22. Result Benchmarks run using latest JMH	2019-11-19 11:28:36 +01:00
康智冬	1c69448e2e	Fix typos in javadocs (#9527 ) Motivation: We should have correct docs without typos Modification: Fix typos and spelling Result: More correct docs	2019-10-09 15:25:41 +02:00
jingene	af614e4d6e	Change the netty.io homepage scheme(http -> https) (#9344 ) Motivation: Netty homepage(netty.io) serves both "http" and "https". It's recommended to use https rather than http. Modification: I changed from "http://netty.io" to "https://netty.io" Result: No effects.	2019-07-09 21:10:14 +02:00
Norman Maurer	c5a602b272	Increase maxHeaderListSize for HpackDecoderBenchmark to be able to be… (#9321 ) Motivation: The previously used maxHeaderListSize was too low which resulted in exceptions during the benchmark run: ``` io.netty.handler.codec.http2.Http2Exception: Header size exceeded max allowed size (8192) at io.netty.handler.codec.http2.Http2Exception.connectionError(Http2Exception.java:103) at io.netty.handler.codec.http2.Http2Exception.headerListSizeError(Http2Exception.java:188) at io.netty.handler.codec.http2.Http2CodecUtil.headerListSizeExceeded(Http2CodecUtil.java:231) at io.netty.handler.codec.http2.HpackDecoder$Http2HeadersSink.finish(HpackDecoder.java:545) at io.netty.handler.codec.http2.HpackDecoder.decode(HpackDecoder.java:132) at io.netty.handler.codec.http2.HpackDecoderBenchmark.decode(HpackDecoderBenchmark.java:85) at io.netty.handler.codec.http2.generated.HpackDecoderBenchmark_decode_jmhTest.decode_thrpt_jmhStub(HpackDecoderBenchmark_decode_jmhTest.java:120) at io.netty.handler.codec.http2.generated.HpackDecoderBenchmark_decode_jmhTest.decode_Throughput(HpackDecoderBenchmark_decode_jmhTest.java:83) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:453) at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:437) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:748) ``` Also we should ensure we only use ascii for header names. Modifications: Just use Integer.MAX_VALUE as limit Result: Be able to run benchmark without exceptions	2019-07-04 11:24:37 +02:00
Carl Mastrangelo	65d8ecc3a0	Use Table lookup for HPACK decoder (#9307 ) Motivation: Table based decoding is fast. Modification: Use table based decoding in HPACK decoder, inspired by https://github.com/python-hyper/hpack/blob/master/hpack/huffman_table.py This modifies the table to be based on integers, rather than 3-tuples of bytes. This is for two reasons: 1. It's faster 2. Using bytes makes the static initializer too big, and doesn't compile. Result: Faster Huffman decoding. This only seems to help the ascii case, the other decoding is about the same. Benchmarks: ``` Before: Benchmark (limitToAscii) (sensitive) (size) Mode Cnt Score Error Units HpackDecoderBenchmark.decode true true SMALL thrpt 20 426293.636 ± 1444.843 ops/s HpackDecoderBenchmark.decode true true MEDIUM thrpt 20 57843.738 ± 725.704 ops/s HpackDecoderBenchmark.decode true true LARGE thrpt 20 3002.412 ± 16.998 ops/s HpackDecoderBenchmark.decode true false SMALL thrpt 20 412339.400 ± 1128.394 ops/s HpackDecoderBenchmark.decode true false MEDIUM thrpt 20 58226.870 ± 199.591 ops/s HpackDecoderBenchmark.decode true false LARGE thrpt 20 3044.256 ± 10.675 ops/s HpackDecoderBenchmark.decode false true SMALL thrpt 20 2082615.030 ± 5929.726 ops/s HpackDecoderBenchmark.decode false true MEDIUM thrpt 10 571640.454 ± 26499.229 ops/s HpackDecoderBenchmark.decode false true LARGE thrpt 20 92714.555 ± 2292.222 ops/s HpackDecoderBenchmark.decode false false SMALL thrpt 20 1745872.421 ± 6788.840 ops/s HpackDecoderBenchmark.decode false false MEDIUM thrpt 20 490420.323 ± 2455.431 ops/s HpackDecoderBenchmark.decode false false LARGE thrpt 20 84536.200 ± 398.714 ops/s After(bytes): Benchmark (limitToAscii) (sensitive) (size) Mode Cnt Score Error Units HpackDecoderBenchmark.decode true true SMALL thrpt 20 472649.148 ± 7122.461 ops/s HpackDecoderBenchmark.decode true true MEDIUM thrpt 20 66739.638 ± 341.607 ops/s HpackDecoderBenchmark.decode true true LARGE thrpt 20 3139.773 ± 24.491 ops/s HpackDecoderBenchmark.decode 
true false SMALL thrpt 20 466933.833 ± 4514.971 ops/s HpackDecoderBenchmark.decode true false MEDIUM thrpt 20 66111.778 ± 568.326 ops/s HpackDecoderBenchmark.decode true false LARGE thrpt 20 3143.619 ± 3.332 ops/s HpackDecoderBenchmark.decode false true SMALL thrpt 20 2109995.177 ± 6203.143 ops/s HpackDecoderBenchmark.decode false true MEDIUM thrpt 20 586026.055 ± 1578.550 ops/s HpackDecoderBenchmark.decode false false SMALL thrpt 20 1775723.270 ± 4932.057 ops/s HpackDecoderBenchmark.decode false false MEDIUM thrpt 20 493316.467 ± 1453.037 ops/s HpackDecoderBenchmark.decode false false LARGE thrpt 10 85726.219 ± 402.573 ops/s After(ints): Benchmark (limitToAscii) (sensitive) (size) Mode Cnt Score Error Units HpackDecoderBenchmark.decode true true SMALL thrpt 20 615549.006 ± 5282.283 ops/s HpackDecoderBenchmark.decode true true MEDIUM thrpt 20 86714.630 ± 654.489 ops/s HpackDecoderBenchmark.decode true true LARGE thrpt 20 3984.439 ± 61.612 ops/s HpackDecoderBenchmark.decode true false SMALL thrpt 20 602489.337 ± 5397.024 ops/s HpackDecoderBenchmark.decode true false MEDIUM thrpt 20 88399.109 ± 241.115 ops/s HpackDecoderBenchmark.decode true false LARGE thrpt 20 3875.729 ± 103.057 ops/s HpackDecoderBenchmark.decode false true SMALL thrpt 20 2092165.454 ± 11918.859 ops/s HpackDecoderBenchmark.decode false true MEDIUM thrpt 20 583465.437 ± 5452.115 ops/s HpackDecoderBenchmark.decode false true LARGE thrpt 20 93290.061 ± 665.904 ops/s HpackDecoderBenchmark.decode false false SMALL thrpt 20 1758402.495 ± 14677.438 ops/s HpackDecoderBenchmark.decode false false MEDIUM thrpt 10 491598.099 ± 5029.698 ops/s HpackDecoderBenchmark.decode false false LARGE thrpt 20 85834.290 ± 554.915 ops/s ```	2019-07-02 20:13:19 +02:00
jimin	78adeb5408	All override methods must be annotated with @Override (#9285 ) Motivation: Some methods that either override others or are implemented as part of implementing an interface were missing the `@Override` annotation Modifications: Add missing `@Override`s Result: Code cleanup	2019-06-27 13:52:06 +02:00
Alex Blewitt	e233407e01	Replace accumulation with blackhole.consume (#9275 ) Motivation: SpotJMHBugs reports that accumulating a value as a way of avoiding dead code elimination may be inadvisable, as discussed in `JMHSample_34_SafeLooping::measureWrong_2`. Change the test so that it consumes the response with `Blackhole::consume` instead. Modifications: - Replace addition of results with explicit `blackhole.consume()` call Result: Tests work as before, but with different benchmark numbers.	2019-06-25 21:46:26 +02:00
Francesco Nigro	c6114786ab	Documented non-usage of BlackHole::consume on ByteBufAccessBenchmark (#9279 ) Motivation: Some JMH benchmarks need additional explanations to motivate specific code choices. Modifications: Introduced a comment to explain why calling BlackHole::consume in a loop is not always the right choice for some benchmarks. Result: The relevant method shows a comment that warns against changing the code to introduce BlackHole::consume in the loop.	2019-06-25 14:53:12 +02:00
Alex Blewitt	99034a15b5	Return the result of the list.recycle() call (#9264 ) Motivation: Resolve the issue highlighted by SpotJMHBugs that the creation of the RecyclableArrayList may be elided by the JIT since the result isn't consumed or returned. Modifications: Return the result of `list.recycle()` so that the list isn't elided. Result: The JMH benchmark shows a change in performance indicating that the prior results of this may be unsound.	2019-06-22 07:21:51 +02:00
Carl Mastrangelo	f01278616a	Properly debounce wakeups (#9191 ) Motivation: The wakeup logic in EpollEventLoop is overly complex Modification: * Simplify the race to wakeup the loop * Don't let the event loop wake up itself (it's already awake!) * Make event loop check if there are any more tasks after preparing to sleep. There is a small window where the non-eventloop writers can issue eventfd writes here, but that is okay. Result: Cleaner wakeup logic. Benchmarks: ``` BEFORE Benchmark Mode Cnt Score Error Units EpollSocketChannelBenchmark.executeMulti thrpt 20 408381.411 ± 2857.498 ops/s EpollSocketChannelBenchmark.executeSingle thrpt 20 157022.360 ± 1240.573 ops/s EpollSocketChannelBenchmark.pingPong thrpt 20 60571.704 ± 331.125 ops/s AFTER Benchmark Mode Cnt Score Error Units EpollSocketChannelBenchmark.executeMulti thrpt 20 440546.953 ± 1652.823 ops/s EpollSocketChannelBenchmark.executeSingle thrpt 20 168114.751 ± 1176.609 ops/s EpollSocketChannelBenchmark.pingPong thrpt 20 61231.878 ± 520.108 ops/s ```	2019-06-04 05:27:15 -07:00
Nick Hill	23554e6997	Ensure "full" ownership of msgs passed to EmbeddedChannel.writeInbound() (#9058 ) Motivation Pipeline handlers are free to "take control" of input buffers if they have singular refcount - in particular to mutate their raw data if non-readonly via discarding of read bytes, etc. However there are various places (primarily unit tests) where a wrapped byte-array buffer is passed in and the wrapped array is assumed not to change (used after the wrapped buffer is passed to EmbeddedChannel.writeInbound()). This invalid assumption could result in unexpected errors, such as those exposed by #8931. Modifications Anywhere that the data passed to writeInbound() might be used again, ensure that either: - A copy is used rather than wrapping a shared byte array, or - The buffer is otherwise protected from modification by making it read-only For the tests, copying is preferred since it still allows the "mutating" optimizations to be exercised. Results Avoid possible errors when pipeline assumes it has full control of input buffer.	2019-05-22 12:35:03 +02:00
Francesco Nigro	635fc9eae0	The benchmark is not taking into account nanoTime granularity (#9033 ) Motivation: Results are just wrong for small delays. Modifications: Switching to AverageTime avoids relying on OS nanoTime granularity. Result: Uncontended low delay results are not reliable	2019-04-15 15:15:08 +02:00
Norman Maurer	0f34345347	Merge ChannelInboundHandler and ChannelOutboundHandler into ChannelHa… (#8957 ) Motivation: In `42742e233f` we already added default methods to Channel*Handler and deprecated the Adapter classes to simplify the class hierarchy. With this change we go even further and merge everything into just ChannelHandler. This simplifies things even more in terms of class-hierarchy. Modifications: - Merge ChannelInboundHandler \| ChannelOutboundHandler into ChannelHandler - Adjust code to just use ChannelHandler - Deprecate old interfaces. Result: Cleaner and simpler code in terms of class-hierarchy.	2019-03-28 09:28:27 +00:00
Norman Maurer	42742e233f	Deprecate ChannelInboundHandlerAdapter and ChannelOutboundHandlerAdapter (#8929 ) Motivation: As we now use Java 8 as the minimum Java version we can deprecate ChannelInboundHandlerAdapter / ChannelOutboundHandlerAdapter and just move the default implementations into the interfaces. This makes things a bit more flexible for the end-user and also simplifies the class-hierarchy. Modifications: - Mark ChannelInboundHandlerAdapter and ChannelOutboundHandlerAdapter as deprecated - Add default implementations to ChannelInboundHandler / ChannelOutboundHandler - Refactor our code to not use ChannelInboundHandlerAdapter / ChannelOutboundHandlerAdapter anymore Result: Cleanup class-hierarchy and make things a bit more flexible.	2019-03-13 09:46:10 +01:00
Norman Maurer	c6b372f517	Use maven plugin to prevent API/ABI breakage as part of build process (#8904 ) Motivation: Netty is very widely used which can lead to a lot of pain when we break API / ABI. We should make use of japicmp-maven-plugin during the build to verify we do not introduce breakage by mistake. Modifications: - Add japicmp-maven-plugin to the build process - Fix a method signature change in HttpProxyHandler that was flagged as a possible problem. Result: Ensure no API/ABI breakage occurs between releases.	2019-03-01 19:48:29 +01:00
Nick Hill	35161ad174	Further reduce ensureAccessible() overhead (#8895 ) Motivation: This PR fixes some non-negligible overhead discovered in the ByteBuf accessibility (non-zero refcount) checking. The cause turned out to be mostly twofold: - Unnecessary operations used to calculate the refcount from the "raw" encoded int field value - Call stack depths exceeding the default limit for inlining, in some places (CompositeByteBuf in particular) It's a follow-on from #8882 which uses the maxCapacity field for a simpler non-negative check. The performance gap between these two variants appears to be _mostly_ closed, but there's one exception which may warrant further analysis. Modifications: - Replace ABB.internalRefCount() with ByteBuf.isAccessible(), the default still checks for non-zero refCnt() - Just test for parity of raw refCnt instead of converting to "real", with fast-path for specific small values - Make sure isAccessible() is delegated by derived/wrapper ByteBufs - Use existing freed flag in CompositeByteBuf for faster isAccessible() - Manually inline some calls in methods like CompositeByteBuf.setLong() and AbstractReferenceCountedByteBuf.isAccessible() to reduce stack depths (to ensure default inlining limit isn't hit) - Add ByteBufAccessBenchmark which is an extension of UnsafeByteBufBenchmark (maybe latter could now be removed) Results: Before: Benchmark (bufferType) (checkAccessible) (checkBounds) Mode Cnt Score Error Units readBatch UNSAFE true true thrpt 30 84524972.863 ± 518338.811 ops/s readBatch UNSAFE_SLICE true true thrpt 30 38608795.037 ± 298176.974 ops/s readBatch HEAP true true thrpt 30 80003697.649 ± 974674.119 ops/s readBatch COMPOSITE true true thrpt 30 18495554.788 ± 108075.023 ops/s setGetLong UNSAFE true true thrpt 30 247069881.578 ± 10839162.593 ops/s setGetLong UNSAFE_SLICE true true thrpt 30 196355905.206 ± 1802420.990 ops/s setGetLong HEAP true true thrpt 30 245686644.713 ± 11769311.527 ops/s setGetLong COMPOSITE true true thrpt 30 
83170940.687 ± 657524.123 ops/s setLong UNSAFE true true thrpt 30 278940253.918 ± 1807265.259 ops/s setLong UNSAFE_SLICE true true thrpt 30 202556738.764 ± 11887973.563 ops/s setLong HEAP true true thrpt 30 280045958.053 ± 2719583.400 ops/s setLong COMPOSITE true true thrpt 30 121299806.002 ± 2155084.707 ops/s After: Benchmark (bufferType) (checkAccessible) (checkBounds) Mode Cnt Score Error Units readBatch UNSAFE true true thrpt 30 101641801.035 ± 3950050.059 ops/s readBatch UNSAFE_SLICE true true thrpt 30 84395902.846 ± 4339579.057 ops/s readBatch HEAP true true thrpt 30 100179060.207 ± 3222487.287 ops/s readBatch COMPOSITE true true thrpt 30 42288494.472 ± 294919.633 ops/s setGetLong UNSAFE true true thrpt 30 304530755.027 ± 6574163.899 ops/s setGetLong UNSAFE_SLICE true true thrpt 30 212028547.645 ± 14277828.768 ops/s setGetLong HEAP true true thrpt 30 309335422.609 ± 2272150.415 ops/s setGetLong COMPOSITE true true thrpt 30 160383609.236 ± 966484.033 ops/s setLong UNSAFE true true thrpt 30 298055969.747 ± 7437449.627 ops/s setLong UNSAFE_SLICE true true thrpt 30 223784178.650 ± 9869750.095 ops/s setLong HEAP true true thrpt 30 302543263.328 ± 8140104.706 ops/s setLong COMPOSITE true true thrpt 30 157083673.285 ± 3528779.522 ops/s There's also a similar knock-on improvement to other benchmarks (e.g. HPACK encoding/decoding) as shown in #8882. For sanity I did a final comparison of the "fast path" tweak using one of the HPACK benchmarks: (rawCnt & 1) == 0: Benchmark (limitToAscii) (sensitive) (size) Mode Cnt Score Error Units HpackDecoderBenchmark.decode true true MEDIUM thrpt 30 50914.479 ± 940.114 ops/s rawCnt == 2 \|\| rawCnt == 4 \|\| rawCnt == 6 \|\| rawCnt == 8 \|\| (rawCnt & 1) == 0: Benchmark (limitToAscii) (sensitive) (size) Mode Cnt Score Error Units HpackDecoderBenchmark.decode true true MEDIUM thrpt 30 60036.425 ± 1478.196 ops/s	2019-02-28 20:41:16 +01:00
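The two accessibility checks compared at the end of 35161ad174 can be sketched in isolation. This is a minimal illustration, not Netty's code: the class and method names are hypothetical, and the encoding (a live buffer stores an even raw value equal to twice the real count, an odd raw value means released) is an assumption inferred from the expressions quoted in the commit message.

```java
// Hypothetical sketch of the parity-based accessibility check; not Netty's actual code.
final class RefCntParitySketch {

    // Parity-only variant: an even raw value is treated as "still accessible".
    static boolean isLiveParityOnly(int rawCnt) {
        return (rawCnt & 1) == 0;
    }

    // Fast-path variant quoted in the commit message: compare against a few
    // small even values first, then fall back to the parity test.
    static boolean isLiveFastPath(int rawCnt) {
        return rawCnt == 2 || rawCnt == 4 || rawCnt == 6 || rawCnt == 8
                || (rawCnt & 1) == 0;
    }

    // Assumed decoding of the raw value into the "real" reference count.
    static int realRefCnt(int rawCnt) {
        return (rawCnt & 1) != 0 ? 0 : rawCnt >>> 1;
    }

    public static void main(String[] args) {
        for (int raw : new int[] {1, 2, 3, 4, 8, 10}) {
            System.out.println(raw + " -> live=" + isLiveFastPath(raw)
                    + ", real=" + realRefCnt(raw));
        }
    }
}
```

Both variants return the same answer for every input; the fast path only changes how quickly the common small values are recognised, which is what the HPACK comparison in the message measures.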
Dmitriy Dumanskiy	116f72db8d	Legacy properties removed (#8839 ) Motivation: We can remove some properties for which we introduced replacements. Modifications: io.netty.buffer.bytebuf.checkAccessible, io.netty.leakDetectionLevel, org.jboss.netty.tryUnsafe properties removed Result: Code cleanup	2019-02-04 13:56:15 +01:00
田欧	e8efcd82a8	migrate java8: use requireNonNull (#8840 ) Motivation: We can just use Objects.requireNonNull(...) as a replacement for ObjectUtil.checkNotNull(....) Modifications: - Use Objects.requireNonNull(...) Result: Less code to maintain.	2019-02-04 10:32:25 +01:00
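The replacement described in e8efcd82a8 relies on the JDK's `Objects.requireNonNull(...)`. A minimal sketch of the pattern follows; the class `RequireNonNullSketch` and its field are hypothetical and only serve the illustration.

```java
import java.util.Objects;

// Hypothetical class showing a constructor argument validated with the JDK helper.
final class RequireNonNullSketch {
    private final String name;

    RequireNonNullSketch(String name) {
        // Throws NullPointerException with the message "name" when the argument is null.
        this.name = Objects.requireNonNull(name, "name");
    }

    String name() {
        return name;
    }

    public static void main(String[] args) {
        System.out.println(new RequireNonNullSketch("netty").name());
        try {
            new RequireNonNullSketch(null);
        } catch (NullPointerException e) {
            System.out.println("rejected null: " + e.getMessage());
        }
    }
}
```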
Dmitriy Dumanskiy	2e433889b2	Improve DateFormatter parsing performance (#8821 ) Motivation: While looking through the code I found that DateFormatter.tryParseMonth was not very efficient, so I decided to optimize it a bit. Modification: Changed DateFormatter.tryParseMonth method. Instead of invoking regionMatch() for every month, compare chars one by one. Result: DateFormatter.parseHttpDate method performance improved by ~3% to ~15%. Benchmark (DATE_STRING) Mode Cnt Score Error Units DateFormatter2Benchmark.parseHttpHeaderDateFormatter Sun, 27 Jan 2016 19:18:46 GMT thrpt 6 4142781.221 ± 82155.002 ops/s DateFormatter2Benchmark.parseHttpHeaderDateFormatter Sun, 27 Dec 2016 19:18:46 GMT thrpt 6 3781810.558 ± 38679.061 ops/s DateFormatter2Benchmark.parseHttpHeaderDateFormatterNew Sun, 27 Jan 2016 19:18:46 GMT thrpt 6 4372569.705 ± 30257.537 ops/s DateFormatter2Benchmark.parseHttpHeaderDateFormatterNew Sun, 27 Dec 2016 19:18:46 GMT thrpt 6 4339785.100 ± 57542.660 ops/s	2019-02-04 10:04:35 +01:00
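The idea in 2e433889b2 of comparing characters one by one, rather than running a region match against every month name, might look roughly like the sketch below. It is not the actual `DateFormatter.tryParseMonth` code: the class, the `parseMonth` signature, and the 0-based return value are assumptions made for the example.

```java
// Hypothetical sketch: resolve a three-letter month by branching on its characters.
final class MonthParseSketch {

    // Returns the 0-based month index, or -1 if no month starts at 'start'.
    static int parseMonth(CharSequence txt, int start) {
        if (start < 0 || txt.length() - start < 3) {
            return -1;
        }
        char c1 = lower(txt.charAt(start));
        char c2 = lower(txt.charAt(start + 1));
        char c3 = lower(txt.charAt(start + 2));
        switch (c1) {
            case 'j':
                if (c2 == 'a' && c3 == 'n') return 0;   // Jan
                if (c2 == 'u' && c3 == 'n') return 5;   // Jun
                if (c2 == 'u' && c3 == 'l') return 6;   // Jul
                return -1;
            case 'f':
                return c2 == 'e' && c3 == 'b' ? 1 : -1; // Feb
            case 'm':
                if (c2 == 'a' && c3 == 'r') return 2;   // Mar
                if (c2 == 'a' && c3 == 'y') return 4;   // May
                return -1;
            case 'a':
                if (c2 == 'p' && c3 == 'r') return 3;   // Apr
                if (c2 == 'u' && c3 == 'g') return 7;   // Aug
                return -1;
            case 's':
                return c2 == 'e' && c3 == 'p' ? 8 : -1;  // Sep
            case 'o':
                return c2 == 'c' && c3 == 't' ? 9 : -1;  // Oct
            case 'n':
                return c2 == 'o' && c3 == 'v' ? 10 : -1; // Nov
            case 'd':
                return c2 == 'e' && c3 == 'c' ? 11 : -1; // Dec
            default:
                return -1;
        }
    }

    // ASCII-only lower-casing, sufficient for month abbreviations.
    private static char lower(char c) {
        return c >= 'A' && c <= 'Z' ? (char) (c + 32) : c;
    }

    public static void main(String[] args) {
        String date = "Sun, 27 Jan 2016 19:18:46 GMT";
        System.out.println(parseMonth(date, 8));
    }
}
```

Branching on the first character means each lookup inspects at most three characters and a handful of comparisons, instead of up to twelve region comparisons.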
Norman Maurer	e3846c54f6	Remove ChannelHandler.exceptionCaught(...) as it should only exist in… (#8822 ) Motivation: ChannelHandler.exceptionCaught(...) was marked as @deprecated as it should only exist in inbound handlers. Modifications: Remove ChannelHandler.exceptionCaught(...) and adjust code / tests. Result: Fixes https://github.com/netty/netty/issues/8527	2019-01-31 20:29:17 +01:00
Dmitriy Dumanskiy	67b23ab056	Remove HttpHeaderDateFormat class (#8807 ) Motivation: HttpHeaderDateFormat was replaced with DateFormatter many days ago and now can be easily removed. Modification: Remove deprecated class and related test / benchmark Result: Less code to maintain	2019-01-31 07:22:20 +01:00
田欧	6222101924	migrate java8: use lambda and method reference (#8781 ) Motivation: We can use lambdas now as we use Java8. Modification: use lambda function for all package, #8751 only migrate transport package. Result: Code cleanup.	2019-01-29 14:06:05 +01:00
Norman Maurer	310f31b392	Update to new checkstyle plugin (#8777 ) Motivation: We need to update to a new checkstyle plugin to allow the usage of lambdas. Modifications: - Update to new plugin version. - Fix checkstyle problems. Result: Be able to use checkstyle plugin which supports new Java syntax.	2019-01-24 16:24:19 +01:00
Norman Maurer	3d6e6136a9	Decouple EventLoop details from the IO handling for each transport to… (#8680 ) * Decouple EventLoop details from the IO handling for each transport to allow easy re-use of code and customization Motivation: As of today, extending EventLoop implementations to add custom logic / metrics / instrumentation is only possible in a very limited way, if at all. This is due to the fact that most implementations are final or even package-private. That said, even if these were public, the ability to do something useful with them is very limited as the IO processing and task processing are very tightly coupled. All of the mentioned things are a big pain point in netty 4.x and need improvement. Modifications: This changeset decouples the IO processing logic from the task processing logic for the main transports (NIO, Epoll, KQueue) by introducing the concept of an IoHandler. The IoHandler itself is responsible for waiting for IO readiness and processing these IO events. The execution of the IoHandler itself is done by the SingleThreadEventLoop as part of its EventLoop processing. This allows using the same EventLoopGroup (MultiThreadEventLoopGroup) for all the mentioned transports by just specifying a different IoHandlerFactory during construction. Besides this core API change, this changeset also makes it easy to extend SingleThreadEventExecutor / SingleThreadEventLoop to add custom logic to it which then can be reused by all the transports. The ideas are very similar to what is provided by ScheduledThreadPoolExecutor (that is part of the JDK). This allows for example things like: * Adding instrumentation / metrics: * how many Channels are registered on a SingleThreadEventLoop * how many Channels were handled during the IO processing in an EventLoop run * how many tasks were handled during the last EventLoop / EventExecutor run * how many outstanding tasks we have ... ...
* Implementing custom strategies for choosing the next EventExecutor / EventLoop to use based on these metrics. * Use different Promise / Future / ScheduledFuture implementations * decorate Runnable / Callables when submitted to the EventExecutor / EventLoop As a lot of functionalities are folded into the MultiThreadEventLoopGroup and SingleThreadEventLoopGroup this changeset also removes: * AbstractEventLoop * AbstractEventLoopGroup * EventExecutorChooser * EventExecutorChooserFactory * DefaultEventLoopGroup * DefaultEventExecutor * DefaultEventExecutorGroup Result: Fixes https://github.com/netty/netty/issues/8514 .	2019-01-23 08:32:05 +01:00
Dmitriy Dumanskiy	7b92ff2500	Java 8 migration. Remove ThreadLocalProvider and inline java.util.concurrent.ThreadLocalRandom.current() where necessary. (#8762 ) Motivation: Custom Netty ThreadLocalRandom and ThreadLocalRandomProvider classes are no longer needed and can be removed. Modification: Remove own ThreadLocalRandom Result: Less code to maintain	2019-01-22 20:14:28 +01:00
田欧	9d62deeb6f	Java 8 migration: Use diamond operator (#8749 ) Motivation: We can use the diamond operator these days. Modification: Use diamond operator whenever possible. Result: More modern code and less boiler-plate.	2019-01-22 16:07:26 +01:00
Norman Maurer	8fdf373557	Skip execution of ChannelHandler method if annotated with @Skip and just use the next handler in the pipeline. (#8723 ) Motivation: Invoking ChannelHandlers is not free and can result in some overhead when the ChannelPipeline becomes very long. This is especially true if most handlers will just forward the call to the next handler in the pipeline. When the user extends ChannelHandlerAdapter we can easily detect if we can just skip the handler and invoke the next handler in the pipeline directly. This reduces the overhead of dispatch but also reduces the call-stack in many cases. Modifications: Detect if we can skip the handler when walking the pipeline. Result: Reduce overhead for long pipelines. Benchmark (extraHandlers) Mode Cnt Score Error Units DefaultChannelPipelineBenchmark.propagateEventOld 4 thrpt 10 267313.031 ± 9131.140 ops/s DefaultChannelPipelineBenchmark.propagateEvent 4 thrpt 10 824825.673 ± 12727.594 ops/s	2019-01-22 08:58:58 +01:00
Norman Maurer	1fe931b6e2	Make it possible to use a wrapped EventLoop with a Channel (#8677 ) Motivation: Because of how we implemented the registration / deregistration of an EventLoop it was not possible to wrap an EventLoop implementation and use it with a Channel. Modification: - Introduce EventLoop.Unsafe which is responsible for the actual registration. - Move validation of EventLoop / Channel combo to the EventLoop - Add unit test that verifies that wrapping works Result: Be able to wrap an EventLoop and so add some extra functionality.	2019-01-17 09:17:51 +01:00
Norman Maurer	c10ccc5dec	Tighten contract between Channel and EventLoop by require the EventLoop on Channel construction. (#8587 ) Motivation: At the moment it’s possible to have a Channel in Netty that is not registered / assigned to an EventLoop until register(...) is called. This is suboptimal as if the Channel is not registered it is also not possible to do anything useful with a ChannelFuture that belongs to the Channel. We should think about if we should have the EventLoop as a constructor argument of a Channel and have the register / deregister method only have the effect of add a Channel to KQueue/Epoll/... It is also currently possible to deregister a Channel from one EventLoop and register it with another EventLoop. This operation defeats the threading model assumptions that are wide spread in Netty, and requires careful user level coordination to pull off without any concurrency issues. It is not a commonly used feature in practice, may be better handled by other means (e.g. client side load balancing), and therefore we propose removing this feature. Modifications: - Change all Channel implementations to require an EventLoop for construction ( + an EventLoopGroup for all ServerChannel implementations) - Remove all register(...) methods from EventLoopGroup - Add ChannelOutboundInvoker.register(...) which now basically means we want to register on the EventLoop for IO. - Change ChannelUnsafe.register(...) to not take an EventLoop as parameter (as the EventLoop is supplied on custruction). - Change ChannelFactory to take an EventLoop to create new Channels and introduce ServerChannelFactory which takes an EventLoop and one EventLoopGroup to create new ServerChannel instances. - Add ServerChannel.childEventLoopGroup() - Ensure all operations on the accepted Channel is done in the EventLoop of the Channel in ServerBootstrap - Change unit tests for new behaviour Result: A Channel always has an EventLoop assigned which will never change during its life-time. 
This ensures we are always able to call any operation on the Channel once constructed (until the EventLoop is shut down). This also simplifies the logic in DefaultChannelPipeline a lot as we can always call handlerAdded / handlerRemoved directly without the need to wait for register() to happen. Also note that it's still possible to deregister a Channel and register it again. It's just not possible anymore to move from one EventLoop to another (which was not really safe anyway). Fixes https://github.com/netty/netty/issues/8513.	2019-01-14 20:11:13 +01:00
Norman Maurer	d9a6cf341c	Remove support for marking reader and writerIndex in ByteBuf to reduce overhead and complexity. (#8636 ) Motivation: ByteBuf supports “marker indexes”. The intended use case for these is that, if a speculative operation (e.g. decode) is in progress, the user can “mark” an index and refer to it later if the operation isn’t successful (e.g. not enough data). However this is rarely used in practice, requires extra memory to maintain, and introduces complexity in the state management for derived/pooled buffer initialization, resizing, and other operations which may modify reader/writer indexes. Modifications: Remove support for marking and adjust testcases / code. Result: Fixes https://github.com/netty/netty/issues/8535.	2018-12-11 14:00:49 +01:00
Francesco Nigro	4c2b11633a	Adding an execute burst cost benchmark for Netty executors (#8594 ) Motivation: Netty executors don't yet have any means of being compared with each other or with the j.u.c. executors. Modifications: A new benchmark measuring execute burst cost is being added. Result: It's now possible to compare some of the Netty executors with each other and with the j.u.c. executors	2018-12-04 15:46:48 +01:00
Norman Maurer	2c78dde749	Update version number to start working on Netty 5	2018-11-20 15:49:57 +01:00


365 Commits