netty5

Author	SHA1	Message	Date
Norman Maurer	2dae6665f4	Fix caching for normal allocations (#10825 ) Motivation: https://github.com/netty/netty/pull/10267 introduced a change that reduced the fragmentation. Unfortunally it also introduced a regression when it comes to caching of normal allocations. This can have a negative performance impact depending on the allocation sizes. Modifications: - Fix algorithm to calculate the array size for normal allocation caches - Correctly calculate indeox for normal caches - Add unit test Result: Fixes https://github.com/netty/netty/issues/10805	2020-11-25 15:09:39 +01:00
Frédéric Brégier	3a58063fe7	Fix for performance regression on HttpPost RequestDecoder (#10623 ) Fix issue #10508 where PARANOID mode slow down about 1000 times compared to ADVANCED. Also fix a rare issue when internal buffer was growing over a limit, it was partially discarded using `discardReadBytes()` which causes bad changes within previously discovered HttpData. Reasons were: Too many `readByte()` method calls while other ways exist (such as keep in memory the last scan position when trying to find a delimiter or using `bytesBefore(firstByte)` instead of looping externally). Changes done: - major change on way buffer are parsed: instead of read byte per byte until found delimiter, try to find the delimiter using `bytesBefore()` and keep the last unfound position to skeep already parsed parts (algorithms are the same but implementation of scan are different) - Change the condition to discard read bytes when refCnt is at most 1. Observations using Async-Profiler: ================================== 1) Without optimizations, most of the time (more than 95%) is through `readByte()` method within `loadDataMultipartStandard` method. 2) With using `bytesBefore(byte)` instead of `readByte()` to find various delimiter, the `loadDataMultipartStandard` method is going down to 19 to 33% depending on the test used. the `readByte()` method or equivalent `getByte(pos)` method are going down to 15% (from 95%). Times are confirming those profiling: - With optimizations, in SIMPLE mode about 82% better, in ADVANCED mode about 79% better and in PARANOID mode about 99% better (most of the duplicate read accesses are removed or make internally through `bytesBefore(byte)` method) A benchmark is added to show the behavior of the various cases (one big item, such as File upload, and many items) and various level of detection (Disabled, Simple, Advanced, Paranoid). This benchmark is intend to alert if new implementations make too many differences (such as the previous version where about PARANOID gives about 1000 times slower than other levels, while it is now about at most 10 times). Extract of Benchmark run: ========================= Run complete. Total time: 00:13:27 Benchmark Mode Cnt Score Error Units HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigAdvancedLevel thrpt 6 2,248 ± 0,198 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigDisabledLevel thrpt 6 2,067 ± 1,219 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigParanoidLevel thrpt 6 1,109 ± 0,038 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigSimpleLevel thrpt 6 2,326 ± 0,314 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighAdvancedLevel thrpt 6 1,444 ± 0,226 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighDisabledLevel thrpt 6 1,462 ± 0,642 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighParanoidLevel thrpt 6 0,159 ± 0,003 ops/ms HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighSimpleLevel thrpt 6 1,522 ± 0,049 ops/ms	2020-11-19 08:01:05 +01:00
Norman Maurer	eeece4cfa5	Use http in xmlns URIs to make maven release plugin happy again (#10788 ) Motivation: https in xmlns URIs does not work and will let the maven release plugin fail: ``` [INFO] ------------------------------------------------------------------------ [INFO] BUILD FAILURE [INFO] ------------------------------------------------------------------------ [INFO] Total time: 1.779 s [INFO] Finished at: 2020-11-10T07:45:21Z [INFO] ------------------------------------------------------------------------ [ERROR] Failed to execute goal org.apache.maven.plugins:maven-release-plugin:2.5.3:prepare (default-cli) on project netty-parent: Execution default-cli of goal org.apache.maven.plugins:maven-release-plugin:2.5.3:prepare failed: The namespace xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" could not be added as a namespace to "project": The namespace prefix "xsi" collides with an additional namespace declared by the element -> [Help 1] [ERROR] ``` See also https://issues.apache.org/jira/browse/HBASE-24014. Modifications: Use http for xmlns Result: Be able to use maven release plugin	2020-11-10 10:51:05 +01:00
Chris Vest	a6b749843f	Use JUnit 5 for running all tests (#10764 ) Motivation: JUnit 5 is the new hotness. It's more expressive, extensible, and composable in many ways, and it's better able to run tests in parallel. But most importantly, it's able to directly run JUnit 4 tests. This means we can update and start using JUnit 5 without touching any of our existing tests. I'm also introducing a dependency on assertj-core, which is like hamcrest, but arguably has a nicer and more discoverable API. Modification: Add the JUnit 5 and assertj-core dependencies, without converting any tests at time time. Result: All our tests are now executed through the JUnit 5 Vintage Engine. Also, the JUnit 5 test APIs are available, and any JUnit 5 tests that are added from now on will also be executed.	2020-11-04 10:21:03 +01:00
Artem Smotrakov	b8ae2a2af4	Enable nohttp check during the build (#10708 ) Motivation: HTTP is a plaintext protocol which means that someone may be able to eavesdrop the data. To prevent this, HTTPS should be used whenever possible. However, maintaining using https:// in all URLs may be difficult. The nohttp tool can help here. The tool scans all the files in a repository and reports where http:// is used. Modifications: - Added nohttp (via checkstyle) into the build process. - Suppressed findings for the websites that don't support HTTPS or that are not reachable Result: - Prevent using HTTP in the future. - Encourage users to use HTTPS when they follow the links they found in the code.	2020-10-23 15:26:25 +02:00
Francesco Nigro	4624b6309d	Reduce DefaultAttributeMap lookup cost (#10530 ) Motivation: DefaultAttributeMap::attr has a blocking behaviour on lookup of an existing attribute: it can be made non-blocking. Modification: Replace the existing fixed bucket table using a locked intrusive linked list with an hand-rolled copy-on-write ordered single array Result: Non blocking behaviour for the lookup happy path	2020-10-02 21:19:03 +02:00
Chris Vest	1d7efbddd9	Fix compilation after forward port of #10368 Motivation: Code failed to compile because ByteBuf index marking has been removed. Modification: Index marking wasn't really used anyway, so just set the relevant index to zero. Result: Code compiles again.	2020-09-09 16:27:52 +02:00
Francesco Nigro	7f86f90646	Improve predictability of writeUtf8/writeAscii performance (#10368 ) Motivation: writeUtf8 can suffer from inlining issues and/or megamorphic call-sites on the hot path due to ByteBuf hierarchy Modifications: Duplicate and specialize the code paths to reduce the need of polymorphic calls Result: Performance are more stable in user code	2020-09-09 16:15:22 +02:00
Francesco Nigro	319a4bc3ba	Reduce garbage on MQTT (#10509 ) Reduce garbage on MQTT encoding Motivation: MQTT encoding and decoding is doing unnecessary object allocation in a number of places: - MqttEncoder create many byte[] to encode Strings into UTF-8 bytes - MqttProperties uses Integer keys instead of int - Some enums valueOf create unnecessary arrays on the hot paths - MqttDecoder was using unecessary Result<T> Modification: - ByteBufUtil::utf8Bytes and ByteBufUtil::reserveAndWriteUtf8 allows to perform the same operation GC-free - MqttProperties uses a primitive key map - Implemented GC free const table lookup/switch valueOf - Use some bit-tricks to pack 2 ints into a single primitive long to store both result and numberOfBytesConsumed and use byte[].length to compute numberOfByteConsumed on fly. These changes allowed to save creating Result<T>. Result: Significantly less garbage produced in MQTT encoding/decoding	2020-09-04 18:31:53 +02:00
Francesco Nigro	0a8c9192e5	Improve MqttMessageType::valueOf cost (#10400 ) Motivation: MqttMessageType::valueOf has O(N) cost Modifications: MqttMessageType::valueOf uses a const lookup table Result: MqttMessageType::valueOf has O(1) cost	2020-08-31 10:32:48 +02:00
Linas Medžiūnas	abdcf102da	Efficient BytBuf search algorithms (#9914 ) (#9955 ) Motivation: We have found out that ByteBufUtil.indexOf can be inefficient for substring search on ByteBuf, both in terms of algorithm complexity (worst case O(needle.readableBytes * haystack.readableBytes)), and in constant factor (esp. on Composite buffers). With implementation of more performant search algorithms we have seen improvements on the order of magnitude. Modifications: This change introduces three search algorithms: 1. Knuth Morris Pratt - classical textbook algorithm, a good default choice. 2. Bit mask based algorithm - stable performance on any input, but limited to maximum search substring (the needle) length of 64 bytes. 3. Aho–Corasick - worse performance and higher memory consumption than [1] and [2], but it supports multiple substring (the needles) search simultaneously, by inspecting every byte of the haystack only once. Each algorithm processes every byte of underlying buffer only once, they are implemented as ByteProcessor. Result: Efficient search algorithms with linear time complexity available in Netty (I will share benchmark results in a comment on a PR).	2020-04-15 10:26:53 +02:00
Dmitry Konstantinov	dc69c04434	Replace usage() with freeBytes() in thresholds within hot paths of PoolChunkList (#10141 ) Motivation: PoolChunk.usage() method has non-trivial computations. It is used currently in hot path methods invoked when an allocation and de-allocation are happened. The idea is to replace usage() output comparison against percent thresholds by Chunk.freeBytes plain comparison against absolute thresholds. In such way the majority of computations from the threshold conditions are moved to init logic. Modifications: Replace PoolChunk.usage() conditions in PoolChunkList with equivalent conditions for PoolChunk.freeBytes() Result: Improve performance of allocation and de-allocation of ByteBuf from normal size cache pool	2020-03-31 22:11:42 +02:00
Norman Maurer	6a43807843	Use lambdas whenever possible (#9979 ) Motivation: We should update our code to use lamdas whenever possible Modifications: Use lambdas when possible Result: Cleanup code for Java8	2020-01-30 09:28:24 +01:00
Norman Maurer	9e29c39daa	Cleanup usage of Channel*Handler (#9959 ) Motivation: In next major version of netty users should use ChannelHandler everywhere. We should ensure we do the same Modifications: Replace usage of deprecated classes / interfaces with ChannelHandler Result: Use non-deprecated code	2020-01-20 17:47:17 -08:00
Francesco Nigro	1e4f0e6a09	Faster decodeHexNibble (#9896 ) Motivation: decodeHexNibble can be a lot faster using a lookup table Modifications: decodeHexNibble is made faster by using a lookup table Result: decodeHexNibble is faster	2019-12-23 21:16:44 +01:00
Anuraag Agrawal	ee206b6ba8	Separate out query string encoding for non-encoded strings. (#9887 ) Motivation: Currently, characters are appended to the encoded string char-by-char even when no encoding is needed. We can instead separate out codepath that appends the entire string in one go for better `StringBuilder` allocation performance. Modification: Only go into char-by-char loop when finding a character that requires encoding. Result: The results aren't so clear with noise on my hot laptop - the biggest impact is on long strings, both to reduce resizes of the buffer and also to reduce complexity of the loop. I don't think there's a significant downside though for the cases that hit the slow path. After ``` Benchmark Mode Cnt Score Error Units QueryStringEncoderBenchmark.longAscii thrpt 6 1.406 ± 0.069 ops/us QueryStringEncoderBenchmark.longAsciiFirst thrpt 6 0.046 ± 0.001 ops/us QueryStringEncoderBenchmark.longUtf8 thrpt 6 0.046 ± 0.001 ops/us QueryStringEncoderBenchmark.shortAscii thrpt 6 15.781 ± 0.949 ops/us QueryStringEncoderBenchmark.shortAsciiFirst thrpt 6 3.171 ± 0.232 ops/us QueryStringEncoderBenchmark.shortUtf8 thrpt 6 3.900 ± 0.667 ops/us ``` Before ``` Benchmark Mode Cnt Score Error Units QueryStringEncoderBenchmark.longAscii thrpt 6 0.444 ± 0.072 ops/us QueryStringEncoderBenchmark.longAsciiFirst thrpt 6 0.043 ± 0.002 ops/us QueryStringEncoderBenchmark.longUtf8 thrpt 6 0.047 ± 0.001 ops/us QueryStringEncoderBenchmark.shortAscii thrpt 6 16.503 ± 1.015 ops/us QueryStringEncoderBenchmark.shortAsciiFirst thrpt 6 3.316 ± 0.154 ops/us QueryStringEncoderBenchmark.shortUtf8 thrpt 6 3.776 ± 0.956 ops/us ```	2019-12-20 08:51:26 +01:00
Anuraag Agrawal	0f42eb1ceb	Use array to buffer decoded query instead of ByteBuffer. (#9886 ) Motivation: In Java, it is almost always at least slower to use `ByteBuffer` than `byte[]` without pooling or I/O. `QueryStringDecoder` can use `byte[]` with arguably simpler code. Modification: Replace `ByteBuffer` / `CharsetDecoder` with `byte[]` and `new String` Result: After ``` Benchmark Mode Cnt Score Error Units QueryStringDecoderBenchmark.noDecoding thrpt 6 5.612 ± 2.639 ops/us QueryStringDecoderBenchmark.onlyDecoding thrpt 6 1.393 ± 0.067 ops/us QueryStringDecoderBenchmark.mixedDecoding thrpt 6 1.223 ± 0.048 ops/us ``` Before ``` Benchmark Mode Cnt Score Error Units QueryStringDecoderBenchmark.noDecoding thrpt 6 6.123 ± 0.250 ops/us QueryStringDecoderBenchmark.onlyDecoding thrpt 6 0.922 ± 0.159 ops/us QueryStringDecoderBenchmark.mixedDecoding thrpt 6 1.032 ± 0.178 ops/us ``` I notice #6781 switched from an array to `ByteBuffer` but I can't find any motivation for that in the PR. Unit tests pass fine with an array and we get a reasonable speed bump.	2019-12-18 21:15:44 +01:00
Nick Hill	d370d48d4a	Update to latest JMH version (#9787 ) Motivation JMH 1.22 was released recently, we might as well use the latest when running benchmarks. Summary of changes: https://mail.openjdk.java.net/pipermail/jmh-dev/2019-November/002879.html Modifications Update jmh dependencies in microbench module from version 1.21 to 1.22. Result Benchmarks run using latest JMH	2019-11-19 11:28:36 +01:00
康智冬	1c69448e2e	Fix typos in javadocs (#9527 ) Motivation: We should have correct docs without typos Modification: Fix typos and spelling Result: More correct docs	2019-10-09 15:25:41 +02:00
jingene	af614e4d6e	Change the netty.io homepage scheme(http -> https) (#9344 ) Motivation: Netty homepage(netty.io) serves both "http" and "https". It's recommended to use https than http. Modification: I changed from "http://netty.io" to "https://netty.io" Result: No effects.	2019-07-09 21:10:14 +02:00
Norman Maurer	c5a602b272	Increase maxHeaderListSize for HpackDecoderBenchmark to be able to be… (#9321 ) Motivation: The previous used maxHeaderListSize was too low which resulted in exceptions during the benchmark run: ``` io.netty.handler.codec.http2.Http2Exception: Header size exceeded max allowed size (8192) at io.netty.handler.codec.http2.Http2Exception.connectionError(Http2Exception.java:103) at io.netty.handler.codec.http2.Http2Exception.headerListSizeError(Http2Exception.java:188) at io.netty.handler.codec.http2.Http2CodecUtil.headerListSizeExceeded(Http2CodecUtil.java:231) at io.netty.handler.codec.http2.HpackDecoder$Http2HeadersSink.finish(HpackDecoder.java:545) at io.netty.handler.codec.http2.HpackDecoder.decode(HpackDecoder.java:132) at io.netty.handler.codec.http2.HpackDecoderBenchmark.decode(HpackDecoderBenchmark.java:85) at io.netty.handler.codec.http2.generated.HpackDecoderBenchmark_decode_jmhTest.decode_thrpt_jmhStub(HpackDecoderBenchmark_decode_jmhTest.java:120) at io.netty.handler.codec.http2.generated.HpackDecoderBenchmark_decode_jmhTest.decode_Throughput(HpackDecoderBenchmark_decode_jmhTest.java:83) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:453) at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:437) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:748) ``` Also we should ensure we only use ascii for header names. Modifications: Just use Integer.MAX_VALUE as limit Result: Be able to run benchmark without exceptions	2019-07-04 11:24:37 +02:00
Carl Mastrangelo	65d8ecc3a0	Use Table lookup for HPACK decoder (#9307 ) Motivation: Table based decoding is fast. Modification: Use table based decoding in HPACK decoder, inspired by https://github.com/python-hyper/hpack/blob/master/hpack/huffman_table.py This modifies the table to be based on integers, rather than 3-tuples of bytes. This is for two reasons: 1. It's faster 2. Using bytes makes the static intializer too big, and doesn't compile. Result: Faster Huffman decoding. This only seems to help the ascii case, the other decoding is about the same. Benchmarks: ``` Before: Benchmark (limitToAscii) (sensitive) (size) Mode Cnt Score Error Units HpackDecoderBenchmark.decode true true SMALL thrpt 20 426293.636 ± 1444.843 ops/s HpackDecoderBenchmark.decode true true MEDIUM thrpt 20 57843.738 ± 725.704 ops/s HpackDecoderBenchmark.decode true true LARGE thrpt 20 3002.412 ± 16.998 ops/s HpackDecoderBenchmark.decode true false SMALL thrpt 20 412339.400 ± 1128.394 ops/s HpackDecoderBenchmark.decode true false MEDIUM thrpt 20 58226.870 ± 199.591 ops/s HpackDecoderBenchmark.decode true false LARGE thrpt 20 3044.256 ± 10.675 ops/s HpackDecoderBenchmark.decode false true SMALL thrpt 20 2082615.030 ± 5929.726 ops/s HpackDecoderBenchmark.decode false true MEDIUM thrpt 10 571640.454 ± 26499.229 ops/s HpackDecoderBenchmark.decode false true LARGE thrpt 20 92714.555 ± 2292.222 ops/s HpackDecoderBenchmark.decode false false SMALL thrpt 20 1745872.421 ± 6788.840 ops/s HpackDecoderBenchmark.decode false false MEDIUM thrpt 20 490420.323 ± 2455.431 ops/s HpackDecoderBenchmark.decode false false LARGE thrpt 20 84536.200 ± 398.714 ops/s After(bytes): Benchmark (limitToAscii) (sensitive) (size) Mode Cnt Score Error Units HpackDecoderBenchmark.decode true true SMALL thrpt 20 472649.148 ± 7122.461 ops/s HpackDecoderBenchmark.decode true true MEDIUM thrpt 20 66739.638 ± 341.607 ops/s HpackDecoderBenchmark.decode true true LARGE thrpt 20 3139.773 ± 24.491 ops/s HpackDecoderBenchmark.decode true false SMALL thrpt 20 466933.833 ± 4514.971 ops/s HpackDecoderBenchmark.decode true false MEDIUM thrpt 20 66111.778 ± 568.326 ops/s HpackDecoderBenchmark.decode true false LARGE thrpt 20 3143.619 ± 3.332 ops/s HpackDecoderBenchmark.decode false true SMALL thrpt 20 2109995.177 ± 6203.143 ops/s HpackDecoderBenchmark.decode false true MEDIUM thrpt 20 586026.055 ± 1578.550 ops/s HpackDecoderBenchmark.decode false false SMALL thrpt 20 1775723.270 ± 4932.057 ops/s HpackDecoderBenchmark.decode false false MEDIUM thrpt 20 493316.467 ± 1453.037 ops/s HpackDecoderBenchmark.decode false false LARGE thrpt 10 85726.219 ± 402.573 ops/s After(ints): Benchmark (limitToAscii) (sensitive) (size) Mode Cnt Score Error Units HpackDecoderBenchmark.decode true true SMALL thrpt 20 615549.006 ± 5282.283 ops/s HpackDecoderBenchmark.decode true true MEDIUM thrpt 20 86714.630 ± 654.489 ops/s HpackDecoderBenchmark.decode true true LARGE thrpt 20 3984.439 ± 61.612 ops/s HpackDecoderBenchmark.decode true false SMALL thrpt 20 602489.337 ± 5397.024 ops/s HpackDecoderBenchmark.decode true false MEDIUM thrpt 20 88399.109 ± 241.115 ops/s HpackDecoderBenchmark.decode true false LARGE thrpt 20 3875.729 ± 103.057 ops/s HpackDecoderBenchmark.decode false true SMALL thrpt 20 2092165.454 ± 11918.859 ops/s HpackDecoderBenchmark.decode false true MEDIUM thrpt 20 583465.437 ± 5452.115 ops/s HpackDecoderBenchmark.decode false true LARGE thrpt 20 93290.061 ± 665.904 ops/s HpackDecoderBenchmark.decode false false SMALL thrpt 20 1758402.495 ± 14677.438 ops/s HpackDecoderBenchmark.decode false false MEDIUM thrpt 10 491598.099 ± 5029.698 ops/s HpackDecoderBenchmark.decode false false LARGE thrpt 20 85834.290 ± 554.915 ops/s ```	2019-07-02 20:13:19 +02:00
jimin	78adeb5408	All override methods must be added @override (#9285 ) Motivation: Some methods that either override others or are implemented as part of implementation an interface did miss the `@Override` annotation Modifications: Add missing `@Override`s Result: Code cleanup	2019-06-27 13:52:06 +02:00
Alex Blewitt	e233407e01	Replace accumulation with blackhole.consume (#9275 ) Motivation: SpotJMHBugs reports that accumulating a value as a way of eliding dead code elimination may be inadvisable, as discussed in `JMHSample_34_SafeLooping::measureWrong_2`. Change the test so that it consumes the response with `Blackhole::consume` instead. Modifications: - Replace addition of results with explicit `blackhole.consume()` call Result: Tests work as before, but with different benchmark numbers.	2019-06-25 21:46:26 +02:00
Francesco Nigro	c6114786ab	Documented non-usage of BlackHole::consume on ByteBufAccessBenchmark (#9279 ) Motivation: Some JMH benchmarks need additional explanations to motivate specific code choices. Modifications: Introduced comment to explai why calling BlackHole::consume in a loop is not always the right choice for some benchmark. Result: The relevant method shows a comment that warn about changing the code to introduce BlackHole::consume in the loop.	2019-06-25 14:53:12 +02:00
Alex Blewitt	99034a15b5	Return the result of the list.recycle() call (#9264 ) Motivation: Resolve the issue highlighted by SpotJMHBugs that the creation of the RecyclableArrayList may be elided by the JIT since the result isn't consumed or returned. Modifications: Return the result of `list.recycle()` so that the list isn't elided. Result: The JMH benchmark shows a change in performance indicating that the prior results of this may be unsound.	2019-06-22 07:21:51 +02:00
Carl Mastrangelo	f01278616a	Properly debounce wakeups (#9191 ) Motivation: The wakeup logic in EpollEventLoop is overly complex Modification: * Simplify the race to wakeup the loop * Dont let the event loop wake up itself (it's already awake!) * Make event loop check if there are any more tasks after preparing to sleep. There is small window where the non-eventloop writers can issue eventfd writes here, but that is okay. Result: Cleaner wakeup logic. Benchmarks: ``` BEFORE Benchmark Mode Cnt Score Error Units EpollSocketChannelBenchmark.executeMulti thrpt 20 408381.411 ± 2857.498 ops/s EpollSocketChannelBenchmark.executeSingle thrpt 20 157022.360 ± 1240.573 ops/s EpollSocketChannelBenchmark.pingPong thrpt 20 60571.704 ± 331.125 ops/s Benchmark Mode Cnt Score Error Units EpollSocketChannelBenchmark.executeMulti thrpt 20 440546.953 ± 1652.823 ops/s EpollSocketChannelBenchmark.executeSingle thrpt 20 168114.751 ± 1176.609 ops/s EpollSocketChannelBenchmark.pingPong thrpt 20 61231.878 ± 520.108 ops/s ```	2019-06-04 05:27:15 -07:00
Nick Hill	23554e6997	Ensure "full" ownership of msgs passed to EmbeddedChannel.writeInbound() (#9058 ) Motivation Pipeline handlers are free to "take control" of input buffers if they have singular refcount - in particular to mutate their raw data if non-readonly via discarding of read bytes, etc. However there are various places (primarily unit tests) where a wrapped byte-array buffer is passed in and the wrapped array is assumed not to change (used after the wrapped buffer is passed to EmbeddedChannel.writeInbound()). This invalid assumption could result in unexpected errors, such as those exposed by #8931. Modifications Anywhere that the data passed to writeInbound() might be used again, ensure that either: - A copy is used rather than wrapping a shared byte array, or - The buffer is otherwise protected from modification by making it read-only For the tests, copying is preferred since it still allows the "mutating" optimizations to be exercised. Results Avoid possible errors when pipeline assumes it has full control of input buffer.	2019-05-22 12:35:03 +02:00
Francesco Nigro	635fc9eae0	The benchmark is not taking into account nanoTime granularity (#9033 ) Motivation: Results are just wrong for small delays. Modifications: Switching to AvarageTime avoid to rely on OS nanoTime granularity. Result: Uncontended low delay results are not reliable	2019-04-15 15:15:08 +02:00
Norman Maurer	0f34345347	Merge ChannelInboundHandler and ChannelOutboundHandler into ChannelHa… (#8957 ) Motivation: In `42742e233f` we already added default methods to Channel*Handler and deprecated the Adapter classes to simplify the class hierarchy. With this change we go even further and merge everything into just ChannelHandler. This simplifies things even more in terms of class-hierarchy. Modifications: - Merge ChannelInboundHandler \| ChannelOutboundHandler into ChannelHandler - Adjust code to just use ChannelHandler - Deprecate old interfaces. Result: Cleaner and simpler code in terms of class-hierarchy.	2019-03-28 09:28:27 +00:00
Norman Maurer	42742e233f	Deprecate ChannelInboundHandlerAdapter and ChannelOutboundHandlerAdapter (#8929 ) Motivation: As we now us java8 as minimum java version we can deprecate ChannelInboundHandlerAdapter / ChannelOutboundHandlerAdapter and just move the default implementations into the interfaces. This makes things a bit more flexible for the end-user and also simplifies the class-hierarchy. Modifications: - Mark ChannelInboundHandlerAdapter and ChannelOutboundHandlerAdapter as deprecated - Add default implementations to ChannelInboundHandler / ChannelOutboundHandler - Refactor our code to not use ChannelInboundHandlerAdapter / ChannelOutboundHandlerAdapter anymore Result: Cleanup class-hierarchy and make things a bit more flexible.	2019-03-13 09:46:10 +01:00
Norman Maurer	c6b372f517	Use maven plugin to prevent API/ABI breakage as part of build process (#8904 ) Motivation: Netty is very widely used which can lead to a lot of pain when we break API / ABI. We should make use japicmp-maven-plugin during the build to verify we do not introduce breakage by mistake. Modifications: - Add japicmp-maven-plugin to the build process - Fix a method signature change in HttpProxyHandler that was flagged as a possible problem. Result: Ensure no API/ABI breakage accour between releases.	2019-03-01 19:48:29 +01:00
Nick Hill	35161ad174	Further reduce ensureAccessible() overhead (#8895 ) Motivation: This PR fixes some non-negligible overhead discovered in the ByteBuf accessibility (non-zero refcount) checking. The cause turned out to be mostly twofold: - Unnecessary operations used to calculate the refcount from the "raw" encoded int field value - Call stack depths exceeding the default limit for inlining, in some places (CompositeByteBuf in particular) It's a follow-on from #8882 which uses the maxCapacity field for a simpler non-negative check. The performance gap between these two variants appears to be _mostly_ closed, but there's one exception which may warrant further analysis. Modifications: - Replace ABB.internalRefCount() with ByteBuf.isAccessible(), the default still checks for non-zero refCnt() - Just test for parity of raw refCnt instead of converting to "real", with fast-path for specific small values - Make sure isAccessible() is delegated by derived/wrapper ByteBufs - Use existing freed flag in CompositeByteBuf for faster isAccessible() - Manually inline some calls in methods like CompositeByteBuf.setLong() and AbstractReferenceCountedByteBuf.isAccessible() to reduce stack depths (to ensure default inlining limit isn't hit) - Add ByteBufAccessBenchmark which is an extension of UnsafeByteBufBenchmark (maybe latter could now be removed) Results: Before: Benchmark (bufferType) (checkAccessible) (checkBounds) Mode Cnt Score Error Units readBatch UNSAFE true true thrpt 30 84524972.863 ± 518338.811 ops/s readBatch UNSAFE_SLICE true true thrpt 30 38608795.037 ± 298176.974 ops/s readBatch HEAP true true thrpt 30 80003697.649 ± 974674.119 ops/s readBatch COMPOSITE true true thrpt 30 18495554.788 ± 108075.023 ops/s setGetLong UNSAFE true true thrpt 30 247069881.578 ± 10839162.593 ops/s setGetLong UNSAFE_SLICE true true thrpt 30 196355905.206 ± 1802420.990 ops/s setGetLong HEAP true true thrpt 30 245686644.713 ± 11769311.527 ops/s setGetLong COMPOSITE true true thrpt 30 83170940.687 ± 657524.123 ops/s setLong UNSAFE true true thrpt 30 278940253.918 ± 1807265.259 ops/s setLong UNSAFE_SLICE true true thrpt 30 202556738.764 ± 11887973.563 ops/s setLong HEAP true true thrpt 30 280045958.053 ± 2719583.400 ops/s setLong COMPOSITE true true thrpt 30 121299806.002 ± 2155084.707 ops/s After: Benchmark (bufferType) (checkAccessible) (checkBounds) Mode Cnt Score Error Units readBatch UNSAFE true true thrpt 30 101641801.035 ± 3950050.059 ops/s readBatch UNSAFE_SLICE true true thrpt 30 84395902.846 ± 4339579.057 ops/s readBatch HEAP true true thrpt 30 100179060.207 ± 3222487.287 ops/s readBatch COMPOSITE true true thrpt 30 42288494.472 ± 294919.633 ops/s setGetLong UNSAFE true true thrpt 30 304530755.027 ± 6574163.899 ops/s setGetLong UNSAFE_SLICE true true thrpt 30 212028547.645 ± 14277828.768 ops/s setGetLong HEAP true true thrpt 30 309335422.609 ± 2272150.415 ops/s setGetLong COMPOSITE true true thrpt 30 160383609.236 ± 966484.033 ops/s setLong UNSAFE true true thrpt 30 298055969.747 ± 7437449.627 ops/s setLong UNSAFE_SLICE true true thrpt 30 223784178.650 ± 9869750.095 ops/s setLong HEAP true true thrpt 30 302543263.328 ± 8140104.706 ops/s setLong COMPOSITE true true thrpt 30 157083673.285 ± 3528779.522 ops/s There's also a similar knock-on improvement to other benchmarks (e.g. HPACK encoding/decoding) as shown in #8882. For sanity I did a final comparison of the "fast path" tweak using one of the HPACK benchmarks: (rawCnt & 1) == 0: Benchmark (limitToAscii) (sensitive) (size) Mode Cnt Score Error Units HpackDecoderBenchmark.decode true true MEDIUM thrpt 30 50914.479 ± 940.114 ops/s rawCnt == 2 \|\| rawCnt == 4 \|\| rawCnt == 6 \|\| rawCnt == 8 \|\| (rawCnt & 1) == 0: Benchmark (limitToAscii) (sensitive) (size) Mode Cnt Score Error Units HpackDecoderBenchmark.decode true true MEDIUM thrpt 30 60036.425 ± 1478.196 ops/s	2019-02-28 20:41:16 +01:00
Dmitriy Dumanskiy	116f72db8d	Legacy properties removed (#8839 ) Motivation: We can remove some properties for which we introduced replacements. Modifications: io.netty.buffer.bytebuf.checkAccessible, io.netty.leakDetectionLevel, org.jboss.netty.tryUnsafe properties removed Result: Code cleanup	2019-02-04 13:56:15 +01:00
田欧	e8efcd82a8	migrate java8: use requireNonNull (#8840 ) Motivation: We can just use Objects.requireNonNull(...) as a replacement for ObjectUtil.checkNotNull(....) Modifications: - Use Objects.requireNonNull(...) Result: Less code to maintain.	2019-02-04 10:32:25 +01:00
Dmitriy Dumanskiy	2e433889b2	Improve DateFormatter parsing performance (#8821 ) Motivation: Just was looking through code and found 1 interesting place DateFormatter.tryParseMonth that was not very effective, so I decided to optimize it a bit. Modification: Changed DateFormatter.tryParseMonth method. Instead of invocation regionMatch() for every month - compare chars one by one. Result: DateFormatter.parseHttpDate method performance improved from ~3% to ~15%. Benchmark (DATE_STRING) Mode Cnt Score Error Units DateFormatter2Benchmark.parseHttpHeaderDateFormatter Sun, 27 Jan 2016 19:18:46 GMT thrpt 6 4142781.221 ± 82155.002 ops/s DateFormatter2Benchmark.parseHttpHeaderDateFormatter Sun, 27 Dec 2016 19:18:46 GMT thrpt 6 3781810.558 ± 38679.061 ops/s DateFormatter2Benchmark.parseHttpHeaderDateFormatterNew Sun, 27 Jan 2016 19:18:46 GMT thrpt 6 4372569.705 ± 30257.537 ops/s DateFormatter2Benchmark.parseHttpHeaderDateFormatterNew Sun, 27 Dec 2016 19:18:46 GMT thrpt 6 4339785.100 ± 57542.660 ops/s	2019-02-04 10:04:35 +01:00
Norman Maurer	e3846c54f6	Remove ChannelHandler.exceptionCaught(...) as it should only exist in… (#8822 ) Motivation: ChannelHandler.exceptionCaught(...) was marked as @deprecated as it should only exist in inbound handlers. Modifications: Remove ChannelHandler.exceptionCaught(...) and adjust code / tests. Result: Fixes https://github.com/netty/netty/issues/8527	2019-01-31 20:29:17 +01:00
Dmitriy Dumanskiy	67b23ab056	Remove HttpHeaderDateFormat class (#8807 ) Motivation: HttpHeaderDateFormat was replaced with DateFormatter many days ago and now can be easily removed. Modification: Remove deprecated class and related test / benchmark Result: Less code to maintain	2019-01-31 07:22:20 +01:00
田欧	6222101924	migrate java8: use lambda and method reference (#8781 ) Motivation: We can use lambdas now as we use Java8. Modification: use lambda function for all package, #8751 only migrate transport package. Result: Code cleanup.	2019-01-29 14:06:05 +01:00
Norman Maurer	310f31b392	Update to new checkstyle plugin (#8777 ) Motivation: We need to update to a new checkstyle plugin to allow the usage of lambdas. Modifications: - Update to new plugin version. - Fix checkstyle problems. Result: Be able to use checkstyle plugin which supports new Java syntax.	2019-01-24 16:24:19 +01:00
Norman Maurer	3d6e6136a9	Decouple EventLoop details from the IO handling for each transport to… (#8680 ) * Decouble EventLoop details from the IO handling for each transport to allow easy re-use of code and customization Motiviation: As today extending EventLoop implementations to add custom logic / metrics / instrumentations is only possible in a very limited way if at all. This is due the fact that most implementations are final or even package-private. That said even if these would be public there are the ability to do something useful with these is very limited as the IO processing and task processing are very tightly coupled. All of the mentioned things are a big pain point in netty 4.x and need improvement. Modifications: This changeset decoubled the IO processing logic from the task processing logic for the main transport (NIO, Epoll, KQueue) by introducing the concept of an IoHandler. The IoHandler itself is responsible to wait for IO readiness and process these IO events. The execution of the IoHandler itself is done by the SingleThreadEventLoop as part of its EventLoop processing. This allows to use the same EventLoopGroup (MultiThreadEventLoupGroup) for all the mentioned transports by just specify a different IoHandlerFactory during construction. Beside this core API change this changeset also allows to easily extend SingleThreadEventExecutor / SingleThreadEventLoop to add custom logic to it which then can be reused by all the transports. The ideas are very similar to what is provided by ScheduledThreadPoolExecutor (that is part of the JDK). This allows for example things like: * Adding instrumentation / metrics: * how many Channels are registered on an SingleThreadEventLoop * how many Channels were handled during the IO processing in an EventLoop run * how many task were handled during the last EventLoop / EventExecutor run * how many outstanding tasks we have ... ... * Implementing custom strategies for choosing the next EventExecutor / EventLoop to use based on these metrics. * Use different Promise / Future / ScheduledFuture implementations * decorate Runnable / Callables when submitted to the EventExecutor / EventLoop As a lot of functionalities are folded into the MultiThreadEventLoopGroup and SingleThreadEventLoopGroup this changeset also removes: * AbstractEventLoop * AbstractEventLoopGroup * EventExecutorChooser * EventExecutorChooserFactory * DefaultEventLoopGroup * DefaultEventExecutor * DefaultEventExecutorGroup Result: Fixes https://github.com/netty/netty/issues/8514 .	2019-01-23 08:32:05 +01:00
Dmitriy Dumanskiy	7b92ff2500	Java 8 migration. Remove ThreadLocalProvider and inline java.util.concurrent.ThreadLocalRandom.current() where necessary. (#8762 ) Motivation: Custom Netty ThreadLocalRandom and ThreadLocalRandomProvider classes are no longer needed and can be removed. Modification: Remove own ThreadLocalRandom Result: Less code to maintain	2019-01-22 20:14:28 +01:00
田欧	9d62deeb6f	Java 8 migration: Use diamond operator (#8749 ) Motivation: We can use the diamond operator these days. Modification: Use diamond operator whenever possible. Result: More modern code and less boiler-plate.	2019-01-22 16:07:26 +01:00
Norman Maurer	8fdf373557	Skip execution of ChannelHandler method if annotated with @Skip and just use the next handler in the pipeline. (#8723 ) Motivation: Invoking ChannelHandlers is not free and can result in some overhead when the ChannelPipeline becomes very long. This is especially true if most handlers will just forward the call to the next handler in the pipeline. When the user extends ChannelHandlerAdapter we can easily detect if can just skip the handler and invoke the next handler in the pipeline directly. This reduce the overhead of dispatch but also reduce the call-stack in many cases. Modifications: Detect if we can skip the handler when walking the pipeline. Result: Reduce overhead for long pipelines. Benchmark (extraHandlers) Mode Cnt Score Error Units DefaultChannelPipelineBenchmark.propagateEventOld 4 thrpt 10 267313.031 ± 9131.140 ops/s DefaultChannelPipelineBenchmark.propagateEvent 4 thrpt 10 824825.673 ± 12727.594 ops/s	2019-01-22 08:58:58 +01:00
Norman Maurer	1fe931b6e2	Make it possible to use a wrapped EventLoop with a Channel (#8677 ) Motiviation: Because of how we implemented the registration / deregistration of an EventLoop it was not possible to wrap an EventLoop implementation and use it with a Channel. Modification: - Introduce EventLoop.Unsafe which is responsible for the actual registration. - Move validation of EventLoop / Channel combo to the EventLoop - Add unit test that verifies that wrapping works Result: Be able to wrap an EventLoop and so add some extra functionality.	2019-01-17 09:17:51 +01:00
Norman Maurer	c10ccc5dec	Tighten contract between Channel and EventLoop by require the EventLoop on Channel construction. (#8587 ) Motivation: At the moment it’s possible to have a Channel in Netty that is not registered / assigned to an EventLoop until register(...) is called. This is suboptimal as if the Channel is not registered it is also not possible to do anything useful with a ChannelFuture that belongs to the Channel. We should think about if we should have the EventLoop as a constructor argument of a Channel and have the register / deregister method only have the effect of add a Channel to KQueue/Epoll/... It is also currently possible to deregister a Channel from one EventLoop and register it with another EventLoop. This operation defeats the threading model assumptions that are wide spread in Netty, and requires careful user level coordination to pull off without any concurrency issues. It is not a commonly used feature in practice, may be better handled by other means (e.g. client side load balancing), and therefore we propose removing this feature. Modifications: - Change all Channel implementations to require an EventLoop for construction ( + an EventLoopGroup for all ServerChannel implementations) - Remove all register(...) methods from EventLoopGroup - Add ChannelOutboundInvoker.register(...) which now basically means we want to register on the EventLoop for IO. - Change ChannelUnsafe.register(...) to not take an EventLoop as parameter (as the EventLoop is supplied on custruction). - Change ChannelFactory to take an EventLoop to create new Channels and introduce ServerChannelFactory which takes an EventLoop and one EventLoopGroup to create new ServerChannel instances. - Add ServerChannel.childEventLoopGroup() - Ensure all operations on the accepted Channel is done in the EventLoop of the Channel in ServerBootstrap - Change unit tests for new behaviour Result: A Channel always has an EventLoop assigned which will never change during its life-time. This ensures we are always be able to call any operation on the Channel once constructed (unit the EventLoop is shutdown). This also simplifies the logic in DefaultChannelPipeline a lot as we can always call handlerAdded / handlerRemoved directly without the need to wait for register() to happen. Also note that its still possible to deregister a Channel and register it again. It's just not possible anymore to move from one EventLoop to another (which was not really safe anyway). Fixes https://github.com/netty/netty/issues/8513.	2019-01-14 20:11:13 +01:00
Norman Maurer	d9a6cf341c	Remove support for marking reader and writerIndex in ByteBuf to reduce overhead and complexity. (#8636 ) Motivation: ByteBuf supports “marker indexes”. The intended use case for these is if a speculative operation (e.g. decode) is in process the user can “mark” and interface and refer to it later if the operation isn’t successful (e.g. not enough data). However this is rarely used in practice, requires extra memory to maintain, and introduces complexity in the state management for derived/pooled buffer initialization, resizing, and other operations which may modify reader/writer indexes. Modifications: Remove support for marking and adjust testcases / code. Result: Fixes https://github.com/netty/netty/issues/8535.	2018-12-11 14:00:49 +01:00
Francesco Nigro	4c2b11633a	Adding an execute burst cost benchmark for Netty executors (#8594 ) Motivation: Netty executors doesn't have yet any means to compare with each others nor to compare with the j.u.c. executors Modifications: A new benchmark measuring execute burst cost is being added Result: It's now possible to compare some of Netty executors with each others and with the j.u.c. executors	2018-12-04 15:46:48 +01:00
Norman Maurer	2c78dde749	Update version number to start working on Netty 5	2018-11-20 15:49:57 +01:00
Nick Hill	10539f4dc7	Streamline CompositeByteBuf internals (#8437 ) Motivation: CompositeByteBuf is a powerful and versatile abstraction, allowing for manipulation of large data without copying bytes. There is still a non-negligible cost to reading/writing however relative to "singular" ByteBufs, and this can be mostly eliminated with some rework of the internals. My use case is message modification/transformation while zero-copy proxying. For example replacing a string within a large message with one of a different length Modifications: - No longer slice added buffers and unwrap added slices - Components store target buf offset relative to position in composite buf - Less allocations, object footprint, pointer indirection, offset arithmetic - Use Component[] rather than ArrayList<Component> - Avoid pointer indirection and duplicate bounds check, more efficient backing array growth - Facilitates optimization when doing bulk-inserts - inserting n ByteBufs behind m is now O(m + n) instead of O(mn) - Avoid unnecessary casting and method call indirection via superclass - Eliminate some duplicate range/ref checks via non-checking versions of toComponentIndex and findComponent - Add simple fast-path for toComponentIndex(0); add racy cache of last-accessed Component to findComponent(int) - Override forEachByte0(...) and forEachByteDesc0(...) methods - Make use of RecyclableArrayList in nioBuffers(int, int) (in line with FasterCompositeByteBuf impl) - Modify addComponents0(boolean,int,Iterable) to use the Iterable directly rather than copy to an array first (and possibly to an ArrayList before that) - Optimize addComponents0(boolean,int,ByteBuf[],int) to not perform repeated array insertions and avoid second loop for offset updates - Simplify other logic in various places, in particular the general pattern used where a sub-range is iterated over - Add benchmarks to demonstrate some improvements While refactoring I also came across a couple of clear bugs. They are fixed in these changes but I will open another PR with unit tests and fixes to the current version. Result: Much faster creation, manipulation, and access; many fewer allocations and smaller footprint. Benchmark results to follow.	2018-11-03 10:37:07 +01:00

1 2 3 4 5 ...

364 Commits