Commit Graph

412 Commits

Author SHA1 Message Date
root
97d044812d [maven-release-plugin] prepare release netty-4.1.59.Final 2021-02-08 10:47:46 +00:00
Francesco Nigro
d943d11eb0
DecodeHexBenchmark is too branch-predictor friendly (#9942)
Motivation:

DecodeHexBenchmark needs to be less branch-predictor friendly
to mimic the "real" behaviour while decoding

Modifications:

DecodeHexBenchmark uses a larger sets of inputs, picking them at
random on each iteration and the benchmarked method is made !inlineable

Result:

DecodeHexBenchmark is more trusty while showing the performance
difference between different decoding methods
2021-02-05 15:27:35 +01:00
Francesco Nigro
9a02832fdb
Implement SWAR indexOf byte search (#10737)
Motivation:

Faster indexOf

Modification:

Create generic SWAR indexOf that any ByteBuf implementation can use

Result:

Fixes #10731
2021-01-15 15:09:27 +01:00
root
a137ce2042 [maven-release-plugin] prepare for next development iteration 2021-01-13 10:28:54 +00:00
root
10b03e65f1 [maven-release-plugin] prepare release netty-4.1.58.Final 2021-01-13 10:27:17 +00:00
root
c6b894d03d [maven-release-plugin] prepare for next development iteration 2021-01-12 11:10:44 +00:00
root
b016568e21 [maven-release-plugin] prepare release netty-4.1.57.Final 2021-01-12 11:10:20 +00:00
Oleksii Kachaiev
ab8c4f22c6
Improve performance of HPACK static table lookup (#10840)
Motivation:

HPACK static table is organized in a way that fields with the same
name are sequential. Which means when doing sequential scan we can
short-circuit scan on name mismatch.

Modifications:

* `HpackStaticTable.getIndexIndensitive` returns -1 on name mismatch
rather than keep scanning.
* `HpackStaticTable` statically defined max position in the array
where name duplication is possible (after the given index there's
no need to check for other fields with the same name)
* Benchmark for different lookup patterns

Result:

Better HPACK static table lookup performance.

Co-authored-by: Norman Maurer <norman_maurer@apple.com>
2020-12-21 15:34:04 +01:00
root
a9ec3d86f6 [maven-release-plugin] prepare for next development iteration 2020-12-17 06:11:39 +00:00
root
1188d8320e [maven-release-plugin] prepare release netty-4.1.56.Final 2020-12-17 06:11:18 +00:00
root
f57d64f1c7 [maven-release-plugin] prepare for next development iteration 2020-12-08 11:51:39 +00:00
root
38da45ffe1 [maven-release-plugin] prepare release netty-4.1.55.Final 2020-12-08 11:51:25 +00:00
Andrey Mizurov
f551db2bda
Provide ability to extend StompSubframeEncoder and improve full stomp frame encoding (allocate one buffer for full frame considering the size of the headers) (#10778)
Motivation:

At the moment `StompSubframeEncoder` encode a frame only to `ByteBuf` it is not convenient if further we need to convert it to another type of message,  e.g. `WebSocketFrame`. Also, if we send a full frame, it splits into two headers and a content what makes it difficult to convert it in the next handler. 

Modification:

Introduce additional converter methods e.g. (`Object protected convertFullFrame(StompFrame original, ByteBuf encoded`)...) for extending encoder functionality and allocate only one `ByteBuf` for full stomp frame. Change headers size calculation, previously used only 256 bytes that reallocate a new buffer each time when headers size more than this threshold. Add `StompEncoderBenchmark`.

Result:

Improved  `StompSubframeEncoder` fro extensions.

Previous version benchmark
```
Benchmark                              (contentLength)  (headersType)  (pooledAllocator)   Mode  Cnt        Score        Error  Units
StompEncoderBenchmark.writeStompFrame                0            ONE               true  thrpt   10  4432132.884 ± 178923.436  ops/s
StompEncoderBenchmark.writeStompFrame                0            ONE              false  thrpt   10  1281122.756 ±  52484.174  ops/s
StompEncoderBenchmark.writeStompFrame                0          THREE               true  thrpt   10  2980897.937 ± 130253.049  ops/s
StompEncoderBenchmark.writeStompFrame                0          THREE              false  thrpt   10  1116883.574 ±  35471.482  ops/s
StompEncoderBenchmark.writeStompFrame                0          SEVEN               true  thrpt   10  1988012.159 ±  74352.450  ops/s
StompEncoderBenchmark.writeStompFrame                0          SEVEN              false  thrpt   10   881772.343 ±  94633.870  ops/s
StompEncoderBenchmark.writeStompFrame                0         ELEVEN               true  thrpt   10  1048125.919 ± 151053.902  ops/s
StompEncoderBenchmark.writeStompFrame                0         ELEVEN              false  thrpt   10   429900.066 ±  47956.661  ops/s
StompEncoderBenchmark.writeStompFrame                0         TWENTY               true  thrpt   10   660584.122 ± 104973.439  ops/s
StompEncoderBenchmark.writeStompFrame                0         TWENTY              false  thrpt   10   278255.488 ±  20143.708  ops/s
StompEncoderBenchmark.writeStompFrame               10            ONE               true  thrpt   10  4251498.549 ± 625050.979  ops/s
StompEncoderBenchmark.writeStompFrame               10            ONE              false  thrpt   10  1214006.861 ±  60421.601  ops/s
StompEncoderBenchmark.writeStompFrame               10          THREE               true  thrpt   10  3117736.486 ± 173613.974  ops/s
StompEncoderBenchmark.writeStompFrame               10          THREE              false  thrpt   10  1046605.891 ±  94428.064  ops/s
StompEncoderBenchmark.writeStompFrame               10          SEVEN               true  thrpt   10  2006986.881 ± 108456.748  ops/s
StompEncoderBenchmark.writeStompFrame               10          SEVEN              false  thrpt   10   877983.112 ±  82919.387  ops/s
StompEncoderBenchmark.writeStompFrame               10         ELEVEN               true  thrpt   10  1132844.437 ±  84578.571  ops/s
StompEncoderBenchmark.writeStompFrame               10         ELEVEN              false  thrpt   10   429334.649 ±  35403.161  ops/s
StompEncoderBenchmark.writeStompFrame               10         TWENTY               true  thrpt   10   657093.390 ±  48092.947  ops/s
StompEncoderBenchmark.writeStompFrame               10         TWENTY              false  thrpt   10   252140.876 ±  37337.255  ops/s
StompEncoderBenchmark.writeStompFrame              100            ONE               true  thrpt   10  4720507.067 ± 100993.908  ops/s
StompEncoderBenchmark.writeStompFrame              100            ONE              false  thrpt   10  1266182.925 ±  85888.413  ops/s
StompEncoderBenchmark.writeStompFrame              100          THREE               true  thrpt   10  2898746.621 ± 452579.753  ops/s
StompEncoderBenchmark.writeStompFrame              100          THREE              false  thrpt   10  1019555.288 ±  65640.507  ops/s
StompEncoderBenchmark.writeStompFrame              100          SEVEN               true  thrpt   10  2259187.459 ±  20025.989  ops/s
StompEncoderBenchmark.writeStompFrame              100          SEVEN              false  thrpt   10   896405.412 ±  53750.148  ops/s
StompEncoderBenchmark.writeStompFrame              100         ELEVEN               true  thrpt   10  1110670.772 ± 107650.327  ops/s
StompEncoderBenchmark.writeStompFrame              100         ELEVEN              false  thrpt   10   445187.398 ±  28845.959  ops/s
StompEncoderBenchmark.writeStompFrame              100         TWENTY               true  thrpt   10   611506.846 ±  25304.240  ops/s
StompEncoderBenchmark.writeStompFrame              100         TWENTY              false  thrpt   10   247687.007 ±  43471.578  ops/s
StompEncoderBenchmark.writeStompFrame             1000            ONE               true  thrpt   10  4140949.576 ± 270274.087  ops/s
StompEncoderBenchmark.writeStompFrame             1000            ONE              false  thrpt   10  1154515.598 ± 134413.876  ops/s
StompEncoderBenchmark.writeStompFrame             1000          THREE               true  thrpt   10  3349996.875 ± 162309.889  ops/s
StompEncoderBenchmark.writeStompFrame             1000          THREE              false  thrpt   10  1141040.562 ±   5895.693  ops/s
StompEncoderBenchmark.writeStompFrame             1000          SEVEN               true  thrpt   10  2184632.248 ±   8957.833  ops/s
StompEncoderBenchmark.writeStompFrame             1000          SEVEN              false  thrpt   10   959545.704 ±   5835.161  ops/s
StompEncoderBenchmark.writeStompFrame             1000         ELEVEN               true  thrpt   10  1081113.327 ±   3957.527  ops/s
StompEncoderBenchmark.writeStompFrame             1000         ELEVEN              false  thrpt   10   467524.660 ±   1383.236  ops/s
StompEncoderBenchmark.writeStompFrame             1000         TWENTY               true  thrpt   10   568411.797 ± 108712.493  ops/s
StompEncoderBenchmark.writeStompFrame             1000         TWENTY              false  thrpt   10   260764.231 ±  43149.129  ops/s
StompEncoderBenchmark.writeStompFrame            10000            ONE               true  thrpt   10  4369787.147 ± 619367.939  ops/s
StompEncoderBenchmark.writeStompFrame            10000            ONE              false  thrpt   10  1246782.845 ±  47468.764  ops/s
StompEncoderBenchmark.writeStompFrame            10000          THREE               true  thrpt   10  3333328.810 ± 253061.481  ops/s
StompEncoderBenchmark.writeStompFrame            10000          THREE              false  thrpt   10  1108278.988 ±  81905.149  ops/s
StompEncoderBenchmark.writeStompFrame            10000          SEVEN               true  thrpt   10  2062961.266 ± 247096.284  ops/s
StompEncoderBenchmark.writeStompFrame            10000          SEVEN              false  thrpt   10   925199.985 ±  36734.594  ops/s
StompEncoderBenchmark.writeStompFrame            10000         ELEVEN               true  thrpt   10  1223240.034 ±  58833.801  ops/s
StompEncoderBenchmark.writeStompFrame            10000         ELEVEN              false  thrpt   10   460864.117 ±   2361.459  ops/s
StompEncoderBenchmark.writeStompFrame            10000         TWENTY               true  thrpt   10   655864.762 ±  35237.335  ops/s
StompEncoderBenchmark.writeStompFrame            10000         TWENTY              false  thrpt   10   286388.865 ±   1002.460  ops/s
```
A new version benchmark
```
Benchmark                              (contentLength)  (headersType)  (pooledAllocator)   Mode  Cnt        Score        Error  Units
StompEncoderBenchmark.writeStompFrame                0            ONE               true  thrpt   10  4366110.018 ± 420377.867  ops/s
StompEncoderBenchmark.writeStompFrame                0            ONE              false  thrpt   10  1289437.153 ± 215271.656  ops/s
StompEncoderBenchmark.writeStompFrame                0          THREE               true  thrpt   10  2818791.355 ± 218894.471  ops/s
StompEncoderBenchmark.writeStompFrame                0          THREE              false  thrpt   10  1040151.615 ±  75352.695  ops/s
StompEncoderBenchmark.writeStompFrame                0          SEVEN               true  thrpt   10  1842144.001 ±  94668.864  ops/s
StompEncoderBenchmark.writeStompFrame                0          SEVEN              false  thrpt   10   916742.825 ±  65467.820  ops/s
StompEncoderBenchmark.writeStompFrame                0         ELEVEN               true  thrpt   10  1310454.012 ± 100747.490  ops/s
StompEncoderBenchmark.writeStompFrame                0         ELEVEN              false  thrpt   10   679934.001 ±  82168.249  ops/s
StompEncoderBenchmark.writeStompFrame                0         TWENTY               true  thrpt   10   746867.549 ±  68373.269  ops/s
StompEncoderBenchmark.writeStompFrame                0         TWENTY              false  thrpt   10   483316.314 ±  50978.009  ops/s
StompEncoderBenchmark.writeStompFrame               10            ONE               true  thrpt   10  4791698.722 ± 263890.510  ops/s
StompEncoderBenchmark.writeStompFrame               10            ONE              false  thrpt   10  1289877.116 ± 128677.185  ops/s
StompEncoderBenchmark.writeStompFrame               10          THREE               true  thrpt   10  2984662.187 ± 395567.524  ops/s
StompEncoderBenchmark.writeStompFrame               10          THREE              false  thrpt   10  1079028.782 ±  43548.555  ops/s
StompEncoderBenchmark.writeStompFrame               10          SEVEN               true  thrpt   10  1806763.709 ±  59162.209  ops/s
StompEncoderBenchmark.writeStompFrame               10          SEVEN              false  thrpt   10   935274.980 ±  22064.148  ops/s
StompEncoderBenchmark.writeStompFrame               10         ELEVEN               true  thrpt   10  1284172.151 ± 119068.047  ops/s
StompEncoderBenchmark.writeStompFrame               10         ELEVEN              false  thrpt   10   687174.498 ±  30270.916  ops/s
StompEncoderBenchmark.writeStompFrame               10         TWENTY               true  thrpt   10   803843.483 ±  29106.133  ops/s
StompEncoderBenchmark.writeStompFrame               10         TWENTY              false  thrpt   10   502134.552 ±  23653.215  ops/s
StompEncoderBenchmark.writeStompFrame              100            ONE               true  thrpt   10  4337438.694 ± 378524.452  ops/s
StompEncoderBenchmark.writeStompFrame              100            ONE              false  thrpt   10  1289174.213 ±  50640.853  ops/s
StompEncoderBenchmark.writeStompFrame              100          THREE               true  thrpt   10  3232767.156 ± 311934.194  ops/s
StompEncoderBenchmark.writeStompFrame              100          THREE              false  thrpt   10  1115247.028 ±  15683.477  ops/s
StompEncoderBenchmark.writeStompFrame              100          SEVEN               true  thrpt   10  2213147.232 ±  86326.187  ops/s
StompEncoderBenchmark.writeStompFrame              100          SEVEN              false  thrpt   10   901120.188 ±  71344.491  ops/s
StompEncoderBenchmark.writeStompFrame              100         ELEVEN               true  thrpt   10  1238317.714 ±  68148.477  ops/s
StompEncoderBenchmark.writeStompFrame              100         ELEVEN              false  thrpt   10   671336.339 ±  72735.337  ops/s
StompEncoderBenchmark.writeStompFrame              100         TWENTY               true  thrpt   10   754565.791 ±  28574.382  ops/s
StompEncoderBenchmark.writeStompFrame              100         TWENTY              false  thrpt   10   498939.383 ±  38146.118  ops/s
StompEncoderBenchmark.writeStompFrame             1000            ONE               true  thrpt   10  3722594.471 ± 515861.000  ops/s
StompEncoderBenchmark.writeStompFrame             1000            ONE              false  thrpt   10  1265629.633 ±  84113.347  ops/s
StompEncoderBenchmark.writeStompFrame             1000          THREE               true  thrpt   10  2829696.349 ± 172520.267  ops/s
StompEncoderBenchmark.writeStompFrame             1000          THREE              false  thrpt   10  1111454.609 ±  26275.913  ops/s
StompEncoderBenchmark.writeStompFrame             1000          SEVEN               true  thrpt   10  1901506.449 ±  37701.353  ops/s
StompEncoderBenchmark.writeStompFrame             1000          SEVEN              false  thrpt   10   912528.888 ±  46221.215  ops/s
StompEncoderBenchmark.writeStompFrame             1000         ELEVEN               true  thrpt   10  1299674.123 ±  21889.002  ops/s
StompEncoderBenchmark.writeStompFrame             1000         ELEVEN              false  thrpt   10   724527.644 ±   2757.370  ops/s
StompEncoderBenchmark.writeStompFrame             1000         TWENTY               true  thrpt   10   811389.799 ±   2606.626  ops/s
StompEncoderBenchmark.writeStompFrame             1000         TWENTY              false  thrpt   10   504955.449 ±   6737.804  ops/s
StompEncoderBenchmark.writeStompFrame            10000            ONE               true  thrpt   10  3837912.649 ± 380742.919  ops/s
StompEncoderBenchmark.writeStompFrame            10000            ONE              false  thrpt   10  1375544.306 ±   3157.068  ops/s
StompEncoderBenchmark.writeStompFrame            10000          THREE               true  thrpt   10  3224743.448 ± 297369.719  ops/s
StompEncoderBenchmark.writeStompFrame            10000          THREE              false  thrpt   10  1125772.007 ±   4051.498  ops/s
StompEncoderBenchmark.writeStompFrame            10000          SEVEN               true  thrpt   10  2127352.136 ± 106787.777  ops/s
StompEncoderBenchmark.writeStompFrame            10000          SEVEN              false  thrpt   10   934848.418 ±   4564.147  ops/s
StompEncoderBenchmark.writeStompFrame            10000         ELEVEN               true  thrpt   10  1379672.772 ±   8778.640  ops/s
StompEncoderBenchmark.writeStompFrame            10000         ELEVEN              false  thrpt   10   723169.459 ±   2317.767  ops/s
StompEncoderBenchmark.writeStompFrame            10000         TWENTY               true  thrpt   10   802275.113 ±   4155.137  ops/s
StompEncoderBenchmark.writeStompFrame            10000         TWENTY              false  thrpt   10   517604.265 ±   3398.384  ops/s
```
For headers over 256 bytes we get a speedup.
2020-12-07 09:00:52 +01:00
Norman Maurer
221c1a1ed7
Fix caching for normal allocations (#10825)
Motivation:

https://github.com/netty/netty/pull/10267 introduced a change that reduced the fragmentation. Unfortunally it also introduced a regression when it comes to caching of normal allocations. This can have a negative performance impact depending on the allocation sizes.

Modifications:

- Fix algorithm to calculate the array size for normal allocation caches
- Correctly calculate indeox for normal caches
- Add unit test

Result:

Fixes https://github.com/netty/netty/issues/10805
2020-11-25 15:05:30 +01:00
Frédéric Brégier
1c230405fd
Fix for performance regression on HttpPost RequestDecoder (#10623)
Fix issue #10508 where PARANOID mode slow down about 1000 times compared to ADVANCED.
Also fix a rare issue when internal buffer was growing over a limit, it was partially discarded
using `discardReadBytes()` which causes bad changes within previously discovered HttpData.

Reasons were:

Too many `readByte()` method calls while other ways exist (such as keep in memory the last scan position when trying to find a delimiter or using `bytesBefore(firstByte)` instead of looping externally).

Changes done:
- major change on way buffer are parsed: instead of read byte per byte until found delimiter, try to find the delimiter using `bytesBefore()` and keep the last unfound position to skeep already parsed parts (algorithms are the same but implementation of scan are different)
- Change the condition to discard read bytes when refCnt is at most 1.

Observations using Async-Profiler:
==================================

1) Without optimizations, most of the time (more than 95%) is through `readByte()` method within `loadDataMultipartStandard` method.
2) With using `bytesBefore(byte)` instead of `readByte()` to find various delimiter, the `loadDataMultipartStandard` method is going down to 19 to 33% depending on the test used. the `readByte()` method or equivalent `getByte(pos)` method are going down to 15% (from 95%).

Times are confirming those profiling:
- With optimizations, in SIMPLE mode about 82% better, in ADVANCED mode about 79% better and in PARANOID mode about 99% better (most of the duplicate read accesses are removed or make internally through `bytesBefore(byte)` method)

A benchmark is added to show the behavior of the various cases (one big item, such as File upload, and many items) and various level of detection (Disabled, Simple, Advanced, Paranoid). This benchmark is intend to alert if new implementations make too many differences (such as the previous version where about PARANOID gives about 1000 times slower than other levels, while it is now about at most 10 times).

Extract of Benchmark run:
=========================

Run complete. Total time: 00:13:27

Benchmark                                                                           Mode  Cnt  Score   Error   Units
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigAdvancedLevel   thrpt    6  2,248 ± 0,198 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigDisabledLevel   thrpt    6  2,067 ± 1,219 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigParanoidLevel   thrpt    6  1,109 ± 0,038 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigSimpleLevel     thrpt    6  2,326 ± 0,314 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighAdvancedLevel  thrpt    6  1,444 ± 0,226 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighDisabledLevel  thrpt    6  1,462 ± 0,642 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighParanoidLevel  thrpt    6  0,159 ± 0,003 ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighSimpleLevel    thrpt    6  1,522 ± 0,049 ops/ms
2020-11-19 08:00:35 +01:00
root
944a020586 [maven-release-plugin] prepare for next development iteration 2020-11-11 05:47:51 +00:00
root
715353ecd6 [maven-release-plugin] prepare release netty-4.1.54.Final 2020-11-11 05:47:37 +00:00
root
afca81a9d8 [maven-release-plugin] rollback the release of netty-4.1.54.Final 2020-11-10 12:02:24 +00:00
root
e256074e49 [maven-release-plugin] prepare for next development iteration 2020-11-10 11:12:23 +00:00
root
cea659bd8a [maven-release-plugin] prepare release netty-4.1.54.Final 2020-11-10 11:12:06 +00:00
Norman Maurer
5ffca6ef4a
Use http in xmlns URIs to make maven release plugin happy again (#10788)
Motivation:

https in xmlns URIs does not work and will let the maven release plugin fail:

```
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  1.779 s
[INFO] Finished at: 2020-11-10T07:45:21Z
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-release-plugin:2.5.3:prepare (default-cli) on project netty-parent: Execution default-cli of goal org.apache.maven.plugins:maven-release-plugin:2.5.3:prepare failed: The namespace xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" could not be added as a namespace to "project": The namespace prefix "xsi" collides with an additional namespace declared by the element -> [Help 1]
[ERROR]
```

See also https://issues.apache.org/jira/browse/HBASE-24014.

Modifications:

Use http for xmlns

Result:

Be able to use maven release plugin
2020-11-10 10:22:35 +01:00
Chris Vest
1c0662ea42
Use JUnit 5 for running all tests (#10764)
Motivation:
JUnit 5 is the new hotness. It's more expressive, extensible, and composable in many ways, and it's better able to run tests in parallel. But most importantly, it's able to directly run JUnit 4 tests.
This means we can update and start using JUnit 5 without touching any of our existing tests.
I'm also introducing a dependency on assertj-core, which is like hamcrest, but arguably has a nicer and more discoverable API.

Modification:
Add the JUnit 5 and assertj-core dependencies, without converting any tests at time time.

Result:
All our tests are now executed through the JUnit 5 Vintage Engine.
Also, the JUnit 5 test APIs are available, and any JUnit 5 tests that are added from now on will also be executed.
2020-11-04 10:19:59 +01:00
Artem Smotrakov
e5951d46fc
Enable nohttp check during the build (#10708)
Motivation:

HTTP is a plaintext protocol which means that someone may be able
to eavesdrop the data. To prevent this, HTTPS should be used whenever
possible. However, maintaining using https:// in all URLs may be
difficult. The nohttp tool can help here. The tool scans all the files
in a repository and reports where http:// is used.

Modifications:

- Added nohttp (via checkstyle) into the build process.
- Suppressed findings for the websites
  that don't support HTTPS or that are not reachable

Result:

- Prevent using HTTP in the future.
- Encourage users to use HTTPS when they follow the links they found in
  the code.
2020-10-23 14:44:18 +02:00
root
01b7e18632 [maven-release-plugin] prepare for next development iteration 2020-10-13 06:29:26 +00:00
root
d4a0050ef3 [maven-release-plugin] prepare release netty-4.1.53.Final 2020-10-13 06:29:02 +00:00
Francesco Nigro
69b44c6d06
Reduce DefaultAttributeMap lookup cost (#10530)
Motivation:

DefaultAttributeMap::attr has a blocking behaviour on lookup of an existing attribute:
it can be made non-blocking.

Modification:

Replace the existing fixed bucket table using a locked intrusive linked list
with an hand-rolled copy-on-write ordered single array

Result:
Non blocking behaviour for the lookup happy path
2020-10-02 18:24:35 +02:00
Francesco Nigro
162e59848a
Improve predictability of writeUtf8/writeAscii performance (#10368)
Motivation:

writeUtf8 can suffer from inlining issues and/or megamorphic call-sites on the hot path due to ByteBuf hierarchy

Modifications:

Duplicate and specialize the code paths to reduce the need of polymorphic calls

Result:

Performance are more stable in user code
2020-09-09 16:10:26 +02:00
root
957ef746d8 [maven-release-plugin] prepare for next development iteration 2020-09-08 05:26:25 +00:00
root
ada9c38c0a [maven-release-plugin] prepare release netty-4.1.52.Final 2020-09-08 05:26:05 +00:00
Francesco Nigro
38f01e0840
Reduce garbage on MQTT (#10509)
Reduce garbage on MQTT encoding

Motivation:

MQTT encoding and decoding is doing unnecessary object allocation in a number of places:
- MqttEncoder create many byte[] to encode Strings into UTF-8 bytes
- MqttProperties uses Integer keys instead of int
- Some enums valueOf create unnecessary arrays on the hot paths
- MqttDecoder was using unecessary Result<T>

Modification:

- ByteBufUtil::utf8Bytes and ByteBufUtil::reserveAndWriteUtf8 allows to perform the same operation GC-free
- MqttProperties uses a primitive key map
- Implemented GC free const table lookup/switch valueOf
- Use some bit-tricks to pack 2 ints into a single primitive long to store both result and numberOfBytesConsumed and use byte[].length to compute numberOfByteConsumed on fly. These changes allowed to save creating Result<T>.

Result:
Significantly less garbage produced in MQTT encoding/decoding
2020-09-04 18:27:22 +02:00
Francesco Nigro
d2c03c9a29
Improve MqttMessageType::valueOf cost (#10400)
Motivation:

MqttMessageType::valueOf has O(N) cost

Modifications:

MqttMessageType::valueOf uses a const lookup table

Result:

MqttMessageType::valueOf has O(1) cost
2020-08-31 10:32:33 +02:00
root
bfbeb2dec6 [maven-release-plugin] prepare for next development iteration 2020-07-09 12:27:06 +00:00
root
646934ef0a [maven-release-plugin] prepare release netty-4.1.51.Final 2020-07-09 12:26:30 +00:00
root
caf51b7284 [maven-release-plugin] prepare for next development iteration 2020-05-13 06:00:23 +00:00
root
8c5b72aaf0 [maven-release-plugin] prepare release netty-4.1.50.Final 2020-05-13 05:59:55 +00:00
root
9c5008b109 [maven-release-plugin] prepare for next development iteration 2020-04-22 09:57:54 +00:00
root
d0ec961cce [maven-release-plugin] prepare release netty-4.1.49.Final 2020-04-22 09:57:26 +00:00
Linas Medžiūnas
fb5e2cd3aa
Efficient BytBuf search algorithms (#9914) (#9955)
Motivation:

We have found out that ByteBufUtil.indexOf can be inefficient for substring search on
ByteBuf, both in terms of algorithm complexity (worst case O(needle.readableBytes *
haystack.readableBytes)), and in constant factor (esp. on Composite buffers).
With implementation of more performant search algorithms we have seen improvements on
the order of magnitude.

Modifications:

This change introduces three search algorithms:
1. Knuth Morris Pratt - classical textbook algorithm, a good default choice.
2. Bit mask based algorithm - stable performance on any input, but limited to maximum
search substring (the needle) length of 64 bytes.
3. Aho–Corasick - worse performance and higher memory consumption than [1] and [2], but
it supports multiple substring (the needles) search simultaneously, by inspecting every
byte of the haystack only once.

Each algorithm processes every byte of underlying buffer only once, they are implemented
as ByteProcessor.

Result:

Efficient search algorithms with linear time complexity available in Netty (I will share
benchmark results in a comment on a PR).
2020-04-15 10:21:24 +02:00
Dmitry Konstantinov
ea31b59037
Replace usage() with freeBytes() in thresholds within hot paths of PoolChunkList (#10141)
Motivation:
PoolChunk.usage() method has non-trivial computations. It is used currently in hot path methods invoked when an allocation and de-allocation are happened.
The idea is to replace usage() output comparison against percent thresholds by Chunk.freeBytes plain comparison against absolute thresholds. In such way the majority of computations from the threshold conditions are moved to init logic.

Modifications:
Replace PoolChunk.usage() conditions in PoolChunkList with equivalent conditions for PoolChunk.freeBytes()

Result:
Improve performance of allocation and de-allocation of ByteBuf from normal size cache pool
2020-03-31 22:11:16 +02:00
root
14e4afeba2 [maven-release-plugin] prepare for next development iteration 2020-03-17 09:20:54 +00:00
root
c10c697e5b [maven-release-plugin] prepare release netty-4.1.48.Final 2020-03-17 09:18:28 +00:00
root
c623a50d19 [maven-release-plugin] prepare for next development iteration 2020-03-09 12:13:56 +00:00
root
a401b2ac92 [maven-release-plugin] prepare release netty-4.1.47.Final 2020-03-09 12:13:26 +00:00
root
e0d73bca4d [maven-release-plugin] prepare for next development iteration 2020-02-28 06:37:33 +00:00
root
ebe7af5102 [maven-release-plugin] prepare release netty-4.1.46.Final 2020-02-28 06:36:45 +00:00
root
9b1ea10a12 [maven-release-plugin] prepare for next development iteration 2020-01-13 09:13:53 +00:00
root
136db8680a [maven-release-plugin] prepare release netty-4.1.45.Final 2020-01-13 09:13:30 +00:00
Francesco Nigro
bc026ef8ba Faster decodeHexNibble (#9896)
Motivation:

decodeHexNibble can be a lot faster using a lookup table

Modifications:

decodeHexNibble is made faster by using a lookup table

Result:

decodeHexNibble is faster
2019-12-23 21:15:56 +01:00
Anuraag Agrawal
687308b4de Separate out query string encoding for non-encoded strings. (#9887)
Motivation:

Currently, characters are appended to the encoded string char-by-char even when no encoding is needed. We can instead separate out codepath that appends the entire string in one go for better `StringBuilder` allocation performance.

Modification:

Only go into char-by-char loop when finding a character that requires encoding.

Result:

The results aren't so clear with noise on my hot laptop - the biggest impact is on long strings, both to reduce resizes of the buffer and also to reduce complexity of the loop. I don't think there's a significant downside though for the cases that hit the slow path.

After
```
Benchmark                                     Mode  Cnt   Score   Error   Units
QueryStringEncoderBenchmark.longAscii        thrpt    6   1.406 ± 0.069  ops/us
QueryStringEncoderBenchmark.longAsciiFirst   thrpt    6   0.046 ± 0.001  ops/us
QueryStringEncoderBenchmark.longUtf8         thrpt    6   0.046 ± 0.001  ops/us
QueryStringEncoderBenchmark.shortAscii       thrpt    6  15.781 ± 0.949  ops/us
QueryStringEncoderBenchmark.shortAsciiFirst  thrpt    6   3.171 ± 0.232  ops/us
QueryStringEncoderBenchmark.shortUtf8        thrpt    6   3.900 ± 0.667  ops/us
```

Before
```
Benchmark                                     Mode  Cnt   Score    Error   Units
QueryStringEncoderBenchmark.longAscii        thrpt    6   0.444 ±  0.072  ops/us
QueryStringEncoderBenchmark.longAsciiFirst   thrpt    6   0.043 ±  0.002  ops/us
QueryStringEncoderBenchmark.longUtf8         thrpt    6   0.047 ±  0.001  ops/us
QueryStringEncoderBenchmark.shortAscii       thrpt    6  16.503 ±  1.015  ops/us
QueryStringEncoderBenchmark.shortAsciiFirst  thrpt    6   3.316 ±  0.154  ops/us
QueryStringEncoderBenchmark.shortUtf8        thrpt    6   3.776 ±  0.956  ops/us
```
2019-12-20 08:51:18 +01:00
Anuraag Agrawal
95b8db0633 Use array to buffer decoded query instead of ByteBuffer. (#9886)
Motivation:

In Java, it is almost always at least slower to use `ByteBuffer` than `byte[]` without pooling or I/O. `QueryStringDecoder` can use `byte[]` with arguably simpler code.

Modification:

Replace `ByteBuffer` / `CharsetDecoder` with `byte[]` and `new String`

Result:

After
```
Benchmark                                   Mode  Cnt  Score   Error   Units
QueryStringDecoderBenchmark.noDecoding     thrpt    6  5.612 ± 2.639  ops/us
QueryStringDecoderBenchmark.onlyDecoding   thrpt    6  1.393 ± 0.067  ops/us
QueryStringDecoderBenchmark.mixedDecoding  thrpt    6  1.223 ± 0.048  ops/us
```

Before
```
Benchmark                                   Mode  Cnt  Score   Error   Units
QueryStringDecoderBenchmark.noDecoding     thrpt    6  6.123 ± 0.250  ops/us
QueryStringDecoderBenchmark.onlyDecoding   thrpt    6  0.922 ± 0.159  ops/us
QueryStringDecoderBenchmark.mixedDecoding  thrpt    6  1.032 ± 0.178  ops/us
```

I notice #6781 switched from an array to `ByteBuffer` but I can't find any motivation for that in the PR. Unit tests pass fine with an array and we get a reasonable speed bump.
2019-12-18 21:11:28 +01:00