netty5/microbench
Frédéric Brégier d421ae10d7 Minimize get byte multipart and fix buffer reuse (#11001)
Motivation:
- Underlying buffer usages might be erroneous when releasing them internaly
in HttpPostMultipartRequestDecoder.

2 bugs occurs:
1) Final File upload seems not to be of the right size.
2) Memory, even in Disk mode, is increasing continuously, while it shouldn't.

- Method `getByte(position)` is too often called within the current implementation
of the HttpPostMultipartRequestDecoder.
This implies too much activities which is visible when PARANOID mode is active.
This is also true in standard mode.

Apply the same fix on buffer from HttpPostMultipartRequestDecoder to HttpPostStandardRequestDecoder
made previously.

Finally in order to ensure we do not rewrite already decoded HttpData when decoding
next ones within multipart, we must ensure the buffers are copied and not a retained slice.

Modification:
- Add some tests to check consistency for HttpPostMultipartRequestDecoder.
Add a package protected method for testing purpose only.

- Use the `bytesBefore(...)` method instead of `getByte(pos)` in order to limit the external
access to the underlying buffer by retrieving iteratively the beginning of a correct start
position.
It is used to find both LF/CRLF and delimiter.
2 methods in HttpPostBodyUtil were created for that.

The undecodedChunk is copied when adding a chunk to a DataMultipart is loaded.
The same buffer is also rewritten in order to release the copied memory part.

Result:

Just for note, for both Memory or Disk or Mixed mode factories, the release has to be done as:

      for (InterfaceHttpData httpData: decoder.getBodyHttpDatas()) {
          httpData.release();
          factory.removeHttpDataFromClean(request, httpData);
      }
      factory.cleanAllHttpData();
      decoder.destroy();

The memory used is minimal in Disk or Mixed mode. In Memory mode, a big file is still
in memory but not more in the undecodedChunk but its own buffer (copied).

In terms of benchmarking, the results are:

Original code Benchmark                                                             Mode  Cnt  Score    Error   Units
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigAdvancedLevel   thrpt    6  0,152 ±  0,100  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigDisabledLevel   thrpt    6  0,543 ±  0,218  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigParanoidLevel   thrpt    6  0,001 ±  0,001  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigSimpleLevel     thrpt    6  0,615 ±  0,070  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighAdvancedLevel  thrpt    6  0,114 ±  0,063  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighDisabledLevel  thrpt    6  0,664 ±  0,034  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighParanoidLevel  thrpt    6  0,001 ±  0,001  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighSimpleLevel    thrpt    6  0,620 ±  0,140  ops/ms

New code Benchmark                                                                  Mode  Cnt  Score   Error   Units
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigAdvancedLevel   thrpt    6  4,037 ± 0,358  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigDisabledLevel   thrpt    6  4,226 ± 0,471  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigParanoidLevel   thrpt    6  0,875 ± 0,029  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderBigSimpleLevel     thrpt    6  4,346 ± 0,275  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighAdvancedLevel  thrpt    6  2,044 ± 0,020  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighDisabledLevel  thrpt    6  2,278 ± 0,159  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighParanoidLevel  thrpt    6  0,174 ± 0,004  ops/ms
HttpPostMultipartRequestDecoderBenchmark.multipartRequestDecoderHighSimpleLevel    thrpt    6  2,370 ± 0,065  ops/ms

In short, using big file transfers, this is about 7 times faster with new code, while
using high number of HttpData, this is about 4 times faster with new code when using Simple Level.
When using Paranoid Level, using big file transfers, this is about 800 times faster with new code, while
using high number of HttpData, this is about 170 times faster with new code.
2021-02-26 14:26:24 +01:00
..

Microbenchmark tests

See our wiki page.