Commit Graph

389 Commits

Author SHA1 Message Date
root
01b7e18632 [maven-release-plugin] prepare for next development iteration 2020-10-13 06:29:26 +00:00
root
d4a0050ef3 [maven-release-plugin] prepare release netty-4.1.53.Final 2020-10-13 06:29:02 +00:00
Francesco Nigro
69b44c6d06
Reduce DefaultAttributeMap lookup cost (#10530)
Motivation:

DefaultAttributeMap::attr has a blocking behaviour on lookup of an existing attribute:
it can be made non-blocking.

Modification:

Replace the existing fixed bucket table using a locked intrusive linked list
with an hand-rolled copy-on-write ordered single array

Result:
Non blocking behaviour for the lookup happy path
2020-10-02 18:24:35 +02:00
Francesco Nigro
162e59848a
Improve predictability of writeUtf8/writeAscii performance (#10368)
Motivation:

writeUtf8 can suffer from inlining issues and/or megamorphic call-sites on the hot path due to ByteBuf hierarchy

Modifications:

Duplicate and specialize the code paths to reduce the need of polymorphic calls

Result:

Performance are more stable in user code
2020-09-09 16:10:26 +02:00
root
957ef746d8 [maven-release-plugin] prepare for next development iteration 2020-09-08 05:26:25 +00:00
root
ada9c38c0a [maven-release-plugin] prepare release netty-4.1.52.Final 2020-09-08 05:26:05 +00:00
Francesco Nigro
38f01e0840
Reduce garbage on MQTT (#10509)
Reduce garbage on MQTT encoding

Motivation:

MQTT encoding and decoding is doing unnecessary object allocation in a number of places:
- MqttEncoder create many byte[] to encode Strings into UTF-8 bytes
- MqttProperties uses Integer keys instead of int
- Some enums valueOf create unnecessary arrays on the hot paths
- MqttDecoder was using unecessary Result<T>

Modification:

- ByteBufUtil::utf8Bytes and ByteBufUtil::reserveAndWriteUtf8 allows to perform the same operation GC-free
- MqttProperties uses a primitive key map
- Implemented GC free const table lookup/switch valueOf
- Use some bit-tricks to pack 2 ints into a single primitive long to store both result and numberOfBytesConsumed and use byte[].length to compute numberOfByteConsumed on fly. These changes allowed to save creating Result<T>.

Result:
Significantly less garbage produced in MQTT encoding/decoding
2020-09-04 18:27:22 +02:00
Francesco Nigro
d2c03c9a29
Improve MqttMessageType::valueOf cost (#10400)
Motivation:

MqttMessageType::valueOf has O(N) cost

Modifications:

MqttMessageType::valueOf uses a const lookup table

Result:

MqttMessageType::valueOf has O(1) cost
2020-08-31 10:32:33 +02:00
root
bfbeb2dec6 [maven-release-plugin] prepare for next development iteration 2020-07-09 12:27:06 +00:00
root
646934ef0a [maven-release-plugin] prepare release netty-4.1.51.Final 2020-07-09 12:26:30 +00:00
root
caf51b7284 [maven-release-plugin] prepare for next development iteration 2020-05-13 06:00:23 +00:00
root
8c5b72aaf0 [maven-release-plugin] prepare release netty-4.1.50.Final 2020-05-13 05:59:55 +00:00
root
9c5008b109 [maven-release-plugin] prepare for next development iteration 2020-04-22 09:57:54 +00:00
root
d0ec961cce [maven-release-plugin] prepare release netty-4.1.49.Final 2020-04-22 09:57:26 +00:00
Linas Medžiūnas
fb5e2cd3aa
Efficient BytBuf search algorithms (#9914) (#9955)
Motivation:

We have found out that ByteBufUtil.indexOf can be inefficient for substring search on
ByteBuf, both in terms of algorithm complexity (worst case O(needle.readableBytes *
haystack.readableBytes)), and in constant factor (esp. on Composite buffers).
With implementation of more performant search algorithms we have seen improvements on
the order of magnitude.

Modifications:

This change introduces three search algorithms:
1. Knuth Morris Pratt - classical textbook algorithm, a good default choice.
2. Bit mask based algorithm - stable performance on any input, but limited to maximum
search substring (the needle) length of 64 bytes.
3. Aho–Corasick - worse performance and higher memory consumption than [1] and [2], but
it supports multiple substring (the needles) search simultaneously, by inspecting every
byte of the haystack only once.

Each algorithm processes every byte of underlying buffer only once, they are implemented
as ByteProcessor.

Result:

Efficient search algorithms with linear time complexity available in Netty (I will share
benchmark results in a comment on a PR).
2020-04-15 10:21:24 +02:00
Dmitry Konstantinov
ea31b59037
Replace usage() with freeBytes() in thresholds within hot paths of PoolChunkList (#10141)
Motivation:
PoolChunk.usage() method has non-trivial computations. It is used currently in hot path methods invoked when an allocation and de-allocation are happened.
The idea is to replace usage() output comparison against percent thresholds by Chunk.freeBytes plain comparison against absolute thresholds. In such way the majority of computations from the threshold conditions are moved to init logic.

Modifications:
Replace PoolChunk.usage() conditions in PoolChunkList with equivalent conditions for PoolChunk.freeBytes()

Result:
Improve performance of allocation and de-allocation of ByteBuf from normal size cache pool
2020-03-31 22:11:16 +02:00
root
14e4afeba2 [maven-release-plugin] prepare for next development iteration 2020-03-17 09:20:54 +00:00
root
c10c697e5b [maven-release-plugin] prepare release netty-4.1.48.Final 2020-03-17 09:18:28 +00:00
root
c623a50d19 [maven-release-plugin] prepare for next development iteration 2020-03-09 12:13:56 +00:00
root
a401b2ac92 [maven-release-plugin] prepare release netty-4.1.47.Final 2020-03-09 12:13:26 +00:00
root
e0d73bca4d [maven-release-plugin] prepare for next development iteration 2020-02-28 06:37:33 +00:00
root
ebe7af5102 [maven-release-plugin] prepare release netty-4.1.46.Final 2020-02-28 06:36:45 +00:00
root
9b1ea10a12 [maven-release-plugin] prepare for next development iteration 2020-01-13 09:13:53 +00:00
root
136db8680a [maven-release-plugin] prepare release netty-4.1.45.Final 2020-01-13 09:13:30 +00:00
Francesco Nigro
bc026ef8ba Faster decodeHexNibble (#9896)
Motivation:

decodeHexNibble can be a lot faster using a lookup table

Modifications:

decodeHexNibble is made faster by using a lookup table

Result:

decodeHexNibble is faster
2019-12-23 21:15:56 +01:00
Anuraag Agrawal
687308b4de Separate out query string encoding for non-encoded strings. (#9887)
Motivation:

Currently, characters are appended to the encoded string char-by-char even when no encoding is needed. We can instead separate out codepath that appends the entire string in one go for better `StringBuilder` allocation performance.

Modification:

Only go into char-by-char loop when finding a character that requires encoding.

Result:

The results aren't so clear with noise on my hot laptop - the biggest impact is on long strings, both to reduce resizes of the buffer and also to reduce complexity of the loop. I don't think there's a significant downside though for the cases that hit the slow path.

After
```
Benchmark                                     Mode  Cnt   Score   Error   Units
QueryStringEncoderBenchmark.longAscii        thrpt    6   1.406 ± 0.069  ops/us
QueryStringEncoderBenchmark.longAsciiFirst   thrpt    6   0.046 ± 0.001  ops/us
QueryStringEncoderBenchmark.longUtf8         thrpt    6   0.046 ± 0.001  ops/us
QueryStringEncoderBenchmark.shortAscii       thrpt    6  15.781 ± 0.949  ops/us
QueryStringEncoderBenchmark.shortAsciiFirst  thrpt    6   3.171 ± 0.232  ops/us
QueryStringEncoderBenchmark.shortUtf8        thrpt    6   3.900 ± 0.667  ops/us
```

Before
```
Benchmark                                     Mode  Cnt   Score    Error   Units
QueryStringEncoderBenchmark.longAscii        thrpt    6   0.444 ±  0.072  ops/us
QueryStringEncoderBenchmark.longAsciiFirst   thrpt    6   0.043 ±  0.002  ops/us
QueryStringEncoderBenchmark.longUtf8         thrpt    6   0.047 ±  0.001  ops/us
QueryStringEncoderBenchmark.shortAscii       thrpt    6  16.503 ±  1.015  ops/us
QueryStringEncoderBenchmark.shortAsciiFirst  thrpt    6   3.316 ±  0.154  ops/us
QueryStringEncoderBenchmark.shortUtf8        thrpt    6   3.776 ±  0.956  ops/us
```
2019-12-20 08:51:18 +01:00
Anuraag Agrawal
95b8db0633 Use array to buffer decoded query instead of ByteBuffer. (#9886)
Motivation:

In Java, it is almost always at least slower to use `ByteBuffer` than `byte[]` without pooling or I/O. `QueryStringDecoder` can use `byte[]` with arguably simpler code.

Modification:

Replace `ByteBuffer` / `CharsetDecoder` with `byte[]` and `new String`

Result:

After
```
Benchmark                                   Mode  Cnt  Score   Error   Units
QueryStringDecoderBenchmark.noDecoding     thrpt    6  5.612 ± 2.639  ops/us
QueryStringDecoderBenchmark.onlyDecoding   thrpt    6  1.393 ± 0.067  ops/us
QueryStringDecoderBenchmark.mixedDecoding  thrpt    6  1.223 ± 0.048  ops/us
```

Before
```
Benchmark                                   Mode  Cnt  Score   Error   Units
QueryStringDecoderBenchmark.noDecoding     thrpt    6  6.123 ± 0.250  ops/us
QueryStringDecoderBenchmark.onlyDecoding   thrpt    6  0.922 ± 0.159  ops/us
QueryStringDecoderBenchmark.mixedDecoding  thrpt    6  1.032 ± 0.178  ops/us
```

I notice #6781 switched from an array to `ByteBuffer` but I can't find any motivation for that in the PR. Unit tests pass fine with an array and we get a reasonable speed bump.
2019-12-18 21:11:28 +01:00
root
79d4e74019 [maven-release-plugin] prepare for next development iteration 2019-12-18 08:32:54 +00:00
root
5ddf45a2d5 [maven-release-plugin] prepare release netty-4.1.44.Final 2019-12-18 08:31:43 +00:00
时无两丶
0cde4d9cb4 Uniform null pointer check. (#9840)
Motivation:
Uniform null pointer check.

Modifications:

Use ObjectUtil.checkNonNull(...)

Result:
Less code, same result.
2019-12-09 09:47:35 +01:00
Nick Hill
43252a6135 Update to latest JMH version (#9787)
Motivation

JMH 1.22 was released recently, we might as well use the latest when
running benchmarks.

Summary of changes:
https://mail.openjdk.java.net/pipermail/jmh-dev/2019-November/002879.html

Modifications

Update jmh dependencies in microbench module from version 1.21 to 1.22.

Result

Benchmarks run using latest JMH
2019-11-19 11:28:18 +01:00
Nick Hill
feb804dca8 Avoid extra Runnable allocs when scheduling tasks outside event loop (#9744)
Motivation

Currently when future tasks are scheduled via EventExecutors from a
different thread, at least two allocations are performed - the
ScheduledFutureTask wrapping the to-be-run task, and a Runnable wrapping
the action to add to the scheduled task priority queue. The latter can
be avoided by incorporating this logic into the former.

Modification

- When scheduling or cancelling a future task from outside the event
loop, enqueue the task itself rather than wrapping in a Runnable
- Have ScheduledFutureTask#run first verify the task's deadline has
passed and if not add or remove it from the scheduledTaskQueue depending
on its cancellation state
- Add new outside-event-loop benchmarks to ScheduleFutureTaskBenchmark

Result

Fewer allocations when scheduling/cancelling future tasks
2019-11-04 11:57:53 +01:00
root
844b82b986 [maven-release-plugin] prepare for next development iteration 2019-10-24 12:57:00 +00:00
root
d066f163d7 [maven-release-plugin] prepare release netty-4.1.43.Final 2019-10-24 12:56:30 +00:00
康智冬
bd8cea644a Fix typos in javadocs (#9527)
Motivation:

We should have correct docs without typos

Modification:

Fix typos and spelling

Result:

More correct docs
2019-10-09 17:12:52 +04:00
root
92941cdcac [maven-release-plugin] prepare for next development iteration 2019-09-25 06:15:31 +00:00
root
bd907c3b3a [maven-release-plugin] prepare release netty-4.1.42.Final 2019-09-25 06:14:31 +00:00
Nick Hill
2791f0fefa Avoid use of global AtomicLong for ScheduledFutureTask ids (#9599)
Motivation

Currently a static AtomicLong is used to allocate a unique id whenever a
task is scheduled to any event loop. This could be a source of
contention if delayed tasks are scheduled at a high frequency and can be
easily avoided by having a non-volatile id counter per queue.

Modifications

- Replace static AtomicLong ScheduledFutureTask#nextTaskId with a long
field in AbstractScheduledExecutorService
- Set ScheduledFutureTask#id based on this when adding the task to the
queue (in event loop) instead of at construction time
- Add simple benchmark

Result

Less contention / cache-miss possibility when scheduling future tasks

Before:

Benchmark      (num)   Mode  Cnt    Score    Error  Units
scheduleLots  100000  thrpt   20  346.008 ± 21.931  ops/s

Benchmark      (num)   Mode  Cnt    Score    Error  Units
scheduleLots  100000  thrpt   20  654.824 ± 22.064  ops/s
2019-09-25 07:34:25 +02:00
root
01d805bb76 [maven-release-plugin] prepare for next development iteration 2019-09-12 16:09:55 +00:00
root
7cf69022d4 [maven-release-plugin] prepare release netty-4.1.41.Final 2019-09-12 16:09:00 +00:00
root
aef47bec7f [maven-release-plugin] prepare for next development iteration 2019-09-12 05:38:11 +00:00
root
267e5da481 [maven-release-plugin] prepare release netty-4.1.40.Final 2019-09-12 05:37:30 +00:00
root
d45a4ce01b [maven-release-plugin] prepare for next development iteration 2019-08-13 17:16:42 +00:00
root
88c2a4cab5 [maven-release-plugin] prepare release netty-4.1.39.Final 2019-08-13 17:15:20 +00:00
root
718b7626e6 [maven-release-plugin] prepare for next development iteration 2019-07-24 09:05:57 +00:00
root
465c900c04 [maven-release-plugin] prepare release netty-4.1.38.Final 2019-07-24 09:05:23 +00:00
jingene
c0f9364870 Change the netty.io homepage scheme(http -> https) (#9344)
Motivation:

Netty homepage(netty.io) serves both "http" and "https".
It's recommended to use https than http.
Modification:

I changed from "http://netty.io" to "https://netty.io"
Result:

No effects.
2019-07-09 21:09:42 +02:00
Norman Maurer
6da809dc11
Increase maxHeaderListSize for HpackDecoderBenchmark to be able to be… (#9321)
Motivation:

The previous used maxHeaderListSize was too low which resulted in exceptions during the benchmark run:

```
io.netty.handler.codec.http2.Http2Exception: Header size exceeded max allowed size (8192)
	at io.netty.handler.codec.http2.Http2Exception.connectionError(Http2Exception.java:103)
	at io.netty.handler.codec.http2.Http2Exception.headerListSizeError(Http2Exception.java:188)
	at io.netty.handler.codec.http2.Http2CodecUtil.headerListSizeExceeded(Http2CodecUtil.java:231)
	at io.netty.handler.codec.http2.HpackDecoder$Http2HeadersSink.finish(HpackDecoder.java:545)
	at io.netty.handler.codec.http2.HpackDecoder.decode(HpackDecoder.java:132)
	at io.netty.handler.codec.http2.HpackDecoderBenchmark.decode(HpackDecoderBenchmark.java:85)
	at io.netty.handler.codec.http2.generated.HpackDecoderBenchmark_decode_jmhTest.decode_thrpt_jmhStub(HpackDecoderBenchmark_decode_jmhTest.java:120)
	at io.netty.handler.codec.http2.generated.HpackDecoderBenchmark_decode_jmhTest.decode_Throughput(HpackDecoderBenchmark_decode_jmhTest.java:83)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:453)
	at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:437)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)

```

Also we should ensure we only use ascii for header names.

Modifications:

Just use Integer.MAX_VALUE as limit

Result:

Be able to run benchmark without exceptions
2019-07-04 11:24:13 +02:00
Carl Mastrangelo
ff0045e3e1 Use Table lookup for HPACK decoder (#9307)
Motivation:
Table based decoding is fast.

Modification:
Use table based decoding in HPACK decoder, inspired by
https://github.com/python-hyper/hpack/blob/master/hpack/huffman_table.py

This modifies the table to be based on integers, rather than 3-tuples of
bytes.  This is for two reasons:

1.  It's faster
2.  Using bytes makes the static intializer too big, and doesn't
compile.

Result:
Faster Huffman decoding.  This only seems to help the ascii case, the
other decoding is about the same.

Benchmarks:

```
Before:
Benchmark                     (limitToAscii)  (sensitive)  (size)   Mode  Cnt        Score       Error  Units
HpackDecoderBenchmark.decode            true         true   SMALL  thrpt   20   426293.636 ±  1444.843  ops/s
HpackDecoderBenchmark.decode            true         true  MEDIUM  thrpt   20    57843.738 ±   725.704  ops/s
HpackDecoderBenchmark.decode            true         true   LARGE  thrpt   20     3002.412 ±    16.998  ops/s
HpackDecoderBenchmark.decode            true        false   SMALL  thrpt   20   412339.400 ±  1128.394  ops/s
HpackDecoderBenchmark.decode            true        false  MEDIUM  thrpt   20    58226.870 ±   199.591  ops/s
HpackDecoderBenchmark.decode            true        false   LARGE  thrpt   20     3044.256 ±    10.675  ops/s
HpackDecoderBenchmark.decode           false         true   SMALL  thrpt   20  2082615.030 ±  5929.726  ops/s
HpackDecoderBenchmark.decode           false         true  MEDIUM  thrpt   10   571640.454 ± 26499.229  ops/s
HpackDecoderBenchmark.decode           false         true   LARGE  thrpt   20    92714.555 ±  2292.222  ops/s
HpackDecoderBenchmark.decode           false        false   SMALL  thrpt   20  1745872.421 ±  6788.840  ops/s
HpackDecoderBenchmark.decode           false        false  MEDIUM  thrpt   20   490420.323 ±  2455.431  ops/s
HpackDecoderBenchmark.decode           false        false   LARGE  thrpt   20    84536.200 ±   398.714  ops/s

After(bytes):
Benchmark                     (limitToAscii)  (sensitive)  (size)   Mode  Cnt        Score      Error  Units
HpackDecoderBenchmark.decode            true         true   SMALL  thrpt   20   472649.148 ± 7122.461  ops/s
HpackDecoderBenchmark.decode            true         true  MEDIUM  thrpt   20    66739.638 ±  341.607  ops/s
HpackDecoderBenchmark.decode            true         true   LARGE  thrpt   20     3139.773 ±   24.491  ops/s
HpackDecoderBenchmark.decode            true        false   SMALL  thrpt   20   466933.833 ± 4514.971  ops/s
HpackDecoderBenchmark.decode            true        false  MEDIUM  thrpt   20    66111.778 ±  568.326  ops/s
HpackDecoderBenchmark.decode            true        false   LARGE  thrpt   20     3143.619 ±    3.332  ops/s
HpackDecoderBenchmark.decode           false         true   SMALL  thrpt   20  2109995.177 ± 6203.143  ops/s
HpackDecoderBenchmark.decode           false         true  MEDIUM  thrpt   20   586026.055 ± 1578.550  ops/s
HpackDecoderBenchmark.decode           false        false   SMALL  thrpt   20  1775723.270 ± 4932.057  ops/s
HpackDecoderBenchmark.decode           false        false  MEDIUM  thrpt   20   493316.467 ± 1453.037  ops/s
HpackDecoderBenchmark.decode           false        false   LARGE  thrpt   10    85726.219 ±  402.573  ops/s

After(ints):
Benchmark                     (limitToAscii)  (sensitive)  (size)   Mode  Cnt        Score       Error  Units
HpackDecoderBenchmark.decode            true         true   SMALL  thrpt   20   615549.006 ±  5282.283  ops/s
HpackDecoderBenchmark.decode            true         true  MEDIUM  thrpt   20    86714.630 ±   654.489  ops/s
HpackDecoderBenchmark.decode            true         true   LARGE  thrpt   20     3984.439 ±    61.612  ops/s
HpackDecoderBenchmark.decode            true        false   SMALL  thrpt   20   602489.337 ±  5397.024  ops/s
HpackDecoderBenchmark.decode            true        false  MEDIUM  thrpt   20    88399.109 ±   241.115  ops/s
HpackDecoderBenchmark.decode            true        false   LARGE  thrpt   20     3875.729 ±   103.057  ops/s
HpackDecoderBenchmark.decode           false         true   SMALL  thrpt   20  2092165.454 ± 11918.859  ops/s
HpackDecoderBenchmark.decode           false         true  MEDIUM  thrpt   20   583465.437 ±  5452.115  ops/s
HpackDecoderBenchmark.decode           false         true   LARGE  thrpt   20    93290.061 ±   665.904  ops/s
HpackDecoderBenchmark.decode           false        false   SMALL  thrpt   20  1758402.495 ± 14677.438  ops/s
HpackDecoderBenchmark.decode           false        false  MEDIUM  thrpt   10   491598.099 ±  5029.698  ops/s
HpackDecoderBenchmark.decode           false        false   LARGE  thrpt   20    85834.290 ±   554.915  ops/s
```
2019-07-02 20:09:44 +02:00
root
5b58b8e6b5 [maven-release-plugin] prepare for next development iteration 2019-06-28 05:57:21 +00:00