0811409ca3
Motivation: This PR fixes some non-negligible overhead discovered in the ByteBuf accessibility (non-zero refcount) checking. The cause turned out to be mostly twofold: - Unnecessary operations used to calculate the refcount from the "raw" encoded int field value - Call stack depths exceeding the default limit for inlining, in some places (CompositeByteBuf in particular) It's a follow-on from #8882 which uses the maxCapacity field for a simpler non-negative check. The performance gap between these two variants appears to be _mostly_ closed, but there's one exception which may warrant further analysis. Modifications: - Replace ABB.internalRefCount() with ByteBuf.isAccessible(), the default still checks for non-zero refCnt() - Just test for parity of raw refCnt instead of converting to "real", with fast-path for specific small values - Make sure isAccessible() is delegated by derived/wrapper ByteBufs - Use existing freed flag in CompositeByteBuf for faster isAccessible() - Manually inline some calls in methods like CompositeByteBuf.setLong() and AbstractReferenceCountedByteBuf.isAccessible() to reduce stack depths (to ensure default inlining limit isn't hit) - Add ByteBufAccessBenchmark which is an extension of UnsafeByteBufBenchmark (maybe latter could now be removed) Results: Before: Benchmark (bufferType) (checkAccessible) (checkBounds) Mode Cnt Score Error Units readBatch UNSAFE true true thrpt 30 84524972.863 ± 518338.811 ops/s readBatch UNSAFE_SLICE true true thrpt 30 38608795.037 ± 298176.974 ops/s readBatch HEAP true true thrpt 30 80003697.649 ± 974674.119 ops/s readBatch COMPOSITE true true thrpt 30 18495554.788 ± 108075.023 ops/s setGetLong UNSAFE true true thrpt 30 247069881.578 ± 10839162.593 ops/s setGetLong UNSAFE_SLICE true true thrpt 30 196355905.206 ± 1802420.990 ops/s setGetLong HEAP true true thrpt 30 245686644.713 ± 11769311.527 ops/s setGetLong COMPOSITE true true thrpt 30 83170940.687 ± 657524.123 ops/s setLong UNSAFE true true thrpt 30 278940253.918 ± 1807265.259 ops/s setLong UNSAFE_SLICE true true thrpt 30 202556738.764 ± 11887973.563 ops/s setLong HEAP true true thrpt 30 280045958.053 ± 2719583.400 ops/s setLong COMPOSITE true true thrpt 30 121299806.002 ± 2155084.707 ops/s After: Benchmark (bufferType) (checkAccessible) (checkBounds) Mode Cnt Score Error Units readBatch UNSAFE true true thrpt 30 101641801.035 ± 3950050.059 ops/s readBatch UNSAFE_SLICE true true thrpt 30 84395902.846 ± 4339579.057 ops/s readBatch HEAP true true thrpt 30 100179060.207 ± 3222487.287 ops/s readBatch COMPOSITE true true thrpt 30 42288494.472 ± 294919.633 ops/s setGetLong UNSAFE true true thrpt 30 304530755.027 ± 6574163.899 ops/s setGetLong UNSAFE_SLICE true true thrpt 30 212028547.645 ± 14277828.768 ops/s setGetLong HEAP true true thrpt 30 309335422.609 ± 2272150.415 ops/s setGetLong COMPOSITE true true thrpt 30 160383609.236 ± 966484.033 ops/s setLong UNSAFE true true thrpt 30 298055969.747 ± 7437449.627 ops/s setLong UNSAFE_SLICE true true thrpt 30 223784178.650 ± 9869750.095 ops/s setLong HEAP true true thrpt 30 302543263.328 ± 8140104.706 ops/s setLong COMPOSITE true true thrpt 30 157083673.285 ± 3528779.522 ops/s There's also a similar knock-on improvement to other benchmarks (e.g. HPACK encoding/decoding) as shown in #8882. For sanity I did a final comparison of the "fast path" tweak using one of the HPACK benchmarks: (rawCnt & 1) == 0: Benchmark (limitToAscii) (sensitive) (size) Mode Cnt Score Error Units HpackDecoderBenchmark.decode true true MEDIUM thrpt 30 50914.479 ± 940.114 ops/s rawCnt == 2 || rawCnt == 4 || rawCnt == 6 || rawCnt == 8 || (rawCnt & 1) == 0: Benchmark (limitToAscii) (sensitive) (size) Mode Cnt Score Error Units HpackDecoderBenchmark.decode true true MEDIUM thrpt 30 60036.425 ± 1478.196 ops/s |
||
---|---|---|
.github | ||
.mvn | ||
all | ||
bom | ||
buffer | ||
codec | ||
codec-dns | ||
codec-haproxy | ||
codec-http | ||
codec-http2 | ||
codec-memcache | ||
codec-mqtt | ||
codec-redis | ||
codec-smtp | ||
codec-socks | ||
codec-stomp | ||
codec-xml | ||
common | ||
dev-tools | ||
docker | ||
example | ||
handler | ||
handler-proxy | ||
license | ||
microbench | ||
resolver | ||
resolver-dns | ||
tarball | ||
testsuite | ||
testsuite-autobahn | ||
testsuite-http2 | ||
testsuite-osgi | ||
testsuite-shading | ||
transport | ||
transport-native-epoll | ||
transport-native-kqueue | ||
transport-native-unix-common | ||
transport-native-unix-common-tests | ||
transport-rxtx | ||
transport-sctp | ||
transport-udt | ||
.fbprefs | ||
.gitattributes | ||
.gitignore | ||
CONTRIBUTING.md | ||
LICENSE.txt | ||
mvnw | ||
mvnw.cmd | ||
NOTICE.txt | ||
pom.xml | ||
README.md | ||
run-example.sh |
Netty Project
Netty is an asynchronous event-driven network application framework for rapid development of maintainable high performance protocol servers & clients.
Links
How to build
For the detailed information about building and developing Netty, please visit the developer guide. This page only gives very basic information.
You require the following to build Netty:
- Latest stable Oracle JDK 7
- Latest stable Apache Maven
- If you are on Linux, you need additional development packages installed on your system, because you'll build the native transport.
Note that this is build-time requirement. JDK 5 (for 3.x) or 6 (for 4.0+) is enough to run your Netty-based application.
Branches to look
Development of all versions takes place in each branch whose name is identical to <majorVersion>.<minorVersion>
. For example, the development of 3.9 and 4.0 resides in the branch '3.9' and the branch '4.0' respectively.
Usage with JDK 9
Netty can be used in modular JDK9 applications as a collection of automatic modules. The module names follow the reverse-DNS style, and are derived from subproject names rather than root packages due to historical reasons. They are listed below:
io.netty.all
io.netty.buffer
io.netty.codec
io.netty.codec.dns
io.netty.codec.haproxy
io.netty.codec.http
io.netty.codec.http2
io.netty.codec.memcache
io.netty.codec.mqtt
io.netty.codec.redis
io.netty.codec.smtp
io.netty.codec.socks
io.netty.codec.stomp
io.netty.codec.xml
io.netty.common
io.netty.handler
io.netty.handler.proxy
io.netty.resolver
io.netty.resolver.dns
io.netty.transport
io.netty.transport.epoll
(native
omitted - reserved keyword in Java)io.netty.transport.kqueue
(native
omitted - reserved keyword in Java)io.netty.transport.unix.common
(native
omitted - reserved keyword in Java)io.netty.transport.rxtx
io.netty.transport.sctp
io.netty.transport.udt
Automatic modules do not provide any means to declare dependencies, so you need to list each used module separately
in your module-info
file.