netty5

Author	SHA1	Message	Date
Anuraag Agrawal	1f1a60ae7d	Use Netty's DefaultPriorityQueue instead of JDK's PriorityQueue for scheduled tasks Motivation: `AbstractScheduledEventExecutor` uses a standard `java.util.PriorityQueue` to keep track of task deadlines. `ScheduledFuture.cancel` removes tasks from this `PriorityQueue`. Unfortunately, `PriorityQueue.remove` has `O(n)` performance since it must search for the item in the entire queue before removing it. This is fast when the future is at the front of the queue (e.g., already triggered) but not when it's randomly located in the queue. Many servers will use `ScheduledFuture.cancel` on all requests, e.g., to manage a request timeout. As these cancellations will happen in arbitrary order, when there are many scheduled futures, `PriorityQueue.remove` is a bottleneck and greatly hurts performance with many concurrent requests (>10K). Modification: Use Netty's `DefaultPriorityQueue` for scheduling futures instead of the JDK's `PriorityQueue`. `DefaultPriorityQueue` is almost identical to the JDK version except it is able to remove futures without searching for them in the queue. This means `DefaultPriorityQueue.remove` has `O(log n)` performance.
Result: Before - cancelling futures has varying performance, capped at `O(n)` After - cancelling futures has stable performance, capped at `O(log n)` Benchmark results After - cancelling in order and in reverse order have similar performance within `O(log n)` bounds
```
Benchmark                                           (num)   Mode  Cnt       Score       Error  Units
ScheduledFutureTaskBenchmark.cancelInOrder            100  thrpt   20  137779.616 ±  7709.751  ops/s
ScheduledFutureTaskBenchmark.cancelInOrder           1000  thrpt   20   11049.448 ±   385.832  ops/s
ScheduledFutureTaskBenchmark.cancelInOrder          10000  thrpt   20     943.294 ±    12.391  ops/s
ScheduledFutureTaskBenchmark.cancelInOrder         100000  thrpt   20      64.210 ±     1.824  ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder     100  thrpt   20  167531.096 ±  9187.865  ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder    1000  thrpt   20   33019.786 ±  4737.770  ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder   10000  thrpt   20    2976.955 ±   248.555  ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder  100000  thrpt   20     362.654 ±    45.716  ops/s
```
Before - cancelling in order and in reverse order have significantly different performance at higher queue size, orders of magnitude worse than the new implementation.
```
Benchmark                                           (num)   Mode  Cnt       Score       Error  Units
ScheduledFutureTaskBenchmark.cancelInOrder            100  thrpt   20  139968.586 ± 12951.333  ops/s
ScheduledFutureTaskBenchmark.cancelInOrder           1000  thrpt   20   12274.420 ±   337.800  ops/s
ScheduledFutureTaskBenchmark.cancelInOrder          10000  thrpt   20     958.168 ±    15.350  ops/s
ScheduledFutureTaskBenchmark.cancelInOrder         100000  thrpt   20      53.381 ±    13.981  ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder     100  thrpt   20  123918.829 ±  3642.517  ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder    1000  thrpt   20    5099.810 ±   206.992  ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder   10000  thrpt   20      72.335 ±     0.443  ops/s
ScheduledFutureTaskBenchmark.cancelInReverseOrder  100000  thrpt   20       0.743 ±     0.003  ops/s
```	2017-11-10 23:09:32 -08:00
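The difference in removal cost described in this commit can be sketched as follows. This is a minimal illustration and not Netty's `DefaultPriorityQueue`: the class `IndexedMinHeap` and its `Node` type are hypothetical names chosen for the example. Each node remembers its own position in the heap array, so `remove` skips the linear search and only pays for the `O(log n)` sift.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical min-heap whose nodes remember their own array index. */
final class IndexedMinHeap {
    /** A queue entry; index is -1 while the node is not in the heap. */
    static final class Node {
        final long deadline;
        int index = -1;
        Node(long deadline) { this.deadline = deadline; }
    }

    private final List<Node> heap = new ArrayList<>();

    int size() { return heap.size(); }

    Node peek() { return heap.isEmpty() ? null : heap.get(0); }

    void add(Node n) {
        n.index = heap.size();
        heap.add(n);
        siftUp(n.index);
    }

    /** Removes a node in O(log n): its position is known, so no search is needed. */
    boolean remove(Node n) {
        int i = n.index;
        if (i < 0 || i >= heap.size() || heap.get(i) != n) {
            return false; // not in this heap
        }
        Node last = heap.remove(heap.size() - 1);
        n.index = -1;
        if (last != n) {
            // Move the last element into the hole and restore heap order.
            set(i, last);
            siftDown(i);
            siftUp(last.index);
        }
        return true;
    }

    Node poll() {
        Node head = peek();
        if (head != null) {
            remove(head);
        }
        return head;
    }

    private void set(int i, Node n) { heap.set(i, n); n.index = i; }

    private void siftUp(int i) {
        Node n = heap.get(i);
        while (i > 0) {
            int parent = (i - 1) >>> 1;
            Node p = heap.get(parent);
            if (p.deadline <= n.deadline) break;
            set(i, p);
            i = parent;
        }
        set(i, n);
    }

    private void siftDown(int i) {
        Node n = heap.get(i);
        int half = heap.size() >>> 1;
        while (i < half) {
            int child = 2 * i + 1;
            int right = child + 1;
            if (right < heap.size() && heap.get(right).deadline < heap.get(child).deadline) {
                child = right;
            }
            Node c = heap.get(child);
            if (n.deadline <= c.deadline) break;
            set(i, c);
            i = child;
        }
        set(i, n);
    }
}
```

The trade-off is that every queued element carries an extra `int` field and can belong to only one such queue at a time.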
Carl Mastrangelo	83a19d5650	Optimistically update ref counts Motivation: Highly retained and released objects have contention on their ref count. Currently, the ref count is updated using compareAndSet with care to make sure the count doesn't overflow, double free, or revive the object. Profiling has shown that a non-trivial share (~1%) of CPU time on gRPC latency benchmarks is from the ref count updating. Modification: Rather than pessimistically assuming the ref count will be invalid, optimistically update it assuming it will be valid. If the update was wrong, then use the slow path to revert the change and throw an exception. Most of the time, the ref counts are correct. This changes from using compareAndSet to getAndAdd, which emits a different CPU instruction on x86 (CMPXCHG to XADD). Because the CPU knows it will modify the memory, it can avoid contention. On a highly contended machine, this can be about 2x faster. There is a downside to the new approach. The ref counters can temporarily enter invalid states if over retained or over released. The code does handle these overflow and underflow scenarios, but it is possible that another concurrent access may push the failure to a different location. For example: Time 1 Thread 1: obj.retain(INT_MAX - 1) Time 2 Thread 1: obj.retain(2) Time 2 Thread 2: obj.retain(1) Previously Thread 2 would always succeed and Thread 1 would always fail on the second access. Now, thread 2 could fail while thread 1 is rolling back its change. ==== There are a few reasons why I think this is okay: 1. Buggy code is going to have bugs. An exception _is_ going to be thrown. This just causes the other threads to notice the state is messed up and stop early. 2. If high retention counts are a use case, then ref count should be a long rather than an int. 3. The critical section is greatly reduced compared to the previous version, so the likelihood of this happening is lower 4. On error, the code always rolls back the change atomically, so there is no possibility of corruption. Result: Faster refcounting
```
BEFORE:
Benchmark                                                           (delay)    Mode      Cnt      Score    Error  Units
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended          1  sample  2901361    804.579 ±  1.835  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended         10  sample  3038729    785.376 ± 16.471  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended        100  sample  2899401    817.392 ±  6.668  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended       1000  sample  3650566   2077.700 ±  0.600  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended      10000  sample  3005467  19949.334 ±  4.243  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended        1  sample   456091     48.610 ±  1.162  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended       10  sample   732051     62.599 ±  0.815  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended      100  sample   778925    228.629 ±  1.205  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended     1000  sample   633682   2002.987 ±  2.856  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended    10000  sample   506442  19735.345 ± 12.312  ns/op
AFTER:
Benchmark                                                           (delay)    Mode      Cnt      Score    Error  Units
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended          1  sample  3761980    383.436 ±  1.315  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended         10  sample  3667304    474.429 ±  1.101  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended        100  sample  3039374    479.267 ±  0.435  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended       1000  sample  3709210   2044.603 ±  0.989  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_contended      10000  sample  3011591  19904.227 ± 18.025  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended        1  sample   494975     52.269 ±  8.345  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended       10  sample   771094     62.290 ±  0.795  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended      100  sample   763230    235.044 ±  1.552  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended     1000  sample   634037   2006.578 ±  3.574  ns/op
AbstractReferenceCountedByteBufBenchmark.retainRelease_uncontended    10000  sample   506284  19742.605 ± 13.729  ns/op
```	2017-10-04 08:42:33 +02:00
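The two update strategies contrasted in this commit can be sketched with a plain `AtomicInteger`. This is only an illustration under assumptions: `RefCountSketch` and its method names are hypothetical, only `retain` is shown, and Netty's actual implementation may differ in its checks and exception types.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical reference counter contrasting the two update strategies. */
final class RefCountSketch {
    private final AtomicInteger refCnt = new AtomicInteger(1);

    int refCnt() { return refCnt.get(); }

    /** Pessimistic: validate first, then compareAndSet in a retry loop. */
    void retainCas(int increment) {
        checkPositive(increment);
        for (;;) {
            int current = refCnt.get();
            int next = current + increment;
            // current <= 0 means already released; next <= current means int overflow.
            if (current <= 0 || next <= current) {
                throw new IllegalStateException("refCnt: " + current + ", increment: " + increment);
            }
            if (refCnt.compareAndSet(current, next)) {
                return;
            }
        }
    }

    /** Optimistic: add unconditionally, then roll back if the result was invalid. */
    void retainOptimistic(int increment) {
        checkPositive(increment);
        int old = refCnt.getAndAdd(increment);
        if (old <= 0 || old + increment < old) {
            // Slow path: undo the change before reporting the error.
            refCnt.getAndAdd(-increment);
            throw new IllegalStateException("refCnt: " + old + ", increment: " + increment);
        }
    }

    private static void checkPositive(int increment) {
        if (increment <= 0) {
            throw new IllegalArgumentException("increment must be positive: " + increment);
        }
    }
}
```

Between the `getAndAdd` and its rollback, other threads can observe the invalid value, which is the window the commit message describes.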
Norman Maurer	3c8c7fc7e9	Reduce performance overhead of ResourceLeakDetector Motivation: The ResourceLeakDetector helps to detect and troubleshoot resource leaks and is often used even in production environments with a low level. Because of this it's important that we keep the overhead as low as possible. Most of the time no leak is detected (as everything is handled correctly), so we should keep the overhead for this case as low as possible. Modifications: - Only call getStackTrace() if a leak is reported as it is a very expensive native call. Also handle the filtering and creating of the String in a lazy fashion - Remove the need to maintain a Queue to store the last access records - Add benchmark Result: Huge decrease of performance overhead. Before the patch:
Benchmark                                           (recordTimes)   Mode  Cnt      Score      Error  Units
ResourceLeakDetectorRecordBenchmark.record                      8  thrpt   20   4358.367 ±  116.419  ops/s
ResourceLeakDetectorRecordBenchmark.record                     16  thrpt   20   2306.027 ±   55.044  ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint              8  thrpt   20   4220.979 ±  114.046  ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint             16  thrpt   20   2250.734 ±   55.352  ops/s
With this patch:
Benchmark                                           (recordTimes)   Mode  Cnt      Score      Error  Units
ResourceLeakDetectorRecordBenchmark.record                      8  thrpt   20  71398.957 ± 2695.925  ops/s
ResourceLeakDetectorRecordBenchmark.record                     16  thrpt   20  38643.963 ± 1446.694  ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint              8  thrpt   20  71677.882 ± 2923.622  ops/s
ResourceLeakDetectorRecordBenchmark.recordWithHint             16  thrpt   20  38660.176 ± 1467.732  ops/s	2017-09-18 16:36:19 -07:00
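The lazy approach described in this commit can be sketched roughly as below. The class `LazyRecord` is a hypothetical name, and the sketch assumes a JVM on which constructing a `Throwable` is much cheaper than materializing its `StackTraceElement[]` through `getStackTrace()`. The record is created on every access, but the stack trace is only read and formatted when `toString()` is called, which would happen only when a leak is actually reported.

```java
/** Hypothetical access record that defers reading and formatting its stack trace. */
final class LazyRecord extends Throwable {
    private static final long serialVersionUID = 1L;

    private final String hint;

    LazyRecord(String hint) {
        this.hint = hint;
    }

    /** Builds the report on demand; nothing is formatted until this is called. */
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        if (hint != null) {
            sb.append("Hint: ").append(hint).append('\n');
        }
        // getStackTrace() is only invoked here, on the reporting path.
        for (StackTraceElement element : getStackTrace()) {
            sb.append('\t').append(element).append('\n');
        }
        return sb.toString();
    }
}
```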
Nikolay Fedorovskikh	df568c739e	Use ByteBuf#writeShort/writeMedium instead of writeBytes Motivation: 1. Some encoders used a `ByteBuf#writeBytes` to write a short constant byte array (2-3 bytes). This can be replaced with the faster `ByteBuf#writeShort` or `ByteBuf#writeMedium`, which do not access the memory. 2. Two chained calls of the `ByteBuf#setByte` with constants can be replaced with one `ByteBuf#setShort` to reduce index checks. 3. The signature of method `HttpHeadersEncoder#encoderHeader` has an unnecessary `throws`. Modifications: 1. Use `ByteBuf#writeShort` or `ByteBuf#writeMedium` instead of `ByteBuf#writeBytes` for the constants. 2. Use `ByteBuf#setShort` instead of chained calls of the `ByteBuf#setByte` with constants. 3. Remove an unnecessary `throws` from `HttpHeadersEncoder#encoderHeader`. Result: Slightly faster writes of constants into buffers.	2017-07-10 14:37:41 +02:00
Dmitriy Dumanskiy	dd69a813d4	Performance improvement for HttpRequestEncoder. Insert char into the string optimized. Motivation: Right now HttpRequestEncoder inserts a slash before the question mark for URLs like http://localhost?param=1. This is done inefficiently. Modification: Code: new StringBuilder(len + 1) .append(uri, 0, index) .append(SLASH) .append(uri, index, len) .toString(); Replaced with: new StringBuilder(uri) .insert(index, SLASH) .toString(); Result: Faster HttpRequestEncoder. Additional small test. Attached benchmark in PR. Benchmark Mode Cnt Score Error Units HttpRequestEncoderInsertBenchmark.newEncoder thrpt 40 3704843.303 ± 98950.919 ops/s HttpRequestEncoderInsertBenchmark.oldEncoder thrpt 40 3284236.960 ± 134433.217 ops/s	2017-06-27 10:53:43 +02:00
Nikolay Fedorovskikh	aa38b6a769	Prevent unnecessary allocations in the `StringUtil#escapeCsv` Motivation: `StringUtil#escapeCsv` creates a new `StringBuilder` for each value even if the same string is returned in the end. Modifications: Create a new `StringBuilder` only if it is really needed. Otherwise, return the original string (or just a trimmed substring). Result: Less GC load. Up to 4x faster for unchanged strings.	2017-06-13 14:57:38 -07:00
Dmitriy Dumanskiy	acc07fac32	disabling leak detection micro benchmark Motivation: When I run Netty micro benchmarks I get many warnings like: WARNING: -Dio.netty.noResourceLeakDetection is deprecated. Use '-Dio.netty.leakDetection.level=simple' instead. Modification: -Dio.netty.noResourceLeakDetection replaced with -Dio.netty.leakDetection.level=disabled. Result: No warnings.	2017-06-09 18:03:54 +02:00
Nikolay Fedorovskikh	e4531918a3	Optimizations in NetUtil Motivation: IPv4/6 validation methods use allocations, which can be avoided. The IPv4 parse method uses StringTokenizer. Modifications: Rewrite the IPv4/6 validation methods to avoid allocations. Rewrite the IPv4 parse method without using StringTokenizer. Result: IPv4/6 validation and IPv4 parsing are up to 2-10x faster.	2017-05-18 16:42:22 -07:00
Nikolay Fedorovskikh	0692bf1b6a	fix the typos	2017-04-20 04:56:09 +02:00
Norman Maurer	e482d933f7	Add 'io.netty.tryAllocateUninitializedArray' system property which allows to allocate byte[] without memset in Java9+ Motivation: Java9 added a new method to Unsafe which allows allocating a byte[] without zeroing it (memset). This can have a massive impact on allocation times when the byte[] is big. This change allows enabling this via the io.netty.tryAllocateUninitializedArray property when running Java9+. Please note that you will need to open up the jdk.internal.misc package via '--add-opens java.base/jdk.internal.misc=ALL-UNNAMED' as well. Modifications: Allow allocating byte[] without memset on Java9+ Result: Better performance when allocating big heap buffers on Java9+.	2017-04-19 11:45:39 +02:00
Ade Setyawan Sajim	016629fe3b	Replace system.out.println with InternalLoggerFactory Motivation: There are two files that still use `System.out.println` to log their status Modification: Replace `System.out.println` with a `debug` function inside an instance of `InternalLoggerFactory` Result: Introduce an instance of `InternalLoggerFactory` in class `AbstractMicrobenchmark.java` and `AbstractSharedExecutorMicrobenchmark.java`	2017-03-28 14:51:59 +02:00
Scott Mitchell	743d2d374c	SslHandler benchmark and SslEngine multiple packets benchmark Motivation: We currently don't have a benchmark which includes SslHandler. The SslEngine benchmarks also always include a single TLS packet when encoding/decoding. In practice when reading data from the network there may be multiple TLS packets present and we should expand the benchmarks to understand this use case. Modifications: - SslEngine benchmarks should include wrapping/unwrapping of multiple TLS packets - Introduce SslHandler benchmarks which can also account for wrapping/unwrapping of multiple TLS packets Result: SslHandler and SslEngine benchmarks are more comprehensive.	2017-03-06 08:42:39 -08:00
Scott Mitchell	f9001b9fc0	HTTP/2 move internal HPACK classes to the http2 package Motivation: The internal.hpack classes are no longer exposed in our public APIs and can be made package private in the http2 package. Modifications: - Make the hpack classes package private in the http2 package Result: Less APIs exposed as public.	2017-03-02 07:42:41 -08:00
Norman Maurer	461f9a1212	Allow to obtain information about used direct and heap memory for ByteBufAllocator implementations Motivation: Often it's useful for the user to be able to get some stats about the memory allocated via an allocator. Modifications: - Allow to obtain the used heap and direct memory for an allocator - Add test case Result: Fixes [#6341]	2017-03-01 18:53:43 +01:00
Norman Maurer	90a61046c7	Add benchmarks for UnpooledUnsafeNoCleanerDirectByteBuf vs UnpooledUnsafeDirectByteBuf Motivation: Issue [#6349] brought up the idea to not use UnpooledUnsafeNoCleanerDirectByteBuf by default. To decide what to do a benchmark is needed. Modifications: Add benchmarks for UnpooledUnsafeNoCleanerDirectByteBuf vs UnpooledUnsafeDirectByteBuf Result: Better idea about impact of using UnpooledUnsafeNoCleanerDirectByteBuf.	2017-02-27 20:04:09 +01:00
Norman Maurer	d73477c7bd	Add benchmarks for SSLEngine implementations Motivation: As we provide our own SSLEngine implementation we should have benchmarks to compare it against JDK impl. Modifications: Add benchmarks for wrap / unwrap and handshake performance. Result: Benchmarks FTW.	2017-02-24 08:02:10 +01:00
Norman Maurer	a80d3411ee	Move all the microbenchmark code into one directory. Motivation: Almost all our benchmarks are in src/main/java but a few are in src/test/java. We should make it consistent. Modifications: Move everything to src/main/java Result: Consistent code base.	2017-02-23 19:59:09 +01:00
Nikolay Fedorovskikh	0623c6c533	Fix javadoc issues Motivation: Invalid javadoc in project Modifications: Fix it Result: More correct javadoc	2017-02-22 07:31:07 +01:00
Nikolay Fedorovskikh	634a8afa53	Fix some warnings at generics usage Motivation: Existing warnings from java compiler Modifications: Add/fix type parameters Result: Less warnings	2017-02-22 07:29:59 +01:00
Kiril Menshikov	66b9be3a46	Allow to align allocated Buffers Motivation: 64-byte alignment is recommended by the Intel performance guide (https://software.intel.com/en-us/articles/practical-intel-avx-optimization-on-2nd-generation-intel-core-processors) for data-structures over 64 bytes. Requiring padding to a multiple of 64 bytes allows for using SIMD instructions consistently in loops without additional conditional checks. This should allow for simpler and more efficient code. Modification: At the moment cache alignment must be set up manually, though it could probably be taken from the system. The original code was introduced by @normanmaurer https://github.com/netty/netty/pull/4726/files Result: Cache-aligned buffers work better than misaligned ones.	2017-02-06 07:58:29 +01:00
Scott Mitchell	3482651e0c	HTTP/2 Non Active Stream RFC Corrections Motivation: codec-http2 couples the dependency tree state with the remainder of the stream state (Http2Stream). This makes implementing constraints where stream state and dependency tree state diverge in the RFC challenging. For example the RFC recommends retaining dependency tree state after a stream transitions to closed [1]. Dependency tree state can be exchanged on streams in IDLE. In practice clients may use stream IDs for the purpose of establishing QoS classes and therefore retaining this dependency tree state can be important to client perceived performance. It is difficult to limit the total amount of state we retain when stream state and dependency tree state is combined. Modifications: - Remove dependency tree, priority, and weight related items from public facing Http2Connection and Http2Stream APIs. This information is optional to track and depends on the flow controller implementation. - Move all dependency tree, priority, and weight related code from DefaultHttp2Connection to WeightedFairQueueByteDistributor. This is currently the only place which cares about priority. We can pull out the dependency tree related code in the future if it is generally useful to expose for other implementations. - DefaultHttp2Connection should explicitly limit the number of reserved streams now that IDLE streams are no longer created. Result: More compliant with the HTTP/2 RFC. Fixes https://github.com/netty/netty/issues/6206. [1] https://tools.ietf.org/html/rfc7540#section-5.3.4	2017-02-01 10:34:27 -08:00
Scott Mitchell	e13da218e9	HTTP/2 revert Http2FrameWriter throws API change Motivation: `2fd42cfc6b` fixed a bug related to encoding headers but it also introduced a throws statement onto the Http2FrameWriter methods which write headers. This throws statement makes the API more verbose and is not necessary because we can communicate the failure in the ChannelFuture that is returned by these methods. Modifications: - Remove throws from all Http2FrameWriter methods. Result: Http2FrameWriter APIs do not propagate checked exceptions.	2017-01-26 23:26:17 -08:00
Tim Brooks	3344cd21ac	Wrap operations requiring SocketPermission with doPrivileged blocks Motivation: Currently Netty does not wrap socket connect, bind, or accept operations in doPrivileged blocks. Nor does it wrap cases where a dns lookup might happen. This prevents an application utilizing the SecurityManager from isolating SocketPermissions to Netty. Modifications: I have introduced a class (SocketUtils) that wraps operations requiring SocketPermissions in doPrivileged blocks. Result: A user of Netty can grant SocketPermissions explicitly to the Netty jar, without granting it to the rest of their application.	2017-01-19 21:12:52 +01:00
Scott Mitchell	2fd42cfc6b	HTTP/2 Max Header List Size Bug Motivation: If the HPACK Decoder detects that SETTINGS_MAX_HEADER_LIST_SIZE has been violated it aborts immediately and sends a RST_STREAM frame for whatever stream caused the issue. Because HPACK is stateful this means that the HPACK state may become out of sync between peers, and the issue won't be detected until the next headers frame. We should make a best effort to keep processing to keep the HPACK state in sync with our peer, or completely close the connection. If the HPACK Encoder is configured to verify SETTINGS_MAX_HEADER_LIST_SIZE it checks the limit and encodes at the same time. This may result in modifying the HPACK local state but not sending the headers to the peer if SETTINGS_MAX_HEADER_LIST_SIZE is violated. This will also lead to an inconsistency in HPACK state that will be flagged at some later time. Modifications: - HPACK Decoder now has 2 levels of limits related to SETTINGS_MAX_HEADER_LIST_SIZE. The first will attempt to keep processing data and send a RST_STREAM after all data is processed. The second will send a GO_AWAY and close the entire connection. - When the HPACK Encoder enforces SETTINGS_MAX_HEADER_LIST_SIZE it should not modify the HPACK state until the size has been checked. - https://tools.ietf.org/html/rfc7540#section-6.5.2 states that the initial value of SETTINGS_MAX_HEADER_LIST_SIZE is "unlimited". We currently use 8k as a limit. We should honor the specification's default value so we don't unintentionally close a connection before the remote peer is aware of the local settings. - Remove unnecessary object allocation in DefaultHttp2HeadersDecoder and DefaultHttp2HeadersEncoder. Result: Fixes https://github.com/netty/netty/issues/6209.	2017-01-19 10:42:43 -08:00
Scott Mitchell	b701da8d1c	HTTP/2 HPACK Integer Encoding Bugs Motivation: - Decoder#decodeULE128 has a bounds bug and cannot decode Integer.MAX_VALUE - Decoder#decodeULE128 doesn't support values greater than can be represented with Java's int data type. This is a problem because there are cases that require at least unsigned 32 bits (max header table size). - Decoder#decodeULE128 treats overflowing the data type and invalid input the same. This can be misleading when inspecting the error that is thrown. - Encoder#encodeInteger doesn't support values greater than can be represented with Java's int data type. This is a problem because there are cases that require at least unsigned 32 bits (max header table size). Modifications: - Correct the above issues and add unit tests. Result: Fixes https://github.com/netty/netty/issues/6210.	2017-01-18 18:36:47 -08:00
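The decoding issues listed in this commit can be illustrated with a small sketch of the HPACK integer continuation format (RFC 7541, section 5.1). This is not the code from `Decoder#decodeULE128`: the class `Ule128Sketch`, its parameters, and the choice of exception types are assumptions made for the example. It accumulates into a `long` so unsigned 32-bit values fit, and it reports overflow separately from truncated input.

```java
/** Hypothetical decoder for the continuation bytes of an HPACK integer. */
final class Ule128Sketch {
    /**
     * Decodes continuation bytes starting at offset and adds them to prefixValue,
     * the already-saturated N-bit prefix (for example 31 for a 5-bit prefix).
     * Assumes 0 <= prefixValue <= max.
     *
     * @throws ArithmeticException      if the value would exceed max (overflow)
     * @throws IllegalArgumentException if the input ends before the last byte
     */
    static long decode(byte[] in, int offset, long prefixValue, long max) {
        long result = prefixValue;
        int shift = 0;
        for (int i = offset; i < in.length; i++) {
            int b = in[i] & 0xFF;
            if (shift > 56) {
                // Further shifting would no longer fit in a positive long.
                throw new ArithmeticException("integer encoding too long");
            }
            long add = (long) (b & 0x7F) << shift;
            if (add > max - result) {
                throw new ArithmeticException("value exceeds " + max);
            }
            result += add;
            if ((b & 0x80) == 0) {
                return result; // continuation bit clear: last byte
            }
            shift += 7;
        }
        throw new IllegalArgumentException("truncated integer");
    }
}
```

With the RFC's own example, the bytes `0x9A 0x0A` following a saturated 5-bit prefix decode to 1337 when `max` is `0xFFFFFFFFL`.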
Scott Mitchell	06e7627b5f	Read Only Http2Headers Motivation: A read only implementation of Http2Headers can allow for a more efficient usage of memory and more performant combined construction and iteration during serialization. Modifications: - Add a new ReadOnlyHttp2Headers class Result: ReadOnlyHttp2Headers exists and can be used for performance reasons when appropriate.
```
Benchmark                                            (headerCount)  Mode  Cnt    Score   Error  Units
ReadOnlyHttp2HeadersBenchmark.defaultClientHeaders               1  avgt   20   96.156 ± 1.902  ns/op
ReadOnlyHttp2HeadersBenchmark.defaultClientHeaders               5  avgt   20  157.925 ± 3.847  ns/op
ReadOnlyHttp2HeadersBenchmark.defaultClientHeaders              10  avgt   20  236.257 ± 2.663  ns/op
ReadOnlyHttp2HeadersBenchmark.defaultClientHeaders              20  avgt   20  392.861 ± 3.932  ns/op
ReadOnlyHttp2HeadersBenchmark.defaultServerHeaders               1  avgt   20   48.759 ± 0.466  ns/op
ReadOnlyHttp2HeadersBenchmark.defaultServerHeaders               5  avgt   20  113.122 ± 0.948  ns/op
ReadOnlyHttp2HeadersBenchmark.defaultServerHeaders              10  avgt   20  192.698 ± 1.936  ns/op
ReadOnlyHttp2HeadersBenchmark.defaultServerHeaders              20  avgt   20  348.974 ± 3.111  ns/op
ReadOnlyHttp2HeadersBenchmark.defaultTrailers                    1  avgt   20   35.694 ± 0.271  ns/op
ReadOnlyHttp2HeadersBenchmark.defaultTrailers                    5  avgt   20   98.993 ± 2.933  ns/op
ReadOnlyHttp2HeadersBenchmark.defaultTrailers                   10  avgt   20  171.035 ± 5.068  ns/op
ReadOnlyHttp2HeadersBenchmark.defaultTrailers                   20  avgt   20  330.621 ± 3.381  ns/op
ReadOnlyHttp2HeadersBenchmark.readOnlyClientHeaders              1  avgt   20   40.573 ± 0.474  ns/op
ReadOnlyHttp2HeadersBenchmark.readOnlyClientHeaders              5  avgt   20   56.516 ± 0.660  ns/op
ReadOnlyHttp2HeadersBenchmark.readOnlyClientHeaders             10  avgt   20   76.890 ± 0.776  ns/op
ReadOnlyHttp2HeadersBenchmark.readOnlyClientHeaders             20  avgt   20  117.531 ± 1.393  ns/op
ReadOnlyHttp2HeadersBenchmark.readOnlyServerHeaders              1  avgt   20   29.206 ± 0.264  ns/op
ReadOnlyHttp2HeadersBenchmark.readOnlyServerHeaders              5  avgt   20   44.587 ± 0.312  ns/op
ReadOnlyHttp2HeadersBenchmark.readOnlyServerHeaders             10  avgt   20   64.458 ± 1.169  ns/op
ReadOnlyHttp2HeadersBenchmark.readOnlyServerHeaders             20  avgt   20  107.179 ± 0.881  ns/op
ReadOnlyHttp2HeadersBenchmark.readOnlyTrailers                   1  avgt   20   21.563 ± 0.202  ns/op
ReadOnlyHttp2HeadersBenchmark.readOnlyTrailers                   5  avgt   20   41.019 ± 0.440  ns/op
ReadOnlyHttp2HeadersBenchmark.readOnlyTrailers                  10  avgt   20   64.053 ± 0.785  ns/op
ReadOnlyHttp2HeadersBenchmark.readOnlyTrailers                  20  avgt   20  113.737 ± 4.433  ns/op
```	2016-12-18 09:32:24 -08:00
Stephane Landelle	f755e58463	Clean up following #6016 Motivation: * DefaultHeaders from netty-codec has some duplicated logic for header date parsing * Several classes keep on using deprecated HttpHeaderDateFormat Modifications: * Move HttpHeaderDateFormatter to netty-codec and rename it into HeaderDateFormatter * Make DefaultHeaders use HeaderDateFormatter * Replace HttpHeaderDateFormat usage with HeaderDateFormatter Result: Faster and more consistent code	2016-11-21 12:35:40 -08:00
Stephane Landelle	edc4842309	Fix cookie date parsing, close #6016 Motivation: * RFC6265 defines its own parser which is different from RFC1123 (it accepts RFC1123 format but also other ones). Basically, it's very lax on delimiters, ignores day of week and timezone. Currently, ClientCookieDecoder uses HttpHeaderDateFormat underneath, and can't parse valid cookies such as Github ones whose expires attribute looks like "Sun, 27 Nov 2016 19:37:15 -0000" * ServerSideCookieEncoder currently uses HttpHeaderDateFormat underneath for formatting expires field, and it's slow. Modifications: * Introduce HttpHeaderDateFormatter that correctly implements RFC6265 * Use HttpHeaderDateFormatter in ClientCookieDecoder and ServerCookieEncoder * Deprecate HttpHeaderDateFormat Result: * Proper RFC6265 dates support * Faster ServerCookieEncoder and ClientCookieDecoder * Faster tool for handling headers such as "Expires" and "Date"	2016-11-18 11:22:21 +00:00
buchgr	c9918de37b	http2: Make MAX_HEADER_LIST_SIZE exceeded a stream error when encoding. Motivation: The SETTINGS_MAX_HEADER_LIST_SIZE limit, as enforced by the HPACK Encoder, should be a stream error and not apply to the whole connection. Modifications: Made the necessary changes for the exception to be of type StreamException. Result: A HEADERS frame exceeding the limit, only affects a specific stream.	2016-10-17 09:24:06 -07:00
Scott Mitchell	540c26bb56	HTTP/2 Ensure default settings are correctly enforced and interfaces clarified Motivation: The responsibility for retaining the settings values and enforcing the settings constraints is spread out in different areas of the code and may be initialized with different values than the default specified in the RFC. This should not be allowed by default and interfaces which are responsible for maintaining/enforcing settings state should clearly indicate the restrictions that they should only be set by the codec upon receipt of a SETTINGS ACK frame. Modifications: - Encoder, Decoder, and the Headers Encoder/Decoder no longer expose public constructors that allow the default settings to be changed. - Http2HeadersDecoder#maxHeaderSize() exists to provide some bound when headers/continuation frames are being aggregated. However this is roughly the same as SETTINGS_MAX_HEADER_LIST_SIZE (besides the 32 octets of overhead for each header field) and can be used instead of attempting to keep the two independent values in sync. - Encoding headers now enforces SETTINGS_MAX_HEADER_LIST_SIZE at the octet level. Previously the header encoder compared the number of header key/value pairs against SETTINGS_MAX_HEADER_LIST_SIZE instead of the number of octets (plus 32 bytes overhead). - DefaultHttp2ConnectionDecoder#onData calls shouldIgnoreHeadersOrDataFrame but may swallow exceptions from this method. This means a RST_STREAM frame may not be sent when it should for an unknown stream and thus violate the RFC. The exception is no longer swallowed. Result: Default settings state is enforced and interfaces related to settings state are clarified.	2016-10-07 13:00:45 -07:00
radai-rosenblatt	15ac6c4a1f	Clean-up unused imports Motivation: the build doesn't seem to enforce this, so they piled up Modifications: removed unused import lines Result: fewer unused imports Signed-off-by: radai-rosenblatt <radai.rosenblatt@gmail.com>	2016-09-30 09:08:50 +02:00
buchgr	67d3a78123	Reduce bytecode size of PlatformDependent0.equals. Motivation: PP0.equals has a bytecode size of 476. This is above the default inlining threshold of OpenJDK. Modifications: Slightly change the method to reduce the bytecode size by > 50% to 212 bytes. Result: The bytecode size is dramatically reduced, making the method a candidate for inlining. The relevant code in our application (gRPC) that relies heavily on equals comparisons, runs some ~10% faster. The Netty JMH benchmark shows no performance regression. Current 4.1:
PlatformDependentBenchmark.unsafeBytesEqual      10  avgt  20     7.836 ±   0.113  ns/op
PlatformDependentBenchmark.unsafeBytesEqual      50  avgt  20    16.889 ±   4.284  ns/op
PlatformDependentBenchmark.unsafeBytesEqual     100  avgt  20    15.601 ±   0.296  ns/op
PlatformDependentBenchmark.unsafeBytesEqual    1000  avgt  20    95.885 ±   1.992  ns/op
PlatformDependentBenchmark.unsafeBytesEqual   10000  avgt  20   824.429 ±  12.792  ns/op
PlatformDependentBenchmark.unsafeBytesEqual  100000  avgt  20  8907.035 ± 177.844  ns/op
With this change:
PlatformDependentBenchmark.unsafeBytesEqual      10  avgt  20     5.616 ±   0.102  ns/op
PlatformDependentBenchmark.unsafeBytesEqual      50  avgt  20    17.896 ±   0.373  ns/op
PlatformDependentBenchmark.unsafeBytesEqual     100  avgt  20    14.952 ±   0.210  ns/op
PlatformDependentBenchmark.unsafeBytesEqual    1000  avgt  20    94.799 ±   1.604  ns/op
PlatformDependentBenchmark.unsafeBytesEqual   10000  avgt  20   834.996 ±  17.484  ns/op
PlatformDependentBenchmark.unsafeBytesEqual  100000  avgt  20  8757.421 ± 187.555  ns/op	2016-09-09 07:57:41 +02:00
Scott Mitchell	208893aac9	HTTP/2 Hpack Encoder Cleanup Motivation: The HTTP/2 HPACK Encoder class has some code which is only used for test purposes. This code can be removed to reduce complexity and member variable count. Modifications: - Remove test code and update unit tests - Other minor cleanup Result: Test code is removed from operational code.	2016-08-25 09:08:46 -07:00
Scott Mitchell	21e8d84b79	HTTP/2 Simplify Headers Decode Bounds Checking Motivation: The HPACK decoder keeps state so that the decode method can be called multiple times with successive header fragments. This decoder also requires that a method is called to signify the decoding is complete. At this point status is returned to indicate if the max header size has been violated. Netty always accumulates the header fragments into a single buffer before attempting to HPACK decode process and so keeping state and delaying notification that bounds have been exceeded is not necessary. Modifications: - HPACK Decoder#decode(..) now must be called with a complete header block - HPACK will terminate immediately if the maximum header length, or maximum number of headers is exceeded - Reduce member variables in the HPACK Decoder class because they can now live in the decode(..) method Result: HPACK bounds checking is done earlier and less class state is needed.	2016-08-12 17:12:53 -07:00
Scott Mitchell	6af56ffe76	HPACK Encoder headerFields improvements Motivation: HPACK Encoder has a data structure which is similar to a previous version of DefaultHeaders. Some of the same improvements can be made. Modifications: - Enforce the restriction that the Encoder's headerFields length must be a power of two so we can use masking instead of modulo - Use AsciiString.hashCode which already has optimizations instead of having yet another hash code algorithm in Encoder Result: Fixes https://github.com/netty/netty/issues/5357	2016-06-30 09:00:12 -07:00
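The masking trick mentioned in this commit works because, for a table length that is a power of two, `hash & (length - 1)` selects the low bits of the hash and always yields a non-negative bucket index. A minimal sketch follows; `MaskIndexSketch` and its methods are hypothetical names, not part of the HPACK Encoder.

```java
/** Hypothetical helper showing bucket selection by masking instead of modulo. */
final class MaskIndexSketch {
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    /** Returns a bucket index in [0, length) for any hash, including negative ones. */
    static int index(int hash, int length) {
        if (!isPowerOfTwo(length)) {
            throw new IllegalArgumentException("length must be a power of two: " + length);
        }
        // Equivalent to Math.floorMod(hash, length), without a division.
        return hash & (length - 1);
    }
}
```

A plain `hash % length` would return a negative index for negative hashes, so the mask also removes the need for a sign fix-up.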
Scott Mitchell	a7f7d9c8e0	Remove unsafe char[] access in PlatformDependent Motivation: PlatformDependent attempts to use reflection to get the underlying char[] (or byte[]) from String objects. This is fragile as if the String implementation does not utilize the full array, and instead uses a subset of the array, this optimization is invalid. OpenJDK6 and some earlier versions of OpenJDK7 String have the capability to use a subsection of the underlying char[]. Modifications: - PlatformDependent should not attempt to use the underlying array from String (or other data types) via reflection Result: PlatformDependent hash code generation for CharSequence does not depend upon specific JDK implementation details.	2016-06-30 08:58:28 -07:00
Scott Mitchell	70651cc58d	HpackUtil.equals performance improvement Motivation: PR #5355 modified interfaces to reduce GC related to the HPACK code. However this came with an anticipated performance regression related to HpackUtil.equals due to AsciiString's increased cost of charAt(..). We should mitigate this performance regression. Modifications: - Introduce an equals method in PlatformDependent which doesn't leak timing information and use this in HpackUtil.equals Result: Fixes https://github.com/netty/netty/issues/5436	2016-06-27 14:37:39 -07:00
Guido Medina	f0a5ee068f	Update dependencies and plugins to latest possible versions. Motivation: It is good to have used dependencies and plugins up-to-date to fix any undiscovered bug fixed by the authors. Modification: Scanned dependencies and plugins and carefully updated one by one. Result: Dependencies and plugins are up-to-date.	2016-06-27 13:35:35 +02:00
Norman Maurer	b4d4c0034d	Optimize HPACK usage to align more with Netty types and remove heavy object creations. Related to [#3597] Motivation: The HPACK code was not really optimized and written with Netty types in mind. Because of this a lot of garbage was created due to heavy object creation. This was first reported in [#3597] and https://github.com/grpc/grpc-java/issues/1872 . Modifications: - Directly use ByteBuf as input and output - Make use of ByteProcessor where possible - Use AsciiString as this is the only thing we need for our http2 usage Result: Less garbage and better usage of Netty apis.	2016-06-22 14:26:05 +02:00
Norman Maurer	e59d0e9efb	Introduce CodecOutputList to reduce overhead of encoder/decoder Motivation: `99dfc9ea79` introduced some code that will more frequently try to forward messages out of the list of decoded messages to reduce latency and memory footprint. Unfortunately this has the side-effect that RecyclableArrayList.clear() will be called more often and so introduce some overhead as ArrayList will null out the array on each call. Modifications: - Introduce a CodecOutputList which allows to not null out the array until we recycle it and also allows to access internal array with extra range checks. - Add benchmark that adds elements to different List implementations and clears them Result: Less overhead when decode / encode messages. Benchmark (elements) Mode Cnt Score Error Units CodecOutputListBenchmark.arrayList 1 thrpt 20 24853764.609 ± 161582.376 ops/s CodecOutputListBenchmark.arrayList 4 thrpt 20 17310636.508 ± 930517.403 ops/s CodecOutputListBenchmark.codecOutList 1 thrpt 20 26670751.661 ± 587812.655 ops/s CodecOutputListBenchmark.codecOutList 4 thrpt 20 25166421.089 ± 166945.599 ops/s CodecOutputListBenchmark.recyclableArrayList 1 thrpt 20 24565992.626 ± 210017.290 ops/s CodecOutputListBenchmark.recyclableArrayList 4 thrpt 20 18477881.775 ± 157003.777 ops/s Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 246.748 sec - in io.netty.handler.codec.CodecOutputListBenchmark	2016-05-20 09:12:07 +02:00
Norman Maurer	68cd670eb9	Remove ChannelHandlerInvoker Motivation: We tried to provide the ability for the user to change the semantics of the threading-model by delegating the invocation of the ChannelHandler to the ChannelHandlerInvoker. Unfortunately this did not work out well and resulted in just more complexity and splitting of code that belongs together. We should remove the ChannelHandlerInvoker again and just do the same as in 4.0 Modifications: Remove ChannelHandlerInvoker again and replace its usage in Http2MultiplexCodec Result: Easier code and less bad abstractions.	2016-05-17 11:14:00 +02:00
Scott Mitchell	c5faa142fb	KObjectHashMap probeNext improvement Motivation: KObjectHashMap.probeNext(..) usually involves 2 conditional statements and 2 arithmetic operations. This can be improved to have 0 conditional statements. Modifications: - Use bit masking to avoid conditional statements Result: Improved performance for KObjectHashMap.probeNext(..)	2016-05-13 10:04:26 -07:00
Scott Mitchell	d580245afc	DefaultHttp2Connection.close Reentrant Modification Motivation: The DefaultHttp2Connection.close method accounts for active streams being iterated and attempts to avoid reentrant modifications of the underlying stream map by using iterators to remove from the stream map. However there are a few issues: - While iterating over the stream map we don't prevent iterations over the active stream collection - Removing a single stream may actually remove > 1 streams due to closed non-leaf streams being preserved in the priority tree which may result in NPE Preserving closed non-leaf streams in the priority tree is no longer necessary with our current allocation algorithms, and so this feature (and related complexity) can be removed. Modifications: - DefaultHttp2Connection.close should prevent others from iterating over the active streams and reentrant modification scenarios which may result from this - DefaultHttp2Connection should not keep closed streams in the priority tree - Remove all associated code in DefaultHttp2RemoteFlowController which accounts for this case including the ReducedState object - This includes fixing writability changes which depended on ReducedState - Update unit tests Result: Fixes https://github.com/netty/netty/issues/5198	2016-05-09 14:16:30 -07:00
Norman Maurer	9229ed98e2	[#5088] Add annotation which marks packages/interfaces/classes as unstable Motivation: Some codecs should be considered unstable as these are relatively new. For this purpose we should introduce an annotation which these codecs should use to be marked as unstable in terms of API. Modifications: - Add UnstableApi annotation and use it on codecs that are not stable - Move http2.hpack to http2.internal.hpack as it is internal. Result: Better document unstable APIs.	2016-05-09 15:16:35 +02:00
Xiaoyan Lin	3ad55eb839	Speed up the slow path of FastThreadLocal Motivation: The current slow path of FastThreadLocal is much slower than JDK ThreadLocal. See #4418 Modifications: - Add FastThreadLocalSlowPathBenchmark for the slow path of FastThreadLocal - Add final to speed up the slow path of FastThreadLocal Result: The slow path of FastThreadLocal is improved.	2016-03-23 11:36:16 +01:00
Scott Mitchell	f990f9983d	HTTP/2 Don't Flow Control Initial Headers Motivation: Currently the initial headers for every stream are queued in the flow controller. Since the initial header frame may create streams the peer must receive these frames in the order in which they were created, or else this will be a protocol error and the connection will be closed. Tolerating the initial headers being queued would increase the complexity of the WeightedFairQueueByteDistributor and the benefit of doing so is not clear. Modifications: - The initial headers will no longer be queued in the flow controllers Result: Fixes https://github.com/netty/netty/issues/4758	2016-02-01 13:37:43 -08:00
Eric Anderson	6dbb610f5b	Add ChannelHandlerContext.invoker() Motivation: Being able to access the invoker() is useful when adding additional handlers that should be running in the same thread. Since an application may be using a threading model unsupported by the default invoker, they can specify their own. Because of that, in a handler that auto-adds other handlers: // This is a good pattern ctx.pipeline().addBefore(ctx.invoker(), ctx.name(), null, newHandler); // This will generally work, but prevents using custom invoker. ctx.pipeline().addBefore(ctx.executor(), ctx.name(), null, newHandler); That's why I believe in commit `110745b0`, for the now-defunct 5.0 branch, when ChannelHandlerAppender was added the invoker() method was also necessary. There is a side-benefit to exposing the invoker: in certain advanced use-cases using the invoker for a particular handler is useful. Using the invoker you are able to invoke a _particular_ handler, from possibly a different thread yet still using standard exception processing. ChannelHandlerContext does part of that, but is unwieldy when trying to invoke a particular handler because it invokes the prev or next handler, not the one the context is for. A workaround is to use the next or prev context (respectively), but this breaks when the pipeline changes. This came up during writing the Http2MultiplexCodec which uses a separate child channel for each http/2 stream and wants to send messages from the child channel directly to the Http2MultiplexCodec handler that created it. Modifications: Add the invoker() method to ChannelHandlerContext. It was already being implemented by AbstractChannelHandlerContext. The two other implementations of ChannelHandlerContext needed minor tweaks. Result: Access to the invoker used for a particular handler, for either reusing for other handlers or for advanced use-cases. Fixes #4738	2016-01-22 14:04:35 +01:00
Xiaoyan Lin	475d901131	Fix errors reported by javadoc Motivation: Javadoc reports errors about invalid docs. Modifications: Fix some errors reported by javadoc. Result: A lot of javadoc errors are fixed by this patch.	2015-12-27 08:36:45 +01:00
zhangduo	f22ad97cf3	Remove PriorityStreamByteDistributor from http2 microbench Motivation: PriorityStreamByteDistributor has been removed but NoPriorityByteDistributionBenchmark in microbench still needs it, which causes a compile error Modifications: Remove PriorityStreamByteDistributor from NoPriorityByteDistributionBenchmark Result: The compile error has been fixed	2015-12-22 09:20:32 +01:00
nmittler	ee56a4a5c6	Fixing broken HTTP/2 benchmark Motivation: The `NoPriorityByteDistributionBenchmark` was broken with a recent commit. Modifications: Fixed the benchmark to use the new HTTP2 handler builder. Result: It builds.	2015-12-18 09:52:12 -08:00
Scott Mitchell	904e70a4d4	HTTP/2 Weighted Fair Queue Byte Distributor Motivation: PriorityStreamByteDistributor uses a homegrown algorithm which distributes bytes to nodes in the priority tree. PriorityStreamByteDistributor has no concept of goodput which may result in poor utilization of network resources. PriorityStreamByteDistributor also has performance issues related to the tree traversal approach and number of nodes that must be visited. There also exist some more proven algorithms from the resource scheduling domain which PriorityStreamByteDistributor does not employ. Modifications: - Introduce a new ByteDistributor which uses elements from weighted fair queue schedulers Result: StreamByteDistributor which is sensitive to priority and uses a more familiar distribution concept. Fixes https://github.com/netty/netty/issues/4462	2015-12-17 11:17:02 -08:00
Trustin Lee	2202e8f967	Revamp the Http2ConnectionHandler builder API Related: #4572 Motivation: - A user might want to extend Http2ConnectionHandler and define his/her own static inner Builder class that extends Http2ConnectionHandler.BuilderBase. This introduces potential confusion because there's already Http2ConnectionHandler.Builder. Your IDE will warn about this name duplication as well. - BuilderBase exposes all setters with public modifier. A user's Builder might not want to expose them to enforce it to certain configuration. There's no way to hide them because it's public already and they are final. - BuilderBase.build(Http2ConnectionDecoder, Http2ConnectionEncoder) ignores most properties exposed by BuilderBase, such as validateHeaders, frameLogger and encoderEnforceMaxConcurrentStreams. If any build() method ignores the properties exposed by the builder, there's something wrong. - A user's Builder that extends BuilderBase might want to require more parameters in build(). There's no way to do that cleanly because build() is public and final already. Modifications: - Make BuilderBase and Builder top-level so that there's no duplicate name issue anymore. - Add AbstractHttp2ConnectionHandlerBuilder - Add Http2ConnectionHandlerBuilder - Add HttpToHttp2ConnectionHandlerBuilder - Make all builder methods in AbstractHttp2ConnectionHandlerBuilder protected so that a subclass can choose which methods to expose - Provide only a single build() method - Add connection() and codec() so that a user can still specify Http2Connection or Http2Connection(En\|De)coder explicitly - Implement proper state validation mechanism so that it is prevented to invoke conflicting setters Result: Less confusing yet flexible builder API	2015-12-17 14:08:13 +09:00
Scott Mitchell	641505a5d2	DefaultChannelConfig maxMessagesPerRead default not always set Motivation: ChannelMetadata has a field minMaxMessagesPerRead which can be confusing. There are also some cases where static instances are used and the default value for channel type is not being applied. Modifications: - use a default value which is set unconditionally to simplify - make sure static instances of MaxMessagesRecvByteBufAllocator are not used if the intention is that the default maxMessagesPerRead should be derived from the channel type. Result: Less confusing interfaces in ChannelMetadata and ChannelConfig. Default maxMessagesPerRead is correctly applied.	2015-11-25 15:14:07 -08:00
Scott Mitchell	49cd00da1c	Backport of benchmark broke build Motivation: `2a2059d976` was backported from master, and included an overridden method which does not exist in 4.1. Modifications: - Remove the invoker method from NoPriorityByteDistributionBenchmark Result: No more build error	2015-11-20 15:03:50 -08:00
nmittler	2a2059d976	Adding UniformStreamByteDistributor Motivation: The current priority algorithm can yield poor per-stream goodput when either the number of streams is high or the connection window is small. When all priorities are the same (i.e. priority is disabled), we should be able to do better. Modifications: Added a new UniformStreamByteDistributor that ignores priority entirely and manages a queue of streams. Each stream is allocated a minimum of 1KiB on each iteration. Result: Improved goodput when priority is not used.	2015-11-19 16:49:12 -08:00
nmittler	8accc52b03	Forking Twitter's hpack Motivation: The twitter hpack project does not have the support that it used to have. See discussion here: https://github.com/netty/netty/issues/4403. Modifications: Created a new module in Netty and copied the latest from twitter hpack master. Result: Netty no longer depends on twitter hpack.	2015-11-14 10:13:32 -08:00
Scott Mitchell	b4b791353d	AsciiString optimized hashCode Motivation: The AsciiString.hashCode() method can be optimized. This method is frequently used while building the DefaultHeaders data structure. Modification: - Add a PlatformDependent hashCode algorithm which utilizes UNSAFE if available Result: AsciiString hashCode is faster.	2015-11-10 10:28:31 -08:00
Louis Ryan	6e108cb96a	Improve the performance of copying header sets when hashing and name validation are equivalent. Motivation: Headers and groups of headers are frequently copied and the current mechanism is slower than it needs to be. Modifications: Skip name validation and hash computation when they are not necessary. Fix emergent bug in CombinedHttpHeaders identified with better testing Fix memory leak in DefaultHttp2Headers when clearing Added benchmarks Result: Faster header copying and some collateral bug fixes	2015-11-07 08:53:10 -08:00
nmittler	6504d52b94	Add HTTP/2 local flow control option for auto refill Motivation: For many HTTP/2 applications (such as gRPC) it is necessary to autorefill the connection window in order to prevent application-level deadlocking. Consider an application with 2 streams, A and B. A receives a stream of messages and the application pops off one message at a time and makes a request on stream B. However, if receiving of data on A has caused the connection window to collapse, B will not be able to receive any data and the application will deadlock. The only way (currently) to get around this is 1) use multiple connections, or 2) manually refill the connection window. Both are undesirable and could needlessly complicate the application code. Modifications: Add a configuration option to DefaultHttp2LocalFlowController, allowing it to autorefill the connection window. Result: Applications can configure HTTP/2 to avoid inter-stream deadlocking.	2015-11-05 15:47:10 -08:00
Norman Maurer	1b2e43e70c	Correctly construct Executor in microbenchmarks. Motivation: We should allow our custom Executor to shutdown quickly. Modifications: Call super constructor with correct arguments. Result: Custom Executor can be shutdown quickly.	2015-11-03 09:46:05 +01:00
Scott Mitchell	19658e9cd8	HTTP/2 Headers Type Updates Motivation: The HTTP/2 RFC (https://tools.ietf.org/html/rfc7540#section-8.1.2) indicates that header names consist of ASCII characters. We currently use ByteString to represent HTTP/2 header names. The HTTP/2 RFC (https://tools.ietf.org/html/rfc7540#section-10.3) also alludes to header values inheriting the same validity characteristics as HTTP/1.x. Using AsciiString for the value type of HTTP/2 headers would allow for re-use of predefined HTTP/1.x values, and make comparisons more intuitive. The Headers<T> interface could also be expanded to allow for easier use of header types which do not have the same Key and Value type. Modifications: - Change Headers<T> to Headers<K, V> - Change Http2Headers<ByteString> to Http2Headers<CharSequence, CharSequence> - Remove ByteString. Having AsciiString extend ByteString complicates equality comparisons when the hash code algorithm is no longer shared. Result: Http2Header types are more representative of the HTTP/2 RFC, and relationship between HTTP/2 header name/values more directly relates to HTTP/1.x header names/values.	2015-10-30 15:29:44 -07:00
buchgr	c9364616c8	Fix performance regression in FastThreadLocal microbenchmark. Fixes #4402 Motivation: As reported in #4402, the FastThreadLocalBenchmark shows that the JDK ThreadLocal is actually faster than Netty's custom thread local implementation. I was looking forward to doing some deep digging, but got disappointed :(. Modifications: The microbenchmark was not using FastThreadLocalThreads and would thus always hit the slow path. I updated the JMH command line flags, so that FastThreadLocalThreads would be used. Result: FastThreadLocalBenchmark shows FastThreadLocal to be faster than JDK's ThreadLocal implementation, by about 56% in this particular benchmark. Run on OSX El Capitan with OpenJDK 1.8u60. Benchmark Mode Cnt Score Error Units FastThreadLocalBenchmark.fastThreadLocal thrpt 20 55452.027 ± 725.713 ops/s FastThreadLocalBenchmark.jdkThreadLocalGet thrpt 20 35481.888 ± 1471.647 ops/s	2015-10-29 21:40:13 +01:00
Norman Maurer	2aef4a504f	Minimize object allocation when calling AbstractByteBuf.toString(..., Charset) Motivation: AbstractByteBuf.toString(..., Charset) is called quite frequently by users but produces a lot of GC. Modification: - Use a FastThreadLocal to store the CharBuffer that is needed for decoding. - Use internalNioBuffer(...) when possible Result: Less object creation / Less GC	2015-10-15 17:51:57 +02:00
Scott Mitchell	d4680c55d8	AsciiString contains utility methods Motivation: When dealing with case insensitive headers it can be useful to have a case insensitive contains method for CharSequence. Modifications: - Add containsCaseInsensative to AsciiString Result: More expressive utility method for case insensitive CharSequence.	2015-10-02 12:50:11 -07:00
Scott Mitchell	284e3702d8	Http2ConnectionHandler Builder instead of constructors Motivation: Using the builder pattern for Http2ConnectionHandler (and subclasses) would be advantageous for the following reasons: 1. Provides the consistent construction afforded by the builder pattern for 'optional' arguments. Users can specify these options 1 time in the builder and then re-use the builder after this. 2. Enforces that the Http2ConnectionHandler's internals (decoder Http2FrameListener) are initialized after construction. Modifications: - Add an extensible builder which can be used to build Http2ConnectionHandler objects - Update classes which inherit from Http2ConnectionHandler Result: It is easier to specify options and construct Http2ConnectionHandler objects.	2015-10-01 13:51:03 -07:00
Scott Mitchell	1485a87e25	Http2ConnectionHandler and Http2FrameListener cyclic dependency Motivation: It is often the case that implementations of Http2FrameListener will want to send responses when data is read. The Http2FrameListener needs access to the Http2ConnectionHandler (or the encoder contained within) to be able to send responses. However the Http2ConnectionHandler requires a Http2FrameListener instance to be passed in during construction time. This creates a cyclic dependency which can make it difficult to cleanly accomplish this relationship. Modifications: - Add Http2ConnectionDecoder.frameListener(..) method to set the frame listener. This will allow the listener to be set after construction. Result: Classes which inherit from Http2ConnectionHandler can more cleanly set the Http2FrameListener.	2015-09-30 15:41:15 -07:00
Scott Mitchell	0e9545e94d	Http2RemoteFlowController stream writability listener Motivation: For implementations that want to manage flow control down to the stream level it is useful to be notified when stream writability changes. Modifications: - Add writabilityChanged to Http2RemoteFlowController.Listener - Add isWritable to Http2RemoteFlowController Result: The Http2RemoteFlowController provides notification when writability of a stream changes.	2015-09-28 13:47:24 -07:00
Scott Mitchell	ba6ce5449e	Headers Performance Boost and Interface Simplification Motivation: A degradation in performance has been observed from the 4.0 branch as documented in https://github.com/netty/netty/issues/3962. Modifications: - Simplify Headers class hierarchy. - Restore the DefaultHeaders to be based upon DefaultHttpHeaders from 4.0. - Make various other modifications that are causing hot spots. Result: Performance is now on par with 4.0.	2015-08-17 08:50:11 -07:00
Jakob Buchgraber	6fd0a0c55f	Faster and more memory efficient headers for HTTP, HTTP/2, STOMP and SPDY. Fixes #3600 Motivation: We noticed that the headers implementation in Netty for HTTP/2 uses quite a lot of memory and that also at least the performance of randomly accessing a header is quite poor. The main concern however was memory usage, as profiling has shown that a DefaultHttp2Headers not only uses a lot of memory but also wastes a lot due to the underlying hashmaps having to be resized potentially several times as new headers are being inserted. This is tracked as issue #3600. Modifications: We redesigned the DefaultHeaders to simply take a Map object in its constructor and reimplemented the class using only the Map primitives. That way the implementation is very concise and hopefully easy to understand and it allows each concrete headers implementation to provide its own map or to even use a different headers implementation for processing requests and writing responses i.e. incoming headers need to provide fast random access while outgoing headers need fast insertion and fast iteration. The new implementation can support this with hardly any code changes. It also comes with the advantage that if the Netty project decides to add a third party collections library as a dependency, one can simply plug in one of those very fast and memory efficient map implementations and get faster and smaller headers for free. For now, we are using the JDK's TreeMap for HTTP and HTTP/2 default headers. Result: - Significantly fewer lines of code in the implementation. While the total commit is still roughly 400 lines less, the actual implementation is a lot less. I just added some more tests and microbenchmarks. - Overall performance is up. The current implementation should be significantly faster for insertion and retrieval. However, it is slower when it comes to iteration.
There is simply no way a TreeMap can have the same iteration performance as a linked list (as used in the current headers implementation). That's totally fine though, because when looking at the benchmark results @ejona86 pointed out that the performance of the headers is completely dominated by insertion, that is insertion is so significantly faster in the new implementation that it does make up for several times the iteration speed. You can't iterate what you haven't inserted. I am demonstrating that in this spreadsheet [1]. (Actually, iteration performance is only down for HTTP, it's significantly improved for HTTP/2). - Memory is down. The implementation with TreeMap uses on avg ~30% less memory. It also does not produce any garbage while being resized. In load tests for GRPC we have seen a memory reduction of up to 1.2KB per RPC. I summarized the memory improvements in this spreadsheet [1]. The data was generated by [2] using JOL. - While it was my original intent to only improve the memory usage for HTTP/2, it should be similarly improved for HTTP, SPDY and STOMP as they all share a common implementation. [1] https://docs.google.com/spreadsheets/d/1ck3RQklyzEcCLlyJoqDXPCWRGVUuS-ArZf0etSXLVDQ/edit#gid=0 [2] https://gist.github.com/buchgr/4458a8bdb51dd58c82b4	2015-08-04 17:12:24 -07:00
Scott Mitchell	a7713069a1	HttpObjectDecoder performance improvements Motivation: The HttpObjectDecoder is on the hot code path for the http codec. There are a few hot methods which can be modified to improve performance. Modifications: - Modify AppendableCharSequence to provide unsafe methods which don't need to re-check bounds for every call. - Update HttpObjectDecoder methods to take advantage of new AppendableCharSequence methods. Result: Performance boost for decoding http objects.	2015-07-29 23:26:26 -07:00
Scott Mitchell	9747ffe5fc	HTTP/2 Flow Controller should use Channel.isWritable() Motivation: See #3783 Modifications: - The DefaultHttp2RemoteFlowController should use Channel.isWritable() before attempting to do any write operations. - The Flow controller methods should no longer take ChannelHandlerContext. The concept of flow control is tied to a connection and we do not support 1 flow controller keeping track of multiple ChannelHandlerContext. Result: Writes are delayed until isWritable() is true. Flow controller interface methods are more clear as to ChannelHandlerContext restrictions.	2015-07-16 14:38:48 -07:00
Louis Ryan	05ce33f5ca	Make the flow-controllers write fewer, fatter frames to improve throughput. Motivation: Coalescing many small writes into a larger DATA frame reduces framing overheads on the wire and reduces the number of calls to Http2FrameListeners on the remote side. Delaying the write of WINDOW_UPDATE until flush allows for more consumed bytes to be returned as the aggregate of consumed bytes is returned and not the amount consumed when the threshold was crossed. Modifications: - Remote flow controller no longer immediately writes bytes when a flow-controlled payload is enqueued. Sequential data payloads are now merged into a single CompositeByteBuf which are written when 'writePendingBytes' is called. - Listener added to remote flow-controller which observes written bytes per stream. - Local flow-controller no longer immediately writes WINDOW_UPDATE when the ratio threshold is crossed. Now an explicit call to 'writeWindowUpdates' triggers the WINDOW_UPDATE for all streams whose ratio is exceeded at that time. This results in fewer window updates being sent and more bytes being returned. - Http2ConnectionHandler.flush triggers 'writeWindowUpdates' on the local flow-controller followed by 'writePendingBytes' on the remote flow-controller so WINDOW_UPDATES precede DATA frames on the wire. Result: - Better throughput for writing many small DATA chunks followed by a flush, saving 9 bytes per coalesced frame. - Fewer WINDOW_UPDATES being written and more flow-control bytes returned to remote side more quickly, thereby improving throughput.	2015-06-19 15:20:31 -07:00
Louis Ryan	a3cea186ce	Have Http2LocalFlowController.consumeBytes indicate whether a WINDOW_UPDATE was written	2015-05-04 13:22:18 -07:00
Scott Mitchell	f812180c2d	ByteString arrayOffset method Motivation: The ByteString class currently assumes the underlying array will be a complete representation of data. This is limiting as it does not allow a subsection of another array to be used. This forces copy operations to take place to compensate for the lack of API support. Modifications: - add arrayOffset method to ByteString - modify all ByteString and AsciiString methods that loop over or index into the underlying array to use this offset - update all code that uses ByteString.array to ensure it accounts for the offset - add unit tests to test the implementation respects the offset Result: ByteString and AsciiString can represent a sub region of a byte[].	2015-04-24 18:54:01 -07:00
nmittler	70a2608325	Optimizing user-defined stream properties. Motivation: Streams currently maintain a hash map of user-defined properties, which has been shown to add significant memory overhead as well as being a performance bottleneck for lookup of frequently used properties. Modifications: Modifying the connection/stream to use an array as the storage of user-defined properties, indexed by the class that identifies the index into the array where the property is stored. Result: Stream processing performance should be improved.	2015-04-23 12:41:14 -07:00
Scott Mitchell	b426fb1618	Compile error introduced in `ee9233d` Motivation: Commit `ee9233d` introduced a compile error in microbench. Modifications: Fix compile error. Result: Code now builds.	2015-04-22 16:23:39 -07:00
Scott Mitchell	541137cc93	HTTP/2 Flow Controller interface updates Motivation: Flow control is a required part of the HTTP/2 specification but it is currently structured more like an optional item. It must be accessed through the property map which is time consuming and does not represent its required nature. This access pattern does not give any insight into flow control outside of the codec (or flow controller implementation). Modifications: 1. Create a read only public interface for LocalFlowState and RemoteFlowState. 2. Add a LocalFlowState localFlowState(); and RemoteFlowState remoteFlowState(); to Http2Stream. Result: Flow control is now part of the Http2Stream interface. This clarifies its responsibility and logical relationship to other interfaces. The flow controller no longer must be acquired through a map lookup.	2015-04-20 20:02:02 -07:00
Scott Mitchell	2b8104c852	HTTP/2 Priority Tree Benchmark Motivation: There is no benchmark to measure the priority tree implementation performance. Modifications: Introduce a new benchmark which will populate the priority tree, and then shuffle parent/child links around. Result: A simple benchmark to get a baseline for the HTTP/2 codec's priority tree implementation.	2015-04-17 10:14:13 -07:00
Louis Ryan	f3fb77f4bc	Have microbenchmarks produce a deployable artifact. Fix some minor miscellaneous issues. Motivation: Allows for running benchmarks from built jars which is useful in development environments that only take released artifacts. Modifications: Move benchmarks into 'main' from 'test' Add @State annotations to benchmarks that are missing them Fix timing issue grabbing context during channel initialization Result: Users can run benchmarks more easily.	2015-04-17 10:04:26 -07:00
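
Two commits in this listing (6af56ffe76 and c5faa142fb) describe replacing modulo and conditional wrap-around with bit masking once a table length is constrained to a power of two. The following is a minimal sketch of that general technique, not the code from those commits: the class `MaskIndexSketch` and all of its method names are hypothetical and chosen only for illustration.

```java
// Illustrative sketch only; these names do not come from the Netty code base.
public final class MaskIndexSketch {
    private MaskIndexSketch() { }

    // Rounds n up to the next power of two.
    // Assumes n is at most 2^30, so the shift below cannot overflow.
    public static int nextPowerOfTwo(int n) {
        if (n <= 1) {
            return 1;
        }
        return Integer.highestOneBit(n - 1) << 1;
    }

    // Maps a hash code to a bucket index. Because length is a power of two,
    // (length - 1) is an all-ones bit mask, so the result is always in
    // [0, length), even for negative hash codes.
    public static int index(int hash, int length) {
        return hash & (length - 1);
    }

    // Advances a probe position by one slot and wraps to 0 at the end of the
    // table without any conditional statement.
    public static int probeNext(int index, int length) {
        return (index + 1) & (length - 1);
    }

    public static void main(String[] args) {
        int length = nextPowerOfTwo(17);
        System.out.println("table length: " + length);
        System.out.println("bucket for hash -7: " + index(-7, length));
        System.out.println("slot after the last one: " + probeNext(length - 1, length));
    }
}
```

The mask only yields a valid index when the length is a power of two, which is why the first commit enforces that restriction before switching away from modulo.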

129 Commits