Commit Graph

77 Commits

Author SHA1 Message Date
Scott Mitchell
641505a5d2 DefaultChannelConfig maxMessagesPerRead default not always set
Motivation:
ChannelMetadata has a field minMaxMessagesPerRead which can be confusing. There are also some cases where static instances are used and the default value for channel type is not being applied.

Modifications:
- use a default value which is set unconditionally to simplify
- make sure static instances of MaxMessagesRecvByteBufAllocator are not used if the intention is that the default maxMessagesPerRead should be derived from the channel type.

Result:
Less confusing interfaces in ChannelMetadata and ChannelConfig. Default maxMessagesPerRead is correctly applied.
2015-11-25 15:14:07 -08:00
Scott Mitchell
49cd00da1c Backport of benchmark broke build
Motivation:
2a2059d976 was backported from master, and included an overriden method which does not exist in 4.1.

Modifications:
- Remove the invoker method from NoPriorityByteDistributionBenchmark

Result:
No more build error
2015-11-20 15:03:50 -08:00
nmittler
2a2059d976 Adding UniformStreamByteDistributor
Motivation:

The current priority algorithm can yield poor per-stream goodput when either the number of streams is high or the connection window is small. When all priorities are the same (i.e. priority is disabled), we should be able to do better.

Modifications:

Added a new UniformStreamByteDistributor that ignores priority entirely and manages a queue of streams.  Each stream is allocated a minimum of 1KiB on each iteration.

Result:

Improved goodput when priority is not used.
2015-11-19 16:49:12 -08:00
nmittler
8accc52b03 Forking Twitter's hpack
Motivation:

The twitter hpack project does not have the support that it used to have.  See discussion here: https://github.com/netty/netty/issues/4403.

Modifications:

Created a new module in Netty and copied the latest from twitter hpack master.

Result:

Netty no longer depends on twitter hpack.
2015-11-14 10:13:32 -08:00
Scott Mitchell
b4b791353d AsciiString optimized hashCode
Motivation:
The AsciiString.hashCode() method can be optimized. This method is frequently used while to build the DefaultHeaders data structure.

Modification:
- Add a PlatformDependent hashCode algorithm which utilizes UNSAFE if available

Result:
AsciiString hashCode is faster.
2015-11-10 10:28:31 -08:00
Louis Ryan
6e108cb96a Improve the performance of copying header sets when hashing and name validation are equivalent.
Motivation:
Headers and groups of headers are frequently copied and the current mechanism is slower than it needs to be.

Modifications:
Skip name validation and hash computation when they are not necessary.
Fix emergent bug in CombinedHttpHeaders identified with better testing
Fix memory leak in DefaultHttp2Headers when clearing
Added benchmarks

Result:
Faster header copying and some collateral bug fixes
2015-11-07 08:53:10 -08:00
nmittler
6504d52b94 Add HTTP/2 local flow control option for auto refill
Motivation:

For many HTTP/2 applications (such as gRPC) it is necessary to autorefill the connection window in order to prevent application-level deadlocking.

Consider an application with 2 streams, A and B.  A receives a stream of messages and the application pops off one message at a time and makes a request on stream B. However, if receiving of data on A has caused the connection window to collapse, B will not be able to receive any data and the application will deadlock.  The only way (currently) to get around this is 1) use multiple connections, or 2) manually refill the connection window.  Both are undesirable and could needlessly complicate the application code.

Modifications:

Add a configuration option to DefaultHttp2LocalFlowController, allowing it to autorefill the connection window.

Result:

Applications can configure HTTP/2 to avoid inter-stream deadlocking.
2015-11-05 15:47:10 -08:00
Norman Maurer
1b2e43e70c Correctly construct Executor in microbenchmarks.
Motivation:

We should allow our custom Executor to shutdown quickly.

Modifications:

Call super constructor which correct arguments.

Result:

Custom Executor can be shutdown quickly.
2015-11-03 09:46:05 +01:00
Scott Mitchell
19658e9cd8 HTTP/2 Headers Type Updates
Motivation:
The HTTP/2 RFC (https://tools.ietf.org/html/rfc7540#section-8.1.2) indicates that header names consist of ASCII characters. We currently use ByteString to represent HTTP/2 header names. The HTTP/2 RFC (https://tools.ietf.org/html/rfc7540#section-10.3) also eludes to header values inheriting the same validity characteristics as HTTP/1.x. Using AsciiString for the value type of HTTP/2 headers would allow for re-use of predefined HTTP/1.x values, and make comparisons more intuitive. The Headers<T> interface could also be expanded to allow for easier use of header types which do not have the same Key and Value type.

Motivation:
- Change Headers<T> to Headers<K, V>
- Change Http2Headers<ByteString> to Http2Headers<CharSequence, CharSequence>
- Remove ByteString. Having AsciiString extend ByteString complicates equality comparisons when the hash code algorithm is no longer shared.

Result:
Http2Header types are more representative of the HTTP/2 RFC, and relationship between HTTP/2 header name/values more directly relates to HTTP/1.x header names/values.
2015-10-30 15:29:44 -07:00
buchgr
c9364616c8 Fix performance regression in FastThreadLocal microbenchmark. Fixes #4402
Motivation:

As reported in #4402, the FastThreadLocalBenchmark shows that the JDK ThreadLocal
is actually faster than Netty's custom thread local implementation.

I was looking forward to doing some deep digging, but got disappointed :(.

Modifications:

The microbenchmark was not using FastThreadLocalThreads and would thus always hit the slow path.
I updated the JMH command line flags, so that FastThreadLocalThreads would be used.

Result:

FastThreadLocalBenchmark shows FastThreadLocal to be faster than JDK's ThreadLocal implementation,
by about 56% in this particular benchmark. Run on OSX El Capitan with OpenJDK 1.8u60.

Benchmark                                    Mode  Cnt      Score      Error  Units
FastThreadLocalBenchmark.fastThreadLocal    thrpt   20  55452.027 ±  725.713  ops/s
FastThreadLocalBenchmark.jdkThreadLocalGet  thrpt   20  35481.888 ± 1471.647  ops/s
2015-10-29 21:40:13 +01:00
Norman Maurer
2aef4a504f Minimize object allocation when calling AbstractByteBuf.toString(..., Charset)
Motivation:

Calling AbstractByteBuf.toString(..., Charset) is used quite frequently by users but produce a lot of GC.

Modification:

- Use a FastThreadLocal to store the CharBuffer that are needed for decoding.
- Use internalNioBuffer(...) when possible

Result:

Less object creation / Less GC
2015-10-15 17:51:57 +02:00
Scott Mitchell
d4680c55d8 AsciiString contains utility methods
Motivation:
When dealing with case insensitive headers it can be useful to have a case insensitive contains method for CharSequence.

Modifications:
- Add containsCaseInsensative to AsciiString

Result:
More expressive utility method for case insensitive CharSequence.
2015-10-02 12:50:11 -07:00
Scott Mitchell
284e3702d8 Http2ConnectionHandler Builder instead of constructors
Motivation:
Using the builder pattern for Http2ConnectionHandler (and subclasses) would be advantageous for the following reasons:
1. Provides the consistent construction afforded by the builder pattern for 'optional' arguments. Users can specify these options 1 time in the builder and then re-use the builder after this.
2. Enforces that the Http2ConnectionHandler's internals (decoder Http2FrameListener) are initialized after construction.

Modifications:
- Add an extensible builder which can be used to build Http2ConnectionHandler objects
- Update classes which inherit from Http2ConnectionHandler

Result:
It is easier to specify options and construct Http2ConnectionHandler objects.
2015-10-01 13:51:03 -07:00
Scott Mitchell
1485a87e25 Http2ConnectionHandler and Http2FrameListener cyclic dependency
Motivation:
It is often the case that implementations of Http2FrameListener will want to send responses when data is read. The Http2FrameListener needs access to the Http2ConnectionHandler (or the encoder contained within) to be able to send responses. However the Http2ConnectionHandler requires a Http2FrameListener instance to be passed in during construction time. This creates a cyclic dependency which can make it difficult to cleanly accomplish this relationship.

Modifications:
- Add Http2ConnectionDecoder.frameListener(..) method to set the frame listener. This will allow the listener to be set after construction.

Result:
Classes which inherit from Http2ConnectionHandler can more cleanly set the Http2FrameListener.
2015-09-30 15:41:15 -07:00
Scott Mitchell
0e9545e94d Http2RemoteFlowController stream writibility listener
Motivation:
For implementations that want to manage flow control down to the stream level it is useful to be notified when stream writability changes.

Modifications:
- Add writabilityChanged to Http2RemoteFlowController.Listener
- Add isWritable to Http2RemoteFlowController

Result:
The Http2RemoteFlowController provides notification when writability of a stream changes.
2015-09-28 13:47:24 -07:00
Scott Mitchell
ba6ce5449e Headers Performance Boost and Interface Simplification
Motivation:
A degradation in performance has been observed from the 4.0 branch as documented in https://github.com/netty/netty/issues/3962.

Modifications:
- Simplify Headers class hierarchy.
- Restore the DefaultHeaders to be based upon DefaultHttpHeaders from 4.0.
- Make various other modifications that are causing hot spots.

Result:
Performance is now on par with 4.0.
2015-08-17 08:50:11 -07:00
Jakob Buchgraber
6fd0a0c55f Faster and more memory efficient headers for HTTP, HTTP/2, STOMP and SPYD. Fixes #3600
Motivation:

We noticed that the headers implementation in Netty for HTTP/2 uses quite a lot of memory
and that also at least the performance of randomly accessing a header is quite poor. The main
concern however was memory usage, as profiling has shown that a DefaultHttp2Headers
not only use a lot of memory it also wastes a lot due to the underlying hashmaps having
to be resized potentially several times as new headers are being inserted.

This is tracked as issue #3600.

Modifications:
We redesigned the DefaultHeaders to simply take a Map object in its constructor and
reimplemented the class using only the Map primitives. That way the implementation
is very concise and hopefully easy to understand and it allows each concrete headers
implementation to provide its own map or to even use a different headers implementation
for processing requests and writing responses i.e. incoming headers need to provide
fast random access while outgoing headers need fast insertion and fast iteration. The
new implementation can support this with hardly any code changes. It also comes
with the advantage that if the Netty project decides to add a third party collections library
as a dependency, one can simply plug in one of those very fast and memory efficient map
implementations and get faster and smaller headers for free.

For now, we are using the JDK's TreeMap for HTTP and HTTP/2 default headers.

Result:

- Significantly fewer lines of code in the implementation. While the total commit is still
  roughly 400 lines less, the actual implementation is a lot less. I just added some more
  tests and microbenchmarks.

- Overall performance is up. The current implementation should be significantly faster
  for insertion and retrieval. However, it is slower when it comes to iteration. There is simply
  no way a TreeMap can have the same iteration performance as a linked list (as used in the
  current headers implementation). That's totally fine though, because when looking at the
  benchmark results @ejona86 pointed out that the performance of the headers is completely
  dominated by insertion, that is insertion is so significantly faster in the new implementation
  that it does make up for several times the iteration speed. You can't iterate what you haven't
  inserted. I am demonstrating that in this spreadsheet [1]. (Actually, iteration performance is
  only down for HTTP, it's significantly improved for HTTP/2).

- Memory is down. The implementation with TreeMap uses on avg ~30% less memory. It also does not
  produce any garbage while being resized. In load tests for GRPC we have seen a memory reduction
  of up to 1.2KB per RPC. I summarized the memory improvements in this spreadsheet [1]. The data
  was generated by [2] using JOL.

- While it was my original intend to only improve the memory usage for HTTP/2, it should be similarly
  improved for HTTP, SPDY and STOMP as they all share a common implementation.

[1] https://docs.google.com/spreadsheets/d/1ck3RQklyzEcCLlyJoqDXPCWRGVUuS-ArZf0etSXLVDQ/edit#gid=0
[2] https://gist.github.com/buchgr/4458a8bdb51dd58c82b4
2015-08-04 17:12:24 -07:00
Scott Mitchell
a7713069a1 HttpObjectDecoder performance improvements
Motivation:
The HttpObjectDecoder is on the hot code path for the http codec. There are a few hot methods which can be modified to improve performance.

Modifications:
- Modify AppendableCharSequence to provide unsafe methods which don't need to re-check bounds for every call.
- Update HttpObjectDecoder methods to take advantage of new AppendableCharSequence methods.

Result:
Peformance boost for decoding http objects.
2015-07-29 23:26:26 -07:00
Scott Mitchell
9747ffe5fc HTTP/2 Flow Controller should use Channel.isWritable()
Motivation:
See #3783

Modifications:
- The DefaultHttp2RemoteFlowController should use Channel.isWritable() before attempting to do any write operations.
- The Flow controller methods should no longer take ChannelHandlerContext. The concept of flow control is tied to a connection and we do not support 1 flow controller keeping track of multiple ChannelHandlerContext.

Result:
Writes are delayed until isWritable() is true. Flow controller interface methods are more clear as to ChannelHandlerContext restrictions.
2015-07-16 14:38:48 -07:00
Louis Ryan
05ce33f5ca Make the flow-controllers write fewer, fatter frames to improve throughput.
Motivation:

Coalescing many small writes into a larger DATA frame reduces framing overheads on the wire and reduces the number of calls to Http2FrameListeners on the remote side.
Delaying the write of WINDOW_UPDATE until flush allows for more consumed bytes to be returned as the aggregate of consumed bytes is returned and not the amount consumed when the threshold was crossed.

Modifications:
- Remote flow controller no longer immediately writes bytes when a flow-controlled payload is enqueued. Sequential data payloads are now merged into a single CompositeByteBuf which are written when 'writePendingBytes' is called.
- Listener added to remote flow-controller which observes written bytes per stream.
- Local flow-controller no longer immediately writes WINDOW_UPDATE when the ratio threshold is crossed. Now an explicit call to 'writeWindowUpdates' triggers the WINDOW_UPDATE for all streams who's ratio is exceeded at that time. This results in
  fewer window updates being sent and more bytes being returned.
- Http2ConnectionHandler.flush triggers 'writeWindowUpdates' on the local flow-controller followed by 'writePendingBytes' on the remote flow-controller so WINDOW_UPDATES preceed DATA frames on the wire.

Result:
- Better throughput for writing many small DATA chunks followed by a flush, saving 9-bytes per coalesced frame.
- Fewer WINDOW_UPDATES being written and more flow-control bytes returned to remote side more quickly, thereby improving throughput.
2015-06-19 15:20:31 -07:00
Louis Ryan
a3cea186ce Have Http2LocalFlowController.consumeBytes indicate whether a WINDOW_UPDATE was written 2015-05-04 13:22:18 -07:00
Scott Mitchell
f812180c2d ByteString arrayOffset method
Motivation:
The ByteString class currently assumes the underlying array will be a complete representation of data. This is limiting as it does not allow a subsection of another array to be used. The forces copy operations to take place to compensate for the lack of API support.

Modifications:
- add arrayOffset method to ByteString
- modify all ByteString and AsciiString methods that loop over or index into the underlying array to use this offset
- update all code that uses ByteString.array to ensure it accounts for the offset
- add unit tests to test the implementation respects the offset

Result:
ByteString and AsciiString can represent a sub region of a byte[].
2015-04-24 18:54:01 -07:00
nmittler
70a2608325 Optimizing user-defined stream properties.
Motivation:

Streams currently maintain a hash map of user-defined properties, which has been shown to add significant memory overhead as well as being a performance bottleneck for lookup of frequently used properties.

Modifications:

Modifying the connection/stream to use an array as the storage of user-defined properties, indexed by the class that identifies the index into the array where the property is stored.

Result:

Stream processing performance should be improved.
2015-04-23 12:41:14 -07:00
Scott Mitchell
b426fb1618 Compile error introduced in ee9233d
Motivation:
Commit ee9233d introduced a compile error in microbench.

Modifications:
Fix compile error.

Result:
Code now builds.
2015-04-22 16:23:39 -07:00
Scott Mitchell
541137cc93 HTTP/2 Flow Controller interface updates
Motivation:
Flow control is a required part of the HTTP/2 specification but it is currently structured more like an optional item. It must be accessed through the property map which is time consuming and does not represent its required nature. This access pattern does not give any insight into flow control outside of the codec (or flow controller implementation).

Modifications:
1. Create a read only public interface for LocalFlowState and RemoteFlowState.
2. Add a LocalFlowState localFlowState(); and RemoteFlowState remoteFlowState(); to Http2Stream.

Result:
Flow control is not part of the Http2Stream interface. This clarifies its responsibility and logical relationship to other interfaces. The flow controller no longer must be acquired though a map lookup.
2015-04-20 20:02:02 -07:00
Scott Mitchell
2b8104c852 HTTP/2 Priority Tree Benchmark
Motivation:
There is no benchmark to measure the priority tree implementation performance.

Modifications:
Introduce a new benchmark which will populate the priority tree, and then shuffle parent/child links around.

Result:
A simple benchmark to get a baseline for the HTTP/2 codec's priority tree implementation.
2015-04-17 10:14:13 -07:00
Louis Ryan
f3fb77f4bc Have microbenchmarks produce a deployable artifact. Fix some minor miscellaneous issues.
Motivation:
Allows for running benchmarks from built jars which is useful in development environments that only take released artifacts.

Modifications:
Move benchmarks into 'main' from 'test'
Add @State annotations to benchmarks that are missing them
Fix timing issue grabbing context during channel initialization

Result:
Users can run benchmarks more easily.
2015-04-17 10:04:26 -07:00