Motivation:
The requirement for the masking of frames and for checks of correct
masking in the websocket specifiation have a large impact on performance.
While it is mandatory for browsers to use masking there are other
applications (like IPC protocols) that want to user websocket framing and proxy-traversing
characteristics without the overhead of masking. The websocket standard
also mentions that the requirement for mask verification on server side
might be dropped in future.
Modifications:
Added an optional parameter allowMaskMismatch for the websocket decoder
that allows a server to also accept unmasked frames (and clients to accept
masked frames).
Allowed to set this option through the websocket handshaker
constructors as well as the websocket client and server handlers.
The public API for existing components doesn't change, it will be
forwarded to functions which implicetly set masking as required in the
specification.
For websocket clients an additional parameter is added that allows to
disable the masking of frames that are sent by the client.
Result:
This update gives netty users the ability to create and use completely
unmasked websocket connections in addition to the normal masked channels
that the standard describes.
Motivation:
At the moment the whole HTTP header must be parsed at once which can lead to multiple parsing of the same bytes. We can do better here and allow to parse it in multiple steps.
Modifications:
- Not parse headers multiple times
- Simplify the code
- Eliminate uncessary String[] creations
- Use readSlice(...).retain() when possible.
Result:
Performance improvements as shown in the included benchmark below.
Before change:
[nmaurer@xxx]~% ./wrk-benchmark
Running 2m test @ http://xxx:8080/plaintext
16 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 21.55ms 15.10ms 245.02ms 90.26%
Req/Sec 196.33k 30.17k 297.29k 76.03%
373954750 requests in 2.00m, 50.15GB read
Requests/sec: 3116466.08
Transfer/sec: 427.98MB
After change:
[nmaurer@xxx]~% ./wrk-benchmark
Running 2m test @ http://xxx:8080/plaintext
16 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 20.91ms 36.79ms 1.26s 98.24%
Req/Sec 206.67k 21.69k 243.62k 94.96%
393071191 requests in 2.00m, 52.71GB read
Requests/sec: 3275971.50
Transfer/sec: 449.89MB
Motivation:
The 4.1.0-Beta3 implementation of HttpObjectAggregator.handleOversizedMessage closes the
connection if the client sent oversized chunked data with no Expect:
100-continue header. This causes a broken pipe or "connection reset by
peer" error in some clients (tested on Firefox 31 OS X 10.9.5,
async-http-client 1.8.14).
This part of the HTTP 1.1 spec (below) seems to say that in this scenario the connection
should not be closed (unless the intention is to be very strict about
how data should be sent).
http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html
"If an origin server receives a request that does not include an
Expect request-header field with the "100-continue" expectation,
the request includes a request body, and the server responds
with a final status code before reading the entire request body
from the transport connection, then the server SHOULD NOT close
the transport connection until it has read the entire request,
or until the client closes the connection. Otherwise, the client
might not reliably receive the response message. However, this
requirement is not be construed as preventing a server from
defending itself against denial-of-service attacks, or from
badly broken client implementations."
Modifications:
Change HttpObjectAggregator.handleOversizedMessage to close the
connection only if keep-alive is off and Expect: 100-continue is
missing. Update test to reflect the change.
Result:
Broken pipe and connection reset errors on the client are avoided when
oversized data is sent.
Motivation:
- There are still various inspector warnings to fix.
- ValueConverter.convert() methods need to end with the type name like
other methods in Headers, such as setInt() and addInt(), for more
consistency
Modifications:
- Fix all inspector warnings
- Rename ValueConverter.convert() to convert<type>()
Result:
- Cleaner code
- Consistency
Related: #2983
Motivation:
It is a well known idiom to write an empty buffer and add a listener to
its future to close a channel when the last byte has been written out:
ChannelFuture f = channel.writeAndFlush(Unpooled.EMPTY_BUFFER);
f.addListener(ChannelFutureListener.CLOSE);
When HttpObjectEncoder is in the pipeline, this still works, but it
silently raises an IllegalStateException, because HttpObjectEncoder does
not allow writing a ByteBuf when it is expecting an HttpMessage.
Modifications:
- Handle an empty ByteBuf specially in HttpObjectEncoder, so that
writing an empty buffer does not fail even if the pipeline contains an
HttpObjectEncoder
- Add a test
Result:
An exception is not triggered anymore by HttpObjectEncoder, when a user
attempts to write an empty buffer.
Motivation:
Currently, when the CorsHandler processes a preflight request, or
respondes with an 403 Forbidden using the short-curcuit option, the
HttpRequest is not released which leads to a buffer leak.
Modifications:
Releasing the HttpRequest when done processing a preflight request or
responding with an 403.
Result:
Using the CorsHandler will not cause buffer leaks.
Motivation:
Headers within netty do not cleanly share a common class hierarchy. As a result some header types support some operations
and don't support others. The consolidation of the class hierarchy will allow for maintenance and scalability for new codec.
The existing hierarchy also has a few short comings such as it is not clear when data conversions are happening. This
could result unintentionally getting back a collection or iterator where a conversion on each entry must happen.
The current headers algorithm also prepends all elements which means to find the first element or return a collection
in insertion order often requires a complete traversal followed by a collections.reverse call.
Modifications:
-Provide a generic base class which provides all the implementation for headers in netty
-Provide an extension to this class which allows for name type conversions to happen (to accommodate legacy CharSequence to String conversions)
-Update the headers interface to clarify when conversions will happen.
-Update the headers data structure so that appends are done to avoid unnecessary iteration or collection reversal.
Result:
-More unified class hierarchy for headers in netty
-Improved headers data structure and algorithms
-headers API more clearly identify when conversions are required.
Motivation
Issue #3004 shows that "=" character was not supported as it should in
the HttpPostRequestDecoder in form-data boundary.
Modifications:
Add 2 methods in StringUtil
split with maxParts: String split with a max parts only (to prevent multiple '=' to
be source of extra split while not needed)
substringAfter: String part after delimiter (since first part is not needed)
Use those methods in HttpPostRequestDecoder. Change and the
HttpPostRequestDecoderTest to check using a boundary beginning with "=".
Results:
The fix implies more stability and fix the issue.
Related issue: #1133
Motivation:
There is no support for client socket connections via a proxy server in
Netty.
Modifications:
- Add a new module 'handler-proxy'
- Add ProxyHandler and its subclasses to support SOCKS 4a/5 and HTTP(S)
proxy connections
- Add a full parameterized test for most scenarios
- Clean up pom.xml
Result:
A user can make an outgoing connection via proxy servers with only
trivial effort.
Motivation:
There's no way for a user to get the encoder and the decoder of an
HttpClientCodec. The lack of such getter methods makes it impossible to
remove the codec handlers from the pipeline correctly.
For example, a user could add more than one HttpClientCodec to the
pipeline, and then the user cannot easily decide which encoder and
decoder to remove.
Modifications:
- Add encoder() and decoder() method to HttpClientCodec which returns
HttpRequestEncoder and HttpResponseDecoder respectively
- Also made the same changes to HttpServerCodec
Result:
A user can distinguish the handlers added by multiple HttpClientCodecs
easily.
Motivation:
Websocket clients can request to speak a specific subprotocol. The list of
subprotocols the client understands are sent to the server. The server
should select one of the protocols an reply this with the websocket
handshake response. The added code verifies that the reponded subprotocol
is valid.
Modifications:
Added verification of the subprotocol received from the server against the
subprotocol(s) that the user requests. If the user requests a subprotocol
but the server responds none or a non-requested subprotocol this is an
error and the handshake fails through an exception. If the user requests
no subprotocol but the server responds one this is also marked as an
error.
Addiontionally a getter for the WebSocketClientHandshaker in the
WebSocketClientProtocolHandler is added to enable the user of a
WebSocketClientProtocolHandler to extract the used negotiated subprotocol.
Result:
The subprotocol field which is received from a websocket server is now
properly verified on client side and clients and websocket connection
attempts will now only succeed if both parties can negotiate on a
subprotocol.
If the client sends a list of multiple possible subprotocols it can
extract the negotiated subprotocol through the added handshaker getter (WebSocketClientProtocolHandler.handshaker().actualSubprotocol()).
Motivation:
I was not fully reassured that whether everything works correctly when a websocket client receives the websocket handshake HTTP response and a websocket frame in a single ByteBuf (which can happen when the server sends a response directly or shortly after the connect). In this case some parts of the ByteBuf must be processed by HTTP decoder and the remaining by the websocket decoder.
Modification:
Adding a test that verifies that in this scenaria the handshake and the message are correctly interpreted and delivered by Netty.
Result:
One more test for Netty.
The test succeeds - No problems
Motivation:
The WebSocketClientProtocolHandshakeHandler never releases the received handshake response.
Modification:
Release the message in a finally block.
Result:
No more leak
Motivation:
The WebSocket08FrameEncoder contains an optimization path for small messages which copies the message content into the header buffer to avoid vectored writes. However this path is in the current implementation never taken because the target buffer is preallocated only for exactly the size of the header.
Modification:
For messages below a certain treshold allocate the buffer so that the message can be directly copied. Thereby the optimized path is taken.
Result:
A speedup of about 25% for 100byte messages. Declines with bigger message sizes. I have currently set the treshold to 1kB which is a point where I could still see a few percent speedup, but we should also avoid burning too many CPU cycles.
Motivation:
Websocket performance is to a large account determined through the masking
and unmasking of frames. The current behavior of this in Netty can be
improved.
Modifications:
Perform the XOR operation not bytewise but in int blocks as long as
possible. This reduces the number of necessary operations by 4. Also don't
read the writerIndex in each iteration.
Added a unit test for websocket decoding and encoding for verifiation.
Result:
A large performance gain (up to 50%) in websocket throughput.
Motivation:
According to the websocket specification peers may send a close frame when
they detect a protocol violation (with status code 1002). The current
implementation simply closes the connection. This update should add this
functionality. The functionality is optional - but it might help other
implementations with debugging when they receive such a frame.
Modification:
When a protocol violation in the decoder is detected and a close was not
already initiated by the remote peer a close frame is
sent.
Result:
Remotes which will send an invalid frame will now get a close frame that
indicates the protocol violation instead of only seeing a closed
connection.
Motivation:
Sometimes it is useful to be able to access the uri that was used to initialize the QueryStringDecoder.
Modifications:
Add method which allows to retrieve the uri.
Result:
Allow to retrieve the uri that was used to create the QueryStringDecoder.
Motivation:
The HTTP/2 codec does not provide a way to decompress data. This functionality is supported by the HTTP codec and is expected to be a commonly used feature.
Modifications:
-The Http2FrameReader will be modified to allow hooks for decompression
-New classes which detect the decompression from HTTP/2 header frames and uses that decompression when HTTP/2 data frames come in
-New unit tests
Result:
The HTTP/2 codec will provide a means to support data decompression
Motiviation:
The HttpContentDecoder.getTargetContentEncoding has a SuppressWarnings(unused) on its parameter.
This should be SuppressWarnings(UnusedParameters).
Modifications:
SuppressWarnings(unused) -> SuppressWarnings(UnusedParameters)
Result:
Correctly suppressing warnings due to HttpContentDecoder.getTargetContentEncoding
Motiviation:
The HTTP content decoder's cleanup method is not cleaning up the decoder correctly.
The cleanup method is currently doing a readOutbound on the EmbeddedChannel but
for decoding the call should be readInbound.
Modifications:
-Change readOutbound to readInbound in the cleanup method
Result:
The cleanup method should be correctly releaseing unused resources
Motivation:
Because of a bug a NPE was thrown when tried to encode HTTP to SPDY and no X-SPDY-Associated-To-Stream-ID was present.
Modifications:
Use 0 as default value when X-SPDY-Associated-To-Stream-ID not present.
Result:
No NPE anymore.
Motivation:
The priority information reported by the HTTP/2 to HTTP tranlsation layer is not correct in all situations.
The HTTP translation layer is not using the Http2Connection.Listener interface to track tree restructures.
This incorrect information is being sent up to clients and is misleading.
Modifications:
-Restructure InboundHttp2ToHttpAdapter to allow a default data/header mode
-Extend this interface to provide an optional priority translation layer
Result:
-Priority information being correctly reported in HTTP/2 to HTTP translation layer
-Cleaner code with seperation of concerns (optional priority conversion).
Motivation:
HTTP/2 draft 14 came out a couple of weeks ago and we need to keep up
with the spec.
Modifications:
-Revert back to dispatching FullHttpMessage objects instead of individual HttpObjects
-Corrections to HttpObject comparitors to support test cases
-New test cases to support sending headers immediatley
-Bug fixes cleaned up to ensure the message flow is terminated properly
Result:
Netty HTTP/2 to HTTP/1.x translation layer will support the HTTP/2 draft message flow.
Motivation:
Currently we do more memory copies then needed.
Modification:
- Directly use heap buffers to reduce memory copy
- Correctly release buffers to fix buffer leak
Result:
Less memory copies and no leaks
Related issue: #2741 and #2151
Motivation:
There is no way for ChunkedWriteHandler to know the progress of the
transfer of a ChannelInput. Therefore, ChannelProgressiveFutureListener
cannot get exact information about the progress of the transfer.
If you add a few methods that optionally provides the transfer progress
to ChannelInput, it becomes possible for ChunkedWriteHandler to notify
ChannelProgressiveFutureListeners.
If the input has no definite length, we can still use the progress so
far, and consider the length of the input as 'undefined'.
Modifications:
- Add ChunkedInput.progress() and ChunkedInput.length()
- Modify ChunkedWriteHandler to use progress() and length() to notify
the transfer progress
Result:
ChunkedWriteHandler now notifies ChannelProgressiveFutureListener.
Motivation:
The _0XFF_0X00 buffer is not duplicated and empty after the first usage preventing the connection close to happen on subsequent close frames.
Modifications:
Correctly duplicate the buffer.
Result:
Multiple CloseWebSocketFrames are handled correctly.
Motivation:
The HTTP/2 codec currently provides direct callbacks to access stream events/data. The HTTP/2 codec provides the protocol support for HTTP/2 but it does not pass messages up the context pipeline. It would be nice to have a decoder which could collect the data framed by HTTP/2 and translate this into traditional HTTP type objects. This would allow the traditional Netty context pipeline to be used to separate processing concerns (i.e. HttpContentDecompressor). It would also be good to have a layer which can translate FullHttp[Request|Response] objects into HTTP/2 frame outbound events.
Modifications:
Introduce a new InboundHttp2ToHttpAdapter and supporting classes which will translate HTTP/2 stream events/data into HttpObject objects. Introduce a new DelegatingHttp2HttpConnectionHandler which will translate FullHttp[Request|Response] objects to HTTP/2 frame events.
Result:
Introduced HTTP/2 frame events to HttpObject layer.
Introduced FullHttp[Request|Response] to HTTP/2 frame events.
Introduced new unit tests to support new code.
Updated HTTP/2 client example to use new code.
Miscelaneous updates and bug fixes made to support new code.
In Netty 3, downstream writes of SPDY data frames and upstream reads of
SPDY window udpate frames occur on different threads.
When receiving a window update frame, we synchronize on a java object
(SpdySessionHandler::flowControlLock) while sending any pending writes
that are now able to complete.
When writing a data frame, we check the send window size to see if we
are allowed to write it to the socket, or if we have to enqueue it as a
pending write. To prevent races with the window update frame, this is
also synchronized on the same SpdySessionHandler::flowControlLock.
In Netty 4, upstream and downstream operations on any given channel now
occur on the same thread. Since java locks are re-entrant, this now
allows downstream writes to occur while processing window update frames.
In particular, when we receive a window update frame that unblocks a
pending write, this write completes which triggers an event notification
on the response, which in turn triggers a write of a data frame. Since
this is on the same thread it re-enters the lock and modifies the send
window. When the write completes, we continue processing pending writes
without knowledge that the window size has been decremented.
Related issue: #2743
Motivation:
When there are more than one stream with the same priority, the set
returned by SpdySession.getActiveStream() will not include all of them,
because it uses TreeSet and only compares the priority of streams. If
two different streams have the same priority, one of them will be
discarded by TreeSet.
Modification:
- Rename getActiveStreams() to activeStreams()
- Replace PriorityComparator with StreamComparator
Result:
Two different streams with the same priority are compared correctly.
Motivation:
If the requests contains uri parameters but not path the HttpRequestEncoder does produce an invalid uri while try to add the missing path.
Modifications:
Correctly handle the case of uri with paramaters but no path.
Result:
HttpRequestEncoder produce correct uri in all cases.
Motivation:
Now Netty has a few problems with null values.
Modifications:
- Check HAProxyProxiedProtocol in HAProxyMessage constructor and throw NPE if it is null.
If HAProxyProxiedProtocol is null we will set AddressFamily as null. So we will get NPE inside checkAddress(String, AddressFamily) and it won't be easy to understand why addrFamily is null.
- Check File in DiskFileUpload.toString().
If File is null we will get NPE when calling toString() method.
- Check Result<String> in MqttDecoder.decodeConnectionPayload(...).
If !mqttConnectVariableHeader.isWillFlag() || !mqttConnectVariableHeader.hasUserName() || !mqttConnectVariableHeader.hasPassword() we will get NPE when we will try to create new instance of MqttConnectPayload.
- Check Unsafe before calling unsafe.getClass() in PlatformDependent0 static block.
- Removed unnecessary null check in WebSocket08FrameEncoder.encode(...).
Because msg.content() can not return null.
- Removed unnecessary null check in DefaultStompFrame(StompCommand) constructor.
Because we have this check in the super class.
- Removed unnecessary null checks in ConcurrentHashMapV8.removeTreeNode(TreeNode<K,V>).
- Removed unnecessary null check in OioDatagramChannel.doReadMessages(List<Object>).
Because tmpPacket.getSocketAddress() always returns new SocketAddress instance.
- Removed unnecessary null check in OioServerSocketChannel.doReadMessages(List<Object>).
Because socket.accept() always returns new Socket instance.
- Pass Unpooled.buffer(0) instead of null inside CloseWebSocketFrame(boolean, int) constructor.
If we will pass null we will get NPE in super class constructor.
- Added throw new IllegalStateException in GlobalEventExecutor.awaitInactivity(long, TimeUnit) if it will be called before GlobalEventExecutor.execute(Runnable).
Because now we will get NPE. IllegalStateException will be better in this case.
- Fixed null check in OpenSslServerContext.setTicketKeys(byte[]).
Now we throw new NPE if byte[] is not null.
Result:
Added new null checks when it is necessary, removed unnecessary null checks and fixed some NPE problems.
Modifications:
- Added a static modifier for CompositeByteBuf.Component.
This class is an inner class, but does not use its embedded reference to the object which created it. This reference makes the instances of the class larger, and may keep the reference to the creator object alive longer than necessary.
- Removed unnecessary boxing/unboxing operations in HttpResponseDecoder, RtspResponseDecoder, PerMessageDeflateClientExtensionHandshaker and PerMessageDeflateServerExtensionHandshaker
A boxed primitive is created from a String, just to extract the unboxed primitive value.
- Removed unnecessary 3 times calculations in DiskAttribute.addContent(...).
- Removed unnecessary checks if file exists before call mkdirs() in NativeLibraryLoader and PlatformDependent.
Because the method mkdirs() has this check inside.
- Removed unnecessary `instanceof AsciiString` check in StompSubframeAggregator.contentLength(StompHeadersSubframe) and StompSubframeDecoder.getContentLength(StompHeaders, long).
Because StompHeaders.get(CharSequence) always returns java.lang.String.
Motivation:
When we receive an incomplete WebSocketFrame we need to make sure to wait for more data. Because we not did this we could produce a NPE.
Modification:
Make sure we not try to add null into the RecyclableArrayList
Result:
no more NPE on incomplete frames.
Motivation:
HTTP header validation can be expensive so we should allow to disable it like we do in HttpObjectDecoder.
Modification:
Add constructor argument to disable validation.
Result:
Performance improvement
Motivation:
HttpObjectAggregator currently creates a new FullHttpResponse / FullHttpRequest for each message it needs to aggregate. While doing so it also creates 2 DefaultHttpHeader instances (one for the headers and one for the trailing headers). This is bad for two reasons:
- More objects are created then needed and also populate the headers is not for free
- Headers may get validated even if the validation was disabled in the decoder
Modification:
- Wrap the previous created HttpResponse / HttpRequest and so reuse the original HttpHeaders
- Reuse the previous created trailing HttpHeader.
- Fix a bug where the trailing HttpHeader was incorrectly mixed in the headers.
Result:
- Less GC
- Faster HttpObjectAggregator implementation
Add permessage-deflate and deflate-frame WebSocket extension
implementations.
Motivation:
Need to compress HTTP WebSocket frames payload.
Modifications:
- Move UTF8 checking of WebSocketFrames from frame decoder to and
external handler.
- Change ZlibCodecFactory in order to use ZLib implentation instead of
JDK if windowSize or memLevel is different from the default one.
- Add WebSocketServerExtensionHandler and
WebSocketClientExtensionHandler to handle WebSocket Extension headers.
- Add DeflateFrame and PermessageDeflate extension implementations.
Motivation:
HttpOrSpdyChooser can be simplified so the user not need to implement getProtocol(...) method.
Modification:
Add implementation for the method. The user can override it if necessary.
Result:
Easier usage of HttpOrSpdyChooser.
Motivation:
DecodeResult is dropped when aggregate HTTP messages.
Modification:
Make sure we not drop the DecodeResult while aggregate HTTP messages.
Result:
Correctly include the DecodeResult for later processing.
Motivation:
Due to integer overflow bug, writes of FileRegions to http server pipeline (eg like one from HttpStaticFileServer example) with length greater than Integer.MAX_VALUE are ignored in 1/2 of cases (ie no data gets sent to client)
Modification:
Correctly handle chunk sized > Integer.MAX_VALUE
Result:
Be able to use FileRegion > Integer.MAX_VALUE when using chunked encoding.
Motivation:
Persuit for the consistency in method naming
Modifications:
- Remove the 'get' prefix from all HTTP/SPDY message classes
- Fix some inspector warnings
Result:
Consistency
Motivation:
When Netty runs in a managed environment such as web application server,
Netty needs to provide an explicit way to remove the thread-local
variables it created to prevent class loader leaks.
FastThreadLocal uses different execution paths for storing a
thread-local variable depending on the type of the current thread.
It increases the complexity of thread-local removal.
Modifications:
- Moved FastThreadLocal and FastThreadLocalThread out of the internal
package so that a user can use it.
- FastThreadLocal now keeps track of all thread local variables it has
initialized, and calling FastThreadLocal.removeAll() will remove all
thread-local variables of the caller thread.
- Added FastThreadLocal.size() for diagnostics and tests
- Introduce InternalThreadLocalMap which is a mixture of hard-wired
thread local variable fields and extensible indexed variables
- FastThreadLocal now uses InternalThreadLocalMap to implement a
thread-local variable.
- Added ThreadDeathWatcher.unwatch() so that PooledByteBufAllocator
tells it to stop watching when its thread-local cache has been freed
by FastThreadLocal.removeAll().
- Added FastThreadLocalTest to ensure that removeAll() works
- Added microbenchmark for FastThreadLocal and JDK ThreadLocal
- Upgraded to JMH 0.9
Result:
- A user can remove all thread-local variables Netty created, as long as
he or she did not exit from the current thread. (Note that there's no
way to remove a thread-local variable from outside of the thread.)
- FastThreadLocal exposes more useful operations such as isSet() because
we always implement a thread local variable via InternalThreadLocalMap
instead of falling back to JDK ThreadLocal.
- FastThreadLocalBenchmark shows that this change improves the
performance of FastThreadLocal even more.