Motivation:
The SslHandler currently forces the use of a direct buffer for the input to the SSLEngine.wrap(..) operation. This allocation may not always be desired and should be conditionally done.
Modifications:
- Use the pre-existing wantsDirectBuffer variable as the condition to do the conversion.
Result:
- An allocation of a direct byte buffer and a copy of data is now not required for every SslHandler wrap operation.
Motivation:
The java implementations for Inet6Address.getHostName() do not follow the RFC 5952 (http://tools.ietf.org/html/rfc5952#section-4) for recommended string representation. This introduces inconsistencies when integrating with other technologies that do follow the RFC.
Modifications:
-NetUtil.java to have another public static method to convert InetAddress to string. Inet4Address will use the java InetAddress.getHostAddress() implementation and there will be new code to implement the RFC 5952 IPV6 string conversion.
-New unit tests to test the new method
Result:
Netty provides a RFC 5952 compliant string conversion method for IPV6 addresses
Motivation:
The SslHandler wrap method requires that a direct buffer be passed to the SSLEngine.wrap() call. If the ByteBuf parameter does not have an underlying direct buffer then one is allocated in this method, but it is not released.
Modifications:
- Release the direct ByteBuffer only accessible in the scope of SslHandler.wrap
Result:
Memory leak in SslHandler.wrap is fixed.
Motivation:
The requirement for the masking of frames and for checks of correct
masking in the websocket specifiation have a large impact on performance.
While it is mandatory for browsers to use masking there are other
applications (like IPC protocols) that want to user websocket framing and proxy-traversing
characteristics without the overhead of masking. The websocket standard
also mentions that the requirement for mask verification on server side
might be dropped in future.
Modifications:
Added an optional parameter allowMaskMismatch for the websocket decoder
that allows a server to also accept unmasked frames (and clients to accept
masked frames).
Allowed to set this option through the websocket handshaker
constructors as well as the websocket client and server handlers.
The public API for existing components doesn't change, it will be
forwarded to functions which implicetly set masking as required in the
specification.
For websocket clients an additional parameter is added that allows to
disable the masking of frames that are sent by the client.
Result:
This update gives netty users the ability to create and use completely
unmasked websocket connections in addition to the normal masked channels
that the standard describes.
Motivation:
DnsNameResolver.testResolveA() tests if the cache works as well as the usual DNS protocol test. To ensure the result from the cache is identical to the result without cache, it compares the two Maps which contain the result of cached/uncached resolution. The comparison of two Maps yields an expected behavior, but the output of the comparison on failure is often unreadable due to its long length.
Modifications:
Compare entry-by-entry for more comprehensible test failure output
Result:
When failure occurs, it's easier to see which domain was the cause of the problem.
Motivation:
At the moment the whole HTTP header must be parsed at once which can lead to multiple parsing of the same bytes. We can do better here and allow to parse it in multiple steps.
Modifications:
- Not parse headers multiple times
- Simplify the code
- Eliminate uncessary String[] creations
- Use readSlice(...).retain() when possible.
Result:
Performance improvements as shown in the included benchmark below.
Before change:
[nmaurer@xxx]~% ./wrk-benchmark
Running 2m test @ http://xxx:8080/plaintext
16 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 21.55ms 15.10ms 245.02ms 90.26%
Req/Sec 196.33k 30.17k 297.29k 76.03%
373954750 requests in 2.00m, 50.15GB read
Requests/sec: 3116466.08
Transfer/sec: 427.98MB
After change:
[nmaurer@xxx]~% ./wrk-benchmark
Running 2m test @ http://xxx:8080/plaintext
16 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 20.91ms 36.79ms 1.26s 98.24%
Req/Sec 206.67k 21.69k 243.62k 94.96%
393071191 requests in 2.00m, 52.71GB read
Requests/sec: 3275971.50
Transfer/sec: 449.89MB
Motivation:
As report in #2953 the websocket server example contained a bug and did therefore not work with chrome:
A websocket extension is added to the pipeline but extensions were disallowed in the handshaker and decoder,
which is leading the decoder to closing the connection after receiving an extension frame.
Modifications:
Allow websocket extensions in the handshaker to correctly enable the extension.
Result:
Working websocket server example
Fixes#2953
Related: #2945
Motivation:
Some special handlers such as TrafficShapingHandler need to override the
writability of a Channel to throttle the outbound traffic.
Modifications:
Add a new indexed property called 'user-defined writability flag' to
ChannelOutboundBuffer so that a handler can override the writability of
a Channel easily.
Result:
A handler can override the writability of a Channel using an unsafe API.
For example:
Channel ch = ...;
ch.unsafe().outboundBuffer().setUserDefinedWritability(1, false);
Motivation:
The 4.1.0-Beta3 implementation of HttpObjectAggregator.handleOversizedMessage closes the
connection if the client sent oversized chunked data with no Expect:
100-continue header. This causes a broken pipe or "connection reset by
peer" error in some clients (tested on Firefox 31 OS X 10.9.5,
async-http-client 1.8.14).
This part of the HTTP 1.1 spec (below) seems to say that in this scenario the connection
should not be closed (unless the intention is to be very strict about
how data should be sent).
http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html
"If an origin server receives a request that does not include an
Expect request-header field with the "100-continue" expectation,
the request includes a request body, and the server responds
with a final status code before reading the entire request body
from the transport connection, then the server SHOULD NOT close
the transport connection until it has read the entire request,
or until the client closes the connection. Otherwise, the client
might not reliably receive the response message. However, this
requirement is not be construed as preventing a server from
defending itself against denial-of-service attacks, or from
badly broken client implementations."
Modifications:
Change HttpObjectAggregator.handleOversizedMessage to close the
connection only if keep-alive is off and Expect: 100-continue is
missing. Update test to reflect the change.
Result:
Broken pipe and connection reset errors on the client are avoided when
oversized data is sent.
Related: #2983
Motivation:
It is a well known idiom to write an empty buffer and add a listener to
its future to close a channel when the last byte has been written out:
ChannelFuture f = channel.writeAndFlush(Unpooled.EMPTY_BUFFER);
f.addListener(ChannelFutureListener.CLOSE);
When HttpObjectEncoder is in the pipeline, this still works, but it
silently raises an IllegalStateException, because HttpObjectEncoder does
not allow writing a ByteBuf when it is expecting an HttpMessage.
Modifications:
- Handle an empty ByteBuf specially in HttpObjectEncoder, so that
writing an empty buffer does not fail even if the pipeline contains an
HttpObjectEncoder
- Add a test
Result:
An exception is not triggered anymore by HttpObjectEncoder, when a user
attempts to write an empty buffer.
Motivation:
Currently, when the CorsHandler processes a preflight request, or
respondes with an 403 Forbidden using the short-curcuit option, the
HttpRequest is not released which leads to a buffer leak.
Modifications:
Releasing the HttpRequest when done processing a preflight request or
responding with an 403.
Result:
Using the CorsHandler will not cause buffer leaks.
Related: #2952
Motivation:
META-INF/io.netty.versions.properties in netty-all-*.jar does not
contain the version information about the netty-transport-epoll module.
Modifications:
Fix a bug in the regular expression in pom.xml, so that the artifacts
with a classifier is also included in the version properties file.
Result:
The version information of all modules are included in the version
properties file, and Version.identify() does not miss
netty-transport-epoll.
Motivation
Issue #3004 shows that "=" character was not supported as it should in
the HttpPostRequestDecoder in form-data boundary.
Modifications:
Add 2 methods in StringUtil
- split with maxPart argument: String split with max parts only (to prevent multiple '='
to be source of extra split while not needed)
- substringAfter: String part after delimiter (since first part is not
needed)
Use those methods in HttpPostRequestDecoder.
Change and the HttpPostRequestDecoderTest to check using a boundary
beginning with "=".
Results:
The fix implies more stability and fix the issue.
Motivation:
Using a needless local copy of keys.length.
Modifications:
Using keys.length explicitly everywhere.
Result:
Slight performance improvement of hashIndex.
Motivation:
The hashIndex method currently uses a conditional to handle negative
keys. This could be done without a conditional to slightly improve
performance.
Modifications:
Modified hashIndex() to avoid using a conditional.
Result:
Slight performance improvement to hashIndex().
Motivation:
IntObjectHashMap throws an exception when using negative values for
keys.
Modifications:
Changed hashIndex() to normalize the index if the mod operation returns
a negative number.
Result:
IntObjectHashMap supports negative key values.
Motivation:
Make it much more readable code.
Modifications:
- Added states of decompression.
- Refactored decode(...) method to use this states.
Result:
Much more readable decoder which looks like other compression decoders.
Related: #2034
Motivation:
Some users want to mock Bootstrap (or ServerBootstrap), and thus they
should not be final but be fully overridable and extensible.
Modifications:
Remove finals wherever possible
Result:
@daschl is happy.
Related: #2964
Motivation:
Writing a zero-length FileRegion to an NIO channel will lead to an
infinite loop.
Modification:
- Do not write a zero-length FileRegion by protecting with proper 'if'.
- Update the testsuite
Result:
Another bug fixed
Motivation:
We see occational failures in the datagram tests saying 'address already
in use' when we attempt to bind on a port returned by
TestUtils.getFreePort().
It turns out that TestUtils.getFreePort() only checks if TCP port is
available.
Modifications:
Also check if UDP port is available, so that the datagram tests do not
fail because of the 'address already in use' error during a bind
attempt.
Result:
Less chance of datagram test failures
Motivation:
The default name resolver attempts to resolve the bad host name (destination.com) and actually succeeds, making the ProxyHandlerTest fail.
Modification:
Use NoopNameResolverGroup instead.
Result:
ProxyHandlerTest passes again.
Motivation:
So far, we relied on the domain name resolution mechanism provided by
JDK. It served its purpose very well, but had the following
shortcomings:
- Domain name resolution is performed in a blocking manner.
This becomes a problem when a user has to connect to thousands of
different hosts. e.g. web crawlers
- It is impossible to employ an alternative cache/retry policy.
e.g. lower/upper bound in TTL, round-robin
- It is impossible to employ an alternative name resolution mechanism.
e.g. Zookeeper-based name resolver
Modification:
- Add the resolver API in the new module: netty-resolver
- Implement the DNS-based resolver: netty-resolver-dns
.. which uses netty-codec-dns
- Make ChannelFactory reusable because it's now used by
io.netty.bootstrap, io.netty.resolver.dns, and potentially by other
modules in the future
- Move ChannelFactory from io.netty.bootstrap to io.netty.channel
- Deprecate the old ChannelFactory
- Add ReflectiveChannelFactory
Result:
It is trivial to resolve a large number of domain names asynchronously.
Motivation:
DnsQueryEncoder does not encode the 'additional resources' section at all, which contains the pseudo-RR as defined in RFC 2671.
Modifications:
- Modify DnsQueryEncoder to encode the additional resources
- Fix a bug in DnsQueryEncoder where an empty name is encoded incorrectly
Result:
A user can send an EDNS query.
Motivation:
When a datagram packet is sent to a destination where nobody actually listens to,
the server O/S will respond with an ICMP Port Unreachable packet.
The ICMP Port Unreachable packet is translated into PortUnreachableException by JDK.
PortUnreachableException is not a harmful exception that prevents a user from sending a datagram.
Therefore, we should not close a datagram channel when PortUnreachableException is caught.
Modifications:
- Do not close a channel when the caught exception is PortUnreachableException.
Result:
A datagram channel is not closed unexpectedly anymore.
Related issue: #1133
Motivation:
There is no support for client socket connections via a proxy server in
Netty.
Modifications:
- Add a new module 'handler-proxy'
- Add ProxyHandler and its subclasses to support SOCKS 4a/5 and HTTP(S)
proxy connections
- Add a full parameterized test for most scenarios
- Clean up pom.xml
Result:
A user can make an outgoing connection via proxy servers with only
trivial effort.
Motivation:
JDK's exception messages triggered by a connection attempt failure do
not contain the related remote address in its message. We currently
append the remote address to ConnectException's message, but I found
that we need to cover more exception types such as SocketException.
Modifications:
- Add AbstractUnsafe.annotateConnectException() to de-duplicate the
code that appends the remote address
Result:
- Less duplication
- A transport implementor can annotate connection attempt failure
message more easily
Motivation:
Socks5CmdRequestDecoder uses ByteBuf.array() naively assuming that the
array's base offset is always 0, which is not the case.
Modification:
- Allocate a new byte array and copy the content there instead
Result:
Another bug fixed
Motivation:
An IPv6 string can have a zone index which is followed by the '%' sign.
When a user passes an IPv6 string with a zone index,
NetUtil.createByteArrayFromIpAddressString() returns an incorrect value.
Modification:
- Strip the zone index before conversion
Result:
An IPv6 string with a zone index is decoded correctly.
Motivation:
There's no way to generate the name of a handler being newly added
automatically and reliably.
For example, let's say you have a routine that adds a set of handlers to
a pipeline using addBefore() or addAfter(). Because addBefore() and
addAfter() always require non-conflicting non-null handler name, making
the multiple invocation of the routine on the same pipeline is
non-trivial.
Modifications:
- If a user specifies null as the name of the new handler,
DefaultChannelPipeline generates one.
- Update the documentation of ChannelPipeline to match the new behavior
Result:
A user doesn't need to worry about name conflicts anymore.
Motivation:
There's no way for a user to get the encoder and the decoder of an
HttpClientCodec. The lack of such getter methods makes it impossible to
remove the codec handlers from the pipeline correctly.
For example, a user could add more than one HttpClientCodec to the
pipeline, and then the user cannot easily decide which encoder and
decoder to remove.
Modifications:
- Add encoder() and decoder() method to HttpClientCodec which returns
HttpRequestEncoder and HttpResponseDecoder respectively
- Also made the same changes to HttpServerCodec
Result:
A user can distinguish the handlers added by multiple HttpClientCodecs
easily.
Motiviation:
Before this change, autoRead was a volatile boolean accessed directly. Any thread that invoked the DefaultChannelConfig#setAutoRead(boolean) method would read the current value of autoRead, and then set a new value. If the old value did not match the new value, some action would be immediately taken as part of the same method call.
As volatile only provides happens-before consistency, there was no guarantee that the calling thread was actually the thread mutating the state of the autoRead variable (such that it should be the one to invoke the follow-up actions). For example, with 3 threads:
* Thread 1: get = false
* Thread 1: set = true
* Thread 1: invokes read()
* Thread 2: get = true
* Thread 3: get = true
* Thread 2: set = false
* Thread 2: invokes autoReadCleared()
* Event Loop receives notification from the Selector that data is available, but as autoRead has been cleared, cancels the operation and removes read interest
* Thread 3: set = true
This results in a livelock - autoRead is set true, but no reads will happen even if data is available (as readyOps). The only way around this livelock currently is to set autoRead to false, and then back to true.
Modifications:
Write access to the autoRead variable is now made using the getAndSet() method of an AtomicIntegerFieldUpdater, AUTOREAD_UPDATER. This also changed the type of the underlying autoRead variable to be an integer, as no AtomicBooleanFieldUpdater class exists. Boolean logic is retained by assuming that 1 is true and 0 is false.
Result:
There is no longer a race condition between retrieving the old value of the autoRead variable and setting a new value.
Motivation:
Websocket clients can request to speak a specific subprotocol. The list of
subprotocols the client understands are sent to the server. The server
should select one of the protocols an reply this with the websocket
handshake response. The added code verifies that the reponded subprotocol
is valid.
Modifications:
Added verification of the subprotocol received from the server against the
subprotocol(s) that the user requests. If the user requests a subprotocol
but the server responds none or a non-requested subprotocol this is an
error and the handshake fails through an exception. If the user requests
no subprotocol but the server responds one this is also marked as an
error.
Addiontionally a getter for the WebSocketClientHandshaker in the
WebSocketClientProtocolHandler is added to enable the user of a
WebSocketClientProtocolHandler to extract the used negotiated subprotocol.
Result:
The subprotocol field which is received from a websocket server is now
properly verified on client side and clients and websocket connection
attempts will now only succeed if both parties can negotiate on a
subprotocol.
If the client sends a list of multiple possible subprotocols it can
extract the negotiated subprotocol through the added handshaker getter (WebSocketClientProtocolHandler.handshaker().actualSubprotocol()).
Motivation:
http://public.dhe.ibm.com/software/dw/webservices/ws-mqtt/mqtt-v3r1.html#connack
In MQTT 3.1, MQTT server must send a CONNACK with return code if CONNECT
request contains an invalid client identifier or an unacceptable protocol
version. The return code is one of MqttConnectReturnCode.
But, MqttDecoder throws DecoderException when CONNECT request contains
invalid value without distinguish situations. This makes it difficult
for codec-mqtt users to send a response with return code to clients.
Modifications:
Added exceptions for client identifier rejected and unacceptable
protocol version. MqttDecoder will throw those exceptions instead of
DecoderException.
Result:
Users of codec-mqtt can distinguish which is invalid when CONNECT
contains invalid client identifier or invalid protocol version. And, users can
send CONNACK with return code to clients.
Motivation:
I was not fully reassured that whether everything works correctly when a websocket client receives the websocket handshake HTTP response and a websocket frame in a single ByteBuf (which can happen when the server sends a response directly or shortly after the connect). In this case some parts of the ByteBuf must be processed by HTTP decoder and the remaining by the websocket decoder.
Modification:
Adding a test that verifies that in this scenaria the handshake and the message are correctly interpreted and delivered by Netty.
Result:
One more test for Netty.
The test succeeds - No problems
Motivation:
In MQTT 3.1 specification, "The Client Identifier (Client ID) is between
1 and 23 characters long, and uniquely identifies the client to the
server". But, current client id validation length is 0~23. It must be
1~23. The empty string is invalid client id in MQTT 3.1
Modifications:
Change isValidClientId method. Add MIN_CLIENT_ID_LENGTH.
Result:
The validation check for client id length is between 1 and 23.
Motivation:
It is often helpful to measure the performance of connections, e.g. the
latency and the throughput. This can be performed through benchmarks.
Modification:
This adds a simple but configurable benchmark for websockets into the
example directory. The Netty WebSocket server will echo all received
websocket frames and will provide an HTML/JS page which serves as the
client for the benchmark.
The benchmark also provides a verification mode that verifies the sent
against the received data. This can be used for the verification ob
websocket frame encoding and decoding funtionality.
Result:
A benchmark is added in form a further Netty websocket example.
With this benchmark it is easily possible to measure the performance between Netty and a browser
Motivation:
The WebSocketClientProtocolHandshakeHandler never releases the received handshake response.
Modification:
Release the message in a finally block.
Result:
No more leak
Motivation:
The WebSocket08FrameEncoder contains an optimization path for small messages which copies the message content into the header buffer to avoid vectored writes. However this path is in the current implementation never taken because the target buffer is preallocated only for exactly the size of the header.
Modification:
For messages below a certain treshold allocate the buffer so that the message can be directly copied. Thereby the optimized path is taken.
Result:
A speedup of about 25% for 100byte messages. Declines with bigger message sizes. I have currently set the treshold to 1kB which is a point where I could still see a few percent speedup, but we should also avoid burning too many CPU cycles.
Motivation:
Websocket performance is to a large account determined through the masking
and unmasking of frames. The current behavior of this in Netty can be
improved.
Modifications:
Perform the XOR operation not bytewise but in int blocks as long as
possible. This reduces the number of necessary operations by 4. Also don't
read the writerIndex in each iteration.
Added a unit test for websocket decoding and encoding for verifiation.
Result:
A large performance gain (up to 50%) in websocket throughput.
Motivation:
Currently the last read/write throughput is calculated by first division,this will be 0 if the last read/write bytes < interval,change the order will get the correct result
Modifications:
Change the operator order from first do division to multiplication
Result:
Get the correct result instead of 0 when bytes are smaller than interval
Motivation:
According to the websocket specification peers may send a close frame when
they detect a protocol violation (with status code 1002). The current
implementation simply closes the connection. This update should add this
functionality. The functionality is optional - but it might help other
implementations with debugging when they receive such a frame.
Modification:
When a protocol violation in the decoder is detected and a close was not
already initiated by the remote peer a close frame is
sent.
Result:
Remotes which will send an invalid frame will now get a close frame that
indicates the protocol violation instead of only seeing a closed
connection.
Motivation:
We incorrectly used SslContext.newServerContext() in some places where a we needed a client context.
Modifications:
Use SslContext.newClientContext() when using ssl on the client side.
Result:
Working ssl client examples.
Motivation:
We use malloc(1) in the on JNI_OnLoad method but never free the allocated memory. This means we have a tiny memory leak of 1 byte.
Modifications:
Call free(...) on previous allocated memory.
Result:
Fix memory leak
Motivation:
We introduced a PoolThreadCache which is used in our PooledByteBufAllocator to reduce the synchronization overhead on PoolArenas when allocate / deallocate PooledByteBuf instances. This cache is used for both the allocation path and deallocation path by:
- Look for cached memory in the PoolThreadCache for the Thread that tries to allocate a new PooledByteBuf and if one is found return it.
- Add the memory that is used by a PooledByteBuf to the PoolThreadCache of the Thread that release the PooledByteBuf
This works out very well when all allocation / deallocation is done in the EventLoop as the EventLoop will be used for read and write. On the otherside this can lead to surprising side-effects if the user allocate from outside the EventLoop and and pass the ByteBuf over for writing. The problem here is that the memory will be added to the PoolThreadCache that did the actual write on the underlying transport and not on the Thread that previously allocated the buffer.
Modifications:
Don't cache if different Threads are used for allocating/deallocating
Result:
Less confusing behavior for users that allocate PooledByteBufs from outside the EventLoop.
Motivation:
When MemoryRegionCache.trim() is called, some unused cache entries will be freed (started from head). However, in MeoryRegionCache.trim() the head is not updated, which make entry list's head point to an entry whose chunk is null now and following allocate of MeoryRegionCache will return false immediately.
In other word, cache is no longer usable once trim happen.
Modifications:
Update head to correct idx after free entries in trim().
Result:
MemoryRegionCache behaves correctly even after calling trim().