Motivation:
Calling System.nanoTime() for each channelRead(...) is very expensive. See [#3808] for more detailed description.
Also we always do extra work for each write and read even if read or write idle states should not be handled.
Modifications:
- Move System.nanoTime() call to channelReadComplete(...).
- Reuse ChannelFutureListener for writes
- Only add ChannelFutureListener to writes if write and all idle states should be handled.
- Only call System.nanoTime() for reads if idle state events for read and all states should be handled.
Result:
Less overhead when using the IdleStateHandler.
Motivation:
We called TrustManagerFactory.init(...) even when the trustCertChainFile is null. This could lead to exceptions during the handshake.
Modifications:
Correctly only call TurstManagerFactory.init() if trustCertcChainFail is not null.
Result:
Correct behavior.
Motiviation:
The OpenSSL engine uses SSLHandshakeException in the event of failures that occur during the handshake process. The alpn-boot project's getSSLException will also map the no_application_protocol to a SSLHandshakeException exception. We should be consistent and use SSLHandshakeException for handshake failure events.
Modifications:
-Update JdkAlpnSslEngine to propagate an SSLHandshakeException in the event of a failure.
Result:
Consistent usage of SSLHandshakeException during a handshake failure event.
Motivation:
Allow writing with void promise if IdleStateHandler is configured in the pipeline for read timeout events.
Modifications:
Better performance.
Result:
No more ChannelFutureListeners are created if IdleStateHandler is only configured for read timeouts allowing for writing to the channel with void promise.
Motivation:
[#3808] introduced some improvements to reduce the calls to System.nanoTime() but missed one possible optimization.
Modifications:
Only call System.nanoTime() if no reading patch is in process.
Result:
Less System.nanoTime() calls.
Motivation:
Discussion is in https://github.com/jetty-project/jetty-alpn/issues/8. The new API allows protocol negotiation to properly throw SSLHandshakeException.
Modifications:
Updated the parent pom.xml with the new version.
Result:
Upgraded alpn-api now allows throwing SSLHandshakeException.
Motivation:
We mitigate callouts to System.nanoTime() in SingleThreadEventExecutor
as it is 'relatively expensive'. On a modern system, tak translates to
about 20ns per call. With channelReadComplete() we can side-step this in
channelRead().
Modifications:
Introduce a boolean flag, which indicates that a read batch is currently
on-going, which acts as a flush guard for lastReadTime. Update
lastReadTime in channelReadComplete() just before setting the flag to
false. We set the flag to true in channelRead().
The periodic task examines the flag, and if it observes it to be true,
it will reschedule the task for the full duration. If it observes as
false, it will read lastReadTime and adjust the delay accordingly.
Result:
ReadTimeoutHandler calls System.nanoTime() only once per read batch.
Motivation:
At the moment hostname verification is not supported with OpenSSLEngine.
Modifications:
- Allow to create OpenSslEngine with peerHost and peerPort informations.
- Respect endPointIdentificationAlgorithm and algorithmConstraints when set and get SSLParamaters.
Result:
hostname verification is supported now.
Motivation:
keyManager() is required on server-side, and so there is a forServer()
method for each override of keyManager(). However, one of the
forServer() overrides was missing, which meant that if you wanted to use
a KeyManagerFactory you were forced to provide garbage configuration
just to get past null checks.
Modifications:
Add missing override.
Result:
No hacks to use SslContextBuilder on server-side with KeyManagerFactory.
Resolves#3775
Motivation:
To prevent from DOS attacks it can be useful to disable remote initiated renegotiation.
Modifications:
Add new flag to OpenSslContext that can be used to disable it
Adding a testcase
Result:
Remote initiated renegotion requests can be disabled now.
Motivation:
In the SslHandler we schedule a timeout at which we close the Channel if a timeout was detected during close_notify. Because this can race with notify the flushFuture we can see an IllegalStateException when the Channel is closed.
Modifications:
- Use a trySuccess() and tryFailure(...) to guard against race.
Result:
No more race.
Motivation:
Currently mutual auth is not supported when using OpenSslEngine.
Modification:
- Add support to OpenSslClientContext
- Correctly throw SSLHandshakeException when an error during handshake is detected
Result:
Mutual auth can be used with OpenSslEngine
Motivation:
Our automatically handling of non-auto-read failed because it not detected the need of calling read again by itself if nothing was decoded. Beside this handling of non-auto-read never worked for SslHandler as it always triggered a read even if it decoded a message and auto-read was false.
This fixes [#3529] and [#3587].
Modifications:
- Implement handling of calling read when nothing was decoded (with non-auto-read) to ByteToMessageDecoder again
- Correctly respect non-auto-read by SslHandler
Result:
No more stales and correctly respecting of non-auto-read by SslHandler.
Motivation:
Unnecessary object allocation is currently done during wrap/unwrap while a handshake is still in progress.
Modifications:
Use static instances when possible.
Result:
Less object creations.
Motivation:
Sometimes it's useful to get informations about the available OpenSSL library that is used for the OpenSslEngine.
Modifications:
Add two new methods which allows to get the available OpenSSL version as either
an int or an String.
Result:
Easy to access details about OpenSSL version.
Motivation:
Sometimes it's useful to use EC keys and not DSA or RSA. We should support it.
Modifications:
Support EC keys and share the code between JDK and Openssl impl.
Result:
It's possible to use EC keys now.
Motivation:
SslContext factory methods have gotten out of control; it's past time to
swap to a builder.
Modifications:
New Builder class. The existing factory methods must be left as-is for
backward compatibility.
Result:
Fixes#3531
Motivation:
To support HTTP2 we need APLN support. This was not provided before when using OpenSslEngine, so SSLEngine (JDK one) was the only bet.
Beside this CipherSuiteFilter was not supported
Modifications:
- Upgrade netty-tcnative and make use of new features to support ALPN and NPN in server and client mode.
- Guard against segfaults after the ssl pointer is freed
- support correctly different failure behaviours
- add support for CipherSuiteFilter
Result:
Be able to use OpenSslEngine for ALPN / NPN for server and client.
In TrafficCounter, a recent change makes the contract of the API (the
constructor) wrong and lead to issue with GlobalChannelTrafficCounter
where executor must be null.
Motivation:
TrafficCounter executor argument in constructor might be null, as
explained in the API, for some particular cases where no executor are
needed (relevant tasks being taken by the caller as in
GlobalChannelTrafficCounter).
A null pointer exception is raised while it should not since it is
legal.
Modifications:
Remove the 2 null checking for this particular attribute.
Note that when null, the attribute is not reached nor used (a null
checking condition later on is applied).
Result:
No more null exception raized while it should not.
This shall be made also to 4.0, 4.1 (present) and master. 3.10 is not
concerned.
Related: #3567
Motivation:
SslHandler.channelReadComplete() forgets to call
super.channelReadComplete(), which discards read bytes from the
cumulative buffer. As a result, the cumulative buffer can expand its
capacity unboundedly.
Modifications:
Call super.channelReadComplete() instead of calling
ctx.fireChannelReadComplete()
Result:
Fixes#3567
Motivation:
LoggingHandlerTest sometimes failure due to unexpected log messages
logged due to the automatic reclaimation of thread-local objects.
Expectation failure on verify:
Appender.doAppend([DEBUG] Freed 3 thread-local buffer(s) from thread: nioEventLoopGroup-23-0): expected: 1, actual: 0
Appender.doAppend([DEBUG] Freed 9 thread-local buffer(s) from thread: nioEventLoopGroup-23-1): expected: 1, actual: 0
Appender.doAppend([DEBUG] Freed 2 thread-local buffer(s) from thread: nioEventLoopGroup-23-2): expected: 1, actual: 0
Appender.doAppend([DEBUG] Freed 4 thread-local buffer(s) from thread: nioEventLoopGroup-26-0): expected: 1, actual: 0
Appender.doAppend(matchesLog(expected: ".+CLOSE$", got: "[id: 0xembedded, embedded => embedded] CLOSE")): expected: 1, actual: 0
Modifications:
Add the mock appender to the related logger only
Result:
No more intermittent test failures
Related: #3368
Motivation:
ChunkedWriteHandler checks if the return value of
ChunkedInput.isEndOfInput() after calling ChunkedInput.close().
This makes ChunkedStream.isEndOfInput() trigger an IOException, which is
originally triggered by PushBackInputStream.read().
By contract, ChunkedInput.isEndOfInput() should not raise an IOException
even when the underlying stream is closed.
Modifications:
Add a boolean flag that keeps track of whether the underlying stream has
been closed or not, so that ChunkedStream.isEndOfInput() does not
propagate the IOException from PushBackInputStream.
Result:
Fixes#3368
Motivation:
For some use cases X509ExtendedTrustManager is needed as it allows to also access the SslEngine during validation.
Modifications:
Add support for X509ExtendedTrustManager on java >= 7
Result:
It's now possible to use X509ExtendedTrustManager with OpenSslEngine
Motivation:
The Http2FrameLogger is currently using the internal logging classes. We should change this so that it's using the public classes and then converts internally.
Modifications:
Modified Http2FrameLogger and the examples to use the public LogLevel class.
Result:
Fixes#2512
Motivation:
With the current implementation the client protocol preference list
takes precedence over the one of the server, since the select method
will return the first item, in the client list, that matches any of the
protocols supported by the server. This violates the recommendation of
http://tools.ietf.org/html/rfc7301#section-3.2.
It will also fail with the current implementation of Chrome, which
sends back Extension application_layer_protocol_negotiation, protocols:
[http/1.1, spdy/3.1, h2-14]
Modifications:
Changed the protocol negotiator to prefer server’s list. Added a test
case that demonstrates the issue and that is fixed with the
modifications of this commit.
Result:
Server’s preference list is used.
Related: #3476
Motivation:
Some users use TrafficCounter for other uses than we originally
intended, such as implementing their own traffic shaper. In such a
case, a user does not want to specify an AbstractTrafficShapingHandler.
Modifications:
- Add a new constructor that does not require an
AbstractTrafficShapingHandler, so that a user can use it without it.
- Simplify TrafficMonitoringTask
- Javadoc cleanup
Result:
We open the possibility of using TrafficCounter for other purposes than
just using it with AbstractTrafficShapingHandler. Eventually, we could
generalize it a little bit more, so that we can potentially use it for
other uses.
Motivation:
There are various places in OpenSslEngine wher we can do performance optimizations.
Modifications:
- Reduce JNI calls when possible
- Detect finished handshake as soon as possible
- Eliminate double calculations
- wrap multiple ByteBuffer if possible in a loop
Result:
Better performance
Motivation:
At the moment we log priming read and handshake errors via info log level and still throw a SSLException that contains the error. We should only log with debug level to generate less noise.
Modifications:
Change logging to debug level.
Result:
Less noise .
Motivation:
SonarQube (clinker.netty.io/sonar) reported a few 'critical' issues related to the OpenSslEngine.
Modifications:
- Remove potential for dereference of null variable.
- Remove duplicate null check and TODO cleanup.
Results:
Less potential for null dereference, cleaner code, and 1 less TODO.
Motivation:
SslHandler adds a pending write with an empty buffer and a VoidChannelPromise when a user flush and not pending writes are currently stored. This may produce an IllegalStateException later if the user try to add a ChannelFutureListener to the promise in the next ChannelOutboundHandler.
Modifications:
Replace ctx.voidPromise() with ctx.newPromise()
Result:
No more IllegalStateException possible
Motivation:
SSLEngine specifies that IllegalArgumentException must be thrown if a null argument is given when using wrap(...) or unwrap(...).
Modifications:
Replace NullPointerException with IllegalArgumentException to match the javadocs.
Result:
Match the javadocs.
Motivation:
We failed to correctly calculate the endOffset when wrap multiple ByteBuffer and so not wrapped everything when an offset > 0 is used.
Modifications:
Correctly calculate endOffset.
Result:
All ByteBuffers are correctly wrapped when offset > 0.
Motivation:
When SslHandler.unwrap() copies SSL records into a heap buffer, it does
not update the start offset, causing IndexOutOfBoundsException.
Modifications:
- Copy to a heap buffer before calling unwrap() for simplicity
- Do not copy an empty buffer to a heap buffer.
- unwrap(... EMPTY_BUFFER ...) never involves copying now.
- Use better parameter names for unwrap()
- Clean-up log messages
Result:
- Bugs fixed
- Cleaner code
Motivation:
When using OpenSslEngine with the SslHandler it is possible to reduce memory copies by unwrap(...) multiple ByteBuffers at the same time. This way we can eliminate a memory copy that is needed otherwise to cumulate partial received data.
Modifications:
- Add OpenSslEngine.unwrap(ByteBuffer[],...) method that can be used to unwrap multiple src ByteBuffer a the same time
- Use a CompositeByteBuffer in SslHandler for inbound data so we not need to memory copy
- Add OpenSslEngine.unwrap(ByteBuffer[],...) in SslHandler if OpenSslEngine is used and the inbound ByteBuf is backed by more then one ByteBuffer
- Reduce object allocation
Result:
SslHandler is faster when using OpenSslEngine and produce less GC
Motivation:
Currently when there are bytes left in the cumulation buffer we do a byte copy to produce the input buffer for the decode method. This can put quite some overhead on the impl.
Modification:
- Use a CompositeByteBuf to eliminate the byte copy.
- Allow to specify if a CompositeBytebug should be used or not as some handlers can only act on one ByteBuffer in an efficient way (like SslHandler :( ).
Result:
Performance improvement as shown in the following benchmark.
Without this patch:
[xxx@xxx ~]$ ./wrk-benchmark
Running 5m test @ http://xxx:8080/plaintext
16 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 20.19ms 38.34ms 1.02s 98.70%
Req/Sec 241.10k 26.50k 303.45k 93.46%
1153994119 requests in 5.00m, 155.84GB read
Requests/sec: 3846702.44
Transfer/sec: 531.93MB
With the patch:
[xxx@xxx ~]$ ./wrk-benchmark
Running 5m test @ http://xxx:8080/plaintext
16 threads and 256 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 17.34ms 27.14ms 877.62ms 98.26%
Req/Sec 252.55k 23.77k 329.50k 87.71%
1209772221 requests in 5.00m, 163.37GB read
Requests/sec: 4032584.22
Transfer/sec: 557.64MB
Motivation:
When a user sees an error message, sometimes he or she does not know
what exactly he or she has to do to fix the problem.
Modifications:
Log the URL of the wiki pages that might help the user troubleshoot.
Result:
We are more friendly.
Motivation:
When a user deliberatively omitted netty-tcnative from classpath, he or
she will see an ugly stack trace of ClassNotFoundException.
Modifications:
Log more briefly when netty-tcnative is not in classpath.
Result:
Better-looking log at DEBUG level
Motivation:
- There's no point of pre-population.
- Waste of memory and time because they are going to be cached lazily
- Some pre-populated cipher suites are ancient and will be unused
Modification:
- Remove cache pre-population
Result:
Sanity restored
Motivation:
Calling JNI methods is pretty expensive, so we should only do if needed.
Modifications:
Lazy call methods if needed.
Result:
Better performance.
Motivation:
SSL_set_cipher_list() in OpenSSL does not fail as long as at least one
cipher suite is available. It is different from the semantics of
SSLEngine.setEnabledCipherSuites(), which raises an exception when the
list contains an unavailable cipher suite.
Modifications:
- Add OpenSsl.isCipherSuiteAvailable(String) which checks the
availability of a cipher suite
- Raise an IllegalArgumentException when the specified cipher suite is
not available
Result:
Fixed compatibility
Motivation:
To make OpenSslEngine a full drop-in replacement, we need to implement
getSupportedCipherSuites() and get/setEnabledCipherSuites().
Modifications:
- Retrieve the list of the available cipher suites when initializing
OpenSsl.
- Improve CipherSuiteConverter to understand SRP
- Add more test data to CipherSuiteConverterTest
- Add bulk-conversion method to CipherSuiteConverter
Result:
OpenSslEngine should now be a drop-in replacement for JDK SSLEngineImpl
for most cases.
Related: #3285
Motivation:
When a user attempts to switch from JdkSslContext to OpenSslContext, he
or she will see the initialization failure if he or she specified custom
cipher suites.
Modifications:
- Provide a utility class that converts between Java cipher suite string
and OpenSSL cipher suite string
- Attempt to convert the cipher suite so that a user can use the cipher
suite string format of Java regardless of the chosen SslContext impl
Result:
- It is possible to convert all known cipher suite strings.
- It is possible to switch from JdkSslContext and OpenSslContext and
vice versa without any configuration changes
Motivation:
Several issues were shown by various ticket (#2900#2956).
Also use the improvement on writability user management from #3036.
And finally add a mixte handler, both for Global and Channels, with
the advantages of being uniquely created and using less memory and
less shaping.
Issue #2900
When a huge amount of data are written, the current behavior of the
TrafficShaping handler is to limit the delay to 15s, whatever the delay
the previous write has. This is wrong, and when a huge amount of writes
are done in a short time, the traffic is not correctly shapened.
Moreover, there is a high risk of OOM if one is not using in his/her own
handler for instance ChannelFuture.addListener() to handle the write
bufferisation in the TrafficShapingHandler.
This fix use the "user-defined writability flags" from #3036 to
allow the TrafficShapingHandlers to "user-defined" managed writability
directly, as for reading, thus using the default isWritable() and
channelWritabilityChanged().
This allows for instance HttpChunkedInput to be fully compatible.
The "bandwidth" compute on write is only on "acquired" write orders, not
on "real" write orders, which is wrong from statistic point of view.
Issue #2956
When using GlobalTrafficShaping, every write (and read) are
synchronized, thus leading to a drop of performance.
ChannelTrafficShaping is not touched by this issue since synchronized is
then correct (handler is per channel, so the synchronized).
Modifications:
The current write delay computation takes into account the previous
write delay and time to check is the 15s delay (maxTime) is really
exceeded or not (using last scheduled write time). The algorithm is
simplified and in the same time more accurate.
This proposal uses the #3036 improvement on user-defined writability
flags.
When the real write occurs, the statistics are update accordingly on a
new attribute (getRealWriteThroughput()).
To limit the synchronisations, all synchronized on
GlobalTrafficShapingHandler on submitWrite were removed. They are
replaced with a lock per channel (since synchronization is still needed
to prevent unordered write per channel), as in the sendAllValid method
for the very same reason.
Also all synchronized on TrafficCounter on read/writeTimeToWait() are
removed as they are unnecessary since already locked before by the
caller.
Still the creation and remove operations on lock per channel (PerChannel
object) are synchronized to prevent concurrency issue on this critical
part, but then limited.
Additionnal changes:
1) Use System.nanoTime() instead of System.currentTimeMillis() and
minimize calls
2) Remove / 10 ° 10 since no more sleep usage
3) Use nanoTime instead of currentTime such that time spend is computed,
not real time clock. Therefore the "now" relative time (nanoTime based)
is passed on all sub methods.
4) Take care of removal of the handler to force write all pending writes
and release read too
8) Review Javadoc to explicit:
- recommandations to take into account isWritable
- recommandations to provide reasonable message size according to
traffic shaping limit
- explicit "best effort" traffic shaping behavior when changing
configuration dynamically
Add a MixteGlobalChannelTrafficShapingHandler which allows to use only one
handler for mixing Global and Channel TSH. I enables to save more memory and
tries to optimize the traffic among various channels.
Result:
The traffic shaping is more stable, even with a huge number of writes in
short time by taking into consideration last scheduled write time.
The current implementation of TrafficShapingHandler using user-defined
writability flags and default isWritable() and
fireChannelWritabilityChanged works as expected.
The statistics are more valuable (asked write vs real write).
The Global TrafficShapingHandler should now have less "global"
synchronization, hoping to the minimum, but still per Channel as needed.
The GlobalChannel TrafficShapingHandler allows to have only one handler for all channels while still offering per channel in addition to global traffic shaping.
And finally maintain backward compatibility.