netty5

Author	SHA1	Message	Date
Norman Maurer	8c0f1428af	Reduce the scope of synchronized block in PoolArena (#10410 ) Motivation: We shouldn't call incSmallAllocation() in a synchronized block as its backed by a concurrent datastructure Modifications: Move call of incSmallAllocation() out of synchronized block Result: Minimize scope of synchronized block	2020-07-16 19:41:02 +02:00
Ruwei	0d701d7c3c	Review PooledByteBufAllocator in respect of jemalloc 4.x changes and update allocate algorithm.(#10267 ) Motivation: For size from 512 bytes to chunkSize, we use a buddy algorithm. The drawback is that it has a large internal fragmentation. Modifications: 1. add SizeClassesMetric and SizeClasses 2. remove tiny size, now we have small, normal and huge size 3. rewrite the structure of PoolChunk 4. rewrite pooled allocate algorithm in PoolChunk 5. when allocate subpage, using lowest common multiple of pageSize and elemSize instead of pageSize. 6. add more tests in PooledByteBufAllocatorTest and PoolArenaTest Result: Reduce internal fragmentation and closes #3910	2020-07-16 08:24:27 +02:00
Norman Maurer	4dfe541167	Include more details if we throw an IllegalArgumentException because of overflow (#10330 ) Motivation: We should include as much details as possible when throwing an IllegalArgumentException because of overflow in CompositeByteBuf Modifications: Add more details and factor out check into a static method to share code Result: Make it more clear why an operations failed	2020-06-02 10:08:06 +02:00
Nick Hill	c8f701d48c	Correctly handle WrappedCompositeByteBufs in addFlattenedComponents() (#10247 ) Motivation An NPE was reported in #10245, caused by a regression introduced in #8939. This in particular affects ByteToMessageDecoders that use the COMPOSITE_CUMULATOR. Modification - Unwrap WrappedCompositeByteBufs passed to CompositeByteBuf#addFlattenedComponents(...) method before accessing internal components field - Extend unit test to cover this case and ensure more of the CompositeByteBuf tests are also run on the wrapped variant Results Fixes #10245	2020-05-05 13:57:22 +02:00
Norman Maurer	0c2c7c8f82	Add fastpath implementation for Unpooled.copiedBuffer(CharSequence, Charset) when UTF-8 or US-ASCII is used (#10206 ) Motivation: We can make use of our optimized implementations for UTF-8 and US-ASCII if the user request a copy of a sequence for these charsets Modifications: - Add fastpath implementation - Add unit tests Result: Fixes https://github.com/netty/netty/issues/10205	2020-04-23 18:02:05 +02:00
Norman Maurer	469c1ca34c	Guard against overflow when calling CompositeByteBuf.addComponent(...) (#10197 ) Motivation: We need to ensure we not overflow when adding buffers to a CompositeByteBuf Modifications: - Correctly validate overflow before adding to the internal storage - Add testcase Result: Fixes https://github.com/netty/netty/issues/10194	2020-04-21 12:04:47 +02:00
Linas Medžiūnas	abdcf102da	Efficient BytBuf search algorithms (#9914 ) (#9955 ) Motivation: We have found out that ByteBufUtil.indexOf can be inefficient for substring search on ByteBuf, both in terms of algorithm complexity (worst case O(needle.readableBytes * haystack.readableBytes)), and in constant factor (esp. on Composite buffers). With implementation of more performant search algorithms we have seen improvements on the order of magnitude. Modifications: This change introduces three search algorithms: 1. Knuth Morris Pratt - classical textbook algorithm, a good default choice. 2. Bit mask based algorithm - stable performance on any input, but limited to maximum search substring (the needle) length of 64 bytes. 3. Aho–Corasick - worse performance and higher memory consumption than [1] and [2], but it supports multiple substring (the needles) search simultaneously, by inspecting every byte of the haystack only once. Each algorithm processes every byte of underlying buffer only once, they are implemented as ByteProcessor. Result: Efficient search algorithms with linear time complexity available in Netty (I will share benchmark results in a comment on a PR).	2020-04-15 10:26:53 +02:00
时无两丶	81b435b129	Remove duplicate code in PoolArena. (#10174 ) Motivation: Remove duplicate code in PoolArena. Modification: Replace duplicate code with `tinyIdx` and `smallIdx`. Result: Clean code.	2020-04-15 09:30:26 +02:00
Dmitry Konstantinov	7402a3a55a	Propagate ref to pool thread cache down in the allocation stack to avoid extra thread local lookup (#10166 ) Motivation: PoolChunk requires a link to a PoolThreadCache to init ByteBuf. Currently the link is retrieved from a thread local: arena.parent.threadCache(). It has some performance cost. At the beginning of the allocation call the PoolThreadCache is already retrieved from the thread local. The reference can be propagated through the calls and used. Modifications: Replace second lookup of PoolThreadCache during ByteBuf init by propagation of a reference to PoolThreadCache down in the allocation stack explicitly Result: Improve performance of ByteBuf allocation --Before-- Benchmark (size) (tokens) (useThreadCache) Mode Cnt Score Error Units SimpleByteBufPooledAllocatorBenchmark.getAndRelease 123 0 true avgt 20 57.112 ± 1.004 ns/op SimpleByteBufPooledAllocatorBenchmark.getAndRelease 123 100 true avgt 20 222.827 ± 1.307 ns/op --After-- Benchmark (size) (tokens) (useThreadCache) Mode Cnt Score Error Units SimpleByteBufPooledAllocatorBenchmark.getAndRelease 123 0 true avgt 20 50.732 ± 1.321 ns/op SimpleByteBufPooledAllocatorBenchmark.getAndRelease 123 100 true avgt 20 216.892 ± 3.806 ns/op	2020-04-06 08:07:44 +02:00
Dmitry Konstantinov	dc69c04434	Replace usage() with freeBytes() in thresholds within hot paths of PoolChunkList (#10141 ) Motivation: PoolChunk.usage() method has non-trivial computations. It is used currently in hot path methods invoked when an allocation and de-allocation are happened. The idea is to replace usage() output comparison against percent thresholds by Chunk.freeBytes plain comparison against absolute thresholds. In such way the majority of computations from the threshold conditions are moved to init logic. Modifications: Replace PoolChunk.usage() conditions in PoolChunkList with equivalent conditions for PoolChunk.freeBytes() Result: Improve performance of allocation and de-allocation of ByteBuf from normal size cache pool	2020-03-31 22:11:42 +02:00
Dmitry Konstantinov	71b7dbc009	Optimize log2 in PoolThreadCache (#10128 ) Motivation: The current implementation of log2 in PoolThreadCache uses a loop and less efficient than an version based on Integer.numberOfLeadingZeros (intrinsic). Modifications: Replace the current log2 implementation in PoolThreadCache with a version based on Integer.numberOfLeadingZeros Result: It can improve performance slightly during allocation and de-allocation of ByteBuf using pooled allocator.	2020-03-24 11:14:08 +01:00
Norman Maurer	8ab4adcc56	FixedCompositeByteBuf.isDirect() may return wrong value when constructed with empty array (#10005 ) Motivation: FixedCompositeByteBuf.isDirect() should return the same value as EMPTY_BUFFER.isDirect() when constructed via an empty array Modifications: - Return correct value when constructed via empty array. - Add unit test Result: FixedCompositeByteBuf.isDirect() returns correct value	2020-02-08 17:05:44 +01:00
Norman Maurer	0d4af6d9da	Fix incorrect property name in PooledByteBufAllocator Motivation: We hat a typo in the property name that is used in PooledByteBufAllocator. Modifications: Change from "...allocation.." to "...allocator.." Result: More consistent property naming	2020-02-05 14:40:08 +01:00
Norman Maurer	6a43807843	Use lambdas whenever possible (#9979 ) Motivation: We should update our code to use lamdas whenever possible Modifications: Use lambdas when possible Result: Cleanup code for Java8	2020-01-30 09:28:24 +01:00
Nick Hill	727f03755c	Fix BufferOverflowException during non-Unsafe PooledDirectByteBuf resize (#9912 ) Motivation Recent optimization #9765 introduced a bug where the native indices of the internal reused duplicate nio buffer are not properly reset prior to using it to copy data during a reallocation operation. This can result in BufferOverflowExceptions thrown during ByteBuf capacity changes. The code path in question applies only to pooled direct buffers when Unsafe is disabled or not available. Modification Ensure ByteBuffer#clear() is always called on the reused internal nio buffer prior to returning it from PooledByteBuf#internalNioBuffer() (protected method); add unit test that exposes the bug. Result Fixes #9911	2020-01-11 06:05:32 +01:00
Norman Maurer	5b387975b8	Add unit test for leak aware CompositeByteBuf that proves that there is no NPE (#9875 ) Motivation: https://github.com/netty/netty/issues/9873 reported a NPE in previous version of netty. We should add a unit test to verify there is no more NPE Modifications: Add a unit test Result: Prove that https://github.com/netty/netty/issues/9873 is fixed	2019-12-12 16:19:26 +01:00
Sergey Skrobotov	f10bee9057	Change DefaultByteBufHolder.equals() to treat instances of different classes as not equal (#9855 ) # Motivation: `DefaultByteBufHolder.equals()` considers another object equal if it's an instance of `ByteBufferHolder` and if the contents of two objects are equal. However, the behavior of `equals` method is not a part of the `ByteBufHolder` contract so `DefaultByteBufHolder`'s version may be causing violation of the symmetric property if other classes have different logic. There are already a few classes that are affected by this: `DefaultHttp2GoAwayFrame`, `DefaultHttp2UnknownFrame`, and `SctpMessage` are all overriding `equals` method breaking the symmetric property. Another effect of this behavior is that all instances with empty data are considered equal. That may not be desireable in the situations when instances are created for predefined constants, e.g. `FullBulkStringRedisMessage.NULL_INSTANCE` and `FullBulkStringRedisMessage.EMPTY_INSTANCE` in `codec-redis`. # Modification: Make `DefaultByteBufHolder.equals()` implementation only work for the objects of the same class. # Result: - The symmetric property of the `equals` method is restored for the classes in question. - Instances of different classes are not considered equal even if the content of the data they hold are the same.	2019-12-10 11:30:23 +01:00
Norman Maurer	ff8846f1b5	Replace ObjectUtil.checkNonNull(...) with Objects.requireNonNull(...) (#9864 ) Motivation: We should use Objects.requireNonNull(...) as we require java8 Modifications: Replace ObjectUtil.checkNonNull(...) with Objects.requireNonNull(...) Result: Code cleanup	2019-12-10 11:27:32 +01:00
Nick Hill	4a5712f8d9	Minor simplifications/optimizations to AbstractByteBuf methods (#9845 ) Motivation While working on other changes I noticed some opportunities to streamline a few things in AbstractByteBuf. Modifications - Avoid duplicate ensureAccessible() checks in discard(Some)ReadBytes() and ensureWritable0(int) methods - Simplify ensureWritable0(int) logic - Make some conditional checks more concise Result Cleaner, possibly faster code	2019-12-05 11:51:02 +01:00
Nick Hill	c6bdc2b7dc	Reduce ByteBuffer duplication when resizing pooled direct ByteBufs (#9765 ) Motivation: Currently when use of Unsafe is disabled and an internal reallocation is performed for a direct PooledByteBuf, a one-off temporary duplicate is made of the source and destination backing nio buffers so that the copy can be done in a threadsafe manner. The need for this can be reduced by sharing the temporary duplicate buffer that is already stored in the corresponding destination PooledByteBuf instance. Modifications: Have PoolArena#memoryCopy(...) take the destination PooledByteBuf instead of the underlying mem reference and offset, and use internalNioBuffer() to obtain/initialize a reusable duplicate of the backing nio buffer. Result: Fewer temporary allocations when resizing direct pooled ByteBufs in the non-Unsafe case	2019-11-16 11:27:03 -08:00
Nick Hill	625981a296	Introduce ByteBuf#isContiguous() method (#9735 ) Motivation There's currently no way to determine whether an arbitrary ByteBuf behaves internally like a "singluar" buffer or a composite one, and this can be important to know when making decisions about how to manipulate it in an efficient way. An example of this is the ByteBuf#discardReadBytes() method which increases the writable bytes for a contiguous buffer (by readerIndex) but does not for a composite one. Unfortunately !(buf instanceof CompositeByteBuf) is not reliable, since for example this will be true in the case of a sliced CompositeByteBuf or some third-party composite implementation. isContiguous was chosen over isComposite since we want to assume "not contiguous" in the unknown/default case - the doc will it clear that false does not imply composite. Modifications - Add ByteBuf#isContiguous() which returns true by default - Override the "concrete" ByteBuf impls to return true and ensure wrapped/derived impls delegate it appropriately - Include some basic unit tests Result Better assumptions/decisions possible when manipulating arbitrary ByteBufs, for example when combining/cumulating them.	2019-11-06 12:07:00 +01:00
Norman Maurer	4be554a21f	Hide Recycler implemention to allow experimenting with different implementions of an Object pool (#9715 ) Motivation: At the moment we directly extend the Recycler base class in our code which makes it hard to experiment with different Object pool implementation. It would be nice to be able to switch from one to another by using a system property in the future. This would also allow to more easily test things like https://github.com/netty/netty/pull/8052. Modifications: - Introduce ObjectPool class with static method that we now use internally to obtain an ObjectPool implementation. - Wrap the Recycler into an ObjectPool and return it for now Result: Preparation for different ObjectPool implementations	2019-10-26 09:43:21 +02:00
Nick Hill	53183ae1ab	Change javadoc of ByteBuf#indexOf(...) to match its behaviour (#9679 ) Motivation Currently doc != code and so one needs to change. Though behaviour as currently documented might be more intuitive, we don't want to break anyone so will adjust the doc instead. See #9503 for discussion. Modifications Correct the javadoc of indexOf(...) method in ByteBuf abstract class. Results Correct javadoc	2019-10-25 08:43:52 +02:00
Nick Hill	b08bbd4c20	Fix for incorrect values from CompositeByteBuf#component(int) (#9525 ) Motivation This is a "simpler" alternative to #9416 which fixes the same CompositeByteBuf bugs described there, originally reported by @jingene in #9398. Modifications - Add fields to Component class for the original buffer along with its adjustment, which may be different to the already-stored unwrapped buffer. Use it in appropriate places to ensure correctness and equivalent behaviour to that prior to the earlier optimizations - Add comments explaining purpose of each of the Component fields - Unwrap more kinds of buffers in newComponent method to extend scope of the existing indirection-reduction optimization - De-duplicate common buffer consolidation logic - Unit test for the original bug provided by @jingene Result - Correct behaviour / fixed bugs - Some code deduplication / simplification - Unwrapping optimization applied to more types of buffers The downside is increased mem footprint from the two new fields, and additional allocations in some specific cases, though those should be rare. Co-authored-by: jingene <jingene0206@gmail.com>	2019-09-02 13:52:47 +02:00
Codrut Stancu	de126fdf65	Update GraalVM Native Image configuration. (#9515 ) Motivation: The Netty classes are initialized at build time by default for GraalVM Native Image compilation. This is configured via the `--initialize-at-build-time=io.netty` option. While this reduces start-up time it can lead to some problems: - The class initializer of `io.netty.buffer.PooledByteBufAllocator` looks at the maximum memory size to compute the size of internal buffers. If the class initializer runs during image generation, then the buffers are sized according to the very large heap size that the image generator uses, and Netty allocates several arrays that are 16 MByte. The fix is to initialize the following 3 classes at run time: `io.netty.buffer.PooledByteBufAllocator,io.netty.buffer.ByteBufAllocator,io.netty.buffer.ByteBufUtil`. This fix was dependent on a GraalVM Native Image fix that was included in 19.2.0. - The class initializer of `io.netty.handler.ssl.util.ThreadLocalInsecureRandom` needs to be initialized at runtime to ensure that the generated values are trully random and not fixed for each generated image. - The class initializers of `io.netty.buffer.AbstractReferenceCountedByteBuf` and `io.netty.util.AbstractReferenceCounted` compute field offsets. While the field offset recomputation is necessary for correct execution as a native image these initializers also have logic that depends on the presence/absence of `sun.misc.Unsafe`, e.g., via the `-Dio.netty.noUnsafe=true` flag. The fix is to push these initializers to runtime so that the field offset lookups (and the logic depending on them) run at run time. This way no manual substitutions are necessary either. Modifications: Add `META-INF/native-image` configuration files that correctly trigger the inialization of the above classes at run time via `--initialize-at-run-time=...` flags. Result: Fixes the initialisation issues described above for Netty executables built with GraalVM.	2019-08-30 09:21:33 +02:00
Norman Maurer	05dd1f05d5	Reduce GC produced by AbstractByteBuf.indexOf(..) implementation (#9502 ) Motivation: AbstractByteBuf.indexOf(...) currently delegates to ByteBufUtils.indexOf(...) which will create a new ByteBufProcessor on each call. This is done to reduce overhead of bounds-checks. Unfortunally while this reduces bounds checks it produces a lot of GC. We can just implement our own version in AbstractByteBuf which makes use of _getByte(...) and so does no bound checks as well but also not need to create any garbage. Modifications: Write optimized implementation of indexOf(...) for AbstractByteBuf Result: Fixes https://github.com/netty/netty/issues/9499.	2019-08-24 11:48:35 +00:00
Norman Maurer	6f616bb3cf	Avoid creating FileInputStream and FileOutputStream for obtaining Fil… (#8110 ) Motivation: If all we need is the FileChannel we should better use RandomAccessFile as FileInputStream and FileOutputStream use a finalizer. Modifications: Replace FileInputStream and FileOutputStream with RandomAccessFile when possible. Result: Fixes https://github.com/netty/netty/issues/8078.	2019-08-17 09:52:16 +02:00
Nick Hill	f673ba36a0	Don't zero non-readable buffer regions when capacity is decreased (#9427 ) Motivation region is preserved when capacity is increased, not just the readable part. The behaviour is still different however when the capacity is _decreased_ - data outside the currently-readable region is zeroed. Modifications Update ByteBuf capacity(int) implementations to also copy the whole buffer region when the new capacity is less than the current capacity. Result Consistent behaviour of ByteBuf#capacity(int) regardless of whether the new capacity is greater than or less than the current capacity.	2019-08-16 08:28:33 +02:00
Nick Hill	bc22bfa320	Use alloc().heapBuffer(...) to allocate new heap buffer. Motivation Underlying array allocations in UnpooledHeapByteBuf are intended be done via the protected allocateArray(int) method, so that they can be tracked and/or overridden by subclasses, for example UnpooledByteBufAllocator$InstrumentedUnpooledHeapByteBuf or #8015. But it looks like an explicit allocation was missed in the copy(int,int) method. Modification Just use alloc().heapBuffer(...) for the allocation Result No possibility of "missing" array allocations when ByteBuf#copy is used.	2019-08-13 10:52:52 +02:00
Nick Hill	dcd145864a	Fix ByteBufUtil#writeUtf8 subsequence split surrogate edge-case bug (#9437 ) Motivation: #9224 introduced overrides of ByteBufUtil#writeUtf8(...) and related methods to operate on a sub-CharSequence directly to save having to allocate substrings, but it missed an edge case where the subsequence does not extend to the end of the CharSequence and the last char in the sequence is a high surrogate. Due to the catch-IndexOutOfBoundsException optimization that avoids an additional bounds check, it would be possible to read past the specified end char index and successfully decode a surrogate pair which would otherwise result in a '?' byte being written. Modifications: - Check for end-of-subsequence before reading next char after a high surrogate is encountered in the writeUtf8(AbstractByteBuf,int,CharSequence,int,int) and utf8BytesNonAscii methods - Add unit test for this edge case Result: Bug is fixed. This removes the bounds-check-avoidance optimization but it does not appear to have a measurable impact on benchmark results, including when the char sequence contains many surrogate pairs (which should be rare in any case).	2019-08-10 20:54:43 +02:00
Norman Maurer	f0afd7cfcd	Correctly take length of ByteBufInputStream into account for readLine… (#9310 ) * Correctly take length of ByteBufInputStream into account for readLine() / readByte() Motivation: ByteBufInputStream did not correctly take the length into account when validate bounds for readLine() / readByte() which could lead to read more then allowed. Modifications: - Correctly take length into account - Add unit tests - Fix existing unit test Result: Correctly take length of ByteBufInputStream into account. Related to https://github.com/netty/netty/pull/9306.	2019-07-01 20:55:47 +02:00
xiaoheng1	c0f1e9bd21	Fix public int read() throws IOException method exceeds the limit of length (#9306 ) Motivation: buffer.isReadable() should not be used to limit the amount of data that can be read as the amount may be less then was is readable. Modification: - Use available() which takes the length into account - Add unit test Result: Fixes https://github.com/netty/netty/issues/9305	2019-07-01 15:58:05 +02:00
秦世成	1454c3a777	Fix the issue of incorrectly calculating the number of dump lines when using PrettyDump in ByteBufUtil (#9304 ) Motivation: Fix the issue of incorrectly calculating the number of dump rows when using prettyHexDumpmethod in ByteBufUtil. The way to find the remainder is either length % 16 or length & 15 Modification: Fixed the way to calculate the remainder Result: Fixed #9301	2019-07-01 08:36:00 +02:00
jimin	78adeb5408	All override methods must be added @override (#9285 ) Motivation: Some methods that either override others or are implemented as part of implementation an interface did miss the `@Override` annotation Modifications: Add missing `@Override`s Result: Code cleanup	2019-06-27 13:52:06 +02:00
Norman Maurer	eae9492c40	EmptyByteBuf.getCharSequence(0,...) must return empty String (#9272 ) Motivation: At the moment EmptyByteBuf.getCharSequence(0,...) will return null while it must return a "". Modifications: - Let EmptyByteBuf.getCharSequence(0,...) return "" - Add unit test Result: Fixes https://github.com/netty/netty/issues/9271.	2019-06-24 21:09:52 +02:00
Nick Hill	c61fa9be70	Subsequence versions of ByteBufUtil#writeUtf8(...) methods (#9224 ) Motivation It would be useful to be able to write UTF-8 encoded subsequence of CharSequence characters to a ByteBuf without needing to create a temporary object via CharSequence#subSequence(). Modification Add overloads of ByteBufUtil writeUtf8, reserveAndWriteUtf8 and utf8Bytes methods which take explicit subsequence bounds. Result More efficient writing of substrings to byte buffers possible	2019-06-21 14:06:34 +02:00
Nick Hill	ea2e4a7b3d	De-duplicate PooledByteBuf implementations (#9120 ) Motivation There's quite a lot of duplicate/equivalent logic across the various concrete ByteBuf implementations. We could take this even further but for now I've focused on the PooledByteBuf sub-hierarchy. Modifications - Move common logic/methods into existing PooledByteBuf abstract superclass - Shorten PooledByteBuf.capacity(int) method implementation Result Less code to maintain	2019-06-19 20:50:57 +02:00
Nick Hill	4ba75b99af	De-duplicate UnpooledDirectByteBuf/UnpooledUnsafeDirectByteBuf (#9085 ) Motivation While digging around looking at something else I noticed that these share a lot of logic and it would be nice to reduce that duplication. Modifications Have UnpooledUnsafeDirectByteBuf extend UnpooledDirectByteBuf and make adjustments to ensure existing behaviour remains unchanged. The most significant addition needed to UnpooledUnsafeDirectByteBuf was re-overriding the getPrimitive/setPrimitive methods to revert back to the AbstractByteBuf versions which include bounds checks (UnpooledDirectByteBuf excludes these as an optimization, relying on those done by underlying ByteBuffer). Result ~200 fewer lines, less duplicate logic.	2019-06-03 13:15:43 +02:00
Idel Pivnitskiy	b5ff56b6ee	Make UnpooledUnsafeHeapByteBuf class public (#9184 ) Motivation: 1. Users will be able to use an optimized version of `UnpooledHeapByteBuf` and override behavior of methods if required. 2. Consistency with `UnpooledDirectByteBuf`, `UnpooledHeapByteBuf`, and `UnpooledUnsafeDirectByteBuf`. Modifications: - Add `public` access modifier to `UnpooledUnsafeHeapByteBuf` class and ctor; Result: Public access for optimized version of `UnpooledHeapByteBuf`.	2019-05-31 07:04:52 +02:00
Nick Hill	1e575d7389	Fix redundant or missing checks and other inconsistencies in ByteBuf impls (#9119 ) Motivation There are a few minor inconsistencies / redundant operations in the ByteBuf implementations which would be good to fix. Modifications - Unnecessary ByteBuffer.duplicate() performed in CompositeByteBuf.nioBuffer(int,int) - Add missing checkIndex(...) check to ReadOnlyByteBufferBuf.nioBuffer(int,int) - Remove duplicate bounds check in ReadOnlyByteBufferBuf.getBytes(int,byte[],int,int) - Omit redundant bounds check in UnpooledHeapByteBuf.getBytes(int,ByteBuffer) Result More consistency and slightly less overhead	2019-05-27 15:32:42 +02:00
Nick Hill	4fa7b99da2	Introduce ByteBuf.maxFastWritableBytes() method (#9086 ) Motivation ByteBuf capacity is automatically increased as needed up to maxCapacity when writing beyond the buffer's current capacity. However there's no way to tell in general whether such an increase will result in a relatively costly internal buffer re-allocation. For unpooled buffers it always does, in pooled cases it depends on the size of the associated chunk of allocated memory, which I don't think is currently exposed in any way. It would sometimes be useful to know where this limit is when making external decisions about whether to reuse or preemptively reallocate. It would also be advantageous to take this limit into account when auto-increasing the capacity during writes, to defer such reallocation until really necessary. Modifications Introduce new AbstractByteBuf.maxFastWritableBytes() method which will return a value >= writableBytes() and <= maxWritableBytes(). Make use of the new method in the sizing decision made by the AbstractByteBuf.ensureWritable(...) methods. Result Less reallocation/copying.	2019-05-22 20:11:56 +02:00
Nick Hill	d29f91540d	Fix possible unsafe sharing of internal NIO buffer in CompositeByteBuf (#9169 ) Motivation A small thread-safety bug was introduced during the internal optimizations of ComponentByteBuf made a while back in #8437. When there is a single component which was added as a slice, internalNioBuffer(int,int) will currently return the unwrapped slice's un-duplicated internal NIO buffer. This is not safe since it could be modified concurrently with other usage of that parent buffer. Modifications Delegate internalNioBuffer to nioBuffer in this case, which returns a duplicate. This matches what's done in derived buffers in general (for the same reason). Add unit test. Result Fixed possible thread-safety bug	2019-05-22 11:07:27 +02:00
Tim Brooks	e4ef8f6ff3	Prefer direct io buffers if direct buffers pooled (#9167 ) Motivation Direct buffers are normally preferred when interfacing with raw sockets. Currently netty will only return direct io buffers (for reading from a channel) when a platform has unsafe. However, this is inconsistent with the write-side (filterOutboundMessage) where a direct byte buffer will be returned if pooling is enabled. This means that environments without unsafe (and no manual netty configurations) end up with many pooled heap byte buffers for reading, many pooled direct byte buffers for writing, and jdk pooled byte buffers (for reading). Modifications This commit modifies the AbstractByteBufAllocator to return a direct byte buffer for io handling when the platform has unsafe or direct byte buffers are pooled. Result: Use direct buffers when direct buffers are pooled for IO.	2019-05-22 07:33:06 +02:00
Nick Hill	5e786e549e	Fix incorrect behavior of ReadOnlyByteBufferBuf.getBytes(int,ByteBuffer) (#9125 ) * Fix incorrect behavior of ReadOnlyByteBufferBuf.getBytes(int,ByteBuffer) Motivation It currently will succeed when the destination is larger than the source range, but the ByteBuf javadoc states this should be a failure, as is the case with all the other implementations. Modifications - Fix logic to fail the bounds check in this case - Remove explicit null check which isn't done in any equivalent method - Add unit test Result More correct/consistent behaviour	2019-05-13 07:03:10 +02:00
Norman Maurer	7e6d1bbcfd	Fix regression in CompositeByteBuf.discardReadBytes() (#9068 ) Motivation: `1f93bd3` introduced a regression that could lead to not have the lastAccessed field correctly null'ed out when the endOffset of the internal Component == CompositeByteBuf.readerIndex() Modifications: - Correctly null out the lastAccessed field in any case - Add unit tests Result: Fixes regression in CompositeByteBuf.discardReadBytes()	2019-04-17 18:03:29 +02:00
Nick Hill	e75b7edaa3	Centralize internal reference counting logic (#8614 ) Motivation AbstractReferenceCounted and AbstractReferenceCountedByteBuf contain duplicate logic for managing the volatile refcount in an optimized and consistent manner, which increased in complexity in #8583. It's possible to extract this into a common helper class now that all access is via an AtomicIntegerFieldUpdater. Modifications - Move duplicate logic into a shared ReferenceCountUpdater class - Incorporate some additional simplification for the most common single increment/decrement cases (fewer checks/operations) Result Less code duplication, better encapsulation of the "non-trivial" internal volatile refcount manipulation	2019-04-09 16:23:15 +02:00
Nick Hill	1f93bd36b6	CompositeByteBuf optimizations and new addFlattenedComponents method (#8939 ) Motivation: The CompositeByteBuf discardReadBytes / discardReadComponents methods are currently quite inefficient, including when there are no read components to discard. We would like to call the latter more frequently in ByteToMessageDecoder#COMPOSITE_CUMULATOR. In the same context it would be beneficial to perform a "shallow copy" of a composite buffer (for example when it has a refcount > 1) to avoid having to allocate and copy the contained bytes just to obtain an "independent" cumulation. Modifications: - Optimize discardReadBytes() and discardReadComponents() implementations (start at first comp rather than performing a binary search for the readerIndex). - New addFlattenedComponents(boolean,ByteBuf) method which performs a shallow copy if the provided buffer is also composite and avoids adding any empty buffers, plus unit test. - Other minor optimizations to avoid unnecessary checks. Results: discardReadXX methods are faster, composite buffers can be easily appended without deepening the buffer "tree" or retaining unused components.	2019-04-08 20:55:19 +02:00
Norman Maurer	1eabb752c0	Fix NPE that was encounter by debugger (will never happen in real code). (#8992 ) Motivation: We synchronize on the chunk.arena when produce the String returned by PoolSubpage.toString() which may raise a NPE when chunk == null. Chunk == null for the head of the linked-list and so a NPE may raised by a debugger. This NPE can never happen in real code tho as we never access toString() of the head. Modifications: Add null checks and so fix the possible NPE Result: No NPE when using a debugger and inspect the PooledByteBufAllocator.	2019-04-01 19:44:45 +02:00
Norman Maurer	78c4abbfb7	Change default of io.netty.allocator.useCacheForAllThreads to false (#8991 ) Motivation: We currently use a thread local cache for all threads which often is suprising to users as it may result in a lot of memory usage if they allocate buffers from outside the EventLoop in different threads. We should better not do this by default to keep suprises to a minimum. Users that need the performance and know what they are doing can still change this. Modifications: Change io.netty.allocator.useCacheForAllThreads to false by default Result: Related to https://github.com/netty/netty/issues/8536.	2019-04-01 19:43:19 +02:00
Norman Maurer	ab7accb2de	Don't try to put back MemoryRegionCache.Entry objects into the Recycler when recycled because of a finalizer. (#8955 ) Motivation: In MemoryRegionCache.Entry we use the Recycler to reduce GC pressure and churn. The problem is that these will also be recycled when the PoolThreadCache is collected and finalize() is called. This then can have the effect that we try to load class but the WebApp is already stoped. This will produce an stacktrace like this on Tomcat: ``` 19-Mar-2019 15:53:21.351 INFO [Finalizer] org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading Illegal access: this web application instance has been stopped already. Could not load [java.util.WeakHashMap]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access. java.lang.IllegalStateException: Illegal access: this web application instance has been stopped already. Could not load [java.util.WeakHashMap]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access. at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading(WebappClassLoaderBase.java:1383) at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForClassLoading(WebappClassLoaderBase.java:1371) at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1224) at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1186) at io.netty.util.Recycler$3.initialValue(Recycler.java:233) at io.netty.util.Recycler$3.initialValue(Recycler.java:230) at io.netty.util.concurrent.FastThreadLocal.initialize(FastThreadLocal.java:188) at io.netty.util.concurrent.FastThreadLocal.get(FastThreadLocal.java:142) at io.netty.util.Recycler$Stack.pushLater(Recycler.java:624) at io.netty.util.Recycler$Stack.push(Recycler.java:597) at io.netty.util.Recycler$DefaultHandle.recycle(Recycler.java:225) at io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry.recycle(PoolThreadCache.java:478) at io.netty.buffer.PoolThreadCache$MemoryRegionCache.freeEntry(PoolThreadCache.java:459) at io.netty.buffer.PoolThreadCache$MemoryRegionCache.free(PoolThreadCache.java:430) at io.netty.buffer.PoolThreadCache$MemoryRegionCache.free(PoolThreadCache.java:422) at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:279) at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:270) at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:241) at io.netty.buffer.PoolThreadCache.finalize(PoolThreadCache.java:230) at java.lang.System$2.invokeFinalize(System.java:1270) at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:102) at java.lang.ref.Finalizer.access$100(Finalizer.java:34) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:217) ``` Beside this we also need to ensure we not try to lazy load SizeClass when the finalizer is used as it may not be present anymore if the ClassLoader is already destroyed. This would produce an error like: ``` 20-Mar-2019 11:26:35.254 INFO [Finalizer] org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading Illegal access: this web application instance has been stopped already. Could not load [io.netty.buffer.PoolArena$1]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access. java.lang.IllegalStateException: Illegal access: this web application instance has been stopped already. Could not load [io.netty.buffer.PoolArena$1]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access. at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading(WebappClassLoaderBase.java:1383) at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForClassLoading(WebappClassLoaderBase.java:1371) at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1224) at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1186) at io.netty.buffer.PoolArena.freeChunk(PoolArena.java:287) at io.netty.buffer.PoolThreadCache$MemoryRegionCache.freeEntry(PoolThreadCache.java:464) at io.netty.buffer.PoolThreadCache$MemoryRegionCache.free(PoolThreadCache.java:429) at io.netty.buffer.PoolThreadCache$MemoryRegionCache.free(PoolThreadCache.java:421) at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:278) at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:269) at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:240) at io.netty.buffer.PoolThreadCache.finalize(PoolThreadCache.java:229) at java.lang.System$2.invokeFinalize(System.java:1270) at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:102) at java.lang.ref.Finalizer.access$100(Finalizer.java:34) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:217) ``` Modifications: - Only try to put the Entry back into the Recycler if the PoolThredCache is not destroyed because of the finalizer. - Only try to access SizeClass if not triggered by finalizer. Result: No IllegalStateException anymoe when a webapp is reloaded in Tomcat that uses netty and uses the PooledByteBufAllocator.	2019-03-22 12:16:53 +01:00
Nick Hill	a71e33ae35	Fix possible ByteBuf leak when CompositeByteBuf is resized (#8946 ) Motivation: The special case fixed in #8497 also requires that we keep a derived slice when trimming components in place, as done by the capacity(int) and discardReadBytes() methods. Modifications: Ensure that we keep a ref to trimmed components' original retained slice in capacity(int) and discardReadBytes() methods, so that it is released properly when the they are later freed. Add unit test which fails prior to the fix. Result: Edge case leak is eliminated.	2019-03-22 11:18:28 +01:00
Norman Maurer	9681842d54	Allow to automatically trim the PoolThreadCache in a timely interval (#8941 ) Motivation: PooledByteBufAllocator uses a PoolThreadCache per Thread that allocates / deallocates to minimize the performance overhead. This PoolThreadCache is trimmed after X allocations to free up buffers that are not allocated for a long time. This works out quite well when the app continues to allocate but fails if the app stops to allocate frequently (for whatever reason) and so a lot of memory is wasted and not given back to the arena / freed. Modifications: - Add a ThreadExecutorMap that offers multiple methods that wrap Runnable / ThreadFactory / Executor and allow to call ThreadExecutorMap.currentEventExecutor() to get the current executing EventExecutor for the calling Thread. - Use these methods in the constructors of our EventExecutor implementations (which also covers the EventLoop implementations) - Add io.netty.allocator.cacheTrimIntervalMillis system property which can be used to specify a fixed rate / interval on which we should try to trim the PoolThreadCache for a EventExecutor that allocates. - Add PooledByteBufAllocator.trimCurrentThreadCache() to allow the user to trim the cache of the calling thread manually. - Add testcases - Introduce FastThreadLocal.getIfExists() Result: Allow to better / more frequently trim PoolThreadCache and so give back memory to the area / system.	2019-03-22 11:17:40 +01:00
Norman Maurer	795f9600ba	Adjust tests to be able to build / test when using IBM J9 / OpenJ9 (#8900 ) Motivation: We should run a CI job using J9 to ensure netty also works when using different JVMs. Modifications: - Adjust PooledByteBufAllocatorTest to be able to complete faster when using a JVM which takes longer when joining Threads (this seems to be the case with J9). Result: Be able to run CI against J9.	2019-03-01 06:48:35 +01:00
Nick Hill	35161ad174	Further reduce ensureAccessible() overhead (#8895 ) Motivation: This PR fixes some non-negligible overhead discovered in the ByteBuf accessibility (non-zero refcount) checking. The cause turned out to be mostly twofold: - Unnecessary operations used to calculate the refcount from the "raw" encoded int field value - Call stack depths exceeding the default limit for inlining, in some places (CompositeByteBuf in particular) It's a follow-on from #8882 which uses the maxCapacity field for a simpler non-negative check. The performance gap between these two variants appears to be _mostly_ closed, but there's one exception which may warrant further analysis. Modifications: - Replace ABB.internalRefCount() with ByteBuf.isAccessible(), the default still checks for non-zero refCnt() - Just test for parity of raw refCnt instead of converting to "real", with fast-path for specific small values - Make sure isAccessible() is delegated by derived/wrapper ByteBufs - Use existing freed flag in CompositeByteBuf for faster isAccessible() - Manually inline some calls in methods like CompositeByteBuf.setLong() and AbstractReferenceCountedByteBuf.isAccessible() to reduce stack depths (to ensure default inlining limit isn't hit) - Add ByteBufAccessBenchmark which is an extension of UnsafeByteBufBenchmark (maybe latter could now be removed) Results: Before: Benchmark (bufferType) (checkAccessible) (checkBounds) Mode Cnt Score Error Units readBatch UNSAFE true true thrpt 30 84524972.863 ± 518338.811 ops/s readBatch UNSAFE_SLICE true true thrpt 30 38608795.037 ± 298176.974 ops/s readBatch HEAP true true thrpt 30 80003697.649 ± 974674.119 ops/s readBatch COMPOSITE true true thrpt 30 18495554.788 ± 108075.023 ops/s setGetLong UNSAFE true true thrpt 30 247069881.578 ± 10839162.593 ops/s setGetLong UNSAFE_SLICE true true thrpt 30 196355905.206 ± 1802420.990 ops/s setGetLong HEAP true true thrpt 30 245686644.713 ± 11769311.527 ops/s setGetLong COMPOSITE true true thrpt 30 83170940.687 ± 657524.123 ops/s setLong UNSAFE true true thrpt 30 278940253.918 ± 1807265.259 ops/s setLong UNSAFE_SLICE true true thrpt 30 202556738.764 ± 11887973.563 ops/s setLong HEAP true true thrpt 30 280045958.053 ± 2719583.400 ops/s setLong COMPOSITE true true thrpt 30 121299806.002 ± 2155084.707 ops/s After: Benchmark (bufferType) (checkAccessible) (checkBounds) Mode Cnt Score Error Units readBatch UNSAFE true true thrpt 30 101641801.035 ± 3950050.059 ops/s readBatch UNSAFE_SLICE true true thrpt 30 84395902.846 ± 4339579.057 ops/s readBatch HEAP true true thrpt 30 100179060.207 ± 3222487.287 ops/s readBatch COMPOSITE true true thrpt 30 42288494.472 ± 294919.633 ops/s setGetLong UNSAFE true true thrpt 30 304530755.027 ± 6574163.899 ops/s setGetLong UNSAFE_SLICE true true thrpt 30 212028547.645 ± 14277828.768 ops/s setGetLong HEAP true true thrpt 30 309335422.609 ± 2272150.415 ops/s setGetLong COMPOSITE true true thrpt 30 160383609.236 ± 966484.033 ops/s setLong UNSAFE true true thrpt 30 298055969.747 ± 7437449.627 ops/s setLong UNSAFE_SLICE true true thrpt 30 223784178.650 ± 9869750.095 ops/s setLong HEAP true true thrpt 30 302543263.328 ± 8140104.706 ops/s setLong COMPOSITE true true thrpt 30 157083673.285 ± 3528779.522 ops/s There's also a similar knock-on improvement to other benchmarks (e.g. HPACK encoding/decoding) as shown in #8882. For sanity I did a final comparison of the "fast path" tweak using one of the HPACK benchmarks: (rawCnt & 1) == 0: Benchmark (limitToAscii) (sensitive) (size) Mode Cnt Score Error Units HpackDecoderBenchmark.decode true true MEDIUM thrpt 30 50914.479 ± 940.114 ops/s rawCnt == 2 \|\| rawCnt == 4 \|\| rawCnt == 6 \|\| rawCnt == 8 \|\| (rawCnt & 1) == 0: Benchmark (limitToAscii) (sensitive) (size) Mode Cnt Score Error Units HpackDecoderBenchmark.decode true true MEDIUM thrpt 30 60036.425 ± 1478.196 ops/s	2019-02-28 20:41:16 +01:00
Dmitriy Dumanskiy	116f72db8d	Legacy properties removed (#8839 ) Motivation: We can remove some properties for which we introduced replacements. Modifications: io.netty.buffer.bytebuf.checkAccessible, io.netty.leakDetectionLevel, org.jboss.netty.tryUnsafe properties removed Result: Code cleanup	2019-02-04 13:56:15 +01:00
田欧	e8efcd82a8	migrate java8: use requireNonNull (#8840 ) Motivation: We can just use Objects.requireNonNull(...) as a replacement for ObjectUtil.checkNotNull(....) Modifications: - Use Objects.requireNonNull(...) Result: Less code to maintain.	2019-02-04 10:32:25 +01:00
Nick Hill	0ac8db0307	CompositeByteBuf tidy-up (#8784 ) Motivation There's some miscellaneous cleanup/simplification of CompositeByteBuf which would help make the code a bit clearer. Modifications - Simplify web of constructors and addComponents methods, reducing duplication of logic - Rename `Component.freeIfNecessary()` method to just `free()`, which is less confusing (see #8641) - Make loop in addComponents0(...) method more verbose/readable (see https://github.com/netty/netty/pull/8437#discussion_r232124414) - Simplify addition/subtraction in setBytes(...) methods Result Smaller/clearer code	2019-02-01 10:32:02 +01:00
田欧	d7648f1d93	use checkPositive/checkPositiveOrZero (#8803 ) Motivation: We have a utility method to check for > 0 and >0 arguments. We should use it. Modification: use checkPositive/checkPositiveOrZero instead of if statement. Result: Re-use utility method.	2019-01-31 09:06:59 +01:00
田欧	6222101924	migrate java8: use lambda and method reference (#8781 ) Motivation: We can use lambdas now as we use Java8. Modification: use lambda function for all package, #8751 only migrate transport package. Result: Code cleanup.	2019-01-29 14:06:05 +01:00
田欧	e941cbe27a	remove unused import statement (#8792 ) Motivation: The code contained some unused import statements. Modification: Remove unused import statements. Result: Code cleanup	2019-01-28 16:50:15 +01:00
kezhenxu94	7b6336f1fd	Java 8 Migration: remove uneccessary if statement (#8755 ) Motivation: As netty 4.x supported Java 6 we had various if statements to check for java versions < 8. We can remove these now. Modification: Remove unnecessary if statements that check for java versions < 8. Result: Cleanup code.	2019-01-25 08:57:11 +01:00
Norman Maurer	310f31b392	Update to new checkstyle plugin (#8777 ) Motivation: We need to update to a new checkstyle plugin to allow the usage of lambdas. Modifications: - Update to new plugin version. - Fix checkstyle problems. Result: Be able to use checkstyle plugin which supports new Java syntax.	2019-01-24 16:24:19 +01:00
Nick Hill	9e7385ddba	Fix three bugs in CompositeByteBuf (#8773 ) Motivation In #8758, @doom369 reported an infinite loop bug in CompositeByteBuf which was introduced in #8437. This is the same small fix for that, along with fixes for two other bugs found while re-inspecting the changes and adding unit tests. Modification - Replace recursive call to toComponentIndex with toComponentIndex0 as intended - Add missed "lastAccessed" racy cache invalidation in capacity(int) method - Fix incorrect determination of initial offset in non-zero cIndex case of updateComponentOffsets method - New unit tests for previously uncovered methods Results Fewer bugs.	2019-01-24 14:59:33 +01:00
Dmitriy Dumanskiy	4a10357fd8	IDE warnings cleanup (#8768 ) Motivation: IDE shows some warnings Modification: Small IDE warnings cleanup in different places. Result: Less warnings	2019-01-23 14:01:48 +01:00
Dmitriy Dumanskiy	7b92ff2500	Java 8 migration. Remove ThreadLocalProvider and inline java.util.concurrent.ThreadLocalRandom.current() where necessary. (#8762 ) Motivation: Custom Netty ThreadLocalRandom and ThreadLocalRandomProvider classes are no longer needed and can be removed. Modification: Remove own ThreadLocalRandom Result: Less code to maintain	2019-01-22 20:14:28 +01:00
田欧	9d62deeb6f	Java 8 migration: Use diamond operator (#8749 ) Motivation: We can use the diamond operator these days. Modification: Use diamond operator whenever possible. Result: More modern code and less boiler-plate.	2019-01-22 16:07:26 +01:00
Dmitriy Dumanskiy	b2a9a8e9fe	Java 8 migration. Removed custom LongCounter and LongAdderCounter classes and replaced with direct reference to LongAdder (#8750 ) Motivation: After switching to Java 8 we no longer need LongCounter and LongAdderCounter. We can directly replace them with LongAdder. Modification: Removed LongCounter and LongAdderCounter classes. Result: Less code to maintain	2019-01-22 13:53:28 +01:00
Derek Lewis	13f640cbba	Updating `ByteBuf` Javadocs to represent actual behaviour. (#8709 ) Motivation: The javadocs stating `IndexOutOfBoundsException` is thrown were different from what `ByteBuf` actually did. We want to ensure the Javadocs represent reality. Modifications: Updated javadocs on `write*`, `ensureWriteable`, `capacity`, and `maxCapacity` methods. Results: Javadocs more closely match actual behaviour.	2019-01-14 20:09:18 +01:00
kashike	c0aa1ea5c7	Fix minor spelling issues in javadocs (#8701 ) Motivation: Javadocs contained some spelling errors, we should fix these. Modification: Fix spelling Result: Javadoc cleanup.	2019-01-14 07:25:13 +01:00
Norman Maurer	d9a6cf341c	Remove support for marking reader and writerIndex in ByteBuf to reduce overhead and complexity. (#8636 ) Motivation: ByteBuf supports “marker indexes”. The intended use case for these is if a speculative operation (e.g. decode) is in process the user can “mark” and interface and refer to it later if the operation isn’t successful (e.g. not enough data). However this is rarely used in practice, requires extra memory to maintain, and introduces complexity in the state management for derived/pooled buffer initialization, resizing, and other operations which may modify reader/writer indexes. Modifications: Remove support for marking and adjust testcases / code. Result: Fixes https://github.com/netty/netty/issues/8535.	2018-12-11 14:00:49 +01:00
Nick Travers	0a8c93a6fa	Loosen bounds check on CompositeByteBuf's maxNumComponents (#8621 ) Motivation: In versions of Netty prior to 4.1.31.Final, a CompositeByteBuf could be created with any size (including potentially nonsensical negative values). This behavior changed in e7737b993, which introduced a bounds check to only allow for a component size greater than one. This broke some existing use cases that attempted to create a byte buf with a single component. Modifications: Lower the bounds check on numComponents to include the single component case, but still throw an exception for anything less than one. Add unit tests for the case of numComponents being less than, equal to, and greater than this lower bound. Result: Return to the behavior of 4.1.30.Final, allowing one component, but still include an explicit check against a lower bound. Note that while creating a CompositeByteBuf with a single component is in some ways a contradiction of the term "composite", this patch caters for existing uses while excluding the clearly nonsensical case of asking for a CompositeByteBuf with zero or fewer components. Fixes #8613.	2018-12-05 08:43:53 +01:00
Norman Maurer	85c1590a90	Provide a way to cache the internal nioBuffer of the PooledByteBuffer… (#8603 ) Motivation: Often a temporary ByteBuffer is used which can be cached to reduce the GC pressure. Modifications: Cache the ByteBuffer in the PoolThreadCache as well. Result: Less GC.	2018-12-04 15:39:55 +01:00
Nick Hill	fc0a4e15ab	Harden ref-counting concurrency semantics (#8583 ) Motivation #8563 highlighted race conditions introduced by the prior optimistic update optimization in `83a19d5650`. These were known at the time but considered acceptable given the perf benefit in high contention scenarios. This PR proposes a modified approach which provides roughly half the gains but stronger concurrency semantics. Race conditions still exist but their scope is narrowed to much less likely cases (releases coinciding with retain overflow), and even in those cases certain guarantees are still assured. Once release() returns true, all subsequent release/retains are guaranteed to throw, and in particular deallocate will be called at most once. Modifications - Use even numbers internally (including -ve) for live refcounts - "Final" release changes to odd number (equivalent to refcount 0) - Retain still uses faster getAndAdd, release uses CAS loop - First CAS attempt uses non-volatile read - Thread.yield() after a failed CAS provides a net gain Result More (though not completely) robust concurrency semantics for ref counting; increased latency under high contention, but still roughly twice as fast as the original logic. Bench results to follow	2018-11-29 08:32:50 +01:00
Norman Maurer	42e0ce6f7a	Move less common code-path to extra method to allow inlining of writeUtf8. (#8600 ) Motivation: ByteBuf is used everywhere so we should try hard to be able to make things inlinable. During benchmarks it showed that writeCharSequence(...) fails to inline writeUtf8 because it is too big even if its hots. Modifications: Move less common code-path to extra method to allow inlining. Result: Be able to inline writeUtf8 in most cases.	2018-11-27 21:04:13 +01:00
Norman Maurer	45e9fd4802	Revert "Provide a way to cache the internal nioBuffer of the PooledByteBuffer to reduce GC. (#8593 )" This reverts commit `945f123f8b` as it seems to produce some failures in some cases. This needs more research.	2018-11-27 20:03:33 +01:00
Norman Maurer	945f123f8b	Provide a way to cache the internal nioBuffer of the PooledByteBuffer to reduce GC. (#8593 ) Motivation: Often a temporary ByteBuffer is used which can be cached to reduce the GC pressure. Modifications: Add a Deque per PoolChunk which will be used for caching. Result: Less GC.	2018-11-27 13:55:24 +01:00
Rolandz	836c39b82f	Fix offset calculation in PooledByteBufAllocator when used Motivation: When we create new chunk with memory aligned, the offset of direct memory should be 'alignment - address & (alignment - 1)', not just 'address & (alignment - 1)'. Modification: Change offset calculating formula to offset = alignment - address & (alignment - 1) in PoolArena.DirectArena#offsetCacheLine and add a unit test to assert that. Result: Correctly calculate offset.	2018-11-27 11:47:40 +01:00
Norman Maurer	2c78dde749	Update version number to start working on Netty 5	2018-11-20 15:49:57 +01:00
Nick Hill	804e1fa9cc	Fix ref-counting when CompositeByteBuf is used with retainedSlice() (#8497 ) Motivation: ByteBuf.retainedSlice() and similar methods produce sliced buffers with an independent refcount to the buffer that they wrap. One of the optimizations in `10539f4dc7` was to use the ref to the unwrapped buffer object for added slices, but this did not take into account the above special case when later releasing. Thanks to @rkapsi for discovering this via #8495. Modifications: Since a reference to the slice is still kept in the Component class, just changed Component.freeIfNecessary() to release the slice in preference to the unwrapped buf. Also added a unit test which reproduces the bug. Result: Fixes #8495	2018-11-13 20:56:09 +01:00
Nick Hill	0f8ce1b284	Fix incorrect sizing of temp byte arrays in (Unsafe)ByteBufUtil (#8484 ) Motivation: Two similar bugs were introduced by myself in separate recent PRs #8393 and #8464, while optimizing the assignment/handling of temporary arrays in ByteBufUtil and UnsafeByteBufUtil. The temp arrays allocated for buffering data written to an OutputStream are incorrectly sized to the full length of the data to copy rather than being capped at WRITE_CHUNK_SIZE. Unfortunately one of these is in the 4.1.31.Final release, I'm really sorry and will be more careful in future. This kind of thing is tricky to cover in unit tests. Modifications: Revert the temp array allocations back to their original sizes. Avoid making duplicate calls to ByteBuf.capacity() in a couple of places in ByteBufUtil (unrelated thing I noticed, can remove it from this PR if desired!) Result: Temporary byte arrays will be reverted to their originally intended sizes.	2018-11-09 18:24:38 +01:00
Nick Hill	5954110b9a	Use ByteBufUtil.BYTE_ARRAYS ThreadLocal temporary arrays in more places (#8464 ) Motivation: #8388 introduced a reusable ThreadLocal<byte[]> for use in decodeString(...). It can be used in more places in the buffer package to avoid temporary allocations of small arrays. Modifications: Encapsulate use of the ThreadLocal in a static package-private ByteBufUtil.threadLocalTempArray(int) method, and make use of it from a handful of new places including ByteBufUtil.readBytes(...). Result: Fewer short-lived small byte array allocations.	2018-11-05 21:11:28 +01:00
Nick Hill	10539f4dc7	Streamline CompositeByteBuf internals (#8437 ) Motivation: CompositeByteBuf is a powerful and versatile abstraction, allowing for manipulation of large data without copying bytes. There is still a non-negligible cost to reading/writing however relative to "singular" ByteBufs, and this can be mostly eliminated with some rework of the internals. My use case is message modification/transformation while zero-copy proxying. For example replacing a string within a large message with one of a different length Modifications: - No longer slice added buffers and unwrap added slices - Components store target buf offset relative to position in composite buf - Less allocations, object footprint, pointer indirection, offset arithmetic - Use Component[] rather than ArrayList<Component> - Avoid pointer indirection and duplicate bounds check, more efficient backing array growth - Facilitates optimization when doing bulk-inserts - inserting n ByteBufs behind m is now O(m + n) instead of O(mn) - Avoid unnecessary casting and method call indirection via superclass - Eliminate some duplicate range/ref checks via non-checking versions of toComponentIndex and findComponent - Add simple fast-path for toComponentIndex(0); add racy cache of last-accessed Component to findComponent(int) - Override forEachByte0(...) and forEachByteDesc0(...) methods - Make use of RecyclableArrayList in nioBuffers(int, int) (in line with FasterCompositeByteBuf impl) - Modify addComponents0(boolean,int,Iterable) to use the Iterable directly rather than copy to an array first (and possibly to an ArrayList before that) - Optimize addComponents0(boolean,int,ByteBuf[],int) to not perform repeated array insertions and avoid second loop for offset updates - Simplify other logic in various places, in particular the general pattern used where a sub-range is iterated over - Add benchmarks to demonstrate some improvements While refactoring I also came across a couple of clear bugs. They are fixed in these changes but I will open another PR with unit tests and fixes to the current version. Result: Much faster creation, manipulation, and access; many fewer allocations and smaller footprint. Benchmark results to follow.	2018-11-03 10:37:07 +01:00
Nick Hill	44cca1a26f	Avoid allocations when wrapping byte[] and ByteBuffer arrays as ByteBuf (#8420 ) Motivation: Unpooled.wrap(byte[]...) and Unpooled.wrap(ByteBuffer...) currently allocate/copy an intermediate ByteBuf ArrayList and array, which can be avoided. Modifications: - Define new internal ByteWrapper interface and add a CompositeByteBuf constructor which takes a ByteWrapper with an array of the type that it wraps, and modify the appropriate Unpooled.wrap(...) methods to take advantage of it - Tidy up other constructors in CompositeByteBuf to remove duplication and misleading len arg (which is really an end offset into provided array) Result: Less allocation/copying when wrapping byte[] and ByteBuffer arrays, tidier code.	2018-10-30 19:35:39 +01:00
root	3e7ddb36c7	[maven-release-plugin] prepare for next development iteration	2018-10-29 15:38:51 +00:00
root	9e50739601	[maven-release-plugin] prepare release netty-4.1.31.Final	2018-10-29 15:37:47 +00:00
Nick Hill	48c45cf4ac	Fix leak and corruption bugs in CompositeByteBuf (#8438 ) Motivation: I came across two bugs: - Components removed due to capacity reduction aren't released - Offsets aren't set correctly on empty components that are added between existing components Modifications: Add unit tests which expose these bugs, fix them. Result: Bugs are fixed	2018-10-28 10:28:18 +01:00
Nick Hill	d7fa7be67f	Exploit PlatformDependent.allocateUninitializedArray() in more places (#8393 ) Motivation: There are currently many more places where this could be used which were possibly not considered when the method was added. If https://github.com/netty/netty/pull/8388 is included in its current form, a number of these places could additionally make use of the same BYTE_ARRAYS threadlocal. There's also a couple of adjacent places where an optimistically-pooled heap buffer is used for temp byte storage which could use the threadlocal too in preference to allocating a temp heap bytebuf wrapper. For example https://github.com/netty/netty/blob/4.1/buffer/src/main/java/io/netty/buffer/ByteBufUtil.java#L1417. Modifications: Replace new byte[] with PlatformDependent.allocateUninitializedArray() where appropriate; make use of ByteBufUtil.getBytes() in some places which currently perform the equivalent logic, including avoiding copy of backing array if possible (although would be rare). Result: Further potential speed-up with java9+ and appropriate compile flags. Many of these places could be on latency-sensitive code paths.	2018-10-27 10:43:28 -05:00
Nick Hill	583d838f7c	Optimize AbstractByteBuf.getCharSequence() in US_ASCII case (#8392 ) * Optimize AbstractByteBuf.getCharSequence() in US_ASCII case Motivation: Inspired by https://github.com/netty/netty/pull/8388, I noticed this simple optimization to avoid char[] allocation (also suggested in a TODO here). Modifications: Return an AsciiString from AbstractByteBuf.getCharSequence() if requested charset is US_ASCII or ISO_8859_1 (latter thanks to @Scottmitch's suggestion). Also tweak unit tests not to require Strings and include a new benchmark to demonstrate the speedup. Result: Speed-up of AbstractByteBuf.getCharSequence() in ascii and iso 8859/1 cases	2018-10-26 15:32:38 -07:00
Norman Maurer	87ec2f882a	Reduce overhead by ByteBufUtil.decodeString(...) which is used by `AbstractByteBuf.toString(...)` and `AbstractByteBuf.getCharSequence(...)` (#8388 ) Motivation: Our current implementation that is used for toString(Charset) operations on AbstractByteBuf implementation is quite slow as it does a lot of uncessary memory copies. We should just use new String(...) as it has a lot of optimizations to handle these cases. Modifications: Rewrite ByteBufUtil.decodeString(...) to use new String(...) Result: Less overhead for toString(Charset) operations. Benchmark (charsetName) (direct) (size) Mode Cnt Score Error Units ByteBufUtilDecodeStringBenchmark.decodeString US-ASCII false 8 thrpt 20 22401645.093 ? 4671452.479 ops/s ByteBufUtilDecodeStringBenchmark.decodeString US-ASCII false 64 thrpt 20 23678483.384 ? 3749164.446 ops/s ByteBufUtilDecodeStringBenchmark.decodeString US-ASCII true 8 thrpt 20 15731142.651 ? 3782931.591 ops/s ByteBufUtilDecodeStringBenchmark.decodeString US-ASCII true 64 thrpt 20 16244232.229 ? 1886259.658 ops/s ByteBufUtilDecodeStringBenchmark.decodeString UTF-8 false 8 thrpt 20 25983680.959 ? 5045782.289 ops/s ByteBufUtilDecodeStringBenchmark.decodeString UTF-8 false 64 thrpt 20 26235589.339 ? 2867004.950 ops/s ByteBufUtilDecodeStringBenchmark.decodeString UTF-8 true 8 thrpt 20 18499027.808 ? 4784684.268 ops/s ByteBufUtilDecodeStringBenchmark.decodeString UTF-8 true 64 thrpt 20 16825286.141 ? 1008712.342 ops/s ByteBufUtilDecodeStringBenchmark.decodeString UTF-16 false 8 thrpt 20 5789879.092 ? 1201786.359 ops/s ByteBufUtilDecodeStringBenchmark.decodeString UTF-16 false 64 thrpt 20 2173243.225 ? 417809.341 ops/s ByteBufUtilDecodeStringBenchmark.decodeString UTF-16 true 8 thrpt 20 5035583.011 ? 1001978.854 ops/s ByteBufUtilDecodeStringBenchmark.decodeString UTF-16 true 64 thrpt 20 2162345.301 ? 402410.408 ops/s ByteBufUtilDecodeStringBenchmark.decodeString ISO-8859-1 false 8 thrpt 20 30039052.376 ? 6539111.622 ops/s ByteBufUtilDecodeStringBenchmark.decodeString ISO-8859-1 false 64 thrpt 20 31414163.515 ? 2096710.526 ops/s ByteBufUtilDecodeStringBenchmark.decodeString ISO-8859-1 true 8 thrpt 20 19538587.855 ? 4639115.572 ops/s ByteBufUtilDecodeStringBenchmark.decodeString ISO-8859-1 true 64 thrpt 20 19467839.722 ? 1672687.213 ops/s ByteBufUtilDecodeStringBenchmark.decodeStringOld US-ASCII false 8 thrpt 20 10787326.745 ? 1034197.864 ops/s ByteBufUtilDecodeStringBenchmark.decodeStringOld US-ASCII false 64 thrpt 20 7129801.930 ? 1363019.209 ops/s ByteBufUtilDecodeStringBenchmark.decodeStringOld US-ASCII true 8 thrpt 20 9002529.605 ? 2017642.445 ops/s ByteBufUtilDecodeStringBenchmark.decodeStringOld US-ASCII true 64 thrpt 20 3860192.352 ? 826218.738 ops/s ByteBufUtilDecodeStringBenchmark.decodeStringOld UTF-8 false 8 thrpt 20 10532838.027 ? 2151743.968 ops/s ByteBufUtilDecodeStringBenchmark.decodeStringOld UTF-8 false 64 thrpt 20 7185554.597 ? 1387685.785 ops/s ByteBufUtilDecodeStringBenchmark.decodeStringOld UTF-8 true 8 thrpt 20 7352253.316 ? 1333823.850 ops/s ByteBufUtilDecodeStringBenchmark.decodeStringOld UTF-8 true 64 thrpt 20 2825578.707 ? 349701.156 ops/s ByteBufUtilDecodeStringBenchmark.decodeStringOld UTF-16 false 8 thrpt 20 7277446.665 ? 1447034.346 ops/s ByteBufUtilDecodeStringBenchmark.decodeStringOld UTF-16 false 64 thrpt 20 2445929.579 ? 562816.641 ops/s ByteBufUtilDecodeStringBenchmark.decodeStringOld UTF-16 true 8 thrpt 20 6201174.401 ? 1236137.786 ops/s ByteBufUtilDecodeStringBenchmark.decodeStringOld UTF-16 true 64 thrpt 20 2310674.973 ? 525587.959 ops/s ByteBufUtilDecodeStringBenchmark.decodeStringOld ISO-8859-1 false 8 thrpt 20 11142625.392 ? 1680556.468 ops/s ByteBufUtilDecodeStringBenchmark.decodeStringOld ISO-8859-1 false 64 thrpt 20 8127116.405 ? 1128513.860 ops/s ByteBufUtilDecodeStringBenchmark.decodeStringOld ISO-8859-1 true 8 thrpt 20 9405751.952 ? 2193324.806 ops/s ByteBufUtilDecodeStringBenchmark.decodeStringOld ISO-8859-1 true 64 thrpt 20 3943282.076 ? 737798.070 ops/s Benchmark result is saved to /home/norman/mainframer/netty/microbench/target/reports/performance/ByteBufUtilDecodeStringBenchmark.json Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1,030.173 sec - in io.netty.buffer.ByteBufUtilDecodeStringBenchmark [1030.460s][info ][gc,heap,exit ] Heap [1030.460s][info ][gc,heap,exit ] garbage-first heap total 516096K, used 257918K [0x0000000609a00000, 0x0000000800000000) [1030.460s][info ][gc,heap,exit ] region size 2048K, 127 young (260096K), 2 survivors (4096K) [1030.460s][info ][gc,heap,exit ] Metaspace used 17123K, capacity 17438K, committed 17792K, reserved 1064960K [1030.460s][info ][gc,heap,exit ] class space used 1709K, capacity 1827K, committed 1920K, reserved 1048576K	2018-10-19 14:00:13 +02:00
Norman Maurer	69545aedc4	CompositeByteBuf.decompose(...) does not correctly slice content. (#8403 ) Motivation: CompositeByteBuf.decompose(...) did not correctly slice the content and so produced an incorrect representation of the data. Modifications: - Rewrote implementation to fix bug and also improved it to reduce GC - Add unit tests. Result: Fixes https://github.com/netty/netty/issues/8400.	2018-10-19 08:05:22 +02:00
Nick Hill	7062ceedb0	Simplify ByteBufInputStream.readLine() logic (#8380 ) Motivation: While looking at the nice optimization done in https://github.com/netty/netty/pull/8347 I couldn't help noticing the logic could be simplified further. Apologies if this is just my OCD and inappropriate! Modifications: Reduce amount of code used for ByteBufInputStream.readLine() Result: Slightly smaller and simpler code	2018-10-13 06:24:40 +02:00
Francesco Nigro	83dc3b503e	ByteBufInputStream is always allocating a StringBuilder instance (#8347 ) Motivation: Avoid creating any StringBuilder instance if ByteBufInputStream::readLine isn't used Modifications: The StringBuilder instance is lazy allocated on demand and are added new test case branches to address the increased complexity of ByteBufInputStream::readLine Result: Reduced GC activity if ByteBufInputStream::readLine isn't used	2018-10-11 14:56:29 +08:00
Dmitriy Dumanskiy	6cebb6069b	remove unnecessary vararg argument in PooledByteBufAllocator (#8338 ) Motivation: No need in varargs, the method always accepts array. Modification: ... replaced with []	2018-10-05 19:06:44 +08:00
root	2d7cb47edd	[maven-release-plugin] prepare for next development iteration	2018-09-27 19:00:45 +00:00
root	3a9ac829d5	[maven-release-plugin] prepare release netty-4.1.30.Final	2018-09-27 18:56:12 +00:00
Norman Maurer	c14efd952d	Directly init refCnt to 1 (#8274 ) Motivation: We should just directly init the refCnt to 1 and not use the AtomicIntegerFieldUpdater. Modifications: Just assing directly to 1. Result: Cleaner code and possible a bit faster as the JVM / JIT may be able to optimize the first store easily.	2018-09-07 19:04:19 +02:00
Norman Maurer	e542a2cf26	Use a non-volatile read for ensureAccessible() whenever possible to reduce overhead and allow better inlining. (#8266 ) Motiviation: At the moment whenever ensureAccessible() is called in our ByteBuf implementations (which is basically on each operation) we will do a volatile read. That per-se is not such a bad thing but the problem here is that it will also reduce the the optimizations that the compiler / jit can do. For example as these are volatile it can not eliminate multiple loads of it when inline the methods of ByteBuf which happens quite frequently because most of them a quite small and very hot. That is especially true for all the methods that act on primitives. It gets even worse as people often call a lot of these after each other in the same method or even use method chaining here. The idea of the change is basically just ue a non-volatile read for the ensureAccessible() check as its a best-effort implementation to detect acting on already released buffers anyway as even with a volatile read it could happen that the user will release it in another thread before we actual access the buffer after the reference check. Modifications: - Try to do a non-volatile read using sun.misc.Unsafe if we can use it. - Add a benchmark Result: Big performance win when multiple ByteBuf methods are called from a method. With the change: UnsafeByteBufBenchmark.setGetLongUnsafeByteBuf thrpt 20 281395842,128 ± 5050792,296 ops/s Before the change: UnsafeByteBufBenchmark.setGetLongUnsafeByteBuf thrpt 20 217419832,801 ± 5080579,030 ops/s	2018-09-07 07:47:02 +02:00
Francesco Nigro	c78be33443	Added configurable ByteBuf bounds checking (#7521 ) Motivation: The JVM isn't always able to hoist out/reduce bounds checking (due to ref counting operations etc etc) hence making it configurable could improve performances for most CPU intensive use cases. Modifications: Each AbstractByteBuf bounds check has been tested against a new static final configuration property similar to checkAccessible ie io.netty.buffer.bytebuf.checkBounds. Result: Any user could disable ByteBuf bounds checking in order to get extra performances.	2018-09-03 20:33:47 +02:00
Norman Maurer	54f565ac67	Allow to use native transports when sun.misc.Unsafe is not present on… (#8231 ) * Allow to use native transports when sun.misc.Unsafe is not present on the system Motivation: We should be able to use the native transports (epoll / kqueue) even when sun.misc.Unsafe is not present on the system. This is especially important as Java11 will be released soon and does not allow access to it by default. Modifications: - Correctly disable usage of sun.misc.Unsafe when -PnoUnsafe is used while running the build - Correctly increment metric when UnpooledDirectByteBuf is allocated. This was uncovered once -PnoUnsafe usage was fixed. - Implement fallbacks in all our native transport code for when sun.misc.Unsafe is not present. Result: Fixes https://github.com/netty/netty/issues/8229.	2018-08-29 19:36:33 +02:00
root	a580dc7585	[maven-release-plugin] prepare for next development iteration	2018-08-24 06:36:33 +00:00

1 2 3 4 5 ...

953 Commits