Motivation:
Derefs are not necessarily their referents.
This is the case for Send, for instance.
Modification:
The Deref.isInstanceOf method is renamed to referentIsInstanceOf.
A Send.isSendOf method has also been added. It simplifies the check for sends, since one may also need to check whether the object in question is a Send instance in the first place.
Result:
Cleaner code that is easier to read, when working with Sends.
This fixes https://github.com/netty/netty-incubator-buffer-api/issues/46
Then run those tests in independent surefire forks.
This should allow Maven to hold on to less test metadata, and cope better with the large number of tests.
Motivation:
The untethered memory allocated by ensureWritable in a direct MemorySegment based non-pooled Buffer would be allocated without having a Cleaner attached to its ResourceScope.
This could cause that memory to leak if the Buffer instance was cast aside.
Modification:
ManagedBufferAllocator now makes sure to attach a cleaner to the buffer and its memory segment, when allocating untethered memory.
Result:
The BufferTest$CleanerTests now pass.
Motivation:
We need a new implementation of our new API that supports Java 11, since that is what Netty 5 will most likely baseline on.
We also need an implementation that does not rely on Unsafe.
This leaves us with ByteBuffer as the underlying currency of memory.
Modification:
- Add a NioBuffer implementation and associated supporting classes.
- The entry-point for this is a new MemoryManagers API, which is used to pick the implementation and provide the on-/off-heap MemoryManager implementations.
- Add a mechanism to configure/override which MemoryManagers implementation to use.
- The MemoryManagers implementations are service-loadable, so new ones can be discovered at runtime.
- The existing MemorySegment based implementation also gets a MemoryManagers implementation.
- Expand the BufferTest to include all combinations of all implementations. We now run 360,000 tests in BufferTest.
- Some common infrastructure, like ArcDrop, is moved to its own package.
- Add a module-info.java to control the service loading, and the visibility in the various packages.
- Some pom.xml file updates to support our now module based project.
Result:
We have an implementation that should work on Java 11, but we currently don't build or test on 11.
More work needs to happen before that is a reality.
Motivation:
It is kind of a weird internal and hidden state, that slices were special.
For instance, slices could not be sent, and they could never obtain ownership.
This means buffers from slices behaved differently from allocated buffers.
In doing so, they violated both the principle that magic should stay hidden, and the principle of consistent behaviour.
Modification:
- The special reference-counting drop implementation that was added to support bifurcation, has been renamed to ArcDrop (for atomic reference counting).
- The ArcDrop is then used throughout the MemSegBuffer implementation to account for every instance where multiple buffers reference the same memory, e.g. slices and the like.
- The borrow count of a buffer is then the sum of the borrows from the buffer itself and from its ArcDrop.
- Ownership is thus tied to both the buffer itself being owned, and the ArcDrop being in an owned state.
- SizeClassedMemoryPool is changed to pool recoverable memory instead of sends, because the sends could come from slices.
- We also take care to keep around a "base" memory segment, so that we don't return memory segment slices to the memory pool (doing so would leak the memory from the parent segment that is not part of the slice).
- CleanerPooledDrop now keeps a weak reference to itself, rather than the buffer, which is more correct anyway, but now also required because we cannot rely on the buffer reference the cleaner was created with.
- The CleanerPooledDrop now takes care to drop the buffer that is actually passed to it, rather than what it was referencing from some earlier point.
- MemoryManager can now disclose the size of recoverable memory, so that SizeClassedMemoryPool can pick the correct size pool to return memory to. It cannot rely on the passed down buffer instance for this, because that buffer might have been a slice.
Result:
It is now possible for slices to obtain ownership when their parent buffer is closed.
The get* methods bounds checked accesses between 0 and the write offset, and the tests were confirming this behaviour.
This was wrong because it is not symmetric with the set* methods, which bounds check between 0 and the capacity, and do not modify the write offset.
The tests and methods have been updated so the get* methods now bounds check between 0 and the capacity.
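A minimal sketch of the symmetric bounds checking described above, assuming a hypothetical array-backed `SketchBuffer` class (the name and shape are illustrative, not the real buffer implementation). Both absolute accessors check against the capacity, and neither touches the write offset:

```java
// Hypothetical array-backed buffer; names are illustrative, not the incubator API.
final class SketchBuffer {
    private final byte[] memory;
    private int writerOffset;

    SketchBuffer(int capacity) {
        memory = new byte[capacity];
    }

    int capacity() {
        return memory.length;
    }

    int writerOffset() {
        return writerOffset;
    }

    // Relative write: the only accessor here that advances the write offset.
    void writeByte(byte value) {
        checkIndex(writerOffset);
        memory[writerOffset++] = value;
    }

    // Absolute read: checked against [0, capacity), not [0, writerOffset).
    byte getByte(int index) {
        checkIndex(index);
        return memory[index];
    }

    // Absolute write: same bounds as getByte, and the write offset is untouched.
    void setByte(int index, byte value) {
        checkIndex(index);
        memory[index] = value;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= memory.length) {
            throw new IndexOutOfBoundsException(
                    "Index " + index + " is outside of [0, " + memory.length + ").");
        }
    }
}
```

With this shape, a `setByte` beyond the write offset can be read back with `getByte` at the same index, which is the symmetry the updated tests check for.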
Motivation:
This makes it possible to use the new buffer API in Netty as is.
Modification:
Make the MemSegBuffer implementation class implement AsByteBuf and ReferenceCounted.
The produced ByteBuf instance delegates all calls to the underlying Buffer instance as faithfully as possible.
One area where the two deviate is that it's not possible to create non-retained duplicates and slices with the new buffer API.
Result:
It is now possible to use the new buffer API on both client and server side.
The Echo* examples demonstrate this, and the EchoIT proves it with a test.
The API is used more directly on the client side, since the server-side allocator in Netty does not know how to allocate buffers with the incubating API.
Motivation:
Another way to process the readable data in a buffer.
This might be faster for composite buffers, since their byte cursors are a bit slower than those of the MemSegBuffer due to the indirections and more complicated logic.
Modification:
ReadableComponent now has an openCursor method.
Note that we *don't* add an openReverseCursor method on ReadableComponent.
The reason is that forEachReadable iterates the components in the forward direction, and it's really confusing to then iterate the bytes in a backwards direction.
Working with both directions at the same time is very error prone.
Result:
It is now possible to process readable components with byte cursors.
Motivation:
There is no reason for `compose()` to be an instance method on `BufferAllocator` since the allocator implementation should not influence how this method is implemented.
Modification:
Make `compose()` a static method and move it to the `Buffer` interface.
Also move its companion methods `extendComposite()` and `isComposite()` to the `Buffer` interface.
Result:
The composite buffer methods are now in a more sensible place.
Also: decided _against_ making `extendComposite()` and `isComposite()` instance methods, because the subtle behaviours of `extendComposite()` mean it would behave quite differently for non-composite buffers.
Also: `isComposite()` is not an instance method because it relates to the hard-coded and concrete `CompositeBuffer` implementation.
Motivation:
Sometimes, we wish to operate on both buffers and anything that can produce a buffer.
For instance, when making a composite buffer, we could compose either buffers or sends.
Modification:
Introduce a Deref interface, which is extended by both Rc and Send.
A Deref can be used to acquire an Rc instance, and in doing so will also acquire a reference to the Rc.
That is, dereferencing increases the reference count.
For Rc itself, this just delegates to Rc.acquire, while for Send it delegates to Send.receive, and can only be called once.
The Allocator.compose method has been changed to take Derefs.
This allows us to compose either Bufs or Sends of bufs.
Or a mix.
Extra care and caution has been added to the code, to make sure the reference counts are managed correctly when composing buffers, now that it's a more complicated operation.
A handful of convenience methods for working with Sends have also been added to the Send interface.
Result:
We can now build a composite buffer out of sends of buffers.
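The dereferencing rules above can be sketched roughly as follows. All type names here (`DerefSketch`, `RcSketch`, `SendSketch`, `ComposeSketch`) are hypothetical and heavily simplified; the real interfaces are generic and carry more methods:

```java
// Hypothetical, simplified shapes; not the real Deref, Rc and Send interfaces.
interface DerefSketch {
    RcSketch get(); // Dereferencing acquires a reference to the referent.
}

final class RcSketch implements DerefSketch {
    private int refCount = 1;

    RcSketch acquire() {
        refCount++;
        return this;
    }

    int refCount() {
        return refCount;
    }

    @Override
    public RcSketch get() {
        return acquire(); // For an Rc, dereferencing delegates to acquire.
    }
}

final class SendSketch implements DerefSketch {
    private RcSketch payload;

    SendSketch(RcSketch payload) {
        this.payload = payload;
    }

    RcSketch receive() {
        if (payload == null) {
            throw new IllegalStateException("This send has already been received.");
        }
        RcSketch received = payload;
        payload = null; // A send can only be received once.
        return received;
    }

    @Override
    public RcSketch get() {
        return receive(); // For a Send, dereferencing delegates to receive.
    }
}

final class ComposeSketch {
    // Accepts any mix of reference counted objects and sends of them.
    static RcSketch[] acquireAll(DerefSketch... parts) {
        RcSketch[] acquired = new RcSketch[parts.length];
        for (int i = 0; i < parts.length; i++) {
            acquired[i] = parts[i].get();
        }
        return acquired;
    }
}
```

A call such as `ComposeSketch.acquireAll(rc, send)` then raises the reference count of `rc` by one and consumes `send`, so a second dereference of the same send fails.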
Motivation:
The forEachReadable/Writable permit a cleaner FileCopyExample implementation.
Modification:
Simplify FileCopyExample.
Also add examples of various good and bad ways to transfer buffer ownership between threads.
Update the forEachReadable/Writable APIs to let exceptions pass through.
Result:
Cleaner code and more useful forEachReadable/Writable APIs.
Motivation:
There is no reason that composite buffers should nest when composed.
Instead, when composite buffers are used to compose or extend other composite buffers, we should unwrap them and copy the references to their constituent buffers.
Modification:
Composite buffers now always unwrap and flatten themselves when they participate in composition or extension of other composite buffers.
Result:
Composite buffers are now always guaranteed* to contain a single level of non-composed leaf buffers.
*assuming no other unknown buffer-wrapping buffer type is in the mix.
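As an illustration of the flattening, here is a small sketch with hypothetical `SketchBuf`, `LeafBuf` and `FlatComposite` types (not the real classes, and without any reference counting). Because every composite is flattened when it is built, unwrapping one level is enough to keep the result flat:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration only.
interface SketchBuf {
}

final class LeafBuf implements SketchBuf {
}

final class FlatComposite implements SketchBuf {
    private final List<SketchBuf> components = new ArrayList<>();

    FlatComposite(SketchBuf... parts) {
        for (SketchBuf part : parts) {
            if (part instanceof FlatComposite) {
                // Unwrap: copy the references to the constituent buffers.
                components.addAll(((FlatComposite) part).components);
            } else {
                components.add(part);
            }
        }
    }

    List<SketchBuf> components() {
        return components;
    }
}
```

Composing `new FlatComposite(new FlatComposite(a, b), c)` thus yields three leaf components rather than a nested composite plus one leaf.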
Motivation:
It's desirable to be able to access the contents of a Buf via an array or a ByteBuffer.
However, we would also like to have a unified API that works for both composite and non-composite buffers.
Even for nested composite buffers.
Modification:
Add a forEachReadable method, which uses internal iteration to process all buffer components.
The internal iteration allows us to hide any nesting of composite buffers.
The consumer in the internal iteration is presented with a Component object, which exposes the contents in various ways.
The data is exposed from the Component via methods, such that anything that is expensive to create, will not have to be paid for unless it is used.
This mechanism also lets us avoid any unnecessary allocation; the ByteBuffers and arrays will necessarily have to be allocated, but the consumer may or may not need allocation depending on how it's implemented, and the component objects do not need to be allocated, because the non-composite buffers can directly implement the Component interface.
Result:
It's now possible to access the contents of Buf instances as arrays or ByteBuffers, without having to copy the data.
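A rough sketch of the internal iteration idea, using hypothetical `ComponentSketch`, `ReadableSketch`, `LeafSketch` and `NestedSketch` types (the real method signatures may differ). The leaf buffer implements the component interface itself, so no component object is allocated, and the nesting is invisible to the consumer:

```java
import java.nio.ByteBuffer;
import java.util.function.Consumer;

// Hypothetical interfaces; only meant to show the shape of the mechanism.
interface ComponentSketch {
    byte[] readableArray();

    ByteBuffer readableBuffer(); // Only created if the consumer asks for it.
}

interface ReadableSketch {
    // Returns the number of components that were processed.
    int forEachReadable(Consumer<ComponentSketch> consumer);
}

final class LeafSketch implements ReadableSketch, ComponentSketch {
    private final byte[] data;

    LeafSketch(byte... data) {
        this.data = data;
    }

    @Override
    public byte[] readableArray() {
        return data;
    }

    @Override
    public ByteBuffer readableBuffer() {
        return ByteBuffer.wrap(data).asReadOnlyBuffer();
    }

    @Override
    public int forEachReadable(Consumer<ComponentSketch> consumer) {
        consumer.accept(this); // The buffer is its own component.
        return 1;
    }
}

final class NestedSketch implements ReadableSketch {
    private final ReadableSketch[] children;

    NestedSketch(ReadableSketch... children) {
        this.children = children;
    }

    @Override
    public int forEachReadable(Consumer<ComponentSketch> consumer) {
        int count = 0;
        for (ReadableSketch child : children) {
            count += child.forEachReadable(consumer); // Nesting stays hidden.
        }
        return count;
    }
}
```

A consumer such as `c -> total[0] += c.readableArray().length` sees only leaf components, no matter how deeply the buffers are nested.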
Motivation:
There are cases where you want a buffer to be "constant."
Buffers are inherently mutable, but it's possible to block off write access to the buffer contents.
This doesn't make it completely safe to share the buffer across multiple threads, but it does catch most races that could occur.
Modification:
Add a method to Buf for toggling read-only mode.
When a buffer is read-only, the write accessors throw exceptions when called.
In the MemSegBuf, this is implemented by having separate read and write references to the underlying memory segment.
In a read-only buffer, the write reference is redirected to point to a closed memory segment, thus preventing all writes to the memory backing the buffer.
Result:
It is now possible to make buffers read-only.
Note, however, that it is also possible to toggle a read-only buffer back to writable.
We need that in order for buffer pools to be able to fully reset the state of a buffer, regardless of the buffer implementation.
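The read and write reference trick can be sketched as below. This is a simplified model with a hypothetical `ReadOnlySketch` class and a hypothetical `Memory` interface standing in for the memory segment; the `CLOSED` constant plays the role of the closed memory segment:

```java
// Hypothetical sketch; the real MemSegBuf works on MemorySegment instances.
final class ReadOnlySketch {
    interface Memory {
        byte get(int index);

        void set(int index, byte value);
    }

    static final class ArrayMemory implements Memory {
        private final byte[] bytes;

        ArrayMemory(int capacity) {
            bytes = new byte[capacity];
        }

        @Override
        public byte get(int index) {
            return bytes[index];
        }

        @Override
        public void set(int index, byte value) {
            bytes[index] = value;
        }
    }

    // Stands in for the closed memory segment: every access fails.
    private static final Memory CLOSED = new Memory() {
        @Override
        public byte get(int index) {
            throw new IllegalStateException("This buffer is read-only.");
        }

        @Override
        public void set(int index, byte value) {
            throw new IllegalStateException("This buffer is read-only.");
        }
    };

    private final Memory readMemory;
    private Memory writeMemory;

    ReadOnlySketch(int capacity) {
        readMemory = new ArrayMemory(capacity);
        writeMemory = readMemory;
    }

    // Toggling only redirects the write reference; the read reference is stable.
    void readOnly(boolean readOnly) {
        writeMemory = readOnly ? CLOSED : readMemory;
    }

    boolean readOnly() {
        return writeMemory == CLOSED;
    }

    byte getByte(int index) {
        return readMemory.get(index);
    }

    void setByte(int index, byte value) {
        writeMemory.set(index, value);
    }
}
```

The write accessors need no extra read-only check of their own in this design, because the redirected write reference fails on its own.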
Motivation:
Thread-confinement ends up being too confusing to code for, and also prevents some legitimate use cases.
Additionally, thread-confinement exposed implementation specific behavioural differences of buffers, where we would ideally like all buffers to always behave the same, regardless of implementation.
Modification:
All MemorySegment based buffers now always use shared segments.
For heap-based segments, we avoid the overhead associated with closing shared segments by simply not closing them, and instead leaving the whole thing for the GC to deal with.
Result:
Buffers can now always be accessed from multiple different threads at the same time.
Motivation:
Although having a cleaner attached adds a bit of overhead when allocating or closing buffers,
it is more important to make our systems, libraries and frameworks misuse resistant and safe by default.
Modification:
Remove the ability to allocate a buffer that does not have a cleaner attached.
Reference counting and the ability to explicitly release memory remains.
This just makes sure that we always have a safety net to fall back on.
Result:
This will make systems less prone to crashes through running out of memory, native or otherwise, even in the face of true memory leaks.
(Leaks through retained strong references cannot be fixed in any way)
Looks like the overhead is not too bad, so I think we can just always do that:
```
Benchmark                       (workload)  Mode  Cnt  Score   Error  Units
explicitPooledClose                  light  avgt  150  1,094 ± 0,017  us/op
pooledWithCleanerExplicitClose       light  avgt  150  1,181 ± 0,009  us/op
```
Motivation:
The main use case with Buf.compact is in conjunction with ensureWritable.
It turns out we can get a simpler API, and faster methods, by combining those two operations, because it allows us to relax some guarantees and skip some steps in certain cases, which wouldn't be as neat or clean if they were two separate steps.
Modification:
Add a new Buf.ensureWritable method, which takes an allowCompaction argument.
In MemSegBuf, we can just delegate to compact() when applicable.
In CompositeBuf, we can sometimes get away with just reorganising the bufs array.
Result:
We can now do ensureWritable without allocating in some cases, and this can in particular make the operation faster for CompositeBuf.
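For a single array-backed buffer, the combined operation might look like the sketch below. The `EnsureWritableSketch` class is hypothetical, and the growth policy in the fallback path (grow to exactly the required size) is an assumption made for brevity:

```java
// Hypothetical array-backed buffer, not the MemSegBuf implementation.
final class EnsureWritableSketch {
    private byte[] memory;
    private int readerOffset;
    private int writerOffset;

    EnsureWritableSketch(int capacity) {
        memory = new byte[capacity];
    }

    int capacity() {
        return memory.length;
    }

    int writableBytes() {
        return memory.length - writerOffset;
    }

    void writeByte(byte value) {
        memory[writerOffset++] = value;
    }

    byte readByte() {
        return memory[readerOffset++];
    }

    void ensureWritable(int size, boolean allowCompaction) {
        if (writableBytes() >= size) {
            return; // Already enough space.
        }
        if (allowCompaction && writableBytes() + readerOffset >= size) {
            // Discarding the already-read bytes frees enough space; no allocation.
            int readable = writerOffset - readerOffset;
            System.arraycopy(memory, readerOffset, memory, 0, readable);
            readerOffset = 0;
            writerOffset = readable;
            return;
        }
        // Fall back to allocating larger memory, keeping the offsets as they are.
        byte[] bigger = new byte[writerOffset + size];
        System.arraycopy(memory, 0, bigger, 0, writerOffset);
        memory = bigger;
    }
}
```

With a full 4-byte buffer of which 2 bytes have been read, `ensureWritable(2, true)` keeps the capacity at 4, while `ensureWritable(2, false)` grows it to 6.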
Motivation:
Compaction makes more space available at the end of a buffer, by discarding bytes at the beginning that have already been processed.
Modification:
Add a copying compact method to Buf.
Result:
It is now possible to discard read bytes by calling `compact()`.
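A copying compaction can be sketched like this, assuming a hypothetical array-backed `CompactSketch` class with separate read and write offsets (names are illustrative only):

```java
// Hypothetical array-backed buffer used to illustrate a copying compact.
final class CompactSketch {
    private final byte[] memory;
    private int readerOffset;
    private int writerOffset;

    CompactSketch(int capacity) {
        memory = new byte[capacity];
    }

    int writableBytes() {
        return memory.length - writerOffset;
    }

    void writeByte(byte value) {
        memory[writerOffset++] = value;
    }

    byte readByte() {
        return memory[readerOffset++];
    }

    // Moves the unread bytes to the start, discarding what was already read.
    void compact() {
        int readable = writerOffset - readerOffset;
        System.arraycopy(memory, readerOffset, memory, 0, readable);
        readerOffset = 0;
        writerOffset = readable;
    }
}
```

After writing four bytes into a 4-byte buffer and reading two of them, `compact()` makes two bytes writable again, and the next `readByte()` returns the third byte that was written.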
Motivation:
There are many use cases where other objects will have fields that are buffers.
Since buffers are reference counted, their life cycle needs to be managed carefully.
Modification:
Add the abstract BufHolder, and the concrete sub-class BufRef, as neat building blocks for building other classes that contain field references to buffers.
The behaviours of closed/sent buffers have also been specified in tests, and tightened up in the code.
Result:
It is now easier to create classes/objects that wrap buffers.
Motivation:
There are use cases that involve accumulating data into a buffer, then carving out prefix slices and sending them off on their own journey for further processing.
Modification:
Add a Buf.bifurcate API, that splits a buffer, and its ownership, in two.
Internally, the API will inject and maintain an atomically reference counted Drop instance, so that the original memory segment is not released until all bifurcated parts are closed.
This works particularly well for composite buffers, where only the buffer (if any) wherein the bifurcation point lands, will actually have its memory split. A composite buffer can otherwise just crack its buffer array in two.
Result:
We now have a safe way of breaking the single ownership of some memory into multiple parts, that can be sent and owned independently.
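The atomically reference counted drop can be sketched as follows. `DropSketch` and `ArcDropSketch` are hypothetical, simplified stand-ins: the real Drop receives the object being dropped, which is left out here:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified drop interface.
interface DropSketch {
    void drop();
}

// Releases the underlying memory only when the last bifurcated part is closed.
final class ArcDropSketch implements DropSketch {
    private final DropSketch delegate;
    private final AtomicInteger count = new AtomicInteger(1);

    ArcDropSketch(DropSketch delegate) {
        this.delegate = delegate;
    }

    // Called when a bifurcation creates one more part sharing the memory.
    ArcDropSketch increment() {
        count.incrementAndGet();
        return this;
    }

    @Override
    public void drop() {
        if (count.decrementAndGet() == 0) {
            delegate.drop(); // Last part closed: release the original memory.
        }
    }
}
```

Each part can then be closed independently, in any order and from any thread, and the delegate runs exactly once.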
Motivation:
Cursors are better than iterators in that they only need to check boundary conditions once per iteration, when processed in a loop.
This should make them easier for the compiler to optimise.
Modification:
Change the ByteIterator to a ByteCursor. The API is almost the same, but with a few subtle differences in semantics.
The primary difference is that the cursor movement and boundary condition checking happen at the same time, and do not need to occur when the values are fetched out of the cursor.
An iterator, on the other hand, needs to throw an exception if "next" is called too many times.
Result:
Simpler code, and hopefully faster code as well.
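The cursor shape can be sketched as below, with a hypothetical array-backed `ByteCursorSketch` (the method names are assumptions chosen for illustration). Moving and bounds checking happen in one call, and fetching the value never throws:

```java
// Hypothetical array-backed cursor, only meant to show the loop shape.
final class ByteCursorSketch {
    private final byte[] data;
    private int index;
    private byte value;

    ByteCursorSketch(byte... data) {
        this.data = data;
    }

    // Moves the cursor and checks the boundary in a single step.
    boolean readByte() {
        if (index < data.length) {
            value = data[index++];
            return true;
        }
        return false;
    }

    // Returns the last byte read; no boundary check and no exception.
    byte getByte() {
        return value;
    }

    static int sum(ByteCursorSketch cursor) {
        int sum = 0;
        while (cursor.readByte()) { // One check per iteration.
            sum += cursor.getByte();
        }
        return sum;
    }
}
```

An iterator-based loop would instead check the boundary in both `hasNext` and `next`, since `next` has to throw when called too many times.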
Motivation:
This will likely be a somewhat common operation, as buffers move between eventloop and worker threads, so it's important to have an understanding of how it performs.
Modification:
Add a benchmark that specifically targets the send() operation on buffers.
Result:
We got benchmark numbers that clearly show the cost of confinement transfer.
Motivation:
Composite buffers are uniquely positioned to be able to extend their underlying storage relatively cheaply.
This fact is relied upon in a couple of buffer use cases within Netty, that we wish to support.
Modification:
Add a static `extend` method to Allocator, so that the CompositeBuf class can remain internal.
The `extend` method inserts the extension buffer at the end of the composite buffer as if it had been included from the start.
This involves checking offsets and byte order invariants.
We also require that the composite buffer be in an owned state.
Result:
It's now possible to extend a composite buffer with a specific buffer, after the composite buffer has been created.
Motivation:
Capture the performance characteristics of this primitive for various buffer implementations.
Modification:
Add a benchmark that iterates 4KiB buffers forwards, and backwards, on various buffer implementations.
Result:
Another aspect of the implementation covered by benchmarks.
Turns out the composite iterators are somewhat slow.
Motivation:
Pooled buffers are a very important use case, and they change the cost dynamics around shared memory segments, so it's worth looking into in detail.
Modification:
Add another explicit close of pooled direct buffers to MemorySegmentClosedByCleanerBenchmark.
Result:
Explicit closing of pooled buffers even outperforms cleaner close on the "heavy" workload, so this is currently the fastest way to run that workload:
```
Benchmark                                                  (workload)  Mode  Cnt   Score   Error  Units
MemorySegmentClosedByCleanerBenchmark.cleanerClose              heavy  avgt  150  14,194 ± 0,558  us/op
MemorySegmentClosedByCleanerBenchmark.explicitClose             heavy  avgt  150  40,496 ± 0,414  us/op
MemorySegmentClosedByCleanerBenchmark.explicitPooledClose       heavy  avgt  150  12,723 ± 0,134  us/op
```