Efficient ByteBuf search algorithms (#9914) (#9955)

Motivation:

We have found that ByteBufUtil.indexOf can be inefficient for substring search on
ByteBuf, both in terms of algorithmic complexity (worst case O(needle.readableBytes *
haystack.readableBytes)) and in constant factor (especially on composite buffers).
With more performant search algorithms implemented, we have seen improvements of
an order of magnitude.

Modifications:

This change introduces three search algorithms:
1. Knuth-Morris-Pratt - the classical textbook algorithm, a good default choice.
2. Bit mask based algorithm (Bitap) - stable performance on any input, but limited to a maximum
search substring (the needle) length of 64 bytes.
3. Aho-Corasick - worse performance and higher memory consumption than [1] and [2], but
it supports searching for multiple substrings (the needles) simultaneously, inspecting every
byte of the haystack only once.

Each algorithm processes every byte of the underlying buffer only once, and each is implemented
as a ByteProcessor.
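
To illustrate the "one byte at a time" idea, here is a minimal stand-alone sketch of a streaming Knuth-Morris-Pratt matcher. It is not the code from this change: the class name StreamingKmpSketch and its methods are hypothetical, it works on plain byte arrays rather than ByteBuf, and feed() returns true on a match (a Netty ByteProcessor instead returns false to stop the iteration).

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-alone matcher for illustration; not part of Netty. */
final class StreamingKmpSketch {
    private final byte[] needle;
    private final int[] fallback; // fallback[i] = length of the longest proper border of needle[0..i)
    private int matched;          // number of needle bytes matched so far

    StreamingKmpSketch(byte[] needle) {
        if (needle.length == 0) {
            throw new IllegalArgumentException("needle must not be empty");
        }
        this.needle = needle.clone();
        this.fallback = new int[needle.length + 1];
        int j = 0;
        for (int i = 1; i < needle.length; i++) {
            while (j > 0 && needle[j] != needle[i]) {
                j = fallback[j];
            }
            if (needle[j] == needle[i]) {
                j++;
            }
            fallback[i + 1] = j;
        }
    }

    /** Feeds one byte; returns true when this byte completes an occurrence of the needle. */
    boolean feed(byte value) {
        while (matched > 0 && needle[matched] != value) {
            matched = fallback[matched];
        }
        if (needle[matched] == value) {
            matched++;
        }
        if (matched == needle.length) {
            matched = fallback[matched]; // keep partial state so overlapping occurrences are found
            return true;
        }
        return false;
    }

    /** Returns the index of the LAST byte of the next occurrence at or after {@code from}, or -1. */
    int indexOfEnd(byte[] haystack, int from) {
        for (int i = from; i < haystack.length; i++) {
            if (feed(haystack[i])) {
                return i;
            }
        }
        return -1;
    }

    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }
}
```

Because the matcher never looks back, it reports the index of the last byte of a match; the start index can be recovered as end - needle.length + 1. Searching "ABABAB" for "BAB" with the same matcher instance, resuming at end + 1 each time, yields end indices 3 and 5, then -1.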

Result:

Efficient search algorithms with linear time complexity are now available in Netty (I will share
benchmark results in a comment on the PR).
Linas Medžiūnas 2020-04-15 11:21:24 +03:00 committed by Norman Maurer
parent 81b435b129
commit abdcf102da
20 changed files with 1831 additions and 0 deletions

@ -0,0 +1,94 @@
/*
* Copyright 2020 The Netty Project
*
* The Netty Project licenses this file to you under the Apache License, version 2.0 (the
* "License"); you may not use this file except in compliance with the License. You may obtain a
* copy of the License at:
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software distributed under the License
* is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
* or implied. See the License for the specific language governing permissions and limitations under
* the License.
*/
package io.netty.buffer.search;
/**
* Base class for precomputed factories that create {@link MultiSearchProcessor}s.
* <br>
* The purpose of {@link MultiSearchProcessor} is to perform efficient simultaneous search for multiple {@code needles}
* in the {@code haystack}, while scanning every byte of the input sequentially, only once. While it can also be used
* to search for just a single {@code needle}, using a {@link SearchProcessorFactory} would be more efficient for
* doing that.
* <br>
* See the documentation of {@link AbstractSearchProcessorFactory} for a comprehensive description of common usage.
* In addition to the functionality provided by {@link SearchProcessor}, {@link MultiSearchProcessor} adds
* a method to get the index of the {@code needle} found at the current position of the {@link MultiSearchProcessor} -
* {@link MultiSearchProcessor#getFoundNeedleId()}.
* <br>
* <b>Note:</b> in some cases one {@code needle} can be a suffix of another {@code needle}, e.g. {@code {"BC", "ABC"}},
* and there can potentially be multiple {@code needles} found ending at the same position of the {@code haystack}.
* In such a case {@link MultiSearchProcessor#getFoundNeedleId()} returns the index of the longest matching {@code needle}
* in the array of {@code needles}.
* <br>
* Usage example (given that the {@code haystack} is a {@link io.netty.buffer.ByteBuf} containing "ABCD" and the
* {@code needles} are "AB", "BC" and "CD"):
* <pre>
* MultiSearchProcessorFactory factory = AbstractMultiSearchProcessorFactory.newAhoCorasicSearchProcessorFactory(
* "AB".getBytes(CharsetUtil.UTF_8), "BC".getBytes(CharsetUtil.UTF_8), "CD".getBytes(CharsetUtil.UTF_8));
* MultiSearchProcessor processor = factory.newSearchProcessor();
*
* int idx1 = haystack.forEachByte(processor);
* // idx1 is 1 (index of the last character of the occurrence of "AB" in the haystack)
* // processor.getFoundNeedleId() is 0 (index of "AB" in needles[])
*
* int continueFrom1 = idx1 + 1;
* // continue the search starting from the next character
*
* int idx2 = haystack.forEachByte(continueFrom1, haystack.readableBytes() - continueFrom1, processor);
* // idx2 is 2 (index of the last character of the occurrence of "BC" in the haystack)
* // processor.getFoundNeedleId() is 1 (index of "BC" in needles[])
*
* int continueFrom2 = idx2 + 1;
*
* int idx3 = haystack.forEachByte(continueFrom2, haystack.readableBytes() - continueFrom2, processor);
* // idx3 is 3 (index of the last character of the occurrence of "CD" in the haystack)
* // processor.getFoundNeedleId() is 2 (index of "CD" in needles[])
*
* int continueFrom3 = idx3 + 1;
*
* int idx4 = haystack.forEachByte(continueFrom3, haystack.readableBytes() - continueFrom3, processor);
* // idx4 is -1 (no more occurrences of any of the needles)
*
* // This search session is complete, processor should be discarded.
* // To search for the same needles again, reuse the same {@link AbstractMultiSearchProcessorFactory}
* // to get a new MultiSearchProcessor.
* </pre>
*/
public abstract class AbstractMultiSearchProcessorFactory implements MultiSearchProcessorFactory {
/**
* Creates a {@link MultiSearchProcessorFactory} based on
* <a href="https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm">AhoCorasick</a>
* string search algorithm.
* <br>
* Precomputation (this method) time is linear in the size of input ({@code O(Σ|needles|)}).
* <br>
* The factory allocates and retains an array of 256 * X ints plus another array of X ints, where X
* is the sum of lengths of each entry of {@code needles} minus the sum of lengths of repeated
* prefixes of the {@code needles}.
* <br>
* Search (the actual application of {@link MultiSearchProcessor}) time is linear in the size of
* {@link io.netty.buffer.ByteBuf} on which the search is performed ({@code O(|haystack|)}).
* Every byte of {@link io.netty.buffer.ByteBuf} is processed only once, sequentially, regardless of
* the number of {@code needles} being searched for.
*
* @param needles a varargs array of arrays of bytes to search for
* @return a new instance of {@link AhoCorasicSearchProcessorFactory} precomputed for the given {@code needles}
*/
public static AhoCorasicSearchProcessorFactory newAhoCorasicSearchProcessorFactory(byte[] ...needles) {
return new AhoCorasicSearchProcessorFactory(needles);
}
}
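
The "longest matching needle" rule described in the Javadoc above can be illustrated with a minimal stand-alone Aho-Corasick sketch. This is not the implementation in this change: the class AhoCorasickSketch and its members are hypothetical, and it uses a plain int[][] transition table instead of the flat jump table, purely for readability.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-alone matcher for illustration; not part of Netty. */
final class AhoCorasickSketch {
    private final int[][] next;   // next[state][byte] = following state (full transition table)
    private final int[] needleAt; // id of the longest needle ending at this state, or -1
    private int state;

    AhoCorasickSketch(byte[]... needles) {
        // 1. Build the trie of all needles.
        List<int[]> trie = new ArrayList<>();
        List<Integer> ids = new ArrayList<>();
        trie.add(newRow());
        ids.add(-1);
        for (int id = 0; id < needles.length; id++) {
            if (needles[id].length == 0) {
                throw new IllegalArgumentException("needle must not be empty");
            }
            int s = 0;
            for (byte b : needles[id]) {
                int c = b & 0xff;
                if (trie.get(s)[c] == -1) {
                    trie.get(s)[c] = trie.size();
                    trie.add(newRow());
                    ids.add(-1);
                }
                s = trie.get(s)[c];
            }
            ids.set(s, id);
        }
        next = trie.toArray(new int[0][]);
        needleAt = new int[ids.size()];
        for (int i = 0; i < needleAt.length; i++) {
            needleAt[i] = ids.get(i);
        }
        // 2. Breadth-first pass: compute suffix links and fill in the missing transitions.
        int[] suffix = new int[next.length]; // all links start at the root (state 0)
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        queue.add(0);
        while (!queue.isEmpty()) {
            int v = queue.remove();
            int u = suffix[v];
            if (needleAt[v] == -1) {
                needleAt[v] = needleAt[u]; // inherit a shorter needle that is a suffix of this state
            }
            for (int c = 0; c < 256; c++) {
                int child = next[v][c];
                if (child != -1) {
                    suffix[child] = v == 0 ? 0 : next[u][c];
                    queue.add(child);
                } else {
                    next[v][c] = v == 0 ? 0 : next[u][c];
                }
            }
        }
    }

    /** Feeds one byte; returns the id of the longest needle ending at this byte, or -1. */
    int feed(byte value) {
        state = next[state][value & 0xff];
        return needleAt[state];
    }

    private static int[] newRow() {
        int[] row = new int[256];
        Arrays.fill(row, -1);
        return row;
    }
}
```

With needles {"BC", "ABC"}, feeding the bytes of "xABCx" reports id 1 ("ABC") at the byte 'C', because a state's own needle takes precedence over any shorter needle inherited through its suffix link.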

@ -0,0 +1,115 @@
/*
* Copyright 2020 The Netty Project
*
* The Netty Project licenses this file to you under the Apache License, version 2.0 (the
* "License"); you may not use this file except in compliance with the License. You may obtain a
* copy of the License at:
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software distributed under the License
* is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
* or implied. See the License for the specific language governing permissions and limitations under
* the License.
*/
package io.netty.buffer.search;
/**
* Base class for precomputed factories that create {@link SearchProcessor}s.
* <br>
* Different factories implement different search algorithms with performance characteristics that
* depend on a use case, so it is advisable to benchmark a concrete use case with different algorithms
* before choosing one of them.
* <br>
* A concrete instance of {@link AbstractSearchProcessorFactory} is built for searching for a concrete sequence of bytes
* (the {@code needle}), it contains precomputed data needed to perform the search, and is meant to be reused
* whenever searching for the same {@code needle}.
* <br>
* <b>Note:</b> implementations of {@link SearchProcessor} scan the {@link io.netty.buffer.ByteBuf} sequentially,
* one byte after another, without doing any random access. As a result, when using {@link SearchProcessor}
* with such methods as {@link io.netty.buffer.ByteBuf#forEachByte}, these methods return the index of the last byte
* of the found byte sequence within the {@link io.netty.buffer.ByteBuf} (which might feel counterintuitive,
* and different from {@link io.netty.buffer.ByteBufUtil#indexOf} which returns the index of the first byte
* of found sequence).
* <br>
* A {@link SearchProcessor} is implemented as a
* <a href="https://en.wikipedia.org/wiki/Finite-state_machine">Finite State Automaton</a> that contains a
* small internal state which is updated with every byte processed. As a result, an instance of {@link SearchProcessor}
* should not be reused across independent search sessions (e.g. for searching in different
* {@link io.netty.buffer.ByteBuf}s). A new instance should be created with {@link AbstractSearchProcessorFactory} for
* every search session. However, a {@link SearchProcessor} can (and should) be reused within the search session,
* e.g. when searching for all occurrences of the {@code needle} within the same {@code haystack}. That way, it can
* also detect overlapping occurrences of the {@code needle} (e.g. a string "ABABAB" contains two occurrences of "BAB"
* that overlap by one character "B"). For this to work correctly, after an occurrence of the {@code needle} is
* found ending at index {@code idx}, the search should continue starting from the index {@code idx + 1}.
* <br>
* Example (given that the {@code haystack} is a {@link io.netty.buffer.ByteBuf} containing "ABABAB" and
* the {@code needle} is "BAB"):
* <pre>
* SearchProcessorFactory factory =
* AbstractSearchProcessorFactory.newKmpSearchProcessorFactory(needle.getBytes(CharsetUtil.UTF_8));
* SearchProcessor processor = factory.newSearchProcessor();
*
* int idx1 = haystack.forEachByte(processor);
* // idx1 is 3 (index of the last character of the first occurrence of the needle in the haystack)
*
* int continueFrom1 = idx1 + 1;
* // continue the search starting from the next character
*
* int idx2 = haystack.forEachByte(continueFrom1, haystack.readableBytes() - continueFrom1, processor);
* // idx2 is 5 (index of the last character of the second occurrence of the needle in the haystack)
*
* int continueFrom2 = idx2 + 1;
* // continue the search starting from the next character
*
* int idx3 = haystack.forEachByte(continueFrom2, haystack.readableBytes() - continueFrom2, processor);
* // idx3 is -1 (no more occurrences of the needle)
*
* // After this search session is complete, processor should be discarded.
* // To search for the same needle again, reuse the same factory to get a new SearchProcessor.
* </pre>
*/
public abstract class AbstractSearchProcessorFactory implements SearchProcessorFactory {
/**
* Creates a {@link SearchProcessorFactory} based on
* <a href="https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm">Knuth-Morris-Pratt</a>
* string search algorithm. It is a reasonable default choice among the provided algorithms.
* <br>
* Precomputation (this method) time is linear in the size of input ({@code O(|needle|)}).
* <br>
* The factory allocates and retains an int array of size {@code needle.length + 1}, and retains a copy
* of the {@code needle}.
* <br>
* Search (the actual application of {@link SearchProcessor}) time is linear in the size of
* {@link io.netty.buffer.ByteBuf} on which the search is performed ({@code O(|haystack|)}).
* Every byte of {@link io.netty.buffer.ByteBuf} is processed only once, sequentially.
*
* @param needle an array of bytes to search for
* @return a new instance of {@link KmpSearchProcessorFactory} precomputed for the given {@code needle}
*/
public static KmpSearchProcessorFactory newKmpSearchProcessorFactory(byte[] needle) {
return new KmpSearchProcessorFactory(needle);
}
/**
* Creates a {@link SearchProcessorFactory} based on the
* <a href="https://en.wikipedia.org/wiki/Bitap_algorithm">Bitap</a> string search algorithm.
* It is a jump-free algorithm with very stable performance (the contents of the inputs have a minimal
* effect on it). The limitation is that the {@code needle} can be no more than 64 bytes long.
* <br>
* Precomputation (this method) time is linear in the size of the input ({@code O(|needle|)}).
* <br>
* The factory allocates and retains a long[256] array.
* <br>
* Search (the actual application of {@link SearchProcessor}) time is linear in the size of
* {@link io.netty.buffer.ByteBuf} on which the search is performed ({@code O(|haystack|)}).
* Every byte of {@link io.netty.buffer.ByteBuf} is processed only once, sequentially.
*
* @param needle an array <b>of no more than 64 bytes</b> to search for
* @return a new instance of {@link BitapSearchProcessorFactory} precomputed for the given {@code needle}
*/
public static BitapSearchProcessorFactory newBitapSearchProcessorFactory(byte[] needle) {
return new BitapSearchProcessorFactory(needle);
}
}
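
The 64-byte limit of the Bitap factory follows from keeping the whole match state in a single long. A minimal stand-alone sketch of that idea (the shift-and form of Bitap) is shown below. The class BitapSketch and its methods are hypothetical and operate on plain byte arrays; unlike the factory in this change, the sketch also rejects an empty needle, which is an assumption made here for simplicity.

```java
/** Hypothetical stand-alone matcher for illustration; not part of Netty. */
final class BitapSketch {
    private final long[] masks = new long[256]; // bit i of masks[b] is set iff needle[i] == b
    private final long successBit;              // bit (needle.length - 1)
    private long state;                         // bit i set iff needle[0..i] ends at the last fed byte

    BitapSketch(byte[] needle) {
        if (needle.length == 0 || needle.length > 64) {
            throw new IllegalArgumentException("needle length must be in 1..64, got " + needle.length);
        }
        long bit = 1L;
        for (byte b : needle) {
            masks[b & 0xff] |= bit;
            bit <<= 1;
        }
        successBit = 1L << (needle.length - 1);
    }

    /** Feeds one byte; returns true when this byte completes an occurrence of the needle. */
    boolean feed(byte value) {
        // Extend every partial match by one position, start a new one at bit 0,
        // then keep only the positions whose needle byte equals the input byte.
        state = ((state << 1) | 1L) & masks[value & 0xff];
        return (state & successBit) != 0;
    }

    /** Returns the index of the LAST byte of the next occurrence at or after {@code from}, or -1. */
    int indexOfEnd(byte[] haystack, int from) {
        for (int i = from; i < haystack.length; i++) {
            if (feed(haystack[i])) {
                return i;
            }
        }
        return -1;
    }
}
```

Each input byte costs one shift, one OR, one AND and one table lookup, with no data-dependent loop, which is why the running time barely depends on the contents of the needle or the haystack.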

@ -0,0 +1,191 @@
/*
* Copyright 2020 The Netty Project
*
* The Netty Project licenses this file to you under the Apache License, version 2.0 (the
* "License"); you may not use this file except in compliance with the License. You may obtain a
* copy of the License at:
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software distributed under the License
* is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
* or implied. See the License for the specific language governing permissions and limitations under
* the License.
*/
package io.netty.buffer.search;
import io.netty.util.internal.PlatformDependent;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Queue;
/**
* Implements <a href="https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm">AhoCorasick</a>
* string search algorithm.
* Use static {@link AbstractMultiSearchProcessorFactory#newAhoCorasicSearchProcessorFactory}
* to create an instance of this factory.
* Use {@link AhoCorasicSearchProcessorFactory#newSearchProcessor} to get an instance of
* {@link io.netty.util.ByteProcessor} implementation for performing the actual search.
* @see AbstractMultiSearchProcessorFactory
*/
public class AhoCorasicSearchProcessorFactory extends AbstractMultiSearchProcessorFactory {
private final int[] jumpTable;
private final int[] matchForNeedleId;
static final int BITS_PER_SYMBOL = 8;
static final int ALPHABET_SIZE = 1 << BITS_PER_SYMBOL;
private static class Context {
int[] jumpTable;
int[] matchForNeedleId;
}
public static class Processor implements MultiSearchProcessor {
private final int[] jumpTable;
private final int[] matchForNeedleId;
private long currentPosition;
Processor(int[] jumpTable, int[] matchForNeedleId) {
this.jumpTable = jumpTable;
this.matchForNeedleId = matchForNeedleId;
}
@Override
public boolean process(byte value) {
currentPosition = PlatformDependent.getInt(jumpTable, currentPosition | (value & 0xffL));
if (currentPosition < 0) {
currentPosition = -currentPosition;
return false;
}
return true;
}
@Override
public int getFoundNeedleId() {
return matchForNeedleId[(int) currentPosition >> AhoCorasicSearchProcessorFactory.BITS_PER_SYMBOL];
}
@Override
public void reset() {
currentPosition = 0;
}
}
AhoCorasicSearchProcessorFactory(byte[] ...needles) {
for (byte[] needle: needles) {
if (needle.length == 0) {
throw new IllegalArgumentException("Needle must be non empty");
}
}
Context context = buildTrie(needles);
jumpTable = context.jumpTable;
matchForNeedleId = context.matchForNeedleId;
linkSuffixes();
for (int i = 0; i < jumpTable.length; i++) {
if (matchForNeedleId[jumpTable[i] >> BITS_PER_SYMBOL] >= 0) {
jumpTable[i] = -jumpTable[i];
}
}
}
private static Context buildTrie(byte[][] needles) {
ArrayList<Integer> jumpTableBuilder = new ArrayList<Integer>(ALPHABET_SIZE);
for (int i = 0; i < ALPHABET_SIZE; i++) {
jumpTableBuilder.add(-1);
}
ArrayList<Integer> matchForBuilder = new ArrayList<Integer>();
matchForBuilder.add(-1);
for (int needleId = 0; needleId < needles.length; needleId++) {
byte[] needle = needles[needleId];
int currentPosition = 0;
for (byte ch0: needle) {
final int ch = ch0 & 0xff;
final int next = currentPosition + ch;
if (jumpTableBuilder.get(next) == -1) {
jumpTableBuilder.set(next, jumpTableBuilder.size());
for (int i = 0; i < ALPHABET_SIZE; i++) {
jumpTableBuilder.add(-1);
}
matchForBuilder.add(-1);
}
currentPosition = jumpTableBuilder.get(next);
}
matchForBuilder.set(currentPosition >> BITS_PER_SYMBOL, needleId);
}
Context context = new Context();
context.jumpTable = new int[jumpTableBuilder.size()];
for (int i = 0; i < jumpTableBuilder.size(); i++) {
context.jumpTable[i] = jumpTableBuilder.get(i);
}
context.matchForNeedleId = new int[matchForBuilder.size()];
for (int i = 0; i < matchForBuilder.size(); i++) {
context.matchForNeedleId[i] = matchForBuilder.get(i);
}
return context;
}
private void linkSuffixes() {
Queue<Integer> queue = new ArrayDeque<Integer>();
queue.add(0);
int[] suffixLinks = new int[matchForNeedleId.length];
Arrays.fill(suffixLinks, -1);
while (!queue.isEmpty()) {
final int v = queue.remove();
int vPosition = v >> BITS_PER_SYMBOL;
final int u = suffixLinks[vPosition] == -1 ? 0 : suffixLinks[vPosition];
if (matchForNeedleId[vPosition] == -1) {
matchForNeedleId[vPosition] = matchForNeedleId[u >> BITS_PER_SYMBOL];
}
for (int ch = 0; ch < ALPHABET_SIZE; ch++) {
final int vIndex = v | ch;
final int uIndex = u | ch;
final int jumpV = jumpTable[vIndex];
final int jumpU = jumpTable[uIndex];
if (jumpV != -1) {
suffixLinks[jumpV >> BITS_PER_SYMBOL] = v > 0 && jumpU != -1 ? jumpU : 0;
queue.add(jumpV);
} else {
jumpTable[vIndex] = jumpU != -1 ? jumpU : 0;
}
}
}
}
/**
* Returns a new {@link Processor}.
*/
@Override
public Processor newSearchProcessor() {
return new Processor(jumpTable, matchForNeedleId);
}
}

@ -0,0 +1,77 @@
/*
* Copyright 2020 The Netty Project
*
* The Netty Project licenses this file to you under the Apache License, version 2.0 (the
* "License"); you may not use this file except in compliance with the License. You may obtain a
* copy of the License at:
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software distributed under the License
* is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
* or implied. See the License for the specific language governing permissions and limitations under
* the License.
*/
package io.netty.buffer.search;
import io.netty.util.internal.PlatformDependent;
/**
* Implements <a href="https://en.wikipedia.org/wiki/Bitap_algorithm">Bitap</a> string search algorithm.
* Use static {@link AbstractSearchProcessorFactory#newBitapSearchProcessorFactory}
* to create an instance of this factory.
* Use {@link BitapSearchProcessorFactory#newSearchProcessor} to get an instance of {@link io.netty.util.ByteProcessor}
* implementation for performing the actual search.
* @see AbstractSearchProcessorFactory
*/
public class BitapSearchProcessorFactory extends AbstractSearchProcessorFactory {
private final long[] bitMasks = new long[256];
private final long successBit;
public static class Processor implements SearchProcessor {
private final long[] bitMasks;
private final long successBit;
private long currentMask;
Processor(long[] bitMasks, long successBit) {
this.bitMasks = bitMasks;
this.successBit = successBit;
}
@Override
public boolean process(byte value) {
currentMask = ((currentMask << 1) | 1) & PlatformDependent.getLong(bitMasks, value & 0xffL);
return (currentMask & successBit) == 0;
}
@Override
public void reset() {
currentMask = 0;
}
}
BitapSearchProcessorFactory(byte[] needle) {
if (needle.length > 64) {
throw new IllegalArgumentException("Maximum supported search pattern length is 64, got " + needle.length);
}
long bit = 1L;
for (byte c: needle) {
bitMasks[c & 0xff] |= bit;
bit <<= 1;
}
successBit = 1L << (needle.length - 1);
}
/**
* Returns a new {@link Processor}.
*/
@Override
public Processor newSearchProcessor() {
return new Processor(bitMasks, successBit);
}
}

@ -0,0 +1,91 @@
/*
* Copyright 2020 The Netty Project
*
* The Netty Project licenses this file to you under the Apache License, version 2.0 (the
* "License"); you may not use this file except in compliance with the License. You may obtain a
* copy of the License at:
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software distributed under the License
* is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
* or implied. See the License for the specific language governing permissions and limitations under
* the License.
*/
package io.netty.buffer.search;
import io.netty.util.internal.PlatformDependent;
/**
* Implements
* <a href="https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm">Knuth-Morris-Pratt</a>
* string search algorithm.
* Use static {@link AbstractSearchProcessorFactory#newKmpSearchProcessorFactory}
* to create an instance of this factory.
* Use {@link KmpSearchProcessorFactory#newSearchProcessor} to get an instance of {@link io.netty.util.ByteProcessor}
* implementation for performing the actual search.
* @see AbstractSearchProcessorFactory
*/
public class KmpSearchProcessorFactory extends AbstractSearchProcessorFactory {
private final int[] jumpTable;
private final byte[] needle;
public static class Processor implements SearchProcessor {
private final byte[] needle;
private final int[] jumpTable;
private long currentPosition;
Processor(byte[] needle, int[] jumpTable) {
this.needle = needle;
this.jumpTable = jumpTable;
}
@Override
public boolean process(byte value) {
while (currentPosition > 0 && PlatformDependent.getByte(needle, currentPosition) != value) {
currentPosition = PlatformDependent.getInt(jumpTable, currentPosition);
}
if (PlatformDependent.getByte(needle, currentPosition) == value) {
currentPosition++;
}
if (currentPosition == needle.length) {
currentPosition = PlatformDependent.getInt(jumpTable, currentPosition);
return false;
}
return true;
}
@Override
public void reset() {
currentPosition = 0;
}
}
KmpSearchProcessorFactory(byte[] needle) {
this.needle = needle.clone();
this.jumpTable = new int[needle.length + 1];
int j = 0;
for (int i = 1; i < needle.length; i++) {
while (j > 0 && needle[j] != needle[i]) {
j = jumpTable[j];
}
if (needle[j] == needle[i]) {
j++;
}
jumpTable[i + 1] = j;
}
}
/**
* Returns a new {@link Processor}.
*/
@Override
public Processor newSearchProcessor() {
return new Processor(needle, jumpTable);
}
}

@ -0,0 +1,28 @@
/*
* Copyright 2020 The Netty Project
*
* The Netty Project licenses this file to you under the Apache License, version 2.0 (the
* "License"); you may not use this file except in compliance with the License. You may obtain a
* copy of the License at:
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software distributed under the License
* is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
* or implied. See the License for the specific language governing permissions and limitations under
* the License.
*/
package io.netty.buffer.search;
/**
* Interface for {@link SearchProcessor} that implements simultaneous search for multiple strings.
* @see MultiSearchProcessorFactory
*/
public interface MultiSearchProcessor extends SearchProcessor {
/**
* @return the index of the found search string (or -1 if none) at the current position of this MultiSearchProcessor
*/
int getFoundNeedleId();
}

@ -0,0 +1,25 @@
/*
* Copyright 2020 The Netty Project
*
* The Netty Project licenses this file to you under the Apache License, version 2.0 (the
* "License"); you may not use this file except in compliance with the License. You may obtain a
* copy of the License at:
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software distributed under the License
* is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
* or implied. See the License for the specific language governing permissions and limitations under
* the License.
*/
package io.netty.buffer.search;
public interface MultiSearchProcessorFactory extends SearchProcessorFactory {
/**
* Returns a new {@link MultiSearchProcessor}.
*/
@Override
MultiSearchProcessor newSearchProcessor();
}

@ -0,0 +1,30 @@
/*
* Copyright 2020 The Netty Project
*
* The Netty Project licenses this file to you under the Apache License, version 2.0 (the
* "License"); you may not use this file except in compliance with the License. You may obtain a
* copy of the License at:
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software distributed under the License
* is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
* or implied. See the License for the specific language governing permissions and limitations under
* the License.
*/
package io.netty.buffer.search;
import io.netty.util.ByteProcessor;
/**
* Interface for {@link ByteProcessor} that implements string search.
* @see SearchProcessorFactory
*/
public interface SearchProcessor extends ByteProcessor {
/**
* Resets the state of SearchProcessor.
*/
void reset();
}

@ -0,0 +1,24 @@
/*
* Copyright 2020 The Netty Project
*
* The Netty Project licenses this file to you under the Apache License, version 2.0 (the
* "License"); you may not use this file except in compliance with the License. You may obtain a
* copy of the License at:
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software distributed under the License
* is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
* or implied. See the License for the specific language governing permissions and limitations under
* the License.
*/
package io.netty.buffer.search;
public interface SearchProcessorFactory {
/**
* Returns a new {@link SearchProcessor}.
*/
SearchProcessor newSearchProcessor();
}

@ -0,0 +1,20 @@
/*
* Copyright 2020 The Netty Project
*
* The Netty Project licenses this file to you under the Apache License,
* version 2.0 (the "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at:
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
* WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
* License for the specific language governing permissions and limitations
* under the License.
*/
/**
* Utility classes for performing efficient substring search within {@link io.netty.buffer.ByteBuf}.
*/
package io.netty.buffer.search;

@ -0,0 +1,32 @@
/*
* Copyright 2020 The Netty Project
*
* The Netty Project licenses this file to you under the Apache License,
* version 2.0 (the "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at:
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
* WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
* License for the specific language governing permissions and limitations
* under the License.
*/
package io.netty.buffer.search;
import org.junit.Test;
public class BitapSearchProcessorFactoryTest {
@Test
public void testAcceptMaximumLengthNeedle() {
new BitapSearchProcessorFactory(new byte[64]);
}
@Test(expected = IllegalArgumentException.class)
public void testRejectTooLongNeedle() {
new BitapSearchProcessorFactory(new byte[65]);
}
}

@ -0,0 +1,107 @@
/*
* Copyright 2020 The Netty Project
*
* The Netty Project licenses this file to you under the Apache License,
* version 2.0 (the "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at:
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
* WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
* License for the specific language governing permissions and limitations
* under the License.
*/
package io.netty.buffer.search;
import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;
import io.netty.util.CharsetUtil;
import org.junit.Test;
import static org.junit.Assert.*;
public class MultiSearchProcessorTest {
@Test
public void testSearchForMultiple() {
final ByteBuf haystack = Unpooled.copiedBuffer("one two three one", CharsetUtil.UTF_8);
final int length = haystack.readableBytes();
final MultiSearchProcessor processor = AbstractMultiSearchProcessorFactory.newAhoCorasicSearchProcessorFactory(
bytes("one"),
bytes("two"),
bytes("three")
).newSearchProcessor();
assertEquals(-1, processor.getFoundNeedleId());
assertEquals(2, haystack.forEachByte(processor));
assertEquals(0, processor.getFoundNeedleId()); // index of "one" in needles[]
assertEquals(6, haystack.forEachByte(3, length - 3, processor));
assertEquals(1, processor.getFoundNeedleId()); // index of "two" in needles[]
assertEquals(12, haystack.forEachByte(7, length - 7, processor));
assertEquals(2, processor.getFoundNeedleId()); // index of "three" in needles[]
assertEquals(16, haystack.forEachByte(13, length - 13, processor));
assertEquals(0, processor.getFoundNeedleId()); // index of "one" in needles[]
assertEquals(-1, haystack.forEachByte(17, length - 17, processor));
haystack.release();
}
@Test
public void testSearchForMultipleOverlapping() {
final ByteBuf haystack = Unpooled.copiedBuffer("abcd", CharsetUtil.UTF_8);
final int length = haystack.readableBytes();
final MultiSearchProcessor processor = AbstractMultiSearchProcessorFactory.newAhoCorasicSearchProcessorFactory(
bytes("ab"),
bytes("bc"),
bytes("cd")
).newSearchProcessor();
assertEquals(1, haystack.forEachByte(processor));
assertEquals(0, processor.getFoundNeedleId()); // index of "ab" in needles[]
assertEquals(2, haystack.forEachByte(2, length - 2, processor));
assertEquals(1, processor.getFoundNeedleId()); // index of "bc" in needles[]
assertEquals(3, haystack.forEachByte(3, length - 3, processor));
assertEquals(2, processor.getFoundNeedleId()); // index of "cd" in needles[]
haystack.release();
}
@Test
public void findLongerNeedleInCaseOfSuffixMatch() {
final ByteBuf haystack = Unpooled.copiedBuffer("xabcx", CharsetUtil.UTF_8);
final MultiSearchProcessor processor1 = AbstractMultiSearchProcessorFactory.newAhoCorasicSearchProcessorFactory(
bytes("abc"),
bytes("bc")
).newSearchProcessor();
assertEquals(3, haystack.forEachByte(processor1)); // end of "abc" in haystack
assertEquals(0, processor1.getFoundNeedleId()); // index of "abc" in needles[]
final MultiSearchProcessor processor2 = AbstractMultiSearchProcessorFactory.newAhoCorasicSearchProcessorFactory(
bytes("bc"),
bytes("abc")
).newSearchProcessor();
assertEquals(3, haystack.forEachByte(processor2)); // end of "abc" in haystack
assertEquals(1, processor2.getFoundNeedleId()); // index of "abc" in needles[]
haystack.release();
}
private static byte[] bytes(String s) {
return s.getBytes(CharsetUtil.UTF_8);
}
}

@ -0,0 +1,174 @@
/*
* Copyright 2020 The Netty Project
*
* The Netty Project licenses this file to you under the Apache License,
* version 2.0 (the "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at:
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
* WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
* License for the specific language governing permissions and limitations
* under the License.
*/
package io.netty.buffer.search;
import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;
import io.netty.util.CharsetUtil;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameter;
import org.junit.runners.Parameterized.Parameters;
import java.util.Arrays;
import static org.junit.Assert.*;
@RunWith(Parameterized.class)
public class SearchProcessorTest {
private enum Algorithm {
KNUTH_MORRIS_PRATT {
@Override
SearchProcessorFactory newFactory(byte[] needle) {
return AbstractSearchProcessorFactory.newKmpSearchProcessorFactory(needle);
}
},
BITAP {
@Override
SearchProcessorFactory newFactory(byte[] needle) {
return AbstractSearchProcessorFactory.newBitapSearchProcessorFactory(needle);
}
},
AHO_CORASIC {
@Override
SearchProcessorFactory newFactory(byte[] needle) {
return AbstractMultiSearchProcessorFactory.newAhoCorasicSearchProcessorFactory(needle);
}
};
abstract SearchProcessorFactory newFactory(byte[] needle);
}
@Parameters(name = "{0} algorithm")
public static Object[] algorithms() {
return Algorithm.values();
}
@Parameter
public Algorithm algorithm;
@Test
public void testSearch() {
final ByteBuf haystack = Unpooled.copiedBuffer("abc☺", CharsetUtil.UTF_8);
assertEquals(0, haystack.forEachByte(factory("a").newSearchProcessor()));
assertEquals(1, haystack.forEachByte(factory("ab").newSearchProcessor()));
assertEquals(2, haystack.forEachByte(factory("abc").newSearchProcessor()));
assertEquals(5, haystack.forEachByte(factory("abc☺").newSearchProcessor()));
assertEquals(-1, haystack.forEachByte(factory("abc☺☺").newSearchProcessor()));
assertEquals(-1, haystack.forEachByte(factory("abc☺x").newSearchProcessor()));
assertEquals(1, haystack.forEachByte(factory("b").newSearchProcessor()));
assertEquals(2, haystack.forEachByte(factory("bc").newSearchProcessor()));
assertEquals(5, haystack.forEachByte(factory("bc☺").newSearchProcessor()));
assertEquals(-1, haystack.forEachByte(factory("bc☺☺").newSearchProcessor()));
assertEquals(-1, haystack.forEachByte(factory("bc☺x").newSearchProcessor()));
assertEquals(2, haystack.forEachByte(factory("c").newSearchProcessor()));
assertEquals(5, haystack.forEachByte(factory("c☺").newSearchProcessor()));
assertEquals(-1, haystack.forEachByte(factory("c☺☺").newSearchProcessor()));
assertEquals(-1, haystack.forEachByte(factory("c☺x").newSearchProcessor()));
assertEquals(5, haystack.forEachByte(factory("☺").newSearchProcessor()));
assertEquals(-1, haystack.forEachByte(factory("☺☺").newSearchProcessor()));
assertEquals(-1, haystack.forEachByte(factory("☺x").newSearchProcessor()));
assertEquals(-1, haystack.forEachByte(factory("z").newSearchProcessor()));
assertEquals(-1, haystack.forEachByte(factory("aa").newSearchProcessor()));
assertEquals(-1, haystack.forEachByte(factory("ba").newSearchProcessor()));
assertEquals(-1, haystack.forEachByte(factory("abcd").newSearchProcessor()));
assertEquals(-1, haystack.forEachByte(factory("abcde").newSearchProcessor()));
haystack.release();
}
@Test
public void testRepeating() {
final ByteBuf haystack = Unpooled.copiedBuffer("abcababc", CharsetUtil.UTF_8);
final int length = haystack.readableBytes();
SearchProcessor processor = factory("ab").newSearchProcessor();
assertEquals(1, haystack.forEachByte(processor));
assertEquals(4, haystack.forEachByte(2, length - 2, processor));
assertEquals(6, haystack.forEachByte(5, length - 5, processor));
assertEquals(-1, haystack.forEachByte(7, length - 7, processor));
haystack.release();
}
@Test
public void testOverlapping() {
final ByteBuf haystack = Unpooled.copiedBuffer("ababab", CharsetUtil.UTF_8);
final int length = haystack.readableBytes();
SearchProcessor processor = factory("bab").newSearchProcessor();
assertEquals(3, haystack.forEachByte(processor));
assertEquals(5, haystack.forEachByte(4, length - 4, processor));
assertEquals(-1, haystack.forEachByte(6, length - 6, processor));
haystack.release();
}
@Test
public void testLongInputs() {
final int haystackLen = 1024;
final int needleLen = 64;
final byte[] haystackBytes = new byte[haystackLen];
haystackBytes[haystackLen - 1] = 1;
final ByteBuf haystack = Unpooled.copiedBuffer(haystackBytes); // 00000...00001
final byte[] needleBytes = new byte[needleLen]; // 000...000
assertEquals(needleLen - 1, haystack.forEachByte(factory(needleBytes).newSearchProcessor()));
needleBytes[needleLen - 1] = 1; // 000...001
assertEquals(haystackLen - 1, haystack.forEachByte(factory(needleBytes).newSearchProcessor()));
needleBytes[needleLen - 1] = 2; // 000...002
assertEquals(-1, haystack.forEachByte(factory(needleBytes).newSearchProcessor()));
needleBytes[needleLen - 1] = 0;
needleBytes[0] = 1; // 100...000
assertEquals(-1, haystack.forEachByte(factory(needleBytes).newSearchProcessor()));
haystack.release();
}
@Test
public void testUniqueLen64Substrings() {
final byte[] haystackBytes = new byte[32 * 65]; // 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, ...
int pos = 0;
for (int i = 1; i <= 64; i++) {
for (int j = 0; j < i; j++) {
haystackBytes[pos++] = (byte) i;
}
}
final ByteBuf haystack = Unpooled.copiedBuffer(haystackBytes);
for (int start = 0; start < haystackBytes.length - 64; start++) {
final byte[] needle = Arrays.copyOfRange(haystackBytes, start, start + 64);
assertEquals(start + 63, haystack.forEachByte(factory(needle).newSearchProcessor()));
}
haystack.release();
}
private SearchProcessorFactory factory(byte[] needle) {
return algorithm.newFactory(needle);
}
private SearchProcessorFactory factory(String needle) {
return factory(needle.getBytes(CharsetUtil.UTF_8));
}
}
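The assertions above rely on one convention: a search returns the index of the last byte of the first match, or -1 when the needle does not occur. A minimal plain-Java sketch of Knuth-Morris-Pratt over `byte[]` with the same convention is shown below. It is not the code added by this commit; `KmpSketch`, `buildJumpTable` and `indexOfMatchEnd` are hypothetical names, and rejecting an empty needle with -1 is an assumption of the sketch.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not Netty's KMP processor.
final class KmpSketch {
    private KmpSketch() { }

    /** jump[k] = length of the longest proper prefix of needle[0..k) that is also its suffix. */
    static int[] buildJumpTable(byte[] needle) {
        int[] jump = new int[needle.length + 1];
        int j = 0;
        for (int i = 1; i < needle.length; i++) {
            while (j > 0 && needle[j] != needle[i]) {
                j = jump[j];
            }
            if (needle[j] == needle[i]) {
                j++;
            }
            jump[i + 1] = j;
        }
        return jump;
    }

    /** Returns the index of the last byte of the first match, or -1 if there is none. */
    static int indexOfMatchEnd(byte[] haystack, byte[] needle) {
        if (needle.length == 0) {
            return -1; // assumption of this sketch: an empty needle never matches
        }
        int[] jump = buildJumpTable(needle);
        int matched = 0; // number of needle bytes matched so far
        for (int i = 0; i < haystack.length; i++) {
            byte b = haystack[i];
            while (matched > 0 && needle[matched] != b) {
                matched = jump[matched]; // fall back without re-reading the haystack
            }
            if (needle[matched] == b) {
                matched++;
            }
            if (matched == needle.length) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] haystack = "abcababc".getBytes(StandardCharsets.US_ASCII);
        byte[] needle = "ab".getBytes(StandardCharsets.US_ASCII);
        System.out.println(indexOfMatchEnd(haystack, needle));
    }
}
```

The haystack index only moves forward, so the scan is linear in the haystack length regardless of how repetitive the needle is.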

View File

@ -443,6 +443,10 @@ public final class PlatformDependent {
return PlatformDependent0.getByte(data, index);
}
public static byte getByte(byte[] data, long index) {
return PlatformDependent0.getByte(data, index);
}
public static short getShort(byte[] data, int index) {
return PlatformDependent0.getShort(data, index);
}
@ -451,10 +455,18 @@ public final class PlatformDependent {
return PlatformDependent0.getInt(data, index);
}
public static int getInt(int[] data, long index) {
return PlatformDependent0.getInt(data, index);
}
public static long getLong(byte[] data, int index) {
return PlatformDependent0.getLong(data, index);
}
public static long getLong(long[] data, long index) {
return PlatformDependent0.getLong(data, index);
}
private static long getLongSafe(byte[] bytes, int offset) {
if (BIG_ENDIAN_NATIVE_ORDER) {
return (long) bytes[offset] << 56 |

View File

@ -41,6 +41,10 @@ final class PlatformDependent0 {
private static final InternalLogger logger = InternalLoggerFactory.getInstance(PlatformDependent0.class);
private static final long ADDRESS_FIELD_OFFSET;
private static final long BYTE_ARRAY_BASE_OFFSET;
private static final long INT_ARRAY_BASE_OFFSET;
private static final long INT_ARRAY_INDEX_SCALE;
private static final long LONG_ARRAY_BASE_OFFSET;
private static final long LONG_ARRAY_INDEX_SCALE;
private static final MethodHandle DIRECT_BUFFER_CONSTRUCTOR_HANDLE;
private static final Throwable EXPLICIT_NO_UNSAFE_CAUSE = explicitNoUnsafeCause0();
private static final MethodHandle ALLOCATE_ARRAY_HANDLE;
@ -188,6 +192,10 @@ final class PlatformDependent0 {
if (unsafe == null) {
ADDRESS_FIELD_OFFSET = -1;
BYTE_ARRAY_BASE_OFFSET = -1;
LONG_ARRAY_BASE_OFFSET = -1;
LONG_ARRAY_INDEX_SCALE = -1;
INT_ARRAY_BASE_OFFSET = -1;
INT_ARRAY_INDEX_SCALE = -1;
UNALIGNED = false;
DIRECT_BUFFER_CONSTRUCTOR_HANDLE = null;
ALLOCATE_ARRAY_HANDLE = null;
@ -234,6 +242,10 @@ final class PlatformDependent0 {
DIRECT_BUFFER_CONSTRUCTOR_HANDLE = directBufferConstructorHandle;
ADDRESS_FIELD_OFFSET = objectFieldOffset(addressField);
BYTE_ARRAY_BASE_OFFSET = UNSAFE.arrayBaseOffset(byte[].class);
INT_ARRAY_BASE_OFFSET = UNSAFE.arrayBaseOffset(int[].class);
INT_ARRAY_INDEX_SCALE = UNSAFE.arrayIndexScale(int[].class);
LONG_ARRAY_BASE_OFFSET = UNSAFE.arrayBaseOffset(long[].class);
LONG_ARRAY_INDEX_SCALE = UNSAFE.arrayIndexScale(long[].class);
final boolean unaligned;
Object maybeUnaligned = AccessController.doPrivileged(new PrivilegedAction<Object>() {
@Override
@ -473,6 +485,10 @@ final class PlatformDependent0 {
return UNSAFE.getByte(data, BYTE_ARRAY_BASE_OFFSET + index);
}
static byte getByte(byte[] data, long index) {
return UNSAFE.getByte(data, BYTE_ARRAY_BASE_OFFSET + index);
}
static short getShort(byte[] data, int index) {
return UNSAFE.getShort(data, BYTE_ARRAY_BASE_OFFSET + index);
}
@ -481,10 +497,18 @@ final class PlatformDependent0 {
return UNSAFE.getInt(data, BYTE_ARRAY_BASE_OFFSET + index);
}
static int getInt(int[] data, long index) {
return UNSAFE.getInt(data, INT_ARRAY_BASE_OFFSET + INT_ARRAY_INDEX_SCALE * index);
}
static long getLong(byte[] data, int index) {
return UNSAFE.getLong(data, BYTE_ARRAY_BASE_OFFSET + index);
}
static long getLong(long[] data, long index) {
return UNSAFE.getLong(data, LONG_ARRAY_BASE_OFFSET + LONG_ARRAY_INDEX_SCALE * index);
}
static void putByte(long address, byte value) {
UNSAFE.putByte(address, value);
}

View File

@ -0,0 +1,52 @@
/*
* Copyright 2020 The Netty Project
*
* The Netty Project licenses this file to you under the Apache License,
* version 2.0 (the "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at:
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
* WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
* License for the specific language governing permissions and limitations
* under the License.
*/
package io.netty.microbench.search;
import io.netty.buffer.ByteBuf;
import io.netty.buffer.CompositeByteBuf;
import io.netty.buffer.Unpooled;
public enum ByteBufType {
HEAP {
@Override
ByteBuf newBuffer(byte[] bytes) {
return Unpooled.wrappedBuffer(bytes, 0, bytes.length);
}
},
COMPOSITE {
@Override
ByteBuf newBuffer(byte[] bytes) {
CompositeByteBuf buf = Unpooled.compositeBuffer();
int length = bytes.length;
int offset = 0;
int capacity = Math.max(1, (length + 7) / 8); // at most 8 components per composite; never 0, so the loop terminates
while (length > 0) {
buf.addComponent(true, Unpooled.wrappedBuffer(bytes, offset, Math.min(length, capacity)));
length -= capacity;
offset += capacity;
}
return buf;
}
},
DIRECT {
@Override
ByteBuf newBuffer(byte[] bytes) {
return Unpooled.directBuffer(bytes.length).writeBytes(bytes);
}
};
abstract ByteBuf newBuffer(byte[] bytes);
}
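The COMPOSITE case above splits one byte array into a fixed number of components. The arithmetic can be sketched on its own, without any Netty types, as a list of `{offset, length}` pairs. `ChunkingSketch` and `split` are hypothetical names, and the use of ceiling division is a choice of this sketch so that the chunk size is never zero and the number of chunks never exceeds the requested maximum.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; it mirrors the chunking idea, not the benchmark code itself.
final class ChunkingSketch {
    private ChunkingSketch() { }

    /** Returns {offset, length} pairs that cover [0, total) using at most maxChunks pieces. */
    static List<int[]> split(int total, int maxChunks) {
        if (maxChunks <= 0) {
            throw new IllegalArgumentException("maxChunks must be positive");
        }
        List<int[]> chunks = new ArrayList<>();
        if (total <= 0) {
            return chunks;
        }
        int chunkSize = (total + maxChunks - 1) / maxChunks; // ceiling division, always >= 1
        for (int offset = 0; offset < total; offset += chunkSize) {
            chunks.add(new int[] { offset, Math.min(chunkSize, total - offset) });
        }
        return chunks;
    }

    public static void main(String[] args) {
        for (int[] chunk : split(2048, 8)) {
            System.out.println(chunk[0] + " +" + chunk[1]);
        }
    }
}
```

Each pair could then be passed to a wrapping call such as the `Unpooled.wrappedBuffer(bytes, offset, length)` used in the enum above.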

View File

@ -0,0 +1,182 @@
/*
* Copyright 2020 The Netty Project
*
* The Netty Project licenses this file to you under the Apache License,
* version 2.0 (the "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at:
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
* WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
* License for the specific language governing permissions and limitations
* under the License.
*/
package io.netty.microbench.search;
import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufUtil;
import io.netty.buffer.Unpooled;
import io.netty.buffer.search.AbstractMultiSearchProcessorFactory;
import io.netty.buffer.search.AbstractSearchProcessorFactory;
import io.netty.buffer.search.SearchProcessorFactory;
import io.netty.microbench.util.AbstractMicrobenchmark;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.CompilerControl;
import org.openjdk.jmh.annotations.CompilerControl.Mode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.TearDown;
import org.openjdk.jmh.annotations.Warmup;
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.TimeUnit;
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 5)
@Measurement(iterations = 5)
@Fork(1)
public class SearchBenchmark extends AbstractMicrobenchmark {
private static final long SEED = 123;
public enum Input {
RANDOM_256B {
@Override
byte[] getNeedle(Random rnd) {
return new byte[] { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h' };
}
@Override
byte[] getHaystack(Random rnd) {
return randomBytes(rnd, 256, ' ', 127);
}
},
RANDOM_2KB {
@Override
byte[] getNeedle(Random rnd) {
return new byte[] { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h' };
}
@Override
byte[] getHaystack(Random rnd) {
return randomBytes(rnd, 2048, ' ', 127);
}
},
PREDICTABLE {
@Override
byte[] getNeedle(Random rnd) {
// all 0s
return new byte[64];
}
@Override
byte[] getHaystack(Random rnd) {
// no 0s except at the very end
byte[] bytes = randomBytes(rnd, 2048, 1, 255);
Arrays.fill(bytes, bytes.length - 64, bytes.length, (byte) 0);
return bytes;
}
},
UNPREDICTABLE {
@Override
byte[] getNeedle(Random rnd) {
return randomBytes(rnd, 64, 0, 1);
}
@Override
byte[] getHaystack(Random rnd) {
return randomBytes(rnd, 2048, 0, 1);
}
},
WORST_CASE { // the bitap benchmark fails on this input: the needle is longer than 64 bytes, so no Bitap factory is created in setup()
@Override
byte[] getNeedle(Random rnd) {
// aa(...)aab
byte[] needle = new byte[1024];
Arrays.fill(needle, (byte) 'a');
needle[needle.length - 1] = 'b';
return needle;
}
@Override
byte[] getHaystack(Random rnd) {
// aa(...)aaa
byte[] haystack = new byte[2048];
Arrays.fill(haystack, (byte) 'a');
return haystack;
}
};
abstract byte[] getNeedle(Random rnd);
abstract byte[] getHaystack(Random rnd);
}
@Param
public Input input;
@Param
public ByteBufType bufferType;
private Random rnd;
private ByteBuf needle, haystack;
private byte[] needleBytes, haystackBytes;
private SearchProcessorFactory kmpFactory, bitapFactory, ahoCorasicFactory;
@Setup
public void setup() {
rnd = new Random(SEED);
needleBytes = input.getNeedle(rnd);
haystackBytes = input.getHaystack(rnd);
needle = Unpooled.wrappedBuffer(needleBytes);
haystack = bufferType.newBuffer(haystackBytes);
kmpFactory = AbstractSearchProcessorFactory.newKmpSearchProcessorFactory(needleBytes);
ahoCorasicFactory = AbstractMultiSearchProcessorFactory.newAhoCorasicSearchProcessorFactory(needleBytes);
if (needleBytes.length <= 64) {
bitapFactory = AbstractSearchProcessorFactory.newBitapSearchProcessorFactory(needleBytes);
}
}
@TearDown
public void teardown() {
needle.release();
haystack.release();
}
@Benchmark
@CompilerControl(Mode.DONT_INLINE)
public int indexOf() {
return ByteBufUtil.indexOf(needle, haystack);
}
@Benchmark
@CompilerControl(Mode.DONT_INLINE)
public int kmp() {
return haystack.forEachByte(kmpFactory.newSearchProcessor());
}
@Benchmark
@CompilerControl(Mode.DONT_INLINE)
public int bitap() {
return haystack.forEachByte(bitapFactory.newSearchProcessor());
}
@Benchmark
@CompilerControl(Mode.DONT_INLINE)
public int ahoCorasic() {
return haystack.forEachByte(ahoCorasicFactory.newSearchProcessor());
}
private static byte[] randomBytes(Random rnd, int size, int from, int to) {
byte[] bytes = new byte[size];
for (int i = 0; i < size; i++) {
bytes[i] = (byte) (from + rnd.nextInt(to - from + 1));
}
return bytes;
}
}
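The benchmark above only creates a Bitap factory when the needle is at most 64 bytes long. The reason is easiest to see in a bit mask based search where the whole match state lives in a single `long`, one bit per needle position. The sketch below uses the Shift-And formulation in plain Java. It is an illustration under that assumption, not the processor added by this commit; `BitapSketch` and `indexOfMatchEnd` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not Netty's Bitap processor.
final class BitapSketch {
    private BitapSketch() { }

    /** Returns the index of the last byte of the first match, or -1 if there is none. */
    static int indexOfMatchEnd(byte[] haystack, byte[] needle) {
        if (needle.length == 0 || needle.length > 64) {
            // One bit of a long per needle position, hence the 64-byte limit.
            throw new IllegalArgumentException("needle length must be in 1..64, got " + needle.length);
        }
        // masks[c] has bit i set when needle[i] == c.
        long[] masks = new long[256];
        for (int i = 0; i < needle.length; i++) {
            masks[needle[i] & 0xff] |= 1L << i;
        }
        long matchBit = 1L << (needle.length - 1);
        long state = 0; // bit j set: needle[0..j] matches the haystack ending at the current byte
        for (int i = 0; i < haystack.length; i++) {
            state = ((state << 1) | 1L) & masks[haystack[i] & 0xff];
            if ((state & matchBit) != 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] haystack = "ababab".getBytes(StandardCharsets.US_ASCII);
        byte[] needle = "bab".getBytes(StandardCharsets.US_ASCII);
        System.out.println(indexOfMatchEnd(haystack, needle));
    }
}
```

The loop body is a shift, an OR, an AND and one table lookup with no data-dependent branching apart from the match check, which is consistent with the "stable performance on any input" remark in the commit message.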

View File

@ -0,0 +1,185 @@
/*
* Copyright 2020 The Netty Project
*
* The Netty Project licenses this file to you under the Apache License,
* version 2.0 (the "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at:
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
* WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
* License for the specific language governing permissions and limitations
* under the License.
*/
package io.netty.microbench.search;
import io.netty.buffer.ByteBuf;
import io.netty.buffer.search.AbstractMultiSearchProcessorFactory;
import io.netty.buffer.search.AbstractSearchProcessorFactory;
import io.netty.buffer.search.SearchProcessor;
import io.netty.buffer.search.SearchProcessorFactory;
import io.netty.microbench.util.AbstractMicrobenchmark;
import io.netty.util.internal.ResourcesUtil;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.CompilerControl;
import org.openjdk.jmh.annotations.CompilerControl.Mode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.TearDown;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.concurrent.TimeUnit;
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 5)
@Measurement(iterations = 5)
@Fork(1)
public class SearchRealDataBenchmark extends AbstractMicrobenchmark {
public enum Algorithm {
AHO_CORASIC {
@Override
SearchProcessorFactory newFactory(byte[] needle) {
return AbstractMultiSearchProcessorFactory.newAhoCorasicSearchProcessorFactory(needle);
}
},
KMP {
@Override
SearchProcessorFactory newFactory(byte[] needle) {
return AbstractSearchProcessorFactory.newKmpSearchProcessorFactory(needle);
}
},
BITAP {
@Override
SearchProcessorFactory newFactory(byte[] needle) {
return AbstractSearchProcessorFactory.newBitapSearchProcessorFactory(needle);
}
};
abstract SearchProcessorFactory newFactory(byte[] needle);
}
@Param
public Algorithm algorithm;
@Param
public ByteBufType bufferType;
private ByteBuf haystack;
private SearchProcessorFactory[] searchProcessorFactories;
private SearchProcessorFactory searchProcessorFactory;
private static final byte[][] NEEDLES = {
"Thank You".getBytes(),
"* Does not exist *".getBytes(),
"<li>".getBytes(),
"<body>".getBytes(),
"</li>".getBytes(),
"github.com".getBytes(),
" Does not exist 2 ".getBytes(),
"</html>".getBytes(),
"\"https://".getBytes(),
"Netty 4.1.45.Final released".getBytes()
};
private int needleId, searchFrom, haystackLength;
@Setup
public void setup() throws IOException {
File haystackFile = ResourcesUtil.getFile(SearchRealDataBenchmark.class, "netty-io-news.html");
byte[] haystackBytes = readBytes(haystackFile);
haystack = bufferType.newBuffer(haystackBytes);
needleId = 0;
searchFrom = 0;
haystackLength = haystack.readableBytes();
searchProcessorFactories = new SearchProcessorFactory[NEEDLES.length];
for (int i = 0; i < NEEDLES.length; i++) {
searchProcessorFactories[i] = algorithm.newFactory(NEEDLES[i]);
}
}
@Setup(Level.Invocation)
public void invocationSetup() {
needleId = (needleId + 1) % searchProcessorFactories.length;
searchProcessorFactory = searchProcessorFactories[needleId];
}
@TearDown
public void teardown() {
haystack.release();
}
@Benchmark
@CompilerControl(Mode.DONT_INLINE)
public int findFirst() {
return haystack.forEachByte(searchProcessorFactory.newSearchProcessor());
}
@Benchmark
@CompilerControl(Mode.DONT_INLINE)
public int findFirstFromIndex() {
searchFrom = (searchFrom + 100) % haystackLength;
return haystack.forEachByte(
searchFrom, haystackLength - searchFrom, searchProcessorFactory.newSearchProcessor());
}
@Benchmark
@CompilerControl(Mode.DONT_INLINE)
public void findAll(Blackhole blackHole) {
SearchProcessor searchProcessor = searchProcessorFactory.newSearchProcessor();
int pos = 0;
do {
pos = haystack.forEachByte(pos, haystackLength - pos, searchProcessor) + 1;
blackHole.consume(pos);
} while (pos > 0);
}
private static byte[] readBytes(File file) throws IOException {
InputStream in = new FileInputStream(file);
try {
ByteArrayOutputStream out = new ByteArrayOutputStream();
try {
byte[] buf = new byte[8192];
for (;;) {
int ret = in.read(buf);
if (ret < 0) {
break;
}
out.write(buf, 0, ret);
}
return out.toByteArray();
} finally {
safeClose(out);
}
} finally {
safeClose(in);
}
}
private static void safeClose(InputStream in) {
try {
in.close();
} catch (IOException ignored) { }
}
private static void safeClose(OutputStream out) {
try {
out.close();
} catch (IOException ignored) { }
}
}

View File

@ -0,0 +1,19 @@
/*
* Copyright 2020 The Netty Project
*
* The Netty Project licenses this file to you under the Apache License,
* version 2.0 (the "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at:
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
* WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
* License for the specific language governing permissions and limitations
* under the License.
*/
/**
* Benchmarks for search ({@link io.netty.buffer.search} and {@link io.netty.buffer.ByteBufUtil#indexOf}).
*/
package io.netty.microbench.search;

View File

@ -0,0 +1,349 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Netty.news: Netty 4.1.45.Final released</title>
<meta content="width=device-width, initial-scale=1.0" name="viewport">
<link href="../../../../images/favicon.ico" rel="shortcut icon">
<link href="//feeds.feedburner.com/netty_project" rel="alternate" title="News Feed" type="application/rss+xml">
<style>
body {
padding-top: 60px;
}
</style>
<link href="//netdna.bootstrapcdn.com/bootstrap/3.2.0/css/bootstrap.min.css" media="screen" rel="stylesheet" type="text/css">
<link href="//netdna.bootstrapcdn.com/font-awesome/4.0.3/css/font-awesome.min.css" media="screen" rel="stylesheet" type="text/css">
<script src="../../../../lib/sh/scripts/shCore.js" type="text/javascript"></script>
<script src="../../../../lib/sh/scripts/shBrushXml.js" type="text/javascript"></script>
<link href="../../../../lib/sh/styles/shCore.css" rel="stylesheet" type="text/css">
<link href="../../../../lib/sh/styles/shThemeDefault.css" rel="stylesheet" type="text/css">
<link href="../../../../lib/common.css" rel="stylesheet" type="text/css">
<script src="../../../../lib/common.js" type="text/javascript"></script>
<!--[if lt IE 9]>
<script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7/html5shiv.js" type="text/javascript"></script>
<script src="//cdnjs.cloudflare.com/ajax/libs/respond.js/1.3.0/respond.js" type="text/javascript"></script>
<![endif]-->
</head>
<body>
<a class="sr-only" href="#content" id="top">Skip navigation</a>
<nav class="navbar navbar-default navbar-fixed-top hidden-print" id="header" role="navigation">
<div class="container">
<div class="navbar-header">
<button class="navbar-toggle" data-target=".navbar-collapse" data-toggle="collapse" type="button">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="../../../../index.html">
<span class="navbar-brand-logo"></span>
Netty project
</a>
</div>
<div class="navbar-collapse collapse">
<ul class="nav navbar-nav">
<li class="dropdown">
<a href="../../../../news/2020/01/13/4-1-45-Final.html">
News
</a>
<ul class="dropdown-menu">
<li>
<a href="../../../../news/index.html">
<i class="fa fa-archive"></i>
Archive
</a>
</li>
</ul>
</li>
<li class="dropdown">
<a href="../../../../downloads.html">
Downloads
</a>
<ul class="dropdown-menu">
<li>
<a href="https://dl.bintray.com/netty/downloads/netty-4.1.45.Final.tar.bz2">
<i class="fa fa-cloud-download"></i>
4.1.45.Final
<small>&dash; 13-Jan-2020</small>
</a>
</li>
<li>
<a href="https://dl.bintray.com/netty/downloads/netty-4.0.56.Final.tar.bz2">
<i class="fa fa-cloud-download"></i>
4.0.56.Final
<small>&dash; 05-Feb-2018</small>
</a>
</li>
<li>
<a href="https://dl.bintray.com/netty/downloads/netty-3.10.6.Final-dist.tar.bz2">
<i class="fa fa-cloud-download"></i>
3.10.6.Final
<small>&dash; 29-Jun-2016</small>
</a>
</li>
<li>
<a href="https://www.tldrlegal.com/l/APACHE2">
<i class="fa fa-gavel"></i>
Apache License 2.0
</a>
</li>
<li>
<a href="https://bintray.com/netty/downloads/netty/">
<i class="fa fa-archive"></i>
Previous Releases
</a>
</li>
<li>
<a href="https://oss.sonatype.org/content/repositories/snapshots/io/netty/">
<i class="fa fa-flask"></i>
Nightly Builds
</a>
</li>
</ul>
</li>
<li class="dropdown">
<a href="../../../../wiki/index.html">
Documentation
</a>
<ul class="dropdown-menu">
<li>
<a href="../../../../wiki/user-guide.html">
<i class="fa fa-book"></i>
User guide
</a>
</li>
<li>
<a href="../../../../4.1/api/index.html">
<i class="fa fa-file-text"></i>
Javadoc - 4.1
</a>
</li>
<li>
<a href="../../../../4.0/api/index.html">
<i class="fa fa-file-text"></i>
Javadoc - 4.0
</a>
</li>
<li>
<a href="../../../../3.10/api/index.html">
<i class="fa fa-file-text"></i>
Javadoc - 3.10
</a>
</li>
<li>
<a href="../../../../wiki/all-documents.html">
<i class="fa fa-list"></i>
All Documents
</a>
</li>
<li>
<a href="../../../../wiki/related-articles.html">
<i class="fa fa-bookmark"></i>
Related Articles
</a>
</li>
<li class="hidden-xs" id="bookpromo-dropdown">
<a href="https://www.manning.com/maurer/">
<img src="../../../../images/netty-in-action.gif">
<br>
<small>
Use code <strong>mlnettyco</strong>
<br>
for a 37% discount!
</small>
</a>
</li>
</ul>
</li>
<li class="dropdown">
<a href="../../../../community.html">
Get Involved
</a>
<ul class="dropdown-menu">
<li>
<a href="https://github.com/netty/netty">
<i class="fa fa-github-square"></i>
Github
</a>
</li>
<li>
<a href="https://stackoverflow.com/questions/tagged/netty">
<i class="fa fa-stack-overflow"></i>
StackOverflow
</a>
</li>
<li>
<a href="https://twitter.com/netty_project">
<i class="fa fa-twitter-square"></i>
@netty_project
</a>
</li>
<li>
<a href="../../../../wiki/developer-guide.html">
<i class="fa fa-cogs"></i>
Developer Guide
</a>
</li>
<li>
<a href="https://webchat.freenode.net/?channels=%23netty&amp;uio=MT1mYWxzZSY5PXRydWU13">
<i class="fa fa-comment"></i>
IRC Chat
</a>
</li>
<li>
<a href="../../../../sponsor/thanks.html">
<i class="fa fa-usd"></i>
Sponsors
</a>
</li>
<li>
<a href="../../../../wiki/adopters.html">
<i class="fa fa-users"></i>
Adopters
</a>
</li>
<li>
<a href="../../../../wiki/related-projects.html">
<i class="fa fa-chain"></i>
Related Projects
</a>
</li>
</ul>
</li>
<li class="visible-xs" id="bookpromo-nav">
<a href="https://www.manning.com/maurer/">
<img src="../../../../images/netty-in-action.gif">
<br>
<small>
Use code <strong>mlnettyco</strong>
<br>
for a 37% discount!
</small>
</a>
</li>
<li>
<a href="https://feeds.feedburner.com/netty_project">
<i class="fa fa-rss"></i>
</a>
</li>
</ul>
<form action="../../../../search.html" class="navbar-form navbar-right hidden-sm" method="GET" onsubmit="return validateGlobalSearchQuery()" role="search">
<div class="form-group">
<input class="search-query form-control" id="global-search-query" name="q" placeholder="Search" type="text">
</div>
</form>
</div>
</div>
</nav>
<div id="content">
<div class="container">
<div class="row">
<div class="col-md-9">
<div class="news-item" id="main-content">
<h1>
Netty 4.1.45.Final released
</h1>
<p class="byline">
<small>
by
<a href="https://github.com/normanmaurer">normanmaurer</a>
<br>
on
<time datetime="2020-01-13">13-Jan-2020</time>
</small>
</p>
<div class="news-content">
<p>I am happy to announce the release of netty 4.1.45.Final, our first release of 2020. This is a bug-fix release which also fixes two regressions. Please upgrade as soon as possible.</p>
<p>For more details please read-on.</p>
<p>The most important changes in this release are:</p>
<ul>
<li>Fix BufferOverflowException during non-Unsafe PooledDirectByteBuf resize (<a href="https://github.com/netty/netty/pull/9912">#9912</a>)</li>
<li>FlushConsolidationHandler may suppress flushes by mistake (<a href="https://github.com/netty/netty/pull/9931">#9931</a>)</li>
<li>Utf8FrameValidator must release buffer when validation fails (<a href="https://github.com/netty/netty/pull/9909">#9909</a>)</li>
<li>Avoid possible comparison contract violation (<a href="https://github.com/netty/netty/pull/9883">#9883</a>)</li>
<li>Ignore inline comments when parsing nameservers (<a href="https://github.com/netty/netty/pull/9894">#9894</a>)</li>
</ul>
<p>For the details and all changes, please browse our issue tracker for <a href="https://github.com/netty/netty/milestone/219?closed=1">4.1.45.Final</a>.</p>
<h1>Thank You</h1>
<p>Every idea and bug-report counts and so we thought it is worth mentioning those who helped in this area. Please report an unintended omission.</p>
<ul>
<li><a href="https://github.com/anuraaga">@anuraaga</a></li>
<li><a href="https://github.com/bishwenduk029">@bishwenduk029</a></li>
<li><a href="https://github.com/carryxyh">@carryxyh</a></li>
<li><a href="https://github.com/cilki">@cilki</a></li>
<li><a href="https://github.com/Delorien84">@Delorien84</a></li>
<li><a href="https://github.com/denyska">@denyska</a></li>
<li><a href="https://github.com/doom369">@doom369</a></li>
<li><a href="https://github.com/franz1981">@franz1981</a></li>
<li><a href="https://github.com/gerdriesselmann">@gerdriesselmann</a></li>
<li><a href="https://github.com/gilgamesjh">@gilgamesjh</a></li>
<li><a href="https://github.com/hyperxpro">@hyperxpro</a></li>
<li><a href="https://github.com/ikhoon">@ikhoon</a></li>
<li><a href="https://github.com/johnou">@johnou</a></li>
<li><a href="https://github.com/kamma-cc">@kamma-cc</a></li>
<li><a href="https://github.com/njhill">@njhill</a></li>
<li><a href="https://github.com/normanmaurer">@normanmaurer</a></li>
<li><a href="https://github.com/ursaj">@ursaj</a></li>
<li><a href="https://github.com/Scottmitch">@Scottmitch</a></li>
</ul>
</div>
<ul class="pager">
<li class="previous">
<a href="../../../../news/2019/12/18/4-1-44-Final.html">&larr; Older</a>
</li>
<li>
<a href="../../../index.html">List all news items</a>
</li>
<li class="next disabled">
<a href="#">Newer &rarr;</a>
</li>
</ul>
</div>
<div class="comments">
<div id="disqus_thread"></div>
<script type="text/javascript">
var disqus_shortname = 'netty0';
var disqus_url = "http://netty.io/news/2020/01/13/4-1-45-Final.html";
var disqus_developer = null;
var disqus_identifier = "news/2020-01-13-4-1-45-Final";
(function() {
var dsq = document.createElement("script"); dsq.type = "text/javascript"; dsq.async = true;
dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js';
(document.getElementsByTagName("head")[0] || document.getElementsByTagName("body")[0]).appendChild(dsq);
})();
</script>
<noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript=netty0">comments powered by Disqus.</a></noscript>
</div>
</div>
<div class="col-md-3">
<div id="twitter-timeline" class="hidden-xs hidden-sm hidden-print" role="complementary"><a class="twitter-timeline" href="https://twitter.com/netty_project" data-widget-id="412058459593383936">Tweets by @netty_project</a><script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?"http":"https";if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script></div>
</div>
</div>
</div>
</div>
<div class="container">
<hr>
<div id="footer">
<p>
Copyright &copy; 2020
<a href="../../../../index.html">The Netty project</a>
</p>
</div>
</div>
<script src="//cdnjs.cloudflare.com/ajax/libs/jquery/2.0.3/jquery.min.js" type="text/javascript"></script>
<script src="//netdna.bootstrapcdn.com/bootstrap/3.2.0/js/bootstrap.min.js" type="text/javascript"></script>
<script src="../../../../lib/common.footer.js" type="text/javascript"></script>
<script type="text/javascript">
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-95307-5', 'auto');
ga('require', 'displayfeatures');
ga('require', 'linkid', 'linkid.js');
ga('send', 'pageview');
</script>
</body>
</html>