a3c1832e86
Summary:
Crc32c parallel computation optimization.
The algorithm comes from the Intel whitepaper:
[crc-iscsi-polynomial-crc32-instruction-paper](https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/crc-iscsi-polynomial-crc32-instruction-paper.pdf)

Input data is divided into three equal-sized blocks:
Three parallel blocks (crc0, crc1, crc2) for 1024 bytes.
One block: 42 (BLK_LENGTH) * 8 (step length: crc32c_u64) bytes.

1. crc32c_test:
```
[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from CRC
[ RUN      ] CRC.StandardResults
[       OK ] CRC.StandardResults (1 ms)
[ RUN      ] CRC.Values
[       OK ] CRC.Values (0 ms)
[ RUN      ] CRC.Extend
[       OK ] CRC.Extend (0 ms)
[ RUN      ] CRC.Mask
[       OK ] CRC.Mask (0 ms)
[----------] 4 tests from CRC (1 ms total)
[----------] Global test environment tear-down
[==========] 4 tests from 1 test case ran. (1 ms total)
[  PASSED  ] 4 tests.
```

2. RocksDB benchmark: db_bench --benchmarks="crc32c"
```
Linear Arm crc32c:
  crc32c: 1.005 micros/op 995133 ops/sec; 3887.2 MB/s (4096 per op)
```
```
Parallel optimization with Armv8 crypto extension:
  crc32c: 0.419 micros/op 2385078 ops/sec; 9316.7 MB/s (4096 per op)
```
This is a ~2.4x speedup over the linear Arm crc32c instructions.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/5494

Differential Revision: D16340806

fbshipit-source-id: 95dae9a5b646fd20a8303671d82f17b2e162e945
33 lines
983 B
C
// Copyright (c) 2018, Arm Limited and affiliates. All rights reserved.
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).

#ifndef UTIL_CRC32C_ARM64_H
#define UTIL_CRC32C_ARM64_H

#include <cinttypes>

#if defined(__aarch64__) || defined(__AARCH64__)

#ifdef __ARM_FEATURE_CRC32
#define HAVE_ARM64_CRC
#include <arm_acle.h>
// Width-specific wrappers around the ACLE CRC32C intrinsics.
#define crc32c_u8(crc, v) __crc32cb(crc, v)
#define crc32c_u16(crc, v) __crc32ch(crc, v)
#define crc32c_u32(crc, v) __crc32cw(crc, v)
#define crc32c_u64(crc, v) __crc32cd(crc, v)

extern uint32_t crc32c_arm64(uint32_t crc, unsigned char const *data, unsigned len);
extern uint32_t crc32c_runtime_check(void);

#ifdef __ARM_FEATURE_CRYPTO
#define HAVE_ARM64_CRYPTO
#include <arm_neon.h>
#endif // __ARM_FEATURE_CRYPTO
#endif // __ARM_FEATURE_CRC32

#endif // defined(__aarch64__) || defined(__AARCH64__)

#endif // UTIL_CRC32C_ARM64_H