C/C++快速求两个级数的绝对差

t1rydlwq 于 2023-01-10 发布在 C/C++

关注(0)|答案(4)|浏览(252)

我感兴趣的是生成高效的c/c ++代码，以获得两个时间序列之间的差异。时间序列值存储为固定且相等长度== 128的uint16_t数组。

- 我擅长纯c和纯c实现。我的代码示例是c**

我的意图是：

Let A,B and C be discrete time series of length l with a value-type of uint16_t.
Vn[n<l]: Cn = |An - Bn|;

我能想到的伪代码：

for index i:
 if a[i] > b[i]:
    c[i] = a[i] - b[i]
 else:
    c[i] = b[i] - a[i]

或者在c/c ++中

for(uint8_t idx = 0; idx < 128; idx++){
    c[i] = a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
}

但是我真的不喜欢循环中的if/else语句。我可以接受循环--这可以由编译器展开。有点像：

void getBufDiff(const uint16_t (&a)[], const uint16_t (&b)[], uint16_t (&c)[]) {
#pragma unroll 16
    for (uint8_t i = 0; i < 128; i++) {
        c[i] = a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
    }
#end pragma
}

我正在寻找的是一个"魔术代码"，它可以加快if/else的速度，并让我得到两个无符号值之间的绝对差。
我可以接受+/- 1的精度（以防出现位魔术）。我也可以更改数据类型以获得更快的结果。我也可以删除循环以获得其他结果。
比如：

void getBufDiff(const uint16_t (&a)[], const uint16_t (&b)[], uint16_t (&c)[]) {
#pragma unroll 16
    for (uint8_t i = 0; i < 128; i++) {
        c[i] = magic_code_for_abs_diff(a[i],b[i]);
    }
#end pragma
}

已尝试对两个值进行异或运算。仅对其中一种情况给出正确的结果。

- 编辑2：**

在我的笔记本电脑上做了一个不同方法的快速测试。
对于250000000个条目，这是性能（256轮）：

c[i] = a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];  ~500ms
c[i] = std::abs(a[i] - b[i]);                    ~800ms
c[i] = ((a[i] - b[i]) + ((a[i] - b[i]) >> 15)) ^ (i >> 15) ~425ms
uint16_t tmp = (a[i] - b[i]); c[i] = tmp * ((tmp > 0) - (tmp < 0)); ~600ms
uint16_t ret[2] = { a[i] - b[i], b[i] - a[i] };c[i] = ret[a[i] < b[i]] ~900ms
c[i] = ((a[i] - b[i]) >> 31 | 1) * (a[i] - b[i]); ~375ms
c[i] = ((a[i] - b[i])) ^ ((a[i] - b[i]) >> 15) ~425ms

c++

来源：https://stackoverflow.com/questions/75046559/c-c-fast-absolute-difference-between-two-series

4条答案

按热度按时间

agxfikkp1#

你的问题很适合用SIMD处理。GCC可以自动处理，下面是一个简化的例子：https://godbolt.org/z/36nM8bYYv

void absDiff(const uint16_t* a, const uint16_t* b, uint16_t* __restrict__ c)
{
    for (uint8_t i = 0; i < 16; i++)
        c[i] = a[i] - b[i];
}

注意，我添加了__restrict__以启用自动向量化，否则编译器必须假设您的数组可能重叠，并且使用SIMD是不安全的（因为某些写入可能会更改循环中未来的读取）。
我将其简化为一次只有16个，为了便于说明，去掉了绝对值。生成的程序集为：

vld1.16 {q9}, [r0]!
    vld1.16 {q11}, [r1]!
    vld1.16 {q8}, [r0]
    vld1.16 {q10}, [r1]
    vsub.i16        q9, q9, q11
    vsub.i16        q8, q8, q10
    vst1.16 {q9}, [r2]!
    vst1.16 {q8}, [r2]
    bx      lr

这意味着它一次从a加载8个整数，然后从b加载，重复一次，然后一次做8个减法，然后再做一次，然后将8个值存储到c中两次。
当然，这需要进行基准测试，以确定这在您的系统上是否真的更快（在添加回绝对值部分后，我建议使用您的?:方法，它不会击败自动矢量化），但我预计它会明显更快。

赞(0）回复(0）举报 2023-01-10

rggaifut2#

尝试让编译器看到SIMD指令的条件通道选择模式，如下所示（伪代码）：

// store a,b to SIMD registers
for(0 to 32)
   a[...] = input[...]
   b[...] = input2[...]

// single type operation, easily parallelizable
for(0 to 32)
   vector1[...] = a[...] - b[...]

// single type operation, easily parallelizable
// maybe better to compute b-a to decrease dependency to first step
// since a and b are already in SIMD registers
for(0 to 32)
   vector2[...] = -vector1[...]

// single type operation, easily parallelizable
// re-use a,b registers, again
for(0 to 32)
   vector3[...] = a[...] < b[...]

// x84 architecture has SIMD instructions for this
// operation is simple, no other calculations inside, just 3 inputs, 1 out
// all operands are registers (at least should be, if compiler works fine)
for(0 to 32)
   vector4[...] = vector3[...] ? vector2[...]:vetor1[...]

如果你编写了自己的基准测试代码，我可以将其与其他解决方案进行比较，但对于好的编译器（或好的编译器标志）来说，这并不重要，因为它们会自动为第一个有问题的基准测试代码做同样的事情。

赞(0）回复(0）举报 2023-01-10

f3temu5u3#

快速abs（两个补码整数）可以实现为(x + (x >> N)) ^ (x >> N)，其中N是int-1的大小，在本例中为15。这是std::abs的一个可能实现。您仍然可以尝试它

通过freakish应答

赞(0）回复(0）举报 2023-01-10

xzlaal3s4#

既然你写的是“我可以接受+/- 1的精度”，你可以使用XOR解决方案：执行x ^ (x >> 15)而不是abs(x)。这将为负值给予相差1的结果。
如果您想计算正确的结果，即使是负值，请使用另一个答案（经过x >> 15校正）。
在任何情况下，这个异或技巧只有在溢出不可能的情况下才有效，因此编译器不能用使用异或的代码替换abs。

赞(0）回复(0）举报 2023-01-10

我来回答

C/C++快速求两个级数的绝对差

4条答案

相关问题

热门标签

最新问答