Bit shift operation in parallel prefix sum

Posted by flvtvl50 on 2022-09-26 in Other


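The code computes a prefix sum in parallel and is taken from OpenGL SuperBible 10.

The shader shown has a local work group size of 1024, which means it will process an array of 2048 elements, because each invocation computes two elements of the output array. The shared variable shared_data is used to store the data that is in flight. When execution starts, the shader loads two adjacent elements from the input array into the shared array. Next, it executes the barrier() function. This step ensures that all shader invocations have loaded their data into the shared array before the inner loop begins.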


#version 450 core

layout (local_size_x = 1024) in;
layout (binding = 0) coherent buffer block1
{
    // Two elements per invocation, so the array holds 2 * 1024 floats
    float input_data[gl_WorkGroupSize.x * 2];
};
layout (binding = 1) coherent buffer block2
{
    // Two elements per invocation, so the array holds 2 * 1024 floats
    float output_data[gl_WorkGroupSize.x * 2];
};
shared float shared_data[gl_WorkGroupSize.x * 2];
void main(void)
{
    uint id = gl_LocalInvocationID.x;
    uint rd_id;
    uint wr_id;
    uint mask;
    // The number of steps is the log base 2 of the
    // work group size, which should be a power of 2
    const uint steps = uint(log2(gl_WorkGroupSize.x)) + 1;
    uint step = 0;
    // Each invocation is responsible for the content of
    // two elements of the output array
    shared_data[id * 2] = input_data[id * 2];
    shared_data[id * 2 + 1] = input_data[id * 2 + 1];
    // Synchronize to make sure that everyone has initialized
    // their elements of shared_data[] with data loaded from
    // the input arrays
    barrier();
    memoryBarrierShared();
    // For each step...
    for (step = 0; step < steps; step++)
    {
        // Calculate the read and write index in the
        // shared array
        mask = (1 << step) - 1;
        rd_id = ((id >> step) << (step + 1)) + mask;
        wr_id = rd_id + 1 + (id & mask);
        // Accumulate the read data into our element
        shared_data[wr_id] += shared_data[rd_id];
        // Synchronize again to make sure that everyone
        // has caught up with us
        barrier();
        memoryBarrierShared();
    }
    // Finally write our data back to the output image
    output_data[id * 2] = shared_data[id * 2];
    output_data[id * 2 + 1] = shared_data[id * 2 + 1];
}

How can I intuitively understand the bit shift operations used for rd_id and wr_id? Why do they work?

opengl

Source: https://stackoverflow.com/questions/73256737/bit-shift-operation-in-parallel-prefix-sum

1 Answer


yrdbyhpb1#

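When we say something is "intuitive", we usually mean that our understanding is deep enough that we are not aware of our own thought processes and "know the answer" without consciously thinking about it. Here the author is using the binary representation of integers inside a CPU/GPU to make the code shorter and (probably) slightly faster. The code will only be "intuitive" to someone who is very familiar with such encodings and with binary operations on integers. I'm not, so I had to think about what is going on.

I would recommend working through this code, since these kinds of operations do occur in high-performance graphics and other programming. If you find it interesting, it will eventually become intuitive. If not, that's fine as long as you can figure things out when necessary.

One approach is to copy this code into a C/C++ program and print out mask, rd_id, wr_id, and so on. You wouldn't actually need the data arrays, or the calls to barrier() and memoryBarrierShared(). Make up values for the invocation ID and work group size based on what the SuperBible example does. That might be enough for you to say "Aha! I see."

If you aren't familiar with the << and >> shifts, I suggest writing some tiny programs and printing out the numbers that result. Python might actually be slightly easier, since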
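Here is a minimal sketch of that experiment, written in Python rather than C/C++ so the output is easy to read. The function name scan_indices and the small work group size of 4 are my own choices for illustration; the index arithmetic is copied from the shader.

```python
def scan_indices(work_group_size):
    """Yield (step, id, mask, rd_id, wr_id) using the shader's index arithmetic."""
    # For a power of two, bit_length() equals log2(size) + 1,
    # which matches the shader's "steps" constant.
    steps = work_group_size.bit_length()
    for step in range(steps):
        mask = (1 << step) - 1
        for inv_id in range(work_group_size):
            rd_id = ((inv_id >> step) << (step + 1)) + mask
            wr_id = rd_id + 1 + (inv_id & mask)
            yield step, inv_id, mask, rd_id, wr_id


print("step id mask rd_id wr_id")
for row in scan_indices(4):
    print("{:4d} {:2d} {:4d} {:5d} {:5d}".format(*row))
```

With a work group size of 4, the last step (step 2) has every invocation read index 3 and write indices 4 to 7: the last element of the first half of the block is added to each element of the second half.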

print("{:016b}".format(mask))

will show you the actual bits, whereas in C you can only print out in hex.
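As a small sketch of such a tiny program, the following prints a value next to its left-shifted and right-shifted versions in binary. The helper name bits and the sample value 5 are just for illustration.

```python
def bits(value, width=8):
    """Format a non-negative integer as a zero-padded binary string."""
    return "{:0{}b}".format(value, width)


x = 5
for n in range(4):
    # Left shift moves bits toward the high end; right shift drops low bits.
    print(n, bits(x), bits(x << n), bits(x >> n))
```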

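First off, for a power of two, log2 returns the exponent, which is the position of its single set bit: log2(256) will be 8, log2(4096) 12, and so on. (Don't take my word for it, write some code.) In the shader, log2(1024) is 10, so steps is 11, one step per doubling up to the 2048 elements held in shared_data.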
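A quick check of the log2 values, as a sketch. The function name log2_and_bits is hypothetical; it also reports int.bit_length(), which for a power of two is one more than the exponent.

```python
import math


def log2_and_bits(size):
    """Return (log2 exponent, number of bits needed) for a power-of-two size."""
    return int(math.log2(size)), size.bit_length()


for size in (256, 1024, 4096):
    exponent, n_bits = log2_and_bits(size)
    print(size, exponent, n_bits)
```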

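x << n multiplies x by 2 to the power n, so x << 1 is x * 2, x << 2 is x * 4, and so on. x >> n divides by 2, 4, 8, ... instead, discarding the remainder. (Very important: this only holds for non-negative integers! Again, write some code to find out what happens.)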
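The sketch below checks the multiply/divide equivalence for a few non-negative values and then shows one negative case. Note that this is Python behaviour: its >> rounds toward negative infinity, while a truncating division rounds toward zero, so the two disagree for negative odd values. The function name is my own.

```python
def shift_matches_arithmetic(x, n):
    """True if shifting non-negative x by n matches multiply/divide by 2**n."""
    return (x << n) == x * 2 ** n and (x >> n) == x // 2 ** n


for x in (1, 5, 13, 1024):
    for n in range(4):
        assert shift_matches_arithmetic(x, n)

# Negative value: the shift rounds down, truncating division rounds toward zero.
shifted = -7 >> 1
truncated = int(-7 / 2)
print(shifted, truncated)
```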

The mask calculations are interesting. Try

mask = (1 << step);

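first and see what values come out. This is a common pattern for selecting an individual bit. The extra -1 instead generates all the bits to the right of it, so (1 << step) - 1 has its lowest step bits set.

Combining a value with the & (AND) operator and a mask that has 0s on the left and 1s on the right is a faster way of doing integer modulo (%) by a power of 2.

Finally, the rd_id and wr_id array indexes need to start from base positions in the array, derived from the invocation ID and work group size, and increment according to the pattern explained in the SuperBible text.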
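To see both patterns side by side, this sketch prints the single-bit value and the mask for the first few steps. The helper name low_bits_mask is hypothetical.

```python
def low_bits_mask(step):
    """Return a mask with the lowest `step` bits set, as in the shader."""
    return (1 << step) - 1


for step in range(5):
    single_bit = 1 << step
    print(step, "{:08b}".format(single_bit), "{:08b}".format(low_bits_mask(step)))
```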
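A short sketch to convince yourself that masking and modulo agree for non-negative integers. The function name and the test range of 64 values are arbitrary choices.

```python
def and_equals_mod(step, limit=64):
    """Check that i & mask equals i % 2**step for 0 <= i < limit."""
    mask = (1 << step) - 1
    return all((i & mask) == (i % (1 << step)) for i in range(limit))


for step in range(6):
    print(step, and_equals_mod(step))
```

In the shader, id & mask is therefore the invocation's offset within its group of 2^step invocations.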
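To check that the pattern really produces a prefix sum, here is a sketch that replays the shader's loop sequentially in Python and compares the result with itertools.accumulate. The function name shader_style_scan is my own, and it assumes the input length is a power of two of at least 2. Running the invocations one after another is safe here because, within a step, every rd_id has bit `step` clear and every wr_id has it set, so no invocation reads an element that another one writes in the same step.

```python
from itertools import accumulate


def shader_style_scan(values):
    """Inclusive prefix sum using the shader's rd_id/wr_id arithmetic."""
    work_group_size = len(values) // 2       # each invocation owns two elements
    steps = work_group_size.bit_length()     # log2(size) + 1 for a power of two
    shared = list(values)
    for step in range(steps):
        mask = (1 << step) - 1
        for inv_id in range(work_group_size):
            # Last element of the first half of this invocation's block.
            rd_id = ((inv_id >> step) << (step + 1)) + mask
            # One element in the second half of the same block.
            wr_id = rd_id + 1 + (inv_id & mask)
            shared[wr_id] += shared[rd_id]
    return shared


data = [1, 2, 3, 4, 5, 6, 7, 8]
print(shader_style_scan(data))
print(list(accumulate(data)))
```

At step s the array is treated as blocks of 2^(s+1) elements: (id >> step) << (step + 1) is the start of the block, adding mask lands on the last element of its first half, and id & mask selects which element of the second half this invocation updates.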

2022-09-26
