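Optimizing Gaussian heatmap generation in PyTorch

I have a set of 68 keypoints (a tensor of size [68, 2]) that I am mapping onto Gaussian heatmaps. To do this, I use the following function: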

import numpy as np
import torch

def generate_gaussian(t, x, y, sigma=10):
    """
    Generates a 2D Gaussian point at location x,y in tensor t.
    x should be in range (-1, 1).
    sigma is the standard deviation of the generated 2D Gaussian.
    """
    h,w = t.shape

    # Map x, y from [-1, 1] to the pixel coordinates of the Gaussian center
    mu_x = int(0.5 * (x + 1.) * w)
    mu_y = int(0.5 * (y + 1.) * h)

    tmp_size = sigma * 3

    # Top-left
    x1,y1 = int(mu_x - tmp_size), int(mu_y - tmp_size)

    # Bottom right
    x2, y2 = int(mu_x + tmp_size + 1), int(mu_y + tmp_size + 1)
    if x1 >= w or y1 >= h or x2 < 0 or y2 < 0:
        return t

    size = 2 * tmp_size + 1
    tx = np.arange(0, size, 1, np.float32)
    ty = tx[:, np.newaxis]
    x0 = y0 = size // 2

    # The gaussian is not normalized, we want the center value to equal 1
    g = torch.tensor(np.exp(- ((tx - x0) ** 2 + (ty - y0) ** 2) / (2 * sigma ** 2)))

    # Determine the bounds of the source gaussian
    g_x_min, g_x_max = max(0, -x1), min(x2, w) - x1
    g_y_min, g_y_max = max(0, -y1), min(y2, h) - y1

    # Image range
    img_x_min, img_x_max = max(0, x1), min(x2, w)
    img_y_min, img_y_max = max(0, y1), min(y2, h)

    t[img_y_min:img_y_max, img_x_min:img_x_max] = \
      g[g_y_min:g_y_max, g_x_min:g_x_max]

    return t

def rescale(a, img_size):
    # scale tensor to [-1, 1]
    return 2 * a / img_size[0] - 1

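My current code uses a for loop to compute the Gaussian heatmap for each of the 68 keypoint coordinates, then stacks the resulting tensors to create a [68, H, W] tensor: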

x_k1 = [generate_gaussian(torch.zeros(H, W), x, y) for x, y in rescale(kp1.numpy(), frame.shape)]
x_k1 = torch.stack(x_k1, dim=0)

However, this method is very slow. Is there a way to do this without the for loop?

Edit:
I tried Cris Luengo's suggestion to compute a 1D Gaussian:

def generate_gaussian1D(t, x, y, sigma=10):
    h,w = t.shape

    # Map x, y from [-1, 1] to the pixel coordinates of the Gaussian center
    mu_x = int(0.5 * (x + 1.) * w)
    mu_y = int(0.5 * (y + 1.) * h)

    tmp_size = sigma * 3

    # Top-left
    x1, y1 = int(mu_x - tmp_size), int(mu_y - tmp_size)

    # Bottom right
    x2, y2 = int(mu_x + tmp_size + 1), int(mu_y + tmp_size + 1)
    if x1 >= w or y1 >= h or x2 < 0 or y2 < 0:
        return t

    size = 2 * tmp_size + 1
    tx = np.arange(0, size, 1, np.float32)
    ty = tx[:, np.newaxis]
    x0 = y0 = size // 2

    g = torch.tensor(np.exp(-np.power(tx - mu_x, 2.) / (2 * np.power(sigma, 2.))))
    g = g * g[:, None]

    g_x_min, g_x_max = max(0, -x1), min(x2, w) - x1
    g_y_min, g_y_max = max(0, -y1), min(y2, h) - y1

    img_x_min, img_x_max = max(0, x1), min(x2, w)
    img_y_min, img_y_max = max(0, y1), min(y2, h)

    t[img_y_min:img_y_max, img_x_min:img_x_max] = \
      g[g_y_min:g_y_max, g_x_min:g_x_max]

    return t

But my output is an incomplete Gaussian.

I'm not sure what I'm doing wrong. Any help would be appreciated.

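Generate an NxN array g containing a Gaussian centered on its middle pixel. Choose N so that the array extends 3*sigma from that center pixel in each direction. This is the fastest way I know to build such an array: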

tmp_size = sigma * 3
tx = np.arange(1, tmp_size + 1, 1, np.float32)
g = np.exp(-(tx**2) / (2 * sigma**2))
g = np.concatenate((np.flip(g), [1], g))
g = g * g[:, None]

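What we do here is compute half of a 1D Gaussian. We don't even need to evaluate the Gaussian at the middle pixel, since we know its value is 1. We then build the full 1D Gaussian by flipping the half Gaussian and concatenating. Finally, the 2D Gaussian is built as the outer product of the 1D Gaussian with itself.
We could shave off a bit more time by building a quarter of the 2D Gaussian and concatenating four rotated copies of it. But the difference in computational cost is not large, and the approach above is much simpler. Note that np.exp is by far the most expensive operation here, so just minimizing how often it is called significantly reduces the computational cost.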
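As a sanity check on the construction above, the half-Gaussian version can be compared against the direct 2D formula from the question. This is only a sketch; the function names gaussian_from_half and gaussian_direct are made up for the comparison. It also illustrates a detail relevant to generate_gaussian1D: the 1D profile has to be centered on the kernel's own middle index (size // 2), not on the image coordinate mu_x, otherwise the peak is shifted out of the patch.

```python
import numpy as np

def gaussian_from_half(sigma):
    """Build the (6*sigma+1) x (6*sigma+1) patch from half of a 1D Gaussian."""
    tmp_size = sigma * 3
    tx = np.arange(1, tmp_size + 1, 1, np.float32)
    half = np.exp(-(tx ** 2) / (2 * sigma ** 2))       # one np.exp call on 3*sigma values
    g1d = np.concatenate((np.flip(half), [1], half))   # mirror around the known peak value 1
    return g1d * g1d[:, None]                          # outer product gives the 2D Gaussian

def gaussian_direct(sigma):
    """Reference: evaluate the 2D formula on the full grid, as in the question."""
    size = 2 * sigma * 3 + 1
    tx = np.arange(0, size, 1, np.float32)
    ty = tx[:, np.newaxis]
    x0 = y0 = size // 2                                # center of the patch, not of the image
    return np.exp(-((tx - x0) ** 2 + (ty - y0) ** 2) / (2 * sigma ** 2))

sigma = 10
a = gaussian_from_half(sigma)
b = gaussian_direct(sigma)
print(a.shape, np.allclose(a, b, atol=1e-5))
```

Both functions should agree up to floating-point rounding, which is why the comparison uses a tolerance instead of exact equality.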
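However, the best way to speed up the complete code is to compute the array g only once, rather than recomputing it for each keypoint. Note that sigma doesn't change, so every array g that gets computed is identical. If you compute it only once, it no longer matters which method you use to compute it, since this will be a tiny portion of the total program anyway.
For example, you could use a global variable _gaussian to hold the array, and have your function compute it only the first time it is called. Or you could split the function into two, one that constructs the array and one that copies it into the image, and call them like this: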

g = create_gaussian(sigma=3)
x_k1 = [
    copy_gaussian(torch.zeros(H, W), x, y, g)
    for x, y in rescale(kp1.numpy(), frame.shape)
]
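The two functions named in the call above are not spelled out, so here is one possible way to write them. This is a sketch assembled from generate_gaussian in the question and the half-Gaussian construction; the bodies of create_gaussian and copy_gaussian, the image size, and the sample keypoints are all assumptions made for illustration.

```python
import numpy as np
import torch

def create_gaussian(sigma=10):
    """Build the unnormalized 2D Gaussian patch once (peak value 1)."""
    tmp_size = sigma * 3
    tx = np.arange(1, tmp_size + 1, 1, np.float32)
    half = np.exp(-(tx ** 2) / (2 * sigma ** 2))
    g1d = np.concatenate((np.flip(half), [1], half)).astype(np.float32)
    return torch.from_numpy(g1d * g1d[:, None])

def copy_gaussian(t, x, y, g):
    """Copy the patch g into t, centered at (x, y) given in [-1, 1]."""
    h, w = t.shape
    mu_x = int(0.5 * (x + 1.) * w)
    mu_y = int(0.5 * (y + 1.) * h)
    tmp_size = g.shape[0] // 2                      # half-width of the patch

    x1, y1 = mu_x - tmp_size, mu_y - tmp_size       # top-left corner
    x2, y2 = mu_x + tmp_size + 1, mu_y + tmp_size + 1  # bottom-right (exclusive)
    if x1 >= w or y1 >= h or x2 <= 0 or y2 <= 0:
        return t                                    # patch lies entirely outside the image

    # Clip the patch and the target window to the image bounds
    g_x_min, g_x_max = max(0, -x1), min(x2, w) - x1
    g_y_min, g_y_max = max(0, -y1), min(y2, h) - y1
    img_x_min, img_x_max = max(0, x1), min(x2, w)
    img_y_min, img_y_max = max(0, y1), min(y2, h)

    t[img_y_min:img_y_max, img_x_min:img_x_max] = g[g_y_min:g_y_max, g_x_min:g_x_max]
    return t

# Hypothetical usage: two keypoints already scaled to [-1, 1]
H, W = 128, 128
g = create_gaussian(sigma=3)
kp = torch.tensor([[0.0, 0.0], [-1.0, -1.0]])
heatmaps = torch.stack(
    [copy_gaussian(torch.zeros(H, W), x, y, g) for x, y in kp.numpy()], dim=0
)
print(heatmaps.shape)
```

The loop over keypoints is still there, but each iteration is now only a clipped slice copy; the np.exp call happens once, outside the loop.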

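On the other hand, you may be better off using existing functionality. For example, DIPlib has a function dip.DrawBandlimitedPoint() [disclosure: I'm an author] that adds a Gaussian blob to an image. You will likely find similar functions in other libraries.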
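Coming back to the original question of dropping the Python loop entirely: one option that is not covered above is to evaluate the Gaussian for all keypoints at once with broadcasting in PyTorch. The sketch below is a different technique from the patch-copy approach, and the function name gaussian_heatmaps is made up. It also behaves slightly differently: the Gaussian is evaluated over the whole image rather than cut off at 3*sigma, so pixels far from a keypoint are very small but not exactly zero.

```python
import torch

def gaussian_heatmaps(kp, H, W, sigma=10.0):
    """kp: [K, 2] float tensor of (x, y) in [-1, 1]. Returns a [K, H, W] tensor."""
    # Same center convention as the loop version: truncate to an integer pixel.
    mu_x = (0.5 * (kp[:, 0] + 1.0) * W).long().float()
    mu_y = (0.5 * (kp[:, 1] + 1.0) * H).long().float()

    xs = torch.arange(W, dtype=torch.float32)
    ys = torch.arange(H, dtype=torch.float32)

    # One 1D Gaussian per keypoint along each axis
    gx = torch.exp(-(xs[None, :] - mu_x[:, None]) ** 2 / (2 * sigma ** 2))  # [K, W]
    gy = torch.exp(-(ys[None, :] - mu_y[:, None]) ** 2 / (2 * sigma ** 2))  # [K, H]

    # Batched outer product: [K, H, 1] * [K, 1, W] -> [K, H, W]
    return gy[:, :, None] * gx[:, None, :]

# Hypothetical usage with two keypoints already scaled to [-1, 1]
kp = torch.tensor([[0.0, 0.0], [0.5, -0.5]])
hm = gaussian_heatmaps(kp, 64, 64, sigma=3.0)
print(hm.shape)
```

Because it relies on the same separability as the outer-product construction, the cost is two exp calls on [K, W] and [K, H] tensors plus one broadcasted multiply. Whether this beats the precomputed-patch copy depends on the image size and on whether the tensors live on a GPU, so it is worth timing both.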
