PyTorch: how do I calculate the theoretical inference time of a network based on the GPU?

Posted by gzszwxb4 12 months ago in Other


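I am trying to estimate how long a GPU needs to run inference on a deep learning network. However, when I test my approach, the theoretical and the measured computation times are completely different.
Here is what I am currently doing:
I obtain the FLOP count of the network with https://github.com/Lyken17/pytorch-OpCounter, as follows: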

from thop import profile  # provided by pytorch-OpCounter

macs, params = profile(model, inputs=(image, ))
tera_flop = macs * 2 * 1e-12  # 1 MAC = 2 FLOPs; scale to TFLOP

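This gives 0.0184295 TFLOP. Next, I compute the peak floating-point throughput of my GPU (NVIDIA RTX A3000):
4096 CUDA cores * 1560 MHz * 2 * 10^-6 = 12.78 TFLOPS
This gives me a theoretical inference time of:
0.0184 TFLOP / 12.7795 TFLOPS = 0.00144 s
Then I measure the real inference time with the following code: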
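The arithmetic above can be reproduced with a short sketch. The helper names `peak_tflops` and `ideal_latency_s` are hypothetical, and the calculation assumes each CUDA core retires one fused multiply-add (2 FLOPs) per clock cycle, which is the same assumption behind the factor of 2 in the formula. The result is only a lower bound on latency, since it assumes the GPU runs at peak throughput for the whole forward pass.

```python
def peak_tflops(cuda_cores: int, clock_mhz: float, flops_per_cycle: int = 2) -> float:
    """Peak FP32 throughput in TFLOPS.

    Assumption: every core issues one FMA (2 FLOPs) per clock cycle.
    """
    return cuda_cores * clock_mhz * 1e6 * flops_per_cycle / 1e12


def ideal_latency_s(macs: float, peak: float) -> float:
    """Best-case latency in seconds for a given MAC count at `peak` TFLOPS."""
    tflop = macs * 2 / 1e12  # 1 MAC = 2 FLOPs
    return tflop / peak


# Numbers taken from the question: RTX A3000, 0.0184295 TFLOP per forward pass.
peak = peak_tflops(cuda_cores=4096, clock_mhz=1560)
macs = 0.0184295e12 / 2
latency = ideal_latency_s(macs, peak)

print(f"peak: {peak:.2f} TFLOPS, ideal latency: {latency * 1e3:.2f} ms")
```

With the figures from the question this yields roughly 12.78 TFLOPS and 1.44 ms.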

import numpy as np
import torch

model.eval()
model.to(device)
image = image.unsqueeze(0).to(device)    

start, end = torch.cuda.Event(enable_timing=True), torch.cuda.Event(enable_timing=True)
reps = 300
timings = np.zeros((reps, 1))

# GPU warmup
for _ in range(10):
    _ = model(image)

# Measure performance
with torch.no_grad():
    for rep in range(reps):
        start.record()
        _ = model(image)
        end.record()
        # Wait for GPU to sync
        torch.cuda.synchronize()
        curr_time = start.elapsed_time(end)
        timings[rep] = curr_time

# elapsed_time() returns milliseconds; convert the mean to seconds
mean_syn = timings.mean() * 1e-3

This gives me a measured inference time of 0.028 s.
Can you help me figure out what I am doing wrong?
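To put a number on the gap, here is a minimal sketch that compares the two figures quoted above. The variable names are illustrative. The ratio can be read as the fraction of peak throughput actually achieved. It assumes the FLOP count and the timing are both accurate, and it says nothing about the cause of the gap.

```python
theoretical_s = 0.00144  # ideal time at peak throughput (from the question)
measured_s = 0.028       # mean time measured with CUDA events (from the question)

utilization = theoretical_s / measured_s  # fraction of peak FLOPS achieved
slowdown = measured_s / theoretical_s     # how many times slower than the ideal

print(f"utilization: {utilization:.1%}, slowdown: {slowdown:.1f}x")
```

For these numbers the measured run reaches about 5% of the theoretical peak, which is about 19 times slower than the ideal estimate.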

pytorch

Source: https://stackoverflow.com/questions/71393256/how-to-calculate-theoretical-inference-time-of-a-network-based-on-gpu