How can I efficiently compute the mean of X@W in PyTorch?

Posted by jk9hmnmh on 2022-11-09 in Other

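I have a sparse (CSR-format) matrix X of shape 90,708 × 8000 and a dense matrix W of shape 8000 × 8000, and I want to efficiently compute the result of the following procedure:
1. Multiply X by W, giving a new 90,708 × 8000 matrix.
2. Apply ReLU element-wise.
3. Take the mean of each column, giving a row vector of length 8000.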
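
The three steps can be sketched as a direct, unchunked reference computation. This is a minimal sketch on small CPU stand-ins: the helper name `mean_relu_xw_naive` and the sizes are hypothetical, and it assumes a PyTorch version whose `torch.sparse.mm` accepts a CSR tensor on the CPU. It materializes the full product, which is exactly the part that does not fit at the real sizes.

```python
import torch

def mean_relu_xw_naive(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """Steps 1-3 in one shot: column means of ReLU(X @ W).

    Materializes the full (n_rows, n_cols) product, so it is only
    practical when that matrix fits in memory.
    """
    prod = torch.sparse.mm(x, w)    # step 1: dense (n_rows, n_cols) result
    activated = torch.relu(prod)    # step 2: element-wise ReLU
    return activated.mean(dim=0)    # step 3: mean over rows -> (n_cols,)

# Hypothetical small sizes; the real X is 90,708 x 8000.
torch.manual_seed(0)
dense_x = torch.randn(1000, 64)
dense_x[torch.rand_like(dense_x) > 0.01] = 0.0   # keep roughly 1% of entries
x = dense_x.to_sparse_csr()
w = torch.randn(64, 64)

y = mean_relu_xw_naive(x, w)
print(y.shape)  # torch.Size([64])
```
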
Unfortunately, my GPU does not have enough memory to compute the result with the steps described above. When I try to run the computation in PyTorch with `x = torch.mean(self.relu(torch.sparse.mm(x, self.weights1)), 0)`, I get the following error:
`RuntimeError: CUDA out of memory. Tried to allocate 978.00 MiB (GPU 0; 4.00 GiB total capacity; 2.95 GiB already allocated; 0 bytes free; 2.97 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF`
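
A rough estimate of why this fails, assuming float32 (4 bytes per element) and a non-in-place ReLU, so that the product and its ReLU output coexist while ReLU runs. The exact allocation pattern in the error may differ (sparse-matmul workspaces, the caching allocator), so treat this only as an order-of-magnitude check.

```python
# Back-of-the-envelope memory estimate for the dense intermediates,
# assuming float32 (4 bytes per element).
BYTES_PER_FLOAT32 = 4
GIB = 1024 ** 3

n_rows, n_cols = 90_708, 8_000
product_gib = n_rows * n_cols * BYTES_PER_FLOAT32 / GIB   # X @ W
relu_gib = product_gib                                     # ReLU output, same shape
w_gib = 8_000 * 8_000 * BYTES_PER_FLOAT32 / GIB            # dense W itself

print(f"X @ W: {product_gib:.2f} GiB")                          # X @ W: 2.70 GiB
print(f"W: {w_gib:.2f} GiB")                                    # W: 0.24 GiB
print(f"Peak estimate: {product_gib + relu_gib + w_gib:.2f} GiB")  # Peak estimate: 5.65 GiB
```

Even the product alone takes about two thirds of a 4 GiB card.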
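It should be possible to compute the result another way, for example by iterating over the 8000 columns of W and, in each iteration, multiplying X by that column vector, applying ReLU, and taking the mean. Is there a more efficient or cleaner way to do this?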
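
The column-by-column idea can be sketched as follows. The helper name `mean_relu_xw_by_column` and the sizes are hypothetical, and it makes the same CSR-on-CPU assumption as a plain `torch.sparse.mm` call. Only one column of X @ W is alive at a time, but at the real size this issues 8000 separate sparse matrix-vector products, which tends to be slow.

```python
import torch

def mean_relu_xw_by_column(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """Column-at-a-time version of steps 1-3.

    Only one (n_rows, 1) column of X @ W exists at any time, so peak memory
    is tiny, but there is one sparse mat-vec per column of W.
    """
    out = torch.empty(w.shape[1], dtype=w.dtype, device=w.device)
    for j in range(w.shape[1]):
        col = torch.sparse.mm(x, w[:, j:j + 1])   # (n_rows, 1): column j of X @ W
        out[j] = torch.relu(col).mean()            # ReLU, then mean over rows
    return out

# Hypothetical small sizes, checked against a dense computation.
torch.manual_seed(0)
dense_x = torch.randn(500, 40)
dense_x[torch.rand_like(dense_x) > 0.05] = 0.0   # keep roughly 5% of entries
x = dense_x.to_sparse_csr()
w = torch.randn(40, 12)

y_cols = mean_relu_xw_by_column(x, w)
y_ref = torch.relu(dense_x @ w).mean(dim=0)
print(torch.allclose(y_cols, y_ref, atol=1e-5))
```
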

pytorch

Source: https://stackoverflow.com/questions/73884974/how-can-i-efficiently-compute-the-mean-of-xw-in-pytorch

1 answer

Answer #1 by 14ifxucb

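As you mentioned, you can do the computation in parts to lower the peak GPU memory usage. You can use `torch.chunk` to split the dense matrix `w` into several groups of columns; this is likely more efficient than processing each column separately, because each group is handled by a single sparse-dense matrix multiplication instead of one matrix-vector product per column.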
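
The column split is exact rather than an approximation. Writing n = 90,708 for the number of rows of X, entry j of the result depends only on column j of W:

```latex
y_j = \frac{1}{n} \sum_{i=1}^{n} \operatorname{ReLU}\big((XW)_{ij}\big)
    = \frac{1}{n} \sum_{i=1}^{n} \max\big(0,\; X_{i,:}\, W_{:,j}\big),
\qquad j = 1, \dots, 8000
```

So computing the column means of ReLU(X W^(c)) for each column block W^(c) of W and concatenating them reproduces y, up to floating-point rounding differences between the per-block and full matrix products.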
Example:

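```python
import torch
from scipy import sparse

# Random CSR stand-in for X (90,708 x 8000, ~0.1% non-zeros) and a dense W.
csr = sparse.random(90708, 8000, density=0.001, format="csr")
x = torch.sparse_csr_tensor(csr.indptr, csr.indices, csr.data,
                            size=csr.shape, dtype=torch.float32).cuda()
w = torch.rand((8000, 8000)).cuda()

# Process W in 10 column blocks: each block only needs a 90,708 x 800
# intermediate instead of the full 90,708 x 8000 product.
out_chunked = []
for w_chunk in torch.chunk(w, chunks=10, dim=1):
    out_chunked.append(torch.mean(torch.relu(torch.sparse.mm(x, w_chunk)), 0))
out_chunked = torch.cat(out_chunked)

# Unchunked reference for checking the result. This is the computation that
# ran out of memory in the question, so skip it on a small GPU.
out_full = torch.mean(torch.relu(torch.sparse.mm(x, w)), 0)
assert torch.allclose(out_full, out_chunked)
```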
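
A reusable variant of the loop above, as a minimal sketch: the name `chunked_mean_relu_xw` and its `cols_per_chunk` parameter are hypothetical. It uses `torch.split`, which takes a block width rather than a number of blocks, so the memory budget can be set directly. At the question's size, `cols_per_chunk=800` gives the same split as `chunks=10`, and each block's product takes about 90,708 × 800 × 4 bytes ≈ 0.27 GiB in float32. The check below runs on CPU with small stand-in sizes and assumes `torch.sparse.mm` accepts CSR on the CPU.

```python
import torch

def chunked_mean_relu_xw(x: torch.Tensor, w: torch.Tensor,
                         cols_per_chunk: int = 800) -> torch.Tensor:
    """Column means of ReLU(X @ W), computed one block of W's columns at a time.

    Without autograd recording, only one (n_rows, <= cols_per_chunk) block of
    X @ W and its ReLU is alive at a time.
    """
    pieces = []
    # torch.split takes a block *width*; the last block may be narrower.
    for w_block in torch.split(w, cols_per_chunk, dim=1):
        block = torch.sparse.mm(x, w_block)            # (n_rows, block_width)
        pieces.append(torch.relu(block).mean(dim=0))   # (block_width,)
    return torch.cat(pieces)

# Hypothetical small sizes, checked against the unchunked computation.
torch.manual_seed(0)
dense_x = torch.randn(2000, 50)
dense_x[torch.rand_like(dense_x) > 0.02] = 0.0   # keep roughly 2% of entries
x = dense_x.to_sparse_csr()
w = torch.rand(50, 33)

# No autograd recording: if w required grad, each block's ReLU output
# would otherwise be kept for the backward pass.
with torch.no_grad():
    y = chunked_mean_relu_xw(x, w, cols_per_chunk=10)   # blocks of 10, 10, 10, 3
reference = torch.relu(dense_x @ w).mean(dim=0)
print(y.shape, torch.allclose(y, reference, atol=1e-5))
```

If W is a trainable parameter (as `self.weights1` in the question suggests), autograd keeps every block's ReLU output for the backward pass, so during training the saving is smaller than the per-block figure suggests; `torch.no_grad()` is only appropriate when no gradients are needed.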

Posted 2022-11-09
