c++ 从CUDA中可能重复的三元组列表中组装COO矩阵

epggiuax 于 2022-12-05 发布在其他

关注(0)|答案(1)|浏览(173)

给定一个std::vector<Eigen::Triplet<double>>，可以使用setFromTriplets函数创建一个Eigen::SparseMatrix<double>。默认情况下，该函数累积重复的三连音。
我正在尝试将模拟器代码迁移到CUDA，并且必须构建稀疏矩阵。我有一个内核，它可以计算三元组，并在三个设备指针（int*、int*、double*）中返回它们。但是，我不确定如何从这些指针组装矩阵。cuSparse COO格式对重复的三元组没有任何说明，它实际上表示行应该排序。
有没有办法用CUDA/cuSparse创建一个COO（或者更好的是直接创建一个CSR），给定i，j和具有多个（i，j）条目的值指针。

c++

来源：https://stackoverflow.com/questions/74349633/assemble-coo-matrix-from-a-list-of-possibly-repeated-triplets-in-cuda

1条答案

按热度按时间

aor9mmx11#

下面的代码使用CUDA工具包中的Thrust库在GPU上执行所需的操作。如果经常执行此操作，您会非常关注如何获得最佳性能，你可以直接使用CUB库，它被Thrust用作后端。这样你应该能够避免多个不必要的复制操作。（也可以使用cub::DeviceRadixSort和类型转换迭代器）和cub::DeviceSegmentedReduce
在uninitialized_vector.cu Thrust示例中可以找到一种避免临时向量的不必要初始化的方法。
另一种可能在不放弃Thrust的情况下提高性能的方法是使用thrust::cuda::par_nosync执行策略。（1.15）、因此您需要获取更新的版本（1.16或更高版本）from Github。这有助于隐藏内核启动开销，因为它尽可能使Thrust算法异步（分段归约必须是同步的，因为它返回新的结束迭代器）。您还可以向API添加显式CUDA流和池内存资源，以更好地处理异步行为，并分别避免重复分配开销。请参阅cuda/explicit_cuda_stream.cu，cuda/custom_temporary_allocation.cu和mr_basic.cu推力示例，了解更多信息。

#include <thrust/device_ptr.h>
#include <thrust/device_vector.h>
#include <thrust/distance.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/reduce.h>
#include <thrust/sort.h>
#include <thrust/tuple.h>

// Transforms triplets into COO sparse marix format in-place and
// returns the number of non-zero entries (nnz).
auto triplets2coo(int *d_rows,
                  int *d_cols,
                  float *d_vals,
                  int num_triplets) {
    // thrust::device_ptr is a lightweight wrapper that lets Thrust algorithms
    // know which memory the pointer points to. thrust::device_pointer_cast is
    // the preferred way of creating them. While an API directly taking
    // wrapped pointers as arguments would be preferable, I chose it this way 
    // to show how to wrap the pointers.
    auto const coord_iter = thrust::make_zip_iterator(
        thrust::make_tuple(
            thrust::device_pointer_cast(d_rows),
            thrust::device_pointer_cast(d_cols)
        ));
    auto const vals = thrust::device_pointer_cast(d_vals);

    // Sort in-place.
    thrust::sort_by_key(coord_iter, coord_iter + num_triplets,
                        vals);
    // Segmented reduction can only be done out-of-place,
    // so allocate more memory.
    thrust::device_vector<int> tmp_rows(num_triplets);
    thrust::device_vector<int> tmp_cols(num_triplets);
    thrust::device_vector<float> tmp_vals(num_triplets);
    // Caution: These vectors are unnecessarily initialized to zero!

    auto const tmp_coord_iter = thrust::make_zip_iterator(
        thrust::make_tuple(
            tmp_rows.begin(),
            tmp_cols.begin()
        ));

    auto const new_ends =
        thrust::reduce_by_key(coord_iter, coord_iter + num_triplets,
                              vals,
                              tmp_coord_iter,
                              tmp_vals.begin());
    auto const nnz = thrust::distance(tmp_vals.begin(), new_ends.second);

    thrust::copy_n(tmp_coord_iter, nnz, coord_iter);
    thrust::copy_n(tmp_vals.cbegin(), nnz, vals);

    return nnz;
}

赞(0）回复(0）举报 2022-12-05

我来回答

c++ 从CUDA中可能重复的三元组列表中组装COO矩阵

1条答案

相关问题

热门标签

最新问答