c++ 利用OpenMP优化大型图的遍历

rqqzpn5f 于 2023-02-26 发布在其他

关注(0)|答案(1)|浏览(157)

我正在尝试优化这个函数，根据 *perf工具 *，它是归档接近线性扩展的瓶颈。当线程数量增加时，性能会变差。当我向下钻取 perf 生成的汇编代码时，它显示大部分时间都花在检查已访问和未访问的顶点上。我做了大量的谷歌搜索来提高性能，但无济于事。有没有办法提高这个函数的性能？或者有没有线程安全的方法来实现这个函数？提前感谢您的帮助！

typedef uint32_t vidType;
template<typename T, typename U, typename V>
bool compare_and_swap(T &x, U old_val, V new_val) {
     return __sync_bool_compare_and_swap(&x, old_val, new_val);
 }

template<bool map_vertices, bool map_edges>
VertexSet GraphT<map_vertices, map_edges>::N(vidType vid) const {
  assert(vid >= 0);
  assert(vid < n_vertices);
  eidType begin = vertices[vid], end = vertices[vid+1];
  if (begin > end || end > n_edges) {
    fprintf(stderr, "vertex %u bounds error: [%lu, %lu)\n", vid, begin, end);
    exit(1);
  }
  assert(end <= n_edges);
  return VertexSet(edges + begin, end - begin, vid);
}

void bfs_step(Graph &g, vidType *depth, SlidingQueue<vidType> &queue) {
  #pragma omp parallel
  {
    QueueBuffer<vidType> lqueue(queue);

    #pragma omp for
    
    for (auto q_iter = queue.begin(); q_iter < queue.end(); q_iter++) {
      auto src = *q_iter;
      for (auto dst : g.N(src)) {
        //int curr_val = parent[dst];
        auto curr_val = depth[dst];
        if (curr_val == MYINFINITY) { // not visited
          //if (compare_and_swap(parent[dst], curr_val, src)) { 
          if (compare_and_swap(depth[dst], curr_val, depth[src] + 1)) {
            lqueue.push_back(dst);
          }
        }
      }
    }
    lqueue.flush();
  }
}

c++

来源：https://stackoverflow.com/questions/75495816/optimizing-huge-graph-traversal-with-openmp

1条答案

按热度按时间

crcmnpdw1#

首先，你使用的是一个非常传统的图算法公式，适合教科书，但不适合计算，如果你把它写成一个广义的矩阵向量与邻接矩阵的乘积，你会失去所有那些繁琐的队列，并行性变得非常明显。
在您的公式中，问题出在队列上的push_back函数上。这很难并行化。解决方案是让每个线程都有自己的队列，然后使用reduction。如果您在队列对象上定义了plus操作符来实现本地队列的合并，这就可以了。

赞(0）回复(0）举报 2023-02-26

我来回答

c++ 利用OpenMP优化大型图的遍历

1条答案

相关问题

热门标签

最新问答