c++ 比较位集的最快方法(〈位集运算符)？

q5lcpyga 于 2023-01-22 发布在其他

关注(0)|答案(9)|浏览(154)

对于std::bitset，对应于无符号整数表示的比较，实现<运算符的最优化方法是什么（它应该适用于more than 64 bits的位集）？
一个简单的实现是：

template<std::size_t N>
bool operator<(const std::bitset<N>& x, const std::bitset<N>& y)
{
    for (int i = N-1; i >= 0; i--) {
        if (x[i] && !y[i]) return false;
        if (!x[i] && y[i]) return true;
    }
    return false;
}

当我说“最优化的方式”时，我寻找的是使用位操作和元编程技巧（以及类似的东西）的实现。
编辑：我想我找到窍门了：一个用于编译时递归和右移位的模板元编程，以便将位集作为几个无符号的长整型进行比较。但是没有明确的想法如何做...

c++

来源：https://stackoverflow.com/questions/21245139/fastest-way-to-compare-bitsets-operator-on-bitsets

9条答案

按热度按时间

qvtsj1bj1#

最明显的优化是

template<std::size_t N>
bool operator<(const std::bitset<N>& x, const std::bitset<N>& y)
{
    for (int i = N-1; i >= 0; i--) {
        if (x[i] ^ y[i]) return y[i];
    }
    return false;
}

除此之外，使用更多的每次测试位数应该是不可能的，因为没有符合标准的方式来访问它们。您可以对x.to_string() < y.to_string()进行基准测试，并希望to_string()和字符串比较都比按位访问bitset更好地优化，但这是一个很长的机会。

赞(0）回复(0）举报 2023-01-22

zy1mlcev2#

如果您愿意采用STL位集更改的解决方案，则可以使用

template<int n>
bool compare(bitset<n>& l, bitset<n>& r){
  if(n > 64){
  typedef array<long, (n/64)> AsArray;
  return *reinterpret_cast<AsArray*>(&l)
       < *reinterpret_cast<AsArray*>(&r);
    }//else
  return l.to_ulong() < r.to_ulong();
}

编译器将丢弃if的不相关分支

赞(0）回复(0）举报 2023-01-22

y53ybaqx3#

虽然你说的是位集，但你实际上是在谈论任意精度的无符号整数比较，如果是这样，那么你可能不会轻易地比 Package GMP做得更好。
从他们的网站：
GMP经过精心设计，无论是对于小操作数还是大操作数，速度都尽可能快。它的速度是通过使用全字作为基本算术类型、使用快速算法、使用高度优化的汇编代码来实现的，这些汇编代码用于许多CPU的最常见的内部循环，以及对速度的普遍强调。
考虑它们的整数函数

赞(0）回复(0）举报 2023-01-22

lp0sw83n4#

我刚刚看了源代码，但不幸的是（希望我弄错了），它们似乎没有为特定的位块提供对`const & unsigned long`的就地访问，如果它们提供了，那么您可以执行模板递归，并有效地比较每个`unsigned long`，而不是无符号长整型中的每个位。
毕竟，如果`A < B`，那么不仅每个最高有效位`a <= b`，而且每个最高有效块`A[i] <= B[i]`。
我不想这么说，但我可能会在C++11的`std::array`上使用递归来实现我自己的功能。如果你可以访问这些块，那么你可以创建一个模板递归函数来很容易地实现这一点（我相信你知道，因为你要求元编程），这给了编译器一个很好的优化机会。
总而言之，这不是一个很好的答案，但我会这么做。
顺便说一句，问得好。

**编辑

这应该是三种方法的时机：最新的赞成票，我描述的块策略，和一个模板递归变量。我用位集填充一个向量，然后使用指定的比较器函子重复排序。
快乐黑客!

- 计算机上的输出：**

RUNTIMES:
compiled g++ -std=c++11 -Wall -g test.cpp
    std::bitset         4530000 (6000000 original in OP)
    Block-by-block      900000
    Template recursive  730000

compiled g++ -std=c++11 -Wall -g -O3 test.cpp
RUNTIMES:
    std::bitset         700000 (740000 original in OP)
    Block-by-block      470000
    Template recursive  530000

- C++11代码：**

#include <iostream>
#include <bitset>
#include <algorithm>
#include <time.h>

/* Existing answer. Note that I've flipped the order of bit significance to match my own */
template<std::size_t N>
class BitByBitComparator
{
public:
  bool operator()(const std::bitset<N>& x, const std::bitset<N>& y) const
  {
    for (int i = 0; i < N; ++i) {
      if (x[i] ^ y[i]) return y[i];
    }
    return false;
  }
};

/* New simple bit set class (note: mostly untested). Also note bad
   design: should only allow read access via immutable facade. */
template<std::size_t N>
class SimpleBitSet
{
public:
  static const int BLOCK_SIZE = 64;
  static const int LOG_BLOCK_SIZE = 6;
  static constexpr int NUM_BLOCKS = N >> LOG_BLOCK_SIZE;
  std::array<unsigned long int, NUM_BLOCKS> allBlocks;
  SimpleBitSet()
  {
    allBlocks.fill(0);
  }
  void addItem(int itemIndex)
  {
    // TODO: can do faster
    int blockIndex = itemIndex >> LOG_BLOCK_SIZE;
    unsigned long int & block = allBlocks[blockIndex];
    int indexWithinBlock = itemIndex % BLOCK_SIZE;
    block |= (0x8000000000000000 >> indexWithinBlock);
  }
  bool getItem(int itemIndex) const
  {
    int blockIndex = itemIndex >> LOG_BLOCK_SIZE;
    unsigned long int block = allBlocks[blockIndex];
    int indexWithinBlock = itemIndex % BLOCK_SIZE;
    return bool((block << indexWithinBlock) & 0x8000000000000000);
  }
};

/* New comparator type 1: block-by-block. */
template<std::size_t N>
class BlockByBlockComparator
{
public:
  bool operator()(const SimpleBitSet<N>& x, const SimpleBitSet<N>& y) const
  {
    return ArrayCompare(x.allBlocks, y.allBlocks);
  }

  template <std::size_t S>
  bool ArrayCompare(const std::array<unsigned long int, S> & lhs, const std::array<unsigned long int, S> & rhs) const
  {
    for (int i=0; i<S; ++i)
      {
    unsigned long int lhsBlock = lhs[i];
    unsigned long int rhsBlock = rhs[i];
    if (lhsBlock < rhsBlock) return true;
    if (lhsBlock > rhsBlock) return false;
      }
    return false;
  }
};

/* New comparator type 2: template recursive block-by-block. */
template <std::size_t I, std::size_t S>
class TemplateRecursiveArrayCompare;

template <std::size_t S>
class TemplateRecursiveArrayCompare<S, S>
{
public:
  bool operator()(const std::array<unsigned long int, S> & lhs, const std::array<unsigned long int, S> & rhs) const
  {
    return false;
  }
};

template <std::size_t I, std::size_t S>
class TemplateRecursiveArrayCompare
{
public:
  bool operator()(const std::array<unsigned long int, S> & lhs, const std::array<unsigned long int, S> & rhs) const
  {
    unsigned long int lhsBlock = lhs[I];
    unsigned long int rhsBlock = rhs[I];
    if (lhsBlock < rhsBlock) return true;
    if (lhsBlock > rhsBlock) return false;

    return TemplateRecursiveArrayCompare<I+1, S>()(lhs, rhs);
  }
};

template<std::size_t N>
class TemplateRecursiveBlockByBlockComparator
{
public:
  bool operator()(const SimpleBitSet<N>& x, const SimpleBitSet<N>& y) const
  {
    return TemplateRecursiveArrayCompare<x.NUM_BLOCKS, x.NUM_BLOCKS>()(x.allBlocks, y.allBlocks);
  }
};

/* Construction, timing, and verification code */
int main()
{
  srand(0);

  const int BITSET_SIZE = 4096;

  std::cout << "Constructing..." << std::endl;

  // Fill a vector with random bitsets
  const int NUMBER_TO_PROCESS = 10000;
  const int SAMPLES_TO_FILL = BITSET_SIZE;
  std::vector<std::bitset<BITSET_SIZE> > allBitSets(NUMBER_TO_PROCESS);
  std::vector<SimpleBitSet<BITSET_SIZE> > allSimpleBitSets(NUMBER_TO_PROCESS);
  for (int k=0; k<NUMBER_TO_PROCESS; ++k)
    {
      std::bitset<BITSET_SIZE> bs;
      SimpleBitSet<BITSET_SIZE> homemadeBs;
      for (int j=0; j<SAMPLES_TO_FILL; ++j)
    {
      int indexToAdd = rand()%BITSET_SIZE;
      bs[indexToAdd] = true;
      homemadeBs.addItem(indexToAdd);
    }

      allBitSets[k] = bs;
      allSimpleBitSets[k] = homemadeBs;
    }

  clock_t t1,t2,t3,t4;
  t1=clock();

  std::cout << "Sorting using bit-by-bit compare and std::bitset..."  << std::endl;
  const int NUMBER_REPS = 100;
  for (int rep = 0; rep<NUMBER_REPS; ++rep)
    {
      auto tempCopy = allBitSets;
      std::sort(tempCopy.begin(), tempCopy.end(), BitByBitComparator<BITSET_SIZE>());
    }

  t2=clock();

  std::cout << "Sorting block-by-block using SimpleBitSet..."  << std::endl;
  for (int rep = 0; rep<NUMBER_REPS; ++rep)
    {
      auto tempCopy = allSimpleBitSets;
      std::sort(tempCopy.begin(), tempCopy.end(), BlockByBlockComparator<BITSET_SIZE>());
    }

  t3=clock();

  std::cout << "Sorting block-by-block w/ template recursion using SimpleBitSet..."  << std::endl;
  for (int rep = 0; rep<NUMBER_REPS; ++rep)
    {
      auto tempCopy = allSimpleBitSets;
      std::sort(tempCopy.begin(), tempCopy.end(), TemplateRecursiveBlockByBlockComparator<BITSET_SIZE>());
    }

  t4=clock();

  std::cout << std::endl << "RUNTIMES:" << std::endl;
  std::cout << "\tstd::bitset        \t" << t2-t1 << std::endl;
  std::cout << "\tBlock-by-block     \t" << t3-t2 << std::endl;
  std::cout << "\tTemplate recursive \t" << t4-t3 << std::endl;
  std::cout << std::endl;

  std::cout << "Checking result... ";
  std::sort(allBitSets.begin(), allBitSets.end(), BitByBitComparator<BITSET_SIZE>());
  auto copy = allSimpleBitSets;
  std::sort(allSimpleBitSets.begin(), allSimpleBitSets.end(), BlockByBlockComparator<BITSET_SIZE>());
  std::sort(copy.begin(), copy.end(), TemplateRecursiveBlockByBlockComparator<BITSET_SIZE>());
  for (int k=0; k<NUMBER_TO_PROCESS; ++k)
    {
      auto stdBitSet = allBitSets[k];
      auto blockBitSet = allSimpleBitSets[k];
      auto tempRecBlockBitSet = allSimpleBitSets[k];

      for (int j=0; j<BITSET_SIZE; ++j)
    if (stdBitSet[j] != blockBitSet.getItem(j) || blockBitSet.getItem(j) != tempRecBlockBitSet.getItem(j))
      std::cerr << "error: sorted order does not match" << std::endl;
    }
  std::cout << "success" << std::endl;

  return 0;
}

赞(0）回复(0）举报 2023-01-22

2izufjch5#

检查XOR的最高位如何？

bool operator<(const std::bitset<N>& x, const std::bitset<N>& y)
{
    return y[fls(x^y)]
}

int fls(const std::bitset<N>& n) {
    // find the last set bit
}

关于fps的一些想法可以在http://uwfsucks.blogspot.be/2007/07/fls-implementation.html中找到。

赞(0）回复(0）举报 2023-01-22

vtwuwzda6#

好吧，有一个好的老memcmp。它是脆弱的，因为它依赖于std::bitset的实现。因此可能无法使用。但是可以合理地假设模板创建了一个int的不透明数组。并且没有其他簿记字段。

template<std::size_t N>
bool operator<(const std::bitset<N>& x, const std::bitset<N>& y)
{
    int cmp = std::memcmp(&x, &y, sizeof(x));
    return (cmp < 0);
}

这将唯一地确定bitset s的排序。但它可能不是人类直观的顺序。它取决于哪些位用于哪个集合成员索引。例如，索引0可以是前32位整数的LSB。或者它可以是前8位字节的LSB。
我强烈建议进行单元测试，以确保它的使用方式确实有效。-〉

赞(0）回复(0）举报 2023-01-22

n7taea2i7#

只有当两个位集不同时才执行逐位比较，这已经产生了一些性能提升：

template<std::size_t N>
bool operator<(const std::bitset<N>& x, const std::bitset<N>& y)
{       if (x == y)
                return false;
        ….
}

赞(0）回复(0）举报 2023-01-22

e5nqia278#

我知道这是一个老问题，但是如果你知道一个位集的最大大小，你可以创建如下的东西：

class Bitset{
    vector<bitset<64>> bits;
    /*
     * operators that you need
    */
};

这允许您将每个bitsets<64>强制转换为unsigned long long以进行快速比较。如果您想获得特定位（为了更改它或其他），可以执行bits[id / 64][id % 64]

赞(0）回复(0）举报 2023-01-22

exdqitrt9#

bitset的底层实现在几乎所有64位CPU、编译器等中使用uint 64，因为只有一种明智的方法来编写具有给定接口的类的实现，这使得很容易找出“可移植”的黑客攻击。
因此，假设你只是想“明显”的有效方法来做到这一点，你的代码不会被用来控制核武库，完全知道这将使你的保修无效，等等，这里是你正在寻找的代码：

template <int N> bool operator<(const bitset<N> & a, const bitset<N> & b) {

    const uint64_t * p = (const uint64_t *)(&a);
    const uint64_t * q = (const uint64_t *)(&b);

    const uint64_t * r = p;

    int i= (sizeof(bitset<N>)-1)/sizeof(uint64_t);

    for (p+=i, q+=i; (p>=r) && (*p==*q); --p, --q) {}

    return *p<*q;
}

基本上就是转换为一个uint 64数组，然后以相反的顺序逐个元素进行比较，直到发现差异。
还要注意，这是假定x86-64字节序的。

赞(0）回复(0）举报 2023-01-22

我来回答

c++ 比较位集的最快方法(〈位集运算符)？

9条答案

**编辑

相关问题

热门标签

最新问答