c++ 将浮点值四舍五入到最接近的2的幂

dm7nw8vv  于 2023-07-01  发布在  其他
关注(0)|答案(2)|浏览(141)

虽然对于如何找到整数或浮点数的下一个2的幂的问题有很多答案,但是对于找到一个数字的最接近的2的幂却没有那么多答案。
我已经实现了以下内容:

template <typename T>
static constexpr T round_pow2(T v) {
    if constexpr (std::is_floating_point_v<T>) {
        auto high = static_cast<unsigned long>(std::ceil(v));
        auto low  = static_cast<unsigned long>(std::floor(v));

        if (high == low) {
            return round_pow2<unsigned long>(high);
        } else {
            T a = static_cast<T>(round_pow2<unsigned long>(low));
            T b = static_cast<T>(round_pow2<unsigned long>(high));

            return std::abs(a - v) <= std::abs(b - v) ? a : b;
        }
    } else {
        T high = v - 1;

        for (T i = 1; i < static_cast<T>(sizeof(T)); i *= 2) {
            high |= high >> i;
        }

        high += 1;
        T low = high >> 1;

        return (high - v) < (v - low) ? high : low;
    }
}

它应该适用于任何非负且小于ULLONG_MAX fp值的情况。但这似乎不是最佳的。是否有更好(更高性能)的方法来实现此功能?
编辑:@Eric对于我的应用程序,在v == 0的情况下,得到0是可以的。但这对某些人来说确实是个问题,因为0不是2的幂。
@Blixodus谢谢你的回答,它为我指明了正确的方向。我根据你的想法创建了以下函数:

template <typename T>
constexpr T round_p2(T v) {
    if constexpr (std::is_floating_point_v<T>) {
        using R = std::conditional_t<
            std::is_same_v<T, double>, uint64_t,
            std::conditional_t<std::is_same_v<T, float>, uint32_t,
            void
        >>;

        auto [mlen, es, em] = std::is_same_v<T, double> ? std::make_tuple(52, 1024, 0x7FF) : std::make_tuple(23, 128, 0xFF);
        auto y = *reinterpret_cast<R*>(&v);
        return (T(y >> (sizeof(R) * 8 - 1)) * -2 + 1) * (2 << (((y >> mlen) & em) - es + ((y >> mlen - 1) & 0x1)));
    } else {
        using R = std::make_unsigned_t<T>;
        R rv = static_cast<R>(v);
        T sign = 1;

        if constexpr (std::is_signed_v<T>) {
            if (v < 0) {
                rv = static_cast<R>(-v);
                sign = -1;
            }
        }

        R high = rv - 1;

        for (R i = 1; i < static_cast<R>(sizeof(R)); i *= 2) {
            high |= high >> i;
        }

        high += 1;
        R low = high >> 1;

        return sign * static_cast<T>((high - rv) <= (rv - low) ? high : low);
    }
}

与我的第一个实现相比,它似乎对我的应用程序工作得很好,并且生成了非常好的程序集。
说明:我首先得到三个魔术值,这取决于v是float还是double:第一个是尾数的长度,第二个是指数需要递减(加1)的量,第三个是用于从FP表示中提取指数的掩码。
然后,我将fp值转换为相同大小的无符号整数,以便能够处理它。
接下来,我提取将使用(T(y >> (sizeof(R) * 8 - 1)) * -2 + 1)对最终结果进行签名的值。它提取fp值的最高位(符号,0表示正,1表示负),然后将函数f(x) = x * -2 + 1应用于它,这将为x=0提供1,为x=1提供-1
最后,我使用Blixodus公式计算给定fp值的最接近的2的无符号幂。因为我不使用std::pow函数来支持位移位(因为我们使用的是2的幂)。我需要通过将1移到我们移动2的值来解决这个问题(因此es的值比预期的多1)。

vyu0f0g1

vyu0f0g11#

在C++ 2020之前,没有实现将整数舍入到2的幂,而不使用循环,编译器扩展或移位,假定对类型的宽度有一些限制。有几个关于堆栈溢出的问题。下面的代码显示了一个使用C++ std::countl_zero的解决方案和一个使用GCC的count-leading-zeros内置的替代解决方案,该解决方案适用于unsigned long long以下的任何整数类型。

#include <cmath>

#if 201703L <= __cplusplus
    #include <bit>
    #include <type_traits>
#endif

template <typename T> static constexpr T round_pow2(T v)
{
    /*  Since one is the smallest power of two, all numbers less than or equal
        to one round to one.
    */
    if (v <= 1) return 1;

    /*  For floating-point, the standard frexp function gives us the fraction
        and exponent, and ldexp applies an exponent.  The fraction is scaled to
        [.5, 1), so, if it is less than or equal to .75, we round down.
    */
    if constexpr (std::is_floating_point_v<T>)
    {
        int exponent = 0;
        T fraction = frexp(v, &exponent);
        return ldexp(.5, exponent + (.75 < fraction));
    }

    /*  Here we handle integer types.  The midpoints for rounding to powers of
        two are at 3*2^n.  That is, the transition between rounding to one
        power of two and another occurs at a number that has the form 3*2^n.
        To find which interval v is in, we can divide it by three and then
        find the next lower (instead of nearest) power of two.  To get the
        desired rounding at the midpoint, we use v-1.  So the general algorithm
        is to round (v-1)/3 down to the nearest power of two, then quadruple
        that.  For example:

            v   v-1   (v-1)/3   rounded down   quadrupled
            11   10     3            2             8
            12   11     3            2             8
            13   12     4            4            16

        Note that (v-1)/3 is not quite right for v=2, as the subtraction of 1
        jumps a full power of two, from 2 to 1.  (v-.01)/3 would work, but we
        want to stick with integer arithmetic.

        For the general case, we want 4 * 2**floor(log2((v-1)/3)).  To include
        v=2, we will use 2 * 2**f((v-1)/3), where f(x) is floor(log2(x))+1 but
        clamped to produce at least zero.

        If the C++ 2020 std::countl_zero function is available, we use that.
        Otherwise, we use the GCC builtin __builtin_clzll.  In either case, the
        function returns the number of leading zero bits, which depends on the
        width of the type rather than the operand value alone.  To calculate
        the power of two, we get the bit count for a fixed value (zero or one)
        as a reference point.
    */
    else
    {
        #if __cpp_lib_bitops
            /*  std::countl_zero is only provided for unsigned types, so
                define UT to be the unsigned type corresponding to T.
            */
            using UT  = std::make_unsigned<T>::type;

            return static_cast<UT>(2) << std::countl_zero<UT>(0) - std::countl_zero<UT>((v-1)/3));
        #else
            /*  Since __builtin_clzll is not defined for zero operands, we need
                to ensure its operand is at least 1.  To do this, we change
                (v-1)/3 to (v-1)/3*2+1.  The doubling increases the power of
                two by one, so we change the reference point from zero to one,
                decreasing the number of bits for it by one.
            */
            return 2ull << __builtin_clzll(1) - __builtin_clzll((v-1)/3*2+1);
        #endif
    }
}
6qftjkof

6qftjkof2#

浮点使用1位用于符号、n位用于指数和m位用于有效数来编码。例如,32位浮点数被编码为1位用于符号,8位用于指数,23位用于有效数。然后,32位浮点数的值由以下等式简单地给出
值=(-1)^sign * 2^(E-127)(1 + sum from i = 1 to 23(B_(23-1) 2^(-i)))
参见符号here
因此,如果有效数部分< 1.5,则更接近(-1)^sign2^(E-127),如果有效数部分>= 1.5,则更接近(-1)^sign2^(E-126)
有效数部分的第一位表示有效数是否< 1.5 or significand >为1.5(它将2^(-1)= 0.5加到总和上)
因此,您可以简单地查看有效数部分的第一位(在32位情况下为位数22),如果该位为0,则值更接近(-1)^sign2^(E-127),如果该位为1,则值更接近(-1)^sign2^(E-126)

相关问题