Is there an accurate approximation of the acos() function in OpenGL?

Posted by f2uvfpb9 on 2023-01-20 in Other


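I need an acos() function with double precision, and since GLSL has no built-in double-precision acos(), I tried to implement my own.
At first I implemented a Taylor series with precomputed factorial values, like the equation from Wiki - Taylor series, but it is inaccurate near 1: the maximum error was about 0.08 with 40 iterations.
I also implemented this method, which works very well on the CPU with a maximum error of -2.22045e-16, but I had trouble implementing it in the shader.
Currently I am using an acos() approximation from here, where somebody posted his approximation functions on this site. I use the most accurate function from that site and now get a maximum error of -7.60454e-08, which is still too high.
The code of this function is: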

double myACOS(double x)
{
    double part[4];
    part[0] = 32768.0/2835.0*sqrt(2.0-sqrt(2.0+sqrt(2.0+sqrt(2.0+2.0*x))));
    part[1] = 256.0/135.0*sqrt(2.0-sqrt(2.0+sqrt(2.0+2.0*x)));
    part[2] = 8.0/135.0*sqrt(2.0-sqrt(2.0+2.0*x));
    part[3] = 1.0/2835.0*sqrt(2.0-2.0*x);
    return (part[0]-part[1]+part[2]-part[3]);
}

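Does anybody know another way to implement acos() that is very accurate and, if possible, easy to implement in a shader?
Some system information:

Nvidia GT 555M
Running OpenGL 4.3 with optirun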

opengl

Source: https://stackoverflow.com/questions/28969184/is-there-an-accurate-approximation-of-the-acos-function

3 Answers


deyfvvtc #1

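The NVIDIA GT 555M GPU is a device with compute capability 2.1, so it has native hardware support for basic double-precision operations, including fused multiply-add (FMA). As with all NVIDIA GPUs, the square root operation is emulated. I am familiar with CUDA, but not with GLSL. According to version 4.3 of the GLSL specification, it exposes double-precision FMA as the function fma() and provides a double-precision square root, sqrt(). It is not clear whether the sqrt() implementation is correctly rounded according to IEEE-754 rules.
Instead of a Taylor series, use a polynomial minimax approximation, which reduces the number of terms required. Minimax approximations are typically generated with a variant of the Remez algorithm. To optimize both speed and accuracy, the use of FMA is essential. Evaluating the polynomial with the Horner scheme helps accuracy. In the code below, as in DanceIgel's answer, acos is conveniently computed from an asin approximation as the basic building block, combined with standard mathematical identities.
With 400M test vectors, the maximum relative error observed with the code below was 2.67e-16, and the maximum ulp error observed was 1.442 ulp.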

/* compute arcsin (a) for a in [-9/16, 9/16] */
double asin_core (double a)
{
    double q, r, s, t;

    s = a * a;
    q = s * s;
    r =             5.5579749017470502e-2;
    t =            -6.2027913464120114e-2;
    r = fma (r, q,  5.4224464349245036e-2);
    t = fma (t, q, -1.1326992890324464e-2);
    r = fma (r, q,  1.5268872539397656e-2);
    t = fma (t, q,  1.0493798473372081e-2);
    r = fma (r, q,  1.4106045900607047e-2);
    t = fma (t, q,  1.7339776384962050e-2);
    r = fma (r, q,  2.2372961589651054e-2);
    t = fma (t, q,  3.0381912707941005e-2);
    r = fma (r, q,  4.4642857881094775e-2);
    t = fma (t, q,  7.4999999991367292e-2);
    r = fma (r, s, t);
    r = fma (r, s,  1.6666666666670193e-1);
    t = a * s;
    r = fma (r, t, a);

    return r;
}

/* Compute arccosine (a), maximum error observed: 1.4316 ulp
   Double-precision factorization of π courtesy of Tor Myklebust
*/
double my_acos (double a)
{
    double r;

    r = (a > 0.0) ? -a : a; // avoid modifying the "sign" of NaNs
    if (r > -0.5625) {
        /* arccos(x) = pi/2 - arcsin(x) */
        r = fma (9.3282184640716537e-1, 1.6839188885261840e+0, asin_core (r));
    } else {
        /* arccos(x) = 2 * arcsin (sqrt ((1-x) / 2)) */
        r = 2.0 * asin_core (sqrt (fma (0.5, r, 0.5)));
    }
    if (!(a > 0.0) && (a >= -1.0)) { // avoid modifying the "sign" of NaNs
        /* arccos (-x) = pi - arccos(x) */
        r = fma (1.8656436928143307e+0, 1.6839188885261840e+0, -r);
    }
    return r;
}


vdgimpew #2

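My current accurate shader implementation of acos() is a mix of the usual Taylor series and Bence's answer. With 40 iterations I get an accuracy of 4.44089e-16 relative to the acos() implementation from math.h. It may not be the best, but it works for me.
Here it is: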

double myASIN2(double x)
{
    double sum, tempExp;
    tempExp = x;
    double factor = 1.0;
    double divisor = 1.0;
    sum = x;
    for(int i = 0; i < 40; i++)
    {
        tempExp *= x*x;
        divisor += 2.0;
        factor *= (2.0*double(i) + 1.0)/((double(i)+1.0)*2.0);
        sum += factor*tempExp/divisor;
    }
    return sum;
}

double myASIN(double x)
{
    if(abs(x) <= 0.71)
        return myASIN2(x);
    else if( x > 0)
        return (PI/2.0-myASIN2(sqrt(1.0-(x*x))));
    else //x < 0 or x is NaN
        return (myASIN2(sqrt(1.0-(x*x)))-PI/2.0);

}

double myACOS(double x)
{
    return (PI/2.0 - myASIN(x));
}

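Any comments? Is there anything that could be done better? For example, a lookup table could hold the values of factor, but in my shader acos() is called only once, so there is no need for that.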


92vpleto #3

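Maybe this solution helps. It is within 1% of the correct angle for x from 1 down to 0.2.
acos(x) ~= sqrt(2) * (sqrt(1-x) + (1/11) * (sqrt(1-x))^3)
This started from a Taylor series provided by Wolfram, which needs too many terms even for crude values once x is below 0.8. This method keeps the general form of the first two terms but alters the coefficients to improve the match. Interestingly, the integer coefficient 11 works.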

