我无法让OpenCV姿势重建工作

我目前正在尝试使用OpenCV重建相机姿势。对于实现，我大致遵循Epipolar Geometry Example。其思想如下：

加载两张图片并使用cv::xfeatures2d::SURF::detectAndCompute()计算其SURF要素
使用cv::DescriptorMatcher::knnMatch()匹配两个要素集
使用cv::findEssentialMat()估计具有匹配特征的基本矩阵
使用具有cv::recoverPose()的基本矩阵恢复姿势

作为输入数据，我使用的是免费提供的New Tsukuba Stereo Dataset中的一系列图片。该数据集是人工创建的具有已知地面实况姿态的立体相机帧系列。为了测试我的实现，我估计了第一张和第60张左边图片之间的相对姿态以及第一张和第60张右边图片的姿态。两对的平移矢量应该指向大致相同的方向，因为两对都是来自左相机和右相机的对应图片。观察地面实况，两个姿势的旋转幅度应该大约为7度。不幸的是，当我运行我的实现时，我得到了以下姿势：

First pair:
-> Rotation vector: [[0.3406, 0.9054, 0.2534]]
-> Rotation angle: 10.975deg
-> Translation vector: [[-0.8103, 0.04748, -0.5841]]
Second pair:
-> Rotation vector: [[0.7907, 0.5027, 0.3494]]
-> Rotation angle: 5.24811deg
-> Translation vector: [[0.748, 0.2306, -0.6223]]

我的结果到处都是，我不太确定出了什么问题。两个旋转都不接近7度，平移矢量指向完全不同的方向。我创建了一个最小的代码示例来演示我的过程：

namespace fs = std::filesystem;

const fs::path ROOT_DIR = "NewTsukubaStereoDataset";
const cv::Matx33d CAMERA_MAT(615, 0, 640, 0, 615, 480, 0, 0, 1);
constexpr double HESSIAN_THRESHOLD = 400;
constexpr double LOWE_THRESHOLD = 0.8;
constexpr double RANSAC_CONFIDENCE = 0.999;
constexpr double MAX_DIST_TO_EPIPOLAR = 1;
constexpr int MAX_RANSAC_ITERS = 1000;
constexpr std::size_t MIN_INLIERS = 100;

std::optional<cv::Affine3d> reconstructPose(const cv::Mat& firstPic, const cv::Mat& secondPic, const double scale) {
    // initialize data structures
    std::vector<cv::KeyPoint> firstKeyPoints, secondKeyPoints;
    cv::Mat firstDescriptors, secondDescriptors, inlierMask;
    cv::Matx33d essentialMat, rotation;
    cv::Vec3d translation;
    std::vector<std::vector<cv::DMatch>> knnFeatureMatches;
    std::vector<cv::Point2f> firstInlierPts, secondInlierPts;

    // initialize algorithms
    cv::Ptr<cv::xfeatures2d::SURF> detector = cv::xfeatures2d::SURF::create(HESSIAN_THRESHOLD);
    cv::Ptr<cv::DescriptorMatcher> matcher = cv::DescriptorMatcher::create(cv::DescriptorMatcher::FLANNBASED);

    // compute features
    detector->detectAndCompute(firstPic, cv::noArray(), firstKeyPoints, firstDescriptors);
    detector->detectAndCompute(secondPic, cv::noArray(), secondKeyPoints, secondDescriptors);

    // find matching features
    matcher->knnMatch(firstDescriptors, secondDescriptors, knnFeatureMatches, 2);

    // ratio test as per Lowe's paper (copied from the opencv example)
    for(std::size_t i = 0; i < knnFeatureMatches.size(); ++i) {
        if(knnFeatureMatches[i][0].distance < LOWE_THRESHOLD * knnFeatureMatches[i][1].distance) {
            const cv::DMatch& m = knnFeatureMatches[i][0];
            firstInlierPts.push_back(firstKeyPoints[m.queryIdx].pt);
            secondInlierPts.push_back(secondKeyPoints[m.trainIdx].pt);
        }
    }

    // require a minimum number of inliers for effective ransac execution
    if(firstInlierPts.size() < MIN_INLIERS) {
        std::cerr << "Not enough inliers for essential matrix estimation" << std::endl;
        return std::nullopt;
    }

    // estimate essential matrix
    essentialMat = cv::findEssentialMat(firstInlierPts, secondInlierPts, CAMERA_MAT, cv::RANSAC,
                                        RANSAC_CONFIDENCE, MAX_DIST_TO_EPIPOLAR, MAX_RANSAC_ITERS, inlierMask);

    // require minimum number of valid inliers as well as a valid essential matrix (see https://en.wikipedia.org/wiki/Essential_matrix#Properties)
    if(!isValidEssentialMatrix(essentialMat) || cv::sum(inlierMask)(0) < MIN_INLIERS) {
        std::cerr << "Invalid essential matrix" << std::endl;
        return std::nullopt;
    }

    // estimate pose from the essential matrix
    const std::size_t numPoints = cv::recoverPose(essentialMat, firstInlierPts, secondInlierPts, CAMERA_MAT, rotation, translation, inlierMask);
    // recoverPose returns a unit length translation that needs to be scaled accordingly
    translation *= scale;

    // require minimum number of valid inliers as well as a valid rotation matrix
    if (isValidRotationMatrix(rotation) && numPoints >= MIN_INLIERS) {
        displayDebugPicture(firstPic, secondPic, inlierMask, firstInlierPts, secondInlierPts);
        return cv::Affine3d(rotation, translation);
    } else {
        std::cerr << "Invalid estimated pose" << std::endl;
        return std::nullopt;
    }
}

int main(int argc, char* argv[]) {
    // loading the data
    const cv::Mat left0 = cv::imread(ROOT_DIR / "illumination" / "fluorescent" / "L_00001.png", cv::IMREAD_GRAYSCALE);
    const cv::Mat left1 = cv::imread(ROOT_DIR / "illumination" / "fluorescent" / "L_00060.png", cv::IMREAD_GRAYSCALE);
    const cv::Mat right0 = cv::imread(ROOT_DIR / "illumination" / "fluorescent" / "R_00001.png", cv::IMREAD_GRAYSCALE);
    const cv::Mat right1 = cv::imread(ROOT_DIR / "illumination" / "fluorescent" / "R_00060.png", cv::IMREAD_GRAYSCALE);

    // reconstruct first pose (rotation angle should be around 7deg)
    std::cout << "Left pair:" << std::endl;
    std::optional<cv::Affine3d> pose0 = reconstructPose(left0, left1, 1);
    if(pose0.has_value()) {
        printAffine3d(pose0.value()); // prints the pose like I mentioned it above
    }

    // reconstruct second pose (rotation angle should be around 7deg)
    std::cout << "Right pair:" << std::endl;
    std::optional<cv::Affine3d> pose1 = reconstructPose(right0, right1, 1);
    if(pose1.has_value()) {
        printAffine3d(pose1.value()); // prints the pose like I mentioned it above
    }

    return EXIT_SUCCESS;
}

正如您在上面的代码中看到的，我实现了一些调试机制，以确保流程按预期工作。首先，我确保有合理数量的匹配特征来估计基本矩阵和相对姿势：

if(firstInlierPts.size() < MIN_INLIERS) {
    std::cerr << "Not enough inliers for essential matrix estimation" << std::endl;
    return std::nullopt;
}

除了检查有效内点匹配的数量，我还检查本质矩阵的有效性，确保它包含两个相等的奇异值和一个为零。我还验证了cv::recoverPose()的输出，以确保它是一个旋转矩阵：

constexpr double EPS = 1e-5;

bool isValidEssentialMatrix(const cv::Matx33d& mat) {
    cv::Vec3d singularVals;
    cv::SVD::compute(mat, singularVals, cv::SVD::NO_UV);

    int zeroInd = -1;
    for(int i = 0; i < 3; ++i) {
        if(std::abs(singularVals(i)) <= EPS) {
            zeroInd = i;
            break;
        }
    }
    if(zeroInd == 0) {
        return std::abs(singularVals(1) - singularVals(2)) <= EPS;
    } else if(zeroInd == 1) {
        return std::abs(singularVals(0) - singularVals(2)) <= EPS;
    } else if(zeroInd == 2) {
        return std::abs(singularVals(0) - singularVals(1)) <= EPS;
    } else {
        return false;
    }
}

bool isValidRotationMatrix(const cv::Matx33d& mat) {
    cv::Matx33d tmp;
    cv::absdiff((mat.t() * mat), cv::Matx33d::eye(), tmp);
    return cv::sum(tmp)(0) <= EPS && std::abs(cv::determinant(mat) - 1) <= EPS;
}

为了确保匹配的特征实际上是有意义的，我还以并排视图显示了它们，并验证了它们实际上代表了图片中的对应点：

在这一点上，我已经不知道我还能做什么来找出我的代码为什么不工作了。我是不是遗漏了什么，还是我使用OpenCV不正确？为什么这里的姿势会得到毫无意义的数据？OpenCV代码本身是否存在bug？（我使用的是版本4.7.0）

我们知道数据集：
图像的分辨率为640 x480像素，立体相机的基线为10 cm，相机的焦距为615像素。
相机矩阵包含焦距和光学中心。
光学中心通常位于图像的中心。理想值为cx = (width-1) / 2和cy = (height-1) / 2。

***减去一半***因为像素中心是整数，如果你想象一个4像素宽的图像，中心将位于中间的两个像素之间，这将是坐标1.5 =（4-1）/ 2。

在代码中，给出了这个相机矩阵：

const cv::Matx33d CAMERA_MAT( // good for images sized 1280 x 960
    615,   0, 640,
      0, 615, 480,
      0,   0,   1);

这意味着图像大小大约为1280 × 960。
对于640 × 480的图像，更合理的矩阵将改为

const cv::Matx33d CAMERA_MAT( // good for images sized 640 x 480
    615,   0, (640-1)/2,
      0, 615, (480-1)/2,
      0,   0,   1);

光学中心位于右下角的投影矩阵是非常不寻常的，但并非不可能。它将产生于将1280 x 960传感器图像向下裁剪到左上角。这样的裁剪（保持左侧和顶部原样）将不会移动光学中心。
顺便说一下，假设f = 615，你可以计算视野。这些是从投影矩阵导出的，经过一些简化：

水平：tan(theta/2) * f = width/2 => HFoV ~55.0°
垂直：tan(theta/2) * f = height/2 => VFoV ~42.6°
对角线：tan(theta/2) * f = hypot(width,height)/2 => DFoV ~66.1°

也就是说，不考虑透镜失真。
由于这是一个立体对，您可能计划计算视差图。视差和距离（沿Z方向，而不是欧几里得方向）由“基线”（一种瞳孔间距（IPD））相关。
我们有一个由眼睛和3D点组成的三角形，在这里你可以假设（w.l.o.g.）一只眼睛直视它，另一只眼睛“错过”了一段像素距离。
三角形上的方程：

tan(alpha) * f [px] = disparity [px]
tan(alpha) * distance [m] = baseline [m]

合并：

distance [m] * disparity [px] = baseline [m] * f [px]

重新整理口味。

1像素的视差给出61.5米的距离。
1米的距离给出61.5像素的视差。

考虑到所有这些，我们还可以计算某个点在X/Y上的位移，给定它的距离（沿Z方向，而不是欧几里得方向）和屏幕坐标（相对于光学中心）。

X[m] / Z[m] * f + cx = x[px]
X[m] = (x[px] - cx) / f * Z[m]

所以如果有一个点在一米之外，它的屏幕坐标是x = 300，那么x-cx = 300-319.5 = -19.5和X [m] = -0.0317 [m]。如果是10米远，那就是X[m] = -0.317 [m]

我无法让OpenCV姿势重建工作

1条答案

相关问题

热门标签

最新问答