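I want to apply a filter with CUDA to an OpenCV ROI taken from the 512x512 Lenna image, but I think I have a problem copying the data from host to device correctly.
I realized that the Mat object is not continuous, so the dimensions of the ROI matrix are not what I expected: step[0] is much larger than cols*elemSize().
When I look at the result, I see that the filter is applied across almost the full width of the image. I tried adjusting the total byte count, but that only changed the height of the filtered region.
I would like the filter to be applied at least within the black rectangle (194x194), without using the OpenCV CUDA API (GpuMat).
Here is my current code:
Main function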
#include <opencv2/imgcodecs.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/objdetect.hpp>
#include <iostream>
using namespace cv;
using namespace std;
int main(){
    int THREADS = 8;
    //Load Image
    string path = "../resources/lenna.png"; //512x512 size
    Mat img = imread(path);
    //Load haarcascade
    CascadeClassifier faceCascade;
    faceCascade.load("../resources/haarcascade_frontalface_default.xml");
    Mat img_gray;
    cvtColor(img, img_gray, COLOR_BGR2GRAY);
    vector<Rect> faces;
    faceCascade.detectMultiScale(img_gray, faces, 1.1, 10);
    for (int i = 0; i < faces.size(); i++){
        //The int conversion is needed for my final filter (blur).
        img.convertTo(img, CV_32SC3);
        Rect R = setROI(faces[i]); //Adjust black rectangle region.
        Point top_left = R.tl();
        Point bot_right = R.br();
        int w = bot_right.x - top_left.x;
        int h = bot_right.y - top_left.y;
        Mat faceROI = img(R);
        faceROI.convertTo(faceROI, CV_32SC3);
        //Filter apply
        myFilter(faceROI, w, h, THREADS);
        //Draws rectangles. Green one is for detected face and black one is for previous adjustment.
        rectangle(img, faces[i].tl(), faces[i].br(), Scalar(0, 255,0), 3);
        rectangle(img, top_left, bot_right, Scalar(0,0,0), 3);
        //Recover original format.
        img.convertTo(img, CV_8UC3);
        imwrite("../resources/test.jpg", img);
    }
    return 0;
}
My filter
void myFilter(Mat face, int w, int h, int THREADS){
    //It's confirmed that w, h = 194
    /*CUDA WORK*/
    int faceBytes = face.step[0]*face.rows; //Should be face.elemSize()*sizeof(int)*w*h = 3*4*194*194, but it gives 3*4*512*194
    //face.isContinuous() gives 0
    //face.rows = face.cols = 194
    int *d_face;
    cudaMalloc<int>(&d_face, faceBytes);
    cudaMemcpy(d_face, face.ptr(), faceBytes, cudaMemcpyHostToDevice);
    dim3 threadsPerBlock(THREADS, THREADS);
    dim3 numBlocks(ceil(w / threadsPerBlock.x), ceil(h / threadsPerBlock.y));
    myFilterKernel<<<numBlocks, threadsPerBlock>>>(d_face, w, h, face.step1());
    cudaDeviceSynchronize();
    cudaMemcpy(face.ptr(), d_face, faceBytes, cudaMemcpyDeviceToHost);
    cudaFree(d_face);
}
My filter kernel
__global__ void myFilterKernel(int* d_face, int width, int height, int faceStep){
    int x = blockIdx.x*blockDim.x+threadIdx.x;
    int y = blockIdx.y*blockDim.y+threadIdx.y;
    int face_c = faceStep /width; //Channel count, it should be 3
    if (y < height && x < width){
        //Thread pos
        int face_tid = y * faceStep + (face_c * x);
        //Filter
        for (int i = 0; i < face_c; i++){
            d_face[face_tid + i] *= 2;
        }
    }
}
1 Answer
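When the filter applies to a specific ROI, we can copy the ROI from the host to the device, apply the filter to the ROI, and copy the filtered ROI from the device back to the host.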
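As a side note on the symptom described in the question, the following host-only C++ sketch replays the kernel's integer arithmetic. It assumes face.step1() returns 1536 (512 pixels times 3 channels, as the question's own comment suggests) and a block edge of 8; the names RowCoverage, questionRowCoverage, and divUp are hypothetical helpers for illustration, not part of OpenCV or CUDA. Under those assumptions, faceStep / width evaluates to 7 rather than 3, so each row is written well beyond the 194-pixel ROI, which is one plausible explanation for the filter covering almost the full image width.

```cpp
#include <algorithm>
#include <cassert>

// Which elements the question's kernel can reach within one image row.
struct RowCoverage {
    int inferredChannels;  // what the kernel computes as faceStep / width
    int threadsLaunchedX;  // threads actually launched along x
    int pixelsTouched;     // real pixels written per row, counted from the ROI's left edge
};

// Replays the question's integer arithmetic on the host (hypothetical helper).
// step1: elements per row of the parent image, width: ROI width in pixels,
// trueChannels: actual channel count, threads: block edge length.
inline RowCoverage questionRowCoverage(int step1, int width, int trueChannels, int threads) {
    assert(step1 > 0 && trueChannels > 0 && threads > 0 && width >= threads);
    RowCoverage c{};
    // Dividing the parent row length by the ROI width does not give the channel count.
    c.inferredChannels = step1 / width;
    // ceil(w / threads) receives an already truncated integer quotient.
    c.threadsLaunchedX = (width / threads) * threads;
    const int lastX = std::min(c.threadsLaunchedX, width) - 1;
    const int lastElement = c.inferredChannels * lastX + (c.inferredChannels - 1);
    c.pixelsTouched = lastElement / trueChannels + 1;
    return c;
}

// Integer ceiling division, a common way to size a launch grid.
inline int divUp(int n, int d) {
    assert(n >= 0 && d > 0);
    return (n + d - 1) / d;
}
```

With the question's numbers, questionRowCoverage(1536, 194, 3, 8) reports 7 inferred channels, 192 launched threads along x, and 448 touched pixels per row, while divUp(194, 8) gives the 25 blocks needed to cover all 194 columns. Passing the channel count explicitly (for example from face.channels()) instead of deriving it from the step would avoid the first problem.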
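To copy the ROI from host to device and back, we can use cudaMemcpy2D instead of cudaMemcpy. With cudaMemcpy2D we have to set the source pitch and the destination pitch (the number of bytes between the starts of consecutive rows on each side). As the accompanying figure illustrates, we only need to allocate, copy, and process the small rectangle.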
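To make the pitch arguments concrete, here is a host-only C++ sketch that mimics what a pitched 2D copy does: it copies a fixed number of bytes per row while stepping through source and destination with different row strides. The functions copy2D and extractRoi are hypothetical stand-ins written for this illustration; they are not the CUDA implementation. The argument order follows cudaMemcpy2D(dst, dpitch, src, spitch, width, height, kind), where width is given in bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// CPU stand-in for a pitched 2D copy: copies `height` rows of `widthBytes`
// bytes each. Consecutive rows start dstPitch / srcPitch bytes apart.
inline void copy2D(void* dst, std::size_t dstPitch,
                   const void* src, std::size_t srcPitch,
                   std::size_t widthBytes, std::size_t height) {
    assert(widthBytes <= dstPitch && widthBytes <= srcPitch);
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);
    for (std::size_t row = 0; row < height; ++row) {
        std::memcpy(d + row * dstPitch, s + row * srcPitch, widthBytes);
    }
}

// Extracts a roiW x roiH region whose top-left pixel is (x0, y0) from a
// row-major, interleaved-channel image into a tightly packed buffer.
inline std::vector<int> extractRoi(const std::vector<int>& image, int imageW, int channels,
                                   int x0, int y0, int roiW, int roiH) {
    assert(channels > 0 && roiW > 0 && roiH > 0);
    assert(x0 >= 0 && y0 >= 0 && x0 + roiW <= imageW);
    assert(image.size() >= static_cast<std::size_t>(y0 + roiH) * imageW * channels);
    std::vector<int> roi(static_cast<std::size_t>(roiW) * roiH * channels);
    // Source rows are a full image row apart; destination rows are packed.
    const std::size_t srcPitch = static_cast<std::size_t>(imageW) * channels * sizeof(int);
    const std::size_t dstPitch = static_cast<std::size_t>(roiW) * channels * sizeof(int);
    const int* srcStart = image.data() + (static_cast<std::size_t>(y0) * imageW + x0) * channels;
    copy2D(roi.data(), dstPitch, srcStart, srcPitch, dstPitch, static_cast<std::size_t>(roiH));
    return roi;
}
```

Mapped onto the question's variables, the host side would presumably use face.ptr() as the source, face.step[0] as the source pitch, and face.cols * face.elemSize() as both the row width in bytes and the pitch of a packed device buffer. The kernel can then index the device buffer with a row stride of width * channels elements instead of the parent image's step.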
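Code sample (without face detection):
Output:
Notes:
If the filter reads neighboring pixels (as a blur would), we may have to copy a larger ROI from host to device (and copy only the exact ROI from device back to host).
In that case, testing whether a pixel falls outside the image bounds can become more complicated.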
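For a filter that reads neighbors, such as the blur mentioned in the question, one way to handle the bounds bookkeeping is to compute the enlarged region on the host and clamp it to the image before uploading. The sketch below is a minimal illustration under that assumption; Box, PaddedRoi, and padAndClamp are hypothetical names, and radius stands for the filter's half-width.

```cpp
#include <algorithm>
#include <cassert>

// Axis-aligned rectangle in pixel coordinates.
struct Box { int x, y, w, h; };

struct PaddedRoi {
    Box padded;   // region to upload, clamped to the image
    int offsetX;  // column of the exact ROI inside `padded`
    int offsetY;  // row of the exact ROI inside `padded`
};

// Grows `roi` by `radius` pixels on every side, then clamps to the image.
// The offsets say where the exact ROI starts inside the padded region, so a
// kernel can process only the exact ROI while reading the border pixels.
inline PaddedRoi padAndClamp(const Box& roi, int radius, int imageW, int imageH) {
    assert(radius >= 0 && roi.w > 0 && roi.h > 0);
    assert(roi.x >= 0 && roi.y >= 0);
    assert(roi.x + roi.w <= imageW && roi.y + roi.h <= imageH);
    const int x0 = std::max(roi.x - radius, 0);
    const int y0 = std::max(roi.y - radius, 0);
    const int x1 = std::min(roi.x + roi.w + radius, imageW);
    const int y1 = std::min(roi.y + roi.h + radius, imageH);
    return PaddedRoi{ Box{x0, y0, x1 - x0, y1 - y0}, roi.x - x0, roi.y - y0 };
}
```

Because the padding can be cut off on any side near the image border, the kernel still needs a per-pixel check against the padded region's width and height. The offsets let it map results back so that only the exact ROI is copied to the host.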