opencv: Crop multiple bounding boxes from an image using a list of bounding boxes

Posted by 7kqas0il on 2023-11-22 in Other

Using Amazon's Rekognition, I extracted the bounding boxes of interest from the JSON response with the following:

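import json

import cv2

class Tmp:
    """Converts Rekognition's normalized (0-1) coordinates to pixel coordinates."""

    def __init__(self, image):
        self.shape = image.shape  # (height, width, channels)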

    def bounding_box_convert(self, bounding_box):

        xmin = int(bounding_box['Left'] * self.shape[1])
        xmax = xmin + int(bounding_box['Width'] * self.shape[1])
        ymin = int(bounding_box['Top'] * self.shape[0])
        ymax = ymin + int(bounding_box['Height'] * self.shape[0])

        return (xmin,ymin,xmax,ymax)

    def polygon_convert(self, polygon):
        pts = []
        for p in polygon:
            x = int(p['X'] * self.shape[1])
            y = int(p['Y'] * self.shape[0])
            pts.append( [x,y] )

        return pts

def get_bounding_boxes(jsondata):
    objectnames = ('Helmet','Hardhat')
    bboxes = []
    a = jsondata
    if('Labels' in a):
        for label in a['Labels']:

            #-- skip over anything that isn't hardhat,helmet
            if(label['Name'] in objectnames):
                print('extracting {}'.format(label['Name']))

                lbl = "{}: {:0.1f}%".format(label['Name'], label['Confidence'])
                print(lbl)

                for instance in label['Instances']:
                    coords = tmp.bounding_box_convert(instance['BoundingBox'])
                    bboxes.append(coords)

    return bboxes

if __name__=='__main__':

    imagefile = 'image011.jpg'
    bgr_image = cv2.imread(imagefile)
    tmp = Tmp(bgr_image)

    jsonname = 'json_000'
    # use a context manager so the file is closed after loading
    with open(jsonname, 'r') as fin:
        jsondata = json.load(fin)
    bb = get_bounding_boxes(jsondata)
    print(bb)

The output is a list of bounding boxes:

[(865, 731, 1077, 906), (1874, 646, 2117, 824)]


I can easily extract a single location from the list and save it as a new image with:

from PIL import Image
img = Image.open("image011.jpg")
area = (865, 731, 1077, 906)
cropped_img = img.crop(area)
cropped_img.save("cropped.jpg")


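However, I have not found a good way to crop and save multiple bounding boxes from the image using the "bb" list output.
I did find a solution that pulls the information from a csv: Most efficient/quickest way to crop multiple bounding boxes in 1 image, over thousands of images?
But I believe there is a more efficient way than saving the bounding box data to a CSV and reading it back.
I'm not very good at writing my own functions, so all suggestions are much appreciated!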

Answer #1 (np8igboo)

Assuming your bounding box coordinates are in x,y,w,h form, you can crop with ROI = image[y:y+h,x:x+w]. For this input image:


Using the script from how to get ROI Bounding Box Coordinates without Guess & Check to obtain the x,y,w,h bounding box coordinates, we crop out these ROIs:



We simply iterate over the list of bounding boxes and crop each one with Numpy slicing. The extracted ROIs:



Here is a minimal example:

import cv2
import numpy as np 

image = cv2.imread('1.png')
bounding_boxes = [(17, 24, 47, 47),
                  (74, 28, 47, 50),
                  (125, 15, 51, 61),
                  (184, 18, 53, 53),
                  (247, 25, 44, 46),
                  (296, 6, 65, 66)
]

num = 0
for box in bounding_boxes:
    x,y,w,h = box
    ROI = image[y:y+h, x:x+w]
    cv2.imwrite('ROI_{}.png'.format(num), ROI)
    num += 1
    cv2.imshow('ROI', ROI)
    cv2.waitKey()

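Note that the `bb` list in the question stores corners as (xmin, ymin, xmax, ymax) rather than (x, y, w, h), so the slice bounds differ slightly. A minimal sketch of the same loop adapted to that format follows; the helper name `crop_boxes`, the clamping to the image bounds, and the blank stand-in image are assumptions added for illustration, not part of the original answer.

```python
import numpy as np

def crop_boxes(image, boxes):
    """Return one cropped array per (xmin, ymin, xmax, ymax) box."""
    height, width = image.shape[:2]
    crops = []
    for xmin, ymin, xmax, ymax in boxes:
        # Clamp to the image so a box that overshoots an edge still slices cleanly.
        x0, y0 = max(0, xmin), max(0, ymin)
        x1, y1 = min(width, xmax), min(height, ymax)
        if x0 < x1 and y0 < y1:  # skip empty or fully out-of-frame boxes
            crops.append(image[y0:y1, x0:x1])
    return crops

# Stand-in image so the sketch runs on its own (hypothetical size);
# in practice this would be cv2.imread('image011.jpg').
image = np.zeros((1000, 2200, 3), dtype=np.uint8)
bb = [(865, 731, 1077, 906), (1874, 646, 2117, 824)]

crops = crop_boxes(image, bb)
for i, crop in enumerate(crops):
    print(i, crop.shape)
    # cv2.imwrite('crop_{}.jpg'.format(i), crop) would save each crop to disk
```

If you prefer to stay with PIL as in the question, `img.crop(box)` already takes a (left, upper, right, lower) tuple, so calling `img.crop(box).save(...)` inside the same kind of loop should work on `bb` without any conversion.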

Answer #2 (cedebl8k)

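The suggested solution is slow; this operation can be vectorized. It seems that some popular frameworks (Tensorflow, Torch) do indeed leave this kind of preprocessing to the user, while others provide it (see MatLab's bboxcrop). Below is the vectorized code I use in my own research: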

import tensorflow as tf

def crop_bounding_boxes(boxes,window):
    """Crop bounding boxes to the specified window.

    Args:
        boxes: A tensor of shape `[n_boxes,4]` describing bounding boxes. Each box is in pixel units and in the format `x_min,y_min,x_max,y_max`.
        window: A tensor of shape `[4]` describing the window. The window is in pixel units and in the format `x_min,y_min,x_max,y_max`.

    Returns:
        A tensor of shape `[n_kept,4]`: the boxes clipped to the window, with boxes that do not overlap the window dropped and coordinates shifted so that the window's `(x_min,y_min)` corner is the origin.
    """

    # compute intersections of rectangles
    tf_ops = [tf.maximum,tf.maximum,tf.minimum,tf.minimum]
    cropped_boxes = [op(window[pos],boxes[:,pos]) for (pos,op) in enumerate(tf_ops)]
    cropped_boxes = tf.stack(cropped_boxes,axis=-1)
    mask = tf.logical_and( tf.less(cropped_boxes[:,0],cropped_boxes[:,2]), tf.less(cropped_boxes[:,1],cropped_boxes[:,3]) )
    cropped_boxes = tf.boolean_mask(cropped_boxes,mask)
    # move the coordinates origin to (x1,y1)
    corner = tf.concat([window[:2],window[:2]],axis=0)
    corner = tf.broadcast_to(corner, cropped_boxes.shape)
    cropped_boxes = cropped_boxes - corner
    return cropped_boxes

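For readers without TensorFlow installed, the clip, filter and shift steps of `crop_bounding_boxes` can be sketched in NumPy. This is an illustrative equivalent under the same `x_min,y_min,x_max,y_max` pixel convention, not the author's code; the function name `clip_boxes_to_window_np` and the sample boxes are hypothetical.

```python
import numpy as np

def clip_boxes_to_window_np(boxes, window):
    """Clip (x_min, y_min, x_max, y_max) boxes to a window and re-origin them."""
    boxes = np.asarray(boxes)    # expected shape: [n_boxes, 4]
    window = np.asarray(window)  # expected shape: [4]
    # Intersect every box with the window in one vectorized step.
    mins = np.maximum(boxes[:, :2], window[:2])
    maxs = np.minimum(boxes[:, 2:], window[2:])
    clipped = np.concatenate([mins, maxs], axis=1)
    # Keep only boxes whose intersection has positive width and height.
    keep = (clipped[:, 0] < clipped[:, 2]) & (clipped[:, 1] < clipped[:, 3])
    # Shift so the window's top-left corner becomes the origin.
    return clipped[keep] - np.tile(window[:2], 2)

sample_boxes = [[64, 64, 192, 192], [200, 200, 250, 250]]
top_left = clip_boxes_to_window_np(sample_boxes, (0, 0, 128, 128))
bottom_right = clip_boxes_to_window_np(sample_boxes, (128, 128, 256, 256))
print(top_left.tolist())      # [[64, 64, 128, 128]]
print(bottom_right.tolist())  # [[0, 0, 64, 64], [72, 72, 122, 122]]
```

The second sample box lies entirely outside the top-left window, so it is dropped there; in the bottom-right window both boxes survive and are expressed relative to (128, 128).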
Here is a small demo. Consider a simple image with a box in the middle:

import tensorflow as tf
import matplotlib.pyplot as plt

img = tf.zeros(shape=(256,256,1),dtype=tf.float32)
boxes = tf.constant([[0.25,0.25,0.75,0.75]])

img_with_box = tf.image.draw_bounding_boxes([img],[boxes],colors=[[1.0,1.0,1.0]])[0]
plt.imshow(img_with_box.numpy(), cmap="gray")


Using the utility above, we crop the image, together with its bounding box, into 4 patches:

import itertools
xy = list(itertools.product(range(2),repeat=2))
IMG_PATCHES = list((x*128,y*128,(x+1)*128,(y+1)*128) for y,x in xy)

fig,axs = plt.subplots(2,2,figsize=(12,12))
boxes = tf.cast(boxes*256, dtype=tf.int32)

for ax,img_patch in zip(axs.ravel(),IMG_PATCHES):
    # crop bounding boxes to the image patch
    cropped_boxes = crop_bounding_boxes(boxes,img_patch)
    # for display, convert boxes to the format expected by TF API: [y_min, x_min, y_max, x_max] + 0-1 scale
    x1,y1,x2,y2 = img_patch
    scale = tf.constant([x2-x1,y2-y1,x2-x1,y2-y1],dtype=tf.float32)
    cropped_boxes = tf.cast(cropped_boxes,tf.float32)/tf.broadcast_to(scale,cropped_boxes.shape)
    cropped_boxes = tf.gather(cropped_boxes, [1,0,3,2], axis=-1)

    img_with_boxes = tf.image.draw_bounding_boxes([img[y1:y2,x1:x2,:]],tf.expand_dims(cropped_boxes,0),colors=[[1.0,1.0,1.0]])

    ax.imshow(img_with_boxes.numpy().squeeze(), origin='upper', extent=(x1,x2,y2,y1), cmap="gray")
    ax.set_title(str(img_patch))

plt.show()


Output:

Finally, a visually appealing example on real-world data: annotated trees.
After:



There is also a full notebook.
