Bounding boxes of multiple VNRecognizedObjectObservations are offset incorrectly when displayed with SwiftUI

wz3gfoph posted on 2023-02-03 in Swift

The bounty expires in 5 days. Answers to this question are eligible for a +250 reputation bounty. J. Doe wants to draw more attention to this question.

I'm using Vision to detect objects. After getting `[VNRecognizedObjectObservation]`, I transform the normalized rects before displaying them:

```swift
let transform = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -CGFloat(height))
VNImageRectForNormalizedRect(normalizedRect, width, height) // Displayed with SwiftUI, that's why I'm applying transform
    .applying(transform)
```
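For reference, the transform above maps a point's y to `height - y`, so a rect with Vision-style origin `y` (measured from the bottom) and height `h` ends up with its top edge at `height - y - h`. A minimal sketch of the same conversion as plain arithmetic, assuming the input is a `CGRect` in Vision's normalized, lower-left-origin space. `viewRect(fromNormalized:width:height:)` is a hypothetical helper, not a Vision API, and it keeps `CGFloat` sizes instead of the `Int` sizes `VNImageRectForNormalizedRect` takes:

```swift
import Foundation

// Hypothetical helper: de-normalize a Vision rect and flip it into a
// top-left-origin space, the same result as VNImageRectForNormalizedRect
// followed by the scale(1, -1) + translate(0, -height) transform.
func viewRect(fromNormalized r: CGRect, width: CGFloat, height: CGFloat) -> CGRect {
    // Scale the normalized values up to the view's size.
    let x = r.origin.x * width
    let y = r.origin.y * height      // still measured from the bottom edge
    let w = r.size.width * width
    let h = r.size.height * height
    // Flip vertically: the rect's upper edge sits y + h above the bottom,
    // which is height - (y + h) below the top.
    return CGRect(x: x, y: height - y - h, width: w, height: h)
}

let box = CGRect(x: 0.25, y: 0.5, width: 0.25, height: 0.25)
let r = viewRect(fromNormalized: box, width: 400, height: 200)
// r == CGRect(x: 100, y: 50, width: 100, height: 50)
```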

The width and height come from a SwiftUI `GeometryReader`:

```swift
Image(...)
    .resizable()
    .scaledToFit()
    .overlay {
        GeometryReader { geometry in // ZStack and ForEach([VNRecognizedObjectObservation], id: \.uuid), then:
            let calculatedRect = calculateRect(boundingBox, geometry)
            Rectangle()
                .frame(width: calculatedRect.width, height: calculatedRect.height)
                .offset(x: calculatedRect.origin.x, y: calculatedRect.origin.y)
        }
    }
```

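The problem is that many of the boxes are positioned incorrectly (although some are accurate), even on square images.
It isn't related to the model: when I try the same image (with the same MLModel) in Xcode's model preview section, the bounding boxes are fairly accurate.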

Sample image from my app (image not preserved):

Sample image from the Xcode preview (image not preserved):

Update (minimal reproducible example):

Putting this code in ContentView.swift of a macOS SwiftUI project, with YOLOv3Tiny.mlmodel in the project bundle, produces the same results.

```swift
import SwiftUI
import Vision
import CoreML

class Detection: ObservableObject {
    let imgURL = URL(string: "https://i.imgur.com/EqsxxTc.jpg")! // Xcode preview generates this: https://i.imgur.com/6IPNQ8b.png
    @Published var objects: [VNRecognizedObjectObservation] = []

    func getModel() -> VNCoreMLModel? {
        if let modelURL = Bundle.main.url(forResource: "YOLOv3Tiny", withExtension: "mlmodelc") {
            if let mlModel = try? MLModel(contentsOf: modelURL, configuration: MLModelConfiguration()) {
                return try? VNCoreMLModel(for: mlModel)
            }
        }
        return nil
    }

    func detect() async {
        guard let model = getModel(), let tiff = NSImage(contentsOf: imgURL)?.tiffRepresentation else {
            fatalError("Either YOLOv3Tiny.mlmodel is not in project bundle, or image failed to load.")
            // YOLOv3Tiny: https://ml-assets.apple.com/coreml/models/Image/ObjectDetection/YOLOv3Tiny/YOLOv3Tiny.mlmodel
        }
        let request = VNCoreMLRequest(model: model) { (request, error) in
            DispatchQueue.main.async {
                self.objects = (request.results as? [VNRecognizedObjectObservation]) ?? []
            }
        }
        try? VNImageRequestHandler(data: tiff).perform([request])
    }

    func deNormalize(_ rect: CGRect, _ geometry: GeometryProxy) -> CGRect {
        let transform = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -CGFloat(geometry.size.height))
        return VNImageRectForNormalizedRect(rect, Int(geometry.size.width), Int(geometry.size.height)).applying(transform)
    }
}

struct ContentView: View {
    @StateObject var detection = Detection()

    var body: some View {
        AsyncImage(url: detection.imgURL) { img in
            img.resizable().scaledToFit().overlay {
                GeometryReader { geometry in
                    ZStack {
                        ForEach(detection.objects, id: \.uuid) { object in
                            let rect = detection.deNormalize(object.boundingBox, geometry)
                            Rectangle()
                                .stroke(lineWidth: 2)
                                .foregroundColor(.red)
                                .frame(width: rect.width, height: rect.height)
                                .offset(x: rect.origin.x, y: rect.origin.y)
                        }
                    }
                }
            }
        } placeholder: {
            ProgressView()
        }
        .onAppear {
            Task { await self.detection.detect() }
        }
    }
}
```

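**Edit:** Further testing shows that Vision returns the correct positions, and my `deNormalize()` function also returns the correct position and size, so it must be related to SwiftUI.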

nlejzf6q (answer #1)

Issue 1

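Inside the `GeometryReader`, the `ZStack` shrinks to its smallest size (just large enough for its largest child) instead of filling the reader.
Add `.border(Color.orange)` to the `ZStack` and you will see something like the screenshot below (image not preserved).

You can use `.frame(maxWidth: .infinity, maxHeight: .infinity)` to stretch the `ZStack` to take up all the available space.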

Issue 2

positionoffset的比较。
offset通常从中心开始,然后按指定的量offset
position更像是origin
将此视图的中心定位在其父级坐标空间中的指定坐标处。
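A simplified model of that difference, assuming fixed-size children and a container that centers them the way a `ZStack` does by default. The function names are hypothetical and this is not how SwiftUI is implemented internally; it only shows that `.offset` depends on where the container put the view, while `.position` does not:

```swift
import Foundation

// Simplified model: .offset shifts a view from the spot layout already gave it.
func originAfterOffset(layoutOrigin: CGPoint, offset: CGPoint) -> CGPoint {
    CGPoint(x: layoutOrigin.x + offset.x, y: layoutOrigin.y + offset.y)
}

// Simplified model: .position puts the view's *center* at a point in the
// parent's coordinate space, regardless of alignment.
func originAfterPosition(center: CGPoint, size: CGSize) -> CGPoint {
    CGPoint(x: center.x - size.width / 2, y: center.y - size.height / 2)
}

// A 100×50 child in a 300×200 container that centers its children.
let childSize = CGSize(width: 100, height: 50)
let laidOut = CGPoint(x: (300 - childSize.width) / 2, y: (200 - childSize.height) / 2) // (100, 75)

// Goal: top-left corner at (20, 10).
let viaOffset = originAfterOffset(layoutOrigin: laidOut, offset: CGPoint(x: 20, y: 10))
let viaPosition = originAfterPosition(center: CGPoint(x: 20 + 50, y: 10 + 25), size: childSize)
// viaOffset == (120, 85): shifted from the centered spot, not from (0, 0).
// viaPosition == (20, 10): lands where intended.
```

In a container that places children at its top-leading corner instead, `laidOut` would be `(0, 0)` and `.offset` alone would reach the intended origin.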

Issue 3

Adjust for the fact that `position` uses the view's center, while the rect's origin is its top-left corner (0, 0).
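The adjustment amounts to adding half the size to the top-left origin before passing it to `.position(x:y:)`, which is what the answer's code does inline. A small sketch with a hypothetical helper name:

```swift
import Foundation

// Hypothetical helper: .position expects a center point, but the
// de-normalized CGRect gives the top-left corner.
func positionCenter(for rect: CGRect) -> CGPoint {
    CGPoint(x: rect.origin.x + rect.size.width / 2,
            y: rect.origin.y + rect.size.height / 2)
}

let rect = CGRect(x: 40, y: 30, width: 120, height: 60)
let c = positionCenter(for: rect) // (100, 60)
```

On Apple platforms, `CGPoint(x: rect.midX, y: rect.midY)` gives the same point for rects with non-negative width and height.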

Issue 4

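The `ZStack` needs to be flipped upside down (rotated 180° around the X axis), because Vision's normalized coordinates have their origin at the bottom-left while SwiftUI's is at the top-left.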
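This flip replaces the per-rect transform the question applied in `deNormalize()`. A 2D sketch of why the two are equivalent, assuming the 180° rotation around the X axis (about the view's center) acts on the flat overlay as a mirror across its horizontal midline, so y becomes `height - y`. The helper names are hypothetical:

```swift
import Foundation

// Assumption: in 2D terms, the 180° X-axis rotation maps y -> height - y.
func flipPoint(_ p: CGPoint, height: CGFloat) -> CGPoint {
    CGPoint(x: p.x, y: height - p.y)
}

// Mirror both horizontal edges of a rect; the old bottom edge becomes the new top edge.
func flipRect(_ r: CGRect, height: CGFloat) -> CGRect {
    let oldTop = flipPoint(CGPoint(x: r.origin.x, y: r.origin.y), height: height)
    let oldBottom = flipPoint(CGPoint(x: r.origin.x, y: r.origin.y + r.size.height), height: height)
    return CGRect(x: r.origin.x, y: oldBottom.y, width: r.size.width, height: oldTop.y - oldBottom.y)
}

// A box drawn un-flipped, using Vision's y (measured from the bottom), in a 200-pt-tall overlay.
let drawn = CGRect(x: 100, y: 100, width: 100, height: 50)
let seen = flipRect(drawn, height: 200)
// seen == CGRect(x: 100, y: 50, width: 100, height: 50): the top edge is
// height - y - h, the same value the question's CGAffineTransform produces.
```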
Here is the full code:

```swift
import SwiftUI
import Vision
import CoreML
@MainActor
class Detection: ObservableObject {
    //Moved file to assets
    //let imgURL = URL(string: "https://i.imgur.com/EqsxxTc.jpg")! // Xcode preview generates this: https://i.imgur.com/6IPNQ8b.png
    let imageName: String = "EqsxxTc"
    @Published var objects: [VNRecognizedObjectObservation] = []
    
    func getModel() throws -> VNCoreMLModel {
        //Used model directly instead of loading from URL
        let model = try YOLOv3Tiny(configuration: .init()).model
        
        let mlModel = try VNCoreMLModel(for: model)
        
        return mlModel
    }
    
    func detect() async throws {
        let model = try getModel()
        
        guard let tiff = NSImage(named: imageName)?.tiffRepresentation else {
            // YOLOv3Tiny: https://ml-assets.apple.com/coreml/models/Image/ObjectDetection/YOLOv3Tiny/YOLOv3Tiny.mlmodel
            //fatalError("Either YOLOv3Tiny.mlmodel is not in project bundle, or image failed to load.")
            throw AppError.unableToLoadImage
        }
        //Completion handlers are not compatible with async/await; you have to convert them to a continuation.
        self.objects = try await withCheckedThrowingContinuation { (cont: CheckedContinuation<[VNRecognizedObjectObservation], Error>) in
            
            let request = VNCoreMLRequest(model: model) { (request, error) in
                if let error = error{
                    cont.resume(throwing: error)
                }else{
                    cont.resume(returning: (request.results as? [VNRecognizedObjectObservation]) ?? [])
                }
            }
            do{
                try VNImageRequestHandler(data: tiff).perform([request])
            }catch{
                cont.resume(throwing: error)
            }
        }
    }
    
    func deNormalize(_ rect: CGRect, _ geometry: GeometryProxy) -> CGRect {
        return VNImageRectForNormalizedRect(rect, Int(geometry.size.width), Int(geometry.size.height))
    }
}

struct ContentView: View {
    @StateObject var detection = Detection()
    
    var body: some View {
        Image(detection.imageName)
            .resizable()
            .scaledToFit()
            .overlay {
                GeometryReader { geometry in
                    ZStack {
                        ForEach(detection.objects, id: \.uuid) { object in
                            let rect = detection.deNormalize(object.boundingBox, geometry)
                            Rectangle()
                                .stroke(lineWidth: 2)
                                .foregroundColor(.red)
                                .frame(width: rect.width, height: rect.height)
                            //Changed to position
                            //Adjusting for center vs leading origin
                                .position(x: rect.origin.x + rect.width/2, y: rect.origin.y + rect.height/2)
                        }
                    }
                    //Geometry reader makes the view shrink to its smallest size
                    .frame(maxWidth: .infinity, maxHeight: .infinity)
                    //Flip upside down
                    .rotation3DEffect(.degrees(180), axis: (x: 1, y: 0, z: 0))
                    
                }.border(Color.orange)
            }
        
            .task {
                do{
                    try await self.detection.detect()
                }catch{
                    //Always throw errors to the View so you can tell the user somehow. You don't want crashes or to leave the user waiting for something that has failed.
                    print(error)
                }
            }
    }
}
struct ContentView_Previews: PreviewProvider {
    static var previews: some View {
        ContentView()
    }
}

enum AppError: LocalizedError{
    case cannotFindFile
    case unableToLoadImage
}
```

I also changed a few other things; you'll notice they are annotated with comments in the code.
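One of those changes is the continuation: `VNCoreMLRequest` reports its results through a completion handler, and `withCheckedThrowingContinuation` bridges such a callback into `async`/`await`. A generic sketch of the pattern, using a hypothetical callback API `legacyLoad` in place of Vision so it runs on its own. The rule it illustrates is that every path must resume the continuation exactly once; a checked continuation traps if it is resumed twice and logs a warning if it is never resumed:

```swift
import Foundation

enum LoadError: Error { case empty }

// Hypothetical callback-based API standing in for a completion handler;
// it calls `completion` exactly once, with either a result or an error.
func legacyLoad(_ input: [Int], completion: @escaping ([Int]?, Error?) -> Void) {
    if input.isEmpty {
        completion(nil, LoadError.empty)
    } else {
        completion(input.map { $0 * 2 }, nil)
    }
}

// Async wrapper: each branch of the callback resumes the continuation once.
func load(_ input: [Int]) async throws -> [Int] {
    try await withCheckedThrowingContinuation { (cont: CheckedContinuation<[Int], Error>) in
        legacyLoad(input) { result, error in
            if let error = error {
                cont.resume(throwing: error)
            } else {
                cont.resume(returning: result ?? [])
            }
        }
    }
}

let doubled = try await load([1, 2, 3]) // [2, 4, 6]
```

Top-level `await` like the last line needs Swift 5.7 or later in a script or `main.swift`; in an app you would call `load` from a `Task` or the `.task` modifier, as the answer does with `detect()`.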

nc1teljy (answer #2)

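OK, so after a long time troubleshooting, I finally managed to make it work correctly (while still not understanding the cause of the problem)...
The problem was in this part: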

```swift
GeometryReader { geometry in
    ZStack {
        ForEach(detection.objects, id: \.uuid) { object in
            let rect = detection.deNormalize(object.boundingBox, geometry)
            Rectangle()
                .stroke(lineWidth: 2)
                .foregroundColor(.red)
                .frame(width: rect.width, height: rect.height)
                .offset(x: rect.origin.x, y: rect.origin.y)
        }
    }
}
```

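I assumed that because many `Rectangle()`s would overlap, I needed a `ZStack()` to layer them on top of each other. That turned out to be wrong: apparently, when using `.offset()`, they can overlap without any problem, so removing the `ZStack()` solved the issue completely: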

```swift
GeometryReader { geometry in
    ForEach(detection.objects, id: \.uuid) { object in
        let rect = detection.deNormalize(object.boundingBox, geometry)
        Rectangle()
            .stroke(lineWidth: 2)
            .foregroundColor(.red)
            .frame(width: rect.width, height: rect.height)
            .offset(x: rect.origin.x, y: rect.origin.y)
    }
}
```

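What I still don't understand is why moving the `ZStack()` outside the `GeometryReader()` also fixes the problem, and why some boxes were positioned correctly while others weren't!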

```swift
ZStack {
    GeometryReader { geometry in
        ForEach(detection.objects, id: \.uuid) { object in
            let rect = detection.deNormalize(object.boundingBox, geometry)
            Rectangle()
                .stroke(lineWidth: 2)
                .foregroundColor(.red)
                .frame(width: rect.width, height: rect.height)
                .offset(x: rect.origin.x, y: rect.origin.y)
        }
    }
}
```
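One plausible explanation for both puzzles, assuming the usual SwiftUI layout rules: a `ZStack` is only as large as the largest width and height among its children and centers each child by default, `GeometryReader` places its content at its top-leading corner, and `.offset` does not affect layout. In the original code, every box was first centered inside a `ZStack` sized to the biggest box and only then offset, so the biggest box got no extra shift and looked correct while smaller ones drifted. A simplified arithmetic model with hypothetical names:

```swift
import Foundation

// Simplified model of where each Rectangle ends up in the three variants above.
// Each box's intended top-left origin is what gets passed to .offset.
enum OverlayLayout { case zstackInsideReader, noZStack, zstackOutsideReader }

func drawnOrigins(of boxes: [CGRect], layout: OverlayLayout) -> [CGPoint] {
    switch layout {
    case .noZStack, .zstackOutsideReader:
        // Each Rectangle is a direct child of the GeometryReader, which places
        // it at (0, 0); .offset then moves it to its intended origin.
        return boxes.map { $0.origin }
    case .zstackInsideReader:
        // The ZStack sits at (0, 0), shrinks to the largest width and height,
        // and centers every child inside that area before .offset is applied.
        let stackW = boxes.map { $0.size.width }.max() ?? 0
        let stackH = boxes.map { $0.size.height }.max() ?? 0
        return boxes.map { box in
            CGPoint(x: (stackW - box.size.width) / 2 + box.origin.x,
                    y: (stackH - box.size.height) / 2 + box.origin.y)
        }
    }
}

let boxes = [CGRect(x: 10, y: 20, width: 200, height: 100),  // the largest box
             CGRect(x: 50, y: 60, width: 40, height: 30)]    // a smaller box
let wrong = drawnOrigins(of: boxes, layout: .zstackInsideReader)
// wrong[0] == (10, 20): the largest box lands where intended.
// wrong[1] == (130, 95): the smaller box is pushed by half the size difference.
```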
