Swift AVAudioPlayerNode causing distortion

edqdpe6u · posted 2023-01-29 in Swift

I have an AVAudioPlayerNode attached to an AVAudioEngine. Sample buffers are supplied to the playerNode via the scheduleBuffer() method.
However, the playerNode appears to be distorting the audio. Rather than simply "passing through" the buffers, the output is distorted and laced with static (though still mostly audible).
Relevant code:

let myBufferFormat = AVAudioFormat(standardFormatWithSampleRate: 48000, channels: 2)

// Configure player node
let playerNode = AVAudioPlayerNode()
audioEngine.attach(playerNode)
audioEngine.connect(playerNode, to: audioEngine.mainMixerNode, format: myBufferFormat)

// Provide audio buffers to playerNode
for await buffer in mySource.streamAudio() {
    await playerNode.scheduleBuffer(buffer)
}

In the example above, mySource.streamAudio() supplies audio in real time from a ScreenCaptureKit SCStreamDelegate. The audio buffers arrive as CMSampleBuffers, are converted to AVAudioPCMBuffers, and are then passed to the audio engine above through an AsyncStream. I have verified that the converted buffers are valid.
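For reference, a minimal sketch of what that delegate-to-AsyncStream bridge might look like (streamAudio() is the name used above; audioContinuation is a hypothetical property, since the question doesn't show this part):

private var audioContinuation: AsyncStream<AVAudioPCMBuffer>.Continuation?

// Create the stream once; the SCStreamOutput callback yields into it.
func streamAudio() -> AsyncStream<AVAudioPCMBuffer> {
    AsyncStream { continuation in
        self.audioContinuation = continuation
    }
}

// Then, inside the SCStreamOutput callback:
// if let pcm = createPCMBuffer(from: sampleBuffer) {
//     audioContinuation?.yield(pcm)
// }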
Perhaps the buffers aren't arriving fast enough? This chart of roughly 25,000 frames suggests that the inputNode is periodically inserting stretches of "zero" frames:

[chart image not preserved in this capture]

The distortion appears to be the result of these empty frames.
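One way to produce a chart like this (not shown in the original post) is to install a tap on the mixer and count all-zero samples per buffer. A rough diagnostic sketch, assuming the audioEngine from the snippet above:

audioEngine.mainMixerNode.installTap(onBus: 0, bufferSize: 4096, format: nil) { buffer, _ in
    // Count frames whose first-channel sample is exactly zero; long runs
    // of zeros correspond to the silent gaps described above.
    guard let samples = buffer.floatChannelData?[0] else { return }
    let zeroFrames = (0..<Int(buffer.frameLength)).filter { samples[$0] == 0 }.count
    if zeroFrames > 0 {
        print("silent frames: \(zeroFrames) / \(buffer.frameLength)")
    }
}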

Edit:

The distortion persists even if we remove the AsyncStream from the pipeline entirely and process the buffers immediately in the ScreenCaptureKit callback. Below is an end-to-end example that can be run as-is (the important part is didOutputSampleBuffer):

import AVFoundation
import ScreenCaptureKit

class Recorder: NSObject, SCStreamOutput {
    
    private let audioEngine = AVAudioEngine()
    private let playerNode = AVAudioPlayerNode()
    private var stream: SCStream?
    private let queue = DispatchQueue(label: "sampleQueue", qos: .userInitiated)
    
    func setupEngine() {
        let format = AVAudioFormat(standardFormatWithSampleRate: 48000, channels: 2)
        audioEngine.attach(playerNode)
        // playerNode --> mainMixerNode --> outputNode --> speakers
        audioEngine.connect(playerNode, to: audioEngine.mainMixerNode, format: format)
        audioEngine.prepare()
        try? audioEngine.start()
        playerNode.play()
    }
    
    func startCapture() async {
        // Capture audio from Safari
        let availableContent = try! await SCShareableContent.excludingDesktopWindows(true, onScreenWindowsOnly: false)
        let display = availableContent.displays.first!
        let app = availableContent.applications.first(where: {$0.applicationName == "Safari"})!
        let filter = SCContentFilter(display: display, including: [app], exceptingWindows: [])
        let config = SCStreamConfiguration()
        config.capturesAudio = true
        config.sampleRate = 48000
        config.channelCount = 2
        stream = SCStream(filter: filter, configuration: config, delegate: nil)
        try! stream!.addStreamOutput(self, type: .audio, sampleHandlerQueue: queue)
        try! stream!.addStreamOutput(self, type: .screen, sampleHandlerQueue: queue) // To prevent warnings
        try! await stream!.startCapture()
    }
    
    func stream(_ stream: SCStream, didOutputSampleBuffer sampleBuffer: CMSampleBuffer, of type: SCStreamOutputType) {
        switch type {
        case .audio:
            let pcmBuffer = createPCMBuffer(from: sampleBuffer)!
            playerNode.scheduleBuffer(pcmBuffer, completionHandler: nil)
        default:
            break // Ignore video frames
        }
    }
    
    func createPCMBuffer(from sampleBuffer: CMSampleBuffer) -> AVAudioPCMBuffer? {
        var ablPointer: UnsafePointer<AudioBufferList>?
        try? sampleBuffer.withAudioBufferList { audioBufferList, blockBuffer in
            ablPointer = audioBufferList.unsafePointer
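            // Note: this pointer is only guaranteed to be valid inside this
            // closure; letting it escape is the lifetime problem suggested in
            // the edit to answer 2 below.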
        }
        guard let audioBufferList = ablPointer,
              let absd = sampleBuffer.formatDescription?.audioStreamBasicDescription,
              let format = AVAudioFormat(standardFormatWithSampleRate: absd.mSampleRate, channels: absd.mChannelsPerFrame) else { return nil }
        return AVAudioPCMBuffer(pcmFormat: format, bufferListNoCopy: audioBufferList)
    }
    
}

let recorder = Recorder()
recorder.setupEngine()
Task {
    await recorder.startCapture()
}
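// Note: in a command-line tool the process must also be kept alive
// (for example with RunLoop.main.run()) so the stream callbacks can fire.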
yhuiod9q · Answer 1

Your "write the buffer to a file: distorted!" block is almost certainly doing something slow and blocking (like writing a file). It gets called once every ~170 ms (8192 frames ÷ 48 kHz). The tap block had better not take longer than that to execute, or you'll fall behind and drop buffers.
You can stay synchronous while writing the file, but it depends on how you do it. If you do something very inefficient (such as reopening and flushing the file for every buffer), you may not be able to keep up.
If this theory is correct, the live speaker output should be free of static; only your output file would be affected.
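A sketch of the efficient version of that advice (the output URL and the format variable are assumptions; open the AVAudioFile once, then append each tap buffer to it):

import AVFoundation

let fileURL = URL(fileURLWithPath: "/tmp/capture.caf")  // hypothetical path
let file = try! AVAudioFile(forWriting: fileURL, settings: format.settings)  // opened once, not per buffer
audioEngine.mainMixerNode.installTap(onBus: 0, bufferSize: 8192, format: format) { buffer, _ in
    // Appending to an already-open AVAudioFile is cheap enough to stay
    // inside the ~170 ms budget; reopening/flushing per buffer is not.
    do { try file.write(from: buffer) }
    catch { print("write failed: \(error)") }
}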

nhhxz33t · Answer 2

The culprit was the createPCMBuffer() function. Replace it with the version below and everything runs smoothly:

func createPCMBuffer(from sampleBuffer: CMSampleBuffer) -> AVAudioPCMBuffer? {
    let numSamples = AVAudioFrameCount(sampleBuffer.numSamples)
    let format = AVAudioFormat(cmAudioFormatDescription: sampleBuffer.formatDescription!)
    let pcmBuffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: numSamples)!
    pcmBuffer.frameLength = numSamples
    // Copy the PCM data out of the CMSampleBuffer so the AVAudioPCMBuffer
    // owns its memory, unlike the no-copy variant in the question.
    CMSampleBufferCopyPCMDataIntoAudioBufferList(sampleBuffer, at: 0, frameCount: Int32(numSamples), into: pcmBuffer.mutableAudioBufferList)
    return pcmBuffer
}

The original function in my question was taken directly from Apple's ScreenCaptureKit sample project. It technically works, and the audio sounds fine when written to a file, but apparently it isn't fast enough for real-time audio.

**Edit:** Actually, this probably has nothing to do with speed, since the new function is on average 2-3x slower because it copies the data. More likely, the underlying data was being deallocated while the AVAudioPCMBuffer created from the raw pointer was still in use.
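If lifetime is indeed the issue, an untested alternative (my assumption, not part of the fix above) would be to keep the no-copy path but capture the backing CMBlockBuffer in the schedule completion handler, so the memory stays alive until the player has consumed the buffer:

func scheduleNoCopy(_ sampleBuffer: CMSampleBuffer, on playerNode: AVAudioPlayerNode) {
    try? sampleBuffer.withAudioBufferList { audioBufferList, blockBuffer in
        guard let absd = sampleBuffer.formatDescription?.audioStreamBasicDescription,
              let format = AVAudioFormat(standardFormatWithSampleRate: absd.mSampleRate,
                                         channels: absd.mChannelsPerFrame),
              let pcmBuffer = AVAudioPCMBuffer(pcmFormat: format,
                                               bufferListNoCopy: audioBufferList.unsafePointer)
        else { return }
        playerNode.scheduleBuffer(pcmBuffer) {
            _ = blockBuffer  // retains the audio memory until this buffer finishes playing
        }
    }
}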
