swift: Streaming audio from the Watch to the iPhone to use SFSpeechRecognizer

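I want to do speech recognition in my Watch app and display a live transcription. Since SFSpeechRecognizer isn't available on watchOS, I set the app up to stream audio to the iOS companion using WatchConnectivity. Before attempting this, I tried the same code on the iPhone alone, with no Watch involved, and it worked there.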

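**With my streaming attempt, the companion receives the audio chunks and doesn't throw any errors, but it doesn't transcribe any text either.** I suspect I'm doing something wrong when converting to and from AVAudioPCMBuffer, but I can't quite put my finger on it, since I lack experience working with raw data and pointers.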

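Right now, the whole thing works as follows:
1. The user presses a button, which triggers the Watch to ask the iPhone to set up a recognitionTask.

The iPhone sets up the recognitionTask and answers with OK or some error: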

guard let speechRecognizer = self.speechRecognizer else {
    WCManager.shared.sendWatchMessage(.speechRecognitionRequest(.error("no speech recognizer")))
    return
}
recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
guard let recognitionRequest = recognitionRequest else {
    WCManager.shared.sendWatchMessage(.speechRecognitionRequest(.error("speech recognition request denied by ios")))
    return
}
recognitionRequest.shouldReportPartialResults = true
if #available(iOS 13, *) {
    recognitionRequest.requiresOnDeviceRecognition = true
}

recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest) { result, error in
    if let result = result {
        let t = result.bestTranscription.formattedString
        WCManager.shared.sendWatchMessage(.recognizedSpeech(t))
    }
    
    if error != nil {
        self.recognitionRequest = nil
        self.recognitionTask = nil
        WCManager.shared.sendWatchMessage(.speechRecognition(.error("?")))
    }
}
WCManager.shared.sendWatchMessage(.speechRecognitionRequest(.ok))

The Watch sets up an audio session, installs a tap on the audio engine's input node, and sends the audio format to the iPhone:

do {
    try startAudioSession()
} catch {
    self.state = .error("couldn't start audio session")
    return
}

let inputNode = audioEngine.inputNode
let recordingFormat = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat)
    { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
        let audioBuffer = buffer.audioBufferList.pointee.mBuffers
        let data = Data(bytes: audioBuffer.mData!, count: Int(audioBuffer.mDataByteSize))
        if self.state == .running {
            WCManager.shared.sendWatchMessage(.speechRecognition(.chunk(data, frameCount: Int(buffer.frameLength))))
        }
    }
audioEngine.prepare()

do {
    let data = try NSKeyedArchiver.archivedData(withRootObject: recordingFormat, requiringSecureCoding: true)
    WCManager.shared.sendWatchMessage(.speechRecognition(.audioFormat(data)),
        errorHandler: { _ in
            self.state = .error("iphone unavailable")
    })
    self.state = .sentAudioFormat
} catch {
    self.state = .error("could not convert audio format")
}

The iPhone saves the audio format and returns .ok or .error():

guard let format = try? NSKeyedUnarchiver.unarchivedObject(ofClass: AVAudioFormat.self, from: data) else {
    // ...send back .error, destroy the recognitionTask
}
self.audioFormat = format
// ...send back .ok

The Watch starts the audio engine:

try audioEngine.start()

The iPhone receives the audio chunks and appends them to the recognitionRequest:

guard let pcm = AVAudioPCMBuffer(pcmFormat: audioFormat, frameCapacity: AVAudioFrameCount(frameCount)) else {
    // ...send back .error, destroy the recognitionTask
}

let channels = UnsafeBufferPointer(start: pcm.floatChannelData, count: Int(pcm.format.channelCount))
let data = chunk as NSData
data.getBytes(UnsafeMutableRawPointer(channels[0]), length: data.length)
recognitionRequest.append(pcm)
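
For comparison, here is a minimal sketch of rebuilding the buffer from the received bytes. It is not the original code: `makePCMBuffer` is a hypothetical helper, and it assumes the Watch sends mono, non-interleaved Float32 samples. One difference worth checking against the snippet above: as far as I know, a freshly allocated AVAudioPCMBuffer reports a `frameLength` of 0 until it is set, so a buffer appended without setting it may look empty to the recognizer.

```swift
import AVFoundation

/// Hypothetical helper (not from the original post): rebuilds a PCM buffer from raw bytes.
/// Assumes mono, non-interleaved Float32 audio, so `chunk` holds frameCount * 4 bytes.
func makePCMBuffer(from chunk: Data, frameCount: Int, format: AVAudioFormat) -> AVAudioPCMBuffer? {
    let byteCount = frameCount * MemoryLayout<Float>.size
    guard format.commonFormat == .pcmFormatFloat32,
          format.channelCount == 1,
          frameCount > 0,
          chunk.count >= byteCount,
          let pcm = AVAudioPCMBuffer(pcmFormat: format,
                                     frameCapacity: AVAudioFrameCount(frameCount)),
          let channels = pcm.floatChannelData
    else { return nil }

    // Copy the raw sample bytes into the first (and only) channel.
    chunk.withUnsafeBytes { (raw: UnsafeRawBufferPointer) -> Void in
        guard let base = raw.baseAddress else { return }
        UnsafeMutableRawPointer(channels[0]).copyMemory(from: base, byteCount: byteCount)
    }

    // Assumption: a new buffer starts with frameLength == 0, so mark the copied frames as valid.
    pcm.frameLength = AVAudioFrameCount(frameCount)
    return pcm
}
```

With a helper like this, the receiving side would call `makePCMBuffer(from: chunk, frameCount: frameCount, format: audioFormat)` and append the result only when it is non-nil.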

Any ideas are highly appreciated. Thanks for taking the time!

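I strongly suspect the problem is that the link is so slow you aren't getting anywhere near real time. You're appending tiny (possibly as short as 20 ms) sound samples separated by long stretches of silence. That's unrecognizable, even to the human ear.
I would start by exploring CMSampleBuffers, since you can set their timestamps. That would let the recognizer know when each buffer was recorded and remove the silence.
If that doesn't work, you'll need to do some buffering to accumulate enough AVAudioPCMBuffers to perform the analysis. That will be more complicated, so hopefully CMSampleBuffers will work instead.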
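
The buffering idea could look roughly like the sketch below. Everything in it is an assumption for illustration: `ChunkAccumulator` is a hypothetical type, the samples are taken to be mono Float32 (4 bytes per frame), and the half-second threshold in the usage note is arbitrary, not a value the recognizer is known to need.

```swift
import Foundation

/// Hypothetical accumulator (not part of any Apple API): collects small PCM chunks
/// and releases them only once at least `minimumFrames` frames are queued.
/// Assumes mono Float32 samples unless `bytesPerFrame` says otherwise.
struct ChunkAccumulator {
    let minimumFrames: Int
    let bytesPerFrame: Int
    private var pending = Data()

    init(sampleRate: Double,
         minimumDuration: TimeInterval,
         bytesPerFrame: Int = MemoryLayout<Float>.size) {
        self.minimumFrames = Int(sampleRate * minimumDuration)
        self.bytesPerFrame = bytesPerFrame
    }

    /// Appends a chunk. Returns the queued bytes and their frame count once enough
    /// audio has accumulated, otherwise nil.
    mutating func append(_ chunk: Data) -> (data: Data, frameCount: Int)? {
        pending.append(chunk)
        let frames = pending.count / bytesPerFrame
        guard frames >= minimumFrames else { return nil }

        // Hand out whole frames only; keep any trailing partial frame for later.
        let byteCount = frames * bytesPerFrame
        let ready = Data(pending.prefix(byteCount))
        pending = Data(pending.dropFirst(byteCount))
        return (ready, frames)
    }
}
```

On the iPhone side, each incoming chunk would go through `append(_:)`, and a PCM buffer would be built and handed to the recognition request only when a non-nil result comes back. For example, with `ChunkAccumulator(sampleRate: 8_000, minimumDuration: 0.5)` and 1024-frame chunks, the first three calls return nil and the fourth releases 4096 frames.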
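In either case, you may also want to consider transferring the data in a compressed format. I'm not sure which formats watchOS supports, but you could dramatically reduce the bandwidth needed between the watch and the phone. Just be careful not to overwhelm the watch's CPU. You want compression that is cheap to compute, not the tightest compression you can get.
Also, I don't see what sample rate you've configured here. Make sure it's low, probably 8 kHz. There is absolutely no reason to record CD-quality sound just to do speech transcription. It's actually worse, because it includes many frequencies outside the range of the human voice.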
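
To put rough numbers on the bandwidth point, here is a small back-of-the-envelope calculation. The formats are assumptions, since the question doesn't show the actual configuration: 44.1 kHz mono Float32 stands in for a typical default input format, and 8 kHz mono 16-bit for a speech-oriented one. `pcmBytesPerSecond` is a hypothetical helper.

```swift
import Foundation

/// Bytes per second of uncompressed PCM audio.
func pcmBytesPerSecond(sampleRate: Double, bytesPerSample: Int, channels: Int) -> Double {
    return sampleRate * Double(bytesPerSample) * Double(channels)
}

// Assumed default: 44.1 kHz, mono, 32-bit float samples.
let uncompressedRate = pcmBytesPerSecond(sampleRate: 44_100, bytesPerSample: 4, channels: 1)
// Assumed speech-oriented format: 8 kHz, mono, 16-bit integer samples.
let speechRate = pcmBytesPerSecond(sampleRate: 8_000, bytesPerSample: 2, channels: 1)

print("44.1 kHz Float32: \(uncompressedRate) bytes/s") // 176400 bytes per second
print("8 kHz Int16: \(speechRate) bytes/s")            // 16000 bytes per second
print("reduction factor: \(uncompressedRate / speechRate)") // roughly 11x
```

Under these assumptions, lowering the sample rate and sample size alone cuts the data sent over the link by about a factor of eleven, before any compression is applied.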
