I want to build a real-time transcription app with Node.js and the Google Speech-to-Text API.

I'm using RecordRTC and socket.io to get audio chunks to the backend server. Right now I'm recording 1-second audio chunks, and transcription works, but it isn't treated as a stream: a response is sent back after each chunk is processed. That means I'm getting half-sentences back, and Google can't use context to help itself recognize the speech.

My question is how to get Google to treat my chunks as one continuous stream. Or is there another solution that achieves the same effect (namely live transcription of microphone audio, or very close to live)?

Google has a demo on their site that does exactly what I'm looking for, so it should be possible.
My code (it's mostly from the kiosk audio stream repo):

ss is socket.io-stream.

Server side:
```typescript
io.on("connect", (socket) => {
  socket.on("create-room", (data, cb) => createRoom(socket, data, cb))
  socket.on("disconnecting", () => exitFromRoom(socket))

  // receives the stream; it gets called every 1 s with a blob
  ss(socket).on("stream-speech", async function (stream: any, data: any) {
    const filename = path.basename("stream.wav")
    const writeStream = fs.createWriteStream(filename)
    stream.pipe(writeStream)
    speech.speechStreamToText(stream, async function (transcribeObj: any) {
      socket.emit("transcript", transcribeObj.transcript)
    })
  })
})
```
```typescript
async speechStreamToText(stream: any, cb: Function) {
  const sttRequest = {
    config: {
      languageCode: "en-US",
      sampleRateHertz: 16000,
      encoding: "WEBM_OPUS",
      enableAutomaticPunctuation: true,
    },
    singleUtterance: false,
  }

  const stt = new speechToText.SpeechClient()

  // set up the STT stream
  const recognizeStream = stt
    .streamingRecognize(sttRequest)
    .on("data", function (data: any) {
      // this gets called every second, and I get transcription chunks
      // which usually make close to no sense
      console.log(data.results[0].alternatives)
    })
    .on("error", (e: any) => {
      console.log(e)
    })
    .on("end", () => {
      // this also gets called every second
      console.log("on end")
    })

  stream.pipe(recognizeStream)

  stream.on("end", function () {
    console.log("socket.io stream ended")
  })
}
```
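For clarity, the behavior being asked about would be a single long-lived recognize stream per connection that every chunk is written into, rather than one streamingRecognize call per blob. A rough sketch only, reusing `stt` and `sttRequest` from above; note that whether concatenated 1 s blobs actually decode as one audio stream depends on the chunk format, since each blob carries its own container header:

```typescript
// Rough sketch: ONE recognize stream per connection, fed by every chunk.
io.on("connect", (socket) => {
  let recognizeStream: any = null

  ss(socket).on("stream-speech", (stream: any, data: any) => {
    if (!recognizeStream) {
      // created once per connection, reusing stt/sttRequest from above
      recognizeStream = stt
        .streamingRecognize(sttRequest)
        .on("data", (d: any) =>
          socket.emit("transcript", d.results[0]?.alternatives[0]?.transcript)
        )
        .on("error", console.error)
    }
    // end: false keeps the recognize stream open after each 1 s chunk
    stream.pipe(recognizeStream, { end: false })
  })

  socket.on("disconnecting", () => recognizeStream?.end())
})
```

Google's streaming sessions are also time-limited (on the order of five minutes), so a long-running transcriber additionally has to restart the recognize stream periodically.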
Client side:
```typescript
const sendBinaryStream = (blob: Blob) => {
  const stream = ss.createStream()
  ss(socket).emit("stream-speech", stream, {
    name: "_temp/stream.wav",
    size: blob.size,
  })
  ss.createBlobReadStream(blob).pipe(stream)
}

useEffect(() => {
  let recorder: any
  if (activeChat) {
    navigator.mediaDevices.getUserMedia({ audio: true, video: false }).then((stream) => {
      streamRef.current = stream
      recorder = new RecordRTC(stream, {
        type: "audio",
        mimeType: "audio/webm",
        sampleRate: 44100,
        desiredSampleRate: 16000,
        timeSlice: 1000,
        numberOfAudioChannels: 1,
        recorderType: StereoAudioRecorder,
        ondataavailable(blob: Blob) {
          sendBinaryStream(blob)
        },
      })
      recorder.startRecording()
    })
  }
  return () => {
    recorder?.stopRecording()
    streamRef.current?.getTracks().forEach((track) => track.stop())
  }
}, [])
```
Any help is appreciated!
2 Answers

cyvaqqii #1
I have the same problem! The official Google demo might be using node-record-lpcm16 and SoX: https://cloud.google.com/speech-to-text/docs/streaming-recognize?hl=en
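For reference, the streaming sample in those docs pipes a SoX-based microphone recording straight into a single streamingRecognize call. A condensed sketch along those lines (it records the machine's own microphone, so it suits a local demo rather than browser clients):

```typescript
import recorder from "node-record-lpcm16" // needs SoX installed locally
import speech from "@google-cloud/speech"

const client = new speech.SpeechClient()

const recognizeStream = client
  .streamingRecognize({
    config: {
      encoding: "LINEAR16",
      sampleRateHertz: 16000,
      languageCode: "en-US",
    },
    interimResults: false,
  })
  .on("error", console.error)
  .on("data", (data: any) =>
    console.log(data.results[0]?.alternatives[0]?.transcript)
  )

// One continuous microphone stream piped into ONE recognize stream is
// what makes the transcription behave as live.
recorder
  .record({ sampleRateHertz: 16000, recordProgram: "rec" })
  .stream()
  .on("error", console.error)
  .pipe(recognizeStream)
```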
hpxqektj #2
I suggest using the Web Audio API to process the audio stream and send it to the backend over a WebSocket. Below is a working demo that generates the audio stream with the navigator.mediaDevices.getUserMedia() API. The Web Audio API is a high-level JavaScript API for processing and synthesizing audio in web applications. We can create a RecorderProcessor class that extends AudioWorkletProcessor to record and transcribe audio in real time.
1. Create a recorderWorkletProcessor.js processor file in the React public folder, at public/worklets/recorderWorkletProcessor.js:
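The answer's original code did not survive the page scrape; what follows is a minimal sketch of such a processor. The "recorder-processor" registration name is an assumption that just has to match the client code below. Worklet modules load as plain scripts, so this file stays plain JavaScript:

```js
// public/worklets/recorderWorkletProcessor.js
// Runs on the audio rendering thread: receives 128-sample frames from the
// microphone source and forwards them to the main thread via the port.
class RecorderProcessor extends AudioWorkletProcessor {
  process(inputs) {
    const input = inputs[0]
    if (input.length > 0) {
      // Copy channel 0 (mono); the underlying buffer is reused between calls.
      this.port.postMessage(new Float32Array(input[0]))
    }
    return true // keep the processor alive
  }
}

registerProcessor("recorder-processor", RecorderProcessor)
```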
2. React client:
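Again a sketch rather than the lost original, assuming an existing socket.io connection in `socket`. It converts the worklet's Float32 frames to 16-bit PCM before emitting, which is the LINEAR16 layout the server will pass to Google:

```typescript
useEffect(() => {
  let audioContext: AudioContext | undefined

  const start = async () => {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true })
    // Ask for 16 kHz so the data matches the server's sampleRateHertz.
    audioContext = new AudioContext({ sampleRate: 16000 })
    await audioContext.audioWorklet.addModule("/worklets/recorderWorkletProcessor.js")

    const source = audioContext.createMediaStreamSource(stream)
    const recorder = new AudioWorkletNode(audioContext, "recorder-processor")

    recorder.port.onmessage = (event: MessageEvent<Float32Array>) => {
      // Float32 samples in [-1, 1] -> Int16 PCM.
      const float32 = event.data
      const int16 = new Int16Array(float32.length)
      for (let i = 0; i < float32.length; i++) {
        const s = Math.max(-1, Math.min(1, float32[i]))
        int16[i] = s < 0 ? s * 0x8000 : s * 0x7fff
      }
      socket.emit("audio-chunk", int16.buffer)
    }

    source.connect(recorder)
    recorder.connect(audioContext.destination) // keeps the node processing; it outputs silence
  }

  start()
  return () => {
    audioContext?.close()
  }
}, [])
```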
3. Server side (Node.js):
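A sketch consistent with the client above (the "audio-chunk" event name is the same assumption). The key point is that one streamingRecognize stream lives for the whole socket connection and every PCM chunk is written into it, so Google sees a continuous utterance:

```typescript
import speech from "@google-cloud/speech"

const client = new speech.SpeechClient()

io.on("connection", (socket) => {
  // One recognize stream for the whole connection, not one per chunk.
  const recognizeStream = client
    .streamingRecognize({
      config: {
        encoding: "LINEAR16",
        sampleRateHertz: 16000,
        languageCode: "en-US",
        enableAutomaticPunctuation: true,
      },
      interimResults: true, // partial hypotheses while the user speaks
    })
    .on("data", (data: any) => {
      const result = data.results[0]
      if (result?.alternatives[0]) {
        socket.emit("transcript", {
          text: result.alternatives[0].transcript,
          isFinal: result.isFinal,
        })
      }
    })
    .on("error", console.error)

  socket.on("audio-chunk", (chunk: ArrayBuffer) => {
    recognizeStream.write(Buffer.from(chunk))
  })

  socket.on("disconnect", () => recognizeStream.end())
})
```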