How to convert audio bytes, generated by mediaRecorder and transmitted over WebSocket to Python, into a numpy array

hwazgwia · asked 2022-11-11 · tagged Python

I have a simple WebSocket project created with FastAPI, shown in the following code:

import uvicorn
from fastapi import FastAPI, WebSocket
from fastapi.responses import HTMLResponse
import numpy as np
import soundfile as sf

app = FastAPI()

html = """
<!DOCTYPE html>
<html>
    <body>
        <h1>Transcribe Audio With FastAPI</h1>
        <p id="status">Connection status will go here</p>
        <p id="transcript"></p>
        <script>
               navigator.mediaDevices.getUserMedia({ audio: { sampleSize: 16, channelCount: 1, sampleRate: 16000 } }).then((stream) => {
            if (!MediaRecorder.isTypeSupported('audio/webm'))
                return alert('Browser not supported')

            const mediaRecorder = new MediaRecorder(stream, {
                mimeType: 'audio/webm',
            })

            const socket = new WebSocket('ws://localhost:8000/listen')

            socket.onopen = () => {
                document.querySelector('#status').textContent = 'Connected'
                console.log({ event: 'onopen' })
                mediaRecorder.addEventListener('dataavailable', async (event) => {
                    if (event.data.size > 0 && socket.readyState == 1) {
                        socket.send(event.data)
                    }
                })
                mediaRecorder.start(250)
            }

            socket.onclose = () => {
                console.log({ event: 'onclose' })
            }

            socket.onerror = (error) => {
                console.log({ event: 'onerror', error })
            }
        })
        </script>
    </body>
</html>
"""

@app.get("/")
async def get():
    return HTMLResponse(html)

@app.websocket("/listen")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            data = await websocket.receive_bytes()
            print(data)
            # Convert data to numpy array
            # rest of the process!
    except Exception as e:
        raise Exception(f'Could not process audio: {e}')
    finally:
        await websocket.close()

if __name__ == '__main__':
    uvicorn.run(app)

After running the project, I want to convert the incoming data into a numpy array.
What I have tried:

1)

def tensorize(x):
    arr = np.frombuffer(x, dtype=np.float32)
    # copy to avoid warning
    arr = np.copy(arr)
    return arr

@app.websocket("/listen")
async def websocket_endpoint(websocket: WebSocket):
    print("I'm here websocket_endpoint")
    await websocket.accept()

    try:
        # deepgram_socket = await process_audio(websocket)
        whole = []
        counter = 0
        while True:
            data = await websocket.receive_bytes()
            array = tensorize(data)
    except Exception as e:
        raise Exception(f'Could not process audio: {e}')
    finally:
        await websocket.close()

This raises the error:

arr = np.frombuffer(x, dtype=np.float32)
ValueError: buffer size must be a multiple of element size
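The error occurs because `np.frombuffer` requires the byte length to be an exact multiple of the element size (4 bytes for `float32`), and MediaRecorder chunks are arbitrary-length container data, not whole raw-float samples. A minimal sketch with a hypothetical 10-byte payload that reproduces the length check and shows how truncation sidesteps it:

```python
import numpy as np

payload = b"\x00" * 10  # 10 bytes: not a multiple of float32's 4-byte size

try:
    np.frombuffer(payload, dtype=np.float32)
except ValueError as err:
    print(err)  # buffer size must be a multiple of element size

# Truncating to a whole number of elements silences the error...
itemsize = np.dtype(np.float32).itemsize
usable = len(payload) - len(payload) % itemsize
arr = np.frombuffer(payload[:usable], dtype=np.float32)
print(arr.size)  # 2

# ...but the values would still be garbage here, because the incoming
# bytes are a WebM container stream, not raw little-endian float32 PCM.
```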
2)

@app.websocket("/listen")
async def websocket_endpoint(websocket: WebSocket):
    print("I'm here websocket_endpoint")
    await websocket.accept()

    try:
        # deepgram_socket = await process_audio(websocket)
        whole = []
        counter = 0
        while True:
            data = await websocket.receive_bytes()
            data_s16 = np.frombuffer(data, dtype=np.int16, count=len(data) // 2, offset=0)
            float_data = data_s16 * 0.5**15
            whole.append(float_data)
            print(data)
            counter += 1
            if counter > 20:
                data = np.concatenate(whole)
                sf.write('stereo_file1.wav', data, 16000, 'PCM_24')
                break
            print(counter)
            # await websocket.send_text(f"Message text was: {data}")
            # deepgram_socket.send(data)
    except Exception as e:
        raise Exception(f'Could not process audio: {e}')
    finally:
        await websocket.close()
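For reference, the `data_s16 * 0.5**15` line above is the standard int16-to-float normalization: 16-bit PCM samples span [-32768, 32767], and multiplying by `0.5**15` (i.e. 1/32768) maps them into roughly [-1.0, 1.0). A standalone sketch with made-up sample values:

```python
import numpy as np

# Hypothetical int16 PCM samples covering the extremes of the range
pcm = np.array([-32768, 0, 16384, 32767], dtype=np.int16)

# 0.5**15 == 1/32768, so the result lands in [-1.0, 1.0)
floats = pcm * 0.5 ** 15
print(floats)  # -1.0, 0.0, 0.5, and just under 1.0
```

The conversion itself is correct; the problem in attempt 2 lies in what the input bytes actually contain.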

This sample code raises no errors, but the output audio file contains no intelligible audio; only noise is saved.

3) Tried reading the bytes with librosa & soundfile through a BytesIO object, but the format is not recognized. The raised error:

Exception: Could not process audio: Error opening <_io.BytesIO object at 0x7f12a32cd0d0>: Format not recognised.
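All three attempts fail for the same underlying reason: MediaRecorder with `mimeType: 'audio/webm'` emits WebM container bytes (an EBML stream, typically carrying Opus-encoded audio), not raw PCM. Interpreting those bytes as int16 samples yields noise, and decoders expecting a WAV/FLAC-style header report an unrecognized format. A quick sketch of the telltale signature, using a hypothetical first chunk:

```python
import numpy as np

# A WebM/Matroska stream starts with the EBML magic number 0x1A45DFA3;
# the first MediaRecorder chunk begins with these bytes, not samples.
EBML_MAGIC = b"\x1a\x45\xdf\xa3"

first_chunk = EBML_MAGIC + b"\x9fB\x86\x81"  # hypothetical header excerpt

print(first_chunk[:4] == EBML_MAGIC)  # True: container data, not PCM

# Reading the same bytes as int16 just produces meaningless "samples",
# which is why the concatenated WAV in attempt 2 sounded like pure noise.
bogus = np.frombuffer(first_chunk, dtype=np.int16)
print(bogus.size)  # 4
```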

**Update 1**

I was able to save the output chunk using the following code, but the audio has to be written to the hard drive and then loaded with librosa, which is very slow!

import librosa

@app.websocket("/listen")
async def websocket_endpoint(websocket: WebSocket):
    print("I'm here websocket_endpoint")
    await websocket.accept()
    try:
        while True:
            data = await websocket.receive_bytes()
            with open('audio.wav', 'wb') as f:
                f.write(data)
            array, sr = librosa.load("audio.wav")
    except Exception as e:
        raise Exception(f'Could not process audio: {e}')
    finally:
        await websocket.close()

fwzugrvs #1

In the end, after opening a somewhat similar issue on the pytorch/audio GitHub repository, I got the answer from moto in a comment there. The final solution is as follows:

from io import BytesIO

import torch
import torchaudio

@app.websocket("/listen")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    try:
        chunk_size = 1000
        while True:
            data = await websocket.receive_bytes()
            f = BytesIO(data)
            s = torchaudio.io.StreamReader(f)
            s.add_basic_audio_stream(chunk_size)
            array = torch.concat([chunk[0] for chunk in s.stream()])
    except Exception as e:
        raise Exception(f'Could not process audio: {e}')
    finally:
        await websocket.close()

I did not convert the data to a numpy array myself, but if anyone needs it, array.numpy() returns the numpy representation.

**PS:** versions of the relevant libraries:

[pip3] numpy==1.23.4
[pip3] torch==1.12.1
[pip3] torchaudio==0.12.1
[conda] numpy 1.23.4 pypi_0 pypi

Operating system:

Ubuntu: 22.04
torchaudio.backend: sox_io
