numpy: PyAudio stream.read returns static with int16 but good audio with float32

5jvtdoz2, posted 2023-03-18 in iOS

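I am trying to record audio on a Raspberry Pi and have run into a problem. When I use paFloat32 in PyAudio together with np.frombuffer(data, np.float32), I get good audio. When I use paInt16 and np.int16 instead, I get nothing but garbage static.
Here is the minimal code: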

import pyaudio
import numpy as np
import scipy.io.wavfile as wf

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paFloat32,
            channels=2,
            rate=16000,
            input=True,
            input_device_index=0,
            frames_per_buffer=1024)
frames = [[],[]]  # Initialize array to store frames

print("recording...")

for i in range(0, 128):
    data = stream.read(1024)
    decoded = np.frombuffer(data, np.float32)
    decodedSplit = np.stack((decoded[::2], decoded[1::2]), axis=0)  # channels on separate axes
    frames = np.append(frames, decodedSplit, axis=1)


stream.close()
p.terminate()

wf.write('test.wav', 16000, frames.T)

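As far as I can tell, changing paFloat32 to paInt16 and np.float32 to np.int16 should be all I need to do, but it does not work. Could the sound card be at fault, or is something misconfigured? I have been staring at this for two days and am now stuck. Again, float works perfectly, but the next piece of code depends on a library that someone else wrote for int16.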

4urapxun  #1

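Rather than getting the paInt16 capture to work, I kept recording in float32 and found a float-to-PCM int16 converter in HudsonHuang's Github.
With that, I was able to rewrite the recording code as follows: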

import pyaudio
import numpy as np
import scipy.io.wavfile as wf


# From https://gist.github.com/HudsonHuang/fbdf8e9af7993fe2a91620d3fb86a182
def float2pcm(sig, dtype='int16'):
    """Convert floating point signal with a range from -1 to 1 to PCM.
    Any signal values outside the interval [-1.0, 1.0) are clipped.
    No dithering is used.
    Note that there are different possibilities for scaling floating
    point numbers to PCM numbers, this function implements just one of
    them.  For an overview of alternatives see
    http://blog.bjornroche.com/2009/12/int-float-int-its-jungle-out-there.html
    Parameters
    ----------
    sig : array_like
        Input array, must have floating point type.
    dtype : data type, optional
        Desired (integer) data type.
    Returns
    -------
    numpy.ndarray
        Integer data, scaled and clipped to the range of the given
        *dtype*.
    See Also
    --------
    pcm2float, dtype
    """
    sig = np.asarray(sig)
    if sig.dtype.kind != 'f':
        raise TypeError("'sig' must be a float array")
    dtype = np.dtype(dtype)
    if dtype.kind not in 'iu':
        raise TypeError("'dtype' must be an integer type")

    # Scale [-1.0, 1.0) to the full integer range, then clip and cast.
    i = np.iinfo(dtype)
    abs_max = 2 ** (i.bits - 1)
    offset = i.min + abs_max
    return (sig * abs_max + offset).clip(i.min, i.max).astype(dtype)

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paFloat32,
            channels=2,
            rate=16000,
            input=True,
            input_device_index=0,
            frames_per_buffer=1024)
linearFrames = []
frames = [[], []]  # Initialize array to store frames

print("recording...")

for i in range(0, 128):
    data = stream.read(1024)
    decoded = np.frombuffer(data, np.float32)

    decoded = float2pcm(decoded, 'int16')

    linearFrames = np.append(linearFrames, decoded)

linearFrames = linearFrames.astype(np.int16)
decodedSplit = np.stack((linearFrames[::2], linearFrames[1::2]), axis=0)  # channels on separate axes

stream.close()
p.terminate()

wf.write('test.wav', 16000, decodedSplit.T.astype(np.int16))
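One plausible explanation for the original static, which the thread itself does not confirm: `np.append` onto an empty list such as `[[], []]` produces a float64 array, so int16 samples are silently promoted to float64. `scipy.io.wavfile.write` then stores a floating-point WAV, where full scale is expected to be about ±1.0, so values in the ±32768 range play back as heavily clipped noise. The code above avoids this only because it casts back with `.astype(np.int16)` before writing. The sketch below shows the promotion and a dtype-preserving alternative. `decode_int16_chunks` is a hypothetical helper name, and the byte strings stand in for what `stream.read` would return from a `paInt16` stream.

```python
import numpy as np

# An empty nested list becomes a float64 array, so appending int16 samples
# to it silently promotes them to float64.
frames = [[], []]
chunk = np.array([[1000, -1000], [2000, -2000]], dtype=np.int16)
promoted = np.append(frames, chunk, axis=1)
print(promoted.dtype)  # float64


def decode_int16_chunks(chunks, channels=2):
    """Hypothetical helper: join raw interleaved int16 buffers into an
    array of shape (n_frames, channels) without changing the dtype."""
    samples = np.concatenate([np.frombuffer(c, dtype=np.int16) for c in chunks])
    return samples.reshape(-1, channels)


# Two stand-ins for stream.read() results, each holding two stereo frames.
raw_chunks = [
    np.array([1, -1, 2, -2], dtype=np.int16).tobytes(),
    np.array([3, -3, 4, -4], dtype=np.int16).tobytes(),
]
audio = decode_int16_chunks(raw_chunks)
print(audio.dtype, audio.shape)  # int16 (4, 2)
```

In a real recording, each `stream.read(1024)` result would be appended to a plain Python list, and the array returned by the helper could be passed to `wf.write` directly, since it is already int16 with one column per channel.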
