Python: audio frame not converting to ndarray

Posted by c7rzv4ha on 2023-01-12 in Python


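I am trying to run a Colab notebook that trains OpenAI's Jukebox, but when I run the function that loads the audio, I get this error:
File "/content/jukebox/jukebox/data/files_dataset.py", line 82, in get_song_chunk
    data, sr = load_audio(filename, sr=self.sr, offset=offset, duration=self.sample_length)
File "/content/jukebox/jukebox/utils/io.py", line 48, in load_audio
    frame = frame.to_ndarray(format='fltp') # Convert to floats and not int16
AttributeError: 'list' object has no attribute 'to_ndarray'
It seems to treat the frame as a list; when printed, it looks like this:
[<av.AudioFrame 0, pts=None, 778 samples at 22050Hz, stereo, fltp at 0x7fd03dd64150>]
When I tried changing the line to frame = resampler.resample(frame), I got this error:
TypeError: 'av.audio.frame.AudioFrame' object cannot be interpreted as an integer
I don't know much about audio files, so I'm not sure how to debug this and would appreciate some help.
The full code for loading the audio is below.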

def load_audio(file, sr, offset, duration, resample=True, approx=False, time_base='samples', check_duration=True):
    if time_base == 'sec':
        offset = offset * sr
        duration = duration * sr
    # Loads at target sr, stereo channels, seeks from offset, and stops after duration
    container = av.open(file)
    audio = container.streams.get(audio=0)[0] # Only first audio stream
    audio_duration = audio.duration * float(audio.time_base)
    if approx:
        if offset + duration > audio_duration*sr:
            # Move back one window. Cap at audio_duration
            offset = np.min(audio_duration*sr - duration, offset - duration)
    else:
        if check_duration:
            assert offset + duration <= audio_duration*sr, f'End {offset + duration} beyond duration {audio_duration*sr}'
    if resample:
        resampler = av.AudioResampler(format='fltp',layout='stereo', rate=sr)
    else:
        assert sr == audio.sample_rate
    offset = int(offset / sr / float(audio.time_base)) #int(offset / float(audio.time_base)) # Use units of time_base for seeking
    duration = int(duration) #duration = int(duration * sr) # Use units of time_out ie 1/sr for returning
    sig = np.zeros((2, duration), dtype=np.float32)
    container.seek(offset, stream=audio)
    total_read = 0
    for frame in container.decode(audio=0): # Only first audio stream
        if resample:
            frame.pts = None
            frame = resampler.resample(frame)
        frame = frame.to_ndarray(format='fltp') # Convert to floats and not int16
        read = frame.shape[-1]
        if total_read + read > duration:
            read = duration - total_read
        sig[:, total_read:total_read + read] = frame[:, :read]
        total_read += read
        if total_read == duration:
            break
    assert total_read <= duration, f'Expected {duration} frames, got {total_read}'
    return sig, sr


Source: https://stackoverflow.com/questions/72781717/audio-frame-not-converting-to-ndarray

3 Answers


#1 ki0zmccv

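If the variable frame is being treated as a list, replace frame = resampler.resample(frame) with frame = resampler.resample(frame)[0]. With that change, frame is a single audio frame again and the following to_ndarray call should work.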
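The list return value appears consistent with recent PyAV releases, where AudioResampler.resample() hands back a list of frames instead of a single frame. A minimal sketch of a version-tolerant wrapper follows. Note that as_frame_list is a hypothetical helper (it is not part of PyAV or Jukebox), and the idea that older releases return a single frame or None is an assumption.

```python
def as_frame_list(resampled):
    """Normalize a resampler result to a list of frames.

    Hypothetical helper: accepts None, a single frame, or a list/tuple of frames.
    """
    if resampled is None:
        return []                      # nothing was produced for this input
    if isinstance(resampled, (list, tuple)):
        return list(resampled)         # already a sequence of frames
    return [resampled]                 # single frame: wrap it in a list


# Plain objects stand in for audio frames so the sketch runs without PyAV.
single = object()
print(len(as_frame_list(single)))
print(len(as_frame_list([single, single])))
print(as_frame_list(None))
```

Inside the decode loop, this would allow writing for out in as_frame_list(resampler.resample(frame)): and calling to_ndarray on each element, whichever form the installed version returns.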

Answered 2023-01-12

#2 2g32fytz

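Try replacing frame = frame.to_ndarray(format='fltp') with a conversion of the first frame in the list to a NumPy array:

import numpy as np

#frame = frame.to_ndarray(format='fltp') # Original line
frame = np.asarray(frame[0].to_ndarray())  # np.ndarray(frame) would treat frame as a shape and raise TypeError

If you want a specific data type, pass the dtype argument:

frame = np.asarray(frame[0].to_ndarray(), dtype=np.float32)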
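For context, np.ndarray is NumPy's low-level constructor and interprets its first argument as a shape, not as data, so handing it frame objects raises a TypeError much like the one quoted in the question. The sketch below illustrates the difference. FakeFrame is a hypothetical stand-in for av.AudioFrame (an assumption so the example runs without PyAV), and the exact error message may differ between NumPy versions.

```python
import numpy as np


class FakeFrame:
    """Hypothetical stand-in for av.AudioFrame with a to_ndarray() method."""

    def __init__(self, samples):
        self._samples = samples

    def to_ndarray(self):
        # Return the samples as a (channels, samples) float32 array.
        return np.asarray(self._samples, dtype=np.float32)


def shape_constructor_rejects(obj):
    """Return True if np.ndarray(obj) raises TypeError (obj is not a valid shape)."""
    try:
        np.ndarray(obj)
    except TypeError:
        return True
    return False


frames = [FakeFrame([[0.0, 0.5, 1.0], [0.0, -0.5, -1.0]])]

print(shape_constructor_rejects(frames))   # a list of frames is not a shape
data = frames[0].to_ndarray()              # ask the frame itself for its samples
print(data.shape, data.dtype)
```

The takeaway is that the samples have to come from the frame's own to_ndarray method; NumPy cannot extract them from the frame object directly.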

Answered 2023-01-12

#3 i7uq4tfw

Try: frame = frame[0].to_ndarray(format='fltp')
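Indexing with [0] keeps only the first frame of the returned list. If the resampler ever returns several frames, or none, for one input (an assumption about its behavior, not something confirmed in this thread), copying every returned frame avoids dropping samples. A minimal sketch, with plain NumPy arrays standing in for the result of to_ndarray(); copy_chunks is a hypothetical helper that mirrors the buffer-filling logic of load_audio:

```python
import numpy as np


def copy_chunks(chunks, sig, total_read):
    """Copy (channels, n) arrays from chunks into sig, starting at total_read.

    Stops once sig is full and returns the updated number of samples written.
    """
    duration = sig.shape[-1]
    for chunk in chunks:
        # Clip the chunk to the space that is still free in the buffer.
        read = min(chunk.shape[-1], duration - total_read)
        sig[:, total_read:total_read + read] = chunk[:, :read]
        total_read += read
        if total_read == duration:
            break
    return total_read


sig = np.zeros((2, 5), dtype=np.float32)
chunks = [np.ones((2, 3), dtype=np.float32), 2 * np.ones((2, 4), dtype=np.float32)]
total = copy_chunks(chunks, sig, 0)
print(total)
print(sig[0])
```

In load_audio, chunks would be built from the resampler output, for example [f.to_ndarray() for f in resampler.resample(frame)], assuming the installed PyAV version returns a list.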

Answered 2023-01-12
