Data loss in a NumPy short-time Fourier transform and its inverse: need help improving audio quality

piztneat  posted on 2023-06-23  in Other

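I am currently developing my own audio library and have implemented the short-time Fourier transform (STFT) and its inverse as part of the signal-processing pipeline. However, I have noticed that the STFT followed by its inverse seems to lose a lot of data, which results in very poor audio quality.
First, I split the audio signal into overlapping frames and apply a window function to each frame to reduce spectral leakage. Then I take the Fourier transform of each frame to obtain its frequency-domain representation.
Then, as a test, I apply the inverse operation to get the audio back. I have plotted the results below, where you can see the degradation.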

Here are the functions:

import numpy as np

class AudioLib:
    '''Library of audio processing functions.'''
    def __init__(self, blocksize=1024 * 2):
        self.blocksize = blocksize
        self.window = np.hanning(blocksize)

    def stft(self, audio):
        '''Compute the short-time Fourier transform of the audio.'''
        # Split the audio into overlapping blocks
        num_blocks = len(audio) // self.blocksize
        blocks = np.reshape(audio[:num_blocks * self.blocksize], (num_blocks, self.blocksize))

        # Apply the windowing function to each block
        windowed_blocks = blocks * self.window[np.newaxis, :]

        # Compute the Fourier transform of each block
        spectrum = np.fft.fft(windowed_blocks, axis=1)

        return spectrum
    
    def istft(self, spectrum):
        '''Compute the inverse short-time Fourier transform of the spectrum.'''
        # Compute the inverse Fourier transform of each block
        windowed_blocks = np.fft.ifft(spectrum, axis=1).real

        # Apply overlap-and-add to reconstruct the output signal
        output = np.zeros(len(spectrum) * self.blocksize)
        for i, block in enumerate(windowed_blocks):
            output[i * self.blocksize : (i + 1) * self.blocksize] += block

        return output

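I have tried changing the block size, and although reducing it improved the audio somewhat, it is still not perfect, and I suspect my implementation is incorrect. Any help with this would be greatly appreciated!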

gzjq41n4


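As Christoph Rackwitz said, the problem with this STFT implementation is that the blocks do not overlap: the reshape in stft produces back-to-back blocks, despite what its comment says. For invertibility, you want each block to overlap the next one by 50%.
Here is a possible simple implementation for extracting and overlap-adding blocks: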

# Copyright 2023 Google LLC.
# SPDX-License-Identifier: Apache-2.0

import numpy as np

def extract_blocks(audio: np.ndarray, blocksize: int) -> np.ndarray:
  """Extracts blocks with 50% overlap."""
  hop_step = blocksize // 2
  blocks = []
  offset = 0

  while offset + blocksize <= len(audio):
    blocks.append(audio[offset:(offset + blocksize)])
    offset += hop_step

  return np.column_stack(blocks)

def overlap_add_blocks(blocks: np.ndarray) -> np.ndarray:
  """Overlap-adds blocks with 50% overlap."""
  blocksize, num_blocks = blocks.shape
  hop_step = blocksize // 2
  output = np.zeros(blocksize + (num_blocks - 1) * hop_step)
  offset = 0

  for i in range(num_blocks):
    output[offset:(offset + blocksize)] += blocks[:, i]
    offset += hop_step
  
  return output
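
To see how the two pieces fit together, here is a minimal round-trip sketch. It is not the asker's class: the function names, the use of np.fft.rfft/irfft instead of fft/ifft, and the row-per-block layout are my own choices for illustration. It assumes an even blocksize and the corrected (periodic) Hann window discussed further down.

```python
import numpy as np


def stft(audio: np.ndarray, blocksize: int) -> np.ndarray:
    """STFT with 50% overlap; returns one spectrum per row."""
    hop = blocksize // 2
    # Periodic Hann window: copies shifted by half a block sum to 1.0.
    window = np.hanning(blocksize + 1)[:blocksize]
    num_blocks = 1 + (len(audio) - blocksize) // hop
    blocks = np.stack(
        [audio[i * hop : i * hop + blocksize] for i in range(num_blocks)]
    )
    return np.fft.rfft(blocks * window, axis=1)


def istft(spectrum: np.ndarray, blocksize: int) -> np.ndarray:
    """Inverse STFT by overlap-adding the inverse-transformed blocks."""
    hop = blocksize // 2
    blocks = np.fft.irfft(spectrum, n=blocksize, axis=1)
    output = np.zeros(blocksize + (len(blocks) - 1) * hop)
    for i, block in enumerate(blocks):
        output[i * hop : i * hop + blocksize] += block
    return output


rng = np.random.default_rng(0)
blocksize = 256
hop = blocksize // 2
audio = rng.standard_normal(blocksize * 8)

restored = istft(stft(audio, blocksize), blocksize)

# The first and last half block are covered by only one window, so they
# come back attenuated; everything in between should match the input.
max_error = np.max(np.abs(restored[hop:-hop] - audio[hop:-hop]))
print(f"max interior error: {max_error:.2e}")
```

The half block at each end is only covered by a single window, so it is not reconstructed exactly. One common workaround, not shown here, is to pad the signal by half a block on both sides before the transform and trim it afterwards.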

If you want to do this more efficiently, take a look at NumPy's stride_tricks.
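
As a rough sketch of the stride_tricks route, block extraction can be written with sliding_window_view, which is available in NumPy 1.20 and later. The function name extract_blocks_strided is hypothetical; it is meant to return the same column-per-block layout as extract_blocks above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def extract_blocks_strided(audio: np.ndarray, blocksize: int) -> np.ndarray:
    """Extracts blocks with 50% overlap without a Python loop."""
    hop_step = blocksize // 2
    # One window per start offset, shape (len(audio) - blocksize + 1, blocksize).
    windows = sliding_window_view(audio, blocksize)
    # Keep every hop_step-th window, then transpose to one block per column.
    return windows[::hop_step].T


audio = np.arange(10.0)
blocks = extract_blocks_strided(audio, 4)
print(blocks.shape)
print(blocks[:, 1])
```

The result is a read-only view into the original array, so call .copy() on it before modifying blocks in place. Multiplying by the window creates a new array anyway, so the usual STFT path is unaffected.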
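Another detail: for exact invertibility, the window should be such that adding copies of it shifted by half a block (50% overlap) gives exactly 1.0. To achieve this, np.hanning needs a small correction. Change self.window = np.hanning(blocksize) to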

self.window = np.hanning(blocksize + 1)[:blocksize]

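This plot shows the difference for blocksize = 32. The thick line is the sum of the windows. Note that with np.hanning(blocksize) the thick line wobbles, whereas with np.hanning(blocksize + 1)[:blocksize] it is perfectly flat and equal to 1.0.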
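
The same comparison can be checked numerically instead of by eye. In this small sketch the helper overlap_sum is a hypothetical name: it adds the second half of one window to the first half of the next, which is what overlap-add does in the steady state with 50% overlap.

```python
import numpy as np


def overlap_sum(window: np.ndarray) -> np.ndarray:
    """Sum of two copies of the window shifted by half its length."""
    half = len(window) // 2
    # Tail of the previous block overlaps the head of the current block.
    return window[:half] + window[half:]


blocksize = 32
symmetric = np.hanning(blocksize)                 # wobbles around 1.0
periodic = np.hanning(blocksize + 1)[:blocksize]  # sums to 1.0

symmetric_dev = np.max(np.abs(overlap_sum(symmetric) - 1.0))
periodic_dev = np.max(np.abs(overlap_sum(periodic) - 1.0))

print(f"np.hanning(blocksize):                 max deviation {symmetric_dev:.3e}")
print(f"np.hanning(blocksize + 1)[:blocksize]: max deviation {periodic_dev:.3e}")
```

The corrected window works because np.hanning(M) evaluates 0.5 - 0.5 * cos(2 * pi * n / (M - 1)). Taking M = blocksize + 1 and dropping the last sample makes the period exactly blocksize, so the cosine terms of two copies half a block apart cancel.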
