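I am trying to stretch a raw WAV file by a factor of k. In the code below I use stretch_factor = 2, but in principle I want this to work for any k. My problem is that output.wav is not stretched, and the audio is now garbled as well. It also does not work for k < 1, where I am trying to compress the audio.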
import numpy as np
from scipy.io import wavfile
sampling_rate, signal = wavfile.read("test.wav")
NFFT = 1024
overlap_factor = 4
stretch_factor = 2
hop_size = int(NFFT / overlap_factor)
num_hops = int(np.ceil(float(len(signal)) / hop_size))
pad_size = num_hops * hop_size - len(signal)
z = np.zeros((pad_size,))
pad_signal = np.concatenate((signal, z))
frames = np.array_split(pad_signal, num_hops)
output = np.zeros((0,), dtype=np.float32)
previous_phase = np.zeros((NFFT,), dtype=np.float32)
summed_phase = np.zeros((NFFT,), dtype=np.float32)
for i, frame in enumerate(frames):
    spectrum = np.fft.fft(frame, n=NFFT)
    magnitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    frequencies = np.fft.fftfreq(NFFT, d=1.0/sampling_rate)
    frequencies = np.repeat(frequencies, stretch_factor)
    phase_diff = (frequencies[1] - frequencies[0]) * hop_size / sampling_rate
    expected_phase = previous_phase + phase_diff * 2 * np.pi
    delta_phase = phase - expected_phase
    previous_phase = phase.copy()
    summed_phase += delta_phase
    phase_advances = hop_size * summed_phase / (2 * np.pi)
    phase_advances = np.round(phase_advances).astype(int)
    new_spectrum = np.zeros((NFFT,), dtype=np.complex64)
    for j in range(NFFT):
        j2 = j + phase_advances[j]
        if j2 >= 0 and j2 < NFFT:
            new_spectrum[j2] += spectrum[j]
    new_frame = np.fft.ifft(new_spectrum).real
    output = np.concatenate((output, new_frame[:hop_size]))
wavfile.write("output.wav", sampling_rate, output.astype(np.int16))
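This is the third time I have rewritten the code, and the first time it finally runs without errors. I am new to working with audio, so I do not know what to try next.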
1 Answer
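If you want the result to sound similar, take a look at librosa.effects.time_stretch. With rate=2 it speeds the audio up by a factor of 2 while keeping the pitch.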
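Or use librosa.resample if you want to stretch the waveform itself. Resampling to half the sample rate and playing the result back at the original rate speeds the audio up by a factor of 2 and shifts the pitch up by one octave.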