I need to get audio levels, or better yet EQ data, from an NDI audio stream in C++. Here is the structure of the audio packet:
// This describes an audio frame.
typedef struct NDIlib_audio_frame_v3_t {
    // The sample-rate of this buffer.
    int sample_rate;
    // The number of audio channels.
    int no_channels;
    // The number of audio samples per channel.
    int no_samples;
    // The timecode of this frame in 100-nanosecond intervals.
    int64_t timecode;
    // What FourCC describing the type of data for this frame.
    NDIlib_FourCC_audio_type_e FourCC;
    // The audio data.
    uint8_t* p_data;
    union {
        // If the FourCC is not a compressed type and the audio format is planar, then this will be the
        // stride in bytes for a single channel.
        int channel_stride_in_bytes;
        // If the FourCC is a compressed type, then this will be the size of the p_data buffer in bytes.
        int data_size_in_bytes;
    };
    // Per frame metadata for this frame. This is a NULL terminated UTF8 string that should be in XML format.
    // If you do not want any metadata then you may specify NULL here.
    const char* p_metadata;
    // This is only valid when receiving a frame and is specified as a 100-nanosecond time that was the exact
    // moment that the frame was submitted by the sending side and is generated by the SDK. If this value is
    // NDIlib_recv_timestamp_undefined then this value is not available and is NDIlib_recv_timestamp_undefined.
    int64_t timestamp;

#if NDILIB_CPP_DEFAULT_CONSTRUCTORS
    NDIlib_audio_frame_v3_t(
        int sample_rate_ = 48000, int no_channels_ = 2, int no_samples_ = 0,
        int64_t timecode_ = NDIlib_send_timecode_synthesize,
        NDIlib_FourCC_audio_type_e FourCC_ = NDIlib_FourCC_audio_type_FLTP,
        uint8_t* p_data_ = NULL, int channel_stride_in_bytes_ = 0,
        const char* p_metadata_ = NULL,
        int64_t timestamp_ = 0
    );
#endif // NDILIB_CPP_DEFAULT_CONSTRUCTORS
} NDIlib_audio_frame_v3_t;
The problem is that, unlike video frames, I have no idea how the binary audio data is packed, and there is far less information about it online. The best I have found so far is this project:
https://github.com/gavinnn101/fishing_assistant/blob/7f5fcd73de1e39336226b5969cd1c5ca84c8058b/fishing_main.py#L124
It uses PyAudio, which I'm not familiar with, and it works with a 16-bit audio format while mine appears to be 32-bit. I also can't figure out the struct.unpack part: "%dh" % (count) apparently builds a format string from some count followed by an h (short for a 16-bit integer), and I don't understand how that will be interpreted.
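For what it's worth, going by the header comments above: if the FourCC is an uncompressed planar type such as NDIlib_FourCC_audio_type_FLTP (32-bit float, planar), then channel c should start at p_data + c * channel_stride_in_bytes and hold no_samples contiguous floats. A minimal sketch of computing a per-channel peak and RMS level under that assumption (the helper name is mine, not an NDI SDK API):

#include <algorithm>
#include <cmath>
#include <cstdio>

// Sketch only: per-channel peak and RMS level in dBFS, assuming
// FourCC == NDIlib_FourCC_audio_type_FLTP (planar 32-bit float).
void print_levels(const NDIlib_audio_frame_v3_t& frame)
{
    for (int ch = 0; ch < frame.no_channels; ++ch) {
        // Each channel is a contiguous run of floats, channel_stride_in_bytes apart.
        const float* samples = reinterpret_cast<const float*>(
            frame.p_data + ch * frame.channel_stride_in_bytes);

        float peak = 0.0f;
        double sum_sq = 0.0;
        for (int i = 0; i < frame.no_samples; ++i) {
            peak = std::max(peak, std::fabs(samples[i]));
            sum_sq += static_cast<double>(samples[i]) * samples[i];
        }
        const double rms = std::sqrt(sum_sq / std::max(frame.no_samples, 1));

        // Full-scale float audio is nominally +/-1.0, so 20*log10 gives dBFS.
        std::printf("channel %d: peak %.1f dBFS, rms %.1f dBFS\n",
                    ch, 20.0 * std::log10(peak + 1e-9), 20.0 * std::log10(rms + 1e-9));
    }
}

This only gives an overall level per frame; for anything EQ-like you would split the signal into frequency bands (e.g. with an FFT) rather than compute a single RMS value.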
Is there any C++ library that can take the pointer to the data plus its type and then provides functions to extract the audio level, the level at a certain frequency in Hz, and so on?
Or just some good information on how I could extract this myself? :)
I've searched the web a lot but found very little. I set a breakpoint where the audio frame gets filled, but gave up when I realized there were too many variables to take into account and I had no clue about the sample rate, channels, sample count, etc.
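On the "level at a certain frequency" part of the question, a minimal, self-contained sketch of one common single-bin approach, the Goertzel algorithm, applied to one planar float channel (the function and its parameters are illustrative, not an NDI SDK API):

#include <algorithm>
#include <cmath>

// Sketch only: magnitude of a single frequency bin via the Goertzel algorithm,
// fed with one channel of planar 32-bit float samples (FLTP).
double goertzel_magnitude(const float* samples, int no_samples,
                          double target_hz, double sample_rate)
{
    const double pi = 3.14159265358979323846;
    const double coeff = 2.0 * std::cos(2.0 * pi * target_hz / sample_rate);

    double s_prev = 0.0, s_prev2 = 0.0;
    for (int i = 0; i < no_samples; ++i) {
        const double s = samples[i] + coeff * s_prev - s_prev2;
        s_prev2 = s_prev;
        s_prev = s;
    }

    // Power of the bin; the square root gives a linear magnitude.
    const double power = s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2;
    return std::sqrt(std::max(power, 0.0));
}

Called per channel with the frame's values, e.g. goertzel_magnitude(samples, frame.no_samples, 1000.0, frame.sample_rate), it gives a rough 1 kHz level for that frame; a full EQ display would more likely window the samples and run an FFT over each block.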
1 Answer
Got it working, by explaining the problem out loud to chatGPT and having it offer possible solutions until I managed to get one that works :--)