C++: getting audio levels from an FLTP audio stream

Posted by 3htmauhk on 2023-01-06 in Other

I need to get audio levels, or better yet EQ data, from an NDI audio stream in C++. Here is the structure of the audio packet:

// This describes an audio frame.
typedef struct NDIlib_audio_frame_v3_t {
    // The sample-rate of this buffer.
    int sample_rate;

    // The number of audio channels.
    int no_channels;

    // The number of audio samples per channel.
    int no_samples;

    // The timecode of this frame in 100-nanosecond intervals.
    int64_t timecode;

    // What FourCC describing the type of data for this frame.
    NDIlib_FourCC_audio_type_e FourCC;

    // The audio data.
    uint8_t* p_data;

    union {
        // If the FourCC is not a compressed type and the audio format is planar, then this will be the
        // stride in bytes for a single channel.
        int channel_stride_in_bytes;

        // If the FourCC is a compressed type, then this will be the size of the p_data buffer in bytes.
        int data_size_in_bytes;
    };

    // Per frame metadata for this frame. This is a NULL terminated UTF8 string that should be in XML format.
    // If you do not want any metadata then you may specify NULL here.
    const char* p_metadata;

    // This is only valid when receiving a frame and is specified as a 100-nanosecond time that was the exact
    // moment that the frame was submitted by the sending side and is generated by the SDK. If this value is
    // NDIlib_recv_timestamp_undefined then this value is not available and is NDIlib_recv_timestamp_undefined.
    int64_t timestamp;

#if NDILIB_CPP_DEFAULT_CONSTRUCTORS
    NDIlib_audio_frame_v3_t(
        int sample_rate_ = 48000, int no_channels_ = 2, int no_samples_ = 0,
        int64_t timecode_ = NDIlib_send_timecode_synthesize,
        NDIlib_FourCC_audio_type_e FourCC_ = NDIlib_FourCC_audio_type_FLTP,
        uint8_t* p_data_ = NULL, int channel_stride_in_bytes_ = 0,
        const char* p_metadata_ = NULL,
        int64_t timestamp_ = 0
    );
#endif // NDILIB_CPP_DEFAULT_CONSTRUCTORS
} NDIlib_audio_frame_v3_t;

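The problem is that, unlike with video frames, I have absolutely no idea how the binary audio is packed, and there is far less information about it online. The best information I have found so far is this project: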
https://github.com/gavinnn101/fishing_assistant/blob/7f5fcd73de1e39336226b5969cd1c5ca84c8058b/fishing_main.py#L124
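It uses PyAudio, which I am not familiar with, and it uses a 16-bit audio format while mine appears to be 32-bit. I also can't figure out the struct.unpack part: "%dh"%(count) passes it some number followed by h for short, and I don't understand how it will interpret that.
Is there any C++ library that can take a pointer to the data plus its type, and that has functions to extract the sound level, the sound level at a certain frequency in hertz, and so on?
Or just some good information on how I could extract this myself? :)
I have searched the web a lot but found very little. I put a breakpoint where the audio frame is filled, but gave up when I realized there were too many variables to account for and that I had no clue about the sample rate, channels, sample count, etc.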

8i9zcol2 #1

Got it working using:

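#include <cmath>    // std::sqrt
#include <cstdint>  // uint8_t

// Calculates the RMS value of a planar 32-bit float (FLTP) audio frame, over all channels
float calculateRMS(const NDIlib_audio_frame_v2_t& frame)
{
   // Total number of samples in the frame (all channels)
   const int numSamples = frame.no_samples * frame.no_channels;
   if (numSamples <= 0 || frame.p_data == nullptr)
       return 0.0f;

   // Sum of the squares of the samples; accumulate in double to limit rounding error
   double sumSquares = 0.0;
   for (int ch = 0; ch < frame.no_channels; ++ch)
   {
       // Each channel is its own plane, channel_stride_in_bytes apart from the previous one
       const float* plane = reinterpret_cast<const float*>(
           reinterpret_cast<const uint8_t*>(frame.p_data) + ch * frame.channel_stride_in_bytes);
       for (int i = 0; i < frame.no_samples; ++i)
           sumSquares += static_cast<double>(plane[i]) * plane[i];
   }

   // Calculate the RMS value and return it
   return static_cast<float>(std::sqrt(sumSquares / numSamples));
}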

Called like this:

    // Keep receiving audio frames and printing their RMS values (std::cout needs <iostream>)
    NDIlib_audio_frame_v2_t audioFrame;
    while (true)
    {
        // Wait up to 1000 ms for the next audio frame; a timeout of 0 would spin the CPU
        if (NDIlib_recv_capture_v2(pNDI_recv, NULL, &audioFrame, NULL, 1000) != NDIlib_frame_type_audio)
            continue;

        // Print the RMS value of the audio frame
        std::cout << "RMS: " << calculateRMS(audioFrame) << std::endl;

        // Hand the frame buffer back to the SDK
        NDIlib_recv_free_audio_v2(pNDI_recv, &audioFrame);
    }
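
The RMS function above folds every channel into one number. For a per-channel meter in decibels, something like the following might work. It is only a sketch: `PlanarFrame` and `channelLevelsDb` are hypothetical names, not part of the NDI SDK, and the struct merely mirrors the planar fields (`no_channels`, `no_samples`, `p_data`, `channel_stride_in_bytes`) so the example compiles on its own. It assumes FLTP data, i.e. 32-bit floats with one contiguous plane per channel, and treats amplitude 1.0 as 0 dBFS.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for the planar-float fields of an NDI audio frame.
struct PlanarFrame {
    int no_channels;
    int no_samples;
    const uint8_t* p_data;        // first byte of channel 0
    int channel_stride_in_bytes;  // byte distance between channel planes
};

// Returns one RMS level per channel in dBFS (assumes 1.0 is full scale).
std::vector<float> channelLevelsDb(const PlanarFrame& f)
{
    std::vector<float> levels;
    for (int ch = 0; ch < f.no_channels; ++ch) {
        // Start of this channel's plane
        const float* plane = reinterpret_cast<const float*>(
            f.p_data + static_cast<std::size_t>(ch) * f.channel_stride_in_bytes);

        double sumSquares = 0.0;
        for (int i = 0; i < f.no_samples; ++i)
            sumSquares += static_cast<double>(plane[i]) * plane[i];

        const double rms = f.no_samples > 0 ? std::sqrt(sumSquares / f.no_samples) : 0.0;

        // Digital silence maps to -infinity dB
        levels.push_back(rms > 0.0
            ? static_cast<float>(20.0 * std::log10(rms))
            : -std::numeric_limits<float>::infinity());
    }
    return levels;
}
```

With a real frame the fields would be copied over from the received `NDIlib_audio_frame_v2_t` before the frame is freed.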

Shout-out to ChatGPT for explaining things and feeding me possible solutions until I managed to get one that works :--)
