C++: performance difference between the default new operator and std::malloc

rnmwe5a2 posted on 2023-06-07 in Other

I'm benchmarking boost/singleton_pool with Google Benchmark. Along the way I happened to write a template class that overloads new and delete with std::malloc and std::free:

#include <cstddef>
#include <cstdlib>

template <typename Derived>
class Standard
{
public:
    static void* operator new(std::size_t size) {
        return std::malloc(size);
    }

    static void operator delete(void *ptr, std::size_t) {
        std::free(ptr);
    }
};

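Using the Google Benchmark framework I tested a few classes, such as int and Standard<int>, and found that this template wrapper affects performance: for a class T, Standard<T> seems to be a bit faster than T itself.
I'm confused by this. I thought the default new and delete also use std::malloc and std::free. Why are they different?
A minimal reproducible example (it needs the Google Benchmark framework to compile):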

#include "evelyn/PoolPolicy.h"
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <benchmark/benchmark.h>

template <std::size_t N>
struct Block
{
    std::int8_t data[N];
};

template <typename T>
class Standard
{
public:
    static void *operator new(std::size_t size)
    {
        return std::malloc(size);
    }

    static void operator delete(void *ptr, std::size_t)
    {
        std::free(ptr);
    }
};

using A = Standard<int>;
using B = int;

class PoolFixture : public benchmark::Fixture
{
public:
    void SetUp(const ::benchmark::State &state)
    {
    }

    void TearDown(const ::benchmark::State &state)
    {
    }
};

auto constexpr ITERATE_TIME = 1000;

BENCHMARK_F(PoolFixture, without_pool_only_allocate)
(benchmark::State &state)
{
    using Block = Block<10>;
    for (auto _ : state)
    {
        for (auto i = 0; i < ITERATE_TIME; ++i)
        {
            auto ptr = new Block;
        }
    }
}

BENCHMARK_F(PoolFixture, with_pool_only_allocate)
(benchmark::State &state)
{
    using Block = Standard<Block<10>>;
    for (auto _ : state)
    {
        for (auto i = 0; i < ITERATE_TIME; ++i)
        {
            auto ptr = new Block;
        }
    }
}

BENCHMARK_F(PoolFixture, without_pool_allocate_and_delete_1)
(benchmark::State &state)
{
    using Block = Block<10>;
    for (auto _ : state)
    {
        for (auto i = 0; i < ITERATE_TIME; ++i)
        {
            auto ptr = new Block;
            delete ptr;
        }
    }
}

BENCHMARK_F(PoolFixture, with_pool_allocate_and_delete_1)
(benchmark::State &state)
{
    using Block = Standard<Block<10>>;
    for (auto _ : state)
    {
        for (auto i = 0; i < ITERATE_TIME; ++i)
        {
            auto ptr = new Block;
            delete ptr;
        }
    }
}
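One detail worth checking in the example above: Standard never uses its template parameter and has no data members, so Standard<Block<10>> is an empty class, not a wrapper around Block<10>. The two benchmarks therefore ask the allocator for different sizes (10 bytes versus the size of an empty class, which is 1 on common compilers). The sketch below checks this at compile time; StandardHolding is a hypothetical variant, not from the question, that actually stores a T while reusing Standard's operators.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

template <std::size_t N>
struct Block
{
    std::int8_t data[N];
};

// Same shape as the question's wrapper: T is unused, so the class is empty.
template <typename T>
class Standard
{
public:
    static void *operator new(std::size_t size) { return std::malloc(size); }
    static void operator delete(void *ptr, std::size_t) { std::free(ptr); }
};

// Hypothetical variant that stores a T. The empty base takes no space,
// so new requests sizeof(T) bytes but still goes through Standard's operators.
template <typename T>
class StandardHolding : public Standard<T>
{
public:
    T value;
};

static_assert(sizeof(Block<10>) == 10, "Block<10> holds 10 bytes");
static_assert(sizeof(Standard<Block<10>>) < sizeof(Block<10>),
              "the question's Standard<Block<10>> is an empty class");
static_assert(sizeof(StandardHolding<Block<10>>) == sizeof(Block<10>),
              "the holding variant requests the same size as Block<10>");
```

Whether the size difference matters depends on the allocator (many round small requests up to a minimum chunk), so this is only a confound to rule out, not an explanation of the measured gap.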
ffvjumwh (answer #1)

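Disclaimer: here I focus on the fact that the performance test is written incorrectly. I do not analyze whether the tested code is valid.

Basically, your test is invalid.

The optimizer removed most of the test code, which makes the results meaningless. The memory leak is also disturbing. Take a close look at the machine code produced: at first glance you can see that the optimizer has removed a lot of the code.
Below is my attempt to stop the optimizer from discarding the tested code (note that I had to reduce ITERATE_TIME: the inner loop is gone, so each benchmark iteration performs a single allocation).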

#include <benchmark/benchmark.h>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

template <std::size_t N>
struct Block
{
    std::int8_t data[N];
};

template <typename T>
class Standard
{
public:
    static void *operator new(std::size_t size)
    {
        return std::malloc(size);
    }

    static void operator delete(void *ptr, std::size_t)
    {
        std::free(ptr);
    }
};

using A = Standard<int>;
using B = int;

class PoolFixture : public benchmark::Fixture
{
public:
    void SetUp(const ::benchmark::State &state)
    {
    }

    void TearDown(const ::benchmark::State &state)
    {
    }
};

BENCHMARK_F(PoolFixture, without_pool_only_allocate)
(benchmark::State &state)
{
    using Block = Block<10>;
    for (auto _ : state)
    {
        auto ptr = new Block;
        benchmark::DoNotOptimize(*ptr);
        benchmark::DoNotOptimize(ptr);
    }
}

BENCHMARK_F(PoolFixture, with_pool_only_allocate)
(benchmark::State &state)
{
    using Block = Standard<Block<10>>;
    for (auto _ : state)
    {
        auto ptr = new Block;
        benchmark::DoNotOptimize(*ptr);
        benchmark::DoNotOptimize(ptr);
    }
}

BENCHMARK_F(PoolFixture, without_pool_allocate_and_delete_1)
(benchmark::State &state)
{
    using Block = Block<10>;
    for (auto _ : state)
    {
        auto ptr = std::make_unique<Block>();
        benchmark::DoNotOptimize(*ptr);
        benchmark::DoNotOptimize(ptr);
    }
}

BENCHMARK_F(PoolFixture, with_pool_allocate_and_delete_1)
(benchmark::State &state)
{
    using Block = Standard<Block<10>>;
    for (auto _ : state)
    {
        auto ptr = std::make_unique<Block>();
        benchmark::DoNotOptimize(*ptr);
        benchmark::DoNotOptimize(ptr);
    }
}

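Unfortunately, quick-bench.com times out on this version, so I can't provide a demo.
Writing valid performance tests in C++ is full of subtle traps where you end up measuring the wrong thing, so be careful.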
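The benchmark::DoNotOptimize calls in the answer's code are what keep the allocations from being optimized away. For readers without Google Benchmark, here is a rough sketch of the same idea with a plain std::chrono loop. It assumes GCC or Clang, because the escape helper relies on an empty inline-asm statement (a well-known trick, not Google Benchmark's actual implementation); escape and ns_per_new_delete are hypothetical names. A bare loop like this still has the pitfalls mentioned above (no warm-up, no repetitions, timer overhead), so its numbers are only indicative.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

template <std::size_t N>
struct Block
{
    std::int8_t data[N];
};

template <typename T>
class Standard
{
public:
    static void *operator new(std::size_t size) { return std::malloc(size); }
    static void operator delete(void *ptr, std::size_t) { std::free(ptr); }
};

// Hypothetical helper (GCC/Clang only): the empty asm statement claims to read p
// and clobber memory, so the compiler must assume the pointer escapes and
// cannot remove the allocation that produced it.
inline void escape(void *p)
{
    asm volatile("" : : "g"(p) : "memory");
}

// Times `iterations` new/delete pairs of T; returns average nanoseconds per pair.
template <typename T>
double ns_per_new_delete(int iterations)
{
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
    {
        T *p = new T;
        escape(p); // keep the allocation observable
        delete p;  // free it again, so nothing leaks
    }
    const std::chrono::duration<double, std::nano> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / iterations;
}
```

For example, ns_per_new_delete<Block<10>>(1000000) and ns_per_new_delete<Standard<Block<10>>>(1000000) time the two allocation paths from the question.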

7vux5j2d (answer #2)

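The standard version of operator new does more than yours. If an allocation request fails (i.e. malloc returns NULL), it calls the currently installed "new-handler" (a user-supplied function registered with std::set_new_handler) and retries, repeating this until the allocation succeeds; if no handler is installed, it throws std::bad_alloc. It never returns a NULL pointer. See paragraph 1) here for details.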
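A minimal sketch of that loop, written as a helper the question's Standard wrapper could call. It models the behavior the standard specifies for the replaceable operator new; it is not any standard library's actual source. The name allocate_like_operator_new and its try_alloc parameter are hypothetical: try_alloc stands in for std::malloc so the failure path can be exercised. It also shows why the question's version differs: an operator new without noexcept may report failure only by throwing, so passing malloc's NULL straight back is not conforming.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical helper: models the allocation loop of the replaceable
// ::operator new(std::size_t). try_alloc stands in for std::malloc.
template <typename TryAlloc>
void *allocate_like_operator_new(std::size_t size, TryAlloc try_alloc)
{
    if (size == 0)
        size = 1; // a zero-byte request must still yield a unique non-null pointer
    for (;;)
    {
        if (void *p = try_alloc(size))
            return p; // success: never returns nullptr
        std::new_handler handler = std::get_new_handler();
        if (handler == nullptr)
            throw std::bad_alloc{}; // no handler installed: fail by throwing
        handler(); // the handler may free memory, install another handler, or throw
    }
}

// The question's wrapper rebuilt on top of the loop.
template <typename T>
class Standard
{
public:
    static void *operator new(std::size_t size)
    {
        return allocate_like_operator_new(size, [](std::size_t n) { return std::malloc(n); });
    }

    static void operator delete(void *ptr, std::size_t) noexcept
    {
        std::free(ptr);
    }
};
```

On the success path the loop body runs once, so the extra work is a null check; the handler and exception machinery only come into play when the underlying allocator fails.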
