c++ 与win32 CRITICAL_SECTION相比的std::互斥锁性能

t1qtbnec  于 2023-03-05  发布在  其他
关注(0)|答案(6)|浏览(131)

std::mutex的性能与CRITICAL_SECTION相比如何?是否不相上下?
我需要轻量级同步对象(不需要是进程间对象)除了std::mutex之外,是否有任何STL类接近CRITICAL_SECTION

ekqde3dh

ekqde3dh1#

请看我在答案末尾的更新,自Visual Studio 2015以来,情况发生了巨大变化。原始答案如下。
我做了一个非常简单的测试,根据我的测量,std::mutexCRITICAL_SECTION慢50- 70倍左右。

std::mutex:       18140574us
CRITICAL_SECTION: 296874us

编辑:经过更多的测试,它原来取决于线程数(拥塞)和CPU核心数。一般来说,std::mutex是慢,但多少,这取决于使用。以下是更新的测试结果(测试的MacBook Pro与核心i5- 4258 U,Windows 10,Bootcamp):

Iterations: 1000000
Thread count: 1
std::mutex:       78132us
CRITICAL_SECTION: 31252us
Thread count: 2
std::mutex:       687538us
CRITICAL_SECTION: 140648us
Thread count: 4
std::mutex:       1031277us
CRITICAL_SECTION: 703180us
Thread count: 8
std::mutex:       86779418us
CRITICAL_SECTION: 1634123us
Thread count: 16
std::mutex:       172916124us
CRITICAL_SECTION: 3390895us

以下是生成此输出的代码。使用Visual Studio 2012、默认项目设置和Win32版本配置编译。请注意,此测试可能不完全正确,但它使我在将代码从CRITICAL_SECTION切换到std::mutex之前三思而后行。

#include "stdafx.h"
#include <Windows.h>
#include <mutex>
#include <thread>
#include <vector>
#include <chrono>
#include <iostream>

const int g_cRepeatCount = 1000000;
const int g_cThreadCount = 16;

double g_shmem = 8;
std::mutex g_mutex;
CRITICAL_SECTION g_critSec;

void sharedFunc( int i )
{
    if ( i % 2 == 0 )
        g_shmem = sqrt(g_shmem);
    else
        g_shmem *= g_shmem;
}

void threadFuncCritSec() {
    for ( int i = 0; i < g_cRepeatCount; ++i ) {
        EnterCriticalSection( &g_critSec );
        sharedFunc(i);
        LeaveCriticalSection( &g_critSec );
    }
}

void threadFuncMutex() {
    for ( int i = 0; i < g_cRepeatCount; ++i ) {
        g_mutex.lock();
        sharedFunc(i);
        g_mutex.unlock();
    }
}

void testRound(int threadCount)
{
    std::vector<std::thread> threads;

    auto startMutex = std::chrono::high_resolution_clock::now();
    for (int i = 0; i<threadCount; ++i)
        threads.push_back(std::thread( threadFuncMutex ));
    for ( std::thread& thd : threads )
        thd.join();
    auto endMutex = std::chrono::high_resolution_clock::now();

    std::cout << "std::mutex:       ";
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endMutex - startMutex).count();
    std::cout << "us \n\r";

    threads.clear();
    auto startCritSec = std::chrono::high_resolution_clock::now();
    for (int i = 0; i<threadCount; ++i)
        threads.push_back(std::thread( threadFuncCritSec ));
    for ( std::thread& thd : threads )
        thd.join();
    auto endCritSec = std::chrono::high_resolution_clock::now();

    std::cout << "CRITICAL_SECTION: ";
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endCritSec - startCritSec).count();
    std::cout << "us \n\r";
}

int _tmain(int argc, _TCHAR* argv[]) {
    InitializeCriticalSection( &g_critSec );

    std::cout << "Iterations: " << g_cRepeatCount << "\n\r";

    for (int i = 1; i <= g_cThreadCount; i = i*2) {
        std::cout << "Thread count: " << i << "\n\r";
        testRound(i);
        Sleep(1000);
    }

    DeleteCriticalSection( &g_critSec );

    // Added 10/27/2017 to try to prevent the compiler to completely
    // optimize out the code around g_shmem if it wouldn't be used anywhere.
    std::cout << "Shared variable value: " << g_shmem << std::endl;
    getchar();
    return 0;
}
  • 2017年10月27日更新(1)*:一些答案表明这不是一个现实的测试,也不代表一个“真实世界”的场景,这是真的,这个测试试图测量std::mutex的“开销”,它并不是试图证明99%的应用程序的差异是可以忽略的。
  • 2017年10月27日更新(2)*:从Visual Studio 2015(VC 140)开始,情况似乎变得对std::mutex有利。我使用了VS 2017 IDE,与上面完全相同的代码,x64发布配置,禁用优化,我只是简单地切换了每个测试的“平台工具集”。结果非常令人惊讶,我真的很好奇VC 140中挂起了什么。

  • 2020年2月25日更新(3)*:使用Visual Studio 2019(工具集v142)重新运行测试,情况仍然相同:std::mutexCRITICAL_SECTION快两到三倍。
juud5qan

juud5qan2#

waldez的测试是不现实的,它基本上模拟了100%的竞争。一般来说,这正是你不希望在多线程代码。下面是一个修改后的测试,它做了一些共享计算。我用这个代码得到的结果是不同的:

Tasks: 160000
Thread count: 1
std::mutex:       12096ms
CRITICAL_SECTION: 12060ms
Thread count: 2
std::mutex:       5206ms
CRITICAL_SECTION: 5110ms
Thread count: 4
std::mutex:       2643ms
CRITICAL_SECTION: 2625ms
Thread count: 8
std::mutex:       1632ms
CRITICAL_SECTION: 1702ms
Thread count: 12
std::mutex:       1227ms
CRITICAL_SECTION: 1244ms

这里你可以看到,对于我(使用VS2013)来说,std::mutex和CRITICAL_SECTION之间的数字非常接近。注意,这段代码执行固定数量的任务(160,000),这就是为什么性能随着线程数量的增加而提高。我这里有12个内核,所以我在12个内核上停了下来。
我并不是说与其他测试相比,这是正确的还是错误的,但它确实强调了时序问题通常是特定于领域的。

#include "stdafx.h"
#include <Windows.h>
#include <mutex>
#include <thread>
#include <vector>
#include <chrono>
#include <iostream>

const int tastCount = 160000;
int numThreads;
const int MAX_THREADS = 16;

double g_shmem = 8;
std::mutex g_mutex;
CRITICAL_SECTION g_critSec;

void sharedFunc(int i, double &data)
{
    for (int j = 0; j < 100; j++)
    {
        if (j % 2 == 0)
            data = sqrt(data);
        else
            data *= data;
    }
}

void threadFuncCritSec() {
    double lMem = 8;
    int iterations = tastCount / numThreads;
    for (int i = 0; i < iterations; ++i) {
        for (int j = 0; j < 100; j++)
            sharedFunc(j, lMem);
        EnterCriticalSection(&g_critSec);
        sharedFunc(i, g_shmem);
        LeaveCriticalSection(&g_critSec);
    }
    printf("results: %f\n", lMem);
}

void threadFuncMutex() {
    double lMem = 8;
    int iterations = tastCount / numThreads;
    for (int i = 0; i < iterations; ++i) {
        for (int j = 0; j < 100; j++)
            sharedFunc(j, lMem);
        g_mutex.lock();
        sharedFunc(i, g_shmem);
        g_mutex.unlock();
    }
    printf("results: %f\n", lMem);
}

void testRound()
{
    std::vector<std::thread> threads;

    auto startMutex = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < numThreads; ++i)
        threads.push_back(std::thread(threadFuncMutex));
    for (std::thread& thd : threads)
        thd.join();
    auto endMutex = std::chrono::high_resolution_clock::now();

    std::cout << "std::mutex:       ";
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(endMutex - startMutex).count();
    std::cout << "ms \n\r";

    threads.clear();
    auto startCritSec = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < numThreads; ++i)
        threads.push_back(std::thread(threadFuncCritSec));
    for (std::thread& thd : threads)
        thd.join();
    auto endCritSec = std::chrono::high_resolution_clock::now();

    std::cout << "CRITICAL_SECTION: ";
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(endCritSec - startCritSec).count();
    std::cout << "ms \n\r";
}

int _tmain(int argc, _TCHAR* argv[]) {
    InitializeCriticalSection(&g_critSec);

    std::cout << "Tasks: " << tastCount << "\n\r";

    for (numThreads = 1; numThreads <= MAX_THREADS; numThreads = numThreads * 2) {
        if (numThreads == 16)
            numThreads = 12;
        Sleep(100);
        std::cout << "Thread count: " << numThreads << "\n\r";
        testRound();
    }

    DeleteCriticalSection(&g_critSec);
    return 0;
}
3lxsmp7m

3lxsmp7m3#

我在这里搜索pthread与临界区的基准测试,然而,我的结果与waldez关于这个主题的回答不同,我想分享一下会很有趣。
代码是@waldez使用的,修改后添加了pthreads到比较中,用GCC编译,没有优化。我的CPU是AMD A8- 3530 MX。
Windows 7家庭版:

>a.exe
Iterations: 1000000
Thread count: 1
std::mutex:       46800us
CRITICAL_SECTION: 31200us
pthreads:         31200us
Thread count: 2
std::mutex:       171600us
CRITICAL_SECTION: 218400us
pthreads:         124800us
Thread count: 4
std::mutex:       327600us
CRITICAL_SECTION: 374400us
pthreads:         249600us
Thread count: 8
std::mutex:       967201us
CRITICAL_SECTION: 748801us
pthreads:         717601us
Thread count: 16
std::mutex:       2745604us
CRITICAL_SECTION: 1497602us
pthreads:         1903203us

正如您所看到的,差异在统计误差范围内变化很大-有时std::mutex更快,有时不是。重要的是,我没有观察到像原始答案那样大的差异。
我想,也许原因是,当答案被张贴,MSVC编译器是不好与较新的标准,并注意到,原来的答案已使用的版本从2012年.
另外,出于好奇,在Archlinux上的Wine下也有同样的二进制文件:

$ wine a.exe
fixme:winediag:start_process Wine Staging 2.19 is a testing version containing experimental patches.
fixme:winediag:start_process Please mention your exact version when filing bug reports on winehq.org.
Iterations: 1000000
Thread count: 1
std::mutex:       53810us 
CRITICAL_SECTION: 95165us 
pthreads:         62316us 
Thread count: 2
std::mutex:       604418us 
CRITICAL_SECTION: 1192601us 
pthreads:         688960us 
Thread count: 4
std::mutex:       779817us 
CRITICAL_SECTION: 2476287us 
pthreads:         818022us 
Thread count: 8
std::mutex:       1806607us 
CRITICAL_SECTION: 7246986us 
pthreads:         809566us 
Thread count: 16
std::mutex:       2987472us 
CRITICAL_SECTION: 14740350us 
pthreads:         1453991us

Waldez的密码加上我的修改

#include <math.h>
#include <windows.h>
#include <mutex>
#include <thread>
#include <vector>
#include <chrono>
#include <iostream>
#include <pthread.h>

const int g_cRepeatCount = 1000000;
const int g_cThreadCount = 16;

double g_shmem = 8;
std::mutex g_mutex;
CRITICAL_SECTION g_critSec;
pthread_mutex_t pt_mutex;

void sharedFunc( int i )
{
    if ( i % 2 == 0 )
        g_shmem = sqrt(g_shmem);
    else
        g_shmem *= g_shmem;
}

void threadFuncCritSec() {
    for ( int i = 0; i < g_cRepeatCount; ++i ) {
        EnterCriticalSection( &g_critSec );
        sharedFunc(i);
        LeaveCriticalSection( &g_critSec );
    }
}

void threadFuncMutex() {
    for ( int i = 0; i < g_cRepeatCount; ++i ) {
        g_mutex.lock();
        sharedFunc(i);
        g_mutex.unlock();
    }
}

void threadFuncPTMutex() {
    for ( int i = 0; i < g_cRepeatCount; ++i ) {
        pthread_mutex_lock(&pt_mutex);
        sharedFunc(i);
        pthread_mutex_unlock(&pt_mutex);
    }
}
void testRound(int threadCount)
{
    std::vector<std::thread> threads;

    auto startMutex = std::chrono::high_resolution_clock::now();
    for (int i = 0; i<threadCount; ++i)
        threads.push_back(std::thread( threadFuncMutex ));
    for ( std::thread& thd : threads )
        thd.join();
    auto endMutex = std::chrono::high_resolution_clock::now();

    std::cout << "std::mutex:       ";
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endMutex - startMutex).count();
    std::cout << "us \n";
    g_shmem = 0;

    threads.clear();
    auto startCritSec = std::chrono::high_resolution_clock::now();
    for (int i = 0; i<threadCount; ++i)
        threads.push_back(std::thread( threadFuncCritSec ));
    for ( std::thread& thd : threads )
        thd.join();
    auto endCritSec = std::chrono::high_resolution_clock::now();

    std::cout << "CRITICAL_SECTION: ";
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endCritSec - startCritSec).count();
    std::cout << "us \n";
    g_shmem = 0;

    threads.clear();
    auto startPThread = std::chrono::high_resolution_clock::now();
    for (int i = 0; i<threadCount; ++i)
        threads.push_back(std::thread( threadFuncPTMutex ));
    for ( std::thread& thd : threads )
        thd.join();
    auto endPThread = std::chrono::high_resolution_clock::now();

    std::cout << "pthreads:         ";
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endPThread - startPThread).count();
    std::cout << "us \n";
    g_shmem = 0;
}

int main() {
    InitializeCriticalSection( &g_critSec );
    pthread_mutex_init(&pt_mutex, 0);

    std::cout << "Iterations: " << g_cRepeatCount << "\n";

    for (int i = 1; i <= g_cThreadCount; i = i*2) {
        std::cout << "Thread count: " << i << "\n";
        testRound(i);
        Sleep(1000);
    }

    getchar();
    DeleteCriticalSection( &g_critSec );
    pthread_mutex_destroy(&pt_mutex);
    return 0;
}
uoifb46i

uoifb46i4#

2015年2月的原始答复:

我使用的是Visual Studio 2013。
我在单线程使用方面的结果看起来与waldez的结果相似:
100万次锁定/解锁调用:

CRITICAL_SECTION:       19 ms
std::mutex:             48 ms
std::recursive_mutex:   48 ms

Microsoft更改实现的原因是C11兼容性。C11在标准名称空间中有4种互斥锁:

Microsoft std::mutex和所有其他mutex是关键部分周围的 Package 器:

struct _Mtx_internal_imp_t
    {   /* Win32 mutex */
        int type; // here MS keeps particular mutex type
        Concurrency::critical_section cs;
        long thread_id;
        int count;
    };

对于我来说,std::recursive_mutex应该完全匹配临界区。所以微软应该优化它的实现,以占用更少的CPU和内存。

2023年2月更新:

最新调查显示,最新版本的MSVC与MSVC 2013相比,std::mutex实现存在差异。我尝试了以下编译器/STL,它们显示了相同的行为:

  • MSVC 2019(软件开发工具包14.29.30133)
  • MSVC 2022(软件开发工具包14.33.31629)

默认情况下,它们都使用SRW locks实现std::mutex
但是CRT在运行时仍然可以选择基于CRITICAL_SECTION的实现。
下面是现代底层结构的定义:

struct _Mtx_internal_imp_t { // ConcRT mutex
    int type;
    std::aligned_storage_t<Concurrency::details::stl_critical_section_max_size,
        Concurrency::details::stl_critical_section_max_alignment>
        cs;
    long thread_id;
    int count;
    Concurrency::details::stl_critical_section_interface* _get_cs() { // get pointer to implementation
        return reinterpret_cast<Concurrency::details::stl_critical_section_interface*>(&cs);
    }
};

它是这样初始化的:

void _Mtx_init_in_situ(_Mtx_t mtx, int type) { // initialize mutex in situ
    Concurrency::details::create_stl_critical_section(mtx->_get_cs());
    mtx->thread_id = -1;
    mtx->type      = type;
    mtx->count     = 0;
}

inline void create_stl_critical_section(stl_critical_section_interface* p) {
#ifdef _CRT_WINDOWS
    new (p) stl_critical_section_win7;
#else
    switch (__stl_sync_api_impl_mode) {
    case __stl_sync_api_modes_enum::normal:
    case __stl_sync_api_modes_enum::win7:
        if (are_win7_sync_apis_available()) {
            new (p) stl_critical_section_win7;
            return;
        }
        // fall through
    case __stl_sync_api_modes_enum::vista:
        new (p) stl_critical_section_vista;
        return;
    default:
        abort();
    }
#endif // _CRT_WINDOWS
}

are_win7_sync_apis_available正在检查运行时是否存在API函数TryAcquireSRWLockExclusive
如您所见,例如,如果create_stl_critical_section运行在Windows Vista上,它将选择stl_critical_section_vista
我们还可以通过调用未记录的函数__set_stl_sync_api_mode来强制CRT选择基于CRITICAL_SECTION的实现:

#include <mutex>

enum class __stl_sync_api_modes_enum { normal, win7, vista, concrt };
extern "C" _CRTIMP2 void __cdecl __set_stl_sync_api_mode(__stl_sync_api_modes_enum mode);

int main()
{
    __set_stl_sync_api_mode(__stl_sync_api_modes_enum::vista);
    std::mutex protect; // now it is forced to use CRITICAL_SECTION inside
}

这对动态CRT链接(DLL)和静态CRT都有效。但是静态CRT的调试要容易得多(在调试模式下)。

tzcvj98z

tzcvj98z5#

Waldez修改了相同的test program,以使用pthreads和boost::mutex运行。
在win10 pro上(使用intel i7- 7820 X 16核cpu),我从VS2015 update 3上的std::mutex得到了比CRITICAL_SECTION更好的结果(从boost::mutex得到的结果甚至更好):

Iterations: 1000000

Thread count: 1
std::mutex:       23403us
boost::mutex:     12574us
CRITICAL_SECTION: 19454us

Thread count: 2
std::mutex:       55031us
boost::mutex:     45263us
CRITICAL_SECTION: 187597us

Thread count: 4
std::mutex:       113964us
boost::mutex:     83699us
CRITICAL_SECTION: 605765us

Thread count: 8
std::mutex:       266091us
boost::mutex:     155265us
CRITICAL_SECTION: 1908491us

Thread count: 16
std::mutex:       633032us
boost::mutex:     300076us
CRITICAL_SECTION: 4015176us

pthreads的结果为here

#ifdef _WIN32
#include <Windows.h>
#endif
#include <mutex>
#include <boost/thread/mutex.hpp>
#include <thread>
#include <vector>
#include <chrono>
#include <iostream>

const int g_cRepeatCount = 1000000;
const int g_cThreadCount = 16;

double g_shmem = 8;
std::recursive_mutex g_mutex;
boost::mutex g_boostMutex;

void sharedFunc(int i)
{
    if (i % 2 == 0)
        g_shmem = sqrt(g_shmem);
    else
        g_shmem *= g_shmem;
}

#ifdef _WIN32
CRITICAL_SECTION g_critSec;
void threadFuncCritSec()
{
    for (int i = 0; i < g_cRepeatCount; ++i)
    {
        EnterCriticalSection(&g_critSec);
        sharedFunc(i);
        LeaveCriticalSection(&g_critSec);
    }
}
#else
pthread_mutex_t pt_mutex;
void threadFuncPtMutex()
{
    for (int i = 0; i < g_cRepeatCount; ++i) {
        pthread_mutex_lock(&pt_mutex);
        sharedFunc(i);
        pthread_mutex_unlock(&pt_mutex);
    }
}
#endif

void threadFuncMutex()
{
    for (int i = 0; i < g_cRepeatCount; ++i)
    {
        g_mutex.lock();
        sharedFunc(i);
        g_mutex.unlock();
    }
}

void threadFuncBoostMutex()
{
    for (int i = 0; i < g_cRepeatCount; ++i)
    {
        g_boostMutex.lock();
        sharedFunc(i);
        g_boostMutex.unlock();
    }
}

void testRound(int threadCount)
{
    std::vector<std::thread> threads;

    std::cout << "\nThread count: " << threadCount << "\n\r";

    auto startMutex = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < threadCount; ++i)
        threads.push_back(std::thread(threadFuncMutex));
    for (std::thread& thd : threads)
        thd.join();
    threads.clear();
    auto endMutex = std::chrono::high_resolution_clock::now();

    std::cout << "std::mutex:       ";
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endMutex - startMutex).count();
    std::cout << "us \n\r";

    auto startBoostMutex = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < threadCount; ++i)
        threads.push_back(std::thread(threadFuncBoostMutex));
    for (std::thread& thd : threads)
        thd.join();
    threads.clear();
    auto endBoostMutex = std::chrono::high_resolution_clock::now();

    std::cout << "boost::mutex:     ";
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endBoostMutex - startBoostMutex).count();
    std::cout << "us \n\r";

#ifdef _WIN32
    auto startCritSec = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < threadCount; ++i)
        threads.push_back(std::thread(threadFuncCritSec));
    for (std::thread& thd : threads)
        thd.join();
    threads.clear();
    auto endCritSec = std::chrono::high_resolution_clock::now();

    std::cout << "CRITICAL_SECTION: ";
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endCritSec - startCritSec).count();
    std::cout << "us \n\r";
#else
    auto startPThread = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < threadCount; ++i)
        threads.push_back(std::thread(threadFuncPtMutex));
    for (std::thread& thd : threads)
        thd.join();
    threads.clear();
    auto endPThread = std::chrono::high_resolution_clock::now();

    std::cout << "pthreads:         ";
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endPThread - startPThread).count();
    std::cout << "us \n";
#endif
}

int main()
{
#ifdef _WIN32
    InitializeCriticalSection(&g_critSec);
#else
    pthread_mutex_init(&pt_mutex, 0);
#endif

    std::cout << "Iterations: " << g_cRepeatCount << "\n\r";

    for (int i = 1; i <= g_cThreadCount; i = i * 2)
    {
        testRound(i);
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }

#ifdef _WIN32
    DeleteCriticalSection(&g_critSec);
#else
    pthread_mutex_destroy(&pt_mutex);
#endif
    if (rand() % 10000 == 1)
    {
        // Added 10/27/2017 to try to prevent the compiler to completely
        // optimize out the code around g_shmem if it wouldn't be used anywhere.
        std::cout << "Shared variable value: " << g_shmem << std::endl;
    }
    return 0;
}
sulc1iza

sulc1iza6#

My results for test1

Iterations: 1000000
Thread count: 1
std::mutex:      27085us
CRITICAL_SECTION: 12035us
Thread count: 2
std::mutex:      40412us
CRITICAL_SECTION: 119952us
Thread count: 4
std::mutex:      123214us
CRITICAL_SECTION: 314774us
Thread count: 8
std::mutex:      387737us
CRITICAL_SECTION: 1664506us
Thread count: 16
std::mutex:      836901us
CRITICAL_SECTION: 3837877us
Shared variable value: 8

对于测试2

Tasks: 160000
Thread count: 1
results: 8.000000
std::mutex:       4642ms
results: 8.000000
CRITICAL_SECTION: 4588ms
Thread count: 2
results: 8.000000
results: 8.000000
std::mutex:       2309ms
results: 8.000000
results: 8.000000
CRITICAL_SECTION: 2307ms
Thread count: 4
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
std::mutex:       1169ms
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
CRITICAL_SECTION: 1162ms
Thread count: 8
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
std::mutex:       640ms
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
CRITICAL_SECTION: 628ms
Thread count: 12
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
std::mutex:       745ms
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
results: 8.000000
CRITICAL_SECTION: 672ms

相关问题