问题描述
我的测试表明,使用无锁原子 fetch_add 比使用互斥锁慢得多。我正在运行 10 个并行线程,每个线程向共享计数器添加 100 万个。我正在使用宽松的内存顺序。我做错了原子吗?
测试输出:
fetch_add elapsed time: 4.14048 secs
fetch_add elapsed time: 5.37776 secs
fetch_add elapsed time: 5.37771 secs
fetch_add elapsed time: 5.37791 secs
fetch_add elapsed time: 5.37865 secs
fetch_add elapsed time: 7.85113 secs
fetch_add elapsed time: 7.93542 secs
fetch_add elapsed time: 7.96275 secs
fetch_add elapsed time: 7.97306 secs
Result from fetch_add:1000000000
Lock elapsed time: 0.214009 secs
Lock elapsed time: 0.418662 secs
Lock elapsed time: 0.633993 secs
Lock elapsed time: 0.836704 secs
Lock elapsed time: 1.04225 secs
Lock elapsed time: 1.24322 secs
Lock elapsed time: 1.44627 secs
Lock elapsed time: 1.64898 secs
Lock elapsed time: 1.8539 secs
Lock elapsed time: 2.05816 secs
Result from lock:1000000000
测试代码:
#include <atomic>
#include <cstdlib>
#include <time.h> /* time */
#include <iostream>
#include <thread>
#include <mutex>
#include <chrono>
const int BILLION = 1000000000L;
std::mutex my_mutex;
int global_val = 0;
std::atomic<unsigned long> global_a_val(0);
void LockFree(int length) {
struct timespec start,end;
clock_gettime(CLOCK_MONOTONIC,&start);
for (int i = 0; i < length; i++)
global_a_val.fetch_add(1,std::memory_order_relaxed);
clock_gettime(CLOCK_MONOTONIC,&end);
auto diff = BILLION * (end.tv_sec - start.tv_sec) + end.tv_nsec - start.tv_nsec;
std::cout<<"fetch_add elapsed time: "<< (double)diff/BILLION <<" secs\n";
}
void Lock(int length) {
struct timespec start,end;
clock_gettime(CLOCK_MONOTONIC,&start);
{
const std::lock_guard<std::mutex> my_lock(my_mutex);
for (int i = 0; i < length; i++) {
auto pre = ++global_val;
}
}
clock_gettime(CLOCK_MONOTONIC,&end);
auto diff = BILLION * (end.tv_sec - start.tv_sec) + end.tv_nsec - start.tv_nsec;
std::cout<<"Lock elapsed time: "<< (double)diff/BILLION <<" secs\n";
}
int main() {
const int thread_count = 10;
const int count = 100000000;
std::thread t[thread_count];
for (int i = 0; i < thread_count; i++) {
t[i] = std::thread(LockFree,count);
}
for (int i = 0; i < thread_count; i++) {
t[i].join();
}
std::cout<<"Result from fetch_add:"<<global_a_val<<"\n";
std::thread t2[thread_count];
for (int i = 0; i < thread_count; i++) {
t2[i] = std::thread(Lock,count);
}
for (int i = 0; i < thread_count; i++) {
t2[i].join();
}
std::cout<<"Result from lock:"<<global_val<<"\n";
}
解决方法
很明显它会变慢,因为你只锁定互斥锁一次。在此代码中:
{
const std::lock_guard<std::mutex> my_lock(my_mutex);
for (int i = 0; i < length; i++) {
auto pre = ++global_val;
}
}
您只锁定互斥锁一次,然后使用循环来正常添加值,但在 fetch_add
中每次都会发生原子添加。很明显,正常添加比原子添加更快,这种比较是不正确的。如果要正确比较它们,则需要这样的代码:
{
for (int i = 0; i < length; i++) {
const std::lock_guard<std::mutex> my_lock(my_mutex);
auto pre = ++global_val;
}
}
为了比较,您在我的虚拟机上的代码给出了以下时间:
fetch_add elapsed time: 11.5873 secs
fetch_add elapsed time: 12.0809 secs
fetch_add elapsed time: 12.2435 secs
fetch_add elapsed time: 12.5315 secs
fetch_add elapsed time: 12.6408 secs
fetch_add elapsed time: 12.7246 secs
fetch_add elapsed time: 12.8112 secs
fetch_add elapsed time: 12.813 secs
fetch_add elapsed time: 12.8739 secs
fetch_add elapsed time: 12.9213 secs
Result from fetch_add:1000000000
Lock elapsed time: 0.259349 secs
Lock elapsed time: 0.51883 secs
Lock elapsed time: 0.779507 secs
Lock elapsed time: 1.03916 secs
Lock elapsed time: 1.3016 secs
Lock elapsed time: 1.56413 secs
Lock elapsed time: 1.82421 secs
Lock elapsed time: 2.08337 secs
Lock elapsed time: 2.34295 secs
Lock elapsed time: 2.60248 secs
Result from lock:1000000000
但是在我提到的更改之后,结果将是这样的:
fetch_add elapsed time: 11.4242 secs
fetch_add elapsed time: 11.9451 secs
fetch_add elapsed time: 11.9901 secs
fetch_add elapsed time: 12.0678 secs
fetch_add elapsed time: 12.1317 secs
fetch_add elapsed time: 12.505 secs
fetch_add elapsed time: 12.5591 secs
fetch_add elapsed time: 12.7394 secs
fetch_add elapsed time: 12.8966 secs
fetch_add elapsed time: 12.9199 secs
Result from fetch_add:1000000000
Lock elapsed time: 70.9624 secs
Lock elapsed time: 70.9993 secs
Lock elapsed time: 71.0727 secs
Lock elapsed time: 71.0932 secs
Lock elapsed time: 71.0966 secs
Lock elapsed time: 71.1221 secs
Lock elapsed time: 71.143 secs
Lock elapsed time: 71.1462 secs
Lock elapsed time: 71.158 secs
Lock elapsed time: 71.1588 secs
Result from lock:1000000000