See also: Using AVX instructions disables exp() optimization? (1 answer)
I noticed that after running any Intel AVX function, math functions (such as ceil, round, ...) take more CPU cycles.
See the following example:
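#include <stdio.h>
#include <math.h>
#include <immintrin.h>
#include <x86intrin.h>

#define NUM_ITERATIONS 10000000

/* get_rdtsc() is not defined in the post; assumed here to be a thin
   wrapper around the __rdtsc() intrinsic. */
static inline unsigned long int get_rdtsc(void)
{
    return __rdtsc();
}

void run_round()
{
    unsigned long int t1, t2, res, i;
    double d = 3.2;

    t1 = get_rdtsc();
    for (i = 0 ; i < NUM_ITERATIONS ; ++i) {
        res = round(d * i);
    }
    t2 = get_rdtsc();
    /* Print the last result, the total cycle count, and cycles per iteration. */
    printf("round res %lu total cycles %lu CPI %lu\n", res, t2 - t1, (t2 - t1) / NUM_ITERATIONS);
}

int main ()
{
    __m256d a;

    run_round();                /* baseline, before any AVX instruction has run */
    a = _mm256_set1_pd(1);      /* execute a single AVX instruction */
    run_round();                /* same loop again, now much slower */
    return 0;
}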
Compiled with: gcc -Wall -lm -mavx foo.c
The output is:
round res 31999997 total cycles 224725952 CPI 22
round res 31999997 total cycles 1900864520 CPI 190
Please advise.
Best answer