Why does the time spent on the same function/method call differ so much depending on whether sleep/usleep is used in a while loop?

Problem description

My original C++ demo code is shown below:

int counter = 0;
while (counter < 5) {
  auto start = std::chrono::high_resolution_clock::now();
  // instance and result are pre-defined local variables
  instance.Search(40.055948, 116.411325, &result);
  auto end = std::chrono::high_resolution_clock::now();
  int64_t cost_us = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
  std::cout << "cost_us=" << cost_us << std::endl;
  // usleep(100);  // case 1: sleep 100 us
  // sleep(1);     // case 2: sleep 1 second
  // case 3: no sleep at all
  counter++;
}

Some people may suspect that my call to instance.Search() introduces something unknown, so please refer to the code below:

#include <unistd.h>   // usleep, sleep
#include <chrono>
#include <cstdint>
#include <iostream>
#include <set>

void test(const std::set<int>& numbers) {
  for (int counter = 0; counter < 5; ++counter) {
    auto start = std::chrono::high_resolution_clock::now();
    auto it = numbers.lower_bound(5555555);  // the call being timed
    auto end = std::chrono::high_resolution_clock::now();
    (void)it;  // the result is not needed, only the elapsed time
    int64_t cost_us = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
    std::cout << "cost_us=" << cost_us << std::endl;
    // usleep(100);  // case 1
    // sleep(1);     // case 2
  }
}

int main() {
  std::set<int> test_set;
  for (int i = 0; i < 100000000; i++) {
    test_set.insert(i);
  }
  test(test_set);
}

Premise: I use perf stat to count cache misses, instructions, and so on.

Case 1: with usleep(100) added at the end of the while loop, the result after the process finishes is:

cost_us=5
cost_us=5
cost_us=5
cost_us=8
cost_us=6
Performance counter stats for './latency_perf_test_sleep_100_us':
    1,785,438     cache-references
      419,583     cache-misses        #   23.500 % of all cache refs
  203,832,235     cycles
  118,093,490     instructions        #   0.58 insn per cycle
   23,198,708     branches
       35,092     faults
          302     migrations

  1.031460583 seconds time elapsed

Case 2: with sleep(1) added, the result is:

cost_us=7
cost_us=65
cost_us=21
cost_us=21
cost_us=32
Performance counter stats for './latency_perf_test_sleep_1_sec':
         15,302     cache-references
      1,303,941     cache-misses        #    8.639 % of all cache refs
 14,759,103,041     cycles
 24,548,401,788     instructions        #    1.66 insn per cycle
  5,062,488,529     branches
         35,372     faults
          3,444     migrations

  6.033182248 seconds time elapsed

Case 3: with no sleep()/usleep() at all, the result is:

cost_us=5
cost_us=2
cost_us=1
cost_us=1
cost_us=1
Performance counter stats for './latency_perf_test_without_sleep':
    1,715,128     cache-references
      420,368     cache-misses        #   24.509 % of all cache refs
  209,238,013     cycles
  130,647,626     instructions        #   0.62 insn per cycle
   25,827,456     branches
       35,092     faults
          362     migrations

  1.032256618 seconds time elapsed

As shown above, the time cost of the same function/method call differs greatly between the cases. At first I tended to think that sleep() would lead to cache misses (on the data used by my call). However, after I used taskset to bind my process to a specific CPU core, the difference did not disappear as I expected.
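For anyone who wants to repeat the CPU-pinning experiment without relying on the taskset command, the same restriction can be applied from inside the program. The following is only a minimal sketch, assuming Linux with glibc; first_allowed_cpu and pin_to_cpu are hypothetical helper names that are not part of the code in the question, and sched_setaffinity with pid 0 only affects the calling thread, which is roughly but not exactly what taskset does for a whole process.

```cpp
#include <cassert>
#include <sched.h>  // sched_getaffinity, sched_setaffinity, CPU_* macros (Linux/glibc)

// Hypothetical helper: returns the lowest-numbered CPU the calling thread
// is currently allowed to run on, or -1 if the affinity mask cannot be read.
int first_allowed_cpu() {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) {
        return -1;
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) {
            return cpu;
        }
    }
    return -1;
}

// Hypothetical helper: restricts the calling thread to exactly one CPU.
// Returns true on success, false if the kernel rejects the request.
bool pin_to_cpu(int cpu) {
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    return sched_setaffinity(0, sizeof(mask), &mask) == 0;
}
```

Calling pin_to_cpu(first_allowed_cpu()) at the top of main(), before the set is built and the timing loop runs, keeps the measured thread on one core for the whole experiment.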

I also wonder why adding sleep()/usleep() makes the instructions count reported by perf stat rise so sharply.

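I have not read any of the source code for sleep() or usleep(), but my guess is that the cause lies in what happens when the process calls sleep() or usleep() (both of which call nanosleep() internally).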

Can anyone explain the reason behind this strange phenomenon? Thanks in advance.

Solution

Can anyone explain the reason behind this strange phenomenon?

Your sleep looks like the one in this glibc source from 2012, sysv/linux/sleep.c. Because of a kernel bug (or is it expected SysV behaviour? I'm not sure...), it blocks SIGCHLD before calling nanosleep; see this commit and the LKML thread it references.

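Most likely the extra overhead comes from the calls to the __sig* family of functions. To investigate further, profile the code (consider gprof) and/or build your glibc from source with debug information (or simply install the debug info package, if one is available) and then profile the code.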
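To make the pattern described above concrete, here is a deliberately simplified sketch of a sleep that blocks SIGCHLD around nanosleep. It is not the glibc source and omits whatever additional checks that code performs; nanosleep_with_sigchld_blocked is a hypothetical name chosen for illustration. It only shows where the extra signal-mask system calls come from compared with calling nanosleep directly.

```cpp
#include <cassert>
#include <signal.h>  // sigemptyset, sigaddset, sigprocmask, SIGCHLD
#include <time.h>    // nanosleep, timespec

// Hypothetical helper illustrating the pattern only:
// 1. block SIGCHLD, 2. sleep, 3. restore the previous signal mask.
// Returns the result of nanosleep, or -1 if the mask could not be changed.
int nanosleep_with_sigchld_blocked(const timespec& request) {
    sigset_t block_set;
    sigset_t saved_set;
    sigemptyset(&block_set);
    sigaddset(&block_set, SIGCHLD);

    // Extra system call number one: change the signal mask.
    if (sigprocmask(SIG_BLOCK, &block_set, &saved_set) != 0) {
        return -1;
    }

    timespec remaining{};
    int result = nanosleep(&request, &remaining);

    // Extra system call number two: put the original mask back.
    sigprocmask(SIG_SETMASK, &saved_set, nullptr);
    return result;
}
```

Timing this helper against a bare nanosleep call with the same request, using the same std::chrono approach as in the question, is one way to check how much of the difference the signal-mask calls actually account for.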
