Where does code execution "latency" come from?

Question

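I have a problem: code execution frequently shows delays that I cannot explain. By "delay" I mean that a piece of code that should take a constant amount of time sometimes takes much longer.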

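I have attached a small C program that performs some "dummy" calculations on CPU core 1. The thread is pinned to that core. I ran it on an Ubuntu 18.04 machine with 192 GiB of RAM and 96 CPU cores. The machine is doing nothing else.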

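The tool runs only a single worker thread (the main thread is sleeping), and the perf tool at least shows no context switches (thread switches), so that should not be the problem.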

The tool's output looks like this (printed roughly once per second):

...

Stats:
 Max [us]: 883
 Min [us]: 0
 Avg [us]: 0.022393

...

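These statistics always cover 1,000,000 runs. My question is: why is the maximum always so large? The 99.99% quantile is often very large as well (I left it out of the example to keep the code short; the maximum shows the behaviour well enough). Why does this happen, and how can I avoid it? In some applications this "variance" is a big problem for me.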

Given that nothing else is running, I find these values hard to understand.

Thank you very much.

main.c:

#define _GNU_SOURCE

#include <stdio.h>
#include <stdbool.h>
#include <sys/time.h>
#include <pthread.h>
#include <sys/sysinfo.h>

static inline unsigned long Now_us()
{
    struct timeval tx;
    gettimeofday(&tx,NULL);
    return tx.tv_sec * 1000000 + tx.tv_usec;
}

static inline int calculate(int x)
{
    /* Do something "expensive" */
    for (int i = 0; i < 1000; ++i) {
        x = (~x * x + (1 - x)) ^ (13 * x);
        x += 2;
    }
    return x;
}

static void *worker(void *arg)
{
    (void)arg;

    const int runs_per_measurement = 1000000;
    int dummy = 0;
    while (true) {
        int max_us = -1;
        int min_us = -1;
        int sum_us = 0;
        for (int i = 0; i < runs_per_measurement; ++i) {
            const long start_us = Now_us();
            dummy = calculate(dummy);
            const long runtime_us = Now_us() - start_us;
            
            /* Update stats */
            if (max_us < runtime_us) {
                max_us = runtime_us;
            }
            if (min_us < 0 || min_us > runtime_us) {
                min_us = runtime_us;
            }
            sum_us += runtime_us;
        }
        printf("Stats:\n");
        printf(" Max [us]: %d\n",max_us);
        printf(" Min [us]: %d\n",min_us);
        printf(" Avg [us]: %f\n",(double)sum_us / runs_per_measurement);
        printf("\n");
    }

    return NULL;
}

int main()
{
    pthread_t worker_thread;

    if (pthread_create(&worker_thread,NULL,worker,NULL) != 0) {
        printf("Cannot create thread!\n");
        return 1;
    }

    /* Use cpu number 1 */
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(1,&cpuset);

    if (pthread_setaffinity_np(worker_thread,sizeof(cpuset),&cpuset) != 0) {
        printf("Cannot set cpu core!\n");
        return 1;
    }

    pthread_join(worker_thread,NULL);

    return 0;
}

Makefile:

main: main.c
	gcc -o $@ $^ -Ofast -lpthread -Wall -Wextra -Werror

Answer

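This is a great example of how multitasking works in an operating system.

As stated in the comments above:

"This machine is doing nothing else" -> Ridiculous. Run ps -e to see all the other things the machine is doing. – John Bollinger

Multitasking is achieved by the operating system (specifically the kernel) letting one task run for a while, then pausing it and allowing another task to run.

So your code effectively runs for a short stretch, is paused while other code runs, then runs for another short stretch, and so on.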
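One way to check whether the process really was switched off the CPU is to read the context-switch counters that the kernel keeps for it. The following is a minimal sketch, assuming Linux and getrusage(); the struct switch_counts type and the read_switch_counts() helper are illustrative names, not part of the original program.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Snapshot of how often the calling process was switched off the CPU.
 * (Illustrative type, not part of the original program.) */
struct switch_counts {
    long voluntary;    /* the process blocked or yielded (ru_nvcsw) */
    long involuntary;  /* the kernel preempted the process (ru_nivcsw) */
};

/* Fill *out from getrusage(); returns 0 on success, -1 on failure. */
static int read_switch_counts(struct switch_counts *out)
{
    struct rusage usage;

    assert(out != NULL);
    if (getrusage(RUSAGE_SELF, &usage) != 0) {
        return -1;
    }
    out->voluntary = usage.ru_nvcsw;
    out->involuntary = usage.ru_nivcsw;
    return 0;
}
```

Taking one snapshot before and one after each batch of 1,000,000 runs and printing the difference next to the Max value would show whether a spike coincides with preemption. Note that RUSAGE_SELF sums over all threads of the process, and that time spent in interrupt handlers is, as far as I know, not counted as a context switch, so a spike with unchanged counters would point at something other than task switching.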

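That is the variation you see when you measure elapsed time rather than "CPU time" (the time your code actually spent running). C has some standard functions for measuring CPU time, such as clock(), which is also described in the GNU C Library documentation.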
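The difference between elapsed time and CPU time can be made visible by measuring both around the same call. This is a minimal sketch, assuming a POSIX system that provides clock_gettime() with CLOCK_MONOTONIC and CLOCK_THREAD_CPUTIME_ID; struct timing, time_call() and busy_work() are hypothetical helpers introduced only for illustration.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Elapsed (wall-clock) time and CPU time of one call, in nanoseconds. */
struct timing {
    long long wall_ns;
    long long cpu_ns;
};

/* Convert a timespec to nanoseconds. */
static long long timespec_to_ns(const struct timespec *ts)
{
    return (long long)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Hypothetical workload: arg points to an int iteration count. */
static void busy_work(void *arg)
{
    int iterations = *(const int *)arg;
    volatile unsigned x = 1;  /* volatile keeps the loop from being optimized away */

    for (int i = 0; i < iterations; ++i) {
        x = x * 1664525u + 1013904223u;
    }
}

/* Run work(arg) once and record both clocks; returns 0 on success. */
static int time_call(void (*work)(void *), void *arg, struct timing *out)
{
    struct timespec wall_start, wall_end, cpu_start, cpu_end;

    assert(work != NULL && out != NULL);
    if (clock_gettime(CLOCK_MONOTONIC, &wall_start) != 0 ||
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &cpu_start) != 0) {
        return -1;
    }
    work(arg);
    if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &cpu_end) != 0 ||
        clock_gettime(CLOCK_MONOTONIC, &wall_end) != 0) {
        return -1;
    }
    out->wall_ns = timespec_to_ns(&wall_end) - timespec_to_ns(&wall_start);
    out->cpu_ns = timespec_to_ns(&cpu_end) - timespec_to_ns(&cpu_start);
    return 0;
}
```

If the thread is paused while another task runs, wall_ns grows while cpu_ns should stay roughly the same, so a large gap between the two for a single call is a hint that the thread was not on the CPU the whole time.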

CPU scheduling is described in more detail here.

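Finally, to avoid being preempted at all you would need to run your code in kernel space, on bare metal, or on a "real-time" operating system. (I'll let you google what those terms mean :-))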

The only other option is to explore Linux/Unix "nice values" (I'll let you google that too, but basically a nice value gives your process a higher or lower priority).
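For reference, nice values can be read and changed from C as well as from the shell. The following is a minimal sketch, assuming Linux and the getpriority()/setpriority() calls from <sys/resource.h>; get_nice() and set_nice() are illustrative wrapper names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/resource.h>

/* Read the nice value of the caller; returns 0 on success, -1 on failure. */
static int get_nice(int *out)
{
    int value;

    assert(out != NULL);
    errno = 0;  /* -1 is a legitimate nice value, so errno must be checked */
    value = getpriority(PRIO_PROCESS, 0);
    if (value == -1 && errno != 0) {
        return -1;
    }
    *out = value;
    return 0;
}

/* Try to set the nice value of the caller (-20 is the highest priority,
 * 19 the lowest). Returns 0 on success, otherwise the errno value. */
static int set_nice(int value)
{
    if (setpriority(PRIO_PROCESS, 0, value) != 0) {
        return errno;
    }
    return 0;
}
```

Lowering the nice value (that is, asking for a higher priority) normally requires root or the CAP_SYS_NICE capability, so set_nice(-20) is expected to fail with EACCES for an ordinary user. Even when it succeeds, a nice value only biases the scheduler; it does not guarantee that the thread is never interrupted.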

If you are interested in this sort of thing, Robert Love wrote a great book titled "Linux Kernel Development".

c latency linux scheduler