Problem description
I am trying to do performance analysis with perf.
When I use perf stat, it reports the execution time:
Performance counter stats for './quicksort_ver1 input.txt 10000':
7.00 msec task-clock:u # 0.918 cpus utilized
2,679,253 cycles:u # 0.383 GHz (9.58%)
18,034,446 instructions:u # 6.73 insn per cycle (23.56%)
5,764,095 branches:u # 822.955 M/sec (37.62%)
5,030,025 dTLB-loads # 718.150 M/sec (51.69%)
2,948,787 dTLB-stores # 421.006 M/sec (65.75%)
5,525,534 L1-dcache-loads # 788.895 M/sec (48.31%)
2,653,434 L1-dcache-stores # 378.838 M/sec (34.25%)
4,900 L1-dcache-load-misses # 0.09% of all L1-dcache hits (20.16%)
66 LLC-load-misses # 0.00% of all LL-cache hits (6.09%)
<not counted> LLC-store-misses (0.00%)
<not counted> LLC-loads (0.00%)
<not counted> LLC-stores (0.00%)
0.007631774 seconds time elapsed
0.006655000 seconds user
0.000950000 seconds sys
However, when I use perf record, I see that task-clock collected 45 samples with an event count of 14999985:
Samples: 45 of event 'task-clock:u', Event count (approx.): 14999985
Children Self Command Shared Object Symbol
+ 91.11% 0.00% quicksort_ver1 quicksort_ver1 [.] _start
+ 91.11% 0.00% quicksort_ver1 libc-2.17.so [.] __libc_start_main
+ 91.11% 0.00% quicksort_ver1 quicksort_ver1 [.] main
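Is there a way to convert the task-clock event count into seconds or milliseconds?
Solution
I found the answer with a bit of experimentation: the base unit of the task-clock event is nanoseconds.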
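If the count is in nanoseconds, converting it is just a division by 10^6 (milliseconds) or 10^9 (seconds). Below is a minimal Python sketch that pulls the event count out of a perf report header line and converts it. The regex assumes the header format shown in this post, and the helper names (parse_task_clock_header, ns_to_ms, ns_to_s) are hypothetical, not part of perf.

```python
import re

# Matches a perf report header such as:
# "Samples: 35K of event 'task-clock:u', Event count (approx.): 11715321618"
# (format assumed from the output shown in this post)
HEADER_RE = re.compile(
    r"Samples:\s*(\S+)\s+of event\s+'([^']+)',\s*Event count \(approx\.\):\s*(\d+)"
)

NS_PER_MS = 1_000_000
NS_PER_S = 1_000_000_000


def parse_task_clock_header(line: str) -> tuple[str, int]:
    """Return (event name, event count) from a perf report header line."""
    match = HEADER_RE.search(line)
    if match is None:
        raise ValueError(f"not a perf report header: {line!r}")
    return match.group(2), int(match.group(3))


def ns_to_ms(ns: int) -> float:
    """Convert a task-clock count (assumed nanoseconds) to milliseconds."""
    return ns / NS_PER_MS


def ns_to_s(ns: int) -> float:
    """Convert a task-clock count (assumed nanoseconds) to seconds."""
    return ns / NS_PER_S


if __name__ == "__main__":
    header = "Samples: 35K of event 'task-clock:u', Event count (approx.): 11715321618"
    event, count = parse_task_clock_header(header)
    print(f"{event}: {ns_to_ms(count):.3f} ms = {ns_to_s(count):.6f} s")
```

For the header above this prints `task-clock:u: 11715.322 ms = 11.715322 s`.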
Statistics collected with perf stat:
$ sudo perf stat -e task-clock:u ./bubble_sort input.txt 50000
Performance counter stats for './bubble_sort input.txt 50000':
11,617.33 msec task-clock:u # 1.000 CPUs utilized
11.617480215 seconds time elapsed
11.615856000 seconds user
0.002000000 seconds sys
Statistics collected with perf record:
$ sudo perf report
Samples: 35K of event 'task-clock:u', Event count (approx.): 11715321618
Overhead Command Shared Object Symbol
73.75% bubble_sort bubble_sort [.] bubbleSort
26.15% bubble_sort bubble_sort [.] swap
0.07% bubble_sort libc-2.17.so [.] _IO_vfscanf
Observe that although the number of samples differs between the cases, the event count stays approximately the same as the measured time.
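perf stat reports an elapsed time of 11.617480215 seconds (task-clock: 11,617.33 msec), and perf report shows a total task-clock event count of 11715321618.
11715321618 ns = 11.715321618 s, which is approximately equal to the 11.615856000 seconds of user time reported by perf stat.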
So the base unit of the task-clock event is clearly nanoseconds.
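To put a number on "approximately equal", here is a small Python sketch (the helper name relative_difference is hypothetical) that reads the perf report event count as nanoseconds and compares it with the task-clock value printed by perf stat. perf stat and perf record each run the program separately, so exact agreement is not expected.

```python
def relative_difference(report_ns: int, stat_msec: float) -> float:
    """Relative difference between perf report's approximate task-clock
    event count (assumed nanoseconds) and perf stat's task-clock (msec)."""
    report_msec = report_ns / 1_000_000  # ns -> ms
    return (report_msec - stat_msec) / stat_msec


if __name__ == "__main__":
    # Numbers from the bubble_sort runs in this answer.
    diff = relative_difference(11_715_321_618, 11_617.33)
    print(f"perf report vs perf stat: {diff:+.2%}")
```

This prints `perf report vs perf stat: +0.84%`, i.e. the two measurements agree to within about one percent once the event count is read as nanoseconds.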