Why the number of cache-references reported by perf differs for matrix multiplication

Problem description

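We tried two different ways of computing two matrix products from three matrices (matrix1 × matrix2 and matrix1 × matrix3). The first way computes both products at once, inside a single loop nest; the second way computes them one after another, in two separate calls. We would like to know why the number of cache references differs between the two cases. The observations were collected with the following command:

sudo perf stat -B -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations java Multiplication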

This is the code we ran:

import java.util.Random;

public class Multiplication {
    
    public static void main(String args[]){
        
        int dim = 500; // Dimension of the Matrices
        Multiplication multiplication = new Multiplication();
        double[][] matrix1 = multiplication.createMatrix(dim);
        double[][] matrix2 = multiplication.createMatrix(dim);
        double[][] matrix3 = multiplication.createMatrix(dim);
        long start = System.currentTimeMillis();
        // multiplication.multiply(matrix1,matrix2,matrix3); // Both products at once (uncomment for the "at once" case)
        multiplication.multiply(matrix1,matrix2); // One product
        multiplication.multiply(matrix1,matrix3); // after another
        long end = System.currentTimeMillis();
        System.out.println((end - start) + " ms");
    }
    
    public void multiply(double[][] mat1,double[][] mat2) {
        int size = mat1.length;
        double[][] product = new double[size][size];
        for (int i = 0; i < size; i++) {
            for (int j = 0; j < size; j++) {
                for (int k = 0; k < size; k++) {
                    product[i][j] += mat1[i][k] * mat2[k][j];
                }
            }
        }
    }
    
    public void multiply(double[][] mat1,double[][] mat2,double[][] mat3){
        int size = mat1.length;
        double[][] product1 = new double[size][size];
        double[][] product2 = new double[size][size];
        for (int i = 0; i < size; i++) {
            for (int j = 0; j < size; j++) {
                for (int k = 0; k < size; k++) {
                    product1[i][j] += mat1[i][k] * mat2[k][j];
                    product2[i][j] += mat1[i][k] * mat3[k][j];
                }
            }
        }
    }
    
    public double[][] createMatrix(int dim) {
        double[][] mat = new double[dim][dim];
        Random r = new Random();
        for (int i = 0; i < dim; i++) {
            for (int j = 0; j < dim; j++) {
                mat[i][j] = r.nextDouble();
            }
        }

        return mat;
    }

}
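
As a rough aid for reading the numbers, the following sketch estimates how much matrix data each variant touches inside its loop nest. It is only a sizing estimate, not an explanation of the counter values. The class name `WorkingSetEstimate` and the helper `matrixBytes` are hypothetical, and the estimate counts only the 8-byte `double` payload, ignoring JVM object headers and the row references of a `double[][]`.

```java
import java.util.Locale;

public class WorkingSetEstimate {

    // Raw payload of one dim x dim matrix of doubles (8 bytes per element).
    // Assumption: JVM array headers and row references are ignored.
    static long matrixBytes(int dim) {
        return (long) dim * dim * Double.BYTES;
    }

    public static void main(String[] args) {
        int[] dims = {500, 1000, 1500, 2000}; // dimensions used in the measurements
        for (int dim : dims) {
            long one = matrixBytes(dim);
            // "At once" touches mat1, mat2, mat3, product1, product2 in one loop nest.
            long atOnce = 5 * one;
            // "One after another" touches mat1, one other input, and one product per call.
            long perCall = 3 * one;
            System.out.printf(Locale.ROOT,
                    "dim %d: %.1f MB per matrix, %.1f MB at once, %.1f MB per separate call%n",
                    dim, one / 1e6, atOnce / 1e6, perCall / 1e6);
        }
    }
}
```

Comparing these sizes with the cache sizes of the test machine (which are not given here) may help when interpreting the perf counters.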

Here are the observations:

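The observations cover four matrix dimensions (500, 1000, 1500, 2000). For each run, the first line gives the dimension, the second line gives the elapsed time printed by the program, and the rest is the perf stat output.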

At once:

500
398 ms

 Performance counter stats for 'java Multiplication':

       411,915,294      cache-references                                            
         1,399,148      cache-misses              #    0.340 % of all cache refs    
     2,042,614,097      cycles                                                      
     3,437,759,396      instructions              #    1.68  insn per cycle         
       475,939,754      branches                                                    
             5,627      faults                                                      
                10      migrations                                                  

       0.455140444 seconds time elapsed

       0.476102000 seconds user
       0.011804000 seconds sys


1000
11993 ms

 Performance counter stats for 'java Multiplication':

    11,704,565,015      cache-references                                            
       333,223,106      cache-misses              #    2.847 % of all cache refs    
    52,324,922,900      cycles                                                      
    23,763,328,245      instructions              #    0.45  insn per cycle         
     3,124,574,763      branches                                                    
            12,999      faults                                                      
                 9      migrations                                                  

      12.108233932 seconds time elapsed

      12.086936000 seconds user
       0.059856000 seconds sys


1500
53069 ms

 Performance counter stats for 'java Multiplication':

    47,426,797,689      cache-references                                            
     1,435,495,676      cache-misses              #    3.027 % of all cache refs    
   229,231,012,094      cycles                                                      
    78,724,694,438      instructions              #    0.34  insn per cycle         
    10,291,361,842      branches                                                    
            34,704      faults                                                      
                28      migrations                                                  

      53.251270984 seconds time elapsed

      53.135376000 seconds user
       0.143987000 seconds sys


2000
148669 ms

 Performance counter stats for 'java Multiplication':

   122,810,341,708      cache-references                                            
     3,628,091,933      cache-misses              #    2.954 % of all cache refs    
   626,161,537,985      cycles                                                      
   185,767,651,022      instructions              #    0.30  insn per cycle         
    24,266,992,254      branches                                                    
            58,795      faults                                                      
               127      migrations                                                  

     148.950934773 seconds time elapsed

     149.186738000 seconds user
       0.243997000 seconds sys

One after another:

500
388 ms

 Performance counter stats for 'java Multiplication':

       146,687,848      cache-references                                            
         1,556,581      cache-misses              #    1.061 % of all cache refs    
     1,971,147,110      cycles                                                      
     3,270,586,420      instructions              #    1.66  insn per cycle         
       467,409,752      branches                                                    
             5,450      faults                                                      
                13      migrations                                                  

       0.483391040 seconds time elapsed

       0.505959000 seconds user
       0.008294000 seconds sys


1000
6470 ms

 Performance counter stats for 'java Multiplication':

     4,229,871,529      cache-references                                            
        18,662,663      cache-misses              #    0.441 % of all cache refs    
    28,984,959,677      cycles                                                      
    22,751,499,355      instructions              #    0.78  insn per cycle         
     3,123,552,014      branches                                                    
            12,842      faults                                                      
                 7      migrations                                                  

       6.579694810 seconds time elapsed

       6.600153000 seconds user
       0.016010000 seconds sys

1500
48918 ms

 Performance counter stats for 'java Multiplication':

    38,902,944,785      cache-references                                            
     1,166,289,245      cache-misses              #    2.998 % of all cache refs    
   213,672,056,122      cycles                                                      
    75,416,350,352      instructions              #    0.35  insn per cycle         
    10,306,363,917      branches                                                    
            34,750      faults                                                      
                26      migrations                                                  

      49.121111199 seconds time elapsed

      49.172314000 seconds user
       0.088022000 seconds sys

2000
120057 ms

 Performance counter stats for 'java Multiplication':

    97,707,304,381      cache-references                                            
     3,208,749,714      cache-misses              #    3.284 % of all cache refs    
   516,080,961,402      cycles                                                      
   177,793,621,137      instructions              #    0.34  insn per cycle         
    24,272,120,015      branches                                                    
            45,235      faults                                                      
                87      migrations                                                  

     120.368185469 seconds time elapsed

     120.439402000 seconds user
       0.152049000 seconds sys

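The observations show a considerable gap in cache references between the two cases. To make sure this was not a one-off fluke, we ran each case multiple times and got similar results. We have not been able to work out the cause of the difference in cache references and would appreciate help with this.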
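
To put a number on the gap, a small sketch like the one below divides the "at once" cache-references by the "one after another" cache-references for each dimension. The counts are copied from the perf output above; the class name `CacheRefRatio` and the helper `ratio` are hypothetical. With these figures the ratio comes out at roughly 2.8 for dimensions 500 and 1000 and roughly 1.2 to 1.3 for 1500 and 2000.

```java
import java.util.Locale;

public class CacheRefRatio {

    static final int[] DIMS = {500, 1000, 1500, 2000};

    // cache-references for the "at once" runs, copied from the perf output
    static final long[] AT_ONCE = {
        411_915_294L, 11_704_565_015L, 47_426_797_689L, 122_810_341_708L
    };

    // cache-references for the "one after another" runs, copied from the perf output
    static final long[] ONE_AFTER_ANOTHER = {
        146_687_848L, 4_229_871_529L, 38_902_944_785L, 97_707_304_381L
    };

    // Ratio of two counter values as a floating-point number.
    static double ratio(long numerator, long denominator) {
        return (double) numerator / denominator;
    }

    public static void main(String[] args) {
        for (int i = 0; i < DIMS.length; i++) {
            System.out.printf(Locale.ROOT, "dim %d: at once / one after another = %.2f%n",
                    DIMS[i], ratio(AT_ONCE[i], ONE_AFTER_ANOTHER[i]));
        }
    }
}
```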

Thanks
