Why does a subroutine that takes an array as an argument run faster than the same subroutine with an automatic local array?

Problem description

I am rewriting some old code to improve its readability and make it easier to maintain.

I am trying to reduce the number of arguments my subroutines take, but I found that changing subroutine sub(N,ID) to subroutine sub(N) noticeably reduces performance.

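ID is only used inside sub, so I don't see the point of passing it in as an argument. Is it possible to use sub(N) without hurting performance? (In my case, N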

Performance comparison:

N      sub_1    sub_2
4      0.9 s    0.07 s
20     1.0 s    0.18 s
200    2.1 s    1.3 s

I am using Mac OS 10.14.6 with gfortran 5.2.0.

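program test
  implicit none
  integer, parameter :: N = 1
  real, dimension(N) :: ID
  real    :: t1, t2
  integer :: i

  ! Time sub_1: ID is an automatic local array inside the subroutine
  call cpu_time(t1)
  do i = 1, 10000000
    CALL sub_1(N)
  end do
  call cpu_time(t2)
  write (*,*) 'Elapsed real time =', t2 - t1

  ! Time sub_2: ID is owned by the caller and passed in
  call cpu_time(t1)
  do i = 1, 10000000
    CALL sub_2(N, ID)
  end do
  call cpu_time(t2)
  write (*,*) 'Elapsed real time =', t2 - t1

end program test


SUBROUTINE sub_1(N)
  integer, intent(in) :: N
  real, dimension(N)  :: ID   ! automatic array, created on every call

  ID = 0.0

END SUBROUTINE sub_1


SUBROUTINE sub_2(N, ID)
  integer, intent(in)                :: N
  real, dimension(N), intent(in out) :: ID   ! same type as the caller's array

  ID = 0.0

END SUBROUTINE sub_2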

Solutions

This appears to be a "feature" of the old version of gfortran you are using. With later versions the timings are much more comparable, at least for N = 10:

ian@eris:~/work/stack$ head s.f90
program test
  integer,parameter  :: N = 10
  real,dimension(N)  :: ID


  call CPU_time(t1)

  do i = 1,10000000
    CALL sub_1(N)
  end do
ian@eris:~/work/stack$ gfortran-5 --version
GNU Fortran (Ubuntu 5.5.0-12ubuntu1) 5.5.0 20171010
Copyright (C) 2015 Free Software Foundation, Inc.

GNU Fortran comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of GNU Fortran
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING

ian@eris:~/work/stack$ gfortran-5 -O3 s.f90
ian@eris:~/work/stack$ ./a.out
 Elapsed real time =  0.149489999    
 Elapsed real time =   1.99675560E-06
ian@eris:~/work/stack$ gfortran-6 --version
GNU Fortran (Ubuntu 6.5.0-2ubuntu1~18.04) 6.5.0 20181026
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

ian@eris:~/work/stack$ gfortran-6 -O3 s.f90
ian@eris:~/work/stack$ ./a.out
 Elapsed real time =   7.00005330E-06
 Elapsed real time =   5.00003807E-06
ian@eris:~/work/stack$ gfortran-7 --version
GNU Fortran (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

ian@eris:~/work/stack$ gfortran-7 -O3 s.f90
ian@eris:~/work/stack$ ./a.out
 Elapsed real time =   8.00006092E-06
 Elapsed real time =   6.00004569E-06
ian@eris:~/work/stack$ gfortran-8 --version
GNU Fortran (Ubuntu 8.3.0-6ubuntu1~18.04.1) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

ian@eris:~/work/stack$ gfortran-8 -O3 s.f90
ian@eris:~/work/stack$ ./a.out
 Elapsed real time =   9.00030136E-06
 Elapsed real time =   6.00004569E-06

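I would take all of the above with a barrel of salt, though. The optimizer may well have concluded that in such a simple case nothing actually needs to be done, and so removed all of the work you wanted to time. The only benchmark that can really tell you anything is the code you actually intend to run.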

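sub_1 and sub_2 are not really comparable. In sub_1 you allocate ID, initialize all of its elements, and then throw it away when the subroutine returns (because it is a local variable of the subroutine).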

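Since the ID array is never used, the compiler is free to optimize away its creation and initialization, and that is what gfortran does when compiling with -O3: the code generated for sub_1 simply returns.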

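In sub_2, all elements of ID still have to be set to 0.0, because the array belongs to the caller and the change is visible after the call returns.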
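For a like-for-like comparison, both versions need to do work whose result the caller can see; otherwise the optimizer may delete sub_1's body entirely. A minimal sketch, assuming a hypothetical module name compare_subs and a hypothetical extra argument total that each routine adds the sum of its array to:

```fortran
module compare_subs
  implicit none
contains

  ! Automatic local array: ID only lives for the duration of the call,
  ! but its contents now feed into TOTAL (hypothetical argument),
  ! so the work cannot simply be discarded.
  subroutine sub_1(n, total)
    integer, intent(in)    :: n
    real,    intent(inout) :: total
    real                   :: id(n)   ! automatic array, created on every call
    integer                :: k

    do k = 1, n
      id(k) = real(k)
    end do
    total = total + sum(id)
  end subroutine sub_1

  ! Caller-supplied array: the storage is owned by the caller and reused.
  subroutine sub_2(n, id, total)
    integer, intent(in)    :: n
    real,    intent(out)   :: id(n)
    real,    intent(inout) :: total
    integer                :: k

    do k = 1, n
      id(k) = real(k)
    end do
    total = total + sum(id)
  end subroutine sub_2

end module compare_subs
```

In the timing loops you would set total = 0.0 before each loop and print total afterwards, so that the result of the calls is actually used. With bodies this small an optimizer can still simplify a great deal (for example after inlining), so timings from this are only indicative.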

I think this has to do with array allocation.

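Allocating memory takes time in itself. When you pass the array into sub_2 as is, I think it is quite likely that the subroutine does not need to allocate any memory for it. This probably assumes that the array is created on the heap rather than on the stack, but I'm not 100% sure.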

For sub_1, on the other hand, space for the array has to be allocated again on every call.

Unfortunately I'm not very well versed in optimization, so I hope someone else will either agree with me or tell me I'm wrong ;)
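If the per-call allocation described above is the cost, one common way to keep the short signature sub(N) is a saved allocatable work array that is only reallocated when N changes. This is a sketch under stated assumptions: the module name work_sub and the counter alloc_count are hypothetical and exist only to make the behaviour observable; the routine is assumed not to be called concurrently (a saved array is shared between calls, so this is not thread-safe); and because the array is no longer fresh on each call, it is reset explicitly.

```fortran
module work_sub
  implicit none
  private
  public :: sub, alloc_count

  ! Hypothetical, illustrative counter: how many times the work array
  ! has been allocated so far.
  integer :: alloc_count = 0

contains

  subroutine sub(n)
    integer, intent(in) :: n
    ! Saved work array: it persists between calls, so it is only
    ! (re)allocated when the requested size N changes.
    real, allocatable, save :: id(:)

    if (allocated(id)) then
      if (size(id) /= n) deallocate(id)
    end if
    if (.not. allocated(id)) then
      allocate(id(n))
      alloc_count = alloc_count + 1
    end if

    ! Contents survive from the previous call, so reset them explicitly.
    id = 0.0
  end subroutine sub

end module work_sub
```

The trade-off is that the memory stays allocated between calls and the routine needs extra care if called from several threads (for example under OpenMP). Separately, gfortran has a -fstack-arrays option that puts arrays of unknown size on the stack instead of the heap; whether that closes the gap for the automatic-array version would have to be measured.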

fortran gfortran