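Problem description
After reading how-can-a-fortran-openacc-routine-call-another-fortran-openacc-routine, I am still confused about the restrictions on OpenACC routine calls.
Here is a modified version of the nonsense code from the post linked above: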
PROGRAM Test
IMPLICIT NONE
CONTAINS
SUBROUTINE OuterRoutine( N )
!$acc routine
IMPLICIT NONE
INTEGER :: N
real :: y
INTEGER :: i
DO i = 0,N
call InnerRoutine( y )
ENDDO
END SUBROUTINE OuterRoutine
subroutine InnerRoutine( y )
!$acc routine
IMPLICIT NONE
real :: y
END subroutine InnerRoutine
END PROGRAM Test
When I compile it with nvfortran version 20.7, I get
$ nvfortran -acc -Minfo routine.f90
outerroutine:
14,Generating acc routine seq
Generating Tesla code
22,Reference argument passing prevents parallelization: y
innerroutine:
27,Generating acc routine seq
Generating Tesla code
nvvmCompileProgram error 9: NVVM_ERROR_COMPILATION.
Error: /tmp/pgaccr22eZDXceweL.gpu (43,14): parse invalid forward reference to function '_innerroutine_' with wrong type!
ptxas /tmp/pgaccH22eJTMb0hKD.ptx,line 1; fatal : Missing .version directive at start of file '/tmp/pgaccH22eJTMb0hKD.ptx'
ptxas fatal : Ptx assembly aborted due to errors
NVFORTRAN-S-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (routine_inline.f90: 1)
0 inform,0 warnings,1 severes,0 fatal for
What triggers the compilation error? For comparison, the following code, which also calls acc routines,
module data
integer,parameter :: maxl = 100000
real,dimension(maxl) :: xstat
real,dimension(:),allocatable :: yalloc
!$acc declare create(xstat,yalloc)
logical :: IsUsed
!$acc declare create(IsUsed)
end module
module useit
use data
contains
subroutine compute(n)
integer :: n
integer :: i
!$acc parallel loop present(yalloc,xstat)
do i = 1,n
call iprocess(i,yalloc)
enddo
end subroutine
subroutine iprocess(i,yalloc)
!$acc routine seq
integer :: i
real,intent(out) :: yalloc(:)
if(IsUsed) call kernel(i,yalloc)
contains
subroutine kernel(i,yalloc)
!$acc routine seq
integer,intent(in) :: i
real,intent(out) :: yalloc(:)
yalloc(i) = 2*xstat(i)
end subroutine
end subroutine
end module
program main
use data
use useit
implicit none
integer :: nSize = 100
!---------------------------------------------------------------------------
call allocit(nSize)
call initialize
call compute(nSize)
!$acc update self(yalloc)
write(*,*) "yalloc(10)=",yalloc(10) ! expected: 2.0, since xstat = 1.0 and yalloc(i) = 2*xstat(i)
call finalize
contains
subroutine allocit(n)
integer :: n
allocate(yalloc(n))
end subroutine allocit
subroutine initialize
xstat = 1.0
yalloc = 1.0
IsUsed = .true.
!$acc update device(xstat,yalloc,IsUsed)
end subroutine initialize
subroutine finalize
deallocate(yalloc)
end subroutine finalize
end program main
compiles and runs with OpenACC.
Update: surprisingly, the first code works when I simply swap the order of the two subroutines:
PROGRAM Test
IMPLICIT NONE
CONTAINS
subroutine InnerRoutine( y )
!$acc routine
IMPLICIT NONE
real :: y
END subroutine InnerRoutine
SUBROUTINE OuterRoutine( N )
!$acc routine
IMPLICIT NONE
INTEGER :: N
real :: y
INTEGER :: i
DO i = 0,N
call InnerRoutine( y )
ENDDO
END SUBROUTINE OuterRoutine
END PROGRAM Test
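It surprises me that this depends on the order in which the routines are defined. And if order matters, why does the second example above work?
Solution
This is a bug in the compiler's device code generation. When "InnerRoutine" is called from "OuterRoutine", the compiler correctly adds the hidden argument that carries the parent's stack, but the definition of "InnerRoutine" is generated without that argument. The error comes from this mismatch between the callee and the caller.
I have filed a problem report, TPR #29057. It is not clear whether this is a more general problem or just an artifact of the small test case.
Also, be careful when using contained device subroutines. Fortran lets a contained routine access its parent's local variables by passing a pointer to the parent's stack. If the parent runs on the host and the child runs on the device, accessing the parent's variables directly results in a runtime error. For example, if "iprocess" were contained in "compute" and accessed "i" directly instead of receiving it as an argument, you would get an error because the device cannot access the host's stack.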
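To make the last point concrete, here is a minimal sketch of the safe pattern: a device routine contained in a host subroutine that receives everything it needs as arguments. The names useit_sketch, idx, xin and yout are hypothetical and not part of the original code, and the sketch has not been checked against any particular nvfortran release. Without -acc the directives are treated as comments, so it also builds as plain Fortran.

```fortran
module useit_sketch
  implicit none
contains
  subroutine compute(n, x, y)
    integer, intent(in)  :: n
    real,    intent(in)  :: x(n)
    real,    intent(out) :: y(n)
    integer :: i                      ! host-side local of the parent
    !$acc parallel loop copyin(x) copyout(y)
    do i = 1, n
      call iprocess(i, x, y)          ! the loop index is passed explicitly
    end do
  contains
    subroutine iprocess(idx, xin, yout)
      !$acc routine seq
      integer, intent(in)    :: idx
      real,    intent(in)    :: xin(:)
      real,    intent(inout) :: yout(:)
      ! Safe: only dummy arguments are touched.
      yout(idx) = 2.0 * xin(idx)
      ! Risky (per the answer above): using the parent's "i" directly, e.g.
      !   yout(i) = 2.0 * xin(i)
      ! goes through the parent's stack, which lives on the host.
    end subroutine iprocess
  end subroutine compute
end module useit_sketch

program demo
  use useit_sketch
  implicit none
  integer, parameter :: n = 100
  real :: x(n), y(n)
  x = 1.0
  call compute(n, x, y)
  write(*,*) "y(10)=", y(10)          ! expected value: 2.0
end program demo
```

The design choice is simply to keep the contained routine free of host association: every value it reads or writes arrives through its argument list, so no pointer to the host stack is needed on the device.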