Routines inside an acc parallel region

Problem description

After reading how-can-a-fortran-openacc-routine-call-another-fortran-openacc-routine, I am still confused about the restrictions on calling routines from OpenACC device code.

Here is a modified version of the dummy code from the post linked above:

PROGRAM Test
IMPLICIT NONE

CONTAINS

 SUBROUTINE OuterRoutine( N )
 !$acc routine
   IMPLICIT NONE
   INTEGER :: N
   real :: y
   INTEGER :: i

      DO i = 0,N
         call InnerRoutine( y )
      ENDDO

 END SUBROUTINE OuterRoutine

 subroutine InnerRoutine( y )
 !$acc routine
   IMPLICIT NONE

   real :: y

 END subroutine InnerRoutine

END PROGRAM Test

When I compile it with nvfortran 20.7, I get:

$ nvfortran -acc -Minfo routine.f90
outerroutine:
     14,Generating acc routine seq
         Generating Tesla code
     22,Reference argument passing prevents parallelization: y
innerroutine:
     27,Generating acc routine seq
         Generating Tesla code
nvvmCompileProgram error 9: NVVM_ERROR_COMPILATION.
Error: /tmp/pgaccr22eZDXceweL.gpu (43,14): parse invalid forward reference to function '_innerroutine_' with wrong type!
ptxas /tmp/pgaccH22eJTMb0hKD.ptx,line 1; fatal   : Missing .version directive at start of file '/tmp/pgaccH22eJTMb0hKD.ptx'
ptxas fatal   : Ptx assembly aborted due to errors
NVFORTRAN-S-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (routine_inline.f90: 1)
  0 inform,0 warnings,1 severes,0 fatal for

What triggers this compilation error? For comparison, the following code, which also calls acc routines from device code,

module data
   integer,parameter :: maxl = 100000
   real,dimension(maxl) :: xstat
   real,dimension(:),allocatable :: yalloc
   !$acc declare create(xstat,yalloc)
   logical :: IsUsed
   !$acc declare create(IsUsed)
 end module
 
 module useit
   use data
 contains
   subroutine compute(n)
      integer :: n
      integer :: i
      !$acc parallel loop present(yalloc,xstat)
      do i = 1,n
         call iprocess(i,yalloc)
      enddo
   end subroutine
   
   subroutine iprocess(i,yalloc)
      !$acc routine seq
      integer :: i
      real,intent(out) :: yalloc(:)
      if(IsUsed) call kernel(i,yalloc)

      contains

      subroutine kernel(i,yalloc)
        !$acc routine seq
        integer,intent(in) :: i
        real,intent(out) :: yalloc(:)
        yalloc(i) = 2*xstat(i)
      end subroutine

   end subroutine 

 end module
 
 program main
 
   use data
   use useit
 
   implicit none
 
   integer :: nSize = 100
   !---------------------------------------------------------------------------
 
   call allocit(nSize)
   call initialize
 
   call compute(nSize)
 
   !$acc update self(yalloc) 
   write(*,*) "yalloc(10)=",yalloc(10) ! 3
 
   call finalize
   
 contains
   subroutine allocit(n)
     integer :: n
     allocate(yalloc(n))
   end subroutine allocit
   
   subroutine initialize
     xstat = 1.0
     yalloc = 1.0
     IsUsed = .true.
     !$acc update device(xstat,yalloc,IsUsed)
   end subroutine initialize
 
   subroutine finalize
 
     deallocate(yalloc)
     
   end subroutine finalize
   
 end program main

compiles and runs correctly with OpenACC.

Update: Surprisingly, the first code works when I simply swap the order of the two subroutines:

PROGRAM Test
IMPLICIT NONE

CONTAINS

 subroutine InnerRoutine( y )
 !$acc routine
   IMPLICIT NONE

   real :: y

 END subroutine InnerRoutine

 SUBROUTINE OuterRoutine( N )
 !$acc routine
   IMPLICIT NONE
   INTEGER :: N
   real :: y
   INTEGER :: i

      DO i = 0,N
         call InnerRoutine( y )
      ENDDO

 END SUBROUTINE OuterRoutine

END PROGRAM Test

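I find it very surprising that this behavior depends on the order in which the routines are defined. And why does the second example above work at all?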

Solution

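This is a bug in the compiler's device code generation. When "InnerRoutine" is called from "OuterRoutine", the compiler correctly adds the hidden argument (the pointer to the parent's stack) at the call site, but the device definition of "InnerRoutine" is missing that argument. The error is a mismatch between the callee and the caller, which is consistent with the NVVM message about an invalid forward reference to '_innerroutine_' "with wrong type".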

I have filed a problem report, TPR #29057. It is not yet clear whether this is a broader issue or just an artifact of the small test case.
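If relying on routine order feels too fragile, one possible workaround (an assumption on my part, not something confirmed by the problem report) is to move the routines out of the main program's CONTAINS section into a module. Module procedures are not internal procedures, so they have no host-association link and therefore no hidden argument for the caller and callee to disagree about. In the sketch below, the module name `routines`, the extra `y` argument on OuterRoutine, and the driver program are hypothetical. It has not been tested with nvfortran 20.7; without OpenACC enabled (for example plain gfortran), the `!$acc` lines are treated as comments and the program simply runs on the host.

```fortran
! Hedged sketch: the question's routines moved into a module (names in the
! module/program are hypothetical). Module procedures are not internal
! procedures, so no hidden host-association argument should be needed.
module routines
  implicit none
contains
  subroutine InnerRoutine( y )
    !$acc routine seq
    real, intent(inout) :: y
    y = y + 1.0
  end subroutine InnerRoutine

  subroutine OuterRoutine( n, y )
    !$acc routine seq
    integer, intent(in) :: n
    real, intent(inout) :: y
    integer :: i
    ! Calls another device routine; both have explicit module interfaces
    do i = 1, n
      call InnerRoutine( y )
    end do
  end subroutine OuterRoutine
end module routines

program test
  use routines
  implicit none
  integer, parameter :: m = 4
  real :: y(m)
  integer :: j
  y = 0.0
  ! Each iteration runs OuterRoutine on one element of y
  !$acc parallel loop copy(y)
  do j = 1, m
    call OuterRoutine( 3, y(j) )
  end do
  if (any(y /= 3.0)) error stop "unexpected result"
  print '(f4.1)', sum(y)
end program test
```

Each element is incremented three times, so the program should print `12.0`.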

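Also, be careful with contained device subroutines. Fortran lets a contained procedure access its parent's local variables, which is implemented by passing a pointer to the parent's stack. If the parent runs on the host and the child runs on the device, accessing the parent's variables directly will cause a runtime error. For example, if "iprocess" were contained in "compute" and you accessed "i" directly instead of passing it as an argument, you would get an error because the device cannot access the host's stack.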
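A hedged sketch of the safe variant of that scenario: `compute` is a host procedure with an acc parallel loop, `iprocess` is contained in it and marked `!$acc routine seq`, and the loop index is passed in as an argument rather than read through host association. The module name `useit_sketch`, the argument names, and the simplified bodies are hypothetical, and I have not checked whether the 20.7 hidden-argument bug described above also affects this nesting. Without OpenACC enabled, the directives are ignored and everything runs on the host.

```fortran
! Hedged sketch (hypothetical names/bodies): iprocess is contained in the
! host procedure compute, so it must not touch compute's locals directly.
module useit_sketch
  implicit none
contains
  subroutine compute(y, n)
    integer, intent(in) :: n
    real, intent(inout) :: y(n)
    integer :: i
    !$acc parallel loop copy(y)
    do i = 1, n
      ! Pass i explicitly. Reading compute's "i" from inside iprocess via
      ! host association would dereference compute's host stack on the device.
      call iprocess(i, y)
    end do
  contains
    subroutine iprocess(k, ya)
      !$acc routine seq
      integer, intent(in) :: k
      real, intent(inout) :: ya(:)
      ! Only dummy arguments are used here, no host-associated variables
      ya(k) = 2.0 * ya(k)
    end subroutine iprocess
  end subroutine compute
end module useit_sketch

program main
  use useit_sketch
  implicit none
  real :: y(5)
  y = 1.5
  call compute(y, 5)
  if (any(y /= 3.0)) error stop "unexpected result"
  print '(f4.1)', sum(y)
end program main
```

Every element doubles from 1.5 to 3.0, so the program should print `15.0`.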
