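GCC OpenMP target ptxas error "Label expected for argument 0 of instruction 'call'" when using Eigen

Problem description

I am trying to rewrite an algorithm that is parallelized with OpenMP in order to try out the target offloading (device) functionality of OMP. When trying to use Eigen (3.4 rc1) within the OMP constructs, I stumbled upon the following problem (see example):

Minimal example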

#include <iostream>
#include <Eigen/Eigen>
#include <cmath>

using Eigen::MatrixXd;

int main() {
    int n = 100000000;
    double total = 0;
    MatrixXd m(1,1);
    m(0,0) = 1;

    #pragma omp target teams distribute\
    parallel for map(tofrom: total) map(to: n,m) reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += m(0,0) * exp(sin(M_PI * (double) i/12345.6789));
    }
    std::cout << "total is " << total << '\n';
}

Compiled with (tested with gcc 9.3 and 10.2, nvptx-none target, CUDA 10.1.243):

g++ -I ./eigen-3.4-rc1 -fopenmp -fcf-protection=none -fno-stack-protector -foffload=nvptx-none='--verbose -lm' eigen-total-omp.cxx -lm

Resulting error (full verbose output below):

ptxas /tmp/ccnYX8aR.o, line 262; error   : Label expected for argument 0 of instruction 'call'
ptxas /tmp/ccnYX8aR.o, line 262; fatal   : Call target not recognized

I have not found an explicit confirmation that Eigen works with OMP target directives, although it apparently should work with "normal" CUDA. The error is not very helpful (or at least I cannot gain any insight from it), but moving the initialization of the matrix object into the for loop produces an additional error:

ptxas /tmp/ccl3LNcx.o, line 277; error   : Label expected for argument 0 of instruction 'call'
ptxas /tmp/ccl3LNcx.o, line 277; error   : Function '_ZN5Eigen6MatrixIdLin1ELin1ELi0ELin1ELin1EEC1IiiEERKT_RKT0_' not declared in this scope
ptxas /tmp/ccl3LNcx.o, line 277; fatal   : Call target not recognized

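So my guess is that, somehow, the compiler for the target device does not see the library inside the loop? Adding the Eigen include path to -foffload or (the only hint I stumbled upon) adding the -fno-exceptions flag did not change anything.

Thanks for your time!


Full (verbose) error output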
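One way to narrow this guess down is to keep every Eigen object out of the target region and pass only plain scalars into it. The following is a minimal sketch under that assumption, not a confirmed fix: the function name offload_total and the parameter scale are hypothetical, and scale stands in for the value of m(0,0) read on the host before the region starts.

```cpp
#include <cmath>

// Hypothetical helper: "scale" stands in for m(0,0), read on the host
// before the target region, so no Eigen type is referenced on the device.
double offload_total(double scale, int n) {
    const double pi = 3.14159265358979323846;  // avoids relying on M_PI
    double total = 0.0;

    // Only plain scalars are used inside the region; total is mapped
    // back to the host after the reduction.
    #pragma omp target teams distribute parallel for \
        map(tofrom: total) reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += scale * std::exp(std::sin(pi * static_cast<double>(i) / 12345.6789));
    }
    return total;
}
```

Calling offload_total(m(0,0), n) from main would reproduce the original computation. If this variant builds with the same command line, that would suggest the ptxas failure is tied to Eigen code being compiled for the device rather than to the offloading setup itself.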

Using built-in specs.
COLLECT_GCC=x86_64-linux-gnu-accel-nvptx-none-gcc-9
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/9/accel/nvptx-none/lto-wrapper
Target: nvptx-none
Configured with: ../src/configure --prefix=/usr --libexecdir=/usr/lib --with-gcc-major-version-only --disable-bootstrap --disable-sjlj-exceptions --enable-newlib-io-long-long --target nvptx-none --enable-as-accelerator-for=x86_64-linux-gnu --enable-languages=c,c++,fortran,lto --enable-checking=release --with-system-zlib --without-isl --program-prefix=nvptx-none- --program-suffix=-9
Thread model: single
gcc version 9.3.0 (GCC) 
COLLECT_GCC_OPTIONS='-m64' '-mgomp' '-fno-openacc' '-fPIC' '-foffload-abi=lp64' '-fopenmp' '-fcf-protection=none' '-v' '-v' '-o' '/tmp/ccDYQmTY.mkoffload'
 /usr/lib/gcc/x86_64-linux-gnu/9/accel/nvptx-none/lto1 -quiet -dumpbase ccYsZMdw.o -m64 -mgomp -auxbase ccYsZMdw -version -fno-openacc -fPIC -foffload-abi=lp64 -fopenmp -fcf-protection=none @/tmp/ccllzqV5 -o /tmp/ccpHq0W6.s
GNU GIMPLE (GCC) version 9.3.0 (nvptx-none)
    compiled by GNU C version 9.3.0, GMP version 6.2.0, MPFR version 4.0.2, MPC version 1.1.0, isl version none
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU GIMPLE (GCC) version 9.3.0 (nvptx-none)
    compiled by GNU C version 9.3.0, isl version none
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
COLLECT_GCC_OPTIONS='-m64' '-mgomp' '-fno-openacc' '-fPIC' '-foffload-abi=lp64' '-fopenmp' '-fcf-protection=none' '-v' '-v' '-o' '/tmp/ccDYQmTY.mkoffload'
 /usr/lib/gcc/x86_64-linux-gnu/9/accel/nvptx-none/as -o /tmp/ccRXK6K4.o /tmp/ccpHq0W6.s
ptxas /tmp/ccRXK6K4.o, line 264; error   : Label expected for argument 0 of instruction 'call'
ptxas /tmp/ccRXK6K4.o, line 264; fatal   : Call target not recognized
ptxas fatal   : Ptx assembly aborted due to errors
nvptx-as: ptxas returned 255 exit status
mkoffload: Fatal error: x86_64-linux-gnu-accel-nvptx-none-gcc-9 returned 1 exit status
compilation terminated.
lto-wrapper: Fatal error: /usr/lib/gcc/x86_64-linux-gnu/9//accel/nvptx-none/mkoffload returned 1 exit status
compilation terminated.
/usr/bin/ld: error: lto-wrapper Failed
collect2: error: ld returned 1 exit status

Update 1: Accessing the data through a pointer raises the error "libgomp: cuCtxSynchronize error: an illegal memory access was encountered"

int main() {
    int n = 10;
    double total = 0;
    MatrixXd m(1,1);
    m(0,0) = 1;

    double* array = m.data();
    std::cout << "array: " << array[0] << std::endl; // this works

    #pragma omp target teams distribute\
    parallel for map(tofrom: total) map(to: n,array) reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += array[0]; // this throws an error
    }
    std::cout << "total is " << total << '\n';
}
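A possible explanation for the illegal memory access is that map(to: array) maps only the pointer variable, so the device ends up dereferencing a host address. The sketch below maps the pointed-to storage with an array section instead. The function name sum_first_element and its parameters are hypothetical, and whether this removes the error in the setup above is an assumption that has not been tested on that toolchain.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: sums data[0] n times inside a target region.
// The array section data[0:len] maps the len doubles that data points to,
// rather than only the pointer value.
double sum_first_element(const double* data, std::size_t len, int n) {
    assert(data != nullptr && len > 0);
    double total = 0.0;

    #pragma omp target teams distribute parallel for \
        map(to: data[0:len]) map(tofrom: total) reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += data[0];
    }
    return total;
}
```

With the matrix from the example, the call would be sum_first_element(m.data(), m.size(), n), which keeps the Eigen object itself on the host and passes only its raw buffer to the device.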
