Random exit code for larger array sizes in DPC++ vector addition

Problem description

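I am trying to run the oneAPI "hello world of HPC" sample, which adds two one-dimensional arrays on both the CPU and the GPU and verifies the result. The code is shown below: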

/*
DataParallel Addition of two Vectors
*/

#include <CL/sycl.hpp>
#include <array>
#include <iostream>
using namespace sycl;

constexpr size_t array_size = 100000;
typedef std::array<int,array_size> IntArray;

// Initialize array with the same value as its index
void InitializeArray(IntArray& a) { for (size_t i = 0; i < a.size(); i++) a[i] = i; }

/*
Create an asynchronous Exception Handler for sycl
*/
static auto exception_handler = [](cl::sycl::exception_list eList) {
    for (std::exception_ptr const& e : eList) {
        try {
            std::rethrow_exception(e);
        }
        catch (std::exception const& e) {
            std::cout << "Failure" << std::endl;
            std::terminate();
        }
    }
};

void VectorAddParallel(queue &q,const IntArray& x,const IntArray& y,IntArray& parallel_sum) {
    range<1> num_items{ x.size() };
    
    buffer x_buf(x);
    buffer y_buf(y);
    buffer sum_buf(parallel_sum.data(),num_items);

    /*
    Submit a command group to the queue by a lambda
    which contains data access permissions and device computation
    */
    q.submit([&](handler& h) {

        auto xa = x_buf.get_access<access::mode::read>(h);
        auto ya = y_buf.get_access<access::mode::read>(h);
        auto sa = sum_buf.get_access<access::mode::write>(h);

        std::cout << "Adding on GPU (Parallel)\n";
        h.parallel_for(num_items,[=](id<1> i) { sa[i] = xa[i] + ya[i]; });
        std::cout << "Done on GPU (Parallel)\n";
    });

    /*
    queue runs the kernel asynchronously. Once beyond the scope,buffers' data is copied back to the host.
    */
}

int main() {
    default_selector d_selector;
    IntArray a,b,sequential,parallel;

    InitializeArray(a);
    InitializeArray(b);

    try {
        // Queue needs: Device and Exception handler
        queue q(d_selector,exception_handler);
        
        std::cout << "Accelerator: " 
                  << q.get_device().get_info<info::device::name>() << "\n";
        std::cout << "Vector size: " << a.size() << "\n";
        VectorAddParallel(q,a,b,parallel);
    }
    catch (std::exception const& e) {
        std::cout << "Exception while creating Queue. Terminating...\n";
        std::terminate();
    }
    
    /*
    Do the sequential,which is supposed to be slow
    */
    std::cout << "Adding on cpu (Scalar)\n";
    for (size_t i = 0; i < sequential.size(); i++) {
        sequential[i] = a[i] + b[i];
    }
    std::cout << "Done on cpu (Scalar)\n";
    
    /*
    Verify results,the old-school way
    */
    for (size_t i = 0; i < parallel.size(); i++) {
        if (parallel[i] != sequential[i]) {
            std::cout << "Fail: " << parallel[i] << " != " << sequential[i] << std::endl;
            std::cout << "Failed. Results do not match.\n";
            return -1;
        }
    }
    std::cout << "Success!\n";
    return 0;
}

With a relatively small array_size (I tested 100 to 50k elements), the computation works fine. Sample output:

Accelerator: Intel(R) Gen9
Vector size: 50000
Adding on GPU (Parallel)
Done on GPU (Parallel)
Adding on cpu (Scalar)
Done on cpu (Scalar)
Success!

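Note that the computation finishes in barely a second on both the CPU and the GPU. But when I increase array_size, for example to 100000, I get this seemingly inscrutable error: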

C:\Users\myuser\source\repos\dpcpp-iotas\x64\Debug\dpcpp-iotas.exe (process 24472) exited with code -1073741571.

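I am not sure at exactly what value the error starts to appear, but it seems to happen somewhere beyond 70000. I cannot figure out why this happens. Any insight into what might be wrong?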

Solution

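As it turned out, this is caused by the stack size limit that Visual Studio enforces. The four IntArray objects are local variables of main, and contiguous arrays with that many elements overflow the stack.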

As @user4581301 pointed out, the exit code -1073741571 is C00000FD in hexadecimal, which is the code Windows reports for stack exhaustion/overflow (STATUS_STACK_OVERFLOW).

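Ways to fix it:

In Project Properties > Linker > System > Stack Reserve Size / Stack Commit Size, increase the /STACK reserve to more than 1 MB (the default).
Use the binary tools (editbin.exe and dumpbin.exe) to edit /STACK:reserve.
Use std::vector instead, which allocates its storage dynamically (suggested by @Retired Ninja).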

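I could not find an option to change /STACK in oneAPI; the usual way through the linker properties is described in the Visual Studio documentation for /STACK.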

I decided to go with dynamic allocation.

When I write large applications, I always run

ulimit -s unlimited

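to tell the shell that I am a grown-up and really do want some room on the stack.

This is bash syntax, but you can obviously adapt it to other shells.

I guess non-UNIX operating systems may have an equivalent?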
