How should I broadcast with MPI if I want to compute a sum across multiple processors?

Problem Description

I'm just getting into parallel computing and have started using MPI with C. I understand how to do this kind of thing with p2p (send/receive), but my confusion comes in when I try to use the collective communications bcast and reduce.

My code is as follows:

int collective(int val,int rank,int n,int *toSum){
        int *globalBuf=malloc(n*sizeof(int));
        int globalSum=0;
        int localSum=0;
        struct timespec before;

        if(rank==0){
                //only rank 0 will start timer
                clock_gettime(CLOCK_MONOTONIC,&before);
        }
        int numInts=(val*100000)/n;
        int *mySum = malloc((numInts)*sizeof(int));

        int j;
        for(j=rank*numInts;j<numInts*rank+numInts;j++){
                localSum=localSum+(toSum[j]);
        }

        MPI_Bcast(&localSum,1,MPI_INT,rank,MPI_COMM_WORLD);
        MPI_Reduce(&localSum,&globalSum,n,MPI_INT,MPI_SUM,0,MPI_COMM_WORLD);
        if(rank==0){
                printf("Communicative sum = %d\n",globalSum);
                //only rank 0 will end the timer
                //and display
                struct timespec after;
                clock_gettime(CLOCK_MONOTONIC,&after);
                printf("Time to complete = %f\n",(after.tv_nsec-before.tv_nsec));
        }
}

The parameters being passed in can be described as follows (an assumed call site is sketched just after this list):

val = the total number of ints that need to be summed, divided by 100000
rank= the rank of this process
n = the total number of processes
toSum = the ints that are going to be added together
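
For context, those descriptions roughly correspond to a driver like the sketch below. The original post does not show the calling code, so main, the value of val, and the way toSum is filled are all assumptions made for illustration:

    #include <stdlib.h>
    #include <mpi.h>

    int collective(int val, int rank, int n, int *toSum);

    /* Hypothetical driver, not part of the original post. */
    int main(int argc, char **argv) {
        int rank, n;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* rank = this process's rank       */
        MPI_Comm_size(MPI_COMM_WORLD, &n);      /* n    = total number of processes */

        int val = 1;                            /* val*100000 ints will be summed   */
        int total = val * 100000;
        int *toSum = malloc(total * sizeof(int));
        for (int i = 0; i < total; i++)
            toSum[i] = 1;                       /* every rank holds the full array  */

        collective(val, rank, n, toSum);

        free(toSum);
        MPI_Finalize();
        return 0;
    }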

Where I start running into errors is when I try to broadcast each processor's localSum so that it can be handled by rank 0.

I'll explain what I'm putting into each function call so that you can see where my confusion is coming from.

For MPI_Bcast:

&localSum - the address of this process's sum
1 - there is one value that I want to broadcast, the int held by localSum
MPI_INT - meaning implied
rank - the rank of this process that is broadcasting
MPI_COMM_WORLD - meaning implied

For MPI_Reduce:

&localSum - the address of the variable that it will be "reducing"
&globalSum - the address of the variable that I want to hold the reduced values of localSum
n - the number of "localSum"s that this process will reduce (n is the number of processes)
MPI_INT - meaning implied
MPI_SUM - meaning implied
0 - I want rank 0 to be the process that will reduce so it can print
MPI_COMM_WORLD - meaning implied

Reading through the code, it seems to make sense logically, and it compiles, but when I run the program with m processors I get the following error message:

Assertion Failed in file src/mpi/coll/helper_fns.c at line 84: FALSE
memcpy argument memory ranges overlap,dst_=0x7fffffffd2ac src_=0x7fffffffd2a8 len_=16

internal ABORT - process 0

Can someone help me find a solution? Sorry if this is second nature to you; this is only my third parallel program and my first time using bcast/reduce!

Solution

In the collective operation calls (MPI_Bcast and MPI_Reduce) in your code, I see two problems. First, in MPI_Reduce you are reducing each process's integer localSum into the single integer globalSum. But in your MPI_Reduce call you are trying to reduce n values, when in fact you only need to reduce 1 value from each of the n processes. That is the likely cause of this error.

If you want to reduce a single value, the reduce call should ideally look like this:

    MPI_Reduce(&localSum,&globalSum,1,MPI_INT,MPI_SUM,0,MPI_COMM_WORLD);

As for the broadcast,

    MPI_Bcast(&localSum,1,MPI_INT,rank,MPI_COMM_WORLD);

every rank is broadcasting in your call. According to the general idea of a broadcast, there should be a single root process that broadcasts a value to all the other processes. So the call should look like this:

    int rootProcess = 0;
    MPI_Bcast(&localSum,1,MPI_INT,rootProcess,MPI_COMM_WORLD);

Here, rootProcess will send the value held in its localSum to all the processes. At the same time, every process taking part in this broadcast will receive the value from rootProcess and store it in its own local variable localSum.
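
Putting the two fixes together: for the summation itself the reduce alone is enough, and if every rank also needs the total, one common arrangement (an assumption about the intent here, not something stated above) is to broadcast globalSum from rank 0 after the reduce instead of broadcasting localSum before it. A minimal sketch along those lines, keeping the slice-summing loop from the original function (the name collective_fixed is made up):

    #include <stdio.h>
    #include <mpi.h>

    /* Sketch only: the timing code from the original function is omitted. */
    int collective_fixed(int val, int rank, int n, int *toSum) {
        int numInts = (val * 100000) / n;   /* ints handled by each rank */
        int localSum = 0;
        int globalSum = 0;

        /* Each rank sums its own slice of toSum. */
        for (int j = rank * numInts; j < rank * numInts + numInts; j++)
            localSum += toSum[j];

        /* Reduce ONE int per rank into globalSum on rank 0. */
        MPI_Reduce(&localSum, &globalSum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        /* Assumed: every rank wants the total, so rank 0 broadcasts it back. */
        MPI_Bcast(&globalSum, 1, MPI_INT, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("Communicative sum = %d\n", globalSum);

        return globalSum;
    }

A single MPI_Allreduce(&localSum, &globalSum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD) would do the reduce-then-broadcast in one call.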