How should I broadcast with MPI if I want to compute a sum across multiple processors?

Problem Description

I'm just getting into parallel computing and have started using MPI with C. I understand how to do this kind of thing with p2p (send/receive), but my confusion comes in when I try to use the collective communications bcast and reduce.

My code is as follows:

int collective(int val,int rank,int n,int *toSum){
        int *globalBuf=malloc(n*sizeof(int));
        int globalSum=0;
        int localSum=0;
        struct timespec before;

        if(rank==0){
                //only rank 0 will start timer
                clock_gettime(CLOCK_MONOTONIC,&before);
        }
        int numInts=(val*100000)/n;
        int *mySum = malloc((numInts)*sizeof(int));

        int j;
        for(j=rank*numInts;j<numInts*rank+numInts;j++){
                localSum=localSum+(toSum[j]);
        }

        MPI_Bcast(&localSum,1,MPI_INT,rank,MPI_COMM_WORLD);
        MPI_Reduce(&localSum,&globalSum,n,MPI_INT,MPI_SUM,0,MPI_COMM_WORLD);
        if(rank==0){
                printf("Communicative sum = %d\n",globalSum);
                //only rank 0 will end the timer
                //and display
                struct timespec after;
                clock_gettime(CLOCK_MONOTONIC,&after);
                printf("Time to complete = %f\n",(after.tv_nsec-before.tv_nsec));
        }
}

The parameters being passed in can be described as follows (an assumed call site is sketched just after this list):

val = the total number of ints that need to be summed, divided by 100000
rank= the rank of this process
n = the total number of processes
toSum = the ints that are going to be added together
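
For context, those descriptions roughly correspond to a driver like the sketch below. The original post does not show the calling code, so main, the value of val, and the way toSum is filled are all assumptions made for illustration:

    #include <stdlib.h>
    #include <mpi.h>

    int collective(int val, int rank, int n, int *toSum);

    /* Hypothetical driver, not part of the original post. */
    int main(int argc, char **argv) {
        int rank, n;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* rank = this process's rank       */
        MPI_Comm_size(MPI_COMM_WORLD, &n);      /* n    = total number of processes */

        int val = 1;                            /* val*100000 ints will be summed   */
        int total = val * 100000;
        int *toSum = malloc(total * sizeof(int));
        for (int i = 0; i < total; i++)
            toSum[i] = 1;                       /* every rank holds the full array  */

        collective(val, rank, n, toSum);

        free(toSum);
        MPI_Finalize();
        return 0;
    }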

Where I start running into errors is when I try to broadcast each processor's localSum so that it can be handled by rank 0.

I'll explain what I'm putting into each function call so that you can see where my confusion is coming from.

For MPI_Bcast:

&localSum - the address of this process's sum
1 - there is one value that I want to broadcast, the int held by localSum
MPI_INT - meaning implied
rank - the rank of this process that is broadcasting
MPI_COMM_WORLD - meaning implied

For MPI_Reduce:

&localSum - the address of the variable that it will be "reducing"
&globalSum - the address of the variable that I want to hold the reduced values of localSum
n - the number of "localSum"s that this process will reduce (n is the number of processes)
MPI_INT - meaning implied
MPI_SUM - meaning implied
0 - I want rank 0 to be the process that will reduce so it can print
MPI_COMM_WORLD - meaning implied

Reading through the code, it seems to make sense logically, and it compiles, but when I run the program with m processors I get the following error message:

Assertion Failed in file src/mpi/coll/helper_fns.c at line 84: FALSE
memcpy argument memory ranges overlap,dst_=0x7fffffffd2ac src_=0x7fffffffd2a8 len_=16

internal ABORT - process 0

Can someone help me find a solution? Sorry if this is second nature to you; this is only my third parallel program and my first time using bcast/reduce!

Solution

In the collective operation calls (MPI_Bcast and MPI_Reduce) in your code, I see two problems. First, in MPI_Reduce you are reducing each process's integer localSum into the single integer globalSum. But in your MPI_Reduce call you are trying to reduce n values, when in fact you only need to reduce 1 value from each of the n processes. That is the likely cause of this error.

If you want to reduce a single value, the reduce call should ideally look like this:

    MPI_Reduce(&localSum,&globalSum,1,MPI_INT,MPI_SUM,0,MPI_COMM_WORLD);

As for the broadcast,

    MPI_Bcast(&localSum,1,MPI_INT,rank,MPI_COMM_WORLD);

every rank is broadcasting in your call. According to the general idea of a broadcast, there should be a single root process that broadcasts a value to all the other processes. So the call should look like this:

    int rootProcess = 0;
    MPI_Bcast(&localSum,1,MPI_INT,rootProcess,MPI_COMM_WORLD);

Here, rootProcess will send the value held in its localSum to all the processes. At the same time, every process taking part in this broadcast will receive the value from rootProcess and store it in its own local variable localSum.
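
Putting the two fixes together: for the summation itself the reduce alone is enough, and if every rank also needs the total, one common arrangement (an assumption about the intent here, not something stated above) is to broadcast globalSum from rank 0 after the reduce instead of broadcasting localSum before it. A minimal sketch along those lines, keeping the slice-summing loop from the original function (the name collective_fixed is made up):

    #include <stdio.h>
    #include <mpi.h>

    /* Sketch only: the timing code from the original function is omitted. */
    int collective_fixed(int val, int rank, int n, int *toSum) {
        int numInts = (val * 100000) / n;   /* ints handled by each rank */
        int localSum = 0;
        int globalSum = 0;

        /* Each rank sums its own slice of toSum. */
        for (int j = rank * numInts; j < rank * numInts + numInts; j++)
            localSum += toSum[j];

        /* Reduce ONE int per rank into globalSum on rank 0. */
        MPI_Reduce(&localSum, &globalSum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        /* Assumed: every rank wants the total, so rank 0 broadcasts it back. */
        MPI_Bcast(&globalSum, 1, MPI_INT, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("Communicative sum = %d\n", globalSum);

        return globalSum;
    }

A single MPI_Allreduce(&localSum, &globalSum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD) would do the reduce-then-broadcast in one call.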