How can I apply atomic operations to a struct in CUDA?

Let the struct be defined as follows:

typedef struct S { 
    float x;
    float y;
} T;

and let the operation struct_add be defined as follows:

__device__ T struct_add(T a1, T a2) {
    T result;
    result.x = a1.x + a2.x;  // component-wise sum
    result.y = a1.y + a2.y;
    return result;           // the original omitted this return
}

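If I want to apply struct_add atomically, how can I implement that in CUDA? For example, a, b, and c need to be summed using struct_add, and the result needs to be stored in d (where a, b, c, and d are all of type T).

I have heard that "lock and access control" via a while loop is not recommended. Is there a proper way to achieve this?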

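CUDA does not provide a general atomic method that covers atomic updates of arbitrary structs. Some possibilities:

Along the lines of suggestion 1 above, here is a modification of the code from this answer that may show how to use a 64-bit atomic. It relies on T consisting of two adjacent 32-bit floats, so the whole struct fits in a single 64-bit word: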

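A minimal sketch of the 64-bit atomic idea, not necessarily identical to the code in the linked answer. It treats the 8-byte struct as one 64-bit word and retries with atomicCAS until no other thread has modified the word in between. The names pack, unpack, atomic_struct_add, accumulate, and sum_on_device are hypothetical helpers introduced here for illustration. The sketch assumes sizeof(T) == 8 and that the target address is 8-byte aligned (true for memory returned by cudaMalloc).

```cuda
#include <cuda_runtime.h>

typedef struct S {
    float x;
    float y;
} T;

static_assert(sizeof(T) == 8, "T must fit in one 64-bit word");

__device__ T struct_add(T a1, T a2) {
    T result;
    result.x = a1.x + a2.x;
    result.y = a1.y + a2.y;
    return result;
}

// Hypothetical helper: pack the two floats into one 64-bit word.
// x goes in the low 32 bits, matching the struct layout (little-endian).
__device__ unsigned long long pack(T v) {
    return ((unsigned long long)__float_as_uint(v.y) << 32) |
           (unsigned long long)__float_as_uint(v.x);
}

// Hypothetical helper: inverse of pack().
__device__ T unpack(unsigned long long u) {
    T v;
    v.x = __uint_as_float((unsigned int)(u & 0xffffffffULL));
    v.y = __uint_as_float((unsigned int)(u >> 32));
    return v;
}

// Hypothetical helper: atomically performs *addr = struct_add(*addr, val).
// Assumption: addr is 8-byte aligned.
__device__ void atomic_struct_add(T *addr, T val) {
    unsigned long long *p = reinterpret_cast<unsigned long long *>(addr);
    unsigned long long old_bits = *p;
    unsigned long long assumed;
    do {
        assumed = old_bits;
        unsigned long long next = pack(struct_add(unpack(assumed), val));
        // Store next only if *p still equals assumed; returns the prior value.
        old_bits = atomicCAS(p, assumed, next);
    } while (assumed != old_bits);  // integer compare, so NaN cannot loop forever
}

// Each of the first n threads atomically adds its element into *d.
__global__ void accumulate(T *d, const T *in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomic_struct_add(d, in[i]);
}

// Host helper: returns a + b + c, accumulated atomically on the device.
// Error checking is omitted to keep the sketch short.
T sum_on_device(T a, T b, T c) {
    T h_in[3] = {a, b, c};
    T h_d = {0.0f, 0.0f};
    T *d_in = nullptr, *d_d = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_d, sizeof(T));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    cudaMemcpy(d_d, &h_d, sizeof(T), cudaMemcpyHostToDevice);
    accumulate<<<1, 32>>>(d_d, d_in, 3);
    cudaMemcpy(&h_d, d_d, sizeof(T), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_d);
    return h_d;
}
```

Calling sum_on_device(a, b, c) from host code returns the value to store in d. The retry loop is not a lock: a thread only repeats its own update when another thread changed the word first, so no thread ever blocks while holding a resource.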

I make no guarantee that the code above is free of defects. I recommend testing it carefully before use.