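Runtime error when using SparseAttention with DeepSpeed

Problem description

I am building an autoregressive model with a Transformer, but the latent space is fairly large, so I am trying to use sparse attention. I borrowed the SparseAttention module from this link and tested it with the following code: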

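import torch
from sparse_attention import SparseAttention

# Module configuration
shape = (2, 32, 32)
n_head = 2
causal = True
block = 32             # defined here but not passed to the constructor below
num_local_blocks = 4   # defined here but not passed to the constructor below
sparse_model = SparseAttention(shape, n_head, causal)

# Random test tensor, passed as both of the first two arguments
q = torch.randn(2, 2, 1, 512)
decode_step = None
decode_idx = None
sparse_out = sparse_model(q, q, decode_step, decode_idx)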
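
One thing worth ruling out before digging into the runtime error: as far as I know, DeepSpeed's block-sparse layouts require the sequence length to be a multiple of the block size. The sketch below checks that for the values in the test code. It assumes the borrowed module flattens `shape` into a single sequence axis, which the post does not confirm, and `check_block_alignment` is a hypothetical helper, not part of any library.

```python
from math import prod


def check_block_alignment(shape, block):
    """Return (seq_len, ok): the flattened length of `shape` and whether
    it is a multiple of `block`. Hypothetical helper for illustration."""
    seq_len = prod(shape)  # assumption: the module flattens `shape` into one axis
    return seq_len, seq_len % block == 0


# Values taken from the test code above
seq_len, ok = check_block_alignment((2, 32, 32), 32)
print(seq_len, ok)
```

If the module also expects the input laid out as (batch, heads, *shape, head_dim), which is again an assumption, then the test tensor of shape (2, 2, 1, 512) carries a single position rather than the full flattened sequence, so the input shape may be worth double-checking too.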

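However, this computation fails with a runtime error. Has anyone run into the same problem? For reference, I am using PyTorch 1.7 and CUDA 10.2, and I have already installed llvm-9-config. I hope someone can help me solve this!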
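
Since the error may depend on the exact versions involved, it can help to post the environment as reported by the interpreter itself. This is only a minimal sketch: `collect_env_info` is a hypothetical helper name, and any package that cannot be imported or found is reported as None.

```python
import importlib
from importlib import metadata


def collect_env_info():
    """Gather version info; a value stays None if it cannot be determined."""
    info = {"torch": None, "cuda": None, "cuda_available": None, "deepspeed": None}
    try:
        torch = importlib.import_module("torch")
        info["torch"] = torch.__version__
        info["cuda"] = torch.version.cuda          # CUDA version PyTorch was built with
        info["cuda_available"] = torch.cuda.is_available()
    except Exception:
        pass  # torch missing or failing to import: leave the fields as None
    try:
        # Read the installed version without importing deepspeed itself
        info["deepspeed"] = metadata.version("deepspeed")
    except metadata.PackageNotFoundError:
        pass
    return info


print(collect_env_info())
```

DeepSpeed also installs a `ds_report` command that prints its own environment and op compatibility summary, which may be useful to include alongside this output.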

Solution

No working solution has been found for this problem yet.


attention-model pytorch transformer