IndexError: The shape of the mask [...] at index 0 does not match the shape of the indexed tensor [...] at index 0

Question

I am trying to do label propagation with Torch. I have a dataframe that looks like this:

ID   Target   Weight   Label
1      12       0.4      1
2      24       0.1      0
4      13       0.5      1
4      12       0.3      1
12     1        0.1      1
12     4        0.4      1
13     4        0.2      1
17     1        0.1      0

and so on.

I built the network as follows:

G = nx.from_pandas_edgelist(df,source='ID',target='Target',edge_attr=['Weight'])

and the adjacency matrix:

adj_matrix = nx.adjacency_matrix(G).toarray()

I have only two labels, 0 and 1, plus some unlabeled data. I created the input tensors as follows:

# Create input tensors
adj_matrix_t = torch.FloatTensor(adj_matrix)
labels_t = torch.LongTensor(df['Labels'].tolist())

When I try to run the following code:

# Learn with Label Propagation
label_propagation = LabelPropagation(adj_matrix_t)
label_propagation.fit(labels_t) # this is causing the error

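I get the error: IndexError: The shape of the mask [196] at index 0 does not match the shape of the indexed tensor [207] at index 0. I checked adj_matrix_t.shape, which is currently (207,207), while the labels tensor has 196 entries. Do you know how I can fix this inconsistency?

Here is the full traceback: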

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-42-cf4f88a4bb12> in <module>
      2 label_propagation = LabelPropagation(adj_matrix_t)
      3 print("Label Propagation: ",end="")
----> 4 label_propagation.fit(labels_t)
      5 label_propagation_output_labels = label_propagation.predict_classes()
      6 

<ipython-input-1-54a7dbc30bd1> in fit(self,labels,max_iter,tol)
    100 
    101     def fit(self,labels,max_iter=1000,tol=1e-3):
--> 102         super().fit(labels,max_iter,tol)
    103 
    104 ## Label spreading

<ipython-input-1-54a7dbc30bd1> in fit(self,labels,max_iter,tol)
     58             Convergence tolerance: threshold to consider the system at steady state.
     59         """
---> 60         self._one_hot_encode(labels)
     61 
     62         self.predictions = self.one_hot_labels.clone()

<ipython-input-1-54a7dbc30bd1> in _one_hot_encode(self,labels)
     43         self.one_hot_labels = torch.zeros((self.n_nodes,self.n_classes),dtype=torch.float)
     44         self.one_hot_labels = self.one_hot_labels.scatter(1,labels.unsqueeze(1),1)
---> 45         self.one_hot_labels[unlabeled_mask,0] = 0
     46 
     47         self.labeled_mask = ~unlabeled_mask

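The code below is the example I want to use for label propagation. The error seems to be caused by the labels. Some nodes in my dataset have no label (although in the example above I wrote a label on every row). Could this be the cause of the error message?

Original code (for reference: https://mybinder.org/v2/gh/thibaudmartinez/label-propagation/master?filepath=notebook.ipynb):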

## Testing models on synthetic data

import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
import torch

# LabelPropagation and LabelSpreading are the classes defined in the linked notebook

# Create caveman graph
n_cliques = 4
size_cliques = 5
caveman_graph = nx.connected_caveman_graph(n_cliques,size_cliques)
adj_matrix = nx.adjacency_matrix(caveman_graph).toarray()


# Create labels
labels = np.full(n_cliques * size_cliques,-1.)

# Only one node per clique is labeled. Each clique belongs to a different class.
labels[0] = 0
labels[size_cliques] = 1
labels[size_cliques * 2] = 2
labels[size_cliques * 3] = 3

# Create input tensors
adj_matrix_t = torch.FloatTensor(adj_matrix)
labels_t = torch.LongTensor(labels)

# Learn with Label Propagation
label_propagation = LabelPropagation(adj_matrix_t)
print("Label Propagation: ",end="")
label_propagation.fit(labels_t)
label_propagation_output_labels = label_propagation.predict_classes()

# Learn with Label Spreading
label_spreading = LabelSpreading(adj_matrix_t)
print("Label Spreading: ",end="")
label_spreading.fit(labels_t,alpha=0.8)
label_spreading_output_labels = label_spreading.predict_classes()

# Plot graphs
color_map = {-1: "orange",0: "blue",1: "green",2: "red",3: "cyan"}
input_labels_colors = [color_map[l] for l in labels]
lprop_labels_colors = [color_map[l] for l in label_propagation_output_labels.numpy()]
lspread_labels_colors = [color_map[l] for l in label_spreading_output_labels.numpy()]

plt.figure(figsize=(14,6))
ax1 = plt.subplot(1,4,1)
ax2 = plt.subplot(1,4,2)
ax3 = plt.subplot(1,4,3)

ax1.title.set_text("Raw data (4 classes)")
ax2.title.set_text("Label Propagation")
ax3.title.set_text("Label Spreading")

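# Use one layout for all three panels so the node positions are comparable
pos = nx.spring_layout(caveman_graph)
nx.draw(caveman_graph,ax=ax1,pos=pos,node_color=input_labels_colors,node_size=50)
nx.draw(caveman_graph,ax=ax2,pos=pos,node_color=lprop_labels_colors,node_size=50)
nx.draw(caveman_graph,ax=ax3,pos=pos,node_color=lspread_labels_colors,node_size=50)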

# Legend
ax4 = plt.subplot(1,4,4)
ax4.axis("off")
legend_colors = ["orange","blue","green","red","cyan"]
legend_labels = ["unlabeled","class 0","class 1","class 2","class 3"]
dummy_legend = [ax4.plot([],[],ls='-',c=c)[0] for c in legend_colors]
plt.legend(dummy_legend,legend_labels)

plt.show()

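Of course, if the example dataset at the top of this post does not fit the original code because of its labels, I would appreciate another example showing how the labels (which determine a node's class) should look in a dataset, including missing values that are to be predicted.

Answer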

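For other readers here: the implementation asked about in this question appears to be the one in the notebook linked in the question.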

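The method you are using predicts labels for nodes, not for edges. To visualize this, I plotted your example data and styled the plot by your Weight and Label columns (the code used to generate the plot is appended below), where Weight is the line thickness of each edge and Label is its color: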
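The shape mismatch in the question follows from the same distinction: the dataframe has one row per edge, while the adjacency matrix has one row per node, so a tensor built from the Label column will generally not have the length the model expects. A minimal sketch using only the eight example rows from the question (the real data has 196 rows and 207 nodes):

```python
import networkx as nx
import pandas as pd

# The example edge list from the question: one row per edge, not per node.
df = pd.DataFrame({
    "ID":     [1, 2, 4, 4, 12, 12, 13, 17],
    "Target": [12, 24, 13, 12, 1, 4, 4, 1],
    "Weight": [0.4, 0.1, 0.5, 0.3, 0.1, 0.4, 0.2, 0.1],
    "Label":  [1, 0, 1, 1, 1, 1, 1, 0],
})

G = nx.from_pandas_edgelist(df, source="ID", target="Target", edge_attr=["Weight"])

n_rows = len(df)               # length of a label tensor built from df["Label"]
n_nodes = G.number_of_nodes()  # side length of the adjacency matrix

print(n_rows, n_nodes)  # 8 7
```

The counts differ because nodes that appear only in the Target column (here, 24) still become graph nodes, and repeated or reversed pairs such as 1-12 and 12-1 collapse into a single undirected edge.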

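To use this method, you need data with one row per node, in which each node (identified by ID) has exactly one node_label. The table at the end of this answer shows that shape for your example data.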

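To be clear, you still need the original data above to build the network and the adjacency matrix, but you have to decide on some logical rule for converting edge labels into node labels. Then, once you have predicted the unlabeled nodes, you can invert the rule, if necessary, to get edge labels back.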
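One possible rule, shown purely as an illustration, is a majority vote: each node takes the Label that most of its incident edges carry, and a tie (or a node with no labeled edge) stays unlabeled as -1. The rule and the names endpoints, share_of_ones and node_label are assumptions made for this sketch; neither the question nor the implementation prescribes them.

```python
import pandas as pd

# Example edge list from the question.
df = pd.DataFrame({
    "ID":     [1, 2, 4, 4, 12, 12, 13, 17],
    "Target": [12, 24, 13, 12, 1, 4, 4, 1],
    "Weight": [0.4, 0.1, 0.5, 0.3, 0.1, 0.4, 0.2, 0.1],
    "Label":  [1, 0, 1, 1, 1, 1, 1, 0],
})

# Long form: every edge contributes its Label to both of its endpoints.
endpoints = pd.concat([
    df[["ID", "Label"]].rename(columns={"ID": "node"}),
    df[["Target", "Label"]].rename(columns={"Target": "node"}),
])

# Fraction of incident edges labeled 1, per node (NaN labels are skipped).
share_of_ones = endpoints.groupby("node")["Label"].mean()

# Hypothetical rule: majority vote, with ties left unlabeled (-1).
node_label = pd.Series(-1, index=share_of_ones.index, name="node_label")
node_label[share_of_ones > 0.5] = 1
node_label[share_of_ones < 0.5] = 0

print(node_label)
```

A weighted vote using the Weight column would be an equally defensible choice; what matters is that the result has exactly one label per node.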

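This is not a strictly rigorous approach, but it is practical, and if your data is more than random noise it may well produce reasonable results.

Appendix: node labels for the example data, one row per node: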

ID    node_label
1         1
2         0
4         1
12        1
13        1
17        0
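With node labels in that shape, the label tensor can be built so that it has one entry per graph node, in the same order as the rows of the adjacency matrix, with -1 marking unlabeled nodes (the convention used by the reference code in the question). A minimal sketch, assuming node 24 is the one without a label; torch.tensor is used here in place of the question's FloatTensor/LongTensor constructors:

```python
import networkx as nx
import pandas as pd
import torch

# Example edge list from the question.
df = pd.DataFrame({
    "ID":     [1, 2, 4, 4, 12, 12, 13, 17],
    "Target": [12, 24, 13, 12, 1, 4, 4, 1],
    "Weight": [0.4, 0.1, 0.5, 0.3, 0.1, 0.4, 0.2, 0.1],
})
G = nx.from_pandas_edgelist(df, source="ID", target="Target", edge_attr=["Weight"])

# Node-level labels from the table above; node 24 has no entry.
node_labels = pd.Series({1: 1, 2: 0, 4: 1, 12: 1, 13: 1, 17: 0}, name="node_label")

# Fix one node order and use it for both the matrix and the labels.
node_order = list(G.nodes())
aligned = node_labels.reindex(node_order).fillna(-1).astype("int64")

adj_matrix_t = torch.tensor(nx.to_numpy_array(G, nodelist=node_order), dtype=torch.float32)
labels_t = torch.tensor(aligned.to_numpy(), dtype=torch.long)

# Both now describe the same set of nodes, so the boolean mask will fit.
print(adj_matrix_t.shape, labels_t.shape)
```

Passing nodelist explicitly is a defensive choice: it makes the row order of the matrix and the order of the labels visibly the same, rather than relying on both defaulting to G.nodes().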
