Comparing columns in a Pandas DataFrame raises a ValueError I can't resolve

Problem description

I have the following pandas DataFrame:

import pandas as pd

df = pd.DataFrame({
    "id": [0, 1, 2, 3, 4, 5, 6],
    "from": ["A", "B", "B", "D", "B", "C", "B"],
    "to": ["B", "C", "D", "F", "G", "F", "E"],
    "cases": [[1, 44], [2, 3], [5, 2], [5], [1, 7], [4], [44, 7]],
    "start1": [1, 5, 4, 4, 23, 12, 8],
    "start2": [4, 7, 9, 30, 26, 15, 18],
    "end1": [5, 7, 11, 32, 15, 17, 21],
    "end2": [9, 12, 15, 35, 17, 20, 25],
})

It looks like this:

   id from to   cases  start1  start2  end1  end2
0   0    A  B  [1,44]       1       4     5     9
1   1    B  C   [2,3]       5       7     7    12
2   2    B  D   [5,2]       4       9    11    15
3   3    D  F     [5]       4      30    32    35
4   4    B  G   [1,7]      23      26    15    17
5   5    C  F     [4]      12      15    17    20
6   6    B  E  [44,7]       8      18    21    25

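I'm trying to create a column adjacency_list that holds, for each row i, the id values of all rows j where:

  • i["to"] == j["from"]
  • i["cases"] and j["cases"] overlap
  • the intervals (i["end1"], i["end2"]) and (j["start1"], j["start2"]) overlap

I'm trying to run the following code to achieve this: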

data["adjacency_list"] = data.apply(
        lambda x: [
            row["id"]
            for i,row in data[(x["to"] == data["from"])].iterrows()
            if ((not set(row["cases"]).isdisjoint(x["cases"])) and ((x["end1"] <= test["start1"] <= x["end2"]) or (test["start1"] <= x["end1"] <= test["start2"])))
        ],axis=1,)

The output should look like this:

   id from to   cases  start1  start2  end1  end2  adjacency_list
0   0    A  B  [1,44]       1       4     5     9  [1,6]
1   1    B  C   [2,3]       5       7     7    12  [5]
2   2    B  D   [5,2]       4       9    11    15  [3]
3   3    D  F     [5]       4      30    32    35  []
4   4    B  G   [1,7]      23      26    15    17  []
5   5    C  F     [4]      12      15    17    20  []
6   6    B  E  [44,7]       8      18    21    25  []

But it gives me the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

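I've read many other answers from users who ran into this error in different contexts, and I tried replacing and / or with & / |, but that doesn't work. Splitting the chained <= comparison into two separate <= comparisons doesn't help either.

How can I solve this?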

Solution

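(test["start1"] <= x["end1"] <= test["start2"]) produces a Series of booleans rather than a single value: test["start1"] is a Series, so the comparison is applied to every element. Python evaluates a chained comparison a <= b <= c as (a <= b) and (b <= c), and that and needs one truth value from the Series, which is what raises the ValueError.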
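To make the failure concrete, here is a minimal sketch. The Series s and the bounds 0 and 10 are made-up values for illustration, not the data from the question; s just stands in for test["start1"].

```python
import pandas as pd

s = pd.Series([1, 5, 12])  # hypothetical stand-in for test["start1"]

# Chained comparison on plain scalars: every step is a single bool.
scalar_ok = 1 <= 5 <= 12

# Chained comparison on a Series: (0 <= s) is a boolean Series, and the
# implicit `and` between the two halves asks it for one truth value.
try:
    0 <= s <= 10
    raised = False
    message = ""
except ValueError as exc:
    raised = True
    message = str(exc)

# Elementwise alternative: two boolean Series combined with &
# (the parentheses are required because & binds tighter than <=).
mask = (0 <= s) & (s <= 10)
```

Note that mask is still a Series, so it cannot be used directly as the condition of an if. That is why swapping and / or for & / | alone does not make the error go away when the comparison sits inside a list comprehension filter.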

Try comparing each row with x instead:

df["adjacency_list"] = df.apply(
    lambda x: [
        row["id"]
        for _,row in df[(x["to"] == df["from"])].iterrows()
        if (
                (
                    not set(row["cases"]).isdisjoint(x["cases"])
                ) and (
                        (x["end1"] <= row["start1"] <= x["end2"])
                        or
                        (row["start1"] <= x["end1"] <= row["start2"])
                )
        )
    ],axis=1,)

Output:

   id from to       cases  start1  start2  end1  end2 adjacency_list
0   0    A  B  [1,2,44]       1       4     5     9      [1,6]
1   1    B  C   [2,4,3]       5       7     7    12            [5]
2   2    B  D      [5,2]       4       9    11    15            [3]
3   3    D  F         [5]       4      30    32    35             []
4   4    B  G      [1,7]      23      26    15    17             []
5   5    C  F         [4]      12      15    17    20             []
6   6    B  E     [44,7]       8      18    21    25             []
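
If the nested lambda is hard to read, the same row-by-row logic can be pulled into small helper functions. This is only a sketch: the names intervals_overlap and adjacent_ids and the four-row DataFrame toy are made up for illustration and are not part of the question or of pandas. The conditions are the same as in the code above.

```python
import pandas as pd


def intervals_overlap(a_start, a_end, b_start, b_end):
    """Same test as above: b starts inside a, or a starts inside b."""
    return (a_start <= b_start <= a_end) or (b_start <= a_start <= b_end)


def adjacent_ids(x, frame):
    """Ids of the rows in `frame` that row x connects to."""
    # Only rows whose "from" matches this row's "to" can qualify.
    candidates = frame[frame["from"] == x["to"]]
    return [
        row["id"]
        for _, row in candidates.iterrows()  # row is a single row, so comparisons yield scalars
        if not set(row["cases"]).isdisjoint(x["cases"])
        and intervals_overlap(x["end1"], x["end2"], row["start1"], row["start2"])
    ]


# Hypothetical toy data, chosen so that each condition rejects one candidate.
toy = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "from": ["A", "B", "B", "B"],
    "to": ["B", "C", "C", "D"],
    "cases": [[1, 2], [2, 3], [7], [1]],
    "start1": [1, 6, 6, 20],
    "start2": [4, 8, 8, 25],
    "end1": [5, 10, 10, 30],
    "end2": [9, 12, 12, 35],
})

toy["adjacency_list"] = toy.apply(lambda x: adjacent_ids(x, toy), axis=1)
# toy["adjacency_list"].tolist() → [[1], [], [], []]
```

Row 0 keeps only id 1: id 2 shares no case with it, and id 3 shares a case but its interval (20, 25) does not overlap (5, 9).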