Problem description
Data (df):
child parent
b a
c a
d b
e c
f c
g f
Expected output:
child parent level
b a 1
c a 1
d b 2
e c 2
f c 2
g f 3
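In this parent-child report, 'a' is the top-level parent because it does not report to anyone. 'b' and 'c' report to 'a', so their level is 1. 'd', 'e' and 'f' report to level-1 nodes (b, c), so their level is 2. 'g' reports to 'f' (which is level 2), so the level of 'g' is 3. Please let me know how to achieve this.
I am trying the code below, but it does not work: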
df['Level'] = np.where(df['parent'] == 'a',"level 1",np.nan)
dfm1 = pd.Series(np.where(df['Level'] == 'level 1',df['parent'],None))
df.loc[df['parent'].isin(dfm1),'Level'] = "level 2"
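Solution
Here is one approach using networkx: build a directed graph from parent to child, then count each node's ancestors and use that count as its level.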
import networkx as nx
G = nx.from_pandas_edgelist(df,"parent","child",create_using=nx.DiGraph())
f = lambda x: len(nx.ancestors(G,x))
df['level'] = df['child'].map(f)
print(df)
child parent level
0 b a 1
1 c a 1
2 d b 2
3 e c 2
4 f c 2
5 g f 3
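The ancestor count equals the level here only because every node has exactly one parent, so a node's ancestors are exactly the nodes on its path up to the root. As a minimal sketch of the same idea without networkx, the level can be computed by walking up a child-to-parent dictionary. The names `parent_of` and `level_of` are hypothetical, and the edge list is copied from the question.

```python
# Edge list from the question: (child, parent)
edges = [("b", "a"), ("c", "a"), ("d", "b"), ("e", "c"), ("f", "c"), ("g", "f")]

# Hypothetical helper mapping: each child points at its single parent
parent_of = {child: parent for child, parent in edges}

def level_of(node):
    """Count the steps from node up to the root (a node with no parent)."""
    level = 0
    seen = set()
    while node in parent_of:
        if node in seen:
            # Guard against malformed input where the links form a loop
            raise ValueError(f"cycle detected at {node!r}")
        seen.add(node)
        node = parent_of[node]
        level += 1
    return level

levels = {child: level_of(child) for child, _ in edges}
print(levels)  # {'b': 1, 'c': 1, 'd': 2, 'e': 2, 'f': 2, 'g': 3}
```

With the DataFrame from the question, the same function could be applied as df['child'].map(level_of), assuming the dictionary is built from the child and parent columns.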
Here is a solution from first principles:
import pandas as pd

# Build the tree of relationships using a helper node class
class Node:
    def __init__(self, value, parent=None, level=0):
        self.value = value
        self.parent = parent
        self.level = level
        self.children = []

    def set_child(self, child):
        # a child sits one level below the node it is attached to
        child.level = self.level + 1
        self.children.append(child)

# Helper function to insert nodes
def insert(node, new_node):
    if new_node.parent == node.value:
        # if the new node is a child of this node, attach it
        node.set_child(new_node)
    else:
        # otherwise, search the children recursively for its parent
        for child in node.children:
            insert(child, new_node)

# Gather (child, parent, level) for every non-root node
def node_print(node, values=None):
    # use None instead of a mutable default so calls do not share one list
    if values is None:
        values = []
    if node.parent:
        values.append((node.value, node.parent, node.level))
    for child in node.children:
        node_print(child, values=values)
    return values

# Now get the data and build the tree
data = """b a
c a
d b
e c
f c
g f"""
rows = [y.split() for y in data.split("\n")]
for index, (child, parent) in enumerate(rows):
    if index == 0:
        # the parent in the first row is taken as the root
        node = Node(value=parent)
    child_node = Node(value=child, parent=parent)
    insert(node, child_node)
output = pd.DataFrame(data=node_print(node), columns=['child', 'parent', 'level']).sort_values(by='level')
print(output)
child parent level
0 b a 1
2 c a 1
1 d b 2
3 e c 2
4 f c 2
5 g f 3
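Note that the tree-building code above takes the parent in the first row as the root and expects each parent to be inserted before its children; a row whose parent is not yet in the tree is silently dropped. As a hedged alternative sketch (not part of the original answer), a breadth-first traversal assigns levels one layer at a time and does not depend on row order. The names `children_of` and `bfs_levels` are hypothetical.

```python
from collections import defaultdict, deque

# Edge list from the question: (child, parent)
edges = [("b", "a"), ("c", "a"), ("d", "b"), ("e", "c"), ("f", "c"), ("g", "f")]

# Map each parent to the list of its direct children
children_of = defaultdict(list)
for child, parent in edges:
    children_of[parent].append(child)

# Roots are parents that never appear as a child
all_children = {child for child, _ in edges}
roots = [p for p in children_of if p not in all_children]

def bfs_levels(roots, children_of):
    """Assign level 1 to children of a root, level 2 to their children, and so on."""
    levels = {}
    queue = deque((root, 0) for root in roots)
    while queue:
        node, depth = queue.popleft()
        for child in children_of.get(node, []):
            if child not in levels:
                levels[child] = depth + 1
                queue.append((child, depth + 1))
    return levels

levels = bfs_levels(roots, children_of)
print(levels)  # {'b': 1, 'c': 1, 'd': 2, 'e': 2, 'f': 2, 'g': 3}
```

This is the layer-by-layer idea from the question's own attempt (level 1 first, then the children of level 1), generalized with a queue so it continues to any depth.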