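Problem description
I want to drop duplicate rows based on a condition: where "type" is the same (duplicated), drop the row whose "number" is "one".
I have this: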
data={"col1":[2,3,4,5,9,2,6],"col2":[4,6,1,5],"col3":[7,11,7],"col4":[14,22,8,9],"col5":[0,7,"type":["A","A","C","D","B","E"],"number":["one","two","one","two"]}
df=pd.DataFrame.from_dict(data)
I want this:
data={"col1":[3,"col2":[2,"col3":[6,"col4":[11,"col5":[5,"number":["two","two"]}
df=pd.DataFrame.from_dict(data)
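Solution
You can chain two conditions: compare for "not equal" with Series.ne, and invert the mask from Series.duplicated(keep=False). A row is kept if its "number" is not "one", or if its "type" is not duplicated: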
df1 = df[df['number'].ne('one') | ~df['type'].duplicated(keep=False)]
print(df1)
   col1  col2  col3  col4  col5 type number
1     3     2     6    11     5    A    two
2     4     4     0    22     7    C    two
3     5     6    11     8     3    D    one
5     2     1     6     3     2    B    two
6     6     5     7     9     9    E    two
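To see how the two masks combine, here is a minimal sketch on a small made-up frame. The values and the names `df_demo`, `not_one`, and `is_dup_type` are illustrative assumptions, not the data from the question.

```python
import pandas as pd

# Hypothetical sample data (not the question's data).
df_demo = pd.DataFrame({
    "type":   ["A", "A", "C", "D", "B", "B", "E"],
    "number": ["one", "two", "two", "one", "one", "two", "two"],
})

# True where the row's number is anything other than "one".
not_one = df_demo["number"].ne("one")

# True for every row whose type occurs more than once (keep=False marks all copies).
is_dup_type = df_demo["type"].duplicated(keep=False)

# Keep a row if it is not "one", or if its type is unique.
kept = df_demo[not_one | ~is_dup_type]
print(kept)
```

Rows 0 and 4 are dropped because they are "one" and share their type with another row; row 3 is also "one" but survives because type D appears only once.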
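Another idea, using an ordered categorical so that "one" sorts first and keep='last' prefers any other value:
# put 'one' first; a plain list avoids calling pd.unique on a list, which newer pandas deprecates
cats = ['one'] + [v for v in df['number'].unique() if v != 'one']
df['number'] = pd.Categorical(df['number'], categories=cats, ordered=True)
df2 = df.sort_values('number').drop_duplicates(subset=['type'], keep='last').sort_index()
print(df2)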
   col1  col2  col3  col4  col5 type number
1     3     2     6    11     5    A    two
2     4     4     0    22     7    C    two
3     5     6    11     8     3    D    one
5     2     1     6     3     2    B    two
6     6     5     7     9     9    E    two
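The categorical approach and the mask approach are not equivalent in general. The sketch below uses made-up data with a third value, "three", to show the difference; the frame and variable names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample data: type B has two rows that are not "one".
df_demo = pd.DataFrame({
    "type":   ["A", "A", "B", "B", "B"],
    "number": ["one", "two", "two", "three", "one"],
})

# Ordered categories with "one" lowest, so it sorts to the front.
cats = ["one"] + [v for v in df_demo["number"].unique() if v != "one"]
ordered = df_demo.assign(
    number=pd.Categorical(df_demo["number"], categories=cats, ordered=True)
)

# Sort by category, keep the last (highest-ranked) row per type, restore row order.
by_categorical = (
    ordered.sort_values("number", kind="stable")
    .drop_duplicates(subset=["type"], keep="last")
    .sort_index()
)

# Mask approach for comparison: drops only the "one" rows of duplicated types.
by_mask = df_demo[
    df_demo["number"].ne("one") | ~df_demo["type"].duplicated(keep=False)
]

print(by_categorical.index.tolist())
print(by_mask.index.tolist())
```

The categorical version always leaves exactly one row per type, while the mask version keeps every row of a duplicated type that is not "one". Which one is right depends on whether other duplicates should survive.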
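Try this (it keeps the last row of each "type", so it only works when the row to drop comes first within each duplicated type):
df = df.drop_duplicates(subset=['type'], keep='last')
print(df)
Output: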
   col1  col2  col3  col4  col5 type number
1     3     2     6    11     5    A    two
2     4     4     0    22     7    C    two
3     5     6    11     8     3    D    one
5     2     1     6     3     2    B    two
6     6     5     7     9     9    E    two
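Because drop_duplicates with keep='last' looks only at row position, its result depends on how the rows happen to be ordered. A minimal sketch with made-up data, where the "one" row comes after the "two" row, shows the difference from a value-based mask; all names here are illustrative assumptions.

```python
import pandas as pd

# Hypothetical sample data: for type A, the "one" row is the LAST duplicate.
df_demo = pd.DataFrame({
    "type":   ["A", "A", "C"],
    "number": ["two", "one", "one"],
})

# Position-based: keeps the last row per type, regardless of its number.
by_position = df_demo.drop_duplicates(subset=["type"], keep="last")

# Value-based: drops "one" rows only when their type is duplicated.
by_mask = df_demo[
    df_demo["number"].ne("one") | ~df_demo["type"].duplicated(keep=False)
]

print(by_position)
print(by_mask)
```

Here the position-based version keeps the "one" row for type A, while the mask keeps the "two" row, so the shorter answer is safe only if the row order is known in advance.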