python pandas: drop duplicated rows by condition

Problem description

I want to drop duplicated rows by condition: where "type" is the same (duplicated), I want to drop the row whose "number" is "one".

I have this:

data={"col1":[2,3,4,5,9,2,6],"col2":[4,6,1,5],"col3":[7,11,7],"col4":[14,22,8,9],"col5":[0,7,"type":["A","A","C","D","B","E"],"number":["one","two","one","two"]}
df=pd.DataFrame.from_dict(data)

I want this:

data={"col1":[3,4,5,2,6],"col2":[2,4,6,1,5],"col3":[6,0,11,6,7],"col4":[11,22,8,3,9],"col5":[5,7,3,2,9],"type":["A","C","D","B","E"],"number":["two","two","one","two","two"]}
df=pd.DataFrame.from_dict(data)

Solution

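You can chain two conditions with | (element-wise OR): keep every row whose number is not 'one' (compared with Series.ne), or whose type is not duplicated at all (the inverted mask from Series.duplicated with keep=False):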

df1 = df[df['number'].ne('one') | ~df['type'].duplicated(keep=False)]
print (df1)
   col1  col2  col3  col4  col5 type number
1     3     2     6    11     5    A    two
2     4     4     0    22     7    C    two
3     5     6    11     8     3    D    one
5     2     1     6     3     2    B    two
6     6     5     7     9     9    E    two
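
To make the two masks easier to follow, here is a minimal, self-contained sketch that builds each one separately. The "type" and "number" values are an assumption reconstructed from the printed output above, and the names is_dup_type, not_one and result are introduced only for illustration.

```python
import pandas as pd

# Assumed sample data: only the two columns the condition depends on.
df = pd.DataFrame({
    "type":   ["A", "A", "C", "D", "B", "B", "E"],
    "number": ["one", "two", "two", "one", "one", "two", "two"],
})

# True for every row whose type occurs more than once (keep=False marks all copies).
is_dup_type = df["type"].duplicated(keep=False)

# True for every row whose number is anything other than "one".
not_one = df["number"].ne("one")

# Keep a row if it is not "one", or if its type is unique.
result = df[not_one | ~is_dup_type]
print(result)
```

Row 3 (type D, number one) survives because its type is unique, while rows 0 and 4 are dropped because they are "one" rows of a duplicated type.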

Another idea, using an ordered categorical so that 'one' sorts before every other value:

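# Put 'one' first in the category order so it sorts before every other value
cats = ['one'] + [v for v in df['number'].unique() if v != 'one']

df['number'] = pd.Categorical(df['number'], categories=cats, ordered=True)

# Sort so 'one' rows come first, keep the last row per type, then restore the original order
df2 = df.sort_values('number').drop_duplicates(subset=['type'], keep='last').sort_index()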
print (df2)
   col1  col2  col3  col4  col5 type number
1     3     2     6    11     5    A    two
2     4     4     0    22     7    C    two
3     5     6    11     8     3    D    one
5     2     1     6     3     2    B    two
6     6     5     7     9     9    E    two
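
The categorical approach does not depend on where the 'one' row sits in the frame. The sketch below illustrates this with hypothetical data (not the question's frame) in which the 'one' row of type A comes after the 'two' row. The names others and kept are assumptions made for the example.

```python
import pandas as pd

# Hypothetical data: within type "A", the "one" row comes after the "two" row.
df = pd.DataFrame({
    "type":   ["A", "A", "B", "C"],
    "number": ["two", "one", "one", "three"],
})

# "one" goes first in the category order; the other labels keep first-seen order.
others = [v for v in df["number"].unique() if v != "one"]
df["number"] = pd.Categorical(df["number"], categories=["one"] + others, ordered=True)

# After sorting, "one" rows come first, so keep="last" prefers any other label.
kept = (
    df.sort_values("number")
      .drop_duplicates(subset=["type"], keep="last")
      .sort_index()
)
print(kept)
```

One design difference from the boolean mask: drop_duplicates keeps exactly one row per type, so if a type had several non-'one' rows, only one of them would remain.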

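Try this. Note that keep='last' keeps the last row of each type regardless of its number, so it gives the desired result only when the row to keep comes last within each duplicated type, as it does in the sample data: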

df = df.drop_duplicates(subset=['type'],keep='last')
print(df)

Output:

    col1    col2    col3    col4    col5    type    number
1   3       2       6       11      5       A       two
2   4       4       0       22      7       C       two
3   5       6       11      8       3       D       one
5   2       1       6       3       2       B       two
6   6       5       7       9       9       E       two
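
Since drop_duplicates with keep='last' looks only at row position, it can differ from a condition on "number". The following sketch uses hypothetical data where the 'one' row comes last within its type to compare the two. The names by_position and by_condition are assumptions made for the example.

```python
import pandas as pd

# Hypothetical data: within type "A", the "one" row is the last one.
df = pd.DataFrame({
    "type":   ["A", "A", "B"],
    "number": ["two", "one", "two"],
})

# Positional rule: keep the last row of each type, whatever its number is.
by_position = df.drop_duplicates(subset=["type"], keep="last")

# Conditional rule: drop "one" rows only where the type is duplicated.
by_condition = df[df["number"].ne("one") | ~df["type"].duplicated(keep=False)]

print(by_position["number"].tolist())   # the "one" row of type A is kept
print(by_condition["number"].tolist())  # the "one" row of type A is dropped
```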
