Problem description
Suppose I have a data frame with the following values:
id product1sold product2sold product3sold
1 2 3 3
2 0 0 5
3 3 2 1
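How can I add "most_sold" and "least_sold" columns that hold, for each id, a list of all the best-selling and all the worst-selling products? The result should look like this.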
id product1 product2 product3 most_sold least_sold
1 2 3 3 [product2,product3] [product1]
2 0 0 5 [product3] [product1,product2]
3 3 2 1 [product1] [product3]
Solution
Use a list comprehension over the product columns, testing each row against its minimum and maximum:
# select all columns except the first (id)
df1 = df.iloc[:,1:]
cols = df1.columns.to_numpy()
df['most_sold'] = [cols[x].tolist() for x in df1.eq(df1.max(axis=1),axis=0).to_numpy()]
df['least_sold'] = [cols[x].tolist() for x in df1.eq(df1.min(axis=1),axis=0).to_numpy()]
print(df)
id product1sold product2sold product3sold most_sold \
0 1 2 3 3 [product2sold,product3sold]
1 2 0 0 5 [product3sold]
2 3 3 2 1 [product1sold]
least_sold
0 [product1sold]
1 [product1sold,product2sold]
2 [product3sold]
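For anyone who wants to run this end to end, here is a minimal self-contained sketch. The way the sample frame is built is an assumption (the question only shows its printed values), and the names is_max and is_min are introduced here for readability.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (construction is assumed)
df = pd.DataFrame({
    "id": [1, 2, 3],
    "product1sold": [2, 0, 3],
    "product2sold": [3, 0, 2],
    "product3sold": [3, 5, 1],
})

# All columns except the first (id)
df1 = df.iloc[:, 1:]
cols = df1.columns.to_numpy()

# Boolean masks: True where a value equals its row maximum / minimum
is_max = df1.eq(df1.max(axis=1), axis=0).to_numpy()
is_min = df1.eq(df1.min(axis=1), axis=0).to_numpy()

# Index the column labels with each row's mask to collect all ties
df["most_sold"] = [cols[mask].tolist() for mask in is_max]
df["least_sold"] = [cols[mask].tolist() for mask in is_min]
print(df)
```

Comparing with eq(..., axis=0) aligns the row-wise max or min with the row index, which is what lets every tied column show up in the list.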
If performance is not important, you can use DataFrame.apply:
df1 = df.iloc[:,1:]
f = lambda x: x.index[x].tolist()
df['most_sold'] = df1.eq(df1.max(axis=1),axis=0).apply(f,axis=1)
df['least_sold'] = df1.eq(df1.min(axis=1),axis=0).apply(f,axis=1)
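One way to check that the apply variant agrees with the list comprehension is to run both on the question's sample data. This is only a sketch: the frame construction and the helper name pick_columns are assumptions made for illustration.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (construction is assumed)
df = pd.DataFrame({
    "id": [1, 2, 3],
    "product1sold": [2, 0, 3],
    "product2sold": [3, 0, 2],
    "product3sold": [3, 5, 1],
})

df1 = df.iloc[:, 1:]
cols = df1.columns.to_numpy()
is_max = df1.eq(df1.max(axis=1), axis=0)


def pick_columns(row):
    """Return the column labels where the boolean row is True (hypothetical helper)."""
    return row.index[row].tolist()


# Row-wise apply: one Python call per row
via_apply = is_max.apply(pick_columns, axis=1)

# List comprehension over the underlying NumPy array
via_listcomp = [cols[mask].tolist() for mask in is_max.to_numpy()]

print(via_apply.tolist() == via_listcomp)
```

Both routes build the same lists; the apply version just pays the overhead of constructing a Series for every row.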
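Alternatively, you can do the following. Note that idxmin and idxmax return only the first column holding the row minimum or maximum, so ties are not listed.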
minValueCol = yourDataFrame.idxmin(axis=1)
maxValueCol = yourDataFrame.idxmax(axis=1)
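A small sketch of how this behaves on the question's data. The frame construction and the names sales, max_col and min_col are assumptions; the id column is dropped first so that it does not take part in the comparison.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (construction is assumed)
df = pd.DataFrame({
    "id": [1, 2, 3],
    "product1sold": [2, 0, 3],
    "product2sold": [3, 0, 2],
    "product3sold": [3, 5, 1],
})

# Exclude id, otherwise it competes with the sales figures
sales = df.drop(columns="id")

# One label per row: the first column that reaches the max / min
max_col = sales.idxmax(axis=1)
min_col = sales.idxmin(axis=1)

print(max_col.tolist())  # ['product2sold', 'product3sold', 'product1sold']
print(min_col.tolist())  # ['product1sold', 'product1sold', 'product3sold']
```

For id 1 the tie between product2sold and product3sold collapses to product2sold, so this fits when one label per row is enough, but it does not produce the lists the question asks for.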