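Problem description
I have a dataframe containing a region, a client, and a number of deliveries. The purchaseType column records the type of purchase: the first and last purchases are labeled "FIRST" and "LAST", but every delivery in between is labeled "DELIVERY". Is there a way to convert the deliveries so that they get labels such as "DELIVERY1", "DELIVERY2", and so on?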
import pandas as pd
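data = [['NY','A','FIRST',10],['NY','A','DELIVERY',20],['NY','A','DELIVERY',30],['NY','A','LAST',25],
        ['NY','B','FIRST',15],['NY','B','DELIVERY',10],['NY','B','LAST',20],
        ['FL','A','FIRST',15],['FL','A','DELIVERY',10],['NY','A','DELIVERY',12],['NY','A','DELIVERY',25],['NY','A','LAST',20]
]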
# Create the pandas DataFrame
df = pd.DataFrame(data,columns = ['Region','Client','purchaseType','price'])
# print dataframe.
df
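Desired output:
data2 = [['NY','A','FIRST',10],['NY','A','DELIVERY1',20],['NY','A','DELIVERY2',30],['NY','A','LAST',25],
         ['NY','B','FIRST',15],['NY','B','DELIVERY1',10],['NY','B','LAST',20],
         ['FL','A','FIRST',15],['FL','A','DELIVERY1',10],['NY','A','DELIVERY2',12],['NY','A','DELIVERY3',25],['NY','A','LAST',20]
]
df2 = pd.DataFrame(data2,columns = ['Region','Client','purchaseType','price'])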
print(df2)
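Thanks!
Solution
We can try GroupBy.cumcount together with Series.str.cat. Every FIRST row opens a new block, cumcount numbers the rows inside each block starting from 0, and that number is appended to the DELIVERY labels: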
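# Each FIRST row starts a new block; the running sum labels the blocks 1, 2, ...
blocks = df['purchaseType'].eq('FIRST').cumsum()
# Append each row's position within its block (FIRST is 0) to its label
fill_values = df['purchaseType'].str.cat(
    df.groupby(blocks).cumcount().astype(str), sep='')
# Overwrite only the DELIVERY rows; the assignment aligns on the index
df.loc[df['purchaseType'].eq('DELIVERY'), 'purchaseType'] = fill_values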
print(df)
#    Region Client purchaseType  price
# 0      NY      A        FIRST     10
# 1      NY      A    DELIVERY1     20
# 2      NY      A    DELIVERY2     30
# 3      NY      A         LAST     25
# 4      NY      B        FIRST     15
# 5      NY      B    DELIVERY1     10
# 6      NY      B         LAST     20
# 7      FL      A        FIRST     15
# 8      FL      A    DELIVERY1     10
# 9      NY      A    DELIVERY2     12
# 10     NY      A    DELIVERY3     25
# 11     NY      A         LAST     20
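To see why this works, it can help to look at the intermediate values. The sketch below uses a made-up six-row column (not the data from the question), and the names demo and position are only illustrative:

```python
import pandas as pd

# Hypothetical miniature column, used only to illustrate the intermediate steps.
demo = pd.DataFrame({'purchaseType': ['FIRST', 'DELIVERY', 'DELIVERY', 'LAST',
                                      'FIRST', 'DELIVERY']})

# A new block starts at every FIRST row, so the running sum of the boolean
# mask gives each row the number of the block it belongs to.
blocks = demo['purchaseType'].eq('FIRST').cumsum()

# cumcount numbers the rows inside each block starting at 0, so FIRST gets 0
# and the deliveries that directly follow it get 1, 2, ...
position = demo.groupby(blocks).cumcount()

print(blocks.tolist())    # [1, 1, 1, 1, 2, 2]
print(position.tolist())  # [0, 1, 2, 3, 0, 1]
```

Because FIRST always takes position 0, the deliveries that immediately follow it are numbered from 1, which is exactly the suffix wanted.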
Alternatively, you can use np.where to decide where the numeric suffix gets added:
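import numpy as np
# Blocks start at each FIRST; only DELIVERY rows get their in-block position appended
df['purchaseType'] = df.groupby((df['purchaseType']=='FIRST').cumsum())['purchaseType'].transform(
    lambda x: np.where(x=='DELIVERY', x+np.arange(len(x)).astype(str), x)
)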
print(df)
Prints:
   Region Client purchaseType  price
0      NY      A        FIRST     10
1      NY      A    DELIVERY1     20
2      NY      A    DELIVERY2     30
3      NY      A         LAST     25
4      NY      B        FIRST     15
5      NY      B    DELIVERY1     10
6      NY      B         LAST     20
7      FL      A        FIRST     15
8      FL      A    DELIVERY1     10
9      NY      A    DELIVERY2     12
10     NY      A    DELIVERY3     25
11     NY      A         LAST     20
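Both answers number a delivery by its position within the block, so they rely on the deliveries directly following the FIRST row. If that cannot be assumed, one possible variant is to count only the DELIVERY rows inside each block. This is a minimal sketch on made-up data; the 'RETURN' label and the names df_demo, is_delivery and n are hypothetical and do not appear in the question:

```python
import pandas as pd

# Hypothetical data with an extra label between FIRST and the deliveries.
df_demo = pd.DataFrame({'purchaseType': ['FIRST', 'RETURN', 'DELIVERY', 'DELIVERY',
                                         'LAST', 'FIRST', 'DELIVERY', 'LAST']})

is_delivery = df_demo['purchaseType'].eq('DELIVERY')
blocks = df_demo['purchaseType'].eq('FIRST').cumsum()

# Running count of DELIVERY rows within each block, ignoring other labels.
n = is_delivery.astype(int).groupby(blocks).cumsum()

# Only the DELIVERY rows are rewritten; the assignment aligns on the index.
df_demo.loc[is_delivery, 'purchaseType'] = 'DELIVERY' + n[is_delivery].astype(str)

print(df_demo['purchaseType'].tolist())
# ['FIRST', 'RETURN', 'DELIVERY1', 'DELIVERY2', 'LAST', 'FIRST', 'DELIVERY1', 'LAST']
```

On the data from the question this should give the same labels as the two answers above, since there every delivery directly follows its FIRST row.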