Problem description
Suppose I have this data frame:
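import pandas as pd

Name = ['ID', 'Country', 'IBAN', 'ID_info_1', 'Dan_Age', 'ID_info_1', 'Dan_city', 'ID_info_1', 'Dan_country', 'ID_info_1',
        'ID_info_2', 'ID_info_2', 'ID_info_2', 'Dan_sex', 'Dan_Age', 'Dan_country', 'Dan_sex', 'Dan_city', 'Dan_country']
Value = ['TAMara_CO', 'GERMANY', 'FR56', '12', '18', '25', 'Berlin', '34', '55', '345',
         '432', '43', 'GER', 'M', '22', 'FRA', 'M', 'Madrid', 'ESP']
Ccy = ['', '', '', 'EUR', 'EUR', 'EUR', '', 'EUR', '', '',
       '', 'EUR', 'EUR', 'USD', 'USD', '', 'CHF', '', 'DKN']
Group = [0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 1, 2, 3, 4, 2, 2, 2, 3, 3]
df = pd.DataFrame({'Name': Name, 'Value': Value, 'Ccy': Ccy, 'Group': Group})
print(df)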
           Name      Value  Ccy  Group
0            ID  TAMara_CO           0
1       Country    GERMANY           0
2          IBAN       FR56           0
3     ID_info_1         12  EUR      1
4       Dan_Age         18  EUR      1
5     ID_info_1         25  EUR      2
6      Dan_city     Berlin           2
7     ID_info_1         34  EUR      3
8   Dan_country         55           3
9     ID_info_1        345           4
10    ID_info_2        432           1
11    ID_info_2         43  EUR      2
12    ID_info_2        GER  EUR      3
13      Dan_sex          M  USD      4
14      Dan_Age         22  USD      2
15  Dan_country        FRA           2
16      Dan_sex          M  CHF      2
17     Dan_city     Madrid           3
18  Dan_country        ESP  DKN      3
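I want to reduce this data frame! Only the rows whose "Name" contains the string "info" should be reduced, keeping for each such name the row with the highest level in the "Group" column. In this data frame, that means keeping the "ID_info_1" row in group 4 and the "ID_info_2" row in group 3. In addition, I want to change their values in the "Group" column to 1.
So in the end I would like to get this new data frame, with the index reset as well: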
           Name      Value  Ccy  Group
0            ID  TAMara_CO           0
1       Country    GERMANY           0
2          IBAN       FR56           0
3     ID_info_1         12  EUR      1
4       Dan_Age         18  EUR      1
5      Dan_city     Berlin           2
6   Dan_country         55           3
7     ID_info_1        345           1
8     ID_info_2        GER  EUR      1
9       Dan_sex          M  USD      4
10      Dan_Age         22  USD      2
11  Dan_country        FRA           2
12      Dan_sex          M  CHF      2
13     Dan_city     Madrid           3
14  Dan_country        ESP  DKN      3
Does anyone have an efficient idea?
Thanks
Solution
How about this?
# select rows with "info"
di = df[df.Name.str.contains('info')]
# Find the rows below max for removal
di = di[di.groupby('Name')['Group'].transform('max') != di['Group']]
# Remove those rows and set a new index as requested
df = df.drop(di.index).reset_index(drop=True)
# Change group to one on remaining "info" rows
df.loc[df.Name.str.contains('info'),'Group'] = 1
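
The steps above can be folded into a single boolean mask. The following is only a minimal sketch under stated assumptions: the helper name `reduce_info_rows`, its parameters, and the small `sample` frame are made up for illustration, and `Group` is assumed to hold integers so that "highest" means numerically highest.

```python
import pandas as pd

def reduce_info_rows(df, pattern="info", new_group=1):
    """Hypothetical helper: for every Name containing `pattern`, keep only the
    rows with the highest Group, then set Group on those rows to `new_group`."""
    is_info = df["Name"].str.contains(pattern)
    # Highest Group per Name, broadcast back onto every row
    group_max = df.groupby("Name")["Group"].transform("max")
    # Keep all non-matching rows, plus matching rows that sit at their maximum
    keep = ~is_info | (df["Group"] == group_max)
    out = df[keep].reset_index(drop=True)
    out.loc[out["Name"].str.contains(pattern), "Group"] = new_group
    return out

# Reduced, made-up sample in the shape of the question's data
sample = pd.DataFrame({
    "Name": ["ID", "ID_info_1", "Dan_Age", "ID_info_1", "ID_info_2", "ID_info_2"],
    "Value": ["TAMara_CO", "12", "18", "345", "432", "GER"],
    "Ccy": ["", "EUR", "EUR", "", "", "EUR"],
    "Group": [0, 1, 1, 4, 1, 3],
})
result = reduce_info_rows(sample)
print(result)
```

On this sample, the surviving "info" rows are the ones with values 345 and GER, both with `Group` set to 1. If several rows of the same name tie for the maximum, all of them are kept.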
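Alternatively, you can create a mask with a lambda function that searches for the string "info" in the "Name" column, and then look up the matching values in the "Group" column.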
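arr = []
# Boolean mask: True where 'Name' contains the string 'info'
mask = df.apply(lambda x: 'info' in x['Name'], axis=1)
for info in df[mask]['Name'].unique():
    # Lowest 'Group' value among the rows sharing this name
    min_val = df.loc[df['Name'] == info, 'Group'].min()
    # Collect the index labels of the rows above that minimum
    arr += list(df[(df['Name'] == info) & (df['Group'] > min_val)].index)
df.drop(arr, inplace=True)
# drop=True discards the old index instead of adding it as a column
df.reset_index(drop=True, inplace=True)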
           Name      Value  Ccy  Group
0            ID  TAMARA_CO           0
1       Country    GERMANY           0
2          IBAN       FR56           0
3     ID_info_1         12  EUR      1
4       Dan_Age         18  EUR      1
5      Dan_city     Berlin           2
6   Dan_country         55           3
7     ID_info_2        432           1
8       Dan_sex          M  USD      4
9       Dan_Age         22  USD      2
10  Dan_country        FRA           2
11      Dan_sex          M  CHF      2
12     Dan_city     Madrid           3
13  Dan_country        ESP  DKN      3
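I know the df doesn't look 100% like what you wanted, but this is how I understood your question. Let me know if I got it wrong.
Edit: Reread the question and edited some of the code.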
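
As a side note on the mask itself: the row-wise `apply` with a lambda can probably be swapped for the vectorized `Series.str.contains`. The short check below is a sketch on a made-up `sample` frame, assuming every "Name" entry is a string (no missing values); `regex=False` makes the call a plain substring search.

```python
import pandas as pd

# Made-up sample holding only the column the mask looks at
sample = pd.DataFrame({"Name": ["ID", "ID_info_1", "Dan_Age", "ID_info_2"]})

# Row-wise mask, as in the answer above
mask_apply = sample.apply(lambda row: "info" in row["Name"], axis=1)
# Vectorized mask over the same column
mask_vectorized = sample["Name"].str.contains("info", regex=False)

# Both masks flag the same rows
print((mask_apply == mask_vectorized).all())
```

The vectorized form avoids calling a Python function once per row, which tends to matter on larger frames.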