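Problem description
I'm trying to reorder a list so that its items follow the order in which they appear in a dataframe column, even though the column is much longer than the list.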
   enrolNo    Surname
0        1      Jones
1        2      Smith
2        3  Henderson
3        4       Kilm
4        5      Henry
5        6     Joseph
late = ['Kilm','Henry','Smith']
Desired output:
sorted_late = ['Smith','Kilm','Henry']
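My first attempt was to add a new column to the existing dataframe and then extract it as a list, but that seems like the long way round. Also, when I started with the line below, it failed with an error message saying the lengths differ: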
df_register['late_arrivals'] = np.where((df_register['Surname'] == late),late,'')
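The error comes from `==`: comparing a column with a list is done position by position, so pandas requires both sides to have the same length. A minimal sketch of the asker's own approach with `==` swapped for `Series.isin`, assuming the sample data above is loaded into `df_register`:

```python
import numpy as np
import pandas as pd

df_register = pd.DataFrame({
    "enrolNo": [1, 2, 3, 4, 5, 6],
    "Surname": ["Jones", "Smith", "Henderson", "Kilm", "Henry", "Joseph"],
})
late = ["Kilm", "Henry", "Smith"]

# `==` against a list compares element by element, so a 6-row column and a
# 3-item list have mismatched lengths and pandas raises ValueError.
try:
    df_register["Surname"] == late
except ValueError as exc:
    print("comparison failed:", exc)

# isin() tests each row for membership in `late`, which is what was intended.
df_register["late_arrivals"] = np.where(
    df_register["Surname"].isin(late), df_register["Surname"], ""
)

# Keep the non-empty entries; they come out in dataframe row order.
sorted_late = [name for name in df_register["late_arrivals"] if name]
print(sorted_late)  # ['Smith', 'Kilm', 'Henry']
```

This works, but the helper column is not needed; the answers below get the same list directly.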
Should I use a `for` loop?
Solution
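Why not use the .isin() function?
df['Surname'].isin(late)
This returns a boolean mask that is True for every row whose Surname is in late; use it to select those rows and you get the desired output.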
Pull the matching values out of the dataframe itself; there is no need to sort the list at all:
sorted_late = df[df.Surname.isin(late)].Surname.to_list()
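A self-contained run of the one-liner above on the sample data, split into the mask and the selection so each step is visible. Boolean indexing preserves row order, which is why the result follows the dataframe rather than `late`. Note that a surname appearing in several rows would appear several times, and names in `late` that are absent from the column are simply dropped.

```python
import pandas as pd

df = pd.DataFrame({
    "enrolNo": [1, 2, 3, 4, 5, 6],
    "Surname": ["Jones", "Smith", "Henderson", "Kilm", "Henry", "Joseph"],
})
late = ["Kilm", "Henry", "Smith"]

# True for rows whose Surname appears anywhere in `late`.
mask = df["Surname"].isin(late)

# Select the matching Surname values; row order is kept.
sorted_late = df.loc[mask, "Surname"].to_list()
print(sorted_late)  # ['Smith', 'Kilm', 'Henry']
```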
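If the master data is a plain list rather than a dataframe, you can also use a list comprehension:
# master_list holds every surname in its original order
sorted_late = [master_late for master_late in master_list if master_late in late]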
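A runnable version of the list comprehension, assuming `master_list` is simply the Surname column from the question typed out as a Python list. Converting `late` to a set is an optional tweak (not part of the original answer) that makes each membership test constant time on average, which helps when the master list is long.

```python
# Assumed master list: the surnames in their original (dataframe) order.
master_list = ["Jones", "Smith", "Henderson", "Kilm", "Henry", "Joseph"]
late = ["Kilm", "Henry", "Smith"]

# Set lookup avoids scanning `late` for every element of master_list.
late_set = set(late)
sorted_late = [name for name in master_list if name in late_set]
print(sorted_late)  # ['Smith', 'Kilm', 'Henry']
```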
Alternatively, you can specify a custom key for the sort function:
import pandas as pd

df = pd.DataFrame([
    {"enrolNo": 1, "Surname": "Jones"},
    {"enrolNo": 2, "Surname": "Smith"},
    {"enrolNo": 3, "Surname": "Henderson"},
    {"enrolNo": 4, "Surname": "Kilm"},
    {"enrolNo": 5, "Surname": "Henry"},
    {"enrolNo": 6, "Surname": "Joseph"},
])
# set Surname as index so we can access enrolNo by it
df = df.set_index('Surname')
# now you can access enrolNo by Surname
assert df.loc['Kilm', 'enrolNo'] == 4
# define the list to be sorted
late = ['Kilm','Henry','Smith']
# Sort late by enrolNo as listed in the dataframe
late_sorted = sorted(late, key=lambda n: df.loc[n, 'enrolNo'])
# late_sorted == ['Smith', 'Kilm', 'Henry']
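Sorting by enrolNo gives the dataframe order here only because enrolNo happens to increase with row position. A variant sketch, assuming surnames are unique, sorts by row position directly and builds the lookup once as a plain dict (the name `position_of` is illustrative), instead of calling `.loc` for every comparison key:

```python
import pandas as pd

df = pd.DataFrame({
    "enrolNo": [1, 2, 3, 4, 5, 6],
    "Surname": ["Jones", "Smith", "Henderson", "Kilm", "Henry", "Joseph"],
})
late = ["Kilm", "Henry", "Smith"]

# Map each surname to its row position (assumes surnames are unique).
position_of = {name: i for i, name in enumerate(df["Surname"])}

# A name missing from the dataframe raises KeyError rather than sorting silently.
late_sorted = sorted(late, key=lambda name: position_of[name])
print(late_sorted)  # ['Smith', 'Kilm', 'Henry']
```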