Problem description
I have a large DataFrame, shown below:
The data used as an example here (edu_val.csv) can be found at https://github.com/ENLK/Py-Projects-/blob/master/education_val.csv
import pandas as pd
edu = pd.read_csv('education_val.csv')
del edu['Unnamed: 0']
edu.head(10)
ID Year Education
22445 1991 higher education
29925 1991 No qualifications
76165 1991 No qualifications
223725 1991 Other
280165 1991 intermediate qualifications
333205 1991 No qualifications
387605 1991 higher education
541285 1991 No qualifications
541965 1991 No qualifications
599765 1991 No qualifications
The values in the column Education are:
edu.Education.value_counts()
intermediate qualifications 153705
higher education 67020
No qualifications 55842
Other 36915
I want to replace the values in the Education column as follows:
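- If an ID has the value higher education in the Education column in any year, then all future years for that ID should also have higher education in the Education column.
- If an ID has the value intermediate qualifications in a year, then all future years for that ID should have intermediate qualifications in the corresponding Education column. However, if the value higher education occurs in any subsequent year for this ID, then higher education replaces intermediate qualifications in the following years, regardless of whether Other or No qualifications occur.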
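Read together, the two rules say that an ID's Education level can only move upwards over time: once intermediate qualifications or higher education has appeared, lower values in later years are overwritten, and higher education overrides intermediate qualifications. A minimal sketch of that logic for a single ID, assuming the values are already sorted by Year and that Other and No qualifications are kept as-is only until one of the two higher levels first appears (`fill_forward_education` is a hypothetical helper name, not part of pandas):

```python
from typing import List, Optional

HIGHER = "higher education"
INTERMEDIATE = "intermediate qualifications"


def fill_forward_education(values: List[str]) -> List[str]:
    """Apply the replacement rules to one ID's Education values (sorted by Year)."""
    result = []
    best: Optional[str] = None  # highest level reached so far: None, INTERMEDIATE or HIGHER
    for value in values:
        if value == HIGHER:
            best = HIGHER
        elif value == INTERMEDIATE and best is None:
            best = INTERMEDIATE
        # Once a level has been reached it overrides anything lower in later years;
        # before that, the original value (e.g. Other, No qualifications) is kept.
        result.append(best if best is not None else value)
    return result
```

For example, the sequence intermediate qualifications, intermediate qualifications, higher education, Other, No qualifications, intermediate qualifications (a shortened version of ID 1587125) becomes intermediate, intermediate, and then higher education for every remaining year.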
For example, in the DataFrame below, ID 22445 has the value higher education in the year 1991, so all subsequent Education values for 22445 should be replaced with higher education in the later years, up to the year 2017.
edu.loc[edu['ID'] == 22445]
ID Year Education
22445 1991 higher education
22445 1992 higher education
22445 1993 higher education
22445 1994 higher education
22445 1995 higher education
22445 1996 intermediate qualifications
22445 1997 intermediate qualifications
22445 1998 Other
22445 1999 No qualifications
22445 2000 intermediate qualifications
22445 2001 intermediate qualifications
22445 2002 intermediate qualifications
22445 2003 intermediate qualifications
22445 2004 intermediate qualifications
22445 2005 intermediate qualifications
22445 2006 intermediate qualifications
22445 2007 intermediate qualifications
22445 2008 intermediate qualifications
22445 2010 intermediate qualifications
22445 2011 intermediate qualifications
22445 2012 intermediate qualifications
22445 2013 intermediate qualifications
22445 2014 intermediate qualifications
22445 2015 intermediate qualifications
22445 2016 intermediate qualifications
22445 2017 intermediate qualifications
Similarly, ID 1587125 in the DataFrame below has the value intermediate qualifications in 1991, which changes to higher education in 1993. All subsequent values in the Education column for 1587125 in the future years (from 1993 onwards) should be higher education.
edu.loc[edu['ID'] == 1587125]
ID Year Education
1587125 1991 intermediate qualifications
1587125 1992 intermediate qualifications
1587125 1993 higher education
1587125 1994 higher education
1587125 1995 higher education
1587125 1996 higher education
1587125 1997 higher education
1587125 1998 higher education
1587125 1999 higher education
1587125 2000 higher education
1587125 2001 higher education
1587125 2002 higher education
1587125 2003 higher education
1587125 2004 Other
1587125 2005 No qualifications
1587125 2006 intermediate qualifications
1587125 2007 intermediate qualifications
1587125 2008 intermediate qualifications
1587125 2010 intermediate qualifications
1587125 2011 higher education
1587125 2012 higher education
1587125 2013 higher education
1587125 2014 higher education
1587125 2015 higher education
1587125 2016 higher education
1587125 2017 higher education
There are 12,057 unique IDs in the data, and the column Year ranges from 1991 to 2017. How can I change the Education values for all 12,057 IDs according to the conditions above? I'm not sure how to do this in a uniform way for all unique IDs. The sample data used as an example here is available at the GitHub link above. Many thanks in advance.
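One possible way to apply the rules to every ID at once, sketched here as an assumption rather than a tested answer: give each category a numeric rank, take the running maximum of that rank within each ID in Year order with `groupby(...).cummax()`, and map ranks 1 and 2 back to their labels. The `RANK` ordering (Other and No qualifications tied at 0 so they never overwrite anything) and the helper name `apply_education_rules` are hypothetical choices made for this sketch, and it assumes the Education column contains only the four values listed by `value_counts()`.

```python
import pandas as pd

# Assumed ordering: Other and No qualifications share the lowest rank,
# so they never overwrite an earlier intermediate or higher level.
RANK = {
    "No qualifications": 0,
    "Other": 0,
    "intermediate qualifications": 1,
    "higher education": 2,
}
LEVEL = {1: "intermediate qualifications", 2: "higher education"}


def apply_education_rules(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with Education carried forward per ID in Year order."""
    out = df.sort_values(["ID", "Year"]).copy()
    rank = out["Education"].map(RANK)
    # Running maximum of the rank within each ID, following Year order.
    best = rank.groupby(out["ID"]).cummax()
    # Overwrite only rows where intermediate or higher has been reached so far;
    # rows still at rank 0 keep their original value.
    reached = best > 0
    out.loc[reached, "Education"] = best[reached].map(LEVEL)
    return out
```

Usage would be along the lines of `edu = apply_education_rules(edu)`. The returned rows are sorted by ID and Year, and the input DataFrame is left unchanged. Encoding the rules as a cumulative maximum avoids an explicit Python loop over the 12,057 IDs; a plain forward fill would not be enough on its own, because it cannot express that higher education beats intermediate qualifications while Other and No qualifications never win.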