Pandas DataFrame: change the values in a column based on a condition

Problem description

I have a large DataFrame, shown below:

The data used as an example here (edu_val.csv) can be found at https://github.com/ENLK/Py-Projects-/blob/master/education_val.csv
import pandas as pd 

edu = pd.read_csv('education_val.csv')
del edu['Unnamed: 0']
edu.head(10)

ID  Year    Education
22445   1991    higher education
29925   1991    No qualifications
76165   1991    No qualifications
223725  1991    Other
280165  1991    intermediate qualifications
333205  1991    No qualifications
387605  1991    higher education
541285  1991    No qualifications
541965  1991    No qualifications
599765  1991    No qualifications

The values in Education are:

edu.Education.value_counts()

intermediate qualifications 153705
higher education    67020
No qualifications   55842
Other   36915

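I want to replace the values in the Education column in the following way:

  1. If an ID has the value higher education in the Education column in some year, then all future years for that ID should also have higher education in the Education column.

  2. If an ID has the value intermediate qualifications in some year, then all future years for that ID should have intermediate qualifications in the corresponding Education column. However, if the value higher education occurs in any subsequent year for that ID, then higher education replaces intermediate qualifications from that year onwards, regardless of whether Other or No qualifications occur.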
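To make the two rules concrete, here is a minimal sketch in plain Python that applies them to the values of a single ID, listed in year order. The function name carry_forward and the sample history are hypothetical, and the sketch assumes that Other and No qualifications are left untouched until intermediate qualifications or higher education first appears.

```python
def carry_forward(levels):
    """Apply the two rules to one ID's Education values, given in year order."""
    # Assumed ordering: higher education outranks intermediate qualifications.
    rank = {"intermediate qualifications": 1, "higher education": 2}
    best = None  # highest-ranked qualification seen so far for this ID
    result = []
    for level in levels:
        if level in rank and (best is None or rank[level] > rank[best]):
            best = level
        # Before any ranked qualification appears, keep the original value.
        result.append(best if best is not None else level)
    return result


# Hypothetical history for one ID, one entry per year.
history = [
    "Other",
    "intermediate qualifications",
    "No qualifications",
    "higher education",
    "intermediate qualifications",
]
print(carry_forward(history))
```

In this sketch the third year becomes intermediate qualifications, and every year from the fourth onwards becomes higher education.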

For example, in the DataFrame below, ID 22445 has the value higher education in the year 1991. All subsequent values of Education for 22445 should be replaced with higher education in the later years, up to the year 2017.

edu.loc[edu['ID'] == 22445]

ID  Year    Education
22445   1991    higher education
22445   1992    higher education
22445   1993    higher education
22445   1994    higher education
22445   1995    higher education
22445   1996    intermediate qualifications
22445   1997    intermediate qualifications
22445   1998    Other
22445   1999    No qualifications
22445   2000    intermediate qualifications
22445   2001    intermediate qualifications
22445   2002    intermediate qualifications
22445   2003    intermediate qualifications
22445   2004    intermediate qualifications
22445   2005    intermediate qualifications
22445   2006    intermediate qualifications
22445   2007    intermediate qualifications
22445   2008    intermediate qualifications
22445   2010    intermediate qualifications
22445   2011    intermediate qualifications
22445   2012    intermediate qualifications
22445   2013    intermediate qualifications
22445   2014    intermediate qualifications
22445   2015    intermediate qualifications
22445   2016    intermediate qualifications
22445   2017    intermediate qualifications

Similarly, ID 1587125 in the DataFrame below has the value intermediate qualifications in the year 1991, which changes to higher education in 1993. All subsequent values in the Education column for 1587125 in future years (from 1993 onwards) should be higher education.

edu.loc[edu['ID'] == 1587125]

ID  Year    Education
1587125 1991    intermediate qualifications
1587125 1992    intermediate qualifications
1587125 1993    higher education
1587125 1994    higher education
1587125 1995    higher education
1587125 1996    higher education
1587125 1997    higher education
1587125 1998    higher education
1587125 1999    higher education
1587125 2000    higher education
1587125 2001    higher education
1587125 2002    higher education
1587125 2003    higher education
1587125 2004    Other
1587125 2005    No qualifications
1587125 2006    intermediate qualifications
1587125 2007    intermediate qualifications
1587125 2008    intermediate qualifications
1587125 2010    intermediate qualifications
1587125 2011    higher education
1587125 2012    higher education
1587125 2013    higher education
1587125 2014    higher education
1587125 2015    higher education
1587125 2016    higher education
1587125 2017    higher education

There are 12,057 unique IDs in the data, and the Year column ranges from 1991 to 2017. How can I change the values of Education for all 12,057 IDs according to the conditions above? I am not sure how to do this in a uniform way for all unique IDs. The sample data used as an example here is available at the GitHub link above. Many thanks in advance.

Solution

No working solution has been found for this problem yet.
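One way the rules from the question might be applied to every ID at once is sketched below. It is not a confirmed answer from the original thread. The idea is to give each qualification a numeric rank, take the running maximum of that rank within each ID in year order, and map the result back to a label. The names RANK, LABEL, carry_forward_education and the small sample frame are hypothetical; the sketch assumes that Other and No qualifications stay unchanged until intermediate qualifications or higher education first appears for that ID.

```python
import pandas as pd

# Assumed ordering: higher education outranks intermediate qualifications.
# Other and No qualifications get rank 0, meaning "nothing to carry forward".
RANK = {"intermediate qualifications": 1, "higher education": 2}
LABEL = {rank: name for name, rank in RANK.items()}


def carry_forward_education(df):
    """Return a copy of df, sorted by ID and Year, with both rules applied per ID."""
    out = df.sort_values(["ID", "Year"]).reset_index(drop=True)
    rank = out["Education"].map(RANK).fillna(0).astype(int)
    # Running maximum of the rank within each ID, in year order.
    best_so_far = rank.groupby(out["ID"]).cummax()
    # Rank 0 has no label, so those rows fall back to their original value.
    out["Education"] = best_so_far.map(LABEL).fillna(out["Education"])
    return out


# Hypothetical miniature of the data: two IDs, four years each.
sample = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2, 2],
    "Year": [1991, 1992, 1993, 1994, 1991, 1992, 1993, 1994],
    "Education": [
        "Other", "intermediate qualifications",
        "No qualifications", "higher education",
        "higher education", "Other",
        "intermediate qualifications", "No qualifications",
    ],
})
print(carry_forward_education(sample))
```

On the full data this would be called as edu = carry_forward_education(edu), after which edu.loc[edu['ID'] == 22445] could be inspected to check the result against the expected output described in the question. Because the work is done by a grouped cumulative maximum rather than a Python loop over IDs, the same call covers all IDs uniformly.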
