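Problem description
I am planning to implement the following logic to get student scores:
find the students who scored more than 60,
then fetch each of those students' scores by the subject/student key.
Input data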
import pandas as pd
data = [['Maths',100,80,20],['Science',80,20,10]]
df = pd.DataFrame(data,columns = ['Subject','Student A','Student B','Student C'])
df.set_index("Subject",inplace=True)
         Student A  Student B  Student C
Subject
Maths          100         80         20
Science         80         20         10
Get the students who scored more than 60
df=df[df.gt(60)]
rank_df = df.rank(axis=0,method='average',pct=False,ascending=False)
marks_list = []
for i in range(0,len(rank_df)):
label_series = rank_df.iloc[i,:]
labels_notna = label_series.sort_values(ascending=True)[label_series.notna()].index
marks_list.append(",".join(labels_notna))
df['Student gt 60'] = marks_list
new_df = df['Student gt 60'].str.split(',',expand = True)
new_df.reset_index(inplace=True)
new_df.columns=["Subject","Top 1","Top 2"]
new_df = pd.melt(new_df,id_vars=['Subject'],value_name='Student')
data= new_df[["Subject","Student"]]
data.loc[~data["Student"].isna()]
   Subject    Student
0    Maths  Student A
1  Science  Student A
2    Maths  Student B
I want to get the score for each subject/student key in the same DataFrame, but I could not work out how.
Required output:
   Subject    Student  score
0    Maths  Student A    100
1    Maths  Student B     80
2  Science  Student A     80
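Can anyone give me some pointers?
Solution
I suggest stacking the original DataFrame first to get a Series with a MultiIndex (subject on the first level, student on the second), then indexing that Series to select every student whose score is high enough: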
df_stacked = df.stack()
df_stacked[df_stacked.gt(60)]
# Out:
# Subject
# Maths    Student A    100
#          Student B     80
# Science  Student A     80
# dtype: int64
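To turn the filtered Series into the flat table from the question, one option is to name the index levels and reset the index. The following is a minimal sketch, assuming the original (unfiltered) `df` from the question; the names `stacked`, `passed` and `result` are illustrative only.

```python
import pandas as pd

# Rebuild the input frame from the question (Science row includes the 80).
data = [['Maths', 100, 80, 20], ['Science', 80, 20, 10]]
df = pd.DataFrame(data, columns=['Subject', 'Student A', 'Student B', 'Student C'])
df = df.set_index('Subject')

# Stack into a Series indexed by (Subject, student column label).
stacked = df.stack()

# Keep only the scores above 60.
passed = stacked[stacked.gt(60)]

# Name both index levels, then move them into ordinary columns;
# the Series values become the 'score' column.
result = passed.rename_axis(['Subject', 'Student']).reset_index(name='score')
print(result)
```

Because the stacked Series is ordered by subject first, the rows should come out in the same order as the required output without extra sorting.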
Alternatively, start by orienting the data the way you ultimately want it:
vertical = df.unstack()
That gives you:
           Subject
Student A  Maths      100
           Science     80
Student B  Maths       80
           Science     20
Student C  Maths       20
           Science     10
Then simply:
vertical[vertical > 60]
which gives you the final result:
           Subject
Student A  Maths      100
           Science     80
Student B  Maths       80
You can call reset_index() on this to make it look more like the example output.
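The reset_index() step could look roughly like this. It is a sketch that assumes the original `df` from the question; the column reordering and the sort are additions made here to match the required output and are not part of the answer itself.

```python
import pandas as pd

# Rebuild the input frame from the question.
data = [['Maths', 100, 80, 20], ['Science', 80, 20, 10]]
df = pd.DataFrame(data, columns=['Subject', 'Student A', 'Student B', 'Student C'])
df = df.set_index('Subject')

# unstack() on a DataFrame yields a Series indexed by (column label, Subject).
vertical = df.unstack()
top = vertical[vertical > 60]

# The outer level has no name yet, so name both levels before resetting.
result = top.rename_axis(['Student', 'Subject']).reset_index(name='score')

# Reorder columns and sort rows to mirror the layout asked for in the question.
result = result[['Subject', 'Student', 'score']]
result = result.sort_values(['Subject', 'Student'], ignore_index=True)
print(result)
```

The only practical difference from the stack() route is the level order: unstack() puts the student on the outer level, so the columns need reordering afterwards.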