Creating a percentage column based on the values in another column in Python

Problem description

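I am trying to create a column in Python that holds a percentage computed from the values in another column. For example, suppose we have the following dataset.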

+------------------------------------+------------+
|              teachers              |  grades    |
+------------------------------------+------------+
| Teacher1                           |     1      |
| Teacher1                           |     2      |
| Teacher1                           |     0      |
| Teacher2                           |     1      |
| Teacher2                           |     2      |
| Teacher2                           |     0      |
| Teacher2                           |     2      |
| Teacher3                           |     2      |
| Teacher3                           |     0      |
| Teacher3                           |     1      |
| Teacher3                           |     0      |
| Teacher4                           |     0      |
| Teacher4                           |     0      |
+------------------------------------+------------+

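As you can see, the first column holds the teachers. The second column holds the grades (0, 1, or 2) that each teacher gave to students. For each teacher, I want the percentage of the grades they gave that are 1 or 2. For example, Teacher1 gave one grade of 1, one grade of 2, and one grade of 0. In that case, grades 1 and 2 make up 66% of all the grades given. So I would like to get the following data frame: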

+------------------------------------+------------+------------+
|              teachers              |  grades    | percentage |
+------------------------------------+------------+------------+
| Teacher1                           |     1      |     66%    |
| Teacher1                           |     2      |     66%    |
| Teacher1                           |     0      |     66%    |
| Teacher2                           |     1      |     75%    |
| Teacher2                           |     2      |     75%    |
| Teacher2                           |     0      |     75%    |
| Teacher2                           |     2      |     75%    |
| Teacher3                           |     2      |     50%    |
| Teacher3                           |     0      |     50%    |
| Teacher3                           |     1      |     50%    |
| Teacher3                           |     0      |     50%    |
| Teacher4                           |     0      |     0%     |
| Teacher4                           |     0      |     0%     |
+------------------------------------+------------+------------+

This is what I have tried so far, but it did not work. Can you help me?

percents = {} #store Teacher:percent
for t,g in df.groupby('teachers'):
    total = g.grades.sum()
    one_two = g.loc[g.grades.isin([1,2])].counts.sum() #consider only 1&2
    percent = (one_two/total)*100
    print(t,percent)
    percents[t] = [percent]

Solution

Avoid loops when working with numpy / pandas. Here is a vectorized version:

# share of grades greater than 0 (i.e. 1 or 2) per teacher, scaled to a percentage
df['percentage'] = df.groupby('teachers').grades.transform(lambda x: sum(x > 0) / len(x) * 100)

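The only real difference here is .transform, which applies the function to each group and returns a result aligned with the original rows. You already had everything else.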
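
To make the idea concrete, here is a minimal, self-contained sketch that rebuilds the sample data from the question and fills in the percentage column. It is one possible way to do it, not the only one: the variable names (is_one_or_two, share) are made up for illustration, and truncating to a whole number is an assumption chosen only so that Teacher1 shows 66% as in the desired output (rounding would give 67%).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "teachers": ["Teacher1"] * 3 + ["Teacher2"] * 4 + ["Teacher3"] * 4 + ["Teacher4"] * 2,
    "grades": [1, 2, 0, 1, 2, 0, 2, 2, 0, 1, 0, 0, 0],
})

# 1.0 where the grade is 1 or 2, otherwise 0.0 (hypothetical helper name).
is_one_or_two = df["grades"].isin([1, 2]).astype(float)

# The mean of a 0/1 column within a group is the share of 1s in that group;
# transform broadcasts that per-teacher share back onto every row.
share = is_one_or_two.groupby(df["teachers"]).transform("mean")

# Assumption: truncate to a whole percent to match the 66% in the question.
df["percentage"] = (share * 100).astype(int).astype(str) + "%"

print(df)
```

Keeping share as a plain float column and formatting it only for display is usually more convenient if the values are needed for further calculations.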
