您如何计算另一列中特定数字以上和以下的幅度百分比?

问题描述

我有这个数据集

study_ID title                  experiment question_ID participant_ID estimate_level estimate correct_answer question                      type   category   age gender
      <dbl> <chr>                       <dbl> <chr>                <int> <chr>             <dbl>          <dbl> <chr>                         <chr>  <chr>    <int> <chr> 
 1       11 Dallacker_Parents'_co…          1 1                        1 individual          3             10   How many sugar cubes does or… unlim… nutriti…    32 Female
 2       11 Dallacker_Parents'_co…          1 2                        1 individual         10             11.5 How many sugar cubes does a … unlim… nutriti…    32 Female
 3       11 Dallacker_Parents'_co…          1 3                        1 individual          7              6.5 How many sugar cubes does a … unlim… nutriti…    32 Female
 4       11 Dallacker_Parents'_co…          1 4                        1 individual          1             16.5 How many sugar cubes does a … unlim… nutriti…    32 Female
 5       11 Dallacker_Parents'_co…          1 5                        1 individual          7             11   How many sugar cubes does a … unlim… nutriti…    32 Female
 6       11 Dallacker_Parents'_co…          1 6                        1 individual          5              2.5 How many sugar cubes does a … unlim… nutriti…    32 Female
 7       11 Dallacker_Parents'_co…          1 1                        2 individual          2             10   How many sugar cubes does or… unlim… nutriti…    29 Female
 8       11 Dallacker_Parents'_co…          1 2                        2 individual         10             11.5 How many sugar cubes does a … unlim… nutriti…    29 Female
 9       11 Dallacker_Parents'_co…          1 3                        2 individual          1.5            6.5 How many sugar cubes does a … unlim… nutriti…    29 Female
10       11 Dallacker_Parents'_co…          1 4                        2 individual          2             16.5 How many sugar cubes does a … unlim… nutriti…    29 Female

此数据集中有6个问题,每个问题都有一个correct_answer列和一个estimate列。我正在尝试为每个问题计算一个数量级,以便获得被低估或高估且正确估算的人的百分比。

例如,对于6个问题中的每一个,它都将返回类似的结果:80%被低估,10%被高估,而10%回答正确。

我该怎么做?我感到难过。预先感谢!

这是举报

dput(head(DF,10))
structure(list(study_ID = c(5,5,5),title = c("5_Jayles_Debiasing_The_Crowd","5_Jayles_Debiasing_The_Crowd","5_Jayles_Debiasing_The_Crowd"),experiment = c(1,1,1),question_ID = c(1,participant_ID = c(1,2,3,4,6,7,8,9,10),estimate_level = c("individual","individual","individual"),estimate = c(2e+07,4500000,21075541,2e+07,1e+06,1.1e+07,2.5e+07,8e+06,1.6e+07,9800000),correct = c(3.8e+07,3.8e+07,3.8e+07),question = c("What is the population of Tokyo and its agglomeration?","What is the population of Tokyo and its agglomeration?","What is the population of Tokyo and its agglomeration?"),type = c("unlimited","unlimited","unlimited"),category = c("demographics","demographics","demographics"
    ),age = c("NA","NA","NA"),gender = c("NA","NA")),row.names = c(NA,-10L),class = c("tbl_df","tbl","data.frame"))

解决方法

这是一种dplyr的方法:

library(dplyr)
df %>%
  group_by(question_ID) %>%
  summarize(prop_over = mean(estimate > correct),prop_under = mean(estimate < correct),prop_correct = mean(estimate == correct)
  )
# `summarise()` ungrouping output (override with `.groups` argument)
# # A tibble: 1 x 4
#   question_ID prop_over prop_under prop_correct
#         <dbl>     <dbl>      <dbl>        <dbl>
# 1           1         0          1            0
,
list1 <- lapply(split(DF,DF$question_ID),function (x) {
  overestimated <- 100 * length(which(x$estimate > x$correct)) / length(x$estimate)
  underestimated <- 100 * length(which(x$estimate < x$correct)) / length(x$estimate)
  correct <- 100 * length(which(x$estimate == x$correct)) / length(x$estimate)
  data.frame(overestimated,underestimated,correct)
})
list2 <- mapply(function (x,y) {
  x$question_ID <- y
  return (x)
},x = list1,y = names(list1),SIMPLIFY = F)
Percent_Data <- do.call("rbind",list2)
Percent_Data <- Percent_Data[,c(which(colnames(Percent_Data) == "question_ID"),which(colnames(Percent_Data) != "question_ID"))]
Percent_Data
#   question_ID overestimated underestimated correct
# 1           1             0            100       0

相关问答

Selenium Web驱动程序和Java。元素在(x,y)点处不可单击。其...
Python-如何使用点“。” 访问字典成员?
Java 字符串是不可变的。到底是什么意思?
Java中的“ final”关键字如何工作?(我仍然可以修改对象。...
“loop:”在Java代码中。这是什么,为什么要编译?
java.lang.ClassNotFoundException:sun.jdbc.odbc.JdbcOdbc...