How do I use mutate to replace the data in existing columns?

Problem description

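I want to group a data frame by year, standardize certain columns (here BioTest, MathExam and WritingScore), and then replace the old data with the new values. Here is a sample of my data frame: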

DF:

Var1   Var2   Year  BioTest   MathExam   WritingScore   Var3  Var 4
 X      X     2016   165        140         10           X     X
 X      X     2017   172        128         11           X     X
 X      X     2018   169        115          8           X     X
 X      X     2016   166        139         10           X     X
 X      X     2017   165        140         12           X     X

I have tried variations of the following code:

DF<- DF %>% group_by(Year)%>% mutate(across(BioTest:WritingScore),scale)

DF<- DF %>% group_by(Year)%>% mutate(across(select(BioTest:WritingScore)),scale)

What I get back is the same DF, without any changes. What I want is:

 DF:

 Var1   Var2   Year  BioTest   MathExam   WritingScore   Var3  Var 4
 X      X     2016   NewData     NewData      NewData      X     X
 X      X     2017   NewData     NewData      NewData      X     X
 X      X     2018   NewData     NewData      NewData      X     X
 X      X     2016   NewData     NewData      NewData      X     X
 X      X     2017   NewData     NewData      NewData      X     X

Thank you very much for your help.

Solution

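The issue could be that plyr::mutate is masking dplyr::mutate (in addition to the fact that, in the OP's code, across is closed before the function is passed to it). The behavior can be reproduced with: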

iris %>%
    group_by(Species) %>%
    plyr::mutate(across(where(is.numeric),scale))
# A tibble: 150 x 5
# Groups:   Species [3]
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
# 1          5.1         3.5          1.4         0.2 setosa 
# 2          4.9         3            1.4         0.2 setosa 
# 3          4.7         3.2          1.3         0.2 setosa 
# 4          4.6         3.1          1.5         0.2 setosa 
# 5          5           3.6          1.4         0.2 setosa 
# 6          5.4         3.9          1.7         0.4 setosa 
# 7          4.6         3.4          1.4         0.3 setosa 
# 8          5           3.4          1.5         0.2 setosa 
# 9          4.4         2.9          1.4         0.2 setosa 
#10          4.9         3.1          1.5         0.1 setosa 
# … with 140 more rows

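This is identical to the original `iris` dataset: plyr::mutate silently drops unnamed arguments, so the across() call has no effect.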

Now check with the correct dplyr::mutate:

iris %>% 
   group_by(Species) %>%
   dplyr::mutate(across(where(is.numeric),scale))
# A tibble: 150 x 5
# Groups:   Species [3]
#   Sepal.Length[,1] Sepal.Width[,1] Petal.Length[,1] Petal.Width[,1] Species
#              <dbl>           <dbl>            <dbl>           <dbl> <fct>  
# 1           0.267           0.190            -0.357          -0.436 setosa 
# 2          -0.301          -1.13             -0.357          -0.436 setosa 
# 3          -0.868          -0.601            -0.933          -0.436 setosa 
# 4          -1.15           -0.865             0.219          -0.436 setosa 
# 5          -0.0170          0.454            -0.357          -0.436 setosa 
# 6           1.12            1.25              1.37            1.46  setosa 
# 7          -1.15           -0.0739           -0.357           0.512 setosa 
# 8          -0.0170         -0.0739            0.219          -0.436 setosa 
# 9          -1.72           -1.39             -0.357          -0.436 setosa 
#10          -0.301          -0.865             0.219          -1.39  setosa 
# … with 140 more rows

So in the OP's code we just need to call dplyr::mutate explicitly, or restart a fresh R session with only dplyr loaded:

DF %>% 
   group_by(Year)%>% 
   dplyr::mutate(across(BioTest:WritingScore,scale))
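If it is unclear which mutate a bare call resolves to, base R can report it. This is a minimal sketch, assuming only dplyr is attached in the session; if plyr were attached after dplyr, the results would be expected to point at plyr instead.

```r
library(dplyr)

# List the attached packages that define `mutate`. If more than one
# shows up, the bare name is ambiguous and masking is in play.
find("mutate")

# Name of the namespace that owns the function the bare name resolves to.
environmentName(environment(mutate))
```

Writing dplyr::mutate explicitly, as above, sidesteps the question regardless of load order.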

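scale returns a matrix with some attributes (which is why the column headers above show [,1]). If only the numeric vector part is needed, wrap the call in as.vector or as.numeric: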

DF %>% 
   group_by(Year)%>% 
   dplyr::mutate(across(BioTest:WritingScore,~ as.numeric(scale(.))))

Note: select is not needed inside across.
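To confirm that the grouped standardization did what was intended, the result can be checked per group. This is a rough sketch on the built-in iris data rather than the OP's DF; the name `scaled` is only illustrative.

```r
library(dplyr)

# Standardize every numeric column within each Species group and
# strip the matrix wrapper that scale() returns.
scaled <- iris %>%
  group_by(Species) %>%
  dplyr::mutate(across(where(is.numeric), ~ as.numeric(scale(.x)))) %>%
  ungroup()

# The columns are plain numeric vectors, not one-column matrices.
is.matrix(scaled$Sepal.Length)

# Within each group the mean should be numerically zero and the
# standard deviation one.
scaled %>%
  group_by(Species) %>%
  summarise(across(where(is.numeric), list(mean = mean, sd = sd)))
```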


Maybe try this. The problem is in your across() call: the function has to go inside it:

library(dplyr)
#Code
DF %>%
  group_by(Year) %>%
  mutate(across(BioTest:WritingScore,~scale(.)[,1]))

Output:

# A tibble: 5 x 9
# Groups:   Year [3]
  Var1  Var2   Year BioTest[,1] MathExam[,1] WritingScore[,1] Var3  Var   X4   
  <chr> <chr> <int>       <dbl>        <dbl>            <dbl> <chr> <chr> <lgl>
1 X     X      2016      -0.707        0.707          NaN     X     X     NA   
2 X     X      2017       0.707       -0.707           -0.707 X     X     NA   
3 X     X      2018     NaN          NaN              NaN     X     X     NA   
4 X     X      2016       0.707       -0.707          NaN     X     X     NA   
5 X     X      2017      -0.707        0.707            0.707 X     X     NA   
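The NaN values in the output above appear where a year has only one row (2018) or where every value in the group is identical (WritingScore in 2016): the group has no spread, so scale() ends up dividing 0 by 0. If that is unwanted, one option is a small wrapper. The sketch below uses a hypothetical helper, `safe_scale`, which is not part of dplyr or base R, and it assumes that a z-score of 0 is an acceptable stand-in for such groups; the data frame repeats only the numeric columns from the question.

```r
library(dplyr)

# Hypothetical helper (not from any package): z-score a vector, but
# return 0 instead of NaN when the standard deviation is zero or undefined.
safe_scale <- function(x) {
  s <- sd(x, na.rm = TRUE)
  if (is.na(s) || s == 0) {
    return(as.numeric(x) * 0)  # assumption: treat an undefined z-score as 0
  }
  (x - mean(x, na.rm = TRUE)) / s
}

# Numeric columns copied from the question's sample data.
DF <- data.frame(
  Year         = c(2016L, 2017L, 2018L, 2016L, 2017L),
  BioTest      = c(165L, 172L, 169L, 166L, 165L),
  MathExam     = c(140L, 128L, 115L, 139L, 140L),
  WritingScore = c(10L, 11L, 8L, 10L, 12L)
)

res <- DF %>%
  group_by(Year) %>%
  dplyr::mutate(across(BioTest:WritingScore, safe_scale)) %>%
  ungroup()

res
```

Whether 0, NA, or the original NaN is the right value for a degenerate group depends on the analysis, so treat the fallback as a choice to revisit rather than a default.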

Data used:

#Data
DF <- structure(list(Var1 = c("X","X","X","X","X"),
                     Var2 = c("X","X","X","X","X"),
                     Year = c(2016L,2017L,2018L,2016L,2017L),
                     BioTest = c(165L,172L,169L,166L,165L),
                     MathExam = c(140L,128L,115L,139L,140L),
                     WritingScore = c(10L,11L,8L,10L,12L),
                     Var3 = c("X","X","X","X","X"),
                     Var = c("X","X","X","X","X"),
                     X4 = c(NA,NA,NA,NA,NA)),
                class = "data.frame",row.names = c(NA,-5L))
