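Problem description
I want to group a data frame by year and standardize certain columns (in this example BioTest, MathExam and WritingScore), then replace the old data with the new values. Here is a sample of my data frame: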
DF:
Var1 Var2 Year BioTest MathExam WritingScore Var3 Var 4
X X 2016 165 140 10 X X
X X 2017 172 128 11 X X
X X 2018 169 115 8 X X
X X 2016 166 139 10 X X
X X 2017 165 140 12 X X
I have tried variations of the following code:
DF<- DF %>% group_by(Year)%>% mutate(across(BioTest:WritingScore),scale)
DF<- DF %>% group_by(Year)%>% mutate(across(select(BioTest:WritingScore)),scale)
What I get back is the same DF, with nothing changed. What I want is:
DF:
Var1 Var2 Year BioTest MathExam WritingScore Var3 Var 4
X X 2016 NewData NewData NewData X X
X X 2017 NewData NewData NewData X X
X X 2018 NewData NewData NewData X X
X X 2016 NewData NewData NewData X X
X X 2017 NewData NewData NewData X X
Thank you very much for your help.
Solution
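The problem is most likely that plyr::mutate is masking dplyr::mutate. plyr::mutate ignores unnamed arguments, so the across() call has no effect and the data comes back untouched. This can be reproduced by calling plyr::mutate explicitly (note also that in the question's code across() is closed before any function is passed to it):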
iris %>%
group_by(Species) %>%
plyr::mutate(across(where(is.numeric),scale))
# A tibble: 150 x 5
# Groups: Species [3]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# <dbl> <dbl> <dbl> <dbl> <fct>
# 1 5.1 3.5 1.4 0.2 setosa
# 2 4.9 3 1.4 0.2 setosa
# 3 4.7 3.2 1.3 0.2 setosa
# 4 4.6 3.1 1.5 0.2 setosa
# 5 5 3.6 1.4 0.2 setosa
# 6 5.4 3.9 1.7 0.4 setosa
# 7 4.6 3.4 1.4 0.3 setosa
# 8 5 3.4 1.5 0.2 setosa
# 9 4.4 2.9 1.4 0.2 setosa
#10 4.9 3.1 1.5 0.1 setosa
# … with 140 more rows
The result is identical to the original iris dataset.
Now check with the correct function, dplyr::mutate:
iris %>%
group_by(Species) %>%
dplyr::mutate(across(where(is.numeric),scale))
# A tibble: 150 x 5
# Groups: Species [3]
# Sepal.Length[,1] Sepal.Width[,1] Petal.Length[,1] Petal.Width[,1] Species
# <dbl> <dbl> <dbl> <dbl> <fct>
# 1 0.267 0.190 -0.357 -0.436 setosa
# 2 -0.301 -1.13 -0.357 -0.436 setosa
# 3 -0.868 -0.601 -0.933 -0.436 setosa
# 4 -1.15 -0.865 0.219 -0.436 setosa
# 5 -0.0170 0.454 -0.357 -0.436 setosa
# 6 1.12 1.25 1.37 1.46 setosa
# 7 -1.15 -0.0739 -0.357 0.512 setosa
# 8 -0.0170 -0.0739 0.219 -0.436 setosa
# 9 -1.72 -1.39 -0.357 -0.436 setosa
#10 -0.301 -0.865 0.219 -1.39 setosa
# … with 140 more rows
So for the OP's code, we only need to call dplyr::mutate explicitly, or restart a fresh R session with only dplyr loaded:
DF %>%
group_by(Year)%>%
dplyr::mutate(across(BioTest:WritingScore,scale))
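Before rerunning the pipeline, it can help to confirm which mutate the session actually resolves to. The following is a minimal sketch, assuming dplyr is installed; it relies only on environment, environmentName and find from base R and utils, and the variable name mutate_owner is made up for illustration.

```r
library(dplyr)

# Name of the namespace that the bare name `mutate` currently resolves to.
# If plyr was attached after dplyr, this reports "plyr" instead of "dplyr".
mutate_owner <- environmentName(environment(mutate))
print(mutate_owner)

# Every attached package that provides an object called `mutate`,
# listed in search-path order (the first entry is the one that wins).
print(find("mutate"))
```

If the first value is not "dplyr", either qualify the call as dplyr::mutate or restart the session and attach only dplyr.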
scale returns a matrix with some attributes. If only the numeric vector part is needed, wrap the call in as.vector or as.numeric:
DF %>%
group_by(Year)%>%
dplyr::mutate(across(BioTest:WritingScore,~ as.numeric(scale(.))))
Note: select is not needed inside across.
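To see why the column headers show [,1] unless the result is converted, the difference between the raw scale output and its as.numeric version can be inspected on a small vector. This is only an illustrative sketch in base R; the values in x are arbitrary sample numbers.

```r
x <- c(165, 166, 172)

# scale() centers and scales x and returns a one-column matrix
z <- scale(x)
print(is.matrix(z))
print(names(attributes(z)))  # dim plus the stored center and scale

# as.numeric() drops the dim and the extra attributes, leaving a plain vector
v <- as.numeric(z)
print(is.matrix(v))

# The values match the usual z-score formula (x - mean) / sd
print(all.equal(v, (x - mean(x)) / sd(x)))
```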
Maybe try this. The problem is in your across() call: the function must go inside it:
library(dplyr)
#Code
DF %>%
group_by(Year) %>%
mutate(across(BioTest:WritingScore,~scale(.)[,1]))
Output:
# A tibble: 5 x 9
# Groups: Year [3]
Var1 Var2 Year BioTest[,1] MathExam[,1] WritingScore[,1] Var3 Var X4
<chr> <chr> <int> <dbl> <dbl> <dbl> <chr> <chr> <lgl>
1 X X 2016 -0.707 0.707 NaN X X NA
2 X X 2017 0.707 -0.707 -0.707 X X NA
3 X X 2018 NaN NaN NaN X X NA
4 X X 2016 0.707 -0.707 NaN X X NA
5 X X 2017 -0.707 0.707 0.707 X X NA
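The NaN entries in this output come from groups where the standard deviation is zero or undefined: 2018 has a single row, and both 2016 rows share the same WritingScore. If such groups should not produce NaN, one possible workaround is sketched below. The helper safe_scale is hypothetical (it is not part of dplyr or base R), and returning 0 for these groups is an assumption that may not suit every analysis.

```r
library(dplyr)

# Hypothetical helper: z-score a vector, but return zeros when the
# standard deviation is zero or cannot be computed (fewer than two values).
safe_scale <- function(x) {
  s <- sd(x, na.rm = TRUE)
  if (is.na(s) || s == 0) {
    return(rep(0, length(x)))
  }
  (x - mean(x, na.rm = TRUE)) / s
}

# Only the columns needed for the illustration, taken from the question
DF <- data.frame(
  Year = c(2016L, 2017L, 2018L, 2016L, 2017L),
  BioTest = c(165L, 172L, 169L, 166L, 165L),
  MathExam = c(140L, 128L, 115L, 139L, 140L),
  WritingScore = c(10L, 11L, 8L, 10L, 12L)
)

out <- DF %>%
  group_by(Year) %>%
  dplyr::mutate(across(BioTest:WritingScore, safe_scale)) %>%
  ungroup()

print(out)
```

Whether 0, NA, or dropping the group is the right choice depends on how the standardized scores are used afterwards.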
Some data used:
#Data
DF <- structure(list(Var1 = c("X","X","X","X","X"),
Var2 = c("X","X","X","X","X"),
Year = c(2016L,2017L,2018L,2016L,2017L),
BioTest = c(165L,172L,169L,166L,165L),
MathExam = c(140L,128L,115L,139L,140L),
WritingScore = c(10L,11L,8L,10L,12L),
Var3 = c("X","X","X","X","X"),
Var = c("X","X","X","X","X"),
X4 = c(NA,NA,NA,NA,NA)),
class = "data.frame",row.names = c(NA,-5L))