Inserting a dot into the column names of wide-format data in R

Question

The following dataset is in wide format, with repeated measurements of "ql", "st", and "xy" prefixed by "a", "b", and "c":

df<-data.frame(id=c(1,2,3,4),ex=c(1,0,0,1),aql=c(5,4,NA,6),bql=c(5,7,NA,9),cql=c(5,7,NA,9),bst=c(3,7,8,9),cst=c(8,7,5,3),axy=c(1,9,4,4),cxy=c(5,3,1,4))

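I am looking for a way to insert a dot after the prefix letters "a", "b", and "c" while leaving the other columns (i.e. id, ex) unchanged. I have been trying to solve this with the gsub function, for example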

names(df) <- gsub("","\\.",names(df))

but that gives an undesired result. The expected output looks like

   id ex a.ql b.ql c.ql b.st c.st a.xy c.xy
1  1  1    5    5    5    3    8    1    5
2  2  0    4    7    7    7    7    9    3
3  3  0   NA   NA   NA    8    5    4    1
4  4  1    6    9    9    9    3    4    4

Answers

Try

sub("(^[a-c])(.+)","\\1.\\2",names(df))

# [1] "id"   "ex"   "a.ql" "b.ql" "c.ql" "b.st" "c.st" "a.xy" "c.xy"

or

sub("(?<=^[a-c])",".",names(df),perl = TRUE)

# [1] "id"   "ex"   "a.ql" "b.ql" "c.ql" "b.st" "c.st" "a.xy" "c.xy"
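Both calls above only return the new names; to change the data frame itself, the result still has to be assigned back to names(df). The following is a minimal sketch of that step, not part of the original answer. It assumes the data frame from the question (rebuilt here from the expected output) and uses a slightly stricter pattern that also checks for the known suffixes; new_names is just an illustrative variable name.

```r
# Data rebuilt from the expected output shown in the question (an assumption).
df <- data.frame(
  id = c(1, 2, 3, 4), ex = c(1, 0, 0, 1),
  aql = c(5, 4, NA, 6), bql = c(5, 7, NA, 9), cql = c(5, 7, NA, 9),
  bst = c(3, 7, 8, 9), cst = c(8, 7, 5, 3),
  axy = c(1, 9, 4, 4), cxy = c(5, 3, 1, 4)
)

# Insert a dot after a leading a, b, or c, but only when one of the known
# suffixes (ql, st, xy) makes up the rest of the name.
new_names <- sub("^([a-c])(ql|st|xy)$", "\\1.\\2", names(df))

# Assign the new names back so the data frame is actually renamed.
names(df) <- new_names
names(df)
```

Anchoring on the suffixes is a design choice: the looser pattern "(^[a-c])(.+)" would also rename an unrelated column such as "age" to "a.ge", which may or may not be wanted.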

You can do

setNames(df,sub("(ql$)|(st$)|(xy$)","\\.\\1\\2\\3",names(df)))

#>   id ex a.ql b.ql c.ql b.st c.st a.xy c.xy
#> 1  1  1    5    5    5    3    8    1    5
#> 2  2  0    4    7    7    7    7    9    3
#> 3  3  0   NA   NA   NA    8    5    4    1
#> 4  4  1    6    9    9    9    3    4    4

Another way to try

library(dplyr)
library(stringr)  # provides str_replace()
df %>% 
  rename_at(vars(aql:cxy),~ str_replace(.,"(?<=\\w{1})","\\."))
#   id ex a.ql b.ql c.ql b.st c.st a.xy c.xy
# 1  1  1    5    5    5    3    8    1    5
# 2  2  0    4    7    7    7    7    9    3
# 3  3  0   NA   NA   NA    8    5    4    1
# 4  4  1    6    9    9    9    3    4    4

You can also try a tidyverse approach that reshapes the data, as follows:

library(tidyverse)
#Data
df<-data.frame(id=c(1,2,3,4),ex=c(1,0,0,1),aql=c(5,4,NA,6),bql=c(5,7,NA,9),cql=c(5,7,NA,9),bst=c(3,7,8,9),cst=c(8,7,5,3),axy=c(1,9,4,4),cxy=c(5,3,1,4))
#Reshape
df %>% pivot_longer(-c(1,2)) %>%
  mutate(name=paste0(substring(name,1,1),'.',substring(name,2,nchar(name)))) %>%
  pivot_wider(names_from = name,values_from=value)

Output:

# A tibble: 4 x 9
     id    ex  a.ql  b.ql  c.ql  b.st  c.st  a.xy  c.xy
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     1     5     5     5     3     8     1     5
2     2     0     4     7     7     7     7     9     3
3     3     0    NA    NA    NA     8     5     4     1
4     4     1     6     9     9     9     3     4     4
