Custom cleaning of a string vector with janitor::make_clean_names()

Question

I have a vector containing the column names of a data frame, and I want to clean those strings.

vec_of_names <- c("FirsT_column","another-column","ALLCAPS-column","cOLumn-with___specialsuffix","blah#4-column","ANOTHER_EXAMPLE___specialsuffix","THIS_IS-Misleading_specialsuffix")

Specifically, I want to use janitor::make_clean_names() for this cleaning.

janitor::make_clean_names(vec_of_names)

[1] "first_column"                     "another_column"                  
[3] "allcaps_column"                   "c_o_lumn_with_specialsuffix"     
[5] "blah_number_4_column"             "another_example_specialsuffix"   
[7] "this_is_misleading_specialsuffix"

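However, I want to apply the following rules:

1. When a string ends in ___specialsuffix (i.e. three underscores followed by "specialsuffix"):
   - clean ONLY the part BEFORE ___specialsuffix (i.e. the first element returned by
     strsplit(x, "___specialsuffix")) with janitor::make_clean_names();
   - then paste the cleaned string back together with ___specialsuffix.
2. Otherwise, if the string does not end in ___specialsuffix, clean the entire string with janitor::make_clean_names() as usual.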

The desired output would therefore be:

[1] "first_column"                     "another_column"                  
[3] "allcaps_column"                   "c_o_lumn_with___specialsuffix"     ## elements [4] and [6]
[5] "blah_number_4_column"             "another_example___specialsuffix"   ## were handled according to rule #1
[7] "this_is_misleading_specialsuffix"                                     ## outlined above
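The special handling applies only where a name ends in exactly three underscores followed by "specialsuffix". A quick base-R check, sketched below with endsWith() (a literal match, no regular expression), suggests that only elements [4] and [6] qualify; element [7] ends in "_specialsuffix" with a single underscore and is therefore cleaned as a whole.

```r
vec_of_names <- c("FirsT_column","another-column","ALLCAPS-column","cOLumn-with___specialsuffix","blah#4-column","ANOTHER_EXAMPLE___specialsuffix","THIS_IS-Misleading_specialsuffix")

# TRUE only where the name ends in the literal string "___specialsuffix"
has_special_suffix <- endsWith(vec_of_names, "___specialsuffix")
which(has_special_suffix)
#> [1] 4 6
```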

Thanks in advance for any ideas!

Answer

vec_of_names <- c("FIRST_column","another-column","ALLCAPS-column","cOLumn-with___specialsuffix","blah#4-column","ANOTHER_EXAMPLE___specialsuffix","THIS_IS-Misleading_specialsuffix")
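
library(tidyverse)

# Extract the suffix where present; names without it get "" instead of NA
suffix <- vec_of_names %>% str_extract(pattern = "___specialsuffix$") %>% replace_na("")
# Strip the suffix (if any), then clean what remains of every name in one call
cleaned_without_suffix <- vec_of_names %>% str_remove("___specialsuffix$") %>% janitor::make_clean_names()

# Re-attach the untouched suffix ("" for names that never had one)
output <- paste0(cleaned_without_suffix, suffix)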
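If you would rather avoid the tidyverse dependency, the same idea (strip the suffix, clean the rest of the whole vector in one call, re-attach the suffix) can be wrapped in a small base-R function. This is only a sketch: the name clean_names_keep_suffix and its suffix/cleaner arguments are hypothetical, not part of janitor. It uses endsWith() and substr() so the suffix is matched literally rather than as a regular expression.

```r
# Hypothetical helper, not part of janitor: clean each name with `cleaner`,
# but keep a literal trailing `suffix` untouched.
clean_names_keep_suffix <- function(x,
                                    suffix = "___specialsuffix",
                                    cleaner = janitor::make_clean_names) {
  # TRUE for names that end in the protected suffix (literal match, no regex)
  has_suffix <- endsWith(x, suffix)
  # Drop the suffix from those names; leave all other names as they are
  stem <- ifelse(has_suffix, substr(x, 1, nchar(x) - nchar(suffix)), x)
  # Clean all stems in a single call, as the tidyverse version above does
  cleaned <- cleaner(stem)
  # Re-attach the suffix only where it was removed
  ifelse(has_suffix, paste0(cleaned, suffix), cleaned)
}
```

Usage would be clean_names_keep_suffix(vec_of_names), or names(df) <- clean_names_keep_suffix(names(df)) to rename a data frame's columns directly. Because all stems are cleaned in a single call, make_clean_names() may still de-duplicate stems that collide after cleaning (e.g. by appending _2), even when the final names would differ once the suffix is re-attached.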