Count the first letter of each string and show the number of occurrences, without sorting alphabetically, in R

Problem description

I have written the following code to count how many times the first letter of each code appears in a particular column of a table:

#a test data frame    
test <- data.frame("State" = c("PA","RI","SC"),"Code1" = c("EFGG,AFGG","SSAG","AFGG,SSAG"))

#code to count method codes
test[] <- lapply(test,as.character)


test_counts <- sapply(strsplit(test$Code1, ",\\s*"), function(x) {
  tab <- table(substr(x, 1, 1)) # Create a table of the first letters
  paste0(names(tab), tab, collapse = ",") # Paste each letter with its count and collapse them
})

#example of output
[1] "A1,E1" "S1"     "A1,S1"

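Everything about the current code is perfect, except that I would like R not to output the counts in alphabetical order. I want it to keep the order in which the codes appear. This is what I would like the output to look like: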

 [1] "E1,A1","S1","A1,S1"

Thanks!

Solution

Here is a base R option that solves the problem using factor, with the levels set in order of first appearance:

sapply(
  strsplit(test$Code1,","),function(x) {
    toString(
      do.call(
        paste0,rev(stack(table(factor(u<-substr(x,1,1),levels = unique(u)))))
      )
    )
  }
)

which gives

[1] "E1, A1" "S1"     "A1, S1"
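
Note that toString() joins its elements with ", ". If the separator should be a bare comma, as in the desired output, the same idea (a factor whose levels follow first appearance, so that table() keeps that order) can be written more explicitly. This is only a minimal sketch: the function name count_first_letters and its sep argument are hypothetical, not from the original post.

```r
# Hypothetical helper (name and arguments are illustrative, not from the post).
# Counts first letters per element, keeping the order of first appearance.
count_first_letters <- function(codes, sep = ",") {
  # Split on commas, tolerating optional whitespace after each comma
  parts <- strsplit(codes, ",\\s*")
  vapply(parts, function(x) {
    first <- substr(x, 1, 1)
    # Levels follow first appearance, so table() does not sort alphabetically
    tab <- table(factor(first, levels = unique(first)))
    paste0(names(tab), tab, collapse = sep)
  }, character(1))
}

test <- data.frame(State = c("PA", "RI", "SC"),
                   Code1 = c("EFGG,AFGG", "SSAG", "AFGG,SSAG"),
                   stringsAsFactors = FALSE)

result <- count_first_letters(test$Code1)
# result is c("E1,A1", "S1", "A1,S1")
```

Using vapply() with character(1) instead of sapply() is a design choice here: it guarantees a character vector even when the input is empty.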

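Another option uses the tidyverse. We split the rows with separate_rows, keep the first letter of each code with substr, count by State and Code1, arrange, unite Code1 with the frequency column n, and finally group_by State and paste the pieces together with toString. Note that count() returns its groups sorted, so this version lists the letters alphabetically within each state rather than in order of appearance.

library(dplyr)
library(tidyr)

test %>%
  separate_rows(Code1) %>%
  mutate(Code1 = substr(Code1, 1, 1)) %>%   # keep only the first letter
  count(State, Code1) %>%                   # frequency per state and letter
  arrange(State, n) %>%
  unite(Code1, Code1, n, sep = "") %>%      # e.g. "A" and 1 become "A1"
  group_by(State) %>%
  summarise(Code1 = toString(Code1), .groups = 'drop') %>%
  pull(Code1)
#[1] "A1, E1" "S1"     "A1, S1"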
