Creating a new list based on a pattern in the list structure

Problem description

I have some data that looks like this:

   dat <- c("Sales","Jim","Halpert","","","Reception","Pam","Beasley","","","Not.Manager","Dwight","Schrute","Bears","Beets","BattlestarGalactica","","","Manager","Michael","Scott","","")

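Each "chunk" of data is contiguous, and the chunks are separated by blank entries. I want to convert the data into a list of lists that looks like this: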

iwant <- list(
           c("Sales","Halpert"),c("Reception","Beasley"),c("Not.Manager","BattlestarGalactica"),c("Manager","Scott")
           )

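Any suggestions? I am using rvest and stringi, and I would rather not add more packages.

Solution

I suggest the following approach. You will end up with a data frame whose columns are close to the format you want: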

#Split chains
L1 <- strsplit(paste0(dat,collapse = " "),split = "  ")
#Split vectors from each chain
L2 <- lapply(L1[[1]],function(x) strsplit(trimws(x),split = " "))
#Format
L2 <- lapply(L2,as.data.frame)
#Remove zero dim data
L2[which(lapply(L2,nrow)==0)]<-NULL
#Format names
L2 <- lapply(L2,function(x) {names(x)<-'v';return(x)})
#Transform to dataframe
D1 <- as.data.frame(do.call(cbind,L2))
#Rename
names(D1) <- paste0('V',1:dim(D1)[2])
#Remove recycled values
D1 <- as.data.frame(apply(D1,2,function(x) {x[duplicated(x)]<-NA;return(x)}))

Output:

       V1        V2                  V3      V4
1   Sales Reception         Not.Manager Manager
2     Jim       Pam              Dwight Michael
3 Halpert   Beasley             Schrute   Scott
4    <NA>      <NA>               Bears    <NA>
5    <NA>      <NA>               Beets    <NA>
6    <NA>      <NA> BattlestarGalactica    <NA>

You can use rle, split, and lapply together:

lapply(split(dat,with(rle(dat != ''),rep(cumsum(values),lengths))),function(x) x[x!= ''])

#$`1`
#[1] "Sales"   "Jim"     "Halpert"

#$`2`
#[1] "Reception" "Pam"       "Beasley"  

#$`3`
#[1] "Not.Manager"         "Dwight"              "Schrute"             "Bears"               "Beets"
#[6] "BattlestarGalactica"

#$`4`
#[1] "Manager" "Michael" "Scott"  

The rle part creates the groups to split on:

with(rle(dat != ''),rep(cumsum(values),lengths))
#[1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 3 3 3 4 4 4 4 4
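To see where that group vector comes from, the one-liner can be unpacked into its intermediate pieces. This is only a sketch: it assumes dat has two empty strings after each block, which is what the 23 group ids above imply, and the names r, block_id, and grp are made up for illustration.

```r
# Assumed input: each block is followed by two empty strings
dat <- c("Sales","Jim","Halpert","","",
         "Reception","Pam","Beasley","","",
         "Not.Manager","Dwight","Schrute","Bears","Beets","BattlestarGalactica","","",
         "Manager","Michael","Scott","","")

# Runs of non-empty (TRUE) and empty (FALSE) entries
r <- rle(dat != "")
r$values   # TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE
r$lengths  # 3 2 3 2 6 2 3 2

# cumsum() increases by one at every TRUE run, so each run of blanks
# inherits the id of the block just before it
block_id <- cumsum(r$values)  # 1 1 2 2 3 3 4 4

# Expand the per-run ids back to one id per element of dat
grp <- rep(block_id, r$lengths)
```

Because the blanks share an id with the block before them, split() keeps them in the same group, which is why a later step still has to drop the empty strings.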

After split, we use lapply to remove all the empty elements from each list.
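The list in the question appears to keep only the first and last entry of each block. If that reading is right, one more lapply gets there in base R, with no extra packages. A minimal sketch, assuming the same dat with two blanks after each block; blocks and first_last are hypothetical names:

```r
# Assumed input: each block is followed by two empty strings
dat <- c("Sales","Jim","Halpert","","",
         "Reception","Pam","Beasley","","",
         "Not.Manager","Dwight","Schrute","Bears","Beets","BattlestarGalactica","","",
         "Manager","Michael","Scott","","")

# Split into blocks and drop the empty strings, as in the answer above
blocks <- lapply(split(dat, with(rle(dat != ""), rep(cumsum(values), lengths))),
                 function(x) x[x != ""])

# Keep only the first and last element of each block
first_last <- lapply(blocks, function(x) c(x[1], x[length(x)]))

# Drop the "1", "2", ... names that split() added
first_last <- unname(first_last)
```

Note that a block with a single element would come back with that element repeated twice, so guard for length(x) == 1 if such blocks can occur.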