Problem description
Code:
library(pdftools)
library(data.table)
library(tabulizer)
pdf_file <- "new.pdf"
out2 <- extract_tables(pdf_file, pages = 89, output = "data.frame")
out2 <- as.data.table(out2)
colnames(out2)
Actual output:
"Group.1" "Day.7" "Day.8" "Day.9"
"Group.2" "Day.10" "Day.11" "Day.12"
Expected output:
"Day.7" "Day.8" "Day.9"
"Day.10" "Day.11" "Day.12"
Also, please suggest any other R packages for extracting data tables from a PDF (other than pdftools and tabulizer).
Solution
This removes the columns whose names start with "G":
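# out2 is a data.table here, so with = FALSE is needed for the logical vector to select columns
result <- out2[, !startsWith(names(out2), "G"), with = FALSE]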
Alternatively, you can use dplyr::select:
library(dplyr)
dplyr::select(out2,-starts_with("G"))
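
To try the column filter without the PDF, here is a minimal base R sketch. The data frame `toy` and its values are invented for illustration; only the column names are taken from the question.

```r
# Hypothetical stand-in for one extracted table; the values are made up,
# only the column names mirror the question.
toy <- data.frame(
  Group.1 = c("A", "B"),
  Day.7   = c(1.2, 3.4),
  Day.8   = c(5.6, 7.8),
  Day.9   = c(9.0, 1.1)
)

# Logical vector marking the columns whose names do NOT start with "G".
keep <- !startsWith(names(toy), "G")

# drop = FALSE keeps the result a data frame even if only one column remains.
result <- toy[, keep, drop = FALSE]

print(names(result))
```

For a plain data frame this prints the three `Day.*` names. If the object has been converted to a data.table, the same `keep` vector should work as `toy[, keep, with = FALSE]`.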