Reading a PDF table into R when rows span different numbers of lines

Problem description

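I would like to read the following PDF into a tidy data frame in R: PDF Table. The table can span more than 70 pages.

I am comfortable reading tables in which every cell occupies a single line, but I am not sure how to extend that to tables whose rows span different numbers of lines.

Any help would be greatly appreciated!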

Solution

I suggest using tabulizer, which works well for extracting tables from PDF files. Here is the code for the file you shared:

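library(tabulizer)
# Extract the tables from the PDF; the result is a list of tables
lst <- extract_tables(file = '8-31-2020 Paragraph IV Update_0.pdf')
# Format: promote the first row to column names, then drop that row
renames <- function(x)
{
  colnames(x) <- x[1, ]
  # Note the empty column index: x[-1, drop = FALSE] would index x as a vector
  x <- x[-1, , drop = FALSE]
  return(as.data.frame(x))
}
# Apply to every extracted table
lst21 <- lapply(lst, renames)
# Bind all tables into one data frame
df <- do.call(rbind, lst21)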
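
Because rbind() on data frames requires every piece to have the same number of columns, it may be worth checking the extracted tables before binding them. The following is a minimal sketch of such a check. It assumes the extraction returns one character matrix per page; the small `lst` here is a hand-built stand-in (the page split is hypothetical, the values are taken from the output of this answer), and `n_cols`, `expected` and `odd_pages` are illustrative names.

```r
# Stand-in for the extracted list: one character matrix per page,
# each repeating the header row (hypothetical page split).
lst <- list(
  matrix(c("DRUG NAME", "STRENGTH",
           "Abacavir Sulfate", "300 mg"), ncol = 2, byrow = TRUE),
  matrix(c("DRUG NAME", "STRENGTH",
           "Abiraterone Acetate", "125 mg"), ncol = 2, byrow = TRUE)
)

# Number of columns detected on each page
n_cols <- vapply(lst, ncol, integer(1))

# Treat the most common width as the expected one and flag the rest
expected <- as.integer(names(which.max(table(n_cols))))
odd_pages <- which(n_cols != expected)

if (length(odd_pages) > 0) {
  warning("Pages with an unexpected column count: ",
          paste(odd_pages, collapse = ", "))
}
```

Any page listed in `odd_pages` would need to be inspected and repaired by hand before the rbind() step.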

Output (some rows):

head(df)

                                       DRUG NAME   DOSAGE FORM              STRENGTH
1                               Abacavir Sulfate       Tablets                300 mg
2                                       Abacavir Oral Solution              20 mg/mL
3 Abacavir Sulfate,Dolutegravir\rand Lamivudine       Tablets  600 mg/50 mg/300\rmg
4               Abacavir Sulfate and\rLamivudine       Tablets         600 mg/300 mg
5   Abacavir Sulfate,Lamivudine\rand Zidovudine       Tablets 300 mg/150 mg/300\rmg
6                            Abiraterone Acetate       Tablets                125 mg
          RLD/NDA DATE OF\rSUBMISSION NUMBER OF\rANDAs\rSUBMITTED 180-DAY\rSTATUS
1   Ziagen\r20977           1/28/2009                           1        Eligible
2   Ziagen\r20978          12/27/2012                           1        Eligible
3 Triumeq\r205551           8/14/2017                           5                
4  Epzicom\r21652           9/27/2007                           1        Eligible
5 Trizivir\r21205           3/22/2011                           1        Eligible
6   Yonsa\r210308           7/23/2018                           1                
  180-DAY\rDECISION\rPOSTING\rDATE DATE OF\rFIRST\rAPPLICANT\rAPPROVAL
1                        2/11/2020                           6/18/2012
2                        2/11/2020                           9/26/2016
3                                                                     
4                        2/11/2020                           9/29/2016
5                        2/11/2020                           12/5/2013
6                                                                     
  DATE OF FIRST\rCOMMERCIAL\rMARKETING BY\rFTF EXPIRATION\rDATE OF LAST\rQUALIFYING\rPATENT
1                                    6/19/2012                                    5/14/2018
2                                    9/15/2017                                    5/14/2018
3                                                                                 12/8/2029
4                                    9/29/2016                                    5/14/2018
5                                   12/17/2013                                    5/14/2018
6                                                                                 3/17/2034
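
In the output above, cells and headers that span several lines in the PDF come through with embedded `\r` characters. A minimal sketch of one way to tidy them with base R follows. The small `df` is a hand-built stand-in using values copied from the output, and `strip_cr` is a hypothetical helper name, not part of tabulizer.

```r
# Stand-in for a few columns of the extracted data frame
df <- data.frame(
  c("Abacavir Sulfate", "Abacavir Sulfate and\rLamivudine"),
  c("Ziagen\r20977", "Epzicom\r21652"),
  c("1/28/2009", "9/27/2007"),
  stringsAsFactors = FALSE
)
names(df) <- c("DRUG NAME", "RLD/NDA", "DATE OF\rSUBMISSION")

# Hypothetical helper: replace each carriage return with a single space
strip_cr <- function(x) gsub("\r", " ", x, fixed = TRUE)

# Clean the column names and every cell (gsub coerces columns to character)
names(df) <- strip_cr(names(df))
df[] <- lapply(df, strip_cr)

# Optionally parse a date column; the output uses month/day/year
df[["DATE OF SUBMISSION"]] <- as.Date(df[["DATE OF SUBMISSION"]],
                                      format = "%m/%d/%Y")
```

Replacing `\r` with a space keeps values such as `Ziagen 20977` readable; splitting that column into a name and a number would be a separate step.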
