How do I handle "incomplete final line found by readTableHeader on 'text'" in read.table() when the input is an R object rather than an imported file?

Problem description

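I am trying to create a data frame with read.table() from a named character vector that is built inside my program, not read from an external CSV. I have seen other solutions that add a carriage return to the end of the source file, but how do I do that when I pass a vector to the read function?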
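For comparison, the "add a trailing newline" advice can be applied to an in-memory vector by collapsing it into one string that ends in "\n". As far as I understand, read.table(text = ...) wraps its input in textConnection(), which already treats each vector element as its own newline-terminated line, so this should give the same data frame. A minimal sketch, using hypothetical toy rows rather than the scraped data:

```r
# Hypothetical toy rows standing in for the tab-separated `rows` vector.
rows <- c("1\tAlice\t3", "2\tBob\t1")
headers <- c("Pos", "Player", "Games")

# Passed as a vector: one element per line.
d_vector <- read.table(text = rows, header = FALSE, sep = "\t",
                       col.names = headers)

# "Adding a carriage return": collapse into a single string that ends in "\n".
rows_nl <- paste0(paste(rows, collapse = "\n"), "\n")
d_string <- read.table(text = rows_nl, header = FALSE, sep = "\t",
                       col.names = headers)

# Compare the two data frames.
print(identical(d_vector, d_string))
```

If the two calls also agree on the real rows, a missing final newline is probably not what triggers the warning.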

  library(rvest)  # read_html(), html_nodes(), html_text(); also re-exports %>%

  # Create an object that contains the link to the data source
  url <- "https://modules.ussquash.com/ssm/pages/leagues/list_scorecard.asp?id=116041"

  # Pull in the HTML of the web page
  page <- url %>%
    read_html()

  # Identify the 'ID' or tag of the table that contains the results. Usually,
  # with the rvest package, you would feed (%>%) the table object into html_table().
  # In this case US Squash has some malformed HTML in its source code, so we
  # have to use a different approach.
  table <- page %>% html_nodes(xpath = '//*[@id="corebody"]/table[4]')

  # Pull the actual contents of the table (not just the header)
  cells <- table %>%
    html_nodes(xpath = './/td[not(@class="Line")]') %>%
    html_text()
  # If a match was not played, drop the cells at positions 73 and 74
  if ("Match not played" %in% cells) {
    cells <- cells[-(73:74)]
  }

  # Pull the header of the table
  headers <- table %>%
    html_nodes(xpath = './/th') %>%
    html_text()

  # Set how many columns there are in the imported table
  Ncol <- 8

  # Collapse the flat vector of cells into tab-separated rows (one string per row)
  rows <- sapply(split(cells, rep(1:(length(cells) / Ncol), each = Ncol)),
                 paste, collapse = "\t")

  # Convert to an R data frame
  dd <- read.table(text = rows, header = FALSE, sep = "\t", col.names = headers)
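One plausible trigger for this warning with text = input is a cell whose text starts with a quote character (' or ") that is never closed: read.table() treats both as quote characters by default, so readTableHeader keeps reading to the end of the text looking for the closing quote. Whether the scraped scorecard actually contains such a cell is an assumption. The sketch below uses made-up cells (hypothetical, not the real US Squash data) to reproduce the warning, then shows two possible workarounds: turning quote handling off with quote = "" (plus comment.char = "" so a # inside a cell is kept), or skipping text parsing entirely by filling a matrix row by row.

```r
# Made-up cells (hypothetical): the second row's player cell starts with an
# apostrophe that is never closed.
Ncol <- 3
headers <- c("Pos", "Player", "Games")
cells <- c("1", "Alice", "3",
           "2", "'Bob",  "1",
           "3", "Carol", "2")

# Same row-collapsing step as in the question.
rows <- sapply(split(cells, rep(1:(length(cells) / Ncol), each = Ncol)),
               paste, collapse = "\t")

# 1) Default quoting: collect warnings (and any error) instead of printing them.
msgs <- character(0)
invisible(tryCatch(
  withCallingHandlers(
    read.table(text = rows, header = FALSE, sep = "\t", col.names = headers),
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  ),
  error = function(e) msgs <<- c(msgs, conditionMessage(e))
))
print(msgs)

# 2) Workaround: treat ' and " (and #) as ordinary characters.
dd <- read.table(text = rows, header = FALSE, sep = "\t",
                 col.names = headers, quote = "", comment.char = "",
                 stringsAsFactors = FALSE)
print(dd)

# 3) Alternative: no text parsing at all; reshape the cells into a matrix.
dd2 <- as.data.frame(matrix(cells, ncol = Ncol, byrow = TRUE),
                     stringsAsFactors = FALSE)
names(dd2) <- headers
dd2[] <- lapply(dd2, type.convert, as.is = TRUE)  # "3" -> 3L, names stay text
print(dd2)
```

The matrix route relies on the same assumption as the original split() call, namely that length(cells) is an exact multiple of Ncol.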
