Scraping data into R

Problem description

I am currently trying to scrape the Player Standard Stats table into R, but I cannot find the right table.

library(magrittr)  # provides the %>% pipe

html_link <- "https://fbref.com/en/comps/9/stats/Premier-League-Stats#stats_standard::1"
df <- html_link %>%
  xml2::read_html() %>%
  rvest::html_nodes("table") %>%
  rvest::html_table(fill = TRUE)

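The page offers a link that can be copied to the clipboard, so I tried using that link to scrape the data, but it does not look like I am getting the right result. Does anyone know how to automate this in R without downloading the CSV file?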

Thanks.

Solution

You can use the table's "embed link"...
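The embed URL used in this answer looks like it is made of two parts: the URL-encoded path of the fbref page and the id of the `div` that wraps the table. Assuming that pattern holds for other tables (it is inferred from this one URL, not from any documentation), a small helper can rebuild it. The function name `build_widget_url` and its arguments are hypothetical.

```r
# Hypothetical helper: rebuild the widget ("embed") URL from a page path and a div id.
# The pattern is inferred from the single URL in this answer and may not generalize.
build_widget_url <- function(page_path, div_id) {
  paste0(
    "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb",
    # reserved = TRUE also percent-encodes "/" so the path fits in a query parameter
    "&url=", utils::URLencode(page_path, reserved = TRUE),
    "&div=", div_id
  )
}

widget_url <- build_widget_url(
  "/en/comps/9/stats/Premier-League-Stats",
  "div_stats_standard"
)
print(widget_url)
```

For the inputs above this reproduces the URL used in the answer, so the same helper could be tried with another page path or div id.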

library(magrittr)  # provides the %>% pipe

url <- "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F9%2Fstats%2FPremier-League-Stats&div=div_stats_standard"

f <- url %>%
  xml2::read_html() %>%
  rvest::html_nodes("table") %>%
  rvest::html_table() %>%
  .[[1]]  # keep the first table on the page

> head(f)
                                                               
1 Rk              Player  Nation Pos          Squad    Age Born
2  1 Patrick van Aanholt  nl NED  DF Crystal Palace 30-170 1990
3  2       Tammy Abraham eng ENG  FW        Chelsea 23-136 1997
4  3           Che Adams eng ENG  FW    Southampton 24-217 1996
5  4    Tosin Adarabioyo eng ENG  DF         Fulham 23-144 1997
6  5             Adrián  es ESP  GK      Liverpool 34-043 1987
  Playing Time Playing Time Playing Time Playing Time Performance
1           MP       Starts          Min          90s         Gls
2           14           13        1,144         12.7           0
3           18           10          957         10.6           6
4           22           20        1,735         19.3           4
5           19           19        1,710         19.0           0
6            2            2          180          2.0           0
  Performance Performance Performance Performance Performance
1         Ast        G-PK          PK       PKatt        CrdY
2           1           0           0           0           1
3           1           6           0           0           0
4           4           4           0           0           1
5           0           0           0           0           1
6           0           0           0           0           0
  Performance Per 90 Minutes Per 90 Minutes Per 90 Minutes
1        CrdR            Gls            Ast            G+A
2           0           0.00           0.08           0.08
3           0           0.56           0.09           0.66
4           0           0.21           0.21           0.41
5           0           0.00           0.00           0.00
6           0           0.00           0.00           0.00
  Per 90 Minutes Per 90 Minutes Expected Expected Expected Expected
1           G-PK         G+A-PK       xG     npxG       xA  npxG+xA
2           0.00           0.08      0.8      0.8      0.8      1.6
3           0.56           0.66      5.5      5.5      0.9      6.3
4           0.21           0.41      5.1      5.1      4.3      9.4
5           0.00           0.00      0.8      0.8      0.1      0.9
6           0.00           0.00      0.0      0.0      0.0      0.0
  Per 90 Minutes Per 90 Minutes Per 90 Minutes Per 90 Minutes
1             xG             xA          xG+xA           npxG
2           0.06           0.06           0.12           0.06
3           0.51           0.08           0.60           0.51
4           0.26           0.22           0.49           0.26
5           0.04           0.01           0.05           0.04
6           0.00           0.00           0.00           0.00
  Per 90 Minutes        
1        npxG+xA Matches
2           0.12 Matches
3           0.60 Matches
4           0.49 Matches
5           0.05 Matches
6           0.00 Matches
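In the output above, the real column labels (`Rk`, `Player`, `MP`, ...) sit in the first data row, while the column names hold the group headers (`Playing Time`, `Performance`, ...). A minimal sketch of one way to tidy this up follows. It runs on a small hand-made data frame that mimics that layout instead of the live table, and the helper name `promote_header` is hypothetical.

```r
# Hypothetical helper: combine group names with the labels in row 1,
# use the result as column names, then drop row 1.
promote_header <- function(df) {
  groups <- names(df)  # top header level, "" where there is no group
  labels <- vapply(df, function(col) as.character(col[[1]]),
                   character(1), USE.NAMES = FALSE)
  new_names <- ifelse(groups == "", labels, paste(groups, labels, sep = "_"))
  names(df) <- make.unique(new_names)  # guard against repeated labels
  out <- df[-1, , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Toy data shaped like the scraped table (values taken from the output above).
toy <- data.frame(
  c("Rk", "1", "2"),
  c("Player", "Patrick van Aanholt", "Tammy Abraham"),
  c("Min", "1,144", "957"),
  c("Gls", "0", "6"),
  c("Gls", "0.00", "0.56"),
  stringsAsFactors = FALSE
)
names(toy) <- c("", "", "Playing Time", "Performance", "Per 90 Minutes")

clean <- promote_header(toy)

# Numbers arrive as text; strip thousands separators before converting.
clean[["Playing Time_Min"]] <- as.numeric(gsub(",", "", clean[["Playing Time_Min"]]))
print(clean)
```

Prefixing each label with its group keeps columns such as `Performance_Gls` and `Per 90 Minutes_Gls` distinct. The same call could be tried on `f`, though the live table may need extra cleanup that this sketch does not cover.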