Problem description
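I'm still very new to coding, and especially to web scraping, but here is what I'm trying to do:
I want to scrape fbref.com to create a data frame of some match statistics for each Premier League team.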
library(rvest)
page <- "https://fbref.com/en/comps/9/Premier-League-Stats"
scraped_page <- read_html(page)
teamLinks <- scraped_page %>%
  html_nodes("#stats_squads_standard_for a") %>%
  html_attr("href")
# The hrefs start with "/" (e.g. "/en/squads/..."), so the base URL takes no trailing slash
teamLinks <- paste0("https://fbref.com", teamLinks)
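As a quick illustration of the paste0() step, here is a base-R sketch, assuming the scraped hrefs are root-relative paths that begin with "/". The two hrefs below are made-up placeholders, not real fbref squad IDs. Pasting them onto a base URL that already ends in "/" produces a doubled slash; leaving the slash off the base avoids it.

```r
# Made-up root-relative hrefs of the shape html_attr("href") is assumed to return
hrefs <- c("/en/squads/aaaa1111/Arsenal-Stats",
           "/en/squads/bbbb2222/Chelsea-Stats")

# Base ending in "/" plus a path starting with "/" gives "...com//en/..."
doubled <- paste0("https://fbref.com/", hrefs)

# Base without the trailing slash gives a clean absolute URL
teamLinks <- paste0("https://fbref.com", hrefs)
print(teamLinks)
```

If the xml2 package is installed, xml2::url_absolute(hrefs, "https://fbref.com/") should resolve relative links the same way, and it also copes with hrefs that are not root-relative.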
I can also create a list of each team's name from the same page:
Team <- scraped_page %>%
  html_nodes('#stats_squads_standard_for .left') %>%
  html_text() %>%
  as.character()
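Because Team and teamLinks come from two separate selectors, it can help to check that they line up and to pair them once, so each link carries its team name into later steps. A minimal base-R sketch, with made-up values standing in for the scraped vectors; links_by_team is a hypothetical name used only for illustration:

```r
# Made-up stand-ins for the scraped Team and teamLinks vectors
Team <- c("Arsenal", "Chelsea")
teamLinks <- c("https://fbref.com/en/squads/aaaa1111/Arsenal-Stats",
               "https://fbref.com/en/squads/bbbb2222/Chelsea-Stats")

# Stop early if the two selectors matched a different number of nodes
stopifnot(length(Team) == length(teamLinks))

# A named vector: look up a team's page by name instead of by position
links_by_team <- setNames(teamLinks, Team)
print(links_by_team[["Chelsea"]])
```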
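But now I want to create a separate data frame for each team by scraping each team's page for specific statistics. I have a for loop that gets the statistics I need, but I don't know how to keep the results separate or how to name each data frame after its team.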
for (i in 1:length(teamLinks)){
  url <- teamLinks[i]
  scraped_url <- read_html(url)
  Team <- scraped_page %>%
    html_nodes('#stats_squads_standard_for .left') %>%
    html_text() %>%
    as.character()
  df_name <- paste0(Team[i])
  df <- {
    Comp <- scraped_url %>%
      html_nodes(comp) %>%
      html_text()
    Venue <- scraped_url %>%
      html_nodes(venue) %>%
      html_text()
    Result <- scraped_url %>%
      html_nodes(result) %>%
      html_text()
    Goals_For <- scraped_url %>%
      html_nodes(GF) %>%
      html_text()
    Goals_Against <- scraped_url %>%
      html_nodes(GA) %>%
      html_text()
    Opponent <- scraped_url %>%
      html_nodes(Opp) %>%
      html_text()
    xG <- scraped_url %>%
      html_nodes(xg) %>%
      html_text()
    xGA <- scraped_url %>%
      html_nodes(xga) %>%
      html_text()
    Possession <- scraped_url %>%
      html_nodes(poss) %>%
      html_text()
    Formation <- scraped_url %>%
      html_nodes(formation) %>%
      html_text()
    data.frame(Comp, Venue, Goals_For, Goals_Against, Opponent, xG, xGA, Possession, Formation)
  }
}
Any help cleaning up the for loop would also be appreciated.
These are the values of each of the HTML selector variables:
comp <- ".left:nth-child(3) a"
venue <- ".left:nth-child(6)"
result <- "#matchlogs_for .left+ .center"
GF <- "#matchlogs_for .right:nth-child(8)"
GA <- "#matchlogs_for .right:nth-child(9)"
Opp <- ".left:nth-child(10)"
xg <- "#matchlogs_for td.left+ .right"
xga <- "#matchlogs_for .right:nth-child(12)"
poss <- "#matchlogs_for td:nth-child(13)"
formation <- ".left:nth-child(16)"
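Since every column in the loop follows the same scraped_url %>% html_nodes(selector) %>% html_text() pattern, one way to tidy it is a small helper that takes a named vector of selectors and returns a data frame. This is a sketch that has not been run against fbref itself: scrape_columns() is a hypothetical helper name, and the inline HTML is a made-up stand-in for a match-log page. The helper also checks that every selector matched the same number of nodes. Selectors such as ".left:nth-child(3) a" are not limited to #matchlogs_for and may pick up cells from other tables, and with mismatched lengths data.frame() either fails with "arguments imply differing number of rows" or, when one length divides the other, silently recycles values.

```r
library(rvest)

# Hypothetical helper: scrape the text of each CSS selector from one parsed page.
# `selectors` is a named character vector; its names become the column names.
scrape_columns <- function(page, selectors) {
  cols <- lapply(selectors, function(css) {
    html_text(html_nodes(page, css), trim = TRUE)
  })
  n <- lengths(cols)
  # data.frame() would error, or silently recycle, on mismatched lengths
  if (length(unique(n)) != 1) {
    stop("selectors matched different numbers of nodes: ",
         paste(names(n), n, sep = "=", collapse = ", "))
  }
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Made-up HTML standing in for a team's match-log page
page <- read_html('
  <table id="matchlogs_for">
    <tr><td class="opp">Chelsea</td><td class="gf">2</td></tr>
    <tr><td class="opp">Everton</td><td class="gf">1</td></tr>
  </table>')

selectors <- c(Opponent  = "#matchlogs_for td.opp",
               Goals_For = "#matchlogs_for td.gf")
df <- scrape_columns(page, selectors)
print(df)
```

Inside the loop this would become something like df <- scrape_columns(read_html(teamLinks[i]), c(Comp = comp, Venue = venue, Result = result, ...)). In rvest 1.0 and later, html_elements() is the newer name for html_nodes(); both work. Another option worth trying is html_table() on the #matchlogs_for table, which parses the whole table at once.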
Thanks in advance!
Solution
You can create an empty list before the loop and save each data frame into that list, like this:
TeamList <- list()
for (i in seq_along(teamLinks)) {
  # [...] your scraping code that leads to a "df"
  TeamList[[i]] <- df
}
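The same list can also be built without pre-allocating it or tracking i, by wrapping the per-team scraping in a function and calling lapply(). In this sketch scrape_team() is a hypothetical stand-in that returns a fixed data frame instead of fetching anything; in real use its body would be the read_html() and selector code from the question.

```r
# Hypothetical stand-in for the real per-team scraping code
scrape_team <- function(url) {
  data.frame(url = url, Goals_For = c("2", "1"), stringsAsFactors = FALSE)
}

teamLinks <- c("https://fbref.com/en/squads/aaaa1111/Arsenal-Stats",
               "https://fbref.com/en/squads/bbbb2222/Chelsea-Stats")
Team <- c("Arsenal", "Chelsea")

# lapply() returns one data frame per link, in the same order as teamLinks
TeamList <- lapply(teamLinks, scrape_team)
names(TeamList) <- Team
print(TeamList$Chelsea)
```

When scraping many pages for real, a short Sys.sleep() inside scrape_team() keeps the request rate down.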
Then name the data frames in TeamList after the teams, and use list2env() to turn the list of data frames into separate data frames:
names(TeamList) <- Team
list2env(TeamList,envir=.GlobalEnv)
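A small self-contained check of those two lines, using made-up data frames. One thing to watch for: team names such as "Manchester City" are not syntactic R names, so after list2env() those objects have to be accessed with backticks or get(). Keeping the list, or stacking it into one data frame with a Team column (shown here with base rbind() as an alternative to the answer's approach), avoids that.

```r
# Made-up match data standing in for the scraped data frames
TeamList <- list(
  data.frame(Opponent = c("Chelsea", "Everton"), Goals_For = c(2, 1)),
  data.frame(Opponent = c("Arsenal", "Spurs"), Goals_For = c(0, 3))
)
Team <- c("Arsenal", "Manchester City")
names(TeamList) <- Team

# One object per team in the global environment
invisible(list2env(TeamList, envir = .GlobalEnv))
print(`Manchester City`)   # non-syntactic name needs backticks
print(get("Arsenal"))

# Alternative: one data frame with a Team column
all_teams <- do.call(rbind, unname(Map(function(df, team) {
  data.frame(Team = team, df, stringsAsFactors = FALSE)
}, TeamList, names(TeamList))))
print(all_teams)
```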