Problem description
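I'm new to coding. I've been working on this for about a week and I'm stuck, so please be gentle.
What I want to do is grab all the data from the URL and put it into a CSV file in the format shown by the print statement.
I've managed to print one row, but I don't know how to loop over all the other rows and append them to the CSV file. Any tips or hints?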
import io
import sys
# Re-wrap stdout and stderr so non-ASCII text prints as UTF-8
sys.stdout = io.TextIOWrapper(sys.stdout.detach(),encoding = 'utf-8')
sys.stderr = io.TextIOWrapper(sys.stderr.detach(),encoding = 'utf-8')
import urllib.request
from bs4 import BeautifulSoup
url = "https://en.wikipedia.org/wiki/2022_FIFA_World_Cup_qualification_%E2%80%93_CAF_First_Round"
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page,"lxml")
dateLists = soup.find_all(attrs={"class" : "bday dtstart published updated"})
timeLists = soup.find_all(attrs={"class" : "mobile-float-reset ftime"})
homeTeamLists = soup.find_all(attrs={"class" : "fhome"})
awayTeamLists = soup.find_all(attrs={"class" : "faway"})
scoreLists = soup.find_all(attrs={"class" : "fscore"})
venueLists = soup.find_all('span',attrs={"itemprop" : "name address"})
date = dateLists[0].text.strip()
time = timeLists[0].text.strip()
homeTeam = homeTeamLists[0].text.strip()
awayTeam = awayTeamLists[0].text.strip()
score = scoreLists[0].text.strip()
venue = venueLists[0].text.strip()
print(date,time,homeTeam,score,awayTeam,venue)
Solution
You just need to iterate over every item in the lists. You can use enumerate to get the index position, then use that position to append each item to its list and build the data frame:
import urllib.request
from bs4 import BeautifulSoup
import pandas as pd
url = "https://en.wikipedia.org/wiki/2022_FIFA_World_Cup_qualification_%E2%80%93_CAF_First_Round"
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page,"lxml")
dateLists = soup.find_all(attrs={"class" : "bday dtstart published updated"})
timeLists = soup.find_all(attrs={"class" : "mobile-float-reset ftime"})
homeTeamLists = soup.find_all(attrs={"class" : "fhome"})
awayTeamLists = soup.find_all(attrs={"class" : "faway"})
scoreLists = soup.find_all(attrs={"class" : "fscore"})
venueLists = soup.find_all('span',attrs={"itemprop" : "name address"})
dateList = []
timeList = []
homeTeamList = []
awayTeamList = []
scoreList = []
venueList = []
for idx,v in enumerate(dateLists):
    dateList.append(dateLists[idx].text.strip())
    timeList.append(timeLists[idx].text.strip())
    homeTeamList.append(homeTeamLists[idx].text.strip())
    awayTeamList.append(awayTeamLists[idx].text.strip())
    scoreList.append(scoreLists[idx].text.strip())
    venueList.append(venueLists[idx].text.strip())
df = pd.DataFrame({'date':dateList,'time':timeList,'home':homeTeamList,'away':awayTeamList,'score':scoreList,'venue':venueList})
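The question asks for a CSV file, and the code above stops at the data frame. With pandas, calling df.to_csv("matches.csv", index=False) should write it out; the file name is only an example. As an alternative that does not need pandas, here is a minimal sketch using the standard csv module. The function name write_matches and the sample values are placeholders, not part of the original answer; in the real script the columns would be the stripped text lists built in the loop.

```python
import csv
import io


def write_matches(columns, header, out):
    """Write parallel column lists as CSV rows to a file-like object."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    # zip stops at the shortest list, so a match with a missing venue
    # shortens the output instead of raising IndexError
    writer.writerows(zip(*columns))


# Placeholder values standing in for the text scraped from the page
date_list = ["2019-01-01", "2019-01-02"]
time_list = ["15:00", "16:00"]
home_list = ["Team A", "Team C"]
score_list = ["1-0", "2-2"]
away_list = ["Team B", "Team D"]
venue_list = ["Stadium X", "Stadium Y"]

header = ["date", "time", "home", "score", "away", "venue"]
columns = [date_list, time_list, home_list, score_list, away_list, venue_list]

# An in-memory buffer keeps the example self-contained
buffer = io.StringIO()
write_matches(columns, header, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with open("matches.csv", "w", newline="", encoding="utf-8") as the out argument. Because zip silently truncates to the shortest list, it is worth comparing the list lengths first if every match must appear in the output.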