Problem description
I am trying to scrape the details of all upcoming events from an institute's website:
import requests
from bs4 import BeautifulSoup
response = requests.get("http://www.iitg.ac.in/home/eventsall/events")
soup = BeautifulSoup(response.content,"html.parser")
cards = soup.find_all("div",attrs={"class": "newsarea"})
iitg_title = []
iitg_date = []
iitg_link = []
for card in cards[0:6]:
    iitg_date.append(card.find("div",attrs={"class": "ndate"}).text)
    iitg_title.append(card.find("div",attrs={"class": "ntitle"}).text.strip())
    iitg_link.append(card.find("div",attrs={"class": "ntitle"}).a['href'])
print("Upcoming event details scraped from iitg website:- \n")
for i in range(len(iitg_title)):
    print("Title:- ",iitg_title[i])
    print("Dates:- ",iitg_date[i])
    print("Link:- ",iitg_link[i])
    print('\n')
The code above gives me these details:
Upcoming event details scraped from iitg website:-
Title:- 4 batch for the certification programme on AI & ML by Eckovation in association with E&ICT Academy IIT Guwahati
Dates:- 15 Aug 2020 - 15 Aug 2020
Link:- http://eict.iitg.ac.in/online_courses_training.html
Title:- 8th International and 47th National conference on Fluid Mechanics and Fluid Power
Dates:- 09 Dec 2020 - 11 Dec 2020
Link:- https://event.iitg.ac.in/fmfp2020/
Title:- 4 months Internship programme on VLSI Circuit Design
Dates:- 10 Aug 2020 - 10 Dec 2020
Link:- http://eict.iitg.ac.in/online_courses_training.html
Title:- 6 week Training cum Internship programme on AI & ML under TEQIP-III orgainsed by Assam Science Technology University
Dates:- 10 Aug 2020 - 20 Sep 2020
Link:- http://eict.iitg.ac.in/online_courses_training.html
Title:- 6 week Training cum Internship programme on Industry 4.0 (Industrial IoT) under TEQIP-III orgainsed by Assam Science Technology University
Dates:- 10 Aug 2020 - 20 Sep 2020
Link:- http://eict.iitg.ac.in/online_courses_training.html
Title:- 6 week Training cum Internship programme on Robotics Fundamentals under TEQIP-III orgainsed by Assam Science Technology University
Dates:- 10 Aug 2020 - 20 Sep 2020
Link:- http://eict.iitg.ac.in/online_courses_training.html
For the past five hours I have been struggling to store the results in a way that lets me access them later with a simple for loop.
How can I do this?
Solution
For example, you can use the json module to write the data to disk:
import json
import requests
from bs4 import BeautifulSoup
response = requests.get("http://www.iitg.ac.in/home/eventsall/events")
soup = BeautifulSoup(response.content, "html.parser")
cards = soup.find_all("div", attrs={"class": "newsarea"})
events = []
for card in cards[0:6]:
    # store each event as one (title, date, link) tuple
    title_div = card.find("div", attrs={"class": "ntitle"})
    events.append((
        title_div.text.strip(),
        card.find("div", attrs={"class": "ndate"}).text,
        title_div.a['href']
    ))
# save data:
with open('data.json', 'w') as f_out:
    json.dump(events, f_out)
# ...
# load data back:
with open('data.json', 'r') as f_in:
    events = json.load(f_in)
print("Upcoming event details scraped from iitg website:- \n")
for t, d, l in events:
    print("Title:- ", t)
    print("Dates:- ", d)
    print("Link:- ", l)
    print('\n')
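One detail worth knowing: JSON has no tuple type, so `json.load` gives each event back as a list rather than a tuple. Unpacking with `for t, d, l in events` still works, but if you would rather read the fields by name, a minimal sketch is to store each event as a dict. The keys `title`, `date` and `link` are an arbitrary choice, and the sample rows are hard-coded from the output above so the snippet runs without a network request.

```python
import json

# Hard-coded sample rows standing in for the scraped cards
# (assumption: the same three fields the scraper extracts).
scraped = [
    ("8th International and 47th National conference on Fluid Mechanics and Fluid Power",
     "09 Dec 2020 - 11 Dec 2020",
     "https://event.iitg.ac.in/fmfp2020/"),
    ("4 months Internship programme on VLSI Circuit Design",
     "10 Aug 2020 - 10 Dec 2020",
     "http://eict.iitg.ac.in/online_courses_training.html"),
]

# JSON turns tuples into lists on a round trip.
roundtrip = json.loads(json.dumps(scraped))

# Store each event as a dict so its fields can be read by name later.
events = [{"title": t, "date": d, "link": l} for t, d, l in scraped]

with open("data.json", "w", encoding="utf-8") as f_out:
    json.dump(events, f_out, ensure_ascii=False, indent=2)

with open("data.json", "r", encoding="utf-8") as f_in:
    loaded = json.load(f_in)

for event in loaded:
    print("Title:- ", event["title"])
    print("Dates:- ", event["date"])
    print("Link:- ", event["link"])
    print()
```

Dicts survive the JSON round trip unchanged, so the loaded data has exactly the same shape as what was saved.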
Prints:
Upcoming event details scraped from iitg website:-
Title:- 4 batch for the certification programme on AI & ML by Eckovation in association with E&ICT Academy IIT Guwahati
Dates:- 15 Aug 2020 - 15 Aug 2020
Link:- http://eict.iitg.ac.in/online_courses_training.html
Title:- 8th International and 47th National conference on Fluid Mechanics and Fluid Power
Dates:- 09 Dec 2020 - 11 Dec 2020
Link:- https://event.iitg.ac.in/fmfp2020/
Title:- 4 months Internship programme on VLSI Circuit Design
Dates:- 10 Aug 2020 - 10 Dec 2020
Link:- http://eict.iitg.ac.in/online_courses_training.html
Title:- 6 week Training cum Internship programme on AI & ML under TEQIP-III orgainsed by Assam Science Technology University
Dates:- 10 Aug 2020 - 20 Sep 2020
Link:- http://eict.iitg.ac.in/online_courses_training.html
Title:- 6 week Training cum Internship programme on Industry 4.0 (Industrial IoT) under TEQIP-III orgainsed by Assam Science Technology University
Dates:- 10 Aug 2020 - 20 Sep 2020
Link:- http://eict.iitg.ac.in/online_courses_training.html
Title:- 6 week Training cum Internship programme on Robotics Fundamentals under TEQIP-III orgainsed by Assam Science Technology University
Dates:- 10 Aug 2020 - 20 Sep 2020
Link:- http://eict.iitg.ac.in/online_courses_training.html
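If you would rather keep the three parallel lists from the question, `zip` pairs them up into one row per event, and the standard `csv` module can store those rows in a file that also opens in a spreadsheet. This is a minimal sketch with hard-coded sample values in place of a live request; the file name `events.csv` and the header names are arbitrary choices.

```python
import csv

# Sample values in the shape of the question's three parallel lists.
iitg_title = ["4 months Internship programme on VLSI Circuit Design",
              "8th International and 47th National conference on Fluid Mechanics and Fluid Power"]
iitg_date = ["10 Aug 2020 - 10 Dec 2020", "09 Dec 2020 - 11 Dec 2020"]
iitg_link = ["http://eict.iitg.ac.in/online_courses_training.html",
             "https://event.iitg.ac.in/fmfp2020/"]

# zip() yields one (title, date, link) tuple per event.
with open("events.csv", "w", newline="", encoding="utf-8") as f_out:
    writer = csv.writer(f_out)
    writer.writerow(["title", "date", "link"])  # header row
    writer.writerows(zip(iitg_title, iitg_date, iitg_link))

# Read it back: DictReader uses the header row as the keys of each row.
with open("events.csv", newline="", encoding="utf-8") as f_in:
    rows = list(csv.DictReader(f_in))

for row in rows:
    print("Title:- ", row["title"])
    print("Dates:- ", row["date"])
    print("Link:- ", row["link"])
    print()
```

Opening the file with `newline=""` is what the `csv` documentation recommends, so the module controls line endings itself.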