How can I store this scraped data so that I can access it later with a simple loop?

Problem description

I am trying to scrape the details of all upcoming events from an institute's website:

import requests
from bs4 import BeautifulSoup

response = requests.get("http://www.iitg.ac.in/home/eventsall/events")
soup = BeautifulSoup(response.content,"html.parser")
cards = soup.find_all("div",attrs={"class": "newsarea"})

iitg_title = []
iitg_date = []
iitg_link = []
for card in cards[0:6]:
    iitg_date.append(card.find("div",attrs={"class": "ndate"}).text)
    iitg_title.append(card.find("div",attrs={"class": "ntitle"}).text.strip())
    iitg_link.append(card.find("div",attrs={"class": "ntitle"}).a['href'])

print("Upcoming event details scraped from iitg website:- \n")
for i in range(len(iitg_title)):
    print("Title:- ",iitg_title[i])
    print("Dates:- ",iitg_date[i])
    print("Link:- ",iitg_link[i])
    print('\n')

The code above gives me these details:

Upcoming event details scraped from iitg website:- 

Title:-  4 batch for the certification programme on AI & ML by Eckovation in association with E&ICT Academy IIT Guwahati
Dates:-  15 Aug 2020 - 15 Aug 2020
Link:-  http://eict.iitg.ac.in/online_courses_training.html


Title:-  8th International and 47th National conference on Fluid Mechanics and Fluid Power
Dates:-  09 Dec 2020 - 11 Dec 2020
Link:-  https://event.iitg.ac.in/fmfp2020/


Title:-  4 months Internship programme on VLSI Circuit Design
Dates:-  10 Aug 2020 - 10 Dec 2020
Link:-  http://eict.iitg.ac.in/online_courses_training.html


Title:-  6 week Training cum Internship programme on AI & ML under TEQIP-III orgainsed by Assam Science Technology University
Dates:-  10 Aug 2020 - 20 Sep 2020
Link:-  http://eict.iitg.ac.in/online_courses_training.html


Title:-  6 week Training cum Internship programme on Industry 4.0 (Industrial IoT) under TEQIP-III orgainsed by Assam Science Technology University
Dates:-  10 Aug 2020 - 20 Sep 2020
Link:-  http://eict.iitg.ac.in/online_courses_training.html


Title:-  6 week Training cum Internship programme on Robotics Fundamentals under TEQIP-III orgainsed by Assam Science Technology University
Dates:-  10 Aug 2020 - 20 Sep 2020
Link:-  http://eict.iitg.ac.in/online_courses_training.html

For the past five hours I have been struggling to store the results in a way that lets me access them later with a simple for loop.
How can I do this?

Solution

For example, you can use the json module to write the data to disk and load it back later:

import json
import requests
from bs4 import BeautifulSoup

response = requests.get("http://www.iitg.ac.in/home/eventsall/events")
soup = BeautifulSoup(response.content,"html.parser")
cards = soup.find_all("div",attrs={"class": "newsarea"})

events = []
for card in cards[0:6]:
    events.append((
        card.find("div",attrs={"class": "ntitle"}).text.strip(),  # title
        card.find("div",attrs={"class": "ndate"}).text,           # dates
        card.find("div",attrs={"class": "ntitle"}).a['href']      # link
    ))

# save data:
with open('data.json','w') as f_out:
    json.dump(events,f_out)

# ...

# load data back:
with open('data.json','r') as f_in:
    events = json.load(f_in)

print("Upcoming event details scraped from iitg website:- \n")
for t,d,l in events:
    print("Title:- ",t)
    print("Dates:- ",d)
    print("Link:- ",l)
    print('\n')

Prints:

Upcoming event details scraped from iitg website:- 

Title:-  4 batch for the certification programme on AI & ML by Eckovation in association with E&ICT Academy IIT Guwahati
Dates:-  15 Aug 2020 - 15 Aug 2020
Link:-  http://eict.iitg.ac.in/online_courses_training.html


Title:-  8th International and 47th National conference on Fluid Mechanics and Fluid Power
Dates:-  09 Dec 2020 - 11 Dec 2020
Link:-  https://event.iitg.ac.in/fmfp2020/


Title:-  4 months Internship programme on VLSI Circuit Design
Dates:-  10 Aug 2020 - 10 Dec 2020
Link:-  http://eict.iitg.ac.in/online_courses_training.html


Title:-  6 week Training cum Internship programme on AI & ML under TEQIP-III orgainsed by Assam Science Technology University
Dates:-  10 Aug 2020 - 20 Sep 2020
Link:-  http://eict.iitg.ac.in/online_courses_training.html


Title:-  6 week Training cum Internship programme on Industry 4.0 (Industrial IoT) under TEQIP-III orgainsed by Assam Science Technology University
Dates:-  10 Aug 2020 - 20 Sep 2020
Link:-  http://eict.iitg.ac.in/online_courses_training.html


Title:-  6 week Training cum Internship programme on Robotics Fundamentals under TEQIP-III orgainsed by Assam Science Technology University
Dates:-  10 Aug 2020 - 20 Sep 2020
Link:-  http://eict.iitg.ac.in/online_courses_training.html
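
As a variation on the answer above, each event could also be stored as a dict instead of a tuple, so that fields are read by name rather than by position. The following is only a minimal sketch: the two sample records are copied from the output above, and the helper names save_events and load_events, the keys "title", "dates" and "link", and the file name events_demo.json are assumptions made for illustration, not part of the original answer. It writes to the system temp directory so it can be run without touching the network or the working directory.

```python
import json
import os
import tempfile

# Sample records copied from the printed output above. In the real script
# these dicts would be built inside the BeautifulSoup loop instead.
events = [
    {
        "title": "8th International and 47th National conference on Fluid Mechanics and Fluid Power",
        "dates": "09 Dec 2020 - 11 Dec 2020",
        "link": "https://event.iitg.ac.in/fmfp2020/",
    },
    {
        "title": "4 months Internship programme on VLSI Circuit Design",
        "dates": "10 Aug 2020 - 10 Dec 2020",
        "link": "http://eict.iitg.ac.in/online_courses_training.html",
    },
]


def save_events(records, path):
    """Write the list of event dicts to a JSON file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f_out:
        json.dump(records, f_out, indent=2)


def load_events(path):
    """Read the list of event dicts back from a JSON file (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as f_in:
        return json.load(f_in)


# Assumed file location, chosen only for this demonstration.
path = os.path.join(tempfile.gettempdir(), "events_demo.json")
save_events(events, path)
loaded = load_events(path)

# A simple for loop over the stored data, reading each field by key.
for event in loaded:
    print("Title:- ", event["title"])
    print("Dates:- ", event["dates"])
    print("Link:- ", event["link"])
    print()
```

With the tuple version in the answer, note that json.load returns each record as a list rather than a tuple, which still unpacks fine in `for t,d,l in events`. Dicts survive the round trip unchanged, which is one reason to prefer them here.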
