How to extract data from a website's linked pages using Python

Problem description

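I have been trying to scrape data from web pages for a data analysis project, but so far I have only managed to get data from a single page.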

import requests
from bs4 import BeautifulSoup
import concurrent.futures
from urllib.parse import urlencode
from scraper_api import ScraperAPIClient

client = ScraperAPIClient('key')
results = client.get(url="https://www.essex.ac.uk/course-search?query=&f.Level%7CcourseLevel=Undergraduate").text

print(results)

Take the site https://www.essex.ac.uk/course-search?query=&f.Level%7CcourseLevel=Undergraduate as an example: I need to navigate into each course and get a field called duration from that course's page.
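Getting the duration takes two parsing steps: collect the course links from a results page, then read the value next to the "Duration" label on each course page. A minimal sketch using only the standard library's html.parser is below. The link class course-link and the label-then-value markup (for example <dt>Duration</dt><dd>3 years</dd>) are hypothetical placeholders, not the confirmed structure of essex.ac.uk; inspect the real pages and adjust them. BeautifulSoup, which the question already imports, can do the same job more compactly.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CourseLinkParser(HTMLParser):
    """Collects hrefs of <a> tags whose class list contains link_class.

    "course-link" is a HYPOTHETICAL class name; check the real results page.
    """

    def __init__(self, base_url, link_class="course-link"):
        super().__init__()
        self.base_url = base_url
        self.link_class = link_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and self.link_class in classes and attrs.get("href"):
            # Resolve relative hrefs against the page they came from
            self.links.append(urljoin(self.base_url, attrs["href"]))


class DurationParser(HTMLParser):
    """Returns the first non-empty text that follows a "Duration" label.

    ASSUMES the label and its value are consecutive text nodes in the HTML.
    """

    def __init__(self):
        super().__init__()
        self.seen_label = False
        self.duration = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self.duration is not None:
            return  # skip whitespace, stop after the first match
        if self.seen_label:
            self.duration = text
        elif text.lower() == "duration":
            self.seen_label = True


def extract_course_links(html, base_url):
    parser = CourseLinkParser(base_url)
    parser.feed(html)
    return parser.links


def extract_duration(html):
    parser = DurationParser()
    parser.feed(html)
    return parser.duration
```

Run extract_course_links on each results page, fetch every returned URL, and pass that page's HTML to extract_duration; it returns None when no "Duration" label is found.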

Solution

Try the following:

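from scraper_api import ScraperAPIClient

client = ScraperAPIClient('key')
results = []
for i in range(10):
    # start_rank takes the values 1, 11, 21, ..., 91 (10 results per page)
    results.append(client.get(url=f"https://www.essex.ac.uk/course-search?query=&f.Level%7CcourseLevel=Undergraduate&start_rank={i * 10 + 1}").text)

print(results)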

This loops over 10 result pages and appends the text of each response to the results list.
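Rather than splicing the page number into the URL by hand, the query string can be built with urllib.parse.urlencode. A minimal sketch, assuming 10 results per page; whether the first result is start_rank=1 or start_rank=0 is an assumption exposed as first_rank, so compare against the site's own "next page" links before relying on it. The names page_urls and SEARCH_URL are illustrative.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://www.essex.ac.uk/course-search"


def page_urls(total_pages, page_size=10, first_rank=1):
    """Builds one search URL per results page by varying start_rank."""
    urls = []
    for page_no in range(total_pages):
        params = {
            "query": "",
            "f.Level|courseLevel": "Undergraduate",
            # ASSUMPTION: ranks advance by page_size, starting at first_rank
            "start_rank": first_rank + page_no * page_size,
        }
        # urlencode percent-encodes '|' as %7C, matching the original URL
        urls.append(SEARCH_URL + "?" + urlencode(params))
    return urls
```

Each URL can then be passed to client.get(url=...) in place of the hand-built strings.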

Another answer:
from scraper_api import ScraperAPIClient

client = ScraperAPIClient('key')
total_pages = 12
for page_no in range(total_pages):
    # You control the page_no variable.
    # Open the site and see how it moves to the next page:
    # that depends on the 'start_rank' parameter at the end of the URL,
    # e.g. start_rank=10, start_rank=20 gives you one page after another.
    rank = page_no * 10
    results = client.get(url="https://www.essex.ac.uk/course-search?query=&f.Level%7CcourseLevel=Undergraduate&start_rank={0}".format(rank)).text
    print(results)
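Once the course URLs are collected, the individual course pages can be fetched in parallel with concurrent.futures, which the question imports but never uses. A minimal sketch, assuming fetch is any function that takes a URL and returns the page text (for example one wrapping client.get(url=...).text); fetch_all and the default max_workers=5 are illustrative choices, and the worker count should stay within whatever concurrency your ScraperAPI plan allows.

```python
import concurrent.futures


def fetch_all(urls, fetch, max_workers=5):
    """Calls fetch(url) for every URL in a thread pool.

    Returns a dict mapping each URL to its result. A URL whose fetch raised
    maps to the exception instead, so one failed page does not stop the crawl.
    """
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        future_to_url = {pool.submit(fetch, url): url for url in urls}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # record the failure and keep going
                results[url] = exc
    return results
```

Each successful result is the HTML of one course page, which can then be parsed for the duration field.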
