I'm trying to scrape images from this site, but the site I'm scraping doesn't seem to respond with any actual image output

Problem description

I'm new to web scraping, so I'm not entirely sure what to do here, but I'm trying to extract the images from the site at this URL:

Here is the loop that came closest to working:

For loop with parsing functions

import requests
from tqdm import tqdm
from bs4 import BeautifulSoup as bs
from urllib.parse import urljoin, urlparse

url = "https://www.legacysurvey.org/viewer/data-for-radec/?ra=55.0502&dec=-18.5790&layer=ls-dr8&ralo=55.0337&rahi=55.0655&declo=-18.5892&dechi=-18.5714"

def is_valid(url):
    """
    Checks whether `url` is a valid URL.
    """
    parsed = urlparse(url)
    return bool(parsed.netloc) and bool(parsed.scheme)

def get_all_images(url):
    """
    Returns all image URLs found on a single `url`.
    """
    soup = bs(requests.get(url).content, "html.parser")
    urls = []
    for img in tqdm(soup.find_all("img"), "Extracting images"):
        img_url = img.attrs.get("src")
        if not img_url:
            # if img does not contain a src attribute, just skip it
            continue
        # resolve relative src paths against the page URL
        img_url = urljoin(url, img_url)
        if is_valid(img_url):
            urls.append(img_url)
    return urls
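For reference, here is a minimal sketch of how the URLs collected by get_all_images could then be saved to disk. The download helper and the "images" output directory are my own illustrative choices, not part of the original code:

import os
import requests

def download(img_url, pathname="images"):
    # hypothetical helper: save a single image URL into `pathname`
    os.makedirs(pathname, exist_ok=True)
    filename = os.path.join(pathname, img_url.split("/")[-1] or "image")
    r = requests.get(img_url, stream=True)
    with open(filename, "wb") as f:
        for chunk in r.iter_content(1024):
            f.write(chunk)

for img_url in get_all_images(url):
    download(img_url)

Of course, this only produces files if get_all_images actually finds image URLs on the page in the first place.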

While loop - image scraping

import requests
from bs4 import BeautifulSoup

# link to the first page (no `page=` parameter in this URL)
url = 'https://www.legacysurvey.org/viewer/data-for-radec/?ra=55.0502&dec=-18.5799&layer=ls-dr8&ralo=55.0337&rahi=55.0655&declo=-18.5892&dechi=-18.5714'

# only for information, not used in the url
page = 0

while True:

    print('---', page, '---')

    r = requests.get(url)

    soup = BeautifulSoup(r.content, "html.parser")

    # print each <img> tag found on the page
    # (note: <img> keeps its URL in `src`, not `href`)
    for link in soup.find_all("img"):
        print("<img src='%s'>" % link.get("src"))

    # fetch and print general data from the `title` class
    general_data = soup.find_all('div', {'class': 'title'})

    for item in general_data:
        print(item.contents[0].text)
        print(item.contents[1].text.replace('.', ''))
        print(item.contents[2].text)

    # link to the next page
    next_page = soup.find('a', {'class': 'next'})

    if next_page:
        url = next_page.get('href')
        page += 1
    else:
        break  # exit `while True`
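One caveat with the pagination step above: if the site's "next" link holds a relative path, feeding it straight back into requests.get will fail. A small self-contained sketch of resolving it with urljoin (the example URLs are hypothetical, not from the target site):

from urllib.parse import urljoin

page_url = "https://example.com/gallery?page=1"  # hypothetical current page
href = "/gallery?page=2"                         # hypothetical relative "next" link

# urljoin resolves a relative href against the page it came from
print(urljoin(page_url, href))  # https://example.com/gallery?page=2

Inside the loop this would amount to url = urljoin(url, next_page.get('href')), which is harmless even when the href is already absolute.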

I've tried using both approaches to download the image links they output, but I can't get any output from anything I've tried. Any help is greatly appreciated!
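Since the question is why no images show up at all, a reasonable first debugging step is to inspect what the data-for-radec endpoint actually returns before parsing it. The sketch below only prints the status, content type, img-tag count, and the start of the raw response, so it makes no assumptions about the page structure:

import requests
from bs4 import BeautifulSoup

url = "https://www.legacysurvey.org/viewer/data-for-radec/?ra=55.0502&dec=-18.5790&layer=ls-dr8&ralo=55.0337&rahi=55.0655&declo=-18.5892&dechi=-18.5714"

r = requests.get(url)
print(r.status_code, r.headers.get("Content-Type"))

soup = BeautifulSoup(r.content, "html.parser")
print("img tags found:", len(soup.find_all("img")))

# peek at the start of the raw response to see what the endpoint serves
print(r.text[:500])

If the img-tag count comes back as zero, both loops above have nothing to extract: the images are presumably delivered some other way (for example rendered client-side by JavaScript or served from a separate endpoint), and no amount of <img> parsing on this response will find them.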
