How can I get this href with BeautifulSoup?

Problem description

I want to get the product URLs from this page: https://stockx.com/search?s=555088-105

But when I try this code:

link = soup.find("div", class_="browse-grid loading undefined")
print(link)

it just returns:

<div class="browse-grid loading undefined"><div class="back-to-top"><div class="back-to-top-container"><img alt="back to top" src="https://stockx-assets.imgix.net/svg/icons/back-to-top.svg?auto=compress,format"/><span>TOP</span></div></div><div class="browse-grid"><div class="no-results">nothing TO SEE HERE! PLEASE CHANGE YOUR FILTERS OR <a href="/product-suggestion">Suggest a Product</a></div></div></div>

I also tried the following, but it just prints every URL on the page, and the one I want is not among them:

a_tags = soup.find_all('a')
for tag in a_tags:
  print(tag.get('href'))

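How can I get the URL shown in the image?

Solution

The URLs you see on the page are loaded from an external source via JavaScript, so they are not in the HTML that BeautifulSoup parses. You can use the requests module to simulate the Ajax request instead: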

import re
import json
import requests

url = "https://stockx.com/search?s=555088-105"
api_url = "https://stockx.com/api/browse"

# pull the search term (digits and dashes) out of the "s=" query parameter
id_ = re.search(r"s=([\d-]+)", url).group(1)

params = {
    "": "",
    "currency": "EUR",
    "_search": id_,
    "dataType": "product",
}

# send a browser-like User-Agent and the search page as Referer
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0",
    "Referer": url,
}

data = requests.get(api_url, params=params, headers=headers).json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

# each product carries a "urlKey" that forms the path of its page
for product in data["Products"]:
    print("https://stockx.com/" + product["urlKey"])

Prints:

https://stockx.com/air-jordan-1-retro-high-dark-mocha
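
The two pure-Python steps above (extracting the search term and turning each "urlKey" into a full URL) can be sketched as small helpers that are checkable without any network access. This is only an illustration: the function names search_term and product_urls are hypothetical, the sample dictionary is hand-written rather than real API output, and it assumes the response keeps the shape used above (a top-level "Products" list whose items carry a "urlKey").

```python
from urllib.parse import urlparse, parse_qs, urljoin

BASE_URL = "https://stockx.com/"


def search_term(search_url):
    """Return the value of the "s" query parameter of a search URL."""
    query = parse_qs(urlparse(search_url).query)
    return query["s"][0]


def product_urls(data):
    """Build product page URLs from a response dict of the assumed shape."""
    urls = []
    for product in data.get("Products", []):
        key = product.get("urlKey")
        if key:  # skip entries that have no urlKey
            urls.append(urljoin(BASE_URL, key))
    return urls


# Hand-written sample mimicking the assumed response shape (not real API output)
sample = {
    "Products": [
        {"urlKey": "air-jordan-1-retro-high-dark-mocha"},
        {"title": "entry without a urlKey"},
    ]
}

print(search_term("https://stockx.com/search?s=555088-105"))
print(product_urls(sample))
```

Using parse_qs instead of a regular expression also handles search terms that are not made up of digits and dashes, and the .get() lookups keep the loop from raising if the response is missing the expected keys.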
