How to use Beautiful Soup

Problem description

I want to use Beautiful Soup to extract the productId value (186852001461) from the script below, or from anywhere else that id appears on the website.

<script type="text/javascript">
 /* <![CDATA[ */
var bv_single_product = {"prodname":"Honey Graham Gelato","productId":"186852001461"};
/* ]]> */
</script>

My code

import re
import requests
from bs4 import BeautifulSoup
final = "https://www.talentigelato.com/products/honey-graham-gelato"
response = requests.get(final,timeout=35)
soup = BeautifulSoup(response.content,"html.parser") 
s = soup.findAll('script',attrs={'type': 'text/javascript'} )[17]
print(type(s))
html_content = str(s)
html_content = s.prettify()
print(html_content)

Solution

You need to read the tag's .string first, then use a regex to pull out the object literal so that it can be passed to json.loads().

Here's how:

import json
import re

import requests
from bs4 import BeautifulSoup

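final = "https://www.talentigelato.com/products/honey-graham-gelato"
response = requests.get(final, timeout=35)
soup = BeautifulSoup(response.content, "html.parser")
# Index 17 is where the target <script> sits in this answer; it depends on the page layout.
s = soup.find_all("script", attrs={"type": "text/javascript"})[17]
# .string gives the raw JavaScript; the regex captures the {...} literal, which is valid JSON.
data = json.loads(re.search(r"single_product = ({.*})", s.string).group(1))
print(data["productId"])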

Output:

186852001461
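
The hard-coded index [17] will stop working if the page adds or removes a script tag. A possible variation, sketched below, scans every script for the variable name instead. It runs against the markup quoted in the question rather than the live site, and the names find_product_id and SAMPLE_HTML are made up for this illustration. It also swaps the capturing regex for json.JSONDecoder().raw_decode(), which parses one JSON value starting at a given offset and ignores whatever follows it; this assumes the JavaScript object literal is valid JSON, as it is in the question.

```python
import json
import re

from bs4 import BeautifulSoup

# Markup copied from the question, so the sketch runs without a network call.
SAMPLE_HTML = """
<script type="text/javascript">
 /* <![CDATA[ */
var bv_single_product = {"prodname":"Honey Graham Gelato","productId":"186852001461"};
/* ]]> */
</script>
"""


def find_product_id(html, var_name="bv_single_product"):
    """Return productId from the first script that assigns var_name, else None.

    Hypothetical helper for illustration; not part of Beautiful Soup.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Match "var_name =" and stop right before the opening brace.
    assignment = re.compile(r"\b" + re.escape(var_name) + r"\s*=\s*")
    decoder = json.JSONDecoder()
    for script in soup.find_all("script"):
        text = script.string  # None for empty or external scripts
        if not text:
            continue
        match = assignment.search(text)
        if match:
            # raw_decode parses a single JSON value beginning at the given index.
            obj, _end = decoder.raw_decode(text, match.end())
            return obj.get("productId")
    return None


print(find_product_id(SAMPLE_HTML))  # prints 186852001461
```

To use it on the live page, pass response.content from the requests call above in place of SAMPLE_HTML. If the assigned value is not valid JSON, raw_decode raises json.JSONDecodeError, which the caller may want to catch.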