I'm having trouble scraping a web page with BeautifulSoup

Problem description

When I try to extract the text between tags with .text(), I get a blank screen, and the only output is [].
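The `.text()` call mentioned above often causes confusion. In BeautifulSoup, `.text` is a property of a single `Tag`, not a method, and `find_all()` returns a list-like `ResultSet` of tags, so the text has to be read from each element in turn. A minimal offline sketch, using a small hypothetical HTML snippet in place of the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page
html = """
<div>
  <span class="price">1,999</span>
  <span class="price">3,499</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet (a subclass of list) of Tag objects
spans = soup.find_all("span", class_="price")

# .text is a property of each Tag, so it is used without parentheses
texts = [span.text for span in spans]
print(texts)  # ['1,999', '3,499']

# Adding parentheses tries to call the returned str, which raises TypeError
try:
    spans[0].text()
except TypeError as exc:
    print("TypeError:", exc)
```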

import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.amazon.in/s?k=ssd&ref=nb_sb_noss")

soup = BeautifulSoup(page.content,"html.parser")

product = soup.find_all("h2",class_="a-link-normal a-text-normal")
results = soup.find_all("span",class_="a-offscreen")

print(product)

This is the output I get:

C:\Users\Kushal\Desktop\requests-tutorial>C:/Users/Kushal/AppData/Local/Programs/Python/python37/python.exe c:/Users/Kushal/Desktop/requests-tutorial/scraper.py
[]

When I try to list everything with a for loop, nothing is displayed at all, not even the empty brackets.
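Both symptoms point to the same thing: `find_all()` found no matching elements and returned an empty list, and a `for` loop over an empty list never runs its body, so nothing is printed. The tag name and class in the filter must match the HTML the server actually returned. Note also that a class string containing spaces, passed to `class_`, only matches elements whose `class` attribute is exactly that string in that order, while a CSS selector via `select()` matches elements carrying all the listed classes in any order. A sketch on hypothetical markup (not Amazon's real HTML):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the classes sit on the <a>, not on the <h2>
html = """
<h2><a class="a-text-normal a-link-normal" href="#">Sample SSD 500GB</a></h2>
"""
soup = BeautifulSoup(html, "html.parser")

# No <h2> carries these classes, so the result is an empty list
wrong_tag = soup.find_all("h2", class_="a-link-normal a-text-normal")
print(wrong_tag)  # []

# Iterating over an empty list runs zero times, so this prints nothing
for tag in wrong_tag:
    print(tag.text)

# A multi-class string must match the attribute exactly, including order
wrong_order = soup.find_all("a", class_="a-link-normal a-text-normal")
print(wrong_order)  # []

# A CSS selector matches all listed classes regardless of their order
found = soup.select("a.a-link-normal.a-text-normal")
print([tag.text for tag in found])  # ['Sample SSD 500GB']
```

With a live site it is also worth printing `page.status_code` or saving `page.text` to a file, to confirm the server returned the same page the browser shows.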

Solution

Based on your comment below, I have modified the code to get all the product titles on that page along with their prices.

If this works, please mark it as the answer; otherwise, leave a comment and I'll look into it further.

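import requests
from bs4 import BeautifulSoup  # the "lxml" parser used below requires: pip install lxml

# Browser-like request headers. Without them, Amazon may serve a different
# page (for example a robot check) to the default python-requests User-Agent.
headers = {
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5)",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "accept-charset": "cp1254,ISO-8859-9,utf-8;q=0.7,*;q=0.3",
    "accept-encoding": "gzip,deflate,sdch",
    "accept-language": "tr,tr-TR,en-US,en;q=0.8",
}

response = requests.get("https://www.amazon.in/s?k=ssd&ref=nb_sb_noss", headers=headers)
response.raise_for_status()  # stop early if the server returned an error status

soup = BeautifulSoup(response.content, "lxml")

# Titles are taken from <span> elements with these classes, not from <h2>
titles = soup.find_all("span", attrs={"class": "a-size-medium a-color-base a-text-normal"})
prices = soup.find_all("span", attrs={"class": "a-offscreen"})

# Pair titles and prices by position; distinct loop names avoid shadowing the lists.
# .text is a property, so it is used without parentheses.
for title_tag, price_tag in zip(titles, prices):
    title_proper = title_tag.text.strip()
    price_proper = price_tag.text.strip()
    print(title_proper, "-", price_proper)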
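Pairing titles and prices with `zip()` assumes the two lists line up one to one. `zip()` stops at the shorter list, and if one product has no price (or more than one matching price span), every later title gets paired with the wrong price. A more robust pattern is to find each product's container first and search inside it. The sketch below uses hypothetical markup and a hypothetical container class `result`; the real container element on the page would have to be identified in the browser's developer tools.

```python
from bs4 import BeautifulSoup

# Hypothetical search-results markup: one <div class="result"> per product.
# The second product has no price, which shifts a zip()-based pairing.
html = """
<div class="result">
  <span class="title">SSD A</span><span class="price">1,999</span>
</div>
<div class="result">
  <span class="title">SSD B</span>
</div>
<div class="result">
  <span class="title">SSD C</span><span class="price">4,299</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# For comparison: zip() pairs by position and mislabels SSD B
titles = soup.find_all("span", class_="title")
prices = soup.find_all("span", class_="price")
naive = [(t.text, p.text) for t, p in zip(titles, prices)]
print(naive)  # [('SSD A', '1,999'), ('SSD B', '4,299')]

rows = []
for card in soup.find_all("div", class_="result"):
    # Search only inside this product's container
    title_tag = card.find("span", class_="title")
    price_tag = card.find("span", class_="price")
    if title_tag is None:
        continue  # skip containers that have no title
    title = title_tag.get_text(strip=True)
    price = price_tag.get_text(strip=True) if price_tag else "N/A"
    rows.append((title, price))
    print(title, "-", price)
```

Each price is now looked up within its own product, so a missing price shows up as "N/A" instead of shifting the rest of the list.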