How do I get a div with multiple classes in BeautifulSoup?

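Problem description

I am trying to get the torrent links from skidrowreloaded.

On the post detail page there is a div like the one below. I tried to get the div by its id, but I think the id is dynamic, so I tried to get the div by its classes instead, and that does not work.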

<div id="tabs-105235-0-0" aria-labelledby="ui-id-1" class="ui-tabs-panel ui-widget-content ui-corner-bottom" role="tabpanel" aria-hidden="false">

The following code returns None:

source2 = source.find("div",{"class": "ui-tabs-panel ui-widget-content ui-corner-bottom"})

Error

AttributeError: 'NoneType' object has no attribute 'find_all'

Full code
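The error means that find() matched nothing and returned None, so the following find_all() call was made on None. Below is a minimal sketch of that situation and of a guard against it. The HTML string is a made-up stand-in for the downloaded page (an assumption, not the real markup), and the example.com link is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page: the div has an id but no
# class attribute. This is an assumption for illustration, not the real markup.
html = '<div id="tabs-105235-0-0"><a href="https://example.com/file">mirror</a></div>'
soup = BeautifulSoup(html, "html.parser")

panel = soup.find("div", {"class": "ui-tabs-panel ui-widget-content ui-corner-bottom"})

# find() returns None when nothing matches, so check before calling find_all().
if panel is None:
    links = []
else:
    links = [a.get("href") for a in panel.find_all("a")]

print(panel, links)
```

Checking for None first turns the AttributeError into an empty result, which makes it easier to see that the selector, not the loop, is the problem.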

import os
from bs4 import BeautifulSoup
import requests
import webbrowser

clear = lambda: os.system('cls')
clear()
r = requests.get('https://www.skidrowreloaded.com/')
source = BeautifulSoup(r.content,"lxml")
source2 = source.find_all("h2")
games = []
for i in source2:
    games.append(i.a.get("href"))

lastgame = games[0]

r = requests.get(lastgame)
source = BeautifulSoup(r.content,"lxml")
source2 = source.find("div",{"class": "ui-tabs-panel ui-widget-content ui-corner-bottom"})
source3 = source2.find_all("a")
k = 0;
for i in source3:
    if k == 0: #hide steam link.
        k = k + 1
    else:      
        if i.get("href") == "https://www.skidrowreloaded.com": #hide null links
            pass
        else: #throwing links to the browser
            print(i.get("href"))
            webbrowser.open(i.get("href"))   
        k = k + 1

Solution

You can use find_all as described in the BeautifulSoup documentation:
import requests
from bs4 import BeautifulSoup
response = requests.get("your URL here")
soup = BeautifulSoup(response.text,'html.parser')
raw_data = soup.find_all("div",class_="ui-tabs-panel ui-widget-content ui-corner-bottom")
# do something with the data
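One thing to keep in mind with a multi-class string: passed to class_, it is compared against the exact value of the class attribute, so a different class order does not match. A CSS selector via select() matches the classes regardless of order. The sketch below uses a made-up HTML snippet (an assumption, not the real page) to show the difference.

```python
from bs4 import BeautifulSoup

# Made-up snippet: same three classes as in the question, in a different order.
html = '<div class="ui-widget-content ui-tabs-panel ui-corner-bottom">panel</div>'
soup = BeautifulSoup(html, "html.parser")

# A multi-class string is matched against the exact attribute value,
# so the reordered classes are not found.
exact = soup.find_all("div", class_="ui-tabs-panel ui-widget-content ui-corner-bottom")

# A CSS selector requires all three classes but ignores their order.
by_selector = soup.select("div.ui-tabs-panel.ui-widget-content.ui-corner-bottom")

print(len(exact), len(by_selector))
```

Neither form helps if the classes are missing from the downloaded HTML altogether, which is why it is worth inspecting response.text first.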

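Edit: Looking at response.text, the div is present, but it does not have the classes you are searching for (presumably they are added by JavaScript in the browser), so the search returns nothing. You can match the div by its id with a regular expression instead: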

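import re
import requests
from bs4 import BeautifulSoup

response = requests.get("your URL here")
soup = BeautifulSoup(response.text, 'html.parser')
# Match every div whose id starts with "tabs", since the rest of the id changes per post
raw_data = soup.find_all("div", id=re.compile("^tabs"))
for ele in raw_data:
    a_tag = ele.find("a")
    # do something with the a_tag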
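The same prefix match on the id can also be written as a CSS attribute selector. The sketch below compares the two on a made-up HTML snippet; the ids and example.com links are assumptions for illustration only.

```python
import re
from bs4 import BeautifulSoup

# Made-up snippet: one div whose id starts with "tabs-" and one unrelated div.
html = """
<div id="tabs-105235-0-0"><a href="https://example.com/a">a</a></div>
<div id="sidebar"><a href="https://example.com/b">b</a></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Regular expression on the id attribute.
by_regex = soup.find_all("div", id=re.compile(r"^tabs-"))

# Equivalent CSS selector: id attribute starts with "tabs-".
by_css = soup.select('div[id^="tabs-"]')

# Collect the links from the matched divs, skipping anchors without an href.
hrefs = [a["href"] for div in by_css for a in div.find_all("a", href=True)]
print(hrefs)
```

Which form to use is mostly a matter of taste; the regular expression is more flexible if the id pattern needs more than a fixed prefix.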

To get all the links, try this:

import requests
from bs4 import BeautifulSoup

url = "https://www.skidrowreloaded.com/projection-first-light-goldberg/"
soup = BeautifulSoup(requests.get(url).text,"html.parser").find_all("a",{"target": "_blank"})
skip = 'https://www.skidrowreloaded.com'
print([a['href'] for a in soup if a['href'].startswith('https') and a['href'] != skip])

Output:

['https://store.steampowered.com/app/726490/Projection_First_Light/','https://mega.nz/file/geogAATS#-0U0PklF-Q5i5l_SELzYx3klh5FZob9HaD4QKcFH_8M','https://uptobox.com/rqnlpcp7yb3v','https://1fichier.com/?0syphwpyndpo38af04ky','https://yadi.sk/d/KAmlsBmGaI1f2A','https://pixeldra.in/u/wmcsjuhv','https://dropapk.to/v6r7mjfgxjq6','https://gofile.io/?c=FRWL1o','https://racaty.net/dkvdyjqvg02e','https://bayfiles.com/L0k7Qea2pb','https://tusfiles.com/2q00y4huuv15','https://megaup.net/2f0pv/Projection.First.Light-GoldBerg.zip','https://letsupload.org/88t5','https://filesupload.org/0d7771dfef54d055','https://dl.bdupload.in/17ykjrifizrb','https://clicknupload.co/o0k9dnd3iwoy','https://dailyuploads.net/n1jihwjwdmjp','https://userscloud.com/nircdd4q1t5w','https://rapidgator.net/file/b6b8f5782c7c2bdb534214342b58ef18','https://turbobit.net/m308zh1hdpba.html','https://hitfile.net/5OhkcqZ','https://filerio.in/0wbvn4md4i91','https://mirrorace.org/m/1Fiic','https://go4up.com/dl/0ee9f4866312b5/Projection.First.Light-GoldBerg.zip','https://katfile.com/w74l823vuyw5/Projection.First.Light-GoldBerg.zip.html','https://multiup.org/download/3d355ba18d58234c792da7a872ab4998/Projection.First.Light-GoldBerg.zip','https://dl1.indishare.in/hs55pkx4ex82']
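Note that a['href'] raises KeyError for an anchor that has target="_blank" but no href. The following sketch applies the same filter with an href requirement added to find_all. The HTML is a made-up snippet and the example.com links are hypothetical; only the skip URL comes from the code above.

```python
from bs4 import BeautifulSoup

# Made-up snippet covering the cases the filter has to handle.
html = """
<a target="_blank" href="https://www.skidrowreloaded.com">home</a>
<a target="_blank" href="https://example.com/mirror-1">mirror 1</a>
<a target="_blank">no href</a>
<a href="https://example.com/internal">same tab</a>
"""
soup = BeautifulSoup(html, "html.parser")
skip = "https://www.skidrowreloaded.com"

# href=True keeps only anchors that actually carry an href attribute.
anchors = soup.find_all("a", attrs={"target": "_blank", "href": True})

links = [
    a["href"]
    for a in anchors
    if a["href"].startswith("https") and a["href"] != skip
]
print(links)
```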