HTML returned by requests.get differs from the HTML in my browser

Question

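I am trying to scrape links from this website (4anime.to). However, the links I get from parsing differ from the ones shown in my browser. The count matches: both the browser and my parsed results show 14 hyperlinks (for series). But my browser shows some links that my results lack, and my results show some links that my browser lacks.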

For example, my results include a link like this:

https://4anime.to/anime/one-piece-nenmatsu-tokubetsu-kikaku-mugiwara-no-luffy-oyabun-torimonochou

But when I search for the word "torimonochou" in my browser, I find no matches.

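I searched for the links in the page source (right-click the page and select "View Page Source"), so I should not be missing anything. I also passed my browser's headers to requests.get(), so I should be getting the same HTML.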

Code:

import bs4
import requests

# Send a browser-like User-Agent so the request resembles one from Firefox.
head = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:79.0) Gecko/20100101 Firefox/79.0'}
searchResObj = requests.get("https://4anime.to/?s=one+piece", headers=head)
soupObj = bs4.BeautifulSoup(searchResObj.text, features="html.parser")

I tried several different ways of parsing the links. This is a simplified version that grabs every link on the page, so I should not be missing any.

all_a = soupObj.select("a")

for links in all_a:
    print(links.get("href"))

I also inspected the raw HTML that my script received. The hyperlinks in it really do differ from the ones shown in my browser:

print(searchResObj.text)

What could be causing this?

Answer

Running this script prints 14 links for me, and they are the same ones that appear in my browser (perhaps you were served a CAPTCHA page?):

import requests
from bs4 import BeautifulSoup


searchResObj = requests.get("https://4anime.to/?s=one+piece")
soupObj = BeautifulSoup(searchResObj.text,features="html.parser")

for a in soupObj.select('#headerDIV_95 > a'):
    print(a['href'])

Prints:

https://4anime.to/anime/one-piece-nenmatsu-tokubetsu-kikaku-mugiwara-no-luffy-oyabun-torimonochou
https://4anime.to/anime/one-piece-straw-hat-theater
https://4anime.to/anime/one-piece-movie-14-stampede
https://4anime.to/anime/one-piece-yume-no-soccer-ou
https://4anime.to/anime/one-piece-mezase-kaizoku-yakyuu-ou
https://4anime.to/anime/one-piece-umi-no-heso-no-daibouken-hen
https://4anime.to/anime/one-piece-film-gold
https://4anime.to/anime/one-piece-heart-of-gold
https://4anime.to/anime/one-piece-episode-of-sorajima
https://4anime.to/anime/one-piece-episode-of-sabo
https://4anime.to/anime/one-piece-episode-of-nami
https://4anime.to/anime/one-piece-episode-of-merry
https://4anime.to/anime/one-piece-episode-of-luffy
https://4anime.to/anime/one-piece-episode-of-east-blue
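
One way to test the CAPTCHA guess is to look at the status code and body of the response before parsing it. The following is only a rough sketch: the function name `looks_like_challenge`, the marker phrases, and the list of status codes are my own assumptions, not anything this site is known to return, so adjust them to whatever your blocked response actually contains.

```python
# Hypothetical marker phrases (assumption); replace them with text
# that really appears in the page you receive when you are blocked.
CHALLENGE_MARKERS = ("captcha", "checking your browser", "access denied")


def looks_like_challenge(status_code, body):
    """Return True if a response looks like a bot-check page, not real content."""
    # Assumption: blocked requests are commonly answered with 403, 429 or 503.
    if status_code in (403, 429, 503):
        return True
    lowered = body.lower()
    return any(marker in lowered for marker in CHALLENGE_MARKERS)


# Offline demonstration with made-up responses (no network access needed).
normal_page = "<html><a href='/anime/one-piece-film-gold'>One Piece</a></html>"
blocked_page = "<html>Please solve the CAPTCHA to continue</html>"
print(looks_like_challenge(200, normal_page))
print(looks_like_challenge(200, blocked_page))
```

With the script from the question you could call it as looks_like_challenge(searchResObj.status_code, searchResObj.text) right after requests.get.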

Edit: screenshot of "View Source" (image not included here).
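
If you want to see exactly which links differ, you could save the browser's "View Page Source" output to a file and compare it with the HTML your script received. Below is a minimal sketch that uses only the standard library (html.parser instead of BeautifulSoup) so it runs without extra packages. The names `LinkCollector`, `extract_links` and `diff_links` are hypothetical helpers, and the two HTML strings are made-up stand-ins for the real pages.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(href)


def extract_links(html):
    """Return the set of hrefs found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def diff_links(script_html, browser_html):
    """Return (links only the script saw, links only the browser saw)."""
    script_links = extract_links(script_html)
    browser_links = extract_links(browser_html)
    return script_links - browser_links, browser_links - script_links


# Made-up stand-ins for the two versions of the page.
script_html = (
    '<a href="https://4anime.to/anime/one-piece-film-gold">A</a>'
    '<a href="https://4anime.to/anime/one-piece-heart-of-gold">B</a>'
)
browser_html = (
    '<a href="https://4anime.to/anime/one-piece-film-gold">A</a>'
    '<a href="https://4anime.to/anime/one-piece-episode-of-nami">C</a>'
)

only_script, only_browser = diff_links(script_html, browser_html)
print("Only in script:", sorted(only_script))
print("Only in browser:", sorted(only_browser))
```

In practice you would pass searchResObj.text as the first argument and the contents of the saved source file as the second. If both sets come back empty, the two pages contain the same links and the difference lies elsewhere, for example in what the browser renders after running JavaScript.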

Tags: beautifulsoup, html, python, python-requests