Calling the urlopen method

Problem description

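I have a Python script that is meant to open a web page based on user input and then scrape specific information from that page. The script begins with the following imports and proxy setup: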

import socks
import socket
from urllib.request import urlopen
from time import sleep
from bs4 import BeautifulSoup

socks.set_default_proxy(socks.SOCKS5,"127.0.0.1",9050)
socket.socket = socks.socksocket

The part where the error occurs deals with the URL of the desired web page.

url_name = "http://<website name>"
print("url name is : " + url_name)
print("About to open the web page")
sleep(5)
webpage = urlopen(url_name)  # <-- the error is raised here
print("Web page opened successfully")
sleep(5)
html = webpage.read().decode("utf-8")
soup = BeautifulSoup(html,"html.parser")
print("HTML extracted")
sleep(5)
print("Printing soup object text")
sleep(5)
print(soup.get_text())

When the script reaches the highlighted statement (the one that calls urlopen), I get the following error message:

1599147846 WARNING torsocks[20820]: [connect] Connection to a local address are denied since it might be a TCP DNS query to a local DNS server. Rejecting it for safety reasons. (in tsocks_connect() at connect.c:193)
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/socks.py",line 832,in connect
    super(socksocket,self).connect(proxy_addr)
PermissionError: [Errno 1] Operation not permitted

During handling of the above exception,another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/urllib/request.py",line 1326,in do_open
    h.request(req.get_method(),req.selector,req.data,headers,
  File "/usr/lib/python3.8/http/client.py",line 1240,in request
    self._send_request(method,url,body,encode_chunked)
  File "/usr/lib/python3.8/http/client.py",line 1286,in _send_request
    self.endheaders(body,encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py",line 1235,in endheaders
    self._send_output(message_body,encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py",line 1006,in _send_output
    self.send(msg)
  File "/usr/lib/python3.8/http/client.py",line 946,in send
    self.connect()
  File "/usr/lib/python3.8/http/client.py",line 917,in connect
    self.sock = self._create_connection(
  File "/usr/lib/python3.8/socket.py",line 808,in create_connection
    raise err
  File "/usr/lib/python3.8/socket.py",line 796,in create_connection
    sock.connect(sa)
  File "/usr/lib/python3/dist-packages/socks.py",line 100,in wrapper
    return function(*args,**kwargs)
  File "/usr/lib/python3/dist-packages/socks.py",line 844,in connect
    raise ProxyConnectionError(msg,error)
socks.ProxyConnectionError: Error connecting to SOCKS5 proxy 127.0.0.1:9050: [Errno 1] Operation not permitted

During handling of the above exception,another exception occurred:

Traceback (most recent call last):
  File "dark_web_scrape_main.py",line 68,in <module>
    webpage = urlopen(url_name)
  File "/usr/lib/python3.8/urllib/request.py",line 222,in urlopen
    return opener.open(url,data,timeout)
  File "/usr/lib/python3.8/urllib/request.py",line 525,in open
    response = self._open(req,data)
  File "/usr/lib/python3.8/urllib/request.py",line 542,in _open
    result = self._call_chain(self.handle_open,protocol,protocol +
  File "/usr/lib/python3.8/urllib/request.py",line 502,in _call_chain
    result = func(*args)
  File "/usr/lib/python3.8/urllib/request.py",line 1355,in http_open
    return self.do_open(http.client.HTTPConnection,req)
  File "/usr/lib/python3.8/urllib/request.py",line 1329,in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error Error connecting to SOCKS5 proxy 127.0.0.1:9050: [Errno 1] Operation not permitted>


In addition, I am running torsocks in the same VM (Ubuntu v20.04) as the script.
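Since the error above says the connection to the SOCKS5 proxy at 127.0.0.1:9050 was not permitted, one quick diagnostic is to test whether that port accepts a plain TCP connection at all. The following is a minimal sketch, not part of the original script; the helper name `port_is_open` is hypothetical. It should be run in a fresh interpreter, because the script above replaces `socket.socket` with the PySocks class.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and PermissionError.
        return False


if __name__ == "__main__":
    # 127.0.0.1:9050 is the proxy address used in the script above.
    print("Tor SOCKS port reachable:", port_is_open("127.0.0.1", 9050))
```

Running this once with plain `python3` and once under `torsocks` may show whether the refusal comes from torsocks itself, as the WARNING line in the log suggests, rather than from Tor not listening.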

Someone suggested running the script with "sudo". However, doing so produces this:

$ sudo python3 dark_web_scrape_main.py 
Traceback (most recent call last):
  File "dark_web_scrape_main.py",line 5,in <module>
    from bs4 import BeautifulSoup
ModuleNotFoundError: No module named 'bs4'

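So when I run the script with "sudo", I cannot even get to the data input prompt. When I run it as a normal user, however, it recognizes the socks module, which gets me further.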

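Before running the script, I made sure that socks, socket, and beautifulsoup4 were installed. I even tried installing bs4 (short for "beautifulsoup4"). This is what was displayed: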

$ pip3 install bs4
Collecting bs4
  Downloading bs4-0.0.1.tar.gz (1.1 kB)
Requirement already satisfied: beautifulsoup4 in ./.local/lib/python3.8/site-packages (from bs4) (4.9.1)
Requirement already satisfied: soupsieve>1.2 in ./.local/lib/python3.8/site-packages (from beautifulsoup4->bs4) (2.0.1)
Building wheels for collected packages: bs4
  Building wheel for bs4 (setup.py) ... done
  Created wheel for bs4: filename=bs4-0.0.1-py3-none-any.whl size=1273 sha256=912f922932a07d98aa26eca2ba3dde8e761813eea766dfe42617135f038943e4
  Stored in directory: /home/jbottiger/.cache/pip/wheels/75/78/21/68b124549c9bdc94f822c02fb9aa3578a669843f9767776bca
Successfully built bs4
Installing collected packages: bs4
Successfully installed bs4-0.0.1

I reran the script with "sudo", but received the same error message:

$ sudo python3 dark_web_scrape_main.py 
Traceback (most recent call last):
  File "dark_web_scrape_main.py",in <module>
    from bs4 import BeautifulSoup
ModuleNotFoundError: No module named 'bs4'

I realized that I had not installed the bs4 module correctly. So I made sure the module was installed properly:

sudo apt-get install python3-bs4
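The pip output above shows the package going into ./.local/lib/python3.8/site-packages, which is a per-user directory, so a Python process started through "sudo" (running as root) would not normally search it. A small sketch for checking which interpreter is running and where a module would be loaded from; the helper name `describe_module` is hypothetical and not part of the original script:

```python
import importlib.util
import site
import sys


def describe_module(name: str) -> dict:
    """Report whether a top-level module is importable and where it lives."""
    spec = importlib.util.find_spec(name)
    return {
        "module": name,
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("user site  :", site.getusersitepackages())
    print(describe_module("bs4"))
```

Comparing the output of `python3` and `sudo python3` on this file should show whether the two runs see the same site-packages directories.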

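Running "sudo python3 dark_web_scrape_main.py", I finally got to the data input section, but this time, when the script tried to execute the urlopen call, the following error message was displayed: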

About to open the web page
Traceback (most recent call last):
  File "/usr/lib/python3.8/urllib/request.py",line 787,in create_connection
    for res in getaddrinfo(host,port,SOCK_STREAM):
  File "/usr/lib/python3.8/socket.py",line 918,in getaddrinfo
    for res in _socket.getaddrinfo(host,family,type,proto,flags):
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception,in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno -2] Name or service not known>
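The `socket.gaierror` in this traceback is raised by `getaddrinfo`, which means the hostname is being looked up locally before any connection to the proxy is attempted, and .onion names do not exist in ordinary DNS. One workaround sometimes used together with PySocks is to replace `socket.getaddrinfo` with a function that passes the hostname through unchanged, so that the SOCKS proxy resolves it instead. The sketch below assumes that approach; the function names are hypothetical, and the post does not confirm that this fixes this particular setup.

```python
import socket


def passthrough_getaddrinfo(host, port, family=0, type=0, proto=0, flags=0):
    """Skip local DNS: return the hostname as-is in a getaddrinfo-shaped list."""
    return [(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP, "", (host, port))]


def install_passthrough_resolver():
    """Replace socket.getaddrinfo process-wide (affects every later lookup)."""
    socket.getaddrinfo = passthrough_getaddrinfo
```

Under this assumption, `install_passthrough_resolver()` would be called once, right after the line `socket.socket = socks.socksocket`, so that `urlopen` hands the unresolved name to the proxy.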

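At that point I could not open onion sites in Firefox on my Ubuntu v20.04 VM either. As an experiment, I opened Firefox and entered "http://xmh57jrzrnw6insl.onion" in the browser window. It returned "We can't connect to the server at 'http://xmh57jrzrnw6insl.onion'".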

I researched this particular issue at https://protonmail.com/support/knowledge-base/firefox-onion-sites/ and followed these steps:

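  1. In Firefox, enter "about:config" in the browser's URL field (also known as the search bar).
  2. Select the "Accept the Risk and Continue" button.
  3. Enter "network.dns.blockDotOnion" in the search bar.
  4. The current value of this preference is "true"; toggle it to "false".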

I then tried to access the onion site. It still did not work.

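I even updated the /etc/tor/torrc file by removing the comment markers from the following statements:

ControlPort 9051
CookieAuthentication 1

I also changed the "CookieAuthentication" value to "0". I still could not access the onion site.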

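Finally, I realized that in Firefox's "about:preferences" section, although I had set up the manual proxy configuration with localhost:9050, I had forgotten to deselect "Enable DNS over HTTPS" and to select "Proxy DNS when using SOCKS v5". Now I can reach onion sites in the Firefox browser. However, the script still fails when it reaches the urlopen call. Please advise.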

My professor suggested that I run it with "torsocks" as "python3
