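Problem description
I have tried different ways to get Selenium working in a Python script for web scraping from an online Jupyter notebook, without success. I have read many other instructions (e.g. this, this) and answers (e.g. this, this, this, this, etc.) about similar problems, but none of them seems to help. In a virtual environment, I downloaded both Firefox (v81.0) and geckodriver (v0.27) into my development folder and granted them all permissions: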
jupyterlab@jupyterlab-sps:/resources/testDevelop$ ls -l
total 7797
drwxrwsr-x 8 jupyterlab resources 4096 Oct 8 13:24 firefox
-rwxrwxrwx 1 jupyterlab resources 7274984 Oct 8 13:21 geckodriver
-rw-rw-r-- 1 jupyterlab resources 120 Oct 12 08:47 geckodriver.log
-rw-rw-r-- 1 jupyterlab resources 31813 Oct 12 09:42 testDevelop.ipynb
and
jupyterlab@jupyterlab-sps:/resources/testDevelop/firefox$ ls -l
total 165651
-rw-rw-r-- 1 jupyterlab resources 825 Sep 30 14:26 Throbber-small.gif
-rw-rw-r-- 1 jupyterlab resources 895 Sep 30 15:49 application.ini
drwxrwsr-x 4 jupyterlab resources 4096 Oct 8 13:24 browser
-rwxrwxr-x 1 jupyterlab resources 241720 Sep 30 16:28 crashreporter
-rw-rw-r-- 1 jupyterlab resources 4003 Sep 30 14:26 crashreporter.ini
drwxrwsr-x 3 jupyterlab resources 4096 Oct 8 13:24 defaults
-rw-rw-r-- 1 jupyterlab resources 174 Sep 30 16:28 dependentlibs.list
-rwxrwxr-x 1 jupyterlab resources 14656 Sep 30 16:28 firefox
-rwxrwxr-x 1 jupyterlab resources 569104 Sep 30 16:28 firefox-bin
-rw-rw-r-- 1 jupyterlab resources 1449 Sep 30 16:32 firefox-bin.sig
-rw-rw-r-- 1 jupyterlab resources 1449 Sep 30 16:32 firefox.sig
drwxrwsr-x 2 jupyterlab resources 4096 Oct 8 13:24 fonts
drwxrwsr-x 3 jupyterlab resources 4096 Oct 8 13:24 gmp-clearkey
drwxrwsr-x 2 jupyterlab resources 4096 Oct 8 13:24 gtk2
drwxrwsr-x 2 jupyterlab resources 4096 Oct 8 13:24 icons
-rwxrwxr-x 1 jupyterlab resources 895568 Sep 30 16:28 libfreeblpriv3.so
-rwxrwxr-x 1 jupyterlab resources 691064 Sep 30 16:28 libgraphitewasm.so
-rwxrwxr-x 1 jupyterlab resources 43408 Sep 30 16:28 liblgpllibs.so
-rwxrwxr-x 1 jupyterlab resources 2175768 Sep 30 16:28 libmozavcodec.so
-rwxrwxr-x 1 jupyterlab resources 220128 Sep 30 16:28 libmozavutil.so
-rwxrwxr-x 1 jupyterlab resources 14352 Sep 30 16:28 libmozgtk.so
-rwxrwxr-x 1 jupyterlab resources 113512 Sep 30 16:28 libmozsandBox.so
-rwxrwxr-x 1 jupyterlab resources 1207424 Sep 30 16:28 libmozsqlite3.so
-rwxrwxr-x 1 jupyterlab resources 18376 Sep 30 16:28 libmozwayland.so
-rwxrwxr-x 1 jupyterlab resources 243728 Sep 30 16:28 libnspr4.so
-rwxrwxr-x 1 jupyterlab resources 694896 Sep 30 16:28 libnss3.so
-rwxrwxr-x 1 jupyterlab resources 465616 Sep 30 16:28 libnssckbi.so
-rwxrwxr-x 1 jupyterlab resources 191728 Sep 30 16:28 libnssutil3.so
-rwxrwxr-x 1 jupyterlab resources 184120 Sep 30 16:28 liboggwasm.so
-rwxrwxr-x 1 jupyterlab resources 22872 Sep 30 16:28 libplc4.so
-rwxrwxr-x 1 jupyterlab resources 14592 Sep 30 16:28 libplds4.so
-rwxrwxr-x 1 jupyterlab resources 168024 Sep 30 16:28 libsmime3.so
-rwxrwxr-x 1 jupyterlab resources 326208 Sep 30 16:28 libsoftokn3.so
-rwxrwxr-x 1 jupyterlab resources 406208 Sep 30 16:28 libssl3.so
-rwxrwxr-x 1 jupyterlab resources 131841712 Sep 30 16:28 libxul.so
-rw-rw-r-- 1 jupyterlab resources 1449 Sep 30 16:32 libxul.so.sig
-rwxrwxr-x 1 jupyterlab resources 1260688 Sep 30 16:28 minidump-analyzer
-rw-rw-r-- 1 jupyterlab resources 26270759 Sep 30 16:32 omni.ja
-rwxrwxr-x 1 jupyterlab resources 614144 Sep 30 16:28 pingsender
-rw-rw-r-- 1 jupyterlab resources 166 Sep 30 16:28 platform.ini
-rwxrwxr-x 1 jupyterlab resources 564936 Sep 30 16:28 plugin-container
-rw-rw-r-- 1 jupyterlab resources 1449 Sep 30 16:32 plugin-container.sig
-rw-rw-r-- 1 jupyterlab resources 2017 Sep 30 16:32 precomplete
-rw-rw-r-- 1 jupyterlab resources 0 Sep 30 16:28 removed-files
-rw-rw-r-- 1 jupyterlab resources 132 Sep 30 16:28 update-settings.ini
-rwxrwxr-x 1 jupyterlab resources 101864 Sep 30 16:28 updater
-rw-rw-r-- 1 jupyterlab resources 638 Sep 30 16:28 updater.ini
I also added the paths of firefox and geckodriver to the PATH environment variable, i.e.:
jupyterlab@jupyterlab-sps:/resources/testDevelop/firefox$ echo $PATH
/resources/testDevelop:/resources/testDevelop/firefox:/resources/firefox:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/jupyterlab/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/jre1.8.0_211/bin:/home/jupyterlab/hadoop-2.9.2/bin:/home/jupyterlab/spark-2.4.3/bin
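The PATH shown above is the one seen by the login shell. A notebook kernel may be started with a different environment, so it can be worth checking from inside the notebook as well. The following is a minimal sketch of such a check, not a confirmed diagnosis; the helper name `locate_executables` is made up for illustration.

```python
import os
import shutil


def locate_executables(names, path=None):
    """Map each name to the executable found on `path`, or None if absent.

    `path` defaults to the PATH of the current process, which inside a
    notebook is the kernel's PATH, not necessarily the shell's.
    """
    search_path = os.environ.get("PATH", "") if path is None else path
    return {name: shutil.which(name, path=search_path) for name in names}


# Show what the running kernel can actually resolve.
found = locate_executables(["firefox", "geckodriver"])
for name, location in found.items():
    print(name, "->", location)
```

If either entry prints `None` inside the notebook while the shell finds it, the kernel's environment differs from the shell's.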
However, if I try this code:
import os
import selenium
from selenium import webdriver
from selenium.webdriver import Firefox
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
driver=Firefox(executable_path='/resources/testDevelop/geckodriver',)
I get:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-23-e332a8e620e3> in <module>
6 from webdriver_manager.firefox import GeckoDriverManager
7
----> 8 driver=Firefox(executable_path='/resources/testDevelop/geckodriver',)
9 cap = DesiredCapabilities().FIREFOX
10 cap["marionette"] = False
~/conda/envs/python/lib/python3.6/site-packages/selenium/webdriver/firefox/webdriver.py in __init__(self,firefox_profile,firefox_binary,timeout,capabilities,proxy,executable_path,options,service_log_path,firefox_options,service_args,desired_capabilities,log_path,keep_alive)
177 else:
178 if self.binary is None:
--> 179 self.binary = FirefoxBinary()
180 if self.profile is None:
181 self.profile = FirefoxProfile()
~/conda/envs/python/lib/python3.6/site-packages/selenium/webdriver/firefox/firefox_binary.py in __init__(self,firefox_path,log_file)
45 self.command_line = None
46 if self._start_cmd is None:
---> 47 self._start_cmd = self._get_firefox_start_cmd()
48 if not self._start_cmd.strip():
49 raise WebDriverException(
~/conda/envs/python/lib/python3.6/site-packages/selenium/webdriver/firefox/firefox_binary.py in _get_firefox_start_cmd(self)
167 raise RuntimeError(
168 "Could not find firefox in your system PATH." +
--> 169 " Please specify the firefox binary location or install firefox")
170 return start_cmd
171
RuntimeError: Could not find firefox in your system PATH. Please specify the firefox binary location or install firefox
So I tried:
import os
import selenium
from selenium import webdriver
from selenium.webdriver import Firefox
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
cap = DesiredCapabilities().FIREFOX
cap["marionette"] = False
driver = os.path.normpath(os.path.join(os.getcwd(),'geckodriver'))
binary = os.path.normpath(os.path.join(os.getcwd(),'firefox','firefox'))
ff_binary = webdriver.firefox.firefox_binary.FirefoxBinary(firefox_path=binary,log_file='ff_log.log')
#driver = webdriver.Firefox(executable_path=GeckoDriverManager().install())
browser = webdriver.Firefox(firefox_binary=ff_binary,capabilities=cap,executable_path=driver)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-26-0bb63b20498c> in <module>
12 binary = os.path.normpath(os.path.join(os.getcwd(),'firefox'))
13 ff_binary = webdriver.firefox.firefox_binary.FirefoxBinary(firefox_path=binary,log_file='ff_log.log')
---> 14 browser = webdriver.Firefox(firefox_binary=ff_binary,executable_path=driver)
~/conda/envs/python/lib/python3.6/site-packages/selenium/webdriver/firefox/webdriver.py in __init__(self,keep_alive)
189
190 executor = ExtensionConnection("127.0.0.1",self.profile,
--> 191 self.binary,timeout)
192 RemoteWebDriver.__init__(
193 self,
~/conda/envs/python/lib/python3.6/site-packages/selenium/webdriver/firefox/extension_connection.py in __init__(self,host,timeout)
50 self.profile.add_extension()
51
---> 52 self.binary.launch_browser(self.profile,timeout=timeout)
53 _URL = "http://%s:%d/hub" % (HOST,PORT)
54 RemoteConnection.__init__(
~/conda/envs/python/lib/python3.6/site-packages/selenium/webdriver/firefox/firefox_binary.py in launch_browser(self,profile,timeout)
70 self.profile = profile
71
---> 72 self._start_from_profile_path(self.profile.path)
73 self._wait_until_connectable(timeout=timeout)
74
~/conda/envs/python/lib/python3.6/site-packages/selenium/webdriver/firefox/firefox_binary.py in _start_from_profile_path(self,path)
93 self.process = Popen(
94 command,stdout=self._log_file,stderr=STDOUT,
---> 95 env=self._firefox_env)
96
97 def _wait_until_connectable(self,timeout=30):
~/conda/envs/python/lib/python3.6/subprocess.py in __init__(self,args,bufsize,executable,stdin,stdout,stderr,preexec_fn,close_fds,shell,cwd,env,universal_newlines,startupinfo,creationflags,restore_signals,start_new_session,pass_fds,encoding,errors)
685 (p2cread,p2cwrite,
686 c2pread,c2pwrite,
--> 687 errread,errwrite) = self._get_handles(stdin,stderr)
688
689 # We wrap OS handles *before* launching the child,otherwise a
~/conda/envs/python/lib/python3.6/subprocess.py in _get_handles(self,stderr)
1202 else:
1203 # Assuming file-like object
-> 1204 c2pwrite = stdout.fileno()
1205
1206 if stderr is None:
AttributeError: 'str' object has no attribute 'fileno'
I don't understand what is wrong here. I checked the path values individually and they look correct, i.e.:
- binary returns: '/resources/StockScreener/firefox/firefox'
- driver returns: '/resources/StockScreener/geckodriver'
- ff_binary returns:
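
A possible reading of the `AttributeError` in the traceback above: the last frames show `subprocess.Popen` calling `stdout.fileno()`, and `FirefoxBinary` was given `log_file='ff_log.log'`, which is a string rather than an open file. The sketch below reproduces that failure mode with plain `subprocess`, independent of Selenium. The child command and the temporary log location are placeholders chosen for the example.

```python
import os
import subprocess
import sys
import tempfile

# A harmless child process standing in for the browser.
command = [sys.executable, "-c", "print('hello from child')"]

# Passing a file *name* as stdout fails, because Popen expects an object
# with a fileno() method (or an int file descriptor, or a special constant).
try:
    subprocess.Popen(command, stdout="ff_log.log")
    string_error = None
except AttributeError as exc:
    string_error = str(exc)
print("string as stdout:", string_error)

# Passing an open file object works.
log_path = os.path.join(tempfile.mkdtemp(), "ff_log.log")
with open(log_path, "wb") as log_file:
    subprocess.Popen(command, stdout=log_file, stderr=subprocess.STDOUT).wait()

with open(log_path, "rb") as log_file:
    logged = log_file.read().decode().strip()
print("logged output:", logged)
```

If that reading is right, passing an open file such as `open('ff_log.log', 'wb')` as `log_file` would avoid this particular error. Whether the legacy launch path selected by `cap["marionette"] = False` then works is a separate question.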
I also tried using GeckoDriverManager this way:
import os
import selenium
from selenium import webdriver
from selenium.webdriver import Firefox
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from webdriver_manager.firefox import GeckoDriverManager
cap = DesiredCapabilities().FIREFOX
cap["marionette"] = False
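driver = os.path.normpath(os.path.join(os.getcwd(),'geckodriver'))
binary = os.path.normpath(os.path.join(os.getcwd(),'firefox','firefox'))
ff_binary = webdriver.firefox.firefox_binary.FirefoxBinary(firefox_path=binary,log_file='ff_log.log')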
#browser = webdriver.Firefox(firefox_binary=ff_binary,executable_path=driver)
driver = webdriver.Firefox(executable_path=GeckoDriverManager().install())
and it returns:
[WDM] - Driver [/home/jupyterlab/.wdm/drivers/geckodriver/linux64/v0.27.0/geckodriver] found in cache
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-32-012cab2ea574> in <module>
13 ff_binary = webdriver.firefox.firefox_binary.FirefoxBinary(firefox_path=binary,log_file='ff_log.log')
14 #browser = webdriver.Firefox(firefox_binary=ff_binary,executable_path=driver)
---> 15 driver = webdriver.Firefox(executable_path=GeckoDriverManager().install())
16 #browser.get('http://google.com/')
17 #Simple assignment
~/conda/envs/python/lib/python3.6/site-packages/selenium/webdriver/firefox/webdriver.py in __init__(self,log_file)
45 self.command_line = None
46 if self._start_cmd is None:
---> 47 self._start_cmd = self._get_firefox_start_cmd()
48 if not self._start_cmd.strip():
49 raise WebDriverException(
~/conda/envs/python/lib/python3.6/site-packages/selenium/webdriver/firefox/firefox_binary.py in _get_firefox_start_cmd(self)
167 raise RuntimeError(
168 "Could not find firefox in your system PATH." +
--> 169 " Please specify the firefox binary location or install firefox")
170 return start_cmd
171
RuntimeError: Could not find firefox in your system PATH. Please specify the firefox binary location or install firefox
Note that in all cases I have:
jupyterlab@jupyterlab-sps:/resources/testDevelop/firefox$ whereis firefox
firefox: /resources/testDevelop/firefox /resources/testDevelop/firefox/firefox.sig /resources/testDevelop/firefox/firefox
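Since `whereis` does find Firefox, one hypothesis worth testing is that the `RuntimeError` is triggered by state left over from earlier cells rather than by the PATH alone. `DesiredCapabilities().FIREFOX` appears to return a dictionary stored on the class, so `cap["marionette"] = False` would change the defaults for every later `Firefox(...)` call in the same kernel and steer Selenium into the legacy `FirefoxBinary()` branch seen in the traceback. The sketch below shows the pitfall with a stand-in class; `FakeCapabilities` is hypothetical and only mimics that layout.

```python
import copy


class FakeCapabilities:
    """Hypothetical stand-in for a class that keeps defaults in a class-level dict."""

    FIREFOX = {"browserName": "firefox", "marionette": True}


# Mutating the dictionary obtained through an instance changes the shared default.
cap = FakeCapabilities().FIREFOX
cap["marionette"] = False
print(FakeCapabilities.FIREFOX["marionette"])  # the class-level default was changed

# Restore the default, then work on a copy instead.
FakeCapabilities.FIREFOX["marionette"] = True
safe_cap = copy.deepcopy(FakeCapabilities.FIREFOX)
safe_cap["marionette"] = False
print(FakeCapabilities.FIREFOX["marionette"])  # the class-level default is untouched
```

This would be consistent with a different error appearing once the kernel is restarted, but it remains an assumption about the installed Selenium version.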
Finally, if I write only:
import os
import selenium
from selenium import webdriver
from selenium.webdriver import Firefox
#from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
#from webdriver_manager.firefox import GeckoDriverManager
driver=Firefox(executable_path='/resources/testDevelop/geckodriver')
after restarting the kernel, without installing webdriver-manager, I get the following error:
---------------------------------------------------------------------------
SessionNotCreatedException Traceback (most recent call last)
<ipython-input-2-89dbd2507c70> in <module>
6 #from webdriver_manager.firefox import GeckoDriverManager
7
----> 8 driver=Firefox(executable_path='/resources/testDevelop/geckodriver')
9 #cap = DesiredCapabilities().FIREFOX
10 #cap["marionette"] = False
~/conda/envs/python/lib/python3.6/site-packages/selenium/webdriver/firefox/webdriver.py in __init__(self,keep_alive)
172 command_executor=executor,
173 desired_capabilities=capabilities,
--> 174 keep_alive=True)
175
176 # Selenium remote
~/conda/envs/python/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py in __init__(self,command_executor,browser_profile,keep_alive,file_detector,options)
155 warnings.warn("Please use FirefoxOptions to set browser profile",
156 DeprecationWarning,stacklevel=2)
--> 157 self.start_session(capabilities,browser_profile)
158 self._switch_to = SwitchTo(self)
159 self._mobile = Mobile(self)
~/conda/envs/python/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py in start_session(self,browser_profile)
250 parameters = {"capabilities": w3c_caps,
251 "desiredCapabilities": capabilities}
--> 252 response = self.execute(Command.NEW_SESSION,parameters)
253 if 'sessionId' not in response:
254 response = response['value']
~/conda/envs/python/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py in execute(self,driver_command,params)
319 response = self.command_executor.execute(driver_command,params)
320 if response:
--> 321 self.error_handler.check_response(response)
322 response['value'] = self._unwrap_value(
323 response.get('value',None))
~/conda/envs/python/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py in check_response(self,response)
240 alert_text = value['alert'].get('text')
241 raise exception_class(message,screen,stacktrace,alert_text)
--> 242 raise exception_class(message,stacktrace)
243
244 def _value_or_default(self,obj,key,default):
SessionNotCreatedException: Message: Unable to find a matching set of capabilities
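This problem seems to be sensitive to the many version updates, so each new release may bring regressions and new issues. How can I solve it? Alternatively, can you suggest another approach that makes web scraping (with JavaScript execution) easy?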
Solution
No working solution to this problem has been found yet.
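One direction that may be worth trying, sketched under assumptions rather than verified on this setup: point Selenium at the bundled Firefox binary explicitly and run it headless, since a hosted notebook typically has no display. The sketch assumes Selenium 3.x, as the traceback suggests (newer releases replace `executable_path` with a service object). The helper names `resolve_paths` and `start_headless_firefox` are made up for illustration.

```python
import os


def resolve_paths(base_dir):
    """Return the expected locations of the bundled Firefox and geckodriver."""
    return {
        "firefox": os.path.join(base_dir, "firefox", "firefox"),
        "geckodriver": os.path.join(base_dir, "geckodriver"),
    }


def start_headless_firefox(base_dir):
    """Start Firefox without a display, using explicit binary and driver paths.

    Assumes the Selenium 3.x keyword arguments `options` and `executable_path`.
    """
    # Imported lazily so resolve_paths() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    paths = resolve_paths(base_dir)
    options = Options()
    options.binary_location = paths["firefox"]  # do not rely on PATH lookup
    options.add_argument("-headless")  # no X display in a hosted notebook
    return webdriver.Firefox(options=options, executable_path=paths["geckodriver"])
```

In the notebook this would be called as `browser = start_headless_firefox('/resources/testDevelop')`, followed by `browser.quit()` when finished.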