Problem description
Using the following code in the scrapy shell, I can log in to stackoverflow. However, I don't want to type these commands by hand on the command line, so I am trying to run the same commands from a subprocess to log in.
from scrapy import FormRequest

url = "https://stackoverflow.com/users/login"
fetch(url)
req = FormRequest.from_response(
    response,
    formid='login-form',
    formdata={'email': 'test@test.com', 'password': 'testpw'},
    clickdata={'id': 'submit-button'},
)
fetch(req)
But this gives me an error like this:

TypeError: argument of type 'FormRequest' is not iterable
I also tried saving the response to an HTML file and reading that file back in as the response, and got the same error message as above.
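A plausible source of that TypeError, sketched without scrapy below (`FormRequestStub` is a hypothetical stand-in for `FormRequest`): every element of a subprocess argv list must be a plain string. On Windows, subprocess joins the list into a single command line and performs substring checks such as `" " in arg` on each element, and that membership test is exactly the operation that fails on a non-string object:

```python
# Hypothetical stand-in for scrapy.FormRequest, so scrapy isn't needed here.
class FormRequestStub:
    pass

msg = None
try:
    # The kind of membership test subprocess's command-line joining performs:
    " " in FormRequestStub()
except TypeError as exc:
    msg = str(exc)
print(msg)
```

The message has the same shape as the one in the question: `argument of type 'FormRequestStub' is not iterable`.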
import subprocess
import scrapy
from scrapy import FormRequest
from subprocess import run, call
from bs4 import BeautifulSoup

class QuoteSpider(scrapy.Spider):
    name = 'stackover'
    start_urls = ['https://stackoverflow.com/users/login']
    run(["scrapy", "fetch", start_urls[0]], capture_output=True, text=True)

    def parse(self, response):
        req = FormRequest.from_response(response)
        run(["scrapy", req], shell=True)
        with open("output.html", "w") as f:
            response = call(["scrapy", url], stdout=f, shell=True)
        with open("output.html", encoding="utf-8") as f:
            data = f.read()
        response = BeautifulSoup(data, 'lxml')
I also tried running the command before calling the parse function, like this:
r = run(["scrapy",capture_output=True)
response = r.stdout.decode()
And then I ran into a new error:

AttributeError: 'str' object has no attribute 'encoding'
So, how can I run scrapy shell commands with subprocess to log in to stackoverflow? And what exactly is the response that FormRequest expects in scrapy?
I'm learning scrapy and trying different ways of logging in to stackoverflow to practice web scraping.
Solution
from scrapy import FormRequest
from scrapy import Spider

class StackSpider(Spider):
    name = 'stack_spider'

    # List of urls for initial requests. Can be one or many.
    # The default method parse() is called for the start responses.
    start_urls = ["https://stackoverflow.com/users/login"]

    # Parse the users/login page: grab the form and move on.
    def parse(self, response):
        yield FormRequest.from_response(
            response,
            formid='login-form',
            formdata={'email': 'test@test.com', 'password': 'testpw'},
            clickdata={'id': 'submit-button'},
            callback=self.parse_login,
        )

    # Parse the login result.
    def parse_login(self, response):
        print('Checking logging in here.')
You can run it with scrapy crawl stack_spider.
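If you still want to drive the crawl from Python instead of the command line, the subprocess pattern the question was after can be sketched like this. Note every argv element must be a plain string; the real call would be `run(["scrapy", "crawl", "stack_spider"], ...)`, demonstrated here with `sys.executable` as a stand-in so the snippet runs without a Scrapy project:

```python
import sys
from subprocess import run

# Pattern: run(["scrapy", "crawl", "stack_spider"], capture_output=True, text=True)
# Stand-in command so this sketch is runnable anywhere:
result = run(
    [sys.executable, "-c", "print('crawl finished')"],
    capture_output=True,
    text=True,  # decode stdout/stderr to str
)
print(result.returncode)
print(result.stdout.strip())
```

With a real `scrapy crawl`, check `result.stderr` rather than `result.stdout`: scrapy writes its log to stderr.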