FormRequest that renders JS content in the scrapy shell

Problem description

I'm trying to scrape content from this page using the following form data:

I need County: set to Prince George's and DateOfFilingFrom set to 01-01-2000, so I do the following:

% scrapy shell
In [1]: from scrapy.http import FormRequest

In [2]: request = FormRequest(url='https://registers.maryland.gov/RowNetWeb/Estates/frmEstateSearch2.aspx',formdata={'DateOfFilingFrom': '01-01-2000','County:': "Prince George's"})

In [3]: response

In [4]:

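But it doesn't work (response is None). In addition, the next page looks like the one below and is loaded dynamically. I need to know how to access each of the links shown below, given the following inspection. As far as I know this can be done with Splash, but I'm not sure how to combine a FormRequest with a SplashRequest and do it all from inside the scrapy shell for testing. What am I doing wrong, and how do I render the next page (the one produced by the FormRequest, shown below)?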

Solution

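The request you are sending is missing several fields, which is probably why you are not getting a response back. The fields you fill in also do not match the field names the site expects in the request. Note too that the session above only builds the request and never passes it to fetch(), so the shell has no response to show. A good way to handle the missing fields is Scrapy's FormRequest.from_response (see the Scrapy docs), which can pre-populate some of the fields for you from the information in the form.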

For this site, the following worked for me (using the scrapy shell):

>>> url = "https://registers.maryland.gov/RowNetWeb/Estates/frmEstateSearch2.aspx"
>>> fetch(url)
>>> from scrapy import FormRequest
>>> req = FormRequest.from_response(
...     response,
...     formxpath="//form[@id='form1']",  # specify the form on the current page
...     formdata={
...         'cboCountyId': '16',  # the county you select is converted to a number
...         'DateOfFilingFrom': '01-01-2001',
...         'cboPartyType': 'Decedent',
...         'cmdSearch': 'Search'
...     },
...     clickdata={'type': 'submit'},
... )
>>> fetch(req)
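The '16' above is the option value the form submits for the county, not the visible label. One way to find such values is to map each option's label to its value attribute. The sketch below does that with the standard library only, so it runs outside the shell. The sample markup is hypothetical: the values 0 and 15 and the assumption that the select carries name="cboCountyId" are invented for illustration, and only the pairing of Prince George's with 16 comes from the session above.

```python
from html.parser import HTMLParser


class SelectOptionParser(HTMLParser):
    """Collect {visible label: value} for the options of one named <select>."""

    def __init__(self, select_name):
        super().__init__()
        self.select_name = select_name
        self.options = {}
        self._in_select = False
        self._current_value = None
        self._label_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select":
            # Only track options that belong to the select we care about.
            self._in_select = attrs.get("name") == self.select_name
        elif tag == "option" and self._in_select:
            self._current_value = attrs.get("value")
            self._label_parts = []

    def handle_data(self, data):
        if self._current_value is not None:
            self._label_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "option" and self._current_value is not None:
            label = "".join(self._label_parts).strip()
            self.options[label] = self._current_value
            self._current_value = None
        elif tag == "select":
            self._in_select = False


# Hypothetical markup; on the real page the option list would come from
# response.text after fetch(url).
SAMPLE_HTML = """
<select name="cboCountyId">
  <option value="0">All Counties</option>
  <option value="15">Montgomery</option>
  <option value="16">Prince George's</option>
</select>
"""

parser = SelectOptionParser("cboCountyId")
parser.feed(SAMPLE_HTML)
county_id = parser.options["Prince George's"]
print(county_id)
```

Inside the scrapy shell the same lookup is probably shorter with response.xpath on the select's option elements; the parser here is just a dependency-free stand-in.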

dynamic-content scrapy