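Problem description
I am trying to automate a website but have run into a JavaScript problem. After my first request, '__EVENTTARGET' = 'ctl00$content$ctl01$btn'. A new href link then appears on the same page, and I now want to request this new JavaScript link, with '__EVENTTARGET' = 'ctl00$content$ctl01$'. I don't know how to follow a javascript:__doPostBack() link with a Scrapy spider. I have tried to research this but could not find anything.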
<a id="ctl00_content_" href="javascript:__doPostBack('ctl00$content$ctl01$,'')">SYED ALI </a>
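A link like this carries two values inside its href: the event target and the event argument. As a minimal sketch (not part of the original post), they could be pulled out with a regular expression. The helper name `parse_postback` is hypothetical, and the sketch assumes a well-formed href with both arguments quoted.

```python
import re

# Matches __doPostBack('target','argument') with single or double quotes.
# Groups 1 and 3 capture the opening quote so the same quote must close it.
_POSTBACK_RE = re.compile(
    r"""__doPostBack\(\s*(['"])(?P<target>.*?)\1\s*,\s*(['"])(?P<argument>.*?)\3\s*\)"""
)


def parse_postback(href):
    """Return (event_target, event_argument) from a __doPostBack href, or None."""
    match = _POSTBACK_RE.search(href)
    if match is None:
        return None
    return match.group("target"), match.group("argument")
```

The returned pair corresponds to the values the browser would place in the `__EVENTTARGET` and `__EVENTARGUMENT` form fields when the link is clicked.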
My spider code:
from urllib.request import urlopen

import scrapy
from bs4 import BeautifulSoup
from scrapy.http import FormRequest

URL = 'xyz'

class ExitRealtySpider(scrapy.Spider):
    name = "campSpider"  # name = "exit_realty"
    allowed_domains = ["xyz"]
    start_urls = [URL]

    def parse(self, response):
        # submit a form (first page)
        self.data = {}
        soup = BeautifulSoup(urlopen(URL), 'html.parser')
        viewstate = soup.find('input', {'id': '__VIEWSTATE'})['value']
        generator = soup.find('input', {'id': '__VIEWSTATEGENERATOR'})['value']
        validation = soup.find('input', {'id': '__EVENTVALIDATION'})['value']
        self.data['__VIEWSTATE'] = viewstate
        self.data['__VIEWSTATEGENERATOR'] = generator
        self.data['__VIEWSTATEENCRYPTED'] = ''
        self.data['__EVENTVALIDATION'] = validation
        self.data['typAirmenInquiry'] = '7'
        self.data['ctl00$content$ctl01$txtbxLastName'] = 'a'
        self.data['ctl00$content$ctl01$txtbxCertNo'] = '123'
        self.data['ctl00$content$ctl01$btnSearch'] = 'Search'
        self.data['__EVENTTARGET'] = 'ctl00$content$ctl01$'
        return FormRequest.from_response(
            response, method='POST', callback=self.parse_page,
            formdata=self.data, dont_filter=True,
            # encoding='utf-8', meta={'page': 1},
            # headers=HEADERS
        )

    def parse_page(self, response):
        print("\n\n\n\n\n", response.body, "\n\n\n\n\n")
        self.data = {}
        soup = BeautifulSoup(urlopen(URL), 'html.parser')
        viewstate = soup.find('input', {'id': '__VIEWSTATE'})['value']
        validation = soup.find('input', {'id': '__EVENTVALIDATION'})['value']
        self.data['__EVENTARGUMENT'] = ''
        self.data['__LASTFOCUS'] = ''
        self.data['__VIEWSTATE'] = viewstate
        self.data['__EVENTTARGET'] = 'ctl00$content$ctl01$'
        # note: self.data is built above but is not passed as formdata here
        ans = FormRequest.from_response(
            response, callback=self.parse_page2, dont_filter=True,
            # headers=HEADERS
        )
        return ans

    def parse_page2(self, response):
        print("\n\n\n\n\n", response.body, "\n\n\n\n\n")
Solution
No working solution to this problem has been found yet.
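One direction that may be worth trying, offered as an unverified sketch rather than a confirmed fix: on ASP.NET WebForms pages, `__doPostBack(target, argument)` normally writes its two arguments into the hidden `__EVENTTARGET` and `__EVENTARGUMENT` fields and submits the form, so the same POST could be rebuilt by hand. The helper below uses only the standard library. `HiddenInputParser` and `build_postback_formdata` are hypothetical names, and which hidden fields the real page contains is an assumption.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def build_postback_formdata(html, event_target, event_argument=""):
    """Copy the page's hidden fields, then set the two postback fields."""
    parser = HiddenInputParser()
    parser.feed(html)
    parser.close()
    formdata = dict(parser.fields)
    # These two fields are what the link's JavaScript would have filled in.
    formdata["__EVENTTARGET"] = event_target
    formdata["__EVENTARGUMENT"] = event_argument
    return formdata
```

In Scrapy, `FormRequest.from_response` already copies the form's existing fields from the response, so it may be enough to pass only `__EVENTTARGET` and `__EVENTARGUMENT` as `formdata`, together with `dont_click=True` so that no submit button (such as the Search button) is included in the request.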