Converting HTML to PDF from an HTTPS site that requires authentication

Problem description

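I have been trying to convert HTML to PDF from my company's website, which is served over HTTPS and requires authentication.

I first tried converting it directly with pdfkit.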

    pdfkit.from_url("https://companywebsite.com",'output.pdf')

But I get these errors:

    Error: Authentication Required
    Error: Failed to load https://companywebsite.com, with network status code 204 and http status code 401 - Host requires authentication

So I added options to the call:

    options = {'username': username, 'password': password}
    pdfkit.from_url("https://companywebsite.com", 'output.pdf', options=options)

It loads forever without producing any output.

My second approach was to create a session with requests:

    import pdfkit
    import requests
    from requests.auth import HTTPBasicAuth

    def download(session, username, password):
        # First request with HTTP Basic auth; verify=False skips TLS certificate checks
        session.get('https://companywebsite.com', auth=HTTPBasicAuth(username, password), verify=False)

        ua = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML,like Gecko) Chrome/41.0.2228.0 Safari/537.36'
        session.headers.update({'User-Agent': ua})
        payload = {'UserName': username, 'Password': password, 'AuthMethod': 'FormsAuthentication'}

        # Submit the login form, then fetch the target page with the same session
        session.post('https://companywebsite.com', data=payload)
        my_html = session.get('https://companywebsite.com/thepageiwant')
        with open('myfile.html', 'wb') as html_file:
            html_file.write(my_html.content)

        # Raw string so the backslashes in the Windows path are not read as escapes
        path_wkhtmltopdf = r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe'
        config = pdfkit.configuration(wkhtmltopdf=path_wkhtmltopdf)

        # Pass the configuration so pdfkit actually uses the path above
        pdfkit.from_file('myfile.html', 'out.pdf', configuration=config)

    session = requests.Session()
    download(session, username, password)

Can anyone help me? I get a 200 from session.get, so the session is definitely being established.

Solution
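
Since the requests session in the question already returns 200, one option that may be worth trying is to forward that session's cookies to wkhtmltopdf, so that it fetches the page and its assets as the logged-in user. This is an untested sketch: `cookie_options` and `render_with_session` are hypothetical helper names, and it assumes the site tracks the login with ordinary cookies. The list-of-tuples form for the `cookie` option follows the pdfkit README.

```python
def cookie_options(cookies):
    """Build a pdfkit options dict that forwards each cookie to wkhtmltopdf."""
    # pdfkit accepts a list of (name, value) tuples for repeatable options.
    return {"cookie": [(name, value) for name, value in cookies.items()]}


def render_with_session(session, url, pdf_path):
    """Render url to pdf_path with the cookies of a logged-in requests session."""
    # Imported lazily; pdfkit also needs the wkhtmltopdf binary to be installed.
    import pdfkit

    options = cookie_options(session.cookies.get_dict())
    pdfkit.from_url(url, pdf_path, options=options)
```

With the session from the question, the call would look like `render_with_session(session, 'https://companywebsite.com/thepageiwant', 'out.pdf')`. If the server insists on HTTP authentication for every request rather than a cookie, this will likely still return 401.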

Maybe try using Selenium to open the site and capture a screenshot.
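
A minimal sketch of that idea, assuming Selenium 4 with Chrome installed and assuming the site accepts HTTP Basic credentials embedded in the URL (not every browser or server allows this). `with_basic_auth` and `capture_page` are hypothetical helper names, not part of Selenium.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_basic_auth(url, username, password):
    """Return url with percent-encoded credentials placed before the host."""
    parts = urlsplit(url)
    credentials = f"{quote(username, safe='')}:{quote(password, safe='')}"
    netloc = f"{credentials}@{parts.netloc}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def capture_page(url, username, password, png_path="page.png"):
    """Open the page in headless Chrome and save a screenshot to png_path."""
    # Imported here so the URL helper above works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(with_basic_auth(url, username, password))
        driver.save_screenshot(png_path)
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()
```

If the site uses a login form instead (the `FormsAuthentication` payload in the question suggests it might), the credentials in the URL will not help, and the script would need to fill in and submit the form before taking the screenshot.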
