Skipping password-protected files with pypdf: "only algorithm code 1 and 2 are supported"

Problem description

I am running the checkPDFs function over a list of several different PDF links.

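from io import BytesIO

import requests
from PyPDF2 import PdfFileReader

def getResponse(url):
    # Download the PDF; return None if the request itself fails
    try:
        response = requests.get(url)
    except requests.RequestException:
        response = None
    return response

def getNumberOfPages(response):
    # Parse the downloaded bytes in memory and count the pages
    with BytesIO(response.content) as open_pdf_file:
        read_pdf = PdfFileReader(open_pdf_file)
        if read_pdf.isEncrypted:
            # Try the empty password; this is the call that raises the error below
            read_pdf.decrypt("")
        num_pages = read_pdf.getNumPages()
        return num_pages

def checkPDFs(pdfLinks):
    # Collect details for every link, keyed by URL
    pdfDetails = {}
    for link in pdfLinks:
        pdfDetails[link] = {}
        response = getResponse(link)
        pdfDetails[link]["numberOfPages"] = getNumberOfPages(response)
        #pdfDetails[link]["creationDate"] = getDocumentInfo(response)
    print("PDF details", pdfDetails)
    return pdfDetails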

It works for some of the links, but for others it fails and raises this error:

<ipython-input-46-3213ac4b89ef> in checkPDFs(companyName,pdfLinks)
     29         pdfDetails[link] = {}
     30         response = getResponse(link)
---> 31         pdfDetails[link]["numberOfPages"] = getNumberOfPages(response)
     32         #pdfDetails[link]["creationDate"] = getDocumentInfo(response)
     33     print("PDF details",pdfDetails)

<ipython-input-46-3213ac4b89ef> in getNumberOfPages(response)
     10         read_pdf = PdfFileReader(open_pdf_file)
     11         if read_pdf.isEncrypted:
---> 12             read_pdf.decrypt("")
     13         num_pages = read_pdf.getNumPages()
     14         return num_pages

c:\users\nh\appdata\local\programs\python\python39\lib\site-packages\PyPDF2\pdf.py in decrypt(self,password)
   1985         self._override_encryption = True
   1986         try:
-> 1987             return self._decrypt(password)
   1988         finally:
   1989             self._override_encryption = False

c:\users\nh\appdata\local\programs\python\python39\lib\site-packages\PyPDF2\pdf.py in _decrypt(self,password)
   1994             raise NotImplementedError("only Standard PDF encryption handler is available")
   1995         if not (encrypt['/V'] in (1,2)):
-> 1996             raise NotImplementedError("only algorithm code 1 and 2 are supported")
   1997         user_password,key = self._authenticateUserPassword(password)
   1998         if user_password:

NotImplementedError: only algorithm code 1 and 2 are supported

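I have already tried decrypting inside the getNumberOfPages function, but it does not work. I do not know the password. Is there any way to get around this? Alternatively, is there a way to skip encrypted files in the code? For example, if a file cannot be decrypted, I would like to return the string "encrypted" instead of raising an error. How should I modify my script?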

Solution

No verified solution to this problem has been found yet.
