Problem description
I need to convert a .pdf file to .jpeg files so that I can run OCR on the text. I found this code:
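from pdf2image import convert_from_path
pages = convert_from_path('img732.pdf', 500)  # render every page at 500 DPI
for i, page in enumerate(pages, start=1):
    # Use a distinct file name per page so later pages do not overwrite earlier ones
    page.save(f'out_{i}.jpg', 'JPEG')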
I got this error:
Traceback (most recent call last):
File "C:\Users\david\AppData\Local\Programs\Python\python39\lib\site-packages\pdf2image\pdf2image.py",line 441,in pdfinfo_from_path
proc = Popen(command,env=env,stdout=PIPE,stderr=PIPE)
File "C:\Users\david\AppData\Local\Programs\Python\python39\lib\subprocess.py",line 951,in __init__
self._execute_child(args,executable,preexec_fn,close_fds,
File "C:\Users\david\AppData\Local\Programs\Python\python39\lib\subprocess.py",line 1420,in _execute_child
hp,ht,pid,tid = _winapi.CreateProcess(executable,args,
FileNotFoundError: [WinError 2] Impossibile trovare il file specificato
During handling of the above exception,another exception occurred:
Traceback (most recent call last):
File "C:\Users\david\OneDrive\Desktop\SMEpy\prova!!!.py",line 2,in <module>
pages = convert_from_path('img732.pdf',500)
File "C:\Users\david\AppData\Local\Programs\Python\python39\lib\site-packages\pdf2image\pdf2image.py",line 97,in convert_from_path
page_count = pdfinfo_from_path(pdf_path,userpw,poppler_path=poppler_path)["Pages"]
File "C:\Users\david\AppData\Local\Programs\Python\python39\lib\site-packages\pdf2image\pdf2image.py",line 467,in pdfinfo_from_path
raise PDFInfoNotInstalledError(
pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?
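The .pdf file is in the same directory as the .py file. What is the problem?
Solution
This is most likely a problem with a dependency of the library rather than with your code: pdf2image calls Poppler's command-line tools (the traceback shows it failing to start pdfinfo), and Windows cannot find them. The following steps should get it running: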
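- Download the Poppler tools for Windows (I recommend the latest version):
http://blog.alivate.com.au/poppler-windows/
- After downloading, extract the archive to a folder of your choice
- Add Poppler's "bin" folder to the PATH environment variable
- Restart your Python workspace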
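If you would rather not edit PATH, or want to confirm that the steps above worked, something like the following may help. It is only a sketch: the helper names find_pdfinfo, page_filename and pdf_to_jpegs are my own, not part of pdf2image. It relies on the poppler_path argument of convert_from_path (the same argument that appears in the traceback) and on shutil.which from the standard library to look for pdfinfo.

```python
import shutil
from pathlib import Path


def find_pdfinfo(poppler_bin=None):
    """Return the full path of Poppler's pdfinfo executable, or None if it is not found."""
    # With path=None, shutil.which searches the PATH environment variable;
    # otherwise it searches the given folder instead of PATH.
    search_path = str(poppler_bin) if poppler_bin is not None else None
    return shutil.which("pdfinfo", path=search_path)


def page_filename(stem, index):
    """Build a distinct, zero-padded file name for each page, e.g. img732_001.jpg."""
    return f"{stem}_{index:03d}.jpg"


def pdf_to_jpegs(pdf_path, dpi=500, poppler_bin=None):
    """Convert every page of pdf_path to a JPEG next to it and return the written paths."""
    if find_pdfinfo(poppler_bin) is None:
        raise FileNotFoundError("pdfinfo not found; check the Poppler bin folder or PATH")

    # Imported here so the two helpers above can be used without pdf2image installed.
    from pdf2image import convert_from_path

    pdf_path = Path(pdf_path)
    # poppler_path=None makes pdf2image fall back to whatever is on PATH.
    pages = convert_from_path(str(pdf_path), dpi, poppler_path=poppler_bin)
    written = []
    for index, page in enumerate(pages, start=1):
        target = pdf_path.with_name(page_filename(pdf_path.stem, index))
        page.save(target, "JPEG")
        written.append(target)
    return written
```

For example, pdf_to_jpegs('img732.pdf', poppler_bin=r'C:\tools\poppler\bin') would use a Poppler copy extracted to that folder (the folder name is made up, use wherever you extracted it), while pdf_to_jpegs('img732.pdf') relies on PATH. If find_pdfinfo() returns None after you edited PATH, the "bin" folder is probably still not visible to the Python process, which usually means the editor or terminal was not restarted.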