Problem description
I am using pytesseract in Python to OCR PDF files, but I get a permission error on Windows 10. I installed tesseract-ocr-w64-setup-v5.0.0-alpha.20200328.exe from https://github.com/UB-Mannheim/tesseract/wiki, and I also have the poppler-20.09.0 files. I am using Python 3.8.0.
import pdf2image
import PyPDF2
import os
try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract

# As posted: this value is the Tesseract install directory.
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR'

def pdf_to_img(pdf_file):
    # Render each PDF page to a PIL image using the local poppler binaries.
    print('pdf_file = ', pdf_file)
    return pdf2image.convert_from_path(
        pdf_file, dpi=200, fmt='jpg',
        poppler_path=r'F:\lokesh\resume_script\poppler-20.09.0\bin')

def ocr_core(file):
    # Run Tesseract OCR on a single page image and return the text.
    text = pytesseract.image_to_string(file)
    return text

def print_pages(pdf_file):
    images = pdf_to_img(pdf_file)
    for pg, img in enumerate(images):
        print(ocr_core(img))

print_pages("aa.pdf")
Traceback (most recent call last):
  File "test.py", line 84, in <module>
    print_pages("aa.pdf")
  File "test.py", line 81, in print_pages
    print(ocr_core(img))
  File "test.py", line 74, in ocr_core
    text = pytesseract.image_to_string(file,)
  File "F:\python\lib\site-packages\pytesseract\pytesseract.py", line 344, in image_to_string
    return {
  File "F:\python\lib\site-packages\pytesseract\pytesseract.py", line 347, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "F:\python\lib\site-packages\pytesseract\pytesseract.py", line 258, in run_and_get_output
    run_tesseract(**kwargs)
  File "F:\python\lib\site-packages\pytesseract\pytesseract.py", line 229, in run_tesseract
    raise e
  File "F:\python\lib\site-packages\pytesseract\pytesseract.py", line 226, in run_tesseract
    proc = subprocess.Popen(cmd_args, **subprocess_args())
  File "F:\python\lib\subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "F:\python\lib\subprocess.py", line 1307, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
PermissionError: [WinError 5] Access is denied
Solution
No working solution to this problem has been found yet.
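One plausible cause, not confirmed by the original poster: the traceback fails inside subprocess.Popen, at the point where Windows is asked to start the configured command as a process, and the script sets tesseract_cmd to the install directory rather than to the executable inside it. pytesseract expects that setting to be the path of the tesseract executable itself. The sketch below shows one way to guard against this. The helper resolve_tesseract_cmd is hypothetical (it is not part of pytesseract), and the example assumes the installer placed tesseract.exe directly in C:\Program Files\Tesseract-OCR.

```python
import os


def resolve_tesseract_cmd(path, exe_name="tesseract.exe"):
    """Return the path of the Tesseract executable.

    Hypothetical helper, not part of pytesseract. If `path` is a directory,
    the executable name is appended; if the result is not an existing file,
    FileNotFoundError is raised before any OCR call is attempted.
    """
    candidate = os.path.join(path, exe_name) if os.path.isdir(path) else path
    if not os.path.isfile(candidate):
        raise FileNotFoundError(f"Tesseract executable not found: {candidate}")
    return candidate


# Intended use in the script above (requires pytesseract to be installed;
# the install location is an assumption taken from the question):
#
#   import pytesseract
#   pytesseract.pytesseract.tesseract_cmd = resolve_tesseract_cmd(
#       r"C:\Program Files\Tesseract-OCR"
#   )
```

If the assumption about the install location holds, this is the same as assigning r'C:\Program Files\Tesseract-OCR\tesseract.exe' to tesseract_cmd directly. The helper only adds an early, readable error when the file is missing.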