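Problem description
I have tesseract installed at /usr/share/tesseract-ocr/ and tessdata at /usr/share/tesseract-ocr/4.0/tessdata, and it works fine. Since equ.traineddata does not ship with the original installation, I got it from the official documentation and managed to paste it at /usr/share/tesseract-ocr/4.0/tessdata/equ.traineddata. Apart from this, I also pasted hin, ben and more files. When I use -l eng+hin+ben it works fine, but with equ it throws an error. I have also used pytesseract with some configuration, such as: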
# making a copy of tessdata dir in the home
cli_config = '--oem 1 --psm 12 --tessdata-dir ~/tessdata/ -l eng+equ+ben+hin'
ocr.image_to_string(image=img_path,config=cli_config)
and
cli_config = '--oem 1 --psm 12'  # tessdata is at default location too
ocr.image_to_string(image=img_path, config=cli_config, lang='eng+equ+hin+ben')
but it always throws an error, and only for equ, such as:
TesseractError Traceback (most recent call last)
<ipython-input-30-8529ae8e51e8> in <module>
----> 1 ocr.image_to_string(image=img_path,lang='equ')
~/anaconda3/envs/py36/lib/python3.6/site-packages/pytesseract/pytesseract.py in image_to_string(image,lang,config,nice,output_type,timeout)
356 Output.DICT: lambda: {'text': run_and_get_output(*args)},
357 Output.STRING: lambda: run_and_get_output(*args),
--> 358 }[output_type]()
359
360
~/anaconda3/envs/py36/lib/python3.6/site-packages/pytesseract/pytesseract.py in <lambda>()
355 Output.BYTES: lambda: run_and_get_output(*(args + [True])),
356 Output.DICT: lambda: {'text': run_and_get_output(*args)},
--> 357 Output.STRING: lambda: run_and_get_output(*args),
358 }[output_type]()
359
~/anaconda3/envs/py36/lib/python3.6/site-packages/pytesseract/pytesseract.py in run_and_get_output(image,extension,timeout,return_bytes)
264 }
265
--> 266 run_tesseract(**kwargs)
267 filename = kwargs['output_filename_base'] + extsep + extension
268 with open(filename,'rb') as output_file:
~/anaconda3/envs/py36/lib/python3.6/site-packages/pytesseract/pytesseract.py in run_tesseract(input_filename,output_filename_base,timeout)
240 with timeout_manager(proc,timeout) as error_string:
241 if proc.returncode:
--> 242 raise TesseractError(proc.returncode,get_errors(error_string))
243
244
TesseractError: (1,'Error opening data file /home/deshwal/anaconda3/envs/py36/share/tessdata/equ.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'equ\' Tesseract Couldn\'t load any languages! Could not initialize tesseract.')
What could be the reason? How can I use equ.traineddata?
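Solution
equ is legacy language data, so you need to use an appropriate --oem value; the configurations in the question pass --oem 1, which selects the LSTM-only engine. Run tesseract --help-extra to display the usage, including the available OCR engine modes.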
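As a rough illustration of that advice, here is a minimal sketch of calling pytesseract with the legacy engine selected. The helper names build_config and ocr_equations are hypothetical and not part of pytesseract. The sketch assumes that --oem 0 (listed as the legacy-only engine in tesseract --help-extra) is the appropriate value for equ, and that a leading ~ inside the config string is not expanded for you, so the path is expanded explicitly.

```python
import os

# OCR engine modes as listed by `tesseract --help-extra`.
OEM_LEGACY_ONLY = 0
OEM_LSTM_ONLY = 1


def build_config(oem, psm, tessdata_dir=None):
    """Build a config string for pytesseract (hypothetical helper)."""
    parts = ["--oem", str(oem), "--psm", str(psm)]
    if tessdata_dir is not None:
        # Assumption: "~" is not expanded inside the config string,
        # so turn it into an absolute path here.
        parts += ["--tessdata-dir", os.path.expanduser(tessdata_dir)]
    return " ".join(parts)


def ocr_equations(img_path, tessdata_dir=None):
    """Run OCR with the equ data and the legacy engine (hypothetical helper)."""
    # Imported lazily so build_config can be used without pytesseract installed.
    import pytesseract

    config = build_config(OEM_LEGACY_ONLY, 12, tessdata_dir)
    return pytesseract.image_to_string(img_path, lang="equ", config=config)
```

For example, ocr_equations(img_path, tessdata_dir="/usr/share/tesseract-ocr/4.0/tessdata") would pass the directory from the question. The error message in the question also mentions TESSDATA_PREFIX, so equ.traineddata has to be in the directory that the tesseract binary being run actually reads; the optional tessdata_dir argument is one way to point at it.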