Problem description
I have a situation. I am trying to extract text from a PDF/image using tesseract. The input text in the PDF/image is formatted as follows:
Review
Dear All,Turning The Corner On A
Challenge full Year (This is a heading)
We are executing and gaining
mobile share in our markets
in Australia against intense competition.
However,the past year was
After running it through tesseract, I get output in the following format:
Review
Dear All,Turning The Corner On A
Challenge full Year (This is a heading)
We are executing and gaining
mobile share in our markets
in Australia against intense competition.
However,the past year was
I want the output in the following format:
Review
Dear All,the past year was
I tried running the following command to get the expected result, but it didn't work:
# result is the variable that stores my tesseract output
"".join([s for s in result.splitlines(True) if s.strip("\r\n")])
Review
Dear All,the past year was
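A minimal sketch, not a confirmed fix: one possible reason the attempt above appears not to work is that `s.strip("\r\n")` only strips line-break characters, so a line containing just spaces or tabs is still non-empty and survives the filter. Another is that the expression builds a new string rather than changing `result`, so its value has to be assigned back. The helper below (`remove_blank_lines` is a hypothetical name) calls `strip()` with no argument, which removes all surrounding whitespace; the sample string is made up to stand in for tesseract output.

```python
def remove_blank_lines(text: str) -> str:
    """Drop lines that are empty or contain only whitespace.

    s.strip() with no argument removes spaces and tabs as well as
    line breaks, so whitespace-only lines are dropped too, unlike
    s.strip("\r\n"), which only removes line-break characters.
    """
    # splitlines() without keepends returns lines without their terminators
    # and handles \n, \r\n and \r alike.
    kept = [line for line in text.splitlines() if line.strip()]
    return "\n".join(kept)


# Hypothetical sample standing in for tesseract output
# (note the line that holds only spaces).
result = "Review\n\n   \nDear All,Turning The Corner On A\n"

# Assign the return value: the function does not modify result in place.
result = remove_blank_lines(result)
print(result)
```

The lines are rejoined with `"\n"`, so the result has no trailing newline and any `\r\n` endings are normalized; append `"\n"` or keep the original terminators if those matter downstream.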