How to remove unnecessary blank lines from a paragraph

Problem description

I have a situation. I am trying to extract text from a PDF or image using tesseract. The input text in the PDF/image is formatted as follows:

Review

Dear All,Turning The Corner On A
Challenge full Year (This is a heading)
We are executing and gaining
mobile share in our markets
in Australia against intense competition.
However,the past year was

After reading it with tesseract, I get output in the following format:

Review
    
Dear All,Turning The Corner On A
Challenge full Year (This is a heading)
We are executing and gaining
mobile share in our markets

in Australia against intense competition.
However,the past year was

I want output in the following format:

Review
    
Dear All,the past year was

I tried the following to get the expected result, but it did not work.

"".join([s for s in result.splitlines(True) if s.strip("\r\n")])
# result is the variable that stores my tesseract output

The code above removes every blank line, including the one I want to keep. This is the output:

Review
Dear All,the past year was

Solution
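
One possible approach, offered as a sketch rather than a confirmed fix: instead of dropping every blank line, drop only the ones that seem to fall in the middle of a sentence. The heuristic below is an assumption, not something tesseract guarantees. It treats a blank line as spurious when the next non-blank line starts with a lowercase letter (as "in Australia against intense competition." does in the sample), and keeps it otherwise (as before "Dear All"). The function name drop_spurious_blank_lines is hypothetical, and result stands for the string holding the tesseract output, as in the question.

```python
def drop_spurious_blank_lines(text: str) -> str:
    """Remove blank lines that look like they split a sentence.

    Heuristic (an assumption): a blank line is spurious when the next
    non-blank line starts with a lowercase letter, i.e. it reads like a
    continuation of the previous line.
    """
    lines = text.splitlines()
    kept = []
    for i, line in enumerate(lines):
        if line.strip():
            # Non-blank lines are always kept unchanged.
            kept.append(line)
            continue
        # Look ahead to the next non-blank line, if there is one.
        following = next((l for l in lines[i + 1:] if l.strip()), None)
        if following is not None and following.lstrip()[:1].islower():
            # Looks like a mid-sentence break: drop this blank line.
            continue
        # Otherwise treat it as a real paragraph break and keep it.
        kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    # Sample text copied from the question; the second line is whitespace-only.
    result = (
        "Review\n"
        "    \n"
        "Dear All,Turning The Corner On A\n"
        "Challenge full Year (This is a heading)\n"
        "We are executing and gaining\n"
        "mobile share in our markets\n"
        "\n"
        "in Australia against intense competition.\n"
        "However,the past year was\n"
    )
    print(drop_spurious_blank_lines(result))
```

This heuristic will misfire on text where a new paragraph legitimately begins with a lowercase letter, or where a sentence continues with a capitalised word after the stray blank line, so it may need tuning for other documents.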


python-3.x python-tesseract tesseract