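Problem description
I have a PDF containing text, and I use PyMuPDF (fitz) to extract the data from each page. I want to add a period at the end of every line that does not already end with one. An example and my code are shown below:
Example: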
MORE PAGE INFO
Name of the company and some info
More info here and here
The data above is correct. We are a registered firm,("ABC") for this company.
Technology etc,more sentences and a paragraph here. These sentences are much longer etc.
Here is another pixmap example that creates Sierpinski’s Carpet – a fractal generalizing the Cantor Set to two dimensions. Given a square carpet.
Desired output:
MORE PAGE INFO.
Name of the company and some info.
More info here and here.
The data above is correct. We are a registered firm,("ABC") for this company.
Technology etc,more sentences and a paragraph here. These sentences are much longer etc.
Here is another pixmap example that creates Sierpinski’s Carpet – a fractal generalizing the Cantor Set to two dimensions. Given a square carpet.
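Current code:
import fitz  # PyMuPDF

doc = fitz.open(myfile)  # myfile: path to the PDF
for page in doc:
    # get_text() was spelled getText() in older PyMuPDF releases
    text = page.get_text("text")
    # Replaces every newline with a period, even after lines that already end with one
    text = text.replace("\n", ".")
    print(text)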
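The output does add periods to the short lines, but it also adds an extra period after sentences that are already correctly terminated. Is there another way to do this?
Thanks
Solution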
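One possible approach, offered as a sketch rather than a confirmed answer: instead of replacing every newline, go through the extracted text line by line and append a period only when the line does not already end in terminal punctuation. The function name `add_missing_periods` and the `TERMINATORS` and `CLOSERS` sets are assumptions made for this illustration; they are not part of PyMuPDF.

```python
# Characters treated as "already terminated" (an assumption; adjust as needed).
TERMINATORS = ".!?:;"
# Closing quotes/brackets that may follow the terminator, e.g.  etc.")
CLOSERS = "\"')]”’"


def add_missing_periods(text: str) -> str:
    """Append '.' to each non-empty line that lacks terminal punctuation."""
    fixed = []
    for line in text.splitlines():
        stripped = line.rstrip()
        # Ignore trailing quotes/brackets when checking the last character.
        core = stripped.rstrip(CLOSERS)
        if stripped and not core.endswith(tuple(TERMINATORS)):
            stripped += "."
        fixed.append(stripped)
    return "\n".join(fixed)


sample = "MORE PAGE INFO\nThe data above is correct.\n\nMore info here"
print(add_missing_periods(sample))
```

With PyMuPDF this could be applied per page as `add_missing_periods(page.get_text("text"))`. It assumes each extracted line is a complete heading or paragraph, as in the example above. If the PDF wraps long sentences across several lines, a period would also be added at each wrap point; in that case it may work better to apply the same check to whole text blocks, for example those returned by `page.get_text("blocks")`.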