How do I highlight text in a PDF using Python?

Problem description

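I'm trying to write a Python script that lets the user pick a PDF and then enter the words to search for. Any words that are found should be highlighted, and the result exported under a unique file name. The code runs when none of the words are found, but for some reason it breaks when a word is found. Any help or suggestions would be appreciated!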

### IMPORT PACKAGES NEEDED
import sys
import time
# !pip install PyMuPDF==1.16.14
import fitz  # PyMuPDF
import PySimpleGUI as sg

searchingWords = []

### READ IN PDF
sg.theme('BlueMono')
inputfname = sg.popup_get_file('PDF browser','PDF file to open',file_types=(("PDF Files","*.pdf"),))
if not inputfname:  # None on cancel, empty string if no file was chosen
    sg.popup_cancel('Cancelled.')
    sys.exit(0)
print(inputfname)
doc = fitz.open(inputfname)

### USER INPUTTING WORDS
# Window definition
layout = [
    [sg.Text("What word or phrase do you want to search for?")],
    [sg.Input(key='-INPUT-',do_not_clear=False)],
    [sg.Text(size=(40,1),key='-OUTPUT-')],
    [sg.Button('Next word'),sg.Button('Confirm'),sg.Button('Next word enter',visible=False,bind_return_key=True)],
]

# Create the window
window = sg.Window('Word Search',layout)

# Event loop: collect search terms until the user confirms or closes the window
while True:
    event,values = window.read()
    # See if user wants to quit or window was closed
    if event == sg.WINDOW_CLOSED or event == 'Confirm':
        break
    # Store the entered term (ignoring blank input) and show the list so far
    term = values['-INPUT-'].strip()
    if term:
        searchingWords.append(term)
    window['-OUTPUT-'].update(str(searchingWords))

# Remove window
window.close()


### END USER INPUT FOR SEARCH WORDS
for page in doc:
    ### SEARCHING FOR THE WORDS
    for word in searchingWords:
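        # ??? How can this be restricted to whole words, i.e. require a non-alphabetic character on either side of the match?
        # searchFor/addHighlightAnnot are the PyMuPDF 1.16.x names; newer releases call them search_for/add_highlight_annot
        text_instances = page.searchFor(word)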
        ### HIGHLIGHTING THE WORDS
        for inst in text_instances:
            highlight = page.addHighlightAnnot(inst)
            highlight.update()

### SET FILE OUTPUT NAME
# %S is seconds; the lowercase %s used originally is not a portable strftime directive
datetimefilename = time.strftime("%m-%d-%Y-%H.%M.%S") + "Highlighted.pdf"

### OUTPUT
doc.save(datetimefilename,garbage=4,deflate=True,clean=True)
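
On the whole-word question raised in the comment inside the search loop, one possible approach is to filter the page's word list instead of using a substring search. The sketch below is not part of the original script. It assumes the words arrive as tuples of the form (x0, y0, x1, y1, text, ...), which is the shape PyMuPDF returns for the "words" text extraction mode. The helper name whole_word_rects is hypothetical, and the function only handles single-word terms, not phrases.

```python
import re

def whole_word_rects(words, term):
    """Return the (x0, y0, x1, y1) boxes of entries whose text is `term`
    as a whole word, ignoring case and any attached punctuation.

    `words` is assumed to be a list of tuples shaped like
    (x0, y0, x1, y1, text, ...); `term` must be a single word.
    """
    # Allow non-word characters (quotes, commas, ...) around the term only
    pattern = re.compile(r"^\W*" + re.escape(term) + r"\W*$", re.IGNORECASE)
    return [tuple(w[:4]) for w in words if pattern.match(w[4])]

# Hand-made sample data standing in for one page's word list
sample_words = [
    (0, 0, 10, 10, "cat", 0, 0, 0),
    (12, 0, 40, 10, "category", 0, 0, 1),
    (42, 0, 55, 10, "Cat,", 0, 0, 2),
]
matches = whole_word_rects(sample_words, "cat")
print(matches)
```

In the script, each returned box could presumably be wrapped in fitz.Rect and passed to the highlight call in place of the search results. Note that a box covers the whole extracted token, so trailing punctuation such as the comma in "Cat," would be highlighted as well.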

Solution

No working solution to this problem has been found yet.
