Can I get data from all the images in a folder with Python Tesseract?

Problem description


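I don't want to process just one image. I want to process all the images in a folder and, if possible, go through them quickly one at a time (for example, with a 1-second cooldown between images, 100 images in total).

[Another wild idea of mine is to wait for photos to arrive live: when a photo lands in the folder, the program reads it and types out the text. Seeing it in real time would be nice, but it is not strictly necessary.]

Can anyone help me?

Thanks...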

{https://towardsdatascience.com/how-to-extract-text-from-images-with-python-db9b87fe432b}

Edit:

Extract text from all the images in a folder

import pytesseract

# Point pytesseract at the Tesseract executable (Windows install path).
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'
print(pytesseract.image_to_string(r'D:\examplepdf2image.png'))

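I found this code, which reads the image, creates a text file, and writes the data to it.

Solution

To easily scan a folder and get all the files in it, you can use glob or os.walk: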

import glob
import os

folder = "your/folder/path"

# Get all *.png files directly under the folder (subfolders are not searched):
files = glob.glob(os.path.join(folder, "*.png"))

# Or use os.walk to include subfolders as well.
# os.walk yields a list of file names per directory, so loop over that list:
result = []
for root, _, filenames in os.walk(folder):
    for name in filenames:
        if name.endswith(".png"):
            result.append(os.path.join(root, name))
# result now lists every *.png file in the folder, including subfolders.
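To tie the file listing back to the question, here is a minimal sketch that runs OCR on each .png in a folder with a pause between images. The function name `ocr_folder` and its `ocr` and `delay` parameters are my own, not part of pytesseract; by default it assumes pytesseract is installed and calls `pytesseract.image_to_string` with the file path.

```python
import glob
import os
import time


def ocr_folder(folder, ocr=None, delay=1.0):
    """Run OCR on every .png directly under `folder`.

    `ocr` is any callable that takes a file path and returns text
    (hypothetical parameter; defaults to pytesseract.image_to_string).
    `delay` is the cooldown in seconds between two images.
    """
    if ocr is None:
        # Imported lazily so a stand-in callable can be used without pytesseract.
        import pytesseract
        ocr = pytesseract.image_to_string

    results = {}
    paths = sorted(glob.glob(os.path.join(folder, "*.png")))
    for index, path in enumerate(paths):
        results[path] = ocr(path)
        # Pause between images, but not after the last one.
        if delay and index < len(paths) - 1:
            time.sleep(delay)
    return results
```

You could then loop over `ocr_folder("your/folder/path").items()` and print or save each text. On Windows you would still need to set `pytesseract.pytesseract.tesseract_cmd` first, as in the question.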

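If you want to monitor the folder and trigger some action whenever a new .png file is written to it, but you don't need an instant response to file creation and the folder is not too crowded, the simplest approach is to rescan the same folder every few seconds, compare the new file list with the old one, and process the files that are new.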
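The polling idea could look roughly like this. It is only a sketch: `find_new_pngs`, `poll_folder`, and the `max_rounds` parameter are names I made up for illustration, and `handle` is whatever you want to do with each new file (for example, run OCR on it).

```python
import glob
import os
import time


def find_new_pngs(folder, seen):
    """Return the .png paths in `folder` that are not in `seen` yet, and remember them."""
    current = set(glob.glob(os.path.join(folder, "*.png")))
    new_files = sorted(current - seen)
    seen.update(new_files)
    return new_files


def poll_folder(folder, handle, interval=2.0, max_rounds=None):
    """Rescan `folder` every `interval` seconds and call `handle(path)` for each new .png.

    `max_rounds=None` means run until interrupted (hypothetical parameter, handy for testing).
    """
    seen = set()
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for path in find_new_pngs(folder, seen):
            handle(path)
        rounds += 1
        if max_rounds is None or rounds < max_rounds:
            time.sleep(interval)
```

Note that files already in the folder on the first scan are treated as new. If you only want files that arrive later, fill `seen` with the existing files before starting the loop.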

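If you want an event-listener style response, where the action fires as soon as the file is created, take a look at the Python library called watchdog.

Here is the PyPI home page: watchdog package home page

With watchdog, you can create a file monitor like this: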

import time

from watchdog.events import PatternMatchingEventHandler
from watchdog.observers import Observer


class PNG_Handler(PatternMatchingEventHandler):
    def __init__(self):
        # Only react to events for paths matching *.png.
        super().__init__(patterns=["*.png"], ignore_directories=False)

    def on_created(self, event):
        newfilepath = event.src_path
        # newfilepath is the path to the newly created .png file.
        # You can implement your handler logic here.
        # The other methods work on the same principle.

    def on_deleted(self, event):
        pass

    def on_modified(self, event):
        pass

    def on_moved(self, event):
        pass


observer = Observer()
observer.schedule(PNG_Handler(), "path/to/folder", recursive=True)
observer.start()  # the observer does nothing until it is started

# Keep the main thread alive; stop cleanly on Ctrl+C.
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()

Every time a "*.png" file is created, the on_created function is called.
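One caveat, as far as I know: the creation event can arrive while the other program is still writing the image, so running OCR immediately may read a partial file. A possible workaround is to wait until the file size stops changing before processing it. The helper below is a sketch of that idea; `wait_until_stable` and its parameters are my own names, not part of watchdog.

```python
import os
import time


def wait_until_stable(path, checks=2, interval=0.2, timeout=10.0):
    """Return True once the size of `path` is unchanged for `checks` consecutive polls.

    Returns False if that does not happen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    last_size = -1
    stable = 0
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1  # file not visible yet, or already removed
        if size >= 0 and size == last_size:
            stable += 1
            if stable >= checks:
                return True
        else:
            stable = 0
        last_size = size
        time.sleep(interval)
    return False
```

Inside on_created you could call `wait_until_stable(event.src_path)` and only run the OCR step if it returns True. This is a heuristic: a writer that pauses for longer than the polling window would still fool it.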