IF statement with a string from pytesseract

Problem description

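I'm a novice at this (Python included), but the problem I'm working on requires it. I've commented my code, so hopefully it's easy to see what I've done. I'm trying to read part of my screen in real time and compare whatever text the OCR detects against a series of if/else statements. When it matches, I want it to print a string. Currently the real-time OCR detection works, but the text never matches any branch of the if/else statements, so the right string is never printed. I know this because print(tesstr) gives the correct output, whereas the current code produces no output at all.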

# numpy converts the PIL screenshot into the ndarray that OpenCV works on
import numpy as nm

import pytesseract

# importing OpenCV
import cv2

from PIL import ImageGrab


def imToString():

    while(True):

        # ImageGrab-To capture the screen image in a loop.
        # bbox used to capture a specific area.
        cap = ImageGrab.grab(bbox=(267,225,344,257))

        # Inverted the image for it to be easily
        # read by the OCR and obtained the output String.
        tesstr = pytesseract.image_to_string(
            cv2.bitwise_not(nm.array(cap)),lang='eng',config = '--psm 7')
        if (tesstr == '1'):
            print('Ace')
        elif (tesstr == '2'):
            print('Queen')
        elif (tesstr == '3'):
            print('King')
        ...
# Calling the function
imToString()
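
One way to see why the comparison fails even though the printed text looks right is to print repr(tesstr) rather than tesstr. The sketch below uses a hard-coded string as a stand-in for the OCR result. The trailing newline and form feed are an assumption about what pytesseract returned here, not something the question confirms.

```python
# Stand-in for the value returned by pytesseract.image_to_string().
# The trailing "\n\x0c" is an assumption made for illustration.
raw = "1\n\x0c"

print(raw)        # on screen this looks like a plain 1
print(repr(raw))  # repr() exposes the invisible trailing characters

# The direct comparison fails because of the extra characters,
# while comparing the stripped text succeeds.
matches_raw = raw == "1"
matches_stripped = raw.strip() == "1"
print(matches_raw, matches_stripped)
```

If repr() shows only whitespace around the digit, calling tesstr.strip() before the comparisons may be enough on its own.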

Solution

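Thanks to the comments, I finally got it working.

The problem was caused by a non-Unicode character being printed alongside my output. I still don't know why this happens, but the working code is below. Any ideas as to why would be appreciated, though it isn't a big concern.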

# numpy converts the PIL screenshot into the ndarray that OpenCV works on
import numpy as nm

# importing Tesseract for the OCR
import pytesseract

# importing OpenCV
import cv2

# importing ImageGrab to take the image of my screen
from PIL import ImageGrab

# importing re to sanitise my output into something the if statements can match
import re

def replace_chars(text):

    # Removes every character except digits from 'text'.
    # :param text: Text string to be filtered
    # :return: String containing only the digits found in 'text'
    
    list_of_numbers = re.findall(r'\d+',text)
    result_number = ''.join(list_of_numbers)
    return result_number


def imToString():

    while(True):

        # ImageGrab-To capture the screen image in a loop.
        # bbox used to capture a specific area.
        cap = ImageGrab.grab(bbox=(267,225,344,257))

        # Reading from a terminal window, so
        # inverted the image for it to be easily
        # read by the OCR engine.
        # --psm 7 as it treats the input as a single line.

        tesstr = pytesseract.image_to_string(
            cv2.bitwise_not(nm.array(cap)),lang='eng',config = '--psm 7')

        # Calling replace_chars to only read numbers
        tesstr = replace_chars(tesstr)
        print(tesstr)

        if (tesstr == '1'):
            print('Ace')
        elif (tesstr == '2'):
            print('Queen')
        elif (tesstr == '3'):
            print('King')
        ...

# Calling the function
imToString()
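
As a possible alternative to the if/elif chain, the mapping from OCR text to card name could live in a dictionary. This is only a sketch: CARD_NAMES and card_name are hypothetical names, and the table holds just the three entries shown above because the rest of the chain is elided.

```python
import re

# Hypothetical lookup table mirroring the three branches shown above.
CARD_NAMES = {"1": "Ace", "2": "Queen", "3": "King"}


def card_name(ocr_text):
    """Return the card name for raw OCR text, or None if nothing matches."""
    # Keep only the digits, as replace_chars does in the answer.
    digits = "".join(re.findall(r"\d+", ocr_text))
    return CARD_NAMES.get(digits)


# Sample inputs are made up for illustration, not real OCR output.
for sample in ["1\n\x0c", " 2 ", "x"]:
    print(repr(sample), "->", card_name(sample))
```

Inside the loop this would reduce to one lookup followed by a print when the result is not None.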

The replace_chars function was borrowed from https://return2.net/python-tesseract-4-0-get-numbers-only/
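
It may help to know how replace_chars treats a few edge cases. The sketch below repeats the helper's logic and runs it on made-up sample strings, which are assumptions rather than real OCR output.

```python
import re


def replace_chars(text):
    # Same logic as the helper above: find every run of digits and join them.
    return "".join(re.findall(r"\d+", text))


# Made-up samples: trailing control characters, two separate digit runs,
# text without digits, and an empty string.
samples = ["3\n\x0c", "1 2", "King", ""]
results = [replace_chars(s) for s in samples]
print(results)
```

Separate digit runs are merged into one string, and text without digits becomes an empty string, so neither case will match any of the single-digit comparisons.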
