Trying to extract text from a grainy image

Problem description

I have been trying to extract text from a grainy image. This is the original image:

This is the code I am using to try to process this image:

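    import cv2
    import matplotlib.pyplot as plt
    import numpy as np
    import pytesseract

    # extract_text is a hypothetical name: the post only shows the function body
    def extract_text(img):
        """Binarize and erode a BGR image, then OCR it with Tesseract."""
        img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        cv2.imshow('img_gray', img_gray)
        cv2.waitKey(0)
        # Signature: (src, maxValue, adaptiveMethod, thresholdType, blockSize, C)
        #img_bin = cv2.adaptiveThreshold(img_gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 21, 15) # 21 and 15 need to be set for image12
        img_bin = cv2.adaptiveThreshold(img_gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                        cv2.THRESH_BINARY, 27, 15)

        cv2.imshow('img_bin', img_bin)
        cv2.waitKey(0)

        fig, axs = plt.subplots(3)
        axs[0].imshow(img_gray, cmap="gray")
        axs[1].imshow(img_bin, cmap="gray")
        # Merge dots into characters using erosion (assumes dark text on a
        # white background, where erosion grows the dark strokes)
        kernel = np.ones((5, 5), np.uint8)
        #kernel = np.ones((15, 15), np.uint8)
        img_eroded = cv2.erode(img_bin, kernel, iterations=1)
        axs[2].imshow(img_eroded, cmap="gray")
        cv2.imshow('img_eroded', img_eroded)
        cv2.waitKey(0)
        fig.show()

        # Obtain string using psm 6 (assume a single uniform block of text);
        # psm 8 would instead treat the image as a single word
        ocr_string = pytesseract.image_to_string(img_eroded, lang='eng', config="--psm 6")
        return ocr_string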

This is the grayscale image, img_gray:

This is the image after applying the adaptive threshold (img_bin):

This is the image after erosion (img_eroded):

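In the final image (img_eroded) there are still many dots around the actual text, which can cause image_to_string to return garbage characters. Is there a way to process this image further, or to improve the existing code, so that it extracts the text Pac=2665.7W?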

Solution

No working solution has been found yet.


adaptive-threshold opencv python python-tesseract