Calculating OCR accuracy

Problem description

I need to calculate OCR character accuracy.

Sample ground truth value:

Non sinking ship is friendship

Sample OCR output:

non singing ship is finedship

The areas of concern are:

missing characters
extra characters
misplaced characters

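Character accuracy is defined as the number of actual characters found in their correct positions, divided by the total number of actual characters.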

I need a Python script to compute this accuracy. My initial implementation is below:

import re

ground_value = "Non sinking ship is friendship"
ocr_value = "non singing ship is finedship"

# Remove all whitespace from both strings before comparing.
ground_value_characters = re.sub(r'\s+', '', ground_value)
ocr_value_characters = re.sub(r'\s+', '', ocr_value)

total_characters = float(len(ground_value_characters))

def find_matching_characters(ground, ocr):
    # Count ground-truth characters that also occur in the OCR string,
    # consuming each matched OCR character so it is counted only once.
    # Note: this ignores character positions.
    total = 0
    for char in ground:
        if char in ocr:
            total = total + 1
            ocr = ocr.replace(char, '', 1)
    return total

found_characters = find_matching_characters(ground_value_characters, ocr_value_characters)

accuracy = found_characters / total_characters

This does not give me the result I want. Any help would be appreciated.

Solution

If you're not married to that precise definition (or if you are and want to dig into the details of python-Levenshtein), then this is how I would solve it:

pip install python-Levenshtein

from Levenshtein import distance

ground_value = "Non sinking ship is friendship"
ocr_value = "non singing ship is finedship"

print(distance(ground_value,ocr_value))
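
The edit distance is an error count, not an accuracy. One common way to turn it into a character accuracy is to divide by the length of the ground truth and subtract from 1. The sketch below assumes that normalization (the question does not prescribe it) and uses a plain-Python Levenshtein routine as a stand-in so it runs without the library installed. The names edit_distance and char_accuracy are illustrative, not part of python-Levenshtein.

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def char_accuracy(ground, ocr):
    """Assumed definition: 1 - distance / len(ground), clamped at 0."""
    if not ground:
        return 1.0 if not ocr else 0.0
    return max(0.0, 1.0 - edit_distance(ground, ocr) / len(ground))


ground_value = "Non sinking ship is friendship"
ocr_value = "non singing ship is finedship"

print(edit_distance(ground_value, ocr_value))            # 5
print(f"{char_accuracy(ground_value, ocr_value):.3f}")   # 0.833
```

This compares the raw strings, so case differences and spaces count as errors. Lowercase or strip whitespace first if those should be ignored.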

The same library will also give you Hamming distance, opcodes, and similar functions in a relatively high-performance way.
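
Opcodes are what let you break the errors down into the categories from the question (missing, extra, and wrong characters). As a rough sketch of the idea, the following uses difflib.SequenceMatcher from the standard library rather than python-Levenshtein; its alignment is heuristic and is not guaranteed to be the minimal edit script, so the counts may differ from those derived from Levenshtein opcodes. The function name classify_errors and the category labels are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def classify_errors(ground, ocr):
    """Tally OCR errors by kind using difflib's opcode alignment."""
    counts = {"missing": 0, "extra": 0, "substituted": 0}
    # autojunk=False disables difflib's popularity heuristic for long inputs.
    matcher = SequenceMatcher(None, ground, ocr, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        g_len, o_len = i2 - i1, j2 - j1
        if tag == "delete":      # present in ground truth, absent in OCR
            counts["missing"] += g_len
        elif tag == "insert":    # present in OCR, absent in ground truth
            counts["extra"] += o_len
        elif tag == "replace":   # differing runs, possibly of unequal length
            common = min(g_len, o_len)
            counts["substituted"] += common
            counts["missing"] += g_len - common
            counts["extra"] += o_len - common
    return counts


print(classify_errors("Non sinking ship is friendship",
                      "non singing ship is finedship"))
```

A character that is merely out of place shows up here as a deletion in one spot plus an insertion in another, so "misplaced" is not reported as its own category.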

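None of this will be useful if, for example, this is a homework assignment or your goal is to learn how to implement string algorithms. But if you just need a good metric, this is what I would use.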

computer-vision ocr python python-3.x