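Can I set a threshold in Google Drive OCR to ignore letters that are too small?

Problem description

I'm using the Google Drive OCR API to recognize text in images. The problem is that it reads all text, even very, very small, microscopic text. Can I somehow set a threshold so that very small letters are ignored? I don't need the very small text.

I'm using this code in Google Apps Script: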

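```
// The enclosing function is assumed to be a web app handler;
// its opening line is missing from the post.
function doGet(request) {
  if (request.parameters.url != undefined && request.parameters.url != "") {
    // Download the image and upload it to Drive with OCR enabled
    var imageBlob = UrlFetchApp.fetch(request.parameters.url).getBlob();
    var resource = {
      title: imageBlob.getName(),
      mimeType: imageBlob.getContentType()
    };
    var options = {
      ocr: true
    };
    var docFile = Drive.Files.insert(resource, imageBlob, options);
    // Read the recognized text from the generated document
    var doc = DocumentApp.openById(docFile.id);
    // Global regex: replace("\n", "") would only strip the first newline
    var text = doc.getBody().getText().replace(/\n/g, "");
    // Delete the temporary document
    Drive.Files.remove(docFile.id);
    return ContentService.createTextOutput(text);
  } else {
    return ContentService.createTextOutput("request error");
  }
}
```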

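Solution

There is no way to pass a threshold to the OCR as a parameter, but there is a workaround.

Instead of looking at the source image, you can read the font size of each child element of the document that the OCR creates.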

```
function doOCR() {
  // Text in the sample image and the font size it gets in the generated document:
  // JT digital inspiration (font 19 in document)
  // tech à la carte (font 9 in document)
  var image = UrlFetchApp.fetch('http://img.labnol.org/logo.png').getBlob();

  var file = {
    title: 'OCR File',
    mimeType: 'image/png'
  };

  var docFile = Drive.Files.insert(file, image, {ocr: true});
  var doc = DocumentApp.openById(docFile.id).getBody();
  var numElements = doc.getNumChildren();

  // Traverse all children
  for (var i = 0; i < numElements; ++i) {
    var element = doc.getChild(i);
    var fontSize = element.getFontSize();
    var textValue = element.asText().getText();
    var type = element.getType();
    // Keep only paragraphs whose font size is above the threshold (10 here).
    // Some children have a font size but no text; skip them.
    if (type == DocumentApp.ElementType.PARAGRAPH && textValue != "" && fontSize > 10) {
      Logger.log(textValue);
    }
  }
}
```
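One possible caveat with the per-paragraph check: as far as I know, `getFontSize()` on a DocumentApp `Text` returns `null` when the range contains more than one font size, so a paragraph that mixes large and small text would be dropped entirely. The sketch below filters run by run instead. `keepLargeRuns` is a hypothetical helper, not part of any Google API; it only assumes its argument exposes `getText()`, `getTextAttributeIndices()` and `getFontSize(offset)`, which is what `Text` provides.

```javascript
// Hypothetical helper: return only the parts of a Text element whose
// font size is at least minSize.
function keepLargeRuns(text, minSize) {
  var content = text.getText();
  // Offsets at which the formatting changes; each one starts a new run
  var starts = text.getTextAttributeIndices();
  var kept = [];
  for (var i = 0; i < starts.length; i++) {
    var begin = starts[i];
    var end = i + 1 < starts.length ? starts[i + 1] : content.length;
    var size = text.getFontSize(begin);
    // A null size (no explicit size on this run) is treated as "skip"
    if (size !== null && size >= minSize) {
      kept.push(content.substring(begin, end));
    }
  }
  return kept.join('');
}
```

Inside the loop above it could be called as `keepLargeRuns(element.asText(), 10)`. Whether the OCR output actually mixes sizes within one paragraph depends on the image, so treat this as optional.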

You can also skip specific font sizes; just adjust the condition.
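As a rough illustration of adjusting the condition, the check can be pulled out into a small function that takes its rules as data. This is only a sketch: `shouldKeep`, `filterParagraphs` and the `min` / `max` / `skip` rule names are made up for this example, and the sample sizes are the ones noted in the comments of `doOCR()`.

```javascript
// Hypothetical helper: decide whether a font size passes the rules.
// rules.min / rules.max are inclusive bounds, rules.skip lists exact sizes to drop.
function shouldKeep(fontSize, rules) {
  if (fontSize === null || fontSize === undefined) return false; // size unknown
  if (rules.min !== undefined && fontSize < rules.min) return false;
  if (rules.max !== undefined && fontSize > rules.max) return false;
  if (rules.skip && rules.skip.indexOf(fontSize) !== -1) return false;
  return true;
}

// Hypothetical helper: keep the text of non-empty paragraphs that pass the rules.
// Each paragraph is a plain object { text, fontSize } built from the document children.
function filterParagraphs(paragraphs, rules) {
  return paragraphs
    .filter(function (p) { return p.text !== '' && shouldKeep(p.fontSize, rules); })
    .map(function (p) { return p.text; });
}

var sample = [
  { text: 'JT digital inspiration', fontSize: 19 },
  { text: 'tech à la carte', fontSize: 9 },
  { text: '', fontSize: 11 }
];

var kept = filterParagraphs(sample, { min: 10 });
// kept is ['JT digital inspiration']
```

In Apps Script, the `{ text, fontSize }` objects would be filled in from `element.asText().getText()` and `element.getFontSize()` inside the traversal loop.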


google-apps-script google-drive-api ocr text-recognition
