Problem description
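I am trying to extract text regions from an image and run Tesseract on each bounding rectangle. I only want to extract the text itself. However, the extracted rectangles overlap, and text that belongs together ends up split into separate rectangles. How can I cleanly extract only the text regions? Thanks.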
using System.Collections.Generic;
using OpenCvSharp;

public List<Rect> RegionOfInterest(Mat image)
{
    List<Rect> boundRect = new List<Rect>();
    using (Mat img_gray = new Mat())
    using (Mat img_sobel = new Mat())
    using (Mat img_threshold = new Mat())
    {
        Cv2.CvtColor(image, img_gray, ColorConversionCodes.BGR2GRAY); // Grayscale
        // Horizontal gradient: xorder = 1, yorder = 0, ksize = 3 (yorder and ksize are assumed values)
        Cv2.Sobel(img_gray, img_sobel, MatType.CV_8U, 1, 0, 3, 1, 0, BorderTypes.Default);
        // With the Otsu flag the threshold is computed automatically; the value 100 is ignored
        Cv2.Threshold(img_sobel, img_threshold, 100, 255, ThresholdTypes.Otsu | ThresholdTypes.Binary);
        // A 20x20 closing kernel fuses neighbouring strokes into solid blobs
        using (Mat element = Cv2.GetStructuringElement(MorphShapes.Rect, new Size(20, 20)))
        {
            Cv2.MorphologyEx(img_threshold, img_threshold, MorphTypes.Close, element);
            Point[][] edgesArray = img_threshold.FindContoursAsArray(RetrievalModes.External, ContourApproximationModes.ApproxNone);
            foreach (Point[] edges in edgesArray)
            {
                // epsilon = 3 px is an assumed tolerance for the polygon approximation
                Point[] normalizedEdges = Cv2.ApproxPolyDP(edges, 3, true);
                Rect appRect = Cv2.BoundingRect(normalizedEdges);
                if (appRect.Width > 12 && appRect.Height > 12) // Skip tiny regions
                {
                    boundRect.Add(appRect);
                }
            }
        }
    }
    return boundRect;
}
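The overlap described in the question might be reduced with a post-processing pass over the returned list. The following is only a sketch under stated assumptions, not a verified fix: `RectMerger`, `MergeOverlapping`, `gapX`, and `gapY` are hypothetical names that are not part of OpenCvSharp, and the only library members used are the `Rect` fields `X`, `Y`, `Width`, `Height` and its four-integer constructor. The idea is to keep replacing any two rectangles that overlap (or lie within a small gap of each other) with their union until nothing changes.

```csharp
using System;
using System.Collections.Generic;
using OpenCvSharp;

// Hypothetical helper, not part of OpenCvSharp.
public static class RectMerger
{
    // True when a, grown by gapX/gapY pixels on every side, overlaps b.
    private static bool Near(Rect a, Rect b, int gapX, int gapY)
    {
        return a.X - gapX < b.X + b.Width && b.X < a.X + a.Width + gapX
            && a.Y - gapY < b.Y + b.Height && b.Y < a.Y + a.Height + gapY;
    }

    // Smallest rectangle that contains both a and b.
    private static Rect Union(Rect a, Rect b)
    {
        int x1 = Math.Min(a.X, b.X);
        int y1 = Math.Min(a.Y, b.Y);
        int x2 = Math.Max(a.X + a.Width, b.X + b.Width);
        int y2 = Math.Max(a.Y + a.Height, b.Y + b.Height);
        return new Rect(x1, y1, x2 - x1, y2 - y1);
    }

    // Repeatedly merges any pair of nearby rectangles until no pair is left.
    public static List<Rect> MergeOverlapping(IEnumerable<Rect> rects, int gapX = 0, int gapY = 0)
    {
        var result = new List<Rect>(rects);
        bool merged = true;
        while (merged)
        {
            merged = false;
            for (int i = 0; i < result.Count && !merged; i++)
            {
                for (int j = i + 1; j < result.Count; j++)
                {
                    if (Near(result[i], result[j], gapX, gapY))
                    {
                        // Replace the pair with its union and restart the scan,
                        // because the larger rectangle may now touch others.
                        result[i] = Union(result[i], result[j]);
                        result.RemoveAt(j);
                        merged = true;
                        break;
                    }
                }
            }
        }
        return result;
    }
}
```

A call such as `RectMerger.MergeOverlapping(RegionOfInterest(image), gapX: 15)` would join rectangles that overlap or sit within 15 pixels of each other horizontally, which may also help with words on the same line being split apart. The gap values are guesses and would need tuning per image. A wider, flatter closing kernel than 20x20 is another knob that may be worth trying.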
Solution
No working solution to this problem has been found yet.