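Layout Parser library: handling document reading order in Python

Problem description

I want to modify the following code so that it indexes the blocks of any column/page layout in reading order.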

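import layoutparser as lp

# Assumes `layout` (detected blocks) and `image` (a NumPy array) already exist.
# Label strings must match the detection model's label_map exactly.
text = lp.Layout([b for b in layout if b.type == 'Text'])
figure = lp.Layout([b for b in layout if b.type == 'Figure'])

# Drop text blocks that lie inside a figure.
text = lp.Layout([b for b in text
                  if not any(b.is_in(b_fig) for b_fig in figure)])

h, w = image.shape[:2]
# Left column: x from 0 to slightly past the page midpoint.
# General form: lp.Interval(start, end, axis='x', canvas_height=0, canvas_width=0)
left_interval = lp.Interval(0, w / 2 * 1.05, axis='x').put_on_canvas(image)

# Sort each column top to bottom; coordinates[1] is the top y coordinate.
left = text.filter_by(left_interval, center=True)
# inplace=True is needed on layoutparser 0.3+, where sort returns a copy by default.
left.sort(key=lambda b: b.coordinates[1], inplace=True)

right = [b for b in text if b not in left]
right.sort(key=lambda b: b.coordinates[1])

# Assign ids in reading order: left column first, then right column.
text = lp.Layout([b.set(id=idx) for idx, b in enumerate(left + right)])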

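Reference: https://layout-parser.readthedocs.io/en/latest/index.html

Solution

I have tried this, and it does not detect tables that contain columns. You could instead try the other approach given in the Layout Parser documentation, which uses Google Cloud Vision.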

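As a possible starting point for layouts with more than two columns, here is a minimal sketch that generalizes the left/right split from the question. It is not part of Layout Parser: the function name reading_order and the center-based grouping rule are assumptions made for illustration. It works on plain (x1, y1, x2, y2) tuples, groups boxes into columns by checking whether a box's horizontal center falls inside an existing column's x range, and then reads each column top to bottom, columns left to right.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def reading_order(boxes: Sequence[Box]) -> List[int]:
    """Return box indices in column-by-column reading order.

    Hypothetical helper, not a Layout Parser API. Assumes the page is made of
    side-by-side columns with no full-width blocks spanning several columns.
    """
    columns = []  # each column: {"x1": left edge, "x2": right edge, "members": [indices]}

    # Visit boxes from left to right so columns are discovered in order.
    for idx in sorted(range(len(boxes)), key=lambda i: boxes[i][0]):
        x1, _, x2, _ = boxes[idx]
        center = (x1 + x2) / 2
        for col in columns:
            # A box joins a column when its horizontal center lies inside it.
            if col["x1"] <= center <= col["x2"]:
                col["x1"] = min(col["x1"], x1)
                col["x2"] = max(col["x2"], x2)
                col["members"].append(idx)
                break
        else:
            # No existing column fits, so this box starts a new one.
            columns.append({"x1": x1, "x2": x2, "members": [idx]})

    columns.sort(key=lambda c: c["x1"])

    order: List[int] = []
    for col in columns:
        # Within a column, read top to bottom by the top y coordinate.
        order.extend(sorted(col["members"], key=lambda i: boxes[i][1]))
    return order


if __name__ == "__main__":
    # Made-up three-column page, boxes listed in arbitrary detection order.
    demo = [
        (320, 50, 580, 200),   # 0: middle column, top
        (20, 300, 290, 500),   # 1: left column, bottom
        (610, 60, 880, 400),   # 2: right column
        (25, 40, 280, 280),    # 3: left column, top
        (315, 220, 585, 600),  # 4: middle column, bottom
    ]
    print(reading_order(demo))  # [3, 1, 0, 4, 2]
```

With the variables from the question, the boxes could be taken from [b.coordinates for b in text], and the returned indices used in place of left + right when calling b.set(id=idx). Full-width elements such as titles or tables that span several columns would merge columns under this rule, so they would need to be separated out first.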