Problem description
I am trying to extract text from PowerPoint text boxes using the following code:
from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE


def iter_textable_shapes(shapes):
    for shape in shapes:
        if shape.has_text_frame:
            yield shape


def iter_textframed_shapes(shapes):
    """Generate shape objects in *shapes* that can contain text.

    Shape objects are generated in document order (z-order), bottom to top.
    """
    for shape in shapes:
        # ---recurse on group shapes---
        if shape.shape_type == MSO_SHAPE_TYPE.GROUP:
            group_shape = shape
            for shape in iter_textable_shapes(group_shape.shapes):
                yield shape
            continue

        # ---otherwise, treat shape as a "leaf" shape---
        if shape.has_text_frame:
            yield shape


prs = Presentation(path_to_my_prs)

for slide in prs.slides:
    textable_shapes = list(iter_textframed_shapes(slide.shapes))
    ordered_textable_shapes = sorted(
        textable_shapes, key=lambda shape: (shape.top, shape.left)
    )

    for shape in ordered_textable_shapes:
        print(shape.text)
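However, sometimes the text box at the end of the PPT is extracted first, sometimes one from the middle, and so on. How can I fix my code so that the text comes out in the correct order (left to right, top to bottom)?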
Solution
No working solution to this problem has been found yet.
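One possible contributing factor, offered as an assumption rather than a confirmed diagnosis, is that sorting on the exact (top, left) pair treats two boxes that look like they sit on the same row as different rows whenever their top values differ by even a few units. The sketch below groups shapes into rows using a tolerance and then sorts each row by left. The function name reading_order and the row_tolerance parameter are hypothetical, not part of python-pptx. The code only relies on objects exposing top and left attributes, as python-pptx shapes do, and it treats a missing position (None, which inherited placeholder positions can report) as 0. It does not convert the positions of shapes inside a group, which may be expressed in the group's own coordinate space.

```python
from types import SimpleNamespace

EMU_PER_INCH = 914400  # python-pptx lengths are integers in English Metric Units


def reading_order(shapes, row_tolerance=EMU_PER_INCH // 4):
    """Return shapes ordered top to bottom, then left to right within a row.

    Shapes whose `top` is within `row_tolerance` of the first shape in the
    current row are treated as belonging to that row (hypothetical helper).
    """
    def pos(shape):
        # A position of None is treated as 0 so the sort never compares None.
        return (shape.top or 0, shape.left or 0)

    rows = []  # each row is a list of shapes with similar `top`
    for shape in sorted(shapes, key=pos):
        top = pos(shape)[0]
        if rows and top - pos(rows[-1][0])[0] <= row_tolerance:
            rows[-1].append(shape)
        else:
            rows.append([shape])

    ordered = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda s: pos(s)[1]))
    return ordered


# Stand-in objects for illustration; real code would pass python-pptx shapes.
boxes = [
    SimpleNamespace(text="B", top=110000, left=5000000),
    SimpleNamespace(text="A", top=120000, left=100000),
    SimpleNamespace(text="C", top=2000000, left=100000),
]
print([s.text for s in reading_order(boxes)])  # → ['A', 'B', 'C']
```

With a plain sort on (top, left), box "B" would come before "A" because it sits slightly higher, even though "A" is to its left on what is visually the same row. The tolerance is a judgment call: too small and the original problem returns, too large and separate rows merge.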