Problem description
5 0 obj
<< /Length 4760 /Filter [ /ASCII85Decode /FlateDecode ]
>>
stream
<<LOTS OF BINARY DATA IS HERE!!>>
endstream
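How do I decode the binary stream?
A similar question was asked here:
How do we decompress FlateDecode Objects in PDF in Python?
However, my stream has two filters, /ASCII85Decode and /FlateDecode, not just /FlateDecode.
In addition, I keep getting the following error message: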
Error -3 while decompressing data: incorrect header check
Here is one of my attempts at decoding the PDF:
import re
import zlib
import io


def sani_path(ugly_path):
    """
    sanitize an ugly file-path
    converts windows-style file paths to Unix-style paths and vice versa.
    changes slashes to back-slashes or back-slashes to slashes
    """
    # IMPLEMENTATION NOTES:
    # VARIABLE NAMES:
    #     `wip`.... `work in progress`
    #
    # OTHER NOTES:
    #     `repr()` converts new-line char in the middle of the path into
    #     a backslash character and a letter "n"
    import pathlib
    wip = str(ugly_path)
    wip = wip.strip()
    wip = repr(wip)[1:-1]
    wip = pathlib.Path(wip)
    pretty_path = wip
    return pretty_path


def decompress_pdf_from_path(xpath):
    ###########################################
    ipath = sani_path(xpath)
    del xpath
    ############################################
    pdf = open(str(ipath), 'rb').read()
    stream = re.compile(b'stream(.*?)endstream', re.S)
    results = re.findall(stream, pdf)
    for s in results:
        s = s.strip(b'\r\n')
        iout = None
        try:
            iout = zlib.decompress(s).decode('UTF-8')
        except BaseException as exc:
            out_stream = io.StringIO()
            print(
                "INPUT STRING: ", s, type(exc), str(exc), sep="\n", file=out_stream
            )
            iout = out_stream.getvalue()
        finally:
            xout = str(iout)
    return xout


read_file_path_str = 'C:/Users/username/Desktop/test.pdf'
decompressed_pdf = decompress_pdf_from_path(read_file_path_str)
print(decompressed_pdf)
print('length ', len(decompressed_pdf))
Solution
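The error most likely comes from handing ASCII85 text straight to zlib. A /Filter array lists the filters in the order they must be applied when decoding, so [ /ASCII85Decode /FlateDecode ] means the stream body is ASCII85 text that wraps zlib data. The snippet below is a minimal sketch of that idea on made-up data, not on the file from the question: `original` and `stream_body` are hypothetical stand-ins for a real content stream.

```python
import base64
import zlib

# Hypothetical stand-in for a stream body: the writer deflates first and then
# ASCII85-encodes, which is the reverse of the decode order in /Filter.
original = b"BT /F1 12 Tf (Hello, PDF) Tj ET"
stream_body = base64.a85encode(zlib.compress(original)) + b"~>"

# Passing the ASCII85 text directly to zlib fails, as in the question,
# because the first bytes are printable characters, not a zlib header.
try:
    zlib.decompress(stream_body)
    direct_error = None
except zlib.error as exc:
    direct_error = str(exc)

# Decoding in the listed order works: ASCII85 first, then Flate.
ascii85_text = stream_body.strip()
if ascii85_text.endswith(b"~>"):
    ascii85_text = ascii85_text[:-2]  # drop the ASCII85 end-of-data marker
decoded = zlib.decompress(base64.a85decode(ascii85_text))

print(direct_error)
print(decoded)
```

Note that the result is kept as bytes here. Calling `.decode('UTF-8')` on it, as the attempt in the question does, is only safe when the decoded stream is known to be text.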
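For a whole file, one possible approach is to read each stream's own /Filter entry and apply the named filters in order, instead of calling zlib on every stream. The following is a rough sketch under several assumptions: the names `ascii85_decode`, `flate_decode`, `DECODERS` and `decode_streams` are made up for this example, the regular expressions are a simplification rather than a real PDF parser (they assume every stream sits inside an "N G obj ... endobj" block and that the binary data never contains the text "endobj"), and /DecodeParms such as PNG predictors are not handled.

```python
import base64
import re
import zlib

# Simplified patterns, not a real PDF parser (see the assumptions above).
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.S)
STREAM_RE = re.compile(rb"(.*?)stream\r?\n(.*)endstream", re.S)
FILTER_RE = re.compile(rb"/Filter\s*(\[[^\]]*\]|/\w+)")


def ascii85_decode(data):
    """Decode ASCII85 text, dropping the '~>' end marker PDF streams carry."""
    data = data.strip()
    end = data.find(b"~>")
    if end != -1:
        data = data[:end]
    if data.startswith(b"<~"):
        data = data[2:]
    return base64.a85decode(data)


def flate_decode(data):
    """Inflate zlib data; a decompress object tolerates trailing EOL bytes."""
    return zlib.decompressobj().decompress(data)


# Only the two filters from the question are supported in this sketch.
DECODERS = {b"ASCII85Decode": ascii85_decode, b"FlateDecode": flate_decode}


def decode_streams(pdf_bytes):
    """Return the bytes of every stream whose filters are all supported."""
    decoded = []
    for obj in OBJ_RE.finditer(pdf_bytes):
        match = STREAM_RE.search(obj.group(1))
        if match is None:
            continue  # object without a stream
        header, data = match.groups()
        found = FILTER_RE.search(header)
        names = re.findall(rb"/(\w+)", found.group(1)) if found else []
        if any(name not in DECODERS for name in names):
            continue  # e.g. /DCTDecode images; skipped in this sketch
        for name in names:  # apply filters in the order they are listed
            data = DECODERS[name](data)
        decoded.append(data)
    return decoded
```

To try it on a file, open the PDF in binary mode, pass the result of `read()` to `decode_streams`, and inspect the returned list of bytes objects. For anything beyond a quick inspection, a dedicated PDF library is probably a safer choice than regular expressions, since the /Length entry can be an indirect reference and stream data can contain arbitrary bytes.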