Problem description
I don't know of any built-in way to do this, but a wrapper function is easy enough to write:
def read_in_chunks(infile, chunk_size=1024*64):
    while True:
        chunk = infile.read(chunk_size)
        if chunk:
            yield chunk
        else:
            # The chunk was empty, which means we're at the end
            # of the file
            return
Then, at the interactive prompt:
>>> from chunks import read_in_chunks
>>> infile = open('quicklisp.lisp')
>>> for chunk in read_in_chunks(infile):
...     print(chunk)
...
<contents of quicklisp.lisp in chunks>
Of course, you can easily adapt this to use a with block:
with open('quicklisp.lisp') as infile:
    for chunk in read_in_chunks(infile):
        print(chunk)
You can eliminate the if statement like this:
def read_in_chunks(infile, chunk_size=1024*64):
    chunk = infile.read(chunk_size)
    while chunk:
        yield chunk
        chunk = infile.read(chunk_size)
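On Python 3.8 or later, the same loop can also be written with an assignment expression. The sketch below is only an illustration: the name read_in_chunks_walrus is hypothetical, and io.BytesIO stands in for a real file opened in binary mode.

```python
import io


def read_in_chunks_walrus(infile, chunk_size=1024 * 64):
    """Yield successive chunks from infile until read() returns an empty result."""
    # read() returns b"" (or "" in text mode) at end of file, which is falsy,
    # so the loop ends without a separate if statement.
    while chunk := infile.read(chunk_size):
        yield chunk


# Assumption: an in-memory buffer stands in for a real binary file.
sample = io.BytesIO(b"abcdefghij")
chunks = list(read_in_chunks_walrus(sample, chunk_size=4))
print(chunks)  # [b'abcd', b'efgh', b'ij']
```

The last chunk is shorter than chunk_size whenever the file length is not an exact multiple of it.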
Solution
In Python, for a binary file, I can write this:
buf_size=1024*64 # this is an important size...
with open(file,"rb") as f:
    while True:
        data=f.read(buf_size)
        if not data: break
        # deal with the data....
For a text file that I want to read line by line, I can write this:
with open(file,"r") as file:
    for line in file:
        # deal with each line....
Which is shorthand for:
with open(file,"r") as file:
    for line in iter(file.readline,""):
        # deal with each line....
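To make the equivalence concrete, here is a small sketch that compares the two loops on an in-memory text buffer (io.StringIO is used only as a stand-in for a real file, and the sample text is made up). The two-argument form iter(callable, sentinel) calls the callable repeatedly and stops as soon as it returns the sentinel.

```python
import io

text = "first\nsecond\nthird\n"

# Plain iteration over the file object yields one line at a time.
lines_plain = list(io.StringIO(text))

# iter(callable, sentinel) calls readline() until it returns "" (end of file).
buf = io.StringIO(text)
lines_sentinel = list(iter(buf.readline, ""))

print(lines_plain == lines_sentinel)  # True
```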
This idiom is documented in PEP 234, but I have not been able to find a similar idiom for binary files.
I have tried this:
>>> with open('dups.txt','rb') as f:
...     for chunk in iter(f.read,''):
...         i+=1
>>> i
1 # 30 MB file, i==1 means read in one go...
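I tried putting iter(f.read(buf_size),'') instead, but that is an error, because the parentheses after the callable in iter() call f.read immediately instead of passing it.
I know I can write a function, but is there a way to use the default idiom, for chunk in file:, with a buffer size instead of being line oriented?
Thanks for putting up with a Python newbie trying to write his first nontrivial and idiomatic Python script.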
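One possible way to get a chunked for loop without a hand-written generator, sketched here under the assumption of Python 3: functools.partial binds the buffer size so that iter() receives a zero-argument callable, and the sentinel is b"" because a file opened in binary mode returns bytes. In this sketch io.BytesIO stands in for a real file, and the tiny buf_size is chosen only to keep the example readable.

```python
import io
from functools import partial

buf_size = 4  # illustrative only; something like 1024 * 64 is more typical

chunks = []
# Assumption: an in-memory buffer stands in for open(path, "rb").
with io.BytesIO(b"abcdefghij") as f:
    # partial(f.read, buf_size) is a callable that takes no arguments;
    # iter() keeps calling it until it returns the sentinel b"".
    for chunk in iter(partial(f.read, buf_size), b""):
        chunks.append(chunk)

print(chunks)  # [b'abcd', b'efgh', b'ij']
```

A lambda such as lambda: f.read(buf_size) should work in place of partial. On Python 3 the sentinel has to be b"" rather than "" for a binary file, since a bytes object never compares equal to a str and the loop would otherwise not terminate.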