S3 get_object()["Body"].iter_lines() skips lines when used with islice/zip

Problem description

from itertools import islice

response = s3.get_object(Bucket=bucket, Key=file)

def generate_files(resp, N):
    while True:
        line = list(islice(resp["Body"].iter_lines(), N))
        if not line:
            break
        yield line

However, when I call my generator, the results are not what I expect.

for line in generate_files(response,10):
    print('--')
    print([l.decode('utf-8') for l in line])

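Instead of returning lines 0 to 9, then 10 to 19, then 20 to 29, and so on, it skips an arbitrary number of lines between generator calls. It returns lines 0 to 9, then lines 17 to 26, then lines 33 to 40, and so on. It only ever moves forward, so it looks as if the stream keeps being read even after islice has taken its lines. I also tried zip and got the same result.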

What am I missing here?

Solution

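I think the problem is with iter_lines, because iter_lines is not reentrant safe: calling it multiple times causes some of the received data to be lost. In the question's code, resp["Body"].iter_lines() is called again on every pass of the while loop, so each pass builds a new iterator, and any data the previous iterator had already read from the stream but not yet yielded is thrown away. If you need lines in more than one place, call iter_lines once and reuse the resulting iterator object. With requests, for example: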

import requests

r = requests.get('https://httpbin.org/stream/20', stream=True)
lines = r.iter_lines()  # create the iterator once and reuse it

for line in lines:
    print(line)
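To make the data loss concrete without touching S3, here is a minimal sketch. The `iter_lines` function below is a hypothetical stand-in, not the boto3 or requests implementation; it only assumes that a line reader pulls fixed-size chunks from the stream and buffers whatever it has not yielded yet. The chunk size and sample data are likewise made up for illustration.

```python
import io
from itertools import islice


def iter_lines(stream, chunk_size=32):
    """Hypothetical stand-in for a chunk-buffered line reader."""
    pending = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        lines = (pending + chunk).split(b"\n")
        # The last piece may be an incomplete line; hold it back.
        pending = lines.pop()
        yield from lines
    if pending:
        yield pending


data = b"".join(b"line%02d\n" % i for i in range(12))

# Pattern from the question: a fresh iterator for every batch.
stream = io.BytesIO(data)
bad_first = list(islice(iter_lines(stream), 2))
bad_second = list(islice(iter_lines(stream), 2))

# Pattern from the answer: one iterator, reused for every batch.
stream = io.BytesIO(data)
lines = iter_lines(stream)
good_first = list(islice(lines, 2))
good_second = list(islice(lines, 2))

print(bad_first, bad_second)
print(good_first, good_second)
```

In the first pattern the second batch does not start at `line02`: the first iterator had already read past it, and its buffered data disappeared when that iterator was dropped. In the second pattern the batches are contiguous.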

Sorry, I was not able to run and verify this code myself. More information is available at: https://2.python-requests.org/projects/3/user/advanced/

If you try it, please report back with the result.
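Applied to the generator from the question, the same idea might look like the sketch below. The name `batched_lines` is hypothetical, and the sample rows stand in for an S3 object; the only real change is that the line iterator is created once, outside the loop.

```python
from itertools import islice


def batched_lines(lines, n):
    """Yield lists of up to n items, all drawn from one shared iterator."""
    it = iter(lines)  # create the iterator exactly once
    while True:
        batch = list(islice(it, n))
        if not batch:
            break
        yield batch


# Stand-in data; with S3 the call would instead be (assuming `response`
# comes from s3.get_object as in the question):
#   batched_lines(response["Body"].iter_lines(), 10)
sample = (b"row%d" % i for i in range(25))
batches = list(batched_lines(sample, 10))

for batch in batches:
    print('--')
    print([l.decode('utf-8') for l in batch])
```

Because every `islice` call draws from the same iterator, each batch starts where the previous one stopped, and the final batch simply holds the remaining lines.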

boto3 generator itertools python
