Python bz2: reading lines is slower in bytes mode

Problem description

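I have a bz2-compressed log file that contains many lines. Each line needs a small amount of parsing, the details of which are not important here.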

I first read the lines in text mode, like this:

import bz2

path = 'content.log.bz2' 

def method_1(path):
    with bz2.open(path,'rt') as file:
        lines = [line for line in file]
    return lines

(The parsing happens inside the list comprehension.) prun gives:

1394087 function calls (1394086 primitive calls) in 9.864 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    17350    3.295    0.000    3.295    0.000 {method 'decompress' of '_bz2.BZ2Decompressor' objects}
        1    3.042    3.042    9.780    9.780 <ipython-input-1-77d0033e4930>:7(<listcomp>)
  1143178    2.232    0.000    2.232    0.000 bz2.py:137(closed)
    16617    0.266    0.000    3.894    0.000 _compression.py:66(readinto)
    16617    0.138    0.000    3.474    0.000 _compression.py:72(read)
    16617    0.138    0.000    4.386    0.000 bz2.py:184(read1)
    66467    0.128    0.000    0.128    0.000 {built-in method builtins.len}
    16617    0.108    0.000    4.001    0.000 {method 'read1' of '_io.BufferedReader' objects}
    16617    0.086    0.000    0.153    0.000 codecs.py:319(decode)
        1    0.083    0.083    9.864    9.864 <string>:1(<module>)
    16617    0.076    0.000    0.247    0.000 _compression.py:16(_check_can_read)
    16619    0.070    0.000    0.171    0.000 bz2.py:151(readable)
    16621    0.068    0.000    0.101    0.000 _compression.py:12(_check_not_closed)
    16617    0.067    0.000    0.067    0.000 {built-in method _codecs.utf_8_decode}
    16617    0.058    0.000    0.058    0.000 {method 'cast' of 'memoryview' objects}
      886    0.009    0.000    0.009    0.000 {method 'read' of '_io.BufferedReader' objects}

I thought it might be better to do this in bytes mode.

def method_2(path):
    with bz2.open(path,'rb') as file:
        lines = [line for line in file]
    return lines

It turns out that this performs much worse than my first solution. prun now gives:

 8020433 function calls in 39.857 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1126551   10.761    0.000   36.901    0.000 bz2.py:206(readline)
  1126551    4.739    0.000   11.553    0.000 bz2.py:151(readable)
  1126551    4.655    0.000   16.208    0.000 _compression.py:16(_check_can_read)
  1126551    4.517    0.000    6.814    0.000 _compression.py:12(_check_not_closed)
    17350    4.333    0.000    4.333    0.000 {method 'decompress' of '_bz2.BZ2Decompressor' objects}
  1126551    3.023    0.000    7.947    0.000 {method 'readline' of '_io.BufferedReader' objects}
        1    2.880    2.880   39.780   39.780 <ipython-input-1-77d0033e4930>:12(<listcomp>)
  1126554    2.297    0.000    2.297    0.000 bz2.py:137(closed)
  1126552    1.985    0.000    1.985    0.000 {built-in method builtins.isinstance}
    16617    0.273    0.000    4.924    0.000 _compression.py:66(readinto)
    16617    0.140    0.000    4.515    0.000 _compression.py:72(read)
    66467    0.128    0.000    0.128    0.000 {built-in method builtins.len}
        1    0.073    0.073   39.857   39.857 <string>:1(<module>)
    16617    0.040    0.000    0.040    0.000 {method 'cast' of 'memoryview' objects}
      886    0.009    0.000    0.009    0.000 {method 'read' of '_io.BufferedReader' objects}

Does anyone know what the problem is here? I assumed bytes mode would be closer to the machine and therefore faster, but it is not.
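
The two profiles above point to a likely cause. In bytes mode, iteration goes through bz2.py:206(readline) once per line (1126551 calls), and each call also runs _check_can_read, readable and _check_not_closed. In text mode, data is pulled through read1 only 16617 times and the lines are split outside bz2.py. One possible workaround, sketched below and not benchmarked against this log file, is to wrap the binary file object in io.BufferedReader so that the per-line work happens in the buffered reader instead of in BZ2File.readline. The name method_3 is hypothetical and only continues the numbering used above.

```python
import bz2
import io


def method_3(path):
    # Assumption: BZ2File can act as the underlying stream of io.BufferedReader.
    # The outer reader then fetches decompressed data in large blocks and
    # splits it into lines itself, so bz2.py's readline is not called per line.
    with io.BufferedReader(bz2.open(path, 'rb')) as file:
        lines = [line for line in file]
    return lines
```

Closing the outer reader also closes the wrapped bz2 file, so the with block behaves like the one in method_2. The result is still a list of bytes objects with their line endings kept.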
