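Problem description
I have a bz2-compressed log file that contains many lines. Each line needs some light analysis, the details of which don't matter here.
I first read the lines in text mode, like this: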
import bz2

path = 'content.log.bz2'

def method_1(path):
    # Text mode ('rt'): each line is decoded and returned as str.
    with bz2.open(path, 'rt') as file:
        lines = [line for line in file]
    return lines
(The analysis happens inside the list comprehension.) %prun gives:
1394087 function calls (1394086 primitive calls) in 9.864 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
17350 3.295 0.000 3.295 0.000 {method 'decompress' of '_bz2.BZ2Decompressor' objects}
1 3.042 3.042 9.780 9.780 <ipython-input-1-77d0033e4930>:7(<listcomp>)
1143178 2.232 0.000 2.232 0.000 bz2.py:137(closed)
16617 0.266 0.000 3.894 0.000 _compression.py:66(readinto)
16617 0.138 0.000 3.474 0.000 _compression.py:72(read)
16617 0.138 0.000 4.386 0.000 bz2.py:184(read1)
66467 0.128 0.000 0.128 0.000 {built-in method builtins.len}
16617 0.108 0.000 4.001 0.000 {method 'read1' of '_io.BufferedReader' objects}
16617 0.086 0.000 0.153 0.000 codecs.py:319(decode)
1 0.083 0.083 9.864 9.864 <string>:1(<module>)
16617 0.076 0.000 0.247 0.000 _compression.py:16(_check_can_read)
16619 0.070 0.000 0.171 0.000 bz2.py:151(readable)
16621 0.068 0.000 0.101 0.000 _compression.py:12(_check_not_closed)
16617 0.067 0.000 0.067 0.000 {built-in method _codecs.utf_8_decode}
16617 0.058 0.000 0.058 0.000 {method 'cast' of 'memoryview' objects}
886 0.009 0.000 0.009 0.000 {method 'read' of '_io.BufferedReader' objects}
I thought doing this in bytes mode might be better.
def method_2(path):
    # Binary mode ('rb'): each line is returned as bytes, undecoded.
    with bz2.open(path, 'rb') as file:
        lines = [line for line in file]
    return lines
It turns out that this performs much worse than my first solution. %prun now gives:
8020433 function calls in 39.857 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1126551 10.761 0.000 36.901 0.000 bz2.py:206(readline)
1126551 4.739 0.000 11.553 0.000 bz2.py:151(readable)
1126551 4.655 0.000 16.208 0.000 _compression.py:16(_check_can_read)
1126551 4.517 0.000 6.814 0.000 _compression.py:12(_check_not_closed)
17350 4.333 0.000 4.333 0.000 {method 'decompress' of '_bz2.BZ2Decompressor' objects}
1126551 3.023 0.000 7.947 0.000 {method 'readline' of '_io.BufferedReader' objects}
1 2.880 2.880 39.780 39.780 <ipython-input-1-77d0033e4930>:12(<listcomp>)
1126554 2.297 0.000 2.297 0.000 bz2.py:137(closed)
1126552 1.985 0.000 1.985 0.000 {built-in method builtins.isinstance}
16617 0.273 0.000 4.924 0.000 _compression.py:66(readinto)
16617 0.140 0.000 4.515 0.000 _compression.py:72(read)
66467 0.128 0.000 0.128 0.000 {built-in method builtins.len}
1 0.073 0.073 39.857 39.857 <string>:1(<module>)
16617 0.040 0.000 0.040 0.000 {method 'cast' of 'memoryview' objects}
886 0.009 0.000 0.009 0.000 {method 'read' of '_io.BufferedReader' objects}
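Both tables above come from IPython's %prun. For anyone who wants to reproduce the comparison outside IPython, here is a minimal sketch using the standard cProfile and pstats modules. The helper name profile_reader and its top parameter are made up for illustration; sorting by 'tottime' corresponds to the "Ordered by: internal time" line in the output above.

```python
import cProfile
import pstats


def profile_reader(reader, path, top=15):
    """Run reader(path) under cProfile and print the costliest functions.

    `reader` is any function taking a path, e.g. method_1 or method_2.
    `profile_reader` and `top` are hypothetical names used only here.
    """
    profiler = cProfile.Profile()
    # runcall() profiles a single call and passes through its return value.
    result = profiler.runcall(reader, path)
    stats = pstats.Stats(profiler)
    # 'tottime' is time spent inside each function itself (internal time).
    stats.sort_stats('tottime').print_stats(top)
    return result
```

Calling profile_reader(method_1, path) and then profile_reader(method_2, path) should print two tables that can be compared line by line.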
Does anyone know what is going on here? I assumed bytes mode would be closer to the "machine" and therefore faster, but it is not.
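The two profiles hint at where the time goes. In bytes mode, bz2.py:206(readline) is called once per line (1,126,551 calls), and each call also triggers readable, _check_can_read and _check_not_closed. In text mode the file is read through read1 only 16,617 times. One thing that may be worth trying, then, is to wrap the binary file object in io.BufferedReader so that lines are split from larger chunks instead of going through BZ2File.readline each time. This is only a sketch under that assumption: the name method_3 is made up here, and it has not been timed against the original log file.

```python
import bz2
import io


def method_3(path):
    """Read all lines as bytes through an extra io.BufferedReader layer.

    Assumption: iterating the BufferedReader avoids the per-line call to
    BZ2File.readline seen in the bytes-mode profile. Whether this is
    actually faster depends on the Python version and should be measured.
    """
    # Closing the BufferedReader also closes the wrapped BZ2File.
    with io.BufferedReader(bz2.open(path, 'rb')) as file:
        lines = [line for line in file]
    return lines
```

The returned list holds bytes objects with their trailing newline, just like method_2, so the per-line analysis should not need to change. Profiling method_3 the same way as the other two would show whether the readline entries really disappear.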