Problem description
We have a bunch of code to port to Python 3, and we are running into very strange behavior with enumerate.
cdef char **c_argv
c_argv = <char**>malloc(sizeof(char*) * len(args))
for idx,s in enumerate(args):
    if bytes != str:
        s = s.encode('utf-8')
    c_argv[idx] = s
With Python 2 we see every argument in c_argv, whereas with Python 3 we see only one of them. Note that writing the loop the "pythonic" way, without enumerate:
for i in args:
does not work either.
Here is a complete reproduction of our test:
test_enumerate.pyx
from libc.stdlib cimport malloc,free
from libc.string cimport const_char

def test_enumerate(args):
    cdef char **c_argv
    c_argv = <char**>malloc(sizeof(char*) * len(args))
    for idx,s in enumerate(args):
        if bytes != str:
            s = s.encode('utf-8')
        c_argv[idx] = s
    for i in range(len(args)):
        print("Set by enumerate",c_argv[i])
    free(c_argv)

def test_loop_obj(args):
    cdef char **c_argv
    c_argv = <char**>malloc(sizeof(char*) * len(args))
    idx=0
    for s in (args):
        if bytes != str:
            s = s.encode('utf-8')
        c_argv[idx] = s
        idx = idx+1
    for i in range(len(args)):
        print("Set by loop on objects",c_argv[i])
    free(c_argv)

def test_loop(args):
    cdef char **c_argv
    c_argv = <char**>malloc(sizeof(char*) * len(args))
    for i in range(len(args)):
        if bytes != str:
            args[i] = args[i].encode('utf-8')
        c_argv[i] = args[i]
    for i in range(len(args)):
        print("Set by loop on index",c_argv[i])
    free(c_argv)
test.py
from test_enumerate import test_enumerate,test_loop_obj,test_loop
test_enumerate(['salut','tu','vas','bien'])
test_loop_obj(['salut','bien'])
test_loop(['salut','bien'])
setup.py:
from setuptools import setup
from Cython.Build import cythonize
setup(
    ext_modules = cythonize("test_enumerate.pyx")
)
We compile it:
python/python3 setup.py build_ext --inplace
Here is the output that illustrates our problem:
$ python test.py
('Set by enumerate','salut')
('Set by enumerate','tu')
('Set by enumerate','vas')
('Set by enumerate','bien')
('Set by loop on objects','salut')
('Set by loop on objects','tu')
('Set by loop on objects','vas')
('Set by loop on objects','bien')
('Set by loop on index','salut')
('Set by loop on index','tu')
('Set by loop on index','vas')
('Set by loop on index','bien')
$ python3 test.py
('Set by enumerate',b'bien')
('Set by enumerate',b'bien')
('Set by loop on objects',b'bien')
('Set by loop on index',b'salut')
('Set by loop on index',b'tu')
('Set by loop on index',b'vas')
('Set by loop on index',b'bien')
Can someone explain what we are missing here?
Solution
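c_argv[idx] = s
This sets c_argv[idx] to a pointer to the data inside s. No copy is made, so the pointer is only valid for as long as the object s refers to is still alive.
s = s.encode('utf-8')
When this line runs (on Python 3 it runs on every iteration, because bytes != str), it creates a new encoded object and rebinds s to it. The previously encoded s then loses its last reference and may be freed, which leaves the pointer already stored in c_argv dangling. On Python 2 the line is skipped, so s stays the original string, which args keeps alive. test_loop works on Python 3 for the same reason: it stores each encoded object back into args, so the list holds a reference to it. To keep the pointers valid, hold a reference to every encoded object (for example in a Python list) until c_argv is no longer used.
Basically, don't mess with C pointers unless you understand (and can control) the lifetime of what they point to.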
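As a rough illustration of the lifetime issue in plain Python (not Cython), the sketch below uses a weak reference as a stand-in for the raw char* pointer, since neither keeps the object it points to alive. The Encoded wrapper class and both function names are hypothetical, introduced only for this example, and the behavior relies on CPython's reference counting freeing an object as soon as its last reference disappears.

```python
import weakref


class Encoded:
    """Hypothetical wrapper; plain bytes objects cannot be weakly referenced."""

    def __init__(self, data):
        self.data = data


def pointers_without_owner(args):
    # Mirrors the failing loops: only the loop variable refers to each encoded object.
    pointers = []
    for s in args:
        s = Encoded(s.encode('utf-8'))   # the next rebinding of s drops this object
        pointers.append(weakref.ref(s))  # non-owning, like c_argv[idx] = s
    # Report which "pointers" still lead to a live object.
    return [p() is not None for p in pointers]


def pointers_with_owner(args):
    # Same loop, but a list owns every encoded object for as long as it is needed.
    keep_alive = []
    pointers = []
    for s in args:
        s = Encoded(s.encode('utf-8'))
        keep_alive.append(s)             # this reference keeps the object alive
        pointers.append(weakref.ref(s))
    return [p() is not None for p in pointers]


if __name__ == '__main__':
    words = ['salut', 'tu', 'vas', 'bien']
    print(pointers_without_owner(words))
    print(pointers_with_owner(words))
```

Under CPython, the first function reports only the last entry as alive, because s still refers to it when the loop ends, while the second reports every entry as alive. The analogous change in the Cython code would be to append each encoded s to a Python list that stays in scope until after free(c_argv).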