How can I test whether the files in a large array of file names exist, and how can I consistently remove the missing ones from the array?

Problem description

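I am trying to use a metadata array containing file names in an ML project. The problem is that some of the files are not available on disk. My goal is to remove the nonexistent files from the array by checking whether each file is available. I wrote this code in Python: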

for file in meta:
    try:
        f = open(data_path + file, 'r')
        f.close()
    except:
        meta.remove(file)

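The code seems to run fine, but it is not consistent: I can run it several times in a row, and each run shortens meta further (for example: originally len(meta) = 65296; after 1 iteration len(meta) = 62020; after 2 iterations len(meta) = 60653; and so on).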

Why doesn't my code remove all the nonexistent files in a single pass? Is there a more consistent way to achieve my goal?

Solution

I'm not sure I follow what you are trying to do, but never modify a sequence while you are iterating over it. Use something like this:

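new_meta = list(meta)  # real copy; a plain "new_meta = meta" would only alias the same list
for file in meta:
    try:
        f = open(data_path + file, 'r')
        f.close()
    except OSError:
        # remove from the copy, not from the list being iterated over
        new_meta.remove(file)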
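
To see why the original loop needs several runs, here is a minimal sketch that needs no files on disk. The file names and the `existing` set are made up for illustration; the set stands in for "the file can be opened". When an item is removed during iteration, the following items shift left by one while the loop's internal index still advances, so the item right after each removed one is never checked.

```python
def remove_while_iterating(names, existing):
    """Buggy pattern from the question: mutate the list being iterated over."""
    names = list(names)  # work on a copy so the caller's list is untouched
    for name in names:
        if name not in existing:
            # Removing shifts later items left; the next item is skipped.
            names.remove(name)
    return names


def filter_into_new_list(names, existing):
    """Safe pattern: build a new list and leave the iterated one alone."""
    return [name for name in names if name in existing]


# Hypothetical data: only a.wav and d.wav "exist".
existing = {"a.wav", "d.wav"}
meta = ["a.wav", "b.wav", "c.wav", "d.wav"]

buggy = remove_while_iterating(meta, existing)
fixed = filter_into_new_list(meta, existing)

print(buggy)  # c.wav survives because it was skipped after b.wav was removed
print(fixed)
```

Each additional run of the buggy loop catches some of the entries that were skipped the previous time, which matches the shrinking lengths reported in the question.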

A way that I believe is more consistent:

from os import path
new_meta = list(meta)  # real copy, so meta is not modified during the loop
for file in meta:
    if not path.exists(data_path + file):
        new_meta.remove(file)

Or, more compactly, with a list comprehension:

from os import path
meta = ['path/to/file1','path/to/file2','path/to/filen']
# keep only the entries that exist on disk
new_meta = [f for f in meta if path.exists(f)]
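
For a list with tens of thousands of entries, one option is to list the directory once and test membership in a set instead of making one file system call per name. This is only a sketch under an assumption the question does not confirm: every entry in `meta` is a bare file name located directly inside `data_path`, with no subdirectories. The helper name `filter_existing` is hypothetical.

```python
import os


def filter_existing(meta, data_path):
    """Return the entries of meta that are regular files directly in data_path.

    Assumes each entry is a bare file name (no subdirectory component).
    """
    # Scan the directory once and collect the names of regular files.
    with os.scandir(data_path) as entries:
        on_disk = {entry.name for entry in entries if entry.is_file()}
    # Build a new list; the original list is never modified.
    return [name for name in meta if name in on_disk]
```

It would be called as `new_meta = filter_existing(meta, data_path)`. Note that the set comparison is case-sensitive, which may differ from the behavior of a case-insensitive file system; if the names contain subdirectories, the `path.exists` version above is the safer choice.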

