My code for moving files and getting data statistics does not work on a large folder in Google Drive

Problem description

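I am using Google Colab for the tasks below, but they don't work. When I test my scripts on a small folder with fewer than 10 files, they run fine. However, they fail on larger folders with thousands of files. As a side note, I can't tell how big the folder is, because Google Drive has no option to show a folder's size.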

I'd like to know why this happens and how to fix it. Thanks a lot!

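Task #1: Move all JSON files from one folder to another folder on Google Drive. When I tested it on a smaller folder, all files were moved as expected. However, when I run it on the "real" folder, which is much larger, it looks as if it works: there is no timeout. But when I check the folder on Google Drive, the files are still there. Nothing has changed.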

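source = glob.glob('/path_to_source_folder/*.json')
destination = '/path_to_destination_folder/'
# If this prints 0, the folder listing itself came back empty
print('Found {} json files'.format(len(source)))

for json_file in source:
  file_name = os.path.basename(json_file)  # avoid shadowing the built-in id()
  target = os.path.join(destination, file_name)
  if os.path.exists(target):
    # Already present in the destination: delete the source copy
    print('The file {} already exists'.format(file_name))
    os.remove(json_file)
  else:
    shutil.move(json_file, destination)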
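A minimal sketch of a more defensive Task #1 that keeps the same rule (delete the source file if the destination already has one with that name, otherwise move it). It reports how many files were found, moved and deleted, and re-lists the source folder at the end, so an empty or partial listing shows up instead of looking like success. `move_json_files` and its dict of counts are hypothetical names for illustration. On Colab you would pass the mounted Drive paths; calling `drive.flush_and_unmount()` from `google.colab` afterwards flushes pending writes to Drive before you check the web UI.

```python
import glob
import os
import shutil


def move_json_files(source_dir, destination_dir):
    """Move every *.json file from source_dir into destination_dir.

    Hypothetical helper. If a file with the same name already exists in the
    destination, the source copy is deleted (same rule as the original loop).
    Returns a dict of counts so an empty or partial listing is easy to spot.
    """
    pattern = os.path.join(source_dir, "*.json")
    source = sorted(glob.glob(pattern))
    counts = {"found": len(source), "moved": 0,
              "deleted_duplicates": 0, "left_in_source": len(source)}
    if not source:
        print("No .json files listed in {}: check the path or retry".format(source_dir))
        return counts

    os.makedirs(destination_dir, exist_ok=True)
    for json_file in source:
        file_name = os.path.basename(json_file)
        target = os.path.join(destination_dir, file_name)
        if os.path.exists(target):
            os.remove(json_file)
            counts["deleted_duplicates"] += 1
        else:
            shutil.move(json_file, target)
            counts["moved"] += 1

    # Re-list the source to confirm the files are really gone
    counts["left_in_source"] = len(glob.glob(pattern))
    return counts
```

Usage: `print(move_json_files('/path_to_source_folder', '/path_to_destination_folder'))`. A non-zero `left_in_source` means the moves did not take effect.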

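Task #2: Get statistics about the folder and its JSON files. I tested this on a smaller folder and it worked well. Side note: the JSON files in the smaller folder have the same structure as those in the larger one. On the larger folder it did not time out either, but every result is "0", e.g. "0 users", "0 posts", and so on. These are definitely wrong.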

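files = glob.glob('/path_to_reference_folder/*.json')
# If this prints 0, the folder listing itself came back empty
print('Found {} json files'.format(len(files)))

total_users = 0
not_empty_users = 0
total_posts_by_users = []

for file in files:
  total_users += 1
  with open(file, 'r') as f:
    tmp = f.readlines()  # each line is counted as one post
    if len(tmp) > 0:
      not_empty_users += 1
    total_posts_by_users.append(len(tmp))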

print("total {} users".format(total_users))

print("total {} posts by users".format(np.sum(total_posts_by_users)))
print("total {} users not empty".format(not_empty_users))
print("total {} average posts per users".format(np.mean(total_posts_by_users)))
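Printing "0 users" rather than crashing suggests that `glob.glob` returned an empty list for the large folder: the loop never runs, `total_users` stays 0 and `np.sum([])` is 0.0 (while `np.mean([])` gives `nan` with a warning). The sketch below assumes, as the original does, that one file is one user and one line is one post. It raises an error instead of reporting zeros, and counts lines without loading each whole file into memory. `folder_stats` is a hypothetical helper name.

```python
import glob
import os


def folder_stats(folder):
    """Compute per-user post statistics for the *.json files in folder.

    Hypothetical helper mirroring the original logic: one file = one user,
    one line = one post. Raises instead of printing zeros when nothing is listed.
    """
    files = sorted(glob.glob(os.path.join(folder, "*.json")))
    if not files:
        raise RuntimeError(
            "No .json files listed in {}; the listing may have failed".format(folder))

    posts_per_user = []
    for path in files:
        # Binary mode: count lines without decoding or holding the file in memory
        with open(path, "rb") as f:
            posts_per_user.append(sum(1 for _ in f))

    total_posts = sum(posts_per_user)
    return {
        "total_users": len(files),
        "not_empty_users": sum(1 for n in posts_per_user if n > 0),
        "total_posts": total_posts,
        "average_posts": total_posts / len(files),
    }
```

Usage: `stats = folder_stats('/path_to_reference_folder')`, then print the four values as in the original script.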

Note: earlier steps: mounting Drive and importing libraries

# Mounting Drive
from google.colab import drive
drive.mount('/content/drive')  # mount() returns nothing useful, so no assignment needed

# Importing libraries
import numpy as np
import os
import glob
import json
import shutil
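`glob.glob` never raises: a wrong path (for example one missing the `/content/drive/...` mount prefix) and a listing that failed on the Drive side both just return `[]`. Colab's FAQ warns that Drive operations can time out when a folder contains a very large number of files, and that repeated attempts may eventually succeed. A minimal sketch, assuming the problem is an empty listing on the first try: check that the folder exists, then retry the listing a few times. `list_json_files`, `attempts` and `wait_seconds` are hypothetical names and defaults.

```python
import glob
import os
import time


def list_json_files(folder, attempts=3, wait_seconds=5.0):
    """Return the sorted *.json paths in folder, retrying if the listing is empty.

    Hypothetical helper. Assumption: on a large Drive-mounted folder the first
    listing may come back empty, so it is retried before giving up.
    """
    if not os.path.isdir(folder):
        # glob would silently return [] here, so fail loudly instead
        raise FileNotFoundError("Not a directory (is Drive mounted?): {}".format(folder))

    for attempt in range(1, attempts + 1):
        paths = sorted(glob.glob(os.path.join(folder, "*.json")))
        if paths:
            return paths
        if attempt < attempts:
            time.sleep(wait_seconds)
    return []
```

Usage: `files = list_json_files('/path_to_reference_folder')`, then use `files` in place of the plain `glob.glob` call.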
