Python Ray: how to write to a file

Problem description

How can I set up Ray so that every task writes its result to a shared file? This is what I am currently trying:

import ray
import time
import pickle
import filelock
ray.init()

filename = 'data/db.pkl'



@ray.remote
def f(i):
    try:    
        with filelock.FileLock(filename):
            with open(filename,'rb') as file:
                data = pickle.load(file)
    except FileNotFoundError:
        data = {}
    
    if i not in data.keys():

        # The actual computation, which takes time and needs to run in parallel; here it is just a square.
        new_key = i
        new_item = i**2
        
        with filelock.FileLock(filename):
            with open(filename,'rb') as file:
                data = pickle.load(file)

            data[new_key] = new_item
            with open(filename,'wb') as file:
                pickle.dump(data,file)
    return None


numbers = [0,1,2,3,4,5,6,7,8,9,10]
rez = [f.remote(i) for i in numbers]

But I get an error.

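How can I achieve this behaviour? I want each process to: 1) check the database to see whether the work still needs doing, 2) do the work, 3) write the result to the database.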

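Without locking the file, this runs, but not all of the results get saved. How can I get the desired behaviour? Note that I will later need to use this in a distributed setup.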

Solution

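First, you should open the file with 'ab' (append mode) rather than 'wb', which overwrites the file. In append mode you do not need a lock, because appending is thread-safe on POSIX systems. Each task then appends its own record instead of rewriting the whole dictionary.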
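To make the append idea concrete, here is a minimal sketch using only the standard library. The helper names `append_result` and `load_results` are made up for illustration and are not part of the question. It assumes each record is a small `(key, value)` tuple and that the file sits on a local filesystem; I would not rely on this over a network filesystem.

```python
import os
import pickle
import tempfile


def append_result(filename, key, value):
    # Hypothetical helper: serialize first, then append the bytes as one record.
    record = pickle.dumps((key, value))
    with open(filename, "ab") as file:
        file.write(record)


def load_results(filename):
    # Hypothetical helper: read back every appended record into one dict.
    data = {}
    try:
        with open(filename, "rb") as file:
            while True:
                try:
                    key, value = pickle.load(file)
                except EOFError:
                    break  # reached the end of the file
                data[key] = value
    except FileNotFoundError:
        pass  # nothing has been written yet
    return data


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "db.pkl")
    for i in range(3):
        append_result(path, i, i ** 2)
    print(load_results(path))
```

Note that the file is then a sequence of pickled records rather than one pickled dict, so it has to be read back with repeated `pickle.load` calls as in `load_results`.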
What error do you get when you use the file lock?
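The question does not include the traceback, so this is only a guess: `filelock.FileLock(filename)` uses the data file itself as the lock file, which the filelock documentation advises against, and as far as I know some versions truncate the lock file when acquiring it, which would leave `pickle.load` with an empty file. A sketch that locks a separate `.lock` file instead and keeps the read and the write inside one critical section (`update_db` is a hypothetical helper name):

```python
import os
import pickle
import tempfile

from filelock import FileLock


def update_db(filename, key, value):
    # Hypothetical helper: lock a sibling ".lock" file, never the data file.
    with FileLock(filename + ".lock"):
        try:
            with open(filename, "rb") as file:
                data = pickle.load(file)
        except FileNotFoundError:
            data = {}  # first writer: start with an empty database
        data[key] = value
        with open(filename, "wb") as file:
            pickle.dump(data, file)
    return data


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "db.pkl")
    update_db(path, 2, 4)
    print(update_db(path, 3, 9))
```

This only coordinates processes that see the same filesystem, so it may not carry over to a multi-machine cluster.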
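Given that you will eventually make the program distributed, I think the simplest approach is to call ray.put() inside f(i) to store the data in Ray's shared memory (the object store), and then write the objects out from the main program.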

python ray