Caching Python function results using only a subset of the arguments as the identifier

Question

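Is there a simple way to cache function results in Python based on a single identifier argument? For example, suppose my function has 3 arguments: arg1, arg2 and id. Is there a simple way to cache the function result based only on the value of id? That is, whenever id takes the same value, the cached function would return the same result, regardless of arg1 and arg2.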

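Background: I have a time-consuming function that is called repeatedly, where arg1 and arg2 are lists and dictionaries made up of large numpy arrays. Their contents are not hashable, so functools.lru_cache does not work as is. However, only a handful of specific combinations of arg1 and arg2 ever occur. My idea is therefore to manually specify an id that takes a unique value for each possible combination of arg1 and arg2.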
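To make the limitation concrete, here is a minimal sketch of the failure. The function name slow and its body are hypothetical stand-ins for the real computation; the point is only that functools.lru_cache has to hash every argument and a list or dict cannot be hashed.

```python
import functools

@functools.lru_cache(maxsize=None)
def slow(arg1, arg2, id):
    # Hypothetical stand-in for the expensive computation.
    return len(arg1) + len(arg2)

error_message = None
try:
    # lru_cache builds its key from all arguments, so the list and dict
    # are hashed before slow() ever runs.
    slow([1, 2, 3], {'k': 4}, 'a')
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```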

Answers

def cache(fun):
    cache.cache_ = {}  # results keyed by id, stored as a function attribute
    def inner(arg1, arg2, id):
        if id not in cache.cache_:
            print(f'Caching {id}')  # to check when it is cached
            cache.cache_[id] = fun(arg1, arg2, id)  # forward all three arguments
        return cache.cache_[id]
    return inner

@cache
def function(arg1, arg2, arg3):
    print('something')

You can create your own decorator, as suggested by DarrylG. You can put a print(cache.cache_) inside the if id not in cache.cache_: branch to check that it only caches new values of id.

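Writing cache.cache_ makes cache_ a function attribute (PEP 232). When you want to reset the cache, you can call cache.cache_.clear(). This also gives you direct access to the dictionary of cached results.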

function(1,2,'a')
function(11,22,'b')
function(11,22,'a')
function([111,11],222,'a')

print(f'Cache {cache.cache_}') # view previously cached results
cache.cache_.clear() # clear cache
print(f'Cache {cache.cache_}') # cache is now empty

# call some function again to populate cache
function(1,2,'a')
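One thing to keep in mind with the decorator above: the dictionary lives on the decorator itself, so it is replaced every time @cache is applied and is shared by every decorated function. If that matters, a possible variation is to attach the dictionary to each wrapper instead. The following is only a sketch; the names cache_by_id, add and mul are made up for illustration.

```python
import functools

def cache_by_id(fun):
    """Cache fun(arg1, arg2, id) on id only, with one cache per function."""
    @functools.wraps(fun)
    def inner(arg1, arg2, id):
        if id not in inner.cache_:
            inner.cache_[id] = fun(arg1, arg2, id)
        return inner.cache_[id]
    inner.cache_ = {}  # function attribute on the wrapper, not on the decorator
    return inner

@cache_by_id
def add(arg1, arg2, id):
    return arg1 + arg2

@cache_by_id
def mul(arg1, arg2, id):
    return arg1 * arg2

add(1, 2, 'a')
mul(3, 4, 'a')      # same id, but a separate cache
print(add.cache_)   # {'a': 3}
print(mul.cache_)   # {'a': 12}

add.cache_.clear()  # resets only add's cache
print(mul.cache_)   # {'a': 12}
```

With this layout each function is reset independently through its own cache_ attribute, for example add.cache_.clear().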

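EDIT: Addressing a new comment by @Bob (OP): in most cases returning a reference to the same object is sufficient, but OP's use case seems to need a fresh copy of the answer. This is probably because the result of function(arg1, arg2, arg3) is treated as unique based on arg1, arg2 and arg3, whereas inside the cache function uniqueness is defined using id alone. In that case, returning the same reference to a mutable object leads to undesired behaviour. As mentioned in the same comment, the return statement in the inner function should be changed from return cache.cache_[id] to return copy.deepcopy(cache.cache_[id]), which also requires import copy.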
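A minimal sketch of what the deep-copy variant could look like is shown below. The names cache_by_id and build are hypothetical, and the cache is kept in a local dictionary here rather than in cache.cache_; only the copy.deepcopy call on return is the change described above.

```python
import copy
import functools

def cache_by_id(fun):
    """Cache fun(arg1, arg2, id) on id only and return independent copies."""
    store = {}

    @functools.wraps(fun)
    def inner(arg1, arg2, id):
        if id not in store:
            store[id] = fun(arg1, arg2, id)
        # Return a copy so that callers cannot mutate the cached value.
        return copy.deepcopy(store[id])

    inner.cache_ = store  # expose the dictionary for inspection or clearing
    return inner

@cache_by_id
def build(arg1, arg2, id):
    # Hypothetical function returning a mutable result.
    return {'total': [arg1, arg2]}

first = build(1, 2, 'a')
first['total'].append(99)    # mutates only the caller's copy
second = build(10, 20, 'a')  # same id, so served from the cache
print(second)                # {'total': [1, 2]}
```

The trade-off is that every call now pays for a deep copy, which may be noticeable when the cached results are large arrays.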

I think you can move the excess arguments into a separate function (the caller), like this:

import functools

def get_and_update(a,b,c):
    return {'a': a,'b': b,'c': c}

# ->

@functools.lru_cache
def get_by_a(a):
    return {}

def get_and_update(a, b, c):
    res = get_by_a(a)  # cached on a only, so the same dict comes back for the same a
    res.update(a=a, b=b, c=c)
    return res

x1 = get_and_update('x', 1, 2)
x2 = get_and_update('x', 2, 3)
assert x1 is x2
print(x1, x2, sep='\n')

{'a': 'x', 'b': 2, 'c': 3}
{'a': 'x', 'b': 2, 'c': 3}
