OpenAI Gym: iterating over all possible actions in an action space

Question

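I want to build a brute-force approach that tests every action in a Gym action space before selecting the best one. Is there a simple, straightforward way to get all possible actions?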

Specifically, my action space is

import gym

action_space = gym.spaces.MultiDiscrete([5 for _ in range(4)])

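I know I can sample a random action with action_space.sample() and also check whether an action is contained in the action space, but I want to generate a list of all possible actions within that space.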

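Is there anything more elegant (and more performant) than a bunch of for loops? The problem with for loops is that I want this to work for an action space of any size, so I can't hard-code 4 nested for loops to walk through the different actions.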

Answers

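Actions in a gym environment are usually represented simply as integers, which means that once you know the total number of possible actions you can build an array of all of them.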

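How you get the number of possible actions in a gym environment depends on the type of its action space. In your case it is a MultiDiscrete action space, so the attribute nvec can be used, as mentioned by @Valentin Macé, like so: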

>> print(env.action_space.nvec)
[5 5 5 5]

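Note that the attribute name nvec stands for "n vector", because its value is a vector with one entry per dimension. Also note that the attribute is a numpy array.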
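As a quick sanity check before enumerating anything, the size of the full action set can be computed from nvec. This is only a minimal sketch: the plain list `nvec` below is a stand-in for `action_space.nvec`, used so the snippet runs without gym installed.

```python
import math

# Stand-in for `action_space.nvec` of MultiDiscrete([5, 5, 5, 5]);
# a plain list is an assumption made so this runs without gym.
nvec = [5, 5, 5, 5]

# Dimension i offers nvec[i] choices (0 .. nvec[i] - 1), so the number
# of joint actions is the product of all entries.
num_actions = math.prod(nvec)
print(num_actions)  # 625
```

The count grows multiplicatively with every added dimension, so it is worth checking before building an explicit list.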

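Now that we have the array, we can turn it into lists of actions. Recall that action_space.sample returns a numpy array holding one random action from each dimension of the MultiDiscrete action space, i.e.: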

>> env.action_space.sample() # This does not return a single action but 4 actions in your case, since you have a MultiDiscrete action space of length 4.
array([2, 2, 0, 1], dtype=int64) # Example output; the values are random.

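So, to convert the array into a list of the possible actions in each dimension, we can use a list comprehension. Note that the valid values in dimension i run from 0 to nvec[i] - 1: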

>> [list(range(k)) for k in action_space.nvec]
[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]

Note that this scales to any number of dimensions and is cheap to compute.

Now you can loop over the possible actions in each dimension with just two loops, like so:

# Valid values in each dimension are 0 .. k - 1.
possible_actions = [list(range(k)) for k in action_space.nvec]
for action_dim in possible_actions:
    for action in action_dim:
        # Find best action.....
        pass

For more on this, I'd suggest also visiting this thread on GitHub, which discusses a somewhat similar question, in case you find it useful.

EDIT: So, going by your comment @CGFoX, I assume you want all possible combination vectors of actions as a list, for any number of dimensions, somewhat like this:

>> get_actions()
[[0, 0, 0, 0], [0, 0, 0, 1], ....] # For all possible combinations.

The same can be achieved with recursion and only two loops, and it scales to however many dimensions are provided.

def flatten(actions):
    # Flattens one level of nesting in the actions passed, like so:
    # INPUT:  [[0, 1, 2], 3, 4]
    # OUTPUT: [0, 1, 2, 3, 4]

    new_actions = []  # The new flattened list of actions.
    for action in actions:
        if isinstance(action, list):
            # The action is already a vector of actions, e.g. [1, 1],
            # so add its elements to the new_actions list.
            new_actions += action
        else:
            # The action is a single integer, so append it directly
            # to the new_actions list.
            new_actions.append(action)

    # Return the flattened list.
    return new_actions

def get_actions(possible_actions):
    # Receives the possible actions for every dimension and returns all
    # possible combinations across the dimensions, like so:
    # INPUT:  [[0, 1], [0, 1, 2]]  # 2 dimensions here, but any number works.
    # OUTPUT: [[0, 0], [0, 1], [0, 2], [1, 0], [1, 1], [1, 2]]
    if len(possible_actions) == 1:
        # Only one set is left, so it already holds every combination.
        # flatten() wraps bare integers (single-dimension input) into
        # one-element vectors and leaves already-built vectors unchanged.
        return [flatten([action]) for action in possible_actions[0]]
    pairs = []  # All combinations of the first two sets.
    for action in possible_actions[0]:
        # Pair every entry of the first set (index 0) with every entry of
        # the second set (index 1).
        # NOTE: On recursive calls the first set already contains vectors,
        # e.g. [[0, 0], [0, 1], ...], so pairing [0, 1] with 2 would give
        # [[0, 1], 2]. flatten() turns that into [0, 1, 2].
        for action2 in possible_actions[1]:
            pairs.append(flatten([action, action2]))

    # Build a new list of sets: the freshly paired set first, followed by
    # the sets that have not been paired yet (index 2 onwards), e.g.
    # BEFORE PAIRING: [[0, 1], [0, 1], [0, 1]]
    # AFTER PAIRING:  [[[0, 0], [0, 1], [1, 0], [1, 1]], [0, 1]]
    new_possible_actions = [pairs] + possible_actions[2:]
    # Recurse: the paired first set is now combined with the next set, and
    # so on until a single set with all combinations remains. The result
    # is returned as is; indexing it again would drop all but one entry
    # when there are three or more dimensions.
    return get_actions(new_possible_actions)

Once this function is defined, it can be used with the sets of possible actions generated earlier to get all possible combinations, like so:

possible_actions = [list(range(k)) for k in action_space.nvec]
print(get_actions(possible_actions))
>> [[0, 0, 0, 0], [0, 0, 0, 1], ... [4, 4, 4, 4]]
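For comparison, the same enumeration can be written without recursion using the standard library's itertools.product. The sketch below is an alternative to the recursive approach, not a copy of it; the helper name `all_actions` is made up for illustration, and the plain list `nvec` stands in for `action_space.nvec` so the snippet runs without gym.

```python
import itertools

def all_actions(nvec):
    """Return every action vector for the per-dimension sizes in `nvec`."""
    # One range per dimension: 0 .. n - 1. int() also accepts numpy integers.
    ranges = (range(int(n)) for n in nvec)
    # itertools.product yields the Cartesian product as tuples,
    # with the last dimension varying fastest.
    return [list(combo) for combo in itertools.product(*ranges)]

# Stand-in for `action_space.nvec` of MultiDiscrete([5, 5, 5, 5]).
nvec = [5, 5, 5, 5]
actions = all_actions(nvec)
print(len(actions))  # 625
print(actions[0])    # [0, 0, 0, 0]
print(actions[-1])   # [4, 4, 4, 4]
```

Because the number of dimensions is taken from the length of `nvec`, nothing has to be hard-coded for a particular action space.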

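EDIT-2: I fixed some code that previously returned a nested list. The returned list is now a list of the combinations themselves, rather than being nested inside another list.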

EDIT-3: Fixed my typos.

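You can also use the function below to build an explicit list of all states or actions in an observation space or action space, respectively.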

import itertools

import gym
import numpy as np


def get_space_list(space):

    """
    Converts gym `space`, constructed from `types`, to list `space_list`
    """

    # -------------------------------- #

    types = [
        gym.spaces.multi_binary.MultiBinary,
        gym.spaces.discrete.Discrete,
        gym.spaces.multi_discrete.MultiDiscrete,
        gym.spaces.dict.Dict,
        gym.spaces.tuple.Tuple,
    ]

    if type(space) not in types:
        raise ValueError(f'input space {space} is not constructed from spaces of types:' + '\n' + str(types))

    # -------------------------------- #

    if type(space) is gym.spaces.multi_binary.MultiBinary:
        # Every 0/1 assignment over all entries, reshaped to the space's shape.
        return [
            np.reshape(np.array(element), space.n)
            for element in itertools.product(
                range(2), repeat=int(np.prod(space.n))
            )
        ]

    if type(space) is gym.spaces.discrete.Discrete:
        return list(range(space.n))

    if type(space) is gym.spaces.multi_discrete.MultiDiscrete:
        # Cartesian product of 0 .. n - 1 over every dimension.
        return [
            np.array(element) for element in itertools.product(
                *[range(n) for n in space.nvec]
            )
        ]

    if type(space) is gym.spaces.dict.Dict:

        keys = space.spaces.keys()

        # Recurse into each sub-space, then combine the results.
        values_list = itertools.product(
            *[get_space_list(sub_space) for sub_space in space.spaces.values()]
        )

        return [
            {key: value for key, value in zip(keys, values)}
            for values in values_list
        ]

    if type(space) is gym.spaces.tuple.Tuple:
        return [
            list(element) for element in itertools.product(
                *[get_space_list(sub_space) for sub_space in space.spaces]
            )
        ]

    # -------------------------------- #
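Since the goal in the question is to pick the best action by brute force, the full list does not have to be stored at all. The following is a rough sketch under stated assumptions: `best_action` and `toy_score` are hypothetical names, the scoring function is a placeholder for whatever evaluation you actually run (for example stepping a copy of the environment), and the plain list of sizes stands in for `action_space.nvec`.

```python
import itertools

def best_action(nvec, score):
    """Return the action tuple with the highest `score` (hypothetical helper)."""
    # itertools.product is lazy, so combinations are produced one at a
    # time instead of being materialised as a list.
    combos = itertools.product(*(range(int(n)) for n in nvec))
    return max(combos, key=score)

def toy_score(action):
    # Placeholder evaluation (an assumption for illustration only):
    # the closer the action is to (1, 2, 3, 4), the higher the score.
    target = (1, 2, 3, 4)
    return -sum(abs(a - t) for a, t in zip(action, target))

best = best_action([5, 5, 5, 5], toy_score)
print(best)  # (1, 2, 3, 4)
```

Memory use stays constant this way, although the running time still grows with the product of the dimension sizes.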
