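Shapes of _observation_spec and _action_spec in the TF-Agents environment example

Problem description

The TensorFlow documentation for TF-Agents Environments includes an example environment for a simple (blackjack-inspired) card game.

The __init__ looks like this: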

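import numpy as np
from tf_agents.environments import py_environment
from tf_agents.specs import array_spec

class CardGameEnv(py_environment.PyEnvironment):

  def __init__(self):
    # Action: a single integer (0-dimensional), either 0 or 1.
    self._action_spec = array_spec.BoundedArraySpec(
        shape=(), dtype=np.int32, minimum=0, maximum=1, name='action')
    # Observation: a 1-D array with one element (the current card sum).
    self._observation_spec = array_spec.BoundedArraySpec(
        shape=(1,), dtype=np.int32, minimum=0, name='observation')
    self._state = 0
    self._episode_ended = False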

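The action spec only allows 0 (ask for a new card) or 1 (end the current round), so shape=() makes sense: a single integer is all that is needed.

However, I do not quite understand why the observation spec has shape=(1,), since it only represents the sum of the cards in the current round (so it is also a single integer).

What is the reason for the difference in shapes?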

Solution

At first I thought the two shapes were the same. To test this, I ran the following code in the W3 Schools Python "Try Editor":

import numpy as np

arr1 = np.zeros((), dtype=np.int32)    # shape (): 0-dimensional
arr2 = np.zeros((1,), dtype=np.int32)  # shape (1,): 1-dimensional, one element

print("This is the first array:", arr1, "\n")
print("This is the second array:", arr2, "\n")

The output I got was:

This is the first array: 0

This is the second array: [0]

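This led me to conclude that shape=() describes a plain integer, treated as a 0-dimensional array, while shape=(1,) describes a 1-dimensional array containing a single integer. I hope this is accurate, since I would like some confirmation myself. I checked further with a second test: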

import numpy as np

arr1 = np.array(42)
arr2 = np.array([1])
arr3 = np.array([1,2,3,4])

print(arr1.shape)
print(arr2.shape)
print(arr3.shape)

The output was:

()
(1,)
(4,)

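This seems to confirm my first conclusion: arr1 is a 0-dimensional array, while arr3 is a 1-dimensional array with 4 elements (as explained in the W3 Schools tutorial), and arr2 has the same kind of shape as arr3 but a different number of elements.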
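The difference can also be seen in how the two kinds of array behave, not only in what `.shape` prints. The following is a minimal sketch using plain NumPy; the variable names are made up for illustration.

```python
import numpy as np

scalar = np.array(42)      # 0-dimensional array, shape ()
one_elem = np.array([42])  # 1-dimensional array, shape (1,)

# ndim counts the axes, size counts the elements.
print(scalar.ndim, scalar.size)      # 0 1
print(one_elem.ndim, one_elem.size)  # 1 1

# A 0-d array has no axis to index along, so integer indexing fails.
try:
    scalar[0]
except IndexError as err:
    print("IndexError:", err)

# The 1-d array can be indexed, and both can become a plain Python int.
print(one_elem[0])
print(scalar.item(), one_elem[0].item())
```

So both arrays hold exactly one value, but only the second one has an axis.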

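As for why the action and the observation are represented as an integer and a one-element array respectively, it may be because TensorFlow works with tensors (n-dimensional arrays), and treating the observation as an array may make the computation easier.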
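One way to picture this guess is to look at what happens when several observations are batched. The sketch below uses plain NumPy rather than TensorFlow, and the card totals and the `weights` matrix are invented for illustration; it is not taken from the TF-Agents source.

```python
import numpy as np

totals = (5, 12, 19)  # hypothetical card sums from three steps

# Observations with shape (1,): stacking gives a (batch, features) matrix.
vector_batch = np.stack([np.array([t], dtype=np.int32) for t in totals])
print(vector_batch.shape)  # (3, 1)

# Observations with shape (): stacking gives only a (batch,) vector.
scalar_batch = np.stack([np.array(t, dtype=np.int32) for t in totals])
print(scalar_batch.shape)  # (3,)

# A dense layer computes x @ W, with W of shape (features, units).
weights = np.ones((1, 4), dtype=np.float32)  # hypothetical layer weights
hidden = vector_batch.astype(np.float32) @ weights
print(hidden.shape)  # (3, 4)

# The scalar batch would need an extra feature axis before the same product.
reshaped = scalar_batch[:, np.newaxis]
print(reshaped.shape)  # (3, 1)
```

With shape (1,) the batched observation already has a feature axis, which is the layout a network input usually expects.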

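The action is probably declared as a single integer to simplify the flow inside the _step() function, since handling arrays in an if/elif/else structure would be somewhat tedious. There are other examples of action_specs with more elements and with discrete or continuous values, so I have no further ideas beyond that.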
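To make the if/elif/else point concrete, here is a minimal sketch of a branching step written against a scalar action. The helper `apply_action`, its `new_card` parameter, and the action meanings used here (0 draws a card, 1 ends the round) are assumptions for the illustration, not the tutorial's actual _step() code.

```python
import numpy as np

def apply_action(total, action, new_card):
    """Hypothetical helper: returns (new_total, episode_ended)."""
    if action == 1:      # end the round
        return total, True
    elif action == 0:    # draw a card
        return total + new_card, False
    else:
        raise ValueError("`action` should be 0 or 1.")

# A 0-d action compares like a plain integer.
scalar_action = np.array(0, dtype=np.int32)
print(apply_action(10, scalar_action, new_card=7))  # (17, False)

# A multi-element action cannot be used directly in an `if` condition.
multi_action = np.array([0, 1], dtype=np.int32)
try:
    apply_action(10, multi_action, new_card=7)
except ValueError as err:
    print("ValueError:", err)
```

With several action components, each one would have to be indexed or unpacked before branching, which is the extra bookkeeping a scalar action avoids.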

I am not sure all of this is correct, but it seemed like a good starting point for the discussion.
