How to get the dimensions of an OpenAI Gym spaces.Tuple for use in DQN when building a neural network with Keras

Problem description

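I built a custom environment using OpenAI Gym's spaces.Tuple, because my observation consists of: hour (0-23), day (1-7) and month (1-12), which are discrete; four continuous numbers read from a csv file; and an array of shape (4, 24), also read from a csv file.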

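self.observation_space = spaces.Tuple(spaces=(
    spaces.Box(low=-high, high=high, shape=(4,), dtype=np.float16),     # four continuous values
    spaces.Box(low=-high, high=high, shape=(4, 24), dtype=np.float16),  # 4 x 24 array
    spaces.Discrete(24), spaces.Discrete(7), spaces.Discrete(12)))      # hour, day, month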

This is my reset() function, which reads the data from the csv file:

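    def reset(self):
        index = 0
        hour = 0
        day = 1
        month = 6
        # four continuous values taken from the first csv row
        array1 = np.array([
            self.df.loc[index, 'A'], self.df.loc[index, 'B'],
            self.df.loc[index, 'C'], self.df.loc[index, 'D']], dtype=np.float16)
        # 4 x 24 block: rows index..index+23 (.loc slicing is inclusive) of four columns;
        # 'E'-'H' are hypothetical placeholder column names
        array2 = np.array([
            self.df.loc[index: index+23, 'E'], self.df.loc[index: index+23, 'F'],
            self.df.loc[index: index+23, 'G'], self.df.loc[index: index+23, 'H']], dtype=np.float16)
        tup = (array1, array2, hour, day, month)
        return tup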
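One way to turn this tuple into the single 103-element vector that a Dense network expects is to flatten each component and concatenate them. This is a minimal sketch, assuming the observation has exactly the (array1, array2, hour, day, month) layout returned by reset() above; flatten_observation is a hypothetical helper name, and the random arrays only stand in for the csv data.

```python
import numpy as np

def flatten_observation(obs):
    """Flatten (array1, array2, hour, day, month) into one 1-D float32 vector."""
    array1, array2, hour, day, month = obs
    return np.concatenate([
        np.asarray(array1, dtype=np.float32).ravel(),     # 4 continuous values
        np.asarray(array2, dtype=np.float32).ravel(),     # 4 x 24 = 96 values
        np.array([hour, day, month], dtype=np.float32),   # 3 discrete values as plain numbers
    ])

# Stand-in data with the same shapes as the csv-derived arrays.
obs = (np.random.rand(4).astype(np.float16),
       np.random.rand(4, 24).astype(np.float16),
       0, 1, 6)
flat = flatten_observation(obs)
print(flat.shape)  # (103,)
```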

To train the agent, I want to use the DQN algorithm, namely the DQNAgent from the keras-rl library. This is my code for building the neural network model:

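from keras.models import Sequential
from keras.layers import Activation, Dense, Flatten

model = Sequential()
# (1,) is the window_length axis that keras-rl adds in front of each observation
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(nb_actions))
model.add(Activation('linear'))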

As I understand it, a spaces.Tuple instance has no usable shape, and len() returns the number of sub-spaces in the tuple, e.g. len = 5 in my environment:

state = env.reset()
n = len(state)  # 5 components; avoids shadowing the built-in len

But to build the neural network, it seems I need 4 + 4*24 + 3 = 103 input neurons. I tried hard-coding the input dimension as:

model.add(Flatten(input_shape=(1,) + (103,)))
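The 103 in that line counts hour, day and month as one number each. A small pure-Python sketch of the same arithmetic, which also shows the larger count you get if the three discrete values are one-hot encoded (gym's gym.spaces.flatdim counts a Discrete(n) space as n inputs). The components list is a hypothetical description of the Tuple, not a gym API.

```python
from math import prod

# Hypothetical description of the Tuple components: (kind, shape or n)
components = [
    ("box", (4,)),       # four continuous values
    ("box", (4, 24)),    # 4 x 24 array
    ("discrete", 24),    # hour
    ("discrete", 7),     # day
    ("discrete", 12),    # month
]

def input_dim(components, one_hot=False):
    """Number of scalar inputs after flattening every component."""
    total = 0
    for kind, info in components:
        if kind == "box":
            total += prod(info)                 # every element of the array
        else:
            total += info if one_hot else 1     # n entries if one-hot, else one number
    return total

print(input_dim(components))                # 103
print(input_dim(components, one_hot=True))  # 143
```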

But I get the following error:

ValueError: Error when checking input: expected flatten_1_input to have shape (1, 103) but got array with shape (1, 5)
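The (1, 5) in this message comes from how the batch is assembled. Judging from the traceback of the second attempt, keras-rl wraps the current state (a list of window_length observations) in another list via compute_batch_q_values([state]) and converts it with NumPy, so the model receives an array of shape (batch, window_length) + observation_shape, and Keras reports the shape without the batch axis. A tuple of 5 items therefore shows up as (1, 5), while a flattened 103-vector gives (1, 103). A NumPy sketch of that shape arithmetic, assuming window_length=1:

```python
import numpy as np

window_length = 1                           # assumed, matching the (1,) in input_shape
flat_obs = np.zeros(103, dtype=np.float32)  # one already-flattened observation
state = [flat_obs] * window_length          # recent observations for one sample
batch = np.array([state])                   # a batch holding that single sample

print(batch.shape)      # (1, 1, 103)
print(batch.shape[1:])  # (1, 103): what Flatten(input_shape=(1, 103)) expects
```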

So I tried:

model.add(Flatten(input_shape=(1,) + (env.observation_space.__len__(),)))

But that also gave an error:

TypeError: only size-1 arrays can be converted to Python scalars

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:/Users/yuche/DropBox/risk hedging/rl-project/DqndamarketAgent.py", line 37
    dqn.fit(env, nb_steps=1440, visualize=True, verbose=2)
  File "C:\Users\yuche\anaconda3\envs\py37\lib\site-packages\rl\core.py", line 169, in fit
    action = self.forward(observation)
  File "C:\Users\yuche\anaconda3\envs\py37\lib\site-packages\rl\agents\dqn.py", line 228, in forward
    q_values = self.compute_q_values(state)
  File "C:\Users\yuche\anaconda3\envs\py37\lib\site-packages\rl\agents\dqn.py", line 69, in compute_q_values
    q_values = self.compute_batch_q_values([state]).flatten()
  File "C:\Users\yuche\anaconda3\envs\py37\lib\site-packages\rl\agents\dqn.py", line 64, in compute_batch_q_values
    q_values = self.model.predict_on_batch(batch)
  File "C:\Users\yuche\anaconda3\envs\py37\lib\site-packages\keras\engine\training.py", line 1580, in predict_on_batch
    outputs = self.predict_function(ins)
  File "C:\Users\yuche\anaconda3\envs\py37\lib\site-packages\tensorflow\python\keras\backend.py", line 3277, in __call__
    dtype=tensor_type.as_numpy_dtype))
  File "C:\Users\yuche\anaconda3\envs\py37\lib\site-packages\numpy\core\_asarray.py", line 83, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.

I googled this error and found a possible cause:

This happens when a function you defined or built expects a single value but receives an array.
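More concretely, the final ValueError can be reproduced with NumPy alone: asking for a numeric dtype when some elements of a container are themselves arrays fails, which is what happens when the tuple's nested arrays reach the float conversion inside Keras. A minimal sketch (the exact wording of the message varies between NumPy versions):

```python
import numpy as np

# Mixed container: a length-4 array next to plain scalars, like the tuple observation.
mixed = [np.zeros(4), 0.0, 1.0]

error = None
try:
    np.asarray(mixed, dtype=np.float32)
except ValueError as exc:
    error = exc
print("ValueError:", error)  # ... setting an array element with a sequence ...

# Once everything is flattened into one 1-D array, the conversion succeeds.
ok = np.asarray(np.concatenate([np.zeros(4), [0.0, 1.0]]), dtype=np.float32)
print(ok.shape)  # (6,)
```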

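It seems I still need 103 input neurons rather than 5, but the Tuple feeds the two arrays into the network as they are. What is the typical way to use a Tuple observation space with DQN?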
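A common approach (not something confirmed in this post) is to keep spaces.Tuple in the environment and flatten each observation before it reaches the network. keras-rl lets you pass a Processor to the agent, and its process_observation method is applied to every observation the environment returns. The sketch below shows only the flattening logic as a plain class so it runs without keras-rl installed; TupleFlattener is a hypothetical name, and in real code you would subclass rl.core.Processor and pass an instance via DQNAgent(..., processor=...). It also assumes day and month are shifted down by one, because Discrete(n) covers 0 to n-1.

```python
import numpy as np

class TupleFlattener:
    """Hypothetical helper; in keras-rl this logic would live in a
    subclass of rl.core.Processor (method process_observation)."""

    def __init__(self, one_hot=True):
        self.one_hot = one_hot

    def process_observation(self, observation):
        array1, array2, hour, day, month = observation
        parts = [np.asarray(array1, dtype=np.float32).ravel(),   # 4 values
                 np.asarray(array2, dtype=np.float32).ravel()]   # 96 values
        # Discrete(n) covers 0..n-1, so day (1-7) and month (1-12) are shifted down by one.
        for value, n in ((hour, 24), (day - 1, 7), (month - 1, 12)):
            if self.one_hot:
                vec = np.zeros(n, dtype=np.float32)
                vec[value] = 1.0                                 # one-hot position
                parts.append(vec)
            else:
                parts.append(np.array([value], dtype=np.float32))
        return np.concatenate(parts)

obs = (np.zeros(4), np.zeros((4, 24)), 0, 1, 6)   # hour=0, day=1, month=6
scalar_version = TupleFlattener(one_hot=False).process_observation(obs)
one_hot_version = TupleFlattener(one_hot=True).process_observation(obs)
print(scalar_version.shape, one_hot_version.shape)  # (103,) (143,)
```

With one-hot encoding the first layer would become Flatten(input_shape=(1,) + (143,)); without it, (1,) + (103,). One-hot encoding avoids telling the network that month 12 is "larger" than month 1, at the cost of 40 extra inputs.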

By the way, I came up with a workaround using spaces.Box instead of spaces.Tuple:

self.observation_space = spaces.Box(low=-high, high=high, shape=(103,), dtype=np.float16)

But this doesn't seem like the best approach.
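If you do use a flat spaces.Box, the bounds do not have to be a single high for every element: spaces.Box also accepts low and high arrays of the same shape, so the calendar entries can keep their own ranges. A NumPy sketch of building those arrays; HIGH is a hypothetical placeholder for the post's high value, which is not shown.

```python
import numpy as np

HIGH = 1e4  # hypothetical placeholder for the post's `high`

n_continuous = 4 + 4 * 24  # four values plus the 4 x 24 block
low = np.concatenate([
    np.full(n_continuous, -HIGH),  # continuous csv values
    [0, 1, 1],                     # hour, day, month minimums
]).astype(np.float32)
high = np.concatenate([
    np.full(n_continuous, HIGH),
    [23, 7, 12],                   # hour, day, month maximums
]).astype(np.float32)

print(low.shape, high.shape)  # (103,) (103,)
# With gym installed, these would be used as:
# spaces.Box(low=low, high=high, dtype=np.float32)
```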

Thanks in advance!


dqn keras openai-gym python reinforcement-learning