How do I get rewards in the Atari game Bank Heist?

Problem description

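I don't quite understand why my agent isn't receiving any rewards in the Atari game Bank Heist. When I render the environment, I watch the reward that comes back after each bank robbery, but when I run the code below I never get any reward. Even more curious: if I run the same code on another environment, such as Taxi, I do get rewards. Only on Bank Heist do I get nothing.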

My code:

import gym
import torch

import matplotlib.pyplot as plt


#env = gym.make('Taxi-v3')
env = gym.make('BankHeist-ram-v0')

plt.style.use('ggplot')

# For the RAM variant the observation is a Box of 128 bytes, so this is the
# RAM size, not a count of discrete states (that would be observation_space.n,
# which only discrete envs such as Taxi-v3 provide).
number_of_states = env.observation_space.shape[0]
print(number_of_states)
number_of_actions = env.action_space.n
print(number_of_actions)
#number_of_states = env.observation_space.n
#number_of_actions = env.action_space.n

gamma = 0.9

egreedy = 0.1

Q = torch.zeros([number_of_states, number_of_actions])
print(Q)
num_episodes = 1000

steps_total = []
rewards_total = []

for i_episode in range(num_episodes):

    state = env.reset()
    step = 0

    while True:

        step += 1

        # env.render()

        random_for_egreedy = torch.rand(1)[0]

        if random_for_egreedy > egreedy:
            # Greedy action; the tiny random noise breaks ties between equal Q-values
            random_values = Q[state] + torch.rand(1, number_of_actions) / 1000
            action = torch.max(random_values, 1)[1][0]
            action = action.item()
        else:
            action = env.action_space.sample()

        new_state, reward, done, info = env.step(action)

        # Q-learning update with an implicit learning rate of 1
        Q[state, action] = reward + gamma * torch.max(Q[new_state])

        state = new_state

        if done:
            steps_total.append(step)
            # Note: `reward` here is only the reward of the episode's final step
            rewards_total.append(reward)
            print("Episode finished after %i steps" % step)
            print("Episode achieved %i rewards" % reward)
            break
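One thing worth checking, offered as a sketch rather than a confirmed answer: in the loop above, `rewards_total.append(reward)` and the second `print` only see the reward of the final step of each episode. Every Taxi-v3 step carries a non-zero reward (-1 per step, +20 for a successful drop-off), so that last value is never 0 there. In Bank Heist, points are scored mid-episode when a bank is robbed, and the terminating step is likely to carry a reward of 0. The sketch below sums the reward over every step instead. It assumes the pre-0.26 Gym API used in the question (`reset()` returns the observation, `step()` returns a 4-tuple); `run_episode` and `ToyEnv` are hypothetical names, and `ToyEnv` is only a stand-in so the example runs without Gym.

```python
from typing import Any, Callable, Tuple


def run_episode(env: Any, policy: Callable[[Any], int]) -> Tuple[int, float]:
    """Run one episode and return (steps, total_reward).

    Assumes the old Gym API: reset() -> obs, step(a) -> (obs, reward, done, info).
    """
    state = env.reset()
    steps = 0
    total_reward = 0.0
    while True:
        action = policy(state)
        state, reward, done, _info = env.step(action)
        steps += 1
        total_reward += reward  # accumulate every step's reward, not just the last one
        if done:
            return steps, total_reward


class ToyEnv:
    """Hypothetical stand-in env: reward 10 on step 2, episode ends with reward 0 on step 5."""

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        reward = 10.0 if self.t == 2 else 0.0
        done = self.t == 5
        return self.t, reward, done, {}


steps, total = run_episode(ToyEnv(), lambda s: 0)
print(steps, total)  # 5 10.0
```

Here the final step's reward is 0 even though the episode earned 10 in total, which is the pattern that logging only `reward` at `done` would hide.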
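The question's code stores Q as a torch tensor of shape `[number_of_states, number_of_actions]` and indexes it with `Q[state]`. That matches Taxi-v3, whose states are single integers, but in BankHeist-ram-v0 `state` is an array of 128 RAM bytes, so the tensor indexing no longer picks out one row per state. A minimal sketch of tabular Q-learning that keys the table by a hashable version of the state is shown below, in plain Python rather than torch. The helper names and the learning rate `alpha` are assumptions for illustration and do not appear in the question; unlike the question's update, it also skips bootstrapping from terminal states.

```python
import numbers
import random
from collections import defaultdict
from typing import Dict, Hashable, List


def make_q_table(n_actions: int) -> Dict[Hashable, List[float]]:
    # Every state not seen before starts with all action values at 0.
    return defaultdict(lambda: [0.0] * n_actions)


def to_key(state) -> Hashable:
    # Discrete states (e.g. Taxi-v3) are integers; array observations
    # (e.g. the 128 RAM bytes of BankHeist-ram-v0) become tuples usable as dict keys.
    if isinstance(state, numbers.Integral):
        return int(state)
    return tuple(int(x) for x in state)


def choose_action(Q, state, n_actions: int, egreedy: float, rng=random) -> int:
    # Epsilon-greedy: explore with probability egreedy, otherwise act greedily.
    if rng.random() < egreedy:
        return rng.randrange(n_actions)
    values = Q[to_key(state)]
    best = max(values)
    # Break ties at random, similar in spirit to the small noise in the question's code.
    return rng.choice([a for a, v in enumerate(values) if v == best])


def q_update(Q, state, action: int, reward: float, new_state, done: bool,
             gamma: float = 0.9, alpha: float = 0.5) -> None:
    s, s2 = to_key(state), to_key(new_state)
    # Do not bootstrap from a terminal state.
    target = reward if done else reward + gamma * max(Q[s2])
    # Move Q(s, a) a fraction alpha of the way towards the target (hypothetical alpha).
    Q[s][action] += alpha * (target - Q[s][action])


Q = make_q_table(2)
q_update(Q, 0, 1, reward=1.0, new_state=1, done=False)
print(Q[0])  # [0.0, 0.5]
```

Even with hashable keys, 128-byte RAM states rarely repeat exactly, so a table like this is unlikely to learn much on Bank Heist; that is the usual motivation for replacing the table with a function approximator such as a DQN.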


dqn openai-gym python pytorch reinforcement-learning