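Problem description
I'm a little confused about why my agent gets no reward in the Atari game BankHeist. After each bank robbery in the game, while rendering the environment, I monitor the reward received, but when I run the code below I get no reward at all. More interestingly, if I run the same code in another environment, such as Taxi, I do get rewards, but when I try it on BankHeist I get nothing.
My code is as follows: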
import gym
import torch
import matplotlib.pyplot as plt

# env = gym.make('Taxi-v3')
env = gym.make('BankHeist-ram-v0')
plt.style.use('ggplot')

number_of_states = env.observation_space.shape[0]
print(number_of_states)
number_of_actions = env.action_space.n
print(number_of_actions)
# number_of_states = env.observation_space.n
# number_of_actions = env.action_space.n

gamma = 0.9
egreedy = 0.1

Q = torch.zeros([number_of_states, number_of_actions])
print(Q)

num_episodes = 1000
steps_total = []
rewards_total = []

for i_episode in range(num_episodes):
    state = env.reset()
    step = 0
    while True:
        step += 1
        # env.render()
        # Epsilon-greedy action selection
        random_for_egreedy = torch.rand(1)[0]
        if random_for_egreedy > egreedy:
            random_values = Q[state] + torch.rand(1, number_of_actions) / 1000
            action = torch.max(random_values, 1)[1][0]
            action = action.item()
        else:
            action = env.action_space.sample()
        new_state, reward, done, info = env.step(action)
        # Q-table update
        Q[state, action] = reward + gamma * torch.max(Q[new_state])
        state = new_state
        if done:
            steps_total.append(step)
            rewards_total.append(reward)
            print("Episode finished after %i steps" % step)
            print("Episode achieved %i rewards" % reward)
            break
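Reading the loop above, two things stand out. First, `rewards_total.append(reward)` and the final `print` only see the reward of the last step, not the sum over the episode. In Taxi-v3 almost every step carries a non-zero reward, so the last step usually shows something, whereas in BankHeist most steps (very likely including the terminal one) pay 0. Second, a `-ram` observation is a 128-byte vector rather than a single state index, so `Q[state]` does not pick out one row of the table. The sketch below is not a confirmed fix, only an illustration of both points. `StubEnv`, `state_key` and `run_episode` are hypothetical names introduced here, the stub stands in for the real environment so the example runs without gym, and using `bytes(obs)` as the key assumes the observation is a vector of byte values (as a uint8 RAM array would be).

```python
import random
from collections import defaultdict


class StubEnv:
    """Hypothetical stand-in for an Atari RAM env (not part of gym).

    Observations are short byte vectors. A reward of 10 is paid on step 3;
    every other step, including the terminal one, pays 0.
    """

    n_actions = 2

    def reset(self):
        self.t = 0
        return [0, 0]

    def step(self, action):
        self.t += 1
        reward = 10.0 if self.t == 3 else 0.0
        done = self.t >= 5
        return [self.t, action], reward, done, {}


def state_key(obs):
    # Turn a vector observation into a hashable dictionary key.
    return bytes(obs)


def run_episode(env, Q, gamma=0.9, egreedy=0.1):
    state = state_key(env.reset())
    episode_return = 0.0  # sum of rewards over the whole episode
    last_reward = 0.0     # what the original loop would have logged
    done = False
    while not done:
        # Epsilon-greedy action selection over the current state's row
        if random.random() > egreedy:
            action = max(range(env.n_actions), key=lambda a: Q[state][a])
        else:
            action = random.randrange(env.n_actions)
        obs, reward, done, _ = env.step(action)
        new_state = state_key(obs)
        # Same update rule as in the question, on a dictionary-backed table
        Q[state][action] = reward + gamma * max(Q[new_state])
        episode_return += reward
        last_reward = reward
        state = new_state
    return episode_return, last_reward


# Rows are created on first access, one list of action values per seen state
Q = defaultdict(lambda: [0.0] * StubEnv.n_actions)
episode_return, last_reward = run_episode(StubEnv(), Q)
print(episode_return, last_reward)
```

In the stub, the summed return picks up the reward paid mid-episode while the last-step reward stays at zero, which matches the symptom described. Even with both changes, a lookup table over raw RAM states may learn very slowly on the real game, since the number of distinct RAM vectors is large.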
Solution
No effective solution to this problem has been found yet.