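Problem description
I am trying to train an agent on Pac-Man, but the problem is that it only runs for one episode. Since one episode consists of three lives, I added the variable dead
to check whether the episode is already over. Unfortunately, the agent stops training after one episode, and I cannot work out why. Here is the code of my main function, which should be enough: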
import gym
from collections import deque

# Agent, process_frame and blend_images are defined elsewhere in my code.


def main():
    env = gym.make('MsPacman-v0')
    # env = gym.make('FrozenLake-v0')
    state_size = (88, 80, 1)
    action_size = env.action_space.n
    episodes = 1000
    batch_size = 32
    skip_start = 90
    total_time = 0
    all_reward = 0
    blend = 4  # Number of images to blend
    done = False
    gamma = 0.99
    agent = Agent(state_size, action_size, gamma, epsilon=1.0, epsilon_min=0.1,
                  epsilon_decay=0.995, update_rate=50)
    for e in range(episodes):
        total_reward = 0
        game_score = 0
        scores = []
        tot_reward = []
        tot_episodes = []
        # state = env.reset()
        state = process_frame(env.reset())
        images = deque(maxlen=blend)
        images.append(state)
        dead = False
        lives = 2
        # for skip in range(skip_start):
        #     env.step(0)
        while not done:
            dead = False
            # Play until a life is lost
            while not dead:
                env.render()
                total_time += 1
                # Periodically sync the target network
                if total_time % agent.update_rate == 0:
                    agent.update_target_model()
                state = blend_images(images, blend)
                action = agent.epsilon_greedy(state)
                next_state, reward, done, info = env.step(action)
                game_score += reward
                next_state = process_frame(next_state)
                images.append(next_state)
                next_state = blend_images(images, blend)
                agent.remember(state, action, next_state, done)
                state = next_state
                # A drop in the remaining-lives counter means a life was lost
                dead = info['ale.lives'] < lives
                lives = info['ale.lives']
            print("episode: {}/{},game score: {},avg reward: {}"
                  .format(e+1, episodes, game_score, all_reward/(e+1)))
            print(dead)
            total_reward += game_score if not dead else -100
        if done:
            scores.append(game_score)
            tot_reward.append(total_reward)
            tot_episodes.append(e)
            # all_reward += game_score
            # print(total_reward)


if __name__ == '__main__':
    main()
Solution
No verified solution to this problem has been found yet.
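One likely cause, offered as a sketch rather than a verified answer: in the question's code, done is set to False once before the `for e in range(episodes)` loop and is never cleared again. When env.step returns done = True at the end of the first game, the flag stays True, so in every later episode `while not done` is skipped immediately. The example below reproduces that pattern without gym. ToyEnv and count_played_episodes are hypothetical names invented for this illustration; ToyEnv only mimics the 4-tuple step return and the info['ale.lives'] key used in the question.

```python
class ToyEnv:
    """Hypothetical stand-in for a 3-life Atari env (4-tuple step API)."""

    def __init__(self, lives=3, steps_per_life=5):
        self.start_lives = lives
        self.steps_per_life = steps_per_life

    def reset(self):
        self.lives = self.start_lives
        self.t = 0
        return 0  # dummy observation

    def step(self, action):
        self.t += 1
        if self.t % self.steps_per_life == 0:
            self.lives -= 1  # lose a life every few steps
        done = self.lives == 0
        return self.t, 1.0, done, {'ale.lives': self.lives}


def count_played_episodes(episodes, reset_done):
    """Return how many episodes actually took at least one step."""
    env = ToyEnv()
    played = 0
    done = False  # set once before the loop, as in the question
    for e in range(episodes):
        if reset_done:
            done = False  # suggested change: clear the flag for each new episode
        env.reset()
        lives = env.start_lives
        steps = 0
        while not done:
            dead = False
            # Also stop on done so a finished game can never keep stepping
            while not dead and not done:
                _, reward, done, info = env.step(0)
                steps += 1
                dead = info['ale.lives'] < lives
                lives = info['ale.lives']
        if steps > 0:
            played += 1
    return played


print(count_played_episodes(5, reset_done=False))  # 1
print(count_played_episodes(5, reset_done=True))   # 5
```

If this is indeed the cause, moving done = False to the top of the episode loop in the question's main function (next to dead = False and the lives initialisation) should let all 1000 episodes run. Whether the agent then learns well is a separate matter that this sketch does not address.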