Problem description
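I want to create a game in which the player's actions and their rewards/consequences have a lead time. Because of that, I do not want to share the full observation with the player, but I still need to persist it, since it is needed in later steps. Is there a way to do this? If I create variables in init and update them, they are visible to every instance of the game, so the player already knows much more than they should.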
Solution
A rough example of what you need, using CartPole, looks like this:
import gym
from gym.utils import seeding
import numpy as np


class myEnv(gym.Env):
    def __init__(self, *args, **kwargs):
        """
        Define all the necessary stuff here
        """
        # Assumes the classic gym API: reset() returns the observation and
        # step() returns (observation, reward, done, info)
        self.env = gym.make('CartPole-v1')  # add stuff here to define game params
        self.action_space = self.env.action_space
        self.observation_space = self.env.observation_space
        self.past_actions = []  # per-instance queue of actions not yet applied
        self.delay = 2  # to have a delay of two timesteps

    def reset(self):
        """
        Define the reset
        """
        self.past_actions = []  # drop actions left over from the previous episode
        self.observation = self.env.reset()
        return self.observation

    def step(self, action):
        """
        Add the delay of actions here
        """
        self.past_actions.append(action)  # to keep track of actions
        # reward, done and info stay at 0, False, {} for the first two timesteps
        reward = 0
        done = False
        info = {}
        if len(self.past_actions) > self.delay:
            present_action = self.past_actions.pop(0)
            # change observation, reward, done, info
            # according to the action 'delay' timesteps ago
            self.observation, reward, done, info = self.env.step(present_action)
        return self.observation, reward, done, info

    def seed(self, seed=0):
        """
        Define seed method here
        """
        self.np_random, seed = seeding.np_random(seed)
        return self.env.seed(seed=seed)

    def render(self, mode="human", **kwargs):
        """
        Define rendering method here
        """
        return self.env.render(mode=mode, **kwargs)

    def close(self):
        """
        Define close method here
        """
        return self.env.close()
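
To see the delay without installing gym, here is a minimal sketch of the same idea built on a toy environment. `CounterEnv` and `DelayedActionEnv` are hypothetical names made up for this illustration, not part of gym. The pending actions live in an instance attribute, and the player only ever receives what `step` returns, so the queue stays hidden as long as the agent code does not reach into the environment object (the leading underscore is only a convention in Python).

```python
from collections import deque


class CounterEnv:
    """Hypothetical toy environment: the state is a single integer counter."""

    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is +1 or -1; the reward equals the new counter value
        self.state += action
        reward = float(self.state)
        done = abs(self.state) >= 3
        return self.state, reward, done, {}


class DelayedActionEnv:
    """Applies each action `delay` steps after it was submitted."""

    def __init__(self, env, delay=2):
        self.env = env
        self.delay = delay
        self._pending = deque()  # internal state, never returned to the player
        self._observation = None

    def reset(self):
        self._pending.clear()
        self._observation = self.env.reset()
        return self._observation

    def step(self, action):
        self._pending.append(action)
        reward, done, info = 0.0, False, {}
        if len(self._pending) > self.delay:
            # only now does the action from `delay` steps ago take effect
            delayed_action = self._pending.popleft()
            self._observation, reward, done, info = self.env.step(delayed_action)
        return self._observation, reward, done, info


env = DelayedActionEnv(CounterEnv(), delay=2)
first_obs = env.reset()
trajectory = [env.step(1) for _ in range(4)]
for obs, reward, done, _ in trajectory:
    print(obs, reward, done)
```

With a delay of 2, the first two calls to `step` leave the observation unchanged and return zero reward; the counter only starts moving on the third call. Each `DelayedActionEnv` object has its own `_pending` queue, so two instances of the game do not share it.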