How can I make a model learn inside a loop with Stable Baselines3?

Problem description

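In the example code from the Stable Baselines3 website (https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html), the model first learns via the line model.learn(total_timesteps=25000) and can then be used in a play loop.

Since I would like to monitor various parameters (from a custom environment) while the agent is learning, my question is: how can I use model.learn inside the play loop?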

import gym
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# Parallel environments
env = make_vec_env("CartPole-v1", n_envs=4)

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=25000)
model.save("ppo_cartpole")

del model  # remove to demonstrate saving and loading

model = PPO.load("ppo_cartpole")

obs = env.reset()
while True:
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    env.render()

Solution

No working solution to this problem has been found yet.
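
One direction that may be worth trying, offered as an unverified sketch rather than a confirmed answer: model.learn runs its own environment loop internally, so instead of calling it once per step of the play loop, it can be called repeatedly in short chunks with reset_num_timesteps=False, and a callable can be passed as callback to inspect every step while training runs. The helper names train_in_chunks, InfoMonitor and demo are hypothetical, and the sketch assumes that the custom environment writes the values of interest into its info dict and that the rollout loop exposes those dicts to the callback under the name "infos".

```python
from typing import Any, Callable, Dict, List, Optional


def train_in_chunks(
    model: Any,
    total_timesteps: int,
    chunk_timesteps: int,
    on_chunk: Callable[[Any, int], None],
    callback: Optional[Callable[[Dict[str, Any], Dict[str, Any]], bool]] = None,
) -> None:
    """Call model.learn() repeatedly and run on_chunk between the calls."""
    requested = 0
    while requested < total_timesteps:
        step = min(chunk_timesteps, total_timesteps - requested)
        # reset_num_timesteps=False continues the previous run instead of
        # restarting the model's step counter on every learn() call.
        model.learn(total_timesteps=step, reset_num_timesteps=False, callback=callback)
        requested += step
        on_chunk(model, requested)


class InfoMonitor:
    """Callable for learn(callback=...); records one info key at every step."""

    def __init__(self, key: str) -> None:
        self.key = key  # hypothetical key written by the custom environment
        self.history: List[Any] = []

    def __call__(self, locals_: Dict[str, Any], globals_: Dict[str, Any]) -> bool:
        # Assumption: the per-environment info dicts are available as "infos".
        for info in locals_.get("infos", []):
            if self.key in info:
                self.history.append(info[self.key])
        return True  # returning False would stop training early


def demo() -> None:
    """Example wiring; requires stable_baselines3 to be installed."""
    # Imported lazily so the helpers above can be used without SB3 installed.
    from stable_baselines3 import PPO
    from stable_baselines3.common.env_util import make_vec_env

    env = make_vec_env("CartPole-v1", n_envs=4)
    model = PPO("MlpPolicy", env, verbose=0)
    # "episode" is the key SB3's Monitor wrapper adds when an episode ends;
    # a custom environment would use its own key instead.
    monitor = InfoMonitor("episode")

    def report(m: Any, requested: int) -> None:
        # m.num_timesteps is the number of steps actually trained so far.
        print(requested, m.num_timesteps, len(monitor.history))

    train_in_chunks(model, 25000, 5000, report, callback=monitor)
```

Calling demo() runs the CartPole example from the question in five chunks. PPO collects whole rollouts of n_steps * n_envs steps, so each learn call may train for more steps than requested; model.num_timesteps holds the actual count. For more elaborate monitoring, subclassing BaseCallback from stable_baselines3.common.callbacks and overriding _on_step is likely the more conventional route.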

python-3.x reinforcement-learning stable-baselines