RLlib tunes PPOTrainer but not A2CTrainer

Problem description

I am comparing two algorithms on the CartPole environment. With the imports set as:

import ray
from ray import tune
from ray.rllib import agents
ray.init() # Skip or set to ignore if already called

the following runs perfectly:

experiment = tune.run(
    agents.ppo.PPOTrainer,
    config={
        "env": "CartPole-v1",
        "num_gpus": 1,
        "num_workers": 0,
        "num_envs_per_worker": 50,
        "rollout_fragment_length": 100,
        "train_batch_size": 5000,
        "sgd_minibatch_size": 500,
        "num_sgd_iter": 10,
        "entropy_coeff": 0.01,
        "lr_schedule": [
            [0, 0.0005],
            [10000000, 0.000000000001],
        ],
        "lambda": 0.95,
        "kl_coeff": 0.5,
        "clip_param": 0.1,
        "vf_share_layers": False,
    },
    metric="episode_reward_mean",
    mode="max",
    stop={"training_iteration": 100},
    checkpoint_at_end=True,
)

But when I do the same with the A2C agent:

experiment = tune.run(
    agents.a3c.A2CTrainer,
    # same config, metric, mode, stop and checkpoint_at_end arguments as the PPO run above
)

it raises this exception:

---------------------------------------------------------------------------
TuneError                                 Traceback (most recent call last)
<ipython-input-9-6680e67f9343> in <module>()
     23     mode="max",
     24     stop={"training_iteration": 100},
---> 25     checkpoint_at_end=True,
     26 )

/usr/local/lib/python3.6/dist-packages/ray/tune/tune.py in run(run_or_experiment, name, metric, mode, stop, time_budget_s, config, resources_per_trial, num_samples, local_dir, search_alg, scheduler, keep_checkpoints_num, checkpoint_score_attr, checkpoint_freq, checkpoint_at_end, verbose, progress_reporter, loggers, log_to_file, trial_name_creator, trial_dirname_creator, sync_config, export_formats, max_failures, fail_fast, restore, server_port, resume, queue_trials, reuse_actors, trial_executor, raise_on_failed_trial, callbacks, ray_auto_init, run_errored_only, global_checkpoint_period, with_server, upload_dir, sync_to_cloud, sync_to_driver, sync_on_checkpoint)
    432     if incomplete_trials:
    433         if raise_on_failed_trial:
--> 434             raise TuneError("Trials did not complete", incomplete_trials)
    435         else:
    436             logger.error("Trials did not complete: %s", incomplete_trials)

TuneError: ('Trials did not complete', [A2C_CartPole-v1_6acda_00000])

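Can anyone tell me what is going on? I don't know whether this is related to the version of the library I am using or whether my code is wrong. Is this a common problem?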

Solution

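Your A2C run fails because of the config you copied from the PPO trial: "sgd_minibatch_size", "kl_coeff" and several other keys are PPO-specific settings that A2CTrainer does not recognize, so the trial fails as soon as it is started with them.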
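A minimal sketch of the fix: copy the PPO config and drop the PPO-only keys before passing it to A2CTrainer. The answer names "sgd_minibatch_size" and "kl_coeff"; treating "num_sgd_iter", "clip_param" and "vf_share_layers" as PPO-only too is an assumption, so check the set against the default configs of your RLlib version. strip_ppo_only_keys is a hypothetical helper, not part of RLlib.

```python
from typing import Any, Dict

# Assumed PPO-only keys: "sgd_minibatch_size" and "kl_coeff" come from the answer,
# the rest are believed to be PPO-specific. Verify against your RLlib version.
PPO_ONLY_KEYS = {
    "sgd_minibatch_size",
    "num_sgd_iter",
    "kl_coeff",
    "clip_param",
    "vf_share_layers",
}


def strip_ppo_only_keys(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict without the (assumed) PPO-only keys; the input is not modified."""
    return {key: value for key, value in config.items() if key not in PPO_ONLY_KEYS}


# The config from the PPO run in the question.
ppo_config = {
    "env": "CartPole-v1",
    "num_gpus": 1,
    "num_workers": 0,
    "num_envs_per_worker": 50,
    "rollout_fragment_length": 100,
    "train_batch_size": 5000,
    "sgd_minibatch_size": 500,
    "num_sgd_iter": 10,
    "entropy_coeff": 0.01,
    "lr_schedule": [[0, 0.0005], [10000000, 0.000000000001]],
    "lambda": 0.95,
    "kl_coeff": 0.5,
    "clip_param": 0.1,
    "vf_share_layers": False,
}

a2c_config = strip_ppo_only_keys(ppo_config)
print(sorted(set(ppo_config) - set(a2c_config)))
# prints ['clip_param', 'kl_coeff', 'num_sgd_iter', 'sgd_minibatch_size', 'vf_share_layers']
```

The resulting a2c_config could then be passed as config=a2c_config to tune.run(agents.a3c.A2CTrainer, ...) with the same metric, mode, stop and checkpoint_at_end arguments. A2C has no minibatch SGD epochs, KL penalty or clipped objective, so these keys have no A2C counterpart and are simply dropped rather than translated.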

The underlying error is written to the "error.txt" file in the trial's log directory.
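To read that file without browsing the results tree by hand, a small sketch that lists every error.txt under a results directory, newest first. It assumes Tune's default local_dir of ~/ray_results; pass your own path if you set local_dir in tune.run. find_error_files is a hypothetical helper, not a Ray API.

```python
from pathlib import Path
from typing import List


def find_error_files(results_dir: str) -> List[Path]:
    """Return every error.txt below results_dir, most recently modified first."""
    root = Path(results_dir).expanduser()
    if not root.is_dir():
        # Nothing to search, e.g. no trial has been run with this local_dir yet.
        return []
    files = list(root.rglob("error.txt"))
    return sorted(files, key=lambda path: path.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # Assumption: the default Tune results directory was used.
    for error_file in find_error_files("~/ray_results")[:1]:
        print(error_file)
        print(error_file.read_text())
```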
