Problem description
For reinforcement learning, I've read that TensorBoard isn't ideal because it logs per episode and/or per step, and since reinforcement learning involves thousands of steps, that doesn't give a useful overview. I saw this modified TensorBoard class here: https://pythonprogramming.net/deep-q-learning-dqn-reinforcement-learning-python-tutorial/
The class:
import os
import tensorflow as tf
from tensorflow.keras.callbacks import TensorBoard


class ModifiedTensorBoard(TensorBoard):

    # Overriding init to set initial step and writer (we want one log file for all .fit() calls)
    def __init__(self, name, **kwargs):
        super().__init__(**kwargs)
        self.step = 1
        self.writer = tf.summary.create_file_writer(self.log_dir)
        self._log_write_dir = os.path.join(self.log_dir, name)

    # Overriding this method to stop creating default log writer
    def set_model(self, model):
        pass

    # Overridden, saves logs with our step number
    # (otherwise every .fit() will start writing from 0th step)
    def on_epoch_end(self, epoch, logs=None):
        self.update_stats(**logs)

    # Overridden
    # We train for one batch only, no need to save anything at epoch end
    def on_batch_end(self, batch, logs=None):
        pass

    # Overridden, so won't close writer
    def on_train_end(self, _):
        pass

    # Overridden, nothing extra to do at the end of a training batch
    def on_train_batch_end(self, batch, logs=None):
        pass

    # Custom method for saving own metrics
    # Creates writer, writes custom metrics and closes writer
    def update_stats(self, **stats):
        self._write_logs(stats, self.step)

    def _write_logs(self, logs, index):
        with self.writer.as_default():
            for name, value in logs.items():
                tf.summary.scalar(name, value, step=index)
            self.step += 1
            self.writer.flush()
I'd like to get it working with these layers:
n_actions = env.action_space.n
input_dim = env.observation_space.n

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(20, input_dim=input_dim, activation='relu'))  # 32
model.add(tf.keras.layers.Dense(10, activation='relu'))  # 10
model.add(tf.keras.layers.Dense(n_actions, activation='linear'))
model.compile(optimizer=tf.keras.optimizers.Adam(), loss='mse')
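For context, the linked tutorial wires the class in roughly like this (my own rough sketch; MODEL_NAME and the metric names are placeholders, not taken from the tutorial):

import time

MODEL_NAME = "dqn"  # placeholder name
tensorboard = ModifiedTensorBoard(
    name=MODEL_NAME,
    log_dir=f"logs/{MODEL_NAME}-{int(time.time())}",
)

# The tutorial passes callbacks=[tensorboard] to every model.fit() call, and at
# the end of each episode advances the counter and logs custom metrics, e.g.:
tensorboard.step = 1  # set to the current episode number
tensorboard.update_stats(reward_avg=0.0, reward_min=0.0, reward_max=0.0)  # placeholder values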
But I haven't been able to get it to work. If you have used TensorBoard before, do you know how to set this up? Any insight would be greatly appreciated.
Solution
I always use TensorBoard when training RL algorithms and have never needed a modified class like the one above. Just create your writer:
writer = tf.summary.create_file_writer(logdir=log_folder)
and start your code inside it:
with writer.as_default():
    ...  # do everything indented inside here
For example, if you want to save your reward or the weights of the first layer to TensorBoard every 100 steps, just do the following:
if step % 100 == 0:
    tf.summary.scalar(name="reward", data=reward, step=step)
    dqn_variable = model.trainable_variables
    tf.summary.histogram(name="dqn_variables", data=tf.convert_to_tensor(dqn_variable[0]), step=step)
    writer.flush()
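Putting the pieces together, here is a minimal, self-contained sketch of the same approach (the log folder, the small placeholder network, and the random reward are stand-ins for your own environment and training loop):

import tensorflow as tf
import numpy as np

log_folder = "logs/rl_run"  # placeholder log directory
writer = tf.summary.create_file_writer(logdir=log_folder)

# Placeholder network standing in for the model in the question
# (4 inputs and 2 actions chosen arbitrarily here).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(20, activation='relu'),
    tf.keras.layers.Dense(2, activation='linear'),
])

with writer.as_default():
    for step in range(1, 1001):
        # ... act in the environment, store transitions, train the model ...
        reward = float(np.random.rand())  # stand-in for the real reward

        if step % 100 == 0:
            tf.summary.scalar(name="reward", data=reward, step=step)
            dqn_variable = model.trainable_variables
            tf.summary.histogram(name="dqn_variables",
                                 data=tf.convert_to_tensor(dqn_variable[0]),
                                 step=step)
            writer.flush()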
That should do the trick :)