ValueError: operands could not be broadcast together with shapes - Keras

Problem description

I am training an agent using demonstrations from another agent, provided as (state, action, reward, next_state) tuples. I am using Keras and Sklearn.

This is how the Q-learning part works:

def q_learning_model():
    NUM_STATES = len(states)
    NUM_ACTIONS = 4
    GAMMA = 0.99

    model_in = tf.keras.layers.Input(shape=(1,),dtype=tf.int32)
    tmp = tf.one_hot(model_in,NUM_STATES)
    tmp = tf.keras.layers.Dense(NUM_ACTIONS,use_bias=False)(tmp)
    model_out = tf.squeeze(tmp,axis=1)
    q_function = tf.keras.Model(model_in,model_out)

    state = tf.keras.layers.Input(shape=(1,),dtype=tf.int32,name="State")
    action = tf.keras.layers.Input(shape=(1,),name="Action")
    reward = tf.keras.layers.Input(shape=(1,),name="Reward")
    next_state = tf.keras.layers.Input(shape=(1,),name="Next_State")

    td_target = reward + GAMMA * tf.reduce_max(q_function(next_state),axis=-1)
    predictions = tf.gather(q_function(state),action,axis=-1)
    train_model = tf.keras.Model(
        inputs=[state,action,reward,next_state],outputs=[predictions,td_target]
    )

    # to date it still feels as if tf.stop_gradient is a horrible
    # hack similar to DDQL to stabilize the algorithm
    td_error = 0.5 * tf.abs(tf.stop_gradient(td_target) - predictions) ** 2
    train_model.add_loss(td_error,[state,next_state])

    predicted_action = tf.argmax(q_function(state),axis=-1)
    correct_predictions = tf.keras.metrics.categorical_accuracy(
        action,predicted_action)
    train_model.add_metric(correct_predictions,name="Matched_Actions",aggregation="mean")

    return q_function,train_model
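
For reference, the loss wired up above is the standard one-step TD error. The following small NumPy sketch uses made-up numbers (not the real data) to show the quantities the graph computes for a single (state, action, reward, next_state) transition:

import numpy as np

GAMMA = 0.99
q_next = np.array([0.1, 0.5, -0.2, 0.3])   # hypothetical Q(next_state, .) row
q_state = np.array([0.4, 0.0, 0.2, 0.1])   # hypothetical Q(state, .) row
action, reward = 2, 1.0

td_target = reward + GAMMA * q_next.max()             # reward + GAMMA * max_a' Q(s', a')
prediction = q_state[action]                          # Q(s, a)
td_error = 0.5 * np.abs(td_target - prediction) ** 2  # value passed to add_loss
print(td_target, prediction, td_error)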

In the main function, I load an external data file as follows:

states,actions,rewards,next_states = load_data("data.csv")
indices = np.arange(len(states))
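
load_data is not shown here; it is assumed to return four equal-length 1-D NumPy arrays, one entry per (state, action, reward, next_state) transition. A minimal sketch of such a loader (the column order in data.csv is an assumption):

import numpy as np

def load_data(path):
    # assumed layout: one transition per row -> state, action, reward, next_state
    data = np.loadtxt(path, delimiter=",")
    states = data[:, 0].astype(np.int32)
    actions = data[:, 1].astype(np.int32)
    rewards = data[:, 2].astype(np.float32)
    next_states = data[:, 3].astype(np.int32)
    return states, actions, rewards, next_states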

Then I train my agent:

q_scores = list()
policy_scores = list()
for train_idx,test_idx in KFold(shuffle=True).split(indices):
    train_data = [
        states[train_idx,...],actions[train_idx,...],
        rewards[train_idx,...],next_states[train_idx,...]]
    test_data = [
        states[test_idx,...],actions[test_idx,...],
        rewards[test_idx,...],next_states[test_idx,...]]
    
    q_function,train_q = q_learning_model()
    del q_function
    train_q.compile(optimizer="sgd",experimental_run_tf_function=False)
    train_q.fit(train_data)

    _,score = train_q.evaluate(test_data)
    q_scores.append(score)

    policy_fn,train_policy = q_learning_model()
    del policy_fn
    train_policy.compile(optimizer="sgd",experimental_run_tf_function=False)
    train_policy.fit(train_data)
    _,score = train_policy.evaluate(test_data)
    policy_scores.append(score)
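
For completeness, this is how the KFold split above produces the index arrays. A standalone toy example (random_state is only added here to make the printed output reproducible):

import numpy as np
from sklearn.model_selection import KFold

indices = np.arange(10)
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(indices):
    # each fold yields disjoint train/test index arrays covering all samples
    print("train:", train_idx, "test:", test_idx)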

Everything seems to run, but then I get the following error:

self.results[0] += batch_outs[0] * (batch_end - batch_start)
ValueError: operands could not be broadcast together with shapes (32,32,32) (3,3,3) (32,32)

even though the shapes of my train_data (state, action, reward, next_state) are as follows:

train_data[0].shape -> (1123,)
train_data[1].shape -> (1123,)
train_data[2].shape -> (1123,)
train_data[3].shape -> (1123,)
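
One way to narrow down where the mismatched shapes come from is to print the static shapes Keras will try to aggregate, before calling fit/evaluate. A diagnostic sketch (not a fix), assuming the model builds without error:

q_function, train_q = q_learning_model()
print([t.shape for t in train_q.outputs])  # shapes of predictions and td_target
print([t.shape for t in train_q.losses])   # shape of the tensor passed to add_loss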

Let me know if you have run into similar problems and how you solved them. If you spot other errors in the code, feel free to reply as well.

Thanks for your time and support.

Solution

No working solution for this problem has been found yet.
