Problem description
I am currently implementing Q-learning with linear function approximation for the game Snake, but I can't seem to get it to work: the weights keep growing (in both the positive and the negative direction) and eventually become NaN, and I don't know why. Maybe my gradient descent implementation is wrong, or maybe there is a problem with my features. I'm not sure.
This is the code of my Q-learning agent with linear function approximation:
public class ApproxQAgent extends Agent {
    private float[] weights;
    private Function<Snake,Float>[] features;
    private Snake oldSnake;

    public ApproxQAgent(String path, Function<Snake,Float>... features) {
        super(path);
        this.features = features;
        this.weights = new float[features.length];
    }

    public float getFeature(Snake snake, int action, Function<Snake,Float> feature) {
        Snake newSnake = new Snake(snake);
        newSnake.takeAction(action);
        return feature.apply(newSnake);
    }

    @Override
    public void train(Snake snake, RLSnake rlSnake, float alpha, float gamma, float epsilon) {
        if (oldSnake != null) {
            float reward = rlSnake.reward;
            int newState = getState(snake);
            float qPlus = reward + gamma * getBestValue(snake);
            // learn weights
            for (int i = 0; i < weights.length; ++i) {
                weights[i] = weights[i] - alpha * (qPlus - getQValue(oldSnake, rlSnake.action) * getFeature(oldSnake, rlSnake.action, features[i]));
            }
            rlSnake.action = getEpsilonGreedyAction(snake, epsilon);
            rlSnake.state = newState;
        }
        oldSnake = new Snake(snake);
    }

    public int getEpsilonGreedyAction(Snake snake, float epsilon) {
        if (Math.random() < epsilon) {
            int[] actions = qStore.getActions();
            return actions[new Random().nextInt(actions.length)];
        }
        return getBestAction(snake);
    }

    @Override
    public int getBestAction(Snake snake) {
        float bestValue = Float.NEGATIVE_INFINITY;
        int bestAction = 0;
        for (int action : qStore.getActions()) {
            float value = getQValue(snake, action);
            if (value > bestValue) {
                bestValue = value;
                bestAction = action;
            }
        }
        return bestAction;
    }

    public float getBestValue(Snake snake) {
        float bestValue = Float.NEGATIVE_INFINITY;
        for (int action : qStore.getActions()) {
            float value = getQValue(snake, action);
            if (value > bestValue) {
                bestValue = value;
            }
        }
        return bestValue;
    }

    public float getQValue(Snake snake, int action) {
        float sum = 0;
        for (int i = 0; i < features.length; ++i) {
            sum += weights[i] * getFeature(snake, action, features[i]);
        }
        return sum;
    }
}
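For comparison, this is my understanding of the textbook semi-gradient update rule, w_i += alpha * (target - Q(s)) * f_i(s), as a minimal self-contained toy (states are just double[] here, and LinearQSketch with all its members are made-up names for illustration, not my game code):

```java
import java.util.function.Function;

// A minimal, self-contained sketch of the textbook semi-gradient update
//   w_i += alpha * (target - Q(s)) * f_i(s)
// State is a plain double[]; all names here are made up for illustration.
public class LinearQSketch {
    final double[] weights;
    final Function<double[], Double>[] features;

    @SafeVarargs
    LinearQSketch(Function<double[], Double>... features) {
        this.features = features;
        this.weights = new double[features.length];
    }

    double qValue(double[] state) {
        double sum = 0;
        for (int i = 0; i < features.length; ++i) {
            sum += weights[i] * features[i].apply(state);
        }
        return sum;
    }

    // One gradient step toward target (= reward + gamma * max_a Q(s', a)).
    void update(double[] state, double target, double alpha) {
        // The TD error is computed once, with all of Q inside the difference.
        double tdError = target - qValue(state);
        for (int i = 0; i < weights.length; ++i) {
            // Plus sign: move each weight in the direction that shrinks the error.
            weights[i] += alpha * tdError * features[i].apply(state);
        }
    }

    public static void main(String[] args) {
        LinearQSketch q = new LinearQSketch(st -> st[0], st -> st[1]);
        double[] state = {1.0, 2.0};
        // Repeatedly regress Q(state) toward a fixed target of 3.0.
        for (int step = 0; step < 200; ++step) {
            q.update(state, 3.0, 0.05);
        }
        System.out.println(q.qValue(state)); // converges toward 3.0
    }
}
```

With this sign and bracketing the toy converges; my real update above is where I suspect the problem lies.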
I am fairly sure the problem is not in the Agent base class.
These are my features:
Function<Snake,Float> f1 = s -> (float)s.tails.get(0).cx;
Function<Snake,Float> f2 = s -> (float)s.tails.get(0).cy;
Function<Snake,Float> f3 = s -> (float)(Game.getEcsManager().getEntityWithName("food").getComponentByType(Food.class).cx - s.tails.get(0).cx);
Function<Snake,Float> f4 = s -> (float)(Game.getEcsManager().getEntityWithName("food").getComponentByType(Food.class).cy - s.tails.get(0).cy);
Function<Snake,Float> f5 = s -> (float)s.tails.size();
Function<Snake,Float> f6 = s -> s.hasCollisionWithWall() ? 1.0f : 0.0f;
Function<Snake,Float> f7 = s -> s.hasCollisionWithTail() ? 1.0f : 0.0f;
Function<Snake,Float> f8 = s -> (float)s.tails.get(s.tails.size() - 1).cx;
Function<Snake,Float> f9 = s -> (float)s.tails.get(s.tails.size() - 1).cy;
Function<Snake,Float> f10 = s -> (float)s.vx;
Function<Snake,Float> f11 = s -> (float)s.vy;
I have already tried normalizing the features and normalizing the weights (by the largest weight), but either the weights still become NaN or the learning does not work at all.
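By "normalizing the features" I mean min-max scaling each raw value into [0, 1] before it enters the dot product, along these lines (the board width here is a hypothetical value, not from my game config):

```java
public class FeatureScaling {
    // Scale a raw value from [min, max] linearly into [0, 1].
    static double normalize(double raw, double min, double max) {
        return (raw - min) / (max - min);
    }

    public static void main(String[] args) {
        double boardWidth = 20.0; // hypothetical grid width
        double headX = 5.0;       // raw head x-coordinate
        System.out.println(normalize(headX, 0.0, boardWidth)); // prints 0.25
    }
}
```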
It would be great if you could help me.