Problem description
I am currently implementing Q-learning with linear function approximation for the game Snake, but I can't seem to get it to work: the weights keep growing (in both the positive and the negative direction) and eventually become NaN, and I don't know why. Maybe my gradient descent implementation is wrong, or maybe there is a problem with my features. I'm not sure.
This is the code of my Q-learning agent with linear function approximation:
public class ApproxQAgent extends Agent {
    private float[] weights;
    private Function<Snake,Float>[] features;
    private Snake oldSnake;

    public ApproxQAgent(String path, Function<Snake,Float>... features) {
        super(path);
        this.features = features;
        this.weights = new float[features.length];
    }

    public float getFeature(Snake snake, int action, Function<Snake,Float> feature) {
        Snake newSnake = new Snake(snake);
        newSnake.takeAction(action);
        return feature.apply(newSnake);
    }

    @Override
    public void train(Snake snake, RLSnake rlSnake, float alpha, float gamma, float epsilon) {
        if (oldSnake != null) {
            float reward = rlSnake.reward;
            int newState = getState(snake);
            float qPlus = reward + gamma * getBestValue(snake);
            // learn weights
            for (int i = 0; i < weights.length; ++i) {
                weights[i] = weights[i] - alpha * (qPlus - getQValue(oldSnake, rlSnake.action) * getFeature(oldSnake, rlSnake.action, features[i]));
            }
            rlSnake.action = getEpsilonGreedyAction(snake, epsilon);
            rlSnake.state = newState;
        }
        oldSnake = new Snake(snake);
    }

    public int getEpsilonGreedyAction(Snake snake, float epsilon) {
        if (Math.random() < epsilon) {
            int[] actions = qStore.getActions();
            return actions[new Random().nextInt(actions.length)];
        }
        return getBestAction(snake);
    }

    @Override
    public int getBestAction(Snake snake) {
        float bestValue = Float.NEGATIVE_INFINITY;
        int bestAction = 0;
        for (int action : qStore.getActions()) {
            float value = getQValue(snake, action);
            if (value > bestValue) {
                bestValue = value;
                bestAction = action;
            }
        }
        return bestAction;
    }

    public float getBestValue(Snake snake) {
        float bestValue = Float.NEGATIVE_INFINITY;
        for (int action : qStore.getActions()) {
            float value = getQValue(snake, action);
            if (value > bestValue) {
                bestValue = value;
            }
        }
        return bestValue;
    }

    public float getQValue(Snake snake, int action) {
        float sum = 0;
        for (int i = 0; i < features.length; ++i) {
            sum += weights[i] * getFeature(snake, action, features[i]);
        }
        return sum;
    }
}
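For comparison, this is my understanding of the textbook semi-gradient update rule, w_i += alpha * (target - Q(s)) * f_i(s), as a minimal self-contained toy (states are just double[] here, and LinearQSketch with all its members are made-up names for illustration, not my game code):

```java
import java.util.function.Function;

// A minimal, self-contained sketch of the textbook semi-gradient update
//   w_i += alpha * (target - Q(s)) * f_i(s)
// State is a plain double[]; all names here are made up for illustration.
public class LinearQSketch {
    final double[] weights;
    final Function<double[], Double>[] features;

    @SafeVarargs
    LinearQSketch(Function<double[], Double>... features) {
        this.features = features;
        this.weights = new double[features.length];
    }

    double qValue(double[] state) {
        double sum = 0;
        for (int i = 0; i < features.length; ++i) {
            sum += weights[i] * features[i].apply(state);
        }
        return sum;
    }

    // One gradient step toward target (= reward + gamma * max_a Q(s', a)).
    void update(double[] state, double target, double alpha) {
        // The TD error is computed once, with all of Q inside the difference.
        double tdError = target - qValue(state);
        for (int i = 0; i < weights.length; ++i) {
            // Plus sign: move each weight in the direction that shrinks the error.
            weights[i] += alpha * tdError * features[i].apply(state);
        }
    }

    public static void main(String[] args) {
        LinearQSketch q = new LinearQSketch(st -> st[0], st -> st[1]);
        double[] state = {1.0, 2.0};
        // Repeatedly regress Q(state) toward a fixed target of 3.0.
        for (int step = 0; step < 200; ++step) {
            q.update(state, 3.0, 0.05);
        }
        System.out.println(q.qValue(state)); // converges toward 3.0
    }
}
```

With this sign and bracketing the toy converges; my real update above is where I suspect the problem lies.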
I am fairly sure the problem is not in the Agent base class.
These are my features:
Function<Snake,Float> f1 = s -> (float)s.tails.get(0).cx;
Function<Snake,Float> f2 = s -> (float)s.tails.get(0).cy;
Function<Snake,Float> f3 = s -> (float)(Game.getEcsManager().getEntityWithName("food").getComponentByType(Food.class).cx - s.tails.get(0).cx);
Function<Snake,Float> f4 = s -> (float)(Game.getEcsManager().getEntityWithName("food").getComponentByType(Food.class).cy - s.tails.get(0).cy);
Function<Snake,Float> f5 = s -> (float)s.tails.size();
Function<Snake,Float> f6 = s -> s.hasCollisionWithWall() ? 1.0f : 0.0f;
Function<Snake,Float> f7 = s -> s.hasCollisionWithTail() ? 1.0f : 0.0f;
Function<Snake,Float> f8 = s -> (float)s.tails.get(s.tails.size() - 1).cx;
Function<Snake,Float> f9 = s -> (float)s.tails.get(s.tails.size() - 1).cy;
Function<Snake,Float> f10 = s -> (float)s.vx;
Function<Snake,Float> f11 = s -> (float)s.vy;
I have already tried normalizing the features and normalizing the weights (by the largest weight), but either the weights still become NaN or the learning does not work at all.
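By "normalizing the features" I mean min-max scaling each raw value into [0, 1] before it enters the dot product, along these lines (the board width here is a hypothetical value, not from my game config):

```java
public class FeatureScaling {
    // Scale a raw value from [min, max] linearly into [0, 1].
    static double normalize(double raw, double min, double max) {
        return (raw - min) / (max - min);
    }

    public static void main(String[] args) {
        double boardWidth = 20.0; // hypothetical grid width
        double headX = 5.0;       // raw head x-coordinate
        System.out.println(normalize(headX, 0.0, boardWidth)); // prints 0.25
    }
}
```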
It would be great if you could help me.