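Genetic algorithm implementation for a self-balancing robot

Problem description

I built a self-balancing robot in Webots that learns to balance itself with the help of a genetic algorithm. The setup works well, but I wrote it entirely on my own, so I'm not sure how good a genetic algorithm it is. Also note that some of the terms I use here may not be standard; please tell me if I've used any of them incorrectly.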

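Program flow

(1) Create the population

I'm using an LQR controller to balance the robot. It has four parameters, so each individual in my population is a list of four randomly generated parameters. To narrow the search space, I added bounds to the parameters; I got a rough idea of those bounds by trying to balance the robot manually.

Code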

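import random

def population_create(p_size,geno_size,bounds):
    # build p_size individuals, each a list of geno_size bounded random parameters
    p = [ random_param(geno_size,bounds) for i in range(p_size)]
    return p
    
def random_param(g,b):
    # b[i] holds the (lower, upper) bound of parameter i
    return [random.uniform(b[i][0],b[i][1]) for i in range(g)]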
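
To make the shape of bounds concrete, here is a small usage sketch. The bound values and the name BOUNDS are made up for illustration; they are not the ones from my scripts.

```python
import random

def random_param(g, b):
    # One individual: g parameters, each drawn uniformly from its own bounds
    return [random.uniform(b[i][0], b[i][1]) for i in range(g)]

def population_create(p_size, geno_size, bounds):
    return [random_param(geno_size, bounds) for _ in range(p_size)]

# Hypothetical (lower, upper) bounds, one pair per LQR parameter
BOUNDS = [(0.0, 10.0), (-5.0, 0.0), (100.0, 200.0), (500.0, 700.0)]

random.seed(0)  # fixed seed only so the sketch is repeatable
population = population_create(10, 4, BOUNDS)
print(len(population), len(population[0]))
```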

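(2) Send the parameters to the robot and evaluate fitness

I then pass these random parameters to the robot and see how well it balances over about 60 seconds. Fitness is computed by summing the robot's angle and measuring the displacement of the robot and of the load from their initial positions.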

def getPerformanceData():
    global init_translation,init_rotation,load_init_translation,load_init_rotation
    emitter.send("return_fitness".encode('utf-8'))
    while superv.step(timestep) != -1: 
 
        if reciever.getQueueLength()>0:
            message = reciever.getData().decode('utf-8')
            reciever.nextPacket()
            angle_fitness = float(message)
 
            load_translation = load.getField("translation").getSFVec3f()
            load_rotation = load.getField("rotation").getSFRotation()
            load_t_cost = sum([(i1-i2)**2 for i1,i2 in zip(load_translation,load_init_translation)])
            load_r_cost = sum([(i1-i2)**2 for i1,i2 in zip(load_rotation,load_init_rotation)])
            
 
            sbr_translation = sbr.getField("translation").getSFVec3f()
            sbr_rotation = sbr.getField("rotation").getSFRotation()
            sbr_t_cost = sum([(i1-i2)**2 for i1,i2 in zip(sbr_translation,init_translation)])
            sbr_r_cost = sum([(i1-i2)**2 for i1,i2 in zip(sbr_rotation,init_rotation)])
            #print("Angle fitness - ",angle_fitness)
            #print("Load fitness - ",(load_r_cost+load_t_cost))
            #print("Robot T fitness ",(sbr_r_cost+sbr_t_cost))
            return angle_fitness+((load_r_cost+load_t_cost)+(sbr_r_cost+sbr_t_cost))*30
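
Stripped of the Webots calls, the value returned above is the angle term plus 30 times the summed squared drift of the load and the robot. The sketch below restates that as two pure functions; squared_distance, total_cost and the sample numbers are illustrative names and values, not part of my scripts. Because the population is later sorted in ascending order and the head is kept as elite, the value behaves as a cost: lower is better.

```python
def squared_distance(current, initial):
    """Sum of squared component-wise differences between two vectors."""
    return sum((c - i) ** 2 for c, i in zip(current, initial))

def total_cost(angle_cost, pose_pairs, weight=30):
    """Angle term plus weighted drift over (current, initial) vector pairs."""
    drift = sum(squared_distance(now, init) for now, init in pose_pairs)
    return angle_cost + weight * drift

# Made-up readings: the load moved 0.1 on one axis, the robot 0.2 on another
pairs = [
    ([0.0, 0.1, 0.0], [0.0, 0.0, 0.0]),  # load translation
    ([0.2, 0.0, 0.0], [0.0, 0.0, 0.0]),  # robot translation
]
cost = total_cost(0.5, pairs)
print(cost)
```

One caveat with this kind of term: the rotation fields are compared component by component, which is only a rough proxy for how far the orientation has actually changed.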
   


def evaluate_genotype(genotype):
    #test_genotype = [6.70891752445785,-2.984975676757869,148.50048150101875,655.0303108723926]
    # send genotype to robot
    send_genotype(genotype)
    
    # run for some time
    run_seconds(60)
    #store fitness
    fitness = getPerformanceData()
    #print("Supervisor:fitness of ",genotype," - %f "%(fitness))
   
    # reset physics and restore the robot's position
    sbr.resetPhysics()
    restore_robot_position()

    # run for 5 more seconds
    run_seconds(5,True)
    
    # reset physics and restore the position once more
    sbr.resetPhysics()
    restore_robot_position()
    
    return fitness

 

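(3) Crossover, mutation and creating the new population

I then sort the parameters by fitness. The good individuals are passed on to the next generation unchanged. I then apply crossover and mutation for the remaining individuals. Mutation is done so that the parameters do not go out of their own domain.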

def population_reproduce(p,fitness):

    size_p = len(p)
    new_p = []
    
    dataframe = pd.DataFrame({"Param":p,"fitness":fitness})
    dataframe = dataframe.sort_values(['fitness'])
    dataframe = dataframe.reset_index(drop=True)    

    sorted_p = dataframe['Param'].tolist()

    elite_part = round(ELITE_PART*size_p)
    new_p = new_p + sorted_p[:elite_part]

    for i in range(size_p-elite_part):
        mom = p[random.randint(0,size_p-1)]
        dad = p[random.randint(0,size_p-1)]
        child = crossover(mom,dad)
        child = mutate(child)
        new_p.append(child)
    
    return new_p
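
In the loop above, mom and dad are drawn uniformly from the whole population, so fitness only influences the next generation through the elite copies. One common alternative is tournament selection. The following is a minimal sketch of it, assuming lower fitness is better as in the sort above; tournament_select and its parameter k are names introduced here for illustration.

```python
import random

def tournament_select(population, fitness, k=3, rng=random):
    """Pick k distinct individuals at random and return the one with the lowest cost."""
    contestants = rng.sample(range(len(population)), k)
    best = min(contestants, key=lambda idx: fitness[idx])
    return population[best]

# Toy data: four one-parameter individuals and their costs
pop = [[1.0], [2.0], [3.0], [4.0]]
fit = [4.0, 0.5, 3.0, 2.0]
parent = tournament_select(pop, fit, k=2)
print(parent)
```

A larger k increases selection pressure; with k equal to the population size the best individual is always chosen.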

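def crossover(p1,p2):
    # uniform crossover: each gene is copied from one of the two parents
    child = []
    # randint(0,8) yields 0..8 inclusive, so a gene comes from p2 with probability 4/9
    loci = [random.randint(0,8) for _ in range(len(p1))]
    
    for i in range(len(p1)):
        if loci[i]>4:
            child.append(p2[i])
        else:
            child.append(p1[i])
        
    return child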

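def mutate(c):
    # add Gaussian noise to each gene with probability MUTATION_PROBABILITY
    size = len(c)
    for i in range(size):
        if random.random()< MUTATION_PROBABILITY:
            if i==0:
                c[0] += random.gauss(0,2)
            elif i==1:
                c[1] += random.gauss(0,1)
            else:
                # the last two parameters are much larger, so use a bigger step
                c[i] += random.gauss(0,2)*10
            
    return c   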
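
The mutate function above adds noise but does not itself clamp the result, so a mutated value can leave the bounds used when the population was created. A minimal sketch of one way to keep parameters inside their domain is to clip after mutating. The function name mutate_bounded, the sigmas argument and the sample numbers are assumptions made for this example.

```python
import random

def mutate_bounded(genotype, bounds, sigmas, probability, rng=random):
    """Return a mutated copy of genotype, with every gene clipped to its bounds."""
    child = list(genotype)  # copy, so the parent is left untouched
    for i, (low, high) in enumerate(bounds):
        if rng.random() < probability:
            child[i] += rng.gauss(0, sigmas[i])
            child[i] = min(max(child[i], low), high)  # clip into [low, high]
    return child

# Hypothetical bounds and per-gene step sizes
bounds = [(0.0, 10.0), (-5.0, 0.0), (100.0, 200.0), (500.0, 700.0)]
sigmas = [2.0, 1.0, 20.0, 20.0]
parent = [6.4, -2.6, 117.5, 630.0]
child = mutate_bounded(parent, bounds, sigmas, probability=0.5)
print(child)
```

Clipping tends to pile values up on the bounds; redrawing the noise until the value lands inside the range is another option.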
    

(4) Repeat steps (2) and (3) a predetermined number of times
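
Put together, steps (1) to (4) form a loop like the sketch below. It is not my supervisor script: the 60-second simulation is replaced by a made-up toy_cost function (squared distance to a hypothetical target), and evolve, the bounds, the target and the default settings are all assumptions chosen so the example runs on its own.

```python
import random

def toy_cost(genotype, target):
    # Stand-in for the simulation run: squared distance to a made-up target
    return sum((g - t) ** 2 for g, t in zip(genotype, target))

def evolve(bounds, target, pop_size=20, generations=15, elite_part=0.2,
           mutation_probability=0.3, seed=0):
    rng = random.Random(seed)
    # (1) create the population inside the bounds
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    n_elite = max(1, round(elite_part * pop_size))
    best_costs = []
    for _ in range(generations):
        # (2) evaluate and rank, lowest cost first
        ranked = sorted(population, key=lambda g: toy_cost(g, target))
        best_costs.append(toy_cost(ranked[0], target))
        # (3) elites pass unchanged, the rest come from crossover and mutation
        new_population = [list(g) for g in ranked[:n_elite]]
        while len(new_population) < pop_size:
            mom = min(rng.sample(ranked, 3), key=lambda g: toy_cost(g, target))
            dad = min(rng.sample(ranked, 3), key=lambda g: toy_cost(g, target))
            child = [m if rng.random() < 0.5 else d for m, d in zip(mom, dad)]
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_probability:
                    step = rng.gauss(0, 0.05 * (hi - lo))
                    child[i] = min(max(child[i] + step, lo), hi)
            new_population.append(child)
        population = new_population  # (4) repeat with the new generation
    best = min(population, key=lambda g: toy_cost(g, target))
    return best_costs, best

bounds = [(0.0, 10.0), (-5.0, 0.0), (100.0, 200.0), (500.0, 700.0)]
target = [6.0, -3.0, 150.0, 600.0]  # hypothetical optimum
costs, best = evolve(bounds, target)
print(costs[0], costs[-1])
```

Because the toy cost is deterministic and the elites are copied unchanged, the best cost recorded per generation can never get worse in this sketch.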

Here are some of the outputs I got

Generation 1 
Best fitness  [6.438820290364836,-2.6048039057927954,117.55012307994608,630.0038471538783] 
Best fitness Value - 0.934142 
Average fitness - 3943375.740127

Generation 2 Robot: 
Best fitness  [6.091447760688381,-3.012376769204556,140.9627885511517,639.3500316914635] 
Best fitness Value - 69.073272 
Average fitness - 1858776.135231

Generation 3 
Best fitness  [6.091447760688381,163.4547194017456,569.1154953081709] 
Best fitness Value - 5.653302
Average fitness - 1248849.999916

Generation 4 Best fitness  [4.731672906621537,-3.088996850105585,150.1271801370339,537.8098244643127] 
Best fitness Value - 505.242039 
Average fitness - 1454077.404463

Generation 5 
Best fitness  [4.731672906621537,129.0944277341561,536.0589859999795] 
Best fitness Value - 22.842133 
Average fitness - 1037094.178346

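Every time I train, I get a working solution.

I would like to know

  1. Is this a correct implementation of a genetic algorithm?
  2. In what ways can I improve this implementation?

Below are links to the full scripts and, if you are interested, a video showing how it works.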

Population Script

Supervisor

Robot Controller

Video
