Why do genetic algorithms work? When do they stop working?

Problem description

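I have been learning some algorithms used in chemical development, and I came across genetic algorithms. So I wrote a simple GA that tries to zero in on a target string, starting from a given set of symbols (genes).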

So, genes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMnopQRSTUVWXYZ 1234567890,.-;:_!\"#%&/()=?@${[]}" and target = "to be or not to be that is the question".

Here is my header file:

#a chromosome in a genetic algorithm is a possible solution to the problem

import random 

genes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMnop \
QRSTUVWXYZ 1234567890,.-;:_!\"#%&/()=?@${[]}"


class Individual: 
    #define some properties 
    def __init__(self,chromosome):
        self.chromosome = chromosome #the actual solution 
        self.fitness = self.calc_fitness() 
        
    def mutated_genes(self): 
        #mutate the genes you have in an Individual 
        gene = random.choice(genes) 
        return gene 
    
    def create_gnome(self,target):
        gnome_len = len(target) 
        return [self.mutated_genes() for _ in range(gnome_len)]
    
    def mate(self,par2):
        #mate with another individual 
        #child chromosome 
        child_chromosome = [] 
        for gp1,gp2 in zip(self.chromosome,par2.chromosome):
            #generate a random number 
            prob = random.random() 
            
            #if prob is less than 0.45,accept gene from parent 1 
            if prob<0.45:
                child_chromosome.append(gp1)
            elif prob < 0.9:
                child_chromosome.append(gp2) #if between 0.45 and 0.9,accept gene from parent 2
            else: 
                child_chromosome.append(self.mutated_genes()) 
                
        return Individual(child_chromosome)
                
    def calc_fitness(self):
        #calculate a fitness score
        #this is the number of characters in the string which 
        #differ from the target (lower is better, 0 is an exact match)
        fitness = 0
        for gs,gt in zip(self.chromosome,target):
            if gs!=gt:
                fitness += 1
        return fitness

Here is my driver code:

%run "string_search.ipynb"
population_size = 100 
target = "to be or not to be that is the question" 
generation = 1 #setting up generations to evolve  
found = False #boolean 
population = [] 
#generate a population 
for i in range(population_size):
    indiv = Individual([])
    gnome = indiv.create_gnome(target)
    population.append(Individual(gnome))
while not found: 
    #sort the population in increasing order of fitness score 
    population = sorted(population,key = lambda x:x.fitness)
    #if the individual having lowest fitness score is 0,then we stop the search
    if population[0].fitness == 0:
        found = True 

        break 
    #otherwise,create a new generation 
    new_generation = [] 
    #10% of the fittest population goes to the next generation 
    s = int(0.1*population_size)
    new_generation.extend(population[:s])
    
    #from 50% of fittest population,individuals will mate to produce offspring 
    s = int(0.9*population_size)
    for i in range(s):
        parent1 = random.choice(population[:50]) #choose some individual from the top 50%
        parent2 = random.choice(population[:50]) #choose another individual from the top 50%
        child = parent1.mate(parent2) 
        new_generation.append(child) 
    population = new_generation 
    generation += 1 
    str_chr = "".join(population[0].chromosome)
    print("Generation: {} \t String: {} \t fitness: {}".format(generation,str_chr,population[0].fitness))

Here are my results:

.
.
.
Generation: 14465    String: to be or not to be that is the 8uestion     fitness: 1
Generation: 14466    String: to be or not to be that is the 8uestion     fitness: 1
Generation: 14467    String: to be or not to be that is the 8uestion     fitness: 1
Generation: 14468    String: to be or not to be that is the 8uestion     fitness: 1
Generation: 14469    String: to be or not to be that is the 8uestion     fitness: 1
Generation: 14470    String: to be or not to be that is the question     fitness: 1

I am somewhat fascinated by how the algorithm manages to get the string right to within a single character (question comes out as 8uestion).

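My question is: why do genetic algorithms work as well as they do? Why do they sit at fitness = 1 for so long before actually converging to the true solution? Why does increasing the population size improve the rate of convergence? Would the algorithm do any better if I set the probability of accepting a gene from parent 1 rather than parent 2 according to the parents' fitness scores?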

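I have been trying to understand this, but most blogs simply implement some code and state empirically that the larger the population, the shorter the time to convergence. I would appreciate any suggestions you have for me.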

Solution

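It seems to me that this task is very well suited to a genetic algorithm, because there are obvious choices for fitness, crossover, and mutation. You keep the individuals with the most correct characters and recombine them, which makes it very likely that you end up with strings very similar to the target. Each character contributes to the fitness independently, with no interaction between "genes", which means every character can be optimized on its own.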
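The independence of the characters can be made concrete with a small sketch. It is not the asker's code: `position_errors`, `fitness`, and `best_of_both` are hypothetical helper names, the two parent strings are made up for illustration, and `best_of_both` is an idealised crossover that peeks at the target, which the real GA cannot do. The point is only that the fitness is a plain sum of per-position terms, so correct characters from different parents can be combined without interfering with each other.

```python
TARGET = "to be or not to be that is the question"

def position_errors(candidate, target=TARGET):
    """Return one 0/1 flag per position: 1 where the character is wrong."""
    return [int(c != t) for c, t in zip(candidate, target)]

def fitness(candidate, target=TARGET):
    """Total mismatches: a plain sum of the independent per-position flags."""
    return sum(position_errors(candidate, target))

def best_of_both(parent_a, parent_b, target=TARGET):
    """Idealised crossover: at each position keep parent_a's gene if it is
    correct, otherwise fall back to parent_b's gene."""
    return "".join(a if a == t else b
                   for a, b, t in zip(parent_a, parent_b, target))

# Two hypothetical parents, each wrong in a different position.
parent_a = "to be or not to be that is the 8uestion"
parent_b = "tx be or not to be that is the question"
child = best_of_both(parent_a, parent_b)

print(fitness(parent_a), fitness(parent_b), fitness(child))  # prints: 1 1 0
```

The random crossover in the question does not know which parent is right at a given position, so it only finds this combination some of the time, but selection keeps the children in which it did.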

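As for the last bit of fitness loss that persists for so long, it is caused by the random mutation you implemented. If both parents differ from the target in only one character, the offspring has a good chance of matching the target exactly. However, about 10% of each new chromosome is generated at random, which is likely to introduce additional loss. While mutation is essential for exploration, you may want to lower the mutation rate towards the end of the optimization, at least for part of the population, so that the search becomes more exploitative and less exploratory. That may get you to the desired fitness of 0.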
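A rough way to see the size of this effect, and one possible way to lower the mutation rate late in the run, is sketched below. It assumes the gene set and target exactly as written in the question. `clean_copy_probability`, `mutation_rate_for`, and the standalone `mate` function are hypothetical names, and the linear schedule with a floor of 0.01 is an arbitrary choice for illustration, not something the answer prescribes.

```python
import random

# Gene set and target as given in the question (single-line form of the string).
GENES = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMnopQRSTUVWXYZ 1234567890,.-;:_!\"#%&/()=?@${[]}"
TARGET = "to be or not to be that is the question"

def clean_copy_probability(mutation_rate, target=TARGET, genes=GENES):
    """Chance that a child of two perfect parents is still perfect.

    Each position is either inherited (always correct here) or replaced by a
    random gene, which is correct only if it happens to match the target.
    """
    p = 1.0
    for ch in target:
        p_lucky = genes.count(ch) / len(genes)
        p *= (1.0 - mutation_rate) + mutation_rate * p_lucky
    return p

def mutation_rate_for(best_fitness, base_rate=0.1, floor=0.01, target=TARGET):
    """Hypothetical schedule: shrink the rate as the best individual
    approaches the target, but never go below `floor`."""
    return max(floor, base_rate * best_fitness / len(target))

def mate(chrom1, chrom2, mutation_rate, rng=random):
    """Uniform crossover with an adjustable mutation rate.

    With mutation_rate=0.1 this reproduces the 0.45 / 0.45 / 0.10 split
    used in the question.
    """
    inherit_each = (1.0 - mutation_rate) / 2.0
    child = []
    for g1, g2 in zip(chrom1, chrom2):
        prob = rng.random()
        if prob < inherit_each:
            child.append(g1)            # gene from parent 1
        elif prob < 2.0 * inherit_each:
            child.append(g2)            # gene from parent 2
        else:
            child.append(rng.choice(GENES))  # random mutation
    return child

print(clean_copy_probability(0.10))  # survival chance at the original rate
print(clean_copy_probability(0.01))  # survival chance at a reduced rate
print(mutation_rate_for(best_fitness=1))
```

Under these assumptions, a child of two already perfect parents stays perfect only a couple of percent of the time at a 10% mutation rate, against well over half the time at 1%. Passing `mutation_rate_for(population[0].fitness)` into `mate` each generation would be one way to try the suggestion; whether it actually shortens the run would have to be checked experimentally.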
