问题描述
我一直在学习一些化学开发的算法,我遇到了遗传算法。因此,我编写了一个简单的 GA,尝试将一组给定符号(基因)中的目标字符串归零。
所以,
genes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMnopQRSTUVWXYZ 1234567890,.-;:_!\"#%&/()=?@${[]}"
和target = to be or not be that is the question
。
这是我的头文件:
#a chromosome in a genetic algorithm is a possible solution to the problem
import random
genes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMnop \
QRSTUVWXYZ 1234567890,.-;:_!\"#%&/()=?@${[]}"
class Individual:
#define some properties
def __init__(self,chromosome):
self.chromosome = chromosome #the actual solution
self.fitness = self.calc_fitness()
def mutated_genes(self):
#mutate the genes you have in an Individual
gene = random.choice(genes)
return gene
def create_gnome(self,target):
gnome_len = len(target)
return [self.mutated_genes() for _ in range(gnome_len)]
def mate(self,par2):
#mate with another individual
#child chromosome
child_chromosome = []
for gp1,gp2 in zip(self.chromosome,par2.chromosome):
#generate a random number
prob = random.random()
#if prob is less than 0.45,accept gene from parent 1
if prob<0.45:
child_chromosome.append(gp1)
elif prob < 0.9:
child_chromosome.append(gp2) #if between 0.45 and 0.9,accept gene from parent 2
else:
child_chromosome.append(self.mutated_genes())
return Individual(child_chromosome)
def calc_fitness(self):
#calculate a fitness score
#this is the number of characters in the string which
#match the target
fitness = 0
for gs,gt in zip(self.chromosome,target):
if gs!=gt:
fitness += 1
return fitness
这是我的驱动程序代码:
%run "string_search.ipynb"
population_size = 100
target = "to be or not to be that is the question"
generation = 1 #setting up generations to evolve
found = False #boolean
population = []
#generate a population
for i in range(population_size):
indiv = Individual([])
gnome = indiv.create_gnome(target)
population.append(Individual(gnome))
while not found:
#sort the population in increasing order of fitness score
population = sorted(population,key = lambda x:x.fitness)
#if the individual having lowest fitness score is 0,then we stop the search
if population[0].fitness == 0:
found = True
break
#otherwise,create a new generation
new_generation = []
#10% of the fittest population goes to the next generation
s = int(0.1*population_size)
new_generation.extend(population[:s])
#from 50% of fittest population,individuals will mate to produce offspring
s = int(0.9*population_size)
for i in range(s):
parent1 = random.choice(population[:50]) #choose some individual from the top 50%
parent2 = random.choice(population[:50]) #choose another individual from the top 50%
child = parent1.mate(parent2)
new_generation.append(child)
population = new_generation
generation += 1
str_chr = "".join(population[0].chromosome)
print("Generation: {} \t String: {} \t fitness: {}".format(generation,str_chr,population[0].fitness))
这是我的结果:
.
.
.
Generation: 14465 String: to be or not to be that is the 8uestion fitness: 1
Generation: 14466 String: to be or not to be that is the 8uestion fitness: 1
Generation: 14467 String: to be or not to be that is the 8uestion fitness: 1
Generation: 14468 String: to be or not to be that is the 8uestion fitness: 1
Generation: 14469 String: to be or not to be that is the 8uestion fitness: 1
Generation: 14470 String: to be or not to be that is the question fitness: 1
我有点着迷于该算法如何设法得出最多一个字符的字符串(question
是 8uestion
)。
我的问题是,为什么遗传算法的效果和它们一样好?
为什么在真正收敛到真正的解决方案之前,它们会在 fitness = 1
处停留很长时间?为什么增加种群规模会提高收敛速度?如果我通过适应度分数改变接受父母 1 而不是父母 2 的概率,算法会变得更好吗?
我一直试图理解这一点,但大多数博客只是简单地实现了一些代码和经验状态,这些代码和经验状态使人口越多,收敛时间越长。如果您对我有任何建议,我将不胜感激。
解决方法
在我看来,这项任务非常适合遗传算法,因为有明显的适应度、交叉和变异选择。您保留具有最正确字符的个体并重新组合它们,这使得最终得到非常相似的字符串的可能性非常高。每个字符独立地对适应度做出贡献,“基因”之间没有相互作用,这意味着每个字符都可以独立优化。
对于长期存在的最后一点健身损失,这是由于您实施的随机突变造成的。如果父母双方只有一个与目标不同的角色,那么后代很可能会与目标完全匹配。然而,每条新染色体的 10% 是随机生成的,可能会引入额外的损失。虽然突变对于探索至关重要,但您可能希望在优化过程接近尾声时降低突变率,至少对于一部分总体而言,以便更具剥削性和更少探索性。这可能会让您达到理想的 0
健康度。