Problem description
My string -
name_target = "ARUN GULABRAO INDULKAR"
[ARUNGULABRAOINDULKAR,1]
[ARUNGINDULKAR,0.9]
[ARUNGULABRAOI,0.9]
[AGULABRAOINDULKAR,0.9]
[ARUNGI,0.8]
[AGINDULKAR,0.8]
[AGULABRAOI,0.8]
[ARUNINDULKARGULABRAO,1]
[ARUNIGULABRAO,0.9]
[ARUNINDULKARG,0.9]
[AINDULKARGULABRAO,0.9]
[ARUNIG,0.8]
[AIGULABRAO,0.8]
[AINDULKARG,0.8]
[GULABRAOARUNINDULKAR,1]
[GULABRAOAINDULKAR,0.9]
[GULABRAOARUNI,0.9]
[GARUNINDULKAR,0.9]
[GULABRAOAI,0.8]
[GAINDULKAR,0.8]
[GARUNI,0.8]
[GULABRAOINDULKARARUN,1]
[GULABRAOIARUN,0.9]
[GULABRAOINDULKARA,0.9]
[GINDULKARARUN,0.9]
[GULABRAOIA,0.8]
[GIARUN,0.8]
[GINDULKARA,0.8]
[INDULKARARUNGULABRAO,1]
[INDULKARAGULABRAO,0.9]
[INDULKARARUNG,0.9]
[IARUNGULABRAO,0.9]
[INDULKARAG,0.8]
[IAGULABRAO,0.8]
[IARUNG,0.8]
[INDULKARGULABRAOARUN,1]
[INDULKARGARUN,0.9]
[INDULKARGULABRAOA,0.9]
[IGULABRAOARUN,0.9]
[INDULKARGA,0.8]
[IGARUN,0.8]
[IGULABRAOA,0.8]
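I don't care about the data structure of this output; it can be anything. If no abbreviation is used and every name appears in full, the weight is 1.
Each abbreviated name reduces the weight by 10% (0.1). For example, ARUNGINDULKAR in the second output line gets 0.9 because the middle name is abbreviated. ARUNGI gets 0.8 because both the middle name and the surname are abbreviated.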
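To make the weighting rule concrete, here is a minimal sketch. The helper name weight and the list of boolean flags are assumptions introduced for illustration; they are not part of the original code.

```python
def weight(flags):
    """flags: one bool per name part, True if written in full, False if abbreviated."""
    abbreviated = sum(1 for full in flags if not full)
    # round() guards against floating-point noise when subtracting multiples of 0.1
    return round(1 - 0.1 * abbreviated, 1)

print(weight([True, True, True]))    # ARUNGULABRAOINDULKAR: nothing abbreviated
print(weight([True, False, True]))   # ARUNGINDULKAR: middle name abbreviated
print(weight([True, False, False]))  # ARUNGI: middle name and surname abbreviated
```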
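I have used itertools.permutations(name_target) to generate the first set of permutations, where name_target is the string after being split on ' '.
I can't work out how to combine the abbreviations.
Please ignore any duplicates in the expected output.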
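For reference, the permutation step described above might look like the following sketch. The variable names parts and full_names are assumptions for illustration only.

```python
from itertools import permutations

name_target = "ARUN GULABRAO INDULKAR"
parts = name_target.split(" ")

# Each permutation is a tuple holding the three name parts in a different order;
# joining the tuple gives the full-name form that carries weight 1.
full_names = ["".join(p) for p in permutations(parts)]
print(full_names)
```

This yields the six full-name orderings; the abbreviated forms still have to be derived from each ordering.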
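Solution
You can use recursion with a generator to build the name-abbreviation combinations. itertools.permutations is also used to create every possible ordering of the original input name, and each of these full-name orderings is passed to get_combos, where the abbreviation combinations are generated. A boolean flag (True for a full name, False for an abbreviation) is associated with each name component produced in get_combos, which allows the weight to be computed: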
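from itertools import permutations as prmt

def get_combos(d, l, c=[]):
    # d: name parts still to place; l: total number of parts
    # c: (text, is_full) pairs chosen so far (never mutated, so the [] default is safe)
    if d:
        # keep the next part in full
        yield from get_combos(d[1:], l, c + [(d[0], True)])
        # abbreviate the next part to its initial, provided at least one part stays full
        if sum(not b for _, b in c) + 1 < l:
            yield from get_combos(d[1:], l, c + [(d[0][0], False)])
    else:
        # all parts placed: join them and subtract 0.1 per abbreviation
        yield [''.join(a for a, _ in c), 1 - sum(0.1 for _, b in c if not b)]

name_target = "ARUN GULABRAO INDULKAR"
n = name_target.split()
l = len(n)
result = [i for b in prmt(n, l) for i in get_combos(b, l)]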
Output:
[['ARUNGULABRAOINDULKAR',1],['ARUNGULABRAOI',0.9],['ARUNGINDULKAR',0.9],['ARUNGI',0.8],['AGULABRAOINDULKAR',0.9],['AGULABRAOI',0.8],['AGINDULKAR',0.8],
['ARUNINDULKARGULABRAO',1],['ARUNINDULKARG',0.9],['ARUNIGULABRAO',0.9],['ARUNIG',0.8],['AINDULKARGULABRAO',0.9],['AINDULKARG',0.8],['AIGULABRAO',0.8],
['GULABRAOARUNINDULKAR',1],['GULABRAOARUNI',0.9],['GULABRAOAINDULKAR',0.9],['GULABRAOAI',0.8],['GARUNINDULKAR',0.9],['GARUNI',0.8],['GAINDULKAR',0.8],
['GULABRAOINDULKARARUN',1],['GULABRAOINDULKARA',0.9],['GULABRAOIARUN',0.9],['GULABRAOIA',0.8],['GINDULKARARUN',0.9],['GINDULKARA',0.8],['GIARUN',0.8],
['INDULKARARUNGULABRAO',1],['INDULKARARUNG',0.9],['INDULKARAGULABRAO',0.9],['INDULKARAG',0.8],['IARUNGULABRAO',0.9],['IARUNG',0.8],['IAGULABRAO',0.8],
['INDULKARGULABRAOARUN',1],['INDULKARGULABRAOA',0.9],['INDULKARGARUN',0.9],['INDULKARGA',0.8],['IGULABRAOARUN',0.9],['IGULABRAOA',0.8],['IGARUN',0.8]]
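
As an alternative to recursion, the same combinations could probably be produced by pairing each ordering with every True/False flag pattern from itertools.product. This is only a sketch of a different technique, not the approach used above; the function name name_variants and the choice to store results in a dict (which also drops duplicate strings, keeping the highest weight) are assumptions for illustration.

```python
from itertools import permutations, product

def name_variants(name):
    parts = name.split()
    seen = {}
    for order in permutations(parts):
        # One flag per part: True keeps it in full, False abbreviates it to its initial.
        for flags in product([True, False], repeat=len(parts)):
            if not any(flags):
                continue  # skip the all-initials form so at least one part stays full
            text = "".join(p if full else p[0] for p, full in zip(order, flags))
            # round() guards against floating-point noise in the subtraction
            w = round(1 - 0.1 * flags.count(False), 1)
            # If the same string is produced twice, keep the highest weight.
            seen[text] = max(w, seen.get(text, 0))
    return seen

variants = name_variants("ARUN GULABRAO INDULKAR")
print(len(variants))
print(variants["ARUNGI"])
```

Using a dict keyed by the generated string is one way to handle names with repeated parts, where different orderings would otherwise produce identical entries.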