Generate all permutations, including abbreviations, with weights

Question

My string -

name_target = "ARUN GULABRAO INDULKAR"

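I want to generate all permutations using the original names and their abbreviations (initials), and assign a weight to each permutation -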

[ARUNGULABRAOINDULKAR,1]
[ARUNGINDULKAR,0.9]
[ARUNGULABRAOI,0.9]
[AGULABRAOINDULKAR,0.9]
[ARUNGI,0.8]
[AGINDULKAR,0.8]
[AGULABRAOI,0.8]
[ARUNINDULKARGULABRAO,1]
[ARUNIGULABRAO,0.9]
[ARUNINDULKARG,0.9]
[AINDULKARGULABRAO,0.9]
[ARUNIG,0.8]
[AIGULABRAO,0.8]
[AINDULKARG,0.8]
[GULABRAOARUNINDULKAR,1]
[GULABRAOAINDULKAR,0.9]
[GULABRAOARUNI,0.9]
[GARUNINDULKAR,0.9]
[GULABRAOAI,0.8]
[GAINDULKAR,0.8]
[GARUNI,0.8]
[GULABRAOINDULKARARUN,1]
[GULABRAOIARUN,0.9]
[GULABRAOINDULKARA,0.9]
[GINDULKARARUN,0.9]
[GULABRAOIA,0.8]
[GIARUN,0.8]
[GINDULKARA,0.8]
[INDULKARARUNGULABRAO,1]
[INDULKARAGULABRAO,0.9]
[INDULKARARUNG,0.9]
[IARUNGULABRAO,0.9]
[INDULKARAG,0.8]
[IAGULABRAO,0.8]
[IARUNG,0.8]
[INDULKARGULABRAOARUN,1]
[INDULKARGARUN,0.9]
[INDULKARGULABRAOA,0.9]
[IGULABRAOARUN,0.9]
[INDULKARGA,0.8]
[IGARUN,0.8]
[IGULABRAOA,0.8]

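I don't care about the output data structure; it can be anything. If every name is written in full, with no abbreviations, the weight is 1.

Each abbreviation reduces the weight by 0.1 (10% of the full weight). For example, ARUNGINDULKAR in the second output row gets 0.9 because the middle name is abbreviated. ARUNGI gets 0.8 because both the middle name and the last name are abbreviated.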
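The weighting rule above can be sketched as a small helper. The function name `weight`, the `penalty` parameter, and the list-of-booleans representation are assumptions made for illustration; the rounding is only there to keep floating-point noise out of the result.

```python
def weight(flags, penalty=0.1):
    """Return the weight for one variant.

    flags holds one bool per name part:
    True = written in full, False = abbreviated to its initial.
    """
    abbreviated = sum(1 for is_full in flags if not is_full)
    # Subtract a fixed penalty per abbreviated part (not a compounding 10%),
    # which is what the 1 / 0.9 / 0.8 values in the question imply.
    return round(1 - penalty * abbreviated, 10)


print(weight([True, True, True]))    # all parts in full
print(weight([True, False, True]))   # middle name abbreviated
print(weight([True, False, False]))  # middle and last name abbreviated
```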

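I have been able to use itertools.permutations(name_target) to generate the first set of permutations.

What I cannot work out is how to combine the abbreviations. name_target can contain a variable number of parts when split on ' '.

Please ignore any duplicates in the expected output.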

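Answer

You can use recursion with a generator to build the abbreviation combinations. itertools.permutations is also used to create every possible ordering of the original name parts, and each of these orderings is passed to get_combos, which generates the abbreviation combinations. A boolean flag (True for a full name, False for an abbreviation) is attached to each name component produced in get_combos, which allows the weight to be computed: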

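from itertools import permutations as prmt

def get_combos(d, l, c=[]):
    # d: name parts still to place, l: total number of parts,
    # c: (text, is_full) pairs chosen so far
    if d:
        # keep the next part in full
        yield from get_combos(d[1:], l, c + [(d[0], True)])
        # abbreviate it only if fewer than l parts end up abbreviated
        if sum(not b for _, b in c) + 1 < l:
            yield from get_combos(d[1:], l, c + [(d[0][0], False)])
    else:
        # join the parts; subtract 0.1 from the weight per abbreviation
        yield [''.join(a for a, _ in c), 1 - sum(0.1 for _, b in c if not b)]

name_target = "ARUN GULABRAO INDULKAR"
n = name_target.split()
l = len(n)
result = [i for b in prmt(n, l) for i in get_combos(b, l)]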

Output:

[['ARUNGULABRAOINDULKAR',1],['ARUNGULABRAOI',0.9],['ARUNGINDULKAR',0.9],['ARUNGI',0.8],['AGULABRAOINDULKAR',0.9],['AGULABRAOI',0.8],['AGINDULKAR',0.8],
 ['ARUNINDULKARGULABRAO',1],['ARUNINDULKARG',0.9],['ARUNIGULABRAO',0.9],['ARUNIG',0.8],['AINDULKARGULABRAO',0.9],['AINDULKARG',0.8],['AIGULABRAO',0.8],
 ['GULABRAOARUNINDULKAR',1],['GULABRAOARUNI',0.9],['GULABRAOAINDULKAR',0.9],['GULABRAOAI',0.8],['GARUNINDULKAR',0.9],['GARUNI',0.8],['GAINDULKAR',0.8],
 ['GULABRAOINDULKARARUN',1],['GULABRAOINDULKARA',0.9],['GULABRAOIARUN',0.9],['GULABRAOIA',0.8],['GINDULKARARUN',0.9],['GINDULKARA',0.8],['GIARUN',0.8],
 ['INDULKARARUNGULABRAO',1],['INDULKARARUNG',0.9],['INDULKARAGULABRAO',0.9],['INDULKARAG',0.8],['IARUNGULABRAO',0.9],['IARUNG',0.8],['IAGULABRAO',0.8],
 ['INDULKARGULABRAOARUN',1],['INDULKARGULABRAOA',0.9],['INDULKARGARUN',0.9],['INDULKARGA',0.8],['IGULABRAOARUN',0.9],['IGULABRAOA',0.8],['IGARUN',0.8]]
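For comparison, the same set of variants can probably also be produced without recursion by pairing each ordering with every full/abbreviated flag pattern from itertools.product. This is only a sketch of an alternative, not the approach used in the answer: the function name `weighted_variants`, the `penalty` parameter, and the choice to keep results in a dict (so a repeated string appears once, with its highest weight) are all assumptions.

```python
from itertools import permutations, product


def weighted_variants(name, penalty=0.1):
    """Map each joined name variant to its weight."""
    parts = name.split()
    variants = {}
    for order in permutations(parts):
        # One flag per part: True = full name, False = initial only.
        for flags in product((True, False), repeat=len(parts)):
            if not any(flags):
                continue  # skip the all-initials variant, as in the question
            text = ''.join(p if is_full else p[0]
                           for p, is_full in zip(order, flags))
            w = round(1 - penalty * flags.count(False), 10)
            # If the same string is produced twice, keep the higher weight.
            if w > variants.get(text, 0):
                variants[text] = w
    return variants


variants = weighted_variants("ARUN GULABRAO INDULKAR")
for text, w in variants.items():
    print(text, w)
```

Because the flag patterns are enumerated independently of the ordering, the number of parts in the name can vary without any change to the code.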
