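Problem description
I have a nested list that I need to chain, run metrics on, and then "unchain" back into its original nested format. Here is some example data to illustrate: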
from itertools import chain
nested_list = [['x','xx','xxx'],['yy','yyy','y','yyyy'],['zz','z']]
chained_list = list(chain(*nested_list))
print("chained_list: \n",chained_list)
metrics_list = [str(chained_list[x]) +'_score' \
for x in range(len(chained_list))]
print("metrics_list: \n",metrics_list)
zipped_scores = list(zip(chained_list,metrics_list))
print("zipped_scores: \n",zipped_scores)
unchain_function = '????'
chained_list:
['x','xx','xxx','yy','yyy','y','yyyy','zz','z']
metrics_list:
['x_score','xx_score','xxx_score','yy_score','yyy_score','y_score','yyyy_score','zz_score','z_score']
zipped_scores:
[('x','x_score'),('xx','xx_score'),('xxx','xxx_score'),('yy','yy_score'),('yyy','yyy_score'),('y','y_score'),('yyyy','yyyy_score'),('zz','zz_score'),('z','z_score')]
Is there a Python function or a pythonic way to write "unchain_function" so that it produces this desired output?
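[
[
('x','x_score'),('xx','xx_score'),('xxx','xxx_score')
],[
('yy','yy_score'),('yyy','yyy_score'),('y','y_score'),('yyyy','yyyy_score')
],[
('zz','zz_score'),('z','z_score')
]
]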
(Background: this is for running metrics on lists with more than 100,000 elements.)
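Solution
I don't know how pythonic this is, but it should work. Long story short, we use a Wrapper class to turn immutable primitives (which cannot be changed without being replaced) into mutable variables (so that we can hold several references to the same variable, each organized differently).
We create an identical nested list, except that each value is a Wrapper of the corresponding value in the original list. We then apply the same transformation (chaining) to the list of wrappers. The changes in the processed chained list are copied onto the chained list of wrappers, and those changes are then read back through the nested list of wrappers and unwrapped.
I think an explicit, simple class named Wrapper is easier to understand, but you could do essentially the same thing by using singleton lists to hold the values instead of instances of Wrapper.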
from itertools import chain
nested_list = [['x','xx','xxx'],['yy','yyy','y','yyyy'],['zz','z']]
chained_list = list(chain(*nested_list))
metrics_list = [str(chained_list[x]) +'_score' for x in range(len(chained_list))]
zipped_scores = list(zip(chained_list,metrics_list))
# Create a simple Wrapper class, so we can essentially have a mutable primitive.
# We can put the Wrapper into two different lists, and modify its value without
# overwriting it.
class Wrapper:
    def __init__(self, value):
        self.value = value
# Create a 'duplicate list' of the nested and chained lists, respectively,
# such that each element of these lists is a Wrapper of the corresponding
# element in the above lists.
nested_wrappers = [[Wrapper(elem) for elem in sublist] for sublist in nested_list]
chained_wrappers = list(chain(*nested_wrappers))
# Now we have two references to the same MUTABLE Wrapper for each element of
# the original lists - one nested, and one chained. If we change a property
# of the chained Wrapper, the change will reflect on the corresponding nested
# Wrapper. Copy the changes from the zipped scores onto the chained wrappers.
for score, wrapper in zip(zipped_scores, chained_wrappers):
    wrapper.value = score
# Then extract the values in the unchained list of the same wrappers, thus
# preserving both the changes and the original nested organization.
unchained_list = [[wrapper.value for wrapper in sublist] for sublist in nested_wrappers]
This ends with unchained_list equal to the following:
[[('x','x_score'),('xx','xx_score'),('xxx','xxx_score')],[('yy','yy_score'),('yyy','yyy_score'),('y','y_score'),('yyyy','yyyy_score')],[('zz','zz_score'),('z','z_score')]]
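The singleton-list alternative mentioned in this answer could look roughly like the sketch below. It is only an illustration: the names nested_boxes and chained_boxes are made up here, and the '_score' suffix stands in for whatever the real metric is.

```python
from itertools import chain

nested_list = [['x', 'xx', 'xxx'], ['yy', 'yyy', 'y', 'yyyy'], ['zz', 'z']]

# Box each value in a one-element list; the box plays the role of Wrapper.
# (nested_boxes / chained_boxes are hypothetical names for this sketch.)
nested_boxes = [[[elem] for elem in sublist] for sublist in nested_list]

# Chaining one level deep yields the very same box objects, not copies.
chained_boxes = list(chain.from_iterable(nested_boxes))

# Stand-in for the real metric: pair each value with '<value>_score'.
for box in chained_boxes:
    box[0] = (box[0], box[0] + '_score')

# Unbox through the nested view to recover the original shape.
unchained_list = [[box[0] for box in sublist] for sublist in nested_boxes]
print(unchained_list)
```

Because both views share the same inner lists, the update made through the chained view is visible through the nested one, which is the same idea as the Wrapper class with one less definition.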
I think you just want to group the data by some condition, namely the first letter of the first item in each tuple.
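Given
Your flattened, zipped data:
data = [
('x','x_score'),('xx','xx_score'),('xxx','xxx_score'),('yy','yy_score'),('yyy','yyy_score'),('y','y_score'),('yyyy','yyyy_score'),('zz','zz_score'),('z','z_score')
]
Code
import itertools
[list(g) for _,g in itertools.groupby(data,key=lambda x: x[0][0])]
Output
[[('x','x_score'),('xx','xx_score'),('xxx','xxx_score')],[('yy','yy_score'),('yyy','yyy_score'),('y','y_score'),('yyyy','yyyy_score')],[('zz','zz_score'),('z','z_score')]]
See also
- This post on how this tool works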
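Grouping by first letter works for this sample, but it depends on the values themselves, and groupby only merges neighbouring items that share a key. If real data cannot be told apart that way, a different technique is to regroup by the lengths of the original sublists. The sketch below assumes the chained result keeps the original order; unchain is a hypothetical helper name, not a library function.

```python
from itertools import chain, islice

def unchain(flat, template):
    """Regroup `flat` into sublists with the same lengths as those in `template`.

    Hypothetical helper: assumes `flat` has one item per element of `template`,
    in the same order that chaining `template` would produce.
    """
    it = iter(flat)
    # islice consumes exactly len(sublist) items from the shared iterator.
    return [list(islice(it, len(sublist))) for sublist in template]

nested_list = [['x', 'xx', 'xxx'], ['yy', 'yyy', 'y', 'yyyy'], ['zz', 'z']]
chained_list = list(chain.from_iterable(nested_list))
# Stand-in for the real metric: pair each value with '<value>_score'.
zipped_scores = [(item, item + '_score') for item in chained_list]

unchained = unchain(zipped_scores, nested_list)
print(unchained)
```

This makes a single pass over the flat list, so it should stay cheap for lists of 100,000+ items.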
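You have made the algorithm very complicated; you can do it in a few simple steps, as shown below:
- First, create an empty nested list of the required size:
formatted_list = [[] for _ in range(3)]
- Then simply iterate over the list and format the items accordingly:
for K in range(0,3):
    for i in nested_list[K]:
        formatted_list[K].append(i + '_score')
print([formatted_list])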
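The same idea can be written without hard-coding the size 3, and with the (value, score) tuples the question asks for. This is only a sketch: score is a hypothetical stand-in for the real metric, and it assumes the metric can be computed one item at a time rather than on the whole chained list.

```python
nested_list = [['x', 'xx', 'xxx'], ['yy', 'yyy', 'y', 'yyyy'], ['zz', 'z']]

def score(item):
    # Hypothetical stand-in for the real metric.
    return item + '_score'

# The nested comprehension mirrors the shape of nested_list, so no size
# needs to be known in advance.
formatted_list = [[(item, score(item)) for item in sublist] for sublist in nested_list]
print(formatted_list)
```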