Iterating over items that may be generators

Problem description

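I need to iterate over a stream of pandas.Series objects (although the type of object I want to use is beside the point). Optionally, an arbitrary function is applied to each Series, and, here is the catch, that function may be a generator function that yields two (or more) values. I had hopes for more_itertools.flatten, but it does not help: it breaks when a regular function, or no function at all, is mapped over the generator. Is there a way to turn this iterable into a simple generator of Series objects? Here is a simple example showing the problem: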

In [1]: from more_itertools import flatten
   ...: 
   ...: def generator():
   ...:     for i in range(10):
   ...:         yield i
   ...: 
   ...: def postprocess1(i):
   ...:     yield 2*i
   ...: 
   ...: def postprocess1_return(i):
   ...:     return 2*i
   ...: 
   ...: def postprocess2(i):
   ...:     yield from (i,2*i)
   ...: 

In [2]: list(generator())
   ...: 
Out[2]: [0,1,2,3,4,5,6,7,8,9]

In [3]: list(map(postprocess1,generator()))
   ...: 
Out[3]: 
[<generator object postprocess1 at 0x7f5a402916d0>,<generator object postprocess1 at 0x7f5a40291e40>,<generator object postprocess1 at 0x7f5a40291f20>,<generator object postprocess1 at 0x7f5a40291dd0>,<generator object postprocess1 at 0x7f5a40291eb0>,<generator object postprocess1 at 0x7f5a40209040>,<generator object postprocess1 at 0x7f5a40209190>,<generator object postprocess1 at 0x7f5a402092e0>,<generator object postprocess1 at 0x7f5a402090b0>,<generator object postprocess1 at 0x7f5a40209350>]

In [4]: list(map(postprocess1_return,generator()))
   ...: 
Out[4]: [0,2,4,6,8,10,12,14,16,18]

In [5]: list(map(postprocess2,generator()))
   ...: 
Out[5]: 
[<generator object postprocess2 at 0x7f5a403ad430>,<generator object postprocess2 at 0x7f5a40209580>,<generator object postprocess2 at 0x7f5a402097b0>,<generator object postprocess2 at 0x7f5a40209510>,<generator object postprocess2 at 0x7f5a40209430>,<generator object postprocess2 at 0x7f5a40209740>,<generator object postprocess2 at 0x7f5a402096d0>,<generator object postprocess2 at 0x7f5a40209820>,<generator object postprocess2 at 0x7f5a40209660>,<generator object postprocess2 at 0x7f5a40209890>]

In [6]: list(flatten(generator()))
   ...: 
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-7cd770547fa4> in <module>
----> 1 list(flatten(generator()))

TypeError: 'int' object is not iterable

In [7]: list(flatten(map(postprocess1,generator())))
   ...: 
Out[7]: [0,2,4,6,8,10,12,14,16,18]

In [8]: list(flatten(map(postprocess1_return,generator())))
   ...: 
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-35ce9aef7285> in <module>
----> 1 list(flatten(map(postprocess1_return,generator())))

TypeError: 'int' object is not iterable

In [9]: list(flatten(map(postprocess2,generator())))
Out[9]: [0,0,1,2,2,4,3,6,4,8,5,10,6,12,7,14,8,16,9,18]

Solution

I figured it out: more_itertools.collapse(generator,base_type=pd.Series) does the trick!
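As a minimal sketch of why collapse fits here, the toy functions from the question can be run through it. Unlike flatten, collapse yields non-iterable items unchanged instead of raising a TypeError, so the three cases (no function, a regular function, a generator function) all go through the same call. The variable names plain, returned and yielded are only for illustration.

```python
from more_itertools import collapse

def generator():
    for i in range(10):
        yield i

def postprocess1_return(i):
    # Regular function: returns a single value.
    return 2 * i

def postprocess2(i):
    # Generator function: yields two values per input.
    yield from (i, 2 * i)

# No function mapped at all: the ints pass through untouched.
plain = list(collapse(generator()))

# Regular function mapped: still a flat stream of ints.
returned = list(collapse(map(postprocess1_return, generator())))

# Generator function mapped: the inner generators are flattened.
yielded = list(collapse(map(postprocess2, generator())))

print(plain)
print(returned)
print(yielded)
```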

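Apparently the type of the base values does matter: in my actual code, without base_type=pd.Series, all the elements of each Series were yielded one by one, which is not what I wanted.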
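The effect of base_type can be illustrated with a small sketch. The functions series_stream and split are hypothetical stand-ins for the real stream and post-processing step, which are not shown in the post. Because a Series is itself iterable, collapse descends into it unless told to treat it as a base value.

```python
import pandas as pd
from more_itertools import collapse

def series_stream():
    # Hypothetical stand-in for the real stream of Series objects.
    yield pd.Series([1, 2])
    yield pd.Series([3, 4])

def split(s):
    # Hypothetical generator function yielding two Series per input.
    yield s
    yield 2 * s

# Without base_type, each Series is broken up into its scalar elements.
without_base = list(collapse(map(split, series_stream())))

# With base_type=pd.Series, each Series is yielded whole.
with_base = list(collapse(map(split, series_stream()), base_type=pd.Series))

print(len(without_base))
print(len(with_base))
print(all(isinstance(s, pd.Series) for s in with_base))
```

In this sketch the first call produces eight scalars, while the second produces four Series objects, matching the behaviour described above.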