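Problem description
I'm working on a problem that involves validating a format from within unified diff patches.
Variables in the inner format can span multiple lines at a time, so I wrote a generator that pulls in each line and yields a variable when it is complete.
To avoid having to rewrite this function when reading from a unified diff file, I created a generator that strips the unified diff characters from each line before passing it to the inner format validator. However, I'm stuck in an infinite loop (both in the code and in my head). I've abstracted the problem into the following code. I'm sure there's a better way to do this; I just don't know what it is.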
from collections import Iterable

def inner_format_validator(inner_item):
    # Do some validation to inner items
    return inner_item[0] != '+'

def inner_gen(iterable):
    for inner_item in iterable:
        # Operates only on inner_info type data
        yield inner_format_validator(inner_item)

def outer_gen(iterable):
    class DecoratedGenerator(Iterable):
        def __iter__(self):
            return self
        def next(self):
            # Using iterable from closure
            for outer_item in iterable:
                self.outer_info = outer_item[0]
                inner_item = outer_item[1:]
                return inner_item
    decorated_gen = DecoratedGenerator()
    for inner_item in inner_gen(decorated_gen):
        yield inner_item, decorated_gen.outer_info

if __name__ == '__main__':
    def wrap(string):
        # The point here is that I don't know what the first character will be
        pseudo_rand = len(string)
        if pseudo_rand * pseudo_rand % 2 == 0:
            return '+' + string
        else:
            return '-' + string

    inner_items = ["whatever"] * 3
    # wrap screws up inner_format_validator
    outer_items = [wrap("whatever")] * 3
    # I need to be able to
    # iterate over inner_items
    for inner_info in inner_gen(inner_items):
        print(inner_info)
    # and iterate over outer_items
    for outer_info, inner_info in outer_gen(outer_items):
        # This is an infinite loop
        print(outer_info)
        print(inner_info)
Any ideas on a better, more pythonic way to do this?
Solution
I would do something simple, like this:
def outer_gen(iterable):
    iterable = iter(iterable)
    first_item = next(iterable)
    info = first_item[0]
    yield info, first_item[1:]
    for item in iterable:
        yield info, item
This executes the first four lines only once, then enters the loop and yields what you want.
You may want to add a try/except here and there to catch IndexErrors.
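As a rough sketch of that suggestion (assuming Python 3; the choice to silently yield nothing for empty input and to raise ValueError for an empty first line is mine, not part of the answer), the guards could look like this. Catching StopIteration matters on Python 3.7 and later, where a StopIteration escaping a generator body is turned into a RuntimeError:

```python
def outer_gen(iterable):
    """Yield (info, rest) pairs, where info is the first character of the first item."""
    iterator = iter(iterable)
    try:
        first_item = next(iterator)
        info = first_item[0]
    except StopIteration:
        # Empty input: finish the generator without yielding anything.
        return
    except IndexError:
        # Assumed policy: an empty first line is treated as invalid input.
        raise ValueError("first item is empty, so it has no prefix character")
    yield info, first_item[1:]
    for item in iterator:
        yield info, item


print(list(outer_gen(["+abc", "def"])))  # [('+', 'abc'), ('+', 'def')]
print(list(outer_gen([])))               # []
```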
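If you want to start taking values once they begin with something (or the opposite), remember that you can use a lot of things from the itertools toolbox, notably dropwhile, takewhile and chain: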
>>> import itertools
>>> l = ['+foo', '-bar', '+foo']
>>> list(itertools.takewhile(lambda x: x.startswith('+'), l))
['+foo']
>>> list(itertools.dropwhile(lambda x: x.startswith('+'), l))
['-bar', '+foo']
>>> a = itertools.takewhile(lambda x: x.startswith('+'), l)
>>> b = itertools.dropwhile(lambda x: x.startswith('+'), l)
>>> list(itertools.chain(a, b))
['+foo', '-bar', '+foo']
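One thing to watch for when applying these to a file or another generator rather than a list: takewhile has to read the first non-matching item in order to stop, and on a one-shot iterator that item is lost. A small sketch of the difference (the helper name is_added is made up for illustration):

```python
import itertools


def is_added(line):
    return line.startswith("+")


lines = ["+foo", "-bar", "+foo"]

# With a list, every call starts again from the beginning.
head = list(itertools.takewhile(is_added, lines))
tail = list(itertools.dropwhile(is_added, lines))
print(head, tail)  # ['+foo'] ['-bar', '+foo']

# With a one-shot iterator, takewhile also consumes the first rejected item.
stream = iter(lines)
head_once = list(itertools.takewhile(is_added, stream))
rest_once = list(stream)
print(head_once, rest_once)  # ['+foo'] ['+foo']  ('-bar' is gone)
```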
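Remember that you can create generators the same way you write list comprehensions, store them in variables, and chain them, just like piping Linux commands: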
import random

def create_item():
    return random.choice(('+', '-')) + random.choice(('foo', 'bar'))

random_items = (create_item() for s in xrange(10))
added_items = ((i[0], i[1:]) for i in random_items if i.startswith('+'))
valid_items = ((prefix, line) for prefix, line in added_items if 'foo' in line)
print list(valid_items)
With all this, you should be able to find some pythonic way to solve your problem :-)
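The pipeline above is written for Python 2 (xrange and the print statement). A possible Python 3 rendering, assuming nothing else changes, is sketched below; the seeded random.Random instance is my addition so that repeated runs produce the same items:

```python
import random


def create_item(rng):
    # Build a line such as '+foo' or '-bar'.
    return rng.choice("+-") + rng.choice(("foo", "bar"))


rng = random.Random(0)  # seeded only to make runs repeatable (an assumption)

# Each stage is a lazy generator expression; nothing runs until the list is built.
random_items = (create_item(rng) for _ in range(10))
added_items = ((item[0], item[1:]) for item in random_items if item.startswith("+"))
valid_items = [(prefix, line) for prefix, line in added_items if "foo" in line]
print(valid_items)
```

Every pair that survives has '+' as its prefix and contains 'foo', because each stage only sees what the previous stage let through.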
I still don't like this much, but at least it's shorter and a little more pythonic:
from itertools import imap, izip
from functools import partial

def inner_format_validator(inner_item):
    return not inner_item.startswith('+')

inner_gen = partial(imap, inner_format_validator)

def split(astr):
    return astr[0], astr[1:]

def outer_gen(iterable):
    outer_stuff, inner_stuff = izip(*imap(split, iterable))
    return izip(inner_gen(inner_stuff), outer_stuff)
[EDIT] inner_gen() and outer_gen() without imap and partial:
def inner_gen(iterable):
    for each in iterable:
        yield inner_format_validator(each)

def outer_gen(iterable):
    outer_stuff, inner_stuff = izip(*(split(each) for each in iterable))
    return izip(inner_gen(inner_stuff), outer_stuff)
Maybe this is a better solution, although it is a different one:
def transmogrify(iter_of_iters, *transmogrifiers):
    for iters in iter_of_iters:
        yield (
            trans(each) if trans else each
            for trans, each in izip(transmogrifiers, iters)
        )

for outer, inner in transmogrify(imap(split, stuff), inner_format_validator, None):
    print inner, outer
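The snippets in this answer rely on the Python 2 names imap and izip, and izip(*...) reads the whole input before anything is yielded. Under Python 3, a lazy per-line variant might be sketched as follows. Note that it only applies when the validator works on one line at a time, as in the abstracted example, not when a variable spans several lines:

```python
def inner_format_validator(inner_item):
    return not inner_item.startswith("+")


def split(line):
    # Separate the diff marker (first character) from the payload.
    return line[0], line[1:]


def outer_gen(iterable):
    # Handle one line at a time, so the input is never materialised in full.
    for line in iterable:
        outer, inner = split(line)
        yield inner_format_validator(inner), outer


print(list(outer_gen(["+whatever", "-+odd"])))  # [(True, '+'), (False, '-')]
```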
I think it will work if you change the definition of DecoratedGenerator to this:
class DecoratedGenerator(Iterable):
    def __iter__(self):
        # Using iterable from closure
        for outer_item in iterable:
            self.outer_info = outer_item[0]
            inner_item = outer_item[1:]
            yield inner_item
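Your original version never terminated because its next() method was stateless and returned the same value every time it was called: each call started a fresh for loop over the iterable and returned its first item. You don't need a next() method at all, though. You can implement __iter__() yourself (as I did), and then everything works.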
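For reference, here is how the whole example might look with that change applied, sketched for Python 3 (where Iterable lives in collections.abc). One caveat that I am assuming rather than taking from the answer: if inner_gen consumes several lines before yielding, outer_info will hold the prefix of the last line it read.

```python
from collections.abc import Iterable


def inner_format_validator(inner_item):
    return inner_item[0] != "+"


def inner_gen(iterable):
    for inner_item in iterable:
        yield inner_format_validator(inner_item)


def outer_gen(iterable):
    class DecoratedGenerator(Iterable):
        def __iter__(self):
            # A generator method keeps its position between items,
            # so iteration advances and eventually stops.
            for outer_item in iterable:
                self.outer_info = outer_item[0]
                yield outer_item[1:]

    decorated_gen = DecoratedGenerator()
    for inner_result in inner_gen(decorated_gen):
        yield inner_result, decorated_gen.outer_info


print(list(outer_gen(["+whatever", "-whatever"])))  # [(True, '+'), (True, '-')]
```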