How can a class's __iter__ method yield values without re-running the for loop?

Problem description

I have a class with an __iter__ method, like this:

import csv

# text_file, convo_split and preprocess come from the surrounding project (not shown)

class Mycorpus:
    '''This class helps us to train the model without loading the whole dataset to the RAM.'''

    def __init__(self, filepath=text_file):
        self.filepath = filepath

    def __iter__(self):
        with open(self.filepath, 'r') as rfile:
            csv_reader = csv.DictReader(rfile, delimiter=',')
            for row in csv_reader:
                # splitter splits the conversation into client and agent part
                client_convo, agent_convo = convo_split.splitter(row['Combined'])

                client_tokens = preprocess(client_convo)
                agent_tokens = preprocess(agent_convo)

                yield client_tokens

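I pass this object to a function that expects it to return one set of tokens per iteration step, that is, either client_tokens or agent_tokens. I want __iter__ to yield client_tokens and then, on the next step, yield the agent_tokens from the same client-agent pair. I don't want to yield both sets of tokens together, because that would break the function: it must be one at a time. My main goal is to avoid looping over the file twice and running the splitter on the same conversation a second time.

I tried something like the following.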

def __init__(self,filepath= text_file):
        self.filepath = filepath
        self.agent_turn = 0

def __iter__(self):
        with open(self.filepath,'r') as rfile:
            csv_reader = csv.DictReader(rfile,delimiter=',')
            if self.agent_turn:
                self.agent_turn = 0
                yield agent_tokens
            
            else:
                for row in csv_reader:
                
                    # splitter splits the conversation into client and agent part
                    client_convo,agent_convo = convo_split.splitter(row['Combined'])

                    client_tokens = preprocess(client_convo)
                    agent_tokens = preprocess(agent_convo)
                    self.agent_turn = 1
                    yield client_tokens

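But the code above only yields client_tokens. Is there a better way to do this without loading the whole dataset into memory? Is what I'm asking even possible with an __iter__ method? Any help or guidance is much appreciated.

Solution

Use two yield statements, as many examples show. Remember that a generator resumes immediately after the yield statement that suspended it, not at the top of the function. That is why your attempt only produced client_tokens: the if self.agent_turn check sits outside the for loop, so it is evaluated once when iteration starts and never again between rows.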

        for row in csv_reader:
    
            # splitter splits the conversation into client and agent part
            client_convo,agent_convo = convo_split.splitter(row['Combined'])

            client_tokens = preprocess(client_convo)
            agent_tokens = preprocess(agent_convo)
            
            yield client_tokens
            yield agent_tokens
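
To try the two-yield pattern without the original data, here is a minimal, self-contained sketch. Everything the question does not show is an assumption made only for illustration: split_conversation and tokenize are hypothetical stand-ins for convo_split.splitter and preprocess, the "client|agent" text layout is invented, and the class takes a callable that opens the data (here an in-memory CSV) rather than a file path.

```python
import csv
import io


def split_conversation(text):
    """Hypothetical stand-in for convo_split.splitter: 'client|agent' -> two parts."""
    client_part, agent_part = text.split("|", 1)
    return client_part, agent_part


def tokenize(text):
    """Hypothetical stand-in for preprocess: lowercase, whitespace-separated tokens."""
    return text.lower().split()


class MyCorpus:
    """Stream token lists from CSV data, one list per iteration step."""

    def __init__(self, open_source):
        # open_source is a zero-argument callable returning a fresh file-like
        # object, so every pass over the corpus starts from the first row.
        self.open_source = open_source

    def __iter__(self):
        with self.open_source() as rfile:
            for row in csv.DictReader(rfile, delimiter=","):
                # The row is read and split only once.
                client_convo, agent_convo = split_conversation(row["Combined"])
                client_tokens = tokenize(client_convo)
                agent_tokens = tokenize(agent_convo)

                yield client_tokens
                # The next step resumes here, so agent_tokens is still in scope.
                yield agent_tokens


# Invented sample data with the same 'Combined' column as in the question.
SAMPLE_CSV = (
    "Combined\n"
    "Hello there|Hi how can I help\n"
    "My order is late|Let me check\n"
)

corpus = MyCorpus(lambda: io.StringIO(SAMPLE_CSV))
first_pass = list(corpus)
second_pass = list(corpus)  # a new generator is created for each pass

for tokens in first_pass:
    print(tokens)
```

Because __iter__ is a generator function, each for loop over the object gets a fresh generator, so a consumer that iterates over the corpus more than once still sees every client and agent token list in order, and only one row is held in memory at a time.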