Writing map and reduce in many different classes?

Problem description

I came across this example in the mrjob documentation on how to implement a job with multiple steps:

from mrjob.job import MRJob
from mrjob.step import MRStep
import re

WORD_RE = re.compile(r"[\w']+")

class MRMostUsedWord(MRJob):

    def mapper_get_words(self, _, line):
        # yield each word in the line
        for word in WORD_RE.findall(line):
            yield (word.lower(), 1)

    def combiner_count_words(self, word, counts):
        # sum the words we've seen so far
        yield (word, sum(counts))

    def reducer_count_words(self, word, counts):
        # send all (num_occurrences, word) pairs to the same reducer.
        # num_occurrences is so we can easily use Python's max() function.
        yield None, (sum(counts), word)

    # discard the key; it is just None
    def reducer_find_max_word(self, _, word_count_pairs):
        # each item of word_count_pairs is (count, word),
        # so yielding one results in key=counts, value=word
        yield max(word_count_pairs)

    def steps(self):
        return [
            MRStep(mapper=self.mapper_get_words,
                   combiner=self.combiner_count_words,
                   reducer=self.reducer_count_words),
            MRStep(reducer=self.reducer_find_max_word)
        ]

if __name__ == '__main__':
    MRMostUsedWord.run()

If I have many steps, I would like to write them in another class or in a different file. How do I import them and define the steps() function?

Solution

No working solution to this problem has been found yet.
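One possible way to organize this, offered as an unverified sketch rather than a confirmed answer: keep each step's methods in a plain mixin class (which can live in its own file) and let the job class inherit from the mixins together with MRJob. The class names WordCountSteps and MaxWordSteps and the file layout are assumptions made for illustration; only the method bodies come from the example above. The block contains just the mixins, which do not need mrjob themselves; the trailing comment shows how the job class might wire them into steps().

```python
import re

WORD_RE = re.compile(r"[\w']+")


class WordCountSteps:
    """Hypothetical mixin holding the methods of the first step.

    It could live in its own file, e.g. word_count_steps.py (name assumed).
    """

    def mapper_get_words(self, _, line):
        # emit (word, 1) for each word in the line
        for word in WORD_RE.findall(line):
            yield (word.lower(), 1)

    def combiner_count_words(self, word, counts):
        # partial sum of the counts seen so far for this word
        yield (word, sum(counts))

    def reducer_count_words(self, word, counts):
        # send every (count, word) pair to the same key (None)
        yield None, (sum(counts), word)


class MaxWordSteps:
    """Hypothetical mixin holding the method of the second step."""

    def reducer_find_max_word(self, _, word_count_pairs):
        # each pair is (count, word); max() picks the highest count
        yield max(word_count_pairs)


# In the job file (assumed layout), the mixins are imported and combined
# with MRJob; steps() then refers to the inherited methods through self:
#
#     class MRMostUsedWord(WordCountSteps, MaxWordSteps, MRJob):
#         def steps(self):
#             return [
#                 MRStep(mapper=self.mapper_get_words,
#                        combiner=self.combiner_count_words,
#                        reducer=self.reducer_count_words),
#                 MRStep(reducer=self.reducer_find_max_word),
#             ]
```

Because the mixin methods become ordinary bound methods of the job, steps() can stay in the job class and only needs to list them. When the job runs on a cluster rather than locally, the extra module presumably also has to be shipped to the worker nodes; check the mrjob documentation for the file upload options in your version.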
