Problem description
I came across this example in the mrjob documentation on how to write jobs with multiple steps:
from mrjob.job import MRJob
from mrjob.step import MRStep
import re

WORD_RE = re.compile(r"[\w']+")


class MRMostUsedWord(MRJob):

    def mapper_get_words(self, _, line):
        # yield each word in the line
        for word in WORD_RE.findall(line):
            yield (word.lower(), 1)

    def combiner_count_words(self, word, counts):
        # sum the words we've seen so far
        yield (word, sum(counts))

    def reducer_count_words(self, word, counts):
        # send all (num_occurrences, word) pairs to the same reducer.
        # num_occurrences is so we can easily use Python's max() function.
        yield None, (sum(counts), word)

    # discard the key; it is just None
    def reducer_find_max_word(self, _, word_count_pairs):
        # each item of word_count_pairs is (count, word),
        # so yielding one results in key=counts, value=word
        yield max(word_count_pairs)

    def steps(self):
        return [
            MRStep(mapper=self.mapper_get_words,
                   combiner=self.combiner_count_words,
                   reducer=self.reducer_count_words),
            MRStep(reducer=self.reducer_find_max_word),
        ]


if __name__ == '__main__':
    MRMostUsedWord.run()
If I have many steps and want to write them in another class or a different file, how do I import them and define the steps() function?
Solution
No confirmed solution to this question has been found yet.