Problem description
I am trying to run the Hadoop MapReduce word-count example on Google Cloud Dataproc using the Python mrjob library. However, mrjob fails with the following exception:
TypeError: __init__() got an unexpected keyword argument 'channel'
Traceback (most recent call last):
  File "freq.py", line 21, in <module>
    MRWordFreqCount.run()
  File "/usr/local/lib/python3.8/dist-packages/mrjob/job.py", line 616, in run
    cls().execute()
  File "/usr/local/lib/python3.8/dist-packages/mrjob/job.py", line 687, in execute
    self.run_job()
  File "/usr/local/lib/python3.8/dist-packages/mrjob/job.py", line 636, in run_job
    runner.run()
  File "/usr/local/lib/python3.8/dist-packages/mrjob/runner.py", line 503, in run
    self._run()
  File "/usr/local/lib/python3.8/dist-packages/mrjob/dataproc.py", line 468, in _run
    self._launch()
  File "/usr/local/lib/python3.8/dist-packages/mrjob/dataproc.py", line 473, in _launch
    self._launch_cluster()
  File "/usr/local/lib/python3.8/dist-packages/mrjob/dataproc.py", line 637, in _launch_cluster
    self._get_cluster(self._cluster_id)
  File "/usr/local/lib/python3.8/dist-packages/mrjob/dataproc.py", line 1188, in _get_cluster
    return self.cluster_client.get_cluster(
  File "/usr/local/lib/python3.8/dist-packages/mrjob/dataproc.py", line 376, in cluster_client
    return google.cloud.dataproc_v1beta2.ClusterControllerClient(
TypeError: __init__() got an unexpected keyword argument 'channel'
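I have checked that GOOGLE_APPLICATION_CREDENTIALS is set correctly, that the API is enabled on Google Cloud, and that all required roles are assigned to the service account.
mrjob uploads the files to Google Cloud Storage successfully, but it fails as soon as it tries to create a new Dataproc cluster.
What could be wrong?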
Command line used to launch mrjob on Dataproc:
$ python3 freq.py -r dataproc words.txt
Current Python environment:
$ python3 -VV
Python 3.8.5 (default, Jul 28 2020, 12:59:40)
[GCC 9.3.0]
$ pip3 list | grep google
google-api-core 1.23.0
google-auth 1.23.0
google-auth-oauthlib 0.4.2
google-cloud-core 1.4.3
google-cloud-dataproc 2.0.2
google-cloud-logging 1.15.1
google-cloud-storage 1.32.0
google-crc32c 1.0.0
google-pasta 0.2.0
google-resumable-media 1.1.0
googleapis-common-protos 1.52.0
$ pip3 list | grep mrjob
mrjob 0.7.4
Solution
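The solution is to downgrade google-cloud-dataproc to 1.1.1.
After debugging into the mrjob implementation, I found that mrjob 0.7.4 calls the constructor of google.cloud.dataproc_v1beta2.ClusterControllerClient with a parameter that has been renamed in the google-cloud-dataproc library since version 2.0.0.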
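To illustrate the failure mode, here is a minimal sketch of how a caller written against an older constructor signature breaks once the library stops accepting a keyword argument. OldStyleClient and NewStyleClient are hypothetical stand-ins invented for this example; they are not the real google-cloud-dataproc classes, and their parameter lists are assumptions.

```python
# Hypothetical stand-ins; NOT the real google-cloud-dataproc classes.


class OldStyleClient:
    """Mimics an older constructor that still accepts `channel`."""

    def __init__(self, channel=None, credentials=None):
        self.channel = channel
        self.credentials = credentials


class NewStyleClient:
    """Mimics a newer constructor that no longer accepts `channel`."""

    def __init__(self, *, credentials=None):
        self.credentials = credentials


def try_construct(client_cls):
    """Call the constructor the way an older caller would and report the outcome."""
    try:
        client_cls(channel=object())
        return "ok"
    except TypeError as exc:
        # Same kind of error as in the traceback above.
        return str(exc)


print(try_construct(OldStyleClient))
print(try_construct(NewStyleClient))
```

The second call fails with a TypeError about the unexpected keyword argument 'channel', which has the same shape as the error mrjob hits inside dataproc.py.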
How to downgrade using pip3:
$ pip3 install --force-reinstall --no-deps google-cloud-dataproc==1.1.1
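After downgrading, a small check along the following lines could confirm which version is installed before launching a job. This is only a sketch: the helper names are made up for this example, and treating every 1.x release as compatible is an assumption, since only 1.1.1 is reported to work above.

```python
from importlib import metadata  # standard library in Python 3.8+


def major_version(version):
    """Return the leading integer of a version string such as '2.0.2'."""
    return int(version.split(".")[0])


def dataproc_is_compatible(version):
    """Assumption: any release below 2.0.0 still works with mrjob 0.7.4."""
    return major_version(version) < 2


def check_installed(package="google-cloud-dataproc"):
    """Look up the installed version and describe whether it is expected to work."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package} is not installed"
    if dataproc_is_compatible(installed):
        return f"{package} {installed}: expected to work with mrjob 0.7.4"
    return f"{package} {installed}: likely to fail, consider pinning 1.1.1"


if __name__ == "__main__":
    print(check_installed())
```

With the environment listed in the question (google-cloud-dataproc 2.0.2), this check would flag the package as likely to fail.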