Problem description
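I noticed that neither mrjob nor boto provides a Python interface for submitting and running Hive jobs on Amazon Elastic MapReduce (EMR). Are there other Python client libraries that support running Hive on EMR?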
Solution
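You can do this with boto (2.x) by adding two script-runner steps to a job flow: one that installs Hive and one that runs a Hive script stored in S3: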
from boto.emr.connection import EmrConnection
from boto.emr.step import JarStep

# Assumes name, s3_query_file_uri, s3_log_uri, master_instance_type,
# slave_instance_type and num_instances are already defined.
hive_base = 's3://us-east-1.elasticmapreduce/libs/hive/'
script_runner = 's3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar'
# Step 1 installs Hive 0.7; step 2 runs the Hive query file stored in S3.
args1 = [hive_base + 'hive-script', '--base-path', hive_base, '--install-hive', '--hive-versions', '0.7']
args2 = [hive_base + 'hive-script', '--base-path', hive_base, '--hive-versions', '0.7',
         '--run-hive-script', '--args', '-f', s3_query_file_uri]
steps = []
# step_name (not name) so the job flow name is not overwritten by the loop
for step_name, args in zip(('Setup Hive', 'Run Hive Script'), (args1, args2)):
    # optionally also pass action_on_failure='CANCEL_AND_WAIT'
    steps.append(JarStep(step_name, script_runner, step_args=args))  # inside the loop
# Kick off the job flow
jobid = EmrConnection().run_jobflow(name, s3_log_uri, steps=steps,
                                    master_instance_type=master_instance_type,
                                    slave_instance_type=slave_instance_type,
                                    num_instances=num_instances, hadoop_version='0.20')
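If you need a region other than us-east-1 or a different Hive version, the hard-coded URIs can be generated instead. The sketch below is a hypothetical helper (`hive_step_specs` is not part of boto) that returns `(step name, jar, step_args)` triples; it assumes the region-prefixed layout `s3://<region>.elasticmapreduce/libs/` also exists in the region you pick, and it passes `--base-path` and `--hive-versions` to both steps, following the form of the install step. The query URI in the usage line is a placeholder.

```python
from typing import List, Tuple

# (step name, jar URI, step_args): a hypothetical shape chosen to line up
# with the arguments of boto's JarStep(name, jar, step_args=...).
StepSpec = Tuple[str, str, List[str]]


def hive_step_specs(query_uri: str, region: str = "us-east-1",
                    hive_version: str = "0.7") -> List[StepSpec]:
    """Build the 'install Hive' and 'run Hive script' step specs.

    Assumption: the s3://<region>.elasticmapreduce/libs/ layout used for
    us-east-1 also exists in the requested region.
    """
    libs = f"s3://{region}.elasticmapreduce/libs"
    hive_base = f"{libs}/hive/"
    hive_script = hive_base + "hive-script"
    script_runner = f"{libs}/script-runner/script-runner.jar"
    # Arguments shared by both steps: the hive-script location and its base path
    common = [hive_script, "--base-path", hive_base]
    install_args = common + ["--install-hive", "--hive-versions", hive_version]
    run_args = common + ["--hive-versions", hive_version,
                         "--run-hive-script", "--args", "-f", query_uri]
    return [("Setup Hive", script_runner, install_args),
            ("Run Hive Script", script_runner, run_args)]


if __name__ == "__main__":
    # Placeholder bucket and query file, for illustration only
    for step_name, jar, args in hive_step_specs("s3://my-bucket/queries/report.q",
                                                region="eu-west-1"):
        print(step_name, jar, " ".join(args))
```

With boto 2 installed, each triple could then become a step via `JarStep(step_name, jar, step_args=args)`, exactly as in the loop above.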