Python client support for running Hive on Amazon EMR

Problem description

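I have noticed that neither mrjob nor boto provides a Python interface for submitting and running Hive jobs on Amazon Elastic MapReduce (EMR). Is there another Python client library that supports running Hive on EMR?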

Solution

With boto, you can do the following:
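from boto.emr.connection import EmrConnection
from boto.emr.step import JarStep

# s3_query_file_uri, s3_log_uri, master_instance_type, slave_instance_type
# and num_instances must be defined by the caller before this point.
# Arguments for the step that installs Hive 0.7
args1 = [u's3://us-east-1.elasticmapreduce/libs/hive/hive-script', u'--base-path', u's3://us-east-1.elasticmapreduce/libs/hive/', u'--install-hive', u'--hive-versions', u'0.7']
# Arguments for the step that runs the Hive script stored at s3_query_file_uri
args2 = [u's3://us-east-1.elasticmapreduce/libs/hive/hive-script', u'0.7', u'--run-hive-script', u'--args', u'-f', s3_query_file_uri]
steps = []
for name, args in zip(('Setup Hive', 'Run Hive Script'), (args1, args2)):
    step = JarStep(name, 's3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar', step_args=args,
                   # action_on_failure="CANCEL_AND_WAIT"
                   )
    # The append must stay inside the loop so that both steps are added
    steps.append(step)
# Kick off the job; name still holds the last loop value ('Run Hive Script')
jobid = EmrConnection().run_jobflow(name, s3_log_uri, steps=steps, master_instance_type=master_instance_type, slave_instance_type=slave_instance_type, num_instances=num_instances, hadoop_version="0.20")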
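
The two argument lists above share most of their contents, so one option is to build them in a small helper. The following is only a sketch: build_hive_step_args is a hypothetical function, not part of boto, and the S3 path and Hive version defaults are simply the values from the snippet above. The order of the arguments is copied from that snippet as-is.

```python
# Hypothetical helper (not part of boto): builds the argument lists for the
# "Setup Hive" and "Run Hive Script" steps. Defaults mirror the answer above.
DEFAULT_HIVE_BASE_PATH = "s3://us-east-1.elasticmapreduce/libs/hive/"


def build_hive_step_args(query_uri, hive_version="0.7",
                         base_path=DEFAULT_HIVE_BASE_PATH):
    """Return (setup_args, run_args) for the two script-runner steps."""
    hive_script = base_path + "hive-script"
    # Step 1: install the requested Hive version on the cluster
    setup_args = [hive_script, "--base-path", base_path,
                  "--install-hive", "--hive-versions", hive_version]
    # Step 2: run the Hive script located at query_uri
    run_args = [hive_script, hive_version,
                "--run-hive-script", "--args", "-f", query_uri]
    return setup_args, run_args


if __name__ == "__main__":
    # "s3://my-bucket/query.q" is a placeholder URI for illustration only
    setup_args, run_args = build_hive_step_args("s3://my-bucket/query.q")
    print(setup_args)
    print(run_args)
```

The returned lists can be passed as step_args to JarStep in place of args1 and args2.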