Problem description
I am trying to create an EMR cluster with AWS CloudFormation. My EMR PySpark job uses the step arguments below, in which I need to pass multiple .py zip files.
EMRStepArgs:
  Description: EMR Step Args
  Type: CommaDelimitedList
  Default: "spark-submit,--deploy-mode,cluster,--packages,org.mongodb.spark:mongo-spark-connector_2.12:3.0.1,--py-files,'s3://logsetl-emr/py-dist/jobs.zip\,s3://logsetl-emr/py-dist/shared.zip\,s3://logsetl-emr/py-dist/libs.zip\,s3://logsetl-emr/py-dist/schema.zip',--files,s3://logsetl-emr/py-dist/config.json,s3://logsetl-emr/py-dist/main.py,--job,cdn,--start_date,'2021-04-14',--end_date,'2021-04-14'"
This EMRStepArgs parameter is passed as the arguments of the EMR step in the CloudFormation .yaml file.
What I get in the EMR cluster is:
spark-submit --deploy-mode cluster --packages org.mongodb.spark:mongo-spark-connector_2.12:3.0.1 --py-files 's3://logsetl-emr/py-dist/jobs.zip s3://logsetl-emr/py-dist/shared.zip s3://logsetl-emr/py-dist/libs.zip s3://logsetl-emr/py-dist/schema.zip' --files s3://logsetl-emr/py-dist/config.json s3://logsetl-emr/py-dist/main.py --job cdn --start_date '2021-04-14' --end_date '2021-04-14'
What I want is:
spark-submit --deploy-mode cluster --packages org.mongodb.spark:mongo-spark-connector_2.12:3.0.1 --py-files s3://logsetl-emr/py-dist/jobs.zip,s3://logsetl-emr/py-dist/shared.zip,s3://logsetl-emr/py-dist/libs.zip,s3://logsetl-emr/py-dist/schema.zip --files s3://logsetl-emr/py-dist/config.json s3://logsetl-emr/py-dist/main.py --job cdn --start_date '2021-04-14' --end_date '2021-04-14'
I don't know how to keep the commas inside the --py-files value from being treated as list delimiters.
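The splitting described above can be reproduced with a small Python model. This is only a sketch: it assumes a CommaDelimitedList parameter is split on every comma with each item trimmed, the function name comma_delimited_list is made up for illustration, and the bucket paths are shortened, hypothetical stand-ins for the real ones.

```python
def comma_delimited_list(value: str) -> list:
    """Simplified model of a CommaDelimitedList parameter:
    split on every comma and trim whitespace around each item."""
    return [item.strip() for item in value.split(",")]


# Shortened, hypothetical version of the Default value in the question.
default = "spark-submit,--py-files,s3://bucket/jobs.zip,s3://bucket/shared.zip,main.py"

args = comma_delimited_list(default)

# Each zip path ends up as its own list element, so spark-submit
# no longer receives a single comma-separated --py-files value.
print(args)
```

In this model there is no way to tell a comma that separates step arguments from a comma that belongs inside the --py-files value, which is the situation the question describes.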
Solution
The problem was solved by defining EMRStepArgs as a String in which the arguments are separated by spaces:
EMRStepArgs:
  Description: EMR Step Args
  Type: String
  Default: "spark-submit --master yarn --conf spark.yarn.submit.waitAppCompletion=true --deploy-mode cluster --packages org.mongodb.spark:mongo-spark-connector_2.12:3.0.1 --py-files s3://logsetl--emr/py-dist/jobs.zip,s3://logsetl--emr/py-dist/shared.zip,s3://logsetl--emr/py-dist/libs.zip,s3://logsetl--emr/py-dist/schema.zip --files s3://logsetl--emr/py-dist/config.json s3://logsetl--emr/py-dist/main.py --job cdn --start_date '2021-04-14' --end_date '2021-04-14'"
and then splitting it on spaces in the step definition:
- ActionOnFailure: !Ref EMRActionOnfailure
  HadoopJarStep:
    Args:
      !Split [" ", !Ref EMRStepArgs]
      # Ref: EMRStepArgs
    Jar: command-runner.jar
    MainClass: ""
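To see why splitting on a space keeps the --py-files value intact, here is a minimal Python sketch. It assumes Fn::Split behaves like a plain split on the exact delimiter (consecutive delimiters yield empty strings); fn_split is a hypothetical helper, not an AWS API, and the argument string is a shortened stand-in for the real Default.

```python
def fn_split(delimiter: str, source: str) -> list:
    """Rough stand-in for Fn::Split: split on the exact delimiter,
    keeping empty strings between consecutive delimiters."""
    return source.split(delimiter)


# Shortened, hypothetical version of the String parameter from the answer.
step_args = (
    "spark-submit --deploy-mode cluster "
    "--py-files s3://bucket/jobs.zip,s3://bucket/shared.zip "
    "s3://bucket/main.py --job cdn"
)

args = fn_split(" ", step_args)

# The value after --py-files is still one comma-separated token.
py_files = args[args.index("--py-files") + 1]
print(py_files)
```

The trade-off is that no single argument may contain a space, and a doubled space in the parameter would produce an empty argument, so the Default string should use exactly one space between arguments.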