Dataproc: creating a cluster in Python, equivalent of the gcloud command

Problem

How can I replicate the following gcloud command in Python?

gcloud beta dataproc clusters create spark-nlp-cluster \
     --region global \
     --metadata 'PIP_PACKAGES=google-cloud-storage spark-nlp==2.5.3' \
     --worker-machine-type n1-standard-1 \
     --num-workers 2 \
     --image-version 1.4-debian10 \
     --initialization-actions gs://dataproc-initialization-actions/python/pip-install.sh \
     --optional-components=JUPYTER,ANACONDA \
     --enable-component-gateway

This is what I have so far in Python. I am not sure how to convert the rest of the gcloud flags:


    cluster_data = {
        "project_id": project,
        "cluster_name": cluster_name,
        "config": {
            "gce_cluster_config": {"zone_uri": zone_uri},
            "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-1"},
            "worker_config": {"num_instances": 2},
            "software_config": {
                "image_version": "1.4-debian10",
                "optional_components": ["JUPYTER", "ANACONDA"],
            },
        },
    }

    # dataproc is assumed to be a ClusterControllerClient instance
    cluster = dataproc.create_cluster(
        request={"project_id": project, "region": region, "cluster": cluster_data}
    )

Solution

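You can try something like this. The metadata goes inside gce_cluster_config as a key/value map, initialization_actions is a list, and software_config, initialization_actions and endpoint_config sit directly under config, not inside worker_config:

cluster_data = {
    "project_id": project,
    "cluster_name": cluster_name,
    "config": {
        "gce_cluster_config": {
            "zone_uri": zone_uri,
            # --metadata
            "metadata": {"PIP_PACKAGES": "google-cloud-storage spark-nlp==2.5.3"},
        },
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-1"},
        # --num-workers, --worker-machine-type
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-1"},
        # --image-version, --optional-components
        "software_config": {
            "image_version": "1.4-debian10",
            "optional_components": ["JUPYTER", "ANACONDA"],
        },
        # --initialization-actions
        "initialization_actions": [
            {"executable_file": "gs://dataproc-initialization-actions/python/pip-install.sh"}
        ],
        # --enable-component-gateway
        "endpoint_config": {"enable_http_port_access": True},
    },
}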

For more options, see: GCP Cluster Configs
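
To actually submit a cluster_data dict like the one in this thread, something along these lines may work. It is only a minimal sketch, assuming the google-cloud-dataproc client library and its dataproc_v1.ClusterControllerClient. The helper names metadata_flag_to_dict and create_cluster are hypothetical, not part of the library, and the naive comma split does not handle metadata values that themselves contain commas. The endpoint handling for the "global" region is also an assumption.

```python
def metadata_flag_to_dict(flag_value):
    """Turn a gcloud-style 'KEY=VALUE,KEY2=VALUE2' string into a dict.

    Hypothetical helper: splits on commas, then on the first '=' of each
    pair, so a value such as 'spark-nlp==2.5.3' keeps its own '=' signs.
    """
    metadata = {}
    for pair in flag_value.split(","):
        key, _, value = pair.partition("=")
        metadata[key.strip()] = value.strip()
    return metadata


def create_cluster(project, region, cluster_data):
    """Submit the cluster definition and block until creation finishes."""
    # Imported lazily so the helper above works without the client library.
    from google.cloud import dataproc_v1

    # Assumption: regional endpoints follow the "<region>-dataproc" pattern,
    # while the "global" region uses the library's default endpoint.
    options = None
    if region != "global":
        options = {"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    client = dataproc_v1.ClusterControllerClient(client_options=options)

    # create_cluster returns a long-running operation; result() waits for it.
    operation = client.create_cluster(
        request={"project_id": project, "region": region, "cluster": cluster_data}
    )
    return operation.result()


# The --metadata value from the gcloud command, converted to the map form
# expected under config.gce_cluster_config.metadata.
metadata = metadata_flag_to_dict("PIP_PACKAGES=google-cloud-storage spark-nlp==2.5.3")
print(metadata)
```

Calling create_cluster(project, "global", cluster_data) would then play the role of the gcloud command, with project and cluster_data defined as in the snippets above.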