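Storing output to GCS

Problem description

Hi, I am running papermill in Google Cloud Composer (managed Airflow), using PythonVirtualenvOperator. The source notebook is in Google Cloud Storage, and the executed notebook also needs to be written to a path in Google Cloud Storage. When I run papermill like this, I get the error: unexpected keyword argument 'min'.

Here is the code snippet: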

def getGCSObjects():
    import papermill as pm
    pm.execute_notebook(
        'gs://BUCKET/inputs/add.ipynb',
        'gs://BUCKET/inputs/add_out.ipynb',
        parameters=dict(alpha=0.6, ratio=0.1),
    )

list_gcs_files = PythonVirtualenvOperator(
    task_id='list_gcs_files',
    system_site_packages=True,
    python_version='3.6',
    requirements=[
        'gcsfs>=0.2.0'  # no comma here: Python joins this with the next string
        'papermill',
    ],
    dag=dag,
    python_callable=getGCSObjects,
)
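
A side note on the snippet above: the two entries in `requirements` are not separated by a comma, so Python's implicit string-literal concatenation merges them into a single requirement. That matches the `pip install` argument `gcsfs>=0.2.0papermill` that appears in the log. A minimal sketch of the difference (the variable names are only for illustration):

```python
# Adjacent string literals are concatenated by Python, even across lines.
requirements_as_posted = [
    'gcsfs>=0.2.0'
    'papermill',
]

# With the comma, the list holds two separate requirements.
requirements_with_comma = [
    'gcsfs>=0.2.0',
    'papermill',
]

print(requirements_as_posted)   # ['gcsfs>=0.2.0papermill']
print(requirements_with_comma)  # ['gcsfs>=0.2.0', 'papermill']
```

Adding the comma may not remove the `TypeError` by itself, since the traceback shows papermill being loaded from the system site-packages, but it at least makes the virtualenv install what was intended.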

Error output

[2021-06-30 09:14:17,905] {taskinstance.py:902} INFO - Executing <Task(PythonVirtualenvOperator): list_gcs_files> on 2021-06-30T00:00:00+00:00
[2021-06-30 09:14:19,489] {python_operator.py:316} INFO - Executing cmd
['virtualenv','/tmp/venvoyf919ht','--system-site-packages','--python=python3.6']
[2021-06-30 09:14:19,828] {python_operator.py:321} INFO - Got output
b'created virtual environment Cpython3.6.10.final.0-64 in 235ms\n  creator Cpython3Posix(dest=/tmp/venvoyf919ht,clear=False,no_vcs_ignore=False,global=True)\n  seeder FromAppData(download=False,pip=bundle,wheel=bundle,setuptools=bundle,via=copy,app_data_dir=/home/airflow/.local/share/virtualenv)\n    added seed packages: pip==20.2.4,setuptools==50.3.2,wheel==0.35.1\n  activators PythonActivator,FishActivator,XonshActivator,CShellActivator,PowerShellActivator,BashActivator\n'
[2021-06-30 09:14:19,831] {python_operator.py:316} INFO - Executing cmd
['/tmp/venvoyf919ht/bin/pip','install','gcsfs>=0.2.0papermill']
[2021-06-30 09:14:27,079] {python_operator.py:321} INFO - Got output
b'Requirement already satisfied: gcsfs>=0.2.0papermill in /opt/python3.6/lib/python3.6/site-packages (2021.6.1)\nRequirement already satisfied: aiohttp in /opt/python3.6/lib/python3.6/site-packages (from gcsfs>=0.2.0papermill) (3.7.4.post0)\nRequirement already satisfied: fsspec==2021.06.1 in /opt/python3.6/lib/python3.6/site-packages (from gcsfs>=0.2.0papermill) (2021.6.1)\nRequirement already satisfied: google-auth>=1.2 in /opt/python3.6/lib/python3.6/site-packages (from gcsfs>=0.2.0papermill) (1.24.0)\nRequirement already satisfied: google-auth-oauthlib in /opt/python3.6/lib/python3.6/site-packages (from gcsfs>=0.2.0papermill) (0.4.2)\nRequirement already satisfied: requests in /opt/python3.6/lib/python3.6/site-packages (from gcsfs>=0.2.0papermill) (2.25.0)\nRequirement already satisfied: decorator in /opt/python3.6/lib/python3.6/site-packages (from gcsfs>=0.2.0papermill) (5.0.9)\nRequirement already satisfied: yarl<2.0,>=1.0 in /opt/python3.6/lib/python3.6/site-packages (from aiohttp->gcsfs>=0.2.0papermill) (1.6.3)\nRequirement already satisfied: chardet<5.0,>=2.0 in /opt/python3.6/lib/python3.6/site-packages (from aiohttp->gcsfs>=0.2.0papermill) (3.0.4)\nRequirement already satisfied: async-timeout<4.0,>=3.0 in /opt/python3.6/lib/python3.6/site-packages (from aiohttp->gcsfs>=0.2.0papermill) (3.0.1)\nRequirement already satisfied: typing-extensions>=3.6.5 in /opt/python3.6/lib/python3.6/site-packages (from aiohttp->gcsfs>=0.2.0papermill) (3.7.4.3)\nRequirement already satisfied: attrs>=17.3.0 in /opt/python3.6/lib/python3.6/site-packages (from aiohttp->gcsfs>=0.2.0papermill) (20.3.0)\nRequirement already satisfied: idna-ssl>=1.0; python_version < "3.7" in /opt/python3.6/lib/python3.6/site-packages (from aiohttp->gcsfs>=0.2.0papermill) (1.1.0)\nRequirement already satisfied: multidict<7.0,>=4.5 in /opt/python3.6/lib/python3.6/site-packages (from aiohttp->gcsfs>=0.2.0papermill) (5.1.0)\nRequirement already satisfied: setuptools>=40.3.0 in 
/tmp/venvoyf919ht/lib/python3.6/site-packages (from google-auth>=1.2->gcsfs>=0.2.0papermill) (50.3.2)\nRequirement already satisfied: rsa<5,>=3.1.4; python_version >= "3.6" in /opt/python3.6/lib/python3.6/site-packages (from google-auth>=1.2->gcsfs>=0.2.0papermill) (4.6)\nRequirement already satisfied: pyasn1-modules>=0.2.1 in /opt/python3.6/lib/python3.6/site-packages (from google-auth>=1.2->gcsfs>=0.2.0papermill) (0.2.8)\nRequirement already satisfied: cachetools<5.0,>=2.0.0 in /opt/python3.6/lib/python3.6/site-packages (from google-auth>=1.2->gcsfs>=0.2.0papermill) (4.1.1)\nRequirement already satisfied: six>=1.9.0 in /opt/python3.6/lib/python3.6/site-packages (from google-auth>=1.2->gcsfs>=0.2.0papermill) (1.15.0)\nRequirement already satisfied: requests-oauthlib>=0.7.0 in /opt/python3.6/lib/python3.6/site-packages (from google-auth-oauthlib->gcsfs>=0.2.0papermill) (1.3.0)\nRequirement already satisfied: idna<3,>=2.5 in /opt/python3.6/lib/python3.6/site-packages (from requests->gcsfs>=0.2.0papermill) (2.8)\nRequirement already satisfied: certifi>=2017.4.17 in /opt/python3.6/lib/python3.6/site-packages (from requests->gcsfs>=0.2.0papermill) (2020.11.8)\nRequirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/python3.6/lib/python3.6/site-packages (from requests->gcsfs>=0.2.0papermill) (1.25.11)\nRequirement already satisfied: pyasn1>=0.1.3 in /opt/python3.6/lib/python3.6/site-packages (from rsa<5,>=3.1.4; python_version >= "3.6"->google-auth>=1.2->gcsfs>=0.2.0papermill) (0.4.8)\nRequirement already satisfied: oauthlib>=3.0.0 in /opt/python3.6/lib/python3.6/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib->gcsfs>=0.2.0papermill) (3.1.0)\n'
[2021-06-30 09:14:27,200] {python_operator.py:316} INFO - Executing cmd
['/tmp/venvoyf919ht/bin/python','/tmp/venvoyf919ht/script.py','/tmp/venvoyf919ht/script.in','/tmp/venvoyf919ht/script.out','/tmp/venvoyf919ht/string_args.txt']
[2021-06-30 09:14:28,919] {python_operator.py:323} INFO - Got error output
b'Input notebook does not contain a cell with tag \'parameters\'\n\rExecuting:   0%|          | 0/4 [00:00<?,?cell/s]Traceback (most recent call last):\n  File "/tmp/venvoyf919ht/script.py",line 16,in <module>\n    res = getGCSObjects(*args,**kwargs)\n  File "/tmp/venvoyf919ht/script.py",line 13,in getGCSObjects\n    parameters=dict(alpha=0.6,ratio=0.1)\n  File "/opt/python3.6/lib/python3.6/site-packages/papermill/execute.py",line 118,in execute_notebook\n    **engine_kwargs\n  File "/opt/python3.6/lib/python3.6/site-packages/papermill/engines.py",line 49,in execute_notebook_with_engine\n    return self.get_engine(engine_name).execute_notebook(nb,kernel_name,**kwargs)\n  File "/opt/python3.6/lib/python3.6/site-packages/papermill/engines.py",line 341,in execute_notebook\n    nb_man.notebook_start()\n  File "/opt/python3.6/lib/python3.6/site-packages/papermill/engines.py",line 69,in wrapper\n    return func(self,*args,line 198,in notebook_start\n    self.save()\n  File "/opt/python3.6/lib/python3.6/site-packages/papermill/engines.py",line 139,in save\n    write_ipynb(self.nb,self.output_path)\n  File "/opt/python3.6/lib/python3.6/site-packages/papermill/iorw.py",line 397,in write_ipynb\n    papermill_io.write(nbformat.writes(nb),path)\n  File "/opt/python3.6/lib/python3.6/site-packages/papermill/iorw.py",line 128,in write\n    return self.get_handler(path).write(buf,line 316,in write\n    multiplier=self.RETRY_MULTIPLIER,min=self.RETRY_DELAY,max=self.RETRY_MAX_DELAY\nTypeError: __init__() got an unexpected keyword argument \'min\'\n\rExecuting:   0%|          | 0/4 [00:00<?,?cell/s]\n'
[2021-06-30 09:14:28,970] {taskinstance.py:1152} ERROR - Command '['/tmp/venvoyf919ht/bin/python','/tmp/venvoyf919ht/string_args.txt']' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/usr/local/lib/airflow/airflow/models/taskinstance.py",line 985,in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/airflow/airflow/operators/python_operator.py",line 113,in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/airflow/airflow/operators/python_operator.py",line 307,in execute_callable
    string_args_filename))
  File "/usr/local/lib/airflow/airflow/operators/python_operator.py",line 319,in _execute_in_subprocess
    close_fds=True)
  File "/opt/python3.6/lib/python3.6/subprocess.py",line 356,in check_output
    **kwargs).stdout
  File "/opt/python3.6/lib/python3.6/subprocess.py",line 438,in run
    output=stdout,stderr=stderr)
subprocess.CalledProcessError: Command '['/tmp/venvoyf919ht/bin/python','/tmp/venvoyf919ht/string_args.txt']' returned non-zero exit status 1.
[2021-06-30 09:14:28,974] {taskinstance.py:1196} INFO - Marking task as Failed. dag_id=papermill_run_notebook_v0.1,task_id=list_gcs_files,execution_date=20210630T000000,start_date=20210630T091417,end_date=20210630T091428
Traceback (most recent call last):
  File "/usr/local/bin/airflow",line 7,in <module>
    exec(compile(f.read(),__file__,'exec'))
  File "/usr/local/lib/airflow/airflow/bin/airflow",line 37,in <module>
    args.func(args)
  File "/usr/local/lib/airflow/airflow/utils/cli.py",line 233,in wrapper
    func(args)
  File "/usr/local/lib/airflow/airflow/utils/cli.py",line 81,in wrapper
    return f(*args,**kwargs)
  File "/usr/local/lib/airflow/airflow/bin/cli.py",line 814,in test
    ti.run(ignore_task_deps=True,ignore_ti_state=True,test_mode=True)
  File "/usr/local/lib/airflow/airflow/utils/db.py",line 74,in wrapper
    return func(*args,**kwargs)
  File "/usr/local/lib/airflow/airflow/models/taskinstance.py",line 1109,in run
    session=session)
  File "/usr/local/lib/airflow/airflow/utils/db.py",line 70,'/tmp/venvoyf919ht/string_args.txt']' returned non-zero exit status 1.

ERROR: (gcloud.composer.environments.run) kubectl returned non-zero status code.

Any help would be greatly appreciated, thanks.

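Solution

I just had something similar happen. For me, the error came from invalid paths for the input and output notebooks. Once I created a separate folder in the bucket that contains my DAGs and moved my notebooks there, it worked. You should be able to change the execute block to something like this: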

pm.execute_notebook(
    r"/home/airflow/gcs/notebooks/notebook.ipynb",  # input
    r"/home/airflow/gcs/notebooks/notebook.ipynb",  # output (same path as posted)
    parameters=dict(alpha=0.6, ratio=0.1))

Here /home/airflow/gcs/dags contains your DAGs; you would create a notebooks directory next to it and move your notebooks there.
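
The paths above rely on the environment's bucket being visible on the worker under /home/airflow/gcs, so the notebook can be addressed as a local file instead of a `gs://` URL. A small sketch of that mapping, assuming the notebook lives in the Composer environment's own bucket; `to_composer_path` and `COMPOSER_MOUNT` are hypothetical names, not part of papermill or Airflow:

```python
import posixpath
from urllib.parse import urlparse

# Assumption: the mount point used in the paths above.
COMPOSER_MOUNT = "/home/airflow/gcs"


def to_composer_path(gs_url, mount=COMPOSER_MOUNT):
    """Translate gs://<bucket>/<object> into <mount>/<object>.

    The bucket name is dropped, which is only valid when the URL
    points into the Composer environment's bucket.
    """
    parsed = urlparse(gs_url)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URL: {gs_url!r}")
    return posixpath.join(mount, parsed.path.lstrip("/"))


print(to_composer_path("gs://BUCKET/notebooks/notebook.ipynb"))
# /home/airflow/gcs/notebooks/notebook.ipynb
```

The result could then be passed to `pm.execute_notebook` in place of the `gs://` URL.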

As someone commented, this looks like a duplicate of "Airflow Error - got an unexpected keyword argument 'min'". Hopefully this explains it a bit better and solves your problem.
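
The traceback in the question ends inside papermill's `iorw.py`, at a call that passes `multiplier=`, `min=` and `max=`. One plausible cause, which I have not verified against this environment, is an older `tenacity` release whose `wait_exponential` does not accept `min`. A sketch of how you might check that from inside the task; `accepts_keyword` and `tenacity_supports_min` are hypothetical helpers:

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with keyword argument `name`."""
    params = inspect.signature(func).parameters
    param = params.get(name)
    if param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def tenacity_supports_min():
    """Return True or False, or None when tenacity is not installed."""
    try:
        import tenacity
    except ImportError:
        return None
    return accepts_keyword(tenacity.wait_exponential, "min")


print("wait_exponential accepts min:", tenacity_supports_min())
```

If this prints `False`, the installed `tenacity` is a likely candidate for the error; using local paths as described above avoids the GCS write path where the traceback ends.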