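Apache Airflow and Apache Atlas connection timeout

Problem

I am running Apache Airflow in AWS ECS and Apache Atlas on EC2. I have been able to connect a local instance of Apache Airflow to Apache Atlas on EC2; however, I cannot connect my AWS ECS instance to the EC2 instance. When an Airflow task in a DAG tries to push information to Apache Atlas, I get the following error: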

[2021-02-18 18:49:37,301] {connectionpool.py:752} WARNING - Retrying (Retry(total=4,connect=None,read=None,redirect=None,status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7fb1e2e87410>,'Connection to <ip-address> timed out. (connect timeout=10)')': /api/atlas/v2/types/typedefs
[2021-02-18 18:49:47,302] {connectionpool.py:752} WARNING - Retrying (Retry(total=3,status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7fb1e2e87b10>,'Connection to <ip-address> timed out. (connect timeout=10)')': /api/atlas/v2/types/typedefs
[2021-02-18 18:49:57,311] {connectionpool.py:752} WARNING - Retrying (Retry(total=2,status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7fb1e2e9f190>,'Connection to <ip-address> timed out. (connect timeout=10)')': /api/atlas/v2/types/typedefs
[2021-02-18 18:50:07,319] {connectionpool.py:752} WARNING - Retrying (Retry(total=1,status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7fb1e2e9f7d0>,'Connection to <ip-address> timed out. (connect timeout=10)')': /api/atlas/v2/types/typedefs
[2021-02-18 18:50:17,327] {connectionpool.py:752} WARNING - Retrying (Retry(total=0,status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7fb1e2e9fe10>,'Connection to <ip-address> timed out. (connect timeout=10)')': /api/atlas/v2/types/typedefs
[2021-02-18 18:50:27,338] {taskinstance.py:1150} ERROR - HTTPConnectionPool(host='<ip-address>', port=21000): Max retries exceeded with url: /api/atlas/v2/types/typedefs (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7fb1e2ea3490>,'Connection to <ip-address> timed out. (connect timeout=10)'))

Edit: posting the code as requested

airflow.cfg configuration

[lineage]
backend = airflow.lineage.backend.atlas.AtlasBackend

[atlas]
host = <ip-address>
port = 21000
username = admin
password = <password>
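
The timeouts in the log are raised against whatever address the `host` entry holds, so it can help to check whether that value is a private or a public IP. Below is a minimal sketch using only the Python standard library. The function name `atlas_host_is_private` and the sample address `10.0.1.25` are hypothetical, and the sketch assumes `host` is a literal IP address rather than a DNS name.

```python
import configparser
import ipaddress

# Hypothetical sample mirroring the [atlas] section above; the address is made up.
SAMPLE_CFG = """
[atlas]
host = 10.0.1.25
port = 21000
"""


def atlas_host_is_private(cfg_text: str) -> bool:
    """Return True if the [atlas] host entry is a private (RFC 1918 style) IP."""
    # Interpolation is disabled so that a '%' in other values (for example a
    # password) does not raise an interpolation error while parsing.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    host = parser.get("atlas", "host")
    # ip_address raises ValueError if host is a hostname instead of an IP literal.
    return ipaddress.ip_address(host).is_private
```

Calling `atlas_host_is_private(open("airflow.cfg").read())` would apply the same check to a real config file, assuming the file sits in the current working directory.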

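Solution

I was able to resolve the issue by setting the host to the private IP address, rather than the public IP address, of the EC2 instance running Atlas. I also had to update the inbound rules of the security group for the EC2 instance running Apache Atlas to allow traffic from the private IP address of the Airflow webserver.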
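
To confirm that the security group change took effect, a plain TCP connection test run from inside the Airflow container can separate network problems from Atlas or Airflow problems. This is only a sketch under stated assumptions: the helper name `can_connect` is hypothetical, and the 10-second default simply mirrors the `connect timeout=10` shown in the log.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False
```

For example, `can_connect("<private-ip>", 21000)` returning `False` from the ECS task while returning `True` from a machine that is allowed by the security group would point at the inbound rules rather than at the Atlas configuration.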