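Unable to register an external k8s cluster to Anthos - 'PermissionDenied'

Problem description

After successfully attaching a GKE cluster from a different GCloud project to Anthos in the current project, I tried to attach an external k8s cluster (on-premises; Rancher) to the same Anthos environment.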

  1. Created a GCloud service account

gcloud iam service-accounts create ${SERVICE_ACCOUNT_NAME} --project=${HUB_PROJECT_ID}

  2. Created a GCloud IAM policy binding
gcloud projects add-iam-policy-binding ${HUB_PROJECT_ID} \
 --member="serviceAccount:${SERVICE_ACCOUNT_NAME}@${HUB_PROJECT_ID}.iam.gserviceaccount.com" \
 --role="roles/gkehub.connect" \
 --condition "expression=resource.name == \
'projects/${HUB_PROJECT_ID}/locations/global/memberships/${MEMBERSHIP_NAME}',\
title=bind-${SERVICE_ACCOUNT_NAME}-to-${MEMBERSHIP_NAME}"

I confirmed that the SA has roles/gkehub.connect:

  condition:
    expression: resource.name == 'projects/[project-id]/locations/global/memberships/[membership-name]'
    title: [TITLE]
  members:
    - serviceAccount:[SERVICE_ACCOUNT_NAME]@[HUB_PROJECT_ID].iam.gserviceaccount.com
  role: roles/gkehub.connect
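
The binding shown above can also be cross-checked from the command line. The following is a minimal sketch, not a definitive procedure: the helper names `expected_member`, `expected_resource`, and `show_connect_bindings` are hypothetical, and the variables are assumed to be the same ones used in the earlier steps. It prints the exact strings the conditional binding should contain, so they can be compared character by character with what the project policy holds.

```shell
#!/usr/bin/env bash
# Sketch: compare the strings the conditional binding should contain with
# what the project policy actually holds. All helper names are hypothetical.

# Build the member string passed to --member.
expected_member() {
  printf 'serviceAccount:%s@%s.iam.gserviceaccount.com\n' "$1" "$2"
}

# Build the resource name that the condition expression compares against.
expected_resource() {
  printf 'projects/%s/locations/global/memberships/%s\n' "$1" "$2"
}

# List every role (and condition expression, if any) bound to the service account.
# Requires gcloud and permission to read the project's IAM policy.
show_connect_bindings() {
  local project="$1" sa="$2"
  gcloud projects get-iam-policy "$project" \
    --flatten="bindings[].members" \
    --filter="bindings.members:$(expected_member "$sa" "$project")" \
    --format="table(bindings.role, bindings.condition.expression)"
}
```

For example, `show_connect_bindings "${HUB_PROJECT_ID}" "${SERVICE_ACCOUNT_NAME}"` lists the bindings, and `expected_resource "${HUB_PROJECT_ID}" "${MEMBERSHIP_NAME}"` prints the resource name to compare against the condition expression in that output.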
  3. Registered the cluster
gcloud container hub memberships register [MY-MEMBERSHIP] \
--service-account-key-file=/path/to/[SA-KEYFILE].json \
--kubeconfig=/path/to/[KUBE.YAML] \
--context=[MY-CONTEXT] \
--project=[HUB_PROJECT_ID] \
--proxy=http://[MY-PROXY]:[MY-PORT]
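
Since the registration relies on a service account key file, one inexpensive sanity check is to confirm that the key file really belongs to the service account that received the binding. The sketch below assumes a standard JSON key as downloaded from GCloud, which carries a `client_email` field; the helper name `key_file_email` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the client_email stored in a service account JSON key file.
# key_file_email is a hypothetical helper; it assumes the standard key layout
# with a "client_email" field and does no full JSON parsing.
key_file_email() {
  sed -n 's/.*"client_email"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}
```

Running `key_file_email /path/to/[SA-KEYFILE].json` should print the same address that appears after `serviceAccount:` in the policy binding. If it prints a different account, the agent is authenticating as an identity that does not hold the role.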

At this point, gcloud container hub memberships list shows the cluster as registered.

However, the Anthos UI shows its status as "Unknown".


In addition, the gke-connect pod is throwing PermissionDenied errors:

$ k logs -f gke-connect-agent-20210326-00-00-7bd487f46f-frq9l
2021/03/31 20:09:38.304981 gkeconnect_agent.go:39: GKE Connect Agent. Log timestamps in UTC.
2021/03/31 20:09:38.305064 gkeconnect_agent.go:40:
Built on: 2021-03-26 07:40:06 +0000 UTC
Built at: 365197196
Build Status: mint
Build Label: 20210326-00-00
2021/03/31 20:09:38.330249 environment.go:216: Got ExternalID [XYZ] from namespace kube-system.
2021/03/31 20:09:38.332193 environment.go:493: Using GCP Service Account key
2021/03/31 20:09:38.334169 agent.go:157: Using agent version: "20210326-00-00"
2021/03/31 20:09:38.334316 environment.go:421: Loading Endpoint Configs...
2021/03/31 20:09:38.334320 gkeconnect_agent.go:59: Starting HTTP server on ":8080".
2021/03/31 20:09:38.337643 agent.go:227: opening tunnel to gkeconnect.googleapis.com:443...
2021/03/31 20:09:38.337913 tunnel.go:245: serve: connected to backend
2021/03/31 20:09:38.337942 dialer.go:183: dialer: waiting for next event,outstanding connections=0
2021/03/31 20:09:38.337960 dialer.go:183: dialer: waiting for next event,outstanding connections=0
2021/03/31 20:09:38.337976 dialer.go:264: dialer: dial: connecting to gkeconnect.googleapis.com:443...
2021/03/31 20:09:38.338823 agent.go:281: Starting watch on secrets in namespace "gke-connect"...
2021/03/31 20:09:38.412879 dialer.go:275: dialer: dial: connected to gkeconnect.googleapis.com:443
2021/03/31 20:09:38.412894 tunnel.go:313: serve: opening egress stream...
2021/03/31 20:09:38.413018 dialer.go:225: Dial successful,current connections: 1
2021/03/31 20:09:38.458756 tunnel.go:321: serve: registering project_number="28431226446",connection_id="[MY-MEMBERSHIP-ID]" connection_class="DEFAULT" agent_version="20210326-00-00" ...
2021/03/31 20:09:38.585243 tunnel.go:370: serve: recv error: rpc error: code = PermissionDenied desc = The caller does not have permission   <-- ??????
2021/03/31 20:09:38.585360 dialer.go:277: dialer: dial: connection to gkeconnect.googleapis.com:443 Failed after 247.373126ms: serve: receive request Failed: rpc error: code = PermissionDenied desc = The caller does not have permission
2021/03/31 20:09:38.590399 dialer.go:207: dialer: connection done: serve: receive request Failed: rpc error: code = PermissionDenied desc = The caller does not have permission
2021/03/31 20:09:38.590416 dialer.go:295: dialer: backoff: 720.254544ms
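
To see whether the error in the log above recurs on every reconnect attempt, the agent output can be filtered rather than read line by line. This is a minimal sketch: `count_permission_denied` is a hypothetical helper that reads log text on stdin and counts the lines carrying the gRPC `PermissionDenied` code.

```shell
#!/usr/bin/env bash
# Sketch: count Connect Agent log lines that report PermissionDenied.
# count_permission_denied is a hypothetical helper; it reads log text on stdin.
count_permission_denied() {
  # grep -c exits non-zero when nothing matches, so keep the exit status clean.
  grep -c 'code = PermissionDenied' || true
}

# Example usage (assumes the agent pods carry the label app=gke-connect-agent):
#   kubectl logs -n gke-connect -l app=gke-connect-agent --tail=200 \
#     | count_permission_denied
```

A count that keeps growing while the dial itself succeeds would be consistent with an authorization problem rather than a connectivity one.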

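I followed the troubleshooting guide and double-checked that the service account has the necessary permissions: gkehub.connect

I can ping between the two clusters in both directions, so as far as I can tell there is no network problem. At the very least, "PermissionDenied" does not point to a network problem.

I have now gone through the whole procedure twice, and the behavior is consistent. The error message is not very helpful, so I would appreciate any help.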
