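Problem description
I have a Kubernetes cluster on AWS deployed with Kops, not EKS (too expensive for now). The CNI is Calico. I have 4 nodes:
- 1 master, t3a.medium
- 2 small workers, t2.micro
- 1 bigger worker, t3a.medium
The nodes are labelled.
I have a Gitlab Kubernetes runner installed from a Helm chart, and it is configured to run on the big worker. It is also configured to spawn its job pods on the big worker.
It works well, but after a few days I noticed that sometimes (if not every time) the first job of a pipeline causes an error in the calico-node pod running on the big node. The error is: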
bird: Mesh_10_1_1_136: Socket error: bind: Address not available
bird: Mesh_10_1_1_212: Socket error: bind: Address not available
bird: Mesh_10_1_1_14: Socket error: bind: Address not available
10.1.1.136, 10.1.1.212 and 10.1.1.14 are the IPs of the master and the 2 smaller nodes. The big node's IP never appears.
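The peer names in these errors appear to encode the peer address with underscores (Mesh_10_1_1_136 for 10.1.1.136). As a minimal sketch under that assumption, the following Python helper lists which peers reported the bind failure. The function name `peers_with_bind_errors` is hypothetical, and the sample lines are the ones quoted above.

```python
import re

# Matches BIRD lines such as:
#   bird: Mesh_10_1_1_136: Socket error: bind: Address not available
# Assumption: the four numeric groups after "Mesh_" are the peer's IPv4 octets.
BIND_ERROR = re.compile(
    r"^bird: Mesh_(\d+)_(\d+)_(\d+)_(\d+): "
    r"Socket error: bind: Address not available$"
)


def peers_with_bind_errors(log_lines):
    """Return peer IPs, in first-seen order, whose sessions failed to bind."""
    seen = []
    for line in log_lines:
        match = BIND_ERROR.match(line.strip())
        if match:
            ip = ".".join(match.groups())
            if ip not in seen:
                seen.append(ip)
    return seen


if __name__ == "__main__":
    sample = [
        "bird: Mesh_10_1_1_136: Socket error: bind: Address not available",
        "bird: Mesh_10_1_1_212: Socket error: bind: Address not available",
        "bird: Mesh_10_1_1_14: Socket error: bind: Address not available",
    ]
    print(peers_with_bind_errors(sample))
```

Feeding it the output of `kubectl logs` for the calico-node pod, split into lines, should make it easy to confirm that every mesh peer is affected and not just one.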
So my questions are:
- What is happening?
- What can I do to prevent this calico pod from erroring?
Thanks a lot in advance. Cheers.
[EDIT] I found these lines in the calico-node logs:
2021-05-24 22:43:26.881 [INFO][49] monitor-addresses/startup.go 576: Node IPv4 changed, will check for conflicts
2021-05-24 22:43:26.907 [WARNING][49] monitor-addresses/startup.go 1107: IPv4 address has changed. This could happen if there are multiple nodes with the same name. node="ip-10-1-1-198.eu-west-1.compute.internal" original="10.1.1.198" updated="192.168.0.1"
2021-05-24 22:43:26.936 [INFO][45] confd/client.go 877: Recompute BGP peerings: HostBGPConfig(node=ip-10-1-1-198.eu-west-1.compute.internal; name=ip_addr_v4) updated; HostBGPConfig(node=ip-10-1-1-198.eu-west-1.compute.internal; name=network_v4) updated
2021-05-24 22:43:26.937 [INFO][53] felix/int_dataplane.go 1325: Received *proto.HostMetadataUpdate update from calculation graph msg=hostname:"ip-10-1-1-198.eu-west-1.compute.internal" ipv4_addr:"192.168.0.1"
2021-05-24 22:43:26.937 [INFO][53] felix/int_dataplane.go 1453: Applying dataplane updates
2021-05-24 22:43:26.937 [INFO][53] felix/ipip_mgr.go 222: All-hosts IP set out-of sync, refreshing it.
2021-05-24 22:43:26.937 [INFO][53] felix/ipsets.go 119: queueing IP set for creation family="inet" setID="all-hosts-net" setType="hash:net"
2021-05-24 22:43:26.941 [INFO][49] monitor-addresses/startup.go 308: Updated node IP addresses
2021-05-24 22:43:26.951 [INFO][53] felix/ipsets.go 749: Doing full IP set rewrite family="inet" numMembersInPendingReplace=4 setID="all-hosts-net"
bird: Mesh_10_1_1_212: Received: Peer de-configured
bird: Mesh_10_1_1_212: State changed to stop
bird: Mesh_10_1_1_212: State changed to down
bird: Mesh_10_1_1_212: Starting
bird: Mesh_10_1_1_212: State changed to start
2021-05-24 22:43:26.971 [INFO][53] felix/int_dataplane.go 1467: Finished applying updates to dataplane. msecToApply=33.685021
bird: Reconfiguration requested by SIGHUP
bird: Reconfiguring
bird: device1: Reconfigured
bird: direct1: Reconfigured
bird: Reconfigured
2021-05-24 22:43:26.984 [INFO][45] confd/resource.go 277: Target config /etc/calico/confd/config/bird6.cfg has been updated due to change in key: /calico/bgp/v1/host
bird: Reconfiguration requested by SIGHUP
bird: Reconfiguring
bird: device1: Reconfigured
bird: direct1: Reconfigured
bird: Restarting protocol Mesh_10_1_1_136
bird: Mesh_10_1_1_136: Shutting down
bird: Mesh_10_1_1_136: State changed to stop
bird: Restarting protocol Mesh_10_1_1_14
bird: Mesh_10_1_1_14: Shutting down
bird: Mesh_10_1_1_14: State changed to stop
bird: Restarting protocol Mesh_10_1_1_212
bird: Mesh_10_1_1_212: Shutting down
bird: Mesh_10_1_1_212: State changed to stop
bird: Mesh_10_1_1_212: State changed to down
bird: Mesh_10_1_1_212: Initializing
bird: Mesh_10_1_1_212: Starting
bird: Mesh_10_1_1_212: State changed to start
bird: Mesh_10_1_1_136: State changed to down
bird: Mesh_10_1_1_136: Initializing
bird: Mesh_10_1_1_136: Starting
bird: Mesh_10_1_1_136: State changed to start
bird: Mesh_10_1_1_14: State changed to down
bird: Mesh_10_1_1_14: Initializing
bird: Mesh_10_1_1_14: Starting
bird: Mesh_10_1_1_14: State changed to start
bird: Reconfigured
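The WARNING line in this excerpt shows calico-node replacing the node address 10.1.1.198 with 192.168.0.1 just before BIRD restarts every mesh session. To spot this kind of change in a longer log, a small sketch along these lines may help. The function names and the 10.1.1.0/24 subnet are assumptions for illustration (the subnet is only inferred from the node IPs in the question), not something Calico provides.

```python
import ipaddress
import re

# Matches the calico-node warning quoted above and captures its key=value fields.
ADDRESS_CHANGE = re.compile(
    r'IPv4 address has changed\..*?'
    r'node="(?P<node>[^"]+)"\s+'
    r'original="(?P<original>[^"]+)"\s+'
    r'updated="(?P<updated>[^"]+)"'
)


def find_address_changes(log_lines):
    """Return (node, original_ip, updated_ip) for each address-change warning."""
    changes = []
    for line in log_lines:
        match = ADDRESS_CHANGE.search(line)
        if match:
            changes.append(
                (match.group("node"), match.group("original"), match.group("updated"))
            )
    return changes


def outside_subnet(changes, cidr):
    """Keep only the changes whose updated IP is not inside the given subnet."""
    network = ipaddress.ip_network(cidr)
    return [c for c in changes if ipaddress.ip_address(c[2]) not in network]


if __name__ == "__main__":
    sample = [
        '2021-05-24 22:43:26.907 [WARNING][49] monitor-addresses/startup.go 1107: '
        'IPv4 address has changed. This could happen if there are multiple nodes '
        'with the same name. node="ip-10-1-1-198.eu-west-1.compute.internal" '
        'original="10.1.1.198" updated="192.168.0.1"',
    ]
    # 10.1.1.0/24 is an assumed node subnet, used here only as an example.
    for node, old, new in outside_subnet(find_address_changes(sample), "10.1.1.0/24"):
        print(f"{node}: {old} -> {new} (outside the expected subnet)")
```

An updated address outside the node subnet suggests that calico-node picked up an interface other than the node's primary one. Which interface that is cannot be determined from this log alone.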
[EDIT 2] The gitlab step that triggers this error is the following:
integrationtesting:
  tags:
    - kubernetes
  image: docker/compose:alpine-1.29.2
  stage: tests
  before_script:
    - echo "NPM_TOKEN=$NPM_TOKEN" > test_integ/dependencies/.env
    - docker-compose -f test_integ/dependencies/docker-compose.yaml up --build -d
  script:
    - docker-compose -f test_integ/dependencies/tester-compose.yaml up --build --abort-on-container-exit --exit-code-from tester
  after_script:
    - docker-compose -f test_integ/dependencies/docker-compose.yaml -f test_integ/dependencies/tester-compose.yaml down
With:
docker-compose.yaml
version: "3.9"
networks:
  testinteg:
    name: testinteg
services:
  mongosrv:
    container_name: "mongosrv"
    image: mongo
    networks:
      - testinteg
  users:
    container_name: "users"
    build:
      context: "../.."
      dockerfile: "Dockerfile"
      target: run
      args:
        NPM_TOKEN: "${NPM_TOKEN}"
      network: host
    environment:
      NODE_ENV: "dev"
      PORT: 80
      LOG_LEVEL: "debug"
      LOG_FORMAT: "splat,simple"
      PASSWORD_JWT_SECRET: "anothersecurestring"
      PASSWORD_JWT_TTL: "30s"
      SSL_ENABLED: "false"
      MOCK_DB: "false"
      MONGO_DB: "users"
      MONGO_HOST: "mongosrv"
    depends_on:
      - mongosrv
    networks:
      - testinteg
tester-compose.yaml
version: "3.9"
networks:
  testinteg:
    name: testinteg
services:
  tester:
    container_name: "tester"
    build:
      context: "../.."
      dockerfile: "Dockerfile"
      target: testinteg
      args:
        NPM_TOKEN: "${NPM_TOKEN}"
      network: host
    environment:
      MSHOST: users
      MSPORT: 80
    volumes:
      - ../tests:/app/test_integ/tests
    networks:
      - testinteg
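One last piece of information: the Dockerfile runs npm ci against a private npm registry hosted on Jfrog Artifactory. Without the network: host option in the build section, the registry's domain cannot be resolved (a Docker-in-Docker issue).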