问题描述
两天以来,我一直在与 Ubuntu 20.04 上的 Kubernetes 设置作斗争。我在 vSphere 上创建了所谓的模板虚拟机,并从中克隆了三个虚拟机。
我对每个主节点有以下配置:
/etc/hosts
127.0.0.1 localhost
127.0.1.1 kubernetes-master1
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
192.168.255.200 kubernetes-cluster.homelab01.local
192.168.255.201 kubernetes-master1.homelab01.local
192.168.255.202 kubernetes-master2.homelab01.local
192.168.255.203 kubernetes-master3.homelab01.local
192.168.255.204 kubernetes-worker1.homelab01.local
192.168.255.205 kubernetes-worker2.homelab01.local
192.168.255.206 kubernetes-worker3.homelab01.local
127.0.1.1 kubernetes-master1
在第一个母版上,127.0.1.1 kubernetes-master2
在第二个上,127.0.1.1 kubernetes-master3
在第三个上。
我正在使用 Docker 19.03.11,这是 Kubernetes 根据文档最新支持的。
码头工人
Client: Docker Engine - Community
Version: 19.03.11
API version: 1.40
Go version: go1.13.10
Git commit: 42e35e61f3
Built: Mon Jun 1 09:12:34 2020
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.11
API version: 1.40 (minimum version 1.12)
Go version: go1.13.10
Git commit: 42e35e61f3
Built: Mon Jun 1 09:11:07 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.13
GitCommit: 7ad184331fa3e55e52b890ea95e65ba581ae3429
runc:
Version: 1.0.0-rc10
GitCommit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
docker-init:
Version: 0.18.0
GitCommit: fec3683
我使用以下命令安装docker:
sudo apt-get update && sudo apt-get install -y \
containerd.io=1.2.13-2 \
docker-ce=5:19.03.11~3-0~ubuntu-$(lsb_release -cs) \
docker-ce-cli=5:19.03.11~3-0~ubuntu-$(lsb_release -cs)
我将所有必要的数据包标记为暂停。
sudo apt-mark hold kubelet kubeadm kubectl docker-ce containerd.io docker-ce-cli
有关 VM 的一些详细信息。
大师1sudo cat /sys/class/dmi/id/product_uuid
f09c3242-c8f7-c97e-bc6a-b2065c286ea9
IP: 192.168.255.201
大师2
sudo cat /sys/class/dmi/id/product_uuid
b4fe3242-ba37-a533-c12f-b30b735cbe9f
IP: 192.168.255.202
大师3
sudo cat /sys/class/dmi/id/product_uuid
c3cc3242-4115-8c38-8e46-166190620249
IP: 192.168.255.203
IP 地址和名称解析在所有主机上都能完美运行
192.168.255.200 kubernetes-cluster.homelab01.local
192.168.255.201 kubernetes-master1.homelab01.local
192.168.255.202 kubernetes-master2.homelab01.local
192.168.255.203 kubernetes-master3.homelab01.local
192.168.255.204 kubernetes-worker1.homelab01.local
192.168.255.205 kubernetes-worker2.homelab01.local
192.168.255.206 kubernetes-worker3.homelab01.local
Keepalived.conf
来自 master1。在 master2 上它有 state=backup 和优先级 100,在 master3 上 state=backup 和优先级 89。
! /etc/keepalived/keepalived.conf
! Configuration File for keepalived
$STATE=MASTER
$INTERFACE=ens160
$ROUTER_ID=51
$PRIORITY=255
$AUTH_PASS=Kub3rn3t3S!
$APISERVER_VIP=192.168.255.200/24
global_defs {
router_id LVS_DEVEL
}
vrrp_script check_apiserver {
script "/etc/keepalived/check_apiserver.sh"
interval 3
weight -2
fall 10
rise 2
}
vrrp_instance VI_1 {
state $STATE
interface $INTERFACE
virtual_router_id $ROUTER_ID
priority $PRIORITY
authentication {
auth_type PASS
auth_pass $AUTH_PASS
}
virtual_ipaddress {
$APISERVER_VIP
}
track_script {
check_apiserver
}
}
check_apiserver.sh
/etc/keepalived/check_apiserver.sh
#!/bin/sh
APISERVER_VIP=192.168.255.200
APISERVER_DEST_PORT=6443
errorExit() {
echo "*** $*" 1>&2
exit 1
}
curl --silent --max-time 2 --insecure https://localhost:${APISERVER_DEST_PORT}/ -o /dev/null || errorExit "Error GET https://localhost:${APISERVER_DEST_PORT}/"
if ip addr | grep -q ${APISERVER_VIP}; then
curl --silent --max-time 2 --insecure https://${APISERVER_VIP}:${APISERVER_DEST_PORT}/ -o /dev/null || errorExit "Error GET https://${APISERVER_VIP}:${APISERVER_DEST_PORT}/"
fi
保活服务
sudo service keepalived status
● keepalived.service - Keepalive Daemon (LVS and VRRP)
Loaded: loaded (/lib/systemd/system/keepalived.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2021-01-06 16:41:38 CET; 1min 26s ago
Main PID: 804 (keepalived)
Tasks: 2 (limit: 4620)
Memory: 4.7M
CGroup: /system.slice/keepalived.service
├─804 /usr/sbin/keepalived --dont-fork
└─840 /usr/sbin/keepalived --dont-fork
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: Registering Kernel netlink reflector
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: Registering Kernel netlink command channel
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: opening file '/etc/keepalived/keepalived.conf>
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: WARNING - default user 'keepalived_script' fo>
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: (Line 29) Truncating auth_pass to 8 characters
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: Security VIOLATION - scripts are being execut>
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: (VI_1) ignoring tracked script check_apiserve>
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: Warning - script check_apiserver is not used
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: Registering gratuitous ARP shared channel
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: (VI_1) Entering MASTER STATE
lines 1-20/20 (END)
haproxy.cfg
# /etc/haproxy/haproxy.cfg
#
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
log /dev/log local0
log /dev/log local1 notice
daemon
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
mode http
log global
option httplog
option dontlognull
option http-server-close
option forwardfor except 127.0.0.0/8
option redispatch
retries 1
timeout http-request 10s
timeout queue 20s
timeout connect 5s
timeout client 20s
timeout server 20s
timeout http-keep-alive 10s
timeout check 10s
#---------------------------------------------------------------------
# apiserver frontend which proxys to the masters
#---------------------------------------------------------------------
frontend apiserver
bind *:8443
mode tcp
option tcplog
default_backend apiserver
#---------------------------------------------------------------------
# round robin balancing for apiserver
#---------------------------------------------------------------------
backend apiserver
option httpchk GET /healthz
http-check expect status 200
mode tcp
option ssl-hello-chk
balance roundrobin
server kubernetes-master1 192.168.255.201:6443 check
server kubernetes-master2 192.168.255.202:6443 check
server kubernetes-master3 192.168.255.203:6443 check
haproxy 服务状态
sudo service haproxy status
● haproxy.service - HAProxy Load Balancer
Loaded: loaded (/lib/systemd/system/haproxy.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2021-01-06 16:41:38 CET; 3min 12s ago
Docs: man:haproxy(1)
file:/usr/share/doc/haproxy/configuration.txt.gz
Process: 847 ExecStartPre=/usr/sbin/haproxy -f $CONfig -c -q $EXTRAOPTS (code=exited,status=0/SUC>
Main PID: 849 (haproxy)
Tasks: 3 (limit: 4620)
Memory: 4.7M
CGroup: /system.slice/haproxy.service
├─849 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -S /run/hapro>
└─856 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -S /run/hapro>
Jan 06 16:41:38 kubernetes-master1 haproxy[856]: Server apiserver/kubernetes-master1 is DOWN,reason: >
Jan 06 16:41:39 kubernetes-master1 haproxy[856]: [WARNING] 005/164139 (856) : Server apiserver/kuberne>
Jan 06 16:41:39 kubernetes-master1 haproxy[856]: Server apiserver/kubernetes-master2 is DOWN,reason: >
Jan 06 16:41:39 kubernetes-master1 haproxy[856]: Server apiserver/kubernetes-master2 is DOWN,reason: >
Jan 06 16:41:39 kubernetes-master1 haproxy[856]: [WARNING] 005/164139 (856) : Server apiserver/kuberne>
Jan 06 16:41:39 kubernetes-master1 haproxy[856]: [ALERT] 005/164139 (856) : backend 'apiserver' has no>
Jan 06 16:41:39 kubernetes-master1 haproxy[856]: Server apiserver/kubernetes-master3 is DOWN,reason: >
Jan 06 16:41:39 kubernetes-master1 haproxy[856]: Server apiserver/kubernetes-master3 is DOWN,reason: >
Jan 06 16:41:39 kubernetes-master1 haproxy[856]: backend apiserver has no server available!
Jan 06 16:41:39 kubernetes-master1 haproxy[856]: backend apiserver has no server available!
lines 1-23/23 (END)
我正在使用以下命令创建第一个 kubernetes 节点
sudo kubeadm init --control-plane-endpoint kubernetes-cluster.homelab01.local:8443 --upload-certs
这很好用,我用命令应用了 Calico CNI 插件
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
之后我尝试从 master2 加入。
Keepalived 工作得非常好,因为我在所有三个节点上测试了它,停止服务并观察故障转移到其他节点。当我在第一个 master1 节点上创建 kubernetes haproxy 时,通知后端是可见的。
Kubernetes 集群引导过程
udo kubeadm init --control-plane-endpoint kubernetes-cluster.homelab01.local:8443 --upload-certs
[init] Using Kubernetes version: v1.20.1
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two,depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes-cluster.homelab01.local kubernetes-master1 kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.255.201]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [kubernetes-master1 localhost] and IPs [192.168.255.201 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [kubernetes-master1 localhost] and IPs [192.168.255.201 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "admin.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 18.539325 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.20" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
57abea9f00357a4459c852249ac0170633c9a0f2327cde191e529a1689ea158b
[mark-control-plane] Marking the node kubernetes-master1 as control-plane by adding the labels "node-role.kubernetes.io/master=''" and "node-role.kubernetes.io/control-plane='' (deprecated)"
[mark-control-plane] Marking the node kubernetes-master1 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: 2cu336.rjxs8i0svtna27ke
[bootstrap-token] Configuring bootstrap tokens,cluster-info ConfigMap,RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: coredns
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster,you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively,if you are the root user,you can run:
export KUBECONfig=/etc/kubernetes/admin.conf
You should Now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can Now join any number of the control-plane node running the following command on each as root:
kubeadm join kubernetes-cluster.homelab01.local:8443 --token 2cu336.rjxs8i0svtna27ke \
--discovery-token-ca-cert-hash sha256:eb0668ca16acec622e4a97d69e0d4c42e64b1a61ffea13a3787956817021ca54 \
--control-plane --certificate-key 57abea9f00357a4459c852249ac0170633c9a0f2327cde191e529a1689ea158b
Please note that the certificate-key gives access to cluster sensitive data,keep it secret!
As a safeguard,uploaded-certs will be deleted in two hours; If necessary,you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join kubernetes-cluster.homelab01.local:8443 --token 2cu336.rjxs8i0svtna27ke \
--discovery-token-ca-cert-hash sha256:eb0668ca16acec622e4a97d69e0d4c42e64b1a61ffea13a3787956817021ca54
所有东西都在 master1 上运行
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/calico-kube-controllers-744cfdf676-mks4d 1/1 Running 0 36s
kube-system pod/calico-node-bnvmz 1/1 Running 0 37s
kube-system pod/coredns-74ff55c5b-skdzk 1/1 Running 0 3m11s
kube-system pod/coredns-74ff55c5b-tctl9 1/1 Running 0 3m11s
kube-system pod/etcd-kubernetes-master1 1/1 Running 0 3m4s
kube-system pod/kube-apiserver-kubernetes-master1 1/1 Running 0 3m4s
kube-system pod/kube-controller-manager-kubernetes-master1 1/1 Running 0 3m4s
kube-system pod/kube-proxy-smmmx 1/1 Running 0 3m11s
kube-system pod/kube-scheduler-kubernetes-master1 1/1 Running 0 3m4s
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 3m17
s
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 3m11
s
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SE
LECTOR AGE
kube-system daemonset.apps/calico-node 1 1 1 1 1 kuberne
tes.io/os=linux 38s
kube-system daemonset.apps/kube-proxy 1 1 1 1 1 kuberne
tes.io/os=linux 3m11s
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/calico-kube-controllers 1/1 1 1 38s
kube-system deployment.apps/coredns 2/2 2 2 3m11s
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/calico-kube-controllers-744cfdf676 1 1 1 37s
kube-system replicaset.apps/coredns-74ff55c5b 2 2 2 3m11s
尝试将 master2 加入集群 master1 后,Kubernetes 立即死亡。
wojcieh@kubernetes-master2:~$ sudo kubeadm join kubernetes-cluster.homelab01.local:8443 --token 2cu336.rjxs8i0svtna27ke \
> --discovery-token-ca-cert-hash sha256:eb0668ca16acec622e4a97d69e0d4c42e64b1a61ffea13a3787956817021ca54 \
> --control-plane --certificate-key 57abea9f00357a4459c852249ac0170633c9a0f2327cde191e529a1689ea158b
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two,depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes-cluster.homelab01.local kubernetes-master2 kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.255.202]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [kubernetes-master2 localhost] and IPs [192.168.255.202 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [kubernetes-master2 localhost] and IPs [192.168.255.202 127.0.0.1 ::1]
[certs] Valid certificates and keys Now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "admin.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Creating static Pod manifest for "etcd"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
[kubelet-check] Initial timeout of 40s passed.
broadcast message from systemd-journald@kubernetes-master2 (Wed 2021-01-06 16:53:04 CET):
haproxy[870]: backend apiserver has no server available!
broadcast message from systemd-journald@kubernetes-master2 (Wed 2021-01-06 16:53:04 CET):
haproxy[870]: backend apiserver has no server available!
^C
wojcieh@kubernetes-master2:~$
这里有一些可能相关的日志
来自 master1 https://pastebin.com/Y1zcwfWt 的日志
来自 master2 的日志 https://pastebin.com/rBELgK1Y
解决方法
暂无找到可以解决该程序问题的有效方法,小编努力寻找整理中!
如果你已经找到好的解决方法,欢迎将解决方案带上本链接一起发送给小编。
小编邮箱:dio#foxmail.com (将#修改为@)