[Kubernetes] Kubeadm Troubleshooting

Troubleshooting for kubeadm.

T1 Preflight problems

Run kubeadm config images pull

I1216 10:31:32.821965   25588 version.go:251] remote version is much newer: v1.17.0; falling back to: stable-1.16
[init] Using Kubernetes version: v1.16.4
[preflight] Running pre-flight checks
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 19.03.5. Latest validated version: 18.09
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Swap]: running with swap on is not supported. Please disable swap
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
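
The names of the failing checks can be pulled out of output like the log above with a small filter. This is a minimal sketch, assuming the `[ERROR <Name>]:` line format seen in this log; the function name `fatal_preflight_checks` and the file `kubeadm.log` are hypothetical.

```shell
#!/bin/sh
# fatal_preflight_checks
# Read kubeadm output on stdin and print the name of every failed
# preflight check, one per line (e.g. "Swap").
# Assumes lines of the form "[ERROR <Name>]: <message>".
fatal_preflight_checks() {
    sed -n 's/^[[:space:]]*\[ERROR \([^]]*\)]:.*/\1/p'
}

# Example (kubeadm.log is a hypothetical saved copy of the output):
#   fatal_preflight_checks < kubeadm.log
```

The printed names are the values that `--ignore-preflight-errors=...` refers to, although fixing the underlying problem is usually the better choice.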

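Work through the following in order:

  1. Run kubeadm config print init-defaults > init.default.yaml to get the default init configuration, then edit the generated file as needed. For example:

    1. Change advertiseAddress: 1.2.3.4 to this machine's address
    2. If you use a Docker registry mirror in mainland China, change imageRepository: k8s.gcr.io to the address of that mirror
    3. Custom image repository address: imageRepository: docker.io/dustise
    4. Kubernetes version: kubernetesVersion: v1.17.0
    5. Pod address range:
    networking:
      podSubnet: "192.168.0.0/16"
  2. Change the Docker version (the warning above names 18.09 as the latest validated version)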

  3. Disable swap

    sudo swapoff -a # turn off swap

    sudo free -m # check the swap status

    sudo chmod +w /etc/fstab # make the fstab file writable

    vi /etc/fstab # comment out the swap line (back up the file first); this step disables swap permanently
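
The manual fstab edit in step 3 can also be scripted. The following is a sketch only, assuming a standard fstab layout in which the filesystem type `swap` appears as a whitespace-separated field; the function name `comment_out_swap` is hypothetical.

```shell
#!/bin/sh
# comment_out_swap FILE
# Prefix every active (not already commented) swap entry in an
# fstab-style FILE with '#'. The original is kept as FILE.bak.
comment_out_swap() {
    sed -i.bak '/^[[:space:]]*#/!{/[[:space:]]swap[[:space:]]/s/^/#/;}' "$1"
}

# Example (needs root, so it is left commented out here):
#   comment_out_swap /etc/fstab
```

Trying it on a copy of /etc/fstab first is a cheap way to confirm that only the intended lines change.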

T2 Dashboard is not accessible

If the page shows:

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "pods is forbidden: User \"system:anonymous\" cannot list pods in the namespace \"default\"",
  "reason": "Forbidden",
  "details": {
    "kind": "pods"
  },
  "code": 403
}

In that case, following this issue, add the following to the recommended.yaml that was used to deploy the dashboard:

---
# ------------------- Gross Hack For anonymous auth through api proxy ------------------- #
# Allows users to reach login page and other proxied dashboard URLs
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: kubernetes-dashboard-anonymous
rules:
- apiGroups: [""]
  resources: ["services/proxy"]
  resourceNames: ["https:kubernetes-dashboard:"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- nonResourceURLs: ["/ui", "/ui/*", "/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/*"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubernetes-dashboard-anonymous
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubernetes-dashboard-anonymous
subjects:
- kind: User
  name: system:anonymous

Then restart the dashboard with: kubectl replace --force -f recommended.yaml
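
The rule above grants anonymous access to the API-server proxy path of the dashboard service. As a rough sketch, the matching URL can be assembled as below. The function name `dashboard_proxy_url` is hypothetical, the API server address is a placeholder to replace with your own, and the `kube-system` default is taken from the nonResourceURLs entry above.

```shell
#!/bin/sh
# dashboard_proxy_url APISERVER [NAMESPACE]
# Print the API-server proxy URL for the kubernetes-dashboard service.
# APISERVER is the base URL of the API server (no trailing slash).
# NAMESPACE defaults to kube-system, as in the ClusterRole rule above.
dashboard_proxy_url() {
    apiserver=$1
    namespace=${2:-kube-system}
    printf '%s/api/v1/namespaces/%s/services/https:kubernetes-dashboard:/proxy/\n' \
        "$apiserver" "$namespace"
}

# Example (placeholder address; 6443 is assumed to be the API server port):
#   dashboard_proxy_url https://1.2.3.4:6443
```

If the dashboard was deployed into a different namespace, both the second argument and the path in the ClusterRole would need to match it.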

T3 Network configuration problems

1. Calico does not pick up the correct network interface

Error message:

Readiness probe failed: caliconode is not ready: BIRD is not ready: BGP not established with 10.117.150.23

Solution:

Edit calico.yaml and add:

- name: IP_AUTODETECTION_METHOD
  value: "interface=enp09s" # the network interface that actually connects to the other cluster nodes
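
A mistyped interface name in this setting is easy to miss, so it can help to confirm that the name exists on the node before editing the manifest. This is a minimal sketch, assuming a Linux node where interfaces are listed under /sys/class/net; the function name `iface_exists` is hypothetical.

```shell
#!/bin/sh
# iface_exists NAME [SYSFS_NET_DIR]
# Succeed if a network interface called NAME exists.
# SYSFS_NET_DIR defaults to /sys/class/net (one entry per interface on Linux).
iface_exists() {
    [ -e "${2:-/sys/class/net}/$1" ]
}

# Example:
#   iface_exists enp09s || echo "no such interface on this node" >&2
```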

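2. Leftovers from a previously used network configuration

Because of earlier mistakes, the previous network configuration had to be removed and the network reinstalled.

If the previous configuration is not removed cleanly, however, it causes problems:

  • Interfaces listed by ip link: for example, the weave interface left behind by Weave and the tunl0 interface left behind by Calico. Remove them with ip link delete {name}.

  • A more effective method (for Weave):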

    sudo curl -L git.io/weave -o /usr/local/bin/weave
    sudo chmod a+x /usr/local/bin/weave

    then

    weave reset
  • The CNI configuration files are not deleted by kubeadm reset, so remove them manually: rm -rf /etc/cni/net.d
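
Finding the leftover interfaces from the first bullet can be scripted. The following is a sketch under the assumption that only the two names mentioned above (weave and tunl0) matter; other plugins leave different interfaces behind. The function name `leftover_ifaces` is hypothetical.

```shell
#!/bin/sh
# leftover_ifaces
# Read `ip -o link show` output on stdin and print the names of
# interfaces left behind by Weave (weave) or Calico (tunl0).
leftover_ifaces() {
    awk -F': ' '{
        name = $2
        sub(/@.*/, "", name)          # "tunl0@NONE" -> "tunl0"
        if (name ~ /^(weave|tunl0)$/) print name
    }'
}

# Example (needs root for the delete, so it is left commented out here):
#   ip -o link show | leftover_ifaces | while read -r n; do
#       sudo ip link delete "$n"
#   done
```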

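T4 Stubborn swap (UPDATE 2020/03/31)

For some unknown reason, swap is re-enabled every time the lab server reboots, even though the swap entry in /etc/fstab is disabled.

The temporary workaround: whenever the server really has to be rebooted, stop all containers first (not very realistic), then after the reboot run swapoff -a followed by systemctl restart kubelet. This two-step combo will have to do for now.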
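
The post-reboot part of this workaround can be wrapped in a check so that it only acts when swap is really on. This is a sketch, assuming a Linux host where /proc/swaps holds a header line followed by one line per active swap device; the function name `swap_is_active` is hypothetical.

```shell
#!/bin/sh
# swap_is_active [SWAPS_FILE]
# Succeed if SWAPS_FILE (default /proc/swaps) lists at least one
# swap device after its header line.
swap_is_active() {
    awk 'NR > 1 { found = 1 } END { exit !found }' "${1:-/proc/swaps}"
}

# Example (needs root, so it is left commented out here):
#   if swap_is_active; then
#       sudo swapoff -a && sudo systemctl restart kubelet
#   fi
```

Running this by hand after each reboot is the same two-step routine as above, just guarded by a check.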