2019-12-23 Operations

[Kubernetes] Kubeadm Troubleshooting

Troubleshooting for kubeadm.

T1 Preflight problems

Run kubeadm config images pull

I1216 10:31:32.821965   25588 version.go:251] remote version is much newer: v1.17.0; falling back to: stable-1.16
[init] Using Kubernetes version: v1.16.4
[preflight] Running pre-flight checks
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 19.03.5. Latest validated version: 18.09
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR Swap]: running with swap on is not supported. Please disable swap
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

Work through the following in order:

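Run kubeadm config print init-defaults > init.default.yaml to get the default init configuration, then edit the generated file to suit your environment. For example:
1. Change advertiseAddress: 1.2.3.4 to this machine's address.
2. If you use a Docker registry mirror inside China, change imageRepository: k8s.gcr.io to the mirror's address.
3. Custom image repository: imageRepository: docker.io/dustise
4. Kubernetes version: kubernetesVersion: v1.17.0
5. Pod address range:
networking:
  podSubnet: "192.168.0.0/16"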
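The edits listed above can also be scripted. The following is only a sketch: the function name customize_init_config is made up for illustration, it assumes the file has the layout printed by kubeadm config print init-defaults (a top-level networking: key and no existing podSubnet line), and the address 10.0.0.5 in the usage comment is a placeholder.

```shell
#!/bin/sh
# customize_init_config FILE ADVERTISE_ADDR IMAGE_REPO K8S_VERSION POD_SUBNET
# Hypothetical helper: rewrites the four fields discussed above in place.
customize_init_config() {
    file=$1; addr=$2; repo=$3; version=$4; subnet=$5
    tmp="${file}.tmp"
    # Replace the values of the three scalar keys, keeping their indentation.
    sed -e "s|^\( *advertiseAddress:\).*|\1 ${addr}|" \
        -e "s|^\( *imageRepository:\).*|\1 ${repo}|" \
        -e "s|^\( *kubernetesVersion:\).*|\1 ${version}|" \
        "$file" |
    # Append podSubnet directly under the top-level "networking:" key.
    # Assumption: the file does not already contain a podSubnet entry.
    awk -v subnet="$subnet" '
        { print }
        /^networking:/ { print "  podSubnet: \"" subnet "\"" }
    ' > "$tmp" && mv "$tmp" "$file"
}

# Example usage (placeholder address):
#   kubeadm config print init-defaults > init.default.yaml
#   customize_init_config init.default.yaml 10.0.0.5 docker.io/dustise v1.17.0 192.168.0.0/16
#   kubeadm init --config init.default.yaml
```

It is worth diffing the result against the original file before passing it to kubeadm init.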
Change the Docker version to a validated one (the warning above lists 18.09 as the latest validated version)
Disable swap

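sudo swapoff -a # turn off swap for the current boot

sudo free -m # check swap status (the Swap row should show 0)

sudo chmod +w /etc/fstab # make fstab writable

sudo vi /etc/fstab # comment out the swap line to disable swap permanently; back up the file first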
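The manual fstab edit can be replaced by a small script. This is a sketch under the assumption that swap entries are the lines whose third field (the filesystem type) is swap; the function name comment_out_swap is made up for illustration.

```shell
#!/bin/sh
# comment_out_swap FILE
# Hypothetical helper: backs up FILE to FILE.bak, then comments out every
# active (not yet commented) entry whose filesystem type is "swap".
comment_out_swap() {
    fstab=$1
    cp "$fstab" "${fstab}.bak" || return 1
    awk '
        $1 !~ /^#/ && $3 == "swap" { print "#" $0; next }  # disable swap entry
        { print }                                           # keep everything else
    ' "${fstab}.bak" > "$fstab"
}

# Example usage (as root):
#   swapoff -a
#   comment_out_swap /etc/fstab
```

The backup in /etc/fstab.bak makes it easy to undo the change if the wrong line gets commented out.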

T2 Dashboard is not accessible

If the page shows:

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "pods is forbidden: User \"system:anonymous\" cannot list pods in the namespace \"default\"",
  "reason": "Forbidden",
  "details": {
    "kind": "pods"
  },
  "code": 403
}

then, following this issue, add the following to the recommended.yaml that was used to deploy the dashboard:

---
# ------------------- Gross Hack For anonymous auth through api proxy ------------------- #
# Allows users to reach login page and other proxied dashboard URLs
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: kubernetes-dashboard-anonymous
rules:
- apiGroups: [""]
  resources: ["services/proxy"]
  resourceNames: ["https:kubernetes-dashboard:"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- nonResourceURLs: ["/ui", "/ui/*", "/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/*"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubernetes-dashboard-anonymous
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubernetes-dashboard-anonymous
subjects:
- kind: User
  name: system:anonymous

Then redeploy the dashboard with: kubectl replace --force -f recommended.yaml
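After redeploying, it may help to confirm that the two RBAC objects actually exist. A minimal sketch, assuming kubectl is already configured for the cluster; the function name redeploy_dashboard is made up for illustration.

```shell
#!/bin/sh
# redeploy_dashboard [MANIFEST]
# Hypothetical helper: re-creates the dashboard from the manifest, then checks
# that the ClusterRole and ClusterRoleBinding defined above are present.
redeploy_dashboard() {
    manifest=${1:-recommended.yaml}
    kubectl replace --force -f "$manifest" || return 1
    # Each "get" fails with a non-zero status if the object is missing.
    kubectl get clusterrole kubernetes-dashboard-anonymous >/dev/null &&
    kubectl get clusterrolebinding kubernetes-dashboard-anonymous >/dev/null
}

# Example usage:
#   redeploy_dashboard recommended.yaml && echo "anonymous proxy access configured"
```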

T3 Network configuration problems

1. Calico does not find a working network interface

Error message:

Readiness probe failed: calico/node is not ready: BIRD is not ready: BGP not established with 10.117.150.23

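Solution:

Edit calico.yaml and add the following to the environment variables of the calico-node container:

- name: IP_AUTODETECTION_METHOD
  value: "interface=enp09s" # the interface that actually connects to the rest of the cluster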
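The same setting can be applied to a running cluster without editing the manifest. This is a sketch that assumes Calico was installed from the standard manifest, so that a calico-node DaemonSet exists in the kube-system namespace; the function name set_calico_interface is made up for illustration, and the interface check only covers the node the script runs on.

```shell
#!/bin/sh
# set_calico_interface IFACE
# Hypothetical helper: points Calico's IP autodetection at IFACE.
set_calico_interface() {
    iface=$1
    # Refuse to continue if the interface does not exist on this node.
    ip link show "$iface" >/dev/null 2>&1 || {
        echo "interface $iface not found" >&2
        return 1
    }
    # Assumption: the DaemonSet is named calico-node and lives in kube-system.
    kubectl set env daemonset/calico-node -n kube-system \
        "IP_AUTODETECTION_METHOD=interface=${iface}"
}

# Example usage:
#   set_calico_interface enp09s
```

Changing the environment variable triggers a rolling restart of the calico-node pods, after which the readiness probe can be checked again.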

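2. Leftovers from a previously used network plugin

Because of earlier mistakes, the previous network configuration had to be removed and the network plugin reinstalled.

If the previous configuration is not removed cleanly, however, it causes problems:

Check the output of ip link. For example, weave leaves behind an interface named weave and calico leaves behind tunl0; remove them with ip link delete {name}.

A more effective method (for weave):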

sudo curl -L git.io/weave -o /usr/local/bin/weave
sudo chmod a+x /usr/local/bin/weave

Then:

weave reset

The CNI configuration files are not deleted by kubeadm reset; remove them manually: rm -rf /etc/cni/net.d
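The cleanup steps above can be combined into one helper. A minimal sketch, to be run as root after kubeadm reset; the function name cleanup_network_leftovers is made up for illustration, and an interface that cannot be deleted is reported rather than treated as fatal.

```shell
#!/bin/sh
# cleanup_network_leftovers CNI_DIR IFACE...
# Hypothetical helper: deletes each listed interface if it exists, then
# removes the CNI configuration directory.
cleanup_network_leftovers() {
    cni_dir=$1
    # Guard against running "rm -rf" on an empty path.
    [ -n "$cni_dir" ] || return 1
    shift
    for iface in "$@"; do
        if ip link show "$iface" >/dev/null 2>&1; then
            ip link delete "$iface" || echo "could not delete $iface" >&2
        fi
    done
    rm -rf "$cni_dir"
}

# Example usage (as root):
#   cleanup_network_leftovers /etc/cni/net.d weave tunl0
```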

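T4 Stubborn swap (UPDATE 2020/03/31)

For some unknown reason, swap is re-enabled every time the lab server reboots, even though the swap entry in /etc/fstab is already disabled.

The temporary workaround: whenever the server has to be rebooted, stop all containers first (not really feasible), then after the reboot run swapoff -a followed by systemctl restart kubelet. It is a clumsy two-step routine, but it will do for now.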
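The post-reboot routine can be wrapped in a script that only acts when swap is really on. A sketch, assuming a Linux host where /proc/swaps lists the active swap areas below a single header line; the function names swap_is_active and recover_after_reboot are made up for illustration.

```shell
#!/bin/sh
# swap_is_active [SWAPS_FILE]
# Succeeds if the file (default /proc/swaps) lists at least one swap area.
swap_is_active() {
    swaps_file=${1:-/proc/swaps}
    # The first line is a header; any further line is an active swap area.
    awk 'NR > 1 { found = 1 } END { exit !found }' "$swaps_file"
}

# recover_after_reboot [SWAPS_FILE]
# Turns swap off and restarts kubelet, but only if swap is currently active.
recover_after_reboot() {
    if swap_is_active "$@"; then
        swapoff -a && systemctl restart kubelet
    fi
}

# Example usage (as root, after a reboot):
#   recover_after_reboot
```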

#Kubernetes