Kubernetes installation: common problems and fixes

https://github.com/fanux/fanux.github.io/issues/3

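kubelet fails to start after a reboot?

Make sure SELinux and swap are both turned off. Run swapoff -a && setenforce 0 && systemctl start kubelet; once kubelet is running it brings the other services back up.

To disable swap permanently: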

1. Identify configured swap devices and files with cat /proc/swaps.
2. Turn off all swap devices and files with swapoff -a.
3. Remove any matching reference found in /etc/fstab
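
Step 3 can be done by commenting out the swap lines instead of deleting them. A minimal sketch, assuming standard fstab layout where the third field is the filesystem type; comment_out_swap is a hypothetical helper name, not part of any tool:

```shell
# Comment out every active swap entry in fstab-formatted text read from stdin.
# A line counts as a swap entry when its third field (filesystem type) is "swap".
comment_out_swap() {
  awk '
    /^[[:space:]]*#/ { print; next }          # already a comment: keep unchanged
    $3 == "swap"     { print "#" $0; next }   # active swap entry: comment it out
                     { print }                # anything else: keep unchanged
  '
}
```

Usage: comment_out_swap < /etc/fstab > /tmp/fstab.new, review /tmp/fstab.new, then copy it over /etc/fstab. Writing to a separate file first keeps a bad edit from leaving the machine unbootable.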

To disable SELinux permanently:

Edit /etc/sysconfig/selinux (vim /etc/sysconfig/selinux) and change SELINUX=enforcing to SELINUX=disabled.
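
The same edit can be scripted. A minimal sketch; disable_selinux_config is a hypothetical helper name. It assumes the usual CentOS 7 layout, where /etc/sysconfig/selinux is a symlink to /etc/selinux/config, and the change only takes effect after the next reboot:

```shell
# Rewrite the SELINUX= setting to "disabled" in config text read from stdin.
# Only a line that is exactly SELINUX=enforcing or SELINUX=permissive is changed;
# comments and the SELINUXTYPE= line are left alone.
disable_selinux_config() {
  sed -E 's/^SELINUX=(enforcing|permissive)[[:space:]]*$/SELINUX=disabled/'
}
```

Usage: disable_selinux_config < /etc/selinux/config > /tmp/selinux.new, then cat /tmp/selinux.new > /etc/selinux/config. Writing through cat keeps the symlink intact, which an in-place sed on /etc/sysconfig/selinux may not.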

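Chrome may be unable to open the dashboard

Recent Chrome versions apply strict security checks and reject self-signed certificates. To work around this, use Firefox, or buy a certificate and configure the dashboard with it. If you cannot reach the dashboard, first check that the pod started successfully with kubectl get pod -n kube-system, then test from a node with curl. If curl gets a response, the browser is the cause. Note that the URL must use https.

To create your own certificate:

1. Create the certificate with openssl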

$ mkdir certs
$ cd certs
$ openssl genrsa -des3 -passout pass:x -out dashboard.pass.key 2048
$ openssl rsa -passin pass:x -in dashboard.pass.key -out dashboard.key
$ rm dashboard.pass.key
$ openssl req -new -key dashboard.key -out dashboard.csr
$ openssl x509 -req -sha256 -days 365 -in dashboard.csr -signkey dashboard.key -out dashboard.crt
$ rm -rf dashboard.csr

2. Delete the dashboard (the yaml file is in the unpacked install package): $ kubectl delete -f kubernetes-dashboard.yaml

3. Create the kubernetes-dashboard-certs secret: $ kubectl create secret generic kubernetes-dashboard-certs --from-file=$HOME/certs -n kube-system

4. Create the new dashboard: $ kubectl create -f kubernetes-dashboard.yaml

Changing the Calico pod address range?

Two places need to change (see the kubeadm documentation).

1. The kubeadm configuration:

networking:
  dnsDomain: <string>
  serviceSubnet: <cidr>
  podSubnet: <cidr>   # change this

2. The Calico yaml configuration:

    - name: FELIX_DEFAULTENDPOINTTOHOSTACTION
      value: "ACCEPT"
    # Configure the IP Pool from which Pod IPs will be chosen.
    - name: CALICO_IPV4POOL_CIDR
      value: "192.168.122.0/24"   # change this
    - name: CALICO_IPV4POOL_IPIP
      value: "always"
    # Disable IPv6 on Kubernetes.
    - name: FELIX_IPV6SUPPORT
      value: "false"
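
Since the two values are edited by hand in separate files, a quick consistency check can catch a mismatch before init. A rough sketch, assuming the two settings are meant to carry the same CIDR and that both files use the simple one-key-per-line layout shown above; the function names are hypothetical and this is plain text matching, not a YAML parser:

```shell
# Print the podSubnet value from a kubeadm config file ($1).
kubeadm_pod_subnet() {
  awk '$1 == "podSubnet:" { print $2; exit }' "$1" | tr -d '"'
}

# Print the value that follows "- name: CALICO_IPV4POOL_CIDR" in a Calico manifest ($1).
calico_pool_cidr() {
  awk '
    $0 ~ /name:[[:space:]]*CALICO_IPV4POOL_CIDR/ { found = 1; next }
    found && $1 == "value:"                      { print $2; exit }
  ' "$1" | tr -d '"'
}

# Compare the two values; return non-zero when they differ or podSubnet is missing.
check_pod_cidr() {
  kubeadm_cidr=$(kubeadm_pod_subnet "$1")
  calico_cidr=$(calico_pool_cidr "$2")
  if [ -n "$kubeadm_cidr" ] && [ "$kubeadm_cidr" = "$calico_cidr" ]; then
    echo "ok: $kubeadm_cidr"
  else
    echo "mismatch: kubeadm=$kubeadm_cidr calico=$calico_cidr"
    return 1
  fi
}
```

Usage: check_pod_cidr conf/kubeadm.yaml conf/net/calico.yaml, substituting the paths of your own two files.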

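Lost the kubeadm join command?

Regenerate it with: kubeadm token create --print-join-command

Need to reach the API server from an external network?

Typical scenario: you access the API server through an Alibaba Cloud floating IP. The floating IP then has to be included in the certificate; the same applies to something like a keepalived virtual IP. Edit conf/kubeadm.yaml and add the following field: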

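apiServerCertSANs:
  - 10.100.81.11   # your external IP, etc.
  - master01.bja.paas  # also add the domain name if you access the API server by domain

Then run init again.

Finally, edit the IP in the kubeconfig file (~/.kube/config) directly and copy the file to your local machine. You can then reach the API server from the external network.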
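
The kubeconfig edit can also be scripted. A minimal sketch; set_kubeconfig_server is a hypothetical helper name, and it assumes the file contains a line of the form server: https://HOST:PORT with a plain IPv4 address or hostname (bracketed IPv6 addresses are not handled):

```shell
# Replace the host part of every "server: https://HOST:PORT" line in kubeconfig
# text read from stdin with $1, keeping the scheme and the port unchanged.
set_kubeconfig_server() {
  new_host=$1
  sed -E "s#(server:[[:space:]]*https://)[^:/]+(:[0-9]+)#\1${new_host}\2#"
}
```

Usage: set_kubeconfig_server 10.100.81.11 < ~/.kube/config > config.external, then copy config.external to your local machine. The address you pass must be one of the entries listed under apiServerCertSANs, otherwise certificate verification fails.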

Getting a login token:

admin token:

kubectl describe secret `kubectl get secret |grep cluster-admin|awk '{print $1}'`|grep token|awk '{print $2}'|tail -1

dashboard service account token:

kubectl get secret $(kubectl get serviceaccount dashboard -o jsonpath="{.secrets[0].name}") -o jsonpath="{.data.token}" | base64 --decode

Or use the fist auth module.

Init hangs at this step and docker ps shows no containers, but kubelet is healthy

[init] this might take a minute or longer if the control plane images have to be pulled 

/var/log/messages contains the following:

 RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to create a sandbox for pod "kube-apiserver-istiohost": Error response from daemon: cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice"

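This is a compatibility problem between the Docker version and Kubernetes; switching to a different Docker version is recommended. I hit it on 1.13.1 and fixed it by upgrading to 18.06.0-ce, but install whichever version suits your needs.

To install docker-ce: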

yum install -y yum-utils
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum-config-manager --disable docker-ce-edge
yum makecache fast
yum install docker-ce

NodePort unreachable: run iptables -P FORWARD ACCEPT

failure loading ca certificate: the certificate is not valid yet

Caused by the servers' clocks being out of sync.

CoreDNS fails to start

.:53
2018/09/28 08:45:32 [INFO] CoreDNS-1.2.2
2018/09/28 08:45:32 [INFO] linux/amd64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/amd64, go1.11, eb51e8b
2018/09/28 08:45:32 [INFO] plugin/reload: Running configuration MD5 = f65c4821c8a9b7b5eb30fa4fbc167769
2018/09/28 08:45:38 [FATAL] plugin/loop: Seen "HINFO IN 4443432808327291531.7218519048545008660." more than twice, loop detected

Edit /etc/resolv.conf on the host so that it contains:

[root@iZrj9aqbeed7la2925ggzaZ ~]# cat /etc/resolv.conf
; generated by /usr/sbin/dhclient-script
nameserver 8.8.8.8

Then kill the DNS pods so that they are recreated.

Calico fails to start
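
One common cause of the loop error is a host /etc/resolv.conf that points at a loopback resolver (for example 127.0.0.53 from systemd-resolved): CoreDNS inherits that file and ends up forwarding queries to itself. A small sketch for spotting such entries; loopback_nameservers is a hypothetical helper name, and the original post does not say what the broken file contained, so treat this as one thing to check rather than the confirmed cause:

```shell
# Print every nameserver address in resolv.conf-formatted text (stdin)
# that is a loopback address (127.x.x.x or ::1). Prints nothing if there is none.
loopback_nameservers() {
  awk '$1 == "nameserver" && ($2 ~ /^127\./ || $2 == "::1") { print $2 }'
}
```

Usage: loopback_nameservers < /etc/resolv.conf. If it prints anything, replace those entries with a real upstream resolver as shown above, then recreate the DNS pods, for example with kubectl -n kube-system delete pod -l k8s-app=kube-dns (assuming the pods carry the label kubeadm assigns by default).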

Readiness probe failed: calico/node is not ready: felix is not ready: Get http://localhost:9099/readiness: dial tcp [::1]:9099: connect: connection refused 

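Most likely interface autodetection went wrong: Calico picked the wrong network interface. It often picks the docker0 bridge, which makes the cluster IP unreachable, so calico-node cannot connect to etcd.

Fix: set up /etc/hosts correctly.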

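Or change the interface autodetection method (see Calico's documentation on IP autodetection) in conf/net/calico.yaml:

- name: IP_AUTODETECTION_METHOD
  value: "interface=eth.*"   # if your interfaces do not start with eth, substitute your own pattern here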

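Another possible cause is a misconfigured /etc/resolv.conf: some search entries in it can prevent Calico from starting. The fix is to delete the DNS settings you do not need and keep only: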

nameserver 8.8.8.8

kubeadm "no default route" error

kubeadm picks the interface with the default gateway and listens on that, so the fix is to configure a default route on the host, i.e. the default entry:

Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         169.254.1.1     0.0.0.0         UG    0      0        0 eth1
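
To check for a default route before running kubeadm, the output of ip route can be parsed. A minimal sketch; default_route_iface is a hypothetical helper name, and it assumes the usual iproute2 line format default via GATEWAY dev IFACE:

```shell
# Read `ip route` output on stdin and print the interface of the first default
# route. Exits non-zero (and prints nothing) when there is no default route.
default_route_iface() {
  awk '
    $1 == "default" {
      for (i = 2; i < NF; i++)
        if ($i == "dev") { print $(i + 1); found = 1; exit }
    }
    END { exit found ? 0 : 1 }
  '
}
```

Usage: ip route show | default_route_iface || echo "no default route". If none is reported, one can be added with ip route add default via <gateway> dev <interface>, filling in your own gateway address and interface name.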

Version 1.14 needs an extra --master parameter on join

This keeps single-master and multi-master setups compatible:

kubeadm join 10.103.97.1:6443 --token 9vr73a.a8uxyaju799qwdjv \
    --master 10.103.97.100:6443 \
    --discovery-token-ca-cert-hash sha256:7c2e69131a36ae2a042a339b33381c6d0d43887e2de83720eff5359e26aec866

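Version 1.14 and later: node is NotReady after a reboot

This happens when the ipvs kernel modules are not loaded at boot. First make sure ipvs is loaded, then make sure kubelet on the node started normally, and finally check whether the lvscare pod is running. If all of that looks fine and the node is still NotReady, restart the lvscare pod.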

# Open ipvs
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4

cat <<EOF >  /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system
sysctl -w net.ipv4.ip_forward=1
systemctl stop firewalld && systemctl disable firewalld
swapoff -a
setenforce 0

You can simply kill the lvscare container; kubelet will bring it back up automatically.
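
The modprobe commands above do not survive a reboot. On systemd-based systems, modules listed in a file under /etc/modules-load.d/ are loaded at boot, so the list can be persisted there. A minimal sketch; ipvs_modules_conf and the file name ipvs.conf are assumptions for illustration:

```shell
# Print a modules-load.d file body with the ipvs modules from the modprobe list.
# $1 optionally overrides the conntrack module name: on kernels 4.19 and later
# nf_conntrack_ipv4 was merged into nf_conntrack.
ipvs_modules_conf() {
  printf '%s\n' ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh "${1:-nf_conntrack_ipv4}"
}
```

Usage: ipvs_modules_conf > /etc/modules-load.d/ipvs.conf, or ipvs_modules_conf nf_conntrack > /etc/modules-load.d/ipvs.conf on a newer kernel. After the next reboot, lsmod | grep ip_vs should list the modules.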

error execution phase preflight: couldn't validate the identity of the API

The token may have expired. Generate a new one on the master and join again:

kubeadm token create --print-join-command

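With sealos, remember to add the --master parameter to join; see the README.

kubelet starts but the installation hangs

One user had the master installation hang on CentOS 7.3 with Docker 1.13.1. The cause was a system compatibility problem. If kubelet is already running during installation but not a single container has started, this may be the reason. Starting a container reports this error: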

 applying cgroup configuration for process caused \\\"Cannot set property TasksAccounting, or unknown property.

Fix:

yum update

kubelet fails to start

Some systems may show this problem:

Executable path is not absolute: sh /usr/bin/kubelet-pre-start.sh
7月 22 13:59:21 ning systemd[1]: /etc/systemd/system/kubelet.service:7: Executable path is not absolute: sh /usr/bin/kubelet-pre-start.sh

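This happens because systemd on these systems requires an absolute executable path and cannot resolve the bare sh command.

Edit kube/conf/kubelet.service and change ExecStartPre=sh to ExecStartPre=/bin/bash.

Calico fails to start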
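
The unit-file change can be applied with a one-line filter. A minimal sketch; fix_execstartpre is a hypothetical helper name, and it assumes the line reads ExecStartPre=sh followed by the script path, as in the log above:

```shell
# Replace a bare "sh" in ExecStartPre= with the absolute path /bin/bash
# in unit-file text read from stdin. All other lines pass through unchanged.
fix_execstartpre() {
  sed -E 's#^(ExecStartPre=)sh[[:space:]]+#\1/bin/bash #'
}
```

Usage: fix_execstartpre < kube/conf/kubelet.service > kubelet.service.new, then replace the original file. If the unit is already installed under /etc/systemd/system, patch that copy as well and run systemctl daemon-reload before starting kubelet.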

[root@k8s03 ~]# kubectl logs calico-node-v4s8w -n kube-system
Error from server (BadRequest): container "calico-node" in pod "calico-node-v4s8w" is waiting to start: PodInitializing
Unable to update cni config: No networks found in /etc/cni/net.d

This is because the init container failed to initialize. Copy /etc/cni from a healthy node to the broken node.