

Hands-on: Building a Kubernetes cluster with kubeadm (k8s v1.22.1, containerd v1.5.5) - 2021.11.2 (deployed successfully; based on Yangming's course, CRI: containerd)

Built on 2021.11.2.

Lab environment

1. Hardware environment

Three VMs, each with 2 CPUs, 2 GB RAM and a 20 GB disk (NAT mode, with outbound Internet access).

Role           Hostname   IP
master node    master1    172.29.9.51
worker node    node1      172.29.9.52
worker node    node2      172.29.9.53

2. Software environment

Software      Version
OS            CentOS 7.6_x64 1810 mini (other CentOS 7.x releases work too)
containerd    v1.5.5
kubernetes    v1.22.2

Lab materials

Link: https:

1. Prepare the base environment (on all nodes)

1.1 Set the hostnames

bash
# run the matching command on each node; the trailing `bash` starts a new shell so the prompt shows the new name
hostnamectl --static set-hostname master1
bash
hostnamectl --static set-hostname node1
bash
hostnamectl --static set-hostname node2
bash

Note: each node's hostname must be a valid DNS name; never leave it as the default localhost, or all kinds of errors will follow. In Kubernetes, machine names and every API object stored in etcd must use standard DNS naming (RFC 1123). You can change the hostname with hostnamectl set-hostname node1.

1.2 Disable the firewall and SELinux

bash
systemctl stop firewalld && systemctl disable firewalld
systemctl stop NetworkManager && systemctl disable NetworkManager
setenforce 0
sed -i s/SELINUX=enforcing/SELINUX=disabled/ /etc/selinux/config

1.3 Disable the swap partition

bash
swapoff -a
sed -ri 's/.*swap.*/#&/' /etc/fstab

Question: why must swap be disabled when installing a Kubernetes cluster? Because with swap enabled kubelet refuses to start, and without kubelet the cluster cannot come up. kubelet does this because swapping pod memory to disk would hurt performance badly and make resource accounting unreliable.
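To double-check that swap is really off after this step (a quick verification of my own, not part of the original notes):

bash
# prints nothing once swap is disabled
swapon --show
# the Swap row should show 0 total
free -m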

1.4 Configure name resolution

bash
cat >> /etc/hosts << EOF
172.29.9.51 master1
172.29.9.52 node1
172.29.9.53 node2
EOF

Question: do the nodes need name resolution when installing a Kubernetes cluster? Yes. When kubectl later connects to a container running on a node (for logs, exec, and so on), it uses the node name shown by kubectl get node, so the host must be able to resolve that name; otherwise the connection may fail.
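A quick way to verify the entries resolve as expected (my own check, not in the original):

bash
# each command should print the IP configured in /etc/hosts
getent hosts master1
getent hosts node1
getent hosts node2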

1.5 Pass bridged IPv4 traffic to the iptables chains

bash
modprobe br_netfilter
cat > /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sysctl --system

Note: passing bridged IPv4 traffic to the iptables chains requires the br_netfilter kernel module (it is also needed when enabling kernel IPv4 forwarding), so load it first: modprobe br_netfilter. About bridge-nf:

bridge-nf lets netfilter filter IPv4/ARP/IPv6 packets that traverse a Linux bridge. For example, once net.bridge.bridge-nf-call-iptables=1 is set, packets forwarded by a layer-2 bridge are also filtered by the iptables FORWARD rules. Common options include (a quick verification is shown after the list):

  • net.bridge.bridge-nf-call-arptables: whether to filter the bridge's ARP packets in the arptables FORWARD chain
  • net.bridge.bridge-nf-call-ip6tables: whether to filter IPv6 packets in the ip6tables chains
  • net.bridge.bridge-nf-call-iptables: whether to filter IPv4 packets in the iptables chains
  • net.bridge.bridge-nf-filter-vlan-tagged: whether to filter VLAN-tagged packets in iptables/arptables
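To confirm the settings took effect (my own check, assuming sysctl --system has already been run):

bash
# all three values should be 1
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward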

1.6 Install ipvs

bash
cat > /etc/sysconfig/modules/ipvs.modules << EOF
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
EOF
chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep -e ip_vs -e nf_conntrack_ipv4
yum install ipset -y
yum install ipvsadm -y

Notes: 01. The script above creates /etc/sysconfig/modules/ipvs.modules so the required modules are loaded automatically after a node reboot; use lsmod | grep -e ip_vs -e nf_conntrack_ipv4 to check that they are loaded. 02. Make sure the ipset package is installed on every node: yum install ipset -y. 03. To make it easier to inspect the ipvs proxy rules, also install the management tool ipvsadm: yum install ipvsadm -y.

1.7 Synchronize server time

bash
yum install chrony -y
systemctl enable chronyd --now
chronyc sources

1.8 Configure passwordless SSH (to make copying files from the master to the nodes easier later)

bash
# Run on master1; press Enter twice to accept the defaults
ssh-keygen
# Run on master1
ssh-copy-id -i ~/.ssh/id_rsa.pub root@172.29.9.52
ssh-copy-id -i ~/.ssh/id_rsa.pub root@172.29.9.53

2. Install containerd (on all nodes)

2.1 Install containerd

bash
cd /root/
yum install libseccomp -y
wget https:
tar -C / -xzf cri-containerd-cni-1.5.5-linux-amd64.tar.gz
echo "export PATH=$PATH:/usr/local/bin:/usr/local/sbin" >> ~/.bashrc
source ~/.bashrc
mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml
systemctl enable containerd --now
ctr version

Note: for the detailed steps of installing containerd on CentOS 7, see the article 实战:centos7上containerd的安装-20211023; only the shell commands are listed here.

2.2 Set containerd's cgroup driver to systemd

For Linux distributions that use systemd as the init system, using systemd as the container cgroup driver keeps the node more stable under resource pressure, so it is recommended to configure containerd's cgroup driver as systemd.

Edit the /etc/containerd/config.toml file generated above and set SystemdCgroup to true under the plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options block:

bash
# Locate the setting by searching for SystemdCgroup
# vim /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  ...
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true
....

# Note: the equivalent one-line command:
sed -i "s/SystemdCgroup = false/SystemdCgroup = true/g" /etc/containerd/config.toml
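To verify the edit landed (my own quick check, not part of the original steps):

bash
grep SystemdCgroup /etc/containerd/config.toml
# expected: SystemdCgroup = true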

2.3 Configure registry mirror (accelerator) addresses

Then configure a mirror for the image registries. This goes under the registry block of the cri plugin, in registry.mirrors (mind the indentation):

bash
[root@master1 ~]#vim /etc/containerd/config.toml
    [plugins."io.containerd.grpc.v1.cri".registry]
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
          endpoint = ["https:
        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."k8s.gcr.io"]
          endpoint = ["https:
……
    sandbox_image = "registry.aliyuncs.com/k8sxio/pause:3.5"
……

2.5 Start the containerd service

The containerd tarball we downloaded above includes an etc/systemd/system/containerd.service file, so containerd can be managed as a daemon by systemd. Start it now with:

bash
systemctl daemon-reload
systemctl enable containerd --now

2.6 Verify

Once containerd is up, you can use its local CLI tools ctr and crictl, for example to check the version:

ctr version
crictl version
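crictl needs to know which CRI endpoint to talk to; a common convenience (my own addition, not in the original) is to point it at containerd's socket once via /etc/crictl.yaml instead of passing --runtime-endpoint every time:

bash
cat > /etc/crictl.yaml << EOF
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
EOF
crictl version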

At this point, containerd is installed.

3. Deploy Kubernetes with kubeadm

3.1 Add the Alibaba Cloud YUM repository (on all nodes)

We install from the Alibaba Cloud mirror:

bash
cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https:
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https:
EOF

3.2 Install kubeadm, kubelet, and kubectl (on all nodes)

bash
yum makecache fast
yum install -y kubelet-1.22.2 kubeadm-1.22.2 kubectl-1.22.2 --disableexcludes=kubernetes
kubeadm version
systemctl enable --now kubelet

Note: --disableexcludes=kubernetes ignores the exclude settings of every repository except the kubernetes one, so the packages are installed from that repo.

The output is as expected.

3.3 Initialize the cluster (on master1)

When you run kubelet --help you can see that most of the command-line flags are marked DEPRECATED. That is because upstream recommends using --config to point to a configuration file that holds those settings; see the official document "Set Kubelet parameters via a config file" for details. This is also what enables Dynamic Kubelet Configuration; see "Reconfigure a Node's Kubelet in a Live Cluster".

Then, on the master node, print the default configuration that cluster initialization will use:

bash
[root@master1 ~]#kubeadm config print init-defaults --component-configs KubeletConfiguration > kubeadm.yaml

Then adjust the configuration to our needs: set imageRepository to the registry used to pull the Kubernetes images during initialization, set the kube-proxy mode to ipvs, and, since we are going to install the flannel network plugin, set networking.podSubnet to 10.244.0.0/16:

yaml
[root@master1 ~]#vim kubeadm.yaml
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 172.29.9.51      # change 1: the master node's internal IP
  bindPort: 6443
nodeRegistration:
  criSocket: /run/containerd/containerd.sock   # change 2: use containerd's Unix socket
  imagePullPolicy: IfNotPresent
  name: master1                      # change 3: the master node's name
  taints:                            # change 4: taint the master so ordinary workloads are not scheduled on it
  - effect: "NoSchedule"
    key: "node-role.kubernetes.io/master"
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs                           # change 5: set the kube-proxy mode to ipvs (default is iptables)
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/k8sxio   # change 6: image repository
kind: ClusterConfiguration
kubernetesVersion: 1.22.2            # change 7: the k8s version (the default output omits the patch version)
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16           # change 8: the pod subnet
scheduler: {}
---
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
cgroupDriver: systemd                # change 9: set the cgroup driver
logging: {}
memorySwap: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
rotateCertificates: true
runtimeRequestTimeout: 0s
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s

Configuration tip

The documentation for the manifest above is fairly scattered; to fully understand the attributes of these resource objects you can consult the corresponding godoc, at https:

bash
[root@master1 ~]#kubeadm config images list --config kubeadm.yaml
# Oddly, this reported an error today even though it didn't yesterday... it can be ignored and does not affect the following steps.

Once the configuration file is ready, pull the required images in advance with the following command:
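The pull command itself did not survive in these notes; based on the list command above, it is presumably the standard one:

bash
[root@master1 ~]#kubeadm config images pull --config kubeadm.yaml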

Pulling the coredns image failed: the Alibaba Cloud repository does not have it. We can pull the image manually from the official repository and then re-tag it:

bash
[root@master1 ~]#ctr -n k8s.io i pull docker.io/coredns/coredns:1.8.4
docker.io/coredns/coredns:1.8.4:                                                  resolved |++++++++++++++++++++++++++++++++++++++|
index-sha256:6e5a02c21641597998b4be7cb5eb1e7b02c0d8d23cce4dd09f4682d463798890:    done     |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:10683d82b024a58cc248c468c2632f9d1b260500f7cd9bb8e73f751048d7d6d4: done     |++++++++++++++++++++++++++++++++++++++|
layer-sha256:bc38a22c706b427217bcbd1a7ac7c8873e75efdd0e59d6b9f069b4b243db4b4b:    done     |++++++++++++++++++++++++++++++++++++++|
config-sha256:8d147537fb7d1ac8895da4d55a5e53621949981e2e6460976dae812f83d84a44:   done     |++++++++++++++++++++++++++++++++++++++|
layer-sha256:c6568d217a0023041ef9f729e8836b19f863bcdb612bb3a329ebc165539f5a80:    done     |++++++++++++++++++++++++++++++++++++++|
elapsed: 15.9s  total: 12.1 M (780.6 KiB/s)
unpacking linux/amd64 sha256:6e5a02c21641597998b4be7cb5eb1e7b02c0d8d23cce4dd09f4682d463798890...
done: 684.151259ms

[root@master1 ~]#ctr -n k8s.io i ls -q
docker.io/coredns/coredns:1.8.4
registry.aliyuncs.com/k8sxio/etcd:3.5.0-0
registry.aliyuncs.com/k8sxio/etcd@sha256:9ce33ba33d8e738a5b85ed50b5080ac746deceed4a7496c550927a7a19ca3b6d
registry.aliyuncs.com/k8sxio/kube-apiserver:v1.22.2
registry.aliyuncs.com/k8sxio/kube-apiserver@sha256:eb4fae890583e8d4449c1e18b097aec5574c25c8f0323369a2df871ffa146f41
registry.aliyuncs.com/k8sxio/kube-controller-manager:v1.22.2
registry.aliyuncs.com/k8sxio/kube-controller-manager@sha256:91ccb477199cdb4c63fb0c8fcc39517a186505daf4ed52229904e6f9d09fd6f9
registry.aliyuncs.com/k8sxio/kube-proxy:v1.22.2
registry.aliyuncs.com/k8sxio/kube-proxy@sha256:561d6cb95c32333db13ea847396167e903d97cf6e08dd937906c3dd0108580b7
registry.aliyuncs.com/k8sxio/kube-scheduler:v1.22.2
registry.aliyuncs.com/k8sxio/kube-scheduler@sha256:c76cb73debd5e37fe7ad42cea9a67e0bfdd51dd56be7b90bdc50dd1bc03c018b
registry.aliyuncs.com/k8sxio/pause:3.5
registry.aliyuncs.com/k8sxio/pause@sha256:1ff6c18fbef2045af6b9c16bf034cc421a29027b800e4f9b68ae9b1cb3e9ae07
sha256:0048118155842e4c91f0498dd298b8e93dc3aecc7052d9882b76f48e311a76ba
sha256:5425bcbd23c54270d9de028c09634f8e9a014e9351387160c133ccf3a53ab3dc
sha256:873127efbc8a791d06e85271d9a2ec4c5d58afdf612d490e24fb3ec68e891c8d
sha256:8d147537fb7d1ac8895da4d55a5e53621949981e2e6460976dae812f83d84a44
sha256:b51ddc1014b04295e85be898dac2cd4c053433bfe7e702d7e9d6008f3779609b
sha256:e64579b7d8862eff8418d27bf67011e348a5d926fa80494a6475b3dc959777f5
sha256:ed210e3e4a5bae1237f1bb44d72a05a2f1e5c6bfe7a7e73da179e2534269c459
[root@master1 ~]#
[root@master1 ~]#ctr -n k8s.io i tag docker.io/coredns/coredns:1.8.4 registry.aliyuncs.com/k8sxio/coredns:v1.8.4
registry.aliyuncs.com/k8sxio/coredns:v1.8.4
[root@master1 ~]#

Now we can initialize the master node with the configuration file above.

Pay special attention here: this first attempt fails...

bash
[root@master1 ~]#kubeadm init --config kubeadm.yaml
W1031 07:14:21.837059   26278 strict.go:55] error unmarshaling configuration schema.GroupVersionKind{Group:"kubelet.config.k8s.io", Version:"v1beta1", Kind:"KubeletConfiguration"}: error converting YAML to JSON: yaml: unmarshal errors:
  line 27: key "cgroupDriver" already set in map
[init] Using Kubernetes version: v1.22.2
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local master1] and IPs [10.96.0.1 172.29.9.51]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost master1] and IPs [172.29.9.51 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost master1] and IPs [172.29.9.51 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

        Unfortunately, an error has occurred:
                timed out waiting for the condition

        This error is likely caused by:
                - The kubelet is not running
                - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

        If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
                - 'systemctl status kubelet'
                - 'journalctl -xeu kubelet'

        Additionally, a control plane component may have crashed or exited when started by the container runtime.
        To troubleshoot, list all containers using your preferred container runtimes CLI.

        Here is one example how you may list all Kubernetes containers running in cri-o/containerd using crictl:
                - 'crictl --runtime-endpoint /run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
                Once you have found the failing container, you can inspect its logs with:
                - 'crictl --runtime-endpoint /run/containerd/containerd.sock logs CONTAINERID'

error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
[root@master1 ~]#

Let's dig further into the error logs:

From /var/log/messages we can see the underlying error: error="failed to get sandbox image \"k8s.gcr.io/pause:3.5\"".

Strange: the Alibaba Cloud pause image has already been downloaded locally, so why is it still trying to pull the pause image from the default k8s registry?

Following the error message, let's try pulling the pause image from the official k8s registry again and see what happens:

After several attempts, pulling the pause image from the official registry kept failing, even over a proxy.

But wait: we can simply re-tag the image we already pulled from the Alibaba Cloud registry. Let's try:
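These are the same re-tag commands shown in the Notes section at the end of this article, repeated here for readability:

bash
ctr -n k8s.io i pull registry.aliyuncs.com/k8sxio/pause:3.5
ctr -n k8s.io i tag registry.aliyuncs.com/k8sxio/pause:3.5 k8s.gcr.io/pause:3.5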

Now reset master1 with kubeadm reset and initialize the cluster again:

bash
[root@master1 ~]#kubeadm init --config kubeadm.yaml
W1031 07:56:49.681727   27288 strict.go:55] error unmarshaling configuration schema.GroupVersionKind{Group:"kubelet.config.k8s.io", Version:"v1beta1", Kind:"KubeletConfiguration"}: error converting YAML to JSON: yaml: unmarshal errors:
  line 27: key "cgroupDriver" already set in map
[init] Using Kubernetes version: v1.22.2
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local master1] and IPs [10.96.0.1 172.29.9.51]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost master1] and IPs [172.29.9.51 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost master1] and IPs [172.29.9.51 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[apiclient] All control plane components are healthy after 210.014030 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.22" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node master1 as control-plane by adding the labels: [node-role.kubernetes.io/master(deprecated) node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node master1 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: abcdef.0123456789abcdef
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https:

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 172.29.9.51:6443 --token abcdef.0123456789abcdef \
        --discovery-token-ca-cert-hash sha256:7fb11aea8a467bd1453efe10600c167b87a5f04d55d7e60298583a6a0c736ec4
[root@master1 ~]#

Note:

I1030 07:26:13.898398   18436 checks.go:205] validating availability of port 10250   # kubelet port
I1030 07:26:13.898547   18436 checks.go:205] validating availability of port 2379    # etcd port
I1030 07:26:13.898590   18436 checks.go:205] validating availability of port 2380    # etcd port

The master1 node has been initialized successfully.

Copy the kubeconfig file as instructed in the installation output:
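These are the commands printed by kubeadm init above:

bash
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config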

Then you can use kubectl to confirm that the master node has been initialized successfully:
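For example (the original showed a screenshot here):

bash
kubectl get nodes
kubectl get pods -n kube-system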

3.4 Add the worker nodes

Remember to complete the prerequisite configuration above on the nodes first. Copy the $HOME/.kube/config file from the master to the corresponding path on each node (so kubectl works there as well), install kubeadm, kubelet and, optionally, kubectl, then run the join command printed at the end of the initialization:

bash
kubeadm join 172.29.9.51:6443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:7fb11aea8a467bd1453efe10600c167b87a5f04d55d7e60298583a6a0c736ec4

join command: if you forget the command above, you can regenerate it with kubeadm token create --print-join-command.

After it succeeds, run kubectl get nodes:

[root@master1 ~]#kubectl get node
NAME      STATUS   ROLES                  AGE    VERSION
master1   Ready    control-plane,master   31m    v1.22.2
node1     Ready    <none>                 102s   v1.22.2
node2     Ready    <none>                 95s    v1.22.2
[root@master1 ~]#

3.5 Install the flannel network plugin

At this point the cluster cannot really be used yet, because no network plugin has been installed. Next we install the flannel network plugin; the manifest can be obtained from the documentation at https: . If the machines have multiple network interfaces, search the manifest for the DaemonSet named kube-flannel-ds and add an --iface flag to the kube-flannel container to select the internal NIC:

yaml
➜ ~ vi kube-flannel.yml
......
containers:
- name: kube-flannel
  image: quay.io/coreos/flannel:v0.15.0
  command:
  - /opt/bin/flanneld
  args:
  - --ip-masq
  - --kube-subnet-mgr
  - --iface=eth0     # if the host has multiple NICs, specify the internal one here
......

bash
https:
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: psp.flannel.unprivileged
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: docker/default
    seccomp.security.alpha.kubernetes.io/defaultProfileName: docker/default
    apparmor.security.beta.kubernetes.io/allowedProfileNames: runtime/default
    apparmor.security.beta.kubernetes.io/defaultProfileName: runtime/default
spec:
  privileged: false
  volumes:
  - configMap
  - secret
  - emptyDir
  - hostPath
  allowedHostPaths:
  - pathPrefix: "/etc/cni/net.d"
  - pathPrefix: "/etc/kube-flannel"
  - pathPrefix: "/run/flannel"
  readOnlyRootFilesystem: false
  # Users and groups
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  # Privilege Escalation
  allowPrivilegeEscalation: false
  defaultAllowPrivilegeEscalation: false
  # Capabilities
  allowedCapabilities: ['NET_ADMIN', 'NET_RAW']
  defaultAddCapabilities: []
  requiredDropCapabilities: []
  # Host namespaces
  hostPID: false
  hostIPC: false
  hostNetwork: true
  hostPorts:
  - min: 0
    max: 65535
  # SELinux
  seLinux:
    # SELinux is unused in CaaSP
    rule: 'RunAsAny'
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
rules:
- apiGroups: ['extensions']
  resources: ['podsecuritypolicies']
  verbs: ['use']
  resourceNames: ['psp.flannel.unprivileged']
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  selector:
    matchLabels:
      app: flannel
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
      hostNetwork: true
      priorityClassName: system-node-critical
      tolerations:
      - operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
      - name: install-cni-plugin
        image: rancher/mirrored-flannelcni-flannel-cni-plugin:v1.2
        command:
        - cp
        args:
        - -f
        - /flannel
        - /opt/cni/bin/flannel
        volumeMounts:
        - name: cni-plugin
          mountPath: /opt/cni/bin
      - name: install-cni
        image: quay.io/coreos/flannel:v0.15.0
        command:
        - cp
        args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conflist
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.15.0
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        resources:
          requests:
            cpu: "100m"
            memory: "50Mi"
          limits:
            cpu: "100m"
            memory: "50Mi"
        securityContext:
          privileged: false
          capabilities:
            add: ["NET_ADMIN", "NET_RAW"]
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: run
          mountPath: /run/flannel
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      volumes:
      - name: run
        hostPath:
          path: /run/flannel
      - name: cni-plugin
        hostPath:
          path: /opt/cni/bin
      - name: cni
        hostPath:
          path: /etc/cni/net.d
      - name: flannel-cfg
        configMap:
          name: kube-flannel-cfg
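The manifest is then applied on master1 and the pod status checked. The exact commands were lost with the screenshots, but they are presumably the usual ones:

bash
[root@master1 ~]#kubectl apply -f kube-flannel.yml
[root@master1 ~]#kubectl get pods -n kube-system -o wide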

We can see that the kube-flannel and kube-proxy pods on node1 and node2 have not started successfully!

These components run as DaemonSets, i.e., one pod is started on every node.

Use kubectl describe pod xxx to find out why these pods failed to deploy:
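For example (the pod names below are placeholders; substitute the real names from kubectl get pods -n kube-system):

bash
kubectl describe pod kube-flannel-ds-xxxxx -n kube-system
kubectl describe pod kube-proxy-xxxxx -n kube-system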

The inspection shows that the kube-flannel and kube-proxy pods on node1 and node2 fail because the k8s.gcr.io/pause:3.5 image is missing, the same problem we hit on the master node.

Next, let's test on node1:

First, check which images exist locally:

It turns out there are no images locally at all.

Following the error message, try pulling the k8s.gcr.io/pause:3.5 image manually:

It fails; from the error, this image simply cannot be pulled.

So, just as on master1, pull the image from the Alibaba Cloud registry and re-tag it:
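On node1, the same pull-and-tag commands as used on master1:

bash
ctr -n k8s.io i pull registry.aliyuncs.com/k8sxio/pause:3.5
ctr -n k8s.io i tag registry.aliyuncs.com/k8sxio/pause:3.5 k8s.gcr.io/pause:3.5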

After re-tagging, go back to master1 and check the pod status again:

Now the kube-flannel and kube-proxy pods on node1 are running; only node2's are left, which confirms the cause.

Apply the same fix on node2:
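On node2, the same two commands again:

bash
ctr -n k8s.io i pull registry.aliyuncs.com/k8sxio/pause:3.5
ctr -n k8s.io i tag registry.aliyuncs.com/k8sxio/pause:3.5 k8s.gcr.io/pause:3.5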

Finally, confirm the pod status once more on master1:

Everything is now running normally.

After the network plugin is deployed, running ifconfig normally shows two new virtual devices, cni0 and flannel.1. If you do not see cni0, don't worry: check whether the /var/lib/cni directory exists. If it does not, that is not a deployment problem; it just means no application has run on that node yet. As soon as a pod runs on the node, the directory is created and the cni0 device appears.
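A quick way to check on any node (my own commands, not from the original):

bash
ip addr show flannel.1
ip addr show cni0    # may not exist yet if no pod has been scheduled on this node
ls /var/lib/cni/     # appears once a pod has run on the node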

Additional nodes can be added in exactly the same way.

4. Dashboard

A v1.22.2 cluster needs the latest 2.0+ version of the Dashboard:

4.1 Download the Dashboard YAML manifest

bash
# The recommended way:
[root@master1 ~]#wget https:
--2021-10-31 22:56:47--  https:
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7552 (7.4K) [text/plain]
Saving to: ‘recommended.yaml’
100%[===============================================================================================================================>] 7,552  3.76KB/s  in 2.0s
2021-10-31 22:56:55 (3.76 KB/s) - ‘recommended.yaml’ saved [7552/7552]

[root@master1 ~]#ll
total 124120
-rw-r--r-- 1 root root 127074581 Jul 30 01:16 cri-containerd-cni-1.5.5-linux-amd64.tar.gz
-rw-r--r-- 1 root root      1976 Oct 31 06:54 kubeadm.yaml
-rw-r--r-- 1 root root      5175 Oct 31 08:38 kube-flannel.yml
-rw-r--r-- 1 root root      7552 Oct 31 22:56 recommended.yaml
[root@master1 ~]#mv recommended.yaml kube-dashboard.yaml

4.2 Modify kube-dashboard.yaml

bash
[root@master1 ~]#vim kube-dashboard.yaml
# Change the Service to NodePort type
......
kind: Service
apiVersion: v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard
spec:
  ports:
  - port: 443
    targetPort: 8443
  selector:
    k8s-app: kubernetes-dashboard
  type: NodePort     # add type: NodePort to turn it into a NodePort service
......

4.3 Deploy kube-dashboard.yaml

In the YAML you can see that the new Dashboard ships with a metrics-scraper component, which collects basic resource metrics through the Kubernetes Metrics API and shows them in the web UI. To display monitoring data on the page you therefore need something that provides the Metrics API, for example Metrics Server.

Create it directly:

bash
[root@master1 ~]#kubectl apply -f kube-dashboard.yaml
namespace/kubernetes-dashboard created
serviceaccount/kubernetes-dashboard created
service/kubernetes-dashboard created
secret/kubernetes-dashboard-certs created
secret/kubernetes-dashboard-csrf created
secret/kubernetes-dashboard-key-holder created
configmap/kubernetes-dashboard-settings created
role.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrole.rbac.authorization.k8s.io/kubernetes-dashboard created
rolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
deployment.apps/kubernetes-dashboard created
service/dashboard-metrics-scraper created
Warning: spec.template.metadata.annotations[seccomp.security.alpha.kubernetes.io/pod]: deprecated since v1.19; use the "seccompProfile" field instead
deployment.apps/dashboard-metrics-scraper created
[root@master1 ~]#

The new Dashboard is installed in the kubernetes-dashboard namespace by default:

bash
[root@master1 ~]#kubectl get pod -A
NAMESPACE              NAME                                         READY   STATUS    RESTARTS      AGE
kube-system            coredns-7568f67dbd-559d2                     1/1     Running   0             14m
kube-system            coredns-7568f67dbd-b95fg                     1/1     Running   0             14m
kube-system            etcd-master1                                 1/1     Running   1             15h
kube-system            kube-apiserver-master1                       1/1     Running   2 (15h ago)   15h
kube-system            kube-controller-manager-master1              1/1     Running   1             15h
kube-system            kube-flannel-ds-gprz7                        1/1     Running   0             14h
kube-system            kube-flannel-ds-h9pw6                        1/1     Running   0             14h
kube-system            kube-flannel-ds-pct7r                        1/1     Running   0             14h
kube-system            kube-proxy-4sw76                             1/1     Running   0             15h
kube-system            kube-proxy-mkghd                             1/1     Running   0             14h
kube-system            kube-proxy-s2748                             1/1     Running   0             14h
kube-system            kube-scheduler-master1                       1/1     Running   1             15h
kubernetes-dashboard   dashboard-metrics-scraper-856586f554-zwv6r   1/1     Running   0             34s
kubernetes-dashboard   kubernetes-dashboard-67484c44f6-hh5zj        1/1     Running   0             34s
[root@master1 ~]#

4.4 Fix the CNI configuration

Looking closely, the pods above were assigned IPs in the 10.88.x.x range, including the automatically installed CoreDNS pods, even though we configured podSubnet as 10.244.0.0/16 earlier. Why?

First, check the CNI configuration files:

bash
[root@node1 ~]#ll /etc/cni/net.d/
total 8
-rw-r--r-- 1 1001 116  604 Jul 30 01:13 10-containerd-net.conflist
-rw-r--r-- 1 root root 292 Oct 31 20:30 10-flannel.conflist
[root@node1 ~]#

There are two configurations here: one is 10-containerd-net.conflist, the other is the one generated by the Flannel network plugin we installed above. We obviously want the Flannel one, so let's look at containerd's bundled CNI plugin configuration first:

bash
[root@node1 net.d]#cat 10-containerd-net.conflist
{
  "cniVersion": "0.4.0",
  "name": "containerd-net",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": true,
      "promiscMode": true,
      "ipam": {
        "type": "host-local",
        "ranges": [
          [{ "subnet": "10.88.0.0/16" }],
          [{ "subnet": "2001:4860:4860::/64" }]
        ],
        "routes": [
          { "dst": "0.0.0.0/0" },
          { "dst": "::/0" }
        ]
      }
    },
    {
      "type": "portmap",
      "capabilities": { "portMappings": true }
    }
  ]
}
[root@node1 net.d]#

We can see the IP range here is exactly 10.88.0.0/16, and that this CNI plugin is of type bridge, with a bridge named cni0.

However, containers on a plain bridge network cannot communicate across hosts; cross-host communication needs another CNI plugin, such as the Flannel we installed above, or Calico, and so on.

Since there are two CNI configurations, we need to delete 10-containerd-net.conflist: when this directory contains multiple CNI config files, kubelet uses the first one in lexicographic order of file name, so the containerd-net plugin was being selected by default.

On node1:

bash
[root@node1 ~]#mv /etc/cni/net.d/10-containerd-net.conflist{,.bak}
[root@node1 ~]#ifconfig cni0 down && ip link delete cni0
[root@node1 ~]#systemctl daemon-reload
[root@node1 ~]#systemctl restart containerd kubelet

# Command summary
mv /etc/cni/net.d/10-containerd-net.conflist{,.bak}
ifconfig cni0 down && ip link delete cni0
systemctl daemon-reload
systemctl restart containerd kubelet

Do the same on node2:

bash
[root@node2 ~]#mv /etc/cni/net.d/10-containerd-net.conflist{,.bak}
[root@node2 ~]#ifconfig cni0 down && ip link delete cni0
[root@node2 ~]#systemctl daemon-reload
[root@node2 ~]#systemctl restart containerd kubelet

Then remember to recreate the coredns and dashboard pods; after recreation their IP addresses are correct:

bash
[root@master1 ~]#kubectl delete pod coredns-7568f67dbd-ftnlb coredns-7568f67dbd-scdb5 -n kube-system
pod "coredns-7568f67dbd-ftnlb" deleted
pod "coredns-7568f67dbd-scdb5" deleted
[root@master1 ~]#kubectl delete pod dashboard-metrics-scraper-856586f554-r5qvs kubernetes-dashboard-67484c44f6-9dg85 -n kubernetes-dashboard
pod "dashboard-metrics-scraper-856586f554-r5qvs" deleted
pod "kubernetes-dashboard-67484c44f6-9dg85" deleted

Check the Dashboard's NodePort:

bash
[root@master1 ~]#kubectl get svc -A
NAMESPACE              NAME                        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
default                kubernetes                  ClusterIP   10.96.0.1        <none>        443/TCP                  15h
kube-system            kube-dns                    ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   15h
kubernetes-dashboard   dashboard-metrics-scraper   ClusterIP   10.99.185.146    <none>        8000/TCP                 18m
kubernetes-dashboard   kubernetes-dashboard        NodePort    10.105.110.130   <none>        443:30498/TCP            18m
[root@master1 ~]#

Then access the Dashboard via port 30498 shown above. Remember to use https; if Chrome refuses the connection, try Firefox. If the page still won't open, click through to trust the certificate.

Once the certificate is trusted, the Dashboard login page appears:

Then create a user with cluster-wide admin permissions to log in to the Dashboard (admin.yaml):

yaml
# admin.yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: admin
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: admin
  namespace: kubernetes-dashboard
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin
  namespace: kubernetes-dashboard

Create it:

bash
[root@master1 ~]#kubectl apply -f admin.yaml
clusterrolebinding.rbac.authorization.k8s.io/admin created
serviceaccount/admin created
[root@master1 ~]#
[root@master1 ~]#kubectl get secret admin-token-sbh8v -o jsonpath={.data.token} -n kubernetes-dashboard |base64 -d
eyJhbGciOiJSUzI1NiIsImtpZCI6Im9hOERROHhuWTRSeVFweG1kdnpXSTRNNXg3bDZ0ZkRDVWNzN0l5Z0haT1UifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJhZG1pbi10b2tlbi1zYmg4diIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJhZG1pbiIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6IjYwMDVmZjc5LWJlYzItNDE5MC1iMmNmLWMwOGVhNDRmZTVmMCIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlcm5ldGVzLWRhc2hib2FyZDphZG1pbiJ9.FyxLcGjtxI5Asl07Z21FzAFFcgwVFI7zc-2ITU6uQV3dzQaUNEl642MyIkhkEvdqd6He0eKgR79xm1ly9PL0exD2YUVFfeCnt-M-LiJwM59YGEQfQHYcku6ozikyQ7YeooV2bQ6EsAsyseVcCrHfa4ZXjmBb9L5rRX9Kds49yVsVFWbgbhe3LzNFMy1wm4fTN4EXPvegmA6mUaMLlLsrQJ2oNokx9TYuifIXjQDATZMTFc-YQMZahAbT-rUz8KccOt1O59NebitAHW0YKVFTkxbvBQwghe_yf25_j07LRbygSnHV5OrMEqZXl82AhdnXsvqdjjes6AxejGuDtwiiyw
[root@master1 ~]#
# A long base64-decoded token string is printed

Then use the decoded string above as the token to log in to the Dashboard. The new version even adds a dark mode:

5. Cleanup

If you run into other problems during the installation, you can reset everything with the following commands:

bash
➜ ~ kubeadm reset
➜ ~ ifconfig cni0 down && ip link delete cni0
➜ ~ ifconfig flannel.1 down && ip link delete flannel.1
➜ ~ rm -rf /var/lib/cni/

We have now finished building a Kubernetes v1.22.2 cluster with kubeadm, including coredns, ipvs, flannel, and containerd.

Finally, remember to take snapshots of the three VMs so they can be restored easily later.

Notes

1. The coredns and pause image problems (resolved)

01. coredns image pull failure

Symptom: pulling registry.aliyuncs.com/k8sxio/coredns:v1.8.4 fails with an error saying the Alibaba Cloud repository does not contain this image.

So pull the image from the official repository first and then tag it:

bash
ctr -n k8s.io i pull docker.io/coredns/coredns:1.8.4
ctr -n k8s.io i tag docker.io/coredns/coredns:1.8.4 registry.aliyuncs.com/k8sxio/coredns:v1.8.4

02. The pause image pulled from the Alibaba Cloud k8sxio repository is not used

Even though the pause image was downloaded from the Alibaba Cloud registry on all three nodes, Kubernetes did not use it and still tried to pull pause from the official registry; it only works after re-tagging the image as k8s.gcr.io/pause:3.5.

This may be a bug; set it aside for now, or test again with the newer v1.22.3.

bash
[root@master1 ~]#systemctl cat kubelet
[root@master1 ~]#cat /var/lib/kubelet/kubeadm-flags.env

We can see that kubelet's image repository has already been changed to the Alibaba Cloud k8sxio repository:

bash
[root@master1 ~]#vim kubeadm.yaml

The fix:

bash
ctr -n k8s.io i pull registry.aliyuncs.com/k8sxio/pause:3.5
ctr -n k8s.io i tag registry.aliyuncs.com/k8sxio/pause:3.5 k8s.gcr.io/pause:3.5
bash
[root@master1 ~]#ctr -n k8s.io i ls -q
docker.io/coredns/coredns:1.8.4
docker.io/rancher/mirrored-flannelcni-flannel-cni-plugin:v1.2
docker.io/rancher/mirrored-flannelcni-flannel-cni-plugin@sha256:b69fb2dddf176edeb7617b176543f3f33d71482d5d425217f360eca5390911dc
k8s.gcr.io/pause:3.5
quay.io/coreos/flannel:v0.15.0
quay.io/coreos/flannel@sha256:bf24fa829f753d20b4e36c64cf9603120c6ffec9652834953551b3ea455c4630
registry.aliyuncs.com/k8sxio/coredns:v1.8.4
registry.aliyuncs.com/k8sxio/etcd:3.5.0-0
registry.aliyuncs.com/k8sxio/etcd@sha256:9ce33ba33d8e738a5b85ed50b5080ac746deceed4a7496c550927a7a19ca3b6d
registry.aliyuncs.com/k8sxio/kube-apiserver:v1.22.2
registry.aliyuncs.com/k8sxio/kube-apiserver@sha256:eb4fae890583e8d4449c1e18b097aec5574c25c8f0323369a2df871ffa146f41
registry.aliyuncs.com/k8sxio/kube-controller-manager:v1.22.2
registry.aliyuncs.com/k8sxio/kube-controller-manager@sha256:91ccb477199cdb4c63fb0c8fcc39517a186505daf4ed52229904e6f9d09fd6f9
registry.aliyuncs.com/k8sxio/kube-proxy:v1.22.2
registry.aliyuncs.com/k8sxio/kube-proxy@sha256:561d6cb95c32333db13ea847396167e903d97cf6e08dd937906c3dd0108580b7
registry.aliyuncs.com/k8sxio/kube-scheduler:v1.22.2
registry.aliyuncs.com/k8sxio/kube-scheduler@sha256:c76cb73debd5e37fe7ad42cea9a67e0bfdd51dd56be7b90bdc50dd1bc03c018b
registry.aliyuncs.com/k8sxio/pause:3.5
registry.aliyuncs.com/k8sxio/pause@sha256:1ff6c18fbef2045af6b9c16bf034cc421a29027b800e4f9b68ae9b1cb3e9ae07
sha256:0048118155842e4c91f0498dd298b8e93dc3aecc7052d9882b76f48e311a76ba
sha256:09b38f011a29c697679aa10918b7514e22136b50ceb6cf59d13151453fe8b7a0
sha256:5425bcbd23c54270d9de028c09634f8e9a014e9351387160c133ccf3a53ab3dc
sha256:873127efbc8a791d06e85271d9a2ec4c5d58afdf612d490e24fb3ec68e891c8d
sha256:8d147537fb7d1ac8895da4d55a5e53621949981e2e6460976dae812f83d84a44
sha256:98660e6e4c3ae49bf49cd640309f79626c302e1d8292e1971dcc2e6a6b7b8c4d
sha256:b51ddc1014b04295e85be898dac2cd4c053433bfe7e702d7e9d6008f3779609b
sha256:e64579b7d8862eff8418d27bf67011e348a5d926fa80494a6475b3dc959777f5
sha256:ed210e3e4a5bae1237f1bb44d72a05a2f1e5c6bfe7a7e73da179e2534269c459
[root@master1 ~]#


Note: updated 2021-11-04 19:47:09

This experiment: the infra (pause) container image problem

1. The meaning of kubelet's --pod-infra-container-image option

bash
[root@master1 ~]#kubelet --help |grep infra
      --pod-infra-container-image string   Specified image will not be pruned by the image garbage collector. When container-runtime is set to 'docker', all containers in each pod will use the network/ipc namespaces from this image. Other CRI implementations have their own configuration to set this image. (default "k8s.gcr.io/pause:3.5")
# i.e.: the specified image is not pruned by the image garbage collector; with the 'docker' runtime,
# every container in a pod shares the network/IPC namespaces of this image, while other CRI
# implementations have their own setting for it (default: "k8s.gcr.io/pause:3.5")

Before each pod starts, a container is launched from this k8s.gcr.io/pause:3.5 image (the infra, or sandbox, container).

The default is k8s.gcr.io/pause:3.5, and in practice the image is specified by kubelet's --pod-infra-container-image parameter.

2. Let's look at the current value of this parameter

bash
[root@master1 ~]#systemctl status kubelet
kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Sun 2021-10-31 23:12:58 CST; 3 days ago
     Docs: https:
 Main PID: 95001 (kubelet)
    Tasks: 15
   Memory: 89.7M
   CGroup: /system.slice/kubelet.service
           └─95001 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/...
Nov 04 07:20:37 master1 kubelet[95001]: E1104 07:20:37.295297   95001 kubelet.go:2332] "Container runtime network not ready" networkReady="NetworkReady=fals...itialized"
Nov 04 07:20:42 master1 kubelet[95001]: E1104 07:20:42.297382   95001 kubelet.go:2332] "Container runtime network not ready" networkReady="NetworkReady=fals...itialized"
Nov 04 07:20:47 master1 kubelet[95001]: E1104 07:20:47.299431   95001 kubelet.go:2332] "Container runtime network not ready" networkReady="NetworkReady=fals...itialized"
Nov 04 07:20:52 master1 kubelet[95001]: E1104 07:20:52.303127   95001 kubelet.go:2332] "Container runtime network not ready" networkReady="NetworkReady=fals...itialized"
Nov 04 07:20:57 master1 kubelet[95001]: E1104 07:20:57.305007   95001 kubelet.go:2332] "Container runtime network not ready" networkReady="NetworkReady=fals...itialized"
Nov 04 07:21:02 master1 kubelet[95001]: E1104 07:21:02.306356   95001 kubelet.go:2332] "Container runtime network not ready" networkReady="NetworkReady=fals...itialized"
Nov 04 07:21:07 master1 kubelet[95001]: E1104 07:21:07.308434   95001 kubelet.go:2332] "Container runtime network not ready" networkReady="NetworkReady=fals...itialized"
Nov 04 07:21:12 master1 kubelet[95001]: E1104 07:21:12.311120   95001 kubelet.go:2332] "Container runtime network not ready" networkReady="NetworkReady=fals...itialized"
Nov 04 07:21:17 master1 kubelet[95001]: E1104 07:21:17.313708   95001 kubelet.go:2332] "Container runtime network not ready" networkReady="NetworkReady=fals...itialized"
Nov 04 07:21:22 master1 kubelet[95001]: E1104 07:21:22.315374   95001 kubelet.go:2332] "Container runtime network not ready" networkReady="NetworkReady=fals...itialized"
Hint: Some lines were ellipsized, use -l to show in full.

[root@master1 ~]#cat /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/sysconfig/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS

# Check the environment file
[root@master1 ~]#cat /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS="--container-runtime=remote --container-runtime-endpoint=/run/containerd/containerd.sock --pod-infra-container-image=registry.aliyuncs.com/k8sxio/pause:3.5"
[root@master1 ~]#

We can see that --pod-infra-container-image=registry.aliyuncs.com/k8sxio/pause:3.5 already points to the Alibaba Cloud image. In theory, with this option set the node should pull that image, but in practice it still pulls from the default k8s.gcr.io/pause:3.5 address, which is not what we expect.

Possible explanations:

1. The version installed here is k8s v1.22.2, and this particular version may have a bug;

2. Try upgrading;

3. Roll back to an older version, or move to a newer one;

4. Have the instructor (Yangming) trace through the source code.

This does not affect testing here, but it would matter a lot in a production environment.

Whether newer versions still exhibit this pause-image bug has not been tested yet; setting it aside for now. (2021-11-02 22:12)


  • Note: Yangming's final solution.

Let's look at this parameter once more:

bash
[root@master1 ~]#kubelet --help |grep infra
      --pod-infra-container-image string   Specified image will not be pruned by the image garbage collector. When container-runtime is set to 'docker', all containers in each pod will use the network/ipc namespaces from this image. Other CRI implementations have their own configuration to set this image. (default "k8s.gcr.io/pause:3.5")
[root@master1 ~]#

Restart the kubelet service and watch the logs again:

bash
[root@master1 ~]#tail -f /var/log/messages
[root@master1 ~]#systemctl restart kubelet

A warning appears: with a remote container runtime, kubelet's --pod-infra-container-image flag does not take effect; the sandbox image has to be configured separately in the runtime itself.

How to configure it:

bash
[root@master1 ~]#vim /etc/containerd/config.toml
……
sandbox_image = "registry.aliyuncs.com/k8sxio/pause:3.5"
……

Restart the containerd service:

bash
[root@master1 ~]#systemctl daemon-reload
[root@master1 ~]#systemctl restart containerd

With that, the pause image problem is solved!

2. Some YAML files cannot be downloaded

Because of network issues, the kube-dashboard.yaml and kube-flannel.yml files may fail to download; you can simply use the YAML files I provide.

bash
[root@master1 ~]#ll
total 124124
-rw-r--r-- 1 root root 7569 Oct 31 23:01 kube-dashboard.yaml
-rw-r--r-- 1 root root 5175 Oct 31 08:38 kube-flannel.yml

About me

The goals of my blog:

  • Clean layout and concise language;
  • Documents that work as manuals: detailed steps, no hidden pitfalls, source code provided;
  • All of my hands-on documents are tested personally. If you have any questions while following along, feel free to contact me at any time and we will solve them together!

🍀 WeChat QR code: x2675263825 (舍得), QQ: 2675263825.


🍀 WeChat official account: 《云原生架构师实战》


🍀 Personal blog

http:

Copyright: this article is copyrighted by One; if you repost it, please credit the source!


