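On the test server, the calico-kube-controllers pod kept reporting errors. The error log showed that the pod could not connect to kube-apiserver. It reaches kube-apiserver through the kubernetes service in the default namespace, whose service IP is 10.88.0.1: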
[root@public-test ~]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.88.0.1 <none> 443/TCP 116d
Querying the endpoints behind this service gives:
NAME ENDPOINTS AGE
kubernetes 192.168.1.60:6443,192.168.1.61:6443,192.168.1.62:6443 116
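These are the IPs of the three master machines.
All three master nodes are tainted, so under normal circumstances the calico-kube-controllers pod would not be scheduled onto a master node. In practice, however:
calico-kube-controllers had in fact been scheduled onto a master node. The pod's readiness and liveness probes both connect directly to the kubernetes service in the default namespace. As it happens, the masters do not run kube-proxy, so traffic to the service IP cannot be load-balanced to the backend endpoints.
That is why the log kept reporting timeouts when connecting to kube-apiserver.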
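One way to confirm this diagnosis from the affected master is to compare a TCP connection to the service IP with one made directly to an API server endpoint. The sketch below is only an illustration: `can_connect` is a hypothetical helper (not part of Calico or kubectl), and it assumes that `bash` and the coreutils `timeout` command are available on the node.

```shell
# can_connect HOST PORT [SECONDS]
# Hypothetical helper: succeeds if a TCP connection to HOST:PORT can be
# opened within SECONDS (default 3). Relies on bash's /dev/tcp redirection.
can_connect() {
    host=$1
    port=$2
    secs=${3:-3}
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example, using the addresses from this post:
#   can_connect 10.88.0.1 443     || echo "service IP unreachable"
#   can_connect 192.168.1.60 6443 && echo "API server endpoint reachable"
```

If the first check fails while the second succeeds, the API server itself is healthy and the problem lies in service routing on that node.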
Later I looked at the calico-etcd.yaml file and saw that the calico-kube-controllers pod is configured with tolerations:
tolerations:
  # Mark the pod as a critical add-on for rescheduling.
  - key: CriticalAddonsOnly
    operator: Exists
  - key: node-role.kubernetes.io/master
    effect: NoSchedule
Enabling eBPF in Calico requires the calicoctl tool. For how to configure calicoctl to connect to etcd, see the following link:
https://wenda.zuncuang.com/article/730
For background on what eBPF is, see:
Enabling eBPF with calicoctl is simple:
calicoctl patch felixconfiguration default --patch='{"spec": {"bpfEnabled": true}}'
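However, enabling eBPF requires a fairly recent kernel. I found an article on the official Calico website that covers the pros and cons of enabling eBPF as well as the kernel requirements, and I have excerpted it here for the record.
Link: https://docs.projectcalico.org/archive/v3.21/maintenance/enabling-bpf#configure-calico-to-talk-directly-to-the-api-server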
This guide explains how to enable the eBPF dataplane, a high-performance alternative to the standard (iptables based) dataplane for both Calico and kube-proxy.
The eBPF dataplane mode has several advantages over standard Linux networking pipeline mode:
It has native support for Kubernetes services (without needing kube-proxy) that:
To learn more and see performance metrics from our test environment, see the blog, Introducing the Calico eBPF dataplane.
eBPF mode currently has some limitations relative to the standard Linux pipeline mode:
This how-to guide uses the following Calico features:
eBPF (or “extended Berkeley Packet Filter”) is a technology that allows safe mini programs to be attached to various low-level hooks in the Linux kernel. eBPF has a wide variety of uses, including networking, security, and tracing. You’ll see a lot of non-networking projects leveraging eBPF, but for Calico our focus is on networking, and in particular, pushing the networking capabilities of the latest Linux kernels to the limit.
eBPF mode has the following pre-requisites:
A supported Linux distribution:
If Calico does not detect a compatible kernel, Calico will emit a warning and fall back to standard Linux networking.
For best pod-to-pod performance, an underlying network that doesn’t require Calico to use an overlay. For example:
If you must use an overlay, we recommend that you use VXLAN, not IPIP. VXLAN has much better performance than IPIP in eBPF mode due to various kernel optimisations.
Note: The default kernel used by EKS is not compatible with eBPF mode. If you wish to try eBPF mode with EKS, follow the Creating an EKS cluster for eBPF mode guide, which explains how to set up a suitable cluster.
This section explains how to make sure your cluster is suitable for eBPF mode.
To check that the kernel on a node is suitable, you can run:
$ uname -rv
The output should look like this:
5.4.0-42-generic #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020
In this case the kernel version is v5.4, which is suitable.
On Red Hat-derived distributions, you may see something like this:
4.18.0-193.el8.x86_64 (mockbuild@x86-vm-08.build.eng.bos.redhat.com)
Since the Red Hat kernel is v4.18 with at least build number 193, this kernel is suitable.
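The two checks above can be wrapped in a small script so they can be run on every node. This is a rough sketch, not an official Calico check: `kernel_supports_ebpf` is a hypothetical helper, and the general threshold of kernel v5.3 is an assumption taken from memory of the Calico v3.21 requirements (the text here only confirms that v5.4 and Red Hat 4.18.0-193 are suitable), so verify it against the docs for your version.

```shell
# kernel_supports_ebpf [RELEASE]
# Hypothetical helper. RELEASE defaults to the output of `uname -r`.
# Returns 0 if the kernel looks suitable, 1 if not, 2 if it cannot be parsed.
kernel_supports_ebpf() {
    release=${1:-$(uname -r)}
    version=${release%%-*}      # e.g. "5.4.0"
    major=${version%%.*}        # e.g. "5"
    rest=${version#*.}          # e.g. "4.0"
    minor=${rest%%.*}           # e.g. "4"
    case "$major$minor" in
        ''|*[!0-9]*) return 2 ;;
    esac
    # ASSUMPTION: mainline kernels need to be v5.3 or newer.
    if [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 3 ]; }; then
        return 0
    fi
    # Red Hat 8 kernels: v4.18.0 with build number 193 or higher.
    case "$release" in
        4.18.0-*.el8*)
            build=${release#4.18.0-}    # e.g. "193.el8.x86_64"
            build=${build%%.*}          # e.g. "193"
            case "$build" in
                ''|*[!0-9]*) return 2 ;;
            esac
            [ "$build" -ge 193 ]
            return
            ;;
    esac
    return 1
}

if kernel_supports_ebpf; then
    echo "kernel $(uname -r) looks suitable for eBPF mode"
else
    echo "kernel $(uname -r) may not be suitable for eBPF mode"
fi
```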
To verify that the BPF filesystem is mounted, on the host, you can run the following command:
mount | grep "/sys/fs/bpf"
If the BPF filesystem is mounted, you should see:
none on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
If you see no output, then the BPF filesystem is not mounted; consult the documentation for your OS distribution to see how to make sure the file system is mounted at boot in its standard location /sys/fs/bpf. This may involve editing /etc/fstab or adding a systemd unit, depending on your distribution. If the file system is not mounted on the host then eBPF mode will work normally until Calico is restarted, at which point workload networking will be disrupted for several seconds.
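For scripting the same check, parsing the mount table is a little more robust than grepping the output of `mount`. A minimal sketch, assuming a Linux host where `/proc/mounts` is readable; `bpffs_mounted` is a hypothetical helper name, and the optional file argument exists only so the function can be exercised against sample data.

```shell
# bpffs_mounted [MOUNTS_FILE]
# Hypothetical helper: succeeds if a filesystem of type "bpf" is mounted
# at /sys/fs/bpf according to MOUNTS_FILE (default /proc/mounts).
bpffs_mounted() {
    awk '$2 == "/sys/fs/bpf" && $3 == "bpf" { found = 1 }
         END { exit !found }' "${1:-/proc/mounts}"
}

if bpffs_mounted; then
    echo "BPF filesystem is mounted"
else
    echo "BPF filesystem is NOT mounted at /sys/fs/bpf"
fi
```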
If your distribution uses systemd, you can refer to the following settings:
cat <<EOF | sudo tee /etc/systemd/system/sys-fs-bpf.mount
[Unit]
Description=BPF mounts
DefaultDependencies=no
Before=local-fs.target umount.target
After=swap.target

[Mount]
What=bpffs
Where=/sys/fs/bpf
Type=bpf
Options=rw,nosuid,nodev,noexec,relatime,mode=700

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl start sys-fs-bpf.mount
systemctl enable sys-fs-bpf.mount
In eBPF mode, Calico implements Kubernetes service networking directly (rather than relying on kube-proxy). This means that, like kube-proxy, Calico must connect directly to the Kubernetes API server rather than via the API server’s ClusterIP.
First, make a note of the address of the API server:
If you have a single API server with a static IP address, you can use its IP address and port. The IP can be found by running:
kubectl get endpoints kubernetes -o wide
The output should look like the following, with a single IP address and port under “ENDPOINTS”:
NAME         ENDPOINTS             AGE
kubernetes   172.16.101.157:6443   40m
If there are multiple entries under “ENDPOINTS” then your cluster must have more than one API server. In that case, you should try to determine the load balancing approach used by your cluster and use the appropriate option below.
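To see at a glance whether a cluster falls into the single-endpoint or the multi-endpoint case, the ENDPOINTS column can be split into one address per line. The following is a sketch only; `api_endpoints` is a hypothetical helper that reads the tabular output of `kubectl get endpoints kubernetes` on standard input.

```shell
# api_endpoints
# Hypothetical helper: reads `kubectl get endpoints kubernetes` output on
# stdin and prints each API server endpoint (host:port) on its own line.
api_endpoints() {
    awk 'NR > 1 && $1 == "kubernetes" {
             n = split($2, eps, ",")
             for (i = 1; i <= n; i++) print eps[i]
         }'
}

# Example:
#   kubectl get endpoints kubernetes | api_endpoints
```

In the cluster described at the top of this post, that would list the three master addresses, so the multi-API-server case applies.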
Tip: If your cluster uses a ConfigMap to configure kube-proxy you can find the “right” way to reach the API server by examining the config map. For example:
$ kubectl get configmap -n kube-system kube-proxy -o yaml | grep server
    server: https://d881b853ae312e00302a84f1e346a77.gr7.us-west-2.eks.amazonaws.com
In this case, the server is d881b853aea312e00302a84f1e346a77.gr7.us-west-2.eks.amazonaws.com and the port is 443 (the standard HTTPS port).
The next step depends on whether you installed Calico using the operator, or a manifest:
If you installed Calico using a manifest, create the following config map in the kube-system namespace using the host and port determined above:
kind: ConfigMap
apiVersion: v1
metadata:
  name: kubernetes-services-endpoint
  namespace: kube-system
data:
  KUBERNETES_SERVICE_HOST: "<API server host>"
  KUBERNETES_SERVICE_PORT: "<API server port>"
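To avoid editing the placeholders by hand, the manifest can be generated from the host and port noted earlier. This is just a convenience sketch: `render_endpoint_configmap` is a hypothetical helper, and the address in the example is one of the master IPs from this post, used purely for illustration.

```shell
# render_endpoint_configmap HOST PORT
# Hypothetical helper: prints the kubernetes-services-endpoint ConfigMap
# with the given API server host and port filled in.
render_endpoint_configmap() {
    api_host=$1
    api_port=$2
    cat <<EOF
kind: ConfigMap
apiVersion: v1
metadata:
  name: kubernetes-services-endpoint
  namespace: kube-system
data:
  KUBERNETES_SERVICE_HOST: "$api_host"
  KUBERNETES_SERVICE_PORT: "$api_port"
EOF
}

# Example:
#   render_endpoint_configmap 192.168.1.60 6443 | kubectl apply -f -
```

With several API servers, pointing at a single master IP works only while that master is up, so a load balancer or virtual IP in front of the API servers is probably the safer value if the cluster has one.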
Wait 60s for kubelet to pick up the ConfigMap (see Kubernetes issue #30189); then, restart the Calico pods to pick up the change:
kubectl delete pod -n kube-system -l k8s-app=calico-node
kubectl delete pod -n kube-system -l k8s-app=calico-kube-controllers
And, if using Typha:
kubectl delete pod -n kube-system -l k8s-app=calico-typha
Confirm that pods restart and then reach the Running state with the following command:
watch "kubectl get pods -n kube-system | grep calico"
You can verify that the change was picked up by checking the logs of one of the calico/node pods.
kubectl get po -n kube-system -l k8s-app=calico-node
The output should show one or more pods:
NAME                READY   STATUS    RESTARTS   AGE
calico-node-d6znw   1/1     Running   0          48m
...
Then, to search the logs, choose a pod and run:
kubectl logs -n kube-system <pod name> | grep KUBERNETES_SERVICE_HOST
You should see the following log, with the correct KUBERNETES_SERVICE_... values.
2020-08-26 12:26:29.025 [INFO][7] daemon.go 182: Kubernetes server override env vars. KUBERNETES_SERVICE_HOST="172.16.101.157" KUBERNETES_SERVICE_PORT="6443"
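When checking several nodes, it can help to assert that the logged values match what was put in the ConfigMap rather than reading each line by eye. A minimal sketch, assuming the log format shown above; `override_matches` is a hypothetical helper that reads log text on standard input.

```shell
# override_matches HOST PORT
# Hypothetical helper: reads calico-node log lines on stdin and succeeds
# if some line reports both the expected KUBERNETES_SERVICE_HOST and
# the expected KUBERNETES_SERVICE_PORT.
override_matches() {
    want_host=$1
    want_port=$2
    grep -F "KUBERNETES_SERVICE_HOST=\"$want_host\"" |
        grep -qF "KUBERNETES_SERVICE_PORT=\"$want_port\""
}

# Example:
#   kubectl logs -n kube-system <pod name> | override_matches 172.16.101.157 6443 \
#       && echo "override picked up"
```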
In eBPF mode Calico replaces kube-proxy so running both would waste resources. This section explains how to disable kube-proxy in some common environments.
For a cluster that runs kube-proxy in a DaemonSet (such as a kubeadm-created cluster), you can disable kube-proxy, reversibly, by adding a node selector to kube-proxy’s DaemonSet that matches no nodes, for example:
kubectl patch ds -n kube-system kube-proxy -p '{"spec":{"template":{"spec":{"nodeSelector":{"non-calico": "true"}}}}}'
Then, should you want to start kube-proxy again, you can simply remove the node selector.
If you choose not to disable kube-proxy (for example, because it is managed by your Kubernetes distribution), then you must change Felix configuration parameter BPFKubeProxyIptablesCleanupEnabled to false. This can be done with calicoctl as follows:
calicoctl patch felixconfiguration default --patch='{"spec": {"bpfKubeProxyIptablesCleanupEnabled": false}}'
If kube-proxy is running and BPFKubeProxyIptablesCleanupEnabled is also enabled, kube-proxy will write its iptables rules and Felix will try to clean them up, resulting in iptables flapping between the two.
If you are running OpenShift, you can disable kube-proxy as follows:
kubectl patch networks.operator.openshift.io cluster --type merge -p '{"spec":{"deployKubeProxy": false}}'
To re-enable it:
kubectl patch networks.operator.openshift.io cluster --type merge -p '{"spec":{"deployKubeProxy": true}}'
If the name of your node’s interface doesn’t match the default regular expression of ^(en.*|eth.*|tunl0$), you must configure Felix to detect your interface by modifying the bpfDataIfacePattern configuration option with an appropriate regex.
calicoctl patch felixconfiguration default --patch='{"spec": {"bpfDataIfacePattern": "<Regular expression>"}}'
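Before changing the pattern, it may be worth checking which interfaces on a node the default expression actually covers. The sketch below uses `grep -E` as a stand-in for Felix's own regex matching, which is an approximation that should hold for simple patterns like this one; `iface_matches` is a hypothetical helper, and reading interface names from `/sys/class/net` assumes a Linux host.

```shell
# iface_matches PATTERN NAME
# Hypothetical helper: succeeds if interface NAME matches the extended
# regular expression PATTERN.
iface_matches() {
    pattern=$1
    name=$2
    printf '%s\n' "$name" | grep -Eq -- "$pattern"
}

default_pattern='^(en.*|eth.*|tunl0$)'

# Report how each local interface fares against the default pattern.
for path in /sys/class/net/*; do
    [ -e "$path" ] || continue
    iface=${path##*/}
    if iface_matches "$default_pattern" "$iface"; then
        echo "$iface: matched"
    else
        echo "$iface: not matched"
    fi
done
```

A data interface such as a bond (for example `bond0`) would show up as not matched, which is the situation where bpfDataIfacePattern needs to be changed.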
The next step depends on whether you installed Calico using the operator, or a manifest:
If you installed Calico using a manifest, change Felix configuration parameter BPFEnabled to true. This can be done with calicoctl, as follows:
calicoctl patch felixconfiguration default --patch='{"spec": {"bpfEnabled": true}}'
Enabling eBPF mode should not disrupt existing connections but existing connections will continue to use the standard Linux datapath. You may wish to restart pods to ensure that they start new connections using the BPF dataplane.
Direct return mode skips a hop through the network for traffic to services (such as node ports) from outside the cluster. This reduces latency and CPU overhead but it requires the underlying network to allow nodes to send traffic with each other’s IPs. In AWS, this requires all your nodes to be in the same subnet and for the source/dest check to be disabled.
DSR mode is disabled by default; to enable it, set the BPFExternalServiceMode Felix configuration parameter to "DSR". This can be done with calicoctl:
calicoctl patch felixconfiguration default --patch='{"spec": {"bpfExternalServiceMode": "DSR"}}'
To switch back to tunneled mode, set the configuration parameter to "Tunnel":
calicoctl patch felixconfiguration default --patch='{"spec": {"bpfExternalServiceMode": "Tunnel"}}'
Switching external traffic mode can disrupt in-progress connections.
To revert to standard Linux networking:
(Depending on whether you installed Calico with the operator or with a manifest) reverse the changes to the operator’s Installation or the FelixConfiguration resource:
$ kubectl patch installation.operator.tigera.io default --type merge -p '{"spec":{"calicoNetwork":{"linuxDataplane":"Iptables"}}}'
or:
calicoctl patch felixconfiguration default --patch='{"spec": {"bpfEnabled": false}}'
If you disabled kube-proxy, re-enable it, for example by removing the node selector added above:
kubectl patch ds -n kube-system kube-proxy --type merge -p '{"spec":{"template":{"spec":{"nodeSelector":{"non-calico": null}}}}}'