
Setting up a Kubernetes High Availability Cluster – Building and Testing Multiple Masters, Part II

Following on from Part 1, in this part I’m going to build a Kubernetes HA cluster.

Building a Kubernetes Node
I will build the Kubernetes control plane nodes and a worker node. The cluster building tool will be kubeadm, the Pod network add-on will be flannel, and the CRI will be Docker. The basic process follows the official kubernetes.io and Docker documentation at the sites below:
https://kubernetes.io/docs/setup/production-environment/
https://docs.docker.com/engine/install/ubuntu/

Building the Control Plane Node
Installing Docker
Install Docker on the 3 control plane nodes. The detailed procedure is described in the official Docker documentation, so I won’t even try to reproduce it here (it won’t go well). I strongly recommend looking at the primary resources.
Once the Docker installation is finished, don’t forget to create the Docker daemon configuration file, /etc/docker/daemon.json. Its contents can be found in the Kubernetes documentation.

/etc/docker/daemon.json

{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
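After writing the file, restart Docker so the setting takes effect, and check that the cgroup driver is now systemd (a minimal check; the docker info output format may differ slightly by version):

master1:~# systemctl restart docker
master1:~# docker info | grep -i cgroup
 Cgroup Driver: systemd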

If you fail to create this file, you’ll get the following warning when running kubeadm init (mentioned later).

I0304 00:32:10.042809    2629 checks.go:102] validating the container runtime
I0304 00:32:10.175750    2629 checks.go:128] validating if the "docker" service is enabled and active
        [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/

There are a number of options that can be selected in daemon.json but since this is just for testing purposes, I’ll keep it in line with the Kubernetes documentation.

Some extra facts
Incoming news ― Don’t panic! Right, now that the warning is out of the way, I can safely tell you that the link below contains the news that Docker support is planned to be removed from Kubernetes as a container runtime in 1.22, scheduled for release in late 2021. Docker images can still be pulled and run with no issues, but Docker itself will no longer be supported as a container runtime.

https://kubernetes.io/blog/2020/12/02/dont-panic-kubernetes-and-docker/

Please note that the Docker version installed for this blog is 20.10.4.

# docker version
Client: Docker Engine - Community
 Version:           20.10.4
 …

Server: Docker Engine - Community
 Engine:
  Version:          20.10.4
  …

Installing kubeadm, kubelet, kubectl
I won’t repeat the installation procedure for the cluster creation tool kubeadm, the kubelet daemon, and kubectl here, as there is already extremely thorough documentation over at Kubernetes (https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/). Before installation, do the essential prep of disabling the swap file and setting up the firewall.

master1:~# cat /etc/fstab
…
#/swapfile     none     swap     sw     0     0

master1:~# firewall-cmd --zone=public --add-port={6443,2379-2380,10250-10252}/tcp --permanent
master1:~# firewall-cmd --reload
master1:~# firewall-cmd --zone=public --list-ports
6443/tcp 2379-2380/tcp 10250-10252/tcp

The firewall settings for flannel will be discussed later.
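For completeness, here is a rough sketch of the kubeadm/kubelet/kubectl installation, following the Kubernetes documentation of the time (the package repository layout may have changed since, so treat this as a reminder rather than a copy-and-paste recipe; swapoff -a disables swap for the running system, while the fstab edit above makes it permanent):

master1:~# swapoff -a
master1:~# apt-get update && apt-get install -y apt-transport-https ca-certificates curl
master1:~# curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
master1:~# echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" > /etc/apt/sources.list.d/kubernetes.list
master1:~# apt-get update && apt-get install -y kubelet kubeadm kubectl
master1:~# apt-mark hold kubelet kubeadm kubectl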

Making a Kubernetes cluster
With the load balancer built in the previous part, and Docker, kubeadm, kubelet, and kubectl installed on the three Kubernetes control plane nodes, it is time to create the Kubernetes cluster.

The Kubernetes cluster creation process is as follows:
・create the cluster’s first control plane node with kubeadm init
・install flannel
・add the remaining 2 control plane nodes to the cluster using kubeadm join

Building the first control plane node
Initialize the first control-plane with the kubeadm tool. The command is “kubeadm init [flags]” and init is divided into multiple phases. I will be referring to details found here: (https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/#init-workflow)

master1:~# kubeadm init --control-plane-endpoint 192.0.2.100:6443 --upload-certs --pod-network-cidr=10.244.0.0/16 --v=5

--control-plane-endpoint 192.0.2.100:6443
Assign the load balancer virtual IP and port as the API server endpoint. This means that all communication with the API servers, from worker nodes, kubectl commands and so on, goes to this IP address:port.

--upload-certs
This uploads the control-plane certificates from this node so that they can be shared within the cluster. They are stored in a Secret named “kubeadm-certs”, which the additional control planes download when they join the cluster. The following is the Secret object:

master1:~# kubectl describe secret kubeadm-certs -n kube-system
Name:         kubeadm-certs
Namespace:    kube-system
Labels:       <none>
Annotations:  <none>

Type:  Opaque

Data
====
front-proxy-ca.crt:  1106 bytes
front-proxy-ca.key:  1707 bytes
sa.key:              1703 bytes
sa.pub:              479 bytes
ca.crt:              1094 bytes
ca.key:              1707 bytes
etcd-ca.crt:         1086 bytes
etcd-ca.key:         1707 bytes

--pod-network-cidr=10.244.0.0/16
Since I will use flannel for this testing, I assign flannel’s default pod network.
The following is an excerpt of the relevant part of flannel.yml, which will be described later.

kind: ConfigMap
…
net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
    …

--v=5
Set the log verbosity level to 5 for extra debugging output.


When the command is executed, the network interface to be used is determined and various preflight checks are performed. If there are no issues, the /etc/kubernetes/pki folder is created, the various certificates are generated, the manifest files for the static pods that make up the control plane (kube-apiserver, etcd, kube-controller-manager, kube-scheduler) are created, and their images are pulled and started.
When the pods start successfully (message: [apiclient] All control plane components are healthy after 66.502566 seconds), i.e. once port 6443 becomes active, api-server1 goes UP in the HAProxy monitoring.

Mar 04 00:34:21 loadbalancer1 haproxy[192558]: [WARNING] 062/003421 (192558) : Server apiserver/api-server1 is UP, reason: Layer6 check passed, check duration: 1ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.

Keepalived also transitions to a normal MASTER/BACKUP state as HAProxy becomes active.

Mar 04 00:33:49 loadbalancer1 Keepalived_vrrp[45385]: Script `check_haproxy` now returning 0
Mar 04 00:33:55 loadbalancer1 Keepalived_vrrp[45385]: (VI_1) Entering MASTER STATE

Then, at the end of the kubeadm init output, the commands for adding control-plane nodes and worker nodes are displayed.

To add a control-plane node:

kubeadm join 192.0.2.100:6443 --token l6ev91.zw6c88y6nd37ctxx \
    --discovery-token-ca-cert-hash sha256:6720510784653492fee8d9c4d7c7fd1d4560d5f5b7f4246cad7133de3abb6e82 \
    --control-plane \
    --certificate-key b7c8581d6d8ea88570ba15ca5a200ed6dd34951f97ba7075b540e63e416fd8a9

To add a worker node:

kubeadm join 192.0.2.100:6443 --token l6ev91.zw6c88y6nd37ctxx \
    --discovery-token-ca-cert-hash sha256:6720510784653492fee8d9c4d7c7fd1d4560d5f5b7f4246cad7133de3abb6e82

Checking the status
Set the KUBECONFIG environment variable as shown in the output of the kubeadm init command, and check the node status.

master1:~# export KUBECONFIG=/etc/kubernetes/admin.conf
master1:~# kubectl get nodes
NAME      STATUS     ROLES                  AGE   VERSION
master1   NotReady   control-plane,master   35m   v1.20.4
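(For a non-root user, the kubeadm init output also suggests a per-user alternative along these lines instead of exporting KUBECONFIG:)

master1:~$ mkdir -p $HOME/.kube
master1:~$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
master1:~$ sudo chown $(id -u):$(id -g) $HOME/.kube/config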

The STATUS is NotReady because the Pod network is not ready yet, which also leaves the CoreDNS Pods stuck in Pending.

master1:~# kubectl get pods -n kube-system
NAME                              READY   STATUS    RESTARTS   AGE
coredns-74ff55c5b-f92vl           0/1     Pending   0          46m
coredns-74ff55c5b-mwjk6           0/1     Pending   0          46m
etcd-master1                      1/1     Running   0          46m
kube-apiserver-master1            1/1     Running   0          46m
kube-controller-manager-master1   1/1     Running   0          46m
kube-proxy-m9t8z                  1/1     Running   0          46m
kube-scheduler-master1            1/1     Running   0          46m

CoreDNS stays Pending until a Pod network add-on is installed (you can confirm this from the node’s conditions, as shown below). In this section, we will install flannel as the Pod network add-on.
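A quick way to confirm the cause is to look at the node object itself (the exact wording of the condition message varies by Kubernetes version):

master1:~# kubectl describe node master1

In the Conditions section, Ready is False with a message along the lines of the network plugin / CNI config not being ready, until flannel is installed.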

Installing flannel
When installing flannel, permit the ports used by flannel on firewalld. The port depends on whether the backend to be employed is UDP or VXLAN. This time I will use the default VXLAN, so permit UDP 8472.
flannel.yml excerpt

kind: ConfigMap
…
net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
    …

firewall-cmd config

master1:~# firewall-cmd --zone=public --add-port=8472/udp --permanent
master1:~# firewall-cmd --reload
master1:~# firewall-cmd --zone=public --list-ports
6443/tcp 2379-2380/tcp 10250-10252/tcp 8472/udp

Installing flannel

master1:~# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created

When flannel starts, it creates the pod network, pod-to-pod communication becomes possible, and with that CoreDNS starts running.
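On the node itself you can see the interfaces flannel sets up for the vxlan backend (interface names assume the default flannel configuration; cni0 only appears once a pod has actually been scheduled on the node):

master1:~# ip -d link show flannel.1
master1:~# ip addr show cni0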

master1:~# kubectl get pods -n kube-system
NAME                              READY   STATUS    RESTARTS   AGE
coredns-74ff55c5b-f92vl           1/1     Running   0          3h29m
coredns-74ff55c5b-mwjk6           1/1     Running   0          3h29m
etcd-master1                      1/1     Running   0          3h29m
kube-apiserver-master1            1/1     Running   0          3h29m
kube-controller-manager-master1   1/1     Running   0          3h29m
kube-flannel-ds-6rcn6             1/1     Running   0          72s
kube-proxy-m9t8z                  1/1     Running   0          3h29m
kube-scheduler-master1            1/1     Running   0          3h29m

This completes the first control-plane node. Next, the second and third control-plane nodes will be added to the cluster.

Adding the second and subsequent control-plane nodes to the cluster
Allow flannel’s UDP port 8472 in firewalld and run the “to add a control-plane node” command shown at the end of the first kubeadm init. However, the kubeadm-certs Secret expires after two hours, so re-create it if more than two hours have passed since kubeadm init. In fact, while I was working (or pretending to work while writing this blog), I was summoned to another task. By the time I returned to my seat two hours had passed, and kubeadm-certs had vanished into thin air. I had to run the certificate re-upload command.

Since kubeadm init is divided into multiple phases, you can extract and run only the certificate creation phase. The certificate key is generated and output with the following command:

master1:~# kubeadm init phase upload-certs --upload-certs
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
7b89adf0ae835f1b8d572035263047b07380ff7d849b836de140cc2d169cf364

Below is a quick way to display the kubeadm join command to be executed on the second and subsequent control plane nodes: create a new discovery token and have the corresponding kubeadm join command printed at the same time. Note, however, that the printed command is the one for adding a worker node; to add a control plane node, append --control-plane and the certificate key generated on the first control plane node by the upload-certs phase mentioned earlier.

master1:~# kubeadm token create --print-join-command
kubeadm join 192.0.2.100:6443 --token kbcaen.d2cnbu6ntca5259j     --discovery-token-ca-cert-hash sha256:6720510784653492fee8d9c4d7c7fd1d4560d5f5b7f4246cad7133de3abb6e82

As shown below, on the second and subsequent control plane nodes, run the join command with --control-plane specified and the certificate key added.

master2:~# kubeadm join 192.0.2.100:6443 --token kbcaen.d2cnbu6ntca5259j     --discovery-token-ca-cert-hash sha256:6720510784653492fee8d9c4d7c7fd1d4560d5f5b7f4246cad7133de3abb6e82 --control-plane --certificate-key 7b89adf0ae835f1b8d572035263047b07380ff7d849b836de140cc2d169cf364

The second control plane will become active on HAProxy.

Mar 04 23:43:35 loadbalancer1 haproxy[192558]: Server apiserver/api-server2 is UP, reason: Layer6 check passed, check duration: 1ms. 2 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.

The second control plane node has been added as a control plane.

master1:~# kubectl get nodes -o wide
NAME      STATUS   ROLES                  AGE    VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
master1   Ready    control-plane,master   22h    v1.20.4   192.0.2.1      <none>      Ubuntu 20.04.2 LTS   5.4.0-66-generic   docker://20.10.5
master2   Ready    control-plane,master   3m6s   v1.20.4   192.0.2.2      <none>      Ubuntu 20.04.2 LTS   5.4.0-66-generic   docker://20.10.5

You can see that the static pods that make up the control plane are now distributed across the two control-plane nodes. CoreDNS behaves differently from the others: it is an ordinary Deployment rather than a static Pod or DaemonSet, so its two replicas stay where they were scheduled instead of being spread to the new node.

master1:~# kubectl get pods -n kube-system -o wide
NAME                              READY   STATUS    RESTARTS   AGE    IP               NODE      NOMINATED NODE   READINESS GATES
coredns-74ff55c5b-f92vl           1/1     Running   2          22h    10.244.0.7       master1   <none>           <none>
coredns-74ff55c5b-mwjk6           1/1     Running   2          22h    10.244.0.6       master1   <none>           <none>
etcd-master1                      1/1     Running   2          22h    192.0.2.1       master1   <none>           <none>
etcd-master2                      1/1     Running   0          5m4s   192.0.2.2       master2   <none>           <none>
kube-apiserver-master1            1/1     Running   2          22h    192.0.2.1       master1   <none>           <none>
kube-apiserver-master2            1/1     Running   0          5m4s   192.0.2.2       master2   <none>           <none>
kube-controller-manager-master1   1/1     Running   3          22h    192.0.2.1       master1   <none>           <none>
kube-controller-manager-master2   1/1     Running   0          5m5s   192.0.2.2       master2   <none>           <none>
kube-flannel-ds-6rcn6             1/1     Running   2          19h    192.0.2.1       master1   <none>           <none>
kube-flannel-ds-w8rc2             1/1     Running   1          5m6s   192.0.2.2       master2   <none>           <none>
kube-proxy-m9t8z                  1/1     Running   2          22h    192.0.2.1       master1   <none>           <none>
kube-proxy-t5rrs                  1/1     Running   0          5m6s   192.0.2.2       master2   <none>           <none>
kube-scheduler-master1            1/1     Running   3          22h    192.0.2.1       master1   <none>           <none>
kube-scheduler-master2            1/1     Running   0          5m4s   192.0.2.2       master2   <none>           <none>

Add the third control plane node to the cluster.

master3:~# kubeadm join 192.0.2.100:6443 --token kbcaen.d2cnbu6ntca5259j     --discovery-token-ca-cert-hash sha256:6720510784653492fee8d9c4d7c7fd1d4560d5f5b7f4246cad7133de3abb6e82 --control-plane --certificate-key 7b89adf0ae835f1b8d572035263047b07380ff7d849b836de140cc2d169cf364

The third control plane node has been added as a control plane.

master1:~# kubectl get nodes -o wide
NAME      STATUS   ROLES                  AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
master1   Ready    control-plane,master   22h     v1.20.4   192.0.2.1     <none>        Ubuntu 20.04.2 LTS   5.4.0-66-generic   docker://20.10.5
master2   Ready    control-plane,master   9m44s   v1.20.4   192.0.2.2     <none>        Ubuntu 20.04.2 LTS   5.4.0-66-generic   docker://20.10.5
master3   Ready    control-plane,master   110s    v1.20.4   192.0.2.3     <none>        Ubuntu 20.04.2 LTS   5.4.0-66-generic   docker://20.10.5

This completes the construction of the control-plane node cluster. Let’s move on to adding the Worker node.

Building the Worker node
The only difference between building a worker node and a control plane node is whether --control-plane is specified as an option of kubeadm join when joining the cluster. So install Docker, kubeadm, kubelet, and kubectl in the same way as on the control plane nodes. The other difference is the set of ports allowed through the firewall, so I have added those below.

worker1:~# firewall-cmd --zone=public --add-port={10250,30000-32767}/tcp --permanent
worker1:~# firewall-cmd --zone=public --add-port=8472/udp --permanent
worker1:~# firewall-cmd --reload
worker1:~# firewall-cmd --zone=public --list-ports
10250/tcp 30000-32767/tcp 8472/udp

Join the worker node to the cluster.

worker1:~# kubeadm join 192.0.2.100:6443 --token kbcaen.d2cnbu6ntca5259j     --discovery-token-ca-cert-hash sha256:6720510784653492fee8d9c4d7c7fd1d4560d5f5b7f4246cad7133de3abb6e82

The worker node has been added to the cluster.

master1:~# kubectl get nodes
NAME      STATUS   ROLES                  AGE     VERSION
master1   Ready    control-plane,master   27h     v1.20.4
master2   Ready    control-plane,master   4h21m   v1.20.4
master3   Ready    control-plane,master   4h13m   v1.20.4
worker1   Ready    <none>                 71s     v1.20.4

This completes the Kubernetes HA cluster.

Testing
I could have simply shut down one of the three control plane nodes and finished the operational test by running kubectl, but since the worker node is sitting there with all of 64GB of memory, I decided to run a pod on it as well. The plan: shut down master1, and if that pod keeps working, the test passes.

The pod used for testing is dnsutils, which is introduced on the Kubernetes name resolution troubleshooting page (https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/). The pod image has tools installed that are useful for name resolution, and it has been a real help to me outside of writing this blog. Since it also lets us test name resolution inside the Kubernetes cluster, I decided it was more instructive than running an arbitrary stateless application.

Before we begin the verification, let’s run the following command on the three control-plane nodes.

master1:~# iptables -t filter -I FORWARD 16 -p udp -m conntrack --ctstate NEW -j ACCEPT

This is a last-ditch workaround for a bug I ran into during the verification phase; it is not critical to the HA verification itself, so I will leave the details for later. The command means: “Insert a rule at line 16 of the FORWARD chain in the filter table that accepts UDP packets whose connection-tracking (conntrack) state is NEW.” Note that the rule doesn’t necessarily have to go at line 16; before running it, I listed the relevant iptables chain and found the offending rule at line 16, as shown below.
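To find the right position on your own hosts, list the FORWARD chain with line numbers and insert the ACCEPT rule just above the rejecting rule (the line number will almost certainly differ from 16):

master1:~# iptables -t filter -L FORWARD --line-numbers -n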

Create a dnsutils pod.

master1:~# kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
master1:~# kubectl get pods
NAME       READY   STATUS    RESTARTS   AGE
dnsutils   1/1     Running   0          5s

First, check that dnsutils can resolve names.

master1:~# kubectl exec -it dnsutils -- nslookup kubernetes.default.svc.cluster.local.
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.96.0.1

Here, I specify the FQDN kubernetes.default.svc.cluster.local. with a dot on the end, but even just specifying kubernetes or kubernetes.default will resolve nicely. This is because the pod’s resolver appends the search domains from its /etc/resolv.conf (default.svc.cluster.local and so on) and keeps trying until a name resolves. For reference, I have put a packet capture below of what happens when “nslookup kubernetes.default” is run.

03:44:09.300787 IP 10.244.3.3.55212 > 10.244.0.7.domain: 62854+ A? kubernetes.default.default.svc.cluster.local. (62)
03:44:09.302349 IP 10.244.0.7.domain > 10.244.3.3.55212: 62854 NXDomain*- 0/1/0 (155) ←Failure
03:44:09.303306 IP 10.244.3.3.43908 > 10.244.0.7.domain: 38359+ A? kubernetes.default.svc.cluster.local. (54)
03:44:09.303726 IP 10.244.0.7.domain > 10.244.3.3.43908: 38359*- 1/0/0 A 10.96.0.1 (106) ←Success
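You can see where those suffixes come from by looking at the resolver configuration inside the pod. The search list and ndots value below are the usual kubelet defaults; they may differ in your cluster:

master1:~# kubectl exec -it dnsutils -- cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5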

Shut down master1.

master1:~# shutdown -h now

Set up the environment variables to run kubectl commands on master2.

master2:~# export KUBECONFIG=/etc/kubernetes/admin.conf

After a while the pods that make up the control plane, including the CoreDNS pod, will start running on the other control plane nodes.

master2:~# kubectl get pods -n kube-system -o wide
NAME                              READY   STATUS        RESTARTS   AGE     IP               NODE      NOMINATED NODE   READINESS GATES
coredns-74ff55c5b-4tmmz           1/1     Terminating   0          57m     10.244.0.2       master1   <none>           <none>
coredns-74ff55c5b-z4pdk           1/1     Terminating   0          57m     10.244.0.3       master1   <none>           <none>
coredns-74ff55c5b-c5sgh           1/1     Running       0          3m46s   10.244.3.4       master1   <none>           <none>
coredns-74ff55c5b-ggtrq           1/1     Running       0          3m46s   10.244.3.3       master1   <none>           <none>
etcd-master1                      1/1     Running       0          57m     192.0.2.1        master1   <none>           <none>
etcd-master2                      1/1     Running       0          57m     192.0.2.2        master2   <none>           <none>
etcd-master3                      1/1     Running       0          56m     192.0.2.3        master3   <none>           <none>
kube-apiserver-master1            1/1     Running       0          57m     192.0.2.1        master1   <none>           <none>
kube-apiserver-master2            1/1     Running       0          57m     192.0.2.2        master2   <none>           <none>
kube-apiserver-master3            1/1     Running       0          54m     192.0.2.3        master3   <none>           <none>
…

The log below shows the election of the new etcd leader:

master2:~# kubectl logs etcd-master2 -n kube-system
…
2021-03-08 10:07:44.129986 I | etcdserver/api/etcdhttp: /health OK (status code 200)
2021-03-08 10:07:54.124452 I | etcdserver/api/etcdhttp: /health OK (status code 200)
2021-03-08 10:08:04.124697 I | etcdserver/api/etcdhttp: /health OK (status code 200)
(Shut down master1 here)
raft2021/03/08 10:08:05 INFO: 2dcb5bfc2532f1aa [term 4] received MsgTimeoutNow from 29ecb8f9253d2802 and starts an election to get leadership.
raft2021/03/08 10:08:05 INFO: 2dcb5bfc2532f1aa became candidate at term 5
…
raft2021/03/08 10:08:05 INFO: 2dcb5bfc2532f1aa became leader at term 5

Name resolution with dnsutils also worked.

master2:~# kubectl exec -it dnsutils -- nslookup kubernetes.default
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.96.0.1

This completes the Kubernetes HA cluster operational testing.

For your reference
Below is the bug I encountered when doing name resolution with dnsutils. I’ll share the whole story for future reference. When I ran nslookup from dnsutils, name resolution failed with an “admin prohibited” error. I put the results of tcpdumping the ens3 interface below.

10:01:17.683365 IP worker1.44084 > 192.0.2.1.8472: OTV, flags [I] (0x08), overlay 0, instance 1
IP 10.244.3.2.44228 > 10.244.0.2.domain: 38797+ A? kubernetes.default.svc.cluster.local. (54)
10:01:17.683868 IP 192.0.2.1.60753 > worker1.8472: OTV, flags [I] (0x08), overlay 0, instance 1
IP 10.244.0.0 > 10.244.3.2: ICMP host 10.244.0.2 unreachable - admin prohibited, length 90
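For reference, a capture along these lines, run on the node hosting dnsutils (worker1 in my case) and filtered on flannel’s VXLAN port, produces the output above:

worker1:~# tcpdump -ni ens3 udp port 8472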

Going from worker1’s dnsutils pod to master1’s CoreDNS pod, the packet arrives at its destination carrying pod network (10.244.0.0/16) addresses, encapsulated in VXLAN by flannel. In response, an ICMP “admin prohibited” is returned.

It’s clearly being rejected by a firewall, but I hadn’t configured anything of the sort in firewalld, and judging by the message, it’s most likely an iptables rule in the backend doing the rejecting. The iptables rules consist of a lot of entries that Kubernetes and Docker have put in, as well as the rules I set up via firewalld. So I decided to trace iptables. Since nf_log_ipv4, the module for Netfilter logging, was already enabled, I could trace iptables straight away. As shown below, I selected UDP 53 as the trace target.

iptables -t raw -j TRACE -p udp --dport 53 -I PREROUTING 1

This plants a TRACE rule in the first line of the PREROUTING chain of the raw table, the first table a packet hits in Netfilter.
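The trace output goes to the kernel log, so you can watch it while re-running the lookup:

master1:~# tail -f /var/log/syslog | grep TRACE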

The following log appeared in /var/log/syslog when I ran nslookup again. The last line is this:

Mar  8 02:07:08 master1 kernel: [271253.540926] TRACE: filter:FORWARD:rule:16 IN=flannel.1 OUT=cni0 MAC=62:6d:28:95:48:fe:3e:d3:39:f5:46:2c:08:00 SRC=10.244.3.2 DST=10.244.0.7 LEN=82 TOS=0x00 PREC=0x00 TTL=62 ID=26190 PROTO=UDP SPT=45338 DPT=53 LEN=62

It seems to get stuck at line 16 of the FORWARD chain of the filter table. I took a look at the chain in question:

master1:~# iptables -t filter -L FORWARD --line-number
Chain FORWARD (policy DROP)
num  target     prot opt source               destination
…
16   REJECT     all  --  anywhere             anywhere             reject-with icmp-host-prohibited

Line 16 says to reject everything with ICMP host-prohibited. What I actually want is for the relevant packets to be accepted and forwarded. So, as a last-ditch effort, I inserted the following:

master1:~# iptables -t filter -I FORWARD 16 -p udp -m conntrack --ctstate NEW -j ACCEPT
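After inserting it, it’s worth confirming that the new ACCEPT rule now sits above the REJECT rule, and don’t forget to delete the TRACE rule added earlier (the delete takes the same rule specification as the insert):

master1:~# iptables -t filter -L FORWARD --line-numbers -n
master1:~# iptables -t raw -D PREROUTING -p udp --dport 53 -j TRACE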

The End
