I decided to write this post because I’ve seen many questions on Stack Overflow where people are confused about how to properly configure NetworkPolicy in Kubernetes – especially how to set up egress rules without blocking the traffic that is sent back to the client. I understand that this can be confusing, especially with TCP, where the client port on which data will be sent back is chosen randomly.
So in this article I will show you how to set up Minikube to support network policies, as the out-of-the-box network driver in Minikube doesn’t support them. After some playing with policies we will dig deeper to see how such a firewall is implemented in Calico.
As always, the source code for this article can be found at https://github.com/jakubbujny/article-kubernetes-network-policies
NetworkPolicy in theory
NetworkPolicy allows us to define network rules inside a Kubernetes cluster. It is based on podSelector, which means we can attach a NetworkPolicy to pods by matching them with e.g. labels. Those policies can limit outgoing and incoming network access using source/destination ports, CIDRs, namespaces and other pods’ labels.
NetworkPolicy in Kubernetes is only an API definition which must then be implemented by a CNI network plugin. That means if we define a NetworkPolicy but don’t have a proper CNI plugin in our cluster which implements it, the NetworkPolicy won’t have any effect and we will be left with a false sense of security.
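To get a feeling for the shape of the API, here is the canonical “default deny all ingress” policy from the Kubernetes documentation – an empty podSelector matches every pod in the namespace:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}   # empty selector = all pods in the namespace
  policyTypes:
  - Ingress         # Ingress listed but no ingress rules given, so all ingress is denied
```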
NetworkPolicy in practice – setup Calico on Minikube
Minikube out of the box doesn’t support NetworkPolicies. To use them we must install an external CNI plugin.
To enable CNI support in Minikube we must start it with the following command:
minikube start --network-plugin=cni
And then install Calico by running
curl https://docs.projectcalico.org/master/manifests/calico.yaml | kubectl apply -f -
We should see new pods created in the kube-system namespace – we should wait until they are in the Running state before proceeding:
➜ ~ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-7f68846cd-8f97d 0/1 Pending 0 6s
calico-node-mbdzl 0/1 Init:0/3 0 6s
So wait until they are:
➜ ~ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-7f68846cd-8f97d 1/1 Running 0 74s
calico-node-mbdzl 1/1 Running 0 74s
As our example application we will use:
- a pod with ubuntu, named ubuntu1, which will have ingress limited to port 80 and egress to the Internet only on port 443, restricting that pod to TLS traffic
- a pod with ubuntu, named ubuntu2, without any limitations, used to test connectivity
Let’s start with the first pod:
apiVersion: v1
kind: Pod
metadata:
  name: ubuntu1
  labels:
    app: ubuntu1
spec:
  containers:
  - name: ubuntu
    image: ubuntu:18.04
    command: ["bash", "-c"]
    args: ["apt-get update && apt-get install -y curl python3 && while true; do sleep 1; done"]
    ports:
    - containerPort: 80
That’s a simple pod which in its start command installs curl for testing and python3 for starting a simple web server in the next steps. After that it goes into an infinite loop, so we can exec into the container and run some commands.
apiVersion: v1
kind: Pod
metadata:
  name: ubuntu2
  labels:
    app: ubuntu2
spec:
  containers:
  - name: ubuntu
    image: ubuntu:18.04
    command: ["bash", "-c"]
    args: ["apt-get update && apt-get install -y curl && while true; do sleep 1; done"]
The second pod looks the same as the first, but we don’t install python3 and don’t expose a port.
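Assuming the manifests above are saved as ubuntu1.yaml and ubuntu2.yaml (file names chosen here for illustration), both pods can be created with:

```shell
kubectl apply -f ubuntu1.yaml -f ubuntu2.yaml
```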
Network communication – sysdig
Currently the network is not limited – let’s verify that and see what the network traffic looks like.
To achieve that we will use sysdig – a tool which is preinstalled on the Minikube machine; it’s like strace combined with tcpdump but much more powerful. Minikube under the hood spawns a virtual machine where Kubernetes is installed. We can SSH into that machine (minikube ssh) and make ourselves root:
sudo su -
After that we want to observe traffic on port 80.
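Using sysdig’s fd.port filter field, dumping all activity on port 80 looks like this:

```shell
# capture system events whose file descriptor involves port 80
sysdig fd.port=80
```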
The terminal should now freeze, waiting for traffic to dump.
Let’s open another terminal and find the ubuntu1 IP address:
kubectl describe pod ubuntu1
Exec into the ubuntu1 pod and start a web server:
kubectl exec -it ubuntu1 bash
python3 -m http.server 80
Serving HTTP on 0.0.0.0 port 80 (http://0.0.0.0:80/) ...
Open another terminal, exec into the ubuntu2 pod and send an HTTP request to the ubuntu1 webserver:
kubectl exec -it ubuntu2 bash
root@ubuntu2:/# curl 192.168.120.77
Now in the sysdig terminal we should see many new log lines; look at the first one:
80565 10:16:48.740561399 0 curl (11292) 192.168.120.77:80
As we can see, ubuntu2 opened a connection to ubuntu1:80 – the randomly chosen port on which data will be sent back is 37454.
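That ephemeral-port behaviour is not Kubernetes-specific – the kernel picks the client’s source port whenever a TCP client connects without binding one explicitly. A quick local Python sketch illustrates it:

```python
import socket

# server socket on an arbitrary free port (stand-in for the webserver on :80)
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
server_port = server.getsockname()[1]

# the client connects without binding a source port explicitly...
client = socket.socket()
client.connect(("127.0.0.1", server_port))

# ...so the kernel picks an ephemeral source port, like 37454 in the sysdig dump
src_ip, src_port = client.getsockname()
print(f"client source port chosen by kernel: {src_port}")

client.close()
server.close()
```

Run it a few times – the printed port changes every run, which is exactly why an egress rule cannot simply whitelist the reply port.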
Network policies – ingress and egress
Now it’s time to limit the network for ubuntu1 pod. There will be 3 policies:
- UDP egress on port 53 – allows DNS traffic so the pod can resolve e.g. google.com to an IP address
- TCP ingress on port 80 – allows clients to connect to our webserver
- TCP egress on port 443 – allows the pod to connect to Internet services on the TLS port
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: dns
spec:
  podSelector:
    matchLabels:
      app: ubuntu1
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
    ports:
    - protocol: UDP
      port: 53
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-to-webserver
spec:
  podSelector:
    matchLabels:
      app: ubuntu1
  policyTypes:
  - Ingress
  ingress:
  - from:
    - ipBlock:
        cidr: 0.0.0.0/0
    ports:
    - port: 80
      protocol: TCP
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-to-tls
spec:
  podSelector:
    matchLabels:
      app: ubuntu1
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
    ports:
    - port: 443
      protocol: TCP
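After applying those manifests we can check that all three policy objects exist – note that this only confirms the API objects were created, not that the CNI actually enforces them:

```shell
kubectl get networkpolicy
```

All three names (dns, allow-ingress-to-webserver, allow-egress-to-tls) should be listed with the pod selector app=ubuntu1.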
Now we can go back to ubuntu1’s console and try to open a connection to google.com on port 80 – we should see a timeout, while on an HTTPS connection we should see a response.
root@ubuntu1:/# curl -m 5 -vv google.com
* Rebuilt URL to: google.com/
* Trying 188.8.131.52...
* TCP_NODELAY set
* Connection timed out after 5004 milliseconds
* stopped the pause stream!
* Closing connection 0
curl: (28) Connection timed out after 5004 milliseconds
root@ubuntu1:/# curl https://google.com
Now let’s start the webserver again and go back to ubuntu2 to test the connection once more. We still see the answer from the webserver.
So how is that possible? We defined a limiting rule for egress traffic, so traffic should only be possible on port 443, yet data is somehow still sent back to the client on some other port.
Deep dive – Calico iptables
The firewall rules which we defined must then be implemented by the CNI. Calico uses netfilter/iptables for that – every time we define a NetworkPolicy, Calico automatically sets up the proper netfilter rules on every node in the cluster.
Let’s go back onto the minikube VM and, with some combination of iptables -S and grep, try to find the related rules.
-A cali-pi-_Pyx9r8CS7bPqC0nMrCi -p tcp -m comment --comment "cali:x8PiQWJp-yhKM8vP" -m multiport --dports 80 -j MARK --set-xmark 0x10000/0x10000
-A cali-tw-cali22716d29a85 -m comment --comment "cali:MfYo--qV2fDDHXqO" -m mark --mark 0x0/0x20000 -j cali-pi-_Pyx9r8CS7bPqC0nMrCi
-A cali-from-wl-dispatch -i cali22716d29a85 -m comment --comment "cali:8gmcTnib5j5lzG4A" -g cali-fw-cali22716d29a85
-A cali-fw-cali22716d29a85 -m comment --comment "cali:CcU6YKJiUYOoRnia" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
As we can see, Calico defines custom chains and targets – we don’t want to reverse engineer the whole thing here, as it’s really complex.
In simple words: in the first line we see our main iptables rule for port 80, which sets a MARK on the packets – that mark is then consumed by further rules defined by Calico.
The second line shows how Calico links those rules together, so the id Pyx9r8CS7bPqC0nMrCi is related to cali22716d29a85.
In the third line we see that cali22716d29a85 is actually a network interface defined on the node, and packet processing should continue in the cali-fw-cali22716d29a85 chain.
Finally, the most important fourth line has the --ctstate RELATED,ESTABLISHED -j ACCEPT parameters. Netfilter is a stateful firewall – it understands how a TCP connection works and tracks in memory which connections are new and which are related or established. So if netfilter sees that some client on port 37454 has already established a connection to port 80, it will track that connection and won’t drop the reply packets, even though the egress rule limits new traffic to port 443 only. The same mechanism applies when the pod opens a connection to some Internet service, so the ingress rule won’t drop the response packets either.
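To see that pattern in isolation, here is a minimal generic egress ruleset (an illustration of the conntrack idea only – these are not Calico’s actual chains):

```shell
# new outbound connections are allowed only to port 443
iptables -A OUTPUT -p tcp --dport 443 -j ACCEPT
# replies belonging to already-tracked connections (e.g. our webserver on :80
# answering a client that arrived from ephemeral port 37454) pass through anyway
iptables -A OUTPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
# everything else leaving the pod is dropped
iptables -A OUTPUT -j DROP
```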