NetworkPolicy on Kubernetes – how to properly set up ingress and egress to limit a pod’s network access


Overview

I decided to write this post because I’ve seen many questions on Stack Overflow from people confused about setting up NetworkPolicy in Kubernetes properly – especially how to configure egress without blocking the traffic that is sent back to the client. I understand that this can be confusing, especially with TCP, where the client’s port on which data is sent back is chosen randomly.

In this article I will show you how to set up Minikube to support network policies, as the out-of-the-box network driver in Minikube doesn’t support them. After some playing with policies we will dig deeper to see how such a firewall is implemented in Calico.

As always, the source code for this article can be found at https://github.com/jakubbujny/article-kubernetes-network-policies

NetworkPolicy in theory

NetworkPolicy allows us to define network rules inside a Kubernetes cluster. It’s based on a podSelector, which means we can attach a NetworkPolicy to pods by matching them using e.g. labels. Those policies can limit outgoing and incoming network access by source/destination ports, CIDRs, namespaces and other pods’ labels.

NetworkPolicy in Kubernetes is only an API definition which must then be implemented by a CNI network plugin. This means that if we define a NetworkPolicy but don’t have a CNI plugin in our cluster that implements it, that NetworkPolicy won’t have any effect and we will get a false sense of security.
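For example, the canonical default-deny policy below is a perfectly valid API object even on a cluster whose CNI ignores policies – kubectl will accept it without complaint, but no traffic is actually blocked:

```yaml
# Deny all ingress and egress for every pod in the namespace.
# Without a policy-aware CNI (e.g. Calico), this object is stored
# in the API server but has no effect on traffic at all.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}      # empty selector = all pods in this namespace
  policyTypes:
    - Ingress
    - Egress
```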

NetworkPolicy in practice – setup Calico on Minikube

Minikube out of the box doesn’t support NetworkPolicies. To use them we must install an external CNI plugin.

One of the most popular and stable plugins right now is Calico – it uses technologies that have existed in IT for many years, like netfilter/iptables and the Border Gateway Protocol.

To enable CNI support in Minikube we must start it with the following command:

 minikube start --network-plugin=cni 

And then install Calico by running

 curl https://docs.projectcalico.org/master/manifests/calico.yaml | kubectl apply -f - 

We should see new pods created in the kube-system namespace – we should wait until they are in the Running state before proceeding:

➜ ~ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-7f68846cd-8f97d 0/1 Pending 0 6s
calico-node-mbdzl 0/1 Init:0/3 0 6s

So wait until they are:

➜ ~ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-7f68846cd-8f97d 1/1 Running 0 74s
calico-node-mbdzl 1/1 Running 0 74s

Prepare pods

As our example application we will use:

  • a pod with Ubuntu whose ingress will be limited to port 80 and whose egress to the Internet will be limited to port 443, restricting that pod to TLS; named: ubuntu1
  • a pod with Ubuntu without any limitations, used to test connectivity; named: ubuntu2

Let’s start with first pod

apiVersion: v1
kind: Pod
metadata:
  name: ubuntu1
  labels:
    app: ubuntu1
spec:
  containers:
    - name: ubuntu
      image: ubuntu:18.04
      command: ["bash", "-c"]
      args: ["apt-get update && apt-get install -y curl python3 && while true; do sleep 1; done"]
      ports:
        - containerPort: 80

That’s a simple pod which, in its start command, installs curl for testing and python3 for starting a simple web server in the next steps. After that it goes into an infinite loop so we can exec into the container and run some commands.

apiVersion: v1
kind: Pod
metadata:
  name: ubuntu2
  labels:
    app: ubuntu2
spec:
  containers:
    - name: ubuntu
      image: ubuntu:18.04
      command: ["bash", "-c"]
      args: ["apt-get update && apt-get install -y curl && while true; do sleep 1; done"]

The second pod looks the same as the first, but we don’t install python and don’t expose a port.

Network communication – sysdig

Currently network communication is not limited – let’s verify that and see what the network traffic looks like.

To achieve that we will use sysdig – a tool preinstalled on the Minikube machine which is like strace combined with tcpdump, but much more powerful. Minikube under the hood spawns a virtual machine where Kubernetes is installed. We can SSH into that machine and make ourselves root:

minikube ssh

sudo su -

After that we want to observe traffic on port 80, which we can do with the following command:

sysdig fd.port=80

Now the terminal should hang, waiting for traffic to dump.

Let’s open another terminal and find ubuntu1’s IP address:

kubectl describe pod ubuntu1

...

Status: Running
IP: 192.168.120.77
Containers:

Exec into the ubuntu1 pod and start a web server:

kubectl exec -it ubuntu1 bash

python3 -m http.server 80

Serving HTTP on 0.0.0.0 port 80 (http://0.0.0.0:80/) …

Open another terminal, exec into the ubuntu2 pod and send an HTTP request to the ubuntu1 webserver:

kubectl exec -it ubuntu2 bash
root@ubuntu2:/# curl 192.168.120.77

Now in the sysdig terminal we should see many new log lines; look at the first one:

80565 10:16:48.740561399 0 curl (11292) 192.168.120.77:80

As we can see, ubuntu2 opened a connection to ubuntu1:80 – the client port randomly chosen for sending data back is 37454.
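To see how that ephemeral client port gets chosen, here is a small standalone Python sketch (nothing Kubernetes-specific, just plain sockets) reproducing what curl did when it connected to the webserver:

```python
import socket

# Server socket on an OS-assigned port on localhost,
# playing the role of ubuntu1's webserver.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
server_port = server.getsockname()[1]

# Client connects; the kernel assigns the source (ephemeral) port,
# just like it did for curl in ubuntu2.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", server_port))
conn, peer = server.accept()

client_port = client.getsockname()[1]
print(f"server listens on {server_port}, replies go back to ephemeral port {client_port}")

# The server sees exactly that ephemeral port as the peer's port -
# this is the port the reply traffic is addressed to.
assert peer[1] == client_port
assert client_port != server_port

conn.close()
client.close()
server.close()
```

Running it a few times shows a different ephemeral port each run – which is exactly why you cannot write a static egress rule for the reply traffic.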

Network policies – ingress and egress

Now it’s time to limit the network for ubuntu1 pod. There will be 3 policies:

  • UDP egress on port 53 – allows DNS traffic so the pod can resolve names like google.com to IP addresses
  • TCP ingress on port 80 – allows clients to connect to our webserver
  • TCP egress on port 443 – allows the pod to connect to Internet services on the TLS port

Configuration:

---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: dns
spec:
  podSelector:
    matchLabels:
      app: ubuntu1
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - protocol: UDP
          port: 53

---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-to-webserver
spec:
  podSelector:
    matchLabels:
      app: ubuntu1
  policyTypes:
    - Ingress
  ingress:
    - from:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - port: 80
          protocol: TCP

---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-to-tls
spec:
  podSelector:
    matchLabels:
      app: ubuntu1
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - port: 443
          protocol: TCP
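As a side note, the two egress objects above could also be expressed as a single NetworkPolicy with two port entries – a sketch using the same labels as before:

```yaml
# Equivalent combined egress policy (DNS + TLS) for ubuntu1.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: egress-dns-and-tls
spec:
  podSelector:
    matchLabels:
      app: ubuntu1
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 443
```

Whether you prefer one object or several is mostly a readability choice; the rules are additive either way.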

Now we can go back to ubuntu1’s console and try to open a connection to google.com on port 80 – we should see a timeout, while an HTTPS connection should get a response.

root@ubuntu1:/# curl -m 5 -vv google.com
* Rebuilt URL to: google.com/
* Trying 216.58.215.78…
* TCP_NODELAY set
* Connection timed out after 5004 milliseconds
* stopped the pause stream!
* Closing connection 0
curl: (28) Connection timed out after 5004 milliseconds

root@ubuntu1:/# curl https://google.com

Now let’s start the webserver again and go back to ubuntu2 to test the connection once more. We still see the answer from the webserver.

So how is that possible? We defined a limiting rule for egress traffic so that outgoing traffic is only possible on port 443, yet data is somehow still sent back to the client on some other port.

Deep dive – Calico iptables

The firewall rules which we defined must then be implemented by the CNI. Calico uses netfilter/iptables for that – every time we define a NetworkPolicy, Calico automatically sets up the proper netfilter rules on every node in the cluster.

Let’s go back onto the Minikube VM and, with some combination of iptables -S and grep, try to find the related rules.

-A cali-pi-_Pyx9r8CS7bPqC0nMrCi -p tcp -m comment --comment "cali:x8PiQWJp-yhKM8vP" -m multiport --dports 80 -j MARK --set-xmark 0x10000/0x10000

-A cali-tw-cali22716d29a85 -m comment --comment "cali:MfYo--qV2fDDHXqO" -m mark --mark 0x0/0x20000 -j cali-pi-_Pyx9r8CS7bPqC0nMrCi

-A cali-from-wl-dispatch -i cali22716d29a85 -m comment --comment "cali:8gmcTnib5j5lzG4A" -g cali-fw-cali22716d29a85

-A cali-fw-cali22716d29a85 -m comment --comment "cali:CcU6YKJiUYOoRnia" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

As we can see, Calico defines custom chains and targets – we don’t want to do a full reverse engineering of them here, as they are really complex.

In simple words, in the first line we see our main iptables rule for port 80, which sets a MARK on the matching packets – the mark is used by Calico’s other rules to record that the packet was accepted by a policy.

The second line shows how Calico wires those iptables rules together, so the id Pyx9r8CS7bPqC0nMrCi is related to cali22716d29a85.

In the third line we see that cali22716d29a85 is actually a network interface defined on the node, and that packet processing should continue in the cali-fw-cali22716d29a85 chain.

Finally, the most important fourth line has the --ctstate RELATED,ESTABLISHED -j ACCEPT parameters. Netfilter is a stateful firewall – it understands how a TCP connection works and can track in memory which connections are new and which are related or established. So if netfilter sees that some client on port 37454 already established a connection to port 80, it will track that connection and won’t drop the reply packets because of the egress rule limiting traffic to port 443. The same applies when the pod opens a connection to some Internet service, so the ingress rule won’t drop the replies either.
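The conntrack behaviour can be sketched as a toy Python model – the names here (`conntrack`, `ingress_allowed`, `egress_allowed`) are purely illustrative, not Calico’s actual implementation:

```python
# Toy model of the stateful behaviour netfilter's conntrack gives Calico:
# the egress rule only allows NEW connections to port 443, yet reply
# packets of an already ESTABLISHED inbound connection to port 80
# are still accepted.
conntrack = set()  # tracked connections as (src, sport, dst, dport)

def ingress_allowed(src, sport, dst, dport):
    # Ingress policy: clients may connect to our webserver on port 80.
    if dport == 80:
        conntrack.add((src, sport, dst, dport))  # start tracking it
        return True
    return False

def egress_allowed(src, sport, dst, dport):
    # ESTABLISHED check: a reply packet of a tracked connection is
    # accepted before the port-based egress rules are even consulted.
    if (dst, dport, src, sport) in conntrack:
        return True
    # Otherwise only NEW egress connections to port 443 are allowed.
    return dport == 443

# ubuntu2 (ephemeral port 37454) connects to ubuntu1's webserver ...
assert ingress_allowed("ubuntu2", 37454, "ubuntu1", 80)
# ... and the webserver's reply to port 37454 passes despite the 443-only rule.
assert egress_allowed("ubuntu1", 80, "ubuntu2", 37454)
# A fresh outbound connection to port 80 is still dropped ...
assert not egress_allowed("ubuntu1", 45123, "google.com", 80)
# ... while outbound TLS is allowed.
assert egress_allowed("ubuntu1", 45124, "google.com", 443)
```

This is why the NetworkPolicy API only asks you to describe connection directions, not individual reply flows – the stateful layer underneath handles those for you.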

 
