Use nftables to limit access to exposed docker ports

BobAnkh published on

This post shows how to combine nftables and iptables so that only certain IPs can reach the Docker ports exposed on your host.

Main Ref: nftables-whitelisting-docker

The steps mostly follow the reference above: iptables marks the traffic destined for Docker, and nftables then applies a policy to the marked packets in its forward chain.

Core reason explanation

Here is the Packet flow in Netfilter and General Networking:

Netfilter-packet-flow

Docker relies on br_netfilter, which makes bridged packets traverse the iptables chains as well. Without it, the following would hold (according to netfilter):

A bridged packet never enters any network code above layer 1 (Link Layer). So, a bridged IP packet/frame will never enter the IP code.

And according to docker-iptables:

Docker installs two custom iptables chains named DOCKER-USER and DOCKER, and it ensures that incoming packets are always checked by these two chains first.

All of Docker’s iptables rules are added to the DOCKER chain. Do not manipulate this chain manually. If you need to add rules which load before Docker’s rules, add them to the DOCKER-USER chain. These rules are applied before any rules Docker creates automatically.

Docker also sets the policy for the FORWARD chain to DROP. If your Docker host also acts as a router, this will result in that router not forwarding any traffic anymore.

The idea is to use packet marks to pass information from iptables to nftables, so that nftables can tell four cases apart:

  • rule evaluation is happening in the bridge path instead of the routing path: always accept this case.
  • the packet was ACCEPT-ed by Docker: further restrictions can be added, but this case is mostly accepted.
  • the packet was ignored by Docker: use normal nftables rules that don't have to consider the presence of Docker.
  • the packet was DROP-ed for any reason in iptables: a moot case, since nftables never sees the packet and nothing has to or can be done about it.

Use iptables to set mark

Insert a rule into the DOCKER-USER chain that marks every packet before Docker evaluates it, followed by a second rule that overrides this mark with a different one when the packet is on the bridge path. Both rules are inserted with -I, but with explicit positions so that they keep this order, which matters here:

iptables -I DOCKER-USER 1 -j MARK --set-mark 0xd0cca5e
iptables -I DOCKER-USER 2 -m physdev --physdev-is-bridged -j MARK --set-mark 0x10ca1
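The two insertions above are not idempotent: running them a second time adds duplicate rules. A minimal sketch of a repeatable variant follows, assuming a POSIX shell and root privileges. `ensure_rule` and `apply_marks` are hypothetical helper names introduced here for illustration; `-C` is the iptables option that checks whether a matching rule already exists.

```shell
#!/bin/sh
# Hypothetical helper: insert a rule at the given position of a chain,
# but only if an identical rule is not already present.
ensure_rule() {
    chain=$1
    pos=$2
    shift 2
    # iptables -C exits with status 0 when a matching rule exists
    if ! iptables -C "$chain" "$@" 2>/dev/null; then
        iptables -I "$chain" "$pos" "$@"
    fi
}

# Hypothetical wrapper around the two marking rules from this post.
apply_marks() {
    ensure_rule DOCKER-USER 1 -j MARK --set-mark 0xd0cca5e
    ensure_rule DOCKER-USER 2 -m physdev --physdev-is-bridged -j MARK --set-mark 0x10ca1
}
```

Nothing runs until `apply_marks` is called (as root). Because the general mark goes to position 1 and the bridge mark to position 2, the order is preserved even when only one of the two rules was missing.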

Here 0x10ca1 and 0xd0cca5e are arbitrarily chosen values. A mark is just a 32-bit integer, not a bitmask, so any non-zero value works, but it's better to avoid values that Docker or other software on the host may already use.
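If you pick your own values, a quick sanity check can catch mistakes: a packet mark is a 32-bit value, and 0 is what an unmarked packet carries. The sketch below assumes a POSIX shell with 64-bit arithmetic; `valid_mark` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the argument is a usable mark,
# that is, non-zero and no larger than 0xffffffff.
valid_mark() {
    v=$(( $1 ))    # shell arithmetic accepts hexadecimal constants
    [ "$v" -gt 0 ] && [ "$v" -le 4294967295 ]
}
```

For example, `valid_mark 0xd0cca5e && echo ok` prints `ok`, while `valid_mark 0` fails.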

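Then append two rules at the end of the FORWARD chain. The first resets the packet's mark, but only if it is still the tentative Docker evaluation mark: packets accepted by Docker never reach this rule and keep their mark, while packets that Docker ignored get it cleared. The second is a final ACCEPT rule that overrides the default DROP policy Docker sets on the FORWARD chain. The idea is to defer further evaluation to nftables for packets unrelated to Docker.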

iptables -A FORWARD -m mark --mark 0xd0cca5e -j MARK --set-mark 0
iptables -A FORWARD -j ACCEPT
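The final ACCEPT only works as intended while it stays the last rule of FORWARD. A small sketch for checking this, assuming a POSIX shell; `last_forward_rule_is_accept` is a hypothetical helper that reads the output of `iptables -S FORWARD` on standard input.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the last rule printed by
# `iptables -S FORWARD` (read from stdin) is the unconditional ACCEPT.
last_forward_rule_is_accept() {
    [ "$(tail -n 1)" = "-A FORWARD -j ACCEPT" ]
}
```

Typical use, as root: `iptables -S FORWARD | last_forward_rule_is_accept || echo "final ACCEPT is not last"`.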

Use nftables to filter traffic

Now nftables can filter traffic based on the mark set by iptables. Set the priority of the inet filter forward chain to a value slightly greater than NF_IP_PRI_FILTER (0), for example 10, so that the nftables forward chain runs after the iptables filter/FORWARD chain and sees the marks it set.

The four cases described earlier can be detected in nftables by checking the mark on the packet. Counter expressions are added to help with debugging:

  • mark 0x10ca1: for bridge path, we can just passthrough
  • mark 0xd0cca5e: for Docker accepted packets:
    • we create a regular/user chain to treat the Docker case and add a rule calling it (just jump to the dockercase chain for all traffic with this mark)
    • in the dockercase chain, we add additional restrictions about Docker, but accept by default
  • no mark: general case not related to Docker
  • (no packet: dropped in iptables, non-case)

The following ruleset only allows Cloudflare's IP ranges to access the exposed HTTP/HTTPS ports (the ranges come from Cloudflare's published IPv4 list):

#!/usr/sbin/nft -f

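# Caution: this clears every nftables table. If your iptables is the
# iptables-nft variant, that includes Docker's rules and the marking rules above.
flush ruleset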

define SAFE_IPS = {
        173.245.48.0/20,
        103.21.244.0/22,
        103.22.200.0/22,
        103.31.4.0/22,
        141.101.64.0/18,
        108.162.192.0/18,
        190.93.240.0/20,
        188.114.96.0/20,
        197.234.240.0/22,
        198.41.128.0/17,
        162.158.0.0/15,
        104.16.0.0/13,
        104.24.0.0/14,
        172.64.0.0/13,
        131.0.72.0/22,
}
table inet filter {
        chain input {
                type filter hook input priority 0;
        }
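        chain forward {
                # priority 10: evaluated after the iptables filter/FORWARD chain (priority 0)
                type filter hook forward priority 10; policy drop;
                # bridge path: always accept
                meta mark 0x10ca1 counter accept
                # accepted by Docker: apply the extra restrictions below
                meta mark 0xd0cca5e counter jump dockercase
                # unmarked traffic (not related to Docker) falls through to policy drop;
                # add your own rules here if the host must forward other traffic
        }
        chain dockercase {
                # eth0 is assumed to be the external interface; DNAT has already
                # happened at this point, so dport is the container-side port
                iif eth0 tcp dport {80, 443} ip saddr != $SAFE_IPS counter drop
                counter accept
        }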
        chain output {
                type filter hook output priority 0;
        }
}
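Since the allowed ranges can change over time, it may help to generate the SAFE_IPS definition from a plain list instead of editing it by hand. The sketch below assumes a POSIX shell and a text file with one IPv4 CIDR per line; `cidrs_to_define` and the file names in the usage note are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read one CIDR per line from stdin and print an
# nftables "define" block with the given name. Blank lines and lines
# starting with '#' are skipped.
cidrs_to_define() {
    name=$1
    printf 'define %s = {\n' "$name"
    while IFS= read -r line; do
        case $line in
            ''|'#'*) continue ;;
        esac
        printf '        %s,\n' "$line"
    done
    printf '}\n'
}
```

For example, `cidrs_to_define SAFE_IPS < cloudflare-ips-v4.txt > safe_ips.nft` writes a block in the same layout as the one above. Before loading the complete ruleset, `nft -c -f <file>` performs a check-only run without applying anything.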