# sudo /bin/sh mkbr.sh reset bridge0 tap1 tap4 bridge1 tap0 tap5
Chapter 5. Network Address Translation - NAT
Network Address Translation (NAT) is the process of changing the source or destination address of a packet as it flows through the firewall. This is done chiefly to segregate internal networks and subnets from external networks.
FreeBSD has two capabilities for NAT: natd(8), a daemon process that can perform these translations, and in-kernel NAT with ipfw.
Both of these capabilities use the libalias(3) library.
This section will focus primarily on in-kernel NAT with ipfw.
5.1. General Procedures for Working NAT Examples
In this section we will use more than two virtual machines (VMs). If you have followed the directions on Setting Up the Entire IPFW Lab, we can begin with the setup for simple NAT.
The examples in this section and later sections grow increasingly complex. Follow this standard procedure for startup with each new example:
On the host, begin by creating the bridge and tap devices needed for the example. Use mkbr.sh to configure them. Examine the figure, and run the script with all bridges and taps accounted for.
Start up the required VMs. Use runvm.sh to start up several VMs at one time.
On each VM, set up the required addressing. Check the diagram in each Section for addressing requirements.
Ensure all VMs have connectivity to their local network peers.
If there are additional scripts to load onto the firewall, external, internal, dnshost, or v6only VMs, load them.
If there are specific DNS entries that are required for an example, load them into the dnshost and test the entries from another VM.
Other VMs in some examples require adding additional routes.
On the firewall VM, unload and reload the firewall (kldunload ipfw, then kldload ipfw).
Check whether any sysctl entries are required for the example.
Follow the procedure given for each section.
Troubleshoot as necessary.
5.2. Setting Up for Simple NAT
Shut down the existing VMs from the previous examples and reload ipfw. To set up the correct network bridge and tap architecture as shown in the figure above, use this command:
# sudo /bin/sh mkbr.sh reset bridge0 tap1 tap4 bridge1 tap0 tap5
Restart the desired VMs with:
# /bin/sh runvm.sh firewall external1 internal
You will have to reconfigure the network addressing. Use the above figure to set up the correct addresses for each VM and ensure you can ping adjacent interfaces.
For routing, the external1 VM should have the default route set to 203.0.113.50.
The internal VM should have its default route set to 10.10.10.50.
The firewall should have its default route set to 203.0.113.10 (external1) since we want all traffic to exit via the firewall em1 interface.
The firewall should already be set up for IP forwarding (sysctl net.inet.ip.forwarding=1); if not, set the sysctl as indicated. You should be able to ping em0 on the external1 VM from the internal VM and vice versa. If something is not working, check all addressing, the host bridge and tap devices, and the net.inet.ip.forwarding sysctl on the firewall.
On the firewall VM, restart ipfw with:
# kldload ipfw
To use in-kernel NAT, you must first load the ipfw_nat kernel module:
# kldload ipfw_nat
Running kldstat should now show output similar to:
# kldstat
Id Refs Address                Size Name
 1   11 0xffffffff80200000  1f370e8 kernel
 2    1 0xffffffff82818000     3220 intpm.ko
 3    1 0xffffffff8281c000     2178 smbus.ko
 4    2 0xffffffff8281f000    27450 ipfw.ko
 5    1 0xffffffff82847000     42d0 ipfw_nat.ko
 6    1 0xffffffff8284c000     c962 libalias.ko
#
We are now ready to explore ipfw_nat.
Similar to other ipfw entities such as pipes and queues, ipfw_nat works with a NAT object. A NAT object is a single entry in the packet aliasing database.
We first create a NAT object:
# ipfw nat 25 config ip 198.51.100.50
ipfw nat 25 config ip 198.51.100.50
#
# ipfw nat show config
ipfw nat 25 config ip 198.51.100.50
Note that the NAT object identifier must be numeric, not alphabetic or alphanumeric. A NAT object identifier such as foo or 25foo will be rejected by ipfw.
Next, load two rules that will use that instance:
# ipfw add 1000 nat 25 tcp from any to any
# ipfw add 2000 nat 25 icmp from any to any
Listing the ruleset shows the NAT object and the rule body.
# ipfw list
01000 nat 25 tcp from any to any
02000 nat 25 icmp from any to any
65535 deny ip from any to any
We now have an ipfw_nat instance in the packet aliasing database and rules that will engage that instance. This is generally referred to as "static NAT".
The ipfw_nat instance will replace the IP source address of any packet exiting the firewall with 198.51.100.50, provided that packet has reached the ipfw_nat rule and matches its configuration.
To test, start tcpdump(1) on the host system monitoring bridge0. (You should again ensure that the host system is not running a firewall.)
host_system# tcpdump -n -i bridge0 -v
Then, from the firewall VM, telnet(1) to any IP address not used in our lab:
# telnet 172.16.10.10
Trying 172.16.10.10...
^C
All you need is a few seconds to attempt the connection (which will not succeed anyway).
Examining the host tcpdump output, we see the following:
host_system# tcpdump -n -i bridge0 -v
tcpdump: listening on bridge0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
19:58:34.099782 IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    198.51.100.50.62143 > 172.16.10.10.23: Flags [S], cksum 0x89d4 (correct), seq 3107170690, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 384725297 ecr 0], length 0
19:58:38.300043 IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    198.51.100.50.62143 > 172.16.10.10.23: Flags [S], cksum 0x796b (correct), seq 3107170690, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 384729498 ecr 0], length 0
19:58:46.500217 IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    198.51.100.50.62143 > 172.16.10.10.23: Flags [S], cksum 0x5964 (correct), seq 3107170690, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 384737697 ecr 0], length 0
^C
The source address has been changed from 203.0.113.50 to 198.51.100.50 as per our ipfw_nat instance. Note, however, that because this configuration binds NAT to an IP address rather than to an interface, the aliasing takes place on all configured interfaces, internal and external. You can verify this by repeating the above tcpdump on bridge1 and running a similar command against an existing host on the internal network. This time the destination sends a TCP reset (RST), since the packet reached the destination but the service there was not open.
# telnet 10.10.10.20
host_system# tcpdump -n -i bridge1 -v
tcpdump: listening on bridge1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
20:12:13.706505 IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    198.51.100.50.32825 > 10.10.10.20.23: Flags [S], cksum 0x6039 (correct), seq 1314409263, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 3924141446 ecr 0], length 0
20:12:13.710494 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    10.10.10.20.23 > 198.51.100.50.32825: Flags [R.], cksum 0x5774 (correct), seq 0, ack 1314409264, win 0, length 0
20:12:29.573756 IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
^C
To specify that only the outside interface is to be NATed, use the keyword if on the ipfw NAT configuration statement and specify the correct external interface:
# ipfw nat 25 config if em1
# ipfw nat show config
ipfw nat 25 config if em1
Note that you cannot use the ip ip_addr and if interf_name options at the same time on the same NAT instance - you must use one or the other.
In this case, the NAT instance ensures that the IP address of interface em1 is always used on traffic exiting through that interface, even if the address changes (because of DHCP or an administrative addressing change).
To test, send traffic to an external destination from the internal VM:
root@internal:~ # telnet 172.16.10.10
Trying 172.16.10.10...
^C
On the FreeBSD host:
host_system# tcpdump -n -i bridge0 -v
tcpdump: listening on bridge0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
20:24:41.147755 IP (tos 0x10, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    203.0.113.50.40001 > 172.16.10.10.23: Flags [S], cksum 0x5962 (correct), seq 950423268, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 2027118491 ecr 0], length 0
20:24:42.189806 IP (tos 0x10, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    203.0.113.50.40001 > 172.16.10.10.23: Flags [S], cksum 0x554b (correct), seq 950423268, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 2027119538 ecr 0], length 0
20:24:44.394747 IP (tos 0x10, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    203.0.113.50.40001 > 172.16.10.10.23: Flags [S], cksum 0x4caa (correct), seq 950423268, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 2027121747 ecr 0], length 0
^C
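Though it does not look like it, ipfw is translating the packets as they exit the firewall: the source address seen on bridge0 is the firewall's em1 address (203.0.113.50), not the internal VM's own address, and the TTL of 63 shows the packet was forwarded.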
Consider this exchange where the internal VM host pings the external1 VM:
root@internal:~ # ping 203.0.113.10
PING 203.0.113.10 (203.0.113.10): 56 data bytes
64 bytes from 203.0.113.10: icmp_seq=0 ttl=63 time=2.742 ms
64 bytes from 203.0.113.10: icmp_seq=1 ttl=63 time=2.675 ms
^C
The traffic on the internal bridge (bridge1) shows the packets from the internal VM:
host_system# tcpdump -n -i bridge1 -v
tcpdump: listening on bridge1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
20:29:27.048162 IP (tos 0x0, ttl 64, id 58916, offset 0, flags [none], proto ICMP (1), length 84)
    10.10.10.20 > 203.0.113.10: ICMP echo request, id 15077, seq 0, length 64
20:29:27.052446 IP (tos 0x0, ttl 63, id 36018, offset 0, flags [none], proto ICMP (1), length 84)
    203.0.113.10 > 10.10.10.20: ICMP echo reply, id 15077, seq 0, length 64
20:29:28.104133 IP (tos 0x0, ttl 64, id 58917, offset 0, flags [none], proto ICMP (1), length 84)
    10.10.10.20 > 203.0.113.10: ICMP echo request, id 15077, seq 1, length 64
20:29:28.105732 IP (tos 0x0, ttl 63, id 36019, offset 0, flags [none], proto ICMP (1), length 84)
    203.0.113.10 > 10.10.10.20: ICMP echo reply, id 15077, seq 1, length 64
whereas the traffic on the external bridge (bridge0) shows the correct translation:
host_system# tcpdump -n -i bridge0 -v
tcpdump: listening on bridge0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
20:33:19.695939 IP (tos 0x0, ttl 63, id 58919, offset 0, flags [none], proto ICMP (1), length 84)
    203.0.113.50 > 203.0.113.10: ICMP echo request, id 58206, seq 0, length 64
20:33:19.696546 IP (tos 0x0, ttl 64, id 36021, offset 0, flags [none], proto ICMP (1), length 84)
    203.0.113.10 > 203.0.113.50: ICMP echo reply, id 58206, seq 0, length 64
20:33:20.715148 IP (tos 0x0, ttl 63, id 58920, offset 0, flags [none], proto ICMP (1), length 84)
    203.0.113.50 > 203.0.113.10: ICMP echo request, id 58206, seq 1, length 64
20:33:20.715824 IP (tos 0x0, ttl 64, id 36022, offset 0, flags [none], proto ICMP (1), length 84)
    203.0.113.10 > 203.0.113.50: ICMP echo reply, id 58206, seq 1, length 64
^C
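The two captures show the same ping with different source addresses and ICMP ids on each side of the firewall (10.10.10.20 with id 15077 inside, 203.0.113.50 with id 58206 outside). A minimal Python sketch of that bookkeeping follows. It is a toy model for illustration only, not the libalias(3) implementation; the class name `ToyNat`, its methods, and the way alias ids are allocated are all assumptions.

```python
import itertools


class ToyNat:
    """Toy model of outbound ICMP aliasing (hypothetical, not libalias)."""

    def __init__(self, alias_ip, first_alias_id=50000):
        self.alias_ip = alias_ip
        # Assumption: alias ids are handed out sequentially; the real
        # engine chooses them differently.
        self._ids = itertools.count(first_alias_id)
        self._out = {}   # (inside_ip, inside_id) -> alias_id
        self._back = {}  # alias_id -> (inside_ip, inside_id)

    def outbound(self, src, icmp_id, dst):
        """Rewrite a packet leaving the firewall: source becomes the alias."""
        key = (src, icmp_id)
        if key not in self._out:
            alias_id = next(self._ids)
            self._out[key] = alias_id
            self._back[alias_id] = key
        return (self.alias_ip, self._out[key], dst)

    def inbound(self, src, icmp_id, dst):
        """Rewrite a reply: destination is restored from the table."""
        if dst != self.alias_ip or icmp_id not in self._back:
            return None  # no table entry, nothing to translate back to
        inside_ip, inside_id = self._back[icmp_id]
        return (src, inside_id, inside_ip)


if __name__ == "__main__":
    nat = ToyNat("203.0.113.50")
    request = nat.outbound("10.10.10.20", 15077, "203.0.113.10")
    print("on bridge0:", request)
    reply = nat.inbound("203.0.113.10", request[1], "203.0.113.50")
    print("on bridge1:", reply)
```

The point of the sketch is that the reply can only be delivered to the internal VM because the outbound packet left an entry behind; a reply with no matching entry has nowhere to go.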
The unreg_only and unreg_cgn configuration options allow you to bypass the NAT operation if the source IP of the packet is not one of the RFC 1918 addresses (unreg_only), or is neither an RFC 1918 address nor an RFC 6598 carrier-grade NAT address (unreg_cgn). In these cases, the original source address is kept in the packet, even though there is an ipfw_nat instance and a matching rule.
# ipfw nat 25 show config
ipfw nat 25 config if em1
#
# ipfw nat 25 config if em1 unreg_only
ipfw nat 25 config if em1 unreg_only
#
# ipfw nat 25 show config
ipfw nat 25 config if em1 unreg_only
#
To try the unreg_only option, on the internal VM, change its IP address on em0 to a registered number, say 140.140.140.140/24, and change the corresponding link on the firewall (em0, as in the commands that follow) to a compatible address, 140.140.140.1/24.
The internal VM will need a new default route: 140.140.140.1
root@internal:~ # ifconfig em0 140.140.140.140/24
root@internal:~ # route add default 140.140.140.1
add net default: gateway 140.140.140.1
root@internal:~ #
And on the firewall:
root@firewall:~ # ifconfig em0 140.140.140.1/24
From the internal VM, try to ping an external address not in our lab:
# ping 5.5.5.5
and observe on the host system that the ipfw_nat instance did not replace the source address with the configured IP:
host_system# tcpdump -n -i bridge0 -v
tcpdump: listening on bridge0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
21:07:18.154319 IP (tos 0x0, ttl 63, id 58943, offset 0, flags [none], proto ICMP (1), length 84)
    140.140.140.140 > 5.5.5.5: ICMP echo request, id 38569, seq 0, length 64
21:07:19.180094 IP (tos 0x0, ttl 63, id 58944, offset 0, flags [none], proto ICMP (1), length 84)
    140.140.140.140 > 5.5.5.5: ICMP echo request, id 38569, seq 1, length 64
21:07:20.194988 IP (tos 0x0, ttl 63, id 58945, offset 0, flags [none], proto ICMP (1), length 84)
    140.140.140.140 > 5.5.5.5: ICMP echo request, id 38569, seq 2, length 64
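The decision made by unreg_only can be sketched in a few lines of Python using the standard ipaddress module. This is only an illustration of the address test, not the kernel code; the function name `would_alias` is hypothetical, and treating unreg_cgn as "RFC 1918 plus RFC 6598" is an assumption based on a reading of ipfw(8).

```python
import ipaddress

# RFC 1918 private address blocks.
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
# RFC 6598 shared address space used for carrier-grade NAT.
RFC6598 = ipaddress.ip_network("100.64.0.0/10")


def would_alias(src, unreg_only=False, unreg_cgn=False):
    """Return True if a packet with this source address would be aliased.

    Hypothetical helper: with neither option set every source is aliased;
    with an option set only "unregistered" sources are.
    """
    addr = ipaddress.ip_address(src)
    if not (unreg_only or unreg_cgn):
        return True
    if any(addr in net for net in RFC1918):
        return True
    # Assumption: unreg_cgn extends unreg_only with the RFC 6598 block.
    return unreg_cgn and addr in RFC6598


if __name__ == "__main__":
    for src in ("10.10.10.20", "140.140.140.140", "100.64.1.1"):
        print(src, would_alias(src, unreg_only=True))
```

Under these assumptions the source 140.140.140.140 used in the test above is left untouched, which matches the capture on bridge0.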
Not all the options available to ipfw_nat are described in the NAT section of the ipfw(8) man page.
Some of the options usable from natd(8) are available to ipfw_nat. These include:
redirect_port proto targetIP:targetPORT[-targetPORT] [aliasIP:]aliasPORT[-aliasPORT] [remoteIP[:remotePORT[-remotePORT]]]
redirect_proto proto localIP [publicIP [remoteIP]]
redirect_address localIP publicIP
The options below are used for Load Sharing NAT (LSNAT) as described in RFC 2391.
redirect_port proto targetIP:targetPORT[,targetIP:targetPORT[,...]] [aliasIP:]aliasPORT [remoteIP[:remotePORT]]
redirect_address localIP[,localIP[,...]] publicIP
We discuss LSNAT in the next section.
5.3. Setting Up for LSNAT
For this example, we will use the three VMs external1, external2, and external3 and pretend they are on the inside of the network; and our internal VM is on the outside of the network.
The figure below shows the architecture setup for working with LSNAT.
As before, shut down all virtual machines and rebuild the network from scratch.
Use this command to set up the network bridge and tap architecture.
# sudo /bin/sh mkbr.sh reset bridge0 tap4 tap5 bridge1 tap0 tap1 tap2 tap3
Note that the host interface is not needed for this example.
Restart the virtual machines with:
# /bin/sh runvm.sh firewall internal external1 external2 external3
or start them up individually.
Configure each virtual machine to ensure its network configuration matches the above figure and test connectivity between adjacent systems with ping(8).
Throughout this section, remember that the "external" VMs are now internal web servers, with load shared across .10, .20, and .30, and that the "internal" VM is the outside host accessing those web servers.
On each inside VM the following commands are necessary to perform the examples in this section:
# route delete default
#
# route add default 10.10.10.50
On the outside VM perform these commands:
# route delete default
#
# route add default 198.51.100.50
Also, on each inside VM, edit the nginx index.html page and insert a line of text that has the VM name or IP address of the VM, something like this:
File: /usr/local/www/nginx/index.html:
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and working. Further configuration is required.</p>
<p> This is VM EXTERNAL1</p>
and start nginx on each inside VM:
# service nginx onestart
Performing sanity check on nginx configuration:
nginx: the configuration file /usr/local/etc/nginx/nginx.conf syntax is ok
nginx: configuration file /usr/local/etc/nginx/nginx.conf test is successful
Starting nginx.
#
With no ipfw loaded on the firewall, you should be able to ping all inside addresses (10.10.10.10, .20, .30) from the outside host (198.51.100.20). You should also be able to access each web server via:
# lynx 10.10.10.10 # (or .20 or .30)
We now start on LSNAT configuration.
5.3.1. Setting Up LSNAT - One Address (10.10.10.10)
We begin by loading ipfw and ipfw_nat on the firewall VM:
# kldload ipfw
#
# kldload ipfw_nat
The first configuration is similar to static NAT, though from the outside to the inside. The command redirects incoming traffic from the outside VM sent to destination IP 3.3.3.3 to inside VM 10.10.10.10.
# ipfw nat 25 config redirect_addr 10.10.10.10 3.3.3.3
ipfw nat 25 config redirect_addr 10.10.10.10 3.3.3.3
Next create a ruleset that utilizes this NAT instance:
# ipfw add 50 check-state
# ipfw add 1000 nat 25 tcp from any to any
#
# ipfw list
00050 check-state :default
01000 nat 25 tcp from any to any
65535 deny ip from any to any
#
Do not use the setup keyword on the ipfw rule referencing LSNAT. With setup, the rule matches only the initial SYN segment of a connection, so the final ACK of the TCP 3-way handshake is never received and the connection is never established.
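The warning about setup can be illustrated with a small Python sketch of which handshake segments such a rule would match. The sketch relies on the ipfw(8) definition of setup (SYN set, ACK clear); the names `matches_setup` and `HANDSHAKE` are hypothetical and exist only for this illustration.

```python
def matches_setup(flags):
    """True if a TCP segment with these flags matches the ipfw 'setup' option.

    'setup' is shorthand for "SYN set and ACK not set".
    """
    return "SYN" in flags and "ACK" not in flags


# The three segments of a TCP 3-way handshake, in order.
HANDSHAKE = [
    ("outside -> inside", {"SYN"}),
    ("inside -> outside", {"SYN", "ACK"}),
    ("outside -> inside", {"ACK"}),
]


def handshake_matches():
    """Return (direction, flags, matched) for each handshake segment."""
    return [(direction, sorted(flags), matches_setup(flags))
            for direction, flags in HANDSHAKE]


if __name__ == "__main__":
    for direction, flags, matched in handshake_matches():
        print(direction, flags, "matches nat rule" if matched else "skips nat rule")
```

Only the first segment matches. In the ruleset above, the segments that skip the nat rule fall through to the default deny rule, which is consistent with the connection never being established.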
From the outside VM, access the web server using:
# lynx 3.3.3.3
This brings up the web page on 10.10.10.10.
NAT with one address is working.
5.3.2. Engaging Multiple Hosts With LSNAT
Next, reconfigure the nat 25 instance to utilize all of the inside hosts:
# ipfw nat 25 config redirect_addr 10.10.10.10,10.10.10.20,10.10.10.30 3.3.3.3
(Note that adding a modification to a NAT instance just overwrites the existing instance. It does not create a new instance with the same number.)
On the outside VM, running lynx 3.3.3.3 repeatedly retrieves the home page of each internal server - in round-robin fashion, without regard for any network load, or server utilization.
In the lynx browser, you can reload the current page by pressing Ctrl+R.
# ipfw nat 25 show config
ipfw nat 25 config log redirect_addr 10.10.10.10,10.10.10.20,10.10.10.30 3.3.3.3
#
By adding a rule to redirect icmp traffic, both icmp and tcp will be load shared across the firewall.
# ipfw add 2000 nat 25 icmp from any to any
#
# ipfw list
00050 check-state :default
01000 nat 25 tcp from any to any
02000 nat 25 icmp from any to any
65535 deny ip from any to any
You can test this by running tcpdump -n -i em0 on each inside VM, and running ping -c 1 3.3.3.3 on the outside VM a few times. The incoming ping will hit each inside VM in turn.
However, if you run ping 3.3.3.3 continuously, the pings hit only one internal VM. The reason is that the aliasing engine treats ICMP differently from TCP and UDP. The aliasing engine recognizes the ICMP id number, and as long as this number does not change, it uses the same alias. If the command ping -c 1 3.3.3.3 is used repeatedly, the ICMP id number changes each time, and this creates a new entry in the aliasing database, resulting in redirection to a different VM.
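The difference between one long-running ping and repeated single pings can be modelled as a lookup table keyed on the ICMP id. The Python sketch below is a toy model of that behavior under the assumption that new entries are assigned to targets in strict round-robin order; `ToyLsnat` and `target_for` are hypothetical names, not part of ipfw or libalias(3).

```python
import itertools


class ToyLsnat:
    """Toy model: one aliasing entry per ICMP id, targets chosen round-robin."""

    def __init__(self, targets):
        self._next_target = itertools.cycle(targets)
        self._by_icmp_id = {}  # ICMP id -> inside address already chosen

    def target_for(self, icmp_id):
        """Return the inside host that receives a ping with this ICMP id."""
        if icmp_id not in self._by_icmp_id:
            # Unknown id: create a new entry pointing at the next server.
            self._by_icmp_id[icmp_id] = next(self._next_target)
        return self._by_icmp_id[icmp_id]


if __name__ == "__main__":
    lsnat = ToyLsnat(["10.10.10.10", "10.10.10.20", "10.10.10.30"])
    # One "ping 3.3.3.3" run: every echo request carries the same id.
    print([lsnat.target_for(4001) for _ in range(3)])
    # Three separate "ping -c 1 3.3.3.3" runs: a new id each time.
    print([lsnat.target_for(icmp_id) for icmp_id in (4002, 4003, 4004)])
```

In this model the continuous ping stays pinned to one server because its id never changes, while each single-packet ping creates a fresh entry and moves on to the next server.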
It is common to want to balance the load across servers according to certain characteristics such as system load. This is possible, manually, by reconfiguring the NAT statement. You can list the same host multiple times to give that host more traffic. Consider this configuration, which uses the Unix line continuation character '\' so that no space separates successive IP addresses; the only space in the list is the one between the last address and the alias address:
# ipfw nat 25 config log redirect_addr \
10.10.10.30,\
10.10.10.20,10.10.10.20,\
10.10.10.10,10.10.10.10,10.10.10.10,10.10.10.10 3.3.3.3
This configuration shifts the NAT load heavily toward 10.10.10.10 (four of the seven entries) and moderately toward 10.10.10.20 (two of seven), with 10.10.10.30 receiving the least traffic (one of seven). Repeat the single ping example above to see the result. While this works, it is a bit of a hack.
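Typing repeated addresses by hand is error-prone, so a small helper can build the list from per-host weights and preview the resulting split. The Python sketch below is one possible way to do that; the function names are hypothetical, and the preview assumes the strict round-robin behavior observed earlier in this section.

```python
from collections import Counter
from itertools import cycle, islice


def expand_pool(weights):
    """Repeat each address according to its integer weight."""
    return [ip for ip, weight in weights.items() for _ in range(weight)]


def build_redirect_addr(weights, alias_ip, nat_id=25):
    """Return the ipfw command text for a weighted redirect_addr list."""
    pool = ",".join(expand_pool(weights))
    return f"ipfw nat {nat_id} config redirect_addr {pool} {alias_ip}"


def preview(weights, connections):
    """Count connections per host, assuming strict round-robin selection."""
    return Counter(islice(cycle(expand_pool(weights)), connections))


if __name__ == "__main__":
    weights = {"10.10.10.30": 1, "10.10.10.20": 2, "10.10.10.10": 4}
    print(build_redirect_addr(weights, "3.3.3.3"))
    print(preview(weights, 70))
```

The helper only prints the command; it does not run ipfw. Because the addresses are joined with commas and no spaces, the output can be pasted on one line without the continuation characters.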
It would be better to have a range assignment feature similar to the sparse address feature already in ipfw, something like:
# ipfw nat 25 config redirect_addr 10.10.10.0/24{10,20-25,30-50} 3.3.3.3
ipfw: unknown host 10.10.10.0/24{10
but this feature does not work with LSNAT.
However, it is possible to use the prob keyword to address load balancing. In a rule with the prob keyword, if the rule matches and the probability is "true", the action of the rule is taken and processing stops for that packet. If the rule matches and the probability is "not true", the action is not taken, and processing continues with the next rule. You can verify this with a simple test ruleset and the ucont.sh shell script run from an external host.
03000 prob 0.200000 allow udp from any to me 5656 // set probability to 20% chance of matching
04000 count udp from any to me // count how many were not chosen by rule 3000
05000 prob 0.400000 allow udp from any to me 5656 // set probability to 40% chance of matching
06000 count udp from any to me // count how many were not chosen by 3000 and 5000
07000 prob 0.999000 allow udp from any to me 5656 // set probability to 99.9% chance of matching
08000 count udp from any to me // count how many were not chosen by all 3 rules
09000 allow udp from any to me 5656 // unconditional matching
65535 deny ip from any to any // default rule deny
After a run of 200 entries from sh ucont.sh 5656 1 the counts are:
03000  47  3314 prob 0.200000 allow udp from any to me 5656
04000 153 10776 count udp from any to me
05000  64  4505 prob 0.400000 allow udp from any to me 5656
06000  89  6271 count udp from any to me
07000  89  6271 prob 0.999000 allow udp from any to me 5656
08000   0     0 count udp from any to me
09000   0     0 allow udp from any to me 5656
65535   0     0 deny ip from any to any
From the above data, out of 200 packets sent from ucont.sh, 47 were matched by rule 3000 (23.5%), but 153 were not matched (rule 4000). Then, 64 of those 153 were matched at rule 5000 (about 41.8%), but 89 were not matched. Finally, all 89 were matched at rule 7000.
If you duplicate this example and find some packets hitting the default deny rule (65535), delete the host interface from the bridge and re-run the test. You are then unlikely to have any stray UDP packets hitting the default rule.
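Because each prob rule only sees the packets that earlier rules let through, the share of total traffic taken by a rule is its probability multiplied by the fraction still remaining. The Python sketch below works through that arithmetic for the test ruleset; the function name `cascade_shares` is hypothetical.

```python
def cascade_shares(probs):
    """Return (share taken by each rule, share left over) for chained prob rules.

    Each rule takes `p` of whatever the previous rules did not take.
    """
    remaining = 1.0
    shares = []
    for p in probs:
        shares.append(remaining * p)
        remaining *= 1.0 - p
    return shares, remaining


if __name__ == "__main__":
    packets = 200
    observed = [47, 64, 89]  # counters for rules 3000, 5000, 7000 above
    shares, leftover = cascade_shares([0.2, 0.4, 0.999])
    for rule, share, seen in zip((3000, 5000, 7000), shares, observed):
        print(f"rule {rule}: expected {share * packets:.1f}, observed {seen}")
    print(f"expected to reach rule 9000: {leftover * packets:.2f}")
```

For 200 packets the expected counts are 40, 64, and about 95.9, against the observed 47, 64, and 89; differences of this size are ordinary for a random process over so few packets.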
While the above works for UDP, it does not work for TCP. The TCP 3-way handshake is broken because some packets will match, but others will not.
Other NAT Keywords
The other keywords in the NAT section of ipfw(8) are straightforward:
deny_in : deny incoming packets
same_ports : keep the same ports after redirection
reset : clear the aliasing table when the address changes
reverse : reverse the direction of the NAT
proxy_only : packet aliasing is not performed
skip_global
global
tablearg : discussed in Understanding the Word Tablearg