Chapter 5. Network Address Translation - NAT

Network Address Translation (NAT) is the process of changing the source or destination address of a packet as it flows through the firewall. This is done chiefly to segregate internal networks and subnets from external networks.

FreeBSD has two capabilities for NAT - natd(8) a daemon process that can perform these translations, and in-kernel NAT with ipfw. Both of these capabilities use the libalias(3) library. This section will focus primarily on in-kernel NAT with ipfw.

5.1. General Procedures for Working NAT Examples

In this section we will use more than two virtual machines (VMs). If you have followed the directions on Setting Up the Entire ipfw Lab, we can begin with the setup for simple NAT.

The examples in this section and later sections grow increasingly complex. Follow this standard procedure for startup with each new example:

  1. On the host, begin by setting up the bridge and tap setup needed for the examples. Use mkbr.sh to configure bridge and tap devices on the host. Examine the figure, and run the script with all bridges and taps accounted for.

  2. Start up the required VMs. Use runvm.sh to start up several VMs at one time.

  3. On each VM, set up the required addressing. Check the diagram in each Section for addressing requirements.

  4. Ensure all VMs have connectivity to their local network peers.

  5. If there are additional scripts to load onto the firewall, external, internal, dnshost, or v6only VMs, load them.

  6. If there are specific DNS entries that are required for an example, load them into the dnshost and test the entries from another VM.

  7. In some examples, other VMs require additional routes.

  8. On the firewall VM, unload and reload the firewall (kldunload ipfw, then kldload ipfw).

  9. Check whether any sysctl entries are required for the example.

  10. Follow the procedure given for each section.

  11. Troubleshoot as necessary.
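Steps 1 and 2 of the procedure above can be wrapped in a small helper script. The script below is a hypothetical sketch (labup.sh is an invented name; mkbr.sh and runvm.sh are the lab scripts from the setup chapter). It defaults to a dry run that only prints the commands; clear DRYRUN to execute them:

```sh
#!/bin/sh
# labup.sh (hypothetical helper): steps 1-2 of the startup procedure.
# DRYRUN=echo (the default here) prints the commands; set DRYRUN= to run them.
: "${DRYRUN=echo}"
taps=${1:-"bridge0 tap1 tap4 bridge1 tap0 tap5"}   # bridges/taps, as in Figure 1
vms=${2:-"firewall external1 internal"}            # VMs for the simple NAT example
$DRYRUN sudo /bin/sh mkbr.sh reset $taps   # step 1: host bridge/tap setup
$DRYRUN /bin/sh runvm.sh $vms              # step 2: start the VMs
```

Run as sh labup.sh to preview the commands, then DRYRUN= sh labup.sh to execute them on the host.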

5.2. Setting Up for Simple NAT

.Setting Up Simple NAT
Figure 1. Setting Up Simple NAT

Shut down the existing VMs from the previous examples and reload ipfw. To set up the correct network bridge and tap architecture as shown in Figure 1, use this command:

# sudo /bin/sh mkbr.sh reset bridge0 tap1 tap4 bridge1 tap0 tap5

Restart the desired VMs with:

# /bin/sh runvm.sh firewall external1 internal

You will have to reconfigure the network addressing. Use the above figure to set up the correct addresses for each VM and ensure you can ping adjacent interfaces.

For routing, the external1 VM should have the default route set to 203.0.113.50. The internal VM should have its default route set to 198.51.100.50. The firewall should have its default route set to 203.0.113.10 (external1) since we want all traffic to exit via the firewall em0 interface.

The firewall should already be set up for IP forwarding (sysctl net.inet.ip.forwarding=1), but if not, set the sysctl as indicated. You should be able to ping em0 on external1 VM from the internal VM host and vice-versa. Check all addressing, the host bridge and tap devices, and the sysctl net.inet.ip.forwarding=1 on the firewall if something is not working.

On the firewall VM, restart ipfw with

# kldload ipfw

To use in-kernel NAT, you must first load the ipfw_nat kernel module:

# kldload ipfw_nat

Running kldstat should now show output similar to:

# kldstat
Id Refs Address                Size Name
 1   11 0xffffffff80200000  1f370e8 kernel
 2    1 0xffffffff82818000     3220 intpm.ko
 3    1 0xffffffff8281c000     2178 smbus.ko
 4    2 0xffffffff8281f000    27450 ipfw.ko
 5    1 0xffffffff82847000     42d0 ipfw_nat.ko
 6    1 0xffffffff8284c000     c962 libalias.ko
#

We are now ready to explore ipfw_nat.

Similar to other ipfw entities such as pipes and queues, ipfw_nat works with a NAT object. A NAT object is a single entry in the packet aliasing database.

We first create a NAT object:

# ipfw nat 25 config ip 198.51.100.50
ipfw nat 25 config ip 198.51.100.50
#
# ipfw nat show config
ipfw nat 25 config ip 198.51.100.50

Note that the NAT object identifier must be numeric, not alphabetic or alphanumeric. A NAT object identifier such as foo or 25foo will be rejected by ipfw.

Next, load two rules that will use that instance:

# ipfw add 1000 nat 25 tcp from any to any
# ipfw add 2000 nat 25 icmp from any to any

Immediately listing the ruleset shows the NAT object and the rule body.

# ipfw list
01000 nat 25 tcp from any to any
02000 nat 25 icmp from any to any
65535 deny ip from any to any

#

We now have an ipfw_nat instance in the packet aliasing database and rules that will engage that instance. This is generally referred to as "static NAT".

The ipfw_nat instance will replace the IP source address of any packet exiting the firewall with 198.51.100.50, provided that packet has reached the ipfw_nat rule and matches its configuration.
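To have a setup like this restored at boot, the stock rc.conf(5) knobs for ipfw can load the modules and a ruleset file. This is a sketch only; /etc/ipfw.rules is an assumed path for a script containing the nat config and add commands shown above:

```sh
# /etc/rc.conf fragment (sketch): persistent ipfw with in-kernel NAT
firewall_enable="YES"                  # load and enable ipfw at boot
firewall_script="/etc/ipfw.rules"      # assumed path to a custom ruleset script
firewall_nat_enable="YES"              # also load the in-kernel NAT support
```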

To test, start tcpdump(1) on the host system monitoring bridge0. (You should again ensure that the host system is not running a firewall.)

host_system# tcpdump -n -i bridge0 -v

Then, from the firewall VM window, telnet(1) to any IP address not used in our lab:

# telnet 172.16.10.10
Trying 172.16.10.10...
^C

All you need is a few seconds to attempt the connection (which will not succeed anyway).

Examining the host tcpdump output, we see the following:

host_system# tcpdump -n -i bridge0 -v
tcpdump: listening on bridge0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
19:58:34.099782 IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    198.51.100.50.62143 > 172.16.10.10.23: Flags [S], cksum 0x89d4 (correct), seq 3107170690, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 384725297 ecr 0], length 0
19:58:38.300043 IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    198.51.100.50.62143 > 172.16.10.10.23: Flags [S], cksum 0x796b (correct), seq 3107170690, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 384729498 ecr 0], length 0
19:58:46.500217 IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    198.51.100.50.62143 > 172.16.10.10.23: Flags [S], cksum 0x5964 (correct), seq 3107170690, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 384737697 ecr 0], length 0
^C

The source address has been changed from 203.0.113.50 to 198.51.100.50 as per our ipfw_nat instance. Note, however, that because our configuration binds NAT to an IP address, rather than to an interface, the NAT aliasing takes place on all configured interfaces, internal and external. You can verify this by repeating the above tcpdump on bridge1 and running a similar command for an existing host on the internal network. This time the destination will send a reset (RST), since the packet reached the destination but the service on the destination was not open.

# telnet 10.10.10.20
host_system# tcpdump -n -i bridge1 -v
tcpdump: listening on bridge1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
20:12:13.706505 IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    198.51.100.50.32825 > 10.10.10.20.23: Flags [S], cksum 0x6039 (correct), seq 1314409263, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 3924141446 ecr 0], length 0
20:12:13.710494 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    10.10.10.20.23 > 198.51.100.50.32825: Flags [R.], cksum 0x5774 (correct), seq 0, ack 1314409264, win 0, length 0
20:12:29.573756 IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
^C

To specify that only the outside interface should be NATed, use the keyword if in the ipfw NAT configuration statement and name the correct external interface:

# ipfw nat 25 config if em1
# ipfw nat show config
ipfw nat 25 config if em1

Note that you cannot use the ip ip_addr and if interf_name options at the same time on the same NAT instance - you must use one or the other.

In this case, the NAT instance ensures that the IP address of interface em1 is always used on traffic exiting through that interface - even if that address changes (because of DHCP or an administrative addressing change).

Send traffic destined externally from the internal VM host via:

root@internal:~ # telnet 172.16.10.10
Trying 172.16.10.10...
^C

On the FreeBSD host:

host_system# tcpdump -n -i bridge0 -v
tcpdump: listening on bridge0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
20:24:41.147755 IP (tos 0x10, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    203.0.113.50.40001 > 172.16.10.10.23: Flags [S], cksum 0x5962 (correct), seq 950423268, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 2027118491 ecr 0], length 0
20:24:42.189806 IP (tos 0x10, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    203.0.113.50.40001 > 172.16.10.10.23: Flags [S], cksum 0x554b (correct), seq 950423268, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 2027119538 ecr 0], length 0
20:24:44.394747 IP (tos 0x10, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    203.0.113.50.40001 > 172.16.10.10.23: Flags [S], cksum 0x4caa (correct), seq 950423268, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 2027121747 ecr 0], length 0
^C

Though it does not look like it, ipfw is translating the packets as they exit the firewall.

Consider this exchange where the internal VM host pings external1:

root@internal:~ # ping 203.0.113.10
PING 203.0.113.10 (203.0.113.10): 56 data bytes
64 bytes from 203.0.113.10: icmp_seq=0 ttl=63 time=2.742 ms
64 bytes from 203.0.113.10: icmp_seq=1 ttl=63 time=2.675 ms
^C

The traffic on the internal bridge (bridge1) shows the packets from the internal VM:

host_system# tcpdump -n -i bridge1 -v
tcpdump: listening on bridge1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
20:29:27.048162 IP (tos 0x0, ttl 64, id 58916, offset 0, flags [none], proto ICMP (1), length 84)
    10.10.10.20 > 203.0.113.10: ICMP echo request, id 15077, seq 0, length 64
20:29:27.052446 IP (tos 0x0, ttl 63, id 36018, offset 0, flags [none], proto ICMP (1), length 84)
    203.0.113.10 > 10.10.10.20: ICMP echo reply, id 15077, seq 0, length 64
20:29:28.104133 IP (tos 0x0, ttl 64, id 58917, offset 0, flags [none], proto ICMP (1), length 84)
    10.10.10.20 > 203.0.113.10: ICMP echo request, id 15077, seq 1, length 64
20:29:28.105732 IP (tos 0x0, ttl 63, id 36019, offset 0, flags [none], proto ICMP (1), length 84)
    203.0.113.10 > 10.10.10.20: ICMP echo reply, id 15077, seq 1, length 64

whereas the traffic on the external bridge (bridge0) shows the correct translation:

host_system# tcpdump -n -i bridge0 -v
tcpdump: listening on bridge0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
20:33:19.695939 IP (tos 0x0, ttl 63, id 58919, offset 0, flags [none], proto ICMP (1), length 84)
    203.0.113.50 > 203.0.113.10: ICMP echo request, id 58206, seq 0, length 64
20:33:19.696546 IP (tos 0x0, ttl 64, id 36021, offset 0, flags [none], proto ICMP (1), length 84)
    203.0.113.10 > 203.0.113.50: ICMP echo reply, id 58206, seq 0, length 64
20:33:20.715148 IP (tos 0x0, ttl 63, id 58920, offset 0, flags [none], proto ICMP (1), length 84)
    203.0.113.50 > 203.0.113.10: ICMP echo request, id 58206, seq 1, length 64
20:33:20.715824 IP (tos 0x0, ttl 64, id 36022, offset 0, flags [none], proto ICMP (1), length 84)
    203.0.113.10 > 203.0.113.50: ICMP echo reply, id 58206, seq 1, length 64
^C

The unreg_only and unreg_cgn configuration options allow the NAT operation to be bypassed if the source IP of the packet is not one of the RFC 1918 addresses (unreg_only) or the RFC 6598 addresses (unreg_cgn, carrier-grade NAT). In these cases, the original source address is kept in the packet, even though there is an ipfw_nat instance and a matching rule.

# ipfw nat 25 show config
ipfw nat 25 config if em1
#
# ipfw  nat 25 config if em1 unreg_only
ipfw nat 25 config if em1 unreg_only
#
# ipfw nat 25 show config
ipfw nat 25 config if em1 unreg_only
#

To try the unreg_only option, on the internal machine, change its IP address on em0 to a registered address, say 140.140.140.140/24, and change the corresponding link on the firewall (em1) to a compatible address, 140.140.140.1/24. The internal machine will also need a new default route, 140.140.140.1:

root@internal:~ # ifconfig em0 140.140.140.140/24
root@internal:~ # route add default 140.140.140.1
add net default: gateway 140.140.140.1
root@internal:~ #

and on the firewall

root@firewall:~ # ifconfig em1 140.140.140.1/24

From the internal machine, try to ping an external address not in our lab:

# ping 5.5.5.5

and observe on the host system that the ipfw_nat instance did not replace the source address with the configured IP:

host_system# tcpdump -n -i bridge0 -v
tcpdump: listening on bridge0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
21:07:18.154319 IP (tos 0x0, ttl 63, id 58943, offset 0, flags [none], proto ICMP (1), length 84)
    140.140.140.140 > 5.5.5.5: ICMP echo request, id 38569, seq 0, length 64
21:07:19.180094 IP (tos 0x0, ttl 63, id 58944, offset 0, flags [none], proto ICMP (1), length 84)
    140.140.140.140 > 5.5.5.5: ICMP echo request, id 38569, seq 1, length 64
21:07:20.194988 IP (tos 0x0, ttl 63, id 58945, offset 0, flags [none], proto ICMP (1), length 84)
    140.140.140.140 > 5.5.5.5: ICMP echo request, id 38569, seq 2, length 64

Not all the options available to ipfw_nat are described in the NAT section of the ipfw(8) man page.

Some of the options usable from natd(8) are available to ipfw_nat. These include:

redirect_port proto targetIP:targetPORT[-targetPORT]
                 [aliasIP:]aliasPORT[-aliasPORT]
                 [remoteIP[:remotePORT[-remotePORT]]]
redirect_proto proto localIP [publicIP [remoteIP]]

redirect_address localIP publicIP
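As an illustration of redirect_port, the following sketch would publish a web server on an inside host through the firewall's outside interface. It follows the lab setup in this chapter, but the inside web service and the choice of outside port 8080 are invented for this example:

```sh
# Sketch: forward TCP port 8080 on the NAT address to port 80 on an inside host
# ipfw nat 25 config if em1 redirect_port tcp 10.10.10.20:80 8080
```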

The options below are used for Load Sharing NAT (LSNAT), as described in RFC 2391.

redirect_port proto targetIP:targetPORT[,targetIP:targetPORT[,...]]
                 [aliasIP:]aliasPORT [remoteIP[:remotePORT]]
redirect_address localIP[,localIP[,...]] publicIP
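The LSNAT form of redirect_port spreads one public port across several inside servers. A hedged sketch using the lab addresses from this chapter (the alias 3.3.3.3 and the three inside web servers):

```sh
# Sketch: share TCP port 80 on alias 3.3.3.3 across three inside web servers
# ipfw nat 25 config redirect_port tcp \
#     10.10.10.10:80,10.10.10.20:80,10.10.10.30:80 3.3.3.3:80
```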

We discuss LSNAT in the next section.

5.3. Setting Up for LSNAT

For this example, we will use our three hosts "external1", "external2", and "external3" and pretend they are on the inside of the network, while our "internal" host is on the outside of the network.

The Figure below shows the architecture setup for working with LSNAT.

.Setting Up for LSNAT
Figure 2. Setting Up for LSNAT

As before, shut down all virtual machines and rebuild the network from scratch.

Use this command to set up the network bridge and tap architecture.

# sudo /bin/sh mkbr.sh reset bridge0 tap4 tap5 bridge1 tap0 tap1 tap2 tap3

Note that the host interface is not needed for this example.

Restart the virtual machines with:

# /bin/sh runvm.sh firewall internal external1 external2 external3

or start them up individually.

Configure each machine to ensure its network configuration matches the above Figure and test connectivity between adjacent systems with ping(8).

Throughout this section, remember that the "external" server VMs are now internal web servers load balancing between .10, .20, and .30, and the "internal" server VM is the outside host accessing the internal web servers.

On each inside machine, the following commands are necessary for the examples in this section:

# route delete default
#
# route add default 10.10.10.50

On the outside machine perform these commands:

# route delete default
#
# route add default 198.51.100.50

Also, on each inside host, edit the primary index.html page and insert a line of text that identifies the host - something like this:

File:  /usr/local/www/nginx/index.html:

<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p> This is VM EXTERNAL1 </p>

and start nginx on each inside host:

# service nginx onestart
Performing sanity check on nginx configuration:
nginx: the configuration file /usr/local/etc/nginx/nginx.conf syntax is ok
nginx: configuration file /usr/local/etc/nginx/nginx.conf test is successful
Starting nginx.
#
.LSNAT Setup Showing All VMs
Figure 3. LSNAT Setup Showing All VMs

With no ipfw loaded on the firewall, you should be able to ping all inside addresses (10.10.10.10, .20, .30) from the outside host (198.51.100.20). You should also be able to access each web server via:

# lynx 203.0.113.10         (or .20 or .30)

We now start on LSNAT configuration.

5.3.1. Setting Up LSNAT - One Address (10.10.10.10)

We begin by loading ipfw and ipfw_nat on the firewall VM:

# kldload ipfw
#
# kldload ipfw_nat

The first configuration is similar to static NAT, though from the outside to the inside. The command redirects incoming traffic from the outside host sent to destination IP 3.3.3.3 to inside host 10.10.10.10.

# ipfw nat 25 config redirect_addr 10.10.10.10 3.3.3.3
ipfw nat 25 config redirect_addr 10.10.10.10 3.3.3.3

Next, create a ruleset that utilizes this NAT instance:

# ipfw add 50 check-state
# ipfw add 1000 nat 25 tcp from any to any
#
# ipfw list
00050 check-state :default
01000 nat 25 tcp from any to any
65535 deny ip from any to any
#

Do not use the setup keyword on the ipfw rule referencing LSNAT. The setup keyword prevents the final ACK of the TCP 3-way handshake from ever being received, so the connection is never established.

From the outside host, accessing the web server with:

# lynx 3.3.3.3

brings up the web page on 10.10.10.10.

.Accessing Nginx on 10.10.10.10 With LSNAT via 3.3.3.3
Figure 4. Accessing Nginx on 10.10.10.10 With LSNAT via 3.3.3.3

NAT with one address is working.

5.3.2. Engaging Multiple Hosts With LSNAT

Next, reconfigure the nat 25 instance to utilize all of the inside hosts:

# ipfw nat 25 config redirect_addr 10.10.10.10,10.10.10.20,10.10.10.30  3.3.3.3

(Note that adding a modification to a NAT instance just overwrites the existing instance. It does not create a new instance with the same number.)

On the outside host, running lynx 3.3.3.3 repeatedly retrieves the home page of each server in round-robin fashion, without regard for network load or server utilization.

In the lynx browser, you can reload the current page by pressing Ctrl+R.

# ipfw nat 25 show config
ipfw nat 25 config log redirect_addr 10.10.10.10,10.10.10.20,10.10.10.30 3.3.3.3
#

By adding a rule to redirect ICMP traffic, both ICMP and TCP will be load-shared across the firewall.

# ipfw add 2000 nat 25 icmp from any to any
#
# ipfw list
00050 check-state :default
01000 nat 25 tcp from any to any
02000 nat 25 icmp from any to any
65535 deny ip from any to any

You can test this by running tcpdump -n -i em0 on each inside machine, and running ping -c 1 3.3.3.3 on the outside host a few times. The incoming ping will hit each inside machine in turn.

However, if you run ping 3.3.3.3, these pings hit only one server. The reason is that the aliasing engine treats ICMP differently from TCP and UDP. The aliasing engine recognizes the ICMP id number, and as long as this number does not change, it uses the same alias. If the command ping -c 1 3.3.3.3 is run repeatedly, the ICMP id number changes each time, which creates a new entry in the aliasing database and results in redirection to a different VM.
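The id-keyed behavior can be seen in isolation with a toy model in plain sh. This is only an illustration of the behavior described above, not libalias code; the pick function and its round-robin table are invented for this sketch:

```sh
#!/bin/sh
# Toy model: ICMP state is keyed on the echo id.
# A repeated id reuses its table entry; a new id advances the round-robin.
# (Illustration only - this is not how libalias is implemented internally.)
backends="10.10.10.10 10.10.10.20 10.10.10.30"
table=""    # space-separated "id=backend" entries
next=0      # round-robin index for new ids
pick() {
    id=$1
    case " $table " in
    *" $id="*)                       # existing id: reuse its backend
        entry=${table#*"$id="}
        echo "${entry%% *}"
        ;;
    *)                               # new id: take the next backend in turn
        i=0
        for b in $backends; do
            [ "$i" -eq "$next" ] && { table="$table $id=$b"; echo "$b"; }
            i=$((i + 1))
        done
        next=$(( (next + 1) % 3 ))
        ;;
    esac
}
pick 100    # new id  -> 10.10.10.10
pick 100    # same id -> 10.10.10.10 again
pick 101    # new id  -> 10.10.10.20
```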

It is common to want to balance the load across servers according to certain characteristics, such as system load. This is possible - manually - by reconfiguring the NAT statement. You can list the same host multiple times to give that host more traffic. Consider this configuration, entered with the Unix line-continuation character '\'; note that there is no space between successive IP addresses, only between the final address and the alias address:

# ipfw nat 25 config log redirect_addr \
10.10.10.30,\
10.10.10.20,10.10.10.20,\
10.10.10.10,10.10.10.10,10.10.10.10,10.10.10.10  3.3.3.3

This configuration shifts the NAT load heavily toward 10.10.10.10 and moderately toward 10.10.10.20, with 10.10.10.30 receiving far less traffic. Repeat the earlier single-ping example to see the result. While this works, it is a bit of a hack.
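The repeated-address list can be generated rather than typed by hand. A small sh sketch, where the host:weight pair notation is invented for this script:

```sh
#!/bin/sh
# Build a redirect_addr list with each host repeated "weight" times.
weights="10.10.10.30:1 10.10.10.20:2 10.10.10.10:4"   # invented host:weight pairs
list=""
for pair in $weights; do
    addr=${pair%:*}                # host part
    n=${pair#*:}                   # weight part
    i=0
    while [ "$i" -lt "$n" ]; do
        list="$list$addr,"
        i=$((i + 1))
    done
done
list=${list%,}                     # drop the trailing comma
echo "ipfw nat 25 config log redirect_addr $list 3.3.3.3"
```

Running it prints the same weighted nat 25 command shown above.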

It would be better to have a range assignment feature similar to the sparse address feature already in ipfw, something like:

# ipfw nat 25 config redirect_addr 10.10.10.0/24{10,20-25,30-50} 3.3.3.3

but this feature does not work with LSNAT.

However, it is possible to use the prob keyword to address load balancing. In a rule with the prob keyword, if the rule matches and the probability test succeeds, the action of the rule is taken and processing stops for that packet. If the rule matches but the probability test fails, the action is not taken, and processing continues with the next rule. You can verify this with a simple test ruleset and the ucont.sh shell script run from an external host.

03000 prob 0.200000 allow udp from any to me 5656 // set probability to 20% chance of matching
04000 count udp from any to me                    // count how many were not chosen by rule 3000
05000 prob 0.400000 allow udp from any to me 5656 // set probability to 40% chance of matching
06000 count udp from any to me                    // count how many were not chosen by 3000 and 5000
07000 prob 0.999000 allow udp from any to me 5656 // set probability to 99.9% chance of matching
08000 count udp from any to me                    // count how many were not chosen by all 3 rules
09000 allow udp from any to me 5656               // unconditional matching
65535 deny ip from any to any                     // default rule deny

After a run of 200 packets from sh ucont.sh 5656 1, the counts are:

03000  47  3314 prob 0.200000 allow udp from any to me 5656
04000 153 10776 count udp from any to me
05000  64  4505 prob 0.400000 allow udp from any to me 5656
06000  89  6271 count udp from any to me
07000  89  6271 prob 0.999000 allow udp from any to me 5656
08000   0     0 count udp from any to me
09000   0     0 allow udp from any to me 5656
65535   0     0 deny ip from any to any

From the above data, of the 200 packets sent by ucont.sh, 47 were matched by rule 3000 and 153 were not (rule 4000). Then, 64 were matched at rule 5000, and 89 were not. Finally, 89 were matched at rule 7000.
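These counts are close to what the rule probabilities predict for independent trials. A quick back-of-the-envelope check (the awk program below simply chains the probabilities from the ruleset):

```sh
awk 'BEGIN {
    n = 200                                   # packets sent by ucont.sh
    m3000 = n * 0.2;    rest = n - m3000      # rule 3000 matches 20%
    m5000 = rest * 0.4; rest = rest - m5000   # rule 5000 matches 40% of the rest
    m7000 = rest * 0.999                      # rule 7000 matches 99.9% of the rest
    printf "expected: 3000=%.0f 5000=%.0f 7000=%.1f\n", m3000, m5000, m7000
}'
# prints: expected: 3000=40 5000=64 7000=95.9
```

The observed counts (47, 64, 89) wander around these expectations, as random trials do.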

If you duplicate this example and find some packets hitting the default deny rule (65535), delete the host interface from the bridge and re-run the test. You are then unlikely to have any stray UDP packets hitting the default rule.

While the above works for UDP, it does not work for TCP. The TCP 3-way handshake is broken because some packets will match, but others will not.


5.4. Other NAT Keywords

The other keywords in the NAT section of ipfw(8) are straightforward:

deny_in : deny incoming packets
same_ports : keep the same ports after redirection
reset : clear the aliasing table when the address changes
reverse : reverse the direction of the NAT
proxy_only : packet aliasing is not performed
skip_global, global, tablearg : see the discussion in ipfw(8)