In normal cases, routing on TCP/IP is based on the destination IP address. This article focuses on how to make routing choices based on the destination port. It can be used to split the traffic between several links. It’s very useful on a network with important loads. For instance you may want to route the SSH traffic using one ADSL link, and the web traffic on another ADSL link. It may also prevent the interactive sessions from becoming unresponsive. If all the outgoing packets are routed to a single ADSL link with no traffic control, your ssh/telnet/vnc session may become unresponsive as soon as someone is downloading a large file on your network. With this technique you can route all the interactive traffic to a dedicated line in order to keep quick packet transmissions with the server.
You don’t need anything special in order to get the advanced routing to work. It should work on all flavours of Linux that come with a 2.6 kernel (not sure if it works with linux-2.4). All the features that we use are in the mainline kernel, so you don’t need a specific kernel patch. You must be careful with the networking options only if you compile your own kernel. You will also need the iproute2 package and the basic networking tools to be installed, but that is the case with all of the major linux distributions.
To explain how we do destination-port routing, we will use the following network environment:
Now, let’s see how we can implement destination port routing. The old
route and the recent
iproute2 tools provide no option to select the
packets using the destination port. But
ip rule comes with an
interesting option that uses the attributes of a packet to decide how to
We will use the
fwmark attribute to do that. This attribute does not
belong to the IP header packet. This attribute is only stored in the
memory of the local machine which works on the packet. This means that
it will just be dropped as soon as the packet leaves the router. Anyway
it’s all we need since this attribute can be used by both
iproute2. The first thing to do is to use the advanced packet
matching options provided with iptables and netfilter to mark the
packets. Once the packet is marked we can use iproute2 to make policy
and use this attribute.
Obviously the packet must have been marked by netfilter before it reaches the routing code. That’s why it’s important to remember when netfilter works on the packets. Netfilter has five hooks in the kernel network stack. This means that there are five places where the netfilter functions can work on the packets. These are the kinds of packets that can be seen by each of the five hooks:
So if we mark the packets at POSTROUTING, the routing code will not see the mark and the advanced routing will have no effect. That’s why we must work in the PREROUTING hook for incoming routed packets, and in the OUTPUT hook if we want to route the packets sent by the router itself.
Netfilter and iptables work with three tables:
We will work with the mangle table since we want to change an attribute of a packet. We want to split the traffic between the two links. Let’s consider we want to route the ssh traffic through the first link and the web traffic through the second link. We will have to mark the TCP packets having dport=22 (destination port) with mark=1 and the TCP packets having dport=80 with mark=2:
iptables -t mangle -A PREROUTING -i eth0 -p tcp -m tcp --dport 22 -j MARK --set-mark 1 iptables -t mangle -A PREROUTING -i eth0 -p tcp -m tcp --dport 80 -j MARK --set-mark 2
Here is the complete code that cleans the table first, and that does logging:
iptables -t mangle -F iptables -t mangle -X iptables -t mangle -N LOG_FWMARK1 iptables -t mangle -A LOG_FWMARK1 -j LOG --log-prefix 'iptables-mark1: ' --log-level info iptables -t mangle -A LOG_FWMARK1 -j MARK --set-mark 1 iptables -t mangle -N LOG_FWMARK2 iptables -t mangle -A LOG_FWMARK2 -j LOG --log-prefix 'iptables-mark2: ' --log-level info iptables -t mangle -A LOG_FWMARK2 -j MARK --set-mark 2 iptables -t mangle -A PREROUTING -i eth0 -p tcp -m tcp --dport 22 -j LOG_FWMARK1 iptables -t mangle -A PREROUTING -i eth0 -p tcp -m tcp --dport 80 -j LOG_FWMARK2
To route the packets using the mark attribute, we have to use the
ip rule command. It’s named policy routing.
We have to create secondary routing tables that will be used when the
mark attribute of a packet matches a rule.
First, we have to create these two routing tables by editing
/etc/iproute2/rt_tables. Here is the code that automatically creates
two tables called
if ! cat /etc/iproute2/rt_tables | grep -q '^251' then echo '251 rt_link1' >> /etc/iproute2/rt_tables fi if ! cat /etc/iproute2/rt_tables | grep -q '^252' then echo '252 rt_link2' >> /etc/iproute2/rt_tables fi
Here is the list of the routing tables you should have on Jupiter:
# -----------/etc/iproute2/rt_tables------------ # reserved values 255 local 254 main 253 default 0 unspec # custom routes 252 rt_link2 251 rt_link1
Now we must populate these two routing tables. The best thing to do is
just to add one default route in each table. Each default route drives
the packet to the ethernet card where the link to use is connected. That
way, when a packet with dport=22 follows the default route written in
rt_link1, it will be sent to Neptune through device eth1 (Link1). We
ip route flush to be sure that the table is empty.
ip route flush table rt_link1 ip route add table rt_link1 default dev eth1 ip route flush table rt_link2 ip route add table rt_link2 default dev eth2
Now we have to use the
ip rule command to say what to do with the
marked packets. The following lines say that the packets having the mark
fwmark=1 must follow the routing instructions of the routing table named
rt_link1, and the packets with the second mark must use
the end we flush the routing cache to be sure that the new rules are
taken into account.
ip rule del from all fwmark 2 2>/dev/null ip rule del from all fwmark 1 2>/dev/null ip rule add fwmark 1 table rt_link1 ip rule add fwmark 2 table rt_link2 ip route flush cache
Here is the list of all rules after these commands are executed:
# ip rule show 0: from all lookup local 32764: from all fwmark 0x2 lookup rt_link2 32765: from all fwmark 0x1 lookup rt_link1 32766: from all lookup main 32767: from all lookup default
There are two network parameters that have to be checked if you want your router to behave as expected. First we want to be sure that the kernel running on Jupiter is configured to route the packets. To enable routing on IPv4 you must set ip_forward to 1 (1 means enabled, 0 means disabled).
echo 1 >| /proc/sys/net/ipv4/ip_forward
You must also disable Reverse Path Filtering. It’s an option enabled by default that increases the security and prevents ip spoofing by checking that the source address of the incoming packets match the routing table on the local machine. Since we are doing a complex setup, this option would lead to dropping our packets, so it must be disabled.
echo 0 >| /proc/sys/net/ipv4/conf/all/rp_filter
These changes will be lost if you reboot your server. You can either
ensure that is automatically executed by a script at boot time, or you
can edit your network configuration files to be sure that these changes
will be kept after reboot. On Gentoo and Redhat you have to edit
# /etc/sysctl.conf # # Enables packet forwarding net.ipv4.ip_forward = 1 # Disable reverse path filtering net.ipv4.conf.all.rp_filter = 0
Now the packets from Saturn to Neptune should be routed as expected. But there is still one problem to solve. The replies sent by Neptune to Saturn will ignore the advanced routing and will always be sent through the same link, the one that matches the route to 192.168.157.3 that is configured on Neptune. When Neptune receives packets from Saturn, the source address is 192.168.157.3. Since there is no advanced routing configured on Neptune, the packets to Saturn just follow the normal route.
This is a case of asymmetric routing. The packets from Saturn to Neptune having dport=80 are routed through the second link because of the advanced routing on Jupiter. And the replies to these packets are sent through the first link just because it’s normal routing. One solution to this problem would be to configure the advanced routing on Neptune as well as Jupiter. But we wanted to keep the configuration as simple as possible and we only want to configure advanced routing on Neptune.
The best thing to do is to configure SNAT (Source Network-Address-Translation) on Jupiter so that all packets sent through link1 or link2 come with a rewritten source address. We want the source address of the packets from link1 to be 10.37.1.253 and the source address of the packets from link2 will be 10.37.2.253. That way Neptune will receive packets with a source address that matches the link from which they come. When Neptune replies to the requests coming from link1 or link2 it will just use the source address seen in these packets as the new destination address.
You will also see that the SNAT involves an implicit DNAT (Destination Network-Address-Translation). When Jupiter receives a packet on eth2 (the interface where the second link is connected), it works because the destination address is 10.37.2.253. This is a reply to a packet from Saturn (192.168.157.3), so we want Jupiter to change the destination address, and to forward it to Saturn. This is done by the implicit DNAT.
It’s important to notice that the Source address NAT is executed in POSTROUTING. That way it’s executed after the routing, which is the place where we drive each packet to the correct device (either eth1 or eth2 on Jupiter). The SNAT iptable rule uses the “outgoing device” match to determine what source address must be written in the packet header.
In case you are using ADSL links between Jupiter and Neptune, you will be forced to use public IP addresses outside of your local network. Most modems can do NAT for you. In that case you don’t have to worry about that.
Here is the code to configure SNAT on Jupiter:
iptables -t nat -F iptables -t nat -X iptables -t nat -A POSTROUTING -o eth1 -j SNAT --to-source 10.37.1.253 iptables -t nat -A POSTROUTING -o eth2 -j SNAT --to-source 10.37.2.253
Here is what you can do in case it does not work:
In this article we considered the packet filtering as not enabled on your router and on your network. In case you are using iptables already, you will have to check that it’s consistent with the new iptables rules involved in the destination port routing.
You can use iptables to log the interesting packets using
-j LOG in
your rules. Don’t forget to install syslog on your machine. Here is an
example of what you can get:
iptables-mark1: IN= OUT=eth1 SRC=10.37.1.100 DST=10.37.3.101 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=53236 DF PROTO=TCP SPT=44443 DPT=22 WINDOW=5840 RES=0x00 SYN URGP=0 iptables-mark1: IN= OUT=eth1 SRC=10.37.1.100 DST=10.37.3.101 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=53237 DF PROTO=TCP SPT=44443 DPT=22 WINDOW=5840 RES=0x00 SYN URGP=0
To enable logging, you can replace a simple iptables action (such as MARK) with a customized chain (such as LOG_FWMARK). Everytime a packet is marked you will also have a messages written in your logs. For instance, you can replace this simple iptables command:
iptables -t mangle -A PREROUTING -i eth0 -p tcp -m tcp --dport 80 -j MARK --set-mark 2
With the following chain:
iptables -t mangle -N LOG_FWMARK2 iptables -t mangle -A LOG_FWMARK2 -j LOG --log-prefix 'iptables-mark2: ' --log-level info iptables -t mangle -A LOG_FWMARK2 -j MARK --set-mark 2 iptables -t mangle -A PREROUTING -i eth0 -p tcp -m tcp --dport 80 -j LOG_FWMARK2
You can use a sniffer such as tcpdump (console) or wireshark (graphical mode) to check what packets are transmitted and with which attributes.
Even if 95% of the networking configuration has to be done on the router (Jupiter) don’t forget to set a route to Neptune on Saturn. It may be necessary if Jupiter is not the default gateway on Saturn. Here is what to do on Saturn:
ip route add 220.127.116.11 via 192.168.157.253 # (dummy0) (Jupiter eth0)
To have a better understanding of how this advanced routing configuration works, let’s take an example of a networking packet sent from Saturn to Neptune. We consider the user on Saturn wants to connect to Neptune via ssh. In our example the ssh packets are supposed to be routed through the first link on Jupiter. (Link1)
ssh 18.104.22.168to connect to Neptune.
22.214.171.124must be routed via 192.168.157.253 so the link layer on Saturn sends the packet to Jupiter. This packet contains
ip.src=192.168.157.3, ip.dst=172.16.1.100, tcp.dport=22.
1” is executed and the
mark=1attribute is written in the packet information in the memory of Jupiter.
rt_link1”. Since the default route in this routing table is “
eth1” the packet is sent to the device named eth1.
eth1” so “
10.37.1.253” is executed. The SNAT code rewrites the source address in the packet:
ip.src=172.16.1.100, ip.dst=192.168.157.3, tcp.sport=22
Here is the script to execute on Jupiter to configure the destination port routing:
#!/bin/bash echo 1 >| /proc/sys/net/ipv4/ip_forward echo 0 >| /proc/sys/net/ipv4/conf/all/rp_filter iptables -t mangle -F iptables -t mangle -X iptables -t mangle -A PREROUTING -i eth0 -p tcp -m tcp --dport 22 -j MARK --set-mark 1 iptables -t mangle -A PREROUTING -i eth0 -p tcp -m tcp --dport 80 -j MARK --set-mark 2 iptables -t nat -F iptables -t nat -X iptables -t nat -A POSTROUTING -o eth1 -j SNAT --to-source 10.37.1.253 iptables -t nat -A POSTROUTING -o eth2 -j SNAT --to-source 10.37.2.253 if ! cat /etc/iproute2/rt_tables | grep -q '^251' then echo '251 rt_link1' >> /etc/iproute2/rt_tables fi if ! cat /etc/iproute2/rt_tables | grep -q '^252' then echo '252 rt_link2' >> /etc/iproute2/rt_tables fi ip route flush table rt_link1 ip route add table rt_link1 default dev eth1 ip route flush table rt_link2 ip route add table rt_link2 default dev eth2 ip rule del from all fwmark 2 2>/dev/null ip rule del from all fwmark 1 2>/dev/null ip rule add fwmark 1 table rt_link1 ip rule add fwmark 2 table rt_link2 ip route flush cache