Using Multipath TCP on recent Linux kernels

The first version of Multipath TCP on Linux was an out-of-tree patch initially developed by UCLouvain researchers [40]. It was initially the reference implementation of Multipath TCP and influenced the design of the protocol, as new features were always tested on this implementation. You can find additional information about this implementation at https://www.multipath-tcp.org.

Starting with version 5.6, the official Linux kernel includes support for Multipath TCP. The set of features supported by this implementation has increased over time as shown by its ChangeLog.

To avoid any interference with regular TCP, this implementation only creates a Multipath TCP connection if the application has created its socket using the IPPROTO_MPTCP protocol. Applications will probably be modified in the coming months and years to add specific support for Multipath TCP. In the meantime, the Multipath TCP developers have created a workaround that forces legacy applications to use Multipath TCP: the mptcpize command, which is bundled with the mptcpd daemon. We use this approach in this section and discuss applications with native Multipath TCP support later.
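For readers who want to see what requesting IPPROTO_MPTCP looks like in an application, here is a minimal Python sketch. It is an illustration rather than the way any particular application does it: the helper name open_stream_socket is hypothetical, the numeric value 262 is the Linux protocol number used when the Python socket module does not define IPPROTO_MPTCP, and the fallback to regular TCP on any socket error is a design choice of this sketch.

```python
import socket

# Linux protocol number for Multipath TCP. Recent Python versions define
# socket.IPPROTO_MPTCP; the literal 262 is a fallback for older ones.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)


def open_stream_socket(family=socket.AF_INET):
    """Return (sock, is_mptcp), preferring Multipath TCP over regular TCP.

    Hypothetical helper: if the kernel refuses the MPTCP protocol
    (no support compiled in, or net.mptcp.enabled set to 0), the
    function falls back to a regular TCP socket.
    """
    try:
        return socket.socket(family, socket.SOCK_STREAM, IPPROTO_MPTCP), True
    except OSError:
        # Assumption of this sketch: any error means MPTCP is unavailable.
        return socket.socket(family, socket.SOCK_STREAM, socket.IPPROTO_TCP), False


if __name__ == "__main__":
    sock, is_mptcp = open_stream_socket()
    print("Multipath TCP socket" if is_mptcp else "regular TCP socket")
    sock.close()
```

The socket returned by this helper is used like any other stream socket (connect, send, recv), which is why mptcpize can make the same substitution on behalf of unmodified applications.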

To illustrate Multipath TCP, we use a very simple setup with a Linux client using Ubuntu 22 and a Linux server using Debian. The client uses Linux kernel version 5.15 while the server uses version 5.17. The server has a single network interface with an IPv4 and an IPv6 address. The client has both a Wi-Fi and an Ethernet interface. These two interfaces are connected to the same router that allocates IP addresses in the same subnet on both interfaces. The client has both an IPv4 and an IPv6 address.

Enabling Multipath TCP

Multipath TCP is a feature that needs to be compiled into the kernel. If you compile your own kernel, you can manually select Multipath TCP (the CONFIG_MPTCP option).

Most users prefer to rely on already compiled Linux kernels that are included in their distribution. The following distributions support Multipath TCP:

  • CentOS starting with

  • Debian starting with

  • Ubuntu starting with 22.04

You need to install a recent kernel to benefit from Multipath TCP. On some distributions, this installation will be part of the regular upgrade. On other distributions, you will need to add it manually.

Once the kernel has been installed and your computer has rebooted, you first need to verify that Multipath TCP is enabled.

sudo sysctl -a | grep mptcp.enabled
net.mptcp.enabled = 1

Here, the kernel supports Multipath TCP. If, for any reason, you want to disable Multipath TCP, you need to set this sysctl variable to 0.
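The same check can be scripted. The sketch below reads the file that backs the net.mptcp.enabled sysctl; the function name mptcp_enabled is hypothetical, and treating a missing file as "this kernel has no Multipath TCP support" is an assumption of the sketch.

```python
from pathlib import Path


def mptcp_enabled(path="/proc/sys/net/mptcp/enabled"):
    """Return True or False from the sysctl file, or None if it is absent.

    Assumption: a missing file means that the running kernel was built
    without Multipath TCP support.
    """
    try:
        return Path(path).read_text().strip() == "1"
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    state = mptcp_enabled()
    if state is None:
        print("this kernel does not expose net.mptcp.enabled")
    else:
        print("net.mptcp.enabled =", int(state))
```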

To illustrate the basic operation of mptcpize, let us first use the netcat command over the loopback interface. This is obviously not the target use case for Multipath TCP, but it is a convenient way to perform tests.

Netcat makes it easy to launch clients and servers. We start the server using: mptcpize run nc -l -p 12345. This is a TCP server that listens on port 12345. The client connects to this server using the mptcpize run nc 127.0.0.1 12345 command. The connection is established and all text lines sent by the client are printed by the server on standard output.

# mptcpize run nc -l -p 12345
Simple test

There are several ways to check that Multipath TCP is used for this connection. First, the ss command provides information about the status of the different sockets.

ss -iaM
State    Recv-Q    Send-Q       Local Address:Port        Peer Address:Port    Process
ESTAB    0         0                127.0.0.1:12345          127.0.0.1:52854
         subflows_max:2 remote_key token:5bba80d9 write_seq:2266a099179e2476 snd_una:2266a099179e2476 rcv_nxt:de9999038d0a29a2
ESTAB    0         0                127.0.0.1:52854          127.0.0.1:12345
         subflows_max:2 remote_key token:c1f12b87 write_seq:de9999038d0a29a2 snd_una:de9999038d0a29a2 rcv_nxt:2266a099179e2476

ss provides several pieces of information that are useful to debug a Multipath TCP connection. The first column indicates that the connection is in the Established state, which means that it can currently transfer data. The next columns indicate the length of the Receive and Send queues at the TCP level and the four-tuple that identifies the connection. The next line provides Multipath TCP information: the maximum number of subflows which can be attached to the connection, the token assigned by the remote host, and the write_seq, snd_una and rcv_nxt parameters of the state machine. The next two lines provide the same information for the other direction of the connection.
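When many connections need to be inspected, the Multipath TCP line printed by ss can be split into fields mechanically. The following sketch assumes only the format visible in the output above (space-separated key:value pairs, plus bare words such as remote_key); the function name parse_ss_mptcp_info is hypothetical.

```python
def parse_ss_mptcp_info(line):
    """Split an MPTCP detail line of ss into a dictionary.

    key:value fields keep their value as a string; bare words such as
    remote_key are stored with the value True.
    """
    info = {}
    for field in line.split():
        key, sep, value = field.partition(":")
        info[key] = value if sep else True
    return info


if __name__ == "__main__":
    # Detail line of the server side socket in the loopback example.
    sample = ("subflows_max:2 remote_key token:5bba80d9 "
              "write_seq:2266a099179e2476 snd_una:2266a099179e2476 "
              "rcv_nxt:de9999038d0a29a2")
    info = parse_ss_mptcp_info(sample)
    print(info["token"])                          # 5bba80d9
    print(info["write_seq"] == info["snd_una"])   # True: no unacknowledged data
```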

It is also possible to capture packets on the loopback interface to verify that Multipath TCP is used. The output below provides the first collected packets:

tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes
18:43:42.676396 IP 127.0.0.1.52854 > 127.0.0.1.12345: Flags [S], seq 904893125, win 65495, options [mss 65495,sackOK,TS val 4026038040 ecr 0,nop,wscale 7,mptcp capable v1], length 0
18:43:42.676426 IP 127.0.0.1.12345 > 127.0.0.1.52854: Flags [S.], seq 1804351310, ack 904893126, win 65483, options [mss 65495,sackOK,TS val 4026038040 ecr 4026038040,nop,wscale 7,mptcp capable v1 {0x45edb502d861e7b1}], length 0
18:43:42.676472 IP 127.0.0.1.52854 > 127.0.0.1.12345: Flags [.], ack 1, win 512, options [nop,nop,TS val 4026038040 ecr 4026038040,mptcp capable v1 {0xdbb760db1d55e07b,0x45edb502d861e7b1}], length 0
18:44:59.519697 IP 127.0.0.1.52854 > 127.0.0.1.12345: Flags [P.], seq 1:13, ack 1, win 512, options [nop,nop,TS val 4026114884 ecr 4026038040,mptcp capable v1 {0xdbb760db1d55e07b,0x45edb502d861e7b1},nop,nop], length 12
18:44:59.519755 IP 127.0.0.1.12345 > 127.0.0.1.52854: Flags [.], ack 13, win 512, options [nop,nop,TS val 4026114884 ecr 4026114884,mptcp dss ack 16040019788386937262], length 0

The first packet is the SYN that includes the MP_CAPABLE option. The server replies with a SYN+ACK whose MP_CAPABLE option contains the server key. The client returns the third ACK with an MP_CAPABLE option that carries the two keys. As the server did not send any data, the MP_CAPABLE option is sent again in the packet containing the Simple test string. In Multipath TCP version 1, this MP_CAPABLE option also carries the data-level length of the payload, which is why tcpdump does not show a separate DSS option for this packet. The server replies with an acknowledgment that carries the DSS option.

We can now use the netcat application to explore the operation of Multipath TCP over the Internet. Let us start with a very simple example.

mptcpize run nc serverv4 12345
hello

The netcat process listens on port 12345 on the server. This results in the following Multipath TCP connection:

09:05:23.695876 IP host-78-129-5-171.dynamic.voo.be.41510 > serverv4.12345: Flags [S], seq 3525674027, win 64240, options [mss 1460,sackOK,TS val 2619832768 ecr 0,nop,wscale 7,mptcp capable v1], length 0
09:05:23.696076 IP serverv4.12345 > host-78-129-5-171.dynamic.voo.be.41510: Flags [S.], seq 1745741580, ack 3525674028, win 65160, options [mss 1460,sackOK,TS val 3069340264 ecr 2619832768,nop,wscale 7,mptcp capable v1 {0x82aa42ef4245f0d0}], length 0
09:05:23.707909 IP host-78-129-5-171.dynamic.voo.be.41510 > serverv4.12345: Flags [.], ack 1, win 502, options [nop,nop,TS val 2619832783 ecr 3069340264,mptcp capable v1 {0x9dc8e3972e3d9f25,0x82aa42ef4245f0d0}], length 0
09:05:30.776312 IP host-78-129-5-171.dynamic.voo.be.41510 > serverv4.12345: Flags [P.], seq 1:7, ack 1, win 502, options [nop,nop,TS val 2619839851 ecr 3069340264,mptcp capable v1 {0x9dc8e3972e3d9f25,0x82aa42ef4245f0d0},nop,nop], length 6
09:05:30.776484 IP serverv4.12345 > host-78-129-5-171.dynamic.voo.be.41510: Flags [.], ack 7, win 510, options [nop,nop,TS val 3069347345 ecr 2619839851,mptcp dss ack 1561335003985645838], length 0

This is a Multipath TCP connection since it includes the Multipath TCP options, but the client does not create an additional subflow and the server does not announce its other addresses. This is the expected behavior since these operations are controlled by the path manager. On Linux, the Multipath TCP path manager can be configured using the ip mptcp command (see the ip-mptcp manual page). This command configures the parameters that are associated with an IP address. The path manager associates a numeric identifier with each IP address or endpoint. The ip mptcp endpoint show command lists the identifiers of the active IP addresses on the host. Here is an example of the output of this command on our client:

ip mptcp endpoint show
fe80::3934:7572:b1ff:b555 id 1 dev wlp3s0
192.168.0.43 id 2 dev wlp3s0
fe80::5642:39bd:3390:43d3 id 3 dev enp2s0
192.168.0.37 id 4 dev enp2s0
2a02:2788:10c4:123:3d66:f590:d891:8fb3 id 5 dev wlp3s0
2a02:2788:10c4:123:6636:10c6:692b:18cc id 6 dev enp2s0
2a02:2788:10c4:123:2a09:5ec7:9b99:4a97 id 7 dev enp2s0

The two fe80 addresses are the IPv6 link local addresses configured on the Ethernet (enp2s0) and Wi-Fi (wlp3s0) interfaces of our client host. There are three flags which can be associated with each endpoint identifier:

  • subflow. When this flag is set, the path manager will try to create a subflow over this interface when a Multipath TCP connection is created or when the interface becomes active while there is an ongoing Multipath TCP connection. This flag is mainly useful for clients.

  • signal. When this flag is set, the path manager will announce the address of the endpoint over any Multipath TCP connection created using other addresses. This flag can be used on clients or servers. It is mainly useful on servers that have multiple addresses.

  • backup. This flag can be combined with the two other flags. When combined with the subflow flag, it indicates that a backup subflow will be created. When combined with the signal flag, it indicates that the address will be advertised as a backup address.

On our client host, we can configure the Wi-Fi interface as a backup interface that creates subflows as follows:

sudo ip mptcp endpoint del id 2
sudo ip mptcp endpoint add 192.168.0.43 dev wlp3s0 subflow backup
sudo ip mptcp endpoint show
fe80::3934:7572:b1ff:b555 id 1 dev wlp3s0
fe80::5642:39bd:3390:43d3 id 3 dev enp2s0
192.168.0.37 id 4 dev enp2s0
2a02:2788:10c4:123:3d66:f590:d891:8fb3 id 5 dev wlp3s0
2a02:2788:10c4:123:6636:10c6:692b:18cc id 6 dev enp2s0
2a02:2788:10c4:123:2a09:5ec7:9b99:4a97 id 7 dev enp2s0
192.168.0.43 id 8 subflow backup dev wlp3s0

We had to first remove the configuration for this endpoint because a default one was already active. Then we added the new parameters and verified them.
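To check such a configuration from a script, the lines printed by ip mptcp endpoint show can be parsed into records. The sketch below is based only on the output format shown above (address, then id N, dev NAME and optional flag words in any order); the function name parse_endpoint is hypothetical, and fields that do not appear in this output, if any exist on your system, would end up in the flag list.

```python
def parse_endpoint(line):
    """Parse one line of 'ip mptcp endpoint show' into a dictionary."""
    tokens = line.split()
    endpoint = {"address": tokens[0], "id": None, "dev": None, "flags": []}
    rest = iter(tokens[1:])
    for token in rest:
        if token == "id":
            endpoint["id"] = int(next(rest))
        elif token == "dev":
            endpoint["dev"] = next(rest)
        else:
            # Assumption: every other word is a flag (subflow, signal, backup).
            endpoint["flags"].append(token)
    return endpoint


if __name__ == "__main__":
    lines = [
        "192.168.0.37 id 4 dev enp2s0",
        "192.168.0.43 id 8 subflow backup dev wlp3s0",
    ]
    for line in lines:
        endpoint = parse_endpoint(line)
        flags = ",".join(endpoint["flags"]) or "no flag"
        print(endpoint["dev"], endpoint["address"], flags)
```

Endpoints listed without any flag, such as 192.168.0.37 here, neither create subflows nor get announced, which is a quick thing to look for when no additional subflow appears.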

The path manager also has some limits which can be configured using the ip mptcp limits command. Two limits can be set.

  • ip mptcp limits set subflow n where n is an integer. This restricts the Multipath TCP connection to use up to n different subflows. Servers should protect themselves by setting this limit to a few subflows. Most use cases would work well with 2 or 4 subflows.

  • ip mptcp limits set add_addr_accepted n where n is an integer. This parameter limits the number of addresses that are learned over each Multipath TCP connection. This parameter could be used to protect the Multipath TCP implementation against attacks where too many addresses are advertised. Most use cases would work with 4 accepted addresses.

These parameters control the path manager, but before creating Multipath TCP subflows over different paths, we need to configure the IP routing table of our client host. Our client host has two network interfaces: Wi-Fi and Ethernet. By default, Linux prefers the Ethernet interface to Wi-Fi. The two interfaces are configured as follows:

ip -4 addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 127.0.0.1/8 scope host lo
    valid_lft forever preferred_lft forever
2: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    inet 192.168.0.37/24 brd 192.168.0.255 scope global dynamic noprefixroute enp2s0
    valid_lft 75697sec preferred_lft 75697sec
3: wlp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 192.168.0.43/24 brd 192.168.0.255 scope global dynamic noprefixroute wlp3s0
    valid_lft 75696sec preferred_lft 75696sec

By default, Linux creates the two following default routes.

route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.0.1     0.0.0.0         UG    100    0        0 enp2s0
0.0.0.0         192.168.0.1     0.0.0.0         UG    600    0        0 wlp3s0

We need to configure the routing tables to be able to use the two interfaces simultaneously. For this, we need to ensure that packets with source address 192.168.0.37 are sent over the enp2s0 interface while packets with source address 192.168.0.43 are sent over the wlp3s0 interface. This can be achieved using two different routing tables.

# Create one rule per source address to select its routing table
ip rule add from 192.168.0.37 table 1
ip rule add from 192.168.0.43 table 2

# Configure routing table 1 for enp2s0
ip route add 192.168.0.0/24 dev enp2s0 scope link table 1
ip route add default via 192.168.0.1 dev enp2s0 table 1

# Configure routing table 2 for wlp3s0
ip route add 192.168.0.0/24 dev wlp3s0 scope link table 2
ip route add default via 192.168.0.1 dev wlp3s0 table 2

# Configure a default route to regular internet
ip route add default scope global nexthop via 192.168.0.1 dev enp2s0
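Since the rule and the two routes follow the same pattern for every interface, they can be generated from a few parameters. The sketch below only builds the command strings; it does not run them. The function name source_routing_commands is hypothetical, and the addresses, subnet, gateway and table numbers are those of our example.

```python
def source_routing_commands(address, device, subnet, gateway, table):
    """Return the ip commands that route packets sent from `address`
    through `device`, using the dedicated routing table `table`."""
    return [
        f"ip rule add from {address} table {table}",
        f"ip route add {subnet} dev {device} scope link table {table}",
        f"ip route add default via {gateway} dev {device} table {table}",
    ]


if __name__ == "__main__":
    interfaces = [
        ("192.168.0.37", "enp2s0", 1),
        ("192.168.0.43", "wlp3s0", 2),
    ]
    for address, device, table in interfaces:
        for command in source_routing_commands(
                address, device, "192.168.0.0/24", "192.168.0.1", table):
            print(command)
```

Printing the commands instead of executing them makes it possible to review them before running them with root privileges.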

We can check the routing tables using the ip command.

ip rule show
0:   from all lookup local
32764:       from 192.168.0.43 lookup 2
32765:       from 192.168.0.37 lookup 1
32766:       from all lookup main
32767:       from all lookup default

ip route
default via 192.168.0.1 dev enp2s0
default via 192.168.0.1 dev enp2s0 proto dhcp metric 100
default via 192.168.0.1 dev wlp3s0 proto dhcp metric 600
169.254.0.0/16 dev wlp3s0 scope link metric 1000
192.168.0.0/24 dev enp2s0 proto kernel scope link src 192.168.0.37 metric 100
192.168.0.0/24 dev wlp3s0 proto kernel scope link src 192.168.0.43 metric 600

ip route show table 1
default via 192.168.0.1 dev enp2s0
192.168.0.0/24 dev enp2s0 scope link

ip route show table 2
default via 192.168.0.1 dev wlp3s0
192.168.0.0/24 dev wlp3s0 scope link

We can verify that the two routing tables are correct using nc by forcing it to use a specific source address.

echo -e "GET / HTTP/1.0\r\n" | nc -4 -s 192.168.0.37 test.multipath-tcp.org 80
HTTP/1.0 200 OK
Content-Type: text/html
ETag: "4215149735"
Last-Modified: Tue, 05 Jul 2022 16:11:47 GMT
Content-Length: 389
Connection: close
Date: Wed, 06 Jul 2022 11:34:24 GMT
Server: lighttpd/1.4.59

<!DOCTYPE html>
<html>
<head>
<title>Welcome to test.multipath-tcp.org!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to test.multipath-tcp.org !</h1>
<p>This web server runs Multipath TCP v1</p>

<p><em>Thank you for using Multipath TCP.</em></p>
</body>
</html>

You should get the same result when using the second interface, IP address 192.168.0.43 in our example.

echo -e "GET / HTTP/1.0\r\n" | nc -4 -s 192.168.0.43 test.multipath-tcp.org 80

The next step is to verify that Multipath TCP is working correctly and that two subflows are created. For this, we'll use the -i parameter of nc to add a delay between the two lines of the HTTP GET. We will leverage this delay to check that MPTCP is working correctly using ss or tcpdump.

echo -e "GET / HTTP/1.0\r\n" | mptcpize run nc -4 -i 5 test.multipath-tcp.org 80

We can observe the creation of the connection and the subflow using both ss and tcpdump. ss shows that there are two subflows towards test.multipath-tcp.org.

ss -4 -iatM  dst test.multipath-tcp.org
Netid  State  Recv-Q  Send-Q         Local Address:Port      Peer Address:Port  Process
tcp    ESTAB  0       0        192.168.0.43%wlp3s0:34801     5.196.67.207:http
      cubic wscale:7,7 rto:220 rtt:17.439/8.719 mss:1448 pmtu:1500 rcvmss:536 advmss:1448 cwnd:10 bytes_acked:1 segs_out:2 segs_in:2 send 6.64Mbps lastsnd:1776 lastrcv:1776 lastack:1764 pacing_rate 13.3Mbps delivered:1 rcv_space:14480 rcv_ssthresh:64088 minrtt:17.439
tcp    ESTAB  0       0               192.168.0.37:47672     5.196.67.207:http
      cubic wscale:7,7 rto:216 rtt:14/5.405 mss:1448 pmtu:1500 rcvmss:536 advmss:1448 cwnd:10 bytes_sent:16 bytes_acked:17 segs_out:3 segs_in:3 data_segs_out:1 send 8.27Mbps lastsnd:1808 lastrcv:1808 lastack:1792 pacing_rate 16.5Mbps delivery_rate 790kbps delivered:2 busy:16ms rcv_space:14480 rcv_ssthresh:64088 minrtt:13.905
mptcp  ESTAB  0       0               192.168.0.37:47672     5.196.67.207:http
      subflows:1 subflows_max:8 remote_key token:e1e3cdeb write_seq:1045ecfa3f05f4ea snd_una:1045ecfa3f05f4ea rcv_nxt:d0568f430363c9aa

The line starting with mptcp indicates that the Multipath TCP connection above has one additional subflow.

The tcpdump output reveals precisely which packets have been sent over each network interface.

sudo tcpdump -n -i any host test.multipath-tcp.org and port 80
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
13:43:26.620667 enp2s0 Out IP 192.168.0.37.47672 > 5.196.67.207.80: Flags [S], seq 3585891423, win 64240, options [mss 1460,sackOK,TS val 3993892549 ecr 0,nop,wscale 7,mptcp capable v1], length 0
13:43:26.634537 enp2s0 In  IP 5.196.67.207.80 > 192.168.0.37.47672: Flags [S.], seq 1788691420, ack 3585891424, win 65160, options [mss 1460,sackOK,TS val 1030255900 ecr 3993892549,nop,wscale 7,mptcp capable v1 {0x54f04ad5bd2d9f42}], length 0
13:43:26.634609 enp2s0 Out IP 192.168.0.37.47672 > 5.196.67.207.80: Flags [.], ack 1, win 502, options [nop,nop,TS val 3993892563 ecr 1030255900,mptcp capable v1 {0xff2ec3a2f6151881,0x54f04ad5bd2d9f42}], length 0
13:43:26.634718 enp2s0 Out IP 192.168.0.37.47672 > 5.196.67.207.80: Flags [P.], seq 1:17, ack 1, win 502, options [nop,nop,TS val 3993892563 ecr 1030255900,mptcp capable v1 {0xff2ec3a2f6151881,0x54f04ad5bd2d9f42},nop,nop], length 16: HTTP: GET / HTTP/1.0
13:43:26.649351 enp2s0 In  IP 5.196.67.207.80 > 192.168.0.37.47672: Flags [.], ack 17, win 509, options [nop,nop,TS val 1030255916 ecr 3993892563,mptcp dss ack 1172603837543216362], length 0
13:43:26.649351 enp2s0 In  IP 5.196.67.207.80 > 192.168.0.37.47672: Flags [.], ack 17, win 509, options [nop,nop,TS val 1030255916 ecr 3993892563,mptcp dss ack 1172603837543216362], length 0
13:43:26.649498 wlp3s0 Out IP 192.168.0.43.34801 > 5.196.67.207.80: Flags [S], seq 2321572505, win 64240, options [mss 1460,sackOK,TS val 1218002018 ecr 0,nop,wscale 7,mptcp join id 8 token 0xeef7df2f nonce 0xc0d346f6], length 0
13:43:26.666895 wlp3s0 In  IP 5.196.67.207.80 > 192.168.0.43.34801: Flags [S.], seq 1973196884, ack 2321572506, win 65160, options [mss 1460,sackOK,TS val 1030255931 ecr 1218002018,nop,wscale 7,mptcp join id 0 hmac 0xc7489cf7056428b4 nonce 0xa54f9af], length 0
13:43:26.666966 wlp3s0 Out IP 192.168.0.43.34801 > 5.196.67.207.80: Flags [.], ack 1, win 502, options [nop,nop,TS val 1218002035 ecr 1030255931,mptcp join hmac 0xb4e6a41bf5861313df7f5f454966998ad7e698a4], length 0
13:43:26.677776 wlp3s0 In  IP 5.196.67.207.80 > 192.168.0.43.34801: Flags [.], ack 1, win 510, options [nop,nop,TS val 1030255944 ecr 1218002035,mptcp dss ack 1172603837543216362], length 0
13:43:31.635023 enp2s0 Out IP 192.168.0.37.47672 > 5.196.67.207.80: Flags [P.], seq 17:18, ack 1, win 502, options [nop,nop,TS val 3993897563 ecr 1030255916,mptcp dss ack 56871338 seq 1172603837543216362 subseq 17 len 1,nop,nop], length 1: HTTP
13:43:31.646703 enp2s0 In  IP 5.196.67.207.80 > 192.168.0.37.47672: Flags [.], ack 18, win 509, options [nop,nop,TS val 1030260913 ecr 3993897563,mptcp dss ack 1172603837543216363], length 0
13:43:31.647276 enp2s0 In  IP 5.196.67.207.80 > 192.168.0.37.47672: Flags [P.], seq 1:602, ack 18, win 509, options [nop,nop,TS val 1030260914 ecr 3993897563,mptcp dss ack 1172603837543216363 seq 15012343925868579242 subseq 1 len 601,nop,nop], length 601: HTTP: HTTP/1.0 200 OK
13:43:31.647300 enp2s0 Out IP 192.168.0.37.47672 > 5.196.67.207.80: Flags [.], ack 602, win 501, options [nop,nop,TS val 3993897576 ecr 1030260914,mptcp dss ack 15012343925868579843], length 0
13:43:31.647276 enp2s0 In  IP 5.196.67.207.80 > 192.168.0.37.47672: Flags [.], ack 18, win 509, options [nop,nop,TS val 1030260914 ecr 3993897563,mptcp dss fin ack 1172603837543216363 seq 15012343925868579843 subseq 0 len 1,nop,nop], length 0
13:43:31.647321 enp2s0 Out IP 192.168.0.37.47672 > 5.196.67.207.80: Flags [.], ack 602, win 501, options [nop,nop,TS val 3993897576 ecr 1030260914,mptcp dss ack 15012343925868579844], length 0
13:43:31.647330 wlp3s0 Out IP 192.168.0.43.34801 > 5.196.67.207.80: Flags [.], ack 1, win 502, options [nop,nop,TS val 1218002046 ecr 1030255944,mptcp dss ack 15012343925868579844], length 0
13:43:31.648565 wlp3s0 In  IP 5.196.67.207.80 > 192.168.0.43.34801: Flags [.], ack 1, win 510, options [nop,nop,TS val 1030255944 ecr 1218002035,mptcp dss fin ack 1172603837543216363 seq 15012343925868579843 subseq 0 len 1,nop,nop], length 0
13:43:36.635392 enp2s0 Out IP 192.168.0.37.47672 > 5.196.67.207.80: Flags [.], ack 602, win 501, options [nop,nop,TS val 3993897576 ecr 1030260914,mptcp dss fin ack 15012343925868579844 seq 1172603837543216363 subseq 0 len 1,nop,nop], length 0
13:43:36.635416 wlp3s0 Out IP 192.168.0.43.34801 > 5.196.67.207.80: Flags [.], ack 1, win 502, options [nop,nop,TS val 1218007017 ecr 1030255944,mptcp dss fin ack 15012343925868579844 seq 1172603837543216363 subseq 0 len 1,nop,nop], length 0
13:43:36.636468 enp2s0 Out IP 192.168.0.37.47672 > 5.196.67.207.80: Flags [.], ack 602, win 501, options [nop,nop,TS val 3993897576 ecr 1030260914,mptcp dss fin ack 15012343925868579844 seq 1172603837543216363 subseq 0 len 1,nop,nop], length 0
13:43:36.636482 wlp3s0 Out IP 192.168.0.43.34801 > 5.196.67.207.80: Flags [.], ack 1, win 502, options [nop,nop,TS val 1218007017 ecr 1030255944,mptcp dss fin ack 15012343925868579844 seq 1172603837543216363 subseq 0 len 1,nop,nop], length 0
13:43:36.640425 enp2s0 Out IP 192.168.0.37.47672 > 5.196.67.207.80: Flags [.], ack 602, win 501, options [nop,nop,TS val 3993897576 ecr 1030260914,mptcp dss fin ack 15012343925868579844 seq 1172603837543216363 subseq 0 len 1,nop,nop], length 0
13:43:36.640431 wlp3s0 Out IP 192.168.0.43.34801 > 5.196.67.207.80: Flags [.], ack 1, win 502, options [nop,nop,TS val 1218007017 ecr 1030255944,mptcp dss fin ack 15012343925868579844 seq 1172603837543216363 subseq 0 len 1,nop,nop], length 0
13:43:36.645605 enp2s0 In  IP 5.196.67.207.80 > 192.168.0.37.47672: Flags [.], ack 18, win 509, options [nop,nop,TS val 1030265912 ecr 3993897576,mptcp dss ack 1172603837543216364], length 0
13:43:36.645659 enp2s0 Out IP 192.168.0.37.47672 > 5.196.67.207.80: Flags [F.], seq 18, ack 602, win 501, options [nop,nop,TS val 3993902574 ecr 1030265912,mptcp dss ack 15012343925868579844], length 0
13:43:36.645674 wlp3s0 Out IP 192.168.0.43.34801 > 5.196.67.207.80: Flags [F.], seq 1, ack 1, win 502, options [nop,nop,TS val 1218012014 ecr 1030255944,mptcp dss ack 15012343925868579844], length 0
13:43:36.646315 wlp3s0 In  IP 5.196.67.207.80 > 192.168.0.43.34801: Flags [.], ack 1, win 510, options [nop,nop,TS val 1030260930 ecr 1218002046,mptcp dss ack 1172603837543216364], length 0
13:43:36.647699 enp2s0 In  IP 5.196.67.207.80 > 192.168.0.37.47672: Flags [F.], seq 602, ack 18, win 509, options [nop,nop,TS val 1030265912 ecr 3993897576,mptcp dss ack 1172603837543216364], length 0
13:43:36.647718 enp2s0 Out IP 192.168.0.37.47672 > 5.196.67.207.80: Flags [.], ack 603, win 501, options [nop,nop,TS val 3993902576 ecr 1030265912,mptcp dss ack 15012343925868579844], length 0
13:43:36.648629 wlp3s0 In  IP 5.196.67.207.80 > 192.168.0.43.34801: Flags [F.], seq 1, ack 1, win 510, options [nop,nop,TS val 1030265912 ecr 1218002046,mptcp dss ack 1172603837543216364], length 0
13:43:36.648649 wlp3s0 Out IP 192.168.0.43.34801 > 5.196.67.207.80: Flags [.], ack 2, win 502, options [nop,nop,TS val 1218012017 ecr 1030265912,mptcp dss ack 15012343925868579844], length 0
13:43:36.657040 enp2s0 In  IP 5.196.67.207.80 > 192.168.0.37.47672: Flags [.], ack 19, win 509, options [nop,nop,TS val 1030265923 ecr 3993902574,mptcp dss ack 1172603837543216364], length 0
13:43:36.662211 wlp3s0 In  IP 5.196.67.207.80 > 192.168.0.43.34801: Flags [.], ack 2, win 510, options [nop,nop,TS val 1030265928 ecr 1218012014,mptcp dss ack 1172603837543216364], length 0
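Two small helpers make such a trace easier to read. The sketch below counts the packets seen on each interface and converts the decimal data sequence numbers printed by tcpdump into the hexadecimal notation used by ss, so that the two outputs can be compared. The function names are hypothetical, the regular expression assumes the line format produced by tcpdump -i any shown above, and the sample lines are abbreviated copies of lines from the trace.

```python
import re
from collections import Counter

# Timestamp, interface name and direction as printed by "tcpdump -i any".
IFACE_RE = re.compile(r"^\S+ (\S+)\s+(In|Out)\s+IP ")


def packets_per_interface(lines):
    """Count the captured packets per (interface, direction) pair."""
    counts = Counter()
    for line in lines:
        match = IFACE_RE.match(line)
        if match:
            counts[match.groups()] += 1
    return counts


def dsn_to_hex(value):
    """Format a decimal data sequence number like the hex fields of ss."""
    return format(int(value), "016x")


if __name__ == "__main__":
    sample = [
        "13:43:26.620667 enp2s0 Out IP 192.168.0.37.47672 > 5.196.67.207.80: Flags [S]",
        "13:43:26.634537 enp2s0 In  IP 5.196.67.207.80 > 192.168.0.37.47672: Flags [S.]",
        "13:43:26.649498 wlp3s0 Out IP 192.168.0.43.34801 > 5.196.67.207.80: Flags [S]",
    ]
    for (iface, direction), count in sorted(packets_per_interface(sample).items()):
        print(iface, direction, count)
    # Sequence number of the first data sent by the server in the trace.
    print(dsn_to_hex("15012343925868579242"))   # d0568f430363c9aa
    # Sequence number of the second data segment sent by the client.
    print(dsn_to_hex("1172603837543216362"))    # 1045ecfa3f05f4ea
```

The two converted values are the rcv_nxt and write_seq fields reported by ss for this connection while nc was waiting between the two lines of the request.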

If your host is dual stack, you need to apply the same configuration for IPv6. Our test host uses the following IPv6 addresses.

ip -6 addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 state UNKNOWN qlen 1000
    inet6 ::1/128 scope host
    valid_lft forever preferred_lft forever
2: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 2a02:2788:10c4:123:f468:1851:9a9f:7d44/64 scope global temporary dynamic
    valid_lft 592298sec preferred_lft 73422sec
    inet6 2a02:2788:10c4:123:6636:10c6:692b:18cc/64 scope global dynamic mngtmpaddr noprefixroute
    valid_lft 1209600sec preferred_lft 604800sec
    inet6 fe80::5642:39bd:3390:43d3/64 scope link noprefixroute
    valid_lft forever preferred_lft forever
3: wlp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 2a02:2788:10c4:123:3d66:f590:d891:8fb3/64 scope global dynamic noprefixroute
    valid_lft 1209600sec preferred_lft 604800sec
    inet6 fe80::3934:7572:b1ff:b555/64 scope link noprefixroute
    valid_lft forever preferred_lft forever

We thus had to configure the following IPv6 routing tables. This is similar to the commands used to configure the IPv4 routing tables.

ip -6 rule add from 2a02:2788:10c4:123:6636:10c6:692b:18cc table 1
ip -6 rule add from 2a02:2788:10c4:123:3d66:f590:d891:8fb3 table 2
ip route add 2a02:2788:10c4:123::/64 dev enp2s0 scope link table 1
ip route add 2a02:2788:10c4:123::/64 dev wlp3s0 scope link table 2
ip route add default via fe80::10:18ff:fe07:fc33 dev enp2s0 table 1
ip route add default via fe80::10:18ff:fe07:fc33 dev wlp3s0 table 2
ip route add default scope global nexthop via fe80::10:18ff:fe07:fc33 dev enp2s0

Remember that if you want to create subflows using IPv6 addresses, you also need to configure the stack with ip mptcp endpoint add as you did for the IPv4 addresses.

Note

The current versions of the Linux kernel only use one address family at a time. If a connection was created using IPv4, then only IPv4 addresses will be used to create new subflows. Future versions of the kernel will allow mixing IPv4 and IPv6 subflows.
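The consequence of this note can be illustrated with a few lines of Python. The sketch below is a model of the behavior described in the note, not the kernel's path manager code: given the address used by the initial subflow, it keeps only the endpoints of the same address family. The function name same_family_endpoints is hypothetical.

```python
import ipaddress


def same_family_endpoints(initial_address, endpoints):
    """Return the endpoints that share the address family (IPv4 or IPv6)
    of the address used by the initial subflow."""
    version = ipaddress.ip_address(initial_address).version
    return [e for e in endpoints if ipaddress.ip_address(e).version == version]


if __name__ == "__main__":
    endpoints = [
        "192.168.0.43",
        "2a02:2788:10c4:123:3d66:f590:d891:8fb3",
    ]
    # Connection created over IPv4 from the Ethernet interface.
    print(same_family_endpoints("192.168.0.37", endpoints))
    # Connection created over IPv6 from the Ethernet interface.
    print(same_family_endpoints("2a02:2788:10c4:123:6636:10c6:692b:18cc", endpoints))
```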

Analyzing the output of ss

Analyzing the output of nstat

The Linux TCP/IP stack maintains a lot of counters that track various events inside the kernel. These counters are very useful for system administrators who need to manage Linux hosts and debug some specific networking problems.

Linux supports a few hundred counters associated with the protocols in the network and transport layers. Other operating systems have defined their own counters to track similar networking events. Fortunately, the IETF has standardized some counters that are common to different operating systems and TCP/IP implementations. These counters are exported as variables which can be queried using a management protocol such as SNMP. This enables a management server to collect statistics from a series of hosts in order to process and analyze them. Several versions of SNMP exist, but we will not discuss them in detail in this document. Instead, we focus on the Linux TCP/IP implementation and explain the different counters that the nstat application exposes to the user.

Linux kernel version 5.18 collects 363 different counters that are divided into 7 categories:

  • 67 counters track the IPv4 implementation

  • 80 counters track the ICMPv4 implementation

  • 32 counters track the IPv6 implementation

  • 46 counters track ICMPv6

  • 135 counters track TCP

  • 35 counters track UDP

  • 46 counters track Multipath TCP

Some of these counters are part of the Management Information Bases (MIB) defined within the IETF, e.g. RFC 1213 for IPv4 and ICMPv4, RFC 4293 for IPv6 and ICMPv6, RFC 4022 for TCP, RFC 4113 for UDP. As of this writing, there is no official IETF MIB for Multipath TCP.

Using nstat

In this document, we describe the counters that are exposed by nstat for the different protocols of the TCP/IP stack. Before discussing these counters, it is useful to understand how nstat works.

nstat is a command line tool that supports a small number of arguments:

nstat --help
Usage: nstat [OPTION] [ PATTERN [ PATTERN ] ]
       -h, --help            this message
       -a, --ignore          ignore history
       -d, --scan=SECS       sample every statistics every SECS
       -j, --json            format output in JSON
       -n, --nooutput        do history only
       -p, --pretty          pretty print
       -r, --reset           reset history
       -s, --noupdate        don't update history
       -t, --interval=SECS   report average over the last SECS
       -V, --version         output version information
       -z, --zeros           show entries with zero activity

By default, nstat displays the counters whose value has changed since the latest invocation of the tool. This is usually a small subset of the counters that depends on the networking activity of the host.
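The idea of reporting only the counters that changed can be reproduced on two snapshots of the counters. The sketch below parses the text printed by nstat (a #kernel header followed by lines containing a counter name, a value and a rate) and computes the difference between two snapshots. The function names are hypothetical, and the values in the example are made up for illustration, not measurements.

```python
def parse_nstat(text):
    """Parse the text output of nstat into a {name: value} dictionary."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip empty lines and the "#kernel" header.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        counters[fields[0]] = int(fields[1])
    return counters


def changed_counters(before, after):
    """Return the increase of each counter whose value differs."""
    return {
        name: value - before.get(name, 0)
        for name, value in after.items()
        if value != before.get(name, 0)
    }


if __name__ == "__main__":
    # Illustrative snapshots; the values are invented for this example.
    first = parse_nstat("#kernel\nIpInReceives 100 0.0\nTcpInSegs 40 0.0\n")
    second = parse_nstat("#kernel\nIpInReceives 130 0.0\nTcpInSegs 40 0.0\n")
    print(changed_counters(first, second))   # {'IpInReceives': 30}
```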

nstat can also keep a history of the counters and report averages over an interval.

nstat can also list the current value of the different counters.

#nstat -az
       #kernel
       IpInReceives                    1073367            0.0
       IpInHdrErrors                   0                  0.0
       IpInAddrErrors                  0                  0.0
       IpForwDatagrams                 0                  0.0
       IpInUnknownProtos               0                  0.0
       IpInDiscards                    0                  0.0
       IpInDelivers                    1072518            0.0
       IpOutRequests                   484889             0.0
       IpOutDiscards                   0                  0.0
       IpOutNoRoutes                   0                  0.0
       IpReasmTimeout                  0                  0.0
       IpReasmReqds                    0                  0.0
       IpReasmOKs                      0                  0.0
       IpReasmFails                    0                  0.0
       IpFragOKs                       0                  0.0
       IpFragFails                     0                  0.0
       IpFragCreates                   0                  0.0
       IcmpInMsgs                      561                0.0
       IcmpInErrors                    125                0.0
       IcmpInCsumErrors                0                  0.0
       IcmpInDestUnreachs              6                  0.0
       IcmpInTimeExcds                 125                0.0
       IcmpInParmProbs                 0                  0.0
       IcmpInSrcQuenchs                0                  0.0
       IcmpInRedirects                 0                  0.0
       IcmpInEchos                     298                0.0
       IcmpInEchoReps                  0                  0.0
       IcmpInTimestamps                33                 0.0
       IcmpInTimestampReps             0                  0.0
       IcmpInAddrMasks                 99                 0.0
       IcmpInAddrMaskReps              0                  0.0
       IcmpOutMsgs                     331                0.0
       IcmpOutErrors                   0                  0.0
       IcmpOutDestUnreachs             0                  0.0
       IcmpOutTimeExcds                0                  0.0
       IcmpOutParmProbs                0                  0.0
       IcmpOutSrcQuenchs               0                  0.0
       IcmpOutRedirects                0                  0.0
       IcmpOutEchos                    0                  0.0
       IcmpOutEchoReps                 298                0.0
       IcmpOutTimestamps               0                  0.0
       IcmpOutTimestampReps            33                 0.0
       IcmpOutAddrMasks                0                  0.0
       IcmpOutAddrMaskReps             0                  0.0
       IcmpMsgInType3                  6                  0.0
       IcmpMsgInType8                  298                0.0
       IcmpMsgInType11                 125                0.0
       IcmpMsgInType13                 33                 0.0
       IcmpMsgInType17                 99                 0.0
       IcmpMsgOutType0                 298                0.0
       IcmpMsgOutType14                33                 0.0
       TcpActiveOpens                  3330               0.0
       TcpPassiveOpens                 252                0.0
       TcpAttemptFails                 0                  0.0
       TcpEstabResets                  78                 0.0
       TcpInSegs                       3202615            0.0
       TcpOutSegs                      6431616            0.0
       TcpRetransSegs                  7584               0.0
       TcpInErrs                       0                  0.0
       TcpOutRsts                      102                0.0
       TcpInCsumErrors                 0                  0.0
       UdpInDatagrams                  18972              0.0
       UdpNoPorts                      0                  0.0
       UdpInErrors                     0                  0.0
       UdpOutDatagrams                 19257              0.0
       UdpRcvbufErrors                 0                  0.0
       UdpSndbufErrors                 0                  0.0
       UdpInCsumErrors                 0                  0.0
       UdpIgnoredMulti                 19989              0.0
       UdpMemErrors                    0                  0.0
       UdpLiteInDatagrams              0                  0.0
       UdpLiteNoPorts                  0                  0.0
       UdpLiteInErrors                 0                  0.0
       UdpLiteOutDatagrams             0                  0.0
       UdpLiteRcvbufErrors             0                  0.0
       UdpLiteSndbufErrors             0                  0.0
       UdpLiteInCsumErrors             0                  0.0
       UdpLiteIgnoredMulti             0                  0.0
       UdpLiteMemErrors                0                  0.0
       Ip6InReceives                   2198489            0.0
       Ip6InHdrErrors                  0                  0.0
       Ip6InTooBigErrors               0                  0.0
       Ip6InNoRoutes                   200                0.0
       Ip6InAddrErrors                 0                  0.0
       Ip6InUnknownProtos              0                  0.0
       Ip6InTruncatedPkts              0                  0.0
       Ip6InDiscards                   0                  0.0
       Ip6InDelivers                   2177604            0.0
       Ip6OutForwDatagrams             0                  0.0
       Ip6OutRequests                  1567967            0.0
       Ip6OutDiscards                  0                  0.0
       Ip6OutNoRoutes                  6                  0.0
       Ip6ReasmTimeout                 0                  0.0
       Ip6ReasmReqds                   0                  0.0
       Ip6ReasmOKs                     0                  0.0
       Ip6ReasmFails                   0                  0.0
       Ip6FragOKs                      0                  0.0
       Ip6FragFails                    0                  0.0
       Ip6FragCreates                  0                  0.0
       Ip6InMcastPkts                  20785              0.0
       Ip6OutMcastPkts                 13                 0.0
       Ip6InOctets                     2578707266         0.0
       Ip6OutOctets                    3533261025         0.0
       Ip6InMcastOctets                1442288            0.0
       Ip6OutMcastOctets               1252               0.0
       Ip6InBcastOctets                0                  0.0
       Ip6OutBcastOctets               0                  0.0
       Ip6InNoECTPkts                  2060704            0.0
       Ip6InECT1Pkts                   0                  0.0
       Ip6InECT0Pkts                   137799             0.0
       Ip6InCEPkts                     0                  0.0
       Icmp6InMsgs                     7525               0.0
       Icmp6InErrors                   0                  0.0
       Icmp6OutMsgs                    7511               0.0
       Icmp6OutErrors                  0                  0.0
       Icmp6InCsumErrors               0                  0.0
       Icmp6InDestUnreachs             10                 0.0
       Icmp6InPktTooBigs               0                  0.0
       Icmp6InTimeExcds                0                  0.0
       Icmp6InParmProblems             0                  0.0
       Icmp6InEchos                    2                  0.0
       Icmp6InEchoReplies              6                  0.0
       Icmp6InGroupMembQueries         0                  0.0
       Icmp6InGroupMembResponses       0                  0.0
       Icmp6InGroupMembReductions      0                  0.0
       Icmp6InRouterSolicits           0                  0.0
       Icmp6InRouterAdvertisements     0                  0.0
       Icmp6InNeighborSolicits         4316               0.0
       Icmp6InNeighborAdvertisements   3189               0.0
       Icmp6InRedirects                0                  0.0
       Icmp6InMLDv2Reports             2                  0.0
       Icmp6OutDestUnreachs            0                  0.0
       Icmp6OutPktTooBigs              0                  0.0
       Icmp6OutTimeExcds               0                  0.0
       Icmp6OutParmProblems            0                  0.0
       Icmp6OutEchos                   6                  0.0
       Icmp6OutEchoReplies             2                  0.0
       Icmp6OutGroupMembQueries        0                  0.0
       Icmp6OutGroupMembResponses      0                  0.0
       Icmp6OutGroupMembReductions     0                  0.0
       Icmp6OutRouterSolicits          0                  0.0
       Icmp6OutRouterAdvertisements    0                  0.0
       Icmp6OutNeighborSolicits        3179               0.0
       Icmp6OutNeighborAdvertisements  4316               0.0
       Icmp6OutRedirects               0                  0.0
       Icmp6OutMLDv2Reports            8                  0.0
       Icmp6InType1                    10                 0.0
       Icmp6InType128                  2                  0.0
       Icmp6InType129                  6                  0.0
       Icmp6InType135                  4316               0.0
       Icmp6InType136                  3189               0.0
       Icmp6InType143                  2                  0.0
       Icmp6OutType128                 6                  0.0
       Icmp6OutType129                 2                  0.0
       Icmp6OutType135                 3179               0.0
       Icmp6OutType136                 4316               0.0
       Icmp6OutType143                 8                  0.0
       Udp6InDatagrams                 460                0.0
       Udp6NoPorts                     0                  0.0
       Udp6InErrors                    0                  0.0
       Udp6OutDatagrams                95                 0.0
       Udp6RcvbufErrors                0                  0.0
       Udp6SndbufErrors                0                  0.0
       Udp6InCsumErrors                0                  0.0
       Udp6IgnoredMulti                0                  0.0
       Udp6MemErrors                   0                  0.0
       UdpLite6InDatagrams             0                  0.0
       UdpLite6NoPorts                 0                  0.0
       UdpLite6InErrors                0                  0.0
       UdpLite6OutDatagrams            0                  0.0
       UdpLite6RcvbufErrors            0                  0.0
       UdpLite6SndbufErrors            0                  0.0
       UdpLite6InCsumErrors            0                  0.0
       UdpLite6MemErrors               0                  0.0
       TcpExtSyncookiesSent            0                  0.0
       TcpExtSyncookiesRecv            0                  0.0
       TcpExtSyncookiesFailed          0                  0.0
       TcpExtEmbryonicRsts             0                  0.0
       TcpExtPruneCalled               3791               0.0
       TcpExtRcvPruned                 0                  0.0
       TcpExtOfoPruned                 0                  0.0
       TcpExtOutOfWindowIcmps          0                  0.0
       TcpExtLockDroppedIcmps          0                  0.0
       TcpExtArpFilter                 0                  0.0
       TcpExtTW                        2283               0.0
       TcpExtTWRecycled                0                  0.0
       TcpExtTWKilled                  0                  0.0
       TcpExtPAWSActive                0                  0.0
       TcpExtPAWSEstab                 11                 0.0
       TcpExtDelayedACKs               31995              0.0
       TcpExtDelayedACKLocked          47                 0.0
       TcpExtDelayedACKLost            282                0.0
       TcpExtListenOverflows           0                  0.0
       TcpExtListenDrops               0                  0.0
       TcpExtTCPHPHits                 699069             0.0
       TcpExtTCPPureAcks               997468             0.0
       TcpExtTCPHPAcks                 1235546            0.0
       TcpExtTCPRenoRecovery           0                  0.0
       TcpExtTCPSackRecovery           2526               0.0
       TcpExtTCPSACKReneging           0                  0.0
       TcpExtTCPSACKReorder            36858              0.0
       TcpExtTCPRenoReorder            0                  0.0
       TcpExtTCPTSReorder              85                 0.0
       TcpExtTCPFullUndo               1                  0.0
       TcpExtTCPPartialUndo            67                 0.0
       TcpExtTCPDSACKUndo              11                 0.0
       TcpExtTCPLossUndo               0                  0.0
       TcpExtTCPLostRetransmit         184                0.0
       TcpExtTCPRenoFailures           0                  0.0
       TcpExtTCPSackFailures           0                  0.0
       TcpExtTCPLossFailures           0                  0.0
       TcpExtTCPFastRetrans            7084               0.0
       TcpExtTCPSlowStartRetrans       0                  0.0
       TcpExtTCPTimeouts               168                0.0
       TcpExtTCPLossProbes             345                0.0
       TcpExtTCPLossProbeRecovery      82                 0.0
       TcpExtTCPRenoRecoveryFail       0                  0.0
       TcpExtTCPSackRecoveryFail       0                  0.0
       TcpExtTCPRcvCollapsed           0                  0.0
       TcpExtTCPBacklogCoalesce        10938              0.0
       TcpExtTCPDSACKOldSent           300                0.0
       TcpExtTCPDSACKOfoSent           49                 0.0
       TcpExtTCPDSACKRecv              317                0.0
       TcpExtTCPDSACKOfoRecv           2                  0.0
       TcpExtTCPAbortOnData            25                 0.0
       TcpExtTCPAbortOnClose           54                 0.0
       TcpExtTCPAbortOnMemory          0                  0.0
       TcpExtTCPAbortOnTimeout         4                  0.0
       TcpExtTCPAbortOnLinger          0                  0.0
       TcpExtTCPAbortFailed            0                  0.0
       TcpExtTCPMemoryPressures        0                  0.0
       TcpExtTCPMemoryPressuresChrono  0                  0.0
       TcpExtTCPSACKDiscard            0                  0.0
       TcpExtTCPDSACKIgnoredOld        2                  0.0
       TcpExtTCPDSACKIgnoredNoUndo     272                0.0
       TcpExtTCPSpuriousRTOs           0                  0.0
       TcpExtTCPMD5NotFound            0                  0.0
       TcpExtTCPMD5Unexpected          0                  0.0
       TcpExtTCPMD5Failure             0                  0.0
       TcpExtTCPSackShifted            34290              0.0
       TcpExtTCPSackMerged             11301              0.0
       TcpExtTCPSackShiftFallback      40480              0.0
       TcpExtTCPBacklogDrop            0                  0.0
       TcpExtPFMemallocDrop            0                  0.0
       TcpExtTCPMinTTLDrop             0                  0.0
       TcpExtTCPDeferAcceptDrop        0                  0.0
       TcpExtIPReversePathFilter       0                  0.0
       TcpExtTCPTimeWaitOverflow       0                  0.0
       TcpExtTCPReqQFullDoCookies      0                  0.0
       TcpExtTCPReqQFullDrop           0                  0.0
       TcpExtTCPRetransFail            0                  0.0
       TcpExtTCPRcvCoalesce            100585             0.0
       TcpExtTCPOFOQueue               15954              0.0
       TcpExtTCPOFODrop                0                  0.0
       TcpExtTCPOFOMerge               38                 0.0
       TcpExtTCPChallengeACK           0                  0.0
       TcpExtTCPSYNChallenge           0                  0.0
       TcpExtTCPFastOpenActive         0                  0.0
       TcpExtTCPFastOpenActiveFail     0                  0.0
       TcpExtTCPFastOpenPassive        0                  0.0
       TcpExtTCPFastOpenPassiveFail    0                  0.0
       TcpExtTCPFastOpenListenOverflow 0                  0.0
       TcpExtTCPFastOpenCookieReqd     0                  0.0
       TcpExtTCPFastOpenBlackhole      0                  0.0
       TcpExtTCPSpuriousRtxHostQueues  0                  0.0
       TcpExtBusyPollRxPackets         0                  0.0
       TcpExtTCPAutoCorking            73847              0.0
       TcpExtTCPFromZeroWindowAdv      40                 0.0
       TcpExtTCPToZeroWindowAdv        40                 0.0
       TcpExtTCPWantZeroWindowAdv      2870               0.0
       TcpExtTCPSynRetrans             91                 0.0
       TcpExtTCPOrigDataSent           5948573            0.0
       TcpExtTCPHystartTrainDetect     34                 0.0
       TcpExtTCPHystartTrainCwnd       1880               0.0
       TcpExtTCPHystartDelayDetect     3                  0.0
       TcpExtTCPHystartDelayCwnd       261                0.0
       TcpExtTCPACKSkippedSynRecv      0                  0.0
       TcpExtTCPACKSkippedPAWS         9                  0.0
       TcpExtTCPACKSkippedSeq          11                 0.0
       TcpExtTCPACKSkippedFinWait2     0                  0.0
       TcpExtTCPACKSkippedTimeWait     0                  0.0
       TcpExtTCPACKSkippedChallenge    0                  0.0
       TcpExtTCPWinProbe               0                  0.0
       TcpExtTCPKeepAlive              67                 0.0
       TcpExtTCPMTUPFail               0                  0.0
       TcpExtTCPMTUPSuccess            0                  0.0
       TcpExtTCPDelivered              5951000            0.0
       TcpExtTCPDeliveredCE            0                  0.0
       TcpExtTCPAckCompressed          3021               0.0
       TcpExtTCPZeroWindowDrop         0                  0.0
       TcpExtTCPRcvQDrop               0                  0.0
       TcpExtTCPWqueueTooBig           0                  0.0
       TcpExtTCPFastOpenPassiveAltKey  0                  0.0
       TcpExtTcpTimeoutRehash          72                 0.0
       TcpExtTcpDuplicateDataRehash    0                  0.0
       TcpExtTCPDSACKRecvSegs          371                0.0
       TcpExtTCPDSACKIgnoredDubious    0                  0.0
       TcpExtTCPMigrateReqSuccess      0                  0.0
       TcpExtTCPMigrateReqFailure      0                  0.0
       IpExtInNoRoutes                 0                  0.0
       IpExtInTruncatedPkts            0                  0.0
       IpExtInMcastPkts                62                 0.0
       IpExtOutMcastPkts               24                 0.0
       IpExtInBcastPkts                19989              0.0
       IpExtOutBcastPkts               0                  0.0
       IpExtInOctets                   533061309          0.0
       IpExtOutOctets                  5153892360         0.0
       IpExtInMcastOctets              7448               0.0
       IpExtOutMcastOctets             3592               0.0
       IpExtInBcastOctets              2082276            0.0
       IpExtOutBcastOctets             0                  0.0
       IpExtInCsumErrors               0                  0.0
       IpExtInNoECTPkts                1073527            0.0
       IpExtInECT1Pkts                 0                  0.0
       IpExtInECT0Pkts                 0                  0.0
       IpExtInCEPkts                   0                  0.0
       IpExtReasmOverlaps              0                  0.0
       MPTcpExtMPCapableSYNRX          0                  0.0
       MPTcpExtMPCapableSYNTX          2203               0.0
       MPTcpExtMPCapableSYNACKRX       2172               0.0
       MPTcpExtMPCapableACKRX          0                  0.0
       MPTcpExtMPCapableFallbackACK    0                  0.0
       MPTcpExtMPCapableFallbackSYNACK 22                 0.0
       MPTcpExtMPFallbackTokenInit     0                  0.0
       MPTcpExtMPTCPRetrans            0                  0.0
       MPTcpExtMPJoinNoTokenFound      0                  0.0
       MPTcpExtMPJoinSynRx             0                  0.0
       MPTcpExtMPJoinSynAckRx          0                  0.0
       MPTcpExtMPJoinSynAckHMacFailure 0                  0.0
       MPTcpExtMPJoinAckRx             0                  0.0
       MPTcpExtMPJoinAckHMacFailure    0                  0.0
       MPTcpExtDSSNotMatching          0                  0.0
       MPTcpExtInfiniteMapRx           0                  0.0
       MPTcpExtDSSNoMatchTCP           0                  0.0
       MPTcpExtDataCsumErr             0                  0.0
       MPTcpExtOFOQueueTail            0                  0.0
       MPTcpExtOFOQueue                0                  0.0
       MPTcpExtOFOMerge                0                  0.0
       MPTcpExtNoDSSInWindow           0                  0.0
       MPTcpExtDuplicateData           0                  0.0
       MPTcpExtAddAddr                 0                  0.0
       MPTcpExtEchoAdd                 0                  0.0
       MPTcpExtPortAdd                 0                  0.0
       MPTcpExtAddAddrDrop             0                  0.0
       MPTcpExtMPJoinPortSynRx         0                  0.0
       MPTcpExtMPJoinPortSynAckRx      0                  0.0
       MPTcpExtMPJoinPortAckRx         0                  0.0
       MPTcpExtMismatchPortSynRx       0                  0.0
       MPTcpExtMismatchPortAckRx       0                  0.0
       MPTcpExtRmAddr                  0                  0.0
       MPTcpExtRmAddrDrop              0                  0.0
       MPTcpExtRmSubflow               0                  0.0
       MPTcpExtMPPrioTx                0                  0.0
       MPTcpExtMPPrioRx                0                  0.0
       MPTcpExtMPFailTx                0                  0.0
       MPTcpExtMPFailRx                0                  0.0
       MPTcpExtMPFastcloseTx           0                  0.0
       MPTcpExtMPFastcloseRx           0                  0.0
       MPTcpExtMPRstTx                 17                 0.0
       MPTcpExtMPRstRx                 0                  0.0
       MPTcpExtRcvPruned               0                  0.0
       MPTcpExtSubflowStale            0                  0.0
       MPTcpExtSubflowRecover          0                  0.0

Among all these variables, the ones named *Ext* are Linux-specific variables that are not defined in IETF MIBs. The others are usually defined in an IETF RFC. The counters maintained by the Linux kernel are defined in include/uapi/linux/snmp.h and, for the Multipath TCP counters, in net/mptcp/mib.h. Each of the counters exposed by nstat corresponds to one specific identifier in the Linux kernel. For example, the beginning of the IP part of the counters is defined as follows:

enum
{
     IPSTATS_MIB_NUM = 0,
     /* frequently written fields in fast path, kept in same cache line */
     IPSTATS_MIB_INPKTS,                     /* InReceives */
     IPSTATS_MIB_INOCTETS,                   /* InOctets */
     IPSTATS_MIB_INDELIVERS,                 /* InDelivers */
     IPSTATS_MIB_OUTFORWDATAGRAMS,           /* OutForwDatagrams */
     IPSTATS_MIB_OUTPKTS,                    /* OutRequests */
     IPSTATS_MIB_OUTOCTETS,                  /* OutOctets */
     /* other fields */
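
The *Ext* naming convention mentioned above can be applied mechanically to a counter dump. The following is a minimal Python sketch, assuming the three-column text layout (name, value, rate) of the output shown earlier. The function names parse_nstat and split_linux_specific are illustrative and are not part of nstat or of the kernel.

```python
def parse_nstat(text):
    """Parse nstat-style text (name, value, rate per line) into a dict."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, header/comment lines and anything malformed.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        try:
            counters[fields[0]] = int(fields[1])
        except ValueError:
            continue
    return counters


def split_linux_specific(counters):
    """Separate counters whose name contains 'Ext' from the others."""
    linux_only = {k: v for k, v in counters.items() if "Ext" in k}
    standard = {k: v for k, v in counters.items() if "Ext" not in k}
    return linux_only, standard


# A few lines copied from the output above, preceded by a header line.
sample = """\
#kernel
IpInDelivers                    1072518            0.0
TcpActiveOpens                  3330               0.0
TcpExtTCPTimeouts               168                0.0
MPTcpExtMPCapableSYNTX          2203               0.0
"""
linux_only, standard = split_linux_specific(parse_nstat(sample))
print(sorted(linux_only))
print(sorted(standard))
```

The substring test is only a heuristic based on the naming convention. It does not consult the kernel headers.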

Before looking at the precise meaning of each of the counters managed by nstat, it is interesting to recall the definition of the Case diagrams. This graphical representation of SNMP variables can be really useful to understand the meaning of the Linux networking counters.

The Case diagrams

The Case diagrams were introduced by Jeffrey Case and Craig Partridge in 1989 in the paper Case diagrams: a first step to diagrammed management information bases. This article describes a simple but powerful graphical representation of the interactions among the different SNMP variables that a networking stack maintains.

A Case diagram represents the flow of packets through a stack and the different variables that are updated as the packets progress through the stack. The incoming packets are represented as progressing from the bottom layer of the stack to the upper layer, while the outgoing packets are represented in the other direction. The progression of these packets is represented using a large arrow. A horizontal line that crosses this arrow indicates the point in the stack where the associated SNMP counter is updated. A small arrow that leaves the main packet processing flow indicates a specific treatment for a packet and a counter that is updated. In some cases, an arrow enters the main workflow and updates the associated counter.

The original paper used the IP counters of the MIB-2 to illustrate the Case diagrams. This figure is reproduced below in ASCII format to simplify the updates to the document.

                   Transport Layer
-----------------------------------------------------
                         /\
                         ||
                         ||
 IpInDelivers +++++++++++++++
                         ||
                         ||
                         |+-----------> IpInUnknownProtos
                         ||
                         ||
 IpInDiscards <----------+|
                         ||
                         |<----------------  IpReasmOKs
                         ||                     /\
                         ||   IpReasmFails  <---+|
                         ||                     ||
                         |+--------------> IpReasmReqds
                         ||
                         ||
 IpForwDatagrams <-------+|
                         ||
                         |+--------------> IpInAddrErrors
                         ||
                         ||
 IpInHdrErrors <---------+|
                         ||
                         ||
                         ||
                      +++++++++++++++++++ IpInReceives
                         ||
-----------------------------------------------------
                   Interface Layer

The Case diagram above shows how the packets are processed by the IP stack. First, the Interface layer extracts the payload of the received frame and passes it to the IP layer. At this point, the IpInReceives counter is incremented. The processing of the IPv4 packet starts. First, the stack checks for errors inside the IPv4 header. If an error is detected in the IPv4 header, the packet is dropped and IpInHdrErrors is incremented. Then, the destination address is checked. If the address is incorrect, the packet processing stops and IpInAddrErrors is incremented.

If IP forwarding is enabled and the packet is not destined to this host, then the packet is forwarded using the FIB. The IpForwDatagrams counter is incremented.

The next step is to check whether the received packet is a fragment of a larger packet that needs to be reassembled. If the received packet is a fragment, then the IpReasmReqds counter is incremented and the packet is passed through the reassembly process. This reassembly can take time since more fragments can be required to recover a complete packet. If the packet reassembly succeeds, then IpReasmOKs is incremented and the processing of the full packet continues. If the reassembly fails, e.g. because a fragment is missing before the timeout expires, then IpReasmFails gets incremented.

At this point, the IP stack has almost finished processing the packets. Most packets will be delivered to the transport layer and increment the IpInDelivers counter, except if the IP queue becomes full. In this case, the IpInDiscards counter is incremented. The incoming packet could also be discarded if its Protocol field does not match one of the transport layers supported by the stack (i.e. UDP, TCP, DCCP, ...). In this case, the IpInUnknownProtos counter is incremented.
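
The walk through the Case diagram can also be read as an accounting exercise: every arrow that leaves the main flow removes packets, and the arrow that enters it adds packets. The Python sketch below follows the input side of the diagram from bottom to top. It assumes that the diagram is a strict accounting identity, which real stacks do not guarantee since counters are read at different instants. The function name and the numeric values are made up for illustration.

```python
def expected_in_delivers(c):
    """Follow the input side of the IP Case diagram, bottom to top."""
    remaining = c["IpInReceives"]
    # Arrows that leave the main flow remove packets from it.
    remaining -= c["IpInHdrErrors"]
    remaining -= c["IpInAddrErrors"]
    remaining -= c["IpForwDatagrams"]
    remaining -= c["IpReasmReqds"]
    # The arrow that enters the main flow adds reassembled packets.
    remaining += c["IpReasmOKs"]
    remaining -= c["IpInDiscards"]
    remaining -= c["IpInUnknownProtos"]
    return remaining


# Hypothetical values chosen only to illustrate the arithmetic.
example = {
    "IpInReceives": 1000,
    "IpInHdrErrors": 3,
    "IpInAddrErrors": 2,
    "IpForwDatagrams": 0,
    "IpReasmReqds": 10,
    "IpReasmOKs": 4,
    "IpInDiscards": 1,
    "IpInUnknownProtos": 5,
}
print(expected_in_delivers(example))
```

Comparing the returned value with the measured IpInDelivers gives a rough sanity check. A small difference is not necessarily a sign of a problem.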

The Multipath TCP counters

Linux version 5.18 maintains 46 counters for Multipath TCP. These counters correspond to different parts of the protocol and can be organized in four groups. The first group gathers the counters that are incremented when TCP packets containing the MP_CAPABLE option are processed. The second group gathers the counters that are incremented when processing packets with the MP_JOIN option. The third group gathers the counters that are modified when packets with the ADD_ADDR, RM_ADDR or MP_PRIO option are processed. The fourth group gathers the remaining counters of the Multipath TCP stack.

Two versions of Multipath TCP have been specified within the IETF. Version 0 was initially defined in RFC 6824. The off-tree but well maintained set of patches distributed by https://www.multipath-tcp.org implemented this version of Multipath TCP. Based on the experience gathered with this implementation and also Apple's implementation, Multipath TCP evolved and the IETF published version 1 in RFC 8684. The Multipath TCP counters correspond to this version of Multipath TCP.

The MPCapable counters

This group gathers the following counters: MPTcpExtMPCapableSYNRX, MPTcpExtMPCapableSYNTX, MPTcpExtMPCapableSYNACKRX, MPTcpExtMPCapableACKRX, MPTcpExtMPCapableFallbackACK, MPTcpExtMPCapableFallbackSYNACK and MPTcpExtMPFallbackTokenInit. They relate to the establishment of the initial Multipath TCP subflow, which is described in the section The Multipath TCP handshake.

The MPTcpExtMPCapableSYNTX counter is similar to the TcpActiveOpens counter maintained by TCP. It counts the number of Multipath TCP connections that this host has tried to establish. Its value will usually be much smaller than TcpActiveOpens. When a Multipath TCP connection is initiated using the connect system call, both MPTcpExtMPCapableSYNTX and TcpActiveOpens are incremented. Although the name of the counter is MPTcpExtMPCapableSYNTX, it is only incremented once per Multipath TCP connection, even if the SYN packet needs to be retransmitted.

The MPTcpExtMPCapableSYNACKRX counter is incremented every time a Multipath TCP connection is confirmed by the reception of a SYN+ACK with the MP_CAPABLE option in response to a SYN packet that the host sent earlier. The value of this counter should be lower than MPTcpExtMPCapableSYNTX since only a subset of the connections initiated by a host will typically reach a Multipath TCP compliant server. If a client receives a SYN+ACK without the MP_CAPABLE option in response to a SYN sent with the MP_CAPABLE option, then the MPTcpExtMPCapableFallbackSYNACK counter is incremented. This counter tracks the Multipath TCP connections that were forced to fall back to regular TCP during the three-way handshake of the initial subflow.

On the other hand, the MPTcpExtMPCapableSYNRX counter tracks the number of Multipath TCP connections that were accepted by the host. Its value will usually be much smaller than TcpPassiveOpens, which tracks all accepted TCP connections. When a Multipath TCP connection is accepted, both MPTcpExtMPCapableSYNRX and TcpPassiveOpens are incremented. As for MPTcpExtMPCapableSYNTX, the MPTcpExtMPCapableSYNRX counter is only incremented once per connection and not each time a packet is received. Upon reception of a SYN with the MP_CAPABLE option, a Multipath TCP server returns a SYN+ACK with the MP_CAPABLE option. The MPTcpExtMPCapableACKRX counter is incremented upon reception of the third ACK containing the MP_CAPABLE option. If this option is not present in this ACK, then the MPTcpExtMPCapableFallbackACK counter is incremented. If this counter increases, it probably indicates some interference with a middlebox that injects acknowledgments during the three-way handshake.

Listing 2 The MPCapable counters (active opens)
                         ||
                         ||
 MPTcpExtMPCapableSYNTX+++++++++++++++
                         ||
                         ||
                         |+-----------> MPTcpExtMPCapableSYNACKRX
                         ||
                         ||
                         |+-----------> MPTcpExtMPCapableFallbackSYNACK
                         ||
                         ||
Listing 3 The MPCapable counters (passive opens)
                         ||
                         ||
 MPTcpExtMPCapableSYNRX+++++++++++++++
                         ||
                         ||
                         |+-----------> MPTcpExtMPCapableACKRX
                         ||
                         ||
                         |+-----------> MPTcpExtMPCapableFallbackACK
                         ||
                         ||
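
The relation between the active-open counters can be checked with a few lines of arithmetic. The Python sketch below uses the three values reported in the output shown earlier (2203, 2172 and 22). The function name and the "other" category are illustrative: the latter simply collects attempts for which neither a confirmation nor a fallback was recorded, whatever the cause.

```python
def active_open_summary(c):
    """Summarise the outcome of Multipath TCP active opens."""
    syn_tx = c["MPTcpExtMPCapableSYNTX"]
    confirmed = c["MPTcpExtMPCapableSYNACKRX"]
    fallback = c["MPTcpExtMPCapableFallbackSYNACK"]
    return {
        "confirmed": confirmed,
        "fallback": fallback,
        # Attempts with neither outcome recorded in these counters.
        "other": syn_tx - confirmed - fallback,
        # Guard against a division by zero on an idle host.
        "confirmed_ratio": confirmed / syn_tx if syn_tx else 0.0,
    }


# Values taken from the counter dump shown earlier in this section.
counters = {
    "MPTcpExtMPCapableSYNTX": 2203,
    "MPTcpExtMPCapableSYNACKRX": 2172,
    "MPTcpExtMPCapableFallbackSYNACK": 22,
}
summary = active_open_summary(counters)
print(summary["other"], round(summary["confirmed_ratio"], 3))
```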

The Join counters

There are thirteen counters in this group. They are incremented when a host processes SYN packets corresponding to additional subflows.

The first counter, MPTcpExtMPJoinSynRx, is incremented every time a SYN packet with the MP_JOIN option is received. Upon reception of such a packet, the host first verifies that it knows the token of the Multipath TCP connection. If so, the processing continues and the host returns a SYN+ACK packet with the MP_JOIN option, its random number and an HMAC. Otherwise, the MPTcpExtMPJoinNoTokenFound counter is incremented. The host then waits for the third ACK, which contains the MP_JOIN option and the HMAC computed by the remote host. It then checks the validity of the received HMAC. If the HMAC is invalid, then the MPTcpExtMPJoinAckHMacFailure counter is incremented.

The MPTcpExtMPJoinSynRx counter will increase on Multipath TCP hosts that accept subflows, typically servers. The value of the MPTcpExtMPJoinAckRx counter should be close to the previous one. If one of the two other counters, MPTcpExtMPJoinNoTokenFound or MPTcpExtMPJoinAckHMacFailure, increases, then the system administrator should probably investigate, as these are indications of possible attacks.

Listing 4 The Join counters when accepting subflows
                         ||
                         ||
 MPTcpExtMPJoinSynRx +++++++++++++++
                         ||
                         ||
                         |+-----------> MPTcpExtMPJoinNoTokenFound
                         ||
                         ||
 MPTcpExtMPJoinAckRx +++++++++++++++
                         ||
                         |+-----------> MPTcpExtMPJoinAckHMacFailure
                         ||
                         ||
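
Since an increase of the two failure counters above deserves attention, a monitoring script can compare two successive snapshots of the counters. The Python sketch below is one possible way to do it. The function name, the SUSPICIOUS tuple and the snapshot values are hypothetical and only illustrate the idea.

```python
# Counters whose growth may indicate a problem when accepting subflows.
SUSPICIOUS = ("MPTcpExtMPJoinNoTokenFound", "MPTcpExtMPJoinAckHMacFailure")


def join_alerts(before, after, watched=SUSPICIOUS):
    """Return the watched counters that grew between two snapshots."""
    alerts = {}
    for name in watched:
        # A counter missing from a snapshot is treated as zero.
        delta = after.get(name, 0) - before.get(name, 0)
        if delta > 0:
            alerts[name] = delta
    return alerts


# Hypothetical snapshots taken at two different instants.
before = {
    "MPTcpExtMPJoinSynRx": 40,
    "MPTcpExtMPJoinAckRx": 40,
    "MPTcpExtMPJoinNoTokenFound": 0,
    "MPTcpExtMPJoinAckHMacFailure": 0,
}
after = {
    "MPTcpExtMPJoinSynRx": 55,
    "MPTcpExtMPJoinAckRx": 52,
    "MPTcpExtMPJoinNoTokenFound": 3,
    "MPTcpExtMPJoinAckHMacFailure": 0,
}
print(join_alerts(before, after))
```

The same function can watch other counters by passing a different tuple as the watched argument.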

Unfortunately, there is no counter that tracks the creation of new subflows by a host. The TCP stack counts these new subflows as active opens, but there is no specific Multipath TCP counter. However, the MPTcpExtMPJoinSynAckRX counter tracks the reception of SYN+ACK packets containing the MP_JOIN option. This is thus an indirect way to track the creation of new subflows. Upon reception of such a packet, in response to a previously sent SYN packet with the MP_JOIN option, a host checks the validity of the received HMAC. If the HMAC is invalid, the MPTcpExtMPJoinSynAckHMacFailure counter is incremented. This counter should rarely increase. If it does, the problem should be investigated by collecting packet traces.

Listing 5 The Join counters when initiating subflows
                         ||
                         ||
 MPTcpExtMPJoinSynAckRX +++++++++++++++
                         ||
                         ||
                         |+-----------> MPTcpExtMPJoinSynAckHMacFailure
                         ||
                         ||
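
Since these counters only grow, what matters in practice is how much they changed between two observations. The sketch below compares two snapshots that are assumed to be dictionaries mapping counter names to integer values. The function names are hypothetical, and the counter names are written as in this section, so their capitalization may need to be adjusted to match what your kernel reports.

```python
def counter_deltas(before, after):
    """Return the counters whose value changed between two snapshots."""
    deltas = {}
    for name, value in after.items():
        change = value - before.get(name, 0)
        if change != 0:
            deltas[name] = change
    return deltas


def initiated_subflow_summary(before, after):
    """Summarise subflow creation on the host that initiates the subflows."""
    deltas = counter_deltas(before, after)
    return {
        # Indirect estimate of the subflows created during the interval.
        "synack_received": deltas.get("MPTcpExtMPJoinSynAckRX", 0),
        # Should stay at zero; otherwise collect packet traces.
        "hmac_failures": deltas.get("MPTcpExtMPJoinSynAckHMacFailure", 0),
    }
```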

A Multipath TCP host will usually accept additional subflows on the address and ports where the initial subflow was accepted. The following counters track the arrival of packets destined to different port numbers:

  • MPTcpExtMPJoinPortSynRx

  • MPTcpExtMPJoinPortSynAckRx

  • MPTcpExtMPJoinPortAckRx

The last two counters, MPTcpExtMismatchPortSynRx and MPTcpExtMismatchPortAckRx, are a bit different. They are incremented when a SYN or an ACK sent to a different port number is received.

The MP_JOIN option contains a B bit that indicates whether the new subflow should be considered as a backup subflow or a regular one. This information is used by the path manager, but no counter tracks the value of the backup bit in the MP_JOIN option. Once a subflow has been established, its backup status can be changed using the MP_PRIO option. The MPTcpExtMPPrioTx counter is incremented every time such an option is sent. The MPTcpExtMPPrioRx counter is incremented for each received MP_PRIO option.

The address advertisement counters

There are six counters in this group. The advertisement of addresses by Multipath TCP is described in the Address management section.

When a host receives a packet with a valid ADD_ADDR option with its Echo bit set to zero, the MPTcpExtAddAddr counter is incremented. If this option includes an optional port number, the MPTcpExtPortAdd counter is also incremented. In addition to these two counters, the MPTcpExtAddAddrDrop counter tracks the address advertisements that were received by the host, but not processed by the path manager, e.g. because no user space path manager was active.

Multipath TCP does not track the advertisements of addresses by sending the ADD_ADDR option. However, it tracks the reception of packets containing the ADD_ADDR option with the Echo bit set to one with the MPTcpExtEchoAdd counter. These packets are echoed by the remote host.

Similarly, the MPTcpExtRmAddr counter tracks the number of received RM_ADDR options. These options typically indicate a change in the addresses owned by a remote peer. Mobile hosts are likely to send these options when they move from one type of network to another. The MPTcpExtRmAddrDrop counter is incremented when the path manager cannot process an incoming RM_ADDR option.

When a host receives an RM_ADDR option from a remote peer, its path manager should remove the subflows associated with this address. The MPTcpExtRmSubflow counter tracks the number of subflows that have been destroyed by a path manager.

The connection termination counters

There are seven counters in this group. They track the abnormal termination of a Multipath TCP connection. A normal Multipath TCP connection should end with the exchange of DATA_FIN in both directions. However, other scenarios are possible. First, one of the hosts may wish to quickly terminate the Multipath TCP connection without having to maintain state. Multipath TCP uses the FAST_CLOSE option in this case. The MPTcpExtMPFastcloseTx and MPTcpExtMPFastcloseRx counters track the transmission and the reception of such options.

Multipath TCP was designed to prevent interference from middleboxes as much as possible, but some types of interference force Multipath TCP to fall back to regular TCP. In this case, the host that first noticed the interference (e.g. a problem during the handshake, a DSS checksum problem, ...) sends a packet with the MP_FAIL option. This forces the Multipath TCP connection to fall back to a regular TCP connection. The MPTcpExtMPFailTx and MPTcpExtMPFailRx counters track the transmission and the reception of the MP_FAIL option. During some types of fallback, a host may also send an infinite DSS mapping. The MPTcpExtInfiniteMapRx counter tracks the reception of such infinite DSS mappings.

An increase of these counters would indicate some type of middlebox interference which should be investigated since it could prevent a complete utilization of Multipath TCP.
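
These fallback counters can also be read without any external tool. The sketch below assumes that the kernel exposes the Multipath TCP counters in /proc/net/netstat as a pair of lines starting with "MPTcpExt:", the first listing the counter names and the second their values, in the same way as the TcpExt group. The function names are hypothetical, and the parser works on text so that it can be tried on a saved copy of the file.

```python
def parse_netstat_group(text, group="MPTcpExt"):
    """Extract one counter group from /proc/net/netstat-style text."""
    rows = [line.split()[1:] for line in text.splitlines()
            if line.startswith(group + ":")]
    if len(rows) < 2:
        return {}  # group absent, e.g. kernel without Multipath TCP
    names, values = rows[0], rows[1]
    # Prefix each name with the group name, e.g. MPTcpExt + MPFailTx.
    return {group + name: int(value) for name, value in zip(names, values)}


# Counters discussed above that hint at middlebox interference.
FALLBACK_COUNTERS = ("MPTcpExtMPFailTx", "MPTcpExtMPFailRx",
                     "MPTcpExtInfiniteMapRx")


def fallback_signals(counters):
    """Return the fallback-related counters that are greater than zero."""
    return {name: counters[name] for name in FALLBACK_COUNTERS
            if counters.get(name, 0) > 0}
```

On a live system the text would be obtained with `open("/proc/net/netstat").read()`. An empty result from `fallback_signals` means that none of these counters has increased since boot.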

Like TCP, Multipath TCP uses TCP RST to terminate subflows. Multipath TCP also defines the MP_TCPRST option, which carries a reason code and flags that give some information about why the RST was sent. The MPTcpExtMPRstTx and MPTcpExtMPRstRx counters track the transmission and the reception of such RST packets.

The other counters

The remaining eleven counters are mainly related to processing of data.

If the DSS checksum is enabled, the MPTcpExtDataCsumErr counter is incremented every time a check of the DSS checksum fails. This should be a rare event that likely indicates the presence of middleboxes. It should be correlated with the MPTcpExtMPFailTx and MPTcpExtMPFailRx counters discussed in the previous section.

Three counters track the DSS option of the incoming packets: MPTcpExtDSSNotMatching, MPTcpExtDSSNoMatchTCP and MPTcpExtNoDSSInWindow. The first counter is incremented when a mapping is received for data that has already been mapped and the new mapping is not the same as the existing one. The second counter is incremented when the TCP sequence numbers found in the mapping do not match the current TCP sequence numbers. The third counter is incremented upon reception of a packet whose DSS option is outside the current window. These three counters should rarely increase.

The last counter that tracks data at the Multipath TCP connection level is MPTcpExtDuplicateData. It counts the number of received packets whose data has been ignored because it had already been received earlier. Such duplicated data can occur with Multipath TCP when data sent over a subflow is retransmitted over another subflow. It would be interesting to follow the evolution of this counter on a server that interacts with mobile devices.

Multipath TCP tracks losses on the subflows that compose a Multipath TCP connection. If one subflow accumulates losses, it may be marked as stale and the packet scheduler will stop using it to transmit data until the losses have been recovered. The MPTcpExtSubflowStale counter is incremented every time a subflow is marked as being stale. The MPTcpExtSubflowRecover counter tracks the transitions from stale to active.

Multipath TCP uses an out-of-order queue to reorder the data received over the different subflows. The MPTcpExtOFOQueueTail and MPTcpExtOFOQueue counters track the insertion of data at the tail and in the out-of-order queue. The MPTcpExtOFOMerge counter is incremented when data present in the out-of-order queue can be merged.

Finally, the MPTcpExtRcvPruned counter tracks the number of packets that were dropped because the memory available for Multipath TCP was full. If this counter increases, you should probably check the memory configuration of your host.

Native Multipath TCP applications on Linux

On recent Linux kernels, Multipath TCP is enabled on a per-socket basis by passing IPPROTO_MPTCP as the third parameter of the socket system call that creates the socket. This implies that existing applications that use TCP need to be changed to support Multipath TCP.
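
As an illustration, the sketch below creates a stream socket with IPPROTO_MPTCP and falls back to regular TCP when the kernel refuses the protocol. The helper name is hypothetical. The numeric value 262 is the Linux value of IPPROTO_MPTCP and is only used when the Python socket module does not export the constant, which is assumed to be the case on Python versions older than 3.10.

```python
import socket

# Use the constant exported by Python when available, else the Linux value.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)


def create_stream_socket(family=socket.AF_INET):
    """Return (sock, is_mptcp), preferring Multipath TCP over regular TCP."""
    try:
        sock = socket.socket(family, socket.SOCK_STREAM, IPPROTO_MPTCP)
        return sock, True
    except OSError:
        # Multipath TCP is not compiled in, is disabled, or the platform
        # does not know this protocol number: fall back to regular TCP.
        sock = socket.socket(family, socket.SOCK_STREAM, socket.IPPROTO_TCP)
        return sock, False
```

The returned socket is then used exactly like a TCP socket (connect, bind, listen, send, recv), so the rest of the application does not need to change.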

The mptcp-hello project on GitHub provides simple examples showing how to enable Multipath TCP on a TCP application in the following programming languages:

Patches have been proposed to add Multipath TCP support to the following applications:

In addition, some specific applications are developed with Multipath TCP support: