Re: [PATCH v2] tcp: Initial support for RFC5925 auth option

From: Leonard Crestez
Date: Wed Nov 03 2021 - 18:22:56 EST


On 11/3/21 5:18 AM, David Ahern wrote:
On 11/1/21 10:34 AM, Leonard Crestez wrote:
This is similar to TCP MD5 in functionality but it's sufficiently
different that wire formats are incompatible. Compared to TCP-MD5 more
algorithms are supported and multiple keys can be used on the same
connection but there is still no negotiation mechanism.

Expected use-case is protecting long-duration BGP/LDP connections
between routers using pre-shared keys. The goal of this series is to
allow routers using the linux TCP stack to interoperate with vendors
such as Cisco and Juniper.

Both algorithms described in RFC5926 are implemented but the code is not
very easily extensible beyond that. In particular there are several code
paths making stack allocations based on RFC5926 maximum, those would
have to be increased.

This version implements SNE and l3mdev awareness and adds more tests.
Here are some known flaws and limitations:

* Interaction with TCP-MD5 not tested in all corners
* Interaction with FASTOPEN not tested and unlikely to work because
sequence number assumptions for syn/ack.
* Not clear if crypto_shash_setkey might sleep. If some implementation
do that then maybe they could be excluded through alloc flags.
* Traffic key is not cached (reducing performance)
* User is responsible for ensuring keys do not overlap.
* There is no useful way to list keys, making userspace debug difficult.
* There is no prefixlen support equivalent to md5. This is used in
some complex FRR configs.

Test suite was added to tools/selftests/tcp_authopt. Tests are written
in python using pytest and scapy and check the API in some detail and
validate packet captures. Python code is already used in linux and in
kselftests but virtualenvs not very much, this particular test suite
uses `pip` to create a private virtualenv and hide dependencies.

This actually forms the bulk of the series by raw line-count. Since
there is a lot of code it was mostly split on "functional area" so most
files are only affected by a single code. A lot of those tests are
relevant to TCP-MD5 so perhaps it might help to split into a separate
series?

Some testing support is included in nettest and fcnal-test.sh, similar
to the current level of tcp-md5 testing.

SNE was tested by creating connections in a loop until a large SEQ is
randomly selected and then making it rollover. The "connect in a loop"
step ran into timewait overflow and connection failure on port reuse.
After spending some time on this issue and my conclusion is that AO
makes it impossible to kill remainders of old connections in a manner
similar to unsigned or md5sig, this is because signatures are dependent
on ISNs. This means that if a timewait socket is closed improperly then
information required to RST the peer is lost.

The fact that AO completely breaks all connection-less RSTs is
acknowledged in the RFC and the workaround of "respect timewait" seems
acceptable.

Changes for frr (old): https://github.com/FRRouting/frr/pull/9442
That PR was made early for ABI feedback, it has many issues.


overall looks ok to me. I did not wade through the protocol details.

I did see the comment about no prefixlen support in the tests. A lot of
patches to absorb, perhaps I missed it. Does AuthOpt support for
prefixes? If not, you should consider adding that as a quick follow on
(within the same dev cycle). MD5 added prefix support for scalability;
seems like AO should be concerned about the same.

I just skipped it because it's not required for core functionality.

It's very straight forward so I will add it to the next version.

--
Regards,
Leonard