vxlan: how to expose opt-in RFC conformity with unprocessed header flags

From: Thomas Lamprecht
Date: Fri Jan 12 2024 - 10:13:44 EST


Hi!

We have a customer who reported an issue where the Linux VXLAN
implementation diverges from the RFC: when any of the (reserved)
flags other than the VNI one is set, the kernel just drops the packet.

According to the comment in vxlan_rcv() in vxlan_core.c, this is a
deliberate choice:

	if (unparsed.vx_flags || unparsed.vx_vni) {
		/* If there are any unprocessed flags remaining treat
		 * this as a malformed packet. This behavior diverges from
		 * VXLAN RFC (RFC7348) which stipulates that bits in reserved
		 * in reserved fields are to be ignored. The approach here
		 * maintains compatibility with previous stack code, and also
		 * is more robust and provides a little more security in
		 * adding extensions to VXLAN.
		 */
		goto drop;
	}

Normally this is not an issue, as the same RFC also dictates that the sender
must set those reserved bits to zero. But naturally, some devices do not
follow that side of the contract either, like some Juniper switches of said
customer, which set the B-bit (as if it were VXLAN-GPE) in the VXLAN packet,
even though they have VXLAN-GPE explicitly disabled.
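
To make the failing case concrete, here is a rough user-space sketch (not
kernel code) of an 8-byte VXLAN header check. The helper names are made up for
illustration, and the B-bit position (0x02 in the first byte) is an assumption
taken from the flag layout in the VXLAN-GPE draft, not something I verified
against the switch output:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First header byte: RFC 7348 defines only the I (valid VNI) flag. */
#define VXLAN_FLAG_I      0x08u
/* Assumption: B-bit position as in the VXLAN-GPE draft (R R Ver I P B O). */
#define VXLAN_GPE_FLAG_B  0x02u

/* Hypothetical helper: extract the 24-bit VNI from header bytes 4..6. */
static uint32_t vxlan_hdr_vni(const uint8_t *hdr)
{
	assert(hdr != NULL);
	return ((uint32_t)hdr[4] << 16) | ((uint32_t)hdr[5] << 8) | hdr[6];
}

/*
 * Hypothetical helper: returns nonzero if anything besides the I flag and
 * the VNI is set, i.e. any other flag bit or any reserved byte. This
 * loosely mirrors what is left over in "unparsed" in vxlan_rcv().
 */
static int vxlan_hdr_has_unparsed(const uint8_t *hdr)
{
	assert(hdr != NULL);
	return ((hdr[0] & ~VXLAN_FLAG_I) | hdr[1] | hdr[2] | hdr[3] | hdr[7]) != 0;
}
```

With a first byte of 0x08 the header has no leftover bits; with 0x0a (I flag
plus the assumed B-bit) vxlan_hdr_has_unparsed() returns nonzero, which is the
case that currently ends in "goto drop".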

So, while I asked the customer to open a support ticket with their switch
vendor, as that side is breaking the RFC too, the kernel is just the simpler
thing to "fix", and on our side it is the only thing we can change at all.

Simply changing the code so that it always conforms to the RFC (at least
in this regard) seems to be a no-go, as some setups would then suddenly see
extra (potentially malicious) traffic go through. So, on to my actual question:

What would be the accepted way to add a switch that makes this behavior
RFC-conformant on an opt-in basis? A module parameter? A sysfs entry? Through
netlink?

Depending on the answer, I'd like to prepare a patch implementing opt-in
RFC conformance w.r.t. ignoring the reserved bits of the VXLAN flags. That
way, setups with hardware in their network path that is broken in the
complementary way (setting reserved bits on send) can opt in to that behavior
as a workaround.
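
Roughly, the behavior I have in mind looks like the following user-space
sketch. This is only an illustration under assumptions: the flag name, the
config struct and the helper are all hypothetical and do not exist in the
kernel, and where the option would actually be stored depends on the answer
to the question above:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical opt-in bit; NOT an existing kernel flag. */
#define SKETCH_F_IGNORE_RESERVED (1u << 0)

/* Hypothetical stand-in for a per-device configuration. */
struct sketch_vxlan_cfg {
	uint32_t flags;
};

/*
 * Decide whether a packet should be dropped, given the header bits that
 * were left unprocessed. Default (flag unset): keep today's strict
 * behavior. Opt-in (flag set): ignore reserved bits as RFC 7348 says.
 */
static bool sketch_should_drop(const struct sketch_vxlan_cfg *cfg,
			       uint32_t unparsed_flags, uint32_t unparsed_vni)
{
	if (!unparsed_flags && !unparsed_vni)
		return false;	/* clean header, never dropped here */

	return !(cfg->flags & SKETCH_F_IGNORE_RESERVED);
}
```

The point of the sketch is that nothing changes for existing setups: only a
device that explicitly sets the option would start accepting packets with
reserved bits set.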

thanks!
Thomas