Re: [PATCH net-next 0/3] make skip_sw actually skip software

From: Asbjørn Sloth Tønnesen
Date: Fri Feb 16 2024 - 07:18:06 EST


Hi Marcelo,

On 2/15/24 18:00, Marcelo Ricardo Leitner wrote:
> On Thu, Feb 15, 2024 at 04:04:41PM +0000, Asbjørn Sloth Tønnesen wrote:
> ...
>> Since we use TC flower offload for the hottest
>> prefixes, and leave the long tail to Linux / the CPU,
>> we therefore need both the hardware and software
>> datapath to perform well.
>>
>> I found that skip_sw rules are quite expensive
>> in the kernel datapath, since they must be evaluated
>> and matched upon, before the kernel checks the
>> skip_sw flag.
>>
>> This patchset optimizes the case where all rules
>> are skip_sw.
>
> The talk is interesting. Yet, I don't get how it is set up.
> How do you use a dedicated block for skip_sw, and then have a
> catch-all on sw again please?

Bird installs the DFZ Internet routing table into the main kernel table
for the software datapath.

Bird also installs a subset of the routing table into an auxiliary kernel table.

flower-route then picks up the routes from the aux. kernel table, and
installs them as TC skip_sw filters.
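
A rough sketch of what that last step could look like for a single route is
below. It is not flower-route's actual code: the helper name, the pref value,
the next-hop MAC and the egress device are assumptions made for the example,
and the real filters may use different actions. It only builds the tc command
line, it does not run it.

```python
import ipaddress
import shlex


def build_skip_sw_filter(dev, prefix, nexthop_mac, out_dev, pref=10):
    """Return a tc command (as an argv list) for one hardware-only route.

    Hypothetical helper for illustration; not taken from flower-route.
    """
    net = ipaddress.ip_network(prefix)
    # tc uses "ip" or "ipv6" as the protocol; flower's dst_ip covers both.
    protocol = "ip" if net.version == 4 else "ipv6"
    return [
        "tc", "filter", "add", "dev", dev, "ingress",
        "protocol", protocol, "pref", str(pref),
        "flower", "skip_sw",  # hardware only, never in the software datapath
        "dst_ip", str(net),
        # Assumed actions: rewrite the destination MAC, then forward.
        "action", "pedit", "ex", "munge", "eth", "dst", "set", nexthop_mac,
        "pipe",
        "action", "mirred", "egress", "redirect", "dev", out_dev,
    ]


if __name__ == "__main__":
    # Documentation prefix and a locally administered MAC, both made up.
    argv = build_skip_sw_filter(
        "enp5s0f0np0", "192.0.2.0/24", "02:00:00:00:00:01", "enp5s0f0np0")
    print(shlex.join(argv))
```

On a router on a stick the ingress and egress device are the same, which is
why the example redirects back out of enp5s0f0np0.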

On these machines we don't have any non-skip_sw TC filters.

Since 2021, we have statically offloaded all inbound traffic, since the
next hop for our IP space is always the switch next to the router, which
does interior L3 routing. That alone lets us offload ~50% of the packets.

I have put an example of the static script here:
https://files.fiberby.net/ast/2024/tc_skip_sw/mlx5_static_offload.sh

And `tc filter show dev enp5s0f0np0 ingress` after running the script:
https://files.fiberby.net/ast/2024/tc_skip_sw/mlx_offload_demo_tc_dump.txt


> I'm missing which traffic is being matched against the sw datapath. In
> theory, you have all the heavy duty filters offloaded, so the sw
> datapath should be seeing only a few packets, right?

We are a residential ISP, so our traffic is residential Internet traffic.
We run the BGP routers as routers on a stick, so the filters see both
inbound and outbound traffic.

~50% of packets are inbound traffic, so our own prefixes are the
hottest prefixes. Most streaming traffic is handled internally, and is
therefore not seen on our core routers. We regularly have 5%-10% of all
outbound traffic going towards the same prefix, and have 50% of outbound
traffic distributed across just a few prefixes.

We currently only offload our own prefixes, and a select few other known
high-traffic prefixes.

The goal is to offload the majority of the traffic, but it is still early
days for flower-route, and I first need to implement a smarter chain layout
and dynamic filter placement based on hardware counters.
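
To make the counter-based placement idea concrete, here is a minimal sketch
that picks the hottest prefixes until a target share of packets is covered.
This is only one possible approach, not what flower-route does today: the
function name, the shape of the counter input and the two limits are all
assumptions, and how the counters are collected is left out.

```python
def pick_prefixes_to_offload(packet_counts, target_share, max_filters):
    """Greedily pick the hottest prefixes.

    packet_counts: dict mapping prefix -> packets seen (assumed input).
    target_share:  stop once this fraction of all packets is covered.
    max_filters:   stop once this many prefixes are chosen.
    Returns (chosen prefixes, share of packets they cover).
    """
    total = sum(packet_counts.values())
    chosen, covered = [], 0
    if total == 0:
        return chosen, 0.0
    # Walk the prefixes from hottest to coldest.
    ranked = sorted(packet_counts.items(), key=lambda kv: kv[1], reverse=True)
    for prefix, count in ranked:
        if len(chosen) >= max_filters or covered / total >= target_share:
            break
        chosen.append(prefix)
        covered += count
    return chosen, covered / total


if __name__ == "__main__":
    # Made-up counters using documentation prefixes.
    counts = {
        "198.51.100.0/24": 500,
        "203.0.113.0/24": 300,
        "192.0.2.0/24": 150,
        "2001:db8::/32": 50,
    }
    print(pick_prefixes_to_offload(counts, target_share=0.75, max_filters=10))
```

With a skewed distribution like the one described above, a short list of
prefixes covers most packets, so max_filters can stay small relative to the
full routing table.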

Even when I get flower-route to offload almost all traffic, there will still
be a long tail of prefixes not in hardware, so the kernel datapath must
still not be slowed down by the offloaded filters.
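
The cost being discussed can be shown with a toy model. The sketch below is
not the kernel code and not the patchset itself; the class and field names
are invented for the example. It only models the idea: without a bypass,
every skip_sw filter is matched against the packet before its flag is
checked, while a per-block count of filters that software must run lets the
classifier return early when that count is zero.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Filter:
    prefix: str
    skip_sw: bool


class Block:
    """Toy stand-in for a TC block; names are hypothetical."""

    def __init__(self):
        self.filters = []
        self.sw_filters = 0      # filters the software datapath must run
        self.match_attempts = 0  # work done in software, for comparison

    def add(self, flt):
        self.filters.append(flt)
        if not flt.skip_sw:
            self.sw_filters += 1

    def classify(self, dst, bypass):
        # Optimized case: nothing here is meant for software, so skip it all.
        if bypass and self.sw_filters == 0:
            return None
        addr = ipaddress.ip_address(dst)
        for flt in self.filters:
            self.match_attempts += 1
            # The match is evaluated first, the skip_sw flag only afterwards.
            if addr in ipaddress.ip_network(flt.prefix) and not flt.skip_sw:
                return flt
        return None


if __name__ == "__main__":
    block = Block()
    for p in ("192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"):
        block.add(Filter(p, skip_sw=True))
    block.classify("203.0.113.7", bypass=False)
    print("attempts without bypass:", block.match_attempts)
    block.match_attempts = 0
    block.classify("203.0.113.7", bypass=True)
    print("attempts with bypass:", block.match_attempts)
```

In the model, the software cost without the bypass grows with the number of
offloaded filters, even though none of them can ever match in software.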

--
Best regards
Asbjørn Sloth Tønnesen
Network Engineer
Fiberby - AS42541