Re: [PATCH net 2/2] act_ct: support asymmetric conntrack

From: Aaron Conole
Date: Mon Nov 18 2019 - 16:24:30 EST


Paul Blakey <paulb@xxxxxxxxxxxx> writes:

> On 11/14/2019 4:22 PM, Roi Dayan wrote:
>>
>> On 2019-11-08 11:07 PM, Aaron Conole wrote:
>>> The act_ct TC module shares a common conntrack and NAT infrastructure
>>> exposed via netfilter. It's possible that a packet needs both SNAT and
>>> DNAT manipulation, due to e.g. tuple collision. Netfilter can support
>>> this because it runs through the NAT table twice - once on ingress and
>>> again after egress. The act_ct action doesn't have such capability.
>>>
>>> Like netfilter hook infrastructure, we should run through NAT twice to
>>> keep the symmetry.
>>>
>>> Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct")
>>>
>>> Signed-off-by: Aaron Conole <aconole@xxxxxxxxxx>
>>> ---
>>> net/sched/act_ct.c | 13 ++++++++++++-
>>> 1 file changed, 12 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
>>> index fcc46025e790..f3232a00970f 100644
>>> --- a/net/sched/act_ct.c
>>> +++ b/net/sched/act_ct.c
>>> @@ -329,6 +329,7 @@ static int tcf_ct_act_nat(struct sk_buff *skb,
>>> bool commit)
>>>  {
>>>  #if IS_ENABLED(CONFIG_NF_NAT)
>>> +	int err;
>>>  	enum nf_nat_manip_type maniptype;
>>>
>>>  	if (!(ct_action & TCA_CT_ACT_NAT))
>>> @@ -359,7 +360,17 @@ static int tcf_ct_act_nat(struct sk_buff *skb,
>>>  		return NF_ACCEPT;
>>>  	}
>>>
>>> -	return ct_nat_execute(skb, ct, ctinfo, range, maniptype);
>>> +	err = ct_nat_execute(skb, ct, ctinfo, range, maniptype);
>>> +	if (err == NF_ACCEPT &&
>>> +	    ct->status & IPS_SRC_NAT && ct->status & IPS_DST_NAT) {
>>> +		if (maniptype == NF_NAT_MANIP_SRC)
>>> +			maniptype = NF_NAT_MANIP_DST;
>>> +		else
>>> +			maniptype = NF_NAT_MANIP_SRC;
>>> +
>>> +		err = ct_nat_execute(skb, ct, ctinfo, range, maniptype);
>>> +	}
>>> +	return err;
>>>  #else
>>>  	return NF_ACCEPT;
>>>  #endif
>>>
>> +paul
>
> Hi Aaron,
>
> I think I understand the issue and this looks good,
>
> Can you describe the scenario to reproduce this?

It reproduces with OpenShift 3.10, which pushes forward-direction packets
between namespaces through a tun device that applies NAT rules to rewrite
the destination. Limit the number of ephemeral ports available in the
client namespace by editing net.ipv4.ip_local_port_range there, then
connect to the server namespace. With few source ports to choose from,
the DNAT'd tuple can collide with an existing one, so SNAT is needed as
well. That's the mechanism for OvS. But for TC I guess there wouldn't be
anything convenient available.

I'll try to script up something that doesn't use OpenShift.
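
In the meantime, here is a rough userspace model of the control flow the
patch adds, in case it helps when reasoning about the collision case. It
is only a sketch: the enum, the status bits, the verdict value and
mock_nat_execute() are hypothetical stand-ins, not the kernel's
nf_nat_manip_type, IPS_* flags, NF_ACCEPT or ct_nat_execute(). It shows
only that a second pass with the opposite manip type runs when the
connection has both source and destination NAT set up.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel definitions (values illustrative). */
enum manip_type { MANIP_SRC = 0, MANIP_DST = 1 };
#define VERDICT_ACCEPT 1
#define STATUS_SRC_NAT (1u << 0)
#define STATUS_DST_NAT (1u << 1)

struct mock_ct {
	unsigned int status;	/* which NAT kinds are set up for this entry */
};

/* Counts how many times each manip type was executed. */
static int passes[2];

/* Stand-in for the real NAT step: record the pass and accept the packet. */
static int mock_nat_execute(struct mock_ct *ct, enum manip_type maniptype)
{
	(void)ct;
	passes[maniptype]++;
	return VERDICT_ACCEPT;
}

/*
 * Mirrors the patched tail of the NAT action: run the requested manip
 * type, and if it was accepted and the entry carries both SNAT and DNAT,
 * run once more with the opposite manip type.
 */
static int nat_both_directions(struct mock_ct *ct, enum manip_type maniptype)
{
	int err = mock_nat_execute(ct, maniptype);

	if (err == VERDICT_ACCEPT &&
	    (ct->status & STATUS_SRC_NAT) && (ct->status & STATUS_DST_NAT)) {
		maniptype = (maniptype == MANIP_SRC) ? MANIP_DST : MANIP_SRC;
		err = mock_nat_execute(ct, maniptype);
	}
	return err;
}
```

Calling nat_both_directions() on an entry with both status bits set
bumps both counters; with only one bit set, only the requested manip
type runs, which matches the behaviour before the patch.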

>
> Thanks,
>
> Paul.