Re: [PATCH] netfilter: per netns nf_conntrack_cachep

From: Jon Masters
Date: Wed Feb 03 2010 - 13:38:58 EST


On Wed, 2010-02-03 at 13:10 +0100, Patrick McHardy wrote:
> Patrick McHardy wrote:
> > Jon Masters wrote:
> >> On Tue, 2010-02-02 at 19:58 +0200, Alexey Dobriyan wrote:
> >>
> >>> Yes, moving to init_net-only function is fine.
> >> So moving the "set up fake conntrack" bits to init_init_net from
> >> init_net still results in the panic, which means that the use count
> >> really is dropping to zero and we really are trying to free it when
> >> using multiple namespaces. Per ns is probably an easier way to go.
> >
> > Agreed, that will also avoid problems in the future with the
> > ct_net pointer pointing to &init_net. I'll take care of this
> > tomorrow.
>
> Unfortunately a per-namespace conntrack is not easily possible without
> larger changes (most of which are already queued in nf-next-2.6.git
> though). So for now I just moved the untrack handling to the init_net
> setup and cleanup functions and we can try to fix the remainder in
> 2.6.34.

Ok. I'd love to help out actually, given that I've been poking at this,
and it's quite fun. So please at least send me patches. The only other
thing I consider a priority issue at the moment for this is that writing
into /sys/module/nf_conntrack/parameters/hashsize on a running system
with multiple namespaces will cause the system to corrupt random memory
silently and fall over. That probably needs a stopgap fix until there is
per-namespace hashsize tracking and this is no longer a global tunable.
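
To make the failure mode concrete, here is a rough userspace toy model
(not kernel code; every name in it is made up for illustration). It
assumes the hazard is that each namespace allocates its table at the
then-current global size, while a later resize only reallocates the
first namespace's table and bumps the global:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy userspace model, NOT kernel code. All names are hypothetical. */

/* Stands in for the single, module-wide hashsize tunable. */
static unsigned int toy_global_hashsize = 8;

struct toy_netns {
	int *buckets;           /* stands in for this namespace's hash table */
	unsigned int allocated; /* number of buckets really allocated */
};

/* Each namespace sizes its table from the global value at creation. */
int toy_netns_init(struct toy_netns *ns)
{
	ns->buckets = calloc(toy_global_hashsize, sizeof(*ns->buckets));
	if (!ns->buckets)
		return -1;
	ns->allocated = toy_global_hashsize;
	return 0;
}

void toy_netns_exit(struct toy_netns *ns)
{
	free(ns->buckets);
	ns->buckets = NULL;
	ns->allocated = 0;
}

/*
 * Assumed shape of the problem: the resize reallocates only the
 * initial namespace's table, yet changes the size everyone indexes by.
 */
int toy_set_hashsize(struct toy_netns *init_ns, unsigned int new_size)
{
	int *nb = calloc(new_size, sizeof(*nb));

	if (!nb)
		return -1;
	free(init_ns->buckets);
	init_ns->buckets = nb;
	init_ns->allocated = new_size;
	toy_global_hashsize = new_size;
	return 0;
}

/* Returns 1 if indexing by the global size could run past this table. */
int toy_table_is_stale(const struct toy_netns *ns)
{
	return ns->allocated < toy_global_hashsize;
}
```

In this model, after toy_set_hashsize() grows the first namespace's
table, any second namespace reports stale: its table is smaller than
the size used for indexing, which is where out-of-bounds writes would
come from.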

Also, some other things I think are required before 2.6.34:

*). Per-namespace cache allocation (the cachep bits). We know it's
still possible for weirdness to happen in the SLAB cache here.
*). Per-namespace hashsize tracking. Existing code corrupts hashtables
if the global size is changed when there is more than one netns.
*). Per-namespace expectations. This is for similar reasons to the need
for multiple hashtables, though I haven't poked at that.
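
As a sketch of what per-namespace hashsize tracking could look like
(again a userspace toy with invented names, not a proposed patch), the
size lives next to the table it describes, so a resize touches exactly
one namespace and bucket indexing never consults a global:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy userspace model, NOT kernel code. All names are hypothetical. */

struct pns_table {
	int *buckets;
	unsigned int size; /* per-namespace size, replaces a global */
};

int pns_init(struct pns_table *t, unsigned int size)
{
	if (size == 0)
		return -1;
	t->buckets = calloc(size, sizeof(*t->buckets));
	if (!t->buckets)
		return -1;
	t->size = size;
	return 0;
}

void pns_exit(struct pns_table *t)
{
	free(t->buckets);
	t->buckets = NULL;
	t->size = 0;
}

/* The bucket index is always derived from this table's own size. */
unsigned int pns_bucket(const struct pns_table *t, unsigned int hash)
{
	return hash % t->size;
}

/*
 * Resize a single table. A real implementation would rehash the
 * existing entries into the new buckets; this toy starts empty.
 */
int pns_resize(struct pns_table *t, unsigned int new_size)
{
	int *nb;

	if (new_size == 0)
		return -1;
	nb = calloc(new_size, sizeof(*nb));
	if (!nb)
		return -1;
	free(t->buckets);
	t->buckets = nb;
	t->size = new_size;
	return 0;
}
```

Resizing one table here leaves every other table's size and bucket
mapping untouched, which is the property the global tunable lacks.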

I also think it is necessary to expose net namespace layout and
configuration via sysfs or some other interface, add a net->id parameter
(and maybe even an optional name), etc. Where does netns discussion
happen, on netdev I would presume?

> Jon, could you give this patch a try please?

Yup. Box is stable and boots multiple virtual machines as it did with
the quick hack from yesterday, so this has now fixed the problem.

Can you let me know if this is the final patch you want to post? If so,
we should get this into stable asap (and I have a couple of vendor
kernels that will need a version of this fix also).

Jon.

