Re: NUMA x86: add constraints check for nid parameters

From: Petr Holasek
Date: Wed Dec 07 2011 - 14:40:30 EST


On Tue, 06 Dec 2011, David Rientjes wrote:

>
> On Fri, 2 Dec 2011, Petr Holasek wrote:
>
> > > > > > This patch adds constraints checks into __node_distance() and
> > > > > > numa_set_distance() functions. If from or to parameters are
> > > > > > lower than zero, it results into oops now.
> > > > >
> > > > > Passing negative numbers into __node_distance() sounds like a bug in
> > > > > the caller, and this patch will remove our means of detecting that bug.
> > > >
> > > > That's true, but upper boundary is checked now, so why not to check lower?
> > >
> > > Because it adds more code to the kernel and can hide bugs?
> > >
>
> The upper bound is checked to ensure that we don't dereference past end of
> the array that stores the distance table, so it will catch errors for
> things like memory hotplug when additional nodes are onlined and the data
> structure isn't updated accordingly.

Thanks for clarification, I missed this case.

>
> > > If what we're doing here is to be defensive against buggy BIOS tables
> > > (a good idea) then we should validate the BIOS table values as close as
> > > possible to the point where they were read frmo the BIOS. And we should
> > > (probably) emit a warning if a bad table entry is detected, rather than
> > > silently fixing it up.
> >
> > numa_set_distance() does exactly what you described above, only emits a
> > warning. I agree with your objections with __node_distance() checks, it
> > really can hide bugs in caller. So silent fix-up is the main problem and
> > we shouldn't check anything so the caller will be advised when using
> > wrong nid by oops with a benefit of less code for us. Do I understand your
> > opinion on this type of code?
>
> I'd have no objection to adding a check to numa_set_distance() to ensure
> the node ids are non-negative in the same way we check that the distances
> themselves are non-negative; that can catch errors when pxms are used
> uninitialized when parsing the SRAT. However, I think adding the check to
> __node_distance() is unnecessary.

Ok, I'll try v2 without __node_distance()
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/