Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096

From: Ingo Molnar
Date: Tue Oct 20 2009 - 09:43:31 EST



* Jeff Mahoney <jeffm@xxxxxxxx> wrote:

> On 10/20/2009 02:35 AM, Ingo Molnar wrote:
> >
> > * Jiri Kosina <jkosina@xxxxxxx> wrote:
> >
> >> On Tue, 20 Oct 2009, Ingo Molnar wrote:
> >>
> >>>> Commit 34d76c41 introduced percpu array update_shares_data, size of which
> >>>> being proportional to NR_CPUS. Unfortunately this blows up ia64 for large
> >>>> NR_CPUS configuration, as ia64 allows only 64k for .percpu section.
> >>>>
> >>>> Fix this by allocating this array dynamically and keep only pointer to it
> >>>> percpu.
> >>>>
> >>>> Signed-off-by: Jiri Kosina <jkosina@xxxxxxx>
> >>>> ---
> >>>> kernel/sched.c | 15 +++++++--------
> >>>> 1 files changed, 7 insertions(+), 8 deletions(-)
> >>>
> >>> Seems like an IA64 bug to me.
> >>
> >> IA64 guys actually use that as some kind of optimization for fast
> >> access to the percpu data in their pagefault handler, as far as I
> >> know.
> >
> > Still looks like a bug if it causes a breakage (linker error) on IA64,
> > and if the 'fix' (i'd call it a workaround) causes a (small but nonzero)
> > performance regression on other architectures.
>
> The linker error isn't a bug, it's enforcement. The ia64 linker script
> explicitly rewinds the location pointer back to the start of
> .data.percpu + 64k to start the .data section to cause the error if
> .data.percpu is larger than 64k.

Since every other SMP architecture manages to support more than 64K of
percpu data, this is clearly an ugly, self-inflicted limitation of IA64
that has now escalated into a link failure.

Now, 34d76c41 could certainly be improved in a way that works around the
IA64 problem too: we can allocate the data dynamically as long as the
proper percpu allocator is used (not kmalloc as in the patch in this
thread). But arguing that the current IA64 64K limit behavior is
anything but very broken is rather shortsighted.
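
To put rough numbers on the above, here is a minimal userspace sketch (not
kernel code). It assumes the array holds one 8-byte unsigned long per
possible CPU; the thread only says its size is proportional to NR_CPUS, so
the element size is an assumption, and all names below are hypothetical:

```c
#include <stddef.h>

/* Illustrative values: NR_CPUS from the subject line, the 64K limit as
 * described for the ia64 linker script in this thread. */
#define NR_CPUS_EXAMPLE    4096
#define IA64_PERCPU_LIMIT  (64 * 1024)

/* Static percpu footprint when the whole array lives in .data.percpu.
 * Assumes one element of elem_size bytes per possible CPU. */
size_t static_array_footprint(size_t nr_cpus, size_t elem_size)
{
	return nr_cpus * elem_size;
}

/* Static percpu footprint when only a pointer is kept per cpu and the
 * array itself is allocated at runtime. */
size_t pointer_footprint(void)
{
	return sizeof(void *);
}
```

Under those assumptions static_array_footprint(NR_CPUS_EXAMPLE, 8) comes to
32768 bytes, i.e. half of the 64K budget for this one array, whereas the
pointer variant costs only sizeof(void *) of static percpu space.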

IA64 should be fixed really - we can cross the 64K percpu data limit
any time we add a few more pages of percpu data to the kernel - the
scheduler just happened to be the one to cross it this time.

The scheduler change in 34d76c41 was done two months ago and has
been upstream for a month, so this complaint is rather late and at
minimum a certain degree of honesty about the situation is warranted.

Saying that all static percpu data must be below 64K, when a violation
will only be noticed once IA64 gets its testing act together months after
the offending code was created, is silly. If you want to enforce such a
limit make it testable in a _timely_ fashion. Or fix the limit really.

Ingo