Problem with percpu values when bringing up second CPU?

From: Jeremy Fitzhardinge
Date: Tue Aug 04 2009 - 18:58:40 EST


Hi,

I just tracked down a bug to a change in which I converted one of my
Xen event channel variables, the one used for masking event channels,
to a percpu variable.

The symptom was that shortly after bringing up the second CPU, the first
CPU's timer events stopped arriving, apparently because they had become
masked.

The event channel masks are declared as:

#define NR_EVENT_CHANNEL_LONGS (NR_EVENT_CHANNELS/BITS_PER_LONG)
static DEFINE_PER_CPU(unsigned long,
		      cpu_evtchn_mask[NR_EVENT_CHANNEL_LONGS]) =
	{[0 ... NR_EVENT_CHANNEL_LONGS-1] = ~0ul }; /* everything masked by default */


My theory is that when the second CPU comes up, the kernel allocates
separate percpu areas for each CPU but somehow fails to copy CPU 0's
percpu data over accurately: either it isn't copying all of it (i.e.,
it uses the initialized values rather than the current values), or it
fails to copy the values in an interrupt-atomic way.

Does this sound plausible?
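
To make the first half of that theory concrete, here is a minimal
user-space sketch of the failure mode I suspect. It is not kernel code
and does not claim to reflect what the percpu allocator actually does;
init_image, boot_area, new_area and the helper functions are all
hypothetical names invented for the illustration. It only shows that
if a new area is filled from the initializer image rather than from
the live values, a channel that was unmasked at runtime ends up masked
again.

```c
/* User-space illustration only, not kernel code. All names are hypothetical. */
#include <assert.h>
#include <limits.h>
#include <string.h>

#define SKETCH_NR_LONGS 4	/* stand-in for NR_EVENT_CHANNEL_LONGS */
#define SKETCH_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Stand-in for the static initializer: everything masked by default. */
static const unsigned long init_image[SKETCH_NR_LONGS] =
	{ ~0ul, ~0ul, ~0ul, ~0ul };

/* Area CPU 0 uses before the second CPU is brought up. */
static unsigned long boot_area[SKETCH_NR_LONGS];

/* Area CPU 0 is assumed to be moved to afterwards. */
static unsigned long new_area[SKETCH_NR_LONGS];

/* Early boot: the live area starts out as a copy of the initializer. */
static void early_boot(void)
{
	memcpy(boot_area, init_image, sizeof(boot_area));
}

/* Clear the mask bit for one event channel. */
static void unmask(unsigned long *mask, unsigned int chn)
{
	assert(chn < SKETCH_NR_LONGS * SKETCH_BITS);
	mask[chn / SKETCH_BITS] &= ~(1ul << (chn % SKETCH_BITS));
}

/* Return 1 if the channel's mask bit is set, 0 otherwise. */
static int is_masked(const unsigned long *mask, unsigned int chn)
{
	assert(chn < SKETCH_NR_LONGS * SKETCH_BITS);
	return (mask[chn / SKETCH_BITS] >> (chn % SKETCH_BITS)) & 1ul;
}

/* Suspected behaviour: fill the new area from the initializer image,
 * which throws away every change made since early boot. */
static void relocate_from_image(void)
{
	memcpy(new_area, init_image, sizeof(new_area));
}

/* Expected behaviour: carry over the current (live) values. */
static void relocate_from_live(void)
{
	memcpy(new_area, boot_area, sizeof(new_area));
}
```

Calling early_boot(), then unmask(boot_area, 5), then
relocate_from_image() leaves channel 5 masked in new_area, which would
match the symptom; relocate_from_live() keeps it unmasked. The sketch
says nothing about the second possibility (a copy that races with an
interrupt).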

When I convert this back to an ad-hoc percpu variable (an array indexed
by CPU number), it works again. It also works with percpu data if I
boot with maxcpus=1.
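
For reference, the ad-hoc variant has roughly the following shape. This
is a simplified user-space sketch rather than the actual Xen code: the
SKETCH_* constants, evtchn_mask_sketch and the helpers are hypothetical
names, and the sizes are arbitrary. The point is only that the storage
is an ordinary static array, so it stays at a fixed address and is not
affected by how the percpu areas are set up.

```c
/* User-space illustration only, not the actual Xen code.
 * All names and sizes are hypothetical. */
#include <assert.h>
#include <limits.h>

#define SKETCH_NR_CPUS     4
#define SKETCH_NR_CHANNELS 256
#define SKETCH_BITS        (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_NR_LONGS    (SKETCH_NR_CHANNELS / SKETCH_BITS)

/* One row of mask bits per CPU, indexed directly by CPU number. */
static unsigned long evtchn_mask_sketch[SKETCH_NR_CPUS][SKETCH_NR_LONGS];

/* Mask every channel on every CPU (the default state). */
static void mask_all(void)
{
	unsigned int cpu, i;

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		for (i = 0; i < SKETCH_NR_LONGS; i++)
			evtchn_mask_sketch[cpu][i] = ~0ul;
}

/* Clear the mask bit for one channel on one CPU only. */
static void unmask_on_cpu(unsigned int cpu, unsigned int chn)
{
	assert(cpu < SKETCH_NR_CPUS && chn < SKETCH_NR_CHANNELS);
	evtchn_mask_sketch[cpu][chn / SKETCH_BITS] &=
		~(1ul << (chn % SKETCH_BITS));
}

/* Return 1 if the channel is masked on the given CPU, 0 otherwise. */
static int masked_on_cpu(unsigned int cpu, unsigned int chn)
{
	assert(cpu < SKETCH_NR_CPUS && chn < SKETCH_NR_CHANNELS);
	return (evtchn_mask_sketch[cpu][chn / SKETCH_BITS] >>
		(chn % SKETCH_BITS)) & 1ul;
}
```

After mask_all() and unmask_on_cpu(0, 70), channel 70 is unmasked on
CPU 0 and still masked on every other CPU.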

Also, because we don't have large pages under Xen, it always allocates
percpu as 4k pages:

PERCPU: Allocated 21 4k pages, static data 82080 bytes

Thanks,
J