Re: BUG_ON(!cpus_equal(cpumask, tmp));

From: Andrew Morton
Date: Mon Mar 29 2004 - 19:26:01 EST


Andrew Morton <akpm@xxxxxxxx> wrote:
>
> "Martin J. Bligh" <mbligh@xxxxxxxxxxx> wrote:
> >
> > The bug is on "BUG_ON(!cpus_equal(cpumask, tmp));" in flush_tlb_others
> > This was from 2.6.1-rc1-mjb1, and seems to be a race on shutdown
> > (prints after "Power Down"), but I've no reason to believe it's specific
> > to the patchset I have - it's not an area I'm touching, I think.
> >
> > I presume we've got a race between taking CPUs offline and the
> > tlbflush code ... tlb_flush_mm reads the value from mm->cpu_vm_mask,
> > and then presumably some other cpu changes cpu_online_map before it
> > gets to calling flush_tlb_others ... does that sound about right?
>
> Looks like it, yes. I don't think there's a sane way of fixing that - we'd
> need the flush_tlb_others() caller to hold some lock which keeps the cpu
> add/remove code away.
>
> I'd propose removing the assertion?

If the going-away CPU can still take IPIs we're OK, I think. If it cannot,
we have a problem. Can you do a s/BUG_ON/WARN_ON/ and run with that for a
while? Check that the warnings come out and the machine doesn't go crump?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/