Re: BUG?: kernel does not (re)set irq smp_affinity to reboot_cpu

From: Russell King - ARM Linux
Date: Mon Jun 27 2016 - 07:32:02 EST


On Mon, Jun 27, 2016 at 12:55:26PM +0200, Hans de Goede wrote:
> Hi Russel,
>
> On 27-06-16 11:45, Russell King - ARM Linux wrote:
> >On Mon, Jun 27, 2016 at 10:13:05AM +0100, Marc Zyngier wrote:
> >>I'm wondering if that's not an effect of this patch:
> >>
> >>https://lkml.org/lkml/2015/9/24/138
> >>
> >>missing on the ARM side (the corresponding arm64 patch is 217d453d473c).
> >
> >No, because we don't take the other CPUs offline through CPU hotplug at
> >reboot - we stop them. That's because CPU hotplug involves scheduling,
> >and a reboot can't be scheduled as it can happen from IRQ contexts.
> >
> >For a long time, we have not supported IRQs on any CPU after the system
> >has gone down for halt/reboot/poweroff etc:
> >
> >ipi_cpu_stop() disables IRQs and FIQs before entering an infinite loop.
> >machine_{halt,power_off,restart}() in arch/arm/kernel/reboot.c disables
> >IRQs on the requesting CPU.
> >
> >So, IRQs get disabled on _all_ CPUs. Code after this point should not
> >re-enable IRQs to be able to use drivers, which sounds like what's
> >happening in Hans' scenario. Remember, as I've said above, these paths
> >should not even be scheduling, and should never be reliant on receiving
> >interrupts. *Especially* as they can themselves be called from IRQ
> >context.
>
> First of all thanks for your input.
>
> Note this is not reboot, this is poweroff.

I think I covered that - all the paths are identical in the ARM
architecture code, and have been identical in this respect well before
any of the drivers you've pointed out.

> And for poweroff many (ARM) boards depend on working i2c, which
> depends on irqs, for example all these mfd drivers:
>
> drivers/mfd/rn5t618.c
> drivers/mfd/twl4030-power.c
> drivers/mfd/palmas.c
> drivers/mfd/dm355evm_msp.c
> drivers/mfd/tps6586x.c
> drivers/mfd/retu-mfd.c
> drivers/mfd/max8907.c
> drivers/mfd/tps65910.c
> drivers/mfd/tps80031.c
> drivers/mfd/rk808.c
> drivers/mfd/axp20x.c
>
> Define pm_power_off and use i2c.

Right, so these drivers are all buggy, and need fixing.

> So although you may very well be right that using irqs to implement poweroff
> is not how things should be, in practice we've been using them for this for
> quite a while now and this usually works fine.

... and they're all violating the conditions set down by the
architecture for an orderly poweroff - presumably the reason this
works for !SMP cases is because somewhere along the path, they're
re-enabling IRQs behind the back of architecture code.

> So it seems that the assumption that machine_power_off may be called
> from irq context is not always true, specifically it is only true on
> certain platforms (mach-ixp4xx, omap4, omap5 and whatever uses
> ab8500.c). I would expect the pm_power_off implementations on these
> platforms to indeed not use irqs themselves, that would indeed be
> bad.

Right, but the overriding thing here is that it _may_ be called from
IRQ context _and_ pm_power_off() is called with IRQs disabled. That
second one is the more important point - pm_power_off() handlers are
called in a non-schedulable context.
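
To illustrate what a handler that respects this constraint looks like,
here is a minimal user-space sketch, not kernel code. It assumes the
poweroff write can be completed by polling a status bit instead of
waiting for a completion interrupt. Everything in it (struct fake_i2c,
fake_i2c_done(), poweroff_write_polled()) is hypothetical and models no
real controller or driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a bus controller; not a real device. */
struct fake_i2c {
    uint32_t polls_until_done; /* polls needed before the transfer completes */
    bool irqs_enabled;         /* models the CPU IRQ mask; must stay false */
};

/* Poll the (modelled) status bit once. */
static bool fake_i2c_done(struct fake_i2c *c)
{
    if (c->polls_until_done == 0)
        return true;
    c->polls_until_done--;
    return false;
}

/*
 * Busy-wait for completion with a bounded number of polls.
 * Never sleeps and never touches the IRQ mask.
 * Returns 0 on success, -1 on timeout.
 */
static int poweroff_write_polled(struct fake_i2c *c, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if (fake_i2c_done(c))
            return 0;
    }
    return -1;
}
```

The point of the sketch is only that completion is detected by the
caller spinning, so nothing in the path needs to schedule or to have
interrupts delivered.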

> Which brings us back to our original problem, how do we fix
> irq smp_affinity on power off ?

That only needs fixing if we accept that pm_power_off() should be
called with IRQs enabled, which we haven't ascertained yet.

Even on x86, pm_power_off() is generally called with IRQs disabled,
and more - the APICs are disabled along with the system IOMMU in the
case of x86_64. These are only avoided if the reboot mode is set to
"force" (reboot_force).

Now, we could do as you are suggesting, and route IRQs to the
remaining CPU via all shutdown paths, but that would be papering over
the fundamental bug here: if a function is called with IRQs disabled,
it (or any called function) has no business re-enabling IRQs.
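
The rule can be shown with a small user-space model, again not kernel
code. The model_irq_*() helpers and the two callees are hypothetical;
they only mimic the difference between the kernel's
local_irq_save()/local_irq_restore() pair, which puts back whatever
state the caller had, and local_irq_disable()/local_irq_enable(), which
unconditionally leaves IRQs on.

```c
#include <stdbool.h>

/* Models the CPU's IRQ mask: true means interrupts are enabled. */
static bool model_irqs_on = true;

/* Disable IRQs and return the previous state. */
static bool model_irq_save(void)
{
    bool was_on = model_irqs_on;
    model_irqs_on = false;
    return was_on;
}

/* Put back exactly the state that model_irq_save() returned. */
static void model_irq_restore(bool was_on)
{
    model_irqs_on = was_on;
}

static void model_irq_disable(void) { model_irqs_on = false; }
static void model_irq_enable(void)  { model_irqs_on = true; }

/* Buggy: leaves IRQs enabled even if the caller had them disabled. */
static void callee_buggy(void)
{
    model_irq_disable();
    /* ... critical section ... */
    model_irq_enable();
}

/* Safe: the caller's IRQ state is preserved across the call. */
static void callee_safe(void)
{
    bool was_on = model_irq_save();
    /* ... critical section ... */
    model_irq_restore(was_on);
}
```

Called with the model's IRQs off, callee_safe() returns with them still
off, while callee_buggy() returns with them on, which is the "behind the
back of architecture code" case.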
