Re: Re: [PATCH] [BUGFIX] crash/ioapic: Prevent crash_kexec() from deadlocking of ioapic_lock

From: Yoshihiro YUNOMAE
Date: Thu Aug 22 2013 - 04:38:25 EST


(2013/08/20 23:27), Don Zickus wrote:
On Tue, Aug 20, 2013 at 03:12:32AM -0700, Eric W. Biederman wrote:
Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@xxxxxxxxxxx> writes:

Hi Ingo,

Thank you for fixing typos!
OK, I'll fix them and rename to ioapic_zap_locks().

Thank you again!


The better fix for this would be to remove the disable_IO_APIC call from
crash_kexec.

I know last time it was investigated the kernel was very close to
working without needing that, and the code will be much more robust in
the long term if we can avoid disabling them in the crashing kernel.

Yoshihiro, is there any chance you can look into removing the
disable_IO_APIC entirely?

The apic disablement and the disable_IO_APIC exist entirely due to
limitations in the kernel boot path.

Yup. We went down this path a year ago:

https://lkml.org/lkml/2012/2/2/331

Then we got sidetracked and talked about removing the lapic stuff at
shutdown too:

http://lists.infradead.org/pipermail/kexec/2012-February/006017.html
(sorry couldn't find lkml link for some reason)

And the second patch was committed.

However, it was quickly reverted when Yinghai Lu noticed a problem:

https://lkml.org/lkml/2012/2/11/143

The problem stemmed from the fact that the nmi_watchdog caused an NMI in
the middle of transitioning between the two kernels (we didn't shut down
the lapic) and caused a reset (there is no NMI handler in purgatory).

I think I dropped the ball in investigating how to write an idt for the
purgatory code to handle spurious NMIs.

Regardless of all that, I think if we stick to just removing the ioapic
shutdown code (ie the first patch linked above), we should be ok. I
believe my testing went smoothly. It was the lapic stuff that needed more
tweaking.

So, I agree with Eric, let's remove the disable_IO_APIC() stuff and keep
the code simpler.

Thank you for commenting on my patch.
I didn't know you had already submitted patches for this deadlock
problem.

I can't say definitively right now that removing disable_IO_APIC()
causes no problems. However, my patch should work well (and has
already been merged into the -tip tree). So how about taking my patch
first, and then discussing the removal of disable_IO_APIC()?

Thanks,
Yoshihiro YUNOMAE

--
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: yoshihiro.yunomae.ez@xxxxxxxxxxx

