Re: [PATCH 08/10] NOTIFIER: Take over TIF_MCE_NOTIFY and implement task return notifier

From: Avi Kivity
Date: Mon Jun 13 2011 - 08:47:35 EST


On 06/13/2011 03:40 PM, Borislav Petkov wrote:

<snippage>

> >
> > So: MCE uses irq_work_queue() -> wake up a realtime task -> process the
> > mce, unmap the page, go back to sleep.
>
> Yes, this is basically it. However, the other cores cannot schedule a
> task which maps the compromised page until we have finished finding
> and 'fixing' all the mappers.
>
> So we either hold off the cores from executing userspace - in that
> case no need to mark a task as unsuitable to run - or use the task
> return notifiers in patch 10/10.
>
> HOWEVER, AFAICT, if the page is mapped multiple times,
> killing/recovering the current task doesn't prevent another core from
> touching it and causing a follow-up MCE. So holding off all the cores
> from scheduling userspace in some manner might be the superior solution.
> Especially if you don't execute the #MC handler on all CPUs as is the
> case on AMD.


That's basically impossible, since the other cores may in fact already
be executing userspace, with the next instruction accessing the bad
page. In fact the access may have been started simultaneously with the
one that triggered the #MC.

The best you can do is IPI everyone as soon as you've caught the #MC,
but you have to be prepared for multiple #MCs for the same page. Once
you have that, global synchronization is not so important anymore.
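
To make "be prepared for multiple #MC for the same page" concrete, here
is a rough userspace sketch, not kernel code and not what the patch set
does. It assumes a per-page flag that the first handler claims
atomically; later handlers for the same page see the flag and skip the
recovery. All names (NPAGES, poisoned, recover_page, handle_mce) are
made up for illustration, and recover_page() is only a stand-in for
"IPI everyone and unmap the page from all mappers".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NPAGES 8 /* hypothetical, tiny "physical memory" for the sketch */

/* One flag per page; static storage zero-initializes them to false. */
static atomic_bool poisoned[NPAGES];

/* Counts how often the expensive recovery path actually ran. */
static atomic_int recoveries;

/* Hypothetical stand-in for "IPI everyone + unmap from all mappers". */
static void recover_page(unsigned int pfn)
{
	(void)pfn;
	atomic_fetch_add(&recoveries, 1);
}

/*
 * Called once per simulated #MC. Returns true if this caller won the
 * race and performed the recovery, false if the page was already
 * claimed by an earlier (or concurrent) #MC for the same page.
 */
static bool handle_mce(unsigned int pfn)
{
	assert(pfn < NPAGES);

	/* atomic_exchange returns the previous value of the flag. */
	if (atomic_exchange(&poisoned[pfn], true))
		return false;

	recover_page(pfn);
	return true;
}
```

With this shape, calling handle_mce(3) twice runs the recovery only
once: the first call returns true, the second returns false. The flag
is claimed with a single atomic exchange, so two cores taking a #MC on
the same page at the same moment cannot both win, and no global
synchronization is needed.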

--
error compiling committee.c: too many arguments to function
