Re: [PATCH 07/10] MCE: replace mce.c use of TIF_MCE_NOTIFY with user_return_notifier

From: Avi Kivity
Date: Sun Jun 12 2011 - 06:30:41 EST


On 06/12/2011 01:24 PM, Borislav Petkov wrote:
> On Sun, Jun 12, 2011 at 04:29:41AM -0400, Avi Kivity wrote:
> > On 06/10/2011 12:35 AM, Luck, Tony wrote:
> > > From: Tony Luck <tony.luck@xxxxxxxxx>
> > >
> > > Ingo wrote:
> > > > We already have a generic facility to do such things at
> > > > return-to-userspace: _TIF_USER_RETURN_NOTIFY.
> > >
> > > This is just a proof of concept patch ... before this can become
> > > real the user-return-notifier code would have to be made NMI
> > > safe (currently it uses hlist_add_head/hlist_del, which would
> > > need to be changed to Ying's NMI-safe single threaded lists).
> >
> > You could use irq_work_queue() to push this into an irq context, which
> > is user-return-notifier safe.
>
> Maybe I'm missing something but it looks like irq_work_queue() queues
> work which is run in irq_work_run() with IRQs disabled. However, user
> return notifiers are run after IRQs get enabled in entry_64.S. And we
> want to run memory_failure() with IRQs enabled.
>
> More importantly, we want to be able to do the following:
>
> * run #MC handler which queues work
>
> * when returning to userspace, preempt and schedule that previously
> queued work _before_ the process that caused the MCE gets to execute.

Yes.

> Imagine this scenario:
>
> Your userspace process causes a data cache read error due to either
> alpha particles or maybe because the DRAM device containing the process
> page is faulty and generates ECC errors which the ECC code cannot
> correct, i.e. an uncorrectable error we definitely want to handle; IOW
> Action Required MCE.
>
> Now, if you get lucky and this page is mapped only by the process that
> caused the MCE, you could unmap it, mark it PageReserved and cause the
> process to refault. But in order to do that, you want to execute the
> memory_failure() handler _before_ you schedule the process again.
>
> In the instruction cache read error case, you don't have processor
> context to return to (or you're being too conservative and don't want to
> risk it) so you kill the process, which is pretty easy to do.
>
> Does that make a bit more sense? Tony?


You're missing the flow. The MCE handler calls irq_work_queue(); the
queued irq work then schedules a user return notifier, which does any
needed processing in task context.
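
Roughly, that flow can be sketched as a small userspace model in C. This
is only an illustration, not kernel code: the model_* functions, struct
model_cpu and the context enum are hypothetical stand-ins, and the three
steps are simply called in order rather than being driven by real
interrupts.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: every name here is a hypothetical stand-in. */
enum model_ctx { CTX_NMI, CTX_IRQ, CTX_TASK };

struct model_cpu {
	enum model_ctx ctx;          /* context the "CPU" is in */
	bool irqs_on;                /* are interrupts enabled? */
	bool irq_work_pending;       /* models a queued irq work item */
	bool notifier_armed;         /* models a registered user return notifier */
	unsigned long bad_pfn;       /* page frame reported by the machine check */
	unsigned long recovered_pfn; /* page frame recovery handled, 0 if none */
};

/* Stands in for memory_failure(): needs task context with IRQs enabled. */
static void model_memory_failure(struct model_cpu *cpu, unsigned long pfn)
{
	assert(cpu->ctx == CTX_TASK && cpu->irqs_on);
	cpu->recovered_pfn = pfn;
}

/* Step 1: #MC handler, NMI-like context. It only records and queues. */
static void model_mce_handler(struct model_cpu *cpu, unsigned long pfn)
{
	cpu->ctx = CTX_NMI;
	cpu->irqs_on = false;
	cpu->bad_pfn = pfn;
	cpu->irq_work_pending = true; /* stands in for irq_work_queue() */
}

/* Step 2: irq work runs in IRQ context, IRQs off. It only arms the notifier. */
static void model_irq_work_run(struct model_cpu *cpu)
{
	cpu->ctx = CTX_IRQ;
	cpu->irqs_on = false;
	if (cpu->irq_work_pending) {
		cpu->irq_work_pending = false;
		cpu->notifier_armed = true;
	}
}

/* Step 3: return to user, task context, IRQs on. The notifier runs first. */
static void model_return_to_user(struct model_cpu *cpu)
{
	cpu->ctx = CTX_TASK;
	cpu->irqs_on = true;
	if (cpu->notifier_armed) {
		cpu->notifier_armed = false;
		model_memory_failure(cpu, cpu->bad_pfn);
	}
	/* only after this point would the faulting process execute again */
}

/* Runs the three steps in order; returns the page frame that was handled. */
static unsigned long model_handle_mce(unsigned long pfn)
{
	struct model_cpu cpu = { .ctx = CTX_TASK, .irqs_on = true };

	model_mce_handler(&cpu, pfn);
	model_irq_work_run(&cpu);
	model_return_to_user(&cpu);
	return cpu.recovered_pfn;
}
```

Calling model_handle_mce(0x1234) walks all three steps and returns
0x1234. The assert in model_memory_failure() would fire if recovery were
attempted directly from the NMI or IRQ step.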


--
error compiling committee.c: too many arguments to function
