>
> If running into the MCE again is really bad, then you need something
> more, since other threads (or other processes) could run into the same
> page as well.
Well, the #MC handler runs on all CPUs on Intel so what we could do is
set the current task to TASK_STOPPED or _UNINTERRUPTIBLE or something
that doesn't make it viable for scheduling anymore.
Then we can take our time running the notifier since the "problematic"
task won't get scheduled until we're done. Then, when we finish
analyzing the MCE, we either kill it so it has to handle SIGKILL the
next time it gets scheduled or we unmap its page with error in it so
that it #PFs on the next run.
But no, I don't think we can catch all possible situations where a page
is mapped by multiple tasks ...
> If not, do we care? Let it hit the MCE again, as long as
> we'll catch it eventually.
... and in that case we are going to have to let it hit again. Or is
there a way to get to the tasklist of all the tasks mapping a page in
atomic context, stop them from scheduling and run the notifier work in
process context?
Hmmm..