Re: [RFC] x86_64: A real proposal for iret-less return to kernel

From: Borislav Petkov
Date: Thu May 22 2014 - 04:50:51 EST


On Thu, May 22, 2014 at 09:03:34AM +0900, Linus Torvalds wrote:
> No, that's fine, if it's a thread-synchronous thing (ie a memory load
> that causes errors). But for NMI handlers, that is irrelevant: if
> the NMI code itself gets memory errors, the machine really is dead.
> Let's face it, we're going to panic and reboot, there's no other
> real alternative (other than the "just log it, pray, and continue
> in unstable mode", which is actually a perfectly valid alternative
> in many cases, since people don't necessarily care deeply and have
> written their distributed algorithms to not rely on any particular
> thread too much, and will verify the end results anyway).

Oh, definitely.

In fact, we'll panic on uncorrectable errors in any unmovable memory,
i.e. kernel code and data, because we simply can't recover from them.
Anything that happens in the NMI handler most probably falls in that
category so...

I was simply pointing out the fact that Andy's algo needs to pay
attention to MCEs and other higher prio exceptions happening.

> The problem is literally the non-synchronous things (like another
> CPU having problems) where things like broadcast will actually turn
> a non-thread-synchronous thing into problems for other CPU's. Then,
> a user-mode memory access error (that we *can* recover from, perhaps
> by killing the process and isolating the page) can turn into a
> unrecoverable error on another CPU because it got interrupted at a
> point where it really couldn't afford to be interrupted.

That definitely sounds like a nasty thing, sure.

Although, there's at least one problem I've been thinking about wrt the
non-broadcast MCE: it is pretty hard to handle an uncorrectable memory
error in a page which is shared by multiple threads running on multiple
cores.

So normally one of the cores will detect it, raise an MCE and deal with
it but there's nothing stopping the other cores from touching that data.

One of the possible things which could happen is, if the other cores
consume that data, they will trigger an MCE too and will have to see
that the first core which detected the error is about to poison that
page so their job in the MCE handler is done and they have to exit.

I'm not saying this is undoable but it is a bit tricky and some
scenarios would need to be played out first to know better.
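
To make the "first core poisons, later cores bail out" idea concrete,
here is a rough userspace sketch in C11. It is only an illustration under
stated assumptions: struct page_err, the PAGE_* states, mce_claim_page()
and mce_finish_poison() are hypothetical names, not existing kernel
interfaces, and real MCE-context constraints are ignored.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-page recovery state; not a real kernel structure. */
enum page_err_state { PAGE_OK = 0, PAGE_POISONING, PAGE_POISONED };

struct page_err {
	_Atomic int state;
};

/*
 * Called from a (hypothetical) per-core MCE handler.
 * Returns true if this core won the race and has to do the recovery
 * work; false if another core already claimed the page, in which case
 * the caller has nothing left to do and simply exits the handler.
 */
static bool mce_claim_page(struct page_err *p)
{
	int expected = PAGE_OK;

	/* Only one core can move the page from PAGE_OK to PAGE_POISONING. */
	return atomic_compare_exchange_strong(&p->state, &expected,
					      PAGE_POISONING);
}

/* Called by the winning core once the page has been isolated. */
static void mce_finish_poison(struct page_err *p)
{
	assert(atomic_load(&p->state) == PAGE_POISONING);
	atomic_store(&p->state, PAGE_POISONED);
}
```

The compare-and-exchange is what lets a second core that consumes the
same bad data see that recovery is already under way. What this sketch
does not answer is the tricky part: what the losing cores do with the
interrupted thread while the page is still being poisoned.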

So, to a certain extent, broadcasting the MCE and keeping the cores in a
holding pattern, not touching any userspace stuff might've been one way
to deal with situations like that. It certainly makes things easier for
that particular scenario.

I'm not saying it was a good idea due to the point you're making - maybe
they should've talked to software people first. I'm basically trying to
explain to myself what the reasoning behind that broadcasting might be.

> It appears Intel is fixing their braindamage.

Yep, we'd still need to deal with the existing systems but we don't have
a choice anyway.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.