Re: x86, microcode: BUG: microcode update that changes x86_capability

From: Andy Lutomirski
Date: Wed Sep 24 2014 - 11:02:59 EST


On Wed, Sep 24, 2014 at 7:56 AM, Henrique de Moraes Holschuh
<hmh@xxxxxxxxxx> wrote:
> On Tue, 23 Sep 2014, Borislav Petkov wrote:
>> On Fri, Sep 19, 2014 at 01:42:17PM -0300, Henrique de Moraes Holschuh wrote:
>> > 1. offline a "guinea pig" group of "cpus", i.e. an entire "microcode update
>> > unit" that doesn't include the BSP. This is going to be a pain, as what
>> > composes a "microcode update unit" is not set in stone, and could change in
>> > a future microarch.
>>
>> I'm pretty sure it is very dangerous to run with different microcode
>> revisions on different cores. Your plan won't fly and I have a hard time
>> understanding why one would do such thing even if it did work.
>
> I don't want that plan to fly, it is too complex and I wrote as much at
> the end of that email. I won't bother with the situations where it would
> be helpful, they're not very interesting.
>
>
> On the topic of microcode revision skew in a multi-processor system:
>
> For a long time we had an Extremely Bad userspace interface that required
> userspace to trigger the microcode update once per cpu, and it fetched the
> microcode from userspace once per cpu.
>
> This made for an absurdly large time window during which we'd have
> microcode revision skew across cpus, and yet nothing blew up sky-high. If
> microcode revision skew was not generally safe, we'd have had a lot of
> trouble already.
>
> In fact, we still run the system with microcode revision skew while the
> microcode update is taking place through the regular microcode driver, as
> it is serialized one cpu at a time, and the other cpus are active and
> running.
>
> I don't know about AMD, but on Intel, the time it takes to update the
> microcode on a core is anything but negligible[1], so the microcode
> version skew window still exists, and it is not small. It is much smaller
> than it once was, but it is still there.
>
> The only way to really minimize the risk of microcode version skew is to
> limit oneself to firmware and early initramfs microcode updates.
>
>> If we're going to have to hide stuff which software might be using, I
>> don't see a way around rebooting.
>
> Nor do I.
>
> But IMHO we still need to detect and do something smart when
> x86_capability changes due to a microcode update.
>
> And I'd really prefer it to be "update x86_capability, warn the user and
> carry on" for anything that is not going to crash the kernel. Several
> distros will really want this backported to -stable, as the older kernels
> cannot do early microcode updates.
>
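
[Illustration of the "update x86_capability, warn the user and carry on"
idea quoted above. This is only a minimal userspace sketch, not kernel
code and not anything proposed in this thread. It assumes the
capabilities are held as an array of 32-bit words, and the name
cap_diff_and_update() and the message format are hypothetical. It
compares the bitmap recorded before the update with one re-read
afterwards, warns about every bit that changed, and then adopts the new
values.]

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: compare the capability words recorded before a
 * microcode update (cur) with the words re-read afterwards (fresh).
 * Warn on stderr about every bit that changed, copy the new words into
 * cur, and return the number of changed bits.
 */
static int cap_diff_and_update(uint32_t *cur, const uint32_t *fresh, int nwords)
{
    int changed = 0;

    assert(nwords >= 0);

    for (int w = 0; w < nwords; w++) {
        uint32_t delta = cur[w] ^ fresh[w];   /* bits that differ */

        for (int b = 0; b < 32; b++) {
            if (!(delta & (UINT32_C(1) << b)))
                continue;

            fprintf(stderr,
                    "warning: capability word %d bit %d %s after microcode update\n",
                    w, b,
                    ((fresh[w] >> b) & 1) ? "appeared" : "disappeared");
            changed++;
        }

        cur[w] = fresh[w];                    /* carry on with the new view */
    }

    return changed;
}
```

[A caller would snapshot the words before the update, re-read them
afterwards, and treat a non-zero return value as the cue to warn the
user. Deciding which disappeared bits are unsafe to carry on with is
the hard part and is deliberately left out of this sketch.]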

I'm trying to see if Intel is willing to document any additional
controls for the TSX bits in this ucode. No word yet, but I might
hear something soon.

--Andy

>
> [1] Intel processors take from 200 thousand cycles to several million
> cycles per core to successfully apply a microcode update. Verified
> using get_cycles() right before and right after the WRMSR 0x79.
> Variance was really high, about 10%. My limited testing matched what
> has been previously reported by Ben Hawkes.
>
> --
> "One disk to rule them all, One disk to find them. One disk to bring
> them all and in the darkness grind them. In the Land of Redmond
> where the shadows lie." -- The Silicon Valley Tarot
> Henrique Holschuh



--
Andy Lutomirski
AMA Capital Management, LLC