Re: [PATCH] Parallel microcode update in Linux

From: Mihai Carabas
Date: Mon Sep 02 2019 - 07:10:52 EST


On 02.09.2019 10:39, Pavel Machek wrote:
Hi!

+ u64 p0, p1;
int ret;

atomic_set(&late_cpus_in, 0);
atomic_set(&late_cpus_out, 0);

+ p0 = rdtsc_ordered();
+
ret = stop_machine_cpuslocked(__reload_late, NULL, cpu_online_mask);
+
+ p1 = rdtsc_ordered();
+
if (ret > 0)
microcode_check();

pr_info("Reload completed, microcode revision: 0x%x\n", boot_cpu_data.microcode);

+ pr_info("p0: %llu, p1: %llu, diff: %llu\n", p0, p1, p1 - p0);
+
return ret;
}

We have used a machine with a broken microcode in BIOS and no microcode in
initramfs (to bypass early loading).

Here are the results for parallel loading (we made two measurements):

[ 18.197760] microcode: updated to revision 0x200005e, date = 2019-04-02
[ 18.201225] x86/CPU: CPU features have changed after loading microcode, but might not take effect.
[ 18.201230] microcode: Reload completed, microcode revision: 0x200005e
[ 18.201232] microcode: p0: 118138123843052, p1: 118138153732656, diff: 29889604

Here are the results of serial loading:

[ 17.542518] microcode: updated to revision 0x200005e, date = 2019-04-02
[ 17.898365] x86/CPU: CPU features have changed after loading microcode, but might not take effect.
[ 17.898370] microcode: Reload completed, microcode revision: 0x200005e
[ 17.898372] microcode: p0: 149220216047388, p1: 149221058945422, diff: 842898034

One can see that the difference is more than an order of magnitude (roughly 28x in TSC cycles).

Well, that's impressive, but it seems to finish 300 msec later? Where does that difference
come from / how much real time do you gain by this?

The difference comes from the large number of cores/threads the machine has: 72 in this case, and some machines have more. As the commit message says, the microcode used to be applied serially, one CPU at a time; with this patch it is updated in parallel on all cores.

300ms may not seem like much, but it is enough to disrupt some critical services (e.g. storage): for those 300ms nothing else executes on the CPUs. Moreover, the 300ms grows when the machine is fully loaded with guests.


Yes, but if you look at the dmesg output I quoted, the parallel microcode update
actually finished 300msec _later_.

Those are the serial loading results (they follow the line "Here are the results of serial loading:"); the parallel results come first. Am I missing something?


Pavel