Re: [RFC 1/2] x86/bugs: Disable coresched on hardware that does not need it

From: Alexander Graf
Date: Thu Nov 12 2020 - 15:02:14 EST




On 12.11.20 16:28, Joel Fernandes wrote:

On Thu, Nov 12, 2020 at 03:52:32PM +0100, Alexander Graf wrote:


On 12.11.20 14:40, Joel Fernandes wrote:

On Wed, Nov 11, 2020 at 11:29:37PM +0100, Alexander Graf wrote:


On 11.11.20 23:15, Joel Fernandes wrote:

On Wed, Nov 11, 2020 at 5:13 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:

On Wed, Nov 11, 2020 at 5:00 PM Alexander Graf <graf@xxxxxxxxxx> wrote:
On 11.11.20 22:14, Joel Fernandes wrote:
Some hardware, such as certain AMD variants, doesn't have cross-HT MDS/L1TF
issues. Detect this and don't enable core scheduling, as it can
needlessly slow the device down.

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index dece79e4d1e9..0e6e61e49b23 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -152,6 +152,14 @@ void __init check_bugs(void)
#endif
}

+/*
+ * Do not need core scheduling if CPU does not have MDS/L1TF vulnerability.
+ */
+int arch_allow_core_sched(void)
+{
+	return boot_cpu_has_bug(X86_BUG_MDS) || boot_cpu_has_bug(X86_BUG_L1TF);

Can we make this more generic and user settable, similar to the L1 cache
flushing modes in KVM?

I am not 100% convinced that there are no other thread sibling attacks
possible without MDS and L1TF. If I'm paranoid, I want to still be able
to force enable core scheduling.

In addition, we are also using core scheduling as a poor man's mechanism
to give customers consistent performance for virtual machine thread
siblings. This is important irrespective of CPU bugs. In such a
scenario, I want to force enable core scheduling.

Ok, I can make it a new kernel command line option with:
coresched=on
coresched=secure (only if HW has MDS/L1TF)
coresched=off

Also, I would keep "secure" as the default. (And probably, we should
modify the informational messages in sysfs to reflect this..)
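
For illustration, the three proposed values could be modelled roughly as
below. This is a minimal userspace sketch, not code from the patchset: the
enum, coresched_parse() and coresched_allowed() are hypothetical names, the
two bool parameters stand in for boot_cpu_has_bug(X86_BUG_MDS) and
boot_cpu_has_bug(X86_BUG_L1TF), and falling back to "secure" for unknown
strings is an assumption.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical modes mirroring the proposed coresched= values. */
enum coresched_mode {
	CORESCHED_OFF,
	CORESCHED_SECURE,	/* proposed default */
	CORESCHED_ON,
};

/*
 * Map an option string to a mode. Unknown strings fall back to the
 * proposed default ("secure"); that fallback is an assumption.
 */
static enum coresched_mode coresched_parse(const char *arg)
{
	if (!strcmp(arg, "on"))
		return CORESCHED_ON;
	if (!strcmp(arg, "off"))
		return CORESCHED_OFF;
	return CORESCHED_SECURE;
}

/*
 * Userspace stand-in for arch_allow_core_sched(): has_mds and has_l1tf
 * replace the boot_cpu_has_bug() checks from the patch.
 */
static bool coresched_allowed(enum coresched_mode mode, bool has_mds,
			      bool has_l1tf)
{
	switch (mode) {
	case CORESCHED_ON:
		return true;			/* regardless of CPU bugs */
	case CORESCHED_SECURE:
		return has_mds || has_l1tf;	/* only on affected HW */
	default:
		return false;			/* coresched=off */
	}
}
```

With this model, "secure" on a CPU without either bug behaves like "off",
which is the behaviour the patch description asks for.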

I agree that "secure" should be the default.

Ok.

Can we also integrate into the "mitigations" kernel command line[1] for this?

Sure, the integration into [1] sounds conceptually fine to me; however, it is
not super straightforward. For example: what if the user wants to force-enable
core scheduling for the use case you mention, but still wants the cross-HT
mitigation because they are only tagging VMs (as in your use case) and not
other tasks? Idk.

Can we roll this backwards from what you would expect as a user? How about
we make this 2-dimensional?

coresched=[on|off|secure][,force]

where "on" means "core scheduling can be done if colors are set", "off"
means "no core scheduling is done" and "secure" means "core scheduling can
be done on MDS or L1TF if colors are set".

So support for this force thing is not there ATM in the patchset. We can
always incrementally add it later. I personally don't expect users to be OK
with tagging every single task, as it is equivalent to disabling SMT and makes
coresched useless.

It just flips the default from "always consider everything safe" to "always consider everything unsafe". Inside a cgroup, you can still set the same color to make use of siblings.

Either way, I agree that it can be a follow-up.


The "force" option would then mean "apply a color to every new task".
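
As a rough sketch of the two-dimensional syntax, the option string could be
split at the comma into a mode and an optional "force" flag. Everything here
is hypothetical (struct cs_config, cs_parse() and the enum names are not in
the patchset), and rejecting unknown tokens with -1 is an assumption about
the desired error handling.

```c
#include <stdbool.h>
#include <string.h>

enum cs_mode { CS_OFF, CS_SECURE, CS_ON };

struct cs_config {
	enum cs_mode mode;
	bool force;	/* "apply a color to every new task" */
};

/*
 * Parse "on", "off" or "secure", optionally followed by ",force".
 * Returns 0 on success and -1 on a malformed string; *cfg is only
 * written on success.
 */
static int cs_parse(const char *arg, struct cs_config *cfg)
{
	const char *comma = strchr(arg, ',');
	size_t len = comma ? (size_t)(comma - arg) : strlen(arg);
	struct cs_config tmp = { .mode = CS_OFF, .force = false };

	if (len == 2 && !strncmp(arg, "on", len))
		tmp.mode = CS_ON;
	else if (len == 3 && !strncmp(arg, "off", len))
		tmp.mode = CS_OFF;
	else if (len == 6 && !strncmp(arg, "secure", len))
		tmp.mode = CS_SECURE;
	else
		return -1;

	if (comma) {
		/* Only "force" is accepted as the second dimension. */
		if (strcmp(comma + 1, "force"))
			return -1;
		tmp.force = true;
	}

	*cfg = tmp;
	return 0;
}
```

Keeping the mode and the force flag in separate fields means "force" can be
added later without changing how the three modes are interpreted.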

What then happens with mitigations= is easy. "auto" means
"coresched=secure". "off" means "coresched=off" and if you want to force
core scheduling for everything if necessary, you just do mitigations=auto
coresched=secure,force.
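
The mapping described here could look roughly like the following userspace
sketch. The names are hypothetical, and treating every value other than
"off" (for example "auto,nosmt") as "secure" is an assumption, not something
the thread has settled.

```c
#include <string.h>

enum cs_default { CS_DEFAULT_OFF, CS_DEFAULT_SECURE };

/*
 * Derive the default coresched mode from the mitigations= value:
 * "off" disables core scheduling, anything else maps to "secure".
 * An explicit coresched= option would override this default.
 */
static enum cs_default cs_default_from_mitigations(const char *mitigations)
{
	if (!strcmp(mitigations, "off"))
		return CS_DEFAULT_OFF;
	return CS_DEFAULT_SECURE;
}
```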

Am I missing something obvious? :)

I guess I am confused for the following usage:
mitigations=auto,nosmt coresched=secure

Note that auto,nosmt disables SMT selectively *only if needed*. Now, you add
coresched=secure to the mix. Should auto,nosmt disable SMT or not? It should be
disabled if the user did not tag anything (because the system is insecure). It
should be enabled if they tagged things. So it really depends on the user doing
the right thing. And it is super confusing already -- I would just rather
keep coresched= separate from mitigations= and document it properly. TBH,
coresched does require the system admin / designer to tag things as needed, so
why pretend that it's easy to configure anyway? :)

coresched=secure still won't allow you to trust your system without thinking about it, while nosmt does. So I would say that nosmt does not imply anything for coresched (until ,force is available, then we're talking ...)

The main thing I'm interested in though is mitigations=off. When you know you only care about performance and not side channel security (HPC for example), then you can in general just set mitigations=off. That should definitely affect the core scheduling setting as well.


Alex


