Re: [RFD] x86/split_lock: Request to Intel

From: Sean Christopherson
Date: Thu Oct 17 2019 - 13:23:15 EST


On Thu, Oct 17, 2019 at 02:29:45PM +0200, Thomas Gleixner wrote:
> The more I look at this trainwreck, the less interested I am in merging any
> of this at all.
>
> The fact that it took Intel more than a year to figure out that the MSR is
> per core and not per thread is yet another proof that this industry just
> works by pure chance.
>
> There is a simple way out of this misery:
>
> Intel issues a microcode update which does:
>
> 1) Convert the OR logic of the AC enable bit in the TEST_CTRL MSR to
> AND logic, i.e. when one thread disables AC it's automatically
> disabled on the core.
>
> Alternatively it suppresses the #AC when the current thread has it
> disabled.
>
> 2) Provide a separate bit which indicates that the AC enable logic is
> actually AND based or that #AC is suppressed when the current thread
> has it disabled.
>
> Which way I don't really care as long as it makes sense.

The #AC bit doesn't use OR-logic; it's straight up shared, i.e. writes on
one CPU are immediately visible on its sibling CPU. That doesn't magically
solve the problem, but I don't think we need IPIs to coordinate between
siblings, e.g. wouldn't something like this work? The per-cpu variables
here are pointers to lock/count storage that is shared by SMT siblings.

void split_lock_disable(void)
{
	/* Lock and count are shared between SMT siblings, see below. */
	spinlock_t *ac_lock = this_cpu_read(split_lock_ac_lock);
	int *ac_disabled = this_cpu_read(split_lock_ac_disabled);
	u64 test_ctrl;

	spin_lock(ac_lock);
	if (++(*ac_disabled) == 1) {
		/* First disabler clears the bit for the whole core. */
		rdmsrl(MSR_TEST_CTRL, test_ctrl);
		wrmsrl(MSR_TEST_CTRL,
		       test_ctrl & ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT);
	}
	spin_unlock(ac_lock);
}

void split_lock_enable(void)
{
	spinlock_t *ac_lock = this_cpu_read(split_lock_ac_lock);
	int *ac_disabled = this_cpu_read(split_lock_ac_disabled);
	u64 test_ctrl;

	spin_lock(ac_lock);
	if (--(*ac_disabled) == 0) {
		/* Last enabler re-arms #AC for the whole core. */
		rdmsrl(MSR_TEST_CTRL, test_ctrl);
		wrmsrl(MSR_TEST_CTRL,
		       test_ctrl | MSR_TEST_CTRL_SPLIT_LOCK_DETECT);
	}
	spin_unlock(ac_lock);
}
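
For the sharing itself, something like the following could work; this is a
minimal sketch only, assuming hypothetical per-cpu variables and backing
storage (split_lock_ac_lock, split_lock_ac_disabled, sld_*_storage) wired
up from a CPU online callback, with the first CPU in
topology_sibling_cpumask() owning the real lock and counter:

/* Sketch only: aim every CPU's per-cpu pointers at storage owned by the
 * first CPU in its sibling mask, so SMT siblings share one lock and one
 * disable count. */
static DEFINE_PER_CPU(spinlock_t *, split_lock_ac_lock);
static DEFINE_PER_CPU(int *, split_lock_ac_disabled);

static DEFINE_PER_CPU(spinlock_t, sld_lock_storage);
static DEFINE_PER_CPU(int, sld_count_storage);

static int split_lock_online(unsigned int cpu)
{
	unsigned int owner = cpumask_first(topology_sibling_cpumask(cpu));

	if (owner == cpu)
		spin_lock_init(&per_cpu(sld_lock_storage, cpu));

	per_cpu(split_lock_ac_lock, cpu) = &per_cpu(sld_lock_storage, owner);
	per_cpu(split_lock_ac_disabled, cpu) =
		&per_cpu(sld_count_storage, owner);
	return 0;
}

The callback would be registered via cpuhp_setup_state() or similar; the
exact hotplug ordering vs. sibling mask population is hand-waved here.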


To avoid the spin_lock and WRMSR latency on every VM-Enter and VM-Exit,
actions (3a) and (4a) from your matrix (copied below) could be changed to
only do split_lock_disable() if the guest actually generates an #AC, and
then do split_lock_enable() on the next VM-Exit (see the sketch after the
matrix). Assuming even legacy guests are somewhat sane and rarely do
split-locks, lazily disabling the control would eliminate most of the
overhead and would also reduce the time that the sibling CPU is running in
the host without #AC protection.


 N | #AC       | #AC enabled | SMT | Ctrl    | Guest | Action
 R | available | on host     |     | exposed | #AC   |
---|-----------|-------------|-----|---------|-------|---------------------
   |           |             |     |         |       |
 0 | N         | x           |  x  | N       | x     | None
   |           |             |     |         |       |
 1 | Y         | N           |  x  | N       | x     | None
   |           |             |     |         |       |
 2 | Y         | Y           |  x  | Y       | Y     | Forward to guest
   |           |             |     |         |       |
 3 | Y         | Y           |  N  | Y       | N     | A) Store in vCPU and
   |           |             |     |         |       |    toggle on VMENTER/EXIT
   |           |             |     |         |       |
   |           |             |     |         |       | B) SIGBUS or KVM exit code
   |           |             |     |         |       |
 4 | Y         | Y           |  Y  | Y       | N     | A) Disable globally on
   |           |             |     |         |       |    host. Store in vCPU/guest
   |           |             |     |         |       |    state and possibly
   |           |             |     |         |       |    re-enable when guest
   |           |             |     |         |       |    goes away.
   |           |             |     |         |       |
   |           |             |     |         |       | B) SIGBUS or KVM exit code

> If that's not going to happen, then we just bury the whole thing and put it
> on hold until a sane implementation of that functionality surfaces in
> silicon some day in the not so foreseeable future.
>
> Seriously, this only makes sense when it's enabled by default and not
> rendered useless by VIRT. Otherwise we never get any reports and none of
> the issues are going to be fixed.
>
> Thanks,
>
> tglx