Re: [PATCH] KVM: x86/mmu: Remove KVM MMU write lock when accessing indirect_shadow_pages

From: Mingwei Zhang
Date: Tue Jun 06 2023 - 20:24:26 EST


> > Hmm. I agree with both points above, but the change below seems too
> > heavyweight. smp_mb() is an mfence, i.e., it serializes all
> > loads/stores before the instruction. Doing that for every shadow page
> > creation and destruction seems like a lot.
>
> No, the smp_*b() variants are just compiler barriers on x86.

Hmm, smp_rmb() and smp_wmb() are compiler barriers on x86, but smp_mb()
is a "lock addl" now. See commit 450cbdd0125c ("locking/x86: Use LOCK
ADD for smp_mb() instead of MFENCE").

So smp_mb() is not a free lunch and we need to be a little bit careful
about where we add it.
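
To make the difference concrete, here is a rough userspace sketch of the
two kinds of barrier. The sketch_* names are made up for illustration
and are not kernel APIs; the x86-64 branch only approximates the shape
of smp_mb() after the commit above, and the fallback branch is my
assumption for other architectures.

```c
/* Hypothetical userspace stand-ins; the sketch_ names are not kernel APIs. */

/* Compiler-only barrier: emits no instruction, only constrains the compiler. */
#define sketch_barrier() __asm__ __volatile__("" ::: "memory")

#if defined(__x86_64__)
/* Approximate shape of smp_mb() after 450cbdd0125c: a locked add of 0
 * just below the stack pointer, which leaves memory unchanged. */
#define sketch_smp_mb() \
	__asm__ __volatile__("lock; addl $0,-4(%%rsp)" ::: "memory", "cc")
#else
/* Assumed fallback for other architectures: compiler-provided full barrier. */
#define sketch_smp_mb() __sync_synchronize()
#endif

static int sketch_data;
static int sketch_flag;

/* Store, then load from a different location: the one ordering that
 * x86 may reorder, so it needs the full barrier. */
int sketch_store_then_load(int value)
{
	sketch_data = value;
	sketch_smp_mb();
	return sketch_flag;
}

/* Store, then store: x86 already keeps stores in program order, so
 * only the compiler has to be restrained. */
int sketch_store_then_store(int value)
{
	sketch_data = value;
	sketch_barrier();
	sketch_flag = 1;
	return sketch_data;
}
```

Only sketch_store_then_load() executes a locked instruction, and that is
the cost I am worried about paying on every shadow page creation and
destruction.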

>
> > In fact, the only case that matters is '0->1', which may potentially
> > confuse kvm_mmu_pte_write() when it reads 'indirect_shadow_pages', but
> > the majority of the cases are 'X->X+1' where X != 0. So, those
> > cases do not matter. So, if we want to add barriers, we only need them
> > for 0->1. Maybe creating a new variable and not blocking
> > account_shadow() and unaccount_shadow() is a better idea?
> >
> > Regardless, the above problem is related to interactions among
> > account_shadow(), unaccount_shadow() and kvm_mmu_pte_write(). It has
> > nothing to do with reexecute_instruction(), which is what this
> > patch is about. So, I think having a READ_ONCE() for
> > reexecute_instruction() should be good enough. What do you think?
>
> The reexecute_instruction() case should be fine without any fanciness, it's
> nothing more than a heuristic, i.e. neither a false positive nor a false negative
> will impact functional correctness, and nothing changes regardless of how many
> times the compiler reads the variable outside of mmu_lock.
>
> I was thinking that it would be better to have a single helper to locklessly
> access indirect_shadow_pages, but I agree that applying the barriers to
> reexecute_instruction() introduces a different kind of confusion.
>
> Want to post a v2 of yours without a READ_ONCE(), and I'll post a separate fix
> for the theoretical kvm_mmu_pte_write() race? And then Paolo can tell me that
> there's no race and school me on lockless programming once more ;-)

Yeah, that works for me.
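
For reference, the "barrier only on 0->1" idea quoted above would look
roughly like the sketch below. This is a standalone illustration, not
the KVM code: every sketch_* name is hypothetical, the barrier counter
exists only to make the behaviour observable, and writers are assumed to
be serialized by a lock the way mmu_lock serializes them in KVM. Whether
this is sufficient, and what the reader side in kvm_mmu_pte_write()
would need to pair with it, is exactly the open question for the
separate fix.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for the KVM counter and helpers; not the kernel code. */
static unsigned int sketch_indirect_shadow_pages;
static unsigned int sketch_full_barriers;	/* instrumentation for the sketch only */

/* Callers are assumed to be serialized by a lock, as mmu_lock does in KVM. */
void sketch_account_shadow(void)
{
	/* Only the 0->1 transition pays for a full barrier. */
	if (sketch_indirect_shadow_pages++ == 0) {
		atomic_thread_fence(memory_order_seq_cst);
		sketch_full_barriers++;
	}
}

void sketch_unaccount_shadow(void)
{
	sketch_indirect_shadow_pages--;
}

/* Lockless reader; the volatile access plays the role of READ_ONCE(). */
unsigned int sketch_read_indirect_shadow_pages(void)
{
	return *(volatile unsigned int *)&sketch_indirect_shadow_pages;
}
```

With three accounts followed by three unaccounts and one more account,
the sketch issues two full barriers instead of four.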