Re: [PATCH v12 19/22] x86/kexec(): Reset TDX private memory on platforms with TDX erratum

From: Huang, Kai
Date: Thu Jun 29 2023 - 05:45:31 EST


On Thu, 2023-06-29 at 05:38 +0000, Huang, Kai wrote:
> On Thu, 2023-06-29 at 03:19 +0000, Huang, Kai wrote:
> > On Wed, 2023-06-28 at 12:20 +0300, Nikolay Borisov wrote:
> > > > + atomic_inc_return(&tdx_may_has_private_mem);
> > > > +
> > > >     /* Config the key of global KeyID on all packages */
> > > >     ret = config_global_keyid();
> > > >     if (ret)
> > > > @@ -1154,6 +1167,15 @@ static int init_tdx_module(void)
> > > >     * as suggested by the TDX spec.
> > > >     */
> > > >     tdmrs_reset_pamt_all(&tdx_tdmr_list);
> > > > + /*
> > > > + * No more TDX private pages now, and PAMTs/TDMRs are
> > > > + * going to be freed.  Make this globally visible so
> > > > + * tdx_reset_memory() can read stable TDMRs/PAMTs.
> > > > + *
> > > > + * Note atomic_dec_return(), which is an atomic RMW with
> > > > + * return value, always enforces the memory barrier.
> > > > + */
> > > > + atomic_dec_return(&tdx_may_has_private_mem);
> > >
> > > Make a comment here which either refers to the comment at the increment
> > > site.
> >
> > I guess I got your point. Will try to make better comments.
> >
> > >
> > > >    out_free_pamts:
> > > >     tdmrs_free_pamt_all(&tdx_tdmr_list);
> > > >    out_free_tdmrs:
> > > > @@ -1229,6 +1251,63 @@ int tdx_enable(void)
> > > >    }
> > > >    EXPORT_SYMBOL_GPL(tdx_enable);
> > > >   
> > > > +/*
> > > > + * Convert TDX private pages back to normal on platforms with
> > > > + * "partial write machine check" erratum.
> > > > + *
> > > > + * Called from machine_kexec() before booting to the new kernel.
> > > > + */
> > > > +void tdx_reset_memory(void)
> > > > +{
> > > > + if (!platform_tdx_enabled())
> > > > + return;
> > > > +
> > > > + /*
> > > > + * Kernel read/write to TDX private memory doesn't
> > > > + * cause machine check on hardware w/o this erratum.
> > > > + */
> > > > + if (!boot_cpu_has_bug(X86_BUG_TDX_PW_MCE))
> > > > + return;
> > > > +
> > > > + /* Called from kexec() when only rebooting cpu is alive */
> > > > + WARN_ON_ONCE(num_online_cpus() != 1);
> > > > +
> > > > + if (!atomic_read(&tdx_may_has_private_mem))
> > > > + return;
> > >
> > > I think a comment is warranted here explicitly calling our the ordering
> > > requirement/guarantees. Actually this is a non-rmw operation so it
> > > doesn't have any bearing on the ordering/implicit mb's achieved at the
> > > "increment" site.
> >
> > We don't need explicit ordering/barrier here, if I am not missing something.
> > The atomic_{inc/dec}_return() already made sure the memory ordering -- which
> > guarantees when @tdx_may_has_private_mem reads true _here_, the TDMRs/PAMTs must
> > be stable.
> >
> > Quoted from Documentation/atomic_t.txt:
> >
> > "
> > - RMW operations that have a return value are fully ordered;  
> >
> > ...
> >
> > Fully ordered primitives are ordered against everything prior and everything
> > subsequent. Therefore a fully ordered primitive is like having an smp_mb()
> > before and an smp_mb() after the primitive.
> > "
> >
> >
> > Am I missing anything?
>
> OK I guess I figured out by myself after more thinking. Although the
> atomic_{inc|dec}_return() code path has guaranteed when @tdx_may_has_private_mem
> is true, TDMRs/PAMTs are stable, but here in the reading path, the code below
>
> tdmrs_reset_pamt_all(&tdx_tdmr_list);
>
> may still be executed speculatively before the if () statement completes
>
> if (!atomic_read(&tdx_may_has_private_mem))
> return;
>
> So we need CPU memory barrier instead of compiler barrier.
>

(Sorry for multiple replies)

Hmm.. reading the SDM more carefully, the speculative execution shouldn't
matter. It may cause instruction/data being fetched to the cache, etc, but the
instruction shouldn't take effort unless the above branch predication truly
turns out to be the right result.

What matters is memory reads/writes order. On x86, per SDM on single processor
(which is the case here) basically reads/writes are not reordered:

"
In a single-processor system for memory regions defined as write-back cacheable,
the memory-ordering model respects the following principles ...:
• Reads are not reordered with other reads.
• Writes are not reordered with older reads.
• Writes to memory are not reordered with other writes, with the following
exceptions:
(string operations/non-temporal moves)
...
"

So in practice there should be no problem. But I will just use the correct
atomic_t operations to force a memory barrier here.