Re: [PATCH 08/14] KVM: arm64: Add support for tagging shared pages in page-table

From: Marc Zyngier
Date: Tue Jul 20 2021 - 06:13:52 EST


On Mon, 19 Jul 2021 16:49:13 +0100,
Quentin Perret <qperret@xxxxxxxxxx> wrote:
>
> On Monday 19 Jul 2021 at 15:43:34 (+0100), Marc Zyngier wrote:
> > On Mon, 19 Jul 2021 11:47:29 +0100,
> > Quentin Perret <qperret@xxxxxxxxxx> wrote:
> > >
> > > The hypervisor will soon be in charge of tracking ownership of all
> > > memory pages in the system. The current page-tracking infrastructure at
> > > EL2 only allows binary states: a page is either owned or not by an
> > > entity. But a number of use-cases will require more complex states for
> > > pages that are shared between two entities (host, hypervisor, or guests).
> > >
> > > In preparation for supporting these use-cases, introduce in the KVM
> > > page-table library some infrastructure for tagging shared pages
> > > using ignored bits (a.k.a. software bits) in PTEs.
> > >
> > > Signed-off-by: Quentin Perret <qperret@xxxxxxxxxx>
> > > ---
> > > arch/arm64/include/asm/kvm_pgtable.h | 5 +++++
> > > arch/arm64/kvm/hyp/pgtable.c | 25 +++++++++++++++++++++++++
> > > 2 files changed, 30 insertions(+)
> > >
> > > diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> > > index dd72653314c7..f6d3d5c8910d 100644
> > > --- a/arch/arm64/include/asm/kvm_pgtable.h
> > > +++ b/arch/arm64/include/asm/kvm_pgtable.h
> > > @@ -81,6 +81,8 @@ enum kvm_pgtable_stage2_flags {
> > > * @KVM_PGTABLE_PROT_W: Write permission.
> > > * @KVM_PGTABLE_PROT_R: Read permission.
> > > * @KVM_PGTABLE_PROT_DEVICE: Device attributes.
> > > + * @KVM_PGTABLE_STATE_SHARED: Page shared with another entity.
> > > + * @KVM_PGTABLE_STATE_BORROWED: Page borrowed from another entity.
> > > */
> > > enum kvm_pgtable_prot {
> > > KVM_PGTABLE_PROT_X = BIT(0),
> > > @@ -88,6 +90,9 @@ enum kvm_pgtable_prot {
> > > KVM_PGTABLE_PROT_R = BIT(2),
> > >
> > > KVM_PGTABLE_PROT_DEVICE = BIT(3),
> > > +
> > > + KVM_PGTABLE_STATE_SHARED = BIT(4),
> > > + KVM_PGTABLE_STATE_BORROWED = BIT(5),
> >
> > I'd rather have some indirection here, as we have other potential
> > users for the SW bits outside of pKVM (see the NV series, which uses
> > some of these SW bits as the backend for TTL-based TLB invalidation).
> >
> > Can we instead only describe the SW bit states in this enum, and let
> > the users map the semantic they require onto that state? See [1] for
> > what I carry in the NV branch.
>
> Works for me -- I just wanted to make sure we don't have users in
> different places that use the same bits without knowing, but no strong
> opinions, so happy to change.
>
> > > };
> > >
> > > #define KVM_PGTABLE_PROT_RW (KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W)
> > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > > index 5bdbe7a31551..51598b79dafc 100644
> > > --- a/arch/arm64/kvm/hyp/pgtable.c
> > > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > > @@ -211,6 +211,29 @@ static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
> > > return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
> > > }
> > >
> > > +static kvm_pte_t pte_ignored_bit_prot(enum kvm_pgtable_prot prot)
> >
> > Can we call these sw rather than ignored?
>
> Sure.
>
> > > +{
> > > + kvm_pte_t ignored_bits = 0;
> > > +
> > > + /*
> > > + * Ignored bits 0 and 1 are reserved to track the memory ownership
> > > + * state of each page:
> > > + * 00: The page is owned solely by the page-table owner.
> > > + * 01: The page is owned by the page-table owner, but is shared
> > > + * with another entity.
> > > + * 10: The page is shared with, but not owned by the page-table owner.
> > > + * 11: Reserved for future use (lending).
> > > + */
> > > + if (prot & KVM_PGTABLE_STATE_SHARED) {
> > > + if (prot & KVM_PGTABLE_STATE_BORROWED)
> > > + ignored_bits |= BIT(1);
> > > + else
> > > + ignored_bits |= BIT(0);
> > > + }
> > > +
> > > + return FIELD_PREP(KVM_PTE_LEAF_ATTR_IGNORED, ignored_bits);
> > > +}
> > > +
> > > static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
> > > u32 level, kvm_pte_t *ptep,
> > > enum kvm_pgtable_walk_flags flag)
> > > @@ -357,6 +380,7 @@ static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
> > > attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_AP, ap);
> > > attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_SH, sh);
> > > attr |= KVM_PTE_LEAF_ATTR_LO_S1_AF;
> > > + attr |= pte_ignored_bit_prot(prot);
> > > *ptep = attr;
> > >
> > > return 0;
> > > @@ -558,6 +582,7 @@ static int stage2_set_prot_attr(struct kvm_pgtable *pgt, enum kvm_pgtable_prot p
> > >
> > > attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S2_SH, sh);
> > > attr |= KVM_PTE_LEAF_ATTR_LO_S2_AF;
> > > + attr |= pte_ignored_bit_prot(prot);
> > > *ptep = attr;
> > >
> > > return 0;
> >
> > How about kvm_pgtable_stage2_relax_perms()?
>
> It should leave SW bits untouched, and it really felt like a path where
> we want to change permissions and nothing else. What did you have in
> mind?

It isn't clear to me that it would not (cannot?) be used to change
other bits, given that it takes an arbitrary 'prot' set. If there is
such an intended restriction, we definitely should document it.
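
As an illustration only, such a restriction could be enforced (rather
than merely documented) along these lines. This is a minimal sketch
under stated assumptions: every sketch_* / SKETCH_* name is made up for
the example, the PTE is modelled as a plain 'prot' bitmask, and none of
this is the actual pgtable.c code. The bit values mirror the enum in
the quoted patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for enum kvm_pgtable_prot as extended by the patch. */
enum sketch_prot {
	SKETCH_PROT_X		= 1u << 0,
	SKETCH_PROT_W		= 1u << 1,
	SKETCH_PROT_R		= 1u << 2,
	SKETCH_PROT_DEVICE	= 1u << 3,
	SKETCH_STATE_SHARED	= 1u << 4,
	SKETCH_STATE_BORROWED	= 1u << 5,
};

/* The only bits a "relax permissions" path would be allowed to touch. */
#define SKETCH_PROT_PERM_MASK \
	(SKETCH_PROT_R | SKETCH_PROT_W | SKETCH_PROT_X)

/* The state bits that end up in the SW bits of the PTE. */
#define SKETCH_PROT_SW_MASK \
	(SKETCH_STATE_SHARED | SKETCH_STATE_BORROWED)

/* Permission bits and state bits must never alias each other. */
static_assert((SKETCH_PROT_PERM_MASK & SKETCH_PROT_SW_MASK) == 0,
	      "permission and SW state bits overlap");

/*
 * Hypothetical model of a relax_perms-style helper: add permissions to
 * *cur, but reject any 'prot' that carries non-permission bits, so the
 * state bits already recorded in *cur cannot be changed via this path.
 */
static int sketch_relax_perms(unsigned int *cur, unsigned int prot)
{
	if (prot & ~SKETCH_PROT_PERM_MASK)
		return -EINVAL;

	*cur |= prot;
	return 0;
}
```

With a check like this, a caller passing R/W/X gets the permissions
relaxed and the state bits preserved, while a caller passing anything
else gets an error instead of a silent change.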

M.

--
Without deviation from the norm, progress is not possible.