Re: [PATCH] KVM: arm64: Use BTI for pKVM

From: Marc Zyngier
Date: Wed May 17 2023 - 10:20:07 EST


On Wed, 17 May 2023 09:49:50 +0100,
Mostafa Saleh <smostafa@xxxxxxxxxx> wrote:
>
> Hi Marc,
>
> On Tue, May 16, 2023 at 04:47:10PM +0100, Marc Zyngier wrote:
> > On Tue, 16 May 2023 15:18:46 +0100,
> > Mostafa Saleh <smostafa@xxxxxxxxxx> wrote:
> > >
> > > CONFIG_ARM64_BTI_KERNEL compiles the kernel to support ARMv8.5-BTI.
> > > However, the nvhe code doesn't make use of it as it doesn't map any
> > > > pages with the Guarded Page (GP) bit set.
> > >
> > > > This patch maps the pKVM .text section with the GP bit set, which
> > > > matches the kernel's handling of BTI.
> >
> > Why pKVM only? Surely we can benefit from it all over the nvhe code,
> > right?
> Yes, I will add it also for nvhe in v2.
>
> > >
> > > A new flag is added to enum kvm_pgtable_prot: KVM_PGTABLE_PROT_GP_S1,
> > > which represents BTI guarded page in hypervisor stage-1 page table.
> > >
> > > Signed-off-by: Mostafa Saleh <smostafa@xxxxxxxxxx>
> > > ---
> > > arch/arm64/include/asm/kvm_pgtable.h | 3 +++
> > > arch/arm64/kvm/hyp/nvhe/setup.c | 8 ++++++--
> > > arch/arm64/kvm/hyp/pgtable.c | 6 ++++--
> > > 3 files changed, 13 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> > > index 4cd6762bda80..5bcd06d664d3 100644
> > > --- a/arch/arm64/include/asm/kvm_pgtable.h
> > > +++ b/arch/arm64/include/asm/kvm_pgtable.h
> > > @@ -151,6 +151,7 @@ enum kvm_pgtable_stage2_flags {
> > > * @KVM_PGTABLE_PROT_W: Write permission.
> > > * @KVM_PGTABLE_PROT_R: Read permission.
> > > * @KVM_PGTABLE_PROT_DEVICE: Device attributes.
> > > + * @KVM_PGTABLE_PROT_GP_S1: GP(guarded page) used for BTI in stage-1 only
> > > * @KVM_PGTABLE_PROT_SW0: Software bit 0.
> > > * @KVM_PGTABLE_PROT_SW1: Software bit 1.
> > > * @KVM_PGTABLE_PROT_SW2: Software bit 2.
> > > @@ -163,6 +164,8 @@ enum kvm_pgtable_prot {
> > >
> > > KVM_PGTABLE_PROT_DEVICE = BIT(3),
> > >
> > > + KVM_PGTABLE_PROT_GP_S1 = BIT(50),
> > > +
> > > KVM_PGTABLE_PROT_SW0 = BIT(55),
> > > KVM_PGTABLE_PROT_SW1 = BIT(56),
> > > KVM_PGTABLE_PROT_SW2 = BIT(57),
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> > > index 110f04627785..95f80e2b2946 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> > > @@ -66,7 +66,7 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
> > > {
> > > void *start, *end, *virt = hyp_phys_to_virt(phys);
> > > unsigned long pgt_size = hyp_s1_pgtable_pages() << PAGE_SHIFT;
> > > - enum kvm_pgtable_prot prot;
> > > + enum kvm_pgtable_prot prot = PAGE_HYP_EXEC;
> > > int ret, i;
> > >
> > > /* Recreate the hyp page-table using the early page allocator */
> > > @@ -88,7 +88,11 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
> > > if (ret)
> > > return ret;
> > >
> > > - ret = pkvm_create_mappings(__hyp_text_start, __hyp_text_end, PAGE_HYP_EXEC);
> > > + /* Hypervisor text is mapped as guarded pages(GP). */
> > > + if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) && cpus_have_const_cap(ARM64_BTI))
> > > + prot |= KVM_PGTABLE_PROT_GP_S1;
> >
> > Is there any reason why this isn't a final cap? I also dislike the
> > IS_ENABLED(), but I can see that we don't have separate caps for
> > in-kernel BTI and userspace visible BTI...
> I was trying to keep this close to the EL1 code (system_supports_bti()).
> I see that in the hypervisor, cpus_have_const_cap() is the same as
> cpus_have_final_cap().

Ah, yes. If that's immaterial, then no need for a change.

> Yes, I don't see a way to distinguish if BTI was enabled for the kernel
> in EL2 without CONFIG_ARM64_BTI_KERNEL.
>
> > > +
> > > + ret = pkvm_create_mappings(__hyp_text_start, __hyp_text_end, prot);
> > > if (ret)
> > > return ret;
> > >
> > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > > index 3d61bd3e591d..028e198acd48 100644
> > > --- a/arch/arm64/kvm/hyp/pgtable.c
> > > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > > @@ -145,7 +145,8 @@ static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
> > > u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? KVM_PTE_TYPE_PAGE :
> > > KVM_PTE_TYPE_BLOCK;
> > >
> > > - pte |= attr & (KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI);
> > > + pte |= attr & (KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI |
> > > + KVM_PGTABLE_PROT_GP_S1);
> > > pte |= FIELD_PREP(KVM_PTE_TYPE, type);
> > > pte |= KVM_PTE_VALID;
> > >
> > > @@ -378,7 +379,8 @@ static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
> > > attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_AP, ap);
> > > attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_SH, sh);
> > > attr |= KVM_PTE_LEAF_ATTR_LO_S1_AF;
> > > - attr |= prot & KVM_PTE_LEAF_ATTR_HI_SW;
> > > + attr |= prot & (KVM_PTE_LEAF_ATTR_HI_SW | KVM_PGTABLE_PROT_GP_S1);
> > > +
> >
> > You should probably check that the page is executable before blindly
> > accepting to set the GP bit (don't accept it for non-exec pages).
> Will do in v2.
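
A minimal sketch of the executable-page check suggested above, written as
standalone C rather than against the real pgtable.c. All SKETCH_* names and
sketch_set_gp_attr() are hypothetical stand-ins; BIT(50) for the GP flag is
taken from the quoted patch, while the positions of the execute flag and the
execute-never bit are assumptions made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define BIT(n) (1ULL << (n))

/* Hypothetical stand-ins for the kvm_pgtable_prot flags. */
#define SKETCH_PROT_X		BIT(0)	/* assumed position of the execute flag */
#define SKETCH_PROT_GP_S1	BIT(50)	/* value taken from the quoted patch */

/* Hypothetical stand-in for the stage-1 execute-never PTE bit (assumption). */
#define SKETCH_PTE_S1_XN	BIT(54)

/*
 * Fold a GP request into the PTE attributes.
 * Returns 0 on success, or -EINVAL if GP is requested for a page that is
 * not executable; *attr is left untouched in the error case.
 */
static int sketch_set_gp_attr(uint64_t prot, uint64_t *attr)
{
	assert(attr != NULL);

	if (!(prot & SKETCH_PROT_X)) {
		/* Non-executable: refuse GP, otherwise mark execute-never. */
		if (prot & SKETCH_PROT_GP_S1)
			return -EINVAL;
		*attr |= SKETCH_PTE_S1_XN;
		return 0;
	}

	/* Executable: pass the GP bit through only if the caller asked for it. */
	*attr |= prot & SKETCH_PROT_GP_S1;
	return 0;
}
```

Rejecting the request outright, instead of silently dropping the GP bit,
is one possible choice here; it makes a mistaken caller visible early.
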
>
> > Another thing to check would be the state of SCTLR_EL2.BT, which I
> > think we clear by construction, but it would be worth having a look.
> Yes, I see it is initialised to zero in ___kvm_hyp_init in hyp-init.S.
> I believe this should be changed to 1 when BTI is enabled (as in
> bti_enable() for EL1); I will update it.

Yup, I'd like to see some level of consistency with what we do at EL1,
at least so that we are all aligned on what we allow from an ABI
perspective.
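
To illustrate the kind of consistency I mean, here is a rough standalone
sketch of deriving the EL2 SCTLR value from the same two conditions the
patch already tests. sketch_hyp_sctlr() and SKETCH_SCTLR_EL2_BT are
hypothetical names, and the bit position (36, for the non-VHE layout) is an
assumption to be checked against the architecture manual, not something
taken from this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: SCTLR_EL2.BT sits at bit 36 in the non-VHE register layout. */
#define SKETCH_SCTLR_EL2_BT	(1ULL << 36)

/*
 * Return the SCTLR_EL2 value to program: BT is set only when the kernel was
 * built with BTI support and the CPUs implement it, and is cleared otherwise
 * so the result does not depend on whatever the base value carried.
 */
static uint64_t sketch_hyp_sctlr(uint64_t base, bool bti_kernel,
				 bool cpu_has_bti)
{
	if (bti_kernel && cpu_has_bti)
		return base | SKETCH_SCTLR_EL2_BT;

	return base & ~SKETCH_SCTLR_EL2_BT;
}
```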

Thanks,

M.

--
Without deviation from the norm, progress is not possible.