Re: [PATCH 2/6] PKEY: Add arch_check_pkey_enforce_api()

From: Jeff Xu
Date: Thu May 18 2023 - 18:52:22 EST


On Thu, May 18, 2023 at 2:43 PM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> On 5/15/23 06:05, jeffxu@xxxxxxxxxxxx wrote:
> > +static inline int __arch_check_vma_pkey_for_write(struct vm_area_struct *vma)
> > +{
> > + int pkey = vma_pkey(vma);
> > +
> > + if (mm_pkey_enforce_api(vma->vm_mm, pkey)) {
> > + if (!__pkru_allows_write(read_pkru(), pkey))
> > + return -EACCES;
> > + }
> > +
> > + return 0;
> > +}
>
> Please think very carefully about what I'm about to say:
>
> What connects vma->vm_mm to read_pkru() here?
>
> Now think about what happens when we have kthread_use_mm() or a ptrace()
> doing get_task_mm() and working on another process's mm or VMA.
>
> Look at arch_vma_access_permitted() and notice how it avoids read_pkru()
> for 'foreign' aka. 'remote' accesses:
>
> > static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
> > bool write, bool execute, bool foreign)
> > {
> ...
> > if (foreign || vma_is_foreign(vma))
> > return true;
> > return // check read_pkru()
> > }
>
> In other words, it lets all remote accesses right through. That's
> because there is *NOTHING* that fundamentally and tightly connects the
> PKRU value in this context to the VMA or the context that initiated this
> operation.
>
> If your security model depends on PKRU protection, this 'remote'
> disconnection is problematic. The PKRU enforcement inside the kernel is
> best-effort. That usually doesn't map into the security space very well.
>
> Do you have a solid handle on all call paths that will reach
> __arch_check_vma_pkey_for_write() and can you ensure they are all
> non-remote?

Is this about the attack scenario where the attacker uses ptrace()
into the chrome process ? if so it is not in our threat model, and
that is more related to sandboxing on the host.

Or is this about io_uring? Yes, io_uring kernel thread breaks our
expectations of PKRU & user space threads, however I thought the break
is not just for this - any syscall involved in memory operation will
break after into io_uring ?

Other than those, yes, I try to ensure the check is only used at the
beginning of syscall entry in all cases, which should be non-remote I
hope.

Thanks
-Jeff