Re: [PATCH v2] perf: Allow restricted kernel breakpoints on user addresses

From: Dmitry Vyukov
Date: Wed Feb 01 2023 - 07:01:36 EST


On Wed, 1 Feb 2023 at 12:54, Mark Rutland <mark.rutland@xxxxxxx> wrote:
>
> Hi Dmitry,
>
> We raced to reply here, so there's more detail in my reply to Marco. I'm
> providing minimal detail here, sorry for being terse! :)
>
> On Wed, Feb 01, 2023 at 10:53:44AM +0100, Dmitry Vyukov wrote:
> > On Wed, 1 Feb 2023 at 10:34, Marco Elver <elver@xxxxxxxxxx> wrote:
> > >
> > > On Mon, 30 Jan 2023 at 11:46, Mark Rutland <mark.rutland@xxxxxxx> wrote:
> > > [...]
> > > > > This again feels like a deficiency with access_ok(). Is there a better
> > > > > primitive than access_ok(), or can we have something that gives us the
> > > > > guarantee that whatever it says is "ok" is a userspace address?
> > > >
> > > > I don't think so, since this is contextual and temporal -- a helper can't give
> > > > a single correct answert in all cases because it could change.
> > >
> > > That's fair, but unfortunate. Just curious: would
> > > copy_from_user_nofault() reliably fail if it tries to access one of
> > > those mappings but where access_ok() said "ok"?
> >
> > I also wonder if these special mappings are ever accessible in a user
> > task context?
>
> No. The special mappings are actually distinct page tables from the user page
> tables, so whenever userspace is executing and can issue a syscall, the user
> page tables are installed.
>
> The special mappings are only installed for transient periods within the
> context of a user task. There *might* be some latent issues with work happening
> in IPI context (e.g. perf user backtrace) on some architectures.
>
> > If yes, can a racing process_vm_readv/writev mess with these special mappings?
>
> No; those happen in task context, and cannot be invoked within the critical
> section where the page tables with the special mappings are installed.
>
> > We could use copy_from_user() to probe that the watchpoint address is
> > legit. But I think the memory can be potentially PROT_NONE but still
> > legit, so copy_from_user() won't work for these corner cases.
>
> Please see my other reply; ahead-of-time checks cannot help here. An address
> might be a legitimate user address and *also* transiently be a special mapping
> (since the two aare in entirely separate page tables).

This brings more clarity. Thanks for the explanations.

If addresses overlap, then it seems that the kernel must disable all
watchpoints while the mapping is installed. This patch tries to relax
checks, but CAP_ADMIN can install such watchpoints today. And they can
unintentionally break kernel, or produce false watchpoint triggers.
And if all watchpoints are disabled while the mapping is installed,
then this patch should be OK, right?