Re: [PATCH v2] perf: Allow restricted kernel breakpoints on user addresses

From: Dmitry Vyukov
Date: Wed Feb 01 2023 - 04:54:02 EST


On Wed, 1 Feb 2023 at 10:34, Marco Elver <elver@xxxxxxxxxx> wrote:
>
> On Mon, 30 Jan 2023 at 11:46, Mark Rutland <mark.rutland@xxxxxxx> wrote:
> [...]
> > > This again feels like a deficiency with access_ok(). Is there a better
> > > primitive than access_ok(), or can we have something that gives us the
> > > guarantee that whatever it says is "ok" is a userspace address?
> >
> > I don't think so, since this is contextual and temporal -- a helper can't give
> > a single correct answer in all cases because it could change.
>
> That's fair, but unfortunate. Just curious: would
> copy_from_user_nofault() reliably fail if it tries to access one of
> those mappings but where access_ok() said "ok"?

I also wonder if these special mappings are ever accessible in a user
task context?
If so, could a racing process_vm_readv()/process_vm_writev() mess with
these special mappings?

We could use copy_from_user() to probe that the watchpoint address is
legitimate. But I think the memory can be PROT_NONE and still be a
legitimate watchpoint target, so copy_from_user() won't work for these
corner cases.
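
To illustrate the PROT_NONE corner case from userspace, here is a minimal
sketch. It is not kernel code and not part of the patch. It assumes Linux
and relies on write() to a pipe making the kernel copy from the user
buffer, which fails with EFAULT when the page is not readable. The helper
names probe_user_read() and map_one_page() are hypothetical.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to read one byte from addr by
 * write()-ing it into a pipe. Returns 0 if the kernel could copy the
 * byte, otherwise an errno value (EFAULT for an unreadable page).
 */
static int probe_user_read(const void *addr)
{
	int fds[2];
	int err = 0;
	ssize_t n;

	if (pipe(fds) != 0)
		return errno;

	n = write(fds[1], addr, 1);
	if (n != 1)
		err = (n < 0) ? errno : EIO;

	close(fds[0]);
	close(fds[1]);
	return err;
}

/* Hypothetical helper: map one anonymous page; returns NULL on failure. */
static void *map_one_page(int prot)
{
	long page_size = sysconf(_SC_PAGESIZE);
	void *p;

	assert(page_size > 0);
	p = mmap(NULL, (size_t)page_size, prot,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}
```

With a page from map_one_page(PROT_NONE), probe_user_read() is expected
to report EFAULT even though the address is an ordinary user address.
After mprotect(..., PROT_READ) the same probe should succeed, which is
why a copy-based probe would reject some legitimate watchpoint targets.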

> Though that would probably restrict us to only creating watchpoints
> for addresses that are actually mapped in the task.
>
> > In the cases we switch to another mapping, we could try to ensure that we
> > enable/disable potentially unsafe watchpoints/breakpoints.
>
> It seems it'd be too hard to reason that it's 100% safe, everywhere,
> on every arch. I'm still convinced we can prohibit creation of such
> watchpoints in the first place, but need something other than
> access_ok().
>
> > Taking a look at arm64, our idmap code might actually be ok, since we usually
> > mask all the DAIF bits (and the 'D' or 'Debug' bit masks HW
> > breakpoints/watchpoints). For EFI we largely switch to another thread (but not
> > always), so that would need some auditing.
> >
> > So if this only needs to work in per-task mode rather than system-wide mode, I
> > reckon we can have some save/restore logic around those special cases where we
> > transiently install a mapping, which would protect us.
>
> It should only work in per-task mode.
>
> > For the threads that run with special mappings in the low half, I'm not sure
> > what to do. If we've ruled out system-wide monitoring I believe those would be
> > protected from unprivileged users.
>
> Can the task actually access those special mappings, or is it only
> accessible by the kernel?
>
> Thanks,
> -- Marco