Re: [PATCH 6/9] x86, pkeys: add pkey set/get syscalls

From: Mel Gorman
Date: Fri Jul 08 2016 - 06:22:44 EST


On Thu, Jul 07, 2016 at 10:33:00AM -0700, Dave Hansen wrote:
> On 07/07/2016 07:45 AM, Mel Gorman wrote:
> > On Thu, Jul 07, 2016 at 05:47:28AM -0700, Dave Hansen wrote:
> >> >
> >> > From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> >> >
> >> > This establishes two more system calls for protection key management:
> >> >
> >> > unsigned long pkey_get(int pkey);
> >> > int pkey_set(int pkey, unsigned long access_rights);
> >> >
> >> > The return value from pkey_get() and the 'access_rights' passed
> >> > to pkey_set() are the same format: a bitmask containing
> >> > PKEY_DENY_WRITE and/or PKEY_DENY_ACCESS, or nothing set at all.
> >> >
> >> > These can replace userspace's direct use of the new rdpkru/wrpkru
> >> > instructions.
> ...
> > This one feels like something that can or should be implemented in
> > glibc.
>
> I generally agree, except that glibc doesn't have any visibility into
> whether a pkey is currently valid or not.
>

Well, it could if it tracked the pkey_alloc/pkey_free calls too. I accept
that's not perfect as nothing prevents the syscalls being used directly.
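For illustration, here is a minimal sketch of what a library-side pkey_get()/pkey_set() could look like if the library also tracked allocation. It rests on several assumptions. PKEY_DENY_ACCESS = 0x1 and PKEY_DENY_WRITE = 0x2, because the patch does not give their values. The x86 layout of 16 keys, with an access-disable bit at 2*pkey and a write-disable bit at 2*pkey+1 of PKRU. A plain variable stands in for the register, where real code would use RDPKRU/WRPKRU. All lib_* names and the -1 error return are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed values: the patch names these flags but not their values here. */
#define PKEY_DENY_ACCESS 0x1UL
#define PKEY_DENY_WRITE  0x2UL
#define NR_PKEYS 16 /* x86 PKRU: 2 bits per key, 16 keys */

/* Hypothetical stand-in for PKRU; real code would use RDPKRU/WRPKRU. */
static uint32_t fake_pkru;

/* Hypothetical allocation bitmap kept by the library. Key 0 is the
 * default key and is treated as always allocated. */
static uint16_t alloc_map = 0x1;

static int lib_pkey_alloc(void)
{
    for (int pkey = 1; pkey < NR_PKEYS; pkey++) {
        if (!(alloc_map & (1u << pkey))) {
            alloc_map |= (uint16_t)(1u << pkey);
            return pkey;
        }
    }
    errno = ENOSPC;
    return -1;
}

static int lib_pkey_free(int pkey)
{
    if (pkey <= 0 || pkey >= NR_PKEYS || !(alloc_map & (1u << pkey))) {
        errno = EINVAL;
        return -1;
    }
    alloc_map &= (uint16_t)~(1u << pkey);
    return 0;
}

static int pkey_valid(int pkey)
{
    return pkey >= 0 && pkey < NR_PKEYS && (alloc_map & (1u << pkey));
}

/* Returns the access_rights bitmask, or -1 with errno = EINVAL. */
static long lib_pkey_get(int pkey)
{
    if (!pkey_valid(pkey)) {
        errno = EINVAL;
        return -1;
    }
    return (long)((fake_pkru >> (2 * pkey)) & 0x3u);
}

static int lib_pkey_set(int pkey, unsigned long access_rights)
{
    if (!pkey_valid(pkey) ||
        (access_rights & ~(PKEY_DENY_ACCESS | PKEY_DENY_WRITE))) {
        errno = EINVAL;
        return -1;
    }
    uint32_t shift = 2u * (uint32_t)pkey;
    assert(shift <= 30);
    fake_pkru &= ~(0x3u << shift);                /* clear AD and WD */
    fake_pkru |= (uint32_t)access_rights << shift; /* set requested bits */
    return 0;
}
```

This only works if every allocation goes through the library. A raw pkey_alloc()/pkey_free() syscall would leave alloc_map stale, which is the imperfection described above.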

> > Applications that frequently get
> > called will get hammered into the ground with serialisation on mmap_sem
> > not to mention the cost of the syscall entry/exit.
>
> I think we can do both of them without mmap_sem, as long as we resign
> ourselves to this just being fundamentally racy (which it is already, I
> think). But is it worth performance-tuning things that we don't expect
> performance-sensitive apps to be using in the first place? They'll just
> use the RDPKRU/WRPKRU instructions directly.
>

I accept the premature optimisation argument, but I think it'll eventually
bite us. This red-flagged for me because so many people have complained
about system call overhead alone when using particular types of hardware
-- DAX springs to mind with the MAP_PMEM_AWARE discussions. Taking
mmap_sem means that pkey operations block parallel faults, mmaps and so
on. If the applications that care are trying to minimise page table
operations, TLB flushes and so on, they are unlikely to be happy if
parallel faults are stalled.

I think whether you serialise pkey_get/pkey_set operations or not, it's
going to be inherently racy, just with different sized windows. A sequence
counter would be sufficient to prevent partial reads. If userspace cares
about the race, then userspace is going to have to serialise its threads'
access to the keys anyway.
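The sequence-counter idea can be sketched in userspace with C11 atomics. This is an analogue of the kernel's seqcount_t helpers (read_seqcount_begin()/read_seqcount_retry()), not those helpers themselves. The struct and function names are hypothetical. Writers are assumed to be serialised against each other by something else, such as a lock private to pkey_alloc/pkey_free. Readers never block writers. They retry only if a write overlapped their read, so they get a consistent (alloc_map, pkru) pair, although that pair can still be stale by the time the caller acts on it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct pkey_state {
    _Atomic uint32_t seq;       /* odd while a writer is mid-update */
    _Atomic uint32_t alloc_map; /* bit n set => pkey n allocated */
    _Atomic uint32_t pkru;      /* 2 access-rights bits per pkey */
};

/* Writer side: assumes callers are already serialised with each other. */
static void pkey_state_write(struct pkey_state *st, uint32_t alloc_map,
                             uint32_t pkru)
{
    uint32_t s = atomic_load_explicit(&st->seq, memory_order_relaxed);
    assert((s & 1) == 0); /* no other writer should be in progress */
    atomic_store_explicit(&st->seq, s + 1, memory_order_relaxed); /* odd */
    atomic_thread_fence(memory_order_release); /* seq bump before data */
    atomic_store_explicit(&st->alloc_map, alloc_map, memory_order_relaxed);
    atomic_store_explicit(&st->pkru, pkru, memory_order_relaxed);
    atomic_store_explicit(&st->seq, s + 2, memory_order_release); /* even */
}

/* Reader side: lock-free; retries if a write overlapped the reads. */
static void pkey_state_read(struct pkey_state *st, uint32_t *alloc_map,
                            uint32_t *pkru)
{
    for (;;) {
        uint32_t s1 = atomic_load_explicit(&st->seq, memory_order_acquire);
        if (s1 & 1)
            continue; /* writer in progress, try again */
        uint32_t a = atomic_load_explicit(&st->alloc_map, memory_order_relaxed);
        uint32_t p = atomic_load_explicit(&st->pkru, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire); /* data loads before re-check */
        uint32_t s2 = atomic_load_explicit(&st->seq, memory_order_relaxed);
        if (s1 == s2) { /* no write overlapped: snapshot is consistent */
            *alloc_map = a;
            *pkru = p;
            return;
        }
    }
}
```

The fence placement follows the usual C11 seqlock pattern. The data fields are relaxed atomics rather than plain variables, so a reader that overlaps a writer is not a data race in the C memory model.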

--
Mel Gorman
SUSE Labs