Re: [PATCH v10 1/4] random: add vgetrandom_alloc() syscall

From: Jason A. Donenfeld
Date: Fri Dec 02 2022 - 09:38:40 EST


On Wed, Nov 30, 2022 at 05:38:13PM +0100, Jason A. Donenfeld wrote:
> On Wed, Nov 30, 2022 at 04:39:55PM +0100, Jason A. Donenfeld wrote:
> > 2) Convert vgetrandom_alloc() into a clone3-style syscall, as Christian
> > suggested earlier, which might allow for a bit more overloading
> > capability. That would be a struct that looks like:
> >
> > struct vgetrandom_alloc_args {
> > 	__aligned_u64 flags;
> > 	__aligned_u64 states;
> > 	__aligned_u64 num;
> > 	__aligned_u64 size_of_each;
> > };
> >
> > - If flags is VGRA_ALLOCATE, states and size_of_each must be zero on
> > input, while num is the hint, as is the case now. On output, states,
> > size_of_each, and num are filled in.
> >
> > - If flags is VGRA_DEALLOCATE, states, size_of_each, and num must be as
> > they were originally, and then it deallocates.
> >
> > I suppose (2) would alleviate your concerns entirely, without future
> > uncertainty over what it'd be like to add special cases to munmap(). And
> > it'd add a bit more future proofing to the syscall, depending on what we
> > do.
> >
> > So maybe I'm warming up to that approach a bit.
>
> So I just did a little quick implementation to see what it'd feel like,
> and actually, it's quite simple, and might address a lot of concerns all
> at once. What do you think of the below? Documentation and such still
> needs work obviously, but the bones should be there.

Well, despite writing into the ether here, I continue to chase my tail
around in circles over this. After Adhemerval expressed a sort of "meh"
opinion to me on IRC about doing the clone3-like thing, I went down an
mm rabbit hole and started looking at all the various ways memory is
allocated in userspace, under what conditions, and why.
Turns out there are a few drivers doing interesting things in this
space.

The long and short of it is that:
- All addresses involve maps and page tables.
- Allocating is mapping, deallocating is unmapping, and there's no way
around that.
- Memory that's "special" usually comes with special attributes or
operations on its vma.

So, this makes me think that `munmap` is a fine *and correct* API for
deallocation. It's what everything else uses, even "special" things. And
it doesn't constrain us in the future in case this gets "registered"
somehow, as Florian described it, because it's still attached to
current->mm and will still always go through the same mapping APIs
anyway.
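
To make the "allocating is mapping, deallocating is unmapping" point
concrete, here is a minimal userspace sketch. It uses only plain
mmap()/munmap(); the helper names states_alloc(), states_free(), and
states_roundtrip() are hypothetical and are not part of this patch
series.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for an allocator that hands back a fresh mapping. */
static void *states_alloc(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Deallocation is nothing more than munmap() on the same range. */
static int states_free(void *p, size_t len)
{
    return munmap(p, len);
}

/* Map one page, touch it, unmap it. Returns 0 on success, -1 on failure. */
static int states_roundtrip(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    unsigned char *p = states_alloc((size_t)page);
    if (!p)
        return -1;

    memset(p, 0xa5, (size_t)page);      /* the mapping is ordinary memory */
    int ok = p[0] == 0xa5 && p[page - 1] == 0xa5;

    if (states_free(p, (size_t)page) != 0)
        return -1;
    return ok ? 0 : -1;
}
```

Whatever special attributes the kernel attaches to the vma, the caller
still tears the range down with the same munmap() call.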

In light of that, I'm going to stick with the original API design, and
not do the clone3() args struct thing and the VGRA_DEALLOCATE flag.
However, I think it'd be a good idea to add an additional parameter of
"unsigned long addr", which is enforced/reserved to be always 0 for now.
This might prove useful for something together with the currently unused
flags argument, sometime in the future.
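
As a rough illustration of what "reserved to be always 0" means for a
caller, here is a userspace mock. Everything in it is an assumption made
for the sketch: the name vgetrandom_alloc_sketch(), the parameter order,
the SKETCH_STATE_SIZE and SKETCH_MAX_STATES constants, and the choice of
EINVAL. It does not invoke any real syscall; it only shows the
reject-nonzero pattern that keeps addr and flags available for later
use.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define SKETCH_STATE_SIZE 256u   /* hypothetical per-state size */
#define SKETCH_MAX_STATES 4096u  /* hypothetical cap, avoids overflow */

/*
 * Hypothetical mock: *num is a hint on input; on success *num and
 * *size_per_each describe what was mapped. addr and flags are reserved
 * and must be 0, so that nonzero values can be given meaning later.
 */
static void *vgetrandom_alloc_sketch(unsigned int *num,
                                     unsigned int *size_per_each,
                                     unsigned long addr, unsigned int flags)
{
    if (addr != 0 || flags != 0) {       /* reserved: reject anything set */
        errno = EINVAL;
        return NULL;
    }
    if (!num || !size_per_each || *num == 0 || *num > SKETCH_MAX_STATES) {
        errno = EINVAL;
        return NULL;
    }

    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && page % (long)SKETCH_STATE_SIZE == 0);

    /* Round the request up to whole pages. */
    size_t want = (size_t)*num * SKETCH_STATE_SIZE;
    size_t len = (want + (size_t)page - 1) / (size_t)page * (size_t)page;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;                     /* errno set by mmap() */

    *num = (unsigned int)(len / SKETCH_STATE_SIZE);
    *size_per_each = SKETCH_STATE_SIZE;
    return p;
}
```

A caller passes 0 for both reserved arguments and later frees the
states with munmap(p, (size_t)num * size_per_each).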

Jason