Re: [RFC PATCH 00/11] mm/mempolicy: Make task->mempolicy externally modifiable via syscall and procfs

From: Michal Hocko
Date: Tue Nov 28 2023 - 04:45:19 EST


On Mon 27-11-23 11:14:44, Gregory Price wrote:
> On Mon, Nov 27, 2023 at 04:29:56PM +0100, Michal Hocko wrote:
> > Sorry, didn't have much time to do a proper review. Couple of points
> > here at least.
> >
> > >
> > > So... yeah... this is one area I think the community very much needs to
> > > comment on: set/get_mempolicy2, many new mempolicy syscalls, procfs? All
> > > of the above?
> >
> > I think we should actively avoid using proc interface. The most
> > reasonable way would be to add get_mempolicy2 interface that would allow
> > extensions and then create a pidfd counterpart to allow acting on a
> > remote task. The latter would require some changes to make mempolicy
> > code less current oriented.
>
> Sounds good, I'll pull my get/set_mempolicy2 RFC on top of this.
>
> Just context: patches 1-6 refactor mempolicy to allow remote task
> twiddling (fixing the current-oriented issues), and patch 7 adds the pidfd
> interfaces you describe above.
>
>
> A couple of questions:
>
> 1) Should we consider simply adding a pidfd arg to set/get_mempolicy2,
> where if (pidfd == 0), then it operates on current, otherwise it
> operates on the target task? That would mitigate the need for what
> amounts to the exact same interface.

This wouldn't fit into the existing pidfd interfaces I am aware of. We
assume a pidfd to be a real fd, with no special cases.
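
To illustrate the point, here is a minimal userspace sketch, assuming Linux
5.3 or later where pidfd_open(2) is available. The helper name
open_self_pidfd() is hypothetical and not part of any proposed interface. A
caller that wants to act on itself can obtain a genuine pidfd for its own
PID, so no sentinel value is needed. Note also that 0 is itself a valid
descriptor number (normally stdin), which makes it a poor "current" marker.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: return a real pidfd referring to the calling
 * process, or -1 with errno set. The raw syscall is used because older
 * glibc versions ship no pidfd_open() wrapper.
 */
static int open_self_pidfd(void)
{
	return (int)syscall(SYS_pidfd_open, getpid(), 0);
}
```

The returned descriptor could then be passed to a pidfd-based call like any
other pidfd and closed with close(2) afterwards.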

> 2) Should we combine all the existing operations into set_mempolicy2 and
> add an operation arg?
>
> set_mempolicy2(pidfd, arg_struct, len)
>
> struct {
>     int pidfd;      /* optional */
>     int operation;  /* describe which op_args to use */
>     union {
>         struct {
>         } set_mempolicy;
>         struct {
>         } set_vma_home_node;
>         struct {
>         } mbind;
>         ...
>     } op_args;
> } args;
>
> capturing:
> sys_set_mempolicy
> sys_set_mempolicy_home_node
> sys_mbind
>
> or should we just make a separate interface for mbind/home_node to
> limit complexity of the single syscall?

My preference would be to go with specific syscalls. Multiplexing
syscalls have turned out to be much more complex and less flexible over
time. Just have a look at futex.
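
A rough userspace sketch of the two shapes, to make the trade-off concrete.
Every name below (the sketch_* functions, the enum and the struct) is
hypothetical and only stands in for the idea; none of it is a proposed ABI,
and the validation is a placeholder.

```c
#include <errno.h>
#include <stddef.h>

/* Multiplexed style: one entry point, behaviour selected by 'op'. */
enum sketch_op { SKETCH_SET_MEMPOLICY, SKETCH_MBIND };

struct sketch_mux_args {
	int op;
	union {
		struct { int mode; } set_mempolicy;
		struct { unsigned long start, len; int mode; } mbind;
	} u;
};

static int sketch_mux(const struct sketch_mux_args *a)
{
	if (!a)
		return -EINVAL;

	/* Every new operation grows both the union and this switch. */
	switch (a->op) {
	case SKETCH_SET_MEMPOLICY:
		return a->u.set_mempolicy.mode >= 0 ? 0 : -EINVAL;
	case SKETCH_MBIND:
		return (a->u.mbind.len && a->u.mbind.mode >= 0) ? 0 : -EINVAL;
	default:
		return -EINVAL;
	}
}

/* Specific style: one entry point per operation, typed arguments. */
static int sketch_set_mempolicy(int mode)
{
	return mode >= 0 ? 0 : -EINVAL;
}

static int sketch_mbind(unsigned long start, unsigned long len, int mode)
{
	(void)start; /* unused in this placeholder check */
	return (len && mode >= 0) ? 0 : -EINVAL;
}
```

With the specific entry points the compiler checks each argument list, and
an operation can be extended without touching the others.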
--
Michal Hocko
SUSE Labs