Re: [External] Re: [PATCH v2] mm: add new syscall pidfd_set_mempolicy().

From: Michal Hocko
Date: Wed Nov 16 2022 - 09:57:42 EST


On Wed 16-11-22 19:28:10, Zhongkun He wrote:
> Hi Michal, I've done the performance testing, please check it out.
>
> > > Yes this is all understood but the level of the overhead is not really
> > > clear. So the question is whether this will induce a visible overhead.
> > > Because from the maintainability point of view it is much less costly to
> > > have a clear life time model. Right now we have a mix of reference
> > > counting and per-task requirements which is rather subtle and easy to
> > > get wrong. In an ideal world we would have get_vma_policy always
> > > returning a reference counted policy or NULL. If we really need to
> > > optimize for cache line bouncing we can go with per cpu reference
> > > counters (something that was not available at the time the mempolicy
> > > code has been introduced).
> > >
> > > So I am not saying that the task_work based solution is not possible I
> > > just think that this looks like a good opportunity to get from the
> > > existing subtle model.
>
> Test tools:
> numactl -m 0-3 ./run-mmtests.sh -n -c configs/config-workload-
> aim9-pagealloc test_name
>
> Modification:
> Get_vma_policy(), get_task_policy() always returning a reference
> counted policy, except for the static policy(default_policy and
> preferred_node_policy[nid]).

It would be better to add the patch that has been tested.

> All vma manipulation is protected by a down_read, so mpol_get()
> can be called directly to take a refcount on the mpol. but there
> is no lock in task->mempolicy context.
> so task->mempolicy should be protected by task_lock.
>
> struct mempolicy *get_task_policy(struct task_struct *p)
> {
> struct mempolicy *pol;
> int node;
>
> if (p->mempolicy) {
> task_lock(p);
> pol = p->mempolicy;
> mpol_get(pol);
> task_unlock(p);
> if (pol)
> return pol;
> }


One way to deal with that would be to use a similar model as css_tryget

Btw. have you tried to profile those slowdowns to identify hotspots?

Thanks
--
Michal Hocko
SUSE Labs