Re: [PATCH 2/2] mm/page_alloc: Add remote draining support to per-cpu lists

From: Paul E. McKenney
Date: Tue Feb 15 2022 - 12:32:25 EST


On Tue, Feb 15, 2022 at 09:47:35AM +0100, Nicolas Saenz Julienne wrote:
> On Tue, 2022-02-08 at 12:47 -0300, Marcelo Tosatti wrote:
> > > Changes since RFC:
> > > - Avoid unnecessary spin_lock_irqsave/restore() in free_pcppages_bulk()
> > > - Add more detail to commit and code comments.
> > > - Use synchronize_rcu() instead of synchronize_rcu_expedited(); the RCU
> > > documentation says to avoid the expedited variant unless really
> > > justified. I don't think it's justified here: if we can schedule and
> > > join works, waiting for an RCU grace period is OK.
> >
> > https://patchwork.ozlabs.org/project/netdev/patch/1306228052.3026.16.camel@edumazet-laptop/
> >
> > Adding 100ms to the direct reclaim path might be problematic. It will
> > also slow down kcompactd (note it'll call drain_all_pages() for each
> > zone).
>
> I did some measurements on an idle machine; the worst case was ~30ms. I
> agree that might be too much for direct reclaim, so I'll switch back to
> expedited and add a comment.

Given your measurements, it looks to me like this is a case where use
of expedited grace periods really is justified.

For one thing, expedited grace periods are much less disruptive than
they were in the old days, for example, back when they used stop-machine.
For another thing, systems that cannot tolerate the disturbance (an IPI
per non-idle non-nohz_full CPU per grace period, less than a wakeup)
can always be booted with rcupdate.rcu_normal=1, which will make
synchronize_rcu_expedited() act like synchronize_rcu(), at least once
RCU has spawned its kthreads. And CONFIG_PREEMPT_RT=y kernels forcibly
set this mode. ;-)

Nevertheless, expedited grace periods should not be used lightly because
they do increase overhead.
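
To make that concrete, here is a minimal sketch of the pattern under
discussion.  The function and the steps around the grace period are
hypothetical, not the actual patch; only the RCU calls and the
rcupdate.rcu_normal semantics are real interfaces:

	/* Hypothetical drain path, for illustration only. */
	static void remote_drain_sketch(void)
	{
		/*
		 * ... publish fresh pcplists with rcu_assign_pointer()
		 * so that remote CPUs stop seeing the old ones ...
		 */

		/*
		 * Wait until no CPU can still reference the old lists.
		 * The expedited variant IPIs each non-idle,
		 * non-nohz_full CPU and typically completes in
		 * microseconds, versus the milliseconds a normal grace
		 * period can take, which matters on the direct reclaim
		 * path.  Booting with rcupdate.rcu_normal=1 (forced by
		 * CONFIG_PREEMPT_RT=y) makes this call behave like
		 * synchronize_rcu().
		 */
		synchronize_rcu_expedited();

		/* ... the old lists can now be freed safely ... */
	}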

Thanx, Paul

> > > - Avoid sparse warnings by using rcu_access_pointer() and
> > > rcu_dereference_protected().
> > >
> > >  include/linux/mmzone.h |  22 +++++-
> > >  mm/page_alloc.c        | 155 ++++++++++++++++++++++++++---------------
> > >  mm/vmstat.c            |   6 +-
> > >  3 files changed, 120 insertions(+), 63 deletions(-)
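
(A hedged aside on the sparse item above: rcu_access_pointer() is the
documented accessor for fetching an __rcu pointer without dereferencing
it, and rcu_dereference_protected() for dereferencing it when the caller
already excludes updates.  The call sites below are hypothetical, not
the patch's hunks:)

	/* Read the pointer value only; no dereference, no RCU read lock. */
	if (rcu_access_pointer(pcp->lp) == NULL)
		return;

	/*
	 * Update side: the caller holds the pagesets local_lock, which
	 * excludes concurrent updaters.  The lockdep expression is
	 * illustrative.
	 */
	lp = rcu_dereference_protected(pcp->lp,
			lockdep_is_held(this_cpu_ptr(&pagesets.lock)));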
> > >
> > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > > index b4cb85d9c6e8..b0b593fd8e48 100644
> > > --- a/include/linux/mmzone.h
> > > +++ b/include/linux/mmzone.h
> > > @@ -388,13 +388,31 @@ struct per_cpu_pages {
> > >  	short expire;	/* When 0, remote pagesets are drained */
> > >  #endif
> > >
> > > -	struct pcplists *lp;
> > > +	/*
> > > +	 * As a rule of thumb, any access to struct per_cpu_pages's 'lp' has to
> > > +	 * happen with the pagesets local_lock held and using
> > > +	 * rcu_dereference_check(). If there is a need to modify both
> > > +	 * 'lp->count' and 'lp->lists' in the same critical section, 'pcp->lp'
> > > +	 * can only be derefrenced once. See for example:
> >
> > Typo.
>
> Noted.
>
> Thanks!
>
> --
> Nicolás Sáenz
>
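
(The example the quoted comment points to is cut off above.  Purely as
an illustration of the dereference-once rule it describes, with
hypothetical locals and an illustrative lockdep condition:)

	struct pcplists *lp;

	/*
	 * Dereference pcp->lp exactly once so that 'count' and 'lists'
	 * are updated against the same snapshot, even if a remote
	 * drain publishes a new pcp->lp concurrently.
	 */
	lp = rcu_dereference_check(pcp->lp,
			lockdep_is_held(this_cpu_ptr(&pagesets.lock)));

	lp->count -= 1 << order;
	list_add(&page->lru, &lp->lists[pindex]);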