RE: [PATCH 0/2] execve scalability issues, part 1

From: David Laight
Date: Wed Aug 23 2023 - 06:51:57 EST


From: Jan Kara
> Sent: Wednesday, August 23, 2023 10:49 AM
....
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -737,7 +737,11 @@ struct mm_struct {
> >
> >  	unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */
> >
> > -	struct percpu_counter rss_stat[NR_MM_COUNTERS];
> > +	union {
> > +		struct percpu_counter rss_stat[NR_MM_COUNTERS];
> > +		u64 *rss_stat_single;
> > +	};
> > +	bool magic_flag_stuffed_elsewhere;

I wouldn't use a union to save a pointer - it is asking for trouble.
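
A non-union layout only costs the one extra pointer and keeps the selection
explicit; roughly like the below (illustrative only, the field names are made
up here, not taken from the patch):

	struct mm_struct {
		...
		/* which representation is live is an explicit flag, not type punning */
		struct percpu_counter rss_stat[NR_MM_COUNTERS];
		u64 *rss_stat_single;
		bool rss_stat_is_percpu;
		...
	};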

> >
> >  	struct linux_binfmt *binfmt;
> >
> >
> > Then for the single-threaded case an area is allocated for NR_MM_COUNTERS
> > counters * 2 -- the first set is updated without any synchronisation by the
> > current thread, the second set is only modified by others and protected
> > with mm->arg_lock. The lock protects remote access to the union to begin
> > with.
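
As I read it, that boils down to something like the below (purely
illustrative, the names are mine, not the patch's):

	/* rss_stat_single points at an area of NR_MM_COUNTERS * 2 u64s */

	/* owner thread: no synchronisation at all */
	mm->rss_stat_single[member] += value;

	/* everyone else: the second set, under mm->arg_lock */
	spin_lock(&mm->arg_lock);
	mm->rss_stat_single[NR_MM_COUNTERS + member] += value;
	spin_unlock(&mm->arg_lock);

A reader then sums both slots, taking arg_lock for the remote half.
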
>
> arg_lock seems a bit like a hack. How is it related to rss_stat? The scheme
> with two counters is clever but I'm not 100% convinced the complexity is
> really worth it. I'm not sure the overhead of always using an atomic
> counter would really be measurable as atomic counter ops in local CPU cache
> tend to be cheap. Did you try to measure the difference?

A separate lock is worse than atomics.
(Although some 32-bit arches may have issues with 64-bit atomics.)

I think you'll be surprised just how slow atomic ops are, even when the
line is already present in the local cache.
(Probably because any other copies have to be invalidated.)
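
If anyone wants a quick userspace sanity check of the single-CPU cost,
something like the below will do as a rough probe (not a proper benchmark,
and the kernel-side numbers will differ; build with cc -O2):

/* plain vs relaxed-atomic 64-bit increment on cache-hot data */
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define ITERS 100000000ULL

static volatile uint64_t plain_cnt;	/* volatile so the loop isn't folded away */
static _Atomic uint64_t atomic_cnt;

static double now_sec(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
	double t0, t1, t2;
	uint64_t i;

	t0 = now_sec();
	for (i = 0; i < ITERS; i++)
		plain_cnt += 1;
	t1 = now_sec();
	for (i = 0; i < ITERS; i++)
		atomic_fetch_add_explicit(&atomic_cnt, 1, memory_order_relaxed);
	t2 = now_sec();

	printf("plain:  %.2f ns/op\n", (t1 - t0) * 1e9 / ITERS);
	printf("atomic: %.2f ns/op\n", (t2 - t1) * 1e9 / ITERS);
	return 0;
}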

David
