Re: [PATCH v4 1/3] cacheinfo: Add arch specific early level initializer

From: Radu Rendec
Date: Thu May 11 2023 - 15:56:05 EST


On Wed, 2023-05-10 at 17:00 -0700, Ricardo Neri wrote:
> On Wed, May 10, 2023 at 04:44:49PM -0400, Radu Rendec wrote:
> > On Wed, 2023-05-10 at 12:12 -0700, Ricardo Neri wrote:
> > > On Wed, Apr 12, 2023 at 02:57:57PM -0400, Radu Rendec wrote:
> > > > This patch gives architecture specific code the ability to initialize
> > > > the cache level and allocate cacheinfo memory early, when cache level
> > > > initialization runs on the primary CPU for all possible CPUs.
> > [cut]
> > > > -int detect_cache_attributes(unsigned int cpu)
> > > > +static inline int init_level_allocate_ci(unsigned int cpu)
> > > >  {
> > > > -       int ret;
> > > > +       unsigned int early_leaves = cache_leaves(cpu);
> > > >  
> > > >         /* Since early initialization/allocation of the cacheinfo is allowed
> > > >          * via fetch_cache_info() and this also gets called as CPU hotplug
> > > >          * callbacks via cacheinfo_cpu_online, the init/alloc can be skipped
> > > >          * as it will happen only once (the cacheinfo memory is never freed).
> > > > -        * Just populate the cacheinfo.
> > > > +        * Just populate the cacheinfo. However, if the cacheinfo has been
> > > > +        * allocated early through the arch-specific early_cache_level() call,
> > > > +        * there is a chance the info is wrong (this can happen on arm64). In
> > > > +        * that case, call init_cache_level() anyway to give the arch-specific
> > > > +        * code a chance to make things right.
> > > >          */
> > > > -       if (per_cpu_cacheinfo(cpu))
> > > > -               goto populate_leaves;
> > > > +       if (per_cpu_cacheinfo(cpu) && !ci_cacheinfo(cpu)->early_ci_levels)
> > > > +               return 0;
> > > >  
> > > >         if (init_cache_level(cpu) || !cache_leaves(cpu))
> > > >                 return -ENOENT;
> > > >  
> > > > -       ret = allocate_cache_info(cpu);
> > > > +       /*
> > > > +        * Now that we have properly initialized the cache level info, make
> > > > +        * sure we don't try to do that again the next time we are called
> > > > +        * (e.g. as CPU hotplug callbacks).
> > > > +        */
> > > > +       ci_cacheinfo(cpu)->early_ci_levels = false;
> > > > +
> > > > +       if (cache_leaves(cpu) <= early_leaves)
> > > > +               return 0;
> > > > +
> > >
> > > I had posted a patchset[1] for x86 that initializes
> > > ci_cacheinfo(cpu)->num_leaves during SMP boot.
> > >
> > > This means that early_leaves and a late cache_leaves() are equal but
> > > per_cpu_cacheinfo(cpu) is never allocated. Currently, x86 does not use
> > > fetch_cache_info().
> > >
> > > I think that we should check here that per_cpu_cacheinfo() has been allocated to
> > > take care of the case in which early and late cache leaves remain the same:
> > >
> > > -       if (cache_leaves(cpu) <= early_leaves)
> > > +       if (cache_leaves(cpu) <= early_leaves && per_cpu_cacheinfo(cpu))
> > >
> > > Otherwise, in v6.4-rc1 + [1] I observe a NULL pointer dereference from
> > > last_level_cache_is_valid().
> > >
> > > I can post a patch with this fix if it makes sense.
> > >
> > > [1]. https://lore.kernel.org/all/20230424001956.21434-3-ricardo.neri-calderon@xxxxxxxxxxxxxxx/
> >
> > Thanks for bringing this to my attention. I need to run some tests on
> > x86 (I did all that work/testing on arm64) and wrap my head around it.
> >
> > While I don't see any problem with the fix you're proposing, I'm afraid
> > it may circle back to the other problem I tried to fix initially. Have
> > you tested this on an RT kernel by any chance?
>
> That is a good point. I did not test on an RT kernel. I'll try that.

It looks like the flow is much simpler on x86: detect_cache_attributes()
is called only once for each CPU, and it's called in kthread context.

I haven't tested on an RT kernel but I think it should be fine. I put a
msleep() there and saw no issues, which means kmalloc() on RT should be
fine as well.

> > I'm thinking that if we end up in init_level_allocate_ci() without the
> > cacheinfo memory having been allocated earlier, we're up for a "BUG"
> > splat on RT kernels.
> >
> > If early_leaves has the right value at that point, the cacheinfo memory
> > should be allocated early (on the primary CPU), so perhaps there's a
> > different problem somewhere else.
>
> That can work for x86, IMO. Not sure about other archs. As you mention,
> other archs still want the chance to correct the early cache info.

You're right. I got confused for a moment because I was used to the
arm64 flow. On x86, there is no "early" cache info per se because, as I
already mentioned, detect_cache_attributes() is called only once for
each CPU.

I was intrigued about how this worked without your changes, and I
looked closer. Between the initialization of the early_leaves variable
at the beginning of init_level_allocate_ci() and the comparison of
cache_leaves(cpu) and early_leaves, init_cache_level() gets called.
Before your changes, (struct cpu_cacheinfo).num_leaves was initialized
to 0 and then changed in init_cache_level(). That way, early_leaves
ended up as 0, which made the comparison evaluate to false.

At this point I think the patch you proposed is the right way to fix
this. I don't see any reason why it would interfere with other archs
that really use early allocation. This new patch should probably be
added to your series, since otherwise your other patches would
basically "introduce" a null-pointer deref.

My only suggestion would be to add a short comment before the
comparison, to explain that on x86 detect_cache_attributes() is called
only once for each CPU and so early allocation is not possible but
(struct cpu_cacheinfo).num_leaves is already initialized by the time
detect_cache_attributes() is called.

Regards,
Radu