Re: [PATCH v5 09/19] arch_topology: Use the last level cache information from the cacheinfo

From: Sudeep Holla
Date: Thu Jun 30 2022 - 16:08:47 EST


On Thu, Jun 30, 2022 at 07:20:04PM +0000, Conor.Dooley@xxxxxxxxxxxxx wrote:
>
>
> On 30/06/2022 18:35, Sudeep Holla wrote:
> > On Thu, Jun 30, 2022 at 04:37:50PM +0000, Conor.Dooley@xxxxxxxxxxxxx wrote:
> >> On 30/06/2022 11:39, Sudeep Holla wrote:
> >>>
> >>> I can't think of any reason for that to happen unless detect_cache_attributes
> >>> is failing from init_cpu_topology and we are ignoring that.
> >>>
> >>> Are all RISC-V platforms failing on -next or is it just this platform ?
> >>
> >> I don't know. I only have SoCs with this core complex & one that does not
> >> work with upstream. I can try my other board with this SoC - but I am on
> >> leave at the moment w/ a computer or internet during the day so it may be
> >> a few days before I can try it.
> >>
> >
> > Sure, no worries.
> >
> >> However, Niklas Cassel has tried to use the Canaan K210 on next-20220630
> >> but had issues with RCU stalling:
> >> https://lore.kernel.org/linux-riscv/Yr3PKR0Uj1bE5Y6O@x1-carbon/T/#m52016996fcf5fa0501066d73352ed8e806803e06
> >> Not going to claim any relation, but that's minus 1 to the platforms that
> >> can be used to test this on upstream RISC-V.
> >>
> >
> > Ah OK, will check and ask full logs to see if there is any relation.
> >
> >>> We may have to try with some logs in detect_cache_attributes,
> >>> last_level_cache_is_valid and last_level_cache_is_shared to check where it
> >>> is going wrong.
> >>>
> >>> It must be crashing in smp_callin->update_siblings_masks->last_level_cache_is_shared
>
>
> So, looks like there's a problem in cache_leaves_are_shared() which is hit
> by the above path. Both of the if clauses are false, and the function falls
> through to return sib_leaf->fw_token == this_leaf->fw_token;

Both if() failing is expected and that statement
return sib_leaf->fw_token == this_leaf->fw_token;
execution is correct.

> Both sib_leaf & this_leaf seem to be null.
>

But this is wrong as last_level_cache_is_shared checks for
last_level_cache_is_valid which must return false if the fw_token = NULL
So we must not hit the above return statement with NULL fw_token.

> static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf,
> struct cacheinfo *sib_leaf)
> {
> /*
> * For non DT/ACPI systems, assume unique level 1 caches,
> * system-wide shared caches for all other levels. This will be used
> * only if arch specific code has not populated shared_cpu_map
> */
> if (!(IS_ENABLED(CONFIG_OF) || IS_ENABLED(CONFIG_ACPI)))
> return !(this_leaf->level == 1);
>
> if ((sib_leaf->attributes & CACHE_ID) &&
> (this_leaf->attributes & CACHE_ID))
> return sib_leaf->id == this_leaf->id;
>
> return sib_leaf->fw_token == this_leaf->fw_token;
> }
>
> Any ideas what to look at next?

I wonder how did we not get last_level_cache_is_valid as false if the
fw_node is NULL. But it should not be NULL at the first place.

--
Regards,
Sudeep