Re: PROBLEM: Only one CPU active on Ultra 60 since ~4.8 (regression)

From: Linux regression tracking (Thorsten Leemhuis)
Date: Thu Mar 28 2024 - 15:36:23 EST


[CCing Linus, in case I say something to his disliking]

On 22.03.24 05:57, Nick Bowler wrote:
>
> Just a friendly reminder that this issue still happens on Linux 6.8 and
> reverting commit 9b2f753ec237 as indicated below is still sufficient to
> resolve the problem.

FWIW, that commit 9b2f753ec23710 ("sparc64: Fix cpu_possible_mask if
nr_cpus is set") is from v4.8. Reverting it after all that time might
easily lead to even bigger trouble. That's why it might be better to
handle this like a bug and not like a regression. At least unless we
find someone to judge how likely such an outcome is. But it seems nobody
really cared so far, so unless this mail makes someone act you might be
out of luck. :-/

I wish it were different, but in the end we (including the maintainers)
are all just volunteers here, whom you can only motivate or persuade (up
to a point) to look into an issue, but cannot force to do so.

Ciao, Thorsten

> On 2023-01-21 08:31, Linux kernel regression tracking (Thorsten Leemhuis) wrote:
>> CCing the sparc maintainer. Also CCing the regression list, as it should
>> be in the loop for regressions:
>> https://docs.kernel.org/admin-guide/reporting-regressions.html
>>
>> The mail address of the culprit's author bounces. There is another
>> Atish Patra still active; does anyone know if those two are the same
>> person?
>>
>> Anyway, that's it from my side.
> [...]
>> On 20.01.23 04:15, Nick Bowler wrote:
>>> Hi,
>>>
>>> I'm resending this report CC'd to linux-kernel as there was no response
>>> on the sparclinux list.
>>>
>>> I tried 6.2-rc4 and there is no change in behaviour. Reverting the
>>> indicated commit still works to fix the problem.
>>>
>>> On 2022-07-12, Nick Bowler <nbowler@xxxxxxxxxx> wrote:
>>>> When using newer kernels on my Ultra 60 with dual 450MHz UltraSPARC-II
>>>> CPUs, I noticed that only CPU 0 comes up, while older kernels (including
>>>> 4.7) are working fine with both CPUs.
>>>>
>>>> I bisected the failure to this commit:
>>>>
>>>> 9b2f753ec23710aa32c0d837d2499db92fe9115b is the first bad commit
>>>> commit 9b2f753ec23710aa32c0d837d2499db92fe9115b
>>>> Author: Atish Patra <atish.patra@xxxxxxxxxx>
>>>> Date: Thu Sep 15 14:54:40 2016 -0600
>>>>
>>>> sparc64: Fix cpu_possible_mask if nr_cpus is set
>>>>
>>>> This is a small change that reverts very easily on top of 5.18: there is
>>>> just one trivial conflict. Once reverted, both CPUs work again.
>>>>
>>>> Maybe this is related to the fact that the CPUs on this system are
>>>> numbered CPU0 and CPU2 (there is no CPU1)?
>>>>
>>>> Here is /proc/cpuinfo on a working kernel:
>>>>
>>>> % cat /proc/cpuinfo
>>>> cpu : TI UltraSparc II (BlackBird)
>>>> fpu : UltraSparc II integrated FPU
>>>> pmu : ultra12
>>>> prom : OBP 3.23.1 1999/07/16 12:08
>>>> type : sun4u
>>>> ncpus probed : 2
>>>> ncpus active : 2
>>>> D$ parity tl1 : 0
>>>> I$ parity tl1 : 0
>>>> cpucaps : flush,stbar,swap,muldiv,v9,mul32,div32,v8plus,vis
>>>> Cpu0ClkTck : 000000001ad31b4f
>>>> Cpu2ClkTck : 000000001ad31b4f
>>>> MMU Type : Spitfire
>>>> MMU PGSZs : 8K,64K,512K,4MB
>>>> State:
>>>> CPU0: online
>>>> CPU2: online
>>>>
>>>> And on a broken kernel:
>>>>
>>>> % cat /proc/cpuinfo
>>>> cpu : TI UltraSparc II (BlackBird)
>>>> fpu : UltraSparc II integrated FPU
>>>> pmu : ultra12
>>>> prom : OBP 3.23.1 1999/07/16 12:08
>>>> type : sun4u
>>>> ncpus probed : 2
>>>> ncpus active : 1
>>>> D$ parity tl1 : 0
>>>> I$ parity tl1 : 0
>>>> cpucaps : flush,stbar,swap,muldiv,v9,mul32,div32,v8plus,vis
>>>> Cpu0ClkTck : 000000001ad31861
>>>> MMU Type : Spitfire
>>>> MMU PGSZs : 8K,64K,512K,4MB
>>>> State:
>>>> CPU0: online
>>>>
>>>> Let me know if you need any more info.
>>>>
>>>> Thanks,
>>>> Nick
>
>
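
For readers following the CPU0/CPU2 guess in the original report, here is a
small illustration of how a gap in the numbering could hide a CPU. It is a toy
model in Python, not the kernel's code: the function names, the `limit` value,
and the idea that the possible mask gets filled for contiguous IDs are all
assumptions made for illustration, and nothing in this thread confirms that
the commit works this way.

```python
# Hypothetical model, NOT kernel code: compares two ways of building a
# "possible CPU" set when firmware reports non-contiguous CPU IDs.

def possible_from_count(probed_ids, nr_cpu_ids):
    """Assume IDs are contiguous: mark 0..n-1, n = min(count, nr_cpu_ids)."""
    n = min(len(probed_ids), nr_cpu_ids)
    return set(range(n))

def possible_from_ids(probed_ids, nr_cpu_ids):
    """Mark exactly the probed IDs that fall below nr_cpu_ids."""
    return {cpu for cpu in probed_ids if cpu < nr_cpu_ids}

def bringable(probed_ids, possible):
    """A CPU can come online only if it was probed and is marked possible."""
    return sorted(set(probed_ids) & possible)

probed = [0, 2]  # numbering from the report: CPU0 and CPU2, no CPU1
limit = 64       # assumed nr_cpu_ids, large enough not to matter here

print(bringable(probed, possible_from_count(probed, limit)))
print(bringable(probed, possible_from_ids(probed, limit)))
```

Under these assumptions the first line printed is [0] and the second is
[0, 2], which lines up with the "ncpus active" values in the two
/proc/cpuinfo dumps. That match alone does not prove the cause.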