Re: [PATCH RFC 1/1] kernel/cpu: to track which CPUHP callback is failed

From: Dongli Zhang
Date: Wed Apr 07 2021 - 18:14:43 EST




On 4/6/21 9:10 AM, Qais Yousef wrote:
> On 04/05/21 14:22, Dongli Zhang wrote:
>> May I have if there is any feedback on this? To pr_err_once() here helps narrow
>> down what is the root cause of cpu online failure.
>>
>>
>> The issue fixed by d7eb79c6290c ("KVM: kvmclock: Fix vCPUs > 64 can't be
>> online/hotpluged") is able to demonstrate how this pr_err_once() helps.
>>
>> Due to VM kernel issue, at most 64 VCPUs can online if host clocksource is
>> switched to hpet.
>>
>> # echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource
>>
>>
>> By default, we have no idea why only 64 out of 100 VCPUs are online in VM. We
>> need to customize kernel to debug why some CPUs are not able to online.
>>
>>
>> We will have below and know the root cause is with kvmclock, if we add the
>> pr_err_once().
>
> Isn't pr_debug() more appropriate here? Don't you want one on the
> cpuhp_down_callbacks() too?

I will add one to cpuhp_down_callbacks() as well. Thank you very much for the
suggestion.

>
> I *think* it's common now to have CONFIG_DYNAMIC_DEBUG enabled so one can
> enable that line when they need to
>
> https://www.kernel.org/doc/html/latest/admin-guide/dynamic-debug-howto.html
>
> I can see the appeal of pr_err_once() but I think this can fail for legitimate
> reasons and is not necessarily strictly always an error?

The reason I used pr_err_once() is to remind the sysadmin something bad (maybe
not necessarily strictly as error) used to happen. It is quite helpful for
developer/sysadmin that is not familiar with cpu/smp/bootup/online.

Otherwise, someone will need to manually configure kernel command line with
"loglevel=8 dyndbg="file kernel/cpu.c +p".

The good thing with pr_err_once is it is not based on the assumption that
someone already know the root cause is related to kernel/cpu.c.

The good thing with pr_debug is we will not need to print for at most once.

I will send v2 with pr_debug() and let all maintainers to decide if it is
fine/required to print something here. Some places use WARN_ON_ONCE() for the
return value of cpuhp_invoke_callback(). I prefer to use pr_debug/pr_err to
print extra debug information for cpu online/hotplug case.

Thank you very much!

Dongli Zhang

>
> Thanks
>
> --
> Qais Yousef
>