Re: [PATCH v4] cpufreq: intel_pstate: Reporting reasons why driver prematurely exit

From: Erwan Velu
Date: Wed Feb 13 2019 - 04:13:54 EST



Le 13/02/2019 Ã 00:01, Rafael J. Wysocki a ÃcritÂ:
[...]
> Newline characters are missing in all of your messages.

oops. Fixing this.

> "ACPI _PSS not found\n"

Done.


>> return true;
>> }
>>
>> @@ -2484,10 +2485,15 @@ static bool __init intel_pstate_no_acpi_pcch(void)
>> acpi_handle handle;
>>
>> status = acpi_get_handle(NULL, "\\_SB", &handle);
>> - if (ACPI_FAILURE(status))
>> + if (ACPI_FAILURE(status)) {
>> + pr_debug("Cannot detect ACPI SB");
> This is very unlikely to happen, I wouldn't bother to print anything here.

As this is test is made, it means someone considered that case could exist.

That's why a added a message to catch this case while debugging.

I do agree that is not very likely.

>
>> return true;
>> + }
>>
>> - return !acpi_has_method(handle, "PCCH");
>> + status = acpi_has_method(handle, "PCCH");
>> + if (!status)
>> + pr_debug("Cannot detect ACPI PCCH");
> This message can be printed by the caller and, again, I would prefer
> something like "ACPI PCCH not found".
Fixed too.

[..]
> "ACPI _PPC not found\n"
fixed.
>> return false;
>> }
>>
>> @@ -2539,8 +2546,10 @@ static bool __init intel_pstate_platform_pwr_mgmt_exists(void)
>> id = x86_match_cpu(intel_pstate_cpu_oob_ids);
>> if (id) {
>> rdmsrl(MSR_MISC_PWR_MGMT, misc_pwr);
>> - if ( misc_pwr & (1 << 8))
>> + if (misc_pwr & (1 << 8)) {
>> + pr_debug("MSR_MISC_PWR_MGMT reports enabled HW coordination");
> IIRC this means that the platform is managing P-states on systems
> without HWP, so I would print something like "P-states managed by the
> platform\n".

That's what the caller reports.

I added this additional message because this function added as special
case for this MSR setting.

if intel_pstate_platform_pwr_mgmt_exists() is true, that's nice to know
why when debugging.

>> return true;
>> + }
>> }
>>
>> idx = acpi_match_platform_list(plat_info);
>> @@ -2606,22 +2615,28 @@ static int __init intel_pstate_init(void)
>> }
>> } else {
>> id = x86_match_cpu(intel_pstate_cpu_ids);
>> - if (!id)
>> + if (!id) {
>> + pr_warn("CPU ID is not in the list of supported devices\n");
> Why not pr_debug()?
>
> And analogously below?

I'm coming from a use case where on the same hardware, changing the
kernel changed the performance profile and impacted user's performance.

I had to debug this case from a running server.

From the dmesg, one of the difference I saw was about the
missingÂ"Intel P-state driver initializing" message, but nothing else.

The reason why the driver didn't engaged itself wasn't explicit.

As CONFIG_X86_INTEL_PSTATE is set to Y by default, I wasn't enable to
reload it in any way.

By reading the code, most of the code path and return options doesn't
have a single pr_<something> call to explain why.

So I had to rebuild the kernel + my patch to understand which path was
taken and then find the root cause of why the pstate behavior changed.

I wanted to contribute back my experience here by providing messages to
ease the understanding and potentially debugging of the driver without
recompiling it.


So the root of this patch is being able to report the main reasons of
why the driver didn't engaged itself with a pr_warn (could be also a
pr_info).

All the major and probable "return -ENODEV" calls before pr_info("Intel
P-state driver initializing\n") deserves to be explicit and reachable on
dmesg for the operators.

I'm operating 30K+ servers and having this kind direct information would
save serious time when debugging this situations.

If my experience in that area could help other users, I'd be happy.


I'm positing a v5 with all those changes.

Thanks for the review.


Erwan,