Re: [PATCH v4] cpufreq: intel_pstate: Reporting reasons why driver prematurely exit

From: Rafael J. Wysocki
Date: Wed Feb 13 2019 - 05:34:37 EST


On Wed, Feb 13, 2019 at 10:13 AM Erwan Velu <e.velu@xxxxxxxxxx> wrote:
>
>
> Le 13/02/2019 Ã 00:01, Rafael J. Wysocki a Ãcrit :
> [...]
> > Newline characters are missing in all of your messages.
>
> oops. Fixing this.
>
> > "ACPI _PSS not found\n"
>
> Done.
>
>
> >> return true;
> >> }
> >>
> >> @@ -2484,10 +2485,15 @@ static bool __init intel_pstate_no_acpi_pcch(void)
> >> acpi_handle handle;
> >>
> >> status = acpi_get_handle(NULL, "\\_SB", &handle);
> >> - if (ACPI_FAILURE(status))
> >> + if (ACPI_FAILURE(status)) {
> >> + pr_debug("Cannot detect ACPI SB");
> > This is very unlikely to happen, I wouldn't bother to print anything here.
>
> As this is test is made, it means someone considered that case could exist.
>
> That's why a added a message to catch this case while debugging.
>
> I do agree that is not very likely.

So a single "no PCCH" message for this whole function should be sufficient.

> >
> >> return true;
> >> + }
> >>
> >> - return !acpi_has_method(handle, "PCCH");
> >> + status = acpi_has_method(handle, "PCCH");
> >> + if (!status)
> >> + pr_debug("Cannot detect ACPI PCCH");
> > This message can be printed by the caller and, again, I would prefer
> > something like "ACPI PCCH not found".
> Fixed too.
>
> [..]
> > "ACPI _PPC not found\n"
> fixed.
> >> return false;
> >> }
> >>
> >> @@ -2539,8 +2546,10 @@ static bool __init intel_pstate_platform_pwr_mgmt_exists(void)
> >> id = x86_match_cpu(intel_pstate_cpu_oob_ids);
> >> if (id) {
> >> rdmsrl(MSR_MISC_PWR_MGMT, misc_pwr);
> >> - if ( misc_pwr & (1 << 8))
> >> + if (misc_pwr & (1 << 8)) {
> >> + pr_debug("MSR_MISC_PWR_MGMT reports enabled HW coordination");
> > IIRC this means that the platform is managing P-states on systems
> > without HWP, so I would print something like "P-states managed by the
> > platform\n".
>
> That's what the caller reports.
>
> I added this additional message because this function added as special
> case for this MSR setting.
>
> if intel_pstate_platform_pwr_mgmt_exists() is true, that's nice to know
> why when debugging.

I see.

> >> return true;
> >> + }
> >> }
> >>
> >> idx = acpi_match_platform_list(plat_info);
> >> @@ -2606,22 +2615,28 @@ static int __init intel_pstate_init(void)
> >> }
> >> } else {
> >> id = x86_match_cpu(intel_pstate_cpu_ids);
> >> - if (!id)
> >> + if (!id) {
> >> + pr_warn("CPU ID is not in the list of supported devices\n");
> > Why not pr_debug()?
> >
> > And analogously below?
>
> I'm coming from a use case where on the same hardware, changing the
> kernel changed the performance profile and impacted user's performance.
>
> I had to debug this case from a running server.
>
> From the dmesg, one of the difference I saw was about the
> missing "Intel P-state driver initializing" message, but nothing else.
>
> The reason why the driver didn't engaged itself wasn't explicit.

And what did turn out to be the problem?

Anyway, pr_info() should be sufficient IMO.

> As CONFIG_X86_INTEL_PSTATE is set to Y by default, I wasn't enable to
> reload it in any way.
>
> By reading the code, most of the code path and return options doesn't
> have a single pr_<something> call to explain why.
>
> So I had to rebuild the kernel + my patch to understand which path was
> taken and then find the root cause of why the pstate behavior changed.
>
> I wanted to contribute back my experience here by providing messages to
> ease the understanding and potentially debugging of the driver without
> recompiling it.
>
>
> So the root of this patch is being able to report the main reasons of
> why the driver didn't engaged itself with a pr_warn (could be also a
> pr_info).
>
> All the major and probable "return -ENODEV" calls before pr_info("Intel
> P-state driver initializing\n") deserves to be explicit and reachable on
> dmesg for the operators.
>
> I'm operating 30K+ servers and having this kind direct information would
> save serious time when debugging this situations.
>
> If my experience in that area could help other users, I'd be happy.
>
> I'm positing a v5 with all those changes.

Many thanks for your contribution!