[PATCH] ACPI: PHAT: Add Platform Health Assessment Table support

From: Avadhut Naik
Date: Mon Aug 21 2023 - 22:26:22 EST


Hi,

On 8/21/2023 13:01, Rafael J. Wysocki wrote:
> On Mon, Aug 21, 2023 at 7:52 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
>>
>> On Mon, Aug 21, 2023 at 7:35 PM Limonciello, Mario
>> <mario.limonciello@xxxxxxx> wrote:
>>>
>>>
>>>
>>> On 8/21/2023 12:29 PM, Rafael J. Wysocki wrote:
>>>> On Mon, Aug 21, 2023 at 7:17 PM Limonciello, Mario
>>>> <mario.limonciello@xxxxxxx> wrote:
>>>>>
>>>>> On 8/21/2023 12:12 PM, Rafael J. Wysocki wrote:
>>>>> <snip>
>>>>>>> I was just talking to some colleagues about PHAT recently as well.
>>>>>>>
>>>>>>> The use case that jumps out is "system randomly rebooted while I was
>>>>>>> doing XYZ". You don't know what happened, but you keep using your
>>>>>>> system. Then it happens again.
>>>>>>>
>>>>>>> If the reason for the random reboot is captured to dmesg you can cross
>>>>>>> reference your journal from the next boot after any random reboot and
>>>>>>> get the reason for it. If a user reports this to a Gitlab issue tracker
>>>>>>> or Bugzilla it can be helpful in establishing a pattern.
>>>>>>>
>>>>>>>>> The below location may be appropriate in that case:
>>>>>>>>> /sys/firmware/acpi/
>>>>>>>>
>>>>>>>> Yes, it may. >
>>>>>>>>> We already have FPDT and BGRT being exported from there.
>>>>>>>>
>>>>>>>> In fact, all of the ACPI tables can be retrieved verbatim from
>>>>>>>> /sys/firmware/acpi/tables/ already, so why exactly do you want the
>>>>>>>> kernel to parse PHAT in particular?
>>>>>>>>
>>>>>>>
>>>>>>> It's not to say that /sys/firmware/acpi/PHAT isn't useful, but having
>>>>>>> something internal to the kernel "automatically" parsing it and saving
>>>>>>> information to a place like the kernel log that is already captured by
>>>>>>> existing userspace tools I think is "more" useful.
>>>>>>
>>>>>> What existing user space tools do you mean? Is there anything already
>>>>>> making use of the kernel's PHAT output?
>>>>>>
>>>>>
>>>>> I was meaning things like systemd already capture the kernel long
>>>>> ringbuffer. If you save stuff like this into the kernel log, it's going
>>>>> to be indexed and easier to grep for boots that had it.
>>>>>
>>>>>> And why can't user space simply parse PHAT by itself?
>>>>>> > There are multiple ACPI tables that could be dumped into the kernel
>>>>>> log, but they aren't. Guess why.
>>>>>
>>>>> Right; there's not reason it can't be done by userspace directly.
>>>>>
>>>>> Another way to approach this problem could be to modify tools that
>>>>> excavate records from a reboot to also get PHAT. For example
>>>>> systemd-pstore will get any kernel panics from the previous boot from
>>>>> the EFI pstore and put them into /var/lib/systemd/pstore.
>>>>>
>>>>> No reason that couldn't be done automatically for PHAT too.
>>>>
>>>> I'm not sure about the connection between the PHAT dump in the kernel
>>>> log and pstore.
>>>>
>>>> The PHAT dump would be from the time before the failure, so it is
>>>> unclear to me how useful it can be for diagnosing it. However, after
>>>> a reboot one should be able to retrieve PHAT data from the table
>>>> directly and that may include some information regarding the failure.
>>>
>>> Right so the thought is that at bootup you get the last entry from PHAT
>>> and save that into the log.
>>>
>>> Let's say you have 3 boots:
>>> X - Triggered a random reboot
>>> Y - Cleanly shut down
>>> Z - Boot after a clean shut down
>>>
>>> So on boot Y you would have in your logs the reason that boot X rebooted.
>>
>> Yes, and the same can be retrieved from the PHAT directly from user
>> space at that time, can't it?
>>
>>> On boot Z you would see something about how boot Y's reason.
>>>
>>>>
>>>> With pstore, the assumption is that there will be some information
>>>> relevant for diagnosing the failure in the kernel buffer, but I'm not
>>>> sure how the PHAT dump from before the failure can help here?
>>>
>>> Alone it's not useful.
>>> I had figured if you can put it together with other data it's useful.
>>> For example if you had some thermal data in the logs showing which
>>> component overheated or if you looked at pstore and found a NULL pointer
>>> dereference.
>>
>> IIUC, the current PHAT content can be useful. The PHAT content from
>> boot X (before the failure) which is what will be there in pstore
>> after the random reboot, is of limited value AFAICS.
>
> To be more precise, I don't see why the kernel needs to be made a
> man-in-the-middle between the firmware which is the source of the
> information and user space that consumes it.

I do somewhat agree with your point.

IIUC, ACPI Table parsing can be undertaken from user-space for ACPI
tables that provide error information through sysfs and, if required, MMIO.

Our principal motive though in wanting to add support for this table in the
kernel, and please correct me if I am wrong, was the absence of an open-source
tool to accomplish this. Having support for the table in the kernel should alleviate
users from the need to develop tools and manually run them whence an unexpected
reset is encountered. The data is already available in the dmesg / journal for
analysis and will be available across reboots in the journal.

An alternative for a tool might be using acpidump utility and ASL but even that
can be tedious at times since tables are in little-endian format and users might
be required to undertake byte level decoding of the dumped table by referring
to ACPI specs. Wouldn't having parsed data available in the dmesg, at least,
be convenient in such cases?

Another important motive was the reset reason health record itself. Below is
an excerpt from the ACPI spec v6.5

The reset reason is intended to supplement existing fault reporting mechanisms on the platform (e.g. BERT tables, CPER) or in the operating system (e.g. event logs)

Since existing fault reporting mechanisms log into the dmesg buffer (AFAIK) and
Reset Reason Health Record is intended to supplement them, wouldn't it be fitting
to have the record available in dmesg buffer too and ensure that all error info
is not scattered but available in a single place and across reboots?

Also, I do agree with Mario in that we should set expected reset reasons to
debug. Rather, we could have the entire record as debug for expected resets and
only log to the dmesg for unexpected resets.

--
Thanks,
Avadhut Naik