Re: [BUG] kernel side can NOT trigger memory error with einj

From: Shuai Xue
Date: Sun Mar 20 2022 - 09:12:10 EST


On 2022/3/18 12:57 AM, Luck, Tony wrote:
>> - rc = apei_exec_run(&trigger_ctx, ACPI_EINJ_TRIGGER_ERROR);
>> + ptr = kmap(pfn_to_page(pfn));
>> + tmp = *(ptr + (param1 & ~ PAGE_MASK));
>
> That hack works when the trigger action is just trying to access the injected
> location. But on Intel platforms the trigger "kicks" the patrol scrubber in the
> memory controller to access the address. So the error is triggered not by
> an access from the core, but by internal memory controller access.
>
> This results in a different error signature (for an uncorrected error injection
> it will be a UCNA or SRAO in Intel acronym-speak).

As far as I know, APEI defines only five injection instructions: ACPI_EINJ_READ_REGISTER,
ACPI_EINJ_READ_REGISTER_VALUE, ACPI_EINJ_WRITE_REGISTER, ACPI_EINJ_WRITE_REGISTER_VALUE and
ACPI_EINJ_NOOP. The ACPI_EINJ_TRIGGER_ERROR action should run one of them, and I don't see
how any of them would kick the patrol scrubber. For example, a trigger with ACPI_EINJ_READ_REGISTER:

apei_exec_run(&trigger_ctx, ACPI_EINJ_TRIGGER_ERROR)
  __apei_exec_run                     // ins=0
    => apei_exec_read_register
      => apei_read
        => acpi_os_read_memory
          => acpi_map_vaddr_lookup    /* lookup VA of PA from acpi_ioremap */
          => acpi_os_ioremap
          => acpi_os_read_iomem
            => *(u32 *) value = readl(virt_addr);
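
To make the point concrete, here is a minimal userspace sketch of how such an
instruction table could be interpreted. It is a simplified model, not the
kernel code: every sketch_* and SK_* name is hypothetical, the "register" is an
ordinary variable, and the per-instruction semantics are my assumption of the
usual read/compare/write behaviour. It only shows that each of the five
instructions ends in a plain load or store issued by the executing CPU, or in
nothing at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical instruction codes, in the order the five names are listed above. */
enum sketch_ins {
	SK_READ_REGISTER,
	SK_READ_REGISTER_VALUE,
	SK_WRITE_REGISTER,
	SK_WRITE_REGISTER_VALUE,
	SK_NOOP,
};

/* One table entry: an instruction, a register to touch, and an immediate. */
struct sketch_entry {
	enum sketch_ins ins;
	volatile uint32_t *reg;	/* stands in for an ioremap'ed register */
	uint32_t value;
};

/* Interpreter state plus counters for accesses issued by the core. */
struct sketch_ctx {
	uint32_t value;
	unsigned int core_reads;
	unsigned int core_writes;
};

/* Run every entry in order; return 0 on success, -1 on an unknown instruction. */
static int sketch_exec_run(struct sketch_ctx *ctx,
			   const struct sketch_entry *tbl, size_t n)
{
	assert(ctx != NULL && tbl != NULL);

	for (size_t i = 0; i < n; i++) {
		const struct sketch_entry *e = &tbl[i];

		switch (e->ins) {
		case SK_READ_REGISTER:		/* load issued by the core */
			ctx->value = *e->reg;
			ctx->core_reads++;
			break;
		case SK_READ_REGISTER_VALUE:	/* load, then compare */
			ctx->value = (*e->reg == e->value);
			ctx->core_reads++;
			break;
		case SK_WRITE_REGISTER:		/* store of the current value */
			*e->reg = ctx->value;
			ctx->core_writes++;
			break;
		case SK_WRITE_REGISTER_VALUE:	/* store of the immediate */
			ctx->value = e->value;
			*e->reg = ctx->value;
			ctx->core_writes++;
			break;
		case SK_NOOP:			/* no access at all */
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

In this model there is no path that hands the access to another agent such as
a patrol scrubber; whether platform firmware attaches such a side effect to the
register being written is outside what the sketch can show.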

As we can see, the error is triggered by an access from the core. However, the physical
address can NOT be mapped by acpi_os_ioremap.
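
The hack quoted at the top reads the location through kmap() instead, using
the page frame number and the in-page offset (param1 & ~PAGE_MASK). The address
arithmetic can be sketched in userspace as below. The SK_* macros and sk_*
helpers are hypothetical, and 4 KiB pages are an assumption; the mask follows
the Linux convention that PAGE_MASK clears the offset bits.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SHIFT	12				/* assumption: 4 KiB pages */
#define SK_PAGE_SIZE	(UINT64_C(1) << SK_PAGE_SHIFT)
#define SK_PAGE_MASK	(~(SK_PAGE_SIZE - 1))		/* keeps the page-aligned part */

/* Page frame number of a physical address. */
static uint64_t sk_pfn(uint64_t paddr)
{
	return paddr >> SK_PAGE_SHIFT;
}

/* Byte offset inside the page, i.e. paddr & ~PAGE_MASK. */
static uint64_t sk_page_offset(uint64_t paddr)
{
	return paddr & ~SK_PAGE_MASK;
}

/* Recombine the two parts into the original physical address. */
static uint64_t sk_rebuild(uint64_t pfn, uint64_t offset)
{
	assert(offset < SK_PAGE_SIZE);
	return (pfn << SK_PAGE_SHIFT) | offset;
}
```

For a physical address of 0x12345678 this gives page frame 0x12345 and offset
0x678, so the read lands on the injected byte inside the mapped page.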

If I missed anything, please let me know. Thank you very much.

Best Regards,
Shuai