Re: [PATCH] x86/fault: ignore RSVD flag in error code if P flag is 0

From: H. Peter Anvin
Date: Thu Jun 30 2022 - 20:43:35 EST


On June 29, 2022 10:58:36 PM PDT, Vasily Averin <vvs@xxxxxxxxxx> wrote:
>Some older Intel CPUs have an erratum:
>"Not-Present Page Faults May Set the RSVD Flag in the Error Code
>
>Problem:
>An attempt to access a page that is not marked present causes a page
>fault. Such a page fault delivers an error code in which both the
>P flag (bit 0) and the RSVD flag (bit 3) are 0. Due to this erratum,
>not-present page faults may deliver an error code in which the P flag
>is 0 but the RSVD flag is 1.
>
>Implication:
>Software may erroneously infer that a page fault was due to a
>reserved-bit violation when it was actually due to an attempt
>to access a not-present page.
>
>Workaround: Page-fault handlers should ignore the RSVD flag in the error
>code if the P flag is 0."
>
>This issue was observed on several nodes, which crashed with messages like:
>httpd: Corrupted page table at address 7f62d5b48e68
>PGD 80000002e92bf067 PUD 1c99c5067 PMD 195015067 PTE 7fffffffb78b680
>Bad pagetable: 000c [#1] SMP
>
>Let's follow the recommendation and ignore the RSVD flag in the
>error code if the P flag is 0.
>
>Link: https://lore.kernel.org/all/aae9c7c6-989c-0261-470a-252537493b53@xxxxxxxxxx
>Signed-off-by: Vasily Averin <vvs@xxxxxxxxxx>
>---
> arch/x86/mm/fault.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
>diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>index fe10c6d76bac..ffc6d6bd2a22 100644
>--- a/arch/x86/mm/fault.c
>+++ b/arch/x86/mm/fault.c
>@@ -1481,6 +1481,15 @@ handle_page_fault(struct pt_regs *regs, unsigned long error_code,
> 	if (unlikely(kmmio_fault(regs, address)))
> 		return;
>
>+	/*
>+	 * Some older Intel CPUs have errata
>+	 * "Not-Present Page Faults May Set the RSVD Flag in the Error Code"
>+	 * It is recommended to ignore the RSVD flag (bit 3) in the error code
>+	 * if the P flag (bit 0) is 0.
>+	 */
>+	if (unlikely((error_code & X86_PF_RSVD) && !(error_code & X86_PF_PROT)))
>+		error_code &= ~X86_PF_RSVD;
>+
> 	/* Was the fault on kernel-controlled part of the address space? */
> 	if (unlikely(fault_in_kernel_space(address))) {
> 		do_kern_addr_fault(regs, error_code, address);

Are there other bits we could/should mask out in the case P = 0? The only bits that should be able to appear are ones that are independent of the PTE content.
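
To make the question concrete, here is a rough standalone sketch of what a generalized mask could look like. The PF_* names are local stand-ins for the kernel's X86_PF_* constants, and sanitize_pf_error_code() and PF_PTE_DEPENDENT are hypothetical names, not anything in the tree. Including the PK bit (bit 5) in the mask is purely an assumption for illustration; whether it or any other bit belongs there is exactly what would need to be checked against the SDM.

```c
/*
 * Local stand-ins for the kernel's X86_PF_* bits so that this sketch
 * compiles on its own. Bit positions follow the x86 page-fault error code.
 */
#define PF_PROT	(1UL << 0)	/* P: 0 = not-present page, 1 = protection fault */
#define PF_RSVD	(1UL << 3)	/* RSVD: reserved bit set in a paging entry */
#define PF_PK	(1UL << 5)	/* PK: protection-key violation */

/*
 * Hypothetical mask of bits that are only meaningful when a present PTE
 * was actually examined. RSVD comes from the erratum quoted above; PK is
 * an ASSUMPTION added for illustration only.
 */
#define PF_PTE_DEPENDENT	(PF_RSVD | PF_PK)

/* Drop PTE-dependent bits from the error code when the P flag is 0. */
unsigned long sanitize_pf_error_code(unsigned long error_code)
{
	if (!(error_code & PF_PROT))
		error_code &= ~PF_PTE_DEPENDENT;

	return error_code;
}
```

With this, the 000c error code from the crash report above (RSVD plus the user-mode bit, P = 0) would become 0004, i.e. an ordinary not-present user fault, while an error code with P = 1 is passed through unchanged.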