Re: [syzbot] [xfs?] BUG: unable to handle kernel paging request in clear_user_rep_good

From: Linus Torvalds
Date: Mon May 01 2023 - 14:56:13 EST


[ Added Borislav and stable people ]

On Sun, Apr 30, 2023 at 9:31 PM syzbot
<syzbot+401145a9a237779feb26@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> syzbot suspects this issue was fixed by commit:

Indeed.

My initial reaction was "no, that didn't fix anything, it just cleaned
stuff up", but it turns out that yes, it did in fact fix a real bug in
the process.

The fix was not intentional, but the cleanup actually got rid of buggy code.

So here's the automatic marker for syzbot:

#syz fix: x86: don't use REP_GOOD or ERMS for user memory clearing

and the reason for the bug - in case people care - is that the old
clear_user_rep_good (which no longer exists after that commit) had the
exception entry pointing to the wrong instruction.

The buggy code did:

.Lrep_good_bytes:
mov %edx, %ecx
rep stosb

and the exception entry weas

_ASM_EXTABLE_UA(.Lrep_good_bytes, .Lrep_good_exit)

so the exception entry pointed at the register move instruction, not
at the actual "rep stosb" that does the user space store.

End result: if you had a situation where you *should* return -EFAULT,
and you triggered that "last final bytes" case, instead of the
exception handling dealing with it properly and fixing it up, you got
that kernel oops.

The bug goes back to commit 0db7058e8e23 ("x86/clear_user: Make it
faster") from about a year ago, which made it into v6.1.

It only affects old hardware that doesn't have the ERMS capability
flag, which *probably* means that it's mostly only triggerable in
virtualization (since pretty much any CPU from the last decade has
ERMS, afaik).

Borislav - opinions? This needs fixing for v6.1..v6.3, and the options are:

(1) just fix up the exception entry. I think this is literally this
one-liner, but somebody should double-check me. I did *not* actually
test this:

--- a/arch/x86/lib/clear_page_64.S
+++ b/arch/x86/lib/clear_page_64.S
@@ -142,8 +142,8 @@ SYM_FUNC_START(clear_user_rep_good)
and $7, %edx
jz .Lrep_good_exit

-.Lrep_good_bytes:
mov %edx, %ecx
+.Lrep_good_bytes:
rep stosb

.Lrep_good_exit:

because the only use of '.Lrep_good_bytes' is that exception table entry.

(2) backport just that one commit for clear_user

In this case we should probably do commit e046fe5a36a9 ("x86: set
FSRS automatically on AMD CPUs that have FSRM") too, since that commit
changes the decision to use 'rep stosb' to check FSRS.

(3) backport the entire series of commits:

git log --oneline v6.3..034ff37d3407

Or we could even revert that commit 0db7058e8e23, but it seems silly
to revert when we have so many ways to fix it, including a one-line
code movement.

Borislav / stable people? Opinions?

Linus