Re: Strange EFAULT on mips64el returned by syscall when another thread is forking

From: Linus Torvalds
Date: Wed Jan 24 2024 - 16:55:03 EST


On Wed, 24 Jan 2024 at 13:33, Xi Ruoyao <xry111@xxxxxxxxxxx> wrote:
>
> Re-posting the broken test case for Ben (I also added a waitpid call to
> prevent PID exhaustion):

Funky, funky.

>   ssize_t ret = read (fd, buf, 7);
>   if (ret == -1 && errno == EFAULT)
>     abort ();

So I think I have a clue:

> and the "interesting" aspects:
>
> 1. If I change the third parameter of "read" to any value >= 8, it no
> longer fails. But it fails with any integer in [1, 8).

One change (the only one, really) is that now that MIPS uses
lock_mm_and_find_vma(), it also has this code:

        if (regs && !user_mode(regs)) {
                unsigned long ip = instruction_pointer(regs);
                if (!search_exception_tables(ip))
                        return false;
        }

in case the mmap trylock fails.

That code protects against the deadlock case of "we hold the mmap
lock, and take a kernel page fault due to a bug, and that page fault
happens to be to user space, and the page fault code then deadlocks on
the mmap lock".

It's a rare bug, but it's so nasty to debug that x86 has had that code
pretty much forever, and the lock_mm_and_find_vma() helper got it that
way. MIPS was clearly expecting kernel debugging to happen on other
platforms ;)
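
To make that guard concrete, here is a rough user-space model of the
decision. It is only a sketch: struct fault_ctx, its boolean fields and
lock_carefully() are made-up stand-ins for the results of
mmap_read_trylock(), user_mode(regs) and search_exception_tables(ip),
not the kernel's actual code, and the NULL-regs case is left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state the real code inspects. */
struct fault_ctx {
    bool trylock_ok;    /* would the mmap trylock succeed? */
    bool from_user;     /* models user_mode(regs) */
    bool ip_has_fixup;  /* models search_exception_tables(ip) != NULL */
};

enum lock_decision {
    LOCK_TAKEN,     /* trylock succeeded, no blocking needed */
    LOCK_BLOCKING,  /* safe to sleep waiting for the mmap lock */
    LOCK_REFUSED    /* kernel fault with no fixup: refuse, do not deadlock */
};

static enum lock_decision lock_carefully(const struct fault_ctx *c)
{
    if (c->trylock_ok)
        return LOCK_TAKEN;

    /*
     * A kernel-mode fault at an address with no exception table entry
     * is a bug, and the faulting code may already hold the mmap lock.
     */
    if (!c->from_user && !c->ip_has_fixup)
        return LOCK_REFUSED;

    return LOCK_BLOCKING;
}
```

The point is that the refusal only ever triggers once the trylock has
failed, which would fit a failure that needs another thread forking at
the same time.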

And I think the "fails with any integer in [1, 8)" is because the MIPS
"copy_to_user()" code (which is what read() ends up using) is likely
doing something special for those small copies.

And I note that the MIPS extable.c code uses

        fixup = search_exception_tables(exception_epc(regs));

Note the difference: lock_mm_and_find_vma() uses
instruction_pointer(regs), extable.c uses exception_epc(regs).

The former is just "((regs)->cp0_epc)", while the latter is some
complex mess due to MIPS delay slots and isa16.

My *suspicion* is that instruction_pointer() needs to be fixed to do
the same full exception_epc() thing.
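
A toy model of why the two lookups could disagree, under some
assumptions: the faulting user access sits in a branch delay slot, the
hardware then reports the address of the branch in EPC and sets the BD
bit (bit 31) in Cause, and the exception table entry is recorded for
the delay-slot instruction itself. All the sketch_* names are made up
for illustration, and the isa16 handling is ignored entirely.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Cause.BD: the exception was taken in a branch delay slot. */
#define SKETCH_CAUSE_BD (UINT32_C(1) << 31)

/* Hypothetical cut-down pt_regs with only the two fields that matter. */
struct sketch_regs {
    uint64_t cp0_epc;
    uint32_t cp0_cause;
};

/* Models instruction_pointer(regs): just the raw EPC. */
static uint64_t sketch_instruction_pointer(const struct sketch_regs *r)
{
    return r->cp0_epc;
}

/*
 * Simplified model of exception_epc(): when the fault hit a delay
 * slot, EPC points at the branch, so the faulting instruction is the
 * next one (4 bytes on; isa16 modes are not modelled).
 */
static uint64_t sketch_exception_epc(const struct sketch_regs *r)
{
    if (r->cp0_cause & SKETCH_CAUSE_BD)
        return r->cp0_epc + 4;
    return r->cp0_epc;
}

/* Toy exception table: addresses of instructions allowed to fault. */
static bool sketch_has_fixup(const uint64_t *table, size_t n, uint64_t ip)
{
    for (size_t i = 0; i < n; i++)
        if (table[i] == ip)
            return true;
    return false;
}
```

With EPC = 0x1000, BD set and a table containing only 0x1004, the raw
EPC lookup misses while the adjusted one hits. In this model that miss
is what would turn a perfectly legitimate user copy into "kernel fault
with no fixup".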

But honestly, I absolutely detest delay slots and refuse to touch
anything MIPS for that reason.

And there could certainly be something else going on too. But that odd
size limitation, and the fact that it only happens on MIPS, does make
me think the above analysis is right.

I guess you could test it by changing the two cases of
'instruction_pointer(regs)' in mm/memory.c to use exception_epc(regs)
instead. It will only build on MIPS, but for *testing* that theory
out, it's fine.

Over to MIPS people..

Linus