Re: do_page_fault() mm->mmap_sem deadlocks

From: Andrew Morton
Date: Mon May 17 2004 - 13:09:21 EST


Andy Whitcroft <apw@xxxxxxxxxxxx> wrote:
>
> Whilst trying to debug a bug in PPC64 unmap handling we noticed that it is
> relatively easy for a kernel bug to lead to a deadlock trying to
> down(mm->mmap_sem) when trying to categorise a fault for handling. The
> fault handlers grab the mmap_sem lock to scan the VMA list to see if this
> fault is against a valid segment of the task address space. This deadlocks
> if this task is in a system call which also manipulates the virtual
> address, such as munmap(). This is very undesirable as it prevents the
> kernel from detecting the fault and reporting it as an oops severely
> hampering debug.

Yes. The "oops deadlocks when mmap_sem is held" problem is rather
irritating, and would be nice to fix.

> Most kernel code should not access any area which leads to a fault. Indeed
> the only code which should trigger faults from kernel space are routines
> dealing with copying data to and from user space.

And vmalloc faults on x86: we fault in those 4k pte's on-demand. Your
patch seems to cover that OK.

> Fault originating
> outside of these specific routines are not valid and could be 'failed'
> without the need to reference current->mm, thereby preventing deadlock and
> ensuring the failure is reported.

Think so.


> ...
> I have assumed that faults triggered by kernel space are going
> to be rare relative to user space as most pointers will have been recently
> referenced by the calling application, but this search is of concern for
> performance.

These kernel-mode faults are by no means uncommon - probably the most
frequent scenario is:

p = malloc(...);
read(fd, p, n);

Here, generic_file_read() will take a copy_to_user() fault for each page to
COW it.

Which makes one wonder why we aren't caching the result of the previous
exception table search. I bet that would get a nice hit rate, and would
fix up any overhead which your change introduces.

>
> Do people think this is something which is worth pursuing.

I do.

> The patches
> attached are minimal and there are definatly optimisations to be made (such
> as sharing the result of the exception lookup). I guess this might even be
> something which is worth having as a config option for when hangs is there
> is deemed to be any degradation? If people think this is something worth
> looking at I'll come back with cleaner, better tested and benchmarked
> patches.

Please. Also please consider (and instrument) a last-hit cache in
search_exception_tables(). Maybe it should be per-task.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/