Re: [PATCH] gfs2: Fix mmap + page fault deadlocks

From: Andreas Gruenbacher
Date: Thu Jul 01 2021 - 20:21:46 EST


On Thu, Jul 1, 2021 at 11:41 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Thu, Jul 1, 2021 at 1:43 PM Andreas Gruenbacher <agruenba@xxxxxxxxxx> wrote:
> > here's another attempt at fixing the mmap + page fault deadlocks we're
> > seeing on gfs2. Still not ideal because get_user_pages_fast ignores the
> > current->pagefault_disabled flag
>
> Of course get_user_pages_fast() ignores the pagefault_disabled flag,
> because it doesn't do any page faults.
>
> If you don't want to fall back to the "maybe do IO" case, you should
> use the FOLL_FAST_ONLY flag - or get_user_pages_fast_only(), which
> does that itself.
>
> > For getting get_user_pages_fast changed to fix this properly, I'd need
> > help from the memory management folks.
>
> I really don't think you need anything at all from the mm people,
> because we already support that whole "fast only" case.

Yes, fair enough.

> Also, I have to say that I think the direct-IO code is fundamentally
> mis-designed. Why it is doing the page lookup _during_ the IO is a
> complete mystery to me. Why wasn't that done ahead of time before the
> filesystem took the locks it needed?

That would be inconvenient for reads, when the number of bytes read is
much smaller than the buffer size and we won't need to page in the
entire buffer.

> So what the direct-IO code _should_ do is to turn an ITER_IOVEC into a
> ITER_KVEC by doing the page lookup ahead of time, and none of these
> issues should even exist, and then the whole pagefault_disabled and/or
> FOLL_FAST_ONLY would be a complete non-issue.
>
> Is there any reason why that isn't what it does (other than historical baggage)?

It turns out that there's an even deeper issue with keeping references
to user-space pages. Those references will essentially pin the glock
of the associated inode to the node. Moving a glock off a node
requires truncating the inode's page cache, but the page references
would prevent that. So we'd only end up with different kinds of
potential deadlocks.

If we could get iomap_dio_rw to use "fast only" mode when requested,
we could fault in the pages without keeping references, try the IO,
and repeat when necessary.

Thanks a lot,
Adreas