Re: [PATCH v1] mm/gup: disallow FOLL_FORCE|FOLL_WRITE on hugetlb mappings

From: Mike Kravetz
Date: Tue Nov 22 2022 - 12:42:07 EST


On 11/22/22 10:05, David Hildenbrand wrote:
> On 21.11.22 22:33, Andrew Morton wrote:
> > On Mon, 21 Nov 2022 09:05:43 +0100 David Hildenbrand <david@xxxxxxxxxx> wrote:
> >
> > > > > MikeK do you have test cases?
> > > >
> > > > Sorry, I do not have any test cases.
> > > >
> > > > I can ask one of our product groups about their usage. But, that would
> > > > certainly not be a comprehensive view.
> > >
> > > With
> > >
> > > https://lkml.kernel.org/r/20221116102659.70287-1-david@xxxxxxxxxx
> > >
> > > on its way, the RDMA concern should be gone, hopefully.
> > >
> > > @Andrew, can you queue this one? Thanks.
> >
> > This is all a little tricky.
> >
> > It's not good that 6.0 and earlier permit unprivileged userspace to
> > trigger a WARN. But we cannot backport this fix into earlier kernels
> > because it requires the series "mm/gup: remove FOLL_FORCE usage from
> > drivers (reliable R/O long-term pinning)".
> >
> > Is it possible to come up with a fix for 6.1 and earlier which won't
> > break RDMA?
>
> Let's recap:

Thanks!

>
> (1) Nobody has reported an RDMA regression so far; it was all pure
> speculation. The only report we saw was via ptrace when fuzzing
> syscalls.
>
> (2) To trigger it, one would need a hugetlb MAP_PRIVATE mapping without
> PROT_WRITE. For example:
>
> mmap(0, SIZE, PROT_READ,
> MAP_PRIVATE|MAP_ANON|MAP_HUGETLB|MAP_HUGE_2MB, -1, 0)
> or
> mmap(0, SIZE, PROT_READ, MAP_PRIVATE, hugetlbfd, 0)
>
> While that's certainly valid, it's not the common use case with
> hugetlb pages.

FWIW, I did check with our product teams and they do not knowingly make use
of private mappings without write. Of course, that is only a small and
limited sample size.

RDMA to shared hugetlb mappings is the common case.
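
For reference, a minimal sketch of the first mapping from (2), for anyone
who wants to check whether their own code creates such mappings. The helper
names, the HUGE_SIZE constant and the MAP_HUGE_2MB fallback define are
illustrative assumptions, not taken from the patch. The sketch only creates
and tears down the read-only private mapping; it does not attempt a
FOLL_FORCE write.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

#ifndef MAP_HUGE_2MB
/* Fallback (assumption: older libc headers): log2(2 MiB) = 21,
 * encoded at bit 26 (MAP_HUGE_SHIFT). */
#define MAP_HUGE_2MB (21 << 26)
#endif

/* Illustrative constant: one 2 MiB huge page. */
#define HUGE_SIZE (2UL * 1024 * 1024)

/* Create a private, anonymous hugetlb mapping without PROT_WRITE.
 * Returns the mapping address, or MAP_FAILED with errno set. */
static void *map_ro_private_hugetlb(size_t size)
{
    /* Keep the length a multiple of the huge page size. */
    assert(size != 0 && size % HUGE_SIZE == 0);

    return mmap(NULL, size, PROT_READ,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_2MB,
                -1, 0);
}

/* Returns 1 if the mapping could be created, 0 otherwise
 * (for example when no 2 MiB huge pages are reserved). */
static int try_ro_private_hugetlb(void)
{
    void *p = map_ro_private_hugetlb(HUGE_SIZE);

    if (p == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");
        return 0;
    }

    munmap(p, HUGE_SIZE);
    return 1;
}
```

On a system with no huge pages reserved (see /proc/sys/vm/nr_hugepages),
the mmap() call is expected to fail and try_ro_private_hugetlb() returns 0.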

>
> (3) Before 1d8d14641fd9 (< v6.0), it "worked by accident" but was wrong:
> pages would get mapped writable into page tables, even though we did
> not have VM_WRITE. FOLL_FORCE support is essentially absent but not
> fenced properly.
>
> (4) With 1d8d14641fd9 (v6.0 + v6.1-rc), it results in a warning instead.
>
> (5) This patch silences the warning.
>
>
> Ways forward are:
>
> (1) Implement FOLL_FORCE for hugetlb and backport that. Fixes the
> warning in 6.0 and wrong behavior before that. The functionality,
> however, might not be required in 6.2 at all anymore: the last
> remaining use case would be ptrace (which, again, we don't have
> actual users reporting breakages).
>
> (2) Use this patch and backport it into 6.0/6.1 to fix the warning. RDMA
> will be handled properly in 6.2 via reliable long-term pinnings.

I am OK with this approach.
--
Mike Kravetz

>
> (3) Use this patch and backport it into 6.0/6.1 to fix the warning.
> Further, backport the reliable long-term pinning changes into
> 6.0/6.1 if there are user reports.
>
> (4) On user report regarding RDMA in 6.0 and 6.1, revert the sanity
> check that triggers the warning and restore previous (wrong)
> behavior.
>
>
> To summarize, the benefit of (1) would be to have ptrace on hugetlb COW
> mappings working. As stated, I'd like to minimize FOLL_FORCE implementations
> if there are no legacy users because FOLL_FORCE has a proven record of
> security issues. Further, backports to < 6.0 might not be straightforward.
>
> I'd suggest (2), but I'm happy to hear other opinions.
>
> --
> Thanks,
>
> David / dhildenb
>