Re: Linux 5.1-rc5

From: Martin Schwidefsky
Date: Wed Apr 17 2019 - 04:02:56 EST


On Wed, 17 Apr 2019 09:46:37 +0200
Martin Schwidefsky <schwidefsky@xxxxxxxxxx> wrote:

> On Tue, 16 Apr 2019 09:49:46 -0700
> Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> > On Tue, Apr 16, 2019 at 9:16 AM Linus Torvalds
> > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > We actually already *have* this function.
> > >
> > > It's called "gup_fast_permitted()" and it's used by x86-64 to verify
> > > the proper address range. Exactly like s390 needs..
> > >
> > > Could you please use that instead?
> >
> > IOW, something like the attached.
> >
> > Obviously untested. And maybe 'current' isn't declared in
> > <asm/pgtable.h>, in which case you'd need to modify it to instead make
> > the inline function be "s390_gup_fast_permitted()" that takes a
> > pointer to the mm, and do something like
> >
> > #define gup_fast_permitted(start, pages) \
> > s390_gup_fast_permitted(current->mm, start, pages)
> >
> > instead.
> >
> > But I think you get the idea..
>
> Nice, I did not realize that gup_fast_permitted is a platform
> override-able function. So that part is doable in arch/s390. But I
> spoke too soon: I got my first crash and realized that the common gup
> code is not usable as it is. The reason is, e.g., this sequence:
>
> pgdp = pgd_offset(current->mm, addr);
> pgd_t pgd = READ_ONCE(*pgdp);
> /* some checking on pgd */
> gup_p4d_range(pgd, addr, next, write, pages, nr);
>
> p4dp = p4d_offset(&pgd, addr);
> p4d_t p4d = READ_ONCE(*p4dp);
> /* some checking on p4d */
> gup_pud_range(p4d, addr, next, write, pages, nr);
>
> pudp = pud_offset(&p4d, addr);
> pud_t pud = READ_ONCE(*pudp);
> /* some checking on pud */
> gup_pmd_range(pud, addr, next, write, pages, nr);
>
> Each step along the way will read the page table entry and pass the
> table entry to the next function. This clashes with the page table
> folding on s390. The s390 gup code looks more like this:
>
> pgdp = pgd_offset(current->mm, addr);
> pgd_t pgd = READ_ONCE(*pgdp);
> /* some checking on pgd */
> gup_p4d_range(pgdp, pgd, addr, next, write, pages, &nr);
>
> p4dp = p4d_offset(pgdp, addr);
> p4d_t p4d = READ_ONCE(*p4dp);
> /* some checking on p4d */
> gup_pud_range(p4dp, p4d, addr, next, write, pages, nr);
>
> pudp = pud_offset(p4dp, addr);
> pud_t pud = READ_ONCE(*pudp);
> /* some checking on pud */
> gup_pmd_range(pudp, pud, addr, next, write, pages, nr);
>
> There are magic dereferences in the s390 versions of the p4d_offset,
> pud_offset and pmd_offset functions. To make this work, the pointer
> passed to these functions must not point to the local copy of the
> already dereferenced table entry. I'll cook up a patch for the common code.

Grumpf, that does *not* work. For gup the table entries must be read only
once. Now I remember why I open-coded p4d_offset, pud_offset and pmd_offset
in arch/s390/mm/gup.c: to avoid reading the table entries twice.
It will be hard to use the common gup code after all.
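
To illustrate both problems, here is a rough user-space model. It is only a
sketch and not the kernel code: entry_t, TYPE_NEXT, TYPE_LEAF, lower_index()
and lower_offset() are made-up names, and lower_offset() only imitates the
general shape of a folded *_offset helper (look at the entry, then either
dereference it or index the same table again). The copy is padded to a full
table here so that the bad walk stays within defined behavior; with a single
local variable the index would run off the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRIES   4
#define TYPE_MASK ((uintptr_t)0x3)
#define TYPE_NEXT ((uintptr_t)0x1) /* hypothetical: points to a next-level table */
#define TYPE_LEAF ((uintptr_t)0x2) /* hypothetical: entry of the folded table */

typedef struct { uintptr_t val; } entry_t;

static unsigned long lower_index(unsigned long addr)
{
	return addr % ENTRIES;
}

/*
 * Folded-level helper: it inspects *upper itself (the "magic" read) and,
 * if the level is folded, indexes relative to the pointer it was given.
 */
static entry_t *lower_offset(entry_t *upper, unsigned long addr, int *reads)
{
	entry_t *lower = upper;

	(*reads)++;			/* the helper reads the entry itself */
	if ((upper->val & TYPE_MASK) == TYPE_NEXT)
		lower = (entry_t *)(upper->val & ~TYPE_MASK);
	return lower + lower_index(addr);
}

/* Pass the real table pointer: correct entry, but the entry is read twice. */
static uintptr_t walk_with_pointer(entry_t *table, unsigned long addr, int *reads)
{
	entry_t *upperp;
	entry_t upper;

	assert(table != NULL && reads != NULL);
	upperp = &table[0];
	upper = *upperp;		/* first read, as the gup caller does */
	(*reads)++;
	(void)upper;			/* checks on the entry would go here */
	return lower_offset(upperp, addr, reads)->val;
}

/* Pass a pointer to a local copy: read once, but the index leaves the table. */
static uintptr_t walk_with_copy(entry_t *table, unsigned long addr)
{
	entry_t copy[ENTRIES] = { { 0 } };	/* padded only to keep the demo defined */
	int unused = 0;

	assert(table != NULL);
	copy[0] = table[0];		/* the single read of the shared entry */
	return lower_offset(&copy[0], addr, &unused)->val;
}
```

With a folded table of four TYPE_LEAF entries, walk_with_pointer() finds the
right entry but touches the upper entry twice, while walk_with_copy() reads
it once and then returns whatever sits next to the copy.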

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.