Re: [PATCH v3 04/13] powerpc: assert_pte_locked() use pte_offset_map_nolock()

From: Jay Patel
Date: Fri Jul 21 2023 - 09:15:53 EST


On Jul 19 2023, Aneesh Kumar K V wrote:
> On 7/19/23 10:34 AM, Hugh Dickins wrote:
> > On Tue, 18 Jul 2023, Aneesh Kumar K.V wrote:
> >> Hugh Dickins <hughd@xxxxxxxxxx> writes:
> >>
> >>> Instead of pte_lockptr(), use the recently added pte_offset_map_nolock()
> >>> in assert_pte_locked(). BUG if pte_offset_map_nolock() fails: this is
> >>> stricter than the previous implementation, which skipped when pmd_none()
> >>> (with a comment on khugepaged collapse transitions): but wouldn't we want
> >>> to know, if an assert_pte_locked() caller can be racing such transitions?
> >>>
> >>
> >> The reason we had that pmd_none check there was to handle khugpaged. In
> >> case of khugepaged we do pmdp_collapse_flush and then do a ptep_clear.
> >> ppc64 had the assert_pte_locked check inside that ptep_clear.
> >>
> >> _pmd = pmdp_collapse_flush(vma, address, pmd);
> >> ..
> >> ptep_clear()
> >> -> asset_ptep_locked()
> >> ---> pmd_none
> >> -----> BUG
> >>
> >>
> >> The problem is how assert_pte_locked() verify whether we are holding
> >> ptl. It does that by walking the page table again and in this specific
> >> case by the time we call the function we already had cleared pmd .
> >
> > Aneesh, please clarify, I've spent hours on this.
> >
> > From all your use of past tense ("had"), I thought you were Acking my
> > patch; but now, after looking again at v3.11 source and today's,
> > I think you are NAKing my patch in its present form.
> >
>
> Sorry for the confusion my reply created.
>
> > You are pointing out that anon THP's __collapse_huge_page_copy_succeeded()
> > uses ptep_clear() at a point after pmdp_collapse_flush() already cleared
> > *pmd, so my patch now leads that one use of assert_pte_locked() to BUG.
> > Is that your point?
> >
>
> Yes. I haven't tested this yet to verify that it is indeed hitting that BUG.
> But a code inspection tells me we will hit that BUG on powerpc because of
> the above details.
>
Hi Aneesh,

After testing it, I can confirm that it encountered a BUG on powerpc.
Log report as attached

Thanks,
Jay Patel
> > I can easily restore that khugepaged comment (which had appeared to me
> > out of date at the time, but now looks still relevant) and pmd_none(*pmd)
> > check: but please clarify.
> >
>
> That is correct. if we add that pmd_none check back we should be good here.
>
>
> -aneesh
[ 53.513058][ T105] ------------[ cut here ]------------
[ 53.513080][ T105] kernel BUG at arch/powerpc/mm/pgtable.c:327!
[ 53.513090][ T105] Oops: Exception in kernel mode, sig: 5 [#1]
[ 53.513099][ T105] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[ 53.513109][ T105] Modules linked in: bonding pseries_rng rng_core vmx_crypto gf128mul ibmveth crc32c_vpmsum fuse autofs4
[ 53.513135][ T105] CPU: 3 PID: 105 Comm: khugepaged Not tainted 6.5.0-rc1-gebfaf626e99f-dirty #1
[ 53.513146][ T105] Hardware name: IBM,9009-42G POWER9 (raw) 0x4e0202 0xf000005 of:IBM,FW950.80 (VL950_131) hv:phyp pSeries
[ 53.513156][ T105] NIP: c000000000079478 LR: c00000000007946c CTR: 0000000000000000
[ 53.513165][ T105] REGS: c000000008e9b930 TRAP: 0700 Not tainted (6.5.0-rc1-gebfaf626e99f-dirty)
[ 53.513175][ T105] MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24002882 XER: 20040000
[ 53.513202][ T105] CFAR: c000000000412a30 IRQMASK: 0
[ 53.513202][ T105] GPR00: c00000000007946c c000000008e9bbd0 c0000000012d3500 0000000000000001
[ 53.513202][ T105] GPR04: 0000000011000000 c000000008e9bbb0 c000000008e9bbf0 ffffffffffffffff
[ 53.513202][ T105] GPR08: 00000000000003ff 0000000000000000 0000000000000000 000000000000000a
[ 53.513202][ T105] GPR12: 00000000497b0000 c00000001ec9d480 c00c00000016fe00 c000000051455000
[ 53.513202][ T105] GPR16: 0000000000000000 ffffffffffffffff 0000000000000001 0000000000000001
[ 53.513202][ T105] GPR20: c000000002912430 c000000051455000 0000000000000000 c00000000946e650
[ 53.513202][ T105] GPR24: c0000000029800e8 0000000011000000 c00c000000145168 c000000002980180
[ 53.513202][ T105] GPR28: 0000000011000000 8603f85b000080c0 c000000008e9bc70 c00000001bd0d680
[ 53.513304][ T105] NIP [c000000000079478] assert_pte_locked.part.18+0x168/0x1b0
[ 53.513318][ T105] LR [c00000000007946c] assert_pte_locked.part.18+0x15c/0x1b0
[ 53.513329][ T105] Call Trace:
[ 53.513335][ T105] [c000000008e9bbd0] [c00000000007946c] assert_pte_locked.part.18+0x15c/0x1b0 (unreliable)
[ 53.513350][ T105] [c000000008e9bc00] [c00000000048e10c] collapse_huge_page+0x11dc/0x1700
[ 53.513362][ T105] [c000000008e9bd40] [c00000000048ed18] hpage_collapse_scan_pmd+0x6e8/0x850
[ 53.513374][ T105] [c000000008e9be30] [c000000000492544] khugepaged+0x7e4/0xb70
[ 53.513386][ T105] [c000000008e9bf90] [c00000000015f038] kthread+0x138/0x140
[ 53.513399][ T105] [c000000008e9bfe0] [c00000000000dd58] start_kernel_thread+0x14/0x18
[ 53.513411][ T105] Code: 7c852378 7c844c36 794a1564 7c894038 794af082 79241f24 78eaf00e 7c8a2214 48399541 60000000 7c630074 7863d182 <0b030000> e9210020 81290000 7d290074
[ 53.513448][ T105] ---[ end trace 0000000000000000 ]---
[ 53.516544][ T105]
[ 53.516551][ T105] note: khugepaged[105] exited with irqs disabled
[ 182.648447][ T1952] mconf[1952]: segfault (11) at 110efa38 nip 1001e97c lr 1001e8a8 code 1
[ 182.648482][ T1952] mconf[1952]: code: 60420000 4bfffd59 4bffffd0 60000000 60420000 e93f0070 2fa90000 409e0014
[ 182.648494][ T1952] mconf[1952]: code: 480000cc e9290000 2fa90000 419e00c0 <81490008> 2f8a0005 409effec e9490028
[ 962.694079][T39811] sda2: Can't mount, would change RO state