Re: [PATCH 00/12] mm: free retracted page table by RCU

From: Hugh Dickins
Date: Tue Jun 06 2023 - 02:29:17 EST


On Fri, 2 Jun 2023, Jann Horn wrote:
> On Fri, Jun 2, 2023 at 6:37 AM Hugh Dickins <hughd@xxxxxxxxxx> wrote:
>
> > The most obvious vital thing (in the split ptlock case) is that it
> > remains a struct page with a usable ptl spinlock embedded in it.
> >
> > The question becomes more urgent when/if extending to replacing the
> > pagetable pmd by huge pmd in one go, without any mmap_lock: powerpc
> > wants to deposit the page table for later use even in the shmem/file
> > case (and all arches in the anon case): I did work out the details once
> > before, but I'm not sure whether I would still agree with myself; and was
> > glad to leave replacement out of this series, to revisit some time later.
> >
> > >
> > > So in particular, in handle_pte_fault() we can reach the "if
> > > (unlikely(!pte_same(*vmf->pte, entry)))" with vmf->pte pointing to a
> > > detached zeroed page table, but we're okay with that because in that
> > > case we know that !pte_none(vmf->orig_pte)&&pte_none(*vmf->pte) ,
> > > which implies !pte_same(*vmf->pte, entry) , which means we'll bail
> > > out?
> >
> > There is no current (even at end of series) circumstance in which we
> > could be pointing to a detached page table there; but yes, I want to
> > allow for that, and yes I agree with your analysis.
>
> Hmm, what am I missing here?

I spent quite a while trying to reconstruct what I had been thinking,
what meaning of "detached" or "there" I had in mind when I asserted so
confidently "There is no current (even at end of series) circumstance
in which we could be pointing to a detached page table there".

But had to give up and get on with more useful work.
Of course you are right, and that is what this series is about.

Hugh

>
> static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
> {
> pte_t entry;
>
> if (unlikely(pmd_none(*vmf->pmd))) {
> [not executed]
> } else {
> /*
> * A regular pmd is established and it can't morph into a huge
> * pmd by anon khugepaged, since that takes mmap_lock in write
> * mode; but shmem or file collapse to THP could still morph
> * it into a huge pmd: just retry later if so.
> */
> vmf->pte = pte_offset_map_nolock(vmf->vma->vm_mm, vmf->pmd,
> vmf->address, &vmf->ptl);
> if (unlikely(!vmf->pte))
> [not executed]
> // this reads a present readonly PTE
> vmf->orig_pte = ptep_get_lockless(vmf->pte);
> vmf->flags |= FAULT_FLAG_ORIG_PTE_VALID;
>
> if (pte_none(vmf->orig_pte)) {
> [not executed]
> }
> }
>
> [at this point, a concurrent THP collapse operation detaches the page table]
> // vmf->pte now points into a detached page table
>
> if (!vmf->pte)
> [not executed]
>
> if (!pte_present(vmf->orig_pte))
> [not executed]
>
> if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma))
> [not executed]
>
> spin_lock(vmf->ptl);
> entry = vmf->orig_pte;
> // vmf->pte still points into a detached page table
> if (unlikely(!pte_same(*vmf->pte, entry))) {
> update_mmu_tlb(vmf->vma, vmf->address, vmf->pte);
> goto unlock;
> }
> [...]
> }