Re: [syzbot] [mm?] WARNING in __folio_rmap_sanity_checks

From: Yin, Fengwei
Date: Fri Jan 05 2024 - 09:50:24 EST




On 1/5/2024 4:56 PM, David Hildenbrand wrote:
If I am not wrong, that triggers:

VM_WARN_ON_FOLIO(folio_test_large(folio) &&
            !folio_test_large_rmappable(folio), folio);

So we are trying to rmap a large folio that did not go through
folio_prep_large_rmappable().

Would someone mind explaining the rules to me for this? As far as I can see,
folio_prep_large_rmappable() just inits the _deferred_list and sets a flag so we
remember to deinit the list on destruction. Why can't we just init that list for
all folios order-2 or greater? Then everything is rmappable?

I think we much rather want to look into moving all mapcount-related stuff into folio_prep_large_rmappable(). It doesn't make any sense to initialize that for any compound pages, especially the ones that will never get mapped to user space.



net/packet/af_packet.c calls vm_insert_page() on some pages/folios stoed
in the "struct packet_ring_buffer". No idea where that comes from, but I
suspect it's simply some compound allocation.
Looks like:
    alloc_pg_vec
      alloc_one_pg_vec_page
           gfp_t gfp_flags = GFP_KERNEL | __GFP_COMP |
                             __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY;

           buffer = (char *) __get_free_pages(gfp_flags, order);
So you are right here... :).

Hm, but I wonder if this something that's supposed to work or is this one of
the cases where we should actually use a VM_PFN mapping?

It's not a pagecache(file/shmem) page after all.

We could relax that check and document why we expect something that is not
marked rmappable. But it fells wrong. I suspect this should be a VM_PFNMAP
instead (like recent udmabuf changes).

VM_PFNMAP looks correct.

And why is making the folio rmappable and mapping it the normal way not the
right solution here? Because the folio could be order-1? Or something more profound?


Think about it: we are adding/removing a page from rmap handling that can *never* be looked up using the rmap because there is no rmap for these pages, and folio->index is just completely unexpressive. VM_MIXEDMAP doesn't have any linearity constraints.

Logically, it doesn't make any sense to involve rmap code although it currently might work. validate_page_before_insert() blocks off most pages where the order-0 mapcount would be used for other purposes and everything would blow up.

Looking at vm_insert_page(), this interface is only for pages the caller allocated. Maybe we should just not do any rmap accounting when mapping/unmapping these pages: not involve any rmap code, including mapcounts?
This is my understanding.


vm_normal_page() works on these mappings, so we'd also have to skip rmap code when unmapping these pages etc. Maybe that's the whole reason we have the rmap handling here: to not special-case the unmap path.

vm_insert_page() will set VM_MIXEDMAP and vm_normal_page() will skip
the page if CONFIG_ARCH_HAS_PTE_SPECIAL is enabled (it's enabled for
x86). So the unmap path will skip these kind of folios?


Regards
Yin, Fengwei


Alternatively, we can:

(1) Require the caller to make sure large folios are rmappable. We
    already require allocations to be compound. Should be easy to add.
(2) Allow non-rmappable folios in rmap code just for mapcount tracking.
    Confusing but possible.


I do have another question: why do we just check the large folio
rmappable? Does that mean order0 folio is always rmappable?


We didn't really have a check for that I believe. We simply reject all pages in vm_insert_page() that are problematic because the pagecount is overloaded.

I ask this because vm_insert_pages() is called in net/ipv4/tcp.c
and drivers call vm_insert_page. I suppose they all need be VM_PFNMAP.

Right, similar problem.