Re: A better dump_page()

From: David Rientjes
Date: Tue Jan 03 2023 - 18:07:49 EST


On Tue, 3 Jan 2023, Matthew Wilcox wrote:

> On Tue, Jan 03, 2023 at 11:42:11AM +0100, Vlastimil Babka wrote:
> > Separately we should also make the __dump_page() more resilient.
>
> Right. It's not ideal when one of our best debugging tools obfuscates
> the problem we're trying to debug. I've seen problems like this before,
> and the problem is that somebody calls dump_page() on a page that they
> don't own a refcount on. That lets the page mutate under us in some
> fairly awkward ways (as you've seen here, it seems to be part of several
> different compound allocations at various points during the dump
> process).
>
> One possibility I thought about was taking our own refcount on the
> page at the start of dump_page(). That would kill off the possibility
> of ever passing in a const struct page, and it would confuse people.
> Also, what if somebody passes in a pointer to something that's not a
> struct page? Then we've (tried to) modify memory that's not a refcount.
>
> I think the best we can do is to snapshot the struct page and the folio
> it appears to belong to at the start of dump_page(). It'll take a
> little care (for example, folio_pfn() must be passed the original
> folio, and not the snapshot), but I think it's doable.
>

By snapshot, do you mean a memcpy() of the metadata to the stack? I assume
this still leaves the underlying page free to mutate during the copy, but
narrows the window.
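
To check that I'm reading the proposal correctly, here is a minimal
userspace sketch of what I have in mind. Everything in it is a hypothetical
stand-in: struct fake_page is not the real struct page layout, and
fake_memmap, fake_page_to_pfn() and fake_dump_page() are names made up for
illustration, not kernel APIs. It only shows the shape of the idea: copy the
metadata once, print every field from the copy, and derive the pfn from the
original pointer rather than from the stack copy.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct page; NOT the kernel layout. */
struct fake_page {
	unsigned long flags;
	int refcount;
	int mapcount;
};

/* Hypothetical stand-in for the memmap, so a pfn can be derived. */
static struct fake_page fake_memmap[16];

/* The pfn depends on the address, so it needs the original pointer. */
static unsigned long fake_page_to_pfn(const struct fake_page *page)
{
	return (unsigned long)(page - fake_memmap);
}

/*
 * Format a description of @page into @buf. The page is only read, so
 * the argument can stay const, unlike the take-a-refcount approach.
 */
static int fake_dump_page(const struct fake_page *page, char *buf, size_t len)
{
	struct fake_page snap;
	unsigned long pfn;

	/*
	 * One copy up front. The copy itself is not atomic, so it can
	 * still observe a page that is changing, but every field printed
	 * below comes from this one image instead of repeated re-reads.
	 */
	memcpy(&snap, page, sizeof(snap));

	/* Original pointer here: &snap lives on the stack, not in the memmap. */
	pfn = fake_page_to_pfn(page);

	return snprintf(buf, len, "pfn:%lu flags:%#lx refcount:%d mapcount:%d",
			pfn, snap.flags, snap.refcount, snap.mapcount);
}
```

If that matches the intent, then the fields within a single dump should at
least agree with each other as of the time of the copy, even though the copy
can still race with whoever owns the page.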