Re: [PATCH] mm: migrate: fix getting incorrect page mapping during page migration

From: Matthew Wilcox
Date: Fri Dec 15 2023 - 09:52:12 EST


On Fri, Dec 15, 2023 at 08:07:52PM +0800, Baolin Wang wrote:
> When running stress-ng testing, we found below kernel crash after a few hours:
>
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> pc : dentry_name+0xd8/0x224
> lr : pointer+0x22c/0x370
> sp : ffff800025f134c0
> ......
> Call trace:
> dentry_name+0xd8/0x224
> pointer+0x22c/0x370
> vsnprintf+0x1ec/0x730
> vscnprintf+0x2c/0x60
> vprintk_store+0x70/0x234
> vprintk_emit+0xe0/0x24c
> vprintk_default+0x3c/0x44
> vprintk_func+0x84/0x2d0
> printk+0x64/0x88
> __dump_page+0x52c/0x530
> dump_page+0x14/0x20

[...]

> There are several ways to fix this issue:
> (1) Setting the PAGE_MAPPING_ANON flag for target page's ->mapping when saving
> 'anon_vma', but this can confuse PageAnon() for PFN walkers, since the target
> page has not built mappings yet.
> (2) Getting the page lock to call page_mapping() in __dump_page() to avoid crashing
> the system, however, there are still some PFN walkers that call page_mapping()
> without holding the page lock, such as compaction.
> (3) Using the target page's ->private field to save the 'anon_vma' pointer and
> 2 bits of page state, just as page->mapping records an anonymous page. This
> removes the page_mapping() impact on PFN walkers and also seems simple.
>
> So I choose option 3 to fix this issue, and this can also fix other potential
> issues for PFN walkers, such as compaction.
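
For readers following along, option 3 amounts to stashing flag bits in the low
bits of an aligned pointer. Below is a minimal userspace sketch of that
packing, not the code from the patch. All names in it (STATE_MASK,
anon_vma_stub, pack_private() and the unpack helpers) are hypothetical, and it
assumes the pointer is at least 4-byte aligned so the two low bits are free.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: the low two bits carry state, the remaining bits the pointer. */
#define STATE_MASK ((uintptr_t)0x3)

/* Stand-in for the kernel's struct anon_vma; only its alignment matters here. */
struct anon_vma_stub {
	int refcount;
};

/* Pack an aligned pointer and a 2-bit state value into a single word. */
static uintptr_t pack_private(struct anon_vma_stub *av, unsigned int state)
{
	uintptr_t word = (uintptr_t)av;

	assert((word & STATE_MASK) == 0);	/* low bits must be free */
	assert(state <= STATE_MASK);		/* state must fit in two bits */
	return word | state;
}

/* Recover the pointer by masking off the state bits. */
static struct anon_vma_stub *unpack_ptr(uintptr_t word)
{
	return (struct anon_vma_stub *)(word & ~STATE_MASK);
}

/* Recover the 2-bit state value. */
static unsigned int unpack_state(uintptr_t word)
{
	return (unsigned int)(word & STATE_MASK);
}
```

Because the packed word lives in ->private rather than ->mapping, a PFN walker
that inspects ->mapping never sees the half-built value.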

I'm not saying no to this fix, but dump_mapping() is supposed to be
resilient against this. Is the issue that 'dentry' is NULL, or is it
some field within dentry that is NULL? eg, would this fix your
case?

+++ b/fs/inode.c
@@ -588,7 +588,7 @@ void dump_mapping(const struct address_space *mapping)
 	}

 	dentry_ptr = container_of(dentry_first, struct dentry, d_u.d_alias);
-	if (get_kernel_nofault(dentry, dentry_ptr)) {
+	if (get_kernel_nofault(dentry, dentry_ptr) || !dentry) {
 		pr_warn("aops:%ps ino:%lx invalid dentry:%px\n",
 				a_ops, ino, dentry_ptr);
 		return;

Just to be clear, I think we should fix both the dumping and the migration
code.
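
To illustrate the "field within dentry is NULL" case, here is a rough
userspace analogue of what a resilient dumper would do: take a snapshot of the
structure, then validate the fields it is about to print before handing them
to a formatter. This is only a sketch. dentry_stub, name_stub and
describe_dentry() are made-up names, and plain snprintf() stands in for the
kernel's printk machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct name_stub {
	const char *name;
};

struct dentry_stub {
	struct dentry_stub *parent;
	struct name_stub d_name;
};

/*
 * Format a description of a dentry snapshot into buf.
 * Returns the snprintf() result on success, or -1 if the snapshot
 * cannot be printed safely (NULL snapshot or NULL name pointer).
 */
static int describe_dentry(const struct dentry_stub *snap, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);

	if (!snap || !snap->d_name.name) {
		snprintf(buf, len, "invalid dentry");
		return -1;
	}
	return snprintf(buf, len, "dentry name:\"%s\"", snap->d_name.name);
}
```

The point is that the check happens on the fields the formatter will
dereference, not only on the pointer to the structure itself.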