Re: [PATCH] fs: improve dump_mapping() robustness

From: Charan Teja Kalla
Date: Mon Jan 22 2024 - 12:29:33 EST




On 1/22/2024 12:47 PM, Baolin Wang wrote:
>>
>> We too seen the below crash while printing the dentry name.
>>
>> aops:shmem_aops ino:5e029 dentry name:"dev/zero"
>> flags:
>> 0x8000000000080006(referenced|uptodate|swapbacked|zone=2|kasantag=0x0)
>> raw: 8000000000080006 ffffffc033b1bb60 ffffffc033b1bb60 ffffff8862537600
>> raw: 0000000000000001 0000000000000000 00000003ffffffff ffffff807fe64000
>> page dumped because: migration failure
>> migrating pfn aef223 failed ret:1
>> page:000000009e72a120 refcount:3 mapcount:0 mapping:000000003325dda1
>> index:0x1 pfn:0xaef223
>> memcg:ffffff807fe64000
>> Unable to handle kernel NULL pointer dereference at virtual address
>> 0000000000000000
>> Mem abort info:
>>    ESR = 0x0000000096000005
>>    EC = 0x25: DABT (current EL), IL = 32 bits
>>    SET = 0, FnV = 0
>>    EA = 0, S1PTW = 0
>>    FSC = 0x05: level 1 translation fault
>> Data abort info:
>>    ISV = 0, ISS = 0x00000005
>>    CM = 0, WnR = 0
>> user pgtable: 4k pages, 39-bit VAs, pgdp=000000090c12d000
>> [0000000000000000] pgd=0000000000000000, p4d=0000000000000000,
>> pud=0000000000000000
>> Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
>>
>> dentry_name+0x1f8/0x3a8
>> pointer+0x3b0/0x6b8
>> vsnprintf+0x4a4/0x65c
>> vprintk_store+0x168/0x4a8
>> vprintk_emit+0x98/0x218
>> vprintk_default+0x44/0x70
>> vprintk+0xf0/0x138
>> _printk+0x54/0x80
>> dump_mapping+0x17c/0x188
>> dump_page+0x1d0/0x2e8
>> offline_pages+0x67c/0x898
>>
>>
>>
>> Not much comfortable with block layer internals, TMK, the below is what
>> happening in the my case:
>> memoffline                 dput()
>> (offline_pages)         (as part of closing of the shmem file)
>> ------------         --------------------------------------
>>                     .......
>>             1) dentry_unlink_inode()
>>                   hlist_del_init(&dentry->d_u.d_alias);
>>
>>             2) iput():
>>                 a) inode->i_state |= I_FREEING
>>                 .....
>>                 b) evict_inode()->..->shmem_undo_range
>>                    1) get the folios with elevated refcount
>> 3) do_migrate_range():
>>     a) Because of the elevated
>>     refcount in 2.b.1, the
>>     migration of this page will
>>     be failed.
>>
>>                    2) truncate_inode_folio() ->
>>                      filemap_remove_folio():
>>                   (deletes from the page cache,
>>                  set page->mapping=NULL,
>>                  decrement the refcount on folio)
>>    b) Call dump_page():
>>       1) mapping = page_mapping(page);
>>       2) dump_mapping(mapping)
>>       a) We unlinked the dentry in 1)
>>             thus dentry_ptr from host->i_dentry.first
>>             is not a proper one.
>>
>>           b) dentry name print with %pd is resulting into
>>        the mentioned crash.
>>
>>
>> At least in this case, I think __this patchset in its current form can
>> help us__.
>
> This looks another case of NULL pointer access. Thanks for the detailed
> analysis. Could you provide a Tested-by or Reviewed-by tag if it can
> solve your problem?
Seen this issue couple of times, over 3 months back. Not sure if we ever
encounter this issue again. Still, will pick this and let you know the
side effects of this patch, after thorough testing.

Thanks.