Re: [syzbot] KASAN: invalid-access Read in copy_page

From: Catalin Marinas
Date: Thu Oct 06 2022 - 06:24:48 EST


On Wed, Oct 05, 2022 at 01:38:55PM +0100, James Morse wrote:
> On 27/09/2022 17:55, Andrey Konovalov wrote:
> > On Tue, Sep 6, 2022 at 6:23 PM Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
> >> On Tue, Sep 06, 2022 at 04:39:57PM +0200, Andrey Konovalov wrote:
> >>> On Tue, Sep 6, 2022 at 4:29 PM Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
> >>>>>> Does it take long to reproduce this kasan warning?
> >>>>>
> >>>>> syzbot finds several such cases every day (200 crashes for the past 35 days):
> >>>>> https://syzkaller.appspot.com/bug?extid=c2c79c6d6eddc5262b77
> >>>>> So once it reaches the tested tree, we should have an answer within a day.
[...]
> I've reproduced this with the latest qemu and v6.0 kernel using ubuntu 15.04 user-space.
>
> The reproducer is just to log in once its booted. The vm has swap, and I've turned the
> memory down low enough to force it to swap. The round trip time is about 15 minutes.
>
> I've not managed to reproduce it without swap, or with more memory. (but it may be a
> timing thing)

Thanks James. I got the error without swap enabled. Just booted Debian
under Qemu with 256MB of RAM (no graphics), did an 'ls -lR /' and it
triggered shortly after. There's no MTE used in user-space.

==================================================================
BUG: KASAN: invalid-access in copy_page+0x10/0xd0
Read at addr f9ff0000050ba000 by task kcompactd0/28
Pointer tag: [f9], memory tag: [f8]

CPU: 0 PID: 28 Comm: kcompactd0 Tainted: G W 6.0.0-rc3-dirty #1
Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
Call trace:
dump_backtrace.part.0+0xdc/0xf0
show_stack+0x1c/0x4c
dump_stack_lvl+0x68/0x84
print_report+0x104/0x610
kasan_report+0x90/0xb0
__do_kernel_fault+0x70/0x194
do_tag_check_fault+0x7c/0x90
do_mem_abort+0x48/0xa0
el1_abort+0x40/0x60
el1h_64_sync_handler+0xdc/0xec
el1h_64_sync+0x64/0x68
copy_page+0x10/0xd0
folio_copy+0x50/0xb0
migrate_folio+0x50/0x9c
move_to_new_folio+0xc0/0x1d4
migrate_pages+0x16b4/0x1740
compact_zone+0x66c/0xb0c
proactive_compact_node+0x70/0xac
kcompactd+0x1b4/0x370
kthread+0x110/0x114
ret_from_fork+0x10/0x20

The buggy address belongs to the physical page:
page:000000007339140a refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff90019 pfn:0x450ba
memcg:f9ff0000052e4000
anon flags: 0x3fffc180088000d(locked|uptodate|dirty|swapbacked|arch_2|node=0|zone=0|lastcpupid=0xffff|kasantag=0x6)
raw: 03fffc180088000d fffffc0000142e48 ffff80000815bd68 fdff000001c738c1
raw: 0000000ffff90019 0000000000000000 00000001ffffffff f9ff0000052e4000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff0000050b9e00: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
ffff0000050b9f00: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
>ffff0000050ba000: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
^
ffff0000050ba100: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
ffff0000050ba200: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
==================================================================

It looks like it always happens on read. Something updated the tag in
page->flags for an existing page (or repainted the page, though less
likely as I think the page is in use).

I'm surprised that even without MTE in user-space, we still get
PG_mte_tagged (arch_2) set. Time for more printks.

--
Catalin