Re: [syzbot] KASAN: invalid-access Read in copy_page

From: James Morse
Date: Wed Oct 05 2022 - 08:39:09 EST


Hi guys,

On 27/09/2022 17:55, Andrey Konovalov wrote:
> On Tue, Sep 6, 2022 at 6:23 PM Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
>>
>> On Tue, Sep 06, 2022 at 04:39:57PM +0200, Andrey Konovalov wrote:
>>> On Tue, Sep 6, 2022 at 4:29 PM Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
>>>>>> Does it take long to reproduce this kasan warning?
>>>>>
>>>>> syzbot finds several such cases every day (200 crashes for the past 35 days):
>>>>> https://syzkaller.appspot.com/bug?extid=c2c79c6d6eddc5262b77
>>>>> So once it reaches the tested tree, we should have an answer within a day.
>>>
>>> To be specific, this syzkaller instance fuzzes the mainline, so the
>>> patch with the WARN_ON needs to end up there.
>>>
>>> If this is unacceptable, perhaps, we could switch the MTE syzkaller
>>> instance to the arm64 testing tree.
>>
>> It needs some more digging first. My first guess was that a PROT_MTE
>> page was mapped into the user address space and the task repainted it
>> but I don't think that's the case.

> syzkaller still keeps hitting this issue and I was wondering if you
> have any ideas of what could be wrong here?
>
>> Since I can't find the kernel boot log for these runs, is there any kind
>> of swap enabled? I'm trying to narrow down where the problem may be.
>
> I don't think there is.


I've reproduced this with the latest QEMU and a v6.0 kernel, using an Ubuntu 15.04 user-space.

The reproducer is just to log in once it's booted. The VM has swap, and I've turned the
memory down low enough to force it to swap. The round-trip time is about 15 minutes.

I've not managed to reproduce it without swap, or with more memory (but that may be a
timing thing).


Below is one example of tag corruption affecting page-cache memory, which wouldn't itself
be swapped:
-------------------%<-------------------
[49488.484420] BUG: KASAN: invalid-access in __arch_copy_to_user+0x180/0x240
[49488.487122] Read at addr f1ff00000ad48000 by task apt-config/5041
[49488.488614] Pointer tag: [f1], memory tag: [fe]

[49488.490921] CPU: 1 PID: 5041 Comm: apt-config Not tainted 6.0.0 #14546
[49488.492364] Hardware name: linux,dummy-virt (DT)
[49488.493790] Call trace:
[49488.494640] dump_backtrace.part.0+0xd0/0xe0
[49488.495811] show_stack+0x18/0x50
[49488.496785] dump_stack_lvl+0x68/0x84
[49488.497781] print_report+0x104/0x604
[49488.498790] kasan_report+0x8c/0xb0
[49488.499758] __do_kernel_fault+0x11c/0x1bc
[49488.500801] do_tag_check_fault+0x78/0x90
[49488.501830] do_mem_abort+0x44/0x9c
[49488.502813] el1_abort+0x40/0x60
[49488.503839] el1h_64_sync_handler+0xb0/0xd0
[49488.504880] el1h_64_sync+0x64/0x68
[49488.505847] __arch_copy_to_user+0x180/0x240
[49488.506917] _copy_to_iter+0x68/0x5c0
[49488.507918] copy_page_to_iter+0xac/0x33c
[49488.508943] filemap_read+0x1b4/0x3b0
[49488.509936] generic_file_read_iter+0x108/0x1a0
[49488.511033] ext4_file_read_iter+0x58/0x1f0
[49488.512078] vfs_read+0x1f8/0x2a0
[49488.513031] ksys_read+0x68/0xf4
[49488.513978] __arm64_sys_read+0x1c/0x2c
[49488.514998] invoke_syscall+0x48/0x114
[49488.516046] el0_svc_common.constprop.0+0x44/0xec
[49488.517153] do_el0_svc+0x2c/0xc0
[49488.518120] el0_svc+0x2c/0xb4
[49488.519041] el0t_64_sync_handler+0xb8/0xc0
[49488.520080] el0t_64_sync+0x198/0x19c

[49488.522268] The buggy address belongs to the physical page:
[49488.523778] page:00000000db6e19d9 refcount:20 mapcount:18 mapping:0000000052573be9 index:0x0 pfn:0x4ad48
[49488.524938] memcg:faff000002c70000
[49488.525430] aops:ext4_da_aops ino:8061 dentry name:"libc-2.21.so"
[49488.526289] flags: 0x1ffc38002020876(referenced|uptodate|lru|active|workingset|arch_1|mappedtodisk|arch_2|node=0|zone=0|lastcpupid=0x7ff|kasantag=0xe) CMA
[49488.527947] raw: 01ffc38002020876 fffffc00002b5248 fffffc00002b51c8 f8ff00000335c760
[49488.528325] raw: 0000000000000000 0000000000000000 0000001400000011 faff000002c70000
[49488.528669] page dumped because: kasan: bad access detected

[49488.529615] Memory state around the buggy address:
[49488.531027]  ffff00000ad47e00: f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
[49488.532442]  ffff00000ad47f00: f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1 f1
[49488.533922] >ffff00000ad48000: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
[49488.535259]                     ^
[49488.536292]  ffff00000ad48100: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
[49488.537628]  ffff00000ad48200: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
[49488.539015] ==================================================================
[49488.603970] Disabling lock debugging due to kernel taint
-------------------%<-------------------


Thanks,

James