Re: [syzbot] [crypto?] general protection fault in scatterwalk_copychunks (5)

From: Chris Li
Date: Tue Dec 26 2023 - 17:33:01 EST


Hi Nhat,

Thanks for the first stab on this bug.

On Tue, Dec 26, 2023 at 1:08 PM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
>
> On Tue, Dec 26, 2023 at 7:28 AM syzbot
> <syzbot+3eff5e51bf1db122a16e@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: 39676dfe5233 Add linux-next specific files for 20231222
> > git tree: linux-next
> > console+strace: https://syzkaller.appspot.com/x/log.txt?x=172080a1e80000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=f3761490b734dc96
> > dashboard link: https://syzkaller.appspot.com/bug?extid=3eff5e51bf1db122a16e
> > compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=178f6e26e80000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15c399e9e80000
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/360542c2ca67/disk-39676dfe.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/900dfb21ca8a/vmlinux-39676dfe.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/c94a2a3ea0e0/bzImage-39676dfe.xz
> >
> > The issue was bisected to:
> >
> > commit 7bc134496bbbaacb0d4423b819da4eca850a839d
> > Author: Chengming Zhou <zhouchengming@xxxxxxxxxxxxx>
> > Date: Mon Dec 18 11:50:31 2023 +0000
> >
> > mm/zswap: change dstmem size to one page
> >
> > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15f60c36e80000
> > final oops: https://syzkaller.appspot.com/x/report.txt?x=17f60c36e80000
> > console output: https://syzkaller.appspot.com/x/log.txt?x=13f60c36e80000
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+3eff5e51bf1db122a16e@xxxxxxxxxxxxxxxxxxxxxxxxx
> > Fixes: 7bc134496bbb ("mm/zswap: change dstmem size to one page")
> >
> > general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN
> > KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
> > CPU: 0 PID: 5065 Comm: syz-executor140 Not tainted 6.7.0-rc6-next-20231222-syzkaller #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
> > RIP: 0010:scatterwalk_start include/crypto/scatterwalk.h:63 [inline] <========= This is the offending line.
> > RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:83 [inline]
> > RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:72 [inline]
> > RIP: 0010:scatterwalk_copychunks+0x3e0/0x560 crypto/scatterwalk.c:50

This is the call stack with an inline function. Very nice annotations
on the inline function expansions call stacks.

> > Code: f0 48 c1 e8 03 80 3c 08 00 0f 85 81 01 00 00 49 8d 44 24 08 4d 89 26 48 bf 00 00 00 00 00 fc ff df 48 89 44 24 10 48 c1 e8 03 <0f> b6 04 38 84 c0 74 08 3c 03 0f 8e 47 01 00 00 48 8b 44 24 08 41
> > RSP: 0018:ffffc90003a8ecf0 EFLAGS: 00010202
> > RAX: 0000000000000001 RBX: 0000000000000000 RCX: dffffc0000000000
> > RDX: ffff88802785d940 RSI: ffffffff8465df74 RDI: dffffc0000000000
> > RBP: 0000000000001000 R08: 0000000000000005 R09: 0000000000000000
> > R10: 0000000000000002 R11: 82d8bd1b6060f805 R12: 0000000000000000
> > R13: 0000000000000014 R14: ffffc90003a8ed88 R15: 0000000000001000
> > FS: 00005555565c5380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000d5e538 CR3: 0000000079f3a000 CR4: 00000000003506f0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> > <TASK>
> > scatterwalk_map_and_copy+0x151/0x1d0 crypto/scatterwalk.c:67
> > scomp_acomp_comp_decomp+0x3a3/0x780 crypto/scompress.c:149
>
> Looks like it's this line:
>
> scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,1);

Yes indeed.

The offending instruction is actually this one:

The inlined part of the call stack:
RIP: 0010:scatterwalk_start include/crypto/scatterwalk.h:63 [inline]
<========= This is the offending line.
static inline void scatterwalk_start(struct scatter_walk *walk,
struct scatterlist *sg)
{
walk->sg = sg;
walk->offset = sg->offset; <============== RIP pointer
}

RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:83 [inline]

static inline void scatterwalk_pagedone(struct scatter_walk *walk, int out,
unsigned int more)
{
if (out) {
struct page *page;

page = sg_page(walk->sg) + ((walk->offset - 1) >> PAGE_SHIFT);
flush_dcache_page(page);
}

if (more && walk->offset >= walk->sg->offset + walk->sg->length)
scatterwalk_start(walk, sg_next(walk->sg)); <==========================
}

RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:72 [inline]
RIP: 0010:scatterwalk_copychunks+0x3e0/0x560 crypto/scatterwalk.c:50

void scatterwalk_copychunks(void *buf, struct scatter_walk *walk,
size_t nbytes, int out)
{
for (;;) {
unsigned int len_this_page = scatterwalk_pagelen(walk);
u8 *vaddr;

if (len_this_page > nbytes)
len_this_page = nbytes;

if (out != 2) {
vaddr = scatterwalk_map(walk);
memcpy_dir(buf, vaddr, len_this_page, out);
scatterwalk_unmap(vaddr);
}

scatterwalk_advance(walk, len_this_page);

if (nbytes == len_this_page)
break;

buf += len_this_page;
nbytes -= len_this_page;

scatterwalk_pagedone(walk, out & 1, 1); <=========================
}
}

then the unlined portion of the call stack:
scatterwalk_map_and_copy+0x151/0x1d0 crypto/scatterwalk.c:67

void scatterwalk_map_and_copy(void *buf, struct scatterlist *sg,
unsigned int start, unsigned int nbytes, int out)
{
struct scatter_walk walk;
struct scatterlist tmp[2];

if (!nbytes)
return;

sg = scatterwalk_ffwd(tmp, sg, start);

scatterwalk_start(&walk, sg);
scatterwalk_copychunks(buf, &walk, nbytes, out); <===========================
scatterwalk_done(&walk, out, 0);
}

scomp_acomp_comp_decomp+0x3a3/0x780 crypto/scompress.c:149
if (dir)
ret = crypto_scomp_compress(scomp, scratch->src, req->slen,
scratch->dst, &req->dlen, *ctx);
else
ret = crypto_scomp_decompress(scomp, scratch->src, req->slen,
scratch->dst, &req->dlen, *ctx);
if (!ret) {
if (!req->dst) {
req->dst = sgl_alloc(req->dlen, GFP_ATOMIC, NULL);
if (!req->dst) {
ret = -ENOMEM;
goto out;
}
}
scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,
<=======================
1);
}

crypto_acomp_compress include/crypto/acompress.h:302 [inline]
zswap_store+0x98b/0x2430 mm/zswap.c:1666
swap_writepage+0x8e/0x220 mm/page_io.c:198
pageout+0x399/0x9e0 mm/vmscan.c:656
shrink_folio_list+0x2f47/0x3ea0 mm/vmscan.c:1319
reclaim_folio_list+0xe4/0x3a0 mm/vmscan.c:2104
reclaim_pages+0x483/0x6a0 mm/vmscan.c:2140
madvise_cold_or_pageout_pte_range+0x129e/0x1f70 mm/madvise.c:526
walk_pmd_range mm/pagewalk.c:143 [inline]
>
> My guess is dlen here exceeds 1 page - maybe the data is
> incompressible, so the output exceeds the original input? Based on the
> included kernel config, the algorithm used here is lzo, and it seems
> that can happen for this particular compressor:
>
> https://github.com/nemequ/lzo/blob/master/doc/LZO.FAQ#L200C1-L203C1
> https://github.com/nemequ/lzo/blob/master/doc/LZO.TXT#L157C2-L157C62

The decompressed output can be bigger than input. However, here we
specify the output size limit to be one page. The decompressor should
not output more than the 1 page of the dst buffer.

I did check the lzo1x_decompress_safe() function.
It is supposed to use the NEED_OP(x) check before it stores the output buffer.
However I do find one place that seems to be missing that check, at
least it is not obvious to me how that check is effective. I will
paste it here let other experts take a look as well:
Line 228:

if (op - m_pos >= 8) {
unsigned char *oe = op + t;
if (likely(HAVE_OP(t + 15))) {
do {
COPY8(op, m_pos);
op += 8;
m_pos += 8;
COPY8(op, m_pos);
op += 8;
m_pos += 8;
} while (op < oe);
op = oe;
if (HAVE_IP(6)) {
state = next;
COPY4(op, ip); <========================= This COPY4 does not have
obvious NEED_OP() check. As far as I can tell, this output is not
converted by the above HAVE_OP(t + 15)) check, which means to protect
the unaligned two 8 byte stores inside the "do while" loops.
op += next;
ip += next;
continue;
}
} else {
NEED_OP(t);
do {
*op++ = *m_pos++;
} while (op < oe);
}
} else


>
> Not 100% sure about linux kernel's implementation though. I'm no
> compression expert :)

Me neither. Anyway, if it is a decompression issue. For this bug, it
is good to have some debug print assert to check that after
decompression, the *dlen is not bigger than the original output
length. If it does blow over the decompression buffer, it is a bug
that needs to be fixed in the decompression function side, not the
zswap code.

Chris

>
> If this is the case though, perhaps we can't reduce the output buffer
> size to 1 page after all, unless we contractually obligate the output
> size to be <= input size (i.e new function in the compression API).
>
>
> > crypto_acomp_compress include/crypto/acompress.h:302 [inline]
> > zswap_store+0x98b/0x2430 mm/zswap.c:1666
> > swap_writepage+0x8e/0x220 mm/page_io.c:198
> > pageout+0x399/0x9e0 mm/vmscan.c:656
> > shrink_folio_list+0x2f47/0x3ea0 mm/vmscan.c:1319
> > reclaim_folio_list+0xe4/0x3a0 mm/vmscan.c:2104
> > reclaim_pages+0x483/0x6a0 mm/vmscan.c:2140
> > madvise_cold_or_pageout_pte_range+0x129e/0x1f70 mm/madvise.c:526
> > walk_pmd_range mm/pagewalk.c:143 [inline]
> > walk_pud_range mm/pagewalk.c:221 [inline]
> > walk_p4d_range mm/pagewalk.c:256 [inline]
> > walk_pgd_range+0xa48/0x1870 mm/pagewalk.c:293
> > __walk_page_range+0x630/0x770 mm/pagewalk.c:395
> > walk_page_range+0x626/0xa80 mm/pagewalk.c:521
> > madvise_pageout_page_range mm/madvise.c:585 [inline]
> > madvise_pageout+0x32c/0x820 mm/madvise.c:612
> > madvise_vma_behavior+0x1cc/0x1b50 mm/madvise.c:1031
> > madvise_walk_vmas+0x1cf/0x2c0 mm/madvise.c:1260
> > do_madvise+0x333/0x660 mm/madvise.c:1440
> > __do_sys_madvise mm/madvise.c:1453 [inline]
> > __se_sys_madvise mm/madvise.c:1451 [inline]
> > __x64_sys_madvise+0xa9/0x110 mm/madvise.c:1451
> > do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> > do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83
> > entry_SYSCALL_64_after_hwframe+0x62/0x6a
> > RIP: 0033:0x7f15a5e14b69
> > Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> > RSP: 002b:00007ffde7b4a5c8 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
> > RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f15a5e14b69
> > RDX: 0000000000000015 RSI: 0000000000c00304 RDI: 0000000020000000
> > RBP: 0000000000000000 R08: 00005555565c6610 R09: 00005555565c6610
> > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
> > R13: 00007ffde7b4a808 R14: 0000000000000001 R15: 0000000000000001
> > </TASK>
> > Modules linked in:
> > ---[ end trace 0000000000000000 ]---
> > RIP: 0010:scatterwalk_start include/crypto/scatterwalk.h:63 [inline]
> > RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:83 [inline]
> > RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:72 [inline]
> > RIP: 0010:scatterwalk_copychunks+0x3e0/0x560 crypto/scatterwalk.c:50
> > Code: f0 48 c1 e8 03 80 3c 08 00 0f 85 81 01 00 00 49 8d 44 24 08 4d 89 26 48 bf 00 00 00 00 00 fc ff df 48 89 44 24 10 48 c1 e8 03 <0f> b6 04 38 84 c0 74 08 3c 03 0f 8e 47 01 00 00 48 8b 44 24 08 41
> > RSP: 0018:ffffc90003a8ecf0 EFLAGS: 00010202
> > RAX: 0000000000000001 RBX: 0000000000000000 RCX: dffffc0000000000
> > RDX: ffff88802785d940 RSI: ffffffff8465df74 RDI: dffffc0000000000
> > RBP: 0000000000001000 R08: 0000000000000005 R09: 0000000000000000
> > R10: 0000000000000002 R11: 82d8bd1b6060f805 R12: 0000000000000000
> > R13: 0000000000000014 R14: ffffc90003a8ed88 R15: 0000000000001000
> > FS: 00005555565c5380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000d5e538 CR3: 0000000079f3a000 CR4: 00000000003506f0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > ----------------
> > Code disassembly (best guess):
> > 0: f0 48 c1 e8 03 lock shr $0x3,%rax
> > 5: 80 3c 08 00 cmpb $0x0,(%rax,%rcx,1)
> > 9: 0f 85 81 01 00 00 jne 0x190
> > f: 49 8d 44 24 08 lea 0x8(%r12),%rax
> > 14: 4d 89 26 mov %r12,(%r14)
> > 17: 48 bf 00 00 00 00 00 movabs $0xdffffc0000000000,%rdi
> > 1e: fc ff df
> > 21: 48 89 44 24 10 mov %rax,0x10(%rsp)
> > 26: 48 c1 e8 03 shr $0x3,%rax
> > * 2a: 0f b6 04 38 movzbl (%rax,%rdi,1),%eax <-- trapping instruction
> > 2e: 84 c0 test %al,%al
> > 30: 74 08 je 0x3a
> > 32: 3c 03 cmp $0x3,%al
> > 34: 0f 8e 47 01 00 00 jle 0x181
> > 3a: 48 8b 44 24 08 mov 0x8(%rsp),%rax
> > 3f: 41 rex.B
> >
> >
> > ---
> > This report is generated by a bot. It may contain errors.
> > See https://goo.gl/tpsmEJ for more information about syzbot.
> > syzbot engineers can be reached at syzkaller@xxxxxxxxxxxxxxxx.
> >
> > syzbot will keep track of this issue. See:
> > https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> > For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> >
> > If the report is already addressed, let syzbot know by replying with:
> > #syz fix: exact-commit-title
> >
> > If you want syzbot to run the reproducer, reply with:
> > #syz test: git://repo/address.git branch-or-commit-hash
> > If you attach or paste a git patch, syzbot will apply it before testing.
> >
> > If you want to overwrite report's subsystems, reply with:
> > #syz set subsystems: new-subsystem
> > (See the list of subsystem names on the web dashboard)
> >
> > If the report is a duplicate of another one, reply with:
> > #syz dup: exact-subject-of-another-report
> >
> > If you want to undo deduplication, reply with:
> > #syz undup
>