Re: [syzbot] general protection fault in skb_dequeue (3)

From: David Howells
Date: Thu Feb 02 2023 - 03:53:24 EST


Hi John, David,

Could you have a look at this?

> syzbot found the following issue on:
>
> HEAD commit: 80bd9028feca Add linux-next specific files for 20230131
> git tree: linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1468e369480000
> kernel config: https://syzkaller.appspot.com/x/.config?x=904dc2f450eaad4a
> dashboard link: https://syzkaller.appspot.com/bug?extid=a440341a59e3b7142895
> compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12c5d2be480000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11259a79480000
> ...
> The issue was bisected to:
>
> commit 920756a3306a35f1c08f25207d375885bef98975
> Author: David Howells <dhowells@xxxxxxxxxx>
> Date: Sat Jan 21 12:51:18 2023 +0000
>
> block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages
>
> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=170384f9480000
> final oops: https://syzkaller.appspot.com/x/report.txt?x=148384f9480000
> console output: https://syzkaller.appspot.com/x/log.txt?x=108384f9480000
> ...
> general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN
> KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
> CPU: 0 PID: 2838 Comm: kworker/u4:6 Not tainted 6.2.0-rc6-next-20230131-syzkaller-09515-g80bd9028feca #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023
> Workqueue: phy4 ieee80211_iface_work
> RIP: 0010:__skb_unlink include/linux/skbuff.h:2321 [inline]
> RIP: 0010:__skb_dequeue include/linux/skbuff.h:2337 [inline]
> RIP: 0010:skb_dequeue+0xf5/0x180 net/core/skbuff.c:3511

I don't think this is specific to networking. I've run it a few times and
the failures show up in different, unrelated places. I'm wondering if it's
related to FOLL_PIN in some way.

The syzbot test in question does the following:

#{"repeat":true,"procs":1,"slowdown":1,"sandbox":"none","sandbox_arg":0,"netdev":true,"cgroups":true,"close_fds":true,"usb":true,"wifi":true,"sysctl":true,"tmpdir":true}
socket(0x0, 0x2, 0x0)
epoll_create(0x7)
r0 = creat(&(0x7f0000000040)='./bus\x00', 0x9)
ftruncate(r0, 0x800)
lseek(r0, 0x200, 0x2)
r1 = open(&(0x7f0000000000)='./bus\x00', 0x24000, 0x0) <-- O_DIRECT
sendfile(r0, r1, 0x0, 0x1dd00)

Basically a DIO splice from a file to itself.
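
To make the raw flag word in the open() call above easier to read, here is a small helper that names the bits. It is only a sketch: decode_open_flags() and its table are illustrative names, not part of syzkaller or the kernel, and the table covers just a handful of flags.

```c
#define _GNU_SOURCE	/* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* A few open(2) flags worth naming; the list is deliberately incomplete. */
static const struct {
	int bit;
	const char *name;
} open_flag_names[] = {
	{ O_DIRECT,	"O_DIRECT" },
	{ O_NOFOLLOW,	"O_NOFOLLOW" },
	{ O_APPEND,	"O_APPEND" },
	{ O_CREAT,	"O_CREAT" },
	{ O_TRUNC,	"O_TRUNC" },
};

/*
 * Write the access mode and any known flag bits into buf as "A | B | C".
 * Bits that are not in the table above are silently ignored.
 */
char *decode_open_flags(int flags, char *buf, size_t len)
{
	static const char *const acc[] = {
		"O_RDONLY", "O_WRONLY", "O_RDWR", "?"
	};
	size_t i;

	if (len == 0)
		return buf;

	snprintf(buf, len, "%s", acc[flags & O_ACCMODE]);
	for (i = 0; i < sizeof(open_flag_names) / sizeof(open_flag_names[0]); i++) {
		if (flags & open_flag_names[i].bit) {
			strncat(buf, " | ", len - strlen(buf) - 1);
			strncat(buf, open_flag_names[i].name,
				len - strlen(buf) - 1);
		}
	}
	return buf;
}
```

The flag values are architecture-dependent. With the x86-64 values (O_DIRECT is 0x4000, O_NOFOLLOW is 0x20000), 0x24000 decodes as O_RDONLY | O_DIRECT | O_NOFOLLOW.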

I've hand-written my own much simpler tester (see attached). You need to run
at least two copies in parallel, I think, to trigger the bug. It's possible
truncate is interfering somehow.

David
---
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/wait.h>

#define file_size 0x800
#define send_size 0x1dd00
#define repeat_count 1000

int main(int argc, char *argv[])
{
	int in, out, i, wt;

	if (argc != 2 || !argv[1][0]) {
		fprintf(stderr, "Usage: %s <file>\n", argv[0]);
		exit(2);
	}

	for (i = 0; i < repeat_count; i++) {
		switch (fork()) {
		case -1:
			perror("fork");
			exit(1);
		case 0:
			out = creat(argv[1], 0666);
			if (out < 0) {
				perror(argv[1]);
				exit(1);
			}

			if (ftruncate(out, file_size) < 0) {
				perror("ftruncate");
				exit(1);
			}

			if (lseek(out, file_size, SEEK_SET) < 0) {
				perror("lseek");
				exit(1);
			}

			in = open(argv[1], O_RDONLY | O_DIRECT | O_NOFOLLOW);
			if (in < 0) {
				perror("open");
				exit(1);
			}

			if (sendfile(out, in, NULL, send_size) < 0) {
				perror("sendfile");
				exit(1);
			}
			exit(0);

		default:
			if (wait(&wt) < 0) {
				perror("wait");
				exit(1);
			}
			break;
		}
	}

	exit(0);
}
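
Since the tester above seems to need at least two copies running at once, a launcher along the following lines could start them together. This is a rough sketch under assumptions: run_copies() is a made-up helper name, and the tester path passed to it is whatever the program above was compiled to. It only starts the copies and collects their exit statuses; it does not check the kernel log for an oops.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Start `copies` instances of `prog` at the same time, each invoked as
 * "prog file", then wait for all of them.
 *
 * Returns the number of instances that exited with status 0, or -1 if
 * not every instance could be started.  `prog` is looked up in PATH if
 * it contains no slash (execlp() semantics).
 */
int run_copies(const char *prog, const char *file, int copies)
{
	int i, wt, started = 0, ok = 0;

	assert(prog && file);

	for (i = 0; i < copies; i++) {
		pid_t pid = fork();

		if (pid < 0) {
			perror("fork");
			break;
		}
		if (pid == 0) {
			execlp(prog, prog, file, (char *)NULL);
			perror(prog);	/* only reached if exec failed */
			_exit(127);
		}
		started++;
	}

	/* Reap every child that was started, even after a fork failure. */
	for (i = 0; i < started; i++) {
		if (wait(&wt) < 0) {
			perror("wait");
			break;
		}
		if (WIFEXITED(wt) && WEXITSTATUS(wt) == 0)
			ok++;
	}

	return started == copies ? ok : -1;
}
```

Calling run_copies("./tester", "/path/to/testfile", 2) from a small main() would run two copies against the same file; both paths here are placeholders.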