Re: [syzbot] general protection fault in skb_dequeue (3)

From: David Hildenbrand
Date: Thu Feb 02 2023 - 04:03:42 EST


On 02.02.23 09:52, David Howells wrote:
Hi John, David,

Could you have a look at this?

syzbot found the following issue on:

HEAD commit: 80bd9028feca Add linux-next specific files for 20230131
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1468e369480000
kernel config: https://syzkaller.appspot.com/x/.config?x=904dc2f450eaad4a
dashboard link: https://syzkaller.appspot.com/bug?extid=a440341a59e3b7142895
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12c5d2be480000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11259a79480000
...
The issue was bisected to:

commit 920756a3306a35f1c08f25207d375885bef98975
Author: David Howells <dhowells@xxxxxxxxxx>
Date: Sat Jan 21 12:51:18 2023 +0000

block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=170384f9480000
final oops: https://syzkaller.appspot.com/x/report.txt?x=148384f9480000
console output: https://syzkaller.appspot.com/x/log.txt?x=108384f9480000
...
general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 PID: 2838 Comm: kworker/u4:6 Not tainted 6.2.0-rc6-next-20230131-syzkaller-09515-g80bd9028feca #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023
Workqueue: phy4 ieee80211_iface_work
RIP: 0010:__skb_unlink include/linux/skbuff.h:2321 [inline]
RIP: 0010:__skb_dequeue include/linux/skbuff.h:2337 [inline]
RIP: 0010:skb_dequeue+0xf5/0x180 net/core/skbuff.c:3511

I don't think this is specifically related to anything networking. I've run
it a few times and weird stuff happens in various places. I'm wondering if
it's related to FOLL_PIN in some way.

The syzbot test in question does the following:

#{"repeat":true,"procs":1,"slowdown":1,"sandbox":"none","sandbox_arg":0,"netdev":true,"cgroups":true,"close_fds":true,"usb":true,"wifi":true,"sysctl":true,"tmpdir":true}
socket(0x0, 0x2, 0x0)
epoll_create(0x7)
r0 = creat(&(0x7f0000000040)='./bus\x00', 0x9)
ftruncate(r0, 0x800)
lseek(r0, 0x200, 0x2)
r1 = open(&(0x7f0000000000)='./bus\x00', 0x24000, 0x0) <-- O_DIRECT
sendfile(r0, r1, 0x0, 0x1dd00)

Basically a DIO splice from a file to itself.
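
As a cross-check of the "<-- O_DIRECT" annotation above, here is a minimal sketch that decodes a raw open(2) flags word into symbolic names. The helper name decode_open_flags() is made up for illustration and only knows about the two flags relevant here. The numeric values of O_DIRECT and O_NOFOLLOW are architecture-specific, so decoding the literal 0x24000 from the syzbot program is only meaningful on an architecture that uses the same values as the machine that produced the reproducer (x86-64, as far as I can tell).

```c
#define _GNU_SOURCE		/* needed for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write a symbolic form of an open(2) flags word
 * into buf and return buf.  Only O_DIRECT and O_NOFOLLOW are recognised;
 * any other set bits are appended as a single hex value.
 */
const char *decode_open_flags(unsigned int flags, char *buf, size_t len)
{
	static const struct {
		unsigned int bit;
		const char *name;
	} known[] = {
		{ O_DIRECT,   "O_DIRECT"   },
		{ O_NOFOLLOW, "O_NOFOLLOW" },
	};
	/* On Linux the access mode occupies the low two bits (O_ACCMODE == 3). */
	static const char *const modes[] = {
		"O_RDONLY", "O_WRONLY", "O_RDWR", "(invalid access mode)",
	};
	unsigned int rest = flags & ~(unsigned int)O_ACCMODE;
	size_t used, i;

	used = (size_t)snprintf(buf, len, "%s", modes[flags & O_ACCMODE]);

	for (i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
		if ((rest & known[i].bit) && used < len) {
			used += (size_t)snprintf(buf + used, len - used,
						 "|%s", known[i].name);
			rest &= ~known[i].bit;
		}
	}

	/* Report whatever the table above does not cover. */
	if (rest && used < len)
		snprintf(buf + used, len - used, "|0x%x", rest);

	return buf;
}
```

With a 64-byte buffer, decode_open_flags(0x24000, buf, sizeof(buf)) should give O_RDONLY|O_DIRECT|O_NOFOLLOW on x86-64, which would match the flags used in the hand-written tester.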

I've hand-written my own much simpler tester (see attached). You need to run
at least two copies in parallel, I think, to trigger the bug. It's possible
truncate is interfering somehow.
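
Running "at least two copies in parallel" could be sketched as a small launcher like the one below. This is only an illustration under assumptions: run_copies() is a made-up helper, and it simply reaps as many children as it started with wait(2), so it assumes the caller has no other children outstanding.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: start `copies` instances of the program named by
 * argv[0] (looked up via PATH) at the same time, wait for all of them,
 * and return how many exited with status 0.
 */
int run_copies(int copies, char *const argv[])
{
	int started = 0, ok = 0, status, i;

	for (i = 0; i < copies; i++) {
		pid_t pid = fork();

		if (pid < 0)
			break;			/* stop launching, reap what we have */
		if (pid == 0) {
			execvp(argv[0], argv);
			_exit(127);		/* exec failed */
		}
		started++;
	}

	/* Reap exactly as many children as were started. */
	while (started > 0 && wait(&status) > 0) {
		started--;
		if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
			ok++;
	}

	return ok;
}
```

For example, with the tester built under a hypothetical name such as ./dio-test, calling run_copies(2, argv) with argv = { "./dio-test", "/mnt/test/bus", NULL } would start two instances against the same file. Both the binary name and the path are placeholders.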

David
---
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/wait.h>

#define file_size 0x800
#define send_size 0x1dd00
#define repeat_count 1000

int main(int argc, char *argv[])
{
	int in, out, i, wt;

	if (argc != 2 || !argv[1][0]) {
		fprintf(stderr, "Usage: %s <file>\n", argv[0]);
		exit(2);
	}

	for (i = 0; i < repeat_count; i++) {
		switch (fork()) {
		case -1:
			perror("fork");
			exit(1);
		case 0:
			out = creat(argv[1], 0666);
			if (out < 0) {
				perror(argv[1]);
				exit(1);
			}

			if (ftruncate(out, file_size) < 0) {
				perror("ftruncate");
				exit(1);
			}

			if (lseek(out, file_size, SEEK_SET) < 0) {
				perror("lseek");
				exit(1);
			}

			in = open(argv[1], O_RDONLY | O_DIRECT | O_NOFOLLOW);
			if (in < 0) {
				perror("open");
				exit(1);
			}

			if (sendfile(out, in, NULL, send_size) < 0) {
				perror("sendfile");
				exit(1);
			}
			exit(0);

[as raised on IRC]

At first, I wondered whether this is related to shared anonymous pages
getting pinned R/O, which would trigger COW-unsharing ... but I don't even
see where we are supposed to use FOLL_PIN vs. FOLL_GET here. IOW, we're not
even supposed to access user space memory (with neither FOLL_GET nor
FOLL_PIN), yet we still end up with a change in behavior.

--
Thanks,

David / dhildenb