Re: [syzbot] BUG: sleeping function called from invalid context in __fdget_pos

From: Dave Hansen
Date: Tue Jun 29 2021 - 10:46:35 EST


... adding Ard who was recently modifying some of the
kernel_fpu_begin/end() sites in the AESNI crypto code.

On 6/28/21 12:22 PM, syzbot wrote:
> console output: https://syzkaller.appspot.com/x/log.txt?x=170e6c94300000
> kernel config: https://syzkaller.appspot.com/x/.config?x=42ecca11b759d96c
> dashboard link: https://syzkaller.appspot.com/bug?extid=5d1bad8042a8f0e8117a
>
> Unfortunately, I don't have any reproducer for this issue yet.
...
> BUG: sleeping function called from invalid context at kernel/locking/mutex.c:938
> in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 29652, name: syz-executor.0
> no locks held by syz-executor.0/29652.
> Preemption disabled at:
> [<ffffffff812aa454>] kernel_fpu_begin_mask+0x64/0x260 arch/x86/kernel/fpu/core.c:126
> CPU: 0 PID: 29652 Comm: syz-executor.0 Not tainted 5.13.0-rc7-syzkaller #0

There's a better backtrace in the log before the rather useless
backtrace from lockdep:

> [ 1341.360547][T29635] FAULT_INJECTION: forcing a failure.
> [ 1341.360547][T29635] name failslab, interval 1, probability 0, space 0, times 0
> [ 1341.374439][T29635] CPU: 1 PID: 29635 Comm: syz-executor.0 Not tainted 5.13.0-rc7-syzkaller #0
> [ 1341.374712][T29630] FAT-fs (loop2): bogus number of reserved sectors
> [ 1341.383571][T29635] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> [ 1341.383591][T29635] Call Trace:
> [ 1341.383603][T29635] dump_stack+0x141/0x1d7
> [ 1341.383630][T29635] should_fail.cold+0x5/0xa
> [ 1341.383651][T29635] ? skcipher_walk_next+0x6e2/0x1680
> [ 1341.383673][T29635] should_failslab+0x5/0x10
> [ 1341.383691][T29635] __kmalloc+0x72/0x330
> [ 1341.383720][T29635] skcipher_walk_next+0x6e2/0x1680
> [ 1341.383744][T29635] ? kfree+0xe5/0x7f0
> [ 1341.383776][T29635] skcipher_walk_first+0xf8/0x3c0
> [ 1341.383805][T29635] skcipher_walk_virt+0x523/0x760
> [ 1341.445438][T29635] xts_crypt+0x137/0x7f0
> [ 1341.449689][T29635] ? aesni_encrypt+0x80/0x80

There's one suspect-looking site in xts_crypt():

> kernel_fpu_begin();
>
> /* calculate first value of T */
> aesni_enc(aes_ctx(ctx->raw_tweak_ctx), walk.iv, walk.iv);
>
> while (walk.nbytes > 0) {
> int nbytes = walk.nbytes;
>
> ...
>
> err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
>
> kernel_fpu_end();
>
> if (walk.nbytes > 0)
> kernel_fpu_begin();
> }

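I wonder if a slab allocation failure could leave us with walk.nbytes==0.
If so, the while loop body would never run, and the kernel_fpu_begin()
ahead of the loop would never be paired with a kernel_fpu_end(). That
would fit the "Preemption disabled at: kernel_fpu_begin_mask" line in the
report above.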
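
To make the suspected imbalance concrete, here is a small user-space model
of the begin/end placement in the excerpt above. It is only a sketch, not
kernel code: preempt_depth, model_fpu_begin(), model_fpu_end(), struct
model_walk and model_xts_crypt() are all hypothetical stand-ins, and it
assumes that nothing after the loop calls kernel_fpu_end() and that a
failed walk setup really does leave walk.nbytes at 0.

```c
#include <assert.h>

/* Stand-in for the preempt count that kernel_fpu_begin()/end() adjust. */
static int preempt_depth;

static void model_fpu_begin(void)
{
	preempt_depth++;
}

static void model_fpu_end(void)
{
	assert(preempt_depth > 0);	/* an end without a begin is a bug too */
	preempt_depth--;
}

/* Hypothetical stand-in for struct skcipher_walk: only nbytes matters. */
struct model_walk {
	unsigned int nbytes;
};

/*
 * Mirrors the begin/end placement of the quoted xts_crypt() excerpt.
 * initial_nbytes is what the walk setup left in walk.nbytes; 0 models
 * the suspected result of a failed allocation. Each loop pass models
 * skcipher_walk_done() finishing the walk, so the loop runs at most once.
 * Returns the depth left behind; 0 means begin/end were balanced.
 */
static int model_xts_crypt(unsigned int initial_nbytes)
{
	struct model_walk walk = { .nbytes = initial_nbytes };

	preempt_depth = 0;
	model_fpu_begin();

	while (walk.nbytes > 0) {
		walk.nbytes = 0;	/* models skcipher_walk_done() */

		model_fpu_end();

		if (walk.nbytes > 0)
			model_fpu_begin();
	}

	return preempt_depth;
}
```

In this model, model_xts_crypt(16) returns 0 (balanced), while
model_xts_crypt(0) returns 1: the loop is skipped and the initial begin is
left without a matching end.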