Re: BUG: sleeping function called from invalid context at ./include/linux/uaccess.h:LINE

From: David Hildenbrand
Date: Mon Nov 06 2017 - 11:01:18 EST


On 06.11.2017 16:10, Nick Desaulniers wrote:
> Does it have to be stack allocated?

We can't use kmalloc and friends in emulate.c. We would have to
introduce new emulator callbacks.

a) for malloc and free. hmmm.
b) for carrying out the fxrstor/fixup.

Paolo, what do you suggest?

>
> On Nov 6, 2017 3:52 AM, "David Hildenbrand" <david@xxxxxxxxxx> wrote:
>
> On 31.10.2017 12:34, syzbot wrote:
> > Hello,
> >
> > syzkaller hit the following crash on
> > 91dfed74eabcdae9378131546c446442c29bf769
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
> > compiler: gcc (GCC) 7.1.1 20170620
> > .config is attached
> > Raw console output is attached.
> > C reproducer is attached
> > syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> > for information about syzkaller reproducers
> >
> >
> > in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: syzkaller879109
> > 2 locks held by syzkaller879109/2909:
> >  #0:  (&vcpu->mutex){+.+.}, at: [<ffffffff8106222c>]
> vcpu_load+0x1c/0x70
> > arch/x86/kvm/../../../virt/kvm/kvm_main.c:154
> >  #1:  (&kvm->srcu){....}, at: [<ffffffff810dd162>] vcpu_enter_guest
> > arch/x86/kvm/x86.c:6983 [inline]
> >  #1:  (&kvm->srcu){....}, at: [<ffffffff810dd162>] vcpu_run
> > arch/x86/kvm/x86.c:7061 [inline]
> >  #1:  (&kvm->srcu){....}, at: [<ffffffff810dd162>]
> > kvm_arch_vcpu_ioctl_run+0x1bc2/0x58b0 arch/x86/kvm/x86.c:7222
> > CPU: 1 PID: 2909 Comm: syzkaller879109 Not tainted
> 4.13.0-rc4-next-20170811
> > #1
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
> 01/01/2011
> > Call Trace:
> >  __dump_stack lib/dump_stack.c:16 [inline]
> >  dump_stack+0x194/0x257 lib/dump_stack.c:52
> >  ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6014
> >  __might_sleep+0x95/0x190 kernel/sched/core.c:5967
> >  __might_fault+0xab/0x1d0 mm/memory.c:4383
> >  __copy_from_user include/linux/uaccess.h:71 [inline]
> >  __kvm_read_guest_page+0x58/0xa0
> > arch/x86/kvm/../../../virt/kvm/kvm_main.c:1771
> >  kvm_vcpu_read_guest_page+0x44/0x60
> > arch/x86/kvm/../../../virt/kvm/kvm_main.c:1791
> >  kvm_read_guest_virt_helper+0x76/0x140 arch/x86/kvm/x86.c:4407
> >  kvm_read_guest_virt_system+0x3c/0x50 arch/x86/kvm/x86.c:4466
> >  segmented_read_std+0x10c/0x180 arch/x86/kvm/emulate.c:819
> >  em_fxrstor+0x27b/0x410 arch/x86/kvm/emulate.c:4022
>
>
> In em_fxrstor, we call get_fpu(), which in turn disables preemption.
>
> With preemption disabled, we then call
> segmented_read_std()->kvm_vcpu_read_guest_page(), which might sleep,
> triggering the warning.
>
> >  x86_emulate_insn+0x55d/0x3c50 arch/x86/kvm/emulate.c:5471
> >  x86_emulate_instruction+0x411/0x1ca0 arch/x86/kvm/x86.c:5698
> >  kvm_mmu_page_fault+0x18b/0x2c0 arch/x86/kvm/mmu.c:4854
> >  handle_ept_violation+0x1fc/0x5e0 arch/x86/kvm/vmx.c:6400
> >  vmx_handle_exit+0x281/0x1ab0 arch/x86/kvm/vmx.c:8718
> >  vcpu_enter_guest arch/x86/kvm/x86.c:6999 [inline]
> >  vcpu_run arch/x86/kvm/x86.c:7061 [inline]
> >  kvm_arch_vcpu_ioctl_run+0x1cee/0x58b0 arch/x86/kvm/x86.c:7222
> >  kvm_vcpu_ioctl+0x64c/0x1010
> arch/x86/kvm/../../../virt/kvm/kvm_main.c:2591
> >  vfs_ioctl fs/ioctl.c:45 [inline]
> >  do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
> >  SYSC_ioctl fs/ioctl.c:700 [inline]
> >  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
> >  entry_SYSCALL_64_fastpath+0x1f/0xbe
>
> I don't really see a way to avoid two fxstate variables. Unloading the
> fpu in between the fxsave/fxrstor could lead to host values getting
> overwritten. Loading/saving the fpu in kvm_arch_vcpu_ioctl_run() would
> most probably also not work, as the relevant portions of fxregs_state
> would not get saved/restored. So disabling preemption would still be
> needed.
>
>
> So all I can offer for now is the following (untested, can send as
> proper patch if needed):
>
>
> From f32d06c8c621c6d68e073e9bdb81a6280b6c9544 Mon Sep 17 00:00:00 2001
> From: David Hildenbrand <david@xxxxxxxxxx>
> Date: Mon, 6 Nov 2017 12:35:39 +0100
> Subject: [PATCH v1] KVM: x86: fix em_fxrstor sleeping while in atomic
>
> Commit 9d643f63128b tried to optimize the stack size, but introduced a
> guest memory access which might sleep while in atomic.
>
> Let's undo that part of the commit but keep the cleanups.
>
> Reported by syzbot:
>
> in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: syzkaller879109
> 2 locks held by syzkaller879109/2909:
>  #0:  (&vcpu->mutex){+.+.}, at: [<ffffffff8106222c>]
> vcpu_load+0x1c/0x70
> arch/x86/kvm/../../../virt/kvm/kvm_main.c:154
>  #1:  (&kvm->srcu){....}, at: [<ffffffff810dd162>] vcpu_enter_guest
> arch/x86/kvm/x86.c:6983 [inline]
>  #1:  (&kvm->srcu){....}, at: [<ffffffff810dd162>] vcpu_run
> arch/x86/kvm/x86.c:7061 [inline]
>  #1:  (&kvm->srcu){....}, at: [<ffffffff810dd162>]
> kvm_arch_vcpu_ioctl_run+0x1bc2/0x58b0 arch/x86/kvm/x86.c:7222
> CPU: 1 PID: 2909 Comm: syzkaller879109 Not tainted
> 4.13.0-rc4-next-20170811
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
> 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:16 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:52
>  ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6014
>  __might_sleep+0x95/0x190 kernel/sched/core.c:5967
>  __might_fault+0xab/0x1d0 mm/memory.c:4383
>  __copy_from_user include/linux/uaccess.h:71 [inline]
>  __kvm_read_guest_page+0x58/0xa0
> arch/x86/kvm/../../../virt/kvm/kvm_main.c:1771
>  kvm_vcpu_read_guest_page+0x44/0x60
> arch/x86/kvm/../../../virt/kvm/kvm_main.c:1791
>  kvm_read_guest_virt_helper+0x76/0x140 arch/x86/kvm/x86.c:4407
>  kvm_read_guest_virt_system+0x3c/0x50 arch/x86/kvm/x86.c:4466
>  segmented_read_std+0x10c/0x180 arch/x86/kvm/emulate.c:819
>  em_fxrstor+0x27b/0x410 arch/x86/kvm/emulate.c:4022
>  x86_emulate_insn+0x55d/0x3c50 arch/x86/kvm/emulate.c:5471
>  x86_emulate_instruction+0x411/0x1ca0 arch/x86/kvm/x86.c:5698
>  kvm_mmu_page_fault+0x18b/0x2c0 arch/x86/kvm/mmu.c:4854
>  handle_ept_violation+0x1fc/0x5e0 arch/x86/kvm/vmx.c:6400
>  vmx_handle_exit+0x281/0x1ab0 arch/x86/kvm/vmx.c:8718
>  vcpu_enter_guest arch/x86/kvm/x86.c:6999 [inline]
>  vcpu_run arch/x86/kvm/x86.c:7061 [inline]
>  kvm_arch_vcpu_ioctl_run+0x1cee/0x58b0 arch/x86/kvm/x86.c:7222
>  kvm_vcpu_ioctl+0x64c/0x1010
> arch/x86/kvm/../../../virt/kvm/kvm_main.c:2591
>  vfs_ioctl fs/ioctl.c:45 [inline]
>  do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
>  SYSC_ioctl fs/ioctl.c:700 [inline]
>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
>  entry_SYSCALL_64_fastpath+0x1f/0xbe
> RIP: 0033:0x437fc9
> RSP: 002b:00007ffc7b4d5ab8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 00000000004002b0 RCX: 0000000000437fc9
> RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000005
> RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000020ae8000
> R10: 0000000000009120 R11: 0000000000000206 R12: 0000000000000000
> R13: 0000000000000004 R14: 0000000000000004 R15: 0000000020077000
>
> Fixes: 9d643f63128b ("KVM: x86: avoid large stack allocations in
> em_fxrstor")
> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
> ---
>  arch/x86/kvm/emulate.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index fb0055953fbc..d87f01a2d6f4 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -4002,7 +4002,7 @@ static int em_fxsave(struct x86_emulate_ctxt *ctxt)
>
>  static int em_fxrstor(struct x86_emulate_ctxt *ctxt)
>  {
> -        struct fxregs_state fx_state;
> +        struct fxregs_state fx_state, fx_old;
>          int rc;
>          size_t size;
>
> @@ -4010,19 +4010,21 @@ static int em_fxrstor(struct x86_emulate_ctxt *ctxt)
>          if (rc != X86EMUL_CONTINUE)
>                  return rc;
>
> +        size = fxstate_size(ctxt);
> +        rc = segmented_read_std(ctxt, ctxt->memop.addr.mem, &fx_state, size);
> +        if (rc != X86EMUL_CONTINUE)
> +                return rc;
> +
>          ctxt->ops->get_fpu(ctxt);
>
> -        size = fxstate_size(ctxt);
>          if (size < __fxstate_size(16)) {
> -                rc = asm_safe("fxsave %[fx]", , [fx] "+m"(fx_state));
> +                rc = asm_safe("fxsave %[fx]", , [fx] "+m"(fx_old));
>                  if (rc != X86EMUL_CONTINUE)
>                          goto out;
> +                memcpy(((void *)&fx_state) + size, ((void *)&fx_old) + size,
> +                       __fxstate_size(16) - size);
>          }
>
> -        rc = segmented_read_std(ctxt, ctxt->memop.addr.mem, &fx_state, size);
> -        if (rc != X86EMUL_CONTINUE)
> -                goto out;
> -
>          if (fx_state.mxcsr >> 16) {
>                  rc = emulate_gp(ctxt, 0);
>                  goto out;
> --
> 2.13.6
>
>
>
> --
>
> Thanks,
>
> David / dhildenb
>
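
To make the ordering in the quoted patch easier to follow, here is a
minimal userspace sketch of the same idea: do the possibly-sleeping guest
read first, then enter the atomic section and fill only the tail of the
image from the current register state. Everything in it is a stand-in and
not kernel code: struct fx_image, the mock_* helpers, preempt_depth and
build_restore_image() are hypothetical names, and FX_FULL_SIZE merely plays
the role of __fxstate_size(16) with an illustrative value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for __fxstate_size(16); the value is an assumption. */
#define FX_FULL_SIZE 512

/* Stand-in for struct fxregs_state; only the byte layout matters here. */
struct fx_image {
	unsigned char bytes[FX_FULL_SIZE];
};

/* Mock preemption counter: nonzero means "atomic context". */
static int preempt_depth;

static void mock_get_fpu(void) { preempt_depth++; }
static void mock_put_fpu(void) { preempt_depth--; }

/*
 * Mock guest-memory read. It fails when called in atomic context, which is
 * roughly where the kernel's might_sleep() check would print the warning.
 */
static int mock_read_guest(struct fx_image *dst, const unsigned char *guest,
			   size_t size)
{
	if (preempt_depth != 0)
		return -1;
	memcpy(dst->bytes, guest, size);
	return 0;
}

/* Mock of "fxsave": copies the current (mock) register state. */
static void mock_fxsave(struct fx_image *dst, const struct fx_image *live)
{
	*dst = *live;
}

/*
 * Build the image to restore: the first `size` bytes come from the guest,
 * any remaining bytes are kept from the live state.
 */
static int build_restore_image(struct fx_image *out,
			       const unsigned char *guest, size_t size,
			       const struct fx_image *live)
{
	struct fx_image old;

	assert(size <= FX_FULL_SIZE);

	/* 1. The read that may sleep happens before the atomic section. */
	if (mock_read_guest(out, guest, size) != 0)
		return -1;

	/* 2. Atomic section: only non-sleeping work in here. */
	mock_get_fpu();
	if (size < FX_FULL_SIZE) {
		mock_fxsave(&old, live);
		memcpy(out->bytes + size, old.bytes + size,
		       FX_FULL_SIZE - size);
	}
	mock_put_fpu();
	return 0;
}
```

The second buffer (`old` here, fx_old in the patch) is what makes the
reordering possible: the guest data is already in place when the current
state is saved, so the save cannot go into the same buffer without
clobbering it.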


--

Thanks,

David / dhildenb