Re: Xen PV seems to be broken on Linus' tree

From: Juergen Gross
Date: Wed Nov 22 2017 - 07:50:26 EST


On 22/11/17 05:46, Andy Lutomirski wrote:
> On Tue, Nov 21, 2017 at 8:11 PM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>> On Tue, Nov 21, 2017 at 7:33 PM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>>> I'm doing:
>>>
>>> /usr/bin/qemu-system-x86_64 -machine accel=kvm:tcg -cpu host -net none \
>>>     -nographic -kernel xen-4.8.2 -initrd './arch/x86/boot/bzImage' -m 2G \
>>>     -smp 2 -append console=com1
>>>
>>> With Linus' commit c8a0739b185d11d6e2ca7ad9f5835841d1cfc765 and the
>>> attached config.
>>>
>>> It dies with a bunch of sensible log lines and then:
>>>
>>> (XEN) d0v0 Unhandled invalid opcode fault/trap [#6, ec=0000]
>>> (XEN) domain_crash_sync called from entry.S: fault at ffff82d08023961a entry.o#create_bounce_frame+0x137/0x146
>>> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
>>> (XEN) ----[ Xen-4.8.2 x86_64 debug=n Not tainted ]----
>>> (XEN) CPU: 0
>>> (XEN) RIP: e033:[<ffffffff811226eb>]
>>> (XEN) RFLAGS: 0000000000000296 EM: 1 CONTEXT: pv guest (d0v0)
>>> (XEN) rax: 000000000000002f rbx: ffffffff81e65a48 rcx: ffffffff81e71288
>>> (XEN) rdx: ffffffff81e27500 rsi: 0000000000000001 rdi: ffffffff81133f88
>>> (XEN) rbp: 0000000000000000 rsp: ffffffff81e03e78 r8: 0000000000000000
>>> (XEN) r9: 0000000000000001 r10: 0000000000000000 r11: 0000000000000000
>>> (XEN) r12: 0000000000000000 r13: 0000000000000001 r14: 0000000000000001
>>> (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000003506e0
>>> (XEN) cr3: 000000007b0b3000 cr2: 0000000000000000
>>> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033
>>> (XEN) Guest stack trace from rsp=ffffffff81e03e78:
>>> (XEN) ffffffff81e71288 0000000000000000 ffffffff811226eb 000000010000e030
>>> (XEN) 0000000000010096 ffffffff81e03eb8 000000000000e02b ffffffff811226eb
>>> (XEN) ffffffff81122c2e 0000000000000200 0000000000000000 0000000000000000
>>> (XEN) 0000000000000030 ffffffff81c69cf5 ffffffff81080b20 ffffffff81080560
>>> (XEN) 0000000000000000 ffffffff810d3741 ffffffff8107b420 ffffffff81094660
>>>
>>> Is this familiar?
>>>
>>> I'll feel really dumb if it ends up being my fault.
>>
>> Nah, it's broken at least back to v4.13, and I suspect it's config
>> related. objdump gives me this:
>>
>> ffffffff8112b0e1: e9 e8 fe ff ff jmpq ffffffff8112afce <check_flags.part.42+0x4e>
>> ffffffff8112b0e6: 48 c7 c6 2d f8 c8 81 mov $0xffffffff81c8f82d,%rsi
>> ffffffff8112b0ed: 48 c7 c7 58 b9 c8 81 mov $0xffffffff81c8b958,%rdi
>> ffffffff8112b0f4: e8 13 2d 01 00 callq ffffffff8113de0c <printk>
>> ffffffff8112b0f9: 0f ff (bad) <-- crash here
>>
>> That's "ud0", which is used by WARN. So we're probably hitting an
>> early warning and Xen probably has something busted with early
>> exception handling.
>>
>> Anyone want to debug it and fix it?
>
> Well, I think I debugged it. x86_64 has a shiny function
> idt_setup_early_handler(), and Xen doesn't call it. Fixing the
> problem may be as simple as calling it at an appropriate time and
> doing whatever asm magic is needed to deal with Xen's weird IDT
> calling convention.

Hmm, yes, this should work. I'll give it a try.

BTW: I don't think early exception handling in Xen PV guests ever worked.


Juergen