Re: [PATCH v4] x86/xen: Add support for HVMOP_set_evtchn_upcall_vector

From: Jan Beulich
Date: Fri Nov 11 2022 - 08:17:18 EST


On 11.11.2022 13:44, Juergen Gross wrote:
> On 11.11.22 10:01, Juergen Gross wrote:
>> On 08.11.22 17:26, Jan Beulich wrote:
>>> On 03.11.2022 16:41, Jan Beulich wrote:
>>>> On 03.11.2022 14:38, Jan Beulich wrote:
>>>>> On 29.07.2022 09:04, Jane Malalane wrote:
>>>>>> @@ -125,6 +130,9 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_xen_hvm_callback)
>>>>>>   {
>>>>>>       struct pt_regs *old_regs = set_irq_regs(regs);
>>>>>> +    if (xen_percpu_upcall)
>>>>>> +        ack_APIC_irq();
>>>>>> +
>>>>>>       inc_irq_stat(irq_hv_callback_count);
>>>>>>       xen_hvm_evtchn_do_upcall();
>>>>>> @@ -168,6 +176,15 @@ static int xen_cpu_up_prepare_hvm(unsigned int cpu)
>>>>>>       if (!xen_have_vector_callback)
>>>>>>           return 0;
>>>>>> +    if (xen_percpu_upcall) {
>>>>>> +        rc = xen_set_upcall_vector(cpu);
>>>>>
>>>>>  From all I can tell at least for APs this happens before setup_local_apic().
>>>>> With there being APIC interaction in this operation mode, as seen e.g. in
>>>>> the earlier hunk above, I think this is logically wrong. And it leads to
>>>>> apic_pending_intr_clear() issuing its warning: The vector registration, as
>>>>> an intentional side effect, marks the vector as pending. Unless IRQs were
>>>>> enabled at any point between the registration and the check, there's
>>>>> simply no way for the corresponding IRR bit to be dealt with (by
>>>>> propagating to ISR when the interrupt is delivered, and then being cleared
>>>>> from ISR by EOI).
>>>>
>>>> With Roger's help I now have a pointer to osstest also exposing the issue:
>>>>
>>>> http://logs.test-lab.xenproject.org/osstest/logs/174592/test-amd64-amd64-xl-pvhv2-intel/huxelrebe0---var-log-xen-console-guest-debian.guest.osstest.log.gz
>>>
>>> I've noticed only now that my mail to Jane bounced, and I'm now told
>>> she's no longer in her role at Citrix. Since I don't expect to have time
>>> to investigate an appropriate solution here, may I ask whether one of
>>> the two of you could look into this, being the maintainers of this code?
>>
>> I think the correct way to handle this would be:
>>
>> - rename CPUHP_AP_ARM_XEN_STARTING to CPUHP_AP_XEN_STARTING
>> - move the xen_set_upcall_vector() call to a new hotplug callback
>>   registered for CPUHP_AP_XEN_STARTING (this can be done even
>>   conditionally only if xen_percpu_upcall is set)
>>
>> Writing a patch now ...
>
> For the APs this is working as expected.
>
> The boot processor seems to be harder to fix. The related message is being
> issued even with interrupts being on when setup_local_APIC() is called.

Hmm, puzzling: I don't recall having seen the message for the BSP. Which
made me assume (without having actually checked) that ...

> I've tried to register the callback only after the setup_local_APIC() call,

... it's already happening afterwards in that case.

> but this results in a system hang when the APs are started.
>
> Any ideas?

Not really, to be honest.

Jan